Opened 20 years ago

Closed 13 years ago

Last modified 8 years ago

#49 closed enhancement (fixed)

Ensure OP_ELITE_SET matches at least some documents

Reported by: Olly Betts
Owned by: Olly Betts
Priority: lowest
Milestone: 1.2.9
Component: Library API
Version: SVN trunk
Severity: minor
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description (last modified by Olly Betts)

OP_ELITE_SET should never select groups of subqueries which don't match any documents. (Currently, it will exclude those for which termfreq_max() is 0, but this may still result in a bad choice).

Change History (8)

comment:1 by Olly Betts, 20 years ago

Severity: blocker → normal
Status: new → assigned

comment:2 by Olly Betts, 20 years ago

Severity: normal → enhancement

Since OP_ELITE_SET performs an OR on the subqueries it selects, this can only be a problem if the selected subqueries are all something like 'a AND b' or 'a NOT b' or NEAR/PHRASE operations, and none of these match anything.

This is pretty obscure, and I'm not sure what the solution is. Perhaps read the first posting from each subquery before picking the elite set to determine if any are in reality empty?
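The failure case and the "read the first posting" idea can be sketched with a toy model. This is not Xapian code: posting lists are plain Python sets, the `Term`, `And`, and `elite_set` names are hypothetical, and ranking candidates by their upper-bound frequency is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """Toy leaf subquery: a set of matching document ids."""
    docs: frozenset

    def matches(self):
        return set(self.docs)

    def termfreq_max(self):
        return len(self.docs)


@dataclass
class And:
    """Toy 'a AND b' subquery."""
    left: object
    right: object

    def matches(self):
        return self.left.matches() & self.right.matches()

    def termfreq_max(self):
        # Upper bound only: the real intersection may still be empty.
        return min(self.left.termfreq_max(), self.right.termfreq_max())


def elite_set(subqueries, k, probe=False):
    """Pick up to k subqueries and OR their matches together."""
    # Existing behaviour described in the ticket: drop bound-zero subqueries.
    candidates = [q for q in subqueries if q.termfreq_max() > 0]
    if probe:
        # Idea from the comment: check for a first posting before choosing.
        candidates = [q for q in candidates
                      if next(iter(q.matches()), None) is not None]
    # Assumed ranking for this sketch: largest upper bound first.
    chosen = sorted(candidates, key=lambda q: q.termfreq_max(), reverse=True)[:k]
    result = set()
    for q in chosen:
        result |= q.matches()
    return result


a = Term(frozenset({1, 2, 3}))
b = Term(frozenset({4, 5, 6}))
c = Term(frozenset({7}))
subqueries = [And(a, b), c]

without_probe = elite_set(subqueries, 1)            # picks 'a AND b', which is empty
with_probe = elite_set(subqueries, 1, probe=True)   # skips it and picks c
```

In this model the probe costs one lookup per candidate subquery, which is the trade-off the suggestion above would have to weigh.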

comment:3 by Olly Betts, 20 years ago

Component: other → Library API
op_sys: other → All
Priority: high → lowest
rep_platform: Other → All
Version: other → CVS HEAD

comment:4 by Olly Betts, 18 years ago

Operating System: All
Owner: changed from Olly Betts to Not currently assigned
Status: assigned → new

comment:6 by Olly Betts, 14 years ago

Description: modified (diff)
Milestone: 1.2.x
Summary: Ensure OP_ELITE_SET matches at least some document → Ensure OP_ELITE_SET matches at least some documents

No API or ABI changes are required, so this is suitable for fixing in 1.2.x; setting the milestone to reflect that. Probably not a high priority though.

comment:7 by Olly Betts, 14 years ago

Milestone: 1.2.x → 1.3.0
Owner: changed from Not currently assigned to Olly Betts
Status: new → assigned

Revisiting this ticket, I'm now thinking that we should just note in the documentation that this can happen when OP_ELITE_SET is used with non-term subqueries. The natural use case for OP_ELITE_SET is to pick a sane-sized set of terms from a larger set and it's fine there.

Also, FILTER(OP_ELITE_SET(A,B,C,...),Z) might match some or no documents, depending on which of A,B,C,... are selected, so why should OP_ELITE_SET(FILTER(A,Z),FILTER(B,Z),FILTER(C,Z),...) be handled so differently?
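To make the comparison concrete, here is a small toy model of the two query shapes. It uses plain Python sets rather than Xapian, and the `pick` and `union` helpers, along with the choice to rank by set size or by a size bound, are assumptions made for illustration, not the library's selection rule.

```python
def pick(items, k, score):
    """Keep the k highest-scoring items (ties keep their original order)."""
    return sorted(items, key=score, reverse=True)[:k]


def union(sets):
    """OR together an iterable of document-id sets."""
    out = set()
    for s in sets:
        out |= s
    return out


A = {1, 2, 3}
B = {4, 5}
C = {9}
Z = {9}

# FILTER(OP_ELITE_SET(A, B, C), Z): the selection ignores Z, so C is dropped.
filter_outside = union(pick([A, B, C], 2, len)) & Z

# OP_ELITE_SET(FILTER(A, Z), FILTER(B, Z), FILTER(C, Z)): each filtered
# subquery is scored by an upper bound, min(len(X), len(Z)), not its true size.
filter_inside = union(
    x & Z for x in pick([A, B, C], 2, lambda x: min(len(x), len(Z)))
)

# A different selection that happens to include C does match a document.
lucky = union([A, C]) & Z
```

Both shapes come back empty here, while a selection that includes C matches document 9, so in this model the outcome depends on which subqueries are picked rather than on where the filter sits.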

So unless there's disagreement, I suggest we document this and do it for 1.3.0 (and then backport to 1.2.x).

comment:8 by Olly Betts, 13 years ago

Milestone: 1.3.0 → 1.2.9

Richard agreed on IRC that just documenting was reasonable, so done on trunk r16215.

Marking to backport for 1.2.9.

Last edited 8 years ago by Olly Betts

comment:9 by Olly Betts, 13 years ago

Resolution: fixed
Status: assigned → closed

Backported in r16223.

Last edited 8 years ago by Olly Betts