Opened 14 years ago

Closed 14 years ago

Last modified 6 years ago

#464 closed defect (fixed)

get_matches_estimated() can be wrong when collapsing

Reported by: Olly Betts
Owned by: Olly Betts
Priority: normal
Milestone: 1.0.20
Component: Matcher
Version: 1.1.4
Severity: normal
Keywords:
Cc: sascha-web-trac.xapian.org@…
Blocked By:
Blocking:
Operating System: All

Description

Mailing list thread:

http://thread.gmane.org/gmane.comp.search.xapian.general/8213/focus=8219

Test case:

http://www.vdp.com/exchange/default.zip

I don't want to hold up 1.0.19/1.1.5 further, so I will mark this for 1.1.6 once I've added that as a milestone.

Attachments (1)

bug-464-fix.patch (1.1 KB) - added by Olly Betts 14 years ago.
candidate fix

Change History (10)

comment:1 by Olly Betts, 14 years ago

Milestone: 1.1.6

comment:2 by Richard Boulton, 14 years ago

I've just been trying to replicate this problem using the supplied database, xapian trunk, and the python bindings, but have had no success so far. For the record, the script I'm using is:

import xapian

# Open the test database supplied with the report.
db = xapian.Database('../../../tmp/default/')

# Map the user-visible filter prefixes to their term prefixes.
qp = xapian.QueryParser()
qp.set_database(db)
qp.add_boolean_prefix('catalog', 'XCATALOG')
qp.add_boolean_prefix('productgroup', 'XPRODUCTGROUP')
q = qp.parse_query('(catalog:2 OR catalog:425) AND productgroup:6')

# Sort on value slot 1 (reversed) and collapse on value slot 0.
e = xapian.Enquire(db)
e.set_sort_by_value(1, True)
e.set_collapse_key(0)
e.set_query(q)

def display(mset):
    # Show the lower bound, estimate and upper bound of the match count.
    print(mset.get_matches_lower_bound(),
          mset.get_matches_estimated(),
          mset.get_matches_upper_bound())

display(e.get_mset(0, 10))
# prints 48 392 438
display(e.get_mset(0, 10, 1000))  # check at least 1000 documents
# prints 417 417 417

The output of this seems to be self-consistent (but, interestingly, the number of results is 1 less than that reported in the email from the original reporter).

I think it's possible that this is a problem in omega, not in the matcher.
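
To spell out what "self-consistent" can be taken to mean for the figures above: each MSet should report lower bound <= estimate <= upper bound, and the exact count obtained by checking at least 1000 documents should fall inside the bounds reported by the first call. A minimal sketch of that check follows; the helper name `bounds_consistent` is hypothetical and not part of the Xapian API.

```python
def bounds_consistent(lower, estimated, upper, exact=None):
    """Return True if lower <= estimated <= upper and, when an exact
    match count is known, it also lies within [lower, upper].

    Hypothetical helper for illustration; not a Xapian function.
    """
    if not (lower <= estimated <= upper):
        return False
    if exact is not None and not (lower <= exact <= upper):
        return False
    return True

# Figures taken from the script output above.
first = (48, 392, 438)     # get_mset(0, 10)
second = (417, 417, 417)   # get_mset(0, 10, 1000)

print(bounds_consistent(*first, exact=second[1]))  # prints True
print(bounds_consistent(*second))                  # prints True
```

An estimate outside its own bounds, or an exact count outside the earlier bounds, would make the check return False.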

comment:3 by Sascha Silbe, 14 years ago

Cc: sascha-web-trac.xapian.org@… added

I am (or was - last tried in January with the then-current version of Xapian in Debian squeeze) seeing a very similar (or identical) issue in my version support branch for the Sugar data store, but never managed to get reproducible test cases. It only happens when offset/limit are used, so my current workaround is to implement offset/limit on the Python side (with the expected impact on performance).
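
A minimal sketch of that kind of workaround, assuming the full list of matches is fetched first and then sliced in Python. The function name `page` and the plain-list input are hypothetical and are not taken from the Sugar data store code.

```python
from itertools import islice

def page(results, offset, limit):
    """Return at most `limit` items from `results`, skipping the first `offset`.

    Hypothetical helper: `results` stands in for the complete, already
    collapsed and sorted list of matches.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    return list(islice(results, offset, offset + limit))

# Stand-in data: 25 matches, requesting the third page of 10.
all_matches = ["doc%d" % i for i in range(25)]
print(page(all_matches, 20, 10))
# prints ['doc20', 'doc21', 'doc22', 'doc23', 'doc24']
```

The cost is that every match has to be retrieved before slicing, which is the performance impact mentioned above.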

comment:4 by Olly Betts, 14 years ago

OK, I can reproduce with Omega on trunk. Will look deeper.

comment:5 by Olly Betts, 14 years ago

BTW, Richard - your script ignores the B=NL and B=XTYPEproduct parameters being passed to Omega...

by Olly Betts, 14 years ago

Attachment: bug-464-fix.patch added

candidate fix

comment:6 by Olly Betts, 14 years ago

Status: new → assigned

The attached patch seems to fix this issue for me. I'll test further tomorrow and sort out a regression testcase.

comment:7 by Olly Betts, 14 years ago

Milestone: 1.1.6 → 1.0.20

Fixed in trunk r14334.

This probably affects 1.0.x too, so marking for backporting.

Last edited 6 years ago by Olly Betts

comment:8 by Olly Betts, 14 years ago

Resolution: fixed
Status: assigned → closed

Fixed in 1.0 branch in r14336.

Last edited 6 years ago by Olly Betts

comment:9 by Olly Betts, 14 years ago

Sascha: If this doesn't fix your problem too, can you open a new ticket for that please?
