#464 closed defect (fixed)
get_matches_estimated() can be wrong when collapsing
| Reported by: | Olly Betts | Owned by: | Olly Betts |
|---|---|---|---|
| Priority: | normal | Milestone: | 1.0.20 |
| Component: | Matcher | Version: | 1.1.4 |
| Severity: | normal | Keywords: | |
| Cc: | sascha-web-trac.xapian.org@… | Blocked By: | |
| Blocking: | | Operating System: | All |
Description
Mailing list thread:
http://thread.gmane.org/gmane.comp.search.xapian.general/8213/focus=8219
Test case:
http://www.vdp.com/exchange/default.zip
I don't want to hold up 1.0.19/1.1.5 any further, so I will mark this for 1.1.6 once I've added that as a milestone.
Attachments (1)
Change History (10)
comment:1 by , 16 years ago
| Milestone: | → 1.1.6 |
|---|
comment:2 by , 16 years ago
comment:3 by , 16 years ago
| Cc: | added |
|---|
I am (or was - last tried in January with the then-current version of Xapian in Debian squeeze) seeing a very similar (or identical) issue in my version support branch for the Sugar data store, but never managed to get reproducible test cases. It only happens when offset/limit are used, so my current workaround is to implement offset/limit on the Python side (with the expected impact on performance).
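For anyone wanting to try the same workaround, a minimal sketch of doing offset/limit on the Python side follows. It is not the Sugar data store code: the `page` helper and the `all_results` list are hypothetical, and the list stands in for a full, already-collapsed result set fetched in one go (for example by asking `get_mset()` for as many documents as the database holds).

```python
from itertools import islice


def page(results, offset, limit):
    """Return up to `limit` items of `results`, starting at `offset`.

    Hypothetical helper: `results` is any iterable holding the complete
    result list, so the slicing happens entirely on the Python side.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    return list(islice(results, offset, offset + limit))


# Stand-in for the complete result list (an assumption for illustration).
all_results = ["doc%d" % i for i in range(25)]

first_page = page(all_results, 0, 10)   # items 0..9
last_page = page(all_results, 20, 10)   # only 5 items remain
total = len(all_results)                # exact count, no estimate involved
```

The total is then an exact count rather than an estimate, at the cost of fetching every match for every request, which is where the performance impact comes from.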
comment:5 by , 16 years ago
BTW, Richard - your script ignores the B=NL and B=XTYPEproduct parameters being passed to Omega...
comment:6 by , 16 years ago
| Status: | new → assigned |
|---|
The attached patch seems to fix this issue for me. I'll test further tomorrow and sort out a regression testcase.
comment:7 by , 16 years ago
| Milestone: | 1.1.6 → 1.0.20 |
|---|
Fixed in trunk r14334.
This probably affects 1.0.x too, so marking for backporting.
comment:8 by , 16 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
Fixed in 1.0 branch in r14336.
comment:9 by , 16 years ago
Sascha: If this doesn't fix your problem too, can you open a new ticket for that please?

I've just been trying to replicate this problem using the supplied database, xapian trunk, and the python bindings, but have had no success so far. For the record, the script I'm using is:

    import xapian

    db = xapian.Database('../../../tmp/default/')
    qp = xapian.QueryParser()
    qp.set_database(db)
    qp.add_boolean_prefix('catalog', 'XCATALOG')
    qp.add_boolean_prefix('productgroup', 'XPRODUCTGROUP')
    q = qp.parse_query('(catalog:2 OR catalog:425) AND productgroup:6')
    e = xapian.Enquire(db)
    e.set_sort_by_value(1, True)
    e.set_collapse_key(0)
    e.set_query(q)

    def display(mset):
        print mset.get_matches_lower_bound(), \
              mset.get_matches_estimated(), \
              mset.get_matches_upper_bound()

    display(e.get_mset(0, 10)) # prints 48 392 438
    display(e.get_mset(0, 10, 1000)) # prints 417 417 417

The output of this seems to be self-consistent (but, interestingly, the number of results is 1 less than that reported in the email from the original reporter).
I think it's possible that this is a problem in omega, not in the matcher.
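To spell out what "self-consistent" means for the two outputs quoted in the script comments, here is a small sketch. The `bounds_self_consistent` helper is hypothetical, and the rule it encodes is my reading rather than anything the matcher guarantees in these words: each (lower, estimated, upper) triple should be ordered, and the exhaustive count should fall inside the bounds of the quick run, assuming the same query and an unchanged database.

```python
def bounds_self_consistent(quick, exhaustive):
    """Check two (lower, estimated, upper) triples against each other.

    Hypothetical helper: `quick` comes from a run that checked few
    documents, `exhaustive` from a run that checked at least 1000.
    """
    q_lo, q_est, q_hi = quick
    e_lo, e_est, e_hi = exhaustive
    # Each triple must be ordered: lower <= estimated <= upper.
    ordered = q_lo <= q_est <= q_hi and e_lo <= e_est <= e_hi
    # The tighter bounds must sit inside the looser ones.
    contained = q_lo <= e_lo and e_hi <= q_hi
    return ordered and contained


# The two outputs quoted in the script comments.
quick = (48, 392, 438)
exhaustive = (417, 417, 417)
consistent = bounds_self_consistent(quick, exhaustive)

# A made-up inconsistent pair: the exact count exceeds the upper bound.
broken = bounds_self_consistent((48, 392, 400), (417, 417, 417))
```

A failure of the second condition would indicate wrong bounds from the matcher, so a check like this could separate a matcher bug from a problem further up in omega.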