#37 closed defect (released)
Exception can be thrown on queries of form: term1 NOT term1-term2
Reported by: | Bruce Toll | Owned by: | Olly Betts |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | Library API | Version: | 0.8.1 |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
In xapian version 0.8.1, queries of the form 'term1 NOT term1-term2' cause an exception to be thrown in multimatch.cc:781 (denom > 0).
To drive this bug, there must be at least one document with term1, where not all occurrences are in phrases with term2 of form term1-term2.
As of 8/17/04, this can be demonstrated on the xapian.org home page with a search of 'xapian not xapian-project'.
I believe the problem occurs because 'term1-term2' is treated as part of an AND_NOT clause and thus gets evaluated as a boolean. As a result, term1 is initially inserted in term_info at localmatch.cc:403 with a termweight of 0. Subsequently, an attempt is made to insert term1 (from the LHS) at localmatch.cc:403 with a non-zero weight, but because term1 is already present, the later insert has no effect. Ultimately, this causes the failure at multimatch.cc:781 when the termweights in term_info are summed, resulting in a denom of zero.
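The mechanism described above can be sketched in isolation. This is a minimal illustration, not the code in localmatch.cc: it assumes term_info behaves like a std::map keyed by term name, and the names TermInfo, collect_first_wins and sum_weights are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-term record kept in term_info.
struct TermInfo {
    double termweight;
};

using TermInfoMap = std::map<std::string, TermInfo>;

// Record each (term, weight) pair in order. std::map::insert() leaves an
// existing entry untouched, so the first weight seen for a term wins.
TermInfoMap collect_first_wins(
    const std::vector<std::pair<std::string, double>>& terms) {
    TermInfoMap term_info;
    for (const auto& t : terms) {
        term_info.insert(TermInfoMap::value_type(t.first, TermInfo{t.second}));
    }
    return term_info;
}

// Sum of all recorded term weights; stands in for "denom" in the report.
double sum_weights(const TermInfoMap& term_info) {
    double denom = 0.0;
    for (const auto& entry : term_info) {
        denom += entry.second.termweight;
    }
    return denom;
}
```

Feeding this the sequence (term1, 0), (term2, 0), (term1, 1.5), i.e. the boolean side first and the weighted LHS second, leaves term1 at weight 0, so the sum is zero, which matches the failing denom > 0 condition in the report.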
A possible fix might be to add a check near localmatch.cc:403 to see if the term to be inserted is already in term_info with a termweight value of 0. If so, the existing termweight should be replaced by the termweight of the term to be inserted.
I am not familiar enough with the software to know whether this is a reasonable fix or whether it would have adverse side-effects.
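The check proposed above might look roughly like the following. This is only a sketch under the assumption that term_info can be modelled as a std::map from term name to weight; WeightMap and insert_replacing_zero are hypothetical names, not part of Xapian.

```cpp
#include <map>
#include <string>
#include <utility>

// Assumed model of term_info: term name -> termweight.
using WeightMap = std::map<std::string, double>;

// Insert a term. If it is already present with a termweight of 0
// (for example, recorded from a boolean subquery), overwrite that
// weight with the new one; otherwise keep the existing entry.
void insert_replacing_zero(WeightMap& term_info,
                           const std::string& term,
                           double termweight) {
    auto result = term_info.insert(std::make_pair(term, termweight));
    if (!result.second && result.first->second == 0.0) {
        result.first->second = termweight;
    }
}
```

With this variant, a weighted occurrence replaces an earlier zero-weight entry, but a second weighted occurrence of the same term is still ignored.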
Change History (8)
comment:1 by , 20 years ago
comment:2 by , 20 years ago
op_sys: | Linux → All |
---|---|
rep_platform: | PC → All |
Status: | new → assigned |
Thanks - I'll investigate...
comment:3 by , 20 years ago
I believe the correct fix is to simply sum the termweights for a term which occurs more than once, as that will fix related issues with a term which occurs in weighted form more than once.
I'm away from my dev box at present, but here's an untested patch (it does at least compile):
http://www.survex.com/~olly/localmatch.patch
Let me know if you get a chance to try it before I do.
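For illustration, the summing approach described in this comment could be sketched as below. This is not the linked patch, whose contents are not reproduced here; it assumes term_info can be modelled as a std::map from term name to weight, and WeightMap and add_summing are hypothetical names.

```cpp
#include <map>
#include <string>

// Assumed model of term_info: term name -> accumulated termweight.
using WeightMap = std::map<std::string, double>;

// Accumulate the weight for a term. operator[] value-initialises a
// missing entry to 0.0, so the first occurrence simply stores its
// weight and later occurrences add to it.
void add_summing(WeightMap& term_info,
                 const std::string& term,
                 double termweight) {
    term_info[term] += termweight;
}
```

Under this scheme a zero-weight boolean occurrence contributes nothing, and a term that appears in weighted form more than once has all of its weights counted.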
comment:4 by , 20 years ago
Thanks for the quick response. I'm currently getting a '404 File Not Found' trying to access the patch.
comment:5 by , 20 years ago
Oops, I copied it to my home directory, not my web directory. I've moved it and checked I can see it via the web now.
comment:6 by , 20 years ago
I applied the patch and it fixes the reported problem. Thanks!
I also ran "make check", and the output matches the output from the original unpatched 0.8.1 release on my system. That is, all tests passed except for the stemdict tests, which were skipped (since I haven't downloaded the stemming files from CVS).
Thanks, again.
In the example below, it should be 'xapian NOT xapian-project'. The problem can be driven with omega or quest.