Opened 20 years ago

Closed 20 years ago

Last modified 20 years ago

#37 closed defect (released)

Exception can be thrown on queries of form: term1 NOT term1-term2

Reported by: Bruce Toll
Owned by: Olly Betts
Priority: high
Milestone:
Component: Library API
Version: 0.8.1
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

In Xapian version 0.8.1, queries of the form 'term1 NOT term1-term2' cause an exception to be thrown in multimatch.cc:781 (denom > 0).

To trigger this bug, at least one document must contain term1 with at least one occurrence that is not part of a phrase of the form term1-term2.

As of 8/17/04, this can be demonstrated on the xapian.org home page with a search of 'xapian not xapian-project'.

I believe the problem occurs because 'term1-term2' is treated as part of an AND_NOT clause and thus gets evaluated as a boolean. As a result, term1 is initially inserted in term_info at localmatch.cc:403 with a termweight of 0. Subsequently, an attempt is made to insert term1 (from the LHS) at localmatch.cc:403 with a non-zero weight, but because term1 is already present, the later insert has no effect. Ultimately, this causes the failure at multimatch.cc:781 when the termweights in term_info are summed, resulting in a denom of zero.

A possible fix might be to add a check near localmatch.cc:403 to see if the term to be inserted is already in term_info with a termweight value of 0. If so, the existing termweight should be replaced by the termweight of the term to be inserted.
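The failure mode and the proposed check can be illustrated with a minimal sketch. It assumes term_info behaves like a std::map from term to termweight, which this ticket does not confirm; TermInfo, add_term_plain, add_term_replace_zero and sum_weights are hypothetical names used only for illustration and are not Xapian code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for term_info: maps a term to its termweight.
using TermInfo = std::map<std::string, double>;

// Mirrors the reported behaviour: std::map::insert does nothing when the
// key is already present, so a later non-zero weight is silently dropped.
void add_term_plain(TermInfo& info, const std::string& term, double weight) {
    info.insert(std::make_pair(term, weight));
}

// Sketch of the proposed check: if the term is already present with a
// weight of 0, replace that weight with the new one.
void add_term_replace_zero(TermInfo& info, const std::string& term, double weight) {
    auto result = info.insert(std::make_pair(term, weight));
    if (!result.second && result.first->second == 0.0) {
        result.first->second = weight;
    }
}

// Sums all termweights, standing in for the denom computed later.
double sum_weights(const TermInfo& info) {
    double denom = 0.0;
    for (const auto& entry : info) {
        denom += entry.second;
    }
    return denom;
}
```

With add_term_plain, inserting term1 first with weight 0 and then with a non-zero weight leaves the sum at zero, which is the condition that the check at multimatch.cc:781 rejects. With add_term_replace_zero, the same sequence leaves the non-zero weight in place.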

I am not familiar enough with the software to know whether this is a reasonable fix or whether it would have adverse side-effects.

Change History (8)

comment:1 by Bruce Toll, 20 years ago

In the example in the description, the query should be 'xapian NOT xapian-project'. The problem can be driven with omega or quest.

comment:2 by Olly Betts, 20 years ago

op_sys: Linux → All
rep_platform: PC → All
Status: new → assigned

Thanks - I'll investigate...

comment:3 by Olly Betts, 20 years ago

I believe the correct fix is to simply sum the termweights for a term which occurs more than once, as that will fix related issues with a term which occurs in weighted form more than once.

I'm away from my dev box at present, but here's an untested patch (it does at least compile):

http://www.survex.com/~olly/localmatch.patch

Let me know if you get a chance to try it before I do.
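The summing approach described in this comment can be sketched as follows. This is not the linked patch, whose contents are not shown in this ticket; it assumes the per-term weights live in a std::map keyed by term, and TermWeights, add_term_summed and total_weight are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for term_info: maps a term to its termweight.
using TermWeights = std::map<std::string, double>;

// Accumulate the weight for a term that may occur more than once.
// operator[] value-initialises a missing entry to 0.0 before the addition.
void add_term_summed(TermWeights& weights, const std::string& term, double weight) {
    weights[term] += weight;
}

// Sums all termweights, standing in for the denom computed later.
double total_weight(const TermWeights& weights) {
    double denom = 0.0;
    for (const auto& entry : weights) {
        denom += entry.second;
    }
    return denom;
}
```

Because the weights are added rather than inserted once, the result does not depend on whether the zero-weight boolean occurrence or the weighted occurrence is seen first, and a term that appears in weighted form twice contributes both weights.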

comment:4 by Bruce Toll, 20 years ago

Thanks for the quick response. I'm currently getting a '404 File Not Found' trying to access the patch.

comment:5 by Olly Betts, 20 years ago

Oops, I copied it to my home directory, not my web directory. I've moved it and checked I can see it via the web now.

comment:6 by Bruce Toll, 20 years ago

I applied the patch and it fixes the reported problem. Thanks!

I also ran "make check" and the output matches the output from the original unpatched 0.8.1 release on my system. That is, all tests passed except for the stemdict tests, which were skipped (since I haven't downloaded the stemming files from CVS).

Thanks, again.

comment:7 by Olly Betts, 20 years ago

Resolution: fixed
Status: assigned → closed

Fixed in CVS HEAD

comment:8 by Olly Betts, 20 years ago

Operating System: All
Resolution: fixed → released

Fixed in 0.8.2
