Ticket #37 (closed defect: released)
Exception can be thrown on queries of form: term1 NOT term1-term2
| Reported by: | btoll | Owned by: | olly |
|---|---|---|---|
| Priority: | high | Milestone: | |
| Component: | Library API | Version: | 0.8.1 |
| Severity: | normal | Keywords: | |
| Cc: | | Blocked By: | |
| Operating System: | All | Blocking: | |
Description
In Xapian version 0.8.1, queries of the form 'term1 NOT term1-term2' cause an exception to be thrown at multimatch.cc:781, where the check (denom > 0) fails.
To trigger this bug, there must be at least one document containing term1 in which not every occurrence of term1 is part of a term1-term2 phrase.
As of 8/17/04, this can be demonstrated on the xapian.org home page by searching for 'xapian not xapian-project'.
I believe the problem occurs because 'term1-term2' forms the right-hand side of an AND_NOT and is therefore evaluated as a boolean (unweighted) subquery. As a result, term1 is first inserted into term_info at localmatch.cc:403 with a termweight of 0. Later, term1 from the left-hand side is inserted at the same line with a non-zero termweight, but because term1 is already present, this second insert has no effect. Ultimately, when the termweights in term_info are summed at multimatch.cc:781, the result is a denom of zero and the check fails.
One possible fix would be to add a check near localmatch.cc:403: if the term being inserted is already in term_info with a termweight of 0, replace the existing termweight with the termweight of the term being inserted.
I am not familiar enough with the software to know whether this is a reasonable fix or whether it would have adverse side-effects.
