Opened 8 years ago

Last modified 12 months ago

#700 new defect

Support Enquire::matching_terms_begin() without termlist table?

Reported by: Olly Betts
Owned by: Olly Betts
Priority: normal
Milestone: 2.0.0
Component: Backend-Glass
Version:
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description (last modified by Olly Betts)

(Split out of #181)

Currently Enquire::matching_terms_begin() uses the termlist of the document, comparing it with the terms in the query. This means it doesn't work if the database has no termlist table. It's also another item to look up for each result, and comparing the two lists of terms isn't free.
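To make the cost concrete, here is a minimal standalone sketch of the termlist-comparison idea. It is not the actual Xapian code: the function name `matching_terms_via_termlist` is hypothetical, and both inputs are assumed to be sorted, duplicate-free lists of terms held in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical model (not Xapian API): intersect a document's termlist
// with the terms of the query. Both inputs are assumed to be sorted.
std::vector<std::string>
matching_terms_via_termlist(const std::vector<std::string>& doc_termlist,
                            const std::vector<std::string>& query_terms)
{
    assert(std::is_sorted(doc_termlist.begin(), doc_termlist.end()));
    assert(std::is_sorted(query_terms.begin(), query_terms.end()));

    std::vector<std::string> result;
    // Linear merge over both lists; this is the per-result comparison cost.
    std::set_intersection(doc_termlist.begin(), doc_termlist.end(),
                          query_terms.begin(), query_terms.end(),
                          std::back_inserter(result));
    return result;
}
```

The sketch needs the document's termlist as an input, which is why this approach cannot work on a database without a termlist table.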

It's also arguably not quite correct in some cases, for example for this query:

A OR (B AND NOT C)

It'll report A and B as matching terms in a document containing all three terms, but perhaps only A should be reported in such a case, since the B AND NOT C subquery doesn't match this document and so B didn't contribute to the match.
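The stricter behaviour can be illustrated with a toy query tree that only records terms from subqueries which actually match. This is a sketch under assumptions, not how the Xapian matcher is structured: `QueryNode`, `make_term`, `make_or`, `make_and_not` and `evaluate` are all hypothetical names, and a document is modelled as a plain set of terms.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

using TermSet = std::set<std::string>;

// Hypothetical toy query tree supporting only TERM, OR and AND_NOT.
struct QueryNode {
    enum Kind { TERM, OR, AND_NOT };
    Kind kind;
    std::string term;                        // used by TERM only
    std::shared_ptr<QueryNode> left, right;  // used by OR and AND_NOT
};
using NodePtr = std::shared_ptr<QueryNode>;

NodePtr make_term(const std::string& t) {
    return std::make_shared<QueryNode>(QueryNode{QueryNode::TERM, t, nullptr, nullptr});
}
NodePtr make_or(NodePtr l, NodePtr r) {
    return std::make_shared<QueryNode>(QueryNode{QueryNode::OR, "", std::move(l), std::move(r)});
}
NodePtr make_and_not(NodePtr l, NodePtr r) {
    return std::make_shared<QueryNode>(QueryNode{QueryNode::AND_NOT, "", std::move(l), std::move(r)});
}

// Returns whether `node` matches `doc`. Terms are added to `matched`
// only from subqueries that themselves match the document.
bool evaluate(const QueryNode& node, const TermSet& doc, TermSet& matched) {
    switch (node.kind) {
    case QueryNode::TERM:
        if (doc.count(node.term) == 0) return false;
        matched.insert(node.term);
        return true;
    case QueryNode::OR: {
        assert(node.left && node.right);
        // Evaluate both sides (no short-circuit) so every matching branch reports.
        bool l = evaluate(*node.left, doc, matched);
        bool r = evaluate(*node.right, doc, matched);
        return l || r;
    }
    case QueryNode::AND_NOT: {
        assert(node.left && node.right);
        TermSet from_left, ignored;
        if (!evaluate(*node.left, doc, from_left)) return false;
        if (evaluate(*node.right, doc, ignored)) return false;  // excluded by NOT side
        matched.insert(from_left.begin(), from_left.end());
        return true;
    }
    }
    return false;
}
```

With the query built as `make_or(make_term("A"), make_and_not(make_term("B"), make_term("C")))`, a document containing A, B and C yields only A in `matched`, while a document containing just A and B yields both A and B.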

We could record the information about matching terms for each candidate entry in the proto-MSet, which would solve both of these issues. The tricky part is doing this in a way which doesn't incur a significant space or time overhead during the match. E.g. a bitmap of matching terms is fairly space efficient.
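As a rough illustration of the bitmap idea, the sketch below gives each query term an index and stores one bit per term alongside each candidate. The `Candidate` struct, `mark_matched` and `decode_matched` are hypothetical names rather than proto-MSet internals, and the sketch assumes a query with at most 64 distinct terms so that a single `std::uint64_t` suffices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical candidate entry: one extra 64-bit word per candidate.
struct Candidate {
    unsigned docid;
    double weight;
    std::uint64_t matched;  // bit i set => query term i matched this document
};

// Record that the query term with the given index matched this candidate.
inline void mark_matched(Candidate& c, unsigned term_index) {
    assert(term_index < 64);  // assumption: at most 64 query terms
    c.matched |= std::uint64_t(1) << term_index;
}

// Turn the bitmap back into terms, in query-term index order.
std::vector<std::string>
decode_matched(const Candidate& c, const std::vector<std::string>& query_terms) {
    std::vector<std::string> result;
    for (std::size_t i = 0; i < query_terms.size() && i < 64; ++i) {
        if ((c.matched >> i) & 1) result.push_back(query_terms[i]);
    }
    return result;
}
```

In this model the space overhead is 8 bytes per candidate; queries with more terms would need a wider or variable-length bitmap, which the sketch does not attempt.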

If we don't care about corner cases like the one above, we could also skip through the posting lists a second time to get this information. More data to decode, but it's likely to already be in cache.
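A minimal model of the second-pass approach, assuming each query term's posting list is available as a sorted vector of document ids. The `PostingLists` type and `matching_terms_via_postlists` are hypothetical names, and `std::lower_bound` stands in for skipping forward in a real posting list.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: query term -> sorted list of docids containing it.
using PostingLists = std::map<std::string, std::vector<unsigned>>;

// For one result document, report every query term whose posting list
// contains its docid. This ignores query structure, so it has the same
// corner cases as the termlist comparison.
std::vector<std::string>
matching_terms_via_postlists(const PostingLists& postlists, unsigned docid) {
    std::vector<std::string> result;
    for (const auto& entry : postlists) {
        const std::vector<unsigned>& docids = entry.second;
        assert(std::is_sorted(docids.begin(), docids.end()));
        // Stand-in for skipping the posting list forward to docid.
        auto it = std::lower_bound(docids.begin(), docids.end(), docid);
        if (it != docids.end() && *it == docid) result.push_back(entry.first);
    }
    return result;
}
```

Unlike the termlist comparison, this only touches data for terms that are in the query, and needs no termlist table.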

Probably doesn't need API or ABI changes, so suitable for 1.4.x.

Change History (2)

comment:1 by Olly Betts, 4 years ago

Description: modified (diff)

comment:2 by Olly Betts, 12 months ago

Milestone: 1.4.x → 2.0.0