Opened 20 years ago

Closed 17 years ago

Last modified 17 years ago

#23 closed enhancement (released)

Matcher could optimise by hoisting near/phrase filter

Reported by: Olly Betts
Owned by: Olly Betts
Priority: high
Milestone:
Component: Library API
Version: SVN trunk
Severity: minor
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

The near/phrase filter could usefully be hoisted up the tree in some cases (leaving the AND part where it is). Consider:

e-mail AND filter

This is probably more efficient as a search for `e AND mail AND filter', with the results from that filtered for phrase matches on "e mail".
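To make the rewrite concrete, here is a minimal sketch over a toy positional index. It is not the matcher's code: the index contents and the helper names (`docs`, `is_phrase`, `unhoisted`, `hoisted`) are invented for illustration. It only shows that both query shapes return the same documents.

```python
# Toy positional index: term -> {docid: [positions]}. All data is made up.
INDEX = {
    "e":      {1: [0], 2: [3], 3: [5], 4: [1]},
    "mail":   {1: [1], 2: [9], 3: [7], 4: [2]},
    "filter": {1: [2], 3: [0]},
}

def docs(term):
    """Set of document ids containing term."""
    return set(INDEX.get(term, {}))

def is_phrase(docid, terms):
    """True if terms occur at consecutive positions in docid.

    Assumes docid contains every term in terms.
    """
    return any(
        all(p + i in INDEX[t][docid] for i, t in enumerate(terms))
        for p in INDEX[terms[0]][docid]
    )

def unhoisted():
    # (e PHRASE mail) AND filter: the phrase check sits below the AND.
    phrase_docs = {d for d in docs("e") & docs("mail")
                   if is_phrase(d, ["e", "mail"])}
    return sorted(phrase_docs & docs("filter"))

def hoisted():
    # Phrase check applied on top of the 3-way AND: e AND mail AND filter.
    candidates = docs("e") & docs("mail") & docs("filter")
    return sorted(d for d in candidates if is_phrase(d, ["e", "mail"]))
```

In this toy data the unhoisted form runs the phrase check on four documents and the hoisted form on two, which is the saving the rewrite is after.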

But that's not clear cut. It might be that every document matches that AND, but just one matches the phrase. In that case, the current code will try the phrase check on all documents, find one match, skip to that posting in "filter", find it matches, and return one result.

If the phrase match is hoisted, then the 3-way AND needs to look at *all* of the postings for "filter", and the phrase filtering still does the same amount of work.
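The extreme case can be put into numbers with a deliberately crude cost model. This is only a sketch under stated assumptions: every document contains all three terms, exactly one document has "e" and "mail" adjacent, and cost is counted as phrase checks plus "filter" postings touched. The function names and counters are hypothetical and do not correspond to anything in the library.

```python
from collections import Counter

def run_unhoisted(n, phrase_doc):
    """(e PHRASE mail) AND filter, with every doc matching e AND mail."""
    cost = Counter()
    results = []
    for docid in range(1, n + 1):
        cost["phrase_checks"] += 1         # phrase check on every document
        if docid == phrase_doc:            # stand-in for the positional check
            cost["filter_postings"] += 1   # one skip into the "filter" list
            results.append(docid)          # "filter" is in every document
    return results, cost

def run_hoisted(n, phrase_doc):
    """Phrase check on top of e AND mail AND filter."""
    cost = Counter()
    results = []
    for docid in range(1, n + 1):
        cost["filter_postings"] += 1       # 3-way AND walks all of "filter"
        cost["phrase_checks"] += 1         # and every doc still gets checked
        if docid == phrase_doc:
            results.append(docid)
    return results, cost
```

With `n = 1000`, both variants do 1000 phrase checks, but the unhoisted form touches one "filter" posting while the hoisted form touches all 1000.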

This extreme is probably rare, but it's not totally obvious that hoisting the phrase filter is a good idea generally. Or there might be a good heuristic for when to hoist: for example, if there's an AND with a rare term which we can hoist the filter above...

Change History (7)

comment:1 by Olly Betts, 20 years ago

Status: new → assigned

comment:2 by Olly Betts, 20 years ago

Severity: normal → enhancement

comment:3 by Olly Betts, 17 years ago

Blocking: 120 added
Version: 0.7.5 → SVN HEAD

It would be nice to implement this in the 1.0 series.

comment:4 by Olly Betts, 17 years ago

Blocking: 200 added; 120 removed

I have a working patch for this which I'm currently running performance tests on. This should make it into 1.0.4, so marking this bug for 1.0.4.

comment:5 by Olly Betts, 17 years ago

Resolution: fixed
Status: assigned → closed

(Finally) fixed in SVN HEAD. The savings are impressive, and I think the worries about making things worse in some corner cases have turned out to be unfounded, or at least swamped by the improvements in real world cases.

comment:6 by Olly Betts, 17 years ago

Resolution: fixed → released

Fixed in 1.0.4

comment:7 by Olly Betts, 17 years ago

Blocking: 200 removed
Operating System: All