Opened 17 years ago

Closed 16 years ago

#270 closed defect (fixed)

More efficient valuerangepostlist iteration

Reported by: Richard Boulton
Owned by: Richard Boulton
Priority: normal
Milestone: 1.1.0
Component: Matcher
Version: SVN trunk
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

Currently, if a pure OP_VALUE_RANGE (or OP_VALUE_GE or OP_VALUE_LE) search is performed, ValueRangePostList::next() is called repeatedly to iterate through the documents. This starts at docid=1 and iterates through all document IDs <= lastdocid, checking for suitable values. If the docids used in the database are sparse, this can result in very slow iteration. It also results in lots of Xapian::DocNotFoundError exceptions being thrown, and then caught, while testing whether a particular document ID exists.
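To make the cost concrete, here is a minimal sketch of the dense scan described above, run against a toy in-memory model rather than the real Xapian classes. ToyDatabase, ScanResult and dense_range_scan are hypothetical names invented for illustration, and std::out_of_range stands in for Xapian::DocNotFoundError. The toy also simplifies by throwing directly from get_value() instead of doing a separate existence check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using DocId = std::uint32_t;

// Hypothetical toy database: maps each existing docid to the value held in
// one slot. This is NOT the Xapian API, only a model of the behaviour.
struct ToyDatabase {
    std::map<DocId, std::string> values;

    DocId get_lastdocid() const {
        return values.empty() ? 0 : values.rbegin()->first;
    }

    // Throws for a missing docid (std::out_of_range stands in for
    // Xapian::DocNotFoundError in this sketch).
    const std::string& get_value(DocId did) const {
        auto it = values.find(did);
        if (it == values.end()) throw std::out_of_range("document not found");
        return it->second;
    }
};

struct ScanResult {
    std::vector<DocId> matches;   // docids whose value lies in [begin, end]
    std::size_t probes = 0;       // docids tested, whether they exist or not
    std::size_t exceptions = 0;   // probes that hit a missing document
};

// Dense scan: test every docid from 1 to lastdocid, as described above.
ScanResult dense_range_scan(const ToyDatabase& db,
                            const std::string& begin,
                            const std::string& end) {
    assert(begin <= end);
    ScanResult r;
    const DocId last = db.get_lastdocid();
    for (DocId did = 1; did <= last; ++did) {
        ++r.probes;
        try {
            const std::string& v = db.get_value(did);
            if (v >= begin && v <= end) r.matches.push_back(did);
        } catch (const std::out_of_range&) {
            ++r.exceptions;  // thrown and caught once per gap in the docids
        }
    }
    return r;
}
```

With three documents at docids 1, 1000 and 5000, this sketch performs 5000 probes and catches 4997 exceptions to find at most three matches, which is the sparse-docid problem in miniature.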

Instead, it would be better to use a direct iterator across the database. One approach is to use an all-documents postlist to get an iterator across the documents in the database. I'll shortly attach to this ticket a patch against SVN HEAD which implements such an approach. This approach has the downside that it usually requires iterating through the termlist table (with the current database backends, anyway). However, this table is already checked with the current approach when checking whether a document for which get_value() has returned the empty string exists in the database, so this may not be much of a downside.
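As a rough illustration of the alldocs idea (not the attached patch itself), the sketch below walks only the documents that exist, using a std::map as a stand-in for an all-documents postlist. AllDocs, RangeMatches and alldocs_range_scan are hypothetical names for this toy model and do not correspond to Xapian classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using DocId = std::uint32_t;

// Hypothetical stand-in for an all-documents postlist: an ordered sequence of
// the docids that actually exist, each paired with its value in one slot.
using AllDocs = std::map<DocId, std::string>;

struct RangeMatches {
    std::vector<DocId> docids;  // docids whose value lies in [begin, end]
    std::size_t visited = 0;    // documents examined
};

// Visit only existing documents, so gaps in the docid space cost nothing and
// no "document not found" exception is ever raised.
RangeMatches alldocs_range_scan(const AllDocs& docs,
                                const std::string& begin,
                                const std::string& end) {
    assert(begin <= end);
    RangeMatches out;
    for (const auto& entry : docs) {
        ++out.visited;
        if (entry.second >= begin && entry.second <= end) {
            out.docids.push_back(entry.first);
        }
    }
    return out;
}
```

In this model the work is proportional to the number of documents rather than to lastdocid: three documents spread over docids 1 to 5000 mean three visits instead of 5000 probes.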

The ideal approach would be to add methods to the database interface to iterate through all the values in a particular slot, to use this iterator in value range postlists, and to implement such iterators efficiently in the database backends.
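One possible shape for such an interface is sketched below, purely as an assumption about what a per-slot iterator could look like. ValueStream models the values stored in a single slot, ordered by docid, and ValueRangeCursor is a hypothetical cursor offering next() and skip_to() on top of it. Neither name is taken from Xapian, and a real backend would read the stream from disk rather than from a std::map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using DocId = std::uint32_t;

// Hypothetical model of one value slot: docid -> value, ordered by docid.
// Documents with no value in the slot simply do not appear.
using ValueStream = std::map<DocId, std::string>;

// Cursor over the documents in one slot whose value lies in [lo, hi].
// The ValueStream must outlive the cursor.
class ValueRangeCursor {
    const ValueStream* stream_;
    ValueStream::const_iterator it_;
    ValueStream::const_iterator end_;
    std::string lo_;
    std::string hi_;

    bool in_range() const { return it_->second >= lo_ && it_->second <= hi_; }

    // Advance until positioned on a matching entry or at the end.
    void settle() {
        while (it_ != end_ && !in_range()) ++it_;
    }

  public:
    ValueRangeCursor(const ValueStream& stream, std::string lo, std::string hi)
        : stream_(&stream), it_(stream.begin()), end_(stream.end()),
          lo_(std::move(lo)), hi_(std::move(hi)) {
        assert(lo_ <= hi_);
        settle();
    }

    bool at_end() const { return it_ == end_; }

    // Only valid while !at_end().
    DocId docid() const { return it_->first; }

    void next() {
        ++it_;
        settle();
    }

    // Move to the first matching docid >= did; never moves backwards.
    void skip_to(DocId did) {
        if (it_ != end_ && it_->first < did) {
            it_ = stream_->lower_bound(did);
            settle();
        }
    }
};
```

The point of the design is that the cursor touches only entries present in the slot, so neither missing documents nor documents without a value in that slot are ever probed.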

Attachments (1)

value_range_alldocs.patch (4.4 KB) - added by Richard Boulton 17 years ago.
Patch to implement iteration using an alldocs postlist (originally from olly, updated to apply to HEAD)

Change History (3)

by Richard Boulton, 17 years ago

Attachment: value_range_alldocs.patch added

Patch to implement iteration using an alldocs postlist (originally from olly, updated to apply to HEAD)

comment:1 by Richard Boulton, 17 years ago

Component: changed from Other to Matcher
Owner: changed from Olly Betts to Richard Boulton
Version: SVN HEAD

comment:2 by Richard Boulton, 16 years ago

Resolution: fixed
Status: changed from new to closed

Implemented in revision [10659] by applying the patch.
