Opened 17 years ago

Closed 17 years ago

Last modified 17 years ago

#164 closed defect (released)

Filtering with ValueRangePostLists can be very slow for large databases

Reported by: Richard Boulton Owned by: Olly Betts
Priority: normal Milestone:
Component: Other Version: SVN trunk
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description

I've been testing with a database of slightly over 3 million documents. Every one of these documents has an associated value (which is an encoded number). Searches on this database typically take well under 0.1 seconds (when a reasonable amount of the database is cached, anyway). However, searches which are filtered with a ValueRange filter are extremely slow - typically taking around 9 seconds. The speed of such searches is independent of the number of terms matching the non-ValueRange part of the search.

The restriction performed by the ValueRangePostList typically matches only a small proportion of the documents in the database.

Analysis of the code with valgrind (for a search for which the entire database is cached, and hence the system is CPU bound) indicates that over 98% of the time is being spent in the ValueRangePostList::skip_to() function. What appears to happen is that the skip_to() function is called to move the ValueRangePostList to the next matching document after a given position, but because only a small proportion of the documents match the range restriction, and this involves stepping through the postlist testing documents to find the next match, a very large number of documents are accessed.

Ideally, the matcher would know, somehow, that skip_to() on a value range postlist is expensive, and in appropriate situations it would call an alternative method which simply checks whether a particular document is present in the postlist or not.

Attachments (2)

weak-skip_to-updated.patch (12.6 KB ) - added by Olly Betts 17 years ago.
Updated patch
weak_skip_to.patch (12.4 KB ) - added by Richard Boulton 17 years ago.
Rough implementation of a possible fix.

Download all attachments as: .zip

Change History (14)

comment:1 by Richard Boulton, 17 years ago

Blocking: 120 added

Marking this bug as for fixing in the 1.0 release series, since it doesn't require any API or ABI changes, and is a significant performance problem for some users.

comment:2 by Richard Boulton, 17 years ago

op_sys: LinuxAll
rep_platform: PCAll

comment:3 by Richard Boulton, 17 years ago

Status: newassigned

comment:4 by Olly Betts, 17 years ago

Owner: changed from Richard Boulton to Olly Betts
Status: assignednew

comment:5 by Olly Betts, 17 years ago

Blocking: 200 added; 120 removed
Status: newassigned

I'm working on a fix, so this should be done for 1.0.4.

comment:6 by Olly Betts, 17 years ago

Blocking: 203 added

comment:7 by Olly Betts, 17 years ago

Blocking: 203 removed

by Olly Betts, 17 years ago

Attachment: weak-skip_to-updated.patch added

Updated patch

by Richard Boulton, 17 years ago

Attachment: weak_skip_to.patch added

Rough implementation of a possible fix.

comment:8 by Olly Betts, 17 years ago

attachments.isobsolete: 01

comment:9 by Olly Betts, 17 years ago

Resolution: fixed
Status: assignedclosed

I've committed a patch to SVN which give most of the benefit of the proof of concept patch, but is significantly simpler (Richard's patch reduces the run time by ~75% for perftest.py with some real world queries from tweakers with a VRP filter attached, the patch I've applied gives ~72%). I've not managed to work out where the extra saving comes from, but after discussing with Richard, we're pleased with the result given the reduced complexity.

comment:10 by Olly Betts, 17 years ago

Fixed in 1.0.4

comment:11 by Olly Betts, 17 years ago

Resolution: fixedreleased

comment:12 by Olly Betts, 17 years ago

Blocking: 200 removed
Operating System: All
Note: See TracTickets for help on using tickets.