#164 closed defect (released)
Filtering with ValueRangePostLists can be very slow for large databases
Reported by: | Richard Boulton | Owned by: | Olly Betts
---|---|---|---
Priority: | normal | Milestone: |
Component: | Other | Version: | SVN trunk
Severity: | normal | Keywords: |
Cc: | | Blocked By: |
Blocking: | | Operating System: | All
Description
I've been testing with a database of slightly over 3 million documents. Every one of these documents has an associated value (an encoded number). Searches on this database typically take well under 0.1 seconds (when a reasonable amount of the database is cached, anyway). However, searches restricted with a ValueRange filter are extremely slow, typically taking around 9 seconds. The speed of such searches is independent of the number of terms matching the non-ValueRange part of the search.
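For concreteness, a filtered search of this kind might look like the following sketch using the Xapian C++ API. The database path, value slot number, terms, and range bounds are made-up examples, not taken from the report:

```cpp
#include <xapian.h>
#include <iostream>

int main() {
    Xapian::Database db("/path/to/db");  // hypothetical database path
    Xapian::Enquire enquire(db);

    // The main query, matching on terms as usual.
    Xapian::Query terms(Xapian::Query::OP_OR,
                        Xapian::Query("example"),
                        Xapian::Query("term"));

    // Restrict results to documents whose value in slot 0 falls in a
    // range; internally this is evaluated by a ValueRangePostList.
    Xapian::Query range(Xapian::Query::OP_VALUE_RANGE, 0,
                        "20070101", "20071231");

    // OP_FILTER: rank by `terms`, but only keep documents matching `range`.
    enquire.set_query(Xapian::Query(Xapian::Query::OP_FILTER, terms, range));

    Xapian::MSet matches = enquire.get_mset(0, 10);
    std::cout << matches.get_matches_estimated() << " matches estimated\n";
    return 0;
}
```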
The restriction performed by the ValueRangePostList typically matches only a small proportion of the documents in the database.
Analysis of the code with valgrind (for a search for which the entire database is cached, and hence the system is CPU bound) indicates that over 98% of the time is being spent in the ValueRangePostList::skip_to() function. What appears to happen is that skip_to() is called to move the ValueRangePostList to the next matching document at or after a given position; because only a small proportion of the documents match the range restriction, and finding the next match involves stepping through the postlist testing each document in turn, a very large number of documents end up being accessed.
Ideally, the matcher would know, somehow, that skip_to() on a value range postlist is expensive, and in appropriate situations it would call an alternative method which simply checks whether a particular document is present in the postlist or not.
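A minimal sketch of that idea, not the actual Xapian internals: the class, method names, and the placeholder get_value() helper below are all illustrative stand-ins. Alongside a skip_to() that must scan forward to the next match, the postlist could offer a cheap membership test that only inspects the one candidate document:

```cpp
#include <string>

typedef unsigned docid;

// Illustrative stand-in for a value-range postlist; not the real class.
class ValueRangePostListSketch {
    docid current, last_docid;
    std::string lo, hi;

    // Hypothetical helper standing in for a real database value fetch.
    std::string get_value(docid did) const {
        return std::string();  // placeholder; a real version reads the value store
    }

    bool in_range(docid did) const {
        std::string v = get_value(did);
        return lo <= v && v <= hi;
    }

public:
    ValueRangePostListSketch(docid last, std::string lo_, std::string hi_)
        : current(0), last_docid(last), lo(lo_), hi(hi_) {}

    // Expensive: may test many documents before finding the next match,
    // so the cost grows with the gap between consecutive matches.
    docid skip_to(docid did) {
        for (current = did; current <= last_docid; ++current) {
            if (in_range(current)) return current;
        }
        return 0;  // 0 means "at end" in this sketch
    }

    // Cheap: only looks at the one document the matcher already has in
    // hand, so the cost is independent of how sparse the matches are.
    bool check(docid did) {
        current = did;
        return in_range(did);
    }
};
```

With a check()-style call available, the matcher can let the other branches of an AND supply candidate document ids and merely verify each one against the range, rather than asking the range postlist to generate the next match itself.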
Attachments (2)
Change History (14)
comment:1 by , 17 years ago
Blocking: | 120 added
---|---
comment:2 by , 17 years ago
op_sys: | Linux → All
---|---
rep_platform: | PC → All
comment:3 by , 17 years ago
Status: | new → assigned
---|---
comment:4 by , 17 years ago
Owner: | changed from | to
---|---|---
Status: | assigned → new |
comment:5 by , 17 years ago
Blocking: | 200 added; 120 removed
---|---
Status: | new → assigned
I'm working on a fix, so this should be done for 1.0.4.
comment:6 by , 17 years ago
Blocking: | 203 added
---|---
comment:7 by , 17 years ago
Blocking: | 203 removed
---|---
comment:8 by , 17 years ago
attachments.isobsolete: | 0 → 1
---|---
comment:9 by , 17 years ago
Resolution: | → fixed
---|---
Status: | assigned → closed
I've committed a patch to SVN which gives most of the benefit of the proof-of-concept patch, but is significantly simpler (Richard's patch reduces the run time by ~75% for perftest.py with some real-world queries from tweakers with a VRP filter attached; the patch I've applied gives ~72%). I've not managed to work out where the extra saving comes from, but after discussing it with Richard, we're pleased with the result given the reduced complexity.
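The shape of the saving can be seen in a toy matcher loop. This is again a sketch, not the committed patch: TermPostListSketch is a hypothetical term postlist backed by a vector of docids, paired with the check() method from the earlier ValueRangePostListSketch. The term side drives iteration; the range side only verifies each candidate, so its cost no longer depends on how many documents sit between consecutive range matches:

```cpp
#include <vector>

// Hypothetical term postlist, illustrative only.
class TermPostListSketch {
    std::vector<docid> docs;
    size_t pos;
public:
    explicit TermPostListSketch(std::vector<docid> d) : docs(d), pos(0) {}
    bool at_end() const { return pos >= docs.size(); }
    docid get_docid() const { return docs[pos]; }
    void next() { ++pos; }
};

// Candidate docids come from the cheaper term postlist; the range
// postlist is only asked to confirm or reject each one.
docid next_match(TermPostListSketch &terms, ValueRangePostListSketch &range) {
    while (!terms.at_end()) {
        docid did = terms.get_docid();
        terms.next();  // advance for the following call
        if (range.check(did)) return did;
    }
    return 0;  // at end in this sketch
}
```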
comment:11 by , 17 years ago
Resolution: | fixed → released
---|---
comment:12 by , 17 years ago
Blocking: | 200 removed
---|---
Operating System: | → All
Marking this bug for fixing in the 1.0 release series, since the fix doesn't require any API or ABI changes, and this is a significant performance problem for some users.