#164 closed defect (released)
Filtering with ValueRangePostLists can be very slow for large databases
Reported by: | Richard Boulton | Owned by: | Olly Betts
---|---|---|---
Priority: | normal | Milestone: |
Component: | Other | Version: | SVN trunk
Severity: | normal | Keywords: |
Cc: | | Blocked By: |
Blocking: | | Operating System: | All
Description
I've been testing with a database of slightly over 3 million documents. Every one of these documents has an associated value (an encoded number). Searches on this database typically take well under 0.1 seconds (when a reasonable amount of the database is cached, anyway). However, searches restricted with a ValueRange filter are extremely slow, typically taking around 9 seconds. The speed of such searches is independent of the number of terms matching the non-ValueRange part of the search.
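For concreteness, a filtered search of this kind might look like the following sketch using the Xapian C++ API. The database path, value slot number, terms, and range bounds are made-up examples, not taken from the report:

```cpp
#include <xapian.h>
#include <iostream>

int main() {
    Xapian::Database db("/path/to/db");  // hypothetical database path
    Xapian::Enquire enquire(db);

    // The main query, matching on terms as usual.
    Xapian::Query terms(Xapian::Query::OP_OR,
                        Xapian::Query("example"),
                        Xapian::Query("term"));

    // Restrict results to documents whose value in slot 0 falls in a
    // range; internally this is evaluated by a ValueRangePostList.
    Xapian::Query range(Xapian::Query::OP_VALUE_RANGE, 0,
                        "20070101", "20071231");

    // OP_FILTER: rank by `terms`, but only keep documents matching `range`.
    enquire.set_query(Xapian::Query(Xapian::Query::OP_FILTER, terms, range));

    Xapian::MSet matches = enquire.get_mset(0, 10);
    std::cout << matches.get_matches_estimated() << " matches estimated\n";
    return 0;
}
```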
The restriction performed by the ValueRangePostList typically matches only a small proportion of the documents in the database.
Analysis of the code with valgrind (for a search for which the entire database is cached, and hence the system is CPU bound) indicates that over 98% of the time is being spent in the ValueRangePostList::skip_to() function. What appears to happen is that skip_to() is called to move the ValueRangePostList to the next matching document at or after a given position; because only a small proportion of the documents match the range restriction, and finding the next match involves stepping through the postlist testing each document in turn, a very large number of documents end up being accessed.
Ideally, the matcher would know, somehow, that skip_to() on a value range postlist is expensive, and in appropriate situations it would call an alternative method which simply checks whether a particular document is present in the postlist or not.
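A minimal sketch of that idea, not the actual Xapian internals: the class, method names, and the placeholder get_value() helper below are all illustrative stand-ins. Alongside a skip_to() that must scan forward to the next match, the postlist could offer a cheap membership test that only inspects the one candidate document:

```cpp
#include <string>

typedef unsigned docid;

// Illustrative stand-in for a value-range postlist; not the real class.
class ValueRangePostListSketch {
    docid current, last_docid;
    std::string lo, hi;

    // Hypothetical helper standing in for a real database value fetch.
    std::string get_value(docid did) const {
        return std::string();  // placeholder; a real version reads the value store
    }

    bool in_range(docid did) const {
        std::string v = get_value(did);
        return lo <= v && v <= hi;
    }

public:
    ValueRangePostListSketch(docid last, std::string lo_, std::string hi_)
        : current(0), last_docid(last), lo(lo_), hi(hi_) {}

    // Expensive: may test many documents before finding the next match,
    // so the cost grows with the gap between consecutive matches.
    docid skip_to(docid did) {
        for (current = did; current <= last_docid; ++current) {
            if (in_range(current)) return current;
        }
        return 0;  // 0 means "at end" in this sketch
    }

    // Cheap: only looks at the one document the matcher already has in
    // hand, so the cost is independent of how sparse the matches are.
    bool check(docid did) {
        current = did;
        return in_range(did);
    }
};
```

With a check()-style call available, the matcher can let the other branches of an AND supply candidate document ids and merely verify each one against the range, rather than asking the range postlist to generate the next match itself.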
Attachments (2)
Change History (14)
comment:1 by , 17 years ago
Blocking: | 120 added
---|---
comment:2 by , 17 years ago
op_sys: | Linux → All
---|---
rep_platform: | PC → All
comment:3 by , 17 years ago
Status: | new → assigned
---|---
comment:4 by , 17 years ago
Owner: | changed from | to
---|---|---
Status: | assigned → new |
comment:5 by , 17 years ago
Blocking: | 200 added; 120 removed
---|---
Status: | new → assigned
I'm working on a fix, so this should be done for 1.0.4.
comment:6 by , 17 years ago
Blocking: | 203 added
---|---
comment:7 by , 17 years ago
Blocking: | 203 removed
---|---
comment:8 by , 17 years ago
attachments.isobsolete: | 0 → 1
---|---
comment:9 by , 17 years ago
Resolution: | → fixed
---|---
Status: | assigned → closed
I've committed a patch to SVN which gives most of the benefit of the proof-of-concept patch, but is significantly simpler (Richard's patch reduces the run time by ~75% for perftest.py with some real-world queries from tweakers with a VRP filter attached; the patch I've applied gives ~72%). I've not managed to work out where the extra saving comes from, but after discussing it with Richard, we're pleased with the result given the reduced complexity.
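The shape of the saving can be seen in a toy matcher loop. This is again a sketch, not the committed patch: TermPostListSketch is a hypothetical term postlist backed by a vector of docids, paired with the check() method from the earlier ValueRangePostListSketch. The term side drives iteration; the range side only verifies each candidate, so its cost no longer depends on how many documents sit between consecutive range matches:

```cpp
#include <vector>

// Hypothetical term postlist, illustrative only.
class TermPostListSketch {
    std::vector<docid> docs;
    size_t pos;
public:
    explicit TermPostListSketch(std::vector<docid> d) : docs(d), pos(0) {}
    bool at_end() const { return pos >= docs.size(); }
    docid get_docid() const { return docs[pos]; }
    void next() { ++pos; }
};

// Candidate docids come from the cheaper term postlist; the range
// postlist is only asked to confirm or reject each one.
docid next_match(TermPostListSketch &terms, ValueRangePostListSketch &range) {
    while (!terms.at_end()) {
        docid did = terms.get_docid();
        terms.next();  // advance for the following call
        if (range.check(did)) return did;
    }
    return 0;  // at end in this sketch
}
```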
comment:11 by , 17 years ago
Resolution: | fixed → released
---|---
comment:12 by , 17 years ago
Blocking: | 200 removed
---|---
Operating System: | → All
Marking this bug for fixing in the 1.0 release series, since the fix doesn't require any API or ABI changes, and this is a significant performance problem for some users.