Optimise MultiAndPostList using current weight of lhs
|Reported by:||Richard Boulton||Owned by:||Richard Boulton|
Currently, MultiAndPostList::next() advances the sub-postlists until they all reach a common document ID, by moving the first postlist forward, and then calling check on all the subsequent postlists, with the document ID that first postlist has moved to. The weights are not calculated until after the postlists have all settled on a position (so the w_min value is not used by MultiAndPostList - though it may be used by the sub-postlists).
Instead, each time the lhs moves to a new position, the current weight of that sub-postlist could be calculated. If this weight is insufficient, when combined with the maxweights of the remaining sub-postlists, to attain w_min(), the lhs could be moved further forward, without calling check() on any of the subsequent posting lists; thus reducing the frequency at which the subsequent posting lists need to be accessed. If the subsequent posting lists are "expensive" to check for some reason (perhaps they're external posting sources, or value ranges), this can speed things up.
Speculatively, this idea could be extended, such that the skip_to() and check() methods of posting sources would be given an extra "current_min_weight" parameter, indicating the minimum weight required for the specified document id. This would allow the optimisation to be applied outside within the individual sub-postlist trees of a MultiAndPostList.
I've got a prototype patch to MultiAndPostList, and some timings for a particular query. In most cases, I don't see a noticeable performance difference with the prototype patch, but in some cases, it results in a significant speed up. On the other hand, (though I've not observed it), it's plausible that this patch could slow things down if calculating the weight on the lhs postlist is more expensive than calling check() on the rhs postlist.
Change History (11)
comment:4 by , 10 years ago
|Summary:||Optimise MultiAndPostList using current weight of lhs. → Optimise MultiAndPostList using current weight of lhs|