What is OP_ELITE_SET for? How does it differ from OP_OR?

If you want to implement a feature which finds documents similar to a piece of text, an obvious approach is to build an "OR" query from all the terms in the text, and run this query against a database containing the documents. However, such a query can contain a lot of terms and be quite slow to perform, yet many of these terms don't contribute usefully to the results.

The OP_ELITE_SET operator can be used instead of OP_OR in this situation. OP_ELITE_SET selects the most important N terms and then acts as an OP_OR query with just these, ignoring any other terms. This will usually return results just as good as the full OP_OR query, but much faster.

In general, the OP_ELITE_SET operator can be used when you have a large OR query, but it doesn't matter if the search completely ignores some of the less important terms in the query.

You can specify a parameter to the query constructor which controls the number of terms that OP_ELITE_SET will pick. If not specified, this defaults to 10. (Xapian used to default to ceil(sqrt(number_of_subqueries)) if there were more than 100 subqueries, but this rather arbitrary special case was dropped in 1.3.0.) For example, this will pick the best 7 terms:

// subqs is a container of subqueries, one per term (for example a std::vector<Xapian::Query>).
Xapian::Query query(Xapian::Query::OP_ELITE_SET, subqs.begin(), subqs.end(), 7);

If the number of subqueries is less than this threshold, OP_ELITE_SET behaves identically to OP_OR.
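
The behaviour described above (keep only the N most important terms, then match as an OR over those) can be illustrated with a small standalone sketch. This is a toy model, not Xapian's implementation: it does not use the Xapian API, the names WeightedTerm, pick_elite_terms and matches_any are hypothetical, and the per-term weight is a made-up score supplied by the caller rather than whatever measure Xapian uses internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A term paired with a hypothetical importance score (higher = more important).
using WeightedTerm = std::pair<std::string, double>;

// Return the n highest-weighted terms. If there are n or fewer terms, all of
// them are kept, which corresponds to the "behaves like OP_OR" case.
std::vector<std::string> pick_elite_terms(std::vector<WeightedTerm> terms,
                                          std::size_t n) {
    if (terms.size() > n) {
        // Move the n best terms to the front, in descending weight order.
        std::partial_sort(terms.begin(),
                          terms.begin() + static_cast<std::ptrdiff_t>(n),
                          terms.end(),
                          [](const WeightedTerm& a, const WeightedTerm& b) {
                              return a.second > b.second;
                          });
        terms.resize(n);  // Discard the less important terms entirely.
    }
    std::vector<std::string> elite;
    elite.reserve(terms.size());
    for (const auto& t : terms) elite.push_back(t.first);
    return elite;
}

// OR semantics: a document matches if it contains at least one elite term.
bool matches_any(const std::vector<std::string>& elite,
                 const std::vector<std::string>& doc_terms) {
    for (const auto& e : elite) {
        if (std::find(doc_terms.begin(), doc_terms.end(), e) != doc_terms.end())
            return true;
    }
    return false;
}
```

With the weighted terms ("the", 0.1), ("xapian", 3.0), ("query", 2.0), ("of", 0.2) and n = 2, the elite set is "xapian" and "query", so a document containing only "the" and "of" no longer matches, whereas it would match the full OR.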

Last modified on 12/22/11 02:29:36