
How can I make MSet::get_matches_estimated() more accurate?

By default, Xapian tries to minimise the amount of work done to produce the exact search results asked for, and so MSet::get_matches_estimated() may be rather inaccurate. After all, it is just an estimate!

You can see how inaccurate it could be by looking at MSet::get_matches_lower_bound() and MSet::get_matches_upper_bound() - the true answer must be between these two bounds.
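As a rough illustration of reading the two bounds, the C++ sketch below computes how far the estimate could be from the true count. The helper name max_estimate_error is hypothetical and not part of Xapian; it takes plain integers so that it stands alone, and assumes you pass in the values returned by the three MSet methods.

```cpp
#include <algorithm>

// Hypothetical helper (not part of Xapian): given the values returned by
// MSet::get_matches_lower_bound(), MSet::get_matches_estimated() and
// MSet::get_matches_upper_bound(), return the largest possible difference
// between the estimate and the true number of matches.
unsigned max_estimate_error(unsigned lower, unsigned estimated, unsigned upper) {
    // Guard each subtraction so an unexpected input cannot wrap around.
    const unsigned below = estimated > lower ? estimated - lower : 0;
    const unsigned above = upper > estimated ? upper - estimated : 0;
    return std::max(below, above);
}
```

A result of 0 means the two bounds coincide, so the estimate is exact.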

If you want a more accurate estimate (for example, so that you know how many pages of results to show buttons for), you can use the checkatleast parameter to Enquire::get_mset(). This parameter specifies the minimum number of documents which the matcher will look at. By default we try to minimise this number, while still returning correct results, as that makes searches faster. Note that setting checkatleast will tend to make searches slower, and the higher you set it, the slower they will tend to be.

If there are fewer matches than checkatleast, then get_matches_estimated(), get_matches_lower_bound() and get_matches_upper_bound() will all return the same answer, which will be the exact number of matches.

So if you want to show 10 page buttons and have 10 hits per page, pass 101 as checkatleast (the extra 1 allows you to tell the difference between "exactly 100 hits" and "more than 100 hits").
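The arithmetic behind that figure can be sketched in C++ as follows. All three function names are hypothetical helpers rather than Xapian API, and the matches argument is assumed to be the value of get_matches_estimated() from a search run with the computed checkatleast.

```cpp
#include <algorithm>

// Hypothetical helpers (not part of Xapian).

// Value to pass as checkatleast: one more than the number of hits the page
// buttons can cover, so "exactly that many" and "more" can be told apart.
unsigned check_at_least_for(unsigned max_pages, unsigned hits_per_page) {
    return max_pages * hits_per_page + 1;
}

// Number of page buttons to show, capped at max_pages.
unsigned page_buttons(unsigned matches, unsigned max_pages, unsigned hits_per_page) {
    // Round up without risking overflow in the addition.
    const unsigned pages = matches / hits_per_page + (matches % hits_per_page != 0 ? 1 : 0);
    return std::min(pages, max_pages);
}

// True when there are more hits than the page buttons can cover.
bool has_more_pages(unsigned matches, unsigned max_pages, unsigned hits_per_page) {
    return matches > max_pages * hits_per_page;
}
```

With a Xapian::Enquire object named enquire (an assumed variable), the first page would then be fetched with enquire.get_mset(0, 10, check_at_least_for(10, 10)), and the returned MSet's get_matches_estimated() passed in as matches.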

If there are at least checkatleast matches, then get_matches_estimated() won't necessarily be exact, though because the matcher may have looked at more documents, it will usually be a better estimate. You can look at get_matches_lower_bound() and get_matches_upper_bound() to see how wrong it could be.
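One possible way to reflect this when displaying the count is to say "about" only when the bounds differ. The C++ sketch below assumes the three arguments come from the corresponding MSet methods; the function name describe_match_count and the wording of the label are illustrative choices, not anything Xapian or Omega provides.

```cpp
#include <string>

// Hypothetical helper (not part of Xapian): build a label for the match
// count, prefixed with "about" unless the bounds show it to be exact.
std::string describe_match_count(unsigned lower, unsigned estimated, unsigned upper) {
    // When the lower and upper bounds are equal, the estimate is exact.
    const bool exact = (lower == upper);
    const std::string prefix = exact ? "" : "about ";
    return prefix + std::to_string(estimated) + " matches";
}
```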

Note that this isn't unique to Xapian - most search engines estimate the number of matches for reasons of performance (e.g. Google usually says something like "Results 1 - 10 of about 781,000").

Omega's MINHITS CGI variable allows you to set checkatleast.


Last modified on 01/08/09 11:52:40