How can I make MSet::get_matches_estimated() more accurate?
By default, Xapian tries to minimise the amount of work done to produce the exact
search results asked for, and so
MSet::get_matches_estimated() may be rather inaccurate.
After all, it is just an estimate!
You can see how inaccurate it could be by looking at
MSet::get_matches_lower_bound() and
MSet::get_matches_upper_bound() - the true
answer must be between these two bounds.
If you want to make the estimate more accurate (one particular reason for wanting to do so is
so you know how many pages of results to show buttons for), you can use the
checkatleast parameter to
Enquire::get_mset(). This parameter specifies the minimum number of
documents which the matcher will look at. By default we try to minimise this number, while
still returning correct results, as that makes searches faster. Note that setting
checkatleast will tend to make searches slower, and the higher you set it, the worse
this effect will be.
If there are fewer matches than checkatleast, then
get_matches_lower_bound(),
get_matches_estimated() and
get_matches_upper_bound() will all
return the same answer, which will be the exact number of matches.
So if you want to show 10 page buttons and have 10 hits per page,
pass 101 as
checkatleast (the extra 1 allows you to tell the
difference between "exactly 100 hits" and "more than 100 hits").
If there are more matches than checkatleast, then
get_matches_estimated() won't necessarily
be exact, though because the matcher may have looked at more documents,
it will usually be a better estimate. You can look at
get_matches_lower_bound() and
get_matches_upper_bound() to see how wrong it could be.
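The page-button arithmetic described above can be sketched in a few lines of C++. This is only an illustration: the helper names check_at_least(), page_buttons() and has_more() are hypothetical and are not part of the Xapian API, and hits_per_page is assumed to be greater than zero.

```cpp
#include <algorithm>

// Hypothetical helper (not part of Xapian): the value to pass as
// checkatleast so that up to max_pages full pages can be counted exactly.
// The extra 1 distinguishes "exactly N hits" from "more than N hits".
unsigned check_at_least(unsigned max_pages, unsigned hits_per_page) {
    return max_pages * hits_per_page + 1;
}

// Hypothetical helper: number of page buttons to show for a given match
// count, capped at max_pages. Assumes hits_per_page > 0.
unsigned page_buttons(unsigned matches, unsigned max_pages,
                      unsigned hits_per_page) {
    // Round up so a partly filled last page still gets a button.
    unsigned pages = (matches + hits_per_page - 1) / hits_per_page;
    return std::min(pages, max_pages);
}

// Hypothetical helper: true if there are more matches than the buttons
// cover, e.g. to decide whether to show a "more results" link.
bool has_more(unsigned matches, unsigned max_pages, unsigned hits_per_page) {
    return matches > max_pages * hits_per_page;
}
```

With 10 buttons and 10 hits per page, check_at_least(10, 10) gives 101, which would be passed as the third argument of Enquire::get_mset(first, maxitems, checkatleast). The value returned by get_matches_estimated() could then be fed to page_buttons() and has_more().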
Note that this isn't unique to Xapian - most search engines estimate the number of matches for reasons of performance (e.g. Google usually says something like "Results 1 - 10 of about 781,000").
If you are using Omega, the MINHITS CGI variable allows you to set
checkatleast.