How can I make MSet::get_matches_estimated() more accurate?
By default, Xapian tries to minimise the amount of work done to produce the exact
search results asked for, and so MSet::get_matches_estimated() may be rather inaccurate.
After all, it is just an estimate!
You can see how inaccurate it could be by looking at MSet::get_matches_lower_bound()
and MSet::get_matches_upper_bound(): the true answer must be between these two bounds.
If you want to make the estimate more accurate (one particular reason for wanting to do so is
so you know how many pages of results to show buttons for), you can use the checkatleast
parameter to Enquire::get_mset(). This parameter specifies the minimum number of
documents which the matcher will look at. By default we try to minimise this number, while
still returning correct results, as that makes searches faster. Note that setting
checkatleast will tend to make searches slower, and the higher you set it, the worse
this effect will be.
If there are fewer matches than checkatleast, then get_matches_estimated(),
get_matches_lower_bound() and get_matches_upper_bound() will all
return the same answer, which will be the exact number of matches.
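One way to use this when displaying results is sketched below. It is only an illustration: describe_match_count is a hypothetical helper, not part of Xapian, and its three arguments are assumed to be the values returned by get_matches_lower_bound(), get_matches_estimated() and get_matches_upper_bound() on the same MSet. When the two bounds agree the count is known exactly, otherwise it is reported as approximate.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Xapian function). The arguments are assumed to
// come from MSet::get_matches_lower_bound(), MSet::get_matches_estimated()
// and MSet::get_matches_upper_bound() respectively.
std::string describe_match_count(unsigned lower, unsigned estimated, unsigned upper) {
    // The true answer lies between the bounds, so lower must not exceed upper.
    assert(lower <= upper);
    if (lower == upper) {
        // Bounds agree, so the estimate is the exact number of matches.
        return std::to_string(estimated) + " results";
    }
    // Bounds differ, so the estimate is only approximate.
    return "about " + std::to_string(estimated) + " results";
}
```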
So if you want to show 10 page buttons and have 10 hits per page,
pass 101 as checkatleast (the extra 1 allows you to tell the
difference between "exactly 100 hits" and "more than 100 hits").
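The arithmetic above can be sketched as follows. Both functions are hypothetical helpers written for this example, not Xapian calls: the first computes a value you might pass as checkatleast to Enquire::get_mset(), and the second turns the match count reported by the MSet into a number of page buttons, capped at the maximum you want to show.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: a checkatleast value which lets you distinguish
// "exactly max_pages * hits_per_page hits" from "more than that".
unsigned check_at_least_for(unsigned max_pages, unsigned hits_per_page) {
    return max_pages * hits_per_page + 1;
}

// Hypothetical helper: number of page buttons to show for a given match
// count (assumed to come from MSet::get_matches_estimated()).
unsigned page_buttons(unsigned matches, unsigned hits_per_page, unsigned max_pages) {
    assert(hits_per_page > 0);
    // Round up so that a partly filled last page still gets a button.
    unsigned pages = (matches + hits_per_page - 1) / hits_per_page;
    return std::min(pages, max_pages);
}
```

With 10 buttons and 10 hits per page, check_at_least_for(10, 10) gives 101, and a reported count of 37 gives 4 buttons.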
If there are more matches than checkatleast, then get_matches_estimated() won't necessarily
be exact, though because the matcher may have looked at more documents,
it will usually be a better estimate. You can look at get_matches_lower_bound()
and get_matches_upper_bound() to see how wrong it could be.
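As a rough illustration of "how wrong it could be", the sketch below computes the largest possible distance between the estimate and the true answer. worst_case_error is a hypothetical helper, not part of Xapian, and it assumes its arguments are the lower bound, the estimate and the upper bound from the same MSet.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not a Xapian function): the furthest the true number
// of matches can be from the estimate, given the two bounds.
unsigned worst_case_error(unsigned lower, unsigned estimated, unsigned upper) {
    // Assumes the estimate lies between the bounds.
    assert(lower <= estimated && estimated <= upper);
    return std::max(estimated - lower, upper - estimated);
}
```

For example, with a lower bound of 100, an estimate of 250 and an upper bound of 900, the estimate could be off by as much as 650.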
Note that this isn't unique to Xapian - most search engines estimate the number of matches for reasons of performance (e.g. Google usually says something like "Results 1 - 10 of about 781,000").
Omega's MINHITS CGI variable allows you to set checkatleast.