#31 closed defect (released)
estimates of matching documents ignore collapse operation
Reported by: | Richard Boulton | Owned by: | Olly Betts |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | Library API | Version: | SVN trunk |
Severity: | normal | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | Operating System: | Linux |
Description
When performing a match using a collapse operation, the bounds and estimate of the number of matching documents, returned by get_matches_lower_bound(), get_matches_estimated() and get_matches_upper_bound(), are not adjusted to reflect the changed number of matching documents.
Strictly, this is the behaviour documented by the documentation comments for the functions: eg get_matches_lower_bound() is documented as "A lower bound on the number of documents in the database which have a weight greater than zero."
However, the functions currently take into account percentage cutoffs, so the behaviour is already not precisely as documented.
The get_matches_*() functions are considerably less useful if they don't take into account collapsing (in particular, they cannot easily be used for calculating how many links to pages of result documents to display), so I propose that the implementations should be updated to take into account whether collapsing is turned on, and the documentation should be amended to make it clear that collapsing and cutoffs will be taken into account.
I have a preliminary patch for this behaviour, which I will attach to this bugreport shortly.
Attachments (3)
Change History (10)
comment:1 by , 20 years ago
Status: | new → assigned |
---|
comment:2 by , 20 years ago
Attached a revised patch - takes account of the number of collapse keys seen, and fixes a bug involving the lower_bound() being overwritten when no cutoff is in use.
This patch is quite thoroughly tested, and seems to give good results.
comment:3 by , 20 years ago
I've not forgotten about this patch, but I'm failing to find time to look at it right now...
The interaction between percentage cutoff and collapse keys looks subtle, so I want to be sure that's correct, as it doesn't seem to be covered by the tests.
Incidentally, what happend to test_collapsekey2?!?
comment:4 by , 20 years ago
Fair enough - I'll try and put together a test of combining percentage cutoff and collapse keys. I'll have to spend some time working out what is likely to go wrong - it'll probably need a few test cases to test it thoroughly.
As far as I can tell, nothing happened to test_collapsekey2. My patch doesn't change it.
(I don't really have time to look at this in detail now, either - should do in a few days, though. Feel free to prod if nothing is forthcoming soon.)
comment:5 by , 20 years ago
Re test_collapsekey2, I'd just looked at the patch which seemed to add 3 and 4 after 1. But looking more closely, 2 is muscat36 specific so is listed elsewhere...
comment:6 by , 20 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I've worked through refactoring the code to produce your changes, and I believe I better understand the patch now, and that my worries about the interaction between percentage cutoff and collapsing are unfounded.
Therefore I've applied the patch.
If you do get a chance, it'd still be good to have a test case which includes both. Coverage and all that...
Agreed. These functions should be defined in terms of the size of the "complete mset" - i.e. the mset you'd get if you asked for database.get_doccount() matches.