Opened 20 years ago

Closed 20 years ago

Last modified 19 years ago

#31 closed defect (released)

estimates of matching documents ignore collapse operation

Reported by: Richard Boulton Owned by: Olly Betts
Priority: high Milestone:
Component: Library API Version: SVN trunk
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: Linux

Description

When performing a match using a collapse operation, the bounds and estimate of the number of matching documents, returned by get_matches_lower_bound(), get_matches_estimated() and get_matches_upper_bound(), are not adjusted to reflect the changed number of matching documents.

Strictly, this is the behaviour documented by the documentation comments for the functions: eg get_matches_lower_bound() is documented as "A lower bound on the number of documents in the database which have a weight greater than zero."

However, the functions currently take into account percentage cutoffs, so the behaviour is already not precisely as documented.

The get_matches_*() functions are considerably less useful if they don't take into account collapsing (in particular, they cannot easily be used for calculating how many links to pages of result documents to display), so I propose that the implementations should be updated to take into account whether collapsing is turned on, and the documentation should be amended to make it clear that collapsing and cutoffs will be taken into account.

I have a preliminary patch for this behaviour, which I will attach to this bugreport shortly.

Attachments (3)

diffs (12.4 KB ) - added by Richard Boulton 20 years ago.
Patch containing proposed fix
diffs2 (12.3 KB ) - added by Richard Boulton 20 years ago.
Updated patch (corrects mistake in estimation)
diffs3 (13.1 KB ) - added by Richard Boulton 20 years ago.
Argh. Another patch, this time thoroughly tested.

Download all attachments as: .zip

Change History (10)

comment:1 by Olly Betts, 20 years ago

Status: newassigned

Agreed. These functions should be defined in terms of the size of the "complete mset" - i.e. the mset you'd get if you asked for database.get_doccount() matches.

by Richard Boulton, 20 years ago

Attachment: diffs added

Patch containing proposed fix

by Richard Boulton, 20 years ago

Attachment: diffs2 added

Updated patch (corrects mistake in estimation)

by Richard Boulton, 20 years ago

Attachment: diffs3 added

Argh. Another patch, this time thoroughly tested.

comment:2 by Richard Boulton, 20 years ago

Attached a revised patch - takes account of the number of collapse keys seen, and fixes a bug involving the lower_bound() being overwritten when no cutoff is in use.

This patch is quite thoroughly tested, and seems to give good results.

comment:3 by Olly Betts, 20 years ago

I've not forgotten about this patch, but I'm failing to find time to look at it right now...

The interaction between percentage cutoff and collapse keys looks subtle, so I want to be sure that's correct, as it doesn't seem to be covered by the tests.

Incidentally, what happend to test_collapsekey2?!?

comment:4 by Richard Boulton, 20 years ago

Fair enough - I'll try and put together a test of combining percentage cutoff and collapse keys. I'll have to spend some time working out what is likely to go wrong - it'll probably need a few test cases to test it thoroughly.

As far as I can tell, nothing happened to test_collapsekey2. My patch doesn't change it.

(I don't really have time to look at this in detail now, either - should do in a few days, though. Feel free to prod if nothing is forthcoming soon.)

comment:5 by Olly Betts, 20 years ago

Re test_collapsekey2, I'd just looked at the patch which seemed to add 3 and 4 after 1. But looking more closely, 2 is muscat36 specific so is listed elsewhere...

comment:6 by Olly Betts, 20 years ago

Resolution: fixed
Status: assignedclosed

I've worked through refactoring the code to produce your changes, and I believe I better understand the patch now, and that my worries about the interaction between percentage cutoff and collapsing are unfounded.

Therefore I've applied the patch.

If you do get a chance, it'd still be good to have a test case which includes both. Coverage and all that...

comment:7 by Olly Betts, 20 years ago

Operating System: Linux
Resolution: fixedreleased

Fixed in 0.8.1

Note: See TracTickets for help on using tickets.