Opened 15 years ago

Closed 9 years ago

#344 closed defect (fixed)

Allow calculation of percentages to be disabled

Reported by: Richard Boulton
Owned by: Olly Betts
Priority: low
Milestone: 1.1.2
Component: Library API
Version: SVN trunk
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

Currently, all calls to get_mset() calculate percentage weights for each document. This has a measurable overhead, and percentages are often not needed. Therefore, it would be nice to be able to disable calculation of percentages for a match (and possibly even for the calculation to be disabled by default). Alternatively, if we can reduce the overhead to a very small amount (e.g. less than 1%), it would probably be reasonable to continue calculating percentages in all cases, for the added convenience of not needing to enable them before searches.

I've just been examining the performance of a set of 10-term OR searches. According to kcachegrind, around 5.5% of the CPU time is spent at the end of get_mset() reading the termlist of the top document; this is done only to check which terms are present in the top document, in order to calculate the percentage for that document. Therefore, the current overhead for these searches is at least 5.5% of the search time (when we're not IO bound).
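As a rough illustration of why this step costs time, the sketch below walks the full termlist of the top document just to see which query terms occur in it. It is not Xapian's code: the function name and the plain string terms are hypothetical, and it reports an unweighted fraction, whereas the real calculation may weight terms differently.

```python
from typing import Iterable, Set


def fraction_of_query_terms_present(
    top_doc_termlist: Iterable[str], query_terms: Set[str]
) -> float:
    """Hypothetical helper: fraction of query terms found in the top document.

    The whole termlist is scanned, so the cost grows with the number of
    terms in the document rather than with the size of the query.
    """
    if not query_terms:
        return 0.0
    seen = set()
    for term in top_doc_termlist:
        if term in query_terms:
            seen.add(term)
    return len(seen) / len(query_terms)
```

For a long document and a short query, almost all of the terms visited are irrelevant to the result, which is consistent with this step showing up in a profile.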

There is a patch attached to ticket #216 (http://trac.xapian.org/attachment/ticket/216/calcpercent.patch) which adds this feature (though it may need updating to match SVN trunk). However, it seemed to have been ignored/forgotten, so I think it deserves a ticket to discuss it.

Change History (8)

comment:1 by Olly Betts, 15 years ago

Status: new → assigned

Sorry, I was so eager at having solved the main issue in #216 that I overlooked that patch. Well spotted.

comment:2 by Olly Betts, 15 years ago

Milestone: 1.1.1 → 1.1.3

comment:3 by Olly Betts, 15 years ago

Priority: normal → low

Could be added in 1.2.x in a compatible way.

comment:4 by Olly Betts, 15 years ago

Milestone: 1.1.3 → 1.1.7

Moving to 1.1.7 to balance the milestones.

comment:5 by Olly Betts, 15 years ago

Milestone: 1.1.7 → 1.2.0

API addition, so bumping to stay on track for 1.2.0.

comment:6 by Olly Betts, 11 years ago

Component: Other → Library API
Milestone: 1.2.x → 1.3.x

This needs re-profiling to see what the overhead of calculating percentages now actually is.

comment:7 by Olly Betts, 10 years ago

Milestone: 1.3.x → 1.3.3

comment:8 by Olly Betts, 9 years ago

Milestone: 1.3.3 → 1.1.2
Resolution: fixed
Status: assigned → closed

The way we calculate percentages has changed a lot since this ticket was filed - now each time the highest ranked document changes, we count how many leaf subqueries match, and then at the end we divide this by the total number of leaf subqueries to get a percentage scale factor.
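The approach described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the library's implementation: the `Query` class, the function names, and the use of a set of terms to decide whether a leaf matches a document are all hypothetical stand-ins for the real query tree and postlists.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass
class Query:
    """Hypothetical query tree node: a leaf holds a term, a branch holds children."""
    term: str = ""
    children: List["Query"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def count_leaves(q: Query) -> int:
    """Total number of leaf subqueries, counted recursively."""
    if q.is_leaf():
        return 1
    return sum(count_leaves(c) for c in q.children)


def count_matching_leaves(q: Query, doc_terms: Set[str]) -> int:
    """Number of leaf subqueries matching a document (assumed: term membership)."""
    if q.is_leaf():
        return 1 if q.term in doc_terms else 0
    return sum(count_matching_leaves(c, doc_terms) for c in q.children)


def percent_scale_factor(
    query: Query, candidates: Iterable[Tuple[float, Set[str]]]
) -> float:
    """Recount matching leaves whenever the top document changes, then
    divide by the total number of leaves once at the end."""
    total = count_leaves(query)
    best_weight = None
    matching = 0
    for weight, doc_terms in candidates:
        if best_weight is None or weight > best_weight:
            best_weight = weight
            matching = count_matching_leaves(query, doc_terms)
    return matching / total if total else 0.0
```

In this model the recursive count only runs when a new highest-ranked document appears, and its cost depends on the size of the query tree rather than on the length of any document.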

I've just done a quick profile of current git master with callgrind, and I get that these recursive calls amount to 0.016% of the execution time, which I think is completely acceptable, so I'm going to close this ticket.

This new approach was implemented back in 1.1.2 (though the query internals were completely rewritten in 1.3.0, which may have changed the exact fraction of time this takes) so marking as fixed by that version.
