Opened 16 years ago
Closed 10 years ago
#344 closed defect (fixed)
Allow calculation of percentages to be disabled
Reported by: | Richard Boulton | Owned by: | Olly Betts |
---|---|---|---|
Priority: | low | Milestone: | 1.1.2 |
Component: | Library API | Version: | SVN trunk |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Currently, all calls to get_mset() calculate percentage weights for each document. This has a measurable overhead, and percentages are often not needed. It would therefore be nice to be able to disable percentage calculation for a match (and possibly even to have it disabled by default). Alternatively, if we can reduce the overhead to a very small amount (e.g. less than 1% of search time), it would probably be reasonable to keep calculating percentages in all cases, for the convenience of not needing to enable them before searching.
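For context, here is a minimal sketch of the kind of search this affects (the database path is just a placeholder); the percentages in question are what MSetIterator::get_percent() returns for each match:

```cpp
#include <xapian.h>
#include <iostream>

int main() {
    // Placeholder path; any existing Xapian database will do.
    Xapian::Database db("/path/to/db");
    Xapian::Enquire enquire(db);
    enquire.set_query(Xapian::Query(Xapian::Query::OP_OR,
                                    Xapian::Query("apple"),
                                    Xapian::Query("orange")));

    // get_mset() currently always computes the data needed to report
    // percentages, even if the caller never asks for them.
    Xapian::MSet mset = enquire.get_mset(0, 10);
    for (Xapian::MSetIterator it = mset.begin(); it != mset.end(); ++it) {
        // get_percent() is the percentage this ticket is about;
        // get_weight() is the raw weight, which is always available.
        std::cout << *it << ": " << it.get_percent() << "% "
                  << "(weight " << it.get_weight() << ")\n";
    }
}
```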
I've just been examining the performance of a set of 10-term OR searches. According to kcachegrind, around 5.5% of the CPU time is spent at the end of get_mset() reading the termlist of the top document; this is done only to check which terms are present in that document, in order to calculate its percentage. So the current overhead for these searches is at least 5.5% of the search time (when we're not I/O bound).
There is a patch attached to ticket #216 (http://trac.xapian.org/attachment/ticket/216/calcpercent.patch) which adds this feature (though it may need updating to match SVN trunk). However, it seems to have been ignored or forgotten, so I think it deserves a ticket of its own to discuss it.
Change History (8)
comment:1 by , 16 years ago
Status: | new → assigned |
---|
comment:2 by , 15 years ago
Milestone: | 1.1.1 → 1.1.3 |
---|
comment:5 by , 15 years ago
Milestone: | 1.1.7 → 1.2.0 |
---|
API addition, so bumping to stay on track for 1.2.0.
comment:6 by , 12 years ago
Component: | Other → Library API |
---|---|
Milestone: | 1.2.x → 1.3.x |
This needs re-profiling to see what the overhead of calculating percentages now actually is.
comment:7 by , 11 years ago
Milestone: | 1.3.x → 1.3.3 |
---|
comment:8 by , 10 years ago
Milestone: | 1.3.3 → 1.1.2 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
The way we calculate percentages has changed a lot since this ticket was filed: now, each time the highest-ranked document changes, we count how many leaf subqueries match it, and at the end we divide that count by the total number of leaf subqueries to get a percentage scale factor.
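As a rough illustration of that scale-factor idea (this is a simplified sketch, not the actual matcher code; the names matching_leaves, total_leaves and max_weight are made up for this example):

```cpp
// Simplified sketch of the scale-factor idea described above (not the
// actual Xapian matcher code).  matching_leaves is the number of leaf
// subqueries matching the highest-weighted document seen so far,
// total_leaves is the total number of leaf subqueries, and max_weight
// is that document's weight.
double percent_scale_factor(unsigned matching_leaves,
                            unsigned total_leaves,
                            double max_weight) {
    if (total_leaves == 0 || max_weight <= 0.0) return 0.0;
    // Fraction of leaf subqueries matching the top document, divided by
    // the top weight so that (weight * scale * 100) maps the top
    // document to at most 100%.
    return (double(matching_leaves) / total_leaves) / max_weight;
}

// Each document's percentage would then be roughly:
//   int percent = int(doc_weight * scale * 100);
```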
I've just done a quick profile of current git master with callgrind, and these recursive calls account for about 0.016% of the execution time, which I think is entirely acceptable, so I'm going to close this ticket.
This new approach was implemented back in 1.1.2 (though the query internals were completely rewritten in 1.3.0, which may have changed the exact fraction of time this takes), so I'm marking this as fixed in that version.
Sorry, I was so eager at having solved the main issue in #216 that I overlooked that patch. Well spotted.