Opened 17 years ago
Closed 16 years ago
#213 closed defect (fixed)
Expose statistics to user defined Xapian::Weight subclasses
Reported by: | Richard Boulton | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.1.0 |
Component: | Library API | Version: | SVN trunk |
Severity: | normal | Keywords: | |
Cc: | Olly Betts | Blocked By: | |
Blocking: | Operating System: | All |
Description (last modified by )
Currently, The Xapian::Weight::Internal class (which is, as of last night, the class holding the statistics for the whole collection used by the weight objects) is not publically visible. This means that it would be impossible, for example, for a user to write a weighting class equivalent to, say, the BM25Weight class, using the public API, because the statistics aren't available.
After cleaning up the weighting calculation system, I believe the Xapian::Weight::Internal class is now nearly clean enough that it could reasonably be made public, allowing custom weighting classes access to all the statistics currently available.
We might want to make the termfreq and reltermfreq members private, since they're likely to be accessed mainly through the accessor functions anyway. Also we might want to combine them into a single map with entries holding both the termfreq and the reltermfreq, since it's usual to want to access both the termfreq and the reltermfreq for a particular term at the same time.
Also, we might want to call the class Xapian::Stats, instead of Xapian::Weight::Internal, to reflect the Stats being part of the public API, but this would require an ABI change, so would have to wait for 1.1.0. (We could keep the API compatible by making Xapian::Weight::Internal a typedef for Xapian::Stats, I think; currently Stats (with no namespace) is a typedef for Xapian::Weight::Internal).
Change History (13)
comment:1 by , 17 years ago
Blocking: | 120 added |
---|
comment:2 by , 17 years ago
I don't think we need a new public API class for this - we can just add five non-virtual protected methods to Xapian::Weight - for example:
Xapian::doccount Xapian::Weight::get_termfreq() const {
return internal->get_termfreq(tname);
}
Hmm, or perhaps we should just read these values from the Weight::Internal object once when create() is called - then we wouldn't need to store "internal" or "tname", but instead we'd need to store:
average_length termfreq collection_size rset_size rel_termfreq
But that's mostly orthogonal - it's cleaner to wrap these in getter methods, and then what we store isn't important, except that we can't change what we store until 1.1.0.
comment:3 by , 17 years ago
Owner: | changed from | to
---|
Not having to store tname in the Weight class would make various things cleaner.
Currently, we have to pass "" as the tname when making the extra-weight stuff,
and we special-case this in Xapian::Weight::Internal::get_reltermfreq(), which isn't really very nice.
Additionally, to implement the weighting stuff for OP_SYNONYM queries, I'm having to invent terms (and carefully check they're not in the query) so that the weighting object for the SynonymPostlist can get the right statistics. Which is terribly unpleasant.
I think adding members to Xapian::Weight is better than adding accessor methods: because Xapian::Weight::Internal isn't public, the accessor methods couldn't be inline, so a Weight subclass might well want to cache some of them anyway, if it wants to reuse them.
There's no particular reason that we couldn't add members holding the statistics right now, I think; we just can't get rid of tname and internal until 1.1.0 (at the earliest).
comment:4 by , 17 years ago
Cc: | added |
---|---|
Status: | new → assigned |
Actually, there is a reason that we can't add member variables to Xapian::Weight right now - it'll change the size of the class and break ABI compatibility.
What we _could_ do is change the Xapian::Weight::Internal class _again_ to something which just contains the relevant stats for the weight object (ie, the same global statistics as the current Weight::Internal class, but only the termfreq and reltermfreq for the term which the Weight class is for). This would allow me to do the synonym stuff more easily, and be a step towards making the relevant statistics publically available (which we could then do in 1.1.0 by replacing the internal class with members).
comment:5 by , 17 years ago
Just to note - I've committed the "change the Xapian::Weight::Internal class _again_ to something which just contains the relevant stats for the weight object" bit in SVN HEAD.
comment:6 by , 17 years ago
Operating System: | → All |
---|---|
Summary: | Make Xapian::Weight::Internal part of the public API → Expose statistics to user defined Xapian::Weight subclasses |
Re-titling to reflect what we want to achieve, rather than a particular approach.
comment:7 by , 17 years ago
Once this is done, we should add a better example of a user-defined weighting scheme to docs/sorting.rst.
comment:9 by , 17 years ago
Description: | modified (diff) |
---|---|
Milestone: | → 1.1 |
comment:10 by , 17 years ago
Blocking: | 120 removed |
---|
(In #120) Remove the unfixed dependencies so we can close this bug - they're all marked for the 1.1.0 milestone.
comment:12 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
Marking as owned by Olly, since I believe he has a partially written solution for this.
comment:13 by , 16 years ago
Description: | modified (diff) |
---|---|
Status: | new → assigned |
Yes - I was working on it yesterday in fact (while I waited for my DSL to be fixed).
comment:14 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Fixed in trunk in r12151.
Marking for 1.0.x, since making the interface public could be done without changing the ABI. I don't have any pressing need for this though (it just seems useful), so I'm not going to complain if this is pushed back to 1.1.0, when we'd be able to make it public with a more sensible name.