#124 closed defect (released)
get_termfreq() and related should be removed from internal postlist classes
Reported by: | Richard Boulton | Owned by: | Olly Betts |
---|---|---|---|
Priority: | low | Milestone: | |
Component: | Library API | Version: | SVN trunk |
Severity: | trivial | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | #120 | Operating System: | All |
Description
Postlists internally have a get_termfreq() and get_collection_freq() methods, which allow the frequency information for the posting list to be read. These methods are in the postlist classes largely because the information is read from the postlist table in the quartz and flint backends, and therefore it's available for free when the postlist is opened. However, in the API, the PostList class has had these methods commented out for a long while now, since the API exposes postlists as an iterator, and methods to get the length of the iterator don't really belong in the iterators.
Internally in the quartz and flint backends, the get_termfreq() method is called by the database classes get_termfreq() method, but get_termfreq() method can never be called by anything else, as far as I can see. For the remote backend, I believe the methods are never called at all.
One way to tidy this situation up is to change the API such that opening a postlist from a database returns a "PostList" class (instead of an iterator), which would have a method to get an iterator over the postlist, and methods to get the frequency information.
The other way is to remote the get_termfreq() and get_collection_freq() methods entirely from the postlist classes, and to redirect the few uses of these methods internally.
I've titled this bug thinking that the second of these solutions is the best way forward, since it reduces code complexity, but I'm open to persuasion.
Change History (5)
comment:1 by , 18 years ago
Blocking: | 120 added |
---|
comment:2 by , 18 years ago
Owner: | changed from | to
---|---|
rep_platform: | PC → All |
Severity: | minor → trivial |
I'd agree if we're talking about changing something like:
FlintPostList pl(NULL, &postlist_table, NULL, tname); RETURN(pl.get_termfreq());
to something like:
RETURN(postlist_table.get_termfreq(tname));
I don't see a particularly compelling reason why these methods shouldn't exist on PostList per se though (the external and internal APIs needn't necessarily be aligned), so if they're actually used in sensible ways anywhere I'd have reservations about killing them.
But I suspect you are right about where these are used - these cases are probably converted from using the lexicon, and I can't think off-hand where else we'd use this information.
Reassigning to "nobody@…" - I've created this address as a bugzilla user for "unassigned" bugs since it's currently hard to distinguish "actively being fixed" vs "this just ended up assigned to me but I've no plans to work on it in the for the foreseeable future".
comment:3 by , 18 years ago
Owner: | changed from | to
---|
PostList::get_termfreq() is called internally - see common/leafpostlist.h:
Xapian::doccount get_termfreq_max() const { return get_termfreq(); } Xapian::doccount get_termfreq_min() const { return get_termfreq(); } Xapian::doccount get_termfreq_est() const { return get_termfreq(); }
But there's certainly scope for improving the code here - I'm taking a look.
comment:5 by , 18 years ago
Operating System: | → All |
---|---|
Resolution: | fixed → released |
Fix is in upcoming 1.0.2 release.
Marking as blocking 120, since it would be nice to get this tidied up in the 1.0 series.