Opened 17 years ago

Closed 17 years ago

Last modified 17 years ago

#124 closed defect (released)

get_termfreq() and related should be removed from internal postlist classes

Reported by: Richard Boulton Owned by: Olly Betts
Priority: low Milestone:
Component: Library API Version: SVN trunk
Severity: trivial Keywords:
Cc: Blocked By:
Blocking: #120 Operating System: All

Description

Postlists internally have a get_termfreq() and get_collection_freq() methods, which allow the frequency information for the posting list to be read. These methods are in the postlist classes largely because the information is read from the postlist table in the quartz and flint backends, and therefore it's available for free when the postlist is opened. However, in the API, the PostList class has had these methods commented out for a long while now, since the API exposes postlists as an iterator, and methods to get the length of the iterator don't really belong in the iterators.

Internally in the quartz and flint backends, the get_termfreq() method is called by the database classes get_termfreq() method, but get_termfreq() method can never be called by anything else, as far as I can see. For the remote backend, I believe the methods are never called at all.

One way to tidy this situation up is to change the API such that opening a postlist from a database returns a "PostList" class (instead of an iterator), which would have a method to get an iterator over the postlist, and methods to get the frequency information.

The other way is to remote the get_termfreq() and get_collection_freq() methods entirely from the postlist classes, and to redirect the few uses of these methods internally.

I've titled this bug thinking that the second of these solutions is the best way forward, since it reduces code complexity, but I'm open to persuasion.

Change History (5)

comment:1 by Richard Boulton, 17 years ago

Blocking: 120 added

Marking as blocking 120, since it would be nice to get this tidied up in the 1.0 series.

comment:2 by Olly Betts, 17 years ago

Owner: changed from Olly Betts to Not currently assigned
rep_platform: PCAll
Severity: minortrivial

I'd agree if we're talking about changing something like:

FlintPostList pl(NULL, &postlist_table, NULL, tname); RETURN(pl.get_termfreq());

to something like:

RETURN(postlist_table.get_termfreq(tname));

I don't see a particularly compelling reason why these methods shouldn't exist on PostList per se though (the external and internal APIs needn't necessarily be aligned), so if they're actually used in sensible ways anywhere I'd have reservations about killing them.

But I suspect you are right about where these are used - these cases are probably converted from using the lexicon, and I can't think off-hand where else we'd use this information.

Reassigning to "nobody@…" - I've created this address as a bugzilla user for "unassigned" bugs since it's currently hard to distinguish "actively being fixed" vs "this just ended up assigned to me but I've no plans to work on it in the for the foreseeable future".

comment:3 by Olly Betts, 17 years ago

Owner: changed from Not currently assigned to Olly Betts

PostList::get_termfreq() is called internally - see common/leafpostlist.h:

Xapian::doccount get_termfreq_max() const { return get_termfreq(); } Xapian::doccount get_termfreq_min() const { return get_termfreq(); } Xapian::doccount get_termfreq_est() const { return get_termfreq(); }

But there's certainly scope for improving the code here - I'm taking a look.

comment:4 by Olly Betts, 17 years ago

Resolution: fixed
Status: newclosed

Fixed in SVN HEAD.

comment:5 by Richard Boulton, 17 years ago

Operating System: All
Resolution: fixedreleased

Fix is in upcoming 1.0.2 release.

Note: See TracTickets for help on using tickets.