Ticket #124 (closed defect: released)

Opened 21 months ago

Last modified 19 months ago

get_termfreq() and related should be removed from internal postlist classes

Reported by: richard Owned by: olly
Priority: low Milestone:
Component: Library API Version: SVN trunk
Severity: trivial Keywords:
Cc: Blocked By:
Operating System: All Blocking: #120

Description

Postlists internally have a get_termfreq() and get_collection_freq() methods, which allow the frequency information for the posting list to be read. These methods are in the postlist classes largely because the information is read from the postlist table in the quartz and flint backends, and therefore it's available for free when the postlist is opened. However, in the API, the PostList? class has had these methods commented out for a long while now, since the API exposes postlists as an iterator, and methods to get the length of the iterator don't really belong in the iterators.

Internally in the quartz and flint backends, the get_termfreq() method is called by the database classes get_termfreq() method, but get_termfreq() method can never be called by anything else, as far as I can see. For the remote backend, I believe the methods are never called at all.

One way to tidy this situation up is to change the API such that opening a postlist from a database returns a "PostList?" class (instead of an iterator), which would have a method to get an iterator over the postlist, and methods to get the frequency information.

The other way is to remote the get_termfreq() and get_collection_freq() methods entirely from the postlist classes, and to redirect the few uses of these methods internally.

I've titled this bug thinking that the second of these solutions is the best way forward, since it reduces code complexity, but I'm open to persuasion.

Change History

Changed 21 months ago by richard

  • blocking set to 120

Marking as blocking 120, since it would be nice to get this tidied up in the 1.0 series.

Changed 21 months ago by olly

  • owner changed from olly to nobody
  • rep_platform changed from PC to All
  • severity changed from minor to trivial

I'd agree if we're talking about changing something like:

FlintPostList? pl(NULL, &postlist_table, NULL, tname); RETURN(pl.get_termfreq());

to something like:

RETURN(postlist_table.get_termfreq(tname));

I don't see a particularly compelling reason why these methods shouldn't exist on PostList? per se though (the external and internal APIs needn't necessarily be aligned), so if they're actually used in sensible ways anywhere I'd have reservations about killing them.

But I suspect you are right about where these are used - these cases are probably converted from using the lexicon, and I can't think off-hand where else we'd use this information.

Reassigning to "nobody@…" - I've created this address as a bugzilla user for "unassigned" bugs since it's currently hard to distinguish "actively being fixed" vs "this just ended up assigned to me but I've no plans to work on it in the for the foreseeable future".

Changed 19 months ago by olly

  • owner changed from nobody to olly

PostList::get_termfreq() is called internally - see common/leafpostlist.h:

Xapian::doccount get_termfreq_max() const { return get_termfreq(); } Xapian::doccount get_termfreq_min() const { return get_termfreq(); } Xapian::doccount get_termfreq_est() const { return get_termfreq(); }

But there's certainly scope for improving the code here - I'm taking a look.

Changed 19 months ago by olly

  • status changed from new to closed
  • resolution set to fixed

Fixed in SVN HEAD.

Changed 19 months ago by richard

  • resolution changed from fixed to released

Fix is in upcoming 1.0.2 release.

Changed 19 months ago by trac

  • platform set to All
Note: See TracTickets for help on using tickets.