Ticket #124 (closed defect: released)
get_termfreq() and related should be removed from internal postlist classes
| Reported by: | richard | Owned by: | olly |
|---|---|---|---|
| Priority: | low | Milestone: | |
| Component: | Library API | Version: | SVN trunk |
| Severity: | trivial | Keywords: | |
| Cc: | | Blocked By: | |
| Operating System: | All | Blocking: | #120 |
Description
Postlists internally have get_termfreq() and get_collection_freq() methods, which allow the frequency information for the posting list to be read. These methods are in the postlist classes largely because the information is read from the postlist table in the quartz and flint backends, and is therefore available for free when the postlist is opened. However, in the API, the PostList class has had these methods commented out for a long while now, since the API exposes postlists as an iterator, and methods to get the length of the iterator don't really belong in the iterator itself.
Internally in the quartz and flint backends, the postlist's get_termfreq() method is called by the database class's get_termfreq() method, but as far as I can see it is never called by anything else. For the remote backend, I believe the methods are never called at all.
One way to tidy this situation up is to change the API such that opening a postlist from a database returns a "PostList" class (instead of an iterator), which would have a method to get an iterator over the postlist, and methods to get the frequency information.
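A minimal C++ sketch of what this first option might look like, assuming a toy in-memory posting list. `PostingHandle`, `Posting`, `docid` and `termcount` here are hypothetical illustrations, not Xapian's actual classes. The handle owns the frequency statistics, while iteration goes through an ordinary iterator that has no length methods. In this toy the term frequency is just the number of stored entries; a real backend would read it from its table.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not Xapian's real API.
using docid = unsigned;
using termcount = unsigned;

// Option 1: opening a posting list returns a handle object that carries the
// frequency information and hands out a plain iterator over the entries.
class PostingHandle {
  public:
    struct Posting {
        docid did;      // document containing the term
        termcount wdf;  // within-document frequency of the term
    };
    using const_iterator = std::vector<Posting>::const_iterator;

    PostingHandle(std::vector<Posting> postings, termcount collfreq)
        : postings_(std::move(postings)), collfreq_(collfreq) {}

    // Frequency information lives on the handle, not on the iterator.
    // (Toy assumption: termfreq is simply the number of stored postings.)
    std::size_t get_termfreq() const { return postings_.size(); }
    termcount get_collection_freq() const { return collfreq_; }

    // Iteration is delegated to an ordinary iterator with no length methods.
    const_iterator begin() const { return postings_.begin(); }
    const_iterator end() const { return postings_.end(); }

  private:
    std::vector<Posting> postings_;
    termcount collfreq_;
};
```

Usage: `PostingHandle h({{1, 2}, {5, 1}}, 3);` gives `h.get_termfreq() == 2` and `h.get_collection_freq() == 3`, and `for (const auto& p : h)` walks the two postings.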
The other way is to remove the get_termfreq() and get_collection_freq() methods entirely from the postlist classes, and to redirect the few uses of these methods internally.
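For comparison, a minimal C++ sketch of the second option, again with hypothetical names (`LeafPostList`, `TermStats`, `TinyDatabase`) rather than Xapian's real internals. The posting list only iterates; the database answers frequency queries from its own stored statistics instead of forwarding to a posting list's get_termfreq().

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not Xapian's real internals.
using docid = unsigned;
using termcount = unsigned;

// Option 2: the internal posting list has no frequency methods at all.
class LeafPostList {
  public:
    // Holds a reference, so the owning database must outlive this object.
    explicit LeafPostList(const std::vector<docid>& docs) : docs_(docs) {}
    bool at_end() const { return pos_ >= docs_.size(); }
    docid get_docid() const { return docs_[pos_]; }
    void next() { ++pos_; }

  private:
    const std::vector<docid>& docs_;
    std::size_t pos_ = 0;
};

struct TermStats {
    std::vector<docid> docs;  // sorted docids containing the term
    termcount collfreq;       // total occurrences across the collection
};

class TinyDatabase {
  public:
    void add_term(const std::string& term, std::vector<docid> docs,
                  termcount collfreq) {
        terms_[term] = TermStats{std::move(docs), collfreq};
    }

    // The database answers frequency queries directly from its own data.
    std::size_t get_termfreq(const std::string& term) const {
        auto it = terms_.find(term);
        return it == terms_.end() ? 0 : it->second.docs.size();
    }
    termcount get_collection_freq(const std::string& term) const {
        auto it = terms_.find(term);
        return it == terms_.end() ? 0 : it->second.collfreq;
    }

    // Opening a posting list yields an iterator-only object.
    LeafPostList open_post_list(const std::string& term) const {
        static const std::vector<docid> empty;
        auto it = terms_.find(term);
        return LeafPostList(it == terms_.end() ? empty : it->second.docs);
    }

  private:
    std::map<std::string, TermStats> terms_;
};
```

The postlist interface shrinks to pure iteration; the cost is that each database class needs its own route to the statistics rather than reusing whatever the opened postlist happened to read.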
I've titled this bug thinking that the second of these solutions is the best way forward, since it reduces code complexity, but I'm open to persuasion.
