Opened 16 years ago
Closed 15 years ago
#375 closed enhancement (fixed)
Merge postingsources branch
Reported by: | Olly Betts | Owned by: | Richard Boulton |
---|---|---|---|
Priority: | high | Milestone: | 1.1.3 |
Component: | Library API | Version: | SVN trunk |
Severity: | normal | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | Operating System: | All |
Description
Richard said he wanted to merge this branch next, so creating a ticket to track stuff about it.
Change History (11)
comment:1 by , 15 years ago
comment:2 by , 15 years ago
Priority: | normal → high |
---|
Should be an easy merge, and one fewer branch to maintain would be nice, but it could as easily be merged in 1.2.x. Priority high.
comment:4 by , 15 years ago
New headers merged with postingsource.h. Documentation comments added for LimitedValueWeightPostingSource (which also mention that the class is only really useful with older backends, and will probably be deprecated when they go away). Added FIXME to doxy2swig.py
So, remaining issues are test coverage. I'll implement some for DecreasingValueWeightPostingSource as suggested above, and I'll also implement some cursory tests for LimitedValueWeightPostingSource (but not worry too much since it is going away soon).
Then, it'll be ready to merge.
comment:5 by , 15 years ago
Cool - I'll take a look at the updated diff tomorrow, but it sounds good.
Flint will be deprecated in 1.2.0, so unless I'm misunderstanding, you seem to be suggesting we should add a new class which will be deprecated in its first stable release...
comment:6 by , 15 years ago
I suppose I am... My thinking is that I'm currently not convinced that we're going to be able to deprecate flint at that stage, because of ticket #326 (slow doclen access). I'm also concerned that the same problem that manifests there (slow skipping through postinglist chunks) will mean that access to the new-style value slots is also slow - I've not been able to measure this yet, because of the slowness of any search with chert.
If we _do_ deprecate flint in 1.2.0, the LimitedValueWeightPostingSource should be removed before we make the release for 1.2.0, since there'll be a better alternative (use chert!), so it will never make a stable release.
comment:7 by , 15 years ago
I've found a better fix for doxy2swig.py and applied it to trunk.
However, increasing test coverage has found a segfault - I've committed the test, will investigate the problem tomorrow.
comment:8 by , 15 years ago
Access to values in flint was very slow too, and values are generally larger than the encoded doclength (or wdf of a term), so fewer fit in a chunk. If you want to check timings, try using chert with BM25 parameters such that the document length isn't used.
I think if chert really isn't ready to replace flint, it would be better to backport valuestats to flint (which has wider benefits, such as allowing the new omindex "don't reindex unchanged documents" feature to work efficiently), or to consider backing out the document length change from chert (though I think it's probably likely to be a general win for most users - your test only considered the case where there's plenty of RAM, and only used single term queries rather than a realistic mix).
So I don't think we should add LimitedValueWeightPostingSource currently. We can of course review that nearer the release.
Looking at the diff to the last merge point, for consistency, name() shouldn't include "()" at the end of the returned name, but should include a "Xapian::" prefix.
comment:9 by , 15 years ago
I'm happy to merge the branch as it now stands apart from LimitedValueWeightPostingSource.
I'm confused by your comment about name() - the latest code in the postingsource branch reads:
std::string DecreasingValueWeightPostingSource::name() const {
return "Xapian::DecreasingValueWeightPostingSource";
}
comment:10 by , 15 years ago
To clarify - I'd merge the changes in the branch other than those dealing with LimitedValueWeightPostingSource. I'd leave the branch containing just LimitedValueWeightPostingSource, with no plan to merge the branch, and be quite happy for the branch to be deleted at any point after that, actually.
comment:11 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Version: | → SVN trunk |
OK, I've just reviewed this diff between the branch and trunk:
Comments:
include/xapian
for every new class. These are subclasses of PostingSource and can just go in postingsource.h with the other subclasses of PostingSource.I was concerned we should try to merge branches right at the start of working on a particular 1.1.x, but in this case it's only adding functionality and doesn't really touch existing code, so it seems it's only going to break itself at worst.