Opened 15 years ago

Closed 15 years ago

#375 closed enhancement (fixed)

Merge postingsources branch

Reported by: Olly Betts Owned by: Richard Boulton
Priority: high Milestone: 1.1.3
Component: Library API Version: SVN trunk
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description

Richard said he wanted to merge this branch next, so creating a ticket to track stuff about it.

Change History (11)

comment:1 by Olly Betts, 15 years ago

OK, I've just reviewed this diff between the branch and trunk:

svn diff --old=.@12771 --new=svn+userv:///xapian/branches/postingsources@HEAD

Comments:

  • Test coverage for DecreasingValueWeightPostingSource doesn't test a restricted range starting at a docid other than 1.
  • There's also no coverage for having more than one document in a row with the same weight.
  • I don't think we should add a new header to include/xapian for every new class. These are subclasses of PostingSource and can just go in postingsource.h with the other subclasses of PostingSource.
  • LimitedValueWeightPostingSource has no documentation comments and no test coverage, but I think we decided it didn't have enough utility outside of use with flint, and flint will be deprecated in 1.2.0, so I guess you didn't bother with that deliberately?
  • The tweak to not wrap in xapian-bindings/python/doxy2swig.py.in should probably just drop the old version of the line rather than commenting it out. If it's an issue to resolve, then it deserves a FIXME and explanatory comment.

I was concerned we should try to merge branches right at the start of working on a particular 1.1.x, but in this case it's only adding functionality and doesn't really touch existing code, so it seems it's only going to break itself at worst.

comment:2 by Olly Betts, 15 years ago

Priority: normalhigh

Should be an easy merge, and one fewer branch to maintain would be nice, but it could as easily be merged in 1.2.x. Priority high.

comment:3 by Richard Boulton, 15 years ago

Status: newassigned

Going to address these comments now.

comment:4 by Richard Boulton, 15 years ago

New headers merged with postingsource.h. Documentation comments added for LimitedValueWeightPostingSource (which also mention that the class is only really useful with older backends, and will probably be deprecated when they go away). Added FIXME to doxy2swig.py

So, remaining issues are test coverage. I'll implement some for DecreasingValueWeightPostingSource as suggested above, and I'll also implement some cursory tests for LimitedValueWeightPostingSource (but not worry too much since it is going away soon).

Then, it'll be ready to merge.

comment:5 by Olly Betts, 15 years ago

Cool - I'll take a look at the updated diff tomorrow, but it sounds good.

Flint will be deprecated in 1.2.0, so unless I'm misunderstanding, you seem to be suggesting we should add a new class which will be deprecated in its first stable release...

comment:6 by Richard Boulton, 15 years ago

I suppose I am... My thinking is that I'm currently not convinced that we're going to be able to deprecate flint at that stage, because of ticket #326 (slow doclen access). I'm also concerned that the same problem that manifests there (slow skipping through postinglist chunks) will mean that access to the new-style value slots is also slow - I've not been able to measure this yet, because of the slowness of any search with chert.

If we _do_ deprecate flint in 1.2.0, the LimitedValueWeightPostingSource should be removed before we make the release for 1.2.0, since there'll be a better alternative (use chert!), so it will never make a stable release.

comment:7 by Richard Boulton, 15 years ago

I've found a better fix for doxy2swig.py and applied it to trunk.

However, increasing test coverage has found a segfault - I've committed the test, will investigate the problem tomorrow.

comment:8 by Olly Betts, 15 years ago

Access to values in flint was very slow too, and values are generally larger than the encoded doclength (or wdf of a term), so fewer fit in a chunk. If you want to check timings, try using chert with BM25 parameters such that the document length isn't used.

I think if chert really isn't ready to replace flint, it would be better to backport valuestats to flint (which has wider benefits, such as allowing the new omindex "don't reindex unchanged documents" feature to work efficiently), or to consider backing out the document length change from chert (though I think it's probably likely to be a general win for most users - your test only considered the case where there's plenty of RAM, and only used single term queries rather than a realistic mix).

So I don't think we should add LimitedValueWeightPostingSource currently. We can of course review that nearer the release.

Looking at the diff to the last merge point, for consistency, name() shouldn't include "()" at the end of the returned name, but should include a "Xapian::" prefix.

comment:9 by Richard Boulton, 15 years ago

I'm happy to merge the branch as it now stands apart from LimitedValueWeightPostingSource.

I'm confused by your comment about name() - the latest code in the postingsource branch reads:

std::string DecreasingValueWeightPostingSource::name() const {

return "Xapian::DecreasingValueWeightPostingSource";

}

comment:10 by Richard Boulton, 15 years ago

To clarify - I'd merge the changes in the branch other than those dealing with LimitedValueWeightPostingSource. I'd leave the branch containing just LimitedValueWeightPostingSource, with no plan to merge the branch, and be quite happy for the branch to be deleted at any point after that, actually.

comment:11 by Richard Boulton, 15 years ago

Resolution: fixed
Status: assignedclosed
Version: SVN trunk

Merged as planned in r13193 and r13194

Note: See TracTickets for help on using tickets.