Opened 11 years ago

Closed 10 years ago

#503 closed enhancement (fixed)

Add Python PostingSource example from Xappy to docs

Reported by: Joost Cassee
Owned by: Olly Betts
Priority: normal
Milestone: 1.2.6
Component: Xapian-bindings (Python)
Version: 1.2.2
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description (last modified by Olly Betts)

The Xappy source code contains a perfect example of a weight-only (non-filtering) PostingSource written in Python. This would be a good addition to the postingsource docs. I have slightly edited the original.

import xapian


class ExternalWeightPostingSource(xapian.PostingSource):
    """
    A Xapian posting source returning weights from an external source.
    """
    def __init__(self, db, wtsource):
        xapian.PostingSource.__init__(self)
        self.db = db
        self.wtsource = wtsource

    def init(self, db):
        self.alldocs = db.postlist('')

    def get_termfreq_min(self): return 0
    def get_termfreq_est(self): return self.db.get_doccount()
    def get_termfreq_max(self): return self.db.get_doccount()

    def next(self, minweight):
        try:
            self.current = self.alldocs.next()
        except StopIteration:
            self.current = None

    def skip_to(self, docid, minweight):
        try:
            self.current = self.alldocs.skip_to(docid)
        except StopIteration:
            self.current = None

    def at_end(self):
        return self.current is None

    def get_docid(self):
        return self.current.docid

    def get_maxweight(self):
        return self.wtsource.get_maxweight()

    def get_weight(self):
        doc = self.db.get_document(self.current.docid)
        return self.wtsource.get_weight(doc)

Change History (10)

comment:1 by Olly Betts, 11 years ago

Component: Other → Xapian-bindings (Python)
Description: modified (diff)
Milestone: 1.2.3
Owner: changed from Olly Betts to Richard Boulton
Version: 1.2.2

Marking for 1.2.3, though that's pending on us being OK to relicense this in the future. Richard, who wrote this? The (C) headers on the file list you and Lemur (which shouldn't be a problem, though we should explicitly check) and Pablo Hoffman who I don't think I know.

We should kill reset() from it if it really is for backward compatibility - compatibility with 1.1.x isn't interesting at this point, and going forward a clean example is more important.

Probably also better to rename xapdb to just db for the new context.

comment:2 by Joost Cassee, 11 years ago

By the way, please add a note to the Python documentation that the database reference passed into PostingSource.init(db) (not __init__()) by Xapian should not be stored as a class attribute. Xapian will remove the underlying C++ object after leaving the method, and the Python application will segfault if you try to use it later on.

By the way2: it would be nice if Trac users could edit their own ticket description; there is still one typo in there...

comment:3 by Olly Betts, 11 years ago

Hmm, I'm sure I wrote a response to comment:2 already. I guess I must have previewed it but failed to actually submit it or something.

Charlie says it's fine for future relicensing, so the Lemur (C) isn't an issue.

Not being able to store the passed database sounds like a bug in the Python wrappers to me.

And I think users should now be able to edit ticket descriptions (I didn't realise they couldn't - thanks for pointing that out).

comment:4 by Olly Betts, 11 years ago

Milestone: 1.2.3 → 1.2.4

I want to release 1.2.3, so bumping.

comment:5 by Joost Cassee, 11 years ago (in reply to comment:3)

Replying to olly:

| Not being able to store the passed database sounds like a bug in the Python wrappers to me.

I cannot reproduce this problem in version 1.2.3.

comment:6 by Olly Betts, 11 years ago

Description: modified (diff)

I've made some further edits - fixing a typo in the wiki markup, removing the reset() method, renaming xapdb to db, and removing the ProcessedDocument reference (which I think must be a xappyism).

Richard said on IRC he'd like to have this actually tested (I think he means dynamically) so that the docs don't have an incorrect example.
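A dynamic check along those lines could be sketched roughly as below. This is only an illustration: `drain` and `StubSource` are hypothetical names, the stub stands in for a real source so the snippet runs without the bindings, and the call order (`init`, then repeated `next`/`at_end`) is a simplification of what the matcher does.

```python
def drain(source, db):
    """Call the posting source methods in matcher-like order.

    Returns the (docid, weight) pairs the source yields, and checks that
    no weight exceeds the bound reported by get_maxweight().
    """
    source.init(db)
    maxweight = source.get_maxweight()
    results = []
    while True:
        source.next(0.0)          # the source starts before the first document
        if source.at_end():
            break
        weight = source.get_weight()
        assert weight <= maxweight, "get_maxweight() must be an upper bound"
        results.append((source.get_docid(), weight))
    return results


class StubSource:
    """Hypothetical stand-in using the same method names as the example."""

    def __init__(self, weights):
        self.weights = weights    # dict mapping docid -> weight

    def init(self, db):
        # Restart iteration; the db argument is unused by this stub.
        self.docids = iter(sorted(self.weights))
        self.current = None

    def next(self, minweight):
        self.current = next(self.docids, None)

    def at_end(self):
        return self.current is None

    def get_docid(self):
        return self.current

    def get_maxweight(self):
        return max(self.weights.values())

    def get_weight(self):
        return self.weights[self.current]


pairs = drain(StubSource({1: 0.5, 3: 2.0, 7: 1.25}), db=None)
print(pairs)
```

Because `drain` only relies on method names, the same function could in principle be pointed at the example class with a real database, which is what an automated test would need to do.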

comment:7 by Olly Betts, 11 years ago

Milestone: 1.2.4 → 1.2.5

Had a report on IRC that this example crashes, so we should definitely at least check it works before adding it to the docs:

| eugene_beast> well, python process aborts if i'm copying the example from #503 and trying to search for something

Not worth delaying 1.2.4 further for this, so bumping milestone.

comment:8 by Olly Betts, 11 years ago

Milestone: 1.2.5 → 1.2.6

Bumping to 1.2.6.

comment:9 by Olly Betts, 10 years ago

Owner: changed from Richard Boulton to Olly Betts
Status: new → assigned

Works for me if I fill in a suitable class for wtsource:

    class WeightSource:
        def __init__(self):
            pass

        def get_maxweight(self):
            return 1234.0

        def get_weight(self, doc):
            return doc.get_docid()

I wonder if eugene_beast failed to supply a suitable class there. Anyway, the example does work.
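For illustration, a slightly more realistic weight source might look something like the sketch below. It assumes only what the example requires of `wtsource`: a `get_maxweight()` method and a `get_weight(doc)` method taking an object with `get_docid()`. `DictWeightSource` and `FakeDoc` are hypothetical names; `FakeDoc` is a stand-in so the snippet runs without the bindings.

```python
class DictWeightSource:
    """Hypothetical weight source backed by a docid -> weight mapping."""

    def __init__(self, weights, default=0.0):
        self.weights = dict(weights)
        self.default = default

    def get_maxweight(self):
        # Upper bound on anything get_weight() can return.
        return max([self.default] + list(self.weights.values()))

    def get_weight(self, doc):
        # Documents without an entry get the default weight.
        return self.weights.get(doc.get_docid(), self.default)


class FakeDoc:
    """Stand-in for a document object; only get_docid() is needed here."""

    def __init__(self, docid):
        self.docid = docid

    def get_docid(self):
        return self.docid


ws = DictWeightSource({1: 2.5, 2: 0.75})
print(ws.get_maxweight(), ws.get_weight(FakeDoc(2)), ws.get_weight(FakeDoc(9)))
```

Unlike the quick test class above, this one derives `get_maxweight()` from the stored weights, so the reported bound cannot be exceeded by `get_weight()`.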

I'm going to try to slot this in now.

comment:10 by Olly Betts, 10 years ago

Resolution: fixed
Status: assigned → closed

Added to postingsource.rst in r15507. It isn't automatically tested or anything like that, but I have at least manually checked it before adding it.

Richard seemed keen to have it automatically tested, which seems a really nice idea, but more than I have time to do right now, so I've opened #547 for that.
