Opened 8 years ago

Closed 7 years ago

#739 closed enhancement (fixed)

Expose a lazy Xapian::Document() attached to a database

Reported by: German M. Bravo Owned by: Olly Betts
Priority: normal Milestone: 1.4.2
Component: Other Version:
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description

This can be accomplished either exposing a "lazy" flag in Database::get_document(Xapian::docid), such as passing flags or a boolean to it, or by exposing a Xapian::Document constructor which takes the database(s) as well as the docid.

The need for this lazy document initializer is that in Xapiand we create Xapian::Document objects on the fly, many times with different underlying database objects coming from a pool of databases; we need an efficient/fast way of telling Xapian::Document from what database to get the required document, without it having to hit the database on each get_document() call.

Change History (6)

comment:1 by Olly Betts, 7 years ago

A flag makes much more sense. Having both Database::get_document(Xapian::docid did) and Document::Document(const Database& db, Xapian::docid did) with different behaviour is just confusing, and API users will inevitably end up finding and using the one they didn't want.

comment:2 by Olly Betts, 7 years ago

Trying to implement this, I found one slightly awkward issue.

Currently the backend can just assume that the Document object's docid is valid, but with a lazily created document it can't.

I can see two options:

  • We record that the docid hasn't been vetted, and check before first use - essentially validate the docid lazily. This doesn't seem to gain much - if you routinely create Document objects you don't use, that seems better solved at a higher level. I guess maybe it's useful to delay the potential I/O, but that seems a stretch.
  • We say that's undefined behaviour. But that means you can get segfaults and the like via innocent looking code in scripting languages via the bindings, which is nasty.

I guess you know for sure that the docid is valid, so lazy vetting isn't useful for your case.

comment:3 by Olly Betts, 7 years ago

The segfaults were only in the inmemory backend, where it's cheap to check on lazy creation if the docid is within the valid range, so actually this is easy to handle.

So I have a working patch. The remote backend isn't lazy, but that's OK I think (we can document this is essentially a hint).

Not sure on the best name for the flag. "Lazy" doesn't seem quite right (Document objects are already lazy at loading doc data, terms and values, and we might later want to provide flags to hint which of those you're going to want). It's more like "don't check doc exists".

comment:4 by Olly Betts, 7 years ago

Implemented in master in ce33b20d98f02d0f6aa1e5c94c30b9cda10ef09a. Needs backporting.

I went for DOC_ASSUME_VALID.

comment:5 by German M. Bravo, 7 years ago

Thank you Olly, this will work beautifully. Anyway, what is the reason the remote backend does have to check anyway? As we'll heavily use the remote backend as well.

comment:6 by Olly Betts, 7 years ago

Resolution: fixed
Status: newclosed

Backported to branch:RELEASE/1.4 in 26f4a06b5bf3ac4bbd0d326b76f1b17608951b35.

When you call Database::get_document() on a remote database, the client requests the server send the document information at that point (and waits for it to be received). That could be made lazier, but it's rather more work than just exposing an existing flag, and needs careful thought in case the lazy fetch could end up happening in the middle of another remote backend message exchange, as the protocol isn't multiplexed.

If you need lazier remote documents too, I think let's handle that as a separate ticket.

Note: See TracTickets for help on using tickets.