Opened 8 years ago
Closed 8 years ago
#739 closed enhancement (fixed)
Expose a lazy Xapian::Document() attached to a database
Reported by: | German M. Bravo | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.4.2 |
Component: | Other | Version: | |
Severity: | normal | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | Operating System: | All |
Description
This can be accomplished either exposing a "lazy" flag in Database::get_document(Xapian::docid)
, such as passing flags or a boolean to it, or by exposing a Xapian::Document
constructor which takes the database(s) as well as the docid
.
The need for this lazy document initializer is that in Xapiand we create Xapian::Document
objects on the fly, many times with different underlying database objects coming from a pool of databases; we need an efficient/fast way of telling Xapian::Document
from what database to get the required document, without it having to hit the database on each get_document()
call.
Change History (6)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
Trying to implement this, I found one slightly awkward issue.
Currently the backend can just assume that the Document object's docid is valid, but with a lazily created document it can't.
I can see two options:
- We record that the docid hasn't been vetted, and check before first use - essentially validate the docid lazily. This doesn't seem to gain much - if you routinely create
Document
objects you don't use, that seems better solved at a higher level. I guess maybe it's useful to delay the potential I/O, but that seems a stretch. - We say that's undefined behaviour. But that means you can get segfaults and the like via innocent looking code in scripting languages via the bindings, which is nasty.
I guess you know for sure that the docid is valid, so lazy vetting isn't useful for your case.
comment:3 by , 8 years ago
The segfaults were only in the inmemory backend, where it's cheap to check on lazy creation if the docid is within the valid range, so actually this is easy to handle.
So I have a working patch. The remote backend isn't lazy, but that's OK I think (we can document this is essentially a hint).
Not sure on the best name for the flag. "Lazy" doesn't seem quite right (Document
objects are already lazy at loading doc data, terms and values, and we might later want to provide flags to hint which of those you're going to want). It's more like "don't check doc exists".
comment:4 by , 8 years ago
Implemented in master in ce33b20d98f02d0f6aa1e5c94c30b9cda10ef09a. Needs backporting.
I went for DOC_ASSUME_VALID
.
comment:5 by , 8 years ago
Thank you Olly, this will work beautifully. Anyway, what is the reason the remote backend does have to check anyway? As we'll heavily use the remote backend as well.
comment:6 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Backported to branch:RELEASE/1.4 in 26f4a06b5bf3ac4bbd0d326b76f1b17608951b35.
When you call Database::get_document()
on a remote database, the client requests the server send the document information at that point (and waits for it to be received). That could be made lazier, but it's rather more work than just exposing an existing flag, and needs careful thought in case the lazy fetch could end up happening in the middle of another remote backend message exchange, as the protocol isn't multiplexed.
If you need lazier remote documents too, I think let's handle that as a separate ticket.
A flag makes much more sense. Having both
Database::get_document(Xapian::docid did)
andDocument::Document(const Database& db, Xapian::docid did)
with different behaviour is just confusing, and API users will inevitably end up finding and using the one they didn't want.