Opened 17 years ago
Last modified 15 months ago
#178 assigned defect
No remote backend support for: spelling correction, matchdecider
Reported by: | Richard Boulton | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 2.0.0 |
Component: | Backend-Remote | Version: | git master |
Severity: | normal | Keywords: | |
Cc: | Blocked By: | ||
Blocking: | Operating System: | All |
Description (last modified by )
The remote database was briefly feature complete, but it's fallen behind again - it doesn't support spelling correction (adding and removing spellings at index time is supported though) or matchdeciders.
We should ideally add these in to it at some point.
Attachments (5)
Change History (40)
comment:1 by , 17 years ago
Blocking: | 120 added |
---|
comment:2 by , 17 years ago
Owner: | changed from | to
---|
comment:3 by , 17 years ago
Status: | new → assigned |
---|
comment:4 by , 17 years ago
Cc: | added |
---|---|
Summary: | Remote database does not support spelling correction → Remote database does not support spelling correction or synonyms |
It never supported matchdeciders. I don't believe those or matchspies are implementable in 1.0.x.
Let's make this bug for spelling correction and synonyms, and bug#183 can be for remote support for Xapian::MatchDecider.
comment:5 by , 17 years ago
Operating System: | → All |
---|---|
Summary: | Remote database does not support spelling correction or synonyms → No remote backend support for: spelling correction, synonyms, metadata |
And now ... metadata!
comment:7 by , 17 years ago
Description: | modified (diff) |
---|---|
Milestone: | → 1.1 |
comment:8 by , 17 years ago
Blocking: | 120 removed |
---|
(In #120) Remove the unfixed dependencies so we can close this bug - they're all marked for the 1.1.0 milestone.
comment:9 by , 16 years ago
Milestone: | 1.1.0 → 1.1.1 |
---|
These aren't incompatible changes (except for remote protocol changes) so bumping to 1.1.1.
comment:11 by , 15 years ago
Milestone: | 1.1.7 → 1.2.0 |
---|
These can be implemented in 1.2.x without incompatible changes. Bumping.
by , 15 years ago
Attachment: | metadata_and_mvcms_register.patch added |
---|
comment:12 by , 15 years ago
I've added a patch against the matchspy branch that does metadata and also registers MultiValueCountMatchSpy.
comment:13 by , 15 years ago
Incidentally - another thing that remote databases don't support: Enquire.get_mset can take a matchdecider, but that has no effect for remote databases.
comment:14 by , 15 years ago
Description: | modified (diff) |
---|---|
Summary: | No remote backend support for: spelling correction, synonyms, metadata → No remote backend support for: spelling correction, synonyms, matchdecider |
The MultiValueCountMatchSpy patch hunk isn't relevant for trunk, as that class doesn't exist there.
This changes the remote protocol incompatibly, so XAPIAN_REMOTE_PROTOCOL_MAJOR_VERSION needs bumping from 33 to 34, which I've done.
Also, the documentation for the remote protocol (remote_protocol.html) needs updating, which I've also done.
RemoteServer::set_metadata()
was truncating at zero bytes in the value - I've extended the metadata1 test to cover cases with zero bytes, and fixed the patch.
And the new methods use 2 space indents rather than the standard 4, so I fixed that too.
I assume this work is (C) Lemur, so I updated the headers to reflect that. If that's wrong, let me know.
Committed to trunk with the changes above as r13688.
comment:15 by , 15 years ago
Regarding comment:13, that should throw UnimplementedError
rather than silently ignoring the matchdecider.
by , 15 years ago
Attachment: | metadata_and_mvcms_register_and_spelling.patch added |
---|
comment:16 by , 15 years ago
I've added another patch which includes stuff for remote spellings. However I'm not really sure whether it's doing the right thing - I'm a bit confused about the separation of Database and Database::Internal as far as the various spelling related methods go. I've had to use Database::Internal inside RemoteServer, which nothing else does - so maybe this approach misses something?
open_spelling_wordlist sends them all back in one go, so is maybe not what's wanted? But that isn't used by get_spelling_suggestion which is the main thing I want to work at the moment.
comment:17 by , 15 years ago
It would be much more helpful to have a patch with just the new changes, not one which also includes those I've already applied to trunk.
As I said on IRC, I think Database::get_spelling_suggestion()
with remote databases should cause a remote search for the best spelling suggestion and just pass that back. Only the local databases should take part in the OrTermList
. You can easily calculate the edit distance for the remote spelling suggestion(s), and compare it/them to that of the best local one to find the best overall answer.
I don't think opening the spelling termlists and streaming them over is the right approach. It involves sending a lot of data, and if we have to read each as a single lump, will be rather slow too. Arrange to stream them is possible, but fiddly, and the effort is misdirected I feel.
So RemoteDatabase::open_spelling_termlist()
should never get called and can simply remain unimplemented.
And RemoteDatabase::open_spelling_wordlist()
can be implemented using the public API
(via Xapian::Database::spellings_begin()
and Xapian::Database::spellings_end()
).
Also, you're not being zero-byte safe (again) - in RemoteServer::msg_getspellingsuggestion(const string & message):
string word = string(p);
should be:
string word(p, p_end - p);
Otherwise word will be truncated at the first zero byte it contains (not a very likely occurrence, but it is easy to get right.
comment:18 by , 15 years ago
Oh, hmm - except that the search for the best spelling on the remote server should ideally use the spelling frequencies from the combined database. So this would only give the same answers if there's a single sub-database which is remote - for the more general case it won't, though the answers should be reasonable still. If you only care about that case, supporting only that would be an improvement over not supporting remote spelling at all!
I don't immediately see how to solve this issue though. The approach used for matching documents doesn't seem to work here.
comment:19 by , 15 years ago
Re comment:13 and comment:15, trunk as of r13690 throws UnimplementedError if you try to use a MatchDecider with the remote backend.
comment:20 by , 15 years ago
I'm not sure where it happened (IRC maybe), but a while ago we discussed whether MatchDecider was still useful - for the "match spy" use, there's now MatchSpy, and for filtering documents with a functor, PostingSource can do this too, but with more possibilities (e.g. weights, applying to sub-queries, applying more than one) and that already has remote support.
Perhaps we should deprecate MatchDecider in 1.2.0?
comment:21 by , 15 years ago
Milestone: | 1.2.x → 1.1.7 |
---|
And Paul Rubin has a newer patch for remote spelling support, which is might be nice to integrate before 1.2.0 - rationale: it'll avoid a (probably minor) protocol version bump in 1.2.x, and is likely to be minimally invasive, so if it breaks, it'll probably only be remote spelling support that is broken, and that doesn't work at all currently.
Not worth holding a release for, but marking this ticket for 1.1.7 so we take a concious decision on this and deprecating MatchDecider.
comment:22 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
It might be a good idea to deprecate MatchDecider for 1.2.0 (for removal in 1.3.0?), but I think we should try and provide an easy migration path for users (which would be a PostingSource which implemented the MatchDecider interface, so users could simply subclass that posting source and filter the query with it). PostingSource classes are quite a bit more fiddly to implement than MatchDecider classes.
I'm not 100% sure we should deprecate the MatchDecider interface - while it can be replicated with a PostingSource, it's a more convenient API. We could change the internal implementation to use a PostingSource, which would allow us to simplify the matcher code. For this reason, I don't think there's much need to hurry to deprecate MatchDecider.
I've emailed Paul Rudin to ask if he wants to submit the spelling patch to this ticket.
I don't have time to work on this ticket in the near future, so assigning it to "nobody" for clarity.
comment:23 by , 15 years ago
Cc: | removed |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Yes, it would be for removal in 1.3.0 (or by 1.4.0 at least, assuming we have another development series, which I think probably makes sense as it generally worked well).
There are various costs to having redundant ways of doing things kept around for no good reason - there's more API for users to get to grips with, and more code around to maintain, which adds to the download and installed sizes, and to the build time of the library and applications using it, and perhaps worst of all for us, the testsuite runtime.
by , 15 years ago
Attachment: | remote-spell.patch added |
---|
comment:24 by , 15 years ago
I've attached a patch that include remote spelling. I'd intended to rework it a bit as it was hacked together quite quickly so that I could do some experimentation, but I'm not sure when I'd get around to it at the moment so here's what I have at the moment.
comment:25 by , 15 years ago
The part which moves and reindents a large chunk of code doesn't apply for me. I didn't look into what had changed underneath, as this seems a bit major to commit right now with 1.2.0 very close. It would also be nicer if we didn't have to poke in internals here.
I've pulled out the add_spelling() and remove_spelling() part, as that looks good, and might be useful to some users. Applied that in r14076. I'll attach a patch with the remaining changes, tidied up a bit.
by , 15 years ago
Attachment: | remote-spell-remainder.patch added |
---|
Updated patch with unapplied changes
comment:27 by , 14 years ago
Description: | modified (diff) |
---|---|
Summary: | No remote backend support for: spelling correction, synonyms, matchdecider → No remote backend support for: spelling correction, synonyms, matchdecider, metadata_keys_begin() |
by , 14 years ago
Attachment: | remote-spell-remainder-updated.patch added |
---|
Updated remote spelling suggestions patch.
comment:28 by , 14 years ago
Just attached an updated version of remote-spell-remainder.patch ('remote-spell-remainder-updated.patch') that applies cleanly against trunk (r15475 right now) and against 1.2.5.
How likely is this to be merged into mainline?
comment:29 by , 13 years ago
Milestone: | 1.2.x → 1.3.0 |
---|
tizoc: I'm not sure how likely it is - the patch reindents and moves code around, so it's not obvious just from reading it what is actually being changed. I'd have to review it in more detail, which I don't have time for at present.
Marking for milestone 1.3.0 for now, to make sure we consider this patch for the development series.
comment:30 by , 13 years ago
Milestone: | 1.3.0 → 1.3.x |
---|
comment:31 by , 9 years ago
Milestone: | 1.3.x → 1.4.x |
---|
Dear me, this has languished a long time. It would be good to this sorted out (at least the spelling part that there's a patch for).
Looking at the patch, it seems to implement approximately what's sketched out in comment:17 and comment:18, except that every candidate causes a fetch of the spelling frequency from each of the remote sub-databases, and that happens individually, incurring the network round-trip latency each time - that seems like it's going to be a performance killer in real-world use.
tizoc - have you been using this patch in production? If so, how well does it work?
In technical terms, it's not a 1.4.0 blocker (it's ABI-compatible and requires a minor protocol version bump), and at this point shortening the path to 1.4.0 is an important goal, so adjusting the milestone.
comment:32 by , 9 years ago
Hi olly.
I used it for a while without issues I think, but I then wrapped the Xapian store with a small HTTP application that opened the database files directly and without using the remote backend (was not needed anymore because that HTTP application was the only store reader/writer, other parts of the system just talked to it).
So, while for a few months it worked fine on what I was working on (I don't remember having any issues), I can't say it was really put to enough stress for enough time to confirm that it is production ready.
comment:34 by , 5 years ago
Description: | modified (diff) |
---|---|
Milestone: | 1.4.x → 1.5.0 |
Summary: | No remote backend support for: spelling correction, synonyms, matchdecider, metadata_keys_begin() → No remote backend support for: spelling correction, matchdecider, metadata_keys_begin() |
Version: | SVN trunk → git master |
Gaurav implemented remote support for synonyms in https://github.com/xapian/xapian/pull/284 (merged to master in 8bff57c22173dfbd3c14a1e1fad280ee8c38c423).
I don't think this or the remaining missing features are 1.4.x material at this point in the release cycle. Moving to 1.5.0 for now.
comment:35 by , 20 months ago
Description: | modified (diff) |
---|---|
Summary: | No remote backend support for: spelling correction, matchdecider, metadata_keys_begin() → No remote backend support for: spelling correction, matchdecider |
metadata_keys_begin()
was actually implemented a long time ago, back in 2010 in [eacacefcc5d29c95292a7b1a26cd8fa5a904748a] which debuted in 1.2.20. It's tested by testcase metadata5
which runs and passes for remote backends.
comment:36 by , 15 months ago
Milestone: | 1.5.0 → 2.0.0 |
---|
We should probably try and get this done in the 1.0.x series.