Ticket #178 (assigned defect)

Opened 3 years ago

Last modified 2 months ago

No remote backend support for: spelling correction, synonyms, matchdecider

Reported by: richard Owned by: richard
Priority: normal Milestone: 1.2.x
Component: Backend-Remote Version: SVN trunk
Severity: normal Keywords:
Cc: olly Blocked By:
Operating System: All Blocking:

Description (last modified by olly) (diff)

The remote database was briefly feature complete, but it's fallen behind again - it doesn't support spelling correction, synonym expansion, or matchdeciders.

We should add these in to it at some point.

Attachments

metadata_and_mvcms_register.patch Download (5.7 KB) - added by pr100 2 months ago.
metadata_and_mvcms_register_and_spelling.patch Download (14.7 KB) - added by pr100 2 months ago.

Change History

Changed 3 years ago by richard

  • blocking 120 added

We should probably try and get this done in the 1.0.x series.

Changed 3 years ago by richard

  • owner changed from olly to richard

Changed 3 years ago by richard

  • status changed from new to assigned

Changed 2 years ago by olly

  • cc olly@… added
  • summary changed from Remote database does not support spelling correction to Remote database does not support spelling correction or synonyms

It never supported matchdeciders. I don't believe those or matchspies are implementable in 1.0.x.

Let's make this bug for spelling correction and synonyms, and bug#183 can be for remote support for Xapian::MatchDecider?.

Changed 2 years ago by olly

  • summary changed from Remote database does not support spelling correction or synonyms to No remote backend support for: spelling correction, synonyms, metadata

And now ... metadata!

Changed 2 years ago by trac

  • platform set to All

Changed 22 months ago by richard

  • description modified (diff)
  • milestone set to 1.1

Changed 22 months ago by richard

  • blocking 120 removed

(In #120) Remove the unfixed dependencies so we can close this bug - they're all marked for the 1.1.0 milestone.

Changed 14 months ago by olly

  • milestone changed from 1.1.0 to 1.1.1

These aren't incompatible changes (except for remote protocol changes) so bumping to 1.1.1.

Changed 10 months ago by olly

  • milestone changed from 1.1.1 to 1.1.7

Triaging milestone:1.1.1 bugs.

Changed 7 months ago by olly

  • milestone changed from 1.1.7 to 1.2.0

These can be implemented in 1.2.x without incompatible changes. Bumping.

Changed 2 months ago by pr100

Changed 2 months ago by pr100

I've added a patch against the matchspy branch that does metadata and also registers MultiValueCountMatchSpy?.

Changed 2 months ago by pr100

Incidentally - another thing that remote databases don't support: Enquire.get_mset can take a matchdecider, but that has no effect for remote databases.

Changed 2 months ago by olly

  • description modified (diff)
  • summary changed from No remote backend support for: spelling correction, synonyms, metadata to No remote backend support for: spelling correction, synonyms, matchdecider

The MultiValueCountMatchSpy patch hunk isn't relevant for trunk, as that class doesn't exist there.

This changes the remote protocol incompatibly, so XAPIAN_REMOTE_PROTOCOL_MAJOR_VERSION needs bumping from 33 to 34, which I've done.

Also, the documentation for the remote protocol (remote_protocol.html) needs updating, which I've also done.

RemoteServer::set_metadata() was truncating at zero bytes in the value - I've extended the metadata1 test to cover cases with zero bytes, and fixed the patch.

And the new methods use 2 space indents rather than the standard 4, so I fixed that too.

I assume this work is (C) Lemur, so I updated the headers to reflect that. If that's wrong, let me know.

Committed to trunk with the changes above as r13688.

Changed 2 months ago by olly

Regarding comment:13, that should throw UnimplementedError rather than silently ignoring the matchdecider.

Changed 2 months ago by pr100

Changed 2 months ago by pr100

I've added another patch which includes stuff for remote spellings. However I'm not really sure whether it's doing the right thing - I'm a bit confused about the separation of Database and Database::Internal as far as the various spelling related methods go. I've had to use Database::Internal inside RemoteServer?, which nothing else does - so maybe this approach misses something?

open_spelling_wordlist sends them all back in one go, so is maybe not what's wanted? But that isn't used by get_spelling_suggestion which is the main thing I want to work at the moment.

Changed 2 months ago by olly

It would be much more helpful to have a patch with just the new changes, not one which also includes those I've already applied to trunk.

As I said on IRC, I think Database::get_spelling_suggestion() with remote databases should cause a remote search for the best spelling suggestion and just pass that back. Only the local databases should take part in the OrTermList. You can easily calculate the edit distance for the remote spelling suggestion(s), and compare it/them to that of the best local one to find the best overall answer.

I don't think opening the spelling termlists and streaming them over is the right approach. It involves sending a lot of data, and if we have to read each as a single lump, will be rather slow too. Arrange to stream them is possible, but fiddly, and the effort is misdirected I feel.

So RemoteDatabase::open_spelling_termlist() should never get called and can simply remain unimplemented.

And RemoteDatabase::open_spelling_wordlist() can be implemented using the public API (via Xapian::Database::spellings_begin() and Xapian::Database::spellings_end()).

Also, you're not being zero-byte safe (again) - in RemoteServer::msg_getspellingsuggestion(const string & message):

    string word = string(p); 

should be:

    string word(p, p_end - p);

Otherwise word will be truncated at the first zero byte it contains (not a very likely occurrence, but it is easy to get right.

Changed 2 months ago by olly

Oh, hmm - except that the search for the best spelling on the remote server should ideally use the spelling frequencies from the combined database. So this would only give the same answers if there's a single sub-database which is remote - for the more general case it won't, though the answers should be reasonable still. If you only care about that case, supporting only that would be an improvement over not supporting remote spelling at all!

I don't immediately see how to solve this issue though. The approach used for matching documents doesn't seem to work here.

Changed 2 months ago by olly

Re comment:13 and comment:15, trunk as of r13690 throws UnimplementedError if you try to use a MatchDecider with the remote backend.

Note: See TracTickets for help on using tickets.