Opened 17 years ago

Last modified 20 months ago

#199 assigned enhancement

Merge MultiValueCountMatchSpy

Reported by: Richard Boulton Owned by: Richard Boulton
Priority: normal Milestone: 2.0.0
Component: Library API Version: git master
Severity: normal Keywords:
Cc: Blocked By: #198
Blocking: Operating System: All

Description (last modified by Richard Boulton)

matchspy.h on the matchspy branch contains a MultiValueCountMatchSpy implementation, which allows a document to have multiple values in a single slot. This is definitely a useful feature when implemented faceting systems, but the current implementation is a bit unpleasant: it uses a pair of public classes to serialise and unserialise lists of values to be held in the slots.

A nicer approach would be to make values capable of storing multiple values. (See bug #198).

Attachments (4)

matchspy.patch (90.2 KB ) - added by Richard Boulton 15 years ago.
Patch for this - merge of most of matchspy branch
matchspy2.patch (90.0 KB ) - added by Richard Boulton 15 years ago.
Updated patch (with all files included this time)
matchspy3.patch (90.2 KB ) - added by Richard Boulton 15 years ago.
Patch as applied to trunk
matchspy_changes_13246_13247.patch (16.5 KB ) - added by Richard Boulton 15 years ago.
Remaining diffs on the branch

Download all attachments as: .zip

Change History (31)

comment:1 by Richard Boulton, 17 years ago

Blocking: 154 added

comment:2 by Richard Boulton, 17 years ago

Blocked By: 198 added

comment:3 by Olly Betts, 17 years ago

Status: newassigned

comment:4 by Richard Boulton, 17 years ago

Blocking: 200 added; 154 removed
Summary: Tidy up matchspy.h before releaseTidy up matchspy.h for next release

Olly's backed the matchspy stuff out of SVN HEAD for now, so we've got until the 1.0.4 release to tidy this up. Changing to block #200 instead of #154.

comment:5 by Olly Betts, 17 years ago

Blocking: 160 added; 200 removed

Moving to 1.1

comment:6 by Olly Betts, 17 years ago

Operating System: All
Summary: Tidy up matchspy.h for next releaseTidy up matchspy.h for inclusion in a release

Reword summary

comment:8 by Richard Boulton, 17 years ago

Description: modified (diff)
Milestone: 1.1

comment:9 by Richard Boulton, 17 years ago

Blocking: 160 removed

comment:10 by Olly Betts, 17 years ago

Owner: changed from New Bugs to Richard Boulton
Status: assignednew

comment:11 by Olly Betts, 16 years ago

Milestone: 1.1.01.1.1

Bumping milestone to 1.1.1 as there are unresolved issues and this isn't an incompatible change.

comment:12 by Olly Betts, 16 years ago

Milestone: 1.1.11.1.4

Triaging milestone:1.1.1 bugs.

comment:13 by Olly Betts, 15 years ago

Priority: normalhigh

by Richard Boulton, 15 years ago

Attachment: matchspy.patch added

Patch for this - merge of most of matchspy branch

comment:14 by Richard Boulton, 15 years ago

Status: newassigned

Attached matchspy.patch. This patch merges most of the changes on the matchspy branch to trunk, with the exception of the code which deals with a way to support multiple values in a single slot (eg StringListSerialiser, and MultiValueMatchSpy).

It also addresses ticket #183.

I havn't yet run coverage tests on this patch, but the code is in pretty good shape, I think (I did do coverage tests on some earlier versions of the code in the matchspy branch). I'll report back here when I've done some coverage tests.

comment:15 by Olly Betts, 15 years ago

A few comments:

  • As I mentioned on IRC, MultiMatchSpy can probably just have a reference to the vector of MatchSpy objects in Xapian::Enquire rather than making a copy.
  • If that works, then we can just make the MultiMatchSpy object an auto variable (since it's just a wrapper around a reference member variable) which avoids having to allocate it on the heap and use AutoPtr to manage its cleanup.
  • The "new" matchspy is applied after matchdecider, but the old one was applied before. Without thinking about it much, I think this is probably saner, but I thought I should point it out in case it's a mistake.
  • This adds some "naked" structs to the API, which isn't something we currently have, and tends to tie down the implementation rather a lot, so I think we need to carefully consider doing this. Classes with private members readable via getters inlined from the API headers would generate equally good code, not restrict future changes as much, and may be easier to wrap nicely with SWIG.

I noticed a few typos and minor tweaks I don't think are at all controversial, so I've just committed changes for those as that's less work that trying to describe them...

Not completely finished looking through, but it's nearing bedtime so I'll take a further look later.

comment:16 by Richard Boulton, 15 years ago

Applying the new matchspy after the match decider was deliberate, since it should only really be applied to the documents which get into the mset. However, I don't think it's very important since we plan to deprecate matchdecider soon.

I've applied patches for all the other comments, since I agreed with them!

by Richard Boulton, 15 years ago

Attachment: matchspy2.patch added

Updated patch (with all files included this time)

by Richard Boulton, 15 years ago

Attachment: matchspy3.patch added

Patch as applied to trunk

comment:17 by Richard Boulton, 15 years ago

Matchspy branch mostly merged to trunk. The outstanding parts are: StringListSerialiser (and unserialiser), and MultiValueCountMatchSpy. These are waiting on a resolution to ticket #198.

by Richard Boulton, 15 years ago

Remaining diffs on the branch

comment:18 by Olly Betts, 15 years ago

Priority: highlow

The remaining changes are API-compatible additions, so updating the priority to "low".

comment:19 by Olly Betts, 15 years ago

Summary: Tidy up matchspy.h for inclusion in a releaseMerge MultiValueCountMatchSpy
Type: defectenhancement

Retitling to something more appropriate (I figure the MatchSpy subclass is the interesting one - whether it uses StringListSerialiser or multi-values or something else is largely an implementation detail).

Also marking as "enhancement".

comment:20 by Olly Betts, 15 years ago

Milestone: 1.1.41.2.0

Bumping to stay on track for release.

comment:21 by Richard Boulton, 15 years ago

Description: modified (diff)

Update summary to reflect the current state: all the other matchspies which were mentioned are now gone.

comment:22 by Richard Boulton, 15 years ago

Description: modified (diff)

comment:23 by Richard Boulton, 15 years ago

Priority: lownormal

I've been looking at ticket #198, and I'm unconvinced it's something which we're going to be able to support without an excessive amount of work, and not until 1.3.0 at the earliest (and even then, not for the chert backend). In the interests of making it possible in the reasonably near future to do facet calculations with multiple values in a slot in a single document, I think the StringListSerialiser concept is the way to go. It can probably be tidied up quite a bit, though.

On thought is that if we're adding this, we should probably add support for using values serialised with StringListSerialiser for sorting (ie, add a KeyMaker which understands the serialised format, and just picks the first string). We might be able to find nicer names, and a nicer API for the classes, too.

Side note: this ticket was marked as priority:low. I think that the lack of this will be a frequent complaint when users start trying to use the faceting support in the 1.2.0 release, so it should at least be a priority:normal ticket.

comment:24 by Olly Betts, 15 years ago

The "low" priority was from prioritising bugs to address (or not) before 1.2.0. Generally you can't read much into our bug priorities anyway, at least historically.

This certainly isn't 1.2.0 material, and right now I'd rather not be too distracted by discussion of issues which aren't relevant for that goal, so let's just see how the faceting support is actually received rather than trying to guess what user reaction might be.

comment:25 by laurentm, 13 years ago

Hi,

I need to store tags for documents in Xapian. A document can have multiple tags, so I'm looking for something like MultiValueCountMatchSpy to do faceting search. (IMHO, this is a frequent need.)

I'm already using ValueCountMatchSpy for single-valued fields, this is working nicely and I'm an happy xapian user :)

I hope to see this feature in the next release.

Thanks. Laurent

comment:26 by Olly Betts, 12 years ago

Milestone: 1.2.x1.3.x

This isn't 1.2.x material now.

comment:27 by Olly Betts, 9 years ago

Milestone: 1.3.x1.4.x

As noted in #198:

Nobody's working on this, and while there's interest in it, I don't think we have a clear design. It's also possible to use multi-valued facets already, you just need to unpack them in your own matchspy class. So I think it isn't worth holding up 1.4.0 for.

I think the same applies here

comment:28 by Olly Betts, 20 months ago

Milestone: 1.4.x2.0.0
Version: SVN trunkgit master
Note: See TracTickets for help on using tickets.