Opened 12 years ago

Last modified 4 years ago

#199 assigned enhancement

Merge MultiValueCountMatchSpy

Reported by: richard Owned by: richard
Priority: normal Milestone: 1.4.x
Component: Library API Version: SVN trunk
Severity: normal Keywords:
Cc: Blocked By: #198
Blocking: Operating System: All

Description (last modified by richard)

matchspy.h on the matchspy branch contains a MultiValueCountMatchSpy implementation, which allows a document to have multiple values in a single slot. This is definitely a useful feature when implemented faceting systems, but the current implementation is a bit unpleasant: it uses a pair of public classes to serialise and unserialise lists of values to be held in the slots.

A nicer approach would be to make values capable of storing multiple values. (See bug #198).

Attachments (4)

matchspy.patch (90.2 KB) - added by richard 10 years ago.
Patch for this - merge of most of matchspy branch
matchspy2.patch (90.0 KB) - added by richard 10 years ago.
Updated patch (with all files included this time)
matchspy3.patch (90.2 KB) - added by richard 10 years ago.
Patch as applied to trunk
matchspy_changes_13246_13247.patch (16.5 KB) - added by richard 10 years ago.
Remaining diffs on the branch

Download all attachments as: .zip

Change History (30)

comment:1 Changed 12 years ago by richard

  • Blocking 154 added

comment:2 Changed 12 years ago by richard

  • Blocked By 198 added

comment:3 Changed 12 years ago by olly

  • Status changed from new to assigned

comment:4 Changed 12 years ago by richard

  • Blocking 200 added; 154 removed
  • Summary changed from Tidy up matchspy.h before release to Tidy up matchspy.h for next release

Olly's backed the matchspy stuff out of SVN HEAD for now, so we've got until the 1.0.4 release to tidy this up. Changing to block #200 instead of #154.

comment:5 Changed 12 years ago by olly

  • Blocking 160 added; 200 removed

Moving to 1.1

comment:6 Changed 12 years ago by olly

  • Operating System set to All
  • Summary changed from Tidy up matchspy.h for next release to Tidy up matchspy.h for inclusion in a release

Reword summary

comment:8 Changed 12 years ago by richard

  • Description modified (diff)
  • Milestone set to 1.1

comment:9 Changed 12 years ago by richard

  • Blocking 160 removed

comment:10 Changed 11 years ago by olly

  • Owner changed from newbugs to richard
  • Status changed from assigned to new

comment:11 Changed 11 years ago by olly

  • Milestone changed from 1.1.0 to 1.1.1

Bumping milestone to 1.1.1 as there are unresolved issues and this isn't an incompatible change.

comment:12 Changed 11 years ago by olly

  • Milestone changed from 1.1.1 to 1.1.4

Triaging milestone:1.1.1 bugs.

comment:13 Changed 10 years ago by olly

  • Priority changed from normal to high

Changed 10 years ago by richard

Patch for this - merge of most of matchspy branch

comment:14 Changed 10 years ago by richard

  • Status changed from new to assigned

Attached matchspy.patch. This patch merges most of the changes on the matchspy branch to trunk, with the exception of the code which deals with a way to support multiple values in a single slot (eg StringListSerialiser?, and MultiValueMatchSpy?).

It also addresses ticket #183.

I havn't yet run coverage tests on this patch, but the code is in pretty good shape, I think (I did do coverage tests on some earlier versions of the code in the matchspy branch). I'll report back here when I've done some coverage tests.

comment:15 Changed 10 years ago by olly

A few comments:

  • As I mentioned on IRC, MultiMatchSpy can probably just have a reference to the vector of MatchSpy objects in Xapian::Enquire rather than making a copy.
  • If that works, then we can just make the MultiMatchSpy object an auto variable (since it's just a wrapper around a reference member variable) which avoids having to allocate it on the heap and use AutoPtr to manage its cleanup.
  • The "new" matchspy is applied after matchdecider, but the old one was applied before. Without thinking about it much, I think this is probably saner, but I thought I should point it out in case it's a mistake.
  • This adds some "naked" structs to the API, which isn't something we currently have, and tends to tie down the implementation rather a lot, so I think we need to carefully consider doing this. Classes with private members readable via getters inlined from the API headers would generate equally good code, not restrict future changes as much, and may be easier to wrap nicely with SWIG.

I noticed a few typos and minor tweaks I don't think are at all controversial, so I've just committed changes for those as that's less work that trying to describe them...

Not completely finished looking through, but it's nearing bedtime so I'll take a further look later.

comment:16 Changed 10 years ago by richard

Applying the new matchspy after the match decider was deliberate, since it should only really be applied to the documents which get into the mset. However, I don't think it's very important since we plan to deprecate matchdecider soon.

I've applied patches for all the other comments, since I agreed with them!

Changed 10 years ago by richard

Updated patch (with all files included this time)

Changed 10 years ago by richard

Patch as applied to trunk

comment:17 Changed 10 years ago by richard

Matchspy branch mostly merged to trunk. The outstanding parts are: StringListSerialiser? (and unserialiser), and MultiValueCountMatchSpy?. These are waiting on a resolution to ticket #198.

Changed 10 years ago by richard

Remaining diffs on the branch

comment:18 Changed 10 years ago by olly

  • Priority changed from high to low

The remaining changes are API-compatible additions, so updating the priority to "low".

comment:19 Changed 10 years ago by olly

  • Summary changed from Tidy up matchspy.h for inclusion in a release to Merge MultiValueCountMatchSpy
  • Type changed from defect to enhancement

Retitling to something more appropriate (I figure the MatchSpy subclass is the interesting one - whether it uses StringListSerialiser or multi-values or something else is largely an implementation detail).

Also marking as "enhancement".

comment:20 Changed 10 years ago by olly

  • Milestone changed from 1.1.4 to 1.2.0

Bumping to stay on track for release.

comment:21 Changed 10 years ago by richard

  • Description modified (diff)

Update summary to reflect the current state: all the other matchspies which were mentioned are now gone.

comment:22 Changed 10 years ago by richard

  • Description modified (diff)

comment:23 Changed 10 years ago by richard

  • Priority changed from low to normal

I've been looking at ticket #198, and I'm unconvinced it's something which we're going to be able to support without an excessive amount of work, and not until 1.3.0 at the earliest (and even then, not for the chert backend). In the interests of making it possible in the reasonably near future to do facet calculations with multiple values in a slot in a single document, I think the StringListSerialiser? concept is the way to go. It can probably be tidied up quite a bit, though.

On thought is that if we're adding this, we should probably add support for using values serialised with StringListSerialiser? for sorting (ie, add a KeyMaker? which understands the serialised format, and just picks the first string). We might be able to find nicer names, and a nicer API for the classes, too.

Side note: this ticket was marked as priority:low. I think that the lack of this will be a frequent complaint when users start trying to use the faceting support in the 1.2.0 release, so it should at least be a priority:normal ticket.

comment:24 Changed 10 years ago by olly

The "low" priority was from prioritising bugs to address (or not) before 1.2.0. Generally you can't read much into our bug priorities anyway, at least historically.

This certainly isn't 1.2.0 material, and right now I'd rather not be too distracted by discussion of issues which aren't relevant for that goal, so let's just see how the faceting support is actually received rather than trying to guess what user reaction might be.

comment:25 Changed 8 years ago by laurentm

Hi,

I need to store tags for documents in Xapian. A document can have multiple tags, so I'm looking for something like MultiValueCountMatchSpy? to do faceting search. (IMHO, this is a frequent need.)

I'm already using ValueCountMatchSpy? for single-valued fields, this is working nicely and I'm an happy xapian user :)

I hope to see this feature in the next release.

Thanks. Laurent

comment:26 Changed 7 years ago by olly

  • Milestone changed from 1.2.x to 1.3.x

This isn't 1.2.x material now.

comment:27 Changed 4 years ago by olly

  • Milestone changed from 1.3.x to 1.4.x

As noted in #198:

Nobody's working on this, and while there's interest in it, I don't think we have a clear design. It's also possible to use multi-valued facets already, you just need to unpack them in your own matchspy class. So I think it isn't worth holding up 1.4.0 for.

I think the same applies here

Note: See TracTickets for help on using tickets.