Opened 10 years ago

Closed 9 years ago

#663 closed enhancement (fixed)

Make ValueRangeProcessor return a Query object

Reported by: Olly Betts Owned by: Olly Betts
Priority: normal Milestone: 1.3.6
Component: QueryParser Version: SVN trunk
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description

Currently VRP takes two strings by reference, potentially modifies them, and returns a value slot number.

It would be more flexible and easier to wrap for the bindings if it took two strings by const reference and returned a Query object instead. There's an example in the getting started guide suggesting adding boolean terms to restrict the number of docs a VRP has to consider, but this doesn't really work as you can't insert an OP_FILTER for such terms at the right place in the query tree - this change would allow that to work nicely.

We could provide this in 1.4 by having two overloaded virtual forms of the operator()() method with the existing and new signature, and have a default implementation of the new method which calls the old.

Change History (8)

comment:1 by Olly Betts, 10 years ago

Milestone: 1.3.31.3.4
Status: newassigned

Perhaps a cleaner way to provide this is to add a new RangeProcessor class, since there would no longer be a tie to using OP_VALUE_RANGE for the resulting query (though it would probably be the most common implementation). Then ValueRangeProcessor can be deprecated.

We should think if there are any other aspects of VRPs we'd like to adjust, as this would be a good time to address them.

Bumping to 1.3.4 so we can think about this a bit more.

comment:2 by Olly Betts, 10 years ago

Blocking: 193 added

(In #193) Ah, that's #663.

comment:3 by Olly Betts, 10 years ago

Blocking: 193 removed

This would address #193 too.

One issue I noticed: QueryParser wants to know which slot the range is for, so it knows about combining ranges on the same slot with OP_OR and then combine these groups with OP_AND. That isn't unsurmountable, but would probably need us to return both a Query object and some sort of string or number "key" to identify the type of range. Or perhaps this is an overly fussy little detail, which most people wouldn't expect us to worry about. FieldProcessor just returns a Query object after all.

comment:4 by Olly Betts, 9 years ago

We can check the type of the returned query with Query::get_type() and could easily add a method to return the slot for a range query. But then returning different query types for some cases wouldn't work so well.

Perhaps some sort of OP_FILTER_KEY operator, which takes a single subquery and just delegates to it when evaluated, but serves as something which can store the "key" above (and could be added by FieldProcessor as well). If we go that way, a clear name would be good.

This is a fairly simple change (once we have a plan) and the benefit seems high so I'd quite like to try to do it for 1.4.0.

comment:5 by Olly Betts, 9 years ago

Milestone: 1.3.41.3.5

Ticket retargeted after milestone closed

comment:6 by Olly Betts, 9 years ago

Milestone: 1.3.51.3.x

Changing milestone as I want to get 1.3.5 out.

comment:7 by Olly Betts, 9 years ago

Milestone: 1.3.x1.3.6

I have been working on this.

I've tried making operator() return std::pair<Xapian::Query, std::string> but it makes writing a custom range processor class feel rather clumsy. A few details:

  • The second parameter is a "grouping key" - I used str(slot) for value ranges
  • I used std::make_pair(Xapian::Query(), std::string()) to mean "range not handled"
  • I provided a helper method make_value_range(Xapian::valueno slot, const std::string& begin, const std::string& end) to create a value range (with slot == Xapian::BAD_VALUENO meaning "range not handled", and end.empty() meaning OP_VALUE_GE), which helps a bit
  • FieldProcessor really ought to be consistent with this - we could change its API incompatibly at this point, but I am aware people have already started to use it

I'm wondering if the OP_FILTER_KEY approach would work better overall - the key benefits I can see are:

  • Simple return type (Xapian::Query)
  • Simple cases (prefixed terms, value ranges) can just work automatically by introspecting the returned Query object
  • Complex cases (e.g. wanting to group date: range processor and date: field processor together, and grouping a range which may be optimised using terms) are supported
  • Something like OP_NULL can cleanly signify "range not handled" - having a "no query" Query object (distinct from a Query which matches nothing) seems like it would be more widely useful

comment:8 by Olly Betts, 9 years ago

Resolution: fixed
Status: assignedclosed

Fixed in [cca8b4a5cf6f68bc44b26aff00a9a6ccb7925d74].

After trying various approaches, I've made the return type Xapian::Query with a new OP_INVALID op for "range not handled". But rather than OP_FILTER_KEY, the grouping key can optionally be explicitly specified to QueryParser methods add_boolean_prefix() and add_rangeprocessor(), with sane defaults if not specified.

Note: See TracTickets for help on using tickets.