Opened 13 years ago

Last modified 4 months ago

#224 new enhancement

Supply and optimise more OP_VALUE_ comparison operators

Reported by: Richard Boulton Owned by: visvesh
Priority: low Milestone: 1.4.x
Component: Matcher Version: git master
Severity: minor Keywords: GoodFirstBug
Cc: Blocked By:
Blocking: Operating System: All

Description (last modified by Olly Betts)

We currently supply OP_VALUE_GE and OP_VALUE_LE. It might be useful to also supply OP_VALUE_GT and OP_VALUE_LT, and possibly OP_VALUE_EQ.

Also, we currently supply a range operator which can be represented in terms of two comparison operators: ie, OP_VALUE_RANGE(1, a, b) is equivalent to: (OP_VALUE_GE(1, a) AND OP_VALUE_LE(1, b)). It would be nice to teach the query optimiser about this equivalence, so that a range specified as two comparison operators would be transformed into an efficient single ValueRangePostList internally, instead of the combination of two Value*Cmp*PostLists. Also, the ValueRangePostList could be made more flexible by adding flags to tell it whether each end of the range is inclusive or not, to allow other combinations of comparison operator to be represented as ranges.

Change History (11)

comment:1 by Richard Boulton, 13 years ago

Operating System: All
Status: newassigned

comment:2 by Olly Betts, 13 years ago

I think OP_VALUE_EQ would be ill-advised. If you want to find things matching an exact category, boolean terms are a lot more efficient, so OP_VALUE_EQ is just a tempting trap to disappoint users - for example:

http://article.gmane.org/gmane.comp.search.xapian.general/5481

If you really really know you want to do this (e.g. small collection, storing values for other reasons, smallest database size really matters) then you can just use OP_VALUE_RANGE with the same start and end.

comment:4 by Olly Betts, 12 years ago

Description: modified (diff)
Owner: changed from New Bugs to Olly Betts
Status: assignednew

comment:5 by Olly Betts, 9 years ago

Description: modified (diff)
Milestone: 1.2.x

Implementation can easily be upwardly API and ABI compatible.

comment:6 by Olly Betts, 8 years ago

Milestone: 1.2.x1.3.x

1.3.x material now, especially as the query internals have been rewritten in trunk.

comment:7 by Olly Betts, 5 years ago

Milestone: 1.3.x1.4.x

This isn't worth holding up 1.4.0 for - we haven't had user requests for such operators, and they could be added cleanly in a stable point release.

comment:8 by Olly Betts, 9 months ago

Keywords: GoodFirstBug added
Version: SVN trunkgit master

comment:9 by Olly Betts, 8 months ago

Type: defectenhancement

comment:10 by visvesh, 5 months ago

I am trying to do the first subpart, i.e. implement OP_VALUE_GT and OP_VALUE_LT. The component is labeled as "Matcher", but according to the API, I am supposed to make modifications to query.h in include directory and few more files in the api directory viz. query.cc, queryinternal.h, queryinternal.cc, and possibly some other files also. Why is the module labeled as "Matcher" instead of "Query"? Also I am unable to figure out the data type of data member internal in query object. Does its data type depend on the constructor called?

Last edited 5 months ago by visvesh (previous) (diff)

comment:11 by Olly Betts, 5 months ago

Owner: changed from Olly Betts to visvesh

Why is the module labeled as "Matcher" instead of "Query"?

The "Component" field in trac is really just one tool trac provides to try to impose a bit of order on the ticket pile, so I wouldn't read too much into what it's set to. It's often the case that a ticket doesn't actually fit completely into a single component. Free-form tagging via the "Keywords" field would allow multiple areas of the code to be specified, but you then get issues like someone using "matching" instead of "matcher" - with "Component" there's a pre-defined list of values to choose from. (One can also set a different default owner for a new ticket based on the component the submitter specifies, but currently we don't make use of that.)

The part you've chosen to look at does involve adding new operators to Query objects, but you'll also need to modify files under matcher/ to actually implement it.

We don't actually have a "Query" component in trac. "Library API" would cover that part, but I think overall the things discussed in this ticket are more about the matcher part than the API part.

Also I am unable to figure out the data type of data member internal in query object. Does its data type depend on the constructor called?

The internal member is a (reference-counted) pointer to a Xapian::Query::Internal object, but that's an abstract base class and it's concrete type will be a subclass of that, depending on the query operator/leaf subquery type (e.g. QueryTerm for a term, QueryAnd for OP_AND).

comment:12 by Boda, 4 months ago

Hey, I would like to work on this

Note: See TracTickets for help on using tickets.