Opened 16 years ago

Closed 8 years ago

Last modified 8 years ago

#234 closed enhancement (fixed)

add an option to specify whether filter terms of a given prefix should be ORed or ANDed together

Reported by: tv+xapian.org
Owned by: Olly Betts
Priority: normal
Milestone: 1.3.4
Component: Omega
Version: SVN trunk
Severity: minor
Keywords:
Cc: nonijil@…
Blocked By:
Blocking:
Operating System: All

Description (last modified by Olly Betts)

Hi,

the patch at http://people.debian.org/~tviehmann/list-search/xapian_omega_add_option_filter_defaultop.diff adds an option map to allow overriding the filter behaviour from OR to AND among the terms of a given prefix. For example, if first and last name are indexed with prefix A, I would add

$setmap{filter_defaultop,A,AND}

to the query template in order to be able to handle first, last, or first and last name entered into the appropriate fields.

Kind regards

Thomas

URL: http://people.debian.org/~tviehmann/list-search/xapian_omega_add_option_filter_defaultop.diff

Attachments (2)

xapian_omega_add_option_filter_defaultop.diff (610 bytes) - added by tv+xapian.org 16 years ago.
patch as described
diff4ORopOption (533 bytes) - added by laserbled 13 years ago.
This is the diff to have an option to OR the prefix. We have the AND operator as the default. This is a behavioral change likely to be accompanied with this patch to provide the user with such an option.


Change History (12)

by tv+xapian.org, 16 years ago

patch as described

comment:1 by Olly Betts, 16 years ago

op_sys: Linux → All
Owner: changed from New Bugs to Olly Betts
rep_platform: PC → All

comment:2 by Olly Betts, 16 years ago

Status: new → assigned

comment:3 by Olly Betts, 16 years ago

Operating System: All
Summary: add an option to specifiy whether filter terms of a given prefix should be ORed or ANDed together → add an option to specify whether filter terms of a given prefix should be ORed or ANDed together

I tend to feel that this particular case would be better handled by allowing Omega to parse the author field probabilistically - the current approach comes from a nasty hack I used for Gmane I believe.

But there might be uses for AND rather than OR on the same prefix, so the patch might still be worthwhile. It would need to be a case where the filter terms are overlapping but not just in a subset/superset way.

in reply to:  3 comment:5 by Carl Worth, 14 years ago

Replying to olly:

But there might be uses for AND rather than OR on the same prefix, so the patch might still be worthwhile. It would need to be a case where the filter terms are overlapping but not just in a subset/superset way.

The notmuch email program definitely has a use for this. There we tag various emails, so I might want to search for emails with "tag:xapian tag:bugs" and I would want those to be ANDed together.

The last time I investigated this issue, I remember seeing the proposal to indicate on a pure-prefix basis whether OR or AND was desired (or perhaps the proposed semantic was to indicate whether a given prefix could appear multiply or not, with the appropriate operator chosen accordingly).

I'm not seeing that proposal in any open ticket now though. Am I misremembering, or am I just failing to find it?

-Carl

comment:6 by Olly Betts, 14 years ago

Description: modified (diff)

This ticket is a different issue really - it's about how Omega builds a filter from multiple B CGI parameters, whereas your issue is about the QueryParser's handling of boolean filter terms. It would be better not to hijack this ticket for another issue, and there's already #402 for your issue.

by laserbled, 13 years ago

Attachment: diff4ORopOption added

This is the diff to have an option to OR the prefix. We have the AND operator as the default. This is a behavioral change likely to be accompanied with this patch to provide the user with such an option.

comment:7 by laserbled, 13 years ago

Cc: nonijil@… added

comment:8 by Olly Betts, 12 years ago

Milestone: 1.3.x

comment:9 by Olly Betts, 8 years ago

diff4ORopOption is changing the wrong thing - that's the default operator used for "probabilistic" queries (i.e. for fields where the text is split into words at whitespace, etc.). I think this patch was probably meant for #512.

This ticket is about allowing control of the operator used when there are multiple filters in the query - e.g.:

baked potatoes host:recipes.com host:cooking.org type:pdf type:ps

The current Omega logic is that filters with the same prefix are OR-ed together, and then those groups are AND-ed together - the above example is interpreted as:

(baked potatoes) AND (host:recipes.com OR host:cooking.org) AND (type:pdf OR type:ps)
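
The grouping described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Omega's actual code: the function name `build_filter_expression` is made up for the example, filters are assumed to be written as `prefix:value` strings, and the result is a readable string rather than a Xapian query object.

```python
def build_filter_expression(query, filters):
    """Combine a free-text query with filters (hypothetical helper).

    Filters sharing a prefix are OR-ed together; the resulting groups
    are then AND-ed with each other and with the free-text query.
    """
    groups = {}  # prefix -> filters with that prefix, in first-seen order
    for f in filters:
        prefix = f.partition(":")[0]
        groups.setdefault(prefix, []).append(f)

    parts = []
    if query:
        parts.append("(" + query + ")")
    for members in groups.values():
        parts.append("(" + " OR ".join(members) + ")")
    return " AND ".join(parts)


expr = build_filter_expression(
    "baked potatoes",
    ["host:recipes.com", "host:cooking.org", "type:pdf", "type:ps"],
)
print(expr)
```

Run on the example query, this prints the same interpretation as shown above.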

Generally that does the sensible thing - it's certainly what you want for type: (where a document only has one type), or for a case where it can have multiple values, but they're strictly supersets/subsets (if you index a host: term for each level of the domain, e.g. documents on vegan.recipes.com could be selected by any of the filters host:vegan.recipes.com, host:recipes.com or host:com, then OR-ing multiple host: filters also makes most sense).

But sometimes a filter prefix has overlapping terms, and you actually want them AND-ed. E.g. for colour: you might want colour:red colour:blue to return only the items which are part red and part blue.

Thomas' original patch is along the right lines - we should probably apply something like this before 1.4.0.

comment:10 by Olly Betts, 8 years ago

Milestone: 1.3.x → 1.3.5

Grouping the omega 1.3.* tickets on 1.3.5.

comment:11 by Olly Betts, 8 years ago

Milestone: 1.3.5 → 1.3.4
Resolution: → fixed
Status: assigned → closed

Implemented in [e077e8f70b83955b19f5c05368f44b9ad1846c21].

I've used the terminology "non-exclusive prefix", as that matches what the QueryParser::add_boolean_prefix() method calls a similar thing:

$setmap{nonexclusiveprefix,XCOLOUR,true}

Setting this affects both filter terms from B CGI parameters (e.g. B=XCOLOURred&B=XCOLOURblue) and those from parsing the query string (e.g. colour:red colour:blue).
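
As a rough illustration of what marking a prefix non-exclusive changes, here is a small Python sketch. It is not Omega's implementation: `combine_filters` and its `nonexclusive_prefixes` argument are hypothetical names, and it assumes (as a simplification) that a term's prefix is its leading run of capital letters and that the value starts with a lowercase letter.

```python
import re


def combine_filters(terms, nonexclusive_prefixes=frozenset()):
    """Combine boolean filter terms into a readable expression (sketch).

    Terms with the same prefix are OR-ed by default, or AND-ed when the
    prefix is listed in nonexclusive_prefixes. Groups are always AND-ed.
    """
    groups = {}  # prefix -> terms with that prefix, in first-seen order
    for term in terms:
        # Assumption: the prefix is the leading run of capital letters.
        prefix = re.match(r"[A-Z]*", term).group(0)
        groups.setdefault(prefix, []).append(term)

    parts = []
    for prefix, members in groups.items():
        op = " AND " if prefix in nonexclusive_prefixes else " OR "
        parts.append("(" + op.join(members) + ")")
    return " AND ".join(parts)


terms = ["XCOLOURred", "XCOLOURblue"]
print(combine_filters(terms))               # default: same prefix is OR-ed
print(combine_filters(terms, {"XCOLOUR"}))  # non-exclusive: same prefix is AND-ed
```

With the default the two colour terms are OR-ed; with XCOLOUR listed as non-exclusive they are AND-ed, so only items that are both part red and part blue would match.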

Last edited 8 years ago by Olly Betts