Opened 3 months ago

Last modified 3 months ago

#795 new enhancement

treat parenthesized subqueries the same for boolean and probabilistic fields

Reported by: bremner Owned by: Olly Betts
Priority: normal Milestone:
Component: QueryParser Version:
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description

Currently foo:(bar blah) is parsed as foo:bar OP foo:blah for probabilistic fields, but as "foo:(bar" OP blah for boolean fields. This is pretty surprising. I think it would be better in both cases to parse the subquery with foo as the default prefix.
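To make the asymmetry concrete, here is a minimal toy model in Python. It is not Xapian's QueryParser and does not call it: the function names `current_parse` and `proposed_parse` are hypothetical, tokenization is plain whitespace splitting, and the handling of the stray closing parenthesis simply mirrors how the result is written in the description above.

```python
def current_parse(query, boolean_fields):
    """Toy model of the behaviour described in this ticket (an assumption,
    not Xapian's real parser). Returns a list of (field, term) pairs."""
    field, _, rest = query.partition(":")
    if field in boolean_fields:
        # Boolean field: the term runs up to the next whitespace, so the
        # opening parenthesis becomes part of the term itself.
        first, *others = rest.split()
        # Remaining words are unprefixed; the trailing ")" is dropped here
        # only to match how the description writes the result.
        return [(field, first)] + [(None, w.rstrip(")")) for w in others]
    # Probabilistic field: the prefix is applied to each word in the group.
    return [(field, w) for w in rest.strip("()").split()]


def proposed_parse(query):
    """Proposed behaviour: both field types treat the group the same way."""
    field, _, rest = query.partition(":")
    return [(field, w) for w in rest.strip("()").split()]


print(current_parse("foo:(bar blah)", boolean_fields=set()))
print(current_parse("foo:(bar blah)", boolean_fields={"foo"}))
print(proposed_parse("foo:(bar blah)"))
```

In this sketch the boolean case yields a term containing the literal "(bar", while the proposed behaviour yields foo:bar and foo:blah regardless of the field type.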

Change History (3)

comment:1 by bremner, 3 months ago

I did some experiments with faking this in a field processor, by calling parse_query(stuff, prefix). I noticed two flaws with this approach:

1) If 'stuff' had punctuation in it, it was parsed as phrases, which is probably not what you want for boolean operators (e.g. notmuch tags use non-letter characters quite extensively).

2) It doesn't respect the field-specific default operator / exclusive status, because of course it has no idea what field the prefix is coming from.

comment:2 by Olly Betts, 3 months ago

If this was handled within the QueryParser then we could track if the current prefix is boolean or not and adapt parsing behaviour to suit.

In most cases, a term inside foo:( ... ) would be parsed as if it were actually prefixed with foo:.

In the non-boolean case, foo:(abc bar:xyz) switches prefix for xyz, so one option is to handle boolean prefixes the same way for consistency, rather than parsing this as foo:bar:xyz. But maybe the latter is better (in the non-boolean case : can't be part of the term, though it can be a phrase-generator).

comment:3 by bremner, 3 months ago

It's a bit of a tough call, but I think parsing foo:(bar baz:thunk) as foo:bar foo:baz:thunk is probably the least confusing, mainly because it is a bit late to tell people that boolean terms should not contain ":".
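The two treatments of an inner prefix discussed in comment:2 and comment:3 can be sketched as follows. This is only an illustration under stated assumptions: `expand_group` is a hypothetical helper, not part of Xapian, and it splits the group body on whitespace rather than using real query tokenization.

```python
def expand_group(field, body, boolean):
    """Expand the body of field:( ... ) into (field, term) pairs.

    Hypothetical helper for illustration only; it models the rule discussed
    in the comments, not any existing QueryParser code.
    """
    terms = []
    for word in body.split():
        inner, sep, value = word.partition(":")
        if sep and not boolean:
            # Non-boolean case: an inner prefix switches the field.
            terms.append((inner, value))
        else:
            # Boolean case as preferred in comment:3: the whole word,
            # colon included, is a term under the outer field.
            terms.append((field, word))
    return terms


# Boolean field: foo:(bar baz:thunk) -> foo:bar foo:baz:thunk
print(expand_group("foo", "bar baz:thunk", boolean=True))
# Non-boolean field: foo:(abc bar:xyz) -> foo:abc bar:xyz
print(expand_group("foo", "abc bar:xyz", boolean=False))
```

Keeping the colon inside the term for boolean fields means existing boolean terms that contain ":" keep working, at the cost of the boolean and non-boolean cases differing on inner prefixes.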
