Ticket #113 (assigned enhancement)
QueryParser limitation/inconsistency
| Reported by: | federico.schwindt | Owned by: | olly |
|---|---|---|---|
| Priority: | normal | Milestone: | 1.1.0 |
| Component: | QueryParser | Version: | SVN trunk |
| Severity: | minor | Keywords: | |
| Cc: | richard | Blocked By: | |
| Operating System: | All | Blocking: | |
Description (last modified by richard)
Hi,
I've been using Xapian (0.9.9 and now 0.9.10) recently at work, and I've found that the exquisite QueryParser (no irony intended) imposes some serious limitations on certain queries, because it treats some characters specially even when the flags do not include FLAG_PHRASE.
I'm talking about the method is_phrase_generator(). In the organization I work for, we have a mixed setup of HTML documents and code, which includes several references to text in the word_word format. Unfortunately, the QueryParser treats the underscore as a phrase generator, making it impossible to search for terms indexed using whitespace separators, even when allterms() shows that the term exists in the database.
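To make the mismatch concrete, here is a minimal sketch that does not use Xapian at all. `index_whitespace`, `parse_query`, and `matches` are hypothetical helpers, and the set of phrase-generator characters is an assumption: only the underscore is confirmed by the behaviour described above.

```python
# Hypothetical model of the mismatch; this is not Xapian code.
# Assumption: '_' is the only phrase generator modelled here.
PHRASE_GENERATORS = {"_"}


def index_whitespace(text):
    """Index a document by splitting on whitespace only."""
    return set(text.split())


def parse_query(query):
    """Split a query word on phrase-generator characters.

    Returns a tuple of terms; more than one term means the word was
    silently turned into a phrase.
    """
    terms, current = [], []
    for ch in query:
        if ch in PHRASE_GENERATORS:
            if current:
                terms.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        terms.append("".join(current))
    return tuple(terms)


def matches(index, query):
    """A phrase can only match if every one of its terms is indexed."""
    return all(term in index for term in parse_query(query))


index = index_whitespace("see foo_bar in the header")
print("foo_bar" in index)         # the term is present in the index
print(parse_query("foo_bar"))     # the query is split into two terms
print(matches(index, "foo_bar"))  # so the lookup cannot succeed
```

In this model the term is in the index, yet the query can never reach it, which is the situation I see with allterms().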
I believe this is both an inconsistency and a limitation in the QueryParser: regardless of which flags are used, whenever the query string contains any of the characters defined in is_phrase_generator(), the query is automatically converted to a phrase search (note that these characters can't be changed).
In an ideal world (mine at least), I'd expect the user to mark a phrase explicitly (using " or any other previously defined character); when that is not the case, the QueryParser should not try to convert the query to anything else (except for the defined operations: OR, AND, etc.).
OTOH, I could change the indexing to strip the underscores (and the other characters) and treat every part of word_word as a separate term, but that would mean that "word word" would match as well, which is not what I want.
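The drawback of that workaround can be sketched the same way. Again, this is a self-contained model and not Xapian code: `index_split` and `phrase_match` are hypothetical helpers, and positional phrase matching is simplified to "terms at consecutive positions".

```python
import re


def index_split(text):
    """Map each term to its positions, splitting on whitespace and '_'."""
    positions = {}
    for pos, term in enumerate(re.split(r"[\s_]+", text.strip())):
        positions.setdefault(term, []).append(pos)
    return positions


def phrase_match(positions, terms):
    """Return True if the terms occur at consecutive positions."""
    return any(
        all(start + i in positions.get(term, []) for i, term in enumerate(terms))
        for start in positions.get(terms[0], [])
    )


code_doc = index_split("call foo_bar here")   # contains the identifier
prose_doc = index_split("a foo bar story")    # only contains the two words

# A search for foo_bar becomes the phrase ("foo", "bar") and matches both.
print(phrase_match(code_doc, ["foo", "bar"]))
print(phrase_match(prose_doc, ["foo", "bar"]))
```

Once the underscore is dropped at index time, nothing distinguishes the identifier from the two plain words, so both documents are returned.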
I hope you will take this into consideration. Feel free to contact me if you need further details or if I can clarify anything else.
Many thanks,
f.-
