Ticket #128 (assigned enhancement)
Allow queryparser to treat some prefixes as literal text
| Reported by: | richard | Owned by: | richard |
|---|---|---|---|
| Priority: | normal | Milestone: | 1.1.0 |
| Component: | QueryParser | Version: | SVN trunk |
| Severity: | minor | Keywords: | |
| Cc: | olly, sidnei, mhammond | Blocked By: | |
| Operating System: | All | Blocking: | |
Description (last modified by richard)
By default, the query parser splits words at spaces and applies lower-casing, stemming, and other normalisation to generate terms.
I believe that it should be possible to override the query parser's default behaviour for fields with a given set of prefixes, such that the query parser will treat some terms as literal text, allowing any character to occur in the term (including spaces and quotes), and not applying stemming or other normalisation to the term.
My thinking is that this can be implemented by adding a third prefix type (which I've called "EXACT_TEXT" for want of a better name), which causes the query parser to put all the characters following the prefix, up to the next space or ')', into the term (as it does for terms with a "BOOL_FILTER" prefix type). The terms so generated are then included in the query structure in the same way as "FREE_TEXT" terms, i.e. they obey surrounding boolean operators and '+' and '-' prefixes.
In order to allow spaces (and ')' characters) in the terms, the query parser should support basic backslash escaping for the contents of such fields.
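To make the proposed scanning rule concrete, here is a minimal Python sketch of how the text after an "EXACT_TEXT" prefix might be read. It is only an illustration of the behaviour described above, not the code in the patch: the function name `read_exact_term` and its return shape are hypothetical, and the handling of a trailing backslash is an assumption that the description does not cover.

```python
def read_exact_term(query, start):
    """Read a literal term from query[start:].

    Hypothetical helper, not part of any real query parser API.
    Scanning stops at the first unescaped space or ')'. A backslash makes
    the following character literal. Returns (term, index of the first
    character that was not consumed).
    """
    chars = []
    i = start
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            # Escaped character: keep it verbatim and skip the backslash.
            chars.append(query[i + 1])
            i += 2
            continue
        if ch in " )":
            # Unescaped terminator: leave it for the caller to handle.
            break
        # Ordinary character (assumption: a trailing lone backslash also
        # lands here and is kept literally). No lower-casing or stemming.
        chars.append(ch)
        i += 1
    return "".join(chars), i


# The prefix "title:" is 6 characters long, so the term starts at index 6.
term, end = read_exact_term(r"title:Hello\ World) rest", 6)
print(term, end)  # Hello World 18
```

Note that the term keeps its original case and the escaped space, which is the point of the proposal: the caller would add it to the query unchanged, then carry on parsing from the returned index.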
I have a patch which implements this that I'll attach to this bug report shortly. The patch has a few test cases (but more are needed for such a new feature), and I've not written any documentation for it yet.
I know that Sidnei needs this for something he's working on, and I'd be delighted if we could get it into 1.0, since I'll have to maintain the patch until it is committed. However, it needs thorough review first, and the timescales for 1.0 may not allow for that.
