Opened 13 years ago
Closed 12 years ago
#562 closed enhancement (fixed)
QueryParser incorrectly generates stemmed terms for prefixed fields
Reported by: | Vitaliy Filippov | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.2.11 |
Component: | QueryParser | Version: | 1.2.6 |
Severity: | minor | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
QueryParser generates stemmed terms for prefixed fields as "PREFIX + stem", but TermGenerator indexes them as "Z + PREFIX + stem", so a search on stemmed terms inside prefixed fields returns incorrect results.
Example query: "title:идея". Under STEM_ALL it generates the query Xapian::Query(Sиде:(pos=1)), where "иде" is the stem of the Russian word "идея". However, TermGenerator has indexed the stemmed term for the title as "ZSиде", so the search won't return the correct results.
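The mismatch can be sketched with a small toy model. This is plain string manipulation and does not call Xapian; `toy_stem`, `indexed_stem_term` and `query_stem_term_stem_all` are hypothetical names that only mirror the term shapes described in this ticket, with "S" as the title prefix from the example.

```python
# Toy model only: plain strings, no Xapian calls.

def toy_stem(word):
    # Hypothetical stand-in for a real stemmer; it only knows the ticket's example.
    return {"идея": "иде"}.get(word, word)

def indexed_stem_term(prefix, word):
    # TermGenerator, as described in the ticket: "Z" + PREFIX + stem.
    return "Z" + prefix + toy_stem(word)

def query_stem_term_stem_all(prefix, word):
    # QueryParser under STEM_ALL, as described in the ticket: PREFIX + stem.
    return prefix + toy_stem(word)

indexed = indexed_stem_term("S", "идея")
queried = query_stem_term_stem_all("S", "идея")

# The two terms differ, so the query term is not found in the index.
print(indexed, queried, indexed == queried)
```

The only difference between the two terms is the leading "Z", which is enough for the query term to miss the indexed one.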
Attachments (2)
Change History (8)
by , 13 years ago
Attachment: | patch-bug562.diff added |
---|
comment:1 by , 13 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
The bug here is really that TermGenerator doesn't support generating terms for QueryParser's STEM_ALL mode, which is what #563 is asking for. If you're stemming all the terms, why add a "Z" in front of them all?
comment:2 by , 13 years ago
Resolution: | duplicate |
---|---|
Status: | closed → reopened |
Hm. Interesting. You could put it that way. But I think always adding "Z" in front of all terms is better, because you then have a choice: search this database with STEM_ALL or without STEM_ALL, and both will work! You'll also be able to search normal databases with STEM_ALL, or do more complex queries with exact terms. If you just say "search only the #563-style databases using STEM_ALL", then you don't have these options.
comment:3 by , 13 years ago
Severity: | major → minor |
---|---|
Type: | defect → enhancement |
Then it needs to be a new STEM_<something> mode - we can't change the long-defined meaning of STEM_ALL.
comment:4 by , 13 years ago
This patch provides matching modes for QueryParser and TermGenerator.
The mode for indexing can take the following 4 values:
- STEM_NONE: Don't index any stemmed words.
- STEM_SOME: Index both stemmed and full (non-stemmed) words. (Note: the stemmed words have the "Z" prefix.)
- STEM_ALL: Index only stemmed words. (Note: the stemmed words don't have the "Z" prefix.)
- STEM_ALL_Z: Index only stemmed words. (Note: the stemmed words have the "Z" prefix.)
Correspondingly, a new stemming strategy QueryParser::STEM_ALL_Z has been introduced.
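The four indexing modes listed above can be sketched as a toy model of which terms each mode would produce for a single word. This is not Xapian code: `toy_stem` and `index_terms` are hypothetical helpers, the mode names are plain strings, and the assumption that STEM_NONE still indexes the unstemmed word is a reading of the list rather than something the patch is confirmed to do.

```python
# Toy model only: plain strings, no Xapian calls.

def toy_stem(word):
    # Hypothetical stand-in for a real stemmer; it only knows the ticket's example.
    return {"идея": "иде"}.get(word, word)

def index_terms(mode, prefix, word):
    # Return the terms this toy model would index for one word in a prefixed field.
    stem = toy_stem(word)
    if mode == "STEM_NONE":
        return [prefix + word]                      # unstemmed form only (assumed)
    if mode == "STEM_SOME":
        return [prefix + word, "Z" + prefix + stem] # full word plus Z-prefixed stem
    if mode == "STEM_ALL":
        return [prefix + stem]                      # stem only, no "Z"
    if mode == "STEM_ALL_Z":
        return ["Z" + prefix + stem]                # stem only, with "Z"
    raise ValueError("unknown mode: " + mode)

for mode in ("STEM_NONE", "STEM_SOME", "STEM_ALL", "STEM_ALL_Z"):
    print(mode, index_terms(mode, "S", "идея"))
```

In this model a query-side strategy matches an index only when both sides agree on whether the stem carries the "Z", which is why the patch pairs each indexing mode with a QueryParser strategy of the same name.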
comment:5 by , 12 years ago
Description: | modified (diff) |
---|---|
Milestone: | → 1.2.11 |
The half of the patch which adds QueryParser::STEM_ALL_Z has been applied to trunk in r16626, along with test coverage for the new feature.
Marking to backport to 1.2.11.
comment:6 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Patch to fix this bug