Opened 14 years ago
Closed 13 years ago
#562 closed enhancement (fixed)
QueryParser incorrectly generates stemmed terms for prefixed fields
| Reported by: | Vitaliy Filippov | Owned by: | Olly Betts |
|---|---|---|---|
| Priority: | normal | Milestone: | 1.2.11 |
| Component: | QueryParser | Version: | 1.2.6 |
| Severity: | minor | Keywords: | |
| Cc: | | Blocked By: | |
| Blocking: | | Operating System: | All |
Description
QueryParser generates stemmed terms for prefixed fields as "PREFIX + stem", but TermGenerator indexes them as "Z + PREFIX + stem", so searches on stemmed terms inside prefixed fields return incorrect results.
Example query: "title:идея". Under STEM_ALL it produces Xapian::Query(Sиде:(pos=1)) ("иде" is the stem of the Russian word "идея"). But TermGenerator has indexed the stemmed term for the title as "ZSиде", so the search won't return the correct results.
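The mismatch can be illustrated with a small toy model. This is only a sketch of the term shapes described above, not Xapian code: `TOY_STEMS`, `toy_stem`, `indexed_terms` and `stem_all_query_term` are hypothetical names, the lookup table stands in for a real Russian stemmer, and the assumption that the unstemmed "PREFIX + word" form is indexed alongside the stemmed one is mine rather than something the description states.

```python
# Toy model of the term mismatch; not Xapian's real implementation.
# The lookup table is a stand-in for a real Russian stemmer.
TOY_STEMS = {"идея": "иде"}


def toy_stem(word):
    """Return the stem from the toy table, or the word itself if unknown."""
    return TOY_STEMS.get(word, word)


def indexed_terms(word, prefix):
    """Terms assumed to be stored at index time for one word in a prefixed field.

    The stemmed form gets "Z" in front of the field prefix, as described in
    the ticket; the unstemmed form is an assumption of this sketch.
    """
    return {prefix + word, "Z" + prefix + toy_stem(word)}


def stem_all_query_term(word, prefix):
    """Term the ticket says the query side builds under STEM_ALL (no "Z")."""
    return prefix + toy_stem(word)


index = indexed_terms("идея", "S")
query = stem_all_query_term("идея", "S")

print("indexed:", sorted(index))
print("query term:", query)
print("query term present in index:", query in index)
```

In this model the query term matches neither indexed form, which is the failure the report describes.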
Attachments (2)
Change History (8)
by , 14 years ago
| Attachment: | patch-bug562.diff added |
|---|---|
comment:1 by , 14 years ago
| Resolution: | → duplicate | 
|---|---|
| Status: | new → closed | 
The bug here is really that TermGenerator doesn't support generating terms for QueryParser's STEM_ALL mode, which is what #563 is asking for. If you're stemming all the terms, why add a "Z" in front of them all?
comment:2 by , 14 years ago
| Resolution: | duplicate | 
|---|---|
| Status: | closed → reopened | 
Hm, interesting. You could put it that way. But I think always adding "Z" in front of all terms is better, because you then have a choice: you can search this database with or without STEM_ALL, and both will work. You can also search normal databases with STEM_ALL, or run more complex queries with exact terms. If you only allow #563-style databases to be searched using STEM_ALL, you lose these options.
comment:3 by , 14 years ago
| Severity: | major → minor | 
|---|---|
| Type: | defect → enhancement | 
Then it needs to be a new STEM_<something> mode - we can't change the long-defined meaning of STEM_ALL.
comment:4 by , 14 years ago
This patch provides matching modes for QueryParser and TermGenerator.
The indexing mode can take the following four values:
 - STEM_NONE: Don't index any stemmed words.
 - STEM_SOME: Index both stemmed and full (non-stemmed) words. (Note: stemmed words carry the "Z" prefix.)
 - STEM_ALL: Index only stemmed words. (Note: stemmed words don't have the "Z" prefix.)
 - STEM_ALL_Z: Index only stemmed words. (Note: stemmed words have the "Z" prefix.)
 
Correspondingly a new stemming strategy QueryParser::STEM_ALL_Z has been introduced.
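The four modes listed in this comment can be sketched as a small function. This is a toy model under stated assumptions, not the patch itself: `terms_for`, `toy_stem` and `TOY_STEMS` are hypothetical names, the strategies are plain strings rather than the library's enum values, and the lookup table stands in for a real stemmer.

```python
# Toy model of the four indexing modes described in this comment.
# Not taken from the patch; all names here are illustrative only.
TOY_STEMS = {"идея": "иде"}  # stand-in for a real stemmer


def toy_stem(word):
    """Return the stem from the toy table, or the word itself if unknown."""
    return TOY_STEMS.get(word, word)


def terms_for(words, strategy, prefix=""):
    """Return the list of terms the given mode would produce for `words`."""
    terms = []
    for word in words:
        stem = toy_stem(word)
        if strategy == "STEM_NONE":
            # Full words only, no stemmed forms.
            terms.append(prefix + word)
        elif strategy == "STEM_SOME":
            # Full word plus the stemmed form marked with "Z".
            terms.append(prefix + word)
            terms.append("Z" + prefix + stem)
        elif strategy == "STEM_ALL":
            # Stemmed form only, without "Z".
            terms.append(prefix + stem)
        elif strategy == "STEM_ALL_Z":
            # Stemmed form only, with "Z".
            terms.append("Z" + prefix + stem)
        else:
            raise ValueError("unknown strategy: " + strategy)
    return terms


for mode in ("STEM_NONE", "STEM_SOME", "STEM_ALL", "STEM_ALL_Z"):
    print(mode, terms_for(["идея"], mode, prefix="S"))
```

Under this model, a database built with STEM_SOME or STEM_ALL_Z contains the "Z"-prefixed term that a STEM_ALL_Z query would look for, while a STEM_ALL database only matches STEM_ALL queries.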
comment:5 by , 13 years ago
| Description: | modified (diff) | 
|---|---|
| Milestone: | → 1.2.11 | 
The half of the patch which adds QueryParser::STEM_ALL_Z has been applied to trunk in r16626, along with test coverage for the new feature.
Marking to backport to 1.2.11.
comment:6 by , 13 years ago
| Resolution: | → fixed | 
|---|---|
| Status: | reopened → closed | 

Patch to fix this bug