Opened 9 years ago

Closed 8 years ago

#562 closed enhancement (fixed)

QueryParser incorrectly generates stemmed terms for prefixed fields

Reported by: Vitaliy Filippov
Owned by: Olly Betts
Priority: normal
Milestone: 1.2.11
Component: QueryParser
Version: 1.2.6
Severity: minor
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description (last modified by Olly Betts)

QueryParser generates stemmed terms for prefixed fields as "PREFIX + stem", but TermGenerator indexes them as "Z + PREFIX + stem", so searching for stemmed terms inside prefixed fields returns incorrect results.

Example query: "title:идея". Under STEM_ALL it generates Xapian::Query(Sиде:(pos=1)), where "иде" is the stem of the Russian word "идея". However, TermGenerator has indexed the stemmed term for the title as "ZSиде", so the search does not return the correct results.
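The mismatch can be illustrated with a small stand-alone model that does not call Xapian. This is only a sketch: the helper names (`toy_stem`, `index_terms`, `query_term_stem_all`) are hypothetical, and the stemmer is a lookup table holding just the one stem quoted above. The term layouts follow the description in this ticket.

```python
# Toy model of the term mismatch; this does not use the Xapian API.
# All function names here are hypothetical and exist only for illustration.

# Stand-in stemmer: it only knows the single stem quoted in this ticket.
TOY_STEMS = {"идея": "иде"}


def toy_stem(word):
    """Return the toy stem for a word, or the word itself if unknown."""
    return TOY_STEMS.get(word, word)


def index_terms(word, prefix=""):
    """Terms indexed for one word, as this ticket describes TermGenerator:
    the unstemmed term plus a stemmed term marked with "Z"."""
    return {prefix + word, "Z" + prefix + toy_stem(word)}


def query_term_stem_all(word, prefix=""):
    """Term looked up under STEM_ALL, as this ticket describes QueryParser:
    prefix + stem, with no "Z" marker."""
    return prefix + toy_stem(word)


indexed = index_terms("идея", prefix="S")
wanted = query_term_stem_all("идея", prefix="S")

print(sorted(indexed))
print(wanted, "in index:", wanted in indexed)
```

In this model the query term is never a member of the indexed set, because the only stemmed term in the index carries the leading "Z".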

Attachments (2)

patch-bug562.diff (446 bytes) - added by Vitaliy Filippov 9 years ago.
Patch to fix this bug
modified.patch (7.0 KB) - added by Sehaj Singh Kalra 8 years ago.
Combined patch for this ticket (#562) as well as #563

Change History (8)

by Vitaliy Filippov, 9 years ago

Attachment: patch-bug562.diff added

Patch to fix this bug

comment:1 by Olly Betts, 9 years ago

Resolution: duplicate
Status: new → closed

The bug here is really that TermGenerator doesn't support generating terms for QueryParser's STEM_ALL mode, which is what #563 is asking for. If you're stemming all the terms, why add a "Z" in front of them all?

comment:2 by Vitaliy Filippov, 9 years ago

Resolution: duplicate → (none)
Status: closed → reopened

Hm, interesting. You could put it that way. But I think always adding "Z" in front of all terms is better, because it leaves you a choice: you can search the same database with STEM_ALL or without STEM_ALL, and both will work. You can also search normal databases with STEM_ALL, or run more complex queries with exact terms. If STEM_ALL can only search #563-style databases, you lose those options.

comment:3 by Olly Betts, 9 years ago

Severity: major → minor
Type: defect → enhancement

Then it needs to be a new STEM_<something> mode - we can't change the long-defined meaning of STEM_ALL.

by Sehaj Singh Kalra, 8 years ago

Attachment: modified.patch added

Combined patch for this ticket (#562) as well as #563

comment:4 by Sehaj Singh Kalra, 8 years ago

This patch provides matching modes for QueryParser and TermGenerator. The indexing mode can take one of the following four values:

  1. STEM_NONE: Don't index any stemmed words.
  2. STEM_SOME: Index both stemmed and full (unstemmed) words. (Note: stemmed words are prefixed with "Z".)
  3. STEM_ALL: Index only stemmed words. (Note: stemmed words do not have the "Z" prefix.)
  4. STEM_ALL_Z: Index only stemmed words. (Note: stemmed words have the "Z" prefix.)

Correspondingly, a new stemming strategy, QueryParser::STEM_ALL_Z, has been introduced.
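To make the pairing of indexing and query modes concrete, here is a stand-alone sketch that models the four indexing modes as this comment describes them, together with the two query strategies discussed in this ticket. It does not call Xapian: every function name is hypothetical, the stemmer is a one-entry lookup table, and the "S" prefix is the title prefix from the example in the description.

```python
# Toy model of the indexing modes listed above; this is not the Xapian API.
# All names are hypothetical, and the stemmer is a one-entry lookup table.

TOY_STEMS = {"идея": "иде"}

INDEX_MODES = ("STEM_NONE", "STEM_SOME", "STEM_ALL", "STEM_ALL_Z")
QUERY_MODES = ("STEM_ALL", "STEM_ALL_Z")


def toy_stem(word):
    """Return the toy stem for a word, or the word itself if unknown."""
    return TOY_STEMS.get(word, word)


def index_terms(word, mode, prefix=""):
    """Return the set of terms indexed for one word under an indexing mode."""
    full = prefix + word
    stem = prefix + toy_stem(word)
    if mode == "STEM_NONE":
        return {full}
    if mode == "STEM_SOME":
        return {full, "Z" + stem}
    if mode == "STEM_ALL":
        return {stem}
    if mode == "STEM_ALL_Z":
        return {"Z" + stem}
    raise ValueError("unknown indexing mode: " + mode)


def query_term(word, mode, prefix=""):
    """Return the single term a query looks up under a query strategy."""
    stem = prefix + toy_stem(word)
    if mode == "STEM_ALL":
        return stem
    if mode == "STEM_ALL_Z":
        return "Z" + stem
    raise ValueError("unknown query mode: " + mode)


def compatible(index_mode, query_mode, word="идея", prefix="S"):
    """True if the query term is among the terms the index would contain."""
    query = query_term(word, query_mode, prefix)
    return query in index_terms(word, index_mode, prefix)


# Print the full compatibility matrix for the example word.
for index_mode in INDEX_MODES:
    for query_mode in QUERY_MODES:
        print(index_mode, query_mode, compatible(index_mode, query_mode))
```

In this model a STEM_SOME index answers STEM_ALL_Z queries but not STEM_ALL ones, which is the pairing argued for in comment:2.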

comment:5 by Olly Betts, 8 years ago

Description: modified (diff)
Milestone: 1.2.11

The half of the patch which adds QueryParser::STEM_ALL_Z was applied to trunk in r16626, along with test coverage for the new feature.

Marking to backport to 1.2.11.

comment:6 by Olly Betts, 8 years ago

Resolution: fixed
Status: reopened → closed

Backported in r16716 and r16718.
