Ticket #167 (assigned enhancement)

Opened 19 months ago

Last modified 19 months ago

Add mode to query parser to search for both stemmed and unstemmed forms

Reported by: richard
Owned by: olly
Priority: normal
Component: QueryParser
Version: SVN trunk
Severity: minor
Operating System: All

Description

Now that we store both the stemmed and unstemmed forms of each word in the database, it might be nice to add a new stemming mode to the query parser. This mode would take each word in the query and generate an "OR" query for it with two parts: the unstemmed form and the stemmed form. Each query would then match any document containing a word whose stem matches, while documents containing the exact unstemmed form would receive a higher weight because they match both parts.

We might call this option "STEM_BOTH", or some better name that someone other than me can think of.
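To make the proposed expansion concrete, here is a minimal sketch that builds the query tree as nested tuples rather than real Xapian query objects. The names `stem_both`, `toy_stem` and `describe` are hypothetical. `toy_stem` is a crude suffix stripper standing in for a real stemmer such as Snowball. The "Z" prefix on stemmed terms is an assumption that mirrors the convention Xapian's TermGenerator uses for stemmed terms, so adjust it to match how your index was built.

```python
from typing import List, Tuple, Union

# A query tree: either a bare term string or (operator, [subqueries]).
Query = Union[str, Tuple[str, List["Query"]]]


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_both(words: List[str], stemmed_prefix: str = "Z", default_op: str = "OR") -> Query:
    """Expand each word to (unstemmed OR stemmed), then combine the words.

    Assumes stemmed terms are indexed with `stemmed_prefix` (here "Z").
    """
    parts: List[Query] = []
    for word in words:
        lower = word.lower()
        parts.append(("OR", [lower, stemmed_prefix + toy_stem(lower)]))
    return (default_op, parts)


def describe(q: Query) -> str:
    """Render a query tree as a readable string."""
    if isinstance(q, str):
        return q
    op, subs = q
    return "(" + f" {op} ".join(describe(s) for s in subs) + ")"


print(describe(stem_both(["Walking", "cats"])))
# ((walking OR Zwalk) OR (cats OR Zcat))
```

A document containing "walking" matches both `walking` and `Zwalk`, so it picks up weight from both parts. A document containing only "walked" matches `Zwalk` alone.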

Change History

Changed 19 months ago by olly

  • status changed from new to assigned
  • severity changed from normal to enhancement

Perhaps a special query operator would be useful here: the statistics are probably going to be different, since we know that documents indexed by the unstemmed form are (or at least should be) indexed by the stemmed form too.
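The following toy index shows why a plain OR has unusual statistics here. Every document containing the unstemmed term also contains its stemmed term, so the union of the two posting lists is just the stemmed posting list. An estimator that treats the two terms as independent overestimates that union. The independence formula below is purely illustrative and is not a claim about how Xapian actually estimates OR frequencies. `toy_stem` and the "Z" prefix are the same hypothetical assumptions used elsewhere in this ticket.

```python
from collections import defaultdict
from typing import Dict, Set


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Index each word both unstemmed and as a "Z"-prefixed stemmed term."""
    postings: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
            postings["Z" + toy_stem(word)].add(doc_id)
    return postings


def independent_union_estimate(tf_a: int, tf_b: int, n_docs: int) -> float:
    """Size of (A OR B), estimated as if A and B occurred independently."""
    return tf_a + tf_b - tf_a * tf_b / n_docs


docs = {1: "walking", 2: "walked", 3: "walks", 4: "cats"}
index = build_index(docs)
unstemmed, stemmed = index["walking"], index["Zwalk"]

assert unstemmed <= stemmed  # every unstemmed match is also a stemmed match
actual = len(unstemmed | stemmed)  # equals len(stemmed)
estimate = independent_union_estimate(len(unstemmed), len(stemmed), len(docs))
print(actual, estimate)  # 3 3.25
```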

Changed 19 months ago by trac

  • platform set to All

Changed 19 months ago by richard

Yes, something to adjust the weights might be a good idea, though I'm not quite sure what it would do. Perhaps it could act like a synonym, but with the wdf of the unstemmed form given a multiplier, so that unstemmed forms match with a higher effective wdf. We probably need to experiment with a few things.
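The following sketch shows one possible reading of the "synonym with a wdf multiplier" idea. Occurrences of the exact query word count `multiplier` times, and other words sharing its stem count once. The stemmed term's wdf already includes the exact-form occurrences, so this works out to wdf(stemmed) + (multiplier - 1) * wdf(unstemmed). The function name, the multiplier value and `toy_stem` are all hypothetical, and nothing here is an existing Xapian operator.

```python
from collections import Counter


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def effective_wdf(text: str, query_word: str, multiplier: float) -> float:
    """Effective wdf of a stemmed/unstemmed "synonym" for one document.

    Equals wdf(stemmed) + (multiplier - 1) * wdf(unstemmed), because the
    stemmed term's wdf already counts the exact-form occurrences once.
    """
    counts = Counter(text.lower().split())
    target = query_word.lower()
    stem = toy_stem(target)
    wdf_unstemmed = counts[target]
    wdf_stemmed = sum(n for w, n in counts.items() if toy_stem(w) == stem)
    return wdf_stemmed + (multiplier - 1) * wdf_unstemmed


docs = {"a": "walking walking", "b": "walked walks", "c": "cats"}
scores = {d: effective_wdf(t, "walking", multiplier=2.0) for d, t in docs.items()}
print(scores)  # {'a': 4.0, 'b': 2.0, 'c': 0.0}
```

Documents "a" and "b" have the same stemmed wdf (2), but "a" ranks higher because its matches are the exact form. A real implementation would feed this effective wdf into the weighting scheme in place of the raw wdf.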
