Opened 15 years ago

Last modified 20 months ago

#465 new defect

Stemmers which can produce multiple stems

Reported by: Olly Betts
Owned by: Olly Betts
Priority: normal
Milestone: 2.0.0
Component: Library API
Version:
Severity: normal
Keywords:
Cc: asaf.bartov@…
Blocked By:
Blocking:
Operating System: All

Description

The current API assumes exactly one stem per word, but some stemming algorithms can produce multiple stems (and allowing a stemmer to produce no stems at all might be useful too).

For example:

This is likely to require API adjustments to handle well, so marking as 1.3.0 material for now.

Change History (5)

comment:1 by Asaf Bartov, 15 years ago

Cc: asaf.bartov@… added

comment:2 by Olly Betts, 13 years ago

Some API thoughts:

Double Metaphone and the Schinke stemmer both produce up to two forms, but the hunspell stemmer can apparently return more (at least from what was said in that thread). Certainly allowing for an indefinite number would be most flexible.

We probably want to return them as an ordered list rather than an unordered set in case the stems are ranked in some way (at least with Double Metaphone, one form is the "primary").
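As a rough illustration of the shape discussed above (not a proposed Xapian API), the idea can be sketched as a stemmer that returns an ordered list holding zero or more stems, primary form first. Every name here (`MultiStemmer`, `wrap_single`, `toy_multi_stem`, `index_terms`) is hypothetical, and the suffix stripping is a toy stand-in rather than Double Metaphone, Schinke, or hunspell.

```python
from typing import Callable, Iterable, List

# Hypothetical shape: a stemmer maps one word to an ordered list of stems.
# The list may be empty, and the first entry is treated as the "primary" form.
MultiStemmer = Callable[[str], List[str]]


def wrap_single(stem_one: Callable[[str], str]) -> MultiStemmer:
    """Adapt a one-stem-per-word function to the list-returning shape."""
    def stem_all(word: str) -> List[str]:
        return [stem_one(word)]
    return stem_all


def toy_multi_stem(word: str) -> List[str]:
    """Toy stemmer (an assumption for illustration, not a real algorithm)."""
    word = word.lower()
    if not word.isalpha():
        return []                      # zero stems: nothing sensible to offer
    if word.endswith("es") and len(word) > 3:
        return [word[:-2], word[:-1]]  # primary form first, alternative second
    if word.endswith("s") and len(word) > 2:
        return [word[:-1]]
    return [word]


def index_terms(words: Iterable[str], stemmer: MultiStemmer,
                prefix: str = "Z") -> List[str]:
    """Emit one prefixed term per stem, preserving each stemmer's ordering."""
    terms: List[str] = []
    for word in words:
        for stem in stemmer(word):
            terms.append(prefix + stem)
    return terms
```

With this shape, a caller that only wants one stem can take the first element when the list is non-empty, while an indexer can add every returned form. An existing single-stem function still fits via `wrap_single`, so the one-stem case becomes a list of length one rather than a separate code path.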

comment:3 by Olly Betts, 13 years ago

Milestone: 1.3.0 → 1.3.x

comment:4 by Olly Betts, 9 years ago

Milestone: 1.3.x → 1.4.x

Not a blocker for 1.4.0.

comment:5 by Olly Betts, 20 months ago

Milestone: 1.4.x → 2.0.0