Opened 15 years ago
Last modified 20 months ago
#465 new defect
Stemmers which can produce multiple stems
Reported by: | Olly Betts | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 2.0.0 |
Component: | Library API | Version: | |
Severity: | normal | Keywords: | |
Cc: | asaf.bartov@… | Blocked By: | |
Blocking: | | Operating System: | All |
Description
The current API assumes exactly one stem per word, but some stemming algorithms can produce multiple stems (and allowing a stemmer to produce no stems at all might be useful too).
For example:
This is likely to require API adjustments to handle well, so marking as 1.3.0 material for now.
Change History (5)
comment:1 by , 15 years ago
Cc: | added |
---|
comment:2 by , 13 years ago
comment:3 by , 13 years ago
Milestone: | 1.3.0 → 1.3.x |
---|
comment:5 by , 20 months ago
Milestone: | 1.4.x → 2.0.0 |
---|
Some API thoughts:
Double Metaphone and the Schinke stemmer both produce up to two forms, but the hunspell stemmer can apparently return more (at least from what was said in that thread). Certainly allowing for an indefinite number would be most flexible.
We probably want to return them as an ordered list rather than an unordered set in case the stems are ranked in some way (at least with Double Metaphone, one form is the "primary").
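To make the idea concrete, here is a rough sketch of what "an ordered list of zero or more stems" could look like from the caller's side. It is written in Python for brevity and is not the library's API: `toy_multi_stem`, `index_terms`, the lookup table, and the use of a `"Z"` prefix for stemmed terms are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List

# Hypothetical interface: a stemmer maps one word to an ordered list of
# stems. The list may be empty (no stem) or hold any number of entries,
# with the "primary" form first when the algorithm ranks its output.
MultiStemmer = Callable[[str], List[str]]


def toy_multi_stem(word: str) -> List[str]:
    """Toy stand-in for a real multi-stem algorithm (illustrative data only)."""
    table = {
        "axes": ["axe", "axis"],  # ambiguous word: two candidate stems
        "the": [],                # stemmer declines to produce a stem
    }
    # Words not in the table fall back to a single stem: the word itself.
    return table.get(word, [word])


def index_terms(words: Iterable[str], stemmer: MultiStemmer,
                prefix: str = "Z") -> List[str]:
    """Collect prefixed stem terms, preserving order and dropping duplicates."""
    terms: List[str] = []
    for word in words:
        for stem in stemmer(word):  # zero, one, or many stems per word
            term = prefix + stem
            if term not in terms:
                terms.append(term)
    return terms
```

Because the result is a list rather than a set, a caller that only wants the primary form can take the first element, while an indexer can add every form; a caller also has to cope with the empty list.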