Opened 12 years ago
Last modified 14 months ago
#609 new enhancement
Term generation for some French elisions produces imperfect results
Reported by: | Paul Rudin | Owned by: | Olly Betts |
---|---|---|---|
Priority: | highest | Milestone: | 1.5.0 |
Component: | QueryParser | Version: | git master |
Severity: | normal | Keywords: | |
Cc: | Kelson | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Using xapian.TermGenerator with the standard French stemmer, text containing, for example, "l'Etat" gives the terms "l'etat" and "Zl'etat". The problem is that a search for "etat" then won't match, whereas in most cases a match is probably what users want.
I suppose the correct thing would be to stem this to "etat"?
Change History (6)
comment:1 by , 12 years ago
Component: | Other → QueryParser |
---|---|
Version: | → SVN trunk |
comment:2 by , 5 years ago
Milestone: | → 1.5.0 |
---|---|
Version: | SVN trunk → git master |
comment:3 by , 4 years ago
This is probably obvious, but the problem affects not only "l'" but also "d'", which is very common too. See this Kiwix ticket https://github.com/openzim/libzim/issues/592 for another concrete example of the problem.
comment:4 by , 4 years ago
Cc: | added |
---|
comment:5 by , 14 months ago
I think ideally we'd deal with this in Snowball so I've opened an issue there: https://github.com/snowballstem/snowball/issues/187
comment:6 by , 14 months ago
Priority: | normal → highest |
---|
I guess we need to decide whether it is TermGenerator's job to handle the apostrophe in cases like this, or the stemmer's job to cope with it appropriately.
Currently TermGenerator treats the apostrophe as a word character, and the English stemmer understands "'s" suffixes, but I don't think any other stemmers do anything special with apostrophes.
And QueryParser needs to match TermGenerator in this regard.
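To make the TermGenerator-side option concrete, here is a minimal pure-Python sketch. It is not Xapian code and does not use Xapian's API. It keeps apostrophes inside words, as TermGenerator currently does, and then strips a leading French elided form before the word would be passed to a stemmer. The function names `strip_french_elision` and `index_terms` and the list of elided prefixes are hypothetical, chosen only for illustration. Any real fix would also need identical handling in QueryParser so that query terms match indexed terms.

```python
import re

# Hypothetical list of French elided forms (an assumption for this sketch,
# not taken from Xapian or Snowball).
ELISION_PREFIXES = (
    "jusqu", "lorsqu", "puisqu", "quoiqu", "qu",
    "c", "d", "j", "l", "m", "n", "s", "t",
)

# Match "<prefix>'" (ASCII or typographic apostrophe U+2019) at the start of
# a token, but only when a word character follows the apostrophe.
_ELISION_RE = re.compile(
    r"^(?:" + "|".join(ELISION_PREFIXES) + r")['\u2019](?=\w)",
    re.IGNORECASE,
)


def strip_french_elision(token: str) -> str:
    """Drop one leading elided article/pronoun, e.g. "l'Etat" -> "Etat"."""
    return _ELISION_RE.sub("", token, count=1)


def index_terms(text: str) -> list[str]:
    """Toy tokenizer: keeps apostrophes inside words (like TermGenerator),
    then strips elisions and lowercases. No stemming is applied here."""
    words = re.findall(r"\w+(?:['\u2019]\w+)*", text)
    return [strip_french_elision(w).lower() for w in words]


if __name__ == "__main__":
    print(index_terms("l'Etat et d'autres"))
    print(index_terms("aujourd'hui"))
```

With this, "l'Etat" yields "etat", while a word such as "aujourd'hui", whose apostrophe is not an elision, is left intact. Matching only a fixed prefix list is a deliberate simplification; whether this logic belongs in TermGenerator or in the Snowball French stemmer is exactly the open question above.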