Opened 11 years ago

Last modified 5 months ago

#609 new enhancement

term generation for some French elisions produces imperfect results.

Reported by: Paul Rudin
Owned by: Olly Betts
Priority: highest
Milestone: 1.5.0
Component: QueryParser
Version: git master
Severity: normal
Keywords:
Cc: Kelson
Blocked By:
Blocking:
Operating System: All

Description

Using xapian.TermGenerator with the standard French stemmer, text containing, for example, "l'Etat" gives the terms "l'etat" and "Zl'etat". The problem is that if you then search for "etat" you won't get a match, even though in most cases a match is probably what users want.

I suppose that the correct thing would be to stem to etat?

Change History (6)

comment:1 by Olly Betts, 11 years ago

Component: Other → QueryParser
Version: SVN trunk

I guess we need to decide if it is the TermGenerator's job to handle the apostrophe in cases like this, or the stemmer's job to cope with the apostrophe appropriately.

Currently TermGenerator treats apostrophe as a word character, and the English stemmer understands "'s" suffixes, but I don't think any other stemmers do anything special with apostrophes.

And QueryParser needs to match TermGenerator in this regard.
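
As an illustration of the first option (handling the apostrophe before the stemmer sees the word), here is a minimal application-side sketch in plain Python. It is not part of Xapian or Snowball: the helper name `strip_french_elisions` and the list of elided prefixes are assumptions made for this example, and the list is not claimed to be complete.

```python
import re

# Hypothetical list of elided French clitics (assumption, not from Xapian/Snowball).
ELIDED_PREFIXES = ("l", "d", "j", "m", "t", "s", "n", "c", "qu")

# An elided prefix at a word start, followed by a straight or typographic
# apostrophe and then at least one word character.
_ELISION_RE = re.compile(
    r"\b(?:" + "|".join(ELIDED_PREFIXES) + r")['\u2019](?=\w)",
    re.IGNORECASE,
)


def strip_french_elisions(text: str) -> str:
    """Remove elided prefixes such as "l'" and "d'" from the start of words."""
    return _ELISION_RE.sub("", text)
```

To keep indexing and searching consistent, the same function would have to be applied both to text passed to TermGenerator and to query strings passed to QueryParser. Being a plain regular expression, it will also rewrite non-French text that happens to fit the pattern, so it is only suitable where the input is known to be French.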

comment:2 by Olly Betts, 5 years ago

Milestone: 1.5.0
Version: SVN trunk → git master

comment:3 by Kelson, 3 years ago

Probably kind of obvious, but this causes a problem not only for "l'" but also for "d'", which is really common too. See this Kiwix ticket https://github.com/openzim/libzim/issues/592 for another concrete example of the problem.

comment:4 by Kelson, 3 years ago

Cc: Kelson added

comment:5 by Olly Betts, 5 months ago

I think ideally we'd deal with this in Snowball so I've opened an issue there: https://github.com/snowballstem/snowball/issues/187

comment:6 by Olly Betts, 5 months ago

Priority: normal → highest