Opened 14 years ago
Closed 14 years ago
#507 closed defect (notabug)
Some little problems with the french stemmer
Reported by: | Versmisse David | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Library API | Version: | 1.2.3 |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Hello,
Here is a short list of problems we found with the French stemmer:
For nouns ending in "...e", the final e is removed. For example: poule (chicken) => poul (it should stay poule). That is fine for an adjective, but not for a noun.
The same thing happens with nouns ending in "...lle" or "...tte". For example: brouette (wheelbarrow) => brouet (it should stay brouette).
I understand that this is a hard problem, because the same rule cannot be applied to both nouns and adjectives. With the current behaviour (Xapian 1.2.3), a search returns too many results, i.e. false positives, although that is still better than false negatives.
Do you have a file for testing the stemmer? We can help you fill it in.
Best regards,
- Versmisse.
Change History (2)
comment:1 by , 14 years ago
Component: | Other → Library API |
---|---|
Version: | → 1.2.3 |
comment:2 by , 14 years ago
Resolution: | → notabug |
---|---|
Status: | new → closed |
Type: | enhancement → defect |
Closing as "notabug" - as I explained in the previous comment, I think this is working as intended.
I think what you're describing is a feature rather than a bug.
The stems which are produced aren't necessarily actual words, but tokens which look rather like the words associated with that stem.
For example, in English "early" stems to "earli", which isn't a real word. But this doesn't matter: what is important is that "earlier" also stems to "earli".
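To illustrate why a non-word stem is harmless, here is a minimal Python sketch of a toy inverted index. It does not use Xapian or Snowball: `toy_stem`, `build_index` and `search` are hypothetical helpers, and the stem table simply hard-codes the stems quoted in this ticket. The point it tries to show is that the same stemmer is applied at index time and at query time, so the stem only ever acts as a lookup key.

```python
from collections import defaultdict

# Stems quoted in this ticket. This table is a stand-in for a real
# Snowball stemmer and is NOT the actual algorithm.
TICKET_STEMS = {
    "early": "earli",
    "earlier": "earli",
    "poule": "poul",
    "brouette": "brouet",
}


def toy_stem(word):
    """Hypothetical helper: return the stem from the table, else the lowercased word."""
    word = word.lower()
    return TICKET_STEMS.get(word, word)


def build_index(docs):
    """Map each stem to the set of document ids containing a word with that stem."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[toy_stem(word)].add(doc_id)
    return index


def search(index, query_word):
    """Stem the query word the same way as the indexed text, then look it up."""
    return index.get(toy_stem(query_word), set())


docs = {1: "the earlier train", 2: "une poule rousse"}
index = build_index(docs)

print(search(index, "early"))   # matches document 1 via the shared stem "earli"
print(search(index, "poule"))   # matches document 2 via the stem "poul"
```

The user never sees "earli" or "poul"; they are internal keys. A stem only becomes a problem when two unrelated words collapse to the same key, or two related words fail to.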
Section 5 of http://snowball.tartarus.org/texts/introduction.html discusses this.
If the stemmer is producing the same stem for words which should have different stems (or different stems for words which should share one), then it would be more efficient to report this directly to the Snowball developers. Snowball is the project which maintains these algorithms - see http://snowball.tartarus.org/
There's test data for the stemmers in SVN under browser:trunk/xapian-data