#11 closed defect (released)
Make stemming language configurable in omindex and scriptindex
Reported by: | Arjen | Owned by: | Olly Betts |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | Omega | Version: | 0.7.4 |
Severity: | minor | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
As posted to the mailing list:
When I do the simplest possible version of this query, I get these results:
0.6.5
All 10 matches OmQuery((nos:(pos=1) AND meet:(pos=2))) Term frequencies: nos: 7,291, meeting: 97
1-10 of about 7,387 matches OmQuery((nos:(pos=1) OR meet:(pos=2))) Term frequencies: nos: 7,291, meeting: 97
0.6.4
1-10 of about 52 matches OmQuery((nos:(pos=1) AND meeting:(pos=2))) Term frequencies: nos: 7,291, meeting: 3,101
1-10 of about 10,360 matches OmQuery((nos:(pos=1) OR meeting:(pos=2))) Term frequencies: nos: 7,291, meeting: 3,101
This is, of course, on the same database. The problem is in the number of found documents for the term 'meeting' (which gets stemmed to 'meet').
The problem only occurs with a 0.6.5 omega; a 0.6.4 omega with a 0.6.5 xapian backend appears to be OK.
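The drop in the term frequency for 'meeting' is the kind of result you would expect if the query term is stemmed differently from the terms stored in the database. A minimal sketch of that effect follows. The two stemmers and the tiny in-memory index are hypothetical stand-ins for illustration, not Xapian's or Snowball's code:

```python
# Toy illustration only: these stemmers are hypothetical stand-ins,
# not Xapian's or Snowball's actual algorithms.

def stem_none(word):
    """Stand-in for a stemmer that leaves 'meeting' unchanged."""
    return word.lower()

def stem_strip_ing(word):
    """Stand-in for a stemmer that reduces 'meeting' to 'meet'."""
    word = word.lower()
    return word[:-3] if word.endswith("ing") and len(word) > 5 else word

def build_index(docs, stem):
    """Map each stemmed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for word in text.split():
            index.setdefault(stem(word), set()).add(doc_id)
    return index

def term_frequency(index, word, stem):
    """Number of documents matching the stemmed query term."""
    return len(index.get(stem(word), set()))

docs = ["nos meeting today", "meeting notes", "we meet at noon", "nos news"]

# Index built once with one stemmer; queried with two different ones.
index = build_index(docs, stem_none)

same = term_frequency(index, "meeting", stem_none)        # query stemmed like the index
mixed = term_frequency(index, "meeting", stem_strip_ing)  # query stemmed to 'meet'
print(same, mixed)
```

With a mismatched stemmer the query only finds documents that happened to contain the literal term 'meet', so the count falls even though the database is unchanged.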
Change History (9)
comment:1 by , 22 years ago
Owner: | changed from | to
---|---|
Version: | other → 0.6.5 |
comment:2 by , 22 years ago
Severity: | blocker → major |
---|---|
Status: | new → assigned |
comment:3 by , 22 years ago
Severity: | major → enhancement |
---|
*cough* If I change the stemming language to Dutch it works correctly... And if I change the 0.6.4's stemming to English, that one fails as well.
Couldn't you have the ./configure script change the stemming language (defaulting to "english")?
comment:4 by , 22 years ago
Severity: | enhancement → minor |
---|---|
Summary: | Documents are not found with omega 0.6.5, while 0.6.4 does find them → Make stemming language configurable |
configure options don't fit well with pre-compiled binary packages - they force the packager to choose some settings, and unless they're just a matter of enabling functionality (e.g. database backends) that means the packager is making choices for the user.
But this really should be configurable without editing the source code, and ideally on a per-database basis (with a per-installation default for newly created databases).
comment:5 by , 21 years ago
Now configurable for omega (in CVS version) by putting $set{stemmer,nl} near the start of the omegascript template. Not configurable for omindex or scriptindex yet.
comment:6 by , 21 years ago
op_sys: | Linux → All |
---|---|
rep_platform: | PC → All |
Summary: | Make stemming language configurable → Make stemming language configurable in omindex and scriptindex |
Version: | 0.6.5 → 0.7.4 |
comment:7 by , 19 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Now fixed in SVN trunk (omindex and scriptindex now take an optional --stemmer command line option).
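For readers unfamiliar with the pattern, an optional stemmer option with a fallback value can be sketched as below. This is not the omindex or scriptindex source: the program name `toy-indexer`, the positional `directory` argument, and the "english" default are assumptions made for the example.

```python
import argparse

def parse_args(argv):
    """Parse a hypothetical indexer command line with an optional stemmer."""
    parser = argparse.ArgumentParser(prog="toy-indexer")
    # Assumed default; the real tools may choose a different one.
    parser.add_argument("--stemmer", default="english", metavar="LANG",
                        help="stemming language to use when indexing")
    parser.add_argument("directory", help="directory of documents to index")
    return parser.parse_args(argv)

# Without the option the assumed default applies; with it, the user's choice wins.
default_args = parse_args(["/srv/docs"])
dutch_args = parse_args(["--stemmer", "dutch", "/srv/docs"])
print(default_args.stemmer, dutch_args.stemmer)
```

Whatever language is chosen at index time, the omegascript template used for searching should set the matching stemmer, otherwise the mismatch described in this ticket reappears.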
comment:9 by , 19 years ago
Operating System: | → All |
---|---|
Resolution: | fixed → released |
The OmQuery::get_description() results show that 0.6.4 seems to stem "meeting" to "meeting", while 0.6.5 stems "meeting" to "meet". "meet" is correct judging by http://snowball.tartarus.org/ at least.
This might explain the results as the database in question was built with 0.6.4.
Not sure about the results for 0.6.4 omega and the 0.6.5 xapian-core - that should behave the same as 0.6.5 omega and xapian-core by this theory...
Will investigate further...