Opened 21 years ago

Closed 18 years ago

Last modified 18 years ago

#11 closed defect (released)

Make stemming language configurable in omindex and scriptindex

Reported by: Arjen
Owned by: Olly Betts
Priority: high
Milestone:
Component: Omega
Version: 0.7.4
Severity: minor
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

As posted to the mailing list:

When I do the simplest possible version of this query, I get these results:

0.6.5

All 10 matches OmQuery((nos:(pos=1) AND meet:(pos=2))) Term frequencies: nos: 7,291, meeting: 97

1-10 of about 7,387 matches OmQuery((nos:(pos=1) OR meet:(pos=2))) Term frequencies: nos: 7,291, meeting: 97

0.6.4

1-10 of about 52 matches OmQuery((nos:(pos=1) AND meeting:(pos=2))) Term frequencies: nos: 7,291, meeting: 3,101

1-10 of about 10,360 matches OmQuery((nos:(pos=1) OR meeting:(pos=2))) Term frequencies: nos: 7,291, meeting: 3,101

This is, of course, on the same database. The problem is in the number of found documents for the term 'meeting' (which gets stemmed to 'meet').

The problem only occurs with omega 0.6.5; omega 0.6.4 with a 0.6.5 xapian-backend appears to be OK.

Change History (9)

comment:1 by Olly Betts, 21 years ago

Owner: changed from James Aylett to Olly Betts
Version: changed from other to 0.6.5

comment:2 by Olly Betts, 21 years ago

Severity: changed from blocker to major
Status: changed from new to assigned

The OmQuery::get_description() results show that 0.6.4 seems to stem "meeting" to "meeting", while 0.6.5 stems "meeting" to "meet". "meet" is correct judging by http://snowball.tartarus.org/ at least.

This might explain the results as the database in question was built with 0.6.4.

Not sure about the results for 0.6.4 omega and the 0.6.5 xapian-core - that should behave the same as 0.6.5 omega and xapian-core by this theory...

Will investigate further...

comment:3 by Arjen, 21 years ago

Severity: changed from major to enhancement

*cough* If I change the stemming language to Dutch, it works correctly... And if I change 0.6.4's stemming to English, that one fails as well.

Couldn't you have the ./configure script change the stemming-language (defaulting to "english") ?
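
The mismatch described in this comment can be reproduced with a toy model. The sketch below is an illustration only: it does not use Xapian, `identity_stem` and `toy_english_stem` are hypothetical stand-ins (the second is a crude suffix stripper, not the Snowball English stemmer), and the three documents are made up. It indexes with one stemmer and then queries with a different one, which is the situation suspected here.

```python
from collections import defaultdict


def identity_stem(word):
    """Hypothetical stand-in for a stemmer that leaves 'meeting' unchanged."""
    return word.lower()


def toy_english_stem(word):
    """Toy suffix stripper (NOT Snowball): drops a trailing 'ing' from longer words."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    return word


def build_index(docs, stem):
    """Map each stemmed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[stem(word)].add(doc_id)
    return index


def search_and(index, query, stem):
    """Return ids of documents containing every stemmed query term."""
    postings = [index.get(stem(w), set()) for w in query.split()]
    return set.intersection(*postings) if postings else set()


# Made-up documents, indexed with the stemmer that keeps 'meeting' as is.
docs = {
    1: "nos meeting agenda",
    2: "nos meeting notes",
    3: "nos meet schedule",
}
index = build_index(docs, identity_stem)

# Same stemmer at query time: the query term matches the indexed term.
matched = search_and(index, "nos meeting", identity_stem)

# Different stemmer at query time: 'meeting' becomes 'meet', so only
# documents that literally contain 'meet' are found.
mismatched = search_and(index, "nos meeting", toy_english_stem)

print(sorted(matched))     # [1, 2]
print(sorted(mismatched))  # [3]
```

The index itself is unchanged between the two searches; only the query-side stemmer differs, which is why the term frequency reported for the second query term drops.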

comment:4 by Olly Betts, 21 years ago

Severity: changed from enhancement to minor
Summary: changed from "Documents are not found with omega 0.6.5, while 0.6.4 does find them" to "Make stemming language configurable"

configure options don't fit well with pre-compiled binary packages - they force the packager to choose some settings, and unless they're just a matter of enabling functionality (e.g. database backends) that means the packager is making choices for the user.

But this really should be configurable without editing the source code, and ideally on a per-database basis (with a per-installation default for newly created databases).

comment:5 by Olly Betts, 21 years ago

Now configurable for omega (in CVS version) by putting $set{stemmer,nl} near the start of the omegascript template. Not configurable for omindex or scriptindex yet.

comment:6 by Olly Betts, 20 years ago

op_sys: changed from Linux to All
rep_platform: changed from PC to All
Summary: changed from "Make stemming language configurable" to "Make stemming language configurable in omindex and scriptindex"
Version: changed from 0.6.5 to 0.7.4

comment:7 by Olly Betts, 18 years ago

Resolution: fixed
Status: changed from assigned to closed

Now fixed in SVN trunk (omindex and scriptindex now take an optional --stemmer command line option).
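
For anyone scripting the indexer, a minimal sketch of passing the new option follows. Only `--stemmer` comes from this ticket; the `--db` and `--url` options, the `omindex_command` helper, and all paths are assumptions for illustration, so check `omindex --help` for your version before relying on them. The language code `nl` is the one used in comment:5.

```python
def omindex_command(db_path, base_url, doc_root, stemmer="english"):
    """Build an argument list for omindex (hypothetical helper, paths are examples)."""
    return [
        "omindex",
        "--db", db_path,          # assumed option: database to update
        "--url", base_url,        # assumed option: base URL of the indexed tree
        "--stemmer=" + stemmer,   # option added by this ticket
        doc_root,                 # directory to index
    ]


cmd = omindex_command("/var/lib/omega/data/default", "/", "/var/www/html", stemmer="nl")
print(" ".join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` on a machine where omindex is installed. The search side should use the same language (see comment:5), otherwise the mismatch from the original report comes back.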

comment:8 by Olly Betts, 18 years ago

Fixed in release 0.9.3.

comment:9 by Olly Betts, 18 years ago

Operating System: All
Resolution: changed from fixed to released