Opened 13 years ago
Closed 13 years ago
#563 closed enhancement (fixed)
Add a mode for indexing only stemmed terms in TermGenerator
Reported by: | Vitaliy Filippov | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.2.11 |
Component: | QueryParser | Version: | 1.2.6 |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Many search engines index only stems and discard the exact terms. This is convenient if you don't need to search for exact terms, and it greatly reduces the size of the index.
It would be good for TermGenerator to have such an indexing mode.
Attachments (1)
Change History (6)
comment:1 by , 13 years ago
Milestone: | → 1.3.x |
---|
by , 13 years ago
Attachment: | modified.patch added |
---|
comment:2 by , 13 years ago
This patch provides matching modes for QueryParser and TermGenerator. The indexing mode can take one of the following four values:
- STEM_NONE: Don't index any stemmed words.
- STEM_SOME: Index both stemmed and full (unstemmed) words. (Note: the stemmed words carry a "Z" prefix.)
- STEM_ALL: Index only stemmed words. (Note: the stemmed words do NOT carry a "Z" prefix.)
- STEM_ALL_Z: Index only stemmed words. (Note: the stemmed words carry a "Z" prefix.)
Correspondingly, a new stemming strategy, QueryParser::STEM_ALL_Z, has been introduced.
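To make the difference between the four modes concrete, here is a minimal, self-contained sketch of which terms a single word would produce under each one. It does not use Xapian: `StemMode`, `toy_stem` and `terms_for` are hypothetical names invented for illustration, and the toy stemmer (strip a trailing "s") merely stands in for a real stemming algorithm.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the four modes listed above; this is not
// Xapian's actual declaration.
enum class StemMode { NONE, SOME, ALL, ALL_Z };

// Toy stemmer for illustration only: strips one trailing "s".
// A real indexer would use a proper stemming algorithm.
std::string toy_stem(const std::string& word) {
    if (word.size() > 1 && word.back() == 's')
        return word.substr(0, word.size() - 1);
    return word;
}

// Return the terms that would be indexed for one word in the given mode.
std::vector<std::string> terms_for(const std::string& word, StemMode mode) {
    assert(!word.empty());
    const std::string stem = toy_stem(word);
    switch (mode) {
    case StemMode::NONE:  return {word};              // unstemmed only
    case StemMode::SOME:  return {word, "Z" + stem};  // both forms
    case StemMode::ALL:   return {stem};              // stem, no prefix
    case StemMode::ALL_Z: return {"Z" + stem};        // stem, "Z" prefix
    }
    return {};
}
```

Under these assumptions, the word "cats" yields one term in every mode except STEM_SOME, which yields two. That is where the reduction in index size described in the ticket comes from.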
comment:3 by , 13 years ago
Description: | modified (diff) |
---|
Thanks for the patch. It looks pretty good to me, but a few comments:
Some test coverage for the new modes would be good - we already have tests for the existing STEM_xxx modes in tests/queryparsertest.cc, and for the now default (previously only) stemming mode of TermGenerator in tests/termgentest.cc.
It's better to just write `string stem;` rather than `string stem("");` since std::string objects are empty by default, and the compiler can special case default initialisation and handle it more efficiently (GCC does, I haven't looked at other compilers closely).
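The point about default initialisation can be checked with a small sketch. The function names below are hypothetical and exist only for this illustration. The sketch shows that both forms yield the same empty string; it says nothing about the code a particular compiler generates.

```cpp
#include <cassert>
#include <string>

// Preferred form: the default constructor already yields an empty string.
std::string make_default_stem() {
    std::string stem;
    assert(stem.empty());  // guaranteed by the C++ standard
    return stem;
}

// Equivalent result, but goes through the const char* constructor.
std::string make_literal_stem() {
    std::string stem("");
    return stem;
}
```

Since the two results compare equal, the shorter form loses nothing and leaves the compiler free to take the cheaper path.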
And a couple of style issues:
Please put a space after keywords followed by an opening bracket (so `if (foo)` not `if(foo)`) to distinguish them more clearly visually from function calls.
For Xapian code, we use a 4-space indent, tab-filled, with a tab being 8 spaces wide. I think your editor has tabs set to 4 spaces wide, since the indentation of some of the changed lines is too deep with the standard settings.
comment:4 by , 13 years ago
Milestone: | 1.3.x → 1.2.11 |
---|---|
Status: | new → assigned |
Applied the remaining half of this patch which corresponds to this ticket in r16628.
Writing testcases revealed that it wasn't adding term positions in all cases where it should have been, so I tweaked it to do that correctly.
Marking to consider backporting.
comment:5 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Combined patch for this ticket (#563) as well as for #562