Opened 15 years ago
Closed 15 years ago
#446 closed defect (fixed)
TermGenerator: Strange handling of '+' within a word
Reported by: | Carl Worth | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.0.18 |
Component: | Library API | Version: | 1.1.3 |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
I asked the TermGenerator to generate terms for a string containing " xapian+kanru ". I was surprised to see the result as the following two terms:
xapian+ kanru
I did note that the documentation[1] of the term-generator says that "trailing +" is included on a term. But the handling of the above seems inconsistent. It appears that the embedded '+' is first treated as a non-word character to split the string into "xapian+" and "kanru" and then the '+' is identified as trailing, so is considered a word-character to yield "xapian+".
I expected the embedded '+' to be treated consistently as a non-word character here (it is not a trailing '+'), so the desired result would be the two terms "xapian" and "kanru".
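To make the difference concrete, here is a minimal sketch of the two behaviours in plain Python. It does not call Xapian and is not the library's implementation. The function names `terms_reported` and `terms_expected` are hypothetical, and the assumption that only ASCII letters and digits are word characters is a simplification made for this illustration.

```python
import re

# Assumption for this sketch only: word characters are ASCII letters and digits.

def terms_reported(text):
    """Mimic the reported behaviour: a run of '+' right after a word is
    always kept, even when more word characters follow it."""
    return [m.group(0) for m in re.finditer(r"[A-Za-z0-9]+\+*", text)]

def terms_expected(text):
    """Mimic the expected behaviour: a run of '+' is kept only when it is
    truly trailing, i.e. not followed by a word character."""
    pattern = r"[A-Za-z0-9]+(?:\++(?![A-Za-z0-9+]))?"
    return [m.group(0) for m in re.finditer(pattern, text)]

print(terms_reported(" xapian+kanru "))  # ['xapian+', 'kanru']
print(terms_expected(" xapian+kanru "))  # ['xapian', 'kanru']
print(terms_expected("c++ rocks"))       # ['c++', 'rocks']
```

The negative lookahead includes '+' itself so that the regex engine cannot backtrack to a shorter run of '+' and still claim it is trailing.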
As always, thanks for Xapian!
-Carl
[1] http://xapian.org/docs/termgenerator.html
PS. The above documentation has phrases like "a few other characters" in some places. I would love to see those replaced with lists of the actual characters so that I could predict correct results by reading the documentation.
Change History (2)
comment:1, 15 years ago
Component: | Other → Library API |
---|---|
Description: | modified (diff) |
Milestone: | → 1.0.18 |
Status: | new → assigned |
QueryParser already gets this right.
Fixed in trunk r13988.
For 1.0 just backporting this change arguably introduces an incompatibility in indexing. Not sure if it matters or not, but perhaps we should index the first term both with and without the suffix there.
comment:2, 15 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Upon reflection, the term with the appended + isn't actually useful, so there's no point generating it in the name of "compatibility". So backported the exact change from trunk to 1.0 in r13992.