Opened 15 years ago

Closed 15 years ago

#446 closed defect (fixed)

TermGenerator: Strange handling of '+' within a word

Reported by: Carl Worth
Owned by: Olly Betts
Priority: normal
Milestone: 1.0.18
Component: Library API
Version: 1.1.3
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description (last modified by Olly Betts)

I asked the TermGenerator to generate terms for a string containing " xapian+kanru ". I was surprised to see the result as the following two terms:

xapian+ kanru

I did note that the term generator documentation[1] says that a "trailing +" is included in a term. But the handling of the above seems inconsistent. It appears that the embedded '+' is first treated as a non-word character, splitting the string into "xapian+" and "kanru", and then the same '+' is identified as trailing and so treated as a word character, yielding "xapian+".

I expected the embedded '+' to be treated consistently as a non-word character here (it's not a trailing '+'), so the desired result would be the two terms "xapian" and "kanru".
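To make the expected rule concrete, here is a minimal Python sketch. It is not Xapian's implementation: the function name `generate_terms` is hypothetical, and it assumes that words are plain runs of alphanumeric characters and that '+' is the only suffix character. A run of '+' is kept only when it is truly trailing, that is, not followed by another word character.

```python
def generate_terms(text):
    """Split text into lowercase terms (illustrative sketch, not Xapian code).

    Assumption: a word is a run of alphanumeric characters. A run of '+'
    directly after a word is kept only if it is not followed by another
    word character.
    """
    terms = []
    i, n = 0, len(text)
    while i < n:
        # Skip anything that cannot start a word.
        if not text[i].isalnum():
            i += 1
            continue
        # Collect the word itself.
        start = i
        while i < n and text[i].isalnum():
            i += 1
        end = i
        # Look ahead over a run of '+' characters.
        j = i
        while j < n and text[j] == "+":
            j += 1
        # Keep the '+' run only if it is trailing (end of input or a
        # non-word character follows); an embedded '+' is dropped.
        if j > i and (j == n or not text[j].isalnum()):
            end = j
        i = j
        terms.append(text[start:end].lower())
    return terms


print(generate_terms(" xapian+kanru "))
print(generate_terms("C++ rocks"))
```

Under these assumptions, the embedded '+' in " xapian+kanru " is discarded, while the trailing "++" in "C++" is kept as part of the term.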

As always, thanks for Xapian!

-Carl

[1] http://xapian.org/docs/termgenerator.html

PS. The above documentation has phrases like "a few other characters" in some places. I would love to see those replaced with lists of the actual characters so that I could predict correct results by reading the documentation.

Change History (2)

comment:1 by Olly Betts, 15 years ago

Component: Other → Library API
Description: modified (diff)
Milestone: → 1.0.18
Status: new → assigned

QueryParser already gets this right.

Fixed in trunk r13988.

For 1.0 just backporting this change arguably introduces an incompatibility in indexing. Not sure if it matters or not, but perhaps we should index the first term both with and without the suffix there.

comment:2 by Olly Betts, 15 years ago

Resolution: → fixed
Status: assigned → closed

Upon reflection, the term with the appended '+' isn't actually useful, so there's no point generating it in the name of "compatibility". So I backported the exact change from trunk to 1.0 in r13992.
