#9 closed defect (released)
$highlight{} doesn't handle accented characters correctly
Reported by: | Arjen | Owned by: | Olly Betts |
---|---|---|---|
Priority: | high | Milestone: | |
Component: | Omega | Version: | 0.7.4 |
Severity: | minor | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
I was just notified that words containing é (and probably other accented characters as well) are split into multiple words. As far as I know, that shouldn't happen. The workaround is of course to do a phrase search, but afaik Xapian should either replace the é with an e or treat it as a normal character.
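For illustration only, here is a minimal sketch of how this kind of split can arise. It assumes the text is ISO-8859-1 encoded and that the word splitter accepts only ASCII letters and digits. `split_ascii_words` is a hypothetical name and is not taken from the Omega sources.

```cpp
#include <string>
#include <vector>

// Hypothetical predicate: only ASCII letters and digits count as word characters.
static bool is_ascii_wordchar(unsigned char ch) {
    return (ch >= '0' && ch <= '9') ||
           (ch >= 'A' && ch <= 'Z') ||
           (ch >= 'a' && ch <= 'z');
}

// Hypothetical splitter (not Omega's actual code): any byte that is not an
// ASCII letter or digit ends the current word, so an accented byte such as
// 0xE9 (e-acute in ISO-8859-1) acts as a separator.
std::vector<std::string> split_ascii_words(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        unsigned char ch = static_cast<unsigned char>(text[i]);
        if (is_ascii_wordchar(ch)) {
            current += text[i];
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

With this sketch, `split_ascii_words("b\xe9" "zier")` yields the two words `b` and `zier`, which is the symptom reported above.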
Change History (8)
comment:1 by , 22 years ago
Owner: | changed from | to
---|---|
Severity: | minor → normal |
comment:2 by , 22 years ago
Status: | new → assigned |
---|
comment:3 by , 22 years ago
Presumably this will also affect query construction? Or does that already transliterate characters? Either way, $highlight{} will also be affected, as it will need to do the same transliteration while it's looking for words to highlight. I have a feeling I didn't factor out the shared code when writing $highlight{} - it's probably duplicating code in query.cc or similar, and so will need to be fixed twice.
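A rough sketch of what "doing the transliteration while looking for words" could look like, assuming ISO-8859-1 input and a byte-folding function shared with the indexer. `FoldFn`, `demo_fold` and `highlight_words` are hypothetical names, and the `<b>` markup is only a placeholder. This is not the code in query.cc.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical signature: maps one ISO-8859-1 byte to its unaccented form,
// or returns it unchanged.
typedef char (*FoldFn)(unsigned char);

// Stand-in fold for the example only: handles just e-acute (0xE9).
char demo_fold(unsigned char ch) {
    return ch == 0xE9 ? 'e' : static_cast<char>(ch);
}

// Wrap every word whose folded, lower-cased form is in `terms` in <b>...</b>.
// Word boundaries and matching use the folded bytes; the output keeps the
// original bytes, so accents survive in the highlighted text.
std::string highlight_words(const std::string& text,
                            const std::set<std::string>& terms,
                            FoldFn fold) {
    std::string out, word, key;
    for (std::string::size_type i = 0; i <= text.size(); ++i) {
        unsigned char folded = 0;  // 0 at end of input forces a final flush
        if (i < text.size())
            folded = static_cast<unsigned char>(
                fold(static_cast<unsigned char>(text[i])));
        if (folded < 0x80 && std::isalnum(folded)) {
            word += text[i];
            key += static_cast<char>(std::tolower(folded));
        } else {
            if (!word.empty()) {
                out += terms.count(key) ? "<b>" + word + "</b>" : word;
                word.clear();
                key.clear();
            }
            if (i < text.size()) out += text[i];
        }
    }
    return out;
}
```

Passing the fold function in as a parameter is one way to make the highlighter and the indexer agree on word boundaries, so the fix only has to be made in one place.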
comment:4 by , 21 years ago
Now fixed up everywhere apart from $highlight{} in query.cc which should share code with indextext.cc.
comment:5 by , 21 years ago
Summary: | A word like bézier is split into b and zier → $highlight{} doesn't handle accented characters correctly |
---|
comment:6 by , 21 years ago
op_sys: | Linux → All |
---|---|
rep_platform: | PC → All |
Severity: | normal → minor |
Version: | 0.6.4 → 0.7.4 |
Bug is in omindex.cc / scriptindex.cc. Similar code should be pulled out into a shared file and fixed to handle accented characters. Probably transliterate most to unaccented characters to normalise accent representation...
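As a sketch of the kind of shared helper this suggests, the following assumes ISO-8859-1 input and maps each accented letter that has a single-letter equivalent to its unaccented form. `unaccent_latin1` and `normalise_latin1` are hypothetical names. Bytes without an obvious one-letter mapping (the AE ligature, eth, thorn, sharp s, and the multiplication and division signs) are left unchanged here, which is a simplification.

```cpp
#include <string>

// Hypothetical shared helper: map one ISO-8859-1 byte to an unaccented
// ASCII letter where a single-letter equivalent exists.
char unaccent_latin1(unsigned char ch) {
    if (ch < 0xC0) return static_cast<char>(ch);  // ASCII and Latin-1 symbols
    if (ch == 0xFF) return 'y';                   // y-diaeresis has no upper-case slot
    // In ISO-8859-1 the lower-case accented letters sit 0x20 above the
    // upper-case ones, so classify by the upper-case code point.
    bool lower = ch >= 0xE0;
    unsigned char upper = lower ? static_cast<unsigned char>(ch - 0x20) : ch;
    char base = 0;
    if (upper <= 0xC5) base = 'A';
    else if (upper == 0xC7) base = 'C';
    else if (upper >= 0xC8 && upper <= 0xCB) base = 'E';
    else if (upper >= 0xCC && upper <= 0xCF) base = 'I';
    else if (upper == 0xD1) base = 'N';
    else if ((upper >= 0xD2 && upper <= 0xD6) || upper == 0xD8) base = 'O';
    else if (upper >= 0xD9 && upper <= 0xDC) base = 'U';
    else if (upper == 0xDD) base = 'Y';
    if (base == 0) return static_cast<char>(ch);  // no single-letter mapping
    return lower ? static_cast<char>(base + ('a' - 'A')) : base;
}

// Apply the mapping to a whole word so that indexing, query parsing and
// highlighting can all work on the same normalised form.
std::string normalise_latin1(const std::string& word) {
    std::string out;
    out.reserve(word.size());
    for (std::string::size_type i = 0; i < word.size(); ++i)
        out += unaccent_latin1(static_cast<unsigned char>(word[i]));
    return out;
}
```

For example, `normalise_latin1("b\xe9" "zier")` returns `bezier`, so the accented and unaccented spellings would index to the same term.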