#9 closed defect (released)
$highlight{} doesn't handle accented characters correctly
| Reported by: | Arjen | Owned by: | Olly Betts |
|---|---|---|---|
| Priority: | high | Milestone: | |
| Component: | Omega | Version: | 0.7.4 |
| Severity: | minor | Keywords: | |
| Cc: | | Blocked By: | |
| Blocking: | | Operating System: | All |
Description
I was just notified that words containing é (and probably other accented characters as well) are split into multiple words; as far as I know that shouldn't happen? The workaround is of course to do a phrase search, but afaik Xapian should either replace the é with an e or treat it as a normal character.
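To make the reported behaviour concrete, here is a minimal sketch of how an ASCII-only word test would produce this split. It is not the actual Omega code: the names `is_ascii_wordchar` and `split_words_ascii` are hypothetical, and the text is assumed to be ISO-8859-1, where é is the single byte 0xE9.

```cpp
#include <string>
#include <vector>

// Hypothetical ASCII-only test: any byte >= 0x80 (such as Latin-1 0xE9, 'é')
// is NOT treated as a word character. This is an assumption for illustration.
static bool is_ascii_wordchar(unsigned char ch) {
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
           (ch >= '0' && ch <= '9');
}

// Split text into maximal runs of word characters.
std::vector<std::string> split_words_ascii(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        if (is_ascii_wordchar(ch)) {
            current += static_cast<char>(ch);
        } else if (!current.empty()) {
            // A non-word byte ends the current word; 0xE9 lands here.
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

Calling `split_words_ascii("b\xe9zier")` with this sketch yields the two words `b` and `zier`, which is why only a phrase search finds the original word.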
Change History (8)
comment:1 by , 23 years ago
| Owner: | changed from to |
|---|---|
| Severity: | minor → normal |
comment:2 by , 23 years ago
| Status: | new → assigned |
|---|
comment:3 by , 23 years ago
Presumably this will also affect query construction? Or does that already transliterate characters? Whatever, $highlight{} will also be affected, as it will need to do the transliteration while it's looking for words to highlight. I have a feeling I didn't pull out code when writing $highlight{} - it's probably duplicating code in query.cc or similar, and so will need to be fixed twice.
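As a rough sketch of what "doing the transliteration while looking for words to highlight" could mean: fold each character only for the comparison, but copy the original bytes to the output. Everything here is an assumption for illustration. `fold_char` and `highlight_term` are hypothetical names, the input is assumed to be ISO-8859-1, the fold handles only ASCII letters plus é/É, and the `open`/`close` markers stand in for whatever markup the real highlighter emits.

```cpp
#include <string>

// Hypothetical, deliberately tiny fold: ASCII letters are lower-cased and
// Latin-1 é (0xE9) / É (0xC9) become 'e'. Returns 0 for non-word bytes.
static char fold_char(unsigned char ch) {
    if (ch >= 'A' && ch <= 'Z') return static_cast<char>(ch - 'A' + 'a');
    if (ch >= 'a' && ch <= 'z') return static_cast<char>(ch);
    if (ch == 0xE9 || ch == 0xC9) return 'e';
    return 0;
}

// Wrap every word whose folded form equals `term` (assumed already folded)
// in open/close, leaving the original accented text untouched.
std::string highlight_term(const std::string& text, const std::string& term,
                           const std::string& open, const std::string& close) {
    std::string out;
    std::string::size_type i = 0;
    while (i < text.size()) {
        if (!fold_char(static_cast<unsigned char>(text[i]))) {
            out += text[i++];  // copy separators through unchanged
            continue;
        }
        // Collect one word, tracking its original span and its folded form.
        std::string::size_type start = i;
        std::string folded;
        while (i < text.size()) {
            char f = fold_char(static_cast<unsigned char>(text[i]));
            if (!f) break;
            folded += f;
            ++i;
        }
        std::string original = text.substr(start, i - start);
        out += (folded == term) ? open + original + close : original;
    }
    return out;
}
```

The point of the design is that the word-boundary test and the fold are the same function, so the highlighter cannot disagree with itself about where a word ends; sharing that function with the indexer and query parser would avoid fixing the bug twice.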
comment:4 by , 22 years ago
Now fixed up everywhere apart from $highlight{} in query.cc which should share code with indextext.cc.
comment:5 by , 22 years ago
| Summary: | A word like bézier is split into b and zier → $highlight{} doesn't handle accented characters correctly |
|---|
comment:6 by , 22 years ago
| op_sys: | Linux → All |
|---|---|
| rep_platform: | PC → All |
| Severity: | normal → minor |
| Version: | 0.6.4 → 0.7.4 |

Bug is in omindex.cc / scriptindex.cc. Similar code should be pulled out into a shared file and fixed to handle accented characters. Probably transliterate most to unaccented characters to normalise accent representation...
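One possible shape for such a shared helper is sketched below. It is an illustration under stated assumptions rather than the code that was committed: `fold_latin1` and `normalised_words` are hypothetical names, the input is assumed to be ISO-8859-1, terms are also lower-cased, and letters without an obvious single-letter equivalent (æ, ð, þ, ß) are simply treated as separators here.

```cpp
#include <string>
#include <vector>

// Map a Latin-1 byte to an unaccented lower-case ASCII letter or digit.
// Returns 0 if the byte is not treated as a word character.
static char fold_latin1(unsigned char ch) {
    if (ch >= 'A' && ch <= 'Z') return static_cast<char>(ch - 'A' + 'a');
    if ((ch >= 'a' && ch <= 'z') || (ch >= '0' && ch <= '9'))
        return static_cast<char>(ch);
    // Fold upper-case accented letters onto their lower-case forms,
    // skipping 0xD7 (the multiplication sign).
    if (ch >= 0xC0 && ch <= 0xDE && ch != 0xD7) ch += 0x20;
    if (ch >= 0xE0 && ch <= 0xE5) return 'a';
    if (ch == 0xE7) return 'c';
    if (ch >= 0xE8 && ch <= 0xEB) return 'e';
    if (ch >= 0xEC && ch <= 0xEF) return 'i';
    if (ch == 0xF1) return 'n';
    if ((ch >= 0xF2 && ch <= 0xF6) || ch == 0xF8) return 'o';
    if (ch >= 0xF9 && ch <= 0xFC) return 'u';
    if (ch == 0xFD || ch == 0xFF) return 'y';
    return 0;
}

// Split text into words, transliterating accented letters as it goes, so
// every caller sees the same normalised terms.
std::vector<std::string> normalised_words(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        char folded = fold_latin1(ch);
        if (folded) {
            current += folded;
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

With this sketch, `normalised_words("B\xe9zier curve")` gives `bezier` and `curve`, so a document containing bézier and a query for bezier end up with the same term.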