Opened 12 years ago
Last modified 11 months ago
#618 assigned enhancement
Omega: Improved indexing of leafname (intelligent split into several words)
Reported by: | peterpan | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 2.0.0 |
Component: | Omega | Version: | 1.2.14 |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description (last modified by )
Reference: http://article.gmane.org/gmane.comp.search.xapian.general/9561
Omega indexes file names. The file name seems to be indexed as several words if the name contains space characters or hyphens.
In my NAS share I often separate words in the file name using "-", "_" or even a capital letter at the beginning of each word (I guess this is also the case for many other users):
Examples:
"this_is_a_file.txt"
"thisIsAFile.txt"
In those cases, I noticed that omega does not index the individual words, but only the full basename as one single word.
Therefore, omega should index each respective word (i.e. "this" "is" "a" "file") in addition to the full basename (i.e. "this_is_a_file"), in order to ease the search.
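To make the request concrete, here is a minimal Python sketch of the behaviour described above: produce the full basename plus each separator-delimited word. This is not Omega's actual code (Omega is written in C++); the function name `leafname_terms` and the choice of separators ("-", "_" and whitespace) are assumptions made for illustration only.

```python
import os
import re


def leafname_terms(path):
    """Return the basename (without extension) followed by its component words.

    Hypothetical helper: the separators "-", "_" and whitespace are an
    assumption based on the examples in this ticket.
    """
    leaf = os.path.basename(path)
    stem, _ext = os.path.splitext(leaf)
    # Always keep the full basename so exact-name searches still match.
    terms = [stem]
    for part in re.split(r"[-_\s]+", stem):
        # Skip empty pieces (leading/trailing separators) and duplicates.
        if part and part not in terms:
            terms.append(part)
    return terms


print(leafname_terms("this_is_a_file.txt"))
```

For "this_is_a_file.txt" this yields the full basename "this_is_a_file" followed by "this", "is", "a" and "file". A name without separators, such as "plain.txt", yields only "plain".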
Change History (7)
comment:1 by , 12 years ago
Description: | modified (diff) |
---|
comment:2 by , 12 years ago
Type: | defect → enhancement |
---|
comment:3 by , 8 years ago
Milestone: | → 1.4.1 |
---|---|
Status: | new → assigned |
comment:4 by , 8 years ago
Backported to RELEASE/1.4 branch in [50b1129bb024b7995584d820335fa1535f09aa15].
comment:6 by , 20 months ago
Milestone: | 1.4.x → 1.5.0 |
---|
comment:7 by , 11 months ago
Milestone: | 1.5.0 → 2.0.0 |
---|
We need an algorithm that handles camel-case suitably, without doing stupid things to other cases.
Perhaps "word-split before an upper case character if it's followed by either a lower case character, or by another upper case character and then a lower case character", so:
thisIsAFile -> this Is A File
AndThis -> And This
README -> README
nothandled -> nothandled
This would be reasonable to backport to a stable release series (especially early in the series) so not a blocker.
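A literal reading of the rule proposed above can be sketched in Python as follows. This is only an illustration of the rule as worded in this ticket, not code from Omega; the function name `split_camel_case` is hypothetical, and the sketch assumes that no split is ever made before the first character.

```python
def split_camel_case(name):
    """Split before an upper-case character that is followed by a lower-case
    character, or by an upper-case character and then a lower-case one."""
    if not name:
        return []
    words = []
    start = 0
    # Start at 1: assume we never split before the very first character.
    for i in range(1, len(name)):
        if not name[i].isupper():
            continue
        # Slices give "" past the end, and "".islower()/"".isupper() are False.
        nxt = name[i + 1:i + 2]
        nxt2 = name[i + 2:i + 3]
        if nxt.islower() or (nxt.isupper() and nxt2.islower()):
            words.append(name[start:i])
            start = i
    words.append(name[start:])
    return words


for example in ("thisIsAFile", "AndThis", "README", "nothandled"):
    print(example, "->", " ".join(split_camel_case(example)))
```

This reproduces the four examples listed above. Note that, read literally, the rule splits an acronym followed by a word such as "XMLParser" into "XM", "L" and "Parser", so it may need refining before use.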
"_" (and also "&") are handled as of [e66f0f0598a4a54243964fd4a7feca8080066b19] on git master. Marking for 1.4.1.
I've not attempted to handle camel-case yet. It seems some subtlety is needed there - e.g. "README.txt" shouldn't get indexed as "R E A D M E".