#552 closed task (fixed)
omindex extracts wrong extension
Reported by: | Ditha | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.2.4 |
Component: | Omega | Version: | 1.2.6 |
Severity: | minor | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description (last modified by )
If you try to index a directory structure like "/data/.../0/118/blog.laukien.com/software/admen" with "omindex --follow --preserve-nonduplicates --stemmer=german -M:text/html --db /data/INDEX /data/QUELLE", and the file being indexed has no extension of its own, the indexer thinks ".com/software..." is its extension: everything after the last dot is treated as the extension...
If you change the source of omindex.cc to:

    const char * dot_ptr = strrchr(d.leafname(), '.');
    const char * dot_slash = strrchr(d.leafname(), '/');
    if (dot_ptr && dot_slash && dot_ptr > dot_slash)

the extension will be interpreted correctly... I think. ;-)
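To make the reported behaviour concrete, here is a minimal C++ sketch of the naive rule the report describes, applied to a whole path. naive_extension is a hypothetical helper for illustration only, not code taken from omindex.cc.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper (not omindex's actual code): the naive rule described in
// the report, which treats everything after the last '.' anywhere in the path
// as the file's extension.
std::string naive_extension(const std::string& path) {
    const char* p = path.c_str();
    const char* dot_ptr = std::strrchr(p, '.');
    if (!dot_ptr) return std::string();  // no dot at all: no extension
    return std::string(dot_ptr + 1);     // may include directory components
}
```

With this rule, naive_extension("0/118/blog.laukien.com/software/admen") yields "com/software/admen", because the last dot in the path belongs to the directory name blog.laukien.com rather than to the file.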
Change History (4)
comment:1 by , 13 years ago
Description: | modified (diff) |
---|
comment:2 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Version: | 1.2.6 → 1.2.4 |
Thanks for your report, but this bug isn't actually present in 1.2.6: d.leafname() returns the leafname of the file, so in the situation you describe, d.leafname() will return "admen" and the extension is empty.
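A minimal sketch of why taking the extension from the leafname avoids the problem, assuming "leafname" means the part of the path after the last slash. leafname, leaf_extension and extension_term are hypothetical helpers, not omindex's code. The "E" prefix matches the extension term in the delve output below; any further normalisation omindex may apply (such as case folding) is not modelled here.

```cpp
#include <string>

// Hypothetical helper: the part of the path after the last '/',
// or the whole path if it contains no slash.
std::string leafname(const std::string& path) {
    std::string::size_type slash = path.rfind('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Extension taken from the leafname only, so dots in directory names are ignored.
std::string leaf_extension(const std::string& path) {
    std::string leaf = leafname(path);
    std::string::size_type dot = leaf.rfind('.');
    return dot == std::string::npos ? std::string() : leaf.substr(dot + 1);
}

// Extension term with the "E" prefix; an empty extension gives the bare term "E".
std::string extension_term(const std::string& path) {
    return "E" + leaf_extension(path);
}
```

For the path in the report, leafname() gives "admen", which contains no dot, so the extension is empty and the term is just "E".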
Testing (on the tip of browser:branches/1.2, but nothing relevant has changed there since 1.2.6):
    mkdir -p 0/118/blog.laukien.com/software
    echo testing > 0/118/blog.laukien.com/software/admen
    ./omindex --follow --preserve-nonduplicates --stemmer=german -M:text/html --db INDEX 0
    ../../xapian-core/examples/delve INDEX -r1
The output from delve is:
    Term List for record #1: D20110628 E I* M201106 Oolly P/ Ttext/html U/118/blog.laukien.com/software/admen Y2011 Zadm Ztesting admen testing
Note that there's a bare "E" term, not the "Ecom/software/admen" term there would be if this bug were present.
We did use to get this wrong, but it was fixed last year in r15181, and the fix was released in 1.2.4. Did you perhaps misreport the version you were using?
comment:3 by , 13 years ago
Milestone: | → 1.2.4 |
---|---|
Version: | 1.2.4 → 1.2.6 |
Oops, meant to set milestone not version.
It should probably be slash_ptr, not dot_slash. Also, I think the conditional needs to allow for there being no slash at all, since if you're indexing with relative paths, "wibble.html" needs to be interpreted as having an extension of ".html".
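The exact conditional suggested here did not survive in this copy of the ticket. The following is a hedged guess at its shape, inferred from the explanation: accept a dot as the start of an extension if there is no slash at all, or if the dot comes after the last slash. path_extension is a hypothetical helper operating on a full path, not omindex's actual code.

```cpp
#include <cstring>
#include <string>

// Hypothetical sketch of the discussed check. slash_ptr is null for a bare
// name like "wibble.html", so the dot still counts in that case; otherwise
// the dot must come after the last slash to be part of the file's own name.
std::string path_extension(const char* path) {
    const char* dot_ptr = std::strrchr(path, '.');
    const char* slash_ptr = std::strrchr(path, '/');
    if (dot_ptr && (!slash_ptr || dot_ptr > slash_ptr)) {
        return std::string(dot_ptr + 1);
    }
    return std::string();  // no dot, or the last dot is in a directory name
}
```

Requiring slash_ptr to be non-null, as in the reporter's version, would make "wibble.html" come out with no extension, which is the case this comment is pointing at.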