Opened 16 years ago
Last modified 21 months ago
#376 assigned enhancement
omindex: use config file for multi-start directories
Reported by: | Olly Betts | Owned by: | Olly Betts |
---|---|---|---|
Priority: | low | Milestone: | 2.0.0 |
Component: | Omega | Version: | git master |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description (last modified by )
It would be better (especially more convenient for users) to replace omindex's --no-delete option, and the need to specify both BASEDIR and --url URL, with some sort of configuration file listing one or more start directories, each with its own path->URL mapping.
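To make the proposal concrete, here is a minimal sketch of such a configuration file and the path->URL lookup it would drive. The file format (one whitespace-separated "DIRECTORY URL" pair per line, with # comments) and the names Site, parse_sites and url_for are all hypothetical; omindex has no such file today. When start directories nest, the sketch assumes the longest (most specific) match should win.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Site:
    directory: PurePosixPath  # start directory on disk
    url: str  # base URL the directory is served at, normalised to end in "/"


def parse_sites(text: str) -> list[Site]:
    """Parse hypothetical 'DIRECTORY URL' lines; blank lines and # comments are skipped."""
    sites = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # note: paths containing spaces are not supported
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'DIRECTORY URL', got {line!r}")
        directory, url = fields
        sites.append(Site(PurePosixPath(directory), url.rstrip("/") + "/"))
    return sites


def url_for(path: str, sites: list[Site]) -> str | None:
    """Map a file path to a URL via the longest matching start directory."""
    target = PurePosixPath(path)
    best: tuple[Site, PurePosixPath] | None = None
    for site in sites:
        try:
            rel = target.relative_to(site.directory)
        except ValueError:
            continue  # path is not under this start directory
        if best is None or len(site.directory.parts) > len(best[0].directory.parts):
            best = (site, rel)
    if best is None:
        return None  # not covered by any configured site
    site, rel = best
    suffix = rel.as_posix()
    return site.url if suffix == "." else site.url + suffix
```

For example, with /srv/www/docs mapped to https://example.org/docs/ and /srv/www/docs/api mapped to https://api.example.org/, the file /srv/www/docs/api/index.html maps to https://api.example.org/index.html, while /srv/www/docs/guide/intro.html maps to https://example.org/docs/guide/intro.html.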
Change History (8)
comment:1 by , 15 years ago
Priority: | normal → low |
---|---|
Status: | new → assigned |
comment:3 by , 14 years ago
I've been looking at the current "sites" support in omindex, which is closely related to the issue in this ticket.
It adds a "P" term with the path part of the start URL, and if there's a host part it also adds an "H" term.
I think it would make more sense to add a single term for each "site" with the full start URL - say XSITE<url>. Benefits:
- Updating a site can delete from the database any documents from that site which are no longer present on disk, without affecting documents from other sites (currently you have to use -p to prevent any deletion; otherwise, indexing one site deletes all the documents from the other sites).
- Deleting all documents from a site is easy (a single API call!)
- We can use "P" terms in a more natural way - either one term per path level of each document (so a filter restricting to all documents under a directory is easy), or one for the directory a document is actually in (so restricting to an exact directory is easy).
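The first two benefits can be illustrated with a toy in-memory model (deliberately not the Xapian API) in which each document is keyed by its URL and carries an XSITE<start URL> term, the prefix proposed in this comment. The function names are hypothetical. Updating one site prunes only that site's missing documents; whole-site deletion in a real Xapian database would be the single call WritableDatabase::delete_document(unique_term), the overload that removes every document indexed by the given term.

```python
from __future__ import annotations

# Toy model, NOT the Xapian API: the "database" maps each document's URL
# to the set of terms it is indexed by.


def site_term(start_url: str) -> str:
    """One term per site, holding the full start URL."""
    return "XSITE" + start_url


def add_document(db: dict[str, set[str]], url: str, start_url: str) -> None:
    db[url] = {site_term(start_url)}


def prune_site(db: dict[str, set[str]], start_url: str, present: set[str]) -> list[str]:
    """Remove this site's documents that are no longer on disk.

    Documents from other sites lack this site's term, so they are never touched.
    Returns the removed URLs in sorted order.
    """
    term = site_term(start_url)
    stale = sorted(url for url, terms in db.items() if term in terms and url not in present)
    for url in stale:
        del db[url]
    return stale


def delete_site(db: dict[str, set[str]], start_url: str) -> int:
    """Remove every document indexed by the site term and return how many were removed.

    In Xapian this corresponds to WritableDatabase::delete_document(term).
    """
    term = site_term(start_url)
    doomed = [url for url, terms in db.items() if term in terms]
    for url in doomed:
        del db[url]
    return len(doomed)
```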
comment:5 by , 9 years ago
Before 1.4.0, I think we should at least change to having a single term per site, and change which P terms we index, as comment:3 suggests. Both changes are easy to make, but unsuitable for making in the middle of a release series.
It's not quite clear to me whether we want one P term with the "directory" part of the document's URL path, or one for each parent directory too. Both seem to have their use cases, but I'm currently leaning slightly towards the first. Possibly we should index both, one as P and one as something else (perhaps W - mnemonic: "where" the document is).
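A sketch of indexing both kinds of term, using the prefixes floated here: a P term for the document's directory and each of its parents, plus a W term for the exact directory only. The term format (prefix followed directly by the URL path, with no trailing slash except for the root) and the function names are assumptions for illustration. A single P term then restricts a search to a whole subtree, while a single W term restricts it to exactly one directory.

```python
from __future__ import annotations

from posixpath import dirname


def directory_of(url_path: str) -> str:
    """Directory part of a URL path: '/docs/api/b.html' -> '/docs/api'."""
    return dirname(url_path) or "/"


def ancestor_dirs(directory: str) -> list[str]:
    """The directory itself followed by each parent, ending at the root."""
    dirs = [directory]
    while True:
        parent = dirname(directory) or "/"
        if parent == directory:  # dirname of the root is the root
            return dirs
        dirs.append(parent)
        directory = parent


def path_terms(url_path: str) -> set[str]:
    """Hypothetical scheme: P<dir> for every ancestor directory, W<dir> for the exact one."""
    directory = directory_of(url_path)
    terms = {"P" + d for d in ancestor_dirs(directory)}
    terms.add("W" + directory)
    return terms
```

For example, path_terms("/docs/api/b.html") gives P/docs/api, P/docs, P/ and W/docs/api, so filtering on P/docs matches it but filtering on W/docs does not.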
The "site" term could use prefix C (mnemonic: "cite" is a homonym of "site"?!) or J (sorry, I think I just strained my mnemonic generator).
comment:6 by , 9 years ago
Milestone: | 1.3.x → 1.4.x |
---|---|
[cfbf588546dcdb64a275029e7534ce07b03fd242] implements the new terms.
I went for J as the site term (the start URL is a "Jumping-off point"), and P terms are added for each parent directory too (which means a document is indexed by the same P term as before, plus, usually, some additional ones).
The rest doesn't need to block 1.4.0 - we can add features which use these new terms in 1.4.x without requiring a reindex.
comment:7 by , 5 years ago
Description: | modified (diff) |
---|---|
Version: | SVN trunk → git master |
Not vital, but better to do for 1.2.0 rather than mid-series.