Opened 16 years ago
Last modified 3 years ago
#376 assigned enhancement
omindex: use config file for multi-start directories
| Reported by: | Olly Betts | Owned by: | Olly Betts |
|---|---|---|---|
| Priority: | low | Milestone: | 2.0.0 |
| Component: | Omega | Version: | git master |
| Severity: | normal | Keywords: | |
| Cc: | | Blocked By: | |
| Blocking: | | Operating System: | All |
Description
It would be better, and especially more convenient for users, to replace omindex's --no-delete option and the combination of BASEDIR and --url URL with some sort of configuration file listing one or more starting directories, each with its own path->URL mapping.
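To make the idea concrete, here is a minimal sketch of what such a file and its path->URL lookup might look like. The file format, the example directories and URLs, and the helper names `parse_mappings` and `url_for` are all invented for illustration; omindex does not read this format.

```python
# Hypothetical config text: one "start directory = base URL" pair per line.
# This format is an assumption made for this sketch, not an omindex feature.
EXAMPLE_CONFIG = """
# start directory       = base URL
/srv/www/docs           = https://example.org/docs/
/home/alice/public_html = https://example.org/~alice/
"""


def parse_mappings(text):
    """Return a list of (directory, base_url) pairs from the config text."""
    mappings = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        directory, base_url = (part.strip() for part in line.split("=", 1))
        # Normalise: no trailing slash on the directory, exactly one on the URL.
        mappings.append((directory.rstrip("/"), base_url.rstrip("/") + "/"))
    return mappings


def url_for(path, mappings):
    """Map a file path to a URL using the longest matching start directory."""
    for directory, base_url in sorted(mappings, key=lambda m: -len(m[0])):
        if path == directory or path.startswith(directory + "/"):
            return base_url + path[len(directory):].lstrip("/")
    return None  # the path is not under any configured start directory
```

With a file like this, one indexing run could cover several start directories, each with its own base URL, instead of one BASEDIR and one --url per invocation.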
Change History (8)
comment:1 by , 16 years ago
| Priority: | normal → low |
|---|---|
| Status: | new → assigned |
comment:3 by , 15 years ago
I've been looking at the current "sites" support in omindex, which is closely related to the issue in this ticket.
It adds a "P" term with the path part of the start URL, and if there's a host part it also adds an "H" term.
I think it would make more sense to add a single term for each "site" with the full start URL - say XSITE<url>. Benefits:
- Updating a site can delete from the database any documents from that site which are no longer present on disk, without affecting documents from other sites (currently you have to use -p to prevent any deletion, otherwise indexing a site deletes all the documents from other sites).
- Deleting all documents from a site is easy (a single API call!)
- We can use "P" terms in a more natural way - either one term per path level of each document (so a filter restricting to all documents under a directory is easy), or one for the directory a document is actually in (so restricting to an exact directory is easy).
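The first two benefits can be illustrated with a toy in-memory index (a dict from document URL to a set of terms) rather than a real Xapian database. This is only a sketch: `site_term`, `update_site` and `delete_site` are hypothetical helpers, and the XSITE prefix is simply the one floated above. In Xapian itself, `WritableDatabase::delete_document()` has an overload taking a term, which is presumably the single API call meant here.

```python
def site_term(start_url):
    # "XSITE" is the prefix suggested in this comment; purely illustrative.
    return "XSITE" + start_url


def update_site(index, start_url, urls_on_disk):
    """Reindex one site in a toy index mapping document URL -> set of terms.

    Documents carrying this site's term that are no longer on disk are
    removed; documents belonging to other sites are left alone.
    Returns the list of URLs that were removed.
    """
    term = site_term(start_url)
    stale = [url for url, terms in index.items()
             if term in terms and url not in urls_on_disk]
    for url in stale:
        del index[url]
    for url in urls_on_disk:
        index.setdefault(url, set()).add(term)
    return stale


def delete_site(index, start_url):
    """Drop every document from one site by matching its single site term."""
    term = site_term(start_url)
    for url in [u for u, terms in index.items() if term in terms]:
        del index[url]
```

Because each document carries exactly one site term, both operations only need to look at that term and never touch documents indexed from a different start URL.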
comment:5 by , 10 years ago
Before 1.4.0, I think we should at least implement a change to have a single term per site, and change what P terms we index, as comment:3 suggests. Both changes are easy to make, but unsuitable for the middle of a release series.
It's not quite clear to me whether we want one P term with the "directory" part of the URL path of the document, or one for each parent directory too. Both seem to have their use cases, but I think I'm currently leaning slightly towards the first. Possibly we should index both, one as P and one as something else (perhaps W - mnemonic: "where" the document is).
The "site" term could use prefix C (mnemonic: "cite" is a homonym of "site"?!) or J (sorry, I think I just strained my mnemonic generator).
comment:6 by , 10 years ago
| Milestone: | 1.3.x → 1.4.x |
|---|
[cfbf588546dcdb64a275029e7534ce07b03fd242] implements the new terms.
I went for J as the site term (the start URL is a "Jumping-off point"), and P terms are added for each parent directory too, which means a document is indexed by the same P term as before, plus (usually) some additional ones.
The rest doesn't need to block 1.4.0 - we can add features which use these new terms in 1.4.x without requiring a reindex.
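As a rough illustration of the term scheme described in this comment, the sketch below builds one J term from the start URL and one P term per directory level of the document's URL path. It is not taken from the commit: the function name, the exact form of each term (for example no trailing slash, and a separate root term) and the lack of any handling for overlong terms are assumptions.

```python
from urllib.parse import urlsplit


def site_and_path_terms(start_url, doc_url):
    """Return boolean terms for one document: one J term, then P terms."""
    terms = ["J" + start_url]
    # Directory containing the document: URL path with the leaf name removed.
    directory = urlsplit(doc_url).path.rsplit("/", 1)[0]
    # Walk up from the document's own directory towards the root.
    while directory:
        terms.append("P" + directory)
        directory = directory.rsplit("/", 1)[0]
    terms.append("P/")  # assumed term for the root directory
    return terms


def is_under(terms, directory):
    """Restricting to a directory subtree is a single-term membership test."""
    return "P" + directory in terms
```

For a document at `http://example.org/docs/api/index.html` this yields a P term for `/docs/api` (the directory the document is actually in) plus additional ones for `/docs` and `/`, so a filter on any ancestor directory matches it.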
comment:7 by , 6 years ago
| Description: | modified (diff) |
|---|---|
| Version: | SVN trunk → git master |

Not vital, but better to do for 1.2.0 rather than mid-series.