Opened 16 years ago

Last modified 20 months ago

#376 assigned enhancement

omindex: use config file for multi-start directories

Reported by: Olly Betts
Owned by: Olly Betts
Priority: low
Milestone: 2.0.0
Component: Omega
Version: git master
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description (last modified by Olly Betts)

It would be better (and especially more convenient for users) to replace omindex's --no-delete option, and the need to specify both BASEDIR and --url URL, with some sort of configuration file which lists one or more starting directories, with a path->URL mapping for each.
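As a rough illustration of what such a file might look like, here is a minimal sketch. The INI-style layout (one section per start directory, with a `url` key), the example paths and hosts, and the helper names `load_sites` and `file_to_url` are all assumptions made for illustration; the ticket does not specify any format.

```python
import configparser

# Hypothetical config format: one section per start directory,
# each with the URL that directory should map to.
EXAMPLE_CONFIG = """
[/var/www/docs]
url = https://example.org/docs/

[/srv/manuals]
url = https://manuals.example.org/
"""


def load_sites(text):
    """Return a list of (start_directory, base_url) pairs from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return [(section.rstrip("/"), parser[section]["url"])
            for section in parser.sections()]


def file_to_url(sites, path):
    """Map a file path to a URL using the longest matching start directory."""
    best = None
    for start_dir, base_url in sites:
        if path == start_dir or path.startswith(start_dir + "/"):
            if best is None or len(start_dir) > len(best[0]):
                best = (start_dir, base_url)
    if best is None:
        return None  # the path is not under any configured start directory
    start_dir, base_url = best
    relative = path[len(start_dir):].lstrip("/")
    return base_url.rstrip("/") + "/" + relative
```

With this layout, each start directory carries its own mapping, so a single run could cover several directories without a shared BASEDIR and --url pair.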

Change History (8)

comment:1 by Olly Betts, 15 years ago

Priority: normal → low
Status: new → assigned

Not vital, but better to do for 1.2.0 rather than mid-series.

comment:2 by Olly Betts, 15 years ago

Milestone: 1.1.7 → 1.2.0

Bumping to stay on track for release.

comment:3 by Olly Betts, 14 years ago

I've been looking at the current "sites" support in omindex, which is closely related to the issue in this ticket.

It adds a "P" term with the path part of the start URL, and if there's a host part it also adds an "H" term.

I think it would make more sense to add a single term for each "site" with the full start URL - say XSITE<url>. Benefits:

  • Updating a site can delete from the database any documents from that site which are no longer present on disk, without affecting documents from other sites (currently you have to use -p to prevent any deletion, otherwise indexing a site deletes all the documents from other sites).
  • Deleting all documents from a site is easy (a single API call!)
  • We can use "P" terms in a more natural way - either one term per path level of each document (so a filter restricting to all documents under a directory is easy), or one for the directory a document is actually in (so restricting to an exact directory is easy).
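The first two benefits can be sketched with a toy in-memory model. This is not Xapian code: `ToyIndex`, `site_term` and `update_site` are hypothetical names, and the `XSITE` + URL term form simply follows the suggestion above. In Xapian itself, the "single API call" would presumably be `WritableDatabase::delete_document()` called with the site term.

```python
class ToyIndex:
    """In-memory stand-in for a term-indexed database (not Xapian itself)."""

    def __init__(self):
        self.docs = {}  # document URL -> set of terms indexing it

    def replace(self, url, terms):
        """Add the document, or overwrite it if already present."""
        self.docs[url] = set(terms)

    def delete_by_term(self, term):
        """Delete every document indexed by `term`; return how many went."""
        doomed = [url for url, terms in self.docs.items() if term in terms]
        for url in doomed:
            del self.docs[url]
        return len(doomed)


def site_term(start_url):
    # Assumed term form, following the XSITE<url> suggestion.
    return "XSITE" + start_url


def update_site(index, start_url, urls_on_disk):
    """Reindex one site and drop its documents that are no longer on disk.

    Documents belonging to other sites are never touched, because only
    documents carrying this site's term are candidates for deletion.
    """
    term = site_term(start_url)
    present = set(urls_on_disk)
    stale = [url for url, terms in index.docs.items()
             if term in terms and url not in present]
    for url in stale:
        del index.docs[url]
    for url in present:
        index.replace(url, {term})
    return sorted(stale)
```

Re-running `update_site` for one start URL removes only that site's vanished documents, and `delete_by_term(site_term(start_url))` removes a whole site in one step.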

comment:4 by Olly Betts, 12 years ago

Milestone: 1.2.x → 1.3.x

This isn't 1.2.x material now.

comment:5 by Olly Betts, 9 years ago

Before 1.4.0, I think we should at least implement a change to have a single term per site, and change what P terms we index, as comment:3 suggests. Both changes are easy to make, but unsuitable for doing mid-release-series.

Not quite clear to me if we want one P term with the "directory" part of the URL path of the document, or one for each parent directory too. Both seem to have their use cases, but I think I'm leaning slightly towards the first currently. Possibly we should index both, one as P and one as something else (perhaps W - mnemonic: "where" the document is).

The "site" term could use prefix C (mnemonic: "cite" is a homonym of "site"?!) or J (sorry, I think I just strained my mnemonic generator).

comment:6 by Olly Betts, 9 years ago

Milestone: 1.3.x → 1.4.x

[cfbf588546dcdb64a275029e7534ce07b03fd242] implements the new terms.

I went for J as the site term (the start URL is a "Jumping-off point"), and P terms are added for each parent directory too (which means a document is indexed by the same P term as before, plus (usually) some additional ones).
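To make the resulting term scheme concrete, here is a small sketch of how the J term and per-directory P terms for one document might be derived. The exact term strings (for example, whether the root directory gets its own `P/` term, or how trailing slashes and very long URLs are handled) are assumptions here rather than taken from the commit, and `path_terms` and `terms_for` are hypothetical helper names.

```python
from urllib.parse import urlsplit


def path_terms(doc_url):
    """Return a "P" term for the document's directory and each parent."""
    path = urlsplit(doc_url).path
    directory = path.rsplit("/", 1)[0]  # drop the file name component
    parts = [p for p in directory.split("/") if p]
    terms = ["P/"]  # assumed: the root directory gets a term too
    current = ""
    for part in parts:
        current += "/" + part
        terms.append("P" + current)
    return terms


def terms_for(start_url, doc_url):
    """Site term (J + start URL) followed by the document's P terms."""
    return ["J" + start_url] + path_terms(doc_url)
```

Under these assumptions, restricting a search to everything below `/docs` is a filter on the single term `P/docs`, since every document in that directory or a subdirectory of it carries that term.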

The rest doesn't need to block 1.4.0 - we can add features which use these new terms in 1.4.x without requiring a reindex.

Last edited 20 months ago by Olly Betts

comment:7 by Olly Betts, 5 years ago

Description: modified
Version: SVN trunk → git master

comment:8 by Olly Betts, 20 months ago

Milestone: 1.4.x → 2.0.0

Bumping.
