Opened 13 years ago

Last modified 4 months ago

#568 new enhancement

omindex: fallback filter(s)?

Reported by: Charles
Owned by: Olly Betts
Priority: normal
Milestone: 2.0.0
Component: Omega
Version: git master
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

Given that there are no 100% reliable filters for complex file types such as .doc and .pdf, how about being able to specify fallback filters?

This might be implemented by specifying multiple --filter options for the same file type; if the first fails (non-zero return code, no output for files larger than a minimum size ...) then omindex would try the others in sequence until one works.
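
A rough sketch of the failure test described above, written as a shell helper purely for illustration. The name `extraction_ok` and the `MIN_SIZE` threshold are assumptions made for this example, not anything omindex provides:

```shell
#!/bin/sh
# Hypothetical check: treat an extractor run as failed when it exited
# non-zero, or when it produced no output for an input file larger than
# MIN_SIZE bytes. MIN_SIZE is an assumed, arbitrary threshold.
MIN_SIZE=${MIN_SIZE:-1024}

# Usage: extraction_ok EXIT_STATUS INPUT_FILE OUTPUT_FILE
extraction_ok() {
    status=$1
    input=$2
    output=$3
    # A non-zero exit status always counts as a failure.
    [ "$status" -eq 0 ] || return 1
    # Arithmetic expansion strips any padding wc adds on some platforms.
    in_size=$(($(wc -c < "$input")))
    # An empty result is only suspicious when the input was large enough.
    if [ "$in_size" -gt "$MIN_SIZE" ] && [ ! -s "$output" ]; then
        return 1
    fi
    return 0
}
```

A caller would run a filter with its output redirected to a file, then pass the filter's exit status and both file names to `extraction_ok` to decide whether to move on to the next filter.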

Change History (4)

comment:1 by Olly Betts, 12 years ago

Milestone: 1.3.x
Version: SVN trunk

Marking for 1.3.x.

comment:2 by Olly Betts, 8 years ago

Milestone: 1.3.x → 1.4.x

We need to get 1.4.0 out, and this doesn't seem a release blocker.

comment:3 by Olly Betts, 19 months ago

Milestone: 1.4.x → 1.5.0
Version: SVN trunk → git master

Currently specifying --filter overrides any previously set filter for the specified mime-type, so changing that to instead create a chain of filters is potentially incompatible with existing usage.

We could clearly come up with a way to specify a list of filters though.

Meanwhile a simple version of this can be achieved with a wrapper script which tries tools in turn until one succeeds, e.g.:

#!/bin/sh
set -e
antiword "$@" || catdoc "$@"

The main downside of this approach is probably that the extractors all have to produce the same format (plaintext, HTML or SVG), all in the same character encoding, and all have to write to stdout or all to a temporary file. If omindex handled this it could mix and match. With git master you could also mix the new plugin extractors with command line extractors.

If output is to stdout, the wrapper script also potentially risks a failing tool producing partial output on stdout before failing. If a temporary file is used the subsequent tool should just overwrite any partial output.
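
One way to avoid the partial-output risk with stdout is for the wrapper to buffer each tool's output and only emit it once that tool has exited successfully. A minimal sketch, assuming each extractor is a command (or shell function) that takes the input file as its only argument; `run_with_fallback` is a hypothetical name, and `mktemp` is assumed to be available:

```shell
#!/bin/sh
# Hypothetical helper: try each extractor in turn on one input file.
# Output is buffered in a temporary file and copied to stdout only after
# an extractor exits with status 0, so a tool which fails part-way
# through cannot leak partial output.
# Usage: run_with_fallback INPUT_FILE EXTRACTOR...
run_with_fallback() {
    input=$1
    shift
    tmp=$(mktemp) || return 1
    for tool in "$@"; do
        # The redirection truncates $tmp, discarding earlier partial output.
        if "$tool" "$input" > "$tmp"; then
            cat "$tmp"
            rm -f "$tmp"
            return 0
        fi
    done
    rm -f "$tmp"
    return 1
}
```

A wrapper could then end with something like `run_with_fallback "$1" antiword catdoc`; a tool that needs extra options can be wrapped in a small shell function. This still requires every tool to produce the same format and encoding.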

comment:4 by Olly Betts, 4 months ago

Milestone: 1.5.0 → 2.0.0