Opened 13 years ago
Last modified 14 months ago
#568 new enhancement
omindex: fallback filter(s)?
Reported by: | Charles | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 2.0.0 |
Component: | Omega | Version: | git master |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Given that there are no 100% reliable filters for complex file types such as .doc and .pdf, how about being able to specify fallback filters?
This might be implemented by specifying multiple --filter options for the same file type; if the first fails (non-zero return code, no output for files larger than a minimum size ...) then omindex would try the others in sequence until one works.
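The failure test described above could be sketched in shell roughly as follows. This is only an illustration, not omindex code: the function name `filter_succeeded` and the `MIN_SIZE` threshold (with its 1024-byte default) are hypothetical, and the sketch covers only the two criteria named in the description.

```shell
#!/bin/sh
# Hypothetical threshold: inputs no larger than this many bytes are
# allowed to produce empty output without counting as a failure.
MIN_SIZE=${MIN_SIZE:-1024}

# filter_succeeded STATUS INPUT_FILE OUTPUT_FILE
# Returns 0 if a filter run should count as having worked.
filter_succeeded() {
    status=$1
    input=$2
    output=$3
    # A non-zero exit status always counts as a failure.
    [ "$status" -eq 0 ] || return 1
    # Empty output counts as a failure only for inputs above the threshold.
    in_size=$(($(wc -c < "$input")))
    if [ "$in_size" -gt "$MIN_SIZE" ] && [ ! -s "$output" ]; then
        return 1
    fi
    return 0
}
```

A caller would run a filter, capture its exit status and output file, and move on to the next filter whenever `filter_succeeded` returns non-zero.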
Change History (4)
comment:1 by , 13 years ago
Milestone: | → 1.3.x |
---|---|
Version: | → SVN trunk |
comment:2 by , 9 years ago
Milestone: | 1.3.x → 1.4.x |
---|
We need to get 1.4.0 out, and this doesn't seem a release blocker.
comment:3 by , 2 years ago
Milestone: | 1.4.x → 1.5.0 |
---|---|
Version: | SVN trunk → git master |
Currently specifying --filter overrides any previously set filter for the specified mime-type, so changing that to instead create a chain of filters is potentially incompatible with existing usage.
We could clearly come up with a way to specify a list of filters though.
Meanwhile a simple version of this can be achieved with a wrapper script which tries tools in turn until one succeeds, e.g.:
    #!/bin/sh
    set -e
    antiword "$@" || catdoc "$@"
The main downside of this approach is probably that the extractors all have to produce the same format (plaintext, HTML or SVG), all in the same character encoding, and all write either to stdout or to a temporary file. If omindex handled this it could mix and match. With git master you could also mix the new plugin extractors with command line extractors.
If output is to stdout, the wrapper script also risks a failing tool writing partial output to stdout before it fails. If a temporary file is used, the subsequent tool should just overwrite any partial output.
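One way a wrapper might avoid leaking partial output is to buffer each tool's stdout in a temporary file and only emit it once the tool has exited successfully. The following is a minimal sketch under stated assumptions: the function name `try_in_turn` is made up for illustration, and each tool is assumed to take the input file as its only argument and to write its result to stdout.

```shell
#!/bin/sh
# try_in_turn INPUT TOOL...
# Runs each TOOL on INPUT in order, buffering stdout in a temporary file.
# Prints the output of the first tool that exits with status 0.
# Returns 1 (printing nothing) if every tool fails.
try_in_turn() {
    input=$1
    shift
    tmp=$(mktemp) || return 1
    for tool in "$@"; do
        # ">" truncates the buffer, so partial output left behind by a
        # previously failed tool is discarded before the next attempt.
        if "$tool" "$input" > "$tmp"; then
            cat "$tmp"
            rm -f "$tmp"
            return 0
        fi
    done
    rm -f "$tmp"
    return 1
}
```

Used as a filter wrapper, the script would end with something like `try_in_turn "$1" antiword catdoc`. The cost of this approach is that output is no longer streamed; it only appears once a tool has finished.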
comment:4 by , 14 months ago
Milestone: | 1.5.0 → 2.0.0 |
---|
Marking for 2.0.0.