omindex: delay libmagic checks
Reported by: Olly Betts | Owned by: Bruno Baruffaldi
Description
Currently omindex's logic is:
- map the extension to a mime type
- if "ignore" or "skip" move on to next file
- check file size (requires stat() call, which we have avoided so far if the file system returns
- if 0 or > max_size then move on to next file
- if extension mapping didn't give a mime type, call libmagic to get a mime type
- if libmagic doesn't recognise the file, move on to next file
- create the Document object and set it up a little
- check timestamps from stat() and the DB for an existing entry and move on to next file if this has been indexed and hasn't changed
- check for failed entry in DB and move on if we already tried and failed (needs file size and last mod from
The ordering here isn't ideal - in particular:
- The probing done by libmagic is potentially fairly expensive since it has to open and read the start of the file, so we should avoid calling libmagic if another cheap check which doesn't need the mime type could reject the file (e.g. possibly timestamps if we can uncouple those checks from the check for the existing DB entry). If we have a mapping, checking the mime type for "ignore" or "skip" is still a cheap early check.
- We create and set up the Document object a bit early (though this shouldn't be very expensive).
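One possible reordering is sketched below in C++17. It is not omindex's actual code: `Decision`, `FileInfo`, `DbLookup`, `MagicProbe` and `decide()` are all hypothetical names, the libmagic call is represented by a callback rather than the real API, and the sketch assumes the timestamp check really can be uncoupled from the mime type (comparing modification times with `>=` is also an assumption). The point it shows is that the expensive probe runs only after every cheaper check has failed to reject the file.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>

// Hypothetical outcome of the per-file checks.
enum class Decision {
    SkipByExtension, SkipBySize, SkipUnchanged, SkipUnknownType, Index
};

// Hypothetical stand-in for the fields taken from the directory entry and stat().
struct FileInfo {
    std::string extension;
    std::uint64_t size;
    std::int64_t mtime;
};

// Hypothetical: last-modified time stored in the DB, if the file was indexed before.
using DbLookup = std::function<std::optional<std::int64_t>()>;
// Hypothetical wrapper around the libmagic probe; returns "" if unrecognised.
using MagicProbe = std::function<std::string()>;

Decision decide(const FileInfo& f,
                const std::map<std::string, std::string>& ext_to_mime,
                std::uint64_t max_size,
                const DbLookup& db_mtime,
                const MagicProbe& probe_magic)
{
    // 1. Extension mapping needs no I/O, so it stays first.
    std::string mime;
    auto it = ext_to_mime.find(f.extension);
    if (it != ext_to_mime.end()) {
        mime = it->second;
        if (mime == "ignore" || mime == "skip")
            return Decision::SkipByExtension;
    }

    // 2. Size check: needs only stat() data.
    if (f.size == 0 || f.size > max_size)
        return Decision::SkipBySize;

    // 3. Timestamp check: needs stat() data and a DB lookup, but no mime type.
    if (auto prev = db_mtime(); prev && *prev >= f.mtime)
        return Decision::SkipUnchanged;

    // 4. Only now pay for libmagic, and only if the extension gave no type.
    if (mime.empty()) {
        mime = probe_magic();
        if (mime.empty())
            return Decision::SkipUnknownType;
    }

    // The Document object would be created here, after all rejections.
    return Decision::Index;
}
```

With this ordering, a file with an unmapped extension that is already indexed and unchanged is rejected at step 3 without the file ever being opened for probing.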