Omega is a search application built on top of the Xapian library. You can use it to easily add a search feature to your website, but it's also easy to use as a search frontend with your own indexer.
Things to cover:
- scriptindex allows easily configurable indexing of data from diverse sources (e.g. indexing from SQL)
- document dbi2omega. Environment variables:
  - DBUSER - user name to connect to the database with (defaults to $USER, then $LOGNAME, then "")
  - DBPASSWORD - password to connect to the database with (defaults to "")
  - DBIDRIVER - DBI driver to use (defaults to "mysql")
- document mbox2omega
- crawling using ht://dig:
- document htdig2omega
- what changes will htdig4 need?
- crawling using GNU wget:
- mirror web pages locally and then use omindex
- supports resuming download after error, proxies, cookies
- HOWTO style guide and/or wrapper script would be useful
- Peter Masiar concluded ht://dig was more suitable - find out why...
- file formats which omindex understands
- how to add new formats (this should be specifiable in a config file). See FAQ/OmegaNewFileFormat, OmindexSamples.
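
The dbi2omega defaults listed above could be documented with a small illustration of the fallback order. The following is only a sketch in Python, not the dbi2omega source: the function name `dbi_settings` and the returned dictionary keys are hypothetical, and it assumes a variable only falls through to the next default when it is unset (not when it is set to an empty string).

```python
import os


def dbi_settings(env=None):
    """Resolve connection settings using the fallback order from the outline.

    `env` is any mapping of environment variables; it defaults to os.environ.
    Assumption: only an unset variable triggers the fallback.
    """
    if env is None:
        env = os.environ

    # DBUSER, then $USER, then $LOGNAME, then the empty string.
    user = env.get("DBUSER")
    if user is None:
        user = env.get("USER")
    if user is None:
        user = env.get("LOGNAME", "")

    return {
        "user": user,
        "password": env.get("DBPASSWORD", ""),  # defaults to ""
        "driver": env.get("DBIDRIVER", "mysql"),  # defaults to "mysql"
    }
```

Passing a plain dictionary instead of the real environment, for example `dbi_settings({"LOGNAME": "bob"})`, makes the fallback order easy to check without touching the shell.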

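For the wget item in the outline above, a wrapper script might start from something like the following. This is a rough sketch rather than a tested recipe: the function name `build_commands` and its parameters are hypothetical, and it assumes the start URL is the site root, so that the mirrored directory maps directly onto the base URL given to omindex. It only builds the two command lines and does not run them.

```python
def build_commands(start_url, mirror_dir, db_path):
    """Return (wget_cmd, omindex_cmd) as argument lists.

    Assumption: start_url is the root of the site, so files mirrored into
    mirror_dir correspond to URLs directly below start_url.
    """
    wget_cmd = [
        "wget",
        "--mirror",               # recursive download with timestamping
        "--no-parent",            # never ascend above the start URL
        "--no-host-directories",  # store files directly under mirror_dir
        "--directory-prefix=" + mirror_dir,
        start_url,
    ]
    omindex_cmd = [
        "omindex",
        "--db", db_path,          # Xapian database to create or update
        "--url", start_url,       # base URL that mirror_dir corresponds to
        mirror_dir,
    ]
    return wget_cmd, omindex_cmd
```

Each list can then be passed to `subprocess.run(cmd, check=True)`, running the wget command first and omindex only if the mirror succeeds.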