Detailed Example of Using Omega for the First Time

This worked example was based on Xapian 1.0, and has not been updated to a more recent version. Although the concepts and principles have not changed, some of the details may have done. In particular, we strongly recommend you use a recent released version of Xapian. These days you can skip the installation steps if you are able to use operating system packages.

Reformatted from Jim's original at http://fayettedigital.com/omegaexample.html by Olly.
See also a tutorial at http://www.linux.com/archive/feature/149223?theme=print

This document will outline in detail the steps necessary to get an example search engine based on Omega and Xapian up and running. I'll point you to a set of files that you can install on your own system and index. This example uses omindex and omega.

Installing Xapian and Omega

You should generally follow the instructions in our user guide. If you use operating system packages, you can skip a lot of the following, although you may have to change some URLs and paths in the later sections depending on the details of those packages.

Requirements are:

  • Apache or another http server that you are familiar with
  • A C++ compiler

This example was developed on Linux. I have no idea how to get it to run on any other OS, so it's up to you to translate the instructions here to your specific system. I'm running a Mint 20 system with Apache 2.4.1 and G++ 9.3.0.

First you must install the Xapian libraries. Download the source from the Xapian download site.

Extract the files from the archive with the following command. Note, the file name will probably be different from this example:

tar xf xapian-core-1.4.18.tar.xz

This will create a directory xapian-core-1.4.18 so change to that directory, i.e.:

cd xapian-core-1.4.18

And configure via:

./configure --prefix=/usr/local

The --prefix isn't mandatory so if you know what you are doing, you can remove it. If there are no errors, then you can make the libraries with a make command:

make

Assuming make completed without errors, install as root. This example uses sudo; if you use su instead, drop the sudo prefix:

sudo make install

This will install the xapian library on your system.
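A quick way to confirm the install step worked is to ask the xapian-config script, which xapian-core installs alongside the library, for its version. This is only a minimal sketch: check_xapian is a hypothetical helper name, and the PATH hint assumes the /usr/local prefix used above.

```shell
# check_xapian is a hypothetical helper, not part of Xapian itself.
check_xapian() {
    if command -v xapian-config >/dev/null 2>&1; then
        # Prints the version of the installed xapian-core.
        xapian-config --version
    else
        echo "xapian-config not found on PATH"
        return 1
    fi
}

# Assumes --prefix=/usr/local, so the script would live in /usr/local/bin.
check_xapian || echo "Check that /usr/local/bin is on your PATH"
```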

Now that we have Xapian installed, we need to install the Omega utilities. To do this, download Omega from the same place you found the Xapian files, then extract, configure, make and install it the same way you did for the libraries. The following commands should work.

cd ~
tar xf xapian-omega-1.4.18.tar.xz
cd xapian-omega-1.4.18
./configure --prefix=/usr/local
make
sudo make install
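Since both packages are built the same way and your file names will probably differ, it may help to print the command sequence for a given tarball before running anything. The sketch below only echoes the steps; print_build_steps is a hypothetical helper name, and it assumes the archive unpacks into a directory named after the file minus its .tar.* suffix, as in the examples above.

```shell
# Print (do not run) the build steps for one source tarball.
# print_build_steps is a hypothetical helper, not part of Xapian.
print_build_steps() {
    archive=$1
    prefix=${2:-/usr/local}
    # Strip the .tar.xz / .tar.gz suffix to get the directory tar creates
    # (an assumption that holds for the archives used in this example).
    dir=${archive%.tar.*}
    printf '%s\n' \
        "tar xf $archive" \
        "cd $dir" \
        "./configure --prefix=$prefix" \
        "make" \
        "sudo make install"
}

print_build_steps xapian-core-1.4.18.tar.xz
print_build_steps xapian-omega-1.4.18.tar.xz
```

Plain `tar xf` is used here because GNU tar detects the compression type itself when extracting.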

If you are installing from source and encounter errors during the configure or make steps for either of these packages, please check the README and INSTALL files in each directory for possible additional instructions. If that doesn't help, search the mailing list archive, and if you are still having problems, post a message to the mailing list.

If you've gotten this far then we're almost home. The next step is to copy the omega program into your cgi-bin directory. If you don't know where it is, you'll need to look at the apache (or httpd) configuration files. Here's the section of my apache config file that tells me where to look:

ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/

So I know to put cgi binaries in the /usr/lib/cgi-bin directory. The next few lines demonstrate copying the omega binary.

sudo cp /usr/local/lib/xapian-omega/bin/omega /usr/lib/cgi-bin/omega.cgi
cd ~/xapian-omega-1.4.18
sudo cp omega.conf /usr/lib/cgi-bin/
sudo chmod 755 /usr/lib/cgi-bin/omega.cgi

Some http servers require the cgi binaries to have an extension of .cgi, so we'll do that so we're sure it'll work. Note we've also copied the omega.conf file to the same directory. This is the easiest way to get things to work.
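If you would rather not read the server configuration by hand, the ScriptAlias directory can be pulled out with a short awk filter. This is a rough sketch: scriptalias_dirs is a hypothetical helper name, and the config file path in the usage comment is an assumption that varies between distributions.

```shell
# Print the filesystem directory from each active ScriptAlias line.
# scriptalias_dirs is a hypothetical helper name.
scriptalias_dirs() {
    # Fields are: ScriptAlias <url-path> <directory>.
    # Commented-out lines do not match because their first field starts with '#'.
    awk '$1 == "ScriptAlias" { gsub(/"/, "", $3); print $3 }' "$1"
}

# Usage (the path below is an assumption; adjust for your system):
# scriptalias_dirs /etc/apache2/conf-available/serve-cgi-bin.conf
```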

Building a database

The next step is to download the sample data and install it on your system. The file is less than 7MB so hopefully you've got enough space for it and download time won't be too bad. Point your browser to http://fayettedigital.com/book/book.0.1.tar.gz and download the file to somewhere convenient. Or use this command:

wget http://fayettedigital.com/book/book.0.1.tar.gz

Change directory to your document root and extract the files. On my system, I used the following commands:

cd ~
wget http://fayettedigital.com/book/book.0.1.tar.gz
cd /var/www/html
sudo tar xf ~/book.0.1.tar.gz

You may also extract the files somewhere else and copy them to your document root. There is nothing magic about the book directory.

First let's examine the /usr/lib/cgi-bin/omega.conf file we just copied. Here is the file as it is in the release (at least for this version):

# Directory containing Xapian databases:
database_dir /var/lib/omega/data

# Directory containing OmegaScript templates:
template_dir /var/lib/omega/templates

# Default template name if the CGI parameter "FMT" is not specified.
# (If not specified here, the default template name is "query"):
#default_template query

# Default database name if the CGI parameter "DB" is not specified.
# (If not specified here, the default database name is "default"):
#default_db default

# Directory to write Omega logs to:
log_dir /var/log/omega

# Directory containing any cdb files for the $lookup OmegaScript command:
cdb_dir /var/lib/omega/cdb

You may leave the values as they are or you can change them. In any case you'll have to create the missing directories, e.g.:

sudo mkdir -p /var/lib/omega/data
sudo mkdir /var/lib/omega/templates
sudo mkdir /var/lib/omega/cdb
sudo mkdir /var/log/omega
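If you changed any of the values, the directories to create can instead be read back from omega.conf so the two stay in step. The following is a minimal sketch; omega_conf_dirs is a hypothetical helper name, and it assumes the configured paths contain no spaces.

```shell
# List the directories named by the *_dir settings in an omega.conf file.
# omega_conf_dirs is a hypothetical helper name.
omega_conf_dirs() {
    # Match uncommented keys such as database_dir or log_dir and print the value.
    awk '$1 ~ /^[a-z]+_dir$/ { print $2 }' "$1"
}

# Usage: create every directory the config refers to.
# omega_conf_dirs /usr/lib/cgi-bin/omega.conf | while read -r d; do
#     sudo mkdir -p "$d"
# done
```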

And copy the templates to the new directory.

cd ~/xapian-omega-1.4.18
sudo cp -r templates/* /var/lib/omega/templates

Be sure the templates are readable by others. Now we are ready to index the data we just stored in the directory /var/www/html/book.

omindex is the utility that we will use to index the documents. It knows how to parse html documents so we don't have to do anything special.

You should change the ownership of the /var/lib/omega/data directory to a non-root user and do the indexing as that user, but also make sure all the database files are readable by others (since the user that CGI programs run as needs to be able to read them):

sudo chown "`whoami`" /var/lib/omega/data
sudo chmod -R a+r /var/lib/omega/data
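To double-check the result, you can list anything under the data directory that is still not readable by others. This is only a sketch of one way to do it; not_world_readable is a hypothetical helper name.

```shell
# Print every file or directory under $1 that lacks the "other" read bit.
# not_world_readable is a hypothetical helper name.
not_world_readable() {
    # -perm -004 is true when the other-read bit is set, so negate it.
    find "$1" ! -perm -004 -print
}

# Usage: no output means everything is readable by the CGI user.
# not_world_readable /var/lib/omega/data
```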

The command I used to index the data and the output is as follows:

/usr/local/bin/omindex --db /var/lib/omega/data/default --url /book /var/www/html/book
[Entering directory /]
S_ci_sto1.jpg: Skipping - unknown MIME type 'image/jpeg'
S_ci_mul.jpg: Skipping - unknown MIME type 'image/jpeg'
S_ci_spr1.jpg: Skipping - unknown MIME type 'image/jpeg'

Let's look at the omindex command. The --db parameter tells it to create a database with a name of default. That's the name that omega uses as its default. That can be changed, but for this demonstration let's keep it simple. The --url parameter identifies the URL prefix that corresponds to the directory we start indexing from. Since we put the documents in /var/www/html/book we need to specify --url /book. If we were adding files that were in the document root, we'd use --url /.

The last parameter, /var/www/html/book, tells omindex to look for the documents at that location on disk. Omindex does not crawl the web; it only looks at files on disk.
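The relationship between the directory on disk and the --url prefix can be illustrated with a small path-to-URL mapping. This is a sketch of the idea as described above, not omindex's actual implementation; url_for is a hypothetical helper name and the file names are made up for the example.

```shell
# Map a file under the indexed directory to the URL it would be served at.
# url_for is a hypothetical helper, shown only to illustrate the mapping.
url_for() {
    base=${1%/}      # directory being indexed, trailing slash removed
    prefix=${2%/}    # value given to --url, trailing slash removed
    file=$3          # full path of a file under the base directory
    printf '%s/%s\n' "$prefix" "${file#"$base"/}"
}

# index.html is an illustrative file name, not necessarily in the sample data.
url_for /var/www/html/book /book /var/www/html/book/index.html
# prints /book/index.html

url_for /var/www/html / /var/www/html/index.html
# prints /index.html
```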

Using delve to show stats:

$ xapian-delve /var/lib/omega/data/default/
UUID = 715c2d1d-9199-4697-978c-c9bb96944055
number of documents = 55
average document length = 7882.53
document length lower bound = 292
document length upper bound = 35317
highest document id ever used = 55
has positional information = true
revision = 1
currently open for writing = false
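If you want to use these statistics in a script, for example to check that indexing added the number of documents you expected, a single field can be extracted from output in the format shown above. This is a minimal sketch; doc_count is a hypothetical helper name, and it assumes the "name = value" layout stays as it appears here.

```shell
# Read xapian-delve style output on stdin and print the document count.
# doc_count is a hypothetical helper name.
doc_count() {
    # Split each line on " = " and match the field name exactly.
    awk -F' = ' '$1 == "number of documents" { print $2 }'
}

# Usage:
# xapian-delve /var/lib/omega/data/default | doc_count
```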

Searching using the Omega CGI

Now test your installation by pointing your browser at http://localhost/cgi-bin/omega.cgi
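You can also run a search from the command line by passing CGI parameters in the URL. The sketch below builds such a URL using Omega's P parameter for the query text and the DB parameter mentioned in omega.conf. omega_url is a hypothetical helper name, the search word is a placeholder, and no URL-encoding is done, so it only suits single plain words.

```shell
# Build a query URL for the Omega CGI.
# omega_url is a hypothetical helper name; it does no URL-encoding.
omega_url() {
    base=$1              # URL of the CGI binary
    query=$2             # a single search word
    db=${3:-default}     # database name, "default" unless specified
    printf '%s?DB=%s&P=%s\n' "$base" "$db" "$query"
}

# "example" is a placeholder search term.
omega_url http://localhost/cgi-bin/omega.cgi example

# To fetch the results page (requires curl and a running web server):
# curl -s "$(omega_url http://localhost/cgi-bin/omega.cgi example)"
```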

Last modified on 12/09/21 12:51:45