Profiling Xapian

If you think you've found a bottleneck in Xapian's code, it will be much easier for us to investigate if you can run a profiling tool on your application and we can look at the results and see where most of the time is spent. In bad cases, there are usually a small number of "hot spots" which account for the majority of the time, and it's often easy to obtain a good speed up by looking at these. But we don't know where they are without the profiling data.

Any decent profiling tool will give some idea, but if you're running on Linux, we recommend oprofile as it looks at the whole system, so we see time spent in the kernel, and doing I/O (and slow operations are often partly due to I/O). Also, oprofile has a low overhead so it doesn't change the situation we're trying to measure too much.

The most useful output (from oprofile or other tools) is usually a callgraph which shows how time splits down between functions and the functions they in turn call.

How to profile Xapian using oprofile

What follows is a run through of how you might install and use oprofile to generate a callgraph.

Prerequisites:

  • A Linux box (ideally running a 2.6 series kernel for best results)
  • An architecture where oprofile supports callgraphs (x86 and x86_64 seem to)
  • root access
  • If you're running virtualised under Xen, you apparently need to do special things to get oprofile working. We're not sure exactly what, but for some promising-sounding resources see:  http://www.google.com/search?q=oprofile+xen

First, make sure you've got oprofile installed. I'm running Ubuntu, so to install I just do:

sudo apt-get install oprofile

You may also want to ensure you have debug symbols for libraries where time is spent so we can see which functions time is spent in. On Debian and derived distros, debug symbols are often packaged separately in a package named after the library package with -dbg appended. E.g. for libc6 debug symbols install libc6-dbg:

sudo apt-get install libc6-dbg

Debian should be the same as Ubuntu for installing oprofile and debug symbols, but this part may vary for other distros.

From now on, the commands should work for any distro, provided you use sudo (except that in edgy, feisty, and Ubuntu gutsy's oprofile package has a bug and you'll need to use bash opcontrol in all the commands below to work around this).

Once you have it installed:

sudo opcontrol --init
sudo opcontrol --no-vmlinux --callgraph=12
sudo opcontrol --start-daemon
sudo opcontrol --start ; [run testcase] ; sudo opcontrol --stop
opreport --callgraph=12 > oprofile-callgraph.txt
sudo opcontrol --deinit

The output of opreport can be enormous, so please don't post reports to the mailing list as attachments. The best thing to do is open a bug report and attach the report there so that it doesn't get lost.

How to profile Xapian using sysprof

Sysprof is another Linux system-wide profiling tool. It is very easy to use (once installed), but provides less detail than oprofile (in particular, it doesn't provide access to the hardware counters).

First, note that you will need the sysprof kernel module to be installed and loaded before sysprof will work. On Ubuntu, the sysprof module can be installed by running:

sudo apt-get install sysprof-module-source
sudo module-assistant auto-install sysprof-module

It will then need to be loaded by running:

sudo modprobe sysprof-module

Install the sysprof application using:

sudo apt-get install sysprof

When you want to profile, start the sysprof GUI application (under Applications>Programming, or by typing sysprof in a terminal window). To start profiling, click the "Start" button, then start the activity you wish to profile, and once this has run for a sufficient period of time, click "Profile" to stop the collection of information.

A list of functions, and the time spent in them and in their descendants is displayed in the sysprof window. You can click on a function to get a list of callers and callees, and click on a caller to change the view to that function. You can also sort the list by time-in-self or by accumulated time.