Table of Contents
If you think you've found a bottleneck in Xapian's code, it will be much easier for us to investigate if you can run a profiling tool on your application and we can look at the results and see where most of the time is spent. Slow cases are usually due to a small number of "hot spots" which account for the majority of the time, and it's often easy to obtain a good speed up by looking at these. But we don't know where they are without the profiling data.
The information currently here is mostly Unix-specific, and fairly biased towards Linux - if you have further profiling hints, particularly for other platforms, please add them below.
The first thing to do is find out if the testcase is I/O or CPU bound.
To do this, run it under
time which will reveal this, and
also whether a lot of time is spent in the kernel, for example:
$ time ./test-script real 0m27.863s user 0m4.514s sys 0m0.236s
The exact output format may vary between versions of time, but you should get these 3 statistics.
The above example is I/O bound (the user+sys times are much less than the real time, so assuming the machine is otherwise idle, the difference is due to waiting for I/O).
If we're CPU bound, most profiling tools are suitable. If we're I/O bound, then using a profiling tool which doesn't just look at CPU time will produce much more useful data.
In I/O bound cases, if you're running on Linux, we recommend oprofile. This looks at the whole system, so we see time spent in the kernel, and doing I/O (and slow operations are often partly due to I/O). Also, oprofile has a low overhead so it doesn't change the situation we're trying to measure too much.
The most useful output (from oprofile or other tools) is usually a callgraph which shows how time splits down between functions and the functions they in turn call.
How to profile Xapian using perf
Linux kernels from 2.6.31 onwards have a new profiling framework with a userland tool called "perf":
It seems to be very versatile, and likely to supplant oprofile, but we haven't yet tried it with Xapian. If you do, we encourage you to fill in any useful tips here.
One limitation we're aware of is the perf currently relies on frame pointers to show call stacks, which are omitted on x86-64 (and also on x86 by GCC 4.6 when optimising).
How to profile Xapian using oprofile
What follows is a run through of how you might install and use oprofile to generate a callgraph.
- A Linux box
- An architecture where oprofile supports callgraphs (x86 and x86_64 seem to)
- root access
- If you're running virtualised under Xen, you apparently need to do special things to get oprofile working. We're not sure exactly what, but for some promising-sounding resources see: http://www.google.com/search?q=oprofile+xen
First, make sure you've got oprofile installed. I'm running Ubuntu, so to install I just do:
sudo apt-get install oprofile
You may also want to ensure you have debug symbols for libraries where time is
spent so we can see which functions time is spent in. On Debian and derived
distros, debug symbols are often packaged separately in a package named after
the library package with
-dbg appended. E.g. for
sudo apt-get install libc6-dbg
Debian should be the same as Ubuntu for installing oprofile and debug symbols, but this part may vary for other distros.
From now on, the commands should work for any distro, provided you use
Once you have it installed:
sudo opcontrol --init sudo opcontrol --no-vmlinux --callgraph=12 sudo opcontrol --start-daemon sudo opcontrol --start ; [run testcase] ; sudo opcontrol --stop opreport --callgraph=12 > oprofile-callgraph.txt sudo opcontrol --deinit
The output of opreport can be enormous, so please don't post reports to the mailing list as attachments. The best thing to do is open a bug report and attach the report there so that it doesn't get lost.
How to profile Xapian using sysprof
Sysprof is another Linux system-wide profiling tool. It is very easy to use (once installed), but provides less detail than oprofile (in particular, it doesn't provide access to the hardware counters).
First, note that you will need the sysprof kernel module to be installed and loaded before sysprof will work. On Ubuntu, the sysprof module can be installed by running:
sudo apt-get install sysprof-module-source sudo module-assistant auto-install sysprof-module
It will then need to be loaded by running:
sudo modprobe sysprof-module
Install the sysprof application using:
sudo apt-get install sysprof
When you want to profile, start the sysprof GUI application (under Applications>Programming,
or by typing
sysprof in a terminal window). To start profiling, click the
"Start" button, then start the activity you wish to profile, and once this has run
for a sufficient period of time, click "Profile" to stop the collection of information.
A list of functions, and the time spent in them and in their descendants is displayed in the sysprof window. You can click on a function to get a list of callers and callees, and click on a caller to change the view to that function. You can also sort the list by time-in-self or by accumulated time.
Profiling just CPU usage
The following profilers aren't so useful when there's a significant amount of waiting on I/O, but in mostly CPU-bound cases, they can be handy.
gprof isn't as useful as some of the other profiling options, but it works on more platforms than some, and is easy to enable - just configure with:
./configure CXXFLAGS=-pg LDFLAGS=-pg
If your testcase is a separate program, then you also should also pass
you compile and link it.
Then run the testcase as normal, which will create a file
gmon.out in the current
directory with the raw profiling data in. To analyse this data, run:
Which produces a report of stdout (so you probably want to redirect output to a file or a pager).
valgrind's callgrind and cachegrind
These naturally only work on platforms which valgrind supports - see http://valgrind.org/ for an up-to-date list of these.
No recompiling is required - just prefix your test command like so:
valgrind --tool=cachegrind ./testcase
--tool=callgrind for callgrind).