Ticket #284 (assigned defect)

Opened 4 months ago

Last modified 6 weeks ago

occasional DatabaseModifiedErrors

Reported by: mrks Owned by: olly
Priority: normal Milestone: 1.0.10
Component: Backend-Flint Version: 1.0.7
Severity: normal Keywords:
Cc: mrks@… Blocked By:
Operating System: Linux Blocking:

Description

I use xapian-core-1.0.7 with the corresponding perl bindings. I run a 1 writer/N reader setup, and I do reopen() a database-handle before each query. Nevertheless I casually get DatabaseModifiedErrors.

This is what I found out so far:

* The errors occurs after explicit flushing my most frequented index. The error does less often occur, if I do a sleep(1) after each explicit flush() before applying no changes (without flush) to the index, and it never occured so far with a sleep(4). This is my workaround.

* I already set XAPIAN_FLUSH_THRESHOLD to a large value (100000).

* I patched the xapian-core lib to log all calls of FlintDatabase::set_revision_number(), and the throw-points of the XapianModifiedErrors, which turned out that the exception gets thrown in FlintTable::set_overwritten().

* I patched again to get the caller and found out that set_overwritten() got called by FlintTable::block_to_cursor(), which I patched again to expose the condions:

if (REVISION(p) > REVISION(C_[j + 1].p)) {
 fprintf(stderr, "set_overwritten: from block_to_cursor() %d > %d\n", REVISION(p), REVISION(C_[j + 1].p));
 set_overwritten();
 return;
}

and it turned out:

set_overwritten: from block_to_cursor() 10194 > 10192
terminate called after throwing an instance of 'Xapian::!DatabaseModifiedError'
(...)
set_overwritten: from block_to_cursor() 10195 > 10193
terminate called after throwing an instance of 'Xapian::!DatabaseModifiedError' set_overwritten: from block_to_cursor() 10195 > 10193
terminate called after throwing an instance of 'Xapian::!DatabaseModifiedError'
(...)
set_overwritten: from block_to_cursor 10199 > 10197
terminate called after throwing an instance of 'Xapian::!DatabaseModifiedError'
set_overwritten: from block_to_cursor 10199 > 10197
terminate called after throwing an instance of 'Xapian::!DatabaseModifiedError'

I originally tested this with xapian-1.0.6, but it also occurs in 1.0.7.

I run xapian on Ubuntu Linux 8.04 (Hardy) with a 2.6.24-19-server kernel and an ext 3 file filesystem. The machine is an IBM x3650 with 40 GB RAM, and a ServeRAID-8k Controller running a Raid 10 over 6 SAS-Disks.

My most frequented index (the one that drops the exceptions) contains about 850.000 documents, needs 11 Gb of disk space, gets 5-15 updates per second, and about 20-25 search hits per second. I flush() this index every 10 minutes (which takes about 60-100 seconds + 4 seconds workaround delay ;-)

Change History

Changed 4 months ago by olly

  • status changed from new to assigned
  • version changed from SVN HEAD to 1.0.7
  • milestone set to 1.0.8

I wonder if the issue could be that writes are being reordered aggressively by the disk subsystem such that we pick up some inconsistent intermediate state.

Perhaps we need to flush something like this to avoid such issues:

  • fdatasync() (or equivalent)
  • write new root block
  • fdatasync() (or equivalent)
  • update basefiles

Currently we do the above but without the first fdatasync().

I'll put together a patch for you to test which does this, or perhaps something which shows the timings if I can think of a good way to do that...

Changed 4 months ago by richard

The extra fdatasync() is worth a try, though I can't quite see how that can be the explanation, because the new root block shouldn't be being accessed until the new basefile has been written (or at least, not by a reader open at the old revision.

In the error printfs, I notice that the version numbers are 2 apart (eg "10194 > 10192"). So, this means a reader open at rev 10192 is finding that a block it wanted to read is now part of revision 10194. This looks very much as if the reopen has failed to work, somehow. Do we know what the reader is doing when the error occurs? One possibility that occurred to me is that the reopen() call itself is throwing the error - perhaps it's trying to read a block as part of the cleanup of the old revision or something (though I haven't found anything in the code to reflect that).

Changed 4 months ago by mrks

Do we know what the reader is doing when the error occurs?

The exception occurs during my (perl-API) get_mset() call.

One possibility that occurred to me is that the reopen() call itself is throwing the error.

Not in my code anyway. My (perl-API) reopen() returns before the exception gets thrown.

I wrapped my calls and logged as follows:

(6223) reopen() begin
(6223) reopen() end
(6223) get_mset() begin
(6223) get_mset() end
(6223) reopen() begin
(6223) reopen() end
(6223) get_mset() begin
(6223) Xapian::DatabaseModifiedError: in set_overwritten    # <- this line comes from my patched xapian lib
terminate called after throwing an instance of 'Xapian::DatabaseModifiedError'

(The number in brackets is the PID of the process, to be sure that nothing gets mixed up.)

Changed 3 months ago by olly

  • summary changed from occasional !DatabaseModifiedErrors to occasional DatabaseModifiedErrors
  • milestone changed from 1.0.8 to 1.0.9

Bumping to 1.0.9. We need to get to the bottom of this, but we also need to get 1.0.8 out.

Changed 7 weeks ago by olly

Just looking at bugs marked for 1.0.9. I suspect we'll have to bump this one again for now, but I've just been trying to break it down a bit...

It seems to me that one of the following is probably true:

* Our strategy for moving a database to a new revision is flawed.

* We've incorrectly implemented a valid strategy.

* The OS (or possibly disk hardware) doesn't correctly implement some guarantee

(of POSIX or some other relevant standard) which our strategy relies on.

* The compiler miscompiled the code (last refuge excuse of the desperate bug-chaser!)

So, does that help? It doesn't magically solve it, but perhaps we can narrow things down.

For example, can we reproduce this on another system? (That would make it easier to poke with a debugger too!)

Does a different GCC version make any difference if it's easy to rebuild with one?

Changed 7 weeks ago by mrks

* The OS (or possibly disk hardware) doesn't correctly implement some guarantee

I have no idea how to test this. If you do let me know. In addition to the information above: I use the deadline scheduler (default in ubuntu server kernels), and have the write-cache enabled in my ServeRAID-8k controller (as it is backup-battery powered, this should not make any differences, even in case of power outage).

* The compiler miscompiled the code (last refuge excuse of the desperate bug-chaser!)

I just compiled it with gcc/g++ 4.1 instead of 4.2 - with no effect.

For example, can we reproduce this on another system? (That would make it easier to poke with a debugger too!)

I tried to reproduce it with some generated indices in different sizes with no success so far. Its hard to simulate the actual load which the live-system gets. I am also not able to move the whole setup to another physical machine without risk and downtime.

I'll keep trying.

Changed 6 weeks ago by olly

I'm not sure how to check for OS/disk hardware bugs either. I did notice this recently:

http://google-opensource.blogspot.com/2008/08/linux-disk-scheduler-benchmarking.html

Which mentions they found and fixed bugs in the deadline scheduler while benchmarking.

It might be interesting to see if you can reproduce the problem with other schedulers or with the hardware write cache off, if it's possible to test that without disrupting the live system.

Good to have rule out miscompilation (well probably anyway, it could be miscompiled by both versions, but I don't think this is a likely explanation anyway).

I'm going to skip this for 1.0.9, since we don't yet have a grasp of what the cause is, so the only other option is to delay 1.0.9 until we do, which doesn't help anyone.

Changed 6 weeks ago by olly

  • milestone changed from 1.0.9 to 1.0.10
Note: See TracTickets for help on using tickets.