Opened 13 years ago
Closed 11 years ago
#592 closed defect (fixed)
(crash) ChertTable::add_item_to_block on a seemingly corrupted block
Reported by: | static-void | Owned by: | Olly Betts |
---|---|---|---|
Priority: | normal | Milestone: | 1.2.17 |
Component: | Backend-Chert | Version: | 1.2.5 |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | Linux |
Description
Xapian crashes when trying to flush a Xapian WritableDatabase.
Environment: Ubuntu 11.10 x64; Xapian is used in zeitgeist-daemon (fts extension) through python bindings.
I've tried to diagnose the crash with gdb. The immediate cause is as follows:

Inside void ChertTable::add_item_to_block(byte * p, Item_wr kt_, int c) (backends/chert/chert_table.cc:683), DIR_END(p) is zero, so the following statement on line 699:

    memmove(p + c + D2, p + c, dir_end - c);

passes memmove an insane len argument (like len=18446744073709551599).

Dump of the first 16 bytes of the block p:

    (gdb) x /16xb $rbx
    0x151c3d0: 0x00 0x00 0x00 0x00 0x00 0xca 0xd9 0xca
    0x151c3d8: 0xd9 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Backtrace is attached.
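To illustrate why a zero DIR_END(p) yields that length, here is a minimal sketch of the arithmetic. The function name `memmove_len_as_seen` is hypothetical, and the value c = 17 is not stated in the report; it is inferred from the reported length, which equals 2^64 - 17. The sketch models the 64-bit size_t of an x64 system with std::uint64_t.

```cpp
#include <cstdint>

// Hypothetical helper, not Xapian code: models the third argument that
// memmove receives in add_item_to_block on a platform with 64-bit size_t.
// dir_end and c are ints, so dir_end - c is computed as a signed int first
// and only then converted to the unsigned length type.
std::uint64_t memmove_len_as_seen(int dir_end, int c) {
    int signed_len = dir_end - c;  // negative when dir_end < c
    // Converting a negative int to an unsigned 64-bit type wraps modulo 2^64.
    return static_cast<std::uint64_t>(signed_len);
}
```

With dir_end = 0 and the assumed c = 17, the signed difference is -17, which wraps to 18446744073709551599, matching the len shown by gdb. A healthy block with dir_end greater than c gives a small positive length.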
Attachments (2)
Change History (8)
by , 13 years ago
Attachment: | backtrace.txt added |
---|
comment:1 by , 13 years ago
comment:2 by , 13 years ago
We shouldn't really end up actually calling memmove with a bad size in this case. I'll add a sanity check.
Does xapian-check report the database as corrupt?
If so, the more interesting question is how it got that way, rather than how we end up getting to where the crash is in the code. That's likely to be much easier to understand if you can see it happening, rather than just the aftermath.
If xapian-check is happy, it's definitely worth investigating.
comment:3 by , 13 years ago
Xapian-check does report an error:
    record:
    baseB blocksize=8K items=23132 lastblock=42 revision=3865 levels=1 root=42
    B-tree checked okay
    record table structure checked OK

    termlist:
    baseB blocksize=8K items=46264 lastblock=2175 revision=3865 levels=2 root=2172
    B-tree checked okay
    termlist table structure checked OK

    postlist:
    baseB blocksize=8K items=15549 lastblock=698 revision=3865 levels=2 root=10
    B-tree error 90
    xapian-check: btree error
I've tried to move the index dir away, so that zeitgeist would initiate a re-index. The resulting index is ok:
    record:
    baseB blocksize=8K items=24913 lastblock=37 revision=3 levels=1 root=14
    B-tree checked okay
    record table structure checked OK

    termlist:
    baseB blocksize=8K items=49826 lastblock=2146 revision=3 levels=2 root=631
    B-tree checked okay
    termlist table structure checked OK

    postlist:
    baseB blocksize=8K items=16622 lastblock=964 revision=3 levels=2 root=5
    B-tree checked okay
    postlist table structure checked OK

    position:
    baseB blocksize=8K items=635149 lastblock=1765 revision=3 levels=2 root=639
    B-tree checked okay
    position table structure checked OK

    spelling: Lazily created, and not yet used.
    synonym: Lazily created, and not yet used.
    No errors found
Currently, I see no way to find out how it became corrupted. I guess I will just run zeitgeist normally and check whether the issue appears again. I've kept the old copy of the index as it is, in case it's of any interest.
comment:4 by , 13 years ago
Btw, I wonder if it's the right place for a sanity check. I mean, there is a lot of code which relies on memory structures being correct. I wonder if it's possible and productive to cover it all with checks. Rather, such an index should not be loaded in the first place - i.e. there should be checks when loading, if it's possible.
comment:5 by , 11 years ago
Milestone: | → 1.2.17 |
---|---|
Status: | new → assigned |
Oops, this ticket really fell through the cracks. Sorry about that.
I've added a check that dir_end() is sane when a block is loaded for trunk in r17804, and marked for backporting for 1.2.17.
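As an illustration of what a load-time check of this kind might look like, here is a minimal sketch. It is not the code from r17804. The names `get_int2` and `dir_end_is_sane` are hypothetical, and the header layout (DIR_END stored as a 2-byte big-endian value at offset 9, with the directory starting at offset 11) is an assumption about the chert block format that happens to be consistent with the gdb dump in the description.

```cpp
// Assumed chert block header layout (an assumption, not confirmed by this
// ticket): DIR_END is a 2-byte big-endian value at byte offset 9, and the
// directory begins at byte offset 11.
constexpr int DIR_END_OFFSET = 9;
constexpr int DIR_START = 11;

// Hypothetical helper: read a 2-byte big-endian unsigned value.
inline int get_int2(const unsigned char* p, int offset) {
    return (p[offset] << 8) | p[offset + 1];
}

// Hypothetical check: the end of the directory can be neither before its
// start nor beyond the end of the block. A caller would treat a false
// result as database corruption instead of continuing to use the block.
bool dir_end_is_sane(const unsigned char* block, unsigned block_size) {
    int dir_end = get_int2(block, DIR_END_OFFSET);
    return dir_end >= DIR_START &&
           static_cast<unsigned>(dir_end) <= block_size;
}
```

Under these assumptions, the block from the description (bytes 9 and 10 both 0x00) would be rejected as soon as it is read, before add_item_to_block could compute a negative length from it.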
Although I hadn't looked inside Xapian before diagnosing this, I guess DIR_END(p) should never be zero. Could it be database file corruption?
I've got an idea how I can find the primary reason for this:
Is it worth it, i.e. can it be caught like this? Or should I just not waste time and delete the index file?