Opened 12 years ago

Closed 10 years ago

#592 closed defect (fixed)

(crash) ChertTable::add_item_to_block on a seemingly corrupted block

Reported by: static-void
Owned by: Olly Betts
Priority: normal
Milestone: 1.2.17
Component: Backend-Chert
Version: 1.2.5
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: Linux

Description

Xapian crashes when flushing a WritableDatabase.

Environment: Ubuntu 11.10 x64; Xapian is used in zeitgeist-daemon (fts extension) through python bindings.

I've tried to diagnose the crash with gdb. The immediate cause is as follows:

Inside of:
(backends/chert/chert_table.cc:683)
void
ChertTable::add_item_to_block(byte * p, Item_wr kt_, int c):

DIR_END(p) is zero, and the following line:

699     memmove(p + c + D2, p + c, dir_end - c);

passes memmove an insane len argument (e.g. len=18446744073709551599).
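
The reported length is what a small negative `int` becomes once it is converted to an unsigned 64-bit length. A minimal sketch of that arithmetic, assuming a 64-bit `size_t` (modelled here as `std::uint64_t`) and assuming `c` was 17. That value of `c` is inferred from the reported length, since 2^64 - 17 = 18446744073709551599; the ticket does not state it. `memmove_len` is a hypothetical helper, not Xapian code.

```cpp
#include <cstdint>

// Hypothetical helper (not Xapian code): mimics how the int expression
// `dir_end - c` turns into memmove's unsigned length argument.
// std::uint64_t stands in for a 64-bit size_t (an assumption).
std::uint64_t memmove_len(int dir_end, int c) {
    int diff = dir_end - c;                   // negative when dir_end < c
    return static_cast<std::uint64_t>(diff);  // wraps modulo 2^64
}
```

With `dir_end = 0` and `c = 17`, `memmove_len` returns 18446744073709551599, matching the value seen in gdb.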

Dump of the first 16 bytes of the block p:

(gdb) x /16xb $rbx
0x151c3d0:	0x00	0x00	0x00	0x00	0x00	0xca	0xd9	0xca
0x151c3d8:	0xd9	0x00	0x00	0x00	0x00	0x00	0x00	0x00
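
As a cross-check, the two bytes at offset 9 of this dump (the offset that comment:1 below associates with DIR_END) can be read back directly. This is only a sketch: `read_uint16_be` is a hypothetical stand-in for Xapian's own accessor, and big-endian byte order is an assumption (it does not affect the result here, because both bytes are zero).

```cpp
#include <cstddef>

// First 16 bytes of the block, copied from the gdb dump above.
const unsigned char block_head[16] = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0xca, 0xd9, 0xca,
    0xd9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};

// Hypothetical 2-byte read; big-endian order is an assumption.
int read_uint16_be(const unsigned char* p, std::size_t offset) {
    return (p[offset] << 8) | p[offset + 1];
}
```

`read_uint16_be(block_head, 9)` evaluates to 0, consistent with DIR_END(p) being zero.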

Backtrace is attached.

Attachments (2)

backtrace.txt (4.9 KB) - added by static-void 12 years ago.
asm_dumps.txt (8.0 KB) - added by static-void 12 years ago.
Dump of related asm code, locals and registers


Change History (8)

by static-void, 12 years ago

Attachment: backtrace.txt added

comment:1 by static-void, 12 years ago

Although I hadn't looked at Xapian's internals before starting this diagnosis, I guess DIR_END(p) should never be zero. Could it be database file corruption?

I have an idea for how to find the root cause:

  • compile libxapian with the usual debug options and -fno-inline (to stop setint2 being inline)
  • set a breakpoint on setint2(unsigned char *p, int c, int x) with the condition c == 9 && x == 0 (if that's possible)

Is it worth it, i.e. can it be caught like this? Or should I just not waste time and delete the index file?

by static-void, 12 years ago

Attachment: asm_dumps.txt added

Dump of related asm code, locals and registers

comment:2 by Olly Betts, 12 years ago

We shouldn't really end up actually calling memmove with a bad size in this case. I'll add a sanity check.
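
A guard of the kind described could look roughly like the sketch below. It is not the actual Xapian change: `shift_directory` and its parameters are hypothetical, and it only illustrates refusing to call memmove when the computed length would be negative.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical guard (not the actual Xapian change): shift the directory
// entries in [c, dir_end) up by entry_size bytes, but refuse to do so when
// dir_end < c, which would otherwise produce a huge unsigned length.
// The caller must ensure the buffer has room for dir_end + entry_size bytes.
void shift_directory(unsigned char* p, int c, int dir_end, int entry_size) {
    if (c < 0 || dir_end < c) {
        throw std::runtime_error("corrupt block: directory end before insert point");
    }
    std::memmove(p + c + entry_size, p + c,
                 static_cast<std::size_t>(dir_end - c));
}
```

Throwing is just one option; the point is that the bad length is detected before it reaches memmove.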

Does xapian-check report the database as corrupt?

If so, the more interesting question is how it got that way, rather than how we end up getting to where the crash is in the code. That's likely to be much easier to understand if you can see it happening, rather than just the aftermath.

If xapian-check is happy, it's definitely worth investigating.

comment:3 by static-void, 12 years ago

Xapian-check does report an error:

record:
baseB blocksize=8K items=23132 lastblock=42 revision=3865 levels=1 root=42
B-tree checked okay
record table structure checked OK

termlist:
baseB blocksize=8K items=46264 lastblock=2175 revision=3865 levels=2 root=2172
B-tree checked okay
termlist table structure checked OK

postlist:
baseB blocksize=8K items=15549 lastblock=698 revision=3865 levels=2 root=10
B-tree error 90
xapian-check: btree error

I moved the index directory away so that zeitgeist would initiate a re-index. The resulting index is OK:

record:
baseB blocksize=8K items=24913 lastblock=37 revision=3 levels=1 root=14
B-tree checked okay
record table structure checked OK

termlist:
baseB blocksize=8K items=49826 lastblock=2146 revision=3 levels=2 root=631
B-tree checked okay
termlist table structure checked OK

postlist:
baseB blocksize=8K items=16622 lastblock=964 revision=3 levels=2 root=5
B-tree checked okay
postlist table structure checked OK

position:
baseB blocksize=8K items=635149 lastblock=1765 revision=3 levels=2 root=639
B-tree checked okay
position table structure checked OK

spelling:
Lazily created, and not yet used.

synonym:
Lazily created, and not yet used.

No errors found

Currently, I see no way to find out how it became corrupted. I guess I will just run zeitgeist normally and check whether the issue appears again. I've left the old copy of the index as it is, if it's of any interest.

comment:4 by static-void, 12 years ago

Btw, I wonder if it's the right place for a sanity check. I mean, there is a lot of code which relies on memory structures being correct. I wonder if it's possible and productive to cover it all with checks. Rather, such an index should not be loaded in the first place - i.e. there should be checks when loading, if it's possible.

comment:5 by Olly Betts, 10 years ago

Milestone: 1.2.17
Status: new → assigned

Oops, this ticket really fell through the cracks. Sorry about that.

I've added a check on trunk in r17804 that dir_end() is sane when a block is loaded, and marked it for backporting to 1.2.17.
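
For illustration only, a load-time check of this sort might look like the sketch below. It is not the code from r17804: `dir_end_is_sane` is a hypothetical name, and the bounds (directory start offset and block size) are passed in as parameters so that the sketch does not assume specific Xapian constants.

```cpp
// Hypothetical check, not the code from r17804. A directory end is treated
// as sane if it lies at or after the start of the directory and does not
// run past the end of the block.
bool dir_end_is_sane(int dir_end, int dir_start, int block_size) {
    return dir_end >= dir_start && dir_end <= block_size;
}
```

With dir_end = 0, as in this ticket, the check fails for any positive dir_start, so such a block would be rejected when it is loaded rather than crashing later in add_item_to_block.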


comment:6 by Olly Betts, 10 years ago

Resolution: fixed
Status: assigned → closed

Backported in r17805.
