Opened 3 months ago

Closed 3 months ago

Last modified 3 months ago

#824 closed defect (fixed)

Out-of-bounds array access on table open if root info level is corrupt

Reported by: group13 Owned by: Olly Betts
Priority: normal Milestone: 1.4.25
Component: Backend-Glass Version: 1.4.24
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description

In the Glass backend, the RootInfo level field comes from disk and is used before it is checked, which means that e.g. this snippet...

void
GlassTable::basic_open(const RootInfo * root_info, glass_revision_number_t rev)
{
    ...
    level =		   root_info->get_level();
    ...
    for (int j = 0; j <= level; ++j) {
	C[j].init(block_size);
    }

...is UB if the level in the file exceeds BTREE_CURSOR_LEVELS for whatever reason (perhaps file corruption on disk). If so, even if the table is otherwise unused, this runs past the end of the C array here and again when the GlassTable is destroyed. In practice, this results in a segmentation fault/crash; because it happens so early, it cannot be recovered through xapian-check.

We would instead expect that a Xapian::DatabaseCorruptError or similar would be thrown in this case.

We've confirmed that this happens with 1.4.24, on Linux, using the Glass backend in single-file mode. The Chert backend may be affected as well.

This issue was found by running the AFL++ fuzzer to generate corrupted single-file Glass database files.

Attachments (2)

crash.bin (40.0 KB ) - added by group13 3 months ago.
Corrupt database file from fuzzer
seed.bin (40.0 KB ) - added by group13 3 months ago.
Valid database used to seed the fuzzer

Download all attachments as: .zip

Change History (8)

comment:1 by Olly Betts, 3 months ago

Milestone: 1.4.25
Resolution: fixed
Status: newclosed

comment:2 by group13, 3 months ago

Thanks for the quick response!

Is it expected that the patched code sets the level member first and then checks the value? When trying to confirm the fix with our test input, it looks like the following now happens:

  1. DatabaseCorruptError is now thrown as expected
  2. As part of unwinding, the GlassTable is destroyed
  3. As part of GlassTable's destructor, the close method is called
  4. As part of GlassTable::close, a loop similar to the one above runs through all cursors to destroy them
  5. This loop uses the impossible level value from the file to know how many levels there are, and we hit the same UB/segfault.

This may be easier to reproduce with a test file that is wildly out of range rather than just out of range.

comment:3 by Olly Betts, 3 months ago

Resolution: fixed
Status: closedreopened

This may be easier to reproduce with a test file that is wildly out of range rather than just out of range.

FWIW my logic for picking the first "just wrong" value was to catch if the check added to the code was at the wrong threshold.

Your reasoning looks correct, but I'm surprised CI didn't catch this as it runs a build (and the testsuite) using ubsan. I can see valgrind not spotted a small overrun as there are members after this array in the class and valgrind works on already compiled code so likely only sees the allocated block, but ubsan should be able to add checks based on the actual members.

It'd be helpful to provide your test input when reporting these sort of bugs...

by group13, 3 months ago

Attachment: crash.bin added

Corrupt database file from fuzzer

by group13, 3 months ago

Attachment: seed.bin added

Valid database used to seed the fuzzer

comment:4 by group13, 3 months ago

It'd be helpful to provide your test input when reporting these sort of bugs...

Of course. I've added one of the crashing inputs, as well as the starting seed we used.

The test input is naturally "more corrupted than necessary" for this issue. Please let us know if you would like to see a cleaned-up/minimized version as well (so one that is only corrupt in the way mentioned above).

comment:5 by Olly Betts, 3 months ago

Resolution: fixed
Status: reopenedclosed

Of course. I've added one of the crashing inputs, as well as the starting seed we used.

Thanks, fixed in 44aaa6fdfdccb4f2708da0605fa6efcc327d7908 - I've moved the check to where the value is read from disk and decoded (which I probably should have done to start with, but it requires moving where we define the constant to check against). Backported for 1.4.25 as 61249b067d1477cc099df342ada2f96b24822f42.

I've also fixed a similar issue with lack of vetting of the blocksize which is read from the same file in the same method - that fix is 218e5c4b591c7ec5a17b0d78c6bf0a72876f6176, backported as 7e5c9a4963e4c1701b64f1fcf81a0c2c089874a6.

The test input is naturally "more corrupted than necessary" for this issue. Please let us know if you would like to see a cleaned-up/minimized version as well (so one that is only corrupt in the way mentioned above).

In this case just your reproducer was enough - it allowed me to verify the fix actually addressed the problem(s) it triggered.

comment:6 by group13, 3 months ago

Confirmed, we now get an "Impossibly deep Btree" message as expected. Thanks again!

Note: See TracTickets for help on using tickets.