Opened 9 years ago

Closed 9 years ago

Last modified 8 years ago

#675 closed defect (fixed)

Exception in [Chert/Glass]Cursor::find_entry()

Reported by: German M. Bravo Owned by: Olly Betts
Priority: high Milestone: 1.2.21
Component: Backend-Chert Version:
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description

Glass and Chert dies if I do the following steps:

  1. open a NEW empty database for Writing
  2. open the same database for Reading
  3. read from the Reading database, for each document get the value of a slot
  4. index documents to the Writing database
  5. commit the Writing database
  6. reopen the Reading database
  7. read from the Reading database, for each document get the value of a slot
  8. repeat all until it crashes

This seems very related to issue #654 to me, although it's not entirely the same, there the remote backend dies due an exception while accessing the cursors during find_entry().

Attachments (3)

the_error.cpp (6.8 KB ) - added by German M. Bravo 9 years ago.
valgrind-chert.log (136.8 KB ) - added by German M. Bravo 9 years ago.
Logs with valgrind using chert database
valgrind-glass.log (22.4 KB ) - added by German M. Bravo 9 years ago.
Logs with valgrind using glass database

Download all attachments as: .zip

Change History (13)

comment:1 by German M. Bravo, 9 years ago

I've attached source code that triggers the problem.

by German M. Bravo, 9 years ago

Attachment: the_error.cpp added

by German M. Bravo, 9 years ago

Attachment: valgrind-chert.log added

Logs with valgrind using chert database

by German M. Bravo, 9 years ago

Attachment: valgrind-glass.log added

Logs with valgrind using glass database

comment:2 by German M. Bravo, 9 years ago

ChertTable::open() closes the table, which frees C[j].p and maybe (when it's lazy) do_open_to_read() doesn't create memory for those, yet returns true. Then later, when ChertTable::find is called, it uses C_[j].p but its memory doesn't exist as it has already been freed.

Maybe ChertTable::do_open_to_read should always initialize the C cursor when returning true? (or even maybe always?)... also, the same might apply to ChertTable::do_open_to_write as well.

comment:3 by German M. Bravo, 9 years ago

I believe this is the same bug: https://bugs.kde.org/show_bug.cgi?id=341990

comment:4 by German M. Bravo, 9 years ago

The cursor received in ChertTable::find(), C_, is a different object than that for which memory is allocated during ChertTable::do_open_to_read, so it remains with the old freed pointers during ChertTable::close. In other words, ChertTable::C is updated with new memory allocated, but ChertValueManager::cursor (the one calling ChertCursor::find_entry() -> ChertTable::find()) isn't so ChertValueManager::reset() needs a cursor.reset(0); (since the cursor will no longer be valid after the reopen) and value_manager.reset(); should be called *after* opening the tables in ChertDatabase::open_tables_consistent().

comment:6 by Olly Betts, 9 years ago

Component: OtherBackend-Chert
Status: newassigned

Thanks for the patch, but I'm not quite sold on the logic behind the change.

ChertTable::cursor_get() returns a new cursor, not the table's built-in cursor, so I'm not seeing how we end up with a stale cursor here, since the cursor which ChertTable::close() releases the memory for shouldn't be the cursor which ChertValueManager has (though valgrind's output does seem consistent with that being what's happening).

I'll have a poke and see if I can either convince myself your patch is right, or if not, work out what's actually going on.

Did you already agree to the patch licensing requirements?

http://trac.xapian.org/browser/git/xapian-core/HACKING#L1376

comment:7 by Olly Betts, 9 years ago

Milestone: 1.3.3

Aha, so the new ChertCursor copies the pointer to the root block from the cursor in the table, so that's why this the freeing of the blocks in built-in cursor in ChertTable::close() breaks the external cursor which the ChertValueManager holds.

In glass, we reference count the cursor blocks with copy-on-write, so this shouldn't be an issue - the root block will be valid until all references are gone.

I think the other problem (which affects both chert and glass) is when the table gains levels over a reopen - then the cursor in the ChertValueManager doesn't have enough levels and we end up iterating off the end of it. I've not yet confirmed that's what's happening, but assuming it is the patch does indeed look correct.

(Mostly orthogonally, it would be nice to reduce the rather needless churn of memory allocations that currently happens when reopen() finds a new revision...)

Last edited 8 years ago by Olly Betts (previous) (diff)

comment:8 by German M. Bravo, 9 years ago

Yes, I agree with the licensing of the patches. Is there anything else besides saying I do that I have to do?... I'd actually really love the whole Xapian was MIT licensed.

Regarding Glass, the problem is also there in that backend as well (for some reason). I'm not sure how the GlassValueManager::cursor gets those pointers, but the memory is being released by the GlassTable::close() and later is tried to be used. In glass it might be the reference counter isn't being increased for some reason when cursor is copied/assigned, however simply adding that cursor.reset(0) to the GlassValueManager::reset() doesn't seem to fix the issue as it does in Chert... and thinking about it, even if it did keep the blocks alive (if the reference counting there was working), would we still want to use the old blocks after the reopen? I mean, the close() during the reopen resets the cursors for the tables, I don't think we'd still would want to use a stale cursor for the value manager... but I might be understanding cursors an blocks incorrectly.

comment:9 by Olly Betts, 9 years ago

Milestone: 1.3.31.2.21

Before we actually use the data in the cursor, we check B->cursor_version != version and if it isn't, we call rebuild() which updates the cursor to match the table.

The table's cursor_version gets incremented each time add() or del() is called if at least one cursor has been created since the last call (the reason for this condition is to avoid churning the counter unnecessarily and having it wrap round).

Looking over the code involved, I see two bugs here though - we don't set cursor_created_since_last_modification in the cursor rebuild() method, but that flag is meant to really indicate if any cursor might exist with the current cursor_version, and after rebuild() a cursor now does.

Also, we don't consider incrementing cursor_version after cancel() or reopen() (the latter being very relevant here).

Fixing those, your testcase passes cleanly. I've written a testcase based on it and committed the fixes + the testcase in [1900eca6d325e9185b9f9f5c75a4a4a545616d7f/git]. Testing I found that the fixes for reopen() and rebuild() are both required to fix the testcase.

I think we're now good after those changes - although with chert the cursor still has a reference to a freed root block, the cursor will now get rebuilt before being used, so that doesn't actually matter as that dud reference will never get used.

If this doesn't fix the problem in you full system, please comment with details.

Marking for backporting for 1.2.21.

comment:10 by Olly Betts, 9 years ago

Resolution: fixed
Status: assignedclosed

Backported for brass and chert in [826d1a19cc356e7bf66c1681626e70af32967447].

Applied equivalent fixes for flint in [d784290ce015958474f965817f7a41f1483c3e03].

Note: See TracTickets for help on using tickets.