Opened 17 years ago

Closed 16 years ago

#173 closed enhancement (fixed)

Bindings should have an explicit WritableDatabase::close() method

Reported by: Richard Boulton Owned by: Richard Boulton
Priority: normal Milestone: 1.1.0
Component: Xapian-bindings Version: SVN trunk
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description (last modified by Richard Boulton)

In garbage collected languages, it is hard to ensure that a WritableDatabase object has been closed, because this requires ensuring that no objects still hold a reference to it. To make this easier, WritableDatabases should have an explicit close() method, which would delete the underlying C++ object. After this method has been called, all other methods on the WritableDatabase object in the bindings would be invalid.

Change History (14)

comment:1 by Olly Betts, 17 years ago

Status: newassigned

Odd, I thought there was a bug for this already!

comment:2 by Olly Betts, 17 years ago

Blocking: 120 added

Marking for 1.0.x.

comment:3 by Olly Betts, 17 years ago

Blocking: 120 removed
Summary: Bindings should have an explicit database close() methodBindings should have an explicit WritableDatabase::close() method

Ah yes, I remember why this isn't as easy as it at first seems.

I don't think this can be implemented in the bindings, since deleting the WritableDatabase just decreases its reference count by one. You might as well just do `database = 0' in your Python code. Neither addresses the problem of lingering references.

The most obvious way to implement this at the C++ level is that WritableDatabase::close() would just set the "internal" pointer to NULL. Sadly this doesn't really solve anything as again that only reduces the handle count by one.

So I think this would need to be handled by making every backend's subclass of Database::Internal support a "this database has been closed" mode where most operations fail. And any class which currently carefully keeps a database reference around to ensure that the database doesn't get pulled out from underneath it could then have its intentions subverted, and we need to consider and handle any fallout from that.

This seems a rather disproportionate amount of ugliness and complication, so I wonder if there might be a better approach. Do you have some examples where this problem manifests?

(I know there's an issue which may be of this sort in Tcl, but I really don't follow what is going on there, as a seemingly very similar variant works fine

  • details are in bug#62).

Dropping the 1.0.x dependency for now...

comment:4 by Richard Boulton, 17 years ago

I've come across this myself in python scripts which try to handle multithreading by sharing a reference to a database, with an associated mutex to protect access. There was also a reference to the problem in a posting I recently re-read in the xapian mailing list archives on gmane: http://article.gmane.org/gmane.comp.search.xapian.general/1387 (point 2 in that very long email), which was again mentioned in http://thread.gmane.org/gmane.comp.python.twisted/11817/focus=11818

In python, with the current implemention of reference-counted garbage collection, it's not usually a big problem. However, if any of the objects involved form a cycle, the reference-counted garbage collection doesn't work, so you have to wait for a full garbage collection cycle to trigger, which happens at an unpredictable time. Therefore, even if all objects with references to the database or any of it's derived objects are deleted, the database can still remain open for an unpredictable length of time and cause DatabaseLocked exceptions.

It may well be possible to write higher-level abstraction layers which deal with this problem by keeping tight control of the database objects and their derived objects, and writing code to ensure that they are all unreferenced, and don't get involved in cycles. However, this is tricky to get right; and in languages like Java with a less deterministic garbage collection system, even that might not be possible.

Therefore, I think the only way to fix this is to provide a close() method. Unfortunately, you've just done a very good job of pointing out how tricky that is to implement correctly. I do think it's the only way to do it, though... :(

comment:5 by Richard Boulton, 17 years ago

Owner: changed from Olly Betts to Richard Boulton
Status: assignednew

comment:6 by Richard Boulton, 17 years ago

Operating System: All
Status: newassigned

comment:8 by Richard Boulton, 17 years ago

Description: modified (diff)
Type: defectenhancement

comment:9 by Richard Boulton, 16 years ago

Milestone: 1.1.1

Flint and Chert Database now have the ability to be closed "permanently", which would allow an easy implementation of this, I think.

Marking this for 1.1.1, since it's more plausible to implement now, and would be very useful to me (but I don't want it to block 1.1.0).

comment:10 by Richard Boulton, 16 years ago

Milestone: 1.1.11.1.0
Resolution: fixed
Status: assignedclosed

Implemented on trunk in revision [11797]. Marking as fixed, and changing milestone to 1.1.0, since it should be in that, now.

comment:11 by Olly Betts, 16 years ago

r11797 is the trivial change to xapian-bindings/xapian.i - the actual implementation in xapian-core is r11796...

comment:12 by Olly Betts, 16 years ago

Resolution: fixed
Status: closedreopened

Hmm, I think we need to think about the semantics of this a bit more so I'm going to reopen this for now to make sure we do before 1.1.0.

Declaring that calling any methods of the object (apart from to call close() on it again) as "undefined" seems a bit unhelpful. Assuming that this is the ISO C definition of "undefined", then that means anything can happen. Since things like documents and iterators implicitly access the database, I think we want to promise more than this. It's also bad to allow access to undefined behaviour from the bindings.

I think "unspecified" is closer to what we want here - in reality, either you'll get an exception, or it'll use some cached value (the "right" answer) or some default behaviour (e.g. behaving as an empty database in many ways for inmemory).

I'm not sure we want to promise to throw an exception for any use. Perhaps we should aim for the "right" answer or an exception? Having to gate all inmemory methods isn't a big deal if we get a consistency win that way - it's not currently built for performance and will be replaced by a variant of the current disk-based backend eventually anyway.

Side issue - there's no feature test for this in the bindings!

comment:13 by Richard Boulton, 16 years ago

Just to note that I agree with Olly's comments about the semantics, and am working on this issue. I've also got a feature test for the bindings half-written.

comment:14 by Richard Boulton, 16 years ago

I've committed fixes to the comments in the header to describe the behaviour as "right" answer or an exception, and to inmemory to gate all the inmemory methods. I've also extended the test coverage to test the behaviour of some iterators when the database is closed part-way through use. This showed some problems with chert and flint (which were causing segfaults when such iterators were used), which I've also fixed.

Getting close to being ready to declare this bug fixed again, but I want to complete the test coverage to check the behaviour of _all_ iterators first.

comment:15 by Richard Boulton, 16 years ago

Resolution: fixed
Status: reopenedclosed

Created ticket #337 to track implementing further tests - marked that for release 1.1.1 - it doesn't need to block 1.1.0 being released.

Marking this ticket as fixed, therefore.

Note: See TracTickets for help on using tickets.