Opened 15 years ago

Closed 6 years ago

Last modified 6 years ago

#337 closed defect (fixed)

Extend test of behaviour of databases after close

Reported by: Richard Boulton Owned by: Guruprasad Hegde
Priority: low Milestone: 1.4.6
Component: Test Suite Version: SVN trunk
Severity: normal Keywords:
Cc: Blocked By:
Blocking: Operating System: All

Description

Currently, the tests in api_closedb.cc to check how objects associated with a database behave when the database is closed cover only the postlist iterator. These tests should be extended to cover at least termlist, alltermlist, allpostlist, and positionlist.

The tests should also cover all the methods of database.

Change History (26)

comment:1 by Olly Betts, 15 years ago

Milestone: 1.1.11.1.4
Type: taskdefect

Triaging milestone:1.1.1 bugs.

comment:2 by Olly Betts, 15 years ago

Priority: normallow

comment:3 by Olly Betts, 15 years ago

Milestone: 1.1.41.2.0

Bumping to keep on track - improving test coverage is not a release blocker.

comment:4 by Olly Betts, 12 years ago

Milestone: 1.2.x1.3.2

I've added coverage for many more methods of Database, and uncovered a minor bug in the process (fixed in trunk r16611).

Marking the rest for 1.3.2.

Last edited 6 years ago by Olly Betts (previous) (diff)

comment:5 by Olly Betts, 12 years ago

Owner: changed from Richard Boulton to Olly Betts
Status: newassigned

Added coverage for all remaining Database methods, and almost all WritableDatabase methods in r16612.

Still to do:

  • WritableDatabase::commit() (should either do nothing or throw expected exception)
  • transaction API
  • objects associated with a database, as description notes

comment:6 by Olly Betts, 12 years ago

I've backported the tests and applicable fixes for 1.2.11 in a series of commits concluding with r16696.

comment:7 by Olly Betts, 11 years ago

Milestone: 1.3.21.3.x

While adding the tests for Database did find a minor bug, reviewing this it seems less important than some other areas which currently have imperfect test coverage - e.g. looking at http://lcov.xapian.org/ I quickly see we don't have coverage for SmallVector::do_reserve(n) in the n > 2 case - that's used by Query objects with 3 or more subqueries. So I'm marking this as "suitable for the 1.3 series" instead of necessary for 1.3.2.

comment:8 by Olly Betts, 9 years ago

Milestone: 1.3.x1.4.x

Not a blocker for 1.4.0.

comment:9 by Olly Betts, 6 years ago

Keywords: GoodFirstBug added

comment:11 by Guruprasad Hegde, 6 years ago

In closedb5 test, remote backend is excluded, comment states it fails for remote case. I ran the test with remote and test seems to pass. Is the issue as stated in the comment fixed?

[gp@gp tests]$ ./runtest ./apitest -v closedb5
Running test './apitest -v closedb5' under eatmydata and valgrind
Running tests with backend "honey"...
Running tests with backend "none"...
Running tests with backend "inmemory"...
Running tests with backend "glass"...
Running test: closedb5... ok
./apitest backend glass: All 1 tests passed.
Running tests with backend "singlefile_glass"...
Running tests with backend "multi_glass"...
Running tests with backend "remoteprog_glass"...
Running tests with backend "remotetcp_glass"...
./apitest total: All 1 tests passed.

Last edited 6 years ago by Guruprasad Hegde (previous) (diff)

comment:12 by Olly Betts, 6 years ago

Owner: changed from Olly Betts to Guruprasad Hegde
Status: assignednew

Good spot - git bisect shows that this was fixed by 9cf77da645325affb88fac71760add8f4a9088a9 back in 2010 (so it was actually failing due to a bug in the disk-based backends, not one in the test harness or the remote backend). I've reenabled this for the remote backend in 963048332bf135457ca163dc339579d0a710a26e.

comment:13 by Guruprasad Hegde, 6 years ago

Observations on methods after db close -

commit() method: no op for local database types. Throws DatabaseError for remote type.

begin_transaction(): In case of flushed, fails only remote due to commit() method failure. For non-flushed, no op for all backends.

commit_transaction(): Fails with InvalidOpError for remote db type(for flushed and non-flushed case).

This behavior seems to be as expected(compared to comment 5).

Last edited 6 years ago by Guruprasad Hegde (previous) (diff)

comment:14 by Olly Betts, 6 years ago

I think you're coming at this the wrong way.

A testcase is meant to test what the implemented behaviour should be, not what how a particular implementation behaves.

So when as here there's a current implementation, you shouldn't start by testing the current implementation and then writing the test based on that. The current implementation could be wrong, and if it is then discovering that is a useful result from this exercise (previous added test coverage here uncovered a bug - see 2b40b693c2dd15a196b7f2699ec848f804595f80).

So instead think about what the behaviour of these methods should be after close() and write a test for that.

The documented behaviour is that the method should either behave as it would have before close() was called or else throw DatabaseError.

For commit(), the pre-close() behaviour when there are no pending changes is actually to do nothing so the local behaviour is OK (though a little odd at first sight). And the remote behaviour is also OK. But it would also be OK for the local behaviour to be throwing an exception and/or for the remote behaviour to be doing nothing so the testcase should accept either behaviour and not test which happens depending on the backend name. That's also what existing checks in this testcase are doing.

For transactions, the transaction has no observable effect unless anything is done within it, so it's probably OK for the calls to do nothing, and it's certainly OK to throw DatabaseError.

If begin_transaction() fails, you aren't in a transaction, so commit_transaction() throwing InvalidOperationError is correct. If begin_transaction() succeeds, then commit_transaction() should either work or throw DatabaseError.

comment:15 by Guruprasad Hegde, 6 years ago

You are right, I got your point, thanks.

If begin_transaction() succeeds, then commit_transaction() should either work or throw

DatabaseError.

Test steps: begin_transaction() -> add_document() -> close() -> commit_transaction()

local db throws DatabaseError, but remote db throws InvalidOperationError. Response is not consistent. How to go about this?

comment:16 by Olly Betts, 6 years ago

If begin_transaction() failed you aren't in a transaction.

When you aren't in a transaction, commit_transaction() is not a valid operation, so InvalidOperationError is correct (as I said above).

If you are in a transaction, commit_transaction() on a closed database should either just end the transaction or throw DatabaseError. (The first option assumes there's not actually anything to commit, but I'm not sure how you could have anything to commit since e.g. add_document() on a closed database should throw DatabaseError, and Database::close() should end any active transaction so you can't have a transaction active from before the close()).

comment:17 by Guruprasad Hegde, 6 years ago

Understood. I guess now all of the methods are addressed. Let me know if additional tests needed.

Even though close() is called explicitly here, some other event can cause a similar effect as close() like network error. In such case DatabaseError or kind of NetworkError raised? (It's a different matter I guess, just thought of asking).

comment:18 by Olly Betts, 6 years ago

Understood. I guess now all of the methods are addressed. Let me know if additional tests needed.

Thanks, will review when I get a chance (probably tomorrow).

Even though close() is called explicitly here, some other event can cause a similar effect as close() like network error. In such case DatabaseError? or kind of NetworkError? raised? (It's a different matter I guess, just thought of asking).

I don't think we currently have any tests for what happens when the remote server dies, but it would be good to. And as you suggest the testcases could be very similar to these. I think we'd just need to add a mechanism to the test harness to kill the remote server corresponding to a particular database.

It's harder to test in the non-remote case - I can't see a good way to force a database error for a local database that's realistic. You could probably arrange to close all the file descriptors it is using - a bit artificial, but perhaps is better than not having such tests.

comment:19 by Olly Betts, 6 years ago

Keywords: GoodFirstBug removed
Milestone: 1.4.x1.4.6

Fixed by 67353b1ba9bca92979591fdb4083a7745702f1f3 on git master. Worth trying to backport so setting milestone.

comment:20 by Olly Betts, 6 years ago

I've created #758 to track adding tests for remote backend errors (as discussed above).

comment:21 by Guruprasad Hegde, 6 years ago

You could probably arrange to close all the file descriptors it is using - a bit artificial, but perhaps is better than not having such tests.

I was thinking how to acheive this, can you tell me how to get the file descriptors for db files? Is it maintained in database object?(I glanced through backend code, couldn't find)

comment:22 by Olly Betts, 6 years ago

I'm not sure the local database case is as interesting as the remote one, as just closing the file descriptors isn't a realistic simulation of an actual failure mode, but FWIW it'd really need to happen in the test harness code, not the backend code. The Database::Internal object (e.g. GlassDatabase) knows what fds are open on the database, but there's no way to ask it to report that information, and it doesn't seem an appropriate public API. And better to avoid adding special private APIs just for the test harness if we can, not least because it's har

The test harness supports running tests on different "backends" (which are slightly different to the library's concept of a backend, since a test harness "backend" specifies the whole setup - e.g. the test harness backed "remotetcp_glass" is the remote library-backend talking TCP to a remove server using the glass library-backend for its database).

The harness knows the backend it's running tests for, and what features that supports (e.g. spelling, transactions) and testcases are flagged with what feature combinations they need. The testcase just tells the harness things like "make me a WritableDatabase filled with data by generator function X", and the harness creates one of the right type with that data in.

So the harness could (at least on platforms with /proc/self/fd) note which fds are in use, create the database, and check which additional fds are now in use - those are presumably open on the database we just opened. One potential flaw is that some library call made during DB opening/creation might open some file used internally lazily - e.g. /dev/urandom might get opened to generate a UUID for the database. If that happens, we'd assume that file was one for the database. It's also hard to make work if the test harness becomes multi-threaded, which is something that will probably happen in the future to speed up running the testsuite.

Or perhaps compare the targets of the symlinks in /proc/self/fd with the directory path of the database, though that potentially has path normalisation issues, and I'm not sure if all platforms with /proc/self/fd have symlinks there (Linux does at least).

There's existing code in xapian-core/tests/harness/fdtracker.cc which tracks fds so it can report when a testcase fails to close all the file descriptors it opened. This works on Linux and a few other platforms which have /proc/self/fd - on other platforms we'll fail to open that special directory (which contains an entry for each fd open by the current process) and the FD checking is disabled automatically when that happens.

I notice it has a subtle bug actually - the DIR handle that opendir() returns wraps a file descriptor, so that'll be reported by readdir() - you can see this effect if you ls /proc/self/fd:

$ ls -l /proc/self/fd
total 0
lrwx------ 1 olly olly 64 Apr 18 09:16 0 -> /dev/pts/8
lrwx------ 1 olly olly 64 Apr 18 09:16 1 -> /dev/pts/8
lrwx------ 1 olly olly 64 Apr 18 09:16 2 -> /dev/pts/8
lr-x------ 1 olly olly 64 Apr 18 09:16 3 -> /proc/20289/fd

The first three entries are stdin, stdout, stderr all pointing to the terminal, the last is open on the directory /proc/self/fd symlinks to and is from ls calling opendir() (or perhaps actually some lower level system call which works in a similar way).

So really the FD tracking should get that fd using dirfd() on the DIR handle and discount it.

comment:23 by Olly Betts, 6 years ago

Handling this is probably actually simpler for the remote database case - at least in the TCP case that already knows which fds are in use for remote databases (see the "pid_to_fd" array in xapian-core/tests/harness/backendmanager_remotetcp.cc).

comment:24 by Guruprasad Hegde, 6 years ago

Thanks for the detailed explanation, I got to know many things. I agree simulating local db failure case is not interesting. I want to work on the remote case. I will update my progress on that ticket.

Please address question below only when you have time:

I didn't get the bug.

DIR is directory handler for /proc/self/fd, so fd returned by dirfd() will be corresponding to the opened directory. How is that useful? Would you elaborate a bit? You may be talking in the context of getting fds to close db files? In that case close(dirfd(DIR) closes all files right?

As this is not important scenario, please address only if you have time.

comment:25 by Olly Betts, 6 years ago

Resolution: fixed
Status: newclosed

Backported for 1.4.6 in 148449e487b20bf61b6dc5fa5d280fdc5cd819bf.

That uncovered what's arguably a bug in TermIterator::get_termfreq() over a multi-database (it was approximating its answer; this code has been replaced in git master with code that doesn't) so I fixed that. So the added test coverage uncovered a bug, though not actually in what it was being added to test!

in reply to:  24 comment:26 by Olly Betts, 6 years ago

Replying to gp1308:

DIR is directory handler for /proc/self/fd, so fd returned by dirfd() will be corresponding to the opened directory. How is that useful? Would you elaborate a bit?

The fdtracker gets run before and after each test and needs to record what fds are open, but in order to find out what fds are open it has to call opendir(), and doing that opens another fd. So without extra care we'd note that fd as being open too, when it's really just an artefact of how we determine which fds are open.

So what I was saying is we ought to call dirfd() on the DIR* we get from opendir() and then exclude that from the list of fds we get by calling readdir().

However, I missed that the fdtracker code actually keeps the DIR* open while the testcase runs, and uses the same DIR* for the pre- and post- testcase fd enumeration. It still gets an extra fd in its list, but that's harmless because it's there and the same fd both before and after, so it will be quietly ignored like any already open fd (e.g. 0, 1, 2 will be open for stdin, stdout, stderr). We are only interested in finding leaked fds - i.e. those which a testcase opens but fails to close.

You may be talking in the context of getting fds to close db files? In that case close(dirfd(DIR) closes all files right?

In the context of closing fds for a database we definitely SHOULD NOT close dirfd(DIR) - that would be undefined behaviour, see http://pubs.opengroup.org/onlinepubs/9699919799/functions/dirfd.html :

The dirfd() function shall return a file descriptor referring to the same directory as the dirp argument. This file descriptor shall be closed by a call to closedir(). If any attempt is made to close the file descriptor, or to modify the state of the associated description, other than by means of closedir(), readdir(), readdir_r(), rewinddir(), or seekdir(), the behavior is undefined.

Note: See TracTickets for help on using tickets.