Opened 4 years ago

Last modified 4 years ago

#758 new defect

Test behaviour on remote backend errors

Reported by: Olly Betts Owned by: Olly Betts
Priority: normal Milestone:
Component: Test Suite Version: git master
Severity: normal Keywords: GoodFirstBug
Cc: Blocked By:
Blocking: Operating System: All

Description

We have very little coverage of handling of database errors.

Mostly this is because it is tricky for the test harness to arrange for them to happen, but for the remote backend we could add a way to tell the harness to kill the server corresponding to a particular database it provided, and at least test cases where the remote backend loses its connection.

As gp1308 pointed out in #337, the testcases would be fairly similar to those in api_closedb.cc which test behaviour after an explicit call to close() on a database.

Change History (5)

comment:1 by Olly Betts, 4 years ago

Keywords: GoodFirstBug added

comment:2 by Guruprasad Hegde, 4 years ago

Comment from olly in #337:

Handling this is probably actually simpler for the remote database case - at least in >the TCP case that already knows which fds are in use for remote databases (see the >"pid_to_fd" array in xapian-core/tests/harness/backendmanager_remotetcp.cc).

Initially, I thought pid_to_fd will contain the pid of the server for the entire duration of test run, but later found out it gets cleaned up once child signals the parent.

I am thinking to add a member to BackendManagerRemote class to store the pid of the remote server. It can be used later by the kill() method.

Where should these tests go(api_closedb.cc or other)?

comment:3 by Guruprasad Hegde, 4 years ago

Initial attempt: https://github.com/guruhegde/xapian/commit/bffcad5be953c30b51a0e99857c8eb68d7f62c63

Above commit is buggy(doesn't kill the server because pid_to_fd is empty).I want to show the interface used by apitest. Please comment on that and suggest if you have better way.

I thought of raising PR once I discuss the plan.

comment:4 by Olly Betts, 4 years ago

Initially, I thought pid_to_fd will contain the pid of the server for the entire duration of test run, but later found out it gets cleaned up once child signals the parent.

It's really once the OS signals to the parent that the child process has terminated (the child doesn't signal this to parent - the child process no longer exists at that point).

If we didn't clean out entries for processes which have gone we'd need a much larger fixed sized array, and linearly scanning it to find entries would probably be problematic. A dynamic data structure doesn't work well here as we're inside the memory leak checking that's wrapped around the running of each testcase. But a fixed size array for just the live servers can be much smaller, and doesn't need enlarging just because more testcases have been added.

Where should these tests go(api_closedb.cc or other)?

api_closedb.cc seems reasonable - it's not too large yet and these tests are related.

I want to show the interface used by apitest. Please comment on that and suggest if you have better way.

It would be better if we could kill just the remote server corresponding to a particular database. Killing all running remote databases would be a real problem if we move to a multi-threaded test harness, which is something that would be good to be able to do so the tests can be run faster.

Also it would allow testing more cases - for example, searching two remote databases together and then one of them dies but the other still works (which simulates failure of one node in a distributed sharded system).

This needs a way to map from a database to the server process to kill. Not sure how best to allow that. Possibly there needs to be an opaque (to the testcase) token which a testcase can get when it opens a database it will want to kill later. That token could just be the index into pid_to_fd.

comment:5 by Guruprasad Hegde, 4 years ago

Note: See TracTickets for help on using tickets.