Opened 15 years ago

Closed 5 years ago

#320 closed defect (fixed)

replicationtest.py fails most of the time

Reported by: Olly Betts
Owned by: Richard Boulton
Priority: normal
Milestone: 1.4.14
Component: Xapian-bindings (Python)
Version: git master
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description

Here's the output from "make check" on atreus:

Running test: replication_concurrency.../home/olly/svn/xapian-trunk/xapian-core/bin/.libs/lt-xapian-replicate: NetworkError: Unable to fully synchronise: Can't open database: Cannot open tables at consistent revisions
 FAILED

replicationtest2.py:139:Expected equality: got '1', expected '2'
   137             set_master(masterpath, secondpath)
   138             time.sleep(1)
-> 139             expect(xapian.Database(slavepath).get_metadata('dbname'), '2')
   140
   141             set_master(masterpath, firstpath)
Xapian version: 1.1.0
Platform: Linux 2.6.24.2 (#1 SMP Mon Feb 18 22:05:27 GMT 2008)

When reporting this problem, please quote all the preceding lines from
"replicationtest2.py:139" onwards.

0 tests passed, 1 tests failed
FAIL: replicationtest.py
=======================================
1 of 3 tests failed
Please report to http://xapian.org/bugs
=======================================

I've now wasted several hours failing to fix this. I'm starting to suspect that the testcase is simply broken as it currently stands, but I don't understand it well enough to sort that out. So, in the interests of keeping the snapshot builder working, I'm going to disable it for now, as Richard seems to be away.

I thought for a while that the issue was broken databases in dbs_replication, since the testcase blindly uses them if they exist. It passed once when I removed them, but this doesn't seem to be repeatable.

I noticed that "make clean" doesn't remove these databases (fix committed) and also that pythontest2.py leaves databases behind for a couple of testcases (fix also committed).
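A minimal sketch of the "start from a clean slate" idea, assuming the test keeps its databases under a directory such as dbs_replication. The helper name `fresh_dir` is hypothetical and not part of the actual test code; it deletes whatever a previous run left behind before recreating the directory, so a stale or half-written database can't be picked up by accident.

```python
import os
import shutil


def fresh_dir(path):
    """Delete anything a previous run left at path, then recreate it empty.

    Hypothetical helper: guarantees the test never reuses stale databases.
    """
    if os.path.isdir(path):
        shutil.rmtree(path)  # Remove old databases, including half-written ones.
    elif os.path.exists(path):
        os.remove(path)  # A stray file with the same name would also get in the way.
    os.makedirs(path)
    return path
```

Calling this once at the start of the test, rather than only in "make clean", also protects against databases left behind by an interrupted run.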

As an aside, using a fixed TCP port number for the server is problematic. Unless I'm missing some subtlety, it means the test will fail if another copy is already running on the same host. On atreus that means me, Richard, James, buildbot and the snapshot builder all contending for port 7876, and someone else might even have a service using that port for something else. The easiest fix would probably be to skip the test if the server can't be started. A better fix would be to try different port numbers, as the C++ test harness does.
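A sketch of the "try different port numbers" approach in Python. `find_free_port` is a hypothetical helper, not something the Python test or the C++ harness actually provides; it assumes the server listens over TCP on localhost and starts probing at 7876, the port the test currently hard-codes.

```python
import socket


def find_free_port(start=7876, attempts=100, host="127.0.0.1"):
    """Return the first TCP port in [start, start + attempts) that can be bound.

    Hypothetical helper: instead of relying on one fixed port, try successive
    port numbers and return the first one nobody else is using, or None.
    """
    # Never probe beyond the highest valid TCP port number.
    end = min(start + attempts, 65536)
    for port in range(start, end):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
        except OSError:
            continue  # Port already in use (or not permitted): try the next one.
        finally:
            s.close()  # Runs either way; we only wanted to know the port was free.
        return port
    return None
```

Probing first and starting the server afterwards leaves a small window in which another process can grab the port, so a robust test would still retry (or skip) if the server fails to start on the port returned. A return value of None is the natural point to fall back to the "skip the test" option.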

Marking for 1.1.0, at least for now.

Change History (10)

comment:1 by Olly Betts, 15 years ago

I have a theory: I wonder if, before committing, you changed the "create database" code but were still using cached databases created by a previous run. So the test passed for you, but won't when run from a fresh tree.

That doesn't explain the test passing for me once, but I may have been mistaken about that.

Anyway, disabled for now.

comment:2 by Richard Boulton, 15 years ago

Status: new → assigned

I probably shouldn't have enabled the test by default, sorry.

The test is sensitive to timing, which is hard to avoid since it's a test of concurrency. However, the bit which is failing can be fixed: I'll change it to check for equality every second or so, and only fail if the value is still unequal after a long period of time (say, 10 seconds). I believe it is failing because the replication client hasn't yet finished replicating the changed database.
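A minimal sketch of the retry loop described above, assuming a simple poll-until-equal helper. `wait_for_equal` is a hypothetical name; the 1 second interval and 10 second timeout are the figures suggested in the comment, not values taken from the test.

```python
import time


def wait_for_equal(get_value, expected, timeout=10.0, interval=1.0):
    """Poll get_value() until it returns expected or timeout seconds pass.

    Hypothetical helper. Returns the last value seen, so the caller's
    assertion can report what was actually found on failure.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = get_value()
        if value == expected or time.monotonic() >= deadline:
            return value
        time.sleep(interval)  # Give the replication client time to catch up.
```

In the test it would wrap the check at replicationtest2.py:139, along the lines of `expect(wait_for_equal(lambda: xapian.Database(slavepath).get_metadata('dbname'), '2'), '2')`, so the assertion only fails if the replica still hasn't caught up after the timeout. Opening the database afresh on each poll matters, since an already-open `xapian.Database` keeps seeing the revision it opened until it is reopened.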

The port issue is as you describe - I'll need to resolve that before we re-enable the test by default, too.

comment:3 by Olly Betts, 15 years ago

Milestone: 1.1.0 → 1.1.1

Doesn't need to block 1.1.0.

comment:4 by Olly Betts, 15 years ago

Milestone: 1.1.1 → 1.1.7

Triaging milestone:1.1.1 bugs.

comment:5 by Olly Betts, 15 years ago

Milestone: 1.1.7 → 1.2.0

Bumping for now, but if you want to fix it for 1.1.x I've no objections.

comment:6 by Olly Betts, 14 years ago

Component: Xapian-bindings → Xapian-bindings (Python)

comment:7 by Olly Betts, 12 years ago

Milestone: 1.2.x → 1.3.x

Unlikely to get done for 1.2.x now.

comment:8 by Olly Betts, 9 years ago

Milestone: 1.3.x → 1.4.x

Not something we'd hold 1.4.0 for.

comment:9 by Olly Betts, 5 years ago

Milestone: 1.4.x → 1.4.14
Version: SVN trunk → git master

I don't see much point keeping this open any longer. If it's not been fixed in over a decade, it seems unlikely it ever will be. We have reasonable test coverage for replication in the xapian-core testsuite.

Removed from master in a6ce127b3ce9c673dc085d4079c1171e75cedf44. Will backport for 1.4.14.

comment:10 by Olly Betts, 5 years ago

Resolution: fixed
Status: assigned → closed

Removed from RELEASE/1.4 branch in 1135c9f61d12f5fca0a219debe80206e5db5dc71.
