Opened 16 years ago
Closed 5 years ago
#320 closed defect (fixed)
replicationtest.py fails most of the time
Reported by: | Olly Betts | Owned by: | Richard Boulton |
---|---|---|---|
Priority: | normal | Milestone: | 1.4.14 |
Component: | Xapian-bindings (Python) | Version: | git master |
Severity: | normal | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Here's the output from "make check" on atreus:
    Running test: replication_concurrency.../home/olly/svn/xapian-trunk/xapian-core/bin/.libs/lt-xapian-replicate: NetworkError: Unable to fully synchronise: Can't open database: Cannot open tables at consistent revisions
     FAILED
    replicationtest2.py:139:Expected equality: got '1', expected '2'
        137     set_master(masterpath, secondpath)
        138     time.sleep(1)
     -> 139     expect(xapian.Database(slavepath).get_metadata('dbname'), '2')
        140
        141     set_master(masterpath, firstpath)
    Xapian version: 1.1.0
    Platform: Linux 2.6.24.2 (#1 SMP Mon Feb 18 22:05:27 GMT 2008)
    When reporting this problem, please quote all the preceding lines from "replicationtest2.py:139" onwards.
    0 tests passed, 1 tests failed
    FAIL: replicationtest.py
    =======================================
    1 of 3 tests failed
    Please report to http://xapian.org/bugs
    =======================================
I've now wasted several hours failing to fix this. I'm starting to suspect that the testcase is likely just broken as it currently exists, but I don't understand it well enough to sort that out. So in the interests of keeping the snapshot builder working, I'm going to disable it for now as Richard seems to be away.
I thought for a while that the issue was broken databases in dbs_replication, as the testcase just tries to blindly use them if they exist. It passed once when I removed them, but this doesn't seem to be repeatable.
I noticed that "make clean" doesn't remove these databases (fix committed) and also that pythontest2.py leaves databases behind for a couple of testcases (fix also committed).
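One way to stop a testcase from blindly reusing databases left by an earlier run is to wipe its working directory before creating anything in it. This is only a minimal sketch: `fresh_db_dir` is a hypothetical helper, not something from the Xapian test harness, and the directory name in the usage comment is just the one mentioned above.

```python
import os
import shutil


def fresh_db_dir(path):
    """Delete anything left at `path` by an earlier run and recreate it empty.

    Hypothetical helper for illustration; not part of the Xapian test suite.
    """
    # ignore_errors=True makes this a no-op when the directory does not exist.
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)
    return path


# Possible use at the start of a testcase (path assumed for illustration):
#     dbdir = fresh_db_dir("dbs_replication")
```

With this in place a test run never depends on what a previous run (or a previous version of the "create database" code) left on disk.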
As an aside, using a fixed TCP port number for the server is problematic - unless I'm missing some subtlety, it means that the test will fail if it's already being run on the same host - on atreus that means me vs Richard vs James vs buildbot vs the snapshot builder all contending for port 7876, and someone else might even have a service using that port for something else. The easiest fix would probably be to skip the test if the server can't be started. A better fix would be to try different port numbers like the C++ test harness does.
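The "try different port numbers" idea could look roughly like the following. This is a sketch under assumptions: `find_free_port` is a hypothetical helper, the starting port 7876 is simply the one the test currently uses, and how the test actually launches the server is not shown here. Probing and then starting the server leaves a small window in which another process can grab the port, so retrying the server start itself on failure would be more robust.

```python
import socket


def find_free_port(first=7876, attempts=20, host="127.0.0.1"):
    """Return the first port in [first, first + attempts) that can be bound.

    Returns None if every candidate is in use, so the caller can skip the test.
    Hypothetical helper for illustration only.
    """
    # Cap the range at 65535, the highest valid TCP port.
    for port in range(first, min(first + attempts, 65536)):
        probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            probe.bind((host, port))
        except OSError:
            # Port is in use (or otherwise unavailable); try the next one.
            continue
        finally:
            probe.close()
        return port
    return None
```

A `None` result maps onto the simpler fix mentioned above: skip the test when no port is available rather than report a failure.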
Marking for 1.1.0, at least for now.
Change History (10)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Status: | new → assigned |
---|
I probably shouldn't have enabled the test by default, sorry.
The test is sensitive to timing issues - which is hard to avoid since it's a test of concurrency. However, the bit which is failing can be fixed - I'll change it to check for equality every second or so, and only fail if the value is still unequal after a long period of time (say, 10 seconds). I believe it is failing because the replication client hasn't yet finished replicating the changed database.
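The proposed change (poll for equality, give up after about 10 seconds) might be sketched as below. `wait_for_equal` is a hypothetical helper, not the code that was committed; the 10 second timeout and 1 second interval are just the figures suggested above. The sketch does not catch exceptions, so a database that cannot be opened mid-replication would still raise.

```python
import time


def wait_for_equal(get_value, expected, timeout=10.0, interval=1.0):
    """Call get_value() repeatedly until it returns `expected` or time runs out.

    Returns the last value seen, so the caller can still assert on it and get
    a useful failure message. Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = get_value()
        if value == expected:
            return value
        if time.monotonic() >= deadline:
            # Timed out: hand back the stale value for the caller to report.
            return value
        time.sleep(interval)


# Possible replacement for the fixed time.sleep(1) at replicationtest2.py:139
# (expect, xapian and slavepath come from the test itself):
#     expect(wait_for_equal(
#         lambda: xapian.Database(slavepath).get_metadata('dbname'), '2'), '2')
```

Returning the last value instead of raising keeps the existing `expect` call as the single place where the test fails.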
The port issue is as you describe - I'll need to resolve that before we re-enable the test by default, too.
comment:5 by , 16 years ago
Milestone: | 1.1.7 → 1.2.0 |
---|
Bumping for now, but if you want to fix it for 1.1.x I've no objections.
comment:6 by , 15 years ago
Component: | Xapian-bindings → Xapian-bindings (Python) |
---|
comment:9 by , 5 years ago
Milestone: | 1.4.x → 1.4.14 |
---|---|
Version: | SVN trunk → git master |
I don't see much point keeping this open any longer. If it's not been fixed in over a decade, it seems unlikely it ever will be. We have reasonable test coverage for replication in the xapian-core testsuite.
Removed from master in a6ce127b3ce9c673dc085d4079c1171e75cedf44. Will backport for 1.4.14.
comment:10 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Removed from RELEASE/1.4 branch in 1135c9f61d12f5fca0a219debe80206e5db5dc71.
I have a theory - I wonder if before committing you changed the "create database" code but were using cached databases created by a previous run. So the test passed for you, but won't when run from a fresh tree.
That doesn't explain the test passing for me once, but I may have been mistaken about that.
Anyway, disabled for now.