#236 closed defect (fixed)
Fix problems with concurrent db replication and modification
Reported by: | Richard Boulton | Owned by: | Richard Boulton |
---|---|---|---|
Priority: | normal | Milestone: | 1.1.0 |
Component: | Replication | Version: | SVN trunk |
Severity: | normal | Keywords: | |
Cc: | Olly Betts | Blocked By: | |
Blocking: | | Operating System: | All |
Description
Currently, there is no automated test of the behaviour when the replication function is doing a full copy of a database which gets modified while the copy is in progress.
I've done a manual test of this, so I'm moderately confident it works right, but I can't work out how to do a reliable automated test of it... at least, not without hacking a big sleep (or even a condition) into the database copying code, to allow me to be sure of getting some modifications done in the middle of it.
Any suggestions appreciated.
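One way to get a deterministic test without a sleep is the "condition" idea mentioned above: give the copy loop an optional hook and use it to pause the copy until a modification has definitely happened. The following is only a toy sketch of that idea. The `copy_files` function, its `after_each` hook and the dictionary standing in for database files are all hypothetical and are not part of the Xapian replication code.

```python
import threading


def copy_files(source, dest, names, after_each=None):
    """Copy entries one at a time; after_each is a hypothetical test hook."""
    for name in names:
        dest[name] = source[name]
        if after_each is not None:
            after_each(name)


def run_concurrent_copy_test():
    # Toy stand-ins for database files, all at "rev1" to begin with.
    source = {"postlist": "rev1", "record": "rev1", "termlist": "rev1"}
    dest = {}
    copied_first = threading.Event()
    modified = threading.Event()

    def hook(name):
        # After the first file is copied, block until the writer has finished.
        if name == "postlist":
            copied_first.set()
            if not modified.wait(timeout=5):
                raise RuntimeError("modifier thread did not run")

    def modifier():
        # Wait until the copy is provably in progress, then modify everything.
        copied_first.wait(timeout=5)
        for name in list(source):
            source[name] = "rev2"
        modified.set()

    writer = threading.Thread(target=modifier)
    writer.start()
    copy_files(source, dest, ["postlist", "record", "termlist"], after_each=hook)
    writer.join()
    return dest
```

Because the two events force the ordering, the copy always ends up with the first file at the old revision and the rest at the new one, which is exactly the mixed state the test needs to exercise.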
Change History (11)
comment:1 by , 17 years ago
Blocking: | 227 added |
---|
comment:2 by , 17 years ago
Status: | new → assigned |
---|
comment:3 by , 17 years ago
Cc: | added |
---|---|
Operating System: | → All |
comment:4 by , 16 years ago
Blocking: | 227 removed |
---|
comment:6 by , 16 years ago
Description: | modified (diff) |
---|
I've now had some reports which lead me to believe there is a race condition here, which causes the client to attempt to open an invalid database (invalid due to some of the files involved in the full database copy having been deleted before the server could send them) and fail.
The intention was that this would be recovered by sending followup changesets to the client, which would be applied in turn and result in a valid database. However, the client attempts to open the database before these changesets have been applied, and fails, leaving an invalid database (and also blocking future replication attempts).
I'm attempting to fix this, but making a repeatable test for it is rather hard...
comment:7 by , 16 years ago
Milestone: | → 1.1.0 |
---|
One option (which I would much rather avoid) is to mark replication as "experimental" and release 1.1.0 with this known issue unresolved.
Setting milestone to 1.1.0 for now at least.
comment:8 by , 16 years ago
Perhaps the server should open all the files it wants to send before sending? On Unix that'll mean it has them for sure even if they are deleted (ignoring NFS at least, but even there this will usually work) and on Windows it will block their deletion (or we open to allow deletion and cope with the consequences).
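A minimal sketch of the open-everything-first idea, assuming POSIX semantics: once a file is open, its contents stay readable through the handle even after the directory entry is removed. The helper name `open_all_then_read`, the `before_read` hook and the file names are made up for illustration, and this is not the actual server code. On Windows the `os.remove` call would normally fail instead, since Python's default open mode does not permit deletion of an open file.

```python
import os
import tempfile


def open_all_then_read(paths, before_read=None):
    """Open every file first, then read them all; return {path: bytes}."""
    handles = []
    try:
        for path in paths:
            handles.append(open(path, "rb"))
        # Hypothetical hook: simulates a concurrent writer deleting files.
        if before_read is not None:
            before_read()
        return {path: fh.read() for path, fh in zip(paths, handles)}
    finally:
        for fh in handles:
            fh.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, data in (("table_a", b"alpha"), ("table_b", b"beta")):
            path = os.path.join(tmp, name)
            with open(path, "wb") as fh:
                fh.write(data)
            paths.append(path)

        def delete_all():
            # Remove the files after they are open but before they are read.
            for path in paths:
                os.remove(path)

        contents = open_all_then_read(paths, before_read=delete_all)
        still_present = [os.path.exists(path) for path in paths]
        return [contents[path] for path in paths], still_present
```

On a POSIX system `demo()` returns the original contents of both files even though neither path exists any more by the time they are read.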
comment:9 by , 16 years ago
Summary: | Implement automated tests of concurrent db replication and modification → Fix problems with concurrent db replication and modification |
---|
I've now got a repeatable test of this (implemented in Python). I've added it to the xapian-bindings/python/ directory, but not hooked it into the automatic test run, since it currently fails. Am fixing at present.
Opening all the files in advance helps reduce the likelihood of problems, but we can still get problems if one of the files is changed while the files are being opened (e.g. because the database being replicated is switched). This results in an invalid database being sent to the client.
The fundamental problem is that the client tries to open the database before the subsequent changesets, which would make it valid, have been applied.
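The ordering problem can be illustrated with a toy model in which the client only opens the copy once enough changesets have been applied. This is a sketch under stated assumptions, not the fix that was actually committed: the per-table revision numbers, the `required_revision` value and every function name here are hypothetical.

```python
class InvalidDatabase(Exception):
    """Raised when the tables of a copy are not all at the same revision."""


def open_database(tables):
    # Toy validity check: every table must be at one common revision.
    revisions = set(tables.values())
    if len(revisions) != 1:
        raise InvalidDatabase("mixed revisions: %s" % sorted(revisions))
    return revisions.pop()


def apply_changeset(tables, target_revision):
    # Toy changeset: brings every table up to at least target_revision.
    return {name: max(rev, target_revision) for name, rev in tables.items()}


def replicate(full_copy, changesets, required_revision):
    """Apply changesets up to required_revision, and only then open."""
    tables = dict(full_copy)
    for target_revision in changesets:
        tables = apply_changeset(tables, target_revision)
        if target_revision >= required_revision:
            break
    return open_database(tables)
```

With a copy such as `{"postlist": 1, "record": 2, "termlist": 2}`, calling `open_database` straight away raises `InvalidDatabase`, whereas `replicate(copy, [2], 2)` applies the pending changeset first and then opens the database successfully.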
Changing title of ticket to indicate current status.
comment:11 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:12 by , 13 years ago
Component: | Backend-Flint → Replication |
---|
(In #227) This is done except for the two blockers, and a bug that exists only to track two others is just clutter, so I'm unmarking the blockage and closing.