Opened 16 years ago

Closed 15 years ago

Last modified 13 years ago

#236 closed defect (fixed)

Fix problems with concurrent db replication and modification

Reported by: Richard Boulton
Owned by: Richard Boulton
Priority: normal
Milestone: 1.1.0
Component: Replication
Version: SVN trunk
Severity: normal
Keywords:
Cc: Olly Betts
Blocked By:
Blocking:
Operating System: All

Description (last modified by Richard Boulton)

Currently, there is no automated test of the behaviour when the replication function is doing a full copy of a database which gets modified while the copy is in progress.

I've done a manual test of this, so I'm moderately confident it works right, but I can't work out how to do a reliable automated test of it... at least, not without hacking a big sleep (or even a condition) into the database copying code, to allow me to be sure of getting some modifications done in the middle of it.

Any suggestions appreciated.
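
One way to get a deterministic interleaving without a sleep is a test-only hook that the copy loop calls between files. The following is a minimal sketch of that idea, not Xapian's replication code: `copy_database`, the `after_each_file` hook, and the `a.tbl`/`b.tbl` file names are all hypothetical.

```python
import os
import shutil
import tempfile


def copy_database(src_dir, dst_dir, after_each_file=None):
    """Copy every file in src_dir to dst_dir, one file at a time.

    after_each_file is a hypothetical test-only hook, called with the name
    of the file that has just been copied.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    # The file list is snapshotted once, before any copying starts.
    for name in sorted(os.listdir(src_dir)):
        shutil.copyfile(os.path.join(src_dir, name),
                        os.path.join(dst_dir, name))
        copied.append(name)
        if after_each_file is not None:
            after_each_file(name)
    return copied


def write_files(directory, contents):
    """Write a {name: bytes} mapping into directory."""
    for name, data in contents.items():
        with open(os.path.join(directory, name), "wb") as f:
            f.write(data)


def run_demo():
    """Modify the source mid-copy and return what the copy ended up with."""
    src = tempfile.mkdtemp()
    dst = tempfile.mkdtemp()
    try:
        write_files(src, {"a.tbl": b"A1", "b.tbl": b"B1"})

        def modify_after_first_file(name):
            # Simulated writer: commits a new revision of both files as soon
            # as the first file has been copied.
            if name == "a.tbl":
                write_files(src, {"a.tbl": b"A2", "b.tbl": b"B2"})

        copy_database(src, dst, after_each_file=modify_after_first_file)

        result = {}
        for name in ("a.tbl", "b.tbl"):
            with open(os.path.join(dst, name), "rb") as f:
                result[name] = f.read()
        return result
    finally:
        shutil.rmtree(src, ignore_errors=True)
        shutil.rmtree(dst, ignore_errors=True)
```

Here `run_demo()` returns a copy that mixes the old `a.tbl` with the new `b.tbl`, which is the kind of inconsistent full copy the test needs to provoke on every run.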

Change History (11)

comment:1 by Richard Boulton, 16 years ago

Blocking: 227 added

comment:2 by Richard Boulton, 16 years ago

Status: new → assigned

comment:3 by Olly Betts, 16 years ago

Cc: olly@… added
Operating System: All

comment:4 by Olly Betts, 15 years ago

Blocking: 227 removed

(In #227) This is done except for the two blockers, and a ticket that exists just to track two others is clutter, so I'm unmarking the blockage and closing.

comment:6 by Richard Boulton, 15 years ago

Description: modified (diff)

I've now had some reports which lead me to believe there is a race condition here, which causes the client to attempt to open an invalid database (invalid due to some of the files involved in the full database copy having been deleted before the server could send them) and fail.

The intention was that this would be recovered by sending followup changesets to the client, which would be applied in turn and result in a valid database. However, the client attempts to open the database before these changesets have been applied, and fails, leaving an invalid database (and also blocking future replication attempts).

I'm attempting to fix this, but making a repeatable test for it is rather hard...

comment:7 by Olly Betts, 15 years ago

Milestone: 1.1.0

One option (which I would much rather avoid) is to mark replication as "experimental" and release 1.1.0 with this known issue unresolved.

Setting milestone to 1.1.0 for now at least.

comment:8 by Olly Betts, 15 years ago

Perhaps the server should open all the files it wants to send before sending? On Unix that'll mean it has them for sure even if they are deleted (ignoring NFS at least, but even there this will usually work) and on Windows it will block their deletion (or we open to allow deletion and cope with the consequences).
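
The Unix half of this suggestion can be illustrated in a few lines. This is a minimal sketch that assumes a POSIX filesystem (on Windows the `os.remove` call would normally fail while the handles are open); `open_all`, `demo`, and the file names are hypothetical and not part of Xapian.

```python
import os
import shutil
import tempfile


def open_all(paths):
    """Open every file up front so later deletions cannot lose the data."""
    return [open(path, "rb") for path in paths]


def demo():
    directory = tempfile.mkdtemp()
    paths = []
    for name, data in (("postlist", b"postlist data"),
                       ("record", b"record data")):
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)

    handles = open_all(paths)
    try:
        # Simulate the database being modified: the files vanish from the
        # directory while the server is still holding them open.
        # Assumption: POSIX semantics, where unlinking an open file only
        # removes the directory entry.
        for path in paths:
            os.remove(path)
        return [handle.read() for handle in handles]
    finally:
        for handle in handles:
            handle.close()
        shutil.rmtree(directory, ignore_errors=True)
```

Under that assumption, `demo()` still returns the original contents of both files even though they were deleted before being read.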

comment:9 by Richard Boulton, 15 years ago

Summary: Implement automated tests of concurrent db replication and modification → Fix problems with concurrent db replication and modification

I've now got a repeatable test of this (implemented in Python). I've added it to the xapian-bindings/python/ directory, but not hooked it into the automatic test run, since it currently fails. I'm fixing it at present.

Opening all the files in advance reduces the likelihood of problems, but we can still get problems if one of the files is changed while the files are being opened (e.g. because the database being replicated is switched). This results in an invalid database being sent to the client.

The fundamental problem is that the client tries to open the database without first waiting for the subsequent changesets, which would make it valid, to be applied.
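
The ordering problem can be shown with a toy model. This sketch is an illustration only and says nothing about how revision [11741] actually fixes it: it assumes a database is a mapping from table name to revision number, that a database is openable only when all tables are at the same revision, and that a changeset simply brings older tables up to its revision. All names are hypothetical.

```python
def is_consistent(tables):
    """A toy database is valid only if every table is at the same revision."""
    return len(set(tables.values())) == 1


def apply_changeset(tables, to_revision):
    """Bring every table older than to_revision up to to_revision."""
    return {name: max(revision, to_revision)
            for name, revision in tables.items()}


def open_when_ready(full_copy, changesets):
    """Apply changesets in order and only 'open' once the copy is valid.

    Raises ValueError if the copy is still inconsistent after all the
    changesets have been applied.
    """
    tables = dict(full_copy)
    for to_revision in changesets:
        if is_consistent(tables):
            break
        tables = apply_changeset(tables, to_revision)
    if not is_consistent(tables):
        raise ValueError("database still invalid after applying changesets")
    return tables


# A full copy taken while the database was being modified: the tables were
# captured at different revisions, so opening it immediately would fail.
mixed_copy = {"postlist": 1, "record": 2}
opened = open_when_ready(mixed_copy, [2])
```

In this model the mixed copy is not openable on arrival, but it becomes valid once the follow-up changeset to revision 2 has been applied, so deferring the open until then avoids the failure.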

Changing title of ticket to indicate current status.

comment:10 by Richard Boulton, 15 years ago

Fixed in revision [11741].

comment:11 by Richard Boulton, 15 years ago

Resolution: fixed
Status: assigned → closed

comment:12 by Olly Betts, 13 years ago

Component: Backend-Flint → Replication