Opened 16 years ago

Closed 15 years ago

Last modified 13 years ago

#227 closed enhancement (fixed)

Implement database replication system

Reported by: Richard Boulton
Owned by: Richard Boulton
Priority: normal
Milestone: 1.1.0
Component: Replication
Version: SVN trunk
Severity: minor
Keywords:
Cc: Olly Betts
Blocked By: #251
Blocking:
Operating System: All

Description (last modified by Richard Boulton)

I have a setup where I would like to be able to perform index updates on one master database, and then replicate this database to multiple client machines for searching.

I've experimented with an NFS setup for this, with the database kept local on the index server and mounted remotely on the search clients, hoping that the client machines would keep enough of the database cached that network traffic would not slow down searches too much. However, this doesn't work satisfactorily: the NFS protocol gives clients no way to learn about file updates other than by polling a file's mtime, so whenever the index is updated the clients discard all their cached pages from the database. This leads to many very slow searches.

For now, I'm looking at taking snapshots of the database using filesystem features (e.g. the snapshot functionality provided by ZFS), using xdelta to calculate the differences between successive snapshots, transferring those differences manually, and applying them to the database on the search machines.
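For reference, here is a minimal sketch of that interim workflow (in Python, and not part of any proposed patch). It assumes xdelta3 is installed and uses its "-e -s OLD NEW DELTA" / "-d -s OLD DELTA NEW" invocations; the snapshot itself is taken outside the script (e.g. by ZFS), and all paths are placeholders.

{{{
# Interim snapshot-and-diff workflow; illustrative only.
# Assumes xdelta3 is installed; the filesystem snapshot is taken separately
# (e.g. "zfs snapshot"); all paths here are placeholders.
import os
import shutil
import subprocess

def diff_snapshot(old_snap, new_snap, delta_dir):
    """On the index machine: binary-diff each file in the new snapshot
    against the corresponding file in the previous snapshot."""
    os.makedirs(delta_dir, exist_ok=True)
    for name in os.listdir(new_snap):
        old = os.path.join(old_snap, name)
        new = os.path.join(new_snap, name)
        if os.path.exists(old):
            delta = os.path.join(delta_dir, name + ".xd")
            subprocess.run(["xdelta3", "-e", "-s", old, new, delta], check=True)
        else:
            # New file: nothing to diff against, so ship it whole.
            shutil.copy2(new, os.path.join(delta_dir, name))

def apply_snapshot(old_snap, delta_dir, rebuilt):
    """On a search machine: rebuild the new database from the previous
    snapshot plus the transferred deltas."""
    os.makedirs(rebuilt, exist_ok=True)
    for name in os.listdir(delta_dir):
        if name.endswith(".xd"):
            base = name[:-3]
            subprocess.run(["xdelta3", "-d", "-s",
                            os.path.join(old_snap, base),
                            os.path.join(delta_dir, name),
                            os.path.join(rebuilt, base)], check=True)
        else:
            shutil.copy2(os.path.join(delta_dir, name),
                         os.path.join(rebuilt, name))
}}}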

However, this approach has two major drawbacks. Firstly, it depends on filesystem-specific features to take the snapshots (a standard file copy could be used instead, but that would have poor cache performance, which is exactly what we're trying to avoid). Secondly, it requires the whole database to be traversed on the index machine to calculate the binary diffs, which imposes unnecessary load on the index machine.

Instead, I would like to have a hook into flint which writes out a list of the modified btree pages, so that these can be distributed to the search servers. If this information were written to a log file, together with the points at which fsync was called and details of the changes made to the base files, the log could be transferred to the search machines and replayed there with minimal work.
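To make the intent concrete, here is a purely hypothetical sketch (Python rather than flint's C++, with an invented record format) of how such a log might be replayed on a search machine: each record is either a modified btree block or a sync marker recording where fsync was called on the index machine.

{{{
# Hypothetical replay of a change log on a search machine.  The record
# layout, block size and table file names are invented for illustration;
# they are not flint's actual on-disk format.
import os
import struct

BLOCK_SIZE = 8192  # placeholder value

def replay_changeset(log_path, db_dir):
    """Apply a shipped change log to the local copy of the database."""
    tables = {}  # table name -> open file handle
    with open(log_path, "rb") as log:
        while True:
            tag = log.read(1)
            if not tag:
                break
            if tag == b"B":
                # Modified btree block: table name, block number, block data.
                name_len, = struct.unpack("<H", log.read(2))
                table = log.read(name_len).decode()
                blockno, = struct.unpack("<Q", log.read(8))
                data = log.read(BLOCK_SIZE)
                if table not in tables:
                    tables[table] = open(os.path.join(db_dir, table), "r+b")
                fh = tables[table]
                fh.seek(blockno * BLOCK_SIZE)
                fh.write(data)
            elif tag == b"S":
                # Sync marker: the index machine called fsync here, so make
                # the blocks written so far durable before continuing.
                for fh in tables.values():
                    fh.flush()
                    os.fsync(fh.fileno())
    for fh in tables.values():
        fh.close()
}}}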

Attachments (1)

patch (25.6 KB) - added by Richard Boulton 16 years ago.
Work-in-progress patch


Change History (11)

by Richard Boulton, 16 years ago

Attachment: patch added

Work-in-progress patch

comment:1 by Richard Boulton, 16 years ago

Cc: olly@… added
Owner: changed from New Bugs to Richard Boulton

comment:2 by Richard Boulton, 16 years ago

attachments.isobsolete: 0 → 1

(From update of attachment 148) This has been implemented on HEAD now.

comment:3 by Richard Boulton, 16 years ago

Status: new → assigned

Replication is now implemented on HEAD, but there are a few things I'd like to tidy up. I'll open separate issues for each of them shortly and mark them as blockers on this bug.

comment:4 by Richard Boulton, 16 years ago

Blocked By: 236 added
Operating System: All

comment:6 by Richard Boulton, 16 years ago

Description: modified (diff)
Milestone: 1.1.0

comment:7 by Richard Boulton, 16 years ago

Blocked By: 251 added

comment:8 by Richard Boulton, 16 years ago

Blocked By: 278 added

comment:9 by Olly Betts, 15 years ago

Blocked By: 236, 278 removed

This is done except for the two blockers, and keeping a bug open just to track two others is just clutter, so I'm unmarking them as blockers and closing.

comment:10 by Olly Betts, 15 years ago

Resolution: fixed
Status: assigned → closed

comment:11 by Olly Betts, 13 years ago

Component: Other → Replication