Ticket #227 (closed enhancement: fixed)

Opened 12 months ago

Last modified 3 weeks ago

Implement database replication system

Reported by: richard Owned by: richard
Priority: normal Milestone: 1.1.0
Component: Other Version: SVN trunk
Severity: minor Keywords:
Cc: olly Blocked By: #251
Operating System: All Blocking:

Description (last modified by richard)

I have a setup where I would like to be able to perform index updates on one master database, and then replicate this database to multiple client machines for searching.

I've experimented with using an NFS setup for this, with the database kept local on the index server and mounted remotely on the search clients, hoping that the client machines would keep enough of the database cached that the network traffic would not slow down searches too much. However, this method doesn't work satisfactorily because the NFS protocol doesn't allow NFS clients to get information about file updates other than by polling the mtime of a file: therefore, whenever the index is updated, any cached pages from the database are discarded. This leads to many very slow searches.

For now, I'm looking at setting up a system to take snapshots of databases using filesystem features (e.g., the snapshot functionality provided by ZFS), then using xdelta to calculate the differences between the databases, transferring the differences manually, and applying them to the database on the search machines.

However, this approach has two major drawbacks. Firstly, it depends on filesystem-specific features to take the snapshots; a standard file copy could be used instead, but this would have poor cache performance, which is exactly what we're trying to avoid. Secondly, it requires the whole database to be traversed on the index machine to calculate the binary diffs. This is undesirable because it imposes unnecessary load on the index machine.

Instead, I would like to have a hook into flint which writes out a list of the modified btree pages, so that these can then be distributed to the search servers. If this information were written to a log file, together with the points at which fsync was called and details of the changes made to the base files, the log file could be transferred to the search machines and replayed there with minimal work.
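
To make the idea concrete, here is a rough sketch of the kind of page-level change log described above. It is only an illustration, not flint's actual on-disk or log format: the record layout, the 8192-byte page size, and the names LoggedFile and replay are all assumptions made up for this example. The writer logs each modified page and each sync point; the reader applies pages only up to the last sync marker, so a replica never sees a half-finished update.

```python
import io
import struct

PAGE_SIZE = 8192                 # assumed page size, for illustration only
REC_PAGE = 1                     # record carrying one modified page
REC_SYNC = 2                     # record marking a point where fsync was called
_HEADER = struct.Struct("<BI")   # hypothetical header: record type, page number


class LoggedFile:
    """Wraps a seekable binary file and logs every page write and sync point."""

    def __init__(self, data, log):
        self.data = data
        self.log = log

    def write_page(self, page_no, payload):
        if len(payload) != PAGE_SIZE:
            raise ValueError("payload must be exactly one page")
        self.data.seek(page_no * PAGE_SIZE)
        self.data.write(payload)
        # Record which page changed, followed by its new contents.
        self.log.write(_HEADER.pack(REC_PAGE, page_no) + payload)

    def sync(self):
        # A real implementation would also call os.fsync() on an on-disk file.
        self.data.flush()
        self.log.write(_HEADER.pack(REC_SYNC, 0))


def replay(log_bytes, replica):
    """Apply logged page writes to replica, up to the last sync marker.

    Returns the number of pages applied. Pages logged after the final
    sync marker are ignored, as is a truncated trailing record.
    """
    pending = []
    applied = 0
    pos = 0
    while pos + _HEADER.size <= len(log_bytes):
        rec_type, page_no = _HEADER.unpack_from(log_bytes, pos)
        pos += _HEADER.size
        if rec_type == REC_PAGE:
            payload = log_bytes[pos:pos + PAGE_SIZE]
            if len(payload) < PAGE_SIZE:
                break  # truncated record: stop without applying it
            pos += PAGE_SIZE
            pending.append((page_no, payload))
        elif rec_type == REC_SYNC:
            for n, page in pending:
                replica.seek(n * PAGE_SIZE)
                replica.write(page)
            applied += len(pending)
            pending = []
        else:
            raise ValueError("unknown record type %d" % rec_type)
    return applied
```

With something like this, the index machine only does extra work proportional to the pages it actually modifies, and the search machines only need to copy those pages into place rather than recompute anything.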

Attachments

patch (25.6 kB) - added by richard 12 months ago.
Work in progress patch

Change History

Changed 12 months ago by richard

Work in progress patch

Changed 12 months ago by richard

  • cc olly@… added
  • owner changed from newbugs to richard

Changed 11 months ago by richard

  • attachments.isobsolete changed from 0 to 1

(From update of attachment 148) This has been implemented on HEAD now.

Changed 11 months ago by richard

  • status changed from new to assigned

Replication is now implemented on HEAD, but there are a few things I'd like to tidy up. I'll be making separate issues for each of these things shortly, and marking them as blockers on this bug.

Changed 11 months ago by richard

  • blockedby set to 236

Changed 11 months ago by trac

  • platform set to All

Changed 9 months ago by richard

  • description modified (diff)
  • milestone set to 1.1.0

Changed 9 months ago by richard

  • blockedby changed from 236 to 236, 251

Changed 7 months ago by richard

  • blockedby changed from 236, 251 to 236, 251, 278

Changed 3 weeks ago by olly

  • blockedby changed from 236, 251, 278 to 251

This is done except for the two blockers, and a bug that exists just to track two others is only clutter, so I'm unmarking the blockage and closing.

Changed 3 weeks ago by olly

  • status changed from assigned to closed
  • resolution set to fixed