
Near-Live Backup

A near-live backup is a backup strategy in which data is backed up continually and asynchronously, so that the backup stays close to the current "live" state of the original data without forcing the writer to wait.

A near-live backup can be used to restore data on the primary data store after corruption, a hard disk crash, or another fatal disaster, losing as little data as possible. Alternatively, users of the data can be told to use the backup directly if the primary data store cannot be brought back online.

In a near-live backup scenario, the data is replicated or backed up after it has been written to the primary data store. Therefore, the near-live backup always lags slightly behind the original data. See Backup Lag Time for more information.
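The write path described above can be sketched as follows. This is a minimal illustration, not a production design: the class `NearLiveStore`, its in-memory dictionaries, and the `wait_for_backup` helper are all hypothetical names chosen for the example. The primary store acknowledges the write immediately and a background thread copies the data to the backup afterwards.

```python
import queue
import threading


class NearLiveStore:
    """Hypothetical primary store that acknowledges writes before backup."""

    def __init__(self):
        self.primary = {}               # primary data store (in memory here)
        self.backup = {}                # near-live backup copy
        self._pending = queue.Queue()   # writes acknowledged but not yet backed up
        worker = threading.Thread(target=self._backup_loop, daemon=True)
        worker.start()

    def write(self, key, value):
        self.primary[key] = value
        self._pending.put((key, value))  # hand off to the backup thread, do not wait
        return "SUCCESS"

    def _backup_loop(self):
        # Runs in the background, always slightly behind the primary.
        while True:
            key, value = self._pending.get()
            self.backup[key] = value
            self._pending.task_done()

    def wait_for_backup(self):
        """Block until the backup has caught up (for demonstration only)."""
        self._pending.join()


store = NearLiveStore()
store.write("A", 1)
store.write("B", 2)
store.wait_for_backup()
print(store.backup == store.primary)
```

Anything still sitting in the pending queue when the primary fails is exactly the data that the lag exposes to loss.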

For a truly live backup, data must be replicated before the writer is told that it has been written successfully. This ensures that all data exists on the backup (including possibly corrupted data). Some RAID configurations use this technique. See below for advantages/disadvantages.
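For contrast, a truly live (synchronous) write might look like the sketch below. The class `LiveStore` and its in-memory dictionaries are assumptions made for illustration. The success code is returned only after both copies have been written, so the writer waits for the backup, and a bad value reaches both copies.

```python
class LiveStore:
    """Hypothetical store that backs up data before acknowledging the write."""

    def __init__(self):
        self.primary = {}
        self.backup = {}

    def write(self, key, value):
        self.primary[key] = value
        self.backup[key] = value   # the writer waits for this step as well
        return "SUCCESS"           # sent only once both copies exist


store = LiveStore()
store.write("A", "good data")
store.write("B", "corrupted data")   # corruption is replicated too
print(store.backup == store.primary)
```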

Consistency

Ideally, the near-live backup should be in a consistent state. Techniques such as transactions can be used to achieve this. As long as the backup system is aware of the transactional state, it will only back up the data after the transaction has been completed. This method is used in database replication systems such as MySQL Master-Slave Replication. If data is consistent, the system can be easily restored to a stable condition if necessary.
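As a rough sketch of transaction-aware backup, the function below replays a change log and applies a transaction's writes to the backup only once its commit record is seen. The log format and the name `replicate_committed` are invented for this example; this is not how MySQL replication is implemented internally.

```python
def replicate_committed(log):
    """Return the backup state built from committed transactions only."""
    pending = {}   # transaction id -> writes seen but not yet committed
    backup = {}
    for entry in log:
        kind, txn = entry[0], entry[1]
        if kind == "write":
            _, _, key, value = entry
            pending.setdefault(txn, []).append((key, value))
        elif kind == "commit":
            # The transaction is complete, so its writes are safe to back up.
            for key, value in pending.pop(txn, []):
                backup[key] = value
        elif kind == "rollback":
            pending.pop(txn, None)   # discard writes that were undone
    return backup


log = [
    ("write", 1, "balance_alice", 50),
    ("write", 1, "balance_bob", 150),
    ("commit", 1),
    ("write", 2, "balance_alice", 0),   # transaction 2 never commits
]
print(replicate_committed(log))
```

Because the half-finished transaction 2 never reaches the backup, a restore from this backup returns the system to the state left by transaction 1.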

Advantages / Disadvantages of a Near-Live Backup

  • Advantage: Data is backed up regularly, and usually in a consistent state
  • Advantage: The writer is not forced to wait for the backup to take place, so the time taken by the backup does not slow down writes and the backup device can therefore be in a different physical location
  • Disadvantage: As with any asynchronous backup, a loss of the primary data source can mean any data written since the last backup is lost. Ideally, a redundancy system such as RAID would be used on the primary data source to counter this, and near-live backups would be taken frequently.

Example

  • Writer sends data A to primary data store.
  • Primary data store sends a SUCCESS code for data A back to the writer.
  • -- at this point data A has not been backed up, so a crash can lose this data --
  • Writer sends data B to primary data store.
  • Near-live backup device backs up data A; data A is now safe.
  • Primary data store sends a SUCCESS code for data B back to the writer.
  • -- at this point data B has not been backed up, so a crash can lose this data --
  • Near-live backup device backs up data B; data B is now safe.
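The sequence above can be traced with a short script that records, after each step, which data has been acknowledged to the writer but not yet backed up. The event names (`write`, `ack`, `backup`) and the function `at_risk_after_each_step` are hypothetical, chosen only to mirror the steps in the list.

```python
def at_risk_after_each_step(events):
    """Return (event, acknowledged-but-not-backed-up data) for each step."""
    unprotected = []
    timeline = []
    for action, item in events:
        if action == "ack":
            unprotected.append(item)      # writer was told SUCCESS, no backup yet
        elif action == "backup":
            unprotected.remove(item)      # the backup has caught up for this item
        timeline.append(((action, item), list(unprotected)))
    return timeline


events = [
    ("write", "A"), ("ack", "A"),
    ("write", "B"), ("backup", "A"),
    ("ack", "B"), ("backup", "B"),
]
for event, unprotected in at_risk_after_each_step(events):
    print(event, "at risk:", unprotected)
```

A crash of the primary data store at any step loses whatever is in the "at risk" list at that moment.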

Trade-offs: distance vs. lag time

  • A backup device that is close to the source can achieve a very low backup lag time, so the backed-up data can be very near in time to the live data.
  • A backup device (or backed up data) that is far from the source provides better protection in case of physical destruction of the primary data source.
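To give a feel for the distance side of this trade-off, the sketch below estimates a lower bound on lag from signal travel time alone. It assumes a signal speed of about 200,000 km per second in optical fiber (roughly two thirds of the speed of light) and ignores routing, queuing, and disk time, so real lag will be higher. The constant and function names are illustrative.

```python
FIBER_KM_PER_SECOND = 200_000  # assumption: approximate signal speed in optical fiber


def min_one_way_delay_ms(distance_km):
    """Lower bound on one-way delay in milliseconds, from distance alone."""
    return distance_km / FIBER_KM_PER_SECOND * 1000


for distance_km in (1, 100, 1000):
    delay = min_one_way_delay_ms(distance_km)
    print(f"{distance_km} km: at least {delay:.3f} ms")
```

Under these assumptions, a backup site 1000 km away adds at least about 5 ms per one-way trip, while a device 1 km away adds only a few microseconds.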
