I am sorry if this question has already been answered somewhere, but as it is highly specific, all my search queries returned standard questions and answers about backups in general.
What I am not asking for:
- How to back up data
- How to back up the system
- General backup tools and their features
My question:
What is the most efficient way to back up an entire hard disk with multiple partitions to an identical hard disk (mirror disk) that is kept offline and is powered on manually only to update its content?
Definition of "efficient" for this question: minimal performance impact on the system (CPU, I/O load on the source hard disk), no manual interaction required (no UI application), and a reliable mechanism (I don't want to fix sync errors all the time).
My ideas:
Whichever of the following alternatives is chosen, the command should start automatically as soon as the mirror disk is powered on. Showing progress, e.g. via a desktop notification, is of minor importance.
- Run rsync for each partition
  - Pros:
    - Default tool for tasks like this
    - Should be quite efficient
    - Supports incremental updates (sync)
  - Cons:
    - Has to be set up for each partition and updated whenever the partitions change
    - Overhead of working on the filesystem level?
- Run a command (dd?) for the whole disk
  - Is there a disk-level tool that supports "syncing" only the differences to a mirror disk?
  - Will operations on the disk level perform better than rsync on the filesystem level?
- Use RAID 1
  - Will a RAID 1 setup operate reliably if one of the disks is offline most of the time?
  - Are there unwanted side effects of a permanently degraded RAID 1 array in normal operation?
  - Is RAID 1 able to sync differences efficiently (or is it designed to sync safely rather than quickly)?
  - Could there be problems if the differences grow too large?
  - There are both hardware and software RAID 1 options for my system. Do they behave differently regarding the points above and this unusual use case in general?
- Is there another option?
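To make the disk-level "sync only the differences" idea more concrete, here is a minimal Python sketch of what I have in mind. It is only an illustration, not an existing tool: the function name `sync_changed_blocks` and the block size are my own assumptions, and it would need root privileges to open real block devices. It compares source and mirror block by block and rewrites only the blocks that differ.

```python
import os


def sync_changed_blocks(source_path, mirror_path, block_size=4 * 1024 * 1024):
    """Rewrite only the blocks of mirror_path that differ from source_path.

    Hypothetical helper for illustration. Both paths may be regular files
    or block devices; the mirror must already exist and should be at least
    as large as the source. Returns the number of blocks rewritten.
    """
    rewritten = 0
    with open(source_path, "rb") as src, open(mirror_path, "r+b") as dst:
        offset = 0
        while True:
            src_block = src.read(block_size)
            if not src_block:
                break  # end of source reached
            dst.seek(offset)
            dst_block = dst.read(len(src_block))
            if dst_block != src_block:
                # Only differing blocks cause a write on the mirror.
                dst.seek(offset)
                dst.write(src_block)
                rewritten += 1
            offset += len(src_block)
        dst.flush()
        os.fsync(dst.fileno())
    return rewritten
```

Note that this approach still has to read the whole source disk and the whole mirror on every run. It only saves writes on the mirror, so it does not reduce the read load on the source disk, which is part of my definition of "efficient".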
At first glance, I favoured the RAID 1 solution, because mirroring content is exactly what RAID 1 is made for. But as RAID arrays are meant to work with online devices, and the offline case (device failure) should occur only rarely, I wonder whether this solution is the most efficient approach for me.
Update: Partition/data characteristics
Partition characteristics (that may affect the performance comparisons):
- One partition will contain a Linux operating system (75 GB)
- One is a swap partition (43 GB); it could be omitted in the filesystem approach, but it is rarely used, so this should not be taken into account
- One partition is intended for arbitrary data, possibly large files such as virtual machine disk images (107 GB)
- The major partition (5.7 TB) is a LUKS encrypted container
- The filesystem is ext4 on all partitions that have one
Update 2018-04-28:
As it turned out here, comparing these different mechanisms is a non-trivial question, so the best approach would be to try them out and measure the performance.
But in the meantime I realized that none of the partitions except the LUKS partition needs an additional backup: the operating system partition is already a backup of the real operating system on an SSD, swap obviously needs no backup, and the partition with arbitrary data does not need one either. Therefore it was easy to decide to go with the rsync approach and mirror just the LUKS partition. The remaining space on the "mirror disk" is now a partition for manual backups, e.g. for data from foreign systems, which is a valuable additional "feature".
I am sorry for not having analyzed or answered the original question, because technically it is quite interesting.
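For illustration, the rsync approach could be wrapped roughly like the following Python sketch. This is not my exact setup: the function names and the mount points passed in are hypothetical, and I assume that both the source LUKS container and the one on the mirror disk are already opened and mounted. Running rsync under `nice` and `ionice` is meant to keep the impact on the running system low, although the idle I/O class only has an effect with I/O schedulers that support it.

```python
import os
import subprocess


def build_mirror_command(source_mount, mirror_mount, dry_run=False):
    """Build a low-priority rsync command line (hypothetical helper)."""
    cmd = [
        "ionice", "-c", "3",    # idle I/O scheduling class
        "nice", "-n", "19",     # lowest CPU priority
        "rsync",
        "-aHAX",                # archive mode plus hard links, ACLs, xattrs
        "--delete",             # remove files that no longer exist on the source
        "--one-file-system",    # do not cross into other mounted filesystems
    ]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source makes rsync copy the directory contents.
    cmd += [source_mount.rstrip("/") + "/", mirror_mount.rstrip("/") + "/"]
    return cmd


def run_mirror(source_mount, mirror_mount):
    """Run the sync, refusing to write into an unmounted mount point."""
    if not os.path.ismount(mirror_mount):
        raise RuntimeError(f"{mirror_mount} is not mounted; refusing to sync")
    subprocess.run(build_mirror_command(source_mount, mirror_mount), check=True)
```

The mount check is there because the mirror disk is usually offline: without it, rsync would silently fill the empty mount point directory on the system disk instead.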