I'm trying to test and document a backup and restore procedure for CentOS 6. Here's where I'm up to, but there are a few areas where I need a bit of clarity. CentOS backup/restore documentation on the 'net is a bit hit and miss.
General backup and restore plan
Back up your systems each day using your favourite backup software. I'm not going to go into this in great depth, but let's say you have a proper backup system that allows you to make a backup of one system and restore it to another.
Smoke and flame engulfs one of your servers! After dealing with the immediate danger you realise that an important system is irreparably damaged. You need to restore it to different hardware.
Check the backup files. Look at the backup of the failed system's
/etc/redhat-release
file. Use this to establish which version (patch level) of CentOS the failed system was using, and grab the install media for that version.
Using the install media, do a minimal install of the operating system on your replacement hardware, partitioning the disks as appropriate for the system's end use.
After the minimal system is installed, temporarily disable SELinux,
echo 0 > /selinux/enforce
stop iptables,
service iptables stop
and install your backup client.
Recover from the backup, excluding the following files from recovery:
/proc
/sys
/tmp
/dev
/var/lock <- don't exclude from restore - see answer
/var/run <- don't exclude from restore - see answer
/var/tmp
/etc/fstab
/etc/mdadm.conf
/etc/mtab
/etc/resolv.conf
/etc/networks
/etc/sysconfig/network*
/etc/sysconfig/kernel
/etc/hosts
/etc/modprobe*
/etc/NetworkManager <- to ensure that IP isn't restored - see answer
/etc/udev
/lib/modules
/boot
When the restore has completed, reboot and watch for errors
Check that the network configuration is correct. You may need to use
system-config-network
to make changes to your network settings.
Some applications like Apache and MySQL may not start correctly after the restore. (This shouldn't be a problem as long as you don't exclude /var/run and /var/lock from the restore - see answer.) Because /var/run was excluded from the restore, subfolders like /var/run/httpd won't exist, and so applications won't be able to create PID files properly. You need to restore folders like /var/run/httpd/ and /var/run/mysqld/ and give them the correct permissions.
After completing remedial action, ensure applications come up correctly.
If you're running a MySQL database it may still be OK without you having to restore it from any flatfile backup you may have made. You can check the state of the database by running
mysqlcheck -c -u root -p******** --all-databases
If you see any errors, run
mysqlcheck -c -u root -p******** --all-databases --auto-repair
to repair them. You should always ensure that you have a proper backup of your database as indicated in the answer below. I personally use mysqldump.
Patch the system up to the latest level using
yum update
After rebooting to ensure the system comes back up cleanly, and thoroughly checking /var/log/messages for any errors, test the system's functionality to ensure it is operating correctly. When this is the case, use
system-config-network
to change the IP address to that of the original faulty system.
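As a rough illustration of the restore step above, the exclusion list can be kept in a plain text file and handed to whatever backup client is in use. This is a minimal sketch, assuming a client that accepts a file of paths, one per line. The function name write_restore_excludes, the default file location and the rsync source in the comment are all hypothetical. The list follows the one above, with /var/run and /var/lock left out and /etc/NetworkManager included, as the notes in the list suggest.

```shell
#!/bin/sh
# Write the restore exclusion list to a file, one path per line.
# write_restore_excludes and the default path are hypothetical names.
write_restore_excludes() {
    out=${1:-/tmp/restore-excludes.txt}
    cat > "$out" <<'EOF'
/proc
/sys
/tmp
/dev
/var/tmp
/etc/fstab
/etc/mdadm.conf
/etc/mtab
/etc/resolv.conf
/etc/networks
/etc/sysconfig/network*
/etc/sysconfig/kernel
/etc/hosts
/etc/modprobe*
/etc/NetworkManager
/etc/udev
/lib/modules
/boot
EOF
}

# Example: write the list, then pass it to the backup client.
# With rsync, for instance (an assumption; no particular tool is named above,
# and backuphost:/backups/failed-server/ is a made-up source):
#   write_restore_excludes /root/restore-excludes.txt
#   rsync -aH --exclude-from=/root/restore-excludes.txt backuphost:/backups/failed-server/ /
```

Keeping the list in one file makes it easy to review and to reuse for the next restore test.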
Issues/Questions
Excluding /var/run/* from the restore causes the subfolders used to contain PIDs for some apps not to be created when you restore. Is it really necessary to exclude /var/run/* from restore? Is a better way to simply not restore the PID files?
When the system was restored, the IP address of the 'faulty system' was also restored. I didn't want this. I must have missed a file off of my 'exclude from recovery' list. Any ideas where it is?
When updating I get lots of messages like
/sbin/ldconfig: /usr/lib64/libblah.so is not a symbolic link
When I reboot the system after updating, some services don't come up correctly. I wonder if this is something to do with the backup system restoring the files that the symbolic links point to instead of the symbolic links themselves. If I run ldconfig and look at one of the shared objects it complains about, the shared object is an actual file rather than a symlink. Anybody else seen this?
1. Excluding /var/run
As you already noticed, excluding /var/run during a complete restore of a CentOS 6 system causes problems, because it also excludes directories created by installed packages. Excluding /var/lock can cause similar problems, because some packages create subdirectories there too.
(There may be no such issues on more recent Linux distributions which use systemd: on such distributions /var/lock and /var/run (really /run) may be placed on tmpfs, and any required subdirectories are created during every boot. CentOS 6, however, is much older and does not have any support for automatic creation of subdirectories in /var/lock or /var/run.)
However, excluding /var/run and /var/lock is not actually needed for a proper restore, because the /etc/rc.d/rc.sysinit script on CentOS 6 includes a command that removes all stale lock or PID files (or any other non-directory files, such as sockets and symlinks) during the system boot. Therefore you should remove /var/lock and /var/run from the restore exclusion list.
2. Location of network configuration files
You already exclude /etc/sysconfig/network* when restoring the backup; this should match both the /etc/sysconfig/network file (global networking configuration) and the /etc/sysconfig/network-scripts directory (per-interface configuration files ifcfg-*). However, these files are used only by the old-style network configuration scripts included in the initscripts package, and CentOS 6 has another network configuration system, NetworkManager, whose configuration is stored in /etc/NetworkManager. Try also excluding that directory when you restore the backup.
3. The issue with symbolic links replaced with files
If you see that symbolic links were replaced with plain files after the restore, this means that either your backup/restore program was not configured correctly, or (if there is no option for saving and restoring actual symlinks) the program you used is not suitable for Linux system backup/restore at all. You can get away with a program which does not support symlinks only if it is used to back up and restore specific data which definitely will not contain symlinks. Note that you may find symlinks in places where you did not expect them. For example, symlinks may be used in MySQL database directories (to store some parts of the data on a different device), so relying on the “no symlinks” assumption may be dangerous.
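One way to check whether a restore flattened symlinks is to compare a reference tree with the restored one. The following is a minimal sketch, assuming both trees are reachable from the same machine (for example a freshly installed system mounted somewhere, next to the restored copy). The function name list_flattened_symlinks and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print every path that is a symlink under the reference tree but a
# regular file under the restored tree.
# list_flattened_symlinks is a hypothetical helper name.
list_flattened_symlinks() {
    ref=$1
    restored=$2
    ( cd "$ref" && find . -type l ) | while IFS= read -r rel; do
        # -f follows symlinks, so also require that the path is not a link.
        if [ -f "$restored/$rel" ] && [ ! -L "$restored/$rel" ]; then
            printf '%s\n' "$rel"
        fi
    done
}

# Example (illustrative paths):
#   list_flattened_symlinks /mnt/reference/usr/lib64 /usr/lib64
```

Any path it prints is a candidate for the "is not a symbolic link" complaints from ldconfig, and a sign that the backup tool's symlink handling needs another look.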
4. MySQL backup
If your backup program simply copies files from a running server, your backup is not really “crash consistent”, because different files (and even different blocks of the same file) are copied at different times, so you will not actually get a consistent snapshot of the database in your backup. (This applies to any kind of database, not just MySQL.)
There are several ways to back up MySQL databases using just a file-level backup:
Use mysqldump to create a SQL dump before starting the file-level backup; back up the dump file instead of the database directory. This is the most portable backup format, but both dumping and restoring may be slow.
Stop the MySQL server before starting the backup, make a file-level backup, then start the MySQL server again. To restore, just restore all files on the new server, then start the server normally. This kind of backup is fast, but requires significant downtime during the backup.
To reduce the MySQL server downtime required by the previous method, you can create a filesystem snapshot after stopping the server, then start the MySQL server again, and then mount the snapshot, perform a file-level backup and delete the snapshot. You need to have the filesystem on an LVM volume with some free space in the volume group for the snapshot.
To reduce the downtime even further, you can use FLUSH TABLES WITH READ LOCK before taking the snapshot instead of stopping the server, as described here; in this case the snapshot will contain MyISAM tables in a consistent state, and InnoDB tables in a crash-consistent state (InnoDB recovery will be needed after a file-level restore).
Read this documentation for more information about MySQL backup.
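The first option in the list could be wired into a pre-backup hook along these lines. This is only a sketch: the function name dump_all_databases, the MYSQLDUMP override and the dump path are hypothetical, and credentials are assumed to come from a client option file such as ~/.my.cnf. The --lock-all-tables option is chosen here so that the dump is consistent across storage engines, at the cost of blocking writes while it runs.

```shell
#!/bin/sh
# Dump every database to a single SQL file before the file-level backup runs.
# dump_all_databases and the MYSQLDUMP override are hypothetical names.
MYSQLDUMP=${MYSQLDUMP:-mysqldump}

dump_all_databases() {
    out=$1
    tmp="$out.partial"
    # Write to a temporary name first so a failed dump never replaces
    # the previous good one.
    if "$MYSQLDUMP" --all-databases --lock-all-tables > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example (path is illustrative):
#   dump_all_databases /var/backups/mysql-all.sql
```

The file-level backup then only needs to pick up the dump file rather than the live database directory.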
Combining the exclude list in this question thread with a Rackspace tutorial, I was able to make the configuration below work reliably to replicate/copy an entire installed CentOS server.
My setup is CentOS 6.7 + Virtualmin. However, this would possibly work with CentOS 6.x without any control panel.
The procedure I created is below:
If you aren't using Virtualmin, you will possibly need to leave out the Virtualmin items.
The exclude file list to copy to the remote server is below:
Credits:
https://support.rackspace.com/how-to/migrating-a-linux-server-from-the-command-line-2/
There's an excellent open source project, ReaR (Relax and Recover), that has done amazing things in the area of creating image-style backups of Linux (including CentOS and Red Hat). Of particular note is the cool way they capture the filesystem layout and incorporate this into their recovery disk to make restoring the filesystem layout work quite well. Best of all, it's written in bash (and really well written bash to boot!).
We have no connection to the project, other than that we wrote a quick tutorial: http://carroll.net/blog/red-hat-bare-metal-backup.
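For orientation, a ReaR setup is usually a handful of shell variable assignments in /etc/rear/local.conf. The fragment below is a sketch only: the NFS server name and export path are made-up placeholders, and the values should be checked against the ReaR documentation for the version in use.

```shell
# /etc/rear/local.conf (sourced by ReaR as a shell fragment)
# Build a bootable rescue ISO and store a tar archive of the system on NFS.
OUTPUT=ISO
BACKUP=NETFS
# Hypothetical NFS target; replace with a real server and export.
BACKUP_URL="nfs://backup.example.com/export/rear"

# Typical commands (run as root):
#   rear -v mkbackup   # create the rescue image and the backup
#   rear recover       # run from the booted rescue image on the new hardware
```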