I have a Linux log server where multiple applications write data. Data is written in bursts and to a lot of different files. I need to make a backup of this mess, preferably preserving as much coherence between the file versions as possible and avoiding truncated files. The total amount of data on the server is about 100 GB. What I would really want (but can't have) is to shut the system down, back it up cold, and then start it up again.
What kind of guarantees against concurrent modification do the various backup tools give? When do they "freeze" the file versions? I am looking at rsync, dump, and tar at the moment, but I am open to other (open source) alternatives.
Changing the applications or blocking writes during backups is sadly not an option. The system is not running LVM (yet), but I have considered it for a rebuild of the system, followed by snapshots.
Isn't some sort of log rotation an option for you? Just back up log files that have already been rotated; wouldn't that be a solution?
And yes, otherwise a snapshot at the LVM level would be your best choice (remember that while a snapshot is active, your write performance degrades).
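For reference, a rough sketch of that snapshot cycle once LVM is in place. The volume group `vg0`, logical volume `lv_logs`, sizes, and mount points are all placeholders for your own layout, and everything here requires root:

```shell
# Create a copy-on-write snapshot of the log volume (5G of change buffer)
lvcreate --size 5G --snapshot --name logs_snap /dev/vg0/lv_logs

# Mount it read-only and back up from the frozen view
mkdir -p /mnt/logs_snap
mount -o ro /dev/vg0/logs_snap /mnt/logs_snap
tar -czf /backup/logs-$(date +%Y%m%d).tar.gz -C /mnt/logs_snap .

# Drop the snapshot as soon as the backup finishes, to stop paying
# the copy-on-write penalty on every write to the origin volume
umount /mnt/logs_snap
lvremove -f /dev/vg0/logs_snap
```

Every file in the snapshot is from the same instant, which is as close to the "cold backup" you described as you can get without downtime.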
If you're using syslog, you can configure your log server to replicate logs live to another server (e.g. with rsyslog) for a real-time backup.
Then back up all the rotated files, as already suggested, for long-term archival.
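As a sketch, assuming rsyslog on the sending side (the hostname and port are placeholders), forwarding everything to the backup host looks like:

```
# /etc/rsyslog.conf on the log server
# Forward all facilities/priorities; @@ = TCP, a single @ = UDP
*.* @@backup-host.example.com:514
```

The receiving server then just needs the TCP (or UDP) input module enabled and will write the replicated stream with its normal rules.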
logrotate can be configured for custom applications too, and can run shell scripts on rotated logs. So you could skip /var/log in external backup tools altogether and copy the logs to an archive directory, which is more static.
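A minimal sketch of such a setup for a hypothetical app writing under /var/log/myapp (the app name, paths, and archive directory are all examples):

```
# /etc/logrotate.d/myapp
/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        # Copy the freshly rotated (now static) files into an archive
        # directory that the external backup tool can read safely.
        # With delaycompress, the newest rotated file is *.log.1.
        cp -p /var/log/myapp/*.log.1 /srv/log-archive/ 2>/dev/null || true
    endscript
}
```

The backup tool then only ever sees files in /srv/log-archive that logrotate has finished writing, so the truncation problem disappears.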
None of the tools you are considering provides guarantees against concurrent modification. The question is whether you really need a point-in-time snapshot; if so, use the LVM snapshot option given above. Since you list rsync as an option, I assume that disk-to-disk backup is acceptable.
The least safe is dump, which copies the disk blocks as it reads them. Given the size of your data, there are likely to be significant discrepancies between the directory information and the file data by the time the dump completes. For disk-to-disk backup you could consider dd to a partition of the same size as an alternative. Both solutions do essentially the same thing and suffer from the same problems.
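To illustrate the mechanics, a small sketch that uses an image file in place of a real partition; on a live system you would point `if=` and `of=` at the actual block devices of equal size, with exactly the consistency caveats described above:

```shell
# Stand-in for a partition: a 64 KiB image file in a temp dir
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/disk.img" bs=1024 count=64 2>/dev/null

# Raw block-for-block copy to a same-size target, as dd would do
# between /dev/<source> and /dev/<target> partitions
dd if="$tmp/disk.img" of="$tmp/disk.copy" bs=1024 2>/dev/null
```

The copy is faithful to whatever the blocks contained at the moment they were read, which is precisely why a busy filesystem can come out inconsistent.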
Tar reads the files one by one, and reads each one through to the end. If a file is renamed or deleted while tar is backing it up, tar will still back up the file it started reading. It is a reasonable solution for log files.
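A minimal sketch of such a one-pass tar run, using a throwaway temp directory in place of /var/log so it is safe to try anywhere:

```shell
# Build a tiny log tree (a date-named rotated file as an example)
tmp=$(mktemp -d)
mkdir -p "$tmp/logs"
printf 'line1\n' > "$tmp/logs/app.log.20240101"

# Archive the whole tree in one pass; tar reads each file to its end,
# but files still being appended to mid-run can come out truncated,
# which is why already-rotated (static) files are the safer target
tar -czf "$tmp/logs.tar.gz" -C "$tmp/logs" .
tar -tzf "$tmp/logs.tar.gz"
```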
Rsync behaves like tar, but only copies changes: essentially it copies everything that has changed under the directories you give it. With a date-based log rotation scheme (logfile.yyyymmdd) instead of the common numbered scheme (logfile.1, logfile.2.gz, ...), old files keep their names rather than being renamed on every rotation, so rsync can back up your logfiles efficiently instead of re-transferring every rotated file each run.
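A small sketch showing why the date-based naming helps, run entirely in a throwaway temp directory (paths are examples):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/dst"
printf 'old\n' > "$tmp/src/logfile.20240101"
printf 'new\n' > "$tmp/src/logfile.20240102"

# First run copies everything
rsync -a "$tmp/src/" "$tmp/dst/"

# Only the current log grows; its name never changes
printf 'more\n' >> "$tmp/src/logfile.20240102"

# Second run transfers just the changed file; logfile.20240101
# is untouched and skipped, unlike a renamed logfile.1 would be
rsync -a "$tmp/src/" "$tmp/dst/"
```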