It is a common situation: an administrator sets up automated backups and then forgets about them. Only after a system fails does he notice that the backup system broke some time before, or that the backups cannot be restored because of some fault, and that there is no current backup to restore from... So what are the best practices for avoiding such situations?
Run fire drills. Every couple of months it is a good idea to say "the XYZ system is down", then actually go through the motions of bringing it back online on a new VM and so on. It keeps things honest and helps you catch mistakes.
soapbox mode: ON
I would say it's as simple as this: backups that aren't tested regularly are worthless.
At my previous job we had a policy that every system (production, test, development, monitoring, etc.) should be test-restored every 6 months.
This was also the job of the most junior admin, which kept the documentation up to date. "Junior" was defined by how much work he or she had done on the specific system, so sometimes (quite often, actually) it was the "group manager" who did it.
We had special hardware dedicated to this (one Intel and one IBM/AIX box) that was low-spec for everything but disk space, since we did not need to run anything real on the restored host.
It was quite a lot of work for the first couple of rounds, but it led us to streamline the restore process, which is the part of backup that really matters.
Since you seem to be referring to the administrator not noticing that the backup job "breaks", rather than a seemingly good backup failing to restore correctly, I would suggest building some sort of monitoring scripts around the backups.
When building a home-grown backup solution, I would do something like this:
Once all of that is done, you should be fine. One extra thing to do would be to perform regular test restores, if you have extra hardware to donate to the cause.
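For a home-grown setup, such a monitoring check can be very small. Here is a minimal sketch in Python; the archive location, the age and size thresholds, and the alert address are made-up assumptions, not anything specific to your environment. It simply confirms that the newest backup archive exists, is recent, and isn't suspiciously small, and emails an alert otherwise:

```python
#!/usr/bin/env python3
"""Minimal backup sanity check, meant to run from cron after the backup job.
The paths, thresholds and addresses below are illustrative assumptions, not
part of any particular backup product."""

import glob
import os
import smtplib
import time
from email.message import EmailMessage

BACKUP_GLOB = "/backups/daily/*.tar.gz"   # assumed location of backup archives
MAX_AGE_HOURS = 26                        # a daily backup should be well under 26 h old
MIN_SIZE_BYTES = 50 * 1024 * 1024         # anything smaller than 50 MB is suspicious here
ALERT_TO = "sysadmin@example.com"         # hypothetical alert address
SMTP_HOST = "localhost"

def check_latest_backup():
    """Return a list of problems found with the newest backup archive."""
    problems = []
    candidates = glob.glob(BACKUP_GLOB)
    if not candidates:
        return ["no backup files match %s" % BACKUP_GLOB]

    newest = max(candidates, key=os.path.getmtime)
    age_hours = (time.time() - os.path.getmtime(newest)) / 3600
    size_bytes = os.path.getsize(newest)

    if age_hours > MAX_AGE_HOURS:
        problems.append("%s is %.1f hours old" % (newest, age_hours))
    if size_bytes < MIN_SIZE_BYTES:
        problems.append("%s is only %d bytes" % (newest, size_bytes))
    return problems

def send_alert(problems):
    """Email the list of problems to the admin."""
    msg = EmailMessage()
    msg["Subject"] = "BACKUP CHECK FAILED"
    msg["From"] = ALERT_TO
    msg["To"] = ALERT_TO
    msg.set_content("\n".join(problems))
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    issues = check_latest_backup()
    if issues:
        send_alert(issues)
```

Run it from cron shortly after the backup window; the point is to never treat silence from the backup job itself as success.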
Where I work we have a warm site. Once a month we randomly choose a system or database, go to the warm site, and perform a bare-metal test restoration to ensure we can actually recover our data.
Honestly, if your data is very important to you, it is in your best interest to invest in some software to manage your backups for you. There are hundreds of products out there for this, from the cheap and simple to the enterprise class.
If you are relying on a set of hand-written scripts running from crontab for your company's backups, sooner or later you will likely get burned.
We have 60%-size 'Reference' versions of our 'Production' systems that we use for final testing of changes. We restore 'Production' backups to these systems - it tests the backups and ensures both environments stay in step with each other.
One approach is to script a "recovery" job to run periodically, for instance one that grabs a specific text file from the most recent backup and emails you its contents. If it's possible, this should -- at least sometimes -- be done using a different box than the one that created or backed up the data, just to ensure it will work if you should need to do so. The advantage is that you can be sure your encryption/decryption, compression, and storage mechanisms are all working.
This is a little more involved for specialized backups such as email and database servers, but performing some kind of small-scale recovery from a small database or a brick-level mailbox backup and verifying the contents is certainly possible.
This approach also shouldn't replace a periodic full restore to ensure you can recover data in the event of an emergency -- it just allows you to be a little more confident about the integrity of your day-to-day backup job.
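As a rough sketch of that canary-file idea (assuming the backups are plain tar.gz archives containing a known "canary" text file; the paths and addresses below are placeholders), the job could look like this in Python. Ideally it runs on a box other than the backup server, as noted above:

```python
#!/usr/bin/env python3
"""Sketch of a scheduled "canary restore": pull one known text file out of
the newest backup archive and mail its contents. The archive location, the
canary path inside the archive and the addresses are assumptions."""

import glob
import os
import smtplib
import tarfile
from email.message import EmailMessage

BACKUP_GLOB = "/backups/daily/*.tar.gz"   # assumed backup archives
CANARY_MEMBER = "etc/backup-canary.txt"   # hypothetical file stored in every backup
MAIL_TO = "sysadmin@example.com"
SMTP_HOST = "localhost"

def extract_canary():
    """Read the canary file straight out of the newest archive."""
    newest = max(glob.glob(BACKUP_GLOB), key=os.path.getmtime)
    with tarfile.open(newest, "r:gz") as tar:
        handle = tar.extractfile(CANARY_MEMBER)  # raises KeyError if the member is absent
        if handle is None:
            raise RuntimeError("%s is not a regular file in %s" % (CANARY_MEMBER, newest))
        return newest, handle.read().decode("utf-8", errors="replace")

def mail_contents(archive, body):
    """Send the recovered contents so a human sees proof of a working restore path."""
    msg = EmailMessage()
    msg["Subject"] = "Canary restore from %s" % archive
    msg["From"] = MAIL_TO
    msg["To"] = MAIL_TO
    msg.set_content(body)
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    archive, contents = extract_canary()
    mail_contents(archive, contents)
```

If the archive or the canary file is missing, the script raises an exception instead of mailing anything, so a day without the expected email is itself a warning sign worth investigating.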
When performing a test restore, I don't really feel comfortable stopping at the point of "this looks nice, the files are restored, no file seems to be missing, even the sizes match", or at the point of "this looks nice, I started my application... it doesn't crash and displays some decent data".
I want to restore the server or cluster from scratch and then actually use it for production. Not for a minute, not for an hour, but permanently. If you claim that your restore was successful, then there is absolutely no reason not to run production on it. This is not some "dirty" system to be thrown away afterwards; it is the system you will be facing after a real disaster. So, if it passes the "looks nice" stage, live with it. Back it up the next night. Forget about the original one. You will probably discover some glitches with this approach, and you will be forced to fix all of them. The next restore of the same system then has a decent chance of being 100% successful.
This includes your backup software and server. Yes, you need to restore these too.
Have no budget to buy dedicated hardware for restores? This approach does not need any: the restored system takes over as production, which frees up the original hardware to be the target of the next restore.
You'll probably find that some backup types can easily be restore-tested by scripts (databases, for example) while others need some manual input (an Active Directory restore, say). Automate as much of this as you can, make sure some kind of reporting is in place, and make sure "someone" performs the manual tests at regular intervals as well. An isolated environment (a downscaled copy of production) will make restore testing easier.
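For the scripted database case, here is a minimal sketch assuming PostgreSQL: it restores the newest custom-format dump (pg_dump -Fc output) into a throwaway database on a scratch server and runs one sanity query. The dump location, database name, and table are placeholders, and other database engines would need their own restore commands:

```python
#!/usr/bin/env python3
"""Sketch of a scripted restore test for a database backup, assuming
PostgreSQL, custom-format dumps from pg_dump -Fc, and a scratch server on
which the script may freely create and drop databases. File, database and
table names below are placeholders."""

import glob
import os
import subprocess

DUMP_GLOB = "/backups/db/*.dump"                 # assumed pg_dump -Fc output files
SCRATCH_DB = "restore_test"                      # throwaway database name
SANITY_SQL = "SELECT count(*) FROM customers;"   # hypothetical table to sanity-check

def run(cmd):
    """Run a command, raise if it fails, and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

def restore_test():
    newest = max(glob.glob(DUMP_GLOB), key=os.path.getmtime)

    run(["dropdb", "--if-exists", SCRATCH_DB])   # start from a clean slate
    run(["createdb", SCRATCH_DB])
    run(["pg_restore", "--no-owner", "-d", SCRATCH_DB, newest])

    # The restore only "passes" if real data comes back out.
    rows = int(run(["psql", "-d", SCRATCH_DB, "-t", "-A", "-c", SANITY_SQL]).strip())
    if rows == 0:
        raise RuntimeError("restore of %s produced an empty sanity table" % newest)
    print("OK: %s restored, %d rows in sanity table" % (newest, rows))

if __name__ == "__main__":
    restore_test()
```

Scheduling this in the isolated environment mentioned above keeps the test away from production, and a non-zero exit code from the script is easy to hook into whatever reporting you already have.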
While we don't test the backups themselves, we do have a centralized backup checking and reporting component in the system we developed, BackupRadar.com. Feel free to check it out to see if it helps with that piece. It attaches a copy of the success/failure emails to the backup policy, and it can also attach screenshots if your backup software is capable of sending those.
Thanks, Patrick
Make sure backup activity is logged, then write something (in Perl, of course) that parses those logs looking for failures, distills them down, and sends the result to you as a daily email.
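Roughly the same idea as a short Python sketch (rather than Perl); the log location, failure keywords, and addresses are assumptions about your setup:

```python
#!/usr/bin/env python3
"""Daily backup-log digest: scan the backup logs for failure markers and
mail a one-page summary. Log location, keywords and addresses are
assumptions about the local setup."""

import glob
import smtplib
from email.message import EmailMessage

LOG_GLOB = "/var/log/backup/*.log"               # assumed backup log files
FAILURE_MARKERS = ("ERROR", "FAILED", "cannot")  # strings worth alerting on
MAIL_TO = "sysadmin@example.com"
SMTP_HOST = "localhost"

def collect_failures():
    """Return 'file: line' entries for every suspicious log line."""
    hits = []
    for path in sorted(glob.glob(LOG_GLOB)):
        with open(path, errors="replace") as log:
            for line in log:
                if any(marker in line for marker in FAILURE_MARKERS):
                    hits.append("%s: %s" % (path, line.rstrip()))
    return hits

def mail_report(hits):
    """Send the daily digest, even when everything looks clean."""
    msg = EmailMessage()
    msg["Subject"] = "Backup report: %d problem line(s)" % len(hits)
    msg["From"] = MAIL_TO
    msg["To"] = MAIL_TO
    msg.set_content("\n".join(hits) if hits else "All backup logs look clean.")
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    mail_report(collect_failures())
```

Scheduled daily from cron, this at least guarantees that failures in the backup logs reach someone's inbox the next morning instead of sitting unread on the backup server.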