I want to develop an automated process for checking that every machine on our domain is getting backed up. I'm wondering how other people do it.
We have a SAN (virtualized Windows and Linux servers and three SQL Servers) and a couple of NAS devices in our data center, a couple dozen physical DCs in the field (Win Server 2016), and a few hundred workstations (soon to be all Win 10). We keep Veeam snapshots for a month locally, then push them to AWS after that.
Recently we needed to restore an Excel file that was used to update a table on one of our SQL Servers. We failed. The share on the NAS where the file lived was not being backed up. When we created the backup process, the share was barely used and I'm sure we chose not to back it up on purpose. But as that share gradually came to hold more important data, we never revisited the decision.
Next we tried to restore the data from the SQL Server itself. That server was added within the last year and, while it was being backed up locally, we never added it to the job that pushes backups to AWS, so we only had one month of history.
We should have backed up that share from the beginning, important or not. And we should have been pushing the new SQL backups to AWS. My takeaway from all this is that there are too many places for human error in our process.
One idea we had was to get every machine from Active Directory, select a "random" file from each drive/share (excluding system files and executables), and see if we can find it in our backups. We could automate the selection process with PowerShell, something like the sketch below. I'm not sure about automating the check against our backups, but hopefully there's a way. Even if we had to check a few hundred files manually, it would be better than nothing.
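Here is roughly what I have in mind for the selection side. It is only a sketch: the `C$\Users` path, the extension filter, and the CSV output are assumptions, and feeding the list into the backup software's restore/verify tooling is still the open question.

```powershell
# Rough sketch: pull computer names from AD, then grab one random data
# file from each machine's C$\Users tree as a spot-check candidate.
# The path, extension filter, and output file are placeholders.
Import-Module ActiveDirectory

$excluded = '.exe', '.dll', '.sys', '.msi'

$samples = foreach ($computer in (Get-ADComputer -Filter *)) {
    $root = "\\$($computer.Name)\C`$\Users"
    if (-not (Test-Path $root)) { continue }

    Get-ChildItem -Path $root -Recurse -Depth 3 -File -ErrorAction SilentlyContinue |
        Where-Object { $excluded -notcontains $_.Extension } |
        Get-Random -Count 1 |
        Select-Object @{n='Computer';e={$computer.Name}}, FullName, Length, LastWriteTime
}

# The spot-check list; verifying each entry against the backups is the
# part I haven't figured out how to automate yet.
$samples | Export-Csv -Path .\backup-spot-check.csv -NoTypeInformation
```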
Is there a best practice for backup completeness? Is there something better than the humans-being-careful method?
Identifying process and procedure failures like you are doing is important. Constructive criticism enables improvement.
Choose a backup strategy for every storage volume, even if that strategy is "no backup". Communicate to users what is permanent and what is temporary. Backing up everything is not required if the durability of every storage location is known.
Also have a process for reviewing backup coverage as business processes change. Whenever you hear about important projects, ask the questions "Where did you save that?" and "If the file was gone, what problems would that cause?"
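One way to keep that review from depending on memory is to hold a small manifest of each share's declared strategy and diff it against what actually exists on the servers. A rough sketch, assuming a hand-maintained `backup-policy.csv` with `Server`, `Share`, and `Policy` columns (the file and its columns are made up, not anything your backup product provides):

```powershell
# Sketch: flag any non-admin share that has no declared backup policy.
# backup-policy.csv (Server,Share,Policy) is a hand-maintained manifest;
# Policy can legitimately be "none" as long as it is written down.
Import-Module ActiveDirectory

$policy  = Import-Csv -Path .\backup-policy.csv
$servers = Get-ADComputer -Filter 'OperatingSystem -like "*Server*"' -Properties OperatingSystem

foreach ($server in $servers) {
    $shares = Get-SmbShare -CimSession $server.Name -ErrorAction SilentlyContinue |
        Where-Object { -not $_.Special }    # skip C$, ADMIN$, IPC$

    foreach ($share in $shares) {
        $declared = $policy | Where-Object {
            $_.Server -eq $server.Name -and $_.Share -eq $share.Name
        }
        if (-not $declared) {
            Write-Warning "$($server.Name)\$($share.Name) has no declared backup policy"
        }
    }
}
```

Anything the script flags is exactly the conversation above: where did you save it, and what breaks if it disappears.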
Backups are useless. Restores are what you care about.
Make restores a mandatory part of testing and business continuity planning.
Verification will be somewhat manual, because you want to confirm the restores produce something humans can actually use. But if users actually work with the restored system, they will definitely discover their important spreadsheet is missing.
Feel free to add automated integrity checks like file checksum verification and DBMS verify procedures. But verifying that data is suitable for use is difficult. You may have a completely valid file, but it is a month old and the organization cannot use it. Or a volume was intentionally not backed up, but users put important stuff on it anyway.
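For the checksum part, one workable pattern is to mount a test restore and hash it against the live copy. This is only a sketch; both paths are placeholders, and the SQL comment assumes the SqlServer module's Invoke-Sqlcmd is available:

```powershell
# Sketch: hash a live share and a test-restored copy of it, and warn on
# anything missing or different. Both paths are placeholders.
param(
    [string]$LivePath    = '\\nas01\projects',
    [string]$RestorePath = 'D:\restore-test\projects'
)

$liveHashes = Get-ChildItem -Path $LivePath -Recurse -File |
    Get-FileHash -Algorithm SHA256

foreach ($live in $liveHashes) {
    $relative     = $live.Path.Substring($LivePath.Length)
    $restoredPath = Join-Path -Path $RestorePath -ChildPath $relative

    if (-not (Test-Path $restoredPath)) {
        Write-Warning "Missing from restore: $relative"
        continue
    }

    if ((Get-FileHash -Path $restoredPath -Algorithm SHA256).Hash -ne $live.Hash) {
        Write-Warning "Differs from live copy (changed since backup, or corrupt): $relative"
    }
}

# For the SQL Servers, the equivalent automated check after a test restore
# would be something like:
#   Invoke-Sqlcmd -ServerInstance 'sql-restore-test' -Query 'DBCC CHECKDB (MyDb) WITH NO_INFOMSGS'
```

A clean hash comparison still does not tell you the data is useful, which is why the test restores above still need humans looking at the results.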