Please note: The answers and comments to this question contain content from another, similar question that received a lot of attention from outside media but turned out to be a hoax question in some kind of viral marketing scheme. As we don't allow ServerFault to be abused in such a way, the original question has been deleted and the answers merged into this question.
Here's an entertaining tragedy. This morning I was doing a bit of maintenance on my production server, when I mistakenly executed the following command:
sudo rm -rf --no-preserve-root /mnt/hetznerbackup /
I didn't spot the last space before the /, and a few seconds later, when warnings were flooding my command line, I realised that I had just hit the self-destruct button. Here's a bit of what burned into my eyes:
rm: cannot remove `/mnt/hetznerbackup': Is a directory
rm: cannot remove `/sys/fs/ecryptfs/version': Operation not permitted
rm: cannot remove `/sys/fs/ext4/md2/inode_readahead_blks': Operation not permitted
rm: cannot remove `/sys/fs/ext4/md2/mb_max_to_scan': Operation not permitted
rm: cannot remove `/sys/fs/ext4/md2/delayed_allocation_blocks': Operation not permitted
rm: cannot remove `/sys/fs/ext4/md2/max_writeback_mb_bump': Operation not permitted
rm: cannot remove `/sys/fs/ext4/md2/mb_stream_req': Operation not permitted
rm: cannot remove `/sys/fs/ext4/md2/mb_min_to_scan': Operation not permitted
rm: cannot remove `/sys/fs/ext4/md2/mb_stats': Operation not permitted
rm: cannot remove `/sys/fs/ext4/md2/trigger_fs_error': Operation not permitted
rm: cannot remove `/sys/fs/ext4/md2/session_write_kbytes': Operation not permitted
rm: cannot remove `/sys/fs/ext4/md2/lifetime_write_kbytes': Operation not permitted
# and so on..
I stopped the task and was relieved when I discovered that the production service was still running. Sadly, the server no longer accepts my public key or password for any user via SSH.
How would you move forward from here? I'll swim an ocean of barbed wire to get that SSH-access back.
The server is running Ubuntu 12.04 and is hosted at Hetzner.
Fact is, at this point there's no simple or easy automatic fix for this. Data recovery is a science, and even the basic, common tools need someone to sit down and ensure the data is there. If you're expecting to recover from this without massive amounts of downtime, you're going to be disappointed.
I'd suggest using testdisk or some filesystem-specific recovery tool. Try one system, see if it works, and so on. There's no real way to automate the process, but you can probably do it carefully in batches.
That said, there are a few very scary things in the question and comments that ought to be part of your after-action report.
Firstly, you ran the command everywhere without checking it first. Run a command on one box, then a few, then more. Basically, if something goes wrong, it's better to have it affect a few systems rather than all of them.
Secondly, the way your backups were set up scares me. File-level, one-way backups are a solved problem: rsync can preserve permissions and copy files one way to a backup site. Accidentally delete something? Reinstall (preferably automatically), rsync back, and things work. In future you might use filesystem-level snapshots with btrfs or zfs and ship those for system-level backups. I'd also toy with separating application servers, databases and storage, and introducing the principle of least privilege, so you'd split up the risk of something like this.
After something has happened is the worst time to consider this.
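For illustration only (the host name and paths below are made up, not from the question), a one-way, pull-style rsync backup can be as simple as a job run on the backup host, so the production box never holds credentials to touch its own backups:

# Run on the backup host: pull from production, never push from it.
# Host name and destination path are placeholders.
rsync -aAX \
  --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/run \
  --exclude=/tmp --exclude=/mnt \
  root@production.example.com:/ /srv/backups/production/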
What can we learn from this?
Never run a command everywhere at once. Separate out test and production machines, and preferably do production machines in stages. It's better to have to fix 1 or 10 machines rather than 100 or 1000.
Double- and triple-check commands. There's no shame in asking a co-worker to double-check: "Hey, I'm about to dd a drive, could you sanity-check this so I don't end up wiping the wrong one?" A wrapper might help as well (a sketch follows), but nothing beats a less tired set of eyes.
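As for what such a wrapper could look like, here's a minimal sketch; the script name and wording are my own, and it only covers interactive use:

#!/bin/sh
# careful-rm: hypothetical wrapper that shows the exact rm invocation
# and demands an explicit "yes" before handing off to the real rm.
printf 'About to run: rm %s\nType yes to continue: ' "$*"
read -r answer
if [ "$answer" = "yes" ]; then
    exec /bin/rm "$@"
fi
echo "Aborted."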
What can you do now? Get an email out to customers. Let them know there's downtime and a catastrophic failure. Talk to your higher-ups, legal, sales and so on, and see how you can mitigate the damage. Start planning for recovery: at best you're going to have to hire extra hands, and at worst plan on spending a lot of money on recovery. At this stage you're going to be working on mitigating the fallout as well as on the technical fixes.
Boot into the rescue system provided by Hetzner and check what damage you have done.
Transfer any files out to a safe location and redeploy the server afterwards.
I'm afraid that is the best solution in your case.
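A rough sketch of what that might look like from the rescue system; the device name is only a guess based on the md2 paths in the error output, so verify with lsblk or mdadm before mounting anything:

# Inside the Hetzner rescue system (device names are assumptions -- check first)
lsblk
mkdir /mnt/oldroot
mount /dev/md2 /mnt/oldroot
# copy whatever survived to a machine you trust (destination is a placeholder)
rsync -aAX /mnt/oldroot/ user@safe-host.example.com:/srv/salvage/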
When you delete stuff with rm -rf --no-preserve-root, it's nigh impossible to recover. It's very likely you've lost all the important files. As @faker said in his answer, the best course of action is to transfer the files to a safe location and redeploy the server afterwards.
To avoid similar situations in future, I'd suggest you:
Take backups weekly, or at least fortnightly. This would help you get the affected service back up with the least possible MTTR.
Don't work as root when it's not needed, and always think twice before doing anything. I'd suggest you also install safe-rm (a sample blacklist sketch follows this list).
Don't type options that you don't intend to invoke, such as --no-preserve-root or --permission-to-kill-kittens-explicitly-granted, for that matter.
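As far as I recall, safe-rm works off a blacklist of protected paths; a system-wide config along these lines is a reasonable sketch, but treat the exact file location as an assumption and check the safe-rm documentation:

# /etc/safe-rm.conf (location as I remember it -- verify on your system)
# One protected path per line; safe-rm refuses to delete anything listed here.
/
/etc
/usr
/var
/mnt/hetznerbackup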
I've had the same issue, though only while testing with a hard drive, and I lost everything. I don't know if it'll be useful, but don't install anything and don't overwrite your data; you need to mount your hard drives and launch some forensics tools such as Autopsy, PhotoRec or TestDisk.
I strongly recommend TestDisk; with some basic commands you can recover your data if you didn't overwrite it.
The best way to fix a problem like this is to not have it in the first place.
Do not manually enter an "rm -rf" command that has a slash in the argument list. (Putting such commands in a shell script with really good validation/sanity routines to protect you from doing something stupid is different.)
Just don't do it.
Ever. If you think you need to do it, you aren't thinking hard enough.
Instead, change your working directory to the parent of the directory from which you intend to start the removal, so that the target of the rm command does not require a slash:
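For example (the paths here are made up for illustration):

# Risky: with a slash in the argument, one stray space can split it into
#   rm -rf /home/someuser/foo /      <- which also targets the root directory
# Safer: change to the parent directory and name the target with no slash
cd /home/someuser
rm -rf foo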
I would try to recover the backup machine, where all copies were stored:

Make an image of the whole disk with the dd command.
Try to recover files from that image with testdisk.

So let's say you want to recover 1 TB: you will need an extra 2 TB, 1 TB for the backup image (1st step) plus 1 TB for the recovery (2nd step).
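A sketch of those two steps; the device name and mount points are assumptions, not your actual layout:

# 1st step: image the damaged 1 TB disk onto spare storage (device is a placeholder)
dd if=/dev/sdX of=/mnt/spare/backup.img bs=4M conv=noerror,sync
# 2nd step: point testdisk at the image and recover into the remaining free space
testdisk /mnt/spare/backup.img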
I made a similar mistake with an aliased rm -fr [phone rang] and a cd into a precious directory. Now I always think twice and re-check a couple of times before I use an rm or dd command.
As mentioned in another answer, Hetzner has a rescue system. It includes both a netboot option with SSH access and a Java applet that gives you screen and keyboard on your vserver.
If you want to recover as much as possible, reboot the server into the netboot system, then log in and download an image of the filesystem by reading from the appropriate device node.
I think something like this should work:
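The command itself appears to have been lost in editing; from the description below it was presumably something along these lines (the host name is a placeholder):

# Run from your own machine; the rescue system reads the disk and the shell
# redirects the stream into a local file called server.img
ssh root@your-server.example.com 'dd if=/dev/sda bs=4M' > server.img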
Of course the redirection is done by the shell before the ssh command is invoked, so server.img is a local file. If you want just the root file system and not the full disk, replace sda with sda3, assuming you are using the same image as me.

I would swear off using rm for the rest of my life, and I think it's madness that trash-cli isn't the default removal command on *nix systems. https://github.com/andreafrancia/trash-cli
I would make sure it is the first thing I install on a brand-new system, and I would alias rm to something that tells people to use trash-cli instead. It would also include a note about another alias that actually runs /bin/rm but tells them to avoid using it in most cases.

:( True story
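A sketch of what those aliases might look like; the alias names and wording are my own invention, while trash-put is the command trash-cli itself provides:

# Hypothetical ~/.bashrc snippet
alias rm='echo "rm is disabled here; use trash-put, or realrm if you really mean it." #'
alias realrm='/bin/rm'

The trailing # in the first alias swallows whatever arguments were typed after rm, so nothing gets deleted by accident.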
What I would advise in such a case is to unmount the filesystem and use debugfs: with the help of lsdel you can list all recently removed files which have not yet been cleaned out of the journal, and then dump the needed files. A quick search turns up a howto: http://www.linuxvoodoo.com/resources/howtos/debugfs

Hope it will help someone. ;)
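A minimal sketch of that workflow; the device name is an assumption, and the filesystem must be unmounted (or approached from the rescue system) first:

debugfs /dev/md2
# inside the debugfs prompt:
#   lsdel                          list inodes of recently deleted files
#   dump <123456> /tmp/recovered   dump one of those inodes into a local file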
And yes, one of the suggestions is to make a script which moves the real rm to real.rm and symlinks mv to rm. ;)