After Amazon's Aug 8 outage, many users found that their EBS-backed AMIs had stopped working. This is due to corruption of some sectors in the snapshots that the AMIs are based on.
However, Amazon created recovery snapshots where the disk problems should be fixed. Those are named along the lines of "Recovery snapshot for vol-xxxxxxxx".
I created a new AMI from the recovery snapshot, which worked fine, but instances launched from this new AMI do not work: their state is "running", but I cannot SSH into the machine or access any of the web services that should be running there. It boils down to this (from the System Log, accessible through the AWS Management Console):
EXT3-fs: sda1: couldn't mount because of unsupported optional features (240).
EXT2-fs: sda1: couldn't mount because of unsupported optional features (244).
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,1)
I've mounted a volume created from that recovery snapshot on another server on AWS, and everything looks quite normal. For example, fsck says:
$ sudo fsck -a /dev/xvdg
fsck from util-linux-ng 2.17.2
uec-rootfs: clean, 53781/524288 files, 546065/2097152 blocks
In one of the AWS forum discussions, I found this advice from someone with similar problems:
A workaround would be to make a volume from the snapshot and attach it to a running instance, use fsck --force to force a check of the filesystem, and once it comes back clean, make a snapshot and use it for the AMI.
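In concrete terms, that workaround might look something like the sketch below. This is only a rough outline using the classic ec2-api-tools with placeholder IDs, device names and availability zone; adjust everything to your own setup:

# 1. Make a volume from the recovery snapshot and attach it to a running helper instance
$ ec2-create-volume --snapshot snap-xxxxxxxx --availability-zone eu-west-1a
$ ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdg

# 2. On that instance, force a filesystem check of the (unmounted) volume
#    (on Ubuntu the long --force option is not accepted; -f does the same, see below)
$ sudo fsck -f /dev/xvdg

# 3. Detach the volume, snapshot it, and register a new AMI from the snapshot
$ ec2-detach-volume vol-xxxxxxxx
$ ec2-create-snapshot vol-xxxxxxxx
$ ec2-register --snapshot snap-yyyyyyyy --name "recovered-root" --architecture i386 --root-device-name /dev/sda1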
But I don't know how to force fsck on Ubuntu (11.04):
$ sudo fsck --force /dev/xvdg
fsck from util-linux-ng 2.17.2
fsck.ext3: invalid option -- 'o'
Does anyone know how to force a file system check on the volume on Ubuntu? Any other ideas on how to launch working instances based on the recovery snapshot?
Right now it looks like it might be quicker to just start over from a clean Ubuntu AMI and set up all our services again. :-( But of course I would prefer not to do that if there's any way to get the recovery snapshot to actually work.
I ran into the same problem when trying to duplicate a machine.
The problem turned out to be the kernel: both when creating the AMI and when launching the instance, I had selected the default kernel image.
To resolve the problem, I recreated the AMI using the same kernel image (AKI) as the original instance.
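For reference, with the classic ec2-api-tools that could look roughly like this (placeholder IDs; the important part is passing the original instance's kernel via --kernel instead of accepting the default):

# Look up which kernel (aki-...) the original instance or AMI uses
$ ec2-describe-instances i-xxxxxxxx
$ ec2-describe-images ami-xxxxxxxx

# Register the new AMI from the recovery snapshot with that same kernel
$ ec2-register --snapshot snap-xxxxxxxx --name "recovered-original-kernel" --architecture i386 --root-device-name /dev/sda1 --kernel aki-xxxxxxxx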
Could you try the following command (note the -f option instead of --force):
sudo fsck -f /dev/xvdg
Hope this helps. Fred
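For what it's worth, if the forced check does find errors, fsck can also be told to answer the repair prompts automatically. This is just a generic fsck usage note, assuming the volume is attached as /dev/xvdg and is not mounted:

# -f forces the check even if the filesystem is marked clean,
# -y answers "yes" to any repair prompts
$ sudo fsck -f -y /dev/xvdg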
I didn't want to waste more time fighting with weird AWS-specific problems, so I created a new clean instance from one of the official Ubuntu AMIs (in my case ami-359ea941, which is a 32-bit EBS-backed image of Ubuntu 11.04 in the eu-west-1 region), and re-created my server setup there.
The fact that I could mount a volume created from the recovery snapshot in the new instance made the re-setup much faster, though. For example, I did something like
cp -a /mnt/recovery/usr/local /usr
to restore a whole lot of stuff under /usr/local.
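Mounting the recovery volume was straightforward; roughly like this (the device name and mount point are just what I used, assuming the volume created from the recovery snapshot is attached as /dev/xvdg):

# Mount the recovery volume read-only; that is enough for copying data out
$ sudo mkdir -p /mnt/recovery
$ sudo mount -o ro /dev/xvdg /mnt/recovery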
So in my case the recovery snapshots were far from useless, as I could access the data on them. But of course it would still have been nicer to just create an AMI from the snapshot and continue using (instances from) that as if the whole incident had never happened. (Feel free to add an answer if you know how to achieve that!)