I have a backup procedure for ec2 instances with lvm spanned volumes that does the following:
1) ssh to the box as root using an ssh forced command to dmsetup suspend the spanned volume. 2) take an ebs snapshot of the volumes 3) ssh to the box as root using an ssh forced command to dmsetup resume the spanned volume.
This has been working fine for a while, but last night something went wrong. It appears that the volume was suspended and never came back to the active state. I could ssh into the instance, but very few commands worked (ls did, top did, ps did not). I could run dmsetup info to see that it was suspended, but attempts to run dmsetup resume did nothing. I eventually rebooted, and it forced a disk check on that volume, which will take a very long time. I've restored from a previous snapshot instead.
What might have gone wrong here, and are there any steps I can take to prevent this from happening in the future?
What is the point in suspending before snapshotting?
I read "force" three times in your procedure.
If you have to "force" something you should know exactly what you are doing. If not, try to avoid the "force".
So what warnings do you get, if you do not force? That might give you some clues to what went wrong.