This afternoon, someone at our office pulled the plug on our server because it was storming outside. They didn't shut it down; they just pulled the plug while it was running.
The server has 4 SATA drives in a software RAID 10 configuration, with LVM running on top of the RAID. The server is running CentOS 6.2 Minimal and is a virtual machine host using KVM. At the time it was unplugged, there were many guest machines running on it. Each guest has one or more LVM logical volumes that it uses directly as hard drives. The guest filesystems are EXT3, EXT4, and NTFS. The host OS is on an EXT4 partition.
Later, when the power came back, that person plugged it back in and it started up. Since they plugged it in without attaching a monitor first, there was no way to see what came up on the screen. I tried attaching a monitor now, but it won't display anything unless it's connected at boot. I've left the machine on, exactly as is, until I can get some advice, since I don't want to screw anything up (further).
I can get into the host via SSH. I have not rebooted it yet in case there is something in a log somewhere that might be useful.
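In case it matters, this is roughly what I was planning to look at over SSH first, just reading logs without changing anything (the grep patterns are only my guesses at which keywords matter):

dmesg | grep -iE 'raid|md[0-9]|ext4|error'                  # kernel messages since boot
grep -iE 'raid|md[0-9]|ext4' /var/log/messages | tail -50   # syslog on CentOS 6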
What I need to do is check all the disks and partitions for data integrity, if that's even possible. I think RAID 10 uses some kind of memory-based cache, and I'm worried about the drives being inconsistent, or files being corrupt, if there were things in the queue to be written to the drives that hadn't been written yet.
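From what I've read, hdparm -W with no value should at least report whether each drive's own write cache is enabled, without changing anything (hdparm may not be present on a minimal install):

hdparm -W /dev/sda    # reports write-caching = 0 (off) or 1 (on)

(and the same for /dev/sdb, /dev/sdc, /dev/sdd). Here is the current state of the arrays: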
[root@othello ~]# cat /proc/mdstat
Personalities : [raid10] [raid1]
md2 : active raid1 sdc1[2] sda1[0] sdd1[3] sdb1[1]
102388 blocks super 1.0 [4/4] [UUUU]
md0 : active raid10 sda3[0] sdc3[2] sdd3[3] sdb3[1]
1952289792 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/15 pages [0KB], 65536KB chunk
md1 : active raid10 sdc2[2] sda2[0] sdd2[3] sdb2[1]
1022976 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
unused devices: <none>
It also bothers me that it's calling my arrays "near-copies". Is that normal?
What kind of disk checks should I run to make sure everything is OK with the drives and data? Are there any other things I should check?
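For example, would a SMART health check along these lines be a sensible first step? (smartmontools probably isn't in a minimal install, so it would need to be installed first; /dev/sda is just the first of the four drives.)

yum install smartmontools
smartctl -H /dev/sda    # overall drive health self-assessment
smartctl -a /dev/sda    # full SMART attributes and drive error log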
UPDATE
Output of mdadm --detail
[root@othello ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 1.1
Creation Time : Sat Feb 25 09:26:20 2012
Raid Level : raid10
Array Size : 1952289792 (1861.85 GiB 1999.14 GB)
Used Dev Size : 976144896 (930.92 GiB 999.57 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Mar 11 12:59:30 2012
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Name : othello.myserver.com:0 (local to host othello.myserver.com)
UUID : 58ba40ab:12516733:e3779362:68200fdd
Events : 2208
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 19 1 active sync /dev/sdb3
2 8 35 2 active sync /dev/sdc3
3 8 51 3 active sync /dev/sdd3
The RAID is fine; [UUUU] means all four disks in the array are up. I wouldn't even worry about that for now.
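If you want extra reassurance anyway, md can scrub an array online; here's a sketch (writing "check" to sync_action starts a read-and-compare pass over the mirrors, no reboot needed):

echo check > /sys/block/md0/md/sync_action    # kick off an online consistency check
cat /proc/mdstat                              # shows check progress
cat /sys/block/md0/md/mismatch_cnt            # non-zero means the copies disagreed

This can run while the array is in use, though it will slow I/O down while it's scanning.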
As for the VMs, if you want to run fscks on them, stop the VMs and run

fsck.ext3 (ext4, etc.) /path/to/lvm

(usually something like /dev/vg-name/lv-name). If you are using KVM, you should be able to use virsh to do anything you need to the VMs. Here is a link to the virsh man page: http://linux.die.net/man/1/virsh

If you really want to run disk checks on your RAID arrays, you'll have to reboot into single-user mode or boot from a live CD so you can fsck the individual /dev/mdX devices. Since the primary filesystem is EXT4, I wouldn't bother; it handles power outages much better than EXT3 does.
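Putting that together, here's a sketch for checking one ext4 guest. The guest name web1 and the volume path /dev/vg_othello/web1-root are made up; substitute your own names from virsh list and lvs:

virsh list --all                          # see which guests exist and which are running
virsh shutdown web1                       # cleanly stop the guest first
lvs                                       # list logical volumes to find the guest's disk
fsck.ext4 -f /dev/vg_othello/web1-root    # -f forces a full check even if marked clean
virsh start web1                          # bring the guest back up

Run the fsck only while the guest is stopped, otherwise you'd be checking a filesystem that's still being written to.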
Try mdadm --detail /dev/md0 (same for md1 and md2).
Then try the advice given here: http://linas.org/linux/raid.html