My problem seemed to start during a manual update, although I don't know if that was coincidental or causal. About 20% of the way through downloading the update, it just stopped. I left that window alone for about 20 minutes, but it wasn't progressing, and although it said it was downloading there was no network activity.
I `xkill`ed the window and rebooted. During shutdown, and then again at boot, there were a bunch of these entries: `ata1.00: status: { DRDY ERR }`. When it finally booted into KDE, my graphics resolution had reverted to 768p instead of 1440p and only one monitor came on.
I tried to purge my nvidia driver and reinstall, but I got an error (which I forget) suggesting I run `dpkg ...` (I forget the `...`), which I did. After another reboot, my graphics were still in the same boat and I was still getting the above errors, so I googled them and saw that they indicated a problem with my hard drive and weren't directly related to my video driver problems.
I ran `mdadm -D /dev/md0`, and sure enough it came back saying one of my drives is in the `removed` state, not failed though. The overall state is `clean, degraded`. At this point I decided to stop screwing around with the video drivers. My question, though, is: why would the video settings/driver go out at the same time the SSD (apparently) did, if mdadm says the array is clean? Am I misinterpreting what `clean` means? Besides making a fresh backup of critical data, should I do anything before replacing the faulty SSD?
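For what it's worth, here's how I've been checking the state of things (the device name `/dev/sda` is an assumption from my setup, and `smartctl` comes from the smartmontools package):

```shell
# Per-member state of the array: look for "removed" or "faulty" entries
sudo mdadm --detail /dev/md0

# One-line view of all arrays; a degraded two-disk RAID1 shows [U_]
# (or [_U]) in place of the healthy [UU]
cat /proc/mdstat

# SMART health summary of the suspect drive
sudo smartctl -H /dev/sda
```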
As to replacing the drive, here's what I think I need to do:

1. Manually fail the drive: `mdadm /dev/md0 -f /dev/sda2`
2. Remove the drive from the array: `mdadm /dev/md0 -r /dev/sda2`
3. Power down and replace the faulty drive with a new one.
4. After rebooting, partition the new drive with gparted, then run `mdadm --manage /dev/md0 --add /dev/sda2`

At this point mdadm would automatically rebuild the array? Am I missing anything?
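Put together, my plan looks roughly like this (the device names `/dev/sda2` and `/dev/sdb` are assumptions from my setup, with `sdb` being the surviving drive):

```shell
# Mark the member as failed, then pull it out of the array
sudo mdadm /dev/md0 --fail /dev/sda2
sudo mdadm /dev/md0 --remove /dev/sda2

# ...power down, swap in the new drive, boot back up...

# As an alternative to gparted, clone the partition table from the
# surviving drive so the new partition matches the old one exactly
sudo sfdisk -d /dev/sdb | sudo sfdisk /dev/sda

# Add the new partition back; mdadm starts the resync automatically
sudo mdadm --manage /dev/md0 --add /dev/sda2

# Watch the rebuild progress
watch cat /proc/mdstat
```

Cloning the table with sfdisk is just one option; either way, the new partition needs to be at least as large as the one it replaces or the `--add` will be refused.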
For reference, here is my latest dmesg.
I did the above, and there were no big surprises. The only thing is that since the drive was already in `removed` status, it couldn't be manually failed or removed. It would seem that manually failing a drive is just for testing drives that are still synced.

My graphics resolution (and second screen) went back to normal, so I wonder if that is a feature meant to alert people that they need to do something. Were it not for that, I would have droned on with a bad drive.