I have a server from Red Barn that has been throwing errors when accessing its local SATA flash drive. Here are some facts:
- I'm running Ubuntu 15.10. Two other similar Supermicro servers run the same release with no trouble.
- The root drive is a SATA flash drive.
- If I reboot the system, it seems fine for at least a day and then gets stuck throwing these errors constantly.
- We tried reseating all the RAM and running memtest86 for days, with no problems.
- We booted the system from a USB stick with the root drive not attached, and watched it run for days with no problems.
- We booted from USB, mounted the drive in question, and had a script touch a file once every 5 seconds. This ran for several days with no errors.
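For anyone who wants to reproduce the touch test, here is a minimal sketch of that kind of loop in Python. It is not the exact script we ran: the mount point `/mnt/suspect` and the function names `touch_once` and `run` are hypothetical, and the `fsync` call is an addition so that each write is pushed to the drive rather than sitting in the page cache.

```python
import os
import time


def touch_once(path):
    """Create or update the file and fsync it; return True on success."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.write(fd, b"x")  # write one byte so there is data to flush
            os.fsync(fd)        # force the write out to the device
        finally:
            os.close(fd)
        os.utime(path, None)    # update timestamps, like `touch`
        return True
    except OSError as exc:
        # I/O errors from a failing drive or controller surface here
        print(f"{time.ctime()}: write failed: {exc}")
        return False


def run(path, interval=5.0, iterations=None):
    """Touch the file every `interval` seconds; return the failure count.

    With iterations=None the loop runs until interrupted.
    """
    failures = 0
    count = 0
    while iterations is None or count < iterations:
        if not touch_once(path):
            failures += 1
        count += 1
        time.sleep(interval)
    return failures


if __name__ == "__main__":
    # Hypothetical mount point for the drive under test
    run("/mnt/suspect/touch_test")
```

Note that a loop like this only exercises small writes to a single file, so it puts far less load on the drive than a running root filesystem does.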
We thought about OS corruption, but why would it wait so long before manifesting? If the drive is failing, why does SMART report nothing, and why does it seem to work for a long time before going weird?
What else can we do to investigate this failure? We're kinda stuck.
Here's a screenshot from the remote console. I see this if I try to log in, and then it returns me to the login prompt.