I have a Windows 10 workstation used within my business for things like image processing (Photoshop) and software development (Eclipse). It's an i7-2600K based computer with a Gigabyte GA-B75M-D3H B75 motherboard and 16 GB RAM. The OS is on a Samsung 850 Pro SSD; there's another 850 Pro for data, a WD Black for data, plus two 4TB HGST drives, each on SATA 3 ports, formatted ReFS, in a Storage Spaces mirror. The array has 1.63TB used, 1.99TB free.
Recently the ReFS drives in the storage spaces mirror have started dropping - so far three times in a month. This usually occurs under moderate to heavy load, after an extended period. None of the other disks drop under load as far as I can tell, so I assume it's ReFS, Storage Spaces, or a problem with an underlying disk. A reboot brings the disk online.
I can see errors in the event viewer such as those below. These are not all in one place, and while there are NTFS and Storage Spaces log areas under "application and services log -> microsoft -> windows" there doesn't seem to be one for ReFS.
I'd appreciate help tracking down what's causing these problems, and resolving them, so my system stays up.
16:27.05 (under event viewer -> application and services log -> microsoft -> windows -> storagespaces-driver-operational):
Virtual disk {26bf58b3-1cb9-4b93-a945-1b89331bb565} requires a data integrity scan.
Data on the disk is out-of-sync and a data integrity scan is required. To start the scan, run the following command:
Get-ScheduledTask -TaskName "Data Integrity Scan for Crash Recovery" | Start-ScheduledTask
Once you have resolved the condition listed above, you can online the disk by using the following commands in PowerShell:
Get-VirtualDisk | ?{ $_.ObjectId -Match "{26bf58b3-1cb9-4b93-a945-1b89331bb565}" } | Get-Disk | Set-Disk -IsReadOnly $false
Get-VirtualDisk | ?{ $_.ObjectId -Match "{26bf58b3-1cb9-4b93-a945-1b89331bb565}" } | Get-Disk | Set-Disk -IsOffline $false
16:27.05 (windows system event log): The file system was unable to write metadata to the media backing volume R:. A write failed with status "A device which does not exist was specified." ReFS will take the volume offline. It may be mounted again automatically.
16:27.06 (windows system event log): The file system detected a checksum error and was not able to correct it. The name of the file or folder is "<unable to determine file name>".
18:35.50 (windows system event log): Failed to connect to the driver: (-2147024894) The system cannot find the file specified.
18:35.50 (Kernel PNP) The driver \Driver\WudfRd failed to load for the device SWD\WPDBUSENUM\_??_USBSTOR#Disk&Ven_Generic&Prod_STORAGE_DEVICE&Rev_9451#7&2a9fd895&0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}.
18:35.58: Virtual disk {26bf58b3-1cb9-4b93-a945-1b89331bb565} could not be repaired because there is not enough free space in the storage pool.
Replace any failed or disconnected physical disks. The virtual disk will then be repaired automatically or you can repair it by running this command in PowerShell:
Get-VirtualDisk | ?{ $_.ObjectId -Match "{26bf58b3-1cb9-4b93-a945-1b89331bb565}" } | Repair-VirtualDisk
UPDATE: as yagmoth points out, this error includes something about USB. The scenarios where I recall this error happening are a) when backing up to an external USB disk, and b) when running CrashPlan backups to another internal SATA disk.
Storage Spaces seems very sensitive to write latency: if latency spikes too much, the volume can be dropped.
This seems to be a known problem when using consumer SSDs, as you can find here
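If you want to see whether write latency on the affected volume really does spike under load, a rough probe like the one below might help. This is only a sketch, not anything Storage Spaces itself provides: the function names `probe_write_latency` and `summarize` are made up for illustration, the block size and sample count are arbitrary, and it only times small synced writes from user space, so it says nothing about what threshold (if any) causes the volume to drop.

```python
import os
import statistics
import tempfile
import time


def probe_write_latency(directory, samples=50, block_size=4096):
    """Time small synced writes in `directory`; return latencies in milliseconds."""
    payload = os.urandom(block_size)
    latencies_ms = []
    # Create the probe file inside the target directory so writes hit that volume.
    fd, path = tempfile.mkstemp(dir=directory, prefix="latency_probe_")
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush this write to the device
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies_ms


def summarize(latencies_ms):
    """Reduce the samples to a median and a worst case."""
    return {
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


if __name__ == "__main__":
    # Assumption: point this at a folder on the mirrored volume, e.g. "R:/".
    target = tempfile.gettempdir()
    print(summarize(probe_write_latency(target)))
```

Running it once while the machine is idle and again during a backup, and comparing `max_ms` between the two runs, would at least show whether the drops line up with latency spikes.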
First, you really should check the HCL. I'd bet a pretty nice dinner that nothing you mentioned is on the Storage Spaces HCL. Like vSAN, Windows and Storage Spaces have completely different HCLs. I can tell without even looking up your drives that none of them are going to be on the HCL, because none of them are enterprise grade drives.
If you want a stable solution, get an LSI SAS card (non-RAID), get enterprise SATA HDDs and get an Intel DC series SSD. Is it expensive? Yep. Will it be reliable? As much as Windows can be when it comes to storage (which isn't great).
Me, I dumped storage spaces and went back to an LSI RAID card. Went from data corruption every week to rock solid storage for over two years on the same hardware. And I had ALL enterprise grade kit that was on the HCL.
You can find the Hardware Compatibility List (HCL) here https://www.windowsservercatalog.com/results.aspx?&chtext=&cstext=&csttext=&chbtext=&bCatID=1642&cpID=0&avc=10&ava=0&avq=0&OR=1&PGS=25&ready=0