We built some servers running Windows Server 2016 out of some "spare" parts with the following specs:
- Supermicro X10DRH-iT
- Dual E5-2620 v3
- 256GB Registered ECC DDR4 RAM
- 1x Adaptec 71685 RAID controller
- 8x Intel DC S3500 80GB SSDs
- 4x Intel DC S3500 240GB SSDs
- 4x 300GB 15k SAS HDDs
- 2x OCZ RevoDrive 350 480GB
- 1x OCZ RevoDrive 350 960GB
We use these servers as high-performance Oracle DB testing environments.
The problem is that after some time (seemingly at random, and not under heavy load or anything), the OCZ RevoDrive 350 drives start to act up, flooding the Windows event logs with the warnings
ocz10xx Adapter \Device\RaidPort2 received srs interrupt.
and
Request failed on \Device\0000004a, physical disk 2.
and ultimately corrupting parts of the Oracle tablespace files.
In this state, the Toshiba SSD Utility reports only the first 480GB drive as OK; the other two are missing.
These "warnings" won't cease until we completely power off the servers, unplug the power cords, wait some time, and power them up again. The tablespace files are still corrupt afterwards, so we have to recreate them and flash back (or reimport) the database.
All drivers and firmware etc. are up to date.
We tried just about every imaginable combination of BIOS settings for power, interrupts, timings, PCIe, etc., and moved the cards to different slots, but to no avail.
Has anyone got a clue what we could try? Preferably something other than dumping the hardware!