Yesterday one of the disks in a RAID 1 mirror attached to an Adaptec 5405 died and was replaced with a new one, but after about 2 hours of rebuilding the array went offline. The hosting provider's staff updated the controller firmware and forced the array back online. The system booted normally, and arcconf getconfig 1
showed the following output:
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Optimal
Channel description : SAS/SATA
Controller Model : Adaptec 5405
Controller Serial Number : 1D091194BC6
Physical Slot : 18
Temperature : 85 C/ 185 F (Normal)
Installed memory : 256 MB
Copyback : Disabled
Background consistency check : Disabled
Automatic Failover : Enabled
Global task priority : High
Performance Mode : Default/Dynamic
Stayawake period : Disabled
Spinup limit internal drives : 0
Spinup limit external drives : 0
Defunct disk drive count : 0
Logical devices/Failed/Degraded : 1/0/1
SSDs assigned to MaxCache pool : 0
Maximum SSDs allowed in MaxCache pool : 8
MaxCache Read Cache Pool Size : 0.000 GB
MaxCache flush and fetch rate : 0
MaxCache Read, Write Balance Factor : 3,1
NCQ status : Enabled
Statistics data collection mode : Enabled
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS : 5.2-0 (18948)
Firmware : 5.2-0 (18948)
Driver : 1.1-7 (28000)
Boot Flash : 5.2-0 (18948)
--------------------------------------------------------
Controller Battery Information
--------------------------------------------------------
Status : Not Installed
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
Logical device name :
RAID level : 1
Status of logical device : Degraded
Size : 953334 MB
Read-cache mode : Enabled
MaxCache preferred read cache setting : Enabled
Write-cache mode : Disabled (write-through)
Write-cache setting : Disabled (write-through)
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : Yes
Failed stripes : No
Power settings : Disabled
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0 : Present (Controller:1,Connector:0,Device:1) 9VP3AGB1
Segment 1 : Rebuilding (Controller:1,Connector:0,Device:0) Z1D49LKS
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Rebuilding
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,0(0:0)
Reported Location : Connector 0, Device 0
Vendor :
Model : ST1000DM003-9YN1
Firmware : CC4H
Serial number : Z1D49LKS
Size : 953869 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full rpm,Powered off,Reduced rpm
SSD : No
MaxCache Capable : No
MaxCache Assigned : No
NCQ status : Enabled
Device #1
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,1(1:0)
Reported Location : Connector 0, Device 1
Vendor :
Model : ST31000528AS
Firmware : CC38
Serial number : 9VP3AGB1
Size : 953869 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full rpm,Powered off
SSD : No
MaxCache Capable : No
MaxCache Assigned : No
NCQ status : Enabled
and aacraid-status reported:
-- Controller informations --
-- ID | Model | Status
c0 | Adaptec 5405 | Optimal
-- Arrays informations --
-- ID | Type | Size | Status | Task | Progress
c0u0 | RAID1 | 953G | Degraded | Rebuild | 44%
-- Disks informations
-- ID | Model | Status
There is at least one disk/array in a NOT OPTIMAL state.
smartctl for drive #1 showed that it is in a pre-failure state:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 114 099 006 Pre-fail Always - 69962026
3 Spin_Up_Time 0x0003 095 094 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 93
5 Reallocated_Sector_Ct 0x0033 002 002 036 Pre-fail Always FAILING_NOW 4015
7 Seek_Error_Rate 0x000f 083 060 030 Pre-fail Always - 222073391
9 Power_On_Hours 0x0032 073 073 000 Old_age Always - 24485
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 72
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 006 006 000 Old_age Always - 94
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 1507556589923
189 High_Fly_Writes 0x003a 095 095 000 Old_age Always - 5
190 Airflow_Temperature_Cel 0x0022 059 051 045 Old_age Always - 41 (Min/Max 40/41)
194 Temperature_Celsius 0x0022 041 049 000 Old_age Always - 41 (0 19 0 0)
195 Hardware_ECC_Recovered 0x001a 045 028 000 Old_age Always - 69962026
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 202082506249469
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 1020514404
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 590957029
and the array failed to rebuild again and went offline.
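The WHEN_FAILED column follows a simple rule: smartctl prints FAILING_NOW when an attribute's current normalized VALUE is less than or equal to its THRESH. That is what happened to Reallocated_Sector_Ct here (002 against a threshold of 036). Below is a minimal Python sketch of that check, assuming the plain-text column layout shown above. parse_smart_attrs and failing_now are hypothetical helpers, not part of smartmontools, and treating a zero threshold as "never fails" is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class SmartAttr:
    attr_id: int
    name: str
    value: int   # current normalized value
    worst: int   # worst normalized value seen so far
    thresh: int  # vendor failure threshold
    raw: str     # RAW_VALUE column(s), kept as text


def parse_smart_attrs(text):
    """Parse the attribute rows of a smartctl vendor-attribute table."""
    attrs = []
    for line in text.splitlines():
        fields = line.split()
        # Rows look like: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        # The header row ("ID# ...") is skipped because its first field is not numeric.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs.append(SmartAttr(
                attr_id=int(fields[0]),
                name=fields[1],
                value=int(fields[3]),
                worst=int(fields[4]),
                thresh=int(fields[5]),
                raw=" ".join(fields[9:]),
            ))
    return attrs


def failing_now(attrs):
    """Names of attributes whose current value is at or below a non-zero threshold."""
    return [a.name for a in attrs if a.thresh > 0 and a.value <= a.thresh]


# A few rows copied from the table above.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 114 099 006 Pre-fail Always - 69962026
5 Reallocated_Sector_Ct 0x0033 002 002 036 Pre-fail Always FAILING_NOW 4015
187 Reported_Uncorrect 0x0032 006 006 000 Old_age Always - 94
190 Airflow_Temperature_Cel 0x0022 059 051 045 Old_age Always - 41 (Min/Max 40/41)
"""

print(failing_now(parse_smart_attrs(SAMPLE)))  # → ['Reallocated_Sector_Ct']
```

Reported_Uncorrect is not flagged even though its value is low, because its threshold is 000.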
What is the best way to proceed with getting this system back online?
According to SMART, the drive may be failing. Try the rebuild again, and if the reallocated sector count grows, the drive is definitely bad.
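The test suggested above comes down to comparing the raw Reallocated_Sector_Ct value from two smartctl snapshots, one taken before and one after another rebuild attempt. Here is a minimal Python sketch, assuming both snapshots are plain-text attribute tables like the one in the question. raw_value and reallocations_grew are hypothetical helpers, and the "after" row is invented for illustration.

```python
def raw_value(table, name):
    """Integer RAW_VALUE of attribute `name` in a smartctl attribute table, or None."""
    for line in table.splitlines():
        fields = line.split()
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            return int(fields[9])
    return None


def reallocations_grew(before, after, name="Reallocated_Sector_Ct"):
    """True if the raw reallocated-sector count increased between two snapshots."""
    old, new = raw_value(before, name), raw_value(after, name)
    if old is None or new is None:
        raise ValueError(f"{name} missing from one of the snapshots")
    return new > old


# BEFORE is the row from the question; AFTER is a made-up row for illustration only.
BEFORE = "5 Reallocated_Sector_Ct 0x0033 002 002 036 Pre-fail Always FAILING_NOW 4015"
AFTER = "5 Reallocated_Sector_Ct 0x0033 001 001 036 Pre-fail Always FAILING_NOW 4096"

print(reallocations_grew(BEFORE, AFTER))  # → True
```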
The ST1000DM003 is not a supported drive (see the compatibility report). Also, in my experience, these drives have some firmware/compatibility problems.
In general, Adaptec 5 Series controllers are very problematic in terms of compatibility. In some cases the workaround is to connect the drives directly, without a backplane. In others, the drives stop failing once they are switched to 1.5 Gbps (via drive jumpers).
Use drives from the compatibility list, and don't forget to upgrade the drive firmware.
P.S. You have write cache enabled on the drives but disabled on the controller.
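The P.S. compares two different fields in the arcconf output. The first is the logical device's Write-cache mode, which is the controller cache. The second is each physical drive's own Write Cache. Below is a minimal Python sketch of that comparison, assuming the "Key : Value" text layout shown in the question. The helper names are hypothetical, and this is not an arcconf feature.

```python
import re


def write_cache_modes(arcconf_text):
    """Return (logical-device cache values, physical-drive cache values)."""
    logical, drives = [], []
    for line in arcconf_text.splitlines():
        # arcconf prints fields as 'Key : Value', often with padding before the colon.
        m = re.match(r"^\s*(.+?)\s+:\s*(.*)$", line)
        if not m:
            continue
        key, value = m.group(1).strip(), m.group(2).strip()
        if key == "Write-cache mode":   # controller cache for the logical device
            logical.append(value)
        elif key == "Write Cache":      # the drive's own onboard cache
            drives.append(value)
    return logical, drives


def drive_cache_without_controller_cache(arcconf_text):
    """True if any drive is write-back while no logical device is."""
    logical, drives = write_cache_modes(arcconf_text)
    controller_wb = any("write-back" in v for v in logical)
    drive_wb = any("write-back" in v for v in drives)
    return drive_wb and not controller_wb


# Lines copied from the arcconf getconfig output in the question.
SAMPLE = """\
Write-cache mode : Disabled (write-through)
Write-cache setting : Disabled (write-through)
Write Cache : Enabled (write-back)
Write Cache : Enabled (write-back)
"""

print(drive_cache_without_controller_cache(SAMPLE))  # → True
```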