My laptop SSD is acting up and the number of errors soared since the last time I posted.
Is this drive dead / dying?
It's on now and I'm writing this on it. I have all my data backed up, but I am still unsure whether it's usable.
Contacting the manufacturer didn't help much: they asked me to install Windows and run the disk check utility from there or connect it as an external drive to a Windows host and test it there.
I did both and no errors were encountered.
I also checked it with the utility they provide (see screenshot below). I then used the image I made with clonezilla to return to Ubuntu, and I found that the SATA PHY error count is nearing 300 errors!
I've also checked the connectors, but since the SSD is in a laptop I cannot change the cable (easily).
These are the test results generated by the manufacturer's utility:
And the smartctl output on Ubuntu, later:
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.14.0-041400-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: SPCC Solid State Disk
Serial Number: XXXXXXXXXX
Firmware Version: S9FM02.8
User Capacity: 120,034,123,776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Feb 18 02:22:56 2018 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 30) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 2) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       6352
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       2717
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
170 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       25
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       105447539
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       77
194 Temperature_Celsius     0x0023   070   070   000    Pre-fail  Always       -       30
196 Reallocated_Event_Count 0x0000   100   100   000    Old_age   Offline      -       0
218 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       15431
241 Total_LBAs_Written      0x0012   100   100   000    Old_age   Always       -       6281157
SMART Error Log Version: 1
ATA Error Count: 298 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 298 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 01 00 00 00
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ff d5 01 01 00 00 00 ff 00:11:08.077 [VENDOR SPECIFIC]
ca 00 80 b0 8f 12 e1 00 00:11:08.076 WRITE DMA
ca 00 80 30 8f 12 e1 00 00:11:08.076 WRITE DMA
ca 00 80 b0 8e 12 e1 00 00:11:08.075 WRITE DMA
ca 00 80 30 8e 12 e1 00 00:11:08.074 WRITE DMA
Error 297 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 01 00 00 00
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ff d5 01 01 00 00 00 ff 00:11:08.039 [VENDOR SPECIFIC]
ca 00 80 b0 7c 12 e1 00 00:11:08.038 WRITE DMA
ca 00 80 30 7c 12 e1 00 00:11:08.038 WRITE DMA
ca 00 80 b0 7b 12 e1 00 00:11:08.037 WRITE DMA
ca 00 80 30 7b 12 e1 00 00:11:08.037 WRITE DMA
Error 296 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 01 00 00 00
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ff d5 01 01 00 00 00 ff 00:11:07.974 [VENDOR SPECIFIC]
ca 00 80 b0 48 12 e1 00 00:11:07.973 WRITE DMA
ca 00 80 30 48 12 e1 00 00:11:07.972 WRITE DMA
ca 00 80 b0 47 12 e1 00 00:11:07.972 WRITE DMA
ca 00 80 30 47 12 e1 00 00:11:07.972 WRITE DMA
Error 295 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 01 00 00 00
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ff d5 01 01 00 00 00 ff 00:11:07.927 [VENDOR SPECIFIC]
ca 00 80 b0 2a 12 e1 00 00:11:07.926 WRITE DMA
ca 00 80 30 2a 12 e1 00 00:11:07.925 WRITE DMA
ca 00 80 b0 29 12 e1 00 00:11:07.925 WRITE DMA
ca 00 80 30 29 12 e1 00 00:11:07.924 WRITE DMA
Error 294 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 01 00 00 00
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ff d5 01 01 00 00 00 ff 00:11:07.899 [VENDOR SPECIFIC]
ca 00 80 b0 22 12 e1 00 00:11:07.898 WRITE DMA
ca 00 80 30 22 12 e1 00 00:11:07.897 WRITE DMA
ca 00 80 b0 21 12 e1 00 00:11:07.897 WRITE DMA
ca 00 80 30 21 12 e1 00 00:11:07.896 WRITE DMA
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 6288 -
# 2 Conveyance offline Completed without error 00% 6285 -
# 3 Short offline Completed without error 00% 6285 -
# 4 Extended offline Completed without error 00% 6283 -
# 5 Extended offline Completed without error 00% 6283 -
# 6 Short offline Completed without error 00% 6283 -
# 7 Extended offline Completed without error 00% 6262 -
# 8 Conveyance offline Completed without error 00% 6262 -
# 9 Conveyance offline Completed without error 00% 6262 -
#10 Extended offline Completed without error 00% 6262 -
#11 Short offline Completed without error 00% 6262 -
#12 Conveyance offline Completed without error 00% 6211 -
#13 Extended offline Completed without error 00% 6211 -
#14 Short offline Completed without error 00% 6211 -
#15 Short offline Completed without error 00% 6075 -
#16 Conveyance offline Completed without error 00% 5564 -
#17 Extended offline Completed without error 00% 5564 -
#18 Short offline Completed without error 00% 5564 -
#19 Conveyance offline Completed without error 00% 5319 -
#20 Short offline Completed without error 00% 5319 -
#21 Conveyance offline Completed without error 00% 4403 -
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Replace your SSD
People have tried a lot of things in the comments, but this SSD seems to have some issues.
Judging by the S.M.A.R.T. readouts, your drive has not seen a lot of action (~250 power-on days, ~6 TB written), and you say it is about 2 years old. This should be well within the warranty!
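For anyone who wants to redo that arithmetic from the raw SMART values, here is a rough sketch. The two raw numbers are copied from the smartctl output in the question. The unit for attribute 241 is an assumption: the drive is not in the smartctl database, so `ASSUMED_BYTES_PER_UNIT` (1 MB per count) is a hypothetical value picked only because it lands near the "~6 TB" estimate.

```python
# Raw values copied from the smartctl attribute table in the question.
POWER_ON_HOURS = 6352           # attribute 9
TOTAL_LBAS_WRITTEN_RAW = 6281157  # attribute 241

# ASSUMPTION: the vendor counts attribute 241 in units of 10**6 bytes.
# The real unit is vendor specific and is not documented in the thread.
ASSUMED_BYTES_PER_UNIT = 10**6


def power_on_days(hours: int) -> float:
    """Convert power-on hours to days."""
    return hours / 24


def terabytes_written(raw: int, bytes_per_unit: int) -> float:
    """Convert a raw write counter to decimal terabytes."""
    return raw * bytes_per_unit / 10**12


days = power_on_days(POWER_ON_HOURS)
tb = terabytes_written(TOTAL_LBAS_WRITTEN_RAW, ASSUMED_BYTES_PER_UNIT)
print(f"{days:.0f} power-on days, {tb:.1f} TB written")
```

With a different unit the written total changes by orders of magnitude, so treat the terabyte figure as an estimate only.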
My advice is:
Your "Slim S70" disk should be covered under Silicon Power's 5-year warranty. Just send them an RMA request here.
Some time before May 11, 2017 you updated your SSD firmware. However, a new version was released in September 2017, and you should apply it using Windows.
Run fstrim to discard unused blocks in the file system.
In my case the results for the Windows 10 partitions /mnt/c and /mnt/e were out of this world, so I checked the files and no harm was done to the data.
Run fsck -f on your SSD after booting from a live USB, when the partition is not mounted. Another option is running fsck -f from grub - How to fsck hard drive while hard drive is unmounted, using bootable USB stick?.
As mentioned in the comments, a bad SATA cable can cause errors. But as this answer points out, a loose connection can also cause errors. To rule out a bad/loose connection, remove the plugs from your SSD, blow compressed air over them and the male pins on the drive, and firmly reseat the cables.
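Every entry in the error log in the question shows the same register values, ER = 84 and ST = 51. Below is a small sketch that decodes them using the bit names commonly documented for ATA Ultra DMA commands. That mapping is an assumption here: some bit meanings depend on the command, and the log lists the failing command only as [VENDOR SPECIFIC], so read the result as a hint rather than a diagnosis.

```python
# Bit names as commonly documented for ATA Ultra DMA commands.
# ASSUMPTION: these names apply to the logged errors; some bits are
# command dependent or obsolete in newer ATA revisions.
ERROR_BITS = {7: "ICRC", 6: "UNC", 5: "MC", 4: "IDNF",
              3: "MCR", 2: "ABRT", 1: "NM", 0: "obsolete"}
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC",
               3: "DRQ", 2: "CORR", 1: "IDX", 0: "ERR"}


def decode(value: int, names: dict) -> list:
    """Return the names of the bits set in value, highest bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


# ER and ST as shown for errors 294 to 298 in the question.
print("ER 0x84:", decode(0x84, ERROR_BITS))
print("ST 0x51:", decode(0x51, STATUS_BITS))
```

Under that assumption, ICRC stands for an interface CRC error and ABRT for an aborted command. That would be consistent with a cable or connector problem rather than a flash problem, although it does not prove it.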
How much is your time worth?
The last question is how much your time is worth. Assuming you've spent 10 hours on this problem, it works out to $5 / hour (that is, roughly $50 in total), because many brand-new 120GB SATA III SSDs can be purchased from ebay.com for about that.
Feb 23/2018 update
I read all the other answers tonight. One answer says to return it. But if you do and they find nothing wrong they'll simply send it back and you'll be without a drive for 2 weeks to 2 months.
Another answer says smartctl reports there is nothing wrong with the drive.
In this answer I suggested running fsck -f and you responded that no errors were reported.
Run fsck every boot
As a compromise between the negative answer (return it) and the positive answer (nothing is wrong), my inclination would be to run fsck on every boot. If an error is discovered, the boot is paused and you can read the error message. To summarize the link use:
Note: replace X with your drive letter, i.e. a, b, etc. If there are no errors after a month, change the value from 1 to 30, which is typical for most systems I believe. On a typical SSD the fsck will run quickly.
Clean and re-seat SATA cables
Others mentioned replacing the SATA cable which is problematic for a laptop. As a compromise consider unplugging all cables on the drive side, using compressed air on male and female ends and then plugging the cables back in firmly.
There is nothing wrong with your drive. All tests pass. You are simply misinterpreting the SMART data.
Firstly, the first screenshot contains raw data and you cannot draw any conclusions from it. I have no idea what use its creator thinks that data would be to anybody, but it doesn't really mean anything, unless the meaningful columns can be reached by scrolling right in the window or something.
Let me explain the columns in the SMART report (the latter report you posted).
To address some specific areas of the report:
This reflects that everything passed. None of the metrics measured has ever entered a failure state.
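To check that claim against the attribute table yourself, something like the following sketch works. It assumes the usual smartctl convention that an attribute counts as failing when its normalized VALUE is at or below a non-zero THRESH. The sample rows are copied from the report in the question; `parse_row` and `is_failing` are illustrative helper names, not part of smartmontools.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    id: int
    name: str
    value: int        # normalized value, higher is better
    worst: int        # lowest normalized value seen so far
    thresh: int       # failure threshold for the normalized value
    kind: str         # Pre-fail or Old_age
    when_failed: str  # "-" if the attribute has never failed
    raw: str          # vendor-specific raw value


def parse_row(line: str) -> Attribute:
    """Split one attribute row into its columns.

    Column order: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    f = line.split()
    return Attribute(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]),
                     f[6], f[8], " ".join(f[9:]))


def is_failing(attr: Attribute) -> bool:
    # ASSUMPTION: a threshold of 0 means "no threshold", so it never trips.
    return attr.thresh > 0 and attr.value <= attr.thresh


ROWS = """\
1 Raw_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0
170 Unknown_Attribute 0x0013 100 100 010 Pre-fail Always - 25
194 Temperature_Celsius 0x0023 070 070 000 Pre-fail Always - 30
"""

attrs = [parse_row(line) for line in ROWS.splitlines()]
for a in attrs:
    state = "FAILING" if is_failing(a) else "ok"
    print(f"{a.id:>3} {a.name:<24} value={a.value} thresh={a.thresh} {state}")
```

In this report only attribute 170 has a non-zero threshold (10), and its normalized value of 100 is far above it.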
The log of "errors" is relatively typical for a drive. These do not necessarily indicate unrecoverable errors or even problems with the drive itself; their reports are vague, so you can't tell what actually happened from this except that it was during DMA transfer at the controller, but if anything was important it would be reflected in the overall health report. In particular, these ones could be something fairly innocent like writes that were cancelled at the controller end, or the OS requesting some feature during load that the drive doesn't support, which may be entirely normal when probing device capabilities.
Finally, a note about CRC errors or error rates: all drives have an error rate. Drives store data at such high densities that a certain number of bit errors is expected and designed for, by using error-correcting codes. The code guarantees that up to a certain number of bit errors per chunk of bits can occur and still be fully corrected. The drive applies this correction constantly, and the code is designed so that the chance of an unrecoverable error occurring randomly is very low (as in, significantly less likely than winning the lottery) in a well-functioning drive. If you see an error rate in any stats and it's treated like no big deal, that's because it isn't one: those are just corrected errors.
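As a toy illustration of that idea, here is a Hamming(7,4) code, which stores 4 data bits in 7 bits and can correct any single flipped bit. This is only a sketch of the principle. Real SSD controllers use far stronger codes over much larger chunks, and nothing here reflects what this particular drive does.

```python
def encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    # Codeword positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # 1-based index of the flipped bit, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


original = [1, 0, 1, 1]
stored = encode(original)
stored[4] ^= 1  # simulate a single bit error in position 5
recovered, position = decode(stored)
print(recovered == original, "corrected position:", position)
```

Two flipped bits in the same codeword would defeat this toy code, which is why real drives budget for many more correctable errors per chunk.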
You have only WRITE DMA errors, and the short and long self-tests show no errors.
Since DMA is about Direct Memory Access, try to find out whether the BIOS has a separate hardware diagnostics test, and run the memory-related tests.
If no BIOS-embedded test is available, check the manufacturer's support site for offline hardware diagnostics (e.g. a bootable ISO file to burn to a CD or USB stick).
(BTW: an Ubuntu CD also has memory diagnostics.)
Because a DMA write is I/O, I would try replacing the SATA cable and check whether any new errors are added after that (the last one here is 298, but more may have been added by now).
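One way to track whether the count keeps growing is to save the output of `smartctl -l error` before and after a change and compare the "ATA Error Count" lines. A minimal sketch, assuming the output contains that line in the same form as in the question. The `after` text is a hypothetical sample, not real output from this drive, and `ata_error_count` / `new_errors` are illustrative names.

```python
import re
from typing import Optional


def ata_error_count(smartctl_text: str) -> Optional[int]:
    """Return the number from the 'ATA Error Count:' line, or None if absent."""
    match = re.search(r"^\s*ATA Error Count:\s*(\d+)", smartctl_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def new_errors(before_text: str, after_text: str) -> Optional[int]:
    """Difference between two readings, or None if either reading is missing."""
    before_count = ata_error_count(before_text)
    after_count = ata_error_count(after_text)
    if before_count is None or after_count is None:
        return None
    return after_count - before_count


# First reading: the line as it appears in the question.
before = (
    "SMART Error Log Version: 1\n"
    "ATA Error Count: 298 (device log contains only the most recent five errors)\n"
)
# Second reading: HYPOTHETICAL sample, for illustration only.
after = (
    "SMART Error Log Version: 1\n"
    "ATA Error Count: 298 (device log contains only the most recent five errors)\n"
)

print("new errors since last check:", new_errors(before, after))
```

If the difference stays at zero over a few weeks of normal use after reseating or replacing the cable, that points towards the connection rather than the drive.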