Question

Muaad ElSharif

Asked: 2018-02-10 10:09:48 +0800 CST

Issues with SSD: rising CRC errors, freezing, sometimes read-only

772

My laptop SSD is acting up and the number of errors soared since the last time I posted.

Is this drive dead / dying?
It's on now and I'm writing this on it - I have all my data backed up and all, but I am still unsure if it's usable or not?

Contacting the manufacturer didn't help much: they asked me to install Windows and run the disk check utility from there or connect it as an external drive to a Windows host and test it there.
I did both and no errors were encountered.

I also checked it with the utility they provide (see screenshot below). I then used the image I made with Clonezilla to return to Ubuntu, and I found that the SATA PHY error count is nearing 300!

I've also checked the connectors, but since the SSD is in a laptop I cannot change the cable (easily).

These are the test results generated by the manufacturer's utility

And the smartctl output on Ubuntu, later:

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.14.0-041400-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SPCC Solid State Disk
Serial Number:    XXXXXXXXXX
Firmware Version: S9FM02.8
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Feb 18 02:22:56 2018 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (   30) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   1) minutes.
Extended self-test routine
recommended polling time:    (   2) minutes.
Conveyance self-test routine
recommended polling time:    (   2) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       6352
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       2717
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
170 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       25
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       105447539
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       77
194 Temperature_Celsius     0x0023   070   070   000    Pre-fail  Always       -       30
196 Reallocated_Event_Count 0x0000   100   100   000    Old_age   Offline      -       0
218 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       15431
241 Total_LBAs_Written      0x0012   100   100   000    Old_age   Always       -       6281157

SMART Error Log Version: 1
ATA Error Count: 298 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 298 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 01 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ff d5 01 01 00 00 00 ff      00:11:08.077  [VENDOR SPECIFIC]
  ca 00 80 b0 8f 12 e1 00      00:11:08.076  WRITE DMA
  ca 00 80 30 8f 12 e1 00      00:11:08.076  WRITE DMA
  ca 00 80 b0 8e 12 e1 00      00:11:08.075  WRITE DMA
  ca 00 80 30 8e 12 e1 00      00:11:08.074  WRITE DMA

Error 297 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 01 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ff d5 01 01 00 00 00 ff      00:11:08.039  [VENDOR SPECIFIC]
  ca 00 80 b0 7c 12 e1 00      00:11:08.038  WRITE DMA
  ca 00 80 30 7c 12 e1 00      00:11:08.038  WRITE DMA
  ca 00 80 b0 7b 12 e1 00      00:11:08.037  WRITE DMA
  ca 00 80 30 7b 12 e1 00      00:11:08.037  WRITE DMA

Error 296 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 01 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ff d5 01 01 00 00 00 ff      00:11:07.974  [VENDOR SPECIFIC]
  ca 00 80 b0 48 12 e1 00      00:11:07.973  WRITE DMA
  ca 00 80 30 48 12 e1 00      00:11:07.972  WRITE DMA
  ca 00 80 b0 47 12 e1 00      00:11:07.972  WRITE DMA
  ca 00 80 30 47 12 e1 00      00:11:07.972  WRITE DMA

Error 295 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 01 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ff d5 01 01 00 00 00 ff      00:11:07.927  [VENDOR SPECIFIC]
  ca 00 80 b0 2a 12 e1 00      00:11:07.926  WRITE DMA
  ca 00 80 30 2a 12 e1 00      00:11:07.925  WRITE DMA
  ca 00 80 b0 29 12 e1 00      00:11:07.925  WRITE DMA
  ca 00 80 30 29 12 e1 00      00:11:07.924  WRITE DMA

Error 294 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 01 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ff d5 01 01 00 00 00 ff      00:11:07.899  [VENDOR SPECIFIC]
  ca 00 80 b0 22 12 e1 00      00:11:07.898  WRITE DMA
  ca 00 80 30 22 12 e1 00      00:11:07.897  WRITE DMA
  ca 00 80 b0 21 12 e1 00      00:11:07.897  WRITE DMA
  ca 00 80 30 21 12 e1 00      00:11:07.896  WRITE DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      6288         -
# 2  Conveyance offline  Completed without error       00%      6285         -
# 3  Short offline       Completed without error       00%      6285         -
# 4  Extended offline    Completed without error       00%      6283         -
# 5  Extended offline    Completed without error       00%      6283         -
# 6  Short offline       Completed without error       00%      6283         -
# 7  Extended offline    Completed without error       00%      6262         -
# 8  Conveyance offline  Completed without error       00%      6262         -
# 9  Conveyance offline  Completed without error       00%      6262         -
#10  Extended offline    Completed without error       00%      6262         -
#11  Short offline       Completed without error       00%      6262         -
#12  Conveyance offline  Completed without error       00%      6211         -
#13  Extended offline    Completed without error       00%      6211         -
#14  Short offline       Completed without error       00%      6211         -
#15  Short offline       Completed without error       00%      6075         -
#16  Conveyance offline  Completed without error       00%      5564         -
#17  Extended offline    Completed without error       00%      5564         -
#18  Short offline       Completed without error       00%      5564         -
#19  Conveyance offline  Completed without error       00%      5319         -
#20  Short offline       Completed without error       00%      5319         -
#21  Conveyance offline  Completed without error       00%      4403         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

4 Answers

Robert Riedl · Answer 1 · 2018-02-20T06:19:23+08:00

Replace your SSD

People have tried a lot of things in the comments, but this SSD seems to have some issues.

Judging by the S.M.A.R.T. readouts, your drive has not seen much use (~250 power-on days, ~6 TB written), and you say it is about 2 years old. It should still be well within warranty!
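
As a quick check of the power-on figure, the Power_On_Hours raw value from the report (6352) can be converted to days. This is only a sketch; the helper name `hours_to_days` is made up for illustration.

```shell
# Convert a SMART Power_On_Hours raw value to whole days (integer division).
hours_to_days() {
    echo $(( $1 / 24 ))
}

hours_to_days 6352   # prints 264
```

That is in the same ballpark as the rough estimate above.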

My advice is:

back up all your data immediately (though you say you have that covered already)
remove / replace the SSD (depending on your budget, of course)
send the disk to the manufacturer for replacement

Your "Slim S70" disk should be covered under Silicon Power's 5-year warranty.

Just send them an RMA request.

6

WinEunuuchs2Unix · Answer 2 · 2018-02-18T14:02:17+08:00

Some time before May 11, 2017, you updated your SSD firmware. However, a new version was released in September 2017, and you should apply it using Windows.

Run fstrim to discard unused blocks in the file system:

$ sudo fstrim --verbose --all
/mnt/c: 16 EiB (18446744073709551615 bytes) trimmed
/mnt/e: 16 EiB (18446744073709551615 bytes) trimmed
/: 23.4 GiB (25132920832 bytes) trimmed

In my case the results for the Windows 10 partitions /mnt/c and /mnt/e were implausibly large (16 EiB each), so I checked the files; no harm was done to the data.

Run fsck -f on your SSD after booting from a live USB, so that the partition is not mounted. Another option is running fsck -f from GRUB; see "How to fsck hard drive while hard drive is unmounted, using bootable USB stick?".
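
A minimal sketch of guarding the check so it only runs on an unmounted partition. The function names are hypothetical, and `/dev/sda1` in the usage comment is an assumed partition name. The lookup only matches the exact device path listed in the mount table, so a partition mounted under another name (for example a `/dev/mapper` path) would not be detected.

```shell
# Return 0 when the device does not appear in the mount table.
# The optional second argument is the mount table to read (default: /proc/mounts).
is_unmounted() {
    dev=$1
    table=${2:-/proc/mounts}
    ! awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Force a filesystem check only when the partition is not mounted.
safe_fsck() {
    if is_unmounted "$1"; then
        sudo fsck -f "$1"
    else
        echo "$1 is mounted; boot a live USB or unmount it first" >&2
        return 1
    fi
}

# Usage (assumed partition name):
#   safe_fsck /dev/sda1
```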

As mentioned in the comments, a bad SATA cable can cause errors. But as this answer points out, a loose connection can too. To rule out a bad or loose connection, unplug the connectors from your SSD, blow compressed air over them and over the male pins on the drive, and firmly reseat the cables.

How much is your time worth?

The last question is how much your time is worth. Assuming you've spent 10 hours on this problem, it works out to $5 / hour, because many brand-new 120 GB SATA III SSDs can be purchased from ebay.com

Feb 23/2018 update

I read all the other answers tonight. One answer says to return it. But if you do and they find nothing wrong they'll simply send it back and you'll be without a drive for 2 weeks to 2 months.

Another answer says smartctl reports there is nothing wrong with the drive.

In this answer I suggested running fsck -f and you responded that no errors were reported.

Run `fsck` every boot

As a compromise between the negative answer (return it) and the positive answer (nothing is wrong), my inclination would be to run fsck on every boot. If an error is discovered, the boot is paused and you can read the error message. To summarize the linked instructions, use:

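sudo tune2fs -c 1 /dev/sdXN

Note: replace X with your drive letter (a, b, etc.) and N with the partition number, since tune2fs operates on the partition holding the filesystem rather than on the whole disk.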

If there are no errors after a month, change the value from 1 to 30, which I believe is typical for most systems. On a typical SSD, fsck runs quickly.
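
To confirm the setting took effect, the current limit can be read back from `tune2fs -l`, which lists the superblock fields of an ext2/ext3/ext4 filesystem. The sketch below pulls out the "Maximum mount count" line; the helper name is hypothetical and the sample input is illustrative, not taken from the drive in the question.

```shell
# Extract the "Maximum mount count" field from `tune2fs -l` output on stdin.
max_mount_count() {
    awk -F: '/^Maximum mount count/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Illustrative input only.
printf '%s\n' 'Mount count:              12' 'Maximum mount count:      1' | max_mount_count
```

On a real system the input would come from something like `sudo tune2fs -l /dev/sda1 | max_mount_count`, where `/dev/sda1` is an assumed partition name.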

Clean and re-seat SATA cables

Others mentioned replacing the SATA cable which is problematic for a laptop. As a compromise consider unplugging all cables on the drive side, using compressed air on male and female ends and then plugging the cables back in firmly.

thomasrutter · Answer 3 · 2018-02-23T05:58:32+08:00

There is nothing wrong with your drive. All tests pass. You are simply misinterpreting the SMART data.

Firstly, the first screenshot contains raw data and you cannot draw any conclusions about it. I have no idea what use its creator thinks that data would be to anybody, but it doesn't really mean anything. Unless the meaningful columns can be reached by scrolling right in the window or something.

Let me explain the columns in the SMART report (the latter report you posted).

Attribute name: name of the metric
Value: current value, higher is better. Values are often out of 100 where 100 = best, but can use any scale as long as higher is better. Even if the metric is something like "error rate", it's normalised so higher values mean lower error rates.
Worst: worst observed value, higher is better.
Thresh: if value drops below this, it's a fail condition. At or above = pass.
Type: what a fail condition would mean for this metric.
- Old_age: this metric is indicative of age/usage of the drive, not a specific problem.
- Pre-fail: this metric is indicative of a potential problem with the drive, increasing chance of drive failure.
When_failed: When this entered failure mode, if ever
Raw_value: internal measurement of the drive that contributed to the value - this is not useful for end user and lower or higher values do not necessarily indicate better or worse.
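
The Value/Thresh rule above can be sketched as a small filter over the attribute table. This assumes the column layout shown in the question's report (VALUE in the 4th column, THRESH in the 6th); the function name is hypothetical, and the sample rows are copied from that report.

```shell
# Print the ID and name of every SMART attribute whose normalised VALUE
# has dropped below its THRESH. Reads `smartctl -A` style text on stdin.
smart_failing_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && ($4 + 0) < ($6 + 0) { print $1, $2 }'
}

# Three rows copied from the report in the question.
sample='  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       0
170 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       25
194 Temperature_Celsius     0x0023   070   070   000    Pre-fail  Always       -       30'

# Prints nothing: every VALUE is at or above its THRESH.
printf '%s\n' "$sample" | smart_failing_attrs
```

Against a live drive the input would come from `sudo smartctl -A /dev/sda`, with `/dev/sda` being an assumed device name.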

To address some specific areas of the report:

SMART overall-health self-assessment test result: PASSED

This means everything passed. None of the metrics measured has ever entered a failure state.

The log of "errors" is relatively typical for a drive. These do not necessarily indicate unrecoverable errors or even problems with the drive itself; their reports are vague, so you can't tell what actually happened from this except that it was during DMA transfer at the controller, but if anything was important it would be reflected in the overall health report. In particular, these ones could be something fairly innocent like writes that were cancelled at the controller end, or the OS requesting some feature during load that the drive doesn't support, which may be entirely normal when probing device capabilities.

Finally, a note about CRC errors or error rates: all drives have an error rate. Drives store data at such high densities that a certain number of bit errors is expected and designed for, using error correction codes. The error correction code guarantees that up to a certain number of bit errors per chunk of bits can be fully corrected. The drive applies it constantly, and it is designed so that the chance of an unrecoverable error occurring randomly is very low (as in, significantly less likely than winning the lottery) in a well-functioning drive. If you see an error rate in any stats and it's treated as no big deal, that's because it isn't: those are just corrected errors.

jringoot · Answer 4 · 2018-02-23T05:38:13+08:00

You have only WRITE DMA errors, and both the short and the extended self-tests show no errors.

Since DMA stands for Direct Memory Access, check whether the BIOS has a separate hardware diagnostics test, and try the memory-related tests.

If no BIOS-embedded test is available, check the manufacturer's support site for offline hardware diagnostics (e.g. a bootable ISO file to burn to a CD or write to a USB stick).

(BTW: an Ubuntu CD also includes memory diagnostics.)

Because a DMA write is I/O, I would also try replacing the SATA cable and then check whether any new errors are logged (the last one here is 298, but more may have been added by now).
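
To see what kind of error the drive logged, the ER byte from the error log in the question (84 in every entry) can be decoded bit by bit. The sketch below assumes the standard ATA error register layout (bit 7 ICRC, bit 6 UNC, bit 4 IDNF, bit 2 ABRT) and skips the obsolete bits; the function name is hypothetical.

```shell
# Decode a two-digit hex ATA error register value into flag names.
decode_ata_error() {
    er=$(( 0x$1 ))
    names=""
    for pair in 128:ICRC 64:UNC 16:IDNF 4:ABRT; do
        mask=${pair%%:*}
        name=${pair#*:}
        if [ $(( er & mask )) -ne 0 ]; then
            names="$names $name"
        fi
    done
    echo "${names# }"
}

decode_ata_error 84   # prints: ICRC ABRT
```

ICRC is an interface CRC error, meaning the data was corrupted in transit between host and drive, which fits with checking the cable and connector first.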

0
