I upgraded my NVMe SSD from a PC300 SK Hynix 512G to a Kingston SA2000 1T in my Dell XPS 15. I used Clonezilla to clone the disk and then GParted to resize/move partitions. Everything went smoothly without error.
But now, on Ubuntu 20.04, the system freezes during heavier I/O operations (Android app compilation, writing multiple files via scripts, sometimes opening Chrome, ...).
I don't see anything in /var/log/syslog or in /var/log/kern.log corresponding to the freezes, but the system is completely stuck: I cannot Ctrl+C in the terminal or start anything. Also, icons progressively disappear from menus (as if it were trying to load them?).
On Windows 10, I have no issue while gaming, so I presume it's related to Ubuntu but cannot really prove it.
I ran a disk check; the only thing it reported was "inode extent tree (at level 1) could be shorter", which I corrected.
smartctl doesn't show any error:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-58-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: KINGSTON SA2000M81000G
Serial Number: 50026B76842D46F8
Firmware Version: S5Z42105
PCI Vendor/Subsystem ID: 0x2646
IEEE OUI Identifier: 0x0026b7
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization: 782,120,886,272 [782 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 0026b7 6842d46f85
Local Time is: Tue Dec 29 11:57:29 2020 CET
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 0 0
1 + 4.60W - - 1 1 1 1 0 0
2 + 3.80W - - 2 2 2 2 0 0
3 - 0.0450W - - 3 3 3 3 2000 2000
4 - 0.0040W - - 4 4 4 4 15000 15000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 25 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 911,056 [466 GB]
Data Units Written: 1,455,569 [745 GB]
Host Read Commands: 13,570,300
Host Write Commands: 12,412,188
Controller Busy Time: 104
Power Cycles: 35
Power On Hours: 23
Unsafe Shutdowns: 16
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Thermal Temp. 1 Transition Count: 41
Thermal Temp. 1 Total Time: 837
Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged
My system is up to date, running Linux gp2mv3-laptop 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux.
What can I do to correct this issue? Is there an incompatibility issue with Ubuntu?
I found the culprit and the solution, after contacting the manufacturer of the SSD.
Culprit
The issue is in the SSD firmware, in the implementation of APST. From what I understood, the SSD gives wrong timing information to the kernel.
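APST (Autonomous Power State Transition) is a power-saving feature that puts the SSD into sleep states when it is idle, and the kernel needs to know the wake-up delay the SSD requires. The firmware in the A2000 advertises a shorter wake-up delay than the drive actually needs, which blocks the wake-up and stalls the SSD.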
To solve this issue, you can either disable APST or override the value advertised by the SSD.
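To make the effect of such a limit concrete, here is a minimal sketch using the two non-operational states (3 and 4) from the smartctl table in the question. It is a simplified model, not the driver's actual code: it assumes a state is skipped when its advertised exit latency exceeds the limit, and that a limit of 0 turns APST off. The function name `usable_states` is made up for this illustration.

```python
# Non-operational power states from the smartctl output above:
# (state number, advertised exit latency in microseconds).
NON_OPERATIONAL_STATES = [(3, 2000), (4, 15000)]


def usable_states(max_latency_us):
    """Return the sleep states a simplified APST policy would still use.

    Assumption: a state is skipped when its advertised exit latency
    exceeds the limit, and a limit of 0 disables APST altogether.
    """
    if max_latency_us == 0:
        return []
    return [state for state, exit_lat in NON_OPERATIONAL_STATES
            if exit_lat <= max_latency_us]


for limit in (0, 500, 5500, 100000):
    print(limit, usable_states(limit))
```

Under this model, a limit of 0 or 500 leaves no sleep state available on this drive, 5500 leaves only state 3, and a large limit such as 100000 allows both.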
Solution
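You have to set the nvme_core.default_ps_max_latency_us kernel parameter. It is the maximum wake-up latency, in microseconds, that the kernel will accept from a power state. Use 0 to disable APST, or a value that keeps the drive out of the problematic sleep states.
Open /etc/default/grub and add nvme_core.default_ps_max_latency_us=500 at the end of the GRUB_CMDLINE_LINUX_DEFAULT variable. Then run sudo update-grub and reboot for the change to take effect.
The answer comes from this post: https://askubuntu.com/a/1100886/33386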
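For anyone who prefers to script the edit, the following is a rough sketch of the text change only. It assumes GRUB_CMDLINE_LINUX_DEFAULT sits on a single double-quoted line, as in the stock Ubuntu file; the helper name `add_kernel_param` and the sample content are made up for illustration. It does not write to /etc/default/grub itself, and update-grub plus a reboot are still needed afterwards.

```python
import re

PARAM = "nvme_core.default_ps_max_latency_us"


def add_kernel_param(grub_text, value):
    """Append PARAM=value to GRUB_CMDLINE_LINUX_DEFAULT in the given text.

    Assumes the variable is on one line and double-quoted; all other
    lines are returned unchanged.
    """
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")\s*$')
    out = []
    for line in grub_text.splitlines():
        match = pattern.match(line)
        if match:
            # Drop any earlier setting of the same parameter, then append.
            args = [a for a in match.group(2).split()
                    if not a.startswith(PARAM + "=")]
            args.append(f"{PARAM}={value}")
            line = match.group(1) + " ".join(args) + match.group(3)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical file content, for demonstration only.
sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(sample, 500), end="")
```

After rebooting, the active value should be readable from /sys/module/nvme_core/parameters/default_ps_max_latency_us, which is a quick way to confirm the setting was picked up.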