I want a report on my system health, so that I know that all my hardware components (CPU, memory, disks...) are functioning as expected. It would be easiest to read if the report listed only the problems found (if any). Is there a system tool that does this?
Related notes:
- I know that the disk utility can report SMART results for my disk. I'd like something similar for all my other components.
- Raw diagnostic tools and benchmarks aren't suitable. Diagnostic tools list component details, but not their health. Benchmarks only sometimes highlight health issues. I am only interested in direct health reports.
- I am aware of an equivalent tool that performs this function in Windows (reports if a hardware component is failing), but I've forgotten the name :P I'd basically like an equivalent of this.
Electronics generally either work 100% or not at all. Mechanical devices such as hard drives do give indicators of impending failure through SMART reporting, which you already know about.
Fans
Fans do give warning signs of impending failure, but detecting them relies on your hearing: listen for oscillating speeds, squealing bearings, etc.
CPU
Another potential indicator of a degrading fan is CPU heat level. On a laptop, a high CPU temperature can mean the fan exhaust vents are clogged or the fan RPM is too low. It could also mean the CPU / motherboard needs a dust cleaning with compressed air (don't use your breath, which contains moisture). It could also mean your CPU heat sink needs to be reseated with new thermal paste.
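To keep an eye on heat levels without extra packages, the kernel's thermal zones can be read directly. This is only a minimal sketch: the function name show_temps is made up for illustration, and which zone (if any) corresponds to the CPU varies by machine, so the sensor name from each zone's type file is printed next to the reading.

```shell
#!/bin/sh
# Hypothetical helper: print every kernel thermal zone in whole degrees Celsius.
# The base directory is a parameter only so the function can be tried on a
# copy of the tree; on a real system the default /sys/class/thermal is used.
show_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip when the glob matched nothing or the zone has no temp file.
        [ -r "$zone/temp" ] || continue
        # The kernel reports millidegrees Celsius.
        millideg=$(cat "$zone/temp")
        # The type file names the sensor; fall back to "unknown" if missing.
        name=$(cat "$zone/type" 2>/dev/null || echo unknown)
        echo "$(basename "$zone") ($name): $((millideg / 1000)) C"
    done
}

# Usage:
#   show_temps
```

A reading that stays high at idle, or climbs steadily over the weeks, points to the fan, dust or thermal paste causes listed above.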
RAM
If your machine locks up and displays a bad memory error, you can test your RAM following these instructions: How to check for errors in RAM via linux?
If the RAM checker finds a bad memory block you can blacklist it using these instructions: Is there a way of limiting the Kernel's memory manager to use only 75% of memory?
NVMe PCIe M.2 Gen 3.0 x 4 (or 2) SSD
If you have an SSD, its lifespan is measured in trillions of writes. Your SMART utility already measures SSD life, but not for NVMe SSDs. For that you need nvme-cli. To install it use:
Next gather information available from the SSD:
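The original commands are not reproduced here, so what follows is a sketch under stated assumptions rather than the exact commands used: the package is assumed to be called nvme-cli on Ubuntu/Debian, the drive is assumed to appear as /dev/nvme0, and all three helper names are hypothetical. The last helper extrapolates a lifespan from the Percentage used field of that output.

```shell
#!/bin/sh
# One-time install on Ubuntu/Debian (assumed package name: nvme-cli):
#   sudo apt install nvme-cli

# Hypothetical helper: dump the SMART / health log of an NVMe drive.
# /dev/nvme0 is an assumed default; "sudo nvme list" shows the real devices.
nvme_smart() {
    sudo nvme smart-log "${1:-/dev/nvme0}"
}

# Hypothetical helper: read smart-log text on stdin and print the number in
# the percentage-used line. Assumes the plain-text "name : value" layout;
# the exact spelling of the field can differ between nvme-cli versions.
nvme_percentage_used() {
    awk -F: 'tolower($1) ~ /percentage[_ ]used/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Hypothetical helper: project total lifespan in years from the life used so
# far. Arguments: percent used (integer > 0) and years in service (integer).
project_lifespan_years() {
    pct="$1"
    years="$2"
    if [ "$pct" -le 0 ]; then
        echo "not enough wear yet to estimate" >&2
        return 1
    fi
    echo $(( years * 100 / pct ))
}

# Usage (needs root and a real NVMe device):
#   nvme_smart /dev/nvme0 | nvme_percentage_used
#   project_lifespan_years 1 3
```

The projection is a plain linear extrapolation, so it is only as good as the assumption that the drive keeps being written at the same rate.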
The most important field is Percentage used, which shows as 0%. This isn't disk usage percent but life used percent. The drive was purchased in October 2017 and it was still 0% in December 2018. The Percentage used hit 1% in October 2020. At this rate (1% per three years) the NVMe SSD lifespan will be 300 years. Of course it will be obsolete well before then...
System Monitor on desktop with conky
Many people like to show their system status (and health) on a portion of their desktop. I like to keep my Conky running on the right 20% of my primary monitor:
Note: The 97% CPU usage on a single CPU is caused by the screen recorder itself.
To learn more about conky and CPU usage see: How do I stress test CPU and RAM (at the same time)?