Here is an example of dmesg output from an important production server (RHEL 7.2, Dell hardware). As we can see, the sde disk in the server is dying:
[Wed Jun 30 11:24:58 2021] sd 0:2:4:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Jun 30 11:26:18 2021] sd 0:2:4:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Jun 30 11:26:18 2021] sd 0:2:4:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Jun 30 11:27:28 2021] sd 0:2:4:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Jun 30 11:27:46 2021] sd 0:2:4:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
What is interesting is that these messages are old, from 2021, and we have not seen any such messages in 2022 or 2023.
Based on these facts, I want to ask whether disk replacement should be considered when the only faulty-disk messages date from 2021.
The second important question is how to capture fresh kernel messages with dmesg.
Is it possible to generate fresh kernel messages?
As far as I know, rebooting the machine might help with this, but I want to avoid a reboot.
dmesg by default prints the messages from the kernel ring buffer. A ring buffer is a special kind of buffer that always has a constant size and removes the oldest messages when new messages are received. It gets freshly instantiated on system boot, so what you're seeing are already the most recent kernel messages available.
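To illustrate the ring buffer behaviour described above, here is a toy sketch in shell. It is only an analogy: the real kernel buffer is sized in bytes rather than in lines, and the names `SIZE`, `buf` and `push` are made up for this example.

```shell
#!/bin/sh
# Toy ring buffer holding at most SIZE lines (hypothetical names, not a kernel interface).
SIZE=3
buf=""

# Append one message and keep only the newest SIZE lines.
push() {
    buf=$(printf '%s\n%s\n' "$buf" "$1" | sed '/^$/d' | tail -n "$SIZE")
}

# Five messages arrive, but only three fit: the two oldest are dropped.
for n in 1 2 3 4 5; do
    push "msg $n"
done

printf '%s\n' "$buf"
```

This prints only `msg 3`, `msg 4` and `msg 5`. On the real system, `dmesg -T | tail` shows the newest entries with human-readable timestamps, and `dmesg -w` should keep printing new messages as they arrive, assuming the installed util-linux version supports that option.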
When you see messages from almost two years ago today, in combination with a legacy RHEL version 7.2, the first thing that comes to mind is: you haven't rebooted for close to two years and seemingly haven't done any maintenance on that server for even longer!
If your server is indeed from late 2015 or early 2016 (which is what the RHEL version suggests), then before anything else I would start by checking the integrity of your backups, your restore procedure and your disaster recovery plan, and possibly start planning for a replacement and upgrade.
If you want to check the disk health on a live system, you can try to read the S.M.A.R.T. data and/or initiate a S.M.A.R.T. self-test with smartctl.
smartctl can show an estimate of how long the various supported self-tests will take, and you can then start, for example, a short test.
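A minimal sketch of those smartctl calls, assuming the affected disk is `/dev/sde` as in the dmesg output from the question. The `run` wrapper and the `RUN` and `DISK` variables are hypothetical helpers for this example: by default the script only prints the commands so you can review them first. On a Dell machine the disk may sit behind a RAID controller, in which case smartctl may additionally need something like `-d megaraid,N` (with N being the physical disk number) to reach it.

```shell
#!/bin/sh
# Assumption: the suspect disk is /dev/sde; override with DISK=/dev/sdX if needed.
DISK="${DISK:-/dev/sde}"

# Hypothetical safety switch: print each command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Show capabilities, including the estimated duration of each self-test.
run smartctl -c "$DISK"

# Start a short self-test; it runs in the background on the drive itself.
run smartctl -t short "$DISK"

# After the estimated time has passed, read the self-test log.
run smartctl -l selftest "$DISK"

# Overall health assessment as reported by the drive.
run smartctl -H "$DISK"
```

Running the script with `RUN=1` as root executes the commands instead of printing them. The self-test does not take the disk offline, so it can be started on a live system.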