We have a Kafka cluster of 3 VMs. Each Kafka machine uses the sdb disk (a VMDK disk) to store its data.
On all machines we see the following kernel messages:
[1123783.849575] EXT4-fs (sdb): error count since last fsck: 9
[1123783.849582] EXT4-fs (sdb): initial error at time 1595958527: ext4_writepages:2414
[1123783.849586] EXT4-fs (sdb): last error at time 1613639279: ext4_put_super:791
[1210205.709917] EXT4-fs (sdb): error count since last fsck: 9
[1210205.709937] EXT4-fs (sdb): initial error at time 1595958527: ext4_writepages:2414
[1210205.709944] EXT4-fs (sdb): last error at time 1613639279: ext4_put_super:791
[1296627.570121] EXT4-fs (sdb): error count since last fsck: 9
[1296627.570141] EXT4-fs (sdb): initial error at time 1595958527: ext4_writepages:2414
[1296627.570147] EXT4-fs (sdb): last error at time 1613639279: ext4_put_super:791
[1383049.419003] EXT4-fs (sdb): error count since last fsck: 9
[1383049.419019] EXT4-fs (sdb): initial error at time 1595958527: ext4_writepages:2414
[1383049.419025] EXT4-fs (sdb): last error at time 1613639279: ext4_put_super:791
[1469471.269771] EXT4-fs (sdb): error count since last fsck: 9
.
.
.
Red Hat explains these messages as follows (from https://access.redhat.com/solutions/383993):
Issue
I see the following lines in /var/log/messages:
kernel: EXT4-fs (sdd1): error count: 5
kernel: EXT4-fs (sdd1): initial error at 1369732760: ext4_lookup:1044: inode 11534366
kernel: EXT4-fs (sdd1): last error at 1369733705: ext4_lookup:1044: inode 11534366
Resolution
These are not errors, they're informational messages; however, they may be referencing other possible historical errors. These error counts should be reset once a successfully fsck has been run; however prior to e2fsprogs-1.41.12-18 a bug was preventing the reset. This has been corrected in e2fsprogs-1.41.12-18 via errata.
The messages we get on our Kafka cluster differ slightly from those in the Red Hat case, so we are more worried about the sdb disks.
Red Hat does not seem too concerned, since they describe these as informational messages.
As for my kernel messages, I can unmount the disk from its mount point and run fsck to fix the errors,
but my question is: how worried should I be about the following messages?
[1123783.849575] EXT4-fs (sdb): error count since last fsck: 9
[1123783.849582] EXT4-fs (sdb): initial error at time 1595958527: ext4_writepages:2414
[1123783.849586] EXT4-fs (sdb): last error at time 1613639279: ext4_put_super:791
[1210205.709917] EXT4-fs (sdb): error count since last fsck: 9
[1210205.709937] EXT4-fs (sdb): initial error at time 1595958527: ext4_writepages:2414
It seems fairly obvious that this is exactly what the Red Hat document is talking about; the "initial error" and "last error" lines are simply reporting the historical errors. Make sure that your e2fsprogs version is more recent than the one in the Red Hat document, then run fsck, and the errors should go away. Since the timestamps correspond to dates in 2020 and 2021, you can safely ignore them, I think.
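
To check the dates yourself, here is a small Python sketch that pulls the error count and the two epoch timestamps out of the log lines and converts them to UTC. The regular expressions and the helper names (to_utc, summarize) are my own, written only against the lines pasted in the question, so treat them as an assumption about the message format rather than a general parser. Note also that in the pasted log the kernel timestamps in brackets are roughly 86,400 seconds (one day) apart and the count stays at 9, which suggests a repeated report of the same stored errors rather than new ones.

```python
import re
from datetime import datetime, timezone

# Sample lines copied from the kernel log in the question.
LOG = """\
[1123783.849575] EXT4-fs (sdb): error count since last fsck: 9
[1123783.849582] EXT4-fs (sdb): initial error at time 1595958527: ext4_writepages:2414
[1123783.849586] EXT4-fs (sdb): last error at time 1613639279: ext4_put_super:791
[1210205.709917] EXT4-fs (sdb): error count since last fsck: 9
"""

# Patterns are an assumption based on the lines above, not a full parser.
COUNT_RE = re.compile(r"EXT4-fs \((\w+)\): error count since last fsck: (\d+)")
ERROR_RE = re.compile(
    r"EXT4-fs \((\w+)\): (initial|last) error at time (\d+): (\w+):(\d+)"
)


def to_utc(epoch):
    """Format a Unix epoch timestamp as a UTC date string."""
    dt = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S UTC")


def summarize(log):
    """Return (list of reported error counts, {kind: (date, function)})."""
    counts = [int(m.group(2)) for m in COUNT_RE.finditer(log)]
    errors = {}
    for m in ERROR_RE.finditer(log):
        errors[m.group(2)] = (to_utc(int(m.group(3))), m.group(4))
    return counts, errors


counts, errors = summarize(LOG)
print("error counts seen:", counts)
for kind, (when, func) in errors.items():
    print(kind, "error:", when, "in", func)
```

With the timestamps from the question, the initial error decodes to 2020-07-28 17:48:47 UTC and the last error to 2021-02-18 09:07:59 UTC. If the count reported after a clean fsck starts climbing again, or the "last error" date moves forward, that would be a reason to look more closely at the disk.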