I look after a cluster of servers running Ubuntu 20.04. Each one exports its drives via NFS to the other servers. They are ext4 file systems.
Also, we have a disk array (i.e., a SAN) which has been formatted as ocfs2 (Oracle Cluster File System). This is mounted by the servers to provide additional disk space.
Everything appears fine, except for one or two programs. These programs write output to standard error and to a file. If these two outputs are sent to the ocfs2 file system, gibberish sometimes appears: non-ASCII characters, as if parts of memory had been written out directly to standard error and/or the output file. Sometimes I see many ^@ characters, which are NUL characters (ASCII code 0). The corruption is different each time and not consistent. With the same input, a run sometimes works and sometimes does not.
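To make the symptom easier to spot than by eyeballing the files, here is a minimal sketch of a scanner, assuming the programs' output is supposed to be plain ASCII text. The names `find_suspect_bytes` and `scan_file` are made up for illustration, and the choice of which control characters count as legitimate (tab, line feed, carriage return) is an assumption about the output format.

```python
import sys
from typing import List, Tuple

# Assumption: these are the only control characters legitimate output contains.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return


def find_suspect_bytes(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte value) for every byte that is not printable ASCII."""
    suspects = []
    for offset, value in enumerate(data):
        is_control = value < 0x20 and value not in ALLOWED_CONTROL
        if is_control or value >= 0x7F:
            suspects.append((offset, value))
    return suspects


def scan_file(path: str) -> List[Tuple[int, int]]:
    """Read a file in binary mode and scan it for suspect bytes."""
    with open(path, "rb") as handle:
        return find_suspect_bytes(handle.read())


if __name__ == "__main__":
    # Usage: python3 scan.py output1.log output2.log ...
    for path in sys.argv[1:]:
        hits = scan_file(path)
        nul_count = sum(1 for _, value in hits if value == 0)
        print(f"{path}: {len(hits)} suspect bytes, {nul_count} of them NUL")
        for offset, value in hits[:10]:
            print(f"  offset {offset}: 0x{value:02x}")
```

Recording the offsets of the bad bytes over several failed runs might show whether the corruption tends to start at particular boundaries, which would be worth including in a bug report.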
If these programs output to the NFS drives, then this problem never happens. We've done this now hundreds of times and it hasn't happened once.
Until now, we've "solved" this problem by having the programs output to the NFS drives and then copying the outputs over for long-term storage. But it bothers me that I don't really know the cause. As the system administrator, I guess what I'm worried about is whether the ocfs2 drive has been misconfigured somehow. When I copy files over to the ocfs2 drive and test the md5sum afterwards, it all checks out.
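The copy test only exercises large sequential writes, whereas the failing programs presumably write their output in many small pieces. As a rough sketch of a test closer to that pattern (an assumption on my part about how the programs write, not something I have confirmed), the following writes many small flushed chunks into a directory and compares the MD5 of what is read back with the MD5 of what was written. The function name `write_and_verify` and the mount point in the comment are hypothetical.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory: str, lines: int = 10000) -> bool:
    """Write many small chunks to a temp file in `directory`, then check the MD5."""
    expected = hashlib.md5()
    fd, path = tempfile.mkstemp(dir=directory, prefix="write_test_")
    try:
        with os.fdopen(fd, "wb") as handle:
            for i in range(lines):
                chunk = f"line {i}: some program output\n".encode("ascii")
                handle.write(chunk)
                handle.flush()  # push each small write to the OS, like unbuffered output
                expected.update(chunk)
        # Read the file back in blocks and hash what is actually there.
        actual = hashlib.md5()
        with open(path, "rb") as handle:
            for block in iter(lambda: handle.read(65536), b""):
                actual.update(block)
        return actual.hexdigest() == expected.hexdigest()
    finally:
        os.remove(path)


# Example (hypothetical mount point):
# print(write_and_verify("/mnt/ocfs2"))
```

Reading the file back on the same machine may be served from that machine's cache, so a stronger variant would keep the file and run the checksum from a different server.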
(It could be a bug in the programs, and we've reported the issue to the developer. But we've now noticed it happening with two programs, and it never happens with the NFS drives.)
If anyone has suggestions as to what I should check or consider, please let me know. I'm completely stumped... Thank you in advance!