This might be a bit esoteric, but I've got an odd problem with stat(). Specifically: on a Linux system with Isilon storage mounted over NFS, I have a process that gets different timestamps when running fstat() twice on the same file.
I can reproduce this with a bit of C that does nothing more complicated than open-write-fsync-stat-close, then open-stat-close.
The two timestamps will, occasionally, differ by a few milliseconds: the file can appear 'older' than it actually is.
I'm unclear whether stat() is reading the time from the local host or from the server, and I'm also not sure whether the problem lies with the Isilon (although it doesn't seem to happen on NetApp storage).
Can anyone give me a bit of insight into what might be going on?
As an example:
time mismatch: 1468936451.862865611 != 1468936451.860183107
The first is the time read via the original file descriptor, after the write and fsync. The second is the time read after closing and reopening the file.
As you can see, the file is a bit older 'on the server' than it was on the client.
The C code to test this is:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>

int main (void)
{
    while (1)
    {
        struct stat sb[2];

        /*
         * Open file
         */
        int fd = open ("test.dat", O_CREAT | O_RDWR | O_APPEND, 0644);
        if (fd < 0)
        {
            perror ("failed to open file (1)");
            return EXIT_FAILURE;
        }

        /*
         * Write to it
         */
        if (write (fd, "", 1) != 1)
        {
            perror ("failed to write to file");
            return EXIT_FAILURE;
        }

        /*
         * Ensure it's sync'd to disk
         */
        if (fsync (fd) < 0)
        {
            perror ("failed to fsync file");
            return EXIT_FAILURE;
        }

        /*
         * Stat the file via the open (written-to) descriptor
         */
        if (fstat (fd, &sb[0]) < 0)
        {
            perror ("failed to fstat file (1)");
            return EXIT_FAILURE;
        }

        /*
         * Close it
         */
        close (fd);

        /*
         * Reopen the file
         */
        fd = open ("test.dat", O_RDONLY);
        if (fd < 0)
        {
            perror ("failed to open file (2)");
            return EXIT_FAILURE;
        }

        /*
         * Stat it again
         */
        if (fstat (fd, &sb[1]) < 0)
        {
            perror ("failed to fstat file (2)");
            return EXIT_FAILURE;
        }

        /*
         * Close it
         */
        close (fd);

        /*
         * Check the times match, down to nanosecond resolution
         */
        if (sb[0].st_mtime != sb[1].st_mtime ||
            sb[0].st_mtim.tv_nsec != sb[1].st_mtim.tv_nsec)
        {
            printf ("time mismatch: %lld.%09ld != %lld.%09ld\n",
                    (long long)(sb[0].st_mtime), sb[0].st_mtim.tv_nsec,
                    (long long)(sb[1].st_mtime), sb[1].st_mtim.tv_nsec);
        }
    }
}
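To build and run the reproducer (assuming the source is saved as stattest.c, which is just an example name):

gcc -Wall -o stattest stattest.c
./stattest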
You can trigger it by running this in an NFS mount, then calling 'stat' (the shell command is fine) from another system, e.g. on the Isilon itself:
while true
do
    stat test.dat
done
This gets me errors like:
time mismatch: 1468936613.313045037 != 1468936613.310174576
time mismatch: 1468936613.547768543 != 1468936613.544878047
time mismatch: 1468936615.228495345 != 1468936615.225862947
time mismatch: 1468936619.036053897 != 1468936619.033362349
time mismatch: 1468936619.988151772 != 1468936619.985529620
time mismatch: 1468936664.541226541 != 1468936664.538549678
time mismatch: 1468936666.165674866 != 1468936666.163171366
time mismatch: 1468936666.787528027 != 1468936666.784797051
time mismatch: 1468936711.334376729 != 1468936711.331868927
And ... I don't think it should.
Edit:
The plot thickens. When running the above and capturing packets, the responses from the server to the NFS GETATTR calls are inconsistent: I get two different timestamps for the same file.
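A capture along these lines is enough to see it (the interface name, server address, and output file are placeholders for your environment):

# Capture NFS traffic to/from the filer (assumes NFS over TCP port 2049)
tcpdump -i eth0 -s 0 -w nfs.pcap host isilon.example.com and port 2049

The GETATTR replies can then be inspected in Wireshark with a display filter like nfs.procedure_v3 == 1 (GETATTR is procedure 1 in NFSv3), comparing the mtime reported in each reply.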
Mount options don't seem to make a difference, but we have tried setting noac, actimeo=0, sync, lookupcache=none, etc. (an example mount line is below).
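For completeness, the sort of mount invocation we have been testing looks something like this (the server name and paths are placeholders):

# Disable client-side attribute caching and force synchronous writes
mount -t nfs -o noac,actimeo=0,sync,lookupcache=none isilon.example.com:/ifs/data /mnt/test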
Our Isilon nodes are all NTP-synced to the same sources; as usual, there's a small amount of drift between them.
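The drift can be gauged by checking each node's NTP peer status (assuming ntpq is available on the nodes; the offset column is in milliseconds):

# Show NTP peers and the current offset (ms) on a node
ntpq -p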
We have found that we can apply a workaround in the form of:
isi export modify --zone <zoneid> <exportnumber> --time-delta 1
This sets the resolution of the timestamp to 1s rather than 1ns. However, whilst it diminishes the frequency of the problem occurring, it will still happen, just less often, because a mismatch can now only appear at a '1s' boundary.