I have a C application that occasionally fails to open a file which is stored on a /tmp share.
Here is the relevant chunk of code:
// open file and start parsing
notStdin = strcmp(inFile, "-");
if (notStdin) {
    coordsIn = fopen(inFile, "r");  /* inFile = file that I want to open */
    if (coordsIn == NULL) {
        fprintf(stderr, "ERROR: Could not open coordinates file: %s\n\t%s\n", inFile, strerror(errno));
        exit(EXIT_FAILURE);
    }
}
else
    coordsIn = stdin;
Roughly once in every eight to ten trials, I get a NULL FILE pointer. Here is an example error message:
ERROR: Could not open coordinates file: /tmp/coordinates.txt
File or directory does not exist
However, the file /tmp/coordinates.txt does indeed exist, as I can open it with standard utilities like head, cat, or more.
The permissions of the different /tmp/coordinates.txt trial files are the same.
Here is the result from uname -a:
$ uname -a
Linux hostname 2.6.18-128.2.1.el5 #1 SMP Wed Jul 8 11:54:47 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
If I use a different inFile that is stored in a different, non-/tmp share, then I do not observe this symptom.
Is there anything that would cause fopen() to fail on a file stored in the /tmp share? Are there other troubleshooting steps I can pursue?
Too Many Open Files?
Is your program opening lots of files? Perhaps you are running out of file descriptors? Here is a link about how to change the limit for your program, the shell, and the OS if this is the case. To see how many you are using with your program:
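From inside the program on Linux, one option is to count the entries in /proc/self/fd. The sketch below assumes /proc is mounted; count_open_fds is a name made up for this example, not something from the question.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: returns the number of file descriptors this
 * process currently has open, or -1 if /proc/self/fd cannot be read.
 * Assumes a Linux system with /proc mounted. */
int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip the "." and ".." directory entries. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);

    /* opendir() itself held one descriptor while we were counting. */
    return count - 1;
}
```

Logging this value just before the failing fopen() would show whether the count is creeping toward the limit.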
On my Ubuntu system, the shell limit is 1024, including stdout, stderr, and stdin. This is set in /etc/security/limits.conf. The following little program shows this:
When I run it, it prints "1021" with an exit status of 1.
Check For System Errors:
More generically, you can always check the output of dmesg or /var/log/messages for any errors.
Watch the file, see if something else is messing with it:
Perhaps the file really doesn't exist at that moment because something is deleting it out from under you? You might want to use inotify to watch all events on the file, or tools that use inotify, such as incron or inotify-tools.
Maybe some program is locking this file? It could be another copy of your program.
Does lsof /tmp/coordinates.txt show anything?
I can't think of anything special about /tmp that would make files intermittently not be there. /tmp is just a regular directory with slightly special permissions (the sticky bit) that let everybody create files there but prevent non-root users from deleting files they do not own.
Is there anything external to your program modifying that file, or is your program multi-threaded with several threads working on that file? If so, it could be a race condition.
To check whether this is the case, you could use inotify as Kyle suggests (you're probably after the delete and moved-from events) to see exactly what happens with the file.
Alternatively, you could try to stat the file and see if its ctime and/or mtime coincide with the moment fopen failed with ENOENT.
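The stat idea could be sketched as a small wrapper that, when fopen() fails, records the time and immediately stats the path. The name fopen_logged and the log parameter are assumptions for illustration, not part of the original program.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical wrapper around fopen() that logs extra context when the
 * open fails, so the failure time can be compared with the file's
 * timestamps. Behaves like fopen() otherwise. */
FILE *fopen_logged(const char *path, const char *mode, FILE *log)
{
    FILE *fp = fopen(path, mode);
    if (fp != NULL)
        return fp;

    int open_errno = errno;  /* save before later calls overwrite it */
    time_t now = time(NULL);
    struct stat st;

    fprintf(log, "fopen(%s) failed at %ld: %s\n",
            path, (long)now, strerror(open_errno));
    if (stat(path, &st) == 0)
        fprintf(log, "  stat() right after succeeded: mtime=%ld ctime=%ld\n",
                (long)st.st_mtime, (long)st.st_ctime);
    else
        fprintf(log, "  stat() right after also failed: %s\n",
                strerror(errno));

    errno = open_errno;  /* let the caller see fopen()'s error */
    return NULL;
}
```

If stat() succeeds a moment after fopen() reported ENOENT, and the ctime is within a second or so of the failure time, that would point toward something replacing or renaming the file.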
Vim?
Just a wild guess. Do you have the /tmp/coordinates.txt file open in vim while this problem occurs?
I have experienced problems with the file I am editing in vim disappearing from the file system and reappearing very shortly after. I have never actually seen the file missing with ls, but I have had, for example, gcc fail when trying to open the file, only to succeed on the second try.
This is not a rare occurrence; it happens regularly during the day.
I don't know what the solution is, but I have observed this same problem, and not just on /tmp. It can happen over NFS mounts, and even on local mounts, like /lib. A problem I found today reproduces it by (incorrectly) opening, seeking, reading a little, and closing the same file thousands of times (instead of just keeping the file open for the whole operation). Sporadically, one of the fopen() calls fails with an ENOENT error. This is not a case of too many open files, as one person suggested -- the system literally thinks the file isn't there for a split second.
I've been searching to see if anyone else has had similar problems, and this is the closest I've come. I don't have any answers, though, just looking for a solution.
I don't think this is endemic to Linux (or RHEL), because I don't see this everywhere, just in one environment. I don't know what's different about that environment that could cause this problem (there are differences). Though it isn't good to see that whatever it is isn't fixed in RHEL5 (I see it on RHEL4).