EDIT: I totally forgot about this thread. It turns out I had a bad hard disk. We had to redeploy this server for other needs so I finally got around to replacing the one bad disk and we're back in business.
For a few weeks now I haven't been able to figure out why I can't delete one particular file. As root I can, but my shell script runs as a different user. When I run ls -la, the file isn't listed. However, if I pass its name as a parameter, it shows up! Sure enough, the owner is root, hence I'm not able to delete it.
Notice that 6535 is missing:
[root@server]# ls -la 653*
-rw-rw-r-- 1 svn svn 24002 Mar 26 01:00 653
-rw-rw-r-- 1 svn svn 7114 Mar 26 01:01 6530
-rw-rw-r-- 1 svn svn 8653 Mar 26 01:01 6531
-rw-rw-r-- 1 svn svn 6836 Mar 26 01:01 6532
-rw-rw-r-- 1 svn svn 3308 Mar 26 01:01 6533
-rw-rw-r-- 1 svn svn 3918 Mar 26 01:01 6534
-rw-rw-r-- 1 svn svn 3237 Mar 26 01:01 6536
-rw-rw-r-- 1 svn svn 3195 Mar 26 01:01 6537
-rw-rw-r-- 1 svn svn 27725 Mar 26 01:01 6538
-rw-rw-r-- 1 svn svn 263473 Mar 26 01:01 6539
Now it shows up if you call it directly.
[root@server]# ls -la 6535
-rw-rw-r-- 1 root root 3486 Mar 26 01:01 6535
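One detail worth keeping in mind when reading the two listings: with `653*` the shell expands the glob by reading the directory, while `ls -la 6535` looks the name up directly. The same cross-check can be scripted. What follows is only a minimal sketch in Python, not something from the original post; the function name `hidden_from_listing` and the candidate range are made up for illustration. On a healthy filesystem it should report nothing.

```python
import os


def hidden_from_listing(directory, candidates):
    """Return names that exist when looked up directly but are absent
    from the directory enumeration (the symptom described above)."""
    listed = set(os.listdir(directory))  # enumeration, as a glob would see it
    hidden = []
    for name in candidates:
        path = os.path.join(directory, name)
        # lexists() performs an lstat on the exact name, bypassing enumeration
        if os.path.lexists(path) and name not in listed:
            hidden.append(name)
    return hidden


if __name__ == "__main__":
    # Hypothetical candidate names matching the listing in the question
    names = [str(n) for n in range(6530, 6540)]
    print(hidden_from_listing(".", names))
```

Any name this prints is reachable by direct lookup but missing from the directory listing, which points at the filesystem (or something below it) rather than at ls itself.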
Here's something interesting. I caught this issue because my shell script kept failing on the delete, since 6535 is owned by root. The file actually shows up after I run "rm -rf ." I tried that earlier and it failed to remove the directory, telling me the directory isn't empty. I went in and looked and, sure enough, file "6535" finally shows up. No idea why it's doing this.
strace says the following:
# strace ls -la 653* 2>&1 | grep ^open
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib64/tls/librt.so.1", O_RDONLY) = 3
open("/lib64/libacl.so.1", O_RDONLY) = 3
open("/lib64/libselinux.so.1", O_RDONLY) = 3
open("/lib64/tls/libc.so.6", O_RDONLY) = 3
open("/lib64/tls/libpthread.so.0", O_RDONLY) = 3
open("/lib64/libattr.so.1", O_RDONLY) = 3
open("/etc/selinux/config", O_RDONLY) = 3
open("/proc/mounts", O_RDONLY) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
open("/proc/filesystems", O_RDONLY) = 3
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
open("/usr/share/locale/en_US.UTF-8/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_TIME/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/nsswitch.conf", O_RDONLY) = 3
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib64/libnss_files.so.2", O_RDONLY) = 3
open("/etc/passwd", O_RDONLY) = 3
open("/etc/group", O_RDONLY) = 3
open("/etc/mtab", O_RDONLY) = 3
open("/proc/meminfo", O_RDONLY) = 3
open("/etc/localtime", O_RDONLY) = 3
That's a bit worrisome. I'd verify that your ls binary wasn't modified by comparing it to a known good copy. You could use your distribution's package tools to verify the file on an isolated system.
Sometimes filenames get odd characters in them, such as cursor movement sequences. Try this to make sure:
It should show question marks instead of control characters (it's probably the default, but it might not be).
This partially demonstrates the type of problem that may be present:
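As a rough illustration of that kind of problem (this is not the command from the original answer, just a sketch in Python), the snippet below creates a file whose name ends in a carriage return and then prints every name as an escaped literal, so invisible characters become visible. The helper names `reveal_names` and `has_control_chars` and the sample file name are assumptions made for the example.

```python
import os
import tempfile


def reveal_names(directory):
    """Return each directory entry as an escaped literal so that
    control characters show up as \\r, \\x08 and so on."""
    return sorted(ascii(name) for name in os.listdir(directory))


def has_control_chars(name):
    """True if the name contains ASCII control characters (including DEL)."""
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo:
        # Hypothetical name: looks like "6535" but carries a trailing carriage return
        open(os.path.join(demo, "6535\r"), "w").close()
        for shown in reveal_names(demo):
            print(shown)
```

A file named like this would not match a plain `6535` on the command line, so it behaves the opposite way from the file in the question, but it shows how a terminal can mislead you about what a name really contains.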
I would also try:
to see if an alias or function is defined or to see if a binary is in an odd place or has been modified.
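For the "binary in an odd place" part, a small Python sketch can list every executable called ls on the search path, in lookup order. This is an illustration under stated assumptions, not the command from the answer: the function name `find_all_on_path` is made up, and it cannot see shell aliases or functions, which live inside the shell itself.

```python
import os


def find_all_on_path(command, path=None):
    """Return every executable file named `command` on the search path,
    in the order the shell would try them."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # empty entries (meaning the current directory) are skipped here
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    for location in find_all_on_path("ls"):
        print(location)
```

More than one hit, or a first hit outside the usual system directories, would be worth a closer look.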
You may want to fsck that volume.
I usually do something like this if I believe 'ls' has been modified...
python -c "import os; print(os.listdir('.'))"
Of course Python, the C Library, the kernel, or the file system could also be modified, but usually it's just the shell utils.
You can look into exactly what ls is doing by using strace, and that may tell you why it is avoiding showing that filename.
Look through that and see what's going on.
The output will look like this:
and if you see something like
be careful, you've been 0wned...
This isn't a conclusive test, but it is a good indicator...
(If you're using Solaris or another OS, you may need to use truss or a similar utility instead of strace.)
(If you're using a csh/tcsh-derived shell, you'll likely need different redirection statements.)
Quick update, we had to replace the server for other reasons. It was the filesystem. All is well now!!! Thank you everyone.
The hack theory is interesting, but I have an alternative theory. Unix file deletion semantics will keep the file around until all processes have closed open file handles pointing at it. Perhaps someone has paused an SVN checkout / commit, or a server thread hung up. If restarting the SVN process (or Apache) solves your problem, this is where I'd place the blame.
Perhaps you can identify the process still using this file with:
lsof | grep 6535
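If lsof isn't installed, roughly the same check can be sketched in Python by walking /proc. This assumes a Linux-style /proc with per-process fd directories; the function name `pids_with_open_file` is invented for the example, and without root it only sees processes you are allowed to inspect.

```python
import os


def pids_with_open_file(target, proc_root="/proc"):
    """Return sorted PIDs holding an open descriptor that points at `target`."""
    target = os.path.realpath(target)
    found = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            # Linux appends " (deleted)" to unlinked files that are still open
            if link == target or link == target + " (deleted)":
                found.add(int(entry))
    return sorted(found)


if __name__ == "__main__":
    # "6535" is the file name from the question; run this from its directory
    print(pids_with_open_file("6535"))
```

An empty result means no visible process has the file open, which would argue against the open-handle theory.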