So I'm trying to find out whether the stderr of a process has been redirected to somewhere unusual (it's a Java process and I want a thread dump, but it's launched through a nest of startup scripts). I find my process with pgrep, and use pfiles to see what's there:
    4366:   /foo/bar/platform/solaris2/jre_1.5.0/bin/java -Xmx2048m -Xms10
      Current rlimit: 65536 file descriptors
       0: S_IFCHR mode:0666 dev:302,0 ino:6815752 uid:0 gid:3 rdev:13,2
          O_RDONLY|O_LARGEFILE
          /devices/pseudo/mm@0:null
       1: S_IFREG mode:0640 dev:85,56 ino:26471 uid:0 gid:0 size:10485812
          O_WRONLY|O_LARGEFILE
       2: S_IFREG mode:0640 dev:85,56 ino:26471 uid:0 gid:0 size:10485812
          O_WRONLY|O_LARGEFILE
       3: S_IFCHR mode:0666 dev:302,0 ino:6815772 uid:0 gid:3 rdev:13,12
So I can see that stdout and stderr (file descriptors 1 and 2) are pointing to the same place; I think they are redirected to the same file in the startup scripts, so this tallies.
But when I look for a file with inode number 26471, I see this:
    # find / -inum 26471
    /usr/share/man/man3mlib/mlib_MatrixScale_S16_U8_Sat.3mlib
    /proc/4366/fd/1
    /proc/4366/fd/2
    /proc/4366/fd/83
The first hit is (I'm certain) a file on a different filesystem. The three entries in /proc are fds my process has open.
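An inode number is only unique within one filesystem, which is why the man page can share number 26471 with the log file. A minimal Python sketch of comparing open descriptors by their (device, inode) pair follows; the helper name file_identity is made up for illustration, and it inspects descriptors of the current process rather than another process's fds.

```python
import os
import tempfile


def file_identity(fd):
    """Return the (device, inode) pair of the file behind an open descriptor."""
    st = os.fstat(fd)
    return (st.st_dev, st.st_ino)


def demo():
    """Open one file twice (as a 2>&1 redirect would) plus an unrelated file."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names, only for this illustration.
        out_fd = os.open(os.path.join(tmp, "app.log"),
                         os.O_WRONLY | os.O_CREAT, 0o640)
        err_fd = os.dup(out_fd)  # second descriptor for the same open file
        other_fd = os.open(os.path.join(tmp, "other.log"),
                           os.O_WRONLY | os.O_CREAT, 0o640)
        try:
            shared = file_identity(out_fd) == file_identity(err_fd)
            distinct = file_identity(out_fd) != file_identity(other_fd)
        finally:
            for fd in (out_fd, err_fd, other_fd):
                os.close(fd)
    return shared, distinct


if __name__ == "__main__":
    print(demo())
```

In the pfiles output above, the same comparison is dev:85,56 plus ino:26471 for both fd 1 and fd 2.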
Looking in /proc/4366, I can't see any more info than I get from pfiles.
    # ls -li 0 1 2 3
    6815752 c---------   1 root     sys       13,  2 Jan 20 14:10 0
      26471 --w-------   0 root     root    10485812 Jan 20 13:42 1
      26471 --w-------   0 root     root    10485812 Jan 20 13:42 2
    6815772 c---------   1 root     sys       13, 12 Jun  7  2009 3
    # file 0 1 2 3
    0: character special (13/2)
    1: ascii text
    2: ascii text
    3: character special (13/12)
(I can tail one of these fds and work out which file it is from that. I'm asking because I clearly don't understand the relationship between the fds and the inodes in enough depth).
So my process is writing to something (on some device, with inode 26471) and the data is then getting into a file with a different inode number. Can anyone give me an idea of what this something might be (or even let me know if my reasoning so far is totally broken)?
AFAIK, find searches the filesystem's directories. If that file was deleted but still exists because it's open (a common trick on Unix), it won't be found by find.
I haven't tried this on Solaris, but here is a note about using lsof to identify such 'deleted but open' files, and recovering them via a cat /proc/<procid>/fd/<fdid> > /tmp/xxxx
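To make the 'deleted but open' state concrete, here is a rough Python sketch that deletes a file it still has open, checks that the link count is 0, and copies the content out through the descriptor. The helper names and file names are invented for the example, and it works on the current process's own fd; for another process you would go through its /proc fd entry as in the cat command above.

```python
import os
import tempfile


def is_deleted_but_open(fd):
    """True if the file behind fd has no directory entries left."""
    return os.fstat(fd).st_nlink == 0


def copy_from_fd(fd, dest_path):
    """Copy the current content of the file behind fd into dest_path."""
    with open(dest_path, "wb") as dest:
        offset = 0
        while True:
            # pread reads at an explicit offset and leaves the fd position alone
            chunk = os.pread(fd, 65536, offset)
            if not chunk:
                break
            dest.write(chunk)
            offset += len(chunk)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "app.log")      # hypothetical log file
        rescued = os.path.join(tmp, "rescued.log")   # hypothetical copy
        fd = os.open(log_path, os.O_RDWR | os.O_CREAT, 0o640)
        try:
            os.write(fd, b"written before delete\n")
            os.unlink(log_path)                      # what rm does
            os.write(fd, b"written after delete\n")  # the fd still works
            visible = os.path.exists(log_path)
            deleted = is_deleted_but_open(fd)
            copy_from_fd(fd, rescued)
        finally:
            os.close(fd)
        with open(rescued, "rb") as f:
            return visible, deleted, f.read()


if __name__ == "__main__":
    print(demo())
```

The copy is only a snapshot: anything the process writes afterwards still goes to the deleted file, not to the rescued copy.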
Edit:
It seems you've already identified that this is the case, but are still wondering how it is possible. Here's a short explanation:
On POSIX filesystems, a file is handled through its inode, and directories are little more than a "path => inode" mapping. You can have more than one path 'pointing' to the same inode (it's called a hardlink), and the inode keeps a count of how many links it has. The rm command simply calls unlink() on this path, which reduces the link count and 'possibly' deletes the file itself.
But a path in the directory tree isn't the only possible reference to an inode: an open fd in a running process also counts, and a 'deleted' file won't really be removed until the number of references (directory links plus open fds) goes to 0.
As I mentioned in passing above, it's a common trick: if you have a temporary file that you don't care to keep after your process finishes running, just open it and immediately 'delete' it. The opened handle will work reliably, and when your process finishes (whether it exits normally, is killed or crashes), the system will close the handle and cleanly delete the temporary file.
A logfile isn't a likely candidate for such a 'hidden autodeleting' file; but it's not hard to do accidentally.
Since your deleted logfile is still live and collecting data, it seems that simply copying the content wouldn't help much. So try creating a new hardlink to the /proc//fd/ file, something like ln /proc/4366/fd/1 /tmp/xxxx. Note there's no -s flag, so ln should create a new hardlink with the same inode as the original, not a symbolic link (which is little more than a pointer to an existing path, and not what you want).
Edit:
The ln /proc/... /tmp/... command can't work because /proc and /tmp are on different filesystems. Unfortunately, I don't know of any way to create a pathname for an existing inode. One would wish that the link() syscall took an inode number and a path, but it takes source and destination paths.
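For what it's worth, the cross-filesystem failure shows up as the EXDEV error from link(). Below is a small Python sketch of checking for it; try_hardlink is a hypothetical helper, and the demo only exercises the same-filesystem case, since a second filesystem can't be assumed to exist.

```python
import errno
import os
import tempfile


def try_hardlink(src, dst):
    """Try to hardlink src to dst; return False if they are on different filesystems."""
    try:
        os.link(src, dst)
        return True
    except OSError as exc:
        if exc.errno == errno.EXDEV:  # cross-device link is not permitted
            return False
        raise


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "original")  # hypothetical file names
        dst = os.path.join(tmp, "alias")
        with open(src, "w") as f:
            f.write("data\n")
        linked = try_hardlink(src, dst)
        same_inode = linked and os.stat(src).st_ino == os.stat(dst).st_ino
    return linked, same_inode


if __name__ == "__main__":
    print(demo())
```

Both arguments are paths, which is the limitation described above: there is no inode number to pass in.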