Note that inode_cache & ext3_inode_cache slabs are very small compared to dentry_cache.
What happens is that, slowly and steadily over the course of a week, the dentry_cache grows from 1M to ~5-6G.
Then I need to run
echo 2 > /proc/sys/vm/drop_caches && echo 0 > /proc/sys/vm/drop_caches
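For reference, I'm watching the slab sizes with something along these lines (slabtop ships with procps on CentOS 5; adjust as needed):

slabtop -o -s c | head -20                      # top slab consumers by cache size
grep -E 'dentry|inode_cache' /proc/slabinfo     # just the caches in question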
This started happening one day on all servers hosting some web code. The developers say they have not changed anything in the filesystem access patterns around the time the problem started.
The system is CentOS 5 with a 2.6.18 kernel, so I don't have any of the instrumentation features available in newer kernels. Any idea how I can debug the problem? Maybe with SystemTap? This is an EC2 instance, so I'm not even sure SystemTap will work there.
Thanks Alex
Late, but maybe useful for others who come upon this.
If you are using the AWS SDK on that EC2 instance, it is highly likely that curl is causing the dentry bloat. While I haven't seen this trigger the OOM killer, it is known to hurt server performance because of the additional work the OS has to do to reclaim SLAB memory.
If you can confirm that curl is being used by your developers to hit https URLs (many of the AWS SDKs do this), then the solution is to upgrade the nss-softokn library to at least v3.16.0 and set the environment variable NSS_SDB_USE_CACHE (YES and NO are both valid values; you may have to benchmark to see which performs curl requests more efficiently) for the process that is using libcurl.
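As a rough sketch of what that looks like on a CentOS/RHEL-style box (the package name and where you export the variable are assumptions — check your distro and the service's init script):

yum update -y nss-softokn                       # needs to end up >= 3.16.0
export NSS_SDB_USE_CACHE=YES                    # or NO; benchmark both for your workload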
I recently ran into this myself and wrote a blog entry (old blog entry link and upstream bug report) with some diagnostics & more detailed information, in case that helps.
You have a few options. If I were in this situation I would start tracking the dentry stats over time to see how fast the cache is growing. If the growth rate is fairly regular, I think you could identify possible culprits in two ways. First, the output of lsof might indicate that some process is leaving deleted file handles around. Second, you could strace the main resource-using applications and look for an excessive number of filesystem-related calls (like open(), stat(), etc.). A sketch of all three is below.
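Something along these lines, assuming a stock kernel and standard tooling:

watch -n 60 'cat /proc/sys/fs/dentry-state'          # nr_dentry, nr_unused, ...
lsof | grep '(deleted)'                               # processes holding deleted file handles
strace -f -e trace=open,stat,lstat,unlink -p <PID>    # fs-related syscalls of a suspect process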
I am also curious about @David Schwartz's comment. I haven't seen issues where the dentry cache causes the OOM killer to kick in, but maybe that happens if the entries are all still referenced and active? If that is the case, I'm pretty confident lsof would expose the issue.
In our case we were able to roughly identify the offending process by looking at minflt/s:
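For example (pidstat is part of the sysstat package; the exact invocation and log path are just an illustration):

pidstat -r 60 >> /var/log/pidstat.log                # the minflt/s column pointed at the culprit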