We have an 'old' Dell PowerEdge R740xd server with fairly high specs, running RHEL 7 (latest). Running ls -l on / can take minutes.
Some specs:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping: 4
CPU MHz: 2400.000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 28160K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear spec_ctrl intel_stibp flush_l1d
# free -h
              total        used        free      shared  buff/cache   available
Mem:           376G        4.5G        371G         10M        342M        370G
Swap:          4.0G          0B        4.0G
# lsblk
NAME                       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                          8:0    0  17.5T  0 disk
└─sda1                       8:1    0  17.5T  0 part
sdb                          8:16   0 111.7G  0 disk
├─sdb1                       8:17   0     1G  0 part /boot
└─sdb2                       8:18   0 110.7G  0 part
  ├─rhel_lab110--16-root   253:0    0    50G  0 lvm  /
  ├─rhel_lab110--16-swap   253:1    0     4G  0 lvm  [SWAP]
  └─rhel_lab110--16-home   253:2    0  56.7G  0 lvm  /home
Only sdb is in use right now; I have just installed the OS. What could be affecting performance so dramatically?
As you only mentioned
ls -l /
taking a long time (and not, say, every directory listing), one possibility is that your root inode got really large.
You can check this with
stat /
and look at the reported size. A typical root inode on a filesystem with 4K blocks would be only 4K.
A directory's inode can get really large when lots of names are created in it; it doesn't matter whether those names are files, directories, device nodes, etc. Whenever the names don't fit in the inode's current blocks, it has to be expanded.
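As a rough sketch of the check described above, the following prints the size of a directory itself (not of its contents) next to the filesystem's block size. It assumes GNU coreutils stat; the function name dir_size_report is made up for illustration.

```shell
# Print the size of the directory file itself, in bytes, together with
# the block size of the filesystem it lives on (GNU stat format strings).
dir_size_report() {
    dir=$1
    size=$(stat -c '%s' "$dir")         # size of the directory, in bytes
    blksize=$(stat -f -c '%S' "$dir")   # fundamental block size of the filesystem
    printf '%s: %s bytes (filesystem block size %s)\n' "$dir" "$size" "$blksize"
}

dir_size_report /
```

A size of many blocks for / would point in the direction described above. The "one block" baseline applies to ext2/3/4; other filesystems may report a different size for a healthy directory.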
Enumerating the names in a directory with a large inode is slow, even if most of the names have since been removed. If that's the root inode, it can affect many filesystem operations, such as calls to open().
Unfortunately, most filesystems won't automatically shrink inodes when names are removed.
For large non-root inodes, you can create a new directory, move everything from the old to the new, remove the old, then rename the new.
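The create/move/remove/rename sequence for a non-root directory could look roughly like this. It is only a sketch: it assumes GNU find, mv, chown and chmod, that nothing is using the directory while it runs, and that old and new sit on the same filesystem. The function name rebuild_dir and the temporary name are hypothetical.

```shell
# Rebuild a bloated (non-root) directory by moving its entries into a
# freshly created sibling and then swapping the names.
rebuild_dir() {
    old=${1%/}
    new="$old.rebuild.$$"            # hypothetical temporary name next to the original
    mkdir "$new" || return 1
    # Carry over ownership and mode from the old directory (GNU --reference).
    chown --reference="$old" "$new" 2>/dev/null
    chmod --reference="$old" "$new"
    # Move every entry, including dotfiles, without relying on shell globs.
    find "$old" -mindepth 1 -maxdepth 1 -exec mv -t "$new" -- {} + || return 1
    rmdir "$old" || return 1         # fails if anything was left behind
    mv -- "$new" "$old"
}

# Usage (hypothetical path):
#   rebuild_dir /var/spool/example
```

Because rmdir refuses to remove a non-empty directory, a partially failed move leaves both directories in place rather than losing data.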
For large root inodes on an ext2/3/4 filesystem, you can run
fsck -f -D /dev/...
on the block device if you can connect it to another system. If you can't do that, you can try
shutdown -r -F now
to restart the system and force a fsck on startup; it might optimize and shrink the directory. Note that -F comes from SysV init's shutdown; systemd-based systems such as RHEL 7 do not act on it, and a forced check is requested with the kernel parameter fsck.mode=force instead.
For other filesystems, the only sane remedy is likely to rebuild the filesystem on a new disk.
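Since the remedy depends on the filesystem, it may help to check the type of / first. A minimal sketch, assuming GNU df with --output support; the function name fs_type is made up for illustration.

```shell
# Print the filesystem type of the filesystem containing the given path.
fs_type() {
    df --output=fstype "$1" | tail -n 1   # drop the header line
}

case "$(fs_type /)" in
    ext2|ext3|ext4) echo "ext2/3/4: an offline 'fsck -f -D' may apply" ;;
    *)              echo "not ext2/3/4: 'fsck -D' does not apply" ;;
esac
```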
To prevent a large root inode in the future, try to identify which program created so many names in / and stop it from doing so. It's likely that a program is storing its temp files there; configure it to use /tmp instead or, even better, a subdirectory of /tmp just for it, so that you don't have to interrupt other programs using /tmp if you ever need to rebuild the offending program's temp directory.
While looking for such files, use
ls -a /
to show hidden files. If that doesn't turn up anything, you might try wading through the output of
lsof / | grep -i del
; there may be files that had been created in /, opened, then unlinked so the name no longer shows up.
It turns out this was a broken uplink port on a switch. It has been repaired, and performance is now what we would expect.