I am trying to identify whether the behaviour I am observing is expected, or whether WildFly is leaking file descriptors.
During our standard performance testing, after upgrading from WildFly 11 to 14, we ran into an issue with too many open files. After digging into things a bit more, it looks like it is actually the number of pipes that WildFly has open that keeps increasing.
To help reproduce the problem I have created a simple JSF 2.2 application that contains a large image (100 MB, to simplify testing). I am retrieving the image using the standard JSF resource URL:
/contextroot/javax.faces.resource/css/images/big-image.png.xhtml
I have also tried adding OmniFaces and using the unmapped resource handler URL:
/contextroot/javax.faces.resource/css/images/big-image.png
Adding OmniFaces did not change the behaviour I am seeing, and I have only included it because we first thought it might have been a contributing factor.
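The load itself is nothing more sophisticated than a handful of parallel GETs against that URL. A minimal sketch of such a driver (host, port and context root are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class BigImageLoad {
    public static void main(String[] args) {
        // Placeholder URL - adjust host/port/context root to the actual deployment.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/contextroot/javax.faces.resource/css/images/big-image.png.xhtml"))
                .build();

        // Fire 5 concurrent downloads of the large image and discard the bodies.
        CompletableFuture<?>[] downloads = new CompletableFuture<?>[5];
        for (int i = 0; i < 5; i++) {
            downloads[i] = client.sendAsync(request, HttpResponse.BodyHandlers.discarding());
        }
        CompletableFuture.allOf(downloads).join();
    }
}
```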
Behaviour I am seeing:
WildFly starts and jstack reports that it has two threads matching default task-* (two is the default value for task-core-threads).
If I send in 5 concurrent requests for my large image, 3 new default task-* threads are spawned to serve the requests. 3 new Linux pipes will also be created.
If I stop my requests and wait for 2 minutes (the default value for task-keepalive), 3 of the threads will be removed. The pipes remain open.
Periodically (I believe about every 4.5 minutes) some kind of clean-up occurs and the pipes that were left over from the step above are removed.
However... if one of the original 2 worker threads is among those removed (e.g. task-1, task-3 and task-4 are removed, leaving task-2 and task-5), the pipe associated with task-1 is never cleaned up.
Over time these pipes add up and as far as I can tell they are never removed. Is this a leak somewhere, and if so where? JSF? WildFly? Undertow?
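For reference, one way to watch the pipe count is to look at the fd symlinks under /proc. A minimal sketch (Linux only; reading another process's fd table requires running as the same user or root, and the class is purely illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class PipeCount {
    public static void main(String[] args) throws IOException {
        // Pass the WildFly PID as the first argument, or run inside the JVM with "self".
        String pid = args.length > 0 ? args[0] : "self";
        try (Stream<Path> fds = Files.list(Path.of("/proc", pid, "fd"))) {
            long pipes = fds.filter(fd -> {
                try {
                    // Each fd is a symlink whose target names the handle,
                    // e.g. "pipe:[123456]" or "anon_inode:[eventpoll]".
                    return Files.readSymbolicLink(fd).toString().startsWith("pipe:");
                } catch (IOException e) {
                    return false; // fd was closed while iterating
                }
            }).count();
            System.out.println("open pipe handles: " + pipes);
        }
    }
}
```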
Things I have tried:
WildFly 14, 17 and 18
With and without Omnifaces (2.7 and 3.3)
Changing the min and max threads to be the same - this prevents the handles from building up, but I'd rather not go down this route (the configuration change is sketched below)
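That last change was made roughly like this via jboss-cli; the attribute names are the ones exposed by the io subsystem worker, and the value is only an example:

```
/subsystem=io/worker=default:write-attribute(name=task-core-threads, value=16)
/subsystem=io/worker=default:write-attribute(name=task-max-threads, value=16)
reload
```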
I'm facing this kind of leak too. Handles are "lost" in triples of two pipes and one epoll selector (@Gareth: can you confirm this? Take a look at /proc/$PID/fd for pipes and anonymous inodes). From this it seems they are created by Java NIO channels.
I discovered that the handles are released (at least) by invoking a Full GC (@Gareth: can you confirm this?). I'm using a well-tuned Java 8 JVM with G1GC enabled, and as an enjoyable result a Full GC happens very seldom. But as a negative consequence, it consumes thousands of these file-handle triples in the meantime.
Because the handles are releasable, it's not a real leak but an effect of a Soft/Weak/Phantom-Reference.
We have reached the assigned OS limit (the JVM with WildFly runs inside an LXC container) twice in the last week. Therefore, as a first workaround for production, I wrote a watchdog which invokes a Full GC using jcmd if the number of pipe handles rises above a limit.
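In essence the watchdog does nothing more than the following (a rough sketch; the threshold and check interval are placeholder values):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class PipeWatchdog {
    public static void main(String[] args) throws Exception {
        String pid = args[0];          // PID of the WildFly JVM
        long limit = 2000;             // hypothetical pipe-handle threshold

        while (true) {
            if (countPipes(pid) > limit) {
                // Ask the target JVM for a full GC via jcmd; the handles are
                // released once the holding objects are collected.
                new ProcessBuilder("jcmd", pid, "GC.run").inheritIO().start().waitFor();
            }
            Thread.sleep(60_000);      // check once a minute
        }
    }

    private static long countPipes(String pid) throws IOException {
        try (Stream<Path> fds = Files.list(Path.of("/proc", pid, "fd"))) {
            return fds.filter(fd -> {
                try {
                    return Files.readSymbolicLink(fd).toString().startsWith("pipe:");
                } catch (IOException e) {
                    return false;
                }
            }).count();
        }
    }
}
```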
The leak is observed on a (load-balanced) pair of WildFly 13 instances running more than 20 applications. It does not seem to be tied to one particular application, because it also happens (on both WildFly instances of the pair) when I remove individual applications from load balancing (on one of the pair).
It does not show up on other pairs of our WildFly instances, but those run a different set of applications with other use cases. There is more memory churn and more "pressure" on the heap there; maybe that triggers the timely release of the objects holding the file handles in another way.
By taking a look at a heap dump with the Eclipse Memory Analyzer Tool, I was able to find a comparably high and equal number of instances of sun.nio.ch.EPollArrayWrapper and sun.nio.ch.EPollSelectorImpl, with inbound references to org.xnio.nio.NioXnio$FinalizableSelectorHolder.
I started encountering this issue after migrating the WildFly runtime from Java 8 to Java 11.
Java 11 uses G1 as its default GC algorithm (the default changed from the parallel collector in Java 9). In G1, old generation objects are collected very selectively and only once a certain heap occupancy threshold is passed. If you have few objects that get promoted to the old generation and help reach this threshold, it is possible that the org.xnio.nio.NioXnio$FinalizableSelectorHolder instances pile up there for very long periods, retaining the open file descriptors. In my case switching garbage collection to concurrent mark and sweep solved the problem, though I'm fairly certain G1 can be tuned to collect the old generation more aggressively.
-XX:-G1UseAdaptiveIHOP and -XX:InitiatingHeapOccupancyPercent are the switches to play with. Another approach might actually be reducing the size of the old generation heap (-XX:NewRatio) or of the entire heap.
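For example, something along these lines in standalone.conf makes G1 start its concurrent old-generation cycles at a fixed, lower occupancy instead of the adaptive default (the percentage is purely illustrative and needs tuning against the real workload):

```
# Illustrative only: fixed IHOP at 30% instead of G1's adaptive threshold
JAVA_OPTS="$JAVA_OPTS -XX:-G1UseAdaptiveIHOP -XX:InitiatingHeapOccupancyPercent=30"
```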