I have a Grails web application (just a standard WAR file) deployed on an Ubuntu 10.10 server running Tomcat 6. My database is PostgreSQL.
The problem is that every so often (once or twice a day, after a period of inactivity) the application freezes when I try to log in. I can navigate to the login page, but when I submit the login (the first time the DB is hit, which might be a clue?) the application hangs indefinitely. There is no 500 response code; the browser just waits and waits.
I followed the instructions detailed here
because the problem described sounded the same as mine. My GC logging showed no long-running GC pauses; all were under a second.
When the application freezes, the jmap heap output is:
using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 536870912 (512.0MB)
   NewSize          = 21757952 (20.75MB)
   MaxNewSize       = 87228416 (83.1875MB)
   OldSize          = 65404928 (62.375MB)
   NewRatio         = 7
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 85983232 (82.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 19595264 (18.6875MB)
   used     = 11411976 (10.883308410644531MB)
   free     = 8183288 (7.804191589355469MB)
   58.23843965562291% used
Eden Space:
   capacity = 17432576 (16.625MB)
   used     = 9249296 (8.820816040039062MB)
   free     = 8183280 (7.8041839599609375MB)
   53.05754009046053% used
From Space:
   capacity = 2162688 (2.0625MB)
   used     = 2162680 (2.0624923706054688MB)
   free     = 8 (7.62939453125E-6MB)
   99.99963008996212% used
To Space:
   capacity = 2162688 (2.0625MB)
   used     = 0 (0.0MB)
   free     = 2162688 (2.0625MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 101556224 (96.8515625MB)
   used     = 83906080 (80.01907348632812MB)
   free     = 17650144 (16.832489013671875MB)
   82.62032270912317% used
Perm Generation:
   capacity = 85983232 (82.0MB)
   used     = 62866832 (59.95448303222656MB)
   free     = 23116400 (22.045516967773438MB)
   73.1152232100324% used
Does anyone know what "From Space:" is?
Any ideas for further fault finding? I don't have much experience with this type of troubleshooting.
Your delay sounds way too long to be GC-related. I would add some instrumentation code to the login path and measure things like database response time and page response time. Then reproduce the problem manually or with a load-testing tool like The Grinder.
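As a rough illustration of that kind of instrumentation, here is a minimal Java sketch of a timing wrapper. The class name LoginTiming, the Timed holder, and the "user lookup" step are all made up for this example and are not part of Grails or Tomcat; the sleep simply stands in for whatever database call the login performs. It logs before and after each step, so a step that never returns leaves a "starting" line with no matching "took" line.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Grails or Tomcat.
public class LoginTiming {

    /** Holds the value a timed step returned and how long the step took. */
    public static final class Timed<T> {
        public final String label;
        public final T value;
        public final long elapsedMillis;

        Timed(String label, T value, long elapsedMillis) {
            this.label = label;
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the step, measures wall-clock time, and logs start and finish to stderr. */
    public static <T> Timed<T> time(String label, Callable<T> step) throws Exception {
        // Logged up front so a hang is visible even if the step never returns.
        System.err.println("[timing] " + label + " starting");
        long start = System.nanoTime();
        T value = step.call();
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.err.println("[timing] " + label + " took " + elapsedMillis + " ms");
        return new Timed<T>(label, value, elapsedMillis);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real database lookup; the sleep simulates a slow query.
        Timed<String> result = time("user lookup", new Callable<String>() {
            public String call() throws Exception {
                Thread.sleep(50);
                return "alice";
            }
        });
        System.out.println(result.value + " loaded in " + result.elapsedMillis + " ms");
    }
}
```

Wrapping the first database call made during login in something like time(...) should show whether the freeze happens inside that call or somewhere else in the request.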
Also, what are you running all of this on? Dedicated hardware or a VM?
HTH!
Tom Purl