I'm currently using Nginx to proxy requests to a Unicorn server running a Sinatra application. The application has only a couple of routes, each of which makes a fairly simple (inexpensive) query to a PostgreSQL database and returns the data as JSON. These services are being monitored by God.
I'm currently experiencing extremely slow response times from this application server. I have another two Unicorn servers being proxied via Nginx, and those are responding perfectly fine, so I think I can rule out any wrongdoing on Nginx's part.
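For context, the routes are all of this general shape. This is only a simplified sketch with placeholder names (and Sequel used purely for illustration), not the actual code:
require 'sinatra'
require 'sequel'
require 'json'

# Placeholder connection; the real app's connection details differ
DB = Sequel.connect(ENV['DATABASE_URL']) # e.g. postgres://user:pass@localhost/app_db

get '/records/:id' do
  content_type :json
  # One small lookup per request, returned as JSON
  DB[:records].where(id: params[:id].to_i).first.to_json
end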
Here is my God configuration:
# God configuration
APP_ROOT = File.expand_path '../', File.dirname(__FILE__)

God.watch do |w|
  w.name = "app_name"
  w.interval = 30.seconds # default

  w.start = "cd #{APP_ROOT} && unicorn -c #{APP_ROOT}/config/unicorn.rb -D"
  # -QUIT = graceful shutdown, waits for workers to finish their current request before finishing
  w.stop = "kill -QUIT `cat #{APP_ROOT}/tmp/unicorn.pid`"
  w.restart = "kill -USR2 `cat #{APP_ROOT}/tmp/unicorn.pid`"

  w.start_grace = 10.seconds
  w.restart_grace = 10.seconds
  w.pid_file = "#{APP_ROOT}/tmp/unicorn.pid"

  # User under which to run the process
  w.uid = 'web'
  w.gid = 'web'

  # Cleanup the pid file (this is needed for processes running as a daemon)
  w.behavior(:clean_pid_file)

  # Conditions under which to start the process
  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.interval = 5.seconds
      c.running = false
    end
  end

  # Conditions under which to restart the process
  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.above = 150.megabytes
      c.times = [3, 5] # 3 out of 5 intervals
    end

    restart.condition(:cpu_usage) do |c|
      c.above = 50.percent
      c.times = 5
    end
  end

  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart]
      c.times = 5
      c.within = 5.minute
      c.transition = :unmonitored
      c.retry_in = 10.minutes
      c.retry_times = 5
      c.retry_within = 2.hours
    end
  end
end
Here is my Unicorn configuration:
# Unicorn configuration file
APP_ROOT = File.expand_path '../', File.dirname(__FILE__)

worker_processes 8
preload_app true

pid "#{APP_ROOT}/tmp/unicorn.pid"
listen 8001

stderr_path "#{APP_ROOT}/log/unicorn.stderr.log"
stdout_path "#{APP_ROOT}/log/unicorn.stdout.log"

before_fork do |server, worker|
  old_pid = "#{APP_ROOT}/tmp/unicorn.pid.oldbin"
  if File.exists?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end
end
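There is no after_fork block in this file. Since preload_app is true, any PostgreSQL connection opened by the master before forking would be shared by the workers unless each worker reconnects; a per-worker reconnect hook would look roughly like the sketch below, with Sequel's DB constant standing in for whatever database layer the app actually uses.
# Hypothetical addition, not part of the configuration above
after_fork do |server, worker|
  # Drop any connection inherited from the preloaded master; Sequel will
  # open a fresh connection for this worker on first use
  DB.disconnect if defined?(DB)
end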
I have checked the God status logs, but it appears CPU and memory usage are never out of bounds. I also have something in place to kill high-memory workers, which can be found on the GitHub blog page here.
When running a tail -f on the Unicorn logs I do see some requests, but they are few and far between, whereas before this trouble arrived I was seeing around 60-100 requests a second. The log also shows workers being reaped and started as expected.
So my question is: how would I go about debugging this? What are the next steps I should be taking? I'm extremely baffled that the server sometimes responds quickly, but at other times is very slow for long stretches (which may or may not be peak traffic times).
Any advice is much appreciated.
I would start by looking at general system health with tools like atop. Next, I would look closely at what PostgreSQL is doing. If that doesn't reveal the problem, I would use tcpdump (or the nicer tshark) to watch the traffic between your browser, nginx, and Unicorn. In conjunction with that, I would try strace on the nginx and Unicorn processes.
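It can also help to measure request time from inside the app itself, so you can tell whether the time is going into Sinatra/PostgreSQL or being lost somewhere in front of Unicorn. A minimal sketch of a Rack middleware for that (the class name and log format are placeholders):
require 'logger'

# Hypothetical timing middleware: logs how long each request spends in the
# Rack/Sinatra stack
class RequestTimer
  def initialize(app, logger = Logger.new($stdout))
    @app = app
    @logger = logger
  end

  def call(env)
    started = Time.now
    status, headers, body = @app.call(env)
    elapsed_ms = ((Time.now - started) * 1000).round(1)
    @logger.info "#{env['REQUEST_METHOD']} #{env['PATH_INFO']} -> #{status} (#{elapsed_ms}ms)"
    [status, headers, body]
  end
end

# In config.ru, before the app is run:
#   use RequestTimer
#   run Sinatra::Application
If the middleware consistently reports fast times while clients see slow responses, the delay is upstream of the app (Nginx, the listen socket backlog, or saturated workers); if it reports slow times, look harder at the database.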