Background
Hello,
I have a server running on a free-tier EC2 instance, using nginx as the web server and Passenger/Rails as the application server.
The server receives very little legitimate traffic (it's still in development), but a reasonable amount of traffic from random bots. It also serves images from S3. The front end is served statically at mywebsite.com, defined in one server block, and the backend is served by Passenger at api.mywebsite.com.
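For reference, the relevant nginx config boils down to roughly this (the paths here are placeholders, and the real config has more in it):

    # Static front end
    server {
        listen 80;
        server_name mywebsite.com;

        root  /var/www/frontend/public;   # placeholder path
        index index.html;

        location / {
            try_files $uri $uri/ =404;
        }
    }

    # Rails API behind Passenger
    server {
        listen 80;
        server_name api.mywebsite.com;

        root /var/www/api/public;          # Passenger serves the Rails app from its public/ dir
        passenger_enabled on;
        passenger_app_env production;
    }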
The Problem
Seemingly at random, CPU usage on the server goes to 100%. The CPU spikes are correlated with spikes in network out, although the network-out spikes themselves look relatively small. When this happens, the front end can no longer be served, and I can't even SSH into the server to check what processes are running.
What I've tried
- Blocking malicious bots using this bad-bot-blocker (roughly what that configures is sketched after this list).
- Correlating network spikes to requests in the nginx access log at /var/log/nginx/access.log (the kind of summarizing I mean is also sketched below). Usually the correlation is pretty unclear.
- Looking at /var/log/nginx/error.log for anything relevant.
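As far as I understand it, the bad-bot blocker boils down to a user-agent blacklist along these lines (the patterns here are just examples; the real lists are much longer):

    # Flag requests whose User-Agent matches a known-bad pattern
    # (example patterns only; the actual blocklist is far longer).
    map $http_user_agent $bad_bot {
        default                             0;
        "~*(mj12bot|ahrefsbot|semrushbot)"  1;
    }

    server {
        # ... existing server block ...

        # Refuse flagged requests; 444 makes nginx close the connection
        # without sending a response.
        if ($bad_bot) {
            return 444;
        }
    }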
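For the correlation step, the access-log summarizing I mean is along these lines (this assumes the default combined log format):

    # Requests per minute, busiest minutes first
    # ($4 is the "[10/Oct/2023:13:55:36" timestamp field in the combined format)
    awk '{print substr($4, 2, 17)}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head

    # Requests per client IP, busiest clients first
    awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head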
When the CPU does this, I usually end up rebooting the instance from the EC2 console, which gets things working again but obviously isn't sustainable.
I'm new to deployment stuff/DevOps, so I was wondering if there's anything obvious I might be missing based on this information. I'm not even sure what layer is causing the problem (AWS/nginx/my rails backend/vanilla HTML/JS frontend). If there's any other information I can provide, please let me know.
Thanks,
Jacob