I have a potential client with a PHP site that performs fine most of the time. However, every week or so it experiences lag (slow page loading). I am sure a myriad of things could be causing this (network issues, a bad installation, a specific PHP file, increased traffic load), but I need a way to work out which one it is. Is there any server monitoring software made especially for these situations?
PS: The server runs Linux.
I would find out the following:
If the slowdown is always on Friday at quitting time and the application is used by users to enter their time card data for the week, it might simply be that the server needs more CPU, memory, and/or bandwidth to take the load of all the last-minute users. Suffice it to say, those types of patterns will be hard to track down without knowing the ins and outs of the application, its users, and how they use it.
In order to recommend tools, we'd need to know what OS your app is running on: Windows/IIS or Linux/Apache? However, in my anecdotal experience, site slowdown is caused by one of a few things:
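To check for that kind of weekly pattern, one option is to bucket slow requests by weekday and hour. The following is only a minimal sketch: the function name, the 2-second threshold, and the sample data are all made up for illustration, and it assumes you can already extract (timestamp, response time) pairs from your access logs or an external monitor.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def slow_request_buckets(samples, threshold_seconds=2.0):
    """Count slow requests per (weekday name, hour) bucket.

    samples: iterable of (ISO-8601 timestamp string, response seconds).
    threshold_seconds: hypothetical cutoff for what counts as "slow".
    """
    buckets = Counter()
    for timestamp, seconds in samples:
        if seconds >= threshold_seconds:
            when = datetime.fromisoformat(timestamp)
            buckets[(DAYS[when.weekday()], when.hour)] += 1
    return buckets

# Hypothetical measurements, not real data.
samples = [
    ("2024-03-01T16:55:00", 4.2),  # a Friday
    ("2024-03-01T16:58:00", 3.7),  # same Friday
    ("2024-03-04T10:15:00", 0.3),  # a Monday, fast request
    ("2024-03-08T16:50:00", 5.1),  # the following Friday
]

# Print the buckets with the most slow requests first.
for (day, hour), count in slow_request_buckets(samples).most_common():
    print(f"{day} {hour:02d}:00  {count} slow request(s)")
```

If one bucket dominates over several weeks, that points to a usage pattern (or a scheduled job) rather than a random fault.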
SELECT * FROM TableXYZ
The most common things to check (for performance related problems) are
For us in the MSSQL environment, having the auto-grow option enabled on a busy DB server can cause a random slowdown as well. During an auto-grow, disk I/O is very heavy while the file is being expanded, so any real-time transactions occurring during that time will be noticeably slower, especially if the server is already running near its I/O limits.
To work around this, we simply raised the auto-grow increment (in MB) to a ridiculously large number so that it now only triggers about once a year. It is still a slowdown, but it no longer happens every week.
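The arithmetic behind that is simple: the time between auto-grow events is roughly the increment divided by how fast the database grows. A rough sketch, with every number hypothetical (the growth rate and both increments are invented for illustration, not taken from our server):

```python
def days_between_autogrow(increment_mb, daily_growth_mb):
    """Approximate days between auto-grow events, assuming steady growth."""
    return increment_mb / daily_growth_mb

# Hypothetical figures: a database growing by about 50 MB per day.
daily_growth_mb = 50

for increment_mb in (256, 20_000):
    days = days_between_autogrow(increment_mb, daily_growth_mb)
    print(f"{increment_mb} MB increment: about {days:.0f} days between grows")
```

Real growth is rarely steady, so treat the result as a ballpark figure for picking an increment, not a schedule.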
Do you have shell access?
Is sysstat installed?
If you have shell access, make sure sysstat is installed and enabled. Wait an hour or so for it to collect data, then type "sar" as root from the command line. If you see data, great. Now leave it running, and read through some sysstat or sar tutorials in the meantime. When you see lag again, connect, run sar, and see exactly what is slowing down and when. Knowing exactly when it happens and which resource is being starved (CPU, memory, I/O, or network) will give you a much better idea of what to look at.
How do you measure the slowdown? Do you use external web monitoring tools like Keynote or AlertFox? Their results would be useful to compare with the internal logs.
A really great little tool for system monitoring (either ad hoc on the command line, or for trends if you set it up to run in the background and log data) is dstat. It's the best tool I've found for the command line.
One benefit is that its only dependency is Python, so you can run it right out of your home directory on a system you have limited access to, and most functionality doesn't require root.
Here's a favorite alias to give you real-time usage of the major components (and the heaviest hitting processes) each second: