In our production environment (SQL Server 2008 Enterprise), we've been having issues with CPU usage and network utilization. Based on the specs of the (multiple) machines we have and the general functionality of our software product, load should not be an issue.
As a developer, I figured the problem must lie elsewhere, probably in the queries that are being run. I thought it would be a good idea to attach the SQL Profiler to a production instance to figure out which queries (or even code) are the hotspots, and work on optimizing those areas first. (None of our hardware/network guys have done this before.)
I attached the profiler to one of our development instances for a couple hours, and determined we have a few problem areas:
- Queries being executed inside loops -- it was common to see the same few queries being executed tens of thousands of times.
- Queries that don't get cached -- I saw a few
SELECT *
queries show up; there were also quite a few that are non-parameterized. This is a large subset of #1, where a query would only vary by a singleWHERE
parameter. - Long-running queries -- ones that are actually taxing the CPU; these may be inefficiently written, not utilizing indexes, etc. The hit count on these was much lower.
Based on the summary statistics I generated, the biggest problem queries are of type #1 and #2. However, I can't immediately take those results and start working because the development instance queries are essentially what the other devs are working on at the time, not what is going to be run most in production.
I've been reading that attaching the SQL Server Profiler to an instance is problematic because of overhead. For obvious reasons, attaching in a production environment should be as lightweight as possible.
What I need to know is this -- given the three different types of queries I need to spot, how do I set up the profiler to have a minimal impact on performance? Is there another tool that I could use to accomplish this?
Notes:
- Our production environment has definite on- and off-peak times, so it may be possible to get a decent sample by capturing only 30-60 minutes of data during a peak time.
- I'd like to log to a database -- after playing around with the profiler, that log format is the easiest to consume later on.
Instead of attaching the SQL PRofiler, use the
sp_trace_xxx
procedures. The reason why the SQL Profiler is not recommended in production is that it can cause the server to block on waiting for network communication with the Profiler to flush ('network' can mean local shared memory, the location is not relevant). The idea is that a single threaded GUI managed process won't be able to keep up with a full speed stream of events from a multi-threaded native highly optimized server. By using the Trace procedures instead you are taking the GUI/Profiler out of the equation and you let the server flush the events as fast as they can be written to disk on the server, which is always much faster than how the Profiler would be able to process them. If you add a decent filter on the events (eg. only trace RPC:Completed and Batch:Completed events with a duration higher than > 1 second or something similar) then you won't have any serious impact on the server.The GUI Profiler is capable of saving a trace as a series of sp_trace calls for you, so you don't have to manually create the script.
Ideally, you could also have a staging or test environment, where you put code that you want to test, not develop, and then put a production-level load against it. In this case, you would first load a copy of production code and databases, start the profiler, and then let the users or testers use it like they would the production system.
Once you've done that, and optimized some things, you load that code back into your test environment, and test it again, to see if you find another bottleneck.
Now, if you can't do that, you should have an idea from running the profiler in your dev environment if it would be lightweight enough (or not) to try to run against production directly.
Have the profiler run from a different server/work station.