Have any of you experienced the following, and have you found a solution:
A large part of our website's back-end is MS SQL Server 2005. Every week or two the site begins running slower, and I see queries taking longer and longer to complete in SQL. I have a query that I like to use:
USE master;
SELECT s2.text, wait_time, blocking_session_id AS "Block",
       percent_complete, *
FROM sys.dm_exec_requests
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS s2
ORDER BY start_time ASC;
It's fairly useful: it gives a snapshot of everything that's running against your SQL Server at that moment. What's nice is that even if your CPU is pegged at 100% for some reason and Activity Monitor refuses to load (I'm sure some of you have been there), this query still returns, and you can see what query is killing your DB.
When I run this (or Activity Monitor) during the times SQL has begun to slow down, I don't see any specific queries causing the issue - they are ALL running slower across the board. If I restart the MS SQL service, everything is fine and speeds right up - for a week or two, until it happens again.
Nothing that I can think of has changed, but this just started a few months ago... Ideas?
--Added
Please note that when this database slowdown happens, it doesn't matter whether we are getting 100K page views an hour (a busier time of day) or 10K page views an hour (a slow time): the queries all take longer to complete than normal. The server isn't really under stress - the CPU isn't high and disk usage doesn't seem to be out of control... It feels like index fragmentation or something of that sort, but that doesn't seem to be the case.
As far as pasting the results of the query above: I really can't do that. It lists the login of the user performing the task, the entire query text, etc., and I'd rather not hand out the names of my databases, tables, columns and logins online :) I can tell you that the queries running at that time are normal, standard queries for our site that run all the time - nothing out of the norm.
--March 24th
It's been about two weeks since the last reboot. I made several changes: I found a few queries where we were making heavy, totally unnecessary use of temp tables and had our developers change how they were doing it. I resized some of the constantly (slowly but surely) growing databases to something sensible for their growth. I also adjusted the autogrowth settings for everything to be more intelligent (they were ALL set to 1MB growth). Lastly, I cleaned up msdb a bit: we do log shipping and really didn't need to keep years and years' worth of backup points, so I've written some scripts that keep this to only a few months. I'll keep updating this thread, as it's too early to tell if the problem is solved yet.
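If anyone wants to do the same, the msdb cleanup boils down to the built-in history-pruning proc on a schedule - something along these lines (the three-month cutoff is just an example):
-- prune msdb backup/restore history older than roughly three months
DECLARE @cutoff datetime;
SET @cutoff = DATEADD(month, -3, GETDATE());
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @cutoff;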
We found it. It turned out to be a web server with a problem in one of its app pools. It would get stuck running the same set of queries over and over (which happened to make heavy use of temp tables). It would just loop and loop and eventually cause the SQL server to be sad. Once the offending machine / app pool was found and 'put down', everything was resolved.
You have to ask yourself, what happens at a SQL service restart? Lots of stuff, but two relevant points come to mind:
1) SQL memory is freed.
It's possible (not sure how likely) that if your max server memory setting is too high, the SQL service grows to use all available memory and Windows starts swapping important stuff out to the page file. Check that max server memory is set to a reasonable value, leaving enough memory for whatever else needs to run on that box (is it a dedicated SQL Server, or is it also the app server?).
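For reference, something like this checks and caps it (the 12 GB figure is just an example - size it for your box):
-- view the current setting ('max server memory (MB)' is an advanced option)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)';
-- cap it, leaving headroom for the OS and anything else on the machine
EXEC sp_configure 'max server memory (MB)', 12288;
RECONFIGURE;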
2) TempDB is rebuilt from the default sizes.
Check your tempdb file sizes, especially the initial size and growth increment of the tempdb log file. If the growth increment is set too LOW, the log can build up some incredible internal (VLF) fragmentation, which can dramatically slow down normal usage. See these two excellent blog articles by Kimberly Tripp.
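Something like this shows what tempdb will come back as after a restart, and lets you bump the sizes so it doesn't start tiny and grow in small steps (tempdev/templog are the default logical file names, and the sizes are placeholders):
-- current tempdb file sizes and growth settings (size is in 8KB pages)
SELECT name, size, growth, is_percent_growth
FROM tempdb.sys.database_files;
-- give it sensible starting sizes and fixed growth so a restart doesn't reset it to something tiny
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 4096MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 1024MB, FILEGROWTH = 256MB);
-- DBCC LOGINFO returns one row per VLF; thousands of rows means a badly fragmented log
USE tempdb;
DBCC LOGINFO;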
Do you make heavy use of temporary tables or cursors? Check that any cursors are being closed and deallocated correctly. Also watch out for linked servers - we have to use a buggy driver for an old linked Informix server, and it periodically means we have to reboot the server.
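On the cursor point, the pattern to look for is that every OPEN ends up matched by CLOSE and DEALLOCATE - a throwaway sketch (the cursor and its query are just placeholders):
DECLARE @name sysname;
DECLARE db_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases;
OPEN db_cur;
FETCH NEXT FROM db_cur INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- ... per-row work goes here ...
    FETCH NEXT FROM db_cur INTO @name;
END
CLOSE db_cur;       -- releases the result set
DEALLOCATE db_cur;  -- releases the cursor itself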
If it looks weird then look for the weird.
If tweaking SQL Server settings doesn't help, try Windows Task Manager: go to the Processes tab, then View > Select Columns, and add CPU Time, Handles, the I/O Reads/Writes/Other counters and the memory columns.
Go back to the process list. For each column, sort highest to lowest and look at the top 5 processes. Anything out of the ordinary? e.g. a memory leak in a process will show up as a bizarre number of handles. We have some *ki printers which add a handle to the DCSLoader process every 2 seconds. After a few weeks a machine will show lots of free memory and CPU, but one process with 100,000 handles, and it will barely move the mouse pointer.
Check your scheduled tasks list too. Tell your AV not to scan .mdf files.
Dave,
Have you checked the wait stats? The query you gave above includes the 'last_wait_type' column; it may have some detail on what the queries are waiting for (network, CPU, etc.).
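For the bigger picture, something like this shows what the instance as a whole has been waiting on since the last restart (which is also why a service restart resets the numbers):
-- top waits accumulated since the SQL service last started
SELECT TOP 10 wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;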
If your database's recovery model is FULL, does taking a backup of the DB and then a backup of the transaction log improve things at all? On a system that is running out of disk space, this kind of thing might explain the problem.
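Roughly this, i.e. a full backup followed by a log backup so the log space can be reused (database name and paths are placeholders):
-- check the recovery model first
SELECT name, recovery_model_desc FROM sys.databases;
-- full backup, then a transaction log backup
BACKUP DATABASE YourDb TO DISK = 'D:\Backups\YourDb.bak';
BACKUP LOG YourDb TO DISK = 'D:\Backups\YourDb_log.trn';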
I seem to have a configuration very similar to yours (16 GB upgraded to 32 GB, an MD1000 with a terabyte of disks, dual quad-core Xeons).
The only thing that has helped me diagnose bizarre problems like that in the past is beta_lockinfo by Erland Sommarskog. Run it during a slow period and compare against a normal one.
Also I've had an insane amount of problems with SQL 2005 before SP2, but SP3 is really stable.
Hope this gives more useful info:
Make sure the DB is OK with something like:
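DBCC CHECKDB WITH NO_INFOMSGS;  -- consistency check for the current database; run it against each user database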
Keep an eye on log space with something like:
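DBCC SQLPERF(LOGSPACE);  -- log size and percent used for every database on the instance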
If you see expansion going on, that will definitely slow things down. If you run this you will see your log space get nearer and nearer to 100%, then the log will expand and the percentage will shrink because it's got some space again. Hopefully you'll never see it expand before your backup kicks in and clears down the log.
Mostly idiot configuration. Happens.
First, you should actually run an index defrag regularly as part of a maintenance job. Schedule it just before or after you make backups.
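Something along these lines will do as the scheduled job - the 10% threshold and the table name are placeholders, and heavily fragmented indexes (say over 30%) are better served by REBUILD than REORGANIZE:
-- fragmentation overview for the current database
SELECT OBJECT_NAME(ips.object_id) AS table_name, i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 10;
-- then defragment the offenders, e.g.:
ALTER INDEX ALL ON dbo.YourTable REORGANIZE;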
Second, do not rely on autogrow for your database, and especially do not autoshrink it. Depending on load, autogrow / autoshrink are basically suicide settings.
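In practice that means turning auto-shrink off and pre-sizing the files so growth, if left on at all, is only an emergency fallback (database and file names are placeholders):
ALTER DATABASE YourDb SET AUTO_SHRINK OFF;
-- pre-size the files and use a fixed increment rather than 1MB or a percentage
ALTER DATABASE YourDb MODIFY FILE (NAME = YourDb_Data, SIZE = 20480MB, FILEGROWTH = 512MB);
ALTER DATABASE YourDb MODIFY FILE (NAME = YourDb_Log, SIZE = 4096MB, FILEGROWTH = 512MB);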
I've pretty much never seen a SQL Server slow down like that. Can you post the results of that query during times of high stress? Are you sure nothing on your end is overloading SQL Server at that time?