Woke up this morning with a page that our cluster was down. It came back up right away. I found log error logs with entries about IO taking longer than 15 seconds. Our monitoring server had tried to ping the server and had a timeout error.
I checked one of our monitoring tools to see what was going on at 4:30 in the morning. It seems to be statistics being updated on one of our large databases. The tool shows our disk being maxed out. I see very high % busy times for one of the disks.
Now sqlagent is progressing through subsequent databases alphabetically doing the same thing! We do have auto update stats on - but I thought that happened on a as-needed basis. I don't have any statistics update jobs enabled right now(that I know of - and the job monitor doesn't show any running jobs), so I'm not really sure whats causing this. Also, it's the middle of the night, so there wouldn't be users on these systems updating rows.
http://support.microsoft.com/default.aspx?scid=kb;en-us;195565 - confirms my thoughts on the as-needed nature of autostats.
The same thing also happened last night around 6:30pm - on the same large database - a few select statsman from... statements.
sqlagent is running the commands.
The disks are on a SAN and we're running the latest version of sql 2005.
Create an event notification. The MSDN list of DDL events doesn't list the statistics event(s), but they are valid sources of DDL notifications and will fire. Your notification message will contain the SPID, the LOGIN and and the actual T-SQL statement that triggered the statistics operation.
Turn off auto-update stats tonight and see if the problem is replicated. If not, then you may have to manage the stats updates yourself.