SQL 2005 Management Studio
I have a replication job that is flooding the log with errors. We took the replication destination offline on purpose and we are aware of the issue. We dont want to delete the replication job yet though because (I'm told) there's no "disable" for replication jobs... It's either there, or it isn't; and they don't want it to do "isn't" until the new server holds its water for a while.
Is there a way to hide the constant errors it's spilling into the logs? A filter view? Maybe an app that you all love for parsing these things?
Best answer to anybody who can recommend a good program for checking jobs and logs across SQL 2005 and SQL 2000 machines :)
Thanks!
I wrote a sproc that sucks the error log into a table in my nefarious-DBA-purposes database and then emails me the contents that don't match a table of message patterns to ignore. It runs every half hour on each of my servers. I don't have to ever go look at the logs or at a tool to see if there are errors I should be investigating, plus I have a record of the errors after the error logs are cycled off disk.
A second sproc runs every two hours and emails me about jobs that failed or ran longer than usual since the last time it ran, or are disabled (and don't have an entry in a table indicating that they're allowed to be disabled).
If the replication job is like all other sql jobs there should be a built in filter in the job history and once set, the filter will be active as long as the sql management studio is not closed.
Two things: first, don't try to troubleshoot SQL Server via its own error logs first. The big errors go into the Windows application event logs. The nice thing about checking that first is that it's standard across lots of applications, so you can use the same tools to monitor lots of applications.
Second, you asked about the best job and log checking across lots of SQL Server 2000 and 2005 machines, so I'll cast my vote for Quest Spotlight on SQL Server. Disclaimer: I work for Quest, hahaha. Anyway, it gives you a graphical dashboard showing all of your servers with simple red/yellow/green color-coding, and then you can drill down into each server to find out what jobs and things have failed.
For the logging question... There are a couple of great tools for searching the logs if you are interested in rolling your own:
Logparser 2.2 is a very useful tool that you can use to filter the SQL Error Log, the SQL Agent Log and the Windows event log. It's available from Microsoft download, has help files, and usage examples are pretty readily available with a little searching.
You can also write scripts to read all of these logs with Powershell! Use the SMO interface and the Server.ReadErrorLog method to get to the SQL Error Log: http://forums.iis.net/p/1145912/1855025.aspx#1855025
For the replication issue... It sounds like you are saying you took the subscriber offline. Is this transactional replication? You should be able to just stop the distribution agent to tell it to stop sending transactions to the subscriber, which should stop the errors. Just start it back up when you bring your subscriber back. Be careful to look at how long commands will hang out on the distributor before they are deleted, if you elapse that you will have to reinitialize.
For the larger issue... It sounds like you may be managing a SQL Server environment without a local DBA. If this is the case, invest in training, training, training! I would also advise looking at free trials of more comprehensive tools, check out SQL Server magazines to see a lot of listings for what's out there. I believe you can get free evaluations of most.