I know that there are really three kinds of fragmentation that I need to be concerned about as a DBA:
Index Fragmentation in the SQL data files, including clustered index (table) fragmentation. Identify this using DBCC SHOWCONTIG (in SQL 2000) or sys.dm_db_index_physical_stats (in 2005+); see the sketch after this list.
VLF Fragmentation inside of SQL Log files. Run DBCC LOGINFO to see how many VLFs are in each of your SQL log files.
Physical file fragmentation of the database files on the hard drive. Diagnose this by using the "Disk Defragmenter" utility in Windows. (inspired by this excellent blog post)
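For reference, a minimal sketch of the first two checks, run in the context of the database in question (names and thresholds are up to you):

-- 1. Index fragmentation per index (SQL 2005+)
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- 2. VLF fragmentation: one row per VLF in the current database's log,
--    so a very large row count means the log grew in many small increments
DBCC LOGINFO;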
A lot of attention is paid to index fragmentation (see this excellent ServerFault answer from Paul Randal), so that's not the focus of my question.
I know I can prevent physical fragmentation (and VLF fragmentation) when the database is originally created by planning reasonable expected data file and log sizes, because this fragmentation occurs most often from frequent grows and shrinks, but I have some questions about how to fix physical fragmentation once it is identified:
First of all, is physical fragmentation even relevant on an Enterprise SAN? Can I/should I use Windows Defragmenter on a SAN drive, or should the SAN team be using internal defragmenting utilities? Is the fragmentation analysis I get from the Windows tool even accurate when run on a SAN drive?
How big of a deal is physical fragmentation to SQL performance? (Let's assume an internal drive array, pending the outcome of the prior question.) Is it a BIGGER deal than internal index fragmentation? Or is it really the same kind of problem (the drive having to do random reads instead of sequential reads)?
Is defragmenting (or rebuilding) indexes a waste of time if the drive is physically fragmented? Do I need to fix the one before I address the other?
What's the best way to fix physical file fragmentation on a production SQL box? I know I can turn off SQL services and run Windows Defrag, but I also heard about a technique where you do a full backup, drop the database, then restore from the backup to an empty drive. Is this latter technique recommended? Does restoring from a backup like this also build indexes from scratch, eliminating internal index fragmentation? Or does it simply return the page order to the same as when the backup was taken? (We're using Quest Lightspeed backups with compression, if that matters.)
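To be concrete, the drop-and-restore technique I mean would look roughly like this (MyDB, the logical file names, and the paths are just placeholders, and this shows a plain native backup rather than Lightspeed):

-- 1. Take a full backup
BACKUP DATABASE MyDB TO DISK = N'E:\Backups\MyDB_full.bak' WITH INIT;

-- 2. Drop the database
DROP DATABASE MyDB;

-- 3. Restore onto the freshly emptied drive
--    (check the logical file names first with RESTORE FILELISTONLY)
RESTORE DATABASE MyDB
FROM DISK = N'E:\Backups\MyDB_full.bak'
WITH MOVE N'MyDB_Data' TO N'D:\SQLData\MyDB.mdf',
     MOVE N'MyDB_Log' TO N'L:\SQLLogs\MyDB_log.ldf',
     STATS = 10;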
UPDATE: Good answers so far on whether to defragment SAN drives (NO) and whether index defragmentation is still worthwhile on physically fragmented drives (YES).
Anyone else care to weigh in on the best methods for actually doing the defragmentation? Or an estimate of how long you'd expect it to take to defrag a large fragmented drive, say 500GB or so? Relevant, obviously, because that's how long my SQL server will be down!
Also, if anyone has any anecdotal info on SQL performance improvements you've made by fixing physical fragmentation, that would be great, too. Mike's blog post talks about uncovering the problem, but isn't specific about what kind of improvement it made.
I think this article gives an excellent overview of defragmentation of SAN drives:
http://www.las-solanas.com/storage_virtualization/san_volume_defragmentation.php
The basic point is that defragmenting is not recommended on SAN storage, because it is difficult to correlate the logical layout of a file with the physical location of its blocks on disk once the SAN has virtualized the storage it presents as the LUN.
If you were using raw device mappings, or you have direct access to the RAID set backing the LUN you are working with, I could see defragmentation having a positive effect, but if you're given a "virtual" LUN carved off a shared RAID-5 set, no.
Multiple parts to this question and answer:
Physical file fragmentation isn't really relevant for enterprise SAN storage, as Kevin already pointed out, so nothing to add there. It really comes down to the I/O subsystem and how likely you are to be able to make the drives go from more random I/Os to more sequential I/Os when performing a scan. For DAS, it's more likely you will; for a complex slice-n-dice SAN, probably not.
File-system level defragging - only do it with SQL shut down. I've never experienced problems myself here (as I've never performed an online, open-file defrag of SQL database files), but I've heard plenty of anecdotal evidence from customers and clients of weird corruption problems occurring. General wisdom is not to do it with SQL online.
Index fragmentation is completely orthogonal to file fragmentation. SQL Server has no idea of file fragmentation - there are too many virtualization layers in between for it to have any hope of working out the actual I/O subsystem geometry. Index fragmentation, however, SQL knows everything about. Without repeating too much from the answer you already referenced, index fragmentation will prevent SQL from doing efficient range-scan read-ahead, regardless of how fragmented (or not) the files are at the file-system level. So - absolutely you should mitigate index fragmentation if you're seeing degrading query performance; there's a sketch below.
You do not have to do these in any particular order, although if you take care of file-system fragmentation and then rebuild all your indexes and cause more file-system fragmentation by growing multiple files on a defragged volume, you're probably going to be ticked off. Will it cause any perf issues though? As discussed above, it depends :-D
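On the index fragmentation side, a minimal sketch of the usual reorganize-vs-rebuild decision (the table and index names are placeholders, and the 5%/30% cut-offs are commonly quoted guidelines rather than hard rules):

DECLARE @frag float;

SELECT @frag = avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.MyTable'), 1, NULL, 'LIMITED');

IF @frag >= 30
    ALTER INDEX PK_MyTable ON dbo.MyTable REBUILD;      -- heavy fragmentation: rebuild
ELSE IF @frag >= 5
    ALTER INDEX PK_MyTable ON dbo.MyTable REORGANIZE;   -- light fragmentation: reorganize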
Hope this helps!
I run Sysinternals' Contig on my database files.
See http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx
I would recommend sizing the database appropriately, shutting SQL Server down, copying the database file to another disk array, and then copying it back to defragment it. Much faster than using Windows Defrag in my experience.
I tried to defragment the physical disks in a SCSI solution once, but got little or no performance boost at all. The lesson I learned is that if you experience slow performance due to the disk system, it rarely has anything to do with physical fragmentation, at least as far as the data file is concerned, since access to it is largely random anyway.
If your indexes are defragmented and statistics are updated (very important) and you still see I/O as the bottleneck, then you are suffering from something other than physical fragmentation. Have you used more than 80% of the drive? Do you have enough drives? Are your queries optimized enough? Are you doing a lot of table scans, or even worse, a lot of index seeks followed by clustered index lookups? Look at the query plans and use SET STATISTICS IO ON to find out what's really going on with your queries (look for a high number of logical or physical reads).
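For example, a minimal way to capture those numbers for a suspect query (the query itself is just a placeholder; substitute one of your own):

SET STATISTICS IO ON;

SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o
WHERE o.OrderDate >= '20090101';

-- The Messages output now shows logical and physical reads per table:
-- consistently high physical reads point at the I/O subsystem or memory,
-- while high logical reads point at the query and indexing strategy.
SET STATISTICS IO OFF;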
Please let me know if I am completely wrong.
/Håkan Winther
Maybe the indexes are not optimized enough for your application. If you do not have Veritas I3 to optimize your database, you could use a statement like this to find missing indexes:
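Something along these lines, based on the missing-index DMVs in SQL 2005+ (the TOP 25 and the cost weighting are just a common heuristic, not anything exact):

SELECT TOP (25)
       mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)
           AS improvement_measure
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
  ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
  ON migs.group_handle = mig.index_group_handle
ORDER BY improvement_measure DESC;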
Or a statement like this to find indexes that are not used in SELECT statements and only decrease update/insert performance:
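For example, something like this against sys.dm_db_index_usage_stats (note it only covers activity since the last service restart, and the filters shown are just a starting point):

SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       ius.user_updates,
       ius.user_seeks,
       ius.user_scans,
       ius.user_lookups
FROM sys.dm_db_index_usage_stats AS ius
JOIN sys.indexes AS i
  ON i.object_id = ius.object_id AND i.index_id = ius.index_id
WHERE ius.database_id = DB_ID()
  AND i.is_primary_key = 0
  AND i.is_unique_constraint = 0
  AND ius.user_seeks + ius.user_scans + ius.user_lookups = 0
ORDER BY ius.user_updates DESC;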
I have some other SQL statements that I use when analyzing performance issues in the production environment, but these two are a good start, I think.
(I know this post is a little bit off topic, but I thought you might be interested, since it has to do with indexing strategy.)
/Håkan Winther