On a daily basis we generate about 3.4 million small JPEG files. We also delete about 3.4 million 90-day-old images. To date, we've dealt with this content by storing the images in a hierarchical manner. The hierarchy is something like this:
/Year/Month/Day/Source/
This hierarchy allows us to effectively delete a day's worth of content across all sources.
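For illustration, here's a minimal sketch of what the daily purge could look like under that layout (the root path and retention window are assumptions, not our actual setup):

import shutil
from datetime import datetime, timedelta
from pathlib import Path

ROOT = Path(r"D:\images")   # hypothetical volume root
RETENTION_DAYS = 90

def purge_expired(root=ROOT, retention=RETENTION_DAYS):
    """Delete the directory for the day that just fell outside the window."""
    cutoff = datetime.now() - timedelta(days=retention)
    # /Year/Month/Day/ holds one subdirectory per source, so removing
    # the day directory drops that day's content across all sources.
    day_dir = root / f"{cutoff:%Y}" / f"{cutoff:%m}" / f"{cutoff:%d}"
    if day_dir.exists():
        shutil.rmtree(day_dir)

if __name__ == "__main__":
    purge_expired()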
The files are stored on a Windows 2003 server connected to a 14 disk SATA RAID6.
We've started having significant performance issues when writing to and reading from the disks.
This may be due to the performance of the hardware, but I suspect that disk fragmentation may be a culprit as well.
Some people have recommended storing the data in a database, but I've been hesitant to do this. Another thought was to use some sort of container file, like a VHD or something.
Does anyone have any advice for mitigating this kind of fragmentation?
Additional Info:
The average file size is 8-14KB
Format information from fsutil:
NTFS Volume Serial Number : 0x2ae2ea00e2e9d05d
Version : 3.1
Number Sectors : 0x00000001e847ffff
Total Clusters : 0x000000003d08ffff
Free Clusters : 0x000000001c1a4df0
Total Reserved : 0x0000000000000000
Bytes Per Sector : 512
Bytes Per Cluster : 4096
Bytes Per FileRecord Segment : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length : 0x000000208f020000
Mft Start Lcn : 0x00000000000c0000
Mft2 Start Lcn : 0x000000001e847fff
Mft Zone Start : 0x0000000002163b20
Mft Zone End : 0x0000000007ad2000
Diskeeper 2009 (now 2010) works well for defragmenting in real time with minimal impact on performance. However, there is a cost, as it is a commercial package. We tried several free apps and found significant performance issues.
Diskeeper Home page
I assume from your post that you're retaining 90 days' worth of images. Doing some quick math, it would appear that you need 4.28TB worth of storage. What are the I/O patterns like (i.e., is any of the data accessed more frequently)? How many volumes do you have this data spread across? How quickly does performance degrade to an unacceptable level after a defragmentation?
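For reference, here's the quick math behind that figure, using the upper end of the 8-14KB range (a rough estimate, not a measurement):

files_per_day = 3_400_000
retention_days = 90
avg_file_bytes = 14_000   # upper end of the 8-14KB range

total_bytes = files_per_day * retention_days * avg_file_bytes
print(f"{total_bytes / 1e12:.2f} TB")   # prints 4.28 TB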
If you're unwilling to make changes to the system (introducing a database), perhaps you should focus on how you can defragment in a manageable fashion with the tools that are bundled with the OS. Rotate and split the data across multiple, smaller LUNs so that you can defragment them individually. After you've finished writing X days' worth of data, move to the next LUN and defragment the volume holding the previous X days. If you're no longer writing to it, you shouldn't introduce any more fragmentation.
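As a rough sketch, something like the following could run after each rotation to defragment the LUN that just stopped receiving writes, using the defrag.exe that ships with the OS (the drive letters and rotation order here are hypothetical):

import subprocess

ROTATION = ["E:", "F:", "G:"]   # hypothetical LUNs, in write order

def defrag_previous(current_index):
    """Defragment the volume we just stopped writing to."""
    previous = ROTATION[(current_index - 1) % len(ROTATION)]
    # defrag.exe is bundled with Windows Server 2003; -v gives a verbose report.
    subprocess.run(["defrag", previous, "-v"], check=True)

# Example: writes have just rotated onto ROTATION[1], so clean up ROTATION[0].
defrag_previous(1)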
If you've been provided with a sizable budget, you might look at a storage medium that's impervious to fragmentation (such as an SSD).