OK, some givens to factor in:
I use Backup Exec 12.x/13.x and have a Server 2003/2008 environment, including Exchange.
I have Backup to Disk (Full/Diff) happening that is independent of the backup to LTO (Full/Diff). For more than one reason I'd rather not switch to backing up from disk to tape; I'd like to keep the backup direct to LTO happening.
I currently have a single LTO-3 drive with no loader/robot/library of any sort. The box serving the LTO drive has an Adaptec 39160 Ultra160 SCSI card in it. I currently use one tape for Fulls (one per week) and one tape for Diffs (four days a week before the tape is taken out). The Full backup is bumping up against the 372.5GB barrier, and when it hits that barrier the backup doesn't finish Saturday; it's still waiting for a tape on Monday morning.
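If the 372.5GB figure looks odd next to LTO-3's advertised 400GB, that's just decimal versus binary units. A quick sketch of the arithmetic (my numbers; nothing vendor-specific beyond the published 400GB native capacity):

    # A "400 GB" LTO-3 tape is 400 * 10^9 bytes; Backup Exec reports
    # capacity in binary gigabytes (GiB), which is where 372.5 comes from.
    native_capacity_gb = 400                      # LTO-3 native capacity, decimal GB
    bytes_total = native_capacity_gb * 1000**3
    capacity_gib = bytes_total / 1024**3
    print(f"LTO-3 native: {native_capacity_gb} GB = {capacity_gib:.1f} GiB")
    # -> 372.5 GiB, the "barrier" the Full backup keeps hitting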
Ward mentioned putting the second LTO3 Full tape in on Monday afternoon/evening after normal business hours. The problem with this is shown by the comparison below:
Normal flow
- Friday insert LTO3 tape 1 for Full backup for week 1
- Monday insert LTO3 tape for differential
- Tuesday, Wednesday, Thursday differentials use tape that was inserted Monday
- repeat for week 2
2 LTO3 tapes for Full backup flow
- Friday insert LTO3 tape 1 for Full backup for week 1
- Monday insert LTO3 tape 2 for Full backup for week 1
- Monday insert LTO3 tape 1 for Full backup for week 1 (for verify process)
- Monday insert LTO3 tape 2 for Full backup for week 1 (for verify process)
- Tuesday insert LTO3 tape for Differential
- Wednesday, Thursday differentials use tape that was inserted Tuesday
- repeat for week 2
The extra tape swaps eat 6+ hours of Monday (starting from the time I put in the second tape). If I did that at 5PM I'd be here until almost midnight swapping tapes, and that's not counting the idle time on Sat/Sun/Mon waiting for a tape.
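To put rough numbers on that Monday timeline, here's a back-of-the-envelope sketch. The spillover size and both rates are assumptions based on what I typically see, not measurements of the verify pass:

    # Assumed figures: ~30 GiB spills to tape 2, the sources feed the drive
    # at ~900 MB/min, the verify pass reads at ~1500 MB/min, and each manual
    # swap costs ~15 minutes of noticing/walking/loading.
    spillover_mb = 30 * 1024        # data that didn't fit on tape 1
    tape1_mb     = 372.5 * 1024     # full tape 1, re-read during verify
    write_rate   = 900              # MB/min
    verify_rate  = 1500             # MB/min
    swap_minutes = 15               # per swap, 3 swaps in the 2-tape flow

    minutes = (spillover_mb / write_rate       # finish writing tape 2
               + tape1_mb / verify_rate        # verify tape 1
               + spillover_mb / verify_rate    # verify tape 2
               + 3 * swap_minutes)
    print(f"~{minutes / 60:.1f} hours after inserting tape 2")
    # With these assumptions it comes out near 6 hours: 5PM to almost midnight.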
Now I could turn off the verify process, which would save two tape swaps and shorten the "backup" process by several hours, but unless verification is off I can't just put tape 2 in and walk away at the end of the day. Having the backup spill over onto a 2nd tape lengthens the backup process, but it also:
- Increases the number of tapes in the rotation (cost)
- Increases the number of tapes in transport (size/weight of briefcase going to off-site storage)
- Increases the complexity of the backup process by making me stay on site after hours for verification process
- Increases the complexity of managing backup/restores from my office which is not right next to the server room. This goes quadruple for dealing with such issues from home.
And yes I'm not going in on Saturday to sit there for 6+ hours and babysit the tape drive. I'd like to have a life outside of work. 12 hour days M-F are bad enough when they happen. I'm not going to permanently tie myself to a 6 day workweek.
The tape drive is a Dell PowerVault 110T LTO3. The backup server is on Gigabit Ethernet using only a single NIC and can fill a full tape in about 12 hours.
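For reference, filling a 400GB native tape in about 12 hours works out to an aggregate rate far below the drive's streaming floor (the 40MB/s LTO-3 minimum comes from the Fujifilm PDF linked further down):

    # Convert "one tape per ~12 hours" into MB/min and MB/s.
    tape_native_mb = 400_000                     # LTO-3 native capacity, decimal MB
    minutes = 12 * 60
    mb_per_min = tape_native_mb / minutes
    print(f"{mb_per_min:.0f} MB/min = {mb_per_min / 60:.1f} MB/s")
    # -> ~556 MB/min = ~9.3 MB/s, well under the ~40 MB/s LTO-3 streaming
    #    minimum, so speed matching/shoeshining is essentially guaranteed.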
I can change the backup process to separate one of the more intensive servers out to a Full backup on its own LTO to hold this decision off temporarily, but soon I think I'll need to choose one of these options:
Buy an LTO-3 drive and take advantage of just having a second physical tape available. This is a less desirable option and only makes sense if LTO-3 drives are considerably cheaper than LTO-4 drives, which is not the case.
Buy an LTO-4 drive, use LTO-4 tapes for Full backups, and use LTO-3 tapes for differentials until the LTO-3 tapes get rotated out and new LTO4 tape prices match LTO3 tape prices. This will probably get me through the weekend backup for years to come without having to swap tapes. It also partially addresses shoeshining, since LTO4 has a lower minimum speed than LTO3.
Buy something that can feed tapes in automatically. I'm assuming there isn't something I can add to the PowerVault 110T, so this would mean purchasing a new device that has the drive and loader in a single unit. This is probably not cost effective versus just getting a drive and manually loading tapes, but going autoloading LTO4 would be the ultimate in convenience. I'll let the boss above me decide between a single tape drive and an autoloading drive.
Evan Anderson mentioned in another solution that you could buy drives around this price range
LTO-4 (internal drive, 1 tape / day) - $2,766.00
LTO-4 (autoloader, 1 tape / day) - $4,566.00
but I don't know specifics on what he or you would recommend for the actual drive and, if necessary, the controller. Show me a Newegg URL (or Dell, or HP, or whatever your favorite vendor would be) for your solution if you don't mind looking it up, or just give me a brand and a model number and I'll be glad to do the legwork myself.
I'm looking to make a needed purchase some time down the road before this backup rotation gets to be too cumbersome. I probably have a few months.
Xenny mentions the age of the servers and the speed of the backup. The Exchange server is 6 years old (though the hard drives are much newer). There are a couple of 4 year old servers in the mix with consumer grade SATA drives (WD6400AAKS). Servers I consider "new" are 2 years old at this point.
Backup to disk from the old Exchange server has been as fast as 2184 MB/min, but in general backup to disk is just as slow as backup to tape in this setup; in fact it is sometimes slower than backup to the LTO-3 tape drive. I've also had issues with drives failing and a lack of bays to add more drives. In general backup to disk is even more of an issue than the LTO3/4 transition, but that belongs in a different question on Server Fault if I wanted input on that subject.
I'll just pick some numbers from a recent backup to give you an idea of the speeds involved. This is not a complete list, but it shows the variety of speeds. I plan to update this soon in the format "oldspeed MB/min newspeed MB/min", where oldspeed is the old SCSI 320 LTO3 and newspeed is the SAS LTO4.
DC C: ~850 MB/min
DC system state ~700 MB/min
Exchange Server C: and system state ~500 MB/min ~600 MB/min
Exchange Server D: ~1400 MB/min ~1200 MB/min
Exchange Server First Storage Group ~1100 MB/min ~700 MB/min
Webserver C: ~600 MB/min ~950 MB/min
Webserver E: ~1700 MB/min ~1950 MB/min
Fileserver C: ~500 MB/min
Fileserver E: ~1500 MB/min ~2200 MB/min
Fileserver G: ~1800 MB/min ~2400 MB/min
Fileserver system state ~650 MB/min
faxserver C: ~400 MB/min ~550 MB/min
Accounting server C: ~1300 MB/min ~1775 MB/min
Accounting server D: ~1500 MB/min ~2250 MB/min
Accounting SQL instance ~1600 MB/min
application server C: and system state ~700 MB/min ~900 MB/min
backup server C: ~700 MB/min ~1800 MB/min
backup server E: ~1350 MB/min ~2900 MB/min
Monitoring the Fileserver I saw numbers that make me think the RAID controller is holding back the transfer rates. The controller is SATA 1.5 but the drives are 3.0 capable. I noticed this after changing volumes from RAID 1 to RAID 10 and seeing no increase in backup speed: doubling the sustained read speed unfortunately had no effect on backup to the LTO3 tape drive.
In general, backup straight to LTO gives me a decent benchmark of where my servers are I/O limited. The servers that are backing up below 1500 MB/min are generally slow disk-wise, and the ones between there and 2400 MB/min are still low-hanging fruit. For example, the Exchange 2003 server is getting low on disk space and continues to expand the database for the First Storage Group out to slower portions of the disks. This server will be replaced with an Exchange 2010 server with faster processors and more disks. The other servers will get disk upgrades and/or SSDs added.
http://en.wikipedia.org/wiki/Tape_drive mentions "When shoe-shining occurs, it significantly affects the attainable data rate, as well drive and tape life." but it doesn't mention shoe-shining reducing the effective capacity of a tape. After looking at archival tapes from the bank I can confirm about 2% to 15% of space wasted on the LTO3 tapes. That's nowhere near enough to keep me from moving to LTO4 or an autoloader, but it could be significant. For those of you with Backup Exec, you can calculate your shoeshining waste by:
- Making a backup job that will back up around 100% of the tape's native capacity without compression. Disable compression on both the drive and the software when running the test.
- Looking in the Media tab of Backup Exec and comparing the "Used Capacity" column to the "Data" column. If compression is off and the numbers match, you aren't shoeshining at all.
In my case I had an archival LTO3 tape with 272.4 GB "used" but only 233.67 GB "data", and another with 400.6 GB versus 395.19 GB. I also tried a backup to LTO4 without compression and got 833 GB "used" with only 786.77 GB "data". Obviously the shoeshining will vary from my environment to yours, but before this I didn't think to test it. Hopefully this makes it clear how to figure out how much wasted tape you have in your backup environment.
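Here's the same used-versus-data comparison as a quick calculation, using the three tapes above (values as reported in Backup Exec's Media tab):

    # Waste = (used - data) / used: the fraction of written tape that holds
    # gaps rather than backup data. Compression was off in each test.
    tapes = [
        ("archival LTO3 #1", 272.4, 233.67),   # (name, used GB, data GB)
        ("archival LTO3 #2", 400.6, 395.19),
        ("LTO4 test",        833.0, 786.77),
    ]
    for name, used_gb, data_gb in tapes:
        waste_pct = (used_gb - data_gb) / used_gb * 100
        print(f"{name}: {waste_pct:.1f}% wasted")
    # -> ~14.2%, ~1.4%, and ~5.6%, which is where the 2% to 15% range comes from.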
edit: new info at http://www.fujifilmusa.com/shared/bin/LTO_Overview.pdf showing minimum tape speeds for LTO3 and LTO4. It looks like the IBM LTO4 actually has a lower minimum speed than the IBM LTO3. Either way, my average server is too slow to feed LTO3/4 without shoeshining. I'm concerned even my local backup to disk volumes will be too slow to feed the drive quickly, but I'll have to test that.
Pulling the IBM full height drive info from the PDF above, I get:
LTO4 : 30-120MB/s 800GB native (45-240MB/s compressed)
LTO3 : 40- 80MB/s 400GB native (60-160MB/s compressed)
LTO2 : 18- 35MB/s 200GB native (27- 70MB/s compressed)
LTO1 : 15- 15MB/s 100GB native (30- 30MB/s compressed)
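And a tiny helper to check a measured backup rate against those speed-matching ranges (a sketch using the native figures above; below the minimum, the drive can't slow down any further and has to stop and reposition):

    # IBM full height native speed-matching ranges in MB/s, from the table above.
    RANGES = {"LTO4": (30, 120), "LTO3": (40, 80), "LTO2": (18, 35), "LTO1": (15, 15)}

    def streaming_check(mb_per_min, drive):
        lo, hi = RANGES[drive]
        mb_s = mb_per_min / 60
        if mb_s < lo:
            return f"{mb_s:.1f} MB/s is under the {lo} MB/s floor: {drive} will shoeshine"
        return f"{mb_s:.1f} MB/s is inside {drive}'s {lo}-{hi} MB/s streaming range"

    print(streaming_check(1400, "LTO3"))   # e.g. Exchange D: above -> shoeshines
    print(streaming_check(2400, "LTO4"))   # ~40 MB/s -> just inside LTO4's range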
Update: The server I was using for backup started giving me stop errors, so I moved the tape drive to another server. The old SCSI controller was an Adaptec Ultra160; the "new" controller is LSI based Ultra320 (at least I assume the external connector is 320, as the 4 hard drives inside the server show 320 SCSI in the server management).
The new server situation leaves me without backup to disk temporarily, until I get an external enclosure for direct attached storage. In general this LTO discussion has pointed me toward buying more hard drives for my servers. I will have work to do reconfiguring RAID arrays to increase the speed of the backup and hopefully increase the reliability of the overall setup.
Update 2: The comparison below uses an old fileserver whose RAID controller bottlenecks all transfers at ~40MB/s, so ideal would be about 2400MB/min. This is about the speed needed to test the edge of shoeshining; presumably the data flow will not be perfectly regular and will force speed matching almost all the way through the test.
I no longer know the buffer size and buffer count I used on the speed test of the old LTO3 drive, but it doesn't change much at all; I got maybe a 100MB/min gain by tuning buffers. The test data is about 20GB of scanned TIFs and JPGs. I did these tests on a Friday afternoon, and I didn't repeat them enough times to average the data or otherwise weed out invalid results. Testing after hours, choosing different data, and other variables could noticeably affect these tests.
The same servers are used in all tests. The old drive is on an Ultra320 SCSI LVD controller that is PCI-X; the new drive is on a PCIe LSI 3801E SAS controller. It is possible that the drive controller and/or the LTO3 tape drive are bottlenecks, but I won't be testing the individual components, only the old pairing vs the new pairing. The server running Backup Exec has 4GB RAM, 32-bit Server 2008 Standard, and a Pentium D 3.2GHz dual core CPU.
Network connectivity is by way of a 1Gb switch; both servers are on the same switch. I have a Remote Desktop Connection open, but even with the backup going plus that connection, the Gb link is less than 50% utilized at worst and averages more like 25% usage.
So as rough as the test methods are I feel reasonably confident that the bottlenecks aren't in a variable that I'm ignoring.
Short Test Results:
~1500 MB/min using Dell LTO3 drive and LTO3 tape compression ON, 64KB block size (many buffer counts tested, best result listed here)
~1800 MB/min using Quantum Superloader3 LTO 4 drive with a LTO3 tape (same tape as above) compression ON, 64KB block size, 64KB buffer size, buffer count 10, highwater count 0, Write Single block mode ON, Write SCSI pass-through mode ON
~2150 MB/min using Quantum Superloader3 LTO 4 drive with a LTO3 tape (same tape as above) compression ON, 256KB block size, 256KB buffer size, buffer count 10, highwater count 0, Write Single block mode ON, Write SCSI pass-through mode ON
~2200 MB/min using Quantum Superloader3 LTO 4 drive with a LTO3 tape (same tape as above) compression OFF, 256KB block size, 256KB buffer size, buffer count 10, highwater count 0, Write Single block mode ON, Write SCSI pass-through mode ON
~2050 MB/min using Quantum Superloader3 LTO 4 drive with a LTO4 tape compression ON, 256KB block size, 256KB buffer size, buffer count 10, highwater count 0, Write Single block mode ON, Write SCSI pass-through mode ON
~2250 MB/min using Quantum Superloader3 LTO 4 drive with a LTO4 tape compression OFF, 256KB block size, 256KB buffer size, buffer count 10, highwater count 0, Write Single block mode ON, Write SCSI pass-through mode ON
~2050 MB/min using Quantum Superloader3 LTO 4 drive with a LTO4 tape compression ON, 256KB block size, 1MB buffer size, buffer count 10, highwater count 0, Write Single block mode ON, Write SCSI pass-through mode ON
~2300 MB/min using Quantum Superloader3 LTO 4 drive with a LTO4 tape compression OFF, 256KB block size, 1MB buffer size, buffer count 10, highwater count 0, Write Single block mode ON, Write SCSI pass-through mode ON
~2200 MB/min using Quantum Superloader3 LTO 4 drive with a LTO4 tape compression ON, 256KB block size, 1MB buffer size, buffer count 20, highwater count 0, Write Single block mode ON, Write SCSI pass-through mode ON
~2300 MB/min using Quantum Superloader3 LTO 4 drive with a LTO4 tape compression OFF, 256KB block size, 1MB buffer size, buffer count 20, highwater count 0, Write Single block mode ON, Write SCSI pass-through mode ON
It's clear that tuning block size is more important than buffer size. No matter the block or buffer size you use, you will get better performance by turning off compression if your source data can't keep up with the tape drive's minimum speed matching rate. Unfortunately that is a per-drive setting, not a per-job or per-tape-format setting, so you can't just restrict compression to Full backups or to LTO3 only; you will have to test how much of an issue it is with your combination of hardware/software. Of course that hit in performance is minor, and the more important tests will be optimizing the Full backup of 600GB to 800GB instead of 20GB. I'll try to update again once I have a few weeks or months of backups done.
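As a rough projection of what the real job might look like, assuming (optimistically) that the 600GB to 800GB Full sustains the best short-test rate:

    # Estimate Full backup duration at the ~2300 MB/min seen in the short tests.
    best_rate = 2300                              # MB/min, best short-test result
    for full_gb in (600, 800):
        minutes = full_gb * 1024 / best_rate      # GiB -> MB, then minutes
        print(f"{full_gb} GB Full: ~{minutes / 60:.1f} hours")
    # -> ~4.5 and ~5.9 hours, versus the ~12 hours the old setup needed to fill a tape.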
As an aside, note that ~1000MB/min is far below the minimum speed for tape streaming with LTO 3, so you're probably losing a fair amount of capacity to the tape stopping and starting (i.e. you're probably getting better than 1.5:1 compression, but this is lost in gaps in the data on the tape). This will probably be rather worse with LTO 4, as I think the minimum speed has gone up.
Disk - Disk - Tape will help with the minimum speed problem, and will give you some capacity for free.
If you're not doing it already, strongly consider some kind of scheduled defrag of the disks on the servers you're backing up. 1000 MB/min is not a great level of throughput for gigabit Ethernet on reasonably modern hardware; I'd expect that even on 2 year old machines you should be able to get 1800MB/min (that's only reading from the server disk at 30MB/sec), so there's scope for improvement.
Edit: For LTO 3, you really want a 256KB block size for best performance.
WRT shoe-shining: there's no time for the tape to rewind if the buffer runs empty briefly, so the drive leaves a gap on the tape.
Inevitably, backups exceed the capacity that you originally planned for. Here's what I would suggest and say about your situation:
So the Full backup exceeds the capacity of one tape? Then use two tapes.
Follow Symantec's recommendation and continue doing your backup to disk, then back up those disk backups to tape. Schedule the backups to disk to occur after hours when fewer resources are in use. Schedule the backups to tape to occur any time during the day after the disk backups are complete, because the backups to tape don't have any impact on production systems.
Think of your backups for the week (Full and Differentials) as being part of the same backup set. If it takes two or three tapes per week, then so be it.
Schedule the backups to tape to occur only during the week, when you're there to swap the tapes.
I have a similar situation; I'm using a Dell PowerVault 110T LTO2 drive, and here's what I do:
On Saturday I take a full backup to disk (a backup-to-disk folder for fulls).
Sunday through Friday I take incremental backups to disk (another backup-to-disk folder for incrementals).
Monday through Friday I take backups to tape of the full and incremental backup-to-disk folders. When the tape reaches its capacity I swap it out. If it reaches capacity in the middle of the night, I swap it out the next morning and the tape job finishes.
After Friday's backup-to-tape job I swap tapes for the next week. The two tapes I pull out are the full and incrementals from the current week and go into my 4 week rotation. Now I know that all of the current week's backup data is on one tape set, stored off site.
Rinse and repeat.
We do something similar to joe:
If you really have to do the disk-to-tape backup independent of the disk-to-disk backup, I'd live with the two backups being slightly out of sync:
I don't see a problem with having slightly different sets of files backed up on the two different media. In almost all cases, you're going to restore a file from the disk backup, with the tape just a fallback or an easy way to organize multiple backup sets.
Here's one option that could help you get by for a while:
Have you considered splitting your backup into two separate data sets? Depending on how your files are organized, you might be able to divide it into two logical chunks (e.g. by department). You would do a full backup of the first dataset on Thursday night and a full backup of the second dataset on Friday night. Each night after that would run two jobs onto a single tape: a differential for each dataset (see the sketch below).
This way you're not coming in on weekends and you aren't having to babysit a drive while waiting for a verify to complete. In addition, you get the added protection of not having all of your eggs in one basket, so to speak.
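A sketch of what that split-dataset week could look like (the dataset names and the exact night-by-night layout are my assumptions for illustration, not a specific Backup Exec feature):

    # Hypothetical two-dataset rotation: Full of A on Thursday, Full of B on
    # Friday, then two differential jobs sharing one tape every other night.
    schedule = {
        "Thursday":  ["FULL dataset A", "DIFF dataset B"],
        "Friday":    ["DIFF dataset A", "FULL dataset B"],
        "Saturday":  ["DIFF dataset A", "DIFF dataset B"],
        "Sunday":    ["DIFF dataset A", "DIFF dataset B"],
        "Monday":    ["DIFF dataset A", "DIFF dataset B"],
        "Tuesday":   ["DIFF dataset A", "DIFF dataset B"],
        "Wednesday": ["DIFF dataset A", "DIFF dataset B"],
    }
    for night, jobs in schedule.items():
        print(f"{night:9} -> " + " + ".join(jobs))
    # Each Full stays within one tape, and no night needs more than one tape.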