I have a stack of old LTO-2 tapes, and my task is to save the contents as large binary files on disk for eventual consolidation on newer media. (The contents are in a custom format specific to this site, not relevant to this question. I don't need to preserve the blocking.)
I'd like some advice on how to read the contents as fast as possible. Current performance using dd is about 2MB/s, and I believe this is due to the low and variable block size of the files on tape. The LTO-2 specs say I should be able to get 40MB/s native.
Using Solaris, I can see with the tcopy utility that the files are stored with a variable block size:
# tcopy /dev/rmt/1cbn
file 1: record 1: size 40
file 1: record 2: size 1024
file 1: record 3: size 10240
file 1: record 4: size 7168
file 1: record 5: size 1024
file 1: records 6 to 7: size 10240
[...]
Test read from tape to /dev/null:
# dd if=/dev/rmt/1cbn of=/dev/null bs=128k
(Note that the block size of 128k specified here is the maximum block size. If the actual size of the block on tape is smaller than this, that smaller amount of data will be returned for each IO.)
iostat -Mzcnx 1 shows:
r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
304.2 0.0 1.9 0.0 0.0 1.0 0.0 3.2 0 97 rmt/1
This says to me that it's reading at 1.9MB/s, with an average IO size of about 6500 bytes and an average of exactly 1 IO outstanding at any one time.
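The arithmetic behind that reading can be reproduced with a short script. This is only a sanity check on the iostat figures quoted above. It assumes that iostat -M counts a megabyte as 2^20 bytes; the variable names are made up for illustration.

```shell
#!/bin/sh
# Figures copied from the iostat line above.
READS_PER_SEC=304.2
MB_PER_SEC=1.9
ASVC_MS=3.2

# Average bytes returned per read (assumes 1 MB = 1024 * 1024 bytes).
AVG_BYTES=$(awk -v r="$READS_PER_SEC" -v m="$MB_PER_SEC" \
    'BEGIN { printf "%.0f", m * 1024 * 1024 / r }')

# With one IO outstanding at a time, reads/s times service time (ms)
# gives the milliseconds per second the device is busy; /10 turns
# that into a percentage.
BUSY_PCT=$(awk -v r="$READS_PER_SEC" -v t="$ASVC_MS" \
    'BEGIN { printf "%.0f", r * t / 10 }')

echo "average IO size: $AVG_BYTES bytes"   # prints 6549
echo "estimated busy:  $BUSY_PCT %"        # prints 97
```

The second figure matches the %b column, which is consistent with the drive servicing one small read at a time.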
So: given that I can't go back in time and change the block size that was written to tape, please let me know if there's some way to read the existing data faster.
Look into the buffer command. It speeds up transfers by allowing reads and writes to proceed simultaneously, instead of the normal sequence of read, then write, then read, and so on.
Basically it does this by spawning two sub-processes. The processes communicate using a shared memory buffer. The command line parameters are similar to dd with the addition of parameters to size the shared memory buffer.
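The reader/writer split can be roughly illustrated without the buffer command by putting two dd processes on either side of a pipe. This is only a sketch of the idea, not how buffer itself works: the kernel pipe stands in for the shared memory buffer and is much smaller. A temporary file stands in for the tape device so that the example runs anywhere, and the SRC, DST and STATUS names are hypothetical.

```shell
#!/bin/sh
# Stand-in for the tape device; on the real system SRC would be the
# tape device, e.g. /dev/rmt/1cbn.
SRC=$(mktemp)
DST=$(mktemp)
dd if=/dev/zero of="$SRC" bs=1024 count=256 2>/dev/null

# Left-hand dd: the reader, issuing reads of up to 128k each.
# Right-hand dd: the writer, regrouping the stream into 1 MiB writes.
# The two processes run concurrently, connected by the pipe.
dd if="$SRC" bs=128k 2>/dev/null | dd of="$DST" ibs=128k obs=1024k 2>/dev/null

# Confirm that the copy is byte-for-byte identical to the source.
if cmp -s "$SRC" "$DST"; then STATUS=identical; else STATUS=different; fi
echo "$STATUS"
```

Whether this helps on a given drive depends on where the time is going; it only overlaps the read side with the write side.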
Some Linux distributions already include the command. If yours does not, use yum or apt-get (or whatever your package manager is) to install the buffer package.
I have personally used this command when authoring many tape backup/restore packages, and it increases throughput by about 10-20%.