I'm building a backup station.
I'd like to be able to get an image of an HDD containing its partition table and all of its partitions (not one partition at a time) so that restoring will be easy. I'd like to do this with several HDDs simultaneously, each one connected via USB.
I tried partimage, but it seems to back up one partition at a time. I tried Clonezilla, but it seems to need a client machine, which is not what I need.
A Linux solution would be appreciated, but I could run it in a virtual machine if needed; I'd like the process to be as automated as possible.
It must support NTFS because most of the backups I'll do will use NTFS.
Note:
Clonezilla seems interesting because, as I understand it, the client builds a package and sends it to the server over the network. I'd like to build the same easily-restorable package from an HDD plugged in via USB, without any extra machine or network involvement.
Echoing wombie's concern, I don't think you want the server trying to do big data copy jobs in parallel.
Whether you are copying multiple partitions (which wombie predicts would cause the disk heads to thrash and slow things down) or copying multiple disks over a single USB bus (where each data stream generates interrupts that slow the others down), running the jobs in parallel will slow things down unless you are dealing with a transmission technology specifically designed to handle high throughput from multiple clients.
For example, trying to FTP a single file over 10BaseT Ethernet, I could get over 1 MByte/sec (over 8 Mbit/sec) of throughput, but if I tried to FTP two files from different machines, even to the same server, the throughput would fall to about 150 KByte/sec per transfer (i.e., about 300 KByte/sec, 2.4 Mbit/sec total). (This is from memory, and it may have taken three transmitting stations to get the 10BaseT utilization to drop from ~90% to ~30%. Still, adding a second station did decrease the overall efficiency, due to collisions.)
Besides, it's a catch-22: the protocols that can gracefully handle multiplexing high-throughput streams generally introduce high overhead. Classic examples of networking protocols that multiplex high-throughput streams gracefully are Token Ring, FDDI, and ATM. ATM, for instance, adds roughly 10% overhead to every transmission (5 of the 53 bytes in each cell are header).
Whether you use dd, partimage, or clonezilla, I would suggest queuing the jobs so that only one disk is imaged at a time, with something that watches for new work automatically (a minimal sketch follows). Then, when you add a disk to the queue, it will get copied, much like some BitTorrent clients that periodically check a folder for new torrents and then process them automatically.
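A minimal sketch of that queue idea in shell, assuming the disks show up as ordinary block devices (/dev/sdb, /dev/sdc, ...) and using made-up paths for the queue file and image directory:

```bash
#!/bin/bash
# Serial imaging queue (run as root): images one queued disk at a time so the
# drives and the USB bus never compete with each other for bandwidth.
QUEUE=/var/backups/disk-queue.txt   # one device name per line, e.g. "sdc"
DEST=/var/backups/images            # where the compressed images end up

mkdir -p "$DEST"

while true; do
    dev=$(head -n 1 "$QUEUE" 2>/dev/null)
    if [ -n "$dev" ] && [ -b "/dev/$dev" ]; then
        echo "Imaging /dev/$dev ..."
        # Whole-disk copy: partition table and every partition, compressed.
        dd if="/dev/$dev" bs=1M status=progress | gzip -c \
            > "$DEST/$dev-$(date +%F).img.gz"
        sed -i '1d' "$QUEUE"        # finished: drop the entry from the queue
    else
        sleep 30                    # nothing queued yet; poll again
    fi
done
```

Adding a disk to the queue is then just `echo sdc >> /var/backups/disk-queue.txt`; a udev rule could append the entry automatically whenever a USB disk is plugged in.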
I would also suggest not using USB, if you can, or at least getting multiple USB cards so each disk can have its own USB bus.
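If you do add controllers, it is worth verifying that the disks really land on separate buses before relying on the parallelism. Something like this (the device name is only an example):

```bash
# Each "/:" root-hub line in the tree is a separate USB bus / host controller;
# disks you want to run at full speed simultaneously should sit under different ones.
lsusb -t

# Trace a block device back through sysfs to the controller and port it hangs off.
readlink -f /sys/block/sdb
```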
With regard to Clonezilla: presumably, the client and the server could reside on the same machine. Install the server (perhaps testing with a separate machine first), then install the client and have it connect to localhost or to an IP assigned to the server.
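Alternatively, Clonezilla's own command-line tool may let you skip the client/server setup entirely: Clonezilla Live includes ocs-sr, which can save and restore a whole disk locally. This is only a rough sketch; the image name is invented and the exact options worth using (compression, checks, and so on) should be confirmed against ocs-sr's own help before relying on it:

```bash
# Save the whole of /dev/sdb (partition table plus all partitions, NTFS included)
# into an image directory called "sdb-backup" in Clonezilla's image repository
# (normally mounted at /home/partimag).
sudo ocs-sr -q2 -j2 -z1p -p true savedisk sdb-backup sdb

# Restoring is the mirror operation onto a same-size (or larger) disk.
sudo ocs-sr -p true restoredisk sdb-backup sdb
```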
No, you don't want to be able to do this. Reading one partition at a time is the right thing to do, because then the disk heads can just stream data off the disk. If you try to read multiple partitions on the same disk simultaneously, the drive will spend half its time whipping between different parts of the disk, and you won't get anywhere near the same data transfer speed, which means your backups will take longer.
If you want to take a single image of the entire hard drive, including the partition table, then just use dd to read the entire drive into a file (run the output through gzip to avoid wasting lots of disk space storing the empty space on the disk).
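In concrete terms, something along these lines would do it; /dev/sdb and the file name are placeholders, so double-check the device letter before running anything, especially the restore:

```bash
# Capture the whole drive, MBR/partition table included, into a compressed image.
sudo dd if=/dev/sdb bs=1M status=progress | gzip -c > sdb-full.img.gz

# Write it back onto a disk of the same size or larger.
gunzip -c sdb-full.img.gz | sudo dd of=/dev/sdb bs=1M status=progress
```

Keep in mind that dd reads every sector, used or not, so gzip only saves space if the free space is mostly zeroes; zeroing out the free space beforehand shrinks the image considerably.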
Can you not just spawn multiple copies of dd?