I have a fairly large dataset (~160TB) that needs to be delivered to a client every so often. This dataset consists of fairly large files, usually between 2GB and 20GB each. They exist on a BeeGFS filesystem running on a RAID cluster with a total capacity of 1.1TB. Currently, when it's time to deliver the data, it is done the following way:
- Create a main index of the files and their sizes
- Tally up file sizes until they reach 4TB, and make a sub-index of those files from the main index
- Copy the files over to 4TB USB drives
- Repeat steps 2 and 3 until the entire dataset has been copied
- Give a cardboard box of USB drives to the client
What I would like to do is simply rsync the data to a mounted filesystem, so I was wondering whether there is a filesystem that can spread the storage space over multiple disks. The obvious candidates are LVM and RAID, but the client needs to be able to read each disk on its own, which rules these out (as far as I know, at least). Is there a way of emulating LVM or something similar that still allows individual disks to be read in a fairly standard way? In effect, I want to run a single rsync operation that spreads the data over multiple individual disks/filesystems.
The data comes from a Red Hat machine, so I've simply used ext4 on the USB drives so far. However, if possible, it would be very beneficial (although not strictly necessary) for everyone if I could use a filesystem that plays nicely with Windows 10.
PS: I have no limitations when it comes to the number of USB drives attached at the same time. The only real constraint I have is that the data must be accessible one disk/filesystem at a time.
Create the full list of files and sizes, something like:
find /path -type f -printf "%s %h/%f\n" > all_files.txt
Run an awk that splits all_files.txt into parts, based on the total size for each part (MAXSIZE here is a placeholder for maximum size in bytes).
You can now mount all the disks at different mount points (something like /mnt/send/partial-1, /mnt/send/partial-2, ...), using whichever filesystem you want in each one.
Within a loop you rsync with --files-from=FILE to the right mount point. Something along these lines:
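A minimal sketch of the split and the loop, assuming the index holds one "SIZE /absolute/path" entry per line and that no filename contains a newline. The function names split_index and send_parts and the part-N.txt list names are made up for illustration, and the MAXSIZE value in the usage comment is only an example:

```shell
#!/bin/sh

# split_index INDEX MAXSIZE PREFIX
# Reads "SIZE PATH" lines from INDEX and writes the paths to PREFIX1.txt,
# PREFIX2.txt, ... starting a new list whenever adding the next file would
# push the running total above MAXSIZE bytes. A single file larger than
# MAXSIZE still gets a list of its own.
split_index() {
    awk -v max="$2" -v prefix="$3" '
        BEGIN { part = 1; total = 0 }
        {
            size = $1
            # Everything after "SIZE " is the path (it may contain spaces).
            path = substr($0, length($1) + 2)
            if (total > 0 && total + size > max) {
                close(prefix part ".txt")
                part++
                total = 0
            }
            total += size
            print path > (prefix part ".txt")
        }
    ' "$1"
}

# send_parts PREFIX MOUNTBASE
# Copies list N to MOUNTBASE/partial-N/. The source directory is "/" because
# the lists hold absolute paths; rsync strips the leading slash of each entry
# and treats it as relative to the source directory.
send_parts() {
    n=1
    while [ -f "$1$n.txt" ]; do
        rsync -a --files-from="$1$n.txt" / "$2/partial-$n/" || return 1
        n=$((n + 1))
    done
}

# Example usage (values are assumptions, adjust to your disks):
#   MAXSIZE=3900000000000
#   split_index all_files.txt "$MAXSIZE" part-
#   send_parts part- /mnt/send
```

Keep MAXSIZE somewhat below the nominal size of the drives, since the filesystem itself uses part of the space. Each disk ends up holding the full directory path of its files, so the client can read any single disk on its own and merge them later if needed.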