I have a folder with 600 GB of files. I want to automatically copy the first 300 to one folder, and the rest to another folder. I am not sure how to limit the results with ls or whatever so I can pass it as an argument...
platform is linux...
edit: I want to move 300 GB, not the first 300 files. File sizes are arbitrary, and ordering does not matter.
Update: Oh, the first 300 GB, well then... this is probably slow, depending on file size, but I like the exercise :-)
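Something along these lines, as a minimal sketch assuming GNU coreutils `stat` and 64-bit shell arithmetic (the directory names are placeholders):

```bash
#!/bin/bash
# Move files into "first300" until roughly 300 GB worth of bytes have gone
# over, then put everything else in "rest". Directory names are placeholders.
limit=$((300 * 1024 * 1024 * 1024))   # 300 GB in bytes (needs 64-bit shell arithmetic)
total=0
mkdir -p first300 rest
for f in *; do
    [ -f "$f" ] || continue           # skip the destination directories
    size=$(stat -c %s -- "$f")        # file size in bytes (GNU stat)
    if [ $((total + size)) -le "$limit" ]; then
        mv -- "$f" first300/
        total=$((total + size))
    else
        mv -- "$f" rest/
    fi
done
```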
Hopefully there are no problems with the size of the int.
If you mean breaking them up into folders with 300 files each, maybe you want something like the following:
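Roughly this, as a sketch; "part" is just a placeholder prefix for the destination folders:

```bash
# Start a new destination folder every 300 files.
counter=0
dirnum=1
mkdir -p "part$dirnum"
for f in *; do
    [ -f "$f" ] || continue           # skip directories
    mv -- "$f" "part$dirnum/"
    counter=$((counter + 1))
    if [ "$counter" -ge 300 ]; then   # folder is full, move on to the next one
        counter=0
        dirnum=$((dirnum + 1))
        mkdir -p "part$dirnum"
    fi
done
```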
Although that might not be as fast as some of the find commands. If you just want to do the first 300 files, you could use the same counter strategy but with a check like `while [ $counter -le 300 ]`.
This is a way to get a nearly equal 300 GB distribution.

You could do a `du`-based search to find the size distribution across the top-level directories and files and then split them into two nearly equal parts with a few trials. This will give a sorted list of KB sizes. You could do a trick like picking alternate lines from this list for a quick, nearly even distribution.

A very rough distribution...
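For example, something like this sketch, assuming GNU coreutils (the list file names are placeholders):

```bash
cd /dirwithlotsoffiles
# Size in KB of every top-level entry, smallest first.
du -sk -- * | sort -n > /tmp/sizes.txt
# Alternate the lines between two candidate lists for a rough 50/50 split.
awk 'NR % 2 {print > "/tmp/list1.txt"; next} {print > "/tmp/list2.txt"}' /tmp/sizes.txt
# Total of each half, to see how far apart they are.
awk '{s += $1} END {print "list1: " s " KB"}' /tmp/list1.txt
awk '{s += $1} END {print "list2: " s " KB"}' /tmp/list2.txt
```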
Finally, if you have very uneven file or directory sizes (quite far from a 300 GB split), keep yourself away from the bin-packing problem and do some simple trials, moving a couple of lines between the two list files: find the difference between the two sets (with `du`) and move a directory/file that is about half the difference from the larger list to the smaller one. That should get you quite close.
You could do it with find, head & xargs. It should look like this:
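It would look roughly like this (destination paths are placeholders; this moves rather than copies, and it will trip over file names containing newlines):

```bash
# The first 300 files as find lists them go to folder1, the rest to folder2.
find . -maxdepth 1 -type f | head -n 300 | xargs -I{} mv -- {} /path/to/folder1/
find . -maxdepth 1 -type f | xargs -I{} mv -- {} /path/to/folder2/
```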
WARNING! When you start calculating file sizes, you are likely to make the mistake of measuring them in bytes, while most file systems allocate disk space in blocks. This block size varies from disk to disk but is often a multiple of 512.
Basically, that means you can have 500 files of one byte each, which would only be 500 bytes. But a file-system that allocates 2048 bytes per block would thus claim about 1 megabyte of disk space. Yeah, that's a lot of overhead.
So you should round the file sizes you get up to the block size of the file system you use. That way, you can measure them more precisely.
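For example, GNU `stat` can show the apparent size next to the allocated size, so you can see what each file really occupies on disk (a sketch; it assumes the usual 512-byte block unit reported by `stat`):

```bash
for f in *; do
    bytes=$(stat -c %s -- "$f")     # apparent size in bytes
    blocks=$(stat -c %b -- "$f")    # allocated blocks (normally 512-byte units)
    printf '%s: %d bytes, %d bytes on disk\n' "$f" "$bytes" $((blocks * 512))
done
```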
Then again, how much difference could it be? If the block size is 2048 bytes, then the average amount of bytes "lost" per file would be 1 KB. With 300 files this would be about 300 KB extra on top of your total size. You want to copy 300 GB, but how many files would that be? And are the two disks using the same file system with the same block size?
Anyway, the error margin depends on the average file size. If you have a lot of huge files (music, images, binaries), the error margin will be very small. If you have a lot of small files (like scripts, sources and text files), then the error margin might easily add another 30 GB that you didn't account for to the total size...
So, measuring file sizes isn't easy...
You can get a listing of file usage either by pulling the size out of `ls -l` or by using the `du` command:

```
$ cd /dirwithlotsoffiles
$ du -k *
```

That will print a list of the sizes of the files in kilobytes followed by the filename.
The "find" answer would copy the first 300 files, not the first 300GB as I understand as the request.
You can try tar and its multi-volume options.
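For example, with GNU tar something like this switches to the second volume after roughly 300 GB (archive names are placeholders, and the file that straddles the boundary ends up split across the two volumes):

```bash
# --tape-length is counted in units of 1024 bytes: 300 * 1024 * 1024 = 314572800.
tar --create --multi-volume \
    --tape-length=314572800 \
    --file=part1.tar --file=part2.tar \
    /dirwithlotsoffiles
```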
A pretty crude way would be to loop over files sorted by size (ls -S) and simply move each alternate file to one of the subdirectories. How about this:
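Something like this crude sketch; dir1/dir2 are placeholders, and the `$(ls -S)` loop will break on file names containing whitespace:

```bash
mkdir -p dir1 dir2
i=0
for f in $(ls -S); do                 # largest files first
    [ -f "$f" ] || continue           # skip the two destination directories
    if [ $((i % 2)) -eq 0 ]; then
        mv -- "$f" dir1/
    else
        mv -- "$f" dir2/
    fi
    i=$((i + 1))
done
```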
I'm afraid you're probably going to have to get your hands dirty with some scripting here. You can easily get a list of files and their sizes using the terminal command `ls -l`; you'd then have to write a script that goes through that list, copies files one by one, and keeps a counter recording the number of KB transferred so far. Each time, check whether you've moved 300 GB's worth yet; if not, move another file. It's probably doable in about 10 lines of Perl or less.
You can get a reasonable result by getting a list of filenames along with the size of each file. Sort the files by size, largest first. Then copy the largest file on the list that still fits in the remaining space in the target directory and remove it from the list. Repeat until no more files fit.
Then start again with a new target directory. Repeat until the list is empty.
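A rough sketch of that greedy approach, assuming GNU `stat` and `sort` and placeholder target directories; it budgets 300 GB per directory rather than querying actual free disk space:

```bash
limit=$((300 * 1024 * 1024 * 1024))   # per-directory budget in bytes
mkdir -p target1 target2
for target in target1 target2; do
    remaining=$limit
    # Remaining files, largest first, as "<bytes> <name>".
    # The pipeline runs the loop in a subshell, which is fine here: the
    # counter is only needed while this one target is being filled.
    stat -c '%s %n' -- * | sort -rn | while read -r size name; do
        [ -f "$name" ] || continue    # skip the target directories themselves
        if [ "$size" -le "$remaining" ]; then
            mv -- "$name" "$target/"
            remaining=$((remaining - size))
        fi
    done
done
```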