I have a folder with 600 GB of files. I want to automatically copy the first 300 to one folder, and the rest to another folder. I am not sure how to limit the results with ls or whatever so I can pass it as an argument...
platform is linux...
edit: I want to move 300 GB, not the first 300 files. File sizes are arbitrary, and ordering does not matter.
Update: Oh, the first 300 GB, well then... this is probably slow, depending on file size, but I like the exercise :-)
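Something along these lines, as a minimal sketch assuming GNU coreutils `stat` and 64-bit shell arithmetic (the directory names are placeholders):

```bash
#!/bin/bash
# Move files into "first300" until roughly 300 GB worth of bytes have gone
# over, then put everything else in "rest". Directory names are placeholders.
limit=$((300 * 1024 * 1024 * 1024))   # 300 GB in bytes (needs 64-bit shell arithmetic)
total=0
mkdir -p first300 rest
for f in *; do
    [ -f "$f" ] || continue           # skip the destination directories
    size=$(stat -c %s -- "$f")        # file size in bytes (GNU stat)
    if [ $((total + size)) -le "$limit" ]; then
        mv -- "$f" first300/
        total=$((total + size))
    else
        mv -- "$f" rest/
    fi
done
```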
Hopefully there are no problems with the size of the int.
If you mean breaking them up into folders with 300 files each, maybe you want something like the following:
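Roughly this, as a sketch; "part" is just a placeholder prefix for the destination folders:

```bash
# Start a new destination folder every 300 files.
counter=0
dirnum=1
mkdir -p "part$dirnum"
for f in *; do
    [ -f "$f" ] || continue           # skip directories
    mv -- "$f" "part$dirnum/"
    counter=$((counter + 1))
    if [ "$counter" -ge 300 ]; then   # folder is full, move on to the next one
        counter=0
        dirnum=$((dirnum + 1))
        mkdir -p "part$dirnum"
    fi
done
```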
Although that might not be as fast as some of the find commands. If you just want to do the first 300 files, you could use the same counter strategy but with a check like `while [ $counter -le 300 ]`.
This is a way to get a nearly equal 300 GB distribution.

You could do a `du`-based search to find the size distribution across the top-level directories and files and then split them into two nearly equal parts with a few trials. This will give a sorted list of KB sizes. You could do a trick like picking alternate lines from this list for a quick, nearly even distribution.

A very rough distribution...
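For example, something like this sketch, assuming GNU coreutils (the list file names are placeholders):

```bash
cd /dirwithlotsoffiles
# Size in KB of every top-level entry, smallest first.
du -sk -- * | sort -n > /tmp/sizes.txt
# Alternate the lines between two candidate lists for a rough 50/50 split.
awk 'NR % 2 {print > "/tmp/list1.txt"; next} {print > "/tmp/list2.txt"}' /tmp/sizes.txt
# Total of each half, to see how far apart they are.
awk '{s += $1} END {print "list1: " s " KB"}' /tmp/list1.txt
awk '{s += $1} END {print "list2: " s " KB"}' /tmp/list2.txt
```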
Finally, if you have very uneven file or directory sizes (quite far from a 300 GB split), keep yourself away from the bin-packing problem and do some simple trials, moving a couple of lines between the two list files: find the difference between the two sets (with `du`) and move a directory/file that is about half the difference from the larger list to the smaller one. That should get you quite close.
You could do it with find, head & xargs. It should look like this:
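It would look roughly like this (destination paths are placeholders; this moves rather than copies, and it will trip over file names containing newlines):

```bash
# The first 300 files as find lists them go to folder1, the rest to folder2.
find . -maxdepth 1 -type f | head -n 300 | xargs -I{} mv -- {} /path/to/folder1/
find . -maxdepth 1 -type f | xargs -I{} mv -- {} /path/to/folder2/
```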
WARNING! When you start calculating file sizes, you are likely to make the mistake of measuring them in bytes, while most file systems allocate disk space in blocks. This block size varies from disk to disk but is often a multiple of 512.
Basically, that means you can have 500 files of one byte each, which would only be 500 bytes. But a file-system that allocates 2048 bytes per block would thus claim about 1 megabyte of disk space. Yeah, that's a lot of overhead.
So you should round the file sizes you get up to the block size of the file system you use. That way, you can measure them more precisely.
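For example, GNU `stat` can show the apparent size next to the allocated size, so you can see what each file really occupies on disk (a sketch; it assumes the usual 512-byte block unit reported by `stat`):

```bash
for f in *; do
    bytes=$(stat -c %s -- "$f")     # apparent size in bytes
    blocks=$(stat -c %b -- "$f")    # allocated blocks (normally 512-byte units)
    printf '%s: %d bytes, %d bytes on disk\n' "$f" "$bytes" $((blocks * 512))
done
```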
Then again, how much difference could it be? If the block size is 2048 bytes, then the average amount of bytes "lost" per file would be 1 KB. With 300 files this would be about 300 KB extra on top of your total size. You want to copy 300 GB, but how many files would that be? And are the two disks using the same file system with the same block size?
Anyway, the error margin depends on the average file size. If you have a lot of huge files (music, images, binaries), the error margin will be very small. If you have a lot of small files (like scripts, sources and text files), then the error margin might easily add another 30 GB that you didn't account for to the total size...
So, measuring file sizes isn't easy...
You can get a listing of file usage either by pulling the size out of `ls -l` or by using the `du` command:

```
$ cd /dirwithlotsoffiles
$ du -k *
```

That will print a list of the sizes of the files in kilobytes followed by the filename.
The "find" answer would copy the first 300 files, not the first 300GB as I understand as the request.
You can try tar and its multi-volume options.
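For example, with GNU tar something like this switches to the second volume after roughly 300 GB (archive names are placeholders, and the file that straddles the boundary ends up split across the two volumes):

```bash
# --tape-length is counted in units of 1024 bytes: 300 * 1024 * 1024 = 314572800.
tar --create --multi-volume \
    --tape-length=314572800 \
    --file=part1.tar --file=part2.tar \
    /dirwithlotsoffiles
```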
A pretty crude way would be to loop over files sorted by size (ls -S) and simply move each alternate file to one of the subdirectories. How about this:
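Something like this crude sketch; dir1/dir2 are placeholders, and the `$(ls -S)` loop will break on file names containing whitespace:

```bash
mkdir -p dir1 dir2
i=0
for f in $(ls -S); do                 # largest files first
    [ -f "$f" ] || continue           # skip the two destination directories
    if [ $((i % 2)) -eq 0 ]; then
        mv -- "$f" dir1/
    else
        mv -- "$f" dir2/
    fi
    i=$((i + 1))
done
```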
I'm afraid you're probably going to have to get your hands dirty with some scripting here. You can easily get a list of files and their sizes using the terminal command `ls -l`; you'd then have to write a script that goes through that list, copies files one by one, and keeps a counter recording the number of KB transferred so far. Each time, check whether you've moved 300 GB's worth yet; if not, move another file. It's probably doable in about 10 lines of Perl or less.
You can get a reasonable result by getting a list of filenames along with the size of each file. Sort the files by size, largest first. Then copy the largest file on the list that still fits in the remaining space in the target directory and remove it from the list. Repeat until no more files fit.
Then start again with a new target directory. Repeat until the list is empty.
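A rough sketch of that greedy approach, assuming GNU `stat` and `sort` and placeholder target directories; it budgets 300 GB per directory rather than querying actual free disk space:

```bash
limit=$((300 * 1024 * 1024 * 1024))   # per-directory budget in bytes
mkdir -p target1 target2
for target in target1 target2; do
    remaining=$limit
    # Remaining files, largest first, as "<bytes> <name>".
    # The pipeline runs the loop in a subshell, which is fine here: the
    # counter is only needed while this one target is being filled.
    stat -c '%s %n' -- * | sort -rn | while read -r size name; do
        [ -f "$name" ] || continue    # skip the target directories themselves
        if [ "$size" -le "$remaining" ]; then
            mv -- "$name" "$target/"
            remaining=$((remaining - size))
        fi
    done
done
```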