I want to pipe a stream to split. I know how big the stream will be in bytes (very big; it comes from the network), and I want split to create N files of roughly equal size, without splitting lines in half. Is it possible to achieve that? Something like:
cat STREAM | split $SIZE_OF_STREAM $NUMBER_OF_FILES_TO_PRODUCE
I could not find a way to do this in the docs. I'm sorry if it is obvious, but I couldn't find it.
Oh well, it seems that the split utility on Mac (and maybe BSD) is one option short :( On Linux, there is the -C option, which lets you set the maximum number of bytes of whole lines that go into each chunk. Put more simply: if you run cat file | split -C 1000, it will create chunks of UP TO 1000 bytes of whole lines, which with elementary math gives me an easy way to achieve what I wanted.
Create a file that will be our STREAM:
Now split it:
It will give you a lot of files with a fixed size of 2 bytes and names file1, file2, file3....
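The commands for the two steps above are not shown, so here is a minimal sketch of what such a demonstration could look like. The 8-byte sample content, the file name stream, and the prefix file are all assumptions made for illustration. With plain split -b the outputs get alphabetic suffixes (fileaa, fileab, ...) rather than file1, file2, and a byte-based cut like this does not respect line boundaries.

```shell
# Work in a scratch directory so the chunks are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the stream: 8 bytes, no trailing newline.
printf 'abcdefgh' > stream

# -b 2 cuts every 2 bytes; "-" reads standard input; "file" is the output prefix.
cat stream | split -b 2 - file

# List the resulting chunks.
ls file*
```

For this input the listing should show four 2-byte chunks, fileaa through filead.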
I would simply split on line count, as that will make all files except the last one nearly equal.
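A minimal sketch of splitting on line count, assuming a made-up stream of 1000 short lines and a made-up limit of 250 lines per file. The prefix part_ is also just an illustrative choice.

```shell
# Work in a scratch directory so the chunks are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the stream: the numbers 1..1000, one per line.
# -l 250 starts a new output file every 250 lines, so no line is ever cut in half.
seq 1 1000 | split -l 250 - part_

# Show how many lines ended up in each chunk.
wc -l part_*
```

The files hold equal numbers of lines, but their byte sizes only match as closely as the line lengths do.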
You could do the math with $SIZE_OF_STREAM divided by $NUMBER_OF_FILES_TO_PRODUCE, but just setting a line count gets you 90% of the way there: all files will be basically equal unless your line lengths are very unevenly distributed.
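The division can be sketched as below, assuming GNU split is available, since the -C option used here is the one mentioned for Linux. The sample stream, the value 4, and the prefix chunk_ are assumptions for illustration. In the real case $SIZE_OF_STREAM would already be known and the data would arrive on the pipe instead of from a file.

```shell
# Work in a scratch directory so the chunks are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the network stream: the numbers 1..1000, one per line.
seq 1 1000 > stream
SIZE_OF_STREAM=$(wc -c < stream | tr -d ' ')
NUMBER_OF_FILES_TO_PRODUCE=4

# Ceiling division, so that N completely full chunks would hold the whole stream.
CHUNK_BYTES=$(( (SIZE_OF_STREAM + NUMBER_OF_FILES_TO_PRODUCE - 1) / NUMBER_OF_FILES_TO_PRODUCE ))

# GNU split only: -C packs whole lines into each chunk, up to CHUNK_BYTES bytes.
cat stream | split -C "$CHUNK_BYTES" - chunk_

# Show the byte size of each chunk.
wc -c chunk_*
```

Because -C never cuts a line, most chunks come out slightly under $CHUNK_BYTES, so the leftover bytes can spill into one more file than requested. Raising the chunk size by about one maximum line length is one way to reduce that risk, at the cost of a smaller last file.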
I have linked to the online documentation, but man pages ship with OS X, so you can see there that split has a byte cutoff as well as a line cutoff.