I'm trying to analyze an enormous text file (1.6GB), whose data lines look like this:
20090118025859 -2.400000 78.100000 1023.200000 0.000000 20090118025900 -2.500000 78.100000 1023.200000 0.000000 20090118025901 -2.400000 78.100000 1023.200000 0.000000
I don't even know how many lines there are, but I'm trying to split the file by date. The leftmost number is a timestamp (these lines are from January 18th, 2009).
How can I split this file into pieces according to the date?
All I know would be to `grep '^20090118' file > data20090118.dat`, but surely there is a way to do all the dates at once, right?
The number of entries per date differs, so using `split` with a constant line count won't work.
Thanks in advance,
Alex
Assuming the file is sorted and the dates are always there, this should work:
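A minimal sketch, assuming the timestamp starts each line so that its first 8 characters are the `YYYYMMDD` date:

```bash
#!/bin/bash
# Split the input file into one file per date.
base_dir=.    # target directory for the <date>.txt files

while IFS= read -r line; do
    date=${line:0:8}                       # first 8 chars = YYYYMMDD
    echo "$line" >> "$base_dir/$date.txt"  # >> appends, never truncates
done < "$1"
```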
[Save it as `my_splitter`, make it executable by running `chmod +x my_splitter`, then call it like `./my_splitter input_file`.] It reads the input file line by line, extracts the date and uses it to append lines with the same date to the same file.
`base_dir` is the target directory, and the files will be of the form `<date>.txt`. Note: existing files won't be overwritten; new lines would be appended because of the `>>` redirection, so make sure the target directory doesn't already contain any files of the form `<date>.txt`.

This could probably work for you:
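One possibility, as a sketch: an awk one-liner that reads the file once and routes every line to a file named after its date (this assumes the timestamp is always the first field):

```bash
# One pass over the file; each line is appended to data<YYYYMMDD>.dat.
awk '{ print > ("data" substr($1, 1, 8) ".dat") }' input_file
```

Unlike running grep once per date, this reads the 1.6GB file only once. Be aware that some awk implementations cap the number of simultaneously open output files; since the input is sorted by time, you could `close()` the previous file whenever the date changes.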
I would use brace expansion `{x..y}`, cascading over year, month and day, with a schema like:
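A sketch of that idea (zero-padded ranges like `{01..12}` need bash 4+, and the year/month/day bounds here are assumptions to adjust to your data):

```bash
#!/bin/bash
# Cascade over year, month and day via brace expansion and grep
# each date into its own file.
for ymd in 2009{01..12}{01..31}; do
    grep "^$ymd" input_file > "data$ymd.dat"
done
# grep leaves empty files for dates with no entries; remove them
# (GNU find).
find . -maxdepth 1 -name 'data*.dat' -empty -delete
```

Note that this re-reads the whole file once per date, so on a 1.6GB input it will be far slower than a single-pass approach.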