I have a folder with CSV files whose file names are dates, viz.: January-01-2018.csv, January-02-2018.csv, ..., April-30-2018.csv.
Using Bash preferably, I want to extract the number of lines from each CSV file, but in ascending order of date. I.e., I wish to extract the number of lines in January-01-2018.csv, then in January-02-2018.csv, ..., and then in April-30-2018.csv, and so on.
At the moment, all I have is:
for filename in $(ls *.csv); do cat $filename | wc -l >> by_day.dat; done
But this does not process the files in ascending order of date. Any suggestions on how I might accomplish this? I would like to do it in Bash.
You can do this by combining a few common tools:

- find to list all .csv files (unordered) and execute a command for each
- basename to extract the file name from the path, without the .csv extension
- date to interpret the date specification in the file name and convert it to an easily sortable number, like seconds since 1970
- echo to print the calculated number and the real file path on one line for each file
- sort to sort the file paths according to this converted date number
- cut to extract only the file paths again from the combined list
- xargs cat to construct a command that passes all file names, in order, to the cat command for concatenating them

The complete line looks like this, if all the files we want to process are located in a folder named datecsv:
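A sketch of that complete line, assuming GNU date and GNU xargs; the sample files and their contents here are hypothetical stand-ins created only so the pipeline has input:

```shell
# Hypothetical sample files, just for demonstration:
mkdir -p datecsv
printf 'jan data\n' > datecsv/January-01-2018.csv
printf 'apr data\n' > datecsv/April-30-2018.csv

# Tag each path with its date as seconds since 1970, sort numerically,
# strip the tag again, and cat the files in date order:
find datecsv -name '*.csv' |
while read -r path; do
    name=$(basename "$path" .csv | tr '-' ' ')  # e.g. "January 01 2018"
    echo "$(date -d "$name" +%s) $path"         # epoch-seconds prefix
done |
sort -n |          # numeric sort on the prefix = ascending date order
cut -d ' ' -f2- |  # drop the prefix, keep only the path
xargs cat          # concatenate the files in that order
# prints: jan data
#         apr data
```

Replacing the hyphens with spaces via tr before calling date just makes the names easier for date's parser to digest.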
As you only want the line count of each file, the command for that looks almost the same. The only change is the last part, where we use xargs -n1 wc -l instead of xargs cat as above.

Some notes: the approach above relies on your file names being in a format that date can parse. This is the case for the example names you provided, but it might break if the format changes. It also requires the file names to end with a lowercase .csv. I am not sure whether some special characters in file names might break things (spaces should probably be safe; newlines will surely break it).
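A sketch of that line-count variant, again assuming GNU date and GNU xargs, with hypothetical sample files created here for demonstration:

```shell
# Hypothetical sample files with differing line counts:
mkdir -p datecsv
printf 'a\nb\n' > datecsv/January-01-2018.csv
printf 'x\n'   > datecsv/April-30-2018.csv

# Same pipeline, but "xargs -n1 wc -l" runs wc once per file,
# printing "linecount path" for each file in date order:
find datecsv -name '*.csv' |
while read -r path; do
    name=$(basename "$path" .csv | tr '-' ' ')
    echo "$(date -d "$name" +%s) $path"
done |
sort -n |
cut -d ' ' -f2- |
xargs -n1 wc -l
# prints: 2 datecsv/January-01-2018.csv
#         1 datecsv/April-30-2018.csv
```

The -n1 matters: without it, wc would receive all files at once and append a "total" line instead of one clean count per file.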