I'm trying to analyze an enormous text file (1.6GB), whose data lines look like this:
20090118025859 -2.400000 78.100000 1023.200000 0.000000 20090118025900 -2.500000 78.100000 1023.200000 0.000000 20090118025901 -2.400000 78.100000 1023.200000 0.000000
I don't even know how many lines there are, but I'm trying to split the file by date. The leftmost number is a timestamp (these lines are from January 18th, 2009).
How can I split this file into pieces according to the date?
All I know would be to `grep '^20090118' file > data20090118.dat`, but surely there is a way to do all the dates at once, right?
The number of entries per date differs, so using `split` with a constant line count won't work.
Thanks in advance,
Alex
Assuming the file is sorted and the dates are always there, this should work:
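A minimal sketch, assuming the timestamp starts each line so that its first 8 characters are the `YYYYMMDD` date:

```bash
#!/bin/bash
# Split the input file into one file per date.
base_dir=.    # target directory for the <date>.txt files

while IFS= read -r line; do
    date=${line:0:8}                       # first 8 chars = YYYYMMDD
    echo "$line" >> "$base_dir/$date.txt"  # >> appends, never truncates
done < "$1"
```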
[Save it as `my_splitter`, make it executable by running `chmod +x my_splitter`, then call it like `./my_splitter input_file`.] It reads the input file line by line, extracts the date and uses it to append lines with the same date to the same file.
`base_dir` is the target directory, and the files will be of the form `<date>.txt`. Note: existing files won't be overwritten; new lines would be appended because of the `>>` redirection, so make sure the target directory doesn't already contain any files of the form `<date>.txt`.

This could probably work for you:
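One possibility, as a sketch: an awk one-liner that reads the file once and routes every line to a file named after its date (this assumes the timestamp is always the first field):

```bash
# One pass over the file; each line is appended to data<YYYYMMDD>.dat.
awk '{ print > ("data" substr($1, 1, 8) ".dat") }' input_file
```

Unlike running grep once per date, this reads the 1.6GB file only once. Be aware that some awk implementations cap the number of simultaneously open output files; since the input is sorted by time, you could `close()` the previous file whenever the date changes.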
I would use brace expansion `{x..y}`, cascading over year, month and day, with a schema like:
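A sketch of that idea (zero-padded ranges like `{01..12}` need bash 4+, and the year/month/day bounds here are assumptions to adjust to your data):

```bash
#!/bin/bash
# Cascade over year, month and day via brace expansion and grep
# each date into its own file.
for ymd in 2009{01..12}{01..31}; do
    grep "^$ymd" input_file > "data$ymd.dat"
done
# grep leaves empty files for dates with no entries; remove them
# (GNU find).
find . -maxdepth 1 -name 'data*.dat' -empty -delete
```

Note that this re-reads the whole file once per date, so on a 1.6GB input it will be far slower than a single-pass approach.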