I have a zipped file Data.zip
that (if uncompressed) contains many files:
file_1.txt
file_2.txt
...
...
I want a CLI command that turns this into a new folder Data_zipped
containing each of the files from Data.zip
compressed individually:
Data_zipped/file_1.zip
Data_zipped/file_2.zip
...
...
But the trick is that Data.zip
contains so many files (and they are collectively so big) that I cannot first extract all of Data.zip and then compress each file inside it in one go: it all has to
happen 'on the fly':
For all files in Data.zip:
- get the i-th file
- compress it into name_of_that_file.zip
- store the compressed file in the new folder Data_zipped
How to do that using the CLI?
I modified @George's super clear script to help better explain the folder structure:
#!/bin/bash
# Name of the zip file
filename="$1"
# Check that a valid zip file was passed
if file "$filename" | grep -q "Zip archive data"
then
    # List the contents of the zip file
    unzip -l "$filename"
    # Derive a file count from unzip's trailing summary line (its entry count minus 2)
    count=$(unzip -l "$filename" | awk '{count = $2 - 2} END {print count}')
    echo "$count"
fi
exit 0
When I run it on a small sample Data.zip (only a few files in it, but you get the idea), I get:
./GU_script.sh Data.zip
Archive: Data.zip
Length Date Time Name
--------- ---------- ----- ----
0 2017-11-21 22:58 Data/
120166309 2017-11-21 14:58 Data/Level1_file.csv
120887829 2017-11-21 14:58 Data/Level1_other_file.csv
163772796 2017-11-21 14:59 Data/Level1_yet_other_file.csv
193519556 2017-11-21 14:59 Data/Level1_here_is_another_file.csv
153798779 2017-11-21 14:59 Data/Level1_so_many_files.csv
131918225 2017-11-21 14:59 Data/Level1_many_more_to_go.csv
--------- -------
884063494 7 files
5
So basically, I would like Level1_file.csv and the other files to be zipped individually (-> Level1_file.zip) and put in a folder.
Edit 2:
I ended up combining @George's and @David Foerster's answers:
#!/bin/bash
# Name of the zip file
filename="$1"
# Check that a valid zip file was passed
if file "$filename" | grep -wq "Zip archive data"
then
    src="$filename"
    dst=.
    LC_ALL=C unzip -l "$src" |
    sed -re '1,/^-{6}/d; /^-{6}/,$d; /\/$/d; s/^\s*(\S+\s+){3}//' |
    while IFS= read -r f; do
        out="${f##*/}"; out="$dst/${f%%/*}_zipped/${out%.*}.zip"
        if [ ! -d "${out%/*}" ]; then
            mkdir -p "${out%/*}" || break
        fi
        zip --copy "$src" --out "$out" "$f" || break
    done
else
    echo "Invalid file type: \"zip\" file required"
    exit 1
fi
You can use the “copy” operation of zip(1) and some file-path mangling. It has the advantage of copying compressed data streams directly into the target archive, without intermediate decompression and recompression. I added LC_ALL=C to the invocation of unzip because its output format looks a little flaky across different implementations, and I want to avoid locale-dependent output variants at least. The script above (see Edit 2) should be able to do what you want.
Have you considered looking into a FUSE filesystem with zip support?
This basically exposes the zip file as a regular directory, which any application may open and read files from, while the FUSE library handles the dirty details of reading and writing the compressed stream.
On Ubuntu you can install it with
sudo apt install fuse-zip
After installing fuse-zip you can mount a zip file with fuse-zip /path/to/some.zip mnt/, where mnt is an empty directory of your choosing. When you're done, unmount it with fusermount -u mnt/, where mnt is the directory where you mounted it. fuse-zip will even create the zip on the fly for you if it doesn't exist.
You can unzip the files contained in Data.zip one by one:
unzip Data.zip file1.txt
and then compress each of them.