Question

Ryan Detzel

Asked: 2010-06-15 04:36:52 +0800 CST

Does gunzip work in memory or does it write to disk?

772

We have our log files gzipped to save space. Normally we keep them compressed and just do

gunzip -c file.gz | grep 'test'

to find important information but we're wondering if it's quicker to keep the files uncompressed and then do the grep.

cat file | grep 'test'

There has been some discussion about how gzip works. If it reads the file into memory and decompresses it there, the first command should be faster; if it writes to disk first, the second should be. Does anyone know how gzip decompresses data?

6 Answers

McJeff · Answer 1 · 2010-06-15T04:53:47+08:00

Best Answer

It's always going to be quicker to cat the uncompressed file as there's no overhead associated with that. Even if you're not writing a temporary file, you're going through the decompression motions, which munch CPU. If you're accessing these files often enough, it's probably better to keep them uncompressed if you have the space.

That said, dumping data to standard out (gunzip -c, zcat, etc.) won't trigger writing to a temporary file. The data is piped directly to the grep command, which reads the uncompressed stream as its own standard input.
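If you want to check this for yourself, here is a minimal sketch. The scratch directory and the sample.log name are made up for the illustration; the only point is that no uncompressed copy shows up on disk after the pipeline runs.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small sample log and compress it (gzip replaces sample.log with sample.log.gz).
printf 'alpha\ntest line\nomega\n' > sample.log
gzip sample.log

# Decompress to stdout and search the stream through the pipe.
matches=$(gunzip -c sample.log.gz | grep -c 'test')
echo "matches: $matches"

# The directory still holds only the compressed file.
ls
```

The final ls should list sample.log.gz and nothing else.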

The Wikipedia article on LZ* encoding is here: http://en.wikipedia.org/wiki/LZ77_and_LZ78.

5

Dennis Williamson · Answer 2 · 2010-06-15T05:36:59+08:00

As always, nothing beats actual measurement.

Your mileage may vary, but on my system, grepping an already uncompressed file took about a third the time that piping zcat or gunzip into grep did. This isn't surprising.
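A rough way to run that measurement yourself, assuming bash (for its time keyword) and a generated throwaway file; the file name and size are arbitrary choices for this sketch, so substitute one of your real logs for meaningful numbers.

```shell
# Scratch directory so the experiment stays away from your real logs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Throwaway log with one matching line; the size is an arbitrary choice.
seq 1 200000 | sed 's/$/ some log text/' > sample.log
printf 'a test line\n' >> sample.log
gzip -c sample.log > sample.log.gz   # keep both forms side by side

# Both approaches must find the same lines before timing means anything.
plain=$(grep -c 'test' sample.log)
packed=$(gunzip -c sample.log.gz | grep -c 'test')
echo "plain=$plain packed=$packed"

# bash's time keyword covers the whole pipeline; timings go to stderr.
time grep -c 'test' sample.log
time gunzip -c sample.log.gz | grep -c 'test'
```

Run the timed commands more than once. The first pass may include reading the file from disk, while later passes are likely served from the page cache, and the two cases can give quite different answers.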

3

Luke404 · Answer 3 · 2010-06-15T13:04:43+08:00

Using compression could actually deliver faster throughput to disks, but that depends on a number of factors, including the compression algorithm used and the kind of data you're moving around. ZFS, for example, heavily relies on this assumption.

gzip will either decompress the whole file to disk, replacing myfile.gz with myfile (the standard gzip -d myfile.gz), or use no file at all, reading a few blocks of compressed data at a time and writing the uncompressed data to stdout (gzip -d -c ...).
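For the on-disk form, a small sketch of what ends up in the directory. The myfile name follows the example above; the scratch directory is an assumption of this illustration.

```shell
# Scratch directory for the demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one\ntwo test\n' > myfile
gzip myfile          # myfile is replaced by myfile.gz

gzip -d myfile.gz    # the plain form: writes myfile to disk, removes myfile.gz
ls                   # lists only: myfile
grep 'test' myfile   # prints: two test
```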

On a modern system I suspect gunzip | grep could be faster than grepping an uncompressed file. Either way, gunzip | grep will always win over decompressing a file to disk and then grepping the uncompressed copy :)

2

Vi. · Answer 4 · 2010-06-15T05:38:20+08:00

You can also substitute gzip with lzo to improve performance.

Using LZO can make things faster (less disk input/output and little compression CPU overhead).

1

Rob Shinn · Answer 5 · 2010-06-15T06:28:01+08:00

gzip -dc | grep foo (or gunzip -c | grep foo) writes to a pipe. How the pipe is implemented depends on your operating system, but generally it will stay in memory. As others have pointed out, grepping an uncompressed file is always going to be faster due to the time it takes to decompress the compressed data. Using a different compression program may or may not improve performance; you can always measure it.

1

pjz · Answer 6 · 2010-06-15T11:58:34+08:00

It depends on file size: when I/O dominates, the CPU time spent decompressing is smaller than the time spent transferring the file. Whether I/O will dominate depends heavily on the relative speeds of your CPU, your storage systems, and the bandwidth between them.

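Also, as an aside, zgrep is handy. Some grep builds (BSD grep, for example) offer the same behavior as grep -Z; in GNU grep, -Z means --null instead, so check your man page.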
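A quick sketch of zgrep in use, assuming the zgrep script that normally ships alongside gzip is installed; the sample file is made up for the illustration.

```shell
# Scratch directory with a small compressed sample.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\ntest line\nomega\n' | gzip -c > sample.log.gz

# zgrep decompresses and searches in one step; the file stays compressed.
zgrep 'test' sample.log.gz
```

It saves typing the gunzip -c | grep pipeline by hand, but the decompression work still has to happen, so don't expect it to change the timing much.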

0
