I just learned that cpio has three modes: copy-out, copy-in and pass-through.
I was wondering what the advantages and disadvantages of cpio in copy-out and copy-in modes are compared to tar. When is it better to use cpio, and when to use tar?
Similar question for cpio under pass-through mode versus cp.
Thanks and regards!
This is a very broad overview:

CPIO does a better job of duplicating a file system, including taking backups. It preserves things like hard links, FIFOs, and other non-regular-file features. Most implementations of CPIO do everything TAR does, including reading and writing .tar files. CPIO usually takes the list of files to archive from standard input; this makes it very easy to pipe in a list from something else (like `find`).

CPIO's pass-through mode is very useful if you have a very long list of files you want to copy from directory A to directory B. (For example, you could use `find` to locate all files that have changed in the last 2 years on your system.)

TAR does a better job of simply dumping all your regular files to/from a tape (or archive file). It's a bit simpler to use for most common tasks, and it easily meets most people's simple backup needs; most of its popularity comes from this fact.
And now for the fine print: there are several different versions and implementations of both CPIO and TAR, each with different features and some with different command-line options. There are things each can do that the other cannot; if you find yourself limited by one, try the other. Everyone has a favorite, and 99% of the time either will accomplish the task.
On Red Hat AS 3, I found that cpio had a 2 GB size limit on an output stream. tar did not have this limitation.
Other systems might have different limitations.
I understand from the comments and other background that `cpio` is less ubiquitous now and inconsistent between versions. But `cpio` has one advantage I recently found invaluable when dealing with a large number of corrupt tar archives: it does not stop at the first error in a tar file, but attempts to skip the bad data and extract as much as possible. For example, tar will print an error and stop after the first corrupt header it encounters, whereas `cpio` will print the files it managed to extract plus a warning for each error, and keep going. The tar format expects each archive header to be aligned on a 512-byte boundary, but if corruption mis-aligns the headers, `cpio`
makes a best effort to extract as much as possible.

I prefer cpio too. However, when using `cpio` on a file set of unknown origin (like files created by end users) it is better to work with NUL-terminated file names: use the `-print0` flag with find and the `-0` (`--null`) flag with cpio. This way, files with weird names (like ones including CR or NL characters) are handled correctly.

I see no reason to use cpio other than for ripping open RPM files, via disrpm or rpm2cpio, but there may be corner cases in which cpio is preferable to tar.
History and popularity
Both tar and cpio are competing archive formats that were introduced in Version 7 Unix in 1979 and then included in POSIX.1-1988, though only tar remained in the next standard, POSIX.1-2001.
Cpio's file format has changed several times and has not remained fully compatible between versions. For example, there is now an ASCII-encoded representation of the binary header data.
Tar is more universally known, has become more versatile over the years, and is more likely to be supported on a given system. Cpio is still used in a few areas, such as the Red Hat package format (RPM), though RPM v5 (which is admittedly obscure) uses xar instead of cpio.
Both live on most Unix-like systems, though tar is more common.
Modes
Copy-out: this is for archive creation, akin to `tar -pc`.

Copy-in: this is for archive extraction, akin to `tar -px`.

Pass-through: this is basically both of the above, akin to `tar -pc … | tar -px` but in a single command (and therefore microscopically faster). It's similar to `cp -pdr`, though both cpio and (especially) tar offer more customizability. Also consider `rsync -a`, which people often forget since it's more typically used across a network connection.

I have not compared their performance, but I expect they'll be quite similar in CPU, memory, and archive size (after compression).