I am using tar
to archive a group of very large (multi-GB) bz2
files.
If I use tar -tf file.tar
to list the files within the archive, this takes a very long time to complete (~10-15 minutes).
Likewise, cpio -t < file.cpio
takes just as long to complete, plus or minus a few seconds.
Accordingly, retrieving a file from an archive (via tar -xf file.tar myFileOfInterest.bz2
for example) is just as slow.
Is there an archival method out there that keeps a readily available "catalog" with the archive, so that an individual file within the archive can be retrieved quickly?
For example, some kind of catalog that stores a pointer to a particular byte in the archive, as well as the size of the file to be retrieved (and any other filesystem-specific particulars).
Is there a tool (or argument to tar
or cpio
) that allows efficient retrieval of a file within the archive?
tar (and cpio, afio, pax and similar programs) use stream-oriented formats - they are intended to be streamed directly to a tape or piped into another process. While, in theory, it would be possible to add an index at the end of the file/stream, I don't know of any version that does (it would be a useful enhancement though).
It won't help with your existing tar or cpio archives, but there is another tool, dar ("disk archive"), that does create archive files containing such an index and can give you fast direct access to individual files within the archive.
If dar isn't included with your Unix/Linux distribution, you can find it at:
http://dar.linux.free.fr/
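For example, a rough sketch (the archive basename and paths are placeholders, and dar writes the archive as one or more slices named like mybackup.1.dar):

dar -c mybackup -R /data/bz2files          # create an archive with a built-in catalogue
dar -l mybackup                            # list contents by reading the catalogue only
dar -x mybackup -g myFileOfInterest.bz2    # restore just that one file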
You could use SquashFS for such archives. It is a compressed filesystem image that you mount and browse like any other filesystem, so individual files can be read directly without unpacking the rest.
The only drawback I know of is that it is read-only.
http://squashfs.sourceforge.net/ http://www.tldp.org/HOWTO/SquashFS-HOWTO/whatis.html
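A minimal sketch of the workflow (paths and names are placeholders; squashfs support is needed in the kernel or via FUSE):

mksquashfs /data/bz2files archive.sqsh            # build a compressed, indexed image
sudo mount -o loop archive.sqsh /mnt/archive      # mount it via loopback
cp /mnt/archive/myFileOfInterest.bz2 .            # read a single file directly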
While it doesn't store an index, star is purported to be faster than tar. Plus it supports longer filenames and has better support for file attributes.

As I'm sure you're aware, decompressing the file takes time and would likely be a factor in the speed of extraction even if there was an index.
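For reference, a typical star invocation would look something like this (names are placeholders; check star's man page for the exact options):

star -c f=archive.star /data/bz2files          # create
star -t f=archive.star                         # list (still a linear scan, no index)
star -x f=archive.star myFileOfInterest.bz2    # extract a single file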
Edit: You might also want to take a look at xar. It has an XML header that contains information about the files in the archive (see the referenced page for details).
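A quick sketch of how that would be used (names are placeholders):

xar -cf archive.xar /data/bz2files          # create; the XML table of contents sits at the front of the file
xar -tf archive.xar                         # list from the header without scanning the whole archive
xar -xf archive.xar myFileOfInterest.bz2    # extract just one entry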
The only archive format I know of that stores an index is ZIP, because I've had to reconstruct corrupted indexes more than once.
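Because the central directory sits at the end of a ZIP file, listing and pulling out one member is quick even on large archives, for example (names are placeholders):

unzip -l archive.zip                        # read only the index
unzip archive.zip myFileOfInterest.bz2      # extract a single member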
Thorbjørn Ravn Andersen is right. GNU tar creates "seekable" archives by default, but it does not use that information when reading them unless the -n option is given. With the -n option I just extracted a 7GB file from a 300GB archive in the time required to read/write 7GB. Without -n it took more than an hour and produced no result.
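For example (archive and file names are placeholders; -n is the short form of --seek in GNU tar):

tar -n -xf file.tar myFileOfInterest.bz2    # tells tar the archive is seekable so it can skip to the member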
I'm not sure how compression affects this. My archive was not compressed. Compressed archives are not "seekable" because current (1.26) GNU tar offloads compression to an external program.

It doesn't have an index that I know of, but I use dump & restore with large files, and navigating the restore tree in interactive mode to select random files is VERY fast.
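A minimal sketch with placeholder names (assumes the archive was created with dump):

restore -i -f /backups/home.dump    # open the dump in interactive mode
# inside the restore shell:
#   ls             - browse the tree
#   add some/file  - mark a file for extraction
#   extract        - pull out only the marked files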
You can use the 7z (7zip) archive/compression format if you have access to the
p7zip-full
package. On Ubuntu you can use this command to install it:
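sudo apt-get install p7zip-full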
To create an archive you can use
7z a <archive_name> <file_or_directory>
and if you do not want to compress the files and want to just "store" them as-is, you can use the -mx0 option like:
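7z a -mx0 archive.7z myFileOfInterest.bz2    # archive.7z and the file name are placeholders; -mx0 means no compression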
You can then extract the files using 7z e:
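7z e archive.7z myFileOfInterest.bz2    # extracts just that member (placeholder names again)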
Or you can list the index of the archive with 7z l, which is handy for searching with grep:
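7z l archive.7z | grep myFileOfInterest    # the listing is read from the archive's index, so it is fast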
There is also the t option to test integrity, u to add/update a file to the archive, and d to delete a file.

IMPORTANT NOTE
Do not use the 7zip format for linux filesystem backups as it does not store the owner and group of the files contained.
I believe GNU tar is capable of doing what you want, but I cannot locate a definitive resource saying so.
In any case you need an archiving format with an index (since that will allow you to do what you want). I do not believe ZIP files can grow that big, unfortunately.