I have a 3-level directory structure, where each level is named by 2 hex digits, like so:
0A/FF/2B/someimagefile.gif
I have 300M small files in 1.5TB of compressed archives that will populate these directories (more files will come in the future, so I chose this directory structure to keep the mass of files from crashing a typical extX filesystem).
Unpacking these files moves at 1MB per second (or ~18 days to unpack). Ouchie!
I guess it was slow because I was creating the directory structure and then the files (all done through Java APIs). So I set out to create just the directory structure on its own, in a bash loop.
Creating the directories alone is about a 5-day task at the current rate.
Any ideas on improving the speed that this moves?
UPDATE
One part of the puzzle is solved: using perl rather than bash creates the directories over 200 times faster. Now it's an operation that gives you a coffee break, not an extended weekend off.
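For reference, something along these lines (a sketch, not the exact script; /data/images is just a placeholder base path). The full tree is 256^3, about 16.7 million, directories, and the win over bash is presumably that perl issues the mkdir calls in-process instead of forking an external mkdir for every single directory:

    #!/usr/bin/perl
    # Sketch: build the full 00/00/00 .. FF/FF/FF tree with in-process mkdir calls.
    use strict;
    use warnings;

    my $base = '/data/images';                       # placeholder base path
    my @hex  = map { sprintf '%02X', $_ } 0 .. 255;  # '00' .. 'FF'

    for my $d1 (@hex) {
        mkdir "$base/$d1";
        for my $d2 (@hex) {
            mkdir "$base/$d1/$d2";
            for my $d3 (@hex) {
                mkdir "$base/$d1/$d2/$d3" or warn "mkdir $base/$d1/$d2/$d3: $!";
            }
        }
    }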
But file creation is still extremely slow, even without needing to create the directories.
My final answer to this: "Don't do it".
I could not find a way to improve the speed beyond about 2 MB/sec when creating many small files. For terabyte data volumes this is just too much inertia to work against.
We are following in the footsteps of Facebook and dumping the files to a binary data store (or using a massive MySQL/MyISAM table with BLOBs, experimenting now...).
It's a bit more complex, but it eliminates the random-seek problem associated with small files, and I can work with terabyte volumes of data in a matter of hours, or a day, rather than weeks.
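To give an idea of the shape of it, here is a rough sketch of the MySQL/MyISAM route, again in perl via DBI (the connection details, table name and column sizes are placeholders, and MEDIUMBLOB caps each file at 16MB):

    #!/usr/bin/perl
    # Sketch: one row per file, keyed by the old 0A/FF/2B/... style path.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=filestore;host=localhost',
                           'user', 'password', { RaiseError => 1 });

    # Placeholder schema for the BLOB store.
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS files (
            path VARCHAR(255) NOT NULL PRIMARY KEY,
            data MEDIUMBLOB   NOT NULL
        ) ENGINE=MyISAM
    });

    my $path = '0A/FF/2B/someimagefile.gif';
    open my $fh, '<:raw', $path or die "open $path: $!";
    my $blob = do { local $/; <$fh> };   # slurp the whole file
    close $fh;

    $dbh->do('INSERT INTO files (path, data) VALUES (?, ?)', undef, $path, $blob);
    $dbh->disconnect;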
MongoDB has come in as another good option to investigate.
Remount the filesystem with the noatime and nodiratime options.
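For example (the mount point is a placeholder):

    mount -o remount,noatime,nodiratime /data

With these options the kernel no longer writes back an access-time update every time a file or directory is read, which saves a lot of metadata writes when you are touching millions of inodes.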