I have a file-based DB that has about 2M files stored in 3 levels of subdirectories:
2/2/6253
2/2/6252
...
Files vary from 30 bytes to 60 KB. The whole DB is read-only and about 125 GB in size.
Added: all files are compressed with zlib (Python).
I want to handle it all as one file containing a filesystem. Which filesystem would be my best choice?
At the moment I use the following script:
dd if=/dev/zero of=/my_file.iso bs=1024K count=60000
mkfs.ext4 -F /my_file.iso
mount -o loop /my_file.iso /mnt/
You probably just want to use XFS.
It's quite capable of what you're asking for, and does the job.
There's no reason to complicate this with lesser-used filesystems, which can come with other tradeoffs.
Please see: "How does the number of subdirectories impact drive read / write performance on Linux?" and "The impact of a high directory-to-file ratio on XFS".
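For instance, the script from the question could be adapted to XFS like this (a minimal sketch, reusing the size and paths from the question):
dd if=/dev/zero of=/my_file.img bs=1024K count=60000
mkfs.xfs /my_file.img
mount -o loop /my_file.img /mnt/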
If you want something more esoteric, ZFS zvols with a filesystem on top could provide an interesting alternative (for compression, integrity and portability purposes).
See here: "Transparent compression filesystem in conjunction with ext4".
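A rough sketch of the zvol route, assuming an existing pool named tank (the pool and dataset names here are illustrative):
zfs create -V 60G -o compression=lz4 tank/db
mkfs.ext4 /dev/zvol/tank/db
mount /dev/zvol/tank/db /mnt/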
If it is read-only, why not use an ISO file? You can use genisoimage or mkisofs. If you want to compress the whole thing, you can also use squashfs, another read-only filesystem with a very high compression ratio.
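A minimal sketch with genisoimage (-R adds Rock Ridge extensions so long filenames survive; /path/to/db is a placeholder):
genisoimage -R -o /my_file.iso /path/to/db
mount -o loop /my_file.iso /mnt/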
Seeing the number of small files, I would consider using SquashFS, especially if you have a powerful enough CPU (meaning no Pentium III or 1 GHz ARM).
Depending on the type of data stored, SquashFS can greatly reduce its size and thus the I/O when reading it. The only downside is CPU usage on read. On the other hand, any modern CPU can decompress at speeds far outperforming HDDs and probably even SSDs.
As another advantage, you save space/bandwidth and/or time spent decompressing after transfer.
There are some benchmarks comparing it to ISO and other similar means. As with every benchmark, take it with a grain of salt and, better yet, make your own. ;-)
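Creating and mounting such an image is one command each; a minimal sketch, with /path/to/db as a placeholder:
mksquashfs /path/to/db /my_file.sqsh
mount -o loop /my_file.sqsh /mnt/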
Edit: depending on circumstances (and I am not daring to guess here), SquashFS without compression (mksquashfs -noD) could outperform ext4, as the code for reading should be much simpler and optimized for read-only operation. But that is really up to you to benchmark in your use case. Another advantage is that the SquashFS image is just a little larger than your data, while with ext4 you always have to create a larger loop device. The disadvantage is, of course, that it is rather uncomfortable when you need to change the data; that is way easier with ext4.

I am not sure if this fits your purpose, but have you considered tar to combine multiple files? That might decrease the pressure and space requirements on the filesystem, and your database application can read the data for a specific file with one of the many tar libraries around. Depending on your access pattern, this might even improve performance; see the sketch below.
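A minimal sketch of the shell equivalent of what such a library would do (/path/to/db is a placeholder; the member path reuses an example path from the question):
# pack the tree once
tar -cf /my_file.tar -C /path/to/db .
# stream a single member to stdout (-O), ready to be fed to zlib decompression
tar -xOf /my_file.tar ./2/2/6253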