I have been asked to come up with fio benchmark results for this test dataset: 1048576x1MiB. So the overall size is 1 TiB, and the set contains 2^20 1MiB files. The server runs CentOS Linux release 7.8.2003 (Core). It has sufficient RAM:
[root@tbn-6 src]# free -g
              total        used        free      shared  buff/cache   available
Mem:            376           8         365           0           2         365
Swap:             3           2           1
It's actually not a physical server. Instead, it's a Docker container with the following CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
[...]
Why Docker? We are working on a project that evaluates the appropriateness of using containers instead of physical servers. Back to the fio issue.
I remember having trouble before with fio dealing with a dataset consisting of many small files. So I did the following checks:
[root@tbn-6 src]# ulimit -Hn
8388608
[root@tbn-6 src]# ulimit -Sn
8388608
[root@tbn-6 src]# cat /proc/sys/kernel/shmmax
18446744073692774399
That all looked OK to me. I also compiled the latest fio (3.23 as of this writing) with GCC 9.
[root@tbn-6 src]# fio --version
fio-3.23
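For anyone who wants to reproduce the setup on CentOS 7, the build would look roughly like this (a sketch: it assumes GCC 9 comes from the devtoolset-9 software collection, and the source directory name is illustrative):
# get a GCC 9 toolchain into the current shell (devtoolset-9 provides GCC 9 on CentOS 7)
scl enable devtoolset-9 bash
# build fio from its source tree (directory name illustrative)
cd fio-3.23
./configure
make -j"$(nproc)"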
Here is the job file:
[root@tbn-6 src]# cat testfio.ini
[writetest]
thread=1
blocksize=2m
rw=randwrite
direct=1
buffered=0
ioengine=psync
gtod_reduce=1
numjobs=12
iodepth=1
runtime=180
group_reporting=1
percentage_random=90
opendir=./1048576x1MiB
Note: of the above, the following can be taken out:
[...]
gtod_reduce=1
[...]
runtime=180
group_reporting=1
[...]
The rest MUST be kept. This is because, in our view, the fio job file should be set up to emulate the application's interactions with storage as closely as possible, even knowing that fio != the application.
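For reference, with those optional lines taken out, the job file would look like this:
[writetest]
thread=1
blocksize=2m
rw=randwrite
direct=1
buffered=0
ioengine=psync
numjobs=12
iodepth=1
percentage_random=90
opendir=./1048576x1MiB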
I did the first run like so:
[root@tbn-6 src]# fio testfio.ini
smalloc: OOM. Consider using --alloc-size to increase the shared memory available.
smalloc: size = 368, alloc_size = 388, blocks = 13
smalloc: pool 0, free/total blocks 1/524320
smalloc: pool 1, free/total blocks 8/524320
smalloc: pool 2, free/total blocks 10/524320
smalloc: pool 3, free/total blocks 10/524320
smalloc: pool 4, free/total blocks 10/524320
smalloc: pool 5, free/total blocks 10/524320
smalloc: pool 6, free/total blocks 10/524320
smalloc: pool 7, free/total blocks 10/524320
fio: filesetup.c:1613: alloc_new_file: Assertion `0' failed.
Aborted (core dumped)
OK, so time to use --alloc-size:
[root@tbn-6 src]# fio --alloc-size=776 testfio.ini
smalloc: OOM. Consider using --alloc-size to increase the shared memory available.
smalloc: size = 368, alloc_size = 388, blocks = 13
smalloc: pool 0, free/total blocks 1/524320
smalloc: pool 1, free/total blocks 8/524320
smalloc: pool 2, free/total blocks 10/524320
smalloc: pool 3, free/total blocks 10/524320
smalloc: pool 4, free/total blocks 10/524320
smalloc: pool 5, free/total blocks 10/524320
smalloc: pool 6, free/total blocks 10/524320
smalloc: pool 7, free/total blocks 10/524320
smalloc: pool 8, free/total blocks 8/524288
smalloc: pool 9, free/total blocks 8/524288
smalloc: pool 10, free/total blocks 8/524288
smalloc: pool 11, free/total blocks 8/524288
smalloc: pool 12, free/total blocks 8/524288
smalloc: pool 13, free/total blocks 8/524288
smalloc: pool 14, free/total blocks 8/524288
smalloc: pool 15, free/total blocks 8/524288
fio: filesetup.c:1613: alloc_new_file: Assertion `0' failed.
Aborted (core dumped)
Back to square one :(
I must be missing something. Any help would be much appreciated.
(TL;DR: setting --alloc-size to a big number helps.)

I bet you can simplify this job down and still reproduce the problem (which will be helpful for whoever looks at this, because there are fewer places to look). I'd guess the crux is that opendir option and the fact that you say the directory contains "2^20 1MiB files"... If you read the documentation of --alloc-size you will notice it mentions fio running out of memory on large jobs with randommap enabled. By default fio evenly distributes random I/O across a file (each block is written once per pass), but to do so it needs to keep track of the areas it has written, which means it has to keep a data structure per file. OK, you can see where this is going...
Memory pools are set aside for certain data structures (because they have to be shared between jobs). Initially there are 8 pools (https://github.com/axboe/fio/blob/fio-3.23/smalloc.c#L22) and by default each pool is 16 megabytes in size (https://github.com/axboe/fio/blob/fio-3.23/smalloc.c#L21).
Each file that does random I/O requires a data structure to go with it. Based on your output, let's guess that each file forces the allocation of a data structure of 368 bytes + a header (https://github.com/axboe/fio/blob/fio-3.23/smalloc.c#L434), which combined comes to 388 bytes. Because the pool works in blocks of 32 bytes (https://github.com/axboe/fio/blob/fio-3.23/smalloc.c#L70), this means we actually take a bite of 13 blocks (416 bytes) out of a pool per file.
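A rough back-of-the-envelope calculation from those numbers (mine, so please double check):
files=1048576                               # 2^20 files in the dataset
per_file=$(( 13 * 32 ))                     # 13 x 32-byte blocks = 416 bytes of pool space per file
echo $(( per_file * files / 1024 / 1024 ))  # ~416 MiB of pool space needed in total
echo $(( 8 * 16 ))                          # versus 128 MiB across the default 8 x 16 MiB pools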
Out of curiosity: how big can /tmp be? (While fio is running, its smalloc pools are backed by .fio_smalloc.* files in /tmp.) I don't think this is germane to your issue, but it would be good to rule it out.
Update: by default, Docker limits the amount of IPC shared memory (also see its --shm-size option). It's unclear whether that was a factor in this particular case, but see the "original job only stopped at 8 pools" comment below.
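If you want to rule that out, something like the following would show and, if necessary, raise the limit (illustrative - the image name and the size are placeholders):
df -h /dev/shm                           # the container's IPC shared memory limit (Docker defaults to 64 MiB)
docker run --shm-size=1g <your-image>    # start the container with a larger allowance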
So why didn't setting --alloc-size=776 help? Looking at what you wrote, it seems odd that your blocks per pool didn't increase, right? I notice your pools grew to the maximum of 16 (https://github.com/axboe/fio/blob/fio-3.23/smalloc.c#L24) the second time around. The documentation for --alloc-size notes that its value is given in KiB. You used --alloc-size=776... isn't 776 KiB smaller than 16 MiB? That would make each pool smaller than the default, which may explain why it tried to grow the number of pools to the maximum of 16 before giving up in your second run. The above arithmetic suggests you want each pool to be approximately 52 megabytes in size if you are going to have 8 of them, for a sum total of approximately 416 megabytes of RAM. What happens when you use --alloc-size=53248?
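In other words, roughly (again, my arithmetic):
echo $(( 416 / 8 ))      # ~52 MiB per pool if ~416 MiB is spread over 8 pools
echo $(( 52 * 1024 ))    # 53248 - the value to try, since --alloc-size takes KiB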
Update: the calculated number above was too low. In a comment the question asker reports that a much higher setting of --alloc-size=1048576 was required. (I'm a little concerned that the original job only stopped at 8 pools (128 MiB), though. Doesn't that suggest that trying to grow to a ninth 16 MiB pool was problematic?)
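That is, an invocation along the lines of (using the value the asker reported working):
fio --alloc-size=1048576 testfio.ini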
Finally, the fio documentation seems to be hinting that these data structures are allocated when you ask for a particular distribution of random I/O. This suggests that if the I/O is sequential, or if the I/O uses random offsets but DOESN'T have to adhere to a distribution, then maybe those data structures don't have to be allocated... What happens if you use norandommap?
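e.g. a line like the following added to the [writetest] section (a sketch - norandommap stops fio from tracking which blocks it has already covered, at the cost of possibly touching some blocks more than once and others not at all):
norandommap=1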
(Aside: blocksize=2m but your files are 1MiB big - is that correct?)

This question feels too big and too specialist for a casual Server Fault answer, and may get a better response from the fio project itself (see https://github.com/axboe/fio/blob/fio-3.23/REPORTING-BUGS and https://github.com/axboe/fio/blob/fio-3.23/README#L58).
Good luck!