We have a RabbitMQ server that runs correctly for a while, until it starts repeatedly setting and clearing memory alarms:
=INFO REPORT==== 25-Oct-2021::17:46:33 ===
vm_memory_high_watermark set. Memory used:3437756080 allowed:3338231808

=WARNING REPORT==== 25-Oct-2021::17:46:33 ===
memory resource limit alarm set on node 'rabbit@rab-server'.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

=INFO REPORT==== 25-Oct-2021::17:46:36 ===
vm_memory_high_watermark clear. Memory used:1541409584 allowed:3338231808

=WARNING REPORT==== 25-Oct-2021::17:46:36 ===
memory resource limit alarm cleared on node 'rabbit@rab-server'

=WARNING REPORT==== 25-Oct-2021::17:46:36 ===
memory resource limit alarm cleared across the cluster

=INFO REPORT==== 25-Oct-2021::17:46:42 ===
vm_memory_high_watermark set. Memory used:4035019336 allowed:3338231808

=WARNING REPORT==== 25-Oct-2021::17:46:42 ===
memory resource limit alarm set on node 'rabbit@rab-server'.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

=INFO REPORT==== 25-Oct-2021::17:46:45 ===
vm_memory_high_watermark clear. Memory used:1786022776 allowed:3338231808
It does this several times, and the jumps in memory used are always about 2.5 GB.
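To put numbers on those swings, the set/clear lines can be parsed. The Python sketch below assumes the exact line format shown in the excerpt above (other RabbitMQ versions may word it differently) and reports each reading relative to the allowed limit. The last line back-calculates total RAM assuming the default relative watermark of 0.4; that value is an assumption about this node's configuration, not something the log states.

```python
import re

# Matches lines such as:
#   vm_memory_high_watermark set. Memory used:3437756080 allowed:3338231808
# Written against the log excerpt above; other versions may differ.
MEM_RE = re.compile(
    r"vm_memory_high_watermark (set|clear)\. Memory used:(\d+) allowed:(\d+)"
)


def memory_events(log_text):
    """Return (state, used_bytes, allowed_bytes) tuples in log order."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in MEM_RE.finditer(log_text)]


def swings(events):
    """Change in used bytes between each event and the one before it."""
    return [cur[1] - prev[1] for prev, cur in zip(events, events[1:])]


if __name__ == "__main__":
    sample = """
vm_memory_high_watermark set. Memory used:3437756080 allowed:3338231808
vm_memory_high_watermark clear. Memory used:1541409584 allowed:3338231808
vm_memory_high_watermark set. Memory used:4035019336 allowed:3338231808
vm_memory_high_watermark clear. Memory used:1786022776 allowed:3338231808
"""
    events = memory_events(sample)
    for state, used, allowed in events:
        print(f"{state:5} used={used / 1e9:.2f} GB ({used / allowed:.0%} of allowed)")
    print("swings (GB):", [round(d / 1e9, 2) for d in swings(events)])
    # ASSUMPTION: default relative watermark of 0.4, so RAM = allowed / 0.4.
    print(f"implied RAM at 0.4: {events[0][2] / 0.4 / 1e9:.2f} GB")
```

On the four readings above, memory moves by roughly 1.9 to 2.5 GB between alarm set and clear, and peaks at about 121% of the allowed limit.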
Then it starts doing the same for disk space:
=INFO REPORT==== 25-Oct-2021::18:15:35 ===
Free disk space is insufficient. Free bytes: 44498944. Limit: 50000000

=WARNING REPORT==== 25-Oct-2021::18:15:35 ===
disk resource limit alarm set on node 'rabbit@rab-server'.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

=INFO REPORT==== 25-Oct-2021::18:16:05 ===
Free disk space is sufficient. Free bytes: 8649433088. Limit: 50000000

=WARNING REPORT==== 25-Oct-2021::18:16:05 ===
disk resource limit alarm cleared on node 'rabbit@rab-server'

=WARNING REPORT==== 25-Oct-2021::18:16:05 ===
disk resource limit alarm cleared across the cluster

=INFO REPORT==== 25-Oct-2021::18:18:17 ===
Free disk space is insufficient. Free bytes: 46092288. Limit: 50000000

=WARNING REPORT==== 25-Oct-2021::18:18:17 ===
disk resource limit alarm set on node 'rabbit@rab-server'.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

=INFO REPORT==== 25-Oct-2021::18:19:07 ===
Free disk space is sufficient. Free bytes: 2646163456. Limit: 50000000

=WARNING REPORT==== 25-Oct-2021::18:19:07 ===
disk resource limit alarm cleared on node 'rabbit@rab-server'

=WARNING REPORT==== 25-Oct-2021::18:19:07 ===
disk resource limit alarm cleared across the cluster

=INFO REPORT==== 25-Oct-2021::18:40:51 ===
Free disk space is insufficient. Free bytes: 49758208. Limit: 50000000

=WARNING REPORT==== 25-Oct-2021::18:40:51 ===
disk resource limit alarm set on node 'rabbit@rab-server'.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
The free disk space jumps from several gigabytes to less than 50 MB.
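The speed of those disk swings can be measured the same way, by pairing each report timestamp with the free-space message that follows it. This is a sketch that assumes the header and message format of the excerpt above; it is not a RabbitMQ tool.

```python
import re
from datetime import datetime

# Matches a report header followed directly by the disk monitor message,
# in the format of the log excerpt above (other versions may differ).
DISK_RE = re.compile(
    r"=INFO REPORT==== (\S+) ===\s*\n"
    r"Free disk space is (sufficient|insufficient)\. "
    r"Free bytes: (\d+)\. Limit: (\d+)"
)


def disk_events(log_text):
    """Return (timestamp, state, free_bytes, limit_bytes) tuples in log order."""
    events = []
    for ts, state, free, limit in DISK_RE.findall(log_text):
        when = datetime.strptime(ts, "%d-%b-%Y::%H:%M:%S")
        events.append((when, state, int(free), int(limit)))
    return events


def transitions(events):
    """(seconds elapsed, change in free bytes) between consecutive events."""
    return [((b[0] - a[0]).total_seconds(), b[2] - a[2])
            for a, b in zip(events, events[1:])]


if __name__ == "__main__":
    sample = """=INFO REPORT==== 25-Oct-2021::18:15:35 ===
Free disk space is insufficient. Free bytes: 44498944. Limit: 50000000

=INFO REPORT==== 25-Oct-2021::18:16:05 ===
Free disk space is sufficient. Free bytes: 8649433088. Limit: 50000000

=INFO REPORT==== 25-Oct-2021::18:18:17 ===
Free disk space is insufficient. Free bytes: 46092288. Limit: 50000000
"""
    for secs, delta in transitions(disk_events(sample)):
        print(f"{delta / 1e9:+.2f} GB free space in {secs:.0f} s")
```

On these three events it reports about 8.60 GB of free space regained in 30 s, then about 8.60 GB consumed again within 132 s.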
Then it crashes. If we try to restart it in this state, RabbitMQ attempts to start and dumps its entire database into the log file, which grows from about 600 lines to over 19,000,000 (yes, nineteen million).
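A log of 19 million lines is impractical to open in an editor, and if the log directory happens to share a partition with the data directory, its growth may itself eat into the free space the disk alarm watches. A streaming summary that counts report headers (INFO, WARNING, ERROR, CRASH, and so on) can show what dominates the file. The sketch assumes the header format seen above; the default path is a hypothetical guess built from the node name rabbit@rab-server and should be passed explicitly.

```python
import os
import re
import sys
from collections import Counter

# Erlang-style report headers as seen in the excerpts above, e.g.
# "=INFO REPORT==== 25-Oct-2021::17:46:33 ===".
HEADER_RE = re.compile(r"^=(\w+) REPORT====")


def summarize_lines(lines):
    """Return (total_lines, Counter of report kinds) for an iterable of lines."""
    counts = Counter()
    total = 0
    for line in lines:
        total += 1
        m = HEADER_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return total, counts


def summarize_log(path):
    # Iterating over the file object streams it line by line, so a
    # multi-million-line log is never loaded into memory at once.
    with open(path, encoding="utf-8", errors="replace") as f:
        return summarize_lines(f)


if __name__ == "__main__":
    # HYPOTHETICAL default path; pass the real log file as an argument.
    default = "/var/log/rabbitmq/rabbit@rab-server.log"
    path = sys.argv[1] if len(sys.argv) > 1 else default
    if os.path.isfile(path):
        total, counts = summarize_log(path)
        print(f"{total} lines, {os.path.getsize(path) / 1e6:.1f} MB on disk")
        for kind, n in counts.most_common():
            print(f"  {kind:<10} {n}")
    else:
        print(f"log file not found: {path}")
```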
I am having trouble figuring out what causes this behaviour, because when I run
df -h
while it is happening, none of the disks are full.
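The first two disk alarms cleared after 30 and 50 seconds, so a single df -h run can easily miss the dip. RabbitMQ's disk monitor checks free space on the partition that holds the node's data directory, so sampling that partition continuously may be more telling. This Python sketch polls shutil.disk_usage and keeps the minimum; /var/lib/rabbitmq is assumed as the data directory (the usual location for Debian and RPM packages) and should be replaced with the node's actual path.

```python
import os
import shutil
import time

# ASSUMED data directory: /var/lib/rabbitmq is the usual location for
# Debian/RPM packages. Replace it with the node's actual data directory.
DATA_DIR = "/var/lib/rabbitmq"


def watch_free_space(path, samples, interval_s):
    """Sample free bytes on the filesystem that holds `path`.

    Returns (lowest_free_bytes, all_readings) so a short dip that a single
    df -h would miss still shows up as the minimum.
    """
    readings = []
    for i in range(samples):
        readings.append(shutil.disk_usage(path).free)
        if i < samples - 1:  # no need to sleep after the last sample
            time.sleep(interval_s)
    return min(readings), readings


if __name__ == "__main__":
    path = DATA_DIR if os.path.isdir(DATA_DIR) else "/"
    # Short demo run; for real monitoring, use enough samples to span
    # several minutes, since the alarms above stayed set for 30-50 s.
    low, readings = watch_free_space(path, samples=5, interval_s=0.5)
    print(f"{path}: lowest free {low / 1e6:.1f} MB over {len(readings)} samples")
```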
I am not a RabbitMQ expert, so my questions are:
1 - Does RabbitMQ write to disk if it holds too much in RAM?
2 - Where does RabbitMQ write? Is it possible that the 50 MB limit refers to its own writable space?
3 - If you have ever encountered a similar issue, what did you do to fix it?
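For context on questions 1 and 2, the thresholds involved can be written out. In the RabbitMQ 3.x releases current at the time, the memory alarm fires at vm_memory_high_watermark times total RAM (relative mode, default 0.4), classic queues start paging messages to disk at vm_memory_high_watermark_paging_ratio (default 0.5) of that alarm threshold, and the disk alarm fires when free space on the data directory's partition drops below disk_free_limit (default 50 MB, which matches "Limit: 50000000" in the log). The defaults in this sketch are assumptions to check against the node's own configuration, and the total RAM figure is inferred from the log rather than measured.

```python
# ASSUMED defaults for RabbitMQ 3.x at the time of this question; verify
# against the node's own rabbitmq.conf / advanced.config.
WATERMARK_RELATIVE = 0.4      # vm_memory_high_watermark.relative
PAGING_RATIO = 0.5            # vm_memory_high_watermark_paging_ratio
DISK_FREE_LIMIT = 50_000_000  # disk_free_limit.absolute, in bytes


def thresholds(total_ram_bytes, watermark=WATERMARK_RELATIVE,
               paging_ratio=PAGING_RATIO):
    """Return (paging_threshold, alarm_threshold) in bytes."""
    alarm = int(total_ram_bytes * watermark)
    return int(alarm * paging_ratio), alarm


def disk_alarm(free_bytes, limit=DISK_FREE_LIMIT):
    """True when free space is below the configured limit."""
    return free_bytes < limit


if __name__ == "__main__":
    # Total RAM back-calculated from "allowed:3338231808" assuming a 0.4
    # watermark; this is an inference, not a value from the log.
    ram = 8_345_579_520
    paging, alarm = thresholds(ram)
    print(f"paging starts near {paging / 1e9:.2f} GB, alarm at {alarm / 1e9:.2f} GB")
    print(disk_alarm(44_498_944), disk_alarm(8_649_433_088))
```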
For now, I've added swap space on the machine, since the RabbitMQ memory documentation mentions enabling it: https://www.rabbitmq.com/memory.html
I had to get the server back on track for now, but I have no way of confirming that this fixes the issue until it shows up again.
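Before the problem recurs, it may be worth confirming that the new swap is actually active. On Linux this is visible in /proc/meminfo; the sketch below is Linux-specific and only reads SwapTotal and SwapFree (both reported in kB).

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of numeric values.

    Most entries are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


if __name__ == "__main__":
    # Linux-specific: /proc/meminfo does not exist on other systems.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        total, free = info.get("SwapTotal", 0), info.get("SwapFree", 0)
        print(f"swap total {total} kB, free {free} kB, in use {total - free} kB")
    else:
        print("/proc/meminfo not found (not Linux?)")
```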
Thank you for your time.