I have a server with the following characteristics: https://www.soyoustart.com/it/offerte/1801sysgame05.xml
Processor: Intel i7-4790K
RAM: 32GB DDR3 1333MHz
Traffic: Unlimited
Anti-DDoS: Included
Disks: 1x240GB SSD
Bandwidth: 250 Mbps
I've installed Proxmox on it, running an Ubuntu Server container that hosts a real-time TCP game server written in C++. The game has reached around 1000 online users so far, and we expect that number to double soon.
The problem is that we hit a weird performance "bottleneck" as soon as the number of online users reaches ~850. As soon as it drops back to ~800 or fewer, the bottleneck disappears. In practice, new players have to wait about 30 seconds to connect to the server, while players who are already connected experience no issues at all (no latency, no freezes, etc.). It looks like network congestion, a hard cap, or something similar that blocks further connections to the same process and puts pressure on our CPU (as you can see from the screenshots below).
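To test the cap-limit hypothesis, the limits that usually govern new TCP connections can be printed from inside the container. This is only a minimal sketch, assuming Linux with glibc; the helper names (`describe_limit`, `read_sysctl`, `print_connection_limits`) are hypothetical and not part of our server code. It prints the soft file-descriptor limit, `FD_SETSIZE` (relevant only if the server uses `select()`), and two listen-queue sysctls.

```cpp
#include <sys/resource.h>
#include <sys/select.h>

#include <fstream>
#include <iostream>
#include <string>

// Formats a resource limit, mapping RLIM_INFINITY to "unlimited".
std::string describe_limit(rlim_t value) {
    return value == RLIM_INFINITY ? "unlimited" : std::to_string(value);
}

// Reads a single integer from a /proc/sys file; returns -1 if it cannot be read.
long long read_sysctl(const std::string& path) {
    std::ifstream in(path);
    long long value = -1;
    if (!(in >> value)) return -1;
    return value;
}

// Hypothetical helper: prints the limits most likely to cap new connections.
void print_connection_limits() {
    rlimit nofile{};
    if (getrlimit(RLIMIT_NOFILE, &nofile) == 0) {
        std::cout << "RLIMIT_NOFILE soft: " << describe_limit(nofile.rlim_cur)
                  << ", hard: " << describe_limit(nofile.rlim_max) << '\n';
    }
    // Only matters if the server multiplexes sockets with select().
    std::cout << "FD_SETSIZE: " << FD_SETSIZE << '\n';
    // Upper bound applied to the backlog argument of listen().
    std::cout << "net.core.somaxconn: "
              << read_sysctl("/proc/sys/net/core/somaxconn") << '\n';
    // Upper bound on half-open (SYN received) connections per listener.
    std::cout << "net.ipv4.tcp_max_syn_backlog: "
              << read_sysctl("/proc/sys/net/ipv4/tcp_max_syn_backlog") << '\n';
}
```

Calling `print_connection_limits()` once at server startup would show whether any of these values sits close to the ~850 mark once the server's other open files are counted.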
Here are some graphs from our NetData instance in which I noticed the same "pattern". I guess the softirqs RCU graph is particularly meaningful, but I do not know exactly what it means.
softirqs RCU:
CPU usage/pressure:
CPU frequency:
CPU temperature:
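For reference, the softirqs counters can also be sampled directly from `/proc/softirqs`, which makes it possible to log the RCU counter next to the online-user count. A minimal sketch, assuming the usual Linux layout of that file (a header row of CPU names, then one `NAME:` row per softirq type); the function names are hypothetical.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Sums the per-CPU counters of one row (e.g. "RCU") in text formatted
// like /proc/softirqs. Returns -1 if the row is not present.
long long sum_softirq_row(std::istream& in, const std::string& name) {
    const std::string wanted = name + ":";
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string label;
        if (!(fields >> label) || label != wanted) continue;
        long long total = 0;
        long long count = 0;
        while (fields >> count) total += count;  // one counter per CPU
        return total;
    }
    return -1;
}

// Hypothetical helper: cumulative RCU softirq count across all CPUs.
long long rcu_softirq_total() {
    std::ifstream in("/proc/softirqs");
    return sum_softirq_row(in, "RCU");
}
```

Calling `rcu_softirq_total()` once per second and logging the difference between consecutive samples would show whether the RCU rate really changes around ~850 users.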
I do not believe the CPU itself is at fault. As I said above, it seems more likely to be a per-process limit or something similar.
Do you have any idea of what's going on?
UPDATE:
Another related graph: