I am using a vServer that has been in use for over a year without any problems, but since yesterday it has suddenly been experiencing very high wait times (10/20/30 seconds) or even timeouts on basic requests. This is my configuration:
- 8 CPU vCores, 32 GB memory, 800 GB SSD
- Standard Plesk Obsidian with latest updates
The server runs a couple of websites with PHP and MariaDB via Apache: nothing too fancy, not a huge amount of incoming or outgoing traffic, and not much processing on the server. While the average load on this vServer has usually been between 1 and 3, it is now suddenly 20-100 as soon as I start either the Apache or the MariaDB service.
Via htop I can see:
- up to 30 processes in the "D" state (Uninterruptible Sleep)
- very low CPU use (<5% or even 0% on most cores)
- plenty of free memory available (disk space is available as well)
- no unknown/unusual processes (mostly Plesk-related, MariaDB and Apache)
Via iotop I can see:
- very limited disk activity, both read/write are 0 or close to 0 most of the time
And vmstat 1 5 gives me the following output:
procs ------------memory------------ -swap-- ----io---- -system- --------cpu--------
r   b  swpd     free  buff     cache  si  so   bi    bo  in   cs  us  sy  id  wa  st
1  13     0  1055820     0  25834552   2   2  128    59   0   93   8   1  90   0   0
2  14     0   975180     0  25870484   0   0   16     0   0  586  31   4  65   0   0
1  16     0   910584     0  25873184   0   0  100    48   0  374   7   2  92   0   0
0  16     0   920048     0  25883484   0   0   16    64   0  415   8   1  91   0   0
0  15     0   954344     0  25883472   0   0   96  1432   0  383   1   0  99   0   0
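To make the pattern in these numbers easier to see, here is a minimal sketch in Python that parses the vmstat rows above and summarizes the `b` column (processes in uninterruptible sleep) against the `wa` column (CPU time spent waiting for I/O). The function name `parse_vmstat` and the `SAMPLE` constant are only illustrative, not part of any tool.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by lower-cased column name."""
    rows = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    # The column-name row is the one that starts with "r b".
    header_idx = next(i for i, cols in enumerate(rows) if cols[:2] == ["r", "b"])
    names = [c.lower() for c in rows[header_idx]]
    return [dict(zip(names, map(int, cols))) for cols in rows[header_idx + 1:]]


# The five samples from the output above (column names lower-cased).
SAMPLE = """
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 13 0 1055820 0 25834552 2 2 128 59 0 93 8 1 90 0 0
2 14 0 975180 0 25870484 0 0 16 0 0 586 31 4 65 0 0
1 16 0 910584 0 25873184 0 0 100 48 0 374 7 2 92 0 0
0 16 0 920048 0 25883484 0 0 16 64 0 415 8 1 91 0 0
0 15 0 954344 0 25883472 0 0 96 1432 0 383 1 0 99 0 0
"""

samples = parse_vmstat(SAMPLE)
avg_blocked = sum(s["b"] for s in samples) / len(samples)
max_iowait = max(s["wa"] for s in samples)
print(f"average blocked processes: {avg_blocked:.1f}, max iowait: {max_iowait}%")
```

For these samples, an average of almost 15 processes is blocked while the reported I/O wait stays at 0%, which matches what htop and iotop show: processes are stuck waiting even though the guest itself sees hardly any disk activity.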
So it looks like something is blocking these processes until, every minute or so, they finally get executed. I can then load a few pages on one of the websites and these processes no longer show up in htop, but a few more clicks later the same situation suddenly occurs again.
Interactions with this vServer via e.g. SFTP or SSH are also considerably slower than before due to the high average load. I have already checked the health of the MariaDB databases and couldn't find any problems; the load issue also happens when the MariaDB service isn't running.
My questions:
- What can I do or use to find the specific reason why these processes cannot be executed / what is blocking them?
- Is it possible that either the memory or the disk has a problem? Should I run fsck (this would require taking the server offline)?
Anything that helps to document, e.g., hardware-related problems would be really helpful. I have checked other posts about a high load average but couldn't find a solution for my problem.
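As one possible starting point for the first question, here is a minimal sketch, assuming Python 3 is available on the server, that lists all processes currently in the "D" state together with the kernel function they are sleeping in. It reads `/proc/<pid>/stat` and `/proc/<pid>/wchan`; inside a container or without sufficient privileges the `wchan` entry may be empty or unreadable, in which case `?` is shown. The helper names `parse_stat_line` and `blocked_processes` are made up for this illustration.

```python
import glob


def parse_stat_line(line):
    """Return (pid, comm, state) from the content of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar, rpar = line.index("("), line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]
    return pid, comm, state


def blocked_processes():
    """Yield (pid, comm, wchan) for every process currently in state D."""
    for stat_path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(stat_path) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # process exited in the meantime or entry is unreadable
        if state != "D":
            continue
        try:
            with open(f"/proc/{pid}/wchan") as fh:
                wchan = fh.read().strip() or "?"
        except OSError:
            wchan = "?"
        yield pid, comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print(f"{pid:>7}  {comm:<24}  {wchan}")
```

If most of the blocked processes report the same wait channel, that name hints at the subsystem they are stuck in (for example a filesystem or block-layer function). Running the script a few times in a row gives a rough picture, since the "D" state is often short-lived.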
UPDATE
I've noticed that both buff and swpd above are always 0. Here is the output of cat /proc/meminfo; could this be a/the reason?
MemTotal: 33554432 kB
MemFree: 639036 kB
MemAvailable: 25227064 kB
Cached: 24259912 kB
Buffers: 0 kB
Active: 19847944 kB
Inactive: 12315884 kB
Active(anon): 7664604 kB
Inactive(anon): 572316 kB
Active(file): 12183340 kB
Inactive(file): 11743568 kB
Unevictable: 11228 kB
Mlocked: 28388 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 8485104 kB
Writeback: 8 kB
AnonPages: 8236920 kB
Shmem: 328712 kB
Slab: 683440 kB
SReclaimable: 661120 kB
SUnreclaim: 22320 kB
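One figure in this output that stands out is `Dirty`, i.e. memory that has been modified but not yet written back to disk. A minimal sketch in Python to put it in relation to `MemTotal` and `Writeback`, using only the values from the output above (the names `parse_meminfo` and `MEMINFO` are illustrative):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key: value kB' lines into a dict of kB values."""
    values = {}
    for line in text.strip().splitlines():
        key, _, rest = line.partition(":")
        values[key.strip()] = int(rest.split()[0])
    return values


# Relevant lines copied from the /proc/meminfo output above.
MEMINFO = """
MemTotal: 33554432 kB
MemAvailable: 25227064 kB
Dirty: 8485104 kB
Writeback: 8 kB
"""

info = parse_meminfo(MEMINFO)
dirty_share = info["Dirty"] / info["MemTotal"]
print(f"Dirty: {info['Dirty'] / 1024**2:.1f} GiB "
      f"({dirty_share:.1%} of MemTotal), Writeback: {info['Writeback']} kB")
```

Roughly 8 GiB, about a quarter of the total memory, is waiting to be written out while only 8 kB is actually in writeback. That would be consistent with writes piling up because the underlying storage is not accepting them fast enough, although this alone does not prove where the bottleneck is.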
Output of iostat -d (but this shows "40 CPU", so it probably covers the whole host server?):
Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn
somename 13805,99 41065,30 2557318,07 2443426623 152162982648
UPDATE
Sample of blocked processes:
UPDATE
Ongoing discussions with other customers of this hoster indicate widespread problems with the virtualization platform used for these vServers (e.g. resources not being made available to the vServers). Some customers have had problems for days now. I'll update once more information is available.
UPDATE
Here is a news report in German about the ongoing problems with this hoster: Lang anhaltende Störung bei Stratos V-Servern ("Long-lasting outage of Strato's V-Servers")
FINAL UPDATE
The problem with this vServer seems to have been resolved by the hoster now, after a week (almost 2 weeks for some customers with other vServers). Main reason: communication issues between switches leading to delays in I/O operations. Details can be found here: Strato: Massive V-Server-Störung bald behoben ("Strato: massive V-Server outage soon to be resolved")