I had a pretty bad time this evening. I had to move LVM2 LVs from one PV to another (the source PV backed by an NFS-stored vdisk, the target PV backed by an iSCSI LUN). Moving the small LVs of this VG (a few gigabytes each) went fine, but I also had a 400 GB LV, and after a while that move drove my guest's load average above 150, to the point where it got stuck and I had to hard reboot it.
I tried to resume the pvmove after doubling the memory and CPU sizing (16 GB and 4 vCPUs). The load climbed very high almost immediately. When the 5-minute load average reached 60, I decided to kill the pvmove process (fingers crossed). The process was killed properly, or at least it no longer appeared in the process table according to ps and top, but the load kept increasing, passing 90 before I decided a reboot was my only option. Even though the pvmove process was no longer running, the load never decreased and the CPUs were almost exclusively waiting on I/O, as shown below (roughly 40 minutes after I killed the process, which had run for at most 5 minutes).
top - 21:18:44 up 12:26, 1 user, load average: 93.07, 92.53, 89.07
Tasks: 405 total, 1 running, 402 sleeping, 2 stopped, 0 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 0.0%id, 99.8%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16021672k total, 15363796k used, 657876k free, 427060k buffers
Swap: 2095100k total, 36k used, 2095064k free, 11856520k cached
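For reference, here is roughly what I would check next time to see what is still blocked on I/O once the userspace process is gone (a sketch written from memory, not something I actually captured during the incident; as far as I understand, pvmove does its copying through a temporary device-mapper mirror, so the kernel can keep working after the command itself dies):

# Tasks in uninterruptible sleep (D state) are what keep the load average climbing
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'

# How much dirty data is still queued for writeback to the disks
grep -E '^(Dirty|Writeback):' /proc/meminfo

# The temporary pvmove mirror and its sync progress should show up here
dmsetup status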
I still had an ssh session open and responsive. Actions on the filesystem seemed reasonably responsive (listing directories), but restarting a daemon took a very long time, and it was not possible to open new ssh connections.
Does anybody have an explanation for this behaviour, and in particular why the load kept increasing even though the process was no longer there?
I suspect my iSCSI initiator is simply not good enough for this kind of operation, but I am eager to hear about anybody else's experience with such setups. P.S.: I have found this similar question, but imho it never really got a clear answer:
https://serverfault.com/questions/268907/high-load-and-oom-killer-on-domus-while-pvmove#=
Regards.
See that ~99%wa value? That's your problem: you're running into severe resource contention in your storage subsystem. Keep in mind that the Linux load average counts tasks stuck in uninterruptible I/O sleep (D state) as well as runnable ones, so the load can keep climbing after the process that queued the I/O is gone, for as long as the kernel is still flushing it.
You'll need to implement some monitoring so you can collect metrics and determine if the bottleneck is at the network level, at the physical disk level, or somewhere else entirely.
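Assuming the sysstat tools are installed on the guest, something along these lines would give you the numbers to compare (an illustrative sketch; intervals and device selection are up to you):

# Per-device utilisation, queue size and await times: is the iSCSI LUN itself saturated?
iostat -x 1

# Block-device and network throughput over time, to separate disk limits from network limits
sar -d 1
sar -n DEV 1

# Overall view: the 'b' column is processes blocked on I/O, 'wa' is iowait
vmstat 1

If you are using open-iscsi and the initiator side is the suspect, iscsiadm -m session -P 3 will also dump the session state and negotiated parameters for comparison against what the target expects.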