EDIT: This turned out to be an out-of-control application process, not GCE. The issue is described below, along with the answer:
I just had some kind of outage with my Compute Engine VM on a trial account, but I don't see anything reported on the Google Compute Engine outage list.
I'm not sure how long it lasted, since I don't know when it started. The behavior matches something that seemed to happen a few weeks ago: losing the ability to log in with SSH from the Compute Engine dashboard until the VM was rebooted.
My test VM dropped my SSH connection sometime in the last day or so, and when I noticed today I was unable to reconnect. I then tried the "SSH" connect button on the Compute Engine VM list, and that failed too. The only thing I could do was get a prompt on the serial console... but I didn't have a password-enabled account at all; I was relying on SSH keys (now fixed). I had to stop the VM and restart it... then I could connect using the "SSH" option on the VM list, although I could NOT connect from outside. On the serial console I saw network error messages from various snaps trying to connect. From my SSH session into the VM I tried to SSH to a remote server, and initially could not. After a minute or so that worked, and suddenly remote connections worked again.
EDIT: I got a response to my support request from Google. They're saying I experienced a Live Migration event. That doesn't sound right: this was at least 10 minutes of disrupted networking, and the serial console was connectable and seemed responsive throughout. It was only after rebooting, when the Google management snaps failed to initialize, that things suddenly appeared to start working. Maybe a communication failure during boot triggered the migration event? I don't know.
EDIT: I removed my earlier worries about GCE's stability, since the infrastructure had nothing to do with the problem.
There may be a number of reasons for this to happen. I would recommend starting with the SSH troubleshooting document for more information on diagnosing this kind of issue.
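If SSH is down entirely, you can still pull the serial console output over the API to look for clues (OOM killer messages, network errors, and so on). A minimal sketch, assuming the google-api-python-client package and Application Default Credentials; the project, zone, and instance names below are placeholders:

```python
# Minimal sketch: fetch the serial console output for a VM you can't SSH
# into. Assumes google-api-python-client is installed and Application
# Default Credentials are configured; names below are placeholders.
from googleapiclient import discovery

compute = discovery.build('compute', 'v1')
resp = compute.instances().getSerialPortOutput(
    project='my-project',    # placeholder
    zone='us-central1-a',    # placeholder
    instance='my-test-vm',   # placeholder
    port=1,                  # port 1 is the main serial console
).execute()
print(resp.get('contents', ''))
```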
This issue could also occur if the Linux guest environment did not initialize properly after the live migration. The guest environment is a set of scripts and processes that read configuration from the metadata server and set up the proper environment for the virtual machine to run; among other things, it provisions SSH keys. It's possible that the SSH keys were not set properly during guest environment setup.
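One way to check that theory from a serial-console shell: see whether the metadata server is reachable and whether your keys are actually there. A minimal sketch using only the Python standard library and the standard GCE metadata paths (run it inside the VM):

```python
# Minimal sketch: query the GCE metadata server from inside the VM to see
# whether it is reachable and whether SSH keys are present. Standard
# library only; run from a serial-console shell on the instance.
import urllib.request

BASE = 'http://metadata.google.internal/computeMetadata/v1'
HEADERS = {'Metadata-Flavor': 'Google'}  # required by the metadata server

for path in ('/instance/attributes/ssh-keys', '/project/attributes/ssh-keys'):
    req = urllib.request.Request(BASE + path, headers=HEADERS)
    try:
        body = urllib.request.urlopen(req, timeout=5).read().decode()
        print(path, '->', body[:200] or '(empty)')
    except Exception as exc:  # metadata server unreachable, or key not set
        print(path, '->', exc)
```

If the metadata server doesn't answer at all, the guest environment can't provision keys, which would match the symptoms described.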
You may also set the 'automaticRestart' field to 'true', as mentioned in this document. This will automatically restart your instance if it crashes due to a hardware issue or after a live migration, and a fresh boot re-runs the guest environment, which should set up the SSH keys correctly again. Feel free to read the live migration documentation if you need further information about live migration in Google Cloud Platform.
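That field lives in the instance's scheduling options and can be set through the instances.setScheduling API. A minimal sketch with the Python client (same placeholder names and credential assumptions as above; onHostMaintenance is kept at its usual MIGRATE setting):

```python
# Minimal sketch: enable automaticRestart via the instances.setScheduling
# API. Assumes google-api-python-client and Application Default
# Credentials; project/zone/instance values are placeholders.
from googleapiclient import discovery

compute = discovery.build('compute', 'v1')
compute.instances().setScheduling(
    project='my-project',    # placeholder
    zone='us-central1-a',    # placeholder
    instance='my-test-vm',   # placeholder
    body={
        'automaticRestart': True,        # restart after a crash
        'onHostMaintenance': 'MIGRATE',  # keep live-migration behavior
    },
).execute()
```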
The instance appeared functional on the serial console, but it was in fact in high distress due to an out-of-control root-privileged process (a temporary testing setup) eating up all available memory. The system OOM killer was constantly killing the process, which would just respawn.
Google Compute Engine doesn't monitor guest memory usage by default; the hypervisor-level metrics only cover things like CPU, disk, and network, and memory metrics require installing the Ops Agent inside the VM. It's kind of weird that it doesn't.
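For what it's worth, even a crude guest-side watcher would have flagged this. A minimal sketch (the 100 MiB threshold and 30-second interval are arbitrary choices of mine) that polls /proc/meminfo and complains when available memory gets low:

```python
# Minimal sketch: poll /proc/meminfo and log when available memory drops
# below a threshold. The threshold and interval are arbitrary; a real
# setup would use the Ops Agent / Cloud Monitoring instead.
import time

THRESHOLD_KIB = 100 * 1024  # ~100 MiB, arbitrary
INTERVAL_S = 30

def mem_available_kib() -> int:
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemAvailable:'):
                return int(line.split()[1])  # value is reported in KiB
    raise RuntimeError('MemAvailable not found in /proc/meminfo')

while True:
    avail = mem_available_kib()
    if avail < THRESHOLD_KIB:
        print(f'LOW MEMORY: only {avail} KiB available')
    time.sleep(INTERVAL_S)
```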
So, uh... given the situation, the usefulness of this question to anyone seems low. Should it be deleted?