
Vojtěch

Asked: 2022-02-13 23:48:32 +0800 CST

Failed instance in Google Compute Engine


I have a GCE instance that has been running for several years. During the night, the instance was restarted, with the following log entries:

2022-02-13 04:46:36.370 CET compute.instances.hostError Instance terminated by Compute Engine.
2022-02-13 04:47:08.279 CET compute.instances.automaticRestart Instance automatically restarted by Compute Engine.

However, the instance did not come back up.

I can connect to the serial console, where I see this:

serialport: Connected to ***.europe-west1-b.*** port 1 (
[ TIME ] Timed out waiting for device ***
[DEPEND] Dependency failed for File… ***.
[DEPEND] Dependency failed for /data.
[DEPEND] Dependency failed for Local File Systems.
[  OK  ] Stopped Dispatch Password …ts to Console Directory Watch.
[  OK  ] Stopped Forward Password R…uests to Wall Directory Watch.
[  OK  ] Reached target Timers.
         Starting Raise network interfaces...
[  OK  ] Closed Syslog Socket.
[  OK  ] Reached target Login Prompts.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Sockets.
[  OK  ] Started Emergency Shell.
[  OK  ] Reached target Emergency Mode.
         Starting Create Volatile Files and Directories...
[  OK  ] Finished Create Volatile Files and Directories.
         Starting Network Time Synchronization...
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Finished Update UTMP about System Boot/Shutdown.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Finished Update UTMP about System Runlevel Changes.
[  OK  ] Started Network Time Synchronization.
[  OK  ] Reached target System Time Set.
[  OK  ] Reached target System Time Synchronized.
         Stopping Network Time Synchronization...
[  OK  ] Stopped Network Time Synchronization.
         Starting Network Time Synchronization...
[  OK  ] Started Network Time Synchronization.
[  OK  ] Finished Raise network interfaces.
[  OK  ] Reached target Network.
[  OK  ] Reached target Network is Online.
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to r
Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.
Press Enter to continue.

It seems that one of the disks cannot be found at boot, but what can I do about it now? In Compute Engine the disk appears to be available as normal.
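Once you can get a shell somewhere (for example on a rescue VM with the disk attached), a quick way to compare what Compute Engine shows with what the guest actually sees is to list the `google-*` links that udev creates under `/dev/disk/by-id`. This is a minimal sketch, assuming a Linux guest whose udev rules create `google-<device-name>` links (as the standard Compute Engine images do); the helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Sketch: the helper names below are hypothetical, not part of any library.
# Assumed location of udev's persistent disk links on the guest.
BY_ID_DIR = "/dev/disk/by-id"
# Partition links carry a "-partN" suffix; only whole disks are kept.
_PART_SUFFIX = re.compile(r"-part\d+$")


def google_disk_links(names: Iterable[str]) -> List[str]:
    """Keep only whole-disk google-<device-name> link names, sorted."""
    return sorted(
        n for n in names if n.startswith("google-") and not _PART_SUFFIX.search(n)
    )


def list_google_disk_links(by_id_dir: str = BY_ID_DIR) -> List[str]:
    """List the google-* links present in by_id_dir (empty if the directory is missing)."""
    directory = Path(by_id_dir)
    if not directory.is_dir():
        return []
    return google_disk_links(p.name for p in directory.iterdir())


if __name__ == "__main__":
    for name in list_google_disk_links():
        # resolve() follows the symlink to the kernel device node, e.g. /dev/sdb.
        print(f"{name} -> {Path(BY_ID_DIR, name).resolve()}")
```

If the name referenced in `/etc/fstab` is not in this list, the mount unit will wait for a device that never appears, which matches the "Timed out waiting for device" line in the log.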

2 Answers


PjoterS · Answer 1 · 2022-02-17T00:56:54+08:00

I am afraid that you cannot do anything with this affected VM.

The Host events documentation (and the FAQ) says:

A host error (compute.instances.hostError) means that there was a hardware or software issue on the physical machine hosting your VM that caused your VM to crash. A host error which involves total hardware failure or other hardware issues might prevent live migration of your VM.
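To see when such host events hit a particular VM, the zone's operations can be listed with `gcloud compute operations list`. The sketch below only builds the command; the filter expression (matching `operationType` against the two event types from the question and `targetLink` against the instance) is my assumption of a reasonable query, so check it against `gcloud topic filters` before relying on it. The instance name and helper name are hypothetical.

```python
import shlex
from typing import List


def host_event_command(instance: str, zone: str) -> List[str]:
    """Build (but do not run) a gcloud command listing host errors and automatic restarts.

    Hypothetical helper; the filter syntax is an assumption, not taken from the answer.
    """
    event_filter = (
        '(operationType="compute.instances.hostError" OR '
        'operationType="compute.instances.automaticRestart") '
        f'AND targetLink~"/instances/{instance}$"'
    )
    return [
        "gcloud", "compute", "operations", "list",
        f"--zones={zone}",
        f"--filter={event_filter}",
    ]


if __name__ == "__main__":
    # "my-vm" is a hypothetical instance name; the zone matches the serial-console output.
    print(shlex.join(host_event_command("my-vm", "europe-west1-b")))
```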

Even though your VM instance is "in the cloud", it still runs on a physical machine. Unfortunately, that host had a hardware or software failure, and there is nothing you can do about it.

GCP offers a feature called live migration, which is meant to prevent this kind of situation:

Compute Engine offers live migration to keep your virtual machine instances running even when a host system event, such as a software or hardware update, occurs.

However, I guess it is too late to configure this now.

...

Live migration keeps your instances running during:

Regular infrastructure maintenance and upgrades.

Network and power grid maintenance in the data centers.

Failed hardware such as memory, CPU, network interface cards, disks, power, and so on. This is done on a best-effort basis; if a hardware fails completely or otherwise prevents live migration, the VM crashes and restarts automatically and a hostError is logged.

...

Live migration does not change any attributes or properties of the VM itself. The live migration process just transfers a running VM from one host machine to another host machine within the same zone.
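Live migration and automatic restart are per-instance scheduling settings (`scheduling.onHostMaintenance` and `scheduling.automaticRestart`). The sketch below builds the `gcloud` commands to inspect and set them; per the documentation quoted above, migration is best-effort, and a complete hardware failure still ends in a `hostError` and a restart. The instance name and helper names are hypothetical.

```python
import shlex
from typing import List

# Sketch: helper names are hypothetical; the commands are only built, not executed.


def describe_scheduling_command(instance: str, zone: str) -> List[str]:
    """Show the VM's host-maintenance policy and automatic-restart flag."""
    return [
        "gcloud", "compute", "instances", "describe", instance,
        f"--zone={zone}",
        "--format=value(scheduling.onHostMaintenance,scheduling.automaticRestart)",
    ]


def enable_live_migration_command(instance: str, zone: str) -> List[str]:
    """Set the VM to live-migrate on host maintenance and to restart after a crash."""
    return [
        "gcloud", "compute", "instances", "set-scheduling", instance,
        f"--zone={zone}",
        "--maintenance-policy=MIGRATE",
        "--restart-on-failure",
    ]


if __name__ == "__main__":
    for cmd in (describe_scheduling_command("my-vm", "europe-west1-b"),
                enable_live_migration_command("my-vm", "europe-west1-b")):
        print(shlex.join(cmd))
```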

Possible Workaround

Since you mention that the disks are persistent and still visible in GCP, you could try to detach them and attach them to another VM. A how-to guide can be found in the Creating and attaching a disk documentation.
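Moving a disk is a detach on the old VM followed by an attach on the new one. The sketch below builds both `gcloud` commands. Passing `--device-name` on attach is optional; the assumption here is that keeping the same device name keeps the guest's `/dev/disk/by-id/google-<device-name>` link identical, so an fstab entry that uses that path still matches. All disk, instance, and helper names are hypothetical.

```python
import shlex
from typing import List, Optional


def move_disk_commands(disk: str, from_instance: str, to_instance: str,
                       zone: str, device_name: Optional[str] = None) -> List[List[str]]:
    """Return [detach, attach] gcloud commands for moving a persistent disk between VMs.

    Hypothetical helper; it only builds the argument lists.
    """
    detach = ["gcloud", "compute", "instances", "detach-disk", from_instance,
              f"--disk={disk}", f"--zone={zone}"]
    attach = ["gcloud", "compute", "instances", "attach-disk", to_instance,
              f"--disk={disk}", f"--zone={zone}"]
    if device_name is not None:
        # Reusing the device name keeps the google-<device_name> by-id link stable.
        attach.append(f"--device-name={device_name}")
    return [detach, attach]


if __name__ == "__main__":
    # All names are hypothetical placeholders.
    for cmd in move_disk_commands("data-disk", "broken-vm", "rescue-vm",
                                  "europe-west1-b", device_name="data-disk"):
        print(shlex.join(cmd))
```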

Vojtěch · Answer 2 · 2022-02-18T06:41:57+08:00


I finally found the strange reason for this error. This is the original line in /etc/fstab:

/dev/disk/by-id/google-***-data /data ext4 discard,defaults 0 2

But there is no device at this path. I solved it by using /dev/sdb in fstab instead, but I guess this is not the best solution. I wonder how it can happen that the device suddenly disappears completely and, in the end, leaves the machine unable to boot.
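A common way to keep a missing data disk from stopping the boot is the `nofail` mount option, and a common way to avoid depending on `/dev/sdX` ordering (which is not guaranteed to be stable across boots) is to reference the filesystem by `UUID=` as reported by `blkid`. The sketch below is a hypothetical pre-reboot check: it parses fstab text and flags literal `/dev/...` entries whose device path is missing, noting whether `nofail` is set. It is not a complete fstab parser (no escape handling, for example).

```python
import os
from typing import Callable, List, NamedTuple

# Sketch: FstabEntry, parse_fstab and check_entries are hypothetical names.


class FstabEntry(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: List[str]


def parse_fstab(text: str) -> List[FstabEntry]:
    """Parse fstab text into entries, skipping blank lines and comment lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line; a real tool should report it
        options = fields[3].split(",") if len(fields) > 3 else ["defaults"]
        entries.append(FstabEntry(fields[0], fields[1], fields[2], options))
    return entries


def check_entries(entries: List[FstabEntry],
                  exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Flag /dev/... entries whose device path does not exist right now."""
    warnings = []
    for e in entries:
        # Only literal /dev/... paths are checked; UUID=/LABEL= specs, pseudo
        # filesystems, swap and the root mount are skipped in this sketch.
        if not e.device.startswith("/dev/") or e.mountpoint in ("/", "none", "swap"):
            continue
        if exists(e.device):
            continue
        if "nofail" in e.options:
            warnings.append(f"{e.mountpoint}: {e.device} is missing (nofail set, boot continues)")
        else:
            warnings.append(f"{e.mountpoint}: {e.device} is missing and has no nofail option")
    return warnings


if __name__ == "__main__":
    with open("/etc/fstab", encoding="utf-8") as fh:
        for warning in check_entries(parse_fstab(fh.read())):
            print(warning)
```

Under these assumptions, a more defensive version of the original entry would look like `UUID=<uuid-from-blkid> /data ext4 discard,defaults,nofail 0 2`, where the UUID placeholder must be replaced with the value `blkid` reports for the data disk.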
