I am attempting to install Kubernetes on VMs running Ubuntu 18.04 LTS, and am running into a problem when trying to initialise the cluster: the kubeadm init command fails with a timeout (full log below).
VM: 2 CPUs, 512 MB RAM, 100 GB disk, running under VMware ESXi 6.
OS: Ubuntu 18.04 LTS server install, fully updated via apt update and apt upgrade before beginning the Docker and Kubernetes installs.
Docker installed as per the instructions here; the install completes with no errors: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#docker
Kubernetes installed as per the instructions here, except for the Docker section (following those instructions produces a pre-flight error regarding the systemd/cgroupfs cgroup driver): https://vitux.com/install-and-deploy-kubernetes-on-ubuntu/
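For completeness, the cgroup-driver part of the kubernetes.io Docker instructions is, as I understand it, just a matter of pointing Docker at the systemd cgroup driver; this is roughly the configuration step from that page:
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
}
EOF
# restart Docker so it picks up the new daemon.json
systemctl daemon-reload
systemctl restart docker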
The installation appears to proceed smoothly with no errors reported; however, attempting to start Kubernetes then fails, as shown in the log below.
I am entirely new to both Docker and Kubernetes, though I understand the main concepts and have experimented with the online tutorials on kubernetes.io, but until I can get a working system installed I'm unable to progress further. At the point at which kubeadm attempts to start the cluster, everything hangs for the full four minutes and then exits with the timeout shown below.
root@k8s-master-dev:~# sudo kubeadm init --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.15.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master-dev kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.24.0.100]
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master-dev localhost] and IPs [10.24.0.100 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master-dev localhost] and IPs [10.24.0.100 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
I've had a look at both the journal data (journalctl) and the Docker container logs, but other than lots of timeouts I can't see anything that explains the actual error. Can anyone advise where I should be looking, and what is most likely to be the cause of the problem?
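For reference, the checks I ran were the ones kubeadm itself suggests above, roughly (CONTAINERID being a placeholder for each container listed):
systemctl status kubelet                    # check whether the kubelet service is running
journalctl -xeu kubelet                     # kubelet journal entries
docker ps -a | grep kube | grep -v pause    # list the Kubernetes control-plane containers
docker logs CONTAINERID                     # inspect each failing container's output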
Things already tried: removing all iptables rules and setting the default policies to ACCEPT (sketched below); running with the Docker install as per the vitux.com instructions instead (gives a pre-flight warning but no errors, and the same timeout on attempting to init Kubernetes).
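The iptables reset was along these lines (flush everything and set every default chain policy to ACCEPT):
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -F     # flush all rules
iptables -X     # delete any user-defined chains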
Update: Following from @Crou's comment, here is what happens now if I try just 'kubeadm init' as root:
root@k8s-master-dev:~# uptime
16:34:49 up 7:23, 3 users, load average: 10.55, 16.77, 19.31
root@k8s-master-dev:~# kubeadm init
[init] Using Kubernetes version: v1.15.3
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10251]: Port 10251 is in use
[ERROR Port-10252]: Port 10252 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Regarding the very high load shown by uptime: that starts as soon as the init is first attempted, and the load remains very high unless a kubeadm reset is done to clear everything down.
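The clear-down between attempts is just the standard reset, roughly:
kubeadm reset -f                                 # tear down the half-started control plane
rm -rf /etc/kubernetes/manifests /var/lib/etcd   # belt and braces, in case anything is left behind
systemctl restart kubelet
After that the load drops back down and the init can be attempted again.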