Brett Larson's questions -server

Brett Larson

Asked: 2019-07-31 14:22:20 +0800 CST

Enable debug or trace logging on Kublet for Kubernetes

1

Looking at the Kubernetes dev documentation I can see that there is additional logging available for a node. Utimately i'm trying to enable debug or trace logging on Kublet so that I can troubleshoot an issue i'm having. I can't seem to find any guidance anywhere on adjusting this.

Brett Larson

Asked: 2019-07-23 09:40:42 +0800 CST

"Context deadline exceeded" preventing pods from being created in AKS

4

We have been experiencing an issue causing us pain for the last few months.

The issue appears to be that occasionally when we request a pod via the Kubernetes executor it fails to create.

For example, a spark pod may fail with the following error:

Events:
  Type     Reason                  Age                   From                                         Message
  ----     ------                  ----                  ----                                         -------
  Warning  FailedCreatePodSandBox  20m (x3 over 32m)     kubelet, k8s-agentpool1-123456789-vmss00000q  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "spark-worker-cc1d28bf3de8428a826c04471e58487c-8577d5d654-2jg89": operation timeout: context deadline exceeded
  Normal   SandboxChanged          16m (x150 over 159m)  kubelet, k8s-agentpool1-123456789-vmss00000q  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  5m7s (x14 over 161m)  kubelet, k8s-agentpool1-123456789-vmss00000q  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "spark-worker-cc1d28bf3de8428a826c04471e58487c-8577d5d654-2jg89": operation timeout: context deadline exceeded

Looking at the logs we see "SyncLoop" from kubelet requsting a new pod.

Jul 16 16:33:58 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:33:58.001997    4797 kubelet.go:1908] SyncLoop (ADD, "api"): "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)

There are also some logs that indicate that volumes are mounted...

Jul 16 16:34:29 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:33:58.175573    4797 reconciler.go:252] operationExecutor.MountVolume started for volume "default-shared" (UniqueName: "kubernetes.io/glusterfs/8272d74f-a7e7-11e9-8f1c-000d3a7b202b-default-shared") pod "d9b3911585c4461c9728aefa39716c44" (UID: "8272d74f-a7e7-11e9-8f1c-000d3a7b202b")

We see a pod Sandbox going to be created:

Jul 16 16:34:29 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:34:29.627374    4797 kuberuntime_manager.go:397] No sandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" can be found. Need to start a new one

We don't seem to see anything else until we see this logged:

Jul 16 16:36:29 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:29.629252    4797 kuberuntime_manager.go:662] createPodSandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "d9b3911585c4461c9728aefa39716c44": operation timeout: context deadline exceeded
Jul 16 16:36:29 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:29.629301    4797 pod_workers.go:190] Error syncing pod 8272d74f-a7e7-11e9-8f1c-000d3a7b202b ("d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)"), skipping: failed to "CreatePodSandbox" for "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" with CreatePodSandboxError: "CreatePodSandbox for pod \"d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)\" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod \"d9b3911585c4461c9728aefa39716c44\": operation timeout: context deadline exceeded"
Jul 16 16:36:43 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:36:43.937085    4797 kuberuntime_manager.go:397] No sandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" can be found. Need to start a new one
Jul 16 16:36:43 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:43.940691    4797 remote_runtime.go:96] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "d9b3911585c4461c9728aefa39716c44": Error response from daemon: Conflict. The container name "/k8s_POD_d9b3911585c4461c9728aefa39716c44_default_8272d74f-a7e7-11e9-8f1c-000d3a7b202b_0" is already in use by container "2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27". You have to remove (or rename) that container to be able to reuse that name.
Jul 16 16:36:43 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:43.940731    4797 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "d9b3911585c4461c9728aefa39716c44": Error response from daemon: Conflict. The container name "/k8s_POD_d9b3911585c4461c9728aefa39716c44_default_8272d74f-a7e7-11e9-8f1c-000d3a7b202b_0" is already in use by container "2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27". You have to remove (or rename) that container to be able to reuse that name.
Jul 16 16:36:43 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:43.940747    4797 kuberuntime_manager.go:662] createPodSandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "d9b3911585c4461c9728aefa39716c44": Error response from daemon: Conflict. The container name "/k8s_POD_d9b3911585c4461c9728aefa39716c44_default_8272d74f-a7e7-11e9-8f1c-000d3a7b202b_0" is already in use by container "2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27". You have to remove (or rename) that container to be able to reuse that name.
Jul 16 16:36:43 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:43.940805    4797 pod_workers.go:190] Error syncing pod 8272d74f-a7e7-11e9-8f1c-000d3a7b202b ("d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)"), skipping: failed to "CreatePodSandbox" for "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" with CreatePodSandboxError: "CreatePodSandbox for pod \"d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)\" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod \"d9b3911585c4461c9728aefa39716c44\": Error response from daemon: Conflict. The container name \"/k8s_POD_d9b3911585c4461c9728aefa39716c44_default_8272d74f-a7e7-11e9-8f1c-000d3a7b202b_0\" is already in use by container \"2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27\". You have to remove (or rename) that container to be able to reuse that name."
Jul 16 16:36:44 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: W0716 16:36:44.221607    4797 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "d9b3911585c4461c9728aefa39716c44_default": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27"
Jul 16 16:36:44 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:36:44.222749    4797 kubelet.go:1953] SyncLoop (PLEG): "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)", event: &pleg.PodLifecycleEvent{ID:"8272d74f-a7e7-11e9-8f1c-000d3a7b202b", Type:"ContainerDied", Data:"2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27"}
Jul 16 16:36:44 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:36:44.739387    4797 kuberuntime_manager.go:415] No ready sandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" can be found. Need to start a new one

We thought that may be a CNI error, however we reviewed the Azure CNI logs and it doesn't appear to even get to the part where it starts to request an IP, only showing an error that the DEL command completed with err.

2019/07/16 16:23:06 [net] Deleting veth pair azv4ea5d9d9527 eth0.
2019/07/16 16:23:06 [net] Deleted endpoint &{Id:8e963d34-eth0 HnsId: SandboxKey: IfName:eth0 HostIfName:azv4ea5d9d9527 MacAddress:ce:93:bf:4d:e9:19 InfraVnetIP:{IP:<nil> Mask:<nil>} IPAddresses:[{IP:10.250.18.22
5 Mask:fffff800}] Gateways:[10.250.16.1] DNS:{Suffix: Servers:[168.63.129.16]} Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} Src:<nil> Gw:10.250.16.1 Protocol:0 DevName: Scope:0}] VlanID:0 EnableSnatOnHost:false Enabl
eInfraVnet:false EnableMultitenancy:false NetworkNameSpace:/proc/10781/ns/net ContainerID:8e963d340597f1c9f789b93a7784e8d44ffb00687086de8ee6561338aab7c72d PODName:jupyter-some-person-10 PODNameSpace:defaul
t InfraVnetAddressSpace:}.
2019/07/16 16:23:06 [net] Save succeeded.
2019/07/16 16:23:06 [cni] Calling plugin azure-vnet-ipam DEL nwCfg:&{CNIVersion:0.3.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: InfraVnetAddressSpace: PodNamespaceForDualN
etwork:[] MultiTenancy:false EnableSnatOnHost:false EnableExactMatchForPodName:false CNSUrl: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet:10.250.16.0/21 Address:10.250.18.225 QueryInterval:} DNS:{Na
meservers:[] Domain: Search:[] Options:[]} RuntimeConfig:{PortMappings:[] DNS:{Servers:[] Searches:[] Options:[]}} AdditionalArgs:[]}.
2019/07/16 16:23:06 [cni] Plugin azure-vnet-ipam returned err:<nil>.
2019/07/16 16:23:06 Get number of endpoints for ifname eth0 network azure
2019/07/16 16:23:06 [cni-net] DEL command completed with err:<nil>.
2019/07/16 16:23:06 [cni-net] Plugin stopped.
2019/07/16 16:36:38 [cni-net] Plugin azure-vnet version v1.0.18.
2019/07/16 16:36:38 [cni-net] Running on Linux version 4.15.0-1040-azure (buildd@lgw01-amd64-030) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #44-Ubuntu SMP Thu Feb 21 14:24:01 UTC 2019

Client:
 Version:           3.0.3
 API version:       1.40
 Go version:        go1.11.4
 Git commit:        48bd4c6d
 Built:             Wed Jan 23 16:17:56 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          3.0.4
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.11.4
  Git commit:       8ecd530
  Built:            Fri Jan 25 01:45:38 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.2
  GitCommit:        9754871865f7fe2f4e74d43e2fc7ccd237edcbce
 runc:
  Version:          1.0.0-rc6+dev
  GitCommit:        96ec2177ae841256168fcf76954f7177af9446eb
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Here is the k8s version info:

Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Here is our Node info - We are using AKS engine to create a Kubernetes cluster which uses Azure VMSS nodes.

 Kernel Version:             4.15.0-1050-azure
 OS Image:                   Ubuntu 16.04.6 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://3.0.4
 Kubelet Version:            v1.13.5
 Kube-Proxy Version:         v1.13.5

I am kind of out of ideas as to what we can even do to troubleshoot this further, as we're unable to even re-create this issue.

I understand that this "Context deadline exceeded" alert is a generic gRPC timeout, but I'm not sure which gRPC transaction is getting hung up on.

Brett Larson

Asked: 2018-07-14 11:58:51 +0800 CST

Setup ASR protection on a virtual machine with a Hyper-V host running Windows 10

0

I am trying to migrate an on-premise virtual machine from a Windows 10 Hyper-V host to Azure using ASR.

My question is that - Is it possible to run the Azure Site Recovery Provider on a client computer that is running Windows 10? I am trying to install AzureSiteRecoveryProvider.exe however I'm getting the following error.

Brett Larson

Asked: 2017-01-18 07:51:48 +0800 CST

AzureRM Diagnostic data and OMS Log Analytics - Monitoring across subscriptions

1

I am trying to collect azure diagnostics data from some resources in Azure on one separate subscription and directory, and ship them to an OMS / log analytics workspace on another subscription / directory.

The account i'm using has access to both organizations, however when I am running the commands to "To enable sending of Diagnostic Logs to a Log Analytics workspace" (as described here)

Set-AzureRmDiagnosticSetting -ResourceId [your resource id] -WorkspaceId [resource id of the log analytics workspace] -Enabled $true

My Commands:

$resourceid = "/subscriptions/e12d538c-xxxx-xxxx-xxxx-e60xxxxx2144/resourceGroups/xxx-xxxx/providers/Microsoft.Cache/Redis/xxxxxxxxxx"
$workspaceid = "/subscriptions/6a9axxxx-8xxx-4xxx-92xx-1bxxxxxx5fc23/resourceGroups/xxxxx-oms-rg/providers/Microsoft.OperationalInsights/workspaces/xxxxxxx"
Set-AzureRmDiagnosticSetting -ResourceId $ResourceId  -WorkspaceId $workspaceId -Enabled $true

I am getting the following error:

Set-AzureRmDiagnosticSetting : The access token is from the wrong issuer 'https://sts.windows.net/5xxxxxxx-cxxx-4xxx-axxx-2xxxxxxxxxxxxx/'. It must match the tenant 'https://sts.windows.net/2xxxxxxx-cxxx-2xxx-bxxx-3xxxxxxxxxxxxx/' associated with this subscription. Please use the authority (URL) 'https://login.windows.net/2xxxxxxx-cxxx-2xxx-bxxx-3xxxxxxxxxxxxx' to get the token. Note, if the subscription is transferred to another tenant there is no impact to the services, but information about new tenant could take time to propagate (up to an hour). If you just transferred your subscription and see this error message, please try back later. At line:1 char:1 + Set-AzureRmDiagnosticSetting -ResourceId $ResourceId -WorkspaceId $w ... + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : CloseError: (:) [Set-AzureRmDiagnosticSetting], CloudException + FullyQualifiedErrorId : Microsoft.Azure.Commands.Insights.Diagnostics.SetAzureRmDiagnosticSettingCommand

Brett Larson

Asked: 2012-12-20 11:56:00 +0800 CST

Connecting to Server 2012 Client VDI from Linux based Thin Clients

2

Currently we have set up a test VDI environment that hosts several Windows 7 client computers. We are using the "Personal" desktop option and everything seems to be running fine except...

The only way to access the virtual desktops seems to be from either logging into the Remote Desktop Web server that is hosting the file and then opening that .RDP file or Alternatively saving that .RDP file and opening it from another PC.

We have several Linux based HP t5145 thin clients running "Thin Pro" that we were looking to utilize to connect to these VDIs, but I cannot seem to find a way to open the .rdp file or to connect to the VDI through the pool. The only DIRECT message from Microsoft I could find was here:

Without the RD Web Access role service to aid discovery, you’ll need to share the RDP files explicitly with all users. Any changes to this access would require a re-dispersal of the RDP file.

So is connecting to client VDI only supported from that darn RDP file? Is there anyway to open that file with rdesktop or another linux alternative? Is there any documentation from Microsoft or HP on the matter? (I couldn't find anything but this sales document that assured me that it was "Fully Tested, Ready to Run")

Thanks.

Brett Larson

Asked: 2012-11-14 14:26:19 +0800 CST

Hyper-V Virtual Machine Networking issues related to Max Ethernet Frame Size

3

I fixed an issue today earlier today but i'm interested in learning WHY it worked. We set up a new Hyper-V virtual machine only to discover that HTTP traffic wasn't working. HTTPS, pings, everything else was working fine.

After months of prodding around I took a shot in the dark. On the Hyper-V host server, the physical NIC card had an advanced setting of "Max Ethernet Frame Size" set to 1500. After setting this setting to 1514 the issue was fixed. Alternatively, setting this to 1512 did not solve the issue; 1514 is the magic number.

My best guess it that when this setting was set to 1500 it was allowing incoming pings because the data payload was a lot smaller of say, HTTP traffic. As far as HTTPS traffic, I read about something called "Path MTU discovery" which i'm going to assume why is HTTPs traffic was getting through fine, albeit slower.

Looking at this post, people agree that 1518 is the max total frame size. Why didn't I need to change this to 1518 instead of 1514 bytes? Why is the default frame size 1500 if that's the max size of the Ethernet payload and not the max size.

Brett Larson

Asked: 2012-11-02 12:54:41 +0800 CST

Test P2V with SCVMM

5

I am trying to convert a physical server 2003 server to a virtual machine.

I am using System Center Virtual Machine Manager 2008 R2 (Workgroup Edition) on a Microsoft Server 2008 R2 Data center edition hyper-v server.

Is it possible to do a test P2V with SCVMM without disrupting the production server? If so, how?

Enable debug or trace logging on Kublet for Kubernetes

"Context deadline exceeded" preventing pods from being created in AKS

Setup ASR protection on a virtual machine with a Hyper-V host running Windows 10

AzureRM Diagnostic data and OMS Log Analytics - Monitoring across subscriptions

Connecting to Server 2012 Client VDI from Linux based Thin Clients

Hyper-V Virtual Machine Networking issues related to Max Ethernet Frame Size

Test P2V with SCVMM

Can you pass user/pass for HTTP Basic Authentication in URL parameters?

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?