Looking at the Kubernetes dev documentation I can see that there is additional logging available for a node. Utimately i'm trying to enable debug or trace logging on Kublet so that I can troubleshoot an issue i'm having. I can't seem to find any guidance anywhere on adjusting this.
Brett Larson's questions
We have been experiencing an issue causing us pain for the last few months.
The issue appears to be that occasionally when we request a pod via the Kubernetes executor it fails to create.
For example, a spark pod may fail with the following error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 20m (x3 over 32m) kubelet, k8s-agentpool1-123456789-vmss00000q (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "spark-worker-cc1d28bf3de8428a826c04471e58487c-8577d5d654-2jg89": operation timeout: context deadline exceeded
Normal SandboxChanged 16m (x150 over 159m) kubelet, k8s-agentpool1-123456789-vmss00000q Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 5m7s (x14 over 161m) kubelet, k8s-agentpool1-123456789-vmss00000q Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "spark-worker-cc1d28bf3de8428a826c04471e58487c-8577d5d654-2jg89": operation timeout: context deadline exceeded
Looking at the logs we see "SyncLoop" from kubelet requsting a new pod.
Jul 16 16:33:58 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:33:58.001997 4797 kubelet.go:1908] SyncLoop (ADD, "api"): "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)
There are also some logs that indicate that volumes are mounted...
Jul 16 16:34:29 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:33:58.175573 4797 reconciler.go:252] operationExecutor.MountVolume started for volume "default-shared" (UniqueName: "kubernetes.io/glusterfs/8272d74f-a7e7-11e9-8f1c-000d3a7b202b-default-shared") pod "d9b3911585c4461c9728aefa39716c44" (UID: "8272d74f-a7e7-11e9-8f1c-000d3a7b202b")
We see a pod Sandbox going to be created:
Jul 16 16:34:29 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:34:29.627374 4797 kuberuntime_manager.go:397] No sandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" can be found. Need to start a new one
We don't seem to see anything else until we see this logged:
Jul 16 16:36:29 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:29.629252 4797 kuberuntime_manager.go:662] createPodSandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "d9b3911585c4461c9728aefa39716c44": operation timeout: context deadline exceeded
Jul 16 16:36:29 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:29.629301 4797 pod_workers.go:190] Error syncing pod 8272d74f-a7e7-11e9-8f1c-000d3a7b202b ("d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)"), skipping: failed to "CreatePodSandbox" for "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" with CreatePodSandboxError: "CreatePodSandbox for pod \"d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)\" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod \"d9b3911585c4461c9728aefa39716c44\": operation timeout: context deadline exceeded"
Jul 16 16:36:43 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:36:43.937085 4797 kuberuntime_manager.go:397] No sandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" can be found. Need to start a new one
Jul 16 16:36:43 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:43.940691 4797 remote_runtime.go:96] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "d9b3911585c4461c9728aefa39716c44": Error response from daemon: Conflict. The container name "/k8s_POD_d9b3911585c4461c9728aefa39716c44_default_8272d74f-a7e7-11e9-8f1c-000d3a7b202b_0" is already in use by container "2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27". You have to remove (or rename) that container to be able to reuse that name.
Jul 16 16:36:43 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:43.940731 4797 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "d9b3911585c4461c9728aefa39716c44": Error response from daemon: Conflict. The container name "/k8s_POD_d9b3911585c4461c9728aefa39716c44_default_8272d74f-a7e7-11e9-8f1c-000d3a7b202b_0" is already in use by container "2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27". You have to remove (or rename) that container to be able to reuse that name.
Jul 16 16:36:43 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:43.940747 4797 kuberuntime_manager.go:662] createPodSandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "d9b3911585c4461c9728aefa39716c44": Error response from daemon: Conflict. The container name "/k8s_POD_d9b3911585c4461c9728aefa39716c44_default_8272d74f-a7e7-11e9-8f1c-000d3a7b202b_0" is already in use by container "2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27". You have to remove (or rename) that container to be able to reuse that name.
Jul 16 16:36:43 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: E0716 16:36:43.940805 4797 pod_workers.go:190] Error syncing pod 8272d74f-a7e7-11e9-8f1c-000d3a7b202b ("d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)"), skipping: failed to "CreatePodSandbox" for "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" with CreatePodSandboxError: "CreatePodSandbox for pod \"d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)\" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod \"d9b3911585c4461c9728aefa39716c44\": Error response from daemon: Conflict. The container name \"/k8s_POD_d9b3911585c4461c9728aefa39716c44_default_8272d74f-a7e7-11e9-8f1c-000d3a7b202b_0\" is already in use by container \"2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27\". You have to remove (or rename) that container to be able to reuse that name."
Jul 16 16:36:44 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: W0716 16:36:44.221607 4797 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "d9b3911585c4461c9728aefa39716c44_default": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27"
Jul 16 16:36:44 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:36:44.222749 4797 kubelet.go:1953] SyncLoop (PLEG): "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)", event: &pleg.PodLifecycleEvent{ID:"8272d74f-a7e7-11e9-8f1c-000d3a7b202b", Type:"ContainerDied", Data:"2a7ecfd3725bbe6604b3006abf6c59a36eb8a5d7142e71a3791f5f7378bf5e27"}
Jul 16 16:36:44 k8s-agentpool1-123456789-vmss00000T kubelet[4797]: I0716 16:36:44.739387 4797 kuberuntime_manager.go:415] No ready sandbox for pod "d9b3911585c4461c9728aefa39716c44_default(8272d74f-a7e7-11e9-8f1c-000d3a7b202b)" can be found. Need to start a new one
We thought that may be a CNI error, however we reviewed the Azure CNI logs and it doesn't appear to even get to the part where it starts to request an IP, only showing an error that the DEL command completed with err.
2019/07/16 16:23:06 [net] Deleting veth pair azv4ea5d9d9527 eth0.
2019/07/16 16:23:06 [net] Deleted endpoint &{Id:8e963d34-eth0 HnsId: SandboxKey: IfName:eth0 HostIfName:azv4ea5d9d9527 MacAddress:ce:93:bf:4d:e9:19 InfraVnetIP:{IP:<nil> Mask:<nil>} IPAddresses:[{IP:10.250.18.22
5 Mask:fffff800}] Gateways:[10.250.16.1] DNS:{Suffix: Servers:[168.63.129.16]} Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} Src:<nil> Gw:10.250.16.1 Protocol:0 DevName: Scope:0}] VlanID:0 EnableSnatOnHost:false Enabl
eInfraVnet:false EnableMultitenancy:false NetworkNameSpace:/proc/10781/ns/net ContainerID:8e963d340597f1c9f789b93a7784e8d44ffb00687086de8ee6561338aab7c72d PODName:jupyter-some-person-10 PODNameSpace:defaul
t InfraVnetAddressSpace:}.
2019/07/16 16:23:06 [net] Save succeeded.
2019/07/16 16:23:06 [cni] Calling plugin azure-vnet-ipam DEL nwCfg:&{CNIVersion:0.3.0 Name:azure Type:azure-vnet Mode:bridge Master: Bridge:azure0 LogLevel: LogTarget: InfraVnetAddressSpace: PodNamespaceForDualN
etwork:[] MultiTenancy:false EnableSnatOnHost:false EnableExactMatchForPodName:false CNSUrl: Ipam:{Type:azure-vnet-ipam Environment: AddrSpace: Subnet:10.250.16.0/21 Address:10.250.18.225 QueryInterval:} DNS:{Na
meservers:[] Domain: Search:[] Options:[]} RuntimeConfig:{PortMappings:[] DNS:{Servers:[] Searches:[] Options:[]}} AdditionalArgs:[]}.
2019/07/16 16:23:06 [cni] Plugin azure-vnet-ipam returned err:<nil>.
2019/07/16 16:23:06 Get number of endpoints for ifname eth0 network azure
2019/07/16 16:23:06 [cni-net] DEL command completed with err:<nil>.
2019/07/16 16:23:06 [cni-net] Plugin stopped.
2019/07/16 16:36:38 [cni-net] Plugin azure-vnet version v1.0.18.
2019/07/16 16:36:38 [cni-net] Running on Linux version 4.15.0-1040-azure (buildd@lgw01-amd64-030) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #44-Ubuntu SMP Thu Feb 21 14:24:01 UTC 2019
Client:
Version: 3.0.3
API version: 1.40
Go version: go1.11.4
Git commit: 48bd4c6d
Built: Wed Jan 23 16:17:56 2019
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 3.0.4
API version: 1.40 (minimum version 1.12)
Go version: go1.11.4
Git commit: 8ecd530
Built: Fri Jan 25 01:45:38 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.2.2
GitCommit: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc:
Version: 1.0.0-rc6+dev
GitCommit: 96ec2177ae841256168fcf76954f7177af9446eb
docker-init:
Version: 0.18.0
GitCommit: fec3683
Here is the k8s version info:
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Here is our Node info - We are using AKS engine to create a Kubernetes cluster which uses Azure VMSS nodes.
Kernel Version: 4.15.0-1050-azure
OS Image: Ubuntu 16.04.6 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://3.0.4
Kubelet Version: v1.13.5
Kube-Proxy Version: v1.13.5
I am kind of out of ideas as to what we can even do to troubleshoot this further, as we're unable to even re-create this issue.
I understand that this "Context deadline exceeded" alert is a generic gRPC timeout, but I'm not sure which gRPC transaction is getting hung up on.
I am trying to migrate an on-premise virtual machine from a Windows 10 Hyper-V host to Azure using ASR.
My question is that - Is it possible to run the Azure Site Recovery Provider on a client computer that is running Windows 10? I am trying to install AzureSiteRecoveryProvider.exe however I'm getting the following error.
I am trying to collect azure diagnostics data from some resources in Azure on one separate subscription and directory, and ship them to an OMS / log analytics workspace on another subscription / directory.
The account i'm using has access to both organizations, however when I am running the commands to "To enable sending of Diagnostic Logs to a Log Analytics workspace" (as described here)
Set-AzureRmDiagnosticSetting -ResourceId [your resource id] -WorkspaceId [resource id of the log analytics workspace] -Enabled $true
My Commands:
$resourceid = "/subscriptions/e12d538c-xxxx-xxxx-xxxx-e60xxxxx2144/resourceGroups/xxx-xxxx/providers/Microsoft.Cache/Redis/xxxxxxxxxx"
$workspaceid = "/subscriptions/6a9axxxx-8xxx-4xxx-92xx-1bxxxxxx5fc23/resourceGroups/xxxxx-oms-rg/providers/Microsoft.OperationalInsights/workspaces/xxxxxxx"
Set-AzureRmDiagnosticSetting -ResourceId $ResourceId -WorkspaceId $workspaceId -Enabled $true
I am getting the following error:
Set-AzureRmDiagnosticSetting : The access token is from the wrong issuer 'https://sts.windows.net/5xxxxxxx-cxxx-4xxx-axxx-2xxxxxxxxxxxxx/'. It must match the tenant 'https://sts.windows.net/2xxxxxxx-cxxx-2xxx-bxxx-3xxxxxxxxxxxxx/' associated with this subscription. Please use the authority (URL) 'https://login.windows.net/2xxxxxxx-cxxx-2xxx-bxxx-3xxxxxxxxxxxxx' to get the token. Note, if the subscription is transferred to another tenant there is no impact to the services, but information about new tenant could take time to propagate (up to an hour). If you just transferred your subscription and see this error message, please try back later. At line:1 char:1 + Set-AzureRmDiagnosticSetting -ResourceId $ResourceId -WorkspaceId $w ... + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : CloseError: (:) [Set-AzureRmDiagnosticSetting], CloudException + FullyQualifiedErrorId : Microsoft.Azure.Commands.Insights.Diagnostics.SetAzureRmDiagnosticSettingCommand
Currently we have set up a test VDI environment that hosts several Windows 7 client computers. We are using the "Personal" desktop option and everything seems to be running fine except...
The only way to access the virtual desktops seems to be from either logging into the Remote Desktop Web server that is hosting the file and then opening that .RDP file or Alternatively saving that .RDP file and opening it from another PC.
We have several Linux based HP t5145 thin clients running "Thin Pro" that we were looking to utilize to connect to these VDIs, but I cannot seem to find a way to open the .rdp file or to connect to the VDI through the pool. The only DIRECT message from Microsoft I could find was here:
Without the RD Web Access role service to aid discovery, you’ll need to share the RDP files explicitly with all users. Any changes to this access would require a re-dispersal of the RDP file.
So is connecting to client VDI only supported from that darn RDP file? Is there anyway to open that file with rdesktop or another linux alternative? Is there any documentation from Microsoft or HP on the matter? (I couldn't find anything but this sales document that assured me that it was "Fully Tested, Ready to Run")
Thanks.
I fixed an issue today earlier today but i'm interested in learning WHY it worked. We set up a new Hyper-V virtual machine only to discover that HTTP traffic wasn't working. HTTPS, pings, everything else was working fine.
After months of prodding around I took a shot in the dark. On the Hyper-V host server, the physical NIC card had an advanced setting of "Max Ethernet Frame Size" set to 1500. After setting this setting to 1514 the issue was fixed. Alternatively, setting this to 1512 did not solve the issue; 1514 is the magic number.
My best guess it that when this setting was set to 1500 it was allowing incoming pings because the data payload was a lot smaller of say, HTTP traffic. As far as HTTPS traffic, I read about something called "Path MTU discovery" which i'm going to assume why is HTTPs traffic was getting through fine, albeit slower.
Looking at this post, people agree that 1518 is the max total frame size. Why didn't I need to change this to 1518 instead of 1514 bytes? Why is the default frame size 1500 if that's the max size of the Ethernet payload and not the max size.
I am trying to convert a physical server 2003 server to a virtual machine.
I am using System Center Virtual Machine Manager 2008 R2 (Workgroup Edition) on a Microsoft Server 2008 R2 Data center edition hyper-v server.
Is it possible to do a test P2V with SCVMM without disrupting the production server? If so, how?