{"id":234,"date":"2019-08-01T21:02:21","date_gmt":"2019-08-02T04:02:21","guid":{"rendered":"http:\/\/blog.nillsf.com\/?p=234"},"modified":"2019-08-18T20:13:42","modified_gmt":"2019-08-19T03:13:42","slug":"ckad-part-5-observability","status":"publish","type":"post","link":"https:\/\/blog.nillsf.com\/index.php\/2019\/08\/01\/ckad-part-5-observability\/","title":{"rendered":"CKAD series part 5: Observability"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">This is part 5 in a multi-series on my CKAD study efforts. You can find previous entries here:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/blog.nillsf.com\/index.php\/2019\/07\/09\/ckad-series-part-1-intro-exam-topics-my-study-plan\/\">Part 1: intro, exam topics and my study plan<\/a> <\/li><li><a href=\"https:\/\/blog.nillsf.com\/index.php\/2019\/07\/11\/ckad-series-part-2-core-concepts\/\">Part 2: Core concepts<\/a><\/li><li><a href=\"https:\/\/blog.nillsf.com\/index.php\/2019\/07\/21\/ckad-series-part-3-configuration\/\">Part 3: Configuration <\/a><\/li><li><a href=\"https:\/\/blog.nillsf.com\/index.php\/2019\/07\/28\/ckad-series-part-4-multi-container-pods\/\">Part 4: Multi-container pods<\/a><\/li><li><a href=\"https:\/\/blog.nillsf.com\/index.php\/2019\/08\/05\/ckad-series-part-6-pod-design\/\">Part 6: Pod Design<\/a><\/li><li><a href=\"https:\/\/blog.nillsf.com\/index.php\/2019\/08\/18\/ckad-series-part-7-services-and-networking\/\">Part 7: Networking<\/a>  <\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">This is part 5 in this CKAD series, covering the 4th exam topic out of 7. If you made it this far, we have covered half of the topics for the exam, 3 more to go! <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This section will\ncover observability. 
We&#8217;ll dive into the following 4 topics:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Understand LivenessProbes and ReadinessProbes<\/li><li>Understand container logging<\/li><li>Understand how to monitor applications in Kubernetes<\/li><li>Understand debugging in Kubernetes<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s go ahead and start with the first topic:<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Understand LivenessProbes and ReadinessProbes<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Kubernetes uses <a href=\"https:\/\/kubernetes.io\/docs\/tasks\/configure-pod-container\/configure-liveness-readiness-probes\/\">LivenessProbes and ReadinessProbes<\/a> to monitor the availability of your applications. Each probe serves a different purpose:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>A LivenessProbe monitors the availability of an application while it is running. If a LivenessProbe fails, Kubernetes will restart the failing container in your pod. This could be useful to catch deadlocks, infinite loops, or just a &#8216;stuck&#8217; application.<\/li><li>A ReadinessProbe monitors when your application becomes available. As long as a ReadinessProbe fails, Kubernetes will not send any service traffic to that pod. This is useful if your application has to go through some configuration before it becomes available.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">LivenessProbes and ReadinessProbes don&#8217;t need to be served from the same endpoint in your application. If you have a <em>smart<\/em> application, that application could take itself out of rotation (meaning, no more traffic is sent to the application), while still being healthy. To achieve this, you would have the ReadinessProbe fail, but have the LivenessProbe keep succeeding. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Why not try out this last scenario in a quick test? 
Let&#8217;s try and do the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Create 2 nginx pods, each with a distinct page;<\/li><li>Create a service that load balances between these 2 pods;<\/li><li>Have a separate LivenessProbe and ReadinessProbe;<\/li><li>Start off with everything healthy;<\/li><li>Have the ReadinessProbe fail on 1, but not the other;<\/li><li>Have the ReadinessProbe fail on both;<\/li><li>Recover a ReadinessProbe;<\/li><li>Have the LivenessProbe fail on 1.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">In <a href=\"https:\/\/blog.nillsf.com\/index.php\/2019\/07\/28\/ckad-series-part-4-multi-container-pods\/\">part 4, in the ambassador pattern section<\/a>, we created two pods and used an ambassador to load balance between them. We&#8217;ll re-use those pods to have 2 distinct web pages, and then create a service on top of them. Just for the fun of things, let&#8217;s also create a new namespace for this part, and set it as the kubectl default namespace. <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl create namespace observability\nkubectl config set-context $(kubectl config current-context) --namespace=observability<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s now recreate our HTML pages as configmaps. The pods we create later mount the key <code>index.html<\/code> from each configmap, so we set that key name explicitly when loading <code>index1.html<\/code> and <code>index2.html<\/code>. <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&lt;!DOCTYPE html>\n&lt;html>\n&lt;head>\n    &lt;title>Server 1&lt;\/title>\n&lt;\/head>\n&lt;body>\nServer 1\n&lt;\/body>\n&lt;\/html><\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>&lt;!DOCTYPE html>\n&lt;html>\n&lt;head>\n    &lt;title>Server 2&lt;\/title>\n&lt;\/head>\n&lt;body>\nServer 2\n&lt;\/body>\n&lt;\/html><\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl create configmap server1 --from-file=index.html=index1.html\nkubectl create configmap server2 --from-file=index.html=index2.html<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s also add a health page. 
Nothing fancy here, just a health page.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&lt;!DOCTYPE html>\n&lt;html>\n&lt;head>\n    &lt;title>All is fine here&lt;\/title>\n&lt;\/head>\n&lt;body>\nOK\n&lt;\/body>\n&lt;\/html><\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl create configmap healthy --from-file=healthy.html<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s now go ahead and create our two pods, and add a LivenessProbe and ReadinessProbe to them. (bear with me, this is a lot of YAML)<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: v1\nkind: Pod\nmetadata:\n  name: server1\n  labels:\n    app: web-server #we'll give both servers the same label, so the service will load balance\nspec:\n  containers:\n    - name: nginx-1\n      image: nginx\n      ports:\n        - containerPort: 80\n      livenessProbe:\n        httpGet:\n          path: \/healthy.html\n          port: 80\n        initialDelaySeconds: 3\n        periodSeconds: 3\n      readinessProbe:\n        httpGet:\n          path: \/index.html\n          port: 80\n        initialDelaySeconds: 3\n        periodSeconds: 3\n      volumeMounts:\n        - name: html\n          mountPath: \/usr\/share\/nginx\/html\n  initContainers:\n    - name: prep\n      image: busybox\n      volumeMounts:\n        - name: index\n          mountPath: \/tmp\/index.html\n          subPath: index.html\n        - name: healthy\n          mountPath: \/tmp\/healthy.html\n          subPath: healthy.html\n        - name: html\n          mountPath: \/usr\/share\/nginx\/html\/\n      command: [\"\/bin\/sh\", \"-c\"]\n      args: [\"cp \/tmp\/index.html \/usr\/share\/nginx\/html\/index.html; cp \/tmp\/healthy.html \/usr\/share\/nginx\/html\/healthy.html;\"]\n  volumes:\n    - name: index\n      configMap:\n        name: server1\n    - name: healthy\n      configMap:\n        name: healthy\n    - name: html\n      emptyDir: {}\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: 
server2\n  labels:\n    app: web-server #we'll give both servers the same label, so the service will load balance\nspec:\n  containers:\n    - name: nginx-1\n      image: nginx\n      ports:\n        - containerPort: 80\n      livenessProbe:\n        httpGet:\n          path: \/healthy.html\n          port: 80\n        initialDelaySeconds: 3\n        periodSeconds: 3\n      readinessProbe:\n        httpGet:\n          path: \/index.html\n          port: 80\n        initialDelaySeconds: 3\n        periodSeconds: 3\n      volumeMounts:\n        - name: html\n          mountPath: \/usr\/share\/nginx\/html\n  initContainers:\n    - name: prep\n      image: busybox\n      volumeMounts:\n        - name: index\n          mountPath: \/tmp\/index.html\n          subPath: index.html\n        - name: healthy\n          mountPath: \/tmp\/healthy.html\n          subPath: healthy.html\n        - name: html\n          mountPath: \/usr\/share\/nginx\/html\/\n      command: [\"\/bin\/sh\", \"-c\"]\n      args: [\"cp \/tmp\/index.html \/usr\/share\/nginx\/html\/index.html; cp \/tmp\/healthy.html \/usr\/share\/nginx\/html\/healthy.html;\"]\n  volumes:\n    - name: index\n      configMap:\n        name: server2\n    - name: healthy\n      configMap:\n        name: healthy\n    - name: html\n      emptyDir: {}\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: web\nspec:\n  selector:\n    app: web-server\n  ports:\n  - protocol: TCP\n    port: 80\n    targetPort: 80\n  type: LoadBalancer<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">You&#8217;ll see in the YAML above that each pod has a separate LivenessProbe and a separate ReadinessProbe. We can then go to our service and get responses from server 1 and 2. 
The load balancing isn&#8217;t always fully round-robin, so let&#8217;s do a couple of curls and check the occurrences of server 1 and server 2:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get service # get the service ip here\nfor i in {1..50}; do curl --silent 52.191.138.86 | sed -n '7p' >> output.txt; done\necho 'server 1';grep '1' output.txt | wc;echo 'server 2';grep '2' output.txt | wc<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The above commands will do 50 curls to our IP address, and the echo after that will check how often a certain server appears. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s now go in, and make the ReadinessProbe of server 1 fail.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#first, we'll go into the container and move the index file away\nkubectl exec -it server1 -- sh\ncd \/usr\/share\/nginx\/html\nmv index.html index.html.fail\n#this will make our readiness probe fail, and make our container unready:\nexit #to go out of the container shell\nkubectl get pods --watch<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Once our pod goes into an unready state, it won&#8217;t get any more traffic, but it remains running. Let&#8217;s run our curls again:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>rm output.txt\nfor i in {1..50}; do curl --silent 52.191.138.86 | sed -n '7p' >> output.txt; done\necho 'server 1';grep '1' output.txt | wc;echo 'server 2';grep '2' output.txt | wc<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">If things go well, you should now see all traffic go to server 2, and no traffic to server 1. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s now also fail server 2, and see what gives:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#first, we'll go into the container and move the index file away\nkubectl exec -it server2 -- sh\ncd \/usr\/share\/nginx\/html\nmv index.html index.html.fail\n#this will make our readiness probe fail, and make our container unready:\nexit #to go out of the container shell\nkubectl get pods --watch<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">We don&#8217;t need to do 50 curls now, we just need 1, which will eventually time out:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>curl 52.191.138.86 --connect-timeout 2<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">And this will time out after those 2 seconds. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We can then go into one of our pods, and also have the LivenessProbe fail. This will initiate a restart of the container, so if you watch this via <code>kubectl get pods --watch<\/code>, you&#8217;ll see the restart happen. <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl exec -it server2 -- sh\nmv \/usr\/share\/nginx\/html\/healthy.html \/usr\/share\/nginx\/html\/healthy.html.fail<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Now, this is where it gets fun! Do a <code>kubectl get pods --watch <\/code>and see what happens. Your pod gets into a continuous loop of Running and CrashLoopBackOff. No way to repair the damage easily. (This is where you typically throw away the pod, and redeploy your app.)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If you followed along in <a href=\"https:\/\/blog.nillsf.com\/index.php\/2019\/07\/11\/ckad-series-part-2-core-concepts\/\">part 2<\/a>, we did something similar in removing the page a LivenessProbe refers to. There, the page would automatically come back from the container image. In this case however, our health page is stored on the volume we mount into the container. 
And the LivenessProbe runs every 3 seconds (as per our pod definition), so we only have a few seconds after each restart to apply our changes. That is too short, believe me, I tried. AAAAND: The LivenessProbe is implemented at the container level, not the pod level (<a href=\"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/52345\">there&#8217;s a GitHub issue talking about this<\/a>), so the init container won&#8217;t run again to reapply our files.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This led me to want to go into the node and move the files around in the actual emptydir (the dir exists somewhere you know&#8230;). What follows is a bit of <a href=\"https:\/\/en.wiktionary.org\/wiki\/mental_masturbation\">mental masturbation<\/a> &#8211; as I wanted to get into the node &#8211; which is actually harder than it looked for a VMSS-based AKS cluster. Long story short, don&#8217;t do it. Just delete the pod and carry on with your life. (I did a lot of back and forth with the load balancer and NAT pools, and eventually gave up and just ran an SSH pod in my cluster, which I describe in what follows.)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Step 1 is to enable SSH into your cluster nodes. 
This can be done with the following <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/aks\/ssh#configure-virtual-machine-scale-set-based-aks-clusters-for-ssh-access\">script using AZ CLI<\/a>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>rgname=\"akschallenge\"\naksname=\"aksworkshop\"\nCLUSTER_RESOURCE_GROUP=$(az aks show --resource-group $rgname --name $aksname --query nodeResourceGroup -o tsv)\nSCALE_SET_NAME=$(az vmss list --resource-group $CLUSTER_RESOURCE_GROUP --query [0].name -o tsv)\naz vmss extension set  \\\n    --resource-group $CLUSTER_RESOURCE_GROUP \\\n    --vmss-name $SCALE_SET_NAME \\\n    --name VMAccessForLinux \\\n    --publisher Microsoft.OSTCExtensions \\\n    --version 1.4 \\\n    --protected-settings \"{\\\"username\\\":\\\"azureuser\\\", \\\"ssh_key\\\":\\\"$(cat ~\/.ssh\/id_rsa.pub)\\\"}\"\n\naz vmss update-instances --instance-ids '*' \\\n    --resource-group $CLUSTER_RESOURCE_GROUP \\\n    --name $SCALE_SET_NAME<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">With SSH enabled on the nodes, we can then try to SSH into them. First get the following info: your private key and the IP of the node hosting your failed pod. Afterwards we&#8217;ll need the pod ID of the failed pod.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl get pod server2 -o wide #remember the node\nkubectl get pod server2 -o yaml | grep id #copy this ID\nkubectl get nodes -o wide #copy the IP address of the node\ncat ~\/.ssh\/id_rsa #copy your private key\nkubectl run -it --rm aks-ssh --image=debian\n#wait a couple seconds for the pod to come live\napt-get update &amp;&amp; apt-get install openssh-client -y\nmkdir ~\/.ssh\necho \"PASTE IN YOUR PRIVATE KEY\" > ~\/.ssh\/id_rsa\nchmod 0600 ~\/.ssh\/id_rsa \nssh azureuser@THE_IP_OF_YOUR_NODE \nsudo su #to become root<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">So, now we&#8217;re finally in our node. 
We can then access that emptydir we created.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Navigate to this folder, and you can move the files around to get the probes to work again:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cd \/var\/lib\/kubelet\/pods\/cdfdea76-b3b6-11e9-ad8a-46d847be6880\/volumes\/kubernetes.io~empty-dir\/html\/ #replace the ID with your ID\nmv index.html.fail index.html #this restores the page our ReadinessProbe checks\ncp index.html healthy.html #this recreates the page our LivenessProbe checks<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">If you then do a<code> kubectl get pods --watch<\/code> &#8211; you&#8217;ll notice (after a while) that your pod will stop being restarted. #SUCCESS<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Understand container logging<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">There&#8217;s a <strong><a href=\"https:\/\/kubernetes.io\/docs\/concepts\/cluster-administration\/logging\/\">good Kubernetes documentation article<\/a><\/strong> that describes logging in Kubernetes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For our certification purposes, I&#8217;m going to keep this one short; we&#8217;ll just access logs of a container and of a multi-container pod. But please, if you&#8217;re going to bring this to production, think about your logging infrastructure. Kubernetes doesn&#8217;t keep logs around for long: after a crash you can only get at the previous container instance (<code>kubectl logs --previous<\/code>), and the logs are gone once a pod is evicted or a node dies &#8211; so you&#8217;ll want to have an external solution for your logs. Think either <a href=\"https:\/\/kubernetes.io\/docs\/tasks\/debug-application-cluster\/logging-elasticsearch-kibana\/\">Elasticsearch and Kibana<\/a> (the open source default), <a href=\"https:\/\/cloud.google.com\/logging\/\">StackDriver<\/a> (the GKE default) or <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-monitor\/insights\/container-insights-overview\">Azure Monitor<\/a> (guess what, the Azure default. 
Which is actually quite nice, since it combines logging and metrics in one.)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Ok, let&#8217;s start with a single container pod &#8211; and try to get its logs.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: v1\nkind: Pod\nmetadata:\n  name: counter\nspec:\n  containers:\n  - name: count\n    image: busybox\n    args: [\/bin\/sh, -c,\n            'i=0; while true; do echo \"$i: $(date)\"; i=$((i+1)); sleep 1; done']<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">We can create this one using <code>kubectl create -f counter.yaml<\/code> &#8211; and we can access the logs of this container using <code>kubectl logs counter<\/code>. You can also attach your terminal to follow these logs using <code>kubectl logs counter --follow<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Pro-tip for the exam: you might be asked (I expect this) to export logs to a file and copy this file to a certain location. Learn about <a href=\"https:\/\/www.tecmint.com\/linux-io-input-output-redirection-operators\/\">output redirection<\/a> if you don&#8217;t know it already. <code>kubectl logs counter &gt; logfile.log<\/code> will do the trick for you.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s now try the same with a multi-container pod.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: v1\nkind: Pod\nmetadata:\n  name: counter-web\nspec:\n  containers:\n  - name: count\n    image: busybox\n    args: [\/bin\/sh, -c,\n            'i=0; while true; do echo \"$i: $(date)\"; i=$((i+1)); sleep 1; done']\n  - name: web-server\n    image: nginx<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s create this using <code>kubectl create -f counter-web.yaml<\/code>. 
To get web-server logs, we&#8217;ll need to make at least one request to our web server and then we can get to our logs:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl exec -it counter-web --container count -- wget localhost\nkubectl logs counter-web web-server # to get web logs\nkubectl logs counter-web count # to get count logs<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Understand how to monitor applications in Kubernetes<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Monitoring comes with the same remark as the previous topic around logging. For a production cluster, you&#8217;ll want to think this one through. You&#8217;ll want to get good information about both the health and utilization of your cluster and your pods. For our certification purposes, we&#8217;ll focus on just one command: <code>kubectl top<\/code><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To get metric information about your nodes, you can run <code>kubectl top node<\/code>. This will show you CPU\/Memory utilization across your cluster. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To get metric information about your pods, you can run <code>kubectl top pods<\/code>. If you want to go a little more granular and also show the containers in your pods, you can execute <code>kubectl top pods --containers<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Understand debugging in Kubernetes<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">There&#8217;s a large section of the Kubernetes docs dedicated to debugging. 
The docs discuss debugging <a href=\"https:\/\/kubernetes.io\/docs\/tasks\/debug-application-cluster\/debug-init-containers\/\">Init Containers<\/a>, debugging <a href=\"https:\/\/kubernetes.io\/docs\/tasks\/debug-application-cluster\/debug-pod-replication-controller\/\">Pods and ReplicationControllers<\/a>, debugging <a href=\"https:\/\/kubernetes.io\/docs\/tasks\/debug-application-cluster\/debug-service\/\">Services<\/a>, debugging a <a href=\"https:\/\/kubernetes.io\/docs\/tasks\/debug-application-cluster\/debug-stateful-set\/\">StatefulSet<\/a> and debugging a <a href=\"https:\/\/kubernetes.io\/docs\/tasks\/debug-application-cluster\/crictl\/\">node<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The best way to deal with debugging is running a lot of examples, and hitting errors while you&#8217;re doing so (I certainly have run into a couple of issues along the way that have taken me down deep rabbit holes). <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Before we jump into a couple of examples, let&#8217;s look at the most popular debugging commands:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>kubectl get pod<\/code>: This will show you basic info about your pod. You can get more output via the <code>-o wide<\/code> flag, and even more output with <code>-o yaml<\/code> or <code>-o json<\/code>. With <code>--watch<\/code> you can keep an eye on potential changes.<\/li><li><code>kubectl describe pod<\/code>: This command describes the pod for you (in some detail) and also shows you the latest events related to your pod. <\/li><li><code>kubectl logs *podname* *containername*<\/code>: We discussed this before, but this way you can get the logs from your pods and the containers in your pods<\/li><li><code>kubectl get service<\/code>: This will give you service information.  You can get more output via the <code>-o wide<\/code> flag, and even more output with <code>-o yaml<\/code> or <code>-o json<\/code>. 
With <code>--watch<\/code> you can keep an eye on potential changes. <\/li><li><code>kubectl describe endpoints<\/code>: this will give you information about the endpoints used by services. This will also show you healthy and unhealthy endpoints. <\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">When you are debugging issues with resources <em>apparently <\/em>not existing, don&#8217;t forget about namespaces. You can append the <code>--all-namespaces<\/code> flag to most commands in kubectl, which can help. In service debugging, please remember that name resolution for services works with the service name within the same namespace, but needs the FQDN for cross-namespace communication (<code>service.namespace.svc.cluster.local<\/code>).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s have a look at two examples, and where things can go wrong. We&#8217;ll start with a very simple busybox container that does a counter, with one little mistake in there.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: v1\nkind: Pod\nmetadata:\n  name: counter-wrong\nspec:\n  containers:\n  - name: count\n    image: busybox\n    args: [\/bin\/sh, -c,\n            'i=0; while treu; do echo \"$i: $(date)\"; i=$((i+1)); sleep 1; done']<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">(challenge, can you spot the issue with the naked eye?)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s create this and see what happens:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl create -f counter-wrong.yaml\nkubectl get pods --watch<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">In watching our pods, we&#8217;ll see the container complete a couple of times and then enter the CrashLoopBackOff state. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s have a look at the describe for this pod: <code>kubectl describe pod counter-wrong<\/code>. If you look at the events here, you just see the container starting and then restarting when it fails. 
This doesn&#8217;t help much does it?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Maybe the logs will show us something: <code>kubectl logs counter-wrong<\/code><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/bin\/sh: treu: not found<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Now, this tells me something. We misspelled <strong>true<\/strong> as <em>treu<\/em>. That&#8217;s why our while loop was failing. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s try another example, now running an alpine container:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: v1\nkind: Pod\nmetadata:\n  name: alpine-error\nspec:\n  containers:\n  - name: alpine\n    image: apline<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s create this and look at the pods:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl create -f alpine.yaml\nkubectl get pods --watch<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This will show us a new state for pods we haven&#8217;t seen before:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>alpine-error    0\/1     ErrImagePull       0          31s\nalpine-error    0\/1     ImagePullBackOff   0          43s<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s have a look at a describe to see if this shows us something more:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Events:\n  Type     Reason     Age                From                                        Message\n  ----     ------     ----               ----                                        -------\n  Normal   Scheduled  79s                default-scheduler                           Successfully assigned observability\/alpine-error to aks-nodepool1-14406582-vmss000001\n  Normal   Pulling    37s (x3 over 78s)  kubelet, aks-nodepool1-14406582-vmss000001  Pulling image \"apline\"\n  Warning  Failed     36s (x3 over 77s)  kubelet, aks-nodepool1-14406582-vmss000001  <strong>Failed to pull image \"apline\": rpc error: code = 
Unknown desc = Error response from daemon: pull access denied for apline, repository does not exist or may require 'docker login': denied: requested access to the resource is denied<\/strong>\n  Warning  Failed     36s (x3 over 77s)  kubelet, aks-nodepool1-14406582-vmss000001  Error: ErrImagePull\n  Normal   BackOff    11s (x4 over 77s)  kubelet, aks-nodepool1-14406582-vmss000001  Back-off pulling image \"apline\"\n  Warning  Failed     11s (x4 over 77s)  kubelet, aks-nodepool1-14406582-vmss000001  Error: ImagePullBackOff<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Do you see what we did there? We misspelled <strong>alpine<\/strong> as <em>apline<\/em>, so Kubernetes couldn&#8217;t pull our container image. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In this part of the CKAD series we looked at observability. We played around with LivenessProbes and ReadinessProbes, briefly touched on logging and monitoring, and looked into debugging Kubernetes. For the debugging part I recommend you play around a lot, and learn to work with Kubernetes. The best way to learn is by doing. Nothing beats experience.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">That being said, we&#8217;re over halfway through our study plan. Are you ready for part 6?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is part 5 in a multi-series on my CKAD study efforts. 
You can find previous entries here: Part 1: intro, exam topics and my study plan Part 2: Core concepts Part 3: Configuration Part 4: Multi-container pods Part 6: Pod Design Part 7: Networking This is part 5 in this CKAD series, covering the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[20],"tags":[19,17,18],"class_list":["post-234","post","type-post","status-publish","format-standard","hentry","category-ckad","tag-certification","tag-ckad","tag-kubernetes"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/posts\/234","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/comments?post=234"}],"version-history":[{"count":11,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/posts\/234\/revisions"}],"predecessor-version":[{"id":296,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/posts\/234\/revisions\/296"}],"wp:attachment":[{"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/media?parent=234"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/categories?post=234"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.nillsf.com\/index.php\/wp-json\/wp\/v2\/tags?post=234"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}