How can I retrieve Amazon EKS control plane logs from CloudWatch Logs?

I want to troubleshoot an Amazon Elastic Kubernetes Service (Amazon EKS) issue. I need to collect CloudWatch logs from the components that run on the EKS control plane.

Resolution

Prerequisite: To view your log events in Amazon CloudWatch Logs, you must activate Amazon EKS control plane logging in your cluster. For more information, see View cluster control plane logs.
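If you prefer to activate logging from a script rather than the console, the following minimal sketch shows one possible approach. It assumes that you pass in a boto3 EKS client (`boto3.client("eks")`) and that you want all five control plane log types. The helper names `build_logging_config` and `enable_control_plane_logging` are illustrative and not part of any AWS SDK.

```python
# Log types that Amazon EKS can send to CloudWatch Logs.
EKS_LOG_TYPES = ("api", "audit", "authenticator", "controllerManager", "scheduler")


def build_logging_config(types=EKS_LOG_TYPES):
    """Build the `logging` payload for the EKS UpdateClusterConfig API."""
    unknown = sorted(set(types) - set(EKS_LOG_TYPES))
    if unknown:
        raise ValueError(f"unknown EKS log types: {unknown}")
    return {"clusterLogging": [{"types": list(types), "enabled": True}]}


def enable_control_plane_logging(eks_client, cluster_name, types=EKS_LOG_TYPES):
    """Request that the cluster start sending the given log types.

    `eks_client` is assumed to be a boto3 EKS client; the update is
    asynchronous, so the logs appear only after the update completes.
    """
    return eks_client.update_cluster_config(
        name=cluster_name,
        logging=build_logging_config(types),
    )
```

The queries in this article rely mostly on the audit, authenticator, scheduler, and api log types, so a narrower list might be enough for your use case.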

Use Amazon CloudWatch Logs Insights to access your Amazon EKS control plane logs. Then, query the EKS control plane log data.

For more information, see Analyzing log data with CloudWatch Logs Insights.

View your Amazon EKS control plane logs

Complete the following steps:

  1. Open the CloudWatch console.
  2. In the navigation pane, choose Logs, and then choose Logs Insights.
  3. In the Select log group(s) menu, select the Cluster log group that you want to query.
  4. Choose Run to view the results.

Note: To export the results as a .csv file or to copy the results to the clipboard, choose Export results. You can change the sample query to get data for a specific use case.
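The console steps above can also be approximated in code. The following sketch assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`) and assumes that your cluster uses the default control plane log group name `/aws/eks/<cluster-name>/cluster`. The function names are hypothetical helpers, not AWS APIs.

```python
import time


def eks_log_group(cluster_name):
    """Return the default control plane log group name for a cluster."""
    return f"/aws/eks/{cluster_name}/cluster"


def rows_to_dicts(results):
    """Flatten GetQueryResults rows into one dict per log event."""
    return [{cell["field"]: cell["value"] for cell in row} for row in results]


def run_insights_query(logs_client, cluster_name, query, start_epoch, end_epoch,
                       poll_seconds=1.0, max_polls=60):
    """Start a Logs Insights query and poll until it finishes.

    `logs_client` is assumed to be a boto3 CloudWatch Logs client.
    `start_epoch` and `end_epoch` are seconds since the Unix epoch.
    """
    started = logs_client.start_query(
        logGroupName=eks_log_group(cluster_name),
        startTime=int(start_epoch),
        endTime=int(end_epoch),
        queryString=query,
    )
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=started["queryId"])
        status = response["status"]
        if status == "Complete":
            return rows_to_dicts(response["results"])
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("query did not complete within the polling budget")
```

For example, `run_insights_query(boto3.client("logs"), "my-cluster", query, start, end)` would return the same rows that the console shows after you choose Run, where `my-cluster` is a placeholder cluster name.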

Sample queries for common EKS use cases

See the following example queries for common EKS use cases:

Note: You can save and re-run queries in Amazon CloudWatch Logs Insights.
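As one way to save a query outside the console, the following sketch stores a query definition through the CloudWatch Logs `PutQueryDefinition` API. It assumes a boto3 CloudWatch Logs client; the helper name `save_query` and the definition name shown in the comment are placeholders.

```python
# The aws-auth audit query from this article, kept as a raw string so the
# backslashes in the regular expression are preserved.
AWS_AUTH_CHANGES_QUERY = r"""fields @logStream, @timestamp, @message
| filter @logStream like /^kube-apiserver-audit/
| filter requestURI like /\/api\/v1\/namespaces\/kube-system\/configmaps/
| filter objectRef.name = "aws-auth"
| filter verb like /(create|delete|patch)/
| sort @timestamp desc
| limit 50"""


def save_query(logs_client, name, query, log_group):
    """Save a Logs Insights query and return its definition ID.

    `logs_client` is assumed to be a boto3 CloudWatch Logs client.
    """
    response = logs_client.put_query_definition(
        name=name,
        queryString=query,
        logGroupNames=[log_group],
    )
    return response["queryDefinitionId"]


# Hypothetical usage:
# save_query(boto3.client("logs"), "eks/aws-auth-changes",
#            AWS_AUTH_CHANGES_QUERY, "/aws/eks/my-cluster/cluster")
```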

Find mutating changes

Run the following query to find mutating changes made to the aws-auth ConfigMap:

fields @logStream, @timestamp, @message  
| filter @logStream like /^kube-apiserver-audit/  
| filter requestURI like /\/api\/v1\/namespaces\/kube-system\/configmaps/  
| filter objectRef.name = "aws-auth"  
| filter verb like /(create|delete|patch)/  
| sort @timestamp desc  
| limit 50  

Locate denied requests

Run the following query to find messages that contain a denied state:

fields @logStream, @timestamp, @message  
| filter @logStream like /authenticator/  
| filter @message like "denied"  
| sort @timestamp asc  
| limit 50

Find a scheduled pod’s node

Run the following query to find the node that a pod was scheduled on:

fields  @timestamp, @message  
| filter @logStream like /kube-scheduler/  
| filter @message like "example-pod-name"  
| filter @message like "ip-"  
| sort @timestamp asc  
| limit 3

Note: Replace example-pod-name with your pod’s name.
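If you fill in placeholders such as example-pod-name from a script, it can help to validate the value before you place it in the query string. The following sketch assumes that the pod name follows the Kubernetes DNS subdomain naming rules (lowercase alphanumeric characters, "-", or ".", up to 253 characters). The function name `scheduled_pod_node_query` is illustrative.

```python
import re

# Kubernetes DNS subdomain names: lowercase alphanumerics, '-' or '.',
# starting and ending with an alphanumeric, at most 253 characters.
_DNS_SUBDOMAIN = re.compile(r"[a-z0-9]([-a-z0-9.]{0,251}[a-z0-9])?")


def scheduled_pod_node_query(pod_name):
    """Return the scheduler query from this section for one pod name."""
    if not _DNS_SUBDOMAIN.fullmatch(pod_name):
        raise ValueError(f"not a valid Kubernetes object name: {pod_name!r}")
    return "\n".join([
        "fields @timestamp, @message",
        "| filter @logStream like /kube-scheduler/",
        f'| filter @message like "{pod_name}"',
        '| filter @message like "ip-"',
        "| sort @timestamp asc",
        "| limit 3",
    ])
```

Rejecting names that contain quotation marks or other unexpected characters keeps a malformed value from silently changing the meaning of the filter.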

Find HTTP 5xx server errors

Run the following query to find HTTP 5xx server errors for Kubernetes API server requests:

fields @logStream, @timestamp, responseStatus.code, @message  
| filter @logStream like /^kube-apiserver-audit/  
| filter responseStatus.code >= 500  
| limit 50

Troubleshoot a CronJob object activation

Run the following query to find API calls that the cronjob-controller made:

fields @logStream, @timestamp, @message  
| filter @logStream like /kube-apiserver-audit/  
| filter user.username like "system:serviceaccount:kube-system:cronjob-controller"  
| display @logStream, @timestamp, @message, objectRef.namespace, objectRef.name  
| sort @timestamp desc  
| limit 50

Find replicaset-controller API calls

Run the following query to find API calls that the replicaset-controller made:

fields @logStream, @timestamp, @message  
| filter @logStream like /kube-apiserver-audit/  
| filter user.username like "system:serviceaccount:kube-system:replicaset-controller"  
| display @logStream, @timestamp, requestURI, verb, user.username  
| sort @timestamp desc  
| limit 50

Find and count HTTP response codes

Run the following query to count the number of HTTP response codes for calls made to the Kubernetes API server:

fields @logStream, @timestamp, @message  
| filter @logStream like /^kube-apiserver-audit/  
| stats count(*) as count by responseStatus.code  
| sort count desc

Example output:

responseStatus.code,count  
200,35066  
201,525  
403,125  
404,116  
101,2

Note: The API server response code statistics show 35,066 successful requests for HTTP 200, 525 created resources for HTTP 201, 125 forbidden requests for HTTP 403, 116 not found errors for HTTP 404, and 2 switching protocol requests for HTTP 101.
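To turn exported results like the example output into a quick summary, you can group the counts by status class. The following sketch works on the example values shown above and assumes that you exported the results as .csv text; the function name `count_by_status_class` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# The example output from this section, as exported .csv text.
EXAMPLE_OUTPUT = """responseStatus.code,count
200,35066
201,525
403,125
404,116
101,2
"""


def count_by_status_class(csv_text):
    """Sum request counts per HTTP status class (1xx, 2xx, 4xx, ...)."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        status_class = f"{row['responseStatus.code'].strip()[0]}xx"
        totals[status_class] += int(row["count"])
    return dict(totals)


summary = count_by_status_class(EXAMPLE_OUTPUT)
print(summary)  # {'2xx': 35591, '4xx': 241, '1xx': 2}
```

For the example data, 241 of 35,834 requests (about 0.67%) returned a 4xx code.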

Find changes made to DaemonSets/Addons

Run the following query to find changes you made to DaemonSets/Addons in the kube-system namespace:

filter @logStream like /^kube-apiserver-audit/  
| fields @logStream, @timestamp, @message  
| filter verb like /(create|update|delete)/ and strcontains(requestURI,"/apis/apps/v1/namespaces/kube-system/daemonsets")  
| sort @timestamp desc  
| limit 50

Find patch, update, create, and delete calls

Run the following query to find all the patch, update, create, and delete calls related to a specific deployment and deployment pods:

fields @timestamp, verb, objectRef.name, objectRef.resource, requestObject.message  
| filter objectRef.name like /example-deployment-name/  
| filter objectRef.resource not like /serviceaccounts/  
| filter objectRef.resource not like /events/  
| filter verb like /create|delete|patch|update/  
| sort @timestamp asc

Note: Replace example-deployment-name with the name of your deployment. In the preceding query, the line | filter objectRef.resource not like /events/ excludes events from the results. To include events, remove that line.

Identify the user that deleted a node or resource

Run the following query to find the user that deleted a node:

fields @logStream, @timestamp, @message  
| filter @logStream like /^kube-apiserver-audit/  
| filter verb == "delete" and requestURI like "/api/v1/nodes"  
| sort @timestamp desc  
| limit 10

Run the following query to find the user that deleted a resource, such as a ConfigMap, pod, or deployment:

fields @timestamp,verb, user.username, user.extra.arn.0, user.extra.canonicalArn.0   
| filter  objectRef.name like /aws-auth/  
| filter verb like /delete/  
| sort @timestamp asc

Note: Replace aws-auth with the name of your resource. For example, use your pod's name to find the delete calls for that pod.

Find a deployment’s image version

Run the following query to find the image version of a deployment:

fields @timestamp, verb, objectRef.name,  objectRef.resource  
| filter objectRef.name like /example-deployment-name/  
| filter @message like /image/  
| filter objectRef.resource  like /deployments/  
| parse requestObject.spec.template.spec 'image":*,' as image  
| sort @timestamp asc  
| limit 10000

Note: Replace example-deployment-name with the name of your deployment.

Identify events for a specific node

Run the following query to locate a node that hasn’t been updated:

fields @timestamp, @message, @logStream  
| sort @timestamp asc  
| filter @message like "node example-node-name hasn't been updated for"

Note: Replace example-node-name with your node’s name.

Run the following query to check the last transition time for a specific node’s parameters:

fields @timestamp  
| parse responseObject.status.conditions.0 "lastTransitionTime*" as MemoryPressure  
| parse responseObject.status.conditions.1 "lastTransitionTime*" as DiskPressure  
| parse responseObject.status.conditions.2 "lastTransitionTime*" as PIDPressure  
| parse responseObject.status.conditions.3 "lastTransitionTime*" as ReadyStatus  
| parse responseObject.status.conditions.3 "lastTransitionTime*" as Timepass  
| filter objectRef.name like /example-node-name/  
| filter verb like /patch/  
| filter @message like /lastTransitionTime/  
| sort @timestamp asc

Note: Replace example-node-name with your node’s name.

Identify the user that cordoned a node

Run the following query to find the user that cordoned specific nodes or made the nodes unschedulable:

fields @timestamp, objectRef.name as node_name, verb, user.username, user.extra.sessionName.0 as name, requestObject.spec.unschedulable as unschedulable_flag  
| filter @logStream like /kube-apiserver-audit/  
| filter @message like /example-node-IP/  
| filter verb like /patch/  
| filter requestObject.spec.unschedulable like /1/

Note: Replace example-node-IP with your node’s IP address.

Identify a deleted pod’s PodIP

Run the following query to find a deleted pod's podIP:

fields @timestamp,objectRef.name as pod, requestObject.status.podIP as podIP  
| filter @logStream like /kube-apiserver-audit/  
| filter objectRef.name = "example-pod-name"  
| filter verb like /patch/  
| filter ispresent(requestObject.status.podIP)  
| sort @timestamp asc

Note: Replace example-pod-name with your pod’s name.

Find an unknown pod’s object output

Run the following query to view the describe output for a deleted pod without the pod name:

fields @timestamp, requestURI, requestObject.message  
| filter requestURI like '/api/v1/namespaces/example-namespace/events'   
| filter  responseObject.involvedObject.name like /example-deployment-name/  
| sort @timestamp asc

Note: Replace example-deployment-name with your deployment’s name and example-namespace with your namespace. If there aren’t multiple objects with the same name in multiple namespaces, then remove the line that contains filter requestURI like.

Check for an eviction API

Note: If AWS Fargate operating system (OS) patching deleted your pods or nodes, then the eviction API appears in the audit logs.

Run the following query to check whether an eviction API appears in your audit logs:

filter @logStream like /kube-apiserver-audit/  
| fields @timestamp, user.username,user.extra.canonicalArn.0, responseStatus.code, responseObject.status, responseStatus.message  
| sort @timestamp asc  
| filter verb == "create" and objectRef.subresource == 'eviction'

Run the following query to find the eviction API call’s event:

fields @logStream, @timestamp, @message   
| sort @timestamp asc   
| filter user.username == "eks:node-manager" and requestURI like "eviction" and requestURI like "pod"

Find a Fargate pod’s task ID

Run the following query to find the task ID of the Fargate pod:

fields @timestamp, verb, responseObject.spec.providerID as InstanceID  
| filter @message like /example-fargate-node-IP/  
| filter ispresent(responseObject.spec.providerID)

Note: In the preceding query, replace example-fargate-node-IP with your AWS Fargate node’s IP address.

Identify a URL that receives errors

Run the following query to find URLs that receive more than a certain number of 4xx or 5xx errors:

fields requestURI  
| filter @logStream like /kube-apiserver-audit/  
| filter responseStatus.code >= 400  
| stats count(*) as count by requestURI, responseStatus.code  
| filter count > example-filter-count  
| sort count desc

Note: Replace example-filter-count with the minimum number of errors that the query output must display.
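The aggregation that this query describes (count error responses per URL and status code, and then keep only the pairs above a threshold) can also be sketched offline against exported audit events. The following example assumes that each event is a parsed audit log record with `requestURI` and `responseStatus.code` fields; the function name `noisy_error_uris` and the sample events are made up for illustration.

```python
from collections import Counter


def noisy_error_uris(events, min_count):
    """Return (uri, code, count) for URIs with more than min_count errors.

    Only responses with a status code of 400 or higher are counted, and the
    threshold is applied after counting, most frequent first.
    """
    counts = Counter(
        (event["requestURI"], event["responseStatus"]["code"])
        for event in events
        if event.get("responseStatus", {}).get("code", 0) >= 400
    )
    return [
        (uri, code, count)
        for (uri, code), count in counts.most_common()
        if count > min_count
    ]


# Made-up sample events, for illustration only.
sample = (
    [{"requestURI": "/api/v1/pods", "responseStatus": {"code": 403}}] * 3
    + [{"requestURI": "/api/v1/nodes", "responseStatus": {"code": 404}}]
    + [{"requestURI": "/api/v1/pods", "responseStatus": {"code": 200}}] * 5
)
print(noisy_error_uris(sample, 1))  # [('/api/v1/pods', 403, 3)]
```

The order matters: the status code filter runs before the count, and the threshold filter runs after it, because the count doesn't exist until the aggregation is done.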

Troubleshoot webhooks errors

Run the following query to find errors with webhooks:

fields @timestamp, @message  
| filter @logStream like /kube-apiserver/ and @logStream not like /kube-apiserver-audit/  
| filter @message like /failed calling webhook/  
| sort @timestamp desc  
| stats count(*) by bin(1m)

List failed API Server health checks

Run the following query to list the API Server health checks that failed:

fields @message  
| sort @timestamp asc  
| filter @logStream like "kube-apiserver"  
| filter @logStream not like "kube-apiserver-audit"  
| filter @message like "healthz check failed"

Count requests by Kubernetes object and userAgent

Run the following query in CloudWatch Logs Insights to count the requests by Kubernetes object and userAgent:

fields @timestamp, @message, @logStream  
| filter @logStream like "kube-apiserver-audit"   
| display @logStream, requestURI, verb   
| stats count(*) as count by objectRef.resource, userAgent  
| sort count desc  
| display objectRef.resource, userAgent, count

View frequent logs

Run the following query to view your most frequent logs:

fields @timestamp, @message, @logStream  
| filter @logStream not like /kube-apiserver-audit/  
| parse @message "*] *" as loggingTimeStamp, loggingMessage  
| stats count(*) as count by loggingMessage   
| sort count desc

AWS OFFICIAL | Updated 4 months ago