How do I troubleshoot Athena SELECT query issues for CloudTrail logs?
I ran a SELECT query on my Amazon Athena table for AWS CloudTrail logs. However, the query either runs slowly, didn't return results, or failed with a timeout error.
Resolution
Follow the troubleshooting steps for your use case.
Query returns no results
Incorrect Amazon S3 location path
Check the Amazon Simple Storage Service (Amazon S3) table location for your CloudTrail logs. If the input LOCATION path in your table is incorrect, then Athena doesn't return any records.
Partitions not loaded to table
If you used the manual partitioning CREATE TABLE statement to create your table, then verify that you loaded partitions to the Athena table. If you don't load partitions to the table, then Athena queries don't return results.
Use a SHOW PARTITIONS statement to view partitions that you already loaded to the table. Use the ALTER TABLE ADD PARTITION command to load the required partitions so that your query returns results.
Partitions incorrectly defined
If you used the manual partitioning CREATE TABLE statement to create your table, then check the timestamp property range. Make sure that the projection.timestamp.range property matches the partitions in your Amazon S3 bucket location.
Make sure that the SELECT query doesn't filter data on values beyond the range that you set in the projection.timestamp.range property.
Storage classes
Athena doesn't support data queries for S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes, and objects are ignored.
Check if the CloudTrail logs are in unsupported S3 storage classes. Restore the Amazon S3 archived objects that you want to query. Then, query the restored Amazon S3 objects.
Query takes a long time to run or fails with a timeout error
Nonpartitioned table
If you used the CloudTrail console to create an Athena table for CloudTrail logs, then a nonpartitioned table is created.
SELECT queries scan all log files in your Amazon S3 bucket with nonpartitioned tables. If the DML query timeout limit is exceeded, then queries might run slowly or time out.
To optimize query performance, it's a best practice to create a partitioned table. To reduce query time, specify a partition column in the WHERE clause so that Athena scans only data from the matched partitions.
Partitioned table query doesn't use the partition column to filter data
If the SELECT query doesn't use a partition column in the WHERE clause, then Athena scans all log files from the Amazon S3 bucket.
The following example query has a partition projection CloudTrail table that runs slowly because it scans all logs:
SELECT useridentity, sourceipaddress, eventtime FROM cloudtrail_table WHERE eventname = 'RunInstances' AND eventtime >= '2023-10-30T00:00:00Z' AND eventtime < '2023-10-30T00:04:00Z';
To reduce query time, specify a partition column in the WHERE clause so that Athena scans only data from the matched partitions.
The following example query has a WHERE clause on the partition column timestamp that scans only logs from the 2023/10/30 partition:
SELECT useridentity, sourceipaddress, eventtime FROM cloudtrail_table WHERE eventname = 'RunInstances' AND eventtime >= '2023-10-30T00:00:00Z' AND eventtime < '2023-10-30T00:04:00Z' AND timestamp >= '2023/10/30' AND timestamp < '2023/10/31'
For more information, see How do I troubleshoot timeout issues when I query CloudTrail data using Athena?
Related information
How do I automatically create tables in Amazon Athena to search through AWS CloudTrail logs?
Contenuto pertinente
- AWS UFFICIALEAggiornata 3 anni fa
- AWS UFFICIALEAggiornata 2 anni fa
- AWS UFFICIALEAggiornata un anno fa