I want to schedule queries in Amazon Athena.
Short description
Scheduled queries have many use cases, such as periodic reporting or loading new partitions on a regular interval. Choose one of the following ways to schedule queries in Athena, based on your use case:
- Create an AWS Lambda function to schedule the query, and then create an Amazon EventBridge rule to schedule the Lambda function.
- Use AWS Step Functions to create an ETL pipeline that uses Athena.
- On a Linux machine, use crontab to schedule the query.
- Use an AWS Glue Python shell job.
Resolution
Use a Lambda function and an EventBridge rule to schedule and Athena query
You can use a Lambda function and EventBridge rule to schedule an Athena query. To do so, complete the following steps:
-
Create an AWS Identity and Access Management (IAM) service role for Lambda. Then, attach a policy that allows access to Athena, Amazon Simple Storage Service (Amazon S3), and Amazon CloudWatch Logs. For example, you can add AmazonAthenaFullAccess and CloudWatchLogsFullAccess permissions to the role. AmazonAthenaFullAccess allows full access to Athena and includes basic permissions for Amazon S3. CloudWatchLogsFullAccess allows full access to CloudWatch Logs.
-
Open the Lambda console.
-
Choose Create function.
-
Be sure that Author from scratch is selected, and then configure the following options:
For Name, enter a name for your function.
For Runtime, choose one of the Python options.
For Role, choose Use an existing role, and then choose the IAM role that you created in step 1.
-
Choose Create function.
-
Paste your code in the Function code section. The following example uses Python 3.7. Replace the following values in the example:
default: the Athena database name
SELECT * FROM default.tb: the query that you want to schedule
s3://AWSDOC-EXAMPLE-BUCKET/: the S3 bucket for the query output
import boto3
# Query string to execute
query = 'SELECT * FROM database.tb'
# Database to execute the query against
DATABASE = 'database'
# Output location for query results
output='s3://OUTPUTBUCKET/'
def lambda_handler(event, context):
# Initiate the Boto3 Client
client = boto3.client('athena')
# Start the query execution
response = client.start_query_execution(
QueryString=query,
QueryExecutionContext={
'Database': DATABASE
},
ResultConfiguration={
'OutputLocation': output
}
)
# Return response after starting the query execution
return response
-
Choose Deploy.
-
Open the Amazon EventBridge console.
-
In the navigation pane, choose Rules, and then choose Create rule.
-
Enter a name and description for the rule.
-
For Define pattern, select Schedule.
-
Select Cron expression, and then enter a cron expression.
-
For Select event bus, select AWS default event bus.
-
In the Select Targets section, do the following:
For Target, select Lambda function from the dropdown list.
For Function, select the name of your Lambda function from the dropdown list.
-
Choose Create.
Note: There are quotas for the allowed number of calls to the Athena API per account. Make sure you stay within the quotas if you want to schedule multiple queries. For more information, see Per account API call quotas.
Create an ETL pipeline with Step Functions
If you use Step Functions to create an ETL pipeline that includes Athena, you can use the Step Functions to schedule an Athena query.
Use crontab from a Linux machine
If you use a Linux machine, then you can use crontab to schedule an Athena query.
Use an AWS Glue Python shell job
You can use an AWS Glue Python shell job to run the Athena query with the Athena Boto3 API. Then, define a schedule for the AWS Glue job.
Related information
Tutorial: Create an EventBridge scheduled rule for AWS Lambda functions
Creating a rule that runs on a schedule in Amazon EventBridge