- Newest
- Most votes
- Most comments
Hello,
Thank you for reaching out about this. I have tried replicating the same, and can confirm your findings; when Athena queries are executed, the Glue 'BatchGetTable' API event/call for the same is listed in CloudTrail. Part of the 'requestParameters' field of this event is the 'entries' array, containing the Glue databases and tables as you have mentioned as well.
I also could not find documentation on the same at the moment; however, in order to monitor who is querying which Glue tables you can ensure that you have CloudTrail logs set up and that these are being delivered to S3 - if you create a Glue table for this data in Athena, you can query the CloudTrail logs in Athena to list the IAM users/roles that are involved with executing each query and the Glue database/tables associated with this event from the array. In you query, you could use filters such as the following when querying your CloudTrail table, to ensure that the API call is made by Athena:
WHERE eventname = 'BatchGetTable' AND useragent = 'athena.amazonaws.com'
Another option could be to monitor the StartQueryExecution API calls that are used to run queries in Athena, and the IAM principals associated with these calls. You could get these on an individual query-by-query basis as needed, or use the Athena "ListQueryExecutions" API call or CLI command for the same. For the query execution ID returned for each StartQueryExecution API call, you could then use the Athena 'GetQueryExecution' API or CLI command to get the query string and manually parse the query string to find the table scanned - another option is to use the Athena 'GetQueryRuntimeStatistics' API or CLI command, which would return details about the query execution and it's stages. Part of this output result would be the "QueryStagePlan" field, which contains a "Children" array field that mentions table identifiers used in the query in the format "awsdatacatalog:database_name:table_name". These approaches using API calls are much more involved, but offer some flexibility - it could be a useful approach if you would like to look into or monitor who is querying which tables on an ad-hoc basis or for a few specific query executions when needed, as you would not have to have CloudTrail data already captured in S3, and a table for the same in Glue to query with Athena.
More information on some of the Athena APIs mentioned can be found in the list here (https://docs.aws.amazon.com/athena/latest/APIReference/API_Operations.html) or on the CLI or relevant SDK documentation pages.
I hope that you find this helpful. Please let me know if you have additional questions, I'd be happy to help further wherever I can.
Regards, Zubair
Hello Zubair
Thanks for the detailed answer and the alternative options. We're looking to publish daily statistics on who is using what tables so we cannot rely on an ad-hoc approach. I was hoping there was an off-the-shelf solution but it seems we will have to build something ourselves. I think the simplest solution would be to use the Glue 'BatchGetTable' API event, provided I can get confirmation that this API event will not get deprecated!
Regards Soumaya
Relevant content
- asked 2 years ago
- Accepted Answerasked 8 months ago
- asked 9 months ago
- AWS OFFICIALUpdated 8 months ago
- AWS OFFICIALUpdated 7 months ago
- AWS OFFICIALUpdated 7 months ago
- AWS OFFICIALUpdated 2 months ago