I want to use Amazon Athena to query my Application Load Balancer connection logs.
Short description
Elastic Load Balancing provides connection logs as an optional logging feature. The logs provide information on the Transport Layer Security (TLS and mTLS) connections to your Application Load Balancer.
Connection logs are deactivated by default. When activated, Amazon Athena lets you query the Application Load Balancer connection logs and stores them in the Amazon Simple Storage (Amazon S3) bucket.
Resolution
Create a database and table for Athena to query the Application Load Balancer logs.
-
Open the Amazon Athena console.
-
To create a database, run the following command in the Query Editor.
Note: It's a best practice to create the database in the same AWS Region as the Amazon S3 bucket:
CREATE DATABASE alb_db
-
After you create the database, create a table with the following query:
CREATE EXTERNAL TABLE IF NOT EXISTS alb_conn_logs (
time string,
client_ip string,
client_port int,
listener_port int,
tls_protocol string,
tls_cipher string,
tls_handshake_latency double,
leaf_client_cert_subject string,
leaf_client_cert_validity string,
leaf_client_cert_serial_number string,
tls_verify_status string
)
PARTITIONED BY
(
day STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' =
'([^ ]*) ([^ ]*) ([0-9]*) ([0-9]*) ([A-Za-z0-9.-]*) ([^ ]*) ([-.0-9]*) \"([^\"]*)\" ([^ ]*) ([^ ]*) ([^ ]*)')
LOCATION 's3://<S3-LOCATION>/AWSLogs/<ACCOUNT-NUMBER>/elasticloadbalancing/<REGION>/'
TBLPROPERTIES
(
"projection.enabled" = "true",
"projection.day.type" = "date",
"projection.day.range" = "2023/11/27,NOW",
"projection.day.format" = "yyyy/MM/dd",
"projection.day.interval" = "1",
"projection.day.interval.unit" = "DAYS",
"storage.location.template" = "s3://<S3-LOCATION>/AWSLogs/<ACCOUNT-NUMBER>/elasticloadbalancing/<REGION>/${day}"
)
Note: Replace <S3-LOCATION>, <ACCOUNT-NUMBER>, and <REGION> with the values of your own Amazon S3 bucket.
-
To query the Application Load Balancer connection logs, use SQL statements.
Example queries
View the latest 100 occurrences of logs
SELECT *
FROM alb_conn_logs
ORDER by time desc
LIMIT 100
Count occurrences where tls_verify_status was NOT 'Success' and group by client_IP
SELECT distinct client_ip, count() as count from alb_conn_logs
WHERE tls_verify_status != 'Success'
GROUP by client_ip
ORDER by count() DESC;
View all occurrences where tls_handshake_latency is over two seconds in a specific time range
SELECT * FROM alb_conn_logs
WHERE (parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z')
BETWEEN parse_datetime('2024-01-01-00:00:00','yyyy-MM-dd-HH:mm:ss')
AND parse_datetime('2024-03-20-00:00:00','yyyy-MM-dd-HH:mm:ss'))
AND (tls_handshake_latency >= 2.0);
Check for connections with an exact certification validity date and group by leaf_client_cert_serial_number
SELECT DISTINCT leaf_client_cert_serial_number, count() AS count
FROM alb_conn_logs
WHERE tls_verify_status = 'Success' AND leaf_client_cert_validity like '%NotAfter=2025-01-01%'
GROUP BY leaf_client_cert_serial_number
ORDER BY count() DESC;
Related information
How do I use Amazon Athena to analyze my Application Load Balancer access logs?