How do I analyze AWS WAF logs in Athena?

7 minuto de leitura
0

I want to query AWS WAF logs in Amazon Athena.

Resolution

To query AWS WAF logs in Athena, create a database and table schema in Amazon Simple Storage Service (Amazon S3). Then, use the sample queries to get the required information from your logs.

Create a database and table in Amazon S3

  1. Turn on web access control list (web ACL) logging for your Amazon S3 bucket. Copy the values for Target bucket and Target prefix into a text file to use in the table schema. These values specify the Amazon S3 location in your Athena query.

  2. Open the Athena console.
    Note: Before you run your first query, create an S3 bucket for your query result location.

  3. In the Query editor, run CREATE DATABASE to create a database:

    CREATE DATABASE waf_logs_db

    Note: It's a best practice to create the database in the same AWS Region as your Amazon S3 bucket.

  4. Create a table schema for the AWS WAF logs in Athena. The following example is a table template query with partition projection:

    CREATE EXTERNAL TABLE `waf_logs`(  `timestamp` bigint,
      `formatversion` int,
      `webaclid` string,
      `terminatingruleid` string,
      `terminatingruletype` string,
      `action` string,
      `terminatingrulematchdetails` array <
                                        struct <
                                            conditiontype: string,
                                            sensitivitylevel: string,
                                            location: string,
                                            matcheddata: array < string >
                                              >
                                         >,
      `httpsourcename` string,
      `httpsourceid` string,
      `rulegrouplist` array <
                          struct <
                              rulegroupid: string,
                              terminatingrule: struct <
                                                  ruleid: string,
                                                  action: string,
                                                  rulematchdetails: array <
                                                                       struct <
                                                                           conditiontype: string,
                                                                           sensitivitylevel: string,
                                                                           location: string,
                                                                           matcheddata: array < string >
                                                                              >
                                                                        >
                                                    >,
                              nonterminatingmatchingrules: array <
                                                                  struct <
                                                                      ruleid: string,
                                                                      action: string,
                                                                      overriddenaction: string,
                                                                      rulematchdetails: array <
                                                                                           struct <
                                                                                               conditiontype: string,
                                                                                               sensitivitylevel: string,
                                                                                               location: string,
                                                                                               matcheddata: array < string >
                                                                                                  >
                                                                                           >
                                                                        >
                                                                 >,
                              excludedrules: string
                                >
                           >,
    `ratebasedrulelist` array <
                             struct <
                                 ratebasedruleid: string,
                                 limitkey: string,
                                 maxrateallowed: int
                                   >
                              >,
      `nonterminatingmatchingrules` array <
                                        struct <
                                            ruleid: string,
                                            action: string,
                                            rulematchdetails: array <
                                                                 struct <
                                                                     conditiontype: string,
                                                                     sensitivitylevel: string,
                                                                     location: string,
                                                                     matcheddata: array < string >
                                                                        >
                                                                 >,
                                            captcharesponse: struct <
                                                                responsecode: string,
                                                                solvetimestamp: string
                                                                 >
                                              >
                                         >,
      `requestheadersinserted` array <
                                    struct <
                                        name: string,
                                        value: string
                                          >
                                     >,
      `responsecodesent` string,
      `httprequest` struct <
                        clientip: string,
                        country: string,
                        headers: array <
                                    struct <
                                        name: string,
                                        value: string
                                          >
                                     >,
                        uri: string,
                        args: string,
                        httpversion: string,
                        httpmethod: string,
                        requestid: string
                          >,
      `labels` array <
                   struct <
                       name: string
                         >
                    >,
      `captcharesponse` struct <
                            responsecode: string,
                            solvetimestamp: string,
                            failureReason: string
                              >,
      `challengeresponse` struct <
                            responsecode: string,
                            solvetimestamp: string,
                            failureReason: string
                            >,
      `ja3Fingerprint` string
    )
    PARTITIONED BY ( 
    `region` string, 
    `date` string) 
    ROW FORMAT SERDE 
      'org.openx.data.jsonserde.JsonSerDe' 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.mapred.TextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION
      's3://DOC-EXAMPLE-BUCKET/AWSLogs/accountID/WAFLogs/region/DOC-EXAMPLE-WEBACL/'
    TBLPROPERTIES(
     'projection.enabled' = 'true',
     'projection.region.type' = 'enum',
     'projection.region.values' = 'us-east-1,us-west-2,eu-central-1,eu-west-1',
     'projection.date.type' = 'date',
     'projection.date.range' = '2021/01/01,NOW',
     'projection.date.format' = 'yyyy/MM/dd',
     'projection.date.interval' = '1',
     'projection.date.interval.unit' = 'DAYS',
     'storage.location.template' = 's3://DOC-EXAMPLE-BUCKET/AWSLogs/accountID/WAFLogs/${region}/DOC-EXAMPLE-WEBACL/${date}/')

    Note: Replace storage.location.template, projection.region.values, projection.date.range, DOC-EXAMPLE-BUCKET, and DOC-EXAMPLE-WEBACL with your values.

    The following example is a table template query without partition projection:

    CREATE EXTERNAL TABLE `waf_logs`(  `timestamp` bigint,
      `formatversion` int,
      `webaclid` string,
      `terminatingruleid` string,
      `terminatingruletype` string,
      `action` string,
      `terminatingrulematchdetails` array <
                                        struct <
                                            conditiontype: string,
                                            sensitivitylevel: string,
                                            location: string,
                                            matcheddata: array < string >
                                              >
                                         >,
      `httpsourcename` string,
      `httpsourceid` string,
      `rulegrouplist` array <
                          struct <
                              rulegroupid: string,
                              terminatingrule: struct <
                                                  ruleid: string,
                                                  action: string,
                                                  rulematchdetails: array <
                                                                       struct <
                                                                           conditiontype: string,
                                                                           sensitivitylevel: string,
                                                                           location: string,
                                                                           matcheddata: array < string >
                                                                              >
                                                                        >
                                                    >,
                              nonterminatingmatchingrules: array <
                                                                  struct <
                                                                      ruleid: string,
                                                                      action: string,
                                                                      overriddenaction: string,
                                                                      rulematchdetails: array <
                                                                                           struct <
                                                                                               conditiontype: string,
                                                                                               sensitivitylevel: string,
                                                                                               location: string,
                                                                                               matcheddata: array < string >
                                                                                                  >
                                                                                           >
                                                                        >
                                                                 >,
                              excludedrules: string
                                >
                           >,
    `ratebasedrulelist` array <
                             struct <
                                 ratebasedruleid: string,
                                 limitkey: string,
                                 maxrateallowed: int
                                   >
                              >,
      `nonterminatingmatchingrules` array <
                                        struct <
                                            ruleid: string,
                                            action: string,
                                            rulematchdetails: array <
                                                                 struct <
                                                                     conditiontype: string,
                                                                     sensitivitylevel: string,
                                                                     location: string,
                                                                     matcheddata: array < string >
                                                                        >
                                                                 >,
                                            captcharesponse: struct <
                                                                responsecode: string,
                                                                solvetimestamp: string
                                                                 >
                                              >
                                         >,
      `requestheadersinserted` array <
                                    struct <
                                        name: string,
                                        value: string
                                          >
                                     >,
      `responsecodesent` string,
      `httprequest` struct <
                        clientip: string,
                        country: string,
                        headers: array <
                                    struct <
                                        name: string,
                                        value: string
                                          >
                                     >,
                        uri: string,
                        args: string,
                        httpversion: string,
                        httpmethod: string,
                        requestid: string
                          >,
      `labels` array <
                   struct <
                       name: string
                         >
                    >,
      `captcharesponse` struct <
                            responsecode: string,
                            solvetimestamp: string,
                            failureReason: string
                              >,
      `challengeresponse` struct <
                              responsecode: string,
                              solvetimestamp: string,
                              failureReason: string
                              >,
      `ja3Fingerprint` string
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION 's3://DOC-EXAMPLE-BUCKET/prefix/'

    Note: Replace DOC-EXAMPLE-BUCKET with your S3 bucket name.

  5. In the navigation pane, under Tables, choose Preview table. Confirm that the AWS WAF data, such as formatversion, webaclid, httpsourcename, and ja3Fingerprint, are in the table.

Analyze your AWS WAF logs in Athena

To analyze your AWS WAF log files, use the following example queries. You can also create your own queries.

Count the matched IP addresses that align with excluded rules in the last 10 days

Run the following command:

WITH test_dataset AS   (SELECT * FROM waf_logs 
    CROSS JOIN UNNEST(rulegrouplist) AS t(allrulegroups))
SELECT 
  COUNT(*) AS count, 
  "httprequest"."clientip", 
  "allrulegroups"."excludedrules",
  "allrulegroups"."ruleGroupId"
FROM test_dataset 
WHERE allrulegroups.excludedrules IS NOT NULL AND from_unixtime(timestamp/1000) > now() - interval '10' day
GROUP BY "httprequest"."clientip", "allrulegroups"."ruleGroupId", "allrulegroups"."excludedrules"
ORDER BY count DESC

Note: Replace 10 with your time frame.

Return records for a specified date range and IP address

Run the following command:

SELECT * FROM waf_logs 
WHERE httprequest.clientip='192.168.0.0' AND "date" >= '2022/03/01' AND "date" < '2022/03/31'

Replace 192.168.0.0 with your IP address, and 2022/03/01 and 2022/03/31 with your dates.

Count the number of times a request was blocked, grouped by specific attributes

Run the following command:

SELECT   COUNT(*) AS count,
  webaclid,
  terminatingruleid,
  httprequest.clientip,
  httprequest.uri
FROM waf_logs
WHERE action='BLOCK'
GROUP BY webaclid, terminatingruleid, httprequest.clientip, httprequest.uri
ORDER BY count DESC
LIMIT 100;

Note: Replace webaclid, terminatingruleid, httprequest.clientip, and httprequest.uri with your values and 100 with the maximum number of results that you want.

Group all counted custom rules by number of times matched

Run the following command:

SELECT  count(*) AS count,
         httpsourceid,
         httprequest.clientip,
         t.ruleid,
         t.action
FROM "waf_logs" 
CROSS JOIN UNNEST(nonterminatingmatchingrules) AS t(t) 
WHERE action <> 'BLOCK' AND cardinality(nonTerminatingMatchingRules) > 0 
GROUP BY t.ruleid, t.action, httpsourceid, httprequest.clientip 
ORDER BY "count" DESC
Limit 50

Run a query with a filter IP address

Run the following command:

SELECT * FROM "waf_logs_db"."waf_logs" where httprequest.clientip='192.168.0.0' limit 10;

Note: Replace 192.168.0.0 with your IP address, and 10 with the maximum number of results that you want.

Select a datestamp where the request doesn't have an origin header, browser user agent string, or cookies

Run the following command:

SELECT
datestamp,
element_at(filter(httprequest.headers, headers -> lower(headers.name) = 'origin'), 1).value IS NULL AS MissingOrigin,
httprequest.clientip,
element_at(filter(httprequest.headers, headers -> lower(headers.name) = 'user-agent'), 1).value AS UserAgent,
element_at(filter(httprequest.headers, headers -> lower(headers.name) = 'cookie'), 1).value AS Cookie
from "waf_logs_db"."waf_logs"
where webaclname = 'production-web'
AND datestamp >= '2021/01/01'
AND httprequest.uri = '/uri/path'
AND httprequest.httpmethod = 'POST'
order by 1 desc

Note: Replace production-web with your web ACL and 2021/01/01 with your date.

Count and sort records by a specific column

The following example query counts and sorts records based on the User-Agent and URI path columns. It also excludes specific HTTP methods and requests with an origin request header.

Run the following command:

SELECT
count() AS Count,
element_at(filter(httprequest.headers, headers -> lower(headers.name) = 'user-agent'), 1).value AS useragent,
httprequest.uri
from "db"."waf_logs"
where webaclname = 'production-web'
AND httprequest.httpmethod != 'GET'
AND httprequest.httpmethod != 'HEAD'
AND element_at(filter(httprequest.headers, headers -> lower(headers.name) = 'origin'), 1).value IS NULL
AND datestamp >= '2021/01/01'
group by 2,3
ORDER BY 1 desc

Note: Replace production-web with your web ACL and 2021/01/01 with your date.

For more information, see Querying AWS WAF logs.

Related information

Step 1: Create a database

Troubleshooting in Athena

How do I use Amazon Athena to analyze my Amazon S3 server access logs?

AWS OFICIAL
AWS OFICIALAtualizada há 6 meses