Automate malware scanning of incoming files to your Amazon S3 bucket before processing

4 minute read
Content level: Expert
A common challenge for many AWS users is ensuring security when handling third-party file uploads. Recently, one of our partners approached me with exactly this concern: how could they automate the scanning of incoming files to an S3 bucket before their application picks them up for processing?

The issue was that several third-party vendors were uploading files directly to an S3 bucket, making it crucial to check for any malicious content before allowing the application to handle these files. Letting potentially harmful files slip through could expose the entire system to serious risks.

In June 2024, AWS announced Amazon GuardDuty Malware Protection for Amazon S3, an expansion of GuardDuty Malware Protection that detects malicious file uploads to selected S3 buckets. It fits this use case well. For a complete solution, the following requirements need to be met:

1- Incoming files should be scanned for malware

2- Infected files should be quarantined and a notification sent for further action

3- Clean files should be ready for the application to process

4- Minimal operational overhead from writing and managing custom code (for example, Lambda runtimes)

5- Easy operational auditability to confirm that the system is working

When you enable GuardDuty Malware Protection for S3, you can optionally enable tagging of objects based on the scan result. The potential scan result tag values include NO_THREATS_FOUND, THREATS_FOUND, UNSUPPORTED, ACCESS_DENIED, and FAILED. In this solution we use these scan result values to move the incoming file from example-input-bucket to example-quarantine-bucket, example-clean-bucket, or example-unprocessed-bucket accordingly.

We will use AWS Step Functions SDK integrations to call the S3 CopyObject and DeleteObject APIs, which copy each file to its destination bucket and then delete it from the input bucket. Using Step Functions SDK integrations eliminates the need to write and maintain custom Lambda functions, which meets requirement #4 above.
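The CopyObject call takes a CopySource value of the form bucket/key, which the S3 API reference describes as URL-encoded. As a minimal sketch of that format (the helper name build_copy_source and the optional version handling are hypothetical, not part of the solution in this article), the following Python shows what such a value looks like for a simple key and for a key containing spaces:

```python
from typing import Optional
from urllib.parse import quote


def build_copy_source(bucket: str, key: str, version_id: Optional[str] = None) -> str:
    """Return a CopySource string of the form 'bucket/key', URL-encoded."""
    # quote() leaves '/' unescaped by default, so key prefixes stay readable.
    source = quote(f"{bucket}/{key}")
    if version_id:
        # Hypothetical extension: pin the copy to one specific object version.
        source += f"?versionId={quote(version_id)}"
    return source


print(build_copy_source("example-input-bucket", "eicar_com.zip"))
print(build_copy_source("example-input-bucket", "vendor a/report 1.csv"))
```

If your vendors upload keys with spaces or other special characters, it is worth testing how the copy step in your workflow handles them before relying on it.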

When a file is copied to any of the destination buckets, an S3 ObjectCreated event notification can be used to invoke downstream workflows, such as the application picking up files for processing or a notification sent through Amazon SNS, which meets requirements #2 and #3 above.
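For a downstream consumer, the S3 event notification payload carries the bucket name and the object key of each created object. The sketch below is one possible way to read it, assuming the standard notification layout (Records, s3.bucket.name, s3.object.key); the function name extract_created_objects and the sample payload are illustrative only. Object keys arrive URL-encoded in these notifications, so the sketch decodes them first.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_created_objects(notification: dict) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for ObjectCreated records in an S3 notification."""
    objects = []
    for record in notification.get("Records", []):
        # Skip anything that is not an object-creation event (for example, deletes).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the notification, with '+' standing for a space.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Illustrative payload, trimmed to the fields this sketch reads.
sample_notification = {
    "Records": [
        {
            "eventName": "ObjectCreated:Copy",
            "s3": {
                "bucket": {"name": "example-clean-bucket"},
                "object": {"key": "vendor+a/report+1.csv"},
            },
        }
    ]
}

print(extract_created_objects(sample_notification))
```

The same parsing applies whether the notification reaches the application directly or through an intermediary such as a queue, although the payload may then be wrapped in an outer envelope.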

Furthermore, files are removed from the input bucket after the GuardDuty malware scan. Each file moves to one of the three destination buckets based on its scan result tag value, which keeps operations simple. Use of CloudWatch metrics to

The Step Functions workflow itself is invoked by an Amazon EventBridge rule with aws.guardduty as the source and the Step Functions state machine as the target.
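As a rough sketch of what such a rule's event pattern might look like, the Python below builds a pattern on the source and on the detail-type value that appears in the sample event in this article. Restricting the rule to example-input-bucket is an assumption added for illustration, and the matches helper is a simplified local stand-in for exact-value matching only, not a reimplementation of EventBridge's matching rules.

```python
import json

# Candidate event pattern; the bucketName filter is an illustrative assumption.
EVENT_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Malware Protection Object Scan Result"],
    "detail": {"s3ObjectDetails": {"bucketName": ["example-input-bucket"]}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified check: every pattern field must exist and equal one listed value."""
    for field, expected in pattern.items():
        if field not in event:
            return False
        actual = event[field]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Trimmed-down event with only the fields the pattern inspects.
test_event = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Malware Protection Object Scan Result",
    "detail": {"s3ObjectDetails": {"bucketName": "example-input-bucket"}},
}

print(json.dumps(EVENT_PATTERN, indent=2))
print(matches(EVENT_PATTERN, test_event))
```

The printed JSON can be used as a starting point for the rule's event pattern; dropping the detail block would match scan results for every protected bucket in the account and Region.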

Here is a sample GuardDuty event:

{
  "version": "0",
  "id": "<SNIP>",
  "detail-type": "GuardDuty Malware Protection Object Scan Result",
  "source": "aws.guardduty",
  "account": "<SNIP>",
  "time": "2024-10-01T08:32:07Z",
  "region": "ap-southeast-2",
  "resources": [
    "arn:aws:guardduty:ap-southeast-2:<SNIP>:malware-protection-plan/<SNIP>"
  ],
  "detail": {
    "schemaVersion": "1.0",
    "scanStatus": "COMPLETED",
    "resourceType": "S3_OBJECT",
    "s3ObjectDetails": {
      "bucketName": "<SNIP>",
      "objectKey": "eicar_com.zip",
      "eTag": "<SNIP>",
      "versionId": null,
      "s3Throttled": false
    },
    "scanResultDetails": {
      "scanResultStatus": "THREATS_FOUND",
      "threats": [
        {
          "name": "EICAR-Test-File (not a virus)"
        }
      ]
    }
  }
}

The Step Functions state machine definition looks like this:

{
  "Comment": "Copy and delete S3 object based on GuardDuty Malware Protection Scan Result",
  "StartAt": "Check Scan Result Status",
  "States": {
    "Check Scan Result Status": {
      "Type": "Choice",
      "Choices": [
        {
          "And": [
            {
              "Variable": "$.detail.scanResultDetails.scanResultStatus",
              "StringEquals": "THREATS_FOUND"
            },
            {
              "Variable": "$.detail.scanStatus",
              "StringEquals": "COMPLETED"
            }
          ],
          "Next": "Copy Object to Quarantine"
        },
        {
          "And": [
            {
              "Variable": "$.detail.scanResultDetails.scanResultStatus",
              "StringEquals": "NO_THREATS_FOUND"
            },
            {
              "Variable": "$.detail.scanStatus",
              "StringEquals": "COMPLETED"
            }
          ],
          "Next": "Copy Object to Clean"
        }
      ],
      "Default": "Copy Object to Unprocessed"
    },
    "Copy Object to Quarantine": {
      "Type": "Task",
      "InputPath": "$",
      "ResultPath": null,
      "Resource": "arn:aws:states:::aws-sdk:s3:copyObject",
      "Parameters": {
        "Bucket": "example-quarantine-bucket",
        "CopySource.$": "States.Format('{}/{}', $.detail.s3ObjectDetails.bucketName, $.detail.s3ObjectDetails.objectKey)",
        "Key.$": "$.detail.s3ObjectDetails.objectKey"
      },
      "Next": "Delete Object"
    },
    "Copy Object to Clean": {
      "Type": "Task",
      "InputPath": "$",
      "ResultPath": null,
      "Resource": "arn:aws:states:::aws-sdk:s3:copyObject",
      "Parameters": {
        "Bucket": "example-clean-bucket",
        "CopySource.$": "States.Format('{}/{}', $.detail.s3ObjectDetails.bucketName, $.detail.s3ObjectDetails.objectKey)",
        "Key.$": "$.detail.s3ObjectDetails.objectKey"
      },
      "Next": "Delete Object"
    },
    "Copy Object to Unprocessed": {
      "Type": "Task",
      "InputPath": "$",
      "ResultPath": null,
      "Resource": "arn:aws:states:::aws-sdk:s3:copyObject",
      "Parameters": {
        "Bucket": "example-unprocessed-bucket",
        "CopySource.$": "States.Format('{}/{}', $.detail.s3ObjectDetails.bucketName, $.detail.s3ObjectDetails.objectKey)",
        "Key.$": "$.detail.s3ObjectDetails.objectKey"
      },
      "Next": "Delete Object"
    },
    "Delete Object": {
      "Type": "Task",
      "Parameters": {
        "Bucket.$": "$.detail.s3ObjectDetails.bucketName",
        "Key.$": "$.detail.s3ObjectDetails.objectKey"
      },
      "Resource": "arn:aws:states:::aws-sdk:s3:deleteObject",
      "Next": "Finish"
    },
    "Finish": {
      "Type": "Pass",
      "End": true
    }
  }
}
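To sanity-check the routing before deploying, the Choice state above can be mirrored in a few lines of Python and run against sample events. This is only a sketch of the same decision logic: the function name destination_bucket is hypothetical, and treating missing fields as "unprocessed" is an assumption of the sketch rather than a statement about how the Choice state handles a missing path.

```python
# Scan results that have a dedicated destination; everything else is "unprocessed".
DESTINATIONS = {
    "THREATS_FOUND": "example-quarantine-bucket",
    "NO_THREATS_FOUND": "example-clean-bucket",
}
DEFAULT_DESTINATION = "example-unprocessed-bucket"


def destination_bucket(event: dict) -> str:
    """Pick the destination bucket for a GuardDuty scan result event."""
    detail = event.get("detail", {})
    # Both Choice rules require a completed scan; anything else takes the default.
    if detail.get("scanStatus") != "COMPLETED":
        return DEFAULT_DESTINATION
    status = detail.get("scanResultDetails", {}).get("scanResultStatus")
    return DESTINATIONS.get(status, DEFAULT_DESTINATION)


# Trimmed version of the sample event shown earlier in this article.
sample_event = {
    "detail": {
        "scanStatus": "COMPLETED",
        "scanResultDetails": {"scanResultStatus": "THREATS_FOUND"},
    }
}

print(destination_bucket(sample_event))
```

Results such as UNSUPPORTED, ACCESS_DENIED, and FAILED all fall through to example-unprocessed-bucket here, matching the Default branch of the state machine.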

As you can see, we can combine various AWS services to achieve an outcome while minimizing operational overhead.

Disclaimers:

  • This is just one way to solve the problem; it is not the only way or necessarily the best way. Please consider the tradeoffs when making choices.
  • Any AWS usage charges incurred by following this article are your responsibility.
  • Sample code and templates are just samples; please use your own judgement and analysis before using them.