How do I improve data transfer performance with the AWS CLI sync command for Amazon S3?


I use the AWS Command Line Interface (AWS CLI) "sync" command to transfer data to and from Amazon Simple Storage Service (Amazon S3). However, the transfer takes a long time to complete.

Short description

The number of objects in the source and destination buckets might affect the time that the sync command takes to complete. The transfer size can also affect the duration of the sync and the cost that you incur from requests to Amazon S3.

Because the sync command runs list API calls on the backend, delete markers affect list performance and, as a result, the performance of the sync command. It's a best practice to minimize the number of delete markers. You can use an S3 Lifecycle configuration rule to automatically remove expired object delete markers in a versioning-activated bucket.
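
For example, the following command is a sketch of how to apply such a rule with the AWS CLI. The bucket name, the rule ID, and the lifecycle.json file name are placeholders for illustration:

aws s3api put-bucket-lifecycle-configuration --bucket AWSDOC-EXAMPLE-BUCKET --lifecycle-configuration file://lifecycle.json

In this sketch, lifecycle.json contains a rule that removes expired object delete markers from the entire bucket:

{
  "Rules": [
    {
      "ID": "remove-expired-delete-markers",
      "Status": "Enabled",
      "Filter": {},
      "Expiration": {
        "ExpiredObjectDeleteMarker": true
      }
    }
  ]
}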

Resolution

To improve transfer time when you run the sync command, use the following practices.

Run multiple AWS CLI operations

Note: If you receive errors when you run AWS CLI commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
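
To confirm your installed version, run the following command:

aws --version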

To copy a large amount of data, run separate sync operations in parallel. The following example command runs parallel sync operations for different prefixes:

aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder1 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder1

aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder2 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder2
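
These two operations are meant to run at the same time, for example in separate terminal sessions. In a bash-like shell, one way to run them concurrently is to start each operation in the background and wait for both to finish:

aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder1 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder1 &
aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/folder2 s3://destination-AWSDOC-EXAMPLE-BUCKET/folder2 &
wait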

Or, run parallel sync operations with separate exclude and include filters. The following example operations separate the files to sync by key names that begin with numbers 0 through 4, and numbers 5 through 9:

aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/ s3://destination-AWSDOC-EXAMPLE-BUCKET/ --exclude "*" --include "0*" --include "1*" --include "2*" --include "3*" --include "4*"

aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/ s3://destination-AWSDOC-EXAMPLE-BUCKET/ --exclude "*" --include "5*" --include "6*" --include "7*" --include "8*" --include "9*"

Note: Even when you use exclude and include filters, the sync command still reviews all files in the source bucket. The review identifies the source files to copy to the destination bucket. If you have multiple sync operations for different key name prefixes, then each sync operation reviews all the source files. However, because of the exclude and include filters, only the files that you include in the filters copy to the destination bucket.

For more information about how to optimize the performance of your workload, see Best practices design patterns: optimizing Amazon S3 performance.

Activate Amazon S3 Transfer Acceleration

Use Amazon S3 Transfer Acceleration to improve your transfer speeds.
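
To use Transfer Acceleration with the sync command, first turn it on for the bucket, and then configure the AWS CLI to use the accelerate endpoint. The following commands are a sketch; replace the example bucket name with your own:

aws s3api put-bucket-accelerate-configuration --bucket AWSDOC-EXAMPLE-BUCKET --accelerate-configuration Status=Enabled
aws configure set default.s3.use_accelerate_endpoint true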

Transfer Acceleration incurs additional charges. To review pricing, choose the Data transfer tab on the Amazon S3 pricing page. To determine whether Transfer Acceleration improves your transfer speeds, use the Amazon S3 Transfer Acceleration Speed Comparison tool.

Note: Transfer Acceleration doesn't support the CopyObject action across AWS Regions.

Modify the AWS CLI configuration values

max_concurrent_requests

The max_concurrent_requests value sets the number of requests that the AWS CLI can send to Amazon S3 at one time. The default value is 10. To improve performance, increase the value.
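
For example, to allow 20 concurrent requests, run the following command:

aws configure set default.s3.max_concurrent_requests 20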

 Important:

  • When you run more threads, you use more resources on your machine. Make sure that your machine has enough resources to support your maximum number of concurrent requests.
  • Too many concurrent requests might cause connection timeouts or slow the system's responsiveness. To avoid timeout issues from the AWS CLI, set the --cli-read-timeout value or the --cli-connect-timeout value to 0, as shown in the example after this list.
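
For example, the following sync command turns off both timeouts:

aws s3 sync s3://source-AWSDOC-EXAMPLE-BUCKET/ s3://destination-AWSDOC-EXAMPLE-BUCKET/ --cli-read-timeout 0 --cli-connect-timeout 0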

multipart_threshold

When a file reaches the size threshold, the AWS CLI uses a multipart upload instead of a single operation. The default value for multipart_threshold is 8 MB. To increase the default value, run the following command:

aws configure set default.s3.multipart_threshold 16MB

Note: Replace 16MB with the value that you want to use.

multipart_chunksize

The default value for multipart_chunksize is 8 MB, and the minimum value is 5 MB. To increase the chunk size, run the following command:

aws configure set default.s3.multipart_chunksize 16MB

Note: Replace 16MB with the value that you want to use.

You can specify the number of bytes as an integer or use a size suffix, such as 16MB. For large objects, you can set multipart_threshold to 100 MB so that only significantly large files use multipart uploads. Set multipart_chunksize to 25 MB to balance efficient uploads with manageable part sizes.
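
For example, the following commands apply those values:

aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 25MB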

(Optional) Check your EC2 instance configuration

If you use an Amazon Elastic Compute Cloud (Amazon EC2) instance to run the sync operation, then use the following best practices to improve performance:

  • Use a larger Amazon EC2 instance type. Larger instance types have higher network bandwidth and are Amazon Elastic Block Store (Amazon EBS) optimized.
  • To reduce latency, reduce the geographical distance between the instance and your Amazon S3 bucket. If the instance is in a different Region than your bucket, then use an instance that's in the same Region. 
  • If the instance is in the same Region as your source bucket, then set up an Amazon Virtual Private Cloud (Amazon VPC) gateway endpoint for S3, as shown in the example after this list.
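
The following command is a sketch of how to create a gateway endpoint with the AWS CLI. The VPC ID, route table ID, and Region are placeholders for illustration:

aws ec2 create-vpc-endpoint --vpc-id vpc-0example --vpc-endpoint-type Gateway --service-name com.amazonaws.us-east-1.s3 --route-table-ids rtb-0example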

Related information

How can I improve the transfer speeds for copying data between my S3 bucket and EC2 instance?

How do I transfer large amounts of data from one Amazon S3 bucket to another?

How do I troubleshoot slow or inconsistent speeds when downloading or uploading to Amazon S3?