How do I optimize performance when I use AWS CLI to upload large files to Amazon S3?


I want to optimize performance when I use AWS Command Line Interface (AWS CLI) to upload large files (1 GB or larger) to Amazon Simple Storage Service (Amazon S3).

Short description

If you upload large files to Amazon S3, then it's a best practice to use multipart uploads. If you use AWS CLI, then all high-level aws s3 commands, such as aws s3 cp and aws s3 sync, automatically perform a multipart upload when the object is large.

To optimize performance, choose one of the following methods:

  • Use the AWS CRT-based client for AWS CLI for large-scale data transfers or complex workloads.
  • Customize the AWS CLI upload configurations.
  • Activate Amazon S3 Transfer Acceleration.

Note: You must use AWS CLI version 2 to use the CRT-based client. If you receive errors when you run AWS CLI commands, then see Troubleshoot AWS CLI errors.

Resolution

Use the CRT-based client with AWS CLI

Activate the CRT-based client
To activate the CRT-based client, add the setting to your AWS CLI config file (~/.aws/config), or use AWS CLI to set it.

Add the preferred_transfer_client to your AWS CLI config file:

[default]
s3 =
  preferred_transfer_client = crt

Or, enter the following command in your terminal:

aws configure set default.s3.preferred_transfer_client crt
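AWS CLI honors the AWS_CONFIG_FILE environment variable, so you can stage the setting in a scratch file and confirm its shape before you touch your real ~/.aws/config. A minimal sketch:

```shell
# Point AWS CLI at a scratch config file so the real ~/.aws/config is untouched.
export AWS_CONFIG_FILE="$(mktemp)"

# Write the nested s3 section with the CRT transfer client setting.
cat > "$AWS_CONFIG_FILE" <<'EOF'
[default]
s3 =
  preferred_transfer_client = crt
EOF

# Read the setting back to confirm it was written as expected.
grep -q 'preferred_transfer_client = crt' "$AWS_CONFIG_FILE" && echo "CRT client configured"
```

Unset AWS_CONFIG_FILE (or open a new shell) when you're done so AWS CLI returns to your normal config file.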

Configure the target_bandwidth option (optional)
This option controls the target bandwidth that the transfer client tries to reach for S3 uploads and downloads. By default, AWS CLI chooses a target bandwidth that matches the maximum network bandwidth of the system.

Note: Use caution when you configure the target_bandwidth value. If the value exceeds what your local client resources can sustain, then you might overwhelm your client. It's a best practice to leave this option unset so that AWS CLI autodetects the bandwidth.
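For example, to cap the transfer client at 50 MB/s instead of the autodetected maximum (50MB/s is an illustrative value, not a recommendation), add target_bandwidth alongside the CRT setting in your config file:

```ini
[default]
s3 =
  preferred_transfer_client = crt
  target_bandwidth = 50MB/s
```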

After you activate the CRT-based client, AWS CLI automatically uses the client for Amazon S3 operations, including file uploads. This improves performance and reliability compared to the standard AWS CLI, especially for large file uploads.

Customize the upload configurations

Customize AWS CLI configurations for Amazon S3 with these settings:

  • max_concurrent_requests This value sets the number of requests that can be sent to Amazon S3 simultaneously. The default value is 10.
    Note: If you run more threads, then your machine consumes more resources. Be sure that your machine has enough resources to support the maximum number of concurrent requests that you require.
  • max_queue_size This value sets the maximum number of tasks in the queue. The default value is 1,000.
  • multipart_threshold This value sets the size threshold for multipart uploads of individual files. The default value is 8 MB.
  • multipart_chunksize This value sets the size of each part that AWS CLI uploads in a multipart upload for an individual file. This setting allows you to break down a larger file (for example, 300 MB) into smaller parts for quicker upload speeds. The default value is 8 MB.
    Note: A multipart upload can use no more than 10,000 distinct parts for a single file. Be sure that the chunk size that you set balances the part size and the number of parts.
  • max_bandwidth This value sets the maximum bandwidth to upload data to Amazon S3. There's no default value.
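As a quick sanity check on these settings, the part count for a file is ceil(file size / chunk size), and Amazon S3 caps a multipart upload at 10,000 parts. A sketch with a hypothetical 300 GB file:

```shell
# Multipart math: parts = ceil(file size / chunk size).
FILE_MB=307200   # hypothetical 300 GB file
CHUNK_MB=8       # AWS CLI default multipart_chunksize

# Ceiling division in shell arithmetic.
PARTS=$(( (FILE_MB + CHUNK_MB - 1) / CHUNK_MB ))
echo "parts at ${CHUNK_MB} MB chunks: ${PARTS}"   # exceeds the 10,000-part cap

# Smallest chunk size (MB, rounded up) that stays within 10,000 parts.
MIN_CHUNK=$(( (FILE_MB + 9999) / 10000 ))
echo "minimum chunk size: ${MIN_CHUNK} MB"
```

At the default 8 MB chunk size this file would need 38,400 parts, so the upload would fail; raising multipart_chunksize to at least 31 MB keeps it within the limit.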

Activate Amazon S3 Transfer Acceleration

Amazon S3 Transfer Acceleration provides fast and secure transfers over long distances between your client and Amazon S3. Transfer Acceleration uses the globally distributed edge locations of Amazon CloudFront.

Transfer Acceleration incurs additional charges, so be sure that you review pricing. To determine whether Transfer Acceleration improves transfer speeds for your use case, use the Amazon S3 Transfer Acceleration Speed Comparison tool.

Note: Transfer Acceleration doesn't support CopyObject copies across AWS Regions.
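After you enable Transfer Acceleration on your bucket, you can make AWS CLI use the accelerate endpoint through the same s3 section of your config file (a sketch; the bucket itself must have Transfer Acceleration enabled first):

```ini
[default]
s3 =
  use_accelerate_endpoint = true
```

Or, enter the following command in your terminal: aws configure set default.s3.use_accelerate_endpoint true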

Related information

AWS CLI S3 Configuration

What is the AWS Command Line Interface?

AWS OFFICIAL
Updated 2 months ago
4 Comments

What is the size of an object after which the S3 CLI uses multipart upload?

AWS
answered a year ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
answered a year ago

I had to switch the AWS CLI to use the CRT (AWS Common Runtime) library for S3 that has better performance than the Python library. This is explained in more detail in https://awscli.amazonaws.com/v2/documentation/api/latest/topic/s3-config.html#preferred-transfer-client . It would be nice to have a link to that article here.

The following 2 commands helped me improve the performance significantly:

aws configure set default.s3.preferred_transfer_client crt
aws configure set default.s3.target_bandwidth 100Gb/s

Adjusting the multipart_chunksize variable can help as well.

AWS
answered 10 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
answered 10 months ago