If you simply call one of the available upload APIs with a single chunk as the contents of a destination object, S3 will treat that chunk as the entire contents of the object, overwriting any existing object with that key.
For S3 to consider the chunks parts of the same object, you first have to call create_multipart_upload (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/create_multipart_upload.html) to establish a multipart upload session. Then use upload_part (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/upload_part.html) to upload each chunk. Note that each part has to be at least 5 MiB in size, except for the last one, and a single multipart upload cannot have more than 10,000 parts. Once all the parts are uploaded, call complete_multipart_upload (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/complete_multipart_upload.html) to tell S3 to assemble the full object by concatenating the parts and committing the combined result as the target object in the S3 bucket.
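Putting the three calls together, here is a minimal boto3 sketch of that flow. The bucket name, key, and the generate_chunks() helper are placeholders for illustration, not real names:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"          # hypothetical bucket name
key = "extracted/large-file"  # hypothetical destination key

# 1. Start the multipart upload session.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
upload_id = mpu["UploadId"]

# 2. Upload each chunk as a numbered part.
parts = []
part_number = 1
for chunk in generate_chunks():  # your own source of >= 5 MiB chunks (the last may be smaller)
    resp = s3.upload_part(
        Bucket=bucket,
        Key=key,
        PartNumber=part_number,
        UploadId=upload_id,
        Body=chunk,
    )
    parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
    part_number += 1

# 3. Tell S3 to assemble the parts into the final object.
s3.complete_multipart_upload(
    Bucket=bucket,
    Key=key,
    UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)
```

The ETag returned by each upload_part call has to be passed back to complete_multipart_upload, which is why the sketch collects them in the parts list.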
There's no particular requirement to upload the parts in multiple threads in your code. It's just that the part uploads are completely independent of one another and can be sent to different S3 servers, so nearly unlimited throughput can be achieved by parallelising the upload. It's probably neither needed nor practical in your specific use case, since you are decompressing the file in a single thread anyway. Technically, there's nothing preventing you from uploading one part at a time.
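If you do want to parallelise, one way is to fan the upload_part calls out to a thread pool. This is a sketch only: it assumes the chunks are already available as a list of byte strings called chunks (hypothetical), and it relies on the fact that a boto3 client can be shared across threads:

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "extracted/large-file"  # hypothetical names

upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

def upload_one(numbered_chunk):
    part_number, chunk = numbered_chunk
    resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                          UploadId=upload_id, Body=chunk)
    return {"PartNumber": part_number, "ETag": resp["ETag"]}

with ThreadPoolExecutor(max_workers=8) as pool:
    # map() preserves input order, so the parts list stays sorted by PartNumber.
    parts = list(pool.map(upload_one, enumerate(chunks, start=1)))

s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                             MultipartUpload={"Parts": parts})
```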
Note that if the multipart upload doesn't get completed for any reason, the upload session and the parts that you uploaded will remain in S3's staging area for all eternity, until or unless you abort the multipart upload explicitly with abort_multipart_upload
(https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/abort_multipart_upload.html). The parts in the staging area are charged at the S3 Standard storage class rate. However, there's a built-in mechanism that you can activate to abort incomplete uploads one or more days after they were started. This blog post explains how and why to set it up: https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/
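A sketch of both clean-up options follows, assuming a hypothetical bucket name. abort_multipart_upload needs the Key and UploadId from the original create_multipart_upload call, and the lifecycle rule asks S3 to abort any upload still incomplete some days after it was started (7 days here is just an example):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical name

# Option 1: abort a specific upload explicitly, freeing its stored parts.
# `key` and `upload_id` are the values from the create_multipart_upload call.
# s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)

# Option 2: a bucket lifecycle rule that aborts any multipart upload still
# incomplete 7 days after it was initiated.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```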
Thank you for the details. I already used the same approach that you shared, but my doubt was how to use multithreading to make the transfer much faster (if there are 1000s of files in the zip).