Copy more than 200 million archives from S3. Is that impossible?

0

I'm trying to copy an S3 bucket that holds more than 200 million archives. I tried using rclone, but every time, after about 20 hours, rclone stops with a memory error (my bare-metal server has 96 GB of RAM). I also tried the AWS CLI copy command, but after running for a day nothing had happened. Does anybody have a solution? Regards

2 Answers
0

Could you please provide more information on the use case? Are you copying from S3 to S3, within the same region or across regions, or from on-premises to the cloud or the other way around? If possible, what are the average file size and the key hierarchy, and how much network bandwidth is available?

If you wish to copy between S3 buckets, please look at this article: https://repost.aws/knowledge-center/s3-large-transfer-between-buckets

AWS
EXPERT
answered a year ago
0

At the scale of 200 million objects, using rclone or the AWS CLI copy commands is not a good idea and will be slow. It's also hard to track progress and failures that way.

If you're copying between two S3 buckets, using S3 Batch Operations and/or S3 replication is the way to go since it can track progress and it is an automated solution that works at scale.
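As a rough illustration of the Batch Operations route, here is a minimal sketch of how a copy job request could be assembled for the Boto3 `s3control` client's `create_job` call. It assumes an S3 Inventory report already exists to serve as the manifest; the function names, the report prefix, the priority value, and all ARNs are placeholders, not values from this thread.

```python
def build_copy_job_request(account_id, role_arn, manifest_arn, manifest_etag,
                           target_bucket_arn, report_bucket_arn):
    """Build keyword arguments for s3control.create_job (hypothetical helper)."""
    return {
        "AccountId": account_id,
        # The job waits for explicit confirmation before it starts running.
        "ConfirmationRequired": True,
        # Copy every object listed in the manifest into the target bucket.
        "Operation": {"S3PutObjectCopy": {"TargetResource": target_bucket_arn}},
        # Assumption: the manifest is the manifest.json of an S3 Inventory report.
        "Manifest": {
            "Spec": {"Format": "S3InventoryReport_CSV_20161130"},
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        # Completion report listing only the tasks that failed.
        "Report": {
            "Enabled": True,
            "Bucket": report_bucket_arn,
            "Format": "Report_CSV_20180820",
            "Prefix": "batch-copy-reports",  # placeholder prefix
            "ReportScope": "FailedTasksOnly",
        },
        "Priority": 10,  # arbitrary value chosen for this sketch
        "RoleArn": role_arn,
        "Description": "Copy all objects listed in the inventory manifest",
    }


def submit_copy_job(request):
    """Submit the job and return its ID. Needs Boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without Boto3

    return boto3.client("s3control").create_job(**request)["JobId"]
```

The IAM role passed as `role_arn` would need read access to the source bucket and manifest and write access to the target and report buckets.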

When you say "archives" do you mean objects stored in cold storage like Glacier or Glacier Deep Archive? In that case, you will first need to restore these archived objects using S3 Batch Operations and then copy the objects over using S3 Batch Operations and/or S3 replication.

If you are trying to download all 200 million objects onto a local machine and the total size of the download is in the multi-TB range, then you should use Snowball Edge to export your S3 data. If there's not much data (200 million tiny objects), then you should first create an S3 Inventory report and then write a mini-application using an AWS SDK (e.g. Boto3) that issues simultaneous multi-threaded GETs for all those objects and keeps track of failed requests.
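A minimal sketch of such a mini-application in Python, under a few assumptions: the inventory is delivered as gzip-compressed CSV with the default column order (bucket, key, ...) and URL-encoded keys, and the helper names, worker count, and chunk size are illustrative choices. Keys are processed in fixed-size chunks so the full list of 200 million keys is never held in memory at once.

```python
import csv
import gzip
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from urllib.parse import unquote_plus


def iter_inventory_keys(csv_gz_path):
    """Yield object keys from one gzip-compressed S3 Inventory CSV file.

    Assumes the key is the second column and is URL-encoded.
    """
    with gzip.open(csv_gz_path, "rt", newline="") as f:
        for row in csv.reader(f):
            yield unquote_plus(row[1])


def make_s3_downloader(bucket, dest_dir):
    """Return a function that downloads one key with Boto3."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    s3 = boto3.client("s3")  # one client shared by all worker threads

    def download(key):
        target = os.path.join(dest_dir, key)
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        s3.download_file(bucket, key, target)

    return download


def download_all(keys, download, failed_log_path, workers=64, chunk_size=10_000):
    """Call download(key) on a thread pool; log failed keys for a retry pass."""

    def attempt(key):
        try:
            download(key)
            return None
        except Exception as exc:  # record the failure instead of stopping
            return key, exc

    ok = failed = 0
    keys = iter(keys)
    with ThreadPoolExecutor(max_workers=workers) as pool, \
            open(failed_log_path, "a") as log:
        while True:
            chunk = list(islice(keys, chunk_size))  # bounds memory use
            if not chunk:
                break
            for result in pool.map(attempt, chunk):
                if result is None:
                    ok += 1
                else:
                    failed += 1
                    log.write(f"{result[0]}\t{result[1]!r}\n")
    return ok, failed
```

A run might look like `download_all(iter_inventory_keys("inventory-part.csv.gz"), make_s3_downloader("my-bucket", "/data"), "failed.tsv")`, where the file, bucket, and directory names are placeholders. The failed-key log can then be fed back in for a second pass.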

AWS
Krishna
answered a year ago
