S3 Data Copy/Migration Best Practices

Content level: Intermediate

S3 data migration is a common use case for customers running intra-Region or inter-Region projects. Moving data efficiently and cost-effectively requires an understanding of best practices that keep the project within its defined budget and performant enough to meet its time objectives. Below are a few best practices and considerations to help your S3 migration projects succeed.

  • Performance/Pre-Partitioning
    • Ensure partitions are created evenly and in advance on both the source and the target side.
    • Ensure the bucket key space is evenly distributed by using a hashing method when defining key prefixes.
    • Example Key Structure: YYYY-MM-DD/{BRANDNAME}/{assetype}/{COMPLEXFNV_HASH}/{BIGINT_ASSET_ID}/{filename}.{extension}
    • S3 Batch Operations jobs with upward of 100 million objects typically see higher throughput than jobs with fewer objects.
    • Performance is dictated by the number of active jobs running in that account and Region. Each AWS account has a default limit of 1,000 active S3 Batch Operations jobs per Region.
    • Performance is also dictated by the size of the S3 objects. 5 GB is the per-object maximum.
    • The job workflow is as follows: manifest generation, preparation, execution, and (optional) report generation.
    • Try to keep job sizes between 500 million and one billion objects.
    • Do not exceed 4 billion keys in a single job.
    • Use inventory reports or generated manifests instead of a single flat CSV.
  • Cost considerations
    • Adjust the default multipart upload configuration so that fewer PUT requests are issued, which further reduces cost.
    • If using SSE-KMS with a customer managed key (CMK), keep the S3 Bucket Key enabled; this reduces the number of AWS KMS API calls.
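
As a rough illustration of the hashed key prefix idea from the performance list, the sketch below builds a key in the shape of the example key structure. It rests on assumptions the list does not confirm: the hash is plain 32-bit FNV-1a, it is computed over the asset ID, and it is rendered as eight hex characters. The names `fnv1a_32` and `build_key` are hypothetical.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte                              # mix in the next byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF    # multiply, keep 32 bits
    return h


def build_key(date: str, brand: str, asset_type: str,
              asset_id: int, filename: str) -> str:
    """Build an object key with a hash segment derived from the asset ID.

    Assumption: hashing the asset ID is only one possible choice of input.
    """
    hash_segment = format(fnv1a_32(str(asset_id).encode("utf-8")), "08x")
    return f"{date}/{brand}/{asset_type}/{hash_segment}/{asset_id}/{filename}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_key("2024-01-15", "acme", "image", 1234567890123, "hero.png"))
```

Because the hash segment varies from asset to asset, keys that share the same date, brand, and asset type still spread across many distinct prefixes.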

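To see why the multipart upload configuration affects cost, it can help to count requests per object. The sketch below assumes one request to start the upload, one per part, and one to complete it. The 8 MiB figure is the default part size in common AWS tooling such as the AWS CLI and boto3; 64 MiB is only an example of a larger setting. The helper name `multipart_request_count` is hypothetical.

```python
MIB = 1024 * 1024


def multipart_request_count(object_size: int, part_size: int) -> int:
    """Estimate the requests for one multipart upload.

    Counts 1 create request + N part uploads + 1 complete request.
    """
    parts = max(1, (object_size + part_size - 1) // part_size)  # ceiling division
    return parts + 2


if __name__ == "__main__":
    one_gib = 1024 * MIB
    print(multipart_request_count(one_gib, 8 * MIB))    # 130
    print(multipart_request_count(one_gib, 64 * MIB))   # 18
```

Under these assumptions, a larger part size cuts the request count for a 1 GiB object from 130 to 18. Across hundreds of millions of objects, that difference in billed requests adds up.
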
Though this isn’t an exhaustive list of best practices, the above can get your S3 migration projects off to a good start. If you have additional questions about optimizing your next data migration project, consider running through the hands-on workshops below, contacting your AWS account team, or posting your questions on re:Post. Also be sure to check out our recent re:Post Live episode with Abhishek SHUKLA, "Data Back-up and Recovery with Amazon S3", which discusses S3 migration further.

Hands-on Labs