Compaction of DMS log files


We use DMS to export data from MySQL to S3, after which we run ETLs. The Glue ETLs use bookmarks, so each run reads only what has changed since the last run. However, the raw data keeps growing as a very large number of kilobyte-sized files.

My plan is to

  1. Write a Glue job to read all these small files and compact them into ~256 MB files (see the sketch after this list)
  2. Create a retention policy on the DMS endpoint bucket to delete files older than 90 days
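To make step 1 concrete, here is a minimal sketch of the kind of Glue (PySpark) job I have in mind. It assumes the DMS task writes Parquet; the bucket names, prefixes, and the `total_size_bytes` helper are hypothetical placeholders, and the partition-count calculation is only an approximation of how to hit ~256 MB per output file.

```python
import sys

import boto3
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

SOURCE_BUCKET = "my-dms-raw-bucket"                       # hypothetical
SOURCE_PREFIX = "mydb/mytable/"                           # hypothetical
TARGET_PATH = "s3://my-compacted-bucket/mydb/mytable/"    # hypothetical
TARGET_FILE_BYTES = 256 * 1024 * 1024                     # aim for ~256 MB per output file


def total_size_bytes(bucket, prefix):
    """Sum the sizes of all objects under the prefix (a rough estimate
    of the raw data volume before compaction)."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            total += obj["Size"]
    return total


# Read every small raw file for the table in one pass.
df = spark.read.parquet(f"s3://{SOURCE_BUCKET}/{SOURCE_PREFIX}")

# Pick a partition count so that each output file is roughly 256 MB,
# then write the compacted copy to a separate prefix.
num_files = max(1, total_size_bytes(SOURCE_BUCKET, SOURCE_PREFIX) // TARGET_FILE_BYTES)
df.coalesce(int(num_files)).write.mode("overwrite").parquet(TARGET_PATH)

job.commit()
```

In practice this would run per table (or per partition), and `coalesce` is used rather than `repartition` because the goal is only to reduce the number of output files, not to reshuffle the data. Step 2 (the 90-day retention) would be a plain S3 lifecycle rule on the raw bucket.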

The reason

  1. for selecting 256 MB: I read somewhere that 256 MB is the preferred file size for Athena. Is that right?
  2. for compacting the raw files: to make it easier for any other application to consume the data, that is, to read a small number of 256 MB files rather than millions of KB-sized files

What I want to know is

  • What is the general architecture around this?
  • Is it good practice to implement my steps?