Compaction of DMS log files


We use DMS to export data from MySQL to S3, after which we run ETLs. The Glue ETLs use bookmarks, so they read only what has changed since the last run. However, the raw data keeps growing as a very large number of kilobyte-sized files.

My plan is to

  1. Write a Glue job to read all these small files and rewrite them as ~256 MB files (see the sketches after this list)
  2. Create a retention policy on the DMS endpoint bucket to delete files older than 90 days
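To make step 1 concrete, here is a minimal PySpark sketch of the compaction job. It assumes the DMS output is Parquet and uses hypothetical bucket names and prefixes (`dms-raw-bucket/mysql/orders/` as the source, a separate compacted prefix as the target); it estimates the total size of the raw prefix and repartitions so each output file lands near 256 MB.

```python
# Compaction sketch: read many small DMS output files, rewrite as ~256 MB files.
# Bucket names and prefixes below are hypothetical placeholders.
import boto3
from pyspark.sql import SparkSession

RAW_PATH = "s3://dms-raw-bucket/mysql/orders/"        # hypothetical source prefix
COMPACTED_PATH = "s3://dms-compacted-bucket/orders/"  # hypothetical target prefix
TARGET_FILE_BYTES = 256 * 1024 * 1024                 # ~256 MB per output file

spark = SparkSession.builder.appName("dms-compaction").getOrCreate()

# Sum the object sizes under the raw prefix to pick a partition count
# that yields roughly 256 MB output files.
s3 = boto3.client("s3")
total_bytes = 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="dms-raw-bucket", Prefix="mysql/orders/"):
    for obj in page.get("Contents", []):
        total_bytes += obj["Size"]

num_files = max(1, int(total_bytes // TARGET_FILE_BYTES))

# Read the many small files and rewrite them as a small number of large ones.
df = spark.read.parquet(RAW_PATH)
df.repartition(num_files).write.mode("overwrite").parquet(COMPACTED_PATH)
```

For step 2, the 90-day retention can be expressed as an S3 lifecycle rule; the same rule can also be created in the console or via CloudFormation. Again, the bucket name and prefix are assumptions for illustration.

```python
# Lifecycle sketch: expire raw DMS files 90 days after creation.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="dms-raw-bucket",  # hypothetical DMS endpoint bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-raw-dms-files-after-90-days",
                "Filter": {"Prefix": "mysql/"},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
            }
        ]
    },
)
```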

The reasons

  1. for selecting 256 MB is that I read somewhere that 256 MB is the preferred file size for Athena. Is that right?
  2. for compacting the raw files is to make it easier for any other application to consume the data, that is, to read a small number of 256 MB files rather than millions of KB-sized files.

What I want to know is

  • What is the general architecture around this?
  • Is it good practice to implement these steps?