Compaction of DMS log files


We use DMS to export data from MySQL to S3, after which we run ETLs. The Glue ETLs use job bookmarks, so each run reads only what has changed since the last run. However, the raw data keeps growing as a large number of kilobyte-sized files.

My plan is to

  1. Write a Glue job that reads all these small files and rewrites them as ~256 MB files (see the sketches after this list)
  2. Create an S3 lifecycle (retention) policy on the DMS endpoint bucket to delete files older than 90 days
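For step 1, here is a minimal sketch of the kind of compaction job I have in mind, assuming the DMS target endpoint writes Parquet (DMS writes CSV by default) and using hypothetical bucket/prefix names (`my-dms-bucket`, `raw/orders/`, `compacted/orders/`). It sizes the output by summing the raw objects and coalescing to roughly 256 MB per file:

```python
import math
import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dms-compaction").getOrCreate()

BUCKET = "my-dms-bucket"                 # hypothetical bucket name
RAW_PREFIX = "raw/orders/"               # hypothetical DMS output prefix
COMPACTED_PATH = f"s3://{BUCKET}/compacted/orders/"  # hypothetical target prefix
TARGET_FILE_BYTES = 256 * 1024 * 1024    # desired output file size

# Sum the sizes of the small raw files to decide how many ~256 MB outputs to write.
s3 = boto3.client("s3")
total_bytes = 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=RAW_PREFIX):
    for obj in page.get("Contents", []):
        total_bytes += obj["Size"]

num_files = max(1, math.ceil(total_bytes / TARGET_FILE_BYTES))

# Read every small file under the raw prefix and rewrite it as a small number
# of larger files. coalesce() avoids a full shuffle; use repartition() instead
# if the input is badly skewed.
df = spark.read.parquet(f"s3://{BUCKET}/{RAW_PREFIX}")
df.coalesce(num_files).write.mode("overwrite").parquet(COMPACTED_PATH)
```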
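For step 2, the 90-day retention can be expressed as an S3 lifecycle rule (console, CLI, or boto3). A sketch with the same hypothetical bucket and prefix:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: expire objects under the raw DMS prefix 90 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-dms-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-raw-dms-output-after-90-days",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
            }
        ]
    },
)
```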

My reasoning

  1. I chose 256 MB because I read somewhere that 256 MB is the preferred file size for Athena. Is that right?
  2. Compacting the raw files makes it easier for any other application to consume the data, i.e., reading a small number of 256 MB files rather than millions of KB-sized files.

What I want to know is

  • What is the general architecture around this?
  • Is it good practice to implement these steps?