Compaction of DMS log files


We use DMS to export data from MySQL to S3, after which we run ETLs. The Glue ETLs use job bookmarks, so each run reads only what has changed since the last run. However, the raw data keeps growing as a large number of kilobyte-sized files.

My plan is to

  1. Write a Glue job that reads all these small files and rewrites them as ~256 MB files (see the sketches after this list)
  2. Create an S3 lifecycle (retention) policy on the DMS endpoint bucket to delete files older than 90 days
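For step 1, here is a minimal sketch of the kind of compaction job I have in mind, assuming the DMS target endpoint writes Parquet (DMS writes CSV by default) and using hypothetical bucket/prefix names (`my-dms-bucket`, `raw/orders/`, `compacted/orders/`). It sizes the output by summing the raw objects and coalescing to roughly 256 MB per file:

```python
import math
import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dms-compaction").getOrCreate()

BUCKET = "my-dms-bucket"                 # hypothetical bucket name
RAW_PREFIX = "raw/orders/"               # hypothetical DMS output prefix
COMPACTED_PATH = f"s3://{BUCKET}/compacted/orders/"  # hypothetical target prefix
TARGET_FILE_BYTES = 256 * 1024 * 1024    # desired output file size

# Sum the sizes of the small raw files to decide how many ~256 MB outputs to write.
s3 = boto3.client("s3")
total_bytes = 0
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=RAW_PREFIX):
    for obj in page.get("Contents", []):
        total_bytes += obj["Size"]

num_files = max(1, math.ceil(total_bytes / TARGET_FILE_BYTES))

# Read every small file under the raw prefix and rewrite it as a small number
# of larger files. coalesce() avoids a full shuffle; use repartition() instead
# if the input is badly skewed.
df = spark.read.parquet(f"s3://{BUCKET}/{RAW_PREFIX}")
df.coalesce(num_files).write.mode("overwrite").parquet(COMPACTED_PATH)
```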
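For step 2, the 90-day retention can be expressed as an S3 lifecycle rule (console, CLI, or boto3). A sketch with the same hypothetical bucket and prefix:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: expire objects under the raw DMS prefix 90 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-dms-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-raw-dms-output-after-90-days",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
            }
        ]
    },
)
```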

My reasoning

  1. I chose 256 MB because I read somewhere that 256 MB is the preferred file size for Athena. Is that right?
  2. Compacting the raw files makes it easier for any other application to consume the data, i.e., reading a small number of 256 MB files rather than millions of KB-sized files.

What I want to know is

  • What is the general architecture around this?
  • Is it good practice to implement these steps?