Compaction of DMS log files

We use DMS to export data from MySQL to S3, after which we run Glue ETLs. The Glue ETLs use bookmarks, so each run reads only what has changed since the last run. However, the raw data keeps growing as a very large number of kilobyte-sized files.

My plan is to

  1. Write a Glue job that reads all these small files and rewrites them as ~256 MB files (see the sketch after this list)
  2. Create a retention policy on the DMS endpoint bucket to delete files older than 90 days (also sketched below)
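For context, here is roughly what I have in mind for step 1, assuming the DMS endpoint writes Parquet; the bucket names, prefixes, and paths are placeholders, not real values:

```python
# Sketch of a Glue (PySpark) compaction job, assuming Parquet output from DMS.
# Bucket names, prefixes, and output paths below are placeholders.
import sys

import boto3
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

RAW_BUCKET = "my-dms-raw-bucket"            # hypothetical DMS target bucket
RAW_PREFIX = "mysql/orders/"                # hypothetical table prefix
COMPACTED_PATH = "s3://my-compacted-bucket/mysql/orders/"  # hypothetical output path
TARGET_FILE_BYTES = 256 * 1024 * 1024       # aim for ~256 MB per output file


def total_size_bytes(bucket: str, prefix: str) -> int:
    """Sum the sizes of all objects under a prefix to estimate the output file count."""
    s3 = boto3.client("s3")
    total = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            total += obj["Size"]
    return total


# Read all the small raw files DMS has written for this table.
df = spark.read.parquet(f"s3://{RAW_BUCKET}/{RAW_PREFIX}")

# Spark has no direct "target output file size" setting, so approximate it by
# repartitioning based on the raw input size. Compression will shift the final
# sizes somewhat, so 256 MB is a target, not a guarantee.
num_output_files = max(1, total_size_bytes(RAW_BUCKET, RAW_PREFIX) // TARGET_FILE_BYTES)
df.repartition(num_output_files).write.mode("overwrite").parquet(COMPACTED_PATH)

job.commit()
```

And roughly what I have in mind for step 2, expressed with boto3 (the same rule could be created in the S3 console or with CloudFormation/Terraform); again, bucket and prefix are placeholders:

```python
# Sketch of a 90-day expiration lifecycle rule on the raw DMS bucket.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-dms-raw-bucket",  # hypothetical DMS target bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-raw-dms-files-after-90-days",
                "Filter": {"Prefix": "mysql/"},  # scope the rule to the raw DMS prefix
                "Status": "Enabled",
                "Expiration": {"Days": 90},      # delete objects 90 days after creation
            }
        ]
    },
)
```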

The reasons:

  1. for choosing 256 MB: I read somewhere that 256 MB is the preferred file size for Athena. Is that right?
  2. for compacting the raw files: to make it easier for any other application to consume the data, i.e. reading a small number of 256 MB files instead of millions of KB-sized files.

What I want to know is:

  • What is the general architecture for this?
  • Is it good practice to implement these steps?