AWS Glue Crawler Scalability for Large Number of Delta Tables


Question: We currently have approximately 100 tables in Delta format, partitioned by yyyy, mm, dd, hh, mm. Our current process uses an AWS Glue crawler to read and catalog these Delta tables, and we then build our business logic on Redshift Spectrum tables.

However, we are running into a scalability limit: each crawler accepts a maximum of 10 tables. As we continue to add tables, creating and managing additional crawlers becomes cumbersome. The data volume on some of these tables is also substantial, with up to 500k records per hour.

Given these constraints, what would be the best approach to read the Delta tables in parallel via the crawler? Can we configure the crawler to use an RDS database for improved scalability? Any insights or best practices would be appreciated.
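To make the current setup concrete, here is a minimal sketch of how the table paths could be grouped into crawlers of at most 10 tables each and created programmatically instead of by hand. The bucket paths, role ARN, database name, and crawler name prefix are hypothetical placeholders, and the `WriteManifest` / `CreateNativeDeltaTable` values are placeholder choices, not a recommendation. The 10-table limit is the one described above and has not been checked against current service quotas.

```python
from typing import Dict, List

# Limit as described in the question; verify it against your account's quotas.
MAX_TABLES_PER_CRAWLER = 10


def chunk(paths: List[str], size: int) -> List[List[str]]:
    """Split a list of S3 paths into consecutive batches of at most `size`."""
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def build_crawler_specs(
    table_paths: List[str],
    role_arn: str,
    database: str,
    prefix: str = "delta-crawler",  # hypothetical naming scheme
) -> List[Dict]:
    """Build one create_crawler argument dict per batch of Delta table paths."""
    specs = []
    for index, batch in enumerate(chunk(table_paths, MAX_TABLES_PER_CRAWLER)):
        specs.append({
            "Name": f"{prefix}-{index:03d}",
            "Role": role_arn,
            "DatabaseName": database,
            "Targets": {
                "DeltaTargets": [{
                    "DeltaTables": batch,
                    # Placeholder choices; pick what your query engine needs.
                    "WriteManifest": False,
                    "CreateNativeDeltaTable": True,
                }]
            },
        })
    return specs


def create_crawlers(specs: List[Dict]) -> None:
    """Create the crawlers in AWS Glue (requires boto3 and credentials)."""
    import boto3  # imported here so the planning helpers run without AWS access

    glue = boto3.client("glue")
    for spec in specs:
        glue.create_crawler(**spec)
```

For example, `build_crawler_specs(paths, role_arn, "analytics")` with 100 paths yields 10 crawler definitions, which `create_crawlers` can then submit. This only automates the bookkeeping; it does not remove the per-crawler limit itself.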

  • Could you share how you're creating these Delta tables? Where is the source data coming from for these tables?

  • We create the Delta tables with Glue ETL jobs. The source data comes from an API.

Asked a month ago · 356 views
No answers
