Glue Data Catalog configuration when updating with Database Migration Service


I set up a replication task with AWS Database Migration Service (DMS) to run full load + CDC from an RDS instance to an S3 bucket. Since I want to query the data in S3 with Athena, I set the option "GlueCatalogGeneration": true so that I wouldn't need to configure a separate crawler to run periodically and pick up the latest data. However, I noticed that when DMS generates the tables in the Glue Data Catalog, it sets the table property escape.delim to null. This doesn't seem to be a problem for Athena, but if I try to access any of these tables from Spark (e.g. with create_dynamic_frame_from_catalog), I get an IllegalEscaper error. Is there an option in DMS I can configure so that this parameter isn't created at all?
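For reference, here is a minimal sketch of the S3 target endpoint settings described above, written as the `S3Settings` dictionary that boto3's `dms.modify_endpoint(EndpointArn=..., S3Settings=...)` accepts. The helper `build_s3_target_settings`, the bucket name and the role ARN are hypothetical placeholders. Only `GlueCatalogGeneration` is the setting under discussion.

```python
from typing import Any, Dict


def build_s3_target_settings(bucket_name: str, role_arn: str) -> Dict[str, Any]:
    """Return an S3Settings dict for a DMS S3 target endpoint (hypothetical helper)."""
    return {
        "BucketName": bucket_name,          # placeholder bucket
        "ServiceAccessRoleArn": role_arn,   # placeholder IAM role DMS assumes
        # Ask DMS to create/update the tables in the Glue Data Catalog,
        # so no separate crawler is needed.
        "GlueCatalogGeneration": True,
    }
```

Usage (not run here): `dms.modify_endpoint(EndpointArn=endpoint_arn, S3Settings=build_s3_target_settings("my-bucket", role_arn))`.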

2 Answers

Accepted Answer

When DMS replicates data from a database to S3 with Glue catalog generation enabled, it sets certain properties on the generated Glue tables. One such property is escape.delim, which is set to null.

This null value does not cause issues when you query the data from Athena. However, it can cause problems when you access the tables from Spark with create_dynamic_frame_from_catalog, because Spark expects a non-null escape delimiter value.
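To find the affected tables before Spark fails, a small check like the following could scan a table definition as returned by boto3's `glue.get_table(...)["Table"]`. This is a sketch under assumptions. The helper name is hypothetical. It assumes escape.delim lives in `StorageDescriptor.SerdeInfo.Parameters` and also checks the table-level `Parameters`. It treats `None`, an empty string and the literal string `"null"` as the problematic value, since it is not certain which form DMS writes.

```python
from typing import Any, Dict, List

NULL_LIKE = (None, "", "null")  # assumed representations of a "null" delimiter


def null_escape_params(table: Dict[str, Any]) -> List[str]:
    """Return the locations where escape.delim is present but null-like."""
    locations = {
        "SerdeInfo.Parameters": table.get("StorageDescriptor", {})
        .get("SerdeInfo", {})
        .get("Parameters", {}),
        "Parameters": table.get("Parameters", {}),
    }
    hits = []
    for where, params in locations.items():
        if "escape.delim" in params and params["escape.delim"] in NULL_LIKE:
            hits.append(where)
    return hits
```

To check a whole database, each page from `glue.get_paginator("get_tables").paginate(DatabaseName=...)` has a `TableList` whose entries can be passed to this function.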

There is currently no option in DMS to configure the escape.delim property value. As a workaround, you can:

  • After the initial load and replication are complete, update the Glue table definition manually through the Glue console or API to set a non-null escape delimiter value.
  • Alternatively, instead of using create_dynamic_frame_from_catalog in Spark, you can directly query the data from S3 using Spark SQL without going through the Glue catalog.
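The first workaround could be scripted roughly as follows. This is a sketch, not a verified AWS procedure. `to_table_input` and `fix_escape_delim` are hypothetical helpers. The set of keys copied into `TableInput` is an assumed subset of what Glue's `UpdateTable` accepts, because read-only fields returned by `get_table` (such as `DatabaseName` or `CreateTime`) would be rejected by `update_table`. By default the helper drops `escape.delim` entirely, which is what the question asked for. Pass `replacement` to set a non-null delimiter instead, as the answer suggests.

```python
import copy
from typing import Any, Dict, Optional

# Assumed subset of the keys Glue's UpdateTable accepts in TableInput.
# get_table also returns read-only fields (DatabaseName, CreateTime, ...)
# that must not be sent back.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def to_table_input(table: Dict[str, Any], replacement: Optional[str] = None) -> Dict[str, Any]:
    """Build a TableInput from a get_table result, fixing escape.delim.

    With replacement=None the escape.delim SerDe parameter is removed;
    otherwise it is set to `replacement`. The input dict is not modified.
    """
    table_input = {k: copy.deepcopy(v) for k, v in table.items() if k in TABLE_INPUT_KEYS}
    params = (
        table_input.get("StorageDescriptor", {})
        .get("SerdeInfo", {})
        .get("Parameters", {})
    )
    if "escape.delim" in params:
        if replacement is None:
            del params["escape.delim"]
        else:
            params["escape.delim"] = replacement
    return table_input


def fix_escape_delim(glue_client, database: str, table_name: str,
                     replacement: Optional[str] = None) -> None:
    """Read a table definition, patch escape.delim, and write it back."""
    table = glue_client.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue_client.update_table(DatabaseName=database,
                             TableInput=to_table_input(table, replacement))
```

Usage (not run here): `fix_escape_delim(boto3.client("glue"), "my_db", "my_table")`. If DMS rewrites the table definition on a later run, the change may need to be reapplied.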
Expert · answered a month ago
Reviewed by an AWS Expert a month ago

I noticed that when I use create_dynamic_frame_from_options and read directly from S3, I don't have the same problem, and I was curious why. Thank you, now it's clear!

Answered a month ago
