Glue Data Catalog configuration when updating with Database Migration Service


I set up a replication task with AWS Database Migration Service (DMS) to run a full load plus CDC from an RDS instance to an S3 bucket. Because I want to query the data in S3 with Athena, I set "GlueCatalogGeneration": true so that I wouldn't need a separate crawler running periodically to pick up the latest data. However, I realized that when DMS generates the tables in the Glue Data Catalog, it sets the escape.delim option to null. This doesn't seem to be a problem for Athena, but if I try to access any table using Spark (e.g. with create_dynamic_frame_from_catalog) I get an IllegalEscaper error. Is there an option in DMS I can configure so that this parameter isn't created at all?

2 Answers
Accepted Answer

When DMS replicates data from a database to S3 with Glue catalog generation enabled, it sets certain properties on the generated Glue tables. One of these properties is escape.delim, which is set to null.

This null value does not cause issues when querying the data from Athena. However, it causes the IllegalEscaper error when the tables are read from Spark with create_dynamic_frame_from_catalog, because Spark expects a non-null escape delimiter.

There is currently no option in DMS to configure the escape.delim property. Two workarounds are available:

  • After the initial load and replication is complete, update the Glue table definition manually through the Glue console or API to set a non-null escape delimiter value.
  • Alternatively, instead of using create_dynamic_frame_from_catalog in Spark, you can directly query the data from S3 using Spark SQL without going through the Glue catalog.
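The first workaround can be sketched with boto3 as follows. This is a minimal sketch under some assumptions, not a confirmed fix: it assumes escape.delim is stored in the table's SerDe parameters (StorageDescriptor.SerdeInfo.Parameters), and it assumes a backslash is an acceptable escape character for your data. The helper names build_table_input and fix_escape_delim are hypothetical. Because get_table returns read-only fields that update_table rejects, only the keys accepted in TableInput are copied over.

```python
import copy

# Keys that glue.update_table accepts inside TableInput.
# get_table returns extra read-only fields (DatabaseName, CreateTime, ...)
# that must be dropped before the definition is sent back.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def build_table_input(table, escape_char="\\"):
    """Return a TableInput dict with a non-null escape.delim.

    Assumption: escape.delim lives in the SerDe parameters of the table.
    The input dict is not modified.
    """
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    serde = table_input.setdefault("StorageDescriptor", {}).setdefault("SerdeInfo", {})
    params = serde.setdefault("Parameters", {})
    params["escape.delim"] = escape_char  # assumed value; adjust to your data
    return table_input


def fix_escape_delim(database, table_name, escape_char="\\"):
    """Fetch the table generated by DMS and write it back with the fix applied."""
    import boto3  # imported here so build_table_input can be used without boto3

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=build_table_input(table, escape_char),
    )
```

A call such as fix_escape_delim("my_database", "my_table") (both names are placeholders) would need to be repeated for each affected table, and it is worth checking afterwards that Athena queries still return the expected rows.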
answered a month ago

I noticed that when using create_dynamic_frame_from_options and reading directly from S3 I don't have the same problem, and I was curious as to why that was the case. Thank you! Now it's clear.
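For anyone landing here later, the direct read mentioned above might look roughly like this. It is only a sketch: the S3 path is a placeholder, the files are assumed to be comma-separated CSV without a header row, and direct_read_kwargs and read_directly are hypothetical helper names. Since the read is described entirely by these arguments, the table properties stored in the Glue Data Catalog are not consulted.

```python
def direct_read_kwargs(s3_path, separator=",", with_header=False):
    """Build the arguments for GlueContext.create_dynamic_frame_from_options.

    Assumption: the DMS target writes CSV files; adjust format and
    format_options if the task is configured to write Parquet instead.
    """
    return {
        "connection_type": "s3",
        "connection_options": {"paths": [s3_path]},
        "format": "csv",
        "format_options": {"separator": separator, "withHeader": with_header},
    }


def read_directly(glue_context, s3_path):
    """Read the replicated files straight from S3, bypassing the catalog.

    glue_context is an awsglue.context.GlueContext created in the Glue job.
    """
    return glue_context.create_dynamic_frame_from_options(
        **direct_read_kwargs(s3_path)
    )
```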

answered a month ago
