Reading dynamo DB based glue catalog table using redshift external table

0

I have created a glue catalog table on top of a dynamo DB table using glue crawler.

I'm trying to read this glue table by creating a external schema in redshift on top of glue database.

Glue catalog and redshift are located in different accounts, so I have configured required IAM permissions (DynamoDB and glue is located on the same account).

Following is the CREATE EXTERNAL SCHEMA used to create the external schema :

create external schema a2dynamodb from data catalog database 'a2-dynamodb' region 'us-east-1' iam_role 'arn:aws:iam::<RedshiftAccountID>:role/CrossAccountGlueToRedshift,arn:aws:iam::<GlueAccountID>:role/GlueDynamoAnalyticsAccountRole';

Once the schema is created based on Glue database, redshift is automatically fetching all the tables in the glue database as external tables under the a2dynamodb external schema.

When I try to query these tables, I get the follwoing error on redshift:

[XX000] ERROR: S3Location [2023-10-27 15:18:41] Detail: [2023-10-27 15:18:41] ----------------------------------------------- [2023-10-27 15:18:41] error: S3Location [2023-10-27 15:18:41] code: 8001 [2023-10-27 15:18:41] context: Invalid S3 Path: arn:aws:dynamodb:us-east-1:<DynamoDBAccountID>:table/Users [2023-10-27 15:18:41] query: 32988533 [2023-10-27 15:18:41] location: s3location.cpp:146 [2023-10-27 15:18:41] process: padbmaster [pid=1073840402]

Folowing is the CREATE TABLE query of the auto fetched external table :

create external table a2dynamodb.users ( devadmin boolean, costcenterid string, systemuser boolean, buildingcode string, costcentername string, fullname string, userid string, email string, status string, username string, managerusername string ) row format serde 'com.amazon.ionhiveserde.IonHiveSerDe' stored as inputformat 'com.amazon.ionhiveserde.formats.IonInputFormat' outputformat 'com.amazon.ionhiveserde.formats.IonOutputFormat' location 'arn:aws:dynamodb:us-east-1:<DynamoDBAccountID>:table/Users' table properties ('CrawlerSchemaDeserializerVersion'='1.0', 'CrawlerSchemaSerializerVersion'='1.0', 'UPDATED_BY_CRAWLER'='A2-DynamoDB-Users');

Why am I getting Invalid S3 path error when the external tables are based on dynamoDB tables?

已提问 7 个月前320 查看次数
2 回答
0
已接受的回答

Use DynamoDB as source for Glue table and direct integration with Amazon Redshift is not yet supported. Alternatively you can get the data synced into Redshift and then query it.

The Redshift COPY command supports loading DynamoDB table into Redshift table. This is good for one-time load. Refer https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-source-dynamodb.html

For continued replication refer https://aws.amazon.com/blogs/database/dynamodb-streams-use-cases-and-design-patterns/ which describes leveraging DynamoDB Streams.

profile pictureAWS
已回答 7 个月前
0

Regarding the continuous replication, an article was recently posted which shows how to use a stored procedure to convert the JSON payload to tables/columns and to even handle schema changes:

https://repost.aws/articles/ARlyMMJAeUQLWZd8p3c1Jz6A/automated-change-data-capture-cdc-data-ingestion-from-dynamodb-to-redshift

profile pictureAWS
专家
Rajiv_G
已回答 6 个月前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则