Is there a way to create a Redshift Table from a Glue table's schema?

0

Athena tables can be created from Glue tables which can have schemas based on crawlers.

**Is it also possible to use the schema of a Glue table to generate a Redshift-compatible CREATE TABLE statement? **

I tried SHOW CREATE TABLE encounter; in Athena. And then I tried plugging in the resulting CREATE TABLE statement in Redshift, but got an error:

ERROR: syntax error at or near "`" Position: 23.

I can go through the statement Athena generated and clean it up to fit Redshift requirements, like taking out the back-ticks, but I'm wondering if there's any more direct way to generate a table based on a Glue table?

This is that CREATE TABLE statement that Athena generated:

CREATE EXTERNAL TABLE `encounter`(
  `resourcetype` string COMMENT 'from deserializer', 
  `id` string COMMENT 'from deserializer', 
  `meta` struct<lastupdated:string,profile:array<string>> COMMENT 'from deserializer', 
  `identifier` array<struct<use:string,system:string,value:string>> COMMENT 'from deserializer', 
  `status` string COMMENT 'from deserializer', 
  `class` struct<system:string,code:string> COMMENT 'from deserializer', 
  `type` array<struct<coding:array<struct<system:string,code:string,display:string>>,text:string>> COMMENT 'from deserializer', 
  `subject` struct<reference:string,display:string> COMMENT 'from deserializer', 
  `participant` array<struct<type:array<struct<coding:array<struct<system:string,code:string,display:string>>,text:string>>,period:struct<start:string,end:string>,individual:struct<reference:string,display:string>>> COMMENT 'from deserializer', 
  `period` struct<start:string,end:string> COMMENT 'from deserializer', 
  `location` array<struct<location:struct<reference:string,display:string>>> COMMENT 'from deserializer', 
  `serviceprovider` struct<reference:string,display:string> COMMENT 'from deserializer', 
  `reasoncode` array<struct<coding:array<struct<system:string,code:string,display:string>>>> COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'paths'='class,id,identifier,location,meta,participant,period,reasonCode,resourceType,serviceProvider,status,subject,type') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bucket/Encounter/'
TBLPROPERTIES (
  'CrawlerSchemaDeserializerVersion'='1.0', 
  'CrawlerSchemaSerializerVersion'='1.0', 
  'UPDATED_BY_CRAWLER'='healthlake-export-crawler', 
  'averageRecordSize'='1561', 
  'classification'='json', 
  'compressionType'='none', 
  'objectCount'='14', 
  'recordCount'='53116', 
  'sizeKey'='83059320', 
  'typeOfData'='file')

Here's an example of what the original data looks like (it's synthetic data, so not PHI):

{
	"period": {
		"start": "2019-11-18T13:53:49-08:00",
		"end": "2019-11-18T14:23:49-08:00"
	},
	"subject": {
		"reference": "Patient/92e36d1e-66a2-4e77-9f50-155f7edf819c",
		"display": "Cyndi533 Bogan287"
	},
	"serviceProvider": {
		"reference": "Organization/3ecb1bdd-03d7-3fd2-b52d-8df2a04f5b0a",
		"display": "SOUTH SHORE SKIN CENTER, LLC"
	},
	"id": "b39745ae-14dd-46b3-9345-2916efa759ad",
	"type": [{
		"coding": [{
			"system": "http://snomed.info/sct",
			"code": "410620009",
			"display": "Well child visit (procedure)"
		}],
		"text": "Well child visit (procedure)"
	}],
	"class": {
		"system": "http://terminology.hl7.org/CodeSystem/v3-ActCode",
		"code": "AMB"
	},
	"participant": [{
		"period": {
			"start": "2019-11-18T13:53:49-08:00",
			"end": "2019-11-18T14:23:49-08:00"
		},
		"individual": {
			"reference": "Practitioner/c51e847b-fcd0-3f98-98a7-7e4274a2e6f3",
			"display": "Dr. Jacquelyne425 O'Reilly797"
		},
		"type": [{
			"coding": [{
				"system": "http://terminology.hl7.org/CodeSystem/v3-ParticipationType",
				"code": "PPRF",
				"display": "primary performer"
			}],
			"text": "primary performer"
		}]
	}],
	"resourceType": "Encounter",
	"status": "finished",
	"meta": {
		"lastUpdated": "2022-04-08T15:40:39.926Z"
	}
}
AWS
asked 2 years ago4883 views
2 Answers
0
Accepted Answer

Hi,

A Glue ETL job can be configured to create tables in the data target. If your goal is to create a table in Redshift and write data to it, consider looking into Glue ETL referenced below. There is an option to have Glue create tables in your data target, so you wouldn't have to write the schema yourself. You could then see the schema by generating the table DDL afterwards in Redshift.

https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/load-data-from-amazon-s3-to-amazon-redshift-using-aws-glue.html

AWS
hamltm
answered 2 years ago
profile picture
EXPERT
reviewed 9 months ago
  • the link is broken it actually open the same repost page.

  • Thank you Fabrizio@AWS, link is fixed.

  • That link gives nothing more than a high-level overview that isn't useful. It says for the target, choose "create tables in your data target" but doesn't say which transform might allow that option because I don't see it in the Apply Mapping or Select Fields transforms. Then for related resources it sends you to the generic AWS Glue documentation where there might possibly be something helpful in the hundreds of pages of documentation.

  • Thanks! So it sounds like the answer is "no, you can't use an existing glue data catalog table, but the end goal can be achieved via a Glue ETL job."

0

Have you considered creating an external schema in Redshift pointing to a database in Glue catalog, so the data can be accessed from Redshift via Redshift Spectrum?

https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html

AWS
Denis_A
answered 2 years ago
profile picture
EXPERT
reviewed 9 months ago
  • The name of the Glue db is healthlakedb. The schema is different for each table in the Glue db. However, I don't see a field for glue table, so this was my command: CREATE EXTERNAL SCHEMA healthlake FROM DATA CATALOG DATABASE 'healthlakedb' REGION 'us-east-1' IAM_ROLE 'default' CREATE EXTERNAL DATABASE IF NOT EXISTS Got this error: "Unknown std exception when calling external catalog API ... error: Assert code: 1000 context: arn.valid() || g_xencppunit - query: 0 location: datacatalogclient.cpp:228 process: padbmaster" Any tips on how this works with Glue tables? Using Redshift Serverless

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions