Is there a way to create a Redshift Table from a Glue table's schema?

0

Athena tables can be created from Glue tables which can have schemas based on crawlers.

**Is it also possible to use the schema of a Glue table to generate a Redshift-compatible CREATE TABLE statement? **

I tried SHOW CREATE TABLE encounter; in Athena. And then I tried plugging in the resulting CREATE TABLE statement in Redshift, but got an error:

ERROR: syntax error at or near "`" Position: 23.

I can go through the statement Athena generated and clean it up to fit Redshift requirements, like taking out the back-ticks, but I'm wondering if there's any more direct way to generate a table based on a Glue table?

This is that CREATE TABLE statement that Athena generated:

CREATE EXTERNAL TABLE `encounter`(
  `resourcetype` string COMMENT 'from deserializer', 
  `id` string COMMENT 'from deserializer', 
  `meta` struct<lastupdated:string,profile:array<string>> COMMENT 'from deserializer', 
  `identifier` array<struct<use:string,system:string,value:string>> COMMENT 'from deserializer', 
  `status` string COMMENT 'from deserializer', 
  `class` struct<system:string,code:string> COMMENT 'from deserializer', 
  `type` array<struct<coding:array<struct<system:string,code:string,display:string>>,text:string>> COMMENT 'from deserializer', 
  `subject` struct<reference:string,display:string> COMMENT 'from deserializer', 
  `participant` array<struct<type:array<struct<coding:array<struct<system:string,code:string,display:string>>,text:string>>,period:struct<start:string,end:string>,individual:struct<reference:string,display:string>>> COMMENT 'from deserializer', 
  `period` struct<start:string,end:string> COMMENT 'from deserializer', 
  `location` array<struct<location:struct<reference:string,display:string>>> COMMENT 'from deserializer', 
  `serviceprovider` struct<reference:string,display:string> COMMENT 'from deserializer', 
  `reasoncode` array<struct<coding:array<struct<system:string,code:string,display:string>>>> COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'paths'='class,id,identifier,location,meta,participant,period,reasonCode,resourceType,serviceProvider,status,subject,type') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bucket/Encounter/'
TBLPROPERTIES (
  'CrawlerSchemaDeserializerVersion'='1.0', 
  'CrawlerSchemaSerializerVersion'='1.0', 
  'UPDATED_BY_CRAWLER'='healthlake-export-crawler', 
  'averageRecordSize'='1561', 
  'classification'='json', 
  'compressionType'='none', 
  'objectCount'='14', 
  'recordCount'='53116', 
  'sizeKey'='83059320', 
  'typeOfData'='file')

Here's an example of what the original data looks like (it's synthetic data, so not PHI):

{
	"period": {
		"start": "2019-11-18T13:53:49-08:00",
		"end": "2019-11-18T14:23:49-08:00"
	},
	"subject": {
		"reference": "Patient/92e36d1e-66a2-4e77-9f50-155f7edf819c",
		"display": "Cyndi533 Bogan287"
	},
	"serviceProvider": {
		"reference": "Organization/3ecb1bdd-03d7-3fd2-b52d-8df2a04f5b0a",
		"display": "SOUTH SHORE SKIN CENTER, LLC"
	},
	"id": "b39745ae-14dd-46b3-9345-2916efa759ad",
	"type": [{
		"coding": [{
			"system": "http://snomed.info/sct",
			"code": "410620009",
			"display": "Well child visit (procedure)"
		}],
		"text": "Well child visit (procedure)"
	}],
	"class": {
		"system": "http://terminology.hl7.org/CodeSystem/v3-ActCode",
		"code": "AMB"
	},
	"participant": [{
		"period": {
			"start": "2019-11-18T13:53:49-08:00",
			"end": "2019-11-18T14:23:49-08:00"
		},
		"individual": {
			"reference": "Practitioner/c51e847b-fcd0-3f98-98a7-7e4274a2e6f3",
			"display": "Dr. Jacquelyne425 O'Reilly797"
		},
		"type": [{
			"coding": [{
				"system": "http://terminology.hl7.org/CodeSystem/v3-ParticipationType",
				"code": "PPRF",
				"display": "primary performer"
			}],
			"text": "primary performer"
		}]
	}],
	"resourceType": "Encounter",
	"status": "finished",
	"meta": {
		"lastUpdated": "2022-04-08T15:40:39.926Z"
	}
}
2개 답변
0
수락된 답변

Hi,

A Glue ETL job can be configured to create tables in the data target. If your goal is to create a table in Redshift and write data to it, consider looking into Glue ETL referenced below. There is an option to have Glue create tables in your data target, so you wouldn't have to write the schema yourself. You could then see the schema by generating the table DDL afterwards in Redshift.

https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/load-data-from-amazon-s3-to-amazon-redshift-using-aws-glue.html

AWS
hamltm
답변함 2년 전
profile picture
전문가
검토됨 10달 전
  • the link is broken it actually open the same repost page.

  • Thank you Fabrizio@AWS, link is fixed.

  • That link gives nothing more than a high-level overview that isn't useful. It says for the target, choose "create tables in your data target" but doesn't say which transform might allow that option because I don't see it in the Apply Mapping or Select Fields transforms. Then for related resources it sends you to the generic AWS Glue documentation where there might possibly be something helpful in the hundreds of pages of documentation.

  • Thanks! So it sounds like the answer is "no, you can't use an existing glue data catalog table, but the end goal can be achieved via a Glue ETL job."

0

Have you considered creating an external schema in Redshift pointing to a database in Glue catalog, so the data can be accessed from Redshift via Redshift Spectrum?

https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html

AWS
Denis_A
답변함 2년 전
profile picture
전문가
검토됨 10달 전
  • The name of the Glue db is healthlakedb. The schema is different for each table in the Glue db. However, I don't see a field for glue table, so this was my command: CREATE EXTERNAL SCHEMA healthlake FROM DATA CATALOG DATABASE 'healthlakedb' REGION 'us-east-1' IAM_ROLE 'default' CREATE EXTERNAL DATABASE IF NOT EXISTS Got this error: "Unknown std exception when calling external catalog API ... error: Assert code: 1000 context: arn.valid() || g_xencppunit - query: 0 location: datacatalogclient.cpp:228 process: padbmaster" Any tips on how this works with Glue tables? Using Redshift Serverless

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠