Athena tables can be created from Glue tables which can have schemas based on crawlers.
**Is it also possible to use the schema of a Glue table to generate a Redshift-compatible CREATE TABLE
statement? **
I tried SHOW CREATE TABLE encounter;
in Athena. And then I tried plugging in the resulting CREATE TABLE
statement in Redshift, but got an error:
ERROR: syntax error at or near "`" Position: 23.
I can go through the statement Athena generated and clean it up to fit Redshift requirements, like taking out the back-ticks, but I'm wondering if there's any more direct way to generate a table based on a Glue table?
This is that CREATE TABLE
statement that Athena generated:
CREATE EXTERNAL TABLE `encounter`(
`resourcetype` string COMMENT 'from deserializer',
`id` string COMMENT 'from deserializer',
`meta` struct<lastupdated:string,profile:array<string>> COMMENT 'from deserializer',
`identifier` array<struct<use:string,system:string,value:string>> COMMENT 'from deserializer',
`status` string COMMENT 'from deserializer',
`class` struct<system:string,code:string> COMMENT 'from deserializer',
`type` array<struct<coding:array<struct<system:string,code:string,display:string>>,text:string>> COMMENT 'from deserializer',
`subject` struct<reference:string,display:string> COMMENT 'from deserializer',
`participant` array<struct<type:array<struct<coding:array<struct<system:string,code:string,display:string>>,text:string>>,period:struct<start:string,end:string>,individual:struct<reference:string,display:string>>> COMMENT 'from deserializer',
`period` struct<start:string,end:string> COMMENT 'from deserializer',
`location` array<struct<location:struct<reference:string,display:string>>> COMMENT 'from deserializer',
`serviceprovider` struct<reference:string,display:string> COMMENT 'from deserializer',
`reasoncode` array<struct<coding:array<struct<system:string,code:string,display:string>>>> COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'paths'='class,id,identifier,location,meta,participant,period,reasonCode,resourceType,serviceProvider,status,subject,type')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://bucket/Encounter/'
TBLPROPERTIES (
'CrawlerSchemaDeserializerVersion'='1.0',
'CrawlerSchemaSerializerVersion'='1.0',
'UPDATED_BY_CRAWLER'='healthlake-export-crawler',
'averageRecordSize'='1561',
'classification'='json',
'compressionType'='none',
'objectCount'='14',
'recordCount'='53116',
'sizeKey'='83059320',
'typeOfData'='file')
Here's an example of what the original data looks like (it's synthetic data, so not PHI):
{
"period": {
"start": "2019-11-18T13:53:49-08:00",
"end": "2019-11-18T14:23:49-08:00"
},
"subject": {
"reference": "Patient/92e36d1e-66a2-4e77-9f50-155f7edf819c",
"display": "Cyndi533 Bogan287"
},
"serviceProvider": {
"reference": "Organization/3ecb1bdd-03d7-3fd2-b52d-8df2a04f5b0a",
"display": "SOUTH SHORE SKIN CENTER, LLC"
},
"id": "b39745ae-14dd-46b3-9345-2916efa759ad",
"type": [{
"coding": [{
"system": "http://snomed.info/sct",
"code": "410620009",
"display": "Well child visit (procedure)"
}],
"text": "Well child visit (procedure)"
}],
"class": {
"system": "http://terminology.hl7.org/CodeSystem/v3-ActCode",
"code": "AMB"
},
"participant": [{
"period": {
"start": "2019-11-18T13:53:49-08:00",
"end": "2019-11-18T14:23:49-08:00"
},
"individual": {
"reference": "Practitioner/c51e847b-fcd0-3f98-98a7-7e4274a2e6f3",
"display": "Dr. Jacquelyne425 O'Reilly797"
},
"type": [{
"coding": [{
"system": "http://terminology.hl7.org/CodeSystem/v3-ParticipationType",
"code": "PPRF",
"display": "primary performer"
}],
"text": "primary performer"
}]
}],
"resourceType": "Encounter",
"status": "finished",
"meta": {
"lastUpdated": "2022-04-08T15:40:39.926Z"
}
}
the link is broken it actually open the same repost page.
Thank you Fabrizio@AWS, link is fixed.
That link gives nothing more than a high-level overview that isn't useful. It says for the target, choose "create tables in your data target" but doesn't say which transform might allow that option because I don't see it in the Apply Mapping or Select Fields transforms. Then for related resources it sends you to the generic AWS Glue documentation where there might possibly be something helpful in the hundreds of pages of documentation.
Thanks! So it sounds like the answer is "no, you can't use an existing glue data catalog table, but the end goal can be achieved via a Glue ETL job."