How do I create a schema/table in a Redshift cluster from inside a Glue script?


I have already set up a Redshift connection in AWS Glue. My project requires creating schemas and tables dynamically from within the Glue script. Assuming I already have the field names that should become the table's column names, how do I go about it?

I tried the script below in a Glue Jupyter notebook.

%idle_timeout 2880
%glue_version 4.0
%worker_type G.1X
%number_of_workers 5
%connections redshift

import sys, boto3
from awsglue.transforms import *
from pyspark.context import SparkContext
from awsglue.context import GlueContext
  
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

redshift_connection = boto3.client('redshift')

spark.sql('CREATE TABLE IF NOT EXISTS public.test (id int)')

The last command fails with the error below.

AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to glue.ap-south-1.amazonaws.com:443 [glue.ap-south-1.amazonaws.com/43.204.239.230, glue.ap-south-1.amazonaws.com/13.234.208.37, glue.ap-south-1.amazonaws.com/13.232.151.228] failed: connect timed out

How can I achieve this? Please suggest some approaches. Thanks in advance.

Asked 7 months ago, viewed 334 times
2 Answers

Hello.

The error you received is a connection timeout. It most likely comes from networking or permissions rather than from the SQL statement itself, and it has to be resolved before any SQL statement can run.

Ensure that your Glue job or development endpoint has network access to your Redshift cluster. You may need to adjust your VPC or security group rules.

Make sure your Glue IAM role has the permissions it needs to interact with Redshift. Attach the relevant Redshift policies to the IAM role that Glue assumes.
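Once connectivity and permissions are sorted out, the DDL itself still has to be sent to Redshift. Note that spark.sql(...) talks to the Spark metastore (here the Glue Data Catalog, as the error shows), so it would not create a table in Redshift. One possible approach, sketched below, is to build the statements from the field names and submit them through the Redshift Data API. The helper names (build_ddl, run_ddl) and the cluster, database and user values are hypothetical; the sketch assumes the role is allowed to call the Data API and that the redshift-data endpoint is reachable from the session.

```python
import re

# Only plain identifiers are accepted, so field names cannot inject SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_ident(name):
    """Return a double-quoted identifier, rejecting anything unusual."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def build_ddl(schema, table, columns):
    """Build CREATE SCHEMA / CREATE TABLE statements.

    columns maps column name -> Redshift type string. The type strings
    are NOT validated here and must come from trusted code.
    """
    if not columns:
        raise ValueError("at least one column is required")
    cols = ", ".join(f"{quote_ident(n)} {t}" for n, t in columns.items())
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    return [
        f"CREATE SCHEMA IF NOT EXISTS {quote_ident(schema)}",
        f"CREATE TABLE IF NOT EXISTS {target} ({cols})",
    ]


def run_ddl(client, statements, cluster_id, database, db_user):
    """Submit the statements in order as one batch; returns the statement Id.

    client is expected to be boto3.client("redshift-data"). The call is
    asynchronous, so the Id should be polled with describe_statement.
    """
    resp = client.batch_execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sqls=list(statements),
    )
    return resp["Id"]


# Usage inside the Glue script (all values below are placeholders):
#   import boto3
#   client = boto3.client("redshift-data")
#   ddl = build_ddl("staging", "test", {"id": "INT"})
#   statement_id = run_ddl(client, ddl, "my-cluster", "dev", "awsuser")
```

The client is passed in rather than created inside the helper so the statement-building logic can be checked without AWS access.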

Regards, Andrii

Expert, answered 7 months ago. Reviewed by an Expert 5 days ago.

Glue cannot reach the Data Catalog because the "redshift" connection places the session in a VPC that has no connectivity to the internet.
The best solution is to add a VPC endpoint for the regional Glue service (glue.ap-south-1.amazonaws.com in this case) to that VPC.
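As a rough illustration of that fix, the sketch below assembles the parameters for an interface endpoint to the Glue service and shows, in comments, how they might be passed to the EC2 create_vpc_endpoint call. The helper name and the VPC, subnet and security group IDs are placeholders. It assumes the call is made from somewhere that can reach the EC2 API (not from the isolated session itself) and that DNS support and DNS hostnames are enabled on the VPC, which private DNS requires.

```python
def glue_endpoint_request(region, vpc_id, subnet_ids, security_group_ids):
    """Build keyword arguments for ec2.create_vpc_endpoint (Glue interface endpoint)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Regional service name of the Glue API endpoint.
        "ServiceName": f"com.amazonaws.{region}.glue",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Lets glue.<region>.amazonaws.com resolve to the endpoint inside the VPC.
        "PrivateDnsEnabled": True,
    }


# Usage (placeholder IDs; run with credentials allowed to create endpoints):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="ap-south-1")
#   ec2.create_vpc_endpoint(**glue_endpoint_request(
#       "ap-south-1", "vpc-EXAMPLE", ["subnet-EXAMPLE"], ["sg-EXAMPLE"]))
```

The security group attached to the endpoint needs to allow inbound HTTPS (port 443) from the subnets the Glue session runs in.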

AWS, Expert, answered 7 months ago
