How do I create a schema/table in Redshift cluster from inside a Glue script?

0

I have established a Redshift connection in AWS Glue already. My project would require me to create tables and schemas dynamically on the fly using the Glue script. Supposing that I have already got the field names to populate to column names in the table, how do I go about that?

I tried below script in Jupyter Notebook platform of Glue.

%idle_timeout 2880
%glue_version 4.0
%worker_type G.1X
%number_of_workers 5
%connections redshift

import sys, boto3
from awsglue.transforms import *
from pyspark.context import SparkContext
from awsglue.context import GlueContext
  
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

redshift_connection = boto3.client('redshift')

spark.sql('CREATE TABLE IF NOT EXISTS public.test (id int)')

The last command gives below error.

AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to glue.ap-south-1.amazonaws.com:443 [glue.ap-south-1.amazonaws.com/43.204.239.230, glue.ap-south-1.amazonaws.com/13.234.208.37, glue.ap-south-1.amazonaws.com/13.232.151.228] failed: connect timed out

What do I do to achieve the goal? Assist me with the approaches. Thanks in advance.

질문됨 7달 전333회 조회
2개 답변
1

Hello.

The error message you received is related to a connectivity timeout, which may not be directly related to your SQL query but instead due to issues related to networking or permissions. This needs to be resolved first before executing SQL statements.

Ensure that your Glue jobs/development endpoint has network access to your Redshift cluster. You may need to adjust your VPC/security group rules.

Make sure your Glue IAM role has the necessary permissions to interact with Redshift. Attach the necessary policies related to Redshift to the IAM role assumed by Glue.

Regards, Andrii

profile picture
전문가
답변함 7달 전
profile picture
전문가
검토됨 3일 전
1

Glue cannot reach the catalog, the "redshift" connection has a VPC without connectivity to the internet.
The best solution is to add the regional glue service endpoint to the VPC.

profile pictureAWS
전문가
답변함 7달 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠