Auto-detect schema for parquet data load


Hi, I'm trying to load a Parquet file into Redshift; I've tried both a local file and one in S3. I've been using the Load Data tool in the Redshift query editor v2 with the "Load new table" option (Create table with detected schema), but Redshift seems unable to detect the Parquet schema: no columns are inferred automatically.

Is there a way to create a table from a file (Parquet or CSV) without having to specify the table schema manually?

Thanks

RobinF
asked 8 months ago · 308 views
1 Answer

Hi there,

Another option for inferring schemas of files that reside in S3 is to use an AWS Glue Crawler.

Once the S3-based files have been crawled, table entries will appear in the AWS Glue Data Catalog, which can be made visible in Redshift by creating an EXTERNAL SCHEMA with the FROM DATA CATALOG clause.
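As a minimal sketch, assuming a Glue database named my_glue_db and an IAM role attached to the cluster that can read the Data Catalog and the underlying S3 data (both names below are placeholders), the external schema could be created roughly like this:

    -- Map a Glue Data Catalog database into Redshift as an external schema
    CREATE EXTERNAL SCHEMA spectrum_schema
    FROM DATA CATALOG
    DATABASE 'my_glue_db'                                         -- Glue database populated by the crawler (placeholder name)
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftSpectrumRole'  -- placeholder role ARN
    CREATE EXTERNAL DATABASE IF NOT EXISTS;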

Once the external schema is created, you can begin querying the crawled tables inside Redshift. To create a physical copy of the external tables in Redshift, you can run a CTAS statement.
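For example, reusing the hypothetical names above (spectrum_schema.my_crawled_table stands in for whatever table name the crawler registered), a CTAS copy might look like:

    -- Materialize the external (crawled) table as a native Redshift table
    CREATE TABLE public.my_local_table AS
    SELECT *
    FROM spectrum_schema.my_crawled_table;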

Any subsequent tables crawled will appear within Redshift for querying (as long as they are mapped to the same Glue Database).

I hope this helps!

AWS
EXPERT
answered 8 months ago
