Auto-detect schema for parquet data load


Hi, I'm trying to load a Parquet file into Redshift, and I've tried both a local file and one in S3. I've been using the Load Data tool in the Redshift query editor v2 with the "Load new table" option (Create table with detected schema), but Redshift seems unable to detect the Parquet schema: no columns are automatically inferred.

Is there a way to create a table from a file (Parquet or CSV) without having to specify the table schema manually?

Thanks

RobinF
Asked 9 months ago · 325 views
1 Answer

Hi there,

Another option for inferring schemas of files that reside in S3 is to use an AWS Glue Crawler.

Once the S3-based files have been crawled, table entries will appear in the AWS Glue Data Catalog, which can be made visible in Redshift through creation of an EXTERNAL SCHEMA using the 'DATA CATALOG' keyword.
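As an illustrative sketch, the external schema creation could look like the following; the schema name, Glue database name, and IAM role ARN are placeholders you would replace with your own:

```sql
-- Expose the Glue Data Catalog database that the crawler populated.
-- 'spectrum_schema', 'my_glue_database', and the IAM role ARN are
-- assumed names; substitute your own values.
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'my_glue_database'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftSpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
```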

Once the external schema is created, you can begin querying the crawled tables inside Redshift. To create a physical copy of an external table in Redshift, you can run a CTAS (CREATE TABLE AS SELECT) statement, as sketched below.
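A rough sketch of such a CTAS, assuming a crawled table named my_parquet_table in the external schema from the previous example:

```sql
-- Materialize the external (S3-backed) table as a regular Redshift table.
-- Table and schema names are assumptions; replace them with the names
-- the crawler produced in your Glue database.
CREATE TABLE public.my_parquet_table AS
SELECT *
FROM spectrum_schema.my_parquet_table;
```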

Any subsequent tables crawled will appear within Redshift for querying (as long as they are mapped to the same Glue Database).

I hope this helps!

AWS
EXPERT
Answered 9 months ago
