Auto-detect schema for parquet data load

Hi, I'm trying to load a Parquet file into Redshift, both from a local file and from S3, using the Load Data tool in Redshift query editor v2. I'm using "Load new table" (create table with detected schema), but Redshift seems unable to detect the Parquet schema: no columns are inferred automatically.

Is there a way to create a table from a file (Parquet or CSV) without having to specify the table schema manually?

Thanks

RobinF
asked 9 months ago, 324 views
1 Answer
Hi there,

Another option for inferring schemas of files that reside in S3 is to use an AWS Glue Crawler.
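As a rough sketch of setting up such a crawler programmatically, the snippet below builds the request for the Glue `create_crawler` call in boto3 and then starts the crawler. The crawler name, role ARN, Glue database, and S3 path are all hypothetical placeholders, and the role is assumed to already have permission to read the bucket and write to the Data Catalog.

```python
def crawler_request(name, role_arn, glue_database, s3_path):
    """Build keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": glue_database,
        # The crawler scans this S3 prefix and infers the file schema.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request):
    """Create the crawler and run it once (needs boto3 and AWS credentials)."""
    import boto3  # third-party dependency, imported lazily

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# All values below are hypothetical placeholders.
request = crawler_request(
    name="parquet-demo-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    glue_database="demo_glue_db",
    s3_path="s3://example-bucket/parquet/",
)
# create_and_start_crawler(request)  # uncomment to run against a real account
```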

Once the S3-based files have been crawled, table entries appear in the AWS Glue Data Catalog. You can make them visible in Redshift by creating an external schema with CREATE EXTERNAL SCHEMA and the 'DATA CATALOG' keyword.
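For illustration, the statement could look roughly like the one generated below. The schema name, Glue database name, and IAM role ARN are hypothetical placeholders, and the role is assumed to be attached to the Redshift cluster with access to Glue and S3.

```python
def external_schema_sql(schema, glue_database, iam_role_arn):
    """Return a CREATE EXTERNAL SCHEMA statement backed by the Glue Data Catalog.

    The arguments are interpolated directly, so pass trusted values only.
    """
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


# All values below are hypothetical placeholders.
schema_sql = external_schema_sql(
    schema="spectrum_demo",
    glue_database="demo_glue_db",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
)
print(schema_sql)
```

The printed statement can be pasted into query editor v2 or run through any Redshift SQL client.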

Once the external schema is created, you can begin querying the crawled tables inside Redshift. To create a physical copy of an external table in Redshift, you can run a CREATE TABLE AS (CTAS) statement.
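A minimal sketch of such a CTAS statement is generated below. The target table, external schema, and crawled table names are hypothetical; in practice the crawled table name comes from whatever the Glue Crawler registered.

```python
def ctas_sql(target_table, external_schema, external_table):
    """Return a CTAS statement that copies an external table into Redshift.

    The new table takes its column names and types from the SELECT result,
    so no schema has to be written by hand. Pass trusted identifiers only.
    """
    return (
        f"CREATE TABLE {target_table} AS\n"
        f"SELECT * FROM {external_schema}.{external_table};"
    )


# All names below are hypothetical placeholders.
copy_sql = ctas_sql("public.sales_local", "spectrum_demo", "sales_parquet")
print(copy_sql)
```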

Any subsequent tables crawled will appear within Redshift for querying (as long as they are mapped to the same Glue Database).

I hope this helps!

AWS
EXPERT
answered 9 months ago