Issue with Redshift Spectrum/Glue Crawler - Unintended Column Splitting on strings

0

Hey,

We have a Glue crawler crawling a series of CSVs in S3 and capturing this in a database. This is surfaced in Redshift via Spectrum Schema.

The problem we have is that in Redshift, the delimiter/quote is not being respected for commas within quotes causing unintended splits in a column of text: Enter image description here

So far, we have attempted to add a classifier to the crawler as such Enter image description here

which did not resolve the problem, the table property here is currently quoteclassifer true as I manually set it, but, the crawler runs and overrides this value back to false. Enter image description here

We also tried changing the datatypes from string to varchar(1000) in the JSON Schema but this did not seem to work either.

This is not an issue when you open the CSV in Excel/Notepad/VSCode.

OmniFlo
demandé il y a 5 mois166 vues
1 réponse
0

Greetings from AWS! I understand that Glue crawler reset "areColumnsQuoted" parameter each time it runs, such caused Redshift Spectrum cannot split your csv data correctly. To fix this issue, after manually changed the table parameter in your Redshift Spectrum table, you can try editing your glue crawler --> "Set output and scheduling" --> "Output configuration" --> "Advanced options" --> select "Add new columns only" and enable "Update all new and existing partitions with metadata from the table" --> Save the changes. By applying this configuration, the crawler will inherit existing table parameters from your exiting table and will not reset/override them.

Ref: https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html#crawler-configure-changes-console

AWS
Ethan_H
répondu il y a 5 mois

Vous n'êtes pas connecté. Se connecter pour publier une réponse.

Une bonne réponse répond clairement à la question, contient des commentaires constructifs et encourage le développement professionnel de la personne qui pose la question.

Instructions pour répondre aux questions