1 Answer
- Check Classifier Settings: Ensure that areColumnsQuoted = true is correctly set.
- Use OpenCSV SerDe: Explicitly define the quote character in the SerDe settings if you are using a custom table definition.
- ETL Post-processing: Consider stripping quotes within your Glue ETL job if they persist.
- Preprocess CSVs: As a last resort, preprocess the CSVs to remove quotes before crawling.
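The last two options above can be sketched outside of Glue with Python's standard `csv` module (the function name and sample data are illustrative): parsing with `quotechar='"'` strips the enclosing double quotes, and re-writing with `QUOTE_MINIMAL` only re-quotes fields that actually need it.

```python
import csv
import io

def strip_quotes(raw_csv: str) -> str:
    """Re-write quoted CSV so values no longer carry enclosing double quotes."""
    reader = csv.reader(io.StringIO(raw_csv), quotechar='"')
    out = io.StringIO()
    # QUOTE_MINIMAL keeps quotes only where required (e.g. embedded commas)
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

raw = '"id","name"\r\n"1","Alice"\r\n'
print(strip_quotes(raw))  # id,name / 1,Alice without the quotes
```

In a Glue ETL job the same idea would run over objects in S3 before the crawler sees them; this sketch just shows the quote-stripping step itself.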

Update: It worked for me.
How did this work? I referred to this resource: https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html#classifier-built-in
> The built-in CSV classifier creates tables referencing the LazySimpleSerDe as the serialization library, which is a good choice for type inference. However, if the CSV data contains quoted strings, edit the table definition and change the SerDe library to OpenCSVSerDe. Adjust any inferred types to STRING, set the SchemaChangePolicy to LOG, and set the partitions output configuration to InheritFromTable for future crawler runs. For more information about SerDe libraries, see SerDe Reference in the Amazon Athena User Guide.
I had to set the right CSV SerDe, even though I didn't want to add a specific classifier for this crawler. The right one was org.apache.hadoop.hive.serde2.OpenCSVSerde. Setting this as the SerDe serialization library removed the double quotes I was getting earlier, and it worked.
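For reference, the table-definition change described above can also be expressed as Athena DDL rather than edited in the console. The table name, columns, and S3 location below are placeholders; the `quoteChar` SerDe property is what makes OpenCSVSerde strip the enclosing quotes.

```sql
CREATE EXTERNAL TABLE my_quoted_csv (       -- hypothetical table name
  id   STRING,                              -- OpenCSVSerde reads all columns as STRING
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
LOCATION 's3://my-bucket/my-prefix/'        -- hypothetical location
TBLPROPERTIES ('skip.header.line.count' = '1');
```

Note that, as the quoted documentation says, any types the crawler inferred should be adjusted to STRING when switching to OpenCSVSerde.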