Glue Crawler has options for schema-evolution scenarios. To have new columns added to the table whenever new columns appear in the files, follow these steps:

- Make sure you choose "Crawl all folders".
- Under the advanced options in "Set output and scheduling", choose "Add new columns only" or "Update the table definition in the data catalog".
In case your column names are not being detected correctly, you may also need to create a custom classifier and attach it to this crawler. Here are the options you need to choose for the classifier:
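The crawler settings above can also be applied programmatically. Here is a minimal sketch using boto3's `update_crawler`, assuming an existing crawler and configured AWS credentials; the crawler name is a placeholder:

```python
import json

# Crawler Configuration JSON mirroring the console option above:
# "MergeNewColumns" corresponds to "Add new columns only".
CRAWLER_CONFIG = {
    "Version": 1.0,
    "CrawlerOutput": {
        "Tables": {"AddOrUpdateBehavior": "MergeNewColumns"},
    },
}


def configure_crawler(crawler_name: str) -> None:
    """Apply the schema-evolution settings to an existing crawler."""
    import boto3  # assumes credentials and region are configured

    glue = boto3.client("glue")
    glue.update_crawler(
        Name=crawler_name,  # hypothetical crawler name
        SchemaChangePolicy={
            # "UPDATE_IN_DATABASE" = update the table definition in the catalog
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
        Configuration=json.dumps(CRAWLER_CONFIG),
    )
```

You would call `configure_crawler("my-crawler")` once, then re-run the crawler after new columns land in the source files.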
Thanks for bringing this scenario up. Could you try running a few tests using Parquet files and see if that works for your use case?
Is this a change I can make in the crawler?
Thanks for the follow-up question. You can use Glue or your own transform to convert from CSV to Parquet. For testing, maybe try https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html and see how it works for your case.
I've tried with these exact settings, deleting the table and letting the crawler recreate it, but the columns when queried in Athena still have the wrong data in them.