AWS Glue Crawler has options that support schema evolution scenarios. To have new columns added to the table whenever new columns appear in the files, follow these steps:
- Make sure you choose "Crawl all folders".
- Choose "Add new columns only" or "Update the table definition in the data catalog" under the advanced options in "Set output and scheduling".
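The console settings above correspond to crawler API parameters. A minimal sketch of the equivalent configuration, assuming `boto3` is available; the crawler name, role ARN, database name, and S3 path are placeholders:

```python
import json

# Crawler configuration matching the console options above.
# "Crawl all folders"                       -> RecrawlPolicy: CRAWL_EVERYTHING
# "Update the table definition..."          -> SchemaChangePolicy: UPDATE_IN_DATABASE
# "Add new columns only"                    -> CrawlerOutput: MergeNewColumns
crawler_params = {
    "Name": "my-schema-evolution-crawler",                      # placeholder
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",   # placeholder
    "DatabaseName": "my_database",                              # placeholder
    "Targets": {"S3Targets": [{"Path": "s3://my-bucket/data/"}]},  # placeholder
    "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_EVERYTHING"},
    "SchemaChangePolicy": {
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",
    },
    # The Configuration field is a JSON string, not a nested dict.
    "Configuration": json.dumps({
        "Version": 1.0,
        "CrawlerOutput": {
            "Tables": {"AddOrUpdateBehavior": "MergeNewColumns"}
        },
    }),
}

# With credentials configured, the crawler would then be created with:
#   import boto3
#   boto3.client("glue").create_crawler(**crawler_params)
print(json.dumps(crawler_params, indent=2))
```

Updating an existing crawler takes the same parameters via `update_crawler`.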
If your column names are not updated correctly, you may also need to create a classifier and link that classifier to this crawler. Here are the options you need to choose for the classifier:
Thanks for bringing this scenario up. Is it possible for you to perform a few tests using Parquet files and see if that works for your use case?
Is this a change I can make in the crawler?
Thanks for the follow-up question. You can use Glue or your own transform to convert from CSV to Parquet. For testing, you might try https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/three-aws-glue-etl-job-types-for-converting-data-to-apache-parquet.html and see how it works for your case.
I've tried with these exact settings, deleting the table and letting the crawler recreate it, but the columns when queried in Athena still have the wrong data in them.