Glue Data Catalog schema not updated

Hi everyone. Should the crawler update a table schema if the data source schema has changed? For example, I have a Parquet file with data. One field has the datatype "double". The Parquet file is created by a Glue job. I updated the Glue job and set "decimal" instead of "double". When I run the job, it finishes with a 'succeeded' status. Then I run the crawler, but the table still has the "double" datatype. Only if I delete the table and run the crawler again does it create a new table with the proper datatype, "decimal". Is this the correct behavior, or am I doing something wrong?

Anton
asked 7 months ago · 423 views
2 answers
Accepted Answer

Yes, IF you configured the crawler appropriately. See https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html for details.
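For example, the crawler's schema change policy controls whether detected changes are written back to the catalog. A minimal boto3 sketch (the crawler name "my-crawler" is a placeholder):

```python
import boto3

glue = boto3.client("glue")

# Configure the crawler to update existing tables in place when the
# source schema changes, instead of leaving the catalog untouched.
# "my-crawler" is a placeholder name.
glue.update_crawler(
    Name="my-crawler",
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",  # apply schema changes to the table
        "DeleteBehavior": "LOG",                 # only log removed objects
    },
)
```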

Also, it could be possible that you do NOT need to run the crawler separately, and instead update the catalog as a part of your pySpark script. See https://docs.aws.amazon.com/glue/latest/dg/update-from-job.html
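As a rough sketch of that approach (database, table, and S3 path are assumptions, and `glueContext` comes from the standard Glue job boilerplate), the job can write through a catalog-aware sink with `enableUpdateCatalog` so the table schema is updated as part of the write:

```python
# Inside a Glue PySpark job: write the DynamicFrame through a sink that
# updates the Data Catalog at write time, so no separate crawler run is needed.
# "my_db", "my_table", and the S3 path are placeholder names.
sink = glueContext.getSink(
    connection_type="s3",
    path="s3://my-bucket/my-prefix/",
    enableUpdateCatalog=True,
    updateBehavior="UPDATE_IN_DATABASE",
)
sink.setCatalogInfo(catalogDatabase="my_db", catalogTableName="my_table")
sink.setFormat("glueparquet")
sink.writeFrame(dynamic_frame)
```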

answered 7 months ago
Expert
reviewed 3 days ago

Decimal and double are not really compatible types in the way they are serialized; even decimals with different precisions are not.
Whichever type the crawler decides to use for the table, that table will likely be broken if the underlying files contain a mixture of those types, and some numbers will be read incorrectly.
To make that kind of change you should recreate the table with the new type, after deleting the old files (see the sketch below). Also, if you generate the data in your own job, the job should update the table during writing; there is no point in waiting for a crawler to make a guess when the job already knows the schema.
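To illustrate the rewrite, a minimal PySpark sketch, assuming the column name, decimal precision/scale, and paths shown here (all placeholders), that casts the column and rewrites the full dataset so every file shares the new schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

spark = SparkSession.builder.getOrCreate()

# Read the existing data, cast the column from double to decimal,
# and rewrite the whole dataset so no old-schema files remain.
# Column name, precision/scale, and paths are placeholders.
df = spark.read.parquet("s3://my-bucket/old-prefix/")
df = df.withColumn("amount", col("amount").cast(DecimalType(18, 2)))
df.write.mode("overwrite").parquet("s3://my-bucket/new-prefix/")
```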

AWS
Expert
answered 7 months ago
