1 Answer
Hi,
AWS Glue Crawlers automatically discover the schema of data in Amazon S3 and other data sources, and they also help capture schema evolution over time.
If your schema is fixed (does not change often), is already known, and you have no trouble creating your tables manually via the console or programmatically through the APIs, then you do not need to use them.
Also consider that Crawlers have a cost, so cost optimization is another reason to skip them if you are comfortable managing the schemas of your datasets yourself.
For additional information on Crawlers, you can refer to this section of the AWS Glue documentation.
Hope this helps.
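If you do manage schemas yourself, a table can be registered in the Glue Data Catalog directly with the `create_table` API (via boto3). Here is a minimal sketch; the database name `sales_db`, the table name, columns, and S3 path are hypothetical examples, and the helper only builds the request payload, with the actual boto3 call shown commented out:

```python
# Sketch: registering a Glue Data Catalog table without a Crawler.
# All names, columns, and S3 locations below are hypothetical examples.

def build_table_input(name, columns, s3_location):
    """Build the TableInput payload for glue.create_table().

    columns: list of (name, type) tuples, e.g. [("id", "bigint")].
    """
    return {
        "Name": name,
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": s3_location,
            # CSV example; swap the SerDe/format settings for Parquet, JSON, etc.
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": ","},
            },
        },
        "TableType": "EXTERNAL_TABLE",
    }


table_input = build_table_input(
    "orders",
    [("order_id", "bigint"), ("amount", "double"), ("order_date", "date")],
    "s3://my-example-bucket/orders/",
)

# With AWS credentials configured, the registration call would be:
# import boto3
# glue = boto3.client("glue")
# glue.create_table(DatabaseName="sales_db", TableInput=table_input)
```

Keeping the payload construction separate from the API call makes the schema easy to review and version-control, which is the main thing a Crawler would otherwise manage for you.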
As Fabrizio correctly said, you only need to run the AWS Glue Crawler again if your schema changes. Also, ETL jobs support bookmarking, which is recommended when your data grows day by day: with bookmarks enabled, each job run processes only new data instead of re-running the ETL operations over data it has already processed.
Read more at: https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
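Job bookmarks are turned on through the job's default arguments (`--job-bookmark-option`, a documented Glue special parameter). A minimal sketch, where the job name, IAM role, and script path are hypothetical; the helper builds the `create_job` payload, and the boto3 call that would use it is shown commented out:

```python
# Sketch: enabling job bookmarks on a Glue ETL job.
# The job name, IAM role ARN, and script location are hypothetical examples.

def build_job_definition(script_location):
    """Build a glue.create_job() payload with bookmarks enabled."""
    return {
        "Name": "orders-etl",  # hypothetical job name
        "Role": "arn:aws:iam::123456789012:role/GlueJobRole",  # hypothetical role
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # "job-bookmark-enable" tells Glue to track processed data,
            # so subsequent runs only pick up new input.
            "--job-bookmark-option": "job-bookmark-enable",
        },
    }


job_definition = build_job_definition(
    "s3://my-example-bucket/scripts/orders_etl.py"
)

# With AWS credentials configured:
# import boto3
# glue = boto3.client("glue")
# glue.create_job(**job_definition)
#
# Note: inside the ETL script itself, each source must pass a
# transformation_ctx for bookmarks to take effect, e.g.:
# dyf = glueContext.create_dynamic_frame.from_catalog(
#     database="sales_db", table_name="orders",
#     transformation_ctx="orders_src")
```

The same `--job-bookmark-option` value can also be set per run via `start_job_run` arguments, or switched to `job-bookmark-disable` / `job-bookmark-pause` as described in the linked documentation.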