1 Answer
Hey Dave,
It sounds like a perfect use case for Glue, especially since the data volumes and the concurrency are not too massive.
- A Glue job accepts input values at runtime as parameters passed into the job. Parameters can be reliably retrieved inside the ETL script using AWS Glue's getResolvedOptions function.
- You can control the capacity allocated to each job with Data Processing Units (DPUs), and cap how many executions run simultaneously with the job's maximum concurrent runs setting, so only the number of jobs you need will run at once.
- For orchestrating more complex ETL graphs, check out the workflows documentation; it describes an architecture similar to what you described in 6 ( https://docs.aws.amazon.com/glue/latest/dg/orchestrate-using-workflows.html ). In addition, you can chain job triggers and CloudWatch notifications, as in the following example: https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/
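To make the first point concrete: Glue passes job parameters to the script on the command line as `--name value` pairs, and getResolvedOptions pulls them out of `sys.argv`. Since the real function lives in the `awsglue` library (only available inside a Glue runtime), the sketch below uses a hypothetical stand-in, `resolve_options`, just to illustrate the argument format; in an actual job you would call `awsglue.utils.getResolvedOptions` instead.

```python
import sys

def resolve_options(argv, expected):
    """Hypothetical stand-in for awsglue.utils.getResolvedOptions.
    Glue hands job parameters to the script as --name value pairs."""
    opts = {}
    it = iter(argv[1:])
    for token in it:
        if token.startswith('--'):
            name = token[2:]
            if name in expected:
                opts[name] = next(it)
    missing = [e for e in expected if e not in opts]
    if missing:
        raise KeyError(f"missing required job parameters: {missing}")
    return opts

# Simulated argv for a job started with:
#   --JOB_NAME my-etl --s3_source_path s3://bucket/key
argv = ['script.py', '--JOB_NAME', 'my-etl',
        '--s3_source_path', 's3://bucket/key']
args = resolve_options(argv, ['JOB_NAME', 's3_source_path'])
print(args['s3_source_path'])
```

When you start the job run (console, CLI, or a trigger), you supply the same parameter names under the job's arguments, and the script reads them this way.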
Good luck! Ido
Answered 5 years ago
Hi Dave,
Thank you for your response. I still have some doubts regarding the parameters you mentioned in the first point, and I believe you can assist me with that. Currently, I am using a Glue Workflow to process my data, which is triggered by an Event using EventBridge. This workflow starts with a Glue Job that fetches the last file uploaded to the S3 bucket being monitored, which triggers the event. However, I'm unsure how to adapt my Glue Job to get the exact file that triggered the workflow. I am concerned about how the workflow will handle multiple files being uploaded simultaneously.
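One relevant detail here: the S3 "Object Created" event that EventBridge delivers already names the exact object, so "fetch the last file in the bucket" can be replaced by reading the event payload, which avoids the race when several files arrive at once. A sketch of pulling the bucket and key out of that event shape follows; the event sample is trimmed to the fields used (a real event carries more), and how the payload reaches your job (for example via an intermediate target that forwards it as job parameters) depends on your setup, so treat this as an assumption-laden illustration rather than a drop-in fix.

```python
import json

# Trimmed shape of an S3 "Object Created" event as delivered by EventBridge.
event_json = '''
{
  "detail-type": "Object Created",
  "source": "aws.s3",
  "detail": {
    "bucket": {"name": "my-monitored-bucket"},
    "object": {"key": "incoming/data-file.csv"}
  }
}
'''

def extract_s3_object(event):
    """Return (bucket, key) of the object that fired the event."""
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]

bucket, key = extract_s3_object(json.loads(event_json))
print(bucket, key)
```

Because each uploaded file produces its own event with its own key, processing the key from the event (instead of listing the bucket for the newest object) keeps concurrent uploads from stepping on each other.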