Including packages in Glue jobs


Hi everyone,

I created a Glue job using boto3. The job runs in Python shell mode and needs several Python packages, such as opencv, deltalake and polars. Using the AWS CLI, I tried the job options --extra-py-files and --additional-python-modules, and --additional-python-modules worked. The problem is that my company's regulations don't allow AWS Glue to download packages from the internet; we have to use our own package artifactory instead.
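For reference, the setup described above (a Python shell job created with boto3, with packages passed through `--additional-python-modules`) might look roughly like the following. This is a minimal sketch: the job name, IAM role ARN, script location and Python version are hypothetical placeholders, and `opencv-python-headless` is assumed as the pip package name for OpenCV.

```python
"""Sketch: create a Glue Python shell job with extra pip modules via boto3."""


def build_job_definition(name, role_arn, script_s3_uri, modules):
    """Return keyword arguments for glue.create_job (Python shell job)."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",           # Python shell job type
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3.9",          # assumption: Python 3.9 runtime
        },
        "MaxCapacity": 0.0625,               # smallest Python shell capacity
        "DefaultArguments": {
            # Comma-separated list of pip requirement specifiers
            "--additional-python-modules": ",".join(modules),
        },
    }


def create_job(definition):
    """Call the Glue API; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    glue = boto3.client("glue")
    return glue.create_job(**definition)


if __name__ == "__main__":
    job = build_job_definition(
        name="my-pythonshell-job",                               # hypothetical
        role_arn="arn:aws:iam::123456789012:role/MyGlueRole",    # hypothetical
        script_s3_uri="s3://my-bucket/scripts/job.py",           # hypothetical
        modules=["polars", "deltalake", "opencv-python-headless"],
    )
    print(job["DefaultArguments"])
```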

My questions are about two possible options:

  1. How can I give the job the packages it needs without downloading them from the internet? Instead, the packages would be downloaded manually beforehand from the artifactory using pip and zipped so the job can use them.
  2. Is there any way to have AWS Glue point to the artifactory location and download the packages from there directly?

Thanks

Spring
Asked 8 months ago, viewed 304 times
1 answer
  1. Yes. You can create a virtualenv, install those packages into it, zip the contents of the virtualenv's site-packages directory, put the zip on S3 and then specify it using --extra-py-files. This solution might not work for packages with native code.
  2. You can put the downloaded .whl files on S3 and then list those S3 URLs in --extra-py-files (the drawback is that you need to take care of transitive dependencies yourself).
    Finally, the most sophisticated option is to configure pip to use another repository (which can be on S3) with the pip flag --index-url. In older versions you could pass this to Glue using --python-modules-installer-option, but I think now you can just specify it inside --additional-python-modules.
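Step 1 could be scripted along these lines. This is a sketch, assuming the packages were already installed into a virtualenv from the artifactory (for example with `pip install --index-url <artifactory-url> polars deltalake` run inside the venv); the function name and paths are hypothetical. It keeps top-level packages at the root of the archive, which is what Python's zip import expects, and skips `__pycache__` bytecode. Note that Python's `zipimport` cannot load compiled extension modules (`.so` files) from a zip archive, which is likely why this approach may fail for packages with native code such as opencv, polars and deltalake if the job imports straight from the zip.

```python
"""Sketch: zip a virtualenv's site-packages for use with --extra-py-files."""
import os
import zipfile


def zip_site_packages(site_packages_dir, output_zip):
    """Write every file under site_packages_dir into output_zip.

    Archive paths are relative to site_packages_dir, so top-level packages
    (e.g. polars/) sit at the root of the zip. __pycache__ folders are
    skipped. Returns the number of files written.
    """
    count = 0
    with zipfile.ZipFile(output_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(site_packages_dir):
            # Prune bytecode caches in place so os.walk does not descend into them
            dirs[:] = [d for d in dirs if d != "__pycache__"]
            for filename in files:
                full_path = os.path.join(root, filename)
                arcname = os.path.relpath(full_path, site_packages_dir)
                zf.write(full_path, arcname)
                count += 1
    return count
```

Usage would be something like `zip_site_packages("venv/lib/python3.9/site-packages", "deps.zip")` (hypothetical venv path), followed by uploading `deps.zip` to S3 and passing its S3 URI in `--extra-py-files`.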
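For option 2, one way to deal with transitive dependencies is to let `pip download` resolve them against the artifactory, then build the `--extra-py-files` value from the resulting wheel files. This is a sketch under several assumptions: the target platform tag (`manylinux2014_x86_64`) and Python version (`3.9`) are guesses about the Glue host that you should confirm, the bucket and prefix are hypothetical, and wheels for every package must actually be available for that platform. pip requires `--only-binary=:all:` (or `--no-deps`) whenever `--platform` is given.

```python
"""Sketch: collect wheels (with dependencies) and build --extra-py-files."""
import os
import subprocess
import sys


def pip_download_command(index_url, dest_dir, packages,
                         python_version="3.9",
                         platform="manylinux2014_x86_64"):
    """Build a `pip download` command that fetches wheels for the given
    packages and their transitive dependencies from index_url."""
    return [
        sys.executable, "-m", "pip", "download",
        "--index-url", index_url,
        "--dest", dest_dir,
        "--only-binary=:all:",          # required when --platform is set
        "--platform", platform,         # target the Glue host, not this machine
        "--python-version", python_version,
        *packages,
    ]


def download_wheels(index_url, dest_dir, packages):
    """Run pip download; needs network access to the artifactory."""
    subprocess.run(pip_download_command(index_url, dest_dir, packages), check=True)


def extra_py_files_value(bucket, prefix, wheel_dir):
    """Return a comma-separated list of S3 URIs, one per .whl in wheel_dir,
    assuming the wheels are uploaded unchanged to s3://bucket/prefix/."""
    wheels = sorted(f for f in os.listdir(wheel_dir) if f.endswith(".whl"))
    key_prefix = prefix.strip("/")
    base = f"s3://{bucket}/{key_prefix}/" if key_prefix else f"s3://{bucket}/"
    return ",".join(base + name for name in wheels)
```

After uploading the downloaded directory to S3 (for example with `aws s3 cp wheels/ s3://my-bucket/libs/ --recursive`), the string returned by `extra_py_files_value("my-bucket", "libs", "wheels")` can be used as the `--extra-py-files` job argument.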
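The last option could be expressed as job arguments roughly as follows. This sketch follows the answer's description of `--python-modules-installer-option`, whose value Glue passes to pip3 as command-line options; the function name is hypothetical, and whether the parameter is honored for Python shell jobs (as opposed to Spark jobs) should be checked in the Glue documentation for your job type. The job also still needs a network route to the artifactory.

```python
"""Sketch: point pip at a private index for --additional-python-modules."""


def private_index_arguments(modules, index_url, extra_pip_options=()):
    """Return Glue DefaultArguments that install `modules` with pip
    from `index_url` instead of the public PyPI."""
    pip_options = ["--index-url", index_url, *extra_pip_options]
    return {
        "--additional-python-modules": ",".join(modules),
        # Passed through to pip3 as space-separated command-line options
        "--python-modules-installer-option": " ".join(pip_options),
    }
```

For example, `private_index_arguments(["polars", "deltalake", "opencv-python-headless"], "https://artifactory.example.com/api/pypi/pypi-remote/simple")` (hypothetical URL) returns a dict that can be merged into `DefaultArguments` when calling `create_job`.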
AWS Expert
Answered 8 months ago
  • Thanks. Nice. Could you please give a more detailed guide on this? I'm a novice and found the AWS documentation not really clear and actionable.
