Including packages in Glue jobs


Hi everyone,

I created a Glue job using boto3. The job runs in Python shell mode and needs several Python packages, such as opencv, deltalake and polars. I tried setting the options --extra-py-files and --additional-python-modules through the AWS CLI, and --additional-python-modules worked. The problem is that my company's regulations don't allow AWS Glue to download packages from the internet, so I have to use the company's own package artifactory.

My questions are about two possible options:

  1. How can I give the job the packages it needs without downloading them from the internet? The packages would instead be downloaded manually beforehand from the artifactory using pip and zipped so the job can use them.
  2. Is there any way to have AWS Glue point to the artifactory location to directly download the packages?

Thanks

Spring
asked 8 months ago · 302 views
1 Answer
  1. Yes. Create a virtualenv, install the packages into it, zip the contents of its site-packages directory, upload the zip to S3, and pass its S3 path with --extra-py-files. This solution might not work for packages that contain native code.
  2. Alternatively, upload the downloaded .whl files to S3 and list their S3 URLs in --extra-py-files. The drawback is that you have to take care of transitive dependencies yourself.
    Finally, the most sophisticated option is to point pip at another repository (which can be on S3) using the pip flag --index-url. In older versions you could pass it to Glue with --python-modules-installer-option, but I think now you can just specify it inside --additional-python-modules.
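
The options above can be sketched roughly as follows in Python with boto3. This is only an illustration under assumptions, not a verified recipe: the index URL, S3 URIs, job name, role and script location are hypothetical placeholders. Whether --python-modules-installer-option is honoured for Python shell jobs depends on the Glue version, so treat that part as something to test in your own account.

```python
import sys
from typing import Dict, List


def pip_download_cmd(packages: List[str], index_url: str, dest: str) -> List[str]:
    """Build a `pip download` command that fetches wheels (and their
    dependencies) from a private index into a local directory."""
    return [
        sys.executable, "-m", "pip", "download",
        "--index-url", index_url,   # hypothetical artifactory URL
        "--only-binary=:all:",      # wheels only, no source builds
        "--dest", dest,
        *packages,
    ]


def wheel_args(wheel_uris: List[str]) -> Dict[str, str]:
    """Item 2: wheels already uploaded to S3, passed via --extra-py-files."""
    for uri in wheel_uris:
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri}")
    return {"--extra-py-files": ",".join(wheel_uris)}


def index_args(modules: List[str], index_url: str) -> Dict[str, str]:
    """Last option: let pip resolve the modules against a private index."""
    return {
        "--additional-python-modules": ",".join(modules),
        "--python-modules-installer-option": f"--index-url {index_url}",
    }


def create_python_shell_job(name: str, role: str, script_s3: str,
                            default_args: Dict[str, str]) -> None:
    """Create the Glue job; boto3 is imported lazily so the helpers above
    can be used without AWS credentials."""
    import boto3

    glue = boto3.client("glue")
    glue.create_job(
        Name=name,
        Role=role,
        Command={
            "Name": "pythonshell",
            "ScriptLocation": script_s3,
            "PythonVersion": "3.9",
        },
        DefaultArguments=default_args,
        MaxCapacity=0.0625,
    )
```

A possible flow: run the command from pip_download_cmd with subprocess.run on a Linux machine whose Python version matches the job, upload the resulting .whl files to S3, then call create_python_shell_job with wheel_args([...]). Because pip download also fetches dependencies, this covers the transitive-dependency drawback mentioned in item 2. If the job can reach the artifactory over the network, index_args([...], index_url) is the alternative.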
AWS
Expert
answered 8 months ago
  • Thanks. Nice. Could you please give a more detailed guide on this? I'm a novice and found the AWS documentation not really clear and actionable.
