Including packages in Glue jobs

Hi everyone,

I created a Glue job using boto3. The job runs in Python shell mode and needs several Python packages, such as opencv, deltalake and polars. I tried setting the --extra-py-files and --additional-python-modules options through the AWS CLI, and --additional-python-modules worked. The problem is that my company's regulations don't allow AWS Glue to download packages from the internet, so I have to use the company's own package artifactory.

My questions are about two possible options:

  1. How can I give the job the packages it needs without downloading them from the internet? The packages would instead be downloaded manually beforehand from the artifactory using pip and zipped so that the job can use them.
  2. Is there any way to have AWS Glue point to the artifactory location and download the packages from there directly?

Thanks

Spring
asked 8 months ago · 302 views
1 Answer
  1. Yes, you can create a virtualenv, install those packages, zip the site-packages content of the virtualenv, put the zip on S3 and then specify it using --extra-py-files. This approach might not work for packages that contain native code.
  2. You can put the downloaded whl files on S3 and then list their S3 URLs in --extra-py-files (the drawback is that you need to take care of transitive dependencies yourself).
    Finally, the most sophisticated option is to configure pip to use another repository (which can be on S3) through the pip flag --index-url. In older versions you could pass it to Glue using --python-modules-installer-option, but I think now you can just specify it inside --additional-python-modules.
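As a rough illustration of the options above, here is a minimal boto3 sketch. It assumes the wheels were already downloaded from the internal index and uploaded to S3. The function names, the Python version "3.9" and the MaxCapacity value are illustrative assumptions, not something confirmed in this thread. Whether --python-modules-installer-option is honoured by Python shell jobs in your Glue version should be checked against the Glue documentation.

```python
def pip_download_command(packages, index_url, dest_dir):
    """Build a pip command that fetches wheels (and their dependencies)
    from a private index instead of PyPI. index_url is a placeholder."""
    return [
        "python", "-m", "pip", "download",
        "--index-url", index_url,   # point pip at the internal repository
        "--only-binary=:all:",      # wheels only, no source builds
        "--dest", dest_dir,
        *packages,
    ]


def build_default_arguments(wheel_s3_uris=(), modules=(), installer_option=None):
    """Assemble the Glue job arguments discussed in the answer."""
    args = {}
    if wheel_s3_uris:
        # Option 2: comma-separated S3 URLs of the uploaded whl files.
        args["--extra-py-files"] = ",".join(wheel_s3_uris)
    if modules:
        args["--additional-python-modules"] = ",".join(modules)
    if installer_option:
        # Extra pip options, e.g. "--index-url=<internal index>".
        # Assumption: verify that your Glue version and job type support this.
        args["--python-modules-installer-option"] = installer_option
    return args


def create_python_shell_job(glue_client, name, role_arn, script_s3_uri, default_arguments):
    """Create a Python shell job; glue_client is boto3.client("glue")."""
    return glue_client.create_job(
        Name=name,
        Role=role_arn,
        Command={
            "Name": "pythonshell",
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3.9",  # assumed version
        },
        DefaultArguments=default_arguments,
        MaxCapacity=0.0625,  # assumed capacity
    )
```

To use it, run the list returned by pip_download_command with subprocess.run on a machine that can reach the artifactory, upload the resulting whl files to S3, and pass their URLs to build_default_arguments. Because opencv and polars ship native code, the wheels need to match the platform and Python version that Glue runs.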
AWS (Expert)
answered 8 months ago
  • Thanks. Nice. Could you please give a more detailed guide on this? I'm a novice and found the AWS documentation not really clear and actionable.
