Including packages in Glue jobs


Hi everyone,

I created a Glue job using boto3. The job runs in Python shell mode and needs several Python packages, such as opencv, deltalake, and polars. I tried the --extra-py-files and --additional-python-modules options through the AWS CLI, and --additional-python-modules worked. The problem is that my company's regulations don't allow AWS Glue to download packages from the internet, so I have to use the company's own package artifactory.

My questions are about two possible options:

  1. How can I give the job the packages it needs without downloading them from the internet? Instead, the packages would be downloaded manually beforehand from the artifactory using pip and zipped so the job can use them.
  2. Is there any way to have AWS Glue point to the artifactory location to directly download the packages?

Thanks

Spring
asked 8 months ago · 305 views
1 Answer
  1. Yes. Create a virtualenv, install the packages into it, zip the contents of its site-packages directory, upload the zip to S3, and reference it with --extra-py-files. This approach might not work for packages that contain native code.
  2. Alternatively, download the .whl files, upload them to S3, and list their S3 URLs in --extra-py-files. The drawback is that you have to take care of transitive dependencies yourself.
    Finally, the most sophisticated option is to point pip at another repository (which can be on S3) using the pip flag --index-url. In older versions you could pass it to Glue using --python-modules-installer-option, but I think now you can just specify it inside --additional-python-modules.
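Since the job was created with boto3, the options above can be sketched as a small helper that assembles the job's DefaultArguments. This is only a sketch under assumptions: the bucket, wheel file names, and index URL are hypothetical placeholders, the helper names are made up for illustration, and whether your Glue version honors --python-modules-installer-option for Python shell jobs should be checked against the current Glue documentation.

```python
from typing import Dict, List, Optional


def build_default_arguments(
    wheel_uris: Optional[List[str]] = None,
    modules: Optional[List[str]] = None,
    index_url: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble Glue job parameters for installing packages from private sources."""
    args: Dict[str, str] = {}
    if wheel_uris:
        # Wheels (or a site-packages zip) already uploaded to S3, comma-separated.
        args["--extra-py-files"] = ",".join(wheel_uris)
    if modules:
        # Modules for pip to install when the job starts, comma-separated.
        args["--additional-python-modules"] = ",".join(modules)
    if index_url:
        # pip option pointing at the private repository instead of PyPI.
        args["--python-modules-installer-option"] = f"--index-url={index_url}"
    return args


def create_python_shell_job(
    name: str, role_arn: str, script_uri: str, default_arguments: Dict[str, str]
) -> dict:
    """Create a Python shell job; needs boto3 and AWS credentials to run."""
    import boto3  # imported lazily so the helper above works without AWS access

    glue = boto3.client("glue")
    return glue.create_job(
        Name=name,
        Role=role_arn,
        # PythonVersion "3.9" is an assumption; use the version your job targets.
        Command={"Name": "pythonshell", "ScriptLocation": script_uri, "PythonVersion": "3.9"},
        DefaultArguments=default_arguments,
    )


# Hypothetical values for illustration only.
wheels_only = build_default_arguments(
    wheel_uris=[
        "s3://example-bucket/wheels/pkg_a-1.0-py3-none-any.whl",
        "s3://example-bucket/wheels/pkg_b-2.0-py3-none-any.whl",
    ]
)
private_index = build_default_arguments(
    modules=["polars", "deltalake"],
    index_url="https://artifactory.example.com/api/pypi/pypi/simple",
)
print(wheels_only)
print(private_index)
```

The first dictionary corresponds to the wheels-on-S3 option and the second to the private index option. Either one can be passed as default_arguments to create_python_shell_job, or set on an existing job through the Glue console or update_job.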
AWS
Expert
answered 8 months ago
  • Thanks. Nice. Could you please give a more detailed guide on this? I'm a novice and found the AWS documentation not really clear and actionable.
