Including packages in Glue jobs


Hi everyone,

I created a Glue job using boto3. The job runs in Python shell mode and needs several Python packages, such as opencv, deltalake and polars. I tried the AWS CLI options --extra-py-files and --additional-python-modules, and --additional-python-modules worked. The problem is that my company's regulations don't allow AWS Glue to download packages from the internet; I have to use the company's own package Artifactory.
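For reference, a job like this could be defined with boto3 roughly as sketched below. This is an assumption-laden sketch, not the poster's actual code: the job name, IAM role ARN, script location and Python version are hypothetical placeholders, and `python_shell_job_spec` / `create_glue_job` are illustrative helpers, not part of boto3. The point is only to show where --additional-python-modules goes (the job's `DefaultArguments`).

```python
def python_shell_job_spec(name, role_arn, script_s3_uri, modules):
    """Build keyword arguments for boto3's glue.create_job (Python shell job)."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",            # Python shell, not a Spark job
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3.9",           # assumption: Python 3.9 runtime
        },
        "MaxCapacity": 0.0625,                # smallest Python shell capacity
        "DefaultArguments": {
            # Comma-separated pip requirements installed when the job starts.
            "--additional-python-modules": ",".join(modules),
        },
    }


def create_glue_job(spec):
    """Hypothetical helper: submit the spec to AWS Glue (needs AWS credentials)."""
    import boto3  # imported lazily so building the spec does not require boto3
    return boto3.client("glue").create_job(**spec)
```

Usage would be something like `create_glue_job(python_shell_job_spec("my-job", "arn:aws:iam::123456789012:role/GlueJobRole", "s3://my-bucket/scripts/job.py", ["opencv-python-headless", "deltalake", "polars"]))`. Note that OpenCV is published on PyPI as `opencv-python` / `opencv-python-headless` rather than `opencv`.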

My questions are about two possible options:

  1. How can I give the job the packages it needs without downloading them from the internet? The idea is to download the packages manually beforehand from the Artifactory using pip and zip them so the job can use them.
  2. Is there any way to have AWS Glue point to the Artifactory location and download the packages from there directly?

Thanks

Spring
asked 8 months ago · 302 views
1 Answer
  1. Yes. You can create a virtualenv, install those packages into it, zip the contents of the virtualenv's site-packages directory, put the zip on S3, and then specify it with --extra-py-files. This approach might not work for packages that contain native code.
  2. You can download the .whl files, put them on S3, and then list those S3 URLs in --extra-py-files (the drawback is that you have to take care of transitive dependencies yourself).
    Finally, the most sophisticated option is to configure pip to use another repository (which can be on S3) with the pip flag --index-url. In older versions you could pass that to Glue with --python-modules-installer-option, but I think you can now specify it inside --additional-python-modules.
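Option 1 can be sketched in Python as follows, assuming pip on the build machine can reach your Artifactory's PyPI endpoint. Instead of a virtualenv, this sketch swaps in `pip install --target` to put the packages into a plain site-packages-style folder, zips that folder's contents so the packages sit at the archive root, and uploads the zip with boto3. The function names, index URL, bucket and key are hypothetical placeholders.

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def pip_install_to_dir(packages, target_dir, index_url):
    """Install packages into target_dir with pip, pulling only from index_url."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install",
         "--target", str(target_dir),
         "--index-url", index_url,   # e.g. your Artifactory PyPI endpoint
         *packages],
        check=True,
    )


def zip_directory_contents(src_dir, zip_path):
    """Zip every file under src_dir, keeping paths relative to src_dir."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Archive name like "polars/__init__.py" so the package is importable.
                zf.write(path, path.relative_to(src).as_posix())
    return Path(zip_path)


def upload_to_s3(local_path, bucket, key):
    """Upload the zip and return the S3 URI to put in --extra-py-files."""
    import boto3  # imported lazily so the helpers above work without boto3
    boto3.client("s3").upload_file(str(local_path), bucket, key)
    return f"s3://{bucket}/{key}"
```

If you build on macOS or Windows, pip's `--platform`, `--python-version` and `--only-binary=:all:` options can be added to the install command to fetch Linux wheels matching the Glue runtime instead of wheels for your local machine.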
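The last option (pointing pip at your own repository) could look roughly like the sketch below, which only builds the job's `DefaultArguments`. It relies on --python-modules-installer-option, which AWS documents as a string of options passed to pip3 when installing --additional-python-modules; whether your Glue or Python shell version honors it is worth checking against the current documentation. The Artifactory URL (in the usual `.../api/pypi/<repo>/simple` form) is a hypothetical placeholder, and the job presumably also needs network access to that host, for example through a Glue connection in your VPC.

```python
def artifactory_install_args(modules, index_url, extra_pip_options=()):
    """Build Glue DefaultArguments that make pip install from a private index.

    index_url and the module names are caller-supplied placeholders; this
    function does not check that the job can actually reach index_url.
    """
    pip_options = ["--index-url", index_url, *extra_pip_options]
    return {
        # Comma-separated pip requirements installed when the job starts.
        "--additional-python-modules": ",".join(modules),
        # Space-separated options forwarded to pip for the modules above.
        "--python-modules-installer-option": " ".join(pip_options),
    }
```

The returned dict can be merged into `DefaultArguments` when calling `glue.create_job`, so pip resolves both the listed packages and their transitive dependencies from the private index rather than from the public PyPI.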
AWS EXPERT
answered 8 months ago
  • Thanks. Nice. Could you please give a more detailed guide on this? I'm a novice and found the AWS documentation not really clear and actionable.
