ModuleNotFoundError when starting a training job on SageMaker


I want to submit a training job on SageMaker. The script works when I run it in a notebook, but when I submit it as a training job as shown below, I get ModuleNotFoundError: No module named 'nltk'.

My code is:

import sagemaker
from sagemaker.pytorch import PyTorch

JOB_PREFIX = 'pyt-ic'
FRAMEWORK_VERSION = '1.3.1'

estimator = PyTorch(entry_point='finetune-T5.py',
                    source_dir='../src',
                    train_instance_type='ml.p2.xlarge',
                    train_instance_count=1,
                    role=sagemaker.get_execution_role(),
                    framework_version=FRAMEWORK_VERSION,
                    debugger_hook_config=False,
                    py_version='py3',
                    base_job_name=JOB_PREFIX)

estimator.fit()

finetune-T5.py imports many other libraries that are not installed in the training container. How can I install the missing libraries? Or is there a better way to run the training job?

  • I tried adding nltk to a requirements.txt file in the scripts directory. That worked for another module, but not for nltk. What could I be doing wrong?

asked 4 years ago, 1,675 views
1 Answer

Accepted answer

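Check out the linked documentation (the "Using third-party libraries" section) on how to install third-party libraries for training jobs. You need to create a requirements.txt file in the same directory as your training script to install other dependencies at runtime. In your code the entry point is resolved relative to source_dir, so that directory is ../src: the file should be ../src/requirements.txt, next to finetune-T5.py, not in a separate subfolder.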

AWS
Sam
answered 4 years ago
