Can we use requirements.txt in model monitoring to install libraries?

Question

https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-faqs.html
"If you want to include a few third-party libraries, you can place a requirements.txt file in a local directory and reference it using the source_dir parameter in the SageMaker estimator, which enables library installation at run-time. However, if you have lots of libraries or dependencies that increase the installation time while running the training job, you might want to leverage BYOC."

I need a few libraries installed so that the preprocessing script can use them with default model monitoring. How can I leverage 'requirements.txt' for SageMaker model monitoring?

Answer

Hello,

I understand that you would like to know if we can use requirements.txt to install libraries so that the preprocessing script can reference it when using default model monitoring.

The FAQ passage about a 'requirements.txt' file referenced through the source_dir parameter of the SageMaker estimator applies to training jobs, not to model monitoring. I understand the wording may cause some confusion. Apologies for the inconvenience.

Unfortunately, with default model monitoring it is not possible to install additional libraries through a 'requirements.txt' file, because there is no way to make that file available to the preprocessing script. However, you can install them directly from the preprocessing script.

You can use custom preprocessing and postprocessing Python scripts by uploading them to S3 and referencing them when you create the model monitor. Please refer to [1] for more information. For example, the preprocessing script below installs additional packages before importing them:

```
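import sys
import subprocess

# Install pinned packages when the script is loaded, before they are imported below.
# Each requirement must use the exact "name==version" form, with no spaces.
subprocess.check_output(["/usr/bin/python3", "-m", "pip", "install", "pandas==1.3.5", "numpy==1.21.6", "python-dateutil==2.8.2", "contextvars==2.4"])

import pandas as pd

# Entry point called by Model Monitor for each captured record (see [2]).
def preprocess_handler(inference_record):
   ...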
```
Please refer to [2] for more details and sample preprocessing scripts. Note that this approach is only suitable when you have a few packages to install; with many packages, the installation time might result in a timeout.
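
Since a single malformed specifier (such as one `=` instead of `==`, or a stray space) makes the whole pip call fail, it may help to validate the list before installing. The following is only a minimal sketch, not part of the Model Monitor API: the helper names `build_pip_command` and `install`, the regular expression, and the 300-second timeout are all assumptions made for illustration.

```python
import re
import subprocess
import sys

# Hypothetical check: accept only exact pins of the form "name==version".
_PINNED = re.compile(r"^[A-Za-z0-9._-]+==[A-Za-z0-9.*+!_-]+$")


def build_pip_command(requirements, python=sys.executable):
    """Return the argument list for `python -m pip install ...`.

    Raises ValueError if any requirement is not an exact "name==version" pin.
    """
    bad = [req for req in requirements if not _PINNED.match(req)]
    if bad:
        raise ValueError(f"not in name==version form: {bad}")
    return [python, "-m", "pip", "install", *requirements]


def install(requirements, timeout_seconds=300):
    """Run pip, failing fast if installation exceeds the (assumed) time budget."""
    subprocess.run(
        build_pip_command(requirements),
        check=True,               # raise CalledProcessError if pip fails
        timeout=timeout_seconds,  # raise TimeoutExpired instead of hanging
    )
```

In a preprocessing script, `install(["pandas==1.3.5", "numpy==1.21.6"])` would be called once at the top, before the corresponding imports. Pass `python="/usr/bin/python3"` to `build_pip_command` if you prefer the interpreter path used in the example above.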

However, if you need to install a large number of packages, you can Build Your Own Processing Container (BYOC): a Docker image that contains your own code and dependencies and runs the data processing and model evaluation workloads. Please refer to [3] for more information.

You can also refer to the example Dockerfile given in [3], which can be used for the processing. You can list the required packages in a 'requirements.txt' file or add them manually to the Dockerfile before building the container image.

I hope this was helpful. If you face any other issues or require further assistance, please reach out to AWS Support [4] along with your use case in detail, and we would be happy to assist you further. Have a great day ahead!

References:

[1] https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-and-post-processing.html

[2] https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-pre-and-post-processing.html#model-monitor-pre-processing-script

[3] https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html

[4] https://console.aws.amazon.com/support/home#/case/create

Answer

Thanks for the answer, "subprocess.check_output" will be helpful. What if I have to reference a few custom Python files/classes from the preprocessing script? Is there an option to import them into the preprocessing script, other than BYOC?
