How to prevent programs downloaded with curl from being deleted after every restart on Amazon SageMaker Studio?

In my JupyterLab instance, I downloaded a piece of software from the terminal with a curl command. Every time I'm done with the instance, I shut it down to avoid additional charges. But when I turn it back on, the software is always gone and I have to download it again. Perhaps some sort of cleanup is being done; if so, how can I turn it off? Thanks in advance.

Kai
asked 20 days ago, 294 views
2 Answers
Where did you save the data? Only files and data saved within the /home/ec2-user/SageMaker folder persist between notebook instance sessions. Files and data saved outside this directory are overwritten when the notebook instance stops and restarts. Each notebook instance's /tmp directory provides a minimum of 10 GB of storage in an instance store. An instance store is temporary, block-level storage that isn't persistent: when the instance is stopped or restarted, SageMaker deletes the directory's contents. This temporary storage is part of the root volume of the notebook instance.
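To make the rule above concrete, here is a minimal shell sketch that reports whether a path on a Notebook Instance falls under /home/ec2-user/SageMaker. The function name is hypothetical, and it is only a string-prefix check: it does not resolve symlinks or ".." components.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the given path lives under the
# Notebook Instance persistent folder /home/ec2-user/SageMaker.
# Plain string-prefix check only; symlinks and ".." are NOT resolved.
NBI_PERSISTENT_DIR=/home/ec2-user/SageMaker

is_persistent_on_nbi() {
    case "$1" in
        "$NBI_PERSISTENT_DIR" | "$NBI_PERSISTENT_DIR"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: /tmp and /home/ec2-user itself are not kept across stop/restart.
for p in /home/ec2-user/SageMaker/bin/mytool /home/ec2-user/mytool /tmp/mytool; do
    if is_persistent_on_nbi "$p"; then
        echo "$p: persists"
    else
        echo "$p: lost on stop/restart"
    fi
done
```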

AWS
EXPERT
answered 20 days ago
AWS
EXPERT
reviewed 20 days ago
Each time you start a SageMaker Studio "space" (or "app" in Studio Classic), a fresh container is started from the base image, and only the contents of your persistent home folder are kept:

  • In SageMaker Studio, it's your user's home folder /home/sagemaker-user
  • In the older SageMaker Notebook Instances (not Studio), it's the /home/ec2-user/SageMaker folder mentioned in the other answer. This sometimes catches me out, because the terminal opens in /home/ec2-user by default, which is neither visible in the JupyterLab folder tree nor persisted between restarts. So you'll want to cd SageMaker first.
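If you move between both kinds of environment, a small shell function can pick whichever of these two folders exists. This is a sketch that only knows the two paths listed above; the function name and the optional ROOT prefix (there only so the logic can be tried against a fake directory tree) are assumptions of this example.

```shell
#!/bin/sh
# Hypothetical helper: print the folder that survives restarts, based only on
# the two locations described above. ROOT is an optional prefix (default
# empty) for testing against a fake tree; leave it unset on a real instance.
find_persistent_dir() {
    root="${ROOT:-}"
    if [ -d "$root/home/sagemaker-user" ]; then
        # SageMaker Studio
        echo "$root/home/sagemaker-user"
    elif [ -d "$root/home/ec2-user/SageMaker" ]; then
        # SageMaker Notebook Instance
        echo "$root/home/ec2-user/SageMaker"
    else
        echo "no known persistent folder found" >&2
        return 1
    fi
}

# Example: move a fresh terminal into the persistent folder.
# cd "$(find_persistent_dir)" || exit 1
```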

So one solution is to simply curl your program to somewhere under this persistent folder.
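A minimal sketch of that, assuming a single-file program: install_tool is a hypothetical helper, and the URL and tool name in the commented example are placeholders. It saves the download under a folder you choose (which should be inside the persistent folder) and marks it executable; you still need to call it by its full path or put that folder on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: download a single-file program with curl into a
# folder (meant to be under the persistent folder) and make it executable.
install_tool() {
    url="$1"        # where to download from
    name="$2"       # file name to save as
    dest_dir="$3"   # should be under the persistent folder
    mkdir -p "$dest_dir" || return 1
    # -f: fail on HTTP errors instead of saving an error page
    # -sS: hide the progress bar but still show errors; -L: follow redirects
    curl -fsSL -o "$dest_dir/$name" "$url" || return 1
    chmod +x "$dest_dir/$name"
}

# Example on SageMaker Studio (placeholder URL and name):
# install_tool https://example.com/mytool mytool /home/sagemaker-user/bin
# export PATH="/home/sagemaker-user/bin:$PATH"
```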

But if you need to perform other setup that lives outside this folder (for example, installing OS-level libraries your program depends on, putting files in /usr or similar folders, or editing your ~/.bashrc on SM Notebook Instances, where the home folder /home/ec2-user itself is not persisted), you can also use lifecycle configuration (LCC) scripts (see the docs for Studio JupyterLab, Studio Code Editor, Studio Classic, and Notebook Instances). These shell scripts run every time your environment starts, so they can perform setup steps that need to be repeated, like installing extra OS libraries or installing content outside the persistent folder. Just note that:

  1. Your LCC script must complete within 5 minutes (if setup takes longer, you could start an asynchronous process that does the rest).
  2. The environments are a bit different, so an LCC script will typically need some porting to work across Studio, Studio Classic, and Notebook Instances (NBIs).
  3. I've seen an issue in the past where editing LCCs in the AWS console from a Windows laptop inserts CRLF line endings. This may be fixed already, but if it happens, you'll probably see the first line of your script fail with an error that doesn't make sense: e.g. failing to cd FolderThatDefinitelyExists, because an invisible carriage-return character at the end is treated as part of the name. So if you work on a Windows machine, watch out for this, and consider creating your LCC directly from the CLI in your (Linux-based) SageMaker environment.
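Here's a rough sketch of two building blocks such an LCC script might use, written as plain shell functions. ensure_line appends a line (say, a PATH export to /home/ec2-user/.bashrc on a Notebook Instance) only if it isn't already there, so repeating it on every start is harmless; run_in_background starts slow setup with nohup and returns immediately, which is one way to stay within the time limit in point 1. Both function names, and the paths in the commented example, are hypothetical and not taken from the official samples.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if it isn't already
# there, so running the LCC on every start doesn't keep growing the file.
ensure_line() {
    line="$1"
    file="$2"
    touch "$file" || return 1
    grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: run a slow setup command in the background, detached
# from the script's input, with all output going to a log file, and return
# immediately so the LCC itself can finish quickly.
run_in_background() {
    log="$1"
    shift
    nohup "$@" > "$log" 2>&1 < /dev/null &
}

# Example LCC body on a Notebook Instance (placeholder paths):
# ensure_line 'export PATH="/home/ec2-user/SageMaker/bin:$PATH"' /home/ec2-user/.bashrc
# run_in_background /tmp/extra-setup.log sh /home/ec2-user/SageMaker/slow-setup.sh
```

If you edited the script on Windows, running it through tr -d '\r' before creating the LCC strips the carriage returns described in point 3.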

Sample LCC scripts are available for reference:

AWS
EXPERT
Alex_T
answered 2 days ago
