Where did you save the data? Only files and data saved within the /home/ec2-user/SageMaker folder persist between notebook instance sessions. Files and data saved outside this directory are overwritten when the notebook instance stops and restarts. Each notebook instance's /tmp directory provides a minimum of 10 GB of storage in an instance store. An instance store is temporary, block-level storage that isn't persistent. When the instance is stopped or restarted, SageMaker deletes the directory's contents. This temporary storage is part of the root volume of the notebook instance.
Each time you start a SageMaker Studio "space" (or "app" in Studio Classic), a fresh container is started from the base image and only the contents of your persistent home folder are persisted:
- In SageMaker Studio, it's your user's home folder, /home/sagemaker-user
- In the older SageMaker Notebook Instances (not Studio), it's the /home/ec2-user/SageMaker folder mentioned in the other answer. This sometimes catches me out because the terminal opens in /home/ec2-user by default, which is not visible in the JupyterLab folder tree or persisted between restarts. So you'll want to cd SageMaker first.
So one solution is to simply curl your program to somewhere under this persistent folder.
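As a rough sketch of that idea, the helper below only downloads when the destination sits under one of the two persistent folders named above. The function names, the URL, and the destination path are all hypothetical placeholders, and the check is a plain string match that assumes you pass a normalized absolute path (no ../ segments).

```shell
# Succeeds (exit status 0) only when the given absolute path lies inside one
# of the persistent folders described in this thread. This is a string
# comparison only: it does not resolve symlinks or ".." segments.
is_persistent_path() {
  case "$1" in
    /home/ec2-user/SageMaker|/home/ec2-user/SageMaker/*) return 0 ;;  # Notebook Instances
    /home/sagemaker-user|/home/sagemaker-user/*) return 0 ;;          # Studio
    *) return 1 ;;
  esac
}

# Usage: download_persistent URL DESTINATION
# Refuses to download to a destination that would be lost on restart.
# Both arguments are placeholders for your own program's URL and path.
download_persistent() {
  if ! is_persistent_path "$2"; then
    echo "refusing: $2 is outside the persistent folder" >&2
    return 1
  fi
  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
  mkdir -p "$(dirname "$2")" && curl -fsSL "$1" -o "$2"
}
```

On a Notebook Instance you might call it as download_persistent https://example.com/mytool /home/ec2-user/SageMaker/bin/mytool, where both the URL and the bin/mytool path are made up for illustration.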
But if you need to perform other setup that lives outside this folder (for example to install OS-level libraries your program depends on, put files in /usr or similar folders, or edit your ~/.bashrc in SM Notebook Instances, where the root user's home folder is not persisted), you can also use lifecycle configuration (LCC) scripts (see docs for Studio JupyterLab, Studio Code Editor, Studio Classic, and Notebook Instances). These shell scripts run every time your environment starts, so they can perform setup steps that need to be repeated, like installing extra OS libraries, installing content outside the persistent folder, etc. Just note that:
- Your LCC script must complete within 5 minutes (or maybe start an asynchronous process that does the rest of setup)
- The environments are a bit different, so an LCC script will typically need some porting to work between Studio/Studio Classic/NBIs
- I have been aware of an issue in the past where editing LCCs in the AWS console from a Windows laptop causes CRLF line-endings to be inserted. This may be fixed already, but if it happens, you'll probably see the first line of your script fail with an error that doesn't make sense, e.g. fail to cd FolderThatDefinitelyExists, because an invisible carriage return character is present at the end that's treated as part of the name. So if you work on a Windows machine, watch out for this and maybe try creating your LCC directly from the CLI in your (Linux-based) SageMaker environment.
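Two of the caveats above can be sketched with small shell helpers: checking a script file for CRLF line endings before using it as an LCC, and handing slow setup to a background process so the LCC script itself can return within the time limit. The function names and paths here are hypothetical, and the nohup pattern is just one common way to start an asynchronous process, not something the lifecycle configuration feature prescribes.

```shell
# Succeeds (exit status 0) if any line in the file ends with a carriage return.
has_crlf() {
  hc_cr=$(printf '\r')          # a literal CR character, built portably
  grep -q "${hc_cr}\$" "$1"
}

# Removes every carriage return from the file, rewriting it in place.
# Writing back with cat (rather than mv) keeps the file's permissions.
strip_crlf() {
  sc_tmp=$(mktemp) || return 1
  tr -d '\r' < "$1" > "$sc_tmp" && cat "$sc_tmp" > "$1"
  sc_status=$?
  rm -f "$sc_tmp"
  return "$sc_status"
}

# Usage: run_detached LOGFILE COMMAND [ARGS...]
# Starts COMMAND in the background, immune to hangups, with all output sent
# to LOGFILE, and returns immediately so the calling script can finish.
run_detached() {
  rd_log=$1
  shift
  nohup "$@" </dev/null >"$rd_log" 2>&1 &
}
```

For example, an LCC script could end with run_detached /tmp/setup.log sh /home/ec2-user/SageMaker/slow-setup.sh, where slow-setup.sh is a made-up script kept in the persistent folder. Before registering a script edited on Windows, has_crlf my-lcc.sh && strip_crlf my-lcc.sh would clean it up.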
Sample LCC scripts are available for reference: