Install NVIDIA GPU driver, CUDA Toolkit, NVIDIA Container Toolkit on Amazon EC2 instances running Amazon Linux 2 (AL2)
Steps to install the NVIDIA driver, CUDA Toolkit, and NVIDIA Container Toolkit on Amazon Linux 2 (AL2) (x86_64/arm64)
Overview
This article describes how to install the NVIDIA GPU driver, CUDA Toolkit, and NVIDIA Container Toolkit on NVIDIA GPU EC2 instances running Amazon Linux 2 (AL2).
Note that by using this method, you agree to the NVIDIA Driver License Agreement, End User License Agreement, and other related license agreements.
This article applies to AL2 only. Similar articles are available for AL2023, Ubuntu Linux, RHEL/Rocky Linux, and Windows.
This article installs the NVIDIA Tesla driver, which does not support G6f and Gr6f instance types.
Other Options
If you need AMIs preconfigured with NVIDIA GPU driver, CUDA, other NVIDIA software, and optionally PyTorch or TensorFlow framework, consider AWS Deep Learning AMIs. Refer to Release notes for DLAMIs for currently supported options.
Refer to NVIDIA drivers for your Amazon EC2 instance for NVIDIA driver install options.
For container workloads, consider Amazon ECS-optimized Linux AMIs and Amazon EKS optimized AMIs.
Note: instructions in this article are not applicable to pre-built AMIs.
Custom ECS/EKS GPU-optimized AMI
If you wish to build your own custom Amazon ECS or EKS GPU-optimized AMI, install the NVIDIA driver, Docker, and the NVIDIA Container Toolkit, then refer to How do I create and use custom AMIs in Amazon ECS? and amazon-eks-ami
About CUDA toolkit
The CUDA Toolkit is generally optional when a GPU instance is used to run applications (as opposed to developing them), because a CUDA application typically packages the CUDA runtime and libraries it needs by statically or dynamically linking against them.
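To check whether a prebuilt application dynamically links against the CUDA runtime or other CUDA libraries, you can inspect its shared-library dependencies. This is a minimal sketch; /path/to/your_app is a placeholder for your own binary.
# List shared-library dependencies and look for CUDA libraries such as libcudart;
# a statically linked application shows no such entries
ldd /path/to/your_app | grep -i -E 'cuda|cublas|cudnn'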
End of support notice
- EPEL support: this article uses EL7 packages from Extra Packages for Enterprise Linux (EPEL), which reached end of life in June 2024
- AL2 OS support: support for Amazon Linux 2 will end on 2026-06-30. AL2 has a high level of compatibility with CentOS 7
- NVIDIA driver support: R550 and R535 are the latest Production Branch (PB) and Long Term Support Branch (LTSB) releases to support CentOS 7. R550 and R535 reach end of life in June 2025 and June 2026, respectively
- CUDA Toolkit support: NVIDIA deprecated and then removed support for CentOS 7 in CUDA 12.4 Update 1 and 12.5, respectively
While this guide may work with newer NVIDIA driver and CUDA Toolkit versions, you are encouraged to use a newer OS. Possible options include Amazon Linux 2023, Ubuntu, and RHEL/Rocky, among others.
Prerequisites
Go to the Service Quotas console for your desired Region to verify the On-Demand Instance quota value for your desired instance type:
- G instance types: Running On-Demand G and VT instances
- P instance types: Running On-Demand P instances
Request a quota increase if the assigned value is less than the vCPU count of your desired EC2 instance size. Do not proceed until your applied quota value is equal to or higher than your instance type's vCPU count. You can also check the applied value from the command line, as shown below.
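If you prefer the command line, the AWS CLI can return the applied quota value. The quota codes below (L-DB2E81BA for Running On-Demand G and VT instances, L-417A185B for Running On-Demand P instances) are assumptions; confirm them in the Service Quotas console for your Region.
# Applied quota (in vCPUs) for Running On-Demand G and VT instances
aws service-quotas get-service-quota --service-code ec2 --quota-code L-DB2E81BA --query 'Quota.Value' --output text
# Applied quota (in vCPUs) for Running On-Demand P instances
aws service-quotas get-service-quota --service-code ec2 --quota-code L-417A185B --query 'Quota.Value' --output text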
Prepare Amazon Linux 2
Launch a new NVIDIA GPU instance running Amazon Linux 2, preferably with at least 25 GB of storage.
You may have to search for the AMI using the keyword amzn2-ami-kernel-5.10-hvm. Go to the Community AMIs section and select the AMI with the latest publish date where OwnerAlias is amazon, or look up the AMI ID from the command line as shown below.
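Alternatively, you can look up the latest AL2 kernel 5.10 AMI ID from the AWS Systems Manager public parameters. This sketch assumes the standard public parameter names; use amzn2-ami-kernel-5.10-hvm-arm64-gp2 for arm64 instances.
# Latest AL2 kernel-5.10 AMI ID for x86_64 in the current Region
aws ssm get-parameters --names /aws/service/ami-amazon-linux-latest/amzn2-ami-kernel-5.10-hvm-x86_64-gp2 --query 'Parameters[0].Value' --output text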
Connect to the instance as ec2-user
Install DKMS and kernel headers
sudo yum update -y
sudo amazon-linux-extras install -y epel
sudo yum install -y dkms kernel-devel kernel-devel-$(uname -r) vulkan-devel libglvnd-devel elfutils-libelf-devel automake make gcc gcc-c++ xorg-x11-server-Xorg xorg-x11-fonts-Type1 xorg-x11-drivers
sudo systemctl enable --now dkms
Restart your EC2 instance if the kernel was updated
sudo reboot
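To confirm whether a reboot is actually needed, you can compare the running kernel with the newest installed kernel package. A minimal sketch:
# A reboot is only needed when the running kernel differs from the latest installed kernel
running="kernel-$(uname -r)"
latest=$(rpm -q --last kernel | head -n1 | awk '{print $1}')
[ "$running" = "$latest" ] && echo "running latest kernel" || echo "reboot required ($latest installed)"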
Install NVIDIA driver
To install NVIDIA driver version 570.195.03
cd /tmp
DRIVER_VERSION=570.195.03
curl -L -O https://us.download.nvidia.com/tesla/$DRIVER_VERSION/NVIDIA-Linux-$(arch)-$DRIVER_VERSION.run
chmod +x ./NVIDIA-Linux-$(arch)-$DRIVER_VERSION.run
sudo CC=/usr/bin/gcc10-cc ./NVIDIA-Linux-$(arch)-$DRIVER_VERSION.run -s
sudo sed -i "s/\"'make'/& CC=\/usr\/bin\/gcc10-gcc /" /usr/src/nvidia-$DRIVER_VERSION/dkms.conf
To install another version, refer to the Driver Release Notes for available Linux driver versions, and modify the line above that sets the DRIVER_VERSION value.
Verify compilation
Verify that the driver module compilation was successful
tail /var/log/nvidia-installer.log
Output should be similar to below
-> The initramfs will not be rebuild.
executing: '/sbin/ldconfig'...
executing: '/sbin/depmod -a '...
executing: '/bin/systemctl daemon-reload'...
-> done.
-> Driver file installation is complete.
-> Running post-install sanity check:
-> done.
-> Post-install sanity check passed.
-> Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 570.195.03) is now complete.
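You can also confirm that the kernel module sources were registered with DKMS, so the module is rebuilt automatically on future kernel updates:
sudo dkms status
The output should include an nvidia entry for the installed driver version.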
Verify module
nvidia-smi
Output should be similar to below
Tue Oct 7 02:08:12 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.195.03 Driver Version: 570.195.03 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 31C P0 25W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Optional: CUDA Toolkit
Ensure your EC2 instance has more than 15 GB of free disk space before installing CUDA Toolkit 12.4 Update 1.
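To check available space first, for example:
# Show free space on the root and /tmp filesystems
df -h / /tmp
If there is enough space, proceed with the installer: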
cd /tmp
if ( arch | grep -q x86 ); then
wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run
else
wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux_sbsa.run
fi
chmod +x ./cuda_*.run
sudo CC=/usr/bin/gcc10-cc ./cuda_*.run --toolkit --silent
Refer to CUDA Toolkit documentation about installation options.
To install another version, refer to CUDA Toolkit Archive page for runfile (local) download link.
Verify compilation
tail /var/log/cuda-installer.log
Your output should be similar to below
[INFO]: Installing: cuda-nvvm
[INFO]: Installing: cuda-crt
[WARNING]: Skipping copy. File already exists at: /usr/local/cuda-12.4/bin/crt/link.stub
[WARNING]: Skipping copy. File already exists at: /usr/local/cuda-12.4/bin/crt/prelink.stub
[INFO]: Installing: cuda-nvprune
[INFO]: Installing: CUDA Demo Suite 12.4
[WARNING]: Cannot find manpages to install.
[INFO]: Installing: CUDA Documentation 12.4
[WARNING]: Skipping copy. File already exists at: /usr/local/cuda-12.4/bin/cuda-uninstaller
[WARNING]: Cannot find manpages to install.
Verify installation
/usr/local/cuda/bin/nvcc -V
Output should be similar to below
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:24:28_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Post-installation Actions
Refer to the NVIDIA CUDA Installation Guide for Linux for post-installation actions before the CUDA Toolkit can be used. For example, you may want to modify your PATH and LD_LIBRARY_PATH environment variables to include /usr/local/cuda-12.4/bin and /usr/local/cuda-12.4/lib64 respectively, as shown below.
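A minimal sketch that adds these settings to your shell profile (adjust the paths if you installed a different CUDA Toolkit version):
echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc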
Optional: NVIDIA Container Toolkit
The NVIDIA Container Toolkit supports AL2 on both x86_64 and arm64.
sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
sudo yum install -y nvidia-container-toolkit
Refer to the NVIDIA Container Toolkit documentation for supported platforms, prerequisites, and installation options.
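If you need to pin a specific toolkit version rather than install the latest, you can specify the package version explicitly. The version string below is an example and may not be the newest available; adjust it to a version published in the repository.
NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo yum install -y nvidia-container-toolkit-${NVIDIA_CONTAINER_TOOLKIT_VERSION} nvidia-container-toolkit-base-${NVIDIA_CONTAINER_TOOLKIT_VERSION} libnvidia-container-tools-${NVIDIA_CONTAINER_TOOLKIT_VERSION} libnvidia-container1-${NVIDIA_CONTAINER_TOOLKIT_VERSION}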
Verify Container Toolkit
nvidia-container-cli -V
Output should be similar to below
cli-version: 1.17.8
lib-version: 1.17.8
build date: 2025-05-30T13:47+0000
build revision: 6eda4d76c8c5f8fc174e4abca83e513fb4dd63b0
build compiler: gcc 4.8.5 20150623 (Red Hat 4.8.5-44)
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
Container engine configuration
Refer to NVIDIA Container Toolkit site for container engine configuration instructions.
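If you use containerd instead of Docker (for example, for Kubernetes nodes), nvidia-ctk can configure it in a similar way; this sketch assumes containerd is already installed and running.
sudo nvidia-ctk runtime configure --runtime=containerd
sudo systemctl restart containerd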
Docker
To install and configure Docker
sudo amazon-linux-extras install -y docker
sudo systemctl enable docker
sudo usermod -aG docker ec2-user
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify Docker engine configuration
To verify the Docker configuration
sudo docker run --rm --runtime=nvidia --gpus all public.ecr.aws/amazonlinux/amazonlinux:2 nvidia-smi
Output should be similar to below
Unable to find image 'public.ecr.aws/amazonlinux/amazonlinux:2' locally
2: Pulling from amazonlinux/amazonlinux
91f0f90aeef8: Pull complete
Digest: sha256:281b59b2d12e8b1eb6bf9b33ad2c6fe3f75d3ec6cd527ee20f2f0d84f65f706f
Status: Downloaded newer image for public.ecr.aws/amazonlinux/amazonlinux:2
Tue Oct 7 02:09:19 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.195.03 Driver Version: 570.195.03 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 31C P0 25W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Install on EC2 instance at launch
To install the NVIDIA driver, CUDA Toolkit, NVIDIA Container Toolkit, and Docker when launching a new GPU instance with at least 25 GB of storage, you can use the following user data script.
Remove the # characters (except on the first line) if you wish to install the CUDA Toolkit.
#!/bin/bash
sudo yum update -y
sudo amazon-linux-extras install -y epel
sudo yum install -y dkms kernel-devel kernel-devel-$(uname -r) vulkan-devel libglvnd-devel elfutils-libelf-devel automake make gcc gcc-c++ xorg-x11-server-Xorg xorg-x11-fonts-Type1 xorg-x11-drivers
sudo systemctl enable --now dkms
cd /tmp
DRIVER_VERSION=570.195.03
curl -L -O https://us.download.nvidia.com/tesla/$DRIVER_VERSION/NVIDIA-Linux-$(arch)-$DRIVER_VERSION.run
chmod +x ./NVIDIA-Linux-$(arch)-$DRIVER_VERSION.run
sudo CC=/usr/bin/gcc10-cc ./NVIDIA-Linux-$(arch)-$DRIVER_VERSION.run -s
sudo sed -i "s/\"'make'/& CC=\/usr\/bin\/gcc10-gcc /" /usr/src/nvidia-$DRIVER_VERSION/dkms.conf
#if ( arch | grep -q x86 ); then
# curl -L -O https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run
#else
# curl -L -O https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux_sbsa.run
#fi
#chmod +x ./cuda_*.run
#sudo CC=/usr/bin/gcc10-cc ./cuda_*.run --toolkit --silent
sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
sudo yum install -y nvidia-container-toolkit
sudo amazon-linux-extras install -y docker
sudo systemctl enable docker
sudo usermod -aG docker ec2-user
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
sudo reboot
Verify
Connect to your EC2 instance
nvidia-smi
/usr/local/cuda/bin/nvcc -V
nvidia-container-cli -V
sudo docker run --rm --runtime=nvidia --gpus all public.ecr.aws/amazonlinux/amazonlinux:2 nvidia-smi
View /var/log/cloud-init-output.log to troubleshoot any installation issues.
Perform post-installation actions in order to use the CUDA Toolkit. To verify the integrity of the installation, you can download, compile, and run CUDA samples such as deviceQuery, as in the sketch below.
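A sketch for building and running deviceQuery from the NVIDIA cuda-samples repository; the release tag and directory layout below are assumptions and may differ between sample releases.
# Build deviceQuery against the CUDA Toolkit in /usr/local/cuda
sudo yum install -y git
git clone --branch v12.4.1 --depth 1 https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/1_Utilities/deviceQuery
make
./deviceQuery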
If Docker and the NVIDIA Container Toolkit (but not the CUDA Toolkit) are installed and configured, you can use the CUDA samples container image to validate the CUDA driver.
sudo docker run --rm --runtime=nvidia --gpus all nvcr.io/nvidia/k8s/cuda-sample:devicequery
GUI (graphical desktop) remote access
If you need remote graphical desktop access, refer to Install GUI (graphical desktop) on Amazon EC2 instances running Amazon Linux 2 (AL2)?
Note that this article installs the NVIDIA Tesla driver (also known as the NVIDIA Data Center driver), which is intended primarily for GPU compute workloads. If configured in xorg.conf, Tesla drivers support one display of up to 2560x1600 resolution.
GRID drivers provide access to four 4K displays per GPU and are certified to provide optimal performance for professional visualization applications. AMIs preconfigured with GRID drivers are available from AWS Marketplace. You can also consider using amazon-ec2-nice-dcv-samples CloudFormation templates to provision your own EC2 instances with either NVIDIA Tesla or GRID driver, Docker with NVIDIA Container Toolkit, graphical desktop environment and Amazon DCV remote display protocol server.
Uninstallation
CUDA Toolkit
To uninstall the CUDA Toolkit, run the uninstallation script provided in the toolkit's bin directory. For version 12.4
sudo /usr/local/cuda-12.4/bin/cuda-uninstaller
NVIDIA Driver
To remove the NVIDIA driver
sudo /usr/bin/nvidia-uninstall
You may have to delete the /usr/src/nvidia-$DRIVER_VERSION folder manually, for example:
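Assuming driver version 570.195.03 was installed:
sudo rm -rf /usr/src/nvidia-570.195.03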