Hello,
Thank you for contacting us and for using the AWS Deep Learning AMI.
Please note that nvidia-smi errors like this generally occur when the Deep Learning AMI GPU PyTorch 1.11.0 (Amazon Linux 2) 20220328 is launched on an unsupported instance type.
According to the release notes at https://aws.amazon.com/releasenotes/aws-deep-learning-ami-gpu-pytorch-1-11-amazon-linux-2/ , the supported instance types for this AMI are G3, P3, P3dn, P4d, G5, and G4dn.
I got the same errors you mentioned when I tried an unsupported EC2 instance type. Switching to a supported instance type resolved the issue for me. Can you try the same on your end?
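As a rough illustration of the check described above, here is a minimal Python sketch that compares an instance type against the families listed in this answer. The names `SUPPORTED_FAMILIES`, `instance_family`, and `is_supported` are hypothetical, the list is simply copied from this post, and the exact-match rule is an assumption; please confirm the supported types against the release notes for the AMI you are actually using.

```python
# Families copied from the list in this answer (PyTorch 1.11.0 AMI).
# Assumption: replace this set with the one from your AMI's release notes.
SUPPORTED_FAMILIES = {"g3", "p3", "p3dn", "p4d", "g5", "g4dn"}


def instance_family(instance_type: str) -> str:
    """Return the part before the first dot, e.g. 'g4dn' for 'g4dn.2xlarge'."""
    return instance_type.strip().lower().split(".", 1)[0]


def is_supported(instance_type: str, supported=SUPPORTED_FAMILIES) -> bool:
    """Exact family match only; variants of a family are not inferred."""
    return instance_family(instance_type) in supported
```

Because the match is exact, a variant such as g3s is not treated as G3 here. Whether such variants are covered is something to verify in the release notes rather than assume.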
If you need further help, I recommend opening a support case with more information about your Deep Learning AMI EC2 instance, as we cannot discuss account-specific issues in public posts. You can open a support case with AWS using this link:
https://console.aws.amazon.com/support/home?#/case/create
Thanks :)
Yup, same issue here. The documentation is inconsistent. This is the login banner:
Deep Learning Proprietary Nvidia Driver AMI GPU TensorFlow 2.13 (Amazon Linux 2)
Please use the following command to activate the TensorFlow 2.13 Pip Environment:
TensorFlow 2.13 ____________________ source /opt/tensorflow/bin/activate
- Supported EC2 instances: P3, P3dn, G3, G5, G4dn.
- For P4 EC2 instances, please use OSS Nvidia Driver DLAMI.
- EC2 P2 Instance is not supported on current DLAMI.
NVIDIA driver version: 535.129.03
CUDA version: 11.8
AWS Deep Learning AMI Homepage: https://aws.amazon.com/machine-learning/amis/
Release Notes: https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html
Support: https://forums.aws.amazon.com/forum.jspa?forumID=263
For a fully managed experience, check out Amazon SageMaker at https://aws.amazon.com/sagemaker
Security scan reports for python packages are located at: /opt/aws/dlami/info/
source /opt/tensorflow/bin/activate
Error: Note that the Amazon EC2 g4dn.2xlarge instance type is not supported by current Deep Learning AMI.
Please refer the DLAMI release notes https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html for more information.
I have tried multiple instance types with the Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.1.0 (Ubuntu 20.04) 20240116, and I have requested quota for G and P instance types. I still get the same problem!
Hey, I have the same problem. I have a g3s.xlarge instance in eu-west-2 with the image Deep Learning OSS Nvidia Driver AMI GPU PyTorch 1.13.1 (Amazon Linux 2) 20240123. When I run "nvidia-smi", I get the error "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." And when I try to use PyTorch, no GPUs are discovered.
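For anyone collecting details before opening a support case, a small script along these lines might help tell the driver error quoted above apart from other failures. This is only a sketch: the function names and the result labels are hypothetical, and it assumes the error text matches the message quoted in this post.

```python
import shutil
import subprocess

# Substring taken from the error message quoted in this thread.
DRIVER_ERROR = "couldn't communicate with the NVIDIA driver"


def classify_nvidia_smi(returncode: int, output: str) -> str:
    """Map an nvidia-smi exit code and its output to a short label."""
    if returncode == 0:
        return "ok"
    if DRIVER_ERROR in output:
        return "driver-not-loaded"
    return "unknown-failure"


def run_nvidia_smi() -> str:
    """Run nvidia-smi if it is on PATH and classify the result."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return "nvidia-smi-not-found"
    proc = subprocess.run([exe], capture_output=True, text=True)
    return classify_nvidia_smi(proc.returncode, proc.stdout + proc.stderr)
```

Calling `print(run_nvidia_smi())` on the instance gives a one-word summary to include in the case. A "driver-not-loaded" result would be consistent with PyTorch finding no GPUs (for example, `torch.cuda.is_available()` returning `False`), but it does not by itself say why the driver failed to load.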