2 Answers
Hi @blakem,
I can confirm that the first issue is caused by the head node not having a GPU. To experiment on one of the compute nodes, submit a job, look up the hostname of the node it is assigned to, and once the job is in the Running state, connect to that node with SSH:
[ec2-user@ip-10-0-0-33 ~]$ sbatch --wrap "sleep 100"
Submitted batch job 1
[ec2-user@ip-10-0-0-33 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1 queue1 wrap ec2-user R 0:03 1 queue1-dy-queue1-t2medium-1
[ec2-user@ip-10-0-0-33 ~]$ ssh queue1-dy-queue1-t2medium-1
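If you want to script the steps above instead of reading the hostname off the screen, something like the following might work. It is only a sketch: `node_for_job` is a hypothetical helper name, and it assumes the default `squeue` column layout shown in the transcript (state in the fifth column, node list in the last one).

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `squeue` output on stdin and print the node
# assigned to the given job, but only once the job is in the R (running) state.
# Assumes the default squeue columns: JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
node_for_job() {
  local job_id="$1"
  awk -v id="$job_id" '$1 == id && $5 == "R" { print $NF }'
}

# Possible usage on the head node (needs Slurm, so it is left commented out):
#   job_id=$(sbatch --parsable --wrap "sleep 600")   # --parsable prints the job ID
#   node=$(squeue | node_for_job "$job_id")          # empty until the job is running
#   [ -n "$node" ] && ssh "$node"
```

While the job is still pending, the helper prints nothing, so you can re-run the `squeue` line until a hostname appears.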
Once on the compute node, you can try installing the package manually.
If that works as expected, you can automate the installation with an OnNodeConfigured custom bootstrap action: https://docs.aws.amazon.com/parallelcluster/latest/ug/custom-bootstrap-actions-v3.html
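As a rough idea of what such a bootstrap script could look like, here is a minimal sketch. The function names are hypothetical, the install step is a placeholder for whatever command worked when you ran it by hand, and using `nvidia-smi -L` as the GPU check is an assumption that the NVIDIA driver is already present on the node.

```shell
#!/usr/bin/env bash

# Hypothetical check: succeed only if the probe command exists and can list GPUs.
# Defaults to nvidia-smi; assumes the NVIDIA driver is installed on GPU nodes.
has_gpu() {
  local probe="${1:-nvidia-smi}"
  command -v "$probe" >/dev/null 2>&1 && "$probe" -L >/dev/null 2>&1
}

# Placeholder: replace the echo with the install command that worked manually.
install_gpu_package() {
  echo "GPU detected; installing package"
}

# Skip the installation on nodes without a GPU (for example the head node).
if has_gpu; then
  install_gpu_package
else
  echo "No GPU detected; skipping installation"
fi
```

Guarding the install this way means the same script does not fail if it ever runs on a node without a GPU.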
Enrico
answered 2 years ago
@enrico-aws, thanks for the quick turnaround and suggestion. I had to wait for 10+ minutes for my GPU node to finish initializing, but once it was running I was able to log into the GPU node.
answered 2 years ago