Best way to run GPU containers


I'm bringing an existing containerized solution into AWS and have a few questions. I uploaded the containers to ECR, created RDS, etc., created a cluster and task definitions, started everything, and all is good. Now I need to figure out how to run a containerized custom NVIDIA Triton server in AWS. It needs one or two A100 80GB GPUs or similar.

  1. What is the best instance to run one or two A100 GPUs? If the instance is a p4d.24xlarge, can I use only one GPU from it, or does billing run for all 8?
  2. I'm planning to turn the GPU task on only when needed (for example, 8 hours a day) and destroy it when it isn't needed. I found many ways to do this, but what is the recommended way?
  3. Should I use service discovery to get the GPU server's IP and port? The GPU server would run in task 2, while the container that calls it is in task 1. Is there a simpler method for when the GPU server is started and destroyed/stopped? Every time the instance is started, it gets new IP addresses. Is there a better way? The idea is to save $$$ on GPU cost.

Thanks in advance!

Asked 6 months ago · 384 views
1 answer

Hello,

You can run A100 GPUs on a p4d.24xlarge, which is charged at $32.7726/hr. This is the On-Demand price, and you are charged for the whole instance, i.e. all 8 GPUs, even if your task uses only one of them. You can find more information on pricing here [1].
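
To put the "save $$$" goal in numbers, here is a rough daily-cost comparison using the On-Demand rate quoted above. The rate is taken from this answer and is an assumption for your Region and date; check the pricing page [1] before relying on it.

```python
# Rough daily cost of one p4d.24xlarge, always-on vs. an 8 h/day schedule.
# HOURLY_RATE_USD is the On-Demand rate quoted in this answer (an assumption;
# prices vary by Region and change over time).
HOURLY_RATE_USD = 32.7726   # billed for the whole instance (all 8 A100 GPUs)
GPUS_PER_INSTANCE = 8


def daily_cost(hours_per_day: float, rate: float = HOURLY_RATE_USD) -> float:
    """Cost in USD for running the instance the given number of hours per day."""
    return round(hours_per_day * rate, 2)


if __name__ == "__main__":
    print(f"24 h/day: ${daily_cost(24):,.2f}")
    print(f" 8 h/day: ${daily_cost(8):,.2f}")
    # Per-GPU figure is only an equivalent; you cannot rent a single GPU of a p4d.
    print(f"per-GPU equivalent: ${HOURLY_RATE_USD / GPUS_PER_INSTANCE:.2f}/h")
```

Because billing is per instance, the schedule (hours running) is the main lever; using only one of the eight GPUs does not reduce the bill.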

As you mentioned, there are different ways to run your workloads for a specified time. Which one is best depends on your use case and requirements, so it is worth identifying those first. Service discovery is the straightforward way for the client task to find the GPU server after it restarts with a new IP address. Since the price above is On-Demand, you may also consider Spot Instances if they suit your workload, and try to get as much work done as possible in fewer running hours.
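
As one illustration of the schedule-plus-service-discovery approach, the sketch below shows a Lambda-style handler that sets the GPU ECS service's `desiredCount` to 1 or 0 (assumed to be triggered by two EventBridge schedules passing `{"action": "start"}` / `{"action": "stop"}`), and a client-side lookup of the Triton server through AWS Cloud Map's `discover_instances`. The cluster, service, and namespace names are hypothetical placeholders. Note that scaling the service to 0 only stops the task; the p4d instance keeps billing unless the underlying EC2 capacity also scales in, for example via an ECS capacity provider with managed scaling on an Auto Scaling group.

```python
from __future__ import annotations

# Hypothetical resource names; replace with your own.
CLUSTER = "my-cluster"
GPU_SERVICE = "triton-gpu"
NAMESPACE = "internal.local"
DISCOVERY_SERVICE = "triton"
TRITON_HTTP_PORT = 8000  # Triton's default HTTP port


def set_gpu_service_running(ecs, running: bool) -> int:
    """Set the ECS service's desired task count to 1 (on) or 0 (off)."""
    desired = 1 if running else 0
    ecs.update_service(cluster=CLUSTER, service=GPU_SERVICE, desiredCount=desired)
    return desired


def handler(event, context, ecs=None):
    """Lambda entry point; assumes the schedule passes {"action": "start"|"stop"}."""
    if ecs is None:
        import boto3  # imported lazily so the logic can be tested without AWS
        ecs = boto3.client("ecs")
    return set_gpu_service_running(ecs, event.get("action") == "start")


def find_triton_endpoints(sd) -> list[tuple[str, int]]:
    """Return (ip, port) pairs for healthy Triton tasks registered in Cloud Map."""
    resp = sd.discover_instances(
        NamespaceName=NAMESPACE,
        ServiceName=DISCOVERY_SERVICE,
        HealthStatus="HEALTHY",
    )
    endpoints = []
    for inst in resp.get("Instances", []):
        attrs = inst.get("Attributes", {})
        ip = attrs.get("AWS_INSTANCE_IPV4")
        # The port attribute may be absent (e.g. A records only), so fall back
        # to Triton's default HTTP port.
        port = int(attrs.get("AWS_INSTANCE_PORT", TRITON_HTTP_PORT))
        if ip:
            endpoints.append((ip, port))
    return endpoints
```

In task 1 you would call `find_triton_endpoints(boto3.client("servicediscovery"))` before sending requests. With a Cloud Map DNS namespace, a simpler alternative is to resolve the service's DNS name (here the hypothetical `triton.internal.local`), since the record is updated with the new task's IP on each start.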

I have included some links below; you may find them helpful.

[1] https://aws.amazon.com/ec2/pricing/on-demand/
[2] https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-gpu.html#gpu-considerations
[3] https://aws.amazon.com/blogs/compute/optimizing-gpu-utilization-for-ai-ml-workloads-on-amazon-ec2/
[4] https://aws.amazon.com/blogs/compute/10-things-you-can-do-today-to-reduce-aws-costs/
[5] https://aws.amazon.com/ec2/cost-and-capacity/

AWS
sanju_s
Answered 6 months ago
