All Content tagged with Elastic Fabric Adapter

Elastic Fabric Adapter (EFA) is a network interface for Amazon EC2 instances that enables customers to run applications requiring high levels of inter-node communications at scale on AWS.

7 results
[Join our experts LIVE on Twitch](https://bit.ly/4anH9WR) to dive deep into Implementing Health Checks for Large-scale AI/ML Training
This article provides a systematic approach to diagnose and resolve performance issues in distributed large language model (LLM) training operations. This article also focuses on pre-flight checks and...
Instead of streaming models from network-based storage, this article describes how to preload model weights onto high-speed local storage to reduce startup times for large language models (LLMs).
NCCL operations use RDMA WRITE/WRITE IMM for a few of the collectives (all_reduce, all2all), but according to the support page, Nitro V3 doesn't support RDMA Write. Does that mean NCCL doesn't run on Nitro V3?
1 answer, 0 votes, 174 views, asked 7 months ago
* I have 2 EC2 instances of size p4de.24xlarge.
* They have each been created with a single network interface on them that is EFA-enabled.
* I am able to see that the efa interface exists and that lib...
Accepted Answer
Elastic Fabric Adapter
2 answers, 0 votes, 507 views, asked 2 years ago
Today I followed the EFA getting-started doc https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-enable to install the EFA software on Amazon Linux 2, but it failed on the last step, "Testing E...
1 answer, 0 votes, 1.6K views, asked 4 years ago
I followed the guide https://www.hpcworkshops.com/07-efa/01-create-efa-cluster.html to create an HPC cluster and ran the MPI hello world application (git clone https://github.com/mpitutorial/mpitutor...
3 answers, 1 vote, 1.3K views, asked 4 years ago