Get the latest from re:Invent 2022

Content level: Foundational

All you need to know about AWS Trainium and Inferentia at re:Invent 2022

Authored by Dr. Max Liu

re:Invent 2022, AWS’s biggest cloud event of the year, drew more than 50,000 cloud enthusiasts to Las Vegas in the last week of November, and more than 300,000 more registered to tune in virtually. Attendees learned the latest in cloud technology, got hands-on training from experts, and heard directly from leaders across the AWS organization.

For the machine learning and AI crowd, we offered many sessions about AWS’s purpose-built ML accelerators: AWS Inferentia and AWS Trainium. These sessions covered why Inferentia and Trainium may be a good fit for you, how to optimize your models for the highest performance at lower cost, and how to build sustainable solutions that accelerate deep learning applications in the cloud. We also had deep-dive sessions to help you learn more about the technology and hands-on workshops to help you get started.

Missed a session or two during the busy week in Las Vegas? No worries, we’ve got you covered! Here we’ve summarized the keynote highlights and breakout sessions for AWS Trainium and Inferentia, so you can catch up at your own pace, all in one place. Enjoy!


Keynote Highlights

Monday Night Live with Peter DeSantis

Peter unveiled Trn1n instances with 1,600 Gbps of Elastic Fabric Adapter (EFA) networking bandwidth, and took a deep dive into ML trends, including how hardware-enabled stochastic rounding on Trn1 helps our customers finish training 20% faster. We also got a fun Rings of Power shout-out while he showed how innovations unique to Trainium and EFA improve the scaling of collective compute.
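
Stochastic rounding rounds a value up or down at random, with the probability of rounding up equal to how far the value sits between its two representable neighbors, so rounding errors average out to zero instead of accumulating in one direction. The Python sketch below illustrates the idea in software, assuming BF16 (the top 16 bits of a float32) as the reduced-precision target. It is not how Trainium implements the feature in hardware, and all function names here are hypothetical.

```python
import random
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 float32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits: int) -> float:
    """Interpret the low 32 bits of an int as an IEEE-754 float32."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def round_nearest_bf16(x: float) -> float:
    """Round-to-nearest-even to BF16 (keep the top 16 bits of the float32)."""
    bits = f32_bits(x)
    lsb = (bits >> 16) & 1  # lowest bit BF16 keeps, used to break ties to even
    return bits_f32((bits + 0x7FFF + lsb) & 0xFFFF0000)


def round_stochastic_bf16(x: float, rng: random.Random) -> float:
    """Stochastically round x to BF16.

    Adding a uniform random integer in [0, 2**16) to the 16 bits that BF16
    drops, then truncating, rounds the magnitude up with probability equal
    to the discarded fraction, so the result is unbiased in expectation.
    NaN/Inf and overflow handling are omitted in this sketch.
    """
    bits = f32_bits(x) + rng.getrandbits(16)
    return bits_f32(bits & 0xFFFF0000)


def accumulate(step: float, n: int, rounder) -> float:
    """Add `step` to a BF16 accumulator n times, rounding after every add."""
    acc = 1.0
    for _ in range(n):
        acc = rounder(acc + step)
    return acc


if __name__ == "__main__":
    rng = random.Random(0)
    tiny = 2.0 ** -9  # a quarter of BF16's spacing (2**-7) just above 1.0
    print("nearest:   ", accumulate(tiny, 1000, round_nearest_bf16))
    print("stochastic:", accumulate(tiny, 1000, lambda v: round_stochastic_bf16(v, rng)))
    print("exact:     ", 1.0 + 1000 * tiny)
```

With round-to-nearest, adding 2^-9 to 1.0 in BF16 always rounds back to 1.0, so the accumulator stalls; with stochastic rounding the small updates survive on average and the result lands near the exact sum. This is one intuition for why stochastic rounding matters when training with many small gradient updates in reduced precision.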

Keynote with Adam Selipsky

Adam covered all of AWS’s purpose-built ML accelerator product lines, Trn1 (50% lower training cost) and Inf1 (70% lower cost per inference), and announced the new Inf2 instances! He highlighted the ability to deploy models as large as 175B parameters on Inf2, along with 4x higher performance and 1/10th the latency of Inf1.


Sessions

Introducing AWS Inferentia2-based EC2 Inf2 instances

Introducing the new Amazon EC2 Inf2 instances featuring AWS Inferentia2, the third ML accelerator built by AWS and optimized for ML inference. In this session, learn about how Inf2 instances deliver the lowest cost-per-inference for customers’ most demanding, 100B+ parameter deep learning models in the cloud. Dive deep into how AWS Inferentia2 has been architected to deploy the next generation of 100B+ parameter deep learning models for inference on Amazon EC2.

Accelerate deep learning and innovate faster with AWS Trainium

Amazon EC2 Trn1 instances, powered by AWS Trainium chips, are purpose-built for high-performance deep learning training and offer up to 50 percent cost-to-train savings over equivalent GPU-based instances. In this session, learn about AWS Trainium and Trn1 innovations, the AWS collaboration with PyTorch and Hugging Face, and the successes users have seen.

Train a large language model using Hugging Face and AWS Trainium

Sustainability and AWS silicon

With the world’s increasing need for computing and machine learning becoming mainstream, continually innovating at the chip level is critical to sustainably powering the workloads of the future. In this session, learn how AWS continues to innovate on chip design as the organization works toward Amazon’s goal of achieving net-zero carbon by 2040. Find out about the carbon emissions associated with the silicon manufacturing process and hardware usage, and how the design process at AWS delivers higher power efficiency and lower carbon footprint for chips designed by AWS. Learn how sustainability is integrated into Pinterest’s AWS architecture decisions.

Reduced costs and better performance for startups with AWS Inferentia

Amazon EC2 Inf1 instances, powered by AWS Inferentia chips, deliver up to 70 percent lower cost per inference and up to 2.3 times higher throughput than comparable GPU-based Amazon EC2 instances. In this session, learn about how startup companies have realized these benefits to grow their businesses and deliver innovative experiences to their end users.

AI parallelism explained: How Amazon Search scales deep-learning training

Transformer-based models have driven rapid growth in model size and complexity over the past few years, to more than 100 billion parameters, in pursuit of corresponding gains in accuracy and capability. Broader adoption of these advances is held back by the difficulty of scaling training across heterogeneous infrastructure. In this session, dive deep into parallelism strategies. Learn how Amazon Search trains large language models using various parallelism strategies and deploys them into production at scale. See a demo of all strategies (including DeepSpeed and PyTorch FSDP) and open-source code.
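
One family of strategies in this space is tensor (model) parallelism, where a single layer’s weights are split across devices. The pure-Python sketch below simulates one common form, a column-parallel linear layer in the style popularized by Megatron-LM, with the devices simulated in a single process. It is not Amazon Search’s code and does not use the DeepSpeed or PyTorch FSDP APIs; all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix multiply on nested lists."""
    k, m = len(w), len(w[0])
    return [[sum(row[p] * w[p][j] for p in range(k)) for j in range(m)] for row in x]


def shard_columns(w: Matrix, num_devices: int) -> List[Matrix]:
    """Split w's output columns into equal contiguous slices, one per device."""
    m = len(w[0])
    if m % num_devices:
        raise ValueError("output dimension must divide evenly across devices")
    size = m // num_devices
    return [[row[d * size:(d + 1) * size] for row in w] for d in range(num_devices)]


def column_parallel_linear(x: Matrix, w: Matrix, num_devices: int) -> Matrix:
    """Simulate a column-parallel linear layer.

    Every simulated device holds the full input x but only its slice of w,
    computes its slice of the output, and an all-gather concatenates the
    slices back into the full (n x m) result.
    """
    partial_outputs = [matmul(x, shard) for shard in shard_columns(w, num_devices)]
    # "All-gather": for each row, concatenate the per-device column slices in order.
    return [sum((out[i] for out in partial_outputs), []) for i in range(len(x))]


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6]]                       # batch of 2, hidden size 3
    w = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 1, 0, 1]]   # 3 x 4 weight matrix
    print(column_parallel_linear(x, w, num_devices=2))
    print(matmul(x, w))
```

Because each device stores only 1/num_devices of w, the per-device memory for the layer’s weights shrinks proportionally, at the cost of an all-gather of the outputs. Real frameworks run the shards on separate accelerators and overlap that communication with compute.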

Silicon innovation at AWS

Organizations are bringing diverse workloads onto AWS faster than ever before. To run these workloads with the performance and costs that users expect, AWS often innovates on their behalf, all the way down to the silicon level. AWS’s silicon design efforts began with the AWS Nitro System and quickly extended to AWS Graviton processors and to purpose-built inference chips with AWS Inferentia. In this session, explore the AWS journey into silicon innovation and learn about the thought processes, lessons, and results from the experience so far.

Choosing the right accelerator for training and inference

Amazon EC2 provides the broadest and deepest portfolio of instances for machine learning applications. From GPU-based high-performance instances such as P4 and G5, to Trn1 and Inf1 instances purpose-built with AWS silicon for best price performance, there’s a right instance for each of your machine learning workloads. In this session, learn about these instances, benchmarks, and ideal use case guidelines for each of these instances. See a demo of how to initiate and scale machine learning workloads in production.

The Six Five On the Road with Gadi Hutt of Annapurna Labs at re:Invent 2022

The Six Five On the Road at AWS re:Invent 2022. Patrick Moorhead and Daniel Newman sit down with Gadi Hutt, Director of Business Development for Annapurna Labs, an Amazon Web Services (AWS) company. Their discussion covers EC2 and silicon innovation, including an overview of the news announced at re:Invent 2022.


About the Author

Max Liu, PhD, MBA, is a tech business executive with 15+ years of experience building products and growing businesses powered by AI and machine learning, cloud, and automation. He has held multiple leadership positions in engineering, product management, and business development at firms ranging from startups to the world’s largest enterprises. At AWS, Max is a Principal Specialist focused on go-to-market and technical business development of AWS accelerated computing services for AI/ML. Prior to AWS, he was Director of Product Management at Intel, where he managed the automotive product roadmap and created the company’s first purpose-built, server-class processor for autonomous driving. Earlier in his career, Max co-founded a robotics startup, and before that he was an R&D program manager at Ericsson. An active researcher, Max has published many papers and reviewed peers’ work for leading journals and conferences. Max received his PhD in Thermophysics from Tsinghua University, China, and his MBA with High Distinction from the University of Michigan.
