Can I preview a course before enrolling?

Yes, you can preview the first video and view the syllabus before you enroll. You must purchase the course to access content not included in the preview.

When will I have access to the lectures and assignments?

If you decide to enroll in the course before the session start date, you will have access to all of the lecture videos and readings for the course. You’ll be able to submit assignments once the session starts.

What will I get when I enroll?

Once you enroll and your session begins, you will have access to all videos and other resources, including reading items and the course discussion forum. You’ll be able to view and submit practice assessments, and complete required graded assignments to earn a grade and a Course Certificate.

When will I receive my Course Certificate?

If you complete the course successfully, your electronic Course Certificate will be added to your Accomplishments page. From there, you can print your Course Certificate or add it to your LinkedIn profile.

Why can’t I audit this course?

This course is currently available only to learners who have paid or received financial aid, when available.

Is financial aid available?

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.

Train Large Language Models Faster - Parallelism Deep Dive


Instructor: Packt - Course Instructors


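16 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

1 week to complete

At 10 hours a week

Flexible schedule

Learn at your own pace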

What you'll learn

Learn to apply parallelism strategies to accelerate LLM training.
Understand the differences and use cases of data, model, and hybrid parallelism.
Gain hands-on experience with PyTorch and DeepSpeed for LLM training optimization.
Master fault tolerance and checkpointing strategies to ensure training reliability.

Tools you'll learn

Generative AI

Details to know

Shareable certificate

Add to your LinkedIn profile

There are 16 modules in this course

This course features Coursera Coach!

A smarter way to learn with interactive, real-time conversations that help you test your knowledge, challenge assumptions, and deepen your understanding as you progress through the course.

This course focuses on accelerating the training of large language models (LLMs) through parallelism strategies. By exploring techniques such as data, model, and hybrid parallelism, you will learn how to optimize training processes for faster results. The course breaks down complex topics in a structured way, starting with an introduction to parallel computing and scaling laws, before diving into hands-on applications using popular libraries like PyTorch and DeepSpeed. You will also gain practical experience running parallelism strategies on multi-GPU systems and exploring fault tolerance techniques to ensure reliable training. The course integrates theoretical concepts with real-world examples to provide a comprehensive understanding of LLM training.

Throughout the course, you will explore various types of parallelism (data, model, pipeline, and tensor parallelism) and their applications in LLMs. You’ll work with datasets like MNIST and WikiText, gaining hands-on experience implementing parallel strategies to optimize training speed. The course culminates in an exploration of advanced checkpointing strategies and fault tolerance methods, ensuring you understand how to recover from system failures during training.

This course is perfect for learners interested in optimizing machine learning workflows and accelerating AI model development. A background in machine learning or deep learning is recommended, and the course is suitable for intermediate learners seeking to deepen their knowledge of LLM training strategies. By the end of the course, you will be able to implement and compare various parallelism techniques for LLM training, run distributed training on multi-GPU environments, apply fault tolerance strategies, and understand advanced topics in parallel computing.

In this module, we will introduce the course, explain the key objectives, and provide a roadmap of how parallelism techniques will accelerate large language model training. You will gain an overview of what to expect and get familiar with the course structure.

What's included

3 videos, 1 reading

In this module, we will explore the different parallelism strategies for LLM training and compare single-GPU training with parallel strategies. You'll understand how parallelism improves efficiency and learn its key advantages in real-world applications.

What's included

4 videos, 1 assignment

In this module, we will establish a foundational understanding of IT concepts crucial for training LLMs. Topics like cloud computing, storage solutions, and computer architecture will provide the context for optimizing LLM workflows.

What's included

10 videos, 1 assignment

10 videos, 55 minutes total

IT Fundamentals - Introduction (1 min)
Introduction to Cloud Computing and Traditional IT (8 min)
What is a Computer - CPU and RAM Overview (7 min)
Data Storage and File Systems (4 min)
OS File System Structure (3 min)
LAN Introduction (11 min)
What is the Internet (8 min)
Internet Communication Deep Dive (5 min)
Understanding Servers and Clients (7 min)
GPUs - Overview (2 min)

1 assignment, 15 minutes total

IT Fundamental Concepts - Assessment (15 min)

In this module, we will explore GPU architecture and its role in LLM training. You'll learn how GPUs are designed to handle the massive computations required by large models, ensuring faster and more efficient training.

What's included

2 videos, 1 assignment

In this module, we will cover the fundamentals of machine learning and deep learning. We’ll explore neural networks, training processes, and key differences between ML and DL to lay the groundwork for LLM training.

What's included

11 videos, 1 assignment

11 videos, 63 minutes total

Machine and Deep Learning Introduction (2 min)
Deep and Machine Learning - Overview and Breakdown (9 min)
Deep Learning Key Aspects (11 min)
Deep Neural Networks - Deep Dive (9 min)
The Single Neuron Computation - Deep Dive (6 min)
Weights (3 min)
Activation Functions - Deep Dive (6 min)
Deep Learning - Summary (2 min)
Machine Learning Introduction - ML vs DL (5 min)
Learning Types and Full ML & DL Analogy Example (6 min)
DL and ML Comparative Capabilities - Summary (4 min)

1 assignment, 15 minutes total

Deep and Machine Learning - Deep Dive - Assessment (15 min)

In this module, we will dive into the fundamentals of LLMs, starting with the Transformer architecture. You'll learn about key components such as self-attention and how the Transformers library powers modern AI applications.

What's included

5 videos, 1 assignment

In this module, we will introduce parallel computing concepts and their relevance to LLM training. You’ll gain a deeper understanding of how parallelism reduces bottlenecks and accelerates model development.

What's included

2 videos, 1 assignment

In this module, we will explore data, model, and hybrid parallelism in detail. You’ll learn how each strategy optimizes training workflows and where to apply them for maximum efficiency in LLM training.

What's included

11 videos, 1 assignment

11 videos, 49 minutes total

Types of Parallelism in LLM Training (2 min)
Data Parallelism - How It Works (12 min)
Data Parallelism Advantages for LLM Training (1 min)
Real-world Example - Data Parallelism in GPT-3 Training (5 min)
Model Parallelism and Tensor Parallelism and Layer Parallelism - Deep Dive (8 min)
LLM Relevance and Implementation (2 min)
Model vs Data Parallelism (9 min)
Key Differences Highlighted - Data vs Model Parallelism (3 min)
Data vs Model Parallelism (2 min)
Hybrid Parallelism - Animation (4 min)
Hybrid Parallelism - What is It and Motivation (2 min)

1 assignment, 15 minutes total

Types of Parallelism in LLM Training - Data, Model, and Hybrid Parallelism - Assessment (15 min)
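
The data parallelism idea covered in this module can be sketched without any GPU. The snippet below is a minimal illustration and not course material: it assumes a toy one-parameter model (y = w * x), simulates workers as shards of a Python list, and stands in for the all-reduce with a plain average. All function names are hypothetical.

```python
# Simulated data parallelism in plain Python (no GPUs, no frameworks).
# Assumed toy model: y = w * x, with mean squared error as the loss.

def mse_gradient(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def split_batch(batch, num_workers):
    """Split a batch into equally sized shards, one per simulated worker."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_step(w, batch, num_workers, lr):
    """One step: each worker computes a local gradient, then gradients are averaged."""
    shards = split_batch(batch, num_workers)
    # On real hardware these gradients would be computed concurrently.
    local_grads = [mse_gradient(w, shard) for shard in shards]
    # Averaging stands in for an all-reduce across devices.
    avg_grad = sum(local_grads) / num_workers
    return w - lr * avg_grad

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # samples of y = 2x
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batch, num_workers=2, lr=0.05)
print(w)  # approaches the true slope of 2
```

Because the shards are equally sized, the averaged gradient matches the full-batch gradient, so the update is the same as on a single device while each worker processes only part of the batch.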

In this module, we will delve into pipeline and tensor parallelism, explaining their key concepts and how they work together to enhance training efficiency. You’ll also explore real-world strategies for implementing these techniques.

What's included

11 videos, 1 assignment

11 videos, 55 minutes total

Pipeline Parallelism Overview (3 min)
Pipeline Parallelism Key Concepts and How it Works - Step by Step (7 min)
Pipeline Bubbles Key Concepts (3 min)
Pipeline Schedules Key Concepts (4 min)
Activation Recomputation - Overview and Introduction (2 min)
Neural Network and Activation and Backward and Forward Passes - Full Dive (7 min)
Understanding Activation Recomputation vs Standard Training - Deep Dive (10 min)
Demo - Activation Recomputation Visualization (3 min)
Activation Recomputation vs Standard Approach (4 min)
Benefits of Activation Recomputation and Implementation Strategies (9 min)
Pipeline Parallelism Implementation Frameworks and Key Takeaways (4 min)

1 assignment, 15 minutes total

Types of Parallelism - Pipeline and Tensor Parallelism - Assessment (15 min)
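
Pipeline bubbles, one of the topics listed above, can be made concrete with a small simulation. The sketch below is an illustration under simplifying assumptions that are not taken from the course: only forward passes are modeled, every stage takes one time step per micro-batch, and the schedule is a simple fill-and-drain one. Function names are hypothetical.

```python
# Toy fill-and-drain pipeline schedule (forward passes only, assumed equal stage times).

def forward_schedule(num_stages, num_microbatches):
    """Return grid[stage][time_step]: a micro-batch index, or None when the stage is idle."""
    total_steps = num_stages + num_microbatches - 1
    grid = [[None] * total_steps for _ in range(num_stages)]
    for stage in range(num_stages):
        for mb in range(num_microbatches):
            # Stage s can only start micro-batch mb after the earlier stages have passed it on.
            grid[stage][stage + mb] = mb
    return grid

def bubble_fraction(grid):
    """Share of (stage, time_step) slots in which a stage sits idle."""
    slots = sum(len(row) for row in grid)
    idle = sum(cell is None for row in grid for cell in row)
    return idle / slots

grid = forward_schedule(num_stages=4, num_microbatches=4)
for row in grid:
    print(" ".join("." if cell is None else str(cell) for cell in row))

for m in (4, 16, 64):
    print(m, bubble_fraction(forward_schedule(4, m)))
```

Under these assumptions the idle share works out to (p - 1) / (m + p - 1) for p stages and m micro-batches, which is why splitting a batch into more micro-batches shrinks the bubble.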

In this module, we will dive deep into tensor parallelism, focusing on partitioning strategies, communication patterns, and device synchronization. You'll gain a clear understanding of how this technique accelerates LLM training.

What's included

8 videos, 1 assignment

8 videos, 40 minutes total

What is Tensor Parallelism and Why - Benefits (5 min)
Tensor Parallel Pizza Making Analogy (2 min)
Tensors and Partitioning Strategies - Deep Dive (7 min)
Tensor Communication Patterns - Deep Dive (9 min)
Device Mesh Communication Pattern - Deep Dive (5 min)
How Components Work Together in Distributed LLM Training (4 min)
Understanding Tensor Parallelism with LEGO Bricks Animation Demo (4 min)
Putting it All Together - All Strategies in LLM Training (4 min)

1 assignment, 15 minutes total

Tensor Parallelism - Deep Dive - Assessment (15 min)
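
One common partitioning strategy for tensor parallelism is to split a weight matrix by columns. The sketch below shows that idea in plain Python as an assumption-laden illustration rather than the course's own code: "devices" are just list entries, and the final concatenation stands in for an all-gather. All names are hypothetical.

```python
# Column-wise tensor parallelism for a single linear layer, simulated with nested lists.

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def split_columns(w, num_devices):
    """Give each simulated device a contiguous block of the weight matrix's columns."""
    cols = len(w[0])
    if cols % num_devices != 0:
        raise ValueError("column count must be divisible by num_devices")
    width = cols // num_devices
    return [[row[d * width:(d + 1) * width] for row in w] for d in range(num_devices)]

def column_parallel_matmul(x, w, num_devices):
    """Each device multiplies the full input by its own column shard of w."""
    shards = split_columns(w, num_devices)
    partials = [matmul(x, shard) for shard in shards]  # concurrent on real hardware
    # Concatenating the partial outputs along columns stands in for an all-gather.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
print(column_parallel_matmul(x, w, num_devices=2))
print(matmul(x, w))  # same result as the unsharded product
```

Each simulated device stores only its share of the weight columns, which is the memory saving that motivates the technique; the price is the communication step that reassembles the output.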

In this module, we will shift to hands-on learning, applying data parallelism techniques in PyTorch. You'll train a small model on the MNIST dataset, testing different parallelism strategies and observing their effects on performance.

What's included

11 videos, 1 assignment

11 videos, 60 minutes total

Strategies for Parallelizing LLMs - Hands-on Introduction (1 min)
PyTorch - LLM Training Library Overview (4 min)
The Transformers Library - Overview (2 min)
NumPy Overview (2 min)
TorchVision and TorchDistributed Overview (4 min)
DeepSpeed and Megatron-LM - Overview (3 min)
Datasets and Why this Toolkit (4 min)
HANDS-ON: Data Parallelism - Training a Small Model - MNIST Dataset (21 min)
Testing Pseudo Data Parallelism Trained Model (9 min)
HANDS-ON: Data Parallelism - Colab - Full Demo (9 min)
Data Parallelism - Simulated Parallelism on GPU Takeaways (2 min)

1 assignment, 15 minutes total

HANDS-ON: Strategies for Parallelism - Data Parallelism Deep Dive - Assessment (15 min)

In this module, we will apply data parallelism to the WikiText-2 dataset and use DeepSpeed to optimize memory usage. You'll gain hands-on experience with advanced techniques to improve LLM training efficiency.

What's included

3 videos, 1 assignment

In this module, we will guide you through setting up Runpod.io for multi-GPU parallelism. You’ll gain practical experience running parallelism experiments in a distributed environment and working with large-scale models.

What's included

5 videos, 1 assignment
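
The module description does not say which DeepSpeed features it uses. As a hedged illustration only, the sketch below builds a small DeepSpeed-style configuration that enables ZeRO stage 2 and mixed precision, two settings commonly used to reduce memory. The choice of ZeRO and every value here are assumptions, not the course's configuration.

```python
import json

# Hypothetical DeepSpeed configuration; all values are placeholders for illustration.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,   # samples per GPU per forward/backward pass
    "gradient_accumulation_steps": 8,      # micro-batches accumulated before each update
    "fp16": {"enabled": True},             # half-precision training to cut activation memory
    "zero_optimization": {
        "stage": 2,                        # partition optimizer states and gradients across GPUs
    },
}

# DeepSpeed reads its configuration as JSON, so serialize the dictionary.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

Writing this JSON to a file and passing it to a DeepSpeed launch is the usual next step; the exact invocation depends on the training script and is left out here.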

In this module, we will dive into fault tolerance and checkpointing strategies. You'll learn how to ensure scalable, resilient LLM training workflows that can recover from failures and continue without interruptions.

What's included

10 videos, 1 assignment

10 videos, 50 minutes total

Fault Tolerance Introduction & Types of Failures in Distributed LLM Training (4 min)
Strategies for Fault Tolerance (5 min)
Checkpointing in LLM Training - Animation (4 min)
Basic Checkpointing in LLM Training (4 min)
Incremental Checkpointing in LLM Training (7 min)
Asynchronous Checkpointing in LLM Training (7 min)
Multi-level Checkpointing in LLM Training - Animation (9 min)
Checkpoint Storage Considerations - Deep Dive (3 min)
Implementing a Hybrid Approach - Performance, Failure, Optimizations - Full Dive (5 min)
Checkpoint Storage Strategy - Summary (1 min)

1 assignment, 15 minutes total

Fault Tolerance and Scalability & Advanced Checkpointing Strategies - Deep Dive - Assessment (15 min)
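
Basic checkpointing, the simplest of the strategies listed above, amounts to saving training state periodically and resuming from the latest save after a failure. The sketch below is a minimal illustration under stated assumptions: the "model" is a single float, the state is stored as JSON, and a crash is simulated with an exception. Function names and the `fail_at` parameter are hypothetical.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write to a temp file first, then rename, so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # replaces the old checkpoint in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def load_checkpoint(path):
    """Return the saved state, or None when no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

def train(path, total_steps, checkpoint_every, fail_at=None):
    """Toy training loop that resumes from the latest checkpoint if one is present."""
    state = load_checkpoint(path) or {"step": 0, "w": 0.0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["w"] += 0.1 * (2.0 - state["w"])  # stand-in for one optimizer update
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state

with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "ckpt.json")
    try:
        train(ckpt, total_steps=10, checkpoint_every=2, fail_at=5)
    except RuntimeError:
        print("crashed; last checkpoint at step", load_checkpoint(ckpt)["step"])
    final = train(ckpt, total_steps=10, checkpoint_every=2)
    print("resumed and finished at step", final["step"])
```

The checkpoint interval is the trade-off to tune: saving more often loses less work after a failure but spends more time on I/O, which is the motivation for the incremental and asynchronous variants named in the video list.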

In this module, we will explore cutting-edge advancements in parallel computing and LLM training. You'll gain insight into the latest trends and technologies that are revolutionizing AI and the future of machine learning.

What's included

1 video, 1 assignment

In this module, we will wrap up the course by summarizing everything you've learned about parallelism and LLM training. You'll also receive guidance on how to proceed with your AI journey and apply these skills in future projects.

What's included

1 video, 2 assignments

Instructor

Packt - Course Instructors

Packt

1,542 courses, 427,280 learners

Offered by

Packt

Frequently asked questions

Parallelism in LLM training refers to distributing the workload of training a large language model across multiple computational resources, such as GPUs, to speed up the process. This is relevant because LLMs require vast amounts of computational power and time for training, and parallelism allows researchers and engineers to overcome these challenges. By using parallelism techniques, you can significantly reduce training time, optimize resource utilization, and enable more scalable AI systems.

This course focuses on teaching how to use parallelism to train large language models more efficiently. It covers various parallelism strategies, such as data parallelism, model parallelism, and hybrid parallelism. Through theoretical lessons and hands-on demos, you'll explore how parallel computing can accelerate LLM training, dive into GPU architectures, and gain practical experience with tools like PyTorch, DeepSpeed, and Megatron-LM. By the end of the course, you'll be well-equipped to apply parallelism strategies to your own LLM projects.

Upon completion of this course, you will be able to apply parallelism techniques to train large language models efficiently. You will have hands-on experience with various parallelism strategies such as data, model, and hybrid parallelism. You'll also be familiar with tools like PyTorch and DeepSpeed to optimize the training of LLMs, and you’ll understand how to implement fault tolerance and scalability strategies in distributed environments. This will allow you to improve model training speed and performance in real-world scenarios.

To get the most out of this course, you should have a foundational understanding of machine learning concepts, particularly deep learning and neural networks. Familiarity with Python and machine learning libraries like PyTorch will be helpful, as the course involves hands-on exercises using these tools. Experience with GPUs, cloud computing, or distributed systems is helpful but not required, because the course introduces these concepts as part of the curriculum.

This course is ideal for machine learning engineers, data scientists, and AI practitioners who want to optimize the training of large language models. If you're looking to improve your skills in distributed computing, parallelism, and large-scale AI model training, this course will provide you with the knowledge and practical skills needed to succeed. It's also suitable for those working in research or production environments who are involved in training and optimizing LLMs.

The course consists of 8 hours of video content. Depending on your learning pace and how much time you spend on hands-on activities and exercises, it can typically be completed within a week or two. The course is designed to be engaging and practical, offering both theoretical and applied knowledge to help you master parallelism techniques for LLM training.

This course has 16 modules

Introduction
Strategies for Parallelizing LLMs - Deep Dive
IT Fundamental Concepts
GPU Architecture for LLM Training Deep Dive
Deep and Machine Learning - Deep Dive
Large Language Models - Fundamentals of AI and LLMs
Parallel Computing Fundamentals & Parallelism in LLM Training
Types of Parallelism in LLM Training - Data, Model, and Hybrid Parallelism
Types of Parallelism - Pipeline and Tensor Parallelism
Tensor Parallelism - Deep Dive
HANDS-ON: Strategies for Parallelism - Data Parallelism Deep Dive
HANDS-ON: Data Parallelism w/ WikiText Dataset & DeepSpeed Mem. Optimization
Running TRUE Parallelism on Multiple GPU Systems - Runpod.io
Fault Tolerance and Scalability & Advanced Checkpointing Strategies - Deep Dive
Advanced Topics and Emerging Trends
Wrap up and Next Steps

Frequently asked questions

What is parallelism in large language model (LLM) training, and why is it relevant?

What is this course about?

What will I be able to do after completing this course?
