Birla Institute of Technology & Science, Pilani

Multicore and GPGPU Programming

通过 Coursera Plus 提高技能,仅需 239 美元/年(原价 399 美元)。立即节省

Birla Institute of Technology & Science, Pilani

Multicore and GPGPU Programming

Kunal Kishore Korgaonkar
Prof. Gargi Prabhu

位教师:Kunal Kishore Korgaonkar

包含在 Coursera Plus

深入了解一个主题并学习基础知识。
中级 等级

推荐体验

8 周 完成
在 10 小时 一周
灵活的计划
自行安排学习进度
深入了解一个主题并学习基础知识。
中级 等级

推荐体验

8 周 完成
在 10 小时 一周
灵活的计划
自行安排学习进度

您将学到什么

  • Understand the fundamentals of multi-threaded programming and its applications in multicore systems.

  • Develop shared memory programs in OpenMP and distributed programming using MPI.

  • Gain a foundational understanding of GPGPU architecture and the CUDA programming model.

要了解的详细信息

可分享的证书

添加到您的领英档案

作业

124 项作业

授课语言:英语(English)

了解顶级公司的员工如何掌握热门技能

Petrobras, TATA, Danone, Capgemini, P&G 和 L'Oreal 的徽标

该课程共有12个模块

In this module, the learners will be introduced to the course and its syllabus, setting the foundation for their learning journey. The course's introductory video will provide them with insights into the valuable skills and knowledge they can expect to gain throughout the duration of this course. Additionally, the syllabus reading will comprehensively outline essential course components, including course values, assessment criteria, grading system, schedule, details of live sessions, and a recommended reading list that will enhance the learner’s understanding of the course concepts. Moreover, this module offers the learners the opportunity to connect with fellow learners as they participate in a discussion prompt designed to facilitate introductions and exchanges within the course community.

涵盖的内容

4个视频1篇阅读材料1个讨论话题

In this module, students will gain foundational knowledge of parallel and multi-threaded programming, exploring the core principles that underlie the efficient utilisation of modern multi-core and many-core processors. Beginning with an overview of parallel programming concepts, this module covers different types of parallelism, including data parallelism, task parallelism, and pipeline parallelism. Students will also examine critical performance metrics like speedup, efficiency, and scalability, which help in evaluating the benefits and trade-offs of parallel approaches.

涵盖的内容

12个视频2篇阅读材料12个作业1个讨论话题

This module provides an in-depth exploration of multicore processor architectures, examining the design principles, performance considerations, and challenges involved in building efficient multicore systems. Students will study how multiple cores interact within a processor, focusing on memory hierarchies, caching mechanisms, and the role of parallelism in improving computational performance.

涵盖的内容

15个视频2篇阅读材料15个作业1个讨论话题

This module introduces students to the architectural principles of General-Purpose GPU (GPGPU) systems and the CUDA programming model. It explores the hardware components, including Streaming Multiprocessors (SMs), CUDA cores, and memory hierarchy, which form the foundation of GPU computing. The module also provides an overview of the CUDA programming model, emphasising its thread hierarchy, grid, and block organisation. By understanding these fundamental concepts, students will develop the ability to harness GPU architecture for high-performance parallel computing.

涵盖的内容

15个视频2篇阅读材料14个作业1个讨论话题

This module provides a comprehensive understanding of how CUDA executes programs on GPUs. It covers key concepts such as warps, warp scheduling, and resource partitioning, which are critical for understanding GPU hardware behaviour. The module delves into branch divergence and its impact on performance, offering strategies to minimise its effects. It also emphasises exposing parallelism effectively by leveraging CUDA’s hierarchical execution model. Students will learn how to design and optimise GPU programs by aligning with the underlying execution model to maximise efficiency and throughput.

涵盖的内容

15个视频2篇阅读材料15个作业1个讨论话题

The CUDA Memory Model & Streams and Concurrency module introduces students to the intricacies of memory hierarchy in CUDA, including global, shared, and local memory. It emphasises the importance of memory coalescing and efficient memory access patterns to optimise performance on GPUs. The module also covers CUDA streams, explaining how concurrent kernel execution and memory operations can be managed to enhance parallelism. By understanding these concepts, students will gain the ability to design GPU programs that maximise throughput and minimise latency.

涵盖的内容

14个视频2篇阅读材料14个作业1个讨论话题1个非评分实验室

This module explains in depth the difference between processes and threads and introduces multithreaded programming using pthreads library. Students are expected to learn about the various functions in pthreads library and implement those to solve real-world problems through a multithreaded approach. It also discusses precautions to take while developing an algorithm that uses multi-threading.

涵盖的内容

10个视频11篇阅读材料10个作业1个讨论话题

This module aims to introduce students to Distributed memory programming using the Message Passing Interface (MPI). Students will learn about the functions provided by the MPI library and their descriptions. It will enable students to develop parallel programming codes and also to convert a serial programmed code into a parallel code with the help of the MPI functions.

涵盖的内容

7个视频9篇阅读材料7个作业1个讨论话题

This module aims to introduce the shared memory programming model with the help of the OpenMP library. Students will gain exposure to the functions in the OpenMP library and methods to implement those in code to implement parallelism using shared memory. Students will explore the foundational concepts of OpenMP through videos and readings, starting with the basics of the library and progressing to more advanced topics such as reduction clauses, variable scoping, and mutual exclusion. Through worked examples like the Trapezoidal Rule and sorting functions, learners will understand how to parallelise loops, manage scheduling, and apply critical sections and locks for safe concurrent execution. The module also covers tasking in OpenMP and classic concurrency problems like producers and consumers.

涵盖的内容

12个视频12篇阅读材料13个作业1个讨论话题

This module will introduce the n-body problem in physics, examining its significance in simulating gravitational interactions among multiple particles. It will explore classical and modern algorithmic approaches to solving the n-body problem, followed by a discussion on their computational complexity. Emphasis will be placed on identifying opportunities for parallelisation, and students will analyse and implement efficient parallel solutions using the programming languages and parallel computing directives covered in the course.

涵盖的内容

13个视频13篇阅读材料13个作业1个讨论话题

This module focuses on hands-on implementations of the Sample Sort algorithm using OpenMP, Pthreads, MPI, and CUDA. Students will explore the strengths and limitations of each parallel programming model through practical coding exercises. The module includes performance benchmarking and comparative analysis of the implementations to highlight trade-offs in scalability, efficiency, and suitability for different architectures. By the end of the module, students will have a strong grasp of each API and be equipped to make informed decisions about the most appropriate tool for a given parallel computing task.

涵盖的内容

8个视频9篇阅读材料10个作业1个讨论话题

Final Comprehensive Examination

涵盖的内容

1个作业

位教师

Kunal Kishore Korgaonkar
Birla Institute of Technology & Science, Pilani
2 门课程 675 名学生
Prof. Gargi Prabhu
Birla Institute of Technology & Science, Pilani
1 门课程 61 名学生

提供方

人们为什么选择 Coursera 来帮助自己实现职业发展

Felipe M.

自 2018开始学习的学生
''能够按照自己的速度和节奏学习课程是一次很棒的经历。只要符合自己的时间表和心情,我就可以学习。'

Jennifer J.

自 2020开始学习的学生
''我直接将从课程中学到的概念和技能应用到一个令人兴奋的新工作项目中。'

Larry W.

自 2021开始学习的学生
''如果我的大学不提供我需要的主题课程,Coursera 便是最好的去处之一。'

Chaitanya A.

''学习不仅仅是在工作中做的更好:它远不止于此。Coursera 让我无限制地学习。'
Coursera Plus

通过 Coursera Plus 开启新生涯

无限制访问 10,000+ 世界一流的课程、实践项目和就业就绪证书课程 - 所有这些都包含在您的订阅中

通过在线学位推动您的职业生涯

获取世界一流大学的学位 - 100% 在线

加入超过 3400 家选择 Coursera for Business 的全球公司

提升员工的技能,使其在数字经济中脱颖而出

常见问题