Design Real-Time Architectures with Spark & Kafka

This course is part of the Real-Time, Real Fast: Kafka & Spark for Data Engineers Specialization.

Instructor: Soheil Haddadi


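3 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

4 hours to complete

Flexible schedule

Learn at your own pace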

What you'll learn

Examine core real-time data principles and how Kafka and Spark support streaming architectures.
Create real-time pipelines by connecting Kafka topics with Spark Structured Streaming.
Improve and deploy streaming systems using monitoring, fault tolerance, and tuning.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assignments

1 assignment

Taught in English

Build subject-matter expertise

When you enroll in this course, you'll also be enrolled in the Real-Time, Real Fast: Kafka & Spark for Data Engineers Specialization.

向行业专家学习新概念
获得对主题或工具的基础理解
通过实践项目培养工作相关技能
获得可共享的职业证书

There are 3 modules in this course

“Design Real-Time Architectures with Apache Spark & Kafka” is an intermediate-level course crafted for learners aiming to build modern, scalable streaming systems. Across engaging, scenario-driven lessons, the course offers a comprehensive introduction to designing and implementing real-time data pipelines. Participants explore the foundations of streaming concepts, event-driven patterns, and the unique demands of low-latency processing. They gain practical experience working with Apache Kafka for event ingestion and Apache Spark Structured Streaming for real-time computation, learning to transform raw streams into actionable insights. The curriculum emphasizes reliable pipeline design, covering fault tolerance, checkpointing, and performance tuning to ensure systems can operate at scale. Through hands-on practice, guided dialogues, and real-world financial data scenarios, learners develop the confidence to architect, optimize, and deploy production-ready streaming solutions. By the end of the course, they are equipped with the technical and strategic skills needed to excel in today’s data-driven, real-time environments.

Learners should know basic Python or Scala, be comfortable with the command line, understand distributed systems at a high level, and have introductory familiarity with Kafka and Spark. The course is ideal for aspiring data engineers, for analysts or data scientists moving into real-time systems, and for software engineers exploring event-driven architecture. It also suits anyone working with large-scale data, financial systems, or AI/ML pipelines who wants to understand how real-time data powers modern systems.

This module introduces the core principles behind real-time data systems and how they differ from traditional batch processing. Learners explore key patterns such as event-driven design and streaming workflows, as well as the roles Kafka and Spark play in a modern data ecosystem. By the end, learners understand the foundational components required to build low-latency, scalable streaming architectures.
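The batch-versus-streaming distinction can be sketched in a few lines of plain Python. This is a stdlib-only illustration, not Kafka or Spark code, and the transaction amounts are hypothetical: the batch function returns one answer only after the whole dataset is available, while the streaming generator keeps running state and yields an updated result as each event arrives.

```python
from typing import Iterable, Iterator


def batch_total(amounts: Iterable[float]) -> float:
    """Batch style: wait for the complete dataset, then compute once."""
    return sum(amounts)


def streaming_totals(events: Iterable[float]) -> Iterator[float]:
    """Streaming style: keep running state and emit an updated result per event."""
    running = 0.0
    for amount in events:
        running += amount  # incremental update of the state
        yield running      # a result is available right after each event


if __name__ == "__main__":
    # Hypothetical transaction amounts standing in for a live event feed.
    amounts = [120.0, 35.5, 980.0, 12.25]
    print("batch result:", batch_total(amounts))
    for i, total in enumerate(streaming_totals(amounts), start=1):
        print(f"after event {i}: running total = {total}")
```

With an unbounded feed, such as records read from a Kafka topic, the batch version could only run on periodic snapshots, whereas the streaming version keeps an up-to-date answer at all times.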

What's included

4 videos, 2 readings, 1 peer review

4 videos (18 minutes total)

Welcome to the Real-Time Architectures with Apache Spark & Kafka (2 minutes)
Key Components: Kafka, Spark & Supporting Ecosystem Tools (5 minutes)
Event-Driven Patterns and Streaming Design Principles (5 minutes)
Key Components: Kafka, Spark & Supporting Ecosystem Tools (6 minutes)

2 readings (10 minutes total)

Welcome to the Course: Course Overview (5 minutes)
Streaming Data vs. Stream Processing vs. Real-Time Analytics (5 minutes)

1 peer review (20 minutes total)

Hands-On-Learning: Mapping a Real-Time Architecture for Live Transaction Monitoring (20 minutes)

In this module, learners dive into the practical construction of streaming pipelines using Kafka and Spark Structured Streaming. They design Kafka topics, configure producers and consumers, and connect Spark to process incoming data streams. The module emphasizes transformations, windowing, and stateful operations essential for building functional real-world pipelines.
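Windowing and stateful aggregation are the least intuitive of these ideas, so here is a minimal stdlib-only sketch of a tumbling (fixed, non-overlapping) event-time window. In Spark Structured Streaming the same idea is usually written as `groupBy(window(...))` over a streaming DataFrame; this code does not use Spark. The event tuple `(account_id, event_time_seconds, amount)`, the 60-second window, and the `TumblingWindowSum` class are hypothetical. Sums persist across micro-batches, which is what makes the operation stateful.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (account_id, event_time_in_seconds, amount).
Event = Tuple[str, int, float]
WindowKey = Tuple[str, int]  # (account_id, window_start)


def window_start(event_time: int, size_seconds: int) -> int:
    """Align an event time to the start of its tumbling window."""
    return event_time - (event_time % size_seconds)


class TumblingWindowSum:
    """Keeps per-(account, window) sums across micro-batches (stateful)."""

    def __init__(self, size_seconds: int = 60) -> None:
        self.size = size_seconds
        self.state: Dict[WindowKey, float] = defaultdict(float)

    def process_batch(self, events: Iterable[Event]) -> Dict[WindowKey, float]:
        """Fold one micro-batch into state and return only the windows it touched."""
        updated: Dict[WindowKey, float] = {}
        for account, ts, amount in events:
            key = (account, window_start(ts, self.size))
            self.state[key] += amount
            updated[key] = self.state[key]
        return updated


if __name__ == "__main__":
    agg = TumblingWindowSum(size_seconds=60)
    print(agg.process_batch([("acct-1", 5, 100.0), ("acct-1", 59, 50.0), ("acct-2", 61, 10.0)]))
    # {('acct-1', 0): 150.0, ('acct-2', 60): 10.0}
    print(agg.process_batch([("acct-1", 30, 25.0)]))  # a late event for the first window
    # {('acct-1', 0): 175.0}
```

Like Spark's update output mode, `process_batch` reports only the windows a batch changed. The sketch never evicts old windows; Spark bounds this state with watermarks (`withWatermark`), which this example omits.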

What's included

3 videos, 1 reading, 1 peer review

3 videos (20 minutes total)

Designing Kafka Topics, Producers & Consumers (5 minutes)
Connecting Spark Structured Streaming to Kafka (7 minutes)
Transformations, Windows & Stateful Stream Processing (8 minutes)

1 reading (5 minutes total)

Designing Effective Kafka Topics and Event Streams (5 minutes)

1 peer review (20 minutes total)

Hands-On-Learning: Building a Streaming Pipeline for Real-Time Transaction Alerts (20 minutes)

This module focuses on preparing real-time systems for production environments. Learners explore fault tolerance, scalability strategies, and performance tuning for Kafka and Spark. They also learn how to monitor streaming workloads, implement checkpoints, and ensure reliability. The module concludes with best practices for deploying and maintaining robust, enterprise-ready real-time architectures.
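Checkpointing boils down to persisting how far the pipeline has read, so that a restarted job resumes instead of starting over. The stdlib-only sketch below assumes a hypothetical in-memory list standing in for one Kafka partition and a JSON file standing in for the checkpoint; the function names are illustrative. In Spark this is handled through the `checkpointLocation` option, with a more elaborate on-disk layout than shown here.

```python
import json
import os
import tempfile
from typing import List


def load_offset(checkpoint_path: str) -> int:
    """Return the next offset to read, or 0 if no checkpoint exists yet."""
    if not os.path.exists(checkpoint_path):
        return 0
    with open(checkpoint_path) as f:
        return json.load(f)["next_offset"]


def save_offset(checkpoint_path: str, next_offset: int) -> None:
    """Persist progress atomically: write a temp file, then rename it over the old one."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp_path, checkpoint_path)


def run_batch(log: List[str], checkpoint_path: str, max_records: int, sink: List[str]) -> int:
    """Process up to max_records from the checkpointed position, then commit progress."""
    start = load_offset(checkpoint_path)
    batch = log[start:start + max_records]
    sink.extend(record.upper() for record in batch)  # stand-in transformation
    save_offset(checkpoint_path, start + len(batch))
    return len(batch)


if __name__ == "__main__":
    log = ["tx-1", "tx-2", "tx-3", "tx-4", "tx-5"]  # hypothetical partition contents
    sink: List[str] = []
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "offsets.json")
        run_batch(log, ckpt, max_records=2, sink=sink)  # first run handles tx-1, tx-2
        # If the job stopped here, a restart would read the same checkpoint file.
        while run_batch(log, ckpt, max_records=2, sink=sink):
            pass
    print(sink)  # ['TX-1', 'TX-2', 'TX-3', 'TX-4', 'TX-5']
```

Because progress is committed only after the sink is written, a crash between those two steps replays the last batch, which gives at-least-once delivery. End-to-end exactly-once, as Spark Structured Streaming offers, additionally relies on a replayable source and an idempotent or transactional sink.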

What's included

4 videos, 1 reading, 1 assignment, 2 peer reviews

4 videos (21 minutes total)

Ensuring Reliability with Checkpointing & Fault Tolerance (5 minutes)
Performance Tuning Kafka & Spark for Real-Time Workloads (5 minutes)
Deploying, Monitoring & Managing Streaming Pipelines (8 minutes)
Course Wrap-Up (2 minutes)

1 reading (5 minutes total)

10× Pipeline Performance: Kafka and Spark Tuning in Practice (5 minutes)

1 assignment (20 minutes total)

Design Real-Time Architectures with Spark & Kafka (20 minutes)

2 peer reviews (80 minutes total)

Hands-On-Learning: Optimizing and Monitoring a Production-Ready Streaming System (20 minutes)
Project: Real-Time Streaming Alert System for Money-Laundering Detection (60 minutes)