Coursera

Real-Time, Real Fast: Kafka & Spark for Data Engineers Specialization

Real-Time Kafka & Spark Data Engineering. Build fault-tolerant streaming pipelines processing millions of events with Kafka & Spark.

Caio Avelino
Jairo Sanchez
Starweaver

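Instructor: Caio Avelino

Access provided by New York State Department of Labor

Gain in-depth knowledge of a subject
Intermediate level

Recommended experience

4 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace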

What you'll learn

  • Design and optimize Kafka clusters for high throughput, low latency, and fault tolerance in production environments

  • Build end-to-end streaming pipelines with Spark Structured Streaming, exactly-once semantics, and schema evolution

  • Implement real-time dashboards, orchestration, and disaster recovery for enterprise streaming architectures

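Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English
Recently updated!

January 2026

See how employees at top companies are mastering in-demand skills

Logos of Petrobras, TATA, Danone, Capgemini, P&G, and L'Oreal

Advance your subject-matter expertise

  • Learn in-demand skills from university and industry experts
  • Master a subject or tool with hands-on projects
  • Develop a deep understanding of key concepts
  • Earn a career certificate from Coursera

Specialization - 12 course series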

Optimize Kafka for Speed & Availability

Course 1, 4 hours

What you'll learn

  • Configure Kafka topics with appropriate replication factors, partition counts, and durability settings to ensure high availability.

  • Diagnose performance bottlenecks using consumer lag metrics, broker health indicators, and throughput analysis.

  • Optimize producer and consumer configurations including batching, compression, and parallelism to maximize throughput while meeting latency SLAs.

Skills you'll gain

Category: Apache Kafka
Category: System Configuration
Category: Performance Tuning
Category: Grafana
Category: Scalability
Category: Command-Line Interface
Category: Content Strategy
Category: Data Loss Prevention
Category: Process Optimization
Category: System Monitoring
Category: Distributed Computing
Category: Real Time Data
Category: Prometheus (Software)

Stream & Optimize Real-Time Data Flows

Course 2, 4 hours

What you'll learn

  • Evaluate log configurations to recommend tiered storage, retention policies, and access controls.

  • Design stream processing topologies that implement join patterns, aggregation windows, and state management for real-time data transformation.

  • Optimize real-time data flows by analyzing throughput bottlenecks, partition strategies, and resource allocation to meet SLAs within budget limits.

Skills you'll gain

Category: Apache Kafka
Category: Payment Card Industry (PCI) Data Security Standards
Category: Real Time Data
Category: Data Architecture
Category: Computer Architecture
Category: Data Pipelines
Category: Cloud Storage
Category: Capacity Management
Category: System Monitoring
Category: Performance Tuning
Category: Apache
Category: Compliance Management
Category: Governance
Category: Data Governance
Category: Multi-Tenant Cloud Environments
Category: Application Performance Management
Category: Scalability
Category: Operational Data Store

What you'll learn

  • Explain core patterns for schema evolution (backward/forward/full compatibility, additive vs. breaking changes) and select the right strategy.

  • Implement versioned event/data contracts with Avro or Protobuf using a schema registry and enforce compatibility rules in CI/CD.

  • Orchestrate real‑time rollout plans across producers, consumers, and storage (Kafka topics, CDC sinks, warehouses) with monitoring and rollback.

Skills you'll gain

Category: Real Time Data
Category: Data Warehousing
Category: Data Pipelines
Category: Data Validation
Category: Data Integrity
Category: Continuous Monitoring
Category: Automation Engineering
Category: Software Versioning
Category: Operational Databases
Category: Data Modeling
Category: Continuous Integration
Category: Automation
Category: Warehouse Management
Category: Apache Kafka

What you'll learn

  • Design stream pipelines by analyzing failure scenarios and business requirements to prevent data loss or duplication.

  • Implement exactly-once processing semantics across producer, processor, and sink layers using transactions, checkpoints, and idempotent operations.

  • Evaluate watermarking and windowing configurations to optimize the tradeoff between latency and data completeness.

Skills you'll gain

Category: Apache Kafka
Category: Apache Spark
Category: Transaction Processing
Category: Internet Of Things
Category: Data Integrity
Category: Data Architecture
Category: Verification And Validation
Category: Integration Testing
Category: Project Implementation
Category: Service Level
Category: Data Pipelines
Category: Apache
Category: Performance Tuning
Category: Production Management
Category: System Design and Implementation
Category: Event Monitoring
Category: Real Time Data

What you'll learn

  • Explain the execution model of Spark Structured Streaming and build a simple pipeline from a file source to a console sink.

  • Develop streaming pipelines that integrate with Kafka, apply event-time processing with watermarks, and write reliable outputs to Delta Lake.

  • Build an end-to-end Spark streaming pipeline that can be deployed in real-world production environments.

Skills you'll gain

Category: Real Time Data
Category: Apache Spark
Category: Fraud detection
Category: Data Pipelines
Category: Event Monitoring
Category: Scalability
Category: JSON
Category: Data Persistence
Category: PySpark
Category: Data Processing
Category: Data Transformation
Category: Apache Kafka
Category: Data-Driven Decision-Making
Category: Event Management

Optimize Spark Performance & Throughput

Course 6, 4 hours

What you'll learn

  • Inspect Spark UI and metrics (task duration, shuffle I/O, executor CPU/mem) to find bottlenecks and recommend actionable optimizations.

  • Apply partitioning and skew mitigation (salting/custom partitioner) & reduce shuffle (broadcast joins, avoid groupByKey, AQE) to improve parallelism.

  • Configure executors, cores, memory, dynamic allocation and parallelism/caching settings to maximize throughput while meeting defined SLA targets.

Skills you'll gain

Category: Apache Spark
Category: Performance Tuning
Category: PySpark
Category: Database Management
Category: Debugging
Category: Process Optimization
Category: System Configuration
Category: Scalability
Category: Performance Analysis
Category: Resource Allocation
Category: Job Analysis

Process & Analyze Real-Time Data Fast

Course 7, 5 hours

What you'll learn

  • Architect a streaming data solution by differentiating between batch, micro-batch, and streaming patterns to solve a specific business problem.

  • Develop real-time analytics pipelines using window functions and watermarking to aggregate and analyze streaming data.

  • Optimize a production streaming application by diagnosing performance bottlenecks like data skew and implementing mitigation techniques.

Skills you'll gain

Category: Apache Spark
Category: Fraud detection
Category: Real Time Data
Category: Dashboard
Category: Data Pipelines
Category: Data Analysis
Category: Performance Tuning
Category: Internet Of Things
Category: Trend Analysis
Category: Big Data
Category: Data Processing
Category: Performance Analysis
Category: Databricks
Category: Anomaly Detection
Category: Operational Databases
Category: PySpark

Build Real-Time Dashboards with Spark

Course 8, 5 hours

What you'll learn

  • Explain Spark’s streaming model and produce a dashboard-ready table from a simple file source.

  • Construct a real-time pipeline that ingests from Kafka, processes with Spark, and stores results in Delta using event-time windows and watermarks.

  • Operate a production-oriented dashboard with refresh policies, monitoring, and failure recovery.

Skills you'll gain

Category: Apache Spark
Category: Data Integrity
Category: Real Time Data
Category: Dashboard
Category: Continuous Monitoring
Category: Data Persistence
Category: Apache Kafka
Category: Business Metrics
Category: PySpark
Category: JSON
Category: Data Pipelines
Category: Business Intelligence
Category: Scalability

What you'll learn

  • Transform nested and streaming data into analytics-ready tables using programming tools and platforms.

  • Implement automated data quality checks and integrate these checks into CI/CD pipelines to enforce quality gates.

  • Build and manage scalable real-time analytics pipelines that block low-quality data and connect curated datasets to Power BI dashboards.

Skills you'll gain

Category: Data Transformation
Category: PySpark
Category: Data Quality
Category: Power BI
Category: Real Time Data
Category: Data Validation
Category: Data Pipelines
Category: Business Intelligence
Category: Dashboard
Category: Data Governance
Category: Performance Tuning
Category: Data Integrity
Category: CI/CD
Category: Data Visualization

What you'll learn

  • Build and schedule streaming and batch-adjacent workflows using a modern orchestrator, such as Airflow or Prefect.

  • Implement reliability patterns like idempotence, checkpointing, DLQs, and backfills for fault-tolerant and exactly-once-ish processing.

  • Design multi-region recovery strategies (mirroring/replication) and run playbooks to restore pipelines after partial or regional failures.

Skills you'll gain

Category: Real Time Data
Category: Apache Airflow
Category: Disaster Recovery
Category: Apache Spark
Category: Apache Kafka
Category: Data Pipelines
Category: Data Processing
Category: Site Reliability Engineering
Category: Data Integrity
Category: Data Storage Technologies
Category: Workflow Management
Category: Data Infrastructure

Stream & Unify Data Schemas with CDC

Course 11, 5 hours

What you'll learn

  • Explain CDC fundamentals (binlog/WAL) and schema evolution strategies.

  • Configure a Schema Registry pipeline locally using Debezium and Kafka.

  • Use streaming SQL (Flink/ksqlDB) to map, cast, and merge divergent schemas into a canonical model.

Skills you'll gain

Category: Data Pipelines
Category: Real Time Data
Category: Data Validation
Category: Database Design
Category: Apache Kafka
Category: Schematic Diagrams
Category: Data Capture
Category: Data Modeling
Category: PostgreSQL
Category: SQL
Category: Data Integrity
Category: Cloud Deployment
Category: Data Storage Technologies
Category: Continuous Integration
Category: Data Mapping
Category: Data Transformation
Category: Continuous Monitoring

What you'll learn

  • Examine core real-time data principles and how Kafka and Spark support streaming architectures.

  • Create real-time pipelines by connecting Kafka topics with Spark Structured Streaming.

  • Improve and deploy streaming systems using monitoring, fault tolerance, and tuning.

Skills you'll gain

Category: Apache Spark
Category: Real Time Data
Category: Apache Kafka
Category: Performance Management
Category: Event-Driven Programming
Category: Data Transformation
Category: System Monitoring
Category: Real-Time Operating Systems
Category: Application Deployment
Category: Software Architecture
Category: Distributed Computing
Category: Systems Architecture
Category: Performance Tuning
Category: Scalability
Category: Data Processing
Category: Architecture and Construction
Category: Data Pipelines

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

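Instructors

Caio Avelino
9 courses, 7,709 learners
Jairo Sanchez
5 courses, 7,815 learners
Starweaver
Coursera
548 courses, 996,451 learners

Offered by

Coursera

Why people choose Coursera for their career

Felipe M.

Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."

Jennifer J.

Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."

Larry W.

Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."

Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."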