Coursera

Open source Data Engineering with Spark, dbt & Airflow Professional Certificate

Build Production Data Pipelines at Scale.

Explore Spark, dbt, and Airflow to design, automate, and deploy enterprise-grade data pipelines.

Access provided by Coursera Learning Team

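Earn a career certificate that demonstrates your expertise
Intermediate level

Recommended experience

4 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace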

What you'll learn

  • Build modular, production-grade data pipelines using Apache Spark, dbt, and Airflow to ingest, transform, and load data at scale.

  • Design and implement dimensional data models including star schemas, SCD Type 2, and incremental load strategies for data warehouses.

  • Optimize distributed data processing by resolving Spark shuffle, skew, and partitioning issues to improve pipeline performance.

  • Automate deployments and enforce data quality using CI/CD pipelines, Docker containers, and automated testing frameworks like Great Expectations.
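One common remedy for the key skew mentioned in the outcomes above is key salting with two-stage aggregation: rows of a hot key are spread over several salted sub-keys, aggregated partially, and then recombined. The following is a minimal pure-Python sketch of that idea under stated assumptions, not Spark code; `NUM_SALTS`, `salt`, and `two_stage_sum` are hypothetical names, and the salt is a rotating index (rather than a random value) only to keep the example deterministic.

```python
from collections import defaultdict

NUM_SALTS = 4  # hypothetical: number of sub-keys each key is split into


def salt(key, row_index):
    # Rotating suffix so the rows of one hot key spread over NUM_SALTS sub-keys.
    return f"{key}#{row_index % NUM_SALTS}"


def two_stage_sum(rows):
    """rows: iterable of (key, value).

    Returns ({key: total}, size of the largest stage-1 group).
    """
    # Stage 1: partial aggregation per salted key.
    partial = defaultdict(int)
    group_sizes = defaultdict(int)
    for i, (key, value) in enumerate(rows):
        salted = salt(key, i)
        partial[salted] += value
        group_sizes[salted] += 1

    # Stage 2: strip the salt and combine the partial results per original key.
    totals = defaultdict(int)
    for salted, subtotal in partial.items():
        original_key = salted.rsplit("#", 1)[0]
        totals[original_key] += subtotal
    return dict(totals), max(group_sizes.values())


totals, largest_group = two_stage_sum([("hot", 1)] * 1000 + [("a", 5), ("b", 7)])
print(totals, largest_group)
```

With 1,000 rows on the key `"hot"`, no stage-1 group holds more than 250 rows, yet the final totals match an unsalted aggregation. In a distributed engine, those smaller groups are what keep a single task from processing the whole hot key alone.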

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English
Recently updated!

March 2026

See how employees at top companies such as Petrobras, TATA, Danone, Capgemini, P&G, and L'Oreal are mastering in-demand skills

Advance your career with in-demand skills

  • Receive professional-level training from Coursera
  • Demonstrate your technical proficiency
  • Earn an employer-recognized certificate from Coursera

Professional Certificate - 6 course series

Building Automated Data Pipelines with Spark, dbt, and Airflow

Course 1

What you'll learn

  • Build end-to-end data pipelines that automatically ingest data from databases, APIs, and streams using Spark, dbt, and Airflow.

  • Design data models with historical tracking using SCD Type 2 patterns to preserve complete change history for analytics.

  • Create automated workflows with intelligent retry logic, SLA monitoring, and parameterization for production reliability.

  • Optimize Spark job performance using partitioning and caching strategies to achieve 30%+ runtime improvements.
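In an SCD Type 2 dimension, a change to a tracked attribute does not overwrite the row: the current version is closed with an end date and a new current version is inserted, so the full change history is preserved. The following is a minimal in-memory Python sketch under stated assumptions: a hypothetical customer dimension keyed by `customer_id` with `city` as the only tracked attribute, and illustrative `valid_from` / `valid_to` / `is_current` columns rather than the course's actual schema. In a warehouse, the same logic is usually a `MERGE` statement, and dbt's snapshot feature implements this pattern.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: int
    city: str                 # hypothetical tracked attribute
    valid_from: date
    valid_to: Optional[date]  # None means "still current"
    is_current: bool


def apply_scd2(dimension: List[DimRow], updates: Dict[int, str], load_date: date) -> List[DimRow]:
    """Apply a batch of {customer_id: new_city} updates using SCD Type 2 rules."""
    result: List[DimRow] = []
    current: Dict[int, DimRow] = {}
    for row in dimension:
        if row.is_current:
            current[row.customer_id] = row
        else:
            result.append(row)  # historical versions are never modified

    for cid, row in current.items():
        new_city = updates.get(cid)
        if new_city is None or new_city == row.city:
            result.append(row)  # unchanged: keep the current version as-is
        else:
            # Changed: close out the old version, then open a new current version.
            result.append(replace(row, valid_to=load_date, is_current=False))
            result.append(DimRow(cid, new_city, load_date, None, True))

    for cid, new_city in updates.items():
        if cid not in current:
            result.append(DimRow(cid, new_city, load_date, None, True))  # new key
    return result


dim = [DimRow(1, "Lisbon", date(2024, 1, 1), None, True)]
out = apply_scd2(dim, {1: "Porto", 2: "Madrid"}, date(2025, 6, 1))
for r in out:
    print(r)
```

Re-running the same batch leaves the dimension unchanged, which is the idempotency a scheduled incremental load needs.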

Skills you'll gain

  • Data Validation
  • Apache Airflow
  • Data Integration
  • Database Development
  • Data Quality
  • Data Architecture
  • Data Warehousing
  • Enterprise Security
  • Data Flow Diagrams (DFDs)
  • Data Processing
  • Apache Spark
  • Data Pipelines
  • Data Modeling
  • Configuration Management
  • Extract, Transform, Load
  • Data Transformation

Optimizing Spark and Cloud Data Storage for Analytics

Course 2

What you'll learn

  • Optimize Spark job performance through strategic partitioning and caching, achieving 30%+ runtime improvements using data access analysis.

  • Implement transactional data lakes with Delta format, enabling versioning, ACID operations, and schema evolution for reliable datasets.

  • Provision secure cloud data infrastructure using IAM policies, private networks, and encrypted storage following security best practices.

  • Evaluate and benchmark storage formats (Parquet, ORC, Avro) to select optimal solutions for analytical workloads and cost efficiency.

Skills you'll gain

  • Infrastructure Architecture
  • Cloud Security
  • Data Storage
  • Amazon S3
  • PySpark
  • Cloud Computing Architecture
  • Transaction Processing
  • Apache Spark
  • Data Storage Technologies
  • Infrastructure as Code (IaC)
  • Data Warehousing
  • Data Lakes
  • Cloud Deployment
  • Data Infrastructure
  • Data Integrity
  • Data Security
  • Cloud Storage
  • Cloud Computing
  • Performance Tuning
  • Data Management

Data Modeling & Warehousing Fundamentals in Data Engineering

Course 3

What you'll learn

  • Design star schema data models with fact and dimension tables that enable intuitive self-service business intelligence reporting.

  • Apply third normal form normalization to optimize database structure while maintaining query performance through indexing strategies.

  • Use advanced SQL window functions to calculate rolling metrics, rankings, and time-series analytics for complex data analysis.

  • Implement database replication and incremental loading techniques to ensure high availability and efficient data warehouse updates.
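As an illustration of the window-function outcome above, the following minimal sketch computes a rolling average and a ranking over a hypothetical `daily_sales` table. It uses Python's built-in `sqlite3` module only so the SQL can run without a warehouse; this assumes the bundled SQLite is version 3.25 or later, which added window functions. The table, columns, and data are invented for the example, and the "3-day" average equals a 3-row frame only because each store has exactly one row per day with no gaps.

```python
import sqlite3

# Hypothetical daily sales data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT, store TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [
        ("2024-01-01", "A", 100.0),
        ("2024-01-02", "A", 200.0),
        ("2024-01-03", "A", 300.0),
        ("2024-01-04", "A", 400.0),
        ("2024-01-01", "B", 50.0),
        ("2024-01-02", "B", 150.0),
    ],
)

query = """
SELECT
    store,
    day,
    revenue,
    -- rolling average over the current row and the 2 preceding rows, per store
    AVG(revenue) OVER (
        PARTITION BY store ORDER BY day
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_avg_3d,
    -- rank of each day's revenue within its store (1 = highest)
    RANK() OVER (PARTITION BY store ORDER BY revenue DESC) AS revenue_rank
FROM daily_sales
ORDER BY store, day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

`PARTITION BY` restarts each calculation per store, and the `ROWS BETWEEN` frame bounds the rolling window. The same SQL runs largely unchanged on most warehouses and in Spark SQL.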

Skills you'll gain

  • Extract, Transform, Load
  • Data Warehousing
  • Data Pipelines
  • Star Schema
  • Database Architecture and Administration
  • Data Integration
  • Database Development
  • Database Design
  • Relational Databases
  • SQL
  • Data Quality
  • Performance Tuning
  • Data Modeling
  • Business Intelligence
  • Database Software

DevOps and CI/CD for Data Engineering Performance

Course 4

What you'll learn

  • Resolve merge conflicts and trace bugs using Git history tools, keeping collaborative codebases stable and production-ready.

  • Design branching strategies and automate deployments with CI/CD pipelines to safely promote data pipeline artifacts across environments.

  • Build and publish versioned Docker images and automate server configuration with Ansible for consistent, reproducible environments.

  • Analyze query execution metrics and optimize resource allocation to maintain performance targets in production data systems.

Skills you'll gain

  • Continuous Deployment
  • Git (Version Control System)
  • Data Infrastructure
  • Infrastructure as Code (IaC)
  • Configuration Management
  • CI/CD
  • Performance Tuning
  • DevOps
  • Containerization
  • Development Environment
  • Continuous Integration
  • Docker (Software)
  • Ansible
  • Version Control
  • Application Deployment
  • Root Cause Analysis
  • Data Pipelines

Data Quality and Debugging for Reliable Pipelines

Course 5

What you'll learn

  • Define and automate data quality tests using YAML to validate row counts, null thresholds, and uniqueness across pipeline datasets.

  • Trace data anomalies through pipeline stages by analyzing logs and dashboards to identify and fix the exact source of failure.

  • Apply advanced Python debugging tools, including conditional breakpoints, watchpoints, and pdb, to diagnose and resolve pipeline issues.

  • Resolve complex concurrency bugs by reading stack traces and correlating thread logs to identify deadlocks and race conditions in code.
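The first outcome above describes declarative data quality tests for row counts, null thresholds, and uniqueness. The following minimal Python sketch shows how such a spec can drive checks, under stated assumptions: the spec keys (`min_row_count`, `max_null_fraction`, `unique`) and the `run_checks` function are hypothetical, and the spec is written as a Python dict standing in for what a YAML parser such as PyYAML's `yaml.safe_load` would return. It is a hand-rolled illustration, not the Great Expectations API.

```python
# Hypothetical check spec; in practice it would be loaded from a YAML file.
CHECKS = {
    "min_row_count": 3,
    "max_null_fraction": {"email": 0.25},
    "unique": ["order_id"],
}


def run_checks(rows, checks):
    """rows: list of dicts (one per record). Returns a list of failure messages."""
    failures = []
    n = len(rows)

    # Row-count check.
    if n < checks.get("min_row_count", 0):
        failures.append(f"row count {n} < {checks['min_row_count']}")

    # Null-threshold checks: fail if the share of missing values exceeds the limit.
    for column, limit in checks.get("max_null_fraction", {}).items():
        nulls = sum(1 for r in rows if r.get(column) is None)
        fraction = nulls / n if n else 0.0
        if fraction > limit:
            failures.append(f"{column}: null fraction {fraction:.2f} > {limit}")

    # Uniqueness checks.
    for column in checks.get("unique", []):
        values = [r.get(column) for r in rows]
        if len(values) != len(set(values)):
            failures.append(f"{column}: duplicate values found")
    return failures


good = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": "b@example.com"},
    {"order_id": 3, "email": "c@example.com"},
    {"order_id": 4, "email": None},
]
bad = [{"order_id": 1, "email": None}, {"order_id": 1, "email": None}]
print(run_checks(good, CHECKS))
print(run_checks(bad, CHECKS))
```

A pipeline task would typically fail (or alert) whenever the returned list is non-empty, which stops bad data from propagating downstream.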

Skills you'll gain

  • YAML
  • DevOps
  • Data Integrity
  • Data Validation
  • Root Cause Analysis
  • Reliability
  • Python Programming
  • Performance Tuning
  • Generative AI
  • Test Automation
  • Data Pipelines
  • Dashboard
  • Anomaly Detection
  • Debugging
  • Data Quality
  • Development Testing

Career Development for Open Source Data Engineering

Course 6

What you'll learn

  • Build a data engineering portfolio with end-to-end pipeline projects that prove your ability to design, build, and deploy production-style systems.

  • Create a resume, LinkedIn profile, and GitHub presence that position you as a hands-on data engineer ready to contribute from day one.

  • Practice real data engineering interview scenarios and develop structured responses to technical, design, and behavioral questions.

  • Execute a 30-day career launch plan covering portfolio completion, job applications, and networking in the data engineering community.

Skills you'll gain

  • Apache Airflow
  • Data Pipelines
  • Data Quality
  • Professional Development
  • Interviewing Skills
  • Apache Spark
  • Collaboration
  • Communication
  • Portfolio Management
  • SQL
  • Data Infrastructure
  • Python Programming
  • Apache
  • GitHub
  • Software Development
  • Professional Networking

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructors

Professionals from the Industry
366 courses · 51,989 learners

Offered by

Coursera

Why people choose Coursera for their career

Felipe M.

Learner since 2018
"Being able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."

Jennifer J.

Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."

Larry W.

Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."

Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."

² Career benefits (such as a promotion or raise) are based on results from the 2021 Coursera Learner Outcomes Survey in the United States.