Is this course really 100% online? Do I need to attend any classes in person?

This course is completely online, so there’s no need to show up to a classroom in person. You can access your lectures, readings and assignments anytime and anywhere via the web or your mobile device.

Can I just enroll in a single course?

Yes! To get started, click the course card that interests you and enroll. You can enroll and complete the course to earn a shareable certificate. When you subscribe to a course that is part of a Certificate, you’re automatically subscribed to the full Certificate. Visit your learner dashboard to track your progress.

Open source Data Engineering with Spark, dbt & Airflow (Professional Certificate)

Build Production Data Pipelines at Scale.

Explore Spark, dbt, and Airflow to design, automate, and deploy enterprise-grade data pipelines.

Instructor: Professionals from the Industry

6-course series

Earn a career credential that demonstrates your qualifications

Intermediate level

Recommended experience

4 weeks to complete

Under 10 hours per week

Flexible schedule

Learn at your own pace

What you'll learn

Build modular, production-grade data pipelines using Apache Spark, dbt, and Airflow to ingest, transform, and load data at scale.
Design and implement dimensional data models including star schemas, SCD Type 2, and incremental load strategies for data warehouses.
Optimize distributed data processing by resolving Spark shuffle, skew, and partitioning issues to improve pipeline performance.
Automate deployments and enforce data quality using CI/CD pipelines, Docker containers, and automated testing frameworks like Great Expectations.

Skills you'll gain

CI/CD
Cloud Security
Data Flow Diagrams (DFDs)
Data Modeling
Data Pipelines
Data Validation
Data Warehousing
Database Design
Diagram Design
Interviewing Skills
Snowflake Schema
SQL
Star Schema
Workflow Management

Tools you'll learn

Ansible
Apache Airflow
Apache Spark
Docker (Software)
Git (Version Control System)
PySpark

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English

Recently updated!

March 2026

Advance your career with in-demand skills.

Receive professional-level training from Coursera.
Demonstrate your technical proficiency.
Earn an employer-recognized certificate from Coursera.

Professional Certificate: 6-course series

This program equips you with the open-source tools and architectural thinking used by professional data engineers to build scalable, reliable data systems from the ground up. You will work hands-on with Apache Spark for distributed data processing, dbt for modular SQL-based transformation, and Apache Airflow for workflow orchestration — the same stack powering data infrastructure at leading technology and data-driven organizations worldwide.

Across the courses, you will gain practical expertise in designing dimensional data models, implementing incremental load strategies, optimizing Spark job performance, enforcing data quality with automated testing frameworks, and deploying pipelines through CI/CD workflows. You will also develop foundational skills in cloud storage provisioning, containerization with Docker, and version control best practices that mirror real production environments.

By the end of this Program, you will be able to design and deploy end-to-end data pipelines that ingest from diverse sources, transform data through well-tested models, and deliver analytics-ready datasets to downstream consumers — demonstrating job-ready engineering skills valued across analytics engineering, data platform, and data infrastructure roles.

Applied Learning Project

Throughout this Program, you will complete hands-on projects that mirror real production data engineering challenges — from building modular ETL pipelines that ingest CRM and streaming data into a cloud data warehouse, to authoring Airflow DAGs with retry logic and SLA monitoring, to diagnosing Spark performance bottlenecks and implementing Delta Lake versioning. Each project asks you to work in your own development environment, producing portfolio-ready artifacts that demonstrate your ability to design, optimize, and deploy reliable data infrastructure using open-source tools.

Building Automated Data Pipelines with Spark, dbt, and Airflow

Course 1, 9 hours

What you'll learn

Build end-to-end data pipelines that automatically ingest from databases, APIs, and streams using Spark, dbt, and Airflow tools.
Design data models with historical tracking using SCD Type 2 patterns to preserve complete change history for analytics.
Create automated workflows with intelligent retry logic, SLA monitoring, and parameterization for production reliability.
Optimize Spark job performance using partitioning and caching strategies to achieve 30%+ runtime improvements.
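
To make the SCD Type 2 pattern mentioned above concrete, here is a minimal sketch in plain Python. It is not taken from the course materials: the `DimRow` fields, the `apply_scd2` helper, and the customer/city data are all hypothetical, and a real pipeline would more likely express this as a dbt snapshot or a Spark merge. The idea it illustrates is that a changed attribute closes the current row and appends a new current row, so the full history is preserved.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]  # None means the row is still open
    is_current: bool


def apply_scd2(dim: List[DimRow], incoming: Dict[int, str], as_of: date) -> List[DimRow]:
    """Apply a {customer_id: city} snapshot to the dimension (illustrative only)."""
    result: List[DimRow] = []
    has_current = set()
    for row in dim:
        if row.is_current:
            has_current.add(row.customer_id)
        new_city = incoming.get(row.customer_id)
        if row.is_current and new_city is not None and new_city != row.city:
            # Attribute changed: close the old version and open a new one.
            result.append(replace(row, valid_to=as_of, is_current=False))
            result.append(DimRow(row.customer_id, new_city, as_of, None, True))
        else:
            # Unchanged or already historical: keep the row as it is.
            result.append(row)
    for customer_id, city in incoming.items():
        if customer_id not in has_current:
            # Brand-new key: insert its first current version.
            result.append(DimRow(customer_id, city, as_of, None, True))
    return result


# Hypothetical data: customer 1 moves, customer 2 appears for the first time.
dim = [DimRow(1, "Berlin", date(2024, 1, 1), None, True)]
updated = apply_scd2(dim, {1: "Munich", 2: "Hamburg"}, date(2024, 6, 1))
for row in updated:
    print(row)
```

After the call, the Berlin row is closed with `valid_to` set to the load date, and both the Munich row and the new Hamburg row are marked current.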

Skills you'll gain

Apache Airflow
Data Pipelines
Apache Spark
Data Flow Diagrams (DFDs)
Data Transformation
Extract, Transform, Load
Configuration Management
Data Mapping
Diagram Design
Data Warehousing
Data Modeling
Data Integration
Enterprise Security
Data Processing
Data Architecture
Database Development

Optimizing Spark and Cloud Data Storage for Analytics

Course 2, 10 hours

What you'll learn

Optimize Spark job performance through strategic partitioning and caching, achieving 30%+ runtime improvements using data access analysis.
Implement transactional data lakes with Delta format, enabling versioning, ACID operations, and schema evolution for reliable datasets.
Provision secure cloud data infrastructure using IAM policies, private networks, and encrypted storage following security best practices.
Evaluate and benchmark storage formats (Parquet, ORC, Avro) to select optimal solutions for analytical workloads and cost efficiency.

Skills you'll gain

Apache Spark
Performance Tuning
Cloud Security
Data Storage
Transaction Processing
Data Warehousing
Cloud Computing Architecture
Cloud Storage
Data Management
Cloud Computing
Data Lakes
Data Integrity
Cloud Deployment
Security Controls
Data Security
Infrastructure Architecture
Infrastructure as Code (IaC)
Cloud Infrastructure
PySpark
Data Storage Technologies

Data Modeling & Warehousing Fundamentals in Data Engineering

Course 3, 9 hours

What you'll learn

Design star schema data models with fact and dimension tables that enable intuitive self-service business intelligence reporting.
Apply third normal form normalization to optimize database structure while maintaining query performance through indexing strategies.
Use advanced SQL window functions to calculate rolling metrics, rankings, and time-series analytics for complex data analysis.
Implement database replication and incremental loading techniques to ensure high availability and efficient data warehouse updates.
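
As a rough illustration of the window functions mentioned above, the sketch below computes a rolling average and a ranking per region. It is not from the course: the `daily_sales` table and its values are invented, and it uses Python's built-in `sqlite3` module purely so the example is self-contained (this assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added). The course itself lists PostgreSQL, where the same `OVER (...)` syntax applies.

```python
import sqlite3

# In-memory database with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [
        ("2024-01-01", "north", 100.0),
        ("2024-01-02", "north", 200.0),
        ("2024-01-03", "north", 300.0),
        ("2024-01-01", "south", 50.0),
        ("2024-01-02", "south", 150.0),
    ],
)

query = """
SELECT day,
       region,
       amount,
       -- Rolling metric: average of the current and previous day per region.
       AVG(amount) OVER (
           PARTITION BY region ORDER BY day
           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
       ) AS rolling_avg_2d,
       -- Ranking: 1 is the highest amount within each region.
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM daily_sales
ORDER BY region, day
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each window is scoped by `PARTITION BY region`, so the rolling average and the rank restart for every region instead of running across the whole table.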

Skills you'll gain

SQL
Extract, Transform, Load
Data Warehousing
Star Schema
Database Management
Performance Tuning
Database Design
PostgreSQL
Database Theory
Data Infrastructure
Database Development
Database Architecture and Administration
Business Intelligence
Data Modeling
Relational Databases
Database Software
Data Integration

DevOps and CI/CD for Data Engineering Performance

Course 4, 12 hours

What you'll learn

Resolve merge conflicts and trace bugs using Git history tools, keeping collaborative codebases stable and production-ready.
Design branching strategies and automate deployments with CI/CD pipelines to safely promote data pipeline artifacts across environments.
Build and publish versioned Docker images and automate server configuration with Ansible for consistent, reproducible environments.
Analyze query execution metrics and optimize resource allocation to maintain performance targets in production data systems.

Skills you'll gain

CI/CD
DevOps
Containerization
Performance Tuning
Git (Version Control System)
Ansible
Data Pipelines
IT Automation
Application Deployment
Version Control
Software Versioning
Docker (Software)
Data Infrastructure
Configuration Management
Root Cause Analysis
Development Environment
DevOps Tools
Continuous Integration
Infrastructure as Code (IaC)
Continuous Deployment

Data Quality and Debugging for Reliable Pipelines

Course 5, 7 hours

What you'll learn

Define and automate data quality tests using YAML to validate row counts, null thresholds, and uniqueness across pipeline datasets.
Trace data anomalies through pipeline stages by analyzing logs and dashboards to identify and fix the exact source of failure.
Apply advanced Python debugging tools — including conditional breakpoints, watchpoints, and pdb — to diagnose and resolve pipeline issues.
Resolve complex concurrency bugs by reading stack traces and correlating thread logs to identify deadlocks and race conditions in code.
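
The row-count, null-threshold, and uniqueness checks listed above could look roughly like the following sketch. It is an assumption-laden illustration, not the course's tooling: the `run_checks` function, the spec keys, and the sample rows are all hypothetical, and the spec is written as a plain dictionary standing in for what a parsed YAML file might contain.

```python
from typing import Any, Dict, List

# Hypothetical spec, shaped like a YAML test definition after parsing.
spec = {
    "min_row_count": 3,
    "max_null_fraction": {"email": 0.25},
    "unique": ["id"],
}


def run_checks(rows: List[Dict[str, Any]], spec: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable failures; an empty list means all checks passed."""
    failures: List[str] = []

    # Row count: the dataset must contain at least the expected number of rows.
    if len(rows) < spec.get("min_row_count", 0):
        failures.append(f"row count {len(rows)} is below {spec['min_row_count']}")

    # Null threshold: the share of missing values per column must stay under the limit.
    for column, limit in spec.get("max_null_fraction", {}).items():
        nulls = sum(1 for row in rows if row.get(column) is None)
        fraction = nulls / len(rows) if rows else 0.0
        if fraction > limit:
            failures.append(f"{column}: null fraction {fraction:.2f} exceeds {limit}")

    # Uniqueness: non-null values in the column must not repeat.
    for column in spec.get("unique", []):
        values = [row[column] for row in rows if row.get(column) is not None]
        if len(values) != len(set(values)):
            failures.append(f"{column}: duplicate values found")

    return failures


# Made-up rows: two missing emails and a repeated id should both be reported.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
    {"id": 3, "email": None},
]
failures = run_checks(rows, spec)
for failure in failures:
    print(failure)
```

Returning a list of failures rather than raising on the first one lets a pipeline step report every broken rule in a single run, which tends to make log-based debugging easier.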

Skills you'll gain

Data Quality
Data Validation
Debugging
Data Integrity
Anomaly Detection
Test Automation
YAML
Generative AI
AI Integrations
Test Tools
Memory Management
Python Programming
Data Pipelines
Performance Tuning
Root Cause Analysis
CI/CD
Reliability

Career Development For Open Source Data Engineering

Course 6, 2 hours

What you'll learn

Build a data engineering portfolio with end-to-end pipeline projects that prove your ability to design, build, and deploy production-style systems.
Create a resume, LinkedIn profile, and GitHub presence that position you as a hands-on data engineer ready to contribute from day one.
Practice real data engineering interview scenarios and develop structured responses to technical, design, and behavioral questions.
Execute a 30-day career launch plan covering portfolio completion, job applications, and networking in the data engineering community.

Skills you'll gain

SQL
Apache Spark
Interviewing Skills
Professional Networking
Apache
Data Quality
Data Presentation
GitHub
Portfolio Management
Web Presence
Python Programming
Apache Airflow
Data Pipelines

Earn a career certificate.

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Professionals from the Industry

472 courses, 83,884 learners

Offered by

Coursera

Why people choose Coursera for their career

Felipe M.

Learner since 2018

"Learning at my own pace has been a great experience. I can study whenever I have the time and energy for it."

Jennifer J.

Learner since 2020

"On an exciting new project, I was able to apply the knowledge and skills from my courses directly at work."

Larry W.

Learner since 2021

"When I need courses on topics my university doesn't offer, Coursera is one of the best alternatives."

Chaitanya A.

"Learning isn't just about getting better at your job. It's about so much more. Coursera lets me learn without limits."

Frequently asked questions

This Program is designed for intermediate learners. You should be comfortable writing Python scripts and SQL queries before starting. Prior experience with data engineering tools like Spark or Airflow is not required — you will build that knowledge through the courses.

You will work in your own local or cloud-based development environment using open-source tools including Apache Spark, dbt Core, Apache Airflow, Docker, and Git. Specific setup instructions are provided at the start of each course.

This program is designed for aspiring data engineers and technically curious professionals who want to build a career working with data infrastructure and pipelines. It is well-suited for software developers transitioning into data engineering, analysts looking to move beyond spreadsheets and SQL into pipeline development, and recent graduates seeking job-ready, hands-on data engineering skills.

Basic Python familiarity and foundational SQL knowledge — such as writing simple SELECT and JOIN queries — are recommended before starting. General comfort working in a command-line environment will also be helpful. No prior experience with Spark, dbt, Airflow, Docker, or cloud platforms is required. The program builds all data engineering skills from the ground up.

More questions

Visit the learner help center.

Financial aid available.

¹Based on responses to the Coursera Learner Outcomes Survey, United States, 2021.