Coursera

Pipeline Architects: Data Engineering to Lakehouse Specialization

Build Data Pipelines That Scale to Production.

Master ingestion, transformation, orchestration, and lakehouse architecture at scale.

Hurix Digital

Instructor: Hurix Digital

Access provided by the Coursera Learning Team

Gain in-depth knowledge of a subject
Intermediate level

Recommended experience

4 weeks to complete
At 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Design data flow diagrams and configure Airbyte connectors for relational databases, streaming platforms, and REST APIs to unify diverse sources.

  • Build modular ETL pipelines using Python, dbt, and Airflow, and evaluate columnar versus row-oriented storage formats for analytical workloads.

  • Implement incremental warehouse loading, SCD2 historical tracking, and data lake transactions with versioning and schema evolution support.

  • Architect and build lakehouse platforms using Delta Lake, Iceberg, and Hudi, registering external tables and automating ingestion pipelines.

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English
Recently updated!

April 2026

See how employees at top companies are mastering in-demand skills

Logos of Petrobras, TATA, Danone, Capgemini, P&G, and L'Oreal

Advance your subject-matter expertise

  • Learn in-demand skills from university and industry experts
  • Master a subject or tool with hands-on projects
  • Develop a deep understanding of key concepts
  • Earn a career certificate from Coursera

Specialization - 10 course series

Map Data Flows Fast

Course 1

What you'll learn

  • Visual data flow docs are key for system clarity and form the base for good pipeline design and team communication.

  • Complete data flow diagrams must show the full journey from sources through transforms to final destinations.

  • Structured diagram creation follows steps: find sources, map processes, set destinations, and check connections.

  • Good data flow visuals connect technical work with business needs, enabling stakeholder alignment and decisions.
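The four-step diagram workflow above (find sources, map processes, set destinations, check connections) can be sketched as a small graph check. This is an illustrative sketch, not course material: the node names and the example flow are made up, and real diagrams would live in a modeling tool.

```python
# Classify the nodes of a data flow graph by their role, mirroring the
# course's steps: sources have no inputs, destinations have no outputs,
# and transforms sit in between. The example flow is hypothetical.

def classify_flow(edges: list) -> dict:
    """Given (upstream, downstream) edge pairs, bucket every node by role."""
    upstream = {src for src, _ in edges}
    downstream = {dst for _, dst in edges}
    return {
        "sources": sorted(upstream - downstream),        # no incoming edge
        "transforms": sorted(upstream & downstream),     # both sides
        "destinations": sorted(downstream - upstream),   # no outgoing edge
    }

flow = [
    ("orders_db", "clean_orders"),
    ("events_api", "clean_events"),
    ("clean_orders", "warehouse"),
    ("clean_events", "warehouse"),
]

result = classify_flow(flow)
# "Check connections": a complete diagram has at least one source and
# one destination, so an empty bucket here flags a broken flow.
print(result["sources"], "->", result["destinations"])
```

Running the classification against a proposed diagram is a cheap way to catch dangling nodes before the picture is shared with stakeholders.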

Skills you'll gain

Data Transformation
Data Mapping
Dataflow
Diagram Design
Data Literacy
Data Store
Data Processing
Data Flow Diagrams (DFDs)
Data Pipelines
Data Visualization
Data Presentation
Technical Communication

Unify Diverse Data Sources

Course 2

What you'll learn

  • Standardized connector configuration patterns apply across different data source types, making integration skills transferable.

  • Authentication and security considerations must be built into every connector setup to ensure enterprise-grade data protection.

  • Proper offset and parameter management in streaming and API connections prevents data loss and ensures complete data capture.

  • Unified staging approaches enable downstream analytics and business intelligence regardless of source system complexity.
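The offset management mentioned above can be sketched in a few lines: persist the last cursor seen so a restarted sync resumes where it left off, neither re-reading nor skipping records. The in-memory "API" and state dict below are stand-ins for a real source and a durable checkpoint store; this is not Airbyte's actual API.

```python
# Incremental capture with a stored offset. The checkpoint advances only
# after a page has been fully captured, so a crash mid-sync replays at
# most one page instead of losing data.

FAKE_API = [{"id": i, "value": f"row-{i}"} for i in range(1, 11)]

def fetch_page(after_id: int, page_size: int = 4) -> list:
    """Return records with id > after_id, in order, up to page_size."""
    rows = [r for r in FAKE_API if r["id"] > after_id]
    return rows[:page_size]

def sync(state: dict) -> list:
    """Pull all records newer than the stored offset, page by page."""
    captured = []
    while True:
        page = fetch_page(state.get("offset", 0))
        if not page:
            break
        captured.extend(page)
        state["offset"] = page[-1]["id"]  # checkpoint after the page lands
    return captured

state = {}
first = sync(state)   # initial backfill: all 10 records
later = sync(state)   # nothing new: offset is already at the tail
print(len(first), state["offset"], len(later))  # → 10 10 0
```

The same pattern applies to Kafka consumer offsets and API pagination cursors: the connector's job is to make "where did I stop?" a durable, transactional fact.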

Skills you'll gain

Restful API
Data Pipelines
Apache Kafka
Data Integration
Relational Databases
Real Time Data
Database Systems
Enterprise Security
Data Infrastructure
Databases
Authentications

Evaluate Storage for Data Warehousing Success

Course 3

What you'll learn

  • Storage format choice strongly affects query performance and should match workload needs, not general assumptions.

  • Column storage suits read-heavy analytics, while row storage performs better for transactional and write-focused workloads.

  • Benchmarking with real datasets and queries offers the best basis for sound storage architecture decisions.

  • Compression and ingestion speed must be balanced carefully to align performance with business priorities.
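The row-versus-column contrast above can be made concrete with the same table held in both layouts. This toy sketch uses plain Python lists (the data is made up); real columnar formats like Parquet add compression and encoding on top of the same idea.

```python
# One table, two layouts. An analytical query ("total amount") touches a
# single contiguous column in columnar form, but every full record in row
# form - which is why column storage favors read-heavy analytics while
# row storage favors record-at-a-time transactional writes.

rows = [  # row-oriented: one record per entry, as an OLTP store keeps it
    {"order_id": i, "customer": f"c{i % 3}", "amount": float(i)}
    for i in range(1, 1001)
]

columns = {  # column-oriented: one array per field, as Parquet/ORC keep it
    "order_id": [r["order_id"] for r in rows],
    "customer": [r["customer"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# Analytical read: full scan of records vs. a single column.
row_total = sum(r["amount"] for r in rows)
col_total = sum(columns["amount"])

# Transactional write: one append vs. an append into every column array.
rows.append({"order_id": 1001, "customer": "c0", "amount": 10.0})
for name, value in [("order_id", 1001), ("customer", "c0"), ("amount", 10.0)]:
    columns[name].append(value)

print(row_total == col_total, row_total)  # → True 500500.0
```

As the course stresses, the decisive evidence is a benchmark of your own queries against your own data, not this structural intuition alone.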

Skills you'll gain

Amazon Redshift
Snowflake Schema
Data Processing
Star Schema
Data-Driven Decision-Making
Data Storage
Query Languages
Technical Communication
Performance Testing
Analysis
Apache Hive
Data Storage Technologies
Scalability
Data Warehousing
Data Architecture

Build & Transform Data Pipelines

Course 4

What you'll learn

  • Modular pipeline design enables maintainable, scalable data systems that can adapt to changing business requirements.

  • Integration of complementary tools (Spark, dbt, Airflow) creates more robust and efficient data processing workflows than single-tool approaches.

  • Proper separation of concerns between ingestion, transformation, and loading stages reduces complexity and improves debugging capabilities.

  • Automation and orchestration are essential for reliable, production-grade data systems that minimize manual intervention and human error.
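The separation of concerns described above can be sketched with plain functions. In the course's stack this wiring is an Airflow DAG invoking Spark and dbt; the stub stages below only exist to make the structure visible, and all names are illustrative.

```python
# Ingestion, transformation, and loading as independent stages wired
# together by a thin orchestrator. Each stage can be tested, replaced,
# or debugged on its own - the point of modular pipeline design.

def extract() -> list:
    """Ingestion stage: pull raw records (stubbed source)."""
    return [{"email": " Ada@Example.com "}, {"email": "bob@example.com"}]

def transform(raw: list) -> list:
    """Transformation stage: pure cleaning logic, no I/O."""
    return [{"email": r["email"].strip().lower()} for r in raw]

def load(clean: list, target: list) -> int:
    """Loading stage: write to the target, report rows loaded."""
    target.extend(clean)
    return len(clean)

def run_pipeline(target: list) -> int:
    """Orchestrator: knows the order of stages, not their internals."""
    return load(transform(extract()), target)

warehouse: list = []
print(run_pipeline(warehouse))  # → 2
print(warehouse[0])             # → {'email': 'ada@example.com'}
```

Because `transform` does no I/O, a bad record can be reproduced in a unit test without touching the source or the warehouse, which is the debugging benefit the third bullet refers to.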

Skills you'll gain

Cloud Computing
Extract, Transform, Load
Cloud Deployment
Data Warehousing
Data Pipelines
Data Integration
Data Processing
Data Cleansing
Apache Airflow
Maintainability

Update Your Data Warehouse Incrementally

Course 5

What you'll learn

  • Standardized connector configuration patterns apply across different data source types, making integration skills transferable.

  • Authentication and security considerations must be built into every connector setup to ensure enterprise-grade data protection.

  • Proper offset and parameter management in streaming and API connections prevents data loss and ensures complete data capture.

  • Unified staging approaches enable downstream analytics and business intelligence regardless of source system complexity.
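The incremental loading this course covers can be sketched with a high-watermark: rather than reloading the whole source table, record the largest `updated_at` already loaded and pull only rows past it. The table contents and column names below are illustrative stand-ins for a real source and warehouse.

```python
# High-watermark incremental load. Each run copies only rows newer than
# the watermark from the previous run, then advances the watermark.

source = [
    {"id": 1, "name": "alpha", "updated_at": "2024-01-01"},
    {"id": 2, "name": "beta",  "updated_at": "2024-01-02"},
    {"id": 3, "name": "gamma", "updated_at": "2024-01-03"},
]

def incremental_load(warehouse: list, watermark: str) -> str:
    """Copy rows newer than the watermark; return the new watermark."""
    new_rows = [r for r in source if r["updated_at"] > watermark]
    warehouse.extend(new_rows)
    # If nothing arrived, keep the old watermark unchanged.
    return max((r["updated_at"] for r in new_rows), default=watermark)

wh: list = []
wm = incremental_load(wh, "")   # first run: full backfill of 3 rows

source.append({"id": 4, "name": "delta", "updated_at": "2024-01-04"})
wm = incremental_load(wh, wm)   # next run: picks up only the new row

print(len(wh), wm)  # → 4 2024-01-04
```

In production the watermark would be persisted (for example in a metadata table) so runs survive restarts, and a strictly increasing, indexed column must back it.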

Apply SCD2 to Build Dynamic Data Models

Course 6

What you'll learn

  • Historical data preservation is essential for accurate business analytics and regulatory compliance - once overwritten, critical context is lost.

  • SCD2 patterns create sustainable data architecture by maintaining complete audit trails through automated versioning rather than destructive updates.

  • Effective dimensional modeling requires systematic change detection logic that identifies modifications and creates new historical records.

  • Modern data tools like dbt democratize complex SCD2 implementation, making enterprise-grade historical tracking accessible through declarative SQL.
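The SCD2 pattern above can be sketched in a few lines: on change, close the current row by stamping `valid_to` and insert a new current row, so history is preserved instead of overwritten. dbt generates equivalent SQL from a snapshot config; the dict-based "dimension table" and column names here are purely illustrative.

```python
# Minimal SCD2 merge: change detection, close-out of the old version,
# and insertion of the new current version (valid_to = None).

def scd2_apply(dim: list, incoming: dict, today: str) -> None:
    """Merge one source record into an SCD2 dimension, in place."""
    current = next(
        (r for r in dim if r["id"] == incoming["id"] and r["valid_to"] is None),
        None,
    )
    if current and current["city"] == incoming["city"]:
        return  # no change detected: nothing to do
    if current:
        current["valid_to"] = today  # close out the old version
    dim.append({**incoming, "valid_from": today, "valid_to": None})

dim: list = []
scd2_apply(dim, {"id": 1, "city": "Oslo"},   "2024-01-01")  # first version
scd2_apply(dim, {"id": 1, "city": "Oslo"},   "2024-02-01")  # unchanged: no-op
scd2_apply(dim, {"id": 1, "city": "Bergen"}, "2024-03-01")  # change: version 2

print(len(dim))            # → 2 (full history kept)
print(dim[0]["valid_to"])  # → 2024-03-01
```

A query "as of 2024-02-15" filters on `valid_from <= date < valid_to`, which is exactly the historical context a destructive update would have destroyed.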

Skills you'll gain

Data Quality
Data Integrity
Time Series Analysis and Forecasting
SQL
Trend Analysis
Data Validation
Business Intelligence
Scalability
Data Pipelines
Data Modeling
Data Warehousing

Apply Data Lake Transactions & Versioning

Course 7

What you'll learn

  • Transactional storage layers ensure data lake reliability, supporting concurrent operations and maintaining integrity.

  • Version control in data lakes enables auditing, compliance, time-travel queries, and error recovery for production systems.

  • Schema evolution strategies help data systems adapt to business changes while maintaining backward compatibility.

  • Converting raw files to transactional formats is a key pattern supporting both analytics and operational reliability.
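The versioning and time-travel ideas above can be shown with a toy model: every commit produces a new immutable snapshot, and reads can target any past version. Delta Lake, Iceberg, and Hudi implement this with transaction logs over Parquet files; this in-memory class only illustrates the contract, not any real API.

```python
# A toy versioned table: commits are atomic snapshot appends, and
# time-travel reads pick an older snapshot by version number.

class VersionedTable:
    def __init__(self):
        self._snapshots = [[]]  # version 0 is the empty table

    @property
    def version(self) -> int:
        return len(self._snapshots) - 1

    def commit(self, rows: list) -> int:
        """Atomically append rows, producing a new snapshot version."""
        self._snapshots.append(self._snapshots[-1] + rows)
        return self.version

    def read(self, version=None) -> list:
        """Read the latest snapshot, or time-travel to an older one."""
        idx = -1 if version is None else version
        return list(self._snapshots[idx])

t = VersionedTable()
t.commit([{"id": 1}])
t.commit([{"id": 2}])
print(t.version, len(t.read()), len(t.read(version=1)))  # → 2 2 1
```

Because old snapshots are never mutated, auditing ("what did the table say last Tuesday?") and error recovery ("roll back to version N") both reduce to reading an earlier version.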

Skills you'll gain

Data Lakes
Data Pipelines
SQL

Build & Analyze Your Data Lakehouse

Course 8

What you'll learn

  • External tables let query engines access distributed files without duplication, reshaping large-scale analytics design.

  • Choosing Delta, Iceberg, or Hudi requires evaluating schema changes, time travel needs, and performance goals.

  • Lakehouse architecture merges data lake flexibility with warehouse reliability using metadata and ACID support.

  • Automated ingestion with staging and transformation layers ensures consistent, high-quality data across analytics systems.
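The external-table idea in the first bullet can be sketched as metadata-only registration: the catalog stores a table name, schema, and file location, and the query engine parses the files in place at read time, copying nothing into the warehouse. The "object store" dict and paths below are stand-ins for real cloud storage.

```python
# External table = catalog entry pointing at files, not a data copy.
import csv
import io

lake = {  # pretend object store: path -> file contents
    "s3://lake/orders/part-0.csv": "id,amount\n1,10\n2,20\n",
}

catalog = {}

def register_external_table(name: str, path: str, schema: list) -> None:
    """Record metadata only; no data is moved or duplicated."""
    catalog[name] = {"path": path, "schema": schema}

def scan(name: str) -> list:
    """Query engine: resolve the location at read time, parse in place."""
    entry = catalog[name]
    return list(csv.DictReader(io.StringIO(lake[entry["path"]])))

register_external_table("orders", "s3://lake/orders/part-0.csv", ["id", "amount"])
rows = scan("orders")
print(len(rows), rows[0]["amount"])  # → 2 10
```

Table formats like Delta, Iceberg, and Hudi enrich this catalog entry with transaction logs and statistics, which is what upgrades "files with a name" into the reliable lakehouse tables the other bullets describe.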

Automate Data Workflows with Airflow Excellence

Course 9

What you'll learn

  • Production-grade workflows require proactive failure handling strategies, not reactive troubleshooting approaches.

  • Parameterization and configuration management are essential for workflow reusability across different environments and datasets.

  • Task dependency design and SLA monitoring form the foundation of reliable data pipeline operations.

  • Robust workflow architecture prevents downstream business disruptions and reduces operational overhead.
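The proactive failure handling in the first bullet amounts to declaring retry behavior up front rather than debugging a dead pipeline afterwards. In Airflow this is the `retries` and `retry_delay` configuration on a task; the minimal runner and flaky task below are illustrative stand-ins, not Airflow code.

```python
# Retry-aware task execution: failures are expected, budgeted, and
# recorded, instead of being discovered downstream.

def run_with_retries(task, retries: int = 3):
    """Run a task, retrying on failure up to `retries` extra attempts."""
    errors = []
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception as exc:
            errors.append(str(exc))
            # In production: back off, emit a metric, alert near the SLA.
    raise RuntimeError(f"task failed after {len(errors)} attempts: {errors}")

calls = {"n": 0}

def flaky_extract() -> str:
    """Simulated task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("source unavailable")
    return "ok"

result = run_with_retries(flaky_extract)
print(result, calls["n"])  # → ok 3
```

The final `RuntimeError` is the point where an orchestrator would mark the task failed and fire an SLA alert, so humans intervene only after the declared failure budget is exhausted.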

Skills you'll gain

Scalability
Apache Airflow
Data Pipelines
Service Level Agreement
MLOps (Machine Learning Operations)
System Monitoring
Extract, Transform, Load
Incident Response
Workflow Management
DevOps

Unify, Reconcile, and Tune Data Systems

Course 10

What you'll learn

  • SQL MERGE offers atomic sync that maintains consistency in CDC pipelines with minimal overhead.

  • Field-level conflict analysis needs clear business rules and source-of-truth hierarchies for reliable reconciliation.

  • Integration performance improves through measurement, bottleneck detection, and targeted tuning, not large redesigns.

  • Sustainable data systems balance quality, speed, and reliability through ongoing monitoring and iterative improvement.
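The atomic sync in the first bullet can be demonstrated with the standard library. SQLite has no `MERGE` statement, so its `INSERT ... ON CONFLICT DO UPDATE` upsert stands in here for the warehouse `MERGE` a CDC pipeline would run: one statement inserts new keys and updates changed ones. Table and column names are illustrative.

```python
# Upsert-based CDC apply: a single statement per change batch keeps the
# target consistent with the source, with no delete-and-reload window.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Oslo'), (2, 'Bergen')")

# A CDC batch: customer 2 changed city, customer 3 is brand new.
change_batch = [(2, "Trondheim"), (3, "Stavanger")]

conn.executemany(
    """
    INSERT INTO customers (id, city) VALUES (?, ?)
    ON CONFLICT(id) DO UPDATE SET city = excluded.city
    """,
    change_batch,
)

rows = conn.execute("SELECT id, city FROM customers ORDER BY id").fetchall()
print(rows)  # → [(1, 'Oslo'), (2, 'Trondheim'), (3, 'Stavanger')]
```

On warehouses with a true `MERGE` (e.g. Snowflake or Redshift), the same batch would be expressed as `MERGE INTO customers USING staged_changes ...` with matched/not-matched clauses, but the atomicity argument is identical.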

Skills you'll gain

Data Integration
SQL
Performance Testing
Operational Databases
Database Design
Data Cleansing
Data Validation
Performance Tuning
Performance Metric
Stored Procedure
Data Quality
Performance Measurement
Application Performance Management
Data Governance
Data Manipulation
Data Integrity
Consolidation
Systems Integration
Data Pipelines
Performance Improvement

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Hurix Digital
Coursera
414 Courses • 35,180 learners

Offered by

Coursera

Why people choose Coursera for their career

Felipe M.

Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."

Jennifer J.

Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."

Larry W.

Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."

Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."