Coursera

Modern Data Architecture & Lakehouse Engineering Specialization

Design and Build Modern Data Platforms.

Learn to architect, secure, and optimize cloud-based lakehouse systems for enterprise analytics.

Hurix Digital

Instructor: Hurix Digital

Access provided by Coursera Learning Team

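Gain in-depth knowledge of a subject
Intermediate level

Recommended experience

4 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace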

What you'll learn

  • Architect and provision secure, resilient cloud data infrastructure using Infrastructure as Code and disaster recovery best practices.

  • Build lakehouse platforms with transactional integrity, automated pipelines, and seamless integration of diverse data sources.

  • Optimize data system performance through strategic partitioning, query tuning, security controls, and systematic benchmarking.

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English
Recently updated!

February 2026

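See how employees at top companies are mastering in-demand skills

Logos of Petrobras, TATA, Danone, Capgemini, P&G, and L'Oreal

Advance your subject-matter expertise

  • Learn in-demand skills from university and industry experts
  • Master a subject or tool with hands-on projects
  • Develop a deep understanding of key concepts
  • Earn a career certificate from Coursera

Specialization - 13 course series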

Engineer Cloud Data for Resiliency & ROI

Course 1

What you'll learn

  • Infrastructure as Code automates data platform deployments, replacing manual processes with version-controlled, repeatable systems.

  • Cost optimization uses performance benchmarking and data analysis to identify efficient compute/storage configs for specific workloads.

  • Business continuity requires proactive disaster recovery with automated failover and continuous replication for strict recovery goals.

  • Successful cloud data engineering balances performance, cost, and reliability through strategic design and continuous monitoring.

Skills you'll gain

Category: Disaster Recovery
Category: Business Continuity
Category: Capacity Management
Category: Business Continuity Planning
Category: Cloud Deployment
Category: Data Architecture
Category: Automation
Category: Cost Management
Category: Infrastructure as Code (IaC)
Category: IT Infrastructure
Category: Performance Analysis
Category: Data Infrastructure
Category: Data Warehousing
Category: Benchmarking
Category: Terraform
Category: AWS CloudFormation
Category: Cloud Computing Architecture

Build & Analyze Your Data Lakehouse

Course 2

What you'll learn

  • External tables let query engines access distributed files without duplication, reshaping large-scale analytics design.

  • Choosing Delta, Iceberg, or Hudi requires evaluating schema changes, time travel needs, and performance goals.

  • Lakehouse architecture merges data lake flexibility with warehouse reliability using metadata and ACID support.

  • Automated ingestion with staging and transformation layers ensures consistent, high-quality data across analytics systems.

Transform, Analyze, and Optimize Your Data

Course 3

What you'll learn

  • Batch data transformation converts raw semi-structured data into analysis-ready formats that support enterprise decisions.

  • Workload analysis guides database design by linking access patterns and query frequency to performance and cost gains.

  • Migration choices must rely on performance testing and quantitative analysis to ensure ROI-driven transformations.

  • System performance depends on storage, queries, and hardware, requiring holistic technical and business evaluation.
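As a rough illustration of the batch transformation idea above, the following Python sketch flattens nested JSON records into rows with a shared set of columns. The record layout (order_id, customer, total) and the dotted column names are assumptions made up for this example, not material from the course.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical raw input: one JSON document per line, with optional fields.
raw_lines = [
    '{"order_id": 1, "customer": {"id": "c-7", "region": "EU"}, "total": 19.5}',
    '{"order_id": 2, "customer": {"id": "c-9"}, "total": 5.0}',
]

rows = [flatten(json.loads(line)) for line in raw_lines]

# Build one column list for the whole batch; missing values become None.
columns = sorted({key for row in rows for key in row})
table = [[row.get(col) for col in columns] for row in rows]
```

Collecting the column set across the whole batch before building rows is what keeps records with missing fields aligned with the rest of the table.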

Skills you'll gain

Category: Database Management
Category: Data Transformation
Category: Database Design
Category: Data Architecture
Category: Operational Databases
Category: Data Wrangling
Category: Amazon Redshift
Category: Apache Cassandra
Category: Azure Synapse Analytics
Category: Apache Hive

Unify, Reconcile, and Tune Data Systems

Course 4

What you'll learn

  • SQL MERGE offers atomic sync that maintains consistency in CDC pipelines with minimal overhead.

  • Field-level conflict analysis needs clear business rules and source-of-truth hierarchies for reliable reconciliation.

  • Integration performance improves through measurement, bottleneck detection, and targeted tuning, not large redesigns.

  • Sustainable data systems balance quality, speed, and reliability through ongoing monitoring and iterative improvement.
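The field-level reconciliation point above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the source names, their priority order, and the rule "take the first non-null value from the most trusted source" are all hypothetical business rules chosen for illustration.

```python
# Hypothetical source-of-truth hierarchy: a lower number means more trusted.
SOURCE_PRIORITY = {"crm": 0, "billing": 1, "web_form": 2}


def reconcile(versions):
    """Pick each field's value from the most trusted source with a non-null value."""
    ranked = sorted(versions, key=lambda v: SOURCE_PRIORITY[v["source"]])
    merged = {}
    for version in ranked:
        for field, value in version["fields"].items():
            # Keep the first (most trusted) non-null value seen for each field.
            if value is not None and field not in merged:
                merged[field] = value
    return merged


# Three conflicting versions of the same customer record (made-up data).
versions = [
    {"source": "web_form", "fields": {"email": "a@old.example", "phone": "555-0100"}},
    {"source": "crm", "fields": {"email": "a@new.example", "phone": None}},
    {"source": "billing", "fields": {"email": None, "address": "1 Main St"}},
]

golden = reconcile(versions)
```

Here the email comes from the CRM, while the phone number falls through to the web form because the more trusted sources have no value for it.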

Skills you'll gain

Category: Data Integration
Category: SQL
Category: Performance Testing
Category: Operational Databases
Category: Database Design
Category: Data Cleansing
Category: Data Validation
Category: Performance Tuning
Category: Performance Metric
Category: Stored Procedure
Category: Data Quality
Category: Performance Measurement
Category: Application Performance Management
Category: Data Governance
Category: Data Manipulation
Category: Data Integrity
Category: Consolidation
Category: Systems Integration
Category: Data Pipelines
Category: Performance Improvement

Secure Data: Mask, Monitor, and Audit

Course 5

What you'll learn

  • Data protection requires layered security controls that balance privacy with operational utility.

  • Proactive monitoring and anomaly detection are essential for identifying security threats before they escalate into breaches.

  • Compliance frameworks provide structured approaches to evaluating and strengthening organizational security postures.

  • Effective data governance integrates technical controls with policy frameworks to create comprehensive protection strategies.
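To make the trade-off between privacy and operational utility more concrete, here is a small Python sketch of two masking styles. The function names, the sample record, and the hard-coded salt are assumptions for illustration only; a production system would more likely use a keyed hash with a properly managed secret.

```python
import hashlib


def mask_email(email):
    """Hide most of the local part but keep the domain for aggregate analysis."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def pseudonymize(value, salt):
    """Return a deterministic token: equal inputs give equal tokens, so joins still work."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


# Made-up record and salt, for illustration only.
record = {"email": "jane.doe@example.com", "customer_id": "C-1001"}
masked = {
    "email": mask_email(record["email"]),
    "customer_id": pseudonymize(record["customer_id"], salt="demo-salt"),
}
```

The masked email stays readable enough for support staff to recognize the domain, while the pseudonymized ID can still serve as a join key without exposing the original value.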

Skills you'll gain

Category: Security Management

Provision Secure Cloud Data Infrastructure

Course 6

What you'll learn

  • Security by design applies layered defenses across storage, identity, and networks from the start of infrastructure setup.

  • Infrastructure as Code ensures consistent, auditable security settings that reduce errors and support compliance needs.

  • The principle of least privilege must be embedded into every access control decision, granting only necessary permissions to specific resources.

  • Secure networks rely on segmentation with private subnets and controls to protect systems from public exposure.

Skills you'll gain

Category: Infrastructure as Code (IaC)
Category: Identity and Access Management
Category: Encryption
Category: Data Security
Category: Cloud Security
Category: Network Security
Category: Cloud Infrastructure
Category: Private Cloud
Category: Security Controls
Category: Infrastructure Security
Category: Data Management
Category: Data Integrity
Category: Data Infrastructure
Category: Cloud Storage

Apply Data Lake Transactions & Versioning

Course 7

What you'll learn

  • Transactional storage layers ensure data lake reliability, supporting concurrent operations and maintaining integrity.

  • Version control in data lakes enables auditing, compliance, time-travel queries, and error recovery for production systems.

  • Schema evolution strategies help data systems adapt to business changes while maintaining backward compatibility.

  • Converting raw files to transactional formats is a key pattern supporting both analytics and operational reliability.
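The versioning ideas above (time-travel queries and error recovery) can be illustrated with a toy in-memory table. This is only a sketch: the VersionedTable class is hypothetical and stores a full snapshot per commit for simplicity, whereas real table formats generally track versions through metadata rather than copying every row.

```python
import copy


class VersionedTable:
    """Toy in-memory table in which every commit stores a full snapshot."""

    def __init__(self):
        self._snapshots = [[]]  # version 0 is the empty table

    def commit(self, rows):
        """Append rows as a new version and return its version number."""
        new_snapshot = copy.deepcopy(self._snapshots[-1]) + copy.deepcopy(rows)
        self._snapshots.append(new_snapshot)
        return len(self._snapshots) - 1

    def read(self, version=None):
        """Read the latest version, or an earlier one as a time-travel query."""
        index = len(self._snapshots) - 1 if version is None else version
        return copy.deepcopy(self._snapshots[index])

    def rollback(self, version):
        """Recover from a bad write by committing an old snapshot as the newest version."""
        self._snapshots.append(copy.deepcopy(self._snapshots[version]))
        return len(self._snapshots) - 1


table = VersionedTable()
v1 = table.commit([{"id": 1, "amount": 10}])
v2 = table.commit([{"id": 2, "amount": -999}])  # a bad write
v3 = table.rollback(v1)                         # latest state matches v1 again
```

Because the rollback is itself a new version, the bad write at v2 remains readable for auditing even though the latest version no longer contains it.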

Skills you'll gain

Category: Data Lakes
Category: Data Pipelines
Category: SQL

Evaluate Storage for Data Warehousing Success

Course 8

What you'll learn

  • Storage format choice strongly affects query performance and should match workload needs, not general assumptions.

  • Column storage suits read-heavy analytics, while row storage performs better for transactional and write-focused workloads.

  • Benchmarking with real datasets and queries offers the best basis for sound storage architecture decisions.

  • Compression and ingestion speed must be balanced carefully to align performance with business priorities.

Skills you'll gain

Category: Amazon Redshift
Category: Snowflake Schema
Category: Data Processing
Category: Star Schema
Category: Data-Driven Decision-Making
Category: Data Storage
Category: Query Languages
Category: Technical Communication
Category: Performance Testing
Category: Analysis
Category: Apache Hive
Category: Data Storage Technologies
Category: Scalability
Category: Data Warehousing
Category: Data Architecture

Build & Transform Data Pipelines

Course 9

What you'll learn

  • Modular pipeline design enables maintainable, scalable data systems that can adapt to changing business requirements.

  • Integration of complementary tools (Spark, dbt, Airflow) creates more robust and efficient data processing workflows than single-tool approaches.

  • Proper separation of concerns between ingestion, transformation, and loading stages reduces complexity and improves debugging capabilities.

  • Automation and orchestration are essential for reliable, production-grade data systems that minimize manual intervention and human error.

Skills you'll gain

Category: Cloud Computing
Category: Extract, Transform, Load
Category: Cloud Deployment
Category: Data Warehousing
Category: Data Pipelines
Category: Data Integration
Category: Data Processing
Category: Data Cleansing
Category: Apache Airflow
Category: Maintainability

Unify Diverse Data Sources

Course 10

What you'll learn

  • Standardized connector configuration patterns apply across different data source types, making integration skills transferable.

  • Authentication and security considerations must be built into every connector setup to ensure enterprise-grade data protection.

  • Proper offset and parameter management in streaming and API connections prevents data loss and ensures complete data capture.

  • Unified staging approaches enable downstream analytics and business intelligence regardless of source system complexity.
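The offset management point above can be shown with a toy consumer loop. This sketch uses a plain Python list as a stand-in for a stream and is not a real streaming client; the consume function and the simulated transient failure are assumptions for illustration. The key idea is that the offset advances only after a record has been processed successfully.

```python
def consume(log, committed_offset, handler):
    """Process records from committed_offset onward and return the new committed offset.

    The offset advances only after handler succeeds, so a failed record is
    retried on the next run instead of being skipped (at-least-once delivery).
    """
    offset = committed_offset
    for record in log[committed_offset:]:
        try:
            handler(record)
        except Exception:
            break  # stop here; the failed record stays uncommitted
        offset += 1
    return offset


log = ["a", "b", "c", "d"]  # stand-in for a stream partition
seen = []
fail_once = {"c"}           # simulate one transient failure on record "c"


def handler(record):
    if record in fail_once:
        fail_once.discard(record)
        raise RuntimeError("transient failure")
    seen.append(record)


first = consume(log, 0, handler)       # stops before "c"
second = consume(log, first, handler)  # resumes at "c" and finishes
```

Committing the offset before processing would have skipped "c" after the failure; committing afterwards trades that data loss for a possible retry.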

Skills you'll gain

Category: Restful API
Category: Data Pipelines
Category: Apache Kafka
Category: Data Integration
Category: Relational Databases
Category: Real Time Data
Category: Database Systems
Category: Enterprise Security
Category: Data Infrastructure
Category: Databases
Category: Authentications

Map Data Flows Fast

Course 11

What you'll learn

  • Visual data flow docs are key for system clarity and form the base for good pipeline design and team communication.

  • Complete data flow diagrams must show the full journey from sources through transforms to final destinations.

  • Structured diagram creation follows steps: find sources, map processes, set destinations, and check connections.

  • Good data flow visuals connect technical work with business needs, enabling stakeholder alignment and decisions.
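The last of the diagramming steps above, checking connections, can also be done programmatically once a diagram is written down as a list of edges. The sketch below is one possible approach, with made-up node names; it reports any node that has no path to a final destination.

```python
# Hypothetical data flow diagram expressed as (from, to) edges.
flows = [
    ("orders_db", "clean_orders"),
    ("clickstream", "sessionize"),
    ("clean_orders", "warehouse"),
    ("sessionize", "warehouse"),
]
sources = {"orders_db", "clickstream", "crm_export"}
destinations = {"warehouse"}


def check_connections(flows, sources, destinations):
    """Return a message for every node that never reaches a destination."""
    downstream = {}
    for start, end in flows:
        downstream.setdefault(start, set()).add(end)

    def reaches_destination(node, visiting=frozenset()):
        if node in destinations:
            return True
        if node in visiting:
            return False  # guard against cycles in the diagram
        return any(
            reaches_destination(nxt, visiting | {node})
            for nxt in downstream.get(node, ())
        )

    nodes = sources | {node for edge in flows for node in edge}
    return [
        f"{node} does not reach a destination"
        for node in sorted(nodes)
        if not reaches_destination(node)
    ]


problems = check_connections(flows, sources, destinations)
```

In this made-up diagram, crm_export is listed as a source but has no outgoing flow, so it is the only problem reported.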

Skills you'll gain

Category: Data Transformation
Category: Data Mapping
Category: Dataflow
Category: Diagram Design
Category: Data Literacy
Category: Data Store
Category: Data Processing
Category: Data Flow Diagrams (DFDs)
Category: Data Pipelines
Category: Data Visualization
Category: Data Presentation
Category: Technical Communication

Optimize Spark Performance: Analyze & Accelerate

Course 12

What you'll learn

  • Performance optimization is a systematic process requiring analysis of data access patterns, not random configuration changes.

  • Strategic partitioning minimizes expensive network shuffles and is the foundation of scalable Spark applications.

  • Intelligent caching of reusable intermediate datasets can dramatically reduce computation costs and improve job reliability.

  • The Spark UI provides actionable insights that guide optimization decisions and enable data-driven performance improvements.
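The link between partitioning and shuffles can be illustrated without Spark. The plain-Python sketch below is a simplified model, not Spark's actual implementation: when two datasets are partitioned by the same key with the same partitioner, matching keys land in the same partition number, so each partition pair can be joined locally with no data moving between partitions. The data and helper names are assumptions for illustration.

```python
import zlib


def partition_id(key, num_partitions):
    """Stable hash partitioner (crc32 avoids Python's per-process string hash seed)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition(rows, num_partitions):
    """Split (key, value) rows into num_partitions buckets by key hash."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[partition_id(key, num_partitions)].append((key, value))
    return parts


def local_join(left_part, right_part):
    """Join two partitions without looking at any other partition."""
    lookup = {}
    for key, value in right_part:
        lookup.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, lv in left_part for rv in lookup.get(key, [])]


# Made-up datasets keyed by user id.
orders = [("u1", 30), ("u2", 12), ("u1", 7), ("u3", 50)]
users = [("u1", "EU"), ("u2", "US"), ("u3", "EU")]

N = 4
order_parts = partition(orders, N)
user_parts = partition(users, N)

# Partition i of orders only ever needs partition i of users.
joined = [row for i in range(N) for row in local_join(order_parts[i], user_parts[i])]
```

If the two datasets were partitioned differently, every order partition could need rows from every user partition, which is the cross-network data movement that a shuffle represents.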

Skills you'll gain

Category: Performance Tuning
Category: Apache Spark
Category: PySpark
Category: Systems Analysis
Category: Data Pipelines
Category: Data Processing

Optimize Query Performance for Data Success

Course 13

What you'll learn

  • Proactive performance monitoring prevents system failures and ensures consistent user experience across production environments.

  • Systematic diagnosis of query bottlenecks requires understanding both query logic efficiency and underlying resource limitations.

  • Strategic resource allocation combines technical optimization with business requirements to maintain service level agreements.

  • Continuous performance analysis creates a feedback loop that improves system reliability over time.

Skills you'll gain

Category: Operational Databases
Category: Capacity Management
Category: Application Performance Management
Category: Query Languages
Category: Service Level
Category: Database Management
Category: System Monitoring
Category: Performance Testing
Category: Continuous Monitoring
Category: Performance Tuning

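Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Hurix Digital
Coursera
414 courses, 35,180 learners

Offered by

Coursera

Why people choose Coursera for their career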

Felipe M.

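Learner since 2018
"To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood."

Jennifer J.

Learner since 2020
"I directly applied the concepts and skills I learned from my courses to an exciting new project at work."

Larry W.

Learner since 2021
"When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go."

Chaitanya A.

"Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits."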