Coursera

Spark, Skew & Speed: Pipeline Performance Engineering Specialization

Engineer Faster, Smarter Data Pipelines.

Master Spark optimization, pipeline debugging, & performance engineering for production data systems

Instructor: Hurix Digital

Included with Coursera Plus

Gain in-depth subject knowledge
Advanced level

Recommended experience

4 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Optimize Apache Spark jobs by analyzing execution plans, implementing strategic partitioning, & applying caching to deliver measurable runtime gains.

  • Diagnose and resolve data skew, shuffle inefficiencies, and pipeline bottlenecks using Spark UI analysis and proactive partition strategies.

  • Benchmark competing pipeline designs, automate transformation model generation, & apply configuration-driven scripting for scalable data operations.

  • Trace data anomalies to their source, debug Python pipeline failures using stack traces and logs, and implement systematic root cause analysis.

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English
Recently updated!

April 2026

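Advance your subject-matter expertise

  • Learn in-demand skills from university and industry experts
  • Master a subject or tool with hands-on projects
  • Develop a deep understanding of key concepts
  • Earn a career certificate from Coursera

Specialization - 8 course series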

Trace and Fix Data Anomalies

Course 1

What you'll learn

  • Systematic root cause analysis requires methodical examination of each pipeline stage rather than reactive troubleshooting.

  • Data anomalies often originate from transformation logic errors, making code-level investigation essential for permanent fixes.

  • Effective data quality monitoring combines proactive dashboard observation with hands-on validation techniques.

  • Pipeline reliability depends on maintaining clear traceability from data sources through all transformation stages.
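
The stage-by-stage examination described in these points can be illustrated with a small sketch. It is not taken from the course: the stage functions, the `amount` column, and the sample rows are hypothetical, and the "profile" is reduced to a row count and a null count. The idea is to record the same profile after every stage, so the first stage where the numbers diverge points to the code that needs investigating.

```python
def profile(rows):
    """Summarize a batch: row count and number of rows with a missing amount."""
    return {
        "rows": len(rows),
        "null_amount": sum(1 for r in rows if r.get("amount") is None),
    }


def trace_pipeline(rows, stages):
    """Run the stages in order and record a profile after each one."""
    report = [("source", profile(rows))]
    for name, stage in stages:
        rows = stage(rows)
        report.append((name, profile(rows)))
    return rows, report


def first_stage_introducing_nulls(report):
    """Return the first stage whose output has more nulls than its input."""
    for (_, before), (name, after) in zip(report, report[1:]):
        if after["null_amount"] > before["null_amount"]:
            return name
    return None


# Hypothetical stage with a transformation bug: values such as "1,200.50"
# fail to parse and are silently replaced with None.
def parse_amount(rows):
    parsed = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            amount = None
        parsed.append({**r, "amount": amount})
    return parsed


# Hypothetical filtering stage that removes test accounts.
def drop_test_accounts(rows):
    return [r for r in rows if not r["account"].startswith("test_")]


source = [
    {"account": "a1", "amount": "10.5"},
    {"account": "a2", "amount": "1,200.50"},
    {"account": "test_x", "amount": "3"},
]
stages = [("parse_amount", parse_amount), ("drop_test_accounts", drop_test_accounts)]

final_rows, report = trace_pipeline(source, stages)
culprit = first_stage_introducing_nulls(report)
print(culprit)
```

In a real pipeline the same pattern would usually compare richer metrics (distinct keys, sums, schema) rather than a single null count.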

Skills you'll gain

  • Data Pipelines
  • Data Validation
  • Data Integrity
  • Extract, Transform, Load
  • Dependency Analysis
  • Dashboard
  • Anomaly Detection
  • Data Transformation
  • SQL
  • Data Quality
  • Data Processing

Debug Python Pipelines: Root Causes

Course 2

What you'll learn

  • Advanced debugging is a systematic discipline that moves beyond trial-and-error to leverage sophisticated tools for efficient problem resolution.

  • Multithreaded debugging requires understanding execution flow patterns and correlation techniques to reconstruct complex failure scenarios.

  • Production debugging success depends on methodical analysis of runtime state, memory conditions, and thread interactions rather than intuition.

  • Effective debugging practices create repeatable processes that transform unpredictable failures into manageable, documented solutions.
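
One way to make the correlation idea above concrete is to tag every log line with the thread name and a batch identifier, and to log the full stack trace next to that identifier when a worker fails. The sketch below is an assumption-laden illustration rather than the course's method: the logger name, the `batch_id` field, and the `transform` function are all hypothetical, and only the standard library is used.

```python
import logging
import traceback
from concurrent.futures import ThreadPoolExecutor


class ListHandler(logging.Handler):
    """Collect formatted log lines in memory so they can be inspected later."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


logger = logging.getLogger("pipeline_debug_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
# threadName is a built-in LogRecord attribute; batch_id is supplied via `extra`.
handler.setFormatter(
    logging.Formatter("%(threadName)s batch=%(batch_id)s %(levelname)s %(message)s")
)
logger.addHandler(handler)


def transform(batch_id, values):
    """Hypothetical transformation that fails on a zero value."""
    extra = {"batch_id": batch_id}
    logger.info("start", extra=extra)
    try:
        result = [100 / v for v in values]
    except ZeroDivisionError:
        # Keep the stack trace on the same log record as the batch ID.
        logger.error("failed\n%s", traceback.format_exc(), extra=extra)
        return batch_id, None
    logger.info("done", extra=extra)
    return batch_id, result


batches = {"b1": [1, 2], "b2": [5, 0], "b3": [4]}
with ThreadPoolExecutor(max_workers=3, thread_name_prefix="worker") as pool:
    results = dict(pool.map(lambda item: transform(*item), batches.items()))

failed = [batch_id for batch_id, result in results.items() if result is None]
error_lines = [
    line for line in handler.lines if "batch=b2" in line and "ERROR" in line
]
print(failed)
```

Filtering the collected lines by `batch=` reconstructs the sequence of events for one unit of work even when the threads interleave their output.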

Skills you'll gain

  • Root Cause Analysis
  • Complex Problem Solving
  • Application Performance Management
  • Failure Analysis
  • Event Monitoring
  • Integrated Development Environments
  • Analysis

Optimize Query Performance for Data Success

Course 3

What you'll learn

  • Proactive performance monitoring prevents system failures and ensures consistent user experience across production environments.

  • Systematic diagnosis of query bottlenecks requires understanding both query logic efficiency and underlying resource limitations.

  • Strategic resource allocation combines technical optimization with business requirements to maintain service level agreements.

  • Continuous performance analysis creates a feedback loop that improves system reliability over time.
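
As a small illustration of diagnosing a query bottleneck, the sketch below inspects a query plan before and after adding an index. It uses SQLite from the Python standard library purely as a stand-in, since the listing does not say which database engine the course uses; the `orders` table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT SUM(total) FROM orders WHERE customer_id = ?"


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index the filter has to scan the whole table.
plan_before = plan(query)

# With an index on the filter column the engine can search instead of scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)

total = conn.execute(query, (42,)).fetchone()[0]
print(plan_before)
print(plan_after)
```

The exact plan wording varies between SQLite versions, so the useful signal is the change from a full scan to an index search, while the query result stays the same.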

Skills you'll gain

  • Operational Databases
  • Database Management
  • Service Level
  • Query Languages
  • Performance Testing
  • Application Performance Management
  • Capacity Management
  • System Monitoring
  • Performance Tuning
  • Continuous Monitoring

Validate and Track Data History Confidently

Course 4

What you'll learn

  • Automated checksum validation strengthens data pipelines and detects errors early before they move downstream to impact business decisions.

  • Reusable SCD2 architecture lowers maintenance and ensures consistent historical tracking across data warehouses for reliable analytics.

  • Parameterized transforms support scalable engineering and adapt to changing needs without duplicating code or increasing technical debt.

  • Structured data reconciliation is vital for compliance, audit trails, and maintaining trust in analytics across all organizational levels.
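
The checksum and SCD2 ideas above can be combined in a short sketch: a per-row checksum over the tracked attributes detects changes, and a changed row closes the current version and opens a new one. This is a simplified in-memory illustration, not the course's implementation. The column names (`customer_id`, `name`, `city`, `valid_from`, `valid_to`, `is_current`) and the `apply_scd2` helper are assumptions made for the example.

```python
import hashlib
from datetime import date

TRACKED = ("name", "city")  # hypothetical attributes whose history is kept


def row_checksum(row):
    """SHA-256 over the tracked attributes (assumes values contain no '|')."""
    payload = "|".join(str(row[col]) for col in TRACKED)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def apply_scd2(dimension, incoming, load_date, key="customer_id"):
    """Return a new dimension list with Type 2 changes applied."""
    result = [dict(r) for r in dimension]
    current = {r[key]: r for r in result if r["is_current"]}
    for row in incoming:
        checksum = row_checksum(row)
        existing = current.get(row[key])
        if existing is not None and existing["checksum"] == checksum:
            continue  # unchanged: keep the current version as it is
        if existing is not None:
            # Changed: close the old version instead of overwriting it.
            existing["valid_to"] = load_date
            existing["is_current"] = False
        result.append(
            {
                **row,
                "checksum": checksum,
                "valid_from": load_date,
                "valid_to": None,
                "is_current": True,
            }
        )
    return result


first_batch = [{"customer_id": 1, "name": "Ana", "city": "Lima"}]
second_batch = [
    {"customer_id": 1, "name": "Ana", "city": "Quito"},
    {"customer_id": 2, "name": "Bo", "city": "Oslo"},
]

dim = apply_scd2([], first_batch, date(2024, 1, 1))
dim = apply_scd2(dim, second_batch, date(2024, 6, 1))
history = [r["city"] for r in dim if r["customer_id"] == 1]
print(history)
```

Because unchanged rows are skipped by checksum, replaying the same batch leaves the dimension untouched, which is what makes the load safe to rerun.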

Skills you'll gain

  • Data Architecture
  • Snowflake Schema
  • Performance Tuning
  • Data Warehousing
  • Data Integrity
  • Extract, Transform, Load
  • Data Quality
  • Data Validation
  • Data Mart
  • Data Transformation
  • Data Maintenance
  • Reconciliation
  • Database Development
  • Star Schema

Optimize Spark Performance: Analyze & Accelerate

Course 5

What you'll learn

  • Performance optimization is a systematic process requiring analysis of data access patterns, not random configuration changes.

  • Strategic partitioning minimizes expensive network shuffles and is the foundation of scalable Spark applications.

  • Intelligent caching of reusable intermediate datasets can dramatically reduce computation costs and improve job reliability.

  • The Spark UI provides actionable insights that guide optimization decisions and enable data-driven performance improvements.
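
To show why partitioning by key reduces shuffle work, here is a plain-Python simulation, not Spark code. It counts how many records would have to move across the network so that all records with the same key end up together, first for a round-robin layout and then for a layout already partitioned by key. The partition count, the `user` keys, and the CRC32-based hash are assumptions chosen to keep the example deterministic. In PySpark the comparable step would typically be `DataFrame.repartition` on the key column.

```python
import zlib

NUM_PARTITIONS = 4


def target_partition(key):
    """Deterministic hash partitioner (CRC32 stands in for Spark's hashing)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def records_moved(partitions):
    """Count records that are not yet on the partition their key hashes to."""
    return sum(
        1
        for index, records in enumerate(partitions)
        for key, _value in records
        if target_partition(key) != index
    )


records = [(f"user{i % 10}", i) for i in range(1000)]

# Layout A: records spread round-robin, so each key is scattered.
round_robin = [[] for _ in range(NUM_PARTITIONS)]
for i, record in enumerate(records):
    round_robin[i % NUM_PARTITIONS].append(record)

# Layout B: records already partitioned by the grouping key.
by_key = [[] for _ in range(NUM_PARTITIONS)]
for record in records:
    by_key[target_partition(record[0])].append(record)

moved_round_robin = records_moved(round_robin)
moved_by_key = records_moved(by_key)
print(moved_round_robin, moved_by_key)
```

A key-based aggregation on layout B needs no data movement at all, while layout A has to relocate a large share of the records first.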

Skills you'll gain

  • Performance Tuning
  • Apache Spark
  • Data Processing
  • PySpark
  • Data Pipelines
  • Systems Analysis

Fix Data Bottlenecks: Optimize Spark Performance

Course 6

What you'll learn

  • Performance bottlenecks in distributed systems often stem from uneven data distribution rather than insufficient computational resources.

  • Visual execution plan analysis is essential for identifying specific stages where data processing imbalances occur.

  • Proactive partition strategy selection prevents performance degradation more effectively than reactive optimization.

  • Spark's shuffle.partitions configuration and broadcast join patterns are fundamental tools for sustainable pipeline optimization.
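
Uneven data distribution and one common remedy, key salting, can be simulated in plain Python. The listing names shuffle partition settings and broadcast joins rather than salting, so treat this as a swapped-in illustration of the skew problem, not the course's technique. The hot key, the partition count, and the salt assignment are hypothetical, and the salt is added to the hash (instead of hashing a combined key as Spark would) so that the result is deterministic.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8
SALT_BUCKETS = 8
HOT_KEY = "hot_customer"


def partition_of(key, salt=0):
    """Simplified partitioner: the salt shifts the key to another partition."""
    return (zlib.crc32(key.encode("utf-8")) + salt) % NUM_PARTITIONS


# 90% of the events belong to a single key, which is the skew.
events = [HOT_KEY] * 9000 + [f"customer{i}" for i in range(1000)]

# Without salting, every hot-key event lands on one partition.
plain_sizes = Counter(partition_of(key) for key in events)

# With salting, hot-key events are spread across SALT_BUCKETS sub-keys.
salted_events = [
    (key, i % SALT_BUCKETS if key == HOT_KEY else 0)
    for i, key in enumerate(events)
]
salted_sizes = Counter(partition_of(key, salt) for key, salt in salted_events)

# Stage 1: partial counts per (key, salt). Stage 2: combine per key.
partial_counts = Counter(salted_events)
final_counts = Counter()
for (key, _salt), count in partial_counts.items():
    final_counts[key] += count

print(max(plain_sizes.values()), max(salted_sizes.values()))
```

The largest partition shrinks substantially after salting, and the two-stage aggregation still produces the same per-key totals as the unsalted version.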

Skills you'll gain

  • Performance Tuning
  • Apache Spark
  • Performance Analysis
  • Scalability
  • Distributed Computing
  • Data Pipelines
  • Debugging
  • Data Processing
  • PySpark

Automate, Optimize, and Benchmark Data Pipelines

Course 7

What you'll learn

  • Performance measurement and evidence-based decisions rely on comparing execution metrics to improve data engineering efficiency.

  • Config-driven model generation cuts manual work, keeps projects consistent, and supports scalable data transformation.

  • Pipeline optimization uses repeated measurement and programmatic fixes to deliver lasting performance gains.

  • Modern data engineering succeeds by creating reusable, maintainable systems that adapt to changing needs while preserving performance.
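
Comparing execution metrics between competing designs can start with a very small harness. The sketch below is one possible shape, under stated assumptions: the two deduplication functions are hypothetical candidates, the median of a few repeated runs is used as the metric, and the harness first checks that both designs return the same result, since a faster but wrong pipeline is not a valid candidate.

```python
import statistics
import time


def benchmark(func, data, repeats=5):
    """Run func several times; return its result and the median duration."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(data)
        timings.append(time.perf_counter() - start)
    return result, statistics.median(timings)


def dedupe_with_list(rows):
    """Candidate A: membership test against a list (linear scan per row)."""
    seen, out = [], []
    for row in rows:
        if row not in seen:
            seen.append(row)
            out.append(row)
    return out


def dedupe_with_set(rows):
    """Candidate B: membership test against a set (hash lookup per row)."""
    seen, out = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


data = [i % 500 for i in range(5000)]
candidates = {"list": dedupe_with_list, "set": dedupe_with_set}
report = {name: benchmark(func, data) for name, func in candidates.items()}

results_match = report["list"][0] == report["set"][0]
fastest = min(report, key=lambda name: report[name][1])
print(results_match, fastest)
```

Timings depend on the machine and the data, so the numbers are only meaningful when the candidates run on the same input under the same conditions.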

Skills you'll gain

  • Performance Testing
  • Extract, Transform, Load
  • Performance Measurement
  • Statistical Analysis
  • Benchmarking
  • Performance Analysis
  • Data Processing
  • Data Modeling
  • Data-Driven Decision-Making

Transform, Analyze, and Optimize Your Data

Course 8

What you'll learn

  • Batch data transformation converts raw semi-structured data into analysis-ready formats that support enterprise decisions.

  • Workload analysis guides database design by linking access patterns and query frequency to performance and cost gains.

  • Migration choices must rely on performance testing and quantitative analysis to ensure ROI-driven transformations.

  • System performance depends on storage, queries, and hardware, requiring holistic technical and business evaluation.
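
A batch transformation of semi-structured data often comes down to flattening nested records and exploding arrays into rows. The sketch below shows that step on JSON lines with the standard library only. The order schema, the dotted column names, and the `explode_orders` helper are assumptions for illustration and are not taken from the course material.

```python
import json


def flatten(record, parent=""):
    """Flatten nested dicts into one level using dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def explode_orders(raw_lines):
    """Yield one flat row per order item from JSON-lines input."""
    for line in raw_lines:
        order = json.loads(line)
        items = order.pop("items", [])
        header = flatten(order)
        for item in items:
            yield {**header, **flatten(item, "item")}


raw = [
    '{"order_id": 1, "customer": {"id": "c1", "country": "BR"}, '
    '"items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}',
    '{"order_id": 2, "customer": {"id": "c2", "country": "IN"}, '
    '"items": [{"sku": "A", "qty": 5}]}',
]

rows = list(explode_orders(raw))
total_qty = sum(row["item.qty"] for row in rows)
print(len(rows), total_qty)
```

The output rows share one flat schema, which is the shape that columnar warehouses and SQL engines expect for analysis.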

Skills you'll gain

  • Azure Synapse Analytics
  • Database Design
  • Apache Cassandra
  • Database Management
  • Data Wrangling
  • Data Architecture
  • Operational Databases
  • Apache Hive
  • Amazon Redshift
  • Data Transformation

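Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Hurix Digital
Coursera
406 courses, 34,487 learners

Offered by

Coursera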
