Master the tools and techniques that power large-scale data processing and analytics. This course introduces the principles and frameworks of Big Data Processing with Hadoop and Spark, enabling learners to manage, process, and analyze massive datasets efficiently.

Big Data Processing with Hadoop and Spark
This course is part of the Cloud Computing for Data Science Specialization
What you'll learn
Explain how Hadoop and Spark enable large-scale data processing.
Build and manage distributed data pipelines using Hadoop frameworks.
Implement in-memory analytics and real-time processing with Spark.
Apply big data tools to design scalable, data-driven applications.
Skills you'll gain
- PySpark
- Predictive Modeling
- Data Analysis
- Data Storage
- Data Pipelines
- Big Data
- Data Storage Technologies
- Python Programming
- Data Processing
- Apache Spark
- Data Science
- Data Management
- Apache Hadoop
- Data Transformation
- Apache Hive
- Information Technology
- Scikit Learn (Machine Learning Library)
- Scalability
- Distributed Computing
Details to know

8 assignments
February 2026
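Build your subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate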

There are 3 modules in this course
This module guides you through the core components of the Hadoop ecosystem, starting with its architecture and distributed file system. You’ll explore how Hadoop processes data, gain insight into its broader ecosystem, and apply your knowledge in hands-on activities using both Docker and a Linux virtual machine.
What's included
6 videos, 1 reading, 3 assignments
This module introduces you to key programming models for distributed data processing, with a focus on MapReduce and its practical applications. You'll explore core concepts and terminology, then work through guided Python code walkthroughs that implement word count and server log analysis tasks. You'll also gain hands-on experience writing data transformation scripts in Apache Pig, culminating in an assignment that applies these skills to web log analysis.
What's included
6 videos, 6 readings, 3 assignments
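For readers unfamiliar with the word count task named in the MapReduce module, here is a minimal single-process sketch of the map, shuffle, and reduce steps in plain Python. It is an illustration only, not the course's own walkthrough code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and a real Hadoop job would run these steps across many machines rather than in one process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values under their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Made-up sample input, standing in for lines read from a file.
    sample = ["big data big ideas", "data pipelines"]
    print(word_count(sample))
```

The same three-step shape carries over to server log analysis: the map step would emit a key such as a status code or URL path instead of a word, and the reduce step would still sum the counts.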
This module introduces you to Apache Spark, covering its core concepts, architecture, and machine learning capabilities through MLlib. You'll learn how to set up Spark using Docker and a Linux VM, explore how PySpark operates within the Spark framework, and compare Spark MLlib with scikit-learn through hands-on code walkthroughs. By the end of the module, you'll apply what you've learned in graded activities and an assignment focused on building a predictive model with PySpark and MLlib.
What's included
5 videos, 3 readings, 2 assignments
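Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Build toward a degree
This course is part of a degree program offered by the University of Pittsburgh. If you are admitted and enroll, your completed coursework may count toward your degree, and your progress can transfer with you.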