Big Data Processing with Hadoop and Spark

This course is part of the Cloud Computing for Data Science Specialization

Instructor: Dmitriy Babichenko

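3 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

9 hours to complete

Flexible schedule

Learn at your own pace

Build toward a degree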

What you'll learn

Explain how Hadoop and Spark enable large-scale data processing.
Build and manage distributed data pipelines using Hadoop frameworks.
Implement in-memory analytics and real-time processing with Spark.
Apply big data tools to design scalable, data-driven applications.

Details to know

Shareable certificate

Add to your LinkedIn profile

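Build your subject-matter expertise

This course is part of the Cloud Computing for Data Science Specialization

When you enroll in this course, you will also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate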

There are 3 modules in this course

Master the tools and techniques that power large-scale data processing and analytics. This course introduces the principles and frameworks of Big Data Processing with Hadoop and Spark, enabling learners to manage, process, and analyze massive datasets efficiently.

You’ll start by understanding the Hadoop ecosystem, including HDFS and MapReduce, and how distributed storage and computation work together to handle data at scale. Then, you’ll explore Apache Spark, a powerful framework for fast, in-memory data processing and real-time analytics. Through guided exercises and case studies, you’ll learn how to build scalable data pipelines, optimize performance, and apply transformations for business insights. By the end of this course, you’ll be equipped to handle complex data workloads using industry-standard big data tools. Ideal for aspiring data engineers, analysts, and developers, this course bridges data management and cloud computing—preparing you to design, implement, and manage big data solutions that drive intelligent decision-making in modern organizations.

This module guides you through the core components of the Hadoop ecosystem, starting with its architecture and distributed file system. You’ll explore how Hadoop processes data, gain insight into its broader ecosystem, and apply your knowledge in hands-on activities using both Docker and a Linux virtual machine.
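To make the idea of a distributed file system a little more concrete, here is a minimal sketch of how a file might be divided into fixed-size blocks, with each block copied to several nodes. It is not course material and not HDFS code. The 128 MB block size and replication factor of 3 are the commonly cited Hadoop defaults and are treated as assumptions here. The function name `plan_blocks`, the node names, and the round-robin placement are hypothetical simplifications; real HDFS placement is rack-aware.

```python
import math

# Assumed values: commonly cited HDFS defaults, configurable in a real cluster.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def plan_blocks(file_size_mb, datanodes,
                block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return a list of (block_index, block_size_mb, replica_nodes) tuples.

    Placement is simple round-robin for illustration only; it does not
    model HDFS's rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of datanodes")

    n_blocks = math.ceil(file_size_mb / block_size_mb)
    plan = []
    for i in range(n_blocks):
        # The last block holds only the remainder of the file.
        size = min(block_size_mb, file_size_mb - i * block_size_mb)
        replicas = [datanodes[(i + r) % len(datanodes)]
                    for r in range(replication)]
        plan.append((i, size, replicas))
    return plan


if __name__ == "__main__":
    # Hypothetical 300 MB file on a hypothetical four-node cluster.
    for block in plan_blocks(300, ["dn1", "dn2", "dn3", "dn4"]):
        print(block)
```

Under these assumptions, a 300 MB file becomes two full blocks and one partial block, and losing any single node still leaves two copies of every block.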

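What's included

6 videos, 1 reading, 3 assignments

6 videos, 41 minutes total

Overview: Hadoop (2 minutes)
Lecture 1: Introduction to Hadoop (7 minutes)
Lecture 2: HDFS Architecture (7 minutes)
Lecture 3: Yarn Architecture (7 minutes)
Lecture 4: Hadoop Ecosystem (9 minutes)
Lecture 5: Hadoop Data Processing (9 minutes)

1 reading, 10 minutes total

Course Overview (10 minutes)

3 assignments, 90 minutes total

HDFS Architecture (30 minutes)
Test Yourself: Hadoop (30 minutes)
Let's Practice: Hadoop (30 minutes)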

This module introduces you to key programming models for distributed data processing, with a focus on MapReduce and its practical applications. You'll explore core concepts and terminology and work through guided Python code walkthroughs that implement word count and server log analysis tasks. You'll also gain hands-on experience writing data transformation scripts in Apache Pig, culminating in an assignment that applies these skills to web log analysis.
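As a rough illustration of the word count task mentioned above, the sketch below imitates the three MapReduce stages (map, shuffle, reduce) in plain Python on a single machine. It is not the course's code and does not use Hadoop. The function names and the sample input are hypothetical, chosen only to show how key-value pairs flow between the stages.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum all counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input.
    sample = ["big data big ideas", "data at scale"]
    print(word_count(sample))
```

In a real cluster the mapper and reducer would run in parallel on different nodes, and the framework would handle the shuffle. Here all three stages run in one process so the data flow is easy to follow.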

What's included

6 videos, 6 readings, 3 assignments

6 videos, 34 minutes total

Overview: Parallel Programming Models (2 minutes)
Lecture 1: Programming Models (4 minutes)
Lecture 2: Programming Models Concepts and Terminology (11 minutes)
Lecture 3: MapReduce (8 minutes)
Lecture 4: MapReduce Deeper Dive (6 minutes)
Lecture 5: Apache Pig (4 minutes)

6 readings, 60 minutes total

Code Review: Introduction to MapReduce With Python (10 minutes)
Code Review: Word Count Example with MapReduce + Python (10 minutes)
Code Review: Server Log Analysis with MapReduce + Python (10 minutes)
Code Review: Server Log Analysis (Reading from File) with MapReduce + Python (10 minutes)
Activity & Code Review: Word Count with Apache Pig (10 minutes)
Activity: Working with Apache Pig (10 minutes)

3 assignments, 90 minutes total

MapReduce (30 minutes)
Test Yourself: Programming Models (30 minutes)
Let's Practice: Programming Models (30 minutes)

This module introduces you to Apache Spark, covering its core concepts, architecture, and machine learning capabilities through MLlib. You’ll learn how to set up Spark using Docker and a Linux VM, explore how PySpark operates within the Spark framework, and compare Spark MLlib with scikit-learn through hands-on code walkthroughs. By the end of the module, you'll apply what you've learned in graded activities and an assignment focused on building a predictive model with PySpark and MLlib.

What's included

5 videos, 3 readings, 2 assignments

5 videos, 22 minutes total

Lecture 1: Introduction to Apache Spark (3 minutes)
Lecture 2: Apache Spark Core Concepts (5 minutes)
Lecture 3: Apache Spark Architecture (3 minutes)
Lecture 4: PySpark and Its Execution in Apache Spark Architecture (6 minutes)
Lecture 5: Introduction to Apache Spark MLlib (6 minutes)

3 readings, 30 minutes total

Case Study & Code Review: scikit-learn vs. Spark MLlib (10 minutes)
Activity & Code Review: PySpark and MLlib Pipeline (10 minutes)
Course Summary (10 minutes)

2 assignments, 60 minutes total

Test Yourself: Apache Spark (30 minutes)
Let's Practice: Apache Spark (30 minutes)

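Earn a career certificate

Add this certificate to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Build toward a degree

This course is part of a degree program offered by the University of Pittsburgh. If you are admitted and enroll, your completed coursework may count toward your degree, and your progress can transfer with you.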

Instructor

Dmitriy Babichenko

University of Pittsburgh

4 courses, 2,671 learners

Offered by

University of Pittsburgh

Frequently asked questions

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. It also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.