Big Data Processing with Hadoop and Spark

This course is part of the "Cloud Computing for Data Science" Specialization.

Instructor: Dmitriy Babichenko

3 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level, some prior experience recommended

9 hours to complete

Flexible schedule

Learn at your own pace

What you'll learn

Explain how Hadoop and Spark enable large-scale data processing.
Build and manage distributed data pipelines using Hadoop frameworks.
Implement in-memory analytics and real-time processing with Spark.
Apply big data tools to design scalable, data-driven applications.

Skills you'll gain

Information Technology
Data Management
Data Pipelines
Data Processing
File Systems
Data Storage
Data Transformation
Data Analysis
Scalability
Data Science
Predictive Modeling
Distributed Computing
Big Data

Tools you'll learn

scikit-learn (machine learning library)
PySpark
Apache Spark
Apache Hive
Apache Hadoop

Details to know

Shareable certificate: add it to your LinkedIn profile

Recently updated: February 2026

Assessments: 8 assignments

Taught in English

Build your subject-matter expertise

This course is part of the "Cloud Computing for Data Science" Specialization. When you enroll in this course, you are also enrolled in the Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate

There are 3 modules in this course

Master the tools and techniques that power large-scale data processing and analytics. This course introduces the principles and frameworks of Big Data Processing with Hadoop and Spark, enabling learners to manage, process, and analyze massive datasets efficiently.

You’ll start by understanding the Hadoop ecosystem, including HDFS and MapReduce, and how distributed storage and computation work together to handle data at scale. Then, you’ll explore Apache Spark, a powerful framework for fast, in-memory data processing and real-time analytics. Through guided exercises and case studies, you’ll learn how to build scalable data pipelines, optimize performance, and apply transformations for business insights. By the end of this course, you’ll be equipped to handle complex data workloads using industry-standard big data tools.

Ideal for aspiring data engineers, analysts, and developers, this course bridges data management and cloud computing, preparing you to design, implement, and manage big data solutions that drive intelligent decision-making in modern organizations.

This module guides you through the core components of the Hadoop ecosystem, starting with its architecture and distributed file system. You’ll explore how Hadoop processes data, gain insight into its broader ecosystem, and apply your knowledge in hands-on activities using both Docker and a Linux virtual machine.
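
To give a feel for how the distributed file system handles a large file, here is a minimal sketch of the block arithmetic. It is not taken from the course materials. It assumes the common HDFS defaults of a 128 MB block size and a replication factor of 3, both of which are configurable per cluster, and the function name `hdfs_footprint` is a hypothetical helper chosen for illustration.

```python
import math

# Assumed values: common HDFS defaults, both configurable per cluster.
BLOCK_SIZE_MB = 128      # typical dfs.blocksize default
REPLICATION_FACTOR = 3   # typical dfs.replication default


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (blocks, block replicas, raw storage in MB) for one file.

    Hypothetical helper for illustration; it only does the arithmetic
    and does not talk to a real cluster.
    """
    # A file is split into fixed-size blocks; the last one may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different nodes.
    replicas = blocks * replication
    # A partial last block only occupies the bytes it holds, so raw
    # storage is the file size multiplied by the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


blocks, replicas, raw_mb = hdfs_footprint(1000)
print(blocks, replicas, raw_mb)  # 8 24 3000
```

Under these assumptions, a 1,000 MB file is split into 8 blocks, stored as 24 block replicas, and consumes about 3,000 MB of raw disk across the cluster.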

Included

6 videos, 1 reading, 3 assignments

6 videos (total 41 minutes)

Overview: Hadoop (2 minutes)
Lecture 1: Introduction to Hadoop (7 minutes)
Lecture 2: HDFS Architecture (7 minutes)
Lecture 3: Yarn Architecture (7 minutes)
Lecture 4: Hadoop Ecosystem (9 minutes)
Lecture 5: Hadoop Data Processing (9 minutes)

1 reading (total 10 minutes)

Course Overview (10 minutes)

3 assignments (total 90 minutes)

HDFS Architecture (30 minutes)
Test Yourself: Hadoop (30 minutes)
Let's Practice: Hadoop (30 minutes)

This module introduces you to key programming models for distributed data processing, with a focus on MapReduce and its practical applications. You'll explore core concepts and terminology and work through guided code walkthroughs that use Python to implement word count and server log analysis tasks. You'll also gain hands-on experience writing data transformation scripts in Apache Pig, culminating in an assignment that applies these skills to web log analysis.
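
As a rough illustration of the word count task mentioned above, the following sketch simulates the map, shuffle, and reduce phases in a single Python process. It is not the course's own code. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and on a real cluster the framework would handle the shuffle step and distribute the work across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (done by the framework on a cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, for illustration only.
sample = ["the quick brown fox", "the lazy dog", "The fox"]
print(word_count(sample))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Keeping the mapper and reducer as separate functions with no shared state mirrors the constraint that lets a cluster run many mappers and reducers in parallel.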

Included

6 videos, 6 readings, 3 assignments

6 videos (total 34 minutes)

Overview: Parallel Programming Models (2 minutes)
Lecture 1: Programming Models (4 minutes)
Lecture 2: Programming Models Concepts and Terminology (11 minutes)
Lecture 3: MapReduce (8 minutes)
Lecture 4: MapReduce Deeper Dive (6 minutes)
Lecture 5: Apache Pig (4 minutes)

6 readings (total 60 minutes)

Code Review: Introduction to MapReduce With Python (10 minutes)
Code Review: Word Count Example with MapReduce + Python (10 minutes)
Code Review: Server Log Analysis with MapReduce + Python (10 minutes)
Code Review: Server Log Analysis (Reading from File) with MapReduce + Python (10 minutes)
Activity & Code Review: Word Count with Apache Pig (10 minutes)
Activity: Working with Apache Pig (10 minutes)

3 assignments (total 90 minutes)

MapReduce (30 minutes)
Test Yourself: Programming Models (30 minutes)
Let's Practice: Programming Models (30 minutes)

This module introduces you to Apache Spark, covering its core concepts, architecture, and machine learning capabilities through MLlib. You’ll learn how to set up Spark using Docker and a Linux VM, explore how PySpark operates within the Spark framework, and compare Spark MLlib with scikit-learn through hands-on code walkthroughs. By the end of the module, you'll apply what you've learned in graded activities and an assignment focused on building a predictive model with PySpark and MLlib.

Included

5 videos, 3 readings, 2 assignments

5 videos (total 22 minutes)

Lecture 1: Introduction to Apache Spark (3 minutes)
Lecture 2: Apache Spark Core Concepts (5 minutes)
Lecture 3: Apache Spark Architecture (3 minutes)
Lecture 4: PySpark and Its Execution in Apache Spark Architecture (6 minutes)
Lecture 5: Introduction to Apache Spark MLlib (6 minutes)

3 readings (total 30 minutes)

Case Study & Code Review: scikit-learn vs. Spark MLlib (10 minutes)
Activity & Code Review: PySpark and MLlib Pipeline (10 minutes)
Course Summary (10 minutes)

2 assignments (total 60 minutes)

Test Yourself: Apache Spark (30 minutes)
Let's Practice: Apache Spark (30 minutes)

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Earn a degree

This course is part of the following degree program(s) offered by University of Pittsburgh. If you are admitted and enroll, the courses you have completed may count toward your degree learning, and your progress can transfer with you.

Instructor

Dmitriy Babichenko

University of Pittsburgh

4 courses, 2,855 learners

Offered by

University of Pittsburgh

Frequently asked questions

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.