When you enroll in this course, you must also select a specific program.
There are 9 modules in this course
Big data is the area of informatics focusing on datasets whose size is beyond the ability of typical database and other software tools to capture, store, analyze and manage. This course provides a rapid immersion into the area of big data and the technologies which have recently emerged to manage it.
We start with an introduction to the characteristics of big data and an overview of the associated technology landscape, and continue with an in-depth exploration of Hadoop, the leading open-source framework for big data processing. Here the focus is on the most important Hadoop components such as Hive, Pig, stream processing and Spark, as well as architectural patterns for applying these components. We continue with an exploration of the range of specialized (NoSQL) database systems architected to address the challenges of managing large volumes of data.
Overall, the objective is to develop a sense of how to make sound decisions in adopting and using these technologies, and how to deploy them economically on modern cloud computing infrastructure.
Welcome to Big Data Technologies! In Module 1, students will develop a foundational understanding of analytic data, its inherent value, and the methods to transform raw data into valuable insights. This module covers the challenges of handling large datasets, including their collection, processing, and analysis, while providing a comprehensive overview of Big Data's origins, properties, and real-world applications. Additionally, students will explore the economic, logistical, and ethical concerns associated with Big Data, alongside the professional advantages for data scientists proficient in Big Data analysis.
Included
16 videos, 10 readings, 8 assignments, 1 discussion prompt
16 videos•Total 104 minutes
Course Overview•4 minutes
Instructor Introduction•2 minutes
Module 1 Introduction•2 minutes
From Data to Value - Part 1•10 minutes
From Data to Value - Part 2•7 minutes
Big Data Overview - Part 1•8 minutes
Big Data Overview - Part 2•6 minutes
Confounding Factors - Part 1•8 minutes
Confounding Factors - Part 2•7 minutes
Confounding Factors - Part 3•6 minutes
Big Data Challenges•6 minutes
Big Data Benefits - Part 1•6 minutes
Big Data Benefits - Part 2•5 minutes
Big Data Technology - Part 1•10 minutes
Big Data Technology - Part 2•8 minutes
Generic Distributed Storage Systems and Execution Engines•11 minutes
10 readings•Total 500 minutes
Syllabus•10 minutes
Module 1 Introduction Reading•60 minutes
From Data to Value•60 minutes
Big Data Overview•60 minutes
Confounding Factors•60 minutes
Big Data Challenges•60 minutes
Big Data Benefits•60 minutes
Big Data Technology•60 minutes
Generic Distributed Storage Systems and Execution Engines•60 minutes
Module 1 Summary•10 minutes
8 assignments•Total 330 minutes
Module 1 Summative Assessment•120 minutes
From Data to Value Quiz•15 minutes
Big Data Overview Quiz•15 minutes
Confounding Factors Quiz•15 minutes
Big Data Challenges Quiz•15 minutes
Big Data Benefits Quiz•15 minutes
Big Data Technology Quiz•15 minutes
Creating an AWS Account Assignment•120 minutes
1 discussion prompt•Total 10 minutes
Meet and Greet Discussion•10 minutes
Module 2: Apache Hadoop Overview
Module 2•11 hours to complete
Module details
Module 2 introduces students to the challenges of building and managing distributed systems for big data storage and processing. It covers Hadoop’s origins, concepts, core components, and key characteristics, while exploring the Hadoop ecosystem's tools and services. Students will gain an understanding of distributed file systems, specifically HDFS, YARN's resource management, and various technologies for effective big data storage and organization.
Included
13 videos, 7 readings, 6 assignments
13 videos•Total 91 minutes
Module 2 Introduction•2 minutes
Hadoop - Part 1•9 minutes
Hadoop - Part 2•6 minutes
Hadoop - Part 3•7 minutes
Hadoop Distributed File System Overview - Part 1•7 minutes
Hadoop Distributed File System Overview - Part 2•8 minutes
Hadoop Distributed File System Overview - Part 3•6 minutes
Using the Hadoop Distributed File System - Part 1•9 minutes
Using the Hadoop Distributed File System - Part 2•5 minutes
Cloud Object Storage for Big Data - Part 1•9 minutes
Cloud Object Storage for Big Data - Part 2•8 minutes
Yet Another Resource Negotiator - Part 1•9 minutes
Yet Another Resource Negotiator - Part 2•6 minutes
7 readings•Total 370 minutes
Module 2 Introduction Reading•60 minutes
Hadoop•60 minutes
Hadoop Distributed File System Overview•60 minutes
Using the Hadoop Distributed File System•60 minutes
Cloud Object Storage for Big Data•60 minutes
Yet Another Resource Negotiator•60 minutes
Module 2 Summary•10 minutes
6 assignments•Total 195 minutes
Module 2 Summative Assessment•120 minutes
Hadoop Quiz•15 minutes
Hadoop Distributed File System (HDFS) Overview Quiz•15 minutes
Using HDFS Quiz•15 minutes
Cloud Object Storage Quiz•15 minutes
Yet Another Resource Negotiator (YARN) Quiz•15 minutes
Module 3: Apache Hadoop MapReduce
Module 3•13 hours to complete
Module details
In Module 3, students will explore the differences between processing small to moderate versus massive data volumes through distributed computing. This module covers the key concepts of the MapReduce framework, including how it breaks down large data processing tasks into smaller, parallel tasks for efficient execution. Students will also learn about the phases of MapReduce, the role of map and reduce functions, optimization patterns, and the benefits and limitations of various development approaches, including Java-based MapReduce and Hadoop Streaming.
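As a rough illustration of the map, shuffle, and reduce phases mentioned above, here is a minimal single-process sketch in plain Python. It is not taken from the course materials and does not use Hadoop; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and a real MapReduce job would run the map and reduce calls in parallel across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different worker; here they run in sequence.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    # Each key could likewise be reduced independently of the others.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each `map_phase` call sees only one line and each `reduce_phase` call sees only one key, both steps can be distributed without coordination; only the shuffle needs to move data between workers.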
Included
18 videos, 8 readings, 7 assignments
18 videos•Total 120 minutes
Module 3 Introduction•2 minutes
The Path to MapReduce - Part 1•8 minutes
The Path to MapReduce - Part 2•7 minutes
MapReduce Overview - Part 1•6 minutes
MapReduce Overview - Part 2•5 minutes
MapReduce Overview - Part 3•7 minutes
MapReduce Concepts - Part 1•6 minutes
MapReduce Concepts - Part 2•5 minutes
MapReduce Concepts - Part 3•6 minutes
MapReduce Concepts - Part 4•10 minutes
MapReduce Examples - Part 1•9 minutes
MapReduce Examples - Part 2•5 minutes
MapReduce Programming - Part 1•8 minutes
MapReduce Programming - Part 2•10 minutes
MapReduce Programming - Part 3•6 minutes
MapReduce Optimization - Part 1•8 minutes
MapReduce Optimization - Part 2•4 minutes
MapReduce Optimization - Part 3•8 minutes
8 readings•Total 430 minutes
Module 3 Introduction Reading•60 minutes
The Path to MapReduce•60 minutes
MapReduce Overview•60 minutes
MapReduce Concepts•60 minutes
MapReduce Examples•60 minutes
MapReduce Programming•60 minutes
MapReduce Optimization•60 minutes
Module 3 Summary•10 minutes
7 assignments•Total 210 minutes
Module 3 Summative Assessment•120 minutes
The Path to MapReduce Quiz•15 minutes
MapReduce Overview Quiz•15 minutes
MapReduce Concepts Quiz•15 minutes
MapReduce Examples Quiz•15 minutes
MapReduce Programming Quiz•15 minutes
MapReduce Optimization Quiz•15 minutes
Module 4: Apache Spark (Part 1)
Module 4•12 hours to complete
Module details
In Module 4, students will explore Apache Spark as a powerful distributed processing framework for interactive, batch, and streaming tasks. This module covers Spark's core functionalities, including machine learning, graph processing, and handling structured and unstructured data, while highlighting its in-memory processing potential and unified nature. Students will compare Spark with MapReduce, learn about Spark's primary components, execution architecture, Resilient Distributed Datasets (RDDs), DataFrames, Datasets, and the various methods for creating and optimizing DataFrames for efficient data processing.
Included
25 videos, 7 readings, 6 assignments
25 videos•Total 143 minutes
Module 4 Introduction•2 minutes
Spark Overview - Part 1•9 minutes
Spark Overview - Part 2•9 minutes
Spark Components - Part 1•7 minutes
Spark Components - Part 2•6 minutes
Spark Components - Part 3•6 minutes
Spark Components - Part 4•7 minutes
Spark Components - Part 5•3 minutes
Spark Concepts - Part 1•7 minutes
Spark Concepts - Part 2•6 minutes
Spark Concepts - Part 3•5 minutes
Spark Concepts - Part 4•7 minutes
Spark Concepts - Part 5•6 minutes
Spark Concepts - Part 6•4 minutes
Spark Concepts - Part 7•7 minutes
Spark Concepts - Part 8•3 minutes
Spark Concepts - Part 9•5 minutes
Spark Concepts - Part 10•5 minutes
Creating Spark DataFrames - Part 1•6 minutes
Creating Spark DataFrames - Part 2•9 minutes
Creating Spark DataFrames - Part 3•6 minutes
Creating Spark DataFrames - Part 4•4 minutes
Defining Spark Schemas - Part 1•6 minutes
Defining Spark Schemas - Part 2•5 minutes
Defining Spark Schemas - Part 3•2 minutes
7 readings•Total 370 minutes
Module 4 Introduction Reading•60 minutes
Spark Overview•60 minutes
Spark Components•60 minutes
Spark Concepts•60 minutes
Creating Spark DataFrames•60 minutes
Defining Spark Schemas•60 minutes
Module 4 Summary•10 minutes
6 assignments•Total 195 minutes
Module 4 Summative Assessment•120 minutes
Spark Overview Quiz•15 minutes
Spark Components Quiz•15 minutes
Spark Concepts Quiz•15 minutes
Creating Spark DataFrames Quiz•15 minutes
Defining Spark Schemas Quiz•15 minutes
Module 5: Apache Spark (Part 2)
Module 5•16 hours to complete
Module details
In Module 5, students will delve deeper into Spark's capabilities for data manipulation and transformation. The module covers essential operations such as selecting, filtering, and sorting data, as well as joining DataFrames and performing aggregations. Students will also learn about handling null values, using Spark SQL for data queries, and optimizing performance with caching. Practical applications include creating and manipulating DataFrames, executing transformations and actions, and efficiently writing data to various formats.
Included
19 videos, 11 readings, 10 assignments
19 videos•Total 103 minutes
Module 5 Introduction•2 minutes
Transformation - Rows - Part 1•10 minutes
Transformation - Rows - Part 2•5 minutes
Transformation - Rows - Part 3•4 minutes
Transformations - Columns - Part 1•9 minutes
Transformations - Columns - Part 2•4 minutes
Transformations - Join - Part 1•4 minutes
Transformations - Join - Part 2•4 minutes
Transformations - Aggregations - Part 1•7 minutes
Transformations - Aggregations - Part 2•5 minutes
Transformations - Working with Null Values - Part 1•5 minutes
Transformations - Working with Null Values - Part 2•5 minutes
Transformations - Spark SQL - Part 1•6 minutes
Transformations - Spark SQL - Part 2•4 minutes
Transformations - Caching - Part 1•4 minutes
Transformations - Caching - Part 2•5 minutes
Actions•10 minutes
Actions - Writing Data - Part 1•5 minutes
Actions - Writing Data - Part 2•5 minutes
11 readings•Total 610 minutes
Module 5 Introduction Reading•60 minutes
Transformation - Rows•60 minutes
Transformations - Columns•60 minutes
Transformations - Join•60 minutes
Transformations - Aggregations•60 minutes
Transformations - Working with Null Values•60 minutes
Transformations - Working with Null Values Quiz•15 minutes
Transformations - Spark SQL Quiz•15 minutes
Transformations - Caching Quiz•15 minutes
Transformations - Actions Quiz•15 minutes
Actions - Writing Data Quiz•15 minutes
Module 6: Big Data Streaming and Design Patterns
Module 6•10 hours to complete
Module details
Module 6 introduces students to the limitations of batch processing and the significance of real-time data processing. It covers essential aspects of stream processing, including data ingestion and analysis, with a focus on tools like Apache Kafka for stream ingestion and Spark Structured Streaming for scalable and fault-tolerant data processing. Students will also explore various design patterns for organizing big data clusters, the concept of data lakes, and the Lambda Architecture for unifying real-time and batch data processing in modern data environments.
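To make the idea of unifying batch and real-time results more concrete, the sketch below shows one simplified way a Lambda Architecture serving layer might answer a query: it combines a precomputed batch view with the increments held in a speed (real-time) view. This is an assumption-laden toy in plain Python, not the course's implementation; `merge_views` and the sample page-view counts are hypothetical.

```python
def merge_views(batch_view, speed_view):
    """Combine batch totals with real-time increments, key by key."""
    merged = dict(batch_view)  # copy so the batch view is left untouched
    for key, delta in speed_view.items():
        merged[key] = merged.get(key, 0) + delta
    return merged


# Hypothetical page-view counts: the batch view covers data up to the last
# batch run, the speed view covers events that arrived since then.
batch_view = {"/home": 1000, "/pricing": 250}
speed_view = {"/home": 12, "/signup": 3}

current = merge_views(batch_view, speed_view)
print(current)
```

When the next batch run completes, the batch view absorbs the recent events and the speed view can be reset, so the real-time path only ever has to hold a small window of data.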
Included
16 videos, 6 readings, 6 assignments
16 videos•Total 106 minutes
Module 6 Introduction•3 minutes
Stream Ingestion and Processing I - Part 1•9 minutes
Stream Ingestion and Processing I - Part 2•8 minutes
Stream Ingestion and Processing I - Part 3•8 minutes
Stream Ingestion and Processing II - Part 1•6 minutes
Stream Ingestion and Processing II - Part 2•3 minutes
Stream Ingestion and Processing II - Part 3•5 minutes
Stream Ingestion and Processing II - Part 4•7 minutes
Analytic Cluster Pattern - Part 1•7 minutes
Analytic Cluster Pattern - Part 2•7 minutes
Data Lake Pattern - Part 1•6 minutes
Data Lake Pattern - Part 2•6 minutes
Data Lake Pattern - Part 3•6 minutes
Lambda Architecture - Part 1•10 minutes
Lambda Architecture - Part 2•8 minutes
Lambda Architecture - Part 3•8 minutes
6 readings•Total 310 minutes
Stream Ingestion and Processing (Part 1)•60 minutes
Stream Ingestion and Processing (Part 2)•60 minutes
Analytic Cluster Pattern•60 minutes
Data Lake Pattern•60 minutes
Lambda Architecture•60 minutes
Module 6 Summary•10 minutes
6 assignments•Total 195 minutes
Module 6 Summative Assessment•120 minutes
Stream Ingestion and Processing (Part 1) Quiz•15 minutes
Stream Ingestion and Processing (Part 2) Quiz•15 minutes
Analytic Cluster Pattern Quiz•15 minutes
Data Lake Pattern Quiz•15 minutes
Lambda Architecture Quiz•15 minutes
Module 7: NoSQL Database
Module 7•10 hours to complete
Module details
In Module 7, students will explore the benefits and limitations of relational databases in big data contexts and the concept of distributed database systems. This module covers NoSQL databases, their diverse data models, and their scalability and flexibility advantages. Students will also learn about real-world use cases, data partitioning, consistency models, and the CAP Theorem, gaining a comprehensive understanding of how NoSQL databases manage large datasets across clusters while ensuring scalability and availability.
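Data partitioning, mentioned above, can be illustrated with a small hash-partitioning sketch in plain Python. This is a generic illustration rather than the scheme of any particular NoSQL system; the helper names (`partition_for`, `distribute`), the choice of MD5, and the sample keys are all assumptions made for the example.

```python
import hashlib


def partition_for(key, num_partitions):
    """Return a stable partition index for a key using an MD5-based hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def distribute(keys, num_partitions):
    """Assign every key to exactly one partition."""
    partitions = {index: [] for index in range(num_partitions)}
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions


shards = distribute(["user:1", "user:2", "user:3", "user:4"], num_partitions=3)
for index, keys in shards.items():
    print(index, keys)
```

A plain modulo scheme like this reassigns most keys whenever `num_partitions` changes, which is one reason distributed databases often use consistent hashing or range-based partitioning instead.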
Included
18 videos, 6 readings, 6 assignments
18 videos•Total 121 minutes
Module 7 Introduction•3 minutes
Using Databases for Big Data Storage - Part 1•10 minutes
Using Databases for Big Data Storage - Part 2•6 minutes
Using Databases for Big Data Storage - Part 3•3 minutes
Using Databases for Big Data Storage - Part 4•9 minutes
Using Databases for Big Data Storage - Part 5•6 minutes
NoSQL Database Concepts I - Part 1•7 minutes
NoSQL Database Concepts I - Part 2•7 minutes
NoSQL Database Concepts I - Part 3•4 minutes
NoSQL Database Concepts II - Part 1•5 minutes
NoSQL Database Concepts II - Part 2•3 minutes
NoSQL Database Concepts II - Part 3•7 minutes
NoSQL Database Classifications I - Part 1•11 minutes
NoSQL Database Classifications I - Part 2•7 minutes
NoSQL Database Classifications I - Part 3•9 minutes
NoSQL Database Classifications II - Part 1•12 minutes
NoSQL Database Classifications II - Part 2•5 minutes
NoSQL Database Classifications II - Part 3•6 minutes
In Module 8, students will explore specific NoSQL database types – namely Key-Value, Wide-Column, and Document databases. Two similar systems, HBase and Cassandra, will be studied and contrasted in the context of the CAP theorem and associated CP/AP trade-offs. Topics such as consistency and availability will be discussed in the context of specific usage scenarios for both HBase and Cassandra – and general application domains of both systems will be highlighted. Finally, the document database MongoDB will be reviewed in the context of natural language/text processing use cases – and MongoDB usage and architecture will be analyzed with respect to traditional RDBMS.
Included
9 videos, 4 readings, 4 assignments
9 videos•Total 71 minutes
Module 8 Introduction•0 minutes
HBase Pt. 1•10 minutes
HBase Pt. 2•7 minutes
HBase Pt. 3•6 minutes
Cassandra Pt. 1•11 minutes
Cassandra Pt. 2•11 minutes
MongoDB Pt. 1•9 minutes
MongoDB Pt. 2•7 minutes
MongoDB Pt. 3•10 minutes
4 readings•Total 190 minutes
HBase•60 minutes
Dynamo and Cassandra•60 minutes
MongoDB•60 minutes
Module 8 Summary•10 minutes
4 assignments•Total 165 minutes
Module 8 Summative Assessment•120 minutes
HBase Quiz•15 minutes
Cassandra Quiz•15 minutes
MongoDB Quiz•15 minutes
Summative Course Assessment
Module 9•3 hours to complete
Module details
This module contains the summative course assessment that has been designed to evaluate your understanding of the course material and assess your ability to apply the knowledge you have acquired throughout the course.
Included
1 assignment
1 assignment•Total 180 minutes
Summative Course Assessment•180 minutes
Build toward a degree
This course is part of the following degree program(s) offered by Illinois Tech. If you are admitted and enroll, your completed coursework may count toward your degree learning and your progress can transfer with you.¹
¹Successful application and enrollment are required. Eligibility requirements apply. Each institution determines the number of credits recognized by completing this content that may count toward degree requirements, considering any existing credits you may have. Click on a specific course for more information.
Illinois Tech is a top-tier, nationally ranked, private research university with programs in engineering, computer science, architecture, design, science, business, human sciences, and law. The university offers bachelor of science, master of science, professional master’s, and Ph.D. degrees—as well as certificates for in-demand STEM fields and other areas of innovation. Talented students from around the world choose to study at Illinois Tech because of the access to real-world opportunities, renowned academic programs, high value, and career prospects of graduates.
When will I have access to the lectures and assignments?
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Specialization?
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Is financial aid available?
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.