Big Data Analytics

Instructor: Dr. Mohit Bhatnagar

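11 modules

Gain insight into a topic and learn the fundamentals.

Beginner level

3 weeks to complete, at 10 hours a week

Flexible schedule: learn at your own pace

Build toward a degree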

What you'll learn

Gain a deep understanding of Hadoop and Spark ecosystems for managing big data. Become familiar with tools like Hive and Pig to query large datasets.

Details to know

Shareable certificate: add to your LinkedIn profile

Assessments: 16 assignments

Taught in English

There are 11 modules in this course

The Big Data Analytics course offers a deep dive into the technologies, tools, and techniques used to process and analyze large-scale data. Learners will explore the Hadoop and Spark ecosystems, gaining hands-on experience with essential components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, and Hive. The course also covers both relational (SQL) and nonrelational (NoSQL) databases, helping learners understand the appropriate contexts for each type of data storage.

A significant focus is placed on Apache Spark, known for its high-speed, in-memory data processing capabilities, which is vital for handling big data applications. Learners will also work through real-world exercises, including implementing and deploying a machine learning application that processes streaming data on the cloud. Designed for professionals with a background in predictive analytics, basic SQL, and Python programming, this course equips learners with the practical skills to manage data characterized by high volume, velocity, and variety. By the end of the course, participants will be able to derive actionable insights from big data and apply them in business contexts, contributing to improved decision-making and competitive advantage in data-driven environments.

Welcome to the Big Data Analytics course! By the end of this course, you will understand the technologies that make up the Hadoop and Spark ecosystems. You will get hands-on experience with core Hadoop components such as MapReduce and the Hadoop Distributed File System (HDFS). You will learn to write Pig scripts and Hive queries to extract data stored across Hadoop clusters. You will also learn about relational (SQL) and nonrelational (NoSQL) databases and discuss the scenarios in which one is preferred over the other for data storage. You will gain insight into the Spark ecosystem, which runs jobs across clusters very quickly and therefore has several emerging applications. You will also work through a hands-on example of implementing and deploying a machine learning application that handles streaming data on the cloud.

This is an advanced-level course, intended for learners with a background in predictive tools and techniques, experience writing basic Structured Query Language (SQL) queries, and an understanding of Python programming. The knowledge you gain from this course will help you build a career as a business analyst. You will gain the skills to draw insights from data characterized by high velocity, volume, and variety. Data with these characteristics is called big data, and organizations increasingly use it for competitive advantage and decision-making.

In this module, you will learn about big data applications and the components of the Hadoop ecosystem. The module also discusses the MapReduce paradigm, which facilitates distributed processing of data. You will also gain insight into HDFS and use it to store files. Hands-on examples use the Hortonworks Data Platform (HDP) Sandbox, which can be installed on a Windows or Mac computer with at least 8 GB of available RAM.
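As a rough illustration of the MapReduce paradigm this module introduces, the sketch below runs a word count in plain Python on a single machine. It is not the course's own exercise and does not use Hadoop or the mrjob library; the function names (`mapper`, `shuffle`, `reducer`, `word_count`) are hypothetical and chosen only to mirror the three phases of the paradigm.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data needs big tools", "data tools"]))
```

On a real cluster, the map and reduce steps run in parallel on different nodes and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.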

What's included

13 videos, 4 readings, 2 assignments, 1 discussion prompt

13 videos (96 min total)

Course Introduction (2 min)
Introduction to Big Data (7 min)
Data Types and Applications (4 min)
The Need and Evolution of Hadoop (5 min)
The Hadoop Ecosystem (7 min)
Hortonworks Data Platform Sandbox Installation (Desktop/Laptop) (9 min)
Hortonworks Data Platform Sandbox Installation (Google Cloud) (15 min)
The HDFS File System (6 min)
Hands-On with HDFS on HDP Sandbox (Desktop/Laptop) (10 min)
Hands-On with HDFS on HDP Sandbox (Google Cloud) (14 min)
Distributed Computing Using YARN (5 min)
Introduction to MapReduce (6 min)
Hands-On with MapReduce Using Python (7 min)

4 readings (180 min total)

Essential Reading: Introduction to Big Data (60 min)
Recommended Reading: Introduction to Hadoop Ecosystem (30 min)
Essential Reading: Hands-On with Hadoop (60 min)
Recommended Reading: mrjob Python Library (30 min)

2 assignments (39 min total)

Introduction to Big Data and Hadoop Ecosystem (24 min)
Hands-On with Hadoop (15 min)

1 discussion prompt (20 min total)

Applications of Big Data Analytics (20 min)

This assessment is a graded quiz based on the module covered this week.

What's included

1 assignment

In this module, you will learn about the Hive scripting language and how to use it to mine data from Hadoop clusters. Hive provides an SQL dialect called Hive Query Language (abbreviated HiveQL or just HQL) for querying data stored in a Hadoop cluster. Hive is best suited to data warehouse applications, where the data is relatively static and fast response times are not required. Compared with other Hadoop languages and tools, Hive makes it easier for developers to port SQL-based applications to Hadoop. Like all SQL dialects in widespread use, it does not fully conform to any particular revision of the ANSI SQL standard. It is perhaps closest to MySQL's dialect, but with significant differences. Hive supports several sizes of integer and floating-point types, a boolean type, and character strings of arbitrary length. Lastly, you will take a real-world data set and load it into the Ambari environment for analysis using HDFS and HQL. You will go through the process of creating tables, loading data, and analyzing it with HiveQL.
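The create, load, and analyze workflow described above can be sketched without a Hadoop cluster. The example below uses Python's built-in `sqlite3` module purely as a stand-in: it is not Hive, and the `sales` table, its columns, and the `totals_by_category` function are hypothetical. In Hive, data is typically loaded from files (for example with `LOAD DATA`) rather than inserted row by row, and supported syntax and types differ from SQLite's.

```python
import sqlite3


def totals_by_category(rows):
    """Create a table, load rows into it, and aggregate with a GROUP BY query."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
    # Step 1: define the table (hypothetical schema for illustration).
    conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")
    # Step 2: load the data.
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # Step 3: analyze it with a declarative, SQL-style query.
    cursor = conn.execute(
        "SELECT category, SUM(amount) AS total "
        "FROM sales GROUP BY category ORDER BY total DESC"
    )
    result = cursor.fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("books", 10.0), ("toys", 5.0), ("books", 2.5)]
    print(totals_by_category(sample))
```

The point of the sketch is the shape of the workflow: the analyst writes a declarative query, and the engine (Hive on a cluster, SQLite here) decides how to execute it.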

What's included

9 videos, 2 readings, 2 assignments, 1 discussion prompt

9 videos (67 min total)

Recap of Basic Concepts (6 min)
Introduction to Hive (6 min)
Hive Data Types (6 min)
HQL Commands and Uses (7 min)
HiveQL Data Definition and Manipulation (6 min)
Getting Started with Hive (11 min)
Using the Hive View on Ambari (8 min)
Practice Example on Hive (8 min)
Challenge: Hands-On (9 min)

2 readings (105 min total)

Essential Reading: Introduction to Hive (15 min)
Essential Reading: Hands-On with Hive (90 min)

2 assignments (30 min total)

Introduction to Hive (18 min)
Hands-On with Hive (12 min)

1 discussion prompt (15 min total)

Introduction to Hive (15 min)

This assessment is a graded quiz based on the modules covered this week.

What's included

1 assignment

In this module, you will learn about the Pig Latin scripting language and how you can use it to query big data on Hadoop clusters. You will also learn about the data types and commands available in Pig Latin and how they are used to define and manipulate data in the Hadoop ecosystem. Furthermore, you will work through a practical example, running Pig Latin scripts to analyze a publicly available data set.

What's included

7 videos, 2 readings, 2 assignments

7 videos (57 min total)

Introduction to Pig Latin (8 min)
Pig Data Types (7 min)
Pig Latin Commands and Uses (7 min)
Pig Data Definition and Manipulation (9 min)
Running Pig View on Ambari (6 min)
Example on Pig View (10 min)
Practice Problem as a Challenge (11 min)

2 readings (105 min total)

Essential Reading: Introduction to Pig Language (15 min)
Recommended Reading: Hands-On with Pig (90 min)

2 assignments (30 min total)

Introduction to Pig Language (24 min)
Hands-On with Pig (6 min)

In this module, you will be introduced to the need for NoSQL databases. You will also be introduced to HBase, a NoSQL database, and its role in the Hadoop ecosystem. You will learn about the CAP theorem and how it shapes the trade-offs among the NoSQL database options available on Hadoop. You will examine the three CAP properties (consistency, availability, and partition tolerance) in detail and see how they affect the choice of technology for accessing and manipulating data on Hadoop. Lastly, you will get insights into other emerging cloud-based NoSQL solutions.
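To make the contrast with relational tables more concrete, the sketch below models the general shape of an HBase-style wide-column store: each row key maps to columns named `family:qualifier`, and each cell can hold several timestamped versions. This is a toy in-memory model under those assumptions, not the HBase client API; the class `TinyTable` and its methods are hypothetical.

```python
from collections import defaultdict


class TinyTable:
    """Toy in-memory model of a wide-column table (not the HBase API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (timestamp, value) versions
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        """Store a new version of one cell; older versions are kept."""
        self._rows[row_key][column].append((timestamp, value))

    def get(self, row_key, column):
        """Return the value with the newest timestamp, or None if absent."""
        versions = self._rows.get(row_key, {}).get(column, [])
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]


if __name__ == "__main__":
    table = TinyTable()
    table.put("user1", "info:email", "old@example.com", timestamp=1)
    table.put("user1", "info:email", "new@example.com", timestamp=2)
    print(table.get("user1", "info:email"))
```

Rows in such a store need not share the same columns, which is one reason this kind of model suits sparse data better than a fixed relational schema.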

What's included

8 videos, 2 readings, 2 assignments, 1 discussion prompt

8 videos (59 min total)

Introduction to Data Warehouses (8 min)
Need for NoSQL Databases (8 min)
CAP Theorem (8 min)
Making a Choice of a Database (8 min)
Introduction to HBase (7 min)
Architecture of HBase (8 min)
HBase Data Model (6 min)
Running and Setting Up HBase on Ambari and Hands-On with HBase (7 min)

2 readings (135 min total)

Essential Reading: Introduction to NoSQL Databases (45 min)
Recommended Reading: Hands-On with HBase (90 min)

2 assignments (30 min total)

Introduction to NoSQL Databases (15 min)
Hands-On with HBase (15 min)

1 discussion prompt (15 min total)

Architecture of HBase (15 min)

This assessment is a graded quiz based on the modules covered this week.

What's included

1 assignment

In this module, you will be introduced to Apache Spark, a popular platform for big data processing. You will explore the key components of Apache Spark that provide significant benefits in distributed computing. You will also be introduced to Resilient Distributed Datasets (RDDs) and Spark DataFrames. Furthermore, you will be introduced to Spark SQL and Spark Streaming.
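One idea behind RDDs is that transformations such as `map` and `filter` are lazy: they only record what should happen, and nothing is computed until an action such as `collect` is called. The sketch below imitates that behavior in plain Python on a single machine. It is an analogy under that assumption, not PySpark; the `LazyDataset` class is hypothetical and omits partitioning, distribution, and fault tolerance.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on collect()."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        """Transformation: returns a new dataset, computes nothing yet."""
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        """Transformation: returns a new dataset, computes nothing yet."""
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        """Action: apply every recorded step in order and return a list."""
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


if __name__ == "__main__":
    squares_of_evens = (
        LazyDataset(range(6))
        .map(lambda x: x * x)
        .filter(lambda x: x % 2 == 0)
    )
    print(squares_of_evens.collect())
```

Because each transformation returns a new object instead of modifying the old one, the recorded chain of steps describes how to rebuild the result from the source data, which is the intuition behind the "resilient" part of the name.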

What's included

11 videos, 4 readings, 2 assignments, 1 discussion prompt

11 videos (70 min total)

The Need for Spark (5 min)
Spark Background and Applications (6 min)
The Resilient Distributed Dataset (RDD) (7 min)
Hands-On with the PySpark Library in Python (8 min)
Working with Spark DataFrames and Spark SQL (5 min)
Hands-On with Structured Queries on Spark (7 min)
Need for Processing Streaming Data (5 min)
Introduction to Spark Streaming (6 min)
Hands-On with DStream API (7 min)
Structured Streaming (6 min)
Hands-On with Structured Streaming (6 min)

4 readings (360 min total)

Essential Reading: Introduction to Spark (180 min)
Recommended Reading: Quick Start on Spark (60 min)
Essential Reading: Introduction to Spark Streaming (90 min)
Recommended Reading: Spark Structured Streaming (30 min)

2 assignments (30 min total)

Introduction to the Building Blocks of Spark (15 min)
Introduction to Spark Streaming (15 min)

1 discussion prompt (20 min total)

Windowing in Structured Streaming (20 min)

This assessment is a graded quiz based on the module covered this week.

What's included

1 assignment

In this module, you will learn about MLlib, Spark's machine learning library, which is used to make predictions on datasets large enough to need distributed processing. You will work on regression and classification tasks for large datasets. You will then complete a hands-on exercise with streaming data from the Twitter API. This predictive streaming application shows participants an end-to-end big data scenario.

What's included

8 videos, 3 readings, 2 assignments

8 videos (52 min total)

Introduction to MLlib (5 min)
Regression Algorithms in MLlib (6 min)
Solving Classification Problems with MLlib (6 min)
Hands-On with Sentiment Analysis (8 min)
Introduction to Google Cloud Dataproc (5 min)
Hands-On Setting Up a Cluster on Google Dataproc (8 min)
Streaming Data from Twitter API (7 min)
Hands-On with a Streaming Analytics Application (7 min)

3 readings (150 min total)

Essential Reading: Introduction to ML on Spark (90 min)
Recommended Reading: Dataproc Best Practices Guide (30 min)
Recommended Reading: Twitter API v2 (30 min)

2 assignments (27 min total)

Machine Learning on Spark (15 min)
Running Hadoop and Spark on Cloud (12 min)

Course Wrap-Up Video

What's included

1 video

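Build toward a degree

This course is part of a degree program offered by O.P. Jindal Global University. If you are admitted and enroll, your completed coursework may count toward your degree, and your progress can transfer with you.

Instructor

Dr. Mohit Bhatnagar

O.P. Jindal Global University

5 courses, 4,494 learners

Offered by

O.P. Jindal Global University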

Frequently asked questions

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you'll find a link to apply on the description page.