Big data is the area of informatics focusing on datasets whose size is beyond the ability of typical database and other software tools to capture, store, analyze and manage. This course provides a rapid immersion into the area of big data and the technologies which have recently emerged to manage it.

Recommended experience
Intermediate level
Familiarity with the Linux shell (Bash) and operating systems; familiarity with relational databases (SQL) and database management systems
What you'll learn
Understand and identify use cases and domains of Big Data problems
Select and implement technical solutions involving Big Data systems
Develop and use various open source software systems (Apache) in the Big Data tech stack
Operate and run various cloud computing software services (AWS) in the Big Data infrastructure space
Details to know

54 assignments

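Build subject-matter expertise
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate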

There are 9 modules in this course
Welcome to Big Data Technologies! In Module 1, students will develop a foundational understanding of analytic data, its inherent value, and the methods to transform raw data into valuable insights. This module covers the challenges of handling large datasets, including their collection, processing, and analysis, while providing a comprehensive overview of Big Data's origins, properties, and real-world applications. Additionally, students will explore the economic, logistical, and ethical concerns associated with Big Data, alongside the professional advantages for data scientists proficient in Big Data analysis.
What's included
16 videos, 10 readings, 8 assignments, 1 discussion prompt
16 videos • Total 104 minutes
- Course Overview • 4 minutes
- Instructor Introduction • 2 minutes
- Module 1 Introduction • 2 minutes
- From Data to Value - Part 1 • 10 minutes
- From Data to Value - Part 2 • 7 minutes
- Big Data Overview - Part 1 • 8 minutes
- Big Data Overview - Part 2 • 6 minutes
- Confounding Factors - Part 1 • 8 minutes
- Confounding Factors - Part 2 • 7 minutes
- Confounding Factors - Part 3 • 6 minutes
- Big Data Challenges • 6 minutes
- Big Data Benefits - Part 1 • 6 minutes
- Big Data Benefits - Part 2 • 5 minutes
- Big Data Technology - Part 1 • 10 minutes
- Big Data Technology - Part 2 • 8 minutes
- Generic Distributed Storage Systems and Execution Engines • 11 minutes
10 readings • Total 500 minutes
- Syllabus • 10 minutes
- Module 1 Introduction Reading • 60 minutes
- From Data to Value • 60 minutes
- Big Data Overview • 60 minutes
- Confounding Factors • 60 minutes
- Big Data Challenges • 60 minutes
- Big Data Benefits • 60 minutes
- Big Data Technology • 60 minutes
- Generic Distributed Storage Systems and Execution Engines • 60 minutes
- Module 1 Summary • 10 minutes
8 assignments • Total 330 minutes
- From Data to Value Quiz • 15 minutes
- Big Data Overview Quiz • 15 minutes
- Confounding Factors Quiz • 15 minutes
- Big Data Challenges Quiz • 15 minutes
- Big Data Benefits Quiz • 15 minutes
- Big Data Technology Quiz • 15 minutes
- Creating an AWS Account Assignment • 120 minutes
- Module 1 Summative Assessment • 120 minutes
1 discussion prompt • Total 10 minutes
- Meet and Greet Discussion • 10 minutes
Module 2 introduces students to the challenges of building and managing distributed systems for big data storage and processing. It covers Hadoop's origins, concepts, core components, and key characteristics, and explores the tools and services of the Hadoop ecosystem. Students will gain an understanding of distributed file systems (specifically HDFS), resource management with YARN, and various technologies for storing and organizing big data effectively.
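The following sketch gives a rough sense of how HDFS stores a file. It computes how many blocks a file occupies and how much raw cluster storage its replicas consume. It assumes the Hadoop 2.x/3.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). Real clusters often override both values. The helper name `hdfs_footprint` is hypothetical.

```python
# Assumed defaults (Hadoop 2.x/3.x): dfs.blocksize = 128 MiB, dfs.replication = 3.
# Clusters frequently override these values in hdfs-site.xml.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024
DEFAULT_REPLICATION = 3


def hdfs_footprint(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE,
                   replication=DEFAULT_REPLICATION):
    """Hypothetical helper: return (block count, raw bytes stored cluster-wide)."""
    # HDFS splits a file into fixed-size blocks; only the last one may be partial.
    # Integer ceiling division avoids floating-point rounding on huge files.
    num_blocks = (file_size_bytes + block_size - 1) // block_size
    # A partial last block is not padded on disk, so raw usage is simply
    # the file size multiplied by the number of replicas of every block.
    raw_bytes = file_size_bytes * replication
    return num_blocks, raw_bytes


one_gib = 1024 ** 3
print(hdfs_footprint(one_gib))      # (8, 3221225472)
print(hdfs_footprint(one_gib + 1))  # (9, 3221225475)
```

Replication lets HDFS keep serving a file after a DataNode fails. Under these assumed defaults, the cost is roughly three times the raw storage.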
What's included
13 videos, 7 readings, 6 assignments
13 videos • Total 91 minutes
- Module 2 Introduction • 2 minutes
- Hadoop - Part 1 • 9 minutes
- Hadoop - Part 2 • 6 minutes
- Hadoop - Part 3 • 7 minutes
- Hadoop Distributed File System Overview - Part 1 • 7 minutes
- Hadoop Distributed File System Overview - Part 2 • 8 minutes
- Hadoop Distributed File System Overview - Part 3 • 6 minutes
- Using the Hadoop Distributed File System - Part 1 • 9 minutes
- Using the Hadoop Distributed File System - Part 2 • 5 minutes
- Cloud Object Storage for Big Data - Part 1 • 9 minutes
- Cloud Object Storage for Big Data - Part 2 • 8 minutes
- Yet Another Resource Negotiator - Part 1 • 9 minutes
- Yet Another Resource Negotiator - Part 2 • 6 minutes
7 readings • Total 370 minutes
- Module 2 Introduction Reading • 60 minutes
- Hadoop • 60 minutes
- Hadoop Distributed File System Overview • 60 minutes
- Using the Hadoop Distributed File System • 60 minutes
- Cloud Object Storage for Big Data • 60 minutes
- Yet Another Resource Negotiator • 60 minutes
- Module 2 Summary • 10 minutes
6 assignments • Total 195 minutes
- Hadoop Quiz • 15 minutes
- Hadoop Distributed File System (HDFS) Overview Quiz • 15 minutes
- Using HDFS Quiz • 15 minutes
- Cloud Object Storage Quiz • 15 minutes
- Yet Another Resource Negotiator (YARN) Quiz • 15 minutes
- Module 2 Summative Assessment • 120 minutes
In Module 3, students will explore how processing massive data volumes through distributed computing differs from processing small to moderate volumes. This module covers the key concepts of the MapReduce framework, including how it breaks large data processing tasks into smaller tasks that run in parallel. Students will also learn about the phases of MapReduce, the role of the map and reduce functions, optimization patterns, and the benefits and limitations of various development approaches, including Java-based MapReduce and Hadoop Streaming.
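Word count is the classic way to see how MapReduce decomposes a job. The sketch below simulates the map, shuffle/sort, and reduce phases in a single Python process. It is not Hadoop's API: a real job would use the Java `Mapper`/`Reducer` classes or Hadoop Streaming scripts. All function names are hypothetical. On a cluster, each map call would run as a separate task over its own input split.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group every value emitted for the same key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line acts as an independent map task; a cluster runs these in parallel.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(word_count(docs))
# {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

After the shuffle, each reducer sees every value for its key, so the reducer can be a simple sum. A combiner runs the same sum on each mapper's local output before the shuffle. It is one of the optimization patterns that reduce network traffic.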
What's included
18 videos, 8 readings, 7 assignments
18 videos • Total 120 minutes
- Module 3 Introduction • 2 minutes
- The Path to MapReduce - Part 1 • 8 minutes
- The Path to MapReduce - Part 2 • 7 minutes
- MapReduce Overview - Part 1 • 6 minutes
- MapReduce Overview - Part 2 • 5 minutes
- MapReduce Overview - Part 3 • 7 minutes
- MapReduce Concepts - Part 1 • 6 minutes
- MapReduce Concepts - Part 2 • 5 minutes
- MapReduce Concepts - Part 3 • 6 minutes
- MapReduce Concepts - Part 4 • 10 minutes
- MapReduce Examples - Part 1 • 9 minutes
- MapReduce Examples - Part 2 • 5 minutes
- MapReduce Programming - Part 1 • 8 minutes
- MapReduce Programming - Part 2 • 10 minutes
- MapReduce Programming - Part 3 • 6 minutes
- MapReduce Optimization - Part 1 • 8 minutes
- MapReduce Optimization - Part 2 • 4 minutes
- MapReduce Optimization - Part 3 • 8 minutes
8 readings • Total 430 minutes
- Module 3 Introduction Reading • 60 minutes
- The Path to MapReduce • 60 minutes
- MapReduce Overview • 60 minutes
- MapReduce Concepts • 60 minutes
- MapReduce Examples • 60 minutes
- MapReduce Programming • 60 minutes
- MapReduce Optimization • 60 minutes
- Module 3 Summary • 10 minutes
7 assignments • Total 210 minutes
- The Path to MapReduce Quiz • 15 minutes
- MapReduce Overview Quiz • 15 minutes
- MapReduce Concepts Quiz • 15 minutes
- MapReduce Examples Quiz • 15 minutes
- MapReduce Programming • 15 minutes
- MapReduce Optimization • 15 minutes
- Module 3 Summative Assessment • 120 minutes
In Module 4, students will explore Apache Spark as a powerful distributed processing framework for interactive, batch, and streaming tasks. This module covers Spark's core functionalities, including machine learning, graph processing, and handling structured and unstructured data, while highlighting its in-memory processing potential and unified nature. Students will compare Spark with MapReduce, learn about Spark's primary components, execution architecture, Resilient Distributed Datasets (RDDs), DataFrames, Datasets, and the various methods for creating and optimizing DataFrames for efficient data processing.
What's included
25 videos, 7 readings, 6 assignments
25 videos • Total 143 minutes
- Module 4 Introduction • 2 minutes
- Spark Overview - Part 1 • 9 minutes
- Spark Overview - Part 2 • 9 minutes
- Spark Components - Part 1 • 7 minutes
- Spark Components - Part 2 • 6 minutes
- Spark Components - Part 3 • 6 minutes
- Spark Components - Part 4 • 7 minutes
- Spark Components - Part 5 • 3 minutes
- Spark Concepts - Part 1 • 7 minutes
- Spark Concepts - Part 2 • 6 minutes
- Spark Concepts - Part 3 • 5 minutes
- Spark Concepts - Part 4 • 7 minutes
- Spark Concepts - Part 5 • 6 minutes
- Spark Concepts - Part 6 • 4 minutes
- Spark Concepts - Part 7 • 7 minutes
- Spark Concepts - Part 8 • 3 minutes
- Spark Concepts - Part 9 • 5 minutes
- Spark Concepts - Part 10 • 5 minutes
- Creating Spark DataFrames - Part 1 • 6 minutes
- Creating Spark DataFrames - Part 2 • 9 minutes
- Creating Spark DataFrames - Part 3 • 6 minutes
- Creating Spark DataFrames - Part 4 • 4 minutes
- Defining Spark Schemas - Part 1 • 6 minutes
- Defining Spark Schemas - Part 2 • 5 minutes
- Defining Spark Schemas - Part 3 • 2 minutes
7 readings • Total 370 minutes
- Module 4 Introduction Reading • 60 minutes
- Spark Overview • 60 minutes
- Spark Components • 60 minutes
- Spark Concepts • 60 minutes
- Creating Spark DataFrames • 60 minutes
- Defining Spark Schemas • 60 minutes
- Module 4 Summary • 10 minutes
6 assignments • Total 195 minutes
- Spark Overview Quiz • 15 minutes
- Spark Components Quiz • 15 minutes
- Concepts Quiz • 15 minutes
- Creating Spark DataFrames Quiz • 15 minutes
- Defining Spark Schemas Quiz • 15 minutes
- Module 4 Summative Assessment • 120 minutes
In Module 5, students will delve deeper into Spark's capabilities for data manipulation and transformation. The module covers essential operations such as selecting, filtering, and sorting data, as well as joining DataFrames and performing aggregations. Students will also learn about handling null values, using Spark SQL for data queries, and optimizing performance with caching. Practical applications include creating and manipulating DataFrames, executing transformations and actions, and efficiently writing data to various formats.
What's included
19 videos, 11 readings, 10 assignments
19 videos • Total 103 minutes
- Module 5 Introduction • 2 minutes
- Transformation - Rows - Part 1 • 10 minutes
- Transformation - Rows - Part 2 • 5 minutes
- Transformation - Rows - Part 3 • 4 minutes
- Transformations Columns - Part 1 • 9 minutes
- Transformations Columns - Part 2 • 4 minutes
- Transformations Join - Part 1 • 4 minutes
- Transformations Join - Part 2 • 4 minutes
- Transformations - Aggregations - Part 1 • 7 minutes
- Transformations - Aggregations - Part 2 • 5 minutes
- Transformations - Working with Null Values - Part 1 • 5 minutes
- Transformations - Working with Null Values - Part 2 • 5 minutes
- Transformations - Spark SQL - Part 1 • 6 minutes
- Transformations - Spark SQL - Part 2 • 4 minutes
- Transformations - Caching - Part 1 • 4 minutes
- Transformations - Caching - Part 2 • 5 minutes
- Actions • 10 minutes
- Actions - Writing Data - Part 1 • 5 minutes
- Actions - Writing Data - Part 2 • 5 minutes
11 readings • Total 610 minutes
- Module 5 Introduction Reading • 60 minutes
- Transformation - Rows • 60 minutes
- Transformations - Columns • 60 minutes
- Transformations - Join • 60 minutes
- Transformations - Aggregations • 60 minutes
- Transformations - Working with Null Values • 60 minutes
- Transformations - Spark SQL • 60 minutes
- Transformations - Caching • 60 minutes
- Actions • 60 minutes
- Actions - Writing Data • 60 minutes
- Module 5 Summary • 10 minutes
10 assignments • Total 255 minutes
- Transformations - Rows Quiz • 15 minutes
- Transformations - Columns Quiz • 15 minutes
- Transformations - Join Quiz • 15 minutes
- Transformations/Actions - Aggregations Quiz • 15 minutes
- Transformations - Working with Null Values Quiz • 15 minutes
- Transformations - Spark SQL Quiz • 15 minutes
- Transformations - Caching Quiz • 15 minutes
- Transformations - Actions Quiz • 15 minutes
- Actions - Writing Data Quiz • 15 minutes
- Module 5 Summative Assessment • 120 minutes
Module 6 introduces students to the limitations of batch processing and the significance of real-time data processing. It covers essential aspects of stream processing, including data ingestion and analysis, with a focus on tools like Apache Kafka for stream ingestion and Spark Structured Streaming for scalable and fault-tolerant data processing. Students will also explore various design patterns for organizing big data clusters, the concept of data lakes, and the Lambda Architecture for unifying real-time and batch data processing in modern data environments.
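The central idea of the Lambda Architecture is a batch layer and a speed layer whose views are merged when a query arrives. The toy model below sketches this with in-memory counters. It rests on several assumptions: events are plain dicts with a hypothetical `page` field, and the batch and speed layers stand in for real systems, such as a Spark batch job and a Spark Structured Streaming job reading from Kafka. The class and function names are illustrative only.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute page-view counts from the full, immutable dataset."""
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events the last batch run has not covered."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["page"]] += 1

    def reset(self):
        # Once a new batch view includes these events, the real-time view is dropped.
        self.counts.clear()


def serve(page, batch, speed):
    """Serving layer: merge the batch and real-time views at query time."""
    # Counter returns 0 for unseen keys, so missing pages are handled naturally.
    return batch[page] + speed.counts[page]


master = [{"page": "/home"}, {"page": "/home"}, {"page": "/about"}]
batch = batch_view(master)
speed = SpeedLayer()
speed.on_event({"page": "/home"})  # arrived after the last batch run
print(serve("/home", batch, speed))  # 3
```

The batch view stays simple and fully recomputable from the master dataset. The speed layer only has to cover the gap since the last batch run, which keeps its state small.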
What's included
16 videos, 6 readings, 6 assignments
16 videos • Total 106 minutes
- Module 6 Introduction • 3 minutes
- Stream Ingestion and Processing I - Part 1 • 9 minutes
- Stream Ingestion and Processing I - Part 2 • 8 minutes
- Stream Ingestion and Processing I - Part 3 • 8 minutes
- Stream Ingestion and Processing II - Part 1 • 6 minutes
- Stream Ingestion and Processing II - Part 2 • 3 minutes
- Stream Ingestion and Processing II - Part 3 • 5 minutes
- Stream Ingestion and Processing II - Part 4 • 7 minutes
- Analytic Cluster Pattern - Part 1 • 7 minutes
- Analytic Cluster Pattern - Part 2 • 7 minutes
- Data Lake Pattern - Part 1 • 6 minutes
- Data Lake Pattern - Part 2 • 6 minutes
- Data Lake Pattern - Part 3 • 6 minutes
- Lambda Architecture - Part 1 • 10 minutes
- Lambda Architecture - Part 2 • 8 minutes
- Lambda Architecture - Part 3 • 8 minutes
6 readings • Total 310 minutes
- Stream Ingestion and Processing (Part 1) • 60 minutes
- Stream Ingestion and Processing (Part 2) • 60 minutes
- Analytic Cluster Pattern • 60 minutes
- Data Lake Pattern • 60 minutes
- Lambda Architecture • 60 minutes
- Module 6 Summary • 10 minutes
6 assignments • Total 195 minutes
- Stream Ingestion and Processing (Part 1) Quiz • 15 minutes
- Stream Ingestion and Processing (Part 2) Quiz • 15 minutes
- What is a characteristic of a transient Hadoop cluster? Quiz • 15 minutes
- Data Lake Pattern Quiz • 15 minutes
- Lambda Architecture Quiz • 15 minutes
- Module 6 Summative Assessment • 120 minutes
In Module 7, students will explore the benefits and limitations of relational databases in big data contexts and the concept of distributed database systems. This module covers NoSQL databases, their diverse data models, and their scalability and flexibility advantages. Students will also learn about real-world use cases, data partitioning, consistency models, and the CAP Theorem, gaining a comprehensive understanding of how NoSQL databases manage large datasets across clusters while ensuring scalability and availability.
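Distributed NoSQL stores often partition data with consistent hashing. Amazon's Dynamo uses this scheme, and Cassandra uses it with refinements. The minimal sketch below places nodes and keys on a hash ring. The node names and the `HashRing` class are hypothetical, and MD5 is used only as a convenient stable hash, not as what any particular database uses.

```python
import bisect
import hashlib


def stable_hash(value):
    # Python's built-in hash() is salted per process for strings, so use MD5
    # to get the same ring position on every node and every run.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing: each node owns the ring arc ending at its token."""

    def __init__(self, nodes):
        self._ring = sorted((stable_hash(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def node_for(self, key):
        # Walk clockwise to the first token >= the key's hash, wrapping at the end.
        idx = bisect.bisect_left(self._tokens, stable_hash(key)) % len(self._tokens)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", ring.node_for(key))
```

Removing a node remaps only the keys that node owned. With naive `hash(key) % num_nodes`, most keys would move. Production systems add virtual nodes and replicate each key to the next nodes on the ring. Cassandra, for example, uses the Murmur3 partitioner by default.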
What's included
18 videos, 6 readings, 6 assignments
18 videos • Total 121 minutes
- Module 7 Introduction • 3 minutes
- Using Databases for Big Data Storage - Part 1 • 10 minutes
- Using Databases for Big Data Storage - Part 2 • 6 minutes
- Using Databases for Big Data Storage - Part 3 • 3 minutes
- Using Databases for Big Data Storage - Part 4 • 9 minutes
- Using Databases for Big Data Storage - Part 5 • 6 minutes
- NoSQL Database Concepts I - Part 1 • 7 minutes
- NoSQL Database Concepts I - Part 2 • 7 minutes
- NoSQL Database Concepts I - Part 3 • 4 minutes
- NoSQL Database Concepts II - Part 1 • 5 minutes
- NoSQL Database Concepts II - Part 2 • 3 minutes
- NoSQL Database Concepts II - Part 3 • 7 minutes
- NoSQL Database Classifications I - Part 1 • 11 minutes
- NoSQL Database Classifications I - Part 2 • 7 minutes
- NoSQL Database Classifications I - Part 3 • 9 minutes
- NoSQL Database Classifications II - Part 1 • 12 minutes
- NoSQL Database Classifications II - Part 2 • 5 minutes
- NoSQL Database Classifications II - Part 3 • 6 minutes
6 readings • Total 310 minutes
- Using Databases for Big Data Storage • 60 minutes
- NoSQL Database Concepts (Part 1) • 60 minutes
- NoSQL Database Concepts (Part 2) • 60 minutes
- NoSQL Database Classifications (Part 1) • 60 minutes
- NoSQL Database Classifications (Part 2) • 60 minutes
- Module 7 Summary • 10 minutes
6 assignments • Total 195 minutes
- Using Databases for Big Data Storage Quiz • 15 minutes
- NoSQL Database Concepts (Part 1) Quiz • 15 minutes
- NoSQL Database Concepts (Part 2) Quiz • 15 minutes
- NoSQL Database Classifications (Part 1) Quiz • 15 minutes
- NoSQL Database Classifications (Part 2) Quiz • 15 minutes
- Module 7 Summative Assessment • 120 minutes
In Module 8, students will explore specific NoSQL database types, namely key-value, wide-column, and document databases. Two similar systems, HBase and Cassandra, will be studied and contrasted in the context of the CAP theorem and the associated CP/AP trade-offs. Consistency and availability will be discussed in the context of specific usage scenarios for HBase and Cassandra, and the general application domains of both systems will be highlighted. Finally, the document database MongoDB will be reviewed in the context of natural language/text processing use cases, and its usage and architecture will be compared with those of a traditional RDBMS.
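Cassandra's tunable consistency is one concrete form of the CP/AP trade-off discussed in this module. A common rule of thumb applies when data has N replicas: a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below checks that condition for a few read/write consistency-level pairs. It is a simplified model that ignores node failures, hinted handoff, and repair. The replication factor of 3 is an assumption, and the function names are hypothetical.

```python
def quorum(replication_factor):
    """Majority of replicas, as used by Cassandra's QUORUM consistency level."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set: R + W > N."""
    return read_replicas + write_replicas > replication_factor


N = 3  # assumed replication factor
levels = {  # read level / write level
    "ONE/ONE": (1, 1),
    "QUORUM/QUORUM": (quorum(N), quorum(N)),
    "ONE/ALL": (1, N),
}
for name, (r, w) in levels.items():
    print(name, reads_see_latest_write(r, w, N))
# ONE/ONE False
# QUORUM/QUORUM True
# ONE/ALL True
```

Low R and W values keep requests succeeding when some replicas are unreachable, which leans toward availability (AP). Higher values trade some availability for consistency.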
What's included
9 videos, 4 readings, 4 assignments
9 videos • Total 71 minutes
- Module 8 Introduction • 0 minutes
- HBase Pt. 1 • 10 minutes
- HBase Pt. 2 • 7 minutes
- HBase Pt. 3 • 6 minutes
- Cassandra Pt. 1 • 11 minutes
- Cassandra Pt. 2 • 11 minutes
- MongoDB Pt. 1 • 9 minutes
- MongoDB Pt. 2 • 7 minutes
- MongoDB Pt. 3 • 10 minutes
4 readings • Total 190 minutes
- HBase • 60 minutes
- Dynamo and Cassandra • 60 minutes
- MongoDB • 60 minutes
- Module 8 Summary • 10 minutes
4 assignments • Total 165 minutes
- HBase Quiz • 15 minutes
- Cassandra Quiz • 15 minutes
- MongoDB Quiz • 15 minutes
- Module 8 Summative Assessment • 120 minutes
This module contains the summative course assessment that has been designed to evaluate your understanding of the course material and assess your ability to apply the knowledge you have acquired throughout the course.
What's included
1 assignment
1 assignment • Total 180 minutes
- Summative Course Assessment • 180 minutes
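Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Build toward a degree
This course is part of the following degree program offered by Illinois Tech. If you are admitted and enroll, your completed coursework may count toward your degree learning, and your progress can transfer with you.
Illinois Tech
Master of Data Science
Degree · 12-15 months
Successful application and enrollment are required. Eligibility requirements apply. Each institution determines how many credits earned by completing this content may count toward degree requirements, considering any existing credits you may have. Click on a specific course for more information.

Offered by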

Illinois Tech is a top-tier, nationally ranked, private research university with programs in engineering, computer science, architecture, design, science, business, human sciences, and law. The university offers bachelor of science, master of science, professional master's, and Ph.D. degrees, as well as certificates in in-demand STEM fields and other areas of innovation. Talented students from around the world choose to study at Illinois Tech because of its real-world opportunities, renowned academic programs, high value, and the career prospects of its graduates.
Frequently asked questions
To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can't afford the enrollment fee. If financial aid or a scholarship is available for your learning program, you'll find a link to apply on the description page.