Data Management for Analytics Part 2

Instructor: Xuemin Jin

6 modules

Gain insight into a topic and learn the fundamentals.

1 week to complete at 10 hours a week

Flexible schedule

Learn at your own pace

Details to know

Shareable certificate

Assessments: 36 assignments

Taught in English

There are 6 modules in this course

This course offers you an opportunity to learn the fundamental concepts and emerging technologies in data storage and data governance. It balances theory and practice and covers Structured Query Language (SQL) as well as two flavors of NoSQL database: the MongoDB document store and the Neo4j graph database. It also includes a brief introduction to big data management, including Hadoop, MapReduce, and Apache Spark. By the end of this Part 2 course, you will have a foundational understanding of the theory and applications of database management in support of data analytics, data mining, machine learning, and artificial intelligence.

This module first presents an overview of the SQL Data Definition Language (DDL), which is used to define a relational data model. It examines schema creation, table creation, and the DROP and ALTER commands, illustrating the syntax of each with explicit examples. The module also introduces the SQL Data Manipulation Language (DML), which is used to retrieve, update, insert, and delete data. The focus is on INSERT statements for adding data to tables and on simple SELECT statements; more complex SELECT statements, along with the DELETE and UPDATE statements, are covered in later modules.
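
As a rough illustration of the DDL and DML statements described above, the following sketch runs them through Python's built-in sqlite3 module. The wine table, its columns, and the build_demo_db helper are hypothetical, not the course's Wine database. SQLite is only a convenient stand-in: it has no CREATE SCHEMA statement and supports a limited form of ALTER TABLE, so syntax in the course playground may differ.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical wine table."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # DDL: CREATE TABLE defines the relation, its columns, and a key constraint.
    cur.execute(
        """
        CREATE TABLE wine (
            wine_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            region  TEXT
        )
        """
    )
    # DDL: ALTER TABLE changes an existing table (here, adds a column).
    cur.execute("ALTER TABLE wine ADD COLUMN price REAL")
    # DML: INSERT adds new tuples; ? placeholders pass values separately from the SQL.
    cur.executemany(
        "INSERT INTO wine (wine_id, name, region, price) VALUES (?, ?, ?, ?)",
        [
            (1, "Riesling", "Finger Lakes", 18.0),
            (2, "Cabernet", "Napa", 45.0),
            (3, "Merlot", "Napa", 30.0),
        ],
    )
    # DDL: DROP TABLE removes a table and its data (a throwaway table here).
    cur.execute("CREATE TABLE scratch (x INTEGER)")
    cur.execute("DROP TABLE scratch")
    conn.commit()
    return conn


conn = build_demo_db()
# DML: a simple SELECT with a WHERE condition, sorted by price.
rows = conn.execute(
    "SELECT name, price FROM wine WHERE region = ? ORDER BY price", ("Napa",)
).fetchall()
print(rows)  # [('Merlot', 30.0), ('Cabernet', 45.0)]
```

Passing values through ? placeholders rather than formatting them into the SQL string avoids quoting mistakes and SQL injection.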

What's included

1 video, 10 readings, 7 assignments

1 video, 1 minute total

Meet Your Faculty • 1 minute

10 readings, 113 minutes total

Course Introduction • 2 minutes
Syllabus - Data Management for Analytics Part 2 • 10 minutes
Academic Integrity • 1 minute
What is SQL? • 15 minutes
SQL Data Definition Language (DDL) • 5 minutes
A DDL example • 20 minutes
DROP and ALTER command • 10 minutes
SQL INSERT statement • 15 minutes
SQL SELECT statement • 30 minutes
Module 1 Summary • 5 minutes

7 assignments, 13 minutes total

Check Your Prior Knowledge • 3 minutes
Assess Your Learning: What is SQL? • 1 minute
Assess Your Learning: SQL Data Definition Language (DDL) • 2 minutes
Assess Your Learning: A DDL Example • 2 minutes
Assess Your Learning: DROP and ALTER Command • 2 minutes
Assess Your Learning: SQL INSERT Statement • 1 minute
Assess Your Learning: SQL SELECT statement • 2 minutes

This module continues the discussion of the SQL DML SELECT statement. It introduces the aggregate functions COUNT, SUM, AVG, VARIANCE, MIN, and MAX, which summarize information across database tuples. This is followed by the GROUP BY/HAVING clause, which applies aggregate functions to subgroups of tuples, and by ORDER BY, which sorts query results. The module then discusses join queries, which combine data from multiple tables. An inner join matches one or more columns from two tables, for example through a condition in the WHERE clause, and returns only the tuples that have a match. Left, right, and full outer joins keep all the tuples of one or both tables in the result, regardless of whether they have matching tuples in the other table. All queries in this module use the Wine database in the online playground and can be executed there.
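
The sketch below runs the same kinds of queries (aggregate functions, GROUP BY/HAVING, an inner join written with a WHERE condition, and a left outer join) against a tiny hypothetical winery/wine schema in SQLite via Python's sqlite3 module. The tables and rows are invented for illustration and are not the playground's Wine database. SQLite has no built-in VARIANCE aggregate, so it is omitted, and RIGHT and FULL OUTER JOIN need SQLite 3.39 or later, so only LEFT JOIN is shown.

```python
import sqlite3

# Hypothetical two-table schema standing in for the course's Wine database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE winery (winery_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE wine (
        wine_id   INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        winery_id INTEGER REFERENCES winery(winery_id),
        price     REAL
    );
    INSERT INTO winery VALUES (1, 'Hillside'), (2, 'Lakeview'), (3, 'Newcomer');
    INSERT INTO wine VALUES
        (1, 'Riesling', 2, 18.0),
        (2, 'Cabernet', 1, 45.0),
        (3, 'Merlot',   1, 30.0),
        (4, 'Rose',     2, 15.0);
    """
)

# Aggregate functions summarize all tuples of the table into a single row.
count, total, avg_price, min_price, max_price = conn.execute(
    "SELECT COUNT(*), SUM(price), AVG(price), MIN(price), MAX(price) FROM wine"
).fetchone()

# GROUP BY forms one subgroup per winery; HAVING keeps only groups that pass the test.
pricey_wineries = conn.execute(
    """
    SELECT winery_id, AVG(price)
    FROM wine
    GROUP BY winery_id
    HAVING AVG(price) > 20
    """
).fetchall()

# Inner join via a WHERE condition: only matching pairs appear, so 'Newcomer' is absent.
inner = conn.execute(
    """
    SELECT w.name, y.name
    FROM wine AS w, winery AS y
    WHERE w.winery_id = y.winery_id
    ORDER BY w.wine_id
    """
).fetchall()

# Left outer join: every winery is kept, including 'Newcomer', which has no wines.
per_winery = conn.execute(
    """
    SELECT y.name, COUNT(w.wine_id)
    FROM winery AS y LEFT JOIN wine AS w ON w.winery_id = y.winery_id
    GROUP BY y.winery_id
    ORDER BY y.winery_id
    """
).fetchall()
```

The left outer join counts w.wine_id rather than using COUNT(*): a winery with no wines still yields one row padded with NULLs, which COUNT(*) would count as 1 but COUNT(w.wine_id) counts as 0.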

What's included

1 video, 6 readings, 6 assignments

1 video, 4 minutes total

Aggregate Functions • 4 minutes

6 readings, 85 minutes total

Queries with Aggregate Functions • 25 minutes
Queries with GROUP BY/HAVING • 10 minutes
Queries with ORDER BY • 10 minutes
Inner Joins • 20 minutes
Outer Joins • 15 minutes
Module 2 Summary • 5 minutes

6 assignments, 11 minutes total

Check Your Prior Knowledge • 2 minutes
Assess Your Learning: Queries with Aggregate Functions • 2 minutes
Assess Your Learning: Queries with GROUP BY/HAVING • 1 minute
Assess Your Learning: Queries with ORDER BY • 2 minutes
Assess Your Learning: Inner Joins • 2 minutes
Assess Your Learning: Outer Joins • 2 minutes

This module presents more complex SQL queries. It introduces nested queries, in which a complete SELECT-FROM-WHERE block appears in the WHERE clause of another query. The subquery (inner block) is nested in the outer block, and nesting can span multiple levels. For an uncorrelated nested query, the query optimizer can usually evaluate the blocks as a sequence of separate queries, executed from the innermost level to the outermost. The module also examines correlated nested queries, in which the inner block references one or more columns of a table declared in the outer block. Such a query cannot be evaluated innermost-first in this way: conceptually, the inner subquery must be re-evaluated for each tuple of the outer table. The module then covers the >= ALL and > ANY operators: the former can be used to find the highest or largest values, whereas the latter can be used to exclude the lowest or smallest values. All queries in this module use the Wine database in the online playground and can be executed there. Finally, the module examines the DELETE and UPDATE statements, which delete or modify data, and concludes with a brief discussion of SQL views.
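
The following sketch, again using Python's sqlite3 module with a hypothetical wine table, contrasts an uncorrelated nested query with a correlated one, then runs UPDATE, DELETE, and a view. The table, rows, view name, and names helper are illustrative assumptions. SQLite does not support the ALL and ANY comparison operators, so >= ALL and > ANY are rewritten with MAX and MIN; these rewrites are equivalent only when the subquery returns at least one row and no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE wine (wine_id INTEGER PRIMARY KEY, name TEXT, winery_id INTEGER, price REAL);
    INSERT INTO wine VALUES
        (1, 'Riesling', 2, 18.0), (2, 'Cabernet', 1, 45.0),
        (3, 'Merlot',   1, 30.0), (4, 'Rose',     2, 15.0);
    """
)


def names(sql: str) -> list[str]:
    """Run a query whose first column is a wine name; return the names in result order."""
    return [row[0] for row in conn.execute(sql)]


# Uncorrelated: the inner block can run once; its single value feeds the outer WHERE.
above_avg = names(
    "SELECT name FROM wine WHERE price > (SELECT AVG(price) FROM wine) ORDER BY wine_id"
)

# Correlated: the inner block refers to w.winery_id from the outer block,
# so it is conceptually re-evaluated for every outer tuple.
above_winery_avg = names(
    """
    SELECT w.name FROM wine AS w
    WHERE w.price > (SELECT AVG(x.price) FROM wine AS x WHERE x.winery_id = w.winery_id)
    ORDER BY w.wine_id
    """
)

# price >= ALL (SELECT price ...) rewritten with MAX: finds the highest price.
highest = names("SELECT name FROM wine WHERE price >= (SELECT MAX(price) FROM wine)")
# price > ANY (SELECT price ...) rewritten with MIN: excludes the lowest price.
not_cheapest = names(
    "SELECT name FROM wine WHERE price > (SELECT MIN(price) FROM wine) ORDER BY wine_id"
)

# A view stores a query, not data; it is re-run each time the view is read.
conn.execute("CREATE VIEW budget_wine AS SELECT name, price FROM wine WHERE price < 20")
conn.execute("UPDATE wine SET price = price + 5.0 WHERE winery_id = 1")  # modify tuples
conn.execute("DELETE FROM wine WHERE price < 16")  # removes 'Rose'
budget = names("SELECT name FROM budget_wine ORDER BY name")
```

Because budget_wine stores only its defining query, reading it after the DELETE no longer returns the deleted row.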

What's included

2 videos, 10 readings, 10 assignments

2 videos, 8 minutes total

Nested Query - Correlated Query • 4 minutes
ALL/ANY/EXISTS/NOT EXISTS • 4 minutes

10 readings, 135 minutes total

Nested Queries • 15 minutes
Nested Correlated Queries • 20 minutes
Queries with ALL/ANY • 15 minutes
EXISTS/NOT EXISTS functions • 10 minutes
Subqueries in SELECT/FROM • 10 minutes
Set Operations • 15 minutes
DELETE Statement • 15 minutes
UPDATE Statement • 15 minutes
SQL Views • 15 minutes
Module 3 Summary • 5 minutes

10 assignments, 19 minutes total

Check Your Prior Knowledge • 3 minutes
Assess Your Learning: Nested Queries • 2 minutes
Assess Your Learning: Nested Correlated Queries • 2 minutes
Assess Your Learning: Queries with ALL/ANY • 2 minutes
Assess Your Learning: EXISTS/NOT EXISTS Functions • 2 minutes
Assess Your Learning: Subqueries in SELECT/FROM • 1 minute
Assess Your Learning: Set Operations • 2 minutes
Assess Your Learning: DELETE Statement • 2 minutes
Assess Your Learning: UPDATE Statement • 2 minutes
Assess Your Learning: SQL Views • 1 minute

This module introduces several extensions to relational database management systems (RDBMSs). We will start by reviewing the core components of the relational model and its limitations. The module then explores ways of extending relational databases, beginning with a thorough review of triggers and stored procedures, the key mechanisms for making an RDBMS active, that is, able to run logic inside the database automatically or on request. The module concludes with recursive queries, a powerful extension to the SQL language.
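
A minimal sketch of a trigger and a recursive query, run in SQLite through Python's sqlite3 module. The wine, price_log, and region tables, the log_price_change trigger, and the region hierarchy are all hypothetical. SQLite supports triggers and WITH RECURSIVE common table expressions but has no stored procedures, so that part of the module is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE wine (wine_id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE price_log (wine_id INTEGER, old_price REAL, new_price REAL);

    -- Trigger: an active rule the DBMS fires automatically after each price change.
    CREATE TRIGGER log_price_change
    AFTER UPDATE OF price ON wine
    FOR EACH ROW
    WHEN OLD.price <> NEW.price
    BEGIN
        INSERT INTO price_log VALUES (OLD.wine_id, OLD.price, NEW.price);
    END;

    -- Hypothetical hierarchy for the recursive query: each region may have a parent.
    CREATE TABLE region (name TEXT PRIMARY KEY, parent TEXT);
    INSERT INTO region VALUES
        ('USA', NULL), ('California', 'USA'), ('Napa', 'California'), ('Oakville', 'Napa');

    INSERT INTO wine VALUES (1, 'Cabernet', 45.0);
    UPDATE wine SET price = 50.0 WHERE wine_id = 1;  -- fires the trigger
    """
)

log = conn.execute("SELECT * FROM price_log").fetchall()

# Recursive CTE: seed with Oakville's parent, then repeatedly join back to region
# to climb one level at a time until a region with no parent is reached.
ancestors = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE chain(name, depth) AS (
            SELECT parent, 1 FROM region WHERE name = 'Oakville'
            UNION ALL
            SELECT r.parent, c.depth + 1
            FROM region AS r JOIN chain AS c ON r.name = c.name
            WHERE r.parent IS NOT NULL
        )
        SELECT name FROM chain ORDER BY depth
        """
    )
]
```

The recursive part stops on its own: once the climb reaches USA, whose parent is NULL, the WHERE condition yields no new rows and the recursion ends.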

What's included

4 readings, 4 assignments

4 readings, 60 minutes total

Limitations of the relational model • 10 minutes
Active Relational Database Management System Extensions: Triggers and Stored Procedures • 25 minutes
Recursive SQL Queries • 20 minutes
Module 4 Summary • 5 minutes

4 assignments, 8 minutes total

Check Your Prior Knowledge • 2 minutes
Assess Your Learning: Limitations of the relational model • 3 minutes
Assess Your Learning: Active Relational Database Management System Extensions: Triggers and Stored Procedures • 2 minutes
Assess Your Learning: Recursive SQL Queries • 1 minute

This module presents an overview of the NoSQL movement and distributed systems and introduces the MongoDB NoSQL database. MongoDB is a document store, suited to semi-structured items such as resumes, legal documents, and books. It does not require a predefined schema. Instead, it groups documents into collections, where each document is a set of labeled attributes (fields) representing a semi-structured item, and documents in the same collection need not share the same fields.
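
MongoDB itself needs a running server and a driver, so the following is only a plain-Python model of the ideas, not MongoDB's API. A collection is modeled as a list of dictionaries whose fields vary from document to document. The find helper is a hypothetical equality filter, and the map_reduce helper is a hypothetical MapReduce-style aggregation of the kind covered in the Aggregation with MapReduce reading.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

# A "collection" of documents: each document is a dict of labeled fields,
# and documents need not share the same fields (no fixed schema).
books: list[dict[str, Any]] = [
    {"_id": 1, "title": "Database Systems", "tags": ["sql", "theory"], "pages": 1200},
    {"_id": 2, "title": "Graph Databases", "tags": ["nosql", "graph"]},
    {"_id": 3, "title": "MongoDB Basics", "tags": ["nosql"], "author": {"last": "Doe"}},
]


def find(collection: Iterable[dict[str, Any]], query: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: keep documents whose top-level fields equal every query pair."""
    return [doc for doc in collection if all(doc.get(k) == v for k, v in query.items())]


def map_reduce(
    collection: Iterable[dict[str, Any]],
    mapper: Callable[[dict[str, Any]], Iterable[tuple[str, int]]],
    reducer: Callable[[str, list[int]], int],
) -> dict[str, int]:
    """Hypothetical helper: map each document to (key, value) pairs,
    group the values by key, then reduce each group to one value."""
    groups: dict[str, list[int]] = defaultdict(list)
    for doc in collection:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Count how many documents carry each tag (documents without "tags" emit nothing).
tag_counts = map_reduce(
    books,
    lambda doc: [(tag, 1) for tag in doc.get("tags", [])],
    lambda _key, values: sum(values),
)
```

Unlike MongoDB's query language, this find helper only tests top-level fields for equality; for example, it does not match individual elements inside an array field.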

What's included

5 readings, 5 assignments

5 readings, 70 minutes total

The NoSQL movement • 20 minutes
Key-Value Stores and Distributed Systems • 10 minutes
Document Stores and MongoDB • 20 minutes
Aggregation with MapReduce • 15 minutes
Module 5 Summary • 5 minutes

5 assignments, 7 minutes total

Check Your Prior Knowledge • 1 minute
Assess Your Learning: The NoSQL movement • 2 minutes
Assess Your Learning: Key-Value Stores and Distributed Systems • 1 minute
Assess Your Learning: Document Stores and MongoDB • 2 minutes
Assess Your Learning: Aggregation with MapReduce • 1 minute

This module continues the discussion of NoSQL databases, introducing graph theory and the Neo4j graph database. Neo4j applies graph theory to information storage: data is stored as nodes and edges (relationships), both of which can hold information as properties. Graph databases are particularly useful for modeling social networks such as X (formerly Twitter) and Facebook. In a way, a graph database is a hyper-relational database in which join tables are replaced by more interesting, semantically meaningful relationships that can be navigated (graph traversal) and/or queried using graph pattern matching.
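
The plain-Python sketch below models a tiny property graph, with nodes and edges that both carry properties, and answers "who can be reached within two hops" by breadth-first traversal, the kind of navigation a graph database performs in place of joins. The people, the FOLLOWS relationships, and the reachable_within helper are hypothetical, and this is not Neo4j's API. In Neo4j's Cypher language, a similar question would be written roughly as a variable-length pattern such as MATCH (a {name: 'Alice'})-[:FOLLOWS*1..2]->(b) RETURN DISTINCT b.name.

```python
from collections import deque

# Nodes carry properties (a name); edges carry properties too (a 'since' year).
nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carol"}, 4: {"name": "Dave"}}
edges = [  # (source, target, properties) for hypothetical directed FOLLOWS relationships
    (1, 2, {"since": 2019}),
    (2, 3, {"since": 2021}),
    (3, 4, {"since": 2022}),
    (1, 3, {"since": 2020}),
]

# Adjacency list: each node points directly at its neighbors, so traversal
# follows stored links instead of matching keys across tables.
adjacency: dict[int, list[tuple[int, dict]]] = {n: [] for n in nodes}
for src, dst, props in edges:
    adjacency[src].append((dst, props))


def reachable_within(start: int, max_hops: int) -> dict[str, int]:
    """Breadth-first traversal; map each reachable node's name to its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:  # do not expand beyond the hop limit
            continue
        for nxt, _props in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {nodes[n]["name"]: d for n, d in dist.items() if n != start}


# Edge properties can be queried as well: whom has Alice followed since 2020 or later?
recent = [nodes[dst]["name"] for dst, props in adjacency[1] if props["since"] >= 2020]
```

Breadth-first order guarantees that the first time a node is reached, it is reached by a shortest path, so the recorded hop distances are minimal.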