This course will cover various topics in data engineering in support of decision support systems, data analytics, data mining, machine learning, and artificial intelligence. You will study on-premises data warehouse architecture, dimensional modeling of data warehouses, Extract-Transform-Load (ETL) integration from source systems to data warehouse, On-line Analytical Processing (OLAP) systems, and the evolving world of data quality and data governance. It offers you an opportunity to design, develop and maintain cloud-based data pipelines. Both on-premises and cloud-based platforms will be used to illustrate and implement data engineering techniques using operational and analytical data warehouses.

Data Warehousing and Integration Part 1
13 assignments
There are 7 modules in this course
This module introduces data warehousing and business intelligence, emphasizing their role in enhancing organizational decision-making. Data warehouses transform raw data into actionable insights using processes like ETL (Extract, Transform, and Load), supported by tools such as OLAP for querying and data mining. While operational databases (OLTP) are suited for daily transactions, OLAP databases are optimized for complex analytics.
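To make the ETL idea above concrete, here is a minimal sketch in Python using only the standard library. The sample data, the `sales` table, and the function names `extract`, `transform`, and `load` are all hypothetical and chosen for illustration; a real pipeline would read from operational source systems and write to an actual warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational (OLTP) system.
RAW = "order_id,amount,country\n1, 19.99 ,us\n2,5.00,US\n3,,ca\n"

def extract(text):
    """Extract: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, normalize country codes, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete record, not loaded
        cleaned.append((int(row["order_id"]), float(amount), row["country"].strip().upper()))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into an analytical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)

# An OLAP-style question: total sales per country.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM sales GROUP BY country"
).fetchall()
print(totals)
```

The aggregate query at the end is the kind of analytical question a warehouse is built to answer, as opposed to the single-row reads and writes typical of an OLTP system.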
What's included
3 videos, 6 readings, 1 assignment
3 videos • Total 7 minutes
- Course Overview • 2 minutes
- Meet Your Instructor: Venkat Krishnamurthy • 2 minutes
- Introduction to Data Warehouses • 4 minutes
6 readings • Total 178 minutes
- Welcome to Data Warehousing & Integration Part 1 • 2 minutes
- Syllabus - Data Warehousing & Integration Part 1 • 10 minutes
- Academic Integrity • 1 minute
- Module 1 Overview • 5 minutes
- Introduction to Data Warehouses • 5 minutes
- Conceptual Database Design • 155 minutes
1 assignment • Total 15 minutes
- Assess Your Learning: Conceptual Database Modeling • 15 minutes
This module builds on the foundations of database design from the previous module, focussing on relational database modeling, normalization, and SQL. The readings will guide you in translating a conceptual EER diagram into a relational model, ensuring adherence to normalization principles and aiming for Third Normal Form (3NF). We’ll also emphasize understanding primary keys and foreign keys for maintaining data integrity and establishing table relationships. You will also have the opportunity to create and critique relational models. We’ll then explore SQL basics, covering syntax (SELECT, INSERT, UPDATE, DELETE), querying techniques (WHERE, ORDER BY, JOIN), and operations involving functions and aggregates (COUNT, SUM, AVG, MIN, MAX), which are fundamental in database querying and management.
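As a rough illustration of the ideas above (primary keys, foreign keys, and basic SQL querying), the following sketch uses Python's built-in `sqlite3` module. The `customer` and `purchase` tables and their contents are hypothetical; they are not taken from the course materials.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Two tables in 3NF: customer attributes live in one place,
# and each purchase refers to its customer through a foreign key.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO purchase VALUES (10, 1, 20.0), (11, 1, 30.0), (12, 2, 5.0);
""")

# SELECT with JOIN, WHERE, aggregates, and ORDER BY.
rows = conn.execute("""
SELECT c.name, COUNT(*) AS n_purchases, SUM(p.amount) AS total
FROM customer AS c
JOIN purchase AS p ON p.customer_id = c.customer_id
WHERE p.amount > 1
GROUP BY c.name
ORDER BY total DESC
""").fetchall()

# The foreign key protects integrity: a purchase for an unknown customer is rejected.
try:
    conn.execute("INSERT INTO purchase VALUES (13, 99, 1.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

Here `rows` holds one summary row per customer, and `fk_rejected` records that the database refused a purchase pointing at a customer that does not exist.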
What's included
3 readings, 2 assignments, 1 app item
3 readings • Total 339 minutes
- Module 2 Overview • 5 minutes
- Logical Database Design • 165 minutes
- SQL • 169 minutes
2 assignments • Total 40 minutes
- Assess Your Learning: Logical Database Design • 20 minutes
- Assess Your Learning: SQL • 20 minutes
1 app item • Total 10 minutes
- Normalization • 10 minutes
This module provides an introduction to data warehouse concepts. Data warehouses are based on a multidimensional model. We will look closely into the multidimensional model and its representation as data cubes (also known as hypercubes). We’ll examine how different aspects of data are categorized into facts, measures, and dimensions. Dimensions such as Product, Time, and Customer are organized hierarchically within a cube, allowing data to be analyzed at various levels of detail. Measures such as Quantity and Sales Amount are stored within these cubes, and analysts can navigate through different levels of detail using "rolling up" and "drilling down" techniques. We will also explore key concepts such as granularity, dimension schema, and member hierarchies, which are essential in understanding how data is structured and analyzed in multidimensional models. Finally, we will learn the conditions of disjointness, completeness, and correctness, collectively known as summarizability, which ensure that information aggregated in data cubes remains accurate.
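The roll-up and drill-down operations described above can be sketched in a few lines of Python. The fact rows, the `month_to_quarter` mapping, and the function names are hypothetical; the point is only that rolling up re-aggregates a measure at a coarser level of a dimension hierarchy, and that the totals are preserved when every month belongs to exactly one quarter.

```python
from collections import defaultdict

# Hypothetical facts at (product, month) granularity; the measure is quantity.
facts = [
    ("Laptop", "2024-01", 3),
    ("Laptop", "2024-02", 2),
    ("Laptop", "2024-04", 4),
    ("Phone", "2024-01", 5),
    ("Phone", "2024-05", 1),
]

def month_to_quarter(month):
    """Map a member of the Month level to its parent in the Quarter level."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"

def roll_up(rows, to_parent):
    """Aggregate quantity from the Month level up to the parent level."""
    totals = defaultdict(int)
    for product, month, quantity in rows:
        totals[(product, to_parent(month))] += quantity
    return dict(totals)

def drill_down(rows, to_parent, product, parent_member):
    """Return the finer-grained rows behind one rolled-up cell."""
    return [r for r in rows if r[0] == product and to_parent(r[1]) == parent_member]

by_quarter = roll_up(facts, month_to_quarter)
laptop_q1 = drill_down(facts, month_to_quarter, "Laptop", "2024-Q1")

# Each month maps to exactly one quarter (disjointness) and no month is
# left unmapped (completeness), so the grand total survives the roll-up.
grand_total_preserved = sum(by_quarter.values()) == sum(q for _, _, q in facts)
```

If a month could map to two quarters, or to none, the final check would fail, which is the kind of aggregation error the summarizability conditions are meant to rule out.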
What's included
2 videos, 5 readings, 2 assignments, 1 app item
2 videos • Total 6 minutes
- Mental Image of Multidimensional Cube • 3 minutes
- Summarizability • 3 minutes
5 readings • Total 93 minutes
- Module 3 Overview • 5 minutes
- Multidimensional Model • 12 minutes
- Measures and Summarizability • 46 minutes
- OLAP Operations on a Multidimensional Model • 10 minutes
- Data Warehouse and Architecture • 20 minutes
2 assignments • Total 50 minutes
- Assess Your Learning: Measures & Summarizability • 25 minutes
- Assess Your Learning: OLAP Operations • 25 minutes
1 app item • Total 15 minutes
- The Multidimensional Model • 15 minutes
In this module, we’ll explore conceptual modeling with multidimensional models, visualized using MultiDim. This approach helps us organize data into facts and dimensions and understand the relationships between them, which is essential for designing data warehouses. We’ll explore topics such as dimensions (e.g., date, customer) and measures (e.g., quantity, total sales) in more detail. We’ll also explore the difference between primary events and secondary events and learn how they are used. Finally, we will look at another categorization of measures: flow, level, and unit measures.
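The distinction between flow, level, and unit measures matters most when you aggregate. The sketch below assumes the commonly used definitions: a flow measure (units sold) can be summed along every dimension, a level measure (stock on hand) can be summed across stores but not across time, and a unit measure (unit price) should not be summed at all. The rows and column layout are hypothetical.

```python
from statistics import mean

# Hypothetical rows: (store, day, units_sold, stock_on_hand, unit_price)
rows = [
    ("S1", 1, 4, 100, 2.5),
    ("S1", 2, 6, 94, 2.5),
    ("S2", 1, 1, 50, 3.0),
    ("S2", 2, 3, 47, 3.0),
]

# Flow measure: additive along every dimension, so a plain sum is meaningful.
total_units_sold = sum(r[2] for r in rows)

# Level measure: summing across stores for one day is meaningful,
# but summing the same stock across days would double count it.
stock_on_day_2 = sum(r[3] for r in rows if r[1] == 2)
# Across time, use an average (or the closing value) instead of a sum.
avg_stock_s1 = mean(r[3] for r in rows if r[0] == "S1")

# Unit measure: not additive along any dimension; aggregate with avg, min, or max.
avg_unit_price = mean(r[4] for r in rows)
```

Choosing the aggregation function per measure type in this way is one practical use of the additivity classification.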
What's included
2 videos, 4 readings, 3 assignments
2 videos • Total 9 minutes
- Primary and Secondary Events • 4 minutes
- Additivity of Measures • 5 minutes
4 readings • Total 56 minutes
- Module 4 Overview • 5 minutes
- Design Conceptual Multidimensional Models • 36 minutes
- Primary and Secondary Events • 5 minutes
- Additivity of Measures • 10 minutes
3 assignments • Total 31 minutes
- Assess Your Learning: Conceptual Modeling 1 • 15 minutes
- Assess Your Learning: Primary and Secondary Events • 8 minutes
- Assess Your Learning: Additivity of Measures • 8 minutes
In this module, we’ll dive into conceptual modeling of hierarchies within data warehouses, exploring their definitions, characteristics, and significance. Balanced hierarchies have a uniform structure where each child has one parent and all branches are of the same length, making data analysis consistent and efficient. In contrast, unbalanced hierarchies have varying branch lengths and missing aggregation levels, offering flexibility to model real-world scenarios like product categories and geographical hierarchies. You’ll also be introduced to generalized hierarchies, which involve "is-a" relationships between supertypes and subtypes, allowing for detailed data representation but requiring careful management of aggregation and specialization. We’ll also explore alternative hierarchies, showcasing different ways to organize the same dimension, such as calendar vs. fiscal views of time. Finally, we’ll look at parallel hierarchies, both independent and dependent, as tools for analyzing data from multiple perspectives, representing complex organizational structures. Understanding these hierarchy types is crucial for effective data management and analysis in data warehousing.
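One way to see the difference between balanced and unbalanced hierarchies is to store each hierarchy as a child-to-parent mapping and compare the path length from every leaf to the top. This is a minimal sketch; the member names and the helper functions `leaf_depths` and `is_balanced` are hypothetical.

```python
def leaf_depths(parent_of):
    """Return the number of steps from each leaf member up to the root."""
    parents = set(parent_of.values())
    members = set(parent_of) | parents
    leaves = members - parents  # members that are nobody's parent
    depths = {}
    for leaf in leaves:
        depth, node = 0, leaf
        while node in parent_of:  # climb until a member has no parent
            node = parent_of[node]
            depth += 1
        depths[leaf] = depth
    return depths

def is_balanced(parent_of):
    """A hierarchy is balanced here if every leaf sits at the same depth."""
    return len(set(leaf_depths(parent_of).values())) == 1

# Product -> Category -> Department: every branch has the same length.
product_hierarchy = {
    "Laptop": "Computers",
    "Phone": "Mobile",
    "Computers": "Electronics",
    "Mobile": "Electronics",
}

# A bank where only some branches have agencies: branch lengths differ.
bank_hierarchy = {
    "Agency A": "Branch 1",
    "Branch 1": "Bank",
    "Branch 2": "Bank",
}

print(is_balanced(product_hierarchy))
print(is_balanced(bank_hierarchy))
```

Because the mapping is a dictionary keyed by child, each child has exactly one parent, which matches the single-parent property described above.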
What's included
4 videos, 3 readings, 2 assignments
4 videos • Total 14 minutes
- Balanced and Unbalanced Hierarchies • 5 minutes
- Generalized Hierarchies • 4 minutes
- Alternative Hierarchies • 3 minutes
- Parallel Hierarchies • 2 minutes
3 readings • Total 140 minutes
- Module 5 Overview • 5 minutes
- Balanced and Unbalanced Hierarchies • 60 minutes
- Advanced Modeling Concepts • 75 minutes
2 assignments • Total 23 minutes
- Assess Your Learning: Conceptual Modeling of Hierarchies • 15 minutes
- Assess Your Learning: Advanced Modeling Concepts • 8 minutes
In this module, you’ll explore logical modeling in data warehousing, which is the process of designing a structured, abstract representation of data to be stored, focusing on how data is organized, related, and optimized for efficient querying and analysis. Building on what you learned in the previous modules, you'll take the next step in data warehouse design: translating a conceptual model into a logical model for implementation. The module will focus on the relational representation of data warehouses, including the study of various schema implementations: star, snowflake, starflake, and constellation. You'll also examine the rules for mapping a multidimensional conceptual model to a relational model, highlighting the role and importance of different types of keys in this process. We'll also discuss strategies for maintaining consistency in a data warehouse. Finally, you'll explore how to pre-populate certain dimensions, like time, to streamline operations and improve query performance.
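Prepopulating a time dimension can be as simple as generating one row per calendar day ahead of any fact loads. The sketch below shows one possible approach in Python; the column names, the sequential `date_key` surrogate key, and the function `build_date_dimension` are assumptions made for illustration, not the course's prescribed design.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Generate one dimension row per day from start to end, inclusive."""
    rows = []
    day = start
    key = 1
    while day <= end:
        rows.append({
            "date_key": key,                      # surrogate key, no business meaning
            "full_date": day.isoformat(),         # natural key from the calendar
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,  # level used for roll-ups
            "month": day.month,
            "iso_weekday": day.isoweekday(),      # 1 = Monday ... 7 = Sunday
        })
        day += timedelta(days=1)
        key += 1
    return rows

dim_date = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
print(len(dim_date), dim_date[0], dim_date[-1])
```

Fact rows can then reference `date_key` instead of a raw date, and attributes such as quarter are available without being recomputed in every query.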
What's included
6 videos, 11 readings, 2 assignments, 1 app item
6 videos • Total 9 minutes
- Introduction to Logical Modeling in Data Warehousing • 2 minutes
- Different ROLAP Schemas Conclusion • 2 minutes
- Surrogate Keys • 1 minute
- Importance of Data Consistency • 1 minute
- Consistency in a Data Warehouse Example • 2 minutes
- Prepopulating Dimensional Data Example • 1 minute
11 readings • Total 122 minutes
- Module 6 Overview • 5 minutes
- Logical Modeling of Data Warehouse • 32 minutes
- Introduction to Surrogate Keys • 10 minutes
- Benefits of Surrogate Keys • 10 minutes
- Implementation of Surrogate Keys in a Data Warehouse • 10 minutes
- Importance of Data Consistency • 5 minutes
- Challenges & Best Practices for Maintaining and Ensuring Data Consistency • 10 minutes
- Understanding Prepopulating Dimensions • 5 minutes
- The Process of Prepopulating Time and Geography Dimensions • 5 minutes
- Benefits of Prepopulating Time and Geography Dimensions • 5 minutes
- Prepopulating Dimensions • 25 minutes
2 assignments • Total 35 minutes
- Assess Your Learning: Logical Modeling • 20 minutes
- Assess Your Learning: Keys, Consistency and Prepopulating Dimensions • 15 minutes
1 app item • Total 20 minutes
- Types of ROLAP Schemas • 20 minutes
Designing a data warehouse is a complex process that requires transitioning from high-level conceptual models to detailed logical models. This transition is critical because it bridges the gap between understanding business needs and translating them into a technical framework that effectively supports those needs. In this module, you’ll expand on the logical modeling process covered in the previous module, with a particular focus on dimensional model design and the intricacies of hierarchy modeling. As you delve deeper, you’ll encounter logical modeling for advanced concepts such as many-to-many dimensions, links between facts, and facts with multiple granularities. We’ll also explore the concept of Slowly Changing Dimensions (SCDs), which are essential for managing historical data in your warehouse. You’ll learn how to implement different SCD types to accurately track and manage changes in dimension data over time. Finally, we’ll touch on SQL for OLAP, focusing on advanced concepts like aggregation and window functions, and you’ll learn how to use SQL to query and analyze data warehouses.
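To illustrate how a Slowly Changing Dimension can preserve history, here is a minimal sketch of the approach usually called SCD Type 2: when an attribute changes, the current row is closed and a new versioned row is added. The `customer` dimension, its column names, and the function `apply_scd2` are hypothetical and simplified for illustration.

```python
from datetime import date

def apply_scd2(dimension, customer_id, new_attrs, change_date):
    """Record a change by closing the current row and appending a new version."""
    current = next(
        (r for r in dimension if r["customer_id"] == customer_id and r["is_current"]),
        None,
    )
    if current is not None:
        if all(current[k] == v for k, v in new_attrs.items()):
            return dimension  # nothing changed, keep the current version
        current["valid_to"] = change_date
        current["is_current"] = False
    # Each version gets its own surrogate key; the natural key repeats.
    next_key = max((r["customer_key"] for r in dimension), default=0) + 1
    dimension.append({
        "customer_key": next_key,
        "customer_id": customer_id,
        **new_attrs,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })
    return dimension

dim_customer = []
apply_scd2(dim_customer, "C1", {"city": "Boston"}, date(2023, 1, 1))
apply_scd2(dim_customer, "C1", {"city": "Seattle"}, date(2024, 6, 1))
apply_scd2(dim_customer, "C1", {"city": "Seattle"}, date(2024, 7, 1))  # no-op

for row in dim_customer:
    print(row)
```

Facts loaded before the move keep pointing at the Boston version through its surrogate key, so historical reports are not rewritten when the customer relocates.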
What's included
5 videos, 11 readings, 1 assignment
5 videos • Total 13 minutes
- Modeling Various Types of Hierarchies • 5 minutes
- SCD Best Practices • 2 minutes
- Translating between SCDs • 3 minutes
- Examples of Translating Between SCD Types • 2 minutes
- Conclusion • 1 minute
11 readings • Total 137 minutes
- Module 7 Overview • 5 minutes
- Introduction to Conceptual & Logical Models • 15 minutes
- Mapping Process • 10 minutes
- Conclusion • 1 minute
- Advanced Modeling Concepts • 36 minutes
- Understanding Slowly Changing Dimensions • 5 minutes
- Types of Slowly Changing Dimensions • 10 minutes
- Benefits of Managing Slowly Changing Dimensions • 5 minutes
- Steps for Translating Between SCD Types • 10 minutes
- Performing OLAP queries with SQL • 38 minutes
- Congratulations! • 2 minutes
1 assignment • Total 25 minutes
- Assess Your Learning: Logical Representation of Hierarchies and Advanced concepts • 25 minutes
Offered by

Founded in 1898, Northeastern is a global research university with a distinctive, experience-driven approach to education and discovery. The university is a leader in experiential learning, powered by the world’s most far-reaching cooperative education program. The spirit of collaboration guides a use-inspired research enterprise focused on solving global challenges in health, security, and sustainability.