This course provides a comprehensive overview of data storage and management approaches for big data. Learners will explore structured, semi-structured, and unstructured data formats, compare SQL and NoSQL database technologies, and implement data lakes and data warehouses. The course includes working with various file formats and understanding the differences between batch and real-time processing approaches.

What you'll learn
- Manage big data storage and pipelines with Azure services.
- Process and analyze large datasets using Apache Spark and Databricks.

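Build your expertise in Data Analysis
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate from Microsoft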

There are 5 modules in this course
Data Storage Technologies (SQL vs NoSQL) guides learners through the core principles of modern data storage and the trade-offs that shape today’s big data systems. The module examines how relational databases manage structured data, where they encounter limitations at scale, and how techniques such as partitioning, indexing, and lakehouse architectures mitigate performance gaps. Learners compare major NoSQL categories—including document, key-value, and column-family databases—to understand how flexible schemas and distributed designs support high-volume, high-velocity workloads. Through hands-on activities with SQL Server, Azure Synapse, and Azure Cosmos DB, learners practice essential operations, evaluate storage technologies based on workload requirements, and build the skills needed to select and implement effective database solutions for big data environments.
What's included
6 videos, 3 readings, 8 assignments
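As a rough illustration of the trade-off this module describes, the sketch below contrasts a fixed-schema relational table with an index against a document-style store where fields can vary per record. It is not course material: the in-memory `sqlite3` database stands in for SQL Server, a plain dictionary stands in for a document store such as Azure Cosmos DB, and the `orders` table, `documents` mapping, and `get_document` helper are hypothetical names chosen for the example.

```python
import json
import sqlite3

# Relational side: a fixed schema, with an index on the column used for lookups.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
conn.executemany(
    "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
    [(1, "ada", 30.0), (2, "bob", 12.5), (3, "ada", 7.5)],
)
ada_total = conn.execute(
    "SELECT SUM(total) FROM orders WHERE customer = ?", ("ada",)
).fetchone()[0]

# Document side (hypothetical stand-in): each record is self-describing JSON
# stored under a key, and records need not share the same fields.
documents = {
    "1": json.dumps({"customer": "ada", "total": 30.0, "tags": ["gift"]}),
    "2": json.dumps({"customer": "bob", "total": 12.5}),
}


def get_document(key):
    """Fetch one document by key and decode it."""
    return json.loads(documents[key])
```

The relational query can aggregate across rows because every row has the same columns, while the document lookup returns whatever shape was stored under that key.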
Working with Data Formats (Structured, Semi-structured, Unstructured) helps learners build a clear understanding of how different data formats function within big data systems and why format selection matters for performance, storage, and analytical success. The module introduces structured formats, such as CSV and TSV, and explores flexible semi-structured formats, including JSON and XML. It also examines optimized file types, including Parquet, Avro, and ORC, that support large-scale analytics. Learners practice transforming data between formats using Azure Data Factory, working with nested structures, applying schema inference, and evaluating performance trade-offs across file types. Through demonstrations, code exercises, and hands-on labs, this module equips learners to select, convert, and manage data formats effectively for diverse big data scenarios.
What's included
6 videos, 3 readings, 8 assignments
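The format conversion, schema inference, and nested-structure handling mentioned in this module can be sketched with the Python standard library alone. This is an illustrative assumption, not the Azure Data Factory workflow the course uses: `infer_value`, `csv_to_records`, and `flatten` are hypothetical helpers, and the inference rule (try integer, then float, otherwise keep text) is deliberately naive.

```python
import csv
import io
import json


def infer_value(text):
    """Naive schema inference: try int, then float, otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts with inferred value types."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{key: infer_value(value) for key, value in row.items()} for row in reader]


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names, e.g. user.geo.country."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


sample_csv = "id,name,score\n1,ada,9.5\n2,bob,7\n"
records = csv_to_records(sample_csv)   # structured CSV -> typed records
as_json = json.dumps(records)          # records -> semi-structured JSON

nested = {"id": 1, "user": {"name": "ada", "geo": {"country": "UK"}}}
flat = flatten(nested)                 # nested JSON -> flat, table-like row
```

Columnar formats such as Parquet, Avro, and ORC need third-party libraries and are left out of this sketch.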
Data Lakes and Data Warehouses Implementation guides learners through the architectural foundations and hands-on skills needed to build modern analytical environments. The module explores the purpose and structure of data lakes, highlighting the zones of raw, cleaned, enriched, and curated data, and demonstrates how thoughtful design supports flexibility, governance, and large-scale analytics. Learners also study core data warehouse concepts, including dimensional modeling, star schemas, and data marts, to understand how structured storage enables high-performance querying. Through practical work with Azure Data Lake Storage Gen2 and Azure Synapse Analytics, learners design zone architectures, implement dimensional models, configure SQL pools, and apply best practices for partitioning, distribution, and optimization. By the end, they gain the ability to organize, govern, and integrate data across both lake and warehouse environments, supporting scalable, enterprise-ready analytics.
What's included
6 videos, 3 readings, 7 assignments
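To make the star schema idea from this module concrete, here is a minimal sketch of one fact table joined to two dimension tables. It assumes an in-memory `sqlite3` database in place of an Azure Synapse SQL pool, and the table names (`fact_sales`, `dim_product`, `dim_date`) and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);

-- The fact table holds measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);

INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_date VALUES (20240101, 2024), (20250101, 2025);
INSERT INTO fact_sales VALUES
    (1, 20240101, 10.0), (1, 20250101, 5.0), (2, 20250101, 20.0);
""")

# A typical analytical query: aggregate the measure, grouped by dimension attributes.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
```

Distribution and partitioning choices, which the module covers for Synapse, have no equivalent in this single-file database and are not shown.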
Building Data Pipelines (ETL/ELT with Azure Data Factory) equips learners with the skills to design, implement, and manage scalable data integration workflows using modern, cloud-native approaches. The module examines the differences between ETL and ELT, helping learners understand when each methodology delivers the best performance, flexibility, and cost efficiency. Learners gain hands-on experience with Azure Data Factory, configuring linked services, datasets, activities, and core orchestration components, and practice building both simple and advanced pipelines. The module also introduces transformation logic, control flow patterns, parameterization, and error handling strategies that support production-ready data engineering solutions. Through walkthroughs, labs, code exercises, and scenario-based decisions, learners practice monitoring pipelines, troubleshooting failures, and designing reliable data workflows that support enterprise-scale analytics.
What's included
6 videos, 3 readings, 9 assignments
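The difference between ETL and ELT comes down to where the transformation runs. The sketch below is one possible illustration, assuming an in-memory `sqlite3` database as the target system rather than Azure Data Factory and a real warehouse. The `raw_rows` data and the `clean`, `run_etl`, and `run_elt` functions are hypothetical.

```python
import sqlite3

# Hypothetical source extract: amounts arrive as untidy text.
raw_rows = [("ada", " 30 "), ("bob", "12"), ("eve", "n/a")]


def clean(amount_text):
    """Transform step used by ETL: parse digits, reject anything else."""
    text = amount_text.strip()
    return int(text) if text.isdigit() else None


def run_etl(conn):
    """ETL: transform in the pipeline, then load only clean rows."""
    conn.execute("CREATE TABLE etl_sales (customer TEXT, amount INTEGER)")
    transformed = [(customer, clean(amount)) for customer, amount in raw_rows]
    conn.executemany(
        "INSERT INTO etl_sales VALUES (?, ?)",
        [row for row in transformed if row[1] is not None],
    )


def run_elt(conn):
    """ELT: load raw rows as-is, then transform with SQL inside the target."""
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", raw_rows)
    conn.execute("""
        CREATE TABLE elt_sales AS
        SELECT customer, CAST(TRIM(amount) AS INTEGER) AS amount
        FROM raw_sales
        WHERE TRIM(amount) <> '' AND TRIM(amount) NOT GLOB '*[^0-9]*'
    """)


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_result = conn.execute("SELECT * FROM etl_sales ORDER BY customer").fetchall()
elt_result = conn.execute("SELECT * FROM elt_sales ORDER BY customer").fetchall()
```

Both paths end with the same clean table. The ELT path also keeps `raw_sales` in the target, which allows re-running the transformation later without re-extracting from the source.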
Batch and Real-Time Processing Fundamentals introduces learners to the core processing models that power modern big data systems, helping them understand when each approach delivers the most value. The module explores batch architectures, scheduling methods, and optimization strategies for large-scale historical processing, while also examining real-time stream processing concepts, including event handling, latency trade-offs, and throughput requirements. Learners gain hands-on experience implementing both models: building batch workflows with Azure Data Factory and configuring streaming pipelines using Event Hubs and Stream Analytics. Through architectural analysis, code exercises, and practical labs, learners practice evaluating business needs, selecting the right processing approach, and designing hybrid systems that combine batch and streaming for comprehensive analytics.
What's included
6 videos, 3 readings, 9 assignments
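The contrast between batch and stream processing in this module can be sketched in a few lines. The example below is an assumption-laden illustration, not how Event Hubs or Stream Analytics are configured: `events`, `batch_total`, and `tumbling_windows` are hypothetical names, and the streaming function assumes events arrive in timestamp order.

```python
# Hypothetical events as (timestamp_in_seconds, value) pairs, already in time order.
events = [(0, 5), (12, 3), (59, 2), (61, 7), (130, 1)]


def batch_total(all_events):
    """Batch model: wait for the full dataset, then compute one result."""
    return sum(value for _, value in all_events)


def tumbling_windows(stream, size_seconds=60):
    """Streaming model: emit (window_start, sum) as each fixed window closes.

    Assumes in-order arrival; late or out-of-order events are not handled.
    """
    current_start, running = None, 0
    for timestamp, value in stream:
        start = (timestamp // size_seconds) * size_seconds
        if current_start is None:
            current_start = start
        elif start != current_start:
            # An event from a later window closes the current one.
            yield current_start, running
            current_start, running = start, 0
        running += value
    if current_start is not None:
        # Flush the final, still-open window when the stream ends.
        yield current_start, running


windowed = list(tumbling_windows(iter(events)))
total = batch_total(events)
```

The batch result is available only once all input is present, whereas the windowed results become available one at a time, which is the latency trade-off the module describes.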
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Frequently asked questions
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.





