Data Analytics and Machine Learning for Big Data

This course is part of the Microsoft Big Data Management and Analytics Professional Certificate

Instructor: Microsoft

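5 modules: gain insight into a topic and learn the fundamentals.

Intermediate level (recommended experience)

3 weeks to complete at 10 hours a week

Flexible schedule: learn at your own pace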

What you'll learn

- Manage big data storage and pipelines with Azure services.
- Process and analyze large datasets using Apache Spark and Databricks.

Tools you'll learn

Apache Spark

Details to know

Shareable certificate: add it to your LinkedIn profile

Assessments: 46 assignments (some are AI-graded; see disclaimer)

Taught in English

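Build subject-matter expertise in Data Analysis

When you enroll in this course, you'll also be enrolled in the Microsoft Big Data Management and Analytics Professional Certificate.

- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate from Microsoft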

There are 5 modules in this course

This advanced course teaches machine learning and AI techniques for big data systems. Learners will build end-to-end ML pipelines with PySpark ML, implement supervised and unsupervised models, and apply NLP techniques at scale. The course also explores deep learning, distributed training, and integrating Generative AI into big data workflows.

By the end of this course, you will be able to:

- Implement ML pipelines using PySpark ML
- Build supervised, unsupervised, and recommendation models
- Apply NLP and text analytics to large datasets
- Integrate Generative AI and LLMs with big data systems

Tools & Software: PySpark ML, PyTorch, TensorFlow, Azure Machine Learning, Azure OpenAI Service

Skills: Machine learning, NLP, Deep learning, Generative AI, Model evaluation

Machine learning looks quite different when data exceeds the capacity of a single machine. In this module, you will explore the foundational ideas behind machine learning in big data environments and how familiar approaches change at scale. You will examine supervised and unsupervised learning, regression and classification problems, and the practical challenges that arise with massive datasets, such as scalability, distributed computing, and the need to adapt algorithms for large-scale processing.
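The need to adapt algorithms for large-scale processing can be illustrated with feature standardization. Gathering every row on one machine to compute a mean and standard deviation is impractical at scale, so distributed frameworks typically compute small partial summaries per partition and then merge them. The sketch below simulates this in plain Python under stated assumptions: each inner list stands in for a data partition (this is not Spark code), each partition is summarized with Welford's single-pass update, and summaries are combined with the pairwise merge formula of Chan et al.

```python
from dataclasses import dataclass
from functools import reduce
import math


@dataclass(frozen=True)
class Moments:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean


def partition_moments(values):
    """Welford's single-pass update over one (simulated) partition."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Moments(n, mean, m2)


def merge(a, b):
    """Combine two partial summaries (Chan et al. parallel formula)."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


# Hypothetical partitions; in a cluster each would live on a different executor.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]

# Only the tiny Moments summaries travel between "nodes", never the raw rows.
total = reduce(merge, (partition_moments(p) for p in partitions), Moments())
std = math.sqrt(total.m2 / (total.n - 1))  # sample standard deviation

# The global statistics are then applied back to each partition independently.
standardized = [[(x - total.mean) / std for x in p] for p in partitions]
```

The merge step is associative, so partitions can be combined in any order or as a tree, which is what lets this kind of aggregation scale out.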

What's included

6 videos, 3 readings, 7 assignments

6 videos, 29 minutes total

- Machine Learning Transforms Big Data into Business Intelligence (4 minutes)
- ML Problem Classification and Business Mapping (7 minutes)
- Data Quality Drives ML Success at Scale (4 minutes)
- Distributed Data Preparation Workflows (6 minutes)
- Rigorous Evaluation Prevents ML Disasters at Scale (4 minutes)
- Implementing Scalable Model Evaluation (5 minutes)

3 readings, 30 minutes total

- Machine Learning Fundamentals for Big Data Environments (10 minutes)
- Big Data ML Preparation Techniques (10 minutes)
- ML Model Evaluation for Big Data Systems (10 minutes)

7 assignments, 210 minutes total

- Machine Learning Problem Analysis (30 minutes)
- ML Fundamentals for Big Data Assessment (30 minutes)
- ML Data Preparation Pipeline (30 minutes)
- Data Preparation for ML at Scale Assessment (30 minutes)
- Scalable Model Evaluation (30 minutes)
- Model Evaluation at Scale Assessment (30 minutes)
- ML Fundamentals for Big Data Mastery (30 minutes)

This module provides a practical foundation for building scalable machine learning solutions with PySpark ML in big data environments. It focuses on designing and implementing end-to-end machine learning pipelines from transformers and estimators, and on developing regression, classification, and clustering models that scale across distributed systems. Emphasis is placed on real-world implementation and on informed platform selection for enterprise deployments on Azure Databricks, Microsoft Fabric, and Azure HDInsight, so that solutions are both technically robust and operationally viable at scale.
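PySpark ML organizes pipelines around two abstractions: a Transformer maps one dataset to another, and an Estimator's `fit()` learns from data and returns a fitted Transformer (a model); `Pipeline.fit()` in turn returns a `PipelineModel`. The plain-Python sketch below mirrors that design on lists of dicts so the control flow is easy to follow. It is a simplified illustration, not the PySpark API: `Squarer`, `Centerer`, and `CentererModel` are hypothetical toy stages.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, float]


class Transformer:
    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Estimator:
    def fit(self, rows: List[Row]) -> Transformer:
        raise NotImplementedError


@dataclass
class Squarer(Transformer):
    """Stateless stage: nothing to learn, so it is a Transformer."""
    input_col: str
    output_col: str

    def transform(self, rows):
        return [{**r, self.output_col: r[self.input_col] ** 2} for r in rows]


@dataclass
class CentererModel(Transformer):
    """Fitted stage: carries the mean learned from training data."""
    input_col: str
    output_col: str
    mean: float

    def transform(self, rows):
        return [{**r, self.output_col: r[self.input_col] - self.mean} for r in rows]


@dataclass
class Centerer(Estimator):
    """Stateful stage: must see data (fit) before it can transform."""
    input_col: str
    output_col: str

    def fit(self, rows):
        mean = sum(r[self.input_col] for r in rows) / len(rows)
        return CentererModel(self.input_col, self.output_col, mean)


@dataclass
class PipelineModel(Transformer):
    stages: list

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


@dataclass
class Pipeline(Estimator):
    stages: list

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted on the output of the previous stages.
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


train = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
model = Pipeline([Squarer("x", "x_sq"), Centerer("x_sq", "x_sq_centered")]).fit(train)
out = model.transform([{"x": 4.0}])  # new data reuses the training mean
```

The design choice worth noting is that fitting happens once on training data, and the returned `PipelineModel` applies the learned parameters unchanged to new data, so the same preprocessing is reproduced exactly at scoring time.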

What's included

6 videos, 3 readings, 10 assignments

6 videos, 36 minutes total

- Democratizing Machine Learning at Enterprise Scale (4 minutes)
- PySpark ML Pipeline Development Across Platforms (10 minutes)
- Supervised Learning Success Stories in Enterprise Big Data (5 minutes)
- Supervised Learning Model Development (6 minutes)
- Recommendation Systems Drive Business Growth (4 minutes)
- Building Scalable Recommendation Systems (8 minutes)

3 readings, 30 minutes total

- PySpark ML Architecture and Platform Comparison (10 minutes)
- Supervised Learning Algorithms for Big Data (10 minutes)
- Unsupervised Learning and Recommendation Systems (10 minutes)

10 assignments, 300 minutes total

- ML Pipeline Component Development (30 minutes)
- ML Platform Comparison and Pipeline Creation (30 minutes)
- PySpark ML Platform Fundamentals Assessment (30 minutes)
- Supervised Learning Implementation (30 minutes)
- Supervised Learning Model Development (30 minutes)
- Supervised Learning at Scale Assessment (30 minutes)
- Recommendation System Implementation (30 minutes)
- Recommendation System Development (30 minutes)
- Unsupervised Learning and Recommendations Assessment (30 minutes)
- PySpark ML Implementation Mastery (30 minutes)

Large-scale text analytics introduces the challenges and techniques required to process and analyze unstructured text at enterprise scale using distributed computing frameworks. The focus is on applying natural language processing (NLP) techniques in scalable architectures to support text classification, sentiment analysis, and entity and relationship extraction across massive text corpora. Emphasis is placed on practical, production-oriented approaches for handling high-volume text data, with integration of Azure Cognitive Services to enhance accuracy, scalability, and operational efficiency in real-world analytics solutions.
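Distributed text preprocessing commonly follows a map-reduce shape: each partition tokenizes and counts terms locally, and the partial counts are then merged into corpus-wide totals (the same pattern as the classic Spark word count built from `flatMap` and `reduceByKey`). The sketch below simulates this in plain Python; the partitions, sample documents, and the small stop-word list are illustrative assumptions, not a production configuration.

```python
import re
from collections import Counter
from functools import reduce

# Illustrative subset only; real pipelines use a curated stop-word list.
STOP_WORDS = {"the", "a", "is", "and", "of", "to"}


def tokenize(text):
    """Lowercase, extract alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def map_partition(docs):
    """Map step: local term counts for one partition of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial counts (associative, order-independent)."""
    return a + b


# Hypothetical partitions of a review corpus.
partitions = [
    ["The service is great", "Great support and fast delivery"],
    ["Delivery was slow", "The support team is great"],
]

term_counts = reduce(reduce_counts, map(map_partition, partitions), Counter())
```

Because only the compact per-partition counters are merged, the volume of data exchanged grows with vocabulary size rather than with the number of documents.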

What's included

6 videos, 3 readings, 10 assignments

6 videos, 39 minutes total

- Unlocking Value from Unstructured Text at Scale (5 minutes)
- Building Scalable Text Processing Pipelines (9 minutes)
- Advanced NLP Drives Business Intelligence (5 minutes)
- Implementing Advanced NLP at Scale (7 minutes)
- Production-Scale Text Classification Transforms Business Operations (4 minutes)
- Building Production Text Classification Systems (8 minutes)

3 readings, 30 minutes total

- Distributed Text Processing Techniques (10 minutes)
- Advanced NLP Techniques for Big Data (10 minutes)
- Scalable Text Classification Architectures (10 minutes)

10 assignments, 300 minutes total

- Text Preprocessing Pipeline Development (30 minutes)
- Scalable Text Preprocessing Design (30 minutes)
- Text Processing at Scale Assessment (30 minutes)
- Advanced NLP Implementation and Monitoring (30 minutes)
- NLP System Architecture Design (30 minutes)
- Advanced NLP Techniques Assessment (30 minutes)
- Text Classification System Development (30 minutes)
- Text Classification System Implementation (30 minutes)
- Text Classification at Scale Assessment (30 minutes)
- Text Analytics and NLP Mastery (30 minutes)

Deep Learning for Big Data introduces the fundamentals of deep learning and advanced architectures specifically adapted for big data environments. Students will learn to implement neural networks for big data applications, apply transfer learning techniques with pre-trained models, and scale deep learning training across distributed clusters using modern frameworks and optimization techniques.
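One common way to scale training across a cluster is synchronous data parallelism: every worker holds a copy of the model, computes gradients on its own shard of the data, and an all-reduce step averages those gradients so every copy applies the same update. The sketch below simulates four workers inside a single Python process on a toy linear regression (y = 3x + 1, an assumed example chosen for illustration); frameworks such as PyTorch's `DistributedDataParallel` perform the averaging across real processes and devices.

```python
def shard_gradient(w, b, shard):
    """Mean-squared-error gradient for y ~ w*x + b on one worker's shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def all_reduce_mean(grads):
    """Average per-worker gradients (the result an all-reduce step produces)."""
    k = len(grads)
    return tuple(sum(g[i] for g in grads) / k for i in range(len(grads[0])))


# Toy dataset split round-robin into 4 equal shards (one per simulated worker).
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
shards = [data[i::4] for i in range(4)]

w, b, lr = 0.0, 0.0, 0.02
for step in range(2000):
    # In a real cluster these gradient computations run in parallel.
    grads = [shard_gradient(w, b, s) for s in shards]
    gw, gb = all_reduce_mean(grads)
    # Every replica applies the identical update, so the copies stay in sync.
    w -= lr * gw
    b -= lr * gb
```

Because the shards are equal in size, the averaged shard gradient equals the full-batch gradient, so this loop reproduces ordinary gradient descent; with unequal shards, the average would need to be weighted by shard size.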

What's included

6 videos, 3 readings, 10 assignments

6 videos, 31 minutes total

- Deep Learning Revolutionizes Big Data Analytics (5 minutes)
- Neural Network Implementation in Big Data Frameworks (5 minutes)
- Advanced Architectures Transform Complex Data Analysis (6 minutes)
- CNN and RNN Implementation at Scale (5 minutes)
- Distributed Deep Learning Enables Breakthrough Scale (4 minutes)
- Implementing Distributed Deep Learning Training (5 minutes)

3 readings, 30 minutes total

- Deep Learning Architectures for Big Data (10 minutes)
- Advanced Deep Learning Architectures for Scale (10 minutes)
- Distributed Deep Learning Training Strategies (10 minutes)

10 assignments, 300 minutes total

- Neural Network Implementation (30 minutes)
- Neural Network for Big Data Classification (30 minutes)
- Deep Learning Fundamentals Assessment (30 minutes)
- Advanced Architecture Implementation (30 minutes)
- Deep Learning Architecture Design (30 minutes)
- Advanced Deep Learning Architectures Assessment (30 minutes)
- Distributed Training Implementation and Management (30 minutes)
- Distributed Deep Learning Training (30 minutes)
- Distributed Deep Learning Training Assessment (30 minutes)
- Deep Learning for Big Data Mastery (30 minutes)

Generative AI and Big Data Integration explores how generative AI transforms big data analytics by enabling intelligent, natural language–driven workflows at scale. You will learn how foundation models and large language models integrate with distributed data pipelines to automate insights, enhance analytics, and power modern data applications. Through hands-on labs, you will implement LLM integration, apply fine-tuning for domain-specific use cases, and design production-ready GenAI solutions for real-world big data scenarios.