This advanced course teaches machine learning and AI techniques for big data systems. Learners will build end-to-end ML pipelines with PySpark ML, implement supervised and unsupervised models, and apply NLP techniques at scale. The course also explores deep learning, distributed training, and integrating Generative AI into big data workflows.
By the end of this course, you will be able to:
- Implement ML pipelines using PySpark ML
- Build supervised, unsupervised, and recommendation models
- Apply NLP and text analytics to large datasets
- Integrate Generative AI and LLMs with big data systems
Tools & Software:
PySpark ML, PyTorch, TensorFlow, Azure Machine Learning, Azure OpenAI Service
Skills:
Machine learning, NLP, Deep learning, Generative AI, Model evaluation
Machine learning looks quite different when data exceeds the capacity of a single system. In this section, you will explore the foundational ideas behind machine learning in big data environments and how familiar approaches change at scale. You will examine supervised and unsupervised learning, regression and classification problems, and the practical challenges that arise with massive datasets, such as scalability, distributed computing, and the need to adapt algorithms for large-scale processing.
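As one illustration of adapting an algorithm for large-scale processing, the sketch below computes a mean and variance from per-partition summaries instead of from the full dataset. It is plain Python, not course material: the function names (`partial_stats`, `merge`, `mean_and_variance`) and the sample partitions are assumptions chosen for illustration, and the lists stand in for data spread across workers.

```python
from functools import reduce


def partial_stats(partition):
    """Summarize one partition as (count, sum, sum of squares)."""
    return (len(partition), sum(partition), sum(x * x for x in partition))


def merge(a, b):
    """Combine two partial summaries; the merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(partitions):
    """Reduce the per-partition summaries, then finish the computation once."""
    n, s, ss = reduce(merge, map(partial_stats, partitions), (0, 0.0, 0.0))
    mean = s / n
    variance = ss / n - mean * mean  # population variance
    return mean, variance


# Hypothetical data split unevenly across three "workers".
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
mean, variance = mean_and_variance(partitions)
```

Only three numbers per partition cross the network, which is why this kind of aggregation scales. The sum-of-squares formula can lose precision on data with a large mean, so a production system would likely use a more stable merge rule.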
What's included
6 videos, 3 readings, 7 assignments
6 videos•Total 29 minutes
Machine Learning Transforms Big Data into Business Intelligence•4 minutes
ML Problem Classification and Business Mapping•7 minutes
Data Quality Drives ML Success at Scale•4 minutes
Distributed Data Preparation Workflows•6 minutes
Rigorous Evaluation Prevents ML Disasters at Scale•4 minutes
Implementing Scalable Model Evaluation•5 minutes
3 readings•Total 30 minutes
Machine Learning Fundamentals for Big Data Environments•10 minutes
Big Data ML Preparation Techniques•10 minutes
ML Model Evaluation for Big Data Systems•10 minutes
7 assignments•Total 210 minutes
ML Fundamentals for Big Data Mastery•30 minutes
Machine Learning Problem Analysis•30 minutes
ML Fundamentals for Big Data Assessment•30 minutes
ML Data Preparation Pipeline•30 minutes
Data Preparation for ML at Scale Assessment•30 minutes
Scalable Model Evaluation•30 minutes
Model Evaluation at Scale Assessment•30 minutes
Building ML Models with PySpark ML
Module 2
A practical foundation for building scalable machine learning solutions using PySpark ML in big data environments. The content focuses on designing and implementing end-to-end machine learning pipelines with transformers and estimators, while developing regression, classification, and clustering models that scale across distributed systems. Emphasis is placed on real-world implementation and informed platform selection for enterprise deployments using Azure Databricks, Microsoft Fabric, and Azure HDInsight, ensuring solutions are both technically robust and operationally viable at scale.
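The transformer and estimator roles mentioned above can be sketched in plain Python. This is not PySpark's API: the classes below (`ClipNegative`, `MinMaxScaler`, `Pipeline`, and so on) are hypothetical stand-ins that work on Python lists, and they only mirror the convention that an estimator's `fit` returns a fitted model, which is itself a transformer with a `transform` method.

```python
class ClipNegative:
    """Transformer: stateless, maps rows to rows."""

    def transform(self, rows):
        return [max(0.0, x) for x in rows]


class MinMaxScaler:
    """Estimator: learns min and max from data and returns a fitted model."""

    def fit(self, rows):
        return MinMaxScalerModel(min(rows), max(rows))


class MinMaxScalerModel:
    """Fitted model: a transformer that carries the learned parameters."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def transform(self, rows):
        span = (self.hi - self.lo) or 1.0  # avoid dividing by zero on constant data
        return [(x - self.lo) / span for x in rows]


class Pipeline:
    """Chains stages; fitting walks them in order."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)   # estimator becomes a fitted transformer
            rows = stage.transform(rows)  # output feeds the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


train = [-2.0, 0.0, 5.0, 10.0]
model = Pipeline([ClipNegative(), MinMaxScaler()]).fit(train)
scaled_train = model.transform(train)
scaled_new = model.transform([2.5, 20.0])
```

The fitted model reuses the minimum and maximum learned from the training data, so new values can fall outside the 0 to 1 range. Keeping learned parameters inside the fitted pipeline is what lets the same preprocessing be applied consistently at training and scoring time.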
What's included
6 videos, 3 readings, 10 assignments
6 videos•Total 36 minutes
Democratizing Machine Learning at Enterprise Scale•4 minutes
PySpark ML Pipeline Development Across Platforms•10 minutes
Supervised Learning Success Stories in Enterprise Big Data•5 minutes
Supervised Learning Model Development•6 minutes
Recommendation Systems Drive Business Growth•4 minutes
Building Scalable Recommendation Systems•8 minutes
3 readings•Total 30 minutes
PySpark ML Architecture and Platform Comparison•10 minutes
Supervised Learning Algorithms for Big Data•10 minutes
Unsupervised Learning and Recommendation Systems•10 minutes
10 assignments•Total 300 minutes
PySpark ML Implementation Mastery•30 minutes
ML Pipeline Component Development•30 minutes
ML Platform Comparison and Pipeline Creation•30 minutes
PySpark ML Platform Fundamentals Assessment•30 minutes
Supervised Learning Implementation•30 minutes
Supervised Learning Model Development•30 minutes
Supervised Learning at Scale Assessment•30 minutes
Recommendation System Implementation•30 minutes
Recommendation System Development•30 minutes
Unsupervised Learning and Recommendations Assessment•30 minutes
Text Analytics and NLP at Scale
Module 3
Large-scale text analytics introduces the challenges and techniques required to process and analyze unstructured text at enterprise scale using distributed computing frameworks. The focus is on applying natural language processing (NLP) techniques in scalable architectures to support text classification, sentiment analysis, and entity and relationship extraction across massive text corpora. Emphasis is placed on practical, production-oriented approaches for handling high-volume text data, with integration of Azure Cognitive Services to enhance accuracy, scalability, and operational efficiency in real-world analytics solutions.
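One common building block for text classification at scale is turning each document into a fixed-length feature vector without a shared vocabulary. The sketch below shows the hashing trick in plain Python; the tokenizer pattern, the bucket count, and the sample documents are assumptions for illustration, and `zlib.crc32` is used only because it gives the same bucket on every worker, unlike Python's built-in string `hash`.

```python
import re
import zlib

TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return TOKEN.findall(text.lower())


def hashed_tf(tokens, num_buckets=16):
    """Count tokens into a fixed number of buckets chosen by a stable hash."""
    vec = [0] * num_buckets
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % num_buckets
        vec[idx] += 1
    return vec


# Each document is processed independently, so this step parallelizes by row.
docs = ["Great product, great price!", "Terrible support."]
features = [hashed_tf(tokenize(d)) for d in docs]
```

Because no vocabulary has to be built or broadcast, each partition of a corpus can be featurized on its own. The trade-off is that distinct tokens can collide in the same bucket, which a larger `num_buckets` makes less likely.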
What's included
6 videos, 3 readings, 10 assignments
6 videos•Total 39 minutes
Unlocking Value from Unstructured Text at Scale•5 minutes
Building Scalable Text Processing Pipelines•9 minutes
Advanced NLP Drives Business Intelligence•5 minutes
Implementing Advanced NLP at Scale•7 minutes
Production-Scale Text Classification Transforms Business Operations•4 minutes
Building Production Text Classification Systems•8 minutes
3 readings•Total 30 minutes
Distributed Text Processing Techniques•10 minutes
Advanced NLP Techniques for Big Data•10 minutes
Scalable Text Classification Architectures•10 minutes
10 assignments•Total 300 minutes
Text Analytics and NLP Mastery•30 minutes
Text Preprocessing Pipeline Development•30 minutes
Scalable Text Preprocessing Design•30 minutes
Text Processing at Scale Assessment•30 minutes
Advanced NLP Implementation and Monitoring•30 minutes
NLP System Architecture Design•30 minutes
Advanced NLP Techniques Assessment•30 minutes
Text Classification System Development•30 minutes
Text Classification System Implementation•30 minutes
Text Classification at Scale Assessment•30 minutes
Deep Learning for Big Data
Module 4
Deep Learning for Big Data introduces the fundamentals of deep learning and advanced architectures specifically adapted for big data environments. Students will learn to implement neural networks for big data applications, apply transfer learning techniques with pre-trained models, and scale deep learning training across distributed clusters using modern frameworks and optimization techniques.
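Scaling training across a cluster often relies on synchronous data parallelism: each worker computes gradients on its own shard, and the gradients are combined before every update. The sketch below imitates that idea in plain Python for a one-parameter linear model. It is a toy under stated assumptions, not a framework API: the function names, the data, and the learning rate are illustrative, and the list comprehension stands in for work that would run on separate workers.

```python
def shard_gradient(w, shard):
    """Return the summed gradient of 0.5 * (w * x - y) ** 2 and the shard size."""
    g = sum((w * x - y) * x for x, y in shard)
    return g, len(shard)


def data_parallel_step(w, shards, lr):
    """One synchronous update: combine per-shard gradients, then step once."""
    results = [shard_gradient(w, s) for s in shards]  # one entry per worker
    total_g = sum(g for g, _ in results)
    total_n = sum(n for _, n in results)
    return w - lr * total_g / total_n  # average over all examples


# Hypothetical data following y = 2x, split unevenly across two workers.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:1], data[1:]]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.05)
```

Weighting by shard size makes the combined gradient match what a single machine would compute on the full batch, so the sharded run follows the same trajectory. In real frameworks the combining step is typically a collective operation such as all-reduce.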
What's included
6 videos, 3 readings, 10 assignments
6 videos•Total 31 minutes
Deep Learning Revolutionizes Big Data Analytics•5 minutes
Neural Network Implementation in Big Data Frameworks•5 minutes
Advanced Architectures Transform Complex Data Analysis•6 minutes
CNN and RNN Implementation at Scale•5 minutes
Distributed Deep Learning Enables Breakthrough Scale•4 minutes
Implementing Distributed Deep Learning Training•5 minutes
3 readings•Total 30 minutes
Deep Learning Architectures for Big Data•10 minutes
Advanced Deep Learning Architectures for Scale•10 minutes
Distributed Deep Learning Training Strategies•10 minutes
10 assignments•Total 300 minutes
Deep Learning for Big Data Mastery•30 minutes
Neural Network Implementation•30 minutes
Neural Network for Big Data Classification•30 minutes
Deep Learning Fundamentals Assessment•30 minutes
Advanced Architecture Implementation•30 minutes
Deep Learning Architecture Design•30 minutes
Advanced Deep Learning Architectures Assessment•30 minutes
Distributed Training Implementation and Management•30 minutes
Distributed Deep Learning Training•30 minutes
Distributed Deep Learning Training Assessment•30 minutes
Generative AI and Big Data Integration
Module 5
Generative AI and Big Data Integration explores how generative AI transforms big data analytics by enabling intelligent, natural language–driven workflows at scale. You will learn how foundation models and large language models integrate with distributed data pipelines to automate insights, enhance analytics, and power modern data applications. Through hands-on labs, you will implement LLM integration, apply fine-tuning for domain-specific use cases, and design production-ready GenAI solutions for real-world big data scenarios.
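A recurring pattern when integrating an LLM with a data pipeline is to batch records into prompts and validate each reply before merging it back. The sketch below shows that shape in plain Python. Everything in it is hypothetical: `fake_llm` is a keyword-based stand-in for a real model call, and the prompt wording, batch size, and sample reviews are assumptions for illustration only.

```python
def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def build_prompt(reviews):
    """Put one instruction line first, then one numbered line per review."""
    lines = [f"{i + 1}. {r}" for i, r in enumerate(reviews)]
    return "Label each review as positive or negative:\n" + "\n".join(lines)


def fake_llm(prompt):
    """Hypothetical stand-in for a model call: one label per numbered line."""
    body = prompt.split("\n")[1:]
    return "\n".join(
        "positive" if "great" in line.lower() else "negative" for line in body
    )


def label_reviews(reviews, llm, batch_size=2):
    """Send reviews in batches and check that every reply has the right length."""
    labels = []
    for batch in batched(reviews, batch_size):
        batch_labels = llm(build_prompt(batch)).split("\n")
        if len(batch_labels) != len(batch):
            raise ValueError("model returned an unexpected number of labels")
        labels.extend(batch_labels)
    return labels


reviews = ["Great battery life", "Stopped working in a week", "Great value"]
labels = label_reviews(reviews, fake_llm)
```

Passing the model call in as a parameter keeps the batching and validation logic testable without network access. Batching reduces the number of calls, and the length check guards against replies that cannot be aligned with their input rows.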
What's included
7 videos, 3 readings, 9 assignments
7 videos•Total 42 minutes
Generative AI Transforms Big Data Analytics•4 minutes
Exploring Generative AI Models for Data Applications•10 minutes
LLMs Democratize Data Analysis•5 minutes
LLM Integration with Big Data Pipelines•6 minutes
Domain-Specific AI Models Drive Business Value•4 minutes
Implementing Fine-tuning Pipelines - Part 1•6 minutes
Implementing Fine-tuning Pipelines - Part 2•6 minutes
3 readings•Total 30 minutes
Generative AI Architectures and Big Data Integration•10 minutes
Large Language Model Integration Strategies•10 minutes
Model Fine-tuning and Domain Adaptation Strategies•10 minutes
9 assignments•Total 270 minutes
Generative AI Integration Mastery•30 minutes
Generative AI Model Exploration•30 minutes
Generative AI Fundamentals Assessment•30 minutes
LLM API Integration and Automation•30 minutes
LLM-Enhanced Data Analysis Pipeline•30 minutes
LLM Integration Techniques Assessment•30 minutes
Fine-tuning Pipeline Implementation and Monitoring•30 minutes
Our goal at Microsoft is to empower every individual and organization on the planet to achieve more.
In this next revolution of digital transformation, growth is being driven by technology. Our integrated cloud approach creates an unmatched platform for digital transformation. We address the real-world needs of customers by seamlessly integrating Microsoft 365, Dynamics 365, LinkedIn, GitHub, Microsoft Power Platform, and Azure to unlock business value for every organization—from large enterprises to family-run businesses. The backbone and foundation of this is Azure.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Certificate?
When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.