This advanced course teaches machine learning and AI techniques for big data systems. Learners will build end-to-end ML pipelines with PySpark ML, implement supervised and unsupervised models, and apply NLP techniques at scale. The course also explores deep learning, distributed training, and integrating Generative AI into big data workflows.

Skills you'll gain
- Transfer Learning
- Scalability
- Unstructured Data
- Model Deployment
- Model Evaluation
- Unsupervised Learning
- PySpark
- Deep Learning
- Text Mining
- Artificial Intelligence and Machine Learning (AI/ML)
- Distributed Computing
- Feature Engineering
- Big Data
- Natural Language Processing
- Generative AI
- Supervised Learning
- PyTorch (Machine Learning Library)
- Machine Learning

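Build your expertise in Data Analysis
- Learn new concepts from industry experts
- Gain a foundational understanding of a subject or tool
- Develop job-relevant skills with hands-on projects
- Earn a shareable career certificate from Microsoft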

There are 5 modules in this course
Machine learning looks quite different when the data exceeds the capacity of a single system. In this module, you explore the foundational ideas behind machine learning in big data environments and how familiar approaches change at scale. You will examine supervised and unsupervised learning, regression and classification problems, and the practical challenges that arise with massive datasets, such as scalability, distributed computing, and the need to adapt algorithms for large-scale processing.
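As a rough illustration of what "adapting algorithms for large-scale processing" can mean, the sketch below computes a mean and variance without ever holding the full dataset in one place. Each partition is reduced to a small summary, and the summaries are merged. This is a plain-Python sketch, not course material: the `Stats` class, the function names, and the sample partitions are all hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Sufficient statistics for one partition: count, sum, sum of squares."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0


def summarize(partition):
    """'Map' step: runs independently on each partition (worker)."""
    return Stats(
        n=len(partition),
        total=sum(partition),
        total_sq=sum(x * x for x in partition),
    )


def merge(a, b):
    """'Reduce' step: merging is associative, so merge order does not matter."""
    return Stats(a.n + b.n, a.total + b.total, a.total_sq + b.total_sq)


def mean_and_variance(stats):
    """Population mean and variance recovered from the merged statistics."""
    mean = stats.total / stats.n
    variance = stats.total_sq / stats.n - mean * mean
    return mean, variance


# Hypothetical data, already split into three partitions.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
merged = reduce(merge, (summarize(p) for p in partitions), Stats())
mean, variance = mean_and_variance(merged)
```

Only three numbers per partition cross the network, regardless of partition size. The sum-of-squares formula is kept for brevity; it can lose precision on large or badly scaled data, so a production system would likely use a more stable merge rule.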
What's included
6 videos, 3 readings, 7 assignments
A practical foundation for building scalable machine learning solutions using PySpark ML in big data environments. The content focuses on designing and implementing end-to-end machine learning pipelines with transformers and estimators, while developing regression, classification, and clustering models that scale across distributed systems. Emphasis is placed on real-world implementation and informed platform selection for enterprise deployments using Azure Databricks, Microsoft Fabric, and Azure HDInsight, ensuring solutions are both technically robust and operationally viable at scale.
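The transformer and estimator idea can be sketched without a Spark cluster. In Spark ML, a transformer exposes `transform()`, an estimator exposes `fit()` and returns a fitted model that is itself a transformer, and a pipeline chains such stages. The plain-Python classes below only mimic that contract on lists of dictionaries. They are hypothetical stand-ins, not the PySpark API, and the `Doubler` and `MeanCenterer` stages are invented for illustration.

```python
class Doubler:
    """Transformer: stateless, exposes transform() only."""

    def transform(self, rows):
        return [{**row, "x": row["x"] * 2} for row in rows]


class MeanCentererModel:
    """Fitted transformer produced by MeanCenterer.fit()."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [{**row, "x": row["x"] - self.mean} for row in rows]


class MeanCenterer:
    """Estimator: learns a parameter from data, then returns a transformer."""

    def fit(self, rows):
        mean = sum(row["x"] for row in rows) / len(rows)
        return MeanCentererModel(mean)


class PipelineModel:
    """Fitted pipeline: every stage is now a transformer."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains stages; fitting walks the stages in order."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                # Estimator: fit on the data as transformed so far.
                stage = stage.fit(rows)
            fitted.append(stage)
            rows = stage.transform(rows)
        return PipelineModel(fitted)


train = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
model = Pipeline([Doubler(), MeanCenterer()]).fit(train)
scored = model.transform([{"x": 5.0}])
```

The point of the design is that parameters learned during `fit()` (here, the mean of the doubled training values) are frozen into the fitted pipeline and reused unchanged when new data is scored.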
What's included
6 videos, 3 readings, 10 assignments
Large-scale text analytics introduces the challenges and techniques required to process and analyze unstructured text at enterprise scale using distributed computing frameworks. The focus is on applying natural language processing (NLP) techniques in scalable architectures to support text classification, sentiment analysis, and entity and relationship extraction across massive text corpora. Emphasis is placed on practical, production-oriented approaches for handling high-volume text data, with integration of Azure Cognitive Services to enhance accuracy, scalability, and operational efficiency in real-world analytics solutions.
What's included
6 videos, 3 readings, 10 assignments
Deep Learning for Big Data introduces the fundamentals of deep learning and advanced architectures specifically adapted for big data environments. Students will learn to implement neural networks for big data applications, apply transfer learning techniques with pre-trained models, and scale deep learning training across distributed clusters using modern frameworks and optimization techniques.
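One common way to scale training across a cluster is synchronous data parallelism: every worker holds a copy of the model, computes gradients on its own shard of the data, and the gradients are combined before each update. The sketch below simulates that loop in plain Python for a one-parameter linear model. It is an assumption about the kind of technique the module covers, not its actual content, and the shard data, learning rate, and function names are hypothetical.

```python
def shard_gradient(w, shard):
    """Sum of gradients of the squared error (w*x - y)^2 over one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard)


def train(shards, lr=0.01, steps=200):
    """Synchronous data-parallel gradient descent on the model y = w * x."""
    w = 0.0
    n = sum(len(shard) for shard in shards)
    for _ in range(steps):
        # Each worker computes a local gradient (sequential here, parallel in practice).
        local = [shard_gradient(w, shard) for shard in shards]
        # Stand-in for an all-reduce: sum local gradients, divide by the global count.
        grad = sum(local) / n
        w -= lr * grad
    return w


# Hypothetical (x, y) pairs generated from y = 3x, split across three workers.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)], [(4.0, 12.0)]]
w = train(shards)
```

Because the combined gradient equals the gradient over the full dataset, the sharded run follows the same trajectory as single-machine training, up to floating-point rounding. Real frameworks add communication scheduling, fault handling, and mixed precision on top of this basic loop.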
What's included
6 videos, 3 readings, 10 assignments
Generative AI and Big Data Integration explores how generative AI transforms big data analytics by enabling intelligent, natural language–driven workflows at scale. You will learn how foundation models and large language models integrate with distributed data pipelines to automate insights, enhance analytics, and power modern data applications. Through hands-on labs, you will implement LLM integration, apply fine-tuning for domain-specific use cases, and design production-ready GenAI solutions for real-world big data scenarios.
What's included
7 videos, 3 readings, 9 assignments
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.






