End-to-End Multimodal AI: Fine-Tuning, Fusion, and MLOps

This course is part of the Multimodal Intelligence - Vision, Audio & Language in Action Professional Certificate

Instructor: Professionals from the Industry

Access provided by Coursera Learning Team

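20 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

2 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace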

What you'll learn

Fine-tune transformer-based multimodal models using transfer learning in PyTorch and TensorFlow.
Build cross-modal retrieval systems using FAISS and attention-based fusion of visual and text embeddings.
Automate ML pipelines with drift monitoring, hyperparameter tuning, and retraining using MLflow and Ray Tune.
Design and document versioned multimodal inference APIs with FastAPI, OAuth2, and OpenAPI specifications.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

38 assignments¹

AI Graded (see disclaimer)

Taught in English

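Build your Algorithms expertise

This course is part of the Multimodal Intelligence - Vision, Audio & Language in Action Professional Certificate

When you enroll in this course, you'll also be enrolled in this Professional Certificate.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate from Coursera

There are 20 modules in this course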

Build production-ready multimodal AI systems that combine vision, language, and audio into unified intelligent applications. This course takes you through the full lifecycle of multimodal model development — from constructing and fine-tuning transformer-based architectures using PyTorch and TensorFlow, to diagnosing training failures, designing cross-modal retrieval systems, and deploying secure, monitored inference APIs.

You will work with real-world tools including CLIP, ViT, FAISS, FastAPI, MLflow, and Ray Tune to build systems that process and integrate multiple data types simultaneously. You will analyze computational complexity to optimize fusion algorithms, evaluate model errors to identify failure patterns, and translate model outputs into stakeholder-ready business insights. This course is built for intermediate practitioners in machine learning and AI who want to move beyond single-modality models and into the cutting edge of AI systems design. By the end, you will have a portfolio of deployable, optimized multimodal systems that demonstrate advanced engineering capability to employers.

You will build the foundational MLOps infrastructure for multimodal AI systems by designing modular data pipeline components and implementing your first multimodal transformer fine-tuning workflow using open source tools.

What's included

3 videos, 1 reading, 1 assignment, 1 ungraded lab

3 videos (12 minutes total)

Why Modular Data Pipelines Matter in Enterprise Environments (2 minutes)
Open Source Tools for Pipeline Development: Spark, dbt, and Airflow (6 minutes)
Fine-tuning Multimodal Transformers (3 minutes)

1 reading (12 minutes total)

Fundamentals of Modular Data Pipeline Architecture (12 minutes)

1 assignment (3 minutes total)

Modular Pipeline Foundations Knowledge Check (3 minutes)

1 ungraded lab (20 minutes total)

Building Your First Modular Pipeline Component (20 minutes)

You will accelerate multimodal model development using transfer learning techniques and implement the transformation and loading pipeline stages that deliver processed data and trained models reliably to downstream systems.

What's included

1 video, 1 reading, 3 assignments

You will identify and analyze training and validation metric patterns to diagnose overfitting and gradient stability issues using TensorBoard visualization tools.

What's included

2 videos, 1 reading, 1 assignment, 1 ungraded lab

2 videos (8 minutes total)

When Neural Networks Fail: The Hidden Cost of Training Problems (2 minutes)
Understanding Training Dynamics: Patterns, Gradients, and Warning Signs (6 minutes)

1 reading (10 minutes total)

Mathematical Foundations of Gradient Analysis (10 minutes)

1 assignment (3 minutes total)

Training Dynamics Diagnosis Assessment (3 minutes)

1 ungraded lab (20 minutes total)

Neural Network Training Diagnostics Lab (20 minutes)

You will implement targeted interventions including gradient clipping and early stopping to stabilize training processes and prevent common neural network training failures.

What's included

1 video, 1 reading, 3 assignments

1 video (12 minutes total)

Implementing Gradient Clipping in TensorFlow and PyTorch (12 minutes)

1 reading (12 minutes total)

Training Stabilization Techniques: Gradient Clipping and Early Stopping (12 minutes)

3 assignments (31 minutes total)

Final Assessment: Neural Network Training Stabilization (10 minutes)
Training Pipeline Stabilization Implementation (18 minutes)
Training Stabilization Techniques Assessment (3 minutes)
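
The two interventions named in this module, gradient clipping and early stopping, can be sketched without any framework. The following is a minimal illustration and not the course's own material: `clip_by_global_norm` and `EarlyStopping` are hypothetical names, gradients are plain Python floats, and a real workflow would more likely use the utilities built into PyTorch or TensorFlow.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so their combined L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        return list(grads), total
    scale = max_norm / total
    return [g * scale for g in grads], total


class EarlyStopping:
    """Signal a stop after `patience` epochs without validation improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# A gradient of norm 5 is rescaled to norm 1 while keeping its direction.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)

# Validation loss improves twice, then worsens for two consecutive epochs.
stopper = EarlyStopping(patience=2)
val_losses = [1.0, 0.8, 0.81, 0.85, 0.9]
stopped_at = next((i for i, v in enumerate(val_losses) if stopper.step(v)), None)
```

Clipping by the global norm rescales every gradient by the same factor, so the update direction is preserved. The early-stopping counter resets whenever the validation loss improves by more than `min_delta`.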

You will learn systematic image preprocessing techniques including normalization and color-space conversions to prepare raw visual data for computer vision applications.

What's included

3 videos, 1 reading, 1 assignment, 1 ungraded lab

3 videos (17 minutes total)

Why Image Preprocessing Matters in Computer Vision (3 minutes)
Implementing Normalization Techniques with NumPy (7 minutes)
Converting Between Color Spaces with OpenCV (7 minutes)

1 reading (10 minutes total)

Fundamentals of Image Normalization and Color Space Theory (10 minutes)

1 assignment (8 minutes total)

Image Preprocessing Fundamentals Assessment (8 minutes)

1 ungraded lab (18 minutes total)

Image Preprocessing Pipeline: Normalization & Color-Space Transformations (18 minutes)
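
As a rough illustration of the normalization and color-space steps this module covers, the sketch below uses only NumPy. The function names and the tiny 2x2 test image are assumptions made for the example. The grayscale conversion applies the ITU-R BT.601 luma weights directly instead of calling OpenCV, so it shows the arithmetic, not the lab's actual pipeline.

```python
import numpy as np


def min_max_scale(image):
    """Map uint8 pixel values from [0, 255] to float32 values in [0, 1]."""
    return image.astype(np.float32) / 255.0


def standardize(image01, mean, std):
    """Per-channel standardization: (x - mean) / std, broadcast over H and W."""
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    return (image01 - mean) / std


def rgb_to_gray(image01):
    """Luma conversion with ITU-R BT.601 weights (0.299, 0.587, 0.114)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return image01 @ weights  # (H, W, 3) @ (3,) -> (H, W)


# Hypothetical 2x2 RGB image: one pure red pixel, one white pixel, two black.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = (255, 0, 0)
rgb[1, 1] = (255, 255, 255)

scaled = min_max_scale(rgb)
normalized = standardize(scaled, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
gray = rgb_to_gray(scaled)
```

With a mean and standard deviation of 0.5 per channel, the standardized values fall in [-1, 1]. Dataset-specific statistics would be substituted in practice.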

You will learn optical flow and frame differencing techniques to extract temporal motion features from video sequences for computer vision applications.

What's included

2 videos, 1 reading, 2 assignments
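
Of the two techniques this module names, frame differencing is the simpler to show in a few lines. The sketch below is an assumption-laden illustration: the function names, the threshold of 25, and the synthetic 4x4 frames are all hypothetical, and optical flow is not covered here.

```python
import numpy as np


def frame_difference(prev_frame, next_frame, threshold=25):
    """Return a boolean motion mask where the absolute intensity change exceeds threshold."""
    # Cast to a signed type first so uint8 subtraction cannot wrap around.
    diff = np.abs(next_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold


def motion_ratio(mask):
    """Fraction of pixels flagged as moving, a simple per-frame temporal feature."""
    return float(mask.mean())


# Two synthetic grayscale frames: a bright 2x2 object appears in the second one.
frame_a = np.zeros((4, 4), dtype=np.uint8)
frame_b = frame_a.copy()
frame_b[1:3, 1:3] = 200

mask = frame_difference(frame_a, frame_b)
ratio = motion_ratio(mask)
```

Computing `motion_ratio` for each consecutive pair of frames yields a one-dimensional motion signal for the whole clip.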

You will establish a foundational understanding of systematic error analysis approaches and learn to evaluate computer vision model performance beyond basic accuracy metrics.

What's included

2 videos, 1 reading, 1 assignment, 1 ungraded lab

2 videos (10 minutes total)

Why Systematic Error Analysis Matters in Computer Vision (3 minutes)
Understanding Confusion Matrices and Error Categories (7 minutes)

1 reading (12 minutes total)

Foundations of Computer Vision Error Analysis (12 minutes)

1 assignment (8 minutes total)

Evaluating Error Analysis Fundamentals (8 minutes)

1 ungraded lab (20 minutes total)

Hands-On Confusion Matrix Analysis for Computer Vision Models (20 minutes)
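
To make "beyond basic accuracy" concrete, the sketch below builds a confusion matrix and derives per-class precision and recall in plain Python. The labels and predictions are invented for the example and the helper names are hypothetical. The lab itself may well use a library such as scikit-learn.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_report(matrix, labels):
    """Precision and recall per class, derived from the matrix."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        report[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return report


# Invented example data: six images across three classes.
labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]

matrix = confusion_matrix(y_true, y_pred, labels)
report = per_class_report(matrix, labels)
```

Overall accuracy here is 4 out of 6, but the matrix shows where the errors sit: one cat predicted as dog and one bird predicted as cat. That per-class detail is what systematic error analysis works from.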

You will apply advanced techniques to identify systematic failure patterns in computer vision models and generate comprehensive quality reports for model improvement.

What's included

1 video, 1 reading, 3 assignments

You will build a foundational understanding of cross-modal retrieval systems and implement approximate nearest-neighbor search algorithms using FAISS for production-scale similarity search across multimodal embeddings.

What's included

1 video, 2 readings, 1 assignment, 1 ungraded lab

1 video (7 minutes total)

Fundamentals of Cross-Modal Retrieval Systems (7 minutes)

2 readings (18 minutes total)

FAISS Architecture and Index Types for Production Systems (10 minutes)
Implementing FAISS Indexing for Cross-Modal Search (8 minutes)

1 assignment (3 minutes total)

Cross-Modal Retrieval and FAISS Implementation Assessment (3 minutes)

1 ungraded lab (15 minutes total)

Building Production-Scale Cross-Modal Retrieval with FAISS (15 minutes)

You will design and implement sophisticated attention-based fusion algorithms that intelligently combine visual and textual embeddings, mastering the creation of multimodal neural architectures for advanced cross-modal AI applications.

What's included

2 readings, 3 assignments
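
One simple form of attention-based fusion lets a text embedding attend over a set of image patch embeddings and then concatenates the result. The NumPy sketch below is a deliberately stripped-down illustration under stated assumptions: it omits the learned query, key, and value projections a trained model would have, and the function names and toy embeddings are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(text_emb, patch_embs):
    """Attend from one text embedding (d,) over image patch embeddings (n, d)."""
    d = text_emb.shape[-1]
    scores = patch_embs @ text_emb / np.sqrt(d)         # (n,) scaled dot products
    weights = softmax(scores)                           # (n,) attention weights
    visual_context = weights @ patch_embs               # (d,) weighted patch average
    fused = np.concatenate([text_emb, visual_context])  # (2d,) fused representation
    return fused, weights


# Toy embeddings: the second patch points in the same direction as the text.
text = np.array([1.0, 0.0, 0.0, 0.0])
patches = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

fused, weights = cross_attention_fuse(text, patches)
```

The patch aligned with the text receives the largest attention weight, so the visual context is dominated by the image region most relevant to the query.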

You will learn the foundational concepts of computational complexity analysis and how to systematically evaluate fusion algorithms using Big O notation and profiling tools.

What's included

3 videos, 1 reading, 1 assignment, 1 ungraded lab

3 videos (16 minutes total)

Why Algorithm Complexity Analysis Matters in Production AI (3 minutes)
Applying Big O Analysis to Fusion Algorithm Components (7 minutes)
Profiling Fusion Algorithms with cProfile (6 minutes)

1 reading (8 minutes total)

Fundamentals of Computational Complexity in Fusion Algorithms (8 minutes)

1 assignment (5 minutes total)

Complexity Analysis Fundamentals Assessment (5 minutes)

1 ungraded lab (18 minutes total)

Profile and Analyze Fusion Algorithm Performance (18 minutes)
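
The pairing of Big O reasoning with cProfile measurements can be shown on two toy fusion routines. The sketch below is illustrative only: `pairwise_fusion`, `aligned_fusion`, and the `profile` helper are hypothetical names, and the feature lists are synthetic.

```python
import cProfile
import io
import pstats


def pairwise_fusion(image_feats, text_feats):
    """O(n * m) scoring: every image feature against every text feature."""
    return [sum(a * b for a, b in zip(i, t)) for i in image_feats for t in text_feats]


def aligned_fusion(image_feats, text_feats):
    """O(n) scoring: only position-aligned pairs."""
    return [sum(a * b for a, b in zip(i, t)) for i, t in zip(image_feats, text_feats)]


def profile(func, *args):
    """Run func under cProfile and return (result, stats text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


# Synthetic features: 60 vectors of dimension 16 per modality.
n, dim = 60, 16
image_feats = [[float(i + k) for k in range(dim)] for i in range(n)]
text_feats = [[float(i - k) for k in range(dim)] for i in range(n)]

pairwise_scores, pairwise_stats = profile(pairwise_fusion, image_feats, text_feats)
aligned_scores, aligned_stats = profile(aligned_fusion, image_feats, text_feats)
```

Printing `pairwise_stats` and `aligned_stats` side by side shows the call counts growing with n * m in the first case and with n in the second, which is the empirical counterpart to the Big O analysis.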

You will apply complexity analysis skills to make strategic optimization decisions, evaluating trade-offs between performance, accuracy, and resource constraints in real-world deployment scenarios.

What's included

1 video, 3 assignments

You will learn to systematically evaluate production ML models, identify performance degradation, and implement drift detection systems that automatically trigger remediation actions.

What's included

1 video, 1 reading, 1 assignment, 1 ungraded lab
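
A drift detector that triggers remediation can be as small as a distribution comparison plus a threshold. The sketch below uses the Population Stability Index (PSI) as one possible drift statistic. This is an assumption: the listing does not say which metric the course uses. The 0.2 cutoff is a common rule of thumb, and `needs_retraining` is a hypothetical stand-in for a real retraining trigger.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so live values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(score, threshold=0.2):
    """Hypothetical trigger: flag drift when PSI exceeds the assumed threshold."""
    return score > threshold


reference = [i / 100 for i in range(100)]       # roughly uniform on [0, 1)
stable = [(i + 0.5) / 100 for i in range(100)]  # same distribution, new samples
shifted = [0.5 + i / 200 for i in range(100)]   # mass moved to the upper half

stable_score = psi(reference, stable)
shifted_score = psi(reference, shifted)
```

In a pipeline, `needs_retraining(shifted_score)` returning true would be the point at which an automated job is queued, while the stable sample leaves the deployed model in place.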

You will build comprehensive automated ML pipelines with integrated hyperparameter optimization and end-to-end automation that maintains model performance in production environments.

What's included

2 videos, 1 reading, 3 assignments

2 videos (15 minutes total)

End-to-End ML Pipeline Architecture and Components (7 minutes)
Building Automated ML Pipelines with Ray Tune and MLflow (8 minutes)

1 reading (10 minutes total)

Hyperparameter Optimization Strategies and Integration Patterns (10 minutes)

3 assignments (28 minutes total)

Final Course Assessment - Automated ML Operations (10 minutes)
Enterprise ML Pipeline Implementation (15 minutes)
Automated ML Pipeline Mastery Assessment (3 minutes)

You will build foundational skills for systematically analyzing multimodal AI model outputs, understanding cross-modal relationships, and preparing technical findings for stakeholder communication.

What's included

2 videos, 1 reading, 1 assignment, 1 ungraded lab

2 videos (10 minutes total)

The Business Impact of Multimodal AI Interpretation (3 minutes)
Explainability Tools and Techniques for Multimodal Analysis (7 minutes)

1 reading (10 minutes total)

Understanding Multimodal AI Model Architecture and Output Patterns (10 minutes)

1 assignment (3 minutes total)

Multimodal Analysis Fundamentals Knowledge Check (3 minutes)

1 ungraded lab (20 minutes total)

Multimodal AI Model Analysis for Business Stakeholders (20 minutes)

You will learn the critical skills of translating complex multimodal AI analysis into compelling business narratives, creating executive-level presentations, and developing stakeholder communication frameworks that drive strategic decisions.

What's included

2 videos, 1 reading, 3 assignments

2 videos (11 minutes total)

When Technical Excellence Isn't Enough: The Communication Gap in AI (3 minutes)
Creating Executive Briefings from Technical AI Analysis (8 minutes)

1 reading (10 minutes total)

Business Narrative Frameworks for AI Insights (10 minutes)

3 assignments (38 minutes total)

Comprehensive Multimodal AI Analysis and Stakeholder Communication Assessment (15 minutes)
Developing Comprehensive Executive Briefing from Multimodal Analysis (20 minutes)
Stakeholder Communication Fundamentals Knowledge Check (3 minutes)

You will design and implement versioned API endpoints specifically optimized for multimodal AI inference workloads.

What's included

3 videos, 1 reading, 2 assignments

3 videos (15 minutes total)

Why API Versioning Matters for Multimodal AI Services (3 minutes)
Fundamentals of Multimodal API Endpoint Design (7 minutes)
Implementing Versioned Endpoints with FastAPI (4 minutes)

1 reading (10 minutes total)

Designing Robust Data Contracts for Multimodal Inputs (10 minutes)

2 assignments (21 minutes total)

Build a Versioned Multimodal API Prototype (18 minutes)
API Endpoint Design Knowledge Check (3 minutes)

You will implement comprehensive OAuth2 authentication systems and observability middleware for production API services.

What's included

2 videos, 1 reading, 2 assignments

2 videos (14 minutes total)

OAuth2 Authentication and API Security Fundamentals (7 minutes)
Implementing OAuth2 Security Middleware with FastAPI (7 minutes)

1 reading (12 minutes total)

Implementing Comprehensive API Monitoring and Observability (12 minutes)

2 assignments (23 minutes total)

Build Comprehensive Security and Monitoring Middleware (20 minutes)
Security and Monitoring Implementation Knowledge Check (3 minutes)

You will create comprehensive OpenAPI specifications that enable automated testing, client generation, and seamless integration.

What's included

2 videos, 1 reading, 2 assignments, 1 ungraded lab

2 videos (12 minutes total)

Why Comprehensive API Documentation Drives Developer Adoption (4 minutes)
Advanced OpenAPI Features for Multimodal APIs (8 minutes)

1 reading (11 minutes total)

OpenAPI Specification Design for Developer Integration (11 minutes)

2 assignments (18 minutes total)

Comprehensive OpenAPI Documentation Assessment (15 minutes)
OpenAPI Documentation Knowledge Check (3 minutes)

1 ungraded lab (20 minutes total)

OpenAPI Specification for Multimodal AI Services (20 minutes)

You will build a production-grade multimodal AI system that processes visual and textual data, integrating fine-tuning, cross-modal fusion, and deployment-ready inference services. This capstone synthesizes model optimization, data engineering, API design, and MLOps practices to deliver a deployable, monitored multimodal application.