Learn to build AI that sees, hears, and understands the world in an integrated way. This course takes you beyond single-modality models, teaching you to architect applications that connect different data types like text, images, and speech.

Multimodal and cross-modal AI integrations
本课程是 Microsoft Generative AI Engineering 专业证书 的一部分

位教师: Microsoft
访问权限由 New York State Department of Labor 提供
您将获得的技能
要了解的详细信息

添加到您的领英档案
24 项作业
January 2026
了解顶级公司的员工如何掌握热门技能

积累 Software Development 领域的专业知识
- 向行业专家学习新概念
- 获得对主题或工具的基础理解
- 通过实践项目培养工作相关技能
- 通过 Microsoft 获得可共享的职业证书

该课程共有4个模块
This module introduces the foundational concepts of multimodal AI. You will learn the architectural patterns for combining different AI components, such as text and image models, and progress from basic integration to building complex systems that can reason across multiple data types.
涵盖的内容
4个视频9篇阅读材料7个作业
This module provides a deep dive into the popular and creative task of generating images from text descriptions. You will explore the models that power this technology, like DALL·E, and learn both basic and advanced prompting techniques to craft and refine specific, high-quality visual outputs.
涵盖的内容
5个视频5篇阅读材料5个作业
This module focuses on practical implementation using a powerful, specialized tool. You will leverage the features of Azure AI Vision to build and optimize cross-modal applications like image captioning and visual search. You'll learn how this single service can analyze visual content to generate rich textual descriptions and extract embedded text (OCR), providing the core components for sophisticated multimodal solutions.
涵盖的内容
7个视频6篇阅读材料7个作业
This capstone module builds upon your deep expertise in Azure AI Vision. You will learn to integrate your vision applications with other powerful Azure AI Services, such as Language and Speech, to create comprehensive, end-to-end solutions. The focus will be on orchestrating these distinct services to develop a sophisticated application that solves a real-world business problem, demonstrating your ability to design and build a complete multimodal system from the ground up.
涵盖的内容
6个视频5篇阅读材料5个作业
获得职业证书
将此证书添加到您的 LinkedIn 个人资料、简历或履历中。在社交媒体和绩效考核中分享。
人们为什么选择 Coursera 来帮助自己实现职业发展

Felipe M.

Jennifer J.

Larry W.







