Coursera

Vision & Audio AI Systems Specialization

Build Multimodal AI for Vision and Audio. Design, debug, and deploy AI systems that unify visual and audio data processing.

Instructor: Hurix Digital

Included with Coursera Plus

Gain in-depth subject knowledge
Advanced level

Recommended experience

4 weeks to complete
at 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Design preprocessing pipelines for image, video, and audio data that transform raw inputs into model-ready features.

  • Implement cross-modal retrieval systems and fusion algorithms that unify visual and audio information effectively.

  • Debug and optimize multimodal AI systems through systematic error analysis and performance tuning techniques.

Details to know

Shareable certificate

Add to your LinkedIn profile

Taught in English
Recently updated!

January 2026

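See how employees at top companies are mastering in-demand skills

Logos of Petrobras, TATA, Danone, Capgemini, P&G, and L'Oreal

Advance your subject-matter expertise

  • Learn in-demand skills from university and industry experts
  • Master a subject or tool with hands-on projects
  • Develop a deep understanding of key concepts
  • Earn a career certificate from Coursera

Specialization - 10 course series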

What you'll learn

  • Multimodal architecture needs encoder-fusion-decoder pipelines balancing computational efficiency with cross-modal understanding capabilities.

  • Transfer learning transforms AI by enabling rapid adaptation of pre-trained knowledge to new domains with minimal data and training requirements.

  • Fine-tuning balances knowledge preservation and task adaptation through careful hyperparameter selection and strategic layer freezing techniques.

  • Production multimodal systems require systematic optimization approaches considering both model performance and computational resource constraints.

Skills you'll gain

Category: Deep Learning
Category: Knowledge Transfer
Category: Model Deployment
Category: TensorFlow
Category: Artificial Neural Networks
Category: Keras (Neural Network Library)
Category: PyTorch (Machine Learning Library)

What you'll learn

  • Training and validation metric divergence patterns are reliable indicators of overfitting that require early intervention to avoid model degradation.

  • Gradient magnitude tracking during backpropagation reveals critical stability issues that can be systematically diagnosed and corrected.

  • Proactive diagnostic workflows using visualization tools like TensorBoard enable timely interventions that save significant computational resources.

  • Successful model development depends on establishing continuous monitoring practices that catch training failures before they become costly problems.
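
The divergence pattern named in the first point can be checked mechanically. The following is a minimal sketch, not the course's own method: the function name `detect_divergence`, the `patience` parameter, and the loss values are all hypothetical, chosen for illustration. It flags the epoch at which validation loss has risen for `patience` consecutive epochs while training loss kept falling.

```python
def detect_divergence(train_loss, val_loss, patience=3):
    """Return the index of the first epoch where validation loss has risen
    for `patience` consecutive epochs while training loss kept falling,
    or None if no such pattern occurs."""
    streak = 0
    for epoch in range(1, min(len(train_loss), len(val_loss))):
        train_improving = train_loss[epoch] < train_loss[epoch - 1]
        val_worsening = val_loss[epoch] > val_loss[epoch - 1]
        # Count consecutive epochs showing the overfitting signature.
        streak = streak + 1 if (train_improving and val_worsening) else 0
        if streak >= patience:
            return epoch
    return None


# Hypothetical per-epoch losses (illustrative values, not real results).
train = [1.00, 0.80, 0.65, 0.52, 0.41, 0.33, 0.26]
val = [1.05, 0.90, 0.82, 0.85, 0.89, 0.94, 1.01]
flagged_epoch = detect_divergence(train, val, patience=3)
```

A check like this could run at the end of each epoch to trigger early stopping or a checkpoint rollback; the right `patience` depends on how noisy the validation metric is.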

Skills you'll gain

Category: Analysis
Category: Applied Machine Learning
Category: Performance Analysis

What you'll learn

  • Systematic error analysis uncovers specific failure modes and root causes that guide focused model improvements.

  • Confusion matrices and error categories reveal class-level model strengths and weaknesses.

  • Visualizing predictions with ground truth adds qualitative insight to complement numeric metrics.

  • Linking errors to data traits enables targeted data collection and model tuning for stronger robustness.
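
As a rough illustration of the class-level analysis described above, the sketch below builds confusion counts with the standard library and ranks the off-diagonal pairs. The helper names and the toy labels are hypothetical; a real workflow would more likely use a library routine and a much larger evaluation set.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs."""
    return Counter(zip(y_true, y_pred))


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    counts = confusion_counts(y_true, y_pred)
    support = Counter(y_true)
    return {label: counts[(label, label)] / n for label, n in support.items()}


def top_confusions(y_true, y_pred, k=3):
    """Most frequent off-diagonal pairs, i.e. the dominant error categories."""
    errors = Counter({
        pair: n
        for pair, n in confusion_counts(y_true, y_pred).items()
        if pair[0] != pair[1]
    })
    return errors.most_common(k)


# Toy labels for illustration only.
y_true = ["cat", "cat", "cat", "dog", "dog", "bird", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "dog", "bird", "cat", "bird"]
recall = per_class_recall(y_true, y_pred)
worst = top_confusions(y_true, y_pred, k=1)
```

Here the dominant error category is "cat" predicted as "dog", which is the kind of finding that would point toward collecting more cat examples or inspecting those specific images.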

Skills you'll gain

Category: Computer Vision
Category: Model Evaluation
Category: Exploratory Data Analysis
Category: Analysis
Category: Failure Mode And Effects Analysis
Category: Image Analysis
Category: Quality Assurance
Category: Root Cause Analysis
Category: Debugging
Category: Data Visualization
Category: Statistical Reporting

Unify Modalities: Cross-Modal Retrieval

Course 4, 2 hours

What you'll learn

  • Cross-modal retrieval aligns vector spaces to bridge semantic gaps between text, images, and other data types.

  • ANN tools like FAISS enable fast similarity search across millions of embeddings with production-scale performance.

  • Attention mechanisms fuse visual and textual features by learning contextual relationships across multiple representations.

  • Multimodal systems balance accuracy, speed, and memory through careful index choice and parameter tuning.
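
To make the retrieval idea concrete, here is a minimal sketch of exact cosine-similarity search over a shared embedding space. It is deliberately not FAISS: it scores every item exhaustively, which is the baseline that approximate nearest-neighbor indexes trade accuracy against for speed and memory. The embeddings, item IDs, and function names are hypothetical, and the sketch assumes text and image vectors have already been aligned by some upstream model.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query, index, k=2):
    """Exact cosine-similarity search: score every item, keep the best k."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), item_id)
        for item_id, vec in index.items()
    ]
    return [item_id for _, item_id in sorted(scored, reverse=True)[:k]]


# Hypothetical 3-dimensional embeddings in a shared text-image space.
image_index = {
    "img_dog": [0.9, 0.1, 0.0],
    "img_cat": [0.1, 0.9, 0.0],
    "img_car": [0.0, 0.1, 0.9],
}
text_query = [0.8, 0.3, 0.1]
results = top_k(text_query, image_index, k=2)
```

Exhaustive search costs time proportional to the number of items times the embedding dimension per query, which is why index choice matters once the collection reaches millions of embeddings.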

Skills you'll gain

Category: Vector Databases
Category: PyTorch (Machine Learning Library)
Category: Artificial Intelligence and Machine Learning (AI/ML)
Category: Image Analysis
Category: Embeddings
Category: Transfer Learning
Category: Performance Tuning
Category: Applied Machine Learning
Category: Vision Transformer (ViT)

Analyze and Optimize Fusion Algorithms

Course 5, 2 hours

What you'll learn

  • Systematic complexity analysis with Big O notation for time and space is fundamental to predicting performance in scalable AI system design.

  • Trade-off evaluation between speed and memory usage requires formal assessment methodologies rather than intuitive guessing.

  • Resource optimization decisions must be grounded in empirical profiling data combined with theoretical complexity analysis.

  • Algorithm selection for deployment environments requires matching complexity profiles to specific hardware constraints and performance requirements.

Skills you'll gain

Category: Algorithms
Category: Systems Analysis
Category: Scalability
Category: Resource Utilization

What you'll learn

  • Image preprocessing with normalization and color-space conversion ensures stable training and consistent performance across visuals.

  • Motion features from optical flow and frame differencing help systems learn temporal dynamics for tracking and action tasks.

  • Strong preprocessing improves model accuracy and training efficiency, making it essential in any vision pipeline.

  • Mastering pixel changes and motion patterns enables advanced AI systems to understand dynamic visual scenes.
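
The two preprocessing ideas above, pixel normalization and frame differencing, can be sketched in a few lines. This example uses plain nested lists to stay self-contained; in practice NumPy arrays would be the usual choice. The function names, the `mean`/`std` constants, and the `threshold` value are illustrative assumptions, not values prescribed by the course.

```python
def normalize_frame(frame, mean=127.5, std=127.5):
    """Map 8-bit pixel values (0..255) to the range [-1, 1]."""
    return [[(p - mean) / std for p in row] for row in frame]


def frame_difference(prev, curr, threshold=30):
    """Binary motion mask: 1 where the absolute pixel change exceeds threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


# Two tiny grayscale frames: a bright pixel moves one step to the right.
frame_a = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
frame_b = [[10, 10, 10], [10, 10, 200], [10, 10, 10]]
mask = frame_difference(frame_a, frame_b)
motion_pixels = sum(map(sum, mask))
```

The mask marks both the position the bright pixel left and the position it arrived at, which is why frame differencing shows where change happened but not its direction; optical flow is the technique that recovers direction.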

Skills you'll gain

Category: Data Preprocessing
Category: Convolutional Neural Networks
Category: Image Analysis
Category: Real Time Data
Category: Computer Vision
Category: Data Transformation
Category: NumPy

What you'll learn

  • Raw audio waveforms must be transformed into structured numerical representations to enable effective processing by machine learning models.

  • Spectral features (STFT, MFSCs) and cepstral features (MFCCs) capture complementary signal information that supports ML classification, detection, and recognition.

  • Noise injection, time-shifting, pitch modification & speed adjustment improve model generalization in real-world acoustic environments.

  • Automated audio augmentation pipelines are essential for production-ready AI systems ensuring reliable performance across diverse conditions.
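
Two of the augmentations listed above, time-shifting and noise injection, are simple enough to sketch with the standard library. The function names, the 440 Hz test tone, and the 20 dB target are hypothetical choices for illustration; pitch and speed modification need resampling and are left out of this sketch.

```python
import math
import random


def time_shift(samples, shift):
    """Circularly shift the waveform by `shift` samples."""
    shift %= len(samples)
    return samples[-shift:] + samples[:-shift] if shift else list(samples)


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise scaled to a target signal-to-noise ratio."""
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


rng = random.Random(0)  # fixed seed so the augmentation is reproducible
sample_rate = 8000
# One second of a 440 Hz sine tone as a stand-in for real audio.
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
shifted = time_shift(tone, 100)
noisy = add_noise(tone, snr_db=20, rng=rng)
```

Specifying noise by target SNR rather than by a fixed amplitude keeps the augmentation strength comparable across recordings with different loudness.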

Skills you'll gain

Category: Digital Signal Processing
Category: Data Manipulation
Category: Time Series Analysis and Forecasting
Category: Applied Machine Learning
Category: System Design and Implementation
Category: NumPy
Category: Data Wrangling
Category: Data Pipelines
Category: Model Evaluation
Category: Feature Engineering
Category: Data Transformation
Category: Data Preprocessing

What you'll learn

  • Performance monitoring needs quantitative metrics and audio sample analysis to understand model behaviour and failures.

  • Audio failures often link to environmental conditions found through spectrogram and signal quality analysis.

  • Effective debugging combines statistical measures with audio analysis techniques for actionable insights.

  • Root cause analysis requires understanding data quality, environmental factors, and model architecture relationships.

Skills you'll gain

Category: Analysis
Category: Data Preprocessing
Category: Software Visualization
Category: Quantitative Research
Category: Debugging
Category: Performance Analysis
Category: Root Cause Analysis
Category: Model Evaluation
Category: Exploratory Data Analysis
Category: Performance Tuning

What you'll learn

  • Unified data schemas with common metadata fields enable efficient querying and joining of diverse data types for machine learning applications.

  • DAG-based orchestration platforms enable reliable data pipelines with built-in dependency control and robust error handling.

  • Strategic indexing and data type selection in schema design directly impacts storage efficiency and retrieval performance for ML training at scale.

  • Automated ETL with scheduling and monitoring converts raw multimodal data into ML-ready features while reducing manual effort.
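
The dependency control that DAG-based orchestration provides can be illustrated with Python's standard `graphlib` module (Python 3.9+). This is a sketch of the underlying idea only, not the API of Apache Airflow or any other orchestration platform; the task names and the `run` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key is a task; its value is the set of tasks it depends on.
pipeline = {
    "extract_audio": set(),
    "extract_video": set(),
    "compute_spectrograms": {"extract_audio"},
    "sample_frames": {"extract_video"},
    "join_on_clip_id": {"compute_spectrograms", "sample_frames"},
    "write_features": {"join_on_clip_id"},
}


def run(pipeline, tasks):
    """Execute tasks in dependency order; stop at the first failure."""
    completed = []
    for name in TopologicalSorter(pipeline).static_order():
        try:
            tasks[name]()
        except Exception as exc:
            # Report which task failed and what had already finished.
            raise RuntimeError(f"task {name!r} failed after {completed}") from exc
        completed.append(name)
    return completed


# Placeholder callables stand in for real extract/transform/load steps.
tasks = {name: (lambda: None) for name in pipeline}
order = run(pipeline, tasks)
```

A production orchestrator adds scheduling, retries, and monitoring on top of this ordering guarantee, but the core contract is the same: no task starts before everything it depends on has finished.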

Skills you'll gain

Category: Data Pipelines
Category: Data Storage
Category: Database Design
Category: Scalability
Category: Data Modeling
Category: Data Integration
Category: Apache Airflow
Category: AI Workflows
Category: Extract, Transform, Load
Category: Data Architecture
Category: Workflow Management
Category: Data Quality
Category: Feature Engineering

Validate Multimodal Data: Ensure Quality

Course 10, 1 hour

What you'll learn

  • Data quality is the foundation of reliable multimodal AI systems: poor-quality input inevitably leads to poor system performance.

  • Systematic validation across modalities requires understanding the technical alignment (timestamps, IDs) and semantic consistency (content matching).

  • Automated validation pipelines are essential for scaling multimodal data operations and catching issues before they propagate to model training.

  • Cross-modal integrity checks must be designed with domain-specific knowledge about how different data types should relate to each other properly.
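
The technical-alignment half of the validation described above (IDs and timestamps) might look like the following sketch. The record layout (`clip_id`, `start_s`), the function name, and the 50 ms tolerance are assumptions made for illustration; semantic consistency checks, such as whether the audio content matches the video content, need domain-specific logic and are not covered here.

```python
def validate_pairs(video_records, audio_records, max_offset_s=0.05):
    """Return a list of (clip_id, issue) tuples for records that fail
    ID matching or timestamp alignment."""
    issues = []
    audio_by_id = {rec["clip_id"]: rec for rec in audio_records}
    video_ids = set()
    for video in video_records:
        clip_id = video["clip_id"]
        video_ids.add(clip_id)
        audio = audio_by_id.get(clip_id)
        if audio is None:
            issues.append((clip_id, "missing audio"))
        elif abs(video["start_s"] - audio["start_s"]) > max_offset_s:
            issues.append((clip_id, "timestamp offset too large"))
    # Audio records with no matching video are also integrity failures.
    for clip_id in sorted(audio_by_id.keys() - video_ids):
        issues.append((clip_id, "missing video"))
    return issues


# Hypothetical metadata records for illustration.
video = [
    {"clip_id": "a", "start_s": 0.00},
    {"clip_id": "b", "start_s": 5.00},
    {"clip_id": "c", "start_s": 9.00},
]
audio = [
    {"clip_id": "a", "start_s": 0.01},
    {"clip_id": "b", "start_s": 5.40},
    {"clip_id": "d", "start_s": 12.0},
]
issues = validate_pairs(video, audio)
```

Running a check like this before training starts is what keeps a misaligned or orphaned record from silently propagating into the model.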

Skills you'll gain

Category: Data Integrity
Category: Reconciliation
Category: Verification And Validation
Category: Auditing
Category: Debugging

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Hurix Digital
Coursera
283 courses, 19,711 learners

Offered by

Coursera
