Preparing Multimodal Data: Vision, Audio, and NLP Pipelines

This course is part of the Multimodal Intelligence - Vision, Audio & Language in Action Professional Certificate

Instructor: Professionals from the Industry

13 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

1 week to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Preprocess images and video using normalization, color-space conversion, and motion extraction techniques.
Build audio feature extraction and augmentation pipelines using MFCCs and spectral transforms.
Fine-tune transformer models and construct text preprocessing pipelines for NLP applications.
Evaluate and debug multimodal AI models using automatic metrics and human-in-the-loop frameworks.

Skills you'll gain

Category: Data Processing
Category: Machine Learning Algorithms
Category: Machine Learning Methods
Category: Artificial Intelligence and Machine Learning (AI/ML)
Category: Image Quality
Category: Computer Vision
Category: Data Preprocessing
Category: Feature Engineering
Category: Model Evaluation
Category: Machine Learning Software
Category: Data Architecture
Category: Model Training
Category: Image Analysis
Category: Data Pipelines
Category: Digital Signal Processing
Category: Natural Language Processing
Category: Fine-tuning
Category: Data Transformation
Category: Artificial Neural Networks

Tools you'll learn

Category: Hugging Face

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

March 2026

Assessments

23 assignments¹

AI-graded (see disclaimer)

Taught in English

Build your Software Development expertise

This course is part of the Multimodal Intelligence - Vision, Audio & Language in Action Professional Certificate

When you enroll in this course, you are also enrolled in this Professional Certificate.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate from Coursera

There are 13 modules in this course

Raw images, audio clips, and text are only valuable when transformed into formats that AI models can actually use. This intermediate course equips you with the hands-on skills to build multimodal data processing pipelines across three core data types — visual, audio, and language — and to evaluate the AI models trained on them.

You will preprocess and enhance image data using normalization, color-space conversion, and quality correction techniques. You will extract motion features from video using optical flow and frame differencing. On the audio side, you will apply spectral and cepstral feature extraction and build augmentation pipelines that improve model robustness. For language, you will fine-tune transformer models on domain-specific datasets and construct end-to-end text preprocessing pipelines using industry-standard tools. Grounded in real-world job tasks from machine learning and AI roles, this course prepares you to take raw, unstructured data and shape it into training-ready inputs — a skill in high demand across AI, computer vision, speech, and NLP teams.

You will learn the foundational image preprocessing techniques essential for computer vision applications, including normalization methods and color-space conversions that ensure consistent model performance across diverse visual conditions.

What's included

1 video, 2 readings, 2 assignments
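
A minimal sketch of the two preprocessing steps named in this module, per-channel normalization and a color-space conversion, written with NumPy only. The function names, the synthetic 4x4 image, and the choice of RGB-to-grayscale as the conversion are assumptions made for illustration, not material taken from the course.

```python
import numpy as np

def normalize_image(img_u8, mean, std):
    """Scale uint8 pixels to [0, 1], then standardize each channel."""
    x = img_u8.astype(np.float32) / 255.0
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    return (x - mean) / std

def rgb_to_gray(img_u8):
    """Luma-weighted grayscale using the ITU-R BT.601 coefficients."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return img_u8.astype(np.float32) @ weights

# Synthetic 4x4 RGB image standing in for a real photo (assumption).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Statistics are computed from this one image for the demo; in practice
# they would usually be computed once over the training set.
scaled = image.astype(np.float32) / 255.0
mean = scaled.mean(axis=(0, 1))
std = scaled.std(axis=(0, 1))

normalized = normalize_image(image, mean, std)  # shape (4, 4, 3)
gray = rgb_to_gray(image)                       # shape (4, 4)
```

Reusing the same mean and standard deviation at training and inference time is what keeps model inputs consistent across lighting conditions.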

You will learn motion analysis techniques essential for dynamic computer vision applications, implementing optical flow algorithms and frame differencing methods to extract temporal features from video sequences for applications like object tracking and action recognition.

What's included

1 video, 2 readings, 2 assignments, 1 ungraded lab

1 video (total 11 minutes)

Optical Flow Algorithms and Frame Differencing Mathematics (11 minutes)

2 readings (total 18 minutes)

Motion Vector Analysis and Performance Optimization (10 minutes)
How to Implement Optical Flow with OpenCV and NumPy (8 minutes)

2 assignments (total 13 minutes)

Motion Detection and Optical Flow Fundamentals Knowledge Check (3 minutes)
Comprehensive Motion Analysis Assessment (10 minutes)

1 ungraded lab (total 20 minutes)

Implement Motion-Based Object Tracking System (20 minutes)
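
Frame differencing, the simpler of the two motion techniques in this module, can be sketched with NumPy alone. The threshold of 25, the helper names, and the synthetic 8x8 frames below are illustrative assumptions. Optical flow is not shown here.

```python
import numpy as np

def frame_difference_mask(prev_gray, curr_gray, threshold=25):
    """Boolean mask of pixels whose intensity changed by more than threshold."""
    # Cast to a signed type first so uint8 subtraction cannot wrap around.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return diff > threshold

def motion_centroid(mask):
    """Centroid (row, col) of changed pixels, or None if nothing changed."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return float(rows.mean()), float(cols.mean())

# Two synthetic grayscale frames: a bright 2x2 block moves 3 pixels right.
prev_frame = np.zeros((8, 8), dtype=np.uint8)
curr_frame = np.zeros((8, 8), dtype=np.uint8)
prev_frame[3:5, 1:3] = 200
curr_frame[3:5, 4:6] = 200

mask = frame_difference_mask(prev_frame, curr_frame)
centroid = motion_centroid(mask)
```

The mask marks both the pixels the block left and the pixels it entered, so the centroid lies between the old and new positions. A tracker would usually compare against a background model or use more than two frames to separate them.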

You will learn systematic diagnostic techniques to identify and categorize common image quality issues in computer vision datasets.

What's included

2 videos, 1 reading, 2 assignments
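
One way to turn such diagnostics into code is to compute a few cheap statistics per image and compare them with thresholds. The sketch below uses the variance of a Laplacian as a blur indicator, mean intensity for exposure, and standard deviation for contrast. The issue labels and every threshold are assumptions chosen for illustration and would need tuning on a real dataset.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; low values suggest blur."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def diagnose(gray, blur_thresh=100.0, dark_thresh=50.0,
             bright_thresh=205.0, contrast_thresh=20.0):
    """Return a list of issue labels. All thresholds are illustrative."""
    issues = []
    if laplacian_variance(gray) < blur_thresh:
        issues.append("blurry")
    mean = float(gray.mean())
    if mean < dark_thresh:
        issues.append("underexposed")
    elif mean > bright_thresh:
        issues.append("overexposed")
    if float(gray.std()) < contrast_thresh:
        issues.append("low_contrast")
    return issues

# Synthetic examples: a sharp checkerboard and a flat dark frame.
checker = (np.indices((16, 16)).sum(axis=0) % 2 * 255).astype(np.uint8)
flat_dark = np.full((16, 16), 10, dtype=np.uint8)

checker_issues = diagnose(checker)
dark_issues = diagnose(flat_dark)
```

Running `diagnose` over a whole dataset and counting the labels gives a rough picture of which quality problems dominate before any correction is attempted.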

You will implement specific algorithmic solutions to correct identified image quality issues and validate improvements using quantitative metrics.

What's included

2 videos, 1 reading, 2 assignments, 1 ungraded lab

2 videos (total 10 minutes)

Why Algorithmic Enhancement Saves Production Deployments (3 minutes)
Algorithmic Enhancement Techniques Overview (7 minutes)

1 reading (total 7 minutes)

Implementing Unsharp Masking for Blur Correction (7 minutes)

2 assignments (total 13 minutes)

Apply Targeted Mitigation Techniques (3 minutes)
Image Quality Enhancement Mastery Assessment (10 minutes)

1 ungraded lab (total 18 minutes)

Algorithmic Image Enhancement: Deblurring, Denoising, and Histogram Correction (18 minutes)
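
Unsharp masking, named in the reading for this module, adds back a scaled copy of the detail that a blur removes: sharpened = original + amount * (original - blurred). The sketch below is an assumption-laden illustration. It substitutes a 3x3 box blur for the Gaussian blur that is more commonly used, and it pairs the filter with PSNR as one possible quantitative check.

```python
import numpy as np

def box_blur(gray, k=3):
    """Mean filter with edge replication (a stand-in for a Gaussian blur)."""
    pad = k // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def unsharp_mask(gray, amount=1.0, k=3):
    """Sharpen by adding back the difference between the image and its blur."""
    g = gray.astype(np.float64)
    sharpened = g + amount * (g - box_blur(gray, k))
    return np.clip(np.rint(sharpened), 0, 255).astype(np.uint8)

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Synthetic vertical step edge: left half 100, right half 150.
edge = np.full((6, 6), 100, dtype=np.uint8)
edge[:, 3:] = 150
sharpened = unsharp_mask(edge)
```

On the step edge, the pixel just left of the boundary gets darker and the pixel just right of it gets brighter, which is the overshoot that makes edges look crisper. PSNR needs a clean reference image, so it fits validation on data where the degradation was simulated.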

You will transform raw audio waveforms into numerical features for machine learning. You will apply spectral analysis techniques such as the short-time Fourier transform (STFT) and mel-frequency spectral coefficients (MFSCs), then use cepstral analysis methods such as mel-frequency cepstral coefficients (MFCCs) to extract richer representations.

What's included

3 videos, 1 reading, 2 assignments

3 videos (total 18 minutes)

Why Audio Feature Extraction Matters in Production ML Systems (2 minutes)
Spectral Analysis Fundamentals: STFT and Mel-Scale Features (8 minutes)
Computing MFCCs with Librosa: Step-by-Step Implementation (7 minutes)

1 reading (total 7 minutes)

Cepstral Analysis and MFCC Feature Extraction (7 minutes)

2 assignments (total 21 minutes)

Optimizing MFCC Features for Environmental Sound Recognition (18 minutes)
Spectral and Cepstral Feature Extraction Knowledge Check (3 minutes)
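
The chain from waveform to MFCCs can be sketched from first principles: STFT power spectrum, mel filterbank, logarithm, then a discrete cosine transform. The course videos use Librosa. The NumPy-only version below is an illustrative assumption, and its framing, window, and DCT scaling differ from Librosa's defaults, so the numbers will not match `librosa.feature.mfcc` output. All parameter values are also assumptions.

```python
import numpy as np

def stft_power(signal, n_fft=512, hop=256):
    """Power spectrogram, shape (n_frames, n_fft // 2 + 1), Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0),
                                  n_mels + 2))
    bank = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct2(x, n_out):
    """Unnormalized DCT-II along the last axis, first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sr, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    power = stft_power(signal, n_fft, hop)
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    log_mel = np.log(mel_energy + 1e-10)  # log-mel energies (the MFSC stage)
    return dct2(log_mel, n_mfcc)          # cepstral coefficients

# One second of a synthetic 1 kHz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 1000.0 * np.arange(sr) / sr)
power = stft_power(tone)
coeffs = mfcc(tone, sr)
peak_hz = float(np.argmax(power.mean(axis=0)) * sr / 512)
```

The spectral peak lands on the 1 kHz bin, which is a quick sanity check that the framing and FFT are wired correctly before trusting the downstream coefficients.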

You will design and implement automated augmentation pipelines that apply noise injection, temporal modifications, and spectral transformations to improve model generalization in real-world acoustic environments.

What's included

2 videos, 1 reading, 2 assignments, 1 ungraded lab

2 videos (total 15 minutes)

Audio Augmentation Techniques: Noise, Temporal, and Spectral Transformations (10 minutes)
Building Audio Augmentation Pipelines with Python and Librosa (5 minutes)

1 reading (total 7 minutes)

Designing Robust Augmentation Pipelines for Production Systems (7 minutes)

2 assignments (total 28 minutes)

Audio Augmentation Pipeline Design and Implementation (3 minutes)
Audio Feature Extraction and Augmentation for Production ML Systems (25 minutes)

1 ungraded lab (total 20 minutes)

Build Production-Ready Audio Augmentation Pipelines (20 minutes)
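
The three augmentation families named above (noise injection, temporal modification, spectral transformation) can each be illustrated with a few lines of NumPy. The function names, the 10 dB default, and the circular time shift are assumptions made for this sketch. The course itself builds its pipelines with Librosa.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise scaled to the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    noise_power = np.mean(noise ** 2)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + noise * np.sqrt(target_noise_power / noise_power)

def time_shift(signal, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(signal, shift)

def frequency_mask(spec, start, width):
    """Zero a band of frequency bins in a (freq, time) spectrogram."""
    masked = spec.copy()
    masked[start:start + width, :] = 0.0
    return masked

def augment(signal, rng, snr_db=10.0, max_shift=1600):
    """Random time shift followed by noise injection."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return add_noise_at_snr(time_shift(signal, shift), snr_db, rng)

# Synthetic 440 Hz tone standing in for a real recording.
rng = np.random.default_rng(42)
sr = 16000
clean = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)

noisy = add_noise_at_snr(clean, 10.0, rng)
measured_snr = float(10.0 * np.log10(np.mean(clean ** 2)
                                     / np.mean((noisy - clean) ** 2)))
augmented = augment(clean, rng)
```

Passing the random generator in explicitly keeps the pipeline reproducible, which helps when an augmentation setting later needs to be traced back from a model regression.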

You will learn quantitative performance evaluation techniques for audio models, including calculating industry-standard metrics and identifying degradation patterns across different user cohorts.

What's included

3 videos, 1 reading, 1 assignment, 1 ungraded lab

3 videos (total 20 minutes)

Why Audio Model Performance Monitoring Matters in Production (4 minutes)
Essential Audio Model Performance Metrics and Calculation Methods (8 minutes)
Calculating Performance Metrics with Python for Audio Model Evaluation (9 minutes)

1 reading (total 7 minutes)

Performance Metrics in Production Audio Systems: Industry Applications and Best Practices (7 minutes)

1 assignment (total 8 minutes)

Performance Metrics Evaluation Assessment (8 minutes)

1 ungraded lab (total 18 minutes)

Audio Model Performance Dashboard: Calculating WER and F1-Scores for User Cohort Analysis (18 minutes)
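
Word error rate (WER) and F1, the two metrics named in the lab title, can be computed in plain Python. The sketch below pools edit counts per cohort so that long utterances weigh more than short ones. The cohort names and transcripts are made-up examples, not course data.

```python
from collections import defaultdict

def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (r != h)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]

def word_error_rate(reference, hypothesis):
    """WER = edit distance / number of reference words."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)

def wer_by_cohort(samples):
    """samples: iterable of (cohort, reference, hypothesis) tuples."""
    errors, words = defaultdict(int), defaultdict(int)
    for cohort, reference, hypothesis in samples:
        ref = reference.split()
        errors[cohort] += edit_distance(ref, hypothesis.split())
        words[cohort] += len(ref)
    return {c: errors[c] / words[c] for c in errors}

def f1_score(y_true, y_pred, positive=1):
    """F1 for one positive class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical transcripts grouped by recording condition.
samples = [
    ("quiet", "play some music", "play some music"),
    ("noisy", "turn on the kitchen lights", "turn on kitchen light"),
    ("noisy", "stop", "stop"),
]
cohort_wer = wer_by_cohort(samples)
single_wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
f1 = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Comparing `cohort_wer` values side by side is the simplest form of the cohort analysis the lab describes. A large gap between cohorts is a prompt for further investigation, not a diagnosis in itself.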

You will learn systematic root cause analysis techniques for audio model failures, including qualitative error analysis and environmental factor correlation to implement effective remediation strategies.

What's included

2 videos, 1 reading, 3 assignments

2 videos (total 13 minutes)

Audio Sample Error Analysis Using Spectrograms and Signal Processing Tools (6 minutes)
Implementing Root Cause Investigation Workflow for Production Audio Models (8 minutes)

1 reading (total 7 minutes)

Systematic Root Cause Analysis Framework for Audio Model Debugging (7 minutes)

3 assignments (total 48 minutes)

Complete Audio Model Debugging Investigation and Remediation Plan (20 minutes)
Root Cause Analysis and Systematic Debugging Assessment (3 minutes)
Comprehensive Audio Model Debugging and Root Cause Analysis Evaluation (25 minutes)
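
One simple starting point for environmental factor correlation is to group prediction errors by each metadata field and see which field separates high-error from low-error conditions most strongly. The sketch below is a crude heuristic under stated assumptions: the field names, the records, and the "spread" score are all hypothetical, and a large spread suggests where to look rather than proving a cause.

```python
from collections import defaultdict

def error_rate_by_factor(records, factor):
    """Group by one metadata field; return {value: (errors, total, rate)}."""
    counts = defaultdict(lambda: [0, 0])
    for rec in records:
        bucket = counts[rec[factor]]
        bucket[0] += int(rec["error"])
        bucket[1] += 1
    return {value: (e, n, e / n) for value, (e, n) in counts.items()}

def rank_factors(records, factors):
    """Rank factors by the gap between their worst and best error rates."""
    spreads = {}
    for factor in factors:
        rates = [rate for _, _, rate
                 in error_rate_by_factor(records, factor).values()]
        spreads[factor] = max(rates) - min(rates)
    return sorted(spreads.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical evaluation log: one record per audio sample.
records = [
    {"environment": "street", "device": "phone", "error": True},
    {"environment": "street", "device": "phone", "error": True},
    {"environment": "street", "device": "headset", "error": True},
    {"environment": "street", "device": "headset", "error": False},
    {"environment": "quiet", "device": "phone", "error": False},
    {"environment": "quiet", "device": "phone", "error": False},
    {"environment": "quiet", "device": "headset", "error": True},
    {"environment": "quiet", "device": "headset", "error": False},
]

by_environment = error_rate_by_factor(records, "environment")
ranking = rank_factors(records, ["environment", "device"])
```

With real logs, each bucket needs enough samples for its rate to be meaningful, and the top-ranked factor is best confirmed by listening to failing samples and inspecting their spectrograms.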

You will learn the process of adapting pre-trained BERT models for specialized domains using Hugging Face Transformers, achieving production-ready performance on domain-specific tasks.

What's included

3 videos, 1 reading, 1 assignment

3 videos (total 17 minutes)

Why Domain-Specific Language Models Transform Business Intelligence (3 minutes)
Understanding Transformer Fine-Tuning Architecture and Process (7 minutes)
Implementing BERT Fine-Tuning with Hugging Face Trainer (7 minutes)

1 reading (total 10 minutes)

Hugging Face Transformers Framework and Fine-Tuning Components (10 minutes)

1 assignment (total 3 minutes)

Fine-Tuning Transformer Models Knowledge Check (3 minutes)
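
A hedged sketch of how a BERT fine-tuning run is typically assembled with the Hugging Face `Trainer`. The checkpoint name `bert-base-uncased`, the hyperparameters, the output directory, and the `build_trainer` helper are assumptions for illustration, not the course's settings. The datasets are assumed to be already tokenized and padded to a fixed length, with `input_ids`, `attention_mask`, and `labels` columns.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy from the (logits, labels) pair that Trainer passes in."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == np.asarray(labels)).mean())}

def build_trainer(train_dataset, eval_dataset, num_labels,
                  model_name="bert-base-uncased",
                  output_dir="bert-finetuned"):
    """Assemble a Trainer for sequence classification (hypothetical helper)."""
    # Imported lazily so compute_metrics works without transformers installed.
    from transformers import (AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    # Loads the pre-trained encoder and adds a fresh classification head.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels)

    # Illustrative hyperparameters; tune them for the target domain.
    args = TrainingArguments(
        output_dir=output_dir,
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        num_train_epochs=3,
        weight_decay=0.01,
    )
    return Trainer(model=model, args=args,
                   train_dataset=train_dataset,
                   eval_dataset=eval_dataset,
                   compute_metrics=compute_metrics)

# Quick check of the metric callback on toy logits.
toy_logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
toy_labels = np.array([1, 0, 0])
toy_metrics = compute_metrics((toy_logits, toy_labels))
```

Calling `build_trainer(...).train()` would start fine-tuning and `.evaluate()` would report the accuracy defined above. Both need the `transformers` package, a backend such as PyTorch, and access to the checkpoint.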

You will build comprehensive text preprocessing pipelines using spaCy that transform raw text into analysis-ready formats through systematic tokenization, normalization, and encoding workflows.

What's included

2 videos, 1 reading, 2 assignments, 1 ungraded lab

2 videos (total 14 minutes)

Building Text Preprocessing Pipelines with spaCy Components (9 minutes)
Creating Automated Text Preprocessing Pipelines with spaCy (5 minutes)

1 reading (total 10 minutes)

spaCy Framework and Text Processing Components (10 minutes)

2 assignments (total 15 minutes)

Text Preprocessing Pipeline Knowledge Check (3 minutes)
Comprehensive NLP Fine-Tuning and Text Preprocessing Assessment (12 minutes)

1 ungraded lab (total 20 minutes)

Build Production-Ready Text Preprocessing Pipelines with spaCy (20 minutes)
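
The normalization, tokenization, and encoding stages can be sketched as three small functions. The version below is one possible arrangement, not the course's pipeline. It uses a blank English spaCy pipeline (`spacy.blank("en")`), which tokenizes without downloading a trained model, and the helper names and special tokens are assumptions.

```python
import re
import unicodedata

def normalize_text(text):
    """Unicode-normalize (NFKC), lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()

def tokenize_with_spacy(texts, drop_stop_words=True):
    """Tokenize normalized texts with a blank English spaCy pipeline."""
    import spacy  # imported lazily; only this step needs spaCy installed

    nlp = spacy.blank("en")  # tokenizer only, no trained components
    results = []
    for doc in nlp.pipe(normalize_text(t) for t in texts):
        tokens = [tok.text for tok in doc
                  if not tok.is_punct and not tok.is_space
                  and not (drop_stop_words and tok.is_stop)]
        results.append(tokens)
    return results

def build_vocab(token_lists, specials=("<pad>", "<unk>")):
    """Map each token to an integer id, reserving ids for special tokens."""
    vocab = {s: i for i, s in enumerate(specials)}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab):
    """Convert tokens to ids, mapping unknown tokens to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in tokens]

# The vocabulary and encoding steps work on any token lists.
vocab = build_vocab([["a", "b"], ["b", "c"]])
ids = encode(["a", "z"], vocab)
cleaned = normalize_text("  Hello\t WORLD \n")
```

Lemmatization or named-entity features would need a trained pipeline such as one loaded with `spacy.load`, which this sketch deliberately avoids.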

You will understand the foundational principles of combining automated metrics with human-in-the-loop evaluation for comprehensive language model assessment.

What's included

3 videos, 1 reading, 1 assignment

3 videos (total 23 minutes)

Why Dual Evaluation Matters in Production AI Systems (3 minutes)
Automated Metrics Fundamentals for Language Model Assessment (8 minutes)
Language Model Evaluation: Automatic and Human-in-the-Loop Metrics (12 minutes)

1 reading (total 7 minutes)

Human-in-the-Loop Evaluation Framework Design (7 minutes)

1 assignment (total 3 minutes)

Automated Metrics and Human Evaluation Concepts Knowledge Check (3 minutes)
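
To make the two sides of this evaluation concrete, the sketch below pairs one automated metric (token-overlap F1 between a reference and a candidate text) with one human-evaluation statistic (Cohen's kappa, a standard measure of agreement between two annotators beyond chance). The choice of these two measures and the example sentences and labels are assumptions. The source does not say which metrics the course covers.

```python
from collections import Counter

def token_f1(reference, candidate):
    """F1 over overlapping tokens, counting repeated tokens at most as often
    as they occur in both texts."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    overlap = sum((Counter(ref) & Counter(cand)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k]
                   for k in counts_a.keys() | counts_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical model output scored against a reference.
score = token_f1("the cat sat on the mat", "the cat is on the mat")

# Hypothetical quality labels from two human annotators.
annotator_a = ["good", "good", "bad", "good", "bad", "bad"]
annotator_b = ["good", "bad", "bad", "good", "bad", "good"]
kappa = cohens_kappa(annotator_a, annotator_b)
```

A low kappa signals that the rating guidelines are ambiguous, in which case the human scores are a shaky yardstick for judging the automated metric.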

You will apply integrated evaluation strategies combining automated metrics with human judgment to conduct thorough language model assessments in realistic workplace scenarios.

What's included

3 videos, 2 assignments, 1 ungraded lab

3 videos (total 21 minutes)

When Automated Metrics Miss Critical Quality Issues (4 minutes)
Integration Strategies for Automated and Human Evaluation Methods (8 minutes)
Computing Automated Metrics with Python Evaluation Libraries (10 minutes)

2 assignments (total 13 minutes)

Integrated Evaluation Strategy Assessment (3 minutes)
Comprehensive Language Model Evaluation Assessment (10 minutes)

1 ungraded lab (total 20 minutes)

Implementing Comprehensive Language Model Assessment (20 minutes)
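
One possible integration strategy is to triage samples by whether the automated score and the human rating agree, then spend reviewer time on the disagreements. The sketch below assumes a metric in [0, 1] and a 1-to-5 human rating. The thresholds, field names, and sample values are hypothetical.

```python
def triage(samples, metric_threshold=0.5, human_threshold=3):
    """Bucket sample ids by agreement between metric and human rating."""
    buckets = {"both_pass": [], "both_fail": [],
               "metric_only_pass": [], "human_only_pass": []}
    for sample in samples:
        metric_ok = sample["metric"] >= metric_threshold
        human_ok = sample["human"] >= human_threshold
        if metric_ok and human_ok:
            key = "both_pass"
        elif metric_ok:
            key = "metric_only_pass"
        elif human_ok:
            key = "human_only_pass"
        else:
            key = "both_fail"
        buckets[key].append(sample["id"])
    return buckets

def disagreement_rate(buckets):
    """Share of samples where the metric and the human rating disagree."""
    disagreements = (len(buckets["metric_only_pass"])
                     + len(buckets["human_only_pass"]))
    total = sum(len(ids) for ids in buckets.values())
    return disagreements / total if total else 0.0

# Hypothetical evaluation results for five model outputs.
samples = [
    {"id": "a", "metric": 0.9, "human": 5},
    {"id": "b", "metric": 0.8, "human": 2},
    {"id": "c", "metric": 0.2, "human": 4},
    {"id": "d", "metric": 0.1, "human": 1},
    {"id": "e", "metric": 0.7, "human": 4},
]
buckets = triage(samples)
rate = disagreement_rate(buckets)
```

The `metric_only_pass` bucket holds the cases where an automated metric misses a quality problem that people notice, so it is usually the most informative one to read through.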

In this module, you will design and implement a multimodal AI system that integrates computer vision, audio processing, and natural language processing techniques. You will build a complete data pipeline including data preprocessing, feature extraction, multimodal fusion, model training, and performance evaluation. By the end of this module, you will be able to develop and assess a real-world AI application that combines multiple data types into a unified intelligent system.

What's included

4 readings, 1 assignment
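
The multimodal fusion step can take several forms. Two common ones are early fusion, which concatenates per-modality feature vectors before a single model, and late fusion, which averages the predictions of per-modality models. The sketch below illustrates both with NumPy. The feature dimensions, weights, and random features are assumptions, and the source does not state which strategy the capstone uses.

```python
import numpy as np

def standardize(features):
    """Zero-mean, unit-variance scaling per feature column."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / np.where(std == 0, 1.0, std)

def early_fusion(*modalities):
    """Concatenate standardized features from each modality per sample."""
    return np.concatenate([standardize(m) for m in modalities], axis=1)

def late_fusion(prob_list, weights=None):
    """Weighted average of per-modality class probabilities."""
    probs = np.stack(prob_list)  # (n_modalities, n_samples, n_classes)
    if weights is None:
        w = np.ones(len(prob_list))
    else:
        w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, probs, axes=1)

# Random stand-ins for image, audio, and text features of 5 samples.
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(5, 4))
audio_feats = rng.normal(size=(5, 3))
text_feats = rng.normal(size=(5, 2))
fused = early_fusion(image_feats, audio_feats, text_feats)  # shape (5, 9)

# Class probabilities from two hypothetical single-modality models.
vision_probs = np.array([[0.8, 0.2]])
audio_probs = np.array([[0.4, 0.6]])
averaged = late_fusion([vision_probs, audio_probs])
weighted = late_fusion([vision_probs, audio_probs], weights=[3, 1])
```

Standardizing each modality before concatenation keeps one modality's larger numeric range from dominating the fused vector.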

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.

Instructor

Professionals from the Industry

472 courses, 85,173 learners

Offered by

Coursera

Explore more from Software Development

End-to-End Multimodal AI: Fine-Tuning, Fusion, and MLOps (Coursera, course)
Production-Ready Multimodal ML Engineering (Coursera, course)
Career Development for Multimodal Intelligence (Coursera, course)
Solution Architecture and Ethical AI Design (Coursera, course)

Frequently asked questions

To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Financial aid available.

¹ Some assignments in this course are AI-graded. For these assignments, your data will be used in accordance with Coursera's Privacy Notice.