Modern cloud-native applications rarely crash outright. Instead, they fail in subtle ways such as latency spikes, partial errors, or noisy dependencies. This course helps you become productive with the open-source trio used across the industry: Prometheus for metrics and PromQL analysis, Grafana for dashboards and alerting, and OpenTelemetry for standard, vendor-neutral instrumentation.
You will launch a small local stack, scrape metrics, and build a practical three-panel dashboard that tracks requests, errors, and latency. Then you will create alerts that actually matter and instrument a sample service with the OpenTelemetry SDK to produce traces that can be correlated with metrics.
Along the way, you will learn key observability patterns like pull versus push collection, label hygiene, histogram quantiles, and Collector pipelines.
Learners should be comfortable with basic Docker or Linux, YAML/JSON, and web applications over HTTP; familiarity with Kubernetes is helpful.
This course is designed for software engineers, SREs, and platform engineers who want hands-on experience setting up and using an open-source observability stack to diagnose real production issues.
By the end, you will have working configurations, starter queries, and a clear path to production that covers exporters, data retention, SLOs, and burn rate alerts.
Get familiar with the three primary observability signals (metrics, logs, and traces) and learn how Prometheus, Grafana, and OpenTelemetry map to each. We trace the full data path, clarifying pull versus push collection and exporters versus receivers. You then set up a small local environment with Docker Compose that is reused throughout the course. By the end of the module, you will have a working lab in which scrape targets show as up (green) and data flows end to end.
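One way to confirm that targets are "green" without opening the Prometheus UI is to read the targets endpoint of the Prometheus HTTP API and list anything whose health is not `up`. The sketch below is illustrative only: the job names, scrape URLs, and the trimmed sample response are assumptions, not part of the course materials.

```python
import json

# Hypothetical, trimmed response in the shape returned by Prometheus'
# GET /api/v1/targets endpoint. Job names and URLs are made up.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "prometheus"},
                "scrapeUrl": "http://prometheus:9090/metrics",
                "health": "up",
                "lastError": "",
            },
            {
                "labels": {"job": "demo-app"},
                "scrapeUrl": "http://demo-app:8000/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
        ]
    },
})


def unhealthy_targets(response_text):
    """Return (job, scrape URL, last error) for every target that is not 'up'."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus API call did not succeed")
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            problems.append((
                target["labels"].get("job", "unknown"),
                target.get("scrapeUrl", ""),
                target.get("lastError", ""),
            ))
    return problems


if __name__ == "__main__":
    for job, url, error in unhealthy_targets(SAMPLE_RESPONSE):
        print(f"{job} is not up ({url}): {error}")
```

Against a running lab, the response text could instead be fetched with `urllib.request.urlopen`, assuming Prometheus is published on its default port 9090 on localhost.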
What's included
4 videos, 2 readings, 1 peer review
4 videos • Total 34 minutes
Introduction and Welcome • 5 minutes
What “Observability” Really Means • 10 minutes
The Minimal Open-Source Stack (Architecture Flyover) • 10 minutes
Start the Stack with Docker Compose • 10 minutes
2 readings • Total 10 minutes
Welcome to the Course: Course Overview • 5 minutes
Cheat-Sheet: Signals & Tool Roles • 5 minutes
1 peer review • Total 20 minutes
Hands-On-Learning: Stack Up! Your First Observability Environment • 20 minutes
Prometheus + Grafana Essentials: PromQL and Dashboards
Module 2
Module details
Learn the PromQL building blocks you need day to day: rate(), sum by(), label filters, and histogram quantiles, while avoiding common pitfalls with counters and gauges. Then turn queries into meaningful signals by building a clear three-panel Grafana dashboard that shows requests per second (RPS), error ratio, and 95th-percentile latency, each with appropriate units, legends, and variables. Export the dashboard as JSON and configure a noise-aware alert (error rate above 5% over 5 minutes) to practice choosing thresholds in relation to time windows. The emphasis is on practical panel organization and queries you can clearly explain.
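To make the three panels concrete, the sketch below lists one plausible PromQL query per panel and then reproduces, in plain Python, the linear interpolation that a histogram quantile estimate relies on. The metric names (`http_requests_total`, `http_request_duration_seconds_bucket`), the `status` label, and the bucket counts are assumptions chosen for illustration; the course's demo app may expose different names.

```python
import math

# Candidate queries for the three panels. Metric and label names are assumed.
PANEL_QUERIES = {
    "RPS": "sum(rate(http_requests_total[5m]))",
    "error ratio": (
        'sum(rate(http_requests_total{status=~"5.."}[5m]))'
        " / sum(rate(http_requests_total[5m]))"
    ),
    "p95 latency": (
        "histogram_quantile(0.95, "
        "sum by (le) (rate(http_request_duration_seconds_bucket[5m])))"
    ),
}

# Made-up cumulative buckets: (upper bound in seconds, observations <= bound).
BUCKETS = [(0.1, 50), (0.25, 80), (0.5, 94), (1.0, 99), (math.inf, 100)]


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the overflow bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                # Empty bucket at rank 0: nothing to interpolate.
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    print(f"estimated p95: {histogram_quantile(0.95, BUCKETS):.3f}s")
```

The estimate can only be as precise as the bucket boundaries allow: here the 95th observation falls in the 0.5s to 1.0s bucket, so the result is interpolated within that range rather than measured directly.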
What's included
3 videos, 1 reading, 1 peer review
3 videos • Total 33 minutes
Mastering PromQL Queries • 12 minutes
Build the Three-Panel Dashboard • 11 minutes
Alerting Fundamentals: Thresholds & Windows • 10 minutes
1 reading • Total 5 minutes
Starter PromQL Snippets • 5 minutes
1 peer review • Total 20 minutes
Hands-On-Learning: Build Your First Production Dashboard • 20 minutes
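The alert practiced in this module (error rate above 5% over 5 minutes) combines two ideas: counters only make sense as increases over a window, and the threshold is applied to a ratio of those increases. The following is a simplified sketch of that logic, not how Prometheus evaluates rules internally (it ignores extrapolation at window edges). The sample values and function names are hypothetical.

```python
# Made-up scrapes: (unix timestamp, total requests, failed requests).
SAMPLES = [
    (0, 1000, 10),
    (60, 1100, 14),
    (120, 1200, 20),
    (180, 1300, 29),
    (240, 1400, 38),
    (300, 1500, 46),
]


def counter_increase(values):
    """Total increase across successive counter samples, tolerating resets."""
    increase = 0.0
    for previous, current in zip(values, values[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # A drop means the process restarted and the counter began again at 0.
            increase += current
    return increase


def error_ratio(samples, window_seconds, now):
    """Errors divided by requests over the window, or None without enough data."""
    recent = [s for s in samples if now - window_seconds <= s[0] <= now]
    if len(recent) < 2:
        return None
    total = counter_increase([s[1] for s in recent])
    errors = counter_increase([s[2] for s in recent])
    if total == 0:
        return None
    return errors / total


def should_fire(ratio, threshold=0.05):
    """Fire only when there is data and the ratio exceeds the threshold."""
    return ratio is not None and ratio > threshold


if __name__ == "__main__":
    ratio = error_ratio(SAMPLES, window_seconds=300, now=300)
    print(f"error ratio over 5m: {ratio:.3f}, fire: {should_fire(ratio)}")
```

A shorter window reacts faster but is noisier, while a longer one smooths out brief spikes; trying `window_seconds=60` on the same samples shows how the ratio shifts with the window.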
OpenTelemetry in Practice: Traces, Collector Pipelines, and Correlation
Module 3
Module details
Instrument the demo application with the OpenTelemetry (OTel) SDK, set meaningful resource attributes, and export data over the OpenTelemetry Protocol (OTLP) to a Collector pipeline that you configure (receivers → processors → exporters). You will visualize traces in Grafana/Tempo and learn how to jump from a “hot” metric panel directly to the related spans using exemplars. Along the way, you will validate pipeline health, add attributes and batching, and practice root-cause analysis on induced failures. The module closes with next steps: label management, Service Level Objectives (SLOs) and burn rates, and retention/export strategies for production environments.
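The receivers → processors → exporters flow can be checked mechanically: every component a pipeline references must also be defined in the matching top-level section. The sketch below mirrors the general shape of a Collector configuration as a Python dictionary and reports dangling references. The endpoint values and the `otlp/tempo` exporter name are assumptions for a local lab, and the checker is a hypothetical helper rather than a Collector feature.

```python
# Dictionary mirroring the layout of a Collector config file.
# Hostnames, ports, and the "otlp/tempo" name are illustrative assumptions.
COLLECTOR_CONFIG = {
    "receivers": {
        "otlp": {"protocols": {"grpc": {"endpoint": "0.0.0.0:4317"}}},
    },
    "processors": {
        "batch": {},
    },
    "exporters": {
        "otlp/tempo": {"endpoint": "tempo:4317", "tls": {"insecure": True}},
    },
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["otlp/tempo"],
            },
        },
    },
}


def undefined_components(config):
    """Return (pipeline, section, name) for each reference with no definition."""
    missing = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for section in ("receivers", "processors", "exporters"):
            defined = config.get(section, {})
            for component in pipeline.get(section, []):
                if component not in defined:
                    missing.append((pipeline_name, section, component))
    return missing


if __name__ == "__main__":
    problems = undefined_components(COLLECTOR_CONFIG)
    if problems:
        for pipeline_name, section, component in problems:
            print(f"pipeline '{pipeline_name}' uses undefined {section} entry '{component}'")
    else:
        print("all pipeline references resolve")
```

A check like this catches only wiring mistakes; whether data actually arrives still has to be confirmed by watching the pipeline while the demo app sends spans.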
What's included
4 videos, 1 reading, 1 assignment, 2 peer reviews
4 videos • Total 41 minutes
OTel SDK: Minimal App Instrumentation • 11 minutes
Build Collector Pipelines • 10 minutes
From Panel to Span: Root-Cause Analysis • 14 minutes
Course Wrap-Up • 5 minutes
1 reading • Total 5 minutes
Correlating Traces & Metrics • 5 minutes
1 assignment • Total 25 minutes
Open Source Observability Stack Essentials • 25 minutes
2 peer reviews • Total 80 minutes
Hands-On-Learning: From Code to Trace: Root-Cause Detection • 20 minutes
Project: Production Ready: Full-Stack Observability Capstone • 60 minutes