Engineering Data Ecosystems: Pipelines, ETL, Spark

This course is part of the Building Smarter Data Pipelines: SQL, Spark, Kafka & GenAI Specialization

Instructor: Soheil Haddadi

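1 module

Gain insight into a topic and learn the fundamentals.

Beginner level

Recommended experience

3 hours to complete

Flexible schedule

Learn at your own pace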

What you'll learn

Identify and describe the components and importance of data ecosystems.
Understand the basic structure and function of data pipelines.
Recognize the steps involved in ETL workflows and their role in data handling.
Gain an introductory knowledge of big data and the application of Apache Spark.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

3 assignments¹

¹ AI-graded; see the disclaimer

Taught in English

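Build your subject-matter expertise

This course is part of the Building Smarter Data Pipelines: SQL, Spark, Kafka & GenAI Specialization

When you enroll in this course, you'll also be enrolled in this Specialization.

Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate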

There is 1 module in this course

This course gives you a foundational understanding of how modern data ecosystems work. From data pipelines and ETL processes to big data handling with Apache Spark, you'll explore the essential tools, techniques, and technologies that drive decision-making in today's data-driven world. Whether you're an aspiring data engineer or simply interested in the mechanics of data handling, this course lays the groundwork for your journey into data engineering.

This course is ideal for aspiring data engineers, software developers, database administrators, and IT professionals looking to expand their skills in data handling and processing. Additionally, analysts and business professionals interested in data technologies will find the course beneficial for enhancing their understanding of the fundamental processes behind data ecosystems and big data.

Participants should have a general interest in data and a basic understanding of programming concepts. Familiarity with database systems will be helpful, but prior experience with Spark is not required. An interest in big data and data analytics will enrich your learning experience throughout the course.

By the end of this course, participants will be able to identify the components and importance of data ecosystems, understand the structure and function of data pipelines, and recognize the critical steps involved in ETL workflows. Additionally, you'll gain introductory knowledge of big data handling with Apache Spark and its applications in large-scale data processing.

This introductory course aims to unravel the complexities of data ecosystems. It is tailored for individuals at the start of their data engineering journey and emphasizes the construction, management, and optimization of data pipelines, the essentials of ETL (Extract, Transform, Load) workflows, and an introduction to big data processing with Apache Spark.
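
To make the three ETL stages concrete, here is a minimal sketch in plain Python. It is not taken from the course materials: the sample data, the `orders` table, and the function names (`extract`, `transform`, `load`, `run_pipeline`) are all assumptions chosen for illustration, and it uses only the standard library rather than Spark.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from a file, API, or database.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,alice,5.01
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert field types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Chain the three stages and return the number of rows now in the table."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection))
```

Keeping each stage as its own function means a stage can be tested or swapped independently. For example, `load` could target a different database without touching `extract` or `transform`.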