One of the important topics that every data analyst should be familiar with is the distributed data processing technologies. As a data analyst, you should be able to apply different queries to your dataset to extract useful information out of it. but what if your data is so big that working with it on your local machine is not easy to be done. That is when the distributed data processing and Spark Technology will become handy. So in this project, we are going to work with pyspark module in python and we are going to use google colab environment in order to apply some queries to the dataset we have related to lastfm website which is an online music service where users can listen to different songs. This dataset is containing two csv files listening.csv and genre.csv. Also, we will learn how we can visualize our query results using matplotlib.


您将学到什么
Learn how to setup the google colab for distributed data processing
Learn applying different queries to your dataset to extract useful Information
Learn how to visualize this information using matplotlib
您将练习的技能
要了解的详细信息

添加到您的领英档案
仅桌面可用
了解顶级公司的员工如何掌握热门技能

在不到 2 个小时的时间内学习、练习和应用为就业做好准备的技能
- 接受行业专家的培训
- 获得解决实训工作任务的实践经验
- 使用最新的工具和技术来建立信心

关于此指导项目
分步进行学习
在与您的工作区一起在分屏中播放的视频中,您的授课教师将指导您完成每个步骤:
Prepare the Google Colab for distributed data processing
Mounting our Google Drive into Google Colab environment
Importing first file of our Dataset (1 Gb) into pySpark dataframe
Applying some Queries to extract useful information out of our data
Importing second file of our Dataset (3 Mb) into pySpark dataframe
Joining two dataframes and prepapre it for more advanced queries
Learn visualizing our query results using matplotlib
推荐体验
Learners should be familiar with Python programming Language, Spark Technology and have a little experience working with google colab environment
5个项目图片
位教师

学习方式
基于技能的实践学习
通过完成与工作相关的任务来练习新技能。
专家指导
使用独特的并排界面,按照预先录制的专家视频操作。
无需下载或安装
在预配置的云工作空间中访问所需的工具和资源。
仅在台式计算机上可用
此指导项目专为具有可靠互联网连接的笔记本电脑或台式计算机而设计,而不是移动设备。
人们为什么选择 Coursera 来帮助自己实现职业发展




学生评论
314 条评论
- 5 stars
62.73%
- 4 stars
24.84%
- 3 stars
8.59%
- 2 stars
1.59%
- 1 star
2.22%
显示 3/314 个
已于 Jan 7, 2022审阅
This project is really good. Gives you a hands-on experience on PySpark. I was able to write complex queries on my own after a while in this project. Very useful.
已于 Nov 14, 2020审阅
Best guided project for an introduction to the PySpark
已于 Dec 10, 2021审阅
quick start for a newbie with most basic information covered here, worth it.
您可能还喜欢
- 状态:免费试用
Edureka
- 状态:预览
Edureka
- 状态:免费
Coursera Project Network
- 状态:免费试用
常见问题
购买指导项目后,您将获得完成指导项目所需的一切,包括通过 Web 浏览器访问云桌面工作空间,工作空间中包含您需要了解的文件和软件,以及特定领域的专家提供的分步视频说明。
由于您的工作空间包含适合笔记本电脑或台式计算机使用的云桌面,因此指导项目不在移动设备上提供。
指导项目授课教师是特定领域的专家,他们在项目的技能、工具或领域方面经验丰富,并且热衷于分享自己的知识以影响全球数百万的学生。