This course introduces the necessary concepts and common techniques for analyzing data. The primary emphasis is on the process of data analysis, including data preparation, descriptive analytics, model training, and result interpretation. The process starts with removing distractions and anomalies, followed by discovering insights, formulating propositions, validating evidence, and finally building professional-grade solutions. Following the process properly, regularly, and transparently brings credibility and increases the impact of the results.

What you'll learn
1. Apply appropriate techniques for generating insights from data.
2. Present actionable solutions with confidence to the business stakeholders.
Details to know
32 assignments
There are 9 modules in this course
Welcome to Data Preparation and Analysis! Module 1 guides students through crafting informative and visually appealing histograms, a fundamental aspect of data visualization. Students will learn techniques for measuring the location and scale of data and for understanding the origins and impacts of noise and missing values in datasets. This module also introduces the CRISP-DM Process, a structured approach to data mining, along with Gartner's Analytics Ascendancy Model for advanced data analysis. Additionally, students will explore the distinction between raw data and processed information, a key concept for effective data interpretation and decision-making.
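As a rough illustration of what "measuring the location and scale of data" can look like in Python, here is a minimal sketch using only the standard library. The function name `describe` and the sample values are hypothetical and are not taken from the course materials.

```python
import math
import statistics


def describe(values):
    """Summarize location and scale, skipping missing values (None or NaN)."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    q1, _, q3 = statistics.quantiles(clean, n=4)  # quartile cut points
    return {
        "n_missing": len(values) - len(clean),
        "mean": statistics.mean(clean),      # location, sensitive to outliers
        "median": statistics.median(clean),  # location, robust to outliers
        "stdev": statistics.stdev(clean),    # scale, sensitive to outliers
        "iqr": q3 - q1,                      # scale, robust to outliers
    }


# Hypothetical sample: one missing value and one noisy reading (95.0)
sample = [12.0, 15.0, None, 14.0, 13.0, 95.0, 16.0]
summary = describe(sample)
print(summary)
```

In this made-up sample the single noisy reading pulls the mean (27.5) well above the median (14.5), which is one reason to report more than one measure of location.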
What's included
10 videos, 7 readings, 4 assignments, 1 discussion prompt, 1 ungraded lab
10 videos • Total 54 minutes
- Course Overview • 1 minute
- Instructor Introduction • 1 minute
- Module 1 Introduction • 1 minute
- Why Do We Analyze Data • 6 minutes
- The Process of Data Analysis - Part 1 • 7 minutes
- The Process of Data Analysis - Part 2 • 6 minutes
- The First Step of Knowing Your Data - Part 1 • 8 minutes
- The First Step of Knowing Your Data - Part 2 • 5 minutes
- The First Step of Knowing Your Data - Part 3 • 9 minutes
- The First Step of Knowing Your Data - Part 4 • 10 minutes
7 readings • Total 290 minutes
- Syllabus • 10 minutes
- Data Files • 60 minutes
- Module 1 Introduction • 30 minutes
- Big Data and IEEE 754 • 60 minutes
- CRISP-DM2 • 60 minutes
- Selecting the Bin Size of a Time Histogram • 60 minutes
- Module 1 Summary • 10 minutes
4 assignments • Total 225 minutes
- Why Do We Analyze Data Quiz • 15 minutes
- The Process of Data Analysis Quiz • 15 minutes
- Knowing Your Data Quiz • 15 minutes
- Module 1 Summative Assessment • 180 minutes
1 discussion prompt • Total 60 minutes
- Meet and Greet Discussion • 60 minutes
1 ungraded lab • Total 60 minutes
- Module 1 Python Lab - VS Code • 60 minutes
Module 2 delves into the intricacies of statistical analysis, beginning with a thorough understanding of the p-value concept and its significance as a Type I Error indicator. Students will learn to apply statistical tests in Python to identify significantly correlated features, exploring various correlation metrics tailored for categorical, mixed-type, and continuous features. This module emphasizes practical application, equipping students with the skills to calculate and interpret these metrics using Python, thereby enhancing their ability to conduct sophisticated data analysis and draw meaningful conclusions from complex datasets.
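For the mixed-type case (one categorical feature, one continuous feature), the module's reading list names eta-squared. A minimal sketch of that metric follows, assuming the usual definition as the between-group sum of squares divided by the total sum of squares. The function name and the taxi-style data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def eta_squared(categories, values):
    """Share of the variance in `values` explained by group membership."""
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)

    grand_mean = mean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("values are constant; eta-squared is undefined")
    ss_between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


# Hypothetical data: payment type versus fare amount
payment = ["cash", "cash", "card", "card"]
fare = [10.0, 12.0, 20.0, 22.0]
eta2 = eta_squared(payment, fare)
print(round(eta2, 4))
```

A value near 1 suggests the categorical feature separates the continuous one almost completely, while a value near 0 suggests little association.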
What's included
7 videos, 5 readings, 4 assignments, 1 ungraded lab
7 videos • Total 54 minutes
- Module 2 Introduction • 2 minutes
- Discover and Measure Associations - Part 1 • 10 minutes
- Discover and Measure Associations - Part 2 • 10 minutes
- Measure Associations - Part 1 • 8 minutes
- Measure Associations - Part 1 (Continued) • 7 minutes
- Measure Associations - Part 2 • 9 minutes
- Measure Associations - Part 2 (Continued) • 9 minutes
5 readings • Total 250 minutes
- Module 2 Introduction • 60 minutes
- Chicago Taxi Trip Data • 60 minutes
- Correlation with Python • 60 minutes
- Eta-squared • 60 minutes
- Module 2 Summary • 10 minutes
4 assignments • Total 225 minutes
- Correlation of Continuous Features Quiz • 15 minutes
- Correlation of Mixed Types Features • 15 minutes
- Means to an End for Feature Screening Quiz • 15 minutes
- Module 2 Summative Assessment • 180 minutes
1 ungraded lab • Total 60 minutes
- Module 2 Python Lab - VS Code • 60 minutes
Module 3 offers a deep dive into the world of Association Rules, teaching students how to improvise these rules for identifying valuable feature combinations that generate specific label values. Learners will master setting appropriate thresholds for Support and Confidence and gain a comprehensive understanding of the Apriori Algorithm and the significance of Frequent Itemsets within it. This module covers the calculation of common metrics for Association Rules, familiarizing students with the relevant terminology. Additionally, learners will explore the practical application of Association Rules in Market Basket Analysis, including strategies for cross-selling, up-selling, and product bundling, equipping them with valuable skills for advanced data-driven decision making in business contexts.
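The common association-rule metrics mentioned above (Support, Confidence, and the related Lift) can be sketched for a single rule as follows. This is an illustrative calculation on hypothetical baskets, not the Apriori Algorithm itself, and the function name `rule_metrics` is an assumption.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes at least one transaction contains the antecedent and at least
    one contains the consequent.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_consequent = sum(consequent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)

    support = n_both / n                    # P(antecedent and consequent)
    confidence = n_both / n_antecedent      # P(consequent | antecedent)
    lift = confidence / (n_consequent / n)  # confidence relative to baseline
    return support, confidence, lift


# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
print(support, confidence, lift)
```

Here the rule {bread} -> {milk} has a lift below 1, so in this toy data buying bread does not make milk more likely than its baseline rate.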
What's included
7 videos, 5 readings, 3 assignments, 1 ungraded lab
7 videos • Total 46 minutes
- Module 3 Introduction • 1 minute
- What is in Your Basket - Part 1 • 7 minutes
- What is in Your Basket - Part 2 • 6 minutes
- How Are Association Rules Discovered - Part 1 • 9 minutes
- How Are Association Rules Discovered - Part 2 • 8 minutes
- What Can Association Rules Tell Me - Part 1 • 8 minutes
- What Can Association Rules Tell Me - Part 2 • 6 minutes
5 readings • Total 200 minutes
- PGML Chapter 3 • 60 minutes
- Cross-Selling • 60 minutes
- Apriori Algorithm and Association Rules • 60 minutes
- Module 3 Summary • 10 minutes
- Insights from an Industry Leader: Learn More About Our Program • 10 minutes
3 assignments • Total 210 minutes
- Market Basket Analysis Quiz • 15 minutes
- Association Rules Discovery Quiz • 15 minutes
- Module 3 Summative Assessment • 180 minutes
1 ungraded lab • Total 60 minutes
- Module 3 Python Lab - VS Code • 60 minutes
In Module 4, students will learn how to describe and interpret profiles of clusters, gaining proficiency in deploying the K-Means and K-Modes clustering algorithms. They will explore the application of Recency, Frequency, and Monetary (RFM) Analysis to identify the most valuable customers in retail business settings. The module also covers the technique of Simple Random Sampling with the option of incorporating stratification variables, enhancing the precision of data analysis. Furthermore, it emphasizes the importance of objectively validating models using a testing partition, ensuring the reliability and effectiveness of the analytical models in real-world scenarios.
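One way to picture Simple Random Sampling with a stratification variable is the sketch below, which draws the testing partition separately within each stratum. The function name, the 30% test fraction, and the toy rows are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, stratum_of, test_fraction=0.3, seed=0):
    """Split rows into (train, test), sampling independently per stratum."""
    rng = random.Random(seed)  # fixed seed so the partition is repeatable
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)

    train, test = [], []
    for members in strata.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # simple random sampling within the stratum
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Hypothetical observations: (segment label, record id)
rows = [("A", i) for i in range(10)] + [("B", i) for i in range(20)]
train, test = stratified_split(rows, stratum_of=lambda r: r[0])
print(len(train), len(test))
```

Because the draw happens within each stratum, the testing partition keeps roughly the same mix of segments as the full data.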
What's included
8 videos, 5 readings, 4 assignments, 1 ungraded lab
8 videos • Total 70 minutes
- Module 4 Introduction • 1 minute
- Partition Observations for Training Models - Part 1 • 10 minutes
- Partition Observations for Training Models - Part 2 • 12 minutes
- Create Segments of Observations for Business Reasons - Part 1 • 10 minutes
- Create Segments of Observations for Business Reasons - Part 2 • 10 minutes
- Put Observations with Similar Feature Values in Clusters - Part 1 • 10 minutes
- Put Observations with Similar Feature Values in Clusters - Part 2 • 11 minutes
- Put Observations with Similar Feature Values in Clusters - Part 3 • 8 minutes
5 readings • Total 220 minutes
- PGML Chapter 4 • 30 minutes
- Sampling Techniques • 60 minutes
- RFM • 60 minutes
- Clustering • 60 minutes
- Module 4 Summary • 10 minutes
4 assignments • Total 225 minutes
- Partition Observations for Training Models Quiz • 15 minutes
- Segments of Observations Quiz • 15 minutes
- Clustering Quiz • 15 minutes
- Module 4 Summative Assessment • 180 minutes
1 ungraded lab • Total 60 minutes
- Module 4 Python Lab - VS Code • 60 minutes
Module 5 delves into feature importance analysis in machine learning, covering Shapley Values, feature selection methods, statistical evaluation, feature interaction, aliasing, and the Least Squares Algorithm. Students will master these concepts to build robust and interpretable models.
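As a small illustration of the least squares idea named in this module, the sketch below fits a one-feature linear regression with the closed-form solution. It assumes a single predictor and made-up data, and the function name is hypothetical. The course may well cover the general multi-feature case.

```python
from statistics import mean


def fit_simple_least_squares(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x is constant; the slope is not identifiable")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_least_squares(x, y)
print(intercept, slope)
```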
What's included
8 videos, 5 readings, 4 assignments, 1 ungraded lab
8 videos • Total 53 minutes
- Module 5 Introduction • 1 minute
- Linear Regression Model - Part 1 • 10 minutes
- Linear Regression Model - Part 2 • 5 minutes
- Forward Selection - Part 1 • 8 minutes
- Forward Selection - Part 2 • 4 minutes
- Feature Importance - Part 1 • 9 minutes
- Feature Importance - Part 2 • 8 minutes
- Feature Importance - Part 3 • 7 minutes
5 readings • Total 250 minutes
- Linear Regression Analysis • 60 minutes
- Least Squares Regression • 60 minutes
- Forward and Backward Stepwise Regression • 60 minutes
- Shapley Values • 60 minutes
- Module 5 Summary • 10 minutes
4 assignments • Total 225 minutes
- Linear Regression Model Quiz • 15 minutes
- Feature Selection Quiz • 15 minutes
- Feature Importance Quiz • 15 minutes
- Module 5 Summative Assessment • 180 minutes
1 ungraded lab • Total 60 minutes
- Module 5 Python Lab - VS Code • 60 minutes
In Module 6, students will master the art of feature selection in machine learning by exploring the Forward and Backward Selection Method, the All-Possible Subsets Method, and the concept of complete and quasi-complete separation. Students will also discover association rules for identifying separations, interpret model parameters and predicted probabilities, and delve into the concepts of maximum likelihood estimation, odds, and odds ratios.
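To make the link between model parameters, predicted probabilities, odds, and odds ratios concrete, here is a minimal sketch for a one-feature logistic regression. The intercept and coefficient are hypothetical values chosen for illustration, not results from the course.

```python
import math


def predicted_probability(intercept, coef, x):
    """Logistic model: P(event | x) = 1 / (1 + exp(-(intercept + coef * x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * x)))


def odds(p):
    """Odds of an event with probability p (requires p < 1)."""
    return p / (1.0 - p)


# Hypothetical fitted parameters
intercept, coef = -1.0, 0.5

p_at_2 = predicted_probability(intercept, coef, 2.0)
p_at_3 = predicted_probability(intercept, coef, 3.0)

# Odds ratio for a one-unit increase in x
odds_ratio = odds(p_at_3) / odds(p_at_2)
print(p_at_2, p_at_3, odds_ratio, math.exp(coef))
```

The odds ratio for a one-unit increase in the feature equals the exponential of its coefficient, which is why logistic regression parameters are often read on the odds scale.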
What's included
6 videos, 5 readings, 4 assignments, 1 ungraded lab
6 videos • Total 34 minutes
- Module 6 Introduction • 1 minute
- Logistic Regression - Part 1 • 6 minutes
- Logistic Regression - Part 2 • 7 minutes
- Forward Selection • 9 minutes
- Interpret Model and Assess Performance - Part 1 • 8 minutes
- Interpret Model and Assess Performance - Part 2 • 4 minutes
5 readings • Total 220 minutes
- PGML Chapter 6 • 30 minutes
- Predictive Analytics • 60 minutes
- Forward Selection • 60 minutes
- Best R-squared for Logistic Regression • 60 minutes
- Module 6 Summary • 10 minutes
4 assignments • Total 225 minutes
- Logistic Regression Quiz • 15 minutes
- Forward Selection Quiz • 15 minutes
- Blessing and the Curse of Too Many Predictors Quiz • 15 minutes
- Module 6 Summative Assessment • 180 minutes
1 ungraded lab • Total 60 minutes
- Module 6 Python Lab - VS Code • 60 minutes
Module 7 will equip students with the ability to harness the power of tree-based models to uncover hidden patterns in their data. Students will be able to describe clusters effectively, intelligently set algorithm parameters, construct business rules from tree results, and utilize variance metrics, entropy values, and Gini indices for optimal tree construction.
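The entropy values and Gini indices mentioned above can be sketched as node impurity measures for a candidate split. The helper names and the toy labels are hypothetical, and this is an illustration of the standard formulas rather than a full CART implementation.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_impurity(left, right, measure):
    """Impurity after a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)


# Hypothetical node with 5 "yes" and 5 "no", and one candidate split
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

gini_reduction = gini(parent) - weighted_impurity(left, right, gini)
entropy_reduction = entropy(parent) - weighted_impurity(left, right, entropy)
print(gini_reduction, entropy_reduction)
```

A tree-growing algorithm would compare such reductions across candidate splits and keep the split with the largest one.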
What's included
7 videos, 5 readings, 4 assignments, 1 ungraded lab
7 videos • Total 37 minutes
- Module 7 Introduction • 1 minute
- Motivation of Decision Trees - Part 1 • 6 minutes
- Motivation of Decision Trees - Part 2 • 5 minutes
- The CART Algorithm - Part 1 • 3 minutes
- The CART Algorithm - Part 2 • 9 minutes
- Cluster Profiling - Part 1 • 4 minutes
- Cluster Profiling - Part 2 • 7 minutes
5 readings • Total 220 minutes
- PGML Chapter 5 • 30 minutes
- CART • 60 minutes
- CART as an Equation • 60 minutes
- Decision Trees for Clustering • 60 minutes
- Module 7 Summary • 10 minutes
4 assignments • Total 225 minutes
- Motivation of Decision Trees Quiz • 15 minutes
- The CART Algorithm Quiz • 15 minutes
- Cluster Profiling Quiz • 15 minutes
- Module 7 Summative Assessment • 180 minutes
1 ungraded lab • Total 60 minutes
- Module 7 Python Lab - VS Code • 60 minutes
Module 8 delves into evaluation metrics for machine learning models. Students will master precision and recall curves, lift curves, and receiver operating characteristic (ROC) curves. Additionally, students will learn methods for calculating probability thresholds using Kolmogorov-Smirnov statistics and F1 scores. They will explore metrics such as misclassification rate, area under the curve (AUC), and root mean squared error (RMSE), along with techniques for computing RMSE and detecting severely misfitted observations using model-specific residuals.
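A minimal sketch of choosing a probability threshold with the F1 score and the Kolmogorov-Smirnov gap (true positive rate minus false positive rate) might look like the following. The function name, labels, and predicted scores are hypothetical, and ties between thresholds are broken by taking the smallest one.

```python
def f1_and_ks_gap(y_true, scores, threshold):
    """Return (F1 score, TPR - FPR) when predicting 1 for score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is also the TPR
    fpr = fp / (fp + tn) if fp + tn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return f1, recall - fpr


# Hypothetical labels and predicted event probabilities
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

candidates = sorted(set(scores))
best_f1_threshold = max(
    candidates, key=lambda t: f1_and_ks_gap(y_true, scores, t)[0]
)
ks_statistic = max(f1_and_ks_gap(y_true, scores, t)[1] for t in candidates)
print(best_f1_threshold, ks_statistic)
```

The Kolmogorov-Smirnov statistic is the largest gap over all candidate thresholds, and the threshold where it occurs is one common choice for a cutoff.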
What's included
8 videos, 5 readings, 4 assignments, 1 ungraded lab
8 videos • Total 43 minutes
- Module 8 Introduction • 1 minute
- Prediction Models • 8 minutes
- Nominal Classification Models • 6 minutes
- Binary Classification Models - Part 1 • 4 minutes
- Binary Classification Models - Part 2 • 6 minutes
- Binary Classification Models - Part 3 • 5 minutes
- Binary Classification Models - Part 4 • 6 minutes
- Binary Classification Models - Part 5 • 7 minutes
5 readings • Total 235 minutes
- PGML Chapter 7, 8 • 45 minutes
- Outliers • 60 minutes
- ROC Curve • 60 minutes
- Using Life Analysis • 60 minutes
- Module 8 Summary • 10 minutes
4 assignments • Total 225 minutes
- Metrics for Prediction Models Quiz • 15 minutes
- Metrics for Classification Models Quiz • 15 minutes
- Charts for Classification Models Quiz • 15 minutes
- Module 8 Summative Assessment • 180 minutes
1 ungraded lab • Total 60 minutes
- Module 8 Python Lab - VS Code • 60 minutes
This module contains the summative course assessment that has been designed to evaluate your understanding of the course material and assess your ability to apply the knowledge you have acquired throughout the course. Be sure to review the course material thoroughly before taking the assessment.
What's included
1 assignment
1 assignment • Total 180 minutes
- Summative Course Assessment • 180 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
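Build toward a degree
This course is part of the following degree program offered by Illinois Tech. If you are admitted and enroll, your completed coursework may count toward your degree, and your progress can transfer with you.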
Illinois Tech
Master of Data Science
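Degree · 12-15 months
You must successfully apply and enroll. Eligibility requirements apply. Each institution determines, based on your existing credits, how many credits earned by completing this course may count toward degree requirements.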
Offered by
Illinois Tech is a top-tier, nationally ranked, private research university with programs in engineering, computer science, architecture, design, science, business, human sciences, and law. The university offers bachelor of science, master of science, professional master’s, and Ph.D. degrees—as well as certificates for in-demand STEM fields and other areas of innovation. Talented students from around the world choose to study at Illinois Tech because of the access to real-world opportunities, renowned academic programs, high value, and career prospects of graduates.
Frequently asked questions
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. It also means that you will not be able to purchase a Certificate experience.
When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.