Birla Institute of Technology & Science, Pilani

Introduction to Data Analytics

Prof. Seetha Parameswaran
Professor Aneesh S Chivukula

Included with Coursera Plus

Gain insight into a subject and learn the fundamentals.
Intermediate level

2 months to complete at 10 hours a week
Flexible schedule
Learn at your own pace

What you'll learn

  • Apply data preprocessing techniques using Python libraries like Pandas and NumPy to clean, transform, and prepare datasets for analysis.

  • Use exploratory data analysis (EDA) and machine learning (ML) algorithms to identify patterns and trends and to solve real-world data problems through regression, classification, and clustering techniques.

  • Evaluate model performance using appropriate metrics and visualise insights through data visualisation tools to effectively communicate findings.

Details to know

Shareable certificate

Add to your LinkedIn profile

Assignments

153 assignments

Taught in English

There are 12 modules in this course

In this module, learners are introduced to the course and its syllabus, which lay the foundation for their learning journey. The introductory video describes the skills and knowledge learners can expect to gain over the duration of the course. The syllabus reading outlines the essential course components: course values, assessment criteria, grading system, schedule, details of live sessions, and a recommended reading list that deepens understanding of the course concepts. The module also includes a discussion prompt where learners introduce themselves and connect with others in the course community.

What's included

3 videos, 1 reading, 1 discussion prompt

This module provides a comprehensive introduction to data analytics, covering its definition, importance, key components, and industry applications. Students will learn to apply the four types of data analytics (descriptive, diagnostic, predictive, and prescriptive) to solve business problems and make data-driven decisions. They will also analyse real-world use cases, challenges, and future trends in data analytics across various domains. Additionally, the students will gain an understanding of structured, unstructured, semi-structured, quantitative, and qualitative data from primary, secondary, internal, and external sources, and learn how to apply this knowledge to data analytics projects.

What's included

17 videos, 4 readings, 16 assignments, 1 discussion prompt
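
As a minimal sketch of two ideas from this module, the snippet below (standard library only) flattens semi-structured JSON records into structured, fixed-schema rows and then computes descriptive analytics (count, mean, and total per group). The order records, field names, and the helpers `to_rows` and `describe_by` are hypothetical and are not taken from the course materials.

```python
import json
import statistics

# Hypothetical semi-structured input: nested fields that may be missing.
raw = """
[
  {"order_id": 1, "region": "North", "amount": 120.0, "customer": {"segment": "retail"}},
  {"order_id": 2, "region": "South", "amount": 80.0, "customer": {"segment": "wholesale"}},
  {"order_id": 3, "region": "North", "amount": 100.0},
  {"order_id": 4, "region": "South", "amount": 60.0, "customer": {"segment": "retail"}}
]
"""

def to_rows(text):
    """Flatten each JSON record into a fixed-schema (structured) row."""
    rows = []
    for rec in json.loads(text):
        rows.append({
            "order_id": rec["order_id"],
            "region": rec["region"],
            "amount": float(rec["amount"]),
            # A missing nested field becomes None instead of raising KeyError.
            "segment": rec.get("customer", {}).get("segment"),
        })
    return rows

def describe_by(rows, key, value):
    """Descriptive analytics: count, mean, and total of `value` for each `key`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {
        k: {"count": len(v), "mean": statistics.mean(v), "total": sum(v)}
        for k, v in groups.items()
    }

rows = to_rows(raw)
summary = describe_by(rows, "region", "amount")
print(summary)
```

Descriptive analytics answers "what happened"; diagnostic, predictive, and prescriptive analytics build on summaries like this one.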

This module focuses on essential Python concepts and techniques for data analytics. The module introduces basic Python concepts, such as the Python interpreter, Jupyter Notebook, input/output, and indentation, enabling students to start developing Python programs for data analytics. Students will learn to apply Python scalar types, objects, attributes, methods, and operators to create and manipulate data structures. They will also apply control statements and iterations, such as conditional statements and loops, to control the flow of execution and process data efficiently. The module covers the use of regular and lambda functions to create reusable and modular code. Additionally, students will learn to apply file-handling techniques to read from and write to files, facilitating data persistence and external data processing. By the end of this module, students will have the necessary Python skills to perform data manipulation, analysis, and processing tasks.

What's included

22 videos, 6 readings, 17 assignments, 1 discussion prompt
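
The Python features listed for this module fit together in a few lines. This sketch, using hypothetical names and scores, writes a small CSV file with `open` and the standard `csv` module, reads it back, branches with `if`/`elif`, loops over records, and uses a `lambda` as a sort key. The file is written to a fresh temporary directory rather than the working directory.

```python
import csv
import os
import tempfile

def classify(score):
    """Regular function using a conditional statement."""
    if score >= 75:
        return "high"
    elif score >= 50:
        return "medium"
    return "low"

def write_scores(path, scores):
    # 'with' closes the file even if an exception is raised.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "score"])
        for name, score in scores.items():   # loop over dictionary items
            writer.writerow([name, score])

def read_scores(path):
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # CSV values are read back as strings, so convert scores to int.
        return [(row["name"], int(row["score"])) for row in reader]

scores = {"asha": 82, "ravi": 47, "meera": 65}   # hypothetical data
path = os.path.join(tempfile.mkdtemp(), "scores.csv")
write_scores(path, scores)
records = read_scores(path)

# Lambda as a sort key: highest score first.
ranked = sorted(records, key=lambda r: r[1], reverse=True)
labels = {name: classify(score) for name, score in ranked}
print(ranked)
print(labels)
```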

This module explores essential data structures in Python, covering both immutable and mutable types and the powerful NumPy library. Students will learn to apply tuples and strings, along with their methods, to store and manipulate fixed data. They will also apply lists, dictionaries, and sets, as well as their respective methods and operations, to handle changeable data effectively. The module introduces NumPy, enabling students to create, manipulate, and perform arithmetic operations on NumPy arrays using built-in functions. By the end of this module, students will have a solid understanding of Python data structures and NumPy, equipping them with the necessary tools for efficient data manipulation and numerical computations in data analytics tasks.

What's included

19 videos, 4 readings, 15 assignments, 1 discussion prompt
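
A short illustration of the immutable/mutable distinction and of NumPy's vectorised arithmetic, assuming NumPy is installed; all values are made up for the example.

```python
import numpy as np

# Immutable: tuples and strings cannot be changed in place.
point = (3, 4)
x, y = point                              # tuple unpacking
dist = (x ** 2 + y ** 2) ** 0.5
name = "  data analytics  "
clean = name.strip().title()              # string methods return new strings

# Mutable: lists, dicts, and sets can be modified in place.
readings = [12, 7, 12, 9]
readings.append(15)
counts = {}
for r in readings:
    counts[r] = counts.get(r, 0) + 1      # frequency table in a dict
unique = set(readings)                    # duplicates removed

# NumPy: arithmetic on whole arrays without explicit Python loops.
arr = np.array(readings, dtype=float)
scaled = (arr - arr.mean()) / arr.std()   # z-scores
matrix = np.arange(6).reshape(2, 3)       # [[0, 1, 2], [3, 4, 5]]
col_sums = matrix.sum(axis=0)             # sum down each column
print(dist, clean, counts, unique, scaled, col_sums)
```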

This module focuses on exploratory data analysis (EDA) and visualisation using the Pandas library and Matplotlib in Python. Students will learn to apply Pandas to create, manipulate, and perform operations on Series and DataFrame objects, enabling efficient data analysis and preprocessing. They will conduct EDA to identify patterns, trends, and relationships in the data. Additionally, students will apply Matplotlib to create informative and visually appealing plots to effectively communicate insights derived from EDA. By the end of this module, students will have the skills to perform comprehensive exploratory data analysis and create meaningful visualisations using Python.

What's included

19 videos, 4 readings, 16 assignments, 1 discussion prompt
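
A minimal EDA sketch with Pandas on a small, hypothetical sales table: derive a column with element-wise Series arithmetic, summarise numeric columns with `describe`, aggregate by group, and check pairwise correlations. The column names and values are illustrative only.

```python
import pandas as pd

# Hypothetical sales data.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "East"],
    "units": [10, 4, 14, 7, 6, 9],
    "price": [2.5, 3.0, 2.5, 4.0, 3.0, 4.0],
})

df["revenue"] = df["units"] * df["price"]   # element-wise Series arithmetic

print(df.describe())                        # count, mean, std, quartiles per numeric column
by_region = df.groupby("region")["revenue"].agg(["count", "sum", "mean"])
print(by_region.sort_values("sum", ascending=False))
print(df[["units", "price", "revenue"]].corr())   # pairwise Pearson correlation
```

To visualise the grouped totals, `by_region["sum"].plot(kind="bar")` draws a bar chart through Matplotlib, which is Pandas' default plotting backend.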

This module focuses on data preprocessing techniques essential for preparing data for analysis. Students will learn to apply methods for reading and writing data in text format while identifying and addressing data quality issues. They will handle missing data by filtering out or filling in missing values and applying various data transformation techniques such as removing duplicates, mapping, replacing values, discretisation, outlier detection and filtering, and encoding categorical variables. Additionally, students will apply data aggregation techniques, including grouping, aggregation and combining functions, to summarise and analyse data. By the end of this module, students will have the skills to preprocess and clean datasets effectively, ensuring data quality and readiness for further analysis.

What's included

21 videos, 6 readings, 17 assignments, 1 discussion prompt
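
The preprocessing steps named in this module can be chained in Pandas. This sketch uses a tiny hypothetical table and illustrative choices (median imputation, the 1.5 * IQR outlier rule, hand-picked age bins); other imputation values or outlier rules may suit real data better.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records: inconsistent labels, a duplicate, missing values, an outlier.
raw = pd.DataFrame({
    "city": ["Pune", "pune", "Delhi", "Delhi", "Goa", "Pune"],
    "age": [25, 25, np.nan, 41, 33, 29],
    "income": [40.0, 40.0, 55.0, 55.0, np.nan, 500.0],
})

df = raw.copy()
df["city"] = df["city"].str.title()                 # normalise labels: "pune" -> "Pune"
df = df.drop_duplicates()                           # remove exact duplicate rows
df["age"] = df["age"].fillna(df["age"].median())    # fill in missing ages with the median
df = df.dropna(subset=["income"])                   # filter out rows with missing income

# Outlier filtering with the 1.5 * IQR rule.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[mask].copy()

# Discretisation: bin ages into labelled, right-closed intervals.
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 40, 120], labels=["<=30", "31-40", ">40"])

# Aggregation by group, then one-hot encoding of the categorical column.
income_by_city = df.groupby("city")["income"].mean()
encoded = pd.get_dummies(df, columns=["city"])
print(df, income_by_city, encoded, sep="\n\n")
```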

This module focuses on advanced data preprocessing techniques for handling large and complex datasets. Students will learn to apply data reduction techniques, including dimensionality reduction, numerosity reduction, and sampling methods, to reduce the size and complexity of datasets while preserving important information. They will also apply feature selection techniques, such as filter methods, wrapper methods, and embedded methods, to identify and select the most relevant features for data analysis. Additionally, students will explore feature extraction techniques, including Principal Component Analysis (PCA) and Covariance Analysis, to transform and extract new, informative features from the original dataset. By the end of this module, students will have the skills to effectively preprocess and optimise datasets for improved performance and insights in data analysis tasks.

What's included

13 videos, 3 readings, 14 assignments, 1 discussion prompt, 1 ungraded lab
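
One way to see the link between covariance analysis and PCA is to compute principal components directly from the covariance matrix. This NumPy sketch centres the data, eigendecomposes the covariance matrix with `np.linalg.eigh`, and projects onto the leading components. The synthetic dataset is an assumption, built so that one component dominates.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via the covariance matrix."""
    Xc = X - X.mean(axis=0)                  # centre each feature
    cov = np.cov(Xc, rowvar=False)           # feature-by-feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix: eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # sort components by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return Xc @ components, explained_ratio

rng = np.random.default_rng(0)
t = rng.normal(size=200)
# Three features: two driven by the latent variable t, one pure low-variance noise.
X = np.column_stack([
    t,
    2 * t + 0.1 * rng.normal(size=200),
    rng.normal(scale=0.1, size=200),
])
Z, ratio = pca(X, n_components=1)
print(Z.shape, ratio)
```

Library implementations such as scikit-learn's `PCA` typically use a singular value decomposition of the centred data instead, which yields the same components (up to sign) with better numerical behaviour.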

This module focuses on regression analysis, a fundamental technique in predictive modeling and data analysis. Students will learn to apply linear regression techniques, including univariate and multivariate linear models, to analyse and model the relationship between dependent and independent variables in real-world applications. They will also apply model fitting techniques, such as gradient descent, and evaluate regression models using appropriate metrics to select the best-performing model for a given dataset. Additionally, students will explore nonlinear regression techniques, including smoothing methods, regularised models, robust regression, and nonlinear models, to capture and model complex, nonlinear relationships between variables. By the end of this module, students will have the skills to effectively apply regression techniques to solve real-world problems and make data-driven predictions.

What's included

12 videos, 3 readings, 10 assignments, 1 discussion prompt, 1 ungraded lab
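
A minimal sketch of fitting a multivariate linear model by batch gradient descent on mean squared error, then evaluating it with MSE and R². The learning rate, epoch count, and synthetic data (true weights 3 and -2, intercept 0.5) are assumptions for illustration.

```python
import numpy as np

def fit_gd(X, y, lr=0.1, epochs=2000):
    """Multivariate linear regression fitted by batch gradient descent on MSE."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        err = X @ w + b - y                # residuals for the current parameters
        w -= lr * (2 / n) * (X.T @ err)    # gradient of MSE with respect to w
        b -= lr * (2 / n) * err.sum()      # gradient of MSE with respect to b
    return w, b

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 + rng.normal(scale=0.05, size=100)

w, b = fit_gd(X, y)
y_hat = X @ w + b
print(w, b, mse(y, y_hat), r2(y, y_hat))
```

On well-scaled features like these, the result should agree closely with the closed-form least-squares solution from `np.linalg.lstsq`; the iterative approach is mainly useful when a direct solve is too expensive.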

This module focuses on classification techniques, specifically rule-based and parameter-based models. Students will learn to apply decision trees to binary and multilabel classification problems and to evaluate the performance of these models. They will explore decision tree induction algorithms, including design issues and impurity measures, as well as random forests, to build effective and interpretable models. Students will also apply model selection techniques such as cross-validation, address overfitting to optimise decision tree models, and visualise decision boundaries. Additionally, they will learn to apply two parameter-based models, logistic regression and discriminant analysis, to classification problems and evaluate their performance. By the end of this module, students will have the skills to effectively apply classification techniques to real-world problems and make data-driven predictions.

What's included

17 videos, 4 readings, 17 assignments, 1 discussion prompt, 1 ungraded lab
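
Decision tree induction picks, at each node, the split that most reduces impurity. This pure-Python sketch implements the Gini and entropy measures and an exhaustive threshold search on a single numeric feature. The toy data and the `best_split` helper are hypothetical; a full tree would repeat the search over all features and recurse on each child node.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(xs, ys, impurity=gini):
    """Return (threshold, weighted impurity) of the best split x <= threshold."""
    best = (None, float("inf"))
    n = len(ys)
    for t in sorted(set(xs))[:-1]:   # splitting at the maximum would leave one side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * impurity(left) + len(right) / n * impurity(right)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1, 2, 3, 4, 5, 6]
ys = ["no", "no", "no", "yes", "yes", "yes"]
print(gini(ys), entropy(ys))
print(best_split(xs, ys), best_split(xs, ys, impurity=entropy))
```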

This module focuses on unsupervised learning techniques for clustering, which aim to discover natural groupings and patterns in data without prior knowledge of class labels. Students will learn to apply partitional clustering techniques, specifically the k-Means algorithm, considering similarity measures, distance matrices, and cluster goodness evaluation. They will also explore hierarchical clustering methods, both bottom-up agglomerative and top-down divisive, to create nested clusters and analyse data at different levels of granularity. Additionally, students will apply cluster validation techniques, including external and internal indices, to assess the quality of clustering results and determine the optimal number of clusters for a given dataset. By the end of this module, students will have the skills to effectively apply clustering techniques to real-world problems and gain insights from unlabeled data.

What's included

13 videos, 4 readings, 13 assignments, 1 discussion prompt
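
A NumPy sketch of partitional k-Means (Lloyd's algorithm), using the within-cluster sum of squared errors (SSE) as a simple goodness measure and the elbow heuristic for choosing k. The farthest-first initialisation is a simplifying assumption made here for reproducibility; libraries commonly use k-means++ seeding with several restarts instead. The three synthetic blobs are also assumptions.

```python
import numpy as np

def _assign(X, centroids):
    # Euclidean distance from every point to every centroid: shape (n, k).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-Means with a farthest-first initialisation."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        d = np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :], axis=2).min(axis=1)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = _assign(X, centroids)
    sse = float(((X - centroids[labels]) ** 2).sum())   # within-cluster sum of squares
    return labels, centroids, sse

rng = np.random.default_rng(1)
centres = np.array([[0, 0], [8, 0], [0, 8]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])

# Elbow heuristic: SSE drops sharply until k reaches the number of natural groups.
for k in range(1, 5):
    print(k, round(kmeans(X, k)[2], 1))
labels, centroids, sse = kmeans(X, 3)
```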

This module focuses on the privacy, fairness, and security of data analytics. Students will learn about risk assessment and threat modeling in the practical use of data analytics. Privacy-preserving data mechanisms for model privacy will be surveyed, and the attack strategies and defense mechanisms of model security will be emphasized. Notions of AI fairness and algorithmic bias will be covered at the pre-processing, in-processing, and post-processing stages of data analytics, and cost-sensitive classification and machine learning will be discussed as means of assessing model fairness. Model security will be formalized within frameworks of adversarial data mining for game-theoretic AI, with applications to the cyber kill chain in cybersecurity. Adversarial example games will be summarized with respect to specific targets and the adversary's capabilities and goals. An adversarial risk analysis of these games and their associated optimization trade-offs will be presented for binary, multiclass, and multilabel classification. The relationship between adversarial and robust data mining for classifier design will be motivated by the robustness properties that analytics models satisfy under defense mechanisms such as semi-supervised machine learning, adversarial training and learning, empirical risk minimization, and mistake-bound frameworks for adversarial classification. By the end of this module, students will have the skills to apply data analytics techniques to real-world problems and gain insights in a safe, secure, and transparent manner.

What's included

16 videos, 4 readings, 17 assignments, 1 discussion prompt, 1 ungraded lab
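
To make the attack side of this module concrete, this sketch applies a fast-gradient-sign style perturbation to a logistic-regression classifier, where the gradient of the log loss with respect to the input has the closed form (p - y) w. The weights, bias, input, and budget `eps` are hypothetical; the point is that a small change bounded in the L-infinity norm can flip the prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_linear(x, y, w, b, eps):
    """Fast-gradient-sign perturbation against a logistic-regression scorer.

    With p = sigmoid(w.x + b) and log loss L = -[y log p + (1 - y) log(1 - p)],
    the gradient of L with respect to the input x is (p - y) * w.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)   # loss-increasing step, each feature moved by at most eps

# Hypothetical trained model and a correctly classified positive example.
w = np.array([1.5, -2.0, 0.5])
b = -0.2
x = np.array([0.6, 0.1, 0.4])
y = 1

x_adv = fgsm_linear(x, y, w, b, eps=0.2)
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))
```

Adversarial training, one of the defense mechanisms listed above, would add perturbed points such as `x_adv` (with their true labels) back into the training set.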

What's included

1 assignment

Instructors

Prof. Seetha Parameswaran
Birla Institute of Technology & Science, Pilani
2 courses, 498 learners
Professor Aneesh S Chivukula
Birla Institute of Technology & Science, Pilani
1 course, 471 learners
