This article provides a comprehensive overview of data science terminology, explaining key data science terms, buzzwords, and analytical concepts.

Data science is the scientific study of data. Data scientists ask questions and find ways to answer those questions with data. They may work on capturing data, transforming raw data into a usable form, analyzing data, and creating predictive models. Many data scientists start their careers as data analysts, where they familiarize themselves with the data analysis process and the overall data analytics landscape.
Data science terminology refers to the specific vocabulary and concepts used within the field of data science to describe its techniques, tools, and processes. These terms are crucial for anyone involved in data science as they provide a common language that facilitates clearer communication, more effective collaboration, and a deeper understanding of complex concepts. Having a firm grasp of data science terminology is vital for accurately interpreting and implementing data-driven strategies in various industries.
The data science glossary below can be a useful reference if you are familiar with basic terms and want to advance your understanding of data science. You’ll find common data science terms, their definitions, and helpful links to explore these topics further.
An algorithm is a set of instructions or rules to follow in order to complete a specific task. Algorithms can be particularly useful when you’re working with big data or machine learning. Data analysts may use algorithms to organize or analyze data, while data scientists may use algorithms to make predictions or build models.
Artificial intelligence (AI) uses computer science and data to enable problem solving in machines. In this case, the intelligence is “artificial” because it’s a computer programmed to perform tasks commonly associated with human intelligence.
Learn more: Expand your data science toolkit by exploring AI courses.
Big data is a large collection of data characterized by the three V’s: volume, velocity, and variety. Volume refers to the amount of data, and big data deals with high volumes. Velocity refers to the rate at which data is collected; big data is collected at a high velocity and often streams directly into memory. Variety refers to the range of data formats; big data tends to mix structured, semi-structured, and unstructured data in formats such as numbers, text strings, images, and audio.
Learn more: Unlock the power of massive datasets by mastering analysis techniques with big data.
Business intelligence (BI) is data analytics used to empower organizations to make data-driven business decisions. Business intelligence analysts analyze business data like revenue, sales, or customer data, and offer recommendations based on their analysis.
Learn more: Drive strategic decision-making in your organization using business intelligence.
A changelog is a list documenting all of the steps you took when working with your data. This can be helpful in the event that you need to return to your original data or recall how you prepared your data for analysis.
Classification is a machine learning problem that organizes data into categories. You may use this to create email spam filters, for example. Some examples of algorithms commonly used to create classification models are logistic regression, decision trees, K-nearest neighbor (KNN), and random forest.
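To make the idea concrete, here is a minimal sketch of a K-nearest neighbor classifier written in plain Python. The two features (number of links and number of exclamation marks) and the tiny training set are hypothetical, chosen only for illustration; a real spam filter would use far richer features and a tested library.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the labeled examples by Euclidean distance to the query point.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical features: (number of links, number of exclamation marks).
training_emails = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((1, 1), "ham"),
    ((6, 4), "spam"),
    ((7, 5), "spam"),
    ((8, 3), "spam"),
]

print(knn_predict(training_emails, (7, 4)))  # nearest neighbors are all spam
print(knn_predict(training_emails, (0, 1)))  # nearest neighbors are all ham
```

The model here is simply the stored examples plus a distance rule, which is why KNN is often used as a first illustration of classification.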
A dashboard is a tool for monitoring and displaying live data. It is typically connected to a database and features visualizations that automatically update to reflect the most current data in the database.
Data analytics is the collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision making. Data analytics encompasses data analysis (the process of deriving information from data), data science (using data to theorize and forecast), and data engineering (building data systems). Data analysts, data scientists, and data engineers are all data analytics professionals.
Learn more: Enhance your analytical capabilities and drive informed decision-making with data analytics.
There are four key types of data analytics:
Descriptive analytics tells us what happened.
Diagnostic analytics tells us why something happened.
Predictive analytics tells us what will likely happen in the future.
Prescriptive analytics tells us how to act.
Read more: Data Analysis Terms: A to Z Glossary
Data architecture, also called data design, is the plan for an organization’s data management system. This can include all touchpoints in the data lifecycle, including how the data is gathered, organized, utilized, and discarded. Data architects design the blueprints that organizations use for their data management systems.
Learn more: Learn to design and optimize data systems effectively with data architecture.
Data cleaning, cleansing, or scrubbing is the process of preparing raw data for analysis. When cleaning your data, you verify that your data is accurate, complete, consistent, and unbiased. It’s important to make sure you have clean data prior to analysis because unclean or dirty data can lead to inaccurate conclusions and misguided business decisions.
Learn more: Refine your data for analysis and ensure accuracy with data cleaning.
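As a rough illustration of data cleaning, the sketch below checks a few records for completeness, accuracy, and consistency. The field names (`id`, `name`, `age`) and the valid age range are assumptions made for this example, not a standard.

```python
def clean_records(records):
    """Return records with tidy names, plausible ages, and no duplicate ids."""
    seen_ids = set()
    cleaned = []
    for record in records:
        name = (record.get("name") or "").strip()
        age = record.get("age")
        # Completeness and accuracy: drop rows with no name or an impossible age.
        if not name or not isinstance(age, int) or not 0 <= age <= 120:
            continue
        # Consistency: keep only the first row seen for each id.
        if record["id"] in seen_ids:
            continue
        seen_ids.add(record["id"])
        cleaned.append({"id": record["id"], "name": name.title(), "age": age})
    return cleaned


raw = [
    {"id": 1, "name": "  ada lovelace ", "age": 36},
    {"id": 1, "name": "Ada Lovelace", "age": 36},  # duplicate id
    {"id": 2, "name": "", "age": 41},              # missing name
    {"id": 3, "name": "alan turing", "age": -5},   # impossible age
    {"id": 4, "name": "grace hopper", "age": 85},
]

cleaned = clean_records(raw)
print(cleaned)
```

Whether to drop, flag, or repair a bad row depends on the analysis; this sketch drops them only to keep the example short.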
Data engineering is the process of making data accessible for analysis. Data engineers build systems that collect, manage, and convert raw data into usable information. Some common tasks include developing algorithms to transform data into a more useful form, building database pipeline architectures, and creating new data analysis tools.
Learn more: Build data solutions and streamline data processing with data engineering.
Data enrichment is the process of adding data to an existing dataset. Typically, a data scientist would enrich data during the data transformation process as they prepare to begin their analysis if they realize additional data is needed to answer the business question.
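A small sketch of enrichment, assuming a hypothetical list of orders and a separate lookup of customer regions: each order gains a `region` field from the second source, and unknown customers are marked with `None` rather than guessed.

```python
orders = [
    {"order_id": 101, "customer_id": "c1", "total": 40.0},
    {"order_id": 102, "customer_id": "c2", "total": 15.5},
    {"order_id": 103, "customer_id": "c9", "total": 99.0},
]

# Hypothetical second data source used to enrich the orders.
customer_regions = {"c1": "EMEA", "c2": "APAC"}


def enrich(order_rows, regions):
    """Return copies of the orders with a region added from the lookup."""
    return [
        {**order, "region": regions.get(order["customer_id"])}
        for order in order_rows
    ]


enriched = enrich(orders, customer_regions)
for row in enriched:
    print(row)
```

The original rows are left unchanged, which makes it easier to return to the source data if the enrichment turns out to be wrong.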
Data governance is the formal plan for how an organization manages company data. Data governance encompasses rules for the way data is accessed and used and can include accountability and compliance rules.
Learn more: Master the essentials of managing, securing, and utilizing data effectively with data governance.
A data lake is a data storage repository designed to capture and store a large amount of structured, semi-structured, and unstructured raw data. Data scientists use the data in data lakes for machine learning or AI algorithms and models, or they can process the data and transfer it to a data warehouse.
A data mart is a subset of a data warehouse that houses all processed data relevant to a specific department. While a data warehouse may contain data pertaining to the finance, marketing, sales, and human resources teams, a data mart may isolate the finance team data.
Data mining is the process of closely examining data to identify patterns and glean insights. Data mining is a central aspect of data analytics; the insights you find during the mining process will inform your business recommendations.
Learn more: Discover patterns and derive insights from complex datasets with data mining.
Data modeling is the process of mapping and building data pipelines that connect data sources for analysis. A data model is a tool that implements those pipelines and organizes data across data sources. Data modelers are systems analysts who work with data architects and database administrators to design databases and data systems.
Learn more: Shape and refine data structures for insightful analysis with data modeling.
Data visualization is the representation of information and data using charts, graphs, maps, and other visual tools. With strong data visualizations, you can foster storytelling, make your data accessible to a wider audience, identify patterns and relationships, and explore your data further.
Learn more: Transform complex data into clear, impactful visual narratives with data visualization.
A data warehouse is a centralized data repository that stores processed, organized data from multiple sources. Data warehouses may contain a combination of current and historical data that has been extracted, transformed, and loaded from internal and external databases.
Learn more: Optimize data storage and analysis for enterprise needs with a data warehouse.
Data wrangling, also called data munging or data remediation, is the process of converting raw data into a usable form. There are four stages of the munging process: discovery, data transformation, data validation, and publishing. The data transformation stage can be broken down further into tasks like data structuring, data normalization or denormalization, data cleaning, and data enrichment.
Learn more: Streamline the transformation of raw data into actionable insights with data wrangling.
A database is an organized collection of information that can be searched, sorted, and updated. This data is often stored electronically and managed by software called a database management system (DBMS). Oftentimes, you’ll need to use a query language, such as Structured Query Language (SQL), to interact with your database.
Learn more: Enhance your skills in organizing and maintaining data systems with database management.
Deep learning is a machine learning technique that layers algorithms and computing units, or neurons, into what is called an artificial neural network (ANN). Unlike more traditional machine learning approaches, deep learning algorithms can improve incorrect outcomes through repetition with little human intervention. These deep neural networks take inspiration from the structure of the human brain.
Learn more: Explore advanced neural networks and machine learning techniques with deep learning.
Machine learning is a subset of AI in which algorithms mimic human learning while processing data. With machine learning, algorithms can improve over time, becoming increasingly accurate when making predictions or classifications. Machine learning engineers build, design, and maintain AI and machine learning systems.
Learn more: Unlock the potential of predictive analytics and algorithmic processing with machine learning.
A relational database is a database that contains several tables with related information. Even though data is stored in separate tables, you can access related data across several tables with a single query. For example, a relational database may have one table for inventory and another table for customer orders. When you look up a specific product in your relational database, you can retrieve both inventory and customer order information at the same time.
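The inventory and orders example can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table names, column names, and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE inventory (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        in_stock INTEGER
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES inventory(product_id),
        customer TEXT,
        quantity INTEGER
    );
    INSERT INTO inventory VALUES (1, 'keyboard', 12), (2, 'monitor', 3);
    INSERT INTO orders VALUES (10, 1, 'Ana', 2), (11, 1, 'Ben', 1), (12, 2, 'Ana', 1);
    """
)

# One query retrieves stock levels and customer orders for a single product.
rows = conn.execute(
    """
    SELECT i.name, i.in_stock, o.customer, o.quantity
    FROM inventory AS i
    JOIN orders AS o ON o.product_id = i.product_id
    WHERE i.name = ?
    ORDER BY o.order_id
    """,
    ("keyboard",),
).fetchall()
conn.close()

print(rows)
```

The shared `product_id` column is what relates the two tables, so the join can combine them without duplicating product details in every order row.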
Regression is a machine learning problem that uses data to predict a continuous numeric outcome, such as a price or a temperature. Some examples of algorithms commonly used to create regression models are linear regression and ridge regression.
Learn more: Master the fundamentals of predictive modeling and statistical analysis with regression.
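As a minimal sketch of linear regression, the function below fits a straight line by ordinary least squares in plain Python. The home sizes and prices are invented and deliberately lie on a perfect line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: size in square meters, price in thousands.
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 100 + intercept  # prediction for a 100 m² home
print(slope, intercept, predicted_price)
```

Real data is noisy, so the fitted line would pass near the points rather than through them; ridge regression adds a penalty on large coefficients to the same idea.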
Reinforcement learning is a type of machine learning in which an agent learns by interacting with its environment, receiving rewards for desirable actions and penalties for undesirable ones. This type of machine learning may be used to develop autonomous vehicles. Common algorithms include temporal difference learning, Q-learning, and deep Q-networks.
Learn more: Explore the dynamic field of algorithms that learn through trial and error with reinforcement learning.
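The trial-and-error loop can be sketched with tabular Q-learning on a toy problem. Everything about the environment is an assumption made for this example: a corridor of five cells, a reward of 1 for reaching the last cell, and hand-picked learning parameters.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, 1)   # index 0 moves left, index 1 moves right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3


def step(state, action):
    """Move within the corridor and reward the agent only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def choose_action(q, state, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    return rng.choice([a for a, value in enumerate(q[state]) if value == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = choose_action(q, state, rng)
            next_state, reward = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward plus
            # the discounted value of the best action in the next state.
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

After training, the greedy policy read from the table heads toward the goal from every cell, even though the agent was never told which direction was correct.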
Structured data is defined and searchable. It is formatted, for example, into rows and columns. Because of its tidy formatting, structured data is typically easier to analyze than unstructured data. This includes data like phone numbers, dates, and product SKUs.
Structured Query Language, or SQL (pronounced “sequel”), is a computer programming language for managing relational databases. It’s among the most common languages for database management.
Learn more: Enhance your database management and querying skills with SQL.
Supervised learning is a type of machine learning that learns from labeled historical input and output data. It’s “supervised” because you are feeding it labeled information. This type of machine learning may be used to predict real estate prices or find disease risk factors. Common algorithms used during supervised learning are neural networks, decision trees, linear regression, and support vector machines.
Learn more: Develop skills in training models on labeled data with supervised learning.
Unstructured data is data that is not organized in any apparent way. In order to analyze unstructured data, you’ll typically need to implement some type of structured organization.
Unsupervised learning is a machine learning type that looks for data patterns. Unlike supervised learning, unsupervised learning doesn’t learn from labeled data. This type of machine learning is often used to develop predictive models and to create clusters. For example, you can use unsupervised learning to group customers based on purchase behavior, and then make product recommendations based on the purchasing patterns of similar customers. Hidden Markov models, k-means, hierarchical clustering, and Gaussian mixture models are common algorithms used during unsupervised learning.
Learn more: Explore the techniques for analyzing unlabeled data sets with unsupervised learning.
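The customer-grouping example can be sketched with a one-dimensional k-means in plain Python. The monthly spend figures and the starting centroids are hypothetical; production code would normally use a library implementation with better initialization.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster numbers by repeatedly assigning them to the nearest centroid."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: each value joins the cluster of its nearest centroid.
        for value in values:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(value - centroids[i])
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer; no labels are provided.
monthly_spend = [12, 15, 18, 95, 100, 110]
centroids, clusters = kmeans_1d(monthly_spend, centroids=[0.0, 50.0])
print(centroids)
print(clusters)
```

The algorithm separates low and high spenders without being told those groups exist, which is the sense in which the learning is unsupervised.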
Dive into the fascinating world of data science with courses designed to enhance your skills and confidence in addressing complex data challenges. Whether just starting or aiming to deepen your expertise, these courses provide the essential knowledge you need.
Take advantage of the opportunity to advance in the field of data science by getting started today with IBM's Data Science Professional Certificate on Coursera. You'll master the most up-to-date practical skills and knowledge that data scientists use in their daily roles, apply your new skills to real-world projects, and build a portfolio of data projects that showcase your proficiency to employers.
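Data science terminology is the specialized vocabulary used in data analysis, machine learning, and artificial intelligence. Key data science terms include algorithm, model, data pipeline, and statistical methods. These terms help professionals communicate effectively and solve complex, data-driven problems.
The four main types of analytics in data science are descriptive analytics (analyzing past data trends), diagnostic analytics (identifying the reasons behind those trends), predictive analytics (using models to forecast future outcomes), and prescriptive analytics (offering recommendations based on data insights). Understanding these data science terms helps businesses optimize their decisions and strategy.
Switching careers to data science can be highly rewarding. As businesses rely more heavily on data-driven decisions, demand for data scientists is growing rapidly. Data science roles typically offer competitive salaries and opportunities for advancement.
In addition, data science is a versatile field that applies across industries, offering a dynamic and stimulating career path. With the necessary skills and an awareness of industry trends, a move into data science can be a worthwhile investment in your future.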
Writer
Coursera is the global online learning platform that offers anyone, anywhere access to online course...
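This content is provided for informational purposes only. Learners are advised to conduct additional research to ensure that the courses and other credentials they pursue meet their personal, professional, and financial goals.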