What Is Named Entity Recognition (NER) and How Does It Work?

Written by Jessica Schulze • Updated on

The NER technique is used in many industries, from entertainment to health care. Learn why it’s popular and how it works in this article.


Named entity recognition (NER) is a natural language processing (NLP) technique. NLP is a subfield of artificial intelligence (AI) that relies heavily on machine learning (ML). Although it isn't exactly a household name, named entity recognition powers much of the technology we use every day. It helps search engines produce the results we seek and enables chatbots to answer our questions in a human-like, conversational manner. In the following article, you can learn more about how this technique works, who uses it, and why.

NER definition 

Named entity recognition, or NER, is a process that extracts information from text. It's also referred to as entity chunking, entity extraction, or entity identification. The goal is to locate mentions of named entities in text and sort them into predefined categories, such as people, organizations, and locations. Breaking this term down into two parts can help us better understand it:

Named Entity: A named entity is any real-world object, such as a person, organization, place, or product, that can be referred to by a proper name in text.

Recognition: NER systems are trained to recognize these objects and sort them into helpful classifications called entity types.

4 types of named entity recognition models

  1. Dictionary-based: Dictionary-based NER systems reference terms listed in dictionaries to identify their presence in text. Dictionaries can be any collection of words related to a specific field or domain. You can create one yourself or use public sources such as databases. 

  2. Rule-based: Rule-based NER systems rely on a set of instructions, which you must write yourself, for extracting named entities from text. These rules generally fall into two types: pattern-based rules, which relate to word forms and structure, and context-based rules, such as "if an abbreviation such as Mr. or Ms. precedes a name, then that abbreviation is the person's honorific title." These rules can also be combined with dictionaries.

  3. Machine learning-based: Machine learning-based NER systems use statistical models trained to identify entity names. To develop an ML-based NER system, you train the model on annotated documents: texts in which each entity has been labeled with its type. From these labeled examples, the model learns patterns it can apply to new, unseen text.

  4. Hybrid systems: Hybrid NER systems combine more than one of the approaches listed above. 
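The dictionary-based, rule-based, and hybrid approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production system: the `GAZETTEER` entries, the entity labels (`ORG`, `LOC`, `PERSON`), and the honorific rule (written in the spirit of the Mr./Ms. example above, but tagging the following name as a person) are all hypothetical choices made for this sketch.

```python
import re

# Hypothetical gazetteer (dictionary) mapping known names to entity types.
GAZETTEER = {"A24": "ORG", "Netflix": "ORG", "Paris": "LOC"}

# Hypothetical context-based rule: a capitalized word that follows an
# honorific such as "Mr." or "Ms." is tagged as a person's name.
HONORIFIC_RULE = re.compile(r"\b(?:Mr|Ms|Mrs|Dr)\.\s+([A-Z][a-z]+)")


def dictionary_entities(text):
    """Dictionary-based: find exact, case-sensitive whole-word matches."""
    found = []
    for term, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            found.append((match.start(), match.end(), term, label))
    return found


def rule_entities(text):
    """Rule-based: apply the honorific context rule."""
    return [(m.start(1), m.end(1), m.group(1), "PERSON")
            for m in HONORIFIC_RULE.finditer(text)]


def hybrid_entities(text):
    """Hybrid: merge both result lists, ordered by position in the text."""
    return sorted(dictionary_entities(text) + rule_entities(text))


print(hybrid_entities("Ms. Goth spoke to A24 in Paris."))
# [(4, 8, 'Goth', 'PERSON'), (18, 21, 'A24', 'ORG'), (25, 30, 'Paris', 'LOC')]
```

The dictionary lookup only finds exact spellings, and the rule only fires after an honorific, so each source catches entities the other misses. That gap is one reason hybrid systems combine them, or pair them with a trained model.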
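Machine learning-based systems need annotated text to learn from. One common way to store token-level annotations is the BIO scheme, where `B-` marks the first token of an entity, `I-` marks each following token inside it, and `O` marks tokens outside any entity. The sketch below converts labeled spans into BIO tags; the `(start, end, label)` span format and the example labels are assumptions for illustration. Pairs of tokens and tags like these are the kind of training data a statistical model learns from.

```python
def spans_to_bio(tokens, spans):
    """Convert labeled token spans into one BIO tag per token.

    spans holds (start, end, label) tuples over token indices, with `end`
    exclusive; spans are assumed to be non-empty and not to overlap.
    """
    tags = ["O"] * len(tokens)  # every token starts outside any entity
    for start, end, label in spans:
        tags[start] = "B-" + label  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label  # remaining tokens inside the entity
    return tags


tokens = ["A24", "released", "a", "movie", "starring", "Mia", "Goth"]
spans = [(0, 1, "ORG"), (5, 7, "PERSON")]  # hypothetical annotations
for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(token, tag)
```

Running it prints one token per line with its tag: `A24` gets `B-ORG`, `Mia` gets `B-PERSON`, `Goth` gets `I-PERSON`, and every other token gets `O`.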

Why is named entity recognition useful? 

NER is especially useful for analyzing unstructured text. In the context of data sets, "unstructured" refers to data that lacks a predefined organization or database format, such as emails, social media posts, or clinical notes. Structured data, by contrast, is stored in fixed fields, like the rows and columns of a spreadsheet or database table. NER helps turn the former into the latter by pulling specific items, such as names, dates, and organizations, out of free text. Because NER systems reduce the need for time- and resource-intensive human analysis, they are well suited to situations that involve large quantities of text.
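As a rough illustration of the unstructured-to-structured step, the sketch below assumes an NER system has already returned (document, entity, type) mentions. These are hard-coded here as hypothetical data. The sketch writes them out as CSV rows with fixed columns that a spreadsheet or database could load.

```python
import csv
import io

# Hypothetical NER output: (document ID, entity text, entity type) triples,
# hard-coded here in place of a real extraction step.
mentions = [
    ("email-001", "Acme Corp", "ORG"),
    ("email-001", "Berlin", "LOC"),
    ("email-002", "Jane Doe", "PERSON"),
]


def to_csv(rows):
    """Serialize entity mentions as CSV: a header, then one row per mention."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["doc_id", "entity", "type"])
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(mentions))
```

Once the mentions sit in fixed columns, questions such as "which emails mention an organization?" become simple filters instead of a manual read-through.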

What are examples of named entity recognition industry applications?

  • Customer service: NER models are used in customer service to power chatbots and organize data related to customer care. For example, a support chatbot can identify entities such as product names, order numbers, or locations in a user's message to work out what the user is asking about. A customer support system can also route users to the appropriate departments by categorizing their complaints and matching them to resolutions.

  • Health care: Medical professionals use NER models to analyze large amounts of documentation regarding diseases, drugs, and patients. Being able to quickly identify and extract the most pertinent information from lengthy, unstructured text helps reduce research time. 

  • Finance: In the financial field, NER can be used to monitor trends and inform risk analyses. Aside from financial information such as loans and earnings reports, NER models can analyze company names and other relevant mentions on social media to monitor developments that may affect stock prices. 

  • Entertainment: Recommendation systems such as the ones you see on Netflix, Spotify, and Amazon can use NER to identify entities, such as actors, artists, and titles, in your search history and the content you've recently interacted with.

Named entity recognition example in NLP

Named entity recognition systems can be used to enhance other natural language processing tasks, such as parsing and part-of-speech (POS) tagging, which labels each word with its part of speech based on context. For example, once NER has recognized "New York City" as a single location, a downstream tagger or parser can treat the three words as one proper-noun unit rather than analyzing them separately.
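A minimal sketch of that idea, assuming the entity spans are already known (here they are given by hand as token indices): each multi-word entity is collapsed into one token before the token list is passed to a later stage. The `merge_entity_spans` function is written for this example only. Libraries such as spaCy offer a `merge_entities` pipeline component for a similar purpose; check their documentation for its exact behavior.

```python
def merge_entity_spans(tokens, spans):
    """Collapse each multi-token entity span into a single token.

    spans holds (start, end) token indices, `end` exclusive; spans are
    assumed not to overlap. Empty spans are ignored.
    """
    span_ends = {start: end for start, end in spans if end > start}
    merged = []
    i = 0
    while i < len(tokens):
        if i in span_ends:
            end = span_ends[i]
            merged.append(" ".join(tokens[i:end]))  # one token per entity
            i = end
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = ["She", "moved", "to", "New", "York", "City", "last", "year"]
print(merge_entity_spans(tokens, [(3, 6)]))
# ['She', 'moved', 'to', 'New York City', 'last', 'year']
```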

How does named entity recognition work?

The named entity recognition process can be broken down into five steps:

  1. Tokenization: Text must first be split into smaller units, called tokens, that the NER system can process. Tokens are usually single words, pieces of words, or punctuation marks, although some systems first split text into sentences. For example, "A24 released a movie starring Mia Goth" may be split into the following tokens: A24, released, a, movie, starring, Mia, Goth.

  2. Identification: This step is where statistical methods or semantic rules come into play to detect which tokens are likely to belong to a named entity. The NER system can identify entities by clues such as format or capitalization. For example, the capitalized word "Mia" followed by the capitalized word "Goth" suggests a multi-word proper noun.

  3. Classification: Once candidate entities have been identified, each one is sorted into a predefined category, such as "company," "person," or "location."

  4. Contextual analysis: To improve output accuracy, NER systems use context clues. In the previous example, the word "starring" usually precedes an actor's name, which supports classifying "Mia Goth" as a person. As a result, "Goth" is recognized as a last name rather than a subculture.

  5. Post-processing: The post-processing phase refines the NER system's results. For example, you might use a knowledge base to verify or enrich the extracted entities, or fine-tune categorization rules to resolve ambiguous or inconsistent labels.
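The five steps can be sketched as a toy Python pipeline. This is an illustrative assumption, not how production systems are built. Real NER systems typically use trained statistical models for identification, classification, and context. This sketch instead uses a capitalization heuristic, a tiny hypothetical knowledge base (`KNOWN_ENTITIES`), and a hypothetical set of context cue words (`PERSON_CUES`).

```python
import re

# Hypothetical knowledge base and context cues, for illustration only.
KNOWN_ENTITIES = {"A24": "ORG"}
PERSON_CUES = {"starring", "featuring"}


def tokenize(text):
    """Step 1: split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def identify(tokens):
    """Step 2: group runs of capitalized tokens into candidate spans."""
    spans, start = [], None
    for i, token in enumerate(tokens + [""]):  # sentinel closes a final run
        if token[:1].isupper():
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    return spans


def classify(tokens, spans):
    """Step 3: label candidates found in the knowledge base; others get None."""
    return [(s, e, KNOWN_ENTITIES.get(" ".join(tokens[s:e]))) for s, e in spans]


def add_context(tokens, labeled):
    """Step 4: label an unknown candidate as PERSON if a cue word precedes it."""
    result = []
    for s, e, label in labeled:
        if label is None and s > 0 and tokens[s - 1].lower() in PERSON_CUES:
            label = "PERSON"
        result.append((s, e, label))
    return result


def post_process(tokens, labeled):
    """Step 5: drop still-unlabeled candidates; return (text, label) pairs."""
    return [(" ".join(tokens[s:e]), label) for s, e, label in labeled if label]


def recognize(text):
    tokens = tokenize(text)
    labeled = add_context(tokens, classify(tokens, identify(tokens)))
    return post_process(tokens, labeled)


print(recognize("A24 released a movie starring Mia Goth."))
# [('A24', 'ORG'), ('Mia Goth', 'PERSON')]
```

In a sentence such as "The movie starring Mia Goth.", the sentence-initial "The" is also capitalized, so step 2 flags it as a candidate. Step 5 then discards it because neither the knowledge base nor the context rule assigned it a label.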

Pros and cons of using named entity recognition systems

Advantages

  • Automates information extraction in large volumes of text.
  • Applicable in nearly every industry.
  • Helps eliminate human errors during text analysis, such as oversights.

Disadvantages

  • Defining rules and providing NER models with vocabulary can be time-consuming.
  • Human language evolves constantly, requiring NER systems to be updated to avoid false-positive identifications.
  • The NER process does not evaluate text for truthfulness.
  • Can struggle with spelling variations and with spoken words that have been converted to text.
  • Machine learning-based NER outputs can be challenging to explain.

Learn more about named entity recognition with Coursera 

You can strengthen your knowledge of natural language processing and machine learning with expert-level guidance on Coursera. In the IBM Machine Learning Professional Certificate, offered by IBM, you can learn the up-to-date practical skills and knowledge that machine learning experts use in their daily roles. By the end, you'll predict course ratings by training a neural network and construct regression and classification models.

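SEO Content Manager I

Jessica is a technical writer who specializes in computer science and information technology. With a background in law, technology, and professional communication, she finds inspiration in turning complex topics into clear, engaging copy.

This content is for informational purposes only. Learners are advised to do additional research to ensure that the courses and other credentials they pursue meet their personal, professional, and financial goals.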