Amazon Comprehend

Overview

AWS provides numerous cloud services for handling the huge amounts of data generated today. Among them, Amazon Comprehend uses Natural Language Processing (NLP) to extract valuable insights from the content of documents in UTF-8 text format, which helps in building new products. It develops these insights by recognizing sentiment, entities, key phrases, Personally Identifiable Information (PII), language, and other similar elements in a document. With Amazon Comprehend, we can search social networking feeds for mentions of products or scan an entire document repository for specific key phrases.

What is NLP

Amazon Comprehend is the subject of this article, but let us first briefly discuss Natural Language Processing, the concept around which Amazon Comprehend was built.

Natural Language Processing (widely known as NLP) is an approach that lets computers understand, analyze, and extract significant meaning from textual data. With NLP, we can process a piece of text and extract important information such as its syntax, its category, its key phrases, its sentiment, the key entities it discusses (location, date, brand, and so on), and even the language it is written in. This makes NLP valuable to businesses: it captures useful insights from unstructured data that was previously too burdensome to analyze.

[Figure: flowchart of how text data is processed for Natural Language Processing]

[Figure: applications of Natural Language Processing and their use cases]

What is Amazon Comprehend?

Now that we have covered NLP, let us look at what Amazon Comprehend is.

In brief, Amazon Comprehend is a managed natural language processing (NLP) service that uses machine learning (ML) to discover valuable insights in the text of a document. Amazon Comprehend offers APIs for Sentiment Analysis, Key Phrase Extraction, Entity Recognition, Custom Classification, Custom Entity Recognition, and more, which help a business grow by understanding its customers' pain points.

We can easily integrate NLP into our applications and start analyzing what the text in different documents says about the products that the organization manufactures or offers to its customers. By calling the Amazon Comprehend APIs from the application and providing the source text or its location, we get results in JSON format: the APIs detect the language of the text, extract key phrases and entities such as people, gauge sentiment about products or services, and can also identify the relevant topics in a library of documents. The corpus of text documents passed to Amazon Comprehend can consist of customer review forums, web pages, customer support tickets, product reviews, social media feeds, emails, or even blogs.
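As a rough illustration of calling the APIs from an application, here is a minimal Python sketch that chains three synchronous operations of the AWS SDK for Python (boto3): `detect_dominant_language`, `detect_sentiment`, and `detect_key_phrases`. The helper name `analyze_text` and the shape of its return value are assumptions made for this example. The client is passed in as a parameter, so the function itself has no dependency on credentials.

```python
from typing import Any, Dict


def analyze_text(client: Any, text: str) -> Dict[str, Any]:
    """Detect the language of `text`, then its sentiment and key phrases.

    `client` is expected to behave like a boto3 Comprehend client.
    `analyze_text` is a hypothetical helper, not part of any AWS SDK.
    """
    # DetectDominantLanguage returns candidate languages with confidence scores.
    languages = client.detect_dominant_language(Text=text)["Languages"]
    dominant = max(languages, key=lambda lang: lang["Score"])
    code = dominant["LanguageCode"]

    # The sentiment and key phrase operations need the language code of the text.
    sentiment = client.detect_sentiment(Text=text, LanguageCode=code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=code)

    return {
        "language": code,
        "sentiment": sentiment["Sentiment"],
        "key_phrases": [phrase["Text"] for phrase in phrases["KeyPhrases"]],
    }
```

With AWS credentials configured, the client would typically be created with `boto3.client("comprehend")` and passed as the first argument. Each underlying call is billed separately.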

[Figure: example of the output that Amazon Comprehend can produce]

Quick Note: Amazon Comprehend can help differentiate your business, because it lets you custom-train a model to organize text documents and identify their associated topics with no prior machine learning experience.

[Figure: overview of what Amazon Comprehend offers its customers]

Amazon Comprehend helps businesses uncover beneficial insights from their text-based documents. Out of the box, it offers entity extraction, key phrase extraction, language identification, personally identifiable information (PII) detection, and sentiment analysis. The ML models behind these features require no training; the text only needs to be properly formatted and uploaded to AWS, which gives an easy path to understanding unstructured and untagged text data.

Features of Amazon Comprehend

Listed below are the features that Amazon Comprehend offers its users:

Keyphrase Extraction – The Keyphrase Extraction API returns the key phrases or talking points of a text, each with a confidence score that supports its credibility as a key phrase.

Language Detection – With the Language Detection API, Amazon Comprehend automatically recognizes text written in 100 languages and returns the dominant language along with a confidence score indicating how strongly that language dominates.

Multiple Language Support – You can perform text analysis in multiple languages, such as German, English, Hindi, French, Korean, Arabic, Chinese (Simplified), Spanish, Italian, Chinese (Traditional), Portuguese, and Japanese.

Sentiment Analysis – The Sentiment Analysis API returns the overall sentiment of a text, which can be Positive, Negative, Neutral, or Mixed.

Custom Classification – The Custom Classification API lets you build custom text classification models with your business-specific labels without diving deep into ML. For example, you can use it to automatically categorize inbound requests by question type, based on how customers describe their issues.

Syntax Analysis – With the Amazon Comprehend Syntax API, customers can analyze text using tokenization and Parts of Speech (PoS) tagging, identifying word boundaries and labels such as nouns and adjectives.

Entity Recognition – The Entity Recognition API returns the named entities in the provided text, automatically categorized by type (such as "Locations" or "People").

Topic Modeling – Topic Modeling identifies relevant topics (terms) in a collection of documents stored on Amazon S3. It identifies the most common topics in the collection, sorts them into groups, and then maps each document to the topics it belongs to.

Personal Identifiable Information Identification and Redaction – Using the ML capabilities of Amazon Comprehend, you can detect and redact personally identifiable information (PII) in customer emails, product reviews, social media, support tickets, and more.

Custom Entity Recognition – Custom Entity Recognition lets you customize Amazon Comprehend to pick out terms specific to your domain.
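To make the PII redaction feature from the list above more concrete, here is a small Python sketch that redacts text using the entity offsets returned by PII detection. It assumes entities shaped like the `Entities` list in a boto3 `detect_pii_entities` response (`Type`, `Score`, `BeginOffset`, `EndOffset`). The helper name `redact_pii`, the `min_score` threshold, and the bracketed replacement format are illustrative choices, not part of the service.

```python
from typing import Any, Dict, List


def redact_pii(text: str, entities: List[Dict[str, Any]], min_score: float = 0.5) -> str:
    """Replace each detected PII span with its entity type in square brackets.

    `entities` is assumed to follow the shape of the `Entities` list returned
    by `detect_pii_entities`. `redact_pii` is a hypothetical helper.
    """
    redacted = text
    # Work from the end of the text so that earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if entity["Score"] < min_score:
            continue  # skip low-confidence detections
        start, end = entity["BeginOffset"], entity["EndOffset"]
        redacted = redacted[:start] + f"[{entity['Type']}]" + redacted[end:]
    return redacted


# Hand-written sample entities for illustration (not real service output).
sample_text = "Call Jane Doe at 555-0100."
sample_entities = [
    {"Type": "NAME", "Score": 0.99, "BeginOffset": 5, "EndOffset": 13},
    {"Type": "PHONE", "Score": 0.98, "BeginOffset": 17, "EndOffset": 25},
]
print(redact_pii(sample_text, sample_entities))  # Call [NAME] at [PHONE].
```

Processing the spans from right to left avoids recomputing offsets after each replacement changes the length of the string.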

[Figure: Amazon Comprehend features, showing some of the features described above]

Benefits Of Amazon Comprehend

Listed below are the major benefits that Amazon Comprehend provides, which you can draw on depending on your scenario:

Natural Language Processing with deep learning – With Amazon Comprehend, we can accurately analyze text using deep learning technology. The models are constantly trained with new data across multiple domains to improve their accuracy.

Encrypted output results and volume data – Amazon S3 already lets us encrypt our input documents, and Amazon Comprehend extends this further. With our own KMS key, we can encrypt not only the output results of an analysis job but also the data on the storage volume used to process the job, which significantly enhances security.

Support for general and industry-specific text – With Amazon Comprehend, we can identify insights from general as well as industry-specific unstructured text and documents. For example, Amazon Comprehend Medical identifies critical medical information, such as medical conditions and medications, from sources like doctors' prescriptions or notes, and then analyzes how they relate to each other, which makes analysis simpler.

[Figure: logical flowchart of medical cohort analysis using Amazon Comprehend]

Valuable insights from text – We can uncover valuable insights, meaning, and relationships in text from customer support incidents, social media feeds, documents, product reviews, news articles, and other sources.

Scalable Natural Language Processing – Amazon Comprehend lets us analyze millions of documents, so that we can easily discover the insights they contain.

Simple and quick integration with other AWS services – Amazon Comprehend is designed to integrate easily with many AWS services, such as Amazon S3, AWS KMS, and AWS Lambda, and to work smoothly with them.

Labelling documents by topics – By training Amazon Comprehend, we can label text or documents with topics or tags that we define. NLP techniques go beyond keyword search or rules-based tagging and give more accurate document classification. This helps us deliver highly personalized content to our customers and gives them richer navigation based on these terms or tags.

[Figure: summary of the benefits discussed above]

How Does Amazon Comprehend Work?

In this section, we look at how Amazon Comprehend works.

[Figure: How Does Amazon Comprehend Work]

The diagram above shows how Amazon Comprehend works: the insights it generates are applied to improve the rate of positive outcomes in less time.

  • Amazon Comprehend uses a pre-trained model to gather valuable insights from a document or a set of documents. This model is continuously trained on a large body of text, so we do not need to provide any training data.

  • We can use Amazon Comprehend to build customized models for custom classification and custom entity recognition.

  • Amazon Comprehend also provides topic modeling using a built-in model. Topic modeling examines a corpus of documents and organizes them based on similar keywords within them.

  • Amazon Comprehend provides synchronous and asynchronous document processing modes. Synchronous mode processes a single document or a batch of up to 25 documents, while an asynchronous job gives us the flexibility to process a large number of documents.

  • Amazon Comprehend works with AWS Key Management Service (AWS KMS) to provide enhanced encryption for our data.

Below, we cover four major topics related to working with Amazon Comprehend. Understanding them makes it easier to analyze a given scenario and to work with Amazon Comprehend effectively:

  • Amazon Comprehend Insights
  • Amazon Comprehend Custom
  • Document Clustering (Topic Modeling)
  • Document processing modes

Amazon Comprehend Insights

Amazon Comprehend uses a pre-trained model to examine and analyze a document or set of documents and gather beneficial insights from it. This model is continuously trained on a large body of text, so there is no need for us to provide training data.

Amazon Comprehend Insights captures the following types of insights:

Entities – References to the names of people, places, things, and locations contained in a document.

Syntax – The parts of speech for each word in the document.

Key phrases – Phrases that appear in a document. For example, a document about a tennis game might return the team names, the venue name, and the final score.

Personally Identifiable Information (PII) – Personal, sensitive data that can identify an individual, such as a bank account number, address, date and place of birth, mother's maiden name, biometric records, or phone number.

Language – The predominant language of a document.

Sentiment – The predominant sentiment of a document, which is positive, negative, neutral, or mixed.

Targeted sentiment – The sentiments associated with specific entities in a document. The sentiment for each occurrence of an entity can be positive, negative, neutral, or mixed.

Amazon Comprehend Custom

Amazon Comprehend Custom lets us customize Amazon Comprehend for our specific requirements without the knowledge needed to build machine learning-based NLP solutions. Using automatic machine learning (AutoML), Amazon Comprehend Custom builds customized NLP models on our behalf from the data we already have. It offers two kinds of customization:

Custom classification – We can create custom classification models (classifiers) that organize our documents into our own categories.

Custom entity recognition – We can create custom entity recognition models (recognizers) that analyze text for our specific terms and noun-based phrases.

Document Clustering (Topic Modeling)

We touched on topic modeling when discussing the features of Amazon Comprehend. Topic modeling identifies relevant topics (terms) in a collection of documents stored on Amazon S3. It identifies the most common topics in the collection, sorts them into groups, and then maps each document to the topics it belongs to.

With topic modeling, we examine a set of documents and organize them based on similar keywords within them. Document clustering, or topic modeling, is useful because it organizes a large set of documents into clusters based on the frequency of similar words.
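As a sketch of how a topic modeling job over documents in Amazon S3 might be set up, the Python function below builds the keyword arguments for the boto3 `start_topics_detection_job` operation. The function name `topics_job_request` is hypothetical, as are the bucket names, role ARN, and KMS key alias used in the example. The default of 10 topics and the one-document-per-file input format are assumptions for illustration.

```python
from typing import Any, Dict, Optional


def topics_job_request(
    input_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    number_of_topics: int = 10,
    kms_key_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for `start_topics_detection_job`.

    `topics_job_request` is a hypothetical helper; it only assembles the
    request and does not call AWS.
    """
    request: Dict[str, Any] = {
        # Each file under the input prefix is treated as one document.
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        # IAM role that grants Amazon Comprehend access to the S3 locations.
        "DataAccessRoleArn": role_arn,
        "NumberOfTopics": number_of_topics,
    }
    if kms_key_id:
        # Optionally encrypt the job output and the processing volume.
        request["OutputDataConfig"]["KmsKeyId"] = kms_key_id
        request["VolumeKmsKeyId"] = kms_key_id
    return request


# Hypothetical bucket names and role ARN, for illustration only.
example_request = topics_job_request(
    "s3://example-bucket/reviews/",
    "s3://example-bucket/topics-output/",
    "arn:aws:iam::123456789012:role/ExampleComprehendRole",
    number_of_topics=5,
)
```

With a real client, the job would be started with `client.start_topics_detection_job(**example_request)`, and the results would be written to the output location once the asynchronous job completes.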

Document Processing Modes

Amazon Comprehend supports three document processing modes. Which mode we choose depends on the number of documents we need to process and how quickly we need to view the results:

Single-document synchronous – We call Amazon Comprehend with a single document and receive a synchronous response, which is delivered to the console or the application right away.

Multi-document synchronous – We call the Amazon Comprehend API with a collection of up to 25 documents and receive a synchronous response.

Asynchronous batch – For a large set of documents, we first put the documents into an Amazon S3 bucket and then start an asynchronous job, using the console or API operations, to analyze them. The results of the analysis are stored in the S3 bucket defined in the request.
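The 25-document limit of the multi-document synchronous mode means that a longer list has to be split before it is sent. The Python sketch below shows one way to do that with the boto3 `batch_detect_sentiment` operation. The helper names `chunked` and `batch_sentiments` are hypothetical, and the client is passed in as a parameter.

```python
from typing import Any, Iterator, List, Optional, Tuple

MAX_BATCH_SIZE = 25  # limit for multi-document synchronous calls


def chunked(documents: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[Tuple[int, List[str]]]:
    """Yield (start_index, batch) pairs with at most `size` documents each."""
    for start in range(0, len(documents), size):
        yield start, documents[start:start + size]


def batch_sentiments(client: Any, documents: List[str], language_code: str = "en") -> List[Optional[str]]:
    """Return one sentiment label per document, in the original order.

    `client` is expected to behave like a boto3 Comprehend client.
    Documents that the service reports as failed keep the value None.
    """
    sentiments: List[Optional[str]] = [None] * len(documents)
    for offset, batch in chunked(documents):
        response = client.batch_detect_sentiment(TextList=batch, LanguageCode=language_code)
        for item in response["ResultList"]:
            # `Index` is the position of the document within this batch.
            sentiments[offset + item["Index"]] = item["Sentiment"]
    return sentiments
```

For document sets too large to handle this way, the asynchronous batch mode with input in Amazon S3 is usually the better fit.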

Amazon Comprehend Pricing

The pricing for Amazon Comprehend works as follows:

There are no minimum fees or upfront commitments. We pay only for the documents that we analyze and for the custom models that we train.

The pricing structure for Amazon Comprehend is built around the four major categories of analysis that it provides:

  • Natural Language Processing
  • Personal Identifiable Information (PII) detection and redaction
  • Custom Classification and Entity detection
  • Topic modeling

These categories enable a broad range of applications that analyze raw text, and some APIs also accept document formats such as PDF and Word.

Let us briefly look at the pricing for each.

Pricing for Natural Language Processing: The Amazon Comprehend APIs for sentiment analysis, key phrase extraction, entity recognition, syntax analysis, and language detection are used to capture insights from natural language text. Requests are measured in units of 100 characters (1 unit = 100 characters), with a minimum charge of 3 units (300 characters) per request.

Pricing for Personal Identifiable Information (PII): For PII detection and redaction, the Detect PII API locates the specified PII entities in a document, which can be used to create redacted versions of the document. The Contains PII API tells us whether or not the document contains the specified PII. Requests are measured in units of 100 characters (1 unit = 100 characters), with a minimum charge of 3 units (300 characters) per request.

Pricing for Custom Comprehend: The Custom Classification and Custom Entities APIs train a custom NLP model to categorize text and extract custom entities from it. Asynchronous inference requests are measured in units of 100 characters (1 unit = 100 characters), with a minimum charge of 3 units (300 characters) per request. We are also charged $3 per hour for model training, billed by the second, and $0.50 per month for custom model management. For synchronous Custom Classification and Entities inference requests, we provision an endpoint with the appropriate throughput and are charged from the time we start the endpoint until it is deleted.

Pricing for Topic Modeling: Topic Modeling identifies the relevant topics in a set of documents stored in Amazon S3. We are charged based on the total size of the documents that are successfully processed per analysis job. The first 100 MB is charged at a flat rate; beyond 100 MB, we are charged per MB.
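The pricing rules above can be expressed as a few lines of arithmetic. The Python sketch below is an illustration only: rounding a partial unit up to a whole unit is an assumption, and the topic modeling flat rate and per-MB rate are left as parameters because they are not stated here. All function names are hypothetical.

```python
import math

UNIT_CHARS = 100  # 1 unit = 100 characters
MIN_UNITS = 3     # minimum charge of 3 units (300 characters) per request


def billable_units(char_count: int) -> int:
    """Units charged for one request (assumes partial units round up)."""
    return max(MIN_UNITS, math.ceil(char_count / UNIT_CHARS))


def custom_model_cost(training_seconds: float, months_managed: float,
                      training_rate_per_hour: float = 3.0,
                      management_rate_per_month: float = 0.50) -> float:
    """Training is billed by the second; management is billed per month."""
    return (training_seconds / 3600) * training_rate_per_hour \
        + months_managed * management_rate_per_month


def topic_modeling_cost(total_mb: float, flat_rate: float, per_mb_rate: float) -> float:
    """Flat rate for the first 100 MB, then a per-MB charge beyond that.

    `flat_rate` and `per_mb_rate` must be supplied by the caller.
    """
    extra_mb = max(0.0, total_mb - 100)
    return flat_rate + extra_mb * per_mb_rate


print(billable_units(50))          # 3 (minimum charge applies)
print(billable_units(550))         # 6
print(custom_model_cost(1800, 1))  # 2.0
```

A 50-character request is billed as 3 units because of the minimum, and 30 minutes of training plus one month of model management comes to $2.00 at the rates quoted above.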

[Figure: Amazon Comprehend pricing]

The AWS Free Tier Pricing Structure for Amazon Comprehend:

50K UNITS OF TEXT (5M CHARACTERS)

  • The AWS Free Tier for Amazon Comprehend covers 50K units of text (5M characters) per API per month.

The eligible APIs are Sentiment, Entity Recognition, Detect PII, Event Detection, Key Phrase Extraction, Syntax Analysis, Targeted Sentiment, Language Detection, and Contains PII.

Quick Note: Custom Comprehend (custom classification and custom entities, including inference, model training, and model management) is not included in the free tier.

TOPIC MODELING - 5 JOBS UP TO 1MB EACH

AWS offers the Amazon Comprehend free tier to both new and existing AWS customers for 12 months, starting from the date of the first Amazon Comprehend request.

[Figure: pricing structure for Topic Modeling and Custom Comprehend]

Use Cases of Amazon Comprehend

Let us now dive into a few use cases of Amazon Comprehend that we can apply whenever a similar situation arises:

  • Processing financial documents – We can use Amazon Comprehend to classify and extract entities from financial services documents, such as insurance claims or mortgage packages.
  • Mining business and call center analytics – With Amazon Comprehend, we can detect customer sentiment, analyze customer interactions, and automatically categorize inbound support requests.

[Figure: mining business and call center analytics, using Amazon Comprehend for customer analytics that improve the product based on feedback]

  • Legal briefs management – We can automate the extraction of insights from packets of legal briefs, such as contracts or court records. Identifying and redacting Personally Identifiable Information (PII) also helps to secure the documents.
  • Discovering the pain points and needs of your customers – Using Amazon Comprehend topic modeling, we can discover the topics that our customers are talking about on forums and message boards, and then use entity detection to analyze what those topics are associated with.

[Figure: discovering the pain points and needs of your customers]

  • Indexing, searching, and analyzing product reviews – With Amazon Comprehend, we can make search focus on context by equipping the search engine to index key phrases, entities, and sentiment rather than just plain keywords.

[Figure: indexing, searching, and analyzing product reviews]

  • Searching documents about a subject – Using Amazon Comprehend topic modeling, we can search for documents about a particular subject. We can scan a set of documents and determine the topics discussed by specifying the number of topics that Amazon Comprehend should return from the document set.

[Figure: searching documents about a subject]

Companies using Amazon Comprehend

The companies below are some of those that use Amazon Comprehend and benefit from it:

[Figure: companies using Amazon Comprehend]

LexisNexis: LexisNexis discovers valuable insights from its legal documents quickly and effectively by using accurate custom entity recognition models.

ClearView Social: ClearView Social uses Amazon Comprehend to read articles and extract topics from them.

TeraDact: TeraDact redacts PII effectively, offering its users a safe environment for sharing information that contains PII, which helps it reach a larger customer base.

Chisel AI: Chisel AI automates document-intensive processes, which significantly reduces customer effort.

Conclusion

Some key takeaways from the article are as follows:

  • Natural Language Processing (widely known as NLP) is an approach that lets computers understand, analyze, and extract significant meaning from textual data.

  • Amazon Comprehend charges no minimum fees and requires no upfront commitments. We pay only for the documents that we analyze and for the custom models that we train.

  • Using Amazon Comprehend topic modeling, we can discover the topics that our customers are talking about on forums and message boards, and then use entity detection to analyze what those topics are associated with. Finally, we can use sentiment analysis to derive meaning from them.