Discourse in NLP

Overview

Discourse in NLP refers to coherent groups of sentences. In Natural Language Processing, the language we deal with usually consists of structured, connected, and consistent groups of sentences rather than isolated ones, and these groups are termed discourse. Modeling the relationships between the words and sentences of a discourse makes an NLP model easier to train and its output more predictable.

Discourse analysis is the task of extracting meaning from a corpus or text. It is very important in Natural Language Processing and helps train better NLP models.

Pre-requisites

Before learning about discourse in NLP, let us review some basics of NLP itself.

  • NLP stands for Natural Language Processing. In NLP, we analyze and synthesize the input, and a trained NLP model then predicts the required output.
  • NLP is a branch of Artificial Intelligence, and modern NLP systems rely heavily on Deep Learning. In basic terms, NLP is a computer program's ability to process and understand human language.

Introduction

One of the primary challenges in Artificial Intelligence is getting computers to process natural language data, and Natural Language Processing is widely regarded as one of the harder problems in AI. Within NLP, one of the major problems is the processing of discourse.

Processing discourse well is therefore essential. It helps us train better models, which in turn lets computers process natural language data more accurately and lets AI systems produce the desired results.

So what is discourse in NLP? In simple terms, a discourse is a structured, coherent group of sentences rather than a set of isolated ones, and discourse analysis is the task of extracting meaning from such text.


Let us now learn about the concept of coherence in the next section.

Concept of Coherence

Coherence, in the context of discourse, means that the utterances make sense together because they are linked by meaningful connections and correlations. Coherence is closely tied to discourse structure (discussed in a later section). Because coherence is a property of good text, we also use it to evaluate the quality of output produced by natural language generation systems.

What does a coherent discourse look like? If we read a paragraph from a newspaper, we can see that its sentences are interrelated, so the paragraph is a coherent discourse. If we simply place several newspaper headlines one after another, however, the result is not a discourse; it is just an incoherent group of sentences.

Let us now look at two major kinds of coherence: coherence relations between utterances and relationships between entities.

Coherence Relation Between Utterances

When we say that a discourse is coherent, we mean that its utterances are meaningfully connected. A coherence relation describes the kind of connection that holds between two utterances.

Relationship Between Entities

A discourse can also be coherent because of the relationships between the entities it mentions, for example, when consecutive sentences keep talking about the same people or things. This kind of coherence is known as entity-based coherence.
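A rough way to make entity-based coherence concrete is to check how often consecutive sentences share an entity. The sketch below is only a crude proxy for this idea, not an established metric. The function name `entity_overlap_score` is hypothetical, and the entities are annotated by hand, with pronouns already resolved to their referents. A real system would obtain them from named-entity recognition and co-reference resolution.

```python
def entity_overlap_score(sentence_entities: list[set[str]]) -> float:
    """Fraction of adjacent sentence pairs that share at least one entity (a crude proxy)."""
    if len(sentence_entities) < 2:
        return 1.0  # a single sentence is trivially coherent under this proxy
    pairs = zip(sentence_entities, sentence_entities[1:])
    shared = sum(1 for a, b in pairs if a & b)  # set intersection: any common entity?
    return shared / (len(sentence_entities) - 1)

# Hand-annotated entities; "He" and "His" are already resolved to Rahul.
coherent = [{"Rahul", "farm"}, {"Rahul", "food"}, {"Rahul", "farm"}]
# Consecutive headlines about unrelated topics.
headlines = [{"stock market"}, {"cricket team"}, {"monsoon"}]

print(entity_overlap_score(coherent))   # 1.0
print(entity_overlap_score(headlines))  # 0.0
```

The newspaper paragraph and the stack of headlines from the earlier example land at opposite ends of this scale. Real texts fall somewhere in between.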

Discourse Structure

So far, we have discussed discourse and coherence, but not the structure of a discourse. The structure we find in a discourse depends on the type of segmentation applied to it.

What is discourse segmentation? It is the task of dividing a large discourse into segments and determining the type of structure each segment has. Segmentation is difficult to implement, but it is necessary for applications such as:

  • Information Retrieval
  • Text Summarization
  • Information Extraction

Algorithms for Discourse Segmentation

Discourse segmentation algorithms fall into two classes: unsupervised and supervised. Let us now look at each class in turn.

Unsupervised Discourse Segmentation

Unsupervised discourse segmentation is also called linear segmentation. Let us take an example to understand it better.

Suppose we have a text, and the task is to segment it into multi-paragraph units, where each unit represents one passage of the text.

The algorithm relies on cohesion, which is the use of linguistic devices that tie textual units together. Units that are tied together by such devices are grouped into the same segment. In simpler terms, unsupervised discourse segmentation groups neighboring pieces of text that are closely related to each other, without using any labeled training data.

Unsupervised segmentation is often based on lexical cohesion, which is the relationship between words in neighboring units, such as repetition of the same word or the use of synonyms.
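The sketch below illustrates segmentation based on lexical cohesion. It is loosely in the spirit of methods such as Hearst's TextTiling but much simpler. TextTiling compares blocks of text and places boundaries at deep valleys in similarity. This version compares adjacent sentences only and starts a new segment whenever their bag-of-words cosine similarity falls below a fixed threshold. The stopword list, the threshold of 0.1, and the function names are assumptions chosen for illustration. Exact word repetition is the only cohesive device the sketch detects, so synonyms are not captured.

```python
import math
import re
from collections import Counter

# Small hypothetical stopword list; function words say little about topic.
STOPWORDS = {"the", "a", "an", "is", "was", "to", "of", "and", "in", "it",
             "he", "she", "they", "his", "her", "its", "for", "on", "with"}

def bag_of_words(sentence: str) -> Counter:
    """Lowercased content-word counts for one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def segment(sentences: list[str], threshold: float = 0.1) -> list[list[str]]:
    """Start a new segment wherever lexical overlap with the previous sentence drops below threshold."""
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            segments.append([])  # low cohesion: treat as a segment boundary
        segments[-1].append(cur)
    return segments

sentences = [
    "The cricket match started late because of rain.",
    "Rain delayed the cricket match by two hours.",
    "The stock market fell sharply today.",
    "Investors sold shares as the market fell.",
]
for i, seg in enumerate(segment(sentences)):
    print(i, seg)  # two segments: the cricket sentences, then the market sentences
```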

Supervised Discourse Segmentation

In unsupervised segmentation, no labeled segment boundaries are available. Supervised discourse segmentation, in contrast, requires a training dataset in which the segment boundaries are labeled. To identify the boundaries between discourse segments, it makes use of cue words, also called discourse markers, which signal the discourse structure. Because discourse can come from many different domains, cue words and discourse markers are domain-specific.
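The supervised idea can be sketched with a very small learner. Real supervised segmenters train classifiers over many features. This sketch uses a single feature: it estimates, from labeled data, how often each sentence-initial word begins a new segment, and it predicts a boundary when that estimate exceeds a threshold. The training sentences below are invented to resemble a news-broadcast domain, and the function names are hypothetical. Phrases like "Turning to" act as cue words in this domain but would not carry that meaning in, say, legal text, which is why cue words are domain-specific.

```python
from collections import defaultdict

def first_word(sentence: str) -> str:
    """Lowercased first token with a trailing comma stripped."""
    return sentence.split()[0].lower().strip(",")

def train_cue_words(labeled: list[tuple[str, bool]]) -> dict[str, float]:
    """Estimate P(boundary | first word) from sentences labeled True if they start a new segment."""
    counts = defaultdict(lambda: [0, 0])  # first word -> [boundary count, total count]
    for sentence, is_boundary in labeled:
        word = first_word(sentence)
        counts[word][0] += int(is_boundary)
        counts[word][1] += 1
    return {word: b / total for word, (b, total) in counts.items()}

def predict_boundary(sentence: str, model: dict[str, float], threshold: float = 0.5) -> bool:
    """Predict a boundary if the sentence starts with a word that usually opens a segment."""
    return model.get(first_word(sentence), 0.0) > threshold

# Hypothetical labeled training data from a news-broadcast domain.
training_data = [
    ("Good evening, here is the news.", True),
    ("The minister spoke in Parliament today.", False),
    ("Turning to sports, India won the match.", True),
    ("The team scored 300 runs.", False),
    ("Turning to the weather, rain is expected.", True),
    ("The monsoon arrives next week.", False),
]
model = train_cue_words(training_data)
print(predict_boundary("Turning to business, shares fell.", model))  # True
print(predict_boundary("The index closed higher.", model))           # False
```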

Text Coherence

As discussed earlier, the goal is to find the coherence relations in a discourse. Lexical repetition can help us find structure in a discourse, but repetition alone does not guarantee that a discourse is coherent. To characterize the relations that do make a discourse coherent, Hobbs proposed a set of coherence relations, described below.

Suppose we have two related sentences, S0 and S1.

Result

We can infer that the state or event asserted by the first statement, S0, causes or could cause the state or event asserted by the second statement, S1. For example, Rahul is late. He will be punished.

In this example, the state asserted by S0 (Rahul is late) causes the event asserted by S1 (He will be punished).

Explanation

Explanation is the reverse of Result: we can infer that the state or event asserted by S1 causes or could cause the state or event asserted by S0. For example, Rahul fought with his friend. He was drunk. Here, being drunk (S1) explains the fight (S0).

Parallel

Parallel means that we can infer p(a1, a2, …) from the assertion of S0 and p(b1, b2, …) from the assertion of S1, where ai and bi are similar for all i.

In simpler terms, both sentences assert the same kind of proposition about similar arguments. For example, He wants food. She wants money. The two sentences are parallel because both express the predicate want, applied to a person and the thing that person wants.

Elaboration

Elaboration means that the same proposition P can be inferred from both assertions, S0 and S1, so that S1 adds detail to what S0 says. For example, Rahul is from Delhi. He was born and raised in India's capital.

Occasion

Occasion holds when a change of state can be inferred from the assertion of S0 and its final state can be inferred from S1, or when a change of state can be inferred from S1 and its initial state can be inferred from S0. For example, Rahul took the money. He gave it to Rohan. Taking the money leaves Rahul holding it, which is the initial state for giving it to Rohan.

Building Hierarchical Discourse Structure

In the previous section, we discussed the relations that make a text coherent. Let us now build a hierarchical discourse structure from a group of sentences. Coherence relations can hold between larger spans of text as well as between single sentences, so we generally combine them hierarchically to represent the entire discourse.

Consider the following sentences, numbered serially.

  • S1:
    Rahul went to the bank to deposit money.
  • S2:
    He then went to Rohan's shop.
  • S3:
    He wanted a phone.
  • S4:
    He did not have a phone.
  • S5:
    He also wanted to buy a laptop from Rohan's shop.

The entire discourse can then be represented by the hierarchical discourse structure below.

[Figure: hierarchical discourse structure for sentences S1 to S5]
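A hierarchy like this can be represented as a tree. Each internal node is a coherence relation, and each leaf is a sentence. The analysis encoded below is one plausible reading that we assume for illustration. It may not match the original figure exactly. In this reading, S4 explains S3, S3 and S4 together are parallel to S5, that group explains S2, and S1 and S2 are linked by Occasion. The names `Relation`, `Node`, and `render` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Relation:
    """An internal node: a coherence relation holding between two spans."""
    name: str
    left: "Node"
    right: "Node"

# A node is either a sentence label (leaf) or a relation over two sub-spans.
Node = Union[str, Relation]

def render(node: Node, indent: int = 0) -> str:
    """Pretty-print the tree, one node per line, children indented under parents."""
    pad = "  " * indent
    if isinstance(node, str):
        return pad + node
    return "\n".join([pad + node.name,
                      render(node.left, indent + 1),
                      render(node.right, indent + 1)])

# Assumed analysis of S1..S5 (illustrative, not taken from the figure).
discourse = Relation(
    "Occasion",
    "S1",
    Relation("Explanation", "S2",
             Relation("Parallel", Relation("Explanation", "S3", "S4"), "S5")),
)
print(render(discourse))
```

Because relations nest, a single relation such as Explanation can hold between a sentence (S2) and a whole sub-tree of sentences. This is what makes the structure hierarchical rather than a flat chain of sentence pairs.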

Reference Resolution

Extracting the meaning of the sentences in a discourse is one of the most important tasks in natural language processing, and to do so, we first need to know which entity is being talked about. Reference resolution means determining which entity a given expression refers to.

By the term reference, we mean the linguistic expression that is used to denote an individual or an entity. For example, look at the below sentences.

  • Rahul went to the farm.
  • He cooked food.
  • His farm was very big.

In the above sentences, Rahul, He, and His are references. So we can define reference resolution as the task of determining which entities are referred to by which linguistic expressions.

Let us now look at the various terminologies used in the reference resolution.

Terminology Used in Reference Resolution

  • Referring expression:
    The natural language expression used to perform reference is termed a referring expression. For example, Rahul, He, and His in the passage above are referring expressions.
  • Referent:
    The referent is the entity that is referred to. For example, in the passage above, Rahul is the referent of He.
  • Co-refer:
    Two or more expressions co-refer when they refer to the same entity. For example, Rahul and He co-refer, since both refer to Rahul.
  • Antecedent:
    An expression that licenses the use of another expression (such as a pronoun) is termed its antecedent. For example, in the passage above, Rahul is the antecedent of the reference He.
  • Anaphora & Anaphoric:
    Reference to an entity that has previously been introduced into the discourse is termed anaphora, and the referring expression used (such as He) is said to be anaphoric.
  • Discourse model:
    The discourse model is a representation of the entities that have been referred to in the discourse text, together with the relationships in which they participate.

Types of Referring Expressions

As we have previously discussed, a natural language expression that performs reference is termed a referring expression. There are five main types of referring expressions. Let us discuss them one by one.

1. Indefinite Noun Phrases

An indefinite noun phrase introduces an entity that is new to the hearer in the discourse context. To understand indefinite noun phrases, let us take an example.

For example:
In the sentence "Rahul is doing some work.", some work is an indefinite noun phrase.

2. Definite Noun Phrases

A definite noun phrase refers to an entity that is not new to the hearer in the discourse context, so the hearer can easily identify it. To understand definite noun phrases, let us take an example.

For example:
In the sentence "Rahul loves reading the Times of India.", the Times of India is a definite noun phrase.

3. Pronouns

Pronouns are a form of definite reference; they work the same way as in English grammar.

For example:
In the sentence "Rahul learned as much as he could.", he is a pronoun referring to the noun Rahul.

4. Demonstratives

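Demonstratives also point to nouns, but they behave differently from simple pronouns. They can appear alone (I like that) or as determiners (that book), and they mark distance: this and these point to something near, while that and those point to something farther away.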

Examples of demonstratives are this, that, these, and those.

5. Names

Names of people, locations, organizations, and so on are the simplest form of referring expression.

For example, in the sentences above, Rahul is a name used as a referring expression.
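The five types can be told apart, very roughly, from surface cues such as the first word of a phrase. The sketch below is a hypothetical rule-based classifier that uses hand-picked word lists. It treats possessives such as His farm as definite. It is not a reliable method: real systems use part-of-speech tagging, parsing, and named-entity recognition. For example, this heuristic would mislabel a sentence-initial common noun as a name.

```python
PRONOUNS = {"he", "she", "it", "they", "him", "her", "his", "them", "its", "their"}
DEMONSTRATIVES = {"this", "that", "these", "those"}
INDEFINITE_DETERMINERS = {"a", "an", "some"}
# "the" plus possessive determiners, which also make a noun phrase definite.
DEFINITE_DETERMINERS = {"the", "my", "your", "his", "her", "its", "our", "their"}

def expression_type(phrase: str) -> str:
    """Guess the type of a referring expression from its first word (heuristic)."""
    words = phrase.split()
    if not words:
        return "unknown"
    first = words[0].lower()
    if len(words) == 1 and first in PRONOUNS:
        return "pronoun"
    if first in DEMONSTRATIVES:
        return "demonstrative"
    if first in INDEFINITE_DETERMINERS:
        return "indefinite noun phrase"
    if first in DEFINITE_DETERMINERS:
        return "definite noun phrase"
    if all(w[0].isupper() for w in words):
        return "name"
    return "unknown"

for phrase in ["some work", "the Times of India", "he", "those", "Rahul"]:
    print(phrase, "->", expression_type(phrase))
```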

Reference Resolution Tasks

Reference resolution is usually divided into two tasks. Let us discuss them one by one.

1. Co-reference Resolution

In co-reference resolution, the aim is to find all referring expressions in the provided text that refer to the same entity. In a discourse, two or more expressions co-refer when they refer to the same entity.

For example, Rahul and He are both used for the same entity, i.e., Rahul.

Co-reference resolution can therefore be described as finding the groups of co-referring expressions in a discourse text. Let us take an example for more clarity.

For example, Rahul went to the farm. He cooked food. In this example, Rahul and He are co-referring expressions.
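One common way to organize the output of co-reference resolution is to group mentions into chains: sets of expressions that all refer to the same entity. The sketch below assumes that the pairwise links (which mention co-refers with which) are already given, for example by a pairwise classifier. It then takes their transitive closure with a union-find structure, so that if Rahul links to He and He links to His, all three end up in one chain. The function name and the link list are hypothetical.

```python
def coreference_chains(mentions: list[str], links: list[tuple[int, int]]) -> list[list[str]]:
    """Group mentions into chains by taking the transitive closure of pairwise links."""
    parent = list(range(len(mentions)))  # union-find forest over mention indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in links:
        parent[find(i)] = find(j)  # merge the two chains

    chains: dict[int, list[str]] = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention)
    return list(chains.values())

# "Rahul went to the farm. He cooked food. His farm was very big."
mentions = ["Rahul", "the farm", "He", "food", "His", "His farm"]
links = [(0, 2), (2, 4), (1, 5)]  # assumed output of a pairwise co-reference decision step
print(coreference_chains(mentions, links))
# [['Rahul', 'He', 'His'], ['the farm', 'His farm'], ['food']]
```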

Co-reference resolution is subject to some constraints. Let us look at one of them.

Constraint on Co-reference Resolution:

English has many pronouns. Pronouns such as he and she carry gender information that narrows down the possible referents, so they are relatively easy to resolve. The pronoun it is trickier, because it can refer to almost any non-human entity, and when a text contains several candidate expressions, resolution becomes even more complex. In simpler terms, when the pronoun it is used, determining exactly which noun it refers to is difficult.

2. Pronominal Anaphora Resolution

Pronominal anaphora resolution aims to find the antecedent of a single given pronoun.

For example, in the passage "Rahul went to the farm. He cooked food.", Rahul is the antecedent of the reference He.
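A naive baseline for pronominal anaphora resolution combines the agreement constraint described above with recency. It picks the most recent preceding mention whose gender and number match the pronoun. This is only a sketch for illustration and not the method of any particular system. The gender and number features are annotated by hand, and the names `Mention`, `PRONOUN_FEATURES`, and `resolve_pronoun` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mention:
    text: str
    gender: str  # "male", "female", or "neuter" (hand-annotated for this sketch)
    number: str  # "singular" or "plural"

# Agreement features for a few English pronouns; None means "any gender".
PRONOUN_FEATURES = {
    "he": ("male", "singular"), "him": ("male", "singular"), "his": ("male", "singular"),
    "she": ("female", "singular"), "her": ("female", "singular"),
    "it": ("neuter", "singular"), "its": ("neuter", "singular"),
    "they": (None, "plural"), "them": (None, "plural"),
}

def resolve_pronoun(pronoun: str, preceding: list[Mention]) -> Optional[Mention]:
    """Pick the most recent preceding mention that agrees with the pronoun in gender and number."""
    gender, number = PRONOUN_FEATURES[pronoun.lower()]
    for candidate in reversed(preceding):  # most recent mention first
        if candidate.number == number and (gender is None or candidate.gender == gender):
            return candidate
    return None  # no compatible antecedent found

# "Rahul went to the farm. He cooked food."
preceding = [Mention("Rahul", "male", "singular"), Mention("the farm", "neuter", "singular")]
print(resolve_pronoun("He", preceding).text)  # Rahul
print(resolve_pronoun("it", preceding).text)  # the farm
```

The heuristic also shows why it is hard. Given Rahul put the phone in the bag. It was new., the candidates the phone and the bag are both neuter and singular, so agreement cannot separate them, and recency alone picks the bag even though the phone is an equally plausible antecedent.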

Conclusion

  • Discourse in NLP is nothing but coherent groups of sentences. When we are dealing with Natural Language Processing, the provided language consists of structured, collective, and consistent groups of sentences, which are termed discourse in NLP.
  • Discourse analysis is very important in Natural Language Processing and helps train better NLP models.
  • Coherence, in the context of discourse, means that the utterances make sense together through meaningful connections and correlations. Because coherence is a property of good text, we also use it to evaluate the quality of output generated by natural language generation systems.
  • The extraction of the meaning or interpretation of the sentences of discourse is one of the most important tasks in natural language processing, and to do so, we first need to know what or who is the entity that we are talking about.
  • Indefinite noun reference is a kind of reference that represents the entity that is new to the discourse context's hearer.
  • Definite noun reference is a kind of reference that represents the entity that is not new to the discourse context's hearer. The discourse context's hearer can easily identify the definite noun reference.
  • In co-reference resolution, the aim is to find the referring expressions in a text that refer to the same entity. In pronominal anaphora resolution, the aim is to find the antecedent of a single given pronoun.