WordNet in NLP - Scaler Topics

Overview

A word and its semantics (its meanings, relations, and usage in various contexts) play a very important role in Natural Language Processing (NLP). A meaningful sentence is composed of meaningful words. Many NLP tasks, such as text classification, sentiment analysis, and, most importantly, word sense disambiguation (WSD), rely on word and sentence semantics.

Introduction

In Natural Language Processing (NLP), automatically deciphering and analyzing word meanings and pre-processing text input can be challenging. Lexicons are widely used to help with this. A lexicon is the vocabulary of a language, organized like a dictionary of its words. Lexicons let us make connections between words, which helps us understand the relationships between various concepts. WordNet is an excellent lexical resource: its distinctive semantic network makes it easier to identify word relationships, synonyms, grammatical information, and more. This in turn supports tasks such as automatic language translation, sentiment analysis, and text similarity.

Before we get to WordNet itself, let's look at some important terminology.

Word Sense

Words are ambiguous: the same word can mean different things depending on its context. For example, a 'bank' could be a river bank or a financial institution. These context-dependent meanings are captured by senses (or word senses).

A sense (or word sense) is a discrete representation of one aspect of the meaning of a word.

Representation of Word Sense

There are many ways of representing words mathematically as embeddings, such as Word2Vec or GloVe, which capture some kind of meaning and relation between words based on co-occurrences. But they do not answer the question: how do we define the meaning of a word?

Another way of capturing senses is to use a dictionary or thesaurus that gives a textual definition (a gloss) for each sense. For example, here are two senses of 'bank':

bank: (sloping land (especially the slope beside a body of water)) "they pulled the canoe up on the bank"; "he sat on the bank of the river and watched the currents"
bank: depository financial institution, bank, banking concern, banking company (a financial institution that accepts deposits and channels the money into lending activities) "he cashed a check at the bank"; "that bank holds the mortgage on my home."

An alternative is to capture the semantic relationships between words (or senses). For example, car IS-A vehicle is a relation meaning 'a car is a type of vehicle'.

Both kinds of information, definitions and semantic relations, are captured by lexical databases such as WordNet.

Semantic Relations

Synonymy

Two senses of different words are synonyms if their meanings are identical or nearly identical. Examples: center/middle, run/jog.

Antonymy

Antonyms are words with opposite meanings. Examples: dark/light, fast/slow.

Taxonomic Relations

Word senses can be related taxonomically, so that they can be classified into categories. A word (or sense) is a hyponym of another word or sense if the first denotes a subclass of the second; conversely, the second is a hypernym of the first. For example, man is a hyponym of animal, and animal is a hypernym of man. The hyponym/hypernym relation can also be expressed as an IS-A relationship: 'man IS-A animal'.

Meronymy

Meronymy is the part-whole relationship. For example, a wheel is part of a car, so wheel is a meronym of car.

What is WordNet?

Now that we have discussed some NLP terms, let's get back to WordNet. WordNet is a large lexical database of words, senses, and their semantic relations. The project was started by George A. Miller in the mid-1980s. In WordNet, a sense is represented by a set of synonyms, called a synset, whose members share that meaning. In other words, WordNet represents each sense as the list of words that can be used to express that concept. For example, synonyms of the word 'fool' include {chump, jester, gull, fritter, dupe, fool around}; these come from several different senses (synsets) of 'fool' rather than from a single one.

Figure: a sample synset and its word groups

English WordNet consists of three separate databases: one for nouns, one for verbs, and a third for adjectives and adverbs.

Synset

In NLTK, each WordNet synset is represented by a Synset object, which you obtain by looking words up through NLTK's WordNet interface (the wordnet corpus reader). A Synset groups words that are synonymous or that express similar concepts. Some words have a single synset, and some have several.


Structure of WordNet

A synonym set (synset) is a group of words that all refer to the same notion in WordNet. WordNet's structure is made up of words and synsets linked together by conceptual-semantic and lexical relations.

As we saw earlier, WordNet is built from words, senses, and synsets. The figure below illustrates its structure.

Figure: structure of WordNet

How to use WordNet?

In this section, we use NLTK to explore WordNet's synsets, meanings, and various semantic relationships.

WordNet is available as a corpus in NLTK. First, download the WordNet corpus and its dependencies.

Find all the synsets of 'fool'


Find only the verb synsets of 'fool'


Creating a Class to Look up Words in WordNet

What is WordNetTagger? Part-of-speech tagging, which labels every word with its part of speech (noun, verb, and so on), is a very common NLP task. WordNet itself does not include a tagger, and WordNetTagger is not a built-in NLTK function: it is a custom class we define ourselves by subclassing NLTK's SequentialBackoffTagger, so that WordNet's part-of-speech information can be used to guess a tag for each word.

The class looks up all the synsets of a word and counts how many belong to each part of speech. The most common part of speech is then converted into a Penn Treebank tag (for example, noun to NN and verb to VB) through an internal mapping, and that tag is assigned to the word.

Using a Simple WordNetTagger()

We can now use the simple WordNetTagger().


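Used on its own, this tagger is not very accurate: WordNet covers only nouns, verbs, adjectives, and adverbs, so function words such as "the" get no tag, and the tagger ignores the surrounding context. Read on to see how to improve its accuracy.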

WordNetTagger class at the end of an NgramTagger backoff chain

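Here we train unigram, bigram, and trigram taggers on tagged sentences and chain them with backoff: the trigram tagger falls back to the bigram tagger, which falls back to the unigram tagger, which finally falls back to the WordNetTagger for words none of the n-gram taggers can tag.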


Finding Similarity Using WordNet

WordNet can also be used to measure the similarity between two words; here, similarity means semantic similarity. NLTK provides several WordNet-based similarity measures, such as path_similarity() and wup_similarity(). For these measures, a higher score means greater similarity.

To calculate the similarity between two words, we first pick the synsets that represent the intended senses of the words, and then call wup_similarity() (Wu-Palmer similarity) on them.


Finding Entailments

What do we mean by "entailment"? An entailment is essentially an implication. For example, looking implies seeing, and buying implies choosing and paying. WordNet records entailment links between verbs. For example, a link exists between try (in the legal sense) and arrest, because in order to try someone, you have to arrest them. This relation is called entailment. It should not be confused with troponymy, the verb relation that links a verb to a more specific manner of doing it (for example, to stroll is to walk in a particular way).

Conclusion

A word sense is the locus of word meaning; definitions and meaning relations are defined at the level of the word sense rather than word forms.
Relations between senses include synonymy, antonymy, meronymy, and the taxonomic relations of hyponymy and hypernymy.
WordNet is a large database of lexical relations for English, and WordNets exist for a variety of languages.
WordNet can help in Word Sense Disambiguation.