TextBlob For NLP With Python

This article walks through common text analysis tasks with the TextBlob library. TextBlob is built on top of the NLTK package and exposes most common text-processing operations through a simple interface.

Text analysis, a branch of natural language processing, is growing rapidly, with applications such as chatbots, hiring analysis, language extraction, and social media analysis.

TextBlob covers most common natural language processing tasks and also includes classifiers for modeling and feature extraction.

Topics To Be Covered

1. Installing the textblob library
2. Creating textblob and POS tagging
3. Tokenization
4. Lemmatization
5. WordNet and WordList
6. Parsing
7. N-Gram
8. Sentiment analysis
9. Word and Phrase Frequency
10. Correction of Spelling

Installing the TextBlob Library

Installing TextBlob is simple. Open an Anaconda prompt and activate the environment you want to use; if you have no other environment, install into the base environment. The conda command is shown below:

conda install -c conda-forge textblob

Several of the features used below (tokenization, POS tagging, and WordNet lookups) also need NLTK corpora. They can be downloaded once with:

python -m textblob.download_corpora

Creating TextBlob and POS Tagging

Creating a TextBlob is just as simple. We only need to import the TextBlob class from the textblob package.

from textblob import TextBlob

TextBlob is the main class of the textblob package for text analysis. Next, we create an instance of this class to use in the rest of the analysis.

string = TextBlob("Hello all, I am Amit Chauhan from India")

We can get the part of speech of each word through the tags property of the class. It returns a list of tuples, each containing a word and its part-of-speech tag.

string.tags
#output:
[('Hello', 'NNP'),
 ('all', 'DT'),
 ('I', 'PRP'),
 ('am', 'VBP'),
 ('Amit', 'NNP'),
 ('Chauhan', 'NNP'),
 ('from', 'IN'),
 ('India', 'NNP')]

Tokenization

Tokenization is an important step in natural language processing that splits text into smaller units. TextBlob can break a text into both words and sentences, as shown in the example below:

#take an example
tokens_of_str = TextBlob("Today is a beautiful day. "
                         "The rainy clouds are everywhere.")
                         
#split the text into word tokens
tokens_of_str.words
#output:
WordList(['Today', 'is', 'a', 'beautiful', 'day', 'The', 'rainy',
          'clouds', 'are', 'everywhere'])

#split the text into sentence tokens
tokens_of_str.sentences
#output:
[Sentence("Today is a beautiful day."),
 Sentence("The rainy clouds are everywhere.")]
 
#we can find the first word and sentence also with indexing
tokens_of_str.words[0]
tokens_of_str.sentences[0]
#output:
'Today'
Sentence("Today is a beautiful day.")
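
To make the idea concrete, here is a minimal plain-Python sketch of word and sentence splitting. It is not how TextBlob tokenizes (TextBlob relies on NLTK tokenizers); the regular expressions and the function names naive_words and naive_sentences are assumptions made for illustration, and they will mishandle abbreviations and similar edge cases.

```python
import re

def naive_words(text):
    # Keep runs of letters, digits, and apostrophes; punctuation is dropped.
    return re.findall(r"[A-Za-z0-9']+", text)

def naive_sentences(text):
    # Split after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

text = "Today is a beautiful day. The rainy clouds are everywhere."
print(naive_words(text))
print(naive_sentences(text))
```

For this input the sketch yields the same ten words and two sentences as the TextBlob example above, but that agreement should not be expected on messier text.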

Lemmatization

Lemmatization reduces a word to its root form (its lemma). The Word class in textblob provides a lemmatize method for this.

If we pass nothing to the lemmatize method, the word is treated as a noun. We can also pass a part of speech to lemmatize the word accordingly.

from textblob import Word
a = Word("classes")
a.lemmatize()
#output:
'class'
#pass 'v' to lemmatize the word as a verb
a = Word("blinked")
a.lemmatize("v")
#output:
'blink'
#pass 'a' to lemmatize the word as an adjective
a = Word("faster")
a.lemmatize("a")

Those examples used single words. Now let us lemmatize every word in a sentence:

sent = "Today is a beautiful day"
a = sent.split(" ")
print(a)
print([Word(word).lemmatize() for word in a])
print([Word(word).lemmatize("v") for word in a])
#output:
['Today', 'is', 'a', 'beautiful', 'day']
['Today', 'is', 'a', 'beautiful', 'day']
['Today', 'be', 'a', 'beautiful', 'day']

WordNet and WordList

WordNet is a lexical database for looking up synonyms, definitions, and other information about words. TextBlob uses the WordNet database that ships with NLTK. We can use the definitions and synsets properties of a Word to get the definitions and synonyms of a word, as shown below:

from textblob import TextBlob
from textblob import Word
 
word = Word('sad')
 
print(word.definitions)
 
synonyms = set()
for synset in word.synsets:
    for lemma in synset.lemmas():
        synonyms.add(lemma.name())
         
print(synonyms)
#output:
['experiencing or showing sorrow or unhappiness', 'of things that make you feel sad', 'bad; unfortunate']
{'deplorable', 'sorry', 'lamentable', 'pitiful', 'distressing', 'sad'}

WordList is a class in the TextBlob library that provides methods such as upper, lower, lemmatize, stem, append, pluralize, and count.

from textblob import TextBlob
x = TextBlob("Today is a beautiful day")
x.words
#output:
WordList(['Today', 'is', 'a', 'beautiful', 'day'])
#to make the plural of the words
x.words.pluralize()
#output:
WordList(['Todays', 'iss', 'some', 'beautifuls', 'days'])
#to convert the words to uppercase
x.words.upper()
#output:
WordList(['TODAY', 'IS', 'A', 'BEAUTIFUL', 'DAY'])

Parsing

Parsing analyzes the syntactic structure of a sentence, much like a parse tree, by identifying chunks such as noun phrases and verb phrases. Each token in the output has the form word/part-of-speech tag/chunk tag/prepositional-phrase tag.

string = TextBlob("Today")
print(string.parse())
#output:
Today/NN/B-NP/O

N-Gram

TextBlob makes it easy to extract n-grams from a sentence. The ngrams method takes the value of n and returns a list of WordList objects, each holding n consecutive words. An example is shown below:

#The value of the ngram is '1'
blob = TextBlob("Today is a beautiful day")
blob.ngrams(n=1)
#output:
[WordList(['Today']),
 WordList(['is']),
 WordList(['a']),
 WordList(['beautiful']),
 WordList(['day'])]
#The value of the ngram is '3'
blob = TextBlob("Today is a beautiful day")
blob.ngrams(n=3)
#output:
[WordList(['Today', 'is', 'a']),
 WordList(['is', 'a', 'beautiful']),
 WordList(['a', 'beautiful', 'day'])]
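
The sliding-window idea behind n-grams can be sketched in a few lines of plain Python. This is an illustrative sketch, not TextBlob's implementation; the function name ngrams and the choice to return plain lists (rather than WordList objects) are assumptions.

```python
def ngrams(words, n):
    # No n-grams exist for a non-positive window size.
    if n <= 0:
        return []
    # Slide a window of size n over the word list, one position at a time.
    return [words[i:i + n] for i in range(len(words) - n + 1)]

words = "Today is a beautiful day".split()
trigrams = ngrams(words, 3)
print(trigrams)
```

A sentence of five words produces 5 - n + 1 windows, so three trigrams here, and an empty list once n exceeds the number of words.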

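Sentiment Analysis

Sentiment analysis determines whether the sentiment of a word or sentence is positive or negative. The example below uses the NaiveBayesAnalyzer, which returns a classification together with the probabilities of the positive and negative classes.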

from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
string = TextBlob("I am happy today", analyzer=NaiveBayesAnalyzer())
string.sentiment
#output:
Sentiment(classification='pos', p_pos=0.7367024460316827,
                                p_neg=0.2632975539683174)

Word and Phrase Frequency

TextBlob can count how often a word or phrase occurs in a text. There are two ways to do it: with the word_counts dictionary and with the count method.

#the first method
text = TextBlob("hip hip hurray")
text.word_counts['hip']
#output:
2
#the second method
text.words.count('hip', case_sensitive=True)
#output:
2
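
The difference between case-insensitive and case-sensitive counting is easy to miss when every word is already lowercase. The plain-Python sketch below uses collections.Counter to show it; it is an illustration rather than TextBlob's code, and the capitalized sample text is an assumption chosen to make the two counts differ. To my understanding, word_counts lowercases words before counting, while count compares exact case only when case_sensitive=True is passed.

```python
from collections import Counter

words = "Hip hip hurray".split()

# Case-insensitive counts: lowercase every word before counting.
lowered_counts = Counter(word.lower() for word in words)

# Case-sensitive counts: count the words exactly as written.
exact_counts = Counter(words)

print(lowered_counts["hip"])  # counts both 'Hip' and 'hip'
print(exact_counts["hip"])    # counts only the lowercase 'hip'
```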

Correction of Spelling

TextBlob can also correct misspelled words in a sentence with the correct method, as shown below:

text = TextBlob("Today is a beautif day")
print(text.correct())
#output:
Today is a beautiful day
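
For intuition about what a spelling corrector does, here is a rough sketch that swaps in a different and much simpler technique: closest-match lookup with the standard library's difflib. It is not the algorithm TextBlob uses. The tiny VOCABULARY list, the function name naive_correct, and the 0.6 similarity cutoff are all assumptions for illustration; a real corrector needs a large word list and word frequencies.

```python
import difflib

# Hypothetical vocabulary for illustration only.
VOCABULARY = ["today", "is", "a", "beautiful", "day"]

def naive_correct(sentence, vocabulary=VOCABULARY):
    corrected = []
    for word in sentence.split():
        if word.lower() in vocabulary:
            corrected.append(word)  # already a known word, keep as written
            continue
        # Pick the closest known word, if any is similar enough.
        matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.6)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)

print(naive_correct("Today is a beautif day"))
```

Words with no sufficiently close match are left unchanged, which keeps the sketch from "correcting" names or other out-of-vocabulary terms into something unrelated.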
