Calculating Word Frequency in Dataframe

June 27, 2020

Alright, so in this short tutorial we’ll calculate word frequency and visualize it.

It’s a relatively simple task.
BUT when it comes to stopwords and languages other than English, there might be some difficulties.

I have a dataframe called video whose text field is in Russian.
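The post never shows what video looks like, so if you want to follow along without the original data, here is a tiny hypothetical stand-in: a tibble with an id column and a text column holding three made-up Russian sentences. The name video comes from the post; the contents are invented, and only the column name text matters for the steps below.

```r
library(tibble)

# Hypothetical stand-in for the author's `video` dataframe.
# The sentences are invented for illustration only.
video <- tibble(
  id   = 1:3,
  text = c(
    "Кот сидит на окне и смотрит в сад",
    "В саду растут яблоки и груши",
    "Кот любит яблоки, но не груши"
  )
)

print(video)
```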

Step 0 : Install required libraries

# install once
install.packages("tidyverse")
install.packages("tidytext")
install.packages("tm")
# load in every session
library(tidyverse)
library(tidytext)
library(tm)

Step 1 : Create stopwords dataframe

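# create stopwords DF (tm's Russian list; the column must be called word to match the tokens later)
rus_stopwords = data.frame(word = stopwords("ru"), stringsAsFactors = FALSE)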
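A quick sanity check of the stopword table, as a sketch: as far as I know, tm's stopwords("ru") maps the language code to the Snowball Russian list, which is all lowercase. That matters because unnest_tokens() lowercases tokens by default, so the anti_join in the next step can match them.

```r
library(tm)

rus_stopwords <- data.frame(word = stopwords("ru"), stringsAsFactors = FALSE)

# Size of the list and a peek at its first entries
nrow(rus_stopwords)
head(rus_stopwords$word, 10)

# Common function words: "и" (and), "в" (in), "не" (not); expect all TRUE
c("и", "в", "не") %in% rus_stopwords$word
```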

Step 2 : Tokenize

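new_df <- video %>% unnest_tokens(word, text) %>% anti_join(rus_stopwords, by = "word")

# anti_join - function to remove stopwords (keeps only rows whose word is not in rus_stopwords)
# video - is the name of the dataframe
# word - is the name of the new field (one lowercase token per row)
# text - is just the field with our text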
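To see what Step 2 actually does, here is a minimal sketch on one invented sentence ("The cat sits on the window and looks into the garden"). unnest_tokens() splits the text into one lowercase word per row, and anti_join() then drops every row whose word also appears in rus_stopwords, so "на", "и" and "в" disappear.

```r
library(dplyr)
library(tidytext)
library(tm)

rus_stopwords <- data.frame(word = stopwords("ru"), stringsAsFactors = FALSE)

# One invented sentence standing in for a row of the text field
toy <- tibble(text = "Кот сидит на окне и смотрит в сад")

# Tokenize: one lowercase word per row, in a column named word
tokens <- toy %>% unnest_tokens(word, text)

# Remove stopwords: keep only rows whose word is NOT in rus_stopwords
kept <- tokens %>% anti_join(rus_stopwords, by = "word")

tokens$word  # still contains "на", "и", "в"
kept$word    # the stopwords are gone
```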

Step 3 : Count words

frequency_dataframe = new_df %>% count(word) %>% arrange(desc(n))
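As an aside, dplyr's count() can sort by n itself through its sort argument, so Step 3 can be shortened. The sketch below uses a small hypothetical token table in place of new_df and builds both versions for comparison.

```r
library(dplyr)

# Hypothetical token table standing in for new_df
new_df <- tibble(word = c("кот", "сад", "кот", "яблоки", "кот", "сад"))

# The pipeline from Step 3 ...
freq_a <- new_df %>% count(word) %>% arrange(desc(n))

# ... and the shorthand: count() sorts by n in descending order itself
freq_b <- new_df %>% count(word, sort = TRUE)

freq_b
```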

Step 4 (Optional) : Take only the first 20 items from the dataframe

short_dataframe = head(frequency_dataframe, 20)
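head() only returns the top words because Step 3 already sorted the table. If you prefer not to rely on that, dplyr (version 1.0.0 or later) has slice_max(), which picks the rows with the largest n by itself. A sketch on a hypothetical, deliberately unsorted table; note that slice_max() keeps ties by default, so with_ties = FALSE is needed to get exactly the requested number of rows.

```r
library(dplyr)

# Hypothetical frequency table, deliberately NOT sorted by n
frequency_dataframe <- tibble(
  word = c("сад", "кот", "груши", "яблоки"),
  n    = c(9L, 12L, 3L, 9L)
)

# head() only gives the top words once the table is sorted
top_head <- frequency_dataframe %>% arrange(desc(n)) %>% head(2)

# slice_max() does the ordering itself; with_ties = FALSE caps the result at 2 rows
top_slice <- frequency_dataframe %>% slice_max(n, n = 2, with_ties = FALSE)

top_head
top_slice
```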

Step 5 : Visualize with ggplot

ggplot(short_dataframe, aes(x = word, y = n, fill = word)) + geom_col()
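With x = word the bars are laid out alphabetically, and fill = word adds a 20-entry legend. One common variation, sketched here on a hypothetical short_dataframe, orders the bars by frequency with reorder() and flips them horizontally with coord_flip() so long Russian words stay readable. This is a styling choice, not what the original chart used.

```r
library(dplyr)
library(ggplot2)

# Hypothetical top-words table standing in for short_dataframe
short_dataframe <- tibble(
  word = c("кот", "сад", "яблоки", "груши"),
  n    = c(12L, 9L, 7L, 3L)
)

p <- ggplot(short_dataframe, aes(x = reorder(word, n), y = n)) +
  geom_col() +                      # bar height = word count
  coord_flip() +                    # horizontal bars, most frequent word on top
  labs(x = NULL, y = "Frequency")   # drop the redundant "word" axis title

# print(p), or just p, draws the chart
```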

So in my case it looked like this:

[Figure: bar chart of the 20 most frequent words]

This article has been published from the source link without modifications to the text. Only the headline has been changed.

Source link
