Alright, in this short tutorial we'll calculate word frequencies and visualize them.
It's a relatively simple task.
BUT when it comes to stopwords and a language other than English, there can be some difficulties.
I have a dataframe whose text field contains Russian-language text.
Step 0 : Install required libraries
install.packages("tidyverse")
install.packages("tidytext")
install.packages("tm")
library(tidyverse)
library(tidytext)
library(tm)
Step 1 : Create stopwords dataframe
#create stopwords DF
rus_stopwords = data.frame(word = stopwords("ru"))
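A quick optional sanity check (my own addition, not from the original post): stopwords() from tm returns a plain character vector, and wrapping it in data.frame() gives anti_join() a table with a single word column to join against.

```r
library(tm)

# repeat Step 1 so this snippet runs on its own
rus_stopwords <- data.frame(word = stopwords("ru"))

str(rus_stopwords)   # one character column named "word"
head(rus_stopwords)  # first few Russian stopwords
```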
Step 2 : Tokenize
new_df <- video %>% unnest_tokens(word, text) %>% anti_join(rus_stopwords, by = "word")
# anti_join - function that removes the stopwords
# video - the name of our dataframe
# word - the name of the new field
# text - the field that holds our text
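If you don't have the video dataframe at hand, here is a tiny self-contained sketch of what unnest_tokens() plus anti_join() do; the two-row demo dataframe and the one-word English stopword list are made up purely for illustration:

```r
library(dplyr)
library(tidytext)

# hypothetical stand-in for the `video` dataframe
demo <- data.frame(text = c("the cat sat", "the cat ran"))

# hypothetical stopword table, same shape as rus_stopwords
en_stopwords <- data.frame(word = "the")

tokens <- demo %>%
  unnest_tokens(word, text) %>%         # one row per word, lowercased
  anti_join(en_stopwords, by = "word")  # keep only non-stopword rows

tokens$word  # "cat" "sat" "cat" "ran"
```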
Step 3 : Count words
frequency_dataframe = new_df %>% count(word) %>% arrange(desc(n))
Step 4 (Optional) : Take only first 20 items from a dataframe
short_dataframe = head(frequency_dataframe, 20)
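The count/arrange/head combination can be seen on a toy example (the token dataframe below is invented for the demo):

```r
library(dplyr)

# hypothetical tokens, as if produced by Step 2
toks <- data.frame(word = c("cat", "cat", "ran", "sat", "cat"))

freq <- toks %>% count(word) %>% arrange(desc(n))
head(freq, 2)  # the two most frequent words, "cat" first with n = 3
```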
Step 5 : Visualize with ggplot
ggplot(short_dataframe, aes(x = word, y = n, fill = word)) + geom_col()
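One optional tweak (my own suggestion, not part of the original post): by default the bars come out in alphabetical order, so reordering by frequency and flipping the axes keeps long Russian words readable.

```r
library(ggplot2)

# hypothetical stand-in for short_dataframe so the snippet runs on its own
short_dataframe <- data.frame(word = c("cat", "ran", "sat"), n = c(3, 1, 1))

p <- ggplot(short_dataframe, aes(x = reorder(word, n), y = n, fill = word)) +
  geom_col() +
  coord_flip() +   # horizontal bars
  labs(x = "word")
p
```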
So in my case it looked like this: