
Syntax used for creating a word cloud, sentiment analysis and topic modeling in R: -

 Packages used for the above analysis: -


library(tm)
library(SnowballC)
library(topicmodels)
library(wordcloud)
library(plyr)
library(dplyr)
library(stringr)
library(ggplot2)
library(httr)
library(reshape2)
library(sentimentr)
library(scales)
library(RCurl)
library(syuzhet)
library(wordcloud2)

 Commands: -
getwd()
setwd("E:/R CA3")
#pattern is a regular expression, so match the .txt extension explicitly
data = list.files(getwd(), pattern = "\\.txt$")
datatext = lapply(data, readLines)
View(datatext)
#Step 1 Create Corpus
library(tm)
data.corpus = Corpus(VectorSource(datatext))
class(data.corpus)
#Step 2 Text Processing
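#Make each letter lowercase
#tm_map applies a cleaning function to every document in the corpus
#content_transformer wraps base R functions such as tolower for use with tm_map
data.corpus = tm_map(data.corpus, content_transformer(tolower))
#A warning here may mean that some data was lost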
#Remove Punctuation
data.corpus = tm_map(data.corpus, removePunctuation)
#Remove Numbers
data.corpus = tm_map(data.corpus, removeNumbers)
#Remove generic and custom stopwords
stopwords()
data.corpus = tm_map(data.corpus, removeWords, stopwords())
#We can also remove specific words of our own
data.corpus = tm_map(data.corpus, removeWords, c("and", "the", "has", "have", "a"))
#Step 3 Visualization
library(wordcloud)
#random.order = FALSE plots words in decreasing order of frequency
wordcloud(data.corpus, random.order = FALSE)
#Step 4 Create TDM
tdm = TermDocumentMatrix(data.corpus)
class(tdm)
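tdm = as.matrix(tdm)
tdm
#Row sums give the total frequency of each term across all documents
termFreq = rowSums(tdm)
termFreq
class(termFreq)
#Subsetting TDM: keep terms that occur at least 10 times
termfreqsubset = subset(termFreq, termFreq >= 10)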
termfreqsubset
#Create a data frame
tdmdf = data.frame(Term = names(termfreqsubset), Freq = termfreqsubset)
tdmdf
rownames(tdmdf)=NULL
tdmdf
library(wordcloud2)
wordcloud2(tdmdf)
View(tdmdf)
#Sentiment Analysis
library(sentimentr)
library(syuzhet)
#We need the corpus data
class(data.corpus)
char = as.character(data.corpus) #Back to raw text
class(char)
mysentiments = get_nrc_sentiment(char)
#Note: "happy", "very happy" and "not happy" are different
sentimentscores = data.frame(colSums(mysentiments))
sentimentscores
names(sentimentscores) = "Score"
sentimentscores = cbind("Sentiments" =rownames(sentimentscores), sentimentscores)
sentimentscores
rownames(sentimentscores) = NULL
sentimentscores
#Create visualization using ggplot2 package
library(ggplot2)
#create a bar chart to visualize
ggplot(sentimentscores, aes(x = Sentiments, y = Score))+
geom_bar(aes(fill = Sentiments), stat = "identity") +
theme(legend.position = "none") +
xlab("Sentiment") + ylab("Score") +
ggtitle("Total Sentiment Score")
#Topic Modeling
library(topicmodels)
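#Keep only words of at least 3 characters
dtm = DocumentTermMatrix(data.corpus, control = list(wordLengths = c(3, Inf)))
k = 3 #number of topics
SEED = 1234 #fixed seed so the Gibbs sampler gives reproducible results
topic.lda = LDA(dtm, k, method = "Gibbs", control = list(seed = SEED))
#terms() returns the most likely term of each topic by default
lda.terms = terms(topic.lda)
lda.terms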

WORDCLOUD: -
 A word cloud is a popular visualization of words, typically associated with Internet
keywords and text data. Word clouds are most commonly used to highlight popular or trending
terms based on their frequency of use and prominence. A word cloud is an informative
image that communicates much at a single glance.
 It is a visual representation of the words used in a particular piece of text, with the size of each
word indicating its relative frequency.
 In the figure below, the words “space”, “geometry”, “theory” and “time” clearly have the
highest frequency, i.e. they are repeated most often.
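
The link between frequency and word size can be sketched in a few lines of base R. This is only an illustration: the sample sentence, the two-word stopword list and the size range of 1 to 4 are assumptions made for the example, and the wordcloud package may scale sizes differently.

```r
# Toy text; the sentence below is made up for illustration only.
text <- "space and time shape geometry; theory of space, theory of time, space"

# Lowercase, strip punctuation, and split on whitespace.
clean <- gsub("[[:punct:]]", "", tolower(text))
words <- strsplit(clean, "\\s+")[[1]]

# Drop a small hand-picked stopword list (an assumption, not tm's stopwords()).
words <- words[!words %in% c("and", "of")]

# Count each word and sort from most to least frequent.
freq <- sort(table(words), decreasing = TRUE)

# Map counts linearly onto a font-size range, as a word cloud would.
min_size <- 1
max_size <- 4
size <- min_size + (max_size - min_size) * (freq - min(freq)) / (max(freq) - min(freq))

word_sizes <- data.frame(word = names(freq),
                         freq = as.integer(freq),
                         size = as.numeric(size))
print(word_sizes)
```

The most frequent word receives the largest size and the least frequent words the smallest, which is the property a word cloud relies on.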
WORDCLOUD2: -
 The wordcloud2 package extends the visual capability of a standard word cloud by plotting the words
into a figure or shape. It also renders the cloud in color in the viewer pane, which makes it
more colourful and attractive.
 The figure below shows the same data that was used for wordcloud; here the
wordcloud2 package presents it more attractively, with better viewing angles.

SENTIMENT ANALYSIS: -
Sentiment analysis is the process of extracting opinions that have different polarities. By polarities, we
mean positive, negative or neutral. It is also known as opinion mining and polarity detection. With the
help of sentiment analysis, you can find out the nature of the opinion reflected in documents,
websites, social media feeds, etc. Sentiment analysis is a type of classification in which the data is
classified into different classes.
The figure below clearly shows that our data has a positive trend, as the largest
number of words belongs to the positive class.
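
The idea of classifying text into polarity classes can be sketched with a tiny word list in base R. Everything named here is an assumption for illustration: the six-word lexicon, the helper functions score_text and classify, and the three sample sentences are hypothetical, and this is not how get_nrc_sentiment from syuzhet works internally.

```r
# Tiny hand-made lexicon; the words and their polarities are illustrative assumptions.
lexicon <- c(good = 1, great = 1, happy = 1, bad = -1, poor = -1, sad = -1)

# Hypothetical helper: sum the polarity of every lexicon word found in a text.
score_text <- function(text) {
  tokens <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+")[[1]]
  sum(lexicon[tokens], na.rm = TRUE)
}

# Hypothetical helper: turn a numeric score into one of three classes.
classify <- function(score) {
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

docs <- c("The results were good and the team was happy.",
          "A poor design and a bad outcome.",
          "The report has four pages.")

scores <- vapply(docs, score_text, numeric(1), USE.NAMES = FALSE)
labels <- vapply(scores, classify, character(1))
print(data.frame(score = scores, label = labels))
```

A word-by-word count like this ignores negation, so "not happy" would still be counted as positive. That is one reason lexicon-based scores should be read with care.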
TOPIC MODELING: -
 Topic modeling is a method for unsupervised classification of documents that models each
document as a mixture of topics and each topic as a mixture of words. Latent Dirichlet allocation (LDA) is
a particularly popular method for fitting a topic model.
 In simple terms, it is the process of looking through a large collection of documents, identifying clusters of
words, grouping them together based on similarity, and identifying patterns in the clusters
that appear repeatedly.
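
The "mixture" idea can be made concrete with two small matrices in base R. This is a sketch of the model structure only, not of how LDA is fitted: the names beta and theta follow common LDA notation, and all probabilities are made up for the example.

```r
# Topic-word distributions: each row is a topic, each column a word (rows sum to 1).
# All numbers here are made up for illustration.
beta <- matrix(c(0.6, 0.3, 0.1, 0.0,
                 0.1, 0.1, 0.4, 0.4),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("topic1", "topic2"),
                               c("space", "time", "theory", "geometry")))

# Document-topic proportions: each row is a document (rows sum to 1).
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("doc1", "doc2"), c("topic1", "topic2")))

# Word probabilities per document are the topic mixtures of the topic-word rows.
word_probs <- theta %*% beta
print(round(word_probs, 3))

# Most probable word in each document under this toy model.
top_word <- colnames(word_probs)[apply(word_probs, 1, which.max)]
print(top_word)
```

Fitting a topic model runs this in reverse: given only the observed word counts, LDA estimates the two matrices that best explain them.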

As shown in the above figure, the most likely term of each of the three topics discovered in the collection of documents is:
1. theory
2. space
3. space
