
ASSIGNMENT-14

REDDIVARI SAI SARAN


G01142501

1. For each data file, run the NaiveBayes classifier from the Notebook and report whether the output is
classified as pos or neg.
2. Validate your answer by calculating the word frequency counts and visualizing them with a word cloud.

TEXTSAMPLE1:
TEXTSAMPLE2:
3. Explain your analysis based on your results from question 1 and 2.

The accuracy for textsample1 and textsample2 is 99.50 when we use naive Bayes
classification. The actual data is fairly messy: it contains typos, abbreviations, and
grammatical errors of all sorts.
Both textsample1 and textsample2 contain pos and neg tweets. The most common
words in textsample1 are love, Vinci, Harry, like, awesome, impossible, mountain, etc. The most
common words in textsample2 are Vinci, Harry, mountain, code, hate, sucked, sucks,
movie, etc.
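The word counts described above can be reproduced with a short sketch like the one below. It is only an illustration: the stop-word set, the `word_frequencies` helper, and the three sample lines are hypothetical stand-ins, not the contents of textsample1 or textsample2 and not the Notebook's own code.

```python
import re
from collections import Counter

# Hypothetical stop-word set; the Notebook may use a different list.
STOP_WORDS = {"the", "a", "is", "i", "and", "was", "it", "so"}

def word_frequencies(lines, stop_words=STOP_WORDS):
    """Count lowercase word tokens across lines, skipping stop words."""
    counts = Counter()
    for line in lines:
        # Keep runs of letters and apostrophes; drop punctuation and digits.
        tokens = re.findall(r"[a-z']+", line.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

# Made-up lines in the style of the data set, for illustration only.
sample_lines = [
    "I love Harry Potter, it is awesome",
    "The Da Vinci Code sucked",
    "I love Brokeback Mountain",
]

freqs = word_frequencies(sample_lines)
top_words = freqs.most_common(3)  # most frequent words first
print(top_words)
```

If the `wordcloud` package is installed, a frequency mapping like `freqs` can be passed to `WordCloud().generate_from_frequencies(freqs)` to draw the word cloud.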
Words common to both the positive and negative tweets include Vinci, mountain, brokeback, mission, etc. These come from the movie titles, so they do not indicate sentiment by themselves.
Naive Bayes classification helped to classify the negative and positive tweets with high
accuracy.
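To show how a naive Bayes classifier arrives at a pos or neg label, here is a minimal from-scratch sketch with add-one (Laplace) smoothing. It is not the Notebook's classifier: the `train` and `classify` helpers and the four training sentences are hypothetical and written only to illustrate the idea.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Collect per-label document counts and word counts."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify(model, text):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training sentences, for illustration only.
training_data = [
    ("i love harry potter", "pos"),
    ("brokeback mountain was awesome", "pos"),
    ("da vinci code sucked", "neg"),
    ("i hate harry potter", "neg"),
]

model = train(training_data)
print(classify(model, "i love brokeback mountain"))
print(classify(model, "the movie sucked"))
```

In this toy data, title words such as harry and potter occur under both labels and so contribute equally to each score, while words such as love and sucked decide the result.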
