A18 CU6051NA A2 CW Coursework 16034872 Anjil Shrestha
• BRAND MONITORING
• CUSTOMER SUPPORT
• CUSTOMER FEEDBACK
• PRODUCT ANALYTICS
• MARKET RESEARCH AND ANALYSIS
• WORKFORCE ANALYTICS & VOICE OF THE EMPLOYEE
• SPAM FILTERING
WHY NAÏVE BAYES CLASSIFIER?
• PROBABILISTIC MODEL
• Naïve Bayes is a probabilistic algorithm that uses probability theory and
Bayes’ theorem to predict the sentiment of a text.
P(A|B) – posterior
P(A) – prior
P(B) – evidence
P(B|A) – likelihood
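The four terms above are tied together by Bayes' theorem. As a short sketch, the first line below states the theorem and the second shows how a Naïve Bayes classifier would typically apply it to a review, under the usual assumption that words are conditionally independent given the class. The symbols c (a sentiment class), w_i (the i-th word of the review) and n (the number of words) are notation introduced here for illustration and do not appear in the slides.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The evidence term P(B) is the same for every class, so it can be dropped when only the most probable class is needed.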
BAG OF WORDS
Bag of Words (BOW) is a representation of text that describes the occurrence of
words within a document. It is a way of extracting features from documents, and
it involves:
• A vocabulary of known words
• A measure of the presence of known words

Vocabulary (16 known words, in vector order):
helpful, course, and, material, boring, dont, waste, time, in, this, useful, content, helped, lot, thanks, a

Review                          Vector                             Label
Helpful course and materials.   1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0    +
Boring.                         0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0    -
Don’t waste time in this.       0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0    -
Useful materials and content.   0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0    +
Helped a lot. Thanks.           0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1    +
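The encoding described in this section can be sketched in a few lines of plain Python. This is a minimal illustration, not the coursework's implementation: the function names `tokenize` and `bag_of_words` are hypothetical, and the one-entry `STEMS` lookup is an assumption standing in for whatever stemming step turned "materials" into "material" on the slide.

```python
import re

# Vocabulary of known words, in the order used on the slide.
VOCABULARY = [
    "helpful", "course", "and", "material", "boring", "dont", "waste", "time",
    "in", "this", "useful", "content", "helped", "lot", "thanks", "a",
]

# Assumption: a tiny lookup replaces a real stemmer for this example.
STEMS = {"materials": "material"}


def tokenize(text):
    """Lowercase, drop apostrophes, keep alphabetic tokens, apply STEMS."""
    text = re.sub(r"['’]", "", text.lower())
    words = re.findall(r"[a-z]+", text)
    return [STEMS.get(word, word) for word in words]


def bag_of_words(text, vocabulary=VOCABULARY):
    """Return a binary presence vector: 1 if the known word occurs, else 0."""
    tokens = set(tokenize(text))
    return [1 if word in tokens else 0 for word in vocabulary]


reviews = [
    "Helpful course and materials.",
    "Boring.",
    "Don’t waste time in this.",
    "Useful materials and content.",
    "Helped a lot. Thanks",
]
vectors = [bag_of_words(review) for review in reviews]
```

Each vector has one position per vocabulary word, so every review maps to a fixed-length feature vector regardless of its length.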
DEVELOPMENT
• Pandas
• NumPy
• Scikit-learn
• NLTK
• Regex
DEVELOPED SYSTEM
Startup page
• Options to train the model on the
dataset for sentiment prediction
DEVELOPED SYSTEM CONTINUED..
Training page
• The training process is
carried out in
the backend.
Floating navigation
button for visualization
• Option to open
visualization page.
Visualization page
• Bar diagram showing
total reviews made on
12 test courses.
Visualization page
• Total positive,
negative and neutral
reviews (extracted
from the dataset).
PSEUDO CODE
IMPORT NECESSARY LIBRARIES (PANDAS, SKLEARN, NLTK TOOLS)
READ DATASET AND SEPARATE SENTIMENT TEXT AND ITS SENTIMENT LABEL.
X = DATAFRAME.SENTIMENTTEXT
Y = DATAFRAME.SENTIMENTLABEL
X_TRAIN,X_TEST,Y_TRAIN,Y_TEST=TRAIN_TEST_SPLIT(X,Y,TEST_SIZE=0.2,RANDOM_STATE=1)
REMOVE STOPWORDS.
TOKENIZATION.
FIT A BAG-OF-WORDS VECTORIZER ON X_TRAIN.
X_TRAIN_VECTORS=VECTORIZER.TRANSFORM(X_TRAIN)
MODEL=NAIVE_BAYES.MULTINOMIALNB()
MODEL.FIT(X_TRAIN_VECTORS,Y_TRAIN)
MY_VECTORIZER=VECTORIZER.TRANSFORM(MY_TEST_DATA)
MODEL.PREDICT(MY_VECTORIZER)
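The pseudo code above could be turned into runnable Python roughly as follows. This is a sketch under stated assumptions rather than the coursework's actual code: the reviews and labels are hypothetical toy data (the real dataset is not shown in the slides), plain lists are used instead of a pandas DataFrame, and scikit-learn's `CountVectorizer` with its built-in English stop-word list stands in for the NLTK stop-word removal and tokenization steps.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data for illustration only.
texts = [
    "Helpful course and materials.",
    "Boring.",
    "Don't waste time in this.",
    "Useful materials and content.",
    "Helped a lot. Thanks",
    "Great course, very useful.",
    "Boring content, waste of time.",
    "Thanks, the material helped.",
    "Not helpful, boring lectures.",
    "Useful and helpful content.",
]
labels = ["+", "-", "-", "+", "+", "+", "-", "+", "-", "+"]

# Hold out 20% of the reviews for testing, as in the pseudo code.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=1
)

# Assumption: CountVectorizer lowercases, tokenizes and removes English
# stop words, replacing the separate NLTK steps named in the pseudo code.
vectorizer = CountVectorizer(stop_words="english")
x_train_vectors = vectorizer.fit_transform(x_train)

# Train the multinomial Naive Bayes model on the word-count vectors.
model = MultinomialNB()
model.fit(x_train_vectors, y_train)

# New text must be transformed with the same fitted vectorizer.
my_test_data = ["Very useful and helpful course"]
my_vectors = vectorizer.transform(my_test_data)
prediction = model.predict(my_vectors)

# Accuracy on the held-out reviews (not meaningful on data this small).
accuracy = model.score(vectorizer.transform(x_test), y_test)
```

The key design point is that the vectorizer is fitted once on the training text and then reused with `transform` for any new input, so training and prediction share one vocabulary.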