
ROBIN MICHAEL

CODE-MIXED SENTIMENT ANALYSIS


LITERATURE REVIEW
• Sentiment analysis, also referred to as opinion mining, is a natural language processing (NLP)
technique that identifies the emotional tone behind a body of text. Code-mixed sentiment analysis
applies sentiment analysis to code-mixed data, that is, text that mixes two languages, such as
English and Malayalam. Code-mixed data can be readily collected from social media platforms such as
Facebook and YouTube. Various tools, algorithms and libraries support code-mixed sentiment analysis.
Tools used for this purpose include MonkeyLearn, Lexalytics and Brandwatch. Algorithms such as the
Naïve Bayes classifier, support vector machine (SVM) and random forest can be used to perform the
analysis. The Natural Language Toolkit (NLTK) is one of the best Python libraries for tasks based on
natural language processing, sentiment analysis among them. Sentiment analysis has become highly
relevant with the growth of social media, since it can be used to analyse a person’s opinion about a
particular product, movie or anything else they choose to comment on. Most social media users today
like to express their opinions in a mixture of languages such as English and Malayalam. Code-mixed
sentiment analysis therefore plays a major role in analysing people’s opinions on various topics and
using them for various purposes. The project proposed here analyses code-mixed comments taken from
YouTube and determines the polarity of each comment (positive, negative or neutral).
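Code-mixed comments are often typed in Roman script with inconsistent spelling, so one possible way to turn them into features is to count character n-grams instead of whole words. The sketch below is only an illustration under that assumption. It is not the feature extraction used in this project, the function name `char_ngrams` is hypothetical, and the two comments are invented examples rather than items from the dataset.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return a Counter of overlapping character n-grams for one comment."""
    # Lowercase and collapse whitespace so "Super" and "super" share features.
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


# Invented, illustrative comments with two spellings of the same word.
comments = [
    ("padam super aanu", "positive"),
    ("padam supr aanu", "positive"),
]

features = [char_ngrams(text) for text, _ in comments]

# Trigrams present in both comments despite the spelling difference.
shared = set(features[0]) & set(features[1])
print(sorted(shared))
```

Because the two spellings still share most of their trigrams, a classifier trained on such counts can treat them as similar, which whole-word features would not allow.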
ALGORITHM
• Random forest
• Random forest is a commonly-used machine learning algorithm trademarked by Leo Breiman and Adele
Cutler, which combines the output of multiple decision trees to reach a single result. Its ease of use and
flexibility have fueled its adoption, as it handles both classification and regression problems.
• Random forest algorithms have three main hyperparameters, which need to be set before training. These
include node size, the number of trees, and the number of features sampled. From there, the random forest
classifier can be used to solve for regression or classification problems.
• How the prediction is determined depends on the type of problem. For a regression task, the
predictions of the individual decision trees are averaged; for a classification task, a majority vote
(the most frequent class label across the trees) yields the predicted class.
• The algorithm reduces the risk of overfitting and provides flexibility. Its key challenges are that it
is time-consuming, requires more resources and is more complex than a single decision tree.
• Random forests are widely used in the finance, e-commerce and healthcare sectors.
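The two ways of combining tree outputs described above (averaging for regression, majority vote for classification) can be sketched in a few lines. This is a minimal illustration, not a library API. The function names and the sample tree outputs are hypothetical.

```python
from collections import Counter
from statistics import mean


def aggregate_regression(tree_outputs):
    """Average the numeric predictions of the individual trees."""
    return mean(tree_outputs)


def aggregate_classification(tree_votes):
    """Return the class label predicted by the largest number of trees."""
    # most_common(1) gives [(label, count)] for the most frequent label;
    # ties are resolved in favour of the label seen first.
    return Counter(tree_votes).most_common(1)[0][0]


# Hypothetical outputs from three trees.
print(aggregate_regression([2.0, 4.0, 6.0]))
print(aggregate_classification(["positive", "negative", "positive"]))
```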
ALGORITHM RANDOM FOREST

• Step-1: Select K random data points from the training set.
• Step-2: Build the decision tree associated with the selected data
points (subset).
• Step-3: Choose the number N of decision trees that you want to
build.
• Step-4: Repeat Steps 1 and 2 until N trees have been built.
• Step-5: For new data points, find the predictions of each decision
tree, and assign the new data points to the category that wins the
majority votes.
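The steps above can be sketched in plain Python. To keep the example short, each "tree" here is a one-split stump on a single randomly chosen feature, which is a simplification and not a full decision tree. All function names, the toy feature layout (counts of positive and negative words) and the data values are hypothetical and are not taken from the project's dataset.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, rng):
    """Fit a one-split tree on one randomly chosen feature (simplified tree)."""
    feature = rng.randrange(len(rows[0]))  # random feature for this tree
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue  # a split must put data on both sides
        left_label, right_label = majority(left), majority(right)
        # Training errors made if each side predicts its majority label.
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # No valid split (all feature values equal): always predict the majority.
        label = majority(labels)
        return feature, float("inf"), label, label
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def train_forest(rows, labels, n_trees, sample_size, seed=0):
    """Build n_trees stumps, each on sample_size randomly drawn data points."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):  # Steps 3 and 4: repeat until N trees exist
        # Step 1: select K random data points (drawn with replacement).
        idx = [rng.randrange(len(rows)) for _ in range(sample_size)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Step 2: build a tree on the selected subset.
        forest.append(train_stump(sample_rows, sample_labels, rng))
    return forest


def predict(forest, row):
    """Step 5: collect one vote per tree and return the majority class."""
    votes = [left if row[feature] <= threshold else right
             for feature, threshold, left, right in forest]
    return majority(votes)


# Hypothetical features per comment: [positive-word count, negative-word count].
rows = [[3, 0], [2, 0], [2, 1], [0, 2], [0, 3], [1, 3]]
labels = ["positive", "positive", "positive",
          "negative", "negative", "negative"]

forest = train_forest(rows, labels, n_trees=25, sample_size=len(rows))
print(predict(forest, [3, 0]))
```

Here `sample_size` plays the role of K and `n_trees` the role of N from the steps. In practice a library implementation with full decision trees would normally be used instead of this sketch.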
Sample dataset
Process flow
REFERENCES
• https://arxiv.org/ftp/arxiv/papers/1808/1808.03299.pdf
• https://www.researchgate.net/publication/326988364_Code-Mixed_Sentiment_Analysis_Using_Machine_Learning_and_Neural_Network_Approaches
