Code mixed sentiment analysis uses machine learning techniques like random forest algorithms to analyze sentiments in data containing a mixture of languages, like English and Malayalam, extracted from social media. Random forest builds multiple decision trees on random subsets of data and outputs the majority vote of the classifications. It reduces overfitting while providing flexibility. The key challenges are it is time consuming and requires more resources. Random forest is widely used in sectors like finance, e-commerce, and healthcare to analyze sentiments in code mixed data.
Code mixed sentiment analysis uses machine learning techniques like random forest algorithms to analyze sentiments in data containing a mixture of languages, like English and Malayalam, extracted from social media. Random forest builds multiple decision trees on random subsets of data and outputs the majority vote of the classifications. It reduces overfitting while providing flexibility. The key challenges are it is time consuming and requires more resources. Random forest is widely used in sectors like finance, e-commerce, and healthcare to analyze sentiments in code mixed data.
Code mixed sentiment analysis uses machine learning techniques like random forest algorithms to analyze sentiments in data containing a mixture of languages, like English and Malayalam, extracted from social media. Random forest builds multiple decision trees on random subsets of data and outputs the majority vote of the classifications. It reduces overfitting while providing flexibility. The key challenges are it is time consuming and requires more resources. Random forest is widely used in sectors like finance, e-commerce, and healthcare to analyze sentiments in code mixed data.
LITERATURE REVIEW • Sentiment analysis, also referred to as opinion mining, is an approach to natural language processing (NLP) that identifies the emotional tone behind a body of text. Code mixed sentimental analysis is where we are performing sentimental analysis on code mixed data. Code mixed data means data consisting of a mixture of two languages like english and malayalam. Code-mixed data can be easily extracted from various social-media platforms like Facebook,youtube etc.various tools,algorithms and libraries are there to help in code mixed sentimental analysis.Monkeylearn,lexalytics,brandwatch etcare some of the tools used in code mixed sentimental analysis. Algorithms like Naïve Bayes Classifier,Support vector machine(SVM),Random Forest and further more can be used to perform code mixed sentimental analysis. Natural language tool kit(NLTK) It is one of the best Python libraries for any task based on natural language processing and sentimental analysis is one of the bestThe relevance of sentimental analysis in today’s world has a vital role due to the growth of social media since it can be used to analyse a person’s opinion about a particlar product or movie or anything he wishes to comment.Most of the people using social media in the present scenario likes to express their opinions with a mixture of languages like english and malayalam.Hence code mixed sentimental analysis plays a major role in analysing people’s opinions on various things and use them for varius purposes.So the project that is intended to do here consists of analysing code mixed data on comments taken from youtube and anlayse the polarity of the comments(positive,negative or neutral). ALGORITHM • Random forest • Random forest is a commonly-used machine learning algorithm trademarked by Leo Breiman and Adele Cutler, which combines the output of multiple decision trees to reach a single result. Its ease of use and flexibility have fueled its adoption, as it handles both classification and regression problems. • Random forest algorithms have three main hyperparameters, which need to be set before training. These include node size, the number of trees, and the number of features sampled. From there, the random forest classifier can be used to solve for regression or classification problems. • Depending on the type of problem, the determination of the prediction will vary. For a regression task, the individual decision trees will be averaged, and for a classification task, a majority vote—i.e. the most frequent categorical variable—will yield the predicted class. • The algorithm reduces the risk of overfitting,provides flexibility.The key challenges are it is time consuming,Requires more resources and is more complex. • They are widely used in finance sector,e-commerce sector,and health care sector. ALGORITHM RANDOM FOREST
• step-1:Select random K data points from the training set.
• Step-2: Build the decision trees associated with the selected data points (Subsets). • Step-3: Choose the number N for decision trees that you want to build. • Step-4: Repeat Step 1 & 2. • Step-5: For new data points, find the predictions of each decision tree, and assign the new data points to the category that wins the majority votes. Sample dataset Process flow REFERENCES • https://arxiv.org/ftp/arxiv/papers/1808/1808.03299.pdf • https://www.researchgate.net/publication/326988364_Code-Mix ed_Sentiment_Analysis_Using_Machine_Learning_and_Neural _Network_Approaches