The document proposes a system to analyze sentiment in news articles using a lexicon-based approach. It involves collecting news articles, preprocessing the text, calculating sentiment scores of words and phrases using a dictionary, determining sentiment scores for each document, and classifying articles as positive, negative or neutral. The goal is to analyze sentiment towards different entities mentioned in the news, like people, places and topics. It applies this method to BBC news articles across different subject areas and hopes to expand the analysis to understand how sentiment may vary by location, source or group.
Sentiment Analysis of News Articles by using Lexicon-based Approach
May Sabal Myo
4th year (Knowledge Engineering)

Abstract
• Newspapers and blogs express opinions about news entities (such as people, places and things) while reporting on recent events.
• I propose a system that assigns scores indicating positive or negative opinion to each distinct entity.

Introduction
• News can be good or bad, but it is rarely neutral.
• Statistical analysis of relatively simple sentiment cues can provide a surprisingly meaningful sense of how the latest news impacts important entities.
• In this experiment, we calculate sentiment scores for five popular areas (business, entertainment, sports, politics and tech) using the BBC news dataset.
• This research is based on lexicon-based sentiment analysis of news articles and observes the role of text pre-processing in sentiment analysis.

Background Theory
• Sentiment analysis is a text analysis method that detects polarity (e.g. a positive or negative opinion) within text, whether a whole document, paragraph, sentence, or clause.
• In general, sentiment analysis is divided into three main levels:
1. Document level
2. Sentence level
3. Entity or aspect level

Background Theory (cont.)
• Sentiment analysis can generally be carried out using supervised or unsupervised approaches.
1. A supervised approach uses a set of labeled training data to build a classification model, which is then used to classify new data for which labels are not present.
2. Unsupervised or lexicon-based approaches to sentiment analysis do not require any training data. For a sentence or a document, the polarities of the individual words that compose it collectively convey its sentiment.

Proposed System and Methodologies
• Sentiment analysis can be done at the document level, sentence level, word level or phrase level.
• This system explores sentiment analysis at the document level. It identifies whether the opinions expressed in news articles are positive, negative or neutral.
• The proposed system is based on the Lexicon-based approach.
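A dictionary-based, document-level classifier of the kind described above can be sketched as follows. This is only a minimal illustration: the toy lexicon, the averaging rule and the neutral threshold are assumptions made for the example, not the dictionary or settings used in the proposed system.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1, 1].
# A real system would load a full sentiment dictionary instead.
LEXICON = {
    "good": 1.0, "gain": 0.5, "win": 1.0,
    "bad": -1.0, "loss": -0.5, "crisis": -1.0,
}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def document_score(text, lexicon=LEXICON):
    """Average the polarities of the words found in the lexicon."""
    scores = [lexicon[t] for t in tokenize(text) if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

def classify(score, threshold=0.1):
    """Map a score to a class; the threshold is an assumed value."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

print(classify(document_score("A good win for the team")))       # positive
print(classify(document_score("The crisis led to a bad loss")))  # negative
print(classify(document_score("The meeting is on Monday")))      # neutral
```

No training data is needed here: the class follows directly from the polarities of the words the lexicon recognises, which is what distinguishes this approach from a supervised classifier.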
Proposed System and Methodologies (cont.)
• This approach can use the following methods:
1. Dictionary-based methods: a lexicon dictionary is used to find positive and negative opinion words.
2. Corpus-based methods: a large corpus of words is used, and other opinion words are found in context based on syntactic patterns.

Proposed System and Methodologies (cont.)
• The methodology comprises 5 steps:
1. Data Collection
The proposed system uses the BBC news dataset, which contains 2225 documents from the website corresponding to stories in five popular areas: Business, Entertainment, Politics, Sport and Tech. Each class carries roughly 400 to 500 articles, and the files are in .txt format.

Proposed System and Methodologies (cont.)
2. Text Pre-processing
• Preprocessing is a necessary step to clean the text (reduce noise) and remove inconsistencies so that the cleansed data can be used more effectively in text mining or sentiment analysis tasks.
• The first preprocessing task is tokenization, which breaks a sequence of sentences into individual components such as words, phrases or symbols, called tokens.
• During tokenization, some elements, such as punctuation marks, stop words, white space and numbers, are discarded.

Proposed System and Methodologies (cont.)
3. Calculate Polarity of Sentiment Words
• Using TF-IDF (Term Frequency-Inverse Document Frequency), important words or terms in each document are identified and assigned a weight according to their occurrence in the news article.
• After the important words are identified, the WordNet dictionary (a lexical database for the English language) is used in this experiment.

Proposed System and Methodologies (cont.)
4. Calculate Total Sentiment Score
• The polarities of individual words, phrases and sentences are found and combined to predict the polarity of the whole document.
• The sentiment score of the whole news article is calculated using the extract sentiment operator; text with a sentiment score of -1 is considered negative and text with a sentiment score of +1 is considered positive.

Proposed System and Methodologies (cont.)
5. Sentiment Results
• News articles are classified into positive, negative and neutral classes according to their total sentiment score.
• Each news article's sentiment is then calculated as the average of its word sentiments.

Conclusion
• There are many interesting directions that can be explored. We are interested in how sentiment can vary by demographic group, news source or geographic location.
• By expanding our spatial analysis of news entities to sentiment maps, we can identify geographical regions of favorable or adverse opinions for given entities.
• We are also studying the degree to which our sentiment indices predict future changes in popularity or market behavior.
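For reference, steps 2 to 4 of the methodology (pre-processing, TF-IDF weighting, and combining word polarities into a document score) can be sketched as follows. This is a hedged illustration rather than the system's implementation: the stop-word list, the polarity lexicon, the unsmoothed `log(N / df)` form of IDF and the weighted-average rule are all assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list and hypothetical polarity lexicon.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for"}
LEXICON = {"strong": 1.0, "record": 0.5, "fall": -0.5, "fears": -1.0}

def preprocess(text):
    """Step 2: lowercase, keep alphabetic tokens (this drops punctuation,
    numbers and white space), then remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tf_idf(docs_tokens):
    """Step 3: per-document weights, tf = count / length, idf = log(N / df)."""
    n_docs = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def weighted_score(doc_weights, lexicon=LEXICON):
    """Step 4: TF-IDF weighted average polarity of the lexicon words."""
    num = sum(w * lexicon[t] for t, w in doc_weights.items() if t in lexicon)
    den = sum(w for t, w in doc_weights.items() if t in lexicon)
    return num / den if den > 0 else 0.0

docs = [
    "Shares hit a record on strong profits",
    "Shares fall on recession fears",
    "The match ends in a draw",
]
for doc, w in zip(docs, tf_idf([preprocess(d) for d in docs])):
    print(f"{weighted_score(w):+.2f}  {doc}")
```

With this weighting, a sentiment word that appears in every document gets an IDF of zero and no longer influences the score, so the result is driven by the words that are distinctive for each article.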