Problem 2

In this project, we work on the inaugural corpus from nltk in Python. We look at the following speeches of Presidents of the United States of America:

1. President Franklin D. Roosevelt in 1941
2. President John F. Kennedy in 1961
3. President Richard Nixon in 1973

2.1 Find the number of characters, words, and sentences for the mentioned documents.

Characters:
Characters in Franklin D. Roosevelt's speech: 7571
Characters in John F. Kennedy's speech: 7618
Characters in Richard Nixon's speech: 9991

Words:
Words in Franklin D. Roosevelt's speech: 1536
Words in John F. Kennedy's speech: 1546
Words in Richard Nixon's speech: 2028

Sentences:
Sentences in Franklin D. Roosevelt's speech: 68
Sentences in John F. Kennedy's speech: 52
Sentences in Richard Nixon's speech: 69

2.2 Remove all the stopwords from all three speeches.

To remove the stopwords, we use the "stopwords" corpus available in the nltk.corpus library. To do so, we need the following imports:

- from nltk.corpus import stopwords
- from nltk.stem.porter import PorterStemmer

The stopwords list contains common words such as 'and', 'a', 'is', 'to', 'of', etc., that usually carry little weight in understanding sentiment and are of little use to machine learning algorithms. The stopwords in the package are a widely used default list, and we can add to it using the .extend() function or remove entries as per our requirement. We also need to specify the language we are working with before defining the functions, as the package contains lists for many languages. Here, we use English.

Stemming is a process that helps group words with similar meaning. The words are reduced to their base or root form by removing the affixes. It is widely used in search engines. For example, eating and eats are both reduced to eat after stemming (an irregular form such as eaten is left unchanged by the Porter stemmer).
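The stopword-removal step described above can be sketched as follows. This is a minimal illustration using only the standard library: the STOPWORDS set is a small hand-picked subset (an assumption for the example; the report uses the full English list from nltk.corpus.stopwords), and tokenize and remove_stopwords are hypothetical helper names, not nltk functions.

```python
import re

# Illustrative subset only (assumption). The report uses the full list
# returned by nltk.corpus.stopwords.words("english").
STOPWORDS = {
    "i", "me", "my", "myself", "we", "our", "ours", "ourselves",
    "you", "and", "a", "is", "to", "of", "the",
}


def tokenize(text):
    """Lowercase the text and return its word tokens (hypothetical helper)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def remove_stopwords(tokens, extra=()):
    """Drop tokens found in STOPWORDS or in the caller-supplied extra words."""
    stop = STOPWORDS | set(extra)
    return [t for t in tokens if t not in stop]


sample = "We the people of the nation know our duty to the nation."
tokens = tokenize(sample)

# Filter with the base list only.
filtered = remove_stopwords(tokens)
print(filtered)

# Add a custom stopword, analogous to calling .extend() on the nltk list.
extended = remove_stopwords(tokens, extra={"know"})
print(extended)
```

In the report's setup, each remaining token would then be passed through PorterStemmer().stem before counting.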
Some of the stop words removed are:

0 i
1 me
2 my
3 myself
4 we
5 our
6 ours
7 ourselves
8 you
9 you're

2.3 Which word occurs the most number of times in the inaugural address for each president? Mention the top three words. (after removing the stopwords)

Results after removing stopwords and stemming:

- For Franklin D. Roosevelt's speech:
The top three words in Roosevelt's speech (after removing the stopwords) are: [('nation', 17), ('know', 10), ('peopl', 9), ('spirit', 9), ('life', 9), ('democraci', 9)]
Here 'peopl', 'spirit', 'life' and 'democraci' are all in third place because they have the same number of occurrences.
Most occurring word: Nation.

- For John F. Kennedy's speech:
The top three words in Kennedy's speech (after removing the stopwords) are: [('let', 16), ('us', 12), ('power', 9)]
Most occurring word: Let.

- For Richard Nixon's speech:
The top three words in Nixon's speech (after removing the stopwords) are: [('us', 26), ('let', 22), ('america', 21)]
Most occurring word: Us.

2.4 Plot the word cloud of each of the speeches of the variable. (after removing the stopwords)

A word cloud is a data visualization technique for representing text data in which the size of each word indicates its frequency or importance. To generate a word cloud we need the wordcloud package. By default it is not installed in the kernel, so we have to install it. After importing the package, we again remove the stopwords but do not perform stemming. Removing the stopwords filters out unwanted words that carry little value for sentiment analysis.

Word Cloud of Roosevelt's Speech:

[Figure: word cloud of Roosevelt's speech]

We can see some highlighted words like "nation", "know", "people", etc., which we observed as top words in the previous question.
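The tie in third place reported in 2.3 suggests that a plain "top three" cut can hide equally frequent words. A minimal sketch of a tie-aware ranking is given here, assuming the tokens have already been filtered and stemmed; top_with_ties is a hypothetical helper, and the token list is made-up toy data, not the speech counts.

```python
from collections import Counter


def top_with_ties(tokens, k=3):
    """Return the k most frequent words, plus any words tied with the k-th."""
    ranked = Counter(tokens).most_common()  # sorted by count, descending
    if len(ranked) <= k:
        return ranked
    cutoff = ranked[k - 1][1]  # count of the word in k-th place
    return [(word, count) for word, count in ranked if count >= cutoff]


# Toy, already-stemmed tokens (made-up data for illustration only).
toy_tokens = (
    ["nation"] * 4 + ["know"] * 3 + ["peopl"] * 2 + ["spirit"] * 2 + ["life"]
)

# 'peopl' and 'spirit' share third place, so both are returned.
print(top_with_ties(toy_tokens))
```

A word cloud is drawn from this same kind of frequency table.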
In the word cloud, the bigger a word is drawn, the higher its frequency.

Word Cloud of Kennedy's Speech:

[Figure: word cloud of Kennedy's speech]

Word Cloud of Nixon's Speech:

[Figure: word cloud of Nixon's speech]

Insights

Our objective was to look at all three speeches and analyse them, to find the strength and sentiment of the speeches. Based on the outputs, we can see that some similar words are present in all the speeches. These words may be the points that inspired many people and helped win the speakers the seat of President of the United States of America. Among all the speeches, "nation" is the word that is significantly highlighted in all three.
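One way to check which words the speeches share is to intersect their word sets. The following is a sketch under stated assumptions: shared_words is a hypothetical helper, and the inputs are only the top-word lists reported in 2.3. Those lists are too short to confirm the observation about "nation"; the full frequency tables of the three speeches would be needed for that.

```python
def shared_words(*word_lists):
    """Return the words that appear in every one of the given lists."""
    if not word_lists:
        return set()
    common = set(word_lists[0])
    for words in word_lists[1:]:
        common &= set(words)  # keep only words also present in this list
    return common


# Top stemmed words as reported in section 2.3.
roosevelt = ["nation", "know", "peopl", "spirit", "life", "democraci"]
kennedy = ["let", "us", "power"]
nixon = ["us", "let", "america"]

# Words common to the Kennedy and Nixon top-word lists.
print(shared_words(kennedy, nixon))

# Words common to all three top-word lists.
print(shared_words(roosevelt, kennedy, nixon))
```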
