Problem 2: In this project, we work with the inaugural corpus from the nltk library
in Python. We look at the following speeches of Presidents of the United States of
America:
1. President Franklin D. Roosevelt in 1941
2. President John F. Kennedy in 1961
3. President Richard Nixon in 1973
2.1 Find the number of characters, words, and sentences for the mentioned
documents.
Characters:
Characters in Franklin D. Roosevelt's speech: 7571
Characters in John F. Kennedy's speech: 7618
Characters in Richard Nixon's speech: 9991
Words:
Words in Franklin D. Roosevelt's speech: 1536
Words in John F. Kennedy's speech: 1546
Words in Richard Nixon's speech: 2028
Sentences:
Sentences in Franklin D. Roosevelt's speech: 68
Sentences in John F. Kennedy's speech: 52
Sentences in Richard Nixon's speech: 69
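The counts above can be reproduced along the following lines. This is a minimal sketch, not the original notebook code: the helper names are illustrative, the file IDs assume the corpus's usual year-name naming, and the counts are taken as the lengths of what the corpus reader's raw(), words() and sents() methods return (so punctuation tokens are counted as words).

```python
def speech_counts(reader, fileid):
    """Return character, word, and sentence counts for one corpus file.

    `reader` is any object exposing raw(), words() and sents() the way
    NLTK corpus readers do (for example nltk.corpus.inaugural).
    """
    return {
        "characters": len(reader.raw(fileid)),
        "words": len(reader.words(fileid)),
        "sentences": len(reader.sents(fileid)),
    }


def report_inaugural_counts():
    """Print the counts for the three speeches (needs NLTK and its data)."""
    # Imported here so speech_counts() stays usable without NLTK installed.
    import nltk
    from nltk.corpus import inaugural

    nltk.download("inaugural", quiet=True)  # the speeches themselves
    # sents() relies on the Punkt sentence tokenizer; newer NLTK releases
    # may ask for "punkt_tab" instead of "punkt".
    nltk.download("punkt", quiet=True)
    for fileid in ("1941-Roosevelt.txt", "1961-Kennedy.txt", "1973-Nixon.txt"):
        print(fileid, speech_counts(inaugural, fileid))
```

Calling report_inaugural_counts() prints one dictionary of counts per speech.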
2.2 Remove all the stopwords from all three speeches.
To remove the stopwords, we use the 'stopwords' corpus in the nltk.corpus package.
To do so, we need the following imports:
• from nltk.corpus import stopwords
• from nltk.stem.porter import PorterStemmer
The stopwords corpus contains common stop words such as 'and', 'a', 'is', 'to', 'of', etc., that usually
carry little weight in understanding sentiment and are of little use to machine learning algorithms. The
stop words in the package are a widely used default list; we can add to it using the .extend()
function or remove words from it as per our requirement.
Also, we need to specify the language we are working with when loading the stop words, as the corpus
provides lists for many languages. Here, we will use English.
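The stop word removal described above might look like the sketch below. The function names are hypothetical, and lowercasing the tokens and dropping non-alphabetic tokens (punctuation, numbers) are assumptions about the cleaning step rather than something stated in the report.

```python
def remove_stopwords(tokens, stop_words):
    """Lowercase the tokens and drop stop words and non-alphabetic tokens."""
    stop_set = set(stop_words)  # set membership is faster than list lookup
    return [t.lower() for t in tokens if t.isalpha() and t.lower() not in stop_set]


def english_stopwords(extra=()):
    """Load NLTK's English stop word list, optionally extended with `extra`."""
    # Imported lazily so remove_stopwords() can be used without NLTK installed.
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)
    words = stopwords.words("english")  # the language is named explicitly
    words.extend(extra)  # `extra` holds any additional words chosen by the analyst
    return words
```

For example, remove_stopwords(inaugural.words("1961-Kennedy.txt"), english_stopwords()) would return the cleaned token list for Kennedy's speech, assuming the inaugural corpus has been downloaded.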
Stemming is a process that reduces words to their base or root form by removing affixes, so that words
with a similar meaning are treated as the same token. It is widely used in search engines. For example,
'eating' and 'eats' are both reduced to 'eat' after stemming (irregular forms such as 'eaten' may be left
unchanged by a rule-based stemmer).
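As a small illustration of stemming, the sketch below applies a stemmer to a list of tokens. The helper names and the sample words are illustrative; PorterStemmer is the class imported earlier in this section.

```python
def stem_tokens(tokens, stemmer):
    """Apply stemmer.stem() to every token; any object with a stem() method works."""
    return [stemmer.stem(t) for t in tokens]


def porter_demo():
    """Stem a few sample words with NLTK's Porter stemmer (needs NLTK installed)."""
    # Imported lazily so stem_tokens() can be used without NLTK installed.
    from nltk.stem.porter import PorterStemmer

    return stem_tokens(["eating", "eats", "people", "democracy"], PorterStemmer())
```

Stems are not always dictionary words: forms such as 'peopl' and 'democraci' in the results of question 2.3 are Porter stems of this kind.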
Some of the stop words removed are:
0 i
1 me
2 my
3 myself
4 we
5 our
6 ours
7 ourselves
8 you
9 you're
2.3 Which word occurs the most number of times in inaugural address for
each president? Mention the top three words. (after removing the stopwords)
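One way to obtain such a ranking is to count the cleaned tokens with collections.Counter. This is a sketch under assumptions: the function name and the keep_ties option are hypothetical, and the report may equally have used nltk.FreqDist.

```python
from collections import Counter


def top_words(tokens, n=3, keep_ties=True):
    """Return the n most frequent tokens as (word, count) pairs.

    With keep_ties=True, every word tied with the n-th count is also
    returned, so a "top three" can contain more than three entries.
    """
    ranked = Counter(tokens).most_common()  # sorted by count, highest first
    if len(ranked) <= n:
        return ranked
    if not keep_ties:
        return ranked[:n]
    cutoff = ranked[n - 1][1]  # count of the n-th ranked word
    return [(word, count) for word, count in ranked if count >= cutoff]
```

Keeping ties is one way a "top three" list can come back with six entries, as in the Roosevelt result, where four stems share the third-highest count.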
Results after removing stopwords and stemming:
• For Franklin D. Roosevelt's speech:
The top three words in Roosevelt's speech (after removing the stopwords) are:
[('nation', 17), ('know', 10), ('peopl', 9), ('spirit', 9), ('life', 9), ('democraci', 9)]
Here 'peopl', 'spirit', 'life' and 'democraci' all share third place because they have the same number of
occurrences.
Most occurring word: Nation.
• For John F. Kennedy's speech:
The top three words in Kennedy's speech (after removing the stopwords) are:
[('let', 16), ('us', 12), ('power', 9)]
Most occurring word: Let.
• For Richard Nixon's speech:
The top three words in Nixon's speech (after removing the stopwords) are:
[('us', 26), ('let', 22), ('america', 21)]
Most occurring word: Us.
2.4 Plot the word cloud of each of the speeches of the variable. (after
removing the stopwords)
A word cloud is a data visualization technique for representing text data in which the size of each
word indicates its frequency or importance. To generate a word cloud we need the wordcloud package. By
default it is not installed in the kernel, so we have to install it.
After importing the package, we again remove the stopwords but do not perform stemming, so that the
clouds show whole words rather than stems. Removing the stop words filters out the unwanted words that
contribute little to sentiment analysis.
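A word cloud for one speech could be produced roughly as follows. This is a sketch, not the original notebook code: the function names, image size and background colour are assumptions, and it presumes the wordcloud and matplotlib packages are installed (for example with pip install wordcloud).

```python
def cloud_text(tokens, stop_words):
    """Join the unstemmed, lowercased non-stop-word tokens into one string."""
    stop_set = set(stop_words)
    kept = [t.lower() for t in tokens if t.isalpha() and t.lower() not in stop_set]
    return " ".join(kept)


def plot_word_cloud(text, title):
    """Render `text` as a word cloud (needs the wordcloud and matplotlib packages)."""
    # Imported lazily: neither package ships with Python.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    # Word size in the image grows with the word's frequency in `text`.
    cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()
```

The text for each speech would be built with cloud_text() from that speech's tokens and the English stop word list, then passed to plot_word_cloud() together with a title.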
Word Cloud of Roosevelt's Speech:
[Figure: word cloud of Roosevelt's speech]
We can see some highlighted words like "nation", "know", "people", etc., which we observed as top words in
the previous question. This shows that the larger a word appears, the higher its frequency.
Word Cloud of Kennedy's Speech:
[Figure: word cloud of Kennedy's speech]
Word Cloud of Nixon's Speech:
Insights:
Our objective was to look at all three speeches and analyse them, to find the strength and
sentiment of the speeches.
Based on the outputs, we can see that some words are common to all
the speeches.
These words may be the points that inspired many people and helped the speakers win the seat of
the President of the United States of America.
Across the speeches, "nation" is the word that is significantly highlighted in all three.