In this project, we work with the inaugural address corpus from the NLTK library.
Import Libraries.
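import nltk
from nltk.corpus import inaugural  # corpus reader for the inaugural addresses
nltk.download('inaugural')  # fetch the corpus if it is not already present
inaugural.fileids()  # list the available speech files
inaugural.raw('1941-Roosevelt.txt')
inaugural.raw('1961-Kennedy.txt')
inaugural.raw('1973-Nixon.txt')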
[nltk_data] C:\Users\Hp\AppData\Roaming\nltk_data...
[nltk_data] Package movie_reviews is already up-to-date!
import pandas as pd
y0 = pd.DataFrame({'Text':inaugural.raw('1961-Kennedy.txt')},index = [0])
y1 = pd.DataFrame({'Text':inaugural.raw('1941-Roosevelt.txt')},index = [0])
y2 = pd.DataFrame({'Text':inaugural.raw('1973-Nixon.txt')},index = [0])
[('the', 9446),
('of', 7087),
(',', 7045),
('and', 5146),
('.', 4856),
('to', 4414),
('in', 2561),
('a', 2184),
('our', 2021),
('that', 1748)]
The 10 most common tokens in the inaugural speeches. Stop words and punctuation
dominate the list because nothing has been filtered out yet.
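The ranking above can be reproduced on any token list with `collections.Counter`. The sketch below is a minimal illustration on a short made-up sentence rather than the corpus itself. The names `sample_text` and `top_tokens` are hypothetical, and the regular-expression tokenizer is an assumption standing in for whatever tokenizer the notebook used.

```python
import re
from collections import Counter

def top_tokens(text, n=10):
    """Return the n most frequent tokens (words and punctuation marks) in text."""
    # Split into word tokens and single punctuation tokens, keeping case.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return Counter(tokens).most_common(n)

# Hypothetical sample text, not taken from the corpus.
sample_text = "We the people of the nation, and the friends of the nation."
print(top_tokens(sample_text, 3))
```

Because punctuation marks are counted as tokens here, they can appear in the ranking alongside words, just as ',' and '.' do in the list above.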
speeches. – 3 Marks.
We can remove the stop words by filtering the tokens against NLTK's English stop-word list:
'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "yo
t'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been'
, 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'ag
', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other',
'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than'
, 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'shou
ld', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', '
text =inaugural.raw('1941-Roosevelt.txt')
text_tokens = word_tokenize(y1['Text'][0])
est]
print(tokens_without_sw)
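Filtering against a lowercase stop-word list only removes tokens that match exactly, so capitalized words such as 'The' or 'We' and punctuation tokens such as ',' or '--' survive unless they are handled separately. A minimal sketch of a case-insensitive filter follows. `SAMPLE_STOPWORDS` is a small hand-picked set used as an assumption in place of the full NLTK list, `remove_stopwords` is a hypothetical helper, and joining the kept tokens into `filtered_sentence` is an assumption about how that variable is built.

```python
import string

# Small illustrative subset; an assumption standing in for the full NLTK stop-word list.
SAMPLE_STOPWORDS = {"the", "of", "and", "to", "in", "a", "our", "that", "we", "it", "is"}

def remove_stopwords(tokens, stopwords=SAMPLE_STOPWORDS):
    """Drop stop words (case-insensitively) and tokens made only of punctuation."""
    kept = []
    for tok in tokens:
        if tok.lower() in stopwords:
            continue  # stop word, regardless of capitalization
        if all(ch in string.punctuation for ch in tok):
            continue  # tokens such as ',', '.', or '--'
        kept.append(tok)
    return kept

# Hypothetical token list for illustration.
tokens = ["We", "know", "the", "spirit", "of", "life", ",", "--", "It", "is", "ours", "."]
tokens_without_sw = remove_stopwords(tokens)
filtered_sentence = " ".join(tokens_without_sw)
print(filtered_sentence)
```

Lowercasing only for the comparison keeps the original capitalization of the words that remain.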
We need to tokenize all three speeches so that we can remove the stop words and strip out the special characters.
print(filtered_sentence)
All three speeches need to be filtered in the same way to get clean text; the sentence-filtering step above can be applied to each one.
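from collections import Counter
# filtered_sentence must hold the filtered text of the speech being counted;
# reusing one value for all three speeches produces three identical counters.
Roosevelt_split = filtered_sentence.split()#y0['Text'][0].split()
Roosevelt_counter = Counter(Roosevelt_split)
Kennedy_split = filtered_sentence.split()#y1['Text'][0].split()
Kennedy_counter = Counter(Kennedy_split)
Nixon_split = filtered_sentence.split()#y2['Text'][0].split()
Nixon_counter = Counter(Nixon_split)
Roosevelt_most_occur = Roosevelt_counter.most_common(10)
Roosevelt_freq = pd.DataFrame(Roosevelt_most_occur, columns=['Roosevelt_Frequent_words', 'Roosevelt_total_words'])
Roosevelt_freq
Kennedy_most_occur = Kennedy_counter.most_common(10)
Kennedy_freq = pd.DataFrame(Kennedy_most_occur, columns=['Kennedy_Frequent_words', 'Kennedy_total_words'])
Kennedy_freq
Nixon_most_occur = Nixon_counter.most_common(10)
Nixon_freq = pd.DataFrame(Nixon_most_occur, columns=['Nixon_Frequent_words', 'Nixon_total_words'])
Nixon_freq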
Nixon_Frequent_words Nixon_total_words
0 , 77
1 . 68
2 -- 25
3 It 13
4 The 10
5 know 10
6 We 10
7 spirit 9
8 life 9
9 us 8
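One way to keep the three counts separate is to run the same counting step on each speech's own text through a small helper, so that no variable is shared between speeches. The sketch below assumes the filtered text of each speech is available as a string. The `speeches` dictionary holds short placeholder strings (not the real speeches), and `most_common_table` is a hypothetical helper.

```python
from collections import Counter

def most_common_table(filtered_text, n=10):
    """Return (word, count) pairs for the n most frequent whitespace-separated words."""
    return Counter(filtered_text.split()).most_common(n)

# Placeholder strings for illustration only; substitute each speech's filtered text.
speeches = {
    "Roosevelt": "spirit life spirit nation",
    "Kennedy": "let us begin let us",
    "Nixon": "peace peace world",
}

# Each speech gets its own counter, built from its own text.
tables = {name: most_common_table(text, 2) for name, text in speeches.items()}
for name, rows in tables.items():
    print(name, rows)
```

If two speeches ever produce exactly the same table, that is a sign the same input text was counted twice.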
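The most common words used by each of the three presidents in their speeches. Note that the lists printed here are identical, which is what happens when the same filtered_sentence is counted for every speech.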
Most common word of Roosevelt speech [(',', 77), ('.', 68), ('--', 25
), ('It', 13), ('The', 10), ('know', 10), ('We', 10), ('spirit', 9),
Most common word of Kennedy speech [(',', 77), ('.', 68), ('--', 25),
('It', 13), ('The', 10), ('know', 10), ('We', 10), ('spirit', 9), ('l
'It', 13), ('The', 10), ('know', 10), ('We', 10), ('spirit', 9), ('li
With the help of the WordCloud function, we can visualize the words each of the three presidents used most
in their speeches. We need to change the values of y0, y1, and y2 for app