3 - Working With Text Data
Madhuri Prabhala
Working with text data
Practice problem
Details on the dataset
10. How many unique words are there in the text corpus?
11. What are the 10 most frequent words?
12. What are the 10 least frequent words?
13. Create a Word Cloud.
14. What are your observations?
15. What will you do next?
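A minimal sketch for questions 10 to 12, using only the Python standard library. The `corpus` list is a hypothetical stand-in for the text column of the practice dataset, and the regex tokenizer is an assumption; a real analysis would probably also remove stop words and punctuation more carefully.

```python
import re
from collections import Counter

# Hypothetical stand-in for the text column of the practice dataset.
corpus = [
    "The service was great and the food was great",
    "The food was cold",
    "Great location, slow service",
]


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


# Count every word across all documents in the corpus.
word_counts = Counter(word for doc in corpus for word in tokenize(doc))

unique_words = len(word_counts)                       # question 10
most_frequent = word_counts.most_common(10)           # question 11
least_frequent = word_counts.most_common()[:-11:-1]   # question 12

print("Unique words:", unique_words)
print("Most frequent:", most_frequent)
print("Least frequent:", least_frequent)
```

For question 13, a word-to-count mapping like `word_counts` is the kind of input that the `generate_from_frequencies` method of the third-party `wordcloud` package accepts.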
Sentiment Analysis
16. What are the different sentiment scores that can be calculated?
Use TextBlob and VADER.
17. What is your take based on the sentiment scores?
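A hedged sketch of the scores each library reports, assuming the third-party packages `textblob` and `vaderSentiment` are installed. The helper names `textblob_scores` and `vader_scores` and the sample sentence are illustrative only. Each helper returns `None` when its package is missing, so the snippet still runs.

```python
def textblob_scores(text):
    """Return TextBlob polarity (-1 to 1) and subjectivity (0 to 1)."""
    try:
        from textblob import TextBlob
    except ImportError:
        return None  # textblob is not installed
    sentiment = TextBlob(text).sentiment
    return {
        "polarity": sentiment.polarity,
        "subjectivity": sentiment.subjectivity,
    }


def vader_scores(text):
    """Return VADER's neg, neu, pos and compound scores for the text."""
    try:
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    except ImportError:
        return None  # vaderSentiment is not installed
    return SentimentIntensityAnalyzer().polarity_scores(text)


# Illustrative sentence, not taken from the practice dataset.
sample = "The food was great but the service was slow."
print("TextBlob:", textblob_scores(sample))
print("VADER:", vader_scores(sample))
```

TextBlob gives two numbers per text (polarity and subjectivity), while VADER gives three proportions (neg, neu, pos) plus a normalized compound score, so the two tools are not directly comparable value for value.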
Topic Model
❑ Perplexity
o Measures how well the model predicts held-out data.
o Lower perplexity scores are considered better.
❑ Coherence
o How close to human intuition the identified topics are.
o There are multiple measures of coherence, such as c_v, c_umass, c_npmi, and c_a.
o We prefer models with higher coherence scores.
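To make the two measures concrete, here is a minimal standard-library sketch. Perplexity is computed as the exponential of the negative mean log-probability per held-out token. The coherence function follows one common formulation of the UMass measure (a sum of log ratios of document co-occurrence counts); library implementations may aggregate differently, for example by averaging. The function names and the toy data are assumptions for illustration.

```python
import math


def perplexity(token_log_probs):
    """exp(-mean natural-log probability) over held-out tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def umass_coherence(top_words, documents):
    """UMass-style coherence for one topic.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over ordered word pairs, where
    D counts the documents containing the word(s). top_words should be
    ordered from most to least probable in the topic, and every word is
    assumed to occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            joint = doc_freq(top_words[i], top_words[j])
            score += math.log((joint + 1) / doc_freq(top_words[j]))
    return score


# Toy held-out data: a model that assigns probability 0.25 to every token
# has a perplexity of about 4 (as uncertain as a 4-way uniform choice).
held_out_log_probs = [math.log(0.25)] * 8
print("Perplexity:", perplexity(held_out_log_probs))

# Toy tokenized documents: "cat" and "dog" co-occur more often than
# "cat" and "fish", so the first topic scores higher.
docs = [["cat", "dog"], ["cat", "dog"], ["cat", "fish"]]
print("Coherence (cat, dog):", umass_coherence(["cat", "dog"], docs))
print("Coherence (cat, fish):", umass_coherence(["cat", "fish"], docs))
```

UMass-style scores are typically zero or negative, so "higher" here means closer to zero.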