Computational Thinking

1. Using decomposition, what are the primary sub-problems that need to be solved in
solving the overall problem?
a. Gather inputs
b. Get the number of occurrences of the keyword and its synonyms
2. Using pattern recognition, what patterns do you see in the solution, i.e., what
processes need to be repeated?
a. Gathering inputs
b. Checking the number of occurrences of the keyword in the collection of documents
c. Checking the number of occurrences of each word in the array of keyword
synonyms in the collection of documents
3. Using data abstraction and representation, how would you represent the thesaurus,
the corpus, and each of the documents in the corpus?
a. I will represent the thesaurus as a dictionary (HashMap) whose keys are
words and whose values are arrays of synonyms, which makes looking up a
keyword's synonyms faster. Only the words themselves are needed; their
meanings are not.
b. The corpus will be a collection of documents to search.
4. Using pattern recognition, what patterns do you see in the solution, i.e., what
processes need to be repeated?
a. Initialize inputs
b. Look up the synonyms of the keyword in the thesaurus
c. Store the keyword and each synonym as dictionary keys with values equal to 0
d. Scan through each word in each document in the corpus; if a word matches
either the keyword or one of its synonyms, increment the corresponding value
by 1. Repeat until every document has been scanned.
e. Return the dictionary, which contains the number of occurrences
5. Describe a problem that you may face -- either in your career or in everyday life --
that involves determining the number of occurrences of a word and its synonyms in
a corpus of documents. The problem you face may be much bigger than that and
require that calculation as only a small part of the solution, but should involve
looking through some collection of text and looking for certain words.
a. Checking whether an email is spam. First you need to train your model on
many emails, extracting the information that distinguishes spam from
legitimate mail. The number of occurrences of certain words is one metric
that helps determine whether a particular email is spam.
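
The representation in answer 3 and the steps in answer 4 can be sketched in Python as follows. This is only a minimal illustration under several assumptions that the answers do not specify: each document is a plain string, matching is case-insensitive, words are split on anything that is not a letter or apostrophe, and the function name and sample data are hypothetical.

```python
import re


def count_keyword_and_synonyms(keyword, thesaurus, corpus):
    """Count how often the keyword and each of its synonyms occur in the corpus."""
    # Step b: look up the synonyms (assumption: no synonyms if the keyword is absent).
    targets = [keyword] + thesaurus.get(keyword, [])
    # Step c: store the keyword and each synonym as keys with a count of 0.
    counts = {word.lower(): 0 for word in targets}
    # Step d: scan each word in each document and increment matching counts.
    for document in corpus:
        # Assumption: a "word" is a run of letters or apostrophes, lowercased.
        for word in re.findall(r"[a-z']+", document.lower()):
            if word in counts:
                counts[word] += 1
    # Step e: return the dictionary of occurrence counts.
    return counts


# Step a: initialize the inputs (hypothetical sample data).
thesaurus = {"happy": ["glad", "joyful"], "big": ["large", "huge"]}
corpus = [
    "I am happy and glad to be here.",
    "A joyful crowd; everyone was happy.",
    "The big dog was not glad.",
]

result = count_keyword_and_synonyms("happy", thesaurus, corpus)
print(result)  # {'happy': 2, 'glad': 2, 'joyful': 1}
```

Because the counts live in a dictionary, checking whether a scanned word is a target takes constant time on average, so the whole scan is linear in the total number of words in the corpus.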
