Course Content
Questions : 11
Time : 20m
Marks: 1.50/1.5
Term Document Matrix is the transpose of the Document Term Matrix
False
A document-term matrix or term-document matrix is a mathematical matrix that describes the frequency of terms that occur in a collection
of documents.
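A minimal sketch of the two layouts, assuming the usual convention that a document-term matrix has one row per document and one column per term, while a term-document matrix has one row per term. The toy corpus is hypothetical and is split on whitespace only.

```python
from collections import Counter

# Hypothetical toy corpus; each string is one document.
docs = ["the cat sat", "the dog sat down", "the cat ran"]

# Vocabulary: sorted unique terms across all documents.
vocab = sorted({term for doc in docs for term in doc.split()})

# Document-term layout: one row per document, one column per term.
dtm = [[Counter(doc.split())[term] for term in vocab] for doc in docs]

# Term-document layout: one row per term, one column per document.
tdm = [list(column) for column in zip(*dtm)]

print(vocab)
print(dtm)
print(tdm)
```

Each cell holds the raw count of a term in a document; other weightings (binary presence, TF-IDF) fit the same layout.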
Marks: 2/2
You have created a document-term matrix of the data, treating every tweet as one document. Which of the following is correct with regard to
the document-term matrix?
A) Removal of stop words from the data will affect the dimensionality of data
B) Normalization of words in the data will reduce the dimensionality of data
C) Converting all the words in lowercase will not affect the dimensionality of the data
This study source was downloaded by 100000834959320 from CourseHero.com on 03-12-2022 23:37:37 GMT -06:00
C
https://www.coursehero.com/file/63592660/Weekly-Quiz-1-ML-Machine-Learning-Great-Learningpdf/
A&B You Selected
Choices A and B are correct: removing stop words decreases the number of features in the matrix, and normalizing words merges redundant
features. Choice C is incorrect because converting all words to lowercase also decreases the dimensionality.
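The effect on dimensionality can be checked by counting distinct tokens, since each distinct token becomes one column of the matrix. This is a rough sketch: the tweets, the stop-word list, and the helper name `vocabulary` are all made up for illustration.

```python
# Hypothetical tweets and a tiny hand-picked stop-word list.
tweets = ["The movie was great", "the MOVIE is Great", "A great film"]
stop_words = {"the", "a", "was", "is"}

def vocabulary(docs, lowercase=False, remove_stop_words=False):
    """Return the set of distinct tokens, i.e. the matrix columns."""
    vocab = set()
    for doc in docs:
        for token in doc.split():
            if lowercase:
                token = token.lower()
            if remove_stop_words and token.lower() in stop_words:
                continue
            vocab.add(token)
    return vocab

raw = vocabulary(tweets)
lowered = vocabulary(tweets, lowercase=True)
cleaned = vocabulary(tweets, lowercase=True, remove_stop_words=True)

# Each preprocessing step shrinks the number of columns.
print(len(raw), len(lowered), len(cleaned))
```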
Marks: 1/1
it is a truth universally acknowledged that a single man in possession of a good fortune must be in want of a wife You Selected
It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.
A bag-of-words is a representation of text that describes the occurrence of words within a document. It involves two things: a vocabulary
of known words, and a measure of the presence of known words. It is called a “bag” of words because any information about the order or
structure of words in the document is discarded. The model is only concerned with whether known words occur in the document, not
where in the document.
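As an illustration, the sentence from this question can be turned into a bag of words with the standard library. The preprocessing here (lowercasing and stripping punctuation) and the helper name `bag_of_words` are assumptions of this sketch, not steps stated in the quiz.

```python
from collections import Counter
import string

def bag_of_words(text):
    """Lowercase, strip punctuation, and count word occurrences."""
    table = str.maketrans("", "", string.punctuation)
    return Counter(text.lower().translate(table).split())

original = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)
# The same words in reverse order, to show that order is discarded.
shuffled = " ".join(reversed(original.replace(",", "").replace(".", "").split()))

bag = bag_of_words(original)
print(bag["a"], bag["in"], bag["of"])
print(bag == bag_of_words(shuffled))
```

Because only counts are kept, the original and the reversed sentence produce identical bags.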
Marks: 1/1
Which of the following is not an example of a stop word?
how
really
In natural language processing, words that carry little useful information are referred to as stop words. A stop word is a commonly used
word (such as “the”, “a”, “an”, “in”) that a search engine has been programmed to ignore, both when indexing entries for searching and when
retrieving them as the result of a search query.
Marks: 1.50/1.5
The ____ format is often preferred to be used for serializing and transmitting structured data over a network connection.
XML
The JSON format is often used for serializing and transmitting structured data over a network connection. It is used primarily to transmit
data between a server and web application, serving as an alternative to XML. JSON stands for JavaScript Object Notation.
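A short sketch of the serialization round trip using Python's standard `json` module. The record and its field names are hypothetical.

```python
import json

# Hypothetical record; the field names are illustrative only.
record = {"user": "alice", "tweets": 3, "verified": False, "tags": ["nlp", "ml"]}

# Serialize to a JSON string suitable for sending over a network.
payload = json.dumps(record)

# Deserialize on the receiving side.
restored = json.loads(payload)

print(payload)
print(restored == record)
```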
Question No: 6 Correct Answer
Marks: 1.50/1.5
HTML
XML
Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as non-closed tags
(the package is named after tag soup). It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful
for web scraping.
Marks: 1.50/1.5
A) Satellite Imagery
B) Email
C) Social Media
D) Mobile Data
Structured Data
Unstructured data files often include text and multimedia content. Examples include e-mail messages, word processing documents, videos,
photos, audio files, presentations, webpages and many other kinds of business documents
Marks: 0/1
9) N-grams are defined as the combination of N keywords together. How many bi-grams can be generated from the given sentence?
It was a bright cold day in April and the clocks were striking thirteen
14 You Selected
13 Correct Option
A bigram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. A bigram is an
n-gram for n=2. The bigrams are: [“It was”, “was a”, “a bright”, “bright cold”, “cold day”, “day in”, “in April”, “April and”, “and the”,
“the clocks”, “clocks were”, “were striking”, “striking thirteen”], for a total of 13.
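The count can be reproduced by pairing each token with its successor. This sketch assumes simple whitespace tokenization; a sentence of n tokens yields n - 1 bigrams.

```python
sentence = "It was a bright cold day in April and the clocks were striking thirteen"
tokens = sentence.split()

# Pair every token with the one that follows it.
bigrams = list(zip(tokens, tokens[1:]))

print(len(tokens), len(bigrams))
for first, second in bigrams:
    print(first, second)
```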
Marks: 1/1
In Text Processing, which of the following algorithms is used to reduce word variations by converting every word in the corpus to its root
word?
Bi-gram Tokenization
Stemming is a kind of normalization for words. Normalization is a technique in which words that have the same meaning but vary in form
according to the context or sentence are converted to a common form, which reduces the number of distinct words to look up.
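To make the idea concrete, here is a deliberately naive suffix-stripping sketch. It is not the Porter algorithm or any library's stemmer; the suffix list and the helper name `naive_stem` are assumptions chosen only for illustration.

```python
# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["Walking", "walked", "walks", "walk"]
stems = {naive_stem(w) for w in words}

# All four variations collapse to a single root.
print(stems)
```

Real stemmers apply many more rules and handle cases this sketch gets wrong, such as doubled consonants.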
Marks: 1/1
Only individual words can be used as features/tokens in Document-Term Matrix
True
A token is a meaningful unit of text, such as a word, that we are interested in using for analysis, and tokenization is the process of splitting
text into tokens. This one-token-per-row structure contrasts with the ways text is often stored in current analyses, perhaps as strings or in
a document-term matrix.
Marks: 0/2
In a corpus of N documents, one document is randomly picked. The document contains a total of T terms and the term “data” appears K times.
What is the correct value for the product of TF (term frequency) and IDF (inverse-document-frequency), if the term “data” appears in
approximately one-third of the total documents?
T * log(3) / K
log(3) / KT
The formula for TF is K / T. The formula for IDF is log(total documents / number of documents containing “data”) = log(N / (N/3)) = log(3).
Hence the correct choice is K * log(3) / T.
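The same arithmetic as a small sketch. The function name `tf_idf`, the sample numbers, and the use of the natural logarithm are assumptions; the quiz does not state a log base.

```python
import math

def tf_idf(k, t, n_docs, docs_with_term):
    """TF-IDF with TF = K / T and IDF = log(N / document frequency)."""
    tf = k / t
    idf = math.log(n_docs / docs_with_term)  # natural log, by assumption
    return tf * idf

# Hypothetical numbers: K = 5, T = 100, N = 300, term present in N / 3 documents.
value = tf_idf(k=5, t=100, n_docs=300, docs_with_term=100)

print(value)
print(math.isclose(value, 5 * math.log(3) / 100))
```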
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.