
Weekly Quiz 1 (ML)

Type : Graded Quiz


Attempts : 1/1

Questions : 11
Time : 20m

Due Date : Jun 01, 11:59 PM

Your Score : 12.00/15


Attempt History

Date Attempt Marks

Jun 01, 11:06 PM 1 12

Question No: 1 Correct Answer

Marks: 1.50/1.5
Term Document Matrix is the transpose of the Document Term Matrix

True You Selected

False

A document-term matrix (or term-document matrix) is a mathematical matrix that describes the frequency of terms occurring in a collection of documents.
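A minimal sketch of the relationship, assuming scikit-learn is available (the three toy documents are invented for illustration): the term-document matrix is simply the transpose of the document-term matrix.

```python
# Build a document-term matrix and transpose it into a term-document matrix.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the dog sat", "the cat ran"]
vec = CountVectorizer()
dtm = vec.fit_transform(docs).toarray()  # rows = documents, columns = terms
tdm = dtm.T                              # rows = terms, columns = documents

print(dtm.shape)  # (3, 5): 3 documents, 5 unique terms
print(tdm.shape)  # (5, 3): same data, axes swapped
```

Swapping the axes changes nothing about the counts themselves, which is why the statement in the question is true.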

Question No: 2 Correct Answer

Marks: 2/2

You have created a document-term matrix of the data, treating every tweet as one document. Which of the following is correct with regard to the document-term matrix?

A) Removal of stop words from the data will affect the dimensionality of data
B) Normalization of words in the data will reduce the dimensionality of data
C) Converting all the words in lowercase will not affect the dimensionality of the data

C

A&B You Selected

Choices A and B are correct: stop-word removal decreases the number of features in the matrix, and normalization of words merges redundant features. Converting all words to lowercase also decreases the dimensionality, which is why choice C is incorrect.
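A quick sketch of the effect, assuming scikit-learn is available (the two example tweets are made up): lowercasing merges case variants and stop-word removal drops common words, so the cleaned vocabulary is smaller.

```python
# Compare vocabulary sizes with and without lowercasing + stop-word removal.
from sklearn.feature_extraction.text import CountVectorizer

tweets = ["The weather is Great", "the Weather was great today"]

raw = CountVectorizer(lowercase=False, stop_words=None)
n_raw = len(raw.fit(tweets).vocabulary_)          # case-sensitive, stop words kept

cleaned = CountVectorizer(lowercase=True, stop_words="english")
n_cleaned = len(cleaned.fit(tweets).vocabulary_)  # lowercased, stop words dropped

print(n_raw, n_cleaned)  # the cleaned vocabulary is strictly smaller
```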

Question No: 3 Correct Answer

Marks: 1/1

Which of the following is an example of a Bag of Words representation?

it is a truth universally acknowledged that a single man in possession of a good fortune must be in want of a wife You Selected

It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.

A bag-of-words is a representation of text that describes the occurrence of words within a document. It involves two things: a vocabulary of known words, and a measure of the presence of known words. It is called a "bag" of words because any information about the order or structure of words in the document is discarded. The model is only concerned with whether known words occur in the document, not where in the document they occur.
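The description above can be sketched with the standard library alone: tokenize the quoted sentence, then keep only occurrence counts, discarding all word order.

```python
# Minimal bag-of-words: lowercase, tokenize, count occurrences.
from collections import Counter
import re

text = ("It is a truth universally acknowledged, that a single man in "
        "possession of a good fortune, must be in want of a wife.")
tokens = re.findall(r"[a-z]+", text.lower())  # punctuation and order are dropped
bag = Counter(tokens)

print(bag["a"])   # "a" occurs 4 times
print(bag["in"])  # "in" occurs 2 times
```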

Question No: 4 Correct Answer

Marks: 1/1
Which of the following is not an example of a stop word?

how

really

webpage You Selected

In natural language processing, words that carry little useful information are referred to as stop words. A stop word is a commonly used word (such as "the", "a", "an", "in") that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query.
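As a sketch, the quiz options can be checked against one widely used English stop-word list, scikit-learn's built-in list (assumed available). Note that different libraries ship slightly different lists, so membership of a word like "really" can vary from list to list.

```python
# Check each candidate word against scikit-learn's English stop-word list.
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

for word in ["how", "really", "webpage"]:
    # "webpage" is a content word and appears in no common stop-word list.
    print(word, word in ENGLISH_STOP_WORDS)
```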

Question No: 5 Correct Answer

Marks: 1.50/1.5

Fill in the blanks:

The ____ format is often preferred to be used for serializing and transmitting structured data over a network connection.

JSON You Selected

XML

The JSON format is often used for serializing and transmitting structured data over a network connection. It is used primarily to transmit data between a server and a web application, serving as an alternative to XML. JSON stands for JavaScript Object Notation.
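A minimal standard-library sketch of the serialize-and-transmit pattern (the record contents are invented for the example): a Python object is serialized to a JSON string, which is what would travel over the wire, then parsed back.

```python
# Round-trip a structured record through JSON with the standard library.
import json

record = {"user": "alice", "tweets": 42, "verified": True}
payload = json.dumps(record)    # serialize to a JSON string for transmission
restored = json.loads(payload)  # parse it back into a Python dict

print(payload)
print(restored == record)  # True: the round trip is lossless
```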
Question No: 6 Correct Answer

Marks: 1.50/1.5

Beautiful Soup is a Python package for parsing which type(s) of document(s)?

HTML

XML

Both HTML and XML You Selected

Neither HTML nor XML

Beautiful Soup is a Python package for parsing HTML and XML documents, including those with malformed markup such as unclosed tags (it is named after "tag soup"). It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.
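A minimal sketch, assuming the beautifulsoup4 package is installed (the HTML fragment is invented): Beautiful Soup builds a parse tree from which tag contents can be extracted.

```python
# Parse a small HTML fragment and extract the paragraph texts.
from bs4 import BeautifulSoup

html = "<html><body><p>First</p><p>Second</p></body></html>"
soup = BeautifulSoup(html, "html.parser")  # other parsers (lxml, html5lib) also work

paragraphs = [p.get_text() for p in soup.find_all("p")]
print(paragraphs)  # ['First', 'Second']
```

The same API works on XML input when an XML-capable parser is installed, which is why "Both HTML and XML" is the correct option.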

Question No: 7 Correct Answer

Marks: 1.50/1.5

The following are examples of which type of data?

A) Satellite Imagery
B) Email
C) Social Media
D) Mobile Data

Unstructured Data You Selected

Structured Data

Unstructured data files often include text and multimedia content. Examples include e-mail messages, word processing documents, videos,
photos, audio files, presentations, webpages and many other kinds of business documents

Question No: 8 Incorrect Answer

Marks: 0/1

An n-gram is a contiguous sequence of N tokens taken together. How many bigrams can be generated from the given sentence?

It was a bright cold day in April and the clocks were striking thirteen

14 You Selected

13 Correct Option

A bigram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. A bigram is an n-gram for n=2. The bigrams here are: ["It was", "was a", "a bright", "bright cold", "cold day", "day in", "in April", "April and", "and the", "the clocks", "clocks were", "were striking", "striking thirteen"], giving 13 in total.
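The count follows directly from pairing each token with its successor, which the standard library can sketch in a few lines: a sentence of 14 tokens always yields 14 - 1 = 13 bigrams.

```python
# Generate bigrams by zipping the token list with itself shifted by one.
sentence = "It was a bright cold day in April and the clocks were striking thirteen"
tokens = sentence.split()
bigrams = list(zip(tokens, tokens[1:]))  # adjacent token pairs

print(len(tokens))   # 14 tokens
print(len(bigrams))  # 13 bigrams
```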

Question No: 9 Correct Answer

Marks: 1/1
In Text Processing, which of the following algorithms is used to reduce word variations by converting every word in the corpus to its root
word?

Stemming You Selected

Bi-gram Tokenization

Stemming is a kind of normalization for words: words that have the same meaning but vary in form according to context or inflection are reduced to a common root, which shortens vocabulary lookup.
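A toy sketch of the idea in pure Python, for illustration only: real stemmers such as the Porter algorithm apply far more careful, ordered rules, but the core mechanism is stripping known suffixes to reach a shared root.

```python
# Toy suffix-stripping stemmer (illustrative; NOT the Porter algorithm).
def crude_stem(word):
    # Try longer suffixes first; keep a minimum stem length of 3 characters.
    for suffix in ("ions", "ing", "ion", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

words = ["connected", "connecting", "connections", "connection"]
print([crude_stem(w) for w in words])  # all variants reduce to 'connect'
```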

Question No: 10 Correct Answer

Marks: 1/1
Only individual words can be used as features/tokens in a Document-Term Matrix

True

False You Selected

A token is a meaningful unit of text that we are interested in using for analysis, and tokenization is the process of splitting text into tokens. A token need not be a single word: n-grams and other multi-word units can also serve as features in a document-term matrix.

Question No: 11 Incorrect Answer

Marks: 0/2
In a corpus of N documents, one document is randomly picked. The document contains a total of T terms and the term “data” appears K times.
What is the correct value for the product of TF (term frequency) and IDF (inverse-document-frequency), if the term “data” appears in
approximately one-third of the total documents?

KT * log(3) You Selected

K * log(3) / T Correct Option

T * log(3) / K

log(3) / KT

The formula for TF is K/T. The formula for IDF is log(total documents / number of documents containing "data") = log(N / (N/3)) = log(3). Hence the correct choice is K * log(3) / T.
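The arithmetic above can be checked with concrete numbers (K = 6, T = 120, N = 300 are invented for the example) using only the standard library:

```python
# Compute TF * IDF for a term appearing in one-third of all documents.
import math

K, T, N = 6, 120, 300
docs_with_term = N / 3              # "data" appears in one-third of the documents

tf = K / T                          # term frequency within the picked document
idf = math.log(N / docs_with_term)  # = log(3), independent of N
tfidf = tf * idf

print(tfidf)                        # equals (K / T) * log(3)
```

Note that N cancels out of the IDF term, which is why the answer depends only on K, T, and log(3).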

This study source was downloaded by 100000834959320 from CourseHero.com on 03-12-2022 23:37:37 GMT -06:00

https://www.coursehero.com/file/63592660/Weekly-Quiz-1-ML-Machine-Learning-Great-Learningpdf/
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
