
5.

The procedure of organizing the items of a given collection into groups based on some similar
features is called __________
(1 Point)

Decision Trees

Regression

Association

Clustering
6.
The terms used in Machine Learning with Big Data are: (i) Pattern Recognition, (ii) Data
mining, (iii) Data slang, (iv) Predictive Analytics
(1 Point)

(iii) is wrong.

(ii) is correct.

Only (i) and (iv) are correct.

All are correct.


7.
Which of these is not the characteristic of Big Data?
(1 Point)

Veracity

Volume

Integrity

Variety
8.
Which of the following is false for Apache Spark?
(1 Point)

Enables powerful interactive and data analytics application across live streaming data

It is the kernel of Spark

It enables users to run SQL/HQL queries on top of Spark.

Provides an execution platform for all the Spark applications


9.
Dimensionality Reduction is a
(1 Point)

Clustering Problem

Feature Extraction Problem

Classification Problem

Regression Problem
10.
The process of constructing a mathematical model that can be used to predict one
variable by another variable
(1 Point)

Correlation

Outlier

Regression

Cluster Analysis
11.
How is KNN model used for classification?
(1 Point)

All the neighbours that are ‘K’ distance apart from the new sample point determine the label
for the new sample

The class labels of ‘K’ neighbouring samples determine the label for the new sample.

All the training samples within a circle of ‘K’ radius determine the label for the new sample.
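
For reference, a minimal sketch of nearest-neighbour classification by majority vote, using only the Python standard library. The function name `knn_predict`, the toy training data, and the choice of Euclidean distance are assumptions made for illustration; they are not part of the quiz.

```python
import math
from collections import Counter

def knn_predict(train, new_point, k=3):
    """train is a list of (features, label) pairs.
    Returns the most common label among the k nearest training samples."""
    # Rank training samples by Euclidean distance to the new point, keep the k closest.
    nearest = sorted(train, key=lambda s: math.dist(s[0], new_point))[:k]
    # Majority vote over the class labels of those k neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]
print(knn_predict(train, (1.1, 1.0), k=3))
```

Here K is a count of neighbours, not a distance or a radius.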
12.
Choose the false statement.
(1 Point)

Association analysis, define what an item set is.

One can uncover unexpected and useful relationships with association analysis

Association rules are not used to determine when items or events occur together

The goal is to come up with a set of rules to capture associations between items or events
13.
What are the two parts in data understanding phase of CRISP-DM?
(1 Point)

Data exploration and data preprocessing

Data acquisition and data exploration

Data acquisition and data preprocessing

Data preprocessing and data modelling


14.
The main steps in the k-means clustering algorithm are
(1 Point)

Calculate the centroids, then determine the appropriate stopping criterion depending on the
number of centroids.

Assign each sample to the closest centroid, then calculate the new centroid.

Calculate the distances between the cluster centroids, then find the two closest centroids.

Count the number of samples, then determine the initial centroids.
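
For reference, a small sketch of the usual k-means loop, which alternates an assignment step and a centroid-update step until the centroids stop changing. The helper names (`assign`, `update`, `kmeans`), the starting centroids, and the sample points are assumptions made for illustration only.

```python
import math

def assign(points, centroids):
    """For each point, return the index of its closest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def update(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new.append(old)  # an empty cluster keeps its previous centroid
            continue
        new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return new

def kmeans(points, centroids, max_iter=100):
    """Repeat assign/update until the centroids no longer move (max_iter >= 1)."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data and starting centroids.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, labels = kmeans(points, [(0, 0), (10, 10)])
print(centroids, labels)
```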


15.
__________ is a process of predicting numeric values and not category.
(1 Point)

Regression

Classification

Prediction

Analysis
16.
Misclassification rate is another name given for :
(1 Point)

Classification Rate

Training Rate

Error Rate

Testing Rate
17.
What is the purpose of exploring data?
(1 Point)

To digitize your data.

To gather your data into one repository

To gain a better understanding of your data.

To generate label for your data


18.
Which of these is NOT a way to handle missing values?
(1 Point)

Drop samples with missing values

Replace missing values with median value

Replace a missing value with an outlier

Replace missing values with most probable value.
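
For reference, a short sketch of two common ways of handling missing values: dropping incomplete samples and filling gaps with the median of the observed values. Representing a missing value as `None`, and the function names `drop_missing` and `impute_median`, are assumptions made for illustration.

```python
from statistics import median

def drop_missing(rows):
    """Keep only the rows (tuples) that contain no missing value (None)."""
    return [row for row in rows if None not in row]

def impute_median(values):
    """Replace each None with the median of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

print(drop_missing([(1, 2), (None, 3), (4, 5)]))
print(impute_median([1, None, 3, 100]))
```

The median is often preferred over the mean because it is less sensitive to extreme values such as the 100 in this example.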


19.
Which of the following is a measure used in decision trees when selecting the splitting
criterion that partitions the data in the best possible manner?
(1 Point)

Gini Index

Association

Probability

Regression
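
For reference, a minimal sketch of how a Gini impurity score can be computed for a node and for a candidate two-way split. The function names `gini` and `split_gini` and the toy labels are assumptions made for illustration; real decision-tree libraries differ in detail.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted average impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A perfectly mixed two-class node versus a split that separates the classes.
print(gini(["a", "a", "b", "b"]))
print(split_gini(["a", "a"], ["b", "b"]))
```

A lower weighted impurity after the split indicates a purer partition, which is why the measure can be used to compare candidate splits.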
20.
What is involved in data wrangling?
(1 Point)

Removing the outliers

Removing noise from the data

Feature selection and Feature transformation

Cleaning the data


21.
Which of the following is NOT an example of regression?
(1 Point)

Estimating the amount of rain

Predicting the price of a stock

Determining whether power usage will rise or fall

Predicting the demand for a product


22.
Which of the following is not affected by the curse of dimensionality?
(1 Point)

KNN

Correlation

Decision Tree

Naïve Bayes
23.
Choose the correct statement
(1 Point)

Bar plots never use aggregation - not sure

Bar plots are drawn for numeric variables

Histograms and bar plots are used for categorical and numeric data respectively.

Bar plots always use numeric binner


24.
A model that overfits will not _______ well to new data.
(1 Point)

Regularise

Generalize

Justify

Optimize
25.
In linear regression, the least squares method is used to
(1 Point)

Determine the regression line that best fits the samples

Determine whether the target is categorical or numerical

Determine how to partition the data into training and test sets.

Determine the distance between two pairs of samples.
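
For reference, a minimal sketch of ordinary least squares for a single feature, using the closed-form slope and intercept that minimise the sum of squared residuals. The function name `fit_line` and the sample data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimising the squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x about its mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical samples lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```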


26.
Sentiment Analysis is an example of :
(1 Point)

Regression, Classification and Clustering

Regression Only

Regression, Classification, Clustering and Reinforcement Learning

Regression, Classification and Reinforcement


27.
Which of these statements is true about samples and variables?
(1 Point)

All

A Sample is an instance or example of an entity.

A variable describes a special characteristic of an entity in your data.

A sample can have many variables to describe it.


28.
Merging duplicate records while retaining relevant data is an example that illustrates the
use of _________ knowledge to address a data quality issue.
(1 Point)

data

none of these

feature

domain
29.
What is Dimensionality reduction?
(1 Point)

Dimensionality reduction is scaling variable values to smaller range

Dimensionality reduction is analysing data in high dimensional space

Generation of synthetic data from original data

Dimensionality reduction is finding a smaller subset of features that can effectively capture
the characteristics of the input data
30.
Cluster results can be used to
(1 Point)

Segment the data into groups so that each group can be analyzed further

Create labeled samples for a classification task

All of these choices are valid uses of the resulting clusters.

Determine anomalous samples

Classify new samples


31.
Which categories of machine learning algorithms are supervised?
(1 Point)

Classification and clustering

Regression and clustering

Regression and association analysis

Classification and regression


32.
Which of the following is not a type of clustering algorithm?
(1 Point)

Centroid clustering

K-Mean clustering

Density clustering

Simple clustering
33.
Which is not a way to accomplish pre-pruning in decision trees?
(1 Point)

Stop if number of records < some threshold

Stop if improvement in impurity measure < some threshold

None

Stop when the tree is grown to its maximum size


34.
__________ regression finds a relationship between one or more features (independent
variables) and a continuous variable (dependent variable).
(1 Point)

None of These

Non-linear

Linear

Both of these - not sure

Doubts - 8, 23, 26, 32, 34 please write question and answer no..
PLEASE CONFIRM REMAINING 4

-> Choose the correct statement - Bar plots never use aggregation - i had done this since
bar is category type things and histogram is numeric type things

-> regression finds a relationship between one or more features (independent
variables) and a continuous variable (dependent variable). - Both of these

-> Which of the following is not a type of clustering algorithm? - Simple Clustering - SURE?

-> Which of the following is false for Apache Spark? - Enables powerful interactive and
data analytics application across live streaming data

-> Sentiment Analysis is an example of: - Regression, Classification and Reinforcement

submit??

Submitted - shameek ++
Thanks everyone
Submitted - Devesh
