BigData ML

5.
The procedure of organizing the items of a given collection into groups based on similar
features is called __________
(1 Point)
Decision Trees
Regression
Association
Clustering
6.
The terms used in Machine Learning with Big Data are: (i) Pattern Recognition (ii) Data
mining (iii) Data slang (iv) Predictive Analytics
(1 Point)
(iii) is wrong.
(ii) is correct.
Only (i) and (iv) are correct.
All are correct.

7.
Which of these is not a characteristic of Big Data?
(1 Point)
Veracity
Volume
Integrity
Variety
8.
Which of the following is false for Apache Spark?
(1 Point)
Enables powerful interactive and data analytics application across live streaming data
It is the kernel of spark
It enables users to run SQL/HQL queries on the top of spark.
Provides an execution platform for all the spark applications

9.
Dimensionality Reduction is a
(1 Point)
Clustering Problem
Feature Extraction Problem
Classification Problem
Regression Problem
10.
The process of constructing a mathematical model that can be used to predict one
variable from another variable is called:
(1 Point)
Correlation
Outlier
Regression
Cluster Analysis
11.
How is KNN model used for classification?
(1 Point)
All the neighbours that are ‘K’ distance apart from the new sample point determine the label
for the new sample
The class labels of ‘K’ neighbouring samples determine the label for the new sample.
All the training samples within a circle of ‘K’ radius determine the label for the new sample.
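The K-nearest-neighbours idea in question 11 can be pictured with a minimal sketch. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy training data are hypothetical and not part of the quiz.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Label `query` by majority vote among its k nearest training samples.

    `train` is a list of (point, label) pairs; points are tuples of numbers.
    """
    # Sort the training samples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Keep the labels of the k closest samples and return the most common one.
    k_labels = [label for _, label in by_distance[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Hypothetical toy data: three samples of class "A", two of class "B".
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
print(knn_predict(train, (1.1, 1.0), k=3))
```

Here K counts neighbours; it is not a distance or a radius.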
12.
Choose the false statement.
(1 Point)
Association analysis, define what an item set is.
One can uncover unexpected and useful relationships with association analysis
Association rules are not used to determine when items or events occur together
The goal is to come up with a set of rules to capture associations between items or events
13.
What are the two parts in data understanding phase of CRISP-DM?
(1 Point)
Data exploration and data preprocessing
Data acquisition and data exploration
Data acquisition and data preprocessing
Data preprocessing and data modelling

14.
The main steps in the k-means clustering algorithm are
(1 Point)
Calculate the centroids, then determine the appropriate stopping criterion depending on the
number of centroids.
Assign each sample to the closest centroid, then calculate the new centroid.
Calculate the distances between the cluster centroids, then find the two closest centroids.
Count the number of samples, then determine the initial centroids.
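For reference on question 14, here is a rough sketch of a plain k-means loop. It assumes the initial centroids are supplied by the caller and that the loop stops once the centroids stop moving; `kmeans` and the sample points are illustrative names and data only.

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Plain k-means on tuples of numbers, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iters):
        # Step 1: assign each sample to the closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 2: recompute each centroid as the mean of its assigned samples.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        # Stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of points.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
final_centroids, final_clusters = kmeans(points, [(0, 0), (10, 10)])
print(final_centroids)
```

The two repeated steps are the assignment of samples to centroids and the recalculation of the centroids.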

15.
__________ is the process of predicting numeric values rather than categories.
(1 Point)
Regression
Classification
Prediction
Analysis
16.
Misclassification rate is another name for:
(1 Point)
Classification Rate
Training Rate
Error Rate
Testing Rate
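As a small illustration for question 16, the misclassification rate can be computed as the fraction of predictions that differ from the true labels. The helper name `misclassification_rate` and the label lists below are made up for this sketch.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

# Hypothetical labels: one of the four predictions is wrong.
y_true = ["spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham"]
rate = misclassification_rate(y_true, y_pred)
print(rate)        # fraction of wrong predictions
print(1 - rate)    # the complementary accuracy
```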
17.
What is the purpose of exploring data?
(1 Point)
To digitize your data.
To gather your data into one repository
To gain a better understanding of your data.
To generate label for your data

18.
Select the one that is NOT a way to handle missing values.
(1 Point)
Drop samples with missing values
Replace missing values with median value
Replace a missing value with an outlier
Replace missing values with most probable value.
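Two of the strategies listed in question 18 can be sketched as follows, assuming missing values are represented as `None`. The helper names `drop_missing` and `impute_median` are hypothetical.

```python
from statistics import median

def drop_missing(rows):
    """Keep only the rows (tuples) that contain no missing value."""
    return [row for row in rows if None not in row]

def impute_median(values):
    """Replace each missing value with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical data with missing entries marked as None.
rows = [(1, 2), (None, 3), (4, 5)]
ages = [1, None, 3, 10]
print(drop_missing(rows))
print(impute_median(ages))
```

Dropping rows loses data, while median imputation keeps every sample and is less sensitive to extreme values than the mean.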

19.
Which of the following is a measure used in decision trees when selecting the splitting
criterion that partitions the data in the best possible manner?
(1 Point)
Gini Index
Association
Probability
Regression
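To make the impurity idea in question 19 concrete, here is a minimal sketch of the Gini index for a set of class labels and the weighted Gini of a two-way split. It uses the standard definition 1 minus the sum of squared class proportions; the function names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted Gini impurity of a split into two partitions."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical labels: a 50/50 mix is maximally impure for two classes,
# and a split that separates the classes perfectly has impurity 0.
mixed = ["yes", "yes", "no", "no"]
print(gini(mixed))
print(split_gini(["yes", "yes"], ["no", "no"]))
```

A tree-building algorithm that uses this measure prefers the candidate split with the lowest weighted impurity.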
20.
What is involved in data wrangling?
(1 Point)
Removing the outliers
Removing noise from the data
Feature selection and Feature transformation
Cleaning the data

21.
Which of the following is NOT an example of regression?
(1 Point)
Estimating the amount of rain
Predicting the price of a stock
Determining whether power usage will rise or fall
Predicting the demand for a product

22.
Which of the following is not affected by the curse of dimensionality?
(1 Point)
KNN
Correlation
Decision Tree
Naïve Bayes
23.
Choose the correct statement
(1 Point)
Bar plots never use aggregation - not sure
Bar plots are drawn for numeric variables
Histograms and bar plots are used for categorical and numeric data respectively.
Bar plots always use numeric binner

24.
A model that overfits will not _______ well to new data.
(1 Point)
Regularise
Generalize
Justify
Optimize
25.
In linear regression, the least squares method is used to
(1 Point)
Determine the regression line that best fits the samples
Determine whether the target is categorical or numerical
Determine how to partition the data into training and test sets.
Determine the distance between two pairs of samples.
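For question 25, the least squares method for a single feature can be sketched with the closed-form slope and intercept that minimise the sum of squared residuals. `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical samples lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```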

26.
Sentiment Analysis is an example of:
(1 Point)
Regression, Classification and Clustering
Regression Only
Regression, Classification, Clustering and Reinforcement Learning
Regression, Classification and Reinforcement

27.
Which of these statements is true about samples and variables?
(1 Point)
All
A Sample is an instance or example of an entity.
A variable describes a special characteristic of an entity in your data.
A sample can have many variables to describe it.

28.
Merging duplicate records while retaining relevant data is an example that illustrates the
use of _________ knowledge to address a data quality issue.
(1 Point)
data
none of these
feature
domain
29.
What is Dimensionality reduction?
(1 Point)
Dimensionality reduction is scaling variable values to smaller range
Dimensionality reduction is analysing data in high dimensional space
Generation of synthetic data from original data
Dimensionality reduction is finding a smaller subset of features that can effectively capture
the characteristics of the input data
30.
Cluster results can be used to
(1 Point)
Segment the data into groups so that each group can be analyzed further
Create labeled samples for a classification task
All of these choices are valid uses of the resulting clusters.
Determine anomalous samples
Classify new samples

31.
Which categories of machine learning algorithms are supervised?
(1 Point)
Classification and clustering
Regression and clustering
Regression and association analysis
Classification and regression

32.
Which of the following is not a type of clustering algorithm?
(1 Point)
Centroid clustering
K-Means clustering
Density clustering
Simple clustering
33.
Which is not a way to accomplish pre-pruning in decision trees?
(1 Point)
Stop if number of records < some threshold
Stop if improvement in impurity measure < some threshold
None
Stop when the tree is grown to its maximum size

34.
__________ regression finds a relationship between one or more features (independent
variables) and a continuous variable (dependent variable).
(1 Point)
None of These
Non-linear
Linear
Both of these - not sure
Doubts - 8, 23, 26, 32, 34 please write question and answer no..
PLEASE CONFIRM REMAINING 4
-> Choose the correct statement - Bar plots never use aggregation - I chose this since
bar plots are for categorical data and histograms are for numeric data
-> ________ regression finds a relationship between one or more features (independent
variables) and a continuous variable (dependent variable). - Both of these
-> Which of the following is not a type of clustering algorithm? - Simple Clustering - SURE?
-> Which of the following is false for Apache Spark? - Enables powerful interactive and
data analytics application across live streaming data
-> Sentiment Analysis is an example of: - Regression, Classification and Reinforcement
submit??
Submitted - shameek ++
Thanks everyone
Submitted - Devesh
