All Merged IDS Quiz - Merged

18/12/2022, 23:58 Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)
Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40 Available Dec 18 at 19:00 - Dec 19 at 23:59
Time Limit 60 Minutes
Instructions
The Quiz I can be attempted only once. You will not be provided with a make-up Quiz, if you miss this Quiz.
The quiz is timed for one hour and has to be completed once you start. You will not be allowed to go back to the previous question if you skip a
question.
“Choose the most appropriate answer to each question.”
The answers will be visible only after three days once the quiz has ended.
All the best.
Attempt History
Attempt Time Score
LATEST Attempt 1 48 minutes 9.38 out of 10
 Correct answers will be available on Dec 22 at 0:00.
Score for this quiz: 9.38 out of 10

Submitted Dec 18 at 23:57
This attempt took 48 minutes.
https://bits-pilani.instructure.com/courses/1704/quizzes/3453
Question 1 0.25 / 0.25 pts
Which of the following is not a tool used by a Data Scientist?
Springboot
Python
R
SAS
A city conducted a new bi-annual census of its residents. Which of the following most strongly suggests a
cognitive bias in their collected dataset?
The census data includes new data on how many cars are owned by the residents.
Some rows have null values for last names
Some rows have date of birth in DD-MM-YY and some in DD-MM-YYYY formats

The average income in the dataset is 40% higher than the average income of the city’s population from the
previous census 2 years ago.
Statement 1: Role of a Business analyst usually requires expertise on building ML models.

Statement 2: Role of a Data Scientist usually requires expertise on performing descriptive analysis.
Which of the following is right?
Statement 1 is correct and Statement 2 is wrong
Both statements are correct
Both statements are wrong
Statement 1 is wrong and Statement 2 is correct
Hadoop
MongoDB
Amazon S3
Flask
Tableau is not likely to be used by
Business Analyst
Visualization Engineer
Data Architect
Data Scientist
Which of the following best describes the difference between the data analyst and data scientist?
Data analysts and data scientists play the same role in the project
Data analysts are more proficient in R whereas data scientists are more proficient in Python
A data analyst does estimation whereas a data scientist predicts and explains it as well
Data analysts just deal with numbers whereas data scientists deal with algorithms
Compute the depth of each bin for the data given below, if the number of bins is 5.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36, 43, 46]
2.02
2.17
2.2
2.3
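The two binning quantities that this question can refer to can be checked with a minimal Python sketch. It assumes "depth" may mean either the equal-width bin width, (max - min) / k, or the equal-frequency bin depth, n / k; the variable names are illustrative, and the sketch does not claim which reading the quiz intended.

```python
# Data and bin count taken from the question above.
values = [47, 38, 40, 42, 47, 42, 41, 39, 42, 45,
          36, 37, 36, 40, 43, 40, 45, 36, 43, 46]
num_bins = 5

# Equal-width binning: every bin spans the same value range.
bin_width = (max(values) - min(values)) / num_bins

# Equal-frequency (equal-depth) binning: every bin holds the same count.
bin_depth = len(values) / num_bins

# Equal-depth partition of the sorted values (n is divisible by k here).
ordered = sorted(values)
size = len(ordered) // num_bins
equal_depth_bins = [ordered[i:i + size] for i in range(0, len(ordered), size)]

print(f"width = {bin_width:.2f}, depth = {bin_depth:.0f}")
print(equal_depth_bins)
```

The width uses only the extremes of the data, while the depth uses only the number of values, which is why the two readings give different numbers for the same list.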
Calculate cosine similarity between two documents represented by vectors
x= (0,1,1,1,2,3,0,0,0,2,1) and
y= (2,1,1,2,1,2,1,1,0,0,0)
0.034
50.2
0.64
0.99
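The cosine similarity asked for above is the dot product of the two vectors divided by the product of their Euclidean norms. A minimal sketch in plain Python follows; the function name cosine_similarity is an illustrative choice, not something defined by the quiz.

```python
import math

def cosine_similarity(x, y):
    """Return dot(x, y) / (||x|| * ||y||) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Document vectors from the question above.
x = (0, 1, 1, 1, 2, 3, 0, 0, 0, 2, 1)
y = (2, 1, 1, 2, 1, 2, 1, 1, 0, 0, 0)

similarity = cosine_similarity(x, y)
print(round(similarity, 2))
```

Here the dot product is 12 and the squared norms are 21 and 17, so the value is 12 / sqrt(357).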
Suppose that the minimum and maximum values for an attribute are 4.3 and 7.6, respectively. Compute the
scaled value of 5.4 if min-max normalization is applied to scale [0.0,1.0]. (Answer should have a precision of
X.XX)
0.36
0.33
0.43
0.45
Suppose that the minimum and maximum values for an attribute are 4.3 and 7.6, respectively. Compute the
scaled value of 5.4 if min-max normalization is applied to scale [-1.0,+1.0]. (Answer should have a precision
of X.XX)
0.33
0.76
0.66
-0.33
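Both min-max questions above apply the same formula: new_min + (v - min) / (max - min) * (new_max - new_min). The sketch below is a minimal illustration; min_max_scale and its parameter names are assumptions made for this example.

```python
def min_max_scale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)

# Attribute range 4.3 to 7.6 and value 5.4, as in the questions above.
unit_scaled = min_max_scale(5.4, 4.3, 7.6)                   # target [0.0, 1.0]
symmetric_scaled = min_max_scale(5.4, 4.3, 7.6, -1.0, 1.0)   # target [-1.0, +1.0]

print(f"{unit_scaled:.2f} {symmetric_scaled:.2f}")
```

Changing the target range only stretches and shifts the same fraction (1.1 / 3.3), so the second result is 2 * fraction - 1.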
Identify the data analytics task for the following scenario. A car service showroom manager wants to analyze
his marketing and sales data to understand the reason for drop in sales.
Predictive Analytics
Descriptive Analytics
Diagnostic Analytics
Prescriptive Analytics
The branch of statistics that deals with the development of statistical methods is classified as ___.
Industry statistics
None of the mentioned above
Economic statistics
Applied statistics
Identify the data analytics task for the following scenario. A physics teacher is analyzing the answer scripts of
the students to identify the areas that he/she should concentrate on so that the students understand the
concepts better.
Analyzing the data to determine why some phenomena related to learning happened is a type of
Descriptive
Prescriptive
Predictive
Diagnostic
What is the name of the Google-developed programming framework that enables the creation of applications
for processing big data sets in a distributed computing environment?
Google Cloud Dataproc
ZooKeeper
Hive
MapReduce
Match the following data analytics to their description:
Descriptive What happened?
Diagnostic Why did this happen?
Predictive What might happen in the future?
Prescriptive What should we do next?
Data Science uses _________ to make decision and prediction.
prescriptive analytics
All of the options
Predictive analytics
machine learning
The process through which businesses analyse customer data or other types of information in order to find
patterns and links between various data items is known as:
Data mining
Data digging
Consumer engagement
Customer data management
Identify the data analytics task for the following scenario. A project manager is analysing past projects to
identify the software, hardware and human resources that will be required to complete a new project
successfully on time.
Fortis-Apollo hospital is planning to design a model which maps patients to the best possible treatments
based on the diagnosis. Identify the data analytics task for this scenario
Cognitive analytics
We are predicting the weather condition as foggy, warm, cloudy, and misty at Bangalore using the data
collected in the last one month. This task is an example of
Regression
Association Rule
Classification
Clustering
Which of the following statements is false with respect to a data set?
Raw data should be processed only one time.
Merging concerns combining datasets on the same observations to produce a result
All of the listed options
Sub setting can be used to select and exclude variables and observations
Is it possible to convert a Nominal scale to an Ordinal Scale during data analysis?

True
False
Which of the following is not true for SEMMA model?
SEMMA is focused on the model development aspects of data mining.

It places less emphasis on the initial planning phases covered in CRISP-DM (Business Understanding and Data
Understanding phases) and omits entirely the Deployment phase.
The SEMMA model also emphasizes data mining as a non-linear, adaptive process.

SEMMA is a logical organisation of the functional tool set of SAS Enterprise Miner for carrying out the core tasks of
data mining.
In a FashionStore Data set the feature ShirtSize { S,M,L,XL,XXL} is an example of
Ordinal attribute
Numeric attribute
Continuous attribute
Nominal attribute
"Order Fulfilment Date" should come after "Order Creation Date". This is an example of which data quality
aspect:
Timeliness
Consistency
Integrity
Conformity
Missing data in Pandas is represented by
NULL
Null
Empty
NaN
A box plot is the visual representation of the following statistical summary
Minimum, Average, Maximum
Min, Median, Mode
Minimum, First Quartile, Third Quartile, Second Quartile, Mean
Minimum, First Quartile, Median, Third Quartile, Maximum
Partial
Match the following techniques with the definitions
Binarization maps a continuous attribut
Binning Divide the range of a contin
Concept Hierarchy Smooth out the effect of no
Functional Transformation Transform attribute values x
The scatterplot implies that
The features are independent
The features are positively correlated.
The features are negatively correlated.
None of the given options
Incorrect
Question 31 0 / 0.25 pts
In Python, the function used to find whether there are missing values is:
dropna()
isna()
imputena()
fillna()
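The pandas calls named in these options can be compared on a toy frame. This is a minimal sketch, assuming pandas and NumPy are installed; the column names and values are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names and values are illustrative only.
frame = pd.DataFrame({
    "price": [5118.0, np.nan, 45400.0],
    "doors": ["four", None, "two"],
})

missing_mask = frame.isna()                # True wherever a value is missing
missing_per_column = missing_mask.sum()    # count of missing values per column

# Fill the numeric gap with the column mean; the original frame is unchanged.
filled = frame.fillna({"price": frame["price"].mean()})

# Keep only rows with no missing values at all.
complete_rows = frame.dropna()

print(missing_per_column)
```

isna() only reports where values are missing, whereas fillna() and dropna() return modified copies.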
Choose the correct empirical relation
mean−mode ≈ 3×(mean−median).
mean−median ≈ 3×(mean−mode)
mean−mode ≈ 3×(median-mean).
median-mean ≈ 3×(mean−mode)
In One-Hot Encoding the advantages are:
Expands the feature space.
A and B
Suitable for linear models.
Keeps all the information of the categorical variable.
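One-hot encoding replaces a categorical value with one indicator column per category, which is why it expands the feature space. A minimal plain-Python sketch follows; one_hot_encode and the sample sizes are hypothetical names and data chosen for this illustration.

```python
def one_hot_encode(values):
    """Return the sorted categories and one 0/1 indicator row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

sizes = ["S", "M", "L", "M"]  # hypothetical sample of a categorical feature
categories, encoded = one_hot_encode(sizes)

for value, row in zip(sizes, encoded):
    print(value, row)
```

Each row contains exactly one 1, so no information about the original category is lost, at the cost of one new column per distinct category.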
Which of the statements is TRUE?
Outliers should be addressed only in the training dataset
Outliers should always be addressed in the dataset
Treatment of outliers depends on the problem statement
Outliers should be addressed in the test dataset
The statistical description (x1 + x2 + ... + xN)/N, for the data values x1, x2, ..., xN, is called their
________________
mean
IQR
mode
median
In which phase, the duplicates (of the data) are removed? Choose the best possible answer.
Data Preparation
Data Exploration
Data Collection
Data Understanding
In a box and whisker plot of data, point out the FALSE statement, about Outliers
Outliers are beyond 1.5 times the Inter Quartile Range (IQR) from the lower quartile
Outliers are beyond 1.5 times the Inter Quartile Range (IQR) from the median
Outliers are beyond the lowest or the highest value in the dataset
Outliers are beyond 1.5 times the Inter Quartile Range (IQR) from the upper quartile
Consider the data set below. How will you (most appropriately) handle the missing values for records 1, 4, 6
respectively?
Data set description.
price: continuous from 5118 to 45400.(class label)
normalized-losses: continuous from 65 to 256.
num-of-doors: four, two.
replace by mode, replace by median, replace by mean.
ignore all 3 records.
Ignore the tuple, replace by mean, replace by mode.
replace by mode, ignore the tuple, replace by mean.
Incorrect
Which of the following is a bottom-up approach for discretization?
Equal-frequency binning
Histogram analysis
Entropy based discretization
Correlation analysis
For exploring continuous data using descriptive statistics which of the following method is used
Range
Frequency
Percentage
Histogram
Quiz Score: 9.38 out of 10
12/18/22, 11:21 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)
Data scientist is not responsible for
building continuous data stream
data mining
data manipulation
data analytics
Which of the following is correct skills for a Data Scientist?
Data Wrangling
Probability & Statistics
All of the options
Machine Learning / Deep Learning
Which of the following is not an application of data science?
Image & Speech Recognition
Recommendation Systems
Privacy Checker
Online Price Comparison
Which of the following organizational structures for Data Science teams
indicates a significant investment within the company towards being data-driven?
Centralized
Consulting
Decentralized
Federated
Identify the data mining tasks from the below list.
i. Dividing the customers of a company according to their gender

ii. Predicting the future stock price of a company using historical records
iii. Monitoring the heart rate of a patient for abnormalities
iv. Extracting the frequencies of a sound wave
v. Sorting a student database based on student identification numbers
i, ii, iii
iv, v
i, iv, v
ii, iii
Hadoop
Flask
Amazon S3
MongoDB
Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 3.2. [Answer should have a precision of X.XXXX]
0.8679
0.8769
-0.8679
-0.8769
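Z-score normalization subtracts the mean and divides by the standard deviation. The following minimal sketch applies it to the numbers in the question above; the function name z_score is an illustrative assumption.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations value lies from the mean."""
    return (value - mean) / std_dev

# Mean 8.9, standard deviation 6.5, value 3.2, as in the question above.
z = z_score(3.2, 8.9, 6.5)
print(f"{z:.4f}")
```

The sign is negative because the value lies below the mean.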
Compute the depth of each bin for the data given below, if the number of
bins is 5.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36, 43,
46]
2.3
2.17
2.02
2.2
Compute the depth of each bin for data given below, if the number of bins
is 5.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36, 43,
46]
3
6
5
4
20/5
Suppose that the minimum and maximum values for the attribute income
are $12,000 and $98,000, respectively. The new range is [0.0,1.0]. Apply
min-max normalization to a value of $73,600.
0.716
0.561
0.856
0.758
Analyzing the data to determine why some phenomena related to
learning happened is a type of
Diagnostic
Prescriptive
Descriptive
Predictive
Incorrect
A scenario where you feel unwell, go to a doctor and explain your

symptoms to the doctor. This can be considered analogous to which stage
of data analytics?
Diagnostic
Descriptive
Prescriptive
Predictive
Predictive What might happen in t
Prescriptive What should we do nex
Incorrect Question 14 0 / 0.25 pts
A scenario where you visited a doctor for your fever, and the doctor
asked you questions about your condition and tried to understand how
it might have happened. Now the doctor comes to a conclusion based
on your responses as normal fever. But to be sure about it and ensure
there are no underlying health problems or other risk factors that make
it more likely that you won't recover as quickly, the doctor performs
some tests and asks to see previous medical reports. Now, based on
this doctor is sure that it is a regular fever, and he’s sure it will go away
in 5 days. The last part of the narration, which is derived from the
above-mentioned details, can be considered analogous to which stage of
data analytics?
Prescriptive
Predictive
Diagnostic
Descriptive
Assess Situation - "Cost and Benefits" - falls into which phase of
CRISP-DM?
Data modeling
Evaluation
Business Understanding
Data Preparation
Which of the following are classification problems?

Calculating a room's temperature (in Celsius) based on other
environmental factors (such as atmospheric pressure, humidity etc).

Predicting if a cricket player is a batsman or bowler, given his playing
record.
filtering spam from mails

Finding the shorter path between two already-existing routes between two
locations.

Predict traffic congestion along a specific route between two locations
using vehicle journey times.
Incorrect
In an exam, marks less than 40 (out of 100) is considered fail. The task of
finding the names of the failed students with a mark range between 10
and 25 is
Failure Analytics
Not part of analytics
Regression is
(1) Prediction of a value of a given continuous valued variable based on
the values of other variables.
(2) Regression works with a linear or nonlinear model of dependency
among the variables.
Which of the above is true?
Option 2
Option 1
None
Both 1& 2
Match with the most appropriate answer related to analytics.
Descriptive Analytics what happened
Diagnostic Analytics why it happened
Predictive Analytics what will happen
Prescriptive Analytics what can make it happen
Identify the data analytics task for the following scenario. A project
manager is analysing past projects to identify the software, hardware and
human resources that will be required to complete a new project
The best method for predicting the number of deaths due to covid is
Regression
Correlation
Classification
Clustering
Incorrect
We are predicting the weather condition as foggy, warm, cloudy, and misty
at Bangalore using the data collected in the last one month. This task is
an example of
Classification
Clustering
Regression
Association Rule
Incorrect
Temperature in kelvin is of__________________ attribute type.
Nominal attribute
Ordinal attribute
Interval attribute
Ratio attribute
Which of the following is not an example of ordinal attributes?
Exam Grades
Academic ranks
Zip codes
Military ranks
Data lake mainly stores data in
Structured format
none
both
Raw format
In a FashionStore Data set the feature ShirtSize { S,M,L,XL,XXL} is an

example of
Numeric attribute
Nominal attribute
Ordinal attribute
Swiggy wants customers to provide their satisfaction feedback in a scale

of 1-5 where
1- Very Unsatisfied; 2- Somewhat Unsatisfied; 3- Neutral; 4- Somewhat

Satisfied; 5- Very Satisfied
What type of attribute is satisfaction?
Interval attribute
Nominal attribute
Ratio attribute
Ordinal attribute
Missing data in Pandas is represented by
Null
Empty
NaN
NULL
In a boxplot, where Q1, Q2 and Q3 are the first, second and third quartiles
respectively, the interquartile range IQR is calculated as:
IQR = Q3 – Q1
None of the above
IQR = Q2-Q1
IQR = Q3-Q2
Histogram analysis algorithms can be applied recursively to generate a

multilevel concept hierarchy.
True
False
In which phase, the duplicates (of the data) are removed? Choose the
best possible answer.
Data Exploration
Data Preparation
Data Collection
Data Understanding
attribute are 6500 and 2000, respectively. Apply z-score normalization to
value of 8000.
.5
.75
.7
.6
For a data analytics task to analyse feedback on her subject for a class of
60 students, a school teacher decided to use the survey submitted by the
ten students who come for tuitions for that subject, at her home. Identify
the type of sampling she is doing.
Stratified sampling
Non Probabilistic sampling
Systematic Sampling
Sampling without replacement
“Similarity” means:
A listing of the similar features of a collection of objects.

The number of tuples in a database whose attributes have similar values.
A collection of similar objects.
Numerical measure of how alike two data objects are.
Histogram analysis algorithms can be based on either equal width or

equal frequency.
True
False
A sample is ------------------ if it has approximately the same property of

interest
Systematic
Probabilistic
Representative
Qualitative
What is the median of this data 2,5,1,6,7?
5
5.5
6
2
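The median is the middle value of the sorted data, or the mean of the two middle values when the count is even. A minimal sketch for the list in the question above follows; the helper name median_of is an assumption made for this example.

```python
def median_of(data):
    """Return the middle value of the sorted data (mean of two middles if even)."""
    ordered = sorted(data)
    middle = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

data = [2, 5, 1, 6, 7]
median_value = median_of(data)
print(sorted(data), median_value)
```

Sorting first matters: the middle element of the unsorted list is 1, not the median.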
Consider the following Python code.

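import sklearn.preprocessing
input = ['Havells', 'Philips', 'Syska', 'Eveready', 'Lloyd']
le = sklearn.preprocessing.LabelEncoder()
le.fit(input)
# transform() expects a list-like of labels, so the single label is wrapped in a list
print(le.transform(['Lloyd'])[0])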
The last line of the code snippet will print
2
3
1
4
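The label-encoding idea behind this question can be reproduced without scikit-learn. The sketch below assumes that integer codes are assigned to the unique labels in sorted order, which is how scikit-learn's LabelEncoder orders its classes_; the names brands and label_of are illustrative.

```python
# Brand names from the question above.
brands = ['Havells', 'Philips', 'Syska', 'Eveready', 'Lloyd']

# Assumption: codes follow the sorted order of the unique labels.
classes = sorted(set(brands))
label_of = {name: index for index, name in enumerate(classes)}

print(classes)
print(label_of['Lloyd'])
```

The code depends on alphabetical position, not on the order in which the labels were passed in.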
There are two sets X={10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,
-42, -43, -44}. What is TRUE about the standard deviations of X and Y i.e.
σX and σY respectively?
σY will be smaller than σX.
Will be the same.
Magnitude will be the same but the sign will be different
σX will be smaller than σY.
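Standard deviation measures spread around the mean, so it is unaffected by shifting or negating the values. A minimal check with Python's statistics module on the two sets from the question above:

```python
import statistics

# X = {10, ..., 24} and Y = {-30, ..., -44} from the question above.
x = list(range(10, 25))
y = [-v for v in range(30, 45)]

sigma_x = statistics.pstdev(x)  # population standard deviation
sigma_y = statistics.pstdev(y)

print(round(sigma_x, 2), round(sigma_y, 2))
```

Both sets are 15 consecutive integers, so their deviations from the mean are identical, and a standard deviation is never negative.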
Match the following function usage in Python used in data cleaning.
dropna() return Index without NA
fillna() to fill NA/NaN values us
interpolate() to fill NA values in the d
notnull() to find missing values

Submitted Dec 18 at 8:45pm
Data science is the process of diverse set of data through

____________
processing data
analysing data
All of the options
organizing data
Statement 1: Role of a Business analyst usually requires expertise on

building ML models.
Statement 2: Role of a Data Scientist usually requires expertise on
performing descriptive analysis.
Incorrect

ii. Predicting the future stock price of a company using historical
records
i, ii, iii
iv, v
ii, iii
i, iv, v
Data Science project steps are highly linear.
True
False
The tasks of a data scientist include
Collecting Raw Data
communicate the results to the stakeholders
All of the Above
Identifying relevant features
data mining
data analytics
data manipulation
Compute the Manhattan distance between A(2,3) and B(5,7).
5
8
6
7
0.8769
-0.8679
0.8679
-0.8769
Compute the width of each bin for data given below, if the number of
bins is 4.
[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46,
48, 36]
4
3
5
(49-30)/4
6
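For equal-width binning, the bin width is (max - min) / k, matching the "(49-30)/4" form shown among the options. The sketch below also assigns each value to a bin; the edge handling (the maximum goes into the last bin) is an assumption of this example rather than something the quiz specifies.

```python
# Data and bin count from the question above.
values = [39, 45, 49, 45, 31, 37, 38, 41, 37, 41,
          39, 34, 35, 30, 47, 43, 44, 46, 48, 36]
num_bins = 4

low, high = min(values), max(values)
width = (high - low) / num_bins
edges = [low + i * width for i in range(num_bins + 1)]

# Count values per bin; the maximum value is clamped into the last bin.
counts = [0] * num_bins
for value in values:
    index = min(int((value - low) / width), num_bins - 1)
    counts[index] += 1

print(width, edges, counts)
```

Equal-width bins generally hold unequal numbers of values, which is the main contrast with equal-frequency binning.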
Compute the Euclidean distance between A(2,3) and B(5,7).
5
4
7
6
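The Manhattan and Euclidean distances between A(2,3) and B(5,7) differ only in how the coordinate differences are combined. A minimal sketch follows; the function names are illustrative.

```python
import math

def manhattan(p, q):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences (L2 distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

a, b = (2, 3), (5, 7)
manhattan_ab = manhattan(a, b)
euclidean_ab = euclidean(a, b)
print(manhattan_ab, euclidean_ab)
```

The coordinate differences are 3 and 4, giving 3 + 4 for Manhattan and sqrt(9 + 16) for Euclidean.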
A police team wants to predict the crime rate in a locality based on

certain attributes. Which modelling technique would be appropriate
Classification
Clustering
Regression
Optimization
A scenario where you feel unwell and go to a doctor. The doctor asks
you questions like were you exposed to rain or cold climate or did you
have contact with a sick person or did you have food from outside etc.
Based on your answers, the doctor came to a conclusion. This can be
considered analogous to which stage of data analytics?
Descriptive
Predictive
Prescriptive
Diagnostic
We are predicting the weather condition as foggy, warm, cloudy, and

misty at Bangalore using the data collected in the last one month. This
task is an example of
Classification
Association Rule
Clustering
Regression
A company wants to find the target segment of people for one of its
products. Which modelling technique would be generally appropriate
Clustering
Classification
Ranking
Regression
In "Business Understanding" phase of the data science process, the

goals are identified and the objectives defined.
True
False
Data pre-processing improves the data quality and makes data mining
algorithms efficient and effective. What are the data pre-processing
tasks along with data cleaning?
Data Transformation
Data Integration
All of the above.
Data Reduction
Assess Situation - "Cost and Benefits" - falls into which phase of
CRISP-DM?
Data Preparation
Data modeling
Evaluation
Incorrect
Identify the data analytics task for the following scenario. A software
programmer develops a tool to convert programs written in Verilog to
Python.
Incorrect
Identify the data analytics task for the following scenario. A student is
analyzing various blogs and vlogs to find out the skill set that has to be
acquired to become a data scientist in the future.
In 2001, Big Data created the three Vs: volume, velocity, and variety.
The V's have grown to encompass veracity and value in the years
since. Big data is sometimes subjected to a fifth V, which is:
Volatile
Vector
Variability
Vulnerability
The branch of statistics that deals with the development of
statistical methods is classified as ___.
Applied statistics
Industry statistics
Economic statistics
What are the best practices for implementing big data analytics
programmes?
Determining business direction based on data analysis
Letting go entirely of 'old ideas' related to data management

Focusing on business goals and how to use big data analytics
technologies to meet them

Adopting data analysis tools based on a laundry list of their capabilities
Zip codes
Exam Grades
Military ranks
Academic ranks
Which of the following Python library is required for web scraping?
BeautifulSoup
WebCrawler
Scraper
WebSpider
What is the difference between interval/ratio and ordinal variables?

The distance between categories is equal across the range of
interval/ratio data
Ordinal data can be rank ordered, but interval/ratio data cannot
Interval/ratio variables contain only two categories

Ordinal variables have a fixed zero point, whereas interval/ratio
variables do not
Attributes cannot be called as:
Variables
Features
Dimensions
Data point
In a dataset, we are tracking whether a customer is purchasing a

product or not. This is an example of
Symmetric attribute
Discrete attribute
Asymmetric attribute
Dealing with missing values during data preparation is what kind of an

operation
Data retrieval
Data cleansing
Data transformation
Data combining
The____ and standard deviation are strongly affected by outliers
MEDIAN
MEAN
MODE
RANGE
Data Preparation
Data Collection
Data Understanding
Data Exploration
Which of the following is NOT a visualization technique?
Matrix
TreeMap
Lexeme
ConeTree
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,
-42, -43, -44}. What is TRUE about the standard deviations of X and Y
i.e. σX and σY respectively?
Will be the same.
The features are independent.
Consider the data set below. How will you (most appropriately) handle
the missing values for records 1, 4, 6 respectively?
For a data analytics task to analyse feedback on her subject for a

class of 60 students, a school teacher decided to use the survey
submitted by the ten students who come for tuitions for that subject, at
her home. Identify the type of sampling she is doing.
Systematic Sampling
Stratified sampling
Which among the following are valid methods of handling missing data
P. Eliminating Data Objects
Q. Estimating Missing Values
R. Ignoring the Missing Values during Analysis
S. Replacing with all possible values
Q and R
All the options are correct
R only
P, Q and R
Which is the major task of Data Integration

For the same real world entity, resolving attribute values from different
sources
None of the above
Identification of missing rows for identified key values
Clustering of similar data from different sources
Which of the following statements are true?
I. The smaller data sets resulting from data reduction require less
memory and processing time.
II. Aggregation provides a high‐level view of the data instead of a low‐

level view
III. High-quality data that is aggregated can lead to a high chance of

identifying false positives and negatives
IV. An advantage of aggregation is the potential loss of interesting

details
Statement I and II
Statement II and III
Only Statement III
Statement I and IV
The missing value for categorical attribute is substituted with
mean of a value
least frequent attribute value
none of the above
most frequent attribute value
Tableau software is most likely to be used during
presentation of idea to stakeholders
Feature selection
Deployment
Data warehousing
Pattern Recognition is a sub-field of Data Science.
True
False
Match the following set of roles to their job description
Data Analyst data collection and in
Data Scientist solves problems usin
Data Architect warehousing the data
Data Engineer implements, test and
Incorrect
"Take last weeks data and predict the sales for next six months within
next few days with 2 people team" is good example of which of the
following Data Science challenge?
Insufficient timing for project completion
Lack of professional having sound knowledge of Data science skills
Unrealistic expectations
Data sizing
There are following key roles in data science project
Data Scientist, Architect, SME, Sponsor, Programmer
Data Scientist, Analyst, SME, Programmer
Data Scientist, Analyst
Data Scientist, Analyst, Sponsor
Building a continuous data stream
Data Analytics
Data manipulation
Data mining
Suppose that the minimum and maximum values for an attribute are 4.3
and 7.6, respectively. Compute the scaled value of 5.4 if min-max
normalization is applied to scale [0.0,1.0]. (Answer should have a
precision of X.XX)
0.45
0.36
0.43
0.33
Compute the Cosine similarity between A(3,6) and B(4,8).
1.1
1.2
1.0
1.3
Compute the depth of each bin for data given below, if the number of
bins is 5.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36, 43,
46]
4
20/5
5
6
3
Find the interquartile range (IQR) of the following inputs:

4,4,10,11,15,7,14,12,6
11
8
10
7
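The IQR is Q3 - Q1. Quartile conventions differ between textbooks and libraries, so the minimal sketch below assumes the "median of halves" method (the overall median is excluded from both halves when the count is odd); the function name quartiles is illustrative. It also derives the 1.5 x IQR fences commonly used to flag box-plot outliers.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    # Skip the middle element when the count is odd.
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

data = [4, 4, 10, 11, 15, 7, 14, 12, 6]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

# Values outside these fences are usually drawn as outliers in a box plot.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(q1, q2, q3, iqr, lower_fence, upper_fence)
```

With other quartile conventions (for example, linear interpolation on the full data) Q1 and Q3 can shift slightly, so the convention should be stated alongside the result.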
Identify the data analytics task for the following scenario. A physics
teacher is analyzing the answer scripts of the students to identify the
areas that he/she should concentrate on so that the students
understand the concepts better.
The process through which businesses analyse customer data or other

types of information in order to find patterns and links between various
data items is known as:
Customer data management
Data digging
Data mining
Consumer engagement

symptoms to the doctor. This can be considered analogous to which
stage of data analytics?
Prescriptive
Diagnostic
Predictive
Descriptive
Variability
Volatile
Vulnerability
Vector
BITS investigating to determine the cause for decreased admissions for

CSI PG program is an example of
Descriptive analysis
Diagnostic analysis
Predictive analysis
Prescriptive analysis

Sub setting can be used to select and exclude variables and
observations

Merging concerns combining datasets on the same observations to
produce a result
We are predicting the humidity at Bangalore using the data collected in

the last one month. This task is an example of
Regression
Clustering
Association Rule
Classification

Applied statistics
Industry statistics
Economic statistics
Incorrect
Analytics 3.0 consists of
Descriptive analytics
Predictive analytics
None of the above
Prescriptive analytics
CRISP-DM methodology is specifically built for IT(Information

Technology) projects.
True
False
Regression
Ranking
Classification
Clustering
Which of the following is not an issue with CRISP-DM model?

Various modeling techniques are selected and applied, and their
parameters are calibrated to optimal values

Thorough evaluation indeed is needed, yet the CRISP-DM methodology
does not prescribe how to do this.

It very much underestimates the amount of real experimentation that is
needed to get viable results

The end-users of the analytical model are required to post-rationalize
the model, which leads to a lot of dissatisfaction
Incorrect
Is it possible to convert a Nominal scale to an Ordinal Scale during data

analysis?
True
False
E-mail is an example of ____________________ data
Structured
Semi-structured
Quasi-structured
Unstructured
Incorrect

Symmetric attribute
Discrete attribute
Incorrect

It places less emphasis on the initial planning phases covered in
CRISP-DM (Business Understanding and Data Understanding phases)
and omits entirely the Deployment phase.


SEMMA is a logical organisation of the functional tool set of SAS
Enterprise Miner for carrying out the core tasks of data mining.

The SEMMA model also emphasizes data mining as a non-linear,
adaptive process.
"Identify false statement "
Profession can be considered nominal
Subject grade can be considered ordinal

Response to 'Do you own a car?' can be considered symmetric binary

Response to 'Do you have a rare disease?' can be considered
symmetric binary.
A box plot is the visual representation of the following statistical

summary
Min, Median, Mode

operation
Data combining
Data cleansing
Data transformation
Data retrieval
Incorrect
Identify the sampling technique used in the following use case. For ML
classification task, the algorithm requires that the test set has equal
examples from the three categories.
Simple Random
Cluster Sampling
Stratified Random
Systematic Sampling
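Drawing the same number of examples from each category, as this scenario describes, can be sketched as follows. This is a minimal illustration, not a reference implementation: stratified_sample, its parameters, and the toy dataset are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(items, key, per_group, seed=0):
    """Draw per_group random items from each group defined by key(item)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    sample = []
    for label in sorted(groups):
        sample.extend(rng.sample(groups[label], per_group))
    return sample

# Hypothetical dataset: 10 examples for each of three categories.
dataset = [(i, label) for label in ("a", "b", "c") for i in range(10)]
test_set = stratified_sample(dataset, key=lambda row: row[1], per_group=3)

label_counts = Counter(label for _, label in test_set)
print(label_counts)
```

Sampling within each group separately is what guarantees equal representation; a simple random sample of the whole dataset would only achieve it by chance.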
Incorrect
Partial
Which of the following is/are true about data aggregation?
It preserves all details even after aggregation at all times
It can work with both quantitative and qualitative attributes
It provides a high-level view of the data
It does not complement well with statistical analysis
A table contains the salary details of professionals in different fields,
categorized by field. The table has 100,000 rows. Around 10% of
the rows do not have salary data. The missing salary data must be
filled in as part of data pre-processing. Choose the best method
for this from the options below:

Find the mean salary of the available 90% of the data and use that to fill
in all the missing data.

Find the field wise mean salary for the available data and fill in the
missing salary data with the applicable mean salary.
Delete the rows where salary data is missing.
Guess the missing data manually.
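The field-wise option can be illustrated with a minimal sketch. The six rows, the field names, and the salary figures below are made up for the example; only the idea of filling each gap with the mean of its own field comes from the question.

```python
from collections import defaultdict

# Hypothetical rows: (field, salary); None marks a missing salary.
rows = [
    ("medicine", 100.0), ("medicine", 120.0), ("medicine", None),
    ("teaching", 40.0), ("teaching", 60.0), ("teaching", None),
]

# Accumulate per-field totals over the rows that do have a salary.
totals = defaultdict(float)
counts = defaultdict(int)
for field, salary in rows:
    if salary is not None:
        totals[field] += salary
        counts[field] += 1
field_mean = {field: totals[field] / counts[field] for field in totals}

# Fill each gap with the mean of its own field, not the overall mean.
imputed = [(f, s if s is not None else field_mean[f]) for f, s in rows]
print(imputed)
```

With these numbers the overall mean would be 80.0 for every gap, whereas the field-wise means are 110.0 and 50.0, which stay closer to what each group actually earns.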
Data Integration is a
Generalization technique
None of the answers
Pre-processing technique
Data Normalization Technique
Incorrect

The number of tuples in a database whose attributes have similar
values.
Incorrect
5
5.5
6
2
Unanswered Question 39 0 / 0.25 pts
In which phase, the duplicates of the data are removed? Choose the
Data Collection
Data Requirements
Data Understanding
Data Preparation
Unanswered
Missing values should always be imputed before training the model.
None of the answers
Mostly True
True
False
Quiz 1

Due to market expectations, businesses are having difficulty retaining
highly trained data scientists and engineers.
False
No answer text provided.
True
True
False
Data Wrangling
All of the options
Partial Question 4 0.08 / 0.25 pts
Which of the following are the reasons for the sudden growth of
analytics?

Large number of user friendly analytics tools available for data
processing
Data is growing at 40% compound annual rate
Cost of storage has hugely dropped
Large number of analysts available in the market
Building a continuous data stream
Data manipulation
Data mining
Data Analytics
Tableau software is most likely to be used during
presentation of idea to stakeholders
Feature selection
Data warehousing
Deployment
bins is 5.
[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46,
48, 36]
5
4
(49−30)÷5
6
3
The value 56739, when scaled using Decimal Normalization, will be
_________.
56.739
5673.9
5.6739
567.39
Suppose the Lab administrator measured the power consumption of
an entire network operations centre (NOC) and the set of consumption
details is 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110
W, 98 W, 210 W and 115 W. What is the mode power consumption?
150W
100W
90W
98W
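As a quick check of the mode question above, the readings can be passed to Python's standard statistics module. The variable name readings is chosen for this sketch.

```python
from statistics import mode

# Power readings in watts, copied from the question.
readings = [90, 104, 98, 98, 105, 92, 102, 100, 110, 98, 210, 115]

# The mode is the most frequent value; 98 appears three times.
most_common = mode(readings)
print(most_common)  # 98
```

The single 210 W reading has no effect on the mode, since the mode depends only on how often each value occurs.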
-0.8679
-0.8769
0.8679
0.8769
In the _________ phase, the final report/technical document of the process is
prepared.
None
Operationalize
Model building
Model planning
Vector
Variability
Vulnerability
Volatile
Prescriptive Analytics what can make it happ

Predictive analysis
Identify the data analytics task for the following scenario. A
pharmaceutical organization is developing a new drug or vaccine to
combat Covid-19 using machine learning techniques, where the data comes
from the existing drugs and the diseases they can fight or cure.
Identify the data analytics task for the following scenario. The sales of
various products of your organization per month per geographical area
is reported using an interactive visual tool.

Finding the shorter path between two already-existing routes between
two locations.



record.
Diagnostic
Descriptive
Predictive
Prescriptive
Incorrect
The 5 stages of the KDD process are

1. Selection
2. Preprocessing
3. Transformation
4. Data Mining
5. Interpretation/Evaluation

Identify the CRISP-DM phases that correspond to Stages 3 and 4 of
KDD.
Data Preparation,Modeling
Data Understanding,Evaluation
Modeling,Data Preparation
Evaluation,Business understanding
Match with the most appropriate answer, related to the tools

available to a Data Scientist.
Cassandra Big-data
Tableau Visualization
SAS Statistics
Weka Machine Learning
Incorrect
Which one of the following statement(s) is correct (Choose the most
appropriate answer)?
All the statements
None of the statements

Data analytics is the pursuit of extracting meaning from raw data using
specialized computer systems.

Data Analytics refers to the techniques used to analyze data to enhance
productivity and business gain.

Analytics is a process in which a computer examines information using
mathematical methods to find useful patterns.
Incorrect
Which of the following statements are true about data cleaning?
It focuses on removing inaccurate data from your data set
All of the given options

It focuses on transforming the data’s format by converting raw data into
another format
It enhances the data’s accuracy and integrity
WebSpider
BeautifulSoup
Scraper
WebCrawler
Which of the following is an example of raw data?
original swath files generated from a sonar system
all of the mentioned
a real-time GPS-encoded navigation file
initial time-series file of temperature values
Is it possible to convert an Interval variable to an Ordinal variable?
True
False
In a FashionStore dataset, the feature Jacket_Shade {Grey, Brown,
Black, Indigo, Beige, Khaki} is an example of
Ordinal attribute
Numeric attribute
Nominal attribute
As part of a survey in a large organization, one of the features that you
capture is designation. This type of data has the characteristic
Nominal, Quantitative, Discrete
Discrete, Quantitative, Ordinal
None of the given answers
Discrete, Qualitative, Ordinal
True
False
Data Requirements
Data Preparation
Data Understanding
Data Collection
fillna() to fill NA/NaN values u
interpolate() to find missing values
notnull() to fill NA values in the

method for this:


A data object is described by a categorical attribute having four
categories. If, for data analysis purposes, we want to transform the
attribute into numerical values, then
D. it can be done by encoding using 3 or 4 binary variables
A. it can’t be done
C. it can be done by encoding using only 4 binary variables
B. it can be done by encoding using only 3 binary variables
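The "3 or 4 binary variables" option can be seen in a small sketch that encodes one four-category attribute both ways. The category names and the helper names one_hot and dummy are assumptions made for the illustration.

```python
# Hypothetical attribute with four categories.
categories = ["red", "green", "blue", "black"]


def one_hot(value):
    # Four binary variables, exactly one of which is 1.
    return [int(value == c) for c in categories]


def dummy(value):
    # Three binary variables; the first category is the all-zeros baseline.
    return [int(value == c) for c in categories[1:]]


for c in categories:
    print(c, one_hot(c), dummy(c))
```

Both encodings give every category a distinct code. The three-variable form drops one column because it is fully determined by the other three.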
One Hot Encoding scales well as the number of class labels increases
Most of the time
None of the given statements
The statement is false
Some of the time

operation
Data cleansing
Data retrieval
Data combining
Data transformation
Q and R
P, Q and R
R only
Stratified Random
Systematic Sampling
Simple Random
Cluster Sampling
None of the answers
Which of the following methods are considered to be the best practice
for data cleaning?
cleansing large dataset without segmentation
Sorting data by attributes
By breaking large dataset into small data

interest
Probabilistic
Systematic
Qualitative
Representative
Exploratory data analysis does not help in
Finding out the data type of a variable
In univariate and bivariate variable analysis
Finding statistical estimates of a variable
Derivation of new attributes
Quiz 1

In which of the following are analysts allocated to units throughout the
organization, with their activities coordinated by a central entity?
Consulting
Coordinational
Functional
Center of Excellence
Statement 1: Business Intelligence involves analyzing past data and
reporting on it.
Statement 2: Descriptive analysis involves analyzing past data and
reporting on it.
All of the options
Data Wrangling
Hadoop
MongoDB
Flask
Amazon S3

True
False
Privacy Checker
Compute the depth of each bin for the data given below, if the number
of bins is 5.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36,
43, 46]
2.2
2.3
2.17
2.02
The value 56739, when scaled using Decimal Normalization, will be
_________.
56.739
567.39
5673.9
5.6739
value of 10.7.
0.2679
-0.2679
0.2769
-0.2769
Suppose that the minimum and maximum values for an attribute are
4.3 and 7.6, respectively. Compute the scaled value of 5.4 if min-max
precision of X.XX]
0.38
0.30
0.33
0.29
Match with the most appropriate answer, related to the tools available
to a Data Science life cycle.
SMAM 8 stages
SEMMA 5 stages
Big data life cycle 12 phases
CRISP-DM 6 stages



All the statements

Consider a scenario where you feel unwell and go to a doctor. After a detailed
diagnosis, the doctor concludes that it is a regular fever and sends you
home with medicines. He also instructs you to rest, drink plenty
of fluids, and return to the hospital if the fever doesn't subside in a
week. This can be considered analogous to which stage of data
analytics?
Prescriptive
Predictive
Descriptive
Diagnostic
Which of the sentences below best describes predictive analytics?

a gateway that offers access to a variety of vital information from many
different sources on one screen

methods for predicting future behavior using statistical analysis and
data mining, particularly to maximize the strategic value of corporate
intelligence

a method that uses feature analysis to predict people in photos and
tags them to other photos on its own

software applications, also known as "bots," that are dispatched to carry
out a mission and gather data from web pages on behalf of a user

symptoms to the doctor. This can be considered analogous to which
stage of data analytics?
Prescriptive
Predictive
Descriptive
Diagnostic
Find the odd term.
Data Wrangling
Artificial Intelligence
Deep Learning
Machine Learning
Incorrect

another format
In the _________ phase, the final report/technical document of the process is
prepared.
Model building
None
Model planning
Operationalize
Incorrect
manager is analysing past projects to identify the software, hardware
and human resources that will be required to complete a new project
Identify the data analytics task for the following scenario. A program
manager wants to analyze user clickstream data from the mobile app to
understand how many users were using a particular feature that was
rolled out.
Big Data cannot be stored in the Storage Area Network (SAN)?
True
False
Identify the data analytics task for the following scenario. A mother
analysed the answer scripts of her daughter and found that they should
concentrate on reading comprehension to score better in the upcoming
exams.
In a dataset, it is observed that dates are written as 11/19/2010 and
19th November, 2012. This is an example of
Duplicate Data
Inconsistent data
Noisy data
Incomplete data
Swiggy wants customers to provide their satisfaction feedback on a
scale of 1-5, where
1 - Very Unsatisfied; 2 - Somewhat Unsatisfied; 3 - Neutral; 4 - Somewhat
Satisfied; 5 - Very Satisfied
What type of attribute is satisfaction?
Nominal attribute
Ratio attribute
Ordinal attribute
Interval attribute
Dress color is of __________________ attribute type.
Nominal attribute
Ordinal attribute
Interval attribute
Ratio attribute
Regression techniques can be used for
Either Identifying Missing Values or predicting continuous
Identifying missing values
None of the answers
Predicting continuous output
Variables
Data point
Features
Dimensions
Which of the following does not represent a bin obtained by applying
equi-width binning on the data [24, 0, 6, 60, 63, 30, 87, 90, 87]?
[0, 24, 30]
[30, 60, 63]
[87, 87, 90]
[0, 6, 24]
Scaling a variable is not an essential criterion when the ML pipeline uses
algorithms based on gradient descent optimization.
True
False

method for this:


Which of the following does not represent a bin created by applying

[0,6,24]
[30,60,63]
[0,24,30]
[87,87,90]
Cluster Sampling
Simple Random
Stratified Random
Systematic Sampling

import sklearn.preprocessing

lb = sklearn.preprocessing.LabelBinarizer()
print(lb.fit_transform(['yes', 'no', 'no', 'yes']))
[true, false, false, true]
[True, False, False, True]
[0,1,1,0]
[1,0,0,1]
For exploring continuous data using descriptive statistics, which of the
following methods is used?
Histogram
Range
Percentage
Frequency
“Proximity” in Data Science terms means:
A measure of the physical distance between two objects.

The area surrounding an object where the object is able to exert its
influence.
The extent of similarity or dissimilarity between two objects.
None of the above.
In a box and whisker plot of data, point out the FALSE statement, about
Outliers

Outliers are beyond 1.5 times the Inter Quartile Range (IQR) from the
upper quartile

lower quartile

median
Outliers are beyond the lowest or the highest value in the dataset
Find the Jaccard coefficient for the 2 data objects with the below
feature vectors.
0
NONE
1
.7


for data cleaning?
Quiz 1

SAS
Python
R
Springboot
Data Analyst data collection and inte
Data Scientist solves problems using
Data Engineer implements, test and m
Which of the following is performed by Data Scientist?
Create reproducible code
Challenge results
Define the question
All of the mentioned
Which one of the following statements is not true?
Data Science helps to translate data into a story.
A part of data science is the machine learning algorithms.
Data Science helps study Big Data.
Data Science does not find patterns in the data.
Who among the following is responsible for presenting the idea to
stakeholders and representing the data team to those unfamiliar with
statistics?
Data Engineer
Data Architect
Data Visualization Engineer
Data Journalist
Flask
Hadoop
MongoDB
Amazon S3
-0.8679
0.8769
0.8679
-0.8769
Suppose the administrator measured the power consumption of an
entire network operations centre (NOC) and the consumption details
are: 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110 W, 98
W, 200 W and 115 W. What is the range of power consumption?
100W
98W
150W
110W
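The range asked for above is the maximum reading minus the minimum reading. A minimal check, with variable names chosen for this sketch:

```python
# Power readings in watts, copied from the question.
readings = [90, 104, 98, 98, 105, 92, 102, 100, 110, 98, 200, 115]

value_range = max(readings) - min(readings)
print(value_range)  # 200 - 90 = 110

# Dropping the single 200 W reading shows how sensitive the range is
# to one extreme value.
trimmed = [r for r in readings if r != 200]
trimmed_range = max(trimmed) - min(trimmed)
print(trimmed_range)  # 115 - 90 = 25
```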
precision of X.XX]
0.29
0.33
0.38
0.30
Suppose that the minimum and maximum values for the attribute
income are $12,000 and $98,000, respectively. The new range is
[0.0,1.0]. Apply min-max normalization to a value of $73,600.
0.758
0.716
0.561
0.856
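The min-max calculation in the income question can be reproduced with a short function. The function name min_max and its parameter names are assumptions for this sketch; the formula is the usual (v - min) / (max - min), rescaled to the new range.

```python
def min_max(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Rescale value from [old_min, old_max] to [new_min, new_max]."""
    scaled = (value - old_min) / (old_max - old_min)
    return scaled * (new_max - new_min) + new_min


# (73600 - 12000) / (98000 - 12000) = 61600 / 86000
income_scaled = round(min_max(73_600, 12_000, 98_000), 3)
print(income_scaled)  # 0.716
```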
Big data analytics does not benefit a company:
Better understand customers
Refine marketing and advertising
Increase costs due to additional analytics investment
Increase shareholder dividends
Google tries to differentiate emails as spam and non-spam. This is an
example of
Association Rule
Clustering
Classification
Regression

two locations.



record.
Suppose a web user visits Flipkart during big billion-day sales.
Predicting whether he/she makes a purchase of a smartphone is a
___________________ task.
Reinforcement
Association
Classification
Regression

True
False
What is the name of the Google-developed programming framework
that enables the creation of applications for processing big data sets in
a distributed computing environment?
ZooKeeper
MapReduce
Hive

Predictive analysis
Incorrect
Identify the data analytics task for the following scenario. An
e-commerce platform is recommending products to its customers to
improve the shopping experience.

Predictive
Diagnostic
Descriptive
Prescriptive
Incorrect
A retail store realizes its sales were lower than expected in the last
quarter. Data scientist helps in identifying whether the sales were
affected uniformly across all segments or restricted to one segment?
What kind of analytics is this?
Predictive
Prescriptive
Descriptive
Diagnostic




All the statements
Incorrect
A course instructor has data about students' attendance in her course in
the past semester. What kind of analytics is she performing when she
creates a line graph based on this data?
Prescriptive
Predictive
Descriptive
Diagnostic
Features
Dimensions
Data point
Variables
Incorrect
None of the answers
In a dataset, CarColor is one of the attributes and it can take the
following values: {Red, Green, Yellow, Black}. What type of attribute is
CarColor?
Interval attribute
Ordinal attribute
Nominal attribute
Ratio attribute
True
False


symmetric binary.
Temperature is an example of a ____________________ attribute
Continuous
Normal
Asymmetric
Discrete
From mathematical models for epidemics one observes that the initial
phase is in the exponential growth. This can be verified by plotting the
number of infections on log-scale. This is a use case for which
technique.
Sampling
Discretization
Aggregation
Transformation
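The log-scale check described in the epidemic question can be sketched as follows. The case counts are hypothetical (a series that doubles every day); the point is only that exponential growth becomes a straight line after a log transformation, so consecutive differences of the logged values are constant.

```python
import math

# Hypothetical case counts that double every day.
cases = [10 * 2 ** day for day in range(6)]

# Transform to log scale and look at day-to-day differences.
log_cases = [math.log(c) for c in cases]
diffs = [b - a for a, b in zip(log_cases, log_cases[1:])]

# Every difference is (approximately) log(2), i.e. a constant slope.
print(diffs)
```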
Lexeme
TreeMap
Matrix
ConeTree
attribute are 6500 and 2000, respectively. Apply z-score normalization
to value of 8000.
.7
.75
.6
.5
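The start of the z-score question above is cut off, so the sketch below assumes that 6500 is the attribute's mean and 2000 its standard deviation. Under that assumption the calculation is (value - mean) / standard deviation.

```python
def z_score(value, mean, std_dev):
    """Standardize a value: distance from the mean in standard deviations."""
    return (value - mean) / std_dev


# Assumption: mean = 6500, standard deviation = 2000 (question stem is truncated).
z = z_score(8000, mean=6500, std_dev=2000)
print(z)  # 0.75
```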

True
False
In the given table below, there is a requirement to get the name,
gender, and marks of the top-scoring students only. Which of the
following functionalities of data wrangling is used?
Replace
Data exploration
Filter
Reshape

for data cleaning?
RANGE
MODE
MEDIAN
MEAN

[0,24,30]
[87,87,90]
[30,60,63]
[0,6,24]

[1,0,0,1]
[0,1,1,0]
If you come across a value for AGE as 102
Change it to Mean Value
Do Nothing
Understand the Business Problem
Change to Mode Value
Histogram analysis
Quiz 1


____________
processing data
analysing data
All of the options
organizing data

building ML models.
Incorrect

records
i, ii, iii
iv, v
ii, iii
i, iv, v
True
False
Collecting Raw Data
All of the Above
data mining
data analytics
data manipulation
5
8
6
7
0.8769
-0.8679
0.8679
-0.8769
bins is 4.
[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46,
48, 36]
4
3
5
(49-30)/4
6
5
4
7
6

Classification
Clustering
Regression
Optimization
Descriptive
Predictive
Prescriptive
Diagnostic

Classification
Association Rule
Clustering
Regression
Clustering
Classification
Ranking
Regression

True
False
Data Transformation
Data Integration
All of the above.
Data Reduction
- DM
Data Preparation
Data modeling
Evaluation
Incorrect
Python.
Incorrect
Volatile
Vector
Variability
Vulnerability

Applied statistics
Industry statistics
Economic statistics
programmes?


Zip codes
Exam Grades
Military ranks
Academic ranks
BeautifulSoup
WebCrawler
Scraper
WebSpider

interval/ratio data

variables do not
Variables
Features
Dimensions
Data point

Symmetric attribute
Discrete attribute

operation
Data retrieval
Data cleansing
Data transformation
Data combining
MEDIAN
MEAN
MODE
RANGE
Data Preparation
Data Collection
Data Understanding
Data Exploration
Matrix
TreeMap
Lexeme
ConeTree
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,
Will be the same.


Systematic Sampling
Stratified sampling
Q and R
R only
P, Q and R

sources
None of the above

level view


details
Statement I and II
Only Statement III
Statement I and IV
mean of a value
none of the above
Quiz 1
Attempt History
Attempt Time Score
LATEST Attempt 1 43 minutes 9 out of 10
Score for this quiz: 9 out of 10


____________
analysing data
processing data
organizing data
All of the options
Data Architect
Data Scientist
Business Analyst
Python
Springboot
SAS
R
Which of the following best describes the difference between the data
analyst and data scientist?

Data analyst does estimation whereas data scientist predicts & explains
it as well

Data analyst are more proficient in R whereas Data scientists are more
proficient in Python

Data analyst just deal with numbers whereas data scientists deals with
algorithms
True
False
If a person comes from a software development background, which of
the following data science project roles will best suit them?
Storyteller
Machine Learning Engineer
Data Analyst
Big Data Engineer

are: 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110 W, 98
150W
98W
100W
110W
Find the interquartile range (IQR) of the following inputs:
4, 4, 10, 11, 15, 7, 14, 12, 6
10
11
7
8
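A short check of the IQR question, assuming the common textbook convention in which the median is excluded and Q1 and Q3 are the medians of the lower and upper halves. Python's statistics.quantiles with method="exclusive" matches that convention for this data; other quartile conventions can give a different value.

```python
from statistics import quantiles

data = [4, 4, 10, 11, 15, 7, 14, 12, 6]

# quantiles() sorts the data internally; n=4 returns the three quartile cut points.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 5.0 10.0 13.0 8.0
```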
Calculate the cosine similarity between two documents represented by the
vectors
x = (0,1,1,1,2,3,0,0,0,2,1) and
y = (2,1,1,2,1,2,1,1,0,0,0)
0.034
50.2
0.64
0.99
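The cosine similarity of the two document vectors above is the dot product divided by the product of the vector lengths. A minimal sketch, with the function name cosine_similarity chosen for this example:

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


x = (0, 1, 1, 1, 2, 3, 0, 0, 0, 2, 1)
y = (2, 1, 1, 2, 1, 2, 1, 1, 0, 0, 0)

# dot = 12, |x|^2 = 21, |y|^2 = 17, so the result is 12 / sqrt(357).
similarity = cosine_similarity(x, y)
print(round(similarity, 2))  # 0.64
```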
0.716
0.758
0.856
0.561

Clustering
Regression
Association Rule
Classification
exams.

Predictive
Descriptive
Prescriptive
Diagnostic
Identify the data analytics task for the following scenario. A car service
showroom manager wants to analyze his marketing and sales data to
understand the reason for drop in sales.
For a given data set, the following data preprocessing techniques are used
to improve the quality of data:
Data cleaning, Data integration, Data reduction and Data
transformation.
Which of the following statements is TRUE?
These techniques are mutually exclusive
The techniques are not mutually exclusive
All of the given options are true
All the techniques may not work together
Match with the most appropriate answer, related to the tools available
to a Data Science life cycle.
SMAM 8 stages
SEMMA 5 stages
Big data life cycle 12 phases
CRISP-DM 6 stages
Predictive
Descriptive
Prescriptive
Diagnostic
The answer to the following question can be obtained by which type of
analytics?
"What's the best that can happen?"

Association Rule
Clustering
Regression
Classification
Identify the data analytics task for the following scenario. Google is
using tools to suggest texts or phrases while composing emails.
Incorrect
The most time-consuming phase in a data science process is
Data collection
Deployment
Data preparation
Data Modelling
All of the above.
Data Transformation
Data Reduction
Data Integration
Incorrect
Street numbers are __________________ type of attributes.
Nominal attribute
Ordinal attribute
Ratio attribute
Interval attribute
Temperature in Kelvin is of __________________ attribute type.
Nominal attribute
Ordinal attribute
Ratio attribute
Interval attribute

Discrete attribute
Symmetric attribute
In a FashionStore Data set the feature Apparel_Price showing the cost

of the apparel is an example of
Ordinal attribute
Numeric attribute
Nominal attribute

summary
Min, Median, Mode


Which of the following is/are true about data aggregation?
It can work with both quantitative and qualitative attributes
It provides a high-level view of the data
It does not complement well with statistical analysis
It preserves all details even after aggregation at all times
For exploring continuous data using descriptive statistics, which of the
following methods is used?
Percentage
Histogram
Range
Frequency

values.
Identify the false statement.
It may not be a good idea to drop a data field with missing values.

As a data scientist, even if you know that certain outliers are valid data,
you might still omit them from model construction.

As a data scientist, while cleaning the data, you are not concerned with
why the data values are missing. You just fix them.


le.fit(input)
print(le.transform(['Syska']))
The last line of the code snippet will return
2
4
1
3
The following representation technique shows the maximum, minimum,
median, and other characterizing measures at the same time
Boxplot
Tabulation
Histogram
Pareto diagram
Data Understanding
Data Collection
Data Preparation
Data Exploration
Choose the possible combinations for drawing a scatter plot for given
data.
Age and Test1 Score
Weight and Test 1 Score
Age and weight
All the options
What does discretization do?
Both of the above.
None of the above.
Reduce overall data size.
Convert a continuous attribute into a discrete attribute.
Which statement best compares histogram and 5-number summary
Histogram is always more informative on data distribution
5-number summary can be used for non-numeric data
5-number summary is robust w.r.t. noise and outliers
Histogram can be very informative with finer ranges
Quiz Score: 9 out of 10
Quiz 1


Storyteller
Big Data Engineer
Data Analyst
Privacy Checker
Which one of the following is not a necessary characteristic of a Data
Scientist?
Communicative
Punctual
Creative
Technical
Which of the following are correct skills for a Data Scientist?
probability and statistics
all of the options
machine learning/deep learning
data wrangling

Functional
Coordinational
Consulting

are: 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110 W, 98
98W
100W
150W
110W
0.8679
0.8769
-0.8769
-0.8679
value of 10.7.
-0.2679
0.2679
0.2769
-0.2769
normalization is applied to scale [-1.0,+1.0]. (Answer should have a
precision of X.XX)
0.66
0.76
0.33
-0.33
Which of the following methodologies focuses the most on model
deployment and embedding in operational systems?
SMAM
CRISP-DM
SEMMA
All options are correct
Find the odd term.
Data Wrangling
Deep Learning
Machine Learning
Which data analytic approach can identify the probabilities of an action?
Prescriptive
Predictive
Diagnostic
Descriptive

Predictive
Prescriptive
Diagnostic
Descriptive

To generate genuine business value, simply gathering and keeping data
isn't enough. Technologies for big data analytics are required to:
Integrate data from internal and external sources
Formulate eye-catching charts and graphs
Determine business goals and objectives
Extract valuable insights from the data

Regression
Classification
Association Rule
Clustering

Classification
Association Rule
Regression
Clustering
Training and testing datasets are developed during _________ phase.
None
Model planning
Operationalize
Model building
Incorrect
Fortis-Apollo hospital is planning to design a model which maps
patients to the best possible treatments based on the diagnosis. Identify
the data analytics task for this scenario.
Cognitive analytics
Which of the following artifacts is not considered in Descriptive
analytics?
Alerts
Adhoc reports
Predictive model
Standard report
Variables
Data point
Features
Dimensions

Nominal attribute
Numeric attribute
Ordinal attribute

interval/ratio data

variables do not

summary
Min, Median, Mode
Incorrect
Clustering techniques can be used in
Unsupervised Learning
None of the answers
Feature Selection
Either Unsupervised Learning or Feature Selection
Quasi-structured
Semi-structured
Unstructured
Structured
Partial
Binarization maps a continuous attr
Binning Divide the range of a co
Concept Hierarchy Smooth out the effect o
Functional Transformation Transform attribute valu
Incorrect

le.fit(input)
1
4
3
2
Incorrect
The dissimilarity between two data objects is
Lower when objects are not alike
None of the above
Lower when objects are more alike
Higher when objects are more alike
Systematic Sampling
Stratified Random
Cluster Sampling
Simple Random
Mostly True
False
None of the answers
True
Creating dummy variables during data preparation is what kind of
operation?
Data transformation
Data retrieval
Data cleansing
Data combining

values.
Data transformation is done to improve _____________ in an algorithm
Noise
Inconsistencies
Accuracy & Efficiency
Integration
Redundancy

In a boxplot, where Q1, Q2 and Q3 are the first, second and third
quartiles respectively, the interquartile range IQR is calculated as:
None of the above
IQR = Q3 – Q1
IQR = Q3-Q2
IQR = Q2-Q1

le.fit(input)
2
3
4
1
technique.
Transformation
Aggregation
Discretization
Sampling
Quiz 1
Attempt History
Attempt Time Score
LATEST Attempt 1 60 minutes 8.63 out of 10
Correct answers will be available on Dec 22 at 0:00.
Score for this quiz: 8.63 out of 10
Question 1 0.25 / 0.25 pts

Privacy Checker



Question 2 0.25 / 0.25 pts

data wrangling


all of the options

Question 3 0.25 / 0.25 pts
R and Python is not a preferred skill for

Data Architect

Data Analyst

Data Journalist

Data Scientist
Question 4 0.25 / 0.25 pts


Business Analyst

Data Scientist

Data Architect
Question 5 0.25 / 0.25 pts
Which of the following best describes the difference between the data
analyst and data scientist?

Data analyst does estimation whereas data scientist predicts & explains
it as well


Data analyst just deal with numbers whereas data scientists deals with
algorithms

Data analyst are more proficient in R whereas Data scientists are more
proficient in Python
Question 6 0.25
/ 0.25 pts
"Take last weeks data and predict the sales for next six months within
next few days with 2 people team" is a good example of which of the
following data science challenges?

Lack of professional having sound knowledge of Data science skills

Unrealistic expectations

Data sizing

Insufficient timing for project completion
Question 7 0.25
/ 0.25 pts
Compute the Jaccard coefficient for x = (1,0,0,0,1,1,1) and y =
(0,1,1,0,0,1,0)

0.166

0.455

0.765

0.234
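The Jaccard coefficient for binary vectors counts the positions where both vectors are 1 and divides by the positions where at least one is 1 (0/0 matches are ignored). A minimal sketch follows; the function name `jaccard_coefficient` is an illustrative choice.

```python
def jaccard_coefficient(x, y):
    """Jaccard coefficient for two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    both_one = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either_one = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Convention assumed here: two all-zero vectors give 0.0.
    return both_one / either_one if either_one else 0.0


x = (1, 0, 0, 0, 1, 1, 1)
y = (0, 1, 1, 0, 0, 1, 0)
j = jaccard_coefficient(x, y)
print(round(j, 3))  # one shared 1 out of six positions holding a 1
```

The exact value is 1/6; truncated to three decimals it is 0.166, rounded it is 0.167.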
Question 8 0.25
/ 0.25 pts
Suppose the Lab administrator measured the power consumption of an
entire network operations centre (NOC) and the set of consumption
details is 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110
W, 98 W, 210 W and 115 W. What is the mode power consumption?

90W

98W

100W

150W
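The mode is the most frequent value. A short check with the standard `statistics` module, using the readings from the question (the variable names are illustrative):

```python
import statistics

# Power readings in watts, as listed in the question.
watts = [90, 104, 98, 98, 105, 92, 102, 100, 110, 98, 210, 115]

mode_w = statistics.mode(watts)      # most frequent reading
median_w = statistics.median(watts)  # middle of the sorted readings
mean_w = statistics.mean(watts)      # pulled upward by the 210 W outlier

print(mode_w, median_w, round(mean_w, 2))
```

98 W occurs three times, more often than any other reading, and the single 210 W outlier shifts the mean but not the mode.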
Question 9 0.25
/ 0.25 pts

-0.8679

0.8769

0.8679

-0.8769
Incorrect
Question 10 0
/ 0.25 pts
Compute the Cosine similarity between A(3,6) and B(4,8).

1.0

1.3

1.1

1.2
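Cosine similarity is the dot product of the two vectors divided by the product of their lengths, so it can never exceed 1. A minimal sketch for the two points in the question; the function name is illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


A = (3, 6)
B = (4, 8)
sim = cosine_similarity(A, B)
print(round(sim, 4))
```

B is A scaled by 4/3, so the vectors point in the same direction and the similarity is 1, up to floating-point rounding.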
Question 11 0.25
/ 0.25 pts
Which of the following data science project steps is the most critical
for the success of the project?

Model Selection

Data preprocessing

Model Evaluation

Model Building
Question 12 0.25
/ 0.25 pts
- DM

Evaluation

Data modeling


Data Preparation
Question 13 0.25
/ 0.25 pts


True

False
Incorrect
Question 14 0
/ 0.25 pts
Consider a scenario where you feel unwell and go to a doctor. After a detailed
diagnosis, the doctor concludes that it is a regular fever and sends you
home with medicines. He also gives instructions to rest, drink plenty
of fluids, and return to the hospital if the fever doesn't subside in a week.
This can be considered analogous to which stage of data analytics?

Prescriptive

Descriptive

Predictive

Diagnostic
Question 15 0.25
/ 0.25 pts

Diagnostic

Prescriptive

Descriptive

Predictive
Question 16 0.25
/ 0.25 pts




Question 17 0.25
/ 0.25 pts
Which of the following steps is performed next by a data scientist after
acquiring the data?

Data integration

Data cleaning

Data replication

All of them
Question 18 0.25
/ 0.25 pts

Vector

Variability

Vulnerability

Volatile
Question 19 0.25
/ 0.25 pts

Regression

Classification

Clustering

Ranking
Question 20 0.25
/ 0.25 pts
If you were to arrange the following data analytics techniques in
increasing order of complexity, which of the following is considered the
correct order?

Diagnostic, Predictive, Prescriptive, Descriptive

Descriptive, Diagnostic, Predictive, Prescriptive

Prescriptive, Descriptive, Diagnostic, Predictive

Predictive, Diagnostic, Prescriptive, Descriptive
Incorrect
Question 21 0
/ 0.25 pts
manager is analysing past projects to identify the risk involved and how
they were mitigated.




Question 22 0.25
/ 0.25 pts


two locations.

record.


Question 23 0.25
/ 0.25 pts

Unstructured

Semi-structured

Quasi-structured

Structured
Question 24 0.25
/ 0.25 pts

Structured format

Raw format

both

none
Question 25 0.25
/ 0.25 pts
Point out the correct statement.

None of the mentioned

Preprocessed data is original source of data

Raw data is the data obtained after processing steps

Raw data is original source of data
Question 26 0.25
/ 0.25 pts

WebSpider

Scraper

WebCrawler

BeautifulSoup
Question 27 0.25
/ 0.25 pts

It places less emphasis on the initial planning phases covered in
CRISP-DM (Business Understanding and Data Understanding phases)
and omits entirely the Deployment phase.

The SEMMA model also emphasizes data mining as a non-linear,
adaptive process.

SEMMA is a logical organisation of the functional tool set of SAS
Enterprise Miner for carrying out the core tasks of data mining.

Question 28 0.25
/ 0.25 pts

CarColor?

Interval attribute

Ratio attribute

Ordinal attribute

Nominal attribute
Question 29 0.25
/ 0.25 pts

Mostly True

False

None of the answers

True
Partial
Question 30 0.13
/ 0.25 pts
Match the following sampling techniques with the use cases
Random Sampling Randomly pick mang
Systematic Sampling Select of fruits based
Stratified Sampling Pick one mango, one
Quota Sampling Pick every 5th fruit fr
Question 31 0.25
/ 0.25 pts

interest

Representative

Probabilistic

Qualitative

Systematic
Question 32 0.25
/ 0.25 pts
le.fit(input)

1

2

3

4
Question 33 0.25
/ 0.25 pts
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,

Will be the same.



Question 34 0.25
/ 0.25 pts
The first 20 rows of a continuous-valued attribute of a large data set are
given below. Missing values are handled by replacing with 0. What kind
of discretisation would you prefer?
Continous_Attribute
130.2
125
126.75

Equal Width binning

Equal frequency binning.
Incorrect Question 35 0
/ 0.25 pts




Histogram analysis
Question 36 0.25
/ 0.25 pts

Inconsistencies

Noise

Redundancy


Integration
Incorrect
Question 37 0
/ 0.25 pts
classification task, an engineer used every 8th example to generate the
test set.

Simple Random

Stratified Random

Systematic Sampling

Cluster Sampling
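Picking every k-th example at a fixed interval is what systematic sampling means. A minimal sketch follows; the 40 example IDs and the helper name `systematic_sample` are assumptions for illustration.

```python
def systematic_sample(items, step, start=0):
    """Return every `step`-th item, beginning at index `start`."""
    if step <= 0:
        raise ValueError("step must be positive")
    return items[start::step]


# Hypothetical dataset of 40 example IDs, numbered from 1.
examples = list(range(1, 41))

# Index 7 is the 8th example, so this picks the 8th, 16th, 24th, ...
test_set = systematic_sample(examples, step=8, start=7)
train_set = [e for e in examples if e not in test_set]

print(test_set)
```

The selection is deterministic once the start and step are fixed, which distinguishes it from simple random sampling.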
Question 38 0.25
/ 0.25 pts
How can you handle missing or corrupted data in a dataset?

Drop missing rows or columns


Assign a unique category to missing values

Replace missing values with mean/median/mode
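The three strategies listed above can be sketched in plain Python. The sample columns and the `"MISSING"` label are hypothetical, and `None` stands in for a missing value.

```python
import statistics

# Hypothetical numeric column; None marks a missing value.
ages = [23, None, 31, 27, None, 35]

# Option 1: drop the missing entries.
dropped = [a for a in ages if a is not None]

# Option 2: replace missing entries with the mean of the observed values.
mean_age = statistics.mean(dropped)
mean_filled = [mean_age if a is None else a for a in ages]

# Option 3: give missing values their own category (categorical data).
colors = ["red", None, "blue", None]
flagged = ["MISSING" if c is None else c for c in colors]

print(dropped, mean_filled, flagged)
```

Dropping loses rows, mean imputation keeps them but shrinks the variance, and a separate category preserves the fact that the value was missing.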
Question 39 0.25
/ 0.25 pts




Question 40 0.25
/ 0.25 pts
Histogram analysis algorithms can be based on either equal width or
equal frequency.

True

False
Quiz Score:
8.63 out of 10
Quiz 1

all of the options
data wrangling
Data Analyst
Data Scientist
Data Architect
Data Journalist

reporting on it.

reporting on it.
data manipulation
data mining
data analytics
Privacy Checker
An organization that has little need for data analytics and is just
embarking on the analytics path will most likely structure its data team
in a
Centralized model
Federated model
Consulting model
Functional model
precision of X.XX]
0.33
0.43
0.36
0.45

(0,1,1,0,0,1,0)
0.765
0.166
0.455
0.234
bins is 5.
[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46, 48,
36]
6
4
(49−30)÷5
3
5
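The question stem above is truncated, so the following is only a sketch of equal-width binning for the listed values with 5 bins. It assumes the intended bin width is (max - min) / number of bins, as the "(49−30)÷5" note suggests, and it does not claim which option is the expected answer.

```python
values = [39, 45, 49, 45, 31, 37, 38, 41, 37, 41,
          39, 34, 35, 30, 47, 43, 44, 46, 48, 36]
num_bins = 5

lo, hi = min(values), max(values)
width = (hi - lo) / num_bins  # (49 - 30) / 5

counts = [0] * num_bins
for v in values:
    # The maximum value belongs to the last bin, so clamp the index.
    idx = min(int((v - lo) / width), num_bins - 1)
    counts[idx] += 1

for i, c in enumerate(counts):
    left = lo + i * width
    # Bins are half-open except the last, which also includes the maximum.
    print(f"bin {i + 1}: starts at {left:.1f}, width {width:.1f}, {c} values")
```

Equal-width bins share the same range but not necessarily the same number of values; equal-frequency binning would instead put 4 values in each of the 5 bins.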
precision of X.XX]
0.33
0.29
0.30
0.38
Deployment
Data Modelling
Data preparation
Data collection
Which of the following is not a type of predictive analytics?
What is the student's performance in the next question?
average attendance of the students in the current semester?
Which course will the student take in the next semester?

what is the average score of all students in the CBSE 10th Math Exam
Incorrect

CRISP-DM
SEMMA
SMAM

Predictive analysis

observations

produce a result
Incorrect

Optimization
Classification
Regression
Clustering

example of
Clustering
Association Rule
Classification
Regression
Incorrect
Model Selection
Model Evaluation
Model Building
Data preprocessing

analytics?
Classification
Clustering
Regression
Ranking
WebSpider
WebCrawler
BeautifulSoup
Scraper

A dataset can contain
Both Quantitative and Qualitative Values
None of the answers
Quantitative values
Qualitative Values
Which of the following properties are supported by an interval attribute?
P) Distinctness
Q) Order
R) Meaningful differences
S) Meaningful ratios
P and R
P and S
P,Q and R

CarColor?
Interval attribute
Ordinal attribute
Nominal attribute
Ratio attribute
True
False
Incorrect
Incorrect

le.fit(input)
2
4
1
3
For a data analytics task to analyse feedback on her subject for a class
of 60 students, a school teacher decided to use the survey submitted
by the ten students who come for tuitions for that subject, at her home.
Identify the type of sampling she is doing.
Systematic Sampling
Stratified sampling
Incorrect
Incorrect
The statistical description (x1 + x2 + . . . + xN)/N, for the data values x1, x2, .
. . , xN is called their ________________
mean
IQR
median
mode
Which of the following statements are true with respect to data quality
issues?
The given data set should not miss any values or attributes
All of the above

Pre-processing of data is required to address the problems of
inconsistency, incompleteness.

If data are not updated from time to time, there will be a negative impact on
data quality.
Incorrect

[0,1,1,0]
[1,0,0,1]



Incorrect

None of the above
IQR = Q3 – Q1
IQR = Q2-Q1
IQR = Q3-Q2
Incorrect
MEAN
RANGE
MEDIAN
MODE

Quiz 1
Attempt History
Attempt Time Score
LATEST Attempt 1 (https://bits-pilani.instructure.com/courses/1704/quizzes/3453/history?version=1) 31 minutes 8.48 out of 10


True
False

Which of these is not an example of the application of data science?
Product recommender systems
Creation of new fiscal policy
Fraud detection and prevention system in a bank
Targeted advertising as per customer's need

Storyteller
Data Analyst
Big Data Engineer

____________

All of the options
processing data
analysing data
organizing data

building ML models.
Data mining
Data manipulation
Building continuous data stream

Data analytics
bins is 5.
[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46, 48,
36]
4
(49−30)÷5
3
6
5
7
8
5
6

bins is 4.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36, 43,
46]
6
5
3
4

(0,1,1,0,0,1,0)
0.234
0.166
0.455
0.765


Descriptive
Prescriptive
Predictive
Diagnostic

Cognitive analytics
- DM
Data modeling
Data Preparation
Evaluation

Operationalize
None
Model building
Model planning
Diagnostic
Prescriptive
Predictive
Descriptive

exams.
The 5 stages of the KDD process are:
1. Selection
2. Preprocessing
3. Transformation
4. Data Mining
5. Interpretation/Evaluation
Identify the CRISP-DM phases that correspond to stages 3 and 4 of KDD.
Data Preparation,Modeling
Evaluation,Business understanding
Data Understanding,Evaluation
Modeling,Data Preparation
Incorrect


Regression
Association Rule
Classification
Clustering
Clustering
Ranking
Classification

Regression
Incorrect

Which data analytic approach can show the relationships in the various
elements in your data?
Descriptive
Prescriptive
Predictive
Diagnostic


Symmetric attribute
Discrete attribute
Incorrect
Is it possible to convert an interval variable to an ordinal variable?
True
False
Bank loan approval data includes a field called loan type. It is stored
as an integer in the database. Its values mean the following: 1 -
personal loan, 2 - home loan, 3 - business loan. What type of attribute
is the loan type?
Ratio
Interval
Nominal
Ordinal


analysis?
True
False

Match the following datasets to its correct type.
Transaction Data Record
Molecular Structures Graph
Spatial Data Graph

Integration
Redundancy
Inconsistencies
Noise
Incorrect
Convert a continuous attribute into a discrete attribute

Both of the above.
None of the above.

level view


details
Only Statement III
Statement I and II
Statement I and IV

The features are not correlated
The features are correlated.

import numpy as np
from sklearn.preprocessing import Binarizer
exam1 = np.array([41,43,45,47,49])
b1 = Binarizer(threshold= exam1.mean())
exam1_b = b1.fit_transform(exam1.reshape(-1, 1))
print(exam1_b)
[0,0,0,1,1]
[0,0,1,0,0]
[1,1,0,1,1]
[1,1,1,0,0]
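To trace the snippet above by hand, here is a plain-Python sketch of the same thresholding rule: values strictly greater than the threshold become 1, all others 0. The helper `binarize` is a hypothetical stand-in, not scikit-learn's implementation.

```python
def binarize(values, threshold):
    """Map each value to 1 if it is strictly greater than threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


exam1 = [41, 43, 45, 47, 49]
threshold = sum(exam1) / len(exam1)  # mean of the scores, 45.0
exam1_b = binarize(exam1, threshold)
print(exam1_b)
```

The score equal to the mean (45) maps to 0 because the comparison is strict. The scikit-learn call in the question returns the same values as a 5x1 column array rather than a flat list.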

Convert a continuous attribute into a discrete attribute.
None of the above.
Both of the above.
Data Collection
Data Requirements
Data Preparation
Data Understanding

Simple Random
Stratified Random
Cluster Sampling
Systematic Sampling
True
None of the answers
Mostly True
False
fillna() to fill NA values in the d
interpolate() to fill NA/NaN values us
notnull() return Index without NA

6
2
5
5.5

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40 Available Dec 18 at 19:00 - Dec 19 at 23:59
Time Limit 60 Minutes
Attempt History
Attempt Time Score
LATEST Attempt 1 60 minutes 5.38 out of 10
Correct answers will be available on Dec 22 at 0:00.
Score for this quiz: 5.38 out of 10
/ 0.25 pts

reporting on it.

reporting on it.




Incorrect
Question 2 0
/ 0.25 pts
Which among the following is a more flexible model with the right balance
of centralized and distributed coordination?

Functional

Consulting

Federated

Centralized
Question 3 0.25
/ 0.25 pts

True

False
Question 4 0.25
/ 0.25 pts


Storyteller

Data Analyst

Big Data Engineer

Question 5 0.25
/ 0.25 pts
Artificial Intelligence comprises the following streams

Machine Learning

Data Science

Deep Learning

IoT
Question 6 0.25
/ 0.25 pts




Question 7 0.25
/ 0.25 pts
precision of X.XX]

0.43

0.33

0.45

0.36
Question 8 0.25
/ 0.25 pts
bins is 5.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36,
43, 46]

4
20/5

3

5

6
Question 9 0.25
/ 0.25 pts
bins is 4.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36,
43, 46]

4

3

5

6
Question 10 0.25
/ 0.25 pts
bins is 4.
[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46,
48, 36]

6

3

4

5
(49-30)/4
Question 11 0.25
/ 0.25 pts

___________________ task.

Association

Regression

Classification

Reinforcement
/ 0.25 pts




Predictive analysis

Question 13 0.25
/ 0.25 pts
Examining the data they're keeping and reviewing how it's being used
has little or no value for firms who aren't currently aiming to undertake
big data analytics.

False

True
Question 14 0.25
/ 0.25 pts

Data collection

Deployment

Data preparation

Data Modelling
Incorrect
Question 15 0
/ 0.25 pts

supermarket manager wants to improve her inventory for the summer
season by analyzing last year’s sales.




Incorrect
Question 16 0
/ 0.25 pts
Which of the sentences below best describes predictive analytics?

methods for predicting future behavior using statistical analysis and
data mining, particularly to maximize the strategic value of corporate
intelligence

software applications, also known as "bots," that are dispatched to carry
out a mission and gather data from web pages on behalf of a user

a gateway that offers access to a variety of vital information from many
different sources on one screen

a method that uses feature analysis to predict people in photos and
tags them to other photos on its own
Incorrect
Question 17 0
/ 0.25 pts




another format
Partial
Question 18 0.13
/ 0.25 pts
Match with the most appropriate answer, related to the pandemic
Descriptive Analytics Checking whether hosp
Predictive Analytics Predict test positivity ra
Diagnostic Analytics Interactive visual tool to
Prescriptive Analytics Identify the actions to b
Question 19 0.25
/ 0.25 pts
Identify the data analytics task for the following scenario. The team
leader aggregates the sales data from various geographical areas and
reports the penetration of each product.




Incorrect
Question 20 0
/ 0.25 pts
manager is analysing past projects to identify the risk involved and how
they were mitigated.




Question 21 0.25
/ 0.25 pts
Identify the data analytics task for the following scenario. Google is
using tools to suggest texts or phrases while composing emails.




Question 22 0.25
/ 0.25 pts




Question 23 0.25
/ 0.25 pts

Raw data is the data obtained after processing steps

None of the mentioned

Preprocessed data is original source of data

Raw data is original source of data
Incorrect
Question 24 0
/ 0.25 pts
"Order Fulfilment Date" should come after "Order Creation Date". This
is an example of which data quality aspect:

Integrity

Conformity

Timeliness

Consistency
Question 25 0.25
/ 0.25 pts

Data point

Variables

Dimensions

Features
Question 26 0.25
/ 0.25 pts

analysis?

True

False
Question 27 0.25
/ 0.25 pts

preprocessed data is original source of data

none of the options

raw data is the data obtained after processing steps

raw data is original source of data
Question 28 0.25
/ 0.25 pts
In a FashionStore data set, the feature ShirtSize {S, M, L, XL, XXL} is an
example of


Ordinal attribute

Nominal attribute

Numeric attribute
Question 29 0.25
/ 0.25 pts
Imbalance issue in data sets can be rectified using ____

Binarisation

Normalisation

Standardisation

Sampling
Question 30 0.25
/ 0.25 pts




None of the answers
/ 0.25 pts


sources

None of the above

Unanswered Question 32 0
/ 0.25 pts

mean of a value



none of the above
/ 0.25 pts
In a box and whisker plot of data, Inter Quartile Range (IQR) is

Distance between the first and third quartile

Distance between the first and second quartile

Distance between the first and fourth quartile

Distance between the second and third quartile
/ 0.25 pts
students, a school teacher decided to use the survey submitted by the t
who come for tuitions for that subject, at her home. Identify the type of s
is doing.

Systematic Sampling


Stratified sampling

/ 0.25 pts

[0,1,1,0]



[1,0,0,1]
/ 0.25 pts

Redundancy


Integration

Noise

Inconsistencies
/ 0.25 pts
Scaling a variable is not an essential criterion when the ML pipeline uses
algorithms based on gradient descent optimization

True

False
/ 0.25 pts
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,




Will be the same.
/ 0.25 pts




/ 0.25 pts

Matrix

Lexeme

ConeTree

TreeMap
Quiz Score:
5.38 out of 10
Quiz 1


____________
processing data
analysing data
All of the options
organizing data

building ML models.
Incorrect

records
i, ii, iii
iv, v
ii, iii
i, iv, v
True
False
Collecting Raw Data
All of the Above
data mining
data analytics
data manipulation
5
8
6
7
0.8769
-0.8679
0.8679
-0.8769
bins is 4.
[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46,
48, 36]
4
3
5
(49-30)/4
6
5
4
7
6

Classification
Clustering
Regression
Optimization
Descriptive
Predictive
Prescriptive
Diagnostic

Classification
Association Rule
Clustering
Regression
Clustering
Classification
Ranking
Regression

True
False
Data Transformation
Data Integration
All of the above.
Data Reduction
- DM
Data Preparation
Data modeling
Evaluation
Incorrect
Python.
Incorrect
Volatile
Vector
Variability
Vulnerability

Applied statistics
Industry statistics
Economic statistics
programmes?


Zip codes
Exam Grades
Military ranks
Academic ranks
BeautifulSoup
WebCrawler
Scraper
WebSpider

interval/ratio data

variables do not
Variables
Features
Dimensions
Data point

Symmetric attribute
Discrete attribute

operation
Data retrieval
Data cleansing
Data transformation
Data combining
MEDIAN
MEAN
MODE
RANGE
Data Preparation
Data Collection
Data Understanding
Data Exploration
Matrix
TreeMap
Lexeme
ConeTree
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,
Will be the same.


Systematic Sampling
Stratified sampling
Q and R
R only
P, Q and R

sources
None of the above

level view


details
Statement I and II
Only Statement III
Statement I and IV
mean of a value
none of the above
Question 1
0.25 / 0.25 pts
True
False
Partial
Question 2
0.17 / 0.25 pts
Which of the following are the reasons for the sudden growth of analytics?
Large number of analysts available in the market
Large number of user friendly analytics tools available for data processing
Cost of storage has hugely dropped
Data is growing at 40% compound annual rate
Question 3
0.25 / 0.25 pts
A city conducted a new bi-annual census of its residents. Which of the following most
strongly suggests a cognitive bias in their collected dataset?
The census data includes new data on how many cars are owned by the residents.
Some rows have null values for last names

The average income in the dataset is 40% higher than the average income of the city’s
population from the previous census 2 years ago.
Some rows have date of birth in DD-MM-YY and some in DD-MM-YYYY formats
Question 4
0.25 / 0.25 pts
Due to market expectations, businesses are having difficulty retaining highly trained data
scientists and engineers.
False
True
Question 5
0.25 / 0.25 pts
Data science is an interdisciplinary field that has minimal overlap with which of the below?
Software Engineering
Machine Learning
Statistical Analysis
Question 6
0.25 / 0.25 pts
Statement 1: Role of a Business analyst usually requires expertise on building ML models.
Statement 2: Role of a Data Scientist usually requires expertise on performing descriptive
analysis.
Question 7
0.25 / 0.25 pts
Suppose that the minimum and maximum values for an attribute are 4.3 and 7.6,
respectively. Compute the scaled value of 5.4 if min-max normalization is applied to scale
the attribute to [-1.0, +1.0]. (Answer should have a precision of X.XX)
0.76
0.66
0.33
-0.33
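Min-max normalization maps a value linearly from its original range onto a new range. A minimal sketch with the numbers from the question; the function name `min_max_scale` and its parameter names are illustrative.

```python
def min_max_scale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly rescale value from [old_min, old_max] to [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("old_min and old_max must differ")
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


scaled = min_max_scale(5.4, 4.3, 7.6, new_min=-1.0, new_max=1.0)
print(f"{scaled:.2f}")
```

The value 5.4 sits one third of the way from 4.3 to 7.6, and one third of the way from -1 to +1 is -1/3, which is -0.33 at the requested precision.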
Question 8
0.25 / 0.25 pts
Suppose that the mean and standard deviation of the values for an attribute are 8.9 and
6.5, respectively. Apply z-score normalization to a value of 10.7.
0.2679
-0.2769
-0.2679
0.2769
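Z-score normalization subtracts the mean and divides by the standard deviation. A minimal sketch with the numbers from the question; the function name `z_score` is illustrative.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value lies above the mean."""
    if std_dev == 0:
        raise ValueError("std_dev must be non-zero")
    return (value - mean) / std_dev


z = z_score(10.7, mean=8.9, std_dev=6.5)
print(round(z, 4))
```

Here (10.7 - 8.9) / 6.5 = 1.8 / 6.5, which is about 0.2769. The sign is positive because 10.7 lies above the mean.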
Question 9
0.25 / 0.25 pts
7
Question 11
0.25 / 0.25 pts
"Assess Situation" ("Costs and Benefits") falls into which phase of CRISP-DM?
Evaluation
Data modeling
Data Preparation
Question 12
0.25 / 0.25 pts
Suppose a web user visits Flipkart during big billion-day sales. Predicting whether he / she
makes a purchase of a smartphone is a ___________________ task.
Association
Classification
Reinforcement
Regression
Question 15
0.25 / 0.25 pts
A course instructor has data about students' attendance in her course in the past
semester. What kind of analytics is she performing when she creates a line graph based on
this data?
Predictive
Descriptive
Diagnostic
Prescriptive
Question 18
0.25 / 0.25 pts
Analyzing the data to determine why some phenomena related to learning happened is a
type of
Diagnostic
Descriptive
Prescriptive
Predictive
Question 19
0.25 / 0.25 pts
Regression is
(1) Prediction of a value of a given continuous valued variable based on the values of other
variables.
(2) Regression works with a linear or nonlinear model of dependency among the variables.
Which of the above is true?
Option 1
None
Option 2
Both 1 & 2
Question 20
0.25 / 0.25 pts
The branch of statistics which deals with the development of statistical methods is
classified as ___.
Applied statistics
Industry statistics
Economic statistics
Question 21
0.25 / 0.25 pts
Google tries to differentiate emails as spam and non-spam, this is an example of
Clustering
Classification
Regression
Association Rule
Question 22
0.25 / 0.25 pts
Identify the data analytics task for the following scenario. An e-commerce platform is
recommending products to their customers to improve the shopping experience.
Question 23
0.25 / 0.25 pts
In a dataset, CarColor is one of the attributes and it can take the following values {Red,
Green, Yellow, Black}, what type of attribute is CarColor?
Interval attribute
Ratio attribute
Ordinal attribute
Nominal attribute
Question 24
0.25 / 0.25 pts
What are the best practices for implementing big data analytics programmes?
Focusing on business goals and how to use big data analytics technologies to meet them
Question 25
0.25 / 0.25 pts
Is it possible to rescale continuous data for better data understanding?
True
False
Question 26
0.25 / 0.25 pts
As part of a survey in a large organization, one of the features that you capture is
designation. This type of data has the characteristic
Question 27
0.25 / 0.25 pts
"Order Fulfilment Date" should come after "Order Creation Date". This is an example of
which data quality aspect:
Consistency
Integrity
Conformity
Timeliness
Question 28
0.25 / 0.25 pts
none
Structured format
Raw format
both
Question 29
0.25 / 0.25 pts
None of the above.
Convert a continuous attribute into a discrete attribute
Both of the above.
Question 32
0.25 / 0.25 pts
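import numpy as np
from sklearn.preprocessing import Binarizer

# Five exam scores; their mean is used as the binarization threshold.
exam1 = np.array([41, 43, 45, 47, 49])
b1 = Binarizer(threshold=exam1.mean())
# Binarizer expects a 2-D array, so reshape to a single column first.
exam1_b = b1.fit_transform(exam1.reshape(-1, 1))
print(exam1_b)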
[0,0,1,0,0]
[1,1,0,1,1]
[0,0,0,1,1]
[1,1,1,0,0]
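The thresholding in this question can be reproduced with plain NumPy. This is a minimal sketch, assuming scikit-learn's documented Binarizer behaviour (values strictly greater than the threshold map to 1, all others to 0); the variable names are illustrative.

```python
import numpy as np

exam1 = np.array([41, 43, 45, 47, 49])
threshold = exam1.mean()  # 45.0

# Strictly greater than the threshold maps to 1; equal or smaller maps to 0.
binarized = (exam1 > threshold).astype(int)
print(binarized.tolist())  # [0, 0, 0, 1, 1]
```

The score 45 equals the threshold and therefore maps to 0. The Binarizer call in the question prints the same values as a single column, because its input was reshaped to a 2-D array.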
Question 33
0.25 / 0.25 pts

Question 34
0.25 / 0.25 pts
Which data visualization is appropriate to explore the relationship between two
attributes out of many attributes in a data frame?
Histogram
Scatter plot
Box-plot
Heat maps
Question 35
0.25 / 0.25 pts
A sample is ------------------ if it has approximately the same property of interest
Systematic
Qualitative
Representative
Probabilistic
Question 37
0.25 / 0.25 pts
Converting the raw values of a numeric attribute is ?
Sampling
Normalization
Discretization
Smoothing
Question 38
0.25 / 0.25 pts
In the given table below, there is a requirement to get the name, gender, and marks of the
top-scoring students only. Which of the following functionalities of data wrangling is
used?
Data exploration
Reshape
Replace
Filter
Question 39
0.25 / 0.25 pts
“Proximity” in Data Science terms means:
The extent of similarity or dissimilarity between two objects.
A measure of the physical distance between two objects.
None of the above.
The area surrounding an object where the object is able to exert its influence.
Partial
Question 40
0.13 / 0.25 pts
Which of the following methods are considered to be the best practice for data cleaning?

Quiz 1
Storyteller
Big Data Engineer
Data Analyst
Privacy Checker
Which one of the following is not a necessary characteristic of a Data Scientist?
Communicative
Punctual
Creative
Technical
all of the options
data wrangling

Functional
Coordinational
Consulting

are: 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110 W, 98
98W
100W
150W
110W
0.8679
0.8769
-0.8769
-0.8679
value of 10.7.
-0.2679
0.2679
0.2769
-0.2769
normalization is applied to scale [-1.0,+1.0]. (Answer should have a
precision of X.XX)
0.66
0.76
0.33
-0.33

SMAM
CRISP-DM
SEMMA
Find the odd term.
Data Wrangling
Deep Learning
Machine Learning
Which data analytic approach can identify the probabilities of an action?
Prescriptive
Predictive
Diagnostic
Descriptive

Predictive
Prescriptive
Diagnostic
Descriptive



Regression
Classification
Association Rule
Clustering

Classification
Association Rule
Regression
Clustering
None
Model planning
Operationalize
Model building
Incorrect

Cognitive analytics
Which of the following artifacts are not considered in Descriptive analytics?
Alerts
Adhoc reports
Predictive model
Standard report
Variables
Data point
Features
Dimensions

Nominal attribute
Numeric attribute
Ordinal attribute

interval/ratio data

variables do not

summary
Min, Median, Mode
Incorrect
Clustering techniques can be used in
Unsupervised Learning
None of the answers
Feature Selection
Either Unsupervised Learning or Feature Selection
Quasi-structured
Semi-structured
Unstructured
Structured
Partial
Binarization maps a continuous attr
Binning Divide the range of a co
Concept Hierarchy Smooth out the effect o
Functional Transformation Transform attribute valu
Incorrect

le.fit(input)
1
4
3
2
Incorrect
The dissimilarity between two data objects is
Lower when objects are not alike
None of the above
Lower when objects are more alike
Higher when objects are more alike
Systematic Sampling
Stratified Random
Cluster Sampling
Simple Random
Mostly True
False
None of the answers
True
Creating dummy variables during data preparation is what kind of operation?
Data transformation
Data retrieval
Data cleansing
Data combining
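To illustrate what creating dummy variables looks like in practice, here is a minimal sketch using pandas. The small DataFrame is hypothetical sample data that reuses the CarColor attribute from an earlier question; pd.get_dummies is one common way to do this, not necessarily the method the course intends.

```python
import pandas as pd

# Hypothetical sample data: one categorical (nominal) attribute.
df = pd.DataFrame({"CarColor": ["Red", "Green", "Yellow", "Black"]})

# Replace the categorical column with one indicator column per category.
dummies = pd.get_dummies(df, columns=["CarColor"])
print(dummies.columns.tolist())
```

Each row ends up with exactly one indicator set. The data itself is re-encoded rather than retrieved, cleaned, or merged.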

values.
Noise
Inconsistencies
Integration
Redundancy

None of the above
IQR = Q3 - Q1
IQR = Q3 - Q2
IQR = Q2 - Q1
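The interquartile range is the spread between the third and first quartiles. Below is a minimal sketch with NumPy; the data values are hypothetical. Quartile conventions differ slightly between tools, and this uses NumPy's default linear interpolation.

```python
import numpy as np

# Hypothetical sample data, used only to illustrate the formula.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# First and third quartiles (25th and 75th percentiles).
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
print(q1, q3, iqr)  # 3.0 7.0 4.0
```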

le.fit(input)
2
3
4
1
technique.
Transformation
Aggregation
Discretization
Sampling

Question 1 0.25 / 0.25 pts

Which of the following is not a tool used by a Data Scientist?

Question 2 0.25 / 0.25 pts

Some rows have null values for last names

Question 3 0.25 / 0.25 pts

Statement 1: Role of a Business analyst usually requires expertise on building ML models.

Which of the following is right?

Statement 1 is correct and Statement 2 is wrong

Both statements are correct

Both statements are wrong

Statement 1 is wrong and Statement 2 is correct

Question 4 0.25 / 0.25 pts

Which of the following is not a tool used by a Data Scientist?

Question 5 0.25 / 0.25 pts

Tableau is not likely to be used by

Question 6 0.25 / 0.25 pts

Question 7 0.25 / 0.25 pts

Question 8 0.25 / 0.25 pts

Calculate cosine similarity between two documents represented by vectors

x= (0,1,1,1,2,3,0,0,0,2,1) and

Question 9 0.25 / 0.25 pts

Question 10 0.25 / 0.25 pts

Question 11 0.25 / 0.25 pts

Question 12 0.25 / 0.25 pts

None of the mentioned above

Question 13 0.25 / 0.25 pts

Question 14 0.25 / 0.25 pts

Question 15 0.25 / 0.25 pts

Google Cloud Dataproc

Question 16 0.25 / 0.25 pts

Match the following data analytics to their description:

Descriptive What happened?

Diagnostic Why did this happen?

Predictive What might happen in the f

Prescriptive What should we do next?

Question 17 0.25 / 0.25 pts

Data Science uses _________ to make decision and prediction.

All of the options

Question 18 0.25 / 0.25 pts

Customer data management

Question 19 0.25 / 0.25 pts

Question 20 0.25 / 0.25 pts

Question 21 0.25 / 0.25 pts

Question 22 0.25 / 0.25 pts

Which of the following statements is false with respect to a data set?

Raw data should be processed only one time.

Merging concerns combining datasets on the same observations to produce a result

All of the listed options

Question 23 0.25 / 0.25 pts

Is it possible to convert a Nominal scale to an Ordinal Scale during data analysis?

Question 24 0.25 / 0.25 pts

Which of the following is not true for the SEMMA model?

SEMMA is focused on the model development aspects of data mining.

Question 25 0.25 / 0.25 pts

In a FashionStore Data set the feature ShirtSize { S,M,L,XL,XXL} is an example of

Question 26 0.25 / 0.25 pts

Question 27 0.25 / 0.25 pts

Missing data in Pandas is represented by