
18/12/2022, 23:58 Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40 Available Dec 18 at 19:00 - Dec 19 at 23:59
Time Limit 60 Minutes

Instructions
Quiz I can be attempted only once. You will not be provided with a make-up quiz if you miss this quiz.

The quiz is timed for one hour and has to be completed once you start. You will not be allowed to go back to the previous question if you skip a
question. 

“Choose the most appropriate answer to each question.”

The answers will be visible only after three days once the quiz has ended.

All the best.

Attempt History
Attempt Time Score
LATEST Attempt 1 48 minutes 9.38 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 9.38 out of 10


Submitted Dec 18 at 23:57
This attempt took 48 minutes.
https://bits-pilani.instructure.com/courses/1704/quizzes/3453 1/25
18/12/2022, 23:58 Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Question 1 0.25 / 0.25 pts

Which of the following is not a tool used by a Data Scientist?

  Springboot

  Python

 R

  SAS

Question 2 0.25 / 0.25 pts

A city conducted a new bi-annual census of its residents. Which of the following most strongly suggests a
cognitive bias in their collected dataset?

  The census data includes new data on how many cars are owned by the residents.

  Some rows have null values for last names

  Some rows have date of birth in DD-MM-YY and some in DD-MM-YYYY formats

 
The average income in the dataset is 40% higher than the average income of the city’s population from the
previous census 2 years ago.

Question 3 0.25 / 0.25 pts

Statement 1: Role of a Business analyst usually requires expertise on building ML models.


Statement 2: Role of a Data Scientist usually requires expertise on performing descriptive analysis.

Which of the following is right?

  Statement 1 is correct and Statement 2 is wrong

  Both statements are correct

  Both statements are wrong

  Statement 1 is wrong and Statement 2 is correct

Question 4 0.25 / 0.25 pts

Which of the following is not a tool used by a Data Scientist?

  Hadoop

  MongoDB

  Amazon S3

  Flask

Question 5 0.25 / 0.25 pts

Tableau is not likely to be used by

  Business Analyst

  Visualization Engineer

  Data Architect

  Data Scientist

Question 6 0.25 / 0.25 pts

Which of the following best describes the difference between the data analyst and data scientist?

  Data analysts and data scientists play the same role in the project

  Data analysts are more proficient in R whereas data scientists are more proficient in Python

  A data analyst does estimation whereas a data scientist predicts and explains it as well

  Data analysts just deal with numbers whereas data scientists deal with algorithms

Question 7 0.25 / 0.25 pts

Compute the depth of each bin for the data given below, if the number of bins is 5.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36, 43, 46]

  2.02

  2.17

  2.2

  2.3
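
The term "depth" is ambiguous in this question, and the same data set is used elsewhere in these quizzes with count-style answer options. A minimal sketch, assuming "depth" means the bin width under equal-width binning for the decimal options, and the number of values per bin under equal-depth (equal-frequency) binning otherwise. The variable names are illustrative only.

```python
data = [47, 38, 40, 42, 47, 42, 41, 39, 42, 45,
        36, 37, 36, 40, 43, 40, 45, 36, 43, 46]
num_bins = 5

# Equal-width binning: every bin spans the same range of values.
bin_width = (max(data) - min(data)) / num_bins

# Equal-depth (equal-frequency) binning: every bin holds the same number of values.
bin_depth = len(data) // num_bins

print(f"width per bin: {bin_width:.2f}")
print(f"values per bin: {bin_depth}")
```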

Question 8 0.25 / 0.25 pts

Calculate cosine similarity between two documents represented by vectors

     x= (0,1,1,1,2,3,0,0,0,2,1) and

     y= (2,1,1,2,1,2,1,1,0,0,0)

  0.034

  50.2

  0.64

  0.99
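
Cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. The following is a small sketch of that calculation for the vectors in this question; the function name `cosine_similarity` is chosen here for illustration.

```python
import math


def cosine_similarity(x, y):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


x = (0, 1, 1, 1, 2, 3, 0, 0, 0, 2, 1)
y = (2, 1, 1, 2, 1, 2, 1, 1, 0, 0, 0)

similarity = cosine_similarity(x, y)
print(round(similarity, 2))
```

Here the dot product is 12 and the squared norms are 21 and 17, so the result is 12 / sqrt(357).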

Question 9 0.25 / 0.25 pts

Suppose that the minimum and maximum values for an attribute are 4.3 and 7.6, respectively. Compute the
scaled value of 5.4 if min-max normalization is applied to scale [0.0,1.0]. (Answer should have a precision of
X.XX)

  0.36

  0.33

  0.43

  0.45

Question 10 0.25 / 0.25 pts

Suppose that the minimum and maximum values for an attribute are 4.3 and 7.6, respectively. Compute the
scaled value of 5.4 if min-max normalization is applied to scale [-1.0,+1.0]. (Answer should have a precision
of X.XX)

  0.33

  0.76

  0.66

  -0.33
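
Both min-max questions above use the same formula, differing only in the target range. A minimal sketch, assuming the usual definition v' = (v - min) / (max - min) * (new_max - new_min) + new_min; the helper name `min_max_scale` is hypothetical.

```python
def min_max_scale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    fraction = (value - old_min) / (old_max - old_min)
    return fraction * (new_max - new_min) + new_min


# Target range [0.0, 1.0]
unit_scaled = min_max_scale(5.4, 4.3, 7.6)

# Target range [-1.0, +1.0]
symmetric_scaled = min_max_scale(5.4, 4.3, 7.6, new_min=-1.0, new_max=1.0)

print(f"{unit_scaled:.2f}")
print(f"{symmetric_scaled:.2f}")
```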

Question 11 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A car service showroom manager wants to analyze
his marketing and sales data to understand the reason for drop in sales. 

  Predictive Analytics

  Descriptive Analytics

  Diagnostic Analytics

  Prescriptive Analytics

Question 12 0.25 / 0.25 pts

The branch of statistics which deals with the development of statistical methods is classified as ___.

  Industry statistics

  None of the mentioned above

  Economic statistics

  Applied statistics

Question 13 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A physics teacher is analyzing the answer scripts of
the students to identify the areas that he/she should concentrate on so that the students understand the
concepts better.

  Predictive Analytics

  Descriptive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

Question 14 0.25 / 0.25 pts

Analyzing the data to determine why some phenomena related to learning happened is a type of

  Descriptive

  Prescriptive

  Predictive

  Diagnostic

Question 15 0.25 / 0.25 pts

What is the name of the Google-developed programming framework that enables the creation of applications
for processing big data sets in a distributed computing environment?

  Google Cloud Dataproc

  ZooKeeper

  Hive

  MapReduce

Question 16 0.25 / 0.25 pts

Match the following data analytics to their description:

Descriptive   What happened?

Diagnostic   Why did this happen?

Predictive   What might happen in the f

Prescriptive   What should we do next?

Question 17 0.25 / 0.25 pts

Data Science uses _________ to make decision and prediction.

  prescriptive analytics

  All of the options

  Predictive analytics

  machine learning

Question 18 0.25 / 0.25 pts

The process through which businesses analyse customer data or other types of information in order to find
patterns and links between various data items is known as:

  Data mining

  Data digging

  Consumer engagement

  Customer data management

Question 19 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A project manager is analysing past projects to
identify the software, hardware and human resources that will be required to complete a new project
successfully on time. 

  Descriptive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

  Predictive Analytics

Question 20 0.25 / 0.25 pts

Fortis-Apollo hospital is planning to  design a model which maps patients to the best possible treatments
based on the diagnosis. Identify the data analytics task for this scenario 

  Diagnostic Analytics

  Predictive Analytics

  Cognitive analytics

  Descriptive Analytics

Question 21 0.25 / 0.25 pts

We are predicting the weather condition as foggy, warm, cloudy, and misty at Bangalore using the data
collected in the last one month. This task is an example of

  Regression

  Association Rule

  Classification

  Clustering

Question 22 0.25 / 0.25 pts

Which of the following statements is false with respect to a data set?

  Raw data should be processed only one time.

  Merging concerns combining datasets on the same observations to produce a result

  All of the listed options

  Sub setting can be used to select and exclude variables and observations

Question 23 0.25 / 0.25 pts

Is it possible to convert a Nominal scale to an Ordinal Scale during data analysis?


  True

  False

Question 24 0.25 / 0.25 pts

Which of the following is not true for SEMMA model?

  SEMMA is focused on the model development aspects of data mining.

 
It places less emphasis on the initial planning phases covered in CRISP-DM (Business Understanding and Data
Understanding phases) and omits entirely the Deployment phase.

  The SEMMA model also emphasizes data mining as a non-linear, adaptive process.

 
SEMMA is a logical organisation of the functional tool set of SAS Enterprise Miner for carrying out the core tasks of
data mining.

Question 25 0.25 / 0.25 pts

In a FashionStore Data set the feature ShirtSize { S,M,L,XL,XXL} is an example of

  Ordinal attribute

  Numeric attribute

  Continuous attribute

  Nominal attribute

Question 26 0.25 / 0.25 pts

"Order Fulfilment Date" should come after "Order Creation Date". This is an example of which data quality
aspect:

  Timeliness

  Consistency

  Integrity

  Conformity

Question 27 0.25 / 0.25 pts

Missing data in Pandas is represented by

  NULL

  Null

  Empty

  NaN

Question 28 0.25 / 0.25 pts

A box plot is the visual representation of the following statistical summary

  Minimum, Average, Maximum

  Min, Median, Mode

  Minimum, First Quartile, Third Quartile, Second Quartile, Mean

  Minimum, First Quartile, Median, Third Quartile, Maximum
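
The five-number summary that a box plot draws can be computed with the standard library. This is a sketch on a hypothetical sample (not data from the quiz), assuming the "inclusive" quartile convention of `statistics.quantiles`; other conventions can give slightly different quartiles.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
values = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# n=4 splits the data into quartiles and returns the three cut points.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

five_number_summary = (min(values), q1, median, q3, max(values))
print(five_number_summary)
```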

Partial
Question 29 0.13 / 0.25 pts

Match the following techniques with the definitions

Binarization   maps a continuous attribut

Binning   Divide the range of a contin

Concept Hierarchy   Smooth out the effect of no

Functional Transformation   Transform attribute values x

Question 30 0.25 / 0.25 pts

The scatterplot implies that 

  The features are independent

  The features are positively correlated.

  The features are negatively correlated.

  None of the given options

Incorrect
Question 31 0 / 0.25 pts

In Python, the function used to find whether there are missing values is:

  dropna()

  isna()

  imputena()

  fillna()

Question 32 0.25 / 0.25 pts

Choose the correct empirical relation

  mean−mode ≈ 3×(mean−median).

  mean−median ≈ 3×(mean−mode)

  mean−mode ≈ 3×(median-mean).

  median-mean ≈ 3×(mean−mode)

Question 33 0.25 / 0.25 pts

In One-Hot Encoding the advantages are:

  Expands the feature space.

  A and B

  Suitable for linear models.

  Keeps all the information of the categorical variable.

Question 34 0.25 / 0.25 pts

Which of the following statements is TRUE?

  Outliers should be addressed only in the training dataset

  Outliers should always be addressed in the dataset

  Treatment of outliers depends on the problem statement

  Outliers should be addressed in the test dataset

Question 35 0.25 / 0.25 pts

The statistical description (x1 + x2 + ... + xN)/N, for the data values x1, x2, ..., xN, is called their
________________

  mean

  IQR

  mode

  median

Question 36 0.25 / 0.25 pts

In which phase, the duplicates (of the data) are removed? Choose the best possible answer.

  Data Preparation

  Data Exploration

  Data Collection

  Data Understanding

Question 37 0.25 / 0.25 pts

In a box and whisker plot of data, point out the FALSE statement, about Outliers

  Outliers are beyond 1.5 times the Inter Quartile Range (IQR) from the lower quartile

  Outliers are beyond 1.5 times the Inter Quartile Range (IQR) from the median

  Outliers are beyond the lowest or the highest value in the dataset

  Outliers are beyond 1.5 times the Inter Quartile Range (IQR) from the upper quartile
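
A common box-plot convention (Tukey's rule) measures the 1.5 × IQR distance from the lower and upper quartiles. A minimal sketch on a hypothetical sample, assuming that convention and the "inclusive" quartile method; the names `lower_fence` and `upper_fence` are illustrative.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
values = [2, 4, 4, 5, 6, 7, 8, 9, 30]

q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences sit 1.5 * IQR below the lower quartile and above the upper quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(lower_fence, upper_fence, outliers)
```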

Question 38 0.25 / 0.25 pts

Consider the data set below. How will you (most appropriately) handle the missing values for records 1, 4 and 6,
respectively?
Data set description.
price: continuous from 5118 to 45400.(class label)
normalized-losses: continuous from 65 to 256.
num-of-doors: four, two.

  Replace by mode, replace by median, replace by mean.

  Ignore all 3 records.

  Ignore the tuple, replace by mean, replace by mode.

  Replace by mode, ignore the tuple, replace by mean.

Incorrect
Question 39 0 / 0.25 pts

Which of the following is a bottom-up approach for discretization?

  Equal-frequency binning

  Histogram analysis

  Entropy based discretization

  Correlation analysis

Question 40 0.25 / 0.25 pts

For exploring continuous data using descriptive statistics which of the following method is used

  Range

  Frequency

  Percentage

  Histogram

Quiz Score: 9.38 out of 10

12/18/22, 11:21 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40
Available Dec 18 at 19:00 - Dec 19 at 23:59 Time Limit 60 Minutes

Instructions
Quiz I can be attempted only once. You will not be provided with a make-up quiz if you miss this
quiz.

The quiz is timed for one hour and has to be completed once you start. You will not be allowed to go
back to the previous question if you skip a question. 

“Choose the most appropriate answer to each question.”

The answers will be visible only after three days once the quiz has ended.

All the best.

Attempt History
Attempt Time Score
LATEST Attempt 1 57 minutes 8.25 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 8.25 out of 10


Submitted Dec 18 at 23:20
This attempt took 57 minutes.

Question 1 0.25 / 0.25 pts

Data scientist is not responsible for 

  building continuous data stream

  data mining

  data manipulation

  data analytics

Question 2 0.25 / 0.25 pts

Which of the following are correct skills for a Data Scientist?

  Data Wrangling

  Probability & Statistics

  All of the options

  Machine Learning / Deep Learning

Question 3 0.25 / 0.25 pts

Which of the following is not an application of data science?

  Image & Speech Recognition

  Recommendation Systems

  Privacy Checker

  Online Price Comparison

Question 4 0.25 / 0.25 pts

Which of the following organizational structures for Data Science teams
indicates a significant investment within the company towards being data-driven?

  Centralized

  Consulting

  Decentralized

  Federated

Question 5 0.25 / 0.25 pts

Identify the data mining tasks from the below list.

i. Dividing the customers of a company according to their gender
ii. Predicting the future stock price of a company using historical records
iii. Monitoring the heart rate of a patient for abnormalities
iv. Extracting the frequencies of a sound wave
v. Sorting a student database based on student identification numbers

  i, ii, iii

  iv, v

  i, iv, v

  ii, iii

Question 6 0.25 / 0.25 pts

Which of the following is not a tool used by a Data Scientist?

  Hadoop

  Flask

  Amazon S3

  MongoDB

Question 7 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 3.2. [Answer should have a precision of X.XXXX]

  0.8679

  0.8769

  -0.8679

  -0.8769
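
Z-score normalization subtracts the mean and divides by the standard deviation. A short sketch of the arithmetic for this question; the function name `z_score` is chosen here for illustration.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations value lies from the mean."""
    return (value - mean) / std_dev


score = z_score(3.2, mean=8.9, std_dev=6.5)
print(f"{score:.4f}")
```

The value lies below the mean, so the score is negative.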

Question 8 0.25 / 0.25 pts

Compute the depth of each bin for the data given below, if the number of
bins is 5.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36, 43,
46]

  2.3

  2.17

  2.02

  2.2

Question 9 0.25 / 0.25 pts

Compute the depth of each bin for data given below, if the number of bins
is 5.

[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36, 43,
46]

 3

 6

 5

 4

20/5

Question 10 0.25 / 0.25 pts

Suppose that the minimum and maximum values for the attribute income
are $12,000 and $98,000, respectively. The new range is [0.0,1.0]. Apply
min-max normalization to a value of $73,600.

  0.716

  0.561

  0.856

  0.758
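
The arithmetic for this question can be written out as follows, assuming the usual min-max formula; the symbols $v$, $\min_A$, $\max_A$ and the new-range bounds are notation introduced here, not taken from the quiz.

```latex
v' = \frac{v - \min_A}{\max_A - \min_A}\,(\mathrm{new\_max}_A - \mathrm{new\_min}_A) + \mathrm{new\_min}_A
   = \frac{73600 - 12000}{98000 - 12000}\,(1.0 - 0.0) + 0.0
   = \frac{61600}{86000} \approx 0.716
```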

Question 11 0.25 / 0.25 pts

Analyzing the data to determine why some phenomena related to
learning happened is a type of

  Diagnostic

  Prescriptive

  Descriptive

  Predictive

Incorrect
Question 12 0 / 0.25 pts

A scenario where you feel unwell, go to a doctor and explain your
symptoms to the doctor. This can be considered analogous to which stage
of data analytics?

  Diagnostic

  Descriptive

  Prescriptive

  Predictive

Question 13 0.25 / 0.25 pts

Match the following data analytics to their description:

Descriptive   What happened?

Diagnostic   Why did this happen?

Predictive   What might happen in t

Prescriptive   What should we do nex

Incorrect Question 14 0 / 0.25 pts

Consider a scenario where you visit a doctor for a fever. The doctor
asks questions about your condition, tries to understand how it might
have happened, and concludes from your responses that it is a normal
fever. To be sure, and to rule out underlying health problems or other
risk factors that would make a quick recovery less likely, the doctor
performs some tests and asks to see previous medical reports. Based on
these, the doctor is sure that it is a regular fever and that it will go
away in 5 days. The last part of the narration, which is derived from
the above-mentioned details, can be considered analogous to which stage
of data analytics?

  Prescriptive

  Predictive

  Diagnostic

  Descriptive

Question 15 0.25 / 0.25 pts

Assess Situation - "Cost and Benefits" - falls into which phase of
CRISP-DM?

  Data modeling

  Evaluation

  Business Understanding

  Data Preparation

Incorrect Question 16 0 / 0.25 pts

Which of the following are classification problems?

 
Calculating a room's temperature (in Celsius) based on other
environmental factors (such as atmospheric pressure, humidity etc).

 
Predicting if a cricket player is a batsman or bowler, given his playing
record.

  filtering spam from mails

 
Finding the shorter path between two already-existing routes between two
locations.

 
Predict traffic congestion along a specific route between two locations
using vehicle journey times.

Incorrect
Question 17 0 / 0.25 pts

In an exam, marks less than 40 (out of 100) is considered fail. The task of
finding the names of the failed students with a mark range between 10
and 25 is

  Failure Analytics

  Diagnostic Analytics

  Not part of analytics

  Descriptive Analytics

Question 18 0.25 / 0.25 pts

Regression is
(1) Prediction of a value of a given continuous valued variable based on
the values of other variables.
(2) Regression works with a linear or nonlinear model of dependency
among the variables.

Which of the above is true?

  Option 2

  Option 1

  None

  Both 1& 2

Question 19 0.25 / 0.25 pts

Match with the most appropriate answer related to analytics.

Descriptive Analytics   what happened

Diagnostic Analytics   why happened

Predictive Analytics   what will happen

Prescriptive Analytics   what can make it happe

Question 20 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A project
manager is analysing past projects to identify the software, hardware and
human resources that will be required to complete a new project
successfully on time. 

  Prescriptive Analytics

  Predictive Analytics

  Descriptive Analytics

  Diagnostic Analytics

Question 21 0.25 / 0.25 pts

The best method for predicting the number of deaths due to covid is

  Regression

  Correlation

  Classification

  Clustering

Incorrect
Question 22 0 / 0.25 pts

We are predicting the weather condition as foggy, warm, cloudy, and misty
at Bangalore using the data collected in the last one month. This task is
an example of

  Classification

  Clustering

  Regression

  Association Rule

Incorrect
Question 23 0 / 0.25 pts

Temperature in kelvin is of __________________ attribute type.

  Nominal attribute

  Ordinal attribute

  Interval attribute

  Ratio attribute

Question 24 0.25 / 0.25 pts

Which of the following is not an example of ordinal attributes?

  Exam Grades

  Academic ranks

  Zip codes

  Military ranks

Question 25 0.25 / 0.25 pts

Data lake mainly stores data in

  Structured format

  none

  both

  Raw format

Question 26 0.25 / 0.25 pts

In a FashionStore Data set the feature ShirtSize { S,M,L,XL,XXL} is an
example of

  Numeric attribute

  Nominal attribute

  Ordinal attribute

  Continuous attribute

Question 27 0.25 / 0.25 pts

Swiggy wants customers to provide their satisfaction feedback on a scale
of 1-5 where

1- Very Unsatisfied; 2- Somewhat Unsatisfied; 3- Neutral; 4- Somewhat
Satisfied; 5- Very Satisfied

What type of attribute is satisfaction?

  Interval attribute

  Nominal attribute

  Ratio attribute

  Ordinal attribute

Question 28 0.25 / 0.25 pts

Missing data in Pandas is represented by

  Null

  Empty

  NaN

  NULL

Question 29 0.25 / 0.25 pts

In a boxplot, where Q1, Q2 and Q3 are the first, second and third quartiles
respectively, the interquartile range IQR is calculated as:

  IQR = Q3 – Q1

  None of the above

  IQR = Q2-Q1

  IQR = Q3-Q2

Question 30 0.25 / 0.25 pts

Histogram analysis algorithms can be applied recursively to generate a
multilevel concept hierarchy.

  True

  False

Question 31 0.25 / 0.25 pts

In which phase, the duplicates (of the data) are removed? Choose the
best possible answer.

  Data Exploration

  Data Preparation

  Data Collection

  Data Understanding

Question 32 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 6500 and 2000, respectively. Apply z-score normalization to
value of 8000.

  .5

  .75

  .7

  .6

Question 33 0.25 / 0.25 pts

For a data analytics task to analyse feedback on her subject for a class of
60 students, a school teacher decided to use the survey submitted by the
ten students who come for tuitions for that subject, at her home. Identify
the type of sampling she is doing.

  Stratified sampling

  Non Probabilistic sampling

  Systematic Sampling

  Sampling without replacement

Question 34 0.25 / 0.25 pts

“Similarity” means:

  A listing of the similar features of a collection of objects.

 
The number of tuples in a database whose attributes have similar values.

  A collection of similar objects.

  Numerical measure of how alike two data objects are.

Question 35 0.25 / 0.25 pts

Histogram analysis algorithms can be based on either equal width or
equal frequency.

  True

  False

Question 36 0.25 / 0.25 pts

A sample is ------------------ if it has approximately the same property of
interest

  Systematic

  Probabilistic

  Representative

  Qualitative

Question 37 0.25 / 0.25 pts

What is the median of this data 2,5,1,6,7?

 5

  5.5

 6

 2
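
The median is the middle value once the data are sorted. A quick check with the standard library:

```python
import statistics

data = [2, 5, 1, 6, 7]

# Sorting makes the middle element visible: the third of five values.
ordered = sorted(data)
median_value = statistics.median(data)

print(ordered)
print(median_value)
```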

Incorrect Question 38 0 / 0.25 pts

Consider the following Python code. 


input=['Havells','Philips','Syska','Eveready','Lloyd']
le = sklearn.preprocessing.LabelEncoder()
le.fit(input)
print(le.transform('Lloyd'))
The last line of the code snippet will print

 2

 3

 1

 4
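
scikit-learn's `LabelEncoder` keeps its classes in sorted order and encodes each label as its position in that order. The sketch below reproduces that mapping in plain Python rather than calling scikit-learn, so `classes` and `encode` are hypothetical names. Note also that `transform` is documented as taking an array-like of labels, so with scikit-learn itself the call would normally be written `le.transform(['Lloyd'])`.

```python
brands = ['Havells', 'Philips', 'Syska', 'Eveready', 'Lloyd']

# Sorted unique labels play the role of the encoder's fitted classes.
classes = sorted(set(brands))

# Each label is encoded as its index in the sorted class list.
encode = {label: index for index, label in enumerate(classes)}

print(classes)
print(encode['Lloyd'])
```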

Question 39 0.25 / 0.25 pts

There are two sets X={10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,
-42, -43, -44}. What is TRUE about the standard deviations of X and Y i.e.
σX and σY respectively?

  σY will be smaller than σX.

  Will be the same.

  Magnitude will be the same but the sign will be different

  σX will be smaller than σY.
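
Standard deviation measures spread around the mean, so shifting or negating every value does not change it, and it is never negative. A small check on the two sets from this question, using the population standard deviation as an assumption (the sample version behaves the same way):

```python
import statistics

X = list(range(10, 25))            # 10, 11, ..., 24
Y = [-v for v in range(30, 45)]    # -30, -31, ..., -44

sigma_x = statistics.pstdev(X)
sigma_y = statistics.pstdev(Y)

print(round(sigma_x, 4), round(sigma_y, 4))
```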

Question 40 0.25 / 0.25 pts

Match the following function usage in Python used in data cleaning.

dropna()   return Index without NA

fillna()   to fill NA/NaN values us

interpolate()   to fill NA values in the d

notnull()   to find missing values

Quiz Score: 8.25 out of 10

12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 11:59pm Points 10 Questions 40
Available Dec 18 at 7pm - Dec 19 at 11:59pm Time Limit 60 Minutes

Instructions
Quiz I can be attempted only once. You will not be provided with a make-up quiz if you miss
this quiz.

The quiz is timed for one hour and has to be completed once you start. You will not be allowed to go
back to the previous question if you skip a question. 

“Choose the most appropriate answer to each question.”

The answers will be visible only after three days once the quiz has ended.

All the best.

Attempt History
Attempt Time Score

LATEST Attempt 1 47 minutes 8.75 out of 10

 Correct answers will be available on Dec 22 at 12am.

Score for this quiz: 8.75 out of 10


Submitted Dec 18 at 8:45pm
This attempt took 47 minutes.

Question 1 0.25 / 0.25 pts

Data science is the process of diverse set of data through ____________

  processing data

  analysing data

  All of the options

  organizing data

Question 2 0.25 / 0.25 pts

Statement 1: Role of a Business analyst usually requires expertise on
building ML models.
Statement 2: Role of a Data Scientist usually requires expertise on
performing descriptive analysis.

Which of the following is right?

  Statement 1 is correct and Statement 2 is wrong

  Both statements are correct

  Statement 1 is wrong and Statement 2 is correct

  Both statements are wrong

Incorrect
Question 3 0 / 0.25 pts

Identify the data mining tasks from the below list.

i. Dividing the customers of a company according to their gender
ii. Predicting the future stock price of a company using historical
records
iii. Monitoring the heart rate of a patient for abnormalities
iv. Extracting the frequencies of a sound wave
v. Sorting a student database based on student identification numbers

  i, ii, iii

  iv, v

  ii, iii

  i, iv, v

Question 4 0.25 / 0.25 pts

Data Science project steps are highly linear.

  True

  False

Question 5 0.25 / 0.25 pts

The tasks of a data scientist include

  Collecting Raw Data

  communicate the results to the stakeholders

  All of the Above

  Identifying relevant features

Question 6 0.25 / 0.25 pts

Data scientist is not responsible for 

  data mining

  building continuous data stream

  data analytics

  data manipulation

Question 7 0.25 / 0.25 pts

Compute the Manhattan distance between A(2,3) and B(5,7).

 5

 8

 6

 7
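
Manhattan distance sums the absolute coordinate differences. A minimal sketch for the two points in this question; the function name `manhattan_distance` is chosen here for illustration.

```python
def manhattan_distance(p, q):
    """Sum of absolute differences between matching coordinates."""
    return sum(abs(a - b) for a, b in zip(p, q))


A = (2, 3)
B = (5, 7)

distance = manhattan_distance(A, B)  # |2 - 5| + |3 - 7|
print(distance)
```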

Question 8 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 3.2. [Answer should have a precision of X.XXXX]

  0.8769

  -0.8679

  0.8679

  -0.8769

Question 9 0.25 / 0.25 pts

Compute the width of each bin for data given below, if the number of
bins is 4.
[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46,
48, 36] 

 4

 3

 5

(49-30)/4

 6
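
A minimal sketch of equal-width binning for the data above, assuming the common definition width = (max - min) / number of bins. Rounding the raw width up to a whole number is an assumption made here so that all values fit into the four bins; other conventions exist.

```python
import math

data = [39, 45, 49, 45, 31, 37, 38, 41, 37, 41,
        39, 34, 35, 30, 47, 43, 44, 46, 48, 36]
num_bins = 4

# Raw equal-width bin size: (49 - 30) / 4 = 4.75
raw_width = (max(data) - min(data)) / num_bins

# Assumption: round up to the next whole number so the bins cover the full range.
width = math.ceil(raw_width)

print(raw_width, width)  # 4.75 5
```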

Question 10 0.25 / 0.25 pts

Compute the Euclidean distance between A(2,3) and B(5,7).

 5

 4

 7

 6
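
For comparison with the Manhattan distance, the Euclidean distance between the same two points can be sketched with the standard library. `math.dist` is available in Python 3.8 and later.

```python
import math

A = (2, 3)
B = (5, 7)

# sqrt((5 - 2)^2 + (7 - 3)^2) = sqrt(9 + 16) = sqrt(25)
distance = math.dist(A, B)
print(distance)
```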

Question 11 0.25 / 0.25 pts

A police team wants to predict the crime rate in a locality based on
certain attributes. Which modelling technique would be appropriate?

  Classification

  Clustering

  Regression

  Optimization

Question 12 0.25 / 0.25 pts

Match with the most appropriate answer related to analytics.

Descriptive Analytics   what happened

Diagnostic Analytics   why it happened

Predictive Analytics   what will happen

Prescriptive Analytics   what can make it happen

Question 13 0.25 / 0.25 pts

Consider a scenario where you feel unwell and go to a doctor. The doctor
asks you questions such as whether you were exposed to rain or a cold
climate, had contact with a sick person, or ate food from outside.
Based on your answers, the doctor comes to a conclusion. This can be
considered analogous to which stage of data analytics?

  Descriptive

  Predictive

  Prescriptive

  Diagnostic

Question 14 0.25 / 0.25 pts

We are predicting the weather condition as foggy, warm, cloudy, and
misty at Bangalore using the data collected in the last one month. This
task is an example of

  Classification

  Association Rule

  Clustering

  Regression

Question 15 0.25 / 0.25 pts

A company wants to find the target segment of people for one of its
products. Which modelling technique would be generally appropriate?

  Clustering

  Classification

  Ranking

  Regression

Question 16 0.25 / 0.25 pts

In the "Business Understanding" phase of the data science process, the
goals are identified and the objectives defined.

  True

  False

Question 17 0.25 / 0.25 pts

Data pre-processing improves data quality and makes data mining
algorithms efficient and effective. What are the data pre-processing
tasks along with data cleaning?

  Data Transformation

  Data Integration

  All of the above.

  Data Reduction

Question 18 0.25 / 0.25 pts

Assess Situation - "Cost and Benefits" - falls into which phase of
CRISP-DM?

  Data Preparation

  Data modeling

  Evaluation

  Business Understanding

Incorrect
Question 19 0 / 0.25 pts

Identify the data analytics task for the following scenario. A software
programmer develops a tool to convert programs written in Verilog to
Python.

  Predictive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

  Descriptive Analytics

Incorrect
Question 20 0 / 0.25 pts

Identify the data analytics task for the following scenario. A student is
analyzing various blogs and vlogs to find out the skill set that has to be
acquired to become a data scientist in the future. 

  Predictive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

  Descriptive Analytics

Question 21 0.25 / 0.25 pts

In 2001, Big Data created the three Vs: volume, velocity, and variety.
The V's have grown to encompass veracity and value in the years
since. Big data is sometimes subjected to a fifth V, which is:

  Volatile

  Vector

  Variability

  Vulnerability

Question 22 0.25 / 0.25 pts

The branch of statistics that deals with the development of statistical
methods is classified as ___.

  None of the mentioned above

  Applied statistics

  Industry statistics

  Economic statistics

Question 23 0.25 / 0.25 pts

What are the best practices for implementing big data analytics
programmes?

  Determining business direction based on data analysis

  Letting go entirely of 'old ideas' related to data management

 
Focusing on business goals and how to use big data analytics
technologies to meet them

 
Adopting data analysis tools based on a laundry list of their capabilities

Question 24 0.25 / 0.25 pts

Which of the following is not an example of ordinal attributes?

  Zip codes

  Exam Grades

  Military ranks

  Academic ranks

Question 25 0.25 / 0.25 pts

Which of the following Python libraries is required for web scraping?

  BeautifulSoup

  WebCrawler

  Scraper

  WebSpider

Question 26 0.25 / 0.25 pts

What is the difference between interval/ratio and ordinal variables?

 
The distance between categories is equal across the range of
interval/ratio data

  Ordinal data can be rank ordered, but interval/ratio data cannot

  Interval/ratio variables contain only two categories

 
Ordinal variables have a fixed zero point, whereas interval/ratio
variables do not

Question 27 0.25 / 0.25 pts

Attributes cannot be called:

  Variables

  Features

  Dimensions

  Data point

Incorrect Question 28 0 / 0.25 pts

In a dataset, we are tracking whether a customer is purchasing a
product or not. This is an example of

  Symmetric attribute

  Discrete attribute

  Continuous attribute

  Asymmetric attribute

Question 29 0.25 / 0.25 pts

Dealing with missing values during data preparation is what kind of
operation?

  Data retrieval

  Data cleansing

  Data transformation

  Data combining

Question 30 0.25 / 0.25 pts

The ____ and standard deviation are strongly affected by outliers

  MEDIAN

  MEAN

  MODE

  RANGE

Question 31 0.25 / 0.25 pts

In which phase are duplicates (of the data) removed? Choose the
best possible answer.

  Data Preparation

  Data Collection

  Data Understanding

  Data Exploration

Question 32 0.25 / 0.25 pts

Which of the following is NOT a visualization technique?

  Matrix

  TreeMap

  Lexeme

  ConeTree

Question 33 0.25 / 0.25 pts

There are two sets X={10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,
-42, -43, -44}. What is TRUE about the standard deviations of X and Y
i.e. σX and σY respectively?

  σX will be smaller than σY.

  Magnitude will be the same but the sign will be different

  Will be the same.

  σY will be smaller than σX.
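
The two standard deviations can be compared numerically with a short sketch. It uses the population standard deviation (`statistics.pstdev`); the same comparison holds for the sample version (`statistics.stdev`), since both sets are 15 consecutive integers and differ only by a shift and a sign flip.

```python
import statistics

X = list(range(10, 25))            # 10, 11, ..., 24
Y = [-v for v in range(30, 45)]    # -30, -31, ..., -44

sigma_x = statistics.pstdev(X)
sigma_y = statistics.pstdev(Y)

# A standard deviation is never negative, and shifting or mirroring the data
# does not change the spread around the mean.
print(sigma_x, sigma_y)
```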

Question 34 0.25 / 0.25 pts

The scatterplot implies that

  The features are positively correlated.

  The features are independent.

  The features are negatively correlated.

Question 35 0.25 / 0.25 pts

Consider the data set below. How will you (most appropriately) handle
the missing values for records 1, 4, and 6, respectively?
Data set description:
price: continuous from 5118 to 45400 (class label).
normalized-losses: continuous from 65 to 256.
num-of-doors: four, two.

  Replace by mode, replace by median, replace by mean.

  Replace by mode, ignore the tuple, replace by mean.

  Ignore all 3 records.

  Ignore the tuple, replace by mean, replace by mode.

Question 36 0.25 / 0.25 pts

For a data analytics task to analyse feedback on her subject for a
class of 60 students, a school teacher decided to use the survey
submitted by the ten students who come for tuitions for that subject at
her home. Identify the type of sampling she is doing.

  Systematic Sampling

  Stratified sampling

  Non Probabilistic sampling

  Sampling without replacement

Question 37 0.25 / 0.25 pts

Which among the following are valid methods of handling missing data?

P. Eliminating Data Objects

Q. Estimating Missing Values

R. Ignoring the Missing Values during Analysis

S. Replacing with all possible values

  Q and R

  All the options are correct

  R only

  P, Q and R

Incorrect Question 38 0 / 0.25 pts

Which is the major task of data integration?

 
For the same real world entity, resolving attribute values from different
sources

  None of the above

  Identification of missing rows for identified key values

  Clustering of similar data from different sources

Question 39 0.25 / 0.25 pts

Which of the following statements are true?

I. The smaller data sets resulting from data reduction require less
memory and processing time.

II. Aggregation provides a high-level view of the data instead of a
low-level view.

III. High-quality data that is aggregated can lead to a high chance of
identifying false positives and negatives.

IV. An advantage of aggregation is the potential loss of interesting
details.

  Statement I and II

  Statement II and III

  Only Statement III

  Statement I and IV

Question 40 0.25 / 0.25 pts

The missing value for categorical attribute is substituted with

  mean of a value

  least frequent attribute value

  none of the above

  most frequent attribute value
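
To illustrate one common way of filling a missing categorical value, the sketch below substitutes the most frequent value (the mode) of the attribute. The column values are hypothetical and only loosely modelled on a "num-of-doors" style attribute; they are not taken from the quiz data.

```python
from collections import Counter

# Hypothetical categorical column; None marks a missing value.
num_of_doors = ["four", "two", None, "four", "four", None, "two"]

# Most frequent observed value, ignoring the missing entries.
observed = [v for v in num_of_doors if v is not None]
mode_value = Counter(observed).most_common(1)[0][0]

# Replace each missing entry with the mode.
filled = [mode_value if v is None else v for v in num_of_doors]
print(mode_value, filled)
```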

Quiz Score: 8.75 out of 10

18/12/2022, 23:02 Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40
Available Dec 18 at 19:00 - Dec 19 at 23:59 Time Limit 60 Minutes

Attempt History
Attempt Time Score
LATEST Attempt 1 60 minutes 6.63 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 6.63 out of 10


Submitted Dec 18 at 22:59
This attempt took 60 minutes.

Question 1 0.25 / 0.25 pts

Tableau software is most likely to be used during

  presentation of idea to stakeholders

  Feature selection

  Deployment

  Data warehousing

Question 2 0.25 / 0.25 pts

Pattern Recognition is a sub-field of Data Science.

  True

  False

Question 3 0.25 / 0.25 pts

Match the following set of roles to their job description

Data Analyst   data collection and in

Data Scientist   solves problems usin

Data Architect   warehousing the data

Data Engineer   implements, test and

Incorrect
Question 4 0 / 0.25 pts

"Take last week's data and predict the sales for the next six months
within the next few days with a 2-person team" is a good example of
which of the following data science challenges?

  Insufficient timing for project completion

  Lack of professionals with sound knowledge of data science skills

  Unrealistic expectations

  Data sizing

Question 5 0.25 / 0.25 pts

The following are the key roles in a data science project

  Data Scientist, Architect, SME, Sponsor, Programmer

  Data Scientist, Analyst, SME, Programmer

  Data Scientist, Analyst

  Data Scientist, Analyst, Sponsor

Question 6 0.25 / 0.25 pts

Data scientist is not responsible for

  Building a continuous data stream

  Data Analytics

  Data manipulation

  Data mining

Question 7 0.25 / 0.25 pts

Suppose that the minimum and maximum values for an attribute are 4.3
and 7.6, respectively. Compute the scaled value of 5.4 if min-max
normalization is applied to scale [0.0,1.0]. [Answer should have a
precision of X.XX]

  0.45

  0.36

  0.43

  0.33
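
The min-max calculation can be sketched as follows, assuming the standard formula (value - min) / (max - min), rescaled to the target range. The function name and its parameter names are our own.

```python
def min_max_scale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    fraction = (value - old_min) / (old_max - old_min)
    return fraction * (new_max - new_min) + new_min


scaled = min_max_scale(5.4, old_min=4.3, old_max=7.6)
print(f"{scaled:.2f}")  # (5.4 - 4.3) / (7.6 - 4.3) = 1.1 / 3.3 = 0.33
```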

Incorrect Question 8 0 / 0.25 pts

Compute the Cosine similarity between A(3,6) and B(4,8).

  1.1

  1.2

  1.0

  1.3
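
A short sketch of the cosine similarity, assuming the usual definition: the dot product divided by the product of the vector lengths. Since B(4,8) is a positive multiple of A(3,6), the two vectors point in the same direction, and cosine similarity can never exceed 1 in any case.

```python
import math


def cosine_similarity(p, q):
    """Return dot(p, q) / (|p| * |q|) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


similarity = cosine_similarity((3, 6), (4, 8))
# dot = 3*4 + 6*8 = 60; |A| * |B| = sqrt(45) * sqrt(80) = sqrt(3600) = 60
print(round(similarity, 4))
```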

Question 9 0.25 / 0.25 pts

Compute the depth of each bin for data given below, if the number of
bins is 5.

[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36, 43,
46]

 4

20/5

 5

 6

 3
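
For equal-depth (equal-frequency) binning, the depth is the number of values placed in each bin. A minimal sketch, assuming the 20 values divide evenly across the 5 bins:

```python
data = [47, 38, 40, 42, 47, 42, 41, 39, 42, 45,
        36, 37, 36, 40, 43, 40, 45, 36, 43, 46]
num_bins = 5

# Depth = number of values per bin: 20 values / 5 bins
depth = len(data) // num_bins

# Sort, then cut into consecutive chunks of `depth` values each.
ordered = sorted(data)
bins = [ordered[i:i + depth] for i in range(0, len(ordered), depth)]

print(depth)  # 4
print(bins)
```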

Question 10 0.25 / 0.25 pts

Find the interquartile range (IQR) of the following inputs:
4, 4, 10, 11, 15, 7, 14, 12, 6

  11

 8

  10

 7
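
One way to work this out is the "median of halves" convention: sort the data, leave out the overall median when the count is odd, and take the medians of the lower and upper halves as Q1 and Q3. This convention is an assumption; interpolation-based quartile methods can give a different IQR for the same data.

```python
from statistics import median

data = [4, 4, 10, 11, 15, 7, 14, 12, 6]
ordered = sorted(data)          # [4, 4, 6, 7, 10, 11, 12, 14, 15]

# With 9 values, the middle value (10) is the median and is excluded
# from both halves under this convention.
half = len(ordered) // 2
lower = ordered[:half]          # [4, 4, 6, 7]
upper = ordered[-half:]         # [11, 12, 14, 15]

q1 = median(lower)              # (4 + 6) / 2 = 5.0
q3 = median(upper)              # (12 + 14) / 2 = 13.0
iqr = q3 - q1

print(q1, q3, iqr)  # 5.0 13.0 8.0
```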

Question 11 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A physics
teacher is analyzing the answer scripts of the students to identify the
areas that he/she should concentrate on so that the students
understand the concepts better.

  Predictive Analytics

  Descriptive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

Question 12 0.25 / 0.25 pts

The process through which businesses analyse customer data or other
types of information in order to find patterns and links between various
data items is known as:

  Customer data management

  Data digging

  Data mining

  Consumer engagement

Question 13 0.25 / 0.25 pts

A scenario where you feel unwell, go to a doctor and explain your
symptoms to the doctor. This can be considered analogous to which
stage of data analytics?

  Prescriptive

  Diagnostic

  Predictive

  Descriptive

Question 14 0.25 / 0.25 pts

In 2001, Big Data created the three Vs: volume, velocity, and variety.
The V's have grown to encompass veracity and value in the years
since. Big data is sometimes subjected to a fifth V, which is:

  Variability

  Volatile

  Vulnerability

  Vector

Question 15 0.25 / 0.25 pts

BITS investigating to determine the cause of decreased admissions for
the CSI PG program is an example of

  Descriptive analysis

  Diagnostic Analysis

  Predictive analysis

  Prescriptive analysis

Question 16 0.25 / 0.25 pts

Which of the following statements is false with respect to a data set?

 
Subsetting can be used to select and exclude variables and
observations

  All of the listed options

 
Merging concerns combining datasets on the same observations to
produce a result

  Raw data should be processed only one time.

Question 17 0.25 / 0.25 pts

We are predicting the humidity at Bangalore using the data collected in
the last one month. This task is an example of

  Regression

  Clustering

  Association Rule

  Classification

Question 18 0.25 / 0.25 pts

The branch of statistics that deals with the development of statistical
methods is classified as ___.

  Applied statistics

  None of the mentioned above

  Industry statistics

  Economic statistics

Incorrect
Question 19 0 / 0.25 pts

Analytics 3.0 consists of

  Descriptive analytics

  Predictive analytics

  None of the above

  Prescriptive analytics

Question 20 0.25 / 0.25 pts

The CRISP-DM methodology is specifically built for IT (Information
Technology) projects.

  True

  False

Question 21 0.25 / 0.25 pts

A company wants to find the target segment of people for one of its
products. Which modelling technique would be generally appropriate?

  Regression

  Ranking

  Classification

  Clustering

Incorrect Question 22 0 / 0.25 pts

Which of the following is not an issue with CRISP-DM model?

 
Various modeling techniques are selected and applied, and their
parameters are calibrated to optimal values

 
Thorough evaluation indeed is needed, yet the CRISP-DM methodology
does not prescribe how to do this.

 
It very much underestimates the amount of real experimentation that is
needed to get at viable results

 
The end-users of the analytical model are required to post-rationalize
the model, which leads to a lot of dissatisfaction

Incorrect
Question 23 0 / 0.25 pts

Is it possible to convert a Nominal scale to an Ordinal Scale during data
analysis?

  True

  False

Question 24 0.25 / 0.25 pts

E-mail is an example of ____________________ data

  Structured

  Semi-structured

  Quasi-structured

  Unstructured

Incorrect
Question 25 0 / 0.25 pts

In a dataset, we are tracking whether a customer is purchasing a
product or not. This is an example of

  Continuous attribute

  Symmetric attribute

  Asymmetric attribute

  Discrete attribute

Incorrect
Question 26 0 / 0.25 pts

Which of the following is not true for SEMMA model?

 
It places less emphasis on the initial planning phases covered in
CRISP-DM (Business Understanding and Data Understanding phases)
and omits entirely the Deployment phase.

 
SEMMA is focused on the model development aspects of data mining.

 
SEMMA is a logical organisation of the functional tool set of SAS
Enterprise Miner for carrying out the core tasks of data mining.

 
The SEMMA model also emphasizes data mining as a non-linear,
adaptive process.

Question 27 0.25 / 0.25 pts

"Identify false statement "

  Profession can be considered nominal

  Subject grade can be considered ordinal

 
Response to 'Do you own a car?' can be considered symmetric binary

 
Response to 'Do you have a rare disease?' can be considered
symmetric binary.

Question 28 0.25 / 0.25 pts

A box plot is the visual representation of the following statistical
summary

  Min, Median, Mode

  Minimum, Average, Maximum

  Minimum, First Quartile, Median, Third Quartile, Maximum

  Minimum, First Quartile, Third Quartile, Second Quartile, Mean
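
The summary statistics a box plot draws can be computed directly. The sketch below is illustrative only: the sample data is hypothetical, and the "inclusive" quartile method is an assumption, since other quartile conventions can give slightly different first and third quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # The 'inclusive' method treats the data as the full population;
    # other quartile conventions may differ slightly.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

# Hypothetical sample, not taken from the quiz.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(sample))  # (1, 3.0, 5.0, 7.0, 9)
```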

Question 29 0.25 / 0.25 pts

Choose the correct empirical relation

  mean−median ≈ 3×(mean−mode)

  mean−mode ≈ 3×(median-mean).

  median-mean ≈ 3×(mean−mode)

  mean−mode ≈ 3×(mean−median).

Question 30 0.25 / 0.25 pts

Dealing with missing values during data preparation is what kind of an
operation

  Data combining

  Data cleansing

  Data transformation

  Data retrieval

Incorrect
Question 31 0 / 0.25 pts

Identify the sampling technique used in the following use case. For ML
classification task, the algorithm requires that the test set has equal
examples from the three categories.

  Simple Random

  Cluster Sampling

  Stratified Random

  Systematic Sampling

Incorrect
Question 32 0 / 0.25 pts

The scatterplot implies that 

  None of the given options

  The features are positively correlated.

  The features are independent

  The features are negatively correlated.

Partial
Question 33 0.13 / 0.25 pts

Which of the following is/are true about data aggregation?

  It preserves all details even after aggregation at all times

  It can work with both quantitative and qualitative attributes

  It provides a high-level view of the data

  It does not complement well with statistical analysis

Question 34 0.25 / 0.25 pts

A table contains the salary details of professionals in different fields,
categorized by field. The table has 100,000 rows. Around 10% of
the rows do not have salary data. It is required to fill in the missing
salary data as part of data pre-processing. Choose from below the best
method for this:

 
Find the mean salary of the available 90% of the data and use that to fill
in all the missing data.

 
Find the field wise mean salary for the available data and fill in the
missing salary data with the applicable mean salary.

  Delete the rows where salary data is missing.

  Guess the missing data manually.
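
The field-wise mean option above can be sketched with the standard library alone. This is a minimal illustration, not a prescribed implementation: the row layout, the key names `field` and `salary`, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def impute_by_group_mean(rows, group_key="field", value_key="salary"):
    """Fill missing values (None) with the mean of the row's own group."""
    observed = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            observed[row[group_key]].append(row[value_key])
    group_means = {group: mean(vals) for group, vals in observed.items()}
    # A group with no observed values at all is left unfilled (None).
    return [
        {**row, value_key: group_means.get(row[group_key])}
        if row[value_key] is None else dict(row)
        for row in rows
    ]

# Hypothetical rows; field names and salaries are made up.
rows = [
    {"field": "law", "salary": 100.0},
    {"field": "law", "salary": 120.0},
    {"field": "law", "salary": None},
    {"field": "nursing", "salary": 60.0},
    {"field": "nursing", "salary": None},
]
filled = impute_by_group_mean(rows)
print([r["salary"] for r in filled])  # [100.0, 120.0, 110.0, 60.0, 60.0]
```

Compared with a single global mean, this keeps each filled value consistent with the typical salary of its own field.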

Question 35 0.25 / 0.25 pts

Data Integration is a 

  Generalization technique

  None of the answers

  Pre-processing technique

  Data Normalization Technique

Incorrect
Question 36 0 / 0.25 pts

Which of the statements is TRUE?

  Outliers should always be addressed in the dataset

  Outliers should be addressed in the test dataset

  Outliers should be addressed only in the training dataset

  Treatment of outliers depends on the problem statement

Question 37 0.25 / 0.25 pts

“Similarity” means:

  Numerical measure of how alike two data objects are.

  A listing of the similar features of a collection of objects.

 
The number of tuples in a database whose attributes have similar
values.

  A collection of similar objects.

Incorrect
Question 38 0 / 0.25 pts

What is the median of this data 2,5,1,6,7?

 5

  5.5

 6

 2
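
A quick way to check a median by hand is to sort the values and take the middle one. A minimal sketch using Python's standard `statistics` module on the data from the question:

```python
import statistics

data = [2, 5, 1, 6, 7]
# The median is the middle value once the data is sorted: [1, 2, 5, 6, 7].
print(sorted(data))             # [1, 2, 5, 6, 7]
print(statistics.median(data))  # 5
```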

Unanswered Question 39 0 / 0.25 pts

In which phase are duplicates in the data removed? Choose the
best possible answer.

  Data Collection

  Data Requirements

  Data Understanding

  Data Preparation

Unanswered
Question 40 0 / 0.25 pts

Missing values should always be imputed before training the model.

  None of the answers

  Mostly True

  True

  False

Quiz Score: 6.63 out of 10

12/18/22, 8:59 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40
Available Dec 18 at 19:00 - Dec 19 at 23:59 Time Limit 60 Minutes

Instructions
The Quiz I can be attempted only once. You will not be provided with a  make-up Quiz, if you miss
this Quiz.

The quiz is timed for one hour and has to be completed once you start. You will not be allowed to go
back to the previous question if you skip a question. 

“Choose the most appropriate answer to each question.”

The answers will be visible only after three days once the quiz has ended.

All the best.

Attempt History
Attempt Time Score

LATEST Attempt 1 60 minutes 7.71 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 7.71 out of 10


Submitted Dec 18 at 20:58
This attempt took 60 minutes.

Question 1 0.25 / 0.25 pts

Due to market expectations, businesses are having difficulty retaining
highly trained data scientists and engineers.

  False

  No answer text provided.

  True

  No answer text provided.

Question 2 0.25 / 0.25 pts

Data Science project steps are highly linear.

  True

  False

Question 3 0.25 / 0.25 pts

Which of the following are correct skills for a Data Scientist?

  Probability & Statistics

  Machine Learning / Deep Learning

  Data Wrangling

  All of the options

Partial Question 4 0.08 / 0.25 pts

Which of the following are the reasons for the sudden growth of
analytics?

 
Large number of user friendly analytics tools available for data
processing

  Data is growing at 40% compound annual rate

  Cost of storage has hugely dropped

  Large number of analysts available in the market

Question 5 0.25 / 0.25 pts

Data scientist is not responsible for

  Building a continuous data stream

  Data manipulation

  Data mining

  Data Analytics

Question 6 0.25 / 0.25 pts

Tableau software is most likely to be used during

  presentation of idea to stakeholders

  Feature selection

  Data warehousing

  Deployment

Question 7 0.25 / 0.25 pts

Compute the width of each bin for data given below, if the number of
bins is 5.

[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46,
48, 36]

 5

 4

(49−30)÷5

 6

 3
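
For equal-width binning, the bin width is the data range divided by the number of bins. The sketch below applies that formula to the data in the question. Rounding up to a whole number is an assumption about the convention expected here, not something the question states.

```python
import math

data = [39, 45, 49, 45, 31, 37, 38, 41, 37, 41,
        39, 34, 35, 30, 47, 43, 44, 46, 48, 36]
num_bins = 5

# Equal-width binning: width = (max - min) / number of bins.
width = (max(data) - min(data)) / num_bins
print(width)             # 3.8
print(math.ceil(width))  # 4, if an integer width is wanted (an assumption)
```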

Question 8 0.25 / 0.25 pts

The value 56739 when scaled using decimal normalization will be
_________.

  56.739

  5673.9

  5.6739

  567.39

Question 9 0.25 / 0.25 pts

Suppose the lab administrator measured the power consumption of
an entire network operations centre (NOC) and the set of consumption
details is 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110
W, 98 W, 210 W and 115 W. What is the mode power consumption?

  150W

  100W

  90W

  98W
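
The mode is the most frequently occurring value, so one extreme reading such as 210 W does not affect it. A minimal sketch on the readings listed in the question:

```python
import statistics

# Power readings in watts, as listed in the question.
readings = [90, 104, 98, 98, 105, 92, 102, 100, 110, 98, 210, 115]
print(statistics.mode(readings))  # 98
```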

Question 10 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 3.2. [Answer should have a precision of X.XXXX]

  -0.8679

  -0.8769

  0.8679

  0.8769
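
Z-score normalization subtracts the mean and divides by the standard deviation. A minimal sketch using the numbers from the question (the function name `z_score` is illustrative):

```python
def z_score(value, mean, std_dev):
    """Standardize a value: (value - mean) / standard deviation."""
    return (value - mean) / std_dev

z = z_score(3.2, mean=8.9, std_dev=6.5)
print(f"{z:.4f}")  # -0.8769
```

The sign is negative because 3.2 lies below the mean of 8.9.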

Question 11 0.25 / 0.25 pts

In _________ phase, the final report/technical document of the process is
prepared

  None

  Operationalize

  Model building

  Model planning

Question 12 0.25 / 0.25 pts

In 2001, Big Data created the three Vs: volume, velocity, and variety.
The V's have grown to encompass veracity and value in the years
since. Big data is sometimes subjected to a fifth V, which is:

  Vector

  Variability

  Vulnerability

  Volatile

Question 13 0.25 / 0.25 pts

Match with the most appropriate answer related to analytics.

Descriptive Analytics   what happened

Diagnostic Analytics   why it happened

Predictive Analytics   what will happen

Prescriptive Analytics   what can make it happ

Question 14 0.25 / 0.25 pts

BITS investigating the cause of decreased admissions for the
CSI PG program is an example of

  Predictive analysis

  Prescriptive analysis

  Diagnostic Analysis

  Descriptive analysis

Question 15 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A
pharmaceutical organization is developing a new drug or vaccine to
combat Covid-19 using machine learning techniques where the data is
from the existing drugs and the diseases they can fight or cure.

  Descriptive Analytics

  Predictive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

Question 16 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. The sales of
various products of your organization per month per geographical area
is reported using an interactive visual tool. 

  Descriptive Analytics

  Prescriptive Analytics

  Predictive Analytics

  Diagnostic Analytics

Question 17 0.25 / 0.25 pts

Which of the following are classification problems?

 
Finding the shorter path between two already-existing routes between
two locations.

 
Predict traffic congestion along a specific route between two locations
using vehicle journey times.

 
Calculating a room's temperature (in Celsius) based on other
environmental factors (such as atmospheric pressure, humidity etc).

  filtering spam from mails

 
Predicting if a cricket player is a batsman or bowler, given his playing
record.

Question 18 0.25 / 0.25 pts

Consider a scenario where you feel unwell and go to a doctor. The doctor asks
you questions such as whether you were exposed to rain or cold weather, had
contact with a sick person, or ate food from outside. Based on your answers,
the doctor comes to a conclusion. This can be considered analogous to which
stage of data analytics?

  Diagnostic

  Descriptive

  Predictive

  Prescriptive

Incorrect
Question 19 0 / 0.25 pts

The 5 stages of the KDD process are:

1. Selection
2. Preprocessing
3. Transformation
4. Data Mining
5. Interpretation/Evaluation

Identify the CRISP-DM phases that correspond to Stages 3 and 4 of
KDD.

  Data Preparation, Modeling

  Data Understanding, Evaluation

  Modeling, Data Preparation

  Evaluation, Business Understanding

Question 20 0.25 / 0.25 pts

Match with the most appropriate answer, related to the tools
available to a Data Scientist.

Cassandra   Big-data

Tableau   Visualization

SAS   Statistics

Weka   Machine Learning

Incorrect
Question 21 0 / 0.25 pts

Which of the following statement(s) is/are correct? (Choose the most
appropriate answer.)

  All the statements

  None of the statements

 
Data analytics is the pursuit of extracting meaning from raw data using
specialized computer systems.

 
Data Analytics refers to the techniques used to analyze data to enhance
productivity and business gain.

 
Analytics is a process in which a computer examines information using
mathematical methods to find useful patterns.

Incorrect
Question 22 0 / 0.25 pts

Which of the following statements are true about data cleaning?

  It focuses on removing inaccurate data from your data set

  All of the given options

 
It focuses on transforming the data’s format by converting raw data into
another format

  It enhances the data’s accuracy and integrity

Question 23 0.25 / 0.25 pts

Which of the following Python libraries is required for web scraping?

  WebSpider

  BeautifulSoup

  Scraper

  WebCrawler

Question 24 0.25 / 0.25 pts

Which of the following is an example of raw data?

  original swath files generated from a sonar system

  all of the mentioned

  a real-time GPS-encoded navigation file

  initial time-series file of temperature values

Question 25 0.25 / 0.25 pts

Is it possible to convert an interval variable to an ordinal variable?

  True

  False

Question 26 0.25 / 0.25 pts

In a FashionStore dataset, the feature Jacket_Shade {Grey, Brown,
Black, Indigo, Beige, Khaki} is an example of

  Ordinal attribute

  Continuous attribute

  Numeric attribute

  Nominal attribute

Incorrect Question 27 0 / 0.25 pts

As part of a survey in a large organization, one of the features that you
capture is designation. This type of data has the characteristic

  Nominal, Quantitative, Discrete

  Discrete, Quantitative, Ordinal

  None of the given answers

  Discrete, Qualitative, Ordinal

Question 28 0.25 / 0.25 pts

Raw data should be processed only one time.

  True

  False

Question 29 0.25 / 0.25 pts

In which phase are duplicates in the data removed? Choose the
best possible answer.

  Data Requirements

  Data Preparation

  Data Understanding

  Data Collection

Partial Question 30 0.13 / 0.25 pts

Match the following function usage in Python used in data cleaning.

dropna()   return Index without NA

fillna()   to fill NA/NaN values u

interpolate()   to find missing values

notnull()   to fill NA values in the
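
For reference, a short sketch of what these four pandas methods do when called on a `Series`. The sample series is hypothetical, and each call is shown with default arguments only.

```python
import pandas as pd

# Hypothetical series with one missing value.
s = pd.Series([1.0, None, 3.0])

print(s.dropna().tolist())       # [1.0, 3.0]           entries with NA removed
print(s.fillna(0.0).tolist())    # [1.0, 0.0, 3.0]      NA replaced by a constant
print(s.interpolate().tolist())  # [1.0, 2.0, 3.0]      NA estimated linearly
print(s.notnull().tolist())      # [True, False, True]  mask of non-missing entries
```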

Question 31 0.25 / 0.25 pts

A table contains the salary details of professionals in different fields,
categorized by field. The table has 100,000 rows. Around 10% of
the rows do not have salary data. It is required to fill in the missing
salary data as part of data pre-processing. Choose from below the best
method for this:

 
Find the mean salary of the available 90% of the data and use that to fill
in all the missing data.

 
Find the field wise mean salary for the available data and fill in the
missing salary data with the applicable mean salary.

  Guess the missing data manually.

  Delete the rows where salary data is missing.

Incorrect Question 32 0 / 0.25 pts

A data object is described by a categorical attribute with four
categories. If, for data analysis purposes, we want to transform the
attribute into numerical values, then

  D. it can be done by encoding using 3 or 4 binary variables

  A. it can’t be done

  C. it can be done by encoding using only 4 binary variables

  B. it can be done by encoding using only 3 binary variables
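
To make the two encodings concrete, the sketch below encodes a four-category attribute with one indicator per category, and with the first category dropped as an all-zeros baseline (often called dummy coding). The helper `one_hot` and the category names are hypothetical.

```python
def one_hot(value, categories, drop_first=False):
    """Encode a category as binary indicators, one per category.

    With drop_first=True the first category is the all-zeros baseline,
    so k categories need only k - 1 indicators (dummy coding).
    """
    levels = categories[1:] if drop_first else categories
    return [1 if value == level else 0 for level in levels]

# Hypothetical four-category attribute.
colors = ["red", "green", "blue", "black"]
print(one_hot("blue", colors))                   # [0, 0, 1, 0]
print(one_hot("blue", colors, drop_first=True))  # [0, 1, 0]
print(one_hot("red", colors, drop_first=True))   # [0, 0, 0]
```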

Incorrect Question 33 0 / 0.25 pts

One Hot Encoding scales well as the number of class labels increases

  Most of the time

  None of the given statements

  The statement is false

  Some of the time

Question 34 0.25 / 0.25 pts

Dealing with missing values during data preparation is what kind of an
operation

  Data cleansing

  Data retrieval

  Data combining

  Data transformation

Question 35 0.25 / 0.25 pts

Which among the following are valid methods of handling missing data

       P. Eliminating Data Objects

      Q. Estimating Missing Values

      R. Ignoring the Missing Values during Analysis

      S. Replacing with all possible values

  Q and R

  P, Q and R

  R only

  All the options are correct

Incorrect Question 36 0 / 0.25 pts

Identify the sampling technique used in the following use case. For ML
classification task, the algorithm requires that the test set has equal
examples from the three categories.

  Stratified Random

  Systematic Sampling

  Simple Random

  Cluster Sampling

Question 37 0.25 / 0.25 pts

Data Integration is a 

  Data Normalization Technique

  Generalization technique

  None of the answers

  Pre-processing technique

Question 38 0.25 / 0.25 pts

Which of the following methods are considered to be the best practice
for data cleaning?

  All of the given options

  cleansing large dataset without segmentation

  Sorting data by attributes

  By breaking large dataset into small data

Question 39 0.25 / 0.25 pts

A sample is ------------------ if it has approximately the same property of
interest

  Probabilistic

  Systematic

  Qualitative

  Representative

Incorrect Question 40 0 / 0.25 pts

Exploratory data analysis does not help in

  Finding out the data type of a variable

  In univariate and bivariate variable analysis

  Finding statistical estimates of a variable

  Derivation of new attributes

Quiz Score: 7.71 out of 10

12/18/22, 9:41 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40
Available Dec 18 at 19:00 - Dec 19 at 23:59 Time Limit 60 Minutes

Instructions
The Quiz I can be attempted only once. You will not be provided with a  make-up Quiz, if you miss
this Quiz.

The quiz is timed for one hour and has to be completed once you start. You will not be allowed to go
back to the previous question if you skip a question. 

“Choose the most appropriate answer to each question.”

The answers will be visible only after three days once the quiz has ended.

All the best.

Attempt History
Attempt Time Score

LATEST Attempt 1 59 minutes 7.75 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 7.75 out of 10


Submitted Dec 18 at 21:40
This attempt took 59 minutes.

Question 1 0.25 / 0.25 pts

In which of the following are analysts allocated to units throughout the
organization, with their activities coordinated by a central entity?

  Consulting

  Coordinational

  Functional

  Center of Excellence

Question 2 0.25 / 0.25 pts

Statement 1: Business Intelligence involves analyzing past data and
reporting on it.

Statement 2: Descriptive analysis involves analyzing past data and
reporting on it.

Which of the following is right?

  Both statements are correct

  Both statements are wrong

  Statement 1 is correct and Statement 2 is wrong

  Statement 1 is wrong and Statement 2 is correct

Question 3 0.25 / 0.25 pts

Which of the following are correct skills for a Data Scientist?

  Machine Learning / Deep Learning

  All of the options

  Probability & Statistics

  Data Wrangling

Incorrect Question 4 0 / 0.25 pts

Which of the following is not a tool used by a Data Scientist?

  Hadoop

  MongoDB

  Flask

  Amazon S3

Question 5 0.25 / 0.25 pts

Due to market expectations, businesses are having difficulty retaining
highly trained data scientists and engineers.

  True

  No answer text provided.

  No answer text provided.

  False

Question 6 0.25 / 0.25 pts

Which of the following is not an application for data science?

  Recommendation Systems

  Privacy Checker

  Image & Speech Recognition

  Online Price Comparison

Question 7 0.25 / 0.25 pts

Compute the depth of each bin for the data given below, if the number
of bins is 5.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36,
43, 46]

  2.2

  2.3

  2.17

  2.02

Question 8 0.25 / 0.25 pts

The value 56739 when scaled using decimal normalization will be
_________.

  56.739

  567.39

  5673.9

  5.6739

Question 9 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 10.7. 

  0.2679

  -0.2679

  0.2769

  -0.2769

Question 10 0.25 / 0.25 pts

Suppose that the minimum and maximum values for an attribute are
4.3 and 7.6, respectively. Compute the scaled value of 5.4 if min-max
normalization is applied to scale [0.0, 1.0]. [Answer should have a
precision of X.XX]

  0.38

  0.30

  0.33

  0.29
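
Min-max normalization maps a value linearly from its original range onto the target range. A minimal sketch using the numbers from the question (the function and parameter names are illustrative):

```python
def min_max_scale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)

scaled = min_max_scale(5.4, old_min=4.3, old_max=7.6)
print(f"{scaled:.2f}")  # 0.33
```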

Question 11 0.25 / 0.25 pts

Match with the most appropriate answer, related to the tools available
to a Data Science life cycle.

SMAM   8 stages

SEMMA   5 stages

Big data life cycle   12 phases

CRISP-DM   6 stages

Question 12 0.25 / 0.25 pts

Which of the following statement(s) is/are correct? (Choose the most
appropriate answer.)

  None of the statements

 
Analytics is a process in which a computer examines information using
mathematical methods to find useful patterns.

 
Data Analytics refers to the techniques used to analyze data to enhance
productivity and business gain.

  All the statements

 
Data analytics is the pursuit of extracting meaning from raw data using
specialized computer systems.

Question 13 0.25 / 0.25 pts

Consider a scenario where you feel unwell and go to a doctor. After a detailed
diagnosis, the doctor concludes that it is a regular fever and sends you
home with medicines. He also instructs you to rest, drink plenty
of fluids, and return to the hospital if the fever doesn't subside in a
week. This can be considered analogous to which stage of data
analytics?

  Prescriptive

  Predictive

  Descriptive

  Diagnostic

Question 14 0.25 / 0.25 pts

Which of the sentences below best describes predictive analytics?

 
a gateway that offers access to a variety of vital information from many
different sources on one screen

 
methods for predicting future behavior using statistical analysis and
data mining, particularly to maximize the strategic value of corporate
intelligence

 
a method that uses feature analysis to predict people in photos and
tags them to other photos on its own

 
software applications, also known as "bots," that are dispatched to carry
out a mission and gather data from web pages on behalf of a user

Question 15 0.25 / 0.25 pts

A scenario where you feel unwell, go to a doctor and explain your
symptoms to the doctor. This can be considered analogous to which
stage of data analytics?

  Prescriptive

  Predictive

  Descriptive

  Diagnostic

Question 16 0.25 / 0.25 pts

Find the odd term.

  Data Wrangling

  Artificial Intelligence

  Deep Learning

  Machine Learning

Incorrect
Question 17 0 / 0.25 pts

Which of the following statements are true about data cleaning?

  All of the given options

  It enhances the data’s accuracy and integrity

 
It focuses on transforming the data’s format by converting raw data into
another format

  It focuses on removing inaccurate data from your data set

Question 18 0.25 / 0.25 pts

In _________ phase, the final report/technical document of the process is
prepared

  Model building

  None

  Model planning

  Operationalize

Incorrect
Question 19 0 / 0.25 pts

Identify the data analytics task for the following scenario. A project
manager is analysing past projects to identify the software, hardware
and human resources that will be required to complete a new project
successfully on time. 

  Diagnostic Analytics

  Prescriptive Analytics

  Descriptive Analytics

  Predictive Analytics

Question 20 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A program
manager wants to analyze user clickstream data from the mobile app to
understand how many users were using a particular feature that was
rolled out. 

  Diagnostic Analytics

  Descriptive Analytics

  Predictive Analytics

  Prescriptive Analytics

Question 21 0.25 / 0.25 pts

Big Data cannot be stored in the Storage Area Network (SAN)?

  True

  False

Question 22 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A mother
analysed the answer scripts of her daughter and found that they should
concentrate on reading comprehension to score better in the upcoming
exams. 

  Prescriptive Analytics

  Predictive Analytics

  Diagnostic Analytics

  Descriptive Analytics

Question 23 0.25 / 0.25 pts

Which of the following is an example of raw data?

  all of the mentioned

  original swath files generated from a sonar system

  a real-time GPS-encoded navigation file

  initial time-series file of temperature values

Question 24 0.25 / 0.25 pts

In a dataset, it is observed that dates are recorded as 11/19/2010 and
19th November ,2012. This is an example of

  Duplicate Data

  Inconsistent data

  Noisy data

  Incomplete data
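
Mixed date formats like these are usually resolved by parsing every value into one canonical representation. The sketch below handles only the two styles quoted in the question. The helper name `parse_date` and the list of formats are assumptions, and `%B` matches English month names only under an English or default locale.

```python
import re
from datetime import datetime

def parse_date(text):
    """Parse the two date styles from the question into a date object."""
    # Drop ordinal suffixes such as "19th" -> "19".
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", text.strip())
    # Formats assumed here: month/day/year and "day Month ,year".
    for fmt in ("%m/%d/%Y", "%d %B ,%Y"):
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

print(parse_date("11/19/2010").isoformat())           # 2010-11-19
print(parse_date("19th November ,2012").isoformat())  # 2012-11-19
```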

Question 25 0.25 / 0.25 pts

Swiggy wants customers to provide their satisfaction feedback on a
scale of 1-5 where

1- Very Unsatisfied; 2- Somewhat Unsatisfied; 3- Neutral; 4- Somewhat
Satisfied; 5- Very Satisfied

What type of attribute is satisfaction?

  Nominal attribute

  Ratio attribute

  Ordinal attribute

  Interval attribute

Question 26 0.25 / 0.25 pts

Dress color is of __________________ attribute type.

  Nominal attribute

  Ordinal attribute

  Interval attribute

  Ratio attribute

Incorrect Question 27 0 / 0.25 pts

Regression techniques can be used for 

  Either Identifying Missing Values or predicting continuous

  Identifying missing values

  None of the answers

  Predicting continuous output

Question 28 0.25 / 0.25 pts

Attributes cannot be called as:

  Variables

  Data point

  Features

  Dimensions

Question 29 0.25 / 0.25 pts

Which of the following does not represent a bin obtained by applying
equi-width binning on the data [24, 0, 6, 60, 63, 30, 87, 90, 87]?

  [0, 24, 30]

  [30, 60, 63]

  [87, 87, 90]

  [0, 6, 24]

Question 30 0.25 / 0.25 pts

Scaling a variable is not an essential criterion when the ML pipeline uses
algorithms based on gradient descent optimization

  True

  False

Incorrect Question 31 0 / 0.25 pts

A table contains the salary details of professionals in different fields,
categorized by field. The table has 100,000 rows. Around 10% of
the rows do not have salary data. It is required to fill in the missing
salary data as part of data pre-processing. Choose from below the best
method for this:

 
Find the mean salary of the available 90% of the data and use that to fill
in all the missing data.

  Delete the rows where salary data is missing.

  Guess the missing data manually.

 
Find the field wise mean salary for the available data and fill in the
missing salary data with the applicable mean salary.

Question 32 0.25 / 0.25 pts

Which of the following does not represent a bin created by applying
equi-width binning on the data [24, 0, 6, 60, 63, 30, 87, 90, 87]?

  [0,6,24]

  [30,60,63]

  [0,24,30]

  [87,87,90]

Incorrect Question 33 0 / 0.25 pts

Identify the sampling technique used in the following use case. For ML
classification task, the algorithm requires that the test set has equal
examples from the three categories.

  Cluster Sampling

  Simple Random

  Stratified Random

  Systematic Sampling

Question 34 0.25 / 0.25 pts

Consider the following Python code.

import sklearn.preprocessing

lb = sklearn.preprocessing.LabelBinarizer()
print(lb.fit_transform(['yes', 'no', 'no', 'yes']))

The last line of the code snippet will print

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 15/18
12/18/22, 9:41 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

  [true, false, false, true]

  [True, False, False, True]

  [0,1,1,0]

  [1,0,0,1]
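For readers without scikit-learn at hand, the two-class behaviour can be approximated in plain Python. This is a sketch, not the library's implementation: `binarize_two_classes` is a hypothetical helper that assumes the classes are sorted and the second one in sorted order is encoded as 1, returned as a single column.

```python
def binarize_two_classes(labels):
    """Map a two-class label list to a column of 0/1 values.

    Classes are sorted, and the second one in sorted order is encoded as 1.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    positive = classes[1]
    return [[1 if label == positive else 0] for label in labels]

encoded = binarize_two_classes(['yes', 'no', 'no', 'yes'])
print(encoded)  # [[1], [0], [0], [1]]
```

Under that assumption 'yes' sorts after 'no' and is therefore the class encoded as 1.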

Incorrect Question 35 0 / 0.25 pts

For exploring continuous data using descriptive statistics, which of the
following methods is used?

  Histogram

  Range

  Percentage

  Frequency

Question 36 0.25 / 0.25 pts

“Proximity” in Data Science terms means:

  A measure of the physical distance between two objects.

 
The area surrounding an object where the object is able to exert its
influence.

  The extent of similarity or dissimilarity between two objects.

  None of the above.


Question 37 0.25 / 0.25 pts

In a box and whisker plot of data, point out the FALSE statement about
outliers.

 
Outliers are beyond 1.5 times the Inter Quartile Range (IQR) from the
upper quartile

 
Outliers are beyond 1.5 times the Inter Quartile Range (IQR) from the
lower quartile

 
Outliers are beyond 1.5 times the Inter Quartile Range (IQR) from the
median

  Outliers are beyond the lowest or the highest value in the dataset
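The 1.5 x IQR rule referred to in the options can be sketched as below. The readings are illustrative values chosen to include one extreme point, and quartile conventions differ between tools; this sketch uses the default method of Python's `statistics.quantiles`.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Illustrative readings (an assumption, not taken from this question).
readings = [90, 104, 98, 98, 105, 92, 102, 100, 110, 98, 200, 115]
outliers = iqr_outliers(readings)
```

Note that the fences are measured from the lower and upper quartiles, not from the median.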

Incorrect Question 38 0 / 0.25 pts

Find the Jaccard coefficient for the 2 data objects with the below
feature vectors.

 0

  NONE

 1

  .7
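The feature vectors for this question are not reproduced in the text, so the computation can only be illustrated with made-up data. The sketch below assumes binary vectors and the usual definition J = f11 / (f01 + f10 + f11); the vectors `p`, `q`, `r`, and `s` are hypothetical.

```python
def jaccard_coefficient(x, y):
    """Jaccard coefficient for two equal-length binary vectors.

    J = f11 / (f01 + f10 + f11); positions where both are 0 are ignored.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    mismatches = sum(1 for a, b in zip(x, y) if a != b)  # f01 + f10
    if f11 + mismatches == 0:
        return 0.0  # convention chosen here for two all-zero vectors
    return f11 / (f11 + mismatches)

# Hypothetical vectors, not the ones from the question.
p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
r = [1, 1, 0, 1]
s = [1, 0, 0, 1]

print(jaccard_coefficient(p, q))  # 0.0
print(jaccard_coefficient(r, s))
```

The first pair shares no 1-positions, so the coefficient is 0; the second pair shares two of the three positions where at least one vector is 1.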


Incorrect Question 39 0 / 0.25 pts

A data object is described by a categorical attribute having four
categories. If we want to transform the attribute into numerical values
for data analysis, then

  C. it can be done by encoding using only 4 binary variables

  A. it can’t be done

  B. it can be done by encoding using only 3 binary variables

  D. it can be done by encoding using 3 or 4 binary variables
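Two common ways of turning a four-category attribute into binary variables can be sketched as follows. The category names and the helpers `one_hot` and `dummy` are hypothetical; the sketch only illustrates that one scheme uses one variable per category and the other drops a baseline category.

```python
def one_hot(value, categories):
    """One binary variable per category (k variables for k categories)."""
    return [1 if value == c else 0 for c in categories]

def dummy(value, categories):
    """Drop the first category as the baseline (k - 1 variables)."""
    return one_hot(value, categories)[1:]

# Hypothetical four-category attribute.
categories = ["north", "south", "east", "west"]
for value in categories:
    print(value, one_hot(value, categories), dummy(value, categories))
```

In the dummy scheme the baseline category is represented by all zeros, so no information is lost by using three variables instead of four.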

Question 40 0.25 / 0.25 pts

Which of the following methods are considered to be the best practice
for data cleaning?

  By breaking large dataset into small data

  cleansing large dataset without segmentation

  Sorting data by attributes

  All of the given options

Quiz Score: 7.75 out of 10

12/18/22, 10:29 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40
Available Dec 18 at 19:00 - Dec 19 at 23:59 Time Limit 60 Minutes

Attempt History
Attempt Time Score

LATEST Attempt 1 55 minutes 7.17 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 7.17 out of 10


Submitted Dec 18 at 22:28
This attempt took 55 minutes.

Question 1 0.25 / 0.25 pts

Which of the following is not a tool used by a Data Scientist?

  SAS

  Python

 R


  Springboot

Question 2 0.25 / 0.25 pts

Match the following set of roles to their job description

Data Analyst   data collection and inte

Data Scientist   solves problems using

Data Architect   warehousing the data

Data Engineer   implements, test and m

Question 3 0.25 / 0.25 pts

Which of the following is performed by Data Scientist?

  Create reproducible code

  Challenge results

  Define the question

  All of the mentioned

Question 4 0.25 / 0.25 pts


Which one of the following statements is not true?

  Data Science helps to translate data into a story.

  A part of data science is the machine learning algorithms.

  Data Science helps study Big Data.

  Data Science does not find patterns in the data.

Question 5 0.25 / 0.25 pts

Who among the following is responsible for presenting the idea to
stakeholders and representing the data team with those unfamiliar with
statistics?

  Data Engineer

  Data Architect

  Data Visualization Engineer

  Data Journalist

Incorrect Question 6 0 / 0.25 pts

Which of the following is not a tool used by a Data Scientist?

  Flask

  Hadoop

  MongoDB


  Amazon S3

Question 7 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 3.2. [Answer should have a precision of X.XXXX]

  -0.8679

  0.8769

  0.8679

  -0.8769
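The z-score computation in this question can be checked with a few lines of Python. The function name `z_score` is an illustrative choice; the formula is the standard (value - mean) / standard deviation.

```python
def z_score(value, mean, std_dev):
    """Standardize a value: (value - mean) / standard deviation."""
    return (value - mean) / std_dev

z = z_score(3.2, mean=8.9, std_dev=6.5)
print(f"{z:.4f}")  # -0.8769
```

The result is negative because 3.2 lies below the mean of 8.9.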

Question 8 0.25 / 0.25 pts

Suppose the administrator measured the power consumption of an
entire network operations centre (NOC) and the consumption details
are: 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110 W, 98
W, 200 W and 115 W. What is the range of power consumption?

  100W

  98W

  150W

  110W

Question 9 0.25 / 0.25 pts


Suppose that the minimum and maximum values for an attribute are
4.3 and 7.6, respectively. Compute the scaled value of 5.4 if min-max
normalization is applied to scale [0.0,1.0]. [Answer should have a
precision of X.XX]

  0.29

  0.33

  0.38

  0.30

Question 10 0.25 / 0.25 pts

Suppose that the minimum and maximum values for the attribute
income are $12,000 and $98,000, respectively. The new range is
[0.0,1.0]. Apply min-max normalization to a value of $73,600.

  0.758

  0.716

  0.561

  0.856
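The two min-max questions above follow the same formula, which can be sketched as a small function. The name `min_max_scale` and its parameter names are illustrative choices, not taken from the quiz.

```python
def min_max_scale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] onto [new_min, new_max]."""
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)

attribute_scaled = min_max_scale(5.4, 4.3, 7.6)
income_scaled = min_max_scale(73_600, 12_000, 98_000)
print(f"{attribute_scaled:.2f}")  # 0.33
print(f"{income_scaled:.3f}")     # 0.716
```

For the income case the arithmetic is (73,600 - 12,000) / (98,000 - 12,000) = 61,600 / 86,000.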

Incorrect Question 11 0 / 0.25 pts

Big data analytics does not benefit a company:

  Better understand customers

  Refine marketing and advertising


  Increase costs due to additional analytics investment

  Increase shareholder dividends

Question 12 0.25 / 0.25 pts

Google tries to differentiate emails as spam and non-spam; this is an
example of

  Association Rule

  Clustering

  Classification

  Regression

Partial Question 13 0.17 / 0.25 pts

Which of the following are classification problems?

 
Finding the shorter path between two already-existing routes between
two locations.

  filtering spam from mails

 
Predict traffic congestion along a specific route between two locations
using vehicle journey times.


 
Calculating a room's temperature (in Celsius) based on other
environmental factors (such as atmospheric pressure, humidity etc).

 
Predicting if a cricket player is a batsman or bowler, given his playing
record.

Question 14 0.25 / 0.25 pts

Suppose a web user visits Flipkart during big billion-day sales.
Predicting whether he / she makes a purchase of a smartphone is a
___________________ task.

  Reinforcement

  Association

  Classification

  Regression

Question 15 0.25 / 0.25 pts

CRISP-DM methodology is specifically built for IT (Information
Technology) projects.

  True

  False


Question 16 0.25 / 0.25 pts

What is the name of the Google-developed programming framework
that enables the creation of applications for processing big data sets in
a distributed computing environment?

  ZooKeeper

  MapReduce

  Hive

  Google Cloud Dataproc

Question 17 0.25 / 0.25 pts

BITS investigating to determine the cause for decreased admissions for
CSI PG program is an example of

  Prescriptive analysis

  Diagnostic Analysis

  Descriptive analysis

  Predictive analysis

Incorrect Question 18 0 / 0.25 pts

Identify the data analytics task for the following scenario. An
e-commerce platform is recommending products to their customers to
improve the shopping experience.


  Prescriptive Analytics

  Diagnostic Analytics

  Predictive Analytics

  Descriptive Analytics

Question 19 0.25 / 0.25 pts

Analyzing the data to determine why some phenomena related to
learning happened is a type of

  Predictive

  Diagnostic

  Descriptive

  Prescriptive

Incorrect Question 20 0 / 0.25 pts

A retail store realizes its sales were lower than expected in the last
quarter. A data scientist helps identify whether the sales were
affected uniformly across all segments or restricted to one segment.
What kind of analytics is this?

  Predictive

  Prescriptive

  Descriptive

  Diagnostic


Question 21 0.25 / 0.25 pts

Which one of the following statement(s) is correct (Choose the most
appropriate answer)?

  None of the statements

 
Data Analytics refers to the techniques used to analyze data to enhance
productivity and business gain.

 
Data analytics is the pursuit of extracting meaning from raw data using
specialized computer systems.

 
Analytics is a process in which a computer examines information using
mathematical methods to find useful patterns.

  All the statements

Incorrect Question 22 0 / 0.25 pts

A course instructor has data about students' attendance in her course in
the past semester. What kind of analytics is she performing when she
creates a line graph based on this data?

  Prescriptive

  Predictive

  Descriptive


  Diagnostic

Question 23 0.25 / 0.25 pts

Attributes cannot be called as:

  Features

  Dimensions

  Data point

  Variables

Incorrect Question 24 0 / 0.25 pts

Regression techniques can be used for 

  Either Identifying Missing Values or predicting continuous

  None of the answers

  Predicting continuous output

  Identifying missing values

Question 25 0.25 / 0.25 pts

In a dataset, CarColor is one of the attributes and it can take the
following values: {Red, Green, Yellow, Black}. What type of attribute is
CarColor?


  Interval attribute

  Ordinal attribute

  Nominal attribute

  Ratio attribute

Question 26 0.25 / 0.25 pts

Raw data should be processed only one time.

  True

  False

Question 27 0.25 / 0.25 pts

Identify the false statement.

  Profession can be considered nominal

 
Response to 'Do you own a car?' can be considered symmetric binary

  Subject grade can be considered ordinal

 
Response to 'Do you have a rare disease?' can be considered
symmetric binary.


Question 28 0.25 / 0.25 pts

Temperature is an example of a ____________________ attribute

  Continuous

  Normal

  Asymmetric

  Discrete

Question 29 0.25 / 0.25 pts

Mathematical models for epidemics show that the initial
phase exhibits exponential growth. This can be verified by plotting the
number of infections on a log scale. This is a use case for which
technique?

  Sampling

  Discretization

  Aggregation

  Transformation
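Why a log scale exposes exponential growth can be shown numerically: after taking logarithms, equal time steps add a constant amount. The case counts below are hypothetical and double every day purely for illustration.

```python
import math

# Hypothetical case counts that double every day (pure exponential growth).
cases = [10 * 2 ** day for day in range(8)]

# Log transformation: exponential growth becomes a straight line.
log_cases = [math.log(c) for c in cases]

# Day-to-day differences of the log values are constant (the slope of that line).
steps = [b - a for a, b in zip(log_cases, log_cases[1:])]
print(steps)
```

Every step is close to log(2), so the points fall on a straight line when plotted against the day number; real epidemic data would only approximate this.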

Question 30 0.25 / 0.25 pts

Which of the statements is TRUE?

  Outliers should be addressed only in the training dataset

  Treatment of outliers depends on the problem statement


  Outliers should be addressed in the test dataset

  Outliers should always be addressed in the dataset

Question 31 0.25 / 0.25 pts

Which of the following is NOT a visualization technique?

  Lexeme

  TreeMap

  Matrix

  ConeTree

Question 32 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 6500 and 2000, respectively. Apply z-score normalization
to a value of 8000.

  .7

  .75

  .6

  .5

Question 33 0.25 / 0.25 pts


Histogram analysis algorithms can be applied recursively to generate a
multilevel concept hierarchy.

  True

  False

Incorrect Question 34 0 / 0.25 pts

In the table given below, the requirement is to get the name,
gender, and marks of the top-scoring students only. Which of the
following functionalities of data wrangling is used?

  Replace

  Data exploration

  Filter

  Reshape
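The table for this question is not reproduced in the text. Purely as an illustration of the requirement as worded, selecting the top-scoring rows and keeping three columns might look like the sketch below; the student records and column names are hypothetical.

```python
# Hypothetical table, not the one from the question.
students = [
    {"name": "Asha", "gender": "F", "city": "Pune", "marks": 91},
    {"name": "Ravi", "gender": "M", "city": "Delhi", "marks": 78},
    {"name": "Meera", "gender": "F", "city": "Kochi", "marks": 91},
    {"name": "John", "gender": "M", "city": "Goa", "marks": 64},
]

top_marks = max(row["marks"] for row in students)

# Keep only the rows that meet the condition, then only the requested columns.
top_scorers = [
    {key: row[key] for key in ("name", "gender", "marks")}
    for row in students
    if row["marks"] == top_marks
]
print(top_scorers)
```

The shape of the data is unchanged here; rows and columns are only selected, not rearranged or rewritten.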


Incorrect Question 35 0 / 0.25 pts

Which of the following methods are considered to be the best practice
for data cleaning?

  All of the given options

  Sorting data by attributes

  By breaking large dataset into small data

  cleansing large dataset without segmentation

Question 36 0.25 / 0.25 pts

The ____ and standard deviation are strongly affected by outliers.

  RANGE

  MODE

  MEDIAN

  MEAN
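How strongly a single extreme value moves different summary statistics can be seen with a small experiment. The numbers below are made up for illustration.

```python
from statistics import mean, median, stdev

# Hypothetical measurements, with and without one extreme value.
base = [90, 92, 98, 98, 100, 102, 104, 105]
with_outlier = base + [200]

for name, data in (("base", base), ("with outlier", with_outlier)):
    print(f"{name}: mean={mean(data):.2f} stdev={stdev(data):.2f} median={median(data)}")
```

In this example the mean and the standard deviation shift substantially once 200 is added, while the median moves by only one unit.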

Incorrect Question 37 0 / 0.25 pts

Which of the following does not represent a bin created by applying
equi-width binning on the data [24, 0, 6, 60, 63, 30, 87, 90, 87]?

  [0,24,30]

  [87,87,90]

  [30,60,63]

  [0,6,24]

Question 38 0.25 / 0.25 pts

Consider the following Python code.

import sklearn.preprocessing

lb = sklearn.preprocessing.LabelBinarizer()
print(lb.fit_transform(['yes', 'no', 'no', 'yes']))

The last line of the code snippet will print

  [1,0,0,1]

  [true, false, false, true]

  [True, False, False, True]

  [0,1,1,0]

Incorrect Question 39 0 / 0.25 pts

If you come across a value of 102 for AGE

  Change it to Mean Value

  Do Nothing

  Understand the Business Problem

  Change to Mode Value


Incorrect Question 40 0 / 0.25 pts

Which of the following is a bottom-up approach for discretization?

  Entropy based discretization

  Histogram analysis

  Equal-frequency binning

  Correlation analysis

Quiz Score: 7.17 out of 10

12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 11:59pm Points 10 Questions 40
Available Dec 18 at 7pm - Dec 19 at 11:59pm Time Limit 60 Minutes

Attempt History
Attempt Time Score

LATEST Attempt 1 47 minutes 8.75 out of 10

 Correct answers will be available on Dec 22 at 12am.

Score for this quiz: 8.75 out of 10


Submitted Dec 18 at 8:45pm
This attempt took 47 minutes.

Question 1 0.25 / 0.25 pts

Data science is the process of diverse set of data through ____________

  processing data


  analysing data

  All of the options

  organizing data

Question 2 0.25 / 0.25 pts

Statement 1: Role of a Business analyst usually requires expertise on building ML models.
Statement 2: Role of a Data Scientist usually requires expertise on performing descriptive analysis.

Which of the following is right?

  Statement 1 is correct and Statement 2 is wrong

  Both statements are correct

  Statement 1 is wrong and Statement 2 is correct

  Both statements are wrong

Incorrect Question 3 0 / 0.25 pts

Identify the data mining tasks from the below list.

i. Dividing the customers of a company according to their gender
ii. Predicting the future stock price of a company using historical records
iii. Monitoring the heart rate of a patient for abnormalities
iv. Extracting the frequencies of a sound wave
v. Sorting a student database based on student identification numbers

  i, ii, iii

  iv, v

  ii, iii

  i, iv, v

Question 4 0.25 / 0.25 pts

Data Science project steps are highly linear.

  True

  False

Question 5 0.25 / 0.25 pts

The tasks of a data scientist include

  Collecting Raw Data

  communicate the results to the stakeholders

  All of the Above

  Identifying relevant features

Question 6 0.25 / 0.25 pts

Data scientist is not responsible for 

  data mining


  building continuous data stream

  data analytics

  data manipulation

Question 7 0.25 / 0.25 pts

Compute the Manhattan distance between A(2,3) and B(5,7).

 5

 8

 6

 7

Question 8 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 3.2. [Answer should have a precision of X.XXXX]

  0.8769

  -0.8679

  0.8679

  -0.8769


Question 9 0.25 / 0.25 pts

Compute the width of each bin for data given below, if the number of
bins is 4.
[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46,
48, 36] 

 4

 3

 5

(49-30)/4

 6
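The bin-width arithmetic noted among the options, (49-30)/4, can be reproduced as below. Rounding the exact width up to a whole number is an assumption made here so that four bins cover the full range; the question itself does not state a rounding rule.

```python
import math

data = [39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46, 48, 36]
num_bins = 4

exact_width = (max(data) - min(data)) / num_bins  # (49 - 30) / 4
rounded_width = math.ceil(exact_width)            # round up so the bins cover the range
print(exact_width, rounded_width)  # 4.75 5
```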

Question 10 0.25 / 0.25 pts

Compute the Euclidean distance between A(2,3) and B(5,7).

 5

 4

 7

 6
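
For the same two points A(2,3) and B(5,7), the Manhattan distance asked about earlier and the Euclidean distance asked about here can be compared side by side. This is a minimal sketch; the helper names `manhattan` and `euclidean` are illustrative.

```python
import math


def manhattan(p, q):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences (L2 distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


A, B = (2, 3), (5, 7)
print(manhattan(A, B))  # |2 - 5| + |3 - 7|
print(euclidean(A, B))  # sqrt(3**2 + 4**2)
```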

Question 11 0.25 / 0.25 pts

A police team wants to predict the crime rate in a locality based on
certain attributes. Which modelling technique would be appropriate?


  Classification

  Clustering

  Regression

  Optimization

Question 12 0.25 / 0.25 pts

Match with the most appropriate answer related to analytics.

Descriptive Analytics   what happened

Diagnostic Analytics   why it happened

Predictive Analytics   what will happen

Prescriptive Analytics   what can make it happen

Question 13 0.25 / 0.25 pts

A scenario where you feel unwell and go to a doctor. The doctor asks
you questions such as whether you were exposed to rain or a cold climate,
had contact with a sick person, or had food from outside.
Based on your answers, the doctor comes to a conclusion. This can be
considered analogous to which stage of data analytics?

  Descriptive


  Predictive

  Prescriptive

  Diagnostic

Question 14 0.25 / 0.25 pts

We are predicting the weather condition as foggy, warm, cloudy, and
misty at Bangalore using the data collected in the last one month. This
task is an example of

  Classification

  Association Rule

  Clustering

  Regression

Question 15 0.25 / 0.25 pts

A company wants to find the target segment of people for one of its
products. Which modelling technique would be generally appropriate?

  Clustering

  Classification

  Ranking

  Regression


Question 16 0.25 / 0.25 pts

In the "Business Understanding" phase of the data science process, the
goals are identified and the objectives defined.

  True

  False

Question 17 0.25 / 0.25 pts

Data pre-processing improves the data quality and makes data mining
algorithms efficient and effective. What are the data pre-processing
tasks along with data cleaning?

  Data Transformation

  Data Integration

  All of the above.

  Data Reduction

Question 18 0.25 / 0.25 pts

Assess Situation - "Costs and Benefits" - falls into which phase of
CRISP-DM?

  Data Preparation

  Data modeling

  Evaluation


  Business Understanding

Incorrect
Question 19 0 / 0.25 pts

Identify the data analytics task for the following scenario. A software
programmer develops a tool to convert programs written in Verilog to
Python.

  Predictive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

  Descriptive Analytics

Incorrect
Question 20 0 / 0.25 pts

Identify the data analytics task for the following scenario. A student is
analyzing various blogs and vlogs to find out the skill set that has to be
acquired to become a data scientist in the future. 

  Predictive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

  Descriptive Analytics

Question 21 0.25 / 0.25 pts


In 2001, big data was characterized by three Vs: volume, velocity, and variety.
The Vs have grown to encompass veracity and value in the years
since. A further V is sometimes attributed to big data, which is:

  Volatile

  Vector

  Variability

  Vulnerability

Question 22 0.25 / 0.25 pts

The branch of statistics which deals with the development of statistical
methods is classified as ___.

  None of the mentioned above

  Applied statistics

  Industry statistics

  Economic statistics

Question 23 0.25 / 0.25 pts

What are the best practices for implementing big data analytics
programmes?

  Determining business direction based on data analysis


  Letting go entirely of 'old ideas' related to data management

 
Focusing on business goals and how to use big data analytics
technologies to meet them

 
Adopting data analysis tools based on a laundry list of their capabilities

Question 24 0.25 / 0.25 pts

Which of the following is not an example of ordinal attributes?

  Zip codes

  Exam Grades

  Military ranks

  Academic ranks

Question 25 0.25 / 0.25 pts

Which of the following Python libraries is used for web scraping?

  BeautifulSoup

  WebCrawler

  Scraper

  WebSpider


Question 26 0.25 / 0.25 pts

What is the difference between interval/ratio and ordinal variables?

 
The distance between categories is equal across the range of
interval/ratio data

  Ordinal data can be rank ordered, but interval/ratio data cannot

  Interval/ratio variables contain only two categories

 
Ordinal variables have a fixed zero point, whereas interval/ratio
variables do not

Question 27 0.25 / 0.25 pts

Attributes cannot be called:

  Variables

  Features

  Dimensions

  Data point

Incorrect Question 28 0 / 0.25 pts


In a dataset, we are tracking whether a customer is purchasing a
product or not. This is an example of

  Symmetric attribute

  Discrete attribute

  Continuous attribute

  Asymmetric attribute

Question 29 0.25 / 0.25 pts

Dealing with missing values during data preparation is what kind of
operation?

  Data retrieval

  Data cleansing

  Data transformation

  Data combining

Question 30 0.25 / 0.25 pts

The ____ and standard deviation are strongly affected by outliers

  MEDIAN

  MEAN


  MODE

  RANGE

Question 31 0.25 / 0.25 pts

In which phase are the duplicates (of the data) removed? Choose the
best possible answer.

  Data Preparation

  Data Collection

  Data Understanding

  Data Exploration

Question 32 0.25 / 0.25 pts

Which of the following is NOT a visualization technique?

  Matrix

  TreeMap

  Lexeme

  ConeTree

Question 33 0.25 / 0.25 pts


There are two sets X={10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,
-42, -43, -44}. What is TRUE about the standard deviations of X and Y
i.e. σX and σY respectively?

  σX will be smaller than σY.

  Magnitude will be the same but the sign will be different

  Will be the same.

  σY will be smaller than σX.
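
The comparison in this question can be checked numerically. The sketch below assumes the population standard deviation (`statistics.pstdev`); using the sample standard deviation instead would change both values by the same factor and leave the comparison unchanged. Standard deviation measures spread around the mean, so shifting or negating a set does not alter it.

```python
import statistics

X = list(range(10, 25))          # 10, 11, ..., 24
Y = [-v for v in range(30, 45)]  # -30, -31, ..., -44

# Both sets are 15 consecutive integers, so their spread is identical.
sigma_x = statistics.pstdev(X)
sigma_y = statistics.pstdev(Y)
print(sigma_x, sigma_y)
```

A standard deviation is never negative, which is why an option about differing signs cannot apply.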

Question 34 0.25 / 0.25 pts

The scatterplot implies that

  The features are positively correlated.

  The features are independent.

  The features are negatively correlated.


Question 35 0.25 / 0.25 pts

Consider the data set below. How will you (most appropriately) handle
the missing values for records 1, 4, and 6, respectively?
Data set description.
price: continuous from 5118 to 45400 (class label).
normalized-losses: continuous from 65 to 256.
num-of-doors: four, two.

  replace by mode, replace by median, replace by mean.

  replace by mode, ignore the tuple, replace by mean.

  ignore all 3 records.

  ignore the tuple, replace by mean, replace by mode.

Question 36 0.25 / 0.25 pts

For a data analytics task to analyse feedback on her subject for a
class of 60 students, a school teacher decided to use the survey
submitted by the ten students who come for tuitions for that subject, at
her home. Identify the type of sampling she is doing.

  Systematic Sampling

  Stratified sampling

  Non Probabilistic sampling

  Sampling without replacement


Question 37 0.25 / 0.25 pts

Which among the following are valid methods of handling missing data?

P. Eliminating Data Objects
Q. Estimating Missing Values
R. Ignoring the Missing Values during Analysis
S. Replacing with all possible values

  Q and R

  All the options are correct

  R only

  P, Q and R

Incorrect Question 38 0 / 0.25 pts

Which is the major task of Data Integration?

 
For the same real world entity, resolving attribute values from different
sources

  None of the above

  Identification of missing rows for identified key values

  Clustering of similar data from different sources


Question 39 0.25 / 0.25 pts

Which of the following statements are true?

I. The smaller data sets resulting from data reduction require less
memory and processing time.

II. Aggregation provides a high-level view of the data instead of a
low-level view

III. High-quality data that is aggregated can lead to a high chance of
identifying false positives and negatives

IV. An advantage of aggregation is the potential loss of interesting
details

  Statement I and II

  Statement II and III

  Only Statement III

  Statement I and IV

Question 40 0.25 / 0.25 pts

The missing value for categorical attribute is substituted with

  mean of a value

  least frequent attribute value

  none of the above

  most frequent attribute value
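
One common way to fill a missing categorical value is with the most frequent observed value (the mode). The sketch below illustrates that idea only; the helper `impute_with_mode` and the sample column are hypothetical and do not come from the quiz data.

```python
from collections import Counter


def impute_with_mode(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode_value = Counter(observed).most_common(1)[0][0]
    return [mode_value if v is None else v for v in values]


# Hypothetical categorical column with two missing entries.
doors = ["four", "two", None, "four", "four", None, "two"]
print(impute_with_mode(doors))
```

A mean is not defined for categories such as "four" and "two", which is why a frequency-based substitute is used here.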


Quiz Score: 8.75 out of 10

12/18/22, 10:04 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40
Available Dec 18 at 19:00 - Dec 19 at 23:59 Time Limit 60 Minutes

Attempt History
Attempt Time Score

LATEST Attempt 1 43 minutes 9 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 9 out of 10


Submitted Dec 18 at 22:01
This attempt took 43 minutes.

Question 1 0.25 / 0.25 pts

Data science is the process of diverse set of data through ____________

  analysing data


  processing data

  organizing data

  All of the options

Question 2 0.25 / 0.25 pts

Tableau is not likely to be used by

  Data Architect

  Data Scientist

  Visualization Engineer

  Business Analyst

Question 3 0.25 / 0.25 pts

Which of the following is not a tool used by a Data Scientist?

  Python

  Springboot

  SAS

 R

Incorrect Question 4 0 / 0.25 pts


Which of the following best describes the difference between the data
analyst and data scientist?

 
Data analysts do estimation whereas data scientists predict and explain
it as well

 
Data analysts are more proficient in R whereas data scientists are more
proficient in Python

 
Data analysts just deal with numbers whereas data scientists deal with
algorithms

  Data analyst and data scientists plays the same role in the project

Question 5 0.25 / 0.25 pts

Pattern Recognition is a sub-field of Data Science.

  True

  False

Question 6 0.25 / 0.25 pts

If a person comes from a software development background, which of
the following data science project roles will best suit them?

  Storyteller


  Machine Learning Engineer

  Data Analyst

  Big Data Engineer

Question 7 0.25 / 0.25 pts

Suppose the administrator measured the power consumption of an
entire network operations centre (NOC) and the consumption details
are: 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110 W,
98 W, 200 W and 115 W. What is the range of power consumption?

  150W

  98W

  100W

  110W
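
The range is the difference between the largest and smallest readings. A minimal sketch using the twelve values from the question (the variable names are illustrative):

```python
# Power readings in watts, as listed in the question.
readings_w = [90, 104, 98, 98, 105, 92, 102, 100, 110, 98, 200, 115]

# Range = maximum minus minimum.
value_range = max(readings_w) - min(readings_w)
print(value_range)
```

The single 200 W reading dominates the result, which shows how sensitive the range is to one extreme value.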

Question 8 0.25 / 0.25 pts

Find the interquartile range (IQR) of the following inputs:
4, 4, 10, 11, 15, 7, 14, 12, 6

  10

  11

 7

 8
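
Quartile conventions differ between textbooks and libraries, so the sketch below fixes one of them as an assumption: the "median of halves" method, in which the overall median is excluded from both halves when the number of values is odd. Other conventions may give different quartiles for the same data. The function name is illustrative.

```python
import statistics


def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR) using the median-of-halves convention."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]        # values below the median position
    upper = s[(n + 1) // 2:]   # values above the median position
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return q1, q3, q3 - q1


q1, q3, iqr = iqr_median_of_halves([4, 4, 10, 11, 15, 7, 14, 12, 6])
print(q1, q3, iqr)
```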


Question 9 0.25 / 0.25 pts

Calculate the cosine similarity between two documents represented by the
vectors

     x = (0,1,1,1,2,3,0,0,0,2,1) and

     y = (2,1,1,2,1,2,1,1,0,0,0)

  0.034

  50.2

  0.64

  0.99
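
Cosine similarity is the dot product of the two vectors divided by the product of their lengths. A minimal sketch with the vectors from the question follows; the function name `cosine_similarity` is illustrative, and rounding to two decimals is chosen here to match the precision of the answer options.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


x = (0, 1, 1, 1, 2, 3, 0, 0, 0, 2, 1)
y = (2, 1, 1, 2, 1, 2, 1, 1, 0, 0, 0)
print(round(cosine_similarity(x, y), 2))
```

Because term counts are non-negative, the result for document vectors like these always lies between 0 and 1.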

Question 10 0.25 / 0.25 pts

Suppose that the minimum and maximum values for the attribute
income are $12,000 and $98,000, respectively. The new range is
[0.0,1.0]. Apply min-max normalization to a value of $73,600.

  0.716

  0.758

  0.856

  0.561
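
Min-max normalization rescales a value linearly from its original range to a new one. The sketch below uses the income figures from the question; the function name and its default target range of [0.0, 1.0] are illustrative choices.

```python
def min_max_normalize(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    scaled = (value - old_min) / (old_max - old_min)
    return scaled * (new_max - new_min) + new_min


# Income range $12,000 to $98,000; value to normalize is $73,600.
normalized = min_max_normalize(73_600, 12_000, 98_000)
print(round(normalized, 3))
```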

Question 11 0.25 / 0.25 pts


We are predicting the humidity at Bangalore using the data collected in
the last one month. This task is an example of

  Clustering

  Regression

  Association Rule

  Classification

Question 12 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A mother
analysed the answer scripts of her daughter and found that they should
concentrate on reading comprehension to score better in the upcoming
exams. 

  Prescriptive Analytics

  Diagnostic Analytics

  Predictive Analytics

  Descriptive Analytics

Question 13 0.25 / 0.25 pts

A course instructor has data about students' attendance in her course in
the past semester. What kind of analytics is she performing when she
creates a line graph based on this data?


  Predictive

  Descriptive

  Prescriptive

  Diagnostic

Question 14 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A car service
showroom manager wants to analyze his marketing and sales data to
understand the reason for the drop in sales.

  Descriptive Analytics

  Diagnostic Analytics

  Prescriptive Analytics

  Predictive Analytics

Question 15 0.25 / 0.25 pts

For a given data set, the following data preprocessing techniques are used
to improve the quality of data:

Data cleaning, Data integration, Data reduction and Data transformation.

Which of the following statements is TRUE?

  These techniques are mutually exclusive


  The techniques are not mutually exclusive

  All of the given options are true

  All the techniques may not work together

Question 16 0.25 / 0.25 pts

Match with the most appropriate answer, related to the tools available
to a Data Science life cycle.

SMAM   8 stages

SEMMA   5 stages

Big data life cycle   12 phases

CRISP-DM   6 stages

Question 17 0.25 / 0.25 pts

A scenario where you feel unwell and go to a doctor. The doctor asks
you questions such as whether you were exposed to rain or a cold climate,
had contact with a sick person, or had food from outside.
Based on your answers, the doctor comes to a conclusion. This can be
considered analogous to which stage of data analytics?

  Predictive

  Descriptive


  Prescriptive

  Diagnostic

Question 18 0.25 / 0.25 pts

The answer to the following question can be obtained by which type of
analytics?
"What's the best that can happen?"

  Diagnostic Analytics

  Descriptive Analytics

  Predictive Analytics

  Prescriptive Analytics

Question 19 0.25 / 0.25 pts

We are predicting the weather condition as foggy, warm, cloudy, and
misty at Bangalore using the data collected in the last one month. This
task is an example of

  Association Rule

  Clustering

  Regression

  Classification


Question 20 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. Google is
using tools to suggest texts or phrases while composing emails.

  Descriptive Analytics

  Predictive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

Incorrect
Question 21 0 / 0.25 pts

The most time-consuming phase in a data science process is

  Data collection

  Deployment

  Data preparation

  Data Modelling

Question 22 0.25 / 0.25 pts

Data pre-processing improves the data quality and make data mining
algorithms efficient and effective. What are the data pre-processing
tasks along with Data cleaning?

  All of the above.

  Data Transformation


  Data Reduction

  Data Integration

Question 23 0.25 / 0.25 pts

Which of the following is an example of raw data?

  a real-time GPS-encoded navigation file

  initial time-series file of temperature values

  original swath files generated from a sonar system

  all of the mentioned

Incorrect
Question 24 0 / 0.25 pts

Street numbers are __________________ type of attributes.

  Nominal attribute

  Ordinal attribute

  Ratio attribute

  Interval attribute

Question 25 0.25 / 0.25 pts

Temperature in kelvin is of __________________ attribute type.

  Nominal attribute

  Ordinal attribute

  Ratio attribute

  Interval attribute

Question 26 0.25 / 0.25 pts

In a dataset, we are tracking whether a customer is purchasing a
product or not. This is an example of

  Continuous attribute

  Discrete attribute

  Asymmetric attribute

  Symmetric attribute

Question 27 0.25 / 0.25 pts

In a FashionStore data set, the feature Apparel_Price, showing the cost
of the apparel, is an example of

  Ordinal attribute

  Numeric attribute

  Nominal attribute

  Continuous attribute

Question 28 0.25 / 0.25 pts

A box plot is the visual representation of the following statistical
summary

  Min, Median, Mode

  Minimum, First Quartile, Median, Third Quartile, Maximum

  Minimum, Average, Maximum

  Minimum, First Quartile, Third Quartile, Second Quartile, Mean
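
The five-number summary that a box plot draws can be computed directly. The sketch below is illustrative only: the sample values and the `five_number_summary` helper are assumptions, not part of the quiz, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is one of several quartile conventions.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
summary = five_number_summary(sample)
print(summary)
```

Other quartile conventions may give slightly different Q1 and Q3 values for the same data.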

Question 29 0.25 / 0.25 pts

Consider the data set below. How will you (most appropriately) handle
the missing values for records 1, 4, and 6, respectively?
Data set description.
price: continuous from 5118 to 45400.(class label)
normalized-losses: continuous from 65 to 256.
num-of-doors: four, two.

  Replace by mode, replace by median, replace by mean.

  Replace by mode, ignore the tuple, replace by mean.

  Ignore all 3 records.

  Ignore the tuple, replace by mean, replace by mode.


Question 30 0.25 / 0.25 pts

Which of the following is/are true about data aggregation?

  It can work with both quantitative and qualitative attributes

  It provides a high-level view of the data

  It does not complement well with statistical analysis

  It preserves all details even after aggregation at all times

Question 31 0.25 / 0.25 pts

For exploring continuous data using descriptive statistics, which of the
following methods is used?

  Percentage

  Histogram

  Range

  Frequency

Question 32 0.25 / 0.25 pts

“Similarity” means:

  A listing of the similar features of a collection of objects.

 
The number of tuples in a database whose attributes have similar
values.

  A collection of similar objects.

  Numerical measure of how alike two data objects are.

Question 33 0.25 / 0.25 pts

Identify the false statement.

  It may not be a good idea to drop a data field with missing values.

 
As a data scientist, even if you know that certain outliers are valid data,
you might still omit them from model construction.

 
As a data scientist, while cleaning the data, you are not concerned with
why the data values are missing. You just fix them.

 
As a data scientist, even if you know that certain outliers are valid data,
you might still omit them from model construction.

Question 34 0.25 / 0.25 pts

Choose the correct empirical relation

  mean−median ≈ 3×(mean−mode)

  mean−mode ≈ 3×(median-mean).

  median-mean ≈ 3×(mean−mode)

  mean−mode ≈ 3×(mean−median).
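
For a moderately skewed, unimodal distribution, the empirical relation is usually written as mean − mode ≈ 3 × (mean − median). As an illustration with assumed values (mean 50 and median 48, not taken from the quiz), it gives a rough estimate of the mode:

```latex
\text{mean} - \text{mode} \approx 3\,(\text{mean} - \text{median})
\quad\Longrightarrow\quad
\text{mode} \approx 50 - 3\,(50 - 48) = 44
```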

Question 35 0.25 / 0.25 pts

Consider the following Python code. 


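import sklearn.preprocessing

input = ['Havells', 'Philips', 'Syska', 'Eveready', 'Lloyd']
le = sklearn.preprocessing.LabelEncoder()
le.fit(input)
# transform() expects a list (array-like) of labels, not a bare string
print(le.transform(['Syska']))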
The last line of the code snippet will return

 2

 4

 1

 3
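
scikit-learn's `LabelEncoder` keeps the distinct labels in sorted order and encodes each label as its position in that order. The pure-Python sketch below mimics that behaviour without requiring scikit-learn; `fit_labels` and `encode` are hypothetical helper names used only for illustration.

```python
def fit_labels(labels):
    """Return the distinct labels in sorted order (like LabelEncoder.classes_)."""
    return sorted(set(labels))

def encode(classes, label):
    """Return the integer code of a label: its index in the sorted classes."""
    return classes.index(label)

brands = ['Havells', 'Philips', 'Syska', 'Eveready', 'Lloyd']
classes = fit_labels(brands)
codes = {brand: encode(classes, brand) for brand in brands}
print(classes)
print(codes)
```

The codes depend on alphabetical order, not on the order in which the labels were listed.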

Question 36 0.25 / 0.25 pts

The following representation technique shows the maximum, minimum,
median, and other characterizing measures at the same time

  Boxplot

  Tabulation

  Histogram

  Pareto diagram

Question 37 0.25 / 0.25 pts

In which phase are the duplicates (of the data) removed? Choose the
best possible answer.

  Data Understanding

  Data Collection

  Data Preparation

  Data Exploration

Question 38 0.25 / 0.25 pts

Choose the possible combinations for drawing a scatter plot for given
data.

  Age and Test1 Score

  Weight and Test 1 Score

  Age and weight

  All the options

Incorrect Question 39 0 / 0.25 pts

What does discretization do?

  Both of the above.

  None of the above.

  Reduce overall data size.

  Convert a continuous attribute into a discrete attribute.

Question 40 0.25 / 0.25 pts

Which statement best compares a histogram and a 5-number summary?

  Histogram is always more informative on data distribution

  5-number summary can be used for non-numeric data

  5-number summary is robust w.r.t. noise and outliers

  Histogram can be very informative with finer ranges

Quiz Score: 9 out of 10

18/12/2022, 19:57 Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40
Available Dec 18 at 19:00 - Dec 19 at 23:59 Time Limit 60 Minutes

Instructions
The Quiz I can be attempted only once. You will not be provided with a  make-up Quiz, if you miss
this Quiz.

The quiz is timed for one hour and has to be completed once you start. You will not be allowed to go
back to the previous question if you skip a question. 

“Choose the most appropriate answer to each question.”

The answers will be visible only after three days once the quiz has ended.

All the best.

Attempt History
Attempt Time Score
LATEST Attempt 1 54 minutes 8.63 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 8.63 out of 10


Submitted Dec 18 at 19:55
This attempt took 54 minutes.

Question 1 0.25 / 0.25 pts

If a person comes from a software development background, which of
the following data science project roles will best suit them?

  Storyteller

  Big Data Engineer

  Machine Learning Engineer

  Data Analyst

Question 2 0.25 / 0.25 pts

Which of the following is not an application of data science?

  Privacy Checker

  Image & Speech Recognition

  Recommendation Systems

  Online Price Comparison

Question 3 0.25 / 0.25 pts

Which one of the following is not a necessary characteristic of a Data
Scientist?

  Communicative

  Punctual

  Creative

  Technical

Question 4 0.25 / 0.25 pts

Which of the following are correct skills for a Data Scientist?

  probability and statistics

  all of the options

  machine learning/deep learning

  data wrangling

Question 5 0.25 / 0.25 pts

The key roles in a data science project are

  Data Scientist, Architect, SME, Sponsor, Programmer

  Data Scientist, Analyst, Sponsor

  Data Scientist, Analyst

  Data Scientist, Analyst, SME, Programmer

Question 6 0.25 / 0.25 pts

In which of the following are analysts allocated to units throughout the
organization, with their activities coordinated by a central entity?

  Functional

  Center of Excellence

  Coordinational

  Consulting

Question 7 0.25 / 0.25 pts

Suppose the administrator measured the power consumption of an
entire network operations centre (NOC) and the consumption details
are: 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110 W,
98 W, 200 W and 115 W. What is the range of power consumption?

  98W

  100W

  150W

  110W
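
The range of a set of values is the maximum minus the minimum. A minimal sketch using the readings from the question (the variable names are illustrative):

```python
# Power readings in watts, as listed in the question.
readings_w = [90, 104, 98, 98, 105, 92, 102, 100, 110, 98, 200, 115]

# Range = largest value minus smallest value.
value_range = max(readings_w) - min(readings_w)
print(value_range)
```

Because the range depends only on the two extreme values, a single outlier such as the 200 W reading dominates it.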

Question 8 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 3.2. [Answer should have a precision of X.XXXX]

  0.8679

  0.8769

  -0.8769

  -0.8679
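
Z-score normalization subtracts the mean and divides by the standard deviation. A minimal sketch with the numbers from the question; the `z_score` helper name is an assumption made for illustration:

```python
def z_score(value, mean, std_dev):
    """Standardize a value: how many standard deviations it lies from the mean."""
    return (value - mean) / std_dev

z = z_score(3.2, mean=8.9, std_dev=6.5)
print(round(z, 4))
```

A value below the mean always yields a negative z-score.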

Question 9 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 10.7. 

  -0.2679

  0.2679

  0.2769

  -0.2769

Question 10 0.25 / 0.25 pts

Suppose that the minimum and maximum values for an attribute are 4.3
and 7.6, respectively. Compute the scaled value of 5.4 if min-max
normalization is applied to scale [-1.0,+1.0]. (Answer should have a
precision of X.XX)

  0.66

  0.76

  0.33

  -0.33
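
Min-max normalization first maps a value to its fractional position within the old range, then stretches that fraction onto the new range. A minimal sketch with the numbers from the question; the `min_max_scale` helper is a hypothetical name:

```python
def min_max_scale(value, old_min, old_max, new_min, new_max):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)

scaled = min_max_scale(5.4, old_min=4.3, old_max=7.6, new_min=-1.0, new_max=1.0)
print(round(scaled, 2))
```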

Incorrect Question 11 0 / 0.25 pts

Which of the following methodologies focuses the most on model
deployment and embedding in operational systems?

  SMAM

  CRISP-DM

  SEMMA

  All options are correct

Question 12 0.25 / 0.25 pts

Find the odd term.

  Artificial Intelligence

  Data Wrangling

  Deep Learning

  Machine Learning

Question 13 0.25 / 0.25 pts

Which data analytic approach can identify the probabilities of an action?

  Prescriptive

  Predictive

  Diagnostic

  Descriptive

Question 14 0.25 / 0.25 pts

Analyzing the data to determine why some phenomena related to
learning happened is a type of

  Predictive

  Prescriptive

  Diagnostic

  Descriptive

Question 15 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A
pharmaceutical organization is developing a new drug or vaccine to
combat COVID-19 using machine learning techniques, where the data comes
from existing drugs and the diseases they can fight or cure.

  Prescriptive Analytics

  Diagnostic Analytics

  Predictive Analytics

  Descriptive Analytics

Question 16 0.25 / 0.25 pts

To generate genuine business value, simply gathering and keeping data
isn't enough. Technologies for big data analytics are required to:

  Integrate data from internal and external sources

  Formulate eye-catching charts and graphs

  Determine business goals and objectives

  Extract valuable insights from the data

Question 17 0.25 / 0.25 pts

We are predicting the humidity at Bangalore using the data collected
over the last month. This task is an example of

  Regression

  Classification

  Association Rule

  Clustering

Question 18 0.25 / 0.25 pts

We are predicting the weather condition as foggy, warm, cloudy, or
misty at Bangalore using the data collected over the last month. This
task is an example of

  Classification

  Association Rule

  Regression

  Clustering

Question 19 0.25 / 0.25 pts

Match the following data analytics to their description:

Descriptive   What happened?

Diagnostic   Why did this happen?

Predictive   What might happen in t

Prescriptive   What should we do nex

Question 20 0.25 / 0.25 pts

Training and testing datasets are developed during _________ phase.

  None

  Model planning

  Operationalize

  Model building

Incorrect
Question 21 0 / 0.25 pts

Fortis-Apollo hospital is planning to design a model that maps
patients to the best possible treatments based on the diagnosis. Identify
the data analytics task for this scenario.

  Descriptive Analytics

  Predictive Analytics

  Diagnostic Analytics

  Cognitive analytics

Question 22 0.25 / 0.25 pts

Which of the following artifacts is not considered in descriptive
analytics?

  Alerts

  Ad hoc reports

  Predictive model

  Standard report

Question 23 0.25 / 0.25 pts

Attributes cannot be called:

  Variables

  Data point

  Features

  Dimensions

Question 24 0.25 / 0.25 pts

In a FashionStore data set, the feature Jacket_Shade {Grey, Brown,
Black, Indigo, Beige, Khaki} is an example of

  Nominal attribute

  Numeric attribute

  Ordinal attribute

  Continuous attribute

Question 25 0.25 / 0.25 pts

What is the difference between interval/ratio and ordinal variables?

  Interval/ratio variables contain only two categories

  Ordinal data can be rank ordered, but interval/ratio data cannot

 
The distance between categories is equal across the range of
interval/ratio data

 
Ordinal variables have a fixed zero point, whereas interval/ratio
variables do not

Question 26 0.25 / 0.25 pts

A box plot is the visual representation of the following statistical
summary

  Minimum, First Quartile, Median, Third Quartile, Maximum

  Min, Median, Mode

  Minimum, First Quartile, Third Quartile, Second Quartile, Mean

  Minimum, Average, Maximum

Incorrect
Question 27 0 / 0.25 pts

Clustering techniques can be used in

  Unsupervised Learning

  None of the answers

  Feature Selection

  Either Unsupervised Learning or Feature Selection

Question 28 0.25 / 0.25 pts

E-mail is an example of ____________________ data

  Quasi-structured

  Semi-structured

  Unstructured

  Structured

Partial
Question 29 0.13 / 0.25 pts

Match the following techniques with the definitions

Binarization   maps a continuous attr

Binning   Divide the range of a co

Concept Hierarchy   Smooth out the effect o

Functional Transformation  Transform attribute valu

Incorrect
Question 30 0 / 0.25 pts

Consider the following Python code. 


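import sklearn.preprocessing

input = ['Havells', 'Philips', 'Syska', 'Eveready', 'Lloyd']
le = sklearn.preprocessing.LabelEncoder()
le.fit(input)
# transform() expects a list (array-like) of labels, not a bare string
print(le.transform(['Lloyd']))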
The last line of the code snippet will print

 1

 4

 3

 2

Incorrect
Question 31 0 / 0.25 pts

The dissimilarity between two data objects is

  Lower when objects are not alike

  None of the above

  Lower when objects are more alike

  Higher when objects are more alike

Question 32 0.25 / 0.25 pts

Identify the sampling technique used in the following use case. For an ML
classification task, the algorithm requires that the test set has an equal
number of examples from each of the three categories.

  Systematic Sampling

  Stratified Random

  Cluster Sampling

  Simple Random

Question 33 0.25 / 0.25 pts

Missing values should always be imputed before training the model.

  Mostly True

  False

  None of the answers

  True

Question 34 0.25 / 0.25 pts

Creating dummy variables during data preparation is what kind of
operation?

  Data transformation

  Data retrieval

  Data cleansing

  Data combining

Question 35 0.25 / 0.25 pts

“Similarity” means:

  A collection of similar objects.

  A listing of the similar features of a collection of objects.

 
The number of tuples in a database whose attributes have similar
values.

  Numerical measure of how alike two data objects are.

Question 36 0.25 / 0.25 pts

Data transformation is done to improve ------------- in an algorithm

  Noise

  Inconsistencies

  Accuracy & Efficiency

  Integration

  Redundancy

Question 37 0.25 / 0.25 pts

A data object is described by a categorical attribute with four
categories. If, for data analysis purposes, we want to transform the
attribute into numerical values, then

  D. it can be done by encoding using 3 or 4 binary variables

  B. it can be done by encoding using only 3 binary variables

  C. it can be done by encoding using only 4 binary variables

  A. it can’t be done
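
A categorical attribute with k categories can be encoded with k binary variables (one-hot encoding) or with k − 1 (dummy encoding, where one category serves as the all-zero baseline). The sketch below illustrates both for four categories; the category names and the `one_hot` and `dummy` helpers are hypothetical.

```python
def one_hot(value, categories):
    """Encode with one binary variable per category (k variables)."""
    return [1 if value == c else 0 for c in categories]

def dummy(value, categories):
    """Encode with k - 1 binary variables; the first category is the all-zero baseline."""
    return one_hot(value, categories)[1:]

categories = ['A', 'B', 'C', 'D']  # hypothetical category names
for c in categories:
    print(c, one_hot(c, categories), dummy(c, categories))
```

Both encodings keep the four categories distinguishable; the dummy form simply drops one redundant column.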

Question 38 0.25 / 0.25 pts

In a boxplot, where Q1, Q2 and Q3 are the first, second and third
quartiles respectively, the interquartile range IQR is calculated as:

  None of the above

  IQR = Q3 – Q1

  IQR = Q3-Q2

  IQR = Q2-Q1

Question 39 0.25 / 0.25 pts

Consider the following Python code. 


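import sklearn.preprocessing

input = ['Havells', 'Philips', 'Syska', 'Eveready', 'Lloyd']
le = sklearn.preprocessing.LabelEncoder()
le.fit(input)
# transform() expects a list (array-like) of labels, not a bare string
print(le.transform(['Syska']))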
The last line of the code snippet will return

 2

 3

 4

 1

Question 40 0.25 / 0.25 pts

From mathematical models of epidemics, one observes that the initial
phase shows exponential growth. This can be verified by plotting the
number of infections on a log scale. This is a use case for which
technique?

  Transformation

  Aggregation

  Discretization

  Sampling
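
Taking the logarithm turns exponential growth into a straight line, which is why a log-scale plot is a convenient check. The sketch below uses made-up case counts that double each day (an assumption for illustration, not epidemic data) and shows that the day-to-day differences of the logged values are constant.

```python
import math

# Hypothetical case counts that double every day (exponential growth).
cases = [10 * 2 ** day for day in range(6)]

# Apply the log transformation to each count.
log_cases = [math.log(c) for c in cases]

# On a log scale, exponential growth becomes linear:
# successive differences are constant (here, log 2).
steps = [b - a for a, b in zip(log_cases, log_cases[1:])]
print(steps)
```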

Quiz Score: 8.63 out of 10

18/12/2022, 22:38 Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40
Available Dec 18 at 19:00 - Dec 19 at 23:59 Time Limit 60 Minutes

Instructions
The Quiz I can be attempted only once. You will not be provided with a  make-up Quiz, if you miss
this Quiz.

The quiz is timed for one hour and has to be completed once you start. You will not be allowed to go
back to the previous question if you skip a question. 

“Choose the most appropriate answer to each question.”

The answers will be visible only after three days once the quiz has ended.

All the best.

Attempt History
Attempt Time Score
LATEST Attempt 1 60 minutes 8.63 out of 10


Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 8.63 out of 10

Submitted Dec 18 at 22:24
This attempt took 60 minutes.

Question 1 0.25 / 0.25 pts

Which of the following is not an application of data science?

 
Privacy Checker

 
Online Price Comparison

 
Recommendation Systems

 
Image & Speech Recognition

Question 2 0.25 / 0.25 pts

Which of the following are correct skills for a Data Scientist?

 
data wrangling

 
machine learning/deep learning

 
all of the options

 
probability and statistics

Question 3 0.25 / 0.25 pts

R and Python are not preferred skills for

 
Data Architect

 
Data Analyst

 
Data Journalist

 
Data Scientist

Question 4 0.25 / 0.25 pts

Tableau is not likely to be used by

 
Visualization Engineer

 
Business Analyst

 
Data Scientist

 
Data Architect

Question 5 0.25 / 0.25 pts

Which of the following best describes the difference between a data
analyst and a data scientist?

 
A data analyst does estimation, whereas a data scientist predicts and
explains it as well

 
Data analysts and data scientists play the same role in the project

 
A data analyst just deals with numbers, whereas a data scientist deals
with algorithms

 
Data analysts are more proficient in R, whereas data scientists are more
proficient in Python

Question 6 0.25 / 0.25 pts

"Take last week's data and predict the sales for the next six months within
the next few days with a 2-person team" is a good example of which of the
following data science challenges?

 
Lack of professionals with sound knowledge of data science skills

 
Unrealistic expectations

 
Data sizing

 
Insufficient timing for project completion

Question 7 0.25 / 0.25 pts

Compute the Jaccard coefficient for x = (1,0,0,0,1,1,1) and y =
(0,1,1,0,0,1,0)

 
0.166

 
0.455

 
0.765

 
0.234
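
For binary vectors, the Jaccard coefficient is the number of positions where both vectors are 1, divided by the number of positions where at least one is 1 (positions where both are 0 are ignored). A minimal sketch with the vectors from the question; the `jaccard` helper name is an assumption:

```python
def jaccard(x, y):
    """Jaccard coefficient of two equal-length binary vectors."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either

x = (1, 0, 0, 0, 1, 1, 1)
y = (0, 1, 1, 0, 0, 1, 0)
j = jaccard(x, y)
print(j)
```

Here one position has a 1 in both vectors and six positions have a 1 in at least one, so the coefficient is 1/6.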

Question 8 0.25 / 0.25 pts

Suppose the lab administrator measured the power consumption of an
entire network operations centre (NOC) and the set of consumption
details is 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110 W,
98 W, 210 W and 115 W. What is the mode of the power consumption?

 
90W

 
98W

 
100W

 
150W
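
The mode is the value that occurs most often. A minimal sketch using the readings from the question and Python's standard `statistics.mode` (variable names are illustrative):

```python
import statistics

# Power readings in watts, as listed in the question.
readings_w = [90, 104, 98, 98, 105, 92, 102, 100, 110, 98, 210, 115]

# The most frequently occurring reading.
mode_w = statistics.mode(readings_w)
print(mode_w)
```

Unlike the mean, the mode is unaffected by the single large 210 W reading.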

Question 9 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 3.2. [Answer should have a precision of X.XXXX]

 
-0.8679

 
0.8769

 
0.8679

 
-0.8769

Incorrect
Question 10 0 / 0.25 pts

Compute the Cosine similarity between A(3,6) and B(4,8).

 
1.0

 
1.3

 
1.1

 
1.2
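
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it can never exceed 1. A minimal sketch with the points from the question; the `cosine_similarity` helper name is an assumption:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

sim = cosine_similarity((3, 6), (4, 8))
print(round(sim, 4))
```

Since B is a scalar multiple of A (both lie along the direction (1, 2)), the angle between them is zero.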

Question 11 0.25 / 0.25 pts

Which of the following data science project steps is the most critical
for the success of the project?

 
Model Selection

 
Data preprocessing

 
Model Evaluation

 
Model Building

Question 12 0.25 / 0.25 pts

Assess Situation - "Costs and Benefits" - falls into which phase of
CRISP-DM?

 
Evaluation

 
Data modeling

 
Business Understanding

 
Data Preparation

Question 13 0.25 / 0.25 pts

In the "Business Understanding" phase of the data science process, the
goals are identified and the objectives are defined.

 
True

 
False

Incorrect
Question 14 0 / 0.25 pts

Consider a scenario where you feel unwell and go to a doctor. After a detailed
diagnosis, the doctor concludes that it is a regular fever and sends you
home with medicines. He also instructs you to rest, drink plenty
of fluids, and return to the hospital if the fever doesn't subside in a week.
This can be considered analogous to which stage of data analytics?

 
Prescriptive

 
Descriptive

 
Predictive

 
Diagnostic

Question 15 0.25 / 0.25 pts

Consider a scenario where you feel unwell and go to a doctor. The doctor asks
you questions such as whether you were exposed to rain or cold weather, had
contact with a sick person, or ate food from outside.
Based on your answers, the doctor comes to a conclusion. This can be
considered analogous to which stage of data analytics?

 
Diagnostic

 
Prescriptive

 
Descriptive

 
Predictive

Question 16 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A physics
teacher is analyzing the answer scripts of the students to identify the
areas that he/she should concentrate on so that the students
understand the concepts better.

 
Predictive Analytics

 
Prescriptive Analytics

 
Diagnostic Analytics

 
Descriptive Analytics

Question 17 0.25 / 0.25 pts

Which of the following steps is performed next by a data scientist after
acquiring the data?

 
Data integration

 
Data cleaning

 
Data replication

 
All of them

Question 18 0.25 / 0.25 pts

In 2001, big data was characterized by the three Vs: volume, velocity, and
variety. The Vs have grown to encompass veracity and value in the years
since. Big data is sometimes subjected to a fifth V, which is:

 
Vector

 
Variability

 
Vulnerability

 
Volatile

Question 19 0.25 / 0.25 pts

A company wants to find the target segment of people for one of its
products. Which modelling technique would generally be appropriate?

 
Regression

 
Classification

 
Clustering

 
Ranking

Question 20 0.25 / 0.25 pts

If you were to arrange the following data analytics techniques in
increasing order of complexity, which of the following is considered the
correct order?

 
Diagnostic, Predictive, Prescriptive, Descriptive

 
Descriptive, Diagnostic, Predictive, Prescriptive

 
Prescriptive, Descriptive, Diagnostic, Predictive

 
Predictive, Diagnostic, Prescriptive, Descriptive

Incorrect
Question 21 0 / 0.25 pts

Identify the data analytics task for the following scenario. A project
manager is analysing past projects to identify the risks involved and how
they were mitigated.

 
Diagnostic Analytics

 
Prescriptive Analytics

 
Descriptive Analytics

 
Predictive Analytics

Question 22 0.25 / 0.25 pts

Which of the following are classification problems?

 
Predict traffic congestion along a specific route between two locations
using vehicle journey times.

 
Finding the shorter path between two already-existing routes between
two locations.

 
Predicting if a cricket player is a batsman or bowler, given his playing
record.

 
Filtering spam from emails.

 
Calculating a room's temperature (in Celsius) based on other
environmental factors (such as atmospheric pressure, humidity etc).

Question 23 0.25 / 0.25 pts

E-mail is an example of ____________________ data

 
Unstructured

 
Semi-structured

 
Quasi-structured

 
Structured

Question 24 0.25 / 0.25 pts

Data lake mainly stores data in

 
Structured format

 
Raw format

 
both

 
none

Question 25 0.25 / 0.25 pts

Point out the correct statement.

 
None of the mentioned

 
Preprocessed data is original source of data

 
Raw data is the data obtained after processing steps

 
Raw data is original source of data

Question 26 0.25 / 0.25 pts

Which of the following Python libraries is required for web scraping?

 
WebSpider

 
Scraper

 
WebCrawler

 
BeautifulSoup

Question 27 0.25 / 0.25 pts

Which of the following is not true of the SEMMA model?

 
It places less emphasis on the initial planning phases covered in
CRISP-DM (Business Understanding and Data Understanding phases)
and omits entirely the Deployment phase.

 
The SEMMA model also emphasizes data mining as a non-linear,
adaptive process.

 
SEMMA is a logical organisation of the functional tool set of SAS
Enterprise Miner for carrying out the core tasks of data mining.

 
SEMMA is focused on the model development aspects of data mining.

Question 28 0.25 / 0.25 pts

In a dataset, CarColor is one of the attributes and it can take the
following values: {Red, Green, Yellow, Black}. What type of attribute is
CarColor?

 
Interval attribute

 
Ratio attribute

 
Ordinal attribute

 
Nominal attribute

Question 29 0.25 / 0.25 pts

Missing values should always be imputed before training the model.

 
Mostly True

 
False

 
None of the answers

 
True

Partial
Question 30 0.13 / 0.25 pts

Match the following sampling techniques with the use cases

Random Sampling   Randomly pick mang

Systematic Sampling   Select of fruits based

Stratified Sampling   Pick one mango, one

Quota Sampling   Pick every 5th fruit fr

Question 31 0.25 / 0.25 pts

A sample is ------------------ if it has approximately the same property of
interest

 
Representative

 
Probabilistic

 
Qualitative

 
Systematic

Question 32 0.25 / 0.25 pts

Consider the following Python code. 

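import sklearn.preprocessing

input = ['Havells', 'Philips', 'Syska', 'Eveready', 'Lloyd']
le = sklearn.preprocessing.LabelEncoder()
le.fit(input)
# transform() expects a list (array-like) of labels, not a bare string
print(le.transform(['Syska']))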

The last line of the code snippet will return

 
1

 
2

 
3

 
4

Question 33 0.25 / 0.25 pts

There are two sets X={10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,
-42, -43, -44}. What is TRUE about the standard deviations of X and Y
i.e. σX and σY respectively?

 
Will be the same.

 
Magnitude will be the same but the sign will be different

 
σX will be smaller than σY.

 
σY will be smaller than σX.
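
Standard deviation measures spread around the mean, so it is unchanged when every value is shifted by a constant or negated, and it is never negative. The sketch below checks this for the two sets in the question using the population standard deviation from Python's `statistics` module (the variable names are illustrative).

```python
import statistics

X = list(range(10, 25))           # 10, 11, ..., 24
Y = [-v for v in range(30, 45)]   # -30, -31, ..., -44

# Both sets are 15 consecutive integers, so their spread is identical.
sd_x = statistics.pstdev(X)
sd_y = statistics.pstdev(Y)
print(sd_x, sd_y)
```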

Question 34 0.25 / 0.25 pts

The first 20 rows of a continuous-valued attribute of a large data set are
given below. Missing values are handled by replacing them with 0. What kind
of discretisation would you prefer?

Continous_Attribute

130.2

125

126.75

 
Equal Width binning

 
Equal frequency binning.
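
The two binning strategies behave differently when a few values sit far from the rest, as happens when missing values are replaced with 0. The sketch below compares them on a small made-up sample (the values and both helper functions are assumptions for illustration, not the quiz data).

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding (roughly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

values = [0, 0, 125, 126, 127, 130]  # hypothetical; zeros stand in for missing values
print(equal_width_bins(values, 3))
print(equal_frequency_bins(values, 3))
```

With equal width, the zeros stretch the range so the real readings all collapse into one bin and the middle bin stays empty; equal frequency spreads the values evenly across bins.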

Incorrect Question 35 0 / 0.25 pts

Which of the following is a bottom-up approach for discretization?

 
Entropy based discretization

 
Correlation analysis

 
Equal-frequency binning

 
Histogram analysis

Question 36 0.25 / 0.25 pts

Data transformation is done to improve ------------- in an algorithm

 
Inconsistencies

 
Noise

 
Redundancy

 
Accuracy & Efficiency

 
Integration

Incorrect
Question 37 0 / 0.25 pts

Identify the sampling technique used in the following use case. For an ML
classification task, an engineer used every 8th example to generate the
test set.

 
Simple Random

 
Stratified Random

 
Systematic Sampling

 
Cluster Sampling
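
Picking every k-th item from an ordered list can be sketched with a slice. The example IDs and the `systematic_sample` helper below are hypothetical and only illustrate the "every 8th example" idea from the question.

```python
def systematic_sample(items, k, start=0):
    """Take every k-th item, beginning at index `start`."""
    return items[start::k]

examples = list(range(1, 25))  # hypothetical example IDs 1..24
test_set = systematic_sample(examples, k=8, start=7)
print(test_set)
```

In practice the starting offset is often chosen at random within the first k items.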

Question 38 0.25 / 0.25 pts

How can you handle missing or corrupted data in a dataset?

 
Drop missing rows or columns

 
All of the given options

 
Assign a unique category to missing values

 
Replace missing values with mean/median/mode

Question 39 0.25 / 0.25 pts

Exploratory data analysis does not help in

 
Finding out the data type of a variable

 
Finding statistical estimates of a variable

 
In univariate and bivariate variable analysis

 
Derivation of new attributes

Question 40 0.25 / 0.25 pts

Histogram analysis algorithms can be based on either equal width or
equal frequency.

 
True

 
False

Quiz Score: 8.63 out of 10

18/12/2022, 20:34 Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Attempt History
Attempt Time Score
LATEST Attempt 1 60 minutes 6.5 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 6.5 out of 10


Submitted Dec 18 at 20:33
This attempt took 60 minutes.

Question 1 0.25 / 0.25 pts

Which of the following are correct skills for a Data Scientist?

  machine learning/deep learning

  all of the options

  data wrangling

  probability and statistics

Incorrect Question 2 0 / 0.25 pts

R and Python are not preferred skills for

  Data Analyst

  Data Scientist

  Data Architect

  Data Journalist

Question 3 0.25 / 0.25 pts

Statement 1: Business Intelligence involves analyzing past data and
reporting on it.

Statement 2: Descriptive analysis involves analyzing past data and
reporting on it.

Which of the following is right?

  Both statements are correct

  Both statements are wrong

  Statement 1 is correct and Statement 2 is wrong

  Statement 1 is wrong and Statement 2 is correct

Question 4 0.25 / 0.25 pts

Data scientist is not responsible for 

  building continuous data stream

  data manipulation

  data mining

  data analytics

Question 5 0.25 / 0.25 pts

Which of the following is not an application of data science?

  Online Price Comparison

  Image & Speech Recognition

  Privacy Checker

  Recommendation Systems

Question 6 0.25 / 0.25 pts

An organization that has little need for data analytics and is just
embarking on the analytics path will most likely structure its data team
in a

  Centralized model

  Federated model

  Consulting model

  Functional model

Question 7 0.25 / 0.25 pts

Suppose that the minimum and maximum values for an attribute are 4.3
and 7.6, respectively. Compute the scaled value of 5.4 if min-max
normalization is applied to scale to [0.0, 1.0]. (Answer should have a
precision of X.XX.)

  0.33

  0.43

  0.36

  0.45
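
The min-max computation in the question above can be written out as a small function. This is a sketch; the function and parameter names are illustrative, and the formula is the usual (v - min) / (max - min) rescaled to the target range.

```python
def min_max_scale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map `value` from [old_min, old_max] to [new_min, new_max]."""
    fraction = (value - old_min) / (old_max - old_min)
    return fraction * (new_max - new_min) + new_min

# Values from the question: min 4.3, max 7.6, value 5.4, target range [0.0, 1.0].
scaled = min_max_scale(5.4, 4.3, 7.6)
print(f"{scaled:.2f}")
```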

Question 8 0.25 / 0.25 pts

Compute the Jaccard's Co-efficient for x = (1,0,0,0,1,1,1) and y =
(0,1,1,0,0,1,0)

  0.765

  0.166

  0.455

  0.234
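
As a worked check for the question above, the Jaccard coefficient of two binary vectors is the number of positions where both are 1, divided by the number of positions where at least one is 1. The helper name below is illustrative, and the sketch assumes the vectors are not both all zeros.

```python
def jaccard_coefficient(x, y):
    """Jaccard coefficient for equal-length binary vectors."""
    both_one = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    mismatched = sum(1 for a, b in zip(x, y) if a != b)
    # 0-0 matches are ignored by the Jaccard coefficient.
    return both_one / (both_one + mismatched)

x = (1, 0, 0, 0, 1, 1, 1)
y = (0, 1, 1, 0, 0, 1, 0)
j = jaccard_coefficient(x, y)
print(j)
```

Here one position has a 1 in both vectors and five positions disagree, so the value is 1/6.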

Question 9 0.25 / 0.25 pts

Compute the width of each bin for data given below, if the number of
bins is 5.

[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46, 48,
36]

 6

 4

(49−30)÷5

 3

 5
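
The bin-width calculation hinted at by (49−30)÷5 can be checked directly. A minimal sketch, assuming equal-width binning where width = (max − min) / number of bins:

```python
import math

data = [39, 45, 49, 45, 31, 37, 38, 41, 37, 41,
        39, 34, 35, 30, 47, 43, 44, 46, 48, 36]
num_bins = 5

# Equal-width binning: split the range [min, max] into num_bins intervals.
width = (max(data) - min(data)) / num_bins
print(width, math.ceil(width))
```

The exact width is fractional; rounding up to a whole number is one common convention when integer-valued bins are wanted.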

Question 10 0.25 / 0.25 pts

Suppose that the minimum and maximum values for an attribute are 4.3
and 7.6, respectively. Compute the scaled value of 5.4 if min-max
normalization is applied to scale to [0.0, 1.0]. (Answer should have a
precision of X.XX.)

  0.33

  0.29

  0.30

  0.38

Incorrect Question 11 0 / 0.25 pts

The most time-consuming phase in a data science process is

  Deployment

  Data Modelling

  Data preparation

  Data collection

Question 12 0.25 / 0.25 pts

Which of the following is not a type of predictive analytics?

  What is the student's performance in the next question?

  average attendance of the students in the current semester?

  Which course will the student take in the next semester?

 
what is the average score of all students in the CBSE 10th Math Exam

Incorrect Question 13 0 / 0.25 pts

Which of the following methodologies focus the most on model
deployment and embedding in operational systems?

  CRISP-DM

  SEMMA

  SMAM

  All options are correct

Question 14 0.25 / 0.25 pts

BITS investigating to determine the cause for decreased admissions for
CSI PG program is an example of

  Descriptive analysis

  Prescriptive analysis

  Diagnostic Analysis

  Predictive analysis

Question 15 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A physics
teacher is analyzing the answer scripts of the students to identify the
areas that he/she should concentrate on so that the students
understand the concepts better.

  Predictive Analytics

  Diagnostic Analytics

  Descriptive Analytics

  Prescriptive Analytics

Question 16 0.25 / 0.25 pts

Which of the following statements is false with respect to a data set?

 
Subsetting can be used to select and exclude variables and
observations

 
Merging concerns combining datasets on the same observations to
produce a result

  Raw data should be processed only one time.

  All of the listed options

Incorrect Question 17 0 / 0.25 pts

A police team wants to predict the crime rate in a locality based on
certain attributes. Which modelling technique would be appropriate

  Optimization

  Classification

  Regression

  Clustering

Question 18 0.25 / 0.25 pts

Google tries to differentiate emails as spam and non-spam, this is an
example of

  Clustering

  Association Rule

  Classification

  Regression

Incorrect Question 19 0 / 0.25 pts

Which of the following data science project steps is the most critical step
for the success of the project?

  Model Selection

  Model Evaluation

  Model Building

  Data preprocessing

Question 20 0.25 / 0.25 pts

The answer to the following question can be obtained by which type of
analytics?
"What's the best that can happen?"

  Diagnostic Analytics

  Descriptive Analytics

  Prescriptive Analytics

  Predictive Analytics

Question 21 0.25 / 0.25 pts

A company wants to find the target segment of people for one of its
products. Which modelling technique would be generally appropriate

  Classification

  Clustering

  Regression

  Ranking

Question 22 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A student is
analyzing various blogs and vlogs to find out the skill set that has to be
acquired to become a data scientist in the future. 

  Diagnostic Analytics

  Descriptive Analytics

  Predictive Analytics

  Prescriptive Analytics

Question 23 0.25 / 0.25 pts

Which of the following Python libraries is required for web scraping?

  WebSpider

  WebCrawler

  BeautifulSoup

  Scraper

Incorrect Question 24 0 / 0.25 pts

As part of a survey in a large organization, one of the features that you
capture is designation. This type of data has the characteristic

  None of the given answers

  Discrete, Qualitative, Ordinal

  Nominal, Quantitative, Discrete

  Discrete, Quantitative, Ordinal

Question 25 0.25 / 0.25 pts

A dataset can contain

  Both Quantitative and Qualitative Values

  None of the answers

  Quantitative values

  Qualitative Values

Question 26 0.25 / 0.25 pts

Which of the following properties are supported by an interval attribute?

   P)  Distinctness

   Q)  Order

   R)   Meaningful differences

   S)  Meaningful ratios

  P and R

  P and S

  P,Q and R

  All the options are correct

Question 27 0.25 / 0.25 pts

In a dataset, CarColor is one of the attributes and it can take the
following values: {Red, Green, Yellow, Black}. What type of attribute is
CarColor?

  Interval attribute

  Ordinal attribute

  Nominal attribute

  Ratio attribute

Question 28 0.25 / 0.25 pts

Raw data should be processed only one time.

  True

  False

Incorrect Question 29 0 / 0.25 pts

Which statement best compares histogram and 5-number summary

  5-number summary can be used for non-numeric data

  Histogram is always more informative on data distribution

  Histogram can be very informative with finer ranges

  5-number summary is robust w.r.t. noise and outliers

Incorrect Question 30 0 / 0.25 pts

Consider the following Python code. 


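import sklearn.preprocessing
labels = ['Havells','Philips','Syska','Eveready','Lloyd']
le = sklearn.preprocessing.LabelEncoder()
le.fit(labels)
# transform() expects an array-like, so the label is wrapped in a list
print(le.transform(['Lloyd'])[0])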
The last line of the code snippet will print

 2

 4

 1

 3
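
To reason about the snippet above without running scikit-learn: `LabelEncoder` stores its classes in sorted order and assigns each one its position in that order. The standard-library sketch below mimics only that behaviour; it is an illustration, not the library's implementation, and the variable names are hypothetical.

```python
brands = ['Havells', 'Philips', 'Syska', 'Eveready', 'Lloyd']

# Sort the distinct labels, then number them 0, 1, 2, ... in that order.
classes = sorted(set(brands))
encoding = {label: code for code, label in enumerate(classes)}

print(classes)
print(encoding['Lloyd'])
```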

Question 31 0.25 / 0.25 pts

For a data analytics task to analyse feedback on her subject for a class
of 60 students, a school teacher decided to use the survey submitted
by the ten students who come for tuitions for that subject, at her home.
Identify the type of sampling she is doing.

  Non Probabilistic sampling

  Systematic Sampling

  Sampling without replacement

  Stratified sampling

Incorrect Question 32 0 / 0.25 pts

Which of the statements is TRUE?

  Outliers should be addressed in the test dataset

  Outliers should always be addressed in the dataset

  Outliers should be addressed only in the training dataset

  Treatment of outliers depends on the problem statement

Incorrect Question 33 0 / 0.25 pts

The statistical description (x1 + x2 + . . . + xN)/N, for the data values
x1, x2, . . . , xN, is called their ________________

  mean

  IQR

  median

  mode

Incorrect Question 34 0 / 0.25 pts

Exploratory data analysis does not help in

  Finding out the data type of a variable

  In univariate and bivariate variable analysis

  Derivation of new attributes

  Finding statistical estimates of a variable

Question 35 0.25 / 0.25 pts

Which of the following statements are true with respect to data quality
issues?

  The given data set should not miss any values or attributes

  All of the above

 
Pre-processing of data is required to address the problems of
inconsistency, incompleteness.

 
If data are not updated time to time there will be a negative impact on
data quality.

Incorrect Question 36 0 / 0.25 pts

Consider the following Python code.  


import sklearn.preprocessing
lb = sklearn.preprocessing.LabelBinarizer()
print(lb.fit_transform(['yes', 'no', 'no', 'yes']))
The last line of the code snippet will print

  [0,1,1,0]

  [true, false, false, true]

  [True, False, False, True]

  [1,0,0,1]
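
For the two-class case in the snippet above, `LabelBinarizer` sorts the classes and returns a single column in which the second class in sorted order is coded 1. The sketch below reproduces that mapping with the standard library only; it is a simplified stand-in for the binary case, not the library's code.

```python
labels = ['yes', 'no', 'no', 'yes']

# Sorted distinct classes: the first is coded 0, the second is coded 1.
classes = sorted(set(labels))

# One row per input label, one column in total (binary case only).
column = [[classes.index(label)] for label in labels]
print(column)
```

Note that the result is a column of single-element rows rather than a flat list.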

Question 37 0.25 / 0.25 pts

Identify the false statement.

 
As a data scientist, while cleaning the data, you are not concerned with
why the data values are missing. You just fix them.

 
As a data scientist, even if you know that certain outliers are valid data,
you might still omit them from model construction.

  It may not be a good idea to drop a data field with missing values.

 
As a data scientist, even if you know that certain outliers are valid data,
you might still omit them from model construction.

Incorrect Question 38 0 / 0.25 pts

A data object is described by a categorical attribute having four
categories. If, for data analysis purposes, we want to transform the
attribute into numerical values, then

  A. it can’t be done

  C. it can be done by encoding using only 4 binary variables

  B. it can be done by encoding using only 3 binary variables

  D. it can be done by encoding using 3 or 4 binary variables
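
The two encodings the options allude to can be compared side by side. The sketch below assumes a hypothetical attribute with categories A to D: one-hot encoding uses one binary variable per category, while dummy coding drops one column and lets the all-zeros pattern stand for the dropped category.

```python
# Hypothetical four-category attribute.
categories = ['A', 'B', 'C', 'D']

def one_hot(value):
    """One binary variable per category (4 variables here)."""
    return [1 if value == c else 0 for c in categories]

def dummy(value):
    """Drop the first column; all zeros then means the first category."""
    return one_hot(value)[1:]

for c in categories:
    print(c, one_hot(c), dummy(c))
```

Both schemes give every category a distinct code, which is why either count of binary variables can work.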

Question 39 0.25 / 0.25 pts

In a boxplot, where Q1, Q2 and Q3 are the first, second and third
quartiles respectively, the interquartile range IQR is calculated as:

  None of the above

  IQR = Q3 – Q1

  IQR = Q2-Q1

  IQR = Q3-Q2
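
A small numerical illustration of the quartiles in the question above, using `statistics.quantiles` from the Python standard library (Python 3.8+). The data values are hypothetical, and quartile conventions differ slightly between tools, so other software may report different cut points for the same data.

```python
import statistics

# Hypothetical sample.
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# The interquartile range spans the middle half of the data.
iqr = q3 - q1
print(q1, q2, q3, iqr)
```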

Incorrect Question 40 0 / 0.25 pts

The ____ and standard deviation are strongly affected by outliers

  MEAN

  RANGE

  MEDIAN

  MODE
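
The sensitivity to outliers mentioned in the question above is easy to see on a toy sample. The values below are hypothetical; the sketch compares mean, standard deviation and median before and after adding one extreme value.

```python
import statistics

# Hypothetical scores, then the same scores with one extreme value added.
scores = [10, 12, 11, 13, 12]
with_outlier = scores + [100]

for label, values in (("original", scores), ("with outlier", with_outlier)):
    print(label,
          round(statistics.mean(values), 2),
          round(statistics.stdev(values), 2),
          statistics.median(values))
```

In this sample the mean and standard deviation shift substantially while the median stays put.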

Quiz Score: 6.5 out of 10


18/12/2022, 22:12 Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Attempt History
Attempt Time Score
LATEST Attempt 1 (https://bits-pilani.instructure.com/courses/1704/quizzes/3453/history?version=1) 31 minutes 8.48 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 8.48 out of 10


Submitted Dec 18 at 22:10
This attempt took 31 minutes.

Question 1 0.25 / 0.25 pts

Due to market expectations, businesses are having difficulty retaining
highly trained data scientists and engineers.

  No answer text provided.

  True

  False

  No answer text provided.

Question 2 0.25 / 0.25 pts

Which of these is not an example of the application of data science?

  Product recommender systems

  Creation of new fiscal policy

  Fraud detection and prevention system in a bank

  Targeted advertising as per customer's need

Question 3 0.25 / 0.25 pts

If a person is coming from a software development background, which of
the following Data science project roles will best suit him?

  Machine Learning Engineer

  Storyteller

  Data Analyst

  Big Data Engineer

Question 4 0.25 / 0.25 pts

Data science is the process of diverse set of data through ____________

  All of the options

  processing data

  analysing data

  organizing data

Question 5 0.25 / 0.25 pts

Statement 1: Role of a Business analyst usually requires expertise on
building ML models.

Statement 2: Role of a Data Scientist usually requires expertise on
performing descriptive analysis.

Which of the following is right?

  Both statements are wrong

  Statement 1 is correct and Statement 2 is wrong

  Statement 1 is wrong and Statement 2 is correct

  Both statements are correct

Question 6 0.25 / 0.25 pts

Data scientist is not responsible for

  Data mining

  Data manipulation

  Building continuous data stream

  Data analytics

Question 7 0.25 / 0.25 pts

Compute the width of each bin for data given below, if the number of
bins is 5.

[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46, 48,
36]

 4

(49−30)÷5

 3

 6

 5

Question 8 0.25 / 0.25 pts

Compute the Manhattan distance between A(2,3) and B(5,7).

 7

 8

 5

 6
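
The Manhattan (city-block) distance in the question above is the sum of absolute coordinate differences. A minimal sketch, with an illustrative function name:

```python
def manhattan_distance(p, q):
    """Sum of absolute differences between corresponding coordinates."""
    return sum(abs(a - b) for a, b in zip(p, q))

A = (2, 3)
B = (5, 7)
distance = manhattan_distance(A, B)  # |2 - 5| + |3 - 7|
print(distance)
```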

Question 9 0.25 / 0.25 pts

Compute the depth of each bin for data given below, if the number of
bins is 4.
[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36, 43,
46]

 6

 5

 3

 4
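
In equal-frequency (equal-depth) binning, the depth is the number of values placed in each bin. The sketch below assumes the value count divides evenly by the number of bins, as it does for the data in the question above.

```python
data = [47, 38, 40, 42, 47, 42, 41, 39, 42, 45,
        36, 37, 36, 40, 43, 40, 45, 36, 43, 46]
num_bins = 4

# Depth = number of values per bin.
depth = len(data) // num_bins

# Sort the values, then cut the sorted list into consecutive chunks of `depth`.
ordered = sorted(data)
bins = [ordered[i:i + depth] for i in range(0, len(ordered), depth)]

print(depth)
for b in bins:
    print(b)
```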

Question 10 0.25 / 0.25 pts

Compute the Jaccard's Co-efficient for x = (1,0,0,0,1,1,1) and y =
(0,1,1,0,0,1,0)

  0.234

  0.166

  0.455

  0.765

Question 11 0.25 / 0.25 pts

A course instructor has data about students' attendance in her course in
the past semester. What kind of analytics is she performing when she
creates a line graph based on this data?

  Descriptive

  Prescriptive

  Predictive

  Diagnostic

Question 12 0.25 / 0.25 pts

Fortis-Apollo hospital is planning to design a model which maps
patients to the best possible treatments based on the diagnosis. Identify
the data analytics task for this scenario.

  Descriptive Analytics

  Diagnostic Analytics

  Cognitive analytics

  Predictive Analytics

Question 13 0.25 / 0.25 pts

Assess Situation - "Cost and Benefits" - falls into which phase of
CRISP-DM?

  Data modeling

  Data Preparation

  Evaluation

  Business Understanding

Question 14 0.25 / 0.25 pts

Training and testing datasets are developed during _________ phase.

  Operationalize

  None

  Model building

  Model planning

Question 15 0.25 / 0.25 pts

Consider a scenario where you feel unwell and go to a doctor. The doctor
asks you questions such as whether you were exposed to rain or a cold
climate, had contact with a sick person, or ate food from outside.
Based on your answers, the doctor comes to a conclusion. This can be
considered analogous to which stage of data analytics?

  Diagnostic

  Prescriptive

  Predictive

  Descriptive

Question 16 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A mother
analysed the answer scripts of her daughter and found that they should
concentrate on reading comprehension to score better in the upcoming
exams. 

  Predictive Analytics

  Diagnostic Analytics

  Prescriptive Analytics

  Descriptive Analytics

Question 17 0.25 / 0.25 pts

The 5 stages of the KDD process are

1. Selection
2. Preprocessing
3. Transformation
4. Data Mining
5. Interpretation/Evaluation

Identify the CRISP-DM phases that correspond to Stages 3 and 4 of KDD.

  Data Preparation,Modeling

  Evaluation,Business understanding

  Data Understanding,Evaluation

  Modeling,Data Preparation

Incorrect Question 18 0 / 0.25 pts

Big data analytics does not benefit a company:

  Increase costs due to additional analytics investment

  Better understand customers

  Increase shareholder dividends

  Refine marketing and advertising

Question 19 0.25 / 0.25 pts

We are predicting the humidity at Bangalore using the data collected in
the last one month. This task is an example of

  Regression

  Association Rule

  Classification

  Clustering

Question 20 0.25 / 0.25 pts

A company wants to find the target segment of people for one of its
products. Which modelling technique would be generally appropriate

  Clustering

  Ranking

  Classification

  Regression

Incorrect Question 21 0 / 0.25 pts

To generate genuine business value, simply gathering and keeping data
isn't enough. Technologies for big data analytics are required to:

  Extract valuable insights from the data

  Formulate eye-catching charts and graphs

  Determine business goals and objectives

  Integrate data from internal and external sources

Question 22 0.25 / 0.25 pts

Which data analytic approach can show the relationships in the various
elements in your data?

  Descriptive

  Prescriptive

  Predictive

  Diagnostic

Incorrect Question 23 0 / 0.25 pts

In a dataset, we are tracking whether a customer is purchasing a
product or not. This is an example of

  Symmetric attribute

  Discrete attribute

  Asymmetric attribute

  Continuous attribute

Incorrect Question 24 0 / 0.25 pts

Is it possible to convert an interval variable to an Ordinal Variable?

  True

  False

Question 25 0.25 / 0.25 pts

Bank loan approval data consists of a field called loan type. It is stored
as an integer in the database. Its values mean the following:- 1 -
personal loan, 2 - home loan, 3 - business loan. What type of data type
is the loan type?

  Ratio

  Interval

  Nominal

  Ordinal

Question 26 0.25 / 0.25 pts

Is it possible to convert a Nominal scale to an Ordinal Scale during data
analysis?

  True

  False

Question 27 0.25 / 0.25 pts

As part of a survey in a large organization, one of the features that you
capture is designation. This type of data has the characteristic

  Discrete, Qualitative, Ordinal

  Nominal, Quantitative, Discrete

  Discrete, Quantitative, Ordinal

  None of the given answers

Partial Question 28 0.17 / 0.25 pts

Match the following datasets to its correct type.

Transaction Data   Record

Molecular Structures   Graph

Spatial Data   Graph

Question 29 0.25 / 0.25 pts

Data transformation is done to improve ------------- in algorithm

  Integration

  Redundancy

  Inconsistencies

  Noise

  Accuracy & Efficiency

Incorrect Question 30 0 / 0.25 pts

Which statement best compares histogram and 5-number summary

  Histogram is always more informative on data distribution

  Histogram can be very informative with finer ranges

  5-number summary can be used for non-numeric data

  5-number summary is robust w.r.t. noise and outliers

Question 31 0.25 / 0.25 pts

What does discretization do?

  Convert a continuous attribute into a discrete attribute



  Both of the above.

  None of the above.

  Reduce overall data size.

Question 32 0.25 / 0.25 pts

Which of the following statements are true?

I. The smaller data sets resulting from data reduction require less
memory and processing time.

II. Aggregation provides a high‐level view of the data instead of a
low‐level view

III. High-quality data that is aggregated can lead to a high chance of
identifying false positives and negatives

IV. An advantage of aggregation is the potential loss of interesting
details

  Only Statement III

  Statement I and II

  Statement II and III

  Statement I and IV

Question 33 0.25 / 0.25 pts

The scatterplot implies that

  The features are not correlated

  The features are correlated.

  The features are negatively correlated.

  The features are positively correlated.

Question 34 0.25 / 0.25 pts

Consider the following Python code.  


import numpy as np
from sklearn.preprocessing import Binarizer
exam1 = np.array([41,43,45,47,49])
b1 = Binarizer(threshold= exam1.mean())
exam1_b = b1.fit_transform(exam1.reshape(-1, 1))
print(exam1_b)
The last line of the code snippet will print

  [0,0,0,1,1]

  [0,0,1,0,0]

  [1,1,0,1,1]

  [1,1,1,0,0]
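
The thresholding rule in the snippet can be traced without scikit-learn. The sketch below assumes the documented behaviour of `Binarizer`, namely that values strictly greater than the threshold map to 1 and all others to 0; the helper name `binarize` is hypothetical. Note that the real `fit_transform` call returns a 2-D column array rather than a flat list.

```python
def binarize(values, threshold):
    """Map each value to 1 if it is strictly greater than threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

exam1 = [41, 43, 45, 47, 49]
threshold = sum(exam1) / len(exam1)  # the mean of the scores, 45.0
exam1_b = binarize(exam1, threshold)

print(threshold)
print(exam1_b)
```

The value 45 equals the threshold, so it is not strictly greater and maps to 0.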

Question 35 0.25 / 0.25 pts

What does discretization do?

  Convert a continuous attribute into a discrete attribute.

  Reduce overall data size.

  None of the above.

  Both of the above.

Question 36 0.25 / 0.25 pts

In which phase are duplicates in the data removed? Choose the
best possible answer.

  Data Collection

  Data Requirements

  Data Preparation

  Data Understanding

Question 37 0.25 / 0.25 pts

Identify the sampling technique used in the following use case. For an ML
classification task, the algorithm requires that the test set has an equal
number of examples from each of the three categories.

  Simple Random

  Stratified Random

  Cluster Sampling

  Systematic Sampling
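
Drawing the same number of examples from every category can be sketched as below. This is a minimal illustration under stated assumptions: the records, the class labels "A", "B" and "C", and the helper name `stratified_sample` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, label_of, per_class, seed=0):
    """Draw per_class records at random from each label group (stratum)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[label_of(record)].append(record)
    sample = []
    for label in sorted(groups):
        sample.extend(rng.sample(groups[label], per_class))
    return sample

# Hypothetical data set: 30 records, 10 in each of three categories.
records = [(i, "ABC"[i % 3]) for i in range(30)]

test_set = stratified_sample(records, label_of=lambda r: r[1], per_class=4)
counts = {lab: sum(1 for _, l in test_set if l == lab) for lab in "ABC"}
print(counts)
```

Simple random sampling over all 30 records would not guarantee equal counts per category; sampling within each stratum does.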

Question 38 0.25 / 0.25 pts

Missing values should always be imputed before training the model.

  True

  None of the answers

  Mostly True

  False

Partial Question 39 0.06 / 0.25 pts

Match each of the following Python functions used in data cleaning to its usage.

dropna()   return Index without NA

fillna()   to fill NA values in the d

interpolate()   to fill NA/NaN values us

notnull()   return Index without NA

Question 40 0.25 / 0.25 pts

What is the median of this data 2,5,1,6,7?

 6

 2

 5

  5.5
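
The median can be checked by sorting the values and taking the middle one. The sketch below uses the data from the question; the variable names are illustrative.

```python
import statistics

data = [2, 5, 1, 6, 7]

ordered = sorted(data)                # [1, 2, 5, 6, 7]
middle = ordered[len(ordered) // 2]   # odd count, so one middle element

print(ordered)
print(middle)
print(statistics.median(data))        # standard-library cross-check
```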

Quiz Score: 8.48 out of 10

18/12/2022, 20:34 Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40
Available Dec 18 at 19:00 - Dec 19 at 23:59 Time Limit 60 Minutes

Attempt History
Attempt Time Score
LATEST Attempt 1 60 minutes 6.5 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 6.5 out of 10


Submitted Dec 18 at 20:33
This attempt took 60 minutes.

Question 1 0.25 / 0.25 pts

Which of the following are correct skills for a Data Scientist?

  machine learning/deep learning

  all of the options

  data wrangling

  probability and statistics

Incorrect Question 2 0 / 0.25 pts

R and Python are not preferred skills for

  Data Analyst

  Data Scientist

  Data Architect

  Data Journalist

Question 3 0.25 / 0.25 pts

Statement 1: Business Intelligence involves analyzing past data and reporting on it.

Statement 2: Descriptive analysis involves analyzing past data and reporting on it.

Which of the following is right?

  Both statements are correct

  Both statements are wrong

  Statement 1 is correct and Statement 2 is wrong

  Statement 1 is wrong and Statement 2 is correct

Question 4 0.25 / 0.25 pts

A data scientist is not responsible for

  building continuous data stream

  data manipulation

  data mining

  data analytics

Question 5 0.25 / 0.25 pts

Which of the following is not an application of data science?

  Online Price Comparison

  Image & Speech Recognition

  Privacy Checker

  Recommendation Systems

Question 6 0.25 / 0.25 pts

An organization that has little need for data analytics and is just
embarking on the analytics path will most likely structure its data team
in a

  Centralized model

  Federated model

  Consulting model

  Functional model

Question 7 0.25 / 0.25 pts

Suppose that the minimum and maximum values for an attribute are 4.3
and 7.6, respectively. Compute the scaled value of 5.4 if min-max
normalization is applied to scale [0.0,1.0]. (Answer should have a
precision of X.XX)

  0.33

  0.43

  0.36

  0.45
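
The min-max calculation can be sketched as follows. The function name `min_max_scale` and its parameter names are illustrative assumptions; the formula is the usual (v - min) / (max - min) rescaling onto the target range.

```python
def min_max_scale(v, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly rescale v from [old_min, old_max] to [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

scaled = min_max_scale(5.4, old_min=4.3, old_max=7.6)
print(f"{scaled:.2f}")
```

Here (5.4 - 4.3) / (7.6 - 4.3) = 1.1 / 3.3, which is one third, so the value at X.XX precision is 0.33.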

Question 8 0.25 / 0.25 pts

Compute the Jaccard coefficient for x = (1,0,0,0,1,1,1) and y = (0,1,1,0,0,1,0)

  0.765

  0.166

  0.455

  0.234
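
The Jaccard coefficient for binary vectors counts the positions where both vectors are 1 and divides by the number of positions where at least one is 1 (0-0 matches are ignored). A minimal sketch, with the helper name `jaccard` chosen for illustration:

```python
def jaccard(x, y):
    """Jaccard coefficient for two equal-length binary vectors."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both are 1
    mismatches = sum(1 for a, b in zip(x, y) if a != b)      # 1-0 or 0-1
    return f11 / (f11 + mismatches)  # undefined if both vectors are all zeros

x = (1, 0, 0, 0, 1, 1, 1)
y = (0, 1, 1, 0, 0, 1, 0)

j = jaccard(x, y)
print(j)
```

Only one position has a 1 in both vectors and five positions disagree, so the result is 1/6, about 0.1667; the option 0.166 is this value truncated.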

Question 9 0.25 / 0.25 pts

Compute the width of each bin for the data given below, if the number of
bins is 5.

[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46, 48, 36]

 6

 4

(49−30)÷5

 3

 5
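
The equal-width calculation noted above, (49 − 30) ÷ 5, can be reproduced as follows. Rounding the result up to a whole number is an assumption made here so that five bins cover the full range; the quiz itself does not state the rounding rule.

```python
import math

data = [39, 45, 49, 45, 31, 37, 38, 41, 37, 41,
        39, 34, 35, 30, 47, 43, 44, 46, 48, 36]
k = 5  # number of bins

width = (max(data) - min(data)) / k  # (49 - 30) / 5
rounded_width = math.ceil(width)     # assumed rounding rule

print(width)
print(rounded_width)
```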

Question 10 0.25 / 0.25 pts

Suppose that the minimum and maximum values for an attribute are 4.3
and 7.6, respectively. Compute the scaled value of 5.4 if min-max
normalization is applied to scale [0.0,1.0]. (Answer should have a
precision of X.XX)

  0.33

  0.29

  0.30

  0.38

Incorrect Question 11 0 / 0.25 pts

The most time-consuming phase in a data science process is

  Deployment

  Data Modelling

  Data preparation

  Data collection

Question 12 0.25 / 0.25 pts

Which of the following is not a type of predictive analytics?

  What is the student's performance in the next question?

  average attendance of the students in the current semester?

  Which course will the student take in the next semester?

 
  What is the average score of all students in the CBSE 10th Math Exam?

Incorrect Question 13 0 / 0.25 pts

Which of the following methodologies focuses the most on model deployment and embedding in operational systems?

  CRISP-DM

  SEMMA

  SMAM

  All options are correct

Question 14 0.25 / 0.25 pts

BITS investigating to determine the cause of decreased admissions for the CSI PG program is an example of

  Descriptive analysis

  Prescriptive analysis

  Diagnostic Analysis

  Predictive analysis

Question 15 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A physics
teacher is analyzing the answer scripts of the students to identify the
areas that he/she should concentrate on so that the students
understand the concepts better.

  Predictive Analytics

  Diagnostic Analytics

  Descriptive Analytics

  Prescriptive Analytics

Question 16 0.25 / 0.25 pts

Which of the following statements is false with respect to a data set?

 
  Subsetting can be used to select and exclude variables and observations

  Merging concerns combining datasets on the same observations to produce a result

  Raw data should be processed only one time.

  All of the listed options

Incorrect Question 17 0 / 0.25 pts

A police team wants to predict the crime rate in a locality based on certain attributes. Which modelling technique would be appropriate?

  Optimization

  Classification

  Regression

  Clustering

Question 18 0.25 / 0.25 pts

Google tries to differentiate emails as spam and non-spam. This is an example of

  Clustering

  Association Rule

  Classification

  Regression

Incorrect Question 19 0 / 0.25 pts

Which of the following data science project steps is the most critical
for the success of the project?

  Model Selection

  Model Evaluation

  Model Building

  Data preprocessing

Question 20 0.25 / 0.25 pts

The answer to the following question can be obtained by which type of analytics?
"What's the best that can happen?"

  Diagnostic Analytics

  Descriptive Analytics

  Prescriptive Analytics

  Predictive Analytics

Question 21 0.25 / 0.25 pts

A company wants to find the target segment of people for one of its
products. Which modelling technique would generally be appropriate?

  Classification

  Clustering

  Regression

  Ranking

Question 22 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A student is
analyzing various blogs and vlogs to find out the skill set that has to be
acquired to become a data scientist in the future. 

  Diagnostic Analytics

  Descriptive Analytics

  Predictive Analytics

  Prescriptive Analytics

Question 23 0.25 / 0.25 pts

Which of the following Python libraries is used for web scraping?

  WebSpider

  WebCrawler

  BeautifulSoup

  Scraper

Incorrect Question 24 0 / 0.25 pts

As part of a survey in a large organization, one of the features that you capture is designation. This type of data has the characteristic

  None of the given answers

  Discrete, Qualitative, Ordinal

  Nominal, Quantitative, Discrete

  Discrete, Quantitative, Ordinal

Question 25 0.25 / 0.25 pts

A dataset can contain

  Both Quantitative and Qualitative Values

  None of the answers

  Quantitative values

  Qualitative Values

Question 26 0.25 / 0.25 pts

Which of the following properties are supported by an interval attribute?

   P)  Distinctness

   Q)  Order

   R)  Meaningful differences

   S)  Meaningful ratios

  P and R

  P and S

  P,Q and R

  All the options are correct

Question 27 0.25 / 0.25 pts

In a dataset, CarColor is one of the attributes and it can take the following values: {Red, Green, Yellow, Black}. What type of attribute is CarColor?

  Interval attribute

  Ordinal attribute

  Nominal attribute

  Ratio attribute

Question 28 0.25 / 0.25 pts

Raw data should be processed only one time.

  True

  False

Incorrect Question 29 0 / 0.25 pts

Which statement best compares the histogram and the 5-number summary?

  5-number summary can be used for non-numeric data

  Histogram is always more informative on data distribution

  Histogram can be very informative with finer ranges

  5-number summary is robust w.r.t. noise and outliers

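Incorrect Question 30 0 / 0.25 pts

Consider the following Python code.

import sklearn.preprocessing

# Brand names to encode (renamed from `input`, which shadows the built-in)
brands = ['Havells','Philips','Syska','Eveready','Lloyd']
le = sklearn.preprocessing.LabelEncoder()
le.fit(brands)
# transform() expects an array-like of labels, so the label is wrapped in a list
print(le.transform(['Lloyd']))
The last line of the code snippet will print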

 2

 4

 1

 3
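
The encoding can be traced by hand without scikit-learn. The sketch below assumes that the encoder assigns integer codes to the distinct labels in sorted order, which is how `LabelEncoder` is documented to store its `classes_`; the names `classes` and `code_of` are illustrative.

```python
brands = ['Havells', 'Philips', 'Syska', 'Eveready', 'Lloyd']

# Distinct labels in sorted order; each label's position is its integer code.
classes = sorted(set(brands))
code_of = {label: index for index, label in enumerate(classes)}

print(classes)
print(code_of['Lloyd'])
```

Sorted alphabetically, the labels are Eveready, Havells, Lloyd, Philips, Syska, so 'Lloyd' receives code 2.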

Question 31 0.25 / 0.25 pts

For a data analytics task to analyse feedback on her subject for a class
of 60 students, a school teacher decided to use the survey submitted
by the ten students who come for tuitions for that subject, at her home.
Identify the type of sampling she is doing.

  Non Probabilistic sampling

  Systematic Sampling

  Sampling without replacement

  Stratified sampling

Incorrect Question 32 0 / 0.25 pts

Which of the following statements is TRUE?

  Outliers should be addressed in the test dataset

  Outliers should always be addressed in the dataset

  Outliers should be addressed only in the training dataset

  Treatment of outliers depends on the problem statement

Incorrect Question 33 0 / 0.25 pts

The statistical description (x1 + x2 + . . . + xN)/N, for the data values x1, x2, . . . , xN,
is called their ________________

  mean

  IQR

  median

  mode

Incorrect Question 34 0 / 0.25 pts

Exploratory data analysis does not help in

  Finding out the data type of a variable

  In univariate and bivariate variable analysis

  Derivation of new attributes

  Finding statistical estimates of a variable

Question 35 0.25 / 0.25 pts

Which of the following statements are true with respect to data quality
issues?

  The given data set should not miss any values or attributes

  All of the above

 
  Pre-processing of data is required to address the problems of inconsistency and incompleteness.

  If data are not updated from time to time, there will be a negative impact on data quality.

Incorrect Question 36 0 / 0.25 pts

Consider the following Python code.

import sklearn.preprocessing

lb = sklearn.preprocessing.LabelBinarizer()
print(lb.fit_transform(['yes', 'no', 'no', 'yes']))
The last line of the code snippet will print

  [0,1,1,0]

  [true, false, false, true]

  [True, False, False, True]

  [1,0,0,1]
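
The two-class case can be traced by hand. The sketch below rests on an assumption about `LabelBinarizer`: that it orders the distinct labels alphabetically and, when there are exactly two, encodes the second one as 1 and the first as 0. The real call returns these values as a 2-D column array rather than a flat list.

```python
labels = ['yes', 'no', 'no', 'yes']

classes = sorted(set(labels))  # ['no', 'yes']
positive = classes[1]          # second class in sorted order is encoded as 1

encoded = [1 if label == positive else 0 for label in labels]
print(encoded)
```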

Question 37 0.25 / 0.25 pts

Identify the false statement.

 
  As a data scientist, while cleaning the data, you are not concerned with why the data values are missing. You just fix them.

  As a data scientist, even if you know that certain outliers are valid data, you might still omit them from model construction.

  It may not be a good idea to drop a data field with missing values.

  As a data scientist, even if you know that certain outliers are valid data, you might still omit them from model construction.

Incorrect Question 38 0 / 0.25 pts

A data object is described by a categorical attribute having four categories. If, for data analysis purposes, we want to transform the
attribute into numerical values, then

  A. it can’t be done

  C. it can be done by encoding using only 4 binary variables

  B. it can be done by encoding using only 3 binary variables

  D. it can be done by encoding using 3 or 4 binary variables
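
Two common binary encodings of a four-category attribute can be sketched as below: one-hot encoding, with one binary variable per category, and dummy encoding, which drops one category as the reference level. The category names and the helper names `one_hot` and `dummy` are hypothetical.

```python
categories = ['A', 'B', 'C', 'D']  # hypothetical four categories

def one_hot(value, cats):
    """One binary variable per category (4 variables for 4 categories)."""
    return [1 if value == c else 0 for c in cats]

def dummy(value, cats):
    """Drop the first category as the reference level (3 variables)."""
    return one_hot(value, cats)[1:]

print(one_hot('C', categories))
print(dummy('C', categories))
print(dummy('A', categories))  # the reference category is all zeros
```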

Question 39 0.25 / 0.25 pts

In a boxplot, where Q1, Q2 and Q3 are the first, second and third
quartiles respectively, the interquartile range IQR is calculated as:

  None of the above

  IQR = Q3 – Q1

  IQR = Q2-Q1

  IQR = Q3-Q2
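
The interquartile range can be computed as sketched below. The sample data are hypothetical, and quartile conventions differ between textbooks and libraries; this example uses Python's `statistics.quantiles` with its default method.

```python
import statistics

sample = [2, 4, 6, 8, 10, 12, 14]  # hypothetical data

q1, q2, q3 = statistics.quantiles(sample, n=4)  # first, second, third quartile
iqr = q3 - q1

print(q1, q2, q3)
print(iqr)
```

The second quartile Q2 is the median and does not enter the IQR, which spans the middle half of the data from Q1 to Q3.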

Incorrect Question 40 0 / 0.25 pts

The ____ and standard deviation are strongly affected by outliers

  MEAN

  RANGE

  MEDIAN

  MODE
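
The sensitivity of these statistics to a single extreme value can be illustrated as follows. The numbers are hypothetical and chosen only for the example.

```python
import statistics

clean = [10, 12, 11, 13, 12]   # hypothetical measurements
with_outlier = clean + [100]   # the same data plus one extreme value

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)
stdev_clean = statistics.stdev(clean)
stdev_outlier = statistics.stdev(with_outlier)

print(mean_clean, mean_outlier)      # the mean more than doubles
print(median_clean, median_outlier)  # the median stays at 12
print(stdev_clean, stdev_outlier)    # the standard deviation grows sharply
```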

Quiz Score: 6.5 out of 10

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40
Available Dec 18 at 19:00 - Dec 19 at 23:59 Time Limit 60 Minutes

Attempt History
Attempt Time Score
LATEST Attempt 1 60 minutes 5.38 out of 10

Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 5.38 out of 10

Submitted Dec 18 at 20:42
This attempt took 60 minutes.

Incorrect Question 1 0
/ 0.25 pts

Statement 1: Business Intelligence involves analyzing past data and


reporting on it.

Statement 2: Descriptive analysis involves analyzing past data and


reporting on it.

Which of the following is right?


 
Statement 1 is correct and Statement 2 is wrong

 
Statement 1 is wrong and Statement 2 is correct

 
Both statements are correct

 
Both statements are wrong

Incorrect
Question 2 0
/ 0.25 pts

Which among the following is a more flexible model with right balance
of

centralized and distributed coordination.

 
Functional

 
Consulting

 
Federated

 
Centralized

Question 3 0.25
/ 0.25 pts

Data Science project steps are highly linear.

 
True

 
False

Question 4 0.25
/ 0.25 pts

If a person is coming from software development background, which of


the following Data science project roles will best suite him?

 
Storyteller

 
Data Analyst

 
Big Data Engineer

 
Machine Learning Engineer

Question 5 0.25
/ 0.25 pts

Artificial Intelligence comprises of the following streams

 
Machine Learning

 
Data Science

 
Deep Learning

 
IoT

Question 6 0.25
/ 0.25 pts

There are following key roles in data science project

 
Data Scientist, Analyst

 
Data Scientist, Architect, SME, Sponsor, Programmer

 
Data Scientist, Analyst, SME, Programmer

 
Data Scientist, Analyst, Sponsor

Question 7 0.25 / 0.25 pts

Suppose that the minimum and maximum values for an attribute are
4.3 and 7.6, respectively. Compute the scaled value of 5.4 if min-max
normalization is applied to scale it to [0.0, 1.0]. [Answer should have a
precision of X.XX]

 
0.43

 
0.33

 
0.45

 
0.36
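
The min-max question above can be checked with a short sketch. The helper name `min_max_scale` and its parameter names are assumptions made for illustration; they do not come from the quiz.

```python
def min_max_scale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


# Values taken from the question: min 4.3, max 7.6, value 5.4.
scaled = min_max_scale(5.4, 4.3, 7.6)
print(f"{scaled:.2f}")
```

The arithmetic is (5.4 - 4.3) / (7.6 - 4.3) = 1.1 / 3.3, which rounds to 0.33.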

Question 8 0.25 / 0.25 pts

Compute the depth of each bin for data given below, if the number of
bins is 5.

[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36,
43, 46]

 
4

Annotation: 20/5

 
3

 
5

 
6

Question 9 0.25 / 0.25 pts

Compute the depth of each bin for data given below, if the number of
bins is 4.

[47, 38, 40, 42, 47, 42, 41, 39, 42, 45, 36, 37, 36, 40, 43, 40, 45, 36,
43, 46]

 
4

 
3

 
5

 
6
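
The two bin-depth questions above use the same 20 values. A minimal sketch of equal-depth (equal-frequency) binning follows, assuming the number of values divides evenly by the number of bins; `equal_depth_bins` is a hypothetical helper name.

```python
def equal_depth_bins(values, num_bins):
    """Sort the values and split them into num_bins bins holding equally many items."""
    ordered = sorted(values)
    depth = len(ordered) // num_bins  # assumes len(values) is a multiple of num_bins
    return [ordered[i * depth:(i + 1) * depth] for i in range(num_bins)]


data = [47, 38, 40, 42, 47, 42, 41, 39, 42, 45,
        36, 37, 36, 40, 43, 40, 45, 36, 43, 46]

for num_bins in (5, 4):
    bins = equal_depth_bins(data, num_bins)
    print(num_bins, "bins, depth", len(bins[0]))
```

With 20 values, the depth is 20 / 5 = 4 for five bins and 20 / 4 = 5 for four bins.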

Question 10 0.25 / 0.25 pts

Compute the width of each bin for data given below, if the number of
bins is 4.

[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46,
48, 36] 

 
6

 
3

 
4

 
5

Annotation: (49-30)/4

Question 11 0.25 / 0.25 pts

Suppose a web user visits Flipkart during big billion-day sales.
Predicting whether he / she makes a purchase of a smartphone is a
___________________ task.

 
Association

 
Regression

 
Classification

 
Reinforcement

Incorrect Question 12 0 / 0.25 pts

BITS investigating to determine the cause of decreased admissions for the
CSI PG program is an example of

 
Diagnostic Analysis

 
Descriptive analysis

 
Predictive analysis

 
Prescriptive analysis

Question 13 0.25 / 0.25 pts

Examining the data they're keeping and reviewing how it's being used
has little or no value for firms who aren't currently aiming to undertake
big data analytics.
 
False

 
True

Question 14 0.25 / 0.25 pts

The most time-consuming phase in a data science process is

 
Data collection

 
Deployment

 
Data preparation

 
Data Modelling

Incorrect
Question 15 0 / 0.25 pts

Identify the data analytics task for the following scenario. A
supermarket manager wants to improve her inventory for the summer
season by analyzing last year’s sales.

 
Diagnostic Analytics

 
Prescriptive Analytics

 
Descriptive Analytics

 
Predictive Analytics

Incorrect
Question 16 0 / 0.25 pts

Which of the sentences below best describes predictive analytics?

 
methods for predicting future behavior using statistical analysis and
data mining, particularly to maximize the strategic value of corporate
intelligence

 
software applications, also known as "bots," that are dispatched to carry
out a mission and gather data from web pages on behalf of a user

 
a gateway that offers access to a variety of vital information from many
different sources on one screen

 
a method that uses feature analysis to predict people in photos and
tags them to other photos on its own

Incorrect
Question 17 0 / 0.25 pts

Which of the following statements are true about data cleaning?

 
It focuses on removing inaccurate data from your data set

 
It enhances the data’s accuracy and integrity

 
All of the given options

 
It focuses on transforming the data’s format by converting raw data into
another format

Partial
Question 18 0.13 / 0.25 pts

Match with the most appropriate answer, related to the pandemic

Descriptive Analytics   Checking whether hosp

Predictive Analytics   Predict test positivity ra

Diagnostic Analytics   Interactive visual tool to

Prescriptive Analytics   Identify the actions to b

Question 19 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. The team
leader aggregates the sales data from various geographical areas and
reports the penetration of each product. 

 
Prescriptive Analytics

 
Descriptive Analytics

 
Diagnostic Analytics

 
Predictive Analytics

Incorrect
Question 20 0 / 0.25 pts

Identify the data analytics task for the following scenario. A project
manager is analysing past projects to identify the risks involved and how
they were mitigated.

 
Predictive Analytics

 
Prescriptive Analytics

 
Diagnostic Analytics

 
Descriptive Analytics

Question 21 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. Google is
using tools to suggest texts or phrases while composing emails.

 
Descriptive Analytics

 
Diagnostic Analytics

 
Prescriptive Analytics

 
Predictive Analytics

Question 22 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A student is
analyzing various blogs and vlogs to find out the skill set that has to be
acquired to become a data scientist in the future. 

 
Prescriptive Analytics

 
Descriptive Analytics

 
Diagnostic Analytics

 
Predictive Analytics

Question 23 0.25 / 0.25 pts

Point out the correct statement.

 
Raw data is the data obtained after processing steps

 
None of the mentioned

 
Preprocessed data is original source of data

 
Raw data is original source of data

Incorrect
Question 24 0 / 0.25 pts

"Order Fulfilment Date" should come after "Order Creation Date". This
is an example of which data quality aspect:

 
Integrity

 
Conformity

 
Timeliness

 
Consistency

Question 25 0.25 / 0.25 pts

Attributes cannot be called as:

 
Data point

 
Variables

 
Dimensions

 
Features

Question 26 0.25 / 0.25 pts

Is it possible to convert a Nominal scale to an Ordinal Scale during data
analysis?

 
True

 
False

Question 27 0.25 / 0.25 pts

Point out the correct statement.

 
preprocessed data is original source of data

 
none of the options

 
raw data is the data obtained after processing steps

 
raw data is original source of data

Question 28 0.25 / 0.25 pts

In a FashionStore data set, the feature ShirtSize {S, M, L, XL, XXL} is an
example of

 
Continuous attribute

 
Ordinal attribute

 
Nominal attribute

 
Numeric attribute

Question 29 0.25 / 0.25 pts

Imbalance issue in data sets can be rectified using ____

 
Binarisation

 
Normalisation

 
Standardisation

 
Sampling
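
As an illustration of how sampling can address class imbalance, here is a minimal random-oversampling sketch. It is only one of several sampling strategies; the function name `random_oversample`, the `label_of` parameter, and the toy rows are all assumptions for the example.

```python
import random
from collections import Counter


def random_oversample(rows, label_of, seed=0):
    """Duplicate minority-class rows at random until every class is equally large."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the shortfall with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


rows = [("a", 0)] * 8 + [("b", 1)] * 2  # hypothetical 8:2 imbalance
balanced = random_oversample(rows, label_of=lambda row: row[1])
counts = Counter(label for _, label in balanced)
print(counts)
```

Undersampling the majority class is the mirror-image approach and trades lost data for a smaller training set.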

Question 30 0.25 / 0.25 pts

Data Integration is a

 
Data Normalization Technique

 
Pre-processing technique

 
Generalization technique

 
None of the answers

Incorrect Question 31 0 / 0.25 pts

Which is the major task of Data Integration?

 
Clustering of similar data from different sources

 
For the same real world entity, resolving attribute values from different
sources

 
None of the above

 
Identification of missing rows for identified key values

Unanswered Question 32 0 / 0.25 pts

The missing value for categorical attribute is substituted with

 
mean of a value

 
most frequent attribute value

 
least frequent attribute value

 
none of the above
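
A common way to fill a missing categorical value is to use the most frequent observed value (the mode). The sketch below assumes missing entries are stored as `None`; `impute_with_mode` and the sample column are hypothetical.

```python
from collections import Counter


def impute_with_mode(values, missing=None):
    """Replace missing entries with the most frequent observed value."""
    observed = [v for v in values if v is not missing]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is missing else v for v in values]


doors = ["four", "two", None, "four", "four", None, "two"]  # hypothetical column
filled = impute_with_mode(doors)
print(filled)
```

The mean is not defined for categories, which is why the mode is the usual substitute here.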

Unanswered Question 33 0 / 0.25 pts

In a box and whisker plot of data, Inter Quartile Range (IQR) is

 
Distance between the first and third quartile

 
Distance between the first and second quartile

 
Distance between the first and fourth quartile

 
Distance between the second and third quartile
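
The interquartile range is the spread of the box in a box-and-whisker plot, that is, the third quartile minus the first. A small sketch using Python's standard `statistics` module follows; the sample values are hypothetical, and other quartile conventions can give slightly different cut points.

```python
import statistics

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```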

Unanswered Question 34 0 / 0.25 pts

For a data analytics task to analyse feedback on her subject for a class
of 60 students, a school teacher decided to use the survey submitted by the
ten students who come for tuitions for that subject, at her home. Identify the
type of sampling she is doing.

 
Systematic Sampling

 
Sampling without replacement

 
Stratified sampling

 
Non Probabilistic sampling

Unanswered Question 35 0 / 0.25 pts

Consider the following Python code.  

import sklearn.preprocessing

lb = sklearn.preprocessing.LabelBinarizer()

print(lb.fit_transform(['yes', 'no', 'no', 'yes']))

The last line of the code snippet will print

 
[0,1,1,0]

 
[true, false, false, true]

 
[True, False, False, True]

 
[1,0,0,1]
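
For a two-class input, scikit-learn's `LabelBinarizer` orders the class labels and returns a single column of 0/1 values, so 'no' maps to 0 and 'yes' to 1. The dependency-free sketch below imitates that behaviour for illustration; `binarize_labels` is a hypothetical helper, not a scikit-learn function.

```python
def binarize_labels(labels):
    """Map two class labels to 0/1, with classes ordered alphabetically."""
    classes = sorted(set(labels))  # ['no', 'yes'] for the quiz input
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    # One row per sample with a single column, mirroring a column-shaped result.
    return [[classes.index(label)] for label in labels]


encoded = binarize_labels(['yes', 'no', 'no', 'yes'])
print(encoded)
```

The values read 1, 0, 0, 1 down the column, which corresponds to the last option even though the real library prints a two-dimensional array rather than a flat list.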

Unanswered Question 36 0 / 0.25 pts

Data transformation is done to improve ____ in an algorithm

 
Redundancy

 
Accuracy & Efficiency

 
Integration

 
Noise

 
Inconsistencies

Unanswered Question 37 0 / 0.25 pts

Scaling a variable is not an essential criterion when the ML pipeline uses
algorithms based on gradient descent optimization
 
True

 
False

Unanswered Question 38 0 / 0.25 pts

There are two sets X={10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,
-42, -43, -44}. What is TRUE about the standard deviations of X and Y
i.e. σX and σY respectively?

 
σX will be smaller than σY.

 
Magnitude will be the same but the sign will be different

 
σY will be smaller than σX.

 
Will be the same.
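
Standard deviation measures spread around the mean, so shifting or negating a data set does not change it. The sketch below checks this for the two sets in the question using the population standard deviation; the variable names are chosen for illustration.

```python
import math
import statistics

X = list(range(10, 25))             # 10, 11, ..., 24
Y = [-v for v in range(30, 45)]     # -30, -31, ..., -44

sigma_x = statistics.pstdev(X)
sigma_y = statistics.pstdev(Y)
print(sigma_x, sigma_y, math.isclose(sigma_x, sigma_y))
```

Both sets are 15 consecutive integers, so their spreads match, and a standard deviation is never negative.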

Unanswered Question 39 0 / 0.25 pts

Which of the following statements is TRUE?

 
Treatment of outliers depends on the problem statement

 
Outliers should always be addressed in the dataset

 
Outliers should be addressed only in the training dataset

 
Outliers should be addressed in the test dataset

Unanswered Question 40 0 / 0.25 pts

Which of the following is NOT a visualization technique?

 
Matrix

 
Lexeme

 
ConeTree

 
TreeMap

Quiz Score: 5.38 out of 10
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 11:59pm Points 10 Questions 40
Available Dec 18 at 7pm - Dec 19 at 11:59pm Time Limit 60 Minutes

Instructions
Quiz I can be attempted only once. You will not be provided with a make-up quiz if you miss
this quiz.

The quiz is timed for one hour and has to be completed once you start. You will not be allowed to go
back to the previous question if you skip a question. 

“Choose the most appropriate answer to each question.”

The answers will be visible only after three days once the quiz has ended.

All the best.

Attempt History
Attempt Time Score

LATEST Attempt 1 47 minutes 8.75 out of 10

 Correct answers will be available on Dec 22 at 12am.

Score for this quiz: 8.75 out of 10


Submitted Dec 18 at 8:45pm
This attempt took 47 minutes.

Question 1 0.25 / 0.25 pts

Data science is the process of diverse set of data through
____________

  processing data

  analysing data

  All of the options

  organizing data

Question 2 0.25 / 0.25 pts

Statement 1: Role of a Business analyst usually requires expertise on
building ML models.
Statement 2: Role of a Data Scientist usually requires expertise on
performing descriptive analysis.

Which of the following is right?

  Statement 1 is correct and Statement 2 is wrong

  Both statements are correct

  Statement 1 is wrong and Statement 2 is correct

  Both statements are wrong

Incorrect
Question 3 0 / 0.25 pts

Identify the data mining tasks from the below list.

i. Dividing the customers of a company according to their gender
ii. Predicting the future stock price of a company using historical
records
iii. Monitoring the heart rate of a patient for abnormalities
iv. Extracting the frequencies of a sound wave
v. Sorting a student database based on student identification numbers

  i, ii, iii

  iv, v

  ii, iii

  i, iv, v

Question 4 0.25 / 0.25 pts

Data Science project steps are highly linear.

  True

  False

Question 5 0.25 / 0.25 pts

The tasks of a data scientist include

  Collecting Raw Data

  communicate the results to the stakeholders

  All of the Above

  Identifying relevant features

Question 6 0.25 / 0.25 pts

Data scientist is not responsible for 

  data mining

  building continuous data stream

  data analytics

  data manipulation

Question 7 0.25 / 0.25 pts

Compute the Manhattan distance between A(2,3) and B(5,7).

 5

 8

 6

 7
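
The Manhattan (city-block) distance sums the absolute coordinate differences. A minimal sketch, with `manhattan_distance` as an illustrative helper name:

```python
def manhattan_distance(p, q):
    """Sum of absolute differences between matching coordinates."""
    return sum(abs(a - b) for a, b in zip(p, q))


d_manhattan = manhattan_distance((2, 3), (5, 7))
print(d_manhattan)  # |5 - 2| + |7 - 3|
```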

Question 8 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 3.2. [Answer should have a precision of X.XXXX]

  0.8769

  -0.8679

  0.8679

  -0.8769
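
Z-score normalization subtracts the mean and divides by the standard deviation. The sketch below applies it to the numbers in the question; `z_score` is an illustrative helper name.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value lies from the mean."""
    return (value - mean) / std_dev


z = z_score(3.2, 8.9, 6.5)
print(f"{z:.4f}")
```

Here (3.2 - 8.9) / 6.5 = -5.7 / 6.5, which is negative because the value lies below the mean.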

Question 9 0.25 / 0.25 pts

Compute the width of each bin for data given below, if the number of
bins is 4.
[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46,
48, 36] 

 4

 3

 5

Annotation: (49-30)/4

 6
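
For equal-width binning, the bin width is the data range divided by the number of bins. The sketch below computes the exact width and then rounds it up to a whole number; the rounding step is an assumption about how the whole-number options were intended.

```python
import math

data = [39, 45, 49, 45, 31, 37, 38, 41, 37, 41,
        39, 34, 35, 30, 47, 43, 44, 46, 48, 36]
num_bins = 4

exact_width = (max(data) - min(data)) / num_bins  # (49 - 30) / 4
rounded_width = math.ceil(exact_width)            # assumption: round up to an integer
print(exact_width, rounded_width)
```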

Question 10 0.25 / 0.25 pts

Compute the Euclidean distance between A(2,3) and B(5,7).

 5

 4

 7

 6
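
The Euclidean distance is the straight-line distance between the two points. A short check using the standard library's `math.dist` (available from Python 3.8):

```python
import math

d_euclidean = math.dist((2, 3), (5, 7))  # sqrt((5 - 2)**2 + (7 - 3)**2)
print(d_euclidean)
```

The differences are 3 and 4, so this is the familiar 3-4-5 right triangle.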

Question 11 0.25 / 0.25 pts

A police team wants to predict the crime rate in a locality based on
certain attributes. Which modelling technique would be appropriate?

  Classification

  Clustering

  Regression

  Optimization

Question 12 0.25 / 0.25 pts

Match with the most appropriate answer related to analytics.

Descriptive Analytics   what happened

Diagnostic Analytics   why it happened

Predictive Analytics   what will happen

Prescriptive Analytics   what can make it happen

Question 13 0.25 / 0.25 pts

Consider a scenario where you feel unwell and go to a doctor. The doctor asks
you questions such as whether you were exposed to rain or a cold climate, had
contact with a sick person, or had food from outside. Based on your answers,
the doctor comes to a conclusion. This can be considered analogous to which
stage of data analytics?

  Descriptive

  Predictive

  Prescriptive

  Diagnostic

Question 14 0.25 / 0.25 pts

We are predicting the weather condition as foggy, warm, cloudy, and
misty at Bangalore using the data collected in the last one month. This
task is an example of

  Classification

  Association Rule

  Clustering

  Regression

Question 15 0.25 / 0.25 pts

A company wants to find the target segment of people for one of its
products. Which modelling technique would be generally appropriate

  Clustering

  Classification

  Ranking

  Regression

Question 16 0.25 / 0.25 pts

In the "Business Understanding" phase of the data science process, the
goals are identified and the objectives defined.

  True

  False

Question 17 0.25 / 0.25 pts

Data pre-processing improves the data quality and makes data mining
algorithms efficient and effective. What are the data pre-processing
tasks along with data cleaning?

  Data Transformation

  Data Integration

  All of the above.

  Data Reduction

Question 18 0.25 / 0.25 pts

Assess Situation - "Cost and Benefits" - falls into which phase of
CRISP-DM?

  Data Preparation

  Data modeling

  Evaluation

  Business Understanding

Incorrect
Question 19 0 / 0.25 pts

Identify the data analytics task for the following scenario. A software
programmer develops a tool to convert programs written in Verilog to
Python.

  Predictive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

  Descriptive Analytics

Incorrect
Question 20 0 / 0.25 pts

Identify the data analytics task for the following scenario. A student is
analyzing various blogs and vlogs to find out the skill set that has to be
acquired to become a data scientist in the future. 

  Predictive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

  Descriptive Analytics

Question 21 0.25 / 0.25 pts

In 2001, Big Data created the three Vs: volume, velocity, and variety.
The V's have grown to encompass veracity and value in the years
since. Big data is sometimes subjected to a fifth V, which is:

  Volatile

  Vector

  Variability

  Vulnerability

Question 22 0.25 / 0.25 pts

The branch of statistics which deals with the development of statistical
methods is classified as ___.

  None of the mentioned above

  Applied statistics

  Industry statistics

  Economic statistics

Question 23 0.25 / 0.25 pts

What are the best practices for implementing big data analytics
programmes?

  Determining business direction based on data analysis

  Letting go entirely of 'old ideas' related to data management

 
Focusing on business goals and how to use big data analytics
technologies to meet them

 
Adopting data analysis tools based on a laundry list of their capabilities

Question 24 0.25 / 0.25 pts

Which of the following is not an example of ordinal attributes?

  Zip codes

  Exam Grades

  Military ranks

  Academic ranks

Question 25 0.25 / 0.25 pts

Which of the following Python libraries is used for web scraping?

  BeautifulSoup

  WebCrawler

  Scraper

  WebSpider

Question 26 0.25 / 0.25 pts

What is the difference between interval/ratio and ordinal variables?

 
The distance between categories is equal across the range of
interval/ratio data

  Ordinal data can be rank ordered, but interval/ratio data cannot

  Interval/ratio variables contain only two categories

 
Ordinal variables have a fixed zero point, whereas interval/ratio
variables do not

Question 27 0.25 / 0.25 pts

Attributes cannot be called as:

  Variables

  Features

  Dimensions

  Data point

Incorrect Question 28 0 / 0.25 pts

In a dataset, we are tracking whether a customer is purchasing a
product or not. This is an example of

  Symmetric attribute

  Discrete attribute

  Continuous attribute

  Asymmetric attribute

Question 29 0.25 / 0.25 pts

Dealing with missing values during data preparation is what kind of an
operation

  Data retrieval

  Data cleansing

  Data transformation

  Data combining

Question 30 0.25 / 0.25 pts

The ____ and standard deviation are strongly affected by outliers

  MEDIAN

  MEAN

  MODE

  RANGE
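
The effect of an outlier on these statistics can be seen with a small experiment. The measurements below are hypothetical; the sketch compares the mean, median, and population standard deviation before and after adding one extreme value.

```python
import statistics

values = [10, 12, 11, 13, 12]    # hypothetical measurements
with_outlier = values + [500]    # same data plus one extreme value

for label, data in (("clean", values), ("with outlier", with_outlier)):
    print(label,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "stdev:", round(statistics.pstdev(data), 2))
```

The mean and standard deviation jump sharply once the outlier is added, while the median stays where it was.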

Question 31 0.25 / 0.25 pts

In which phase are the duplicates (of the data) removed? Choose the
best possible answer.

  Data Preparation

  Data Collection

  Data Understanding

  Data Exploration

Question 32 0.25 / 0.25 pts

Which of the following is NOT a visualization technique?

  Matrix

  TreeMap

  Lexeme

  ConeTree

Question 33 0.25 / 0.25 pts

There are two sets X={10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,
-42, -43, -44}. What is TRUE about the standard deviations of X and Y
i.e. σX and σY respectively?

  σX will be smaller than σY.

  Magnitude will be the same but the sign will be different

  Will be the same.

  σY will be smaller than σX.

Question 34 0.25 / 0.25 pts

The scatterplot implies that

  The features are positively correlated.

  The features are independent.

  The features are negatively correlated.

Question 35 0.25 / 0.25 pts

Consider the data set below. How will you (most appropriately) handle
the missing values for records 1, 4, and 6 respectively?
Data set description:
price: continuous from 5118 to 45400 (class label).
normalized-losses: continuous from 65 to 256.
num-of-doors: four, two.

  Replace by mode, replace by median, replace by mean.

  Replace by mode, ignore the tuple, replace by mean.

  Ignore all 3 records.

  Ignore the tuple, replace by mean, replace by mode.

Question 36 0.25 / 0.25 pts

For a data analytics task to analyse feedback on her subject for a
class of 60 students, a school teacher decided to use the survey
submitted by the ten students who come for tuitions for that subject, at
her home. Identify the type of sampling she is doing.

  Systematic Sampling

  Stratified sampling

  Non Probabilistic sampling

  Sampling without replacement

Question 37 0.25 / 0.25 pts

Which among the following are valid methods of handling missing data?

P. Eliminating Data Objects
Q. Estimating Missing Values
R. Ignoring the Missing Values during Analysis
S. Replacing with all possible values

  Q and R

  All the options are correct

  R only

  P, Q and R

Incorrect Question 38 0 / 0.25 pts

Which is the major task of Data Integration

 
For the same real world entity, resolving attribute values from different
sources

  None of the above

  Identification of missing rows for identified key values

  Clustering of similar data from different sources

Question 39 0.25 / 0.25 pts

Which of the following statements are true?

I. The smaller data sets resulting from data reduction require less
memory and processing time.

II. Aggregation provides a high-level view of the data instead of a
low-level view.

III. High-quality data that is aggregated can lead to a high chance of
identifying false positives and negatives.

IV. An advantage of aggregation is the potential loss of interesting
details.

  Statement I and II

  Statement II and III

  Only Statement III

  Statement I and IV

Question 40 0.25 / 0.25 pts

The missing value for categorical attribute is substituted with

  mean of a value

  least frequent attribute value

  none of the above

  most frequent attribute value

Quiz Score: 8.75 out of 10

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 19/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 11:59pm Points 10 Questions 40
Available Dec 18 at 7pm - Dec 19 at 11:59pm Time Limit 60 Minutes

Instructions
The Quiz I can be attempted only once. You will not be provided with a  make-up Quiz, if you miss
this Quiz.

The quiz is timed for one hour and has to be completed once you start. You will not be allowed to go
back to the previous question if you skip a question. 

“Choose the most appropriate answer to each question.”

The answers will be visible only after three days once the quiz has ended.

All the best.

Attempt History
Attempt Time Score

LATEST Attempt 1 47 minutes 8.75 out of 10

 Correct answers will be available on Dec 22 at 12am.

Score for this quiz: 8.75 out of 10


Submitted Dec 18 at 8:45pm
This attempt took 47 minutes.

Question 1 0.25 / 0.25 pts

Data science is the process of diverse set of data through


____________

  processing data

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 1/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

  analysing data

  All of the options

  organizing data

Question 2 0.25 / 0.25 pts

Statement 1: Role of a Business analyst usually requires expertise on


building ML models.
Statement 2: Role of a Data Scientist usually requires expertise on
performing descriptive analysis.

Which of the following is right?

  Statement 1 is correct and Statement 2 is wrong

  Both statements are correct

  Statement 1 is wrong and Statement 2 is correct

  Both statements are wrong

Incorrect
Question 3 0 / 0.25 pts

Identify the data mining tasks from the below list.

i.  Dividing the customers of a company according to their gender


ii.  Predicting the future stock price of a company using historical
records
iii. Monitoring the heart rate of a patient for abnormalities
iv. Extracting the frequencies of a sound wave
 v. Sorting a student database based on student identification numbers

  i, ii, iii
https://bits-pilani.instructure.com/courses/1704/quizzes/3453 2/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

  iv, v

  ii, iii

  i, iv, v

Question 4 0.25 / 0.25 pts

Data Science project steps are highly linear.

  True

  False

Question 5 0.25 / 0.25 pts

The task of the data scientist include

  Collecting Raw Data

  communicate the results to the stakeholders

  All of the Above

  Identifying relevant features

Question 6 0.25 / 0.25 pts

Data scientist is not responsible for 

  data mining

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 3/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

  building continuous data stream

  data analytics

  data manipulation

Question 7 0.25 / 0.25 pts

Compute the Manhattan distance between A(2,3) and B(5,7).

 5

 8

 6

 7

Question 8 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 3.2. [Answer should have a precision of X.XXXX]

  0.8769

  -0.8679

  0.8679

  -0.8769

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 4/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Question 9 0.25 / 0.25 pts

Compute the width of each bin for data given below, if the number of
bins is 4.
[39, 45, 49, 45, 31, 37, 38, 41, 37, 41, 39, 34, 35, 30, 47, 43, 44, 46,
48, 36] 

 4

 3

 5

(49-30)/4

 6

Question 10 0.25 / 0.25 pts

Compute the Euclidean distance between A(2,3) and B(5,7).

 5

 4

 7

 6

Question 11 0.25 / 0.25 pts

A police team wants to predict the crime rate in a locality based on


certain attributes. Which modelling technique would be appropriate

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 5/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

  Classification

  Clustering

  Regression

  Optimization

Question 12 0.25 / 0.25 pts

Match with the most appropriate answer related to analytics.

Descriptive Analytics   what happened

Diagnostic Analytics   why happened

Predictive Analytics   what will happened

Prescriptive Analytics   what can make it happe

Question 13 0.25 / 0.25 pts

A scenario where you feel unwell and go to a doctor. The doctor asks
you questions like were you exposed to rain or cold climate or did you
have contact with a sick person or did you have food from outside etc.
Based on your answers,doctor came to the conclusion. This can be
considered analogous to which stage of data analytics?

  Descriptive

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 6/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

  Predictive

  Prescriptive

  Diagnostic

Question 14 0.25 / 0.25 pts

We are predicting the weather condition as foggy, warm, cloudy, and


misty at Bangalore using the data collected in the last one month. This
task is an example of

  Classification

  Association Rule

  Clustering

  Regression

Question 15 0.25 / 0.25 pts

A company wants to find the target segment of people for one of its
products. Which modelling technique would be generally appropriate

  Clustering

  Classification

  Ranking

  Regression

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 7/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Question 16 0.25 / 0.25 pts

In "Business Understanding" phase of the data science process, the


goals are identified and the objectives defined.

  True

  False

Question 17 0.25 / 0.25 pts

Data pre-processing improves the data quality and make data mining
algorithms efficient and effective. What are the data pre-processing
tasks along with Data cleaning?

  Data Transformation

  Data Integration

  All of the above.

  Data Reduction

Question 18 0.25 / 0.25 pts

Access Situation - "Cost and Benefits" - Falls into which phase of Crisp
- DM

  Data Preparation

  Data modeling

  Evaluation

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 8/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

  Business Understanding

Incorrect
Question 19 0 / 0.25 pts

Identify the data analytics task for the following scenario. A software
programmer develops a tool to convert programs written in Verilog to
Python.

  Predictive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

  Descriptive Analytics

Incorrect
Question 20 0 / 0.25 pts

Identify the data analytics task for the following scenario. A student is
analyzing various blogs and vlogs to find out the skill set that has to be
acquired to become a data scientist in the future. 

  Predictive Analytics

  Prescriptive Analytics

  Diagnostic Analytics

  Descriptive Analytics

Question 21 0.25 / 0.25 pts

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 9/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

In 2001, Big Data created the three Vs: volume, velocity, and variety.
The V's have grown to encompass veracity and value in the years
since. Big data is sometimes subjected to a fifth V, which is:

  Volatile

  Vector

  Variability

  Vulnerability

Question 22 0.25 / 0.25 pts

Amongst which of the following is / are the branch of statistics which


deals with the development of statistical methods is classified as ___.

  None of the mentioned above

  Applied statistics

  Industry statistics

  Economic statistics

Question 23 0.25 / 0.25 pts

What are the best practices for implementing big data analytics
programmes?

  Determining business direction based on data analysis

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 10/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

  Letting go entirely of 'old ideas' related to data management

 
Focusing on business goals and how to use big data analytics
technologies to meet them

 
Adopting data analysis tools based on a laundry list of their capabilities

Question 24 0.25 / 0.25 pts

Which of the following is not an example of ordinal attributes?

  Zip codes

  Exam Grades

  Military ranks

  Academic ranks

Question 25 0.25 / 0.25 pts

Which of the following Python library is required for web scraping?

  BeautifulSoup

  WebCrawler

  Scraper

  WebSpider

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 11/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Question 26 0.25 / 0.25 pts

What is the difference between interval/ratio and ordinal variables?

 
The distance between categories is equal across the range of
interval/ratio data

  Ordinal data can be rank ordered, but interval/ratio data cannot

  Interval/ratio variables contain only two categories

 
Ordinal variables have a fixed zero point, whereas interval/ratio
variables do not

Question 27 0.25 / 0.25 pts

Attributes cannot be called as:

  Variables

  Features

  Dimensions

  Data point

Incorrect Question 28 0 / 0.25 pts

https://bits-pilani.instructure.com/courses/1704/quizzes/3453 12/19
12/18/22, 8:45 PM Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

In a dataset, we are tracking whether a customer is purchasing a


product or not. This is an example of

  Symmetric attribute

  Discrete attribute

  Continuous attribute

  Asymmetric attribute

Question 29 0.25 / 0.25 pts

Dealing with missing values during data preparation is what kind of an
operation

  Data retrieval

  Data cleansing

  Data transformation

  Data combining

Question 30 0.25 / 0.25 pts

The ____ and standard deviation are strongly affected by outliers

  MEDIAN

  MEAN

  MODE

  RANGE
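
One way to see how an outlier affects these summary statistics is to compute them with and without an extreme value. The sketch below uses made-up readings (they are not taken from the quiz) and Python's standard `statistics` module.

```python
import statistics

# Hypothetical readings; the extra value appended below is an artificial outlier.
clean = [10, 11, 12, 13, 14]
with_outlier = clean + [100]

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(name,
          "mean =", round(statistics.mean(data), 2),
          "median =", statistics.median(data),
          "stdev =", round(statistics.stdev(data), 2))

# How far each statistic moves when the outlier is added.
mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)
```

With this data the mean and the standard deviation move substantially, while the median moves only slightly.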

Question 31 0.25 / 0.25 pts

In which phase are the duplicates (of the data) removed? Choose the
best possible answer.

  Data Preparation

  Data Collection

  Data Understanding

  Data Exploration

Question 32 0.25 / 0.25 pts

Which of the following is NOT a visualization technique?

  Matrix

  TreeMap

  Lexeme

  ConeTree

Question 33 0.25 / 0.25 pts

There are two sets X={10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24} and Y = {-30, -31, -32, -33, -34, -35, -36, -37, -38, -39, -40, -41,
-42, -43, -44}. What is TRUE about the standard deviations of X and Y
i.e. σX and σY respectively?

  σX will be smaller than σY.

  Magnitude will be the same but the sign will be different

  Will be the same.

  σY will be smaller than σX.
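
Standard deviation measures spread around the mean, so shifting every value or flipping its sign does not change it, and it is never negative. A quick check with the two sets from the question, using the population standard deviation from the standard `statistics` module, might look like this:

```python
import math
import statistics

X = list(range(10, 25))        # 10, 11, ..., 24
Y = list(range(-30, -45, -1))  # -30, -31, ..., -44

sigma_x = statistics.pstdev(X)
sigma_y = statistics.pstdev(Y)
print(sigma_x, sigma_y)
```

Both sets are 15 consecutive integers, so the two printed values agree.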

Question 34 0.25 / 0.25 pts

The scatterplot implies that

  The features are positively correlated.

  The features are independent.

  The features are negatively correlated.

Question 35 0.25 / 0.25 pts

Consider the data set below. How will you (most appropriately) handle
the missing values for records 1, 4, and 6, respectively?
Data set description.
price: continuous from 5118 to 45400.(class label)
normalized-losses: continuous from 65 to 256.
num-of-doors: four, two.

  replace by mode, replace by median, replace by mean.

  replace by mode,ignore the tuple,replace by mean.

  ignore all 3 records.

  Ignore the tuple,replace by mean,replace by mode.

Question 36 0.25 / 0.25 pts

For a data analytics task to analyse feedback on her subject for a
class of 60 students, a school teacher decided to use the survey
submitted by the ten students who come for tuitions for that subject, at
her home. Identify the type of sampling she is doing.

  Systematic Sampling

  Stratified sampling

  Non Probabilistic sampling

  Sampling without replacement

Question 37 0.25 / 0.25 pts

Which among the following are valid methods of handling missing data

       P. Eliminating Data Objects

      Q. Estimating Missing Values

      R. Ignoring the Missing Values during Analysis

      S. Replacing with all possible values

  Q and R

  All the options are correct

  R only

  P, Q and R

Incorrect Question 38 0 / 0.25 pts

Which is the major task of Data Integration

 
For the same real world entity, resolving attribute values from different
sources

  None of the above

  Identification of missing rows for identified key values

  Clustering of similar data from different sources

Question 39 0.25 / 0.25 pts

Which of the following statements are true?

I. The smaller data sets resulting from data reduction require less
memory and processing time.

II. Aggregation provides a high-level view of the data instead of a
low-level view

III. High-quality data that is aggregated can lead to a high chance of
identifying false positives and negatives

IV. An advantage of aggregation is the potential loss of interesting
details

  Statement I and II

  Statement II and III

  Only Statement III

  Statement I and IV

Question 40 0.25 / 0.25 pts

The missing value for categorical attribute is substituted with

  mean of a value

  least frequent attribute value

  none of the above

  most frequent attribute value
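
A minimal sketch of simple imputation, assuming `None` marks a missing value: the categorical attribute is filled with its most frequent observed value (the mode) and the continuous attribute with the mean of its observed values. The attribute names echo the data set description in an earlier question, but the records themselves are invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical records; None marks a missing value.
num_of_doors = ["four", "two", "four", None, "four", "two"]
normalized_losses = [164, None, 158, 103, None, 95]

# Categorical attribute: fill with the most frequent observed value (the mode).
door_mode = Counter(v for v in num_of_doors if v is not None).most_common(1)[0][0]
doors_filled = [door_mode if v is None else v for v in num_of_doors]

# Continuous attribute: fill with the mean of the observed values.
loss_mean = mean(v for v in normalized_losses if v is not None)
losses_filled = [loss_mean if v is None else v for v in normalized_losses]

print(doors_filled)
print(losses_filled)
```

The mean is not defined for a categorical attribute, which is why the mode is the usual choice there.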

Quiz Score: 8.75 out of 10

Question 1
0.25 / 0.25 pts
Pattern Recognition is a sub-field of Data Science.

True

False

Partial Question 2
0.17 / 0.25 pts
Which of the following are the reasons for the sudden growth of analytics?

Large number of analysts available in the market

Large number of user friendly analytics tools available for data processing

Cost of storage has hugely dropped

Data is growing at 40% compound annual rate

Question 3
0.25 / 0.25 pts
A city conducted a new bi-annual census of its residents. Which of the following most
strongly suggests a cognitive bias in their collected dataset?

The census data includes new data on how many cars are owned by the residents.

Some rows have null values for last names


The average income in the dataset is 40% higher than the average income of the city’s
population from the previous census 2 years ago.

Some rows have date of birth in DD-MM-YY and some in DD-MM-YYYY formats

Question 4
0.25 / 0.25 pts
Due to market expectations, businesses are having difficulty retaining highly trained data
scientists and engineers.

No answer text provided.

False

True

No answer text provided.

Question 5
0.25 / 0.25 pts
Data science is an interdisciplinary field that has minimal overlap with which of the below?

Software Engineering

Machine Learning

Statistical Analysis

Artificial Intelligence

Question 6
0.25 / 0.25 pts
Statement 1: Role of a Business analyst usually requires expertise on building ML models.
Statement 2: Role of a Data Scientist usually requires expertise on performing descriptive
analysis.
Which of the following is right?

Statement 1 is wrong and Statement 2 is correct

Statement 1 is correct and Statement 2 is wrong

Both statements are correct

Both statements are wrong

Question 7
0.25 / 0.25 pts
Suppose that the minimum and maximum values for an attribute are 4.3 and 7.6,
respectively. Compute the scaled value of 5.4 if min-max normalization is applied to scale
[-1.0,+1.0]. (Answer should have a precision of X.XX)

0.76

0.66

0.33

-0.33
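
Min-max normalization maps a value linearly from its original range to a target range. The helper below is an illustrative sketch (the function name and parameters are not from the quiz), applied to the numbers in the question.

```python
def min_max_scale(value, old_min, old_max, new_min=-1.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)

scaled = min_max_scale(5.4, 4.3, 7.6)
print(f"{scaled:.2f}")  # -0.33
```

Here the value sits one third of the way through the original range, and one third of the way from -1 to +1 is about -0.33.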

Question 8
0.25 / 0.25 pts
Suppose that the mean and standard deviation of the values for an attribute are 8.9 and
6.5, respectively. Apply z-score normalization to a value of 10.7.

0.2679

-0.2769

-0.2679

0.2769
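
Z-score normalization subtracts the mean and divides by the standard deviation. A small sketch with the question's numbers (the function name is illustrative):

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean) / std_dev

z = z_score(10.7, mean=8.9, std_dev=6.5)
print(f"{z:.4f}")  # 0.2769
```

The result is positive because 10.7 lies above the mean of 8.9.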

Question 9
0.25 / 0.25 pts
Compute the Euclidean distance between A(2,3) and B(5,7).

7
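
The Euclidean distance between two points is the square root of the sum of squared coordinate differences. A short check for the two points in the question, using `math.dist` from the standard library (Python 3.8 or later):

```python
import math

a = (2, 3)
b = (5, 7)

# sqrt((5 - 2)**2 + (7 - 3)**2) = sqrt(9 + 16)
distance = math.dist(a, b)
print(distance)
```

The coordinate differences are 3 and 4, which gives a distance of 5.
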
Question 11
0.25 / 0.25 pts
Assess Situation ("Costs and Benefits") falls into which phase of CRISP-DM?

Evaluation

Data modeling

Business Understanding

Data Preparation

Question 12
0.25 / 0.25 pts
Suppose a web user visits Flipkart during big billion-day sales. Predicting whether he / she
makes a purchase of a smartphone is a ___________________ task.

Association

Classification

Reinforcement

Regression

Question 15
0.25 / 0.25 pts
A course instructor has data about students' attendance in her course in the past semester.
What kind of analytics is she performing when she creates a line graph based on this
data?

Predictive

Descriptive

Diagnostic

Prescriptive

Question 18
0.25 / 0.25 pts
Analyzing the data to determine why some phenomena related to learning happened is a
type of

Diagnostic

Descriptive

Prescriptive

Predictive

Question 19
0.25 / 0.25 pts
Regression is
(1) Prediction of a value of a given continuous valued variable based on the values of other
variables.
(2) Regression works with a linear or nonlinear model of dependency among the variables.

Which of the above is true?

Option 1

None

Option 2

Both 1& 2

Question 20
0.25 / 0.25 pts
The branch of statistics that deals with the development of statistical methods is
classified as ___.

Applied statistics

Industry statistics

Economic statistics

None of the mentioned above

Question 21
0.25 / 0.25 pts
Google tries to differentiate emails as spam and non-spam. This is an example of

Clustering

Classification

Regression

Association Rule

Question 22
0.25 / 0.25 pts
Identify the data analytics task for the following scenario. An e-commerce platform is
recommending products to their customers to improve the shopping experience.

Diagnostic Analytics

Predictive Analytics

Descriptive Analytics

Prescriptive Analytics

Question 23
0.25 / 0.25 pts
In a dataset, CarColor is one of the attributes and it can take the following values {Red,
Green, Yellow, Black}, what type of attribute is CarColor?

Interval attribute

Ratio attribute

Ordinal attribute

Nominal attribute

Question 24
0.25 / 0.25 pts
What are the best practices for implementing big data analytics programmes?

Determining business direction based on data analysis

Adopting data analysis tools based on a laundry list of their capabilities

Letting go entirely of 'old ideas' related to data management

Focusing on business goals and how to use big data analytics technologies to meet them

Question 25
0.25 / 0.25 pts
Is it possible to rescale continuous data for better data understanding?

True

False

Question 26
0.25 / 0.25 pts
As part of a survey in a large organization, one of the features that you capture is
designation. This type of data has the characteristic

Discrete, Qualitative, Ordinal

Nominal, Quantitative, Discrete

Discrete, Quantitative, Ordinal

None of the given answers

Question 27
0.25 / 0.25 pts
"Order Fulfilment Date" should come after "Order Creation Date". This is an example of
which data quality aspect:

Consistency

Integrity

Conformity

Timeliness

Question 28
0.25 / 0.25 pts
Data lake mainly stores data in

none

Structured format

Raw format

both

Question 29
0.25 / 0.25 pts
What does discretization do ?

None of the above.

Reduce overall data size.

Convert a continuous attribute into a discrete attribute

Both of the above.
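
As an illustration of discretization, the sketch below applies equal-width binning, one common technique, to a made-up continuous attribute. The function name, the data, and the choice of three bins are assumptions for the example, and the code assumes the values are not all identical.

```python
def equal_width_bins(values, n_bins):
    """Return the 0-based bin index of each value using equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - low) / width), n_bins - 1) for v in values]

ages = [21, 25, 34, 47, 52, 68]  # hypothetical continuous attribute
bins = equal_width_bins(ages, 3)
print(bins)  # [0, 0, 0, 1, 1, 2]
```

Each continuous value is replaced by one of a small number of discrete bin labels.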

Question 32
0.25 / 0.25 pts
Consider the following Python code.
import numpy as np
from sklearn.preprocessing import Binarizer
exam1 = np.array([41,43,45,47,49])
b1 = Binarizer(threshold= exam1.mean())
exam1_b = b1.fit_transform(exam1.reshape(-1, 1))
print(exam1_b)
The last line of the code snippet will print

[0,0,1,0,0]

[1,1,0,1,1]

[0,0,0,1,1]

[1,1,1,0,0]
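
scikit-learn's `Binarizer` maps values greater than the threshold to 1 and all other values to 0. The same rule can be traced in plain Python without installing anything:

```python
exam1 = [41, 43, 45, 47, 49]
threshold = sum(exam1) / len(exam1)  # the mean, 45.0

# Values strictly greater than the threshold become 1; the rest become 0.
binarized = [1 if score > threshold else 0 for score in exam1]
print(binarized)  # [0, 0, 0, 1, 1]
```

The score of 45 equals the threshold, so it maps to 0. In the original snippet the input is reshaped to a single column, so scikit-learn prints the same values as a five-row, one-column array.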

Question 33
0.25 / 0.25 pts
The scatterplot implies that

None of the given options

The features are independent


The features are negatively correlated.

The features are positively correlated.

Question 34
0.25 / 0.25 pts
Which data visualization is appropriate to explore the relationship between two
attributes out of many attributes in a data frame.

Histogram

Scatter plot

Box-plot

Heat maps

Question 35
0.25 / 0.25 pts
A sample is ------------------ if it has approximately the same property of interest

Systematic

Qualitative

Representative

Probabilistic

Question 37
0.25 / 0.25 pts
Converting the raw values of a numeric attribute is ?

Sampling

Normalization

Discretization

Smoothing

Question 38
0.25 / 0.25 pts
In the table given below, the requirement is to get the name, gender, and marks of the
top-scoring students only. Which of the following functionalities of data wrangling is
used?

Data exploration

Reshape

Replace

Filter

Question 39
0.25 / 0.25 pts
“Proximity” in Data Science terms means:

The extent of similarity or dissimilarity between two objects.

A measure of the physical distance between two objects.

None of the above.

The area surrounding an object where the object is able to exert its influence.

Partial Question 40
0.13 / 0.25 pts
Which of the following methods are considered to be the best practice for data cleaning?

Sorting data by attributes


cleansing large dataset without segmentation

By breaking large dataset into small data

All of the given options

18/12/2022, 19:57 Quiz 1: Introduction to Data Science (S1-22_DSECLZG532)

Quiz 1
Due Dec 19 at 23:59 Points 10 Questions 40
Available Dec 18 at 19:00 - Dec 19 at 23:59 Time Limit 60 Minutes

Instructions
The Quiz I can be attempted only once. You will not be provided with a  make-up Quiz, if you miss
this Quiz.

The quiz is timed for one hour and has to be completed once you start. You will not be allowed to go
back to the previous question if you skip a question. 

“Choose the most appropriate answer to each question.”

The answers will be visible only after three days once the quiz has ended.

All the best.

Attempt History
Attempt Time Score
LATEST Attempt 1 54 minutes 8.63 out of 10

 Correct answers will be available on Dec 22 at 0:00.

Score for this quiz: 8.63 out of 10


Submitted Dec 18 at 19:55
This attempt took 54 minutes.

Question 1 0.25 / 0.25 pts

If a person comes from a software development background, which of
the following data science project roles will best suit him?

  Storyteller

  Big Data Engineer

  Machine Learning Engineer

  Data Analyst

Question 2 0.25 / 0.25 pts

Which of the following is not an application of data science?

  Privacy Checker

  Image & Speech Recognition

  Recommendation Systems

  Online Price Comparison

Question 3 0.25 / 0.25 pts

Which one of the following is not a necessary characteristic of a Data
Scientist?

  Communicative

  Punctual

  Creative

  Technical

Question 4 0.25 / 0.25 pts

Which of the following are correct skills for a Data Scientist?

  probability and statistics

  all of the options

  machine learning/deep learning

  data wrangling

Question 5 0.25 / 0.25 pts

The following are the key roles in a data science project:

  Data Scientist, Architect, SME, Sponsor, Programmer

  Data Scientist, Analyst, Sponsor

  Data Scientist, Analyst

  Data Scientist, Analyst, SME, Programmer

Question 6 0.25 / 0.25 pts

In which of the following are analysts allocated to units throughout the
organization, with their activities coordinated by a central entity?

  Functional

  Center of Excellence

  Coordinational

  Consulting

Question 7 0.25 / 0.25 pts

Suppose the administrator measured the power consumption of an
entire network operations centre (NOC) and the consumption details
are: 90 W, 104 W, 98 W, 98 W, 105 W, 92 W, 102 W, 100 W, 110 W,
98 W, 200 W and 115 W. What is the range of power consumption?

  98W

  100W

  150W

  110W
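
The range of a data set is its maximum minus its minimum. A direct check with the readings listed in the question:

```python
readings_w = [90, 104, 98, 98, 105, 92, 102, 100, 110, 98, 200, 115]

value_range = max(readings_w) - min(readings_w)
print(value_range, "W")  # 110 W
```

The single 200 W reading sets the maximum, which shows how sensitive the range is to one extreme value.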

Question 8 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 3.2. [Answer should have a precision of X.XXXX]

  0.8679

  0.8769

  -0.8769

  -0.8679
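
Written out as a worked equation, with $x$ the value, $\mu$ the mean and $\sigma$ the standard deviation (symbols introduced here for illustration), the z-score computation for this question is:

```latex
z = \frac{x - \mu}{\sigma} = \frac{3.2 - 8.9}{6.5} = \frac{-5.7}{6.5} \approx -0.8769
```

The sign is negative because 3.2 lies below the mean.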

Question 9 0.25 / 0.25 pts

Suppose that the mean and standard deviation of the values for an
attribute are 8.9 and 6.5, respectively. Apply z-score normalization to a
value of 10.7. 

  -0.2679

  0.2679

  0.2769

  -0.2769

Question 10 0.25 / 0.25 pts

Suppose that the minimum and maximum values for an attribute are 4.3
and 7.6, respectively. Compute the scaled value of 5.4 if min-max
normalization is applied to scale [-1.0,+1.0]. (Answer should have a
precision of X.XX)

  0.66

  0.76

  0.33

  -0.33

Incorrect Question 11 0 / 0.25 pts

Which of the following methodologies focus the most on model
deployment and embedding in operational systems?

  SMAM

  CRISP-DM

  SEMMA

  All options are correct

Question 12 0.25 / 0.25 pts

Find the odd term.

  Artificial Intelligence

  Data Wrangling

  Deep Learning

  Machine Learning

Question 13 0.25 / 0.25 pts

Which data analytic approach can identify the probabilities of an action?

  Prescriptive

  Predictive

  Diagnostic

  Descriptive

Question 14 0.25 / 0.25 pts

Analyzing the data to determine why some phenomena related to
learning happened is a type of

  Predictive

  Prescriptive

  Diagnostic

  Descriptive

Question 15 0.25 / 0.25 pts

Identify the data analytics task for the following scenario. A
pharmaceutical organization is developing a new drug or vaccine to
combat Covid-19 using machine learning techniques where the data is
from the existing drugs and the diseases they can fight or cure.

  Prescriptive Analytics

  Diagnostic Analytics

  Predictive Analytics

  Descriptive Analytics

Question 16 0.25 / 0.25 pts

To generate genuine business value, simply gathering and keeping data
isn't enough. Technologies for big data analytics are required to:

  Integrate data from internal and external sources

  Formulate eye-catching charts and graphs

  Determine business goals and objectives

  Extract valuable insights from the data

Question 17 0.25 / 0.25 pts

We are predicting the humidity at Bangalore using the data collected in
the last one month. This task is an example of

  Regression

  Classification

  Association Rule

  Clustering

Question 18 0.25 / 0.25 pts

We are predicting the weather condition as foggy, warm, cloudy, and
misty at Bangalore using the data collected in the last one month. This
task is an example of

  Classification

  Association Rule

  Regression

  Clustering

Question 19 0.25 / 0.25 pts

Match the following data analytics to their description:

Descriptive   What happened?

Diagnostic   Why did this happen?

Predictive   What might happen in t

Prescriptive   What should we do nex

Question 20 0.25 / 0.25 pts

Training and testing datasets are developed during _________ phase.

  None

  Model planning

  Operationalize

  Model building

Incorrect
Question 21 0 / 0.25 pts

Fortis-Apollo hospital is planning to design a model which maps
patients to the best possible treatments based on the diagnosis. Identify
the data analytics task for this scenario

  Descriptive Analytics

  Predictive Analytics

  Diagnostic Analytics

  Cognitive analytics

Question 22 0.25 / 0.25 pts

Which of the following artifacts are not considered in the Descriptive
analytics?

  Alerts

  Adhoc reports

  Predictive model

  Standard report

Question 23 0.25 / 0.25 pts

Attributes cannot be called as:

  Variables

  Data point

  Features

  Dimensions

Question 24 0.25 / 0.25 pts

In a FashionStore data set, the feature Jacket_Shade {Grey, brown,
black, Indigo, Beige, Khaki} is an example of

  Nominal attribute

  Numeric attribute

  Ordinal attribute

  Continuous attribute

Question 25 0.25 / 0.25 pts

What is the difference between interval/ratio and ordinal variables?

  Interval/ratio variables contain only two categories

  Ordinal data can be rank ordered, but interval/ratio data cannot

 
The distance between categories is equal across the range of
interval/ratio data

 
Ordinal variables have a fixed zero point, whereas interval/ratio
variables do not

Question 26 0.25 / 0.25 pts

A box plot is the visual representation of the following statistical
summary

  Minimum, First Quartile, Median, Third Quartile, Maximum

  Min, Median, Mode

  Minimum, First Quartile, Third Quartile, Second Quartile, Mean

  Minimum, Average, Maximum

Incorrect
Question 27 0 / 0.25 pts

Clustering techniques can be used in

  Unsupervised Learning

  None of the answers

  Feature Selection

  Either Unsupervised Learning or Feature Selection

Question 28 0.25 / 0.25 pts

E-mail is an example of ____________________ data

  Quasi-structured

  Semi-structured

  Unstructured

  Structured

Partial
Question 29 0.13 / 0.25 pts

Match the following techniques with the definitions

Binarization   maps a continuous attr

Binning   Divide the range of a co

Concept Hierarchy   Smooth out the effect o

Functional Transformation  Transform attribute valu

Incorrect
Question 30 0 / 0.25 pts

Consider the following Python code. 


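import sklearn.preprocessing
input=['Havells','Philips','Syska','Eveready','Lloyd']
le = sklearn.preprocessing.LabelEncoder()
le.fit(input)
# transform expects an array-like of labels, so the label is wrapped in a list
print(le.transform(['Lloyd']))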
The last line of the code snippet will print

 1

 4

 3

 2
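
scikit-learn's `LabelEncoder` keeps the distinct labels in sorted order (its `classes_` attribute) and encodes each label as its position in that order. The plain-Python sketch below mimics that rule without using scikit-learn. The variable names are illustrative.

```python
brands = ['Havells', 'Philips', 'Syska', 'Eveready', 'Lloyd']

# Sort the distinct labels, then number them from 0 in that order.
classes = sorted(set(brands))
codes = {label: index for index, label in enumerate(classes)}

print(classes)  # ['Eveready', 'Havells', 'Lloyd', 'Philips', 'Syska']
print(codes['Lloyd'], codes['Syska'])  # 2 4
```

The codes follow alphabetical order, not the order in which the labels were passed to `fit`. In scikit-learn itself, `transform` expects a list or array of labels, not a bare string.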

Incorrect
Question 31 0 / 0.25 pts

The dissimilarity between two data objects is

  Lower when objects are not alike

  None of the above

  Lower when objects are more alike

  Higher when objects are more alike

Question 32 0.25 / 0.25 pts

Identify the sampling technique used in the following use case. For ML
classification task, the algorithm requires that the test set has equal
examples from the three categories.

  Systematic Sampling

  Stratified Random

  Cluster Sampling

  Simple Random
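
One way to guarantee an equal number of examples per category is to split the data by class label and sample separately from each group. The sketch below is one possible version. The function name, the data, and the fixed seed are assumptions for the example, and every class must have at least `per_class` members.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, label_of, per_class, seed=0):
    """Draw per_class records at random from each class (stratum)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[label_of(record)].append(record)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_class))
    return sample

# Hypothetical labelled data: 30 examples of "a", 20 of "b", 10 of "c".
data = ([(i, "a") for i in range(30)]
        + [(i, "b") for i in range(20)]
        + [(i, "c") for i in range(10)])

test_set = stratified_sample(data, label_of=lambda r: r[1], per_class=5)
label_counts = Counter(label for _, label in test_set)
print(label_counts)
```

Simple random sampling over the same data would tend to mirror the 30/20/10 imbalance instead of giving equal counts.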

Question 33 0.25 / 0.25 pts

Missing values should always be imputed before training the model.

  Mostly True

  False

  None of the answers

  True

Question 34 0.25 / 0.25 pts

Creating dummy variables during data preparation is what kind of an
operation

  Data transformation

  Data retrieval

  Data cleansing

  Data combining

Question 35 0.25 / 0.25 pts

“Similarity” means:

  A collection of similar objects.

  A listing of the similar features of a collection of objects.

 
The number of tuples in a database whose attributes have similar
values.

  Numerical measure of how alike two data objects are.

Question 36 0.25 / 0.25 pts

Data transformation is done to improve ------------- in algorithm

  Noise

  Inconsistencies

  Accuracy & Efficiency

  Integration

  Redundancy

Question 37 0.25 / 0.25 pts

A data object is described by a categorical attribute having four
categories. If, for data analysis purposes, we want to transform the
attribute into numerical values, then

  D. it can be done by encoding using 3 or 4 binary variables

  B. it can be done by encoding using only 3 binary variables

  C. it can be done by encoding using only 4 binary variables

  A. it can’t be done
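
A small sketch of the two usual binary encodings for a four-category attribute: one-hot encoding uses one binary variable per category, and dummy encoding drops one category and treats it as an all-zeros baseline. The category values below are illustrative.

```python
categories = ["red", "green", "yellow", "black"]  # hypothetical four-category attribute

def one_hot(value):
    """Four binary variables, one per category."""
    return [1 if value == c else 0 for c in categories]

def dummy(value):
    """Three binary variables; the first category is the all-zeros baseline."""
    return one_hot(value)[1:]

for c in categories:
    print(c, one_hot(c), dummy(c))
```

Both encodings give every category a distinct code, so four categories can be represented with either three or four binary variables.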

Question 38 0.25 / 0.25 pts

In a boxplot, where Q1, Q2 and Q3 are the first, second and third
quartiles respectively, the interquartile range IQR is calculated as:

  None of the above

  IQR = Q3 – Q1

  IQR = Q3-Q2

  IQR = Q2-Q1
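
The interquartile range is the distance between the third and first quartiles. The sketch below computes it with `statistics.quantiles` (Python 3.8 or later) on made-up values. Quartile conventions differ between tools, so other libraries may report slightly different quartiles for the same data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical values

# With n=4, quantiles returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

In a box plot, the IQR is the length of the box, with the median drawn inside it.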

Question 39 0.25 / 0.25 pts

Consider the following Python code. 


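import sklearn.preprocessing
input=['Havells','Philips','Syska','Eveready','Lloyd']
le = sklearn.preprocessing.LabelEncoder()
le.fit(input)
# transform expects an array-like of labels, so the label is wrapped in a list
print(le.transform(['Syska']))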
The last line of the code snippet will return

 2

 3

 4

 1

Question 40 0.25 / 0.25 pts

From mathematical models for epidemics one observes that the initial
phase is in the exponential growth. This can be verified by plotting the
number of infections on log-scale. This is a use case for which
technique.

  Transformation

  Aggregation

  Discretization

  Sampling
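
To illustrate why a log scale helps here: taking the logarithm of exponentially growing counts turns the curve into a straight line, so the day-to-day differences of the logged values are constant. The case numbers and growth rate below are invented for the example.

```python
import math

# Hypothetical epidemic: 5 initial cases, growing by 30% per day.
initial_cases, daily_growth = 5, 1.3
cases = [initial_cases * daily_growth ** day for day in range(10)]

log_cases = [math.log(c) for c in cases]

# On a log scale, consecutive differences all equal log(daily_growth).
steps = [later - earlier for earlier, later in zip(log_cases, log_cases[1:])]
print(steps)
```

Real counts that give a roughly straight line on a log-scale plot are consistent with an exponential growth phase.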

Quiz Score: 8.63 out of 10
