
Noida Institute of Engineering and Technology, Greater Noida

Printed Page:-          Sub Code: KCS051

Paper Id:          Roll No.:

Pre-University Test (Online)

B. TECH.
(Semester-5) THEORY EXAMINATION 2020-21
Sub Name: DATA ANALYTICS

Time: 3 Hours Total Marks:100


Note: 1. Attempt all Sections. If any required data is missing, choose it suitably.

SECTION-A
1. Attempt all questions in brief. 2 x 10 = 20
Q.No. Question Marks CO
a. Distinguish between supervised and unsupervised learning with an example. 2 CO2
b. Elaborate on the five V's of Big Data and present a suitable example. 2 CO2
c. Discuss the skill sets required to become a data scientist and also 2 CO1
explain the associated job roles.
d. What do you mean by the kth moment of a data stream? Compute the surprise number 2 CO3
(second moment) of the stream 3 1 4 1 3 4 2 1 2.
e. What is the difference between linear and logistic regression? 2 CO2
f. Explain the defuzzification process with at least two different methods, along with 2 CO4
an example.
g. Present the advantages of R over Python. 2 CO5
h. Assume a user wants to cluster 7 observations into 3 clusters using the K-means 2 CO5
clustering algorithm. After the first iteration, the clusters C1, C2, C3 have the following
observations:
C1: {(1,1), (4,4), (7,7)}
C2: {(0,4), (4,0)}
C3: {(5,5), (9,9)}
What will the cluster centroids be if the user proceeds to the second iteration?
i. List the various types of distance measures used in clustering with suitable 2 CO3
examples.
j. How do we find the outliers in a data set with respect to each feature in R? 2 CO5
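Worked check for 1(d) and 1(h): the sketch below is one possible way to verify the arithmetic, not a model answer. It assumes the kth moment is the sum, over distinct stream elements, of the occurrence count raised to the power k, and that a K-means centroid is the component-wise mean of the cluster's points. The function names kth_moment and centroid are illustrative.

```python
from collections import Counter


def kth_moment(stream, k):
    """Sum over distinct elements of (occurrence count) ** k."""
    counts = Counter(stream)
    return sum(count ** k for count in counts.values())


def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


# Stream from question 1(d); the second moment is the surprise number.
stream = [3, 1, 4, 1, 3, 4, 2, 1, 2]
surprise_number = kth_moment(stream, 2)

# Cluster memberships after the first iteration, from question 1(h).
clusters = {
    "C1": [(1, 1), (4, 4), (7, 7)],
    "C2": [(0, 4), (4, 0)],
    "C3": [(5, 5), (9, 9)],
}
centroids = {name: centroid(points) for name, points in clusters.items()}

print(surprise_number)
print(centroids)
```

Under these assumptions the element counts are 3, 2, 2, 2, giving a surprise number of 21, and the centroids are (4, 4), (2, 2) and (7, 7).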

SECTION-B
2. Attempt any three of the following: 3 x 10 = 30
Q.No. Question Marks CO
a. Explain each phase of the data analytics life cycle and also present it with a neat 10 CO1
diagram.
b. Illustrate the working of a Bloom filter with an example. 10 CO1
c. A fair coin is tossed twice. What is the probability that both tosses result in 10 CO2
heads, given that at least one of the tosses resulted in a head?

d. With respect to fuzzy logic, explain these terms with diagrams and appropriate 10 CO2
examples:
1. Core
2. Support
3. Boundary
4. Crossover point
5. Height
e. What are the limitations of machine learning? How does deep learning overcome 10 CO2
these limitations? Explain the perceptron learning algorithm with a neat diagram
and terminology.
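Worked check for 2(c): a minimal sketch that enumerates the sample space of two fair tosses and applies the definition of conditional probability. It assumes all four outcomes are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Conditioning event: at least one toss is a head.
at_least_one_head = [o for o in outcomes if "H" in o]

# Event of interest within the conditioning event: both tosses are heads.
both_heads = [o for o in at_least_one_head if o == ("H", "H")]

# P(both heads | at least one head) = favourable / conditioning outcomes.
p_both_given_one = Fraction(len(both_heads), len(at_least_one_head))
print(p_both_given_one)
```

The conditioning event keeps three of the four outcomes, and only one of them is HH, so the sketch gives 1/3.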

SECTION-C

3. Attempt any one part of the following: 1 x 10 = 10


Q.No. Question Marks CO
a. What is Big Data? Why do we need to analyze Big Data? List the characteristics 10 CO5
of Big Data and the challenges in handling it.
b. Justify why the Support Vector Machine is effective on high-dimensional 10 CO2
data and discuss the polynomial kernel function for multiple classes.

4. Attempt any one part of the following: 1 x 10 = 10


Q.No. Question Marks CO
a. What is the difference between clustering and classification? Explain the K-means 10 CO3
clustering algorithm stepwise.
b. A diagnostic test is conducted on 960 patients to detect a disorder with a 10 CO2
prevalence rate of 6.25% in the population. Assume that the test has a
specificity of 83.33%. How many people are incorrectly identified as having the
disorder?
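Worked check for 4(b): a minimal sketch of the false-positive count, assuming that the 83.33% specificity stands for the exact fraction 5/6 and that "incorrectly identified as having the disorder" means false positives among the patients without the disorder. The function name false_positives is illustrative.

```python
from fractions import Fraction


def false_positives(total, prevalence, specificity):
    """Number of disorder-free patients who nevertheless test positive."""
    with_disorder = total * prevalence
    without_disorder = total - with_disorder
    true_negatives = without_disorder * specificity
    return without_disorder - true_negatives


total_patients = 960
prevalence = Fraction(625, 10000)  # 6.25%
specificity = Fraction(5, 6)       # assumption: 83.33% read as exactly 5/6

fp = false_positives(total_patients, prevalence, specificity)
print(fp)
```

With these inputs, 60 patients have the disorder and 900 do not; 750 of the 900 are correctly cleared, leaving 150 false positives.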

5. Attempt any one part of the following: 1 x 10 = 10


Q.No. Question Marks CO
a. Perform agglomerative clustering using single linkage on the following data set and 10 CO3
also draw the dendrogram.
Distance   A   B   C   D   E
A          0   5   2   3   1
B          5   0   1   3   2
C          2   1   0   6   2
D          3   3   6   0   3
E          1   2   2   3   0
b. What is the concept of a data stream? How do we find the unique elements in a continuous 10 CO4
stream? Explain the various steps of the Flajolet-Martin algorithm with an
appropriate example.
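Worked check for 5(a): a naive single-linkage sketch over the distance matrix given in the question. It assumes the cluster-to-cluster distance is the minimum pairwise distance and breaks ties by taking the first minimal pair in iteration order, so the two merges at distance 1 could equally be listed in the opposite order. The function names single_link and agglomerate are illustrative.

```python
from itertools import combinations

labels = ["A", "B", "C", "D", "E"]
matrix = [
    [0, 5, 2, 3, 1],
    [5, 0, 1, 3, 2],
    [2, 1, 0, 6, 2],
    [3, 3, 6, 0, 3],
    [1, 2, 2, 3, 0],
]
# Pairwise distance lookup keyed by (label, label).
dist = {
    (labels[i], labels[j]): matrix[i][j]
    for i in range(len(labels))
    for j in range(len(labels))
}


def single_link(c1, c2):
    """Single linkage: smallest distance between any two members."""
    return min(dist[(a, b)] for a in c1 for b in c2)


def agglomerate(items):
    """Merge the closest pair of clusters until one cluster remains."""
    clusters = [(x,) for x in items]
    merges = []
    while len(clusters) > 1:
        # min() keeps the first minimal pair, which fixes the tie-breaking.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: single_link(*p))
        merges.append((c1, c2, single_link(c1, c2)))
        merged = tuple(sorted(c1 + c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
    return merges


merges = agglomerate(labels)
for left, right, height in merges:
    print(left, right, height)
```

The merge heights are the levels at which the dendrogram branches join: A and E at 1, B and C at 1, the two pairs at 2, and D at 3.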

6. Attempt any one part of the following: 1 x 10 = 10

Q.No. Question Marks CO

a. Explain the concept of the Apriori algorithm. Solve the following numerical with minimum support 10
count = 2. Generate the association rules with a confidence value of 60%. List the
items which are frequently purchased on the basis of the association rules.
T1   ITEM 1, ITEM 3, ITEM 4
T2   ITEM 2, ITEM 3, ITEM 5
T3   ITEM 1, ITEM 2, ITEM 3, ITEM 5
T4   ITEM 2, ITEM 5
T5   ITEM 1, ITEM 3, ITEM 5
b. Discuss the various components of time series analysis and also explain the ARIMA 10
model.
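Worked check for 6(a): a compact level-wise Apriori sketch over the five transactions in the question, with items written as the integers 1 to 5. It assumes the 60% threshold is a minimum (confidence >= 0.6) and that confidence is support(itemset) / support(left-hand side). The function names support, apriori and rules are illustrative.

```python
from itertools import combinations

transactions = {
    "T1": {1, 3, 4},
    "T2": {2, 3, 5},
    "T3": {1, 2, 3, 5},
    "T4": {2, 5},
    "T5": {1, 3, 5},
}
MIN_SUPPORT = 2       # minimum support count from the question
MIN_CONFIDENCE = 0.6  # assumption: 60% is treated as a lower bound


def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def apriori():
    """Return {frequent itemset: support count}, built level by level."""
    items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        previous = set(level)
        level = [c for c in candidates
                 if all(frozenset(s) in previous
                        for s in combinations(c, k - 1))
                 and support(c) >= MIN_SUPPORT]
    return frequent


def rules(frequent):
    """Return (lhs, rhs, confidence) for rules meeting MIN_CONFIDENCE."""
    found = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    found.append((set(lhs), set(itemset - lhs), confidence))
    return found


frequent = apriori()
association_rules = rules(frequent)
```

With a support count of 2, ITEM 4 drops out at the first level, the pair {ITEM 1, ITEM 2} drops out at the second, and the largest frequent itemsets are {ITEM 1, ITEM 3, ITEM 5} and {ITEM 2, ITEM 3, ITEM 5}, each with support 2.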

7. Attempt any one part of the following: 1 x 10 = 10


Q.No. Question Marks CO
a. With a neat diagram of the Apache Hadoop ecosystem, explain the following terms: 10 CO5
1. MapReduce job workflow with diagram
2. Hive
3. Apache Pig components
4. HDFS
5. HBase
b. Using the Naïve Bayes classifier, compute the probability that a RED SUV 10 CO2
DOMESTIC car will be stolen or not. Write all computational steps.

Example No.   Colour   Type     Origin     Stolen
1             RED      SPORTS   DOMESTIC   YES
2             RED      SPORTS   DOMESTIC   NO
3             RED      SPORTS   DOMESTIC   YES
4             YELLOW   SPORTS   DOMESTIC   NO
5             YELLOW   SPORTS   IMPORTED   YES
6             YELLOW   SUV      IMPORTED   NO
7             YELLOW   SUV      IMPORTED   YES
8             YELLOW   SUV      DOMESTIC   NO
9             RED      SUV      IMPORTED   NO
10            RED      SPORTS   IMPORTED   YES
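Worked check for 7(b): a minimal Naïve Bayes sketch over the ten examples in the table. It assumes plain relative-frequency estimates with no smoothing and scores each class as the prior multiplied by the per-attribute conditional probabilities. The function name score is illustrative.

```python
from fractions import Fraction

# (Colour, Type, Origin, Stolen) rows copied from the table.
data = [
    ("RED", "SPORTS", "DOMESTIC", "YES"),
    ("RED", "SPORTS", "DOMESTIC", "NO"),
    ("RED", "SPORTS", "DOMESTIC", "YES"),
    ("YELLOW", "SPORTS", "DOMESTIC", "NO"),
    ("YELLOW", "SPORTS", "IMPORTED", "YES"),
    ("YELLOW", "SUV", "IMPORTED", "NO"),
    ("YELLOW", "SUV", "IMPORTED", "YES"),
    ("YELLOW", "SUV", "DOMESTIC", "NO"),
    ("RED", "SUV", "IMPORTED", "NO"),
    ("RED", "SPORTS", "IMPORTED", "YES"),
]


def score(query, label):
    """Unnormalised posterior: P(label) * product of P(attribute | label)."""
    rows = [row for row in data if row[3] == label]
    result = Fraction(len(rows), len(data))  # class prior
    for index, value in enumerate(query):
        matches = sum(1 for row in rows if row[index] == value)
        result *= Fraction(matches, len(rows))  # no smoothing (assumption)
    return result


query = ("RED", "SUV", "DOMESTIC")
scores = {label: score(query, label) for label in ("YES", "NO")}
total = sum(scores.values())
posteriors = {label: value / total for label, value in scores.items()}
prediction = max(scores, key=scores.get)

print(scores, posteriors, prediction)
```

Under these assumptions the scores are 3/125 (0.024) for YES and 9/125 (0.072) for NO, which normalise to 1/4 and 3/4, so the sketch classifies the car as not stolen.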
