
Total No. of Questions : 8] SEAT No.

P-7545 [Total No. of Pages : 3

[6180]-53

T.E. (Computer Engineering)

DATA SCIENCE AND BIG DATA ANALYTICS

(2019 Pattern) (Semester - II) (310251)
Time : 2½ Hours] [Max. Marks : 70

Instructions to the candidates :


1) Answer Q1 or Q2, Q3 or Q4, Q5 or Q6, Q7 or Q8.

2) Neat diagrams must be drawn wherever necessary.

3) Figures to the right side indicate full marks.

4) Assume suitable data if necessary.
5) Use of Scientific calculator is permitted.

Q1) a) Explain Data Analytics Cycle with suitable diagram and its phases. [8]

b) List and explain the various activities involved in identifying potential
data resources as part of the discovery phase in the Data Analytics Life Cycle. [9]

OR

Q2) a) List and explain the key roles for a successful analytics project. [8]

b) Write short note on : [9]


i) Common Tools for the Model Building

ii) Model selection for Data Analytics



Q3) a) List and explain the various types of analytics in Big data. [9]

b) Calculate the support and confidence values for all the possible itemsets. [9]

Transaction ID    Items bought
1                 Onion, Potato, Cold Drink
2                 Onion, Burger, Cold Drink
3                 Eggs, Onion, Cold Drink
4                 Potato, Milk, Eggs
5                 Potato, Burger, Cold Drink, Milk, Eggs
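
As a way to check hand calculations for the table above, the following is a minimal sketch, not a model answer. It assumes the usual definitions: support(X) is the fraction of transactions containing X, and confidence(X -> Y) is count(X and Y together) divided by count(X). The function names `count`, `support` and `confidence` are illustrative choices.

```python
from itertools import combinations

# The five transactions from the question.
transactions = [
    {"Onion", "Potato", "Cold Drink"},
    {"Onion", "Burger", "Cold Drink"},
    {"Eggs", "Onion", "Cold Drink"},
    {"Potato", "Milk", "Eggs"},
    {"Potato", "Burger", "Cold Drink", "Milk", "Eggs"},
]


def count(itemset):
    """Number of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return count(both) / count(antecedent)


# Enumerate every itemset that occurs in at least one transaction.
items = sorted(set().union(*transactions))
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        if count(combo) > 0:
            print(combo, "support =", support(combo))

print("Onion -> Cold Drink:", confidence({"Onion"}, {"Cold Drink"}))
print("Cold Drink -> Onion:", confidence({"Cold Drink"}, {"Onion"}))
```

For instance, Onion appears in 3 of the 5 transactions, always together with Cold Drink, so support(Onion) is 0.6 and confidence(Onion -> Cold Drink) is 1.0, while confidence(Cold Drink -> Onion) is 3/4 = 0.75.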



OR

Q4) a) Explain the need for logistic regression along with its various types. [9]

b) Explain the following terms with a suitable example. [9]

i) Removing Duplicates from dataset.

ii) Handling Missing Data
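
The two terms above can be illustrated with a small sketch in plain Python. The records, column names and helper functions (`drop_duplicates`, `fill_missing_mean`, `drop_missing`) are hypothetical and invented purely for illustration; in practice a library such as pandas would typically be used, and mean imputation is only one of several possible strategies.

```python
from statistics import mean

# Hypothetical records, invented for illustration only.
rows = [
    {"id": 1, "age": 25, "city": "Pune"},
    {"id": 2, "age": None, "city": "Mumbai"},   # missing age
    {"id": 1, "age": 25, "city": "Pune"},       # exact duplicate of the first row
    {"id": 3, "age": 35, "city": None},         # missing city
]


def drop_duplicates(records):
    """Keep the first occurrence of each fully identical record."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def fill_missing_mean(records, column):
    """Replace missing values in a numeric column with the column mean."""
    known = [r[column] for r in records if r[column] is not None]
    fill = mean(known)
    return [{**r, column: fill if r[column] is None else r[column]} for r in records]


def drop_missing(records):
    """Alternative strategy: discard any record that has a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


unique = drop_duplicates(rows)
filled = fill_missing_mean(unique, "age")
complete = drop_missing(unique)
print(len(rows), "rows ->", len(unique), "after removing duplicates")
print("age after mean imputation:", [r["age"] for r in filled])
print("ids kept when dropping incomplete rows:", [r["id"] for r in complete])
```

Imputation keeps every record at the cost of introducing estimated values, while dropping incomplete records keeps only observed values at the cost of losing data.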

Q5) a) Suppose the task is to cluster the following points (with (x, y)
representing location) into three clusters: A1(2, 10), A2(2, 5), A3(8, 4),
B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9). The distance function is
Euclidean distance. Suppose initially we assign A1, B1 and C1 as the center
of each cluster, respectively. Use the k-means algorithm to show only the
first round of execution with the cluster centers. [8]
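
The first round described above can be checked with a short sketch. It assumes one round means a single assignment step (each point goes to its nearest center by Euclidean distance) followed by a single update step (each center moves to the mean of its members). The cluster labels and the function names `assign` and `update` are illustrative, and empty clusters are not handled because none arise with this data.

```python
from math import dist

# Points and initial centers as given in the question.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}
centers = {
    "cluster 1": points["A1"],
    "cluster 2": points["B1"],
    "cluster 3": points["C1"],
}


def assign(points, centers):
    """Assignment step: attach each point to its nearest center."""
    clusters = {name: [] for name in centers}
    for label, p in points.items():
        nearest = min(centers, key=lambda name: dist(p, centers[name]))
        clusters[nearest].append(label)
    return clusters


def update(points, clusters):
    """Update step: move each center to the mean of its member points."""
    new_centers = {}
    for name, members in clusters.items():
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        new_centers[name] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return new_centers


clusters = assign(points, centers)
new_centers = update(points, clusters)
for name in clusters:
    print(name, clusters[name], "new center:", new_centers[name])
```

Under these assumptions the first round groups the points as {A1}, {A3, B1, B2, B3, C2} and {A2, C1}, with updated centers (2, 10), (6, 6) and (1.5, 3.5).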

b) Explain the following Text Analysis steps with a suitable example. [9]

i) Part-of-speech (POS) tagging

ii) Lemmatization
OR

Q6) a) Given the following confusion matrix for Diabetic Risk, calculate Accuracy,
Precision, Recall and Error rate, with a description of each. [8]



                                        Predicted classes
Classes                                 Diabetic Risk - Yes    Diabetic Risk - No
Actual classes    Diabetic Risk - Yes           90                    210
                  Diabetic Risk - No           140                   9560
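
The four measures can be checked against the matrix above with a minimal sketch. It assumes "Diabetic Risk - Yes" is the positive class, so the cells read TP = 90, FN = 210, FP = 140 and TN = 9560; the variable names are illustrative.

```python
# Cells of the confusion matrix, with "Diabetic Risk - Yes" as the positive class.
tp = 90     # actual Yes, predicted Yes
fn = 210    # actual Yes, predicted No
fp = 140    # actual No, predicted Yes
tn = 9560   # actual No, predicted No

total = tp + fn + fp + tn

accuracy = (tp + tn) / total    # share of all predictions that are correct
error_rate = (fp + fn) / total  # share of all predictions that are wrong
precision = tp / (tp + fp)      # share of predicted Yes that are truly Yes
recall = tp / (tp + fn)         # share of actual Yes that were detected

print(f"accuracy   = {accuracy:.4f}")
print(f"error rate = {error_rate:.4f}")
print(f"precision  = {precision:.4f}")
print(f"recall     = {recall:.4f}")
```

With these cells, accuracy is 9650/10000 = 0.965 and the error rate is 0.035, while precision is 90/230 (about 0.39) and recall is 90/300 = 0.30. The high accuracy is driven mostly by the large "No" class.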

b) Explain the Text Preprocessing steps with a suitable example. [9]
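
A few commonly taught preprocessing steps (case folding, punctuation removal, tokenization and stop-word removal) can be sketched in plain Python as follows. The sample sentence, the tiny stop-word list and the function name `preprocess` are assumptions made for illustration; the question does not prescribe a particular set of steps or a library.

```python
import re

# A deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}


def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    lowered = text.lower()                               # case folding
    no_punct = re.sub(r"[^a-z0-9\s]", " ", lowered)      # punctuation removal
    tokens = no_punct.split()                            # tokenization
    return [t for t in tokens if t not in STOP_WORDS]    # stop-word removal


sample = "The patient is at RISK of diabetes, and the doctor advised a diet!"
print(preprocess(sample))
```

Stemming or lemmatization would normally follow these steps; they are omitted here because they usually rely on an external library or dictionary.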



Q7) a) List a few data visualization tools and discuss any four applications of
data visualization, along with the use of various plots with Python/R
or a suitable tool. [9]



b) List the challenges of Data Visualization. Explain the types of visualization
with an example. [9]



OR

Q8) a) Explain in detail the Hadoop Ecosystem with a suitable diagram, along with
its various components. [9]

b) Write a short note on the following. [9]

i) MapReduce

ii) Pig
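
The MapReduce programming model named above can be illustrated with a single-process word-count sketch. This is not Hadoop code: the function names `map_phase`, `shuffle` and `reduce_phase` and the two input lines are hypothetical, and a real job would distribute these phases across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


# Hypothetical input lines, for illustration only.
lines = ["big data analytics", "big data tools"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Pig expresses the same kind of job declaratively: a Pig Latin script is compiled into jobs of this shape rather than written as explicit map and reduce functions.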
