
Total No. of Questions : 4] SEAT No.

PA-10062 [Total No. of Pages : 1

[6009]-353

T.E.(Information Technology) (Insem)

DATA SCIENCE AND BIG DATA ANALYTICS

(2019 Pattern) (Semester-II) (314452)

Time : 1 Hour] [Max. Marks : 30

Instructions to the candidates:
1) All questions are compulsory.
2) Figures to the right indicate full marks.

3) Assume suitable data if necessary.


4) Attempt Q.1 or Q.2, Q.3 or Q.4.

Q1) a) Explain the 6 V’s for defining Big Data along with the factors responsible for
data explosion. [8]

b) List and explain the data processing infrastructure challenges in Big Data
with a suitable example. [7]

OR

Q2) a) List and explain the choices for reengineering the data warehouse. [8]

b) Explain shared-everything and shared-nothing architectures in detail with
respect to Big Data. [7]

Q3) a) Explain the following terms. [7]



i) Expectation
ii) Pairwise independence


b) Given that a person’s last purchase was Coke, there is a 90% chance that
his next purchase will also be Coke. If a person’s last purchase was
Pepsi, there is an 80% chance that his next purchase will also be Pepsi. [8]

i) Given that a person is currently a Pepsi purchaser, what is the
probability that he will purchase Coke two purchases from now?



ii) Given that a person is currently a Coke drinker, what is the probability
that he will purchase Pepsi three purchases from now?
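
One way to check Q3 b) numerically is sketched below. It assumes the purchases form a two-state Markov chain whose one-step transition probabilities are the percentages given in the question. The state order (0 = Coke, 1 = Pepsi) and the helper names mat_mul and mat_pow are illustrative choices, not part of the paper.

```python
from fractions import Fraction

# One-step transition matrix: row = current purchase, column = next purchase.
# Assumed state order for this sketch: 0 = Coke, 1 = Pepsi.
P = [
    [Fraction(9, 10), Fraction(1, 10)],  # Coke  -> Coke 0.9, Pepsi 0.1
    [Fraction(2, 10), Fraction(8, 10)],  # Pepsi -> Coke 0.2, Pepsi 0.8
]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, steps):
    """Return m raised to a positive integer power (the n-step matrix)."""
    result = m
    for _ in range(steps - 1):
        result = mat_mul(result, m)
    return result


P2 = mat_pow(P, 2)
P3 = mat_pow(P, 3)

pepsi_to_coke_2 = P2[1][0]  # part i): Pepsi now, Coke two purchases later
coke_to_pepsi_3 = P3[0][1]  # part ii): Coke now, Pepsi three purchases later

print(float(pepsi_to_coke_2))  # 0.34
print(float(coke_to_pepsi_3))  # 0.219
```

Exact fractions are used instead of floats so that the two-step and three-step entries can be compared with a hand calculation without rounding noise.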



OR

Q4) a) Explain Flajolet-Martin Distance Sampling. Find the number of distinct elements
in the element stream 4, 2, 5, 9, 1, 6, 3, 7. Consider the hash function
h(x) = (3x + 7) mod 32. [8]
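
The computation in Q4 a) can be traced with a short sketch. It assumes the basic form of the Flajolet-Martin estimate, 2^R, where R is the largest number of trailing zero bits seen among the hash values, with no averaging over several hash functions and no correction factor. The helper name trailing_zeros and its convention for a hash value of 0 are assumptions made here for illustration.

```python
def h(x):
    """Hash function given in the question."""
    return (3 * x + 7) % 32


def trailing_zeros(n, width=5):
    """Count trailing zero bits of n.

    Assumption for this sketch: a hash value of 0 counts as `width` zeros
    (5 bits, since values are taken mod 32). It does not occur for this stream.
    """
    if n == 0:
        return width
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count


stream = [4, 2, 5, 9, 1, 6, 3, 7]

hashes = [h(x) for x in stream]
zeros = [trailing_zeros(v) for v in hashes]

R = max(zeros)       # largest number of trailing zeros observed
estimate = 2 ** R    # basic Flajolet-Martin estimate of the distinct count

for x, v, r in zip(stream, hashes, zeros):
    print(f"x={x}  h(x)={v:2d}  binary={v:05b}  trailing zeros={r}")

print("R =", R)                 # R = 4
print("estimate =", estimate)   # estimate = 16
```

With a single hash function the estimate is always a power of two, so it can differ noticeably from the true count of 8 distinct elements in this stream.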



b) Find the variance and standard deviation for the following data
set: 70, 60, 72, 42, 86 [7]
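
The arithmetic in Q4 b) can be checked with Python's standard statistics module, as in the sketch below. The question does not say whether the data set is a whole population or a sample, so both versions are computed; which one is expected is an assumption left to the reader.

```python
import statistics

data = [70, 60, 72, 42, 86]

mean = statistics.mean(data)  # 330 / 5 = 66

# Population versions: divide the sum of squared deviations by n.
pop_var = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Sample versions: divide the sum of squared deviations by n - 1.
sample_var = statistics.variance(data)
sample_std = statistics.stdev(data)

print(f"mean = {mean}")
print(f"population variance = {pop_var:.2f}, std dev = {pop_std:.2f}")
print(f"sample variance = {sample_var:.2f}, std dev = {sample_std:.2f}")
```

The squared deviations from the mean are 16, 36, 36, 576 and 400, which sum to 1064; dividing by 5 gives the population variance 212.8 and dividing by 4 gives the sample variance 266.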



