Paper / Subject Code: 42172 / BIG DATA ANALYTICS

Time: 3 Hours Marks: 80

Note: 1. Question 1 is compulsory.
      2. Answer any three out of the remaining five questions.
      3. Assume any suitable data wherever required and justify the same.

Q1 a) Distinguish between Name node and Data node. [5]

b) List and explain the core business drivers behind the NoSQL movement. [5]

c) Mention four characteristics of big data. Elaborate these characteristics with respect to social media websites. [5]

d) List and explain the different issues and challenges in data stream query processing. [5]

Q2 a) What is a key-value store? What are the benefits of using a key-value store? [10]

b) Write MapReduce pseudocode to multiply two matrices. Apply the MapReduce working to perform the following matrix multiplication. [10]

   [ 1  2 ]     [ 6  7 ]
   [ 3  4 ]  X  [ 8  9 ]
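
One possible way to approach Q2 b) is the one-pass formulation in which every element of either matrix is sent to each output cell it contributes to. The sketch below simulates the map, shuffle and reduce phases in memory in plain Python; it is not Hadoop code, and the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative assumptions. The two matrices are read from the question as printed.

```python
from collections import defaultdict

# Matrices as read from the question: A x B.
A = [[1, 2], [3, 4]]
B = [[6, 7], [8, 9]]

def map_phase(A, B):
    """Emit ((i, k), (matrix_name, j, value)) for every matrix element."""
    rows_a, cols_b = len(A), len(B[0])
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            # A[i][j] contributes to every output cell in row i.
            for k in range(cols_b):
                yield (i, k), ("A", j, a_ij)
    for j, row in enumerate(B):
        for k, b_jk in enumerate(row):
            # B[j][k] contributes to every output cell in column k.
            for i in range(rows_a):
                yield (i, k), ("B", j, b_jk)

def shuffle(pairs):
    """Group mapped values by output cell, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """For each cell (i, k), pair A and B values on j and sum the products."""
    result = {}
    for (i, k), values in groups.items():
        a_vals = {j: v for name, j, v in values if name == "A"}
        b_vals = {j: v for name, j, v in values if name == "B"}
        result[(i, k)] = sum(a_vals[j] * b_vals[j] for j in a_vals)
    return result

product = reduce_phase(shuffle(map_phase(A, B)))
print(product)
```

For these inputs the reducers produce 22 and 25 for the first row and 50 and 57 for the second row of the product.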

Q3 a) Suppose the stream is S = {2, 1, 6, 1, 5, 9, 2, 3, 5}. Let hash functions h(x) = ax + b mod 16 for some a and b, treat result as a 4-bit binary integer. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements, h(x) = 4x + 1 mod 16. [10]
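
A minimal sketch of how the estimate in Q3 a) could be checked, under stated assumptions: the hash is read as h(x) = (4x + 1) mod 16, the tail length of a value is its number of trailing zero bits, and the estimate is 2^R where R is the largest tail length seen. The helper names and the convention for an all-zero hash value are assumptions made for this illustration only.

```python
def trailing_zeros(value: int, width: int = 4) -> int:
    """Number of trailing zero bits of value, viewed as a width-bit integer."""
    if value == 0:
        # Assumed convention: an all-zero word counts as `width` trailing zeros.
        return width
    count = 0
    while value & 1 == 0:
        value >>= 1
        count += 1
    return count

def flajolet_martin(stream, a, b, modulus=16, width=4):
    """Return 2**R, where R is the longest tail of zeros over all hashed elements."""
    max_tail = 0
    for x in stream:
        h = (a * x + b) % modulus
        max_tail = max(max_tail, trailing_zeros(h, width))
    return 2 ** max_tail

stream = [2, 1, 6, 1, 5, 9, 2, 3, 5]
estimate = flajolet_martin(stream, a=4, b=1)
print(estimate)  # 1
```

With this hash every value 4x + 1 is odd, so no hashed element ends in a zero bit, R is 0 and the estimate is 1, although the stream holds 6 distinct elements. This shows how strongly the estimate depends on the choice of hash function.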

b) Consider the data frame given below: [10]

   course  id  class  marks
   1       11  1      56
   2       12  2      75
   3       13  1      48
   4       14  2      69
   5       15  1      84
   6       16  2      53

   i. Create a subset of course less than 3 by using [ ] brackets and demonstrate the output.
   ii. Create a subset where the course column is less than 3 or the class equals 2 by using the subset() function and demonstrate the output.


Q4 a) Explain the natural join and the grouping and aggregation relational algebra operations using MapReduce. [10]
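
As an illustration of the two operations named in Q4 a), the sketch below runs a natural join of R(a, b) with S(b, c) and a grouping with SUM through a tiny in-memory stand-in for MapReduce. The relations, their tuples and all function names are hypothetical and chosen only for this example; a real job would run on a framework such as Hadoop.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """In-memory stand-in for a MapReduce run: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Natural join R(a, b) with S(b, c): key on the shared attribute b and
# tag each value with the relation it came from.
def join_mapper(record):
    relation, row = record
    if relation == "R":
        a, b = row
        yield b, ("R", a)
    else:
        b, c = row
        yield b, ("S", c)

def join_reducer(b, tagged):
    r_side = [v for tag, v in tagged if tag == "R"]
    s_side = [v for tag, v in tagged if tag == "S"]
    # Every R tuple pairs with every S tuple that shares this b value.
    for a in r_side:
        for c in s_side:
            yield (a, b, c)

# Grouping and aggregation: group by a, aggregate b with SUM.
def group_mapper(record):
    a, b = record
    yield a, b

def sum_reducer(a, values):
    yield (a, sum(values))

R = [(1, 10), (2, 20), (3, 10)]
S = [(10, "x"), (20, "y"), (30, "z")]
tagged_input = [("R", row) for row in R] + [("S", row) for row in S]

joined = map_reduce(tagged_input, join_mapper, join_reducer)
totals = map_reduce([("p", 4), ("q", 1), ("p", 6)], group_mapper, sum_reducer)
print(joined)
print(totals)
```

In both cases the map step only chooses the key (the join attribute or the grouping attribute), and the reduce step does the relational work on the values collected for that key.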

b) With a neat sketch, explain the architecture of the data-stream management system. [10]

30013 Page 1 of 2

Q5 a) Determine communities for the given social network graph using the Girvan-Newman algorithm. [10]

b) Define collaborative filtering. Using an example of an e-commerce site like Flipkart or Amazon, describe how it can be used to provide recommendations to users. [10]

Q6 a) List and discuss various types of data structures in R. [10]

b) i. The following table shows the number of units of different products sold on different days: [10]

   [Table: units sold per product on Monday, Tuesday, Wednesday, Thursday and Friday. Products listed: Milk, Bread, Detergent, Chocolate bars, Cola Cans. The cell values are scrambled in this copy and cannot be placed reliably.]

   Create five sample numeric vectors from this data.
   ii. Name and explain the operators used to form data subsets in R.

_____________________
