DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Academic Year 2016-2017

QUESTION BANK- ODD SEMESTER

NAME OF THE SUBJECT DATA ANALYTICS

SUBJECT CODE IT6006

SEMESTER VII

YEAR IV

DEPARTMENT COMPUTER SCIENCE AND ENGINEERING

HANDLED & PREPARED BY Ms. SUMA.S and Ms.ANITHA.R

UNIT 1

PART – A

Q.No Question Competence Level

1 List the main characteristics of Big Data. Remember BTL 1

2 What can you say about prediction error? Understand BTL 2

3 Define the role of shared memory in MPP. Remember BTL 1

4 Define statistical inference. Remember BTL 1

5 How would you show your understanding of classic map-reduce? Apply BTL 3

7 Can you list the differences between forward and backward prediction? Remember BTL 1

8 Classify the types of web mining. Analyze BTL 4

9 Define Web data. Remember BTL 1

10 What is meant by Predictor coefficients? Understand BTL 2

11 What is the relationship between sampling distribution and re-sampling? Evaluate BTL 5

12 What is the main idea in traditional analytic architecture? Understand BTL 2

13 Summarize the data types for Big data. Understand BTL 2

14 Compare and contrast traditional databases and massively parallel processing. Analyze BTL 4

15 Can you generalize the role of analytical tools in big data? Create BTL 6

16 How would you apply the methods of re-sampling in Bigdata? Apply BTL 3

17 What examples can you find for modern data analytics tools? Apply BTL 3

18 Can you make the distinction between analysis and reporting? Evaluate BTL 5

19 Can you identify the different statistical concepts required for Big Data? Analyze BTL 4

20 Why does one choose an analytical system over a conventional system? Create BTL 6

PART-B

Q.No. Question Competence Level

1 i. What is Big Data? Describe the main features of a data analytical system. (8) Remember BTL 1

ii. Describe in detail the role of statistical models in Big Data. (8)

2 List the main characteristics of big data architecture with a neat schematic diagram. Remember BTL 1

3 How would you compose the statistical concepts in inference? Create BTL 6

4 How would you describe the various prediction techniques? Remember BTL 1

5 What are the features of a massively parallel processing system? Analyze BTL 4

6 i. Show how you would use the sampling distribution system. (8) Apply BTL 3

ii. Illustrate the resampling methods. (8)

7 How would you distinguish analysis and reporting tools used in Big Data? Understand BTL 2

8 i. What are the best practices in Big Data Analytics? (8) Analyze BTL 4

ii. Elaborate the techniques used in Data analytics. (8)

9 i. Summarize how you would explain bootstrapping in detail. (8) Understand BTL 2

ii. Discuss the law of total probability and Bayes’ rule. (8)

10 i. Describe in detail about hypothesis testing. (8) Remember BTL 1

ii. Describe in detail about the probability distribution and entropy. (8)

11 How would you show your understanding of the tools, trends and technology in big data? Apply BTL 3

12 i. How would you assess the difficulties faced by conventional systems? (8) Evaluate BTL 5

ii. What are the differences that separate big data architecture from the traditional one? (8)

13 Summarize the modern data analytic tools in detail. Understand BTL 2

14 Explain the concept of Friedman’s bias–variance decomposition for classifiers. Analyze BTL 4

UNIT II

PART – A

Q.No Question Competence Level

1 What is the role of repeated measures of data in analysis? Remember BTL 1

2 What approach would you use to envision the “flow” of dynamics? Apply BTL 3

3 What is meant by support-vector machines? Create BTL 6

4 How would you interpret the joint probability density in Bayesian inference? Understand BTL 2

5 Can you list the various analyses involved in time series? Remember BTL 1

6 Define multivariate analysis. Remember BTL 1

7 Classify the levels of the techniques in multivariate analysis. Analyze BTL 4

8 Can you assess the importance of neural networks in data analysis? Evaluate BTL 5

9 What information would you use to analyse data using Kohonen’s self-organizing maps? Create BTL 6

10 How do you define propositional rule learning? Remember BTL 1

11 What is the main idea of principal component analysis? Understand BTL 2

12 What are the three main ways of determining principal components to obtain adequate representation of data? Remember BTL 1

13 How would you extract fuzzy models from data? Apply BTL 3

14 Can you make the distinction between learning and generalization? Analyze BTL 4

15 What can you say about Hebbian learning? Understand BTL 2

16 How is a genetic algorithm used in solving optimization tasks? Remember BTL 1

17 What is the main idea of decision trees in fuzzy logic? Understand BTL 2

18 How would you show your understanding of the delta rule in neural networks? Apply BTL 3

19 Can you point out random effects models? Analyze BTL 4

20 How would you categorize the search techniques of stochastic methods? Evaluate BTL 5

PART-B

Q.No. Question Competence Level

1 Examine how you would implement regression modeling. Remember BTL 1

2 Summarize the Bayesian methods used in data analysis. Understand BTL 2

3 Why do you think data analytics is becoming more relevant in today’s software environment? Evaluate BTL 5

4 i. What is the main idea of analyzing time series? (8) Understand BTL 2

ii. Distinguish between linear and non-linear dynamics in brief. (8)

5 i. What are the features of support vector machines? (8) Analyze BTL 4

ii. Identify the kernel methods used in data analysis. (8)

6 i. Explain briefly the primal and dual representation in kernel perceptron. (8) Analyze BTL 4

ii. Compare and contrast PCA and CCA. (8)

7 i. Define rule induction. (4) Remember BTL 1

ii. List the pros and cons of using neural networks in analysis. (12)

8 i. Can you identify the different mechanisms needed for learning? (8) Apply BTL 3

ii. How do you use the generalization techniques needed to illustrate neural networks? (8)

9 Can you list the types of evolution strategies in search analysis and explain in detail? Remember BTL 1

10 Explain in detail about the fuzzy decision trees. Analyze BTL 4

11 i. Define Principal component analysis. (4) Remember BTL 1

ii. Describe in detail about cluster analysis and mixture decomposition. (12)

12 i. How would you represent the data in propositional rule learning? (10) Understand BTL 2

ii. Describe the covering algorithm. (6)

13 Illustrate how you would extract fuzzy models from data. Apply BTL 3

14 How would you formulate the ideas of search methods in stochastic data analysis? Create BTL 6

UNIT III

PART – A

Q.No. Question Competence Level

1 List the main characteristics of stream sources. Remember BTL 1

2 Can you select the most popular element in a stream? Remember BTL 1

3 Why do you think data stream management is relevant in data mining? Analyze BTL 4

4 Define decay window. Remember BTL 1

5 Define the real-time sentiment analysis. Remember BTL 1

6 What are the issues in stream processing? Remember BTL 1

7 What is the main idea of estimating moments? Understand BTL 2

8 What would result if the cost of exact counts doesn’t match? Apply BTL 3

9 Describe the stream queries. Understand BTL 2

10 What can you say about sampling streams? Understand BTL 2

11 What approach would you use to deal with infinite streams? Apply BTL 3

12 Which are the different ways to reduce the error? Remember BTL 1

13 What examples can you find for stream sources? Apply BTL 3

14 What is meant by bloom filter? Understand BTL 2

15 Compare and contrast RTAP (real time analytics platform) and RTSA (real time sentiment analysis). Analyze BTL 4

16 Prove by induction on m that 1 + 3 + 5 + ··· + (2m − 1) = m². Analyze BTL 4
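One possible outline of the induction asked for in Question 16 is sketched here. The symbol S(m) for the partial sum is introduced only for convenience and is not part of the question; the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
\text{Let } S(m) &= 1 + 3 + 5 + \cdots + (2m-1). \\
\text{Base case } (m = 1):\quad S(1) &= 1 = 1^2. \\
\text{Inductive hypothesis:}\quad S(k) &= k^2 \text{ for some } k \ge 1. \\
\text{Inductive step:}\quad S(k+1) &= S(k) + \bigl(2(k+1) - 1\bigr) \\
  &= k^2 + 2k + 1 \\
  &= (k+1)^2.
\end{align*}
```

Since the base case holds and the step carries the identity from k to k + 1, the identity holds for every m ≥ 1.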

17 Can you identify the following? Evaluate BTL 5

Suppose our stream consists of the integers 3, 1, 4, 1, 5, 9, 2, 6, 5. Our hash functions will all be of the form h(x) = ax + b mod 32 for some a and b. You should treat the result as a 5-bit binary integer. Determine the tail length for each stream element and the resulting estimate of the number of distinct elements if the hash function is:

(a) h(x) = 2x + 1 mod 32.

(b) h(x) = 3x + 7 mod 32.

(c) h(x) = 4x mod 32.
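A minimal sketch for checking hand-worked answers to Question 17, assuming the usual Flajolet-Martin convention that the estimate is 2^R, where R is the largest tail length (number of trailing zero bits) seen. The function names `tail_length` and `fm_estimate` are illustrative, and treating a hash value of 0 as having tail length 5 is an assumption (the case does not arise for this stream).

```python
STREAM = [3, 1, 4, 1, 5, 9, 2, 6, 5]


def tail_length(value: int, bits: int = 5) -> int:
    """Number of trailing zero bits in a `bits`-wide binary integer."""
    if value == 0:
        return bits  # assumption: an all-zero hash counts as `bits` zeros
    count = 0
    while value & 1 == 0:
        value >>= 1
        count += 1
    return count


def fm_estimate(stream, a: int, b: int, modulus: int = 32):
    """Return per-element tail lengths and the estimate 2**R for h(x) = ax + b mod 32."""
    tails = [tail_length((a * x + b) % modulus) for x in stream]
    return tails, 2 ** max(tails)


for label, (a, b) in {"(a)": (2, 1), "(b)": (3, 7), "(c)": (4, 0)}.items():
    tails, estimate = fm_estimate(STREAM, a, b)
    print(label, tails, estimate)
```

Hash (a) maps every element to an odd value, so all tail lengths are 0 and the estimate collapses to 1, which shows why the choice of hash function matters.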

18 Compute the surprise number (second moment) for the stream 3, 1, 4, 1, 3, 4, 2, 1, 2. What is the third moment of this stream? Evaluate BTL 5
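A short sketch for verifying Question 18, assuming the standard definition of the k-th moment as the sum of (occurrence count)^k over the distinct elements. This is an exact computation, not the sampling-based estimator; the helper name `moment` is illustrative.

```python
from collections import Counter

STREAM = [3, 1, 4, 1, 3, 4, 2, 1, 2]


def moment(stream, k: int) -> int:
    """Exact k-th moment: sum of count**k over the distinct elements."""
    counts = Counter(stream)
    return sum(c ** k for c in counts.values())


second = moment(STREAM, 2)  # surprise number
third = moment(STREAM, 3)
print(second, third)
```

Here the element 1 occurs three times and 2, 3 and 4 occur twice each, so the second moment is 9 + 4 + 4 + 4 = 21 and the third moment is 27 + 8 + 8 + 8 = 51.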

19 Based on what you know, how would you partition the following bit stream into buckets: 1001011011101? Find all of them. Create BTL 6

20 What information would you use to substitute the view of streams over databases? Create BTL 6

PART-B

Q.No. Question Competence Level

1 Describe the Big Data Stream Analytics Framework (BDSAF) with a neat architecture diagram. Remember BTL 1

2 i. Can you assess the importance of sampling data in a stream? (10) Evaluate BTL 5

ii. Enlist the different stream sources. (6)

3 i. Describe briefly how you count the distinct elements in a stream. (10) Remember BTL 1

ii. What do you mean by the count-distinct problem? (6)

4 i. How is sentiment analysis playing a major role in data mining? (8) Analyze BTL 4

ii. What approaches would you use to perform sentiment analysis? (8)

5 Summarize the relevance of bloom filters in data mining. Understand BTL 2

6 What can you say about the real time analytics platform applications? Understand BTL 2

7 Show how the mining concept is used in real time sentiment analysis. Apply BTL 3

8 How is data analysis used in stock market predictions? Remember BTL 1

9 i. What approaches would you use to estimate the moments? (8) Apply BTL 3

ii. Examine the cost of exact counts. (8)

10 Discuss the concept of decaying window in detail. Understand BTL 2

11 Describe the stream data model architecture with a suitable block diagram. Remember BTL 1

12 Analyze the phases involved in real time data analytics deployment to production. Analyze BTL 4

13 Assuming a real time stock market situation, bring out the various ideas used in prediction analysis. Create BTL 6

14 i. Explain in detail the Alon-Matias-Szegedy algorithm for second moments. (8) Analyze BTL 4

ii. Explain the concept of higher order moments. (8)

UNIT IV

PART – A

Q.No. Question Competence Level

1 Define frequent itemset. Remember BTL 1

2 Compare and contrast the Multistage and Multi-Hash algorithms. Understand BTL 2

3 List the features of the representation of a cluster. Remember BTL 1

4 How would you show your understanding of Market-Basket Data? Apply BTL 3

5 Define Monotonicity. Remember BTL 1

6 Can you Pick K in a K-Means Algorithm? Evaluate BTL 5

7 What can you say about CLIQUE and PROCLUS? Understand BTL 2

8 Define the Hierarchical Clustering? Remember BTL 1

9 Point out the conclusions drawn from choosing clustroid? Analyze BTL 4

10 List the clustering strategies. Remember BTL 1

11 How would you stop the Merger Process? Apply BTL 3

12 Explain the role of hash tree in association rule discovery. Remember BTL 1

13 What is meant by Merging Buckets in BDMO? Understand BTL 2

14 Formulate the applications of frequent itemset? Create BTL 6

15 Give an outline of the Limited Pass algorithm? Understand BTL 2

16 How would you use the main memory for Itemset Counting? Apply BTL 3

17 Compare and contrast the relationship between centroids and clustering. Analyze BTL 4

18 Explain the working of Toivonen’s algorithm with an example. Analyze BTL 4

19 Can you identify the Pair Counting Bottleneck? Evaluate BTL 5

20 How would you Initialize the K-Means algorithm? Create BTL 6

PART-B

Q.No. Question Competence Level

1 i. Define the K-Means algorithm. How will you initialize the clusters and pick the value for K? (10) Remember BTL 1

ii. Examine how the data is processed in the BFR Algorithm. (6)

2 i. Illustrate briefly the mining of frequent itemsets with its applications. (12) Apply BTL 3

ii. Illustrate how you will find Association Rules with high confidence. (4)

3 i. Explain the k-means clustering algorithm with an example. (8) Analyze BTL 4

ii. List the different hierarchical clustering techniques and explain any one. (8)

4 Summarize hierarchical clustering in Euclidean and non-Euclidean spaces with its efficiency. Understand BTL 2

5 Can you explain the counting of frequent items in a stream? Analyze BTL 4

6 i. What are the main features of the GRGPF Algorithm? (6) Remember BTL 1

ii. How would you initialize the cluster tree and add points in the GRGPF Algorithm? (10)

7 Describe stream clustering and parallel clustering. Understand BTL 2

8 A database has five transactions. Let min sup = 60% and min conf = 80%. Create BTL 6

TID   ITEMS
T100  Milk, Onion, Nuts, Kiwi, Egg, Yoghurt
T200  Dhal, Onion, Nuts, Kiwi, Egg, Yoghurt
T300  Milk, Apple, Kiwi, Egg
T400  Milk, Curd, Kiwi, Yoghurt
T500  Curd, Onion, Kiwi, Ice cream, Egg

Find all frequent itemsets using the Apriori method.
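A compact sketch of the level-wise Apriori search applied to the five transactions of Question 8, intended only as a way to check a hand-worked answer. It assumes min sup = 60% of 5 transactions means a support count of at least 3; the function names `support_count` and `apriori` are illustrative.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Milk", "Onion", "Nuts", "Kiwi", "Egg", "Yoghurt"},   # T100
    {"Dhal", "Onion", "Nuts", "Kiwi", "Egg", "Yoghurt"},   # T200
    {"Milk", "Apple", "Kiwi", "Egg"},                      # T300
    {"Milk", "Curd", "Kiwi", "Yoghurt"},                   # T400
    {"Curd", "Onion", "Kiwi", "Ice cream", "Egg"},         # T500
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_count):
    """Return {frozenset: support count} for all frequent itemsets."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        counted = {c: support_count(c, transactions) for c in current}
        level = {c: n for c, n in counted.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (monotonicity): every (k-1)-subset must be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


frequent = apriori(TRANSACTIONS, min_count=3)  # 60% of 5 transactions
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

For this data the search stops after size 3: there are five frequent single items, five frequent pairs and one frequent triple, {Egg, Kiwi, Onion}.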

9 Discuss the various steps of the PROCLUS clustering algorithm and its significance. Understand BTL 2

10 Write short notes on: Remember BTL 1

i. Simple Randomized Algorithm (4)

ii. SON Algorithm (6)

iii. Toivonen’s Algorithm (6)

11 Illustrate how you would describe the various steps of the CLIQUE clustering algorithm and its significance. Apply BTL 3

12 What approach would you use to handle large datasets in main memory? Remember BTL 1

13 Explain the Apriori algorithm and, with an example, show how association rules are generated from frequent itemsets. Analyze BTL 4

14 Evaluate the market basket data and its use in main memory. Evaluate BTL 5

UNIT V

PART – A

Q.No. Question Competence Level

1 What is CAP theorem? State its significances. Remember BTL 1

2 Describe a relational database. Understand BTL 2

3 What are the components of the Hadoop framework? Evaluate BTL 5

4 Explain how you can manage compute node failures. Analyze BTL 4

5 What is the advantage of MapR? Remember BTL 1

6 Give the applications of IDA. Understand BTL 2

7 What is Hadoop Distributed File System? Remember BTL 1

8 Show the advantage of visual data exploration. Apply BTL 3

9 Who is generating big data and what are the ecosystem projects used for processing? Create BTL 6

10 Illustrate dimensional stacking. Apply BTL 3

11 List the data types to be visualized. Remember BTL 1

12 How does Map-Reduce computation execute? Understand BTL 2

13 What is NoSQL? Remember BTL 1

14 Illustrate Reduce function. Apply BTL 3

15 Classify visualization techniques. Analyze BTL 4

16 Discuss the features of Hive. Understand BTL 2

17 Classify interaction techniques. Analyze BTL 4

18 What is hive in Big Data? Remember BTL 1

19 Judge why the partitions are shuffled in map reduce. Evaluate BTL 5

20 How will you formulate Hadoop development? Create BTL 6

PART-B

Q.No. Question Competence Level

1 i. Highlight the features of Hadoop and explain the functionalities of a Hadoop cluster. (8) Remember BTL 1

ii. Describe briefly Hadoop input and output and write a note on data integrity. (8)

2 i. Illustrate in detail Hive data manipulation, queries, data definition and data types. (8) Apply BTL 3

ii. Write a brief note on composing map reduce calculations. (8)

3 Describe the system architecture and components of Hive and Hadoop. Remember BTL 1

4 Explain briefly: Analyze BTL 4

i. MapR (6) ii. Sharding (6) iii. S3 (4)

5 Consider a collection of literature survey made by a researcher in the form of a text document with respect to cloud and big data analytics. Using Hadoop and Map Reduce, write a program to count the occurrence of predominant keywords. Create BTL 6
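The map, shuffle and reduce stages that Question 5 asks for can be outlined in plain Python as follows. This is only an in-process simulation of the data flow, not a program that uses the Hadoop API; the keyword set `KEYWORDS` and all function names are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical keyword list; a real answer would choose its own.
KEYWORDS = {"cloud", "hadoop", "mapreduce", "analytics"}


def map_phase(line):
    """Map: emit (keyword, 1) for each keyword occurrence in one input line."""
    for word in re.findall(r"[a-z0-9]+", line.lower()):
        if word in KEYWORDS:
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one keyword."""
    return key, sum(values)


def keyword_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


sample = [
    "Cloud analytics on Hadoop.",
    "Hadoop and cloud storage; MapReduce jobs on Hadoop",
]
print(keyword_count(sample))
```

In an actual Hadoop job the shuffle is performed by the framework between the mapper and reducer tasks; it is written out here only to make the grouping by key visible.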

6 i. Describe the Map Reduce framework in detail. Draw the architectural diagram for the physical organization of compute nodes. (8) Remember BTL 1

ii. Define HDFS. Explain HDFS in detail. (8)

7 i. Explain in what ways the data type can be visualized. (8) Apply BTL 3

ii. Explain the classification of Interaction techniques. (8)

8 Summarize briefly: Understand BTL 2

i. Algorithms using MapReduce. (8)

ii. Extensions to MapReduce. (8)

9 Write short notes on: Remember BTL 1

i. NoSQL Databases and its types. (8)

ii. Visualization for Big Data. (8)

10 Can you discuss the Diversity of IDA Applications? Understand BTL 2

11 Compare and Contrast the Hadoop and MapR. Analyze BTL 4

12 Explain the complexity theory for Map-Reduce. What is reducer size and replication rate? Analyze BTL 4

13 Describe in detail about the issues in the development of IDA. Understand BTL 2

14 State the significance of MapReduce and discuss the Hadoop distributed file system architecture with a neat diagram. Evaluate BTL 5

