You are on page 1of 9

www.padeepz.

net

SUBJECT : Big Data Analytics

SEM / YEAR: VI Sem/ III Year

UNIT I INTRODUCTION TO BIG DATA


Evolution of Big data - Best Practices for Big data Analytics - Big data characteristics - Validating - The
Promotion of the Value of Big Data - Big Data Use Cases- Characteristics of Big Data Applications -
Perception and Quantification of Value -Understanding Big Data Storage - A General Overview of High-
Performance Architecture - HDFS - Map Reduce and YARN - Map Reduce Programming Model
PART – A
BT
Q.No. Questions Competence
Level
1 What are the challenges of conventional system BTL1 Remembering
2 List the main characteristics of Big Data. BTL1 Remembering

3 Why you need to tame big data? BTL1 Remembering


4 Summarize the data types for Big data. BTL2 Understanding
5 What are the various dimensions of growth of big data? BTL2 Understanding
6 Discuss the types of data analytics BTL2 Understanding
7 Outline the need for a distributed file system BTL1 Remembering
8 What is the need of Map Reduce model? BTL1 Remembering
9 Discuss the map- reduce strength and weakness. BTL2 Understanding
10 What is the advantage of MaPR BTL1 Remembering
11 Define HDFS BTL1 Remembering
12 Describe how Map-Reduce computation executes? BTL2 Understanding
13 Illustrate Reduce function BTL3 Applying
14 Why is HDFS preferred to RDBMS? BTL4 Analyzing
15 Explain how can you manage compute node failures? BTL4 Analyzing
16 Classify the components of Hadoop framework BTL5 Evaluating

17 Judge why the partitions are shuffled in map reduce? BTL5 Evaluating
18 How will you formulate Hadoop development BTL6 Creating
19 Who is generating big data and what are the ecosystem projects used
BTL6 Creating
for processing
20 Why BTL6 Creating
c o does
nv one
en choose
tio naanalyt ica
l system? ls yst e m o ve r
e p z .n e t w w w .p ad e
www.padeepz.net
PART – B
List the main characteristics of big data architecture with a neat BTL1 Remembering
1
schematic diagram.(13)
2 Explain in detail about the challenges of conventional system(13) BTL4 Analyzing
3 How would you show your understanding of the tools, trends and BTL3 Applying
technology in big data?(13)
4 i. What are the best practices in Big Data analytics? (6) BTL4 Analyzing
ii. Explain the techniques used in Big Data Analytics. (7)
5 i. What make a great analysis? State reason with example.(6) BTL3 Applying
ii. Examine in detail the trends and technology in big data (7)
i. Describe in detail about 5v‟s? (6)
6 ii. Illustrate Map-Reduce with example.(7) BTL1 Remembering
Discuss the use of Big Data Analytics in Business with suitable real
7 world example.(13) BTL2 Understanding
i. Generalize the list of tools related to Hadoop.(6) BTL 6 Creating
8 ii. How does Hadoop work?(7)
9 Discuss the following features of Apache Hadoop in detail with
BTL2 Understanding
diagram as necessary.(13)
10 Summarize briefly on BTL2 Understanding
i. Feature of MapR distribution(7)
ii. Explain the architecture for MapR.(6)
11 Explain the complexity theory for Map-Reduce? What is reducer size
BTL4 Analyzing
and replication rate (13)
12 Explain the significances Hadoop distributed file systems and its
BTL5 Evaluating
application (13)
13 i. Describe Map Reduce framework in detail. Draw the architectural
diagram for physical organization of compute nodes(7) BTL1 Remembering
ii.Define HDFS. Explain HDFS in detail(6)
14 i. Highlight the features of Hadoop and explain the functionalities of
Hadoop cluster?(7)
BTL1 Remembering
ii. Describe briefly about Hadoop input and output and write a note on
data integrity?(6)
PART – C
Generalize how the data flow takes places in MapReduce framework BTL6 Creating
1 (15)
State the significances of MapReduce and discuss about Hadoop
2 BTL5 Evaluating
distributed file system architecture with neat diagram (15)
Consider a collection of literature survey made by a researcher in the
form of a text document with respect to cloud and big data analytics.
3 Using Hadoop and Map Reduce , write a program to count the BTL5 Evaluating
occurrence of pre dominant key words (15)
4 Examine the NameNode recovery process. What will happen with a
NameNode that doesn‟t have any data? (15) BTL6 Creating
UNIT II CLUSTERING AND CLASSIFICATION
Advanced Analytical Theory and Methods: Overview of Clustering - K-means - Use Cases - Overview of the Method -
Determining the Number of Clusters - Diagnostics - Reasons to Choose and Cautions .- Classification: Decision Trees -
Overview of a Decision Tree - The General Algorithm - Decision Tree Algorithms - Evaluating a Decision Tree -
Decision Trees in R - Naïve Bayes - Bay es „T heo r em - N aï ve B ay e s C la ssifier.
w w w .p ad e e p z . n e t
www.padeepz.net

PART – A
Q.No. Question BT Level Competence
1 Define clustering BTL 1 Remembering
2 How can the initial number of clusters for k-means algorithm be
BTL3 Applying
estimated
3 Can you Pick K in a K-Means Algorithm? BTL5 Evaluating
4 What are the problems faced if clustering exists in non-Euclidean BTL2 Understanding
5 Point out the conclusions drawn from choosing clustroid? BTL4 Analyzing
6 Compare and contrast the relationship between centroids and
BTL4 Analyzing
clustering
7 Generalize the initialization of K-Means algorithm? BTL6 Creating
8 Define Bayes Theorem BTL 1 Remembering
9 Discuss the number of clusters? BTL2 Understanding
10 What is diagnotics? BTL 1 Remembering
11 Examine the use of object Attributes?
BTL3 Applying
What is unit of measure? BTL 1 Remembering
12
13 Summarize about Rescaling? BTL2 Understanding
14 What is metoids? BTL2 Understanding
15 What is Customer segmentation BTL 1 Remembering
16 Describe the prediction trees BTL2 Understanding
17 Analyze on internal nodes and leaf nodes BTL4 Analyzing
18 What is purity of node BTL 1 Remembering
Point out the CART BTL4 Analyzing
19
Illustrate Naïve Bayes
20 BTL3 Applying
PART – B
1 Explain the K-means clustering algorithm with an example.(13) BTL2 Understanding
i. Define K-Means algorithm(6)
2 BTL1 Remembering
ii. Examine how the data is processed in BFR Algorithm(7)
What are the main features of GRGPF Algorithm and explain it?(13) BTL1 Remembering
3
Summarize the hierarchical clustering in Euclidean and non-Euclidean
4 BTL2 Understanding
Spaces with its efficiency?(13)
5 Describe the various hierarchical methods of cluster analysis. (13) BTL2 Understanding
Explain and list the different hierarchical clustering techniques and
6 BTL4 Analyzing
explain any one. (13)
i. Given a one dimensional dataset {1, 5, 8, 10, 2} use the
7 BTL6 Creating
agglomerative clustering algorithms with the complete link with
Euclidean distance to westwabwlish.pa
ahiderearcehpicazl .gnroeupting
www.padeepz.net
relationship. By using the maximal lifetime as the cutting
threshold, how many clusters are there? What is their
membership in each cluster? (6)
ii. State and explain the clustering in non-Euclidean space with
example. (7) BTL2 Understanding
8 Describe about Market-Basket model.(13) BTL1 Remembering
9 Demonstrate about the two clustering techniques with suitable
BTL3 Applying
example.(13)
Illustrate about the clustering? Explain it with proper example.(13)
10 BTL1 Remembering
11 Explain in detail about the applications of clustering. (13)
BTL4 Analyzing
Generalize about general algorithm and decision tree algorithm. (13)
12 BTL 6 Creating
Illustrate in detail about Decision tree in R.(13)
13 BTL3 Applying
Explain in detail about Naïve Bayes Theorem, Classifier, Smoothing
14 BTL4 Analyzing
and Diagnostics. (13)
PART – C
Use the k-means algorithm and Euclidean distance to cluster the BTL5 Evaluating
1
following 8 examples into 3
clusters:A1=(2,10),A2=(2,5),A3=(8,4),A4=(5,8),A5=(7,5),A6=(6,4),A
7=(1,2),A8=(4,9).Suppose that the initial seeds(centers of each cluster)
are A1,A4 and A7.Run the k-means algorithm for 1 epoch only. At the
end of this epoch show
(i) The new clusters (5)
(ii) The centers of the new clusters (6)
(iii) How many more iterations are needed to coverage? Draw
the result for each epoch. (4)
2 Develop decision tree with an example to predict whether customers BTL 6 Creating
will buy a product(15)
3 Explain in detail about evaluate the decision tree algorithm(15) BTL5 Evaluating
Explain in detail about two methods of using the naïve Bayes classifier BTL 6 Creating
4 in R with examples.(15)

UNIT III ASSOCIATION AND RECOMMENDATION SYSTEM

Advanced Analytical Theory and Methods: Association Rules - Overview - Apriori Algorithm - Evaluation
of Candidate Rules - Applications of Association Rules - Finding Association& finding similarity -
Recommendation System: Collaborative Recommendation- Content Based Recommendation - Knowledge
Based Recommendation- Hybrid Recommendation Approaches.
PART – A
BT
Q.No. Question Competence
Level
1 Define apriori algorithm BTL1 Remembering
2 State the use of Apriori algorithm in data mining BTL1 Remembering
3 State market basket analysis BTL1 Remembering

www.padeepz.net
www.padeepz.net
4 What is the logic behind association rule? BTL2 Understanding
5 What is Prune BTL2 Understanding
6 Define Confidence BTL1 Remembering
7 What do you mean by lift BTL2 Understanding
8 Show the advantage of leverage BTL3 Applying
9 What is frequent itemset generation BTL1 Remembering
10 Analyze the Validation and testing
BTL4 Analyzing
11 Demonstrate the approaches to improve Apriori efficiency
BTL3 Applying
12 How are interesting rules identified? BTL4 Analyzing
13 Examine the broad classification of Recommendation systems? BTL3 Applying
14 Summarize the interesting rules distinguished from coincidental rules? BTL 5 Evaluating
What is utility matrix
15 BTL4 Analyzing
16 Examine about item profile BTL1 Remembering
17 Give the Content Based recommendation system BTL 5 Evaluating

18 Explain collaborative filtering system BTL4 Analyzing


Define knowledge based Recommendation BTL2 Understanding
19
Give the definition Hybrid recommendation BTL2 Understanding
20
PART – B
1 Explain the apriori algorithm for mining frequent item sets with an BTL1 Remembering
example. (13)
2 Illustrate how will you find Association Rules with High confidence(13) BTL3 Applying
Evaluate Apriori Algorithm in detail for the following super market
Scenarios(13)
3

Transacti Onion Potato Burger Milk Gee


on ID BTL5 Evaluating
T1 1 1 1 0 0
T2 0 1 1 1 0
T3 0 0 0 1 1
T4 1 1 0 1 0
T5 1 1 1 0 1
T6 1 1 1 1 1
4 Describe the Recommendation systems? Clearly explain the two BTL1 Remembering
applications for Reccomdation systems.(13)
5 Discuss in detail about any one Ranking algorithm used by Search BTL2 Understanding
Engines?(13)
6 Explain Recommendation based on User Ratings using appropriate BTL1 Remembering
example.(13)
7 Differentiate collaborative fil ter ing an d c o nte n t BTL2 Understanding
w w w .pad ee
b as ed sy st ems. (13)
p z .n e t
www.padeepz.net
8 Explain collaborative filtering based recommendation system. (13)
9 Differentiate between lexical similarity and semantic similarity of BTL2 Understanding
documents.(13)
10 Explain in detail about Frequent item set generation and rule generation. BTL1 Remembering
(13)
Explain in detail about evaluation of candidate rule. (13) BTL1 Remembering
11
12 Outline in detail about the application of association rule. (13) BTL3 Applying

13 Generalize in detail about utility matrix and long tail. (13) BTL 6 Creating
Explain in detail about discovering features of documents. (13) BTL3 Applying
14
PART – C
A database has five transactions. Let min sup = 60% and min conf=80%
1
TID ITEMS
T100 Milk, Onion, Nuts, Kiwi, Egg, Yoghurt
T200 Dhal, Onion, Nuts, Kiwi, Egg, Yoghurt
T300 Milk, Apple, Kiwi, Egg BTL6 Creating
T400 Milk, Curd, Kiwi,Yoghurt
T500 Curd, Onion, Kiwi, Ice cream,Egg
Find all frequent item sets using Apriori method(15)
Narrate in detail about a model for Recommendation system.(15)
2 BTL 5 Evaluating
3 Illustrates with an example the application of the Apriori algorithm to a
relatively simple case that generalizes to those used in practice. Show how to
use the Apriori algorithm to generate frequent item sets and rules and to BTL6 Creating
evaluate and visualize the rules.(15)
Explain in detail about Hybrid and Knowledge based
4 recommendation.(15) BTL 4 Analyzing

UNIT IV STREAM MEMORY


Introduction to Streams Concepts – Stream Data Model and Architecture - Stream Computing, Sampling Data
in a Stream – Filtering Streams – Counting Distinct Elements in a Stream – Estimating moments – Counting
oneness in a Window – Decaying Window – Real time Analytics Platform(RTAP) applications - Case Studies -
Real Time Sentiment Analysis, Stock Market Predictions. Using Graph Analytics for Big Data: Graph
Analytics
PART – A
Q.No. Question BT Competence
1 Differentiate between data stream mining and traditional data mining BTL4 Analyzing
2 Illustrate examples can you find for stream sources? BTL3 Applying
3 How are moments estimated? BTL2 Understanding
4 List out the applications of data stream. BTL4 Analyzing
5 Compute the surprise number (second moment) for the stream 3, 1, 4, 1, BTL1 Remembering
3, 4, 2, 1, 2. What is the third moment of this stream?
6 Define decaying window BTL1 Remembering
7 Outline the need for sampling data in a stream BTL5 Evaluating
8 Analyze the term filtering a data stream. BTL3 Applying
9 What is real time analysis? www.padeepz.net BTL1 Remembering
www.padeepz.net
10 Give the advantages of the algorithm used in estimating moments BTL1 Remembering
11 Why stream data systems? BTL2 Understanding
Why do you think data stream management is relevant in data mining? BTL2 Understanding
12
13 How oneness is counted in window BTL5 Evaluating
14 What would result if the cost of exact counts doesn‟t match? BTL3 Applying
Illustrate how would you show your understanding of Market-Basket
15 BTL3 Applying
Data?
16 What is sentiment analysis? BTL1 Remembering
17 Compare and contrast RTAP (Real Time Analytics Platform) and RTSA BTL5 Evaluating
(Real Time Sentiment Analysis)?
18 Prove by induction on m that 1+3+5+· · · +(2m−1) = m2 BTL4 Analyzing
19 List any 4 online tool to perform sentiment analysis BTL1 Remembering
20 Generalize information would you use to substitute the view of streams
BTL6 Creating
over databases?
PART – B
What are streams? Explain stream data model with its architecture.(13) BTL1 Remembering
1
2 What is decaying window? briefly explain it with an example (13) BTL1 Remembering
i. Write a short note on sampling in Data Streams.(7) BTL2 Understanding
3
ii. What are the applications of data stream.(6)
4 i. List some common online tools used to perform sentiment BTL1 Remembering
analysis.(6)
ii. What do you understand by sentiment analysis?(7)
5 Explain any one algorithm to count number of distinct elements in a BTL2 Understanding
data stream. (13)
Describe about Stream clustering and parallel clustering. (13) BTL1 Remembering
6
7 Discuss in detail about characteristics of a social network as a graph. BTL3 Applying
(13)
8 i. With a neat sketch, explain the architecture of data-stream BTL6 Creating
management system.(6)
ii. Outline the algorithm used for counting distinct elements in a data
stream.(7)
9 Discuss the concept of decaying window in detail.(13) BTL3 Applying
i. Explain in detail about how data analysis used in Stock BTL2 Understanding
10
Market Predictions(7)
ii. Describe in detail about the usage of data analysis in
Weather forecasting predictions. (6)
11 Explain the concept of Bloom Filter with an example.(13) BTL4 Analyzing
Show how the mining concept used in real time sentiment analysis? (13) BTL3 Applying
12

How is sentiment analysis playing a major role in data mining? (13) BTL4 Analyzing
13

www.padeepz.net
www.padeepz.net
Explain in detail about Alon-Matias-Szegedy algorithm for second
14 BTL3 Applying
moments. (13)
PART – C
How does the Big Data Stream Analytics Framework (BDSAF)works
1 and explain with a neat architecture diagram (15) BTL6 Creating
Taking stock market preconditions as a case study, elaborate on the
2 Real-time Analytics Platform (RTAP). Present the assumptions mode. BTL6 Creating
(15)

3 i. Classify approaches would you use to estimate the moments?(7)


ii. Examine is the function cost of exact counts?(8) BTL5 Evaluating
4 Can you identify the phases involved in real time data analytics-
deployment to production? Analyze. (15) BTL5 Evaluating

UNIT V NOSQL DATA MANAGEMENT FOR BIG DATA AND


VISUALIZATION
NoSQL Databases : Schema-less Models‖: Increasing Flexibility for Data Manipulation-Key Value Stores-
Document Stores - Tabular Stores - Object Data Stores - Graph Databases Hive - Sharding –- Hbase –
Analysing big data with twitter - Big data for E-Commerce Big data for blogs - Review of Basic Data
Analytic Methods using R.
PART – A
Q.No. Question BT Competence
1 What is NoSQL database BTL1 Remembering
2 What is Key Value data store? BTL1 Remembering
3 Compare document store vs Key value store BTL1 Remembering
4 Provide your own definition of what big data means to your BTL1 Remembering
organization?
5 Outline the sharding? BTL 5 Creating
6 Identify three “big data” sources either within or external to your BTL2 Understanding
organization that would be relevant to your business
7 Define Tabular store BTL2 Understanding
8 Summarize the features of Hive. BTL4 Analyzing
9 What is Hive in Big data? BTL4 Analyzing
10 List out any three business challenges in an organization BTL3 Applying
11 Point out the aspects of adopting big data techniques BTL3 Applying
12 Figure out the process of validating big data BTL3 Applying
13 Name the dimensions for measuring the quality of information used for BTL4 Analyzing
big data analytics
14 Define object data stores. BTL1 Remembering
15 Justify how twitter data is useful for analyzing big data. BTL2 Understanding
16 Point out top data analytic tools. BTL5 Evaluating
17 Define R. BTL1 Remembering
18 What is a Graph database? BTL1 Remembering
19 Enumerate the term Graph Analytics. BTL5 Evaluating

www.padeepz.net
www.padeepz.net
Describe a pilot application for graph analytics BTL4 Analyzing
20

PART – B
List the classification of NoSQL Databases and explain about Key BTL3 Applying
1 Value Stores. (13)
2 Describe the system architecture and components of Hive and BTL1 Remembering
Hadoop(13)
3 What is NoSQL? What are the advantages of NoSQL? Explain the types
of NoSQL databases. (13) BTL1 Remembering
4 Explain about Graph databases and descriptive Statistics(13) BTL1 Remembering
5 Write short notes on BTL3 Applying
i. NoSQL Databases and its types(7)
ii. Illustrate in detail about Hive data manipulation, queries, data
definition and data types(6)
With suitable examples differentiate the applications, structure, BTL2 Understanding
6
working and usage of different NoSQL databases (13)
7 Differences between SQL Vs NoSQL explain it with suitable example. BTL2 Understanding
(13)
i. Explain Hive Architecture(6) BTL2 Understanding
8 ii. Write down the features of Hive(7)
9 Explain the types of NoSQL data stores in detail. (13) BTL3 Applying
Discuss in detail about the characteristics of NoSQL databases(13)
10 BTL4 Analyzing
Explain in detail about Market and Business drives for Big data BTL1 Remembering
12
Analytics. (13)
What is HBase? Give detailed note on features of HBASE(13) BTL4 Analyzing
13
i. What is the purpose of sharding? (6)
14 BTL6 Creating
ii. Explain the process of sharding in MongoDB. (7)
PART – C
1 Analyze the use of Hive. How does Hive interact with Hadoop explain
in detail.(15) BTL4 Analyzing
Formulate how big data analytics helps business people to increase
2 BTL5 Evaluating
their revenue. Discuss with any one real time application.(15)
3 Draw insights out of any one visualization tool.(15)
BTL5 Evaluating
Explain in detail about brief history of NoSQL. Explain in detail about
4 ACID vs. BASE.(15) BTL 6 Creating

www.padeepz.net

You might also like