
A 1000ITT401122201 Pages: 2

Reg No.:_______________ Name:__________________________


APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY
Seventh Semester B.Tech Degree Examination December 2022 (2019 scheme)

Course Code: ITT401


Course Name: DATA ANALYTICS
Max. Marks: 100 Duration: 3 Hours
PART A
Answer all questions; each carries 3 marks. Marks
1 Explain four general categories of analytics that are distinguished by the results
they produce. (3)
2 Explain Gartner’s definitions of 3Vs in big data. (3)
3 Identify the big data issues and challenges that affect analytics. (3)
4 Define NoSQL. Explain key-value data stores. (3)
5 Differentiate between RDBMS and HBase. (3)
6 Explain the concept of the MapReduce framework. (3)
7 What is the significance of the functions gather() and spread() in tidying data?
Illustrate with an example. (3)
8 Write an R function to check whether a given number is prime. (3)
9 Mention the tools used in social media analytics. (3)
10 Describe the steps involved in the churn prediction process. (3)
PART B
Answer any one full question from each module; each carries 14 marks.
Module I
11 a) Illustrate the concept of re-sampling. Explain the different re-sampling
techniques using a suitable example. (9)
b) The following are the details of online transaction data collected for a prediction
model.
Number of Observations: 1 million; Fraud: 100; Non-Fraud: 999,900
Suppose you created a model that predicted 95% of the transactions as Non-Fraud,
and all the predictions for Non-Fraud turn out to be accurate. Construct a
confusion matrix for the data and compute the accuracy, precision and recall for
the data. (5)
OR

12 a) With a diagram, explain the various phases of the Data Analytics Lifecycle. (14)
Module II
13 a) Explain how cloud computing is related to big data. (4)
b) Identify the various phases involved in big data acquisition and explain the
functionalities of each phase. (10)
OR
14 a) Illustrate the functionalities of five popular data analytics tools and identify their
application areas. (7)
b) Explain how MongoDB can be applied to create, update, and delete documents. (7)
Module III
15 a) Illustrate the anatomy of a YARN application with necessary diagram. (10)
b) Explain the benefits and features of Apache Pig. (4)
OR
16 a) Draw the HDFS architecture and describe the HDFS framework and interface. (8)
b) Illustrate the architecture of HIVE using suitable diagram. (6)
Module IV
17 a) Explain Exploratory Data Analysis and its characteristics. (4)
b) With a suitable example, describe the five commonly used key functions of ‘dplyr’. (10)
OR
18 a) List the various data structures in R. Illustrate each type with an example. (8)
b) Write R code for the following with ggplot2, using the diamonds data set: (6)
i) Create a histogram of "carat" with a border colour and a fill colour.
Set the bin width of the histogram to 0.01.
ii) Make a scatterplot of carat vs price and facet it by clarity.
iii) Show carat vs cut using a violin plot and a boxplot.
Module V
19 a) Describe the five main techniques used in recommender systems. Also specify
the advantages and disadvantages of each technique. (14)

OR
20 a) Analyze Facebook data to do a case study on citizen-centric public services. (7)
b) Illustrate uplift modelling with an appropriate example. (7)
****
