Unit IV DABS

School of Computer Applications
Degree / Branch: M.C.A / Computer Applications

Semester: II / Year: I
Course Code & Title: 23CAP202 - DATA ANALYTICS FOR BUSINESS SOLUTIONS
UNIT IV
ANALYTICS USING HADOOP AND MAPREDUCE FRAMEWORK
PART A
1. Give the definition of Hadoop. CO4 K2
2. What are the differences between a regular file system and HDFS? CO4 K2
3. Why is HDFS fault-tolerant? CO4 K1
4. Find the two types of metadata that a NameNode server holds. CO4 K1
5. If you have an input file of 350 MB, how many input splits would HDFS create, and what would be the size of each input split? CO4 K2
6. Show the key advantages of Hadoop. CO4 K2
7. How can you set the mappers and reducers for a MapReduce job? CO4 K1
8. Can we have more than one ResourceManager in a YARN-based cluster? CO4 K2
9. In a cluster of 10 DataNodes, each having 16 GB RAM and 10 cores, what would be the total processing capacity of the cluster? CO4 K2
10. List the components used in Hive query processors. CO4 K1
11. Write a query to insert a new column (new_col INT) into a Hive table (h_table) at a position before an existing column (x_col). CO4 K2
12. List the core concepts of Hadoop. CO4 K2
13. Write the code needed to open a connection in HBase. CO4 K2
14. Define compaction in HBase. CO4 K1
15. Sketch the architecture of Hadoop HBase. CO4 K1
16. List the components used in Hadoop. CO4 K2
17. List the different vendor-specific distributions of Hadoop. CO4 K1
18. When do you use the dfsadmin -refreshNodes and rmadmin -refreshNodes commands? CO4 K2
19. What are the key components of HBase? CO4 K1
20. Define how a MapReduce computation is executed. CO4 K2
21. Point out the meaning of the term “Hadoop YARN”. CO4 K2
22. Define MapReduce concepts. CO4 K1
23. How is a key-value pair formed? CO4 K1
24. Develop the importance of DFS. CO4 K2
25. Differentiate between Hadoop and MapReduce. CO4 K2
26. Point out the characteristics of Hadoop. CO4 K1
27. Distinguish between Hadoop and big data. CO4 K2
28. List the advantages of MapR. CO4 K1
29. Classify the classical components of a computer. CO4 K1
30. Give the importance of the shuffle and sort algorithm. CO4 K2
31. Show the goals of HDFS. CO4 K1
32. What are the applications of Hadoop? CO4 K1
33. Classify the types of big data. CO4 K2
34. Define how the partitions are shuffled in MapReduce. CO4 K1
35. List the steps in the MapReduce algorithm. CO4 K1
36. Generalize matrix-vector multiplication. CO4 K1
37. Differentiate between mapper and reducer. CO4 K2
38. Write the formula for MapReduce. CO4 K1
39. How do you multiply two 2x2 matrices? CO4 K1
40. What is matrix multiplication in analysis of algorithms? CO4 K1
41. Infer the flow of data in MapReduce. CO4 K1
42. Relate shuffling and sorting. CO4 K2
43. What is the fastest algorithm for matrix multiplication? CO4 K1
44. Show the properties used in matrix multiplication. CO4 K2
45. Compare selection algorithm with selection operation. CO4 K2
46. What are the 4 types of aggregation? CO4 K2
47. What is grouping and aggregation in SQL? CO4 K1
48. How do you calculate aggregates? CO4 K1
49. Write a query to display the number of people with the same job. CO4 K2
JOB        COUNT(*)
Analyst    2
Clerk      4
Manager    3
President  1
50. Display the manager number and the salary of the lowest-paid employee for that manager. Include anyone where the manager is not known. Include any groups where the id is not salary is less than. Sort the output in descending order of salary. CO4 K2
MGR   MIN(SAL)
7566  3000
7839  2450
7782  1300
7788  1100
51. List the uses of data aggregation. CO4 K2
52. Define the command-line interface for HDFS files. CO4 K1
53. List the metadata with an example. CO4 K1
54. What are the key elements used in metadata? CO4 K1
55. How will you find the statistical analysis in aggregation? CO4 K1
56. Summarize the features of Spark used in MapReduce. CO4 K2
57. Discuss map, flatMap, and filter. CO4 K2
58. Define Spark implementation. CO4 K1
59. What do you mean by the resilience of RDDs? CO4 K1
60. Illustrate the concept of TensorFlow. CO4 K2
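
Worked sketch for question 5 in Part A. The arithmetic depends on the configured split size, which the question does not state. The short Python sketch below assumes a 128 MB split size (the default HDFS block size in Hadoop 2.x and later; Hadoop 1.x used 64 MB). The function name `input_splits` is hypothetical and used only for illustration.

```python
def input_splits(file_mb, split_mb=128):
    """Return the sizes (in MB) of the input splits for one file.

    split_mb=128 is an assumption, not something the question states:
    it matches the default HDFS block size in Hadoop 2.x and later.
    """
    full, rest = divmod(file_mb, split_mb)
    # Full-size splits first, then one smaller split for any remainder.
    return [split_mb] * full + ([rest] if rest else [])


print(input_splits(350))      # [128, 128, 94]
print(input_splits(350, 64))  # [64, 64, 64, 64, 64, 30]
```

Under the 128 MB assumption the 350 MB file yields three splits: two of 128 MB and one of 94 MB.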
PART B
1. i. What is the Hadoop ecosystem? Discuss the various components of the Hadoop ecosystem. 10 CO4 K1
ii. Discuss the role of the JobTracker and TaskTracker in processing data with Hadoop. 6 CO4 K2
2. i. Infer job scheduling in MapReduce. How is it done in the case of (i) the Fair Scheduler and (ii) the Capacity Scheduler? 8 CO4 K1
ii. Explain the architecture of HDFS. 8 CO4 K1
3. i. With a neat sketch, explain in detail the Hadoop architecture and its components. 10 CO4 K2
ii. Enumerate simple linear regression with an example. 6 CO4 K2
4. i. How can data visualization techniques and tools be used for business? 8 CO4 K1
ii. Explain the YARN infrastructure and architecture. 8 CO4 K1
5. i. Express the various core components of Hadoop. 8 CO4 K2
ii. Briefly describe Hadoop input and output, and write a note on data integrity. 8 CO4 K1
6. i. Explain job scheduling in Hadoop and the types of schedulers. 6 CO4 K1
ii. How do IBM InfoSphere BigInsights and Streams deliver enterprise Hadoop capabilities with easy-to-use analytic tools and visualization? 10 CO4 K3
7. i. Explain partitions in Hive, along with their advantages and limitations. 10 CO4 K2
ii. Infer bucketing and views in Hive. 6 CO4 K2
8. i. Explain joins in Hive with a neat diagram. 6 CO4 K1
ii. Illustrate the Pig architecture for script data flow and processing. 10 CO4 K3
9. i. With a neat diagram, explain the Pig architecture for script data flow and processing. 8 CO4 K2
ii. Demonstrate the algorithms using MapReduce. 8 CO4 K1
10. i. Analyze the steps of MapReduce algorithms. 8 CO4 K2
ii. Consider a collection of literature surveys made by a researcher, in the form of a text document on cloud and big data analytics. Using Hadoop and MapReduce, write a program to count the occurrences of predominant keywords. 8 CO4 K3
11. i. Create a MapReduce algorithm to get the dot product of two large vectors. Assume that only the non-zero elements of the vectors are given in the input files and that the output file should show only non-zero entries (the two vectors are the same size). 6 CO4 K3
Example: v1 = [5 4 0 1 2], v2 = [4 2 1 0 6]
file1   file2   output
(0,5)   (0,4)   (0,20)
(1,4)   (1,2)   (1,8)
(3,1)   (2,1)   (4,12)
(4,2)   (4,6)
ii. Perform analysis on a web server report. 10 CO4 K3
Sample data:
teleman.pr.mcs.net,-,-,[01/Jul/2005:00:03:57,0400],"GET,/images/KSC-logosmall.gif,HTTP/1.0", 304,0
teleman.pr.mcs.net,-,-,[01/Jul/2005:00:03:57,0400],"GET,/images/KSClogosmall.gif,HTTP/1.0",304,0
The data is comma-separated. It consists of the user IP address, the time at which the request was received, the time zone, request type, requested link, request details, response code, and bytes transferred. Such datasets are usually very large, so running queries by conventional methods is not possible. Hence, use Pig programming on this dataset to retrieve the statistics that help us understand the load and usage of the server, user visit frequency, webpage popularity, and the total bytes transferred.
12. i. Formulate an HBase table from the following data. 10 CO4 K3
Data_file.txt contains the data below:
1,India,Bihar,Champaran,2009,April,P1,1,5
2,India,Bihar,Patna,2009,May,P1,2,10
3,India,Bihar,Bhagalpur,2010,June,P2,3,15
4,United States,California,Fresno,2009,April,P2,2,5
5,United States
ii. Demonstrate HBase and HBase clients in detail. 6 CO4 K1
13. i. Describe how Cassandra is integrated with Hadoop, and also the tools related to Hadoop. 8 CO4 K2
ii. Recommend a procedure to find the number of occurrences of a word in a document using Hive. 8 CO4 K2
14. i. How will you order the use of Hive? How does Hive interact with Hadoop? Explain in detail. 8 CO4 K1
ii. Analyze in detail Hive data manipulation, queries, and data types. 8 CO4 K2
15. i. Explain in detail the concepts of: 6 CO4 K1
(i) Conceptual data modeling.
(ii) Logical data modeling.
(iii) Physical data modeling.
ii. Compare and contrast Hadoop and MapR. 10 CO4 K1
16. i. Predict the Pig data model in detail with a neat diagram. 6 CO4 K2
ii. Differentiate between Drill and Spark. 10 CO4 K1
17. i. Estimate the query optimization used in the MapReduce concept. 6 CO4 K1
ii. Illustrate the concept of Apache used in Cloudera. 8 CO4 K2
18. i. Distinguish between JDBC driver and ODBC driver. 8 CO4 K2
ii. Predict the data types in MongoDB. 8 CO4 K2
19. i. Explain the Replication Architecture in MongoDB. 10 CO4 K1
ii. Infer the concept of pipeline in the MongoDB aggregation framework. 6 CO4 K1
20. i. Explain the process of Sharding. 6 CO4 K1
ii. Distinguish between MongoDB and RDBMS. 10 CO4 K2
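
Sketch for question 11(i) in Part B. One possible shape of the map, shuffle, and reduce steps is simulated below in plain Python rather than on a Hadoop cluster. It assumes each input file holds at most one (index, value) pair per index. The function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical and chosen only for illustration. The reducer emits the per-index products shown in the question's sample output; summing them gives the scalar dot product.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (index, value) pairs; the vector index acts as the shuffle key."""
    for index, value in records:
        yield index, value


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Multiply values for indices present in both vectors.

    An index with a single value is zero in the other vector, so its
    product is zero and it is left out of the output.
    """
    output = {}
    for index, values in sorted(grouped.items()):
        if len(values) == 2:
            output[index] = values[0] * values[1]
    return output


# Non-zero entries of v1 = [5 4 0 1 2] and v2 = [4 2 1 0 6] from the question.
file1 = [(0, 5), (1, 4), (3, 1), (4, 2)]
file2 = [(0, 4), (1, 2), (2, 1), (4, 6)]

pairs = list(map_phase(file1)) + list(map_phase(file2))
products = reduce_phase(shuffle(pairs))
print(products)                # {0: 20, 1: 8, 4: 12}
print(sum(products.values()))  # 40
```

In a real job, the final sum would typically be done by a second reduce step or a single reducer keyed on a constant, since the per-index products land on different reducers.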
