Semester: II / Year: I Course Code & Title: 23CAP202 –DATA ANALYTICS FOR BUSINESS SOLUTIONS
UNIT IV
ANALYTICS USING HADOOP AND MAPREDUCE FRAMEWORK
PART A
1. Give the definition of Hadoop CO4 K2
2. What are the differences between a regular file system and HDFS? CO4 K2
3. Why is HDFS fault-tolerant? CO4 K1
4. Find the two types of metadata that a NameNode server holds. CO4 K1
5. If you have an input file of 350 MB, how many input splits would HDFS create and what would be the size of each input split? CO4 K2
6. Show the key advantages of Hadoop. CO4 K2
7. How can you set the mappers and reducers for a MapReduce job? CO4 K1
8. Can we have more than one Resource Manager in a YARN-based cluster? CO4 K2
9. In a cluster of 10 DataNodes, each having 16 GB RAM and 10 cores, what would be the total processing capacity of the cluster? CO4 K2
10. List the components used in the Hive query processor. CO4 K1
11. Write a query to insert a new column (new_col INT) into a Hive table (h_table) at a position before an existing column (x_col). CO4 K2
12. List the core concepts of Hadoop. CO4 K2
13. Write the code needed to open a connection in HBase. CO4 K2
14. Define compaction in HBase. CO4 K1
15. Sketch the architecture of Hadoop HBase. CO4 K1
16. List the components used in Hadoop. CO4 K2
17. List the different vendor-specific distributions of Hadoop. CO4 K1
18. When do you use the dfsadmin -refreshNodes and rmadmin -refreshNodes commands? CO4 K2
19. What are the key components of HBase? CO4 K1
20. Define how a MapReduce computation is executed. CO4 K2
21. Point out the meaning of the term "Hadoop YARN". CO4 K2
22. Define MapReduce concepts. CO4 K1
23. How is a key-value pair formed? CO4 K1
24. Develop the importance of DFS. CO4 K2
25. Differentiate between Hadoop and MapReduce. CO4 K2
26. Point out the characteristics of Hadoop. CO4 K1
27. Distinguish between Hadoop and Big Data. CO4 K2
28. List the advantages of MapR. CO4 K1
29. Classify the classical components of a computer. CO4 K1
30. Give the importance of the shuffle and sort algorithm. CO4 K2
31. Show the goals of HDFS. CO4 K1
32. What are the applications of Hadoop? CO4 K1
33. Classify the types of big data. CO4 K2
34. Define how the partitions are shuffled in MapReduce. CO4 K1
35. List the steps in the MapReduce algorithm. CO4 K1
36. Generalize matrix-vector multiplication. CO4 K1
37. What is the difference between a mapper and a reducer? CO4 K2
38. Write the formula for MapReduce. CO4 K1
39. How do you multiply two 2x2 matrices? CO4 K1
40. What is matrix multiplication in the analysis of algorithms? CO4 K1
41. Infer the flow of data in MapReduce. CO4 K1
42. Relate shuffling and sorting. CO4 K2
43. What is the fastest algorithm for matrix multiplication? CO4 K1
44. Show the properties used in matrix multiplication. CO4 K2
45. Compare the selection algorithm with the selection operation. CO4 K2
46. What are the 4 types of aggregation? CO4 K2
47. What are grouping and aggregation in SQL? CO4 K1
48. How do you calculate aggregates? CO4 K1
49. Write a query to display the number of people with the same job. CO4 K2
    JOB          COUNT(*)
    Analyst      2
    Clerk        4
    Manager      3
    President    1
50. Display the manager number and the salary of the lowest paid employee for that manager. include anyone where the manager is not known. include any groups where the id is not salary is less than. Sort the output in descending order of salary. CO4 K2
    MGR     MIN(SAL)
    7566    3000
    7839    2450
    7782    1300
    7788    1100
51. List out the uses of data aggregation. CO4 K2
52. Define the command-line interface for HDFS files. CO4 K1
53. List the metadata with an example. CO4 K1
54. What are the key elements used in metadata? CO4 K1
55. How will you perform statistical analysis in aggregation? CO4 K1
56. Summarize the features of Spark used in MapReduce. CO4 K2
57. Discuss map, flatMap, and filter. CO4 K2
58. Define Spark implementation. CO4 K1
59. What do you mean by the resilience of RDDs? CO4 K1
60. Illustrate the concept of TensorFlow. CO4 K2
School of Computer Applications
Degree / Branch: M.C.A / Computer Applications
PART B
1. i. What is the Hadoop Ecosystem? Discuss the various components of the Hadoop Ecosystem. 10 CO4 K1
ii. Discuss the role of the Job Tracker and Task Tracker in processing data with Hadoop. 6 CO4 K2
2. i. Infer job scheduling in MapReduce. How is it done in the case of (i) the Fair Scheduler and (ii) the Capacity Scheduler? 8 CO4 K1
ii. Explain the architecture of HDFS. 8 CO4 K1
3. i. With a neat sketch, explain in detail the Hadoop architecture and its components. 10 CO4 K2
ii. Enumerate Simple Linear Regression with an example. 6 CO4 K2
4. i. How do you use data visualization techniques and tools for business? 8 CO4 K1
ii. Explain the YARN infrastructure and architecture. 8 CO4 K1
5. i. Express the various core components of Hadoop. 8 CO4 K2
ii. Give a brief account of Hadoop input and output, and write a note on data integrity. 8 CO4 K1
6. i. Explain job scheduling in Hadoop and explain the types of schedulers. 6 CO4 K1
ii. How do IBM InfoSphere BigInsights and Streams deliver enterprise Hadoop capabilities with easy-to-use analytic tools and visualization? 10 CO4 K3
7. i. Explain partitions in Hive, along with their advantages and limitations. 10 CO4 K2
ii. Infer bucketing and views in Hive. 6 CO4 K2
8. i. Explain joins in Hive with a neat diagram. 6 CO4 K1
ii. Illustrate the Pig architecture for script data flow and processing. 10 CO4 K3
9. i. With a neat diagram, explain the Pig architecture for script data flow and processing. 8 CO4 K2
ii. Demonstrate the algorithms using MapReduce. 8 CO4 K1
10. i. Analyze the steps of MapReduce algorithms. 8 CO4 K2
ii. Consider a collection of literature surveys made by a researcher in the form of a text document with respect to cloud and big data analytics. Using Hadoop and MapReduce, write a program to count the occurrences of predominant keywords. 8 CO4 K3
11. i. Create a MapReduce algorithm to get the dot product of two large vectors. Assume that only the non-zero elements of those vectors are given in the input files and that the output file should show only non-zero entries (assuming the two vectors are the same size). 6 CO4 K3
ex: v1 = [5 4 0 1 2]  v2 = [4 2 1 0 6]
    file1:    file2:    output:
    (0,5)     (0,4)     (0,20)
    (1,4)     (1,2)     (1,8)
    (3,1)     (2,1)     (4,12)
    (4,2)     (4,6)
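A minimal sketch of one way the example above could be worked through, written in plain Python rather than as a Hadoop job. The function names (map_phase, reduce_phase, elementwise_products) are illustrative assumptions and do not belong to any Hadoop API, and the in-memory dictionary stands in for the shuffle step that groups values by key.

```python
from collections import defaultdict


def map_phase(records):
    """Emit each (index, value) pair unchanged; the index acts as the key."""
    for index, value in records:
        yield index, value


def reduce_phase(grouped):
    """Multiply the values that share an index.

    An index that appears in only one file has a zero in the other
    vector, so its product is zero and it is left out of the output.
    """
    products = {}
    for index, values in grouped.items():
        if len(values) == 2:
            products[index] = values[0] * values[1]
    return products


def elementwise_products(file1, file2):
    """Simulate map -> shuffle (group by index) -> reduce in memory."""
    grouped = defaultdict(list)
    for records in (file1, file2):
        for index, value in map_phase(records):
            grouped[index].append(value)
    return dict(sorted(reduce_phase(grouped).items()))


# Non-zero entries of v1 and v2 from the example
file1 = [(0, 5), (1, 4), (3, 1), (4, 2)]
file2 = [(0, 4), (1, 2), (2, 1), (4, 6)]

products = elementwise_products(file1, file2)
# Summing the per-index products gives the scalar dot product.
dot_product = sum(products.values())
print(products)
print(dot_product)
```

The per-index products match the output column of the example; a second reduce step (here a simple sum) would be needed to collapse them into the single dot-product value.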
ii. Perform analysis on web server report 10 CO4 K3
Sample data:
teleman.pr.mcs.net,-,-,[01/Jul/2005:00:03:57,0400],"GET,/images/KSC-logosmall.gif,HTTP/1.0",304,0
teleman.pr.mcs.net,-,-,[01/Jul/2005:00:03:57,0400],"GET,/images/KSClogosmall.gif,HTTP/1.0",304,0
The data is comma separated. It consists of the user IP address, the time at which the request is received, the time zone, request type, requested link, request details, response code and bytes transferred. The scale of these datasets is usually very large, and running queries with conventional methods is not possible. Hence, use Pig programming on this dataset to retrieve the statistics that help us understand the load and usage of the server, user visit frequency, webpage popularity and the total bytes transferred.
12. i. Formulate an HBase table from the following data. 10 CO4 K3
Data_file.txt contains the below data:
1. 1,India,Bihar,Champaran,2009,April,P1,1,5
2. 2,India, Bihar,Patna,2009,May,P1,2,10
3. 3,India, Bihar,Bhagalpur,2010,June,P2,3,15
4. 4,United States,California,Fresno,2009,April,P2,2,5
5. 5,United States
ii. Demonstrate HBase and HBase clients in detail. 6 CO4 K1
13. i. Describe how Cassandra is integrated with Hadoop and also the tools related to Hadoop. 8 CO4 K2
ii. Recommend a procedure to find the number of occurrences of a word in a document using Hive. 8 CO4 K2
14. i. How will you order the use of Hive? How does Hive interact with Hadoop? Explain in detail. 8 CO4 K1
ii. Analyze in detail Hive data manipulation, queries, and data types. 8 CO4 K2
15. i. Explain in detail the concepts of: (i) conceptual data modeling, (ii) logical data modeling, (iii) physical data modeling. 6 CO4 K1
ii. Compare and contrast Hadoop and MapR. 10 CO4 K1
16. i. Predict the Pig data model in detail with a neat diagram. 6 CO4 K2
ii. Differentiate between Drill and Spark. 10 CO4 K1
17. i. Estimate the query optimization used in the MapReduce concept. 6 CO4 K1
ii. Illustrate the concept of Apache used in Cloudera. 8 CO4 K2
18. i. Distinguish between a JDBC driver and an ODBC driver. 8 CO4 K2
ii. Predict the data types in MongoDB. 8 CO4 K2
19. i. Explain the replication architecture in MongoDB. 10 CO4 K1
ii. Infer the concept of a pipeline in the MongoDB aggregation framework. 6 CO4 K1
20. i. Explain the process of sharding. 6 CO4 K1
ii. Distinguish between MongoDB and RDBMS. 10 CO4 K2