This document contains questions to prepare for viva voce exams covering 5 units: 1) understanding big data, 2) Hadoop distributed file system, 3) NoSQL data management, 4) MapReduce and YARN, and 5) Pig and Hive. For each unit, 5 questions are provided to test knowledge of key concepts and differentiate components like name and data nodes in HDFS, mapper and reducer tasks in MapReduce, and similarities and differences between HiveQL and SQL.
### Unit 1: Understanding Big Data
1. What are the key characteristics of big data?
2. Why is big data important in today's context?
3. Discuss the challenges posed by big data.
4. Can you classify big data analytics? Explain.
5. Provide examples of big data applications in healthcare, banking, advertising, and other industries.
### Unit 2: Hadoop Distributed File System (HDFS)
1. Explain the components of the Hadoop ecosystem.
2. Describe the architecture of Hadoop.
3. What are the key concepts of HDFS?
4. Differentiate between the NameNode and DataNodes in HDFS.
5. How do you read, write, and delete data in HDFS?
### Unit 3: NoSQL Data Management
1. What is NoSQL and why is it used?
2. Discuss the aggregate data models in NoSQL.
3. Explain the key-value and document data models.
4. What are graph databases and schema-less databases?
5. Describe the concepts of sharding and map-reduce in NoSQL.
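
As a study aid for question 5, the idea behind sharding can be sketched in a few lines of Python. This is a minimal illustration of hash-based sharding only, not how any particular NoSQL store implements it; the function names `shard_for` and `distribute` and the sample keys are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard index for a key using a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards):
    """Place every key on exactly one shard."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    # Hypothetical keys; the same key always lands on the same shard.
    for shard, members in distribute(["user:1", "user:2", "order:9"], 4).items():
        print(shard, members)
```

The point to take into the viva is that the key alone determines which node holds the data, so reads and writes can be routed without a global lookup.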
### Unit 4: MapReduce and YARN
1. Explain the MapReduce paradigm in Hadoop.
2. Differentiate between Mapper and Reducer tasks.
3. What are the JobTracker and TaskTracker in Hadoop?
4. Discuss the components and functions of YARN.
5. How does YARN address the failures encountered in classic MapReduce?
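
When preparing answers to questions 1 and 2, it may help to trace the map, shuffle, and reduce phases on a tiny word count. The sketch below is plain single-process Python, not the Hadoop API; the function names are hypothetical and chosen only to mirror the Mapper and Reducer roles.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big", "data node"]))
```

In Hadoop the mapper and reducer run as separate tasks on different nodes and the framework performs the shuffle; here all three phases run in one process purely to show the data flow.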
### Unit 5: Pig and Hive
1. How do you install and run Pig? Provide an example.
2. Compare Pig with traditional databases.
3. What is Pig Latin and how is it used for data processing?
4. Explain the concepts of Hive and its shell.
5. Discuss the similarities and differences between HiveQL and traditional SQL.
These questions should cover the main topics outlined in your syllabus.