BDA - Chapter-1-Components of Hadoop Ecosystem - Lecture 3

The document discusses the key components of the Hadoop ecosystem, including HDFS, MapReduce, YARN, Hive, Pig, HBase, HCatalog, Avro, Thrift, Drill, Mahout, Sqoop, Flume, Ambari, Zookeeper, and Oozie. It describes the main functions and purposes of each component, such as HDFS for distributed file storage, MapReduce for distributed processing of large datasets, and YARN for cluster resource management.


Components Of Hadoop Ecosystem
D. M. Bavkar
List of the Hadoop Components
1. HDFS
• HDFS (Hadoop Distributed File System) is Hadoop's distributed file storage layer.
• HDFS Components:
i. NameNode
Tasks of HDFS NameNode:
• Manages the file system namespace.
• Regulates clients' access to files.
• Executes file system operations such as naming, opening, and closing files and directories.
ii. DataNode
Tasks of HDFS DataNode:
• Performs block replica creation, deletion, and replication according to the instructions of the NameNode.
• Manages the data storage of the system (see the client sketch after this list).
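A minimal sketch of a client writing to and reading from HDFS through the Java FileSystem API is shown below; the NameNode URI and the file path are placeholders, not values from the lecture.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The NameNode URI below is a placeholder; use your cluster's address.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Write: the client asks the NameNode for block locations,
        // then streams the data to DataNodes.
        Path file = new Path("/tmp/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("Hello, HDFS!\n");
        }

        // Read the file back through the same FileSystem handle.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)))) {
            System.out.println(in.readLine());
        }
        fs.close();
    }
}
```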
2. MapReduce
• MapReduce is Hadoop's programming model for distributed processing of large datasets: a map phase turns input records into intermediate key/value pairs, and a reduce phase aggregates the values for each key (see the classic word-count sketch below).
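The word-count job below is a minimal, self-contained sketch of the MapReduce Java API (org.apache.hadoop.mapreduce); the input and output paths are supplied on the command line.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map: emit (word, 1) for every word in the input line.
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reduce: sum the counts emitted for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```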
3. YARN (Yet Another Resource Negotiator)
• YARN is Hadoop's cluster resource management layer: a ResourceManager arbitrates resources among applications, while per-node NodeManagers launch and monitor the containers in which tasks run.
4. Hive
• Apache Hive is a data warehouse system built on top of Hadoop that lets users query data stored in HDFS with a SQL-like language, HiveQL; queries are compiled into jobs that run on the cluster (see the sketch below).
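A minimal sketch of querying Hive from Java over JDBC, assuming a running HiveServer2; the host, credentials, and the sales table are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC endpoint; host, port, user, and table are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hive-server:10000/default", "user", "");
             Statement stmt = conn.createStatement();
             // HiveQL looks like SQL; Hive compiles it into cluster jobs.
             ResultSet rs = stmt.executeQuery(
                 "SELECT category, COUNT(*) FROM sales GROUP BY category")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```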
5. Pig
• Apache Pig is a high-level language platform for analyzing and querying huge datasets stored in HDFS.
• As a component of the Hadoop ecosystem, Pig uses the Pig Latin language, which is similar in spirit to SQL.
• A typical script loads the data, applies the required filters, and dumps the data in the required format (see the sketch after this list).
• To execute programs, Pig requires a Java runtime environment.
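A minimal load/filter/dump sketch using Pig's embedded Java API (PigServer); the input file and its field layout are hypothetical.

```java
import java.util.Iterator;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class PigFilterExample {
    public static void main(String[] args) throws Exception {
        // "local" runs Pig on the local machine; use "mapreduce" on a cluster.
        PigServer pig = new PigServer("local");

        // Load tab-separated records, then filter them in Pig Latin.
        pig.registerQuery("logs = LOAD 'input/access.log' AS (user:chararray, bytes:long);");
        pig.registerQuery("big = FILTER logs BY bytes > 1024;");

        // Equivalent to `DUMP big;` in a Pig Latin script.
        Iterator<Tuple> rows = pig.openIterator("big");
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
        pig.shutdown();
    }
}
```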


6. HBase

 Apache HBase is a Hadoop ecosystem component which is a distributed database that was

designed to store structured data in tables that could have billions of row and millions of

columns.

 HBase is scalable, distributed, and NoSQL database that is built on top of HDFS.

 HBase, provide real-time access to read or write data in HDFS.


Apache HBase Architecture
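A minimal sketch of writing and reading a cell through the HBase Java client API; it assumes a reachable cluster and an existing users table with an info column family (both hypothetical).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {

            // Write one cell: row key "u42", column family "info", qualifier "name".
            Put put = new Put(Bytes.toBytes("u42"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
            table.put(put);

            // Read it back by row key.
            Result result = table.get(new Get(Bytes.toBytes("u42")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}
```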
7. HCatalog
• HCatalog is a table and storage management layer for Hadoop.
• HCatalog lets different Hadoop ecosystem components, such as MapReduce, Hive, and Pig, easily read and write data on the cluster.
• HCatalog is a key component of Hive that enables users to store their data in any format and structure. By default, HCatalog supports the RCFile, CSV, JSON, SequenceFile, and ORC file formats.
Benefits of HCatalog
• Enables notifications of data availability.
• With the table abstraction, HCatalog frees the user from the overhead of data storage.
• Provides visibility for data cleaning and archiving tools.
8. Avro
• Avro is part of the Hadoop ecosystem and is one of the most popular data serialization systems.
• Avro is an open source project that provides data serialization and data exchange services for Hadoop; these services can be used together or independently.
• Using Avro, big data programs written in different languages can exchange data.
• Using the serialization service, programs can serialize data into files or messages.
• Avro stores the data definition (schema) and the data together in one message or file, making it easy for programs to dynamically understand the information stored in an Avro file or message (see the sketch after this list).
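A minimal sketch of Avro's schema-plus-data behavior in Java: the writer embeds the schema in the file, so the reader needs no external schema. The User record layout is made up for illustration.

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroExample {
    public static void main(String[] args) throws Exception {
        // The data definition: a record with a string and an int field.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Ada");
        user.put("age", 36);

        File file = new File("users.avro");
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file); // embeds the schema in the file header
            writer.append(user);
        }

        // The reader recovers the schema from the file itself.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord rec : reader) {
                System.out.println(rec);
            }
        }
    }
}
```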
9. Thrift
• Thrift is a software framework for scalable cross-language service development.
• Thrift is an interface definition language (IDL) for RPC (remote procedure call) communication: services and data types are declared once, and client and server code is generated for each target language.
• Hadoop makes a lot of RPC calls, so there is a possibility of using the Hadoop ecosystem component Apache Thrift for performance or other reasons.
Apache Thrift Architecture
10. Apache Drill
• Apache Drill is a low-latency, schema-free SQL query engine for exploring large-scale data in Hadoop and NoSQL stores.
11. Mahout
• Apache Mahout provides scalable machine learning algorithms, such as clustering, classification, and collaborative filtering, that can run on top of Hadoop.
12. Sqoop
• Sqoop imports data from external sources into related Hadoop ecosystem components like HDFS, HBase, or Hive.
• It also exports data from Hadoop to other external sources.
• Sqoop works with relational databases such as Teradata, Netezza, Oracle, and MySQL (see the sketch after this list).
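Sqoop is normally driven from the command line; the sketch below runs the same import tool from Java via Sqoop.runTool. The JDBC URL, credentials, table, and target directory are all placeholders.

```java
import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
    public static void main(String[] args) {
        // Same arguments as the `sqoop import` command line.
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://db-host:3306/shop", // placeholder database
            "--username", "etl",
            "--password", "secret",
            "--table", "orders",                // relational table to import
            "--target-dir", "/warehouse/orders" // destination directory in HDFS
        };
        int exitCode = Sqoop.runTool(importArgs);
        System.exit(exitCode);
    }
}
```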
13. Flume
• Flume efficiently collects, aggregates, and moves large amounts of data from their origin back into HDFS. It is a fault-tolerant and reliable mechanism.
• This Hadoop ecosystem component allows data to flow from the source into the Hadoop environment. It uses a simple, extensible data model that allows for online analytic applications.
• Using Flume, we can get data from multiple servers into Hadoop immediately.
14. Ambari
• Ambari, another Hadoop ecosystem component, is a management platform for provisioning, managing, monitoring, and securing Apache Hadoop clusters.
• Hadoop management gets simpler, as Ambari provides a consistent, secure platform for operational control.
15. Zookeeper
• Apache ZooKeeper is a centralized service and a Hadoop ecosystem component for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
• ZooKeeper manages and coordinates a large cluster of machines (see the sketch after this list).
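A minimal sketch of storing and reading a shared configuration value through the ZooKeeper Java client; the ensemble address and znode path are placeholders.

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZooKeeperConfigExample {
    public static void main(String[] args) throws Exception {
        // Connect to the ensemble; the address is a placeholder.
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 15000, event -> connected.countDown());
        connected.await();

        // Store a small configuration value under a znode at the root.
        String path = "/config-demo";
        if (zk.exists(path, false) == null) {
            zk.create(path, "batch.size=500".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // Any machine in the cluster reads the same coordinated value back.
        byte[] data = zk.getData(path, false, null);
        System.out.println(new String(data));
        zk.close();
    }
}
```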
16. Oozie
• Apache Oozie is a workflow scheduler for Hadoop: it chains multiple jobs, such as MapReduce, Pig, and Hive actions, into a single logical unit of work and runs them in a defined order (see the sketch below).
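A minimal sketch of submitting a workflow from Java with OozieClient, assuming an Oozie server and a workflow.xml already deployed in HDFS; all URLs and paths are placeholders.

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;

public class OozieSubmitExample {
    public static void main(String[] args) throws Exception {
        // The Oozie server URL is a placeholder.
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = oozie.createConfiguration();
        // Points at an HDFS directory containing workflow.xml (placeholder path).
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/etl/wf");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager:8032");

        String jobId = oozie.run(conf); // submit and start the workflow
        System.out.println("Started workflow: " + jobId);
    }
}
```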
