
Hadoop Training In Bangalore

INTRODUCTION TO HADOOP
Hadoop Architecture
MapReduce
Hadoop Distributed File System
How Does Hadoop Work?
Advantages of Hadoop
Hadoop is an Apache open-source framework, written in Java, that allows distributed processing
of huge datasets across clusters of computers using simple programming models.
A Hadoop application runs in an environment that provides distributed storage
and computation across clusters of computers. Hadoop is designed to scale up
from a single server to thousands of machines, each offering local
computation and storage.
At its core, Hadoop consists of two main layers:
(a) Processing/computation layer (MapReduce), and
(b) Storage layer (Hadoop Distributed File System, HDFS).

MapReduce
MapReduce is a parallel programming model, developed at Google, for processing
large amounts of data (multi-terabyte datasets) on large clusters (thousands of nodes)
of commodity hardware in a reliable, fault-tolerant manner. MapReduce programs run on Hadoop, which is an Apache open-source framework.

Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and
provides a distributed file system designed to run on commodity hardware.
It has many similarities with existing distributed file systems, but the differences
from other distributed file systems are significant. It is highly fault-tolerant,
designed for deployment on low-cost hardware, and provides high-throughput access
to application data, making it suitable for applications with huge datasets.
In addition to the two core components above, the Hadoop framework also consists
of the following two modules:
Hadoop Common: the Java libraries and utilities required by the
other Hadoop modules.
Hadoop YARN: a framework for job scheduling and cluster resource
management.

How Does Hadoop Work?
It is expensive to build ever-larger servers with heavy configurations
to handle large-scale processing. As an alternative, you can tie together
many commodity single-CPU machines into a single working distributed
system; the clustered machines can then read the dataset in
parallel and deliver much higher throughput.
On top of that, such a cluster is much cheaper than one high-end server.
Cost is therefore the first motivation behind Hadoop: it runs across clusters
of inexpensive machines.


Hadoop runs code across a cluster of computers. In doing so, it performs the following core
tasks:
Data is first divided into directories and files. Files are split into uniformly sized
blocks of 128 MB or 64 MB (preferably 128 MB).
These files are then distributed across the cluster nodes for further processing.
HDFS, sitting on top of the local file system, supervises the processing.
Blocks are replicated to guard against hardware failure.
Hadoop checks that the code executed successfully.
It performs the sort that takes place between the map and reduce stages.
It sends the sorted data to a designated computer.
It writes debugging logs for each job.
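The splitting and replication steps above can be sketched in a few lines. The following Python snippet is a simplified illustration, not HDFS code: it cuts a file into fixed-size blocks and assigns each block to several nodes, roughly the way the NameNode assigns replicas. The 128 MB block size and replication factor of 3 are HDFS defaults; the round-robin placement policy and node names are invented for the example.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default HDFS block size
REPLICATION = 3                 # default HDFS replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    # Return (offset, length) pairs; only the last block may be smaller.
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

def place_replicas(blocks, nodes, replication=REPLICATION):
    # Toy round-robin placement; real HDFS placement is rack-aware.
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
print(len(blocks))                             # 3 blocks: 128 MB + 128 MB + 44 MB
placement = place_replicas(blocks, ["node1", "node2", "node3", "node4"])
print(placement[blocks[0]])                    # ['node1', 'node2', 'node3']
```

Because each block lives on three different nodes, the loss of any single machine leaves every block still readable from its remaining replicas.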

Advantages of Hadoop
The Hadoop framework lets users quickly write and test distributed systems. It is
highly efficient: it automatically distributes the data and the work across the
machines and, in turn, exploits the underlying parallelism of the CPU cores.
Hadoop does not rely on hardware to provide fault tolerance and high availability (FTHA); rather,
the Hadoop library itself is designed to detect and handle failures at the application
layer.
Servers can be added to or removed from the cluster dynamically, and Hadoop continues to
operate without interruption.
Another major advantage of Hadoop is that, apart from being open source, it is compatible
with all platforms, since it is Java-based.
