INTRODUCTION TO HADOOP
Hadoop Architecture
MapReduce
Hadoop Distributed File System
How Does Hadoop Work?
Advantages of Hadoop
Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
At its core, Hadoop consists of two major layers:
(a) Processing/Computation layer (MapReduce), and
(b) Storage layer (Hadoop Distributed File System).
www.kellytechno.com
Page 1
MapReduce
MapReduce is a parallel programming model, developed at Google, for writing distributed applications that efficiently process large amounts of data (multi-terabyte data-sets) on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. MapReduce programs run on Hadoop, which is an Apache open-source framework.
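The map-shuffle/sort-reduce flow just described can be sketched with the classic word-count example, here simulated on a single machine with only the Java standard library. This is not the Hadoop API; class and method names are illustrative, and the sketch only imitates the three phases a real MapReduce job distributes across a cluster:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine sketch of the MapReduce model: map -> shuffle/sort -> reduce.
public class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every word in every input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle/sort phase: group values by key; TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped counts for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) ->
                result.put(word, counts.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big clusters", "data everywhere");
        System.out.println(reduce(shuffle(map(input))));
        // prints {big=2, clusters=1, data=2, everywhere=1}
    }
}
```

In a real Hadoop job the map and reduce functions run on many nodes in parallel, and the framework itself performs the shuffle/sort step between them.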
How Does Hadoop Work?
Hadoop runs code across a cluster of computers. In doing so, it performs the following core tasks:
- Data is initially divided into directories and files. Files are divided into uniform-sized blocks of 128M or 64M (preferably 128M).
- These files are then distributed across various cluster nodes for further processing.
- HDFS, being on top of the local file system, supervises the processing.
- Blocks are replicated to guard against hardware failure.
- Checking that the code executed successfully.
- Performing the sort that takes place between the map and reduce stages.
- Sending the sorted data to a designated computer.
- Writing debugging logs for each job.
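The block-splitting and replication steps above amount to simple arithmetic. The sketch below assumes the 128 MB block size mentioned here and HDFS's default replication factor of 3 (both are configurable; this is an illustration, not the HDFS API):

```java
// Illustrates the block-splitting and replication arithmetic described above.
// Assumes a 128 MB block size and a replication factor of 3 (HDFS defaults,
// both configurable); this is not the HDFS API.
public class HdfsBlockMath {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB in bytes
    static final int REPLICATION = 3;

    // Number of blocks a file is split into (the last block may be smaller).
    static long blockCount(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    // Total block copies stored across the cluster after replication.
    static long totalReplicas(long fileSizeBytes) {
        return blockCount(fileSizeBytes) * REPLICATION;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        System.out.println(blockCount(oneGb));    // 8 blocks for a 1 GB file
        System.out.println(totalReplicas(oneGb)); // 24 block copies cluster-wide
    }
}
```

Replicating every block (rather than whole files) is what lets the cluster survive the loss of any single node: each block has copies elsewhere.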
Advantages of Hadoop
The Hadoop framework allows the user to quickly write and test distributed systems. It is efficient: it automatically distributes the data and the work across the machines and, in turn, exploits the underlying parallelism of the CPU cores.
Hadoop does not rely on hardware to provide fault tolerance and high availability (FTHA); rather, the Hadoop library itself has been designed to detect and handle failures at the application layer.
Servers can be added to or removed from the cluster dynamically, and Hadoop continues to operate without interruption.
Another major advantage of Hadoop is that, apart from being open source, it is compatible with all platforms, since it is Java based.