Professional Documents
Culture Documents
UNIT III
Parallel and Distributed Programming Paradigms – MapReduce , Twister and Iterative MapReduce
– Hadoop Library from Apache – Mapping Applications - Cloud Software Environments -
Eucalyptus, Open Nebula, OpenStack,
What is MapReduce?
MapReduce is a programming model or pattern within the Hadoop framework that is
used to access big data stored in the Hadoop File System (HDFS). It is a core component,
chunks, and processing them in parallel on Hadoop commodity servers. In the end, it
aggregates all the data from multiple servers to return a consolidated output back to the
application.
With MapReduce, rather than sending data to where the application or logic resides,
the logic is executed on the server where the data already resides, to expedite
processing. Data access and storage is disk-based—the input is usually stored as files
containing structured, semi-structured, or unstructured data, and the output is also stored in
files.
MapReduce was once the only method through which the data stored in the HDFS could be
retrieved, but that is no longer the case. Today, there are other query-based systems such as
Hive and Pig that are used to retrieve data from the HDFS using SQL-like statements.
However, these usually run along with jobs that are written using the MapReduce model.
At the crux of MapReduce are two functions: Map and Reduce. They are sequenced one after
the other.
The Map function takes input from the disk as <key,value> pairs, processes them,
of data.
failures.
Four modules comprise the primary Hadoop framework and work
Cloud computing
Application Mapping
hardware.
mapping software.
• Active Probing — Creates a map with data from packets that report
traffic.
benefits:
Version 3.3, which became generally available in June 2013, adds the
following features:
EUCALYPTUS ARCHITECTURE
Eucalyptus CLIs can oversee both Amazon Web Services and their own
private occasions. Clients can undoubtedly relocate cases from
Eucalyptus to Amazon Elastic Cloud. Network, storage, and compute
are overseen by the virtualisation layer. Occurrences are isolated by
hardware virtualisation. The following wording is utilised by Eucalyptus
architecture in cloud computing.
instance.
EUCALYPTUS COMPONENTS
volumes.
framework.
OTHER TOOLS
Eucalyptus Walrus.
3. s3fs: This is a FUSE record framework, which can be utilised to
OpenNebula
Fits into any existing data center thanks to its open, flexible and
extensible interfaces, architecture and components
Builds any type of Cloud deployment
Open source software, Apache license
Seamless integration with any product and service in the
virtualization/cloud ecosystem and management tool in the data
center, such as cloud providers, VM managers, virtual image
managers, service managers, management tools, schedulers.
OpenStack
WHAT IS OPENSTACK?
OpenStack is a cloud operating system that controls large pools of
compute, storage, and networking resources throughout a datacenter,
all managed and provisioned through APIs with common authentication
mechanisms.
EUCALYPTUS Architecture