
Hadoop Platforms

Introduction

Hadoop was created by Doug Cutting and Mike Cafarella in 2005. It was named after
a toy elephant. It was originally developed to support distribution for the Nutch search
engine project.

Hadoop is an open-source software framework for storing data and running applications on clusters of
machines. It provides massive storage for any kind of data, enormous processing power and the ability
to handle a virtually limitless number of concurrent tasks.

Hadoop is a highly scalable analytics platform and can process multiple petabytes of data spread across
hundreds or thousands of physical storage servers or nodes.

It provides:

Redundant, fault-tolerant data storage

Parallel computation framework

Job Coordination
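
As an illustration of redundant, fault-tolerant storage, here is a minimal sketch of replicated block placement. It is a toy model, not the HDFS API: the function names, node names and round-robin placement are assumptions made for the example (real HDFS placement is rack-aware), and the replication factor of 3 matches the HDFS default.

```python
# Toy model of redundant block storage. This is NOT the HDFS API; all names
# here are hypothetical and placement is simple round-robin.

REPLICATION = 3  # HDFS's default replication factor


def place_blocks(block_ids, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for start, block in enumerate(block_ids):
        placement[block] = [
            nodes[(start + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]


nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(["blk_0", "blk_1", "blk_2"], nodes)

# Losing a single node leaves every block readable from another replica.
survivors = readable_blocks(placement, failed_nodes={"node2"})
```

With three replicas per block, data becomes unreadable only when every node holding a copy of that block is down at the same time.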

Hadoop is a solution for managing Big Data: a framework for running data management
applications on a large cluster built from commodity hardware.
11/3/16

Importance of Hadoop

Ability to store and process huge amounts of any kind of data, quickly.

Computing power: Hadoop's distributed computing model processes big data faster.

Fault tolerance: Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes to make sure the distributed computation does not fail.

Flexibility: Both structured and unstructured data can be stored without pre-processing.

Low cost: The open-source framework is free and uses commodity hardware to store large quantities of data.

Scalability: Nodes can be added as and when needed, and maintenance costs are low.
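
The fault-tolerance point above can be sketched as a tiny scheduler that skips dead nodes. This is only an illustration: `run_with_failover`, the node names and the rotation policy are hypothetical, and Hadoop's real schedulers also handle retries, heartbeats and data locality.

```python
# Toy illustration of redirecting work away from a failed node.
# Hypothetical names throughout; not Hadoop's scheduler.

def run_with_failover(tasks, nodes, is_alive):
    """Assign each task to the first live node, trying nodes in rotation."""
    assignments = {}
    for i, task in enumerate(tasks):
        for offset in range(len(nodes)):
            node = nodes[(i + offset) % len(nodes)]
            if is_alive(node):
                assignments[task] = node
                break
        else:
            # Every node was tried and none was alive.
            raise RuntimeError(f"no live node available for {task}")
    return assignments


down = {"node2"}  # pretend this node has failed
assignments = run_with_failover(
    ["t0", "t1", "t2", "t3"],
    ["node1", "node2", "node3"],
    is_alive=lambda node: node not in down,
)
```

Here task `t1` would normally land on `node2`; because that node is down, it is redirected to `node3` and the job as a whole still completes.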

http://www.sas.com/content/sascom/en_us/insights/big-data/hadoop/_jcr_content/par/styledcontainer_8bf1/par/styledcontainer_a643/par/textimage_ea05/image.img.png/1468851612191.png


Hadoop Core Components


Hadoop is a system for large-scale data processing.

It has two main components:

1. HDFS, the Hadoop Distributed File System (Storage)

Distributed across nodes

Natively redundant

Self-healing, high-bandwidth clustered storage

NameNode tracks block locations

2. MapReduce (Processing)

Splits a task across processors near the data and assembles the results

JobTracker manages the TaskTrackers
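
To make the MapReduce idea concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps using word counting. It does not use Hadoop itself: the function names are hypothetical, and the list of strings stands in for input splits that a real cluster would process on separate nodes.

```python
# Single-process sketch of the MapReduce pattern (word count).
# Hypothetical helper names; no Hadoop libraries are involved.
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    mapped = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data on hadoop", "hadoop stores big data"])
```

In a real cluster the map calls run in parallel close to where each split is stored, and only the grouped intermediate pairs travel over the network to the reducers.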

http://cdn.edureka.co/blog/wp-content/uploads/2014/08/hadoop1componenets.png


Top 5 Hadoop Platform Providers


Hadoop, a software framework that provides the tools needed to carry out Big Data analysis, is widely used across industries.

Although it is open source and designed to be user-friendly, in its raw state it still needs considerable specialist knowledge to set up and run.

Hadoop-as-a-Service has emerged in recent years: the installation takes place within the vendor's own cloud, with customers paying a subscription to access the services.

The top 5 Hadoop platform providers are:

IBM

Amazon Web Services

Hortonworks

Cloudera

MapR

https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAclAAAAJDZmZTQwODVlLTAwZGQtNGI3Ny05OTlhLTUzMTEyYTNmMTllMg.jpg


1. IBM

IBM has deep roots in the computing industry. Its BigInsights package adds proprietary analytics and visualization algorithms to the core Hadoop infrastructure.

IBM Open Platform with Apache Hadoop

Native support for rolling upgrades of Hadoop services

Support for long-running applications within YARN for enhanced reliability and security

Heterogeneous storage in HDFS: in-memory and SSD in addition to HDD

Spark in-memory distributed compute engine, which offers dramatic performance increases over MapReduce and simplifies the developer experience with Java, Python and Scala

Apache Hadoop projects included: HDFS, YARN, MapReduce, Ambari, HBase, Hive, Oozie, Parquet, Parquet Format, Pig, Snappy, Solr, Spark, Sqoop, ZooKeeper, OpenJDK, Knox, Slider

https://www-01.ibm.com/software/in/data/images/bd-platform.jpg


2. Amazon Web Services

Amazon is a frontrunner in offering Hadoop as part of its cloud services package.

Amazon Web Services (AWS) is a hosted solution that integrates Hadoop with Amazon's Elastic Compute Cloud (EC2) and Simple Storage Service (S3), its cloud-based data processing and storage services.

AWS offers a broad set of global compute, storage, database, analytics, application, and deployment services that help organizations move faster, lower IT costs, and scale applications.

AWS is trusted by the largest enterprises and the hottest start-ups to power a wide variety of workloads, including web and mobile applications, data processing and warehousing, storage, archive, and many others.

Big Data on AWS introduces you to cloud-based big data solutions such as Amazon Elastic MapReduce (EMR), Amazon Redshift, Amazon Kinesis and the rest of the AWS big data platform.

http://www.strategism.org/wp-content/uploads/2015/06/amazon-800x600.jpg


3. Hortonworks

Hortonworks is one of the few vendors to offer 100% open source Hadoop technology without any proprietary modifications.

Hortonworks was also the first to integrate support for Apache HCatalog, which creates metadata (data about data), simplifying the process of sharing your data across other layers of service such as Apache Hive or Pig.

HDP (Hortonworks Data Platform) is the enterprise-ready open source Apache Hadoop distribution based on a centralized architecture (YARN).

HDP addresses the complete needs of data-at-rest, powers real-time customer applications and delivers robust analytics that accelerate decision making and innovation.

Hortonworks is all about data: data-in-motion, data-at-rest, and Modern Data Applications. Its Connected Data Platforms help customers create actionable intelligence to transform their businesses.

http://hortonworks.com/wp-content/uploads/2014/03/11.png


4. Cloudera

Cloudera is reported to be the most popular provider, with the largest number of installations running.

Cloudera contributed Impala, which brings real-time, massively parallel processing of Big Data to Hadoop.

Cloudera's open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop), targets enterprise-class deployments of that technology.

Cloudera says that more than 50% of its engineering output is donated upstream to the various Apache-licensed open source projects (Apache Hive, Apache Avro, Apache HBase, and so on) that combine to form the Hadoop platform.

Cloudera is a sponsor of the Apache Software Foundation.

http://blog.cloudera.com/wp-content/uploads/2013/06/search.png


5. MapR


MapR takes a different approach in some areas, such as native support for UNIX file systems rather than HDFS.

MapR Technologies is spearheading development of the Apache Drill project, which provides advanced tools for interactive, real-time querying of big datasets.

MapR describes its Converged Data Platform as the industry's only platform to integrate the power of Hadoop and Spark with global event streaming, real-time database capabilities, and enterprise storage.

The MapR Hadoop distribution replaces HDFS with its proprietary file system, MapR-FS, which is designed to provide more efficient data management, reliability and ease of use.

The MapR Converged Data Platform supports big data storage and processing through the Apache collection of Hadoop products, as well as its own value-added components.

http://2s7gjr373w3x22jf92z99mgm5w-wpengine.netdna-ssl.com/wp-content/uploads/2016/03/Mapr_Zeta_4-1.png


