Hadoop Platforms
Introduction
Hadoop was created by Doug Cutting and Mike Cafarella in 2005 and was named after a toy
elephant that belonged to Cutting's son. It was originally developed to support distribution
for the Nutch search engine project.
Hadoop is an open-source software framework for storing data and running applications on clusters. It
provides massive storage for any kind of data, enormous processing power, and the ability to handle
a very large number of concurrent tasks.
Hadoop is a highly scalable analytics platform and can process multiple petabytes of data spread across
hundreds or thousands of physical storage servers or nodes.
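To make the idea of data "spread across" nodes concrete, here is a minimal sketch of how a file might be cut into blocks and each block copied to several nodes. It is an illustration under stated assumptions, not HDFS's actual placement logic: the function name place_blocks and the node names are hypothetical, the 128 MB block size and replication factor of 3 are the commonly cited HDFS defaults rather than values from these slides, and the round-robin placement stands in for HDFS's real rack-aware policy.

```python
import math


def place_blocks(file_size_mb, nodes, block_size_mb=128, replication=3):
    """Return {block_index: [node, ...]} for a file of the given size.

    Assumptions (not from the slides): fixed-size blocks, a replication
    factor of 3, and simple round-robin placement across the node list.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")

    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Each replica goes to a different node, starting at a rotating offset.
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    # Hypothetical four-node cluster storing a 300 MB file.
    for block, replicas in place_blocks(300, ["n1", "n2", "n3", "n4"]).items():
        print(block, replicas)
```

Because every block lives on several distinct nodes, losing one node still leaves copies of each block elsewhere in the cluster.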
It provides:
Job Coordination
Hadoop is a solution for managing Big Data. It is a framework for running data management
applications on a large cluster built of commodity hardware.
11/3/16
Importance of Hadoop
Ability to store and process huge amounts of any kind of data, quickly.
http://www.sas.com/content/sascom/en_us/insights/big-data/hadoop/_jcr_content/par/styledcontainer_8bf1/par/styledcontainer_a643/par/textimage_ea05/image.img.png/1468851612191.png
Natively redundant
2. MapReduce (Processing)
Clustered storage
http://cdn.edureka.co/blog/wp-content/uploads/2014/08/hadoop1componenets.png
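The MapReduce processing model named above can be sketched in plain Python. This is a single-process illustration of the map, shuffle, and reduce steps using the classic word-count example. It does not use the Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data on Hadoop", "Hadoop stores big data"]))
```

On a real cluster the map and reduce functions run in parallel on many nodes, and the framework performs the shuffle over the network. Here all three steps run in one process purely to show the data flow.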
IBM
Hortonworks
Cloudera
MapR
https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAclAAAAJDZmZTQwODVlLTAwZGQtNGI3Ny05OTlhLTUzMTEyYTNmMTllMg.jpg
1. IBM
Spark, an in-memory distributed compute engine, offers dramatic performance increases over MapReduce
and simplifies the developer experience, with support for the Java, Python, and Scala languages
Apache Hadoop projects included: HDFS, YARN, MapReduce, Ambari, HBase, Hive, Oozie, Parquet,
Parquet Format, Pig, Snappy, Solr, Spark, Sqoop, ZooKeeper, OpenJDK, Knox, Slider
https://www-01.ibm.com/software/in/data/images/bd-platform.jpg
2. Amazon Web Services (AWS)
Big Data on AWS introduces cloud-based big data solutions such as Amazon Elastic
MapReduce (EMR), Amazon Redshift, Amazon Kinesis, and the rest of the AWS big data platform.
http://www.strategism.org/wp-content/uploads/2015/06/amazon-800x600.jpg
3. Hortonworks
Hortonworks focuses on data-in-motion, data-at-rest, and modern data applications.
Its Connected Data Platforms help customers create actionable intelligence to transform their
businesses.
The Hortonworks Data Platform (HDP) addresses the needs of data-at-rest, powers real-time
customer applications, and delivers analytics that accelerate decision making and innovation.
http://hortonworks.com/wp-content/uploads/2014/03/11.png
4. Cloudera
http://blog.cloudera.com/wp-content/uploads/2013/06/search.png
5. MapR
http://2s7gjr373w3x22jf92z99mgm5w-wpengine.netdna-ssl.com/wp-content/uploads/2016/03/Mapr_Zeta_4-1.png