Professional Documents
Culture Documents
Bigdataintroduction 130506060855 Phpapp01 PDF
Bigdataintroduction 130506060855 Phpapp01 PDF
Big Data/Hadoop
Introduction
Agenda
Three V's
• Volume
• Variety
• Velocity
Trendwise Analytics
Volume of Data
•
A commercial aircraft An ERP system for an mid A Video Suveillance
generates 3GB of size company grows by Camera generates 1-
flight sensor data in 1 1-2TB annually 3TB data in 3 months
hour
Market opportunity
IDC, a research firm, predicts that
the market for Big Data technology and services will reach $16.9 billion by 2015,
up from $3.2 billion in 2010. That is a 40 percent-a-year growth rate —
about seven times the estimated growth rate for the overall information
technology and communications business, according to IDC.
IT
infrastructure Legal Social network Traffic flow Web app
optimization discovery analysis optimization optimization
Churn Natural
analysis resource Weather Healthcare
exploration forecasting outcomes
16 Apr 2013
Copyright Trendwise Analytics
9
Trendwise Analytics
Big Data at GE
11
Trendwise Analytics
Govt of India
• Aadhar project by Govt. of India uses Hadoop
TECHNOLOGY
Hadoop Non Hadoop
Hadoop Components
Hadoop runs on HDFS,
Hadoop Distributed Filesystem
Any data stored is converted to blocks
and distributed across the cluster nodes
Other components
PIG
Hive
• A high-level data-flow language and
• Data Warehouse infrastructure execution framework for parallel
that provides data computation
summarization and ad hoc
querying on top of Hadoop
Zookeeper
Sqoop
• Zookeeper is a centralized
• Sqoop is a tool designed to service for maintaining
help users of large data configuration information,
import existing relational naming, providing
databases into their Hadoop distributed synchronization,
clusters and providing group services
Benefits of Hadoop
• Hadoop is designed to run on cheap commodity
hardware
• It automatically handles data replication and node
failure
• Handles large volumes of unstructured data easily
• Last but not least – its free! ( Open source)
Trendwise Analytics
• Cloudera
• Hortonworks
• Greenplum, A Division of EMC
• IBM InfoSphere BigInsights
•NoSQL Databases
Key-Values Stores – Redis, Riak
Column Family Stores – Cassandra, HBase
Document Databases – CouchDB, MongoDB
Graph Database – InfoGrid, Infinite Graph
Trendwise Analytics
Thank You!
Additional resources
Hadoop:
http://hadoop.apache.org/
http://bigdatauniversity.com/
Others:
http://www.cloudera.com/content/cloudera/en/home.html
http://hortonworks.com/