Objectives
By the end of this lesson, you will be able to:
Data Explosion
A typical large stock exchange captures more than 1 TB of data every day.
There are around 5 billion mobile phones (including 1.75 billion smartphones) in the world.
Large social networks such as Twitter and Facebook capture more than 10 TB of data daily.
Types of Data
90% of the data in the world today has been created in the last two years alone.
80% of the data is unstructured or exists in widely varying structures, which are difficult to analyze.
Structured formats have some limitations with respect to handling large quantities of data.
In its raw form, oil has little value. Once processed and refined, it helps power the world.
Ann Winblad
Big data is an all-encompassing term for any collection of data sets so large and complex that it
becomes difficult to process using on-hand data management tools or traditional data
processing applications.
The sources of Big Data are:
web logs;
sensor networks;
social media;
internet text and documents;
internet pages;
search index data;
atmospheric science, astronomy, biochemical and medical records;
scientific research;
military surveillance; and
photography archives.
Copyright 2015, Revert Technology Pvt. Ld., All rights
Responds to the increasing velocity
Turned 12 terabytes of Tweets created each day into improved product sentiment analysis
Converted 350 billion annual meter readings to better predict power consumption
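To put the two figures above in per-second terms, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes), a 365-day year, and a perfectly even arrival rate, none of which the source states. The variable names are illustrative only.

```python
# Back-of-the-envelope conversion of the quoted volumes into per-second rates.
# Assumptions (not from the source): decimal terabytes, 365-day year,
# and data arriving at a constant rate.

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

tweet_bytes_per_day = 12 * 10**12        # 12 TB of tweets per day
meter_readings_per_year = 350 * 10**9    # 350 billion meter readings per year

# Average sustained rates a system would have to absorb.
tweet_mb_per_second = tweet_bytes_per_day / SECONDS_PER_DAY / 10**6
readings_per_second = meter_readings_per_year / SECONDS_PER_YEAR

print(f"Tweets: about {tweet_mb_per_second:.1f} MB per second")
print(f"Meter readings: about {readings_per_second:,.0f} per second")
```

Under these assumptions the averages come to roughly 139 MB of tweets and about 11,100 meter readings every second. Real traffic is bursty, so peak rates would be higher.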
Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective,
innovative forms of information processing for enhanced insight and decision making.
Source: Gartner
Social media
Web
Billing
ERP
Machine data
Network elements
Big Data technology enables IT to leverage multiple sources of data. Following are some of the
sources:
Application data
Machine data
Enterprise data
Social data
High volume
High velocity
Variety
Structured
Semi-structured
Highly unstructured
High throughput
Ingestion at a high speed
Veracity
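The three variety categories named above (structured, semi-structured, highly unstructured) can be illustrated with a minimal sketch. The sample records, field names, and the keyword check are all hypothetical and chosen only for illustration. The point is that each category needs progressively more work before it can be queried.

```python
import csv
import io
import json
import re

# Structured: fixed columns, so values can be aggregated directly.
structured = "meter_id,reading_kwh\nM-001,12.5\nM-002,7.25\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total_kwh = sum(float(row["reading_kwh"]) for row in rows)

# Semi-structured: self-describing, but fields may be nested or missing,
# so access has to tolerate absent keys.
semi_structured = '{"user": "alice", "tags": ["hadoop"], "geo": {"lat": 12.9}}'
record = json.loads(semi_structured)
latitude = record.get("geo", {}).get("lat")

# Highly unstructured: free text with no schema at all; even a simple
# question requires tokenising the text first.
unstructured = "Loving the new phone, but the battery drains fast."
words = re.findall(r"[a-z']+", unstructured.lower())
mentions_battery = "battery" in words

print(total_kwh, latitude, mentions_battery)
```

A real pipeline would replace the keyword check with proper text analytics; it stands in here only to show that free text carries no ready-made fields.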
The following are the requirements of the traditional IT analytics approach and the factors that
challenge them:
Requirements
Challenging factors
In a typical scenario of traditional IT systems development, the requirements are defined, followed by
solution design and build. Once the solution is implemented, queries are executed. If there are new
requirements or queries, the system is redesigned and rebuilt.
Define requirements
Design solution
Execute queries
The following are the requirements for using Big Data technology as a platform for discovery and
exploration, and the challenges it overcomes:
Requirements
sources.
The image illustrates how IT systems are built with the help of Big Data technology.
Determine questions to ask and test hypotheses
Analyze unstructured data
Following are the challenges that need to be addressed by Big Data technology:
How to combine data accumulated from all systems
analysis
Merging of data
Introduction to Hadoop
petabytes of data
Hadoop originated from the Nutch open source project on search engines and works over distributed
network nodes.
Hadoop Milestones
Cluster specifications
Uses
A9.com: Amazon
Yahoo
AOL