Big Data
• Using data to understand customers and business operations, in order to
sustain and foster growth and profitability, is an increasingly
challenging task for today’s enterprises.
• As more and more data becomes available, timely processing of it with
traditional tools is becoming impractical.
• This phenomenon of a huge volume of data arriving in real time
is termed big data.
• Big data has become something of a buzzword, but its real substance is the
analytics behind the scenes.
• ‘Big’ is a relative term: what counts as big depends on the size of the
organization and on how it interprets the term.
• Big data has become a popular term to describe the exponential
growth, availability and the use of information, both structured and
unstructured.
Sources of Big Data
• Where does big data come from?
• A simple answer is ‘everywhere’.
• The sources we ignored earlier because of technical
limitations are treated as gold mines today.
• Big data may come from web logs, RFID tags, GPS systems,
sensor networks, social networks, IoT devices, search indices,
call detail records, scientific experiments such as those in nuclear physics,
medical records, military surveillance, photo archives,
video archives, e-commerce transactions, etc.
• Since the advent of data warehouses in the early 1990s,
companies have been storing relevant data in large volumes.
• Many believe that big data is characterized not only by the data
itself: variety, velocity, veracity, variability and value
proposition are also important aspects of Big Data.
The Vs that define Big Data
• Big data is typically defined by three “V”s:
– Volume,
– Variety and
– Velocity.
Differences We See
• Big Data
• Real-Time use of data
• Selling data
• Decision-Based data
Allen’s Definition of Business Analytics
Utilizing Data to Increase Shareholder Value
Data =
Big and Small
Internal and External
Structured and Non-structured
Traditional and “New”
“Free” and Purchased
Utilizing =
Determine Business Needs
Capture and Store
Ensure Quality
Access and Format
Analyze and Summarize
Gain Insight and Produce Action
and ... ‘Sell’ It
http://topmanagement.com.mx/innovacion-social-y-empresarial-objetivo-de-hitachi/
The Growth of Data
https://www.domo.com/learn/data-never-sleeps-3-0
http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427
The Emergence of Big Data Tools
http://blogs.forrester.com/category/hadoop
http://solutions.forrester.com/Global/FileLib/webinars/Big_Data_-_Gold_Rush_or_Illusion.pdf
http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/
http://www.kdnuggets.com/2014/05/big-data-landscape-v30-analyzed.html
http://dataofthings.blogspot.com/2014/04/the-bbbt-sessions-hortonworks-big-data.html
http://www.gartner.com/it-glossary/predictive-analytics
Types of Questions and Analytics
DatascienceTh.com
Doing Data Science by O'Neil et al. (2013)
Picture: http://www.clipartpanda.com/categories/scientist-clip-art
Data Scientist
Data Visualization
What makes a good data visualization designer?
Fundamentals of Big Data Analytics
• Big Data by itself is useless unless business users
do something with it that delivers value
to the organization.
• The traditional means for capturing, storing and
analyzing data are not capable of dealing with Big
Data effectively and efficiently.
• New technologies are required to deal with the
enormous amount of data.
• Before investing in high-end technologies,
organizations need to decide how the data will be
used, how important it is, how fast it arrives, etc.
The success of Big Data Analytics depends on a
number of factors. Some critical factors are
Business Problems Addressed by Big Data Analytics
• The top business problems addressed with the help of Big Data are
process efficiency, cost reduction, enhancing customer
experience and risk management.
• Efficiency and cost reduction with Big Data Analytics (BDA) are mostly
addressed in the manufacturing, government, energy,
communication, media, transport and healthcare sectors.
• Enhanced customer experience may be important for
insurance companies and retailers.
• Risk management is useful for the banking sector and for new
product development.
• Other problems such as fraud detection, identifying new
markets, revenue maximization, etc. can also be addressed with
big data analytics.
Big Data Technologies
• Although there are a number of different
technologies that are useful in analyzing Big
Data, most of them share some common
characteristics.
• Three Big Data technologies
stand out from the rest:
– MapReduce
– Hadoop
– NoSQL
MapReduce
• MapReduce is a technique popularized by Google
that distributes the processing of very large
multi-structured data files across a large cluster
of machines.
• High performance is achieved by breaking the
processing into small units of work that can be
run in parallel across the hundreds or thousands of
machines in a cluster.
• MapReduce helps organizations process and
analyze large volumes of multi-structured data,
for example in graph analysis, text analysis,
machine learning, data transformation, etc.
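The map and reduce steps described above can be sketched on a single machine. The following is a minimal illustration in plain Python, assuming a word-count task; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and are not part of any MapReduce library. On a real cluster, each call to map_phase or reduce_phase would be a small unit of work handed to a different machine.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document is mapped independently, so this step could run in parallel.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    # Each key is reduced independently, so this step could also run in parallel.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
```

The point of the design is that neither map_phase nor reduce_phase depends on any other document or key, which is what allows the work to be spread across many machines.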
Hadoop
• It is an open-source framework for processing, storing
and analyzing massive amounts of distributed,
unstructured data.
• Hadoop was inspired by MapReduce and was designed
to handle petabytes and exabytes of data.
• Rather than working through one huge block of data on a
single machine, Hadoop breaks Big Data up into
multiple parts so that each part can be processed and
analyzed at the same time.
• Sources of data may include log files, social media
feeds and internal data sources.
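The idea of breaking data into parts and processing the parts at the same time can be illustrated with a small sketch. This is not the Hadoop API: it assumes a pool of threads on one machine standing in for the nodes of a cluster, and the names split_into_blocks, count_errors and parallel_error_count, as well as the sample log lines, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, block_size):
    """Break a large list of records into fixed-size blocks."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def count_errors(block):
    """Work done independently on one block: count lines containing ERROR."""
    return sum(1 for line in block if "ERROR" in line)


def parallel_error_count(lines, block_size=2, workers=4):
    blocks = split_into_blocks(lines, block_size)
    # Each worker thread plays the role of one cluster node (an assumption
    # made for illustration only).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_errors, blocks))
    # Combine the per-block results into one answer.
    return sum(partial_counts)


if __name__ == "__main__":
    log_lines = [
        "INFO service started",
        "ERROR disk full",
        "INFO request served",
        "ERROR timeout",
        "INFO shutdown",
    ]
    print(parallel_error_count(log_lines))
```

In an actual Hadoop deployment the blocks would live on different machines and the work would be sent to where the data is stored; the sketch only shows the split, process, and combine pattern.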