Outline
• What is Big Data and Why Does It Matter?
• What Is Big Data?
• How Is Big Data Different and More of the Same?
• Risks of Big Data
• The Structure of Big Data
• Most Big Data Doesn’t Matter
• Mixing Big Data with Traditional Data
• Today’s Big Data Is Not Tomorrow’s Big Data
“Big data refers to data sets whose size is beyond the ability of
typical database software tools to capture, store, manage and
analyze.” - The McKinsey Global Institute, 2011
SIZE OF DATA
What Is Big Data?
• The “BIG” in big data isn’t just about volume
Big Data Analysis Example
• Big data can generate significant financial value across sectors
How Is Big Data More of the Same?
• Most new data sources were considered big and difficult
• Just the next wave of new, bigger data
< The past > < The present > < The future >
Why You Need to Tame Big Data
• Analyzing big data is already standard (e.g., e-commerce)
• Semi-structured
• Many sources of big data
• Unstructured
• Video data, audio data
Various types of data formats
Exploring Big Data
• The time for developing an analysis
• Analyzing data: 20~30%
• Analyzing data when initially working with big data: 5%
• Transform
• Load
The Example of RFID Tags
• Have short-term value
• (e.g.) The responses at 10 second intervals between tags and readers
[Figure: big data and other data create a synergy effect]
Mixing Big Data with Traditional Data
• Browsing history
• Knowing how valuable a customer is
• What they have bought in the past
• Smart-grid data
• For a utility company
• Knowing the historical billing patterns
• Dwelling type
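The combination described above can be sketched in a few lines. This is a minimal illustration only: the field names (`customer_id`, `lifetime_value`, `past_purchases`, `page`) and all records are assumptions made up for the example, not taken from any real system.

```python
# Hypothetical traditional data: one record per customer.
customers = {
    "C1": {"lifetime_value": 5200.0, "past_purchases": ["laptop", "mouse"]},
    "C2": {"lifetime_value": 90.0, "past_purchases": ["cable"]},
}

# Hypothetical big data: raw browsing events keyed by the same customer ID.
browsing_events = [
    {"customer_id": "C1", "page": "tablet"},
    {"customer_id": "C1", "page": "tablet-case"},
    {"customer_id": "C2", "page": "laptop"},
    {"customer_id": "C3", "page": "phone"},  # no traditional record exists
]


def enrich(events, customer_table):
    """Attach customer value and purchase history to each browsing event."""
    enriched = []
    for event in events:
        profile = customer_table.get(event["customer_id"])
        if profile is None:
            continue  # skip visitors that cannot be matched
        enriched.append({**event, **profile})
    return enriched


enriched_events = enrich(browsing_events, customers)
for row in enriched_events:
    print(row["customer_id"], row["page"], row["lifetime_value"])
```

Once each browsing event carries the customer's value and history, the same browsing pattern can be treated differently for a high-value and a low-value customer.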
The Need for Standards
• Become more structured over time
• Fine-tune to be friendlier for analysis
• Standardize enough to make life much easier
360-Degree View
• Organizations have talked about a 360-degree view of their
customers
• What is a 360-degree view?
Web Data Overview (2/6)
98% of Information
Web Data Overview (3/6)
Action flow
Web Data Overview (4/6)
[Figure: motivation 1, motivation 2, intention 1, intention 2, preference 1, preference 2, etc.]
Web Data Overview (5/6)
Privacy
• Privacy may become an even bigger issue as time passes
• Faceless customer analysis
• An arbitrary ID number can be matched
• It is useful to find the pattern, not the behavior of any specific customer
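One possible way to picture faceless analysis is shown below. It is a sketch under stated assumptions: the salted SHA-256 hash, the `SALT` value, and the sample sessions are illustrative choices, not a method prescribed by the slides, and a real deployment would need a proper privacy review.

```python
import hashlib
from collections import Counter

SALT = b"example-salt"  # assumption: a secret value kept away from analysts


def faceless_id(customer_id: str) -> str:
    """Map a real customer ID to an arbitrary but stable ID."""
    digest = hashlib.sha256(SALT + customer_id.encode("utf-8")).hexdigest()
    return digest[:12]


# Hypothetical click records: (real customer ID, page sequence).
sessions = [
    ("alice@example.com", ("home", "search", "product")),
    ("bob@example.com", ("home", "search", "product")),
    ("carol@example.com", ("home", "cart")),
]

# Analysts only see arbitrary IDs.
faceless_sessions = [(faceless_id(cid), path) for cid, path in sessions]

# The analysis counts how often each pattern occurs, not who produced it.
pattern_counts = Counter(path for _, path in faceless_sessions)
print(pattern_counts.most_common(1))
```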
Behavioral pattern
Shopping Behaviors
• How customers come to a site to begin shopping
• What search engine do they use?
• What specific search terms are entered?
• Do they use a bookmark they created previously?
Associated with higher sales rates
Search keywords
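To check which search keywords are associated with higher sales rates, the visits can be grouped by keyword. The sketch below assumes a hypothetical list of `(keyword, purchased)` pairs; the keywords and outcomes are invented for illustration.

```python
from collections import defaultdict

# Hypothetical visits: (search keyword that led to the site, purchased?).
visits = [
    ("cheap laptop", False),
    ("cheap laptop", False),
    ("cheap laptop", True),
    ("brandx model 15 laptop", True),
    ("brandx model 15 laptop", True),
    ("brandx model 15 laptop", False),
    ("brandx model 15 laptop", True),
]


def sales_rate_by_keyword(rows):
    """Return {keyword: purchases / visits} for each search keyword."""
    totals = defaultdict(int)
    purchases = defaultdict(int)
    for keyword, purchased in rows:
        totals[keyword] += 1
        purchases[keyword] += int(purchased)
    return {k: purchases[k] / totals[k] for k in totals}


rates = sales_rate_by_keyword(visits)
for keyword, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{keyword}: {rate:.0%}")
```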
What Web Data Reveals (2/7)
What Web Data Reveals (3/7)
Shopping Behaviors (cont.)
What Web Data Reveals (4/7)
Research Behaviors
• Understanding how customers utilize the research content can lead
to tremendous insights into
• How to interact with each individual customer
• How different aspects of the site do or do not add value
What Web Data Reveals (5/7)
Detailed specification
What Web Data Reveals (6/7)
Feedback Behaviors
• Some of the best information is
• Detailed feedback on products and services
• By using text mining, we can understand
• Tone
• Intent
• Topic
What Web Data Reveals (7/7)
Web Data in Action (2/8)
Web Data in Action (4/8)
Attrition Modeling
• In the telecommunications industry,
• Companies have invested massive amounts of time and effort in “churn” models
• It is critical to understand patterns of customer usage and
profitability
Web Data in Action (5/8)
[Figure: Provider 101’s cancellation policies page]
Response Modeling
• It is similar to attrition modeling
• The goal is predicting a positive behavior (purchase or response) rather than a negative behavior
• In a response model, all customers are scored and ranked
• In theory, every customer has a unique score
• In practice, a small number of variables define most models
• Many customers end up with identical or nearly identical scores
• Web data can help increase differentiation among customers
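The tie problem and the effect of a web variable can be illustrated with a toy scoring function. Everything here is an assumption for illustration: the variable names, the weights, and the four customers are made up, and a real response model would be fitted to data.

```python
from collections import Counter

# Hypothetical customers: two traditional variables plus one web variable.
customers = [
    {"id": 1, "bought_last_year": 1, "opted_in": 1, "viewed_offer_page": 3},
    {"id": 2, "bought_last_year": 1, "opted_in": 1, "viewed_offer_page": 0},
    {"id": 3, "bought_last_year": 1, "opted_in": 1, "viewed_offer_page": 1},
    {"id": 4, "bought_last_year": 0, "opted_in": 1, "viewed_offer_page": 0},
]


def base_score(c):
    """Score from a small number of variables (weights are made up)."""
    return 0.6 * c["bought_last_year"] + 0.4 * c["opted_in"]


def web_score(c):
    """Same score plus a made-up weight on a web-data variable."""
    return base_score(c) + 0.1 * c["viewed_offer_page"]


def distinct_scores(score_fn):
    """Count how many different score values the customers receive."""
    return len(Counter(round(score_fn(c), 6) for c in customers))


print("distinct scores without web data:", distinct_scores(base_score))
print("distinct scores with web data:", distinct_scores(web_score))
ranked = sorted(customers, key=web_score, reverse=True)
print("ranking with web data:", [c["id"] for c in ranked])
```

With only the two traditional variables, three of the four customers share one score; the web variable separates them.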
Web Data in Action (7/8)
Customer Segmentation
• Web data makes it possible to segment customers based upon typical
browsing patterns
Dreamer
Thank you
The Evolution of Analytic Processes
Outline
• Introduction
• The Analytic Sandbox
• Analytic Data Set (ADS)
• Enterprise Analytic Data Set (EADS)
• Scoring Routines
Introduction
• Upgrading technologies won’t provide much value if the same old
analytical processes remain in place
1. Change the process of configuring and maintaining workspaces
The Analytic Sandbox
Embedded Scoring
The Analytical Sandbox (1/5)
Definition
• A set of resources that enable analytic professionals to experiment
and reshape data in whatever fashion they need to
• Data exploration
• Development of analytical processes
• Proofs of concept
• Prototyping
The Analytical Sandbox (2/5)
An Internal Sandbox
• A portion of an enterprise data warehouse or data mart is carved out
to serve as the analytic sandbox
• Strengths
• Leverage existing hardware resources and infrastructure already in place
• Ability to directly join production data with sandbox data
• Cost-effective since no new hardware is needed
• Weaknesses
• An additional load on the existing enterprise data warehouse or data mart
• Can be constrained by production policies and procedures
[Figure: sandbox, analytic views and enterprise analytic data sets, core database tables, extract]
Analytic Data Set (1/2)
Definition
• The data that is pulled together in order to create an analysis or
model
• In the format required for the specific analysis at hand
• Generated by transforming, aggregating, and combining data
• Helps bridge the gap between efficient storage and ease of use
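The transforming, aggregating, and combining steps can be sketched as follows. The table layouts, field names, and values are hypothetical and chosen only to show one row per transaction being turned into one row per customer.

```python
from collections import defaultdict

# Hypothetical base table stored efficiently: one row per transaction.
transactions = [
    {"customer": "C1", "amount": 20.0, "channel": "web"},
    {"customer": "C1", "amount": 35.0, "channel": "store"},
    {"customer": "C2", "amount": 10.0, "channel": "web"},
]

# Hypothetical second source to combine with the transactions.
demographics = {"C1": {"homeowner": True}, "C2": {"homeowner": False}}


def build_ads(rows, demo):
    """Aggregate to one row per customer, derive metrics, combine sources."""
    ads = defaultdict(lambda: {"total_sales": 0.0, "purchases": 0, "web_purchases": 0})
    for row in rows:
        rec = ads[row["customer"]]
        rec["total_sales"] += row["amount"]                # aggregate
        rec["purchases"] += 1                              # aggregate
        rec["web_purchases"] += int(row["channel"] == "web")  # transform
    for customer, rec in ads.items():
        rec["avg_purchase"] = rec["total_sales"] / rec["purchases"]  # derive
        rec.update(demo.get(customer, {}))                 # combine
    return dict(ads)


ads = build_ads(transactions, demographics)
print(ads["C1"])
```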
Analytic Data Set (2/2)
Two Primary Kinds of Analytic Data Sets
• A development ADS
• Used to build an analytic process
• Has many variables or metrics within it
• Very wide but not very deep
• A production ADS
• Needed for scoring and deployment
• Contains only the specific metrics that were actually in the final solution
• Not very wide but very deep
[Figure: base tables (Table1 through Table6) are derived, aggregated, combined, and transformed into a production ADS (narrow and deep) and a development analytic data set (wide and shallow)]
Enterprise Analytic Data Set (1/5)
Traditional Analytic Data Sets
• All analytic data sets are created outside of the database
• Each analytic professional creates their own data sets independently
• The risk of inconsistencies
• The repetitious work
Enterprise Analytic Data Set (2/5)
Enterprise Analytic Data Set
• A shared and reusable set of centralized, standardized analytic data
sets for use in analytics
• A standardized view of data to support multiple analysis efforts
• Streamline the data preparation process
• Provide great consistency, accuracy, and visibility to analytics processes
• Build once, use many
Enterprise Analytic Data Set (3/5)
Structure
EADS Logical View: Customer ADS Table
Customer | Total Sales | Total Purchases | Gender | Home-owners | Mail Responder | E-mail Opt in
It could very well be stored differently!
Customer | Sales
Customer | Mail Responder | E-mail Opt in
For updating an EADS
Enterprise Analytic Data Set (4/5)
Summary Table or View?
• Summary tables that are updated via a scheduled process
• Benefits
• Compute once, use many
• Most advanced analytics efforts involve a heavy use of historical data
• Very low latency in getting data
• Downsides
• May not be fully up-to-date with the latest data
• Use disk space on the system, potentially a whole lot of it
Enterprise Analytic Data Set (5/5)
Summary Table or View?
• A series of views that are run on demand
• Benefits
• Completely fresh and up to date
• Good performance in real-time analysis
• Changes are immediately available
• Consistency and transparency of the computations
• Downsides
• The system load won’t necessarily be reduced that much
• Users have to wait longer to get their data back
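The trade-off between the two options can be seen in a small experiment. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table and column names (`sales`, `customer_summary`, `customer_view`) are assumptions for illustration, and a real EADS would live in the enterprise database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("C1", 20.0), ("C1", 35.0), ("C2", 10.0)],
)

# Option 1: a summary table, computed once by a scheduled process.
conn.execute(
    "CREATE TABLE customer_summary AS "
    "SELECT customer, SUM(amount) AS total_sales FROM sales GROUP BY customer"
)

# Option 2: a view, recomputed every time it is queried.
conn.execute(
    "CREATE VIEW customer_view AS "
    "SELECT customer, SUM(amount) AS total_sales FROM sales GROUP BY customer"
)

# A new sale arrives after the summary table was built.
conn.execute("INSERT INTO sales VALUES ('C2', 90.0)")


def total_for(source, customer):
    # `source` is one of the two fixed names above, never user input.
    query = f"SELECT total_sales FROM {source} WHERE customer = ?"
    return conn.execute(query, (customer,)).fetchone()[0]


summary_total = total_for("customer_summary", "C2")  # stale until the next refresh
view_total = total_for("customer_view", "C2")        # reflects the new sale
print(summary_total, view_total)
```

The summary table answers from stored results and misses the new sale, while the view pays the aggregation cost on every query and sees it.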
Scoring Routines (1/2)
Embedded Scoring
• Score
• Something generated from a predictive model, or any other type of output from
an analytic process
• Embedded Scoring
• Deploying each individual scoring routine
• A process to manage and track the various scoring routines
• Benefits
• Scores run in batches will be available on demand
• Real-time scoring
• Abstract complexity from users
• Have all the models contained in a centralized repository so they are all in one place
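A centralized repository of scoring routines might look roughly like the sketch below. The `ScoringRoutine` and `ScoreRepository` classes, the version rule, and the two toy scoring functions are all hypothetical; they only illustrate deploying routines in one place and hiding model details from users.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ScoringRoutine:
    name: str
    version: int
    score: Callable[[dict], float]


class ScoreRepository:
    """Hypothetical central place where scoring routines are deployed."""

    def __init__(self) -> None:
        self._routines: Dict[str, ScoringRoutine] = {}

    def deploy(self, routine: ScoringRoutine) -> None:
        current = self._routines.get(routine.name)
        if current is not None and routine.version <= current.version:
            raise ValueError("a newer or equal version is already deployed")
        self._routines[routine.name] = routine

    def score(self, name: str, record: dict) -> float:
        # Users ask for a score by name; model details stay hidden.
        return self._routines[name].score(record)

    def deployed(self) -> Dict[str, int]:
        """Track which routines and versions are currently in place."""
        return {name: r.version for name, r in self._routines.items()}


repo = ScoreRepository()
repo.deploy(ScoringRoutine("churn", 1, lambda r: 0.8 if r["visited_cancel_page"] else 0.1))
repo.deploy(ScoringRoutine("response", 1, lambda r: min(1.0, 0.2 * r["offer_views"])))

customer = {"visited_cancel_page": True, "offer_views": 2}
print(repo.score("churn", customer), repo.score("response", customer))
print(repo.deployed())
```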
Scoring Routines (2/2)
Model and Score Management
• Model and score management procedures will need to be in place to
scale the use of models by an organization
Analytic Data Set Inputs
Model Definitions
The Evolution of Analytic Tools and Methods
Outline
• Introduction
• The Evolution of Analytic Methods
• The Evolution of Analytic Tools
Introduction
• Analytic professionals have used a range of tools over the years
• Execute analytic algorithms
• Assess the results
But Now
The Evolution of Analytic Methods(1/7)
[Figure: a sophisticated algorithm and a naïve algorithm each produce an output; labeled NOW]
The Evolution of Analytic Methods(2/7)
Ensemble Methods
[Figure: logistic regression, decision tree, and neural network → aggregator → final result]
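The aggregation step of an ensemble can be sketched with a majority vote. The three functions below are made-up rule-based stand-ins, not real trained models, and majority voting is just one possible aggregator (averaging predicted probabilities is another common choice).

```python
from collections import Counter

# Stand-ins for three trained models. Real logistic regression, decision
# tree, and neural network models would replace these made-up rules.
def logistic_regression_like(x):
    return 1 if 0.5 * x["visits"] + 0.1 * x["age"] > 5 else 0


def decision_tree_like(x):
    if x["visits"] > 4:
        return 1
    return 1 if x["age"] > 60 else 0


def neural_network_like(x):
    return 1 if x["visits"] * x["age"] > 150 else 0


MODELS = [logistic_regression_like, decision_tree_like, neural_network_like]


def aggregate(x):
    """Aggregator: majority vote over the individual model outputs."""
    votes = Counter(model(x) for model in MODELS)
    return votes.most_common(1)[0][0]


example = {"visits": 3, "age": 55}
print([m(example) for m in MODELS], "->", aggregate(example))
```

With an odd number of binary models there is always a strict majority, so the vote never ties.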
The Evolution of Analytic Methods(3/7)
The Evolution of Analytic Methods(4/7)
Commodity Model
• A commodity model is produced rapidly
• A commodity modeling process stops when something good enough
is found
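The "stop when good enough" idea can be expressed as an early-exit search. This is a sketch under assumptions: the candidate names, the quality numbers, and the threshold of 90 are invented, and `evaluate` stands in for whatever real model evaluation would be used.

```python
def commodity_search(candidates, evaluate, good_enough):
    """Try candidates in order and stop at the first acceptable one."""
    tried = 0
    for name in candidates:
        tried += 1
        quality = evaluate(name)
        if quality >= good_enough:
            return name, quality, tried
    return None, None, tried


# Made-up quality scores standing in for real model evaluation.
FAKE_QUALITY = {"quick baseline": 72, "stepwise regression": 91, "hand-tuned model": 99}

best = commodity_search(
    candidates=["quick baseline", "stepwise regression", "hand-tuned model"],
    evaluate=FAKE_QUALITY.get,
    good_enough=90,
)
print(best)
```

The search returns the second candidate and never evaluates the most expensive one, which is the point of a commodity process.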
[Figure: quality scale where 100 is GREAT and 90 is SOMETIMES ACCEPTABLE]
The Evolution of Analytic Methods(5/7)
Uses for Commodity Models
VS
The Evolution of Analytic Methods(6/7)
Text Analysis
• Analysis of text and other unstructured data sources is growing
rapidly
• Some structure is applied to unstructured data when it is processed
• Structured results are what is analyzed
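A toy parser shows how free text becomes a structured record. The word lists and the three output fields are assumptions made for this example; a real text-analysis tool would use a far richer vocabulary and linguistic processing.

```python
import re
from collections import Counter

# A tiny, made-up vocabulary; a real parser would be far richer.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"broken", "slow", "stinks"}


def parse(text):
    """Turn free text into a structured record that can be analyzed."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {
        "word_count": len(words),
        "positive_terms": sum(counts[w] for w in POSITIVE),
        "negative_terms": sum(counts[w] for w in NEGATIVE),
    }


comments = [
    "Love it, great screen and fast shipping",
    "Arrived broken and support was slow",
]
structured = [parse(c) for c in comments]
for row in structured:
    print(row)
```

The analysis then runs on the numeric columns, not on the raw text.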
[Figure: a parser produces structured data]
The Evolution of Analytic Methods(7/7)
Ambiguity
• Applying context to the text is no easy task
• read a book vs book a ticket
• Emphasis can change the meaning
Varying the emphasis | Changes the meaning
I didn’t say Bill’s book stinks (emphasis on “I”) | But my buddy Bob did!
I didn’t say Bill’s book stinks (emphasis on “didn’t”) | How dare you accuse me of such a thing
I didn’t say Bill’s book stinks (emphasis on “say”) | But I admit that I did write it in an e-mail
I didn’t say Bill’s book stinks (emphasis on “Bill’s”) | It’s that other guy’s book that stinks
I didn’t say Bill’s book stinks (emphasis on “book”) | I said his blog stinks
I didn’t say Bill’s book stinks (emphasis on “stinks”) | I simply said it wasn’t my favorite
The Evolution of Analytic Tools(1/7)
Previous Tools
• Analytics work was done against a mainframe in the 1980s
• Not user-friendly
• Analysts had to program code directly to do analytics
The Evolution of Analytic Tools(2/7)
Graphical User Interface
The Evolution of Analytic Tools(3/7)
The Explosion of Point Solutions
The Evolution of Analytic Tools(4/7)
Open Source
The Evolution of Analytic Tools(5/7)
More object-oriented
Programming is intensive
The Evolution of Analytic Tools(6/7)
Data Visualization
• An effective visualization can make a pattern jump right off the page
at you
• Today’s visualization tools allow
• Multiple tabs
• Linking the graphs and charts with underlying data
• New ideas for data visualization
• 3-D
The Evolution of Analytic Tools(7/7)
THANK YOU