
AmCham Predictive Analytics

Predictive Analytics and Machine Learning in Business
J. Alberto Espinosa, Ph.D. (alberto@american.edu)
Information Technology and Analytics Department
Presented on 6/4/2020

Dr. Espinosa is a Professor, former Chair and curriculum coordinator of the Information Technology and Analytics Department at the Kogod School of Business at American University. He holds Ph.D. and M.S. degrees in Information Systems from the Tepper School of Business at Carnegie Mellon University; a Master’s degree in Business Administration from Texas Tech University; and a Mechanical Engineering degree from Universidad Catolica, Peru. He is the lead architect of Kogod’s MS Analytics (campus and online) programs and undergraduate analytics specializations. His research focuses on coordination and performance in global technical projects across global boundaries, particularly distance and time separation (e.g., time zones), and the visual and quantitative representation of knowledge networks through analytics. His work has been published in leading scholarly journals, including: Management Science; Organization Science; Information Systems Research; the Journal of Management Information Systems; IEEE Transactions on Software Engineering; IEEE Transactions on Engineering Management; Communications of the ACM; Information, Technology and People; and Software Process: Improvement and Practice. He is also a frequent presenter at leading academic conferences. He has over 60 published articles, book chapters and books, with over 5,000 Google Scholar citations.

A Few Popular Quotes

“In God we trust, all others must bring data”
W. Edwards Deming, engineer and statistician

“Torture the data and it will confess to anything”
Ronald Coase, Economics, Nobel Prize

“Our decisions are based on data and science”
Most politicians these days

Data, Science, Models?

[Diagram: Business Problem / Question (Quantitative, Classification) → Data (Structured (Tables), Big Data, Text, Unstructured) → Models (Model Specification, Model Assumptions, Model Method; Interpretable Model, Black Box Model, Machine Learning) → Results → Report (Meaning, Storytelling)]

Moneyball

Billy Beane (Oakland A’s) incorporated analytics and statistics (“sabermetrics”), reaching record profits in baseball

The Age of Analytics

• Without question, we are now in “the age of analytics, machine learning, artificial intelligence and big data”
• Everybody talks about these, everyone claims to be doing it, but how many really know it?
• What are the best practices?
• Everyone is shaking the tree, but who is making jam (asked Tom Davenport)? What are the secret sauces?
• What is the body of knowledge that constitutes the field of analytics?

Analytics

“It is the scientific process of transforming data into insight for making better decisions” (INFORMS)

Lack of analytical talent – by 2018, the US faces a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts to analyze big data and make decisions (McKinsey Global Institute, 2011) – 2018 is over, is it true?

The Age of Analytics, McKinsey Global Institute, 2016

Analytics Types (INFORMS)

• Descriptive: getting meaning from the data – e.g., descriptive statistics, data mining, cluster analysis, market basket analysis, etc.
• Predictive: using some data to predict outcomes (quantitative or classification) – e.g., projecting sales/profits, disease survival prognosis, probability of losing/gaining a client, probability of a security breach, etc.
• Prescriptive: decision models that use data to recommend what to do to achieve outcomes – e.g., optimal pricing, where to drill for oil, selecting job applicants, etc.

Analytics Methods

• Association: models based on how variables co-vary or correlate – e.g., how much does annual income increase with each year of additional university education?
• Classification: models that classify observations or predict the probability that a case will fall in a particular class – e.g., yes/no, positive medical diagnosis, loan default, cyber security breach, etc.
• Visual: e.g., Tableau, SAS, SAP Lumira, R, Python
• Network Analytics: (visual & quantitative) relationships (people, entities, etc.), social networks
• Unstructured Data Analysis: text mining, natural language processing, social media, etc.
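The association and classification methods listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the education/income figures and the logistic coefficients are made up for the example and are not from the presentation. An OLS slope answers the "how much does income increase per year of education" question, and a logistic function turns a score into a class probability.

```python
import math

def ols_slope_intercept(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (closed form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def logistic_probability(intercept, coef, x):
    """Probability of the positive class under a one-variable logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * x)))

# Association: hypothetical data (years of education, income in $ thousands).
years_education = [12, 14, 16, 18, 20]
income_thousands = [35, 42, 50, 58, 65]
intercept, slope = ols_slope_intercept(years_education, income_thousands)
print(f"Each extra year of education is associated with ~${slope:.1f}k more income")

# Classification: hypothetical coefficients for a loan-default model
# driven by a debt-to-income percentage.
default_probability = logistic_probability(-4.0, 0.08, 50)
print(f"Estimated default probability: {default_probability:.2f}")
```

The slope describes association only; it does not by itself establish that education causes the income difference.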

CRISP-DM

The Cross-Industry Standard Process for Data Mining

[Diagram: CDM-1 Business Understanding → CDM-2 Data Understanding → CDM-3 Data Preparation → CDM-4 Modeling → CDM-5 Evaluation → CDM-6 Deployment, arranged around Data]

The Analytics Life Cycle (Espinosa)

Step 1 – Business Understanding (CDM-1)
(1) Formulate Business Case and Question
(2) Translate into Analytics Question:
a. Quantitative (regression, regression trees, etc.) or
b. Classification (e.g., logistic regression, classification trees, etc.)

Step 2 – Data Work (CDM-2, 3)
Identify & gather data (structured, unstructured, visual, etc.)
Pre-process data: cleanse, prepare, transform, format, etc.
Descriptive Analytics: familiarize with and analyze the data; unsupervised learning; identify patterns: descriptive statistics, correlation, ANOVA, cluster analysis, etc.

Step 3 – Select Model Method and Model Specification (CDM-4)
Predictive Analytics – predict outcomes; supervised learning:
Goals: Inference; Interpretation of results; accurate Prediction of outcomes
Model Selection: OLS assumptions, suitable model, cross-validation
Model Specification: business domain; complex vs. parsimonious; variance vs. bias; dimensionality; etc.
Prescriptive Analytics – decision models; optimization, etc. (not covered)
Goals: inform best courses of actions

Step 4 – Analysis (CDM-5)
Analysis goals: Inference; interpretation; prediction
Evaluation: fit statistics; cross-validation; etc.

Step 5 – Reporting (CDM-6)
Written, interactive, visual, “storytelling”, etc.
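Steps 3 and 4 mention cross-validation for model selection and evaluation. A minimal sketch of k-fold cross-validation follows, assuming a tiny made-up dataset and two candidate specifications (a mean-only baseline and a one-variable OLS line). The function names and data are hypothetical, chosen only to show the mechanics.

```python
def fit_mean_only(x_train, y_train):
    """Baseline specification: always predict the training mean."""
    mean_y = sum(y_train) / len(y_train)
    return lambda x: mean_y

def fit_simple_ols(x_train, y_train):
    """One-variable OLS specification: y = intercept + slope * x."""
    n = len(x_train)
    mean_x = sum(x_train) / n
    mean_y = sum(y_train) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x_train)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x_train, y_train))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def k_fold_mse(x, y, fit, k=5):
    """Average squared prediction error over k held-out folds."""
    n = len(x)
    squared_error = 0.0
    for fold in range(k):
        # Every k-th observation goes to the test fold, so each row
        # is held out exactly once across the k folds.
        test_idx = set(range(fold, n, k))
        x_train = [x[i] for i in range(n) if i not in test_idx]
        y_train = [y[i] for i in range(n) if i not in test_idx]
        model = fit(x_train, y_train)
        squared_error += sum((y[i] - model(x[i])) ** 2 for i in test_idx)
    return squared_error / n

# Hypothetical data: a linear trend plus small fixed deviations.
x = list(range(1, 11))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
y = [2 * xi + 1 + e for xi, e in zip(x, noise)]

mse_mean_only = k_fold_mse(x, y, fit_mean_only)
mse_ols = k_fold_mse(x, y, fit_simple_ols)
print(f"Cross-validated MSE, mean only: {mse_mean_only:.2f}")
print(f"Cross-validated MSE, OLS line:  {mse_ols:.2f}")
```

The specification with the lower cross-validated error would be preferred for prediction. In practice the folds are usually assigned at random rather than by position.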

Analytics Modeling Options

Descriptive
Structured: cluster analysis, correlation, market basket analysis, sample statistics, ANOVA
Visual, Text, Unstructured, etc.: bubble charts, network diagrams, natural language processing, clustering dendrograms, etc.

Predictive – Quantitative Value
Association: Regression
Decision Tree: Regression Trees
Visual, Text, Unstructured, etc.: regression plots, scatter plots, Tableau diagrams, trend charts, etc.

Predictive – Classification
Association: Logistic Regression; Other Categorical Regression Models
Decision Tree: Classification Trees
Visual, Text, Unstructured, etc.: decision tree charts, tree maps, interactive diagrams

Prescriptive
Structured: operations research, decision modeling, optimization, linear programming
Visual, Text, Unstructured, etc.: simulations, etc.

Regression
• Ordinary Least Squares
• Weighted Least Squares
• Generalized Linear Model
• Ridge Regression
• LASSO Regression
• Principal Components Regression
• Partial Least Squares
• Non-Linear Models
• Piecewise Linear Regression
• Step Regression
• Linear, Cubic, Smoothing Splines
• Neural Networks
• Structural Equation Modeling
• Etc.

Trees
• Regression Trees
• Classification Trees
• Bootstrap Aggregation
• Random Forest
• Boosting
• Ensemble Models
• Etc.

Classification
• Binomial Logistic Regression
• Multinomial Logistic Regression
• Probit Regression
• Linear Discriminant Analysis
• Quadratic Discriminant Analysis
• Neural Networks
• Etc.
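To make one contrast among the regression options concrete, the sketch below compares an Ordinary Least Squares slope with a Ridge Regression slope for a single centered predictor. It assumes the usual ridge setup with an unpenalized intercept, and the ad-spend/sales numbers and the penalty value are hypothetical.

```python
def penalized_slope(x, y, penalty=0.0):
    """Slope of y on one centered predictor x with an L2 (ridge) penalty.

    With penalty=0 this is the ordinary least squares slope.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / (sxx + penalty)

# Hypothetical data: advertising spend vs. sales.
ad_spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

ols_slope = penalized_slope(ad_spend, sales)                  # no penalty
ridge_slope = penalized_slope(ad_spend, sales, penalty=10.0)  # shrunk toward 0
print(f"OLS slope:   {ols_slope:.3f}")
print(f"Ridge slope: {ridge_slope:.3f}")
```

The penalty shrinks the coefficient toward zero, trading some bias for lower variance. The penalty value itself is typically chosen by cross-validation.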

Interpretable and Black Box Models

Interpretable (or Explainable) Models
Easy to explain how the model works and what the results mean → e.g., OLS regression, logistic regression, decision trees → good for interpretability and inference

Black Box Models (non-interpretable)
The complexities of the internal model are difficult to explain → e.g., neural networks, deep learning → good for predictive accuracy → Machine Learning

Machine Learning & Artificial Intelligence

[Diagram: Artificial Intelligence (What-If Logic, Expert Systems, Machine Learning) → Machine Learning: Unsupervised Learning (Descriptive Analytics) and Supervised Learning (Predictive Analytics) → Statistical Learning (Interpretable vs. Black Box; Train Model + Test Accuracy; Regression Models, Tree Models) and Deep Learning (Natural Language Processing, Neural Networks, TensorFlow, etc.) → Quantitative Predictions, Classification Predictions]
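As a small illustration of what "interpretable" means, the sketch below fits a one-split regression tree (a "stump") and prints its rule in plain language. The customer tenure and spend values are invented for the example, and the helper name is hypothetical.

```python
def fit_regression_stump(x, y):
    """Find the single split on x that minimizes the sum of squared errors."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no valid threshold between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yi for _, yi in pairs[:i]]
        right = [yi for _, yi in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

# Hypothetical data: customer tenure (months) vs. monthly spend ($).
tenure_months = [1, 2, 3, 10, 11, 12]
monthly_spend = [10, 12, 11, 30, 31, 29]

threshold, left_mean, right_mean = fit_regression_stump(tenure_months, monthly_spend)
print(f"If tenure <= {threshold} months, predict ${left_mean:.0f}; "
      f"otherwise predict ${right_mean:.0f}")
```

The whole model is one readable rule, which is the appeal of tree models for interpretation. A neural network fitted to the same data offers no comparably direct explanation.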

Machine Learning

Is a method of data analysis that uses algorithms that learn from data iteratively, allowing computers to find hidden insights without being explicitly programmed (SAS: http://www.sas.com/en_us/insights/analytics/machine-learning.html)

➢ Unsupervised learning → data exploration without specific goals (closely associated with descriptive analytics and data mining, e.g., clustering, correlation analysis, histograms)
➢ Supervised learning → data analysis with specific goals in mind (closely associated with analytics, e.g., regression) → not much different than predictive modeling

Predictive Modeling and Machine Learning Illustration

[Diagram: Predictive Model: Dataset → Model → Predict → Outcome. Machine Learning: Split the Data into a Train Subset (Train Model) and a Test Subset (Test Accuracy), then Predict the outcome for New Data]
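The split / train / test / predict flow in the illustration can be sketched as follows. This is a minimal example under stated assumptions: the data are made up, the 70/30 split and the seed are arbitrary, and a one-nearest-neighbor classifier stands in for whatever model would be used in practice.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows and split them into train and test subsets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predict_nearest(train_rows, x):
    """1-nearest-neighbor: return the label of the closest training row."""
    nearest = min(train_rows, key=lambda row: abs(row[0] - x))
    return nearest[1]

# Hypothetical dataset of (feature, class label) rows.
dataset = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0),
           (8.0, 1), (8.5, 1), (9.0, 1), (9.5, 1), (10.0, 1)]

# Split the data, "train" on one subset, test accuracy on the other.
train_rows, test_rows = train_test_split(dataset)
correct = sum(predict_nearest(train_rows, x) == label for x, label in test_rows)
accuracy = correct / len(test_rows)
print(f"Test accuracy: {accuracy:.2f}")

# Predict the outcome for new, unlabeled data.
new_prediction = predict_nearest(train_rows, 9.2)
print(f"Predicted class for new observation 9.2: {new_prediction}")
```

Accuracy is measured only on rows the model never saw during training, which is what makes it a fair estimate of performance on new data.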

Machine Learning Business Applications

• Marketing – Hello Barbie, Starbucks, Target
• Arts – Chef Watson (IBM), music generation
• Energy – predictive maintenance, failure prevention, conservation
• Financial Services – American Express, Capital One
• Healthcare – Pandemic models, tumor detection, radiology
• Manufacturing – Volvo vehicle monitoring, self-driving vehicles
• Media – Alexa, Netflix, Spotify
• Retail – Amazon recommend, smart malls, consumer behavior
• Service – Virtual assistants, chat bots, Disney’s MagicBands
• Social Media – Twitter user preferences, Facebook likes

Big Data’s 3 V’s (Doug Laney, Gartner)

– Volume: The amount/size of data
– Velocity: The pace at which data arrives, changes and/or loses relevance/value
– Variety: The different types of data represented in the data – structured (e.g., tables), unstructured (e.g., text, social network), visual, multimedia
– Other V’s proposed: veracity, variability (i.e., data at rest vs. in motion), value, etc.

Data Science Zoo

[Diagram: overlapping fields – Mathematics & Statistics; Software Programming (R, Python); Business; Information Technology (IT); Computer Science & Artificial Intelligence; Database & Data Engineering. Labeled overlaps: Data Science, Machine Learning, Analytics, Business Intelligence, Predictive Modeling, Predictive Analytics, Data Mining, Big Data]

Analytics Body of Knowledge

[Diagram: three layers.
Business Domains: Social Media, Healthcare, Marketing, Fraud & Forensics, Finance, Data Governance, Cyber Security, Policy.
Analytics Core: Descriptive Analytics, Predictive Analytics, Prescriptive Analytics, Machine Learning, Data Mining, Text Mining, Data Visualization, Business Intelligence.
Foundations: Math & Statistics; Software & Tools; Database & Data Warehousing; Programming (R, Python)]

IST & Business Analytics @ Kogod

Information Systems and Technology (IST)
BSBA IST Specialization
https://www.american.edu/kogod/undergraduate/business-administration.cfm
IST Minor
https://www.american.edu/kogod/undergraduate/it-minor.cfm

Business Analytics
BSBA Analytics Specialization (coming soon)
Business Analytics Minor (coming soon)
MS Analytics (STEM) http://fs2.american.edu/alberto/www/analytics/
BS/MS Dual Degree option → double count 9 credits

Training Modules

Questions
