Chapter 6 Introduction To Predictive Analytics

Definitions of Predictive Analytics
HISTORY OF PREDICTIVE ANALYTICS
Who discovered predictive analytics?

• In 2003, SPSS senior management and marketing, under the leadership of Jack Noonan, Dyke Hensen, and Matt
Cutler, coined the phrase "Predictive Analytics" to explain to the market and to analysts how SPSS differed
from BI companies like BO and Cognos.
• Predictive analytics first started in the 1940s, as governments began using early computers.
Though it has existed for decades, predictive analytics has now developed into a concept whose time
has come.
• In 1689, predictive analytics was used by Lloyd's to underwrite insurance for sea voyages. Using
data, the company would accept the risk of sea voyages in return for a premium. Lloyd's used data sets of past
trips to evaluate the risk of these voyages and predict patterns of liability. Lloyd's continues to use
predictive models in all facets of its insurance underwriting, and the idea has become general practice in the
insurance industry.
• Predictive analytics has evolved greatly since the days of Arnold Daniels and Lloyd's, but
the drive remains the same: to use data and patterns to decrease description cost, increase accuracy, and
provide managers with the tools to make the right decision the first time.
www.computerhope.com/issues/ch000984.htm
www.afterinc.com/brief-history-predictive-analytics-part-
What is predictive analytics?
The term predictive analytics refers to the use of statistics and
modeling techniques to make predictions about future outcomes
and performance. Predictive analytics looks at current and
historical data patterns to determine if those patterns are likely
to emerge again. This allows businesses and investors to adjust
where they use their resources to take advantage of possible
future events. Predictive analysis can also be used to improve
operational efficiencies and reduce risk.
https://www.investopedia.com/terms/p/predictive-analytics.asp
KEY TAKEAWAYS
• Predictive analytics uses statistics and modeling techniques to determine future
performance.
• Industries and disciplines, such as insurance and marketing, use predictive techniques
to make important decisions.
• Predictive models help make weather forecasts, develop video games, translate
voice to text for messaging, support customer service decisions, and develop investment portfolios.
• People often confuse predictive analytics with machine learning even though the two
are different disciplines.
• Types of predictive models include decision trees, regression, and neural networks.
https://www.investopedia.com/terms/p/predictive-analytics.asp
Types of Predictive Analytical Models
Decision Trees: If you want to understand what leads to someone's decisions, then you may find decision trees
useful. This type of model places data into different sections based on certain variables, such as
price or market capitalization. Just as the name implies, it looks like a tree with individual branches
and leaves. Branches indicate the choices available while individual leaves represent a particular
decision.
Regression: This is the model that is used the most in statistical analysis. Use it when you want to determine
patterns in large sets of data and when there's a linear relationship between the inputs. This method
works by figuring out a formula, which represents the relationship between all the inputs found in the
dataset. For example, you can use regression to figure out how price and other key factors can shape
the performance of a security.
Neural Networks: Neural networks were developed as a form of predictive analytics by imitating the way the human brain
works. This model can deal with complex data relationships using artificial intelligence and pattern
recognition. Use it if you have several hurdles that you need to overcome, like when you have too much
data on hand, when you don't have the formula you need to help you find a relationship between the
inputs and outputs in your dataset, or when you need to make predictions rather than come up with
explanations.
Decision Trees
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences,
including chance event outcomes.
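As a minimal sketch of this idea, the Python snippet below evaluates a small decision tree that mixes decisions with chance events by rolling expected values back from the leaves. The tree, its payoffs, and its probabilities are hypothetical values chosen only for illustration.

```python
# Hypothetical decision tree: payoffs and probabilities are invented for illustration.
# A node is either a number (a final payoff), a chance node, or a decision node.

def expected_value(node):
    """Roll the tree back: average over chance branches, pick the best option at decisions."""
    if isinstance(node, (int, float)):
        return node  # leaf: a final payoff
    if node["type"] == "chance":
        return sum(p * expected_value(child) for p, child in node["branches"])
    if node["type"] == "decision":
        return max(expected_value(child) for child in node["options"].values())
    raise ValueError("unknown node type")

def best_option(decision_node):
    """Name of the option with the highest expected value."""
    options = decision_node["options"]
    return max(options, key=lambda name: expected_value(options[name]))

tree = {
    "type": "decision",
    "options": {
        # Chance event: 60% chance of gaining 100, 40% chance of losing 50.
        "launch product": {"type": "chance", "branches": [(0.6, 100), (0.4, -50)]},
        "do nothing": 0,
    },
}

print(best_option(tree), expected_value(tree))
```

Each chance node contributes the probability-weighted average of its outcomes, and each decision node takes the option with the highest value.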
Regression
A regression is a statistical technique that relates a dependent variable to one or more independent (explanatory)
variables. A regression model is able to show whether changes observed in the dependent variable are associated with
changes in one or more of the explanatory variables.
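A minimal sketch of the simplest case, one explanatory variable fitted by ordinary least squares, might look like the following. The spend and sales figures are hypothetical, and the function name fit_line is an assumption made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one explanatory variable: y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend (explanatory) and sales (dependent).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_line(spend, sales)
predicted = intercept + slope * 6.0  # prediction for a new spend value
print(round(intercept, 2), round(slope, 2), round(predicted, 2))
```

The slope estimates how much the dependent variable changes per unit change in the explanatory variable, which is the association the definition above describes.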
Neural Networks
Neural networks can help computers make intelligent decisions with limited human assistance.
A neural network is a method in artificial intelligence that teaches computers to process data in a way
that is inspired by the human brain. It is a type of machine learning process, called deep learning, that
uses interconnected nodes or neurons in a layered structure that resembles the human brain.
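To make the layered structure concrete, here is a rough sketch of a forward pass through a tiny network with one hidden layer. The weights are hypothetical and untrained, and training (adjusting the weights from data) is deliberately left out.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron sums its weighted inputs, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, network):
    """Pass the inputs through each layer in turn."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical, untrained weights chosen only for illustration.
network = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),  # hidden layer: 2 inputs -> 2 neurons
    ([[1.2, -0.7]], [0.05]),                   # output layer: 2 neurons -> 1 output
]

output = forward([1.0, 2.0], network)
print(output)
```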
Predictive models are used for all kinds of applications,
including:
•Weather forecasts
•Creating video games
•Translating voice to text for mobile phone messaging
•Customer service
•Investment portfolio development
What are examples of predictive analytics in business?
Predictive analytics models may be able to identify correlations between
sensor readings. For example, if the temperature reading on a machine
correlates with the length of time it runs on high power, those two
readings combined may indicate that the machine is at risk of downtime. The model
predicts the machine's future state from its sensor values.
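One way to quantify such a relationship is the Pearson correlation coefficient. The sketch below computes it for two hypothetical sensor series; the readings are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical sensor readings taken at the same moments.
temperature_c = [60, 65, 71, 78, 86, 95]
high_power_minutes = [5, 10, 15, 20, 25, 30]

r = pearson(temperature_c, high_power_minutes)
print(round(r, 3))
```

A value close to 1 means the two readings rise together, close to -1 means one falls as the other rises, and close to 0 means no linear relationship.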
Why is predictive analytics important?
Detecting fraud. Combining multiple analytics methods can improve pattern
detection and prevent criminal behavior. As cybersecurity becomes a growing concern,
high-performance behavioral analytics examines all actions on a network in real time to
spot abnormalities that may indicate fraud, zero-day vulnerabilities and advanced
persistent threats.
Optimizing marketing campaigns. Predictive analytics is used to predict customer
responses or purchases, as well as to promote cross-sell opportunities. Predictive models help
businesses attract, retain and grow their most profitable customers.
Improving operations. Many companies use predictive models to forecast inventory and
manage resources. Airlines use predictive analytics to set ticket prices. Hotels try to predict the
number of guests for any given night to maximize occupancy and increase revenue. Predictive
analytics enables organizations to function more efficiently.
Reducing risk. Credit scores are used to assess a buyer’s likelihood of default for purchases and
are a well-known example of predictive analytics. A credit score is a number generated by a
predictive model that incorporates all data relevant to a person’s creditworthiness. Other risk-related
uses include insurance claims and collections.
PREDICTIVE ANALYTIC TOOLS
Ericka Mae Q. Torres

Predictive Analytic Tools
• Predictive analytics is powered by several different models and algorithms
that can be applied to a wide range of use cases.
• Determining which predictive modeling techniques are best
for your company is key to getting the most out of a
predictive analytics solution and leveraging data to make
insightful decisions.
TOP 5 PREDICTIVE ANALYTICS MODELS
1. Classification Model
2. Clustering Model
3. Forecast Model
4. Outliers Model
5. Time Series Model

CLASSIFICATION MODEL
• This model works by categorizing information based on historical
data.
• It is used in many industries because it can be easily retrained with
new data and can provide a broad analysis for answering questions.
• It is excellent for answering yes/no questions.
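As an illustrative sketch of a yes/no classifier built from historical data, the snippet below uses k-nearest neighbors, a technique picked for this example and not named in the slides. The customer records and feature meanings are hypothetical.

```python
from collections import Counter

def knn_predict(history, query, k=3):
    """Label a new record by majority vote among its k nearest historical records."""
    by_distance = sorted(
        history,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical history: (support tickets, monthly charge) -> did the customer leave?
history = [
    ((1, 20), "no"), ((2, 25), "no"), ((3, 22), "no"),
    ((8, 70), "yes"), ((9, 80), "yes"), ((10, 75), "yes"),
]

print(knn_predict(history, (9, 72)))
print(knn_predict(history, (2, 21)))
```

"Retraining" here amounts to appending new labeled records to history. The features are left unscaled to keep the sketch short.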

CLUSTERING MODEL
• It separates a data set into a specific number of
clusters so that the data points within a cluster
have similar attributes.
• It works using two types of clustering:
  • Hard Clustering
  • Soft Clustering
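In hard clustering each data point belongs to exactly one cluster, while in soft clustering each point receives a degree of membership in every cluster. The sketch below contrasts the two for fixed, hypothetical cluster centers; the exponential weighting used for the soft memberships is one simple choice assumed for illustration.

```python
import math

def hard_assign(point, centers):
    """Hard clustering: the point belongs to exactly one cluster (the nearest center)."""
    distances = [math.dist(point, center) for center in centers]
    return distances.index(min(distances))

def soft_assign(point, centers):
    """Soft clustering: a membership weight for every cluster; the weights sum to 1."""
    scores = [math.exp(-math.dist(point, center) ** 2) for center in centers]
    total = sum(scores)
    return [score / total for score in scores]

# Hypothetical cluster centers.
centers = [(0.0, 0.0), (4.0, 0.0)]

print(hard_assign((1.0, 0.0), centers))   # index of the single winning cluster
print(soft_assign((1.0, 0.0), centers))   # mostly cluster 0, slightly cluster 1
print(soft_assign((2.0, 0.0), centers))   # halfway between the centers
```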
FORECAST MODEL
• It is one of the most widely used predictive analytics models.
• It deals with metric value prediction, estimating a numeric value for new
data based on learnings from historical data.
• It can also consider multiple input parameters.

OUTLIERS MODEL
• It is oriented around anomalous data entries within a dataset.
• It works by identifying unusual data, either in isolation or in relation
to different categories and numbers.
• It is effective in detecting fraud because it can be used to find
anomalies.
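A simple way to flag unusual values in isolation is the interquartile range (IQR) rule, sketched below. The rule and the 1.5 multiplier are a common convention chosen for this example, and the transaction amounts are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values lying more than k interquartile ranges outside the middle half of the data."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [12, 15, 14, 13, 16, 15, 14, 250]

flagged = iqr_outliers(amounts)
print(flagged)
```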
TIME SERIES MODEL
• It focuses on data where time is the input parameter.
• It works by using different data points (taken from the previous year’s
data) to develop a numerical metric that will predict trends within a
specified period.
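As a rough sketch, a simple moving average is about the smallest time series model there is: it forecasts the next period as the mean of the most recent observations. The technique is chosen here for illustration, and the monthly figures are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    return sum(series[-window:]) / window

# Hypothetical monthly sales, oldest first.
monthly_sales = [100, 110, 120, 130, 140, 150]

forecast = moving_average_forecast(monthly_sales)
print(forecast)
```

Because it only averages recent values, this forecast lags behind a steady upward trend; models that capture trend and seasonality are needed for data like that.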
PREDICTIVE PROBLEMS
COMMON PREDICTIVE ALGORITHMS
Machine Learning – involves structured data of the kind we see in a table.
Algorithms for this comprise both linear and nonlinear varieties.
Deep Learning – a subset of machine learning that is more commonly used to
deal with audio, video, text, and images.
COMMON ALGORITHMS
1. Random Forest
2. Generalized Linear Model (GLM)
3. Gradient Boosted Model (GBM)
4. K-Means
5. Prophet
RANDOM FOREST
• One of the most popular classification algorithms, capable of both
classification and regression.
• It can accurately classify large volumes of data.
• The name “Random Forest” is derived from the fact that the algorithm
is a combination of decision trees. Each tree depends on the values of
a random vector sampled independently with the same distribution for
all trees in the “forest”.
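The sketch below illustrates the idea in a heavily simplified form: each "tree" is a one-split stump trained on a bootstrap sample of the rows and on one randomly chosen feature, and the forest predicts by majority vote. Real random forests grow full trees; the data, labels, and function names here are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest training rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def train_forest(rows, labels, n_trees=25, seed=0):
    """Each stump sees its own random draw: a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # rows drawn with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump([rows[i] for i in sample], [labels[i] for i in sample], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical, well-separated training data with two features.
rows = [(1, 2), (2, 1), (3, 3), (2, 3), (8, 9), (9, 8), (10, 10), (9, 9)]
labels = ["low risk"] * 4 + ["high risk"] * 4

forest = train_forest(rows, labels)
print(predict_forest(forest, (1, 1)), predict_forest(forest, (10, 10)))
```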
Advantages of Random Forest
• Accurate and efficient when running on large databases
• Multiple trees reduce the variance and bias of a smaller set or single tree
• Resistant to overfitting
• Can handle thousands of input variables without variable deletion
• Can estimate what variables are important in classification
• Provides effective methods for estimating missing data
• Maintains accuracy when a large proportion of the data is missing

GENERALIZED LINEAR MODEL (GLM)
• It is a more complex variant of the General Linear Model.
• It takes the latter model’s comparison of the effects of multiple
variables on continuous variables and then draws from an array of
different distributions to find the “best fit” model.
• It is also able to deal with categorical predictors, while being
relatively straightforward to interpret.
GRADIENT BOOSTED MODEL (GBM)
• It produces a prediction model composed of an ensemble of decision
trees before generalizing.
• It is used for the classification model.
• It builds its trees one at a time, with each new tree helping to correct the errors of the trees before it.
• Data is more expressive, and benchmarked results show that the GBM
method is preferable in terms of the overall thoroughness of the data.
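The one-tree-at-a-time idea can be sketched as follows. For simplicity this example fits a numeric target with squared error, not a classifier, and uses one-split stumps as the trees; the data, learning rate, and function names are assumptions made for illustration. It assumes at least two distinct input values.

```python
def fit_stump(xs, residuals):
    """Best single split on x (least squares): returns (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Add stumps one at a time, each fitted to what the current ensemble still gets wrong."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_value(stump, x) for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(stump, x) for stump in stumps)

# Hypothetical data with a jump between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.2]

model = boost(xs, ys)
print(round(predict(model, 1), 2), round(predict(model, 6), 2))
```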
K-MEANS
• It is a highly popular machine-learning algorithm that places
unlabeled data points into groups with similar characteristics.
• It’s a high-speed predictive algorithm primarily used in clustering
models.
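A minimal sketch of the algorithm alternates two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. The points and the starting centers below are hypothetical; starting centers are passed in explicitly to keep the example deterministic.

```python
import math

def k_means(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and recomputing the centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for point in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
            clusters[nearest].append(point)
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical unlabeled points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]

centers, clusters = k_means(points, centers=[(0, 0), (10, 10)])
print(centers)
print(clusters)
```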
PROPHET
• It is used in the time series and forecast models.
• It is an open-source algorithm developed by Facebook, used internally
by the company for forecasting.
• It is extensively used in capacity planning, such as allocating resources or
setting goals for a business. Manual forecasting
requires more human effort and hours to produce accurate
outputs yet provides inconsistent results; Prophet is an innovative and
valuable alternative.
DATA PRE-PROCESSING
Jamaica R. Zara
[Diagram: DATA COLLECTION and the pre-processing steps: Discretization or Normalization, Feature Selection, Noise Reduction, Outlier Detection, Instance Selection, Missing Value Imputation]
OUTLIER DETECTION
• Global outliers
• Collective outliers
• Contextual outliers
INSTANCE SELECTION
MISSING VALUE IMPUTATION
1. SOME OF OUR FIELD VALUES ARE MISSING
     mpg     cubic inches  hp   brand
1    14.000  350           165  US
2    31.900  (blank)       71   EUROPE
3    17.000  302           140  US
4    15.000  400           150  (blank)
5    37.700  89            62   JAPAN

2. REPLACING MISSING FIELD VALUES WITH USER-DEFINED CONSTANTS
     mpg     cubic inches  hp   brand
1    14.000  350           165  US
2    31.900  0             71   EUROPE
3    17.000  302           140  US
4    15.000  400           150  MISSING
5    37.700  89            62   JAPAN

3. REPLACING MISSING FIELD VALUES WITH MEANS OR MODES
     mpg     cubic inches  hp   brand
1    14.000  350           165  US
2    31.900  200.65        71   EUROPE
3    17.000  302           140  US
4    15.000  400           150  US
5    37.700  89            62   JAPAN

4. REPLACING MISSING FIELD VALUES WITH RANDOM DRAWS FROM THE DISTRIBUTION OF THE VARIABLE
     mpg     cubic inches  hp   brand
1    14.000  350           165  US
2    31.900  450           71   EUROPE
3    17.000  302           140  US
4    15.000  400           150  JAPAN
5    37.700  89            62   JAPAN
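The three replacement strategies can be sketched in Python as follows, using only the five records shown on the slides. The helper names observed and impute are hypothetical. Note that the numbers will differ from the slides: the mean of the four visible cubic-inch values is 285.25, so the slide values of 200.65 and 450 presumably come from a larger data set (an assumption, since the full data is not shown). The random draw here samples from the observed values only.

```python
import random
from statistics import mean, mode

# The five records from the slides; None marks a missing field value.
records = [
    {"mpg": 14.0, "cubic_inches": 350, "hp": 165, "brand": "US"},
    {"mpg": 31.9, "cubic_inches": None, "hp": 71, "brand": "EUROPE"},
    {"mpg": 17.0, "cubic_inches": 302, "hp": 140, "brand": "US"},
    {"mpg": 15.0, "cubic_inches": 400, "hp": 150, "brand": None},
    {"mpg": 37.7, "cubic_inches": 89, "hp": 62, "brand": "JAPAN"},
]

def observed(rows, field):
    """All non-missing values of a field."""
    return [row[field] for row in rows if row[field] is not None]

def impute(rows, field, fill):
    """Return copies of the rows with missing values of `field` replaced by fill()."""
    return [{**row, field: fill() if row[field] is None else row[field]} for row in rows]

# Strategy 1: user-defined constants.
constant = impute(impute(records, "cubic_inches", lambda: 0), "brand", lambda: "MISSING")

# Strategy 2: mean for the numeric field, mode for the categorical field.
cubic_mean = mean(observed(records, "cubic_inches"))
brand_mode = mode(observed(records, "brand"))
central = impute(impute(records, "cubic_inches", lambda: cubic_mean), "brand", lambda: brand_mode)

# Strategy 3: random draws from the observed values of each variable.
rng = random.Random(42)
drawn = impute(
    impute(records, "cubic_inches", lambda: rng.choice(observed(records, "cubic_inches"))),
    "brand",
    lambda: rng.choice(observed(records, "brand")),
)

print(constant[1]["cubic_inches"], constant[3]["brand"])
print(central[1]["cubic_inches"], central[3]["brand"])
print(drawn[1]["cubic_inches"], drawn[3]["brand"])
```

Constants are easy to spot later but distort statistics, means and modes preserve the center but shrink the spread, and random draws preserve the spread at the cost of adding noise.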
