
Data Mining
Prediction
Kevin Swingler

What is Prediction?

• Predicting the identity of one thing based purely on the description of another, related thing
• Not necessarily future events, just unknowns
• Based on the relationship between a thing that you can know and a thing you need to predict

Terms

Predictor => Predicted
• When building a predictive model, you have data covering both
• When using one, you have data describing the predictor and you want it to tell you the predicted value

How Does it Differ From Classification?

• A classification problem could be seen as a predictor of classes, but …
• Predicted values are usually continuous whereas classifications are discrete.
• Predictions are often (but not always) about the future whereas classifications are about the present.
• Classification is more concerned with the input than the output

Usual Examples

• Predicting levels of sales that will result from a price change or advert.
• Predicting whether or not it will rain based on current humidity
• Predicting the colour of a pottery glaze based on a mixture of base pigments
• Predicting how far up the charts a single will go
• Predicting how much revenue a book of debt will bring

Techniques

• Most prediction techniques are based on mathematical models:
– Simple statistical models such as regression
– Non-linear statistics such as power series
– Neural networks, RBFs, etc
• All based on fitting a curve through the data, that is, finding a relationship from the predictors to the predicted

Simple Worked Example

• Predicting sales levels for a national newspaper

Predictors
– Price
– Front cover story
– Competitions
– Advertising spend

Predicted
– Sales in Units

The Data

Price  Cover      Competition  Advert spend  Sales
22     Political  No           5900          15392
39     Political  No           5100          14350
28     Sport      No           5700          14491
25     Sport      No           5600          14849
38     Royal      No           5400          14029
22     Royal      No           5900          15192
21     Crime      No           5500          15433
20     Royal      No           5400          15273
31     Royal      High Val     5700          14914
26     Royal      Low Val      5300          14596
23     Sport      No           5100          14742
23     Sport      High Val     5900          15147
21     Royal      No           5400          15032
29     Crime      Low Val      5800          14635
25     Sport      Low Val      5500          14449
24     Sport      Low Val      5900          14500
32     Crime      No           5800          14852
27     Sport      No           5700          14546
30     Sport      High Val     5600          14774
31     Royal      No           5500          14713
29     Political  High Val     5900          15435
39     Sport      No           5600          13753
32     Political  No           5900          14852
23     Royal      High Val     5600          15345
31     Sport      No           5800          14315
27     Royal      No           5900          14947
31     Sport      No           5300          14511

Sales increase as price decreases but other factors play a part too.

[Chart: Sales by Price, with sales (13500 to 16000) plotted against price (0 to 50)]
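As a rough illustration of the simplest technique mentioned earlier (regression), the sketch below fits a straight line to the Price and Sales columns of The Data slide. The pairing of each price with its sales figure follows the row order as reconstructed from the extracted slide, which is an assumption, and the helper name `predict_sales` is made up for this example. It uses price only, so it ignores the other predictors.

```python
import numpy as np

# Price and Sales columns from The Data slide (27 rows).
# Assumption: row order as reconstructed from the extracted slide text.
price = np.array([22, 39, 28, 25, 38, 22, 21, 20, 31, 26, 23, 23, 21, 29,
                  25, 24, 32, 27, 30, 31, 29, 39, 32, 23, 31, 27, 31],
                 dtype=float)
sales = np.array([15392, 14350, 14491, 14849, 14029, 15192, 15433, 15273,
                  14914, 14596, 14742, 15147, 15032, 14635, 14449, 14500,
                  14852, 14546, 14774, 14713, 15435, 13753, 14852, 15345,
                  14315, 14947, 14511], dtype=float)

# Least-squares straight line: sales is approximated by slope * price + intercept.
slope, intercept = np.polyfit(price, sales, deg=1)


def predict_sales(p):
    """Predict sales in units from price alone (hypothetical helper)."""
    return slope * p + intercept


print(f"slope: {slope:.1f} units of sales per unit of price")
print(f"predicted sales at price 30: {predict_sales(30):.0f}")
```

A negative slope matches the slide's observation that sales increase as price decreases. The scatter around the line is the part that the other predictors would have to explain.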

Mathematical Model

• Learns relationship between all predictors at once and the predicted outcome:

Sales=f(Price, Cover, Adverts, Competition)

• Sales are a function of several variables.
• The job of a data mining algorithm is to find the function f

Neural Network Example

• A certain type of neural network, called a multi layer perceptron (MLP) can learn a function between our inputs (qualities of a newspaper) and the outcome (Sales)
• It works by building the function out of many small simple functions, joined by weighted connections

MLP Structure

[Diagram: Input Layer, Hidden Layer, Output Layer]

Every unit does the same thing:

$O_j = f\left(\sum_i w_{ij} \cdot O_i\right)$

$f(a) = \frac{1}{1 + e^{-a}}$

Neural Network Example

• A neural network uses the data to modify the weighted connections between all of its functions until it is able to predict the data accurately
• This process is referred to as training the neural network
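The unit computation on the MLP Structure slide can be sketched as a forward pass through two layers. This is a minimal sketch: the layer sizes, the random weights and the input values are all made up for illustration, and the names `sigmoid` and `layer_output` are not from the slides.

```python
import numpy as np


def sigmoid(a):
    """Logistic function f(a) = 1 / (1 + e^(-a))."""
    return 1.0 / (1.0 + np.exp(-a))


def layer_output(inputs, weights):
    """Compute O_j = f(sum_i w_ij * O_i) for every unit j in one layer.

    weights has shape (n_inputs, n_units), so weights[i, j] is w_ij.
    """
    return sigmoid(inputs @ weights)


# Hypothetical network: 4 inputs, 3 hidden units, 1 output unit.
rng = np.random.default_rng(0)
w_input_hidden = rng.normal(scale=0.5, size=(4, 3))
w_hidden_output = rng.normal(scale=0.5, size=(3, 1))

x = np.array([0.2, 1.0, 0.0, 0.6])  # one example, assumed already scaled
hidden = layer_output(x, w_input_hidden)
output = layer_output(hidden, w_hidden_output)
print(hidden, output)
```

Because every unit applies the same logistic function, each output lies strictly between 0 and 1, so a target such as sales would need to be scaled into that range before training.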

Neural Network Training

1. Prepare the data so that a file contains the predictors and the predicted variables with an example per row
2. Split the data into a test set and a training set
3. Read each row in turn into the neural network, presenting the predictors as inputs and the predicted value as the target output
4. Make a prediction and compare the value given by the neural network to the target value
5. Update the weights – see next slide
6. Present the next example in the file
7. Repeat until the error no longer reduces – ideally stop when the test error is at its lowest.

How are the Weights Changed?

• Training data has inputs and outputs, in this example, newspaper details and sales figures
• The MLP starts with random weights
• Each example in the training data is used as an input and the network generates an output
• The difference between that output and the value in the training data is known as the error
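The seven training steps can be sketched as a loop. To keep it short, this sketch swaps the full MLP for a single sigmoid unit, and the data, learning rate and `patience` rule for deciding when the test error has stopped improving are all assumptions rather than anything given on the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: one example per row. Made-up data: 2 predictors, target between 0 and 1.
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 1.0 / (1.0 + np.exp(-(3.0 * X[:, 0] - 2.0 * X[:, 1])))

# Step 2: split into a training set and a test set.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]


def predict(w, inputs):
    """Output of a single sigmoid unit with weights w."""
    return 1.0 / (1.0 + np.exp(-(inputs @ w)))


def mean_squared_error(w, inputs, targets):
    return float(np.mean((targets - predict(w, inputs)) ** 2))


w = rng.normal(scale=0.1, size=2)  # random starting weights
learning_rate = 0.5
initial_test_error = mean_squared_error(w, X_test, y_test)
best_w, best_test_error = w.copy(), initial_test_error
patience, epochs_without_improvement = 20, 0

for epoch in range(200):
    # Steps 3 to 6: present each row, compare with the target, update, move on.
    for x_row, target in zip(X_train, y_train):
        out = predict(w, x_row)
        error = target - out
        w += learning_rate * error * out * (1.0 - out) * x_row
    # Step 7: track the test error and keep the weights where it was lowest.
    test_error = mean_squared_error(w, X_test, y_test)
    if test_error < best_test_error:
        best_w, best_test_error = w.copy(), test_error
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break

print(f"test error: {initial_test_error:.5f} at start, {best_test_error:.5f} at best")
```

Keeping a copy of the best weights means the model that is finally used is the one from the point where the test error was lowest, not the one from the last pass through the data.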

Error Back Propagation

• An algorithm known as error back propagation uses this error value to change the weights

The weight change from the input layer unit i to hidden layer unit j is:

$\Delta w_{ij} = \eta \cdot \delta_j \cdot x_i$ where $\delta_j = o_j (1 - o_j) \sum_k w_{jk} \cdot \delta_k$

The weight change from the hidden layer unit j to the output layer unit k is:

$\Delta w_{jk} = \eta \cdot \delta_k \cdot o_j$ where $\delta_k = (\text{error}) \, y_k (1 - y_k)$

Qualities of a Predictor

• Whichever technique you use, it should have the following qualities:
– Ability to make correct predictions on data that is not in the original training data
– Ability to provide a certainty measure with its predictions
• How well a solution performs depends on both the data and the person who built it
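The two weight change rules on the Error Back Propagation slide can be sketched for a network with one hidden layer. The sketch assumes that the error is the target minus the output, which the slide does not state, and it uses made-up sizes, weights and a single made-up example. The name `backprop_step` is hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def backprop_step(x, target, w_ij, w_jk, eta=0.5):
    """Apply one weight update for a single example and return the new weights.

    x: inputs (n_in,); target: (n_out,)
    w_ij: input-to-hidden weights (n_in, n_hidden)
    w_jk: hidden-to-output weights (n_hidden, n_out)
    """
    o = sigmoid(x @ w_ij)  # hidden layer outputs o_j
    y = sigmoid(o @ w_jk)  # output layer outputs y_k
    error = target - y     # assumed sign convention
    delta_k = error * y * (1.0 - y)             # delta_k = error * y_k (1 - y_k)
    delta_j = o * (1.0 - o) * (w_jk @ delta_k)  # delta_j = o_j (1 - o_j) sum_k w_jk delta_k
    new_w_jk = w_jk + eta * np.outer(o, delta_k)  # change = eta * delta_k * o_j
    new_w_ij = w_ij + eta * np.outer(x, delta_j)  # change = eta * delta_j * x_i
    return new_w_ij, new_w_jk


def squared_error(x, target, w_ij, w_jk):
    y = sigmoid(sigmoid(x @ w_ij) @ w_jk)
    return float(np.sum((target - y) ** 2))


rng = np.random.default_rng(0)
w_ij = rng.normal(scale=0.5, size=(3, 4))
w_jk = rng.normal(scale=0.5, size=(4, 1))
x = np.array([0.3, 0.9, 1.0])  # made-up example; the constant 1.0 acts as a bias input
target = np.array([0.8])

error_before = squared_error(x, target, w_ij, w_jk)
for _ in range(200):
    w_ij, w_jk = backprop_step(x, target, w_ij, w_jk)
error_after = squared_error(x, target, w_ij, w_jk)
print(error_before, error_after)
```

The hidden layer deltas are computed from the hidden-to-output weights as they were before the update, which is why the function builds new weight arrays rather than changing them in place.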

Important Concepts

• Over Fitting
– A data mining predictor can capture the structure of the data so well that irrelevant details are picked up and used when they are not generally true
• Data Quality and Quantity
– Insufficient data or data that does not capture the relationship between predictors and predicted can produce a very poor solution

Important Concepts

• Multiple solutions
– It is possible (easy, in fact) to build more than one correct (or equally accurate) predictor from the same data set
– Several such predictors should be built and compared
– A winner might be chosen, or several could be used as a ‘panel of experts’
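Over fitting and the 'panel of experts' idea can both be illustrated with curve fitting. This sketch uses polynomials rather than neural networks, and the data is made up (a straight line plus random noise), so it is an illustration of the concepts and not of any particular data set from the slides.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)


def noisy_samples(n):
    """Made-up data: a linear relationship plus random noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(scale=0.3, size=n)
    return x, y


x_train, y_train = noisy_samples(15)
x_test, y_test = noisy_samples(200)


def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))


# A simple model and a very flexible one fitted to the same training data.
simple = Polynomial.fit(x_train, y_train, deg=1)
flexible = Polynomial.fit(x_train, y_train, deg=10)

for name, model in [("degree 1", simple), ("degree 10", flexible)]:
    print(name,
          "train error:", mse(model, x_train, y_train),
          "test error:", mse(model, x_test, y_test))

# Panel of experts: average the predictions of several fitted models.
panel = [Polynomial.fit(x_train, y_train, deg=d) for d in (1, 2, 3)]


def panel_predict(x):
    return np.mean([m(x) for m in panel], axis=0)
```

The flexible model cannot do worse than the simple one on the training data, because it contains the straight line as a special case. Its test error is typically higher, which is the sign that it has picked up details of the training noise that are not generally true.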

Non-linear?

• Curvy! Or to be more specific:

“If x predicts y then they have a non-linear relationship if the effect on y of a small change in x depends on the current value of x.”

Non-linear?

[Plots: a linear plot and a non-linear plot, each with x from -0.6 to 0.6]

Wherever you are along the line on the linear plot, moving one unit to the right will move you up 5 units. The 1/5 ratio is constant so the relationship is linear.

On the non-linear plot, moving a unit to the right along the line will carry you up a different amount, depending on where you are: non-linear.
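The definition of non-linear given above can be checked numerically by measuring the effect of the same small step in x from different starting points. The linear function uses the slide's figure of 5 units up per unit across. The curved function, x squared, is an assumed example and is not taken from the slides.

```python
def linear(x):
    return 5.0 * x  # moving one unit right always moves up 5 units


def curved(x):
    return x ** 2  # assumed example of a non-linear relationship


def effect_of_step(f, x, step=0.1):
    """Change in f(x) when x increases by a small step."""
    return f(x + step) - f(x)


starting_points = [-0.4, 0.0, 0.4]
linear_effects = [effect_of_step(linear, x) for x in starting_points]
curved_effects = [effect_of_step(curved, x) for x in starting_points]

print(linear_effects)  # about 0.5 from every starting point
print(curved_effects)  # a different value from each starting point
```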

Non-Linear

• Note that if you have more than one predictor, non-linearity can occur as two or more predictors combine
• E.g. putting the price up 1p will cause you to sell 1000 fewer newspapers when there is a political story on the front cover, but only 500 fewer with sport on the cover

Advantages of Neural Networks

• Very powerful predictors – almost always better than any rule based system a human expert could design
• Can cope with non-linear relationships, multiple numeric and discrete variables
• Able to generalise to data that they have not seen before
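The price and cover story example on the Non-Linear slide can be written as a small function in which the effect of price depends on a second predictor. The two per-penny figures come from the slide. The baseline sales figure and the names used are made up for illustration.

```python
# Sales lost per 1p price rise, by front cover story (figures from the slide).
SALES_LOST_PER_PENNY = {"Political": 1000, "Sport": 500}

# Hypothetical sales at the current price, chosen only for illustration.
BASELINE_SALES = 15000


def predicted_sales(price_rise_pence, cover):
    """Predicted sales after a price rise, given the front cover story."""
    return BASELINE_SALES - SALES_LOST_PER_PENNY[cover] * price_rise_pence


for cover in ("Political", "Sport"):
    lost = predicted_sales(0, cover) - predicted_sales(1, cover)
    print(f"{cover}: a 1p rise loses {lost} sales")
```

Within each cover story the effect of price is a straight line, but no single straight line in price alone fits both cases, which is the kind of combined effect the slide describes.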

Disadvantages
• How predictions are gained can be hard to
understand by a human user
• Not easy to ask why an answer was given
(though some help is possible)
• No rules to look at
• Can make big errors if not trained properly
• Requires a certain degree of faith!