
Instructions on Using the Tool

(Building a Prediction Model)

Step 1: Enter Your Data


(A) Enter your data in the Data worksheet, starting from cell AC105.
(B) The observations should be in rows and the variables should be in columns.
(C) Above each column, choose the appropriate Type (Omit, Output, Cont, Cat).
To drop a column from the model, set Type = Omit.
To treat a column as a categorical Input, set Type = Cat.
To treat a column as a continuous Input, set Type = Cont.
To treat a column as an Output, set Type = Output.
You can have at most 10 Output variables. The application will automatically treat them all as continuous variables.
Usually one builds a prediction model with 1 Output only.
If you have, say, 2 Output variables Y1 and Y2, both of which depend on the same set of Input variables,
you may be better off building 2 separate models: one with Y1 as Output, another with Y2 as Output.

You can have at most 50 Input variables, of which at most 40 can be categorical.
Make sure that the number of Input (Cat and Cont) columns exactly matches the number entered in the UserInput sheet.
(D) Please make sure that your data does not have blank rows or blank columns.
(E) Continuous Inputs:
Any non-number in a Cont column will be treated as a missing value.
The application will replace it with the column mean.
(F) Categorical Inputs:
Any blank cell or cell containing an Excel error in a Cat column will be treated as a missing value.
The application will replace it with the most frequently occurring category.
Category labels are case insensitive: the labels good, Good, GoOd and GOOD will all be treated as the same category.
There should be at least 2 observations in each category of a Cat column.
If one of the categories of a Cat column has only 1 observation, you should do one of the following:
remove that observation, OR
rename the category to one of the other categories of that Cat column.
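The missing-value rules of Step 1 can be sketched in Python as follows. This is only an illustration, not the application's own code: the function names are hypothetical, and it assumes that a blank cell or Excel error arrives as an empty string or None.

```python
from collections import Counter

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def impute_continuous(column):
    """Replace every non-number in a Cont column with the column mean."""
    numeric = [v for v in column if is_number(v)]
    mean = sum(numeric) / len(numeric)
    return [v if is_number(v) else mean for v in column]

def impute_categorical(column):
    """Lower-case the labels, then replace blanks with the most frequent label."""
    cleaned = [v.strip().lower() if isinstance(v, str) and v.strip() else None
               for v in column]
    mode = Counter(v for v in cleaned if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in cleaned]

def singleton_categories(column):
    """Return the categories that have fewer than 2 observations."""
    counts = Counter(column)
    return sorted(label for label, n in counts.items() if n < 2)

weights = impute_continuous([2.0, "n/a", 4.0])
countries = impute_categorical(["US", "us", "", "Japan", "japan", "Europe"])
rare = singleton_categories(countries)
```

Here the text entry in the Cont column is replaced by the mean of the two numbers, the blank country becomes the most frequent label, and `rare` lists the categories that would need to be removed or renamed before modeling.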
Step 2: Fill up Model Inputs
(A) Fill up the model inputs in the User Input Page.
(B) Make sure that your inputs are within the range of values allowed by the application.
(C) Click the 'Build Model' button to start modeling.
Step 3: Results of Modeling
(A) A Neural Network model is basically a set of weights between the layers of the net.
At the end of the run, the final set of weights is saved in the Calc sheet.
(B) The Output sheet of this file will show you the values of MSE and ARE on the training and validation sets
as the training of the model progresses. Two charts showing the training and validation MSE
are already provided in the Output sheet.
(C) If you have asked in the UserInput sheet to save the model in a separate file, then a new file
will be created containing the model inputs, your data, and the fitted model (i.e. the weights).
You will be able to use this file as a calculator to do prediction, given any new input.
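As a rough illustration of the two error measures, the sketch below computes MSE as the mean squared difference and ARE as the mean of ABS((True - Predicted) / True). The ARE definition is an assumption based on the per-row label shown in the calculator sheet; the function names are hypothetical.

```python
def mse(true_values, predicted):
    """Mean squared error on the original scale."""
    return sum((t - p) ** 2 for t, p in zip(true_values, predicted)) / len(true_values)

def are(true_values, predicted):
    """Average relative error (assumed): mean of |true - predicted| / |true|."""
    return sum(abs((t - p) / t) for t, p in zip(true_values, predicted)) / len(true_values)

true_mpg = [20.0, 40.0]   # made-up values, for illustration only
pred_mpg = [22.0, 36.0]
example_mse = mse(true_mpg, pred_mpg)
example_are = are(true_mpg, pred_mpg)
```

Note that ARE is undefined when a true value is 0, which would explain why a #DIV/0! error can appear in the sheet when no true output is entered.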
Step 4: Study Profiles
The fitted model is a surface in p dimensions, where p is the number of your inputs.
Unless p is 2 or less, it is not possible to show the surface graphically.
A profile plot is the next best way to visualize this fitted surface.
By varying only one predictor between two values and keeping all the others fixed at some pre-specified values,
we get the profile plot, which is really a one-dimensional cross section of the high-dimensional surface.
In the Profile sheet you can specify which predictor to vary and the values at which the other predictors should be held fixed.
Click the Create Profile button to generate the profile.
If the predictor you choose to vary is categorical, then the other info (#points to be generated, start and end values)
will be ignored and the graph will show you the predicted response for each category of the predictor you have chosen to vary.
A profile plot lets you study the following things:
(1) The nature of the relationship between a particular predictor X and the response Y
(e.g. Y increases as X increases, OR Y decreases as X increases,
OR the relationship is non-linear: Y first increases and then decreases with X, etc.).
(2) Profile plots also let you study the interaction between predictors.
Suppose there are two predictors X and Z and we are studying the profile of Y as X varies.
Suppose we look at the profile by keeping Z fixed at 1 and varying X between -10 and 10.
Now keep Z fixed at 2 instead of 1 and vary X between -10 and 10.
If the shapes of the profiles in these two scenarios are drastically different
(e.g. one is increasing and the other is decreasing), then X and Z interact.
In other words, the effect of X on the response is not the same at all levels of Z.
To study the effect of X, it matters where Z is set.
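The profile idea can be sketched in a few lines of Python. The `toy_model` function below is a made-up stand-in for a fitted network, chosen so that X and Z interact; `profile` is a hypothetical helper, and spacing the points evenly with both end values included is an assumption about how the grid is built.

```python
def profile(predict, fixed, vary, start, end, n_points):
    """Vary one predictor over [start, end] while the others stay at `fixed`."""
    step = (end - start) / (n_points - 1)
    rows = []
    for i in range(n_points):
        inputs = dict(fixed)              # copy so the fixed values are not modified
        inputs[vary] = start + i * step
        rows.append((inputs[vary], predict(inputs)))
    return rows

def toy_model(inputs):
    # Hypothetical response: the slope in X depends on Z, i.e. X and Z interact
    return inputs["X"] * (inputs["Z"] - 1.5)

at_z1 = profile(toy_model, {"X": 0.0, "Z": 1.0}, "X", -10.0, 10.0, 5)
at_z2 = profile(toy_model, {"X": 0.0, "Z": 2.0}, "X", -10.0, 10.0, 5)
```

With Z fixed at 1 the profile decreases in X, and with Z fixed at 2 it increases, which is the pattern described above as a sign of interaction.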

A few more points


Initial weights
For the training of the model, we need to start with an initial set of values for the network weights.
By default, the weights are initialized with random values between -w and w,
where w is a number between 0 and 1, specified by you in the UserInput sheet.
(A) Once you build a model, the final weights are stored in the Calc sheet.
The next time you want to train a model with the same architecture and the same data,
the application will ask you whether to start with the weights already saved in the Calc sheet.
If you say YES, these weights are used. If you say NO, the weights are re-initialized with random values.
(B) Instead of starting with random weights, you may want to start with your own choice of weights.
Specifying your choice of starting weights is a bit non-trivial for this application. Here is how you do it.
Specify the inputs in the UserInput sheet and set the number of training cycles to 0.
This will just set up the Calc sheet without doing any training.
Now go to the Calc sheet and write down your choice of weights in the appropriate places of the weight matrices.
Now come back to the UserInput sheet, specify the number of training cycles you want, and click the Build Model button.
When the application asks whether to use the already saved weights, click the YES button.
Now your network will be trained with the starting weights specified by you.
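The default initialization can be illustrated with the following sketch, which draws each weight uniformly from [-w, w]. The function name, the matrix layout (one extra row for the bias unit) and the use of a uniform distribution are assumptions made for illustration; the application's internal layout in the Calc sheet may differ.

```python
import random

def init_weights(n_from, n_to, w, seed=None):
    """Return an (n_from + 1) x n_to matrix of random weights in [-w, w].

    The extra row holds the bias weights (assumed layout).
    """
    rng = random.Random(seed)
    return [[rng.uniform(-w, w) for _ in range(n_to)]
            for _ in range(n_from + 1)]

# Example: 8 input neurons feeding 2 hidden neurons, with w = 0.5
hidden_weights = init_weights(n_from=8, n_to=2, w=0.5, seed=0)
```

Passing a fixed seed makes the starting weights reproducible, which plays a role similar to reusing saved weights when comparing runs.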

# Missing Value
Min
Max
Average
sd
Intercept
Slope

Cont. Var.
# Missing Value

Cat. Var.
#Levels
Labels
Values
Dummy


From very last cycle

With least Training Error

With least Validation Error

Partition data into Training / Validation set


Use whole data as training set

1
2


Network Architecture Options
Number of Inputs ( between 2 and 50 )

Number of Outputs ( between 1 and 10 )

Number of Hidden Layers ( 1 or 2 )

Hidden Layer sizes ( Maximum 20 )

Learning parameter (between 0 and 1)

0.4

Initial Wt Range ( 0 +/- w): w =

Momentum (between 0 and 1)

Training Options
Total #rows in your data ( Minimum 10 )

38

No. of Training cycles ( Maximum 500 )

Present Inputs in Random order while Training ?

NO

Training Mode (Batch or Sequential )

Save Network weights

With least Training Error

Training / Validation Set

Partition data into Training / Validation set

If you want to partition, how do you want to select the Validation set ?
Please choose one option
1
Please fill up the input necessary for the selected option
Save model in a separate workbook?

NO

1
Hidden 1

Hidden 2

2
0.5

50
Sequential

Option 1: Randomly select 10% of data as Validation set (between 1% and 50%)
Option 2: Use last 5 rows of the data as validation set
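The two ways of choosing the validation set can be sketched as below. The function names are hypothetical, and the rounding of the percentage to a whole number of rows is an assumption; the application may round differently.

```python
import random

def split_random(rows, fraction, seed=None):
    """Option 1: pick a random `fraction` of the rows as the validation set."""
    rng = random.Random(seed)
    n_val = max(1, round(len(rows) * fraction))   # assumed rounding rule
    val_idx = set(rng.sample(range(len(rows)), n_val))
    train = [r for i, r in enumerate(rows) if i not in val_idx]
    val = [r for i, r in enumerate(rows) if i in val_idx]
    return train, val

def split_last(rows, n_val):
    """Option 2: use the last `n_val` rows as the validation set."""
    return rows[:-n_val], rows[-n_val:]

rows = list(range(38))                  # 38 rows, as in the example data
train1, val1 = split_random(rows, 0.10, seed=1)
train2, val2 = split_last(rows, 5)
```

Option 2 depends on the row order, so it is mainly useful when the last rows were deliberately set aside as a hold-out sample.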

Enter your Data in this sheet

Instructions:
Start entering your data from cell AC105.
Make sure that row 104 is blank.
Specify the variable type in row 102:
Cont - for continuous Input,
Cat - for categorical Input,
Output - for Output variable,
Omit - if you don't want to use the variable in the model.
For each continuous Input, there will be 1 neuron in the Input Layer.
For each categorical Input with K levels, there will be K neurons in the Input Layer.
Please make sure that there are no more than 50 neurons in the Input Layer.
There should be at most 10 Output variables; the application will treat them all as continuous.
There should be no more than 40 categorical Input variables.
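The neuron-count rule can be checked with a short sketch like the one below, using the variable types of the example car data. The helper names are hypothetical, and encoding a category as a 1/0 indicator per level is an assumption about how the K neurons are fed.

```python
def count_input_neurons(var_types, levels):
    """1 neuron per Cont input, K neurons per Cat input with K levels."""
    total = 0
    for name, var_type in var_types.items():
        if var_type == "Cont":
            total += 1
        elif var_type == "Cat":
            total += len(levels[name])
    return total   # Omit and Output columns add no input neurons

def one_hot(label, categories):
    """Encode a label as one indicator per level (labels are case insensitive)."""
    label = label.lower()
    return [1.0 if label == c else 0.0 for c in categories]

var_types = {
    "Car": "Omit", "Cylinder": "Cont", "MPG": "Output", "Weight": "Cont",
    "Drive_Ratio": "Cont", "Horsepower": "Cont", "Displacement": "Cont",
    "Country": "Cat",
}
levels = {"Country": ["us", "japan", "europe"]}

n_input_neurons = count_input_neurons(var_types, levels)
encoded = one_hot("Japan", levels["Country"])
```

For the example data this gives 5 continuous inputs plus 3 country indicators, well under the limit of 50 input neurons.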

Var Type
Var Name

Cont

Omit

Cylinder

Car

8
8
8
8
4
4
4
4
5
6
4
6
6
6
6
6
8
8
8
8
4
6
4
4
4
4
4
4
6
6
4

Buick Estate Wagon


Ford Country Squire
Chevy Malibu Wagon
Chrysler Lebaron Wagon
Chevette
Toyota Corona
Datsun 510
Dodge Omni
Audi 5000
Volvo 240 GL
Saab 99 GLE
Peugeot 694 SL
Buick Century Special
Mercury Zephyr
Dodge Aspen
AMC Concord D/L
Chevy Caprice Classic
Ford LTD
Mercury Grand Marquise
Dodge St Regis
Ford Mustang 4
Ford Mustang Ghia
Mazda GLC
Dodge Colt
AMC Spirit
VW Scirocco
Honda Accord LX
Buick Skylark
Chevy Citation
Olds Omega
Pontiac Phoenix

Output
MPG

Cont
Weight

16.9
15.5
18.5
30
27.5
27.2
30.9
20.3
20.3
17
21.6
16.2
20.6
20.8
18.6
18.1
17
17.6
16.5
18.2
26.5
21.9
34.1
35.1
27.4
31.5
29.5
28.4
28.8
26.8
33.5

4.36
4.054
3.605
3.94
2.155
2.56
2.3
2.23
2.83
3.14
2.795
3.41
3.38
3.07
3.62
3.41
3.84
3.725
3.955
3.83
2.585
2.91
1.975
1.915
2.67
1.99
2.135
2.67
2.595
2.7
2.556

4
4
4
4
6
4
4

Plymouth Horizon
Datsun 210
Fiat Strada
VW Dasher
Datsun 810
BMW 320i
VW Rabbit

34.2
31.8
37.3
30.5
22
21.5
31.9

2.2
2.02
2.13
2.19
2.815
2.6
1.925

Specify variable name in row 103.


Cont

Cont

Drive_Ratio

Cont

Horsepower

2.73
2.26
2.56
2.45
3.7
3.05
3.54
3.37
3.9
3.5
3.77
3.58
2.73
3.08
2.71
2.73
2.41
2.26
2.26
2.45
3.08
3.08
3.73
2.97
3.08
3.78
3.05
2.53
2.69
2.84
2.69

Cat

Displacement

155
142
125
150
68
95
97
75
103
125
115
133
105
85
110
120
130
129
138
135
88
109
65
80
80
71
68
90
115
115
90

Country

350
351
267
360
98
134
119
105
131
163
121
163
231
200
225
258
305
302
351
318
140
171
86
98
121
89
98
151
173
173
151

US
US
US
US
US
Japan
Japan
US
Europe
Europe
Europe
Europe
US
US
US
US
US
US
US
US
US
US
Japan
Japan
US
Europe
Japan
US
US
US
US

Omit

3.37
3.7
3.1
3.7
3.7
3.64
3.78

70
65
69
78
97
110
71

105
85
91
97
146
121
89

US
Japan
Europe
Europe
Japan
Europe
Europe

The remaining columns (named X18 through X60) are unused; their Var Type is Omit.

Neural Network Model for Prediction

Created On: 14-Aug-02
MSE (Training): 13.717
MSE (Validation): 6.6302
Number of Hidden Layers: 1
Layer Sizes: 8

True Output (if available)


Model (Predicted) Output
ABS( (True - Predicted) / True )

RMSE

#VALUE!

Cont
Weight

Cont
Drive_Ratio

20.8014
#DIV/0!

Bias
Raw Input

Cont
Cylinder

1
Bias

Transformed Input
Hdn1_bias
Hdn1_Nrn1
Hdn1_Nrn2
Op_bias
Op_Nrn1

7.9600
Cylinder

Weight

3.0934
Drive_Ratio

0.9900

0.3877

0.0000

0.0000

0.0000

0.0000

0.2943

-1.1508

-0.9277

-0.4073

0.9926
1.0000

-1.1561
0.0774

-1.5765
0.2088

-0.5139

0.0000

0.0000

0.0000

0.0000

-1.8890
1.0000

2.2201
0.2432

2.7873

-1.1353

Category Table
Country
3
1
2
3

2.8629

us
japan
europe

0.5082

ARE

#DIV/0!

Cont
Cont
Cat
Horsepower
Displacement Country
101.7368

177.2895 us
Horsepower
Displacement
Country.us
Country.japan
Country.europe

0.4082

0.3356

1.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

0.0000

-1.0479

-0.8085

-0.3676

0.7118

0.1508

-2.4783

-0.9123

-0.8300

0.3429

0.0155

-0.0342

-1.3324

Avg. error per Input (Original Scale)

Epoch   Training Set               Validation Set
        MSE       ARE (%)          MSE       ARE (%)
1       16.452    14.33%           16.826    19.74%
2       16.349    14.25%           16.653    19.63%
3       16.245    14.15%           16.326    19.41%
4       16.144    14.05%           15.959    19.17%
5       16.048    13.96%           15.588    18.92%
6       15.957    13.87%           15.221    18.68%
7       15.870    13.78%           14.862    18.44%
8       15.787    13.70%           14.512    18.21%
9       15.707    13.62%           14.171    17.98%
10      15.631    13.54%           13.839    17.75%
11      15.558    13.47%           13.518    17.53%
12      15.487    13.40%           13.205    17.32%
13      15.419    13.34%           12.903    17.11%
14      15.353    13.27%           12.610    16.90%
15      15.289    13.21%           12.326    16.70%
16      15.227    13.15%           12.052    16.50%
17      15.167    13.09%           11.787    16.31%
18      15.108    13.04%           11.531    16.12%
19      15.051    12.98%           11.283    15.94%
20      14.996    12.93%           11.044    15.76%
21      14.942    12.88%           10.813    15.59%
22      14.889    12.83%           10.590    15.42%
23      14.837    12.78%           10.375    15.25%
24      14.786    12.74%           10.167    15.09%
25      14.737    12.69%           9.967     14.93%
26      14.688    12.65%           9.773     14.78%
27      14.641    12.60%           9.586     14.63%
28      14.594    12.56%           9.405     14.48%
29      14.548    12.52%           9.231     14.34%
30      14.503    12.48%           9.063     14.20%
31      14.458    12.44%           8.900     14.06%
32      14.414    12.40%           8.743     13.92%
33      14.371    12.36%           8.591     13.79%
34      14.329    12.33%           8.444     13.67%
35      14.287    12.29%           8.302     13.54%
36      14.246    12.26%           8.164     13.42%
37      14.205    12.22%           8.031     13.30%
38      14.165    12.19%           7.903     13.19%
39      14.125    12.15%           7.778     13.07%
40      14.086    12.12%           7.657     12.96%
41      14.047    12.09%           7.540     12.85%
42      14.009    12.06%           7.427     12.74%
43      13.971    12.03%           7.317     12.64%
44      13.934    11.99%           7.210     12.54%
45      13.897    11.96%           7.106     12.44%
46      13.860    11.93%           7.005     12.34%
47      13.824    11.91%           6.908     12.24%
48      13.788    11.88%           6.812     12.15%
49      13.753    11.85%           6.720     12.06%
50      13.717    11.82%           6.630     11.96%

[Chart: MSE (Training) vs. Epoch]
[Chart: MSE (Validation) vs. Epoch]

Profile plot for the fitted model

Generate profile for: MPG
Generate 100 data points by varying Cylinder (start value 4),
keeping the other predictors fixed at the specified values.

Outputs: MPG
Predictors: Cylinder, Weight, Drive_Ratio, Horsepower, Displacement, Country

Min / Max in Original Data (for user's reference only)

Predictor      Fixed Value   Min      Max
Cylinder       5.395         4.00     8.00
Weight         2.863         1.91     4.36
Drive_Ratio    3.093         2.26     3.90
Horsepower     101.737       65.00    155.00
Displacement   177.289       85.00    360.00
Country        us

Category Table
Country (3 levels): us, japan, europe

[Chart: Predicted MPG vs. Cylinder]

Cylinder   Predicted MPG
4          25.50382
4.04       25.43797
4.08       25.37239
4.12       25.3071
4.16       25.24209
4.2        25.17737
4.24       25.11295
4.28       25.04884
4.32       24.98503
4.36       24.92155
4.4        24.85838
4.44       24.79555
4.48       24.73305
4.52       24.67088
4.56       24.60906
4.6        24.54759
4.64       24.48647
4.68       24.42571
4.72       24.36531
4.76       24.30528
4.8        24.24562
4.84       24.18633
4.88       24.12741
4.92       24.06888
4.96       24.01074
5          23.95298
5.04       23.89561
5.08       23.83864
5.12       23.78206
5.16       23.72588
5.2        23.6701
5.24       23.61472
5.28       23.55975
5.32       23.50519
5.36       23.45104
5.4        23.3973
5.44       23.34397
5.48       23.29105
5.52       23.23855
5.56       23.18647
5.6        23.1348
5.64       23.08355
5.68       23.03272
5.72       22.98231
5.76       22.93231
5.8        22.88274
5.84       22.83359
5.88       22.78486
5.92       22.73655
5.96       22.68866
6          22.64118
6.04       22.59413
6.08       22.5475
6.12       22.50128
6.16       22.45549
6.2        22.41011
6.24       22.36514
6.28       22.32059
6.32       22.27646
6.36       22.23274
6.4        22.18943
6.44       22.14653
6.48       22.10404
6.52       22.06195
6.56       22.02028
6.6        21.979
6.64       21.93813
6.68       21.89766
6.72       21.85759
6.76       21.81792
6.8        21.77864
6.84       21.73976
6.88       21.70127
6.92       21.66317
6.96       21.62545
7          21.58812
7.04       21.55117
7.08       21.51461
7.12       21.47842
7.16       21.44261
7.2        21.40717
7.24       21.3721
7.28       21.3374
7.32       21.30306
7.36       21.26909
7.4        21.23548
7.44       21.20223
7.48       21.16934
7.52       21.1368
7.56       21.1046
7.6        21.07276
7.64       21.04126
7.68       21.0101
7.72       20.97929
7.76       20.94881
7.8        20.91866
7.84       20.88885
7.88       20.85936
7.92       20.8302
7.96       20.80136
