
 

 
 
 
 
CONTRASTING STYLES OF NEURAL NETWORKS AND
MULTIPLE REGRESSION
What is DEEP LEARNING?
   
Hyderabad  2019  Workshop  
December 21-24, 2019
Day  4  
 
Instructor:  MB  Rao  
Division  of  Biostatistics  and  Bioinformatics  
University  of  Cincinnati  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Agenda:  
1.  What  is  a  neural  network?  
2. Neural Network vis-à-vis Multiple Regression
3.  Deep  Learning  
 
 
1.  What  is  a  neural  network?  
 
Regression  context:  
 
Response  variable:  Y  –  Numeric  
Predictors:  X1  and  X2  
 
Goal:  Build  a  neural  network  model  connecting  Y  with  X1  and  X2.  The  
emphasis  is  on  prediction.    
 
Equipment:  
Every  neural  network  has  one  input  layer  and  one  output  layer.    
I  choose  one  hidden  layer.  
Choose the number of neurons. I chose 3.
Choose  an  activation  function  f.  I  chose  the  sigmoid  function:    
f(x) = 1/(1 + exp(-x)), x real.
Properties:  0  <  f(x)  <  1  
     f’(x)  =  f(x)(1  –  f(x))  
Come  up  with  weights  as  outlined  below.    
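
As a quick numerical check of the two properties of the sigmoid above, here is a minimal R sketch (the name fprime is mine):

f <- function(x) 1/(1 + exp(-x))          # the sigmoid
fprime <- function(x) f(x) * (1 - f(x))   # its derivative, f'(x) = f(x)(1 - f(x))
f(c(-10, 0, 10))                          # values stay strictly between 0 and 1
fprime(c(-10, 0, 10))                     # the slope peaks at 0.25 when x = 0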
 
Here is my model pictorially:

[Figure: a diagram of the network, with input nodes X1 and X2, one hidden layer of three neurons, and the output node Y*.]
 
 
Here  is  my  model  algebraically:    
 
Y* = f(w1*f(w11*X1 + w21*X2 + b1) + w2*f(w12*X1 + w22*X2 + b2) +
     w3*f(w13*X1 + w23*X2 + b3) + b)
Y*  is  a  function  of  9  weights  and  4  biases.    
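
To see the formula in action, here is a minimal R sketch of one forward pass; the weights, biases, and predictor values are made up purely for illustration:

f <- function(x) 1/(1 + exp(-x))        # the sigmoid activation
# Made-up weights and biases
w11 <- 0.2;  w21 <- -0.4; b1 <- 0.10
w12 <- 0.7;  w22 <-  0.1; b2 <- -0.10
w13 <- -0.3; w23 <-  0.5; b3 <- 0.05
w1 <- 0.6; w2 <- -0.2; w3 <- 0.9; b <- 0.3
# One observation's predictor values (also made up)
X1 <- 0.4; X2 <- 1.5
Ystar <- f(w1*f(w11*X1 + w21*X2 + b1) +
           w2*f(w12*X1 + w22*X2 + b2) +
           w3*f(w13*X1 + w23*X2 + b3) + b)
Ystar                                    # the predicted value for this row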
 
How  does  this  work?    
 
Get  the  data  handy.    
 
Y     X1     X2     Y*

Y1    X11    X21    Y1*
Y2    X12    X22    Y2*
…     …      …      …
Yn    X1n    X2n    Yn*
 
The weights and biases are in place. Plug in the X1 and X2 data from the first
row and calculate Y*. Continue row by row, calculating Y* for every observation.
 
Check how close the Y* values are to the Y values. Calculate

MSE = ((Y1 - Y1*)^2 + (Y2 - Y2*)^2 + … + (Yn - Yn*)^2)/n

Error = 0.5*MSE
Set a target value for the Error. The Error we have in hand now is not
satisfactory. Tweak the weights and biases. How? There is a way (the back
propagation method). Start again and look at the resulting Error. If it is still
not satisfactory, tweak the weights and biases again. Continue until we get the
Error we wanted.
 
Query: Why don't we minimize the Error directly with respect to the weights
and biases? Because that is very hard.
It is easier to set a target for the Error and iterate until we reach it.
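
To make the tweak-and-repeat idea concrete, here is a small self-contained R sketch for the 2-input, 3-neuron network above. It is not the back propagation method itself; it uses crude numerical gradients and made-up data purely to illustrate the loop: compute the Error, nudge every weight and bias downhill, and repeat until the target is met (or we give up).

sigmoid <- function(x) 1/(1 + exp(-x))

# Forward pass: theta holds the 9 weights followed by the 4 biases.
Ystar <- function(theta, X1, X2) {
  h1 <- sigmoid(theta[1]*X1 + theta[2]*X2 + theta[10])
  h2 <- sigmoid(theta[3]*X1 + theta[4]*X2 + theta[11])
  h3 <- sigmoid(theta[5]*X1 + theta[6]*X2 + theta[12])
  sigmoid(theta[7]*h1 + theta[8]*h2 + theta[9]*h3 + theta[13])
}
Error <- function(theta, X1, X2, Y) 0.5 * mean((Y - Ystar(theta, X1, X2))^2)

# Crude central-difference gradient (back propagation computes this analytically).
numgrad <- function(theta, X1, X2, Y, eps = 1e-6) {
  sapply(seq_along(theta), function(j) {
    up <- dn <- theta
    up[j] <- up[j] + eps
    dn[j] <- dn[j] - eps
    (Error(up, X1, X2, Y) - Error(dn, X1, X2, Y)) / (2 * eps)
  })
}

# Made-up data and the iterative fit.
set.seed(1)
X1 <- runif(50); X2 <- runif(50)
Y  <- sigmoid(2*X1 - X2)                # toy response, already between 0 and 1
theta <- runif(13, -0.5, 0.5)           # starting weights and biases
eta <- 0.5                              # step size for each tweak
target <- 0.0005                        # target value for the Error
for (step in 1:5000) {
  if (Error(theta, X1, X2, Y) < target) break
  theta <- theta - eta * numgrad(theta, X1, X2, Y)
}
Error(theta, X1, X2, Y)                 # hopefully well below where it started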
 
Contrast  the  neural  network  with  our  multiple  regression  model.  
 
Y = β0 + β1*X1 + β2*X2 + ε, where ε has mean 0 and variance σ2.
 
The model has 4 parameters. We use the least squares method to estimate the
parameters of the model.
 
A  model  with  4  parameters  versus  a  model  with  13  parameters.    
 
R has several packages that can fit a neural network model.
 
Example
 
 
>  data(Boston)  
>  dim(Boston)  
[1]  506    14  
>  head(Boston)  
crim zn indus chas nox rm age dis rad tax ptratio black lstat
1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
medv
1 24.0
2 21.6
3 34.7
4 33.4
5 36.2
6 28.7
Importance  of  the  study  …    
Response  variable:  medv  
Predictors:  13  in  all  
 
Build  a  neural  network  regression  model.    
Let us experiment with one hidden layer of 5 neurons.
# weights = 70 (13 predictors × 5 hidden neurons = 65, plus 5 hidden-to-output weights)
# biases = 6 (one per hidden neuron, plus one for the output)

The multiple regression model has 15 parameters (13 slopes, 1 intercept, and σ2).
 
Step 1. Each column of the data has to be normalized to the range 0 to 1. Why? Mainly because it puts all the variables on a comparable scale, which helps the iterative fitting converge and keeps the inputs in the range where the sigmoid is informative.
 
Operation  normalization:    
Find the maximum column-wise.
> Maxs <- apply(Boston, 2, max)
>  Maxs  
crim zn indus chas nox rm age dis
88.9762 100.0000 27.7400 1.0000 0.8710 8.7800 100.0000 12.1265
rad tax ptratio black lstat medv
24.0000 711.0000 22.0000 396.9000 37.9700 50.0000
Find the minimum column-wise.
> Mins <- apply(Boston, 2, min)
>  Mins  
crim zn indus chas nox rm age dis
0.00632 0.00000 0.46000 0.00000 0.38500 3.56100 2.90000 1.12960
rad tax ptratio black lstat medv
1.00000 187.00000 12.60000 0.32000 1.73000 5.00000
 
>  class(Boston)  
[1]  "data.frame"  
Normalization  …    
> ScaledB <- scale(Boston, center = Mins, scale = Maxs - Mins)
>  class(ScaledB)  
[1]  "matrix"  
>  head(ScaledB)  
crim zn indus chas nox rm age dis
1 0.0000000000 0.18 0.06781525 0 0.3148148 0.5775053 0.6416066 0.2692031
2 0.0002359225 0.00 0.24230205 0 0.1728395 0.5479977 0.7826982 0.3489620
3 0.0002356977 0.00 0.24230205 0 0.1728395 0.6943859 0.5993821 0.3489620
4 0.0002927957 0.00 0.06304985 0 0.1502058 0.6585553 0.4418126 0.4485446
5 0.0007050701 0.00 0.06304985 0 0.1502058 0.6871048 0.5283213 0.4485446
6 0.0002644715 0.00 0.06304985 0 0.1502058 0.5497222 0.5746653 0.4485446
rad tax ptratio black lstat medv
1 0.00000000 0.20801527 0.2872340 1.0000000 0.08967991 0.4222222
2 0.04347826 0.10496183 0.5531915 1.0000000 0.20447020 0.3688889
3 0.04347826 0.10496183 0.5531915 0.9897373 0.06346578 0.6600000
4 0.08695652 0.06679389 0.6489362 0.9942761 0.03338852 0.6311111
5 0.08695652 0.06679389 0.6489362 1.0000000 0.09933775 0.6933333
6 0.08695652 0.06679389 0.6489362 0.9929901 0.09602649 0.5266667
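
Each entry has been transformed as (x - column minimum)/(column maximum - column minimum), so every column now runs from 0 to 1. A quick check (this line is mine, not part of the original output):
> apply(ScaledB, 2, range)    # first row all 0s, second row all 1s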
 
Step 2
Deep concern: researchers worry about over-fitting. Better to use a cross-validation
method.
 
Idea: Select a random sample of 75% of the observations. Fit the model to the
chosen data (the training data). Predict on the remaining 25% (the test data).
> index <- sample(1:nrow(Boston), round(0.75*nrow(Boston)))
>  head(index)  
[1]  456    84  122  121  138  395  
Training  data  …    
> train <- ScaledB[index, ]
Test data …
> test <- ScaledB[-index, ]
>  dim(train)  
[1]  380    14  
>  dim(test)  
[1]  126    14  
 
Step  3  
 
Activate the neuralnet package. Fit the model to the training data.
> library(neuralnet)
> Neural <- neuralnet(medv ~ crim + zn + indus + chas + nox + rm + age + dis +
+ rad + tax + ptratio + black + lstat, data = train, hidden = 5, linear.output =
+ TRUE)
Look  at  the  output.    
>  Neural  
Call:  neuralnet(formula  =  medv  ~  crim  +  zn  +  indus  +  chas  +  nox  +  rm  +          age  +  dis  
+  rad  +  tax  +  ptratio  +  black  +  lstat,  data  =  train,          hidden  =  5,  linear.output  =  
TRUE)  
1  repetition  was  calculated.  
Error Reached Threshold Steps
1 0.5521105681 0.008518995316 1810
What  is  the  definition  of  error?  
What  is  available  in  the  output?    
>  names(Neural)  
 [1]  "call"                                "response"                        "covariate"                      
 [4]  "model.list"                    "err.fct"                          "act.fct"                          
 [7]  "linear.output"              "data"                                "net.result"                    
[10]  "weights"                          "startweights"                "generalized.weights"  
[13]  "result.matrix"              
net.result holds the model's predictions on its own training data;
weights holds the fitted weights and biases of the network.
 
Look  at  the  final  weights  and  biases.    
>  Neural$weights  
[[1]]  
[[1]][[1]]  
[,1] [,2] [,3] [,4]
[1,] -2.85568157199 -0.08583591601 -0.378601415140 -1.59310310115
[2,] -4.08403137510 33.22505440312 -2.867568990001 -3.07724578496
[3,] -0.34286531604 0.33550912203 -1.497394360159 -47.74364991763
[4,] -1.04142215689 0.31833981935 -0.001272842622 2.04785553440
[5,] 0.06533935713 -0.66015458860 0.436315756114 0.09642631869
[6,] -0.12990764305 -1.03445251768 -1.144746487349 -1.36984675982
[7,] 4.96295548937 -1.51900387065 -0.255016229657 -0.69923746384
[8,] -0.46148810256 1.45779560846 -0.167382194514 -0.25209468993
[9,] -1.30767935934 -0.58530508206 -1.647454422674 -8.41362091385
[10,] -2.33850302023 -1.95119638696 -14.853747863353 1.68882353274
[11,] -0.89923628530 1.16781101870 2.019907260779 -0.01393246031
[12,] -0.73903291384 0.51683535809 -0.446705542600 0.08117674463
[13,] 1.15231495614 -0.77570190697 0.662823518947 0.64011814564
[14,] -2.52586709011 -0.53923916714 -3.171540411614 -6.54867055429
[,5]
[1,] -0.32123913219
[2,] -55.25851933241
[3,] -3.25723698219
[4,] -0.64465681507
[5,] 1.14337588923
[6,] 1.76061174429
[7,] -2.26634076690
[8,] 0.28661267269
[9,] -3.43069426657
[10,] 4.31135087352
[11,] -0.47561199731
[12,] 0.05543802282
[13,] -0.14470307233
[14,] -13.06309132362

[[1]][[2]]
[,1]
[1,] 0.3285047223
[2,] 1.0230877214
[3,] -0.2312830911
[4,] -1.0136261389
[5,] 2.9024081531
[6,] -1.7150123559
 
The final weights and biases are presented as a list with two components. The first
component gives the 5 biases of the hidden neurons and the 65 weights from the
inputs (13 predictors × 5 hidden neurons). The second component gives one bias and
the 5 weights from the hidden neurons to the output.
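
A quick way to confirm those counts from the fitted object (this check is mine; each weight matrix carries one extra row for the bias):
> lapply(Neural$weights[[1]], dim)    # 14 x 5 for input-to-hidden, 6 x 1 for hidden-to-output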
 
Plot the network …
> plot(Neural)

[Figure: the fitted network produced by plot(Neural), showing the 13 input nodes (crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black, lstat), the 5 hidden neurons, the output node medv, and the estimated weights and biases written along the connections.]
 
Step 4
Do predictions on the test data …
> Pred <- compute(Neural, test[ , 1:13])
Predictions on the test data:
> Pred$net.result
                       [,1]  
2      0.3757415701  
7      0.3591791009  
9      0.2669614644  
10    0.3146476731  
12    0.3345820387  
28    0.2447454954  
30    0.3475679445  
32    0.2534157230  
42    0.6075361443  
44    0.4699340484  
50    0.2966713391  
55    0.2466716981  
56    0.6437671427  
72    0.3575877295  
80    0.3378467889  
83    0.4406688622  
86    0.5097421581  
87    0.3813464824  
89    0.5297017486  
96    0.4475301031  
108  0.3316455636  
113  0.3030533420  
114  0.3010485312  
115  0.3967557632  
123  0.3069406727  
125  0.3064333012  
128  0.2680328928  
129  0.3209942035  
130  0.2356838553  
140  0.2712029351  
142  0.1707914676  
145  0.2095164027  
146  0.2174989132  
154  0.2536396135  
165  0.3310547342  
166  0.3129059581  
167  0.9800033205  
168  0.2881798281  
176  0.5851211602  
181  0.8770446959  
188  0.5568578915  
198  0.6103184783  
199  0.6999327919  
205  0.9984145626  
209  0.3567476275  
210  0.2643354897  
213  0.3434654552  
218  0.4697034708  
219  0.3510338783  
225  0.8946034984  
227  0.7713939588  
230  0.4932146168  
234  0.8451574570  
235  0.3973463612  
237  0.4086731864  
240  0.4776623025  
243  0.3870811017  
246  0.2770407172  
249  0.3633061375  
250  0.4551401092  
253  0.5097248098  
261  0.7123061252  
263  0.9954823174  
264  0.6998072948  
269  0.8187958756  
277  0.6694492660  
278  0.5903431127  
280  0.6969573674  
283  0.7931284028  
284  0.9596458996  
285  0.6185027939  
291  0.6004720344  
292  0.6664488558  
298  0.3045235110  
302  0.4554820845  
305  0.6713047878  
310  0.3540051268  
317  0.2909557635  
318  0.3025937072  
325  0.4476307134  
326  0.4687220025  
328  0.3632932039  
332  0.2603927261  
337  0.3715700270  
344  0.4542951347  
350  0.4435675137  
351  0.3403736236  
359  0.3192597160  
363  0.3251733709  
367  0.3988461814  
371  1.2906069031  
375  0.1037415294  
387  0.1155471295  
394  0.3382996601  
396  0.2789493990  
398  0.2466502738  
399  0.1040357035  
402  0.1836781198  
404  0.1618810811  
405  0.1063417382  
406  0.1067429388  
407  0.1682969906  
408  0.5524514639  
414  0.1545580633  
418  0.1104342050  
419  0.1021958320  
426  0.1157222453  
428  0.1472523894  
431  0.2099817017  
432  0.1651702798  
433  0.3743405146  
441  0.1322323174  
445  0.1303098946  
447  0.2509959661  
448  0.2242491925  
452  0.2657565517  
457  0.1961223795  
473  0.3867211847  
477  0.2962039278  
481  0.4123671802  
482  0.5385543241  
485  0.3542734361  
488  0.4347491866  
492  0.1942285742  
494  0.3627209208  
503  0.2596888149  
 
These predictions are on the normalized scale …

We need predictions on the original scale of medv, so de-normalize:
> PredOriginal <- (Pred$net.result)*(max(Boston$medv) - min(Boston$medv)) +
+ min(Boston$medv)
>  head(PredOriginal)  
                   [,1]  
2    21.90837066  
7    21.16305954  
9    17.01326590  
10  19.15914529  
12  20.05619174  
28  16.01354729  
The test responses (medv) also have to be de-normalized for comparison with the predictions.
>  head(test)  
                           crim        zn                indus  chas                    nox                      rm  
2    0.0002359225392  0.000  0.2423020528        0  0.1728395062  0.5479977007  
7    0.0009213230365  0.125  0.2716275660        0  0.2860082305  0.4696301974  
9    0.0023032513925  0.125  0.2716275660        0  0.2860082305  0.3966277065  
10  0.0018401733261  0.125  0.2716275660        0  0.2860082305  0.4680973367  
12  0.0012492992010  0.125  0.2716275660        0  0.2860082305  0.4690553746  
28  0.0106715890816  0.000  0.2815249267        0  0.3148148148  0.4763364629  
                       age                    dis                      rad                    tax            ptratio  
2    0.7826982492  0.3489619802  0.04347826087  0.1049618321  0.5531914894  
7    0.6560247168  0.4029226418  0.17391304348  0.2366412214  0.2765957447  
9    1.0000000000  0.4503541907  0.17391304348  0.2366412214  0.2765957447  
10  0.8547888774  0.4967308969  0.17391304348  0.2366412214  0.2765957447  
12  0.8238928939  0.4635033509  0.17391304348  0.2366412214  0.2765957447  
28  0.8846549949  0.3022488156  0.13043478261  0.2290076336  0.8936170213  
                   black                lstat                  medv  
2    1.0000000000  0.2044701987  0.3688888889  
7    0.9967219729  0.2952538631  0.3977777778  
9    0.9741035857  0.7781456954  0.2555555556  
10  0.9743053104  0.4241169978  0.3088888889  
12  1.0000000000  0.3184326711  0.3088888889  
28  0.7717484492  0.4290838852  0.2177777778  
> testmedv <- (test[ , 14])*(max(Boston$medv) - min(Boston$medv)) +
+ min(Boston$medv)
>  head(testmedv)  
2 7 9 10 12 28
21.6 22.9 16.5 18.9 18.9 14.8
Calculate the mean squared error on the test data.
> MSENeural <- sum((PredOriginal - testmedv)^2)/nrow(test)
> MSENeural
[1] 8.786489466
 
Fantastic! The MSE of the multiple regression model worked out to be around 25.
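
For reference, here is a minimal sketch of how that multiple regression test MSE could be computed on the same split (the exact value depends on the random split; the names LinMod, PredLM, and MSELM are my own):

> LinMod <- lm(medv ~ ., data = Boston[index, ])
> PredLM <- predict(LinMod, newdata = Boston[-index, ])
> MSELM <- sum((PredLM - Boston$medv[-index])^2)/length(PredLM)
> MSELM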
 
How close are the predicted values to the observed values?
> test1 <- Boston[-index, ]
> plot(test1[ , 14], PredOriginal, xlab = "Observed Median Prices", ylab =
+ "Predicted Median Prices", main = "Neural Network Regression Model", pch =
+ 16, col = "blue", xlim = c(9, 55), ylim = c(9, 55))
> abline(0, 1, lwd = 2, col = "red")

[Figure: scatter plot titled "Neural Network Regression Model" of Predicted Median Prices against Observed Median Prices for the test data, with the 45-degree reference line in red.]
 
Are  the  neural  networks  worth  the  sweat?    
A  critical  appraisal  …    
 
3.  What  is  deep  learning?    
 
Use more than one hidden layer.
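
In the neuralnet package, the hidden argument takes a vector with one entry per hidden layer. A minimal sketch with two hidden layers of 5 and 3 neurons (the layer sizes are my choice, purely for illustration):

> Deep <- neuralnet(medv ~ crim + zn + indus + chas + nox + rm + age + dis +
+ rad + tax + ptratio + black + lstat, data = train, hidden = c(5, 3),
+ linear.output = TRUE)
> plot(Deep)

Everything else, the normalization, the train/test split, and the de-normalization of the predictions, stays exactly as before.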
 
 
 
R code
library(MASS)        # provides the Boston data
library(neuralnet)   # provides neuralnet() and compute()
data(Boston)
dim(Boston)
head(Boston)
Maxs <- apply(Boston, 2, max)
Maxs
Mins <- apply(Boston, 2, min)
Mins
class(Boston)
ScaledB <- scale(Boston, center = Mins, scale = Maxs - Mins)
class(ScaledB)
ScaledB <- as.data.frame(ScaledB)
head(ScaledB)
index <- sample(1:nrow(Boston), round(0.75*nrow(Boston)))
head(index)
train <- ScaledB[index, ]
test <- ScaledB[-index, ]
dim(train)
dim(test)
Neural <- neuralnet(medv ~ crim + zn + indus + chas + nox + rm + age + dis +
  rad + tax + ptratio + black + lstat, data = train, hidden = 5,
  linear.output = TRUE)
summary(Neural)
Neural
names(Neural)
head(Neural$net.result)
Neural$weights
plot(Neural)
Pred <- compute(Neural, test[ , 1:13])
head(Pred)
PredOriginal <- (Pred$net.result)*(max(Boston$medv) - min(Boston$medv)) +
  min(Boston$medv)
head(PredOriginal)
testmedv <- (test[ , 14])*(max(Boston$medv) - min(Boston$medv)) +
  min(Boston$medv)
head(testmedv)
MSENeural <- sum((PredOriginal - testmedv)^2)/nrow(test)
MSENeural
test1 <- Boston[-index, ]
plot(test1[ , 14], PredOriginal, xlab = "Observed Median Prices",
  ylab = "Predicted Median Prices", main = "Neural Network Regression Model",
  pch = 16, col = "blue", xlim = c(9, 55), ylim = c(9, 55))
abline(0, 1, lwd = 2, col = "red")
 
 
 
 
 
 
 
 
 
 