CONTRASTING STYLES OF NEURAL NETWORKS and MULTIPLE REGRESSION
What is DEEP LEARNING?
Hyderabad 2019 Workshop, December 21–24, 2019, Day 4
Instructor: MB Rao, Division of Biostatistics and Bioinformatics, University of Cincinnati
Agenda:
1. What is a neural network?
2. Neural Network vis-à-vis Multiple Regression
3. Deep Learning
1. What is a neural network?

Regression context:
Response variable: Y (numeric)
Predictors: X1 and X2
Goal: Build a neural network model connecting Y with X1 and X2. The emphasis is on prediction.

Equipment: Every neural network has one input layer and one output layer. I choose one hidden layer. Choose the number of neurons; I chose 3. Choose an activation function f. I chose the sigmoid function:
f(x) = 1/(1 + exp(-x)), x real.
Properties:
0 < f(x) < 1
f'(x) = f(x)(1 - f(x))
Come up with weights as outlined below. Here is my model pictorially:

[Figure: network diagram — inputs X1 and X2, three hidden neurons, and the output Y*.]
Here is my model algebraically:
Y* = f(w1*f(w11*X1 + w21*X2 + b1) + w2*f(w12*X1 + w22*X2 + b2) + w3*f(w13*X1 + w23*X2 + b3) + b)
Y* is a function of 9 weights and 4 biases. How does this work? Get the data handy.
Y     X1     X2     Y*
Y1    X11    X21    Y1*
Y2    X12    X22    Y2*
…     …      …      …
Yn    X1n    X2n    Yn*
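The row-by-row evaluation of Y* can be sketched as a forward pass through the 2-3-1 network above (Python, with made-up weights; everything except the model equation itself is my illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x1, x2, W, w, b_hidden, b):
    # W[j] = (w1j, w2j): input-to-hidden weights of hidden neuron j
    # b_hidden[j]: bias of hidden neuron j; w[j]: hidden-to-output weight; b: output bias
    hidden = [sigmoid(W[j][0] * x1 + W[j][1] * x2 + b_hidden[j]) for j in range(3)]
    return sigmoid(sum(w[j] * hidden[j] for j in range(3)) + b)

# Illustrative weights: 9 weights and 4 biases in total, as in the text
W = [(0.5, -0.3), (0.1, 0.8), (-0.7, 0.2)]   # 6 input-to-hidden weights
w = [1.2, -0.4, 0.9]                          # 3 hidden-to-output weights
b_hidden = [0.0, 0.1, -0.1]                   # 3 hidden biases
b = 0.2                                       # 1 output bias

# One row of (X1, X2) data in, one Y* out; repeat for every row
y_star = forward(1.0, 2.0, W, w, b_hidden, b)
assert 0.0 < y_star < 1.0   # the final sigmoid keeps Y* in (0, 1)
```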
The weights and biases are in place. Plug in the data of X1 and X2 from the first row and calculate Y*. Continue with the data of every row on X1 and X2, calculating Y* each time. Check how close the Y* values are to the Y values.
Calculate
MSE = (1/n) * Σ_{i=1}^{n} (Yi - Yi*)^2
Error = 0.5*MSE
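With the Y column and the computed Y* column held as lists, the two quantities are (a minimal sketch; the data values are invented):

```python
# Mean squared error between observed Y and network outputs Y*,
# plus the halved version the text calls Error.
def mse(y, y_star):
    n = len(y)
    return sum((yi - si) ** 2 for yi, si in zip(y, y_star)) / n

y      = [1.0, 0.0, 1.0, 1.0]
y_star = [0.8, 0.2, 0.6, 0.9]
error = 0.5 * mse(y, y_star)
# (0.04 + 0.04 + 0.16 + 0.01) / 4 = 0.0625, so Error = 0.03125
assert abs(error - 0.03125) < 1e-12
```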
Set a target value for Error. The Error we have now in hand is not satisfactory. Tweak the weights and biases. How? There is a way (the back-propagation method). Start again and look at the resultant Error. It is still not satisfactory; tweak the weights and biases again. Continue until we get the Error we wanted.
Query: Why don't we minimize the Error directly with respect to the weights and biases? Because that is very hard. The iterative approach is easy: set a target for Error and do iterations until we get what we wanted.
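The "tweak until the target is met" loop can be caricatured with a single weight, where the gradient (which back-propagation computes layer by layer in a real network) is available in closed form. Everything here — the data, the learning rate, the target — is invented for illustration:

```python
# Fit y = w * x by gradient descent on Error = 0.5 * MSE,
# stopping as soon as the Error target is reached.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x

w, rate, target = 0.0, 0.01, 0.02
for step in range(10000):
    residuals = [y - w * x for x, y in zip(xs, ys)]
    error = 0.5 * sum(r * r for r in residuals) / len(xs)
    if error < target:
        break                       # target Error reached: stop tweaking
    # gradient of Error with respect to w
    grad = -sum(r * x for x, r in zip(xs, residuals)) / len(xs)
    w -= rate * grad                # the "tweak"
```

The loop stops not at the exact minimizer but at the first w whose Error beats the target, which is precisely the pragmatic stopping rule described in the text.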
Contrast the neural network with our multiple regression model:
Y = β0 + β1*X1 + β2*X2 + ε, where ε has mean 0 and variance σ².
The model has 4 parameters. We use the least squares method to estimate the parameters of the model. A model with 4 parameters versus a model with 13 parameters.
R has several packages that can fit a neural network model.
Example 4
> library(MASS)     # the Boston data set is in the MASS package
> data(Boston)
> dim(Boston)
[1] 506  14
> head(Boston)
crim zn indus chas nox rm age dis rad tax ptratio black lstat
1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
medv
1 24.0
2 21.6
3 34.7
4 33.4
5 36.2
6 28.7
Importance of the study …
Response variable: medv
Predictors: 13 in all
Build a neural network regression model. Let us experiment with one hidden layer of 5 neurons.
# weights = 70
# biases = 6
The multiple regression model has 15 parameters.
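The counts follow directly from the architecture; a quick arithmetic check (Python, purely illustrative):

```python
# Parameter counts for the Boston network versus multiple regression.
p, h = 13, 5                   # 13 predictors, 5 hidden neurons (hidden = 5)
weights = p * h + h            # input-to-hidden weights plus hidden-to-output weights
biases = h + 1                 # one bias per hidden neuron plus one for the output
assert weights == 70 and biases == 6

regression_params = p + 1 + 1  # 13 slopes, the intercept, and sigma^2
assert regression_params == 15
```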
Step 1. The data in each column have to be normalized to the interval 0 to 1. Why?
Operation normalization: find the maximum column-wise.
> Maxs <- apply(Boston, 2, max)
> Maxs
crim zn indus chas nox rm age dis
88.9762 100.0000 27.7400 1.0000 0.8710 8.7800 100.0000 12.1265
rad tax ptratio black lstat medv
24.0000 711.0000 22.0000 396.9000 37.9700 50.0000
Find the minimum column-wise.
> Mins <- apply(Boston, 2, min)
> Mins
crim zn indus chas nox rm age dis
0.00632 0.00000 0.46000 0.00000 0.38500 3.56100 2.90000 1.12960
rad tax ptratio black lstat medv
1.00000 187.00000 12.60000 0.32000 1.73000 5.00000
> class(Boston)
[1] "data.frame"
Normalization …
> ScaledB <- scale(Boston, center = Mins, scale = Maxs - Mins)
> class(ScaledB)
[1] "matrix"
> head(ScaledB)
crim zn indus chas nox rm age dis
1 0.0000000000 0.18 0.06781525 0 0.3148148 0.5775053 0.6416066 0.2692031
2 0.0002359225 0.00 0.24230205 0 0.1728395 0.5479977 0.7826982 0.3489620
3 0.0002356977 0.00 0.24230205 0 0.1728395 0.6943859 0.5993821 0.3489620
4 0.0002927957 0.00 0.06304985 0 0.1502058 0.6585553 0.4418126 0.4485446
5 0.0007050701 0.00 0.06304985 0 0.1502058 0.6871048 0.5283213 0.4485446
6 0.0002644715 0.00 0.06304985 0 0.1502058 0.5497222 0.5746653 0.4485446
rad tax ptratio black lstat medv
1 0.00000000 0.20801527 0.2872340 1.0000000 0.08967991 0.4222222
2 0.04347826 0.10496183 0.5531915 1.0000000 0.20447020 0.3688889
3 0.04347826 0.10496183 0.5531915 0.9897373 0.06346578 0.6600000
4 0.08695652 0.06679389 0.6489362 0.9942761 0.03338852 0.6311111
5 0.08695652 0.06679389 0.6489362 1.0000000 0.09933775 0.6933333
6 0.08695652 0.06679389 0.6489362 0.9929901 0.09602649 0.5266667
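What scale(Boston, center = Mins, scale = Maxs - Mins) does, column by column, is the usual min-max map x → (x - min)/(max - min). A sketch of the same operation (Python for illustration; the sample values include the actual medv min 5 and max 50 from the Mins/Maxs output above):

```python
# Min-max normalization of one column: every value lands in [0, 1],
# with the column minimum mapping to 0 and the maximum to 1.
def min_max_scale(column):
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

medv = [24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 50.0, 5.0]
scaled = min_max_scale(medv)
assert min(scaled) == 0.0 and max(scaled) == 1.0
# 24.0 maps to (24 - 5) / (50 - 5)
assert abs(scaled[0] - 19.0 / 45.0) < 1e-12
```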
Step 2. (Since scale() returns a matrix, convert it back to a data frame first: ScaledB <- as.data.frame(ScaledB).) Deep concern: researchers are concerned about over-fitting. Better to use a cross-validation method. Idea: select a random sample of 75% of the observations, fit the model to the chosen data (training data), then predict using the test data.
> index <- sample(1:nrow(Boston), round(0.75*nrow(Boston)))
> head(index)
[1] 456  84 122 121 138 395
Training data …
> train <- ScaledB[index, ]
Test data …
> test <- ScaledB[-index, ]
> dim(train)
[1] 380  14
> dim(test)
[1] 126  14
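The split logic mirrors sampling 75% of row indices without replacement; a language-neutral sketch (Python, only for illustration):

```python
import random

# Mirror of: index <- sample(1:nrow(Boston), round(0.75*nrow(Boston)))
# Draw 75% of the row indices for training; the complement is the test set.
n = 506
k = round(0.75 * n)                 # 380 rows for training
index = random.sample(range(n), k)  # without replacement, like R's sample()
train_rows = set(index)
test_rows = [i for i in range(n) if i not in train_rows]
assert len(train_rows) == 380 and len(test_rows) == 126
```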
Step 3. Activate the neuralnet package and fit the model to the training data.
> library(neuralnet)
> Neural <- neuralnet(medv ~ crim + zn + indus + chas + nox + rm + age + dis +
+   rad + tax + ptratio + black + lstat, data = train, hidden = 5,
+   linear.output = TRUE)
Look at the output.
> Neural
Call: neuralnet(formula = medv ~ crim + zn + indus + chas + nox + rm + age +
    dis + rad + tax + ptratio + black + lstat, data = train, hidden = 5,
    linear.output = TRUE)

1 repetition was calculated.
Error Reached Threshold Steps
1 0.5521105681 0.008518995316 1810
What is the definition of error here? What is available in the output?
> names(Neural)
 [1] "call"                "response"            "covariate"
 [4] "model.list"          "err.fct"             "act.fct"
 [7] "linear.output"       "data"                "net.result"
[10] "weights"             "startweights"        "generalized.weights"
[13] "result.matrix"
Predictions as per the model on its own training data …
Weights and biases of the fitted network: look at the final weights and biases.
> Neural$weights
[[1]]
[[1]][[1]]
[,1] [,2] [,3] [,4]
[1,] -2.85568157199 -0.08583591601 -0.378601415140 -1.59310310115
[2,] -4.08403137510 33.22505440312 -2.867568990001 -3.07724578496
[3,] -0.34286531604 0.33550912203 -1.497394360159 -47.74364991763
[4,] -1.04142215689 0.31833981935 -0.001272842622 2.04785553440
[5,] 0.06533935713 -0.66015458860 0.436315756114 0.09642631869
[6,] -0.12990764305 -1.03445251768 -1.144746487349 -1.36984675982
[7,] 4.96295548937 -1.51900387065 -0.255016229657 -0.69923746384
[8,] -0.46148810256 1.45779560846 -0.167382194514 -0.25209468993
[9,] -1.30767935934 -0.58530508206 -1.647454422674 -8.41362091385
[10,] -2.33850302023 -1.95119638696 -14.853747863353 1.68882353274
[11,] -0.89923628530 1.16781101870 2.019907260779 -0.01393246031
[12,] -0.73903291384 0.51683535809 -0.446705542600 0.08117674463
[13,] 1.15231495614 -0.77570190697 0.662823518947 0.64011814564
[14,] -2.52586709011 -0.53923916714 -3.171540411614 -6.54867055429
[,5]
[1,] -0.32123913219
[2,] -55.25851933241
[3,] -3.25723698219
[4,] -0.64465681507
[5,] 1.14337588923
[6,] 1.76061174429
[7,] -2.26634076690
[8,] 0.28661267269
[9,] -3.43069426657
[10,] 4.31135087352
[11,] -0.47561199731
[12,] 0.05543802282
[13,] -0.14470307233
[14,] -13.06309132362
[[1]][[2]]
[,1]
[1,] 0.3285047223
[2,] 1.0230877214
[3,] -0.2312830911
[4,] -1.0136261389
[5,] 2.9024081531
[6,] -1.7150123559
The final weights and biases are presented as a list with two components. The first component gives the 5 biases associated with the five hidden neurons and the 65 input-to-hidden weights (13 predictors × 5 neurons); in the 14 × 5 matrix above, the first row holds the biases. The second component gives one bias and 5 hidden-to-output weights.
Plot the network …
> plot(Neural)
[Figure: plot(Neural) output — the fitted network with the 13 input nodes (crim through lstat), 5 hidden neurons, the output node medv, and the estimated weight on each edge.]
Step 4. Do predictions on the test data …
> Pred <- compute(Neural, test[ , 1:13])
Predictions on the test data:
> Pred$net.result
            [,1]
2   0.3757415701
7   0.3591791009
9   0.2669614644
10  0.3146476731
12  0.3345820387
28  0.2447454954
30  0.3475679445
32  0.2534157230
42  0.6075361443
44  0.4699340484
50  0.2966713391
55  0.2466716981
56  0.6437671427
72  0.3575877295
80  0.3378467889
83  0.4406688622
86  0.5097421581
87  0.3813464824
89  0.5297017486
96  0.4475301031
108 0.3316455636
113 0.3030533420
114 0.3010485312
115 0.3967557632
123 0.3069406727
125 0.3064333012
128 0.2680328928
129 0.3209942035
130 0.2356838553
140 0.2712029351
142 0.1707914676
145 0.2095164027
146 0.2174989132
154 0.2536396135
165 0.3310547342
166 0.3129059581
167 0.9800033205
168 0.2881798281
176 0.5851211602
181 0.8770446959
188 0.5568578915
198 0.6103184783
199 0.6999327919
205 0.9984145626
209 0.3567476275
210 0.2643354897
213 0.3434654552
218 0.4697034708
219 0.3510338783
225 0.8946034984
227 0.7713939588
230 0.4932146168
234 0.8451574570
235 0.3973463612
237 0.4086731864
240 0.4776623025
243 0.3870811017
246 0.2770407172
249 0.3633061375
250 0.4551401092
253 0.5097248098
261 0.7123061252
263 0.9954823174
264 0.6998072948
269 0.8187958756
277 0.6694492660
278 0.5903431127
280 0.6969573674
283 0.7931284028
284 0.9596458996
285 0.6185027939
291 0.6004720344
292 0.6664488558
298 0.3045235110
302 0.4554820845
305 0.6713047878
310 0.3540051268
317 0.2909557635
318 0.3025937072
325 0.4476307134
326 0.4687220025
328 0.3632932039
332 0.2603927261
337 0.3715700270
344 0.4542951347
350 0.4435675137
351 0.3403736236
359 0.3192597160
363 0.3251733709
367 0.3988461814
371 1.2906069031
375 0.1037415294
387 0.1155471295
394 0.3382996601
396 0.2789493990
398 0.2466502738
399 0.1040357035
402 0.1836781198
404 0.1618810811
405 0.1063417382
406 0.1067429388
407 0.1682969906
408 0.5524514639
414 0.1545580633
418 0.1104342050
419 0.1021958320
426 0.1157222453
428 0.1472523894
431 0.2099817017
432 0.1651702798
433 0.3743405146
441 0.1322323174
445 0.1303098946
447 0.2509959661
448 0.2242491925
452 0.2657565517
457 0.1961223795
473 0.3867211847
477 0.2962039278
481 0.4123671802
482 0.5385543241
485 0.3542734361
488 0.4347491866
492 0.1942285742
494 0.3627209208
503 0.2596888149
These predictions are on the normalized data … We need predictions on the scale of the original data. De-normalization:
> PredOriginal <- (Pred$net.result)*(max(Boston$medv) - min(Boston$medv)) +
+   min(Boston$medv)
> head(PredOriginal)
          [,1]
2  21.90837066
7  21.16305954
9  17.01326590
10 19.15914529
12 20.05619174
28 16.01354729
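De-normalization is just the inverse of the min-max map: multiply by (max - min) and add back the minimum. A sketch (Python for illustration; medv runs from 5 to 50 per the Mins/Maxs output):

```python
# Undo the min-max scaling: y_scaled * (max - min) + min recovers original units.
medv_min, medv_max = 5.0, 50.0   # from the Mins/Maxs output

def denormalize(y_scaled):
    return y_scaled * (medv_max - medv_min) + medv_min

# First scaled prediction (row 2) maps back to the first de-normalized one
assert abs(denormalize(0.3757415701) - 21.90837066) < 1e-6
```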
The test data have to be de-normalized for comparison with the predictions.
> head(test)
              crim    zn        indus chas          nox           rm
2  0.0002359225392 0.000 0.2423020528    0 0.1728395062 0.5479977007
7  0.0009213230365 0.125 0.2716275660    0 0.2860082305 0.4696301974
9  0.0023032513925 0.125 0.2716275660    0 0.2860082305 0.3966277065
10 0.0018401733261 0.125 0.2716275660    0 0.2860082305 0.4680973367
12 0.0012492992010 0.125 0.2716275660    0 0.2860082305 0.4690553746
28 0.0106715890816 0.000 0.2815249267    0 0.3148148148 0.4763364629
            age          dis           rad          tax      ptratio
2  0.7826982492 0.3489619802 0.04347826087 0.1049618321 0.5531914894
7  0.6560247168 0.4029226418 0.17391304348 0.2366412214 0.2765957447
9  1.0000000000 0.4503541907 0.17391304348 0.2366412214 0.2765957447
10 0.8547888774 0.4967308969 0.17391304348 0.2366412214 0.2765957447
12 0.8238928939 0.4635033509 0.17391304348 0.2366412214 0.2765957447
28 0.8846549949 0.3022488156 0.13043478261 0.2290076336 0.8936170213
          black        lstat         medv
2  1.0000000000 0.2044701987 0.3688888889
7  0.9967219729 0.2952538631 0.3977777778
9  0.9741035857 0.7781456954 0.2555555556
10 0.9743053104 0.4241169978 0.3088888889
12 1.0000000000 0.3184326711 0.3088888889
28 0.7717484492 0.4290838852 0.2177777778
> testmedv <- (test[ , 14])*(max(Boston$medv) - min(Boston$medv)) + min(Boston$medv)
> head(testmedv)
2 7 9 10 12 28
21.6 22.9 16.5 18.9 18.9 14.8
Calculate the mean square error.
> MSENeural <- sum((PredOriginal - testmedv)^2)/nrow(test)
> MSENeural
[1] 8.786489466
Fantastic? The MSE of the multiple regression model worked out to be around 25. How close are the predicted values to the observed values?
> test1 <- Boston[-index, ]     # test rows on the original scale
> plot(test1[ , 14], PredOriginal, xlab = "Observed Median Prices", ylab =
+   "Predicted Median Prices", main = "Neural Network Regression Model",
+   pch = 16, col = "blue", xlim = c(9, 55), ylim = c(9, 55))
> abline(0, 1, lwd = 2, col = "red")
[Figure: "Neural Network Regression Model" — observed median prices (x-axis, 10 to 50) versus predicted median prices (y-axis), points in blue, with the 45-degree reference line in red.]
The complete R script:

library(MASS)
library(neuralnet)
Maxs <- apply(Boston, 2, max)
Mins <- apply(Boston, 2, min)
class(Boston)
ScaledB <- scale(Boston, center = Mins, scale = Maxs - Mins)
class(ScaledB)
ScaledB <- as.data.frame(ScaledB)
head(ScaledB)
index <- sample(1:nrow(Boston), round(0.75*nrow(Boston)))
head(index)
train <- ScaledB[index, ]
test <- ScaledB[-index, ]
dim(train)
dim(test)
Neural <- neuralnet(medv ~ crim + zn + indus + chas + nox + rm + age + dis +
  rad + tax + ptratio + black + lstat, data = train, hidden = 5,
  linear.output = TRUE)
summary(Neural)
Neural
names(Neural)
head(Neural$net.result)
Neural$weights
plot(Neural)
Pred <- compute(Neural, test[ , 1:13])
head(Pred)
PredOriginal <- (Pred$net.result)*(max(Boston$medv) - min(Boston$medv)) +
  min(Boston$medv)
head(PredOriginal)
testmedv <- (test[ , 14])*(max(Boston$medv) - min(Boston$medv)) +
  min(Boston$medv)
head(testmedv)
MSENeural <- sum((PredOriginal - testmedv)^2)/nrow(test)
MSENeural
test1 <- Boston[-index, ]
plot(test1[ , 14], PredOriginal, xlab = "Observed Median Prices",
  ylab = "Predicted Median Prices", main = "Neural Network Regression Model",
  pch = 16, col = "blue", xlim = c(9, 55), ylim = c(9, 55))
abline(0, 1, lwd = 2, col = "red")