
 

 
 
 
 
CONTRASTING STYLES OF NEURAL NETWORKS AND
MULTIPLE REGRESSION
What is DEEP LEARNING?
   
Hyderabad  2019  Workshop  
December 21-24, 2019
Day  4  
 
Instructor:  MB  Rao  
Division  of  Biostatistics  and  Bioinformatics  
University  of  Cincinnati  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Agenda:  
1.  What  is  a  neural  network?  
2. Neural Network vis-à-vis Multiple Regression
3.  Deep  Learning  
 
 
1.  What  is  a  neural  network?  
 
Regression  context:  
 
Response  variable:  Y  –  Numeric  
Predictors:  X1  and  X2  
 
Goal:  Build  a  neural  network  model  connecting  Y  with  X1  and  X2.  The  
emphasis  is  on  prediction.    
 
Equipment:  
Every  neural  network  has  one  input  layer  and  one  output  layer.    
I  choose  one  hidden  layer.  
Choose the number of neurons. I chose 3.
Choose  an  activation  function  f.  I  chose  the  sigmoid  function:    
f(x) = 1/(1 + exp(-x)), x real.
Properties:  0  <  f(x)  <  1  
     f’(x)  =  f(x)(1  –  f(x))  
Come  up  with  weights  as  outlined  below.    
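
As a quick numerical check of the two properties of the sigmoid above, here is a minimal R sketch (the name fprime is mine):

f <- function(x) 1/(1 + exp(-x))          # the sigmoid
fprime <- function(x) f(x) * (1 - f(x))   # its derivative, f'(x) = f(x)(1 - f(x))
f(c(-10, 0, 10))                          # values stay strictly between 0 and 1
fprime(c(-10, 0, 10))                     # the slope peaks at 0.25 when x = 0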
 
Here is my model pictorially:

[Figure: a diagram of the network, with input nodes X1 and X2, one hidden layer of three neurons, and the output node Y*.]
 
 
Here  is  my  model  algebraically:    
 
Y* = f(w1*f(w11*X1 + w21*X2 + b1) + w2*f(w12*X1 + w22*X2 + b2) +
     w3*f(w13*X1 + w23*X2 + b3) + b)
Y*  is  a  function  of  9  weights  and  4  biases.    
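
To see the formula in action, here is a minimal R sketch of one forward pass; the weights, biases, and predictor values are made up purely for illustration:

f <- function(x) 1/(1 + exp(-x))        # the sigmoid activation
# Made-up weights and biases
w11 <- 0.2;  w21 <- -0.4; b1 <- 0.10
w12 <- 0.7;  w22 <-  0.1; b2 <- -0.10
w13 <- -0.3; w23 <-  0.5; b3 <- 0.05
w1 <- 0.6; w2 <- -0.2; w3 <- 0.9; b <- 0.3
# One observation's predictor values (also made up)
X1 <- 0.4; X2 <- 1.5
Ystar <- f(w1*f(w11*X1 + w21*X2 + b1) +
           w2*f(w12*X1 + w22*X2 + b2) +
           w3*f(w13*X1 + w23*X2 + b3) + b)
Ystar                                    # the predicted value for this row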
 
How  does  this  work?    
 
Get  the  data  handy.    
 
Y     X1     X2     Y*

Y1    X11    X21    Y1*
Y2    X12    X22    Y2*
…     …      …      …
Yn    X1n    X2n    Yn*
 
The weights and biases are in place. Plug in the X1 and X2 data from the first
row and calculate Y*. Continue row by row, calculating Y* for every observation.
 
Check how close the Y* values are to the Y values. Calculate

MSE = ((Y1 - Y1*)^2 + (Y2 - Y2*)^2 + … + (Yn - Yn*)^2)/n

Error = 0.5*MSE
Set a target value for the Error. The Error we have in hand now is not
satisfactory. Tweak the weights and biases. How? There is a way (the back
propagation method). Start again and look at the resulting Error. If it is still
not satisfactory, tweak the weights and biases again. Continue until we get the
Error we wanted.
 
Query: Why don't we minimize the Error directly with respect to the weights
and biases? Because that is very hard.
It is easier to set a target for the Error and iterate until we reach it.
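
To make the tweak-and-repeat idea concrete, here is a small self-contained R sketch for the 2-input, 3-neuron network above. It is not the back propagation method itself; it uses crude numerical gradients and made-up data purely to illustrate the loop: compute the Error, nudge every weight and bias downhill, and repeat until the target is met (or we give up).

sigmoid <- function(x) 1/(1 + exp(-x))

# Forward pass: theta holds the 9 weights followed by the 4 biases.
Ystar <- function(theta, X1, X2) {
  h1 <- sigmoid(theta[1]*X1 + theta[2]*X2 + theta[10])
  h2 <- sigmoid(theta[3]*X1 + theta[4]*X2 + theta[11])
  h3 <- sigmoid(theta[5]*X1 + theta[6]*X2 + theta[12])
  sigmoid(theta[7]*h1 + theta[8]*h2 + theta[9]*h3 + theta[13])
}
Error <- function(theta, X1, X2, Y) 0.5 * mean((Y - Ystar(theta, X1, X2))^2)

# Crude central-difference gradient (back propagation computes this analytically).
numgrad <- function(theta, X1, X2, Y, eps = 1e-6) {
  sapply(seq_along(theta), function(j) {
    up <- dn <- theta
    up[j] <- up[j] + eps
    dn[j] <- dn[j] - eps
    (Error(up, X1, X2, Y) - Error(dn, X1, X2, Y)) / (2 * eps)
  })
}

# Made-up data and the iterative fit.
set.seed(1)
X1 <- runif(50); X2 <- runif(50)
Y  <- sigmoid(2*X1 - X2)                # toy response, already between 0 and 1
theta <- runif(13, -0.5, 0.5)           # starting weights and biases
eta <- 0.5                              # step size for each tweak
target <- 0.0005                        # target value for the Error
for (step in 1:5000) {
  if (Error(theta, X1, X2, Y) < target) break
  theta <- theta - eta * numgrad(theta, X1, X2, Y)
}
Error(theta, X1, X2, Y)                 # hopefully well below where it started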
 
Contrast  the  neural  network  with  our  multiple  regression  model.  
 
Y = β0 + β1*X1 + β2*X2 + ε, where ε has mean 0 and variance σ2.
 
The model has 4 parameters. We use the least squares method to estimate the
parameters of the model.
 
A  model  with  4  parameters  versus  a  model  with  13  parameters.    
 
R has several packages that can fit a neural network model.
 
Example
 
 
>  data(Boston)  
>  dim(Boston)  
[1]  506    14  
>  head(Boston)  
crim zn indus chas nox rm age dis rad tax ptratio black lstat
1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
medv
1 24.0
2 21.6
3 34.7
4 33.4
5 36.2
6 28.7
Importance  of  the  study  …    
Response  variable:  medv  
Predictors:  13  in  all  
 
Build  a  neural  network  regression  model.    
Let us experiment with one hidden layer of 5 neurons.
# weights = 70 (13 predictors × 5 hidden neurons = 65, plus 5 hidden-to-output weights)
# biases = 6 (one per hidden neuron, plus one for the output)

The multiple regression model has 15 parameters (13 slopes, 1 intercept, and σ2).
 
Step 1. Each column of the data has to be normalized to the range 0 to 1. Why? Mainly because it puts all the variables on a comparable scale, which helps the iterative fitting converge and keeps the inputs in the range where the sigmoid is informative.
 
Operation  normalization:    
Find the maximum column-wise.
> Maxs <- apply(Boston, 2, max)
>  Maxs  
crim zn indus chas nox rm age dis
88.9762 100.0000 27.7400 1.0000 0.8710 8.7800 100.0000 12.1265
rad tax ptratio black lstat medv
24.0000 711.0000 22.0000 396.9000 37.9700 50.0000
Find the minimum column-wise.
> Mins <- apply(Boston, 2, min)
>  Mins  
crim zn indus chas nox rm age dis
0.00632 0.00000 0.46000 0.00000 0.38500 3.56100 2.90000 1.12960
rad tax ptratio black lstat medv
1.00000 187.00000 12.60000 0.32000 1.73000 5.00000
 
>  class(Boston)  
[1]  "data.frame"  
Normalization  …    
> ScaledB <- scale(Boston, center = Mins, scale = Maxs - Mins)
>  class(ScaledB)  
[1]  "matrix"  
>  head(ScaledB)  
crim zn indus chas nox rm age dis
1 0.0000000000 0.18 0.06781525 0 0.3148148 0.5775053 0.6416066 0.2692031
2 0.0002359225 0.00 0.24230205 0 0.1728395 0.5479977 0.7826982 0.3489620
3 0.0002356977 0.00 0.24230205 0 0.1728395 0.6943859 0.5993821 0.3489620
4 0.0002927957 0.00 0.06304985 0 0.1502058 0.6585553 0.4418126 0.4485446
5 0.0007050701 0.00 0.06304985 0 0.1502058 0.6871048 0.5283213 0.4485446
6 0.0002644715 0.00 0.06304985 0 0.1502058 0.5497222 0.5746653 0.4485446
rad tax ptratio black lstat medv
1 0.00000000 0.20801527 0.2872340 1.0000000 0.08967991 0.4222222
2 0.04347826 0.10496183 0.5531915 1.0000000 0.20447020 0.3688889
3 0.04347826 0.10496183 0.5531915 0.9897373 0.06346578 0.6600000
4 0.08695652 0.06679389 0.6489362 0.9942761 0.03338852 0.6311111
5 0.08695652 0.06679389 0.6489362 1.0000000 0.09933775 0.6933333
6 0.08695652 0.06679389 0.6489362 0.9929901 0.09602649 0.5266667
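
Each entry has been transformed as (x - column minimum)/(column maximum - column minimum), so every column now runs from 0 to 1. A quick check (this line is mine, not part of the original output):
> apply(ScaledB, 2, range)    # first row all 0s, second row all 1s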
 
Step 2
Deep concern: researchers worry about over-fitting. Better to use a cross-validation
method.
 
Idea: Select a random sample of 75% of the observations. Fit the model to the
chosen data (the training data). Predict on the remaining 25% (the test data).
> index <- sample(1:nrow(Boston), round(0.75*nrow(Boston)))
>  head(index)  
[1]  456    84  122  121  138  395  
Training  data  …    
> train <- ScaledB[index, ]
Test data …
> test <- ScaledB[-index, ]
>  dim(train)  
[1]  380    14  
>  dim(test)  
[1]  126    14  
 
Step  3  
 
Activate the neuralnet package. Fit the model to the training data.
> library(neuralnet)
> Neural <- neuralnet(medv ~ crim + zn + indus + chas + nox + rm + age + dis +
+ rad + tax + ptratio + black + lstat, data = train, hidden = 5, linear.output =
+ TRUE)
Look  at  the  output.    
>  Neural  
Call:  neuralnet(formula  =  medv  ~  crim  +  zn  +  indus  +  chas  +  nox  +  rm  +          age  +  dis  
+  rad  +  tax  +  ptratio  +  black  +  lstat,  data  =  train,          hidden  =  5,  linear.output  =  
TRUE)  
1  repetition  was  calculated.  
Error Reached Threshold Steps
1 0.5521105681 0.008518995316 1810
What  is  the  definition  of  error?  
What  is  available  in  the  output?    
>  names(Neural)  
 [1]  "call"                                "response"                        "covariate"                      
 [4]  "model.list"                    "err.fct"                          "act.fct"                          
 [7]  "linear.output"              "data"                                "net.result"                    
[10]  "weights"                          "startweights"                "generalized.weights"  
[13]  "result.matrix"              
net.result holds the model's predictions on its own training data;
weights holds the fitted weights and biases of the network.
 
Look  at  the  final  weights  and  biases.    
>  Neural$weights  
[[1]]  
[[1]][[1]]  
[,1] [,2] [,3] [,4]
[1,] -2.85568157199 -0.08583591601 -0.378601415140 -1.59310310115
[2,] -4.08403137510 33.22505440312 -2.867568990001 -3.07724578496
[3,] -0.34286531604 0.33550912203 -1.497394360159 -47.74364991763
[4,] -1.04142215689 0.31833981935 -0.001272842622 2.04785553440
[5,] 0.06533935713 -0.66015458860 0.436315756114 0.09642631869
[6,] -0.12990764305 -1.03445251768 -1.144746487349 -1.36984675982
[7,] 4.96295548937 -1.51900387065 -0.255016229657 -0.69923746384
[8,] -0.46148810256 1.45779560846 -0.167382194514 -0.25209468993
[9,] -1.30767935934 -0.58530508206 -1.647454422674 -8.41362091385
[10,] -2.33850302023 -1.95119638696 -14.853747863353 1.68882353274
[11,] -0.89923628530 1.16781101870 2.019907260779 -0.01393246031
[12,] -0.73903291384 0.51683535809 -0.446705542600 0.08117674463
[13,] 1.15231495614 -0.77570190697 0.662823518947 0.64011814564
[14,] -2.52586709011 -0.53923916714 -3.171540411614 -6.54867055429
[,5]
[1,] -0.32123913219
[2,] -55.25851933241
[3,] -3.25723698219
[4,] -0.64465681507
[5,] 1.14337588923
[6,] 1.76061174429
[7,] -2.26634076690
[8,] 0.28661267269
[9,] -3.43069426657
[10,] 4.31135087352
[11,] -0.47561199731
[12,] 0.05543802282
[13,] -0.14470307233
[14,] -13.06309132362

[[1]][[2]]
[,1]
[1,] 0.3285047223
[2,] 1.0230877214
[3,] -0.2312830911
[4,] -1.0136261389
[5,] 2.9024081531
[6,] -1.7150123559
 
The final weights and biases are presented as a list with two components. The first
component gives the 5 biases of the hidden neurons and the 65 weights from the
inputs (13 predictors × 5 hidden neurons). The second component gives one bias and
the 5 weights from the hidden neurons to the output.
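
A quick way to confirm those counts from the fitted object (this check is mine; each weight matrix carries one extra row for the bias):
> lapply(Neural$weights[[1]], dim)    # 14 x 5 for input-to-hidden, 6 x 1 for hidden-to-output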
 
Plot the network …
> plot(Neural)

[Figure: the fitted network produced by plot(Neural), showing the 13 input nodes (crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black, lstat), the 5 hidden neurons, the output node medv, and the estimated weights and biases written along the connections.]
 
Step 4
Do predictions on the test data …
> Pred <- compute(Neural, test[ , 1:13])
Predictions on the test data:
> Pred$net.result
                       [,1]  
2      0.3757415701  
7      0.3591791009  
9      0.2669614644  
10    0.3146476731  
12    0.3345820387  
28    0.2447454954  
30    0.3475679445  
32    0.2534157230  
42    0.6075361443  
44    0.4699340484  
50    0.2966713391  
55    0.2466716981  
56    0.6437671427  
72    0.3575877295  
80    0.3378467889  
83    0.4406688622  
86    0.5097421581  
87    0.3813464824  
89    0.5297017486  
96    0.4475301031  
108  0.3316455636  
113  0.3030533420  
114  0.3010485312  
115  0.3967557632  
123  0.3069406727  
125  0.3064333012  
128  0.2680328928  
129  0.3209942035  
130  0.2356838553  
140  0.2712029351  
142  0.1707914676  
145  0.2095164027  
146  0.2174989132  
154  0.2536396135  
165  0.3310547342  
166  0.3129059581  
167  0.9800033205  
168  0.2881798281  
176  0.5851211602  
181  0.8770446959  
188  0.5568578915  
198  0.6103184783  
199  0.6999327919  
205  0.9984145626  
209  0.3567476275  
210  0.2643354897  
213  0.3434654552  
218  0.4697034708  
219  0.3510338783  
225  0.8946034984  
227  0.7713939588  
230  0.4932146168  
234  0.8451574570  
235  0.3973463612  
237  0.4086731864  
240  0.4776623025  
243  0.3870811017  
246  0.2770407172  
249  0.3633061375  
250  0.4551401092  
253  0.5097248098  
261  0.7123061252  
263  0.9954823174  
264  0.6998072948  
269  0.8187958756  
277  0.6694492660  
278  0.5903431127  
280  0.6969573674  
283  0.7931284028  
284  0.9596458996  
285  0.6185027939  
291  0.6004720344  
292  0.6664488558  
298  0.3045235110  
302  0.4554820845  
305  0.6713047878  
310  0.3540051268  
317  0.2909557635  
318  0.3025937072  
325  0.4476307134  
326  0.4687220025  
328  0.3632932039  
332  0.2603927261  
337  0.3715700270  
344  0.4542951347  
350  0.4435675137  
351  0.3403736236  
359  0.3192597160  
363  0.3251733709  
367  0.3988461814  
371  1.2906069031  
375  0.1037415294  
387  0.1155471295  
394  0.3382996601  
396  0.2789493990  
398  0.2466502738  
399  0.1040357035  
402  0.1836781198  
404  0.1618810811  
405  0.1063417382  
406  0.1067429388  
407  0.1682969906  
408  0.5524514639  
414  0.1545580633  
418  0.1104342050  
419  0.1021958320  
426  0.1157222453  
428  0.1472523894  
431  0.2099817017  
432  0.1651702798  
433  0.3743405146  
441  0.1322323174  
445  0.1303098946  
447  0.2509959661  
448  0.2242491925  
452  0.2657565517  
457  0.1961223795  
473  0.3867211847  
477  0.2962039278  
481  0.4123671802  
482  0.5385543241  
485  0.3542734361  
488  0.4347491866  
492  0.1942285742  
494  0.3627209208  
503  0.2596888149  
 
These predictions are on the normalized scale …

We need predictions on the original scale of medv, so de-normalize:
> PredOriginal <- (Pred$net.result)*(max(Boston$medv) - min(Boston$medv)) +
+ min(Boston$medv)
>  head(PredOriginal)  
                   [,1]  
2    21.90837066  
7    21.16305954  
9    17.01326590  
10  19.15914529  
12  20.05619174  
28  16.01354729  
The test responses (medv) also have to be de-normalized for comparison with the predictions.
>  head(test)  
                           crim        zn                indus  chas                    nox                      rm  
2    0.0002359225392  0.000  0.2423020528        0  0.1728395062  0.5479977007  
7    0.0009213230365  0.125  0.2716275660        0  0.2860082305  0.4696301974  
9    0.0023032513925  0.125  0.2716275660        0  0.2860082305  0.3966277065  
10  0.0018401733261  0.125  0.2716275660        0  0.2860082305  0.4680973367  
12  0.0012492992010  0.125  0.2716275660        0  0.2860082305  0.4690553746  
28  0.0106715890816  0.000  0.2815249267        0  0.3148148148  0.4763364629  
                       age                    dis                      rad                    tax            ptratio  
2    0.7826982492  0.3489619802  0.04347826087  0.1049618321  0.5531914894  
7    0.6560247168  0.4029226418  0.17391304348  0.2366412214  0.2765957447  
9    1.0000000000  0.4503541907  0.17391304348  0.2366412214  0.2765957447  
10  0.8547888774  0.4967308969  0.17391304348  0.2366412214  0.2765957447  
12  0.8238928939  0.4635033509  0.17391304348  0.2366412214  0.2765957447  
28  0.8846549949  0.3022488156  0.13043478261  0.2290076336  0.8936170213  
                   black                lstat                  medv  
2    1.0000000000  0.2044701987  0.3688888889  
7    0.9967219729  0.2952538631  0.3977777778  
9    0.9741035857  0.7781456954  0.2555555556  
10  0.9743053104  0.4241169978  0.3088888889  
12  1.0000000000  0.3184326711  0.3088888889  
28  0.7717484492  0.4290838852  0.2177777778  
> testmedv <- (test[ , 14])*(max(Boston$medv) - min(Boston$medv)) +
+ min(Boston$medv)
>  head(testmedv)  
2 7 9 10 12 28
21.6 22.9 16.5 18.9 18.9 14.8
Calculate the mean squared error on the test data.
> MSENeural <- sum((PredOriginal - testmedv)^2)/nrow(test)
> MSENeural
[1] 8.786489466
 
Fantastic! The MSE of the multiple regression model worked out to be around 25.
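
For reference, here is a minimal sketch of how that multiple regression test MSE could be computed on the same split (the exact value depends on the random split; the names LinMod, PredLM, and MSELM are my own):

> LinMod <- lm(medv ~ ., data = Boston[index, ])
> PredLM <- predict(LinMod, newdata = Boston[-index, ])
> MSELM <- sum((PredLM - Boston$medv[-index])^2)/length(PredLM)
> MSELM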
 
How close are the predicted values to the observed values?
> test1 <- Boston[-index, ]
> plot(test1[ , 14], PredOriginal, xlab = "Observed Median Prices", ylab =
+ "Predicted Median Prices", main = "Neural Network Regression Model", pch =
+ 16, col = "blue", xlim = c(9, 55), ylim = c(9, 55))
> abline(0, 1, lwd = 2, col = "red")

[Figure: scatter plot titled "Neural Network Regression Model" of Predicted Median Prices against Observed Median Prices for the test data, with the 45-degree reference line in red.]
 
Are  the  neural  networks  worth  the  sweat?    
A  critical  appraisal  …    
 
3.  What  is  deep  learning?    
 
Use more than one hidden layer.
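
In the neuralnet package, the hidden argument takes a vector with one entry per hidden layer. A minimal sketch with two hidden layers of 5 and 3 neurons (the layer sizes are my choice, purely for illustration):

> Deep <- neuralnet(medv ~ crim + zn + indus + chas + nox + rm + age + dis +
+ rad + tax + ptratio + black + lstat, data = train, hidden = c(5, 3),
+ linear.output = TRUE)
> plot(Deep)

Everything else, the normalization, the train/test split, and the de-normalization of the predictions, stays exactly as before.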
 
 
 
R code
library(MASS)        # provides the Boston data
library(neuralnet)   # provides neuralnet() and compute()
data(Boston)
dim(Boston)
head(Boston)
Maxs <- apply(Boston, 2, max)
Maxs
Mins <- apply(Boston, 2, min)
Mins
class(Boston)
ScaledB <- scale(Boston, center = Mins, scale = Maxs - Mins)
class(ScaledB)
ScaledB <- as.data.frame(ScaledB)
head(ScaledB)
index <- sample(1:nrow(Boston), round(0.75*nrow(Boston)))
head(index)
train <- ScaledB[index, ]
test <- ScaledB[-index, ]
dim(train)
dim(test)
Neural <- neuralnet(medv ~ crim + zn + indus + chas + nox + rm + age + dis +
  rad + tax + ptratio + black + lstat, data = train, hidden = 5,
  linear.output = TRUE)
summary(Neural)
Neural
names(Neural)
head(Neural$net.result)
Neural$weights
plot(Neural)
Pred <- compute(Neural, test[ , 1:13])
head(Pred)
PredOriginal <- (Pred$net.result)*(max(Boston$medv) - min(Boston$medv)) +
  min(Boston$medv)
head(PredOriginal)
testmedv <- (test[ , 14])*(max(Boston$medv) - min(Boston$medv)) +
  min(Boston$medv)
head(testmedv)
MSENeural <- sum((PredOriginal - testmedv)^2)/nrow(test)
MSENeural
test1 <- Boston[-index, ]
plot(test1[ , 14], PredOriginal, xlab = "Observed Median Prices",
  ylab = "Predicted Median Prices", main = "Neural Network Regression Model",
  pch = 16, col = "blue", xlim = c(9, 55), ylim = c(9, 55))
abline(0, 1, lwd = 2, col = "red")
 
 
 
 
 
 
 
 
 
 