Neural Networks
(Textbook reading: Chapter 11)
Basic Idea
• Combine input information in a complex & flexible neural net "model"
Network Structure
• Multiple layers
1. Input layer (raw observations)
2. Hidden layers
3. Output layer
• Nodes
• Weights (like coefficients, subject to iterative adjustment)
• Bias values (also like coefficients, but not subject to iterative adjustment)
Schematic Diagram
[Figure: schematic of a multilayer neural network]
Example: Using fat & salt content to predict consumer acceptance of cheese
[Figure: network diagram — input nodes Fat (x1) and Salt (x2) feed three hidden nodes, which feed two output nodes, dislike and like; weights w_ij label the arrows]
Circles are nodes, w_ij on the arrows are weights, and θ_j are node bias values.
Tiny Example: Data

Obs.  Fat Score  Salt Score  Acceptance
  1      0.2        0.9      like
  2      0.1        0.1      dislike
  3      0.2        0.4      dislike
  4      0.2        0.5      dislike
  5      0.4        0.5      like
  6      0.3        0.8      like
The Hidden Layer
• In this example, it has 3 nodes
• Each node receives as input the output of all input nodes
• Output of each hidden node is some function of the weighted sum of inputs
$$\text{output}_j = g\Big(\theta_j + \sum_i w_{ij} x_i\Big), \qquad g(s) = \frac{1}{1+e^{-s}} \ \text{(the logistic function)}$$
The Weights
The bias values θ (theta) and weights w are typically initialized to random values in the range -0.05 to +0.05
• Equivalent to a model with random prediction (in other words, no predictive value)
$$\text{output}_3 = \frac{1}{1 + e^{-[-0.3 + (0.05)(0.2) + (0.01)(0.9)]}} = 0.43$$
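A minimal R sketch (not part of the textbook code) that reproduces this arithmetic; the names g, theta3, w, and x are ours:

g <- function(s) 1 / (1 + exp(-s))   # logistic activation
theta3 <- -0.3                       # bias of hidden node 3
w <- c(0.05, 0.01)                   # weights from Fat and Salt inputs
x <- c(0.2, 0.9)                     # first record: Fat = 0.2, Salt = 0.9
g(theta3 + sum(w * x))               # 0.430..., as above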
[Figure: the network with node outputs (bold) computed for the first record (Fat = 0.2, Salt = 0.9); the hidden nodes output 0.430, 0.507, and 0.511, and the "like" output node gives 0.506]
Node outputs (bold) using the first record in this example, and the logistic function
Output Layer
• The output of the last hidden layer becomes input for the output layer
• Uses the same function as above, i.e. a function g of the weighted average
$$\text{Output}_6 = g\big({-0.015} + (0.01)(0.430) + (0.05)(0.507) + (0.015)(0.511)\big) = 0.506$$

In general: $\hat{y} = g\Big(\theta + \sum_j w_j x_j\Big)$
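Continuing the sketch above, two more lines of R reproduce the output-node value:

g <- function(s) 1 / (1 + exp(-s))                   # logistic, as before
hidden.out <- c(0.430, 0.507, 0.511)                 # hidden-node outputs
g(-0.015 + sum(c(0.01, 0.05, 0.015) * hidden.out))   # 0.506 -> "like"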
Preprocessing Steps
• Scale variables to 0-1
• Categorical variables
  • If equidistant categories, map to equidistant interval points in the 0-1 range
  • Otherwise, create dummy variables
• Transform (e.g., log) skewed variables
At each record, compare prediction to actual:
$$\text{err} = \hat{y}(1-\hat{y})(y-\hat{y})$$
Why It Works
• Big errors lead to big changes in weights
• Small errors leave weights relatively unchanged
• Over thousands of updates, a given weight keeps changing until the error associated with that weight is negligible, at which point weights change little
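A hedged sketch of this update idea in R (update.weight and the learning rate l are illustrative names; full back propagation also distributes error back through the hidden layers):

update.weight <- function(w.old, y, y.hat, l = 0.5) {
  err <- y.hat * (1 - y.hat) * (y - y.hat)  # error term from above
  w.old + l * err                           # big error -> big change
}
update.weight(w.old = 0.01, y = 1, y.hat = 0.506)  # nudges the weight up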
#### Table 11.2
> library(neuralnet)
> df <- read.csv("tinydata.csv")
> df
  Obs. Fat Salt Acceptance
1    1 0.2  0.9       like
2    2 0.1  0.1    dislike
3    3 0.2  0.4    dislike
4    4 0.2  0.5    dislike
5    5 0.4  0.5       like
6    6 0.3  0.8       like
> df$Like <- df$Acceptance=="like"
> df$Dislike <- df$Acceptance=="dislike"
> set.seed(1)
> # multiclass classification: Like + Dislike are 2 logical classes;
> # can extend to more classes
> nn <- neuralnet(Like + Dislike ~ Salt + Fat, data = df,
                  linear.output = F, hidden = 3)
> # hidden = 3 indicates 1 hidden layer with 3 nodes; the syntax
> # hidden = c(3, 4) would mean 2 layers, 3 nodes in the first, 4 in the second
> # display weights
> nn$weights
[[1]]
[[1]][[1]]
             [,1]         [,2]         [,3]
[1,] -1.061143694  3.057021880  3.337952001
[2,]  2.412602472 -3.408663181 -4.293213530
[3,]  4.106434697 -6.525668384 -5.929418648

[[1]][[2]]
              [,1]         [,2]
[1,] -0.3495332882 -1.677855462
[2,]  5.8777145665 -3.606625360
[3,] -5.3529200726  5.620329700
[4,] -6.1115038896  6.696286851
> # display predictions
> prediction(nn)
Data Error: 0;
$rep1
  Salt Fat            Like       Dislike
1  0.1 0.1 0.0002415535993 0.99965512479
2  0.4 0.2 0.0344215785564 0.96556787694
3  0.5 0.2 0.1248666741740 0.87816827940
4  0.5 0.4 0.8841904620140 0.12612437721
5  0.8 0.3 0.9591361793188 0.04505630529
6  0.9 0.2 0.9349452648141 0.01022732257
$data
  Salt Fat Like Dislike
1  0.1 0.1    0       1
2  0.4 0.2    0       1
3  0.5 0.2    0       1
4  0.5 0.4    1       0
5  0.8 0.3    1       0
6  0.9 0.2    1       0
> # plot network
> plot(nn, rep="best")
[Figure: plot of the fitted network — inputs Salt and Fat, one hidden layer of 3 nodes, outputs Like and Dislike. Error: 0.03757  Steps: 76]
plot.nn arguments:
x: an object of class nn
rep: repetition of the neural network. If rep="best", the repetition
with the smallest error will be plotted. If not stated, all repetitions
will be plotted, each in a separate window.
#### Table 11.3
> library(caret)
> predict <- compute(nn, data.frame(df$Salt, df$Fat))
> predict$neurons[[2]]
     [,1]         [,2]         [,3]         [,4]
[1,]    1 0.8645451215 0.2171991846 0.1529212964
[2,]    1 0.3910198947 0.8873134952 0.9101680690
[3,]    1 0.6660899104 0.5959029849 0.6070151908
[4,]    1 0.1156845856 0.5118869030 0.5013653731
[5,]    1 0.8512504353 0.2213912623 0.2349162818
[6,]    1 0.8840757738 0.1641581374 0.1329130140
> predict$net.result
                [,1]          [,2]
[1,] 0.9349452648141 0.01022732257
[2,] 0.0002415535993 0.99965512479
[3,] 0.0344215785564 0.96556787694
[4,] 0.1248666741740 0.87816827940
[5,] 0.8841904620140 0.12612437721
[6,] 0.9591361793188 0.04505630529
> apply(predict$net.result, 1, which.max)
[1] 1 2 2 2 1 1
Remarks:
apply() is an R function that applies an operation over a matrix or array; the operation can be carried out over the rows, over the columns, or over both.
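A tiny illustration (the matrix m and its values are made up):

m <- matrix(c(0.93, 0.01,
              0.03, 0.97),
            nrow = 2, byrow = TRUE)   # two rows of class propensities
apply(m, 1, which.max)                # 1 2: winning column in each row
apply(m, 1, which.max) - 1            # 0 1: recoded as 0/1 class labels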
> df
  Obs. Fat Salt Acceptance  Like Dislike
1    1 0.2  0.9       like  TRUE   FALSE
2    2 0.1  0.1    dislike FALSE    TRUE
3    3 0.2  0.4    dislike FALSE    TRUE
4    4 0.2  0.5    dislike FALSE    TRUE
5    5 0.4  0.5       like  TRUE   FALSE
6    6 0.3  0.8       like  TRUE   FALSE
> table(ifelse(predicted.class=="1", "dislike", "like"), df$Acceptance)

          dislike like
  dislike       3    0
  like          0    3
[1] "matrix"
> names (predict )
NULL
> predict t:lq Dtl.lq
t,1l l,2l
?
li
t'
,.J
t1
t2
l0
l0
. 9349452648 0 . 0't 022-132
. 0002415536 0. 9996551-2
,A t3 l0 . 0344215187 0. 965s6788
IA l0 .L2486661 48 0. 87816828
Y
Dl,. ts, l0 . 884L904620 0. 1261 2438
16, l0 .9591361793 0.0450563r tyro,l
> predicted.class <- apply(predict, 1, which.max) - 1
> table(ifelse(predicted.class=="1", "dislike", "like"),
        df$Acceptance)

          dislike like
  dislike       3    0
  like          0    3
Data Normalization
One of the most important procedures when building a neural network is data normalization: adjusting the data to a common scale so that predicted and actual values can be compared accurately. We can do this in two ways in R:
• Scale the data frame automatically using the scale function in R, which standardizes each column:

$$z = \frac{x - \bar{x}}{s}$$

• Transform the data using a max-min normalization technique
Scaled Normalization
For this method we use R's built-in scale(), which applies the standardization above to each column.
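A minimal sketch, assuming df holds the tiny example with numeric columns Fat and Salt (the standardized Salt/Fat values in the refit further below come from exactly this transformation):

scaled.df <- as.data.frame(scale(df[, c("Fat", "Salt")]))
head(scaled.df)   # each column now has mean 0 and standard deviation 1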
Max-Min Normalization
For this method, we invoke the following function to normalize our data:

normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))   # maps each column to [0, 1]
}

Then, we use lapply to run the function across our existing data (e.g., mydata):

maxmindf <- as.data.frame(lapply(mydata, normalize))
> # refit after standardizing; df now holds the scaled Salt and Fat values
> nn <- neuralnet(Like + Dislike ~ Salt + Fat, data = df,
                  linear.output = F, hidden = 3)
> #nn # long output
[[1]][[2]]
           [,1]       [,2]
[1,]  0.4923133  0.5164109
[2,] -3.2832295  1.7656123
[3,] -2.0970303  1.2261949
[4,]  4.0719476 -3.6538050

> prediction(nn)
Data Error: 0;
$rep1
        Salt        Fat        Like    Dislike
1 -1.5071514 -1.2909944 0.007591287 0.97061726
2 -0.4637389 -0.3227486 0.015621549 0.95544668
3 -0.1159347 -0.3227486 0.031765916 0.93157454
4  1.2752820 -0.3227486 0.967788452 0.07879514
5  0.9274778  0.6454972 0.986252988 0.04763819
6 -0.1159347  1.6137437 0.986508633 0.04769084

$data
        Salt        Fat Like Dislike
1 -1.5071514 -1.2909945    0       1
2 -0.4637389 -0.3227486    0       1
3 -0.1159347 -0.3227486    0       1
4  1.2752820 -0.3227486    1       0
5  0.9274778  0.6454972    1       0
6 -0.1159347  1.6137437    1       0
> # plot network
> plot(nn, rep="best")
n",,,,J',14
Va( lr{ J
dislike 1i-ke
H"( disfike 30
l) li ke 03
t4
Example 2: Classifying Accident Severity

[Figure: a neural network with two nodes in the hidden layer (accidents data)]
> head(accidents.df)
[output abridged: the first six records on the predictors, including HOUR_I_R, ALCHL_I, ALIGN_I, STRATUM_R, WRK_ZONE, INT_HWY, LGTCON_I_R, MANCOL_I, PED_ACC_R, RELJCT_I_R, REL_RWY_R, PROFIL_I_R, SPD_LIM, SUR_COND, TRAF_CON_R, TRAF_WAY, VEH_INVL, WEATHER_R, INJURY_CRASH, NO_INJ_I, PRPTYDMG_CRASH, FATALITIES, and MAX_SEV_IR]
$ PRPTYDMG_CRASH: int 0 1 1 1 1 0 1 0 1 1 ...
$ FATALITIES    : int 0 0 0 0 0 0 0 0 0 0 ...
$ MAX_SEV_IR    : int 1 0 0 0 0 1 0 1 0 0 ...
> # selected variables
> vars <- c("ALCHL_I", "PROFIL_I_R", "VEH_INVL")
> vars
[1] "ALCHL_I"    "PROFIL_I_R" "VEH_INVL"
> # create dummies for the categorical predictor SUR_COND
> # (class.ind() comes from the nnet package)
> head(class.ind(accidents.df[training,]$SUR_COND))
     1 2 3 4 9
[1,] 1 0 0 0 0
[2,] 1 0 0 0 0
[3,] 1 0 0 0 0
[4,] 1 0 0 0 0
[5,] 1 0 0 0 0
[6,] 1 0 0 0 0
> tail(class.ind(accidents.df[training,]$SUR_COND))
         1 2 3 4 9
[25304,] 1 0 0 0 0
[25305,] 1 0 0 0 0
[25306,] 1 0 0 0 0
[25307,] 1 0 0 0 0
[25308,] 1 0 0 0 0
[25309,] 0 1 0 0 0
> head(class.ind(accidents.df[training,]$MAX_SEV_IR))
     0 1 2
[1,] 1 0 0
[2,] 1 0 0
[3,] 1 0 0
[4,] 0 1 0
[5,] 1 0 0
[6,] 1 0 0
> tail(class.ind(accidents.df[training,]$MAX_SEV_IR))
         0 1 2
[25304,] 0 1 0
[25305,] 0 1 0
[25306,] 0 1 0
[25307,] 1 0 0
[25308,] 0 1 0
[25309,] 0 1 0
[Figure: the fitted network for the accidents data — inputs ALCHL_I, PROFIL_I_R, VEH_INVL, and the SUR_COND dummies; output nodes for MAX_SEV_IR classes 0, 1, and 2 (SEV_I_0, SEV_I_1, SEV_I_2)]
> training.prediction <- compute(nn, trainData[, -c(8:11)])
> training.class <- apply(training.prediction$net.result, 1, which.max) - 1
> table(training.class, accidents.df[training,]$MAX_SEV_IR)

training.class     0     1     2
             0 10798  9964   190
             1  1644  2645    68
Remarks
Confusion Matrix (training data):
          Reference
Prediction     0     1     2
         0 10798  9964   190
         1  1644  2645    68
         2     0     0     0

Confusion Matrix (validation data):
          Reference
Prediction    0    1    2
         0 8179 5596  148
         1 1100 1791   60
         2    0    0    0
Common Criteria to Stop the Updating
• When weights change very little from one iteration to the next
To avoid overfitting:
• Track error in validation data
• Limit iterations
• Limit complexity of the network
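In neuralnet these criteria map onto arguments (a sketch, reusing the tiny example): threshold stops training once the partial derivatives of the error function fall below it, and stepmax caps the number of iterations.

nn <- neuralnet(Like + Dislike ~ Salt + Fat, data = df, hidden = 3,
                threshold = 0.01,   # stop when the error gradient is small
                stepmax = 1e5)      # hard cap on training iterations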
User Inputs
"Learning Rate"
• Low values "downweight" the new information from errors at each iteration
• This slows learning, but reduces the tendency to overfit to local structure
"Momentum"
• High values keep weights changing in the same direction as the previous iteration
• Likewise, this helps avoid overfitting to local structure, but also slows learning
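In neuralnet, the learning rate appears as the learningrate argument, which applies only when algorithm = "backprop" (the default algorithm, resilient backpropagation, adapts its step sizes automatically); neuralnet itself exposes no momentum argument, so that knob would come from other packages. A sketch on the tiny example:

nn.bp <- neuralnet(Like + Dislike ~ Salt + Fat, data = df,
                   hidden = 3, linear.output = FALSE,
                   algorithm = "backprop",  # classical back propagation
                   learningrate = 0.01)     # low value: slow, stable learning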
Arguments in neuralnet
hidden: a vector specifying the number of nodes per layer (thus specifying both the size and the number of layers)
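For example (a sketch on the tiny data):

nn.one <- neuralnet(Like + Dislike ~ Salt + Fat, data = df,
                    hidden = 3)          # one hidden layer of 3 nodes
nn.two <- neuralnet(Like + Dislike ~ Salt + Fat, data = df,
                    hidden = c(3, 4))    # two layers: 3 nodes, then 4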
Advantages
• Good predictive ability
• Can capture complex relationships
• No need to specify a model
Disadvantages
• Considered a "black box" prediction machine, with no insight into relationships between predictors and outcome
• No variable-selection mechanism, so you have to exercise care in selecting variables
• Heavy computational requirements if there are many variables (additional variables dramatically increase the number of weights to calculate)
Deep Learning
Summary
• Neural networks can be used for classification and prediction
• Can capture a very flexible/complicated relationship between the outcome and a set of predictors
• The network "learns" and updates its model iteratively as more data are fed into it
• Major danger: overfitting
• Requires large amounts of data
• Good predictive performance, yet "black box" in nature
• Deep learning, using very complex neural nets, is effective in image recognition and AI
Record the RMS error for the training data and the validation data. Repeat the process, changing the number of hidden layers and nodes to {single layer with 5 nodes}, {two layers, 5 nodes in each layer}.
iii. What happens to the RMS error for the training data as the number of layers and nodes increases?
iv. What happens to the RMS error for the validation data?
Comment on the appropriate number of layers and nodes for this application.