
DS 535: ADVANCED DATA MINING FOR BUSINESS

Lecture Notes #5: Neural Nets

(Textbook reading: Chapter 11)
Basic Idea
- Combine input information in a complex & flexible neural net "model"
- Model "coefficients" are continually tweaked in an iterative process
- The network's interim performance in classification and prediction informs successive tweaks

Network Structure
- Multiple layers
  1. Input layer (raw observations)
  2. Hidden layers
  3. Output layer
- Nodes
- Weights (like coefficients, subject to iterative adjustment)
- Bias values (also like coefficients, but not subject to iterative adjustment)

Schematic Diagram

[Diagram: input layer nodes feeding one or more hidden layers of neurons, which in turn feed the output layer]

Example: Using fat & salt content to predict consumer acceptance of cheese

[Diagram: input layer (node 1 = Fat, node 2 = Salt), hidden layer (nodes 3-5), output layer (node 6 = dislike, node 7 = like)]

Circles are nodes, w_ij on the arrows are weights, and theta_j are node bias values.

Tiny Example - Data

Obs.  Fat Score  Salt Score  Acceptance
1     0.2        0.9         like
2     0.1        0.1         dislike
3     0.2        0.4         dislike
4     0.2        0.5         dislike
5     0.4        0.5         like
6     0.3        0.8         like

Moving Through the Network

The Input Layer

For the input layer, input = output

E.g., for record #1:

Fat input = output = 0.2
Salt input = output = 0.9

Output of the input layer = input into the hidden layer

The Hidden Layer
- In this example, it has 3 nodes
- Each node receives as input the output of all input nodes
- Output of each hidden node is some function of the weighted sum of inputs:

output_j = g(\theta_j + \sum_{i=1}^{p} w_{ij} x_i)

The Weights

The weights \theta (theta) and w are typically initialized to random values in the range -0.05 to +0.05
- Equivalent to a model with random prediction (in other words, no predictive value)
- These initial weights are used in the first round of training

Output of Node 3 if g is a Logistic Function

output_j = g(\theta_j + \sum_{i=1}^{p} w_{ij} x_i)

output_3 = 1 / (1 + e^{-[-0.3 + (0.05)(0.2) + (0.01)(0.9)]}) = 0.43
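
A quick numeric check of this computation in base R; the bias and weight values are the ones shown in the example above, and only the logistic function g is defined here:

g <- function(z) 1 / (1 + exp(-z))        # logistic activation
theta3 <- -0.3                            # bias of hidden node 3
w13 <- 0.05; w23 <- 0.01                  # weights from Fat and Salt into node 3
fat <- 0.2; salt <- 0.9                   # record #1
g(theta3 + w13 * fat + w23 * salt)        # approximately 0.43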

Initial Pass of the Network

[Diagram: Fat = 0.2 and Salt = 0.9 enter the input layer; the three hidden nodes output 0.430, 0.507, and 0.511; the output nodes give 0.481 (dislike) and 0.506 (like)]

Node outputs (bold) using the first record in this example, and a logistic function.

Output Layer
- The output of the last hidden layer becomes input for the output layer
- Uses the same function as above, i.e., a function g of the weighted sum

output_6 = g(\theta_6 + \sum_{j=3}^{5} w_{j6} \, output_j) = 0.481

output_7 = g(\theta_7 + \sum_{j=3}^{5} w_{j7} \, output_j)
         = 1 / (1 + e^{-[-0.015 + (0.01)(0.430) + (0.05)(0.507) + (0.015)(0.511)]}) = 0.506

Mapping the output to a classification

- Output = 0.506 for "like" and 0.481 for "dislike"
- So the classification, at this early stage, is "like", using a cutoff probability of 0.5

Relation to Linear Regression

A net with a single output node and no hidden layers, where g is the identity function, takes the same form as a linear regression model:

\hat{y} = \theta + \sum_{i=1}^{p} w_i x_i
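
As a rough illustration (not from the textbook): on simulated data, neuralnet with no hidden layer and a linear output should recover roughly the same coefficients as lm(). This is a hedged sketch; it assumes the installed neuralnet version accepts hidden = 0, and the fitted weights depend on the random start:

library(neuralnet)
set.seed(123)
sim <- data.frame(x = runif(100))                 # simulated predictor in [0, 1]
sim$y <- 2 + 3 * sim$x + rnorm(100, sd = 0.1)     # linear outcome with noise

lm(y ~ x, data = sim)$coefficients                # intercept near 2, slope near 3
nn0 <- neuralnet(y ~ x, data = sim, hidden = 0, linear.output = TRUE)
nn0$weights                                       # bias and weight should be close to the lm coefficients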

Training the Model

Preprocessing Steps
- Scale variables to 0-1
- Categorical variables:
  - If equidistant categories, map to equidistant interval points in the 0-1 range
  - Otherwise, create dummy variables
- Transform (e.g., log) skewed variables

Initial Pass Through Network

- Goal: Find weights that yield the best predictions
- The process described above is repeated for all records
- At each record, the prediction is compared to the actual value
- The difference is the error for the output node
- The error is propagated back and distributed to all the hidden nodes and used to update their weights

Back Propagation ("back-prop")

- Output from output node k: \hat{y}_k
- Error associated with that node:

err_k = \hat{y}_k (1 - \hat{y}_k)(y_k - \hat{y}_k)

Note: this is like the ordinary error, multiplied by a correction factor

Error Is Used to Update Weights

w_{jk}^{new} = w_{jk}^{old} + l \cdot err_k

l: a constant between 0 and 1 that reflects the "learning rate" or "weight decay parameter"
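
A minimal sketch of one such update in R, following the update rule exactly as written above; the numbers loosely follow the worked example, and the learning rate l is an arbitrary illustrative choice:

y_hat <- 0.506                                 # current output of the "like" node for record #1
y     <- 1                                     # actual value: record #1 is "like"
err   <- y_hat * (1 - y_hat) * (y - y_hat)     # error with the logistic correction factor
l     <- 0.5                                   # learning rate (illustrative)
w_old <- 0.01                                  # current weight into the "like" node
w_new <- w_old + l * err                       # updated weight
c(err = err, w_new = w_new)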

Why It Works
- Big errors lead to big changes in weights
- Small errors leave weights relatively unchanged
- Over thousands of updates, a given weight keeps changing until the error associated with that weight is negligible, at which point weights change little

R Functions for Neural Nets

- neuralnet (used here)
- nnet (does not support multilayer networks)

#### Table 11.2

> library(neuralnet)
> df <- read.csv("tinydata.csv")
> df
  Obs. Fat Salt Acceptance
1    1 0.2  0.9       like
2    2 0.1  0.1    dislike
3    3 0.2  0.4    dislike
4    4 0.2  0.5    dislike
5    5 0.4  0.5       like
6    6 0.3  0.8       like

> # create one logical (TRUE/FALSE) column per class
> df$Like <- df$Acceptance=="like"
> df$Dislike <- df$Acceptance=="dislike"
> class(df$Like)
[1] "logical"
> class(df$Dislike)
[1] "logical"

> df
  Obs. Fat Salt Acceptance  Like Dislike
1    1 0.2  0.9       like  TRUE   FALSE
2    2 0.1  0.1    dislike FALSE    TRUE
3    3 0.2  0.4    dislike FALSE    TRUE
4    4 0.2  0.5    dislike FALSE    TRUE
5    5 0.4  0.5       like  TRUE   FALSE
6    6 0.3  0.8       like  TRUE   FALSE

> set.seed(1)
> # multiclass classification: Like + Dislike are 2 logical classes; can extend to more classes
> nn <- neuralnet(Like + Dislike ~ Salt + Fat, data = df,
                  linear.output = F, hidden = 3)
> # hidden = 3 indicates 1 hidden layer with 3 nodes; the syntax hidden = c(3, 4)
> # would mean 2 layers, 3 nodes in the first, 4 in the second

> # display weights
> # [[1]][[1]]: weights into the 3 hidden nodes (rows: bias, Salt, Fat)
> # [[1]][[2]]: weights into the 2 output nodes (rows: bias, hidden nodes 1-3)
> nn$weights
[[1]]
[[1]][[1]]
             [,1]         [,2]         [,3]
[1,] -1.061143694  3.057021880  3.337952001
[2,]  2.326024    -3.408663181 -4.293213530
[3,]  4.106434697 -6.525668384 -5.929418648

[[1]][[2]]
              [,1]         [,2]
[1,] -0.3495332882 -1.677855462
[2,]  5.8777145665 -3.606625360
[3,] -5.3529200726  5.620329700
[4,] -6.1115038896  6.696286851
> # display predictions
> prediction(nn)
Data Error: 0;
$rep1
  Salt Fat            Like       Dislike
1  0.1 0.1 0.0002415535993 0.99965512479
2  0.4 0.2 0.0344215785564 0.96556787694
3  0.5 0.2 0.1248666741740 0.87816827940
4  0.9 0.2 0.9349452648141 0.07022732257
5  0.8 0.3 0.9591361793188 0.04505630529
6  0.5 0.4 0.8841904620140 0.12672437721

$data
  Salt Fat Like Dislike
1  0.1 0.1    0       1
2  0.4 0.2    0       1
3  0.5 0.2    0       1
4  0.9 0.2    1       0
5  0.8 0.3    1       0
6  0.5 0.4    1       0

> # plot network
> plot(nn, rep="best")
[Network plot: inputs Salt and Fat, 3 hidden nodes, outputs Like and Dislike]

Error: 0.03757   Steps: 76

Remarks:

# neuralnet(formula, data, hidden = 1, threshold = 0.01,
#           stepmax = 1e+05, rep = 1, startweights = NULL,
#           learningrate.limit = NULL, learningrate.factor = list(minus = 0.5,
#           plus = 1.2), learningrate = NULL, lifesign = "none",
#           lifesign.step = 1000, algorithm = "rprop+", err.fct = "sse",
#           act.fct = "logistic", linear.output = TRUE, exclude = NULL,
#           constant.weights = NULL, likelihood = FALSE)

plot.nn
x: an object of class nn
rep: repetition of the neural network. If rep="best", the repetition with the smallest error will be plotted. If not stated, all repetitions will be plotted, each in a separate window.

#### Table 11.3

> library(caret)
Loading required package: lattice
Loading required package: ggplot2
Warning message:
package 'caret' was built under R version 3.4.4
> # prediction from nn model using "compute"
> predict <- compute(nn, data.frame(df$Salt, df$Fat))
> class(predict)
[1] "list"
> names(predict)
[1] "neurons"    "net.result"

> predict
$neurons
$neurons[[1]]
  1 df.Salt df.Fat
1 1     0.9    0.2
2 1     0.1    0.1
3 1     0.4    0.2
4 1     0.5    0.2
5 1     0.5    0.4
6 1     0.8    0.3

$neurons[[2]]
  [,1]         [,2]         [,3]         [,4]
1    1 0.8645451215 0.2114991846 0.1529212964
2    1 0.3970198947 0.8873134952 0.9101680690
3    1 0.6660899104 0.5959029849 0.6070151908
4    1 0.7156845856 0.5118869030 0.5013653731
5    1 0.8512504353 0.2213912623 0.2349162818
6    1 0.8840757738 0.1641581374 0.1329130140

$net.result
                [,1]          [,2]
[1,] 0.9349452648141 0.07022732257
[2,] 0.0002415535993 0.99965512479
[3,] 0.0344215785564 0.96556787694
[4,] 0.1248666741740 0.87816827940
[5,] 0.8841904620140 0.12672437721
[6,] 0.9591361793188 0.04505630529

> apply(predict$net.result, 1, which.max)
[1] 1 2 2 2 1 1

Remarks:

apply() is an R function which enables quick operations on a matrix, vector, or array. The operations can be done on the rows, the columns, or even both of them.

How does it work?

The pattern is really simple: apply(variable, margin, function).

- variable is the variable you want to apply the function to.
- margin specifies if you want to apply by row (margin = 1), by column (margin = 2), or for each element (margin = 1:2). Margin can even be greater than 2 if we work with variables of dimension greater than two.
- function is the function you want to apply to the elements of your variable.

# the matrix we will work on:
a = matrix(c(1:15), nrow = 5, ncol = 3)
# will apply the function mean to all the elements of each row
apply(a, 1, mean)
# [1]  6  7  8  9 10
# will apply the function mean to all the elements of each column
apply(a, 2, mean)
# [1]  3  8 13

> predicted.class = apply(predict$net.result, 1, which.max) - 1

> apply(predict$net.result, 1, which.max) - 1
[1] 0 1 1 1 0 0

10
>df
Obs . fdL Salt Acceptance Like Disiike
11 0.2 0.9 Ii ke TRUE EALSE
22 0.1 0.1 dislike FALSE TRUE
33 0.2 0.4 dislike EALSE TRUE
44 0.2 0.5 dislike FALSE TRUE
55 0.4 0.5 Iike TR.LIE EALSE
66 0.3 0.8 Ii ke TRL1E EALSE I ult.t& +b ,utt t hr.l.tllll| ll,1
loj\,.cJ t#<v,,n*-
) table (ifelse (predicted. class::" 1", "disLi-ke", "1ike"), df $acceptfnce)
A"+hr.{
dislike Iike r -r lir illq
disli ke 3 0
c -l -1, lct
Yr'i fi ke 0 3

> # alternatively, use 'predict' to compute
> predict <- predict(nn, df)
> class(predict)
[1] "matrix"
> names(predict)
NULL

> predict
             [,1]       [,2]
[1,] 0.9349452648 0.07022732
[2,] 0.0002415536 0.99965512
[3,] 0.0344215786 0.96556788
[4,] 0.1248666742 0.87816828
[5,] 0.8841904620 0.12672438
[6,] 0.9591361793 0.04505631

> predicted.class = apply(predict, 1, which.max) - 1
> table(ifelse(predicted.class=="1", "dislike", "like"), df$Acceptance)

          dislike like
  dislike       3    0
  like          0    3


Data Normalization

One of the most important procedures when forming a neural network is data normalization. This involves adjusting the data to a common scale so as to accurately compare predicted and actual values. We can do this in two ways in R:
- Scale the data frame automatically using the scale function in R
- Transform the data using a max-min normalization technique

Scaled Normalization
scale() standardizes each column: (x - mean(x)) / sd(x), giving mean 0 and standard deviation 1.

Max-Min Normalization
For this method, we invoke the following function to normalize our data; it maps each column to the [0, 1] range:

normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

Then, we use lapply to run the function across our existing data (e.g., mydata):

maxmindf <- as.data.frame(lapply(mydata, normalize))
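
For instance, applied to the Fat and Salt columns of the tiny cheese data loaded earlier (a brief illustration, assuming df and normalize as defined above; each column is rescaled so its minimum becomes 0 and its maximum 1):

maxmindf <- as.data.frame(lapply(df[, c("Fat", "Salt")], normalize))
maxmindf    # Fat and Salt now lie in [0, 1]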

> # normalizing the Salt and Fat columns in df
> df[, 2:3] <- scale(df[, 2:3])
> scale(df[, 2:3])
            Fat       Salt
[1,] -0.3227486  1.2752820
[2,] -1.2909944 -1.5071514
[3,] -0.3227486 -0.4637389
[4,] -0.3227486 -0.1159347
[5,]  1.6137431 -0.1159347
[6,]  0.6454972  0.9274778
attr(,"scaled:center")
         Fat         Salt
3.700743e-17 8.326673e-17
attr(,"scaled:scale")
 Fat Salt
   1    1
> nn <- neuralnet(Like + Dislike ~ Salt + Fat, data = df,
                  linear.output = F, hidden = 3)
> #nn  # long output

> # display weights
> nn$weights
[[1]]
[[1]][[1]]
           [,1]        [,2]      [,3]
[1,]  0.3960458  0.03305449 -1.049633
[2,] -2.4411788 -1.47511998  4.219826
[3,] -2.6744981 -2.18269129  3.543871

[[1]][[2]]
           [,1]       [,2]
[1,]  0.4923133  0.5164109
[2,] -3.2832295  1.7656123
[3,] -2.0970303  1.2261949
[4,]  4.0719476 -3.6538050

> # display predictions
> prediction(nn)
Data Error: 0;
$rep1
        Salt        Fat        Like    Dislike
1 -1.5071514 -1.2909944 0.007591287 0.91061726
2 -0.4637389 -0.3227486 0.015621549 0.95544668
3 -0.1159347 -0.3227486 0.030765916 0.93157454
4  1.2752820 -0.3227486 0.967788452 0.07879514
5  0.9274778  0.6454972 0.986252988 0.04763819
6 -0.1159347  1.6137431 0.986508633 0.04769084

$data
        Salt        Fat Like Dislike
1 -1.5071514 -1.2909944    0       1
2 -0.4637389 -0.3227486    0       1
3 -0.1159347 -0.3227486    0       1
4  1.2752820 -0.3227486    1       0
5  0.9274778  0.6454972    1       0
6 -0.1159347  1.6137431    1       0

> # plot network
> plot(nn, rep="best")

[Network plot on the standardized data]

Error: 0.070468   Steps: 37

> predict <- predict(nn, df)
> predict
            [,1]       [,2]
[1,] 0.967788452 0.07879514
[2,] 0.007591287 0.91061726
[3,] 0.015621549 0.95544668
[4,] 0.030765916 0.93157454
[5,] 0.986508633 0.04769084
[6,] 0.986252988 0.04763819

> predicted.class = apply(predict, 1, which.max) - 1
> table(ifelse(predicted.class=="1", "dislike", "like"), df$Acceptance)

          dislike like
  dislike       3    0
  like          0    3

Example 2: Classifying Accident Severity

Subset from the accidents data, for a high-fatality region:

Obs.  ALCHL_I  PROFIL_I_R  SUR_COND  VEH_INVL  MAX_SEV_IR
[Table: the first 10 records for these variables]

Description of Variables for Automobile Accident Example

ALCHL_I      Presence (1) or absence (2) of alcohol
PROFIL_I_R   Profile of the roadway: level (1), other (0)
SUR_COND     Surface condition of the road: dry (1), wet (2), snow/slush (3), ice (4), unknown (9)
VEH_INVL     Number of vehicles involved
MAX_SEV_IR   Presence of injuries/fatalities: no injuries (0), injury (1), fatality (2)

A neural network with two nodes in the hidden layer (accidents data)

#### Table 11.6, 11.7

> library(neuralnet); library(nnet); library(caret)

> accidents.df <- read.csv("accidents1.csv")

> View(accidents.df)

> head(accidents.df)
[Output: the first six rows of all 24 variables; see the str() output below for the variable names]
> str(accidents.df)
'data.frame':   42183 obs. of  24 variables:
 $ HOUR_I_R      : int  0 1 1 1 1 1 1 1 0 1 ...
 $ ALCHL_I       : int  2 2 2 2 1 2 2 2 2 2 ...
 $ ALIGN_I       : int  2 1 1 1 1 1 1 1 1 1 ...
 $ STRATUM_R     : int  1 0 0 1 0 1 0 1 1 0 ...
 $ WRK_ZONE      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ WKDY_I_R      : int  1 1 1 0 1 1 1 1 1 0 ...
 $ INT_HWY       : int  0 1 0 0 0 0 1 0 0 0 ...
 $ LGTCON_I_R    : int  3 3 3 3 3 3 3 3 3 3 ...
 $ MANCOL_I_R    : int  0 2 2 2 0 0 0 0 0 0 ...
 $ PED_ACC_R     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ RELJCT_I_R    : int  1 1 1 1 0 1 0 0 1 1 ...
 $ REL_RWY_R     : int  0 1 1 1 1 1 1 0 0 0 ...
 $ PROFIL_I_R    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ SPD_LIM       : int  40 70 35 35 25 70 70 35 30 25 ...
 $ SUR_COND      : int  4 4 4 4 4 4 4 4 4 4 ...
 $ TRAF_CON_R    : int  0 0 1 1 0 0 0 0 0 0 ...
 $ TRAF_WAY      : int  3 2 2 ...
 $ VEH_INVL      : int  1 1 ...
 $ WEATHER_R     : int  1 2 2 ...
 $ INJURY_CRASH  : int  1 0 0 0 1 1 1 0 1 0 ...
 $ NO_INJ_I      : int  1 0 0 0 ...
 $ PRPTYDMG_CRASH: int  0 1 1 1 ...
 $ FATALITIES    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ MAX_SEV_IR    : int  1 0 0 0 0 1 0 1 0 0 ...
> # selected variables
> vars <- c("ALCHL_I", "PROFIL_I_R", "VEH_INVL")
> vars
[1] "ALCHL_I"    "PROFIL_I_R" "VEH_INVL"

> # partition the data
> set.seed(2)
> training=sample(row.names(accidents.df), dim(accidents.df)[1]*0.6)
> validation=setdiff(row.names(accidents.df), training)

> # class.ind builds an indicator (dummy) matrix with one column per level of cl
> class.ind <- function(cl) {
    n <- length(cl)
    cl <- as.factor(cl)
    x <- matrix(0, n, length(levels(cl)))
    x[(1:n) + n * (unclass(cl) - 1)] <- 1
    dimnames(x) <- list(names(cl), levels(cl))
    x
  }

> # dummify SUR_COND
> head(class.ind(accidents.df[training,]$SUR_COND))
     1 2 3 4 9
[1,] 1 0 0 0 0
[2,] 1 0 0 0 0
[3,] 1 0 0 0 0
[4,] 1 0 0 0 0
[5,] 1 0 0 0 0
[6,] 1 0 0 0 0
> tail(class.ind(accidents.df[training,]$SUR_COND))
         1 2 3 4 9
[25304,] 1 0 0 0 0
[25305,] 1 0 0 0 0
[25306,] 1 0 0 0 0
[25307,] 1 0 0 0 0
[25308,] 1 0 0 0 0
[25309,] 0 1 0 0 0
12s309 , I 01000
> # dummify MAX_SEV_IR (0 = no injury, 1 = injury, 2 = fatality)
> head(class.ind(accidents.df[training,]$MAX_SEV_IR))
     0 1 2
[1,] 1 0 0
[2,] 1 0 0
[3,] 1 0 0
[4,] 0 1 0
[5,] 1 0 0
[6,] 1 0 0
> tail(class.ind(accidents.df[training,]$MAX_SEV_IR))
         0 1 2
[25304,] 0 1 0
[25305,] 0 1 0
[25306,] 0 1 0
[25307,] 1 0 0
[25308,] 0 1 0
[25309,] 0 1 0

> # when y has multiple classes - need to dummify
> trainData <- cbind(accidents.df[training, c(vars)],
                     class.ind(accidents.df[training,]$SUR_COND),
                     class.ind(accidents.df[training,]$MAX_SEV_IR))
> names(trainData) <- c(vars,
                        paste("SUR_COND_", c(1, 2, 3, 4, 9), sep=""),
                        paste("MAX_SEV_IR_", c(0, 1, 2), sep=""))

> validData <- cbind(accidents.df[validation, c(vars)],
                     class.ind(accidents.df[validation,]$SUR_COND),
                     class.ind(accidents.df[validation,]$MAX_SEV_IR))
> names(validData) <- c(vars,
                        paste("SUR_COND_", c(1, 2, 3, 4, 9), sep=""),
                        paste("MAX_SEV_IR_", c(0, 1, 2), sep=""))

> # run nn with 2 hidden nodes
> # use hidden= with a vector of integers specifying the number of hidden nodes in each layer
> nn <- neuralnet(MAX_SEV_IR_0 + MAX_SEV_IR_1 + MAX_SEV_IR_2 ~
                  ALCHL_I + PROFIL_I_R + VEH_INVL + SUR_COND_1 + SUR_COND_2
                  + SUR_COND_3 + SUR_COND_4, data = trainData, hidden = 2)

> # this takes longer to run (roughly 25,000 training records)
> plot(nn)
[Network plot: inputs ALCHL_I, PROFIL_I_R, VEH_INVL, SUR_COND_1 to SUR_COND_4; two hidden nodes; outputs MAX_SEV_IR_0, MAX_SEV_IR_1, MAX_SEV_IR_2]

Error: 6359.603423   Steps: 29196
> training.prediction <- compute(nn, trainData[, -c(8:11)])
> training.class <- apply(training.prediction$net.result, 1, which.max) - 1
> table(training.class, accidents.df[training,]$MAX_SEV_IR)

training.class     0     1     2
             0 10798  9964   190
             1  1644  2645    68

> validation.prediction <- compute(nn, validData[, -c(8:11)])
> validation.class <- apply(validation.prediction$net.result, 1, which.max) - 1
> table(validation.class, accidents.df[validation,]$MAX_SEV_IR)

validation.class    0    1    2
               0 7179 6596  148
               1 1100 1791   60

Remarks

Confusion Matrix (training)
          Reference
Prediction     0     1     2
         0 10798  9964   190
         1  1644  2645    68
         2     0     0     0

Confusion Matrix (validation)
          Reference
Prediction    0    1    2
         0 7179 6596  148
         1 1100 1791   60
         2    0    0    0
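
The matrices above have the layout of caret's confusionMatrix() output; a hedged sketch of how they could be produced from the tables computed earlier, forcing both vectors onto the common levels 0, 1, 2 so that the empty row for class 2 appears:

library(caret)
lev <- c(0, 1, 2)
confusionMatrix(factor(training.class,   levels = lev),
                factor(accidents.df[training, ]$MAX_SEV_IR,   levels = lev))
confusionMatrix(factor(validation.class, levels = lev),
                factor(accidents.df[validation, ]$MAX_SEV_IR, levels = lev))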

Common Criteria to Stop the Updating
- When weights change very little from one iteration to the next
- When the misclassification rate reaches a required threshold
- When a limit on runs is reached
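
In neuralnet, the analogous controls are threshold (stop once the partial derivatives of the error function fall below this value) and stepmax (a hard cap on training steps). The values below are the package defaults, shown here on the tiny example data from earlier purely for illustration:

nn <- neuralnet(Like + Dislike ~ Salt + Fat, data = df, hidden = 3,
                linear.output = FALSE,
                threshold = 0.01,     # stop when error-function gradients fall below 0.01
                stepmax = 1e+05)      # give up after 100,000 steps if not converged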

Avoiding Overfitting

With sufficient iterations, a neural net can easily overfit the data

To avoid overfitting:
- Track error in validation data
- Limit iterations
- Limit complexity of network

User Inputs

Specify Network Architecture

Number of hidden layers
- Most popular: one hidden layer

Number of nodes in hidden layer(s)
- More nodes capture complexity, but increase chances of overfit

Number of output nodes
- For classification with m classes, use m or m-1 nodes
- For numerical prediction, use one

"Learning Rate"
' Low values "downweight" the new information from errors at each iteration
' This slo*s leaming, but reduces tendency to overfit to local structure

"Momentu m"
' High values keep weights changing in same direction as previous iteration
' Likewise. this helps avoid overfitting to local structure, but also slows learning

Arguments in neuralnet

- hidden: a vector specifying the number of nodes per layer (thus specifying both the size and the number of layers)
- learningrate: a value between 0 and 1

Advantages
- Good predictive ability
- Can capture complex relationships
- No need to specify a model

Disadvantages
- Considered a "black box" prediction machine, with no insight into relationships between predictors and outcome
- No variable-selection mechanism, so you have to exercise care in selecting variables
- Heavy computational requirements if there are many variables (additional variables dramatically increase the number of weights to calculate)

Deep Learning

The most active application area for neural nets

- In image recognition, pixel values are predictors, and there might be 100,000+ predictors (big data!); voice recognition is similar
- Deep neural nets with many layers ("neural nets on steroids") have facilitated revolutionary breakthroughs in image/voice recognition, and in artificial intelligence (AI)
- Key is the ability to self-learn features ("unsupervised")
- For example, clustering could separate the pixels in this 12" by 12" football field image into the "green field" and "yard marker" areas without knowing that those concepts exist
- From there, the concept of a boundary, or "edge", emerges
- Successive stages move from identification of local, simple features to more global & complex features
Summary
- Neural networks can be used for classification and prediction
- Can capture a very flexible/complicated relationship between the outcome and a set of predictors
- The network "learns" and updates its model iteratively as more data are fed into it
- Major danger: overfitting
- Requires large amounts of data
- Good predictive performance, yet "black box" in nature
- Deep learning, using very complex neural nets, is effective in image recognition and AI
Problems


3. Car Sales. Consider the data on used cars (ToyotaCorolla.csv) with 1436 records and details on 38 attributes, including Price, Age, KM, HP, and other specifications. The goal is to predict the price of a used Toyota Corolla based on its specifications.

   a. Fit a neural network model to the data. Use a single hidden layer with 2 nodes.
      - Use predictors Age_08_04, KM, Fuel_Type, HP, Automatic, Doors, Quarterly_Tax, Mfr_Guarantee, Guarantee_Period, Airco, Automatic_airco, CD_Player, Powered_Windows, Sport_Model, and Tow_Bar.
      - Remember to first scale the numerical predictor and outcome variables to a 0-1 scale (use function preProcess() with method = "range"; see Chapter 7) and convert categorical predictors to dummies.

   Record the RMS error for the training data and the validation data. Repeat the process, changing the number of hidden layers and nodes to {single layer with 5 nodes}, {two layers, 5 nodes in each layer}.

   iii. What happens to the RMS error for the training data as the number of layers and nodes increases?
   iv. What happens to the RMS error for the validation data? Comment on the appropriate number of layers and nodes for this application.
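
A hedged starter sketch for part (a), assuming the column names given in the problem statement and the class.ind() helper defined earlier (or nnet::class.ind); the partitioning seed and other details are arbitrary choices, not a prescribed solution:

library(caret); library(neuralnet); library(nnet)
car.df <- read.csv("ToyotaCorolla.csv")
vars <- c("Price", "Age_08_04", "KM", "Fuel_Type", "HP", "Automatic", "Doors",
          "Quarterly_Tax", "Mfr_Guarantee", "Guarantee_Period", "Airco",
          "Automatic_airco", "CD_Player", "Powered_Windows", "Sport_Model", "Tow_Bar")
car.df <- car.df[, vars]

# dummy-code the categorical predictor, then rescale everything to the 0-1 range
car.df <- cbind(car.df[, setdiff(vars, "Fuel_Type")], class.ind(car.df$Fuel_Type))
car.norm <- predict(preProcess(car.df, method = "range"), car.df)

# partition, then fit a single hidden layer with 2 nodes (change 'hidden' for the later parts)
set.seed(1)
train.rows <- sample(rownames(car.norm), 0.6 * nrow(car.norm))
valid.rows <- setdiff(rownames(car.norm), train.rows)
form <- as.formula(paste("Price ~", paste(setdiff(names(car.norm), "Price"), collapse = " + ")))
nn <- neuralnet(form, data = car.norm[train.rows, ], hidden = 2, linear.output = TRUE)

# RMS error on the rescaled training and validation data
rmse <- function(pred, actual) sqrt(mean((pred - actual)^2))
rmse(predict(nn, car.norm[train.rows, ]), car.norm[train.rows, "Price"])
rmse(predict(nn, car.norm[valid.rows, ]), car.norm[valid.rows, "Price"])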
