
STATISTICA Formula Guide

Weight of Evidence Module


Copyright 2013 Version 1 PAGE 1 OF 5
Making the World More Productive
Formula Guide
Weight of Evidence Module
The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values
in continuous and categorical predictor variables into discrete categories automatically, and to assign
to each category a unique WoE value. This recoding is conducted in a manner that will produce the
largest differences between the recoded groups with respect to the WoE values. In addition, other
constraints are observed while the program determines solutions for the optimal "binning" of
predictors.
Optimal Coding of Predictors
Specifically, the goal of the algorithms implemented in the automated WoE module is to
identify the best groupings for predictor variables that will result in the greatest differences
in WoE between groups. For continuous variables the automated WoE module identifies
the best recoding to weight-of-evidence values. For categorical predictors or interactions
between coded predictors, users can combine groups with similar observed WoE to create
new coded predictors with continuous weight-of-evidence values.
Continuous Variables
For continuous predictors, first a default coding is derived using the Classification and
Regression Trees (C&RT) algorithm. For default categories with fewer than 20 groups,
STATISTICA will explicitly search through all possible combinations of default groups to
achieve the smallest number of groups with the greatest Information Value (IV). When the
number of groups is greater than 20, STATISTICA uses a modified CHAID approach: the
CHAID algorithm is applied with the change in WoE as the criterion instead of the
customary $\chi^2$ criterion.
Three types of constrained WoE recoding solutions are provided, subject to their existence:
- Monotone solutions, where the WoE values of all adjacent recoded groups
  (intervals) will either always increase (positive monotone relationship of predictor intervals
  to WoE) or always decrease (negative monotone relationship of predictor intervals to WoE).
- Quadratic solutions, where the relationship between the coded value ranges
  (intervals) and WoE can have a single reversal, so that the resulting function is either
  U-shaped or inverse-U-shaped.
- Cubic solutions, where the relationship between the coded value ranges (intervals)
  and WoE values can have two reversals, so that the resulting function is S-shaped.
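The three constrained shapes differ only in how many times the WoE sequence changes direction across adjacent intervals. The following is a minimal sketch of that idea, not the module's own procedure; the function names `count_reversals` and `classify_shape` are hypothetical, and flat steps between equal WoE values are ignored by assumption.

```python
def count_reversals(woe):
    """Count direction changes in the WoE values of adjacent intervals."""
    # Sign of each step between neighbours; equal neighbours are skipped (assumption).
    signs = [1 if b > a else -1 for a, b in zip(woe, woe[1:]) if b != a]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def classify_shape(woe):
    """Map the number of reversals to the solution types named in the guide."""
    labels = {0: "monotone", 1: "quadratic", 2: "cubic"}
    return labels.get(count_reversals(woe), "unconstrained")


print(classify_shape([-40.0, -10.0, 5.0, 30.0]))   # monotone
print(classify_shape([-30.0, 10.0, 25.0, -5.0]))   # quadratic
print(classify_shape([-20.0, 15.0, -5.0, 30.0]))   # cubic
```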
Two types of unconstrained WoE recoding solutions are provided:

- Custom coding is based on the default binning scheme, using either C&RT or 10
  groups of approximately equal size.
- The no restrictions coding is based on the custom solution after running either
  the exhaustive search or the CHAID algorithm.
Note that the initial bins may be adjusted prior to the algorithm in order to make sure that
each bin satisfies the user-specified minimum N and minimum Bad N parameters.
Categorical Variables
For categorical (discrete) predictors, the default (original) grouping is further refined using
the modified CHAID approach.
Two types of unconstrained WoE recoding solutions are provided:
- Custom coding is based on the default binning of the group.
- The no restrictions coding is based on the default categorization provided by the
  modified CHAID algorithm.
Note that the initial bins may be adjusted prior to the algorithm in order to make sure that
each bin satisfies the user-specified minimum N and minimum Bad N parameters.
Interactions
For pairs of coded predictors, the modified CHAID approach is implemented using
interaction coding of the two-way interaction table or user-defined coding.
Statistics
Chi-square
$$\chi^2 = \sum_{i=1}^{K} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$$

This statistic is distributed according to a chi-square distribution with degrees of
freedom equal to the difference between the number of parameters under the
alternative hypothesis and the number of parameters under the null hypothesis.
Cramer's V
$$V = \sqrt{\frac{\chi^2 / N}{\min(I-1,\, J-1)}}$$

Notation:
$N$ = Total number of observations
$\min(I-1,\, J-1)$ = Minimum of row dimension ($I$) minus 1 and column dimension ($J$) minus 1
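As an illustration of the two formulas above, the sketch below computes the chi-square statistic and Cramer's V for a small table of counts. It assumes the cells are those of a bins-by-Good/Bad contingency table and that expected counts come from the row and column totals; the function names and the example counts are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns (assumption).
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table):
    """Cramer's V: sqrt((chi2 / N) / min(rows - 1, columns - 1))."""
    stat, n = chi_square(table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt((stat / n) / k)


# Hypothetical counts: two bins (rows) by Good/Bad (columns).
table = [[30, 10], [20, 40]]
stat, n = chi_square(table)
print(round(stat, 4), round(cramers_v(table), 4))
```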

F-test
$$F = \frac{\sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2 \,/\, (K - 1)}{\sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \,/\, (N - K)}$$

Notation:
$\bar{x}_i$ = sample mean of the i-th group
$n_i$ = number of observations in the i-th group
$\bar{x}$ = overall mean of the data
$K$ = number of groups
$x_{ij}$ = j-th observation in the i-th out of K groups
$N$ = overall sample size
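The F statistic above is the ratio of the between-group mean square to the within-group mean square. A minimal sketch with made-up data follows; `f_statistic` is a hypothetical helper, not part of STATISTICA.

```python
def f_statistic(groups):
    """One-way F statistic for a list of groups of numeric observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group mean square: sum of n_i * (mean_i - grand mean)^2 over K - 1.
    between = sum(len(g) * (m - grand_mean) ** 2
                  for g, m in zip(groups, means)) / (k - 1)
    # Within-group mean square: squared deviations from group means over N - K.
    within = sum((x - m) ** 2
                 for g, m in zip(groups, means) for x in g) / (n_total - k)
    return between / within


# Illustrative data only: three groups of three observations each.
print(f_statistic([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))  # approximately 13
```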
Gini
$$g = 2 \left[\frac{\text{Number of Bads}}{N}\right] \left[\frac{\text{Number of Goods}}{N}\right]$$

Notation:
$N$ = Total number of observations
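A short numeric check of the Gini formula, assuming N is the sum of the Good and Bad counts in the group being evaluated; `gini` is a hypothetical helper name.

```python
def gini(n_bads, n_goods):
    """Gini measure g = 2 * (Bads / N) * (Goods / N)."""
    n = n_bads + n_goods  # assumption: N counts only Goods and Bads
    return 2 * (n_bads / n) * (n_goods / n)


print(gini(30, 70))  # approximately 0.42
print(gini(50, 50))  # 0.5, the largest value, reached at an even split
```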
Information Value (IV)
$$IV = \sum_{i=1}^{K} \left[ \left(\text{Relative Frequency of Goods}_i - \text{Relative Frequency of Bads}_i\right) \times \ln\left(\frac{\text{Relative Frequency of Goods}_i}{\text{Relative Frequency of Bads}_i}\right) \right]$$

The IV of a predictor is related to the sum of the (absolute) values for WoE over all
groups. Thus, it expresses the amount of diagnostic information of a predictor
variable for separating the Goods from the Bads.
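The sum above can be sketched in a few lines. This example assumes that the relative frequency of Goods in a group is the group's Good count divided by the total number of Goods (and likewise for Bads), which the guide does not spell out; `information_value` and the counts are hypothetical. Every group must contain at least one Good and one Bad, otherwise the logarithm is undefined.

```python
import math


def information_value(goods, bads):
    """IV over groups, given per-group counts of Goods and Bads."""
    total_goods, total_bads = sum(goods), sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        rel_g = g / total_goods  # relative frequency of Goods (assumed definition)
        rel_b = b / total_bads   # relative frequency of Bads (assumed definition)
        iv += (rel_g - rel_b) * math.log(rel_g / rel_b)
    return iv


# Hypothetical counts for three groups.
print(round(information_value([10, 30, 60], [40, 40, 20]), 4))  # 0.8841
```

Each term is non-negative because the difference and the logarithm always share the same sign, so groups that separate Goods from Bads in either direction add to the total.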
Kolmogorov-Smirnov (KS) test
For all Good observations, the predicted probability of Bad is computed, that is, the
relative frequency of bad cases in the bin in which a Good observation is placed. This process
is repeated for all Bad observations. The KS test is then completed with the
Good/Bad indicator as the group variable and the predicted probability of Bad as the
response.

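$$Z = \max_j |D_j| \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

Here $D_j$ is the difference between the cumulative distributions of the two groups at the
j-th value of the response, and $n_1$ and $n_2$ are the sizes of the two groups.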
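Significance level (p) approximation is based on the formula:
$$p = 2 \sum_{j=1}^{\infty} (-1)^{j-1} \exp\left(-2 j^2 \left[ KS \left( \sqrt{\frac{n_1 n_2}{n_1 + n_2}} + 0.12 + \frac{0.11}{\sqrt{n_1 n_2 / (n_1 + n_2)}} \right) \right]^2 \right)$$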
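The KS computation can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: KS is taken to be the largest absolute difference between the two empirical cumulative distributions, the effective sample size is taken as n1 * n2 / (n1 + n2), the infinite series is truncated at a fixed number of terms, and the name `ks_two_sample` is hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(sample1, sample2, terms=100):
    """Return (KS, Z, approximate p) for two samples of predicted probabilities."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = sorted(sample1), sorted(sample2)
    ks = 0.0
    for value in sorted(set(s1 + s2)):
        # Empirical cumulative distributions of both groups at this value.
        cdf1 = bisect_right(s1, value) / n1
        cdf2 = bisect_right(s2, value) / n2
        ks = max(ks, abs(cdf1 - cdf2))
    root = math.sqrt(n1 * n2 / (n1 + n2))
    z = ks * root
    if ks == 0.0:
        return ks, z, 1.0  # identical distributions: the series does not converge
    lam = ks * (root + 0.12 + 0.11 / root)
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, terms + 1))
    return ks, z, min(max(p, 0.0), 1.0)


# Hypothetical predicted probabilities of Bad for Good and Bad cases.
goods = [0.05, 0.10, 0.10, 0.20, 0.30]
bads = [0.20, 0.30, 0.40, 0.40, 0.60]
print(ks_two_sample(goods, bads))
```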

Logit Transformation (Log Odds)
$$\text{Logit} = \ln\left(\frac{\text{Number of Goods} / N}{\text{Number of Bads} / N}\right)$$
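For a quick numeric reading of the logit, the sketch below assumes N is the sum of the Good and Bad counts; `logit` is a hypothetical helper and requires both counts to be positive.

```python
import math


def logit(n_goods, n_bads):
    """Log odds of Good versus Bad from raw counts."""
    n = n_goods + n_bads  # assumption: N counts only Goods and Bads
    return math.log((n_goods / n) / (n_bads / n))


print(logit(50, 50))            # 0.0 when Goods and Bads are equally frequent
print(round(logit(70, 30), 4))  # positive when Goods outnumber Bads
```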
Mean
$$\bar{x} = \frac{\sum x}{n}$$
Somers' D
If ties are present:
$$d = \frac{n_c - n_d}{t}$$

If ties are not present:
$$d = 2c - 1 \quad \text{where} \quad c = \frac{n_c + 0.5\,(t - n_c - n_d)}{t}$$

Notation:
(Note: Sorting of cases for calculation of Somers' d is based on the relative frequency of
bad, that is, the estimated probability of bad.)
$t$ = total number of pairs with different responses of good/bad
$n_c$ = number of pairs of cases where the case with the lower ordered response value has a
lower predicted mean score than the case with the higher ordered response value.
$n_d$ = number of pairs of cases where the case with the lower ordered response value has a
higher predicted mean score than the case with the higher ordered response value.
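The pair counts can be computed by brute force for small data sets. The sketch below assumes Good is the lower ordered response (coded 0) and Bad the higher (coded 1), and that the predicted score is the estimated probability of Bad; `somers_d` is a hypothetical helper. It returns both expressions from the formulas above, which agree algebraically because 2c - 1 simplifies to (n_c - n_d) / t.

```python
def somers_d(scores, is_bad):
    """Return Somers' d via (n_c - n_d) / t and via 2c - 1."""
    good = [s for s, y in zip(scores, is_bad) if y == 0]
    bad = [s for s, y in zip(scores, is_bad) if y == 1]
    t = len(good) * len(bad)  # pairs with different Good/Bad responses
    # Concordant: the Good case has the lower predicted score of the pair.
    n_c = sum(1 for g in good for b in bad if g < b)
    # Discordant: the Good case has the higher predicted score of the pair.
    n_d = sum(1 for g in good for b in bad if g > b)
    d_direct = (n_c - n_d) / t
    c = (n_c + 0.5 * (t - n_c - n_d)) / t
    return d_direct, 2 * c - 1


# Hypothetical scores; one Good/Bad pair is tied at 0.2.
print(somers_d([0.1, 0.2, 0.2, 0.4, 0.6], [0, 0, 1, 0, 1]))  # (0.5, 0.5)
```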

Weight of Evidence (WoE)
$$WoE = \left[ \ln\left(\frac{\text{Relative Frequency of Goods}}{\text{Relative Frequency of Bads}}\right) \right] \times 100$$

The value of WoE will be 0 if the ratio Relative Frequency of Goods / Relative
Frequency of Bads is equal to 1. If the Relative Frequency of Bads in a group is greater
than the Relative Frequency of Goods, the ratio will be less than 1 and the WoE
will be a negative number; if the Relative Frequency of Goods is greater than the
Relative Frequency of Bads in a group, the WoE value will be a positive number.
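The sign behaviour described above is easy to confirm numerically. This sketch assumes the relative frequency of Goods in a group is the group's Good count divided by the total number of Goods (and likewise for Bads); `woe_per_group` and the counts are hypothetical, and each group needs at least one Good and one Bad.

```python
import math


def woe_per_group(goods, bads):
    """WoE (scaled by 100) for each group, from per-group Good and Bad counts."""
    total_goods, total_bads = sum(goods), sum(bads)
    return [math.log((g / total_goods) / (b / total_bads)) * 100
            for g, b in zip(goods, bads)]


# Hypothetical counts: Bads dominate the first group, Goods the last.
for value in woe_per_group([10, 30, 60], [40, 30, 30]):
    print(round(value, 2))
```

The first group yields a negative WoE, the second a WoE of zero because both relative frequencies are equal, and the third a positive WoE.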
Notes
The WoE recoding of predictors is particularly well suited for subsequent modeling using Logistic
Regression. Specifically, logistic regression will fit a linear regression equation of predictors (or WoE-
coded continuous predictors) to predict the logit-transformed binary Goods/Bads dependent or Y
variable. Therefore, by using WoE-coded predictors in logistic regression, the predictors are all
prepared and coded to the same WoE scale, and the parameters in the linear logistic regression
equation can be directly compared, for example, when using the new modeling tools for Marginal
Stepwise Logistic Regression.
