Appendix
Logistic Regression
Matlab code for performing logistic regression is provided below.
Neural Networks
Neural networks is a broad term that covers a number of related
implementations. In this book we used one that optimizes the number of
nodes in the hidden layers. It derives from the Broyden–Fletcher–
Goldfarb–Shanno (BFGS) optimization method and from the work of
D. J. C. MacKay; see MacKay (1992a,b). The implementation we used was
written for Matlab by Sigurdur Sigurdsson (2002) and is based on an
older neural classifier written by Morten With Pedersen. It is available
in the ANN:DTU Toolbox at http://isp.imm.dtu.dk/toolbox/ann/index.html.
As stated on that website, all code can be used freely in research and other
nonprofit applications. If you publish results obtained with the ANN:DTU
Toolbox, you are asked to cite the relevant sources.
Multiple neural network packages are available for R (search "neural
network" at http://cran.r-project.org/).
Still other free packages for neural network classification (NuMap and
NuClass, available only for Windows) can be found at
http://www-ee.uta.edu/eeweb/IP/Software/Software.htm.
A convenient place to find a collection of Matlab implementations is
"Matlab Central," at http://www.mathworks.com/matlabcentral/.
For example, the Neural Network Classifiers package written by Sebastien
Paris is available at http://www.mathworks.com/matlabcentral/fileexchange/17415.
A commercial package, the "Neural Network Toolbox," is also available
for Matlab.
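Working within Matlab, a two-class network can also be fit with the commercial toolbox along the following lines. This is a minimal sketch using patternnet, not the ANN:DTU classifier described above; the data matrix X (cases in rows) and the 0/1 label vector y are placeholders.
% minimal sketch: two-class network via the commercial Neural
% Network Toolbox; X is n x m (cases in rows), y is n x 1 (0/1)
t = [y'; 1-y']; % dummy-coded targets, one row per class
net = patternnet(10); % one hidden layer with 10 nodes
net = train(net, X', t); % the toolbox expects cases as columns
scores = net(X'); % row 1 holds the class-1 scores
decisions = (scores(1,:) > scores(2,:))'; % pick the higher-scoring class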
Boosting
We used BoosTexter, available at
http://www.cs.princeton.edu/~schapire/boostexter.html.
For this implementation see Schapire and Singer (2000). As stated on the
home page above, “the object code for BoosTexter is available free from
AT&T for non-commercial research or educational purposes.”
Random Forests (RF)
Random Forests (written for R) can be obtained from
http://cran.r-project.org/web/packages/randomForest/index.html.
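Within Matlab, a comparable forest can be grown with TreeBagger from the Statistics Toolbox. The sketch below is illustrative only and is not the R implementation cited above; X, y, and Xtest are placeholder data.
% minimal sketch: a random forest grown with Matlab's TreeBagger
% (Statistics Toolbox); X is n x m, y is an n x 1 vector of labels
B = TreeBagger(500, X, y, 'Method', 'classification');
[pred, scores] = predict(B, Xtest); % predicted labels and class scores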
SAS® programs
Logistic regressions for the stroke study analysis were done in SAS®
version 9.1.3 PROC LOGISTIC.
Custom SAS® version 8.2 PROC IML code, the macro %GOFLOGIT written by
Oliver Kuss (2002), was used for model goodness-of-fit analysis in
logistic regression.
Matlab code
Code for Fisher Linear Discriminant Analysis: fLDA.m
function [ConfMatrix,decisions,prms]=fLDA(LearnSamples, ...
LearnLabels,TestSamples,TestLabels)
% Usage:
% [ConfMatrix,decisions,prms]=fLDA(LearnSamples,LearnLabels,
% TestSamples,TestLabels)
%
% The code expects LearnSamples and TestSamples to be
% n x m matrices, where n is the number of cases (samples)
% and each row contains the m predictor values for that case.
% Otherwise, transpose the data, i.e. uncomment the lines below:
% LearnSamples=LearnSamples';
% TestSamples=TestSamples';
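% split the learning samples into "positives" and "negatives"
% (assumed: labels coded 1 for positives, 0 for negatives)
predpos=LearnSamples(LearnLabels==1,:);
predneg=LearnSamples(LearnLabels==0,:);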
% obtain the covariance matrix and the means for "positives" and
% "negatives"
[Spos, meanpos]=getSmat(predpos);
[Sneg, meanneg]=getSmat(predneg);
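% Fisher discriminant direction w = Sw^(-1)(m+ - m-), with the
% threshold at the projected midpoint of the class means
% (standard forms, assumed; they supply wproj and cthresh below)
wproj=(Spos+Sneg)\(meanpos-meanneg)';
cthresh=0.5*(meanpos+meanneg)*wproj;
prms=struct('wproj',wproj,'cthresh',cthresh); % fitted parameters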
decisions=[];
ConfMatrix=[];
if nargin>2 % if testsamples provided
% run the discriminant on the testing data
cpred=TestSamples*wproj;
decisions=(cpred>cthresh)';
if nargin>3 % if testlabels provided
% obtain the confusion matrix (1 indicates that we
% want the raw counts)
ConfMatrix=GetConfTable(decisions,TestLabels,1);
end
end
% Supporting functions
% get a covariance matrix of x
function [Smat,meanx]=getSmat(x)
meanx=mean(x);
zmn=x-repmat(meanx,size(x,1),1);
Smat=zmn'*zmn;
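% a minimal stand-in for GetConfTable, which fLDA calls above
% (assumed: labels coded 1/0; rawcounts=1 requests raw counts)
function ConfTable=GetConfTable(decisions,labels,rawcounts)
d=decisions(:); lab=labels(:);
ConfTable=[sum(d==1 & lab==1), sum(d==1 & lab==0); ...
sum(d==0 & lab==1), sum(d==0 & lab==0)];
if rawcounts~=1
ConfTable=ConfTable/numel(lab); % proportions rather than counts
end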
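Code for logistic regression
The declaration below is a reconstruction: the function name lrmach, the argument order, and the usage comments are assumptions inferred from the body of the listing that follows.
function lrresult=lrmach(LRSamples,LRinp,fitit,wts)
% Usage:
% lrresult=lrmach(LRSamples,LRinp,fitit,wts)
% fitit=1 fits the LR parameters: LRinp then holds the training
% labels (a row vector, coded 1/0) and wts the prescribed case
% weights, e.g. ones(nsamps,1) for an unweighted fit.
% fitit=0 returns fitted probabilities, and fitit=2 the linear
% predictor: LRinp then holds previously fitted parameters.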
switch fitit
case {0,2} % return the LR values
[nsamps, mpred] = size(LRSamples);
inputmat=[LRSamples,ones(nsamps,1)];
liny = inputmat * LRinp;
if fitit==2
lrresult = liny; % fitit==2: return the linear predictor
else
lrresult = invlogit(liny);
end
case 1 % Obtain LR parameters (fitting)
% specify desired precision and maximal number of
% Newton-Raphson iterations; trade precision (small itereps,
% e.g. 1e-12) for speed (larger itereps, e.g. 1e-7)
itereps = 1e-9;
maxiter = 100;
[nsamps, mpred] = size(LRSamples);
inputmat=[LRSamples,ones(nsamps,1)];
mprms=mpred+1;
% initialize iterations
lrresult=zeros(mprms,1);
lrnlabels=LRinp';
prevexpy=-ones(size(lrnlabels));
for iter=1:maxiter
liny=inputmat*lrresult;
expy=invlogit(liny);
% LR weights based on derivative of invlogit (=p(1-p))
lrw=max(5*eps, expy.*(1-expy)); % avoid lrw of exactly zero
liny=liny+(lrnlabels-expy)./lrw; % update with W^(-1)(y-p)
% adjust prescribed weights "wts" with LR weights to
% obtain the final weights matrix
weights=spdiags(lrw.*wts, 0, nsamps, nsamps);
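% weighted least-squares step of the iteratively reweighted fit
% (assumed standard form): solve (X'WX)b = (X'W)z for new parameters
lrresult=(inputmat'*weights*inputmat)\(inputmat'*weights*liny);
% stop when the fitted probabilities no longer change appreciably
if max(abs(expy-prevexpy))<itereps
break;
end
prevexpy=expy;
end
end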
% Supporting functions
function logodds=logit(p)
logodds=log(p./(1-p));
function p=invlogit(lodds)
p=1./(1+exp(-lodds));
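A hypothetical call to the logistic regression code, using the reconstructed name lrmach and simulated data; note that in fitting mode the labels go in as a row vector (the code transposes LRinp internally).
% illustration only: fit on simulated data, then score cases
X = [randn(50,2)+1; randn(50,2)-1]; % simulated predictors
y = [ones(50,1); zeros(50,1)]; % 1/0 class labels
b = lrmach(X, y', 1, ones(100,1)); % fit: labels as a row vector
p = lrmach(X, b, 0); % fitted probabilities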
consistent, and do better than the original “winning” scheme at every sample
size.
Moreover, as discussed in Chapter 12, for any finite collection of machines
there is an ensemble machine that does at least as well as the best in that
collection. That is to say, declaring a single winner in a machine arms
race is a misdirected use of computing resources and brain power.