
Logistic Regression with SAS

Summer Semester 2019/2020

Adam Korczyński
The Collegium of Economic Analysis of Warsaw School of Economics
Institute of Statistics and Demography
Statistical Methods and Business Analytics Unit

e-mail: akorczy@sgh.waw.pl
Contingency table and measures
of association

Logistic Regression with SAS ©AK, SGH 2020



Agenda
• Introductory notes: scope of the class, schedule and requirements
• Basic notions
• Contingency table
• Comparing two proportions - measures in the analysis of categorical
data
• Types of studies
• Calculating risk, risk ratio, odds and odds ratio
• Proc Freq in SAS. Cross tabulation with R and Python

Basic notions (Agresti, 2002)


• Categorical variable: scale consisting of a set of categories
• Response Y – Explanatory variable X
• Scales:
• Nominal
• Ordinal
• Interval
• Ratio

Contingency table
• Y and X – two categorical variables with k and l categories respectively
• We can present the joint distribution of the two variables using a table with k rows for the categories of Y and l columns for the categories of X
• The cells represent the possible outcomes
• A table holding the frequency counts of the outcomes for a sample is called a contingency table (a term introduced by Karl Pearson, 1904)
• 2x2 table:

        X=x1  X=x2
Y=y1    n11   n12
Y=y2    n21   n22

E.g. X – tax regimen (A, B); Y – survival of a company at time t

Comparing two proportions – measures in the analysis of categorical data

Y = 1 if the company is on the market at time t; 0 if the company got out of the market prior to t
X = 1 for tax regimen A; 0 for tax regimen B

Risk (conditional probability): R_j = P(Y = i | X = j)

Relative risk: RR_10 = P(Y = i | X = 1) / P(Y = i | X = 0)

Odds (the probability compared with its complementary probability): O_j = P(Y = i | X = j) / [1 − P(Y = i | X = j)]

Odds ratio: OR_10 = O_1 / O_0

2x2 contingency table


Calculating R, RR, O, OR

Types of studies
• Follow-up study (prospective)
  • Risk factor → Observation → Outcome
  • Assigning tax regimen A or B → Observation → Survived or not
• Case-control study (retrospective)
  • Risk factor ← Observation ← Outcome
  • What was the tax regimen under which the companies were operating? ← Survived or not
• Cross-sectional
  • Snapshot of data at a given time point. Allows assessing the coexistence of features.

Prospective study – experimental design

• Set a target sample of subjects (e.g. companies) to be in the study
• Randomize the subjects into the groups of interest, e.g. tax regimens x
• Then observe the companies at scheduled time points (6 months, 1 year, 2 years) and determine which companies are still operating (Y)
• In this setting Y is a random variable and x is fixed. Therefore we can estimate P(Y=i|x=j), the probability of being on the market at time t given that the company was operating under tax regimen j

Retrospective studies
• A research organization decides to study the status of Polish companies in a specific industry in the years 2015-2016. There were n1 companies which went bankrupt in this period. For the purpose of comparison, n2 companies operating throughout the whole period have been added to the sample. The tax regimens under which the companies were operating were identified afterwards.
• In this setting X is a random variable and Y is set post hoc: we identify the survival status first and only then identify the characteristics. As a consequence we can estimate P(X=j|y=i), the probability of being in tax regimen A or B given that a company survived or did not survive. This does not address the research question, though. The counts for Y are not collected through a random process, and therefore P(Y=i|X=j) is not a good estimate of survival conditional on tax regimen.

Numeric example – comparison of measures of association

Population
Survival       Regimen A  Regimen B    ni.
Survived             478        220    698
Not survived        5476       3826   9302
n.j                 5954       4046  10000

            RISK      RR       ODDS     OR
Regimen A   0.080282  1.47646  0.08729  1.51805
Regimen B   0.054375           0.05750

Prospective sample
Survival       Regimen A  Regimen B    ni.
Survived              95         44    139
Not survived        1095        766   1861
n.j                 1190        810   2000

            RISK      RR       ODDS     OR
Regimen A   0.079832  1.46963  0.08676  1.51038
Regimen B   0.054321           0.05744

Retrospective sample
Survival       Regimen A  Regimen B    ni.
Survived             616        284    900
Not survived         530        370    900
n.j                 1146        654   1800

            RISK      RR       ODDS     OR
Regimen A   0.537522  1.23781  1.16226  1.51422
Regimen B   0.434251           0.76757

The odds of survival within the studied time period are about 50% greater for companies operating under tax regimen A. Note that the odds ratio is stable (≈1.51-1.52) across all three designs, whereas the risks and the relative risk are distorted in the retrospective sample.
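These measures can be cross-checked with a few lines of Python (the course ships R and Python companions; this sketch is independent of those files):

```python
# Recomputing the population-panel measures of association from the counts above.
survived = {"A": 478, "B": 220}
not_survived = {"A": 5476, "B": 3826}

def risk(reg):
    # conditional probability P(survived | regimen) = n11 / n.j
    return survived[reg] / (survived[reg] + not_survived[reg])

def odds(reg):
    # survived : not survived within a regimen column
    return survived[reg] / not_survived[reg]

rr = risk("A") / risk("B")    # relative risk, regimen A vs B
orr = odds("A") / odds("B")   # odds ratio, regimen A vs B
```

The values reproduce the RISK, RR, ODDS and OR columns of the population panel (RR ≈ 1.476, OR ≈ 1.518).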

Proc Freq in SAS


PROC FREQ < options > ;
BY variables ;
EXACT statistic-options < / computation-options > ;
OUTPUT <OUT=SAS-data-set > output-options ;
TABLES requests < / options > ;
TEST options ;
WEIGHT variable < / option > ;
BY Provides separate analyses for each BY group
EXACT Requests exact tests
OUTPUT Requests an output data set
TABLES Specifies tables and requests analyses
TEST Requests tests for measures of association and agreement
WEIGHT Identifies a weight variable
Source: http://support.sas.com/documentation/cdl/en/procstat/66703/HTML/default/viewer.htm#procstat_freq_syntax.htm

Exercise
• Assess the risk of lung cancer among smokers numerically in order to
answer the following question: What is the risk of having LC while
being a smoker as compared to not being a smoker?
• Use Excel and Proc Freq in SAS
Smoker
Lung Cancer Yes No
Yes 688 21
No 650 59
Data source: Agresti, 2002, Table 2.5.

Exercise with R and Python examples – calculating OR
[SAS] l01_OddsRatio.sas  [R] l01_OddsRatio.R  [Python] l01_OddsRatio.py
data a01;
input y$ x$ n;
cards;
L S 688
L X 21
N S 650
N X 59
;
run;

proc freq data=a01;
tables y*x / relrisk;
weight n;
run;
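For a quick numeric cross-check of the PROC FREQ output, the cross-product ratio can be computed directly (a Python sketch; since the lung-cancer data are retrospective, the odds ratio is the measure to report):

```python
# 2x2 table from Agresti (2002), Table 2.5:
# rows = lung cancer yes/no, columns = smoker yes/no.
n11, n12 = 688, 21    # lung cancer: smoker, non-smoker
n21, n22 = 650, 59    # no lung cancer: smoker, non-smoker

# Cross-product (odds) ratio, the quantity reported with the relrisk option.
odds_ratio = (n11 * n22) / (n12 * n21)
```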
Origins of logistic regression.
Introduction to logistic regression


Agenda
• Origins of logistic regression
• Distributions for categorical data
• Introduction to modelling of binary response variable
• The basic concepts of logistic regression
• Fitting a binary logistic regression model using the LOGISTIC procedure
• Describing the standard output from the LOGISTIC procedure one continuous
and one categorical predictor variable
• Reading and interpreting odds ratio tables and plots

Origins of logistic regression

• J.S. Cramer (2002), The Origins of Logistic Regression, "Tinbergen Institute Working Paper", No. 119(4).
• The origins of logistic regression lie in the 19th century, in work aiming to describe the growth of populations and of autocatalytic chemical reactions. The process in question is a quantity W(t) evolving over time t.
• Growth rate (e.g. population growth rate):

Ẇ(t) = dW(t)/dt

• The assumption is that the growth rate is proportional to the current quantity:

Ẇ(t) = βW(t),  β = Ẇ(t)/W(t)

where: β – constant rate of growth.

Exponential growth model

• Assumptions: a constant rate of growth β and a growing quantity W(t):

Ẇ(t) = βW(t)

• The quantity W(t) at time t is determined by how far into the process we are, which is simply t, but the effect of β compounds over time, which leads to the exponential growth model:

W(t) = A·exp(βt)

where: A – initial level, which can be replaced by W(0).

The thesis by T. Malthus, dated 1798, claimed that a population left to itself will grow in geometric progression.

Population growth

Source: University of Oxford, Our World in Data, https://ourworldindata.org/world-population-growth, date accessed 22Feb2020.

Exponential growth model – example



[SAS] l02_OriginsOfLogReg01.sas

Exponential growth model – SAS code


%let beta=1.7;

data a01;
A=100;
beta=&beta.;
do t=0 to 100 by 0.01;
Wt=A*exp(beta*t);
output;
end;
run;

proc sgplot data=a01;
series x=t y=Wt / lineattrs=(color=red) legendlabel="beta=&beta.";
run;

Further development leading to the logistic regression model

• The exponential model has no upper limit. This aspect of the described processes was criticized by Adolphe Quetelet (1795-1874) and investigated by his pupil Pierre-Francois Verhulst (1804-1849).
• Their proposal was to introduce a term representing the resistance to further growth. The model with Ω as the upper limit (saturation level) of W(t) was suggested after some investigation of potential limiting terms:

Ẇ(t) = βW(t)·(1 − W(t)/Ω)

• In this model „growth is proportional both to the population already attained W(t) and to the remaining room for further expansion Ω−W(t)" (Cramer, 2002, p. 4).

Verhulst's suggestion for the final model for W(t)

• In the model the quantity W(t) can be expressed as a proportion representing the share of the current level in the maximum:

P(t) = W(t)/Ω

• Then by solving the differential equation below (see the next 2 slides):

Ẇ(t) = βW(t)·(1 − W(t)/Ω)

we get the logistic function:

P(t) = 1 / (1 + e^(−C−βt))

Solving the differential equation Ẇ(t) = βW(t)·(1 − W(t)/Ω)

dW(t)/dt = βW(t)·(1 − W(t)/Ω)                | multiply by dt
dW(t) = βW(t)·(1 − W(t)/Ω)·dt                | divide by W(t)·(1 − W(t)/Ω)
dW(t) / [W(t)·(1 − W(t)/Ω)] = β·dt

Side calculation: W(t)·(1 − W(t)/Ω) = W(t)·(Ω − W(t))/Ω, and (adding −W(t) + W(t) in the numerator):

Ω / [W(t)·(Ω − W(t))] = [Ω − W(t) + W(t)] / [W(t)·(Ω − W(t))]
                      = 1/W(t) + 1/(Ω − W(t))

So, integrating both sides:

∫ [1/W(t) + 1/(Ω − W(t))] dW(t) = ∫ β dt

Solving ∫ [1/W(t) + 1/(Ω − W(t))] dW(t) = ∫ β dt

ln W(t) − ln(Ω − W(t)) = βt + C          | multiply by (−1)
−ln W(t) + ln(Ω − W(t)) = −βt − C
ln[(Ω − W(t))/W(t)] = −βt − C            | apply exp()
(Ω − W(t))/W(t) = exp(−βt − C)
Ω/W(t) − 1 = exp(−βt − C)                | add 1
Ω/W(t) = 1 + exp(−βt − C)
W(t) = Ω / (1 + exp(−βt − C))

For P(t) = W(t)/Ω the equation simplifies to:

P(t) = 1 / (1 + e^(−C−βt))

which is the logistic function. When taking this mathematical model to statistics we get the following parameters: the intercept C and the regression coefficient β.
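The solution can be verified numerically: a central finite difference of W(t) should match the right-hand side of the differential equation (β, C and Ω below are arbitrary test values):

```python
import math

# Check that W(t) = Omega / (1 + exp(-beta*t - C))
# satisfies dW/dt = beta * W * (1 - W/Omega).
beta, C, Omega = 1.7, 0.5, 100.0

def W(t):
    return Omega / (1.0 + math.exp(-beta * t - C))

h = 1e-6
for t in (-2.0, 0.0, 1.5):
    lhs = (W(t + h) - W(t - h)) / (2 * h)      # numerical derivative
    rhs = beta * W(t) * (1 - W(t) / Omega)     # right-hand side of the ODE
    assert abs(lhs - rhs) < 1e-4
```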

Verhulst conclusion
• Verhulst concluded that „the curve agrees with the actual course of
the population of France, Belgium, Essex and Russia for periods up to
1833.” (Cramer, 2002, p. 4).

Population growth

1833 – the end date for the data in Verhulst's study.

Source: University of Oxford, Our World in Data, https://ourworldindata.org/grapher/population-by-country?time=1500..2000, date accessed 22Feb2020.

Logistic curve growth model – example



Logistic growth model – SAS code


%let beta1a=1.7;

data a01;
beta0=2;
beta1a=&beta1a.;
do t=-10 to 10 by 0.01;
Pt=1/(1+exp(-1*(beta0+beta1a*t)));
output;
end;
run;

proc sgplot data=a01;
series x=t y=Pt / lineattrs=(color=red) legendlabel="beta=&beta1a.";
run;

The use of logistic regression models

• The response Y is categorical
• The predictors X are:
  • Categorical: contingency table or logistic regression
  • Continuous: logistic regression
  • Categorical and continuous: logistic regression

Schematically: predictors X1 ∈ {a, b, c}, X2 ∈ {y, n}, …, Xk ∈ [0, 120] → response Y.

In general, regression analysis aims to quantify the relationship between the response variable and one or more predictors.

Types of response variables in logistic regression models

• Binary (i=2):
  Y = 1 if the company is on the market at time t; 0 if the company got out of the market prior to t

• Ordinal (i>2):
  Y = 0 Good; 1 Moderate; 2 Poor

• Multinomial (i>2):
  Y = 0 Standard phone; 1 Android; 2 iPhone
Distributions for categorical data


Binomial distribution (Agresti 2002, p. 5-6)

• y1, y2, …, yn – n independent and identically distributed (Bernoulli) trials such that:
  P(Yi=1) = π
  P(Yi=0) = 1 − π, where 1 can represent a „success" and 0 a „failure".
• The total number of successes Y = Σ_{i=1}^{n} Yi has the binomial distribution with index n and parameter π, with the probability mass function:

p(y) = C(n, y) · π^y · (1 − π)^(n−y),  y = 0, 1, 2, …, n

with parameters:

μ = E(Y) = nπ
σ² = var(Y) = nπ(1 − π)

• The distribution converges to normality as n increases for fixed π.
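The pmf is easy to evaluate directly; the sketch below computes P(Y = 15) for n = 100 and π = 0.1, the survey-response question posed on the next slide:

```python
from math import comb

# P(Y = 15) for Y ~ Binomial(n=100, pi=0.1):
# 1 in 10 clients responds to the survey; we call 100.
n, pi = 100, 0.1

def binom_pmf(y):
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

p15 = binom_pmf(15)            # roughly 0.033
mean, var = n * pi, n * pi * (1 - pi)   # mu = 10, sigma^2 = 9
```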
[SAS] l02_BinomialDist.sas

Binomial distribution PDF and CDF

• We know that one out of ten clients responds to the survey. How likely is it that we get responses from 15 clients if we call 100?

[Figure: PMF and CDF of the Binomial(n=100, π=0.1) distribution, P(X = x) plotted against x = 0, …, 100.]

Multinomial distribution (Agresti 2002, p. 6-7)

• yi = (yi1, yi2, …, yic) – a multinomial trial with c outcome categories: yij = 1 if trial i has its outcome in category j and yij = 0 otherwise, e.g. yi = (0,1,0)
• The number of trials having an outcome in category j is nj = Σ_{i=1}^{n} yij; the counts (n1, n2, …, nc) then have the multinomial distribution.
• Let πj = P(Yij=1); the probability mass function of the multinomial distribution is:

p(n1, n2, …, nc) = [n! / (n1!·n2!·…·nc!)] · π1^n1 · π2^n2 · … · πc^nc

with parameters:

μ = E(nj) = nπj
σ² = var(nj) = nπj(1 − πj)
cov(nj, nk) = −nπj·πk

• The binomial distribution is the special case with c = 2.
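A small Python sketch of the pmf, checking the stated special case that the multinomial with c = 2 collapses to the binomial:

```python
from math import comb, factorial

def multinomial_pmf(counts, probs):
    # n! / (n1! n2! ... nc!) * prod(pi_j ** n_j)
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    p = float(coef)
    for c, pi in zip(counts, probs):
        p *= pi**c
    return p

# c = 2 special case equals the Binomial(n, pi) pmf at y successes
n, y, pi = 20, 7, 0.3
assert abs(multinomial_pmf([y, n - y], [pi, 1 - pi])
           - comb(n, y) * pi**y * (1 - pi)**(n - y)) < 1e-12
```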
[SAS] l02_MultinomialDist.sas

Multinomial distribution – kernel density

• How likely is it to draw 70 employed and 5 unemployed, given that there are three types of status (employed, unemployed, inactive) with π1=0.75, π2=0.05 and π3=0.2, and n=100?

[Figure: joint distribution and kernel density of the counts of employed (x axis, 60-90) and unemployed (y axis, 0-15).]

Poisson distribution (Agresti 2002, p. 7)

• The Poisson distribution addresses questions about the number of „successes" or „failures" when the number of trials is not fixed upfront. For example, y can represent the number of car defects on the next day.
• The probability mass function of the Poisson distribution is:

p(y) = e^(−μ)·μ^y / y!,  y = 0, 1, 2, …

with parameters:

E(Y) = var(Y) = μ

The distribution converges to normality as μ increases. The distribution is used for counts of events that occur randomly over time or space, when outcomes in disjoint periods or regions are independent. It approximates the binomial probability when n is large and π is small, with μ = nπ.
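The pmf answers the car-defects question on the next slide directly:

```python
from math import exp, factorial

# P(Y = 4) for Y ~ Poisson(mu = 5).
mu = 5

def poisson_pmf(y):
    return exp(-mu) * mu**y / factorial(y)

p4 = poisson_pmf(4)   # approx 0.175
```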
[SAS] l02_PoissonDist.sas

Poisson distribution PDF and CDF


• What is the
likelihood of
observing 4 car
defects next
day if the count
follows Poisson
distribution
with 𝜇=5?

Overdispersion (Agresti, 2002, p. 7-8)

• In practice, count observations often exhibit variability exceeding that predicted by the binomial or Poisson distributions. This is called overdispersion.
• For example, with the Poisson model we assume that each subject has the same probability of getting a car defect. In reality this probability varies due to factors such as the amount of time spent driving, having the car checked, driving conditions etc. Such variation causes the counts to exhibit more variability than under the Poisson model.
• Assuming a Poisson distribution for a count variable is therefore often too simplistic. The alternative is the negative binomial distribution, which permits the variance to exceed the mean.
Introduction to modelling of
binary response variable


The constraints in probability modelling

How can we relate πi to β0 + β1·x1i (i = 1, 2, …, n)?

• Probabilities need to be bounded in [0, 1] => we need a link other than the identity between the linear equation and the probability πi
• What would be the best shape to reflect the relationship between X and π throughout the possible range of X? The relationship is considered non-linear in the logistic regression model.
• In practice we do not observe the probability – this leads to a different assessment of the „error term" than in the linear regression model

Notation in the logistic model

Each observation on Y has a Bernoulli distribution with parameters:

P(Yi = 1) = πi
P(Yi = 0) = 1 − πi

Through the logit transformation we get:

logit(πi) = ln[πi / (1 − πi)] = α + βxi

Logistic curve

π(xi) = 1 / (1 + e^(−(α+βxi)))

The logistic curve embodies an assumption about the relationship between X and π. It can be considered an approximation of that relationship.

Interpreting parameters in logistic regression

π(xi) = 1 / (1 + e^(−(α+βxi)))

• β governs the rate of change – whether the transition is gradual or sharp in X (the greater β, the steeper the rate of change)
• β > 0: the predictor „stimulates" the probability
• β < 0: the predictor is an „inhibitor" for the probability
• β = 0: a flat curve – each value of X gives the same probability assessment

Logistic curve on an example – program code


%let beta1a=1;
%let beta1b=5;

data a01;
beta0=2;
beta1a=&beta1a.;
beta1b=&beta1b.;
do x=-10 to 10 by 0.01;
p1=1/(1+exp(-1*(beta0+beta1a*x)));
p2=1/(1+exp(-1*(beta0+beta1b*x)));
output;
end;
run;

proc sgplot data=a01;
series x=x y=p1 / lineattrs=(color=green) legendlabel="beta=&beta1a.";
series x=x y=p2 / lineattrs=(color=red) legendlabel="beta=&beta1b.";
run;

Logistic curve on an example – output



Logit transformation
The logistic model is often represented through the logit:

logit[π(xi)] = ln[π(xi) / (1 − π(xi))] = α + βxi

where:

π(xi) = 1 / (1 + e^(−(α+βxi)))

• The logit is the natural logarithm of the odds
• It defines the relationship between the linear predictor and the probability as response:
  p→1 then ln(p/[1-p])→+∞
  p→0 then ln(p/[1-p])→−∞
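The two displayed relations are inverses of each other, which a short sketch can confirm (α and β are arbitrary illustrative values):

```python
from math import log, exp

# Round trip: the logit is the inverse of the logistic function,
# so logit(pi(x)) recovers the linear predictor alpha + beta*x.
alpha, beta = 2.0, 1.7

def logistic(x):
    return 1.0 / (1.0 + exp(-(alpha + beta * x)))

def logit(p):
    return log(p / (1.0 - p))

for x in (-3.0, 0.0, 2.5):
    assert abs(logit(logistic(x)) - (alpha + beta * x)) < 1e-9
```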

Logit transformation on an example – program code


%let beta1a=1;
%let beta1b=5;

data a01;
beta0=2;
beta1a=&beta1a.;
beta1b=&beta1b.;
do x=-10 to 10 by 0.01;
p1=1/(1+exp(-1*(beta0+beta1a*x)));
logit1=log(p1/(1-p1));
p2=1/(1+exp(-1*(beta0+beta1b*x)));
logit2=log(p2/(1-p2));
output;
end;
run;

proc sgplot data=a01;
series x=x y=logit1 / lineattrs=(color=green) legendlabel="beta=&beta1a.";
series x=x y=logit2 / lineattrs=(color=red) legendlabel="beta=&beta1b.";
run;

Logit transformation on an example – output



Multivariate binary logistic regression model

logit[π(xi)] = β0 + β1·x1i + ⋯ + βk·xki

• logit[π(xi)] – the logit of the probability of the event
• β0 – the intercept of the regression equation
• βk – the coefficient of the kth predictor variable

PROC LOGISTIC in SAS


• The basic syntax of the LOGISTIC procedure is as follows:

PROC LOGISTIC <options> ;


CLASS variable <(options)><variable <(options)> ></ options> ;
<label:> MODEL events/trials=<effects></ options> ;
ODDSRATIO <’label’> variable </ options> ;
OUTPUT <OUT=SAS-data-set><keyword=name <keyword=name >></ option> ;
RUN;

SAS OnLineDoc:
https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#logistic_toc.htm
[SAS] l02_LogRegBank01.sas
[Data] bank.sas7bdat

Example 2.1 Binary logistic regression model

• The data are related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether a client will subscribe to a term deposit (variable y), based on a sample of n=4521 clients.
• Source: http://archive.ics.uci.edu/ml/datasets/bank+marketing

Y – NEWDEPOSIT = 1 if the client subscribed to a term deposit; 0 otherwise

X1 – AGE: age of the client

X2 – POUTCOME: response to the previous marketing campaign ∈ {success, failure, unknown, other}

Example 2.1 Syntax

proc logistic data=b02 alpha=0.05 plots(only)=(effect oddsratio);
class Poutcome (param=ref ref='failure');
model NewDeposit(event='1')= Age Poutcome / clodds=pl;
units age=5; *OR for change by given unit;
output out=p predicted=p; *Predictions;
run;

Annotations:
• plots(only)=(effect oddsratio) – plot of the predicted probability (Y axis) by the predictor (X axis), and a plot of the odds ratios with confidence limits
• event='1' – the event category for the binary response model
• clodds=pl – confidence intervals for the odds ratios of all predictor variables
• units age=5 – the unit change in X used for the OR assessment

Example 2.1 SAS Output [1]

Model Information
Data Set                   WORK.B02
Response Variable          NewDeposit
Number of Response Levels  2
Model                      binary logit
Optimization Technique     Fisher's scoring

Number of Observations Read  4521
Number of Observations Used  4521

Response Profile
Ordered Value  NewDeposit  Total Frequency
1              0           4000
2              1            521

Example 2.1 SAS Output [2]

Probability modeled is NewDeposit=1.
(The modelled event by default is the smallest or first value in alphanumeric ordering.)

Class Level Information
Class     Value    Design Variables
poutcome  failure  0 0 0
          other    1 0 0
          success  0 1 0
          unknown  0 0 1

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied – the model parameters were estimated.

Example 2.1 SAS Output [3]

Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC        3233.000        3000.518
SC         3239.417        3032.600
-2 Log L   3231.000        2990.518

• -2 Log L: minus twice the log likelihood
• Akaike's information criterion (AIC)
• Schwarz criterion (SC)

The criteria are for assessing the goodness-of-fit of different models; they only measure the relative fit among models. The smaller the value, the better the fit. AIC and SC adjust the log likelihood statistic for the number of parameters, and for the number of parameters and the sample size, respectively. More parsimonious models are preferred.

Example 2.1 SAS Output [4]

Testing Global Null Hypothesis: BETA=0
Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio  240.4826    4   <.0001
Score             391.6167    4   <.0001
Wald              239.7336    4   <.0001

The test of H0: β=0 checks whether all regression coefficients are 0.
A p-value ≤ α indicates that at least one predictor is statistically significant.
The three tests assess the same aspect, but the construction of the respective test statistics differs. The likelihood ratio test is more reliable for smaller samples; all tests give similar results for large samples.

Example 2.1 SAS Output [5]

Analysis of Maximum Likelihood Estimates
Parameter              DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept (α)          1   -2.3318   0.2322          100.8112         <.0001
age (β1)               1   0.00995   0.00445         5.0081           0.0252
poutcome other (β2)    1   0.5004    0.2258          4.9124           0.0267
poutcome success (β3)  1   2.4852    0.2285          118.3311         <.0001
poutcome unknown (β4)  1   -0.3835   0.1467          6.8352           0.0089

logit[π(xi)] = ln[π(xi)/(1 − π(xi))]
             = −2.3318 + 0.00995·x1 + 0.5004·x21 + 2.4852·x22 − 0.3835·x23

π(xi) = 1 / (1 + e^(2.3318 − 0.00995·x1 − 0.5004·x21 − 2.4852·x22 + 0.3835·x23))

Example 2.1 SAS Output [6]

[Figure: Predicted Probabilities for NewDeposit=1 – predicted probability (Y axis, 0.00-1.00) by age (X axis, 20-80), with separate curves for poutcome = failure, other, success, unknown.]

Whom should we contact with the intention of getting a positive response?

Example 2.1 SAS Output [7]

• Logistic regression model:

logit[π(xi)] = ln(odds) = β0 + β1·x1 + β2·x21 + β3·x22 + β4·x23

• Odds ratio (5-year increase in age):

odds for the older client   = e^(β0 + β1·(x1+5) + β2·x21 + β3·x22 + β4·x23)
odds for the younger client = e^(β0 + β1·x1 + β2·x21 + β3·x22 + β4·x23)

odds ratio = odds older / odds younger = e^(β1·5) = e^(0.00995·5) ≈ 1.051

The odds ratio for age indicates that the odds of a positive response to the marketing campaign increase by about 5% for each increase of 5 years in the age of the bank client.
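The last step is just the exponential of the coefficient times the unit change:

```python
from math import exp

# Odds ratio for a 5-year increase in age, using the age coefficient
# from the maximum likelihood estimates table above.
beta_age = 0.00995
or_age_5 = exp(beta_age * 5)   # approx 1.051, matching the SAS odds ratio table
```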

Example 2.1 SAS Output [8]

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals
Effect                       Unit    Estimate  95% Confidence Limits
age (units=5)                5.0000  1.051     1.006   1.098
poutcome other vs failure    1.0000  1.649     1.053   2.557
poutcome success vs failure  1.0000  12.003    7.720   18.928
poutcome unknown vs failure  1.0000  0.681     0.515   0.915

[Figure: forest plot of the odds ratios with 95% profile-likelihood confidence limits; odds ratio axis from 0 to 20.]

Basic model assessment: comparing pairs

How does π(xi) compare with π(xj)?

• To find concordant, discordant and tied pairs we compare subjects that had the outcome of interest against subjects that did not

Event = 0: p̂i = ?    Event = 1: p̂j = ?

Concordant pair
• Outcome in agreement with the probabilities

Event = 0: p̂i = 0.03    Event = 1: p̂j = 0.40

Discordant pair
• Outcome in disagreement with the probabilities

Event = 0: p̂i = 0.80    Event = 1: p̂j = 0.60

Tied pair
• The model cannot distinguish between the two subjects

Event = 0: p̂i = 0.60    Event = 1: p̂j = 0.60

Example 2.1 SAS Output [9]

Association of Predicted Probabilities and Observed Responses
Percent Concordant  60.7      Somers' D  0.228
Percent Discordant  37.8      Gamma      0.232
Percent Tied         1.5      Tau-a      0.047
Pairs            2084000      c          0.614
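The reported c statistic can be reconstructed from the percentages as c = (%concordant + 0.5·%tied)/100, and Somers' D as (%concordant − %discordant)/100; small differences from the SAS values come from rounding in the printed percentages:

```python
# Reconstructing c and Somers' D from the concordance table above.
pct_concordant, pct_discordant, pct_tied = 60.7, 37.8, 1.5

c = (pct_concordant + 0.5 * pct_tied) / 100          # about 0.614
somers_d = (pct_concordant - pct_discordant) / 100   # about 0.228
```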

[Data] german.sas
Exercise 2.1
Fit the logistic regression model using the German dataset, with Default as the response variable and housing type as a categorical variable.

1. Write the logistic model equation.
2. Assess the statistical significance of the parameter estimates.
3. Assess the risk of not paying off the loan using the odds ratio estimates from the model.
4. Assess the effect of AGE and DURATION on the risk of default.

[SAS] l02_LogRegGerman01.sas

[Figure: Odds Ratios with 95% Wald Confidence Limits for the German credit model – housing for free vs own, housing rent vs own, and age; odds ratio axis from 1 to 3.]

[Figure: Predicted Probabilities for default=1 – probability (0.00-1.00) by age (20-80), with separate curves for housing = for free, own, rent.]
Binary logistic regression model


Logistic regression
The response variable has a Bernoulli distribution, e.g.:

Y = 1 if the company is on the market at time t; 0 if the company got out of the market prior to t

• The purpose of the analysis is to estimate the probability of observing an event Y=y conditionally on X=x, which is P(Y=y|X=x). The explanatory variables in the model can be:
  • discrete
  • continuous

• Examples of research questions: finding factors affecting company survival; identifying variables having an impact on a decision-making process, e.g. in marketing, to explain why a customer chooses to buy a product.

Specification of the logistic regression model

• Each observation on Y has a Bernoulli distribution:

P(Yi = 1) = πi
P(Yi = 0) = 1 − πi   (0 ≤ πi ≤ 1)

• In the logistic regression model the probabilities are conditional on Xk (k=1,…,p):

π(xi) = P(Y=1 | X=xi) = exp(α + Σ_{k=1}^{p} βk·xki) / [1 + exp(α + Σ_{k=1}^{p} βk·xki)]
      = 1 / [1 + e^(−(α + Σ_{k=1}^{p} βk·xki))]

• The logit transformation provides:

logit[π(xi)] = ln[π(xi) / (1 − π(xi))] = α + Σ_{k=1}^{p} βk·xki

Maximum likelihood estimation

• The method of maximum likelihood yields values for the unknown parameters that maximize the probability of obtaining the observed set of data (Hosmer et al., 2013, section 1.2)
• Set up the likelihood function describing the process under analysis: p(y) for the observed set of outcomes (Y1, Y2, …, Yn)
• (Y1, Y2, …, Yn) → sample → (y1, y2, …, yn)
• Assuming that the individual observations are independent, the probability is:

P(Y1=y1, Y2=y2, …, Yn=yn) = P(Y1=y1)·P(Y2=y2)·…·P(Yn=yn) = Π_{i=1}^{n} p(yi)

• Likelihood function:

L(y1, y2, …, yn | θ1, …, θr) = Π_{i=1}^{n} p(yi | θ1, …, θr)

Maximizing over θ1, …, θr gives the ML estimates θ̂1, …, θ̂r.

Estimation in logistic regression model (1)

• For n independent observations from the Bernoulli distribution the likelihood function is:

L(y1, y2, …, yn | θ1, …, θr) = Π_{i=1}^{n} p(yi | θ1, …, θr)
= π(x1)^y1·(1 − π(x1))^(1−y1) · … · π(xn)^yn·(1 − π(xn))^(1−yn)
= Π_{i=1}^{n} π(xi)^yi·(1 − π(xi))^(1−yi)

• The natural logarithm of the likelihood function gives:

ln L(y1, y2, …, yn | θ1, …, θr) = ln Π_{i=1}^{n} π(xi)^yi·(1 − π(xi))^(1−yi)
= Σ_{i=1}^{n} [ yi·ln π(xi) + (1 − yi)·ln(1 − π(xi)) ]
= Σ_{i=1}^{n} yi·ln[π(xi)/(1 − π(xi))] + Σ_{i=1}^{n} ln(1 − π(xi))
= Σ_{i=1}^{n} yi·(α + Σ_{k=1}^{p} βk·xki) − Σ_{i=1}^{n} ln[1 + exp(α + Σ_{k=1}^{p} βk·xki)]
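The last equality can be checked numerically: the expanded form and the direct Bernoulli form of the log likelihood agree for any data and parameter values (those below are arbitrary illustrative choices):

```python
from math import exp, log

# Compare sum_i [ y_i*ln(pi_i) + (1-y_i)*ln(1-pi_i) ]  (direct form)
# with    sum_i [ y_i*(a + b*x_i) - ln(1 + exp(a + b*x_i)) ]  (expanded form).
x = [0.5, 1.2, -0.3, 2.0, 0.0]
y = [0, 1, 0, 1, 1]
a, b = -0.4, 0.9

def pi(xi):
    return 1.0 / (1.0 + exp(-(a + b * xi)))

direct = sum(yi * log(pi(xi)) + (1 - yi) * log(1 - pi(xi))
             for xi, yi in zip(x, y))
expanded = sum(yi * (a + b * xi) - log(1 + exp(a + b * xi))
               for xi, yi in zip(x, y))
assert abs(direct - expanded) < 1e-12
```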

Estimation in logistic regression model (2)

• In order to get the ML estimates the task is to find the values of the parameters that maximize the likelihood function. For a model with one explanatory variable the likelihood equations are found from:

∂ ln L(y1, …, yn | α, β)/∂α = Σ_{i=1}^{n} yi − Σ_{i=1}^{n} exp(α + βxi)/[1 + exp(α + βxi)]

∂ ln L(y1, …, yn | α, β)/∂β = Σ_{i=1}^{n} yi·xi − Σ_{i=1}^{n} xi·exp(α + βxi)/[1 + exp(α + βxi)]

Setting

∂ ln L/∂α = 0
∂ ln L/∂β = 0

and solving yields α̂, β̂ – the estimates of the parameters of the logistic regression model.

Algorithms for model fitting:
• Newton-Raphson algorithm (Agresti, 2002, p. 143-145)
• Fisher scoring (Agresti, 2002, p. 145-146)

Algorithms for model fitting visually



Newton-Raphson Algorithm numerically

• For t representing the iteration of the algorithm the procedure is:

β^(t+1) = β^(t) − (H^(t))^(−1)·u^(t)

where u is the vector of first derivatives (the score) and H the matrix of second derivatives of the log likelihood L(β):

u = [ ∂L(β)/∂β1, ∂L(β)/∂β2, …, ∂L(β)/∂βp ]ᵀ

H = [ ∂²L(β)/∂βj∂βk ],  j, k = 1, …, p

• The matrix −H is the observed information matrix in the Newton-Raphson setting.

Newton-Raphson Algorithm
Logistic regression (LR) model with one explanatory variable

β^(t+1) = β^(t) − (H^(t))^(−1)·u^(t)

β = (α, β)ᵀ

u = [ Σ_{i=1}^{n} yi − Σ_{i=1}^{n} exp(α + βxi)/(1 + exp(α + βxi)),
      Σ_{i=1}^{n} yi·xi − Σ_{i=1}^{n} xi·exp(α + βxi)/(1 + exp(α + βxi)) ]ᵀ

H = − [ Σ_{i=1}^{n} e^(α+βxi)/(1 + e^(α+βxi))²      Σ_{i=1}^{n} xi·e^(α+βxi)/(1 + e^(α+βxi))²
        Σ_{i=1}^{n} xi·e^(α+βxi)/(1 + e^(α+βxi))²   Σ_{i=1}^{n} xi²·e^(α+βxi)/(1 + e^(α+βxi))² ]

Note that e^(α+βxi)/(1 + e^(α+βxi))² = π(xi)·(1 − π(xi)).
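A minimal Python sketch of the iteration above, with u and H assembled exactly as displayed (the data are arbitrary illustrative values, not from the course datasets):

```python
from math import exp

# Newton-Raphson for the one-predictor logistic model.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 1, 0, 1, 1]

alpha, beta = 0.0, 0.0
for _ in range(30):
    p = [1.0 / (1.0 + exp(-(alpha + beta * xi))) for xi in x]
    # score vector u (first derivatives of ln L)
    u1 = sum(yi - pi for yi, pi in zip(y, p))
    u2 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
    # Hessian H; note e^eta/(1+e^eta)^2 = p*(1-p)
    w = [pi * (1 - pi) for pi in p]
    h11 = -sum(w)
    h12 = -sum(xi * wi for xi, wi in zip(x, w))
    h22 = -sum(xi * xi * wi for xi, wi in zip(x, w))
    det = h11 * h22 - h12 * h12
    # beta_new = beta_old - H^{-1} u, with the 2x2 inverse written explicitly
    alpha -= (h22 * u1 - h12 * u2) / det
    beta -= (-h12 * u1 + h11 * u2) / det

assert abs(u1) < 1e-6 and abs(u2) < 1e-6   # score ~ 0 at the MLE
```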

Newton-Raphson Algorithm [SAS] l03_NetwonRaphsonAlgorithm01.sas

Example, program code

*Function: y = -x**4 - x**3 + x**2 + x + 1.
 Task: find the maximum using the Newton-Raphson algorithm;
data a01;
  do x=-3 to 3 by 0.001;
    y=-x**4-x**3+x**2+x+1;
    output;
  end;
run;

data a02;
  x=5;
  do until(conv<0.01);
    dy  = -4*x**3-3*x**2+2*x+1;
    ddy = -12*x**2-6*x+2;
    xt=x;
    x=x-dy/ddy;
    conv = abs(xt-x);
    output;
  end;
run;

*Plot the function and the iterations;
proc sql noprint;
  select x into :a separated by ' ' from a02;
quit;

proc sgplot data=a01;
  series x=x y=y;
  xaxis min=-3 max=3;
  refline &a. / axis=x lineattrs=(color=red);
run;
Logistic Regression with SAS ©AK, SGH 2020
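The same iteration can be written in Python; a direct translation of the SAS data step above (same starting point x=5 and the same stopping rule |Δx| < 0.01):

```python
# Maximize y = -x**4 - x**3 + x**2 + x + 1 with Newton-Raphson,
# iterating on the first derivative exactly as the SAS data step does.
x = 5.0
conv = 1.0
while conv >= 0.01:
    dy = -4 * x**3 - 3 * x**2 + 2 * x + 1   # first derivative
    ddy = -12 * x**2 - 6 * x + 2            # second derivative
    xt = x
    x = x - dy / ddy
    conv = abs(xt - x)

print(x)  # critical point reached from x0 = 5
```

Starting from x=5 the iteration walks down to the local maximum near x ≈ 0.64; the homework question about other starting points (e.g. -2) can be explored simply by changing the initial value.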

Newton-Raphson Algorithm
Example, Results
Homework:
1. What if the starting point
is different, e.g. -2? What
does this tell about
properties of Newton-
Raphson algorithm?
2. Write a program to
estimate the parameters
of logistic regression
model with one
explanatory variable
using Newton-Raphson
algorithm.
Logistic Regression with SAS ©AK, SGH 2020

Fisher scoring numerically

• The Fisher scoring method is similar to Newton-Raphson with the
exception that the former uses the matrix of expected information J
rather than the matrix of observed information H. For t representing
the iteration of the algorithm the procedure is:

β⁽ᵗ⁺¹⁾ = β⁽ᵗ⁾ − (J⁽ᵗ⁾)⁻¹ u⁽ᵗ⁾

u = [∂L(β)/∂β₁, …, ∂L(β)/∂βₚ]ᵀ,  J = [−E(∂²L(β)/∂βⱼ∂βₖ)],  j,k = 1,…,p

For large samples J and H are approximately equal. For details on the expected
information matrix see: Agresti, 2002, p. 135-139.
Logistic Regression with SAS ©AK, SGH 2020

Plotting log likelihood (1)

Model (0)

Y = 1 (default), 0 (non-default)
X = 1 (having telephone), 0 (not having telephone)

π(xᵢ) = 1/(1 + e^(−(α+βxᵢ)))

Log-likelihood for model (0):

lnL(y₁,y₂,…,yₙ|α,β)
 = Σᵢ₌₁ⁿ yᵢ(α + βxᵢ) − Σᵢ₌₁ⁿ ln[1 + exp(α + βxᵢ)]
 = n_{y=1}·α + n_{y=1;x=1}·β − n_{x=1}·ln[1 + exp(α+β)] − n_{x=0}·ln[1 + exp(α)]
Logistic Regression with SAS ©AK, SGH 2020
[SAS] l03_PlottingLnL01.sas

Plotting log likelihood for model (0)

lnL(y₁,y₂,…,yₙ|α,β)
 = n_{y=1}·α + n_{y=1;x=1}·β − n_{x=1}·ln[1 + exp(α+β)] − n_{x=0}·ln[1 + exp(α)]
 = 300α + 113β − 404·ln[1 + exp(α+β)] − 596·ln[1 + exp(α)]

goptions device=activex;
proc g3d data=p01;
  plot a*b=f;
run;
quit;
Logistic Regression with SAS ©AK, SGH 2020

Model (1)
• Response Y (Customer status) and explanatory X (Housing status)
variables are:
1 𝑑𝑒𝑓𝑎𝑢𝑙𝑡
𝑌=ቊ
0 𝑛𝑜𝑛 − 𝑑𝑒𝑓𝑎𝑢𝑙𝑡

1 𝑂𝑤𝑛
𝑋 = ቐ2 𝑅𝑒𝑛𝑡
3 𝐹𝑜𝑟 𝐹𝑟𝑒𝑒

• Estimate the model assessing the risk of customer being default given
their housing status:
π(xᵢ) = 1/(1 + e^(−(α+β₁x₁ᵢ+β₂x₂ᵢ)))
Logistic Regression with SAS ©AK, SGH 2020
[SAS] l03_EstimationTechniques01.sas

Estimation of logistic regression in SAS


Choosing estimating algorithm

• The option to choose the estimation algorithm is specified in the MODEL statement:

technique=Fisher | Newton

• Fisher scoring is the default
*Binary logistic regression model;
proc logistic data=german;
class housing (param=ref ref='own');
model default(event='1')= housing / technique=Newton;
run;
Model Information

Data Set                   CLASS3.GERMAN      CLASS3.GERMAN
Response Variable          default            default
Number of Response Levels  2                  2
Model                      binary logit       binary logit
Optimization Technique     Fisher's scoring   Newton-Raphson
Logistic Regression with SAS ©AK, SGH 2020

Alternative estimating procedure for LR

proc genmod data=class3.german descending;
  class housing (param=ref ref='own');
  model default= housing / dist=bin link=logit;
run;

Analysis Of Maximum Likelihood Parameter Estimates

Parameter         DF  Estimate  Std Error  Wald 95% Conf. Limits  Wald Chi-Square  Pr > ChiSq
Intercept          1   -1.0415     0.0853   -1.2086   -0.8743         149.11         <.0001
housing for free   1    0.6668     0.2136    0.2481    1.0854           9.74         0.0018
housing rent       1    0.5986     0.1753    0.2550    0.9422          11.66         0.0006
Scale              0    1.0000     0.0000    1.0000    1.0000
Logistic Regression with SAS ©AK, SGH 2020

Parametrization of the logistic model


Parametrization (see SAS Online Doc for other
types of parametrizations):
• Reference - Parameter estimates of CLASS main
effects, using the reference coding scheme,
estimate the difference in the effect of each
nonreference level compared to the effect of
the reference level.
• Effect - Parameter estimates of CLASS main
effects, using the effect coding scheme,
estimate the difference in the effect of each
nonreference level compared to the average
effect over all levels.
Logistic Regression with SAS ©AK, SGH 2020

Reference vs. effect parametrization


• In the model assessing the relationship between a customer being in default
and their housing status, the model has three structural parameters:
the intercept and the effects related to the housing status. The individual
probabilities are calculated using:

π(xᵢ) = 1/(1 + e^(−(α+β₁x₁ᵢ+β₂x₂ᵢ)))

Reference coding:
π(xᵢ) = 1/(1 + e^(−(−1.0414+0.6669x₁ᵢ+0.5987x₂ᵢ)))

Effect coding:
π(xᵢ) = 1/(1 + e^(−(−0.6195+0.2450x₁ᵢ+0.1768x₂ᵢ)))
Logistic Regression with SAS ©AK, SGH 2020

Reference parametrization:  π(xᵢ) = 1/(1 + e^(1.0414−0.6669x₁ᵢ−0.5987x₂ᵢ))

• Calculating the risk of being in default, comparing those renting with
customers owning flats [model (1)]:

π(xᵢ=rent) = 1/(1 + e^(1.0414−0.5987)) ≈ 0.3911
π(xᵢ=own)  = 1/(1 + e^(1.0414)) ≈ 0.2610

O(xᵢ=rent) = π(xᵢ)/(1−π(xᵢ)) ≈ 0.6423
O(xᵢ=own)  = π(xᵢ)/(1−π(xᵢ)) ≈ 0.3530

OR(rent vs own) = O(xᵢ=rent)/O(xᵢ=own) ≈ 1.8198
Logistic Regression with SAS ©AK, SGH 2020
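A short Python check of the arithmetic above, using the reference-coded estimates from the slide (−1.0414 for the intercept and 0.5987 for the rent effect):

```python
import math

alpha, b_rent = -1.0414, 0.5987  # reference-coded estimates

p_rent = 1 / (1 + math.exp(-(alpha + b_rent)))  # risk for renters
p_own = 1 / (1 + math.exp(-alpha))              # risk for owners

o_rent = p_rent / (1 - p_rent)   # odds for renters
o_own = p_own / (1 - p_own)      # odds for owners
or_rent_own = o_rent / o_own     # odds ratio rent vs own

print(p_rent, p_own, or_rent_own)
```

Note that with reference coding the odds are e^(α+βx), so the odds ratio equals exp(0.5987) exactly; the slide's 1.8198 is that value rounded.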

Effect parametrization:  π(xᵢ) = 1/(1 + e^(0.6195−0.2450x₁ᵢ−0.1768x₂ᵢ))

• Calculating the risk of being in default, comparing those renting with
customers owning flats [model (1)]:

π(xᵢ=rent) = 1/(1 + e^(0.6195−0.1768)) ≈ 0.3911
π(xᵢ=own)  = 1/(1 + e^(0.6195+0.2450+0.1768)) ≈ 0.2609

O(xᵢ=rent) = π(xᵢ)/(1−π(xᵢ)) ≈ 0.6423
O(xᵢ=own)  = π(xᵢ)/(1−π(xᵢ)) ≈ 0.3530

OR(rent vs own) = O(xᵢ=rent)/O(xᵢ=own) ≈ 1.8196
Logistic Regression with SAS ©AK, SGH 2020

Parametrization programmatically
• Reference:
proc logistic data=german;
class housing (param=ref ref='own');
model default(event='1')= housing;
run;

• Effect:
*Binary logistic regression model;
proc logistic data=german;
class housing (param=effect ref='own');
model default(event='1')= housing;
run;
Logistic Regression with SAS ©AK, SGH 2020

Selecting variables for the model


• Theoretically sensible
• Frequencies analysis to check the distributions – sparse categories
will affect the model estimation
• Selection method (e.g. forward, backward, stepwise)
• Statistical significance of the effects
Logistic Regression with SAS ©AK, SGH 2020

Variable selection methods in SAS


• SAS OnlineDoc:
• When SELECTION=FORWARD, PROC LOGISTIC first estimates parameters for effects forced into
the model. These effects are the intercepts and the first n explanatory effects in
the MODEL statement, where n is the number specified by the START= or INCLUDE= option in
the MODEL statement (n is zero by default). Next, the procedure computes the score chi-square
statistic for each effect not in the model and examines the largest of these statistics. If it is
significant at the SLENTRY= level, the corresponding effect is added to the model. Once an effect
is entered in the model, it is never removed from the model.
• When SELECTION=BACKWARD, the model including all the variables is estimated first. Results of the Wald test for
individual parameters are examined.
the SLSTAY= level for staying in the model is removed. Once an effect is removed from the model,
it remains excluded. The process is repeated until no other effect in the model meets the
specified level for removal or until the STOP= value is reached.
• The SELECTION=STEPWISE option is similar to the SELECTION=FORWARD option except that
effects already in the model do not necessarily remain. Effects are entered into and removed
from the model in such a way that each forward selection step can be followed by one or more
backward elimination steps. The stepwise selection process terminates if no further effect can be
added to the model or if the current model is identical to a previously visited model.
Logistic Regression with SAS ©AK, SGH 2020
[SAS] l04_VariableSelection01.sas

Exploring variables through SAS selection


method
• Ex 1. Estimate the model assessing risk of customer being default
adding new explanatory variables using three selection methods.
Compare the results.
*Exploring variables using selection procedure;
proc logistic data=german;
class housing personal_status debtors telephone / param=ref;
model default(event='1')= duration credit_amt age housing
personal_status debtors telephone
/ selection=forward /*Forward | Backward | Stepwise*/;
run;
Logistic Regression with SAS ©AK, SGH 2020
[SAS] l04_ModelFitStatistics01.sas

Assessing goodness-of-fit

lnL(y₁,y₂,…,yₙ|β₁,…,βᵣ)
 = Σᵢ₌₁ⁿ yᵢ(α + Σₖ₌₁ᵖ βₖxₖᵢ) − Σᵢ₌₁ⁿ ln[1 + exp(α + Σₖ₌₁ᵖ βₖxₖᵢ)]

In model (1):
lnL(y₁,y₂,…,yₙ|α,β₁,β₂)
 = Σᵢ₌₁ⁿ yᵢ(α + β₁x₁ᵢ + β₂x₂ᵢ) − Σᵢ₌₁ⁿ ln[1 + exp(α + β₁x₁ᵢ + β₂x₂ᵢ)]

AIC = −2lnL + 2p
SC  = −2lnL + p·ln(n)

p - number of parameters [model (1) has 3 parameters]
n - sample size
Logistic Regression with SAS ©AK, SGH 2020
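As a numeric illustration (assumed inputs: −2lnL = 1204.049 for model (1), taken from the likelihood-ratio example later in the lecture, with p = 3 parameters and n = 1000 observations):

```python
import math

neg2lnL = 1204.049  # -2 ln L for model (1)
p = 3               # number of parameters
n = 1000            # sample size

aic = neg2lnL + 2 * p           # AIC = -2lnL + 2p
sc = neg2lnL + p * math.log(n)  # SC  = -2lnL + p ln(n)

print(aic, sc)
```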

Testing significance of effects

• Testing if at least one explanatory variable in the model is statistically
significant, i.e. the model allows us to identify at least one effect which
affects the probability under analysis:

H₀: β = 0
H₁: ∃k βₖ ≠ 0
Logistic Regression with SAS ©AK, SGH 2020

Likelihood ratio test (Agresti, 2002, p. 12)

• The idea is to compare lnL in the null model with the
model with explanatory variable(s). The test statistic is:

−2lnΛ = −2ln(l₀/l₁) = −2(lnL₀ − lnL₁)

• For model (1) we get:

−2lnΛ = −2lnL₀ + 2lnL₁ = 1221.729 − 1204.049 = 17.68

• −2lnΛ has a limiting chi-square distribution with degrees of freedom equal to the
difference in the dimension of the parameter spaces between the compared
models.
Logistic Regression with SAS ©AK, SGH 2020

[SAS] l04_WaldTest01.sas

Wald test (Agresti, 2002, p. 11)

• The idea is to compare the estimates with their
standard errors. The test statistic is:

W = (β̂ − β₀)ᵀ [cov(β̂)]⁻¹ (β̂ − β₀)

• For model (1) we get:

W = [β̂₁ β̂₂] · [ σ̂²_β₁    σ̂_β₁β₂ ]⁻¹ · [β̂₁]
               [ σ̂_β₂β₁   σ̂²_β₂  ]     [β̂₂]

• W has a limiting chi-square distribution with
degrees of freedom equal to the rank of the covariance matrix.
Logistic Regression with SAS ©AK, SGH 2020

Score test (Agresti, 2002, p. 12-13)

• The idea is to compare the slope and expected curvature of the log-
likelihood function at β₀. The test statistic is:

S = uᵀ J⁻¹ u

• S has an approximate chi-square distribution with r
degrees of freedom, where r is the number of restrictions imposed on
β. See details in the OnlineDoc.
Logistic Regression with SAS ©AK, SGH 2020

Score test (2)

Under H₀ the assumption is that the regression coefficients are equal to 0, and therefore the logit equals:

logit[π(xᵢ)] = ln[π(xᵢ)/(1 − π(xᵢ))]

with the estimate of P(Y=1) given by π̂(xᵢ) = (1/n)·Σᵢ yᵢ.

H₀: [β₁ … βₚ]ᵀ = [0 … 0]ᵀ
H₁: ∃k βₖ ≠ 0

The test statistic is:

S = uᵀ J⁻¹ u

u = [∂L(β)/∂β₁, …, ∂L(β)/∂βₚ]ᵀ,  J = [−E(∂²L(β)/∂βⱼ∂βₖ)],  j,k = 1,…,p
Logistic Regression with SAS ©AK, SGH 2020

Score test example

In model (0) we get:

lnL(y₁,y₂,…,yₙ|α,β) = n_{y=1}·α + n_{y=1;x=1}·β − n_{x=1}·ln[1+exp(α+β)] − n_{x=0}·ln[1+exp(α)]

u = [ n_{y=1} − n_{x=1}·exp(α+β)/(1+exp(α+β)) − n_{x=0}·exp(α)/(1+exp(α)) ,
      n_{y=1;x=1} − n_{x=1}·exp(α+β)/(1+exp(α+β)) ]ᵀ

J = [ n_{x=1}·e^(α+β)/(1+e^(α+β))² + n_{x=0}·e^α/(1+e^α)²     n_{x=1}·e^(α+β)/(1+e^(α+β))²
      n_{x=1}·e^(α+β)/(1+e^(α+β))²                             n_{x=1}·e^(α+β)/(1+e^(α+β))² ]

where n_{y=1} = 300, n_{y=0} = 700, n_{x=0} = 596, n_{x=1} = 404, n_{x=1;y=1} = 113.

Under H₀: α = ln(0.3/0.7) = −0.8473, β = 0

S = uᵀ J⁻¹ u = 1.329784
Logistic Regression with SAS ©AK, SGH 2020
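The calculation above can be reproduced in a few lines; a Python sketch using the counts from the slide, with the 2×2 quadratic form uᵀJ⁻¹u written out explicitly:

```python
import math

n_y1, n_x1, n_x0, n_y1x1 = 300, 404, 596, 113
n = n_x1 + n_x0

alpha = math.log(0.3 / 0.7)                   # intercept under H0
p = math.exp(alpha) / (1 + math.exp(alpha))   # fitted probability, equals 0.3
w = p * (1 - p)

# Score vector evaluated at (alpha, beta = 0)
u1 = n_y1 - n * p
u2 = n_y1x1 - n_x1 * p

# Expected information J = [[a, b], [b, d]] and the quadratic form u' J^{-1} u
a, b, d = n * w, n_x1 * w, n_x1 * w
det = a * d - b * b
S = (d * u1 * u1 - 2 * b * u1 * u2 + a * u2 * u2) / det

print(S)
```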

Wald test for specific variable

• For model (1) with the additional effect of age included we get:

W = (β̂ − β₀)ᵀ [cov(β̂)]⁻¹ (β̂ − β₀) = β̂₃²/σ̂²_β₃ = (−0.021304067)²/0.0000472834 ≈ 9.5988

where se(β̂₃) = σ̂_β₃.

• W has a limiting chi-square distribution with
degrees of freedom equal to the rank of the covariance matrix.
Logistic Regression with SAS ©AK, SGH 2020
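For a single coefficient the Wald statistic reduces to a scalar ratio; a quick check of the numbers on the slide:

```python
beta3 = -0.021304067   # estimate of the age effect
var3 = 0.0000472834    # estimated variance of beta3

W = beta3 ** 2 / var3  # Wald chi-square with 1 degree of freedom
print(W)
```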

Measures of Goodness-of-Fit
• Assessing the fit of the model - we would like to know whether the
probabilities produced by the model accurately reflect the true
outcome experience in the data (Hosmer et al., 2007).
• Deviance and Pearson χ²: the statistics compare the saturated model with
the estimated model. The saturated model reflects an ideal fit – the probabilities
from the model are equal to the observed proportions for each profile
of the contingency table of counts nᵢⱼ (i = 1,…,l rows; j = 1,…,k columns):

Saturated model: nᵢⱼ/n   vs.   Estimated model: p̂ᵢⱼ = n̂ᵢⱼ/n
Logistic Regression with SAS ©AK, SGH 2020

[SAS] l04_SaturatedModel01.sas

Saturated model

• Let us consider model (1) with TELEPHONE included as an
explanatory variable. The counts (nᵢⱼ) in the saturated model come from the
cross-tabulation, and the probabilities in the saturated model are the
corresponding observed proportions:

proc freq data=german;
  tables housing*telephone / out=f01;
run;

proc logistic data=german;
  class housing (param=ref ref='own')
        telephone (param=ref ref='yes');
  model default(event='1')= housing telephone / aggregate scale=none;
  output out=out predprobs=(i);
run;
Logistic Regression with SAS ©AK, SGH 2020

Deviance and Pearson χ2

• Deviance:
𝑙 𝑘
𝑛𝑖𝑗
𝐷 = 2 ෍ ෍ 𝑛𝑖𝑗 𝑙𝑛
𝑛ො 𝑖𝑗
𝑖=1 𝑗=1
• Pearson χ2:
𝑙 𝑘 2
2
𝑛𝑖𝑗 − 𝑛ො 𝑖𝑗
𝜒 = ෍෍
𝑛ො 𝑖𝑗
𝑖=1 𝑗=1

𝑛𝑖𝑗 - observed counts


n̂ᵢⱼ − counts estimated from the model, E(X) = nπ (by the property of the binomial distribution)
D and χ² follow a chi-square distribution with degrees of freedom equal to the difference between the number
of profiles (parameters estimated in the saturated model) and the number of estimated parameters.
Logistic Regression with SAS ©AK, SGH 2020

Example
• Test the goodness of fit of model (1) with TELEPHONE included as an
explanatory variable using Deviance and Pearson χ². Calculate the
Deviance and Pearson χ² statistics.

Profile (Y, telephone, housing)   Expected    Observed   n·ln(n/n̂)   (n−n̂)²/n̂
0  no   rent                       68.73725      66       -2.68201    0.10900
0  no   own                       313.67000     319        5.37501    0.09057
0  no   for free                   26.58863      24       -2.45832    0.25203
0  yes  rent                       40.25858      43        2.83271    0.18668
0  yes  own                       213.32640     208       -5.25938    0.13299
0  yes  for free                   37.40772      40        2.68009    0.17964
1  no   rent                       47.26275      50        2.81503    0.15853
1  no   own                       119.33000     114       -5.20913    0.23807
1  no   for free                   20.41137      23        2.74625    0.32830
1  yes  rent                       22.74142      20       -2.56912    0.33047
1  yes  own                        66.67357      72        5.53374    0.42552
1  yes  for free                   23.59228      21       -2.44434    0.28484

D = 2·Σ n·ln(n/n̂) = 2.721091       χ² = 2.716630
Logistic Regression with SAS ©AK, SGH 2020
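The per-profile contributions can be checked directly; a Python sketch with the expected and observed counts from the example table:

```python
import math

# (expected, observed) counts for the 12 outcome-by-covariate profiles
cells = [
    (68.73725, 66), (313.67, 319), (26.58863, 24),
    (40.25858, 43), (213.3264, 208), (37.40772, 40),
    (47.26275, 50), (119.33, 114), (20.41137, 23),
    (22.74142, 20), (66.67357, 72), (23.59228, 21),
]

D = 2 * sum(n * math.log(n / e) for e, n in cells)   # deviance
chi2 = sum((n - e) ** 2 / e for e, n in cells)       # Pearson chi-square

print(D, chi2)
```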

Deviance and Pearson χ² - practical aspects

• If the logistic regression model satisfies any of the following conditions:
• it includes many explanatory variables
• the explanatory variables have a considerable number of categories
• the model includes continuous variables
then Deviance and Pearson χ² do not follow a χ² distribution.
In practice the contingency table then includes sparse categories – cells with small or zero
counts. A test which allows us to measure the goodness of fit of a
model satisfying any of the above conditions is the Hosmer-Lemeshow (HL)
test.
Logistic Regression with SAS ©AK, SGH 2020

Hosmer-Lemeshow test
• Hosmer Lemeshow statistic is calculated as follows:
• Estimate the individual probabilities.
• Sort within response and divide set into 10 groups with similar
number of observations (k=1,2,3,…,10)
• Using the groups calculate expected counts and compare with
observed counts using Pearson χ2 statistic, which has approximately χ2
distribution with v=10-2=8 degrees of freedom.
proc logistic data=german;
class housing (param=ref ref='own') telephone (param=ref ref='yes');
model default(event='1')= housing telephone / aggregate scale=none
lackfit;
output out=out predprobs=(i);
run;
Logistic Regression with SAS ©AK, SGH 2020

[SAS] l04_HosmerLemeshow01.sas

Example
• Test the goodness of fit of model (1) with AGE included as an
explanatory variable using the Hosmer-Lemeshow statistic.
(SAS OnlineDoc)

[Table: per-decile contributions to the Pearson χ² statistic for events and
non-events; the contributions sum to the Hosmer-Lemeshow statistic HL = 7.409344]
Logistic Regression with SAS ©AK, SGH 2020

Predictive power of the model

• How do the predictions differ from the observed values? Generalized
coefficient of determination (Cox & Snell R²):

R² = 1 − e^(−LR/n)

• where LR is the likelihood ratio statistic. The minimum of the range of R² is 0,
with the maximum smaller than 1. The statistic can be normalized using
the following transformation (Nagelkerke):

R²ₙ = R² / (1 − e^(2lnL₀/n))

• where lnL₀ is the log-likelihood of the model with intercept only; R²ₙ is in [0,1]
Logistic Regression with SAS ©AK, SGH 2020
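A numeric sketch of the two statistics (assumed inputs: LR = 17.68, n = 1000, and −2lnL₀ = 1221.729, taken from the likelihood-ratio example earlier in the lecture):

```python
import math

LR = 17.68             # likelihood ratio statistic for model (1)
n = 1000               # sample size
lnL0 = -1221.729 / 2   # log-likelihood of the intercept-only model

r2 = 1 - math.exp(-LR / n)            # Cox & Snell R-square
r2_max = 1 - math.exp(2 * lnL0 / n)   # its maximum attainable value
r2_n = r2 / r2_max                    # Nagelkerke R-square

print(r2, r2_n)
```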

[SAS] l04_RSqaure01.sas

Example
• Assess the predictive power of model (1) with AGE included as an
explanatory variable using the R² statistics (RSQ option).

proc logistic data=german;


class housing (param=ref ref='own');
model default(event='1')= housing age / aggregate scale=none
lackfit rsq;
output out=out predprobs=(i);
ods output LackFitPartition=hl;
run;
Logistic Regression with SAS ©AK, SGH 2020

Discriminatory performance of the model (1)

• Create pairs of observations and compare
the probabilities (Kleinbaum and Klein, 2010, p. 358):

• nc - number of concordant pairs – π(default) > π(non-default)
• nd - number of discordant pairs – π(default) < π(non-default)
• nt - number of ties – π(default) = π(non-default)
• t - number of pairs - comparing 300 defaults with 700 non-defaults
gives 300×700 = 210 000 comparisons.
Logistic Regression with SAS ©AK, SGH 2020

Discriminatory performance of the model (2)

• Somers' D = (nc−nd)/t – in [-1,1], where -1 means all pairs discordant and 1 all concordant
• Goodman-Kruskal Gamma = (nc−nd)/(nc+nd)
• Kendall's Tau = (nc−nd)/(0.5n(n−1))
• c = (nc+0.5(t−nc−nd))/t – area under the ROC curve, normalized in [0.5,1],
where 0.5 reflects a random assignment of probabilities and 1 an ideal
assignment of probabilities
Logistic Regression with SAS ©AK, SGH 2020
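The pairwise counts can be sketched directly; a toy Python example (the probabilities below are hypothetical, not from the German credit model):

```python
# Predicted probabilities for events (Y=1) and non-events (Y=0) - toy data
p_event = [0.8, 0.6]
p_nonevent = [0.7, 0.3, 0.6]

nc = nd = nt = 0
for pe in p_event:            # compare every event / non-event pair
    for pn in p_nonevent:
        if pe > pn:
            nc += 1           # concordant
        elif pe < pn:
            nd += 1           # discordant
        else:
            nt += 1           # tied
t = len(p_event) * len(p_nonevent)

somers_d = (nc - nd) / t
gamma = (nc - nd) / (nc + nd)
c = (nc + 0.5 * (t - nc - nd)) / t   # area under the ROC curve

print(nc, nd, nt, somers_d, gamma, c)
```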

Classification tables
• A tool to assess the predictive power of the model. How good is the model in assessing the values of
Y? We set up a boundary cut-off in (0,1). Then, using the cut-off, we may classify observations into those for
which the event occurred and those for which the event did not occur. We can assess the
appropriateness of the classification using the following statistics:

Sensitivity (assessing occurrence of the event):      S₁ = n_{Ŷ=1|Y=1} / n_{Y=1}
Specificity (assessing non-occurrence of the event):  S₀ = n_{Ŷ=0|Y=0} / n_{Y=0}
False occurrence:                                     FP = n_{Ŷ=1|Y=0} / n_{Ŷ=1}
False non-occurrence:                                 FN = n_{Ŷ=0|Y=1} / n_{Ŷ=0}
Logistic Regression with SAS ©AK, SGH 2020
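The four statistics can be computed from the 2×2 classification counts; a small Python sketch with made-up counts at some cut-off:

```python
# Hypothetical classification counts at a chosen cut-off
n11 = 80   # Y=1 predicted as 1
n12 = 20   # Y=1 predicted as 0
n21 = 30   # Y=0 predicted as 1
n22 = 70   # Y=0 predicted as 0

sensitivity = n11 / (n11 + n12)   # S1: correct among actual events
specificity = n22 / (n21 + n22)   # S0: correct among actual non-events
fp = n21 / (n11 + n21)            # false occurrence rate
fn = n12 / (n12 + n22)            # false non-occurrence rate

print(sensitivity, specificity, fp, fn)
```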

[SAS] l04_ClassificationTable01.sas

Classification table for model (1)

        Ŷ=1   Ŷ=0   nᵢ.
Y=1     n11   n12   n1.
Y=0     n21   n22   n2.
n.j     n.1   n.2   n

Accuracy: (n11+n22)/n
Sensitivity: n11/n1.   Specificity: n22/n2.
False occurrence: n21/n.1   False non-occurrence: n12/n.2

proc logistic data=class3.german;
  class housing (param=ref ref='own');
  model default(event='1')= housing /
        ctable pprob=0.3;
  output out=out predprobs=(i);
run;
Logistic Regression with SAS ©AK, SGH 2020 [SAS] l04_RocCurve01.sas

ROC curve

S₁ = n_{Ŷ=1|Y=1} / n_{Y=1}  (sensitivity; increases with a smaller cut-off)
S₀ = n_{Ŷ=0|Y=0} / n_{Y=0}  (specificity)

[Figure: ROC curve – sensitivity plotted against 1−specificity for varying cut-offs]
Logistic Regression with SAS ©AK, SGH 2020
[SAS] l04_Residuals01.sas

Residuals in logistic regression


• Assessing goodness-of-fit (large residuals indicate poor fit)
• Identifying influential observations
• The residuals are:
• DFBETASi – how the estimates of the coefficients would change after removing the ith
observation
• DIFDEVi/DIFCHISQi – the change in Deviance/Pearson χ² after removing the ith observation
• Ci and CBARi – the confidence interval displacement diagnostics that
measure the influence of individual observations on the regression estimates
• SAS OnlineDoc:
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/vie
wer.htm#statug_logistic_sect042.htm
Logistic Regression with SAS ©AK, SGH 2020

[SAS] l04_Collinearity01.sas

Collinearity assessment
• Collinearity in logistic regression leads to overestimation of the
standard errors and could lead to overestimation of the regression
coefficients
• Standard statistics used in linear regression can be applied to detect
collinearity: the variance inflation factor (VIF) and tolerance (TOL):

VIF = 1/(1 − R²ₖ)    TOL = 1/VIF
Logistic Regression with SAS ©AK, SGH 2020
[SAS] l04_DepositBinaryModel01.sas
Exercise with R and Python [R] l04_DepositBinaryModel01.R

examples – Binary Logistic Model [Python] l04_DepositBinaryModel01.py


Use the dataset Bank.sas7bdat (see: Lectures - slide 52) to estimate the logistic regression model with NEWDEPOSIT
as the target variable. The aim of the analysis is to find out what is the chance that a client with a particular profile
would be willing to subscribe to a term deposit. Sample size n=4521. The input variables to consider are:

o Age (numeric)
o Education (categorical: "unknown", "secondary", "primary", "tertiary")
o Balance: average yearly balance, in euros (numeric)
o Duration: last contact duration, in seconds (numeric)
o Poutcome: outcome of the previous marketing campaign (categorical: "unknown", "other", "failure", "success")

Output variable (target):


17 - y (NEWDEPOSIT)- has the client subscribed a term deposit? (binary: "yes", "no")

Following the estimation provide answers to questions in


Logistic regression with SAS Quiz: Binary Logistic Regression: Link
Logistic Regression with SAS ©AK, SGH 2020

Homework
1. Download and familiarize yourself with the data and documentation from
European Social Survey (ESS) study, Round 9:
http://www.europeansocialsurvey.org
2. Formulate a research problem. The aim is to practice binary logistic regression
– please consider that in the choice of the problem
3. Formulate the research hypothesis. The hypothesis will be verified in your
paper.
4. Suggest the exact set of features from the ESS data to include in the analysis.
5. Specify the model to be used in the analysis.

Prepare a brief presentation to share with the group. The proposals will be
discussed. The research problem will be analyzed in your papers (Projects I, II and
III).
Logistic Regression with SAS ©AK, SGH 2020

Logistic regression - model types


• Binary(i=2)
1 𝑐𝑜𝑚𝑝𝑎𝑛𝑦 𝑖𝑠 𝑜𝑛 𝑡ℎ𝑒 𝑚𝑎𝑟𝑘𝑒𝑡 𝑎𝑡 𝑡𝑖𝑚𝑒 𝑡
𝑌=ቊ
0 𝑐𝑜𝑚𝑝𝑎𝑛𝑦 𝑔𝑜𝑡 𝑜𝑢𝑡 𝑜𝑓 𝑡ℎ𝑒 𝑚𝑎𝑟𝑘𝑒𝑡 𝑝𝑟𝑖𝑜𝑟 𝑡𝑜 𝑡

• Ordinal (i>2)
0 𝐺𝑜𝑜𝑑
𝑌 = ቐ1 𝑀𝑜𝑑𝑒𝑟𝑎𝑡𝑒
2 𝑃𝑜𝑜𝑟

• Multinomial (i>2)
0 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑝ℎ𝑜𝑛𝑒
𝑌 = ቐ1 𝐴𝑛𝑑𝑟𝑜𝑖𝑑
2 𝑖𝑃ℎ𝑜𝑛𝑒
Logistic Regression with SAS ©AK, SGH 2020

Multinomial logistic regression model


Logistic Regression with SAS ©AK, SGH 2020

Multinomial model specification


• Response variable Y has j>2 categories. Let us discuss the example of
Y={0,1,2}, j=3. The assumption is that Y has multinomial distribution
with parameters:
• P(Y=0)=π0
• P(Y=1)=π1
• P(Y=2)= π2 =1-(π0+π1)
• Model includes p explanatory variables captured in a vector x and
intercept
0 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑝ℎ𝑜𝑛𝑒
𝑌 = ቐ1 𝐴𝑛𝑑𝑟𝑜𝑖𝑑
2 𝑖𝑃ℎ𝑜𝑛𝑒
Logistic Regression with SAS ©AK, SGH 2020

Logit notation in multinomial model

• For Y with three categories the multinomial logistic regression model can
be expressed by two logit functions:

ln[P(Y=1|x)/P(Y=0|x)] = β₁₀ + β₁₁x₁ + … + β₁ₚxₚ = xᵀβ₁

ln[P(Y=2|x)/P(Y=0|x)] = β₂₀ + β₂₁x₁ + … + β₂ₚxₚ = xᵀβ₂
Logistic Regression with SAS ©AK, SGH 2020

Conditional probabilities
• The conditional probabilities πⱼ|x are derived using the following
system of equations:

ln(π₁|x / π₀|x) = xᵀβ₁
ln(π₂|x / π₀|x) = xᵀβ₂
π₀|x = 1 − (π₁|x + π₂|x)

The individual conditional probabilities can be expressed by:

π₀|x = 1 / (1 + e^(xᵀβ₁) + e^(xᵀβ₂))
π₁|x = e^(xᵀβ₁) / (1 + e^(xᵀβ₁) + e^(xᵀβ₂))
π₂|x = e^(xᵀβ₂) / (1 + e^(xᵀβ₁) + e^(xᵀβ₂))
Logistic Regression with SAS ©AK, SGH 2020
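The three probabilities above amount to a baseline-category softmax; a minimal Python sketch with hypothetical linear predictors xᵀβ₁ and xᵀβ₂ (the values 0.4 and −0.2 are illustrative, not estimates):

```python
import math

# Hypothetical linear predictors for the two non-reference logits
xb1 = 0.4    # x'beta_1 (e.g. Android vs standard phone)
xb2 = -0.2   # x'beta_2 (e.g. iPhone vs standard phone)

denom = 1 + math.exp(xb1) + math.exp(xb2)
pi0 = 1 / denom               # P(Y=0|x), the reference category
pi1 = math.exp(xb1) / denom   # P(Y=1|x)
pi2 = math.exp(xb2) / denom   # P(Y=2|x)

print(pi0, pi1, pi2)
```

The probabilities sum to 1 by construction, and the two log-ratios recover the linear predictors, which is a convenient self-check of the formulas.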

Model estimation – likelihood function

• The multinomial model is estimated based on the multinomial
distribution. The conditional likelihood function for a sample of n
independent observations is (Agresti, 2002, p. 272-273; Hosmer,
Lemeshow, Sturdivant, 2013, p. 269-272):

Y = 0: y₀ = 1, y₁ = 0, y₂ = 0
Y = 1: y₀ = 0, y₁ = 1, y₂ = 0
Y = 2: y₀ = 0, y₁ = 0, y₂ = 1

L(y₁,y₂,…,yₙ|β₁,…,βᵣ) = Πᵢ₌₁ⁿ p(yᵢ|β₁,…,βᵣ)
 = π₀(x₁)^y₀₁ π₁(x₁)^y₁₁ π₂(x₁)^y₂₁ … π₀(xₙ)^y₀ₙ π₁(xₙ)^y₁ₙ π₂(xₙ)^y₂ₙ
 = Πᵢ₌₁ⁿ π₀(xᵢ)^y₀ᵢ π₁(xᵢ)^y₁ᵢ π₂(xᵢ)^y₂ᵢ
 = Πᵢ₌₁ⁿ (1 − π₁(xᵢ) − π₂(xᵢ))^y₀ᵢ π₁(xᵢ)^y₁ᵢ π₂(xᵢ)^y₂ᵢ
Logistic Regression with SAS ©AK, SGH 2020

Odds ratio in multinomial model

• For Y=0 being the reference category and Y=1 the category under
analysis, the odds ratio allows us to assess the chance of observing
Y=1 as compared to Y=0 if the subject has X=a as compared to X=b, where
a and b are specified characteristics. For example, we may answer the
question: what is the chance that a client would have a specific
smartphone and not a simple phone in the group of people with higher
education?

OR₁₀(a,b) = [P(Y=1|X=a)/P(Y=0|X=a)] / [P(Y=1|X=b)/P(Y=0|X=b)]
Logistic Regression with SAS ©AK, SGH 2020
[Data] phones.sas7bdat

Exercise – Estimation of multinomial model


• What is the chance of a client having a smartphone given their characteristics?
• The dataset Phones contains information on use of phone based on study „The
Pew Research Center’s Internet & American Life Project”,
http://pewinternet.org/Shared-Content/Data-Sets/2012/April-2012--Cell-
Phones.aspx [date accessed 3Jan2014]. The dataset includes clients with standard
phone, Android phone or iPhone with the following additional characteristics: age
(AGE), education (EDU), internet use (INTUSE), using internet in mobile device
(INTMOB) and quality of life (LQ). There are n=1061 observations in the dataset.
• Task: assess the chance using contingency tables and logistic regression model.
0 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑝ℎ𝑜𝑛𝑒
𝑌 = ቐ1 𝐴𝑛𝑑𝑟𝑜𝑖𝑑
2 𝑖𝑃ℎ𝑜𝑛𝑒
X: LQ SEX AGE EDU INTUSE INTMOB
Logistic Regression with SAS ©AK, SGH 2020

Phones dataset

ID  Name     Description                                 Categories
1   Y        Type of phone                               0=standard; 1=Android; 2=iPhone
2   LQ       Quality of Life                             1=Excellent; 2=Very good; 3=Good; 4=Fair or poor
3   EDU      Education                                   1=At least started higher education; 0=otherwise
4   SEX      Sex of the respondent                       1=Male; 2=Female
5   AGE      Age of the respondent                       N/A
6   INTUSE   Is using Internet?                          1=Yes; 2=No
7   INTMOB   Is using Internet through mobile devices?   1=Yes; 2=No
Logistic Regression with SAS ©AK, SGH 2020

Comparing clients with higher education [SAS] l05_MultContingency01.sas

to other education – contingency tables


Logistic Regression with SAS ©AK, SGH 2020

Comparing clients with higher education to


other education – multinomial model

• The chance of having an Android smartphone is more than 50% greater in the
group of clients with higher education.
• The chance of having an iPhone is more than 2 times greater
in the group of clients with higher education.
• For those with higher education it is likely that they will have
smartphones. Question: is there a significant difference between
Android and iPhone though?
Logistic Regression with SAS ©AK, SGH 2020

iPhone vs Android
• H₀: OR₁₀ = OR₂₀

• The point estimate with a 95% confidence interval:

• OR₂₁ = exp(0.3377) = 1.40 (1.01, 1.94). H₀ rejected: OR₁₀ and OR₂₀
are statistically significantly different.
Logistic Regression with SAS ©AK, SGH 2020

[SAS] l05_MultinomialModel01.sas

Exercise – estimating multinomial model


• Estimate the model assessing the chance of having a smartphone given
the client's characteristics. Assess the significance of the parameters and
interpret the results. The model specification is as follows:

0 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑝ℎ𝑜𝑛𝑒
𝑌 = ቐ1 𝐴𝑛𝑑𝑟𝑜𝑖𝑑
2 𝑖𝑃ℎ𝑜𝑛𝑒

X: SEX AGE EDU INTUSE INTMOB


Logistic Regression with SAS ©AK, SGH 2020

Profile probabilities
Given the estimates of the model, provide the probability estimates
for the following profiles:
1. A 31-year-old woman with higher education, using the Internet in general and
specifically on mobile devices - what is the probability of each phone type?
2. Plot the probabilities against age.
Logistic Regression with SAS ©AK, SGH 2020

Deriving profile probabilities


Logistic Regression with SAS ©AK, SGH 2020

Plotting probabilities
Logistic Regression with SAS ©AK, SGH 2020

Ordinal logistic regression model


Logistic Regression with SAS ©AK, SGH 2020

Ordinal model specification


• Response variable Y has j>2 categories measured on an ordinal scale.
Order of the categories matters. Let us discuss the example of
Y={0,1,2}, j=3. The assumption is that Y has multinomial distribution
with parameters:
• P(Y=0)=π0
• P(Y=1)=π1
• P(Y=2)= π2 =1-(π0+π1)
• Model includes p explanatory variables captured in a vector x and
intercept. For example let Y denotes quality of life:
1 𝐹𝑎𝑖𝑟 𝑜𝑟 𝑝𝑜𝑜𝑟
𝑌 = ቐ2 𝑉𝑒𝑟𝑦 𝑔𝑜𝑜𝑑 𝑜𝑟 𝑔𝑜𝑜𝑑
3 𝐸𝑥𝑐𝑒𝑙𝑙𝑒𝑛𝑡
Logistic Regression with SAS ©AK, SGH 2020

Cumulative Logit Model

• An ordinal variable can be modeled in various ways, e.g. using the
multinomial model. A special type of model which allows us to take the
ordering of the categories into account is the cumulative logit model
(Kleinbaum and Klein, 2010, Chapter 13). For categories 0,1,2,3,4 the
comparisons are:

<=0 vs >0
<=1 vs >1
<=2 vs >2
<=3 vs >3
Logistic Regression with SAS ©AK, SGH 2020

Cumulative Logit Model

• For an ordinal outcome variable Y with G categories (0,1,2,…,G-1) there
are G-1 ways to dichotomize the outcome:

D ≥ 1 vs D < 1
D ≥ 2 vs D < 2
...
D ≥ G−1 vs D < G−1

The odds is derived for each cut point (proportional odds assumption):

odds(D ≥ g) = P(D ≥ g) / P(D < g),  g = 1,2,…,G−1
Logistic Regression with SAS ©AK, SGH 2020

Proportional odds assumption

Proportional odds means the same odds ratio regardless of where the
categories are dichotomized. In other words, the odds ratio is invariant
to where the outcome categories are dichotomized, e.g.:

OR(D ≥ 1) = OR(D ≥ 4)

OR(D ≥ 1) = [odds(D ≥ 1) | X=1] / [odds(D ≥ 1) | X=0]

OR(D ≥ 4) = [odds(D ≥ 4) | X=1] / [odds(D ≥ 4) | X=0]
Logistic Regression with SAS ©AK, SGH 2020

Cumulative logit model
This implies that if there are G outcome categories, there is only one
parameter (β) for each of the predictor variables (e.g., β1 for predictor
X1). However, there is still a separate intercept term (αg) for each of
the G−1 comparisons:

Variable    Parameter
Intercept   α1, α2, ..., αG−1
X1          β1
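The parameter bookkeeping above can be sketched in a few lines of Python (the course uses Python alongside SAS); the function name n_params and the example values of G and p are hypothetical, purely for illustration:

```python
# A tiny sketch of the parameter count in a proportional odds model:
# G-1 intercepts plus one slope per predictor.
def n_params(G: int, p: int) -> int:
    """Number of parameters: G-1 intercepts (alpha_g) + p slopes (beta)."""
    return (G - 1) + p

print(n_params(3, 1))  # quality-of-life example: 2 intercepts + 1 slope = 3
```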
Cumulative logit model
• G outcome levels and one predictor X:

P(D ≥ g | X) = 1 / (1 + e^−(αg + βxi)),   g = 1, 2, …, G−1

P(D < g | X) = 1 − P(D ≥ g | X) = e^−(αg + βxi) / (1 + e^−(αg + βxi))

odds(D ≥ g) = P(D ≥ g | X) / P(D < g | X) = e^(αg + βxi)
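A quick numeric sketch of these formulas in Python (not SAS output): p_ge and odds_ge are hypothetical helper names, and the values of alpha_g and beta below are arbitrary illustrative numbers, not estimates from the course data.

```python
import math

def p_ge(alpha_g, beta, x):
    """P(D >= g | X): cumulative probability from the logit model."""
    return 1.0 / (1.0 + math.exp(-(alpha_g + beta * x)))

def odds_ge(alpha_g, beta, x):
    """odds(D >= g) = P(D >= g | X) / P(D < g | X)."""
    p = p_ge(alpha_g, beta, x)
    return p / (1.0 - p)

# The ratio of probabilities reduces to the closed form exp(alpha_g + beta*x):
a_g, b, x = -0.5, 0.8, 1.0
print(abs(odds_ge(a_g, b, x) - math.exp(a_g + b * x)) < 1e-12)  # True
```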
Cumulative logit model – alternative parameterization

• G outcome levels and one predictor X:

odds(D ≤ g) = P(D ≤ g | X) / [1 − P(D ≤ g | X)]
            = P(D ≤ g | X) / P(D > g | X) = e^(αg − βxi),   where g = 1, 2, …, G−1

• Many models can be parameterized in different ways. This need not be
problematic as long as the investigator understands how the model is
formulated and how to interpret its parameters.
Proportional odds assumption – contingency table („Phones” dataset)

H0: OR1 = OR2 = … = ORG−1
H1: ∃ i≠j: ORi ≠ ORj

LQ1                                      younger=0  younger=1  Total
1 Fair or poor                                 119         58    177
2 Very good or good                            349        311    660
3 Excellent                                     87        137    224
Total                                          555        506   1061

Cut at Y<=1 – odds of being in the lower category when older (younger=0):
LQ1                                      younger=0  younger=1  Total
1 Fair or poor                                 119         58    177
2 Very good or good or 3 Excellent             436        448    884
Total                                          555        506   1061

Risk(Y<=1|X=0) = 0.214414    Risk(Y<=1|X=1) = 0.114625
Odds(Y<=1|X=0) = 0.272936    Odds(Y<=1|X=1) = 0.129464
OR01 = 0.272936 / 0.129464 = 2.108194

Cut at Y<=2 – odds of being in the lower category when older (younger=0):
LQ1                                      younger=0  younger=1  Total
1 Fair or poor or 2 Very good or good          468        369    837
3 Excellent                                     87        137    224
Total                                          555        506   1061

Risk(Y<=2|X=0) = 0.843243    Risk(Y<=2|X=1) = 0.729249
Odds(Y<=2|X=0) = 5.37931     Odds(Y<=2|X=1) = 2.693431
OR01 = 5.37931 / 2.693431 = 1.997197
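The risks, odds and odds ratios in the tables above can be cross-checked with a short stdlib-only Python snippet (the names counts and ors are hypothetical, introduced just for this sketch):

```python
# Cut-point odds ratios from the "Phones" contingency table.
counts = {                 # category -> (younger=0, younger=1) counts
    "1 Fair or poor":      (119, 58),
    "2 Very good or good": (349, 311),
    "3 Excellent":         (87, 137),
}
cats = list(counts)
totals = [sum(c[g] for c in counts.values()) for g in (0, 1)]  # 555, 506

ors = {}
for cut in (1, 2):                    # dichotomize at Y<=1 and at Y<=2
    odds = []
    for g in (0, 1):
        low = sum(counts[c][g] for c in cats[:cut])  # count with Y <= cut
        risk = low / totals[g]
        odds.append(risk / (1 - risk))
    ors[cut] = odds[0] / odds[1]      # odds of lower category when older
    print(f"Y<={cut}: OR = {ors[cut]:.4f}")
# The two ORs (~2.108 and ~1.997) are close, which is what the
# proportional odds assumption requires.
```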
[SAS] l05_PropOdds01.sas
Proportional odds assumption – SAS
(„Phones” dataset)
H0: OR1=OR2=…=ORG-1
H1: Ǝi≠j ORi≠ORj
https://support.sas.com/documentation/cdl/en/statug/63962/
HTML/default/viewer.htm#statug_logistic_sect039.htm
Probabilities from cumulative logit model
With P(D ≤ g | X) = 1 / (1 + e^−(αg + βxi)) and the estimates substituted:

P(D ≤ 1 | X = 1) = 1 / (1 + e^2.0235) ≈ 0.1168

P(D ≤ 2 | X = 1) = 1 / (1 + e^−0.9820) ≈ 0.7275

P(D ≤ 1 | X = 0) = 1 / (1 + e^(2.0235 − 0.7147)) ≈ 0.2127

P(D ≤ 2 | X = 0) = 1 / (1 + e^(−0.9820 − 0.7147)) ≈ 0.8451
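These four probabilities can be reproduced numerically in Python from the exponents shown above (p_le is a hypothetical helper name, not part of the SAS program):

```python
import math

def p_le(exponent):
    """P(D <= g | X) = 1 / (1 + e^exponent), with the fitted exponent."""
    return 1.0 / (1.0 + math.exp(exponent))

print(round(p_le(2.0235), 4))             # P(D<=1 | X=1) -> 0.1168
print(round(p_le(-0.9820), 4))            # P(D<=2 | X=1) -> 0.7275
print(round(p_le(2.0235 - 0.7147), 4))    # P(D<=1 | X=0) -> 0.2127
print(round(p_le(-0.9820 - 0.7147), 4))   # P(D<=2 | X=0) -> 0.8451
```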
Odds ratio in cumulative logit model
Cut at g = 1:
P(D ≤ 1 | X = 1) = 1 / (1 + e^2.0235) ≈ 0.1168
P(D ≤ 1 | X = 0) = 1 / (1 + e^(2.0235 − 0.7147)) ≈ 0.2127
P(D > 1 | X = 1) = 1 − 0.1168 = 0.8832
P(D > 1 | X = 0) = 1 − 0.2127 = 0.7873
Odds(D ≤ 1 | X = 1) = 0.1168 / 0.8832 ≈ 0.1322
Odds(D ≤ 1 | X = 0) = 0.2127 / 0.7873 ≈ 0.2702
OR 1 vs (2 and 3) = 0.1322 / 0.2702 ≈ 0.48

Cut at g = 2:
P(D ≤ 2 | X = 1) = 1 / (1 + e^−0.9820) ≈ 0.7275
P(D ≤ 2 | X = 0) = 1 / (1 + e^(−0.9820 − 0.7147)) ≈ 0.8451
P(D > 2 | X = 1) = 1 − 0.7275 = 0.2725
P(D > 2 | X = 0) = 1 − 0.8451 = 0.1549
Odds(D ≤ 2 | X = 1) = 0.7275 / 0.2725 ≈ 2.6697
Odds(D ≤ 2 | X = 0) = 0.8451 / 0.1549 ≈ 5.4558
OR (1 and 2) vs 3 = 2.6697 / 5.4558 ≈ 0.48

The odds ratio is the same at both cut points, as the proportional odds
model requires by construction.
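The invariance can be checked numerically in Python, using the fitted exponents from the previous slides (p_le and odds are hypothetical helper names): the odds ratio comes out the same at both cut points.

```python
import math

def p_le(exponent):
    """P(D <= g | X) = 1 / (1 + e^exponent)."""
    return 1.0 / (1.0 + math.exp(exponent))

def odds(p):
    """Odds of being at or below the cut point."""
    return p / (1.0 - p)

or_cut1 = odds(p_le(2.0235)) / odds(p_le(2.0235 - 0.7147))
or_cut2 = odds(p_le(-0.9820)) / odds(p_le(-0.9820 - 0.7147))
print(round(or_cut1, 4), round(or_cut2, 4))  # 0.4893 0.4893
```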
Proportional odds assumption check – example
(Source: SAS OnlineDoc, Proc Logistic Example 72.3)

Consider a study of the effects on taste of various cheese additives.
Researchers tested four cheese additives and obtained 52 response ratings
for each additive. Each response was measured on a scale of nine
categories ranging from strong dislike (1) to excellent taste (9). The
data, given in McCullagh and Nelder (1989, p. 175) in the form of a
two-way frequency table of additive by rating, are saved in the data set
Cheese by the following program. The variable y contains the response
rating, the variable Additive specifies the cheese additive (1, 2, 3, or
4), and the variable freq gives the frequency with which each additive
received each rating. Assess the proportional odds assumption.

data Cheese;
   do Additive = 1 to 4;
      do y = 1 to 9;
         input freq @@;
         output;
      end;
   end;
   label y = 'Taste Rating';
   datalines;
0 0 1 7 8 8 19 8 1
6 9 12 11 7 6 1 0 0
1 1 6 8 23 7 5 1 0
0 0 0 1 3 7 14 16 11
;
run;
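For comparison, the same expansion of the frequency table into one record per (Additive, y) cell can be done in plain Python (the cheese list of dicts is a hypothetical stand-in for the SAS data set, just to illustrate what the DATA step builds):

```python
# Expand the 4x9 additive-by-rating frequency table, one record per cell.
freqs = [
    [0, 0, 1, 7, 8, 8, 19, 8, 1],    # Additive 1
    [6, 9, 12, 11, 7, 6, 1, 0, 0],   # Additive 2
    [1, 1, 6, 8, 23, 7, 5, 1, 0],    # Additive 3
    [0, 0, 0, 1, 3, 7, 14, 16, 11],  # Additive 4
]
cheese = [
    {"Additive": a + 1, "y": y + 1, "freq": f}
    for a, row in enumerate(freqs)
    for y, f in enumerate(row)
]
print(len(cheese), sum(r["freq"] for r in cheese))  # 36 208 (4 x 52 ratings)
```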
[SAS] l05_OrdinalModel01.sas
Cumulative logit model
• Assess the effect of sex, age, education level and Internet usage on
the quality of life. Use the phones dataset. The variables in the model are:

• Estimate the model and interpret the results.

• Plot the probabilities in order to assess the effect of the respective
features on the quality of life.
PROC LOGISTIC summary (OnLineDoc: Link)
ods graphics on;
proc logistic data=g /*Input dataset*/
  covout outest=cov /*Output covariance matrix of estimates*/
  plots(only)=ROC; /*Output ROC curve*/
  class housing (param=ref ref='own') acc_status (param=ref); /*Define
  classification variables and parametrization (e.g. reference)*/
  model default(event='1')= age duration housing acc_status / /*logistic
  model definition, response and response category = predictors*/
    technique=fisher /*Estimation algorithm*/
    corrb /*Correlation matrix of parameter estimates*/
    covb /*Covariance matrix of parameter estimates*/
    waldcl /*Wald confidence intervals for model parameters*/
    aggregate scale=none /*Deviance and Pearson chi-square*/
    lackfit /*Hosmer-Lemeshow goodness-of-fit test*/
    expb /*exp of the estimates – OR in the table of parameter estimates*/
    rsq /*coefficient of determination*/
    ctable /*classification tables [optional cut-points: pprob=(0.4 0.5)]*/
    influence /*influential observation diagnostics*/
    ;
run;
ods graphics off;
References
A. Agresti (2002) Categorical Data Analysis (Hoboken: John Wiley & Sons).

P.D. Allison (1999) Logistic Regression Using the SAS System: Theory and
Application (Cary: SAS Institute Inc.).

D.W. Hosmer, S. Lemeshow, R.X. Sturdivant (2013) Applied Logistic
Regression (Hoboken: John Wiley & Sons).

D.G. Kleinbaum and M. Klein (2010) Logistic Regression: A Self-Learning
Text, 3rd ed. (New York: Springer).

P. McCullagh and J.A. Nelder (1989) Generalized Linear Models, 2nd ed.
(London: Chapman and Hall).
Thank you for your attention!

Any questions?