
Pseudo-R squared

• R squared in linear regression (the ordinary, non-pseudo R2) is more
appropriately thought of as a measure of explained variation than as a
measure of goodness of fit.
 
• The second most common type of regression
model is logistic regression, which is appropriate
for binary outcome data. The R2-like measure used
to describe explained variation in a logistic regression
model is known as pseudo R2.
 
How is pseudo R2 calculated for a logistic regression model?
Different researchers have proposed different measures for
logistic regression, with the objective usually that the
measure inherits the properties of the familiar R squared
from linear regression.
Commonly Encountered Pseudo R-Squared
1) Efron’s
2) McFadden’s
3) McFadden’s (adjusted)
4) Cox & Snell
5) Nagelkerke / Cragg & Uhler’s
6) McKelvey & Zavoina
7) Count
8) Adjusted Count
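To make a few of the listed measures concrete, here is a minimal sketch in base R that computes Efron's, Cox & Snell's and Nagelkerke's measures by hand, using their usual textbook definitions. The simulated data, the seed and all variable names are assumptions made for illustration only; they are not part of the original material.

```r
# Hand-computed sketch of three of the listed measures (base R only).
# The data are simulated; seed, sample size and coefficients are arbitrary.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))

mod     <- glm(y ~ x, family = binomial)   # model of interest
nullmod <- glm(y ~ 1, family = binomial)   # intercept-only model

ll_mod  <- as.numeric(logLik(mod))         # log-likelihood, fitted model
ll_null <- as.numeric(logLik(nullmod))     # log-likelihood, null model
p_hat   <- fitted(mod)                     # predicted probabilities

# Efron: 1 - residual sum of squares / total sum of squares
efron <- 1 - sum((y - p_hat)^2) / sum((y - mean(y))^2)

# Cox & Snell: 1 - (L_null / L_model)^(2/n), evaluated on the log scale
cox_snell <- 1 - exp((2 / n) * (ll_null - ll_mod))

# Nagelkerke / Cragg & Uhler: Cox & Snell divided by its maximum value
nagelkerke <- cox_snell / (1 - exp((2 / n) * ll_null))

round(c(Efron = efron, CoxSnell = cox_snell, Nagelkerke = nagelkerke), 3)
```

Because Cox & Snell's measure cannot reach 1 for binary outcomes, Nagelkerke's rescaled value is never smaller than it.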
McFadden's pseudo-R squared
• Logistic regression models are fitted using the method of maximum likelihood -
i.e. the parameter estimates are those values which maximize the likelihood of
the data which have been observed. McFadden's R squared measure is defined
as

$$R^2_{\text{McFadden}} = 1 - \frac{\log(L_c)}{\log(L_{\text{null}})}$$

where
$L_c$ denotes the maximized likelihood value from the current fitted model, and
$L_{\text{null}}$ denotes the corresponding value for the null model (i.e. the model with only
an intercept and no covariates).
McFadden's pseudo-R squared in R

• In R, the glm (generalized linear model) command is the
standard command for fitting logistic regression. The fitted
glm object doesn't directly give any of the pseudo R squared
values, but McFadden's measure can be readily calculated.
To Calculate
• We first fit our model of interest, and then the null model
which contains only an intercept.
• We can then calculate McFadden's R squared using the
fitted model log likelihood values as
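# Fit the model of interest and the intercept-only (null) model
mod <- glm(y ~ x, family = "binomial")
nullmod <- glm(y ~ 1, family = "binomial")
# McFadden's R squared; as.numeric() drops the logLik class and attributes
1 - as.numeric(logLik(mod)) / as.numeric(logLik(nullmod))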
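The recipe above can be tried end to end on simulated data. The sketch below is illustrative only: the data, seed and variable names are assumptions. It also shows that for 0/1 outcome data the same value can be read off a single fitted glm object, because in that case the deviance equals minus twice the log-likelihood.

```r
# Minimal sketch: McFadden's R squared two ways on simulated 0/1 data.
set.seed(42)                                # arbitrary seed (assumption)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 + 0.8 * x))

mod     <- glm(y ~ x, family = binomial)    # model of interest
nullmod <- glm(y ~ 1, family = binomial)    # intercept-only model

# 1) From the two log-likelihoods, as described in the text
mcfadden <- 1 - as.numeric(logLik(mod)) / as.numeric(logLik(nullmod))

# 2) From the deviances stored in the fitted object (valid for 0/1 outcomes)
mcfadden_dev <- 1 - mod$deviance / mod$null.deviance

all.equal(mcfadden, mcfadden_dev)
mcfadden
```

The second form avoids fitting the null model explicitly, but it relies on the outcome being ungrouped binary data.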
In R
Pseudo R2 can be obtained directly by using the function
PseudoR2 from the DescTools package:
PseudoR2(x, which = NULL)
Arguments
x: the glm, polr or multinom model object to be evaluated.
which: character, one out of "McFadden", "McFaddenAdj", "CoxSnell",
"Nagelkerke", "AldrichNelson", "VeallZimmermann", "Efron",
"McKelveyZavoina", "Tjur", "all".
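As a small illustration of the which argument, the sketch below requests specific measures by name and compares McFadden's value with a hand computation. The simulated data frame and its column names are assumptions, and the PseudoR2 calls only run if the DescTools package happens to be installed.

```r
# Sketch: selecting measures with the `which` argument of PseudoR2.
# Simulated data; seed, sample size and names are illustrative assumptions.
set.seed(7)
dat <- data.frame(x = rnorm(300))
dat$y <- rbinom(300, size = 1, prob = plogis(dat$x))

mod <- glm(y ~ x, data = dat, family = binomial)

# Hand-computed McFadden value for comparison (valid for 0/1 outcomes)
by_hand <- 1 - mod$deviance / mod$null.deviance

# Only call PseudoR2 when DescTools is available on this machine
if (requireNamespace("DescTools", quietly = TRUE)) {
  print(DescTools::PseudoR2(mod, which = "McFadden"))
  print(DescTools::PseudoR2(mod, which = c("McFadden", "Nagelkerke")))
}
by_hand
```

The value printed for "McFadden" is expected to agree with by_hand.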
Example 1
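# Install (once) and load DescTools, which provides Untable() and PseudoR2()
install.packages("DescTools")
library(DescTools)
# Titanic is a contingency table; expand it to one row per passenger
Titanic
str(Titanic)
Data <- Untable(Titanic)
head(Data)
# Fit the full model and the intercept-only (null) model
Model <- glm(Survived ~ ., data = Data, family = binomial)
Nullmodel <- glm(Survived ~ 1, data = Data, family = binomial)
# McFadden's pseudo R2 by hand
1 - (logLik(Model) / logLik(Nullmodel))
# Direct calculation using the PseudoR2 function
PseudoR2(Model)
PseudoR2(Model, c("McFadden", "Nagel"))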
Example 2
# Install (once) and load the ISLR package, which provides Smarket
install.packages("ISLR")
library(ISLR)
# Data set
Smarket
head(Smarket)
dim(Smarket)
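# Logistic Regression
# Note: "." includes Today, whose sign defines Direction, so R may warn
# about fitted probabilities of 0 or 1 (perfect separation).
glm.fit <- glm(Direction ~ ., data = Smarket, family = binomial)
summary(glm.fit)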
# Prediction
glm.probs <- predict(glm.fit,type = "response")
glm.probs[1:5]
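# Classify each day as "Up" or "Down" from the predicted probabilities
glm.pred <- ifelse(glm.probs > 0.5, "Up", "Down")
glm.pred[1:5]
# Confusion table: the off-diagonal cells are the wrong predictions
conf_table <- table(Predicted = glm.pred, Actual = Smarket$Direction)
addmargins(conf_table)
# Calculation of pseudo R2 (PseudoR2 comes from DescTools, loaded in Example 1)
PseudoR2(glm.fit)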
Example 3
Data=Untable(HairEyeColor)
dim(Data)
Model = glm(Sex ~ ., data=Data, family=binomial)
summary(Model)
# Prediction
Model.probs <- predict(Model,type = "response")
Model.probs[1:5]
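# Based on probabilities finding sex.
# glm models the probability of the second factor level of Sex,
# so probabilities above 0.5 are mapped to that level.
sex_levels <- levels(Data$Sex)
Model.pred <- factor(ifelse(Model.probs > 0.5, sex_levels[2], sex_levels[1]),
                     levels = sex_levels)
Model.pred[1:5]
# Confusion table: the off-diagonal cells are the wrong predictions
conf_table <- table(Predicted = Model.pred, Actual = Data$Sex)
addmargins(conf_table)
Accuracy <- sum(diag(conf_table)) / sum(conf_table)
cat("Accuracy =", Accuracy, "\n")
# Calculation of pseudo R2
Nullmodel <- glm(Sex ~ 1, data = Data, family = binomial)
1 - (logLik(Model) / logLik(Nullmodel))
# Direct calculation using the PseudoR2 function
PseudoR2(Model)
PseudoR2(Model, c("McFadden", "Nagel"))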
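The accuracy computed in Example 3 corresponds to the "Count" measure from the list of commonly encountered pseudo R-squared values. As a hedged sketch, the code below computes the Count and Adjusted Count measures by hand, using their usual definitions, on simulated data. The data, seed and variable names are assumptions for illustration.

```r
# Sketch: Count and Adjusted Count pseudo R2 by hand (base R only).
set.seed(3)                                  # arbitrary seed (assumption)
n <- 400
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.4 + 1.0 * x))

mod  <- glm(y ~ x, family = binomial)
pred <- as.integer(fitted(mod) > 0.5)        # classify with a 0.5 cut-off

n_correct <- sum(pred == y)                  # correctly classified cases
n_modal   <- max(table(y))                   # size of the most frequent outcome

# Count: proportion of correct predictions (the classification accuracy)
count_r2 <- n_correct / n

# Adjusted Count: improvement over always predicting the most frequent outcome
adj_count_r2 <- (n_correct - n_modal) / (n - n_modal)

c(Count = count_r2, AdjustedCount = adj_count_r2)
```

The adjusted version can be zero or negative when the model classifies no better than always guessing the most frequent outcome.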
