You are on page 1of 7

Logistic Regression

What is Logistic Regression?


• Logistic regression is a statistical analysis method to predict a
binary/ dichotomous outcome, such as yes or no, based on prior
observations of a data set.
• Logistic regression, also called a logit model. In the logit model
the log odds of the outcome is modeled as a linear combination
of the predictor variables.
• A logistic regression model predicts a dependent data variable
by analyzing the relationship between one or more existing
independent variables.
Logistic Regression (Contd..)
• Logistic regression is an example of supervised learning. It
is used to calculate or predict the probability of a binary
(yes/no) event occurring. An example of logistic regression
could be applying machine learning to determine if a
person is likely to be infected with COVID-19 or not.
• Logistic regression can also play a role in data preparation
activities by allowing data sets to be put into specifically
predefined buckets during the extract, transform, load (ETL)
process in order to stage the information for analysis.
Assumptions for Logistic Regression
• The variables must be independent of one another. So, for example, zip code
and gender could be used in a model, but zip code and state would not work.
• The raw data should represent unrepeated or independent phenomena. For
example, a survey of customer satisfaction should represent the opinions of
separate people. But these results would be skewed if someone took the
survey multiple times from different email addresses to qualify for a reward.
• Logistic regression also requires a significant sample size. This can be as small
as 10 examples of each variable in a model. But this requirement goes up as
the probability of each outcome drops.
• each variable can be represented using binary categories such as
male/female, click/no-click.
Logistic Regression Vs Linear Regression
• Logistic regression provides a constant output, while linear regression provides a
continuous output.
• In logistic regression, the outcome, or dependent variable, has only two possible
values. However, in linear regression, the outcome is continuous, which means that it
can have any one of an infinite number of possible values.
• Logistic regression is used when the response variable is categorical, such as yes/no,
true/false and pass/fail. Linear regression is used when the response variable is
continuous, such as hours, height and weight. For example, given data on the time a
student spent studying and that student's exam scores, logistic regression and linear
regression can predict different things.
• With logistic regression predictions, only specific values or categories are allowed.
Therefore, logistic regression predicts whether the student passed or failed. Since
linear regression predictions are continuous, such as numbers in a range, it can
predict the student's test score on a scale of 0 to100.
Example
• A researcher is interested in how variables, such as GRE (Graduate Record
Exam scores), GPA (grade point average) and prestige of the
undergraduate institution, effect admission into graduate school. The
response variable, admit/don’t admit, is a binary variable.
• Use Data set binary.sav (provided by instructor)
• This dataset has a binary response (outcome, dependent) variable called
admit, which is equal to 1 if the individual was admitted to graduate
school, and 0 otherwise. There are three predictor variables: gre, gpa,
and rank. We will treat the variables gre and gpa as continuous. The
variable rank takes on the values 1 through 4. Institutions with a rank of 1
have the highest prestige, while those with a rank of 4 have the lowest.
References:
• https://researchwithfawad.com/index.php/lp-courses/data-analysis-
using-spss/binary-logistic-regression-analysis-in-spss/

You might also like