
Logistic Regression

• Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis.

• Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables.

• Logistic Regression is a classification algorithm, not a regression algorithm, and it should not be confused with Linear Regression. (A minimal fitting sketch follows the references below.)

1. https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/what-is-logistic-regression/
2. Introduction to Machine Learning with Python (2017)
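
As a concrete illustration of fitting a binary logistic regression classifier, here is a minimal scikit-learn sketch; the synthetic dataset and default parameters are assumptions chosen for demonstration, not taken from the sources above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (assumed here purely for demonstration).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the logistic regression model and evaluate on held-out data.
clf = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```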
Advantages

• Logistic regression is easy to implement and interpret, and very efficient to train.

• It makes no assumptions about the distributions of classes in feature space.

• It extends easily to multiple classes (multinomial logistic regression) and gives a natural probabilistic view of class predictions (see the sketch below).

• It is very fast at classifying unknown records.

• It performs well when the dataset is linearly separable.

https://www.geeksforgeeks.org/advantages-and-disadvantages-of-logistic-regression/
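
The multiclass and probabilistic points above can be seen directly in scikit-learn; in this hedged sketch, the iris dataset and the max_iter setting are assumptions chosen for demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris has three classes, so this exercises the multiclass extension.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba exposes the probabilistic view: one probability per class,
# summing to 1 for each sample.
print(clf.predict_proba(X[:2]))
print(clf.predict(X[:2]))
```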
Disadvantages

• If the number of observations is less than the number of features, logistic regression should not be used; otherwise, it may lead to overfitting.

• It can only construct linear decision boundaries (illustrated in the sketch after this list).

• The major limitation of logistic regression is the assumption of linearity between the log-odds of the dependent variable and the independent variables.

• It can only be used to predict discrete outcomes (class labels), not continuous values.

• Logistic regression requires little or no multicollinearity between the independent variables.

https://www.geeksforgeeks.org/advantages-and-disadvantages-of-logistic-regression/
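
As a hedged illustration of the linear-boundary limitation above, this sketch compares plain logistic regression with the same model given polynomial features, on data that is not linearly separable; the dataset and polynomial degree are assumptions chosen for demonstration.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two interleaving half-moons: not linearly separable.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# A plain logistic regression can only draw a straight decision boundary,
# so its training accuracy is limited here.
linear = LogisticRegression().fit(X, y)
print("linear boundary accuracy:", linear.score(X, y))

# Expanding the features with degree-3 polynomial terms lets the same linear
# model express a curved boundary in the original feature space.
curved = make_pipeline(PolynomialFeatures(degree=3), LogisticRegression()).fit(X, y)
print("polynomial features accuracy:", curved.score(X, y))
```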
[Figure: logistic regression curve for a binary outcome (e.g., whether a customer takes an offer). A straight regression line would imply that points near the top definitely take the offer (probability above 100%) and points near the bottom definitely do not (below 0%); the S-shaped logistic curve, fitted to the data points in the same best-fit spirit as linear regression, instead keeps the predicted probability p_hat (the hat denotes a prediction) between 0 and 1. Applying a 0.5 threshold, points with p_hat below 0.5 are projected downward to y_hat = 0 (no) and points above 0.5 upward to y_hat = 1 (yes); the plot marks the resulting correct and incorrect predictions.]
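
The thresholding step described in the figure can be written out directly; this is a minimal NumPy sketch, with the example scores assumed purely for illustration.

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued score to a probability between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

# Assumed linear scores z = w*x + b for a few example points.
z = np.array([-2.0, -0.3, 0.4, 3.0])
p_hat = sigmoid(z)                    # predicted probabilities
y_hat = (p_hat >= 0.5).astype(int)    # 0.5 threshold: 0 = no, 1 = yes

print(p_hat)  # approximately [0.12 0.43 0.60 0.95]
print(y_hat)  # [0 0 1 1]
```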
k-Nearest Neighbors (k-NN)

• The k-NN algorithm is arguably the simplest machine learning algorithm. Building the model consists only of storing the training dataset. To make a prediction for a new data point, the algorithm finds the closest data points in the training dataset: its "nearest neighbors."

• In its simplest version, the k-NN algorithm considers exactly one nearest neighbor: the closest training data point to the point we want to make a prediction for. The prediction is then simply the known output for this training point.

• There are two important parameters to the K-neighbors classifier: the number of neighbors and how you measure distance between data points. (A minimal one-nearest-neighbor sketch follows below.)
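
To make the description concrete, here is a hedged from-scratch sketch of the one-nearest-neighbor case described above; the tiny dataset is an assumption for illustration. scikit-learn's KNeighborsClassifier(n_neighbors=1) implements the same idea.

```python
import numpy as np

# "Training" is just storing the dataset (assumed toy data for illustration).
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])
y_train = np.array([0, 0, 1, 1])

def predict_1nn(x_new):
    # Euclidean distance from the new point to every stored training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # The prediction is the known output of the single closest point.
    return y_train[np.argmin(distances)]

print(predict_1nn(np.array([1.5, 1.5])))  # -> 0
print(predict_1nn(np.array([8.5, 8.5])))  # -> 1
```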
• One of the strengths of k-NN is that the model is very easy to understand, and often
gives reasonable performance without a lot of adjustments.

• Using this algorithm is a good baseline method to try before considering more advanced
techniques.

• Building the nearest neighbors model is usually very fast, but when your training set is
very large (either in number of features or in number of samples) prediction can be slow.

• When using the k-NN algorithm, it's important to preprocess your data (see the scaling sketch after this list).

• This approach often does not perform well on datasets with many features (hundreds or more), and it does particularly badly with datasets where most features are 0 most of the time (so-called sparse datasets).
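
Because k-NN relies on distances, features on larger scales dominate the neighbor search; a common preprocessing step is to rescale features before fitting. This is a hedged sketch; the dataset and the choice of three neighbors are assumptions for demonstration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Rescale each feature to zero mean and unit variance before computing
# distances, so no single feature dominates the neighbor search.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```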
End

Book: Introduction to Machine Learning with Python (2017)
