1. https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/what-is-logistic-regression/
2. Introduction to Machine Learning with Python (2017)
Advantages
• It extends easily to multiple classes and provides a natural probabilistic view of class predictions.
https://www.geeksforgeeks.org/advantages-and-disadvantages-of-logistic-regression/
Disadvantages
• If the number of observations is smaller than the number of features, Logistic Regression should not
be used; otherwise, it may lead to overfitting.
• The major limitation of Logistic Regression is the assumption of linearity between the dependent
variable and the independent variables.
https://www.geeksforgeeks.org/advantages-and-disadvantages-of-logistic-regression/
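The probabilistic view mentioned in the advantages can be seen directly in scikit-learn (the library used by the book in source 2). A minimal sketch on made-up toy data; the feature values and labels below are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up toy data: a single feature vs. a binary outcome
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba is the "probabilistic view": one probability per class,
# with each row summing to 1
proba = model.predict_proba(np.array([[3.5]]))
print(proba)
print(model.predict(np.array([[3.5]])))
```

The same estimator handles more than two classes out of the box, which is the "extends to multiple classes" point above.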
**probability between 0 & 1
**binary classification
-fitting a straight line essentially says points above will definitely take the offer (>100%)
-and points below definitely will not (<0%)
**similar to linear regression, this is the best-fitting line given these data points
**hat represents a prediction, so p_hat is the predicted probability
**0.5 threshold
-below 0.5, projected downward (y_hat = 0, no)
-above 0.5, projected upward (y_hat = 1, yes)
[Figure: logistic curve with correct and incorrect predictions labeled around the 0.5 threshold]
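The annotations above (probabilities bounded between 0 and 1, a hat for predictions, a 0.5 threshold) can be sketched in plain NumPy. The coefficients below are made up for illustration, not fitted values:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into (0, 1), so outputs read as probabilities
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical "fitted" coefficients, chosen only for illustration
w, b = 1.5, -4.0
x = np.array([1.0, 2.0, 3.0, 4.0])

p_hat = sigmoid(w * x + b)          # predicted probabilities, all in (0, 1)
y_hat = (p_hat >= 0.5).astype(int)  # 0.5 threshold: 1 = yes, 0 = no
print(p_hat)
print(y_hat)  # -> [0 0 1 1]
```

Unlike a straight line, the sigmoid can never produce a value above 1 or below 0, which is exactly the problem the notes point out with fitting a line directly.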
k-Nearest Neighbors (k-NN)
• The k-NN algorithm is arguably the simplest machine learning algorithm. Building the model
consists only of storing the training dataset. To make a prediction for a new data point, the
algorithm finds the closest data points in the training dataset, its “nearest neighbors.”
• In its simplest version, the k-NN algorithm considers exactly one nearest neighbor, which is the closest training
data point to the point we want to make a prediction for. The prediction is then simply the
known output for this training point.
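The one-nearest-neighbor rule described above fits in a few lines. A minimal NumPy sketch with made-up points:

```python
import numpy as np

def one_nn_predict(X_train, y_train, x_new):
    # "Building the model" is just storing X_train and y_train;
    # prediction finds the single closest training point (Euclidean
    # distance) and returns its known label.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    return y_train[np.argmin(distances)]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1])

print(one_nn_predict(X_train, y_train, np.array([4.5, 5.2])))  # closest to (5, 5) -> 1
```

Note that no work happens at "training" time; all the computation is deferred to prediction, which is why prediction slows down on large training sets.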
• There are two important parameters to the k-neighbors classifier: the number of neighbors
and how you measure distance between data points.
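In scikit-learn these two parameters appear as `n_neighbors` and `metric` on `KNeighborsClassifier`. A minimal sketch on made-up clustered data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: two well-separated clusters
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# n_neighbors controls how many neighbors vote on the label;
# metric controls how distance is measured (Euclidean is the default)
clf = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
clf.fit(X, y)

print(clf.predict(np.array([[0.5, 0.5], [5.5, 5.5]])))  # -> [0 1]
```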
• One of the strengths of k-NN is that the model is very easy to understand, and often
gives reasonable performance without a lot of adjustments.
• Using this algorithm is a good baseline method to try before considering more advanced
techniques.
• Building the nearest neighbors model is usually very fast, but when your training set is
very large (either in number of features or in number of samples) prediction can be slow.
• When using the k-NN algorithm, it’s important to preprocess your data.
• This approach often does not perform well on datasets with many features (hundreds or
more), and it does particularly badly with datasets where most features are 0 most of the
time.
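Preprocessing matters for k-NN because it relies on distances: a feature on a large scale dominates the computation. One common fix, sketched here with made-up numbers, is to standardize each feature before fitting:

```python
import numpy as np

# Two features on very different scales (made-up data):
# e.g. income in dollars vs. age in years
X = np.array([[50_000.0, 25.0],
              [52_000.0, 60.0],
              [90_000.0, 30.0]])

# Standardize to zero mean and unit variance per feature, so no
# single feature dominates the distance computation
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```

Without this step, distances between points would be determined almost entirely by the income column, and the age column would be effectively ignored.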
End