Logistic Regression

Python Libraries: sklearn (linear_model.LogisticRegression)

When to use: Linear regression algorithms are all about finding the best-fit line through data. Logistic regression is an adaptation of the linear regression algorithm so that it can forecast problems where data is classified into groups. If you want to classify data, such as finding whether an object belongs to a category, or if you want to find the probability of an event happening, then use logistic regression. It is a linear classification model, so it finds the relationship between independent and dependent variables. Outcome probabilities are modelled using logistic functions.

Advantages: Fast to train and forecast. Good for small classification data problems. Easy to understand.

Disadvantages: Not very accurate. Don't use it for non-linear data. Not flexible enough to adapt to complex data. The model occasionally ends up overfitting.
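A minimal sketch of the classification and probability uses described above, with sklearn's LogisticRegression. The tiny dataset is invented purely for illustration:

```python
# Minimal sketch of logistic regression with scikit-learn, using a made-up
# two-class dataset: features are [hours studied, classes attended],
# labels are 1 = passed, 0 = failed.
from sklearn.linear_model import LogisticRegression

X = [[1, 2], [2, 1], [3, 4], [8, 9], [9, 8], [10, 10]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)

# predict gives the class label; predict_proba gives the modelled
# probability of each class (outcome probabilities via the logistic function).
print(model.predict([[2, 2], [9, 9]]))
print(model.predict_proba([[9, 9]])[0, 1])
```

This shows both uses named in the text: assigning a category and estimating the probability of an event.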
Nearest Neighbours

Python Libraries: sklearn (neighbors.KNeighborsClassifier)

When to use: If you have known data properties (such as products that customers have bought) and want to use your data to forecast new events (such as finding which products to recommend to a new customer based on similar products that existing customers have bought), then use nearest neighbours. It finds the sample data that is closest in distance to the target object; Euclidean distance can be used to measure the distance to new data points.

Advantages: Simple and adaptable to the problem. Accurate. Easy to understand. Spatial trees can be used to improve space issues.

Disadvantages: Memory intensive. Costly: all of the training data may be involved in decision making. Slow performance due to I/O operations. Choosing the wrong distance measure can produce inaccurate results.
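The nearest-neighbour idea above can be sketched with sklearn's KNeighborsClassifier. The product data is made up for illustration:

```python
# Minimal sketch of k-nearest-neighbours classification with scikit-learn.
# Each made-up row is [price, screen size]; the label is a product category.
from sklearn.neighbors import KNeighborsClassifier

X = [[200, 32], [250, 32], [800, 55], [900, 65], [850, 55]]
y = ["budget", "budget", "premium", "premium", "premium"]

# n_neighbors=3: classify by majority vote of the 3 closest training points,
# measured with Euclidean distance by default.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

print(knn.predict([[230, 32], [870, 60]]))
```

New points are labelled by the samples closest in distance, which is exactly the recommendation-by-similarity use case described above.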
Random Forest

Python Libraries: sklearn (ensemble.RandomForestClassifier)
When to use: If you have a large data set and forecasting is based on multiple decisions, then use random forest. With random forest, you can split the data, give it to multiple decision trees, combine the trees into a forest, and use a majority vote to find the best possible decision. An example could be finding the best-selling TV brand for next year based on different categories such as price, TVs sold last year, warranty, screen size, etc.

Random forest is a type of ensemble, which is a combination of gathering decisions (outcomes) from different algorithms. A large number of decision trees are created to form a random forest. Each decision tree forecasts a value, and then the average of the forecasted values is chosen. There are two stages: first create the trees, and second get the trees to forecast.

Each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. In addition, when splitting a node during the construction of a tree, the chosen split is no longer the best split among all features; instead, it is the best split among a random subset of the features.
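The two stages above (build the trees, then let them forecast) can be sketched with sklearn's RandomForestClassifier. The TV-sales numbers and labels are invented purely to illustrate the shape of the data:

```python
# Minimal sketch of a random forest with scikit-learn, on made-up TV data.
from sklearn.ensemble import RandomForestClassifier

# Features: [price, units sold last year, warranty years, screen size]
X = [[300, 1000, 1, 32],
     [350, 1200, 1, 32],
     [900, 5000, 3, 55],
     [950, 5200, 3, 65],
     [920, 4800, 2, 55],
     [310, 900, 1, 40]]
# Label: 1 = likely best seller next year, 0 = not (hypothetical labels).
y = [0, 0, 1, 1, 1, 0]

# n_estimators is the number of decision trees; each is trained on a
# bootstrap sample, and the forest combines their votes at prediction time.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

print(forest.predict([[930, 5100, 3, 55], [320, 950, 1, 32]]))
```

Bootstrap sampling and per-node random feature subsets both happen inside `fit`; `predict` aggregates the individual trees' decisions.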
Advantages: High accuracy. A good starting point for solving a problem. Flexible: can fit a variety of different data well. Fast to execute. Easy to use. Useful for both regression and classification problems. Can model missing values. High performing.
Disadvantages: Slow at training. Prone to overfitting. Not suitable for small samples. A small change in the training data changes the model. Occasionally too simple for very complex problems.