
Regularization and Bias/Variance
Ankit Kumar
Bias and Variance
• Bias is used to allow the Machine Learning model to learn in a simplified
manner. Ideally, the simplest model that is able to learn the entire dataset
and predict correctly on it is the best model. Hence, bias is introduced
into the model with the aim of achieving the simplest model possible.
• Parametric learning algorithms usually have high bias and hence are
faster to train and easier to understand. However, too much bias causes
the model to be oversimplified, so it underfits the data. Such
models are less flexible and often fail when they are applied to complex
problems.
• Mathematically, bias is the difference between the model’s average
(expected) prediction and the true value it is trying to predict.
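• Stated formally (a standard formulation added here for reference, with \hat{f} denoting the learned model and f the true target function):

    \mathrm{Bias}[\hat{f}(x)] = \mathbb{E}[\hat{f}(x)] - f(x)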
Variance
• Variance is the variability of the model when different
Training Data is used: a change in the training data would significantly change the
estimate of the target function. Statistically, for a given random variable, Variance
is the expectation of the squared deviation from its mean.
• In other words, the higher the variance of the model, the more
complex the model is and the more complex the functions it is able to learn.
However, if the model is too complex for the given dataset, where a
simpler solution is possible, high Variance causes the
model to overfit.
• When the model performs well on the Training Set but fails to
perform on the Testing Set, the model is said to have high Variance.
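• Formally (again a standard formulation, not from the original slides), the variance of the estimator at a point x, and the decomposition of the expected squared error into bias, variance and irreducible noise \sigma^2, are:

    \mathrm{Var}[\hat{f}(x)] = \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]

    \mathbb{E}[(y - \hat{f}(x))^2] = \mathrm{Bias}[\hat{f}(x)]^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2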
Characteristics of a biased model 
• A biased model will have the following characteristics:
• Underfitting: A model with high bias is simpler than it should be and
hence tends to underfit the data. In other words, the model fails to
learn and acquire the intricate patterns of the dataset. 
• Low Training Accuracy: A biased model will not fit the Training
Dataset properly and hence will have low training accuracy (or high
training loss). 
• Inability to solve complex problems: A biased model is too simple
and hence is often incapable of learning complex features or solving
relatively complex problems.
Characteristics of a model with Variance
• A model with high Variance will have the following characteristics:
• Overfitting: A model with high Variance will have a tendency to be overly
complex. This causes the overfitting of the model.
• Low Testing Accuracy: A model with high Variance will have very high
training accuracy (or very low training loss), but it will have low testing
accuracy (or a high testing loss).
• Overcomplicating simpler problems: A model with high variance tends to
be overly complex and ends up fitting a much more complex curve to
relatively simple data. The model is thus capable of solving complex
problems but incapable of solving simple problems efficiently.
• From the understanding of bias and variance individually thus far, it can be concluded that the two are
complementary to each other. In other words, if the bias of a model is decreased, the variance of the model
automatically increases. The vice-versa is also true: if the variance of a model decreases, the bias starts to
increase.
• Hence, it can be concluded that it is nearly impossible to have a model with no bias or no variance, since
decreasing one increases the other. This phenomenon is known as the Bias-Variance Trade-off.
Detection of Bias and Variance of a
model
• In model building, it is imperative to be able to detect
whether the model is suffering from high bias or high variance. The
methods to detect high bias and high variance are given below (a short diagnostic sketch follows the list):
1.Detection of High Bias:
1. The model suffers from a very High Training Error.
2. The Validation error is similar in magnitude to the training error.
3. The model is underfitting.
2.Detection of High Variance:
1. The model achieves a very Low Training Error.
2. The Validation error is very high when compared to the training error.
3. The model is overfitting.
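• A minimal sketch of this diagnosis in Python (added for illustration; it assumes a scikit-learn-style regressor, and the error threshold is purely illustrative):

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def diagnose(model, X, y, acceptable_error=0.1):
    # Hold out a validation split, fit, and compare training vs. validation error.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_err = mean_squared_error(y_val, model.predict(X_val))

    if train_err > acceptable_error and abs(val_err - train_err) < acceptable_error:
        return "high bias: training error is high and validation error is similar (underfitting)"
    if train_err <= acceptable_error and val_err > train_err + acceptable_error:
        return "high variance: training error is low but validation error is much higher (overfitting)"
    return "no strong bias/variance symptom detected"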
Approach to solve a Bias-Variance
Problem by Dr. Andrew Ng
Detection and Solution to High Bias problem - if the training
error is high:
1.Train longer: A high-bias model is usually a less complex model, and it may simply need more training iterations to learn the relevant patterns. Hence, longer training sometimes resolves the error.
2.Train a more complex model: As mentioned above, high bias is a result of less-than-optimal complexity in the model. Hence, to avoid high bias, the existing model can be swapped out for a more complex model.
3.Obtain more features: It is often possible that the existing dataset lacks the essential features required for effective pattern recognition. To remedy this problem:
1. More features can be collected for the existing data.
2. Feature Engineering can be performed on existing features to extract more non-linear features.
4.Decrease regularization: Regularization is a process to decrease model complexity by regularizing the inputs at different stages, promoting generalization and preventing overfitting in the process. Decreasing regularization allows the model to fit the training dataset more closely (see the sketch after this list).
5.New model architecture: If all of the above-mentioned methods fail to deliver satisfactory results, it is suggested to try out other, new model architectures.
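• A brief sketch of remedies 3 and 4 (added for illustration, using scikit-learn; the degree, the alpha values and the training-data names X_train, y_train are illustrative assumptions):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

# Underfitting baseline: a heavily regularized linear model.
simple_model = Ridge(alpha=100.0)

# Remedy: engineer non-linear features and decrease regularization.
richer_model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),  # obtain more (non-linear) features
    StandardScaler(),
    Ridge(alpha=0.1),                                  # smaller alpha = less regularization
)
# richer_model.fit(X_train, y_train)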
Detection and Solution to High Variance
problem - if the validation error is high
1.Obtain more data: High variance is often caused by a lack of training data. Model complexity and the quantity of training data need to be balanced: a model of higher complexity requires a larger quantity of training data. Hence, if the model is suffering from high variance, more data can reduce the variance.
2.Decrease number of features: If the dataset contains too many features for each data point, the model often starts to suffer from high variance and starts to overfit. Hence, decreasing the number of features is recommended.
3.Increase Regularization: As mentioned above, regularization is a process to decrease model complexity. Hence, if the model is suffering from high variance (which is caused by a complex model), an increase in regularization can decrease the complexity and help the model generalize better (see the sketch after this list).
4.New model architecture: As with a model suffering from high bias, if all of the above-mentioned methods fail to deliver satisfactory results, it is suggested to try out other, new model architectures.
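• A brief sketch of remedies 2 and 3 (added for illustration, using scikit-learn; k, alpha and the training-data names are illustrative assumptions):

from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge

regularized_model = make_pipeline(
    SelectKBest(score_func=f_regression, k=10),  # decrease the number of features
    Ridge(alpha=10.0),                           # larger alpha = stronger regularization
)
# regularized_model.fit(X_train, y_train)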
Bagging vs Boosting in Machine Learning
• As we know, Ensemble learning helps improve machine learning results by combining several models. This
approach allows the production of better predictive performance compared to a single model. The basic idea is to
learn a set of classifiers (experts) and to allow them to vote. Bagging and Boosting are two types of Ensemble
Learning. Both decrease the variance of a single estimate, as they combine several estimates from different
models, so the result may be a model with higher stability. Let’s look at the two terms at a glance (a short sketch
contrasting them follows below).
1.Bagging: It is a homogeneous weak learners’ model in which the learners are trained independently of each other in parallel and
combined to determine the model average.
2.Boosting: It is also a homogeneous weak learners’ model, but it works differently from Bagging. In this model,
learners learn sequentially and adaptively to improve the predictions of the learning algorithm.
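• A small illustration of the two styles (added here, not from the original slides), using scikit-learn with shallow decision trees as the homogeneous weak learners; all hyperparameters are illustrative:

from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

weak_learner = DecisionTreeClassifier(max_depth=2)

# Bagging: learners are trained independently (in parallel) on bootstrap samples and then vote.
bagging = BaggingClassifier(weak_learner, n_estimators=50, random_state=0)

# Boosting: learners are trained sequentially, each one focusing on the errors of the previous ones.
boosting = AdaBoostClassifier(weak_learner, n_estimators=50, random_state=0)
# bagging.fit(X_train, y_train); boosting.fit(X_train, y_train)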
Bagging
• Bootstrap Aggregating, also known as bagging, is a machine learning ensemble
meta-algorithm designed to improve the stability and accuracy of machine learning
algorithms used in statistical classification and regression. It decreases
the variance and helps to avoid overfitting. It is usually applied to decision tree
methods. Bagging is a special case of the model averaging approach.
• Bagging is an acronym for ‘Bootstrap Aggregation’ and is used to decrease the variance in the prediction model. Bagging is a parallel method
that fits the different learners independently of each other, making it possible to train them simultaneously.
• Bagging generates additional data for training from the dataset. This is achieved by random sampling with replacement from the original
dataset. Sampling with replacement may repeat some observations in each new training data set. In Bagging, every element is equally likely
to appear in a new dataset.
• These multiple datasets are used to train multiple models in parallel. For regression, the average of all the predictions from the different ensemble
models is calculated; for classification, the majority vote obtained from the voting mechanism is used. Bagging decreases the variance and
tunes the prediction to an expected outcome.
• Example of Bagging:
• The Random Forest model uses Bagging, where decision tree models with higher variance are present. It performs random feature selection to
grow its trees; several such random trees make up a Random Forest (see the sketch below).
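• To make the bootstrap-and-vote procedure above concrete, here is a minimal sketch (added for illustration); it assumes X, y and X_new are NumPy arrays, that class labels are non-negative integers, and that n_models is an arbitrary choice:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X, y, X_new, n_models=25, random_state=0):
    # Train n_models trees on bootstrap samples of (X, y) and majority-vote on X_new.
    rng = np.random.default_rng(random_state)
    votes = []
    for _ in range(n_models):
        # Sampling with replacement: some rows repeat, and every row is equally likely to appear.
        idx = rng.integers(0, len(X), size=len(X))
        tree = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
        votes.append(tree.predict(X_new))
    votes = np.asarray(votes)  # shape: (n_models, n_new_points)
    # Majority vote: the most frequent predicted class label for each new point.
    return np.array([np.bincount(col).argmax() for col in votes.T])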
