
6/4/2019 Machine Learning: Bias VS. Variance – Becoming Human: Artificial Intelligence Magazine

Machine Learning: Bias VS. Variance
Alex Guanga
Oct 11, 2018

What is BIAS?

From EliteDataScience, bias is: “Bias occurs when an
algorithm has limited flexibility to learn the true signal from
the dataset.”

Wikipedia states, “… bias is an error from erroneous
assumptions in the learning algorithm. High bias can cause an
algorithm to miss the relevant relations between features and
target outputs (underfitting).”

https://becominghuman.ai/machine-learning-bias-vs-variance-641f924e6c57

Bias describes how far our predictions are, on average, from
the true values.

A high bias means the predictions will be consistently inaccurate.


Intuitively, bias can be thought of as having a ‘bias’ towards
people. If you are highly biased, you are more likely to
make wrong assumptions about them. An oversimplified
mindset creates an unjust dynamic: you label people
according to a ‘bias.’

Forman’s article summarized this:

“Bias is the algorithm’s tendency to consistently learn the
wrong thing by not taking into account all the information in
the data (underfitting).”


Thus, parametric algorithms are prone to high bias. A
parametric algorithm is defined as follows: “A learning model that
summarizes data with a set of parameters of fixed size
(independent of the number of training examples) is called a
parametric model. No matter how much data you throw at a
parametric model, it won’t change its mind about how many
parameters it needs.”

Linear regression is an example of a parametric algorithm.
Such models are easy to understand but often not flexible enough
to learn the underlying signal in the data. Thus, they tend to be
inaccurate on complex datasets.

Examples of high-bias algorithms include Linear Regression,
Linear Discriminant Analysis, and Logistic Regression.
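
To make the "fixed set of parameters" idea concrete, here is a minimal sketch in plain Python. The curved toy signal (y = x squared plus noise), the sample sizes, and the helper name fit_line are assumptions chosen for illustration, not something from the quoted sources. It fits a straight line to small and large samples and reports the parameter count and training error.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(0)
results = {}
for n in (20, 2000):
    # Assumed toy data: a curved signal that a straight line cannot follow.
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [x ** 2 + rng.gauss(0, 0.5) for x in xs]
    params = fit_line(xs, ys)
    intercept, slope = params
    error = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / n
    results[n] = (len(params), error)
    print(f"n={n:5d}  parameters={len(params)}  training MSE={error:.2f}")
```

The line keeps exactly two parameters no matter how many examples it sees, and on this curved data its training error should stay well above the noise variance of 0.25, which is what high bias looks like in practice.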

What is VARIANCE?

From EliteDataScience, variance is: “Variance refers to an
algorithm’s sensitivity to specific sets of the training set.”


Wikipedia states, “… variance is an error from sensitivity to
small fluctuations in the training set. High variance can cause
an algorithm to model the random noise in the training data,
rather than the intended outputs (overfitting).”

Variance is the spread between the predictions of models
trained on different datasets.

Unlike in the previous analogy, we are now using
complicated models. Hence, any ‘noise’ in the dataset
might be captured by the model. High variance tends to
occur when we use complicated models that can overfit our
training sets. For example, variance can be thought of as
holding different stereotypes depending on which
demographic you happened to observe.

For example, a complicated model might treat people’s names
as a good predictor for our hypothesis. However, names are
effectively random and should not have any predictive power. In one
dataset, the name ‘Alex’ might indicate that a person is likely
to be a criminal. In another dataset, the name ‘Alex’ might
indicate that a person is likely to be a graduate. Hence,
names should not be used as a predictive variable.
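
The name example can be simulated with a short sketch. Everything here is hypothetical: the list of names, the dataset size, and the "model" that simply memorizes a rate per name are assumptions made for illustration. Names and labels are drawn independently, so any pattern the model finds is pure noise.

```python
import random

NAMES = ["Alex", "Sam", "Kim", "Lee", "Pat"]  # hypothetical names

def make_dataset(rng, n=30):
    """Names and labels are drawn independently: names carry no real signal."""
    return [(rng.choice(NAMES), rng.choice([0, 1])) for _ in range(n)]

def learn_rate_per_name(dataset):
    """An overly flexible 'model': memorize the positive rate for each name."""
    totals, positives = {}, {}
    for name, label in dataset:
        totals[name] = totals.get(name, 0) + 1
        positives[name] = positives.get(name, 0) + label
    return {name: positives[name] / totals[name] for name in totals}

rng = random.Random(1)
alex_rates = []
for i in range(10):
    model = learn_rate_per_name(make_dataset(rng))
    alex_rates.append(model.get("Alex", 0.5))  # fall back to 0.5 if Alex never appears
    print(f"dataset {i}: learned P(label=1 | name=Alex) = {alex_rates[-1]:.2f}")
```

Each small dataset tells a different story about ‘Alex’, even though the true rate is 0.5 for everyone. That disagreement between datasets is the variance.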

Forman described variance as:

“Variance is the algorithm’s tendency to learn random things
irrespective of the real signal by fitting highly flexible models
that follow the error/noise in the data too closely
(overfitting).”

What is the TRADE-OFF?


If you have a simple model, you might conclude that every
“Alex” is an amazing person. This is a High Bias and
Low Variance problem. Your model is ‘biased’ towards
people with the name Alex. Thus, most predictions will be
similar, since you believe people named ‘Alex’ act a certain
way.

You attempt to fix the model. However, the new model is too
complicated: it gives different results for different
groups. Thus, Alex can be a wonderful person, a criminal,
an athlete, and a scholar. This is a Low Bias and High
Variance problem.

You must find a balance! The good news is that with
Cross-Validation you can train on many subsets of the data
and average their scores to see how well a model generalizes.
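
As a rough sketch of how cross-validation helps find that balance, the code below scores a k-nearest-neighbours regressor for several values of k. The toy data (a noisy sine curve), the choice of k-NN, and the helper names are assumptions for illustration only. A small k gives a flexible model and a large k gives a rigid one.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def cross_val_mse(data, k_neighbors, n_folds=5):
    """Mean squared error averaged over n_folds held-out folds."""
    fold_errors = []
    for fold in range(n_folds):
        held_out = data[fold::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        errors = [(knn_predict(train, x, k_neighbors) - y) ** 2
                  for x, y in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds

rng = random.Random(0)
# Assumed toy data: noisy samples of sin(x) on [0, 6].
data = [(x, math.sin(x) + rng.gauss(0, 0.3))
        for x in (rng.uniform(0, 6) for _ in range(150))]

scores = {k: cross_val_mse(data, k) for k in (1, 5, 15, 60, 120)}
for k, score in scores.items():
    print(f"k={k:3d}  cross-validated MSE={score:.3f}")
best_k = min(scores, key=scores.get)
print("best k:", best_k)
```

With k=120 every fold uses all 120 training points, so the model predicts roughly the global average (high bias). With k=1 it copies a single noisy neighbour (high variance). The cross-validated score should favour something in between.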

Unfortunately, you cannot minimize both bias and variance
at the same time: reducing one tends to increase the other.
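
The standard way to write this down is the bias-variance decomposition of the expected squared error. The notation is an assumption added here, not taken from the article: the data follow y = f(x) + ε with noise of mean zero and variance σ², and f̂ is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simple models tend to have a large first term and complicated models a large second term. The third term cannot be reduced by any model.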

Low Bias — High Variance:


A low bias and high variance problem is overfitting. Models
trained on different datasets each capture patterns specific to
their own dataset. Hence, the models will predict differently.
However, if we average the results, we will have a fairly
accurate prediction.

High Bias — Low Variance:

The predictions will be similar to one another but, on average,
they are inaccurate. This is underfitting.
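
Both regimes can be measured with a small simulation. This is a sketch under assumed conditions: the true signal (sin(x)), the noise level, the test point, and the two toy models are all chosen for illustration. Each model is refit on many fresh training sets, and the squared bias and variance of its prediction at one point are estimated.

```python
import math
import random

def true_signal(x):
    return math.sin(x)  # assumed ground truth for this toy experiment

def sample_training_set(rng, n=30):
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return [(x, true_signal(x) + rng.gauss(0, 0.3)) for x in xs]

def mean_model(train, x):
    """High bias: ignores x and always predicts the average target."""
    return sum(y for _, y in train) / len(train)

def one_nn_model(train, x):
    """High variance: copies the target of the single closest point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def bias_and_variance(model, x0, rng, n_datasets=500):
    """Estimate squared bias and variance of model's prediction at x0."""
    preds = [model(sample_training_set(rng), x0) for _ in range(n_datasets)]
    avg = sum(preds) / n_datasets
    bias_sq = (avg - true_signal(x0)) ** 2
    variance = sum((p - avg) ** 2 for p in preds) / n_datasets
    return bias_sq, variance

rng = random.Random(0)
x0 = 1.5  # test point near the peak of sin(x)
results = {}
for name, model in [("mean model", mean_model), ("1-NN model", one_nn_model)]:
    results[name] = bias_and_variance(model, x0, rng)
    bias_sq, variance = results[name]
    print(f"{name}: bias^2={bias_sq:.3f}  variance={variance:.3f}")
```

The mean model's predictions barely move between datasets but sit far from the truth, while the 1-NN model's predictions scatter widely around a value close to the truth.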

Lessons From Andrew Ng’s Course:


If you have a HIGH VARIANCE PROBLEM:


Get more training examples, because a larger dataset is more
likely to yield better predictions.

Try smaller sets of features (because you are overfitting).

Try increasing lambda, so you do not overfit the training
set as much. The higher the lambda, the stronger the
regularization, for Linear Regression with
regularization.
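
The effect of lambda can be sketched with one-feature ridge regression, where the slope has a closed form. The toy data (true slope 2.0, eight points per dataset) and the helper names are assumptions for illustration. The code refits the slope on many small noisy datasets for several values of lambda.

```python
import random

def ridge_slope(xs, ys, lam):
    """Slope minimizing sum((y - b0 - b1*x)^2) + lam * b1^2 (intercept unpenalized)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx + lam)

def slope_stats(lam, rng, n_datasets=300, n_points=8):
    """Mean and variance of the fitted slope over many small noisy datasets."""
    slopes = []
    for _ in range(n_datasets):
        xs = [rng.uniform(-1, 1) for _ in range(n_points)]
        ys = [2.0 * x + rng.gauss(0, 1.0) for x in xs]  # assumed true slope: 2.0
        slopes.append(ridge_slope(xs, ys, lam))
    mean = sum(slopes) / n_datasets
    var = sum((s - mean) ** 2 for s in slopes) / n_datasets
    return mean, var

rng = random.Random(0)
stats = {lam: slope_stats(lam, rng) for lam in (0.0, 1.0, 10.0)}
for lam, (mean, var) in stats.items():
    print(f"lambda={lam:5.1f}  mean slope={mean:.2f}  slope variance={var:.3f}")
```

As lambda grows, the fitted slope varies less from dataset to dataset (lower variance) but is pulled further from the true value toward zero (higher bias), which is why lambda has to be tuned rather than simply maximized.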

If you have a HIGH BIAS PROBLEM:

Try getting additional features, because the model is
over-generalizing (underfitting) the data.

Try adding polynomial features to make the model more
complex.

Try decreasing lambda, so you can try to fit the data better.
The lower the lambda, the weaker the regularization,
for Linear Regression with regularization.
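
Here is a minimal sketch of the polynomial-features advice, using only plain Python. The quadratic toy signal and the helper names (solve, fit_polynomial, training_mse) are assumptions for illustration. It fits least squares with the features 1, x and then with 1, x, x squared, and compares the training error.

```python
import random

def solve(a, b):
    """Solve the linear system a w = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_polynomial(xs, ys, degree):
    """Least squares on the features 1, x, ..., x^degree via the normal equations."""
    rows = [[x ** d for d in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def training_mse(weights, xs, ys):
    preds = [sum(w * x ** d for d, w in enumerate(weights)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

rng = random.Random(0)
# Assumed toy data: a quadratic signal with noise variance 0.25.
xs = [rng.uniform(-3, 3) for _ in range(200)]
ys = [x ** 2 + rng.gauss(0, 0.5) for x in xs]

mse_by_degree = {d: training_mse(fit_polynomial(xs, ys, d), xs, ys) for d in (1, 2)}
for d, err in mse_by_degree.items():
    print(f"degree {d}: training MSE = {err:.2f}")
```

Adding the squared feature lets the same linear algorithm follow the curve, so the training error should drop to roughly the noise level.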

Reminders:
If a learning algorithm is suffering from high variance, getting
more training data is likely to help. High variance and low bias
mean overfitting. This is caused by fitting the training data too
well. With more data, the algorithm is more likely to find the
signal and not the noise.
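
A small learning-curve experiment illustrates this reminder. The setup is an assumption made for illustration: a flexible model that keeps one independent average per narrow bin of x, trained on noisy samples of sin(x). The code compares training and test error as the training set grows.

```python
import math
import random

N_BINS = 30  # many narrow bins make the model flexible, hence prone to high variance

def bin_index(x):
    return min(int(x / 6 * N_BINS), N_BINS - 1)

def make_data(rng, n):
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.5)) for x in xs]

def fit_binned_means(train):
    """One average per bin; empty bins fall back to the overall average."""
    sums, counts = [0.0] * N_BINS, [0] * N_BINS
    for x, y in train:
        sums[bin_index(x)] += y
        counts[bin_index(x)] += 1
    overall = sum(sums) / len(train)
    return [s / c if c else overall for s, c in zip(sums, counts)]

def mse(model, data):
    return sum((model[bin_index(x)] - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
test = make_data(rng, 2000)
gaps = {}
for n in (40, 400, 4000):
    train = make_data(rng, n)
    model = fit_binned_means(train)
    train_err, test_err = mse(model, train), mse(model, test)
    gaps[n] = test_err - train_err
    print(f"n={n:5d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With few examples the model memorizes noise, so the training error is low and the test error is high. As the training set grows, the gap between the two should shrink.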



