What is a distance function in machine learning?

Distance measures play an important role in machine learning. A distance measure is an
objective score that summarizes the relative difference between two objects in a problem
domain.

One unsupervised learning algorithm that uses distance measures at its core is the K-means
clustering algorithm.

What is the purpose of the distance function?

A distance function gives the distance between the elements of a set. If the distance is zero,
the elements are equivalent; otherwise, they are different from each other. A distance function
is simply the mathematical formula that a distance metric uses, and each distance metric has its
own formula.

What is the concept of distance?

In everyday usage, distance is the extent or amount of space between two things, points, or
lines; the state or fact of being apart in space, as of one thing from another. It can also mean
a linear extent of space, for example: seven miles is a distance too great to walk in an hour.

In machine learning, the two objects being compared are most commonly rows of data that
describe a subject (such as a person, car, or house) or an event (such as a purchase, a claim,
or a diagnosis).

How do we define the similarity between different observations? How can we say that two points
are similar to each other? They are similar if their features are similar. When we plot such
points, they lie closer to each other in distance.
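To make the idea of "similar features means a smaller distance" concrete, here is a minimal sketch. The observations, the feature names, and the helper `closest_to` are hypothetical and are not part of the original text. The sketch uses the straight-line (Euclidean) distance through Python's standard `math.dist`.

```python
import math

# Hypothetical observations: (height_cm, weight_kg) for three people.
observations = {
    "alice": (170.0, 65.0),
    "bob": (172.0, 68.0),
    "carol": (155.0, 50.0),
}

def closest_to(name, data):
    """Return the key of the observation nearest to `name` (illustrative helper)."""
    target = data[name]
    # Compare against every other observation and keep the smallest distance.
    others = {key: value for key, value in data.items() if key != name}
    return min(others, key=lambda key: math.dist(target, others[key]))

nearest = closest_to("alice", observations)
print(nearest)
```

In this made-up data, alice and bob have nearly the same height and weight, so the distance between them is the smallest and they are treated as the most similar pair.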

Distance metrics are a key part of several machine learning algorithms. These distance metrics
are used in both supervised and unsupervised learning, generally to calculate the similarity
between data points.

4 Types of Distance Metrics in Machine Learning

1. Euclidean Distance
2. Manhattan Distance
3. Minkowski Distance
4. Hamming Distance

1. Euclidean Distance

Euclidean Distance is the straight-line distance between two points, which is the shortest
path between them.

Many machine learning algorithms, including K-Means, use this distance metric to
measure the similarity between observations. Let's say we have two points,
A = (x_1, y_1) and B = (x_2, y_2), in a 2-dimensional plane.

So, the Euclidean Distance between these two points A and B will be:

$$d(A, B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

We use this formula when we are dealing with 2 dimensions. We can generalize this for an
n-dimensional space as:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

Where,

n = number of dimensions

p_i, q_i = the i-th coordinates of the data points p and q
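The n-dimensional Euclidean formula can be sketched in a few lines of Python. The function name `euclidean_distance` and the sample points are assumptions chosen for illustration; they do not come from the original text.

```python
import math

def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# 2-dimensional case: the differences are 3 and 4, so the distance is 5.
d_2d = euclidean_distance((1, 2), (4, 6))

# The same function handles 3 (or n) dimensions without changes.
d_3d = euclidean_distance((0, 0, 0), (1, 2, 2))

print(d_2d, d_3d)
```

Because the function loops over paired coordinates, the 2-dimensional formula is just the special case n = 2.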


2. Manhattan Distance

Manhattan Distance is the sum of the absolute differences between the coordinates of two
points across all the dimensions.

For two points A = (x_1, y_1) and B = (x_2, y_2) in a 2-dimensional space, we take the sum of
the absolute differences in both the x and y directions. So, the Manhattan distance in a
2-dimensional space is given as:

$$d(A, B) = |x_2 - x_1| + |y_2 - y_1|$$

And the generalized formula for an n-dimensional space is given as:

$$d(p, q) = \sum_{i=1}^{n} |p_i - q_i|$$

Where,

n = number of dimensions

p_i, q_i = the i-th coordinates of the data points p and q
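As a rough sketch, the Manhattan formula translates almost directly into code. The function name `manhattan_distance` and the sample points are illustrative assumptions, not part of the original text.

```python
def manhattan_distance(p, q):
    """Sum of absolute coordinate differences across all dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# 2-dimensional case: |4 - 1| + |6 - 2| = 3 + 4 = 7.
d_2d = manhattan_distance((1, 2), (4, 6))

# n-dimensional case: |1 - 0| + |-2 - 0| + |2 - 0| = 5.
d_3d = manhattan_distance((0, 0, 0), (1, -2, 2))

print(d_2d, d_3d)
```

For the same pair of 2-dimensional points, the Manhattan distance (7) is larger than the straight-line Euclidean distance (5), because movement is restricted to the axis directions.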

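3. Minkowski Distance

The Minkowski distance or Minkowski metric is a metric in a normed vector space which can be
considered as a generalization of both the Euclidean distance and the Manhattan distance.

It is named after the German mathematician Hermann Minkowski.

The formula for Minkowski Distance is given as:

$$d(p, q) = \left( \sum_{i=1}^{n} |p_i - q_i|^r \right)^{1/r}$$

Where,

n = number of dimensions

p_i, q_i = the i-th coordinates of the data points p and q

r = order of the distance (r ≥ 1); r = 1 gives the Manhattan distance and r = 2 gives the
Euclidean distance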
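The way Minkowski distance generalizes the other two metrics can be checked with a short sketch. The function name `minkowski_distance` and the order parameter name `r` are assumptions made for this example. With `r = 1` the result matches the Manhattan distance, and with `r = 2` it matches the Euclidean distance.

```python
def minkowski_distance(p, q, r):
    """Minkowski distance of order r between points p and q (r >= 1)."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    if r < 1:
        raise ValueError("order r must be at least 1")
    # Sum the r-th powers of the absolute differences, then take the r-th root.
    return sum(abs(pi - qi) ** r for pi, qi in zip(p, q)) ** (1.0 / r)

p, q = (1, 2), (4, 6)

d_order_1 = minkowski_distance(p, q, 1)  # same as Manhattan: 3 + 4
d_order_2 = minkowski_distance(p, q, 2)  # same as Euclidean: sqrt(9 + 16)

print(d_order_1, d_order_2)
```

Keeping the order as a parameter means a single function can stand in for both earlier metrics.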
