
Curse of Dimensionality

Prof. Nicholas Zabaras

Email: nzabaras@gmail.com
URL: https://www.zabaras.com/

September 3, 2020

Statistical Computing and Machine Learning, Fall 2020, N. Zabaras


Contents
 Curse of Dimensionality: Gaussian in High Dimensions
 Polynomial Regression in High-Dimensional Input
 Volume of a Sphere in High Dimensions, Area of a Sphere in D Dimensions, Hypercube in High Dimensions
 A Gaussian Distribution in High Dimensions, Maximum of the Distribution and of the Probability Mass, Summary
 The goals for today's lecture include:
 Understand the challenges of dealing with high dimensionality
 Learn how to compute and interpret volumes/areas in high dimensions
 Obtain a physical intuition of probability density and probability mass in high dimensions


• Chris Bishop's PRML book, Chapter 1
The Curse of Dimensionality
 In many applications we have to deal with spaces of high-dimensionality
comprising many input variables. Consider e.g. a classification problem.

[Figure: scatter plot of training data in a two-dimensional input space, with a test point to be classified]
 Naïve classification based on dividing the input space into cells and taking a "majority vote" over the cell in which a test point $x$ lies is not practical in high dimensions (see the sketch below).
 Bishop, C. M. and G. D. James (1993). Analysis of multiphase flows using dual-energy gamma densitometry and neural networks. Nuclear Instruments and Methods in Physics Research A327, 580–593.
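To make the cell-based scheme concrete, here is a minimal sketch of a grid majority-vote classifier (the toy data, names, and grid size are mine, purely illustrative); in high $D$ most cells contain no training points and the vote is undefined:

```python
import numpy as np

# Hypothetical toy data: N labeled points in the unit cube [0, 1]^D.
rng = np.random.default_rng(1)
D, N, M = 2, 200, 3                        # M cells per dimension -> M**D cells
X = rng.uniform(0.0, 1.0, size=(N, D))
y = (X.sum(axis=1) > D / 2).astype(int)    # synthetic binary labels

# Assign each training point to a cell of the regular M**D grid.
cells = np.minimum((X * M).astype(int), M - 1)

def predict(x_test):
    """Majority vote over the training labels in the cell containing x_test."""
    c = np.minimum((x_test * M).astype(int), M - 1)
    in_cell = (cells == c).all(axis=1)
    if not in_cell.any():                  # empty cell: the curse in action
        return None
    return int(round(float(y[in_cell].mean())))

print(predict(np.array([0.2, 0.9])))
```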
The Curse of Dimensionality
 With an exponentially large number of cells, we will need an exponentially large training data set to ensure that the cells are not empty.

 With $M$ divisions per dimension, the grid contains $M^D$ cubical regions (here $M = 3$).
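The exponential growth is immediate to see numerically (a minimal sketch; the values of $M$ and $D$ are illustrative):

```python
# Number of cells in a regular grid with M divisions per input dimension.
M = 3
for D in [1, 2, 3, 10, 100]:
    print(f"D = {D:>3}: M**D = {M**D:.3e} cells")
```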



Polynomial Regression
 Consider polynomial curve fitting with order $M = 3$; the number of terms scales as $D^3$:

$$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{i=1}^{D} w_i x_i + \sum_{i=1}^{D}\sum_{j=1}^{D} w_{ij}\, x_i x_j + \sum_{i=1}^{D}\sum_{j=1}^{D}\sum_{k=1}^{D} w_{ijk}\, x_i x_j x_k$$

 𝐷 is the dimensionality of the input space.

 How many $w$-terms do we have in the $M$th term of this general polynomial expansion? Using the symmetry of the products $x_{i_1} x_{i_2} \cdots x_{i_M}$:

$$\sum_{i_1=1}^{D}\sum_{i_2=1}^{D}\cdots\sum_{i_M=1}^{D} w_{i_1 i_2 \ldots i_M}\, x_{i_1} x_{i_2}\cdots x_{i_M} = \sum_{i_1=1}^{D}\sum_{i_2=1}^{i_1}\cdots\sum_{i_M=1}^{i_{M-1}} \bar{w}_{i_1 i_2 \ldots i_M}\, x_{i_1} x_{i_2}\cdots x_{i_M}$$



Polynomial Regression in High Dimensions
$$\sum_{i_1=1}^{D}\sum_{i_2=1}^{i_1}\cdots\sum_{i_M=1}^{i_{M-1}} \bar{w}_{i_1 i_2 \ldots i_M}\, x_{i_1} x_{i_2}\cdots x_{i_M}$$
 We can count the number of terms in this expansion as:

$$n(D, M) = \sum_{i_1=1}^{D}\sum_{i_2=1}^{i_1}\cdots\sum_{i_M=1}^{i_{M-1}} 1 = \sum_{i_1=1}^{D}\left[\,\sum_{i_2=1}^{i_1}\cdots\sum_{i_M=1}^{i_{M-1}} 1\right] = \sum_{i_1=1}^{D} n(i_1, M-1) = \sum_{i=1}^{D} n(i, M-1)$$

 By induction in $D$ (for any $M$) one can show directly that (hint: the identity holds for $D = 1$; assume it holds for $D$ and show it holds for $D + 1$):

$$\sum_{i=1}^{D} \frac{(i+M-2)!}{(i-1)!\,(M-1)!} = \frac{(D+M-1)!}{(D-1)!\,M!} \qquad (1)$$
 Using the above and induction in $M$ (for any $D$), one can show that

$$n(D, M) = \frac{(D+M-1)!}{(D-1)!\,M!}$$

 For $M = 2$, $n(D, 2) = D(D+1)/2$. Assume the result holds for $M - 1$, i.e. $n(D, M-1) = \frac{(D+M-2)!}{(D-1)!\,(M-1)!}$. Using (1), we obtain:

$$n(D, M) = \sum_{i=1}^{D} n(i, M-1) = \sum_{i=1}^{D} \frac{(i+M-2)!}{(i-1)!\,(M-1)!} = \frac{(D+M-1)!}{(D-1)!\,M!}$$
Polynomial Expansion in High Dimensions
 The total number of terms in a polynomial of order $M$ in a $D$-dimensional space is $N(D, M) = \sum_{m=0}^{M} n(D, m)$. Prove that:

$$N(D, M) = \frac{(D+M)!}{D!\,M!}$$

 Using $n(D, m) = \frac{(D+m-1)!}{(D-1)!\,m!}$ and induction in $M$ (for constant $D$):

$$N(D, M+1) = \sum_{m=0}^{M} n(D, m) + n(D, M+1) = \frac{(D+M)!}{D!\,M!} + \frac{(D+M)!}{(D-1)!\,(M+1)!} = \frac{(D+M+1)!}{D!\,(M+1)!}$$

 Consider the case $M \gg D$ and use Stirling's approximation $n! \simeq n^n e^{-n}$ for large $n$; you can show that $N$ is of order $M^D$:

$$N(D, M) = \frac{(D+M)!}{D!\,M!} \simeq \frac{(D+M)^{D+M}\, e^{-(D+M)}}{D!\, M^M\, e^{-M}} = \frac{e^{-D}\, M^{D+M}}{D!\, M^M}\left(1+\frac{D}{M}\right)^{D+M} \simeq \frac{e^{-D}\, M^D\, e^{D}}{D!} = \frac{M^D}{D!} \sim M^D$$

since $(1 + D/M)^{D+M} \to e^{D}$ for $M \gg D$.

 Similarly, for $D \gg M$, $N(D, M) \sim D^M$.

 Note that 𝑁(𝐷 = 100, 𝑀 = 3) = 176,851!
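The quoted count is easy to verify numerically (a minimal sketch using Python's `math.comb`):

```python
from math import comb

def N(D, M):
    """Total number of terms up to order M: (D + M)! / (D! M!)."""
    return comb(D + M, M)

print(N(100, 3))                                                 # 176851
print(N(100, 3) == sum(comb(100 + m - 1, m) for m in range(4)))  # True: N = sum of n(D, m)
```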


Volume of a Sphere in High-Dimensions
 Consider a sphere of radius $r = 1$ in $D$ dimensions. Its volume is $V_D(r) = K_D r^D$, where $K_D$ is a $D$-dependent constant. Let us compute the fraction of the volume of the sphere that lies between radius $r = 1 - \varepsilon$ and $r = 1$:

$$\frac{V_D(1) - V_D(1-\varepsilon)}{V_D(1)} = \frac{K_D\, 1^D - K_D (1-\varepsilon)^D}{K_D\, 1^D} = 1 - (1-\varepsilon)^D$$

 Note that for large $D$, this fraction tends to 1 even for small $\varepsilon$.

 In high dimensions, the volume of the sphere is concentrated near the surface!
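A two-line numerical check (a minimal sketch; the shell thickness is an illustrative choice) shows how quickly the shell fraction $1 - (1-\varepsilon)^D$ approaches 1:

```python
# Fraction of a unit sphere's volume lying within eps of its surface.
eps = 0.01
for D in [1, 2, 10, 100, 1000]:
    print(f"D = {D:>4}: fraction in shell = {1 - (1 - eps)**D:.5f}")
```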
Area of a Sphere in 𝐷 Dimensions
 Let us prove that the surface area of a unit sphere in $D$ dimensions is:

$$S_D = \frac{2\pi^{D/2}}{\Gamma\!\left(\frac{D}{2}\right)}$$

 This can be derived easily (transform from Cartesian to polar coordinates) by noting that:

$$\prod_{i=1}^{D}\int_{-\infty}^{\infty} e^{-x_i^2}\,dx_i = S_D \int_0^{\infty} e^{-r^2} r^{D-1}\,dr, \qquad \prod_{i=1}^{D} dx_i = S_D\, r^{D-1}\,dr$$

 From the normalization of the Gaussian, the left-hand side equals $\pi^{D/2}$; substituting $u = r^2$ and using the definition of the $\Gamma$ function, $\int_0^{\infty} e^{-u} u^{D/2-1}\,du = \Gamma\!\left(\frac{D}{2}\right)$:

$$\pi^{D/2} = S_D \int_0^{\infty} e^{-r^2} r^{D-1}\,dr = \frac{S_D}{2}\int_0^{\infty} e^{-u}\, u^{D/2-1}\,du = \frac{S_D}{2}\,\Gamma\!\left(\frac{D}{2}\right) \;\Rightarrow\; S_D = \frac{2\pi^{D/2}}{\Gamma\!\left(\frac{D}{2}\right)}$$

 The area of an arbitrary sphere of radius $r = a$ is:

$$S_D\, a^{D-1} = \frac{2\pi^{D/2}}{\Gamma\!\left(\frac{D}{2}\right)}\, a^{D-1}$$

 Verify that for $D = 3$:

$$\frac{2\pi^{3/2}}{\Gamma\!\left(\frac{3}{2}\right)}\, a^2 = \frac{2\pi^{3/2}}{\frac{1}{2}\pi^{1/2}}\, a^2 = 4\pi a^2$$
Volume of a Sphere in High-Dimensions
 The volume of a unit sphere in $D$ dimensions can be computed as:

$$V_D = S_D \int_0^1 r^{D-1}\,dr = \frac{S_D}{D} = \frac{2\pi^{D/2}}{D\,\Gamma\!\left(\frac{D}{2}\right)}$$

 For a sphere of radius $r = a$, the volume is:

$$\frac{S_D}{D}\, a^D = \frac{2\pi^{D/2}}{D\,\Gamma\!\left(\frac{D}{2}\right)}\, a^D, \qquad \frac{S_D}{D} = \text{volume of the unit-radius sphere}$$

 Indeed, we can verify that in 3D: $\frac{S_3}{3}\, a^3 = \frac{4\pi}{3}\, a^3$.
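As a numerical aside (a minimal sketch; `unit_sphere_volume` is my name), the unit-sphere volume $V_D = 2\pi^{D/2}/\left(D\,\Gamma(D/2)\right)$ itself peaks near $D = 5$ and then decays toward zero:

```python
import numpy as np
from scipy.special import gamma

def unit_sphere_volume(D):
    """V_D = 2 pi^(D/2) / (D Gamma(D/2)), the volume of the unit D-sphere."""
    return 2 * np.pi**(D / 2) / (D * gamma(D / 2))

for D in [1, 2, 3, 5, 10, 20, 50]:
    print(f"D = {D:>2}: V_D = {unit_sphere_volume(D):.4e}")
```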

 Let us calculate the ratio of the volume of a sphere of radius $a$ to the volume of the hypercube of side $2a$ (the sphere touches the hypercube at the center of each face):

$$\frac{\text{Vol. of sphere}}{\text{Vol. of cube}} = \frac{2\pi^{D/2}\, a^D}{D\,\Gamma\!\left(\frac{D}{2}\right)(2a)^D} = \frac{\pi^{D/2}}{D\, 2^{D-1}\,\Gamma\!\left(\frac{D}{2}\right)}$$

 For high dimensions, use Stirling's formula for the Gamma function:

$$\Gamma(z) \simeq \sqrt{\frac{2\pi}{z}}\left(\frac{z}{e}\right)^{z} \;\Rightarrow\; \Gamma\!\left(\frac{D}{2}\right) \simeq \sqrt{\frac{4\pi}{D}}\left(\frac{D}{2e}\right)^{D/2}$$
Volume of a Sphere in High-Dimensions
 Combining the volume ratio with Stirling's approximation of $\Gamma\!\left(\frac{D}{2}\right)$, we can derive:

$$\frac{\text{Vol. of sphere}}{\text{Vol. of cube}} = \frac{\pi^{D/2}}{D\, 2^{D-1}\,\Gamma\!\left(\frac{D}{2}\right)} \simeq \frac{\pi^{D/2}}{D\, 2^{D-1}}\,\frac{D^{1/2}}{2\,\pi^{1/2}}\left(\frac{2e}{D}\right)^{D/2} = \frac{1}{\pi^{1/2}\, D^{1/2}}\left(\frac{\pi e}{2D}\right)^{D/2} \to 0$$

 It is interesting to note that this ratio goes to zero as $D \to \infty$.

 This implies that most of the volume of the hypercube must be concentrated in its many corners, which themselves become very long spikes (note that the distance from the center of the cube to a corner, $a\sqrt{D}$, goes to infinity)!
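The collapse of this ratio is dramatic even at moderate $D$ (a minimal sketch; `scipy.special.gammaln` is used to avoid overflow in $\Gamma(D/2)$):

```python
import numpy as np
from scipy.special import gammaln

def log_ratio(D):
    """log of (inscribed-sphere volume) / (hypercube volume); the radius a cancels."""
    # log[ pi^(D/2) / (D 2^(D-1) Gamma(D/2)) ]
    return (D / 2) * np.log(np.pi) - np.log(D) - (D - 1) * np.log(2) - gammaln(D / 2)

for D in [2, 3, 10, 100]:
    print(f"D = {D:>3}: sphere/cube volume ratio = {np.exp(log_ratio(D)):.3e}")
```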



Curse of Dimensionality: Gaussian in High 𝐷
 Consider also the behavior of a Gaussian distribution in a high-dimensional space.

[Figure: probability mass $p_m(r)$ plotted against radius $r$ for increasing dimension $D$]

 The probability mass $p_m(r)\,dr$ is concentrated in a thin shell.

 Here we transformed to polar coordinates and integrated out the directional variables (the proof is given next).
Curse of Dimensionality: Gaussian in High D
 Consider a Gaussian in high $D$ of the form:

$$p(\mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{D/2}}\exp\left(-\frac{\|\mathbf{x}\|^2}{2\sigma^2}\right)$$

 Consider a shell of radius $r$ and thickness $\varepsilon$. The density is approximately constant over the shell at $\|\mathbf{x}\| = r$, and the shell has volume $S_D\, r^{D-1}\varepsilon$, where $S_D$ is the surface area of a unit sphere in $D$ dimensions. The probability mass in this shell is then:

$$p_m(r)\,\varepsilon = \int_{\text{shell}} p(\mathbf{x})\,d\mathbf{x} \approx S_D\, r^{D-1}\, p(r)\,\varepsilon \;\sim\; S_D\, r^{D-1}\, e^{-\frac{r^2}{2\sigma^2}}\,\varepsilon$$

 The maximum of $p_m(r)$ (for $D \gg 1$) takes place at:

$$(D-1)\, r^{D-2}\, e^{-\frac{r^2}{2\sigma^2}} - r^{D}\,\frac{1}{\sigma^2}\, e^{-\frac{r^2}{2\sigma^2}} = 0 \;\Rightarrow\; \hat{r} = \sqrt{D-1}\,\sigma \approx \sqrt{D}\,\sigma$$

 Similarly, the mass $p_m(\hat{r}+\varepsilon)$ is given as $p_m(\hat{r}+\varepsilon) \propto (\hat{r}+\varepsilon)^{D-1}\, e^{-\frac{(\hat{r}+\varepsilon)^2}{2\sigma^2}}$, so that

$$p_m(\hat{r}+\varepsilon) \propto \exp\left[-\frac{(\hat{r}+\varepsilon)^2}{2\sigma^2} + (D-1)\ln(\hat{r}+\varepsilon)\right] = p_m(\hat{r})\exp\left[-\frac{\varepsilon^2}{2\sigma^2} - \frac{\hat{r}\,\varepsilon}{\sigma^2} + (D-1)\left(\frac{\varepsilon}{\hat{r}} - \frac{\varepsilon^2}{2\hat{r}^2}\right)\right]$$

using $\ln(1+x) = x - x^2/2 + O(x^3)$.



Curse of Dimensionality: Gaussian in High 𝐷
 Thus, using $\hat{r} \approx \sqrt{D}\,\sigma$ and assuming high $D$, we conclude:

$$p_m(\hat{r}+\varepsilon) \propto p_m(\hat{r})\exp\left[-\frac{\varepsilon^2}{2\sigma^2} - \frac{\sqrt{D}\,\varepsilon}{\sigma} + (D-1)\left(\frac{\varepsilon}{\sqrt{D}\,\sigma} - \frac{\varepsilon^2}{2D\sigma^2}\right)\right] \approx p_m(\hat{r})\exp\left(-\frac{\varepsilon^2}{\sigma^2}\right)$$

 $p_m(r)$ decays exponentially from its maximum at $\hat{r}$ with length scale $\sigma$, where $\hat{r} \gg \sigma$; i.e., the mass is concentrated in a thin shell at a large radius.

 Let us now compare the density $p(\mathbf{x} = \mathbf{0})$ versus $p(\|\mathbf{x}\| = \hat{r})$:

$$p(\mathbf{x} = \mathbf{0}) = \frac{1}{(2\pi\sigma^2)^{D/2}}, \qquad p(\|\mathbf{x}\| = \hat{r}) = \frac{1}{(2\pi\sigma^2)^{D/2}}\exp\left(-\frac{\hat{r}^2}{2\sigma^2}\right) = \frac{1}{(2\pi\sigma^2)^{D/2}}\exp\left(-\frac{D}{2}\right)$$

$$\Rightarrow\; \frac{p(\mathbf{x} = \mathbf{0})}{p(\|\mathbf{x}\| = \hat{r})} = \exp\left(\frac{D}{2}\right)$$

 We see that the density is larger at the origin than at $\hat{r}$ by a factor of $e^{D/2}$; i.e., in high $D$, most of the probability mass is concentrated at a different radius from where the density is highest!

 This distinction between the density and the probability mass for Gaussians in high $D$ is important in Bayesian estimation.
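A short simulation (a minimal sketch with illustrative sample sizes) makes the shell effect concrete: draws from a standard Gaussian in high $D$ have norms tightly clustered around $\sqrt{D}\,\sigma$, even though the density itself peaks at the origin:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n_samples = 1.0, 10_000

for D in [1, 10, 100, 1000]:
    x = rng.normal(scale=sigma, size=(n_samples, D))
    r = np.linalg.norm(x, axis=1)  # distances of the samples from the origin
    print(f"D = {D:>4}: mean ||x|| = {r.mean():7.2f}   sqrt(D)*sigma = {np.sqrt(D) * sigma:7.2f}")
```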
Curse of Dimensionality
 Real data are often confined to a region of the space having lower effective dimensionality.

 Data usually live on a low-dimensional manifold embedded within the high-dimensional space.

 Real data typically exhibit smoothness, so that small changes in the input variables produce small changes in the target variables.

 Thus we can still exploit local interpolation-like techniques to make predictions of the target variables for new values of the input variables.

 Addressing the so-called "curse of (stochastic) dimensionality" (Bellman, 1961) is a fundamental problem in machine learning.
 Bellman, R. (1961). Adaptive Control Processes: A Guided Tour. Princeton University Press.