
Curse of Dimensionality

Prof. Nicholas Zabaras

Email: nzabaras@gmail.com
URL: https://www.zabaras.com/

September 3, 2020

Statistical Computing and Machine Learning, Fall 2020, N. Zabaras


Contents
 Curse of Dimensionality: Gaussian in High Dimensions
 Polynomial Regression in High-Dimensional Input
 Volume of a Sphere in High Dimensions, Area of a Sphere in D Dimensions, Hypercube in High Dimensions
 A Gaussian Distribution in High Dimensions, Maximum of the Distribution and of the Probability Mass, Summary
 The goals for today's lecture include:
 Understand the challenges of dealing with high dimensionality
 Learn how to compute and interpret volumes/areas in high dimensions
 Obtain a physical intuition of probability density and probability mass in high dimensions


• Chris Bishop's PRML book, Chapter 1
The Curse of Dimensionality
 In many applications we have to deal with spaces of high-dimensionality
comprising many input variables. Consider e.g. a classification problem.

[Figure: scatter plot of training data in a two-dimensional input space, with a test point to be classified]
 Naïve classification based on dividing the input space into cells and taking a "majority vote" over the cell in which a test point $x$ lies is not practical in high dimensions (see the sketch below).
 Bishop, C. M. and G. D. James (1993). Analysis of multiphase flows using dual-energy gamma densitometry and neural networks. Nuclear Instruments and Methods in Physics Research A327, 580–593.
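To make the cell-based scheme concrete, here is a minimal sketch of a grid majority-vote classifier (the toy data, names, and grid size are mine, purely illustrative); in high $D$ most cells contain no training points and the vote is undefined:

```python
import numpy as np

# Hypothetical toy data: N labeled points in the unit cube [0, 1]^D.
rng = np.random.default_rng(1)
D, N, M = 2, 200, 3                        # M cells per dimension -> M**D cells
X = rng.uniform(0.0, 1.0, size=(N, D))
y = (X.sum(axis=1) > D / 2).astype(int)    # synthetic binary labels

# Assign each training point to a cell of the regular M**D grid.
cells = np.minimum((X * M).astype(int), M - 1)

def predict(x_test):
    """Majority vote over the training labels in the cell containing x_test."""
    c = np.minimum((x_test * M).astype(int), M - 1)
    in_cell = (cells == c).all(axis=1)
    if not in_cell.any():                  # empty cell: the curse in action
        return None
    return int(round(float(y[in_cell].mean())))

print(predict(np.array([0.2, 0.9])))
```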
The Curse of Dimensionality
 With an exponentially large number of cells, we will need an exponentially large training data set to ensure that the cells are not empty.

 With $M$ divisions per dimension, the grid contains $M^D$ cubical regions (here $M = 3$).
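The exponential growth is immediate to see numerically (a minimal sketch; the values of $M$ and $D$ are illustrative):

```python
# Number of cells in a regular grid with M divisions per input dimension.
M = 3
for D in [1, 2, 3, 10, 100]:
    print(f"D = {D:>3}: M**D = {M**D:.3e} cells")
```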



Polynomial Regression
 Consider polynomial curve fitting with order $M = 3$; the number of terms scales as $D^3$:

$$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{i=1}^{D} w_i x_i + \sum_{i=1}^{D}\sum_{j=1}^{D} w_{ij}\, x_i x_j + \sum_{i=1}^{D}\sum_{j=1}^{D}\sum_{k=1}^{D} w_{ijk}\, x_i x_j x_k$$

 𝐷 is the dimensionality of the input space.

 How many $w$-terms do we have in the $M$th term of this general polynomial expansion? Using the symmetry of the products $x_{i_1} x_{i_2} \cdots x_{i_M}$:

$$\sum_{i_1=1}^{D}\sum_{i_2=1}^{D}\cdots\sum_{i_M=1}^{D} w_{i_1 i_2 \ldots i_M}\, x_{i_1} x_{i_2}\cdots x_{i_M} = \sum_{i_1=1}^{D}\sum_{i_2=1}^{i_1}\cdots\sum_{i_M=1}^{i_{M-1}} \bar{w}_{i_1 i_2 \ldots i_M}\, x_{i_1} x_{i_2}\cdots x_{i_M}$$



Polynomial Regression in High Dimensions
$$\sum_{i_1=1}^{D}\sum_{i_2=1}^{i_1}\cdots\sum_{i_M=1}^{i_{M-1}} \bar{w}_{i_1 i_2 \ldots i_M}\, x_{i_1} x_{i_2}\cdots x_{i_M}$$
 We can count the number of terms in this expansion as:

$$n(D, M) = \sum_{i_1=1}^{D}\sum_{i_2=1}^{i_1}\cdots\sum_{i_M=1}^{i_{M-1}} 1 = \sum_{i_1=1}^{D}\left[\,\sum_{i_2=1}^{i_1}\cdots\sum_{i_M=1}^{i_{M-1}} 1\right] = \sum_{i_1=1}^{D} n(i_1, M-1) = \sum_{i=1}^{D} n(i, M-1)$$

 By induction in $D$ (for any $M$) one can show directly that (hint: the identity holds for $D = 1$; assume it holds for $D$ and show it holds for $D + 1$):

$$\sum_{i=1}^{D} \frac{(i+M-2)!}{(i-1)!\,(M-1)!} = \frac{(D+M-1)!}{(D-1)!\,M!} \qquad (1)$$
 Using the above and induction in $M$ (for any $D$), one can show that

$$n(D, M) = \frac{(D+M-1)!}{(D-1)!\,M!}$$

 For $M = 2$, $n(D, 2) = D(D+1)/2$. Assume the result holds for $M - 1$, i.e. $n(D, M-1) = \frac{(D+M-2)!}{(D-1)!\,(M-1)!}$. Using (1), we obtain:

$$n(D, M) = \sum_{i=1}^{D} n(i, M-1) = \sum_{i=1}^{D} \frac{(i+M-2)!}{(i-1)!\,(M-1)!} = \frac{(D+M-1)!}{(D-1)!\,M!}$$
Polynomial Expansion in High Dimensions
 The total number of terms in a polynomial of order $M$ in a $D$-dimensional space is $N(D, M) = \sum_{m=0}^{M} n(D, m)$. Prove that:

$$N(D, M) = \frac{(D+M)!}{D!\,M!}$$

 Using $n(D, m) = \frac{(D+m-1)!}{(D-1)!\,m!}$ and induction in $M$ (for constant $D$):

$$N(D, M+1) = \sum_{m=0}^{M} n(D, m) + n(D, M+1) = \frac{(D+M)!}{D!\,M!} + \frac{(D+M)!}{(D-1)!\,(M+1)!} = \frac{(D+M+1)!}{D!\,(M+1)!}$$

 Consider the case $M \gg D$ and use Stirling's approximation $n! \simeq n^n e^{-n}$ for large $n$; you can show that $N$ is of order $M^D$:

$$N(D, M) = \frac{(D+M)!}{D!\,M!} \simeq \frac{(D+M)^{D+M}\, e^{-(D+M)}}{D!\, M^M\, e^{-M}} = \frac{e^{-D}\, M^{D+M}}{D!\, M^M}\left(1+\frac{D}{M}\right)^{D+M} \simeq \frac{e^{-D}\, M^D\, e^{D}}{D!} = \frac{M^D}{D!} \sim M^D$$

since $(1 + D/M)^{D+M} \to e^{D}$ for $M \gg D$.

 Similarly, for $D \gg M$, $N(D, M) \sim D^M$.

 Note that 𝑁(𝐷 = 100, 𝑀 = 3) = 176,851!
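The quoted count is easy to verify numerically (a minimal sketch using Python's `math.comb`):

```python
from math import comb

def N(D, M):
    """Total number of terms up to order M: (D + M)! / (D! M!)."""
    return comb(D + M, M)

print(N(100, 3))                                                 # 176851
print(N(100, 3) == sum(comb(100 + m - 1, m) for m in range(4)))  # True: N = sum of n(D, m)
```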


Volume of a Sphere in High-Dimensions
 Consider a sphere of radius $r = 1$ in $D$ dimensions. Its volume is $V_D(r) = K_D r^D$, where $K_D$ is a $D$-dependent constant. Let us compute the fraction of the volume of the sphere that lies between radius $r = 1 - \varepsilon$ and $r = 1$:

$$\frac{V_D(1) - V_D(1-\varepsilon)}{V_D(1)} = \frac{K_D\, 1^D - K_D (1-\varepsilon)^D}{K_D\, 1^D} = 1 - (1-\varepsilon)^D$$

 Note that for large $D$, this fraction tends to 1 even for small $\varepsilon$.

 In high dimensions, the volume of the sphere is concentrated near the surface!
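A two-line numerical check (a minimal sketch; the shell thickness is an illustrative choice) shows how quickly the shell fraction $1 - (1-\varepsilon)^D$ approaches 1:

```python
# Fraction of a unit sphere's volume lying within eps of its surface.
eps = 0.01
for D in [1, 2, 10, 100, 1000]:
    print(f"D = {D:>4}: fraction in shell = {1 - (1 - eps)**D:.5f}")
```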
Area of a Sphere in 𝐷 Dimensions
 Let us prove that the surface area of a unit sphere in $D$ dimensions is:

$$S_D = \frac{2\pi^{D/2}}{\Gamma\!\left(\frac{D}{2}\right)}$$

 This can be derived easily (transform from Cartesian to polar coordinates) by noting that:

$$\prod_{i=1}^{D}\int_{-\infty}^{\infty} e^{-x_i^2}\,dx_i = S_D \int_0^{\infty} e^{-r^2} r^{D-1}\,dr, \qquad \prod_{i=1}^{D} dx_i = S_D\, r^{D-1}\,dr$$

 From the normalization of the Gaussian, the left-hand side equals $\pi^{D/2}$; substituting $u = r^2$ and using the definition of the $\Gamma$ function, $\int_0^{\infty} e^{-u} u^{D/2-1}\,du = \Gamma\!\left(\frac{D}{2}\right)$:

$$\pi^{D/2} = S_D \int_0^{\infty} e^{-r^2} r^{D-1}\,dr = \frac{S_D}{2}\int_0^{\infty} e^{-u}\, u^{D/2-1}\,du = \frac{S_D}{2}\,\Gamma\!\left(\frac{D}{2}\right) \;\Rightarrow\; S_D = \frac{2\pi^{D/2}}{\Gamma\!\left(\frac{D}{2}\right)}$$

 The area of an arbitrary sphere of radius $r = a$ is:

$$S_D\, a^{D-1} = \frac{2\pi^{D/2}}{\Gamma\!\left(\frac{D}{2}\right)}\, a^{D-1}$$

 Verify that for $D = 3$:

$$\frac{2\pi^{3/2}}{\Gamma\!\left(\frac{3}{2}\right)}\, a^2 = \frac{2\pi^{3/2}}{\frac{1}{2}\pi^{1/2}}\, a^2 = 4\pi a^2$$
Volume of a Sphere in High-Dimensions
 The volume of a unit sphere in $D$ dimensions can be computed as:

$$V_D = S_D \int_0^1 r^{D-1}\,dr = \frac{S_D}{D} = \frac{2\pi^{D/2}}{D\,\Gamma\!\left(\frac{D}{2}\right)}$$

 For a sphere of radius $r = a$, the volume is:

$$\frac{S_D}{D}\, a^D = \frac{2\pi^{D/2}}{D\,\Gamma\!\left(\frac{D}{2}\right)}\, a^D, \qquad \frac{S_D}{D} = \text{volume of the unit-radius sphere}$$

 Indeed, we can verify that in 3D: $\frac{S_3}{3}\, a^3 = \frac{4\pi}{3}\, a^3$.
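As a numerical aside (a minimal sketch; `unit_sphere_volume` is my name), the unit-sphere volume $V_D = 2\pi^{D/2}/\left(D\,\Gamma(D/2)\right)$ itself peaks near $D = 5$ and then decays toward zero:

```python
import numpy as np
from scipy.special import gamma

def unit_sphere_volume(D):
    """V_D = 2 pi^(D/2) / (D Gamma(D/2)), the volume of the unit D-sphere."""
    return 2 * np.pi**(D / 2) / (D * gamma(D / 2))

for D in [1, 2, 3, 5, 10, 20, 50]:
    print(f"D = {D:>2}: V_D = {unit_sphere_volume(D):.4e}")
```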

 Let us calculate the ratio of the volume of a sphere of radius $a$ to the volume of the hypercube of side $2a$ (the sphere touches the hypercube at the center of each face):

$$\frac{\text{Vol. of sphere}}{\text{Vol. of cube}} = \frac{2\pi^{D/2}\, a^D}{D\,\Gamma\!\left(\frac{D}{2}\right)(2a)^D} = \frac{\pi^{D/2}}{D\, 2^{D-1}\,\Gamma\!\left(\frac{D}{2}\right)}$$

 For high dimensions, use Stirling's formula for the Gamma function:

$$\Gamma(z) \simeq \sqrt{\frac{2\pi}{z}}\left(\frac{z}{e}\right)^{z} \;\Rightarrow\; \Gamma\!\left(\frac{D}{2}\right) \simeq \sqrt{\frac{4\pi}{D}}\left(\frac{D}{2e}\right)^{D/2}$$
Volume of a Sphere in High-Dimensions
 Combining the volume ratio with Stirling's approximation of $\Gamma\!\left(\frac{D}{2}\right)$, we can derive:

$$\frac{\text{Vol. of sphere}}{\text{Vol. of cube}} = \frac{\pi^{D/2}}{D\, 2^{D-1}\,\Gamma\!\left(\frac{D}{2}\right)} \simeq \frac{\pi^{D/2}}{D\, 2^{D-1}}\,\frac{D^{1/2}}{2\,\pi^{1/2}}\left(\frac{2e}{D}\right)^{D/2} = \frac{1}{\pi^{1/2}\, D^{1/2}}\left(\frac{\pi e}{2D}\right)^{D/2} \to 0$$

 It is interesting to note that this ratio goes to zero as $D \to \infty$.

 This implies that most of the volume of the hypercube must be concentrated in its many corners, which themselves become very long spikes (note that the distance from the center of the cube to a corner, $a\sqrt{D}$, goes to infinity)!
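The collapse of this ratio is dramatic even at moderate $D$ (a minimal sketch; `scipy.special.gammaln` is used to avoid overflow in $\Gamma(D/2)$):

```python
import numpy as np
from scipy.special import gammaln

def log_ratio(D):
    """log of (inscribed-sphere volume) / (hypercube volume); the radius a cancels."""
    # log[ pi^(D/2) / (D 2^(D-1) Gamma(D/2)) ]
    return (D / 2) * np.log(np.pi) - np.log(D) - (D - 1) * np.log(2) - gammaln(D / 2)

for D in [2, 3, 10, 100]:
    print(f"D = {D:>3}: sphere/cube volume ratio = {np.exp(log_ratio(D)):.3e}")
```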



Curse of Dimensionality: Gaussian in High 𝐷
 Consider also the behavior of a Gaussian distribution in a high-dimensional space.

[Figure: probability mass $p_m(r)$ plotted against radius $r$ for increasing dimension $D$]

 The probability mass $p_m(r)\,dr$ is concentrated in a thin shell.

 Here we transformed to polar coordinates and integrated out the directional variables (the proof is given next).
Curse of Dimensionality: Gaussian in High D
 Consider a Gaussian in high $D$ of the form:

$$p(\mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{D/2}}\exp\left(-\frac{\|\mathbf{x}\|^2}{2\sigma^2}\right)$$

 Consider a shell of radius $r$ and thickness $\varepsilon$. The density is approximately constant over the shell at $\|\mathbf{x}\| = r$, and the shell has volume $S_D\, r^{D-1}\varepsilon$, where $S_D$ is the surface area of a unit sphere in $D$ dimensions. The probability mass in this shell is then:

$$p_m(r)\,\varepsilon = \int_{\text{shell}} p(\mathbf{x})\,d\mathbf{x} \approx S_D\, r^{D-1}\, p(r)\,\varepsilon \;\sim\; S_D\, r^{D-1}\, e^{-\frac{r^2}{2\sigma^2}}\,\varepsilon$$

 The maximum of $p_m(r)$ (for $D \gg 1$) takes place at:

$$(D-1)\, r^{D-2}\, e^{-\frac{r^2}{2\sigma^2}} - r^{D}\,\frac{1}{\sigma^2}\, e^{-\frac{r^2}{2\sigma^2}} = 0 \;\Rightarrow\; \hat{r} = \sqrt{D-1}\,\sigma \approx \sqrt{D}\,\sigma$$

 Similarly, the mass $p_m(\hat{r}+\varepsilon)$ is given as $p_m(\hat{r}+\varepsilon) \propto (\hat{r}+\varepsilon)^{D-1}\, e^{-\frac{(\hat{r}+\varepsilon)^2}{2\sigma^2}}$, so that

$$p_m(\hat{r}+\varepsilon) \propto \exp\left[-\frac{(\hat{r}+\varepsilon)^2}{2\sigma^2} + (D-1)\ln(\hat{r}+\varepsilon)\right] = p_m(\hat{r})\exp\left[-\frac{\varepsilon^2}{2\sigma^2} - \frac{\hat{r}\,\varepsilon}{\sigma^2} + (D-1)\left(\frac{\varepsilon}{\hat{r}} - \frac{\varepsilon^2}{2\hat{r}^2}\right)\right]$$

using $\ln(1+x) = x - x^2/2 + O(x^3)$.



Curse of Dimensionality: Gaussian in High 𝐷
 Thus, using $\hat{r} \approx \sqrt{D}\,\sigma$ and assuming high $D$, we conclude:

$$p_m(\hat{r}+\varepsilon) \propto p_m(\hat{r})\exp\left[-\frac{\varepsilon^2}{2\sigma^2} - \frac{\sqrt{D}\,\varepsilon}{\sigma} + (D-1)\left(\frac{\varepsilon}{\sqrt{D}\,\sigma} - \frac{\varepsilon^2}{2D\sigma^2}\right)\right] \approx p_m(\hat{r})\exp\left(-\frac{\varepsilon^2}{\sigma^2}\right)$$

 $p_m(r)$ decays exponentially from its maximum at $\hat{r}$ with length scale $\sigma$, where $\hat{r} \gg \sigma$; i.e., the mass is concentrated in a thin shell at a large radius.

 Let us now compare the density $p(\mathbf{x} = \mathbf{0})$ versus $p(\|\mathbf{x}\| = \hat{r})$:

$$p(\mathbf{x} = \mathbf{0}) = \frac{1}{(2\pi\sigma^2)^{D/2}}, \qquad p(\|\mathbf{x}\| = \hat{r}) = \frac{1}{(2\pi\sigma^2)^{D/2}}\exp\left(-\frac{\hat{r}^2}{2\sigma^2}\right) = \frac{1}{(2\pi\sigma^2)^{D/2}}\exp\left(-\frac{D}{2}\right)$$

$$\Rightarrow\; \frac{p(\mathbf{x} = \mathbf{0})}{p(\|\mathbf{x}\| = \hat{r})} = \exp\left(\frac{D}{2}\right)$$

 We see that the density is larger at the origin than at $\hat{r}$ by a factor of $e^{D/2}$; i.e., in high $D$, most of the probability mass is concentrated at a different radius from where the density is highest!

 This distinction between the density and the probability mass for Gaussians in high $D$ is important in Bayesian estimation.
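A short simulation (a minimal sketch with illustrative sample sizes) makes the shell effect concrete: draws from a standard Gaussian in high $D$ have norms tightly clustered around $\sqrt{D}\,\sigma$, even though the density itself peaks at the origin:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n_samples = 1.0, 10_000

for D in [1, 10, 100, 1000]:
    x = rng.normal(scale=sigma, size=(n_samples, D))
    r = np.linalg.norm(x, axis=1)  # distances of the samples from the origin
    print(f"D = {D:>4}: mean ||x|| = {r.mean():7.2f}   sqrt(D)*sigma = {np.sqrt(D) * sigma:7.2f}")
```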
Curse of Dimensionality
 Real data are often confined to a region of the space having lower effective dimensionality.

 Data usually live on a low-dimensional manifold embedded within the high-dimensional space.

 Real data typically exhibit smoothness, so that small changes in the input variables produce small changes in the target variables.

 Thus we can still exploit local interpolation-like techniques to make predictions of the target variables for new values of the input variables.

 Addressing the so-called "curse of (stochastic) dimensionality" (Bellman, 1961) is a fundamental problem in machine learning.
 Bellman, R. (1961). Adaptive Control Processes: A Guided Tour. Princeton University Press.