
Introduction to Probability and Statistics (Continued)
Prof. Nicholas Zabaras

Email: nzabaras@gmail.com
URL: https://www.zabaras.com/

August 18, 2020

Statistical Computing and Machine Learning, Fall 2020, N. Zabaras


Contents
 Random Variables, CDF, Mean and Variance
 Uniform Distribution, Gaussian, The binomial and Bernoulli distributions
 The multinomial and multinoulli distributions, Poisson distribution
 Empirical distribution

 The goals for today's lecture:
 Understand the definition of random variables, CDFs and PDFs
 Learn about the mean, variance and higher moments of a distribution
 Familiarize ourselves with the Uniform, Gaussian, Bernoulli, Binomial, Multinomial, Multinoulli, Poisson and Empirical distributions
 Get a first look at the concept of maximum likelihood for parameter estimation
References
• Following closely Chris Bishop's PRML book, Chapter 2
• Kevin Murphy's Machine Learning: A Probabilistic Perspective, Chapter 2
• Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press.
• Bertsekas, D. and J. Tsitsiklis (2008). Introduction to Probability, 2nd Edition. Athena Scientific.
• Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.



Random Variables
 Define Ω to be a probability space equipped with a probability measure 𝑃 that measures the probability of events 𝐸 ⊂ Ω. Ω contains all possible events in the form of its own subsets.

 A real valued random variable 𝑋 is a mapping 𝑋: Ω → ℝ.

 We call 𝑥 = 𝑋(𝜔), 𝜔 ∈ Ω, a realization of 𝑋.

 Probability distribution of 𝑋: For 𝐵 ⊂ ℝ,

$$P_X(B) = P\left(X^{-1}(B)\right) = P\left(\{\omega : X(\omega) \in B\}\right)$$

 Probability density:

$$P_X(B) = \int_B p_X(x)\,dx$$

 We often write: $p_X(x) \equiv p(x)$
Cumulative Distribution Function (CDF)
$$p(x) \ge 0, \qquad F(z) = \int_{-\infty}^{z} p(x)\,dx \quad \text{(cumulative distribution function)}$$

$$\int_{-\infty}^{\infty} p(x)\,dx = 1$$

$$P\left(x \in (a,b)\right) = \int_a^b p(x)\,dx = F(b) - F(a)$$

 The CDF for a random variable 𝑋 is the function 𝐹(𝑥) that returns the probability that 𝑋 is less than or equal to 𝑥.



Mean and Variance
 The expected (mean) value of 𝑋 is:

$$\mathbb{E}[X] = \int x\, p_X(x)\,dx$$

 The variance of 𝑋 is defined as:

$$\mathrm{var}[X] = \mathbb{E}\left[\left(X - \mathbb{E}[X]\right)^2\right] = \mathbb{E}[X^2] - \left(\mathbb{E}[X]\right)^2$$

 The standard deviation, often useful in practice, is defined as

$$\mathrm{std}[X] = \sqrt{\mathrm{var}[X]}$$
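As a quick numerical check of these definitions, here is a minimal Python/SciPy sketch (the course demos are in MATLAB; the exponential density below is an assumption chosen purely for illustration):

```python
import numpy as np
from scipy.integrate import quad

# Illustrative density (an assumption, not from the slides): Exponential(1), p(x) = exp(-x), x >= 0
p = lambda x: np.exp(-x)

mean, _ = quad(lambda x: x * p(x), 0, np.inf)              # E[X]
second_moment, _ = quad(lambda x: x**2 * p(x), 0, np.inf)  # E[X^2]
var = second_moment - mean**2                              # var[X] = E[X^2] - (E[X])^2
std = np.sqrt(var)                                         # std[X] = sqrt(var[X])

print(mean, var, std)   # 1.0, 1.0, 1.0 for Exponential(1)
```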



Mean and Variance
 If 𝑇(𝑋) is a function of 𝑋 (called a statistic), the mean of 𝑇(𝑋) is given by

$$\mathbb{E}[T(X)] = \int T(x)\, p_X(x)\,dx$$

 The variance of 𝑇(𝑋) is defined analogously:

$$\mathrm{var}[T(X)] = \mathbb{E}\left[\left(T(X) - \mathbb{E}[T(X)]\right)^2\right] = \mathbb{E}[T(X)^2] - \left(\mathbb{E}[T(X)]\right)^2$$

 The 𝑘th central moment is defined as (𝑘 = 3 relates to skewness, 𝑘 = 4 to kurtosis):

$$\mathbb{E}\left[\left(X - \mathbb{E}[X]\right)^k\right] = \int \left(x - \mathbb{E}[X]\right)^k p_X(x)\,dx$$


Expectation
 For the discrete case:

$$\mathbb{E}[f] = \sum_x p(x) f(x)$$

 For the continuous case:

$$\mathbb{E}[f] = \int p(x) f(x)\,dx$$

 Conditional expectation:

$$\mathbb{E}_x[f \mid y] = \sum_x p(x \mid y) f(x), \qquad \mathbb{E}_x[f \mid y] = \int p(x \mid y) f(x)\,dx$$
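A small sketch of the discrete and conditional expectations above, using a made-up joint table p(x, y) purely for illustration:

```python
import numpy as np

# Hypothetical joint table p(x, y): rows are x in {0, 1, 2}, columns are y in {0, 1}; entries sum to 1
p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.15],
                 [0.15, 0.10]])
x_vals = np.array([0.0, 1.0, 2.0])
f = lambda x: x**2                                   # function whose expectation we take

p_x = p_xy.sum(axis=1)                               # marginal p(x)
E_f = np.sum(p_x * f(x_vals))                        # E[f] = sum_x p(x) f(x)

p_y = p_xy.sum(axis=0)                               # marginal p(y)
p_x_given_y0 = p_xy[:, 0] / p_y[0]                   # p(x | y = 0)
E_f_given_y0 = np.sum(p_x_given_y0 * f(x_vals))      # E_x[f | y = 0] = sum_x p(x|y) f(x)

print(E_f, E_f_given_y0)
```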



Expectation
 The expectation of a random variable is not necessarily the value that we should expect a realization to have. Let Ω = [−1,1] with the probability 𝑃 being uniform:

$$P(I) = \int_I \frac{1}{2}\,dx = \frac{|I|}{2}, \qquad I \subseteq [-1,1]$$

 Consider the following two random variables:

$$X_1 : [-1,1] \to \mathbb{R}, \quad X_1(\omega) = 1, \qquad\qquad X_2 : [-1,1] \to \mathbb{R}, \quad X_2(\omega) = \begin{cases} 2 & \omega \ge 0 \\ 0 & \omega < 0 \end{cases}$$

 We can see that

$$\mathbb{E}[X_1] = \int_{-1}^{1} X_1(\omega)\,\frac{1}{2}\,d\omega = \int_{-1}^{1} 1 \cdot \frac{1}{2}\,d\omega = 1, \qquad \mathbb{E}[X_2] = \int_{-1}^{1} X_2(\omega)\,\frac{1}{2}\,d\omega = \int_{0}^{1} 2 \cdot \frac{1}{2}\,d\omega = 1,$$

although $X_2(\omega) \ne 1 \;\; \forall\, \omega$.
Uniform Random Variable
 Consider the Uniform random variable 𝒰(𝑥 | 𝑎, 𝑏):

$$\mathcal{U}(x \mid a,b) = \frac{1}{b-a}\,\mathbb{I}(a \le x \le b)$$

 What is the CDF of a uniform random variable 𝒰(𝑥 | 0,1)?

$$F_U(x) = \begin{cases} x & \text{for } 0 \le x \le 1, \\ 1 & \text{for } x > 1, \\ 0 & \text{otherwise.} \end{cases}$$

 You can show that the mean, 2nd moment and variance of 𝒰(𝑥 | 𝑎, 𝑏) are:

$$\mathbb{E}[x \mid a,b] = \frac{a+b}{2}, \qquad \mathbb{E}[x^2 \mid a,b] = \frac{a^2 + ab + b^2}{3}, \qquad \mathrm{var}[x \mid a,b] = \frac{(b-a)^2}{12}$$

 Note that it is possible for 𝑝(𝑥) > 1, but the density still needs to integrate to 1. For example,

$$\mathcal{U}(x \mid 0, 1/2) = 2\,\mathbb{I}\!\left(x \in \left[0, \tfrac{1}{2}\right]\right)$$
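A minimal NumPy/SciPy sketch checking the uniform results above numerically (the endpoints a, b are arbitrary illustrative choices):

```python
import numpy as np
from scipy.integrate import quad

a, b = 2.0, 5.0                                            # arbitrary illustrative endpoints
pdf = lambda x: 1.0 / (b - a) if a <= x <= b else 0.0      # U(x | a, b)

mean, _ = quad(lambda x: x * pdf(x), a, b)                 # (a + b)/2 = 3.5
m2, _ = quad(lambda x: x**2 * pdf(x), a, b)                # (a^2 + a*b + b^2)/3 = 13
var = m2 - mean**2                                         # (b - a)^2 / 12 = 0.75

# A density may exceed 1 and still integrate to 1: U(x | 0, 1/2) = 2 on [0, 1/2]
total, _ = quad(lambda x: 2.0, 0.0, 0.5)
print(mean, m2, var, total)                                # 3.5, 13.0, 0.75, 1.0
```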
The Gaussian Distribution
 A random variable 𝑋 ∈ ℝ is Gaussian or normally distributed, 𝑋 ~ 𝒩(𝜇, 𝜎²), if:

$$P(X \le t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) dx$$

 The probability density is:

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right)$$

 To show that this density is normalized, note the following trick:

$$\text{Let } I = \int_{-\infty}^{\infty} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) dx, \text{ then: } I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left(-\frac{1}{2\sigma^2}(y-\mu)^2\right) \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) dx\,dy$$

$$\text{Set } r^2 = (x-\mu)^2 + (y-\mu)^2, \text{ then: } I^2 = \int_0^{2\pi}\!\!\int_0^{\infty} \exp\left(-\frac{r^2}{2\sigma^2}\right) r\,dr\,d\theta = 2\pi \int_0^{\infty} \exp\left(-\frac{r^2}{2\sigma^2}\right) r\,dr$$

$$\text{Thus (with } u = r^2\text{): } I^2 = \pi \int_0^{\infty} \exp\left(-\frac{u}{2\sigma^2}\right) du = \pi \cdot 2\sigma^2 = 2\pi\sigma^2, \qquad \text{so } I = \sqrt{2\pi\sigma^2}$$
The Gaussian Distribution
 A random variable 𝑋 ∈ ℝ is Gaussian or normally distributed, 𝑋 ~ 𝒩(𝜇, 𝜎²), if:

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right)$$

 The following can be shown easily with direct integration:

$$\mathbb{E}[X] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) x\,dx = \mu,$$

$$\mathbb{E}[X^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right) x^2\,dx = \mu^2 + \sigma^2, \qquad \mathrm{var}[X] = \mathbb{E}[X^2] - \left(\mathbb{E}[X]\right)^2 = \sigma^2$$

 The following integrals are useful in these derivations:

$$\int_{-\infty}^{\infty} \exp(-u^2)\,du = \sqrt{\pi}, \qquad \int_{-\infty}^{\infty} u \exp(-u^2)\,du = 0, \qquad \int_{-\infty}^{\infty} u^2 \exp(-u^2)\,du = \frac{\sqrt{\pi}}{2}$$

 We often work with the precision of a Gaussian, 𝜆 = 1/𝜎². The higher 𝜆 is, the narrower the distribution.
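These moment identities are easy to confirm numerically; a minimal SciPy sketch with arbitrarily chosen 𝜇 and 𝜎:

```python
import numpy as np
from scipy.integrate import quad

mu, sigma = 1.3, 0.7                                       # arbitrary illustrative parameters
N = lambda x: np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

Z, _ = quad(N, -np.inf, np.inf)                            # normalization: should be 1
mean, _ = quad(lambda x: x * N(x), -np.inf, np.inf)        # should equal mu
m2, _ = quad(lambda x: x**2 * N(x), -np.inf, np.inf)       # should equal mu^2 + sigma^2
print(Z, mean, m2 - mean**2)                               # ~1.0, ~1.3, ~0.49 (= sigma^2)
```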
CDF of a Gaussian
 Plot of the standard normal 𝒩(0,1): PDF and CDF. Run the MatLab function gaussPlotDemo from Kevin Murphy's PMTK.

[Figure: PDF 𝒩(x; 0,1) and CDF Φ(x; 0,1) of the standard normal, plotted for x ∈ [−3, 3]]

$$F(x; \mu, \sigma^2) = \int_{-\infty}^{x} \mathcal{N}(z \mid \mu, \sigma^2)\,dz$$

$$F(x; \mu, \sigma^2) = \frac{1}{2}\left[1 + \mathrm{erf}\!\left(z/\sqrt{2}\right)\right], \quad z = (x-\mu)/\sigma, \qquad \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} dt$$
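The erf expression for the Gaussian CDF can be checked against a library implementation; gaussPlotDemo is the MATLAB/PMTK demo, so the following is only an analogous Python sketch (𝜇, 𝜎 are arbitrary illustrative values):

```python
import numpy as np
from scipy.special import erf
from scipy.stats import norm

mu, sigma = 2.0, 1.5                                       # arbitrary illustrative parameters
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 9)

z = (x - mu) / sigma
F_erf = 0.5 * (1.0 + erf(z / np.sqrt(2.0)))                # F(x; mu, sigma^2) = (1/2)[1 + erf(z/sqrt(2))]
F_lib = norm.cdf(x, loc=mu, scale=sigma)                   # library CDF for comparison

print(np.max(np.abs(F_erf - F_lib)))                       # ~1e-16: the two expressions agree
```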


CDF and the Standard Normal
 Assume 𝑋 ~ 𝒩(0,1). Define Φ(𝑥) = 𝑃(𝑋 < 𝑥), the cumulative distribution function of 𝑋:

$$\Phi(x) = \int_{-\infty}^{x} p(z)\,dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{z^2}{2}\right) dz$$

 Assume 𝑋 ~ 𝒩(𝜇, 𝜎²). Then

$$F(x) = P(X < x \mid \mu, \sigma^2) = \Phi\!\left(\frac{x-\mu}{\sigma}\right).$$



Univariate Gaussian

$$\mathbb{E}[X] = \mu, \qquad \mathrm{var}[X] = \sigma^2$$

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

 Shorthand: We say 𝑋 ~ 𝒩(𝜇, 𝜎²) to mean "𝑋 is distributed as a Gaussian with parameters 𝜇 and 𝜎²".



Univariate Gaussian
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

 Representation of symmetric phenomena without long tails.
 Inappropriate for skewness, fat tails, multimodality, etc.
 The popularity of the Gaussian is related to facts such as:
 Completely defined in terms of the mean and the variance
 The Central Limit Theorem shows that the sum of i.i.d. random variables has approximately a Gaussian distribution, making it an appropriate choice for modeling noise (limit of additive small effects)
 The Gaussian distribution makes the least assumptions (maximum entropy) of all distributions with given mean and variance



The Normal Model
 The normal (or Gaussian) distribution 𝑋 ~ 𝒩(𝜇, 𝜎²),

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right),$$

is one of the most studied and one of the most used distributions.

 Sample 𝑥₁, . . . , 𝑥ₙ from a 𝒩(𝜇, 𝜎²) normal distribution. In a typical inference problem, we are interested in:
 Estimation of 𝜇, 𝜎
 Confidence regions on 𝜇, 𝜎
 Test on (𝜇, 𝜎) and comparison with other samples

[Figure: standard normal density on x ∈ [−4, 4]]



Datasets: Normal Data
 Normaldata: Relative changes in reported larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties (Source: FBI).

[Figure: histogram of the normal data with a fitted Gaussian density; Matlab implementation]

From Bayesian Core, J.M. Marin and C.P. Robert, Chapter 2 (available online)
Datasets: CMBData
 CMBdata: Spectral representation of the cosmological microwave background (CMB), i.e. electromagnetic radiation from photons dating back to 300,000 years after the Big Bang, expressed as the difference in apparent temperature from the mean temperature.

 MLE-based estimates of 𝜇 and 𝜎² are used in constructing the Gaussian shown with the solid line. Maximum Likelihood Estimators (MLE) will be discussed in a follow-up lecture.

[Figure: histogram of the CMBdata with the fitted Gaussian (normal estimation); Matlab implementation]

From Bayesian Core, J.M. Marin and C.P. Robert, Chapter 2 (available online)
Quantiles
 Recall that the probability density function 𝑝_𝑋(·) of a random variable 𝑋 is defined as the derivative of the cumulative distribution function, so that

$$F(x_0) = \int_{-\infty}^{x_0} p_X(x)\,dx$$

 Note that $P_X(\mathbb{R}) = \int_{-\infty}^{\infty} p_X(x)\,dx = 1$.

 The value 𝑦(𝛼) such that 𝐹(𝑦(𝛼)) = 𝛼 is called the 𝛼-quantile of the distribution with CDF 𝐹. The median is of course the value with 𝐹(𝑦(0.5)) = 0.5.

 One can define tail area probabilities. The shaded regions each contain 𝛼/2 of the probability mass. For 𝒩(0,1), the leftmost cutoff point is Φ⁻¹(𝛼/2), where Φ is the CDF of 𝒩(0,1).

 If 𝛼 = 0.05, the central interval is 95%, the left cutoff is −1.96 and the right is 1.96. Figure generated by quantileDemo from PMTK.

 For 𝒩(𝜇, 𝜎²), the 95% interval becomes (𝜇 − 1.96𝜎, 𝜇 + 1.96𝜎), often approximated as (𝜇 ± 2𝜎).

[Figure: standard normal density with the central 1−𝛼 region between Φ⁻¹(𝛼/2) and Φ⁻¹(1−𝛼/2); each shaded tail has mass 𝛼/2]
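The ±1.96 cutoffs and the central interval come directly from the inverse CDF; quantileDemo is the MATLAB/PMTK demo, and the following is only an analogous SciPy sketch with illustrative 𝜇, 𝜎:

```python
from scipy.stats import norm

alpha = 0.05
left = norm.ppf(alpha / 2)                   # Phi^{-1}(alpha/2)   ~ -1.96
right = norm.ppf(1 - alpha / 2)              # Phi^{-1}(1-alpha/2) ~ +1.96

mu, sigma = 10.0, 2.0                        # arbitrary illustrative parameters
interval = (mu + left * sigma, mu + right * sigma)   # central 95% interval for N(mu, sigma^2)
print(left, right, interval)
```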
Binary Variables
 Consider a coin flipping experiment with heads = 1 and tails = 0. With 𝜇 ∈ [0,1],

$$p(x=1 \mid \mu) = \mu, \qquad p(x=0 \mid \mu) = 1-\mu$$

 This defines the Bernoulli distribution as follows:

$$\mathrm{Bern}(x \mid \mu) = \mu^{x} (1-\mu)^{1-x}$$

 Using the indicator function, we can also write this as:

$$\mathrm{Bern}(x \mid \mu) = \mu^{\mathbb{I}(x=1)} (1-\mu)^{\mathbb{I}(x=0)}$$



Bernoulli Distribution
 Recall that in general

$$\mathbb{E}[f] = \sum_x p(x) f(x), \qquad \mathbb{E}[f] = \int p(x) f(x)\,dx, \qquad \mathrm{var}[f] = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2$$

 For the Bernoulli distribution $\mathrm{Bern}(x \mid \mu) = \mu^x (1-\mu)^{1-x}$, we can easily show from the definitions:

$$\mathbb{E}[x] = \mu, \qquad \mathrm{var}[x] = \mu(1-\mu)$$

$$\mathrm{H}[x] = -\sum_{x \in \{0,1\}} p(x \mid \mu) \ln p(x \mid \mu) = -\mu \ln \mu - (1-\mu) \ln(1-\mu)$$

 Here H[𝑥] is the "entropy of the distribution".



Likelihood Function for Bernoulli Distribution
 Consider the i.i.d. data set 𝒟 = {𝑥₁, 𝑥₂, ..., 𝑥_𝑁} in which we have 𝑚 heads (𝑥 = 1) and 𝑁 − 𝑚 tails (𝑥 = 0).

 The likelihood function takes the form:

$$p(\mathcal{D} \mid \mu) = \prod_{n=1}^{N} p(x_n \mid \mu) = \prod_{n=1}^{N} \mu^{x_n} (1-\mu)^{1-x_n} = \mu^{m} (1-\mu)^{N-m}$$



Binomial Distribution
 Consider the discrete random variable 𝑋 ∈ {0,1,2, … , 𝑁}.

 We define the Binomial distribution as follows:

$$\mathrm{Bin}(X = m \mid N, \mu) = \binom{N}{m} \mu^{m} (1-\mu)^{N-m}$$

 In our coin flipping experiment, it gives the probability of getting 𝑚 heads in 𝑁 flips, with 𝜇 the probability of getting heads in a single flip.

[Figure: binomial distribution Bin(𝑁 = 10, 𝜇 = 0.25); Matlab Code]

 It can be shown (see S. Ross, Introduction to Probability Models) that the limit of the binomial distribution as 𝑁 → ∞ with 𝑁𝜇 → 𝜆 is the Poisson(𝜆) distribution.
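The Poisson limit quoted above is easy to see numerically by holding 𝜆 = 𝑁𝜇 fixed while 𝑁 grows; a minimal SciPy sketch:

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 3.0                                    # lambda, kept fixed as N*mu
m = np.arange(0, 15)

for N in (10, 100, 1000, 10000):
    mu = lam / N                             # so that N*mu = lambda
    gap = np.max(np.abs(binom.pmf(m, N, mu) - poisson.pmf(m, lam)))
    print(N, gap)                            # the gap shrinks as N grows
```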



Binomial Distribution
 The Binomial distribution for 𝑁 = 10 and 𝜇 ∈ {0.25, 0.9} is shown below, using the MatLab function binomDistPlot from Kevin Murphy's PMTK.

[Figure: Bin(𝑁 = 10, 𝜇) histograms for 𝜇 = 0.25 and 𝜇 = 0.9]



Mean, Variance of the Binomial Distribution
 For independent random variables, the mean of the sum is the sum of the means, and the variance of the sum is the sum of the variances.

 Because 𝑚 = 𝑥₁ + . . . + 𝑥_𝑁, and for each observation the mean and variance are known from the Bernoulli distribution:

$$\mathbb{E}[m] = \sum_{m=0}^{N} m\, \mathrm{Bin}(m \mid N, \mu) = \mathbb{E}[x_1 + ... + x_N] = N\mu$$

$$\mathrm{var}[m] = \sum_{m=0}^{N} \left(m - \mathbb{E}[m]\right)^2 \mathrm{Bin}(m \mid N, \mu) = \mathrm{var}[x_1 + ... + x_N] = N\mu(1-\mu)$$

 One can also compute $\mathbb{E}[m]$ and $\mathbb{E}[m^2]$ by differentiating (once and twice) the identity

$$\sum_{m=0}^{N} \binom{N}{m} \mu^{m} (1-\mu)^{N-m} = 1$$

with respect to 𝜇. Try it!



Binomial Distribution: Normalization
To show that the Binomial is correctly normalized, we use the following:

 With direct substitution one can show (Pascal's rule):

$$\binom{N}{n} + \binom{N}{n-1} = \binom{N+1}{n} \qquad (*)$$

 Binomial theorem:

$$(1+x)^N = \sum_{m=0}^{N} \binom{N}{m} x^m \qquad (**)$$

 This theorem is proved by induction using (*) and noticing:

$$(1+x)^{N+1} = (1+x) \sum_{m=0}^{N} \binom{N}{m} x^m = \sum_{m=0}^{N} \binom{N}{m} x^m + \sum_{m=0}^{N} \binom{N}{m} x^{m+1} = 1 + \sum_{m=1}^{N} \binom{N}{m} x^m + \sum_{m=1}^{N+1} \binom{N}{m-1} x^m$$

$$\overset{(*)}{=} 1 + \sum_{m=1}^{N} \left[\binom{N}{m} + \binom{N}{m-1}\right] x^m + x^{N+1} = \sum_{m=0}^{N+1} \binom{N+1}{m} x^m$$

 To finally show normalization using (**):

$$\sum_{m=0}^{N} \binom{N}{m} \mu^m (1-\mu)^{N-m} = (1-\mu)^N \sum_{m=0}^{N} \binom{N}{m} \left(\frac{\mu}{1-\mu}\right)^m = (1-\mu)^N \left(1 + \frac{\mu}{1-\mu}\right)^N = 1$$
Generalization of the Bernoulli Distribution
We are now looking at discrete variables that can take on one of 𝐾 possible mutually exclusive states.

 The variable is represented by a 𝐾-dimensional vector 𝒙 in which one of the elements 𝑥_𝑘 equals 1, and all remaining elements equal 0: 𝒙 = (0,0, … , 1,0, … , 0)ᵀ.

 These vectors satisfy:

$$\sum_{k=1}^{K} x_k = 1$$

 Let the probability of 𝑥_𝑘 = 1 be denoted as 𝜇_𝑘. Then

$$p(\boldsymbol{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k} = \prod_{k=1}^{K} \mu_k^{\mathbb{I}(x_k = 1)}, \qquad \sum_{k=1}^{K} \mu_k = 1, \quad \mu_k \ge 0,$$

where 𝝁 = (𝜇₁, … , 𝜇_𝐾)ᵀ.



Multinoulli/Categorical Distribution
The distribution is already normalized:

$$\sum_{\boldsymbol{x}} p(\boldsymbol{x} \mid \boldsymbol{\mu}) = \sum_{k=1}^{K} \mu_k = 1$$

 The mean of the distribution is computed as:

$$\mathbb{E}[\boldsymbol{x} \mid \boldsymbol{\mu}] = \sum_{\boldsymbol{x}} \boldsymbol{x}\, p(\boldsymbol{x} \mid \boldsymbol{\mu}) = (\mu_1, ..., \mu_K)^T$$

This is similar to the result for the Bernoulli distribution.

 The Multinoulli, also known as the Categorical distribution, is often denoted as (Mu here is the multinomial distribution):

$$\mathrm{Cat}(\boldsymbol{x} \mid \boldsymbol{\mu}) \equiv \mathrm{Multinoulli}(\boldsymbol{x} \mid \boldsymbol{\mu}) \equiv \mathrm{Mu}(\boldsymbol{x} \mid 1, \boldsymbol{\mu})$$

 The parameter 1 emphasizes that we roll a 𝐾-sided die once (𝑁 = 1) – see next for the multinomial distribution.
Likelihood: Multinoulli Distribution
 Let us consider an i.i.d. data set 𝒟 = {𝒙₁, ..., 𝒙_𝑁}. The likelihood becomes:

$$p(\mathcal{D} \mid \boldsymbol{\mu}) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mu_k^{x_{nk}} = \prod_{k=1}^{K} \mu_k^{\sum_{n=1}^{N} x_{nk}} = \prod_{k=1}^{K} \mu_k^{m_k}, \qquad \text{where } m_k = \sum_{n=1}^{N} x_{nk}$$

is the number of observations with 𝑥_𝑘 = 1.

 The 𝑚_𝑘 are the "sufficient statistics" of the distribution.



MLE Estimate: Multinoulli Distribution
To compute the maximum likelihood (MLE) estimate of 𝝁, we maximize an augmented (Lagrangian) log-likelihood:

$$\ln p(\mathcal{D} \mid \boldsymbol{\mu}) + \lambda\left(\sum_{k=1}^{K} \mu_k - 1\right) = \sum_{k=1}^{K} m_k \ln \mu_k + \lambda\left(\sum_{k=1}^{K} \mu_k - 1\right)$$

 Setting the derivative with respect to 𝜇_𝑘 equal to zero:

$$\mu_k = -\frac{m_k}{\lambda}$$

 Substitution into the constraint gives

$$\sum_{k=1}^{K} \mu_k = -\frac{\sum_{k=1}^{K} m_k}{\lambda} = 1 \;\Rightarrow\; \lambda = -\sum_{k=1}^{K} m_k = -N \;\Rightarrow\; \mu_k^{ML} = \frac{m_k}{N}$$

As expected, this is the fraction of the 𝑁 observations with 𝑥_𝑘 = 1.
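A minimal NumPy sketch of this MLE on made-up one-of-K data: the estimate is just the normalized count vector 𝑚_𝑘/𝑁:

```python
import numpy as np

# Hypothetical one-of-K (one-hot) observations: K = 3 states, N = 6 samples
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 1, 0],
              [1, 0, 0]])

m = X.sum(axis=0)                 # sufficient statistics m_k = # of observations with x_k = 1
mu_mle = m / X.shape[0]           # mu_k^ML = m_k / N
print(m, mu_mle)                  # [2 3 1]  [0.333... 0.5 0.166...]
```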
Multinomial Distribution
 We can also consider the joint distribution of 𝑚₁, … , 𝑚_𝐾 in 𝑁 observations, conditioned on the parameters 𝝁 = (𝜇₁, … , 𝜇_𝐾).

 From the expression for the likelihood given earlier,

$$p(\mathcal{D} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{m_k},$$

the multinomial distribution $\mathrm{Mu}(m_1, ..., m_K \mid N, \boldsymbol{\mu})$ with parameters 𝑁 and 𝝁 takes the form:

$$p(m_1, m_2, ..., m_K \mid N, \mu_1, \mu_2, ..., \mu_K) = \frac{N!}{m_1!\, m_2! \cdots m_K!}\, \mu_1^{m_1} \mu_2^{m_2} \cdots \mu_K^{m_K}, \qquad \text{where } \sum_{k=1}^{K} m_k = N$$
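The multinomial pmf is available in SciPy; a small sketch evaluating it and checking it against the explicit formula above (counts and probabilities are made up for illustration):

```python
import numpy as np
from math import factorial
from scipy.stats import multinomial

mu = np.array([0.2, 0.5, 0.3])               # mu_1, ..., mu_K (sums to 1); illustrative values
m = np.array([1, 3, 2])                      # counts m_1, ..., m_K with sum N = 6
N = int(m.sum())

p_scipy = multinomial.pmf(m, n=N, p=mu)
p_formula = factorial(N) / np.prod([factorial(int(k)) for k in m]) * np.prod(mu**m)
print(p_scipy, p_formula)                    # identical values (~0.135)
```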



Example: Biosequence Analysis
 Consider a set of DNA sequences where there are 10 rows (sequences, 𝑁 = 1:10) and 15 columns (locations along the genome).

[Figure: table of the 10 aligned DNA sequences over the alphabet {a, c, g, t}, 15 locations along the genome]

 Several locations are conserved by evolution (e.g., because they are part of a gene coding region), since the corresponding columns tend to be pure; e.g., column 7 is all 𝑔's.

 To visualize the data (sequence logo), we plot the letters 𝐴, 𝐶, 𝐺 and 𝑇 with a font size proportional to their empirical probability, and with the most probable letter on top.
Example: Biosequence Analysis
 The empirical probability distribution at location 𝑡 is obtained by normalizing the vector of counts (see MLE estimate):

$$\hat{\boldsymbol{\theta}}_t = \frac{1}{N}\left(\sum_{i=1}^{N} \mathbb{I}(X_{it} = 1), \; \sum_{i=1}^{N} \mathbb{I}(X_{it} = 2), \; \sum_{i=1}^{N} \mathbb{I}(X_{it} = 3), \; \sum_{i=1}^{N} \mathbb{I}(X_{it} = 4)\right)$$

 This distribution is known as a motif.

 We can also compute the most probable letter in each location; this is the consensus sequence.

[Figure: sequence logo (bits vs. sequence position 1–15)]

Use the MatLab function seqlogoDemo from Kevin Murphy's PMTK.
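seqlogoDemo is a MATLAB/PMTK demo; the counting step itself is straightforward. A minimal NumPy sketch computing the per-location empirical distribution (motif) and the consensus sequence for a few made-up short sequences:

```python
import numpy as np

# Hypothetical aligned DNA sequences (rows); all have the same length (columns = locations)
seqs = ["cgatacg",
        "caatccg",
        "caatccg",
        "cgatacg",
        "tgatacg"]
alphabet = "acgt"

counts = np.zeros((len(seqs[0]), len(alphabet)))
for s in seqs:
    for t, letter in enumerate(s):
        counts[t, alphabet.index(letter)] += 1

theta_hat = counts / len(seqs)                                    # motif: empirical distribution at each location t
consensus = "".join(alphabet[k] for k in counts.argmax(axis=1))   # most probable letter per location
print(theta_hat)
print(consensus)                                                  # "cgatacg" for this toy alignment
```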


Summary of Discrete Distributions
 The multinomial and related discrete distributions are summarized in the table below, from Kevin Murphy's textbook:

Name          𝐾    𝑛
Multinomial   –    –
Multinoulli   –    1
Binomial      1    –
Bernoulli     1    1

 𝑛 = 1 means one roll of the die; 𝑛 = – means 𝑁 rolls of the die.

 𝐾 = 1 means binary variables; 𝐾 = – means a 1-of-𝐾 encoding.



The Poisson Distribution
 We say that 𝑋 ∈ {0,1,2,3, …} has a Poisson distribution with parameter 𝜆 > 0, written 𝑋 ~ Poi(𝜆), if its pmf is

$$\mathrm{Poi}(x \mid \lambda) = e^{-\lambda} \frac{\lambda^x}{x!}$$

 This is a model for counts of rare events.

[Figure: Poisson pmfs for 𝜆 = 1 and 𝜆 = 10, 𝑥 = 0, …, 30]

Use the MatLab function poissonPlotDemo from Kevin Murphy's PMTK.


The Empirical Distribution
 Given data 𝒟 = {𝑥₁, … , 𝑥_𝑁}, 𝑥ᵢ ~ 𝑝(𝑥), define the empirical distribution as:

$$p_{emp}(A) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i}(A), \qquad \text{Dirac measure: } \delta_{x_i}(A) = \begin{cases} 1 & \text{if } x_i \in A \\ 0 & \text{if } x_i \notin A \end{cases}$$

 We can also associate weights with each sample (generalize):

$$p_{emp}(x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i}(x) \;\longrightarrow\; p_{emp}(x) = \sum_{i=1}^{N} w_i\, \delta_{x_i}(x), \qquad 0 \le w_i \le 1, \quad \sum_{i=1}^{N} w_i = 1$$

 This corresponds to a histogram with spikes at each sample point with height equal to the corresponding weight. This distribution assigns zero weight to any point not in the dataset.

 Note that the "sample mean of 𝑓(𝑥)" is the expectation of 𝑓(𝑥) under the empirical distribution:

$$\mathbb{E}_{p_{emp}(x)}[f(x)] = \int f(x)\, \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i}(x)\,dx = \frac{1}{N} \sum_{i=1}^{N} f(x_i)$$
