2nd Report
Ioannis Kouroudis
1.1
1.2
The logarithm is a monotonic function and therefore preserves the critical points. However, as can be seen from exercise 1.1, the logarithmic expression of the derivatives is substantially simpler than the corresponding original one. Furthermore, although the critical points of both functions are the same, the values in the logarithmic case are substantially larger in magnitude than the original probabilities, and are consequently less prone to cancellation and round-off errors.
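The numerical advantage can be illustrated with a short sketch (the probability value 0.05 and the sample size 1000 are illustrative assumptions, not taken from the exercise): the direct product of many small probabilities underflows to zero in double precision, while the log-likelihood remains a well-behaved finite number.

```python
import math

# 1000 i.i.d. samples, each with probability 0.05 (illustrative numbers).
probs = [0.05] * 1000

direct = 1.0
for p in probs:
    direct *= p          # 0.05**1000 is far below the smallest double: underflows to 0.0

log_lik = sum(math.log(p) for p in probs)  # stays a finite, usable number

print(direct)   # 0.0
print(log_lik)  # about -2995.7
```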
2.3
MLE = argmax_θ P(D|θ)
By using Bayes' rule and omitting the constant denominator, as it doesn't shift the critical point:

MAP = argmax_θ P(D|θ)P(θ)

Consequently, for the two estimates to coincide, we are searching for a probability distribution of θ that doesn't affect the critical points. The only such distribution is a constant one, i.e. a uniform distribution. In a sense, therefore, MLE is a special case of MAP with a uniform prior.
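This can be checked numerically. The sketch below (Bernoulli data with illustrative counts m = 7, l = 3, not from the exercise) maximises the log-likelihood with and without a uniform prior term; since log P(θ) = log 1 = 0 everywhere, the argmax is identical.

```python
import math

# Bernoulli data: m successes, l failures (illustrative numbers).
m, l = 7, 3
thetas = [i / 1000 for i in range(1, 1000)]

def log_lik(t):
    return m * math.log(t) + l * math.log(1 - t)

# MLE: maximise the likelihood alone.
mle = max(thetas, key=log_lik)

# MAP with a uniform prior P(theta) = 1: the prior adds log(1) = 0,
# so the objective, and hence the argmax, is unchanged.
mapest = max(thetas, key=lambda t: log_lik(t) + math.log(1.0))

print(mle, mapest)  # both 0.7 = m / (m + l)
```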
2.4
The posterior probability, given a binomial likelihood and a Beta(a, b) prior belief, is a Beta distribution as well, given by Beta(a+m, b+l), and its mean is (a+m)/(a+b+m+l).
The proof of the above is the following (the normalisation constants can be omitted, as we just need to determine the distribution and scale it to the 0-1 range):

MLE = m/(m+l)
The prior mean, since the prior is a Beta distribution, is given by

E(θ) = a/(a+b)

We can solve the following equations to get a function of the form E(θ|D) = λE(θ) + (1−λ)MLE:

(a/(a+b)) · p = a/(a+b+m+l)

and

(m/(m+l)) · k = m/(a+b+m+l)

which yield

p = (a+b)/(a+b+m+l)
k = (m+l)/(a+b+m+l)

Note that p + k = 1, so λ = p and 1 − λ = k.
Then, suppose both the MLE and the prior mean were smaller than the posterior mean, i.e.

m/(m+l) < (a+m)/(a+b+m+l)  and  a/(a+b) < (a+m)/(a+b+m+l)

The first inequality is equivalent to mb < al, while the second is equivalent to al < mb, which is a contradiction. Similarly, a contradiction follows for the case m/(m+l) > (a+m)/(a+b+m+l) and a/(a+b) > (a+m)/(a+b+m+l). Equality holds only when mb = al, i.e. when the prior mean coincides with the MLE; otherwise we can indeed assume strict inequality. Consequently, one of the two terms is larger and one smaller than the posterior mean, and it is proven that E(θ|D) lies between E(θ) and MLE.
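The convex-combination identity can be verified numerically. The sketch below uses illustrative values for a, b, m, l (not from the exercise) and checks both that the posterior mean equals λE(θ) + (1−λ)MLE with the weight derived above, and that it lies between the prior mean and the MLE.

```python
# Numerical check of E(theta|D) = lam * E(theta) + (1 - lam) * MLE
# for a Beta(a, b) prior with m successes and l failures (illustrative numbers).
a, b = 2.0, 5.0
m, l = 8.0, 4.0

prior_mean = a / (a + b)                     # E(theta) = a / (a + b)
mle = m / (m + l)
posterior_mean = (a + m) / (a + b + m + l)   # mean of Beta(a + m, b + l)

lam = (a + b) / (a + b + m + l)              # the weight p derived above
combo = lam * prior_mean + (1 - lam) * mle

print(posterior_mean, combo)  # both 10/19, approximately 0.5263
```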
5a
The overall likelihood is

P(D|λ) = ∏_{i=1}^{n} e^{−λ} λ^{k_i} / k_i!

f(λ) = log P(D|λ) = −nλ + Σ_{i=1}^{n} (k_i log λ − log k_i!)

The critical (in this case maximum) point of this distribution is at df/dλ = 0, which is at λ_max = (Σ_{i=1}^{n} k_i) / n
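As a sanity check, a grid search over the Poisson log-likelihood should recover the sample mean (the counts below are illustrative, chosen so the mean lands exactly on the grid):

```python
import math

# Poisson counts (illustrative data); the derivation above says the MLE is the sample mean.
ks = [3, 1, 4, 2]
n = len(ks)

def log_lik(lam):
    # f(lambda) = -n*lambda + sum(k_i * log(lambda) - log(k_i!))
    return -n * lam + sum(k * math.log(lam) - math.lgamma(k + 1) for k in ks)

# Crude grid search over lambda; fine enough to recover the analytic answer.
grid = [i / 100 for i in range(1, 2001)]
lam_hat = max(grid, key=log_lik)

print(lam_hat, sum(ks) / n)  # both 2.5
```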
5b
As seen before, the posterior distribution is given by

P(λ|D) = P(D|λ)P(λ) / P(D)

Substituting P(λ) with the Gamma prior β^α λ^{α−1} e^{−βλ} / Γ(α) and P(D|λ) with the Poisson likelihood, and since the MAP estimate is not affected by P(D), we can omit it entirely; we get

P(λ|D) ∝ e^{−nλ} ∏_{i=1}^{n} (λ^{k_i} / k_i!) · β^α λ^{α−1} e^{−βλ} ∝ λ^{Σ_{i=1}^{n} k_i + α − 1} e^{−λ(β+n)}
Taking the logarithm and setting the derivative to zero:

d/dλ [(Σ_{i=1}^{n} k_i + α − 1) log λ − λ(β+n)] = (Σ_{i=1}^{n} k_i + α − 1)/λ − (β+n) = 0

At this point, I should check the second derivative to see the nature of the critical point. The second derivative of the log-posterior is −(Σ_{i=1}^{n} k_i + α − 1)/λ² < 0 (provided Σ k_i + α > 1), which means that the calculated λ is a maximum. Consequently

λ_MAP = (Σ_{i=1}^{n} k_i + α − 1) / (n + β)
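The closed form can be checked against a grid search on the unnormalised log-posterior (the Gamma hyperparameters and counts below are illustrative assumptions, chosen so the MAP lands exactly on the grid):

```python
import math

# Check the closed-form MAP against a grid search on the log-posterior,
# using an illustrative Gamma(alpha, beta) prior and Poisson counts.
alpha, beta = 3.0, 2.0
ks = [2, 4, 1, 3]
n, S = len(ks), sum(ks)

def log_post(lam):
    # Unnormalised log-posterior: lambda^(S + alpha - 1) * exp(-lambda * (beta + n))
    return (S + alpha - 1) * math.log(lam) - lam * (beta + n)

grid = [i / 1000 for i in range(1, 10001)]
lam_grid = max(grid, key=log_post)

lam_map = (S + alpha - 1) / (n + beta)  # closed form derived above

print(lam_grid, lam_map)  # both 2.0
```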