Department of Statistics

University of Auckland
Summer Scholarship 2012-2013
Parameter estimation for selected multivariate distributions
Name: Viet Quoc Hoang
UPI: 1414002
Supervisor: Dr Thomas Yee
Degree: Not enrolled
February 27, 2013
Summer Scholarship Experience
First of all, this summer scholarship in Statistics has taken my research experience to the next level. Not only have my LaTeX typesetting skills improved tremendously, but the project has also showcased the applicability of certain courses (for example, STATS 210 and STATS 310) throughout the entire research. Technically speaking, I was able to apply the parameter estimation methods from STATS 310 in order to generalize given results further. Moreover, I am thrilled and grateful for the support of the Department of Statistics, through its workplace facilities, and for the generous help of my supervisor. From a student's point of view, I have gained a great deal from this research, which has given me guidance on many different paths, such as postgraduate study or work-related areas. Lastly, a summer scholarship in Statistics gives you a chance to explore the directions you may be heading in the future. A piece of advice for future scholarship recipients: you will not regret taking this, because it is an excellent preparation for any prospects.
Summary
In this short report, I present the essence of this summer research, which covers several types of distributions. Some of them have been fully characterised, while the rest are still under investigation. Due to time constraints, it was not possible to finish them all; however, I believe that, given more time, we would be able to complete this report. In particular, there are plentiful findings about the truncated geometric and Weibull distributions. I have also dug into some other areas, such as spatial statistics and time series. It occurs to me that the theory of nearest neighbors is strongly associated with mathematics in general and is therefore an exciting area to work on. Finally, as mentioned above, taking STATS 210 and 310 is an advantage in doing theoretical research like this, because these courses give you a background in the inferential side of Statistics (e.g., checking that the expected value of the score function is zero).
Abstract
This article discusses some theoretical and practical aspects of Statistics. We first look at the usefulness of Sweave for reproducible research and then investigate different distributions along with their distinguishing characteristics. I would like to thank Thomas Yee for his superb supervision during the summer research period. Thanks also go to my family, who gave everything possible for me to complete this paper. Without their tremendous support, this report would not have been published. Any corrections to improve the quality of this paper are welcome at vietmath@hotmail.com.
Contents
1 LaTeX and Sweave
2 Types of Distribution
2.1 Generalized Rayleigh Distribution
2.2 Half-normal Distribution
2.3 Positive truncated geometric distribution
2.4 Generalized positive truncated geometric distribution
2.5 Logit-normal Distribution
2.6 Multivariate Exponential Distribution
2.7 Nearest Neighbor and point-to-nearest object
2.8 Weibull distribution
2.9 Data sets
References
1 LaTeX and Sweave
Sweave is a tool that allows one to embed the R code for complete data analyses in LaTeX documents. The purpose is to create dynamic reports, which can be updated automatically if data or analyses change. Instead of inserting a prefabricated graph or table into the report, the master document contains the R code necessary to obtain it. When the document is run through R, all data analysis output (tables, graphs, etc.) is created on the fly and inserted into a final LaTeX document. This allows for truly reproducible research (Leisch, 2008).
To demonstrate how Sweave works, we need a file with the extension .Rnw (one can write it in an editor such as Tinn-R, which is commonly used because of its compatibility with most scientific packages). For example, once a code chunk in the .Rnw file has been processed by Sweave, the resulting LaTeX looks like this:
\begin{Schunk}
\begin{Sinput}
> x=seq(0,2,by=0.01)
> y=dlnorm(x)
> plot(x,y,type="l")
\end{Sinput}
\end{Schunk}
\setkeys{Gin}{width=0.5\textwidth} % 0.8 is the current default
\begin{figure}[hh]
\begin{center}
\includegraphics{draft-002}
\caption{
This is the lognormal distribution.
}
\label{fig:lognormal}
\end{center}
\end{figure}
When the document is compiled, the echoed code appears as follows, and the plot appears as Figure 1:
> x=seq(0,2,by=0.01)
> y=dlnorm(x)
> plot(x,y,type="l")
Referring back to Figure 1, one can modify these code lines in the .Rnw file and rerun Sweave, and the figure is regenerated accordingly, which serves the purpose of reproducible research.
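A minimal sketch of the whole workflow, assuming only a standard R installation (Sweave ships with the utils package): it writes a tiny .Rnw file holding the lognormal chunk above, runs Sweave on it in a temporary directory, and collects the generated LaTeX. The file name demo.Rnw and the chunk label lognormal are illustrative choices, not taken from the report.

```r
# Write a small .Rnw file, weave it with utils::Sweave(), and read back the LaTeX.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<lognormal, fig=TRUE, echo=TRUE>>=",
  "x <- seq(0, 2, by = 0.01)",
  "y <- dlnorm(x)",
  "plot(x, y, type = \"l\")",
  "@",
  "\\end{document}"
)

old_wd <- setwd(tempdir())                      # Sweave writes into the working directory
writeLines(rnw, "demo.Rnw")
utils::Sweave("demo.Rnw", quiet = TRUE)
tex <- readLines("demo.tex")                    # generated LaTeX source
fig_ok <- file.exists("demo-lognormal.pdf")     # figure file named <file>-<label>
setwd(old_wd)

# The chunk has become Schunk/Sinput environments plus an \includegraphics line
print(grep("Schunk|Sinput|includegraphics", tex, value = TRUE))
```

Running pdflatex on demo.tex would then typeset the report; editing the chunk and rerunning Sweave regenerates both the listing and the figure.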
2 Types of Distribution
2.1 Generalized Rayleigh Distribution
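For α > 0 and λ > 0, the two-parameter generalized Rayleigh (GR) distribution has the probability density function (Kundu and Raqab, 2005)
$$f(y;\alpha,\lambda) = 2\alpha\,\frac{y}{\lambda^{2}}\,e^{-(y/\lambda)^{2}}\left\{1 - e^{-(y/\lambda)^{2}}\right\}^{\alpha-1}, \qquad y > 0.$$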
Figure 1: This is the lognormal distribution (the density y = dlnorm(x) plotted against x for 0 ≤ x ≤ 2).
The cumulative distribution function (cdf) of Y is
$$F(y;\alpha,\lambda) = \left\{1 - e^{-(y/\lambda)^{2}}\right\}^{\alpha}, \qquad y > 0.$$
The distribution function is readily inverted to yield the quantile function
$$Q(u) = F^{-1}(u) = \sqrt{-\lambda^{2}\log\left(1 - u^{1/\alpha}\right)}, \qquad 0 < u < 1.$$
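The density, cdf and quantile function above can be checked numerically. The sketch below uses hypothetical helper names (gr_density, gr_cdf, gr_quantile, gr_random), with λ as the scale and α as the shape, and illustrative values α = 1.7, λ = 2; it is an illustration, not the implementation used in any package. Because Q(u) is explicit, random variates follow by inversion.

```r
# Generalized Rayleigh (GR) in the parameterisation above:
# alpha = shape, lambda = scale (both > 0). Helper names are hypothetical.
gr_density <- function(y, alpha, lambda) {
  e <- exp(-(y / lambda)^2)
  ifelse(y > 0, 2 * alpha * y / lambda^2 * e * (1 - e)^(alpha - 1), 0)
}
gr_cdf <- function(y, alpha, lambda) {
  ifelse(y > 0, (1 - exp(-(y / lambda)^2))^alpha, 0)
}
# Quantile function Q(u) = sqrt(-lambda^2 * log(1 - u^(1/alpha)))
gr_quantile <- function(u, alpha, lambda) sqrt(-lambda^2 * log(1 - u^(1 / alpha)))
# Inversion sampling: Y = Q(U) with U ~ Uniform(0, 1)
gr_random <- function(n, alpha, lambda) gr_quantile(runif(n), alpha, lambda)

alpha <- 1.7; lambda <- 2
int_val <- integrate(gr_density, 0, 1.5, alpha = alpha, lambda = lambda,
                     rel.tol = 1e-10)$value
u <- c(0.1, 0.5, 0.9)
set.seed(1)
ysim <- gr_random(1e5, alpha, lambda)
c(integral = int_val, cdf = gr_cdf(1.5, alpha, lambda),
  empirical_cdf = mean(ysim <= 1.5))
gr_cdf(gr_quantile(u, alpha, lambda), alpha, lambda)   # recovers u
```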
2.2 Half-normal Distribution
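The probability density function (PDF) of the half-normal distribution is given by
$$f(y;\sigma) = \frac{2}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^{2}\right\}, \qquad y > \mu,\ \sigma > 0,$$
where the location µ is taken as known.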
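A quick numerical check of the half-normal density, as a sketch with a hypothetical helper name and illustrative values µ = 1, σ = 2: the density should integrate to one over y > µ, and its mean should match the standard result E(Y) = µ + σ√(2/π).

```r
# Half-normal density with known location mu and scale sigma (hypothetical name);
# for y > mu it equals twice the N(mu, sigma^2) density.
dhalfnorm_sketch <- function(y, mu = 0, sigma = 1) {
  ifelse(y > mu, 2 / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((y - mu) / sigma)^2), 0)
}

mu <- 1; sigma <- 2
total <- integrate(dhalfnorm_sketch, mu, Inf, mu = mu, sigma = sigma,
                   rel.tol = 1e-10)$value
mean_num <- integrate(function(y) y * dhalfnorm_sketch(y, mu, sigma), mu, Inf,
                      rel.tol = 1e-10)$value
c(total = total, mean_numeric = mean_num, mean_formula = mu + sigma * sqrt(2 / pi))
```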
2.3 Positive truncated geometric distribution
Theory and Results
The truncated geometric distribution (TGD) (MacKenzie et al., 2006, p. 89) has probability function
$$P(Y = y \mid 0 < Y \le K) = \frac{p(1-p)^{y-1}}{1-(1-p)^{K}}, \qquad y = 1, \dots, K, \tag{1}$$
and 0 otherwise, where Y is the time to first detection at those locations where the species was detected during K surveys. Moreover, we also have
$$E(Y \mid 0 < Y \le K) = \frac{1}{p} - \frac{K(1-p)^{K}}{1-(1-p)^{K}}. \tag{2}$$
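Seber (1973) showed that the estimate $\hat p$ satisfies
$$\bar y = \frac{sQ^{s+1} - (s+1)Q^{s} + 1}{Q^{s+1} - Q^{s} - Q + 1},$$
where Q = 1 − p and s is the number of surveys (K above). Since the denominator equals $p(1-Q^{s})$, this equation simply sets the sample mean $\bar y$ equal to the mean (2).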
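To see Seber's equation at work, the sketch below solves ȳ = E(Y) for p with uniroot() and compares the root with a direct numerical maximisation of the log-likelihood built from (1). The detection times y and the value s = 4 are made-up illustrative data, and the helper names are hypothetical.

```r
# Positive truncated geometric on 1, ..., s (hypothetical helper names)
dtgeom_sketch <- function(y, p, s) {
  ifelse(y >= 1 & y <= s, p * (1 - p)^(y - 1) / (1 - (1 - p)^s), 0)
}
mean_tgeom <- function(p, s) 1 / p - s * (1 - p)^s / (1 - (1 - p)^s)   # (2) with K = s

# Seber's right-hand side, written in terms of Q = 1 - p
seber_rhs <- function(p, s) {
  Q <- 1 - p
  (s * Q^(s + 1) - (s + 1) * Q^s + 1) / (Q^(s + 1) - Q^s - Q + 1)
}

y <- c(1, 1, 2, 1, 3, 4, 1, 2)   # illustrative first-detection times
s <- 4
# E(Y) decreases from (s + 1)/2 towards 1 as p goes from 0 to 1, so the root is unique
p_hat <- uniroot(function(p) seber_rhs(p, s) - mean(y),
                 c(1e-3, 1 - 1e-3), tol = 1e-12)$root

# Cross-check: maximise the log-likelihood directly
loglik <- function(p) sum(log(dtgeom_sketch(y, p, s)))
p_opt <- optimize(loglik, c(1e-3, 1 - 1e-3), maximum = TRUE, tol = 1e-10)$maximum
c(p_hat = p_hat, p_opt = p_opt)
```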
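We can infer some of its properties. The mean is
$$\mu = p^{-1} - \frac{sQ^{s}}{1-Q^{s}}, \tag{3}$$
and taking the expectation of the negative of the second derivative (6) below, with (3) substituted for $E(y_i)$, gives the expected information for a single observation:
$$I(p) = -E\left(\frac{d^{2}\ell_i}{dp^{2}}\right) = p^{-2} + \frac{1}{Q^{2}}\left\{p^{-1} - \frac{sQ^{s}}{1-Q^{s}}\right\} - \frac{1}{Q^{2}} - s\left\{\frac{(s-1)Q^{s-2}}{1-Q^{s}} + \frac{sQ^{2s-2}}{(1-Q^{s})^{2}}\right\}. \tag{4}$$
Because the score (5) is linear in $y_i$ with slope $-1/Q$, (4) also equals $\sigma^{2}/Q^{2}$, where $\sigma^{2} = \operatorname{Var}(Y) = Q/p^{2} - s^{2}Q^{s}/(1-Q^{s})^{2}$ is the variance.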
We also obtain the first two derivatives of the log-likelihood $\ell_i$ of a single observation $y_i$ with respect to the parameter p:
$$\frac{d\ell_i}{dp} = \frac{1}{p} + \frac{1-y_i}{1-p} - \frac{s(1-p)^{s-1}}{1-(1-p)^{s}}, \tag{5}$$
$$\frac{d^{2}\ell_i}{dp^{2}} = -\frac{1}{p^{2}} + \frac{1-y_i}{(1-p)^{2}} + \frac{s(s-1)(1-p)^{s-2}\{1-(1-p)^{s}\} + s^{2}(1-p)^{2(s-1)}}{\{1-(1-p)^{s}\}^{2}}. \tag{6}$$
R Code
The following R code evaluates the expected information (4) on a grid of values of p, with s = 4, to check that it is positive:
> pvec <- seq(0.001, 0.999, length = 1001)
> p <- pvec
> s <- 4
> eim <- -(-1/p^2+1/(1-p)^2-1/(1-p)^2*(1/p-s*(1-p)^s/(1-(1-p)^s))+
+ (s*(s-1)*(1-p)^(s-2)*(1-(1-p)^s)+s^2*(1-p)^(2*(s-1)))/(1-(1-p)^s)^2)
> plot(eim ~ log1p(p), type = "l", col = "blue", log = "y")
Figure 2 indicates that, in this particular case (s = 4), the EIM is positive, as expected.
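As a cross-check of the expected information evaluated by the code above, the following sketch computes −E(d²ℓ_i/dp²) by brute force, summing the negative of (6) against the probabilities (1) over y = 1, ..., s, and also as Var(Y)/(1 − p)², which must agree because the score (5) is linear in y_i with slope −1/(1 − p). It also checks (5) against a central finite difference. Helper names and the grid of p values are illustrative.

```r
# Positive truncated geometric on 1, ..., s (hypothetical helper names)
pmf_tg <- function(y, p, s) p * (1 - p)^(y - 1) / (1 - (1 - p)^s)          # (1), K = s
score_tg <- function(y, p, s) {                                             # (5)
  1 / p + (1 - y) / (1 - p) - s * (1 - p)^(s - 1) / (1 - (1 - p)^s)
}
d2_tg <- function(y, p, s) {                                                # (6)
  q <- 1 - p
  -1 / p^2 + (1 - y) / q^2 +
    (s * (s - 1) * q^(s - 2) * (1 - q^s) + s^2 * q^(2 * (s - 1))) / (1 - q^s)^2
}
# Closed form evaluated by the code above
eim_closed <- function(p, s) {
  q <- 1 - p
  -(-1 / p^2 + 1 / q^2 - 1 / q^2 * (1 / p - s * q^s / (1 - q^s)) +
      (s * (s - 1) * q^(s - 2) * (1 - q^s) + s^2 * q^(2 * (s - 1))) / (1 - q^s)^2)
}
# Brute force: minus the pmf-weighted sum of (6) over y = 1, ..., s
eim_brute <- function(p, s) { y <- 1:s; -sum(pmf_tg(y, p, s) * d2_tg(y, p, s)) }
# Var(Y) / (1 - p)^2
eim_var <- function(p, s) {
  y <- 1:s; w <- pmf_tg(y, p, s)
  sum(w * (y - sum(w * y))^2) / (1 - p)^2
}

s <- 4; p <- c(0.05, 0.3, 0.7, 0.95)
cbind(p, closed = eim_closed(p, s),
      brute = sapply(p, eim_brute, s = s), via_var = sapply(p, eim_var, s = s))

h <- 1e-6   # central finite difference of log P(Y = 3) versus (5)
fd <- (log(pmf_tg(3, 0.3 + h, s)) - log(pmf_tg(3, 0.3 - h, s))) / (2 * h)
c(fd = fd, eq5 = score_tg(3, 0.3, s))
```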
2.4 Generalized positive truncated geometric distribution
We define the probability mass function of the GPTGD (generalised positive truncated geometric distribution) as follows:
$$P(Y = y \mid M \le Y \le N) = \frac{p(1-p)^{y-M}}{1-(1-p)^{N-M+1}}, \qquad y = M, M+1, \dots, N-1, N. \tag{7}$$
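Its mean is
$$E[Y] = \frac{1}{1-(1-p)^{N-M+1}}\left\{M - N(1-p)^{N-M+1} + \frac{1-p}{p} - \frac{(1-p)^{N-M+1}}{p}\right\}, \tag{8}$$
and the expectation of the second derivative (11) below, obtained by substituting (8) for $y_i$, is
$$E\left[\frac{d^{2}\ell_i}{dp^{2}}\right] = -\frac{1}{p^{2}} + \frac{M}{(1-p)^{2}} + \frac{(N-M)(N-M+1)(1-p)^{N-M-1}\{1-(1-p)^{N-M+1}\}}{\{1-(1-p)^{N-M+1}\}^{2}} - \frac{1}{(1-p)^{2}\{1-(1-p)^{N-M+1}\}}\left\{M - N(1-p)^{N-M+1} + \frac{1-p}{p} - \frac{(1-p)^{N-M+1}}{p}\right\} + \frac{(N-M+1)^{2}(1-p)^{2(N-M)}}{\{1-(1-p)^{N-M+1}\}^{2}}. \tag{9}$$
The negative of (9) is the expected information for one observation. Since $Y - M + 1$ has the positive truncated geometric distribution of Section 2.3 with $s = N - M + 1$, the variance is $\operatorname{Var}[Y] = (1-p)/p^{2} - (N-M+1)^{2}(1-p)^{N-M+1}/\{1-(1-p)^{N-M+1}\}^{2}$.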
Figure 2: EIM for the positive truncated geometric, plotted against log1p(p) on a logarithmic vertical axis. Here, s = 4.
We also obtain the first two derivatives of the log-likelihood $\ell_i$ of a single observation $y_i$ with respect to the parameter p:
$$\frac{d\ell_i}{dp} = \frac{1}{p} - \frac{y_i - M}{1-p} - \frac{(N-M+1)(1-p)^{N-M}}{1-(1-p)^{N-M+1}}, \tag{10}$$
$$\frac{d^{2}\ell_i}{dp^{2}} = -\frac{1}{p^{2}} + \frac{(N-M)(N-M+1)(1-p)^{N-M-1}\{1-(1-p)^{N-M+1}\}}{\{1-(1-p)^{N-M+1}\}^{2}} + \frac{(N-M+1)^{2}(1-p)^{2(N-M)}}{\{1-(1-p)^{N-M+1}\}^{2}} - \frac{y_i - M}{(1-p)^{2}}. \tag{11}$$
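The sketch below checks the GPTGD formulas numerically for illustrative values p = 0.35, M = 3, N = 9: the probabilities (7) sum to one, (8) agrees with the mean obtained by direct summation, the right-hand side of (9) agrees with the expectation of (11) computed term by term, and (10) agrees with a finite-difference derivative of log P(Y = y). It also compares the direct variance with (1 − p)/p² − K²(1 − p)^K/{1 − (1 − p)^K}², K = N − M + 1, the variance of the shifted positive truncated geometric. Helper names are hypothetical.

```r
# GPTGD on M, ..., N (hypothetical helper names); K = N - M + 1
dgptg <- function(y, p, M, N) {
  K <- N - M + 1
  ifelse(y >= M & y <= N, p * (1 - p)^(y - M) / (1 - (1 - p)^K), 0)
}
mean_gptg <- function(p, M, N) {                       # (8)
  q <- 1 - p; qK <- q^(N - M + 1)
  (M - N * qK + q / p - qK / p) / (1 - qK)
}
score_gptg <- function(y, p, M, N) {                   # (10)
  q <- 1 - p; K <- N - M + 1
  1 / p - (y - M) / q - K * q^(K - 1) / (1 - q^K)
}
d2_gptg <- function(y, p, M, N) {                      # (11)
  q <- 1 - p; K <- N - M + 1
  -1 / p^2 + (K - 1) * K * q^(K - 2) * (1 - q^K) / (1 - q^K)^2 +
    K^2 * q^(2 * (K - 1)) / (1 - q^K)^2 - (y - M) / q^2
}
rhs9_gptg <- function(p, M, N) {                       # right-hand side of (9)
  q <- 1 - p; K <- N - M + 1; qK <- q^K
  -1 / p^2 + M / q^2 + (N - M) * K * q^(N - M - 1) * (1 - qK) / (1 - qK)^2 -
    (M - N * qK + q / p - qK / p) / (q^2 * (1 - qK)) +
    K^2 * q^(2 * (N - M)) / (1 - qK)^2
}
var_gptg <- function(p, M, N) {                        # variance of the shifted TGD
  q <- 1 - p; K <- N - M + 1
  q / p^2 - K^2 * q^K / (1 - q^K)^2
}

p <- 0.35; M <- 3; N <- 9
y <- M:N; w <- dgptg(y, p, M, N)
h <- 1e-6
fd_score <- (log(dgptg(5, p + h, M, N)) - log(dgptg(5, p - h, M, N))) / (2 * h)
c(sum_pmf = sum(w),
  mean_sum = sum(w * y), mean_eq8 = mean_gptg(p, M, N),
  Ed2_sum = sum(w * d2_gptg(y, p, M, N)), rhs_eq9 = rhs9_gptg(p, M, N),
  var_sum = sum(w * (y - sum(w * y))^2), var_formula = var_gptg(p, M, N),
  score_fd = fd_score, score_eq10 = score_gptg(5, p, M, N))
```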
2.5 Logit-normal Distribution
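The logit-normal distribution for y with support 0 < y < 1 is
$$[y \mid \mu, \sigma^{2}] = \operatorname{logit}N(\mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left\{-\frac{1}{2\sigma^{2}}[\operatorname{logit}(y) - \mu]^{2}\right\}\frac{1}{y(1-y)}.$$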
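Since logit(y) is qlogis(y) in R, the logit-normal density is dnorm(qlogis(y), µ, σ)/{y(1 − y)}, and draws can be generated as plogis() of normal draws. The sketch below, with a hypothetical helper name and illustrative µ = 0.3, σ = 0.8, checks that the formula above matches this form and integrates to one.

```r
# Logit-normal density from the formula above (hypothetical helper name);
# qlogis() is the logit function in R and plogis() its inverse.
dlogitnorm_sketch <- function(y, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(qlogis(y) - mu)^2 / (2 * sigma^2)) / (y * (1 - y))
}

mu <- 0.3; sigma <- 0.8
yy <- c(0.1, 0.5, 0.9)
direct <- dlogitnorm_sketch(yy, mu, sigma)
via_dnorm <- dnorm(qlogis(yy), mean = mu, sd = sigma) / (yy * (1 - yy))
total <- integrate(dlogitnorm_sketch, 0, 1, mu = mu, sigma = sigma,
                   rel.tol = 1e-10)$value

# Sampling: if X ~ N(mu, sigma^2) then plogis(X) is logit-normal
set.seed(42)
ysim <- plogis(rnorm(1e5, mean = mu, sd = sigma))
c(total = total, empirical = mean(ysim <= 0.5), exact = pnorm(0, mean = mu, sd = sigma))
```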
2.6 Multivariate Exponential Distribution
Let $S_n$ denote the set of vectors $(s_1, s_2, \dots, s_n)^T$ where each $s_i = 0$ or $1$ but $(s_1, s_2, \dots, s_n) \neq (0, \dots, 0)$. Let $\{Z_s : s \in S_n\}$ be a set of independent random variables such that each $Z_s$ has an exponential distribution with mean $1/\lambda_s$. Define $X_i = \min\{Z_s : s_i = 1\}$, $i = 1, 2, \dots, n$. Then $X = (X_1, X_2, \dots, X_n)$ has a multivariate exponential distribution with parameters $\{\lambda_s : s \in S_n\}$.
We shall write $X \sim MVE(n, \lambda)$ as an abbreviation of the statement: X is an n-dimensional random variable with a multivariate exponential distribution with parameters $\{\lambda_s : s \in S_n\}$. Denoting the bivariate (n = 2) exponential distribution function by F, straightforward computation (Arnold, 1968) yields the following elements of the expected information matrix (EIM):
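$$E\left\{-\frac{\partial^{2}}{\partial\lambda_{10}^{2}}\log dF\right\} = \frac{\lambda_{01}}{\lambda(\lambda-\lambda_{01})^{2}} + \frac{1}{\lambda\lambda_{10}}, \tag{12}$$
$$E\left\{-\frac{\partial^{2}}{\partial\lambda_{01}^{2}}\log dF\right\} = \frac{\lambda_{10}}{\lambda(\lambda-\lambda_{10})^{2}} + \frac{1}{\lambda\lambda_{01}}, \tag{13}$$
$$E\left\{-\frac{\partial^{2}}{\partial\lambda_{11}^{2}}\log dF\right\} = \frac{\lambda_{01}}{\lambda(\lambda-\lambda_{01})^{2}} + \frac{\lambda_{10}}{\lambda(\lambda-\lambda_{10})^{2}} + \frac{1}{\lambda\lambda_{11}}, \tag{14}$$
$$E\left\{-\frac{\partial^{2}}{\partial\lambda_{10}\,\partial\lambda_{01}}\log dF\right\} = 0, \tag{15}$$
$$E\left\{-\frac{\partial^{2}}{\partial\lambda_{10}\,\partial\lambda_{11}}\log dF\right\} = \frac{\lambda_{01}}{\lambda(\lambda-\lambda_{01})^{2}}, \tag{16}$$
$$E\left\{-\frac{\partial^{2}}{\partial\lambda_{01}\,\partial\lambda_{11}}\log dF\right\} = \frac{\lambda_{10}}{\lambda(\lambda-\lambda_{10})^{2}}, \tag{17}$$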
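where $\lambda = \lambda_{01} + \lambda_{10} + \lambda_{11}$. Consequently, for N observations and $\Psi = (\lambda_{10}, \lambda_{01}, \lambda_{11})^T$, the information matrix has the form
$$I_N(\Psi) = \frac{N}{\lambda}\begin{pmatrix} a + c & 0 & a\\ 0 & b + d & b\\ a & b & a + b + e \end{pmatrix},$$
where $a = \lambda_{01}(\lambda - \lambda_{01})^{-2}$, $b = \lambda_{10}(\lambda - \lambda_{10})^{-2}$, $c = \lambda_{10}^{-1}$, $d = \lambda_{01}^{-1}$ and $e = \lambda_{11}^{-1}$; the bottom-right entry follows from (14).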
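For n = 2 the construction can be simulated directly. The sketch below (hypothetical helper names; rates λ_10 = 1, λ_01 = 2, λ_11 = 0.5 chosen for illustration) draws the three independent exponentials, forms X_1 = min(Z_10, Z_11) and X_2 = min(Z_01, Z_11), and checks that X_1 = X_2 with probability λ_11/λ, the chance that Z_11 is the smallest. It also assembles I_N(Ψ) with Ψ = (λ_10, λ_01, λ_11) and e = 1/λ_11 in the bottom-right entry, matching (14).

```r
# Bivariate case of the min-of-exponentials construction (hypothetical names)
rmve2 <- function(nsim, lambda10, lambda01, lambda11) {
  z10 <- rexp(nsim, rate = lambda10)
  z01 <- rexp(nsim, rate = lambda01)
  z11 <- rexp(nsim, rate = lambda11)
  cbind(x1 = pmin(z10, z11),   # X_1 = min{Z_s : s_1 = 1}
        x2 = pmin(z01, z11))   # X_2 = min{Z_s : s_2 = 1}
}

# EIM for N observations, Psi = (lambda10, lambda01, lambda11)
mve2_eim <- function(N, lambda10, lambda01, lambda11) {
  lambda <- lambda10 + lambda01 + lambda11
  a <- lambda01 / (lambda - lambda01)^2
  b <- lambda10 / (lambda - lambda10)^2
  cc <- 1 / lambda10; dd <- 1 / lambda01; ee <- 1 / lambda11   # c, d, e in the text
  N / lambda * matrix(c(a + cc, 0,      a,
                        0,      b + dd, b,
                        a,      b,      a + b + ee), nrow = 3, byrow = TRUE)
}

set.seed(2013)
x <- rmve2(1e5, lambda10 = 1, lambda01 = 2, lambda11 = 0.5)
mean(x[, "x1"] == x[, "x2"])   # compare with lambda11 / lambda = 0.5 / 3.5
I3 <- mve2_eim(100, 1, 2, 0.5)
solve(I3)                      # approximate covariance of the MLEs
```

The matrix is positive definite for any positive rates, since it equals (N/λ) times a(1,0,1)ᵀ(1,0,1) + b(0,1,1)ᵀ(0,1,1) + diag(c, d, e).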
2.7 Nearest Neighbor and point-to-nearest object
We introduce the following density function
$$f(y_i) = \frac{2\pi N}{A}\,y_i\exp\left(-\frac{\pi N y_i^{2}}{A}\right),$$
where
$y_i$ = distance from the i-th point/object to the nearest object,
A = area of the whole region,
N = the number of animals in the population,
i = 1, 2, …, n, where n is the number of sampled objects/points.
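Maximising the log-likelihood Σ log f(y_i) over N gives the closed form N̂ = nA/(π Σ y_i²), because the derivative n/N − π Σ y_i²/A vanishes there. The sketch below checks this on a simulated population: N = 1000 objects scattered uniformly in the unit square (A = 1) and n = 1000 sample points kept away from the boundary to limit edge effects. All settings and helper names are illustrative assumptions. It also compares the integral of f with the cdf 1 − exp(−πNy²/A).

```r
# Density of the point-to-nearest-object distance (hypothetical helper name)
dnn_sketch <- function(y, N, A) 2 * pi * N / A * y * exp(-pi * N * y^2 / A)

# Root of d/dN log-likelihood = n / N - pi * sum(y^2) / A
nn_mle <- function(y, A) length(y) * A / (pi * sum(y^2))

set.seed(7)
N <- 1000; A <- 1                                        # true population size, area
objects <- cbind(runif(N), runif(N))                     # animal locations
n <- 1000
points <- cbind(runif(n, 0.2, 0.8), runif(n, 0.2, 0.8))  # sample points away from edges
d2 <- outer(points[, 1], objects[, 1], "-")^2 + outer(points[, 2], objects[, 2], "-")^2
y <- sqrt(apply(d2, 1, min))                             # point-to-nearest-object distances
N_hat <- nn_mle(y, A)

# The density integrates to the cdf 1 - exp(-pi * N * y^2 / A); check at y = 0.02
p_num <- integrate(dnn_sketch, 0, 0.02, N = N, A = A, rel.tol = 1e-10)$value
c(N_hat = N_hat, p_num = p_num, p_exact = 1 - exp(-pi * N * 0.02^2 / A))
```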
Point processes in spatial statistics
The analysis of point pattern data in a compact subset S of $\mathbb{R}^n$ is a major object of study within spatial statistics. Such data appear in a broad range of disciplines, amongst which are:
• forestry and plant ecology (positions of trees or plants in general)
• epidemiology (home locations of infected patients)
• zoology (burrows or nests of animals)
• geography (positions of human settlements, towns or cities)
• seismology (epicenters of earthquakes)
• materials science (positions of defects in industrial materials)
• astronomy (locations of stars or galaxies)
• computational neuroscience (spikes of neurons).
The need to use point processes to model these kinds of data lies in their inherent spatial structure.
Accordingly, a first question of interest is often whether the given data exhibit complete spatial randomness (i.e., are a realization of a spatial Poisson process) as opposed to exhibiting either spatial aggregation or spatial inhibition.
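One classical way to approach the complete spatial randomness question, not covered in this report, is the Clark-Evans ratio: the mean event-to-nearest-event distance divided by its CSR expectation 1/(2√ρ), where ρ is the intensity (this is the mean of the nearest-neighbor density in Section 2.7 with ρ = N/A). Values near 1 are consistent with CSR, values well below 1 suggest aggregation, and values above 1 suggest inhibition. The sketch below computes distances on the unit torus to sidestep edge effects and compares a uniform pattern with a simulated clustered one; all settings and helper names are illustrative.

```r
# Nearest-neighbor distances between events on the unit torus (no edge effects)
nn_torus <- function(xy) {
  dx <- abs(outer(xy[, 1], xy[, 1], "-")); dx <- pmin(dx, 1 - dx)
  dy <- abs(outer(xy[, 2], xy[, 2], "-")); dy <- pmin(dy, 1 - dy)
  d <- sqrt(dx^2 + dy^2)
  diag(d) <- Inf                     # ignore each event's distance to itself
  apply(d, 1, min)
}
# Clark-Evans ratio: observed mean NN distance over its CSR expectation
clark_evans <- function(xy) {
  rho <- nrow(xy)                    # events per unit area (area is 1)
  mean(nn_torus(xy)) / (1 / (2 * sqrt(rho)))
}

set.seed(11)
n <- 800
csr <- cbind(runif(n), runif(n))     # complete spatial randomness
# Aggregated pattern: 40 parents with 20 offspring each, tightly scattered
parents <- cbind(runif(40), runif(40))
kids <- parents[rep(1:40, each = 20), ] + matrix(rnorm(2 * n, sd = 0.01), ncol = 2)
clustered <- kids %% 1               # wrap offspring back onto the unit torus
c(csr = clark_evans(csr), clustered = clark_evans(clustered))
```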
2.8 Weibull distribution
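Suppose that Y has a Weibull distribution with β > 0 and λ > 0, whose p.d.f. and c.d.f. are given by Lawless (2006):
$$f(y;\lambda,\beta) = \beta\lambda(\lambda y)^{\beta-1}e^{-(\lambda y)^{\beta}}, \tag{19}$$
$$F(y;\lambda,\beta) = 1 - e^{-(\lambda y)^{\beta}}, \qquad y \ge 0. \tag{20}$$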
In general, the density of a left-truncated distribution is
$$f_L(y;T) = \frac{f(y)}{1-F(T)}, \qquad 0 < T \le y < \infty, \tag{21}$$
where T is prespecified (known). Applying (21) to (19) and (20), the p.d.f. of the left-truncated Weibull distribution (LTWD) is
$$f_L(y;T) = \frac{\beta\lambda(\lambda y)^{\beta-1}e^{-(\lambda y)^{\beta}}}{e^{-(\lambda T)^{\beta}}}. \tag{22}$$
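Base R's dweibull(y, shape, scale) has density (a/b)(y/b)^(a−1) exp{−(y/b)^a}, so (19) corresponds to shape = β and scale = 1/λ, and (22) is a ratio of dweibull() and the upper tail of pweibull(). The sketch below (hypothetical helper name; illustrative λ = 0.8, β = 1.5, T = 0.7, written T0 in code) checks this against (22) directly and confirms that the truncated density integrates to one.

```r
# Left-truncated Weibull density (22) via base R (hypothetical helper name).
# dweibull(shape = beta, scale = 1/lambda) reproduces (19); T0 is the truncation point T.
dltweibull_sketch <- function(y, lambda, beta, T0) {
  ifelse(y >= T0,
         dweibull(y, shape = beta, scale = 1 / lambda) /
           pweibull(T0, shape = beta, scale = 1 / lambda, lower.tail = FALSE),
         0)
}

lambda <- 0.8; beta <- 1.5; T0 <- 0.7
yy <- c(0.8, 1.5, 3)
formula22 <- beta * lambda * (lambda * yy)^(beta - 1) * exp(-(lambda * yy)^beta) /
  exp(-(lambda * T0)^beta)
cbind(y = yy, sketch = dltweibull_sketch(yy, lambda, beta, T0), formula22)
total <- integrate(dltweibull_sketch, T0, Inf, lambda = lambda, beta = beta, T0 = T0,
                   rel.tol = 1e-10)$value
```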
Statistical properties of the LTWD
The cumulative distribution, probability density and hazard functions of the (untruncated) Weibull distribution are, respectively (Wingo, 1989),
$$F(y;\alpha,\beta) = 1 - \exp(-\alpha y^{\beta}), \tag{23}$$
$$f(y;\alpha,\beta) = \alpha\beta y^{\beta-1}\exp(-\alpha y^{\beta}), \tag{24}$$
$$h(y;\alpha,\beta) = \alpha\beta y^{\beta-1}, \tag{25}$$
where α > 0, β > 0 and y > 0; this is (19) and (20) with α = λ^β. The corresponding distribution and density functions for the LTWD are, respectively,
$$F_T(y;\alpha,\beta) = 1 - \exp[-\alpha(y^{\beta} - T^{\beta})], \tag{26}$$
$$f_T(y;\alpha,\beta) = \alpha\beta y^{\beta-1}\exp[-\alpha(y^{\beta} - T^{\beta})], \tag{27}$$
where 0 < T < y. The truncation point T is assumed to be a known, positive constant. Clearly, for fixed α > 0 and β > 0, $F_T(y) \to F(y)$ and $f_T(y) \to f(y)$ as $T \to 0$; that is, as expected, the LTWD reduces to the untruncated Weibull distribution when the truncation point T equals zero. Also, as expected, the hazard or failure rate function for the LTWD,
$$h_T(y;\alpha,\beta) = \frac{f_T(y)}{1-F_T(y)} = \alpha\beta y^{\beta-1}, \tag{28}$$
is the same as the untruncated hazard (25).
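The r-th moment about the origin of y is $\mu'_r = E\{y^r\} = \int_T^\infty y^r f_T(y)\,dy$. This integral can be evaluated to yield, for r ≥ 1,
$$\mu'_r = E\{y^r\} = \frac{1}{\alpha^{r/\beta}\exp(-\alpha T^{\beta})}\left\{\Gamma\left(1+\frac{r}{\beta}\right) - \gamma\left(1+\frac{r}{\beta}, \alpha T^{\beta}\right)\right\}, \tag{29}$$
where
$$\gamma(\theta, y) = \int_0^y t^{\theta-1}e^{-t}\,dt \qquad (\theta > 0,\ y > 0) \tag{30}$$
is the incomplete gamma integral. If we define $\mu = \mu'_1 = E\{y\}$ and $\sigma^{2} = \operatorname{Var}\{y\} = E\{y^{2}\} - \mu^{2}$, then one has immediately from (29)
$$\mu = \frac{1}{\alpha^{1/\beta}\exp(-\alpha T^{\beta})}\left\{\Gamma\left(1+\frac{1}{\beta}\right) - \gamma\left(1+\frac{1}{\beta}, \alpha T^{\beta}\right)\right\}, \tag{31}$$
and
$$\sigma^{2} = \frac{1}{\alpha^{2/\beta}\exp(-\alpha T^{\beta})}\left\{\Gamma\left(1+\frac{2}{\beta}\right) - \gamma\left(1+\frac{2}{\beta}, \alpha T^{\beta}\right)\right\} - \mu^{2}. \tag{32}$$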
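Moments (29), (31) and (32) can be evaluated with R's pgamma(), which gives γ(θ, y) = Γ(θ)P(θ, y), and checked against numerical integration of y f_T(y) and y² f_T(y). The sketch below uses the (α, β) parameterisation of (26) and (27), illustrative values α = 0.5, β = 1.8, T = 1.2 (T0 in code), and hypothetical helper names. As a sanity check, r = 0 must give 1.

```r
# Lower incomplete gamma gamma(theta, y), computed on the log scale via pgamma()
lower_incgamma <- function(theta, y) exp(lgamma(theta) + pgamma(y, theta, log.p = TRUE))

# r-th raw moment of the LTWD from (29); T0 plays the role of T
ltwd_moment <- function(r, alpha, beta, T0) {
  c0 <- alpha * T0^beta
  exp(c0) / alpha^(r / beta) * (gamma(1 + r / beta) - lower_incgamma(1 + r / beta, c0))
}

# LTWD density (27), used only for the numerical comparison
ltwd_density <- function(y, alpha, beta, T0) {
  alpha * beta * y^(beta - 1) * exp(-alpha * (y^beta - T0^beta))
}

alpha <- 0.5; beta <- 1.8; T0 <- 1.2
mu <- ltwd_moment(1, alpha, beta, T0)                        # (31)
sigma2 <- ltwd_moment(2, alpha, beta, T0) - mu^2             # (32)
mu_num <- integrate(function(y) y * ltwd_density(y, alpha, beta, T0),
                    T0, Inf, rel.tol = 1e-10)$value
m2_num <- integrate(function(y) y^2 * ltwd_density(y, alpha, beta, T0),
                    T0, Inf, rel.tol = 1e-10)$value
c(mu = mu, mu_numeric = mu_num, sigma2 = sigma2, sigma2_numeric = m2_num - mu_num^2)
```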
Estimation of parameters
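As n → ∞, the distribution of $(\hat\alpha, \hat\beta)$ is asymptotically bivariate normal with mean (α, β) and variance-covariance matrix
$$\begin{pmatrix}\operatorname{Var}(\hat\alpha) & \operatorname{Cov}(\hat\alpha,\hat\beta)\\ \operatorname{Cov}(\hat\alpha,\hat\beta) & \operatorname{Var}(\hat\beta)\end{pmatrix} \approx \begin{pmatrix} I_{\alpha\alpha} & I_{\alpha\beta}\\ I_{\beta\alpha} & I_{\beta\beta}\end{pmatrix}^{-1}, \tag{33}$$
where $I_{\alpha\alpha}$, $I_{\beta\beta}$ and $I_{\alpha\beta} = I_{\beta\alpha}$ are elements of the (symmetric) Fisher information matrix $[I_{ij}] = [E\{-\partial^{2}L/\partial\lambda_i\,\partial\lambda_j\}]$, $\lambda = (\alpha, \beta)^T$, $i, j = 1, 2$. Here L is the log-likelihood of the full sample, so the elements below already contain the factor n.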
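From Wingo (1989, p. 43), it can be shown that
$$I_{\alpha\alpha} = n/\alpha^{2}, \tag{34}$$
$$I_{\alpha\beta} = I_{\beta\alpha} = nE\{y^{\beta}\log y\} - nT^{\beta}\log T, \tag{35}$$
$$I_{\beta\beta} = n/\beta^{2} + \alpha nE\{y^{\beta}(\log y)^{2}\} - \alpha nT^{\beta}(\log T)^{2}, \tag{36}$$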
where
$$E\{y^{\beta}\log y\} = \frac{\exp(\alpha T^{\beta})\{\psi(2) - \gamma'(2,\alpha T^{\beta})\} - (\alpha T^{\beta} + 1)\log\alpha}{\alpha\beta}, \tag{37}$$
\begin{align}
E\{y^{\beta}(\log y)^{2}\} ={}& \exp(\alpha T^{\beta})\left\{\psi(2)^{2} + \psi'(2) - \gamma''(2,\alpha T^{\beta}) - 2\log\alpha\,[\psi(2) - \gamma'(2,\alpha T^{\beta})]\right\}/(\alpha\beta^{2}) \tag{38}\\
&+ (\log\alpha)^{2}(\alpha T^{\beta} + 1)/(\alpha\beta^{2}). \tag{39}
\end{align}
Here γ′(2, αT^β) and γ″(2, αT^β) are, respectively, the first and second partial derivatives, with respect to θ, of the incomplete gamma integral γ(θ, y), evaluated at θ = 2 and y = αT^β. The quantities ψ(2) and ψ′(2) are, respectively, the digamma and trigamma functions evaluated at 2.
These special functions and their derivatives can be evaluated using the computer programs given by Moore (1982), Bernardo (1976) and Schneider (1978).
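The expectations (37) to (39) need γ′(2, y) and γ″(2, y). Instead of Moore's FORTRAN routine, the sketch below differentiates under the integral sign, γ^(k)(2, y) = ∫_0^y t (log t)^k e^(−t) dt, and evaluates it with integrate(); digamma() and trigamma() supply ψ(2) and ψ′(2). It then assembles (34) to (36) and, since those elements already contain n, inverts the matrix to obtain approximate standard errors. This is a substitute for the cited algorithms, and the helper names, sample size and parameter values are illustrative.

```r
# k-th theta-derivative of gamma(theta, y) at theta = 2, by differentiating
# under the integral sign: int_0^y t (log t)^k exp(-t) dt
incgamma_deriv2 <- function(y, k) {
  integrate(function(t) t * log(t)^k * exp(-t), 0, y, rel.tol = 1e-10)$value
}

ltwd_logmoments <- function(alpha, beta, T0) {
  c0 <- alpha * T0^beta
  psi2 <- digamma(2); tpsi2 <- trigamma(2)
  g1 <- incgamma_deriv2(c0, 1); g2 <- incgamma_deriv2(c0, 2)
  E1 <- (exp(c0) * (psi2 - g1) - (c0 + 1) * log(alpha)) / (alpha * beta)          # (37)
  E2 <- exp(c0) * (psi2^2 + tpsi2 - g2 - 2 * log(alpha) * (psi2 - g1)) /
    (alpha * beta^2) + log(alpha)^2 * (c0 + 1) / (alpha * beta^2)                  # (38)-(39)
  c(E1 = E1, E2 = E2)
}

ltwd_info <- function(n, alpha, beta, T0) {
  E <- ltwd_logmoments(alpha, beta, T0)
  Iaa <- n / alpha^2                                                               # (34)
  Iab <- n * E[["E1"]] - n * T0^beta * log(T0)                                     # (35)
  Ibb <- n / beta^2 + alpha * n * E[["E2"]] - alpha * n * T0^beta * log(T0)^2      # (36)
  matrix(c(Iaa, Iab, Iab, Ibb), 2, 2,
         dimnames = list(c("alpha", "beta"), c("alpha", "beta")))
}

alpha <- 0.5; beta <- 1.8; T0 <- 1.2
info <- ltwd_info(n = 200, alpha, beta, T0)
sqrt(diag(solve(info)))   # approximate standard errors of alpha_hat and beta_hat
```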
Derivatives of the Incomplete Gamma Integral
These derivatives can be evaluated using the FORTRAN subroutine developed by Moore (1982). Moore's subroutine, however, evaluates the derivatives, with respect to θ, of the incomplete gamma function
$$I(\theta, z) = \frac{\gamma(\theta, z)}{\Gamma(\theta)}. \tag{40}$$
By straight differentiation, one obtains
$$\gamma'(\theta, z) = \gamma(\theta, z)\left\{\psi(\theta) + \frac{\partial I(\theta, z)/\partial\theta}{I(\theta, z)}\right\}. \tag{41}$$
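The second derivative γ″(θ, z) can be obtained by applying a similar operation to (41), although the algebra becomes considerably more complex. Writing $G_1(\theta) = \psi(\theta) + \{\partial I(\theta, z)/\partial\theta\}/I(\theta, z)$ for the factor in braces in (41), so that $\gamma'(\theta, z) = \gamma(\theta, z)G_1(\theta)$, one has in particular
$$\gamma''(\theta, z) = \{\Gamma(\theta)I(\theta, z)\}\,G_2(\theta), \tag{42}$$
where
$$G_2(\theta) = \partial G_1(\theta)/\partial\theta + \{G_1(\theta)\}^{2} \tag{43}$$
and
$$\frac{\partial G_1(\theta)}{\partial\theta} = \psi'(\theta) + \frac{I(\theta, z)\,\partial^{2}I(\theta, z)/\partial\theta^{2} - \{\partial I(\theta, z)/\partial\theta\}^{2}}{\{I(\theta, z)\}^{2}}. \tag{44}$$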
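Relations (41) to (44) can be checked numerically without Moore's subroutine. In this sketch, I(θ, z) is R's pgamma(z, shape = θ) and its θ-derivatives are approximated by central finite differences (an assumption made for illustration, not Moore's algorithm); the results are compared with the direct integrals ∫_0^z t^(θ−1) (log t)^k e^(−t) dt for k = 1, 2. Helper names and the values θ = 2, z = 0.9 are illustrative.

```r
I_fun <- function(theta, z) pgamma(z, shape = theta)          # I(theta, z), eq. (40)

# gamma'(theta, z) from (41), with dI/dtheta by central finite difference
gamma_d1_eq41 <- function(theta, z, h = 1e-5) {
  dI <- (I_fun(theta + h, z) - I_fun(theta - h, z)) / (2 * h)
  gamma(theta) * I_fun(theta, z) * (digamma(theta) + dI / I_fun(theta, z))
}

# gamma''(theta, z) from (42)-(44), again with finite-difference derivatives of I
gamma_d2_eq42 <- function(theta, z, h = 1e-4) {
  I0 <- I_fun(theta, z)
  dI <- (I_fun(theta + h, z) - I_fun(theta - h, z)) / (2 * h)
  d2I <- (I_fun(theta + h, z) - 2 * I0 + I_fun(theta - h, z)) / h^2
  G1 <- digamma(theta) + dI / I0                               # factor in braces in (41)
  dG1 <- trigamma(theta) + (I0 * d2I - dI^2) / I0^2            # (44)
  G2 <- dG1 + G1^2                                             # (43)
  gamma(theta) * I0 * G2                                       # (42)
}

# Direct definition: k-th derivative under the integral sign
gamma_dk_direct <- function(theta, z, k) {
  integrate(function(t) t^(theta - 1) * log(t)^k * exp(-t), 0, z, rel.tol = 1e-10)$value
}

theta <- 2; z <- 0.9
c(d1_eq41 = gamma_d1_eq41(theta, z), d1_direct = gamma_dk_direct(theta, z, 1),
  d2_eq42 = gamma_d2_eq42(theta, z), d2_direct = gamma_dk_direct(theta, z, 2))
```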
Tie in with R
The relevant R function is pgamma(q, shape, rate = 1, scale = 1/rate).
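We also have
$$P(a, x) \equiv \frac{1}{\Gamma(a)}\int_0^x t^{a-1}\exp(-t)\,dt \tag{45}$$
≡ pgamma(x, a). (46)
Thus, according to Wingo, we obtain
$$\gamma(\theta, y) = \Gamma(\theta)P(\theta, y) \tag{47}$$
= gamma(theta) * pgamma(y, theta) (48)
= exp(lgamma(theta) + pgamma(y, theta, log.p = TRUE)). (49)
Note that pgamma takes the evaluation point first and the shape second. Form (49) works on the log scale, so Γ(θ) is never formed explicitly; this matters for large θ, where gamma(theta) overflows.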
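The three expressions for γ(θ, y) can be compared directly. The sketch below (hypothetical helper names; illustrative θ = 2.5, y = 1.7) evaluates (48), the log-scale form (49), and direct numerical integration of (30); it also uses the closed form γ(1, y) = 1 − e^(−y) as a sanity check.

```r
# Lower incomplete gamma gamma(theta, y) three ways (hypothetical names)
incgamma_48 <- function(theta, y) gamma(theta) * pgamma(y, theta)
# Log-scale version: gamma(theta) itself is never formed
incgamma_49 <- function(theta, y) exp(lgamma(theta) + pgamma(y, theta, log.p = TRUE))
# Direct numerical integration of the definition (30)
incgamma_30 <- function(theta, y) {
  integrate(function(t) t^(theta - 1) * exp(-t), 0, y, rel.tol = 1e-10)$value
}

theta <- 2.5; y <- 1.7
c(eq48 = incgamma_48(theta, y), eq49 = incgamma_49(theta, y), eq30 = incgamma_30(theta, y))
incgamma_48(1, 0.4) - (1 - exp(-0.4))   # gamma(1, y) = 1 - exp(-y), so this is ~0
```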
Change of variables in EIM
We would like to see the expected information matrix of the LTWD after the following change of variables, in order to make it similar to Weibull() in the VGAM package of Thomas Yee. We set
$$\alpha = \left(\frac{1}{b}\right)^{a}, \qquad \beta = a.$$
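Suppose that the log-likelihood function is written as ℓ(a, b) = ℓ(α(a, b), β(a, b)); then the chain rule tells us that
$$\ell_{aa} = \frac{\partial^{2}\ell}{\partial a^{2}} = \ell_\alpha\alpha_{aa} + \ell_\beta\beta_{aa} + \ell_{\alpha\alpha}(\alpha_a)^{2} + 2\ell_{\alpha\beta}\beta_a\alpha_a + \ell_{\beta\beta}(\beta_a)^{2}, \tag{50}$$
$$\ell_{bb} = \frac{\partial^{2}\ell}{\partial b^{2}} = \ell_\alpha\alpha_{bb} + \ell_\beta\beta_{bb} + \ell_{\alpha\alpha}(\alpha_b)^{2} + 2\ell_{\alpha\beta}\beta_b\alpha_b + \ell_{\beta\beta}(\beta_b)^{2}, \tag{51}$$
and
$$\ell_{ab} = \frac{\partial^{2}\ell}{\partial a\,\partial b} = \ell_\alpha\alpha_{ab} + \ell_\beta\beta_{ab} + \ell_{\alpha\alpha}\alpha_a\alpha_b + \ell_{\alpha\beta}(\beta_b\alpha_a + \beta_a\alpha_b) + \ell_{\beta\beta}\beta_a\beta_b. \tag{53}$$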
Moreover, with a little help from first-year calculus, we obtain
$$\alpha_a = \left(\frac{1}{b}\right)^{a}\log\left(\frac{1}{b}\right), \qquad \alpha_b = -a\,b^{-a-1}, \qquad \beta_a = 1, \qquad \beta_b = 0.$$
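Thus, by substituting these values into the second-order partial derivatives above and taking expectations of both sides (notice that $E(\ell_\alpha) = 0$ and $E(\ell_\beta) = 0$ since they are score functions), we have
$$I_{aa} = -E(\ell_{aa}) = I_{\alpha\alpha}\left\{\left(\frac{1}{b}\right)^{a}\log\left(\frac{1}{b}\right)\right\}^{2} + 2I_{\alpha\beta}\left(\frac{1}{b}\right)^{a}\log\left(\frac{1}{b}\right) + I_{\beta\beta}, \tag{54}$$
$$I_{bb} = -E(\ell_{bb}) = I_{\alpha\alpha}\,a^{2}b^{-2(a+1)}, \tag{55}$$
$$I_{ab} = -E(\ell_{ab}) = I_{\alpha\alpha}\left(\frac{1}{b}\right)^{a}\log\left(\frac{1}{b}\right)\left(-a\,b^{-a-1}\right) + I_{\alpha\beta}\left(-a\,b^{-a-1}\right). \tag{56}$$
In (56) the cross term of (53) contributes only once: with $\beta_a = 1$ and $\beta_b = 0$, $\beta_b\alpha_a + \beta_a\alpha_b = \alpha_b$.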
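The change of variables amounts to I_(a,b) = Jᵀ I_(α,β) J, where J is the Jacobian of (α, β) with respect to (a, b), built from the four partial derivatives above. The sketch below compares this matrix product with the closed forms (54) to (56), using a single I_αβ in the off-diagonal term, and checks the Jacobian entries by finite differences. The 2 × 2 information matrix and the values a = 1.8, b = 2.3 are arbitrary illustrations, and the helper names are hypothetical.

```r
# Jacobian of (alpha, beta) = ((1/b)^a, a) with respect to (a, b)
jacobian_ab <- function(a, b) {
  alpha_a <- (1 / b)^a * log(1 / b)
  alpha_b <- -a * b^(-a - 1)
  matrix(c(alpha_a, 1,          # column for a: (alpha_a, beta_a)
           alpha_b, 0),         # column for b: (alpha_b, beta_b)
         nrow = 2, dimnames = list(c("alpha", "beta"), c("a", "b")))
}

# Closed forms (54)-(56); the cross term uses a single I_alphabeta
info_ab_closed <- function(I, a, b) {
  Iaa <- I[1, 1]; Iab <- I[1, 2]; Ibb <- I[2, 2]
  alpha_a <- (1 / b)^a * log(1 / b); alpha_b <- -a * b^(-a - 1)
  m_aa <- Iaa * alpha_a^2 + 2 * Iab * alpha_a + Ibb
  m_bb <- Iaa * alpha_b^2
  m_ab <- Iaa * alpha_a * alpha_b + Iab * alpha_b
  matrix(c(m_aa, m_ab, m_ab, m_bb), 2, 2)
}

I_albe <- matrix(c(40, -7, -7, 25), 2, 2)   # any symmetric information matrix
a <- 1.8; b <- 2.3
J <- jacobian_ab(a, b)
unname(t(J) %*% I_albe %*% J)
info_ab_closed(I_albe, a, b)
```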
Written Code
Thomas Yee has added the VGAM family function truncweibull() to his package based on this
work.
2.9 Data sets
I have processed the following data sets for the VGAM package:
1. Taiwanese students
2. Battle of Britain (Bowyer, 1990)
Bibliography
Arnold, B. C., 1968. Parameter estimation for a multivariate exponential distribution. J. Amer.
Statist. Assoc. 63, 848–852.
Bernardo, J. M., 1976. Algorithm AS 103: Psi (digamma) function. Journal of the Royal Statistical Society. Series C (Applied Statistics) 25 (3), pp. 315–317.
URL http://www.jstor.org/stable/2347257
Bowyer, M. J. F., 1990. The Battle of Britain: 50 Years On, first printing Edition. Patrick Stephens Limited, Wellingborough, Northamptonshire NN8 2RQ, England, Ch. 10.
Kundu, D., Raqab, M. Z., 2005. Generalized Rayleigh distribution: Different methods of estimation.
http://home.iitk.ac.in/~kundu/pap.html.
Lawless, J. F., 2006. Truncated Distributions. John Wiley and Sons, Ltd, The United States of
America.
URL http://dx.doi.org/10.1002/9780470012505.tat016
Leisch, F., 2008. Sweave. http://www.stat.uni-muenchen.de/~leisch/Sweave/, [Online; accessed
19-Nov-2012].
MacKenzie, D. I., Nichols, J. D., Royle, J. A., Pollock, K. H., Bailey, L. L., Hines, J. E., 2006. Occupancy Estimation and Modelling, first printing Edition. AP-Academic Press, the United States of America.
Moore, R. J., 1982. Algorithm AS 187: Derivatives of the incomplete gamma integral. Journal of the Royal Statistical Society. Series C (Applied Statistics) 31 (3), 330–335.
URL http://lib.stat.cmu.edu/apstat/187
Schneider, B. E., 1978. Algorithm AS 121: Trigamma function. Journal of the Royal Statistical Society. Series C (Applied Statistics) 27 (1), pp. 97–99.
URL http://www.jstor.org/stable/2346249
Seber, G. A. F., 1973. The Estimation of Animal Abundance and Related Parameters, second printing
Edition. The Blackburn Press, The United States of America.
Wingo, D. R., 1989. The left-truncated Weibull distribution: theory and computation. Statist. Papers
30 (1), 39–48.
URL http://dx.doi.org/10.1007/BF02924307