HIGH-DIMENSIONAL PROFILE ANALYSIS
LiTH-MAT-R--2020/07--SE
Linköping University
Department of Mathematics
SE-581 83 Linköping
HIGH-DIMENSIONAL PROFILE ANALYSIS
Cigdem Cengiz1 and Dietrich von Rosen1,2
1 Department of Energy and Technology,
Swedish University of Agricultural Sciences,
SE-750 07 Uppsala, Sweden.
2 Department of Mathematics,
Linköping University,
SE-581 83 Linköping, Sweden.
Abstract
The three tests of profile analysis, the test of parallelism, the test of level and the
test of flatness, are studied and the corresponding likelihood ratio tests are derived.
Firstly, a traditional setting, where the sample size is greater than the dimension of
the parameter space, is considered. Then all tests are derived in a high-dimensional
setting, where special techniques are required to tackle the problems that arise with
the dimensionality. We propose a dimension reduction method using scores, an
approach first proposed by Läuter et al. (1996).
Keywords: High-dimensional data; hypothesis testing; linear scores; multivariate
analysis; profile analysis; spherical distributions.
Notation
Abbreviations
p.d. : positive definite
p.s.d. : positive semi-definite
i.i.d. : independently and identically distributed
i.e. : that is
e.g. : for example
MANOVA : multivariate analysis of variance
GMANOVA : generalized multivariate analysis of variance
BRM : bilinear regression model
PLS : partial least squares
PCA : principal component analysis
PCR : principal component regression
CLT : central limit theorem
LLN : law of large numbers
Symbols
x : column vector
X : matrix
A′ : transpose of A
A−1 : inverse of A
A+ : Moore-Penrose inverse of A
A− : generalized inverse of A
|A| : determinant of A
C(A) : column space of A
r(A) : rank of A
A⊥ : orthocomplement of subspace A
A◦ : C(A◦ ) = C(A)⊥
⊗ : Kronecker product
⊕ : orthogonal sum of linear spaces
vec: vec-operator
In : n × n identity matrix
1n : n × 1 vector of ones
E[x] : expectation of x
D[x] : dispersion matrix of x
Np (µ, Σ) : multivariate normal distribution
Np,n (µ, Σ, Ψ) : matrix normal distribution
Wp (Σ, n, ∆) : non-central Wishart distribution with n degrees of freedom
Wp (Σ, n) : central Wishart distribution with n degrees of freedom
=ᵈ : equal in distribution
(A)( )′ : (A)(A)′
1 Introduction
In this report, we construct test statistics for each of the three hypotheses in profile
analysis, first in a classical setting, where the number of parameters is less than the
number of subjects, and then in a high-dimensional setting, where the opposite holds,
i.e., the number of parameters exceeds the number of individuals.

In profile analysis, we have multiple variables for each individual, the individuals form
different groups (at least two), and the groups are compared based on the mean vectors
of these variables. The idea is to see whether there is an interaction between groups and
responses. Assume we have p variables and q independent groups (treatments); the
p-dimensional observation vectors are denoted by x1 , x2 , ..., xq with mean vectors
µ1 , µ2 , ..., µq . The mean profile for the i-th group is obtained by connecting the points
(1, µi1 ), (2, µi2 ), ..., (p, µip ) with line segments. Profile analysis can then be considered
as the comparison of these q mean profiles. See Figure 1 for an illustration.
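The construction of mean profiles can be sketched numerically. A minimal illustration in Python follows; all sizes, group means and simulated data below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

p, q = 4, 3                      # p responses, q groups (example values)
n = [10, 12, 11]                 # group sizes (invented)

# Simulate each group around its own mean vector mu_i.
mus = [np.array([1., 2., 3., 4.]) + shift for shift in (0.0, 0.5, 2.0)]
groups = [rng.normal(mu, 1.0, size=(ni, p)) for mu, ni in zip(mus, n)]

# The i-th mean profile is the polygon through (1, mu_i1), ..., (p, mu_ip);
# numerically it is just the vector of coordinate-wise sample means.
profiles = np.array([g.mean(axis=0) for g in groups])   # q x p

# Parallel profiles have (approximately) equal successive differences
# across groups; the increments below would then agree row by row.
increments = np.diff(profiles, axis=1)                  # q x (p-1)
print(profiles.shape, increments.shape)
```

Plotting each row of `profiles` against 1, ..., p reproduces the kind of picture referred to as Figure 1.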
There are two possible scenarios that can be considered for the responses:
I. The same mean-variable can be compared between q groups over several time-points
(repeated measurements).
II. One can measure different variables for each subject and compare their mean levels
between q groups.
In the literature, the topic has been investigated by many researchers. One of the first
and leading papers on this topic was published by Greenhouse and Geisser (1959) and
the topic has been revisited by Geisser (2003). Srivastava (1987) derived the likelihood
ratio tests together with their distributions for the three hypotheses. A chapter on profile
analysis can be found in the books by Srivastava (2002) and Srivastava and Carter (1983).
Potthoff and Roy (1964) presented the growth curve model for the first time and other
extensions within the framework of the growth curve model can be found in Fujikoshi
(2009), where Fujikoshi extended profile analysis, especially statistical inference on the
parallelism hypothesis. Ohlson and Srivastava (2010) considered profile analysis of several
groups, where the groups have partly equal means. Seo, Sakurai and Fujikoshi (2011)
derived the likelihood ratio tests for the two hypotheses, level and flatness, in profile
analysis of growth curve data. Another focus was on the profile analysis with random
effects covariance structure. Srivastava and Singull (2012) constructed tests based on
the likelihood ratio, without any restrictions on the parameter space, for testing the
covariance matrix for random-effects structure or sphericity. Yokoyama (1995) derived
the likelihood ratio criterion with random-effects covariance structure under the parallel
profile model. Yokoyama and Fujikoshi (1993) conducted analysis of parallel growth
curves of groups where they assumed a random-effects covariance structure. They also
gave the asymptotic null distributions of the tests.
1.2 Test statistics for the two-sample case
In this section, only the special case of two groups is considered. Let the p-dimensional
random vectors x1⁽ⁱ⁾, ..., x_{ni}⁽ⁱ⁾, i = 1, 2, be independently normally distributed with mean
vector µi and covariance matrix Σ. The sample mean vectors, the sample covariance
matrices and the pooled sample covariance matrix are given by

x̄⁽ⁱ⁾ = (1/ni) ∑_{j=1}^{ni} xj⁽ⁱ⁾,

S⁽ⁱ⁾ = (1/(ni − 1)) ∑_{j=1}^{ni} (xj⁽ⁱ⁾ − x̄⁽ⁱ⁾)(xj⁽ⁱ⁾ − x̄⁽ⁱ⁾)′,

Sp = (1/(n1 + n2 − 2)) [(n1 − 1)S⁽¹⁾ + (n2 − 1)S⁽²⁾].
Define a (p − 1) × p matrix C which satisfies C1p = 0 and is of rank r(C) = p − 1. Let

b = n1 n2/(n1 + n2),  f = n1 + n2 − 2,  u = x̄⁽¹⁾ − x̄⁽²⁾.

Then the three hypotheses and related test statistics can be written as below (Srivastava
and Carter, 1983; Srivastava, 1987, 2002):

[(f − (p − 1) + 1)/(f(p − 1))] b u′C′(C Sp C′)⁻¹Cu ≥ F_{p−1, f−p+2, α},

where F_{p−1, f−p+2, α} denotes the α-percentile of the F-distribution with p − 1 and
f − p + 2 degrees of freedom, and

[n(f − p + 3)/(p − 1)] x̄′C′(CVC′ + b Cuu′C′)⁻¹C x̄ ≥ F_{p−1, n−p+1, α},
As mentioned before, the second hypothesis is tested given that H1 is true. If one
fails to reject the first hypothesis, it cannot be concluded that the profiles are parallel.
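As a sketch, the first statistic above (the parallelism test) can be computed directly. The simulated data, the particular successive-difference choice of C and the use of scipy for the F-quantile are choices of this example, not part of the original derivation:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(1)
p, n1, n2, alpha = 4, 15, 18, 0.05          # example sizes

X1 = rng.normal(0.0, 1.0, size=(n1, p))     # group 1 (simulated)
X2 = rng.normal(0.5, 1.0, size=(n2, p))     # group 2: shifted level, same shape

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = np.cov(X1, rowvar=False)               # divides by n1 - 1
S2 = np.cov(X2, rowvar=False)
Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

# C: (p-1) x p successive-difference contrasts, so C @ 1_p = 0, rank p - 1.
C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)

b = n1 * n2 / (n1 + n2)
f = n1 + n2 - 2
u = xbar1 - xbar2

Cu = C @ u
T = b * Cu @ np.linalg.solve(C @ Sp @ C.T, Cu)       # Hotelling-type form
F_stat = (f - (p - 1) + 1) / (f * (p - 1)) * T
F_crit = f_dist.ppf(1 - alpha, p - 1, f - p + 2)
print(F_stat, F_crit)   # profiles simulated parallel: F_stat tends to be small
```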
2 Useful definitions and theorems
Definition 2.1. The vector space generated by the columns of an arbitrary matrix A :
p × q is denoted C(A):
C(A) = {a : a = Ax, x ∈ Rq }.
Definition 2.2. A matrix, whose columns generate the orthogonal complement to C(A)
is denoted A◦ , i.e., C(A◦ ) = C(A)⊥ . Similar to the generalized inverse, A◦ is not unique.
One can choose A◦ = I − (A′)⁻A′ or A◦ = I − A(A′A)⁻A′, in addition to some other
choices.
Definition 2.3. The space CV (A) denotes a column vector space with an inner product
defined through the positive definite matrix V , i.e., for any pair of vectors x and y the
inner product equals x′V⁻¹y. If V = I, one writes C(A) instead of CI (A).
(iii) P is unique.
(iv) PA = A(A′A)⁻A′ is a projector on C(A) for which the standard inner product is
assumed to hold.
(v) PA,V = A(A′V⁻¹A)⁻A′V⁻¹ is a projector on CV (A) for which an inner product
defined by (x, y) = x′V⁻¹y is assumed to hold and V is p.d.
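The two projectors in (iv) and (v) can be verified numerically; the matrices below are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 2))                     # arbitrary full-column-rank A
B = rng.normal(size=(5, 5))
V = B @ B.T + 5 * np.eye(5)                     # a positive definite V

PA = A @ np.linalg.solve(A.T @ A, A.T)          # P_A = A(A'A)^- A'
Vinv = np.linalg.inv(V)
PAV = A @ np.linalg.solve(A.T @ Vinv @ A, A.T @ Vinv)   # P_{A,V}

# Both are idempotent; P_A is also symmetric, P_{A,V} generally is not.
assert np.allclose(PA @ PA, PA)
assert np.allclose(PAV @ PAV, PAV)
assert np.allclose(PA, PA.T)

# Both leave every vector in C(A) fixed.
x = A @ rng.normal(size=2)
assert np.allclose(PA @ x, x) and np.allclose(PAV @ x, x)
print("projector identities verified")
```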
Definition 2.7. Let X and Y be two arbitrary matrices. The covariance of these two
matrices is defined by
Definition 2.8. The general multivariate linear model equals

X = M B + E,

and its bilinear extension equals

X = AM B + E,

where X : p × n, the unknown mean parameter matrix M : q × k, the two design matrices
A : p × q and B : k × n, and the error matrix E : p × n. The latter model is also called
the GMANOVA model or the growth curve model.
Theorem 2.2. The equation AXB = C is consistent if and only if C(C) ⊆ C(A) and
C(C′) ⊆ C(B′). A particular solution of the equation is given by

X0 = A⁻CB⁻,
A special case is

S⁻¹ − B◦(B◦′SB◦)⁻B◦′ = S⁻¹B(B′S⁻¹B)⁻B′S⁻¹.
Theorem 2.6. Let W1 ∼ Wp (Σ, n, ∆1 ) be independent of W2 ∼ Wp (Σ, m, ∆2 ). Then
W1 + W2 ∼ Wp (Σ, n + m, ∆1 + ∆2 ).
Theorem 2.7. Let X ∼ Np,n (0, Σ, I) and Q be any idempotent matrix of a proper size.
Then
XQX′ ∼ Wp (Σ, r(Q)).
AW A′ ∼ Wq (AΣA′, n).
Theorem 2.11. Let S be positive definite and suppose that V , W and H are of proper
sizes, assuming H⁻¹ exists. Then

(S + V HW′)⁻¹ = S⁻¹ − S⁻¹V (W′S⁻¹V + H⁻¹)⁻¹W′S⁻¹.
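Theorem 2.11 is the well-known matrix inversion (Woodbury-type) identity; a quick numerical check with arbitrary conformable matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
p, k = 6, 2
A = rng.normal(size=(p, p))
S = A @ A.T + p * np.eye(p)                  # a positive definite S
V = 0.3 * rng.normal(size=(p, k))            # small factors so S + VHW'
W = 0.3 * rng.normal(size=(p, k))            # stays safely invertible
H = rng.normal(size=(k, k)) + 3 * np.eye(k)  # an invertible H

lhs = np.linalg.inv(S + V @ H @ W.T)
Si = np.linalg.inv(S)
rhs = Si - Si @ V @ np.linalg.inv(W.T @ Si @ V + np.linalg.inv(H)) @ W.T @ Si
print(np.allclose(lhs, rhs))
```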
Theorem 2.14.
Theorem 2.15. Let A, B and C be matrices of proper sizes. Then
vec(ABC) = (C′ ⊗ A)vecB.
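Theorem 2.15 can be checked numerically; note that the mathematical vec-operator stacks columns, which corresponds to column-major (Fortran) order in numpy:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))
C = rng.normal(size=(2, 5))

# Column-stacking vec-operator: reshape in Fortran (column-major) order.
vec = lambda M: M.reshape(-1, order="F")

assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))
print("vec(ABC) = (C' (x) A) vec B verified")
```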
Theorem 2.16.
(i) (A ⊗ B)′ = A′ ⊗ B′,
(ii) (A ⊗ B)(C ⊗ D) = AC ⊗ BD.
Xk = Mk Dk + Ek
    ⎛1 ··· 1 0 ··· 0 ··· 0⎞
D = ⎜0 ··· 0 1 ··· 1 ··· 0⎟ ,
    ⎜ ⋮      ⋮       ⋱  ⋮ ⎟
    ⎝0 ··· 0 0 ··· 0 ··· 1⎠
and where Ek ∼ Np,nk (0, Σ, Ink ). The relation in (1) can be written

X = M D + E,  X ∼ Np,N (M D, Σ, IN ),

where X : p × N , M : p × q, D : q × N and E : p × N .
These two matrices, F and C, will be used in the next chapter during the derivations of
the tests. Since the common F and C are used in each hypothesis, they are introduced
here.
X = MD + E (2)
two spaces which are orthogonal to each other, C(D′) and C(D′)⊥, which correspond to
the mean space and the residual space, respectively.
Figure 2: The decomposition of the space with no restriction on the mean parameter
space.
Note that since D is a full rank matrix, one can write PD′ = D′(DD′)⁻D = D′(DD′)⁻¹D.
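As a sketch, the projector PD′ and the matrix S = X(I − PD′)X′ can be formed for a concrete group design; the sizes below are invented, and the final check illustrates the standard fact that S is then the pooled within-group sums-of-squares matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 3, [4, 5, 6]
N, q = sum(n), len(n)

# D: q x N group-indicator design matrix, full row rank.
D = np.zeros((q, N))
start = 0
for i, ni in enumerate(n):
    D[i, start:start + ni] = 1.0
    start += ni

PD = D.T @ np.linalg.inv(D @ D.T) @ D      # P_{D'} = D'(DD')^{-1} D
X = rng.normal(size=(p, N))

assert np.allclose(PD @ PD, PD) and np.allclose(PD, PD.T)

# S = X(I - P_{D'})X' equals the pooled within-group SSCP matrix.
S = X @ (np.eye(N) - PD) @ X.T
W = sum((Xi - Xi.mean(axis=1, keepdims=True))
        @ (Xi - Xi.mean(axis=1, keepdims=True)).T
        for Xi in np.split(X, np.cumsum(n)[:-1], axis=1))
assert np.allclose(S, W)
print("S equals the pooled within-group SSCP matrix")
```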
Now we will move on to the derivation of the test statistics.
The null hypothesis and the alternative hypothesis for parallelism can be written

H1 : E(X) = M D, CM F = 0,
A1 : E(X) = M D, CM F ≠ 0. (3)

Theorem 3.1. The likelihood ratio statistic for the parallelism hypothesis presented in
(3) can be given as

λ^{2/N} = |CSC′| / |CSC′ + CXP_{D′(DD′)⁻¹K}X′C′|, (4)

where S = X(I − PD′)X′ and K is any matrix satisfying C(K) = C(D) ∩ C(F ).
Then

λ^{2/N} = |CSC′| / |CSC′ + CXP_{D′(DD′)⁻¹K}X′C′| ∼ Λ(p − 1, N − r(D), r(K)).
CM F = 0 ⇔ (F′ ⊗ C)vecM = 0,

which means that vecM belongs to C(F ⊗ C′)⊥. By Theorem 2.1, the general solution
of the equation CM F = 0 equals

M = (C′)◦θ1 + C′θ2 F◦′,
where θ1 and θ2 are new parameters. Inserting this solution into (2) yields

X = (C′)◦θ1 D + C′θ2 F◦′D + E.
This is the reparameterization of the first model given by (2) after applying the restrictions
CM F = 0. Here we notice that we are outside the MANOVA and GMANOVA model.
Recall the inequality

|Σ|^{−N/2} e^{−½ tr{Σ⁻¹(X−E(X))(X−E(X))′}} ≤ |(1/N)(X − E(X))(X − E(X))′|^{−N/2} e^{−Np/2},

with equality if and only if NΣ = (X − E(X))(X − E(X))′.
Now, we start performing some calculations under the null hypothesis. Using
I = PD′ + (I − PD′) and S = X(I − PD′)X′,

|(X − (C′)◦θ1 D − C′θ2 F◦′D)( )′|
= |(XPD′ − (C′)◦θ1 D − C′θ2 F◦′D)( )′ + X(I − PD′)X′|
= |S||S⁻¹(XPD′ − (C′)◦θ1 D − C′θ2 F◦′D)( )′ + I|
= |S||(XPD′ − (C′)◦θ1 D − C′θ2 F◦′D)′S⁻¹(XPD′ − (C′)◦θ1 D − C′θ2 F◦′D) + I|.
(5)
Recall from Theorem 2.3,

S⁻¹ = C′(CSC′)⁻C + S⁻¹(C′)◦[(C′)◦′S⁻¹(C′)◦]⁻(C′)◦′S⁻¹.

Notice that

D′θ1′(C′)◦′C′(CSC′)⁻C = 0,

since (C′)◦ and C′ generate orthogonal spaces.
By Theorem 2.4,

(5) = |S||C′(CSC′)⁻C(XPD′ − C′θ2 F◦′D)(XPD′ − C′θ2 F◦′D)′C′(CSC′)⁻CS + I|
= |SC′(CSC′)⁻C(XPD′ − C′θ2 F◦′D)(XPD′ − C′θ2 F◦′D)′C′(CSC′)⁻CS + S|
= |(P′_{C′,S⁻¹}XPD′ − P′_{C′,S⁻¹}C′θ2 F◦′D)( )′ + S|,

where P_{C′,S⁻¹} = C′(CSC′)⁻CS and hence P′_{C′,S⁻¹} = SC′(CSC′)⁻C. Inserting
I = (I − P_{D′F◦}) + P_{D′F◦} yields

= |P′_{C′,S⁻¹}XPD′(I − P_{D′F◦})(P′_{C′,S⁻¹}XPD′)′
  + (P′_{C′,S⁻¹}XPD′P_{D′F◦} − P′_{C′,S⁻¹}C′θ2 F◦′D)( )′ + S|
≥ |P′_{C′,S⁻¹}XPD′(I − P_{D′F◦})(P′_{C′,S⁻¹}XPD′)′ + S|. (7)
Since P_{A◦} = I − P_A, one can write I − P_{D′F◦} = P_{(D′F◦)◦}. From the definition of the
column spaces given in the Notation part, C[(D′F◦)◦] = C(D′F◦)⊥. Using Theorem 2.17,
C(D′F◦)⊥ can be decomposed into two orthogonal subspaces:

C(D′F◦)⊥ = C(D′)⊥ ⊕ C(D′(DD′)⁻¹K), (8)

where C(K) = C(D) ∩ C(F ). The space C(D′)⊥ will correspond to I − PD′ and the space
C(D′(DD′)⁻¹K) will correspond to P_{D′(DD′)⁻¹K}. Then

(7) = |P′_{C′,S⁻¹}XPD′[(I − PD′) + P_{D′(DD′)⁻¹K}](P′_{C′,S⁻¹}XPD′)′ + S|
      (PD′(I − PD′) = 0,  PD′P_{D′(DD′)⁻¹K} = P_{D′(DD′)⁻¹K})
= |P′_{C′,S⁻¹}XP_{D′(DD′)⁻¹K}(P′_{C′,S⁻¹}X)′ + S|.
For the alternative hypothesis, we do not have any restrictions on the mean parameter
space. Thus, we will use the results from the introduction of Section 3.2, where
NΣ̂ = RR′ = S was found. Hence,

NΣ̂_{A1} = S,
NΣ̂_{H1} = S + P′_{C′,S⁻¹}XP_{D′(DD′)⁻¹K}X′P_{C′,S⁻¹}. (9)

Thus,

λ^{2/N} = |NΣ̂_{A1}| / |NΣ̂_{H1}| = |S| / |S + P′_{C′,S⁻¹}XP_{D′(DD′)⁻¹K}X′P_{C′,S⁻¹}|.
The numerator and the denominator are not independently distributed. To achieve this
independence, we introduce the full rank matrix

H = (C′, S⁻¹(C′)◦).

Multiplying both the numerator and the denominator by H′ from the left and H from
the right yields

λ^{2/N} = |H′Σ̂_{A1}H| / |H′Σ̂_{H1}H| = |CSC′| / |CSC′ + CXP_{D′(DD′)⁻¹K}X′C′|.
By Theorem 2.7 and Theorem 2.8, the following two relations hold:

CSC′ ∼ W_{p−1}(CΣC′, N − r(D)),
CXP_{D′(DD′)⁻¹K}X′C′ ∼ W_{p−1}(CΣC′, r(K)).

The ratio given by λ^{2/N} does not depend on Σ; consequently, we can replace CΣC′ with
I_{p−1}. For a detailed explanation, see Appendix A, Result A1. Therefore,

λ^{2/N} = |CSC′| / |CSC′ + CXP_{D′(DD′)⁻¹K}X′C′| ∼ Λ(p − 1, N − r(D), r(K)).

Note that the distribution of λ^{2/N}, that is Wilks' lambda distribution, can be approximated
very accurately (Läuter, 2016; Mardia, Kent and Bibby, 1979).
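The parallelism statistic of Theorem 3.1 can be computed numerically. The concrete choices below (a q-group one-way design D, successive-difference contrasts C, between-group contrasts F, and K = F, which satisfies C(K) = C(D) ∩ C(F) here because C(D) is all of R^q) are assumptions of this sketch, and the data are simulated under parallel profiles:

```python
import numpy as np

rng = np.random.default_rng(8)
p, n = 4, [12, 15, 13]
q, N = len(n), sum(n)

# Between-individuals design D (group indicators) and contrast matrices.
D = np.zeros((q, N)); s = 0
for i, ni in enumerate(n):
    D[i, s:s + ni] = 1.0; s += ni
C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)     # within-subject contrasts, C 1_p = 0
F = np.vstack([np.eye(q - 1), -np.ones(q - 1)])  # between-group contrasts, 1_q' F = 0
K = F                                            # here C(K) = C(D) ∩ C(F) = C(F)

# Simulate parallel profiles: common shape, group-specific levels.
shape = np.array([0., 1., 2., 1.])
M = np.column_stack([shape + delta for delta in (0., .5, 1.)])   # p x q
X = M @ D + rng.normal(size=(p, N))

PD = D.T @ np.linalg.inv(D @ D.T) @ D
S = X @ (np.eye(N) - PD) @ X.T                   # S = X(I - P_{D'})X'
A = D.T @ np.linalg.inv(D @ D.T) @ K             # D'(DD')^{-1} K
PK = A @ np.linalg.solve(A.T @ A, A.T)           # projector on C(D'(DD')^{-1}K)

num = np.linalg.det(C @ S @ C.T)
den = np.linalg.det(C @ S @ C.T + C @ X @ PK @ X.T @ C.T)
lam = num / den                                  # lambda^{2/N}
print(lam)
```

Values of `lam` near 1 are compatible with parallelism; under the alternative the statistic shrinks toward 0.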
Assuming that the profiles are parallel, we will construct a test to check whether they
coincide. The null hypothesis and the alternative hypothesis for the level test can be
written

H2 |H1 : E(X) = M D, M F = 0,
A2 |H1 : E(X) = M D, CM F = 0. (10)
Theorem 3.2. The likelihood ratio statistic for the level hypothesis can be expressed as

λ^{2/N} = |(C′)◦′S⁻¹(C′)◦|⁻¹ / |((C′)◦′S⁻¹(C′)◦)⁻¹ + ((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹XD′(DD′)⁻¹KQ⁻¹K′
× (DD′)⁻¹DX′S⁻¹(C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹|, (11)

where Q = K′(DD′)⁻¹K + K′(DD′)⁻¹DX′C′(CSC′)⁻¹CXD′(DD′)⁻¹K and C(K) =
C(D) ∩ C(F ). Moreover,

((C′)◦′S⁻¹(C′)◦)⁻¹ ∼ W1(((C′)◦′Σ⁻¹(C′)◦)⁻¹, N − r(D) − p + 1),
((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹XD′(DD′)⁻¹KQ⁻¹K′(DD′)⁻¹DX′S⁻¹(C′)◦
× ((C′)◦′S⁻¹(C′)◦)⁻¹ ∼ W1(((C′)◦′Σ⁻¹(C′)◦)⁻¹, r(K)).

Then

λ^{2/N} ∼ Λ(1, N − r(D) − p + 1, r(K)).
Proof. Equivalent expressions for the restrictions in both hypotheses can be written as

H2 : M F = 0 ⇔ M = θF◦′,
A2 : CM F = 0 ⇔ M = (C′)◦θ1 + C′θ2 F◦′.
Plugging these solutions into the models gives

H2 : X = θF◦′D + E,
A2 : X = (C′)◦θ1 D + C′θ2 F◦′D + E,
where X(I − PD′)X′ = S. We know that S and XP_{D′(DD′)⁻¹K}X′ are Wishart
distributed, but P′_{C′,S⁻¹}XP_{D′(DD′)⁻¹K}X′P_{C′,S⁻¹} is not. Therefore the likelihood function
will be manipulated similarly to the treatment of the parallelism hypothesis. Put
H = (C′, S⁻¹(C′)◦), which is a full rank matrix, and multiply both the numerator and
the denominator in the likelihood ratio with H′ and H from left and right, respectively:

λ^{2/N} = |H′Σ̂_{A2}H| / |H′Σ̂_{H2}H|.
where

V11 = C(S + P′_{C′,S⁻¹}XP_{D′(DD′)⁻¹K}X′P_{C′,S⁻¹})C′,
V12 = C(S + P′_{C′,S⁻¹}XP_{D′(DD′)⁻¹K}X′P_{C′,S⁻¹})S⁻¹(C′)◦,
V21 = (C′)◦′S⁻¹(S + P′_{C′,S⁻¹}XP_{D′(DD′)⁻¹K}X′P_{C′,S⁻¹})C′,
V22 = (C′)◦′S⁻¹(S + P′_{C′,S⁻¹}XP_{D′(DD′)⁻¹K}X′P_{C′,S⁻¹})S⁻¹(C′)◦.
It follows that

V21 = (C′)◦′C′ + (C′)◦′S⁻¹P′_{C′,S⁻¹}XP_{D′(DD′)⁻¹K}X′P_{C′,S⁻¹}C′     ((C′)◦′C′ = 0)
    = (C′)◦′S⁻¹[C′(CSC′)⁻CS]′XP_{D′(DD′)⁻¹K}X′C′(CSC′)⁻CSC′
    = (C′)◦′S⁻¹S C′(CSC′)⁻CXP_{D′(DD′)⁻¹K}X′C′(CSC′)⁻CSC′     (S⁻¹S = I, (C′)◦′C′ = 0)
    = 0.

Notice that V12 = V21′. Therefore V12 = 0.
V11 = CSC′ + CSC′(CSC′)⁻C XP_{D′(DD′)⁻¹K}X′ C′(CSC′)⁻CSC′
    = CSC′ + CXP_{D′(DD′)⁻¹K}X′C′     (CSC′(CSC′)⁻C = C, C′(CSC′)⁻CSC′ = C′)

and

V22 = (C′)◦′S⁻¹SS⁻¹(C′)◦ + (C′)◦′S⁻¹S C′(CSC′)⁻CXP_{D′(DD′)⁻¹K}X′C′(CSC′)⁻CSS⁻¹(C′)◦
    = (C′)◦′S⁻¹(C′)◦     (S⁻¹S = I, (C′)◦′C′ = 0).
Hence,

H′Σ̂_{A2}H = ⎛ C(X(I − P_{D′F◦})X′)C′    0                 ⎞
            ⎝ 0                         (C′)◦′S⁻¹(C′)◦ ⎠ .

For the null hypothesis,

H′Σ̂_{H2}H = ⎛ C(S + XP_{D′(DD′)⁻¹K}X′)C′           C(S + XP_{D′(DD′)⁻¹K}X′)S⁻¹(C′)◦          ⎞
            ⎝ (C′)◦′S⁻¹(S + XP_{D′(DD′)⁻¹K}X′)C′   (C′)◦′S⁻¹(S + XP_{D′(DD′)⁻¹K}X′)S⁻¹(C′)◦ ⎠

          = ⎛ CX(I − P_{D′F◦})X′C′            CX(I − P_{D′F◦})X′S⁻¹(C′)◦          ⎞
            ⎝ (C′)◦′S⁻¹X(I − P_{D′F◦})X′C′   (C′)◦′S⁻¹X(I − P_{D′F◦})X′S⁻¹(C′)◦ ⎠ .
Note that we used the relation X(I − P_{D′F◦})X′ = X(I − PD′)X′ + XP_{D′(DD′)⁻¹K}X′ =
S + XP_{D′(DD′)⁻¹K}X′. The determinant for the alternative hypothesis is straightforward.
We use Theorem 2.10, which concerns partitioned matrices, for the determinant
for the null hypothesis. Hence,

|H′Σ̂_{A2}H| = |CX(I − P_{D′F◦})X′C′||(C′)◦′S⁻¹(C′)◦|,
|H′Σ̂_{H2}H| = |CX(I − P_{D′F◦})X′C′||(C′)◦′S⁻¹X(I − P_{D′F◦})X′S⁻¹(C′)◦
    − (C′)◦′S⁻¹X(I − P_{D′F◦})X′C′[CX(I − P_{D′F◦})X′C′]⁻¹CX(I − P_{D′F◦})X′S⁻¹(C′)◦|.

When we take the ratio of these two quantities, i.e., the ratio of |H′Σ̂_{A2}H| and |H′Σ̂_{H2}H|,
the first term, which is |CX(I − P_{D′F◦})X′C′|, will cancel out. Put S1 = X(I − P_{D′F◦})X′.
Thus,

|H′Σ̂_{A2}H| / |H′Σ̂_{H2}H|
= |(C′)◦′S⁻¹(C′)◦| / |(C′)◦′S⁻¹[S1 − S1C′(CS1C′)⁻¹CS1]S⁻¹(C′)◦|
= |(C′)◦′S⁻¹(C′)◦| / |(C′)◦′S⁻¹(C′)◦((C′)◦′S1⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹(C′)◦|
= |(C′)◦′S⁻¹(C′)◦|⁻¹ / |(C′)◦′S1⁻¹(C′)◦|⁻¹. (13)

Notice that

S1⁻¹ = [X(I − P_{D′F◦})X′]⁻¹ = (S + XP_{D′(DD′)⁻¹K}X′)⁻¹
     = [S + (XP1)(XP1)′]⁻¹     (put P1 = P_{D′(DD′)⁻¹K})
For the last determinant in (13),

I + P1X′S⁻¹XP1 − P1X′S⁻¹(C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹XP1
= I + P1X′[S⁻¹ − S⁻¹(C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹]XP1. (14)

Thus,

|(C′)◦′S1⁻¹(C′)◦|⁻¹ = |(C′)◦′S⁻¹(C′)◦|⁻¹|P1X′S⁻¹XP1 + I||I + P1X′C′(CSC′)⁻¹CXP1|⁻¹.
Note that

P1 = P_{D′(DD′)⁻¹K} = D′(DD′)⁻¹K(K′(DD′)⁻¹DD′(DD′)⁻¹K)⁻K′(DD′)⁻¹D.

Plug this into the ratio above and take K′(DD′)⁻¹D to the left. Now we take
(K′(DD′)⁻¹DD′(DD′)⁻¹K)⁻ out for both the numerator and the denominator. Then

(15) = |K′(DD′)⁻¹DD′(DD′)⁻¹K + K′(DD′)⁻¹DX′C′(CSC′)⁻¹CXD′(DD′)⁻¹K|
       / |K′(DD′)⁻¹DD′(DD′)⁻¹K + K′(DD′)⁻¹DX′S⁻¹XD′(DD′)⁻¹K|.

Note that DD′(DD′)⁻¹ = I, so K′(DD′)⁻¹DD′(DD′)⁻¹K = K′(DD′)⁻¹K. Moreover, we
know that S⁻¹ = S⁻¹(C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹ + C′(CSC′)⁻¹C, which implies
Put Q = K′(DD′)⁻¹K + K′(DD′)⁻¹DX′C′(CSC′)⁻¹CXD′(DD′)⁻¹K and use the
rotation in Theorem 2.4:

(15) = |I + (C′)◦′S⁻¹XD′(DD′)⁻¹KQ⁻¹K′(DD′)⁻¹DX′S⁻¹(C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹|⁻¹
= |(C′)◦′S⁻¹(C′)◦|⁻¹|((C′)◦′S⁻¹(C′)◦)⁻¹ + ((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹
  × XD′(DD′)⁻¹KQ⁻¹K′(DD′)⁻¹DX′S⁻¹(C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹|⁻¹
= |(C′)◦′S⁻¹(C′)◦|⁻¹ / |((C′)◦′S⁻¹(C′)◦)⁻¹ + ((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹XD′(DD′)⁻¹KQ⁻¹K′
  × (DD′)⁻¹DX′S⁻¹(C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹|.
Now we will find the distributions of the expressions in this ratio. Let us start with
((C′)◦′S⁻¹(C′)◦)⁻¹. We multiply this expression with the following identity matrices
from left and right:

((C′)◦′(C′)◦)⁻¹(C′)◦′ (C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′ (C′)◦((C′)◦′(C′)◦)⁻¹,

and use (C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′ = S − SC′(CSC′)⁻¹CS, so that

((C′)◦′S⁻¹(C′)◦)⁻¹ = ((C′)◦′(C′)◦)⁻¹(C′)◦′[S − SC′(CSC′)⁻¹CS](C′)◦((C′)◦′(C′)◦)⁻¹. (16)

r[(I − PD′)(I − X′C′[CX(I − PD′)X′C′]⁻¹CX(I − PD′))] = N − r(D) − p + 1. (18)
We also need to show that (C′)◦′Σ⁻¹X and X′C′ are independent, which is verified in
Appendix A, Result A3. Thus, we can conclude that the conditional distribution of

((C′)◦′Σ⁻¹(C′)◦)⁻¹(C′)◦′Σ⁻¹X(I − PD′)(I − X′C′[CX(I − PD′)X′C′]⁻¹CX(I − PD′)),

conditioned on CX, is a normal distribution with mean 0 and dispersion matrix
((C′)◦′Σ⁻¹(C′)◦)⁻¹. Then, by Theorem 2.7,

((C′)◦′S⁻¹(C′)◦)⁻¹ ∼ W1(((C′)◦′Σ⁻¹(C′)◦)⁻¹, N − r(D) − p + 1),
which is independent of CX. Now we will move on to the second expression in the ratio
given by (15), which equals

((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹XD′(DD′)⁻¹KQ⁻¹K′(DD′)⁻¹DX′S⁻¹(C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹. (19)

First focus on ((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹X. We use the identity matrix
((C′)◦′(C′)◦)⁻¹(C′)◦′(C′)◦ and the relations (C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹ = I − SC′(CSC′)⁻¹C
and S = X(I − PD′)X′. Then

((C′)◦′(C′)◦)⁻¹(C′)◦′(C′)◦((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹X
= ((C′)◦′(C′)◦)⁻¹(C′)◦′X[I − (I − PD′)X′C′(CX(I − PD′)X′C′)⁻¹CX]. (20)
Moreover, (C′)◦′Σ⁻¹X and X′C′ are independently distributed. Thus, (21) is normally
distributed given CX, and so is

((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹XD′(DD′)⁻¹KQ^{−1/2}. (22)

From the definition of the Wishart distribution given by Definition 2.6, if X is normally
distributed with mean 0 and dispersion I ⊗ Σ, then XX′ ∼ W (Σ, n). So we need to
check the mean and dispersion for (22). The mean is zero, and for the dispersion recall
that it equals

Q^{−1/2}K′(DD′)⁻¹D[I − X′C′(CX(I − PD′)X′C′)⁻¹CX(I − PD′)][I − (I − PD′)
× X′C′(CX(I − PD′)X′C′)⁻¹CX]D′(DD′)⁻¹KQ^{−1/2} ⊗ ((C′)◦′Σ⁻¹(C′)◦)⁻¹(C′)◦′
× Σ⁻¹ΣΣ⁻¹(C′)◦((C′)◦′Σ⁻¹(C′)◦)⁻¹.
(23)
The details for the calculation of (23) are given in Appendix A, Result A4. Based on this
result,

(23) = I ⊗ ((C′)◦′Σ⁻¹(C′)◦)⁻¹.
Hence, we can conclude that (22) is normally distributed with mean 0 and dispersion
I ⊗ ((C′)◦′Σ⁻¹(C′)◦)⁻¹, conditional on CX. Therefore, the square of the matrix in (22)
is Wishart distributed. Put

G = Q^{−1/2}K′(DD′)⁻¹D[I − X′C′(CX(I − PD′)X′C′)⁻¹CX(I − PD′)]

and notice that

G′G (24)

is idempotent, since

(G′G)(G′G) = G′ GG′ G = G′G, where GG′ = I.

For the details of GG′ = I, see Appendix A, Result A4. Thus, (24) is idempotent.
We need to check the rank of this idempotent matrix to determine the degrees of freedom
in the Wishart distribution (see Theorem 2.7):

r((24)) = r(G′G) = tr(G′G) = tr(GG′) = tr(I),

where the second equality follows from Proposition 2.14 (ii) and the third one from
Proposition 2.14 (i).
Furthermore, tr(I) equals the size of GG′. To find the size of GG′, one needs to consider
Q = K′(DD′)⁻¹K + K′(DD′)⁻¹DX′C′(CSC′)⁻¹CXD′(DD′)⁻¹K.
Say K is a q × s matrix with r(K) = s. Then Q is s × s, so the size of GG′ is s = r(K).
As a conclusion,

((C′)◦′S⁻¹(C′)◦)⁻¹(C′)◦′S⁻¹XD′(DD′)⁻¹KQ⁻¹K′(DD′)⁻¹DX′S⁻¹(C′)◦
× ((C′)◦′S⁻¹(C′)◦)⁻¹ ∼ W1(((C′)◦′Σ⁻¹(C′)◦)⁻¹, r(K)).
Thus, the distribution for (15) is given by |U |/|U + V |, where

U ∼ W1(((C′)◦′Σ⁻¹(C′)◦)⁻¹, N − r(D) − p + 1),
V ∼ W1(((C′)◦′Σ⁻¹(C′)◦)⁻¹, r(K)).

If we pre- and post-multiply U and V with ((C′)◦′Σ⁻¹(C′)◦)^{1/2}, and denote the new
expressions by Ũ and Ṽ respectively, then the ratio becomes |Ũ|/|Ũ + Ṽ|, where
Ũ ∼ W1(1, N − r(D) − p + 1) and Ṽ ∼ W1(1, r(K)). Then

λ^{2/N} = |Ũ|/|Ũ + Ṽ| ∼ Λ(1, N − r(D) − p + 1, r(K)).
3.2.3 Flatness Hypothesis
Assuming that the profiles are parallel, we will test if they are flat or not.
H3 |H1 : E(X) = M D, CM = 0,
A3 |H1 : E(X) = M D, CM F = 0, (25)
where
Then
λ^{2/N} ∼ Λ(p − 1, N − r(D) + r(K), r(D′F◦)).
Proof. Equivalent expressions for the restrictions in both hypotheses can be written as

H3 : CM = 0 ⇔ M = (C′)◦θ,
A3 : CM F = 0 ⇔ M = (C′)◦θ1 + C′θ2 F◦′.

Plugging these solutions into the model gives

H3 : X = (C′)◦θD + E,
A3 : X = (C′)◦θ1 D + C′θ2 F◦′D + E.
We cannot simply say that (27) ≥ |X(I − PD′)X′| with equality if and only if XPD′ =
(C′)◦θD, because XPD′ = (C′)◦θD is not necessarily a consistent equation. Recall the
two conditions for consistency from Theorem 2.2:
(i) C(PD′X′) ⊆ C(D′) is satisfied.
Let G = XPD′ − (C′)◦θD. Using Theorem 2.3,

(27) = |S||I + G′S⁻¹(C′)◦((C′)◦′S⁻¹(C′)◦)⁻(C′)◦′S⁻¹G + G′C′(CSC′)⁻CG|
≥ |S||I + G′C′(CSC′)⁻CG|. (28)

Equality in (28) holds when the first quadratic term vanishes, which is equivalent to

G′S⁻¹(C′)◦((C′)◦′S⁻¹(C′)◦)⁻ = 0.
Thus, the lower bound which we were seeking in (28), and which equals |S||I +
G′C′(CSC′)⁻CG| = |S||I + PD′X′C′(CSC′)⁻CXPD′|, has been obtained. The situation
for the third hypothesis is similar to the level hypothesis, where we have CM F = 0
as the alternative hypothesis. We assume that the profiles are parallel and the test is
conducted to see if they are flat or not. The restrictions for this test can be summarised
as CM = 0. Due to the assumption that parallelism holds, the alternative hypothesis
becomes CM F = 0. Then (9) from Section 3.2.1, where the likelihood for CM F = 0
has been derived, will be used for |NΣ̂_{A3}|. As a result,
In order to get a familiar structure for the ratio of these two quantities, we need to
make some changes in |NΣ̂_{H3}|. Use the rotation given by Theorem 2.4, the idempotency
PD′PD′ = PD′, and I = SS⁻¹:

|NΣ̂_{H3}| = |S||I + PD′X′C′(CSC′)⁻CSS⁻¹XPD′|     (rotate)
= |S||I + XPD′X′C′(CSC′)⁻CS S⁻¹|     (C′(CSC′)⁻CS = P_{C′,S⁻¹}, P_{C′,S⁻¹} = P²_{C′,S⁻¹}, then rotate)
= |S||I + S⁻¹P′_{C′,S⁻¹}XPD′X′P_{C′,S⁻¹}|.

Hence,

|NΣ̂_{H3}| = |S + P′_{C′,S⁻¹}XPD′X′P_{C′,S⁻¹}|;
We already know that XPD′X′ is Wishart distributed (see Theorem 2.7) but
P′_{C′,S⁻¹}XPD′X′P_{C′,S⁻¹} is not. Similarly, XP_{D′(DD′)⁻¹K}X′ is Wishart distributed but
P′_{C′,S⁻¹}XP_{D′(DD′)⁻¹K}X′P_{C′,S⁻¹} is not. Let H = (C′, S⁻¹(C′)◦) be of full rank:

λ^{2/N} = |H′Σ̂_{A3}H| / |H′Σ̂_{H3}H|.
The alternative hypothesis for the flatness test is the same as for the level test. Consequently,
the corresponding likelihood will have the same form, so H′Σ̂_{A2}H = H′Σ̂_{A3}H.
Notice that the matrix H used during the level test is the same as the matrix H
introduced here. Then

|H′Σ̂_{A3}H| = |CSC′ + CXP_{D′(DD′)⁻¹K}X′C′||(C′)◦′S⁻¹(C′)◦|.
Thus, the ratio becomes

λ^{2/N} = |H′Σ̂_{A3}H| / |H′Σ̂_{H3}H|
= |CSC′ + CXP_{D′(DD′)⁻¹K}X′C′||(C′)◦′S⁻¹(C′)◦| / (|CSC′ + CXPD′X′C′||(C′)◦′S⁻¹(C′)◦|)
= |CSC′ + CXP_{D′(DD′)⁻¹K}X′C′| / |CSC′ + CXPD′X′C′|,
where CSC′, CXP_{D′(DD′)⁻¹K}X′C′ and CXPD′X′C′ are all Wishart distributed. However,
we are trying to find a structure |U |/|U + V | in which U and V are independently
Wishart distributed. Recall the space decomposition given in Equation (8) for the
parallelism hypothesis. It implies

λ^{2/N} = |H′Σ̂_{A3}H| / |H′Σ̂_{H3}H|
= |CSC′ + CXP_{D′(DD′)⁻¹K}X′C′| / |CSC′ + CX(P_{D′(DD′)⁻¹K} + P_{D′F◦})X′C′|
= |CSC′ + CXP_{D′(DD′)⁻¹K}X′C′| / |CSC′ + CXP_{D′(DD′)⁻¹K}X′C′ + CXP_{D′F◦}X′C′|.

We already know that CSC′ ∼ W_{p−1}(CΣC′, N − r(D)) and, by Theorem 2.6, the sum
of two independently distributed Wishart matrices with the same scale matrix is again
Wishart. Then
4 High-dimensional setting
4.1 Background
The classical setting for data analysis usually consists of a large number of experimental
units and a small number of variables. For estimability reasons, the number of data points,
n, needs to be larger than the number of parameters, p. Asymptotic properties have been
derived in this classical setting: theorems such as the law of large numbers and the
central limit theorem concern the case where p is fixed and n → ∞.
In recent years, due to the development of information technology and data storage, the
direction of the relationship between p and n has started to change. We face more and
more research questions where we have more parameters than data points:

p > n or p ≫ n. (29)
are called score coefficients or weights. With this approach, high-dimensional observations
are compressed into low-dimensional scores, which are then used for the analysis instead
of the original data. This approach can be useful in many situations, because we often do
not have knowledge of the effect of each single variable, or one may want to investigate
the joint effect of several variables.
Let us give the mathematical representation of the theory. Suppose

x = (xi ) ∼ Np (µ, Σ),
z′ = (z1 , z2 , · · · , zn ) = (d1 , d2 , · · · , dp )X = d′X,

where d is the vector of weights and the zj , j = 1, ..., n, are the individual scores. The
rule for choosing the vector d of coefficients is that it has to be a unique function
of XX′, the p × p matrix of sums of products. Moreover, the condition
d′X ≠ 0 with probability 1 needs to be satisfied. The total sums of products matrix
XX′ corresponds to the hypothesis µ = 0; consequently, the structure of the function
can change depending on the hypothesis. We will illustrate the idea with two primary
theorems presented in Läuter, Glimm and Kropf (1996).
Theorem 4.1. (Läuter et al., 1996) Assume that X is a p × n matrix consisting
of n p-dimensional observations (p ≥ 1, n ≥ 2) that follows the normal distribution
X ∼ Np,n (0, Σ, In ). Define a p-dimensional vector of weights d which is a function of
XX′ and assume d′X ≠ 0 with probability 1. Then

t = √n z̄ / sz (30)

has the exact t-distribution with n − 1 degrees of freedom, where

z′ = (zj )′ = d′X,  z̄ = (1/n) z′1n,  s²z = (1/(n − 1))(z′z − nz̄²).
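A small simulation illustrates Theorem 4.1: although d is computed from the data, it depends on the data only through XX′, and the empirical moments of t match those of a t with n − 1 degrees of freedom. The particular weight rule dᵢ = (XX′)ᵢᵢ^{−1/2} used below is just one admissible choice:

```python
import numpy as np

rng = np.random.default_rng(6)
p, n, reps = 5, 20, 20000

# A non-diagonal covariance matrix (arbitrary example).
Sigma = 0.5 * np.ones((p, p)) + 0.5 * np.eye(p)
L = np.linalg.cholesky(Sigma)

t_vals = np.empty(reps)
for r in range(reps):
    X = L @ rng.normal(size=(p, n))            # X ~ N_{p,n}(0, Sigma, I_n)
    W = X @ X.T                                # total sums-of-products matrix
    d = 1.0 / np.sqrt(np.diag(W))              # weights: a function of XX' only
    z = d @ X                                  # scores z' = d'X
    zbar = z.mean()
    s2 = (z @ z - n * zbar**2) / (n - 1)
    t_vals[r] = np.sqrt(n) * zbar / np.sqrt(s2)

# t_{n-1} has mean 0 and variance (n - 1)/(n - 3).
print(t_vals.mean(), t_vals.var(), (n - 1) / (n - 3))
```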
Theorem 4.2. (Läuter et al., 1996) Assume that H ∼ Wp (Σ, m) and G ∼ Wp (Σ, f )
and that they are independently distributed. Define a p-dimensional vector of weights d
which is a function of H + G and assume d′(H + G)d ≠ 0 with probability 1. Then

F = (f/m) · (d′Hd)/(d′Gd)

follows an F-distribution with m and f degrees of freedom.
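Theorem 4.2 can be sketched in the same way; here the Wishart matrices are generated from normal samples, and the weights depend on the data only through H + G. The dimensions and the covariance matrix below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(7)
p, m, f, reps = 4, 3, 50, 5000

Sigma = np.eye(p) + 0.4          # I + 0.4*ones: positive definite example
L = np.linalg.cholesky(Sigma)

F_vals = np.empty(reps)
for r in range(reps):
    # H ~ W_p(Sigma, m) and G ~ W_p(Sigma, f) from matrix-normal samples.
    Yh = L @ rng.normal(size=(p, m))
    Yg = L @ rng.normal(size=(p, f))
    H, G = Yh @ Yh.T, Yg @ Yg.T
    d = 1.0 / np.sqrt(np.diag(H + G))   # weights: a function of H + G only
    F_vals[r] = (f / m) * (d @ H @ d) / (d @ G @ d)

# An F_{m,f} variable has mean f / (f - 2).
print(F_vals.mean(), f / (f - 2))
```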
The ideas behind these theorems are based on the theory of spherical distributions, which
has been treated extensively in the book by Fang and Zhang (1990). Elliptically contoured
distributions can be considered as a generalization of the class of Gaussian distributions,
which has been the centre of multivariate theory. Normality is assumed for many testing
problems, but in practice this is often not true. Thus, there has been an effort to extend
the class of normal distributions to a wider class which still keeps the basic properties of
the normal distribution.
We know that if z ∼ Nn (0, In ), the statistic t given in (30) has a t-distribution with n − 1
degrees of freedom, so we need to show a connection between z and the standard normal
distribution. Since the normal distribution is in the class of spherical distributions, if
one can show that z is spherically distributed, then this connection is provided.
We also need to show that the test statistics' distributions remain the same when we
use spherically distributed random vectors. These ideas are given by a corollary and a
theorem, among others, by Fang and Zhang (1990).
Theorem 4.3. (Fang and Zhang, 1990) A statistic t(x)'s distribution remains the
same whenever x ∼ S⁺n (φ) if t(αx) =ᵈ t(x) for each α > 0 and each x ∼ S⁺n (·), where
Sn (φ) denotes the spherical distribution with parameter φ, and φ(·), a function of a
scalar variable, is called the characteristic generator of the spherical distribution.
If x ∼ Sn (φ) and P (x = 0) = 0, this is denoted by x ∼ S⁺n (φ).
H1 : E(X) = M D, CM F = 0,
A1 : E(X) = M D, CM F ≠ 0,

and

LR = |Σ̂_{A1}| / |Σ̂_{H1}| = |CSC′| / |CSC′ + CXP_{D′(DD′)⁻¹K}X′C′|,

where
In the beginning it was assumed that X ∼ Np,N (M D, Σ, IN ). If we multiply X with
C, then CX ∼ N_{p−1,N}(CM D, CΣC′, IN ). As we can see through the derivation of
the statistics in (31), X appears together with C. Let Y = CX. Then

Theorem 4.4. The ratio given in (32) follows Wilks' lambda distribution with parameters
1, r(K) and N − r(D), that is denoted by Λ(1, r(K), N − r(D)), which is equivalent to
B((N − r(D))/2, r(K)/2), where B(·, ·) denotes the Beta-distribution.
which means that we cannot find the distribution of the ratio directly. This is where we
need the theory of spherical distributions. We begin by showing that the scores are
spherically distributed. To show this, we first need to check whether Y is spherically
distributed. Define YΓ = Y Γ for an orthogonal Γ. Then

D(YΓ ) = (Γ′Γ) ⊗ Σ = I ⊗ Σ, but E(YΓ ) ≠ E(Y ).

Thus, Y is not spherically distributed. Therefore, we need to adapt the test statistic,
without changing its overall value, in order to achieve sphericity.
Recall that the model under the null hypothesis is
$$X = (C')^{\circ}\theta_1 D + C'\theta_2 F^{\circ\prime} D + E.$$
We subtract the mean under the null hypothesis from $X$. Then the expressions in the ratio given by (32) become as follows:
Then
$$(ii) = d'CXP_{D'(DD')^{-1}K}X'C'd = d'YP_{D'(DD')^{-1}K}Y'd.$$
This means that we can subtract the mean from $X$ while the expression of the likelihood ratio remains the same, which means that the distribution will also remain the same:
$$\lambda^{2/N} = \frac{d'Y(I - P_{D'})Y'd}{d'Y(I - P_{D'})Y'd + d'YP_{D'(DD')^{-1}K}Y'd}$$
$$= \frac{d'[Y - CE(X)](I - P_{D'})[Y - CE(X)]'d}{d'[Y - CE(X)](I - P_{D'})[Y - CE(X)]'d + d'[Y - CE(X)]P_{D'(DD')^{-1}K}[Y - CE(X)]'d}.$$
Let $\widetilde{Y} = Y - CE(X)$. Then
$$\lambda^{2/N} = \frac{d'\widetilde{Y}(I - P_{D'})\widetilde{Y}'d}{d'\widetilde{Y}(I - P_{D'})\widetilde{Y}'d + d'\widetilde{Y}P_{D'(DD')^{-1}K}\widetilde{Y}'d}.$$
The reason why we needed to adapt $Y$ was that it was not spherically distributed. Without changing the statistic overall, we have now obtained a new variable $\widetilde{Y}$ which is spherically distributed. To show this, note that $\widetilde{Y}$ has mean $0$ and the same covariance as $Y$. Define $\widetilde{Y}_\Gamma = \widetilde{Y}\Gamma$, where $\Gamma$ is an $N \times N$ orthogonal matrix; the subscript $\Gamma$ means multiplication by $\Gamma$ from the right-hand side. Then
$$E(\widetilde{Y}_\Gamma) = E(\widetilde{Y}\Gamma) = 0, \qquad D(\widetilde{Y}_\Gamma) = (\Gamma'\Gamma) \otimes \Sigma = I \otimes \Sigma.$$
This proves that $\widetilde{Y}_\Gamma$ and $\widetilde{Y}$ follow the same $(p-1) \times N$ normal distribution. Now we can show that the scores, $z' = d'\widetilde{Y}$, are spherically distributed. First, define $z_\Gamma' = d_\Gamma'\widetilde{Y}_\Gamma$. Notice that $d_\Gamma = d$, since both are derived from the same matrix:
$$\widetilde{Y}_\Gamma\widetilde{Y}_\Gamma' = \widetilde{Y}\Gamma(\widetilde{Y}\Gamma)' = \widetilde{Y}\Gamma\Gamma'\widetilde{Y}' = \widetilde{Y}\widetilde{Y}'.$$
In Chapter 4.2, it was stated that $d$ needs to be a unique function of the total sums of products matrix; the derivation above is based on this fact. Then
$$z_\Gamma' = d'\widetilde{Y}\Gamma = z'\Gamma. \qquad (34)$$
From this result, we can say that the vectors $z'$ and $z_\Gamma'$ follow the same distribution, which is spherical. It is well known that every spherically distributed random vector $x$ has a stochastic representation $x \stackrel{d}{=} Ru^{(n)}$, where $u^{(n)}$ is a uniformly distributed random vector and $R \geq 0$ is independent of $u^{(n)}$. Furthermore, if $x \sim S_n(\phi)$, then $x \stackrel{d}{=} Rw$, where $w \sim N_n(0, I_n)$ is independent of $R \geq 0$. Lastly, recall Theorem 2.5.8 of Fang and Zhang (1990), which is also given in Chapter 4.2 as Theorem 4.3. Now we can work on the ratio $\lambda^{2/N}$ and implement these results:
$$f(z) = \lambda^{2/N} = \frac{z'(I - P_{D'})z}{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z}$$
$$\stackrel{d}{=} \frac{(Rw)'(I - P_{D'})(Rw)}{(Rw)'(I - P_{D'})(Rw) + (Rw)'P_{D'(DD')^{-1}K}(Rw)}$$
$$= \frac{R^2\,w'(I - P_{D'})w}{R^2\big(w'(I - P_{D'})w + w'P_{D'(DD')^{-1}K}w\big)},$$
which does not depend on $R$. The condition of the theorem of Fang and Zhang is satisfied. Thus, we can conclude that
$$\lambda^{2/N} = \frac{z'(I - P_{D'})z}{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z} \sim \Lambda(1, N - r(D), r(K)) \equiv B\!\left(\frac{N - r(D)}{2}, \frac{r(K)}{2}\right). \qquad (35)$$
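To make the procedure concrete, the following sketch computes a score-based statistic of the type in (35) on simulated data. It is only schematic: the two-group design, the successive-difference contrast matrix and the choice of $d$ as the leading eigenvector of $YY'$ (one admissible function of $YY'$) are illustrative assumptions, and the projection $P_{D'(DD')^{-1}K}$ is replaced by the full projection onto $\mathcal{C}(D')$ for simplicity.

```python
import numpy as np

rng = np.random.default_rng(2)
p, N = 40, 20                                  # high-dimensional: p > N
D = np.kron(np.eye(2), np.ones(N // 2))        # two-group design, 2 x N
C = np.eye(p - 1, p) - np.eye(p - 1, p, 1)     # successive-difference contrasts
X = rng.standard_normal((p, N))                # data simulated under the null
Y = C @ X                                      # (p-1) x N
# Score vector d: must be a function of Y Y'; take the leading eigenvector
_, V = np.linalg.eigh(Y @ Y.T)
d = V[:, -1]
z = d @ Y                                      # scores, a vector of length N
PD = D.T @ np.linalg.inv(D @ D.T) @ D          # projection onto C(D')
PK = PD            # stand-in for P_{D'(DD')^{-1}K} (simplifying assumption)
num = z @ (np.eye(N) - PD) @ z
lam = num / (num + z @ PK @ z)                 # a ratio of the form in (35)
assert 0.0 < lam < 1.0                         # a Beta-distributed quantity
```

Note that the whole $p$-dimensional problem has been reduced to quadratic forms in the $N$-vector of scores $z$, which is what makes the test feasible when $p > N$.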
4.3.2 Level hypothesis
First, let us recall the null and the alternative hypotheses in the normal setting:
$$H_2|H_1: E(X) = MD,\ MF = 0, \qquad A_2|H_1: E(X) = MD,\ CMF = 0.$$
For this hypothesis, it is clear that the expressions in the likelihood ratio are already one-dimensional. The contrast matrix was given explicitly in Chapter 3.1. It is a $(p-1) \times p$ matrix and
$$C: (p-1) \times p \;\Rightarrow\; C': p \times (p-1) \;\Rightarrow\; (C')^{\circ}: p \times 1 \;\Rightarrow\; (C')^{\circ\prime}: 1 \times p.$$
However, dimension reduction is needed due to the degrees of freedom in the Wishart distribution. Recall
$$((C')^{\circ\prime}S^{-1}(C')^{\circ})^{-1} \sim W_1\big(((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1},\ N - r(D) - p + 1\big).$$
When $p > N$, the degrees of freedom become negative, which is not possible. Moreover, $S^{-1}$ does not exist in this case. Thus, we need to take care of these issues.
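A short numerical check of this dimensionality problem (the sizes are arbitrary illustrations): with $p > N$, the matrix $S = X(I - P_{D'})X'$ has rank at most $N - r(D) < p$, so $S^{-1}$ does not exist, and the degrees of freedom $N - r(D) - p + 1$ are negative.

```python
import numpy as np

rng = np.random.default_rng(3)
p, N = 10, 6                                   # p > N
D = np.ones((1, N))                            # one-group design, r(D) = 1
X = rng.standard_normal((p, N))
PD = D.T @ np.linalg.inv(D @ D.T) @ D          # projection onto C(D')
S = X @ (np.eye(N) - PD) @ X.T                 # p x p, but rank at most N - r(D)
assert np.linalg.matrix_rank(S) == N - 1       # rank 5 < p = 10: S is singular
assert N - 1 - p + 1 < 0                       # the Wishart df would be negative
```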
One can notice that the situation here is more complicated, because the expressions in the ratio are more complex than for the parallelism and flatness hypotheses. To solve the problem, we expand $((C')^{\circ\prime}S^{-1}(C')^{\circ})^{-1}$ and $((C')^{\circ\prime}S^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}S^{-1}X$. The expansions are investigated in two parts, (i) and (ii), respectively:
(i) We start with $((C')^{\circ\prime}S^{-1}(C')^{\circ})^{-1}$:
$$((C')^{\circ\prime}S^{-1}(C')^{\circ})^{-1} = ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}S^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}$$
$$= ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}[S - SC'(CSC')^{-1}CS]\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}. \qquad (37)$$
Inspecting (38), one notices the structure $LXPX'L'$, where $P$ is an idempotent matrix. This was the aim from the beginning, and it allows us to give the exact distribution of (38). Notice that $(C')^{\circ\prime}\Sigma^{-1}X$ and $CX$ are independent, which is proved in Appendix A, Result A3.
It is assumed that $d$ is a function of $CX$. Therefore,
$$((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}X[(I - P_{D'})(I - X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'}))] \qquad (39)$$
is normally distributed given $CX$, and its square is Wishart distributed. Let us check the rank of the idempotent matrix given below to determine the degrees of freedom of the Wishart distribution; for the detailed calculations, see Appendix B, Result B1:
$$r[(I_N - P_{D'})(I_N - X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'}))] = N - r(D) - 1.$$
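This rank statement can also be verified numerically (Result B1 proves it algebraically). The sketch below uses arbitrary sizes and random matrices in the roles of $D$, $CX$ and $d$, and exploits that the rank of an idempotent matrix equals its trace.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, rD = 12, 5, 2                            # illustrative sizes
D = rng.standard_normal((rD, N))               # full-rank design, r(D) = 2
Y = rng.standard_normal((p, N))                # plays the role of CX
d = rng.standard_normal(p)                     # an arbitrary weight vector
G = np.eye(N) - D.T @ np.linalg.inv(D @ D.T) @ D   # I - P_{D'}
a = Y.T @ d                                    # X'C'd as an N-vector
# The matrix whose rank is computed above: (I - P_{D'}) times a rank-one deflation
M = G @ (np.eye(N) - np.outer(a, G @ a) / (a @ G @ a))
assert np.allclose(M @ M, M)                   # M is idempotent
assert round(np.trace(M)) == N - rD - 1        # rank = trace = 12 - 2 - 1 = 9
```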
The degrees of freedom of the Wishart distribution for (38) have now been found. Next, the covariance matrix needs to be obtained. From Theorem 2.12,
$$D[\mathrm{vec}(((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}X)] = I \otimes ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}\Sigma\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1} = I \otimes ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}. \qquad (40)$$
We have examined the first expression in the ratio given by (36). Dimension reduction has been applied with the help of the vector $d$ after using the necessary expansions, and the distribution of this expression, that is (38), has been found.
We continue with the second expression, which appears in the denominator of (36).
(ii) Let us start with the first part, $((C')^{\circ\prime}S^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}S^{-1}X$. Recall also the relations $(C')^{\circ}((C')^{\circ\prime}S^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}S^{-1} = I - SC'(CSC')^{-1}C$ and $S = X(I - P_{D'})X'$:
$$((C')^{\circ\prime}S^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}S^{-1}X = ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}S^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}S^{-1}X$$
$$= ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}X[I - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX].$$
Applying $d'$ to $CX$ yields
$$\widetilde{Q} = K'(DD')^{-1}K + K'(DD')^{-1}DX'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CXD'(DD')^{-1}K.$$
Then, after the dimension reduction, the second expression in the denominator of the likelihood ratio given by (36) becomes
$$((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}X[I - (I - P_{D'})X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K\widetilde{Q}^{-1}$$
$$\times\; K'(DD')^{-1}D[I - X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'})]X'\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}. \qquad (42)$$
In (42),
$$[I - (I - P_{D'})X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K\widetilde{Q}^{-1}K'(DD')^{-1}D[I - X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'})], \qquad (43)$$
which corresponds to the part that lies between $X$ and $X'$, is an idempotent matrix.
For the details, see Appendix B, Result B2. This means that for (42) the structure $LXPX'L'$, where $P$ is an idempotent matrix, has been achieved. We just need to show that $(C')^{\circ\prime}\Sigma^{-1}X$ and $CX$ are independent, which has already been shown in this chapter under (i). Since $d$ is a function of $CX$ and $\widetilde{Q}$ also depends on $d'CX$, one can conclude that $(C')^{\circ\prime}\Sigma^{-1}X$ is independent of what lies in the idempotent matrix given by (43). Thus, (42) is Wishart distributed. For the degrees of freedom of the Wishart distribution, one needs to check the rank of (43), which equals $r(K)$, where $K$ is any matrix satisfying $\mathcal{C}(K) = \mathcal{C}(D) \cap \mathcal{C}(F)$, first defined in Theorem 3.1. For the proof, see Appendix B, Result B3.
For the scale matrix of the Wishart distribution for (42), see the following, which was also calculated in (40):
$$D[\mathrm{vec}(((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}X)] = I \otimes ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}.$$
Our aim was to find the distribution of the ratio given by (36) after the dimension reduction. The expressions from this ratio have been investigated separately in (i) and (ii) above. Putting the results from (i) and (ii) together, we can conclude the following:
$$((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}X[(I - P_{D'})(I - X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'}))]X'\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}$$
$$\sim W_1\big(((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1},\ N - r(D) - 1\big); \qquad (44)$$
$$((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}X[I - (I - P_{D'})X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K\widetilde{Q}^{-1}K'(DD')^{-1}D$$
$$\times\; [I - X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'})]X'\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1} \sim W_1\big(((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1},\ r(K)\big). \qquad (45)$$
These expressions can be simplified in the following way:
$$(44) = ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}[X(I - P_{D'})X' - X(I - P_{D'})X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'})X']\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}$$
$$= ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}[S - SC'd(d'CSC'd)^{-1}d'CS]\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}$$
$$= ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}[S(I - P_{C'd,S^{-1}})]\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}.$$
Then
$$((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}[S(I - P_{C'd,S^{-1}})]\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1} \sim W_1\big(((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1},\ N - r(D) - 1\big).$$
Moreover,
$$(45) = ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}[X - SC'd(d'CSC'd)^{-1}d'CX]D'(DD')^{-1}K\widetilde{Q}^{-1}K'(DD')^{-1}D[X' - X'C'd(d'CSC'd)^{-1}d'CS]\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}$$
$$= ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}(I - P_{C'd,S^{-1}}')XD'(DD')^{-1}K\widetilde{Q}^{-1}K'(DD')^{-1}DX'(I - P_{C'd,S^{-1}})\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}$$
$$\sim W_1\big(((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1},\ r(K)\big).$$
Denote (44) by $\widetilde{U}_L$ and (45) by $\widetilde{V}_L$. Then the following theorem can be presented:
Theorem 4.5.
$$\lambda^{2/N} = \frac{\widetilde{U}_L}{\widetilde{U}_L + \widetilde{V}_L} \sim \Lambda(1, N - r(D) - 1, r(K)) \equiv B\!\left(\frac{N - r(D) - 1}{2}, \frac{r(K)}{2}\right).$$
4.3.3 Flatness hypothesis
We will follow a very similar approach to the one for the parallelism hypothesis, due to the similarities between the test statistics in the likelihood ratio. Recall the null and the alternative hypotheses:
$$H_3|H_1: E(X) = MD,\ CM = 0, \qquad A_3|H_1: E(X) = MD,\ CMF = 0,$$
and
$$\mathrm{LR} = \frac{|\hat{\Sigma}_{A_3}|}{|\hat{\Sigma}_{H_3}|} = \frac{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C' + CXP_{D'F^{\circ}}X'C'|},$$
where
$$CSC' + CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C',\ N - r(D) + r(K)), \qquad (46)$$
$$CXP_{D'F^{\circ}}X'C' \sim W_{p-1}(C\Sigma C',\ r(D'F^{\circ})). \qquad (47)$$
Recall that $CSC' = CX(I - P_{D'})X'C'$. Put $Y = CX$; since $X \sim N_{p,N}(MD, \Sigma, I_N)$, we have $Y \sim N_{(p-1),N}(CMD, C\Sigma C', I_N)$. To reduce the dimension, we apply the weight vector $d$ to $Y$ instead of to the data matrix $X$, as we did for the parallelism hypothesis, and denote the new vector by $z$, where $z' = d'Y$. Then
$$\lambda^{2/N} = \frac{d'Y(I - P_{D'})Y'd + d'YP_{D'(DD')^{-1}K}Y'd}{d'Y(I - P_{D'})Y'd + d'YP_{D'(DD')^{-1}K}Y'd + d'YP_{D'F^{\circ}}Y'd} = \frac{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z}{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z + z'P_{D'F^{\circ}}z}. \qquad (48)$$
The distribution of this ratio is presented in the next theorem.
Theorem 4.6. The ratio given in (48) follows Wilks' lambda distribution with parameters 1, $N - r(D) + r(K)$ and $r(D'F^{\circ})$, denoted by $\Lambda(1, N - r(D) + r(K), r(D'F^{\circ}))$, which equals $B\!\left(\frac{N - r(D) + r(K)}{2}, \frac{r(D'F^{\circ})}{2}\right)$.
Proof. The situation with $d$ being a function of $YY'$ is the same as for the parallelism hypothesis. As a consequence, $d'Y$ is not normally distributed, and thus
$$d'Y(I - P_{D'})Y'd + d'YP_{D'(DD')^{-1}K}Y'd \nsim W_1(d'C\Sigma C'd,\ N - r(D) + r(K)),$$
$$d'YP_{D'F^{\circ}}Y'd \nsim W_1(d'C\Sigma C'd,\ r(D'F^{\circ})).$$
Then we will use the related theorems on spherical distributions presented in Section 4.2. One can see that $Y$ is not spherically distributed, because the orthogonally transformed matrix $Y_\Gamma$ does not have the same distribution as $Y$, since $E(Y_\Gamma) \neq E(Y)$. Here again, as for the first (parallelism) hypothesis, we form a new variable by subtracting the mean under the null hypothesis from $X$. The statistics in the likelihood ratio will not change, and we ensure sphericity of the scores by ensuring sphericity of the new variable.
Recall that the model under the null hypothesis equals the classical growth curve model (GMANOVA),
$$X = (C')^{\circ}\theta D + E,$$
and the mean under the null hypothesis is given by $(C')^{\circ}\theta D$.
After subtracting the mean from $X$, the ratio can be written
$$\lambda^{2/N} = \frac{d'[Y - CE(X)](I - P_{D'})[Y - CE(X)]'d + d'[Y - CE(X)]P_{D'(DD')^{-1}K}[Y - CE(X)]'d}{d'[Y - CE(X)](I - P_{D'})[Y - CE(X)]'d + d'[Y - CE(X)]P_{D'(DD')^{-1}K}[Y - CE(X)]'d + d'[Y - CE(X)]P_{D'F^{\circ}}[Y - CE(X)]'d},$$
since
$$d'C[X - E(X)](I - P_{D'})[X - E(X)]'C'd = d'C[X - (C')^{\circ}\theta D](I - P_{D'})[X - (C')^{\circ}\theta D]'C'd = d'CX(I - P_{D'})X'C'd = d'Y(I - P_{D'})Y'd; \qquad (49)$$
$$d'C[X - E(X)]P_{D'(DD')^{-1}K}[X - E(X)]'C'd = d'C[X - (C')^{\circ}\theta D]P_{D'(DD')^{-1}K}[X - (C')^{\circ}\theta D]'C'd = d'CXP_{D'(DD')^{-1}K}X'C'd = d'YP_{D'(DD')^{-1}K}Y'd; \qquad (50)$$
$$d'C[X - E(X)]P_{D'F^{\circ}}[X - E(X)]'C'd = d'CXP_{D'F^{\circ}}X'C'd = d'YP_{D'F^{\circ}}Y'd. \qquad (51)$$
From (49), (50) and (51), we can conclude that the distribution of the ratio remains the same after we subtract the mean under the null hypothesis from $X$. Let $\widetilde{Y} = Y - CE(X)$. Then
$$\lambda^{2/N} = \frac{d'\widetilde{Y}(I - P_{D'})\widetilde{Y}'d + d'\widetilde{Y}P_{D'(DD')^{-1}K}\widetilde{Y}'d}{d'\widetilde{Y}(I - P_{D'})\widetilde{Y}'d + d'\widetilde{Y}P_{D'(DD')^{-1}K}\widetilde{Y}'d + d'\widetilde{Y}P_{D'F^{\circ}}\widetilde{Y}'d}.$$
The next step is to show that $\widetilde{Y}$ is spherically distributed. This is exactly the same as for the parallelism hypothesis, so we omit the details. The scores, $z' = d'\widetilde{Y}$, are then also spherically distributed, according to the result in (34). Finally, knowing that $z$ is spherically distributed and using Theorem 4.3 of Fang and Zhang, we can establish the theorem:
$$\lambda^{2/N} = f(z) = \frac{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z}{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z + z'P_{D'F^{\circ}}z}$$
$$\stackrel{d}{=} \frac{(Rw)'(I - P_{D'})(Rw) + (Rw)'P_{D'(DD')^{-1}K}(Rw)}{(Rw)'(I - P_{D'})(Rw) + (Rw)'P_{D'(DD')^{-1}K}(Rw) + (Rw)'P_{D'F^{\circ}}(Rw)}$$
$$= \frac{R^2\big(w'(I - P_{D'})w + w'P_{D'(DD')^{-1}K}w\big)}{R^2\big(w'(I - P_{D'})w + w'P_{D'(DD')^{-1}K}w + w'P_{D'F^{\circ}}w\big)}.$$
One can see that $f(z)$ does not depend on $R$, which means that the ratio keeps its distribution. As a result,
$$\lambda^{2/N} = \frac{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z}{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z + z'P_{D'F^{\circ}}z} \sim \Lambda(1, N - r(D) + r(K), r(D'F^{\circ})) \equiv B\!\left(\frac{N - r(D) + r(K)}{2}, \frac{r(D'F^{\circ})}{2}\right),$$
and the theorem is proven.
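A simulation sketch of this invariance, with stand-in projections (an arbitrary four-group design; the first two directions of $\mathcal{C}(D')$ play the role of $P_{D'(DD')^{-1}K}$ and the remaining two that of $P_{D'F^{\circ}}$): scores $z = Rw$ built with a non-constant $R$ are spherical but not normal, yet the ratio keeps its Beta distribution because $R$ cancels.

```python
import numpy as np

rng = np.random.default_rng(5)
N, reps = 20, 100_000
D = np.kron(np.eye(4), np.ones(5))             # four groups of five
Q, _ = np.linalg.qr(D.T)                       # orthonormal basis of C(D')
P1 = Q[:, :2] @ Q[:, :2].T                     # stand-in for P_{D'(DD')^{-1}K}
P2 = Q[:, 2:] @ Q[:, 2:].T                     # stand-in for P_{D'F°}
G = np.eye(N) - Q @ Q.T                        # I - P_{D'}, rank N - r(D) = 16
w = rng.standard_normal((reps, N))
R = np.sqrt(rng.chisquare(3, reps) / 3)        # any R >= 0 gives a spherical z
z = w * R[:, None]                             # spherical, non-normal scores
num = np.einsum('ij,jk,ik->i', z, G + P1, z)
lam = num / (num + np.einsum('ij,jk,ik->i', z, P2, z))
# lam ~ Beta((16 + 2)/2, 2/2) regardless of R; its mean is 18/20 = 0.9
assert abs(lam.mean() - 0.9) < 1e-2
```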
5 Conclusion
In this report, profile analysis of several groups is of interest. In contrast to the existing methods proposed by Srivastava (1987, 2002), we treat the problems as problems in MANOVA and GMANOVA. Reformulating the problems in this way is useful in the later stages, when the test statistics are formulated in a high-dimensional setting. For all three hypotheses, we derived a test statistic of the form $\frac{|U|}{|U + V|}$, where $U$ and $V$ are independently Wishart distributed. The distribution of $\frac{|U|}{|U + V|}$ is well known: it is Wilks' lambda distribution, which can be written as a product of Beta distributed variables.
When $p > n$, one is exposed to issues such as the singularity of $S$. Different approaches have been proposed, but the one we focused on is the method of Läuter (1996, 2016) and Läuter, Glimm and Kropf (1996, 1998). The original idea is to take a linear combination of the $p$ variables per individual, which means multiplying the data matrix $X$ by a vector $d'$, where $d$ is a function of $XX'$. We needed to adapt this idea to make it work in profile analysis. Thus, instead of implementing the reduction just for the data matrix, we implemented it for a linear function of $X$ which appears in the likelihood ratio statistic. The level hypothesis is a special case: due to the restrictions applied to the mean parameter space in this hypothesis, the likelihood ratio statistic is already one-dimensional, but in high dimensions the degrees of freedom of the Wishart distribution for $U$ become negative. By dimension reduction, we again attain Wilks' lambda distributions for the ratios, which this time do not depend on $p$. Spherical distribution theory is used to show that the matrices follow Wishart distributions. Notice that we did not need to use spherical distributions for the level hypothesis.
The important question which still needs to be answered is how $d$ should be chosen. It is noted that $d$ needs to be a function of $CX$, but how to determine this function remains an open question.
Appendices
Appendix A includes the technical results in the usual setting (Chapter 3, where $N > p$), whereas Appendix B includes the technical results in the high-dimensional setting (Chapter 4, where $p > N$).
Appendix A
Result A1. Let $W_1 \sim W_p(\Sigma, f_1)$ and $W_2 \sim W_p(\Sigma, f_2)$ with $f_1 \geq p$. Then the distribution of $\Lambda = \frac{|W_1|}{|W_1 + W_2|}$, that is, Wilks' lambda distribution, denoted by $\Lambda(p, f_1, f_2)$, does not depend on $\Sigma$.
To show this, choose a non-singular matrix $A$ which satisfies $A\Sigma A' = I_p$. Then $\widetilde{W}_i = AW_iA' \sim W_p(I_p, f_i)$, $i = 1, 2$, and
$$\widetilde{\Lambda} = \frac{|\widetilde{W}_1|}{|\widetilde{W}_1 + \widetilde{W}_2|} = \frac{|A||W_1||A'|}{|A||W_1 + W_2||A'|} = \Lambda.$$
The distribution of $\widetilde{\Lambda}$ does not depend on $\Sigma$, and since $\widetilde{\Lambda} = \Lambda$, the distribution of $\Lambda$ is also independent of $\Sigma$.
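A numerical check of Result A1 (an illustrative sketch; $\Sigma$ and the transformation are random): the determinant ratio is invariant under any nonsingular transformation, so in particular its distribution cannot depend on $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(6)
p, f1, f2 = 3, 10, 4
A = rng.standard_normal((p, p))

def wishart(f):
    # W = A Z Z' A' ~ W_p(Sigma, f) with Sigma = A A'
    Z = rng.standard_normal((p, f))
    return A @ Z @ Z.T @ A.T

W1, W2 = wishart(f1), wishart(f2)
lam = np.linalg.det(W1) / np.linalg.det(W1 + W2)
B = rng.standard_normal((p, p))                # any nonsingular matrix
lam_B = np.linalg.det(B @ W1 @ B.T) / np.linalg.det(B @ (W1 + W2) @ B.T)
assert np.isclose(lam, lam_B)                  # the ratio is unchanged
```

Because $|BWB'| = |B|^2|W|$, the factor $|B|^2$ cancels in the ratio, which is the algebraic content of the result.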
Based on this result, one can conclude that the distribution of
$$\lambda^{2/N} = \frac{|CSC'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|},$$
where
$$CSC' \sim W_{p-1}(C\Sigma C',\ N - r(D)), \qquad CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C',\ r(K)),$$
does not depend on $\Sigma$, and one can replace $C\Sigma C'$ by $I_{p-1}$. Thus,
Result A2.
As a conclusion, $(C')^{\circ\prime}\Sigma^{-1}X$ and $X'C'$ are independent.
Result A4.
$$(22) = ((C')^{\circ\prime}S^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1/2}.$$
$$(23) = Q^{-1/2}K'(DD')^{-1}D[I - X'C'(CX(I - P_{D'})X'C')^{-1}CX(I - P_{D'})][I - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX]D'(DD')^{-1}KQ^{-1/2}$$
$$\otimes\; ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}\Sigma\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}.$$
The first part of the Kronecker product given by (23) can be calculated as
$$1)\ Q^{-1/2}K'(DD')^{-1}D\big[I - X'C'(CX(I - P_{D'})X'C')^{-1}CX(I - P_{D'}) - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX$$
$$+\; X'C'(CX(I - P_{D'})X'C')^{-1}CX(I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX\big]D'(DD')^{-1}KQ^{-1/2}.$$
The second part of the Kronecker product given by (23) can be calculated as
$$2)\ ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}(C')^{\circ\prime}\Sigma^{-1}\Sigma\Sigma^{-1}(C')^{\circ}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1} = ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1} = ((C')^{\circ\prime}\Sigma^{-1}(C')^{\circ})^{-1}.$$
Appendix B
Result B1. The rank of the idempotent matrix that appears in the middle part of (39) is calculated as
$$r[(I_N - P_{D'})(I_N - X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'}))]$$
$$= r(I_N - P_{D'}) + r[I_N - X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'})] - N \quad \text{(Prop. 2.13)}$$
$$= N - r(D) + N - r(X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'})) - N \quad \text{(Prop. 2.14, ii)}$$
$$= N - r(D) - \mathrm{tr}(X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'})) \quad \text{(Prop. 2.14, i)}$$
$$= N - r(D) - \mathrm{tr}(d'CX(I - P_{D'})X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}) = N - r(D) - \mathrm{tr}(1) = N - r(D) - 1.$$
Result B2. Put
$$B = [I - (I - P_{D'})X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K\widetilde{Q}^{-1/2}.$$
Calculate
$$B'B = \widetilde{Q}^{-1/2}K'(DD')^{-1}D\big[I - X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'}) - (I - P_{D'})X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX$$
$$+\; X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'})X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX\big]D'(DD')^{-1}K\widetilde{Q}^{-1/2}$$
$$= \widetilde{Q}^{-1/2}K'(DD')^{-1}DD'(DD')^{-1}K\widetilde{Q}^{-1/2} + \widetilde{Q}^{-1/2}K'(DD')^{-1}DX'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CXD'(DD')^{-1}K\widetilde{Q}^{-1/2}$$
$$= \widetilde{Q}^{-1/2}\big[K'(DD')^{-1}K + K'(DD')^{-1}DX'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CXD'(DD')^{-1}K\big]\widetilde{Q}^{-1/2} = \widetilde{Q}^{-1/2}\widetilde{Q}\widetilde{Q}^{-1/2} = I.$$
Observe that $(43) = BB'$. To prove that (43) is idempotent, one needs to show that $BB'BB' = BB'$:
$$B\underbrace{B'B}_{=I}B' = BB'.$$
Result B3.
$$(43) = [I - (I - P_{D'})X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K\widetilde{Q}^{-1}K'(DD')^{-1}D[I - X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'})] = BB'.$$
Hence
$$r(43) = r(BB') = \mathrm{tr}(BB') = \mathrm{tr}(B'B) = \mathrm{tr}(I) = s = r(K).$$
The size of $\widetilde{Q}$ is the same as the size of $Q$; information about the size of $Q$, together with its relation to the matrix $K$, is given in the proof of Theorem 3.2 in Chapter 3.2.2.
Acknowledgements
The authors would like to thank Professor Julia Volaufova and Dr. Martin Singull for their
invaluable comments and suggestions which helped to improve the report significantly.
References
[1] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis (3rd
ed). Wiley, New York.
[2] Anderson, T. W. and Fang, K. T. (1982). On the theory of multivariate elliptically
contoured distributions and their applications. Technical Report No. 54, Stanford
University, California.
[3] Anderson, T. W. and Fang, K. T. (1990). Theory and applications of elliptically
contoured and related distributions. Technical Report No. 24, Stanford University,
California.
[5] Cambanis, S., Huang, S., and Simons, G. (1981). On the theory of elliptically contoured distributions. Journal of Multivariate Analysis, 11, 368-385.
[6] Dawid, A. P. (1977). Spherical matrix distributions and a multivariate model. Journal
of the Royal Statistical Society, Series B, 39, 254-261.
[8] Fang, K. T., Kotz, S. and Ng, K. W. (1990). Symmetric Multivariate and Related
Distributions. Springer-Science+Business Media, B.V.
[10] Fujikoshi, Y. (2009). Statistical inference for parallelism hypothesis in growth curve
model. SUT Journal of Mathematics, 45, 137-148.
[11] Fujikoshi, Y., Ulyanov, V. V., and Shimizu, R. (2010). Multivariate Statistics: High-
Dimensional and Large-Sample Approximations. Wiley, Hoboken, New Jersey.
[12] Geisser, S. (2003). The analysis of profile data–revisited. Statistics in Medicine, 22,
3337-3346.
[13] Greenhouse, S. W. and Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24, 95-112.
[15] Kariya, T. and Sinha, B. K. (1989). Robustness of Statistical Tests. Academic Press,
Boston.
[17] Kollo, T. and von Rosen, D. (2005). Advanced Multivariate Statistics with Matrices.
Springer, Dordrecht.
[18] Kollo, T., von Rosen, T. and von Rosen, D. (2011). Estimation in high-dimensional analysis and multivariate linear models. Communications in Statistics - Theory and Methods, 40, 1241-1253.
[20] Läuter, J. (1996). Exact t and F tests for analyzing studies with multiple endpoints.
Biometrics, 52, 964-970.
[21] Läuter, J. (2016). Multivariate Statistik - drei Manuskripte. Shaker Verlag, Aachen.
[22] Läuter, J., Glimm, E. and Kropf, S. (1996). New multivariate tests for data with an
inherent structure. Biometrical Journal, 38, 5-23.
[23] Läuter, J., Glimm, E., and Kropf, S. (1998). Multivariate tests based on left-
spherically distributed linear scores. The Annals of Statistics 26, 1972-1988.
[24] Ledoit, O. and Wolf, M. (2002). Some hypothesis tests for the covariance matrix
when the dimension is large compared to the sample size. The Annals of Statistics
30, 1081-1102.
[25] Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic
Press, New York.
[26] Morrison, D. F. (2004). Multivariate Statistical Methods, 4th edition. Duxbury Press,
CA.
[27] Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York.
[28] O’Brien, P. C. (1984). Procedures for comparing samples with multiple endpoints.
Biometrics, 40, 1079-1087.
[29] Ohlson, M. and Srivastava, M. S. (2010). Profile analysis for a growth curve model.
Journal of the Japan Statistical Society, 40, 1-21.
[30] Onozawa, M., Nishiyama, T. and Seo, T. (2016). On test statistics in profile analysis
with high-dimensional data. Communications in Statistics - Simulation and Compu-
tation, 45, 3716-3743.
[32] Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd edition.
Wiley, New York.
[33] Rencher, A. C. (2002). Methods of Multivariate Analysis, 2nd edition. Wiley, New
York.
[34] Seo, T., Sakurai, T. and Fujikoshi, Y. (2011). LR tests for two hypotheses in profile
analysis of growth curve data. SUT Journal of Mathematics, 47, 105-118.
[35] Shutoh, N. and Takahashi, S. (2016). Tests for parallelism and flatness hypotheses
of two mean vectors in high-dimensional settings. Journal of Statistical Computation
and Simulation, 86, 1150-1165.
[38] Srivastava, M. S. (2005). Some tests concerning the covariance matrix in high di-
mensional data. Journal of the Japan Statistical Society, 35, 251–272.
[39] Srivastava, M. S. (2007). Multivariate theory for analyzing high dimensional data.
Journal of the Japan Statistical Society, 37, 53-86.
[41] Srivastava, M. S. and Du, M. (2008). A test for the mean vector with fewer obser-
vations than the dimension. Journal of Multivariate Analysis, 99, 386–402.
[44] Srivastava, M. S. and Singull, M. (2012). Profile analysis with random-effects covari-
ance structure. Journal of the Japan Statistical Society, 42, 145-164.
[45] Srivastava, M. S. and Singull, M. (2017). Test for the mean matrix in a growth curve
model for high dimensions. Communications in Statistics - Theory and Methods, 46,
6668-6683.
[46] von Rosen, D. (2018). Bilinear Regression Analysis: An Introduction. Springer In-
ternational Publishing, New York.
[48] Yokoyama, T. and Fujikoshi, Y. (1993). A parallel profile model with random-effects
covariance structure. Journal of the Japan Statistical Society, 23, 83-89.