
Department of Mathematics

HIGH-DIMENSIONAL PROFILE
ANALYSIS

Cigdem Cengiz and Dietrich von Rosen

LiTH-MAT-R--2020/07--SE
Linköping University
Department of Mathematics
SE-581 83 Linköping
HIGH-DIMENSIONAL PROFILE ANALYSIS
Cigdem Cengiz1 and Dietrich von Rosen1,2
1 Department of Energy and Technology,
Swedish University of Agricultural Sciences,
SE-750 07 Uppsala, Sweden.
2 Department of Mathematics,
Linköping University,
SE-581 83 Linköping, Sweden.

Abstract

The three tests of profile analysis, namely the test of parallelism, the test of level and
the test of flatness, are studied and the corresponding likelihood ratio tests are derived.
Firstly, a traditional setting, where the sample size is greater than the dimension of the
parameter space, is considered. Then, all tests are derived in a high-dimensional setting.
In high-dimensional data analysis, special techniques are required to tackle the problems
that arise with the dimensionality. We propose a dimension reduction method based on
linear scores, an approach first proposed by Läuter et al. (1996).
Keywords: High-dimensional data; hypothesis testing; linear scores; multivariate
analysis; profile analysis; spherical distributions.

Notation
Abbreviations
p.d. : positive definite
p.s.d. : positive semi-definite
i.i.d. : independently and identically distributed
i.e. : that is
e.g. : for example
MANOVA : multivariate analysis of variance
GMANOVA : generalized multivariate analysis of variance
BRM : bilinear regression model
PLS : partial least squares
PCA : principal component analysis
PCR : principal component regression
CLT : central limit theorem
LLN : law of large numbers

Symbols
x : column vector
X : matrix
A′ : transpose of A
A−1 : inverse of A
A+ : Moore-Penrose inverse of A
A− : generalized inverse of A
|A| : determinant of A
C(A) : column space of A
r(A) : rank of A
A⊥ : orthocomplement of subspace A
A° : matrix such that C(A°) = C(A)⊥
⊗ : Kronecker product
⊕ : orthogonal sum of linear spaces
vec : vec-operator
In : n × n identity matrix
1n : n × 1 vector of ones
E[x] : expectation of x
D[x] : dispersion matrix of x
Np(µ, Σ) : multivariate normal distribution
Np,n(µ, Σ, Ψ) : matrix normal distribution
Wp(Σ, n, ∆) : non-central Wishart distribution with n degrees of freedom
Wp(Σ, n) : central Wishart distribution with n degrees of freedom
=ᵈ : equal in distribution
(A)( )′ : shorthand for (A)(A)′

1 Introduction
In this report, we construct test statistics for each of the three hypotheses in profile
analysis, first in a classical setting where the number of parameters is less than the
number of subjects, and then in a high-dimensional setting where the opposite holds,
i.e., the number of parameters exceeds the number of individuals.
In profile analysis, we have multiple variables for each individual, the individuals form
different groups (at least two), and the groups are compared based on the mean vectors of
these variables. The idea is to see if there is an interaction between groups and responses.
Assume that we have p variables and q independent groups (treatments); the p-dimensional
response vectors are denoted by x1, x2, ..., xq with mean vectors µ1, µ2, ..., µq. The mean
profile for the i-th group is obtained by connecting the points (1, µi1), (2, µi2), ..., (p, µip)
with line segments. Profile analysis can then be considered as the comparison of these q
mean profiles. See Figure 1 for an illustration.

Figure 1: Profiles of q groups.
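
To visualise what is being compared, the following minimal sketch (our own construction,
not part of the original analysis) draws mean profiles like those in Figure 1 for two
hypothetical groups with p = 4 responses:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical mean vectors for q = 2 groups and p = 4 variables.
mu = np.array([[3.0, 4.0, 3.5, 5.0],   # group 1: mu_11, ..., mu_14
               [2.0, 3.0, 2.5, 4.0]])  # group 2: mu_21, ..., mu_24

# Each profile connects the points (1, mu_i1), ..., (p, mu_ip).
for i, mu_i in enumerate(mu, start=1):
    plt.plot(range(1, mu.shape[1] + 1), mu_i, marker="o", label=f"group {i}")
plt.xlabel("response index")
plt.ylabel("mean")
plt.legend()
plt.show()
```

Parallel profiles correspond to the curves being vertical shifts of each other; coincident
and flat profiles are the special cases tested by the level and flatness hypotheses below.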

There are two possible scenarios that can be considered for the responses:
I. The same variable can be measured over several time points (repeated measurements)
and its means compared between the q groups.

II. One can measure different variables for each subject and compare their mean levels
between the q groups.

In the literature, the topic has been investigated by many researchers. One of the first
and leading papers on this topic was published by Greenhouse and Geisser (1959) and
the topic has been revisited by Geisser (2003). Srivastava (1987) derived the likelihood
ratio tests together with their distributions for the three hypotheses. A chapter on profile
analysis can be found in the books by Srivastava (2002) and Srivastava and Carter (1983).
Potthoff and Roy (1964) presented the growth curve model for the first time and other

extensions within the framework of the growth curve model can be found in Fujikoshi
(2009), where Fujikoshi extended profile analysis, especially statistical inference on the
parallelism hypothesis. Ohlson and Srivastava (2010) considered profile analysis of several
groups, where the groups have partly equal means. Seo, Sakurai and Fujikoshi (2011)
derived the likelihood ratio tests for the two hypotheses, level and flatness, in profile
analysis of growth curve data. Another focus was on the profile analysis with random
effects covariance structure. Srivastava and Singull (2012) constructed tests based on
the likelihood ratio, without any restrictions on the parameter space, for testing the
covariance matrix for random-effects structure or sphericity. Yokoyama (1995) derived
the likelihood ratio criterion with random-effects covariance structure under the parallel
profile model. Yokoyama and Fujikoshi (1993) conducted analysis of parallel growth
curves of groups where they assumed a random-effects covariance structure. They also
gave the asymptotic null distributions of the tests.

1.1 Profile analysis of several groups


As mentioned before, there are three types of tests which are commonly considered in
profile analysis: test of parallelism, test of levels and test of flatness (Srivastava and
Carter, 1983; Srivastava, 1987, 2002). Assume that the ni p-dimensional random vectors
xij are independently normally distributed as Np (µi , Σ), j = 1, ..., ni , i = 1, ..., q, where
µi = (µ1,i, ..., µp,i)′.

(1) Parallelism hypothesis


H1 : µi − µq = γi1p, i = 1, ..., q − 1 and A1 ≠ H1,
where A1 stands for alternative hypothesis and 1p is a p-dimensional vector of ones.

(2) Level hypothesis


H2|H1 : γi = 0, i = 1, ..., q − 1 and A2 ≠ H2|H1,
where H2 |H1 means H2 under the assumption that H1 is true.

(3) Flatness hypothesis


H3|H1 : µ• = γq1p and A3 ≠ H3|H1,
where H3 |H1 means H3 under the assumption that H1 is true.

The parameters γi represent unknown scalars and $\mu_\bullet = \frac{1}{N}\sum_{i=1}^{q} n_i\mu_i$, where N is the
total sample size, that is, N = n1 + · · · + nq.
If the profiles are parallel, we can say that there is no interaction between the responses
and the treatments (groups). Given that the parallelism hypothesis holds, one may want
to proceed with testing the second hypothesis, H2 , which indicates that there is no column
or treatment (group) effect. Alternatively, if the first hypothesis holds, one may want
to proceed with testing the third hypothesis, H3 , which indicates that there is no row
effect. Briefly speaking, the level hypothesis under the parallelism hypothesis (H2 |H1 )
indicates that the q profiles are coincident with each other. The flatness hypothesis under
the parallelism hypothesis (H3|H1) indicates that the q profiles are constant (flat). It is
useful to note that failing to reject H1 does not mean that it is true, i.e., that the profiles
are parallel; it only means that we do not have enough evidence against H1. The second
and the third hypotheses are constructed given that the first hypothesis holds.

1.2 Test statistics for the two-sample case
In this section, only the special case of two groups is considered. Let the p-dimensional
random vectors $x_1^{(i)}, \ldots, x_{n_i}^{(i)}$, $i = 1, 2$, be independently normally distributed with mean
vector $\mu_i$ and covariance matrix $\Sigma$. The sample mean vectors, the sample covariance
matrices and the pooled sample covariance matrix are given by

$$\bar{x}^{(i)} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_j^{(i)},$$
$$S^{(i)} = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(x_j^{(i)} - \bar{x}^{(i)})(x_j^{(i)} - \bar{x}^{(i)})',$$
$$S_p = \frac{1}{n_1 + n_2 - 2}\left[(n_1 - 1)S^{(1)} + (n_2 - 1)S^{(2)}\right].$$

Define a $(p-1)\times p$ matrix $C$ which satisfies $C1_p = 0$ and is of rank $r(C) = p - 1$. Let

$$b = \frac{n_1 n_2}{n_1 + n_2}, \qquad f = n_1 + n_2 - 2, \qquad u = \bar{x}^{(1)} - \bar{x}^{(2)}.$$

Then the three hypotheses and related test statistics can be written as below (Srivastava
and Carter, 1983; Srivastava, 1987, 2002):

(1) Parallelism hypothesis: $H_1: C\mu_1 = C\mu_2$.

The null hypothesis is rejected if

$$\frac{f - (p-1) + 1}{f(p-1)}\, b\, u'C'(CS_pC')^{-1}Cu \geq F_{p-1,\,f-p+2,\,\alpha},$$

where $F_{p-1,f-p+2,\alpha}$ denotes the $\alpha$-percentile of the $F$-distribution with $p - 1$ and
$f - p + 2$ degrees of freedom.

(2) Level hypothesis: $H_2|H_1: 1_p'\mu_1 = 1_p'\mu_2$.

The null hypothesis is rejected if

$$\frac{f - p + 1}{f}\, b\,(1'S_p^{-1}u)^2(1'S_p^{-1}1)^{-1}(1 + f^{-1}T_{p-1}^2)^{-1} \geq t_{f-p+1,\,\alpha/2}^2 = F_{1,\,f-p+1,\,\alpha},$$

where $T_{p-1}^2 = b\,u'C'(CS_pC')^{-1}Cu$ and $t_{f-p+1,\alpha/2}$ is the $\alpha/2$-percentile of the $t$-distribution
with $f - p + 1$ degrees of freedom.

(3) Flatness hypothesis: $H_3|H_1: C(\mu_1 + \mu_2) = 0$.

The null hypothesis is rejected if

$$\frac{n(f - p + 3)}{p - 1}\,\bar{x}'C'(CVC' + b\,Cuu'C')^{-1}C\bar{x} \geq F_{p-1,\,n-p+1,\,\alpha},$$

where $\bar{x} = (n_1\bar{x}^{(1)} + n_2\bar{x}^{(2)})/(n_1 + n_2)$ and $V = fS_p$.

As mentioned before, the second hypothesis is tested given that H1 is true. If one
fails to reject the first hypothesis, it cannot be concluded that the profiles are parallel.
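
To make the two-sample tests concrete, here is a minimal sketch computing the parallelism
statistic from the formulas above. The function name and interface are ours, and the
successive-difference matrix is used as one admissible choice of C with C1p = 0:

```python
import numpy as np
from scipy import stats

def parallelism_test(X1, X2):
    """Two-sample parallelism test of Section 1.2; X1, X2 are p x n_i data matrices."""
    p, n1 = X1.shape
    _, n2 = X2.shape
    f = n1 + n2 - 2
    b = n1 * n2 / (n1 + n2)
    u = X1.mean(axis=1) - X2.mean(axis=1)
    # Pooled sample covariance matrix S_p (np.cov uses divisor n_i - 1).
    Sp = ((n1 - 1) * np.cov(X1) + (n2 - 1) * np.cov(X2)) / f
    # C: (p-1) x p contrast matrix of successive differences, C 1_p = 0.
    C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
    Cu = C @ u
    T2 = b * Cu @ np.linalg.solve(C @ Sp @ C.T, Cu)
    F = (f - (p - 1) + 1) / (f * (p - 1)) * T2
    return F, stats.f.sf(F, p - 1, f - p + 2)   # statistic and p-value
```

The level and flatness statistics can be coded analogously from their displayed formulas.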

2 Useful definitions and theorems
Definition 2.1. The vector space generated by the columns of an arbitrary matrix A :
p × q is denoted C(A):
C(A) = {a : a = Ax, x ∈ Rq }.

Definition 2.2. A matrix, whose columns generate the orthogonal complement to C(A)
is denoted A◦ , i.e., C(A◦ ) = C(A)⊥ . Similar to the generalized inverse, A◦ is not unique.
One can choose A° = I − (A′)−A′ or A° = I − A(A′A)−A′ in addition to some other
choices.

Definition 2.3. The space CV(A) denotes a column vector space with an inner product
defined through the positive definite matrix V; i.e., for any pair of vectors x and y, the
inner product equals x′V−1y. If V = I, one writes C(A) instead of CI(A).

Definition 2.4. The orthogonal complement to CV (A) is denoted by CV (A)⊥ and is


generated by all the vectors orthogonal to all the vectors in CV (A); i.e., for an arbitrary
a ∈ CV(A), all the y satisfying y′V−1a = 0 generate the linear space (column vector
space) CV (A)⊥ .

Definition 2.5. Let V1 and V2 be disjoint subspaces and y = x1 + x2 , where x1 ∈ V1


and x2 ∈ V2 . The mapping P y = x1 is called a projection of y on V1 along V2 and P
is a projector. If V1 and V2 are orthogonal, we say that we have an orthogonal projector.

(i) P P = P , which means P is an idempotent matrix.

(ii) If P is a projector, then I − P is also a projector.

(iii) P is unique.

(iv) PA = A(A′A)−A′ is a projector on C(A) for which the standard inner product is
assumed to hold.

(v) PA,V = A(A′V−1A)−A′V−1 is a projector on CV(A) for which an inner product
defined by (x, y) = x′V−1y is assumed to hold and V is p.d.
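
The projectors in (iv) and (v) are easy to verify numerically. A small sketch (the helper
names are ours; the Moore-Penrose inverse is used as one admissible g-inverse):

```python
import numpy as np

def proj(A):
    """P_A = A (A'A)^- A', the orthogonal projector on C(A), Definition 2.5 (iv)."""
    return A @ np.linalg.pinv(A.T @ A) @ A.T

def proj_V(A, V):
    """P_{A,V} = A (A'V^{-1}A)^- A'V^{-1}, the projector on C_V(A), Definition 2.5 (v)."""
    Vi = np.linalg.inv(V)
    return A @ np.linalg.pinv(A.T @ Vi @ A) @ A.T @ Vi

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
P = proj(A)
assert np.allclose(P @ P, P)   # property (i): idempotent
assert np.allclose(P @ A, A)   # P leaves C(A) fixed
```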

Definition 2.6. The matrix W : p × p is said to be Wishart distributed if and only if


W = XX′ for some matrix X, where X ∼ Np,n(µ, Σ, I), Σ ≥ 0. If µ = 0, we have
a central Wishart distribution which is denoted by W ∼ Wp(Σ, n), and if µ ≠ 0, we
have a non-central Wishart distribution which is denoted by W ∼ Wp(Σ, n, ∆), where
∆ = Σ−1µµ′.

Definition 2.7. Let X and Y be two arbitrary matrices. The covariance of these two
matrices is defined by

Cov[X, Y ] = E[(vecX − E[vecX])(vecY − E[vecY])′]

if the expectations exist. From this, we have the following:

D[X] = E[vec(X − E[X])vec′(X − E[X])].

Definition 2.8. The general multivariate linear model equals

X = M B + E,

where X : p × n is a random matrix which corresponds to the observations, M : p × q


is an unknown parameter matrix and B : q × n is a known design matrix. Moreover,
E ∼ Np,n (0, Σ, I), where Σ is an unknown p.d. matrix. This is also called the MANOVA
model.

Definition 2.9. A bilinear model can be defined as

X = AM B + E,

where X : p × n, the unknown mean parameter matrix M : q × k, the two design matrices
A : p × q and B : k × n and the error matrix E. This is also called the GMANOVA
model or the growth curve model.

Theorem 2.1. The general solution of the consistent equation in X:

AXB = C

can be given by any of the three formulas:

(i) X = X0 + (A′)°Z1B′ + A′Z2B°′ + (A′)°Z3B°′,

(ii) X = X0 + (A′)°Z1 + A′Z2B°′,

(iii) X = X0 + Z1B°′ + (A′)°Z2B′,

where X0 represents a particular solution and Zi, i = 1, 2, 3, represent arbitrary matrices
of proper sizes.

Theorem 2.2. The equation AXB = C is consistent if and only if C(C) ⊆ C(A) and
C(C′) ⊆ C(B′). A particular solution of the equation is given by

X0 = A− CB − ,

where ”−” denotes an arbitrary g-inverse.

Theorem 2.3. If S is positive definite and C(B) ⊆ C(A),

$$P_{A,S} = P_{B,S} + SP'_{A,S}B^\circ(B^{\circ\prime}SP'_{A,S}B^\circ)^- B^{\circ\prime}P_{A,S}.$$

A special case is

$$S^{-1} - B^\circ(B^{\circ\prime}SB^\circ)^- B^{\circ\prime} = S^{-1}B(B'S^{-1}B)^- B'S^{-1}.$$

Theorem 2.4. For A : m × n and B : n × m

|Im + AB| = |In + BA|.

Theorem 2.5. Let X ∼ Np,n(µ, Σ, Ψ). For any A : q × p and B : m × n

$$AXB' \sim N_{q,m}(A\mu B', A\Sigma A', B\Psi B').$$

Theorem 2.6. Let W1 ∼ Wp (Σ, n, ∆1 ) be independent of W2 ∼ Wp (Σ, m, ∆2 ). Then

W1 + W2 ∼ Wp (Σ, n + m, ∆1 + ∆2 ).

Theorem 2.7. Let X ∼ Np,n (0, Σ, I) and Q be any idempotent matrix of a proper size.
Then
XQX′ ∼ Wp(Σ, r(Q)).

Theorem 2.8. Let A ∈ Rq×p and W ∼ Wp(Σ, n). Then

$$AWA' \sim W_q(A\Sigma A', n).$$

Theorem 2.9. Let A ∈ Rp×q and W ∼ Wp(Σ, n). Then

$$A(A'W^{-1}A)^- A' \sim W_p(A(A'\Sigma^{-1}A)^- A', n - p + r(A)).$$


 
Theorem 2.10. Let a partitioned non-singular matrix A be given by

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$

If A22 is non-singular, then

$$|A| = |A_{22}||A_{11} - A_{12}A_{22}^{-1}A_{21}|;$$

if A11 is non-singular, then

$$|A| = |A_{11}||A_{22} - A_{21}A_{11}^{-1}A_{12}|.$$

Theorem 2.11. Let S be positive definite and suppose that V , W and H are of proper
sizes, assuming H −1 exists. Then

$$(S + VHW')^{-1} = S^{-1} - S^{-1}V(W'S^{-1}V + H^{-1})^{-1}W'S^{-1}.$$

Theorem 2.12. Let E[X] = µ and D[X] = Ψ ⊗ Σ. Then

(i) E[AXB] = AµB,

(ii) D[AXB] = B′ΨB ⊗ AΣA′.

Theorem 2.13. Let A : n × m and B : m × n. Then

r(A − ABA) = r(A) + r(Im − BA) − m = r(A) + r(In − AB) − n.

Theorem 2.14.

(i) For A, B, C and D of proper sizes

tr(ABCD) = tr(DABC) = tr(CDAB) = tr(BCDA).

(ii) If A is idempotent, then


tr(A) = r(A).

(iii) For any A


tr(A) = tr(A′).

Theorem 2.15. Let A, B and C be matrices of proper sizes. Then

vec(ABC) = (C′ ⊗ A)vecB.

Theorem 2.16.
(i) (A ⊗ B)′ = A′ ⊗ B′

(ii) Let A, B, C and D be matrices of proper sizes. Then

(A ⊗ B)(C ⊗ D) = AC ⊗ BD.

(iii) A ⊗ B = 0 if and only if A = 0 or B = 0.


Theorem 2.17. Let A, B and C be matrices of proper sizes. If C(A) ⊆ C(B), then

C(B) = C(A) ⊕ C(B) ∩ C(A)⊥.

3 Likelihood ratio tests for the three hypotheses


In Section 1.1, we have given the three hypotheses of profile analysis of q groups and the
test statistics of the likelihood ratio tests for two groups have been presented in Section 1.2
(Srivastava and Carter, 1983; Srivastava, 1987, 2002). Srivastava (1987) derived the test
statistics for q groups, but in this section we will reformulate the problems since we indeed
are in a multivariate analysis of variance (MANOVA) or a generalized multivariate analysis
of variance (GMANOVA) testing situation. This will require a matrix reformulation of the
hypotheses and derivation of the likelihood ratio tests based on this matrix notation.

3.1 The model


The model for one group, say the k-th group, can be written as

Xk = Mk Dk + Ek

where Xk represents the matrix of observations, Mk the p-vector of mean parameters,
Dk a row vector of nk ones, i.e., Dk = 1′nk, and Ek the error matrix. The columns of Xk
are independently distributed, which means that the columns of Ek are independently
distributed. The assumption for the distribution of Ek is that the column vectors of Ek
follow a multivariate normal distribution: ejk ∼ Np(0, Σ).
When we have q groups, we have q models:

$$(X_1 : X_2 : ... : X_q) = (M_1D_1 : M_2D_2 : ... : M_qD_q) + (E_1 : E_2 : ... : E_q)$$
$$= (M_1 : M_2 : ... : M_q)D + (E_1 : E_2 : ... : E_q), \qquad (1)$$

where D is a q × N matrix, $N = \sum_{k=1}^{q} n_k$, which equals

$$D = \begin{pmatrix} 1 & \cdots & 1 & 0 & \cdots & 0 & \cdots & 0\\ 0 & \cdots & 0 & 1 & \cdots & 1 & \cdots & 0\\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \ddots & \vdots\\ 0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & 1 \end{pmatrix},$$
and where Ek ∼ Np,nk (0, Σ, Ink ). The relation in (1) can be written

X = M D + E , X ∼ Np,N (M D, Σ, IN ).
(p×N ) (p×q)(q×N ) (p×N )

where X = (X1 : X2 : · · · : Xq ), M = (M1 : M2 : · · · : Mq ) and E = (E1 : E2 : · · · :


Eq ).
Moreover, let F be a q × (q − 1) matrix and C be a (p − 1) × p matrix which satisfy
1′F = 0 and C1 = 0, respectively, e.g.,

$$F = \begin{pmatrix} 1 & 0 & \cdots & 0\\ -1 & 1 & \cdots & 0\\ 0 & -1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1\\ 0 & 0 & \cdots & -1 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & -1 & 0 & 0 & \cdots & 0 & 0\\ 0 & 1 & -1 & 0 & \cdots & 0 & 0\\ 0 & 0 & 1 & -1 & \cdots & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & 0 & 0 & \cdots & 1 & -1 \end{pmatrix}.$$

These two matrices, F and C, will be used in the next chapter in the derivations of
the tests. Since the same F and C are used in each hypothesis, they are introduced
here.
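
As an illustration, a sketch (the helper is our own) that builds D for given group sizes
together with the example choices of F and C displayed above:

```python
import numpy as np

def design_matrices(n_sizes, p):
    """D (q x N), F (q x (q-1)) and C ((p-1) x p) as in Section 3.1."""
    q, N = len(n_sizes), sum(n_sizes)
    D = np.zeros((q, N))                             # group indicator rows
    start = 0
    for k, nk in enumerate(n_sizes):
        D[k, start:start + nk] = 1.0
        start += nk
    F = np.eye(q, q - 1) - np.eye(q, q - 1, k=-1)    # satisfies 1'F = 0
    C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)     # satisfies C 1_p = 0
    return D, F, C

D, F, C = design_matrices([3, 4, 5], p=6)
assert np.allclose(np.ones(3) @ F, 0) and np.allclose(C @ np.ones(6), 0)
```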

3.2 Derivation of the tests


The model for q groups has been given above by

X = MD + E (2)

where X = (X1 : X2 : · · · : Xq ). This model is often called the MANOVA model. If we


want to draw any inference from the model, the unknown parameters M and Σ need
to be estimated.
For the model in (2), the likelihood function equals

$$L(M, \Sigma) = (2\pi)^{-\frac{1}{2}pN}|\Sigma|^{-N/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}\,\Sigma^{-1}(X - MD)(X - MD)'\right\}$$
$$\leq (2\pi)^{-\frac{1}{2}pN}\left|\tfrac{1}{N}(X - MD)(X - MD)'\right|^{-\frac{1}{2}N}e^{-\frac{1}{2}Np},$$

where $N = \sum_{k=1}^{q} n_k$ and equality holds if and only if $N\Sigma = (X - MD)(X - MD)'$
(see Srivastava and Khatri, 1979, Theorem 1.10.4). Now, we need to find a lower bound
for $|(X - MD)(X - MD)'|$.
Use that $I = I - P_{D'} + P_{D'}$ and then

$$|(X - MD)(X - MD)'| = |(X - MD)P_{D'}(X - MD)' + (X - MD)(I - P_{D'})(X - MD)'|$$
$$\geq |(X - MD)(I - P_{D'})(X - MD)'| = |X(I - P_{D'})X'|$$

and equality holds if and only if $(X - MD)P_{D'} = 0$. Thus $XP_{D'} = \hat{M}D$ and $N\hat{\Sigma} = RR'$,
where $R = X(I - P_{D'})$ with $P_{D'} = D'(DD')^- D$. This can be considered as a
decomposition of $\mathbb{R}^N$ relative to the model (see Figure 2). The whole space is divided into
two spaces which are orthogonal to each other, $C(D')$ and $C(D')^\perp$, which correspond to
the mean space and the residual space, respectively.

[Figure 2 here: the space decomposes into the mean space $C(D')$, containing $\hat{M}D = XP_{D'}$, and the residual space $C(D')^\perp$, containing $R = X(I - P_{D'})$.]

Figure 2: The decomposition of the space with no restriction on the mean parameter
space.

Note that since D is a full rank matrix, one can write $P_{D'} = D'(DD')^- D = D'(DD')^{-1}D$.
Now we will move on to the derivation of the test statistics.
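
The unrestricted maximum likelihood estimators just derived can be sketched numerically
as follows (the variable names are ours, and design_matrices is the sketch from Section 3.1):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_sizes = 4, [5, 6, 7]
D, F, C = design_matrices(n_sizes, p)
N = D.shape[1]
X = rng.standard_normal((p, N))              # placeholder data

PD = D.T @ np.linalg.inv(D @ D.T) @ D        # P_D'; D has full row rank
M_hat = X @ D.T @ np.linalg.inv(D @ D.T)     # solves X P_D' = M_hat D (group means)
S = X @ (np.eye(N) - PD) @ X.T               # S = X (I - P_D') X' = N * Sigma_hat
Sigma_hat = S / N
```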

3.2.1 Parallelism Hypothesis

The null hypothesis and the alternative hypothesis for parallelism can be written

H1 : E(X) = M D, CM F = 0,
A1 : E(X) = MD, CMF ≠ 0, (3)

where C and F are defined in Section 3.1.

Theorem 3.1. The likelihood ratio statistic for the parallelism hypothesis presented in
(3) can be given as

$$\lambda^{2/N} = \frac{|CSC'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}, \qquad (4)$$

where $S = X(I - P_{D'})X'$ and K is any matrix satisfying C(K) = C(D) ∩ C(F),

$$CSC' \sim W_{p-1}(C\Sigma C', N - r(D)),$$
$$CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C', r(K)).$$

Then

$$\lambda^{2/N} = \frac{|CSC'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|} \sim \Lambda(p - 1, N - r(D), r(K)),$$

where Λ(·, ·, ·) denotes the Wilks' lambda distribution.

Proof. We have restrictions on the mean parameter space:

$$CMF = 0 \Leftrightarrow (F' \otimes C)\mathrm{vec}M = 0,$$

which means that vecM belongs to $C(F \otimes C')^\perp$. By Theorem 2.1, the general solution
of the equation CMF = 0 equals

$$M = (C')^\circ\theta_1 + C'\theta_2 F^{\circ\prime},$$

where $\theta_1$ and $\theta_2$ are new parameters. Inserting this solution into (2) yields

$$X = (C')^\circ\theta_1 D + C'\theta_2 F^{\circ\prime}D + E.$$

This is the reparameterization of the model given by (2) after applying the restriction
CMF = 0. Here we notice that we are outside the MANOVA and GMANOVA models.
Recall the inequality

$$|\Sigma|^{-N/2}e^{-\frac{1}{2}\mathrm{tr}\{\Sigma^{-1}(X - E(X))(X - E(X))'\}} \leq \left|\frac{1}{N}(X - E(X))(X - E(X))'\right|^{-N/2}e^{-Np/2}$$

with equality if and only if $N\Sigma = (X - E(X))(X - E(X))'$.
Now we start performing some calculations under the null hypothesis, using $I = P_{D'} + (I - P_{D'})$:

$$|(X - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)(\,)'|$$
$$= |(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)(\,)' + \underbrace{X(I - P_{D'})X'}_{S}|$$
$$= |S||S^{-1}(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)(\,)' + I|$$
$$= |S||(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)'S^{-1}(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D) + I|. \qquad (5)$$
Recall from Theorem 2.3,

$$S^{-1} = C'(CSC')^- C + S^{-1}(C')^\circ[(C')^{\circ\prime}S^{-1}(C')^\circ]^-(C')^{\circ\prime}S^{-1}.$$

If we replace $S^{-1}$ in (5) with this expression, (5) becomes

$$|S||(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)'S^{-1}(C')^\circ[(C')^{\circ\prime}S^{-1}(C')^\circ]^-(C')^{\circ\prime}S^{-1}(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)$$
$$\;+ (XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)'C'(CSC')^- C(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D) + I|$$
$$\geq |S||(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)'C'(CSC')^- C(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D) + I| \qquad (6)$$

with equality if and only if

$$(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)'S^{-1}(C')^\circ[(C')^{\circ\prime}S^{-1}(C')^\circ]^-(C')^{\circ\prime}S^{-1}(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D) = 0.$$

Notice that, by orthogonality,

$$D'\theta_1'(C')^{\circ\prime}C'(CSC')^- C = 0.$$

Hence, (6) is equal to

$$|S||(XP_{D'} - C'\theta_2 F^{\circ\prime}D)'\underbrace{C'(CSC')^- C}_{=C'(CSC')^- CSC'(CSC')^- C}(XP_{D'} - C'\theta_2 F^{\circ\prime}D) + I|$$
$$= |S||(XP_{D'} - C'\theta_2 F^{\circ\prime}D)'C'(CSC')^- CSC'(CSC')^- C(XP_{D'} - C'\theta_2 F^{\circ\prime}D) + I|.$$

By Theorem 2.4,

$$= |S||C'(CSC')^- C(XP_{D'} - C'\theta_2 F^{\circ\prime}D)(XP_{D'} - C'\theta_2 F^{\circ\prime}D)'C'(CSC')^- CS + I|$$
$$= |\underbrace{SC'(CSC')^- C}_{P'_{C',S^{-1}}}(XP_{D'} - C'\theta_2 F^{\circ\prime}D)(XP_{D'} - C'\theta_2 F^{\circ\prime}D)'\underbrace{C'(CSC')^- CS}_{P_{C',S^{-1}}} + S|$$
$$= |(P'_{C',S^{-1}}XP_{D'} - P'_{C',S^{-1}}C'\theta_2 F^{\circ\prime}D)(\,)' + S|$$

and, inserting $I = (I - P_{D'F^\circ}) + P_{D'F^\circ}$,

$$= |P'_{C',S^{-1}}XP_{D'}(I - P_{D'F^\circ})(P'_{C',S^{-1}}XP_{D'})' + (P'_{C',S^{-1}}XP_{D'}P_{D'F^\circ} - P'_{C',S^{-1}}C'\theta_2 F^{\circ\prime}D)(\,)' + S|$$
$$\geq |P'_{C',S^{-1}}XP_{D'}(I - P_{D'F^\circ})(P'_{C',S^{-1}}XP_{D'})' + S|. \qquad (7)$$

Since $P_{A^\circ} = I - P_A$, one can write $I - P_{D'F^\circ} = P_{(D'F^\circ)^\circ}$. From the definition of the
column spaces given in the Notation part, $C[(D'F^\circ)^\circ] = C(D'F^\circ)^\perp$. Using Theorem 2.17,
$C(D'F^\circ)^\perp$ can be decomposed into two orthogonal subspaces:

$$C(D'F^\circ)^\perp = C(D')^\perp \oplus \underbrace{C(D') \cap C(D'F^\circ)^\perp}_{C(D'(DD')^{-1}K)}, \qquad (8)$$

where C(K) = C(D) ∩ C(F). The space $C(D')^\perp$ corresponds to $I - P_{D'}$ and the space
$C(D'(DD')^{-1}K)$ corresponds to $P_{D'(DD')^{-1}K}$. Then, since $P_{D'}(I - P_{D'}) = 0$ and
$P_{D'}P_{D'(DD')^{-1}K} = P_{D'(DD')^{-1}K}$,

$$(7) = |P'_{C',S^{-1}}XP_{D'}[(I - P_{D'}) + P_{D'(DD')^{-1}K}](P'_{C',S^{-1}}XP_{D'})' + S| = |P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}(P'_{C',S^{-1}}X)' + S|.$$

For the alternative hypothesis, we do not have any restrictions on the mean parameter
space. Thus, we will use the results from the introduction of Section 3.2, where $N\hat{\Sigma} = RR' = S$
was found:

$$N\hat{\Sigma}_{A_1} = S, \qquad N\hat{\Sigma}_{H_1} = S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}. \qquad (9)$$

Thus,

$$\lambda^{2/N} = \frac{|N\hat{\Sigma}_{A_1}|}{|N\hat{\Sigma}_{H_1}|} = \frac{|S|}{|S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}|}.$$

The numerator and the denominator are not independently distributed. To be able to
achieve this independence, we introduce the full rank matrix

$$H = (C', S^{-1}(C')^\circ).$$

Multiplying both the numerator and the denominator by H′ from the left and H from the right yields

$$\lambda^{2/N} = \frac{|H'\hat{\Sigma}_{A_1}H|}{|H'\hat{\Sigma}_{H_1}H|} = \frac{|CSC'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}.$$

By Theorem 2.7 and Theorem 2.8, the following two relations hold:

$$S = X(I - P_{D'})X' \sim W_p(\Sigma, N - r(D)) \;\Rightarrow\; CSC' \sim W_{p-1}(C\Sigma C', N - r(D)),$$
$$XP_{D'(DD')^{-1}K}X' \sim W_p(\Sigma, r(K)) \;\Rightarrow\; CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C', r(K)).$$

The ratio given by $\lambda^{2/N}$ does not depend on Σ; consequently, we can replace CΣC′ with
$I_{p-1}$. For a detailed explanation, see Appendix A, Result A1. Therefore,

$$\lambda^{2/N} = \frac{|CSC'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|} \sim \Lambda(p - 1, N - r(D), r(K)),$$

which completes the proof.

Note that the distribution of $\lambda^{2/N}$, that is, the Wilks' lambda distribution, can be approximated
very accurately (Läuter, 2016; Mardia, Kent and Bibby, 1979).
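
Numerically, the statistic in (4) can be sketched as below. We assume D has full row
rank, in which case C(D) = R^q and K = F is an admissible choice of K (the function
name is ours):

```python
import numpy as np

def parallelism_lr(X, D, F, C):
    """lambda^{2/N} of Theorem 3.1."""
    N = X.shape[1]
    DDi = np.linalg.inv(D @ D.T)
    PD = D.T @ DDi @ D
    S = X @ (np.eye(N) - PD) @ X.T            # S = X (I - P_D') X'
    A = D.T @ DDi @ F                         # D'(DD')^{-1} K with K = F
    PK = A @ np.linalg.pinv(A.T @ A) @ A.T    # projector on C(D'(DD')^{-1} K)
    num = np.linalg.det(C @ S @ C.T)
    den = np.linalg.det(C @ S @ C.T + C @ X @ PK @ X.T @ C.T)
    return num / den
```

The observed value can then be compared with the Λ(p − 1, N − r(D), r(K)) distribution,
e.g., via its Beta-product representation mentioned above.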

3.2.2 Level Hypothesis

Assuming that the profiles are parallel, we will construct a test to check if they coincide.
The null hypothesis and the alternative hypothesis for the level test can be written

H2 |H1 : E(X) = M D, M F = 0,
A2 |H1 : E(X) = M D, CM F = 0, (10)

where C and F are defined in Section 3.1.

Theorem 3.2. The likelihood ratio statistic for the level hypothesis can be expressed as

$$\lambda^{2/N} = \frac{|((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}|}{|((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} + ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}|}, \qquad (11)$$

where $Q = K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K$ and C(K) = C(D) ∩ C(F),

$$((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, N - r(D) - p + 1),$$
$$((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, r(K)).$$

Then

$$\lambda^{2/N} \sim \Lambda(1, N - r(D) - p + 1, r(K)).$$

Proof. Equivalent expressions for the restrictions in both hypotheses can be written as

$$H_2: MF = 0 \Leftrightarrow M = \theta F^{\circ\prime},$$
$$A_2: CMF = 0 \Leftrightarrow M = (C')^\circ\theta_1 + C'\theta_2 F^{\circ\prime}.$$

Plugging these solutions into the model gives

$$H_2: X = \theta F^{\circ\prime}D + E,$$
$$A_2: X = (C')^\circ\theta_1 D + C'\theta_2 F^{\circ\prime}D + E.$$

First the null hypothesis will be studied. Under H2, using $I = P_{D'F^\circ} + (I - P_{D'F^\circ})$:

$$|(X - \theta F^{\circ\prime}D)(\,)'| = |(XP_{D'F^\circ} - \theta F^{\circ\prime}D)(\,)' + X(I - P_{D'F^\circ})X'| \geq |X(I - P_{D'F^\circ})X'|$$

with equality if and only if $XP_{D'F^\circ} = \hat{\theta}F^{\circ\prime}D$. As with the parallelism hypothesis, we
will partition the space $C(D'F^\circ)^\perp$, which corresponds to $I - P_{D'F^\circ}$, into two orthogonal
parts, that is, $C(D'F^\circ)^\perp = C(D')^\perp \oplus C(D') \cap C(D'F^\circ)^\perp$, where $C(D') \cap C(D'F^\circ)^\perp = C(D'(DD')^{-1}K)$ with C(K) = C(D) ∩ C(F).
We have already derived the maximum of the likelihood under the restriction CMF = 0
while considering the parallelism hypothesis, where this restriction appeared in the null
hypothesis. For the second hypothesis, we assume that the profiles are parallel (or we do
not reject H1) and the test is conducted to see if they have equal levels. The restrictions
for this test can be summarised with MF = 0. The alternative hypothesis will not simply
be MF ≠ 0 because we already assume that the profiles are parallel. Consequently, the
level hypothesis is tested against the parallelism hypothesis due to this prior knowledge.
This is the reason why CMF = 0 appears here in the alternative hypothesis.
Thus

$$|N\hat{\Sigma}_{H_2}| = |X(I - P_{D'F^\circ})X'| = |X(I - P_{D'})X' + XP_{D'(DD')^{-1}K}X'|,$$
$$|N\hat{\Sigma}_{A_2}| = |X(I - P_{D'})X' + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}|,$$

where $X(I - P_{D'})X' = S$. We know that S and $XP_{D'(DD')^{-1}K}X'$ are Wishart distributed,
but $P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}$ is not. Therefore the likelihood ratio will be
manipulated similarly to the treatment of the parallelism hypothesis. Put $H = (C', S^{-1}(C')^\circ)$,
which is a full rank matrix, and multiply both the numerator and the denominator in the
likelihood ratio by H′ and H from the left and the right, respectively:

$$\lambda^{2/N} = \frac{|H'\hat{\Sigma}_{A_2}H|}{|H'\hat{\Sigma}_{H_2}H|}.$$

We start with calculating $H'\hat{\Sigma}_{A_2}H$:

$$H'\hat{\Sigma}_{A_2}H = \begin{pmatrix} C \\ (C')^{\circ\prime}S^{-1} \end{pmatrix}(S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}})(C', S^{-1}(C')^\circ) = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix},$$

where

$$V_{11} = C(S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}})C',$$
$$V_{12} = C(S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}})S^{-1}(C')^\circ,$$
$$V_{21} = (C')^{\circ\prime}S^{-1}(S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}})C',$$
$$V_{22} = (C')^{\circ\prime}S^{-1}(S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}})S^{-1}(C')^\circ.$$

It follows that

$$V_{21} = \underbrace{(C')^{\circ\prime}C'}_{=0} + (C')^{\circ\prime}S^{-1}P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}C'$$
$$= (C')^{\circ\prime}S^{-1}[C'(CSC')^- CS]'XP_{D'(DD')^{-1}K}X'C'(CSC')^- CSC'$$
$$= (C')^{\circ\prime}\underbrace{S^{-1}S}_{=I}C'(CSC')^- CXP_{D'(DD')^{-1}K}X'C'(CSC')^- CSC' = 0,$$

since $(C')^{\circ\prime}C' = 0$.
Notice that $V_{12} = V_{21}'$, therefore $V_{12} = 0$. Moreover,

$$V_{11} = CSC' + \underbrace{CSC'(CSC')^- C}_{=C}XP_{D'(DD')^{-1}K}X'\underbrace{C'(CSC')^- CSC'}_{=C'} = CSC' + CXP_{D'(DD')^{-1}K}X'C'$$

and

$$V_{22} = (C')^{\circ\prime}S^{-1}SS^{-1}(C')^\circ + (C')^{\circ\prime}\underbrace{S^{-1}S}_{=I}C'(CSC')^- CXP_{D'(DD')^{-1}K}X'\underbrace{C'(CSC')^- CSS^{-1}(C')^\circ}_{=0} = (C')^{\circ\prime}S^{-1}(C')^\circ.$$

Then the determinant can be written

$$|H'\hat{\Sigma}_{A_2}H| = \begin{vmatrix} CSC' + CXP_{D'(DD')^{-1}K}X'C' & 0 \\ 0 & (C')^{\circ\prime}S^{-1}(C')^\circ \end{vmatrix} = \begin{vmatrix} CX(I - P_{D'F^\circ})X'C' & 0 \\ 0 & (C')^{\circ\prime}S^{-1}(C')^\circ \end{vmatrix}.$$

Let's move on to the null hypothesis:

$$H'\hat{\Sigma}_{H_2}H = \begin{pmatrix} C \\ (C')^{\circ\prime}S^{-1} \end{pmatrix}(S + XP_{D'(DD')^{-1}K}X')(C', S^{-1}(C')^\circ)$$
$$= \begin{pmatrix} C(S + XP_{D'(DD')^{-1}K}X')C' & C(S + XP_{D'(DD')^{-1}K}X')S^{-1}(C')^\circ \\ (C')^{\circ\prime}S^{-1}(S + XP_{D'(DD')^{-1}K}X')C' & (C')^{\circ\prime}S^{-1}(S + XP_{D'(DD')^{-1}K}X')S^{-1}(C')^\circ \end{pmatrix}$$
$$= \begin{pmatrix} CX(I - P_{D'F^\circ})X'C' & CX(I - P_{D'F^\circ})X'S^{-1}(C')^\circ \\ (C')^{\circ\prime}S^{-1}X(I - P_{D'F^\circ})X'C' & (C')^{\circ\prime}S^{-1}X(I - P_{D'F^\circ})X'S^{-1}(C')^\circ \end{pmatrix}.$$
Note that we used the relation $X(I - P_{D'F^\circ})X' = X(I - P_{D'})X' + XP_{D'(DD')^{-1}K}X' = S + XP_{D'(DD')^{-1}K}X'$.
The determinant for the alternative hypothesis is straightforward. For the determinant
under the null hypothesis we use Theorem 2.10 on partitioned matrices. Hence,

$$|H'\hat{\Sigma}_{A_2}H| = |CX(I - P_{D'F^\circ})X'C'||(C')^{\circ\prime}S^{-1}(C')^\circ|,$$
$$|H'\hat{\Sigma}_{H_2}H| = |CX(I - P_{D'F^\circ})X'C'|\,|(C')^{\circ\prime}S^{-1}X(I - P_{D'F^\circ})X'S^{-1}(C')^\circ$$
$$\quad - (C')^{\circ\prime}S^{-1}X(I - P_{D'F^\circ})X'C'[CX(I - P_{D'F^\circ})X'C']^{-1}CX(I - P_{D'F^\circ})X'S^{-1}(C')^\circ|.$$

When we take the ratio of these two quantities, i.e., the ratio of $|H'\hat{\Sigma}_{A_2}H|$ and $|H'\hat{\Sigma}_{H_2}H|$,
the first factor, $|CX(I - P_{D'F^\circ})X'C'|$, cancels. Put $S_1 = X(I - P_{D'F^\circ})X'$.
Then the ratio becomes


$$\frac{|H'\hat{\Sigma}_{A_2}H|}{|H'\hat{\Sigma}_{H_2}H|} = \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|}{|(C')^{\circ\prime}S^{-1}S_1S^{-1}(C')^\circ - (C')^{\circ\prime}S^{-1}S_1C'[CS_1C']^{-1}CS_1S^{-1}(C')^\circ|}$$
$$= \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|}{|(C')^{\circ\prime}S^{-1}[S_1 - S_1C'(CS_1C')^{-1}CS_1]S^{-1}(C')^\circ|}.$$

By the special case of Theorem 2.3,

$$S_1C'(CS_1C')^{-1}CS_1 = S_1 - (C')^\circ[(C')^{\circ\prime}S_1^{-1}(C')^\circ]^{-1}(C')^{\circ\prime}.$$

Thus,

$$\frac{|H'\hat{\Sigma}_{A_2}H|}{|H'\hat{\Sigma}_{H_2}H|} = \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|}{|(C')^{\circ\prime}S^{-1}[S_1 - S_1 + (C')^\circ((C')^{\circ\prime}S_1^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}]S^{-1}(C')^\circ|}$$
$$= \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|}{|(C')^{\circ\prime}S^{-1}(C')^\circ||((C')^{\circ\prime}S_1^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}(C')^\circ|} = \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}}{|(C')^{\circ\prime}S_1^{-1}(C')^\circ|^{-1}}.$$

Notice that

$$S_1^{-1} = [X(I - P_{D'F^\circ})X']^{-1} = (S + XP_{D'(DD')^{-1}K}X')^{-1} = [S + (XP_1)(XP_1)']^{-1} \quad (\text{put } P_1 = P_{D'(DD')^{-1}K})$$
$$= S^{-1} - S^{-1}(XP_1)[(XP_1)'S^{-1}(XP_1) + I]^{-1}(XP_1)'S^{-1} \quad (\text{since } P_1 \text{ is idempotent and symmetric})$$
$$= S^{-1} - S^{-1}XP_1(P_1X'S^{-1}XP_1 + I)^{-1}P_1X'S^{-1} \quad (\text{by Theorem 2.11}). \qquad (12)$$

If we replace $S_1^{-1}$ in $|(C')^{\circ\prime}S_1^{-1}(C')^\circ|^{-1}$ with (12):

$$|(C')^{\circ\prime}S_1^{-1}(C')^\circ|^{-1} = |(C')^{\circ\prime}S^{-1}(C')^\circ - (C')^{\circ\prime}S^{-1}XP_1(P_1X'S^{-1}XP_1 + I)^{-1}P_1X'S^{-1}(C')^\circ|^{-1}$$
$$= |(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}|I - P_1X'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XP_1(P_1X'S^{-1}XP_1 + I)^{-1}|^{-1}$$
$$= |(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}|P_1X'S^{-1}XP_1 + I||I + P_1X'S^{-1}XP_1 - P_1X'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XP_1|^{-1}. \qquad (13)$$

For the last determinant in (13),

$$I + P_1X'S^{-1}XP_1 - P_1X'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XP_1$$
$$= I + P_1X'[S^{-1} - S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}]XP_1. \qquad (14)$$

Using the relation given in Theorem 2.3,

$$(14) = I + P_1X'[C'(CSC')^{-1}C]XP_1.$$

Thus,

$$|(C')^{\circ\prime}S_1^{-1}(C')^\circ|^{-1} = |(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}|P_1X'S^{-1}XP_1 + I||I + P_1X'C'(CSC')^{-1}CXP_1|^{-1}.$$

If we put this result back into the ratio:

$$\frac{|H'\hat{\Sigma}_{A_2}H|}{|H'\hat{\Sigma}_{H_2}H|} = \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}}{|(C')^{\circ\prime}S_1^{-1}(C')^\circ|^{-1}} = \frac{|I + P_1X'C'(CSC')^{-1}CXP_1|}{|I + P_1X'S^{-1}XP_1|} = \frac{|I + X'C'(CSC')^{-1}CXP_1|}{|I + X'S^{-1}XP_1|}. \qquad (15)$$

Note that

$$P_1 = P_{D'(DD')^{-1}K} = D'(DD')^{-1}K(K'(DD')^{-1}DD'(DD')^{-1}K)^- K'(DD')^{-1}D.$$

Plug this into the ratio above and move $K'(DD')^{-1}D$ to the left (using Theorem 2.4):

$$(15) = \frac{|I + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K(K'(DD')^{-1}DD'(DD')^{-1}K)^-|}{|I + K'(DD')^{-1}DX'S^{-1}XD'(DD')^{-1}K(K'(DD')^{-1}DD'(DD')^{-1}K)^-|}.$$

Now we factor out $(K'(DD')^{-1}DD'(DD')^{-1}K)^-$ in both the numerator and the denominator. Then

$$(15) = \frac{|K'(DD')^{-1}DD'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K|}{|K'(DD')^{-1}DD'(DD')^{-1}K + K'(DD')^{-1}DX'S^{-1}XD'(DD')^{-1}K|}.$$

Note that $DD'(DD')^{-1} = I$. Moreover, we know that $S^{-1} = S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1} + C'(CSC')^{-1}C$, which implies

$$(15) = \frac{|K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K|}{|K'(DD')^{-1}K + K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K|}$$
$$= |I + [K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K]^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}K|^{-1}.$$

Put $Q = K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K$ and use the
rotation in Theorem 2.4:

$$(15) = |I + (C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}|^{-1}$$
$$= \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}}{|((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} + ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}|}.$$

Now we will find the distributions of the expressions in this ratio. Let's start with
$((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}$. We multiply this expression by the following identity matrices
from the left and the right:

$$\underbrace{((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}(C')^\circ}_{I}\,((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}\,\underbrace{(C')^{\circ\prime}(C')^\circ((C')^{\circ\prime}(C')^\circ)^{-1}}_{I}$$
$$= ((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}\underbrace{[S - SC'(CSC')^{-1}CS]}_{=(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}}(C')^\circ((C')^{\circ\prime}(C')^\circ)^{-1}. \qquad (16)$$

Recall that $S = X(I - P_{D'})X'$. Then (16) becomes

$$((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}X(I - P_{D'})(I - X'C'[CX(I - P_{D'})X'C']^{-1}CX(I - P_{D'}))X'(C')^\circ((C')^{\circ\prime}(C')^\circ)^{-1}. \qquad (17)$$

From Theorem 2.3, the two relations

$$I = (C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1} + \Sigma C'(C\Sigma C')^{-1}C,$$
$$I = \Sigma^{-1}(C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime} + C'(C\Sigma C')^{-1}C\Sigma$$

are obtained, which will then be used in (17):

$$(17) = ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}X(I - P_{D'})(I - X'C'[CX(I - P_{D'})X'C']^{-1}CX(I - P_{D'}))X'\Sigma^{-1}(C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}.$$

We have achieved a structure of the form $\cdots X(\cdot)X'\cdots$, where the middle factor is
idempotent. Now the rank of this idempotent matrix is checked (see Appendix A, Result A2):

$$r\big((I - P_{D'})(I - X'C'[CX(I - P_{D'})X'C']^{-1}CX(I - P_{D'}))\big) = N - r(D) - p + 1. \qquad (18)$$

We also need to show that $(C')^{\circ\prime}\Sigma^{-1}X$ and $X'C'$ are independent, which is verified in
Appendix A, Result A3. Thus, we can conclude that the conditional distribution of
$((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}X(I - P_{D'})(I - X'C'[CX(I - P_{D'})X'C']^{-1}CX(I - P_{D'}))$,
conditioned on CX, is normal with mean 0 and dispersion matrix $((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}$.
Then, by Theorem 2.7,

$$((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, N - r(D) - p + 1),$$

which is independent of CX. Now we will move on to the second expression in the ratio
given by (15), which equals

$$((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}. \qquad (19)$$

First focus on $((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}X$. We use the identity matrix
$((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}(C')^\circ$ and the relations $(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1} = I - SC'(CSC')^{-1}C$
and $S = X(I - P_{D'})X'$. Then

$$((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}X$$
$$= ((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}X[I - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX]. \qquad (20)$$

Similar to the calculations done for (17), the identity

$$I = (C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1} + \Sigma C'(C\Sigma C')^{-1}C$$

will be used in (20):

$$((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}(C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}X[I - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX]$$
$$= ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}X[I - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX]. \qquad (21)$$

Moreover, $(C')^{\circ\prime}\Sigma^{-1}X$ and $X'C'$ are independently distributed. Thus, (21) is normally
distributed given CX, and so is

$$((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1/2}. \qquad (22)$$

From the definition of the Wishart distribution given by Definition 2.6, if X is normally
distributed with mean 0 and dispersion I ⊗ Σ, then $XX' \sim W(\Sigma, n)$. So we need to
check the mean and the dispersion of (22). The mean is zero, and for the dispersion recall
that

$$D[AXB] = (B' \otimes A)\underbrace{D[X]}_{I\otimes\Sigma}(B \otimes A') = (B'B) \otimes (A\Sigma A').$$

Using this formula, the dispersion of (22) for given CX equals

$$Q^{-1/2}K'(DD')^{-1}D[I - X'C'(CX(I - P_{D'})X'C')^{-1}CX(I - P_{D'})][I - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX]D'(DD')^{-1}KQ^{-1/2}$$
$$\otimes\; ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}\Sigma\Sigma^{-1}(C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}. \qquad (23)$$

The details of the calculation of (23) are given in Appendix A, Result A4. Based on this
result,

$$(23) = I \otimes ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}.$$
Hence, we can conclude that (22) is normally distributed with mean 0 and dispersion
$I \otimes ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}$, conditional on CX. Therefore, the square of the matrix in (22)
is Wishart distributed. Notice that

$$[I - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX]D'(DD')^{-1}KQ^{-1/2}$$
$$\times\, Q^{-1/2}K'(DD')^{-1}D[I - X'C'(CX(I - P_{D'})X'C')^{-1}CX(I - P_{D'})] \qquad (24)$$

is idempotent. To show this, put $G = Q^{-1/2}K'(DD')^{-1}D[I - X'C'(CX(I - P_{D'})X'C')^{-1}CX(I - P_{D'})]$
and note that

$$G'\underbrace{GG'}_{=I}G = G'G.$$

For the details of $GG' = I$, see Appendix A, Result A4. Thus, (24) is idempotent.
We need to check the rank of this idempotent matrix to determine the degrees of freedom
in the Wishart distribution (see Theorem 2.7):

$$r((24)) = r(G'G) \overset{\text{Thm 2.14(ii)}}{=} \mathrm{tr}(G'G) \overset{\text{Thm 2.14(i)}}{=} \mathrm{tr}(GG') = \mathrm{tr}(I).$$

Furthermore, tr(I) equals the size of GG′. To find the size of GG′, one needs to check
$Q = K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K$.
Say K is a q × s matrix with r(K) = s. Then Q is s × s, so the size of GG′ is s = r(K).
As a conclusion,

$$((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, r(K)).$$

Thus, the distribution of (15) is given by $\frac{|U|}{|U + V|}$, where

$$U \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, N - r(D) - p + 1),$$
$$V \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, r(K)).$$

If we pre- and post-multiply U and V by $((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{1/2}$ and denote the new
expressions by $\tilde{U}$ and $\tilde{V}$, respectively, then the ratio becomes $\frac{|\tilde{U}|}{|\tilde{U} + \tilde{V}|}$, where

$$\tilde{U} \sim W_1(I, N - r(D) - p + 1) \quad \text{and} \quad \tilde{V} \sim W_1(I, r(K)).$$

Then

$$\lambda^{2/N} = \frac{|\tilde{U}|}{|\tilde{U} + \tilde{V}|} \sim \Lambda(1, N - r(D) - p + 1, r(K)).$$

3.2.3 Flatness Hypothesis

Assuming that the profiles are parallel, we will test if they are flat or not.

H3 |H1 : E(X) = M D, CM = 0,
A3 |H1 : E(X) = M D, CM F = 0, (25)

where C and F are defined in Section 3.1.


Theorem 3.3. The likelihood ratio statistic for the flatness hypothesis is given by

$$\lambda^{2/N} = \frac{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C' + CXP_{D'F^\circ}X'C'|}, \qquad (26)$$

where

$$CXP_{D'F^\circ}X'C' \sim W_{p-1}(C\Sigma C', r(D'F^\circ)),$$
$$CSC' + CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C', N - r(D) + r(K)).$$

Then

$$\lambda^{2/N} \sim \Lambda(p - 1, N - r(D) + r(K), r(D'F^\circ)).$$
Proof. Equivalent expressions for the restrictions in both hypotheses can be written as

$$H_3: CM = 0 \Leftrightarrow M = (C')^\circ\theta,$$
$$A_3: CMF = 0 \Leftrightarrow M = (C')^\circ\theta_1 + C'\theta_2 F^{\circ\prime}.$$

If we plug these solutions into the model given in (2), then

$$H_3: X = (C')^\circ\theta D + E,$$
$$A_3: X = (C')^\circ\theta_1 D + C'\theta_2 F^{\circ\prime}D + E.$$

We will first look at the null hypothesis. Under H3, using $I = P_{D'} + (I - P_{D'})$:

$$|(X - (C')^\circ\theta D)(X - (C')^\circ\theta D)'| = |(XP_{D'} - (C')^\circ\theta D)(\,)' + X(I - P_{D'})X'|. \qquad (27)$$

We cannot simply say that $(27) \geq |X(I - P_{D'})X'|$ with equality if and only if $XP_{D'} = (C')^\circ\theta D$,
because $XP_{D'} = (C')^\circ\theta D$ is not necessarily a consistent equation. Recall the
two conditions for consistency from Theorem 2.2:

(i) $C(P_{D'}X') \subseteq C(D')$, which is satisfied;

(ii) $C(XP_{D'}) \subseteq C((C')^\circ)$, which is not necessarily true.

Thus, we need some further steps:

$$(27) = |(XP_{D'} - (C')^\circ\theta D)(\,)' + S| = |S||S^{-1}(XP_{D'} - (C')^\circ\theta D)(\,)' + I|$$
$$= |S||I + (XP_{D'} - (C')^\circ\theta D)'S^{-1}(XP_{D'} - (C')^\circ\theta D)|.$$
Let $G = XP_{D'} - (C')^\circ\theta D$. Using Theorem 2.3,

$$(27) = |S||I + G'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^-(C')^{\circ\prime}S^{-1}G + G'C'(CSC')^- CG|$$
$$\geq |S||I + G'C'(CSC')^- CG|, \qquad (28)$$

which is independent of θ since

$$CG = C(XP_{D'} - (C')^\circ\theta D) = CXP_{D'} - \underbrace{C(C')^\circ}_{=0}\theta D = CXP_{D'}.$$

Equality holds if and only if

$$G'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^-(C')^{\circ\prime}S^{-1}G = 0,$$

which is equivalent to

$$G'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^- = 0.$$

Thus, the lower bound which we were seeking in (28), which equals $|S||I + G'C'(CSC')^- CG| = |S||I + P_{D'}X'C'(CSC')^- CXP_{D'}|$,
has been obtained. The situation for the third hypothesis is similar to the level hypothesis,
where we have CMF = 0 as the alternative hypothesis. We assume that the profiles are
parallel and the test is conducted to see if they are flat or not. The restrictions for this
test can be summarised with CM = 0. Due to the assumption that parallelism holds,
the alternative hypothesis becomes CMF = 0. Then (9) from Section 3.2.1, where the
likelihood for CMF = 0 has been derived, will be used for $|N\hat{\Sigma}_{A_3}|$. As a result,

$$|N\hat{\Sigma}_{H_3}| = |S||I + P_{D'}X'C'(CSC')^- CXP_{D'}|,$$
$$|N\hat{\Sigma}_{A_3}| = |S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}|.$$

In order to get a familiar structure for the ratio of these two quantities, we rewrite
$|N\hat{\Sigma}_{H_3}|$. Use the rotation given by Theorem 2.4, the idempotency $P_{D'}P_{D'} = P_{D'}$
and $I = SS^{-1}$:

$$|N\hat{\Sigma}_{H_3}| = |S||I + P_{D'}X'C'(CSC')^- C\underbrace{SS^{-1}}_{=I}XP_{D'}| = |S||I + XP_{D'}X'\underbrace{C'(CSC')^- CS}_{=P_{C',S^{-1}}}S^{-1}| \quad (\text{rotate})$$
$$= |S||I + S^{-1}P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}}| \quad (P_{C',S^{-1}} = P_{C',S^{-1}}^2,\; P_{C',S^{-1}}S^{-1} = S^{-1}P'_{C',S^{-1}},\; \text{rotate})$$
$$= |S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}}|.$$

Hence,

$$|N\hat{\Sigma}_{H_3}| = |S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}}|, \qquad |N\hat{\Sigma}_{A_3}| = |S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}|.$$

We already know that $XP_{D'}X'$ is Wishart distributed (see Theorem 2.7) but
$P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}}$ is not. Similarly, $XP_{D'(DD')^{-1}K}X'$ is Wishart distributed but
$P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}$ is not. Let $H = (C', S^{-1}(C')^\circ)$ be of full rank:

$$\lambda^{2/N} = \frac{|H'\hat{\Sigma}_{A_3}H|}{|H'\hat{\Sigma}_{H_3}H|}.$$

We start with calculating $H'\hat{\Sigma}_{H_3}H$:

$$H'\hat{\Sigma}_{H_3}H = \begin{pmatrix} C \\ (C')^{\circ\prime}S^{-1} \end{pmatrix}(S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}})(C', S^{-1}(C')^\circ) = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix},$$

where

$$V_{11} = C(S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}})C',$$
$$V_{12} = C(S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}})S^{-1}(C')^\circ,$$
$$V_{21} = (C')^{\circ\prime}S^{-1}(S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}})C',$$
$$V_{22} = (C')^{\circ\prime}S^{-1}(S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}})S^{-1}(C')^\circ.$$

Let's check $V_{12}$:

$$V_{12} = CSS^{-1}(C')^\circ + CP'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}}S^{-1}(C')^\circ$$
$$= \underbrace{C(C')^\circ}_{=0} + CSC'(CSC')^- CXP_{D'}X'C'(CSC')^- \underbrace{CSS^{-1}(C')^\circ}_{=0} = 0.$$

Notice that $V_{21} = V_{12}' = 0$. Let's calculate the other elements:

$$V_{11} = CSC' + CSC'(CSC')^- CXP_{D'}X'C'(CSC')^- CSC' = CSC' + CXP_{D'}X'C',$$
$$V_{22} = (C')^{\circ\prime}S^{-1}SS^{-1}(C')^\circ + (C')^{\circ\prime}S^{-1}SC'(CSC')^- CXP_{D'}X'C'(CSC')^- CSS^{-1}(C')^\circ = (C')^{\circ\prime}S^{-1}(C')^\circ.$$

If the established relations are put together:

$$|H'\hat{\Sigma}_{H_3}H| = \begin{vmatrix} CSC' + CXP_{D'}X'C' & 0 \\ 0 & (C')^{\circ\prime}S^{-1}(C')^\circ \end{vmatrix}.$$

The alternative hypothesis for the flatness test is the same as for the level test. Consequently,
the corresponding likelihood will have the same form, so $H'\hat{\Sigma}_{A_2}H = H'\hat{\Sigma}_{A_3}H$.
Notice that the matrix H used for the level test is the same as the matrix H introduced
here. Then

$$|H'\hat{\Sigma}_{A_3}H| = |CSC' + CXP_{D'(DD')^{-1}K}X'C'||(C')^{\circ\prime}S^{-1}(C')^\circ|.$$

Thus, the ratio becomes

$$\lambda^{2/N} = \frac{|H'\hat{\Sigma}_{A_3}H|}{|H'\hat{\Sigma}_{H_3}H|} = \frac{|CSC' + CXP_{D'(DD')^{-1}K}X'C'||(C')^{\circ\prime}S^{-1}(C')^\circ|}{|CSC' + CXP_{D'}X'C'||(C')^{\circ\prime}S^{-1}(C')^\circ|} = \frac{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}{|CSC' + CXP_{D'}X'C'|},$$

where $CSC'$, $CXP_{D'(DD')^{-1}K}X'C'$ and $CXP_{D'}X'C'$ are all Wishart distributed. However,
we are trying to find a structure $\frac{|U|}{|U + V|}$ with U and V independently Wishart
distributed. Recall the space decomposition given in Equation (8) for the parallelism
hypothesis. It implies

$$I - P_{D'F^\circ} = I - P_{D'} + P_{D'(DD')^{-1}K}, \qquad P_{D'} = P_{D'(DD')^{-1}K} + P_{D'F^\circ}.$$

Then we can write the ratio as

$$\lambda^{2/N} = \frac{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}{|CSC' + CX(P_{D'(DD')^{-1}K} + P_{D'F^\circ})X'C'|} = \frac{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C' + CXP_{D'F^\circ}X'C'|}.$$

Since X is normally distributed and $P_{D'F^\circ}$ is idempotent, by Theorem 2.7 and Theorem 2.8,

$$XP_{D'F^\circ}X' \sim W_p(\Sigma, r(D'F^\circ)) \;\Rightarrow\; CXP_{D'F^\circ}X'C' \sim W_{p-1}(C\Sigma C', r(D'F^\circ)).$$

We already know that $CSC' \sim W_{p-1}(C\Sigma C', N - r(D))$ and, by Theorem 2.6, the sum
of two independently distributed Wishart matrices with the same scale matrix is again
Wishart. Then

$$CSC' + CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C', N - r(D) + r(K)).$$

Using the result given in Appendix A, Result A1,

$$\lambda^{2/N} \sim \Lambda(p - 1, N - r(D) + r(K), r(D'F^\circ)).$$
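
A numerical sketch of (26), mirroring the parallelism sketch (K = F is again assumed
admissible, i.e., D of full row rank), uses the decomposition $P_{D'} = P_{D'(DD')^{-1}K} + P_{D'F^\circ}$
displayed above:

```python
import numpy as np

def flatness_lr(X, D, F, C):
    """lambda^{2/N} of Theorem 3.3."""
    N = X.shape[1]
    DDi = np.linalg.inv(D @ D.T)
    PD = D.T @ DDi @ D
    S = X @ (np.eye(N) - PD) @ X.T
    A = D.T @ DDi @ F                            # D'(DD')^{-1}K with K = F
    PK = A @ np.linalg.pinv(A.T @ A) @ A.T       # P_{D'(DD')^{-1}K}
    U = C @ (S + X @ PK @ X.T) @ C.T             # CSC' + C X P_{D'(DD')^{-1}K} X' C'
    V = C @ X @ (PD - PK) @ X.T @ C.T            # C X P_{D'F^o} X' C'
    return np.linalg.det(U) / np.linalg.det(U + V)
```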

4 High-dimensional setting
4.1 Background
The classical setting for data analysis usually consists of a large number of experimental
units and a small number of variables. For estimability reasons, the number of data points,
n, needs to be larger than the number of parameters, p. Asymptotic properties have been
derived in the classical setting: theorems such as the law of large numbers and the central
limit theorem focus on the case when p is fixed and n → ∞.

In recent years, due to the development of information technology and data storage, the
direction of the relationship between p and n has started to change. We face more and
more research questions where we have more parameters than data points:

p > n or p ≫ n. (29)

Here we can mention different types of asymptotics as n → ∞:

(i) p/n → c, where c ∈ (a, b),

(ii) p/n → ∞.

Classical multivariate approaches fail when the dimension of the repeated measurements, p,
starts to exceed the number of observations, n. The sample covariance matrix becomes
singular and, consequently, the likelihood ratio tests are not well defined. As a result,
classical tests are not feasible in the high-dimensional setting, and we need to investigate
and extend the current approaches within the high-dimensional framework. Our focus
in this report will not be on any specific type of asymptotics mentioned above; p and n
will be fixed and will satisfy the condition given by (29).
Ledoit and Wolf (2002) derived hypothesis tests for the covariance matrix in a high-
dimensional setting. Srivastava (2005) also developed tests for certain hypotheses on
the covariance matrix in high-dimensions. Srivastava and Fujikoshi (2006), Srivastava
(2007), Srivastava and Du (2008) are other examples in the multivariate area. Kollo, von
Rosen and von Rosen (2011) focused on estimating the parameters describing the mean
structure in the Growth Curve model. Testing for the mean matrix in a Growth Curve
model for high-dimensions was studied by Srivastava and Singull (2017) as well. Fujikoshi,
Ulyanov and Shimizu (2010) focus on high-dimensional and large-sample approximations
for multivariate statistics.
The focus in this report is on high-dimensional profile analysis. Onozawa, Nishiyama and
Seo (2016) derived test statistics for profile analysis with unequal covariance matrices in
high-dimensions. Similarly, Harrar and Kong (2016) worked on this topic. Shutoh and
Takahashi (2016) proposed new test statistics in profile analysis with high-dimensional
data by using the Cauchy-Schwarz inequality. All these references study the asymptotic
distributions of the test statistics. They introduce different high-dimensional asymptotic
frameworks and derive the test statistics in profile analysis under these frameworks. Our
approach will be different from the approaches mentioned above. As noted before, we will
not focus on the asymptotic distributions of the test statistics. In this report, fixed p and
n are of interest and the method that is going to be used is introduced in the following
chapter.

4.2 Dimension reduction using scores and spherical distributions
Läuter (1996, 2016) and Läuter, Glimm and Kropf (1996, 1998) proposed a new method
for dealing with the problems that arise in high-dimensional settings. The tests they
proposed are based on linear scores, obtained using score coefficients that are determined
from the data via sums of products matrices. These scores are linear combinations of the
repeated measures, and the coefficients defining the combinations are called score
coefficients or weights. With this approach, high-dimensional observations are compressed
into low-dimensional scores, which are then used for the analysis instead of the original
data. This approach can be useful in many situations because we often do not have
knowledge of the effect of each single variable, or one may want to investigate the joint
effect of several variables.
Let's give the mathematical representation of the theory. Suppose

$$x = (x_i) \sim N_p(\mu, \Sigma)$$

and n individual p-dimensional vectors form the p × n matrix X which satisfies

$$X = (x_{ij}) \sim N_{p\times n}(\mu 1_n', \Sigma, I_n).$$

Consider a single score

$$z' = (z_1, z_2, \cdots, z_n) = (d_1, d_2, \cdots, d_p)X = d'X,$$

where d is the vector of weights and the $z_j$, j = 1, ..., n, are the individual scores. The
rule for choosing the vector d of coefficients is that it has to be a unique function
of XX′, the p × p matrix of sums of products. Moreover, the condition d′X ≠ 0 with
probability 1 needs to be satisfied. The total sums of products matrix XX′ corresponds
to the hypothesis µ = 0; consequently, the structure of the function can change based on
the hypothesis. We will illustrate the idea with two primary theorems presented in
Läuter, Glimm and Kropf (1996).
theorems presented in Läuter, Glimm and Kropf (1996).
Theorem 4.1. (Läuter et al., 1996) Assume that X is a p × n matrix consisting
of n p-dimensional observations (p ≥ 1, n ≥ 2) that follows the normal distribution
X ∼ Np×n(0, Σ, In). Define a p-dimensional vector of weights d which is a function of
XX′ and assume d′X ≠ 0 with probability 1. Then

$$t = \frac{\sqrt{n}\,\bar{z}}{s_z} \qquad (30)$$

has the exact t-distribution with n − 1 degrees of freedom, where

$$z' = (z_j)' = d'X, \qquad \bar{z} = \frac{1}{n}z'1_n, \qquad s_z^2 = \frac{1}{n-1}(z'z - n\bar{z}^2).$$

Theorem 4.2. (Läuter et al., 1996) Assume that H ∼ Wp(Σ, m) and G ∼ Wp(Σ, f)
and that they are independently distributed. Define a p-dimensional vector of weights d
which is a function of H + G and assume d′(H + G)d ≠ 0 with probability 1. Then

$$F = \frac{f}{m}\,\frac{d'Hd}{d'Gd}$$

follows an F-distribution with m and f degrees of freedom.

The ideas behind these theorems are based on the theory of spherical distributions, which
has been treated extensively in the book by Fang and Zhang (1990). Elliptically contoured
distributions can be considered as a generalization of the class of Gaussian distributions,
which has been the centre of multivariate theory. Normality is assumed for many testing
problems, but in practice this is often not true. Thus, there has been an effort to extend
the class of normal distributions to a wider class which still keeps the basic properties of
the normal distribution.
We know that if z ∼ Nn(0, In), the statistic t given in (30) has a t-distribution with n − 1
degrees of freedom, so we need to show a connection between z and the standard normal
distribution. Since the normal distribution is in the class of spherical distributions, if
one can show that z is spherically distributed, then this connection is provided.
We also need to show that the test statistics' distributions remain the same when we
use spherically distributed random vectors. These ideas are given by a corollary and a
theorem, among others, by Fang and Zhang (1990).

Corollary 4.1. (Fang and Zhang, 1990) An n × 1 random vector x is spherically
distributed if and only if, for every n × n orthogonal matrix Γ, $x \overset{d}{=} \Gamma x$.

Theorem 4.3. (Fang and Zhang, 1990) A statistic t(x)'s distribution remains the
same whenever $x \sim S_n^+(\phi)$ if $t(\alpha x) = t(x)$ for each α > 0 and each $x \sim S_n^+(\cdot)$, where
$S_n(\phi)$ denotes the spherical distribution with parameter φ; φ(·) is a function of a
scalar variable and is called the characteristic generator of the spherical distribution.
If $x \sim S_n(\phi)$ and P(x = 0) = 0, this is denoted by $x \sim S_n^+(\phi)$.
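
To illustrate Theorem 4.1, here is a sketch of the exact score-based t-test. The
standardized weights $d_i = ((XX')_{ii})^{-1/2}$, one standard choice in Läuter's framework,
are an assumption of this sketch, as is the function name:

```python
import numpy as np
from scipy import stats

def lauter_score_test(X):
    """Exact t-test of H0: mu = 0 based on a single linear score (Theorem 4.1)."""
    p, n = X.shape
    d = 1.0 / np.sqrt(np.diag(X @ X.T))   # weights: a function of XX' only
    z = d @ X                              # individual scores z' = d'X
    t = np.sqrt(n) * z.mean() / z.std(ddof=1)
    return t, 2 * stats.t.sf(abs(t), df=n - 1)
```

Note that d depends on the data only through XX′, as the theorem requires, so the
t-distribution is exact for any p, including p > n.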

4.3 The derivation of the tests in the high-dimensional setting

We have derived the likelihood ratio tests for the three hypotheses in the classical setting
where N > p. Now we will focus on the high-dimensional setting where we have p > N
or p ≫ N. First we will construct the scores and then derive the likelihood ratio tests
based on these scores. One should notice that we use capital N for the total sample size
when there are several groups, whose sizes may differ from each other; we refer to the
parts related to profile analysis. In our case, there are q groups with group sizes nk,
k = 1, ..., q, and therefore $N = \sum_{k=1}^{q} n_k$. In Chapter 4.1 and Chapter 4.2, while the
general introduction is being given, the total sample size is denoted by n when there is
one group in the analysis or theory.

4.3.1 Parallelism Hypothesis

Recall (3) and (4) from Section 3.2.1,

$$H_1: E(X) = MD,\ CMF = 0,$$
$$A_1: E(X) = MD,\ CMF \neq 0$$

and

$$\lambda^{2/N} = \frac{|N\hat{\Sigma}_{A_1}|}{|N\hat{\Sigma}_{H_1}|} = \frac{|CSC'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|},$$

where

$$CSC' \sim W_{p-1}(C\Sigma C', N - r(D)),$$
$$CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C', r(K)). \qquad (31)$$

In the beginning it was assumed that $X \sim N_{p,N}(MD, \Sigma, I_N)$. If we multiply X by C,
then $CX \sim N_{(p-1),N}(CMD, C\Sigma C', I_N)$. As we can see from the derivation of the
statistics in (31), X appears together with C. Let Y = CX. Then

$$CSC' = CX(I - P_{D'})X'C' = Y(I - P_{D'})Y' \sim W_{p-1}(C\Sigma C', N - r(D)),$$
$$CXP_{D'(DD')^{-1}K}X'C' = YP_{D'(DD')^{-1}K}Y' \sim W_{p-1}(C\Sigma C', r(K)).$$

Instead of applying the vector d to X, we will apply it to Y. Notice that d is a (p − 1) × 1
vector. When we multiply Y by d′ from the left (and Y′ by d from the right), Y is
reduced to a vector; we call this new vector the score vector and denote it by z, that
is, z′ = d′Y:

$$\lambda^{2/N} = \frac{d'Y(I - P_{D'})Y'd}{d'Y(I - P_{D'})Y'd + d'YP_{D'(DD')^{-1}K}Y'd} = \frac{z'(I - P_{D'})z}{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z}. \qquad (32)$$
z (I − PD0 )z + z 0 PD0 (DD0 )−1 K z

Now we are going to find the distribution of this ratio.

Theorem 4.4. The ratio given in (32) follows Wilks’ lambda distribution with parameters
1, r(K) and N − r(D) that is denoted by Λ(1, r(K), N − r(D)) which is equivalent to
B N −r(D)
2
, r(K) 
2
, where B(·, ·) denotes the Beta-distribution.

Proof. First, we should note that since d is a function of Y Y 0 ,

d0 Y (I − PD0 )Y 0 d  W1 (d0 CΣC 0 d, N − r(D)),


d0 Y PD0 (DD0 )−1 K Y 0 d  W1 (d0 CΣC 0 d, r(K)),

which means that we cannot find the distribution of the ratio directly. This is where we
need the theory of spherical distributions.We begin by showing that the scores are spher-
ically distributed. To show this, first we need to show that Y is spherically distributed:

YΓ = Y Γ
D(YΓ ) = (Γ0 Γ) ⊗ Σ = I ⊗ Σ but E(YΓ ) 6= E(Y ).

Thus, Y is not spherically distributed. Therefore, we need to adapt the test statistic
without changing the overall value in order to achieve sphericity.
Recall that the model under the null hypothesis is

$$X = (C')^\circ\theta_1 D + C'\theta_2 F^{\circ\prime}D + E,$$

and accordingly the mean parameter under the null hypothesis equals
$M = (C')^\circ\theta_1 + C'\theta_2 F^{\circ\prime}$, i.e., $E(X) = ((C')^\circ\theta_1 + C'\theta_2 F^{\circ\prime})D$.
We subtract the mean under the null hypothesis from X. Then the expressions in the

ratio given by (32) become as follows:

(i)
$$d'C[X - E(X)](I - P_{D'})[X - E(X)]'C'd$$
$$= d'C[X - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D](I - P_{D'})[X - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D]'C'd$$
$$= d'CX(I - P_{D'})X'C'd = d'Y(I - P_{D'})Y'd;$$

(ii)
$$d'C[X - E(X)]P_{D'(DD')^{-1}K}[X - E(X)]'C'd$$
$$= d'[CX - \underbrace{C(C')^\circ}_{=0}\theta_1 D - CC'\theta_2 F^{\circ\prime}D]P_{D'(DD')^{-1}K}[CX - C(C')^\circ\theta_1 D - CC'\theta_2 F^{\circ\prime}D]'d.$$

Let's look at the middle part:

$$CC'\theta_2 F^{\circ\prime}DP_{D'(DD')^{-1}K} = CC'\theta_2 F^{\circ\prime}\underbrace{DD'(DD')^{-1}}_{=I}K[(D'(DD')^{-1}K)'(D'(DD')^{-1}K)]^-(D'(DD')^{-1}K)'. \qquad (33)$$

Recall C(K) = C(D) ∩ C(F), and K can be written as $K = D(D'F^\circ)^\circ$. Then

$$(33) = CC'\theta_2\underbrace{F^{\circ\prime}D(D'F^\circ)^\circ}_{=0}[(D'(DD')^{-1}K)'(D'(DD')^{-1}K)]^-(D'(DD')^{-1}K)' = 0.$$

Then

$$(ii) = d'CXP_{D'(DD')^{-1}K}X'C'd = d'YP_{D'(DD')^{-1}K}Y'd.$$
This means that we can subtract the mean from X while the expression of the likelihood
ratio remains the same, and hence its distribution remains the same:

λ^{2/N} = d'Y(I − P_{D'})Y'd / (d'Y(I − P_{D'})Y'd + d'Y P_{D'(DD')^{-1}K}Y'd)
 = d'[Y − CE(X)](I − P_{D'})[Y − CE(X)]'d
   / (d'[Y − CE(X)](I − P_{D'})[Y − CE(X)]'d + d'[Y − CE(X)]P_{D'(DD')^{-1}K}[Y − CE(X)]'d).

If we denote this new variable, Y − CE(X), by Ỹ, then

λ^{2/N} = d'Ỹ(I − P_{D'})Ỹ'd / (d'Ỹ(I − P_{D'})Ỹ'd + d'Ỹ P_{D'(DD')^{-1}K}Ỹ'd).

The reason why we needed to adapt Y was that it was not spherically distributed. Without
changing the statistic overall, we have obtained a new variable Ỹ which is spherically
distributed. To show this:

E(Ỹ) = E(Y) − CE(X) = 0,
D(Ỹ) = D(Y).

We have shown that Ỹ has mean 0 and the same covariance as Y. Define Ỹ_Γ = ỸΓ,
where Γ is an N × N orthogonal matrix; the subscript Γ denotes multiplication by Γ
from the right. Then

E(Ỹ_Γ) = E(Ỹ)Γ = 0,
D(Ỹ_Γ) = (Γ'Γ) ⊗ CΣC' = I ⊗ CΣC'.

This proves that Ỹ_Γ and Ỹ follow the same (p − 1) × N normal distribution. Now we
can show that the scores, z' = d'Ỹ, are spherically distributed. First, define z'_Γ = d'_Γ Ỹ_Γ.
Notice that d_Γ = d, since both are derived from the same matrix:

Ỹ_Γ Ỹ'_Γ = ỸΓ(ỸΓ)' = ỸΓΓ'Ỹ' = ỸỸ'.

In Chapter 4.2, it was stated that d needs to be a unique function of the total sums of
products matrix; the derivation above rests on this fact. Then

z'_Γ = d'ỸΓ = z'Γ.   (34)
From this result we can conclude that z' and z'_Γ follow the same distribution, which
therefore is spherical. It is well known that every spherically distributed random vector x
has a stochastic representation x =_d Ru^{(n)}, where u^{(n)} is uniformly distributed on
the unit sphere and R ≥ 0 is independent of u^{(n)}. Furthermore, if x ∼ S_n(φ), then
x =_d Rw, where w ∼ N_n(0, I_n) is independent of R ≥ 0. Lastly, recall Theorem 2.5.8
of Fang and Zhang (1990), which is also given in Chapter 4.2 as Theorem 4.3. Now we
can work on the ratio λ^{2/N} and apply these results:

f(z) = λ^{2/N} = z'(I − P_{D'})z / (z'(I − P_{D'})z + z'P_{D'(DD')^{-1}K}z)
 =_d (Rw)'(I − P_{D'})(Rw) / ((Rw)'(I − P_{D'})(Rw) + (Rw)'P_{D'(DD')^{-1}K}(Rw))
 = R²w'(I − P_{D'})w / (R²(w'(I − P_{D'})w + w'P_{D'(DD')^{-1}K}w)),

which does not depend on R. The theorem of Fang and Zhang is satisfied. Thus, we
can conclude that

λ^{2/N} = z'(I − P_{D'})z / (z'(I − P_{D'})z + z'P_{D'(DD')^{-1}K}z) ∼ Λ(1, N − r(D), r(K))
        ≡ B((N − r(D))/2, r(K)/2).   (35)

Wilks' lambda distribution can be written as a product of independently distributed
Beta variables (Mardia et al., 1979; Läuter, 2016), that is, Λ(p, m, n) ∼ ∏_{i=1}^{p} B_i,
where B_1, B_2, ..., B_p are p independent random variables following the Beta
distribution, B_i ∼ B((m − i + 1)/2, n/2), i = 1, ..., p. Notice that the expressions in the
ratio given by (32) are one-dimensional Wishart distributed, which means p = 1. Then the
product reduces to the single Beta distribution given at the end of the proof in (35).
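To make Theorem 4.4 concrete, the following Monte Carlo sketch (ours, not part of the
report; N, r(D) and r(K) are assumed values, and two generic orthogonal projectors with
disjoint ranges stand in for I − P_{D'} and P_{D'(DD')^{-1}K}) checks numerically that a
ratio of the form (32), computed from spherical scores, follows B((N − r(D))/2, r(K)/2).

```python
# Monte Carlo sketch: ratio of quadratic forms in a spherical score vector.
# Assumed illustrative values for N, r(D), r(K); P1, P2 are stand-ins for
# I - P_{D'} (rank N - r(D)) and P_{D'(DD')^{-1}K} (rank r(K)), with P1 P2 = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, rD, rK = 50, 4, 3

Q, _ = np.linalg.qr(rng.standard_normal((N, N)))            # orthonormal basis
P1 = Q[:, :N - rD] @ Q[:, :N - rD].T                        # projector, rank N - r(D)
P2 = Q[:, N - rD:N - rD + rK] @ Q[:, N - rD:N - rD + rK].T  # projector, rank r(K)

Z = rng.standard_normal((20000, N))    # spherical scores (the factor R cancels)
u = ((Z @ P1) * Z).sum(axis=1)         # z'(I - P_{D'})z
v = ((Z @ P2) * Z).sum(axis=1)         # z'P_{D'(DD')^{-1}K}z
ratio = u / (u + v)

# Agreement with B((N - r(D))/2, r(K)/2):
print(stats.kstest(ratio, stats.beta((N - rD) / 2, rK / 2).cdf))
```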

4.3.2 Level Hypothesis

First, let us recall the null and the alternative hypotheses in the classical setting:

H2|H1 : E(X) = MD, MF = 0,
A2|H1 : E(X) = MD, CMF = 0

and the likelihood ratio was given by

LR^{2/N} = ((C')◦'S^{-1}(C')◦)^{-1}
 / [((C')◦'S^{-1}(C')◦)^{-1} + ((C')◦'S^{-1}(C')◦)^{-1}(C')◦'S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')◦((C')◦'S^{-1}(C')◦)^{-1}].
   (36)

For this hypothesis, the expressions in the likelihood ratio are clearly already one-dimensional.
The contrast matrix was given explicitly in Chapter 3.1. It is a (p − 1) × p matrix, and

C : (p − 1) × p ⇒ C' : p × (p − 1) ⇒ (C')◦ : p × 1 ⇒ (C')◦' : 1 × p.

However, dimension reduction is still needed because of the degrees of freedom of the
Wishart distribution. Recall

((C')◦'S^{-1}(C')◦)^{-1} ∼ W1(((C')◦'Σ^{-1}(C')◦)^{-1}, N − r(D) − p + 1).

When p > N, the degrees of freedom become negative, which is not possible.
Moreover, S^{-1} does not exist in this case. Thus, we need to take care of these issues.
One can notice that the situation here is more complicated, because the expressions
in the ratio are more complex than for the parallelism and the flatness hypotheses. To
solve the problem, we expand ((C')◦'S^{-1}(C')◦)^{-1} and ((C')◦'S^{-1}(C')◦)^{-1}(C')◦'S^{-1}X.
The expansions are investigated in two parts, (i) and (ii), respectively.

(i) We start with ((C')◦'S^{-1}(C')◦)^{-1}:

((C')◦'S^{-1}(C')◦)^{-1}
 = ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}(C')◦((C')◦'S^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}
 = ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}[S − SC'(CSC')^{-1}CS]Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}.
   (37)

Recall that S = X(I − P_{D'})X'. Then (37) becomes

((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}X[(I − P_{D'})(I − X'C'(CX(I − P_{D'})X'C')^{-1}CX(I − P_{D'}))]X'Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}.

Apply the d vector to CX, i.e., form d'CX:

((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}X[(I − P_{D'})(I − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'}))]X'Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}.
   (38)

When we check (38), we can notice the structure LXPX'L', say, where P is an idempotent
matrix. This was the aim from the beginning, and it allows us to give the exact
distribution of (38). Notice that (C')◦'Σ^{-1}X and CX are independent, which is proved
in Appendix A, Result A3.
It is assumed that d is a function of CX. Therefore,

((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}X[(I − P_{D'})(I − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'}))]   (39)

is normally distributed given CX, and its square is Wishart distributed. Let us check the
rank of the idempotent matrix below to determine the degrees of freedom of the Wishart
distribution; for the detailed calculations, see Appendix B, Result B1:

r[(I_N − P_{D'})(I_N − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'}))] = N − r(D) − 1.

The degrees of freedom of the Wishart distribution for (38) are now found. Next, the
covariance matrix needs to be obtained. From Theorem 2.12,

D[vec(((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}X)]
 = I ⊗ ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}ΣΣ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}
 = I ⊗ ((C')◦'Σ^{-1}(C')◦)^{-1}.   (40)

We have examined the first expression in the ratio given by (36). Dimension reduction
has been applied with the help of the vector d after the necessary expansions. Then
the distribution of this expression, that is (38), has been found.
We continue with the second expression, which appears in the denominator of (36).

(ii) Let us start with the first part, which is ((C')◦'S^{-1}(C')◦)^{-1}(C')◦'S^{-1}X. Recall also
the relation (C')◦((C')◦'S^{-1}(C')◦)^{-1}(C')◦'S^{-1} = I − SC'(CSC')^{-1}C and S = X(I − P_{D'})X':

((C')◦'S^{-1}(C')◦)^{-1}(C')◦'S^{-1}X
 = ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}(C')◦((C')◦'S^{-1}(C')◦)^{-1}(C')◦'S^{-1}X
 = ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}X[I − (I − P_{D'})X'C'(CX(I − P_{D'})X'C')^{-1}CX].

Apply the d vector to CX, i.e., use d'CX and X'C'd:

((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}X[I − (I − P_{D'})X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX].
   (41)

Notice that (C')◦'Σ^{-1}X and CX are independent, which has been proven before, and so
are (C')◦'Σ^{-1}X and d'CX. The first part of the second expression in the denominator
of the likelihood ratio given by (36) has now been investigated. The second part is
D'(DD')^{-1}KQ^{-1}K'(DD')^{-1}D. It is apparent that Q in this expression, which was first
introduced in Theorem 3.2, also contains CX, to which we need to apply the dimension
reduction. Recall the matrix Q:

Q = K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CX(I − P_{D'})X'C')^{-}CXD'(DD')^{-1}K.

Apply d' to CX, which yields

Q̃ = K'(DD')^{-1}K + K'(DD')^{-1}DX'C'd(d'CX(I − P_{D'})X'C'd)^{-}d'CXD'(DD')^{-1}K.

Then, after the dimension reduction, the second expression in the denominator of the
likelihood ratio given by (36) becomes

((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}X[I − (I − P_{D'})X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K Q̃^{-1} K'(DD')^{-1}D[I − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'})]X'Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}.
   (42)

In (42),

[I − (I − P_{D'})X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K Q̃^{-1} K'(DD')^{-1}D[I − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'})],
   (43)

which corresponds to the part that lies between X and X', is an idempotent matrix.
For the details, see Appendix B, Result B2. This means that for (42) the structure
LXPX'L', where P is an idempotent matrix, has been achieved. We just need to
show that (C')◦'Σ^{-1}X and CX are independent, which has already been shown in this
chapter for (i). Since d is a function of CX, and Q̃ also depends on d'CX, one can
conclude that (C')◦'Σ^{-1}X is independent of what lies in the idempotent matrix given
by (43). Thus, (42) is Wishart distributed. For the degrees of freedom of the Wishart
distribution, one needs to check the rank of (43), which is equal to r(K), where K is any
matrix satisfying C(K) = C(D) ∩ C(F), first defined in Theorem 3.1. For the
proof, see Appendix B, Result B3.
For the scale matrix of the Wishart distribution for (42), see the following, which was also
calculated in (40):

D[vec(((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}X)] = I ⊗ ((C')◦'Σ^{-1}(C')◦)^{-1}.

Our aim was to find the distribution of the ratio given by (36) after the
dimension reduction. The expressions from this ratio have been investigated separately
in (i) and (ii) above. If the results from (i) and (ii) are put together, we can conclude the
following:

((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}X[(I − P_{D'})(I − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'}))]X'Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}
 ∼ W1(((C')◦'Σ^{-1}(C')◦)^{-1}, N − r(D) − 1);   (44)

((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}X[I − (I − P_{D'})X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K Q̃^{-1} K'(DD')^{-1}D[I − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'})]X'Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}
 ∼ W1(((C')◦'Σ^{-1}(C')◦)^{-1}, r(K)).   (45)

These expressions can be simplified in the following way:

(44) = ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}[X(I − P_{D'})X' − X(I − P_{D'})X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'})X']Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}
 = ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}[S − SC'd(d'CSC'd)^{-1}d'CS]Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}
 = ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}[S(I − P_{C'd,S^{-1}})]Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}.

Then

((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}[S(I − P_{C'd,S^{-1}})]Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}
 ∼ W1(((C')◦'Σ^{-1}(C')◦)^{-1}, N − r(D) − 1).

Moreover,

(45) = ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}[X − SC'd(d'CSC'd)^{-1}d'CX]D'(DD')^{-1}K Q̃^{-1} K'(DD')^{-1}D[X' − X'C'd(d'CSC'd)^{-1}d'CS]Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}
 = ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}(I − P'_{C'd,S^{-1}})XD'(DD')^{-1}K Q̃^{-1} K'(DD')^{-1}DX'(I − P_{C'd,S^{-1}})Σ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}
 ∼ W1(((C')◦'Σ^{-1}(C')◦)^{-1}, r(K)).

Denote (44) by Ũ_L and (45) by Ṽ_L. Then the following theorem can be presented:

Theorem 4.5.

λ^{2/N} = Ũ_L / (Ũ_L + Ṽ_L) ∼ Λ(1, N − r(D) − 1, r(K)) ≡ B((N − r(D) − 1)/2, r(K)/2).

4.3.3 Flatness Hypothesis

We will follow a very similar approach to the parallelism hypothesis, due to the similarities
between the test statistics in the likelihood ratio. Recall the null and the alternative
hypotheses:

H3|H1 : E(X) = MD, CM = 0,
A3|H1 : E(X) = MD, CMF = 0

and

LR^{2/N} = |Σ̂_{A3}| / |Σ̂_{H3}|
 = |CSC' + CXP_{D'(DD')^{-1}K}X'C'| / |CSC' + CXP_{D'(DD')^{-1}K}X'C' + CXP_{D'F◦}X'C'|,

where

CSC' + CXP_{D'(DD')^{-1}K}X'C' ∼ W_{p−1}(CΣC', N − r(D) + r(K)),   (46)
CXP_{D'F◦}X'C' ∼ W_{p−1}(CΣC', r(D'F◦)).   (47)

Recall that CSC' = CX(I − P_{D'})X'C'. Put Y = CX; since X ∼ N_{p,N}(MD, Σ, I_N),
Y ∼ N_{(p−1),N}(CMD, CΣC', I_N). To reduce the dimension, we apply the weight
vector d to Y instead of to the data matrix X, as we did for the parallelism hypothesis,
and denote the new vector by z, where z' = d'Y. Then

λ^{2/N} = (d'Y(I − P_{D'})Y'd + d'Y P_{D'(DD')^{-1}K}Y'd)
          / (d'Y(I − P_{D'})Y'd + d'Y P_{D'(DD')^{-1}K}Y'd + d'Y P_{D'F◦}Y'd)
        = (z'(I − P_{D'})z + z'P_{D'(DD')^{-1}K}z)
          / (z'(I − P_{D'})z + z'P_{D'(DD')^{-1}K}z + z'P_{D'F◦}z).   (48)
The distribution of this ratio is presented in the next theorem.
Theorem 4.6. The ratio given in (48) follows Wilks' lambda distribution with parameters
1, N − r(D) + r(K) and r(D'F◦), denoted Λ(1, N − r(D) + r(K), r(D'F◦)),
which equals B((N − r(D) + r(K))/2, r(D'F◦)/2).

Proof. The situation with d being a function of Y Y' is the same as for the parallelism
hypothesis. As a consequence, d'Y is not normally distributed. Thus,

d'Y(I − P_{D'})Y'd + d'Y P_{D'(DD')^{-1}K}Y'd ≁ W1(d'CΣC'd, N − r(D) + r(K)),
d'Y P_{D'F◦}Y'd ≁ W1(d'CΣC'd, r(D'F◦)).
Then we use the related theorems on spherical distributions presented in Section 4.2.
One can see that Y is not spherically distributed, because the orthogonally transformed
matrix Y_Γ does not have the same distribution as Y, since E(Y_Γ) ≠ E(Y). Here again,
as with the first (parallelism) hypothesis, we form a new variable by subtracting the mean
under the null hypothesis from X. The statistics in the likelihood ratio will not change,
and we ensure sphericity for the scores by ensuring sphericity for the new variable.
Recall that the model under the null hypothesis equals the classical growth curve model
(GMANOVA),

X = (C')◦θD + E,

and the mean under the null hypothesis is given by (C')◦θD.
After subtracting the mean from X, the ratio can be written as

λ^{2/N} = (d'[Y − CE(X)](I − P_{D'})[Y − CE(X)]'d + d'[Y − CE(X)]P_{D'(DD')^{-1}K}[Y − CE(X)]'d)
          / (d'[Y − CE(X)](I − P_{D'})[Y − CE(X)]'d + d'[Y − CE(X)]P_{D'(DD')^{-1}K}[Y − CE(X)]'d + d'[Y − CE(X)]P_{D'F◦}[Y − CE(X)]'d),
since

d'C[X − E(X)](I − P_{D'})[X − E(X)]'C'd
 = d'C[X − (C')◦θD](I − P_{D'})[X − (C')◦θD]'C'd
 = d'CX(I − P_{D'})X'C'd
 = d'Y(I − P_{D'})Y'd;   (49)

d'C[X − E(X)]P_{D'(DD')^{-1}K}[X − E(X)]'C'd
 = d'C[X − (C')◦θD]P_{D'(DD')^{-1}K}[X − (C')◦θD]'C'd
 = d'CXP_{D'(DD')^{-1}K}X'C'd
 = d'Y P_{D'(DD')^{-1}K}Y'd;   (50)

d'C[X − E(X)]P_{D'F◦}[X − E(X)]'C'd
 = d'C[X − (C')◦θD]P_{D'F◦}[X − (C')◦θD]'C'd
 = d'CXP_{D'F◦}X'C'd
 = d'Y P_{D'F◦}Y'd.   (51)

From (49), (50) and (51), we can conclude that the distribution of the ratio remains the
same after we subtract the mean under the null hypothesis from X. Let Ỹ = Y − CE(X).
Then

λ^{2/N} = (d'Ỹ(I − P_{D'})Ỹ'd + d'Ỹ P_{D'(DD')^{-1}K}Ỹ'd)
          / (d'Ỹ(I − P_{D'})Ỹ'd + d'Ỹ P_{D'(DD')^{-1}K}Ỹ'd + d'Ỹ P_{D'F◦}Ỹ'd).

The next step is to show that Ỹ is spherically distributed. This is exactly as for the
parallelism hypothesis, so we omit the details. The scores, z' = d'Ỹ, will then also be
spherically distributed, according to the result in (34).
Finally, we can establish the theorem, knowing that z is spherically distributed and using
Theorem 4.3 of Fang and Zhang:

f(z) = λ^{2/N} = (z'(I − P_{D'})z + z'P_{D'(DD')^{-1}K}z)
                 / (z'(I − P_{D'})z + z'P_{D'(DD')^{-1}K}z + z'P_{D'F◦}z)
 =_d ((Rw)'(I − P_{D'})(Rw) + (Rw)'P_{D'(DD')^{-1}K}(Rw))
      / ((Rw)'(I − P_{D'})(Rw) + (Rw)'P_{D'(DD')^{-1}K}(Rw) + (Rw)'P_{D'F◦}(Rw))
 = R²(w'(I − P_{D'})w + w'P_{D'(DD')^{-1}K}w)
      / (R²(w'(I − P_{D'})w + w'P_{D'(DD')^{-1}K}w + w'P_{D'F◦}w)).


One can see that f(z) does not depend on R, which means that the ratio keeps its
distribution. As a result,

λ^{2/N} = (z'(I − P_{D'})z + z'P_{D'(DD')^{-1}K}z)
          / (z'(I − P_{D'})z + z'P_{D'(DD')^{-1}K}z + z'P_{D'F◦}z) ∼ Λ(1, N − r(D) + r(K), r(D'F◦))
        ≡ B((N − r(D) + r(K))/2, r(D'F◦)/2),

and the theorem is proven.
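As with Theorem 4.4, the Beta law of Theorem 4.6 can be checked by simulation. The
sketch below (ours; the degrees of freedom are assumed values) replaces the three
quadratic forms in (48) by independent chi-square variables with the stated degrees of
freedom, which is what the proof reduces them to when p = 1.

```python
# Monte Carlo sketch: for p = 1 the quadratic forms in (48) behave as
# independent chi-squares with df N - r(D), r(K) and r(D'F°) (assumed values).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m1, m2, m3 = 46, 3, 2         # N - r(D), r(K), r(D'F°): illustrative choices

u = rng.chisquare(m1, 20000)  # z'(I - P_{D'})z
v = rng.chisquare(m2, 20000)  # z'P_{D'(DD')^{-1}K}z
w = rng.chisquare(m3, 20000)  # z'P_{D'F°}z
ratio = (u + v) / (u + v + w)

# Agreement with B((N - r(D) + r(K))/2, r(D'F°)/2):
print(stats.kstest(ratio, stats.beta((m1 + m2) / 2, m3 / 2).cdf))
```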

5 Conclusion
In this report, profile analysis of several groups is of interest. Differently from the
existing methods proposed by Srivastava (1987, 2002), we treat the problems as problems
in MANOVA and GMANOVA. Reformulating the problems in this way is useful in the
later stages, when the test statistics are formulated in the high-dimensional setting. For
all three hypotheses, we derived a test statistic of the form |U|/|U + V|, where U and V
are independently Wishart distributed. The distribution of |U|/|U + V| is well known;
it is Wilks' lambda distribution, which can be written as a product of Beta-distributed
variables.
When p > n, one is exposed to issues such as the singularity of S. Different approaches
have been proposed, but the one we focused on is the method of Läuter (1996, 2016)
and Läuter, Glimm and Kropf (1996, 1998). The original idea is to take a linear
combination of the p variables for each individual, which means multiplying the data
matrix X by a vector d', where d is a function of XX'. We needed to adapt this idea to
make it work in profile analysis. Thus, instead of implementing the reduction just for the
data matrix, we implemented it for a linear function of X which appears in the likelihood
ratio statistic. The level hypothesis is a special case: due to the restrictions on the mean
parameter space in this hypothesis, the likelihood ratio statistic is already one-dimensional,
but in high dimensions the degrees of freedom of the Wishart distribution for U become
negative. By dimension reduction, we again attain Wilks' lambda distributions for the
ratios, which this time do not depend on p.
Spherical distribution theory is used to show that the matrices follow a Wishart distribution.
Notice that we did not need spherical distributions for the level hypothesis.
The important question which still needs to be answered is how d should be chosen. It
has been noted that d needs to be a function of CX, but how to determine this function
remains an open question. A possible candidate is sketched below.
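One possible choice, adapted from Läuter's standardized-sum scores and offered here
purely as an illustration (the report does not prescribe it), takes the components of d
proportional to the reciprocal square roots of the diagonal elements of YY', with Y = CX.
This makes d a function of YY' only, as the exactness argument requires.

```python
# Illustrative sketch (an assumption, not the report's method): a
# standardized-sum-type score vector d computed from Y = CX.
import numpy as np

def standardized_sum_score(Y):
    """d depends on Y only through YY', as required for the exact tests."""
    W = Y @ Y.T                       # total sums of products matrix of Y
    d = 1.0 / np.sqrt(np.diag(W))     # reciprocal square roots of the diagonal
    return d / np.linalg.norm(d)

rng = np.random.default_rng(2)
Y = rng.standard_normal((9, 12))      # stands in for CX, of size (p - 1) x N
z = standardized_sum_score(Y) @ Y     # score vector z' = d'Y
print(z.shape)                        # (12,), i.e. an N-vector
```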

Appendices
Appendix A contains the technical results in the classical setting (Chapter 3, where
N > p), whereas Appendix B contains the technical results in the high-dimensional
setting (Chapter 4, where p > N).

Appendix A
Result A1. Let W1 ∼ W_p(Σ, f1) and W2 ∼ W_p(Σ, f2) with f1 ≥ p. Then the distribution
of Λ = |W1|/|W1 + W2|, that is, Wilks' lambda distribution, denoted Λ(p, f1, f2),
does not depend on Σ.
To show this, choose a non-singular matrix A which satisfies AΣA' = I_p. Then

W̃1 = AW1A' ∼ W_p(I_p, f1) and W̃2 = AW2A' ∼ W_p(I_p, f2),

and

Λ̃ = |W̃1| / |W̃1 + W̃2| = |A||W1||A'| / (|A||W1 + W2||A'|) = Λ.

The distribution of Λ̃ does not depend on Σ and, since Λ̃ = Λ, the distribution of Λ is
also independent of Σ.
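A quick simulation sketch (ours; p, f1, f2 and the second Σ are arbitrary choices)
illustrates the invariance by comparing the empirical distribution of Λ under Σ = I with
that under a general positive definite Σ.

```python
# Simulation sketch: |W1|/|W1 + W2| has the same distribution for any Sigma.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
p, f1, f2 = 3, 20, 5                   # illustrative dimension and df values

def wilks_sample(sigma, n=5000):
    w1 = stats.wishart(df=f1, scale=sigma).rvs(n, random_state=rng)
    w2 = stats.wishart(df=f2, scale=sigma).rvs(n, random_state=rng)
    return np.linalg.det(w1) / np.linalg.det(w1 + w2)

a = rng.standard_normal((p, p))
lam_id = wilks_sample(np.eye(p))                 # Sigma = I
lam_gen = wilks_sample(a @ a.T + p * np.eye(p))  # an arbitrary p.d. Sigma

# A large p-value is consistent with the two samples sharing one distribution.
print(stats.ks_2samp(lam_id, lam_gen))
```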
Based on this result, one can conclude that the distribution of

λ^{2/N} = |CSC'| / |CSC' + CXP_{D'(DD')^{-1}K}X'C'|,

where

CSC' ∼ W_{p−1}(CΣC', N − r(D)),
CXP_{D'(DD')^{-1}K}X'C' ∼ W_{p−1}(CΣC', r(K)),

does not depend on Σ, and one can replace CΣC' by I_{p−1}. Thus,

λ^{2/N} ∼ Λ(p − 1, N − r(D), r(K)).

Result A2.

(18) = (I − P_{D'})(I − X'C'[CX(I − P_{D'})X'C']^{-1}CX(I − P_{D'}))
     = (I − P_{D'}) − (I − P_{D'})X'C'[CX(I − P_{D'})X'C']^{-1}CX(I − P_{D'}).

We will find the rank of this expression. If we use Proposition 2.13,

r((I − P_{D'}) − (I − P_{D'})X'C'[CX(I − P_{D'})X'C']^{-1}CX(I − P_{D'}))
 = r(I − P_{D'}) + r(I_N − X'C'[CX(I − P_{D'})X'C']^{-1}CX(I − P_{D'})) − N
 = N − r(D) + N − r(X'C'[CX(I − P_{D'})X'C']^{-1}CX(I − P_{D'})) − N
 = N − r(D) − tr(X'C'[CX(I − P_{D'})X'C']^{-1}CX(I − P_{D'}))   (by Prop. 2.14 ii))
 = N − r(D) − tr(CX(I − P_{D'})X'C'[CX(I − P_{D'})X'C']^{-1})   (by Prop. 2.14 i))
 = N − r(D) − tr(I_{p−1})
 = N − r(D) − p + 1.
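The rank formula can be confirmed numerically. The sketch below (ours; the group
design D and difference-contrast C are assumed choices) builds the matrix in (18) for a
case with N > p and compares its numerical rank with N − r(D) − p + 1.

```python
# Numerical sketch: the matrix in Result A2 has rank N - r(D) - p + 1 (N > p).
import numpy as np

rng = np.random.default_rng(4)
p, N, q = 5, 20, 4                                # assumed sizes, N > p
D = np.kron(np.eye(q), np.ones(N // q))           # q x N group-indicator design
C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)      # difference contrasts

X = rng.standard_normal((p, N))
PD = D.T @ np.linalg.solve(D @ D.T, D)            # P_{D'}
G = np.eye(N) - PD                                # I - P_{D'}
B = G @ X.T @ C.T                                 # (I - P_{D'})X'C'
M = G - B @ np.linalg.solve(C @ X @ G @ X.T @ C.T, B.T)

print(np.linalg.matrix_rank(M), N - np.linalg.matrix_rank(D) - p + 1)
```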
0
Result A3. (C')◦'Σ^{-1}X and CX are independent:
By Definition 2.12,

Cov[(C')◦'Σ^{-1}X, CX] = E[vec((C')◦'Σ^{-1}X)vec'(CX)]
 = E[(I ⊗ (C')◦'Σ^{-1})vecX vec'X(I ⊗ C')]
 = (I ⊗ (C')◦'Σ^{-1})E[vecX vec'X](I ⊗ C')
 = (I ⊗ (C')◦'Σ^{-1})(I ⊗ Σ)(I ⊗ C')   (by Prop. 2.16 ii))
 = I ⊗ (C')◦'Σ^{-1}ΣC' = 0.   (by Prop. 2.16 iii))

As a conclusion, (C')◦'Σ^{-1}X and CX are independent.
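The zero covariance can also be observed empirically. In the sketch below (ours; the
difference-contrast C is an assumed choice, for which (C')◦ is proportional to the vector
of ones), the cross-covariances between (C')◦'Σ^{-1}X and vec(CX), estimated from
simulated data, are close to zero.

```python
# Numerical sketch: empirical Cov[(C')°'Σ^{-1}X, vec(CX)] is close to zero.
import numpy as np

rng = np.random.default_rng(5)
p, N, reps = 4, 6, 20000

g = rng.standard_normal((p, p))
sigma = g @ g.T + p * np.eye(p)               # an arbitrary p.d. Sigma
L = np.linalg.cholesky(sigma)
C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)  # difference contrasts, C 1_p = 0
c0 = np.ones(p)                               # (C')°, since C(C')° = 0

a = np.empty((reps, N)); b = np.empty((reps, (p - 1) * N))
for i in range(reps):
    X = L @ rng.standard_normal((p, N))       # columns i.i.d. N_p(0, Sigma)
    a[i] = c0 @ np.linalg.solve(sigma, X)     # (C')°'Σ^{-1}X
    b[i] = (C @ X).ravel()                    # vec(CX)

print(np.abs(a.T @ b / reps).max())           # all cross-covariances near 0
```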

Result A4.

(22) = ((C')◦'S^{-1}(C')◦)^{-1}(C')◦'S^{-1}XD'(DD')^{-1}KQ^{-1/2}.

The dispersion of this expression is then written as

(23) = Q^{-1/2}K'(DD')^{-1}D[I − X'C'(CX(I − P_{D'})X'C')^{-1}CX(I − P_{D'})][I − (I − P_{D'})X'C'(CX(I − P_{D'})X'C')^{-1}CX]D'(DD')^{-1}KQ^{-1/2}
 ⊗ ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}ΣΣ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}.

The first factor of the Kronecker product given by (23) can be calculated as

1) Q^{-1/2}K'(DD')^{-1}D[I − X'C'(CX(I − P_{D'})X'C')^{-1}CX(I − P_{D'})
   − (I − P_{D'})X'C'(CX(I − P_{D'})X'C')^{-1}CX
   + X'C'(CX(I − P_{D'})X'C')^{-1}CX(I − P_{D'})X'C'(CX(I − P_{D'})X'C')^{-1}CX]D'(DD')^{-1}KQ^{-1/2}
 = Q^{-1/2}K'(DD')^{-1}DD'(DD')^{-1}KQ^{-1/2} + Q^{-1/2}K'(DD')^{-1}DX'C'(CX(I − P_{D'})X'C')^{-1}CXD'(DD')^{-1}KQ^{-1/2}
 = Q^{-1/2}K'(DD')^{-1}KQ^{-1/2} + Q^{-1/2}K'(DD')^{-1}DX'C'(CX(I − P_{D'})X'C')^{-1}CXD'(DD')^{-1}KQ^{-1/2}
 = Q^{-1/2}[K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CX(I − P_{D'})X'C')^{-1}CXD'(DD')^{-1}K]Q^{-1/2}
 = Q^{-1/2}QQ^{-1/2}
 = I.

The second factor of the Kronecker product given by (23) can be calculated as

2) ((C')◦'Σ^{-1}(C')◦)^{-1}(C')◦'Σ^{-1}ΣΣ^{-1}(C')◦((C')◦'Σ^{-1}(C')◦)^{-1}
 = ((C')◦'Σ^{-1}(C')◦)^{-1}((C')◦'Σ^{-1}(C')◦)((C')◦'Σ^{-1}(C')◦)^{-1}
 = ((C')◦'Σ^{-1}(C')◦)^{-1}.

After combining 1) and 2), one can conclude that

(23) = I ⊗ ((C')◦'Σ^{-1}(C')◦)^{-1}.

Appendix B
Result B1. The rank of the idempotent matrix that appears in the middle part of (39)
is calculated as

r[(I_N − P_{D'})(I_N − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'}))]
 = r(I_N − P_{D'}) + r[I_N − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'})] − N   (Prop. 2.13)
 = N − r(D) + N − r(X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'})) − N   (Prop. 2.14, ii))
 = N − r(D) − tr(X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'}))   (Prop. 2.14, i))
 = N − r(D) − tr(d'CX(I − P_{D'})X'C'd(d'CX(I − P_{D'})X'C'd)^{-1})
 = N − r(D) − tr(1)
 = N − r(D) − 1.
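Result B1 is easy to confirm numerically, also when p > N. The sketch below (ours; D,
C and the choice of d as the leading eigenvector of YY' are assumptions made for
illustration) checks both the idempotency and the rank N − r(D) − 1.

```python
# Numerical sketch: the middle part of (39) is idempotent with rank
# N - r(D) - 1, here in a high-dimensional case p > N.
import numpy as np

rng = np.random.default_rng(6)
p, N, q = 30, 12, 3                               # assumed sizes, p > N
D = np.kron(np.eye(q), np.ones(N // q))           # q x N group-indicator design
C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)      # difference contrasts
X = rng.standard_normal((p, N))

PD = D.T @ np.linalg.solve(D @ D.T, D)            # P_{D'}
G = np.eye(N) - PD                                # I - P_{D'}
Y = C @ X
d = np.linalg.eigh(Y @ Y.T)[1][:, -1]             # d as a function of YY' only
u = Y.T @ d                                       # X'C'd
s = u @ G @ u                                     # d'CX(I - P_{D'})X'C'd
M = G @ (np.eye(N) - np.outer(u, u) @ G / s)

print(np.allclose(M @ M, M))                                       # idempotent
print(np.linalg.matrix_rank(M), N - np.linalg.matrix_rank(D) - 1)  # equal ranks
```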

Result B2. The matrix

[I − (I − P_{D'})X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K Q̃^{-1} K'(DD')^{-1}D[I − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'})],

which is given by (43) in Chapter 4.3.2, is an idempotent matrix:
Put

B = [I − (I − P_{D'})X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K Q̃^{-1/2}.

Calculate

B'B = Q̃^{-1/2}K'(DD')^{-1}D[I − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'})
      − (I − P_{D'})X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX
      + X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'})X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K Q̃^{-1/2}
 = Q̃^{-1/2}K'(DD')^{-1}DD'(DD')^{-1}K Q̃^{-1/2} + Q̃^{-1/2}K'(DD')^{-1}DX'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CXD'(DD')^{-1}K Q̃^{-1/2}
 = Q̃^{-1/2}K'(DD')^{-1}K Q̃^{-1/2} + Q̃^{-1/2}K'(DD')^{-1}DX'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CXD'(DD')^{-1}K Q̃^{-1/2}
 = Q̃^{-1/2}[K'(DD')^{-1}K + K'(DD')^{-1}DX'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CXD'(DD')^{-1}K]Q̃^{-1/2}
 = Q̃^{-1/2}Q̃Q̃^{-1/2} = I.

Observe that (43) = BB'. To prove that (43) is idempotent, one needs to show that
BB'BB' = BB':

BB'BB' = B(B'B)B' = BB',

since B'B = I.

Result B3.

(43) = [I − (I − P_{D'})X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX]D'(DD')^{-1}K Q̃^{-1} K'(DD')^{-1}D[I − X'C'd(d'CX(I − P_{D'})X'C'd)^{-1}d'CX(I − P_{D'})]
     = BB'.

Hence

r(43) = r(BB') = tr(BB') = tr(B'B) = tr(I) = s = r(K).

The size of Q̃ is the same as the size of Q, and the information on Q's size, together with
its relation to the matrix K, is given in the proof of Theorem 3.2 in Chapter 3.2.2.

Acknowledgements
The authors would like to thank Professor Julia Volaufova and Dr. Martin Singull for their
invaluable comments and suggestions which helped to improve the report significantly.

References
[1] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis (3rd
ed). Wiley, New York.
[2] Anderson, T. W. and Fang, K. T. (1982). On the theory of multivariate elliptically
contoured distributions and their applications. Technical Report No. 54, Stanford
University, California.

[3] Anderson, T. W. and Fang, K. T. (1990). Theory and applications of elliptically
contoured and related distributions. Technical Report No. 24, Stanford University,
California.

[4] Bilodeau, M. and Brenner, D. (1999). Theory of Multivariate Statistics. Springer-


Verlag, New York.

[5] Cambanis, S., Huang, S., and Simons, G. (1981). On the theory of elliptically con-
toured distribution. Journal of Multivariate Analysis, 11, 368-385.

[6] Dawid, A. P. (1977). Spherical matrix distributions and a multivariate model. Journal
of the Royal Statistical Society, Series B, 39, 254-261.

[7] Dawid, A. P. (1978). Extendibility of spherical matrix distributions. Journal of Mul-


tivariate Analysis, 8, 559-566.

[8] Fang, K. T., Kotz, S. and Ng, K. W. (1990). Symmetric Multivariate and Related
Distributions. Springer-Science+Business Media, B.V.

[9] Fang, K. T. and Zhang, Y. T. (1990). Generalized Multivariate Analysis. Springer-


Verlag, Berlin.

[10] Fujikoshi, Y. (2009). Statistical inference for parallelism hypothesis in growth curve
model. SUT Journal of Mathematics, 45, 137-148.

[11] Fujikoshi, Y., Ulyanov, V. V., and Shimizu, R. (2010). Multivariate Statistics: High-
Dimensional and Large-Sample Approximations. Wiley, Hoboken, New Jersey.

[12] Geisser, S. (2003). The analysis of profile data–revisited. Statistics in Medicine, 22,
3337-3346.

[13] Greenhouse, S. W. and Geisser, S. (1959). On the methods in the analysis of profile
data. Psychometrika, 24, 95-112.

[14] Harrar, S. W. and Kong, X. (2016). High-dimensional multivariate repeated measures


analysis with unequal covariance matrices. Journal of Multivariate Analysis 145, 1-21.

[15] Kariya, T. and Sinha, B. K. (1989). Robustness of Statistical Tests. Academic Press,
Boston.

[16] Kelker, D. (1970). Distribution theory of spherical distributions and a location-scale


parameter generalization. Sankhyā Ser. A 32, 419-430.

[17] Kollo, T. and von Rosen, D. (2005). Advanced Multivariate Statistics with Matrices.
Springer, Dordrecht.

[18] Kollo, T., von Rosen, T. and von Rosen, D. (2011). Estimation in high-dimensional
analysis and multivariate linear models. Communications in Statistics - Theory and
Methods, 40, 1241-1253.

[19] Kropf, S. (2000). Hochdimensionale Multivariate Verfahren In Der Medizinischen


Statistik. Shaker Verlag, Aachen.

[20] Läuter, J. (1996). Exact t and F tests for analyzing studies with multiple endpoints.
Biometrics, 52, 964-970.

[21] Läuter, J. (2016). Multivariate Statistik - drei Manuskripte. Shaker Verlag, Aachen.

[22] Läuter, J., Glimm, E. and Kropf, S. (1996). New multivariate tests for data with an
inherent structure. Biometrical Journal, 38, 5-23.

[23] Läuter, J., Glimm, E., and Kropf, S. (1998). Multivariate tests based on left-
spherically distributed linear scores. The Annals of Statistics 26, 1972-1988.

[24] Ledoit, O. and Wolf, M. (2002). Some hypothesis tests for the covariance matrix
when the dimension is large compared to the sample size. The Annals of Statistics
30, 1081-1102.

[25] Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic
Press, New York.

[26] Morrison, D. F. (2004). Multivariate Statistical Methods, 4th edition. Duxbury Press,
CA.

[27] Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York.

[28] O’Brien, P. C. (1984). Procedures for comparing samples with multiple endpoints.
Biometrics, 40, 1079-1087.

[29] Ohlson, M. and Srivastava, M. S. (2010). Profile analysis for a growth curve model.
Journal of the Japan Statistical Society, 40, 1-21.

[30] Onozawa, M., Nishiyama, T. and Seo, T. (2016). On test statistics in profile analysis
with high-dimensional data. Communications in Statistics - Simulation and Compu-
tation, 45, 3716-3743.

[31] Potthoff, R. F. and Roy, S. N. (1964). A generalized multivariate analysis of variance


model useful especially for growth curve problems. Biometrika 51, 313-326.

[32] Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd edition.
Wiley, New York.

[33] Rencher, A. C. (2002). Methods of Multivariate Analysis, 2nd edition. Wiley, New
York.

[34] Seo, T., Sakurai, T. and Fujikoshi, Y. (2011). LR tests for two hypotheses in profile
analysis of growth curve data. SUT Journal of Mathematics, 47, 105-118.

[35] Shutoh, N. and Takahashi, S. (2016). Tests for parallelism and flatness hypotheses
of two mean vectors in high-dimensional settings. Journal of Statistical Computation
and Simulation, 86, 1150-1165.

[36] Srivastava, M. S. (1987). Profile analysis of several groups. Communications in


Statistics - Theory and Methods, 16, 909-926.

[37] Srivastava, M. S. (2002). Methods of Multivariate Statistics. Wiley, New York.

[38] Srivastava, M. S. (2005). Some tests concerning the covariance matrix in high di-
mensional data. Journal of the Japan Statistical Society, 35, 251–272.

[39] Srivastava, M. S. (2007). Multivariate theory for analyzing high dimensional data.
Journal of the Japan Statistical Society, 37, 53-86.

[40] Srivastava, M. S. and Carter, E. M. (1983). An Introduction to Applied Multivariate
Statistics. North Holland, New York.

[41] Srivastava, M. S. and Du, M. (2008). A test for the mean vector with fewer obser-
vations than the dimension. Journal of Multivariate Analysis, 99, 386–402.

[42] Srivastava, M. S. and Fujikoshi, Y. (2006). Multivariate analysis of variance


with fewer observations than the dimension. Journal of Multivariate Analysis, 97,
1927–1940.

[43] Srivastava, M. S. and Khatri, C. G. (1979). An Introduction to Multivariate Statistics.


North Holland, New York.

[44] Srivastava, M. S. and Singull, M. (2012). Profile analysis with random-effects covari-
ance structure. Journal of the Japan Statistical Society, 42, 145-164.

[45] Srivastava, M. S. and Singull, M. (2017). Test for the mean matrix in a growth curve
model for high dimensions. Communications in Statistics - Theory and Methods, 46,
6668-6683.

[46] von Rosen, D. (2018). Bilinear Regression Analysis: An Introduction. Springer In-
ternational Publishing, New York.

[47] Yokoyama, T. (1995). LR test for random-effects covariance structure in a parallel


profile model. Annals of the Institute of Statistical Mathematics, 47, 309-320.

[48] Yokoyama, T. and Fujikoshi, Y. (1993). A parallel profile model with random-effects
covariance structure. Journal of the Japan Statistical Society, 23, 83-89.

