Linear and Nonlinear Models:
Fixed Effects, Random Effects, and Mixed Models
Erik W. Grafarend
Walter de Gruyter
Grafarend · Linear and Nonlinear Models
Erik W. Grafarend
Linear and
Nonlinear Models
Fixed Effects, Random Effects, and Mixed Models
Walter de Gruyter · Berlin · New York
Author
Erik W. Grafarend, em. Prof. Dr.-Ing. habil. Dr. tech. h.c. mult Dr.-Ing. E.h. mult
Geodätisches Institut
Universität Stuttgart
Geschwister-Scholl-Str. 24/D
70174 Stuttgart, Germany
E-Mail: grafarend@gis.uni-stuttgart.de
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence
and durability.
Grafarend, Erik W.
Linear and nonlinear models : fixed effects, random effects, and
mixed models / by Erik W. Grafarend.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-3-11-016216-5 (hardcover : acid-free paper)
ISBN-10: 3-11-016216-4 (hardcover : acid-free paper)
1. Regression analysis. 2. Mathematical models. I. Title.
QA278.2.G726 2006
519.5'36–dc22
2005037386
ISBN-13: 978-3-11-016216-5
ISBN-10: 3-11-016216-4
© Copyright 2006 by Walter de Gruyter GmbH & Co. KG, 10785 Berlin
All rights reserved, including those of translation into foreign languages. No part of this book may
be reproduced or transmitted in any form or by any means, electronic or mechanical, including
photocopy, recording, or any information storage and retrieval system, without permission in
writing from the publisher.
Printed in Germany
Cover design: Rudolf Hübler, Berlin
Typeset using the author’s word files: M. Pfizenmaier, Berlin
Printing and binding: Hubert & Co. GmbH & Co. Kg, Göttingen
Preface
“Well, Mr. Jacobi; here it is: all the generalized inversion of two generations of
inventors who knowingly or unknowingly subscribed and extended your dictum.
Please, forgive us if we have over-inverted, or if we have not always inverted in
the natural and sensible way. Some of us have inverted with labor and pain by
using hints from a dean or a tenure and promotion committee that “you better
invert more, or else you would be inverted.””
M.Z. Nashed, L.B. Rall
There is a certain intention in reviewing linear and nonlinear models from the
point-of-view of fixed effects, random effects and mixed models. First, we want
to portray the different models from the algebraic point of view – for instance a
minimum norm, least squares solution (MINOLESS) – versus the stochastic
point-of-view – for instance a minimum bias, minimum variance “best” solution
(BLIMBE). We are especially interested in the question under which assumption
the algebraic solution coincides with the stochastic solution, for instance when
MINOLESS is identical to BLIMBE.
The stochastic approach is richer with respect to modeling. Besides the first-order
moments, the expectation of a random variable, we also need a design for the
central second-order moments, the variance-covariance matrix of the random
variable, as long as we deal with second-order statistics. Second, we therefore
set up a unified approach to estimate (predict) the first-order moments, for
instance by BLUUE (BLUUP), and the central second-order moments, for instance
by BIQUUE, if they exist. In short, BLUUE (BLUUP) stands for "Best
Linear Uniformly Unbiased Estimation" (Prediction) and BIQUUE alternatively
for "Best Invariant Quadratic Uniformly Unbiased Estimation".
A third criterion is the decision whether the observation vector is inconsistent or
random, whether the unknown parameter vector is random or not, whether the
"first design matrix" within a linear model is random or not, and finally whether
the "mixed model" $E\{y\} = A\xi + C E\{\zeta\} + E\{\Xi\}\gamma$ has to be applied if we restrict
ourselves to linear models. How to handle a nonlinear model where we have a
priori information about approximate values will be outlined in detail. As a
special case we also deal with "condition equations with unknowns"
$BE\{y\} - c = A\xi$, where the matrices/vector $\{A, B, c\}$ are given and the observation
vector y is again a random variable.
the modern theory of dynamic nonlinear models and comment on the theory of
chaotic behavior as its up-to-date counterparts.
In the appendices we specialize in specific topics. Appendix A is a review of
matrix algebra, namely special matrices, scalar measures and inverse matrices,
eigenvalues and eigenvectors, and generalized inverses. The counterpart is matrix
analysis, which we outline in Appendix B. We begin with derivatives of scalar-valued
and vector-valued vector functions, followed by a chapter on derivatives
of trace forms and determinantal forms. A specialty is the derivative of a
vector/matrix function of a vector/matrix. We learn how to differentiate the
Kronecker-Zehfuß product and matrix-valued symmetric or antisymmetric matrix
functions. Finally we show how to compute higher-order derivatives. Appendix C is an
elegant review of Lagrange multipliers. The lengthy Appendix D introduces
sampling distributions and their use: confidence intervals and confidence
regions. As peculiar vehicles we show how to transform random variables. A first
confidence interval for Gauss-Laplace normally distributed observations is computed
for the case $\mu$, $\sigma^2$ known, for example the Three Sigma Rule. A second confidence
interval for sampling from the Gauss-Laplace normal distribution is built for
the mean on the assumption that the variance is known.
The alternative sampling from the Gauss-Laplace normal distribution leads to the
third confidence interval for the mean, variance unknown, based on the Student
sampling distribution. The fourth confidence interval, for the variance, is based
on the analogous sampling for the variance based on the $\chi^2$ (Helmert) distribution.
For both intervals of confidence, namely the one based on the Student sampling
distribution for the mean, variance unknown, and the one based on the $\chi^2$ (Helmert)
distribution for the sample variance, we compute the corresponding Uncertainty
Principle. The case of a multidimensional Gauss-Laplace normal distribution is
outlined for the computation of confidence regions for fixed parameters in the linear
Gauss-Markov model. Key statistical notions like moments of a probability
distribution, the Gauss-Laplace normal distribution (quasi-normal distribution),
error propagation, as well as the important notions of identifiability and
unbiasedness are reviewed. We close with bibliographical indices.
Here we are not solving rank-deficient or ill-posed problems using UTV or QR
factorization techniques. Instead we refer to Å. Björck (1996), P. Businger and G. H.
Golub (1965), T. F. Chan and P. C. Hansen (1991, 1992), S. Chandrasekaran
and I. C. Ipsen (1995), R. D. Fierro (1998), R. D. Fierro and J. R. Bunch (1995),
R. D. Fierro and P. C. Hansen (1995, 1997), L. V. Foster (2003), G. Golub and
C. F. van Loan (1996), P. C. Hansen (1990a, b, 1992, 1994, 1995, 1998), Y.
Hosoda (1999), C. L. Lawson and R. J. Hanson (1974), R. Mathias and G. W.
Stewart (1993), A. Neumaier (1998), H. Ren (1996), G. W. Stewart (1992,
1998), L. N. Trefethen and D. Bau (1997).
My special thanks for numerous discussions go to J. Awange (Kyoto/Japan), A.
Bjerhammar (Stockholm/Sweden), F. Brunner (Graz/Austria), J. Cai (Stutt-
gart/Germany), A. Dermanis (Thessaloniki/Greece), W. Freeden (Kaiserslautern
/Germany), R. Jurisch (Dessau/Germany), J. Kakkuri (Helsinki/Finland), G.
Kampmann (Dessau/Germany), K. R. Koch (Bonn/Germany), F. Krumm (Stutt-
gart/Germany), O. Lelgemann (Berlin/Germany), H. Moritz (Graz/Austria), F.
Sansò (Milano/Italy), B. Schaffrin (Columbus/Ohio/USA), L. Sjöberg (Stockholm/Sweden),
N. Sneeuw (Calgary/Canada), L. Svensson (Gävle/Sweden), P.
Vaníček (Fredericton/New Brunswick/Canada).
For the book production I want to thank in particular J. Cai, F. Krumm, A.
Vollmer, M. Paweletz, T. Fuchs, A. Britchi, and D. Wilhelm (all from Stuttgart/
Germany).
At the end my sincere thanks go to the Walter de Gruyter Publishing Company
for including my book into their Geoscience Series, in particular to Dr. Manfred
Karbe and Dr. Robert Plato for all support.
1 The first problem of algebraic regression
– consistent system of linear observational equations –
underdetermined system of linear equations:
$$\{Ax = y \mid A \in \mathbb{R}^{n\times m},\ y \in R(A) \Leftrightarrow \mathrm{rk}\,A = n,\ n = \dim Y\}$$
Lemma 1.2: $x_m$, $G_x$-MINOS of $x$
Lemma 1.3: $x_m$, $G_x$-MINOS of $x$
Lemma 1.6: adjoint operator $A^{\#}$
Corollary 1.8: symmetric pair of eigensystems
Lemma 1.9: canonical MINOS
1-1 Introduction
With the introductory paragraph we explain the fundamental
concepts and basic notions of this section. For you, the analyst,
who has the difficult task to deal with measurements,
observational data, modeling and modeling equations we present
numerical examples and graphical illustrations of all abstract
notions. The elementary introduction is written not for a mathe-
matician, but for you, the analyst, with limited remote control of
the notions given hereafter. May we gain your interest.
Assume an $n$-dimensional observation space $Y$, here a linear space parameterized
by $n$ observations (finite, discrete) as coordinates $y = [y_1, \ldots, y_n]' \in \mathbb{R}^n$, in
which an $r$-dimensional model manifold is embedded (immersed). The model
First, the introductory example solves the front-page consistent system of linear
equations,
$$x_1 + x_2 + x_3 = 2$$
$$x_1 + 2x_2 + 4x_3 = 3,$$
$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$
$$y = Ax:\quad \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$
$$x' = [x_1, x_2, x_3],\quad y' = [y_1, y_2] = [2, 3],$$
$$x \in \mathbb{R}^{3\times 1},\quad y \in \mathbb{Z}_+^{2\times 1} \subset \mathbb{R}^{2\times 1}$$
$$A := \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix} \in \mathbb{Z}_+^{2\times 3} \subset \mathbb{R}^{2\times 3}$$
$$r = \mathrm{rk}\,A = \dim Y = n = 2.$$
Just observe that we are left with only two linear equations for three unknowns
$(x_1, x_2, x_3)$. Indeed the system of inhomogeneous linear equations is "underdetermined".
Without any additional postulate we shall be unable to invert those
equations for $(x_1, x_2, x_3)$. In particular we shall outline how to find such an
additional postulate. Beforehand we have to introduce some special notions from
the theory of operators.
Within matrix algebra the index of the linear operator $A$ is the rank $r = \mathrm{rk}\,A$,
here $r = 2$, which coincides with the dimension of the observation space, here
$n = \dim Y = 2$. A system of linear equations is called consistent if $\mathrm{rk}\,A = \dim Y$.
Alternatively we say that the mapping $f: x \mapsto y = f(x) \in R(f)$ or
$A: x \mapsto Ax = y \in R(A)$ takes an element $x \in X$ into the range $R(f)$ or the
range space $R(A)$, also called the column space of the matrix $A$.
$$f: x \mapsto y = f(x),\quad y \in R(f)$$
$$A: x \mapsto Ax = y,\quad y \in R(A).$$
Here the column space is spanned by the first column $c_1$ and the second column
$c_2$ of the matrix $A$, the $2\times 3$ array, namely
$$R(A) = \mathrm{span}\left\{\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix}\right\}.$$
Let us continue with operator theory. The right complementary index of the
linear operator $A \in \mathbb{R}^{n\times m}$ is the number $d = m - \mathrm{rk}\,A$, which accounts for the
injectivity defect (here $d = m - \mathrm{rk}\,A = 1$). "Injectivity" relates to the kernel
$N(f)$, or "the null space", which we shall constructively introduce later on.
How can such a linear model of interest, namely a system of consistent linear
equations, be generated?
Let us assume that we have observed a dynamical system $y(t)$ which is represented
by a polynomial of degree two with respect to time $t \in \mathbb{R}$, namely
$$y(t) = x_1 + x_2 t + x_3 t^2.$$
Box 1.2:
Special linear model: polynomial of degree two,
two observations, three unknowns
$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$
$$\begin{cases} t_1 = 1,\ y_1 = 2 \\ t_2 = 2,\ y_2 = 3 \end{cases}:\quad \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \sim$$
$$\sim y = Ax,\quad r = \mathrm{rk}\,A = \dim Y = n = 2.$$
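The construction of Box 1.2 is easy to reproduce numerically. A minimal sketch (Python with numpy, an assumed tooling choice; the variable names are ours, not the book's) builds the degree-two design matrix from the observation epochs and confirms that the system is underdetermined:

```python
# Sketch of the special linear model of Box 1.2: a degree-two polynomial
# y(t) = x1 + x2*t + x3*t^2 observed at t1 = 1, t2 = 2 -- two equations,
# three unknowns.
import numpy as np

t = np.array([1.0, 2.0])
y = np.array([2.0, 3.0])

# design matrix with rows [1, t, t^2]
A = np.vander(t, N=3, increasing=True)

n, m = A.shape                    # n = 2 observations, m = 3 unknowns
r = np.linalg.matrix_rank(A)      # r = rk A = n = 2 < m: underdetermined
```

Because $r = n < m$, no additional row can pin down a unique $x$; the minimum norm postulate of the following sections supplies the missing criterion.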
Third, let us begin with a more detailed analysis of the linear mapping
$f: Ax = y$, namely of the linear operator $A \in \mathbb{R}^{n\times m}$, $r = \mathrm{rk}\,A = \dim Y = n$. We
shall pay special attention to the three fundamental partitionings, namely
(i) algebraic partitioning, called rank partitioning of the matrix $A$,
(ii) geometric partitioning, called slicing of the linear space $X$,
(iii) set-theoretical partitioning, called fibering of the domain $D(f)$.
$$\frac{\partial \mathcal{L}}{\partial x_2}(x_{2m}) = 0$$
$$-A_2'(A_1A_1')^{-1}y + [A_2'(A_1A_1')^{-1}A_2 + I]\,x_{2m} = 0$$
$$x_{2m} = [A_2'(A_1A_1')^{-1}A_2 + I]^{-1}A_2'(A_1A_1')^{-1}y,$$
we finally derive
$$x_{2m} = A_2'(A_1A_1' + A_2A_2')^{-1}y.$$
$$\frac{\partial^2 \mathcal{L}}{\partial x_2\,\partial x_2'}(x_{2m}) = 2[A_2'(A_1A_1')^{-1}A_2 + I] > 0,$$
due to positive-definiteness of the matrix $A_2'(A_1A_1')^{-1}A_2 + I$,
generates the sufficiency condition for obtaining the minimum of
the unconstrained Lagrangean. Finally let us backward transform
$$x_{2m} \mapsto x_{1m} = -A_1^{-1}A_2x_{2m} + A_1^{-1}y,$$
$$x_{1m} = -A_1^{-1}A_2A_2'(A_1A_1' + A_2A_2')^{-1}y + A_1^{-1}y,$$
$$\begin{cases} x_{1m} = -A_1^{-1}A_2A_2'(A_1A_1' + A_2A_2')^{-1}y + A_1^{-1}y \\ x_{2m} = A_2'(A_1A_1' + A_2A_2')^{-1}y \end{cases}$$
or
$$\begin{bmatrix} x_{1m} \\ x_{2m} \end{bmatrix} = \begin{bmatrix} A_1' \\ A_2' \end{bmatrix}(A_1A_1' + A_2A_2')^{-1}y$$
or
$$x_m = A'(AA')^{-1}y.$$
$$A_1'(A_1A_1' + A_2A_2')^{-1} = \frac{1}{14}\begin{bmatrix} 14 & -4 \\ 7 & -1 \end{bmatrix}$$
$$A_2'(A_1A_1' + A_2A_2')^{-1} = \frac{1}{14}[-7,\ 5]$$
$$x_{1m} = \begin{bmatrix} \tfrac{8}{7} \\ \tfrac{11}{14} \end{bmatrix},\quad x_{2m} = \frac{1}{14},\quad \|x_m\|_I = \frac{3}{14}\sqrt{42}$$
$$y(t) = \frac{8}{7} + \frac{11}{14}t + \frac{1}{14}t^2$$
$$\frac{\partial^2 \mathcal{L}}{\partial x_2\,\partial x_2'}(x_{2m}) = 2[A_2'(A_1A_1')^{-1}A_2 + I] = 28 > 0.$$
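These closed-form numbers can be checked directly. A small verification sketch (numpy assumed; `x_m` is our name for the minimum norm solution):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])

# MINOS for the identity metric: x_m = A'(AA')^{-1} y
x_m = A.T @ np.linalg.solve(A @ A.T, y)
norm_xm = np.linalg.norm(x_m)
```

For a full-row-rank $A$ this coincides with `np.linalg.pinv(A) @ y`, the Moore-Penrose solution, and reproduces $x_m = [8/7,\ 11/14,\ 1/14]'$ with $\|x_m\|_I = \tfrac{3}{14}\sqrt{42}$.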
$$\{e_1 + e_2,\ e_1 + 2e_2 \mid O\}$$
or
$$\{e_1',\ e_2' \mid O\},$$
with respect to the orthogonal base vectors $e_1$ and $e_2$, respectively, attached to the
origin $O$. Symbolically we write
$$R(A) = \mathrm{span}\{e_1 + e_2,\ e_1 + 2e_2 \mid O\}.$$
[Figure: the range space $R(A) = \mathrm{span}\{e_1' = e_1 + e_2,\ e_2' = e_1 + 2e_2\}$ relative to the orthonormal frame $\{e_1, e_2 \mid O\}$.]
$$\left\{\ A \in \mathbb{R}^{n\times m} \Rightarrow A = [A_1, A_2],\ A_1 \in \mathbb{R}^{n\times r},\ A_2 \in \mathbb{R}^{n\times d}\ \middle|\ \begin{array}{l} r = \mathrm{rk}\,A = \mathrm{rk}\,A_1 = n \\ d = d(A) = m - \mathrm{rk}\,A \end{array}\right\}$$
holds. (In the introductory example $A \in \mathbb{R}^{2\times 3}$, $A_1 \in \mathbb{R}^{2\times 2}$,
$A_2 \in \mathbb{R}^{2\times 1}$, $\mathrm{rk}\,A = 2$, $d(A) = 1$ applies.) A consistent system of
linear equations $Ax = y$, $\mathrm{rk}\,A = \dim Y$, is "horizontally rank
partitioned" if
$$Ax = y,\ \mathrm{rk}\,A = \dim Y \Leftrightarrow A_1x_1 + A_2x_2 = y$$
$$x_1 = [x_1, x_2]' \in \mathbb{R}^{2\times 1},\quad x_2 = [x_3] \in \mathbb{R}$$
$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \end{bmatrix}x_3 = y.$$
By means of the horizontal rank partitioning of the system of
homogeneous linear equations an identification of the null space
$N(A)$, namely
$$N(A) = \{x \in \mathbb{R}^m \mid Ax = A_1x_1 + A_2x_2 = 0\},$$
is
$$A_1x_1 + A_2x_2 = 0 \Leftrightarrow x_1 = -A_1^{-1}A_2x_2,$$
$$x_1 = 2x_3 = 2\tau,\quad x_2 = -3x_3 = -3\tau,\quad x_3 = \tau.$$
Here the two equations $Ax = 0$ for any $x \in X = \mathbb{R}^3$ constitute the linear space
$N(A)$, $\dim N(A) = 1$, a one-dimensional subspace of $X = \mathbb{R}^3$. For instance, if
we introduce the parameter $x_3 = \tau$, the other coordinates of the parameter
space $X = \mathbb{R}^3$ amount to $x_1 = 2\tau$, $x_2 = -3\tau$. In geometric language the linear
space $N(A)$ is a parameterized straight line $L_0^1$ through the origin, illustrated by
Figure 1.2. The parameter space $X = \mathbb{R}^m$ (here $m = 3$) is sliced by the subspace,
the linear space $N(A)$, also called linear manifold, $\dim N(A) = d(A) = d$, here a
straight line $L_0^1$ through the origin $O$.
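The null-space line can be reproduced numerically. A sketch (numpy assumed; `direction` is our label for the generator of $N(A)$):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
A1, A2 = A[:, :2], A[:, 2:]            # horizontal rank partitioning

# x1 = -A1^{-1} A2 x3 with x3 = tau = 1 generates the null-space line
direction = np.append(-np.linalg.solve(A1, A2).ravel(), 1.0)
```

Every multiple $\tau \cdot [2, -3, 1]'$ is annihilated by $A$, which is exactly the parameterization $x_1 = 2\tau$, $x_2 = -3\tau$, $x_3 = \tau$ above.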
Compute
the "horizontal rank partitioning"
$$A = [A_1, A_2],\quad A_1 \in \mathbb{R}^{r\times r} = \mathbb{R}^{n\times n},\quad A_2 \in \mathbb{R}^{n\times(m-r)} = \mathbb{R}^{n\times(m-n)}$$
"$m - r = m - n = d$ is called the
right complementary index."
"$A$ as a linear operator is not
injective, but surjective."
Compute
the null space $N(A)$
$$N(A) := \{x \in \mathbb{R}^m \mid Ax = 0\} = \{x \in \mathbb{R}^m \mid x_1 + A_1^{-1}A_2x_2 = 0\}$$
$$x_m = A'(AA')^{-1}y.$$
or
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = -\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 4 \end{bmatrix}x_3 + \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
$$x_1 = 2x_3 + 2y_1 - y_2,\quad x_2 = -3x_3 - y_1 + y_2.$$
$$L^1(y_1, y_2) := \{x \in \mathbb{R}^3 \mid x_1 = 2x_3 + 2y_1 - y_2,\ x_2 = -3x_3 - y_1 + y_2\},$$
[Figure: the solution line $L^1(y_1, y_2) := \{x \in \mathbb{R}^3 \mid x_1 = 2x_3 + 2y_1 - y_2,\ x_2 = -3x_3 - y_1 + y_2\}$, sliced relative to the null space $N(A)$ and its orthogonal complement $N(A)^{\perp}$.]
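The general solution of the consistent system, a particular solution plus an arbitrary null-space element, can be sketched as follows (numpy assumed; the helper `general_solution` is our hypothetical name, not the book's notation):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
A1, A2 = A[:, :2], A[:, 2:]
y = np.array([2.0, 3.0])

def general_solution(x3):
    """Back-substitution x1 = -A1^{-1} A2 x3 + A1^{-1} y of the rank partitioning."""
    x12 = -np.linalg.solve(A1, A2).ravel() * x3 + np.linalg.solve(A1, y)
    return np.append(x12, x3)
```

Every choice of $x_3$ reproduces the observations exactly; the minimum norm solution picks the point of this line closest to the origin.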
The geometric interpretation of the minimum norm solution $\|x\|_I = \min$ is the
following: with reference to Figure 1.4 we decompose the vector
$$x = x_{N(A)} + x_{N(A)^{\perp}},$$
where the component $x_{N(A)}$ is an element of the null space $N(A)$ (here: the straight line $L^1(0,0)$), while the inconsistency
parameter $i = x_{N(A)^{\perp}}$ is an element of the range space $R(A^-)$ (here: the straight
line $L^1(y_1, y_2)$, namely $L^1(2, 3)$) of the generalized inverse matrix $A^-$ of type
MINOS ("minimum norm solution"). $\|x\|_I^2 = \|x_{N(A)} + x_{N(A)^{\perp}}\|^2
= \|x_{N(A)}\|^2 + 2\langle x_{N(A)} \mid i\rangle + \|i\|^2$ is minimal if and only if the inner product $\langle x_{N(A)} \mid i\rangle$ vanishes.
$$Ax_m = AA_m^-Ax_m \Rightarrow AA^-A = A.$$
$$x_m = A'(AA')^{-1}y = A_m^-y = A_m^-Ax_m$$
$$x_m = A_m^-y = A_m^-AA_m^-y$$
$$A_m^-y = A_m^-AA_m^-y \Rightarrow A^-AA^- = A^-.$$
$$x_m = A_m^-y = A_m^-Ax_m \Rightarrow A^-A = P_{R(A^-)}.$$
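The three conditions can be verified numerically for the introductory example. A sketch (numpy assumed; `Am` stands for the MINOS-type generalized inverse $A_m^- = A'(AA')^{-1}$):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
Am = A.T @ np.linalg.inv(A @ A.T)   # MINOS-type g-inverse A'(AA')^{-1}

P = Am @ A                          # candidate projector onto R(A^-)
```

The assertions below check the g-inverse identity, reflexivity, and the projector property in turn.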
Box 1.7:
The general solution of a consistent system of linear equations;
$f: x \mapsto y = Ax$, $x \in X = \mathbb{R}^m$ (parameter space), $y \in Y = \mathbb{R}^n$
(observation space), $r = \mathrm{rk}\,A = \dim Y$, $A^-$ generalized inverse
of MINOS type.
Condition #1:
$$f(x) = f(g(y)),\quad f = f \circ g \circ f \quad\Leftrightarrow\quad Ax = AA^-Ax,\quad AA^-A = A.$$
Condition #2 (reflexive g-inverse mapping, reflexive g-inverse):
$$x = g(y) = g(f(x)) \quad\Leftrightarrow\quad A^-y = A^-AA^-y,\quad A^-AA^- = A^-.$$
Condition #3:
$$g(f(x)) = x_{R(A^-)},\quad g \circ f = \mathrm{proj}_{R(A^-)} \quad\Leftrightarrow\quad A^-Ax = x_{R(A^-)},\quad A^-A = \mathrm{proj}_{R(A^-)}.$$
The set-theoretical partitioning, the fibering of the set system of points which
constitute the parameter space $X$, the domain $D(f)$, will finally be outlined.
Since the set system $X$ (the parameter space) is $\mathbb{R}^m$, the fibering is called "trivial".
Non-trivial fibering is reserved for nonlinear models, in which case we are
dealing with a parameter space $X$ which is a differentiable manifold. Here the
fibering
$$D(f) = N(f) \cup N(f)^{\perp}$$
Figure 1.5: Venn diagram, trivial fibering of the domain $D(f)$, trivial fibers
$N(f)$ and $N(f)^{\perp}$, $f: \mathbb{R}^m = X \to Y = \mathbb{R}^n$, $Y = R(f)$; $X$ set system
of the parameter space, $Y$ set system of the observation space.
1-2 The minimum norm solution: “MINOS”
The system of consistent linear equations $Ax = y$ subject to $A \in \mathbb{R}^{n\times m}$, $\mathrm{rk}\,A =
n < m$, allows certain solutions which we introduce by means of Definition 1.1
$$\frac{1}{2}\frac{\partial^2 \mathcal{L}}{\partial x\,\partial x'}(x_m, \lambda_m) = G_x \geq 0,$$
due to the positive semidefiniteness of the matrix $G_x$, generates the sufficiency
condition for obtaining the minimum of the constrained Lagrangean. Due to the
assumption $\mathrm{rk}\,A = \mathrm{rk}[A, y] = n$, or $y \in R(A)$, the existence of $G_x$-MINOS $x_m$
is guaranteed. In order to prove uniqueness of $G_x$-MINOS $x_m$ we have to consider
case (i) $G_x$ positive definite and case (ii) $G_x$ positive semidefinite.
Case (i): $G_x$ positive definite
Due to $\mathrm{rk}\,G_x = m$, $|G_x| \neq 0$, the partitioned system of normal equations
$$|G_x| \neq 0,\quad \begin{bmatrix} G_x & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$$
is uniquely solved. The theory of inverse partitioned matrices (IPM) is presented
in Appendix A.
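For a positive definite $G_x$ the bordered system can also be solved directly. A sketch (numpy assumed; the metric $G_x = \mathrm{Diag}(1, 2, 4)$ is an arbitrary illustrative choice, not one from the text):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])
Gx = np.diag([1.0, 2.0, 4.0])      # assumed positive definite metric

n, m = A.shape
# bordered system [[Gx, A'], [A, 0]] [x_m; lambda_m] = [0; y]
K = np.block([[Gx, A.T], [A, np.zeros((n, n))]])
x_m = np.linalg.solve(K, np.concatenate([np.zeros(m), y]))[:m]
```

The solution agrees with the closed form $x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}y$ derived below.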
Case (ii): $G_x$ positive semidefinite
Follow these algorithmic steps: multiply the second normal equation by $A'$ in
order to produce $A'Ax - A'y = 0$ or $A'Ax = A'y$, and add the result to the first
normal equation, which generates
$$G_xx_m + A'Ax_m + A'\lambda_m = A'y \quad\text{or}\quad (G_x + A'A)x_m + A'\lambda_m = A'y.$$
The augmented first normal equation and the original second normal equation
build up the equivalent system of normal equations
$$|G_x + A'A| \neq 0,\quad \begin{bmatrix} G_x + A'A & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} A' \\ I_n \end{bmatrix}y,$$
$$i_m = x - x_m = [I_m - G_x^{-1}A'(AG_x^{-1}A')^{-1}A]\,x. \quad(1.17)$$
$$x = x_m + i_m \quad(1.22)$$
$$i_m = \{I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A\}\,x. \quad(1.24)$$
$$\mathrm{rk}\,(G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A = \mathrm{rk}\,A = n,$$
and
$$\{I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A\},$$
$$\mathrm{rk}\{I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A\} = m - \mathrm{rk}\,A = d,$$
are independent. The corresponding norms are positive semidefinite,
namely
$$\|x_m\|^2_{G_x + A'A} = y'[A(G_x + A'A)^{-1}A']^{-1}y
= x'A'[A(G_x + A'A)^{-1}A']^{-1}Ax = x'G_mx, \quad(1.25)$$
: Proof :
A basis of the proof could be C. R. Rao's Pandora Box, the theory of inverse
partitioned matrices (Appendix A: Fact: Inverse Partitioned Matrix (IPM) of a
symmetric matrix). Due to the rank identity $\mathrm{rk}\,A = \mathrm{rk}(AG_x^{-1}A') = n < m$, the
normal equations of case (i) as well as of case (ii) can also be solved faster
directly by Gauss elimination.
$$\begin{bmatrix} G_x & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$$
$$G_xx_m + A'\lambda_m = 0$$
$$Ax_m = y.$$
Multiply the first normal equation by $AG_x^{-1}$ and subtract the second normal
equation from the modified first one.
$$Ax_m + AG_x^{-1}A'\lambda_m = 0$$
$$Ax_m = y$$
$$\Rightarrow\quad \lambda_m = -(AG_x^{-1}A')^{-1}y.$$
The forward reduction step is followed by the backward reduction step. Implement
$\lambda_m$ into the first normal equation and solve for $x_m$.
$$G_xx_m - A'(AG_x^{-1}A')^{-1}y = 0$$
$$x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}y$$
$$x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}y,\quad \lambda_m = -(AG_x^{-1}A')^{-1}y.$$
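The two reduction steps translate line by line into code. A sketch (numpy assumed; the metric is the same arbitrary positive definite choice used above):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])
Gx = np.diag([1.0, 2.0, 4.0])                 # assumed positive definite metric
Gi = np.linalg.inv(Gx)

lam = -np.linalg.solve(A @ Gi @ A.T, y)       # forward: lambda_m = -(A Gx^{-1} A')^{-1} y
x_m = -Gi @ A.T @ lam                         # backward: Gx x_m + A' lambda_m = 0
```

Both normal equations are then satisfied exactly, as the assertions confirm.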
For Case (iii), to the first normal equation we add the term $A'Ax_m = A'y$ and
write the modified normal equations
$$\begin{bmatrix} G_x + A'A & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} A' \\ I_n \end{bmatrix}y.$$
Multiply the first modified normal equation by $A(G_x + A'A)^{-1}$ and subtract the
second normal equation from the modified first one.
$$Ax_m + A(G_x + A'A)^{-1}A'\lambda_m = A(G_x + A'A)^{-1}A'y$$
$$Ax_m = y$$
$$\Rightarrow\quad A(G_x + A'A)^{-1}A'\lambda_m = [A(G_x + A'A)^{-1}A' - I_n]\,y$$
$$\lambda_m = [A(G_x + A'A)^{-1}A']^{-1}[A(G_x + A'A)^{-1}A' - I_n]\,y$$
$$\lambda_m = [I_n - (A(G_x + A'A)^{-1}A')^{-1}]\,y.$$
The forward reduction step is followed by the backward reduction step. Implement
$\lambda_m$ into the first modified normal equation and solve for $x_m$.
$$(G_x + A'A)x_m - A'[A(G_x + A'A)^{-1}A']^{-1}y + A'y = A'y$$
$$(G_x + A'A)x_m - A'[A(G_x + A'A)^{-1}A']^{-1}y = 0$$
$$x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}y.$$
Thus $G_x$-MINOS of (1.1) in terms of a particular generalized inverse is obtained
as
$$x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}y,\quad
\lambda_m = [I_n - (A(G_x + A'A)^{-1}A')^{-1}]\,y.$$
∎
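Case (iii) can be exercised with a genuinely semidefinite metric. In the sketch below (numpy assumed) we take $G_x = \mathrm{Diag}(1, 1, 0)$, an illustrative rank-2 seminorm of our own choosing that ignores $x_3$:

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])
Gx = np.diag([1.0, 1.0, 0.0])      # assumed positive *semi*definite metric

S = Gx + A.T @ A                   # regularized metric, det S != 0
Si = np.linalg.inv(S)
x_m = Si @ A.T @ np.linalg.solve(A @ Si @ A.T, y)
```

The solution reproduces $y$, and its $G_x$-gradient is orthogonal to the null-space direction $[2, -3, 1]'$, the optimality condition of the seminorm minimizer.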
1-21 A discussion of the metric of the parameter space X
With the completion of the proof we have to discuss the basic results of Lemma
1.3 in more detail. At first we have to observe that the matrix $G_x$ of the metric of
the parameter space $X$ has to be given a priori. We classified MINOS according
to (i) $G_x = I_m$, (ii) $G_x$ positive definite and (iii) $G_x$ positive semidefinite.
But how do we know the metric of the parameter space? Obviously we need
prior information about the geometry of the parameter space $X$, namely from
the empirical sciences like physics, chemistry, biology, geosciences, social sciences.
If the parameter space $X \subseteq \mathbb{R}^m$ is equipped with an inner product
$\langle x_1 \mid x_2\rangle = x_1'G_xx_2$, $x_1 \in X$, $x_2 \in X$, where the matrix $G_x$ of the metric $\|x\|^2 =
x'G_xx$ is positive definite, we refer to the metric space $X \subseteq \mathbb{R}^m$ as Euclidean,
$\mathbb{E}^m$. In contrast, if the parameter space $X \subseteq \mathbb{R}^m$ is restricted to a metric space
with a matrix $G_x$ of the metric which is positive semidefinite, we call the parameter
space semi-Euclidean, $\mathbb{E}^{m_1, m_2}$; $m_1$ is the number of positive eigenvalues.
For such a parameter space MINOS has to be generalized to $\|x\|^2_{G_x} = \mathrm{extr}$,
are (i) idempotent, (ii) $G_x$-idempotent and (iii) $[A(G_x + A'A)^{-1}A']^{-1}$-idempotent,
namely projection matrices. Similarly, the norms $\|i_m\|^2$ of the type (1.30), (1.31)
and (1.32) measure the distance of the solution point $x_m \in X$ from the null space
$N(A)$. The matrices
$$\text{(i)}\quad I_m - A'(AA')^{-1}A \quad(1.30)$$
$$\text{(ii)}\quad I_m - G_x^{-1}A'(AG_x^{-1}A')^{-1}A \quad(1.31)$$
$$\text{(iii)}\quad I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A \quad(1.32)$$
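The projector property of these matrices is quick to confirm. A sketch (numpy assumed; the metric is again $\mathrm{Diag}(1, 2, 4)$, an arbitrary positive definite choice):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
m = A.shape[1]
Gx = np.diag([1.0, 2.0, 4.0])      # assumed positive definite metric
Gi = np.linalg.inv(Gx)

P1 = np.eye(m) - A.T @ np.linalg.inv(A @ A.T) @ A            # (1.30)
P2 = np.eye(m) - Gi @ A.T @ np.linalg.inv(A @ Gi @ A.T) @ A  # (1.31)
```

Both matrices are idempotent and map onto the null space of $A$, i.e. $AP = 0$.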
$$= \|Ly\|^2_{G_x} + 2y'L'G_x(I - LA)z + \|(I - LA)z\|^2_{G_x}$$
hand side is a symmetric matrix. Accordingly the left-hand side must have this
property, too, namely $G_xLA = (G_xLA)'$, which had to be shown.
Reflexivity of the matrix $L$ originates from the consistency condition, namely
$(I - AL)y = 0$ for all $y \in \mathbb{R}^{n\times 1}$, or $AL = I$. The reflexive condition of the $G_x$-weighted,
minimum norm generalized inverse, (1.17) $G_xLAL = G_xL$, is a direct
consequence.
Consistency of the normal equations (1.4) or, equivalently, the uniqueness of
$G_xx_m$ follows from
$$G_xL_1y = A'L_1'G_xL_1y = G_xL_1AL_1y = G_xL_1AL_2y = A'L_1'AL_2'G_xL_2y = A'L_2'G_xL_2y = G_xL_2y$$
for arbitrary matrices $L_1 \in \mathbb{R}^{m\times n}$ and $L_2 \in \mathbb{R}^{m\times n}$ which satisfy (1.16).
∎
1-24 Eigenvalue decomposition of $G_x$-MINOS:
canonical MINOS
In the empirical sciences we quite often meet the inverse problem of determining
the infinite set of coefficients of a series expansion of a function or of a functional
(Taylor polynomials) from a finite set of observations.
First example:
Determine the Fourier coefficients (discrete Fourier transform, trigonometric
polynomials) of a harmonic function with circular support from observations in a
one-dimensional lattice.
Second example:
Determine the spherical harmonic coefficients (discrete Fourier-Legendre transform)
of a harmonic function with spherical support from observations in a two-dimensional
lattice.
Both examples will be dealt with later on in a case study. Typically such expansions
generate an infinite-dimensional linear model based upon orthogonal
(orthonormal) functions. Naturally such a linear model is underdetermined, since
a finite set of observations is confronted with an infinite set of unknown parameters.
In order to make such an infinite-dimensional linear model accessible to the
computer, the expansion into orthogonal (orthonormal) functions is truncated or
band-limited.
Observables $y \in Y$, $\dim Y = n$, are related to parameters $x \in X$, $\dim X =
m \geq n = \dim Y$, namely the unknown coefficients, by a linear operator
$A \in \mathbb{R}^{n\times m}$ which is given in the form of an eigenvalue decomposition. We are
confronted with the problem to construct "canonical MINOS", also called the
eigenvalue decomposition of $G_x$-MINOS.
First, we intend to canonically represent the parameter space X as well as the
observation space Y . Here, we shall assume that both spaces are Euclidean
$$aa' = G_x \quad\text{versus}\quad bb' = G_y$$
$$j_1, j_2 \in \{1, \ldots, m\} \quad\text{versus}\quad i_1, i_2 \in \{1, \ldots, n\}$$
$$\mathrm{rk}\,G_x = m \quad\text{versus}\quad \mathrm{rk}\,G_y = n$$
$$G_x^* = V'G_xV = \mathrm{Diag}(\lambda_1^x, \ldots, \lambda_m^x) =: \Lambda_x \quad(1.35)$$
$$G_y^* = U'G_yU = \mathrm{Diag}(\lambda_1^y, \ldots, \lambda_n^y) =: \Lambda_y \quad(1.36)$$
subject to
$$VV' = V'V = I_m \quad(1.37) \qquad UU' = U'U = I_n \quad(1.38)$$
$$(G_x - \lambda_j^xI_m)v_j = 0 \quad(1.39) \qquad (G_y - \lambda_i^yI_n)u_i = 0 \quad(1.40)$$
$$G_x = VG_x^*V' = V\Lambda_xV' \quad(1.41) \qquad G_y = UG_y^*U' = U\Lambda_yU'. \quad(1.42)$$
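The pair of eigensystems is exactly what a symmetric eigendecomposition routine delivers. A sketch (numpy assumed, with an arbitrary positive definite $G_x$ of our own choosing):

```python
import numpy as np

Gx = np.array([[2.0, 1.0, 0.0],
               [1.0, 2.0, 0.0],
               [0.0, 0.0, 3.0]])   # assumed symmetric positive definite metric

lam, V = np.linalg.eigh(Gx)        # Lambda_x = V' Gx V, i.e. Gx = V Lambda_x V'
```

The columns of `V` are the eigencolumns $v_j$ of (1.39), and the synthesis (1.41) reconstructs $G_x$.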
Second, we study the impact of the left diagonalization of the matrix of the metric
$G_x$ as well as of the right diagonalization of the matrix of the metric $G_y$ on the
coordinates $x \in X$ and $y \in Y$, the parameter systems of the left Euclidean space
$X$, $\dim X = m$, and of the right Euclidean space $Y$. Enjoy the way we
have established the canonical coordinates $x^* := [x_1^*, \ldots, x_m^*]'$ of $X$ as well as the
canonical coordinates $y^* := [y_1^*, \ldots, y_n^*]'$, called the left and right star coordinates
of $X$ and $Y$, respectively, in Box 1.9. In terms of those star coordinates, (1.45)
as well as (1.46), the left norm $\|x^*\|^2$ of the type (1.43) as well as the right norm
$\|y^*\|^2$ of type (1.44) take the canonical left and right quadratic form. The transformations
$x \mapsto x^*$ as well as $y \mapsto y^*$ of type (1.45) and (1.46) are special versions
of the left and right polar decomposition: a rotation constituted by the
matrices $\{U, V\}$ is followed by a stretch constituted by the matrices $\{\Lambda_x^{1/2}, \Lambda_y^{1/2}\}$
built from the positive roots of the left and right eigenvalues, respectively. (1.49)-(1.52)
are the corresponding direct and inverse matrix identities. We conclude with
the proof that the ansatz (1.45), (1.46) indeed leads to the canonical representation
(1.43), (1.44) of the left and right norms.
Box 1.9:
Canonical coordinates $x^* \in X$ and $y^* \in Y$,
parameter space versus observation space
"canonical coordinates of the parameter space" versus "canonical coordinates of the observation space":
$$\|x^*\|^2 = (x^*)'x^* = x'G_xx = \|x\|^2_{G_x} \quad(1.43)$$
$$\|y^*\|^2 = (y^*)'y^* = y'G_yy = \|y\|^2_{G_y} \quad(1.44)$$
ansatz
$$x^* = \Lambda_x^{1/2}V'x \quad(1.45) \qquad y^* = \Lambda_y^{1/2}U'y \quad(1.46)$$
versus
$$x = V\Lambda_x^{-1/2}x^* \quad(1.47) \qquad y = U\Lambda_y^{-1/2}y^* \quad(1.48)$$
$$\Lambda_x^{1/2} := \mathrm{Diag}\left(\sqrt{\lambda_1^x}, \ldots, \sqrt{\lambda_m^x}\right) \quad(1.49) \qquad \Lambda_y^{1/2} := \mathrm{Diag}\left(\sqrt{\lambda_1^y}, \ldots, \sqrt{\lambda_n^y}\right) \quad(1.50)$$
$$\Lambda_x^{-1/2} := \mathrm{Diag}\left(\frac{1}{\sqrt{\lambda_1^x}}, \ldots, \frac{1}{\sqrt{\lambda_m^x}}\right) \quad(1.51) \qquad \Lambda_y^{-1/2} := \mathrm{Diag}\left(\frac{1}{\sqrt{\lambda_1^y}}, \ldots, \frac{1}{\sqrt{\lambda_n^y}}\right) \quad(1.52)$$
"the ansatz proof":
$$G_x = V\Lambda_xV' \qquad G_y = U\Lambda_yU'$$
$$\|x\|^2_{G_x} = x'G_xx = x'V\Lambda_x^{1/2}\Lambda_x^{1/2}V'x
= (x^*)'\Lambda_x^{-1/2}V'V\Lambda_x^{1/2}\Lambda_x^{1/2}V'V\Lambda_x^{-1/2}x^* = (x^*)'x^*$$
$$\|y\|^2_{G_y} = y'G_yy = y'U\Lambda_y^{1/2}\Lambda_y^{1/2}U'y
= (y^*)'\Lambda_y^{-1/2}U'U\Lambda_y^{1/2}\Lambda_y^{1/2}U'U\Lambda_y^{-1/2}y^* = (y^*)'y^*$$
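The ansatz proof can be mirrored numerically: the star coordinates carry the $G_x$-norm into the canonical (identity-metric) norm. A sketch (numpy assumed; `x` is an arbitrary test vector of our own choosing):

```python
import numpy as np

Gx = np.array([[2.0, 1.0],
               [1.0, 2.0]])        # assumed positive definite metric
lam, V = np.linalg.eigh(Gx)        # Gx = V Lambda V'

x = np.array([0.3, -1.2])
x_star = np.diag(np.sqrt(lam)) @ V.T @ x             # ansatz (1.45)
x_back = V @ np.diag(1.0 / np.sqrt(lam)) @ x_star    # inverse (1.47)
```

The canonical norm of `x_star` equals $\|x\|_{G_x}$, and the inverse transformation recovers `x`.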
Box 1.10:
General bases versus orthonormal bases spanning the parameter space $X$ as
well as the observation space $Y$
"left: parameter space $X$" versus "right: observation space $Y$":
"general left base" versus "general right base"
$$\mathrm{span}\{a_1, \ldots, a_m\} = X \qquad Y = \mathrm{span}\{b_1, \ldots, b_n\}$$
versus
$$e^x = \Lambda_x^{-1/2}V'a \quad(1.59) \qquad e^y = \Lambda_y^{-1/2}U'b \quad(1.60)$$
$$\mathrm{span}\{e_1^x, \ldots, e_m^x\} = X \qquad Y = \mathrm{span}\{e_1^y, \ldots, e_n^y\}.$$
Fourth, let us begin the eigenspace analysis versus eigenspace synthesis of the
rectangular matrix $A \in \mathbb{R}^{n\times m}$, $r := \mathrm{rk}\,A = n$, $n < m$. Indeed the eigenspace of
the rectangular matrix looks different when compared to the eigenspace of the
quadratic, symmetric, positive definite matrices $G_x \in \mathbb{R}^{m\times m}$, $\mathrm{rk}\,G_x = m$, and
$G_y \in \mathbb{R}^{n\times n}$, $\mathrm{rk}\,G_y = n$, of the left and right metric. At first we have to generalize
the transpose of a rectangular matrix by introducing the adjoint operator $A^{\#}$,
which takes into account the matrices $\{G_x, G_y\}$ of the left and right metric. Definition
1.5 of the adjoint operator $A^{\#}$ is followed by its representation, namely
Lemma 1.6.
Definition 1.5 (adjoint operator $A^{\#}$):
The adjoint operator $A^{\#} \in \mathbb{R}^{m\times n}$ of the matrix $A \in \mathbb{R}^{n\times m}$ is defined
by the inner product identity
$$\langle y \mid Ax\rangle_{G_y} = \langle x \mid A^{\#}y\rangle_{G_x}, \quad(1.61)$$
where the left inner product operates on the symmetric, full-rank
matrix $G_y$ of the observation space $Y$, while the right inner product
is taken with respect to the symmetric, full-rank matrix $G_x$ of the
parameter space $X$.
For the proof we take advantage of the symmetry of the left inner product,
namely
$$\langle y \mid Ax\rangle_{G_y} = y'G_yAx \quad\text{versus}\quad \langle x \mid A^{\#}y\rangle_{G_x} = x'G_xA^{\#}y$$
$$y'G_yAx = x'A'G_yy = x'G_xA^{\#}y$$
$$\Rightarrow\quad A'G_y = G_xA^{\#} \Leftrightarrow G_x^{-1}A'G_y = A^{\#}.$$
∎
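Lemma 1.6's representation $A^{\#} = G_x^{-1}A'G_y$ can be checked against the defining identity (1.61) on random vectors. A sketch (numpy assumed; the metrics are arbitrary positive definite choices of ours):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
Gx = np.diag([1.0, 2.0, 4.0])      # assumed metric of the parameter space
Gy = np.array([[2.0, 1.0],
               [1.0, 3.0]])        # assumed metric of the observation space

A_sharp = np.linalg.inv(Gx) @ A.T @ Gy   # adjoint A# = Gx^{-1} A' Gy

rng = np.random.default_rng(42)
x = rng.standard_normal(3)
yv = rng.standard_normal(2)
lhs = yv @ Gy @ (A @ x)            # <y | Ax>_Gy
rhs = x @ Gx @ (A_sharp @ yv)      # <x | A# y>_Gx
```

With the identity metrics $G_x = I_m$, $G_y = I_n$, the adjoint reduces to the ordinary transpose $A'$.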
Fifth, we solve the underdetermined system of linear equations
$$\{y = Ax \mid A \in \mathbb{R}^{n\times m},\ \mathrm{rk}\,A = n,\ n < m\}$$
by introducing
• the eigenspace of the rectangular matrix $A \in \mathbb{R}^{n\times m}$ of rank
$r := \mathrm{rk}\,A = n$, $n < m$: $A \mapsto A^*$,
• the left and right canonical coordinates: $x \to x^*$, $y \to y^*$,
as supported by Box 1.11. The transformations (1.63) $x \mapsto x^*$ and (1.64) $y \mapsto y^*$
from the original coordinates $(x_1, \ldots, x_m)$, the parameters of the parameter space
$X$, to the canonical coordinates $(x_1^*, \ldots, x_m^*)$, the left star coordinates, as well as
from the original coordinates $(y_1, \ldots, y_n)$, the parameters of the observation
space $Y$, to the canonical coordinates $(y_1^*, \ldots, y_n^*)$, the right star coordinates, are
polar decompositions: a rotation $\{U, V\}$ is followed by a general stretch
$\{G_y^{1/2}, G_x^{1/2}\}$. The matrices $G_y^{1/2}$ as well as $G_x^{1/2}$ are product decompositions. The
representation
$$A = G_y^{-1/2}U[\Lambda, 0]\begin{bmatrix} V_1' \\ V_2' \end{bmatrix}G_x^{1/2}$$
is based upon the left matrix (1.75) $L := G_y^{-1/2}U$ and the right matrix (1.76)
$R := G_x^{-1/2}V$. Indeed the left matrix $L$ by means of (1.77) $LL' = G_y^{-1}$ reconstructs
the inverse matrix of the metric of the observation space $Y$. Similarly, the right
$$\begin{array}{ccc}
x \in X & \xrightarrow{\ A\ } & y \in Y \\
\big\downarrow\, V'G_x^{1/2} & & \big\downarrow\, U'G_y^{1/2} \\
x^* \in X & \xrightarrow{\ A^*\ } & y^* \in Y
\end{array}$$
Figure 1.6: Commutative diagram of coordinate transformations
Consult the commutative diagram for a shorthand summary of the introduced
transformations of coordinates, both of the parameter space $X$ as well as of the
observation space $Y$.
Box 1.11:
Canonical representation,
underdetermined system of linear equations
"parameter space $X$" versus "observation space $Y$":
$$x^* = V'G_x^{1/2}x \quad(1.63) \qquad y^* = U'G_y^{1/2}y \quad(1.64)$$
and
$$x = G_x^{-1/2}Vx^* \quad(1.65) \qquad y = G_y^{-1/2}Uy^* \quad(1.66)$$
$$y^* = \left(U'G_y^{1/2}AG_x^{-1/2}V\right)x^* \quad(1.69) \qquad y = G_y^{-1/2}UA^*V'G_x^{1/2}x \quad(1.70)$$
subject to
$$A^* = [\Lambda, 0]$$
"dimension identities":
$$\Lambda \in \mathbb{R}^{r\times r},\quad 0 \in \mathbb{R}^{r\times(m-r)},\quad r := \mathrm{rk}\,A = n,\ n < m$$
$$V_1 \in \mathbb{R}^{m\times r},\quad V_2 \in \mathbb{R}^{m\times(m-r)},\quad U \in \mathbb{R}^{r\times r}$$
"left eigenspace" versus "right eigenspace":
$$L := G_y^{-1/2}U,\quad L^{-1} = U'G_y^{1/2} \quad(1.75)$$
$$R := G_x^{-1/2}V,\quad R^{-1} = V'G_x^{1/2} \quad(1.76)$$
$$R_1 := G_x^{-1/2}V_1,\quad R_2 := G_x^{-1/2}V_2$$
$$R_1^- := V_1'G_x^{1/2},\quad R_2^- := V_2'G_x^{1/2}$$
$$A = L[\Lambda, 0]\begin{bmatrix} R_1^- \\ R_2^- \end{bmatrix} \quad(1.81) \qquad\text{versus}\qquad A^* = [\Lambda, 0] = L^{-1}A[R_1, R_2] \quad(1.82)$$
$$AA^{\#}L = L\Lambda^2 \quad(1.83) \qquad\text{versus}\qquad \begin{cases} A^{\#}AR_1 = R_1\Lambda^2 \\ A^{\#}AR_2 = 0 \end{cases} \quad(1.84)$$
"underdetermined system of linear
equations solved in canonical coordinates":
$$y^* = A^*x^* = [\Lambda, 0]\begin{bmatrix} x_1^* \\ x_2^* \end{bmatrix} = \Lambda x_1^*,\quad x_1^* \in \mathbb{R}^{r\times 1},\ x_2^* \in \mathbb{R}^{(m-r)\times 1} \quad(1.85)$$
$$\begin{bmatrix} x_1^* \\ x_2^* \end{bmatrix} = \begin{bmatrix} \Lambda^{-1}y^* \\ x_2^* \end{bmatrix} \quad(1.86)$$
"if $x^*$ is MINOS, then $x_2^* = 0$: $(x_1^*)_m = \Lambda^{-1}y^*$."
We now characterize MINOS of the system
$$\{y = Ax \mid A \in \mathbb{R}^{n\times m},\ \operatorname{rk}A = n,\ n < m,\ \|x\|_{G_x}^2 = \min\}$$
by introducing Lemma 1.7, namely the eigenvalue–eigencolumn equations of the matrices $A^\#A$ and $AA^\#$, respectively, as well as Lemma 1.9, our basic result on "canonical MINOS", subsequently completed by proofs.
Lemma 1.7 (eigenspace analysis versus eigenspace synthesis of the matrix $A \in \mathbb{R}^{n\times m}$, $r := \operatorname{rk}A = n < m$):
The pair of matrices $\{L, R\}$ for the eigenspace analysis and the eigenspace synthesis of the rectangular matrix $A \in \mathbb{R}^{n\times m}$ of rank $r := \operatorname{rk}A = n < m$, namely
$$A^* = L^{-1}AR \qquad\text{versus}\qquad A = LA^*R^{-1}$$
or
$$A^* = [\Lambda, 0] = L^{-1}A\,[R_1, R_2] \qquad\text{versus}\qquad A = L\,[\Lambda, 0]\begin{bmatrix}R_1^-\\ R_2^-\end{bmatrix},$$
are determined by the eigenvalue–eigencolumn equations (eigenspace equations) of the matrices $A^\#A$ and $AA^\#$, respectively, namely
$$A^\#AR_1 = R_1\Lambda^2 \qquad\text{versus}\qquad AA^\#L = L\Lambda^2$$
subject to
$$\Lambda^2 = \begin{bmatrix}\lambda_1^2 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_r^2\end{bmatrix}, \qquad \Lambda = \operatorname{Diag}\left(+\sqrt{\lambda_1^2},\dots,+\sqrt{\lambda_r^2}\right).$$
Let us prove first $AA^\#L = L\Lambda^2$, second $A^\#AR_1 = R_1\Lambda^2$.
(i) $AA^\#L = L\Lambda^2$:
$$AA^\#L = AG_x^{-1}A'G_yL =$$
$$= L\,[\Lambda, 0]\begin{bmatrix}V_1'\\ V_2'\end{bmatrix}G_x^{1/2}G_x^{-1}(G_x^{1/2})'\,[V_1, V_2]\begin{bmatrix}\Lambda\\ 0'\end{bmatrix}U'(G_y^{-1/2})'\,G_y\,G_y^{-1/2}U,$$
$$AA^\#L = L\,[\Lambda, 0]\begin{bmatrix}V_1'V_1 & V_1'V_2\\ V_2'V_1 & V_2'V_2\end{bmatrix}\begin{bmatrix}\Lambda\\ 0'\end{bmatrix},$$
$$AA^\#L = L\,[\Lambda, 0]\begin{bmatrix}I_r & 0\\ 0 & I_{m-r}\end{bmatrix}\begin{bmatrix}\Lambda\\ 0'\end{bmatrix} = L\Lambda^2. \qquad\blacksquare$$
(ii) $A^\#AR_1 = R_1\Lambda^2$:
$$A^\#AR = G_x^{-1}A'G_yAR =$$
$$= G_x^{-1}(G_x^{1/2})'V\begin{bmatrix}\Lambda\\ 0'\end{bmatrix}U'(G_y^{-1/2})'\,G_y\,G_y^{-1/2}U\,[\Lambda, 0]\,V'G_x^{1/2}G_x^{-1/2}V,$$
$$A^\#AR = G_x^{-1/2}V\begin{bmatrix}\Lambda\\ 0'\end{bmatrix}[\Lambda, 0] = G_x^{-1/2}[V_1, V_2]\begin{bmatrix}\Lambda^2 & 0\\ 0 & 0\end{bmatrix},$$
$$A^\#A\,[R_1, R_2] = G_x^{-1/2}\left[V_1\Lambda^2, 0\right]$$
$$A^\#AR_1 = R_1\Lambda^2. \qquad\blacksquare$$
The pair of eigensystems $\{AA^\#L = L\Lambda^2,\ A^\#AR_1 = R_1\Lambda^2\}$ is unfortunately based upon the non-symmetric matrices $AA^\# = AG_x^{-1}A'G_y$ and $A^\#A = G_x^{-1}A'G_yA$, which makes the left and right eigenspace analysis numerically more complex. It appears that we are forced to use the Arnoldi method rather than the more efficient Lanczos method used for symmetric matrices. In this situation we look out for an alternative. Indeed, when we substitute
$$\{L, R\} \quad\text{by}\quad \{G_y^{-1/2}U,\ G_x^{-1/2}V\}$$
into the pair of eigensystems and consequently left multiply $AA^\#L$ by $G_y^{1/2}$, we achieve a pair of eigensystems identified in Corollary 1.8 relying on symmetric matrices. In addition, such a symmetric pair of eigensystems produces the canonical base, namely orthonormal eigencolumns.
Corollary 1.8 (symmetric pair of eigensystems):
The pair of eigensystems
$$\text{(1.87)} \quad G_y^{1/2}AG_x^{-1}A'(G_y^{1/2})'\,U = U\Lambda^2 \qquad\text{versus}\qquad (G_x^{-1/2})'A'G_yAG_x^{-1/2}\,V_1 = V_1\Lambda^2 \quad\text{(1.88)}$$
$$\text{(1.89)} \quad \left|G_y^{1/2}AG_x^{-1}A'(G_y^{1/2})' - \lambda_i^2 I_r\right| = 0 \qquad\text{versus}\qquad \left|(G_x^{-1/2})'A'G_yAG_x^{-1/2} - \lambda_j^2 I_m\right| = 0 \quad\text{(1.90)}$$
is based upon symmetric matrices. The left and right eigencolumns are orthogonal.
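The symmetrization can be checked numerically. The following sketch is an illustration added here, not part of the original text; NumPy is assumed, and the matrices $A$, $G_x$, $G_y$ are arbitrary test data. It shows that $AA^\# = AG_x^{-1}A'G_y$ is non-symmetric while its counterpart $G_y^{1/2}AG_x^{-1}A'G_y^{1/2}$ of (1.87) is symmetric with the same spectrum:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 2, 3
A = rng.standard_normal((n, m))
# arbitrary symmetric positive-definite metrics of Y and X
By = rng.standard_normal((n, n)); Gy = By @ By.T + n * np.eye(n)
Bx = rng.standard_normal((m, m)); Gx = Bx @ Bx.T + m * np.eye(m)

def sqrtm_spd(G):
    """Symmetric square root of a symmetric positive-definite matrix."""
    w, Q = np.linalg.eigh(G)
    return Q @ np.diag(np.sqrt(w)) @ Q.T

Gy12 = sqrtm_spd(Gy)

AAs = A @ np.linalg.inv(Gx) @ A.T @ Gy            # A A^# : non-symmetric in general
sym = Gy12 @ A @ np.linalg.inv(Gx) @ A.T @ Gy12   # symmetrized version, cf. (1.87)

print(np.allclose(sym, sym.T))                    # symmetric
print(np.allclose(np.sort(np.linalg.eigvals(AAs).real),
                  np.sort(np.linalg.eigvalsh(sym))))  # identical spectrum
```

The symmetrized matrix is obtained from $AA^\#$ by the similarity transformation with $G_y^{1/2}$, which is why both share the (real, non-negative) eigenvalues $\lambda_i^2$.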
$$\{y = Ax \mid A \in \mathbb{R}^{n\times m},\ r := \operatorname{rk}A = n,\ n < m\}.$$
Then the rank partitioning of $x_m^*$
$$x_m^* = \begin{bmatrix}x_1^*\\ x_2^*\end{bmatrix} = \begin{bmatrix}\Lambda^{-1}y^*\\ 0\end{bmatrix} \quad\text{or}\quad \begin{cases}x_1^* = \Lambda^{-1}y^*\\ x_2^* = 0\end{cases}, \qquad x_1^* \in \mathbb{R}^{r\times1}, \; x_2^* \in \mathbb{R}^{(m-r)\times1} \quad\text{(1.91)}$$
$$x_m = G_x^{-1/2}[V_1, V_2]\begin{bmatrix}\Lambda^{-1}\\ 0'\end{bmatrix}U'G_y^{1/2}y,$$
$$x_m = G_x^{-1/2}V_1\Lambda^{-1}U'G_y^{1/2}y = R_1\Lambda^{-1}L^{-1}y.$$
For the proof we depart from $G_x$-MINOS (1.14) and replace the matrix $A \in \mathbb{R}^{n\times m}$ by its canonical representation, namely eigenspace synthesis.
$$x_m = G_x^{-1}A'\left(AG_x^{-1}A'\right)^{-1}y$$
$$A = G_y^{-1/2}U\,[\Lambda, 0]\begin{bmatrix}V_1'\\ V_2'\end{bmatrix}G_x^{1/2}$$
$$AG_x^{-1}A' = G_y^{-1/2}U\,[\Lambda, 0]\begin{bmatrix}V_1'\\ V_2'\end{bmatrix}G_x^{1/2}G_x^{-1}(G_x^{1/2})'\,[V_1, V_2]\begin{bmatrix}\Lambda\\ 0\end{bmatrix}U'(G_y^{-1/2})'$$
$$AG_x^{-1}A' = G_y^{-1/2}U\Lambda^2U'(G_y^{-1/2})', \qquad \left(AG_x^{-1}A'\right)^{-1} = (G_y^{1/2})'U\Lambda^{-2}U'G_y^{1/2}$$
$$x_m = G_x^{-1}(G_x^{1/2})'\,[V_1, V_2]\begin{bmatrix}\Lambda\\ 0\end{bmatrix}U'(G_y^{-1/2})'(G_y^{1/2})'U\Lambda^{-2}U'G_y^{1/2}y$$
$$x_m = G_x^{-1/2}[V_1, V_2]\begin{bmatrix}\Lambda^{-1}\\ 0\end{bmatrix}U'G_y^{1/2}y$$
$$x_m = G_x^{-1/2}V_1\Lambda^{-1}U'G_y^{1/2}y = A_m^-y$$
$$A_m^- = G_x^{-1/2}V_1\Lambda^{-1}U'G_y^{1/2} \in \mathcal{A}_{G_x}^{1,2,4}$$
$$x_m^* = \begin{bmatrix}x_1^*\\ x_2^*\end{bmatrix} = V'G_x^{1/2}x_m = \begin{bmatrix}\Lambda^{-1}\\ 0\end{bmatrix}U'G_y^{1/2}y = \begin{bmatrix}\Lambda^{-1}\\ 0\end{bmatrix}y^* = \begin{bmatrix}\Lambda^{-1}y^*\\ 0\end{bmatrix}. \qquad\blacksquare$$
The important result of $x_m^*$ based on the canonical $G_x$-MINOS of $\{y^* = A^*x^* \mid A^* \in \mathbb{R}^{n\times m},\ \operatorname{rk}A^* = \operatorname{rk}A = n,\ n < m\}$ needs a short comment. The rank partitioning of the canonical unknown vector $x^*$, namely $x_1^* \in \mathbb{R}^r$, $x_2^* \in \mathbb{R}^{m-r}$, again paved the way for an interpretation. First, we acknowledge the "direct inversion"
$$x_1^* = \Lambda^{-1}y^*, \qquad \Lambda = \operatorname{Diag}\left(+\sqrt{\lambda_1^2},\dots,+\sqrt{\lambda_r^2}\right),$$
for instance $[x_1^*,\dots,x_r^*]' = [\lambda_1^{-1}y_1^*,\dots,\lambda_r^{-1}y_r^*]'$. Second, $x_2^* = 0$, for instance $[x_{r+1}^*,\dots,x_m^*]' = [0,\dots,0]'$, introduces a fixed datum for the canonical coordinates $(x_{r+1},\dots,x_m)$. Finally, enjoy the commutative diagram of Figure 1.7 illustrating our previously introduced transformations of type MINOS and canonical MINOS, by means of $A_m^-$ and $(A^*)_m^-$.
[Figure 1.7: Commutative diagram of inverse coordinate transformations: $A_m^-$ maps $y \in \mathbb{Y}$ to $x_m \in \mathbb{X}$; the transformations $y^* = U'G_y^{1/2}y$ and $x_m^* = V'G_x^{1/2}x_m$ lead to $(A^*)_m^-$ mapping $y^*$ to $x_m^*$.]
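For $G_x = I$, $G_y = I$ the canonical MINOS formula $x_m = V_1\Lambda^{-1}U'y$ coincides with the Moore-Penrose pseudoinverse solution. The following numerical cross-check is my addition (not in the original text; NumPy assumed) and anticipates the Front Page Example treated below:

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 2., 4.]])
y = np.array([2., 3.])

# SVD: A = U [Lambda, 0] V' with V1 = first r columns of V
U, s, Vt = np.linalg.svd(A)          # s holds lambda_1 >= lambda_2 > 0
r = 2
V1 = Vt[:r].T
x_minos = V1 @ np.diag(1.0 / s) @ U.T @ y   # x_m = V1 Lambda^{-1} U' y

# cross-checks: pseudoinverse solution agrees and reproduces the data
assert np.allclose(x_minos, np.linalg.pinv(A) @ y)
assert np.allclose(A @ x_minos, y)
print(x_minos, s**2)   # squared singular values equal 12 +/- sqrt(130)
```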
Finally, let us compute canonical MINOS for the Front Page Example, specialized by $G_x = I_3$, $G_y = I_2$.
$$y = Ax: \quad \begin{bmatrix}2\\ 3\end{bmatrix} = \begin{bmatrix}1 & 1 & 1\\ 1 & 2 & 4\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix}, \qquad r := \operatorname{rk}A = 2$$
left eigenspace versus right eigenspace:
$$AA^\#U = AA'U = U\Lambda^2 \qquad \begin{cases}A^\#AV_1 = A'AV_1 = V_1\Lambda^2\\ A^\#AV_2 = A'AV_2 = 0\end{cases}$$
$$AA' = \begin{bmatrix}3 & 7\\ 7 & 21\end{bmatrix}, \qquad A'A = \begin{bmatrix}2 & 3 & 5\\ 3 & 5 & 9\\ 5 & 9 & 17\end{bmatrix}$$
eigenvalues
$$\left|AA' - \lambda_i^2 I_2\right| = 0 \qquad \left|A'A - \lambda_j^2 I_3\right| = 0,$$
with the roots $\lambda_{1,2}^2 = 12 \pm \sqrt{130}$.
$$\text{(1st)} \quad \begin{bmatrix}3-\lambda_1^2 & 7\\ 7 & 21-\lambda_1^2\end{bmatrix}\begin{bmatrix}u_{11}\\ u_{21}\end{bmatrix} = 0 \qquad \text{(1st)} \quad \begin{bmatrix}2-\lambda_1^2 & 3 & 5\\ 3 & 5-\lambda_1^2 & 9\\ 5 & 9 & 17-\lambda_1^2\end{bmatrix}\begin{bmatrix}v_{11}\\ v_{21}\\ v_{31}\end{bmatrix} = 0$$
subject to $u_{11}^2 + u_{21}^2 = 1$ and $v_{11}^2 + v_{21}^2 + v_{31}^2 = 1$
$$u_{11}^2 = \frac{49}{49 + (3-\lambda_1^2)^2} = \frac{49}{260 + 18\sqrt{130}}, \qquad u_{21}^2 = \frac{(3-\lambda_1^2)^2}{49 + (3-\lambda_1^2)^2} = \frac{211 + 18\sqrt{130}}{260 + 18\sqrt{130}}$$
$$\begin{bmatrix}v_{11}^2\\ v_{21}^2\\ v_{31}^2\end{bmatrix} = \frac{1}{(2+5\lambda_1^2)^2 + (3-9\lambda_1^2)^2 + (1-7\lambda_1^2+\lambda_1^4)^2}\begin{bmatrix}(2+5\lambda_1^2)^2\\ (3-9\lambda_1^2)^2\\ (1-7\lambda_1^2+\lambda_1^4)^2\end{bmatrix}$$
$$\begin{bmatrix}v_{11}^2\\ v_{21}^2\\ v_{31}^2\end{bmatrix} = \frac{1}{102700 + 9004\sqrt{130}}\begin{bmatrix}(62+5\sqrt{130})^2\\ (105+9\sqrt{130})^2\\ (191+17\sqrt{130})^2\end{bmatrix}$$
$$\text{(2nd)} \quad \begin{bmatrix}3-\lambda_2^2 & 7\\ 7 & 21-\lambda_2^2\end{bmatrix}\begin{bmatrix}u_{12}\\ u_{22}\end{bmatrix} = 0 \qquad \text{(2nd)} \quad \begin{bmatrix}2-\lambda_2^2 & 3 & 5\\ 3 & 5-\lambda_2^2 & 9\\ 5 & 9 & 17-\lambda_2^2\end{bmatrix}\begin{bmatrix}v_{12}\\ v_{22}\\ v_{32}\end{bmatrix} = 0$$
subject to $u_{12}^2 + u_{22}^2 = 1$ and $v_{12}^2 + v_{22}^2 + v_{32}^2 = 1$
$$(3-\lambda_2^2)u_{12} + 7u_{22} = 0 \qquad\text{versus}\qquad \begin{cases}(2-\lambda_2^2)v_{12} + 3v_{22} + 5v_{32} = 0\\ 3v_{12} + (5-\lambda_2^2)v_{22} + 9v_{32} = 0\end{cases}$$
$$u_{12}^2 = \frac{49}{49 + (3-\lambda_2^2)^2} = \frac{49}{260 - 18\sqrt{130}}, \qquad u_{22}^2 = \frac{(3-\lambda_2^2)^2}{49 + (3-\lambda_2^2)^2} = \frac{211 - 18\sqrt{130}}{260 - 18\sqrt{130}}$$
$$\begin{bmatrix}v_{12}^2\\ v_{22}^2\\ v_{32}^2\end{bmatrix} = \frac{1}{(2+5\lambda_2^2)^2 + (3-9\lambda_2^2)^2 + (1-7\lambda_2^2+\lambda_2^4)^2}\begin{bmatrix}(2+5\lambda_2^2)^2\\ (3-9\lambda_2^2)^2\\ (1-7\lambda_2^2+\lambda_2^4)^2\end{bmatrix}$$
$$\begin{bmatrix}v_{12}^2\\ v_{22}^2\\ v_{32}^2\end{bmatrix} = \frac{1}{102700 - 9004\sqrt{130}}\begin{bmatrix}(62-5\sqrt{130})^2\\ (105-9\sqrt{130})^2\\ (191-17\sqrt{130})^2\end{bmatrix}$$
$$\text{(3rd)} \quad \begin{bmatrix}2 & 3 & 5\\ 3 & 5 & 9\\ 5 & 9 & 17\end{bmatrix}\begin{bmatrix}v_{13}\\ v_{23}\\ v_{33}\end{bmatrix} = 0 \quad\text{subject to}\quad v_{13}^2 + v_{23}^2 + v_{33}^2 = 1$$
$$2v_{13} + 3v_{23} + 5v_{33} = 0$$
$$3v_{13} + 5v_{23} + 9v_{33} = 0$$
$$\begin{bmatrix}2 & 3\\ 3 & 5\end{bmatrix}\begin{bmatrix}v_{13}\\ v_{23}\end{bmatrix} = -\begin{bmatrix}5\\ 9\end{bmatrix}v_{33} \;\Rightarrow\; \begin{bmatrix}v_{13}\\ v_{23}\end{bmatrix} = -\begin{bmatrix}5 & -3\\ -3 & 2\end{bmatrix}\begin{bmatrix}5\\ 9\end{bmatrix}v_{33}$$
$$v_{13} = 2v_{33}, \qquad v_{23} = -3v_{33}$$
$$v_{13}^2 = \frac{2}{7}, \qquad v_{23}^2 = \frac{9}{14}, \qquad v_{33}^2 = \frac{1}{14}.$$
There are four combinatorial solutions to generate square roots:
$$\begin{bmatrix}u_{11} & u_{12}\\ u_{21} & u_{22}\end{bmatrix} = \begin{bmatrix}\pm\sqrt{u_{11}^2} & \pm\sqrt{u_{12}^2}\\ \pm\sqrt{u_{21}^2} & \pm\sqrt{u_{22}^2}\end{bmatrix},$$
$$\begin{bmatrix}v_{11} & v_{12} & v_{13}\\ v_{21} & v_{22} & v_{23}\\ v_{31} & v_{32} & v_{33}\end{bmatrix} = \begin{bmatrix}\pm\sqrt{v_{11}^2} & \pm\sqrt{v_{12}^2} & \pm\sqrt{v_{13}^2}\\ \pm\sqrt{v_{21}^2} & \pm\sqrt{v_{22}^2} & \pm\sqrt{v_{23}^2}\\ \pm\sqrt{v_{31}^2} & \pm\sqrt{v_{32}^2} & \pm\sqrt{v_{33}^2}\end{bmatrix}.$$
Here we have chosen the one with the positive sign exclusively. In summary, the eigenspace analysis gave the following result:
$$V = \begin{bmatrix}\dfrac{62+5\sqrt{130}}{\sqrt{102700+9004\sqrt{130}}} & \dfrac{62-5\sqrt{130}}{\sqrt{102700-9004\sqrt{130}}} & \dfrac{2}{\sqrt{14}}\\[2mm] \dfrac{105+9\sqrt{130}}{\sqrt{102700+9004\sqrt{130}}} & \dfrac{105-9\sqrt{130}}{\sqrt{102700-9004\sqrt{130}}} & \dfrac{-3}{\sqrt{14}}\\[2mm] \dfrac{191+17\sqrt{130}}{\sqrt{102700+9004\sqrt{130}}} & \dfrac{191-17\sqrt{130}}{\sqrt{102700-9004\sqrt{130}}} & \dfrac{1}{\sqrt{14}}\end{bmatrix} = [V_1, V_2].$$
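The closed-form entries above can be cross-checked against a numerical eigen-decomposition. The sketch below is my addition (NumPy assumed); it verifies the eigenvalues $\lambda_{1,2}^2 = 12 \pm \sqrt{130}$ and the null eigencolumn proportional to $[2, -3, 1]'$:

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 2., 4.]])
AAt = A @ A.T          # [[3, 7], [7, 21]]
AtA = A.T @ A          # [[2, 3, 5], [3, 5, 9], [5, 9, 17]]

lam2 = np.sort(np.linalg.eigvalsh(AAt))           # ascending order
assert np.allclose(lam2, [12 - np.sqrt(130), 12 + np.sqrt(130)])

# eigencolumn of A'A for the eigenvalue 0 (the null space of A)
w, V = np.linalg.eigh(AtA)
v3 = V[:, np.argmin(np.abs(w))]
v3 = v3 / v3[2]                                   # scale so that v33 = 1
assert np.allclose(v3, [2., -3., 1.])
print(lam2, v3 / np.sqrt(14))                     # normalized third eigencolumn
```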
Another aspect of any series expansion is the choice of the function space. For instance, if we develop scalar-valued, vector-valued or tensor-valued functions into scalar-valued, vector-valued or tensor-valued circular or spherical harmonics, we generate orthogonal functions with respect to a special inner product, also called "scalar product", on the circle or on the sphere. Circular or spherical harmonics are eigenfunctions of the circular or spherical Laplace-Beltrami operator. Under the postulate of the Sturm-Liouville boundary conditions the spectrum ("eigenvalues") of the Laplace-Beltrami operator is positive and integer. The eigenvalues of the circular Laplace-Beltrami operator are $l^2$ for integer values $l \in \{0, 1, \dots, \infty\}$, those of the spherical Laplace-Beltrami operator $\{k(k+1), l^2\}$ for integer values $k \in \{0, 1, \dots, \infty\}$, $l \in \{-k, -k+1, \dots, -1, 0, 1, \dots, k-1, k\}$. Thanks to such a structure of the infinite-dimensional eigenspace of the Laplace-Beltrami operator we discuss the solutions of the underdetermined regression problem (linear algebraic regression) in the context of "canonical MINOS". We solve the system of linear equations
$$\{Ax = y \mid A \in \mathbb{R}^{n\times m},\ \operatorname{rk}A = n,\ n < m\}$$
by singular value decomposition as shortly outlined in Appendix A.
1-31 Fourier series
? What are Fourier series ?
Fourier series (1.92) represent the periodic behavior of a function $x(\lambda)$ on a circle $\mathbb{S}^1$. They are also called trigonometric series since trigonometric functions $\{1, \sin\lambda, \cos\lambda, \sin 2\lambda, \cos 2\lambda, \dots, \sin\ell\lambda, \cos\ell\lambda\}$ represent such a periodic signal. Here we have chosen the parameter "longitude $\lambda$" to locate a point on $\mathbb{S}^1$. Instead we could exchange the parameter $\lambda$ by time $t$, if clock readings would substitute longitude, a conventional technique in classical navigation. In such a setting,
$$\lambda = \omega t = \frac{2\pi}{T}t = 2\pi\nu t,$$
$$\ell\lambda = \ell\omega t = 2\pi\ell\frac{t}{T} = 2\pi\ell\nu t.$$
$$e_\ell(\lambda) := \begin{cases}\cos\ell\lambda & \ell > 0\\ 1 & \ell = 0\\ \sin|\ell|\lambda & \ell < 0\end{cases} \qquad\text{(1.94)}$$
subject to
$$\lambda_\ell = 1 \;\;(\ell = 0), \qquad \lambda_\ell = \tfrac{1}{2} \;\;(\ell \neq 0)$$
"norms, convergence"
$$\|x\|^2 = \frac{1}{2\pi}\int_0^{2\pi}d\lambda\; x^2(\lambda) = \lim_{L\to\infty}\sum_{\ell=-L}^{+L}\lambda_\ell x_\ell^2 < \infty \qquad\text{(1.98)}$$
"orthonormality"
$$\langle e_{\ell_1}^*(x) \mid e_{\ell_2}^*(x)\rangle = \delta_{\ell_1\ell_2} \qquad\text{(1.109)}$$
Fourier space
$$\text{FOURIER} = \operatorname{span}\{e_{-L}, e_{-L+1}, \dots, e_{-1}, e_0, e_1, \dots, e_{L-1}, e_L\}, \quad L \to \infty$$
"$\text{FOURIER} = \text{HARM}_{L^2}(\mathbb{S}^1)$".
Since, for instance, $\langle e_1(\lambda) \mid e_{-1}(\lambda)\rangle = 0$, the base functions $e_\ell(\lambda)$ are called orthogonal. But according to
$$\langle e_\ell(\lambda) \mid e_\ell(\lambda)\rangle = \tfrac{1}{2} \quad (\ell \neq 0)$$
they are not orthonormal. The Fourier space is identical with the Hilbert space $L^2(\mathbb{S}^1)$ of harmonic functions on the circle $\mathbb{S}^1$.
? What is a harmonic function which has the unit circle $\mathbb{S}^1$ as a support ?
(2nd) periodic boundary conditions:
$$x(0) = x(2\pi), \qquad \left[\frac{d}{d\lambda}x(\lambda)\right](0) = \left[\frac{d}{d\lambda}x(\lambda)\right](2\pi).$$
The special Sturm-Liouville equations force the frequency to be integer, shortly proven now.
ansatz: $x(\lambda) = c_\omega\cos\omega\lambda + s_\omega\sin\omega\lambda$
$$x(0) = x(2\pi), \qquad \left[\frac{d}{d\lambda}x(\lambda)\right](0) = \left[\frac{d}{d\lambda}x(\lambda)\right](2\pi)$$
$$s_\omega\omega = -c_\omega\omega\sin 2\pi\omega + s_\omega\omega\cos 2\pi\omega$$
$$\left.\begin{aligned}\cos 2\pi\omega - 1 &= 0\\ \sin 2\pi\omega &= 0\end{aligned}\right\} \;\Rightarrow\; \omega = \ell, \quad \ell \in \{0, 1, \dots, L-1, L\}.$$
$$\dim\mathbb{Y} = n = I$$
"equidistant lattice on $\mathbb{S}^1$"
$$\lambda_i = \frac{2\pi}{I}(i-1) \qquad i \in \{1, \dots, I\} \qquad\text{(1.111)}$$
Example ($I = 2$): $\lambda_1 = 0$, $\lambda_2 = \pi \;\hat{=}\; 180°$
Example ($I = 3$): $\lambda_1 = 0$, $\lambda_2 = \frac{2\pi}{3} \;\hat{=}\; 120°$, $\lambda_3 = \frac{4\pi}{3} \;\hat{=}\; 240°$
Example ($I = 4$): $\lambda_1 = 0$, $\lambda_2 = \frac{2\pi}{4} \;\hat{=}\; 90°$, $\lambda_3 = \pi \;\hat{=}\; 180°$, $\lambda_4 = \frac{3\pi}{2} \;\hat{=}\; 270°$
Example ($I = 5$): $\lambda_1 = 0$, $\lambda_2 = \frac{2\pi}{5} \;\hat{=}\; 72°$, $\lambda_3 = \frac{4\pi}{5} \;\hat{=}\; 144°$, $\lambda_4 = \frac{6\pi}{5} \;\hat{=}\; 216°$, $\lambda_5 = \frac{8\pi}{5} \;\hat{=}\; 288°$
1-3 Case study 47
"The parameter space $\mathbb{X}$"
$$x_1 = x_0,\; x_2 = x_{-1},\; x_3 = x_{+1},\; x_4 = x_{-2},\; x_5 = x_{+2},\; \dots,\; x_{m-1} = x_{-L},\; x_m = x_L \qquad\text{(1.112)}$$
$$\dim\mathbb{X} = m = 2L+1$$
"The underdetermined linear model"
$$n < m: \quad I < 2L+1$$
$$\begin{bmatrix}y_1\\ y_2\\ \vdots\end{bmatrix} = \begin{bmatrix}1 & \sin\lambda_1 & \cos\lambda_1 & \dots & \sin L\lambda_1 & \cos L\lambda_1\\ 1 & \sin\lambda_2 & \cos\lambda_2 & \dots & \sin L\lambda_2 & \cos L\lambda_2\\ & & & \vdots & &\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ \vdots\end{bmatrix}$$
[Figure: equidistant lattices on $\mathbb{S}^1$ for $I = 3$ ($0°, 120°, 240°$), $I = 4$ ($0°, 90°, 180°, 270°$) and $I = 5$ ($0°, 72°, 144°, 216°, 288°$), shown for the levels $L = 0, 1, 2, 3$.]
$$\dim\mathbb{X} = m = \infty.$$
$$L = 1: \quad \sum_{\ell=-L}^{+L}e_\ell(\lambda_{i_1})e_\ell(\lambda_{i_2}) = L+1 \quad (i_1 = i_2)$$
$$x_\ell = \begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix} = A'(AA')^{-1}y = \frac{1}{2}\begin{bmatrix}1 & 1\\ 0 & 0\\ 1 & -1\end{bmatrix}y = \frac{1}{2}\begin{bmatrix}y_1+y_2\\ 0\\ y_1-y_2\end{bmatrix}$$
$$\|x_\ell\|^2 = \tfrac{1}{2}y'y.$$
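The $L = 1$, $I = 2$ result can be reproduced numerically. In the sketch below (my addition, NumPy assumed) the design matrix is evaluated on the lattice $\lambda_1 = 0$, $\lambda_2 = \pi$:

```python
import numpy as np

lam = np.array([0., np.pi])                  # equidistant lattice, I = 2
A = np.column_stack([np.ones(2), np.sin(lam), np.cos(lam)])
# A = [[1, 0, 1], [1, 0, -1]],  so that AA' = 2 I_2

y = np.array([2., 3.])                       # any data vector
x = A.T @ np.linalg.inv(A @ A.T) @ y         # MINOS: x = A'(AA')^{-1} y
assert np.allclose(x, 0.5 * np.array([y[0] + y[1], 0., y[0] - y[1]]))
assert np.allclose(x @ x, 0.5 * (y @ y))     # ||x||^2 = y'y / 2
print(x)
```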
Box 1.16:
The second example:
Fourier analysis as an underdetermined linear model:
$m - \operatorname{rk}A = m - n = 2$, $L = 2$
"$\dim\mathbb{Y} = n = 3$, $\dim\mathbb{X} = m = 5$"
$$\begin{bmatrix}y_1\\ y_2\\ y_3\end{bmatrix} = \begin{bmatrix}1 & \sin\lambda_1 & \cos\lambda_1 & \sin 2\lambda_1 & \cos 2\lambda_1\\ 1 & \sin\lambda_2 & \cos\lambda_2 & \sin 2\lambda_2 & \cos 2\lambda_2\\ 1 & \sin\lambda_3 & \cos\lambda_3 & \sin 2\lambda_3 & \cos 2\lambda_3\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\\ x_5\end{bmatrix}$$
$$\sin\lambda_1 = 0, \quad \sin\lambda_2 = \tfrac{1}{2}\sqrt{3}, \quad \sin\lambda_3 = -\tfrac{1}{2}\sqrt{3}$$
$$\cos\lambda_1 = 1, \quad \cos\lambda_2 = -\tfrac{1}{2}, \quad \cos\lambda_3 = -\tfrac{1}{2}$$
$$\sin 2\lambda_1 = 0, \quad \sin 2\lambda_2 = -\tfrac{1}{2}\sqrt{3}, \quad \sin 2\lambda_3 = \tfrac{1}{2}\sqrt{3}$$
$$\cos 2\lambda_1 = 1, \quad \cos 2\lambda_2 = -\tfrac{1}{2}, \quad \cos 2\lambda_3 = -\tfrac{1}{2}$$
$$A := \begin{bmatrix}1 & 0 & 1 & 0 & 1\\ 1 & \tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2} & -\tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2}\\ 1 & -\tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2} & \tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2}\end{bmatrix}$$
$$AA' = 3I_3 \;\Rightarrow\; (AA')^{-1} = \tfrac{1}{3}I_3$$
$$AA' = \begin{bmatrix}3 & a_{12} & a_{13}\\ a_{12} & 3 & a_{23}\\ a_{13} & a_{23} & 3\end{bmatrix}, \quad a_{i_1i_2} = 1 + \sin\lambda_{i_1}\sin\lambda_{i_2} + \cos\lambda_{i_1}\cos\lambda_{i_2} + \sin 2\lambda_{i_1}\sin 2\lambda_{i_2} + \cos 2\lambda_{i_1}\cos 2\lambda_{i_2}$$
If $\lambda_i = \frac{2\pi}{I}(i-1)$, then
$$1 + \sin\lambda_1\sin\lambda_2 + \cos\lambda_1\cos\lambda_2 + \sin 2\lambda_1\sin 2\lambda_2 + \cos 2\lambda_1\cos 2\lambda_2 = 1 - \tfrac{1}{2} - \tfrac{1}{2} = 0$$
$$1 + \sin\lambda_1\sin\lambda_3 + \cos\lambda_1\cos\lambda_3 + \sin 2\lambda_1\sin 2\lambda_3 + \cos 2\lambda_1\cos 2\lambda_3 = 1 - \tfrac{1}{2} - \tfrac{1}{2} = 0$$
$$1 + \sin\lambda_2\sin\lambda_3 + \cos\lambda_2\cos\lambda_3 + \sin 2\lambda_2\sin 2\lambda_3 + \cos 2\lambda_2\cos 2\lambda_3 = 1 - \tfrac{3}{4} + \tfrac{1}{4} - \tfrac{3}{4} + \tfrac{1}{4} = 0$$
$$L = 2: \quad \sum_{\ell=-L}^{+L}e_\ell(\lambda_{i_1})e_\ell(\lambda_{i_2}) = 0 \quad (i_1 \neq i_2)$$
$$L = 2: \quad \sum_{\ell=-L}^{+L}e_\ell(\lambda_{i_1})e_\ell(\lambda_{i_2}) = L+1 \quad (i_1 = i_2)$$
$$x_\ell = A'(AA')^{-1}y = \frac{1}{3}\begin{bmatrix}1 & 1 & 1\\ 0 & \tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2}\sqrt{3}\\ 1 & -\tfrac{1}{2} & -\tfrac{1}{2}\\ 0 & -\tfrac{1}{2}\sqrt{3} & \tfrac{1}{2}\sqrt{3}\\ 1 & -\tfrac{1}{2} & -\tfrac{1}{2}\end{bmatrix}\begin{bmatrix}y_1\\ y_2\\ y_3\end{bmatrix}$$
$$x_\ell = \begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\\ x_5\end{bmatrix} = \frac{1}{3}\begin{bmatrix}y_1 + y_2 + y_3\\ \tfrac{1}{2}\sqrt{3}\,y_2 - \tfrac{1}{2}\sqrt{3}\,y_3\\ y_1 - \tfrac{1}{2}y_2 - \tfrac{1}{2}y_3\\ -\tfrac{1}{2}\sqrt{3}\,y_2 + \tfrac{1}{2}\sqrt{3}\,y_3\\ y_1 - \tfrac{1}{2}y_2 - \tfrac{1}{2}y_3\end{bmatrix}, \qquad \|x_\ell\|^2 = \tfrac{1}{3}y'y.$$
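Box 1.16 can likewise be verified numerically. In the sketch below (my addition, NumPy assumed) the design matrix on the lattice $\lambda_i = 2\pi(i-1)/3$ satisfies $AA' = 3I_3$, and MINOS reproduces the norm identity:

```python
import numpy as np

I, L = 3, 2
lam = 2 * np.pi * np.arange(I) / I            # 0°, 120°, 240°
A = np.column_stack([np.ones(I),
                     np.sin(lam), np.cos(lam),
                     np.sin(2 * lam), np.cos(2 * lam)])   # n = 3, m = 5

assert np.allclose(A @ A.T, (L + 1) * np.eye(I))          # discrete orthogonality
y = np.array([1., 2., 3.])
x = A.T @ y / (L + 1)                                     # MINOS: A'(AA')^{-1} y
assert np.allclose(A @ x, y)                              # consistency y = A x
assert np.allclose(x @ x, (y @ y) / (L + 1))              # ||x||^2 = y'y / 3
print(x)
```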
$$y_i = y(\lambda_i) = \left[1, \sin\lambda_i, \cos\lambda_i, \dots, \cos(L-1)\lambda_i, \sin L\lambda_i, \cos L\lambda_i\right]\begin{bmatrix}x_1\\ x_2\\ x_3\\ \vdots\\ x_{m-2}\\ x_{m-1}\\ x_m\end{bmatrix} \qquad\text{(1.114)}$$
or
$$y = Ax, \quad A \in \mathbb{R}^{n\times m}, \quad \operatorname{rk}A = n, \quad I = n < m = 2L+1$$
$$A \in \mathbb{O}(n) := \{A \in \mathbb{R}^{n\times m} \mid AA' = (L+1)I_n\} \qquad\text{(1.115)}$$
If
$$\lambda_i = \frac{2\pi}{I}(i-1) \qquad i, i_1, i_2 \in \{1, \dots, I\}, \qquad\text{(1.116)}$$
then discrete orthogonality holds:
$$AA' = (L+1)I_n \;\Leftrightarrow\; \sum_{\ell=-L}^{+L}e_\ell(\lambda_{i_1})e_\ell(\lambda_{i_2}) = \begin{cases}0 & i_1 \neq i_2\\ L+1 & i_1 = i_2\end{cases} \qquad\text{(1.117)}$$
$$P_2(t) = \tfrac{3}{2}t^2 - \tfrac{1}{2}$$
$$P_{11}(t) = \sqrt{1-t^2}$$
$$P_{21}(t) = 3t\sqrt{1-t^2}$$
Example:
$$P_{22}(t) = (1-t^2)\frac{d^2}{dt^2}P_2(t) = 3(1-t^2)$$
$$\dots + e_{2,-2}x_5 + e_{2,-1}x_6 + e_{20}x_7 + e_{21}x_8 + e_{22}x_9 + O_3$$
recurrence relations: the vertical recurrence relation connects $P_{k-1,\ell-1}$, $P_{k-1,\ell}$ and $P_{k-1,\ell+1}$ to $P_{k\ell}$.
Box 1.18:
The Fourier-Legendre space
"The base functions $e_{k\ell}(\lambda, \phi)$, $k \in \{1, \dots, K\}$, $\ell \in \{-K, -K+1, \dots, -1, 0, 1, \dots, K-1, K\}$ span the Fourier-Legendre space $L^2\{[0, 2\pi]\times]-\pi/2, +\pi/2[\}$: they generate a complete orthogonal (orthonormal) system of surface spherical functions."
"inner product"
$\forall\, x \in$ FOURIER-LEGENDRE and $y \in$ FOURIER-LEGENDRE:
$$\langle x \mid y\rangle = \frac{1}{S}\int dS\; x(\lambda, \phi)\,y(\lambda, \phi) = \frac{1}{4\pi}\int_0^{2\pi}d\lambda\int_{-\pi/2}^{+\pi/2}d\phi\cos\phi\; x(\lambda, \phi)\,y(\lambda, \phi) \qquad\text{(1.125)}$$
"normalization"
$$\langle e_{k_1\ell_1}(\lambda, \phi) \mid e_{k_2\ell_2}(\lambda, \phi)\rangle = \frac{1}{4\pi}\int_0^{2\pi}d\lambda\int_{-\pi/2}^{+\pi/2}d\phi\cos\phi\; e_{k_1\ell_1}(\lambda, \phi)\,e_{k_2\ell_2}(\lambda, \phi) = \lambda_{k_1\ell_1}\delta_{k_1k_2}\delta_{\ell_1\ell_2} \qquad\text{(1.126)}$$
$$k_1, k_2 \in \{0, \dots, K\}, \qquad \ell_1, \ell_2 \in \{-k, \dots, +k\}$$
$$\lambda_{k_1\ell_1} = \frac{1}{2k_1+1}\,\frac{(k_1-\ell_1)!}{(k_1+\ell_1)!} \qquad\text{(1.127)}$$
"norms, convergence"
$$\|x\|^2 = \frac{1}{4\pi}\int_0^{2\pi}d\lambda\int_{-\pi/2}^{+\pi/2}d\phi\cos\phi\; x^2(\lambda, \phi) = \lim_{K\to\infty}\sum_{k=0}^{K}\sum_{\ell=-k}^{+k}\lambda_{k\ell}x_{k\ell}^2 < \infty \qquad\text{(1.128)}$$
$$e_{k\ell}^*(\lambda, \phi) := \sqrt{2k+1}\sqrt{\frac{(k+\ell)!}{(k-\ell)!}}\;P_{k|\ell|}(\sin\phi)\begin{cases}\sqrt{2}\cos\ell\lambda & \ell > 0\\ 1 & \ell = 0\\ \sqrt{2}\sin|\ell|\lambda & \ell < 0\end{cases} \qquad\text{(1.133)}$$
(orthonormal basis)
$$\text{(1.134)} \quad e_{k\ell}^* = \frac{1}{\sqrt{\lambda_{k\ell}}}\,e_{k\ell} \qquad\text{versus}\qquad e_{k\ell} = \sqrt{\lambda_{k\ell}}\,e_{k\ell}^* \quad\text{(1.135)}$$
$$\text{(1.136)} \quad x_{k\ell}^* = \sqrt{\lambda_{k\ell}}\,x_{k\ell} \qquad\text{versus}\qquad x_{k\ell} = \frac{1}{\sqrt{\lambda_{k\ell}}}\,x_{k\ell}^* \quad\text{(1.137)}$$
$$x = \lim_{K\to\infty}\sum_{k=0}^{K}\sum_{\ell=-k}^{+k}e_{k\ell}^*\,\langle x \mid e_{k\ell}^*\rangle \qquad\text{(1.138)}$$
"orthonormality"
$$\langle e_{k_1\ell_1}^*(\lambda, \phi) \mid e_{k_2\ell_2}^*(\lambda, \phi)\rangle = \delta_{k_1k_2}\delta_{\ell_1\ell_2} \qquad\text{(1.139)}$$
Fourier-Legendre space, $K \to \infty$. Those integrals are divided by the size of the surface element $4\pi$ of $\mathbb{S}_r^2$. Alternative representations of $\langle x, y\rangle$ and $\langle e_{k_1\ell_1}, e_{k_2\ell_2}\rangle$ use Dirac's notation of a bracket.
Box 1.19:
Fourier-Legendre analysis as an underdetermined linear model
- the observation space $\mathbb{Y}$ -
"equidistant lattice on $\mathbb{S}^2$" (equiangular)
$$\lambda \in [0, 2\pi[, \qquad \phi \in \left]-\frac{\pi}{2}, +\frac{\pi}{2}\right[$$
$$I = 2J: \quad \lambda_i = \frac{2\pi}{I}(i-1) \quad i \in \{1, \dots, I\}, \qquad \phi_j \quad j \in \{1, \dots, J\}$$
$J$ even:
$$\phi_k = \frac{\pi}{2J} + (k-1)\frac{\pi}{J} \quad k \in \{1, \dots, \tfrac{J}{2}\}, \qquad \phi_k = -\frac{\pi}{2J} - \left(k - \tfrac{J+2}{2}\right)\frac{\pi}{J} \quad k \in \{\tfrac{J+2}{2}, \dots, J\}$$
$J$ odd:
$$\phi_k = (k-1)\frac{\pi}{J+1} \quad k \in \{1, \dots, \tfrac{J+1}{2}\}, \qquad \phi_k = -\left(k - \tfrac{J+1}{2}\right)\frac{\pi}{J+1} \quad k \in \{\tfrac{J+3}{2}, \dots, J\}$$
longitudinal interval: $\Delta\lambda := \lambda_{i+1} - \lambda_i = \dfrac{2\pi}{I}$
lateral interval: $\Delta\phi := \phi_{j+1} - \phi_j = \dfrac{\pi}{J}$ ($J$ even), $\Delta\phi = \dfrac{\pi}{J+1}$ ($J$ odd)
"initiation: choose $J$, derive $I = 2J$"
$J$ even:
$$\phi_k = \frac{\Delta\phi}{2} + (k-1)\Delta\phi \quad k \in \{1, \dots, \tfrac{J}{2}\}, \qquad \phi_k = -\frac{\Delta\phi}{2} - \left(k - \tfrac{J+2}{2}\right)\Delta\phi \quad k \in \{\tfrac{J+2}{2}, \dots, J\}$$
$J$ odd:
$$\phi_k = (k-1)\Delta\phi \quad k \in \{1, \dots, \tfrac{J+1}{2}\}, \qquad \phi_k = -\left(k - \tfrac{J+1}{2}\right)\Delta\phi \quad k \in \{\tfrac{J+3}{2}, \dots, J\}$$
$$\lambda_i = (i-1)\Delta\lambda \quad i \in \{1, \dots, I\} \quad\text{and}\quad I = 2J$$
"multivariate setup of the observation space $\mathbb{Y}$"
$$y_{ij} = x(\lambda_i, \phi_j)$$
"vectorization of the matrix of observations"
Example ($J = 1, I = 2$): sample points $(\lambda_1, \phi_1)$, $(\lambda_2, \phi_1)$; observation vector
$$y = \begin{bmatrix}x(\lambda_1, \phi_1)\\ x(\lambda_2, \phi_1)\end{bmatrix} \in \mathbb{R}^{2\times1}$$
Example ($J = 2, I = 4$): sample points $(\lambda_1, \phi_1), \dots, (\lambda_4, \phi_1), (\lambda_1, \phi_2), \dots, (\lambda_4, \phi_2)$; observation vector
$$y = \left[x(\lambda_1, \phi_1), \dots, x(\lambda_4, \phi_1), x(\lambda_1, \phi_2), \dots, x(\lambda_4, \phi_2)\right]' \in \mathbb{R}^{8\times1}$$
Number of observations: $n = IJ = 2J^2$
Example: $J = 1 \Rightarrow n = 2$, $J = 2 \Rightarrow n = 8$, $J = 3 \Rightarrow n = 18$, $J = 4 \Rightarrow n = 32$.
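The lattice construction of Box 1.19 ("choose $J$, derive $I = 2J$") can be sketched as follows. The code is my addition (NumPy assumed) and reproduces the lateral grids of Table 1.1:

```python
import numpy as np

def lattice(J):
    """Equidistant (lambda_i, phi_k) lattice on S^2 with I = 2J meridians."""
    I = 2 * J
    lam = np.arange(I) * 2 * np.pi / I                 # longitudinal grid
    if J % 2 == 0:                                     # J even
        dphi = np.pi / J
        half = dphi / 2 + np.arange(J // 2) * dphi
        phi = np.concatenate([half, -half])
    else:                                              # J odd
        dphi = np.pi / (J + 1)
        pos = np.arange((J + 1) // 2) * dphi           # 0, dphi, 2*dphi, ...
        phi = np.concatenate([pos, -pos[1:]])
    return lam, phi

lam, phi = lattice(2)
print(np.degrees(phi))          # 45°, -45°
lam, phi = lattice(3)
print(np.degrees(phi))          # 0°, 45°, -45°
# the number of lattice points is n = I * J = 2 J^2
assert all(len(lattice(J)[0]) * len(lattice(J)[1]) == 2 * J * J for J in range(1, 6))
```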
Table 1.1:
Equidistant lattice on $\mathbb{S}^2$ - the lateral lattice -

J  | Δφ    | lateral grid
1  | -     | 0°
2  | 90°   | +45° −45°
3  | 45°   | 0° +45° −45°
4  | 45°   | +22.5° +67.5° −22.5° −67.5°
5  | 30°   | 0° +30° +60° −30° −60°
6  | 30°   | +15° +45° +75° −15° −45° −75°
7  | 22.5° | 0° +22.5° +45° +67.5° −22.5° −45° −67.5°
8  | 22.5° | +11.25° +33.75° +56.25° +78.75° −11.25° −33.75° −56.25° −78.75°
9  | 18°   | 0° +18° +36° +54° +72° −18° −36° −54° −72°
10 | 18°   | +9° +27° +45° +63° +81° −9° −27° −45° −63° −81°
Table 1.2:
Equidistant lattice on $\mathbb{S}^2$ - the longitudinal lattice -

J | I = 2J | Δλ   | longitudinal grid
1 | 2      | 180° | 0° 180°
2 | 4      | 90°  | 0° 90° 180° 270°
3 | 6      | 60°  | 0° 60° 120° 180° 240° 300°
4 | 8      | 45°  | 0° 45° 90° 135° 180° 225° 270° 315°
5 | 10     | 36°  | 0° 36° 72° 108° 144° 180° 216° 252° 288° 324°
[Figures 1.11 a-e: Platt-Carré Maps of $\mathbb{S}^2$, longitude-latitude lattice ($\lambda \in [0°, 360°]$, $\phi \in [-\pi/2, +\pi/2]$), cases:
(a) $J = 1, I = 2, n = 2$;
(b) $J = 2, I = 4, n = 8$;
(c) $J = 3, I = 6, n = 18$;
(d) $J = 4, I = 8, n = 32$;
(e) $J = 5, I = 10, n = 50$.]
In contrast, the parameter space $\mathbb{X}$, $\dim\mathbb{X} = \infty$, is infinite-dimensional. The unknown Fourier-Legendre coefficients, collected in the Pascal triangular graph of Figure 1.10, are vectorized by
$$\mathbb{X} = \operatorname{span}\{x_{00}, x_{1,-1}, x_{10}, x_{11}, \dots, x_{k\ell}\}, \qquad k \to \infty, \quad k = 0 \to k, \quad |\ell| = 0 \to k$$
$$\dim\mathbb{X} = m = \infty.$$
$$i, j \in \{1, \dots, n\}$$
is a representation of the linear observational equations (1.138) in Ricci calculus which is characteristic for Fourier-Legendre analysis.
Number of observed data at lattice points versus number of unknown Fourier-Legendre coefficients:
$$n = IJ = 2J^2 \;\text{(finite)} \qquad\text{versus}\qquad m = \lim_{K\to\infty}\sum_{k=0}^{K}\sum_{\ell=-k}^{+k}1 = \infty \;\text{(infinite)}$$
Such a portrait of Fourier-Legendre analysis effectively summarizes its peculiarities. A finite number of observations is confronted with an infinite number of unknowns. Such a linear model of type "underdetermined of power 2" cannot be solved in finite computer time. Instead one has to truncate the Fourier-Legendre series, leaving the series "bandlimited". We consider three cases.

$n > m$: overdetermined case; $n = m$: regular case; $n < m$: underdetermined case.

First, we have to truncate the infinite Fourier-Legendre series such that $n > m$ holds. In this case of an overdetermined problem, we have more observations than unknowns. Second, we alternatively balance the number of unknown Fourier-Legendre coefficients such that $n = m$ holds. Such a model choice assures a regular linear system. Both linear Fourier-Legendre models which are tuned to the number of observations suffer from a typical uncertainty. What is the effect of the forgotten unknown Fourier-Legendre coefficients $m > n$? Indeed a significance test has to decide upon any truncation to be admissible. We need an objective criterion to decide upon the degree $m$ of bandlimit. Third, in order to be as objective as possible we again follow the third case of "less observations than unknowns" such that $n < m$ holds. Such a Fourier-Legendre linear model generating an underdetermined system of linear equations will consequently be considered.
Box 1.20
The first example:
Fourier-Legendre analysis as an underdetermined linear model:
$m - \operatorname{rk}A = m - n = 2$
$$J = 1, \; I = 2J = 2 \;\Rightarrow\; n = IJ = 2J^2 = 2 \qquad\text{versus}\qquad K = 1 \;\Rightarrow\; m = (K+1)^2 = 4$$
$$\begin{bmatrix}y_1\\ y_2\end{bmatrix} = \begin{bmatrix}1 & P_{11}(\sin\phi_1)\sin\lambda_1 & P_{10}(\sin\phi_1) & P_{11}(\sin\phi_1)\cos\lambda_1\\ 1 & P_{11}(\sin\phi_2)\sin\lambda_2 & P_{10}(\sin\phi_2) & P_{11}(\sin\phi_2)\cos\lambda_2\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix}$$
subject to
$$AA' = \begin{bmatrix}1 + P_{11}^2(\sin\phi_1) + P_{10}^2(\sin\phi_1) & \begin{matrix}1 + P_{11}(\sin\phi_1)P_{11}(\sin\phi_2)\sin\lambda_1\sin\lambda_2 + P_{10}(\sin\phi_1)P_{10}(\sin\phi_2)\\ + P_{11}(\sin\phi_1)P_{11}(\sin\phi_2)\cos\lambda_1\cos\lambda_2\end{matrix}\\ \text{(symmetric)} & 1 + P_{11}^2(\sin\phi_2) + P_{10}^2(\sin\phi_2)\end{bmatrix}$$
$$AA' = 2I_2 \;\Rightarrow\; (AA')^{-1} = \tfrac{1}{2}I_2$$
$$x_\ell = \begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix}(\text{MINOS}) = \begin{bmatrix}c_{00}\\ s_{11}\\ c_{10}\\ c_{11}\end{bmatrix}(\text{MINOS}) = \frac{1}{2}A'y =$$
$$= \frac{1}{2}\begin{bmatrix}1 & 1\\ 0 & 0\\ 0 & 0\\ 1 & -1\end{bmatrix}\begin{bmatrix}y_1\\ y_2\end{bmatrix} = \frac{1}{2}\begin{bmatrix}y_1 + y_2\\ 0\\ 0\\ y_1 - y_2\end{bmatrix}.$$
Box 1.21
The second example:
Fourier-Legendre analysis as an underdetermined linear model:
$m - \operatorname{rk}A = m - n = 1$
$$\dim\mathbb{Y} = n = 8 \qquad\text{versus}\qquad \dim\mathbb{X} = m = 9$$
$$J = 2, \; I = 2J = 4 \;\Rightarrow\; n = IJ = 2J^2 = 8 \qquad\text{versus}\qquad K = 2 \;\Rightarrow\; m = (K+1)^2 = 9$$
"equidistant lattice, longitudinal width $\Delta\lambda$, lateral width $\Delta\phi$"
$$\Delta\lambda = 90°, \qquad \Delta\phi = 90°$$
$$P_{11}(\sin\phi_1) = \dots = P_{11}(\sin\phi_4) = \cos 45° = 0.5\sqrt{2}$$
$$P_{11}(\sin\phi_5) = \dots = P_{11}(\sin\phi_8) = \cos(-45°) = 0.5\sqrt{2}$$
$$P_{10}(\sin\phi_1) = \dots = P_{10}(\sin\phi_4) = \sin 45° = 0.5\sqrt{2}$$
$$P_{10}(\sin\phi_5) = \dots = P_{10}(\sin\phi_8) = \sin(-45°) = -0.5\sqrt{2}$$
$$P_{22}(\sin\phi) = 3\cos^2\phi, \qquad P_{21}(\sin\phi) = 3\sin\phi\cos\phi, \qquad P_{20}(\sin\phi) = \tfrac{3}{2}\sin^2\phi - \tfrac{1}{2}$$
$$P_{22}(\sin\phi_1) = \dots = P_{22}(\sin\phi_4) = 3\cos^2 45° = 3/2, \qquad P_{22}(\sin\phi_5) = \dots = P_{22}(\sin\phi_8) = 3/2$$
$$P_{21}(\sin\phi_1) = \dots = P_{21}(\sin\phi_4) = 3\sin 45°\cos 45° = 3/2, \qquad P_{21}(\sin\phi_5) = \dots = P_{21}(\sin\phi_8) = -3/2$$
$$P_{20}(\sin\phi_1) = \dots = P_{20}(\sin\phi_4) = \tfrac{3}{2}\sin^2 45° - \tfrac{1}{2} = 1/4, \qquad P_{20}(\sin\phi_5) = \dots = P_{20}(\sin\phi_8) = 1/4$$
$$\sin\lambda_1 = \sin\lambda_3 = \sin\lambda_5 = \sin\lambda_7 = 0$$
$$\sin\lambda_2 = \sin\lambda_6 = +1, \qquad \sin\lambda_4 = \sin\lambda_8 = -1$$
$$\cos\lambda_1 = \cos\lambda_5 = +1, \qquad \cos\lambda_2 = \cos\lambda_4 = \cos\lambda_6 = \cos\lambda_8 = 0, \qquad \cos\lambda_3 = \cos\lambda_7 = -1$$
$$\sin 2\lambda_1 = \sin 2\lambda_2 = \dots = \sin 2\lambda_8 = 0$$
$$\cos 2\lambda_1 = \cos 2\lambda_3 = \cos 2\lambda_5 = \cos 2\lambda_7 = +1, \qquad \cos 2\lambda_2 = \cos 2\lambda_4 = \cos 2\lambda_6 = \cos 2\lambda_8 = -1$$
$$A \in \mathbb{R}^{8\times9}$$
$$A = \begin{bmatrix}1 & 0 & \sqrt{2}/2 & \sqrt{2}/2 & 0 & 0 & 1/4 & 3/2 & 3/2\\ 1 & \sqrt{2}/2 & \sqrt{2}/2 & 0 & 0 & 3/2 & 1/4 & 0 & -3/2\\ 1 & 0 & \sqrt{2}/2 & -\sqrt{2}/2 & 0 & 0 & 1/4 & -3/2 & 3/2\\ 1 & -\sqrt{2}/2 & \sqrt{2}/2 & 0 & 0 & -3/2 & 1/4 & 0 & -3/2\\ 1 & 0 & -\sqrt{2}/2 & \sqrt{2}/2 & 0 & 0 & 1/4 & -3/2 & 3/2\\ 1 & \sqrt{2}/2 & -\sqrt{2}/2 & 0 & 0 & -3/2 & 1/4 & 0 & -3/2\\ 1 & 0 & -\sqrt{2}/2 & -\sqrt{2}/2 & 0 & 0 & 1/4 & 3/2 & 3/2\\ 1 & -\sqrt{2}/2 & -\sqrt{2}/2 & 0 & 0 & 3/2 & 1/4 & 0 & -3/2\end{bmatrix}$$
$$\operatorname{rk}A < \min\{n, m\} = 8.$$
Here "the little example" ends, since the matrix $A$ has a rank smaller than 8! In practice, one takes advantage of
• Gauss elimination
or
• weighting functions
in order to directly compute the Fourier-Legendre series. In order to understand the technology of "weighting functions" better, we begin with rewriting the spherical harmonic basic equations. Let us define
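The rank statement can be confirmed numerically. The sketch below (my addition, NumPy assumed) assembles the degree-2 spherical-harmonic design matrix on the $J = 2$ lattice and computes its rank:

```python
import numpy as np

# lattice: lambda = 0°, 90°, 180°, 270° at phi = +45°, then at phi = -45°
lam = np.tile(np.radians([0., 90., 180., 270.]), 2)
phi = np.repeat(np.radians([45., -45.]), 4)

s, c = np.sin(phi), np.cos(phi)
P11, P10 = c, s
P22, P21, P20 = 3 * c**2, 3 * s * c, 1.5 * s**2 - 0.5

# columns: surface spherical harmonics of degrees k = 0, 1, 2
A = np.column_stack([np.ones(8),
                     P11 * np.sin(lam), P10, P11 * np.cos(lam),
                     P22 * np.sin(2 * lam), P21 * np.sin(lam),
                     P20, P21 * np.cos(lam), P22 * np.cos(2 * lam)])

print(A.shape, np.linalg.matrix_rank(A))   # (8, 9) with rank 7 < 8
```

On this lattice the $\sin 2\lambda$ column vanishes identically and the $P_{20}$ column is a multiple of the constant column, which is exactly the rank deficiency observed in the text.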
$$f_{k\ell} := \frac{1}{4\pi}\int_{-\pi/2}^{+\pi/2}d\phi\cos\phi\int_0^{2\pi}d\lambda\; w(\phi)\,e_{k\ell}(\lambda, \phi)\,f(\lambda, \phi),$$
$$f_{k\ell} = \sum_{k_1, \ell_1}^{K}f_{k_1\ell_1}\,\frac{1}{4\pi}\int_{-\pi/2}^{+\pi/2}d\phi\cos\phi\int_0^{2\pi}d\lambda\; w(\phi)\,e_{k\ell}(\lambda, \phi)\,e_{k_1\ell_1}(\lambda, \phi) =$$
$$= \sum_{k_1, \ell_1}^{K}f_{k_1\ell_1}\sum_{i, j}g_{ij}\,e_{k\ell}(\lambda_i, \phi_j)\,e_{k_1\ell_1}(\lambda_i, \phi_j) =$$
$$= \sum_{i, j}g_{ij}\,f(\lambda_i, \phi_j)\,e_{k\ell}(\lambda_i, \phi_j)$$
on the lattice $(\lambda_i, \phi_j)$.
Newton iteration
Level 0: $x_0$, $\Delta y_0 = F(X) - F(x_0)$
$$\Delta x_1 = \left[J(x_0)\right]_R^-\,\Delta y_0$$
$$p_\alpha = \begin{cases}x_\alpha = -\tfrac{1}{2}\\[1mm] y_\alpha = -\tfrac{1}{6}\sqrt{3}\end{cases}, \qquad p_\beta = \begin{cases}x_\beta = \tfrac{1}{2}\\[1mm] y_\beta = -\tfrac{1}{6}\sqrt{3}\end{cases}, \qquad p_\gamma = \begin{cases}x_\gamma = 0\\[1mm] y_\gamma = \tfrac{1}{3}\sqrt{3}\end{cases}$$
Obviously the approximate coordinates of the three nodal points are barycentric, namely characterized by Box 1.22: their sum as well as their product sum vanish.
Box 1.22: First and second moments of nodal points, approximate coordinates
$$x_\alpha + x_\beta + x_\gamma = 0, \qquad y_\alpha + y_\beta + y_\gamma = 0$$
$$J_{xy} = x_\alpha y_\alpha + x_\beta y_\beta + x_\gamma y_\gamma = 0$$
$$[I_i] = \begin{bmatrix}I_x\\ I_y\end{bmatrix} = \begin{bmatrix}x_\alpha + x_\beta + x_\gamma\\ y_\alpha + y_\beta + y_\gamma\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix}$$
$$[I_{ij}] = \begin{bmatrix}I_{xx} & I_{xy}\\ I_{xy} & I_{yy}\end{bmatrix} = \begin{bmatrix}-(y_\alpha^2 + y_\beta^2 + y_\gamma^2) & x_\alpha y_\alpha + x_\beta y_\beta + x_\gamma y_\gamma\\ x_\alpha y_\alpha + x_\beta y_\beta + x_\gamma y_\gamma & -(x_\alpha^2 + x_\beta^2 + x_\gamma^2)\end{bmatrix} = \begin{bmatrix}-\tfrac{1}{2} & 0\\ 0 & -\tfrac{1}{2}\end{bmatrix} = -\tfrac{1}{2}I_2$$
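A quick numerical check of Box 1.22 (my addition, NumPy assumed) with the approximate coordinates $p_\alpha = (-\tfrac12, -\tfrac{\sqrt3}{6})$, $p_\beta = (\tfrac12, -\tfrac{\sqrt3}{6})$, $p_\gamma = (0, \tfrac{\sqrt3}{3})$:

```python
import numpy as np

x = np.array([-0.5, 0.5, 0.0])
y = np.array([-np.sqrt(3) / 6, -np.sqrt(3) / 6, np.sqrt(3) / 3])

# first moments vanish: the triangle is centered at the barycenter
assert np.isclose(x.sum(), 0) and np.isclose(y.sum(), 0)
# product sum vanishes and the second-moment (inertia) matrix is -I/2
I_xx, I_yy, I_xy = -(y @ y), -(x @ x), x @ y
assert np.isclose(I_xy, 0)
assert np.isclose(I_xx, -0.5) and np.isclose(I_yy, -0.5)
print(I_xx, I_yy, I_xy)
```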
Box 1.23: First and second moments of nodal points, inertia tensors
$$\mathbf{I} = \sum_{i=1}^{2}e_iI_i = e_1I_1 + e_2I_2$$
subject to $r^2 = x^2 + y^2$ and
$$\rho(x, y) = \delta(x, y; x_\alpha, y_\alpha) + \delta(x, y; x_\beta, y_\beta) + \delta(x, y; x_\gamma, y_\gamma).$$
The product sums of the approximate coordinates of the nodal points constitute the rectangular coordinates of the inertia tensor
$$\mathbf{I} = \sum_{i, j=1}^{2}e_i \otimes e_j\,I_{ij}, \qquad I_{ij} = \int_{-\infty}^{+\infty}dx\int_{-\infty}^{+\infty}dy\;\rho(x, y)\left(x_ix_j - r^2\delta_{ij}\right)$$
1-4 Special nonlinear models 71
$$x' = [x_\alpha, y_\alpha, x_\beta, y_\beta, x_\gamma, y_\gamma] = \left[-\tfrac{1}{2}, -\tfrac{1}{6}\sqrt{3}, \tfrac{1}{2}, -\tfrac{1}{6}\sqrt{3}, 0, \tfrac{1}{3}\sqrt{3}\right]$$
"Jacobi maps"
$$J(x) := \begin{bmatrix}\dfrac{\partial F_1}{\partial X_\alpha} & \dfrac{\partial F_1}{\partial Y_\alpha} & \dfrac{\partial F_1}{\partial X_\beta} & \dfrac{\partial F_1}{\partial Y_\beta} & \dfrac{\partial F_1}{\partial X_\gamma} & \dfrac{\partial F_1}{\partial Y_\gamma}\\[2mm] \dfrac{\partial F_2}{\partial X_\alpha} & \dfrac{\partial F_2}{\partial Y_\alpha} & \dfrac{\partial F_2}{\partial X_\beta} & \dfrac{\partial F_2}{\partial Y_\beta} & \dfrac{\partial F_2}{\partial X_\gamma} & \dfrac{\partial F_2}{\partial Y_\gamma}\\[2mm] \dfrac{\partial F_3}{\partial X_\alpha} & \dfrac{\partial F_3}{\partial Y_\alpha} & \dfrac{\partial F_3}{\partial X_\beta} & \dfrac{\partial F_3}{\partial Y_\beta} & \dfrac{\partial F_3}{\partial X_\gamma} & \dfrac{\partial F_3}{\partial Y_\gamma}\end{bmatrix}(x) =$$
$$= \begin{bmatrix}-2(x_\beta - x_\alpha) & -2(y_\beta - y_\alpha) & 2(x_\beta - x_\alpha) & 2(y_\beta - y_\alpha) & 0 & 0\\ 0 & 0 & -2(x_\gamma - x_\beta) & -2(y_\gamma - y_\beta) & 2(x_\gamma - x_\beta) & 2(y_\gamma - y_\beta)\\ 2(x_\alpha - x_\gamma) & 2(y_\alpha - y_\gamma) & 0 & 0 & -2(x_\alpha - x_\gamma) & -2(y_\alpha - y_\gamma)\end{bmatrix} =$$
$$= \begin{bmatrix}-2 & 0 & 2 & 0 & 0 & 0\\ 0 & 0 & 1 & -\sqrt{3} & -1 & \sqrt{3}\\ -1 & -\sqrt{3} & 0 & 0 & 1 & \sqrt{3}\end{bmatrix}$$
here specialized to
Box 1.25: Linearized observational equations of distance measurements in the plane, $I$-MINOS, $\operatorname{rk}A = \dim\mathbb{Y}$
"Observed minus computed"
$$\Delta y := F(X) - F(x) = J(x)(X - x) + O\left[(X - x)\otimes(X - x)\right] = J\Delta x + O\left[(X - x)\otimes(X - x)\right],$$
$$\begin{bmatrix}\Delta s_{\alpha\beta}^2\\ \Delta s_{\beta\gamma}^2\\ \Delta s_{\gamma\alpha}^2\end{bmatrix} = \begin{bmatrix}S_{\alpha\beta}^2 - s_{\alpha\beta}^2\\ S_{\beta\gamma}^2 - s_{\beta\gamma}^2\\ S_{\gamma\alpha}^2 - s_{\gamma\alpha}^2\end{bmatrix} = \begin{bmatrix}1.1 - 1\\ 0.9 - 1\\ 1.2 - 1\end{bmatrix} = \begin{bmatrix}\tfrac{1}{10}\\ -\tfrac{1}{10}\\ \tfrac{1}{5}\end{bmatrix}$$
$$\begin{bmatrix}\Delta s_{\alpha\beta}^2\\ \Delta s_{\beta\gamma}^2\\ \Delta s_{\gamma\alpha}^2\end{bmatrix} = \begin{bmatrix}a_{\alpha\beta} & b_{\alpha\beta} & -a_{\alpha\beta} & -b_{\alpha\beta} & 0 & 0\\ 0 & 0 & a_{\beta\gamma} & b_{\beta\gamma} & -a_{\beta\gamma} & -b_{\beta\gamma}\\ -a_{\gamma\alpha} & -b_{\gamma\alpha} & 0 & 0 & a_{\gamma\alpha} & b_{\gamma\alpha}\end{bmatrix}\begin{bmatrix}\Delta x_\alpha\\ \Delta y_\alpha\\ \Delta x_\beta\\ \Delta y_\beta\\ \Delta x_\gamma\\ \Delta y_\gamma\end{bmatrix}$$
"linearized observational equations"
$$y = Ax, \quad y \in \mathbb{R}^3, \quad x \in \mathbb{R}^6, \quad \operatorname{rk}A = 3$$
$$A = \begin{bmatrix}-2 & 0 & 2 & 0 & 0 & 0\\ 0 & 0 & 1 & -\sqrt{3} & -1 & \sqrt{3}\\ -1 & -\sqrt{3} & 0 & 0 & 1 & \sqrt{3}\end{bmatrix}$$
$$A'(AA')^{-1} = \frac{1}{36}\begin{bmatrix}-9 & 3 & -3\\ \sqrt{3} & \sqrt{3} & -5\sqrt{3}\\ 9 & 3 & -3\\ \sqrt{3} & -5\sqrt{3} & \sqrt{3}\\ 0 & -6 & 6\\ -2\sqrt{3} & 4\sqrt{3} & 4\sqrt{3}\end{bmatrix}$$
$$x_m = \begin{bmatrix}\Delta x_\alpha\\ \Delta y_\alpha\\ \Delta x_\beta\\ \Delta y_\beta\\ \Delta x_\gamma\\ \Delta y_\gamma\end{bmatrix} = \frac{1}{36}\begin{bmatrix}-9y_1 + 3y_2 - 3y_3\\ \sqrt{3}\,y_1 + \sqrt{3}\,y_2 - 5\sqrt{3}\,y_3\\ 9y_1 + 3y_2 - 3y_3\\ \sqrt{3}\,y_1 - 5\sqrt{3}\,y_2 + \sqrt{3}\,y_3\\ -6y_2 + 6y_3\\ -2\sqrt{3}\,y_1 + 4\sqrt{3}\,y_2 + 4\sqrt{3}\,y_3\end{bmatrix}$$
$$x_m' = \tfrac{1}{180}\left[-9, -5\sqrt{3}, 0, 4\sqrt{3}, +9, \sqrt{3}\right],$$
so that the final coordinates become
$$(x + x_m)' = \tfrac{1}{180}\left[-99, -35\sqrt{3}, +90, -26\sqrt{3}, +9, +61\sqrt{3}\right].$$
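The closed-form $I$-MINOS solution of Box 1.25 can be reproduced in a few lines (my addition, NumPy assumed):

```python
import numpy as np

s3 = np.sqrt(3.0)
A = np.array([[-2., 0., 2., 0., 0., 0.],
              [0., 0., 1., -s3, -1., s3],
              [-1., -s3, 0., 0., 1., s3]])
y = np.array([0.1, -0.1, 0.2])            # "observed minus computed"

x_m = A.T @ np.linalg.inv(A @ A.T) @ y    # I-MINOS displacement vector
target = np.array([-9., -5. * s3, 0., 4. * s3, 9., s3]) / 180.
assert np.allclose(x_m, target)

# the displacements sum to zero coordinate-wise (barycentric datum preserved)
assert np.isclose(x_m[0::2].sum(), 0) and np.isclose(x_m[1::2].sum(), 0)
print(x_m)
```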
The sum of the final coordinates is zero, but due to the non-symmetric displacement field $[\Delta x_\alpha, \Delta y_\alpha, \Delta x_\beta, \Delta y_\beta, \Delta x_\gamma, \Delta y_\gamma]'$ the coordinate $J_{xy}$ of the inertia tensor does not vanish. These results are collected in Box 1.26.
Box 1.26: First and second moments of nodal points, final coordinates
$$y_\alpha + \Delta y_\alpha + y_\beta + \Delta y_\beta + y_\gamma + \Delta y_\gamma = y_\alpha + y_\beta + y_\gamma + \Delta y_\alpha + \Delta y_\beta + \Delta y_\gamma = 0$$
$$J_{xy} = I_{xy} + \Delta I_{xy} = (x_\alpha + \Delta x_\alpha)(y_\alpha + \Delta y_\alpha) + (x_\beta + \Delta x_\beta)(y_\beta + \Delta y_\beta) + (x_\gamma + \Delta x_\gamma)(y_\gamma + \Delta y_\gamma) =$$
$$= x_\alpha y_\alpha + x_\beta y_\beta + x_\gamma y_\gamma + x_\alpha\Delta y_\alpha + y_\alpha\Delta x_\alpha + x_\beta\Delta y_\beta + y_\beta\Delta x_\beta + x_\gamma\Delta y_\gamma + y_\gamma\Delta x_\gamma + O(\Delta x_\alpha\Delta y_\alpha, \Delta x_\beta\Delta y_\beta, \Delta x_\gamma\Delta y_\gamma) = \sqrt{3}/15$$
$$J_{xx} = I_{xx} + \Delta I_{xx} = -(y_\alpha + \Delta y_\alpha)^2 - (y_\beta + \Delta y_\beta)^2 - (y_\gamma + \Delta y_\gamma)^2 =$$
$$= -(y_\alpha^2 + y_\beta^2 + y_\gamma^2) - 2y_\alpha\Delta y_\alpha - 2y_\beta\Delta y_\beta - 2y_\gamma\Delta y_\gamma - O(\Delta y_\alpha^2, \Delta y_\beta^2, \Delta y_\gamma^2) = -7/12$$
$$J_{yy} = I_{yy} + \Delta I_{yy} = -(x_\alpha + \Delta x_\alpha)^2 - (x_\beta + \Delta x_\beta)^2 - (x_\gamma + \Delta x_\gamma)^2 =$$
$$= -(x_\alpha^2 + x_\beta^2 + x_\gamma^2) - 2x_\alpha\Delta x_\alpha - 2x_\beta\Delta x_\beta - 2x_\gamma\Delta x_\gamma - O(\Delta x_\alpha^2, \Delta x_\beta^2, \Delta x_\gamma^2) = -11/20. \qquad\blacksquare$$
A classification of such a nonlinear function can be based upon the "soft" Implicit Function Theorem, which is a substitute for the theory of algebraic partitioning, namely rank partitioning. (The "soft" Implicit Function Theorem is reviewed in Appendix C.) Let us compute the matrix of first derivatives
$$\left[\frac{\partial F_i}{\partial X_j}\right] \in \mathbb{R}^{n\times m},$$
subject to
$$A \in \mathbb{R}^{n\times m}, \quad A_1 = J \in \mathbb{R}^{n\times n} = \mathbb{R}^{r\times r}, \quad A_2 = K \in \mathbb{R}^{n\times(m-n)} = \mathbb{R}^{n\times(m-r)}.$$
$m - \operatorname{rk}A$ is called the datum defect of the consistent system of nonlinear equations $Y = F(X)$, which is a priori known. By means of such a rank partitioning we have decomposed the vector of unknowns
$$X' = [X_1', X_2'].$$
Let us apply the "soft" Implicit Function Theorem to the nonlinear observational equations of distance measurements in the plane which we have already introduced in the previous example. Box 1.27 outlines the nonlinear observational equations for $Y_1 = S_{\alpha\beta}^2$, $Y_2 = S_{\beta\gamma}^2$, $Y_3 = S_{\gamma\alpha}^2$. The columns $c_1, c_2, c_3$ of the matrix $[\partial F_i/\partial X_j]$ are linearly independent and accordingly build up the Jacobi matrix $J$ of full rank. Let us partition the unknown vector $X' = [X_1', X_2']$, namely into the "free parameters" $[X_\alpha, Y_\alpha, Y_\beta]'$ and the "bounded parameters" $[X_\beta, X_\gamma, Y_\gamma]'$. Here we have made the following choice for the "free parameters": we have fixed the origin of the coordinate system by $(X_\alpha = 0, Y_\alpha = 0)$. Obviously the point $P_\alpha$ is this origin. The orientation of the $X$-axis is given by $Y_\beta = 0$. In consequence the "bounded parameters" are now derived by solving a quadratic equation, indeed a very simple one. Due to the datum choice we find
$$\text{(1st)} \quad X_\beta = \pm\sqrt{S_{\alpha\beta}^2} = \pm\sqrt{Y_1}$$
$$\text{(2nd)} \quad X_\gamma = \pm\left(S_{\alpha\beta}^2 - S_{\beta\gamma}^2 + S_{\gamma\alpha}^2\right)/(2S_{\alpha\beta}) = \pm(Y_1 - Y_2 + Y_3)/(2\sqrt{Y_1})$$
$$\text{(3rd)} \quad Y_\gamma = \pm\sqrt{S_{\gamma\alpha}^2 - \left(S_{\alpha\beta}^2 - S_{\beta\gamma}^2 + S_{\gamma\alpha}^2\right)^2/(4S_{\alpha\beta}^2)} = \pm\sqrt{Y_3 - (Y_1 - Y_2 + Y_3)^2/(4Y_1)}.$$
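With the observed squared distances $Y_1 = 1.1$, $Y_2 = 0.9$, $Y_3 = 1.2$ of the example, the three formulas yield the bounded coordinates directly. The sketch (my addition, NumPy assumed) also re-checks that the reconstructed triangle reproduces the distances:

```python
import numpy as np

Y1, Y2, Y3 = 1.1, 0.9, 1.2                 # squared distances S_ab^2, S_bg^2, S_ga^2

Xb = np.sqrt(Y1)                           # (1st), positive branch
Xg = (Y1 - Y2 + Y3) / (2 * np.sqrt(Y1))    # (2nd)
Yg = np.sqrt(Y3 - (Y1 - Y2 + Y3)**2 / (4 * Y1))   # (3rd)

a, b, g = np.array([0., 0.]), np.array([Xb, 0.]), np.array([Xg, Yg])
assert np.isclose(np.sum((b - a)**2), Y1)  # S_ab^2 recovered
assert np.isclose(np.sum((g - b)**2), Y2)  # S_bg^2 recovered
assert np.isclose(np.sum((a - g)**2), Y3)  # S_ga^2 recovered
print(Xb, Xg, Yg)
```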
"algebraic notation"
$$Y_1 = F_1(X) = S_{\alpha\beta}^2 = (X_\beta - X_\alpha)^2 + (Y_\beta - Y_\alpha)^2$$
$$Y_2 = F_2(X) = S_{\beta\gamma}^2 = (X_\gamma - X_\beta)^2 + (Y_\gamma - Y_\beta)^2$$
$$Y_3 = F_3(X) = S_{\gamma\alpha}^2 = (X_\alpha - X_\gamma)^2 + (Y_\alpha - Y_\gamma)^2$$
$$Y' := [Y_1, Y_2, Y_3] = [S_{\alpha\beta}^2, S_{\beta\gamma}^2, S_{\gamma\alpha}^2]$$
$$X' := [X_1, X_2, X_3, X_4, X_5, X_6] = [X_\alpha, Y_\alpha, X_\beta, Y_\beta, X_\gamma, Y_\gamma]$$
"Jacobi matrix"
$$\left[\frac{\partial F_i}{\partial X_j}\right] = 2\begin{bmatrix}-(X_3 - X_1) & -(X_4 - X_2) & (X_3 - X_1) & (X_4 - X_2) & 0 & 0\\ 0 & 0 & -(X_5 - X_3) & -(X_6 - X_4) & (X_5 - X_3) & (X_6 - X_4)\\ (X_1 - X_5) & (X_2 - X_6) & 0 & 0 & -(X_1 - X_5) & -(X_2 - X_6)\end{bmatrix}$$
$$\operatorname{rk}\left[\frac{\partial F_i}{\partial X_j}\right] = 3, \qquad \dim\left[\frac{\partial F_i}{\partial X_j}\right] = 3\times6$$
$$J = 2\begin{bmatrix}-(X_3 - X_1) & -(X_4 - X_2) & (X_3 - X_1)\\ 0 & 0 & -(X_5 - X_3)\\ (X_1 - X_5) & (X_2 - X_6) & 0\end{bmatrix}, \qquad \operatorname{rk}J = 3$$
$$K = 2\begin{bmatrix}(X_4 - X_2) & 0 & 0\\ -(X_6 - X_4) & (X_5 - X_3) & (X_6 - X_4)\\ 0 & -(X_1 - X_5) & -(X_2 - X_6)\end{bmatrix}.$$
$$X_1 = X_\alpha = 0, \qquad X_3 = X_\beta = +S_{\alpha\beta} = +\sqrt{Y_1}$$
$$X_2 = Y_\alpha = 0, \qquad X_5 = X_\gamma = +\left(S_{\alpha\beta}^2 - S_{\beta\gamma}^2 + S_{\gamma\alpha}^2\right)/(2S_{\alpha\beta}) = +(Y_1 - Y_2 + Y_3)/(2\sqrt{Y_1})$$
$$X_4 = Y_\beta = 0, \qquad X_6 = Y_\gamma = +\sqrt{S_{\gamma\alpha}^2 - \left(S_{\alpha\beta}^2 - S_{\beta\gamma}^2 + S_{\gamma\alpha}^2\right)^2/(4S_{\alpha\beta}^2)} = +\sqrt{Y_3 - (Y_1 - Y_2 + Y_3)^2/(4Y_1)}$$
[x; y] = R [X; Y] − [t_x; t_y]

Reference: Facts (representation of a 2×2 orthonormal matrix) of Appendix A:

R = [cos φ  sin φ; −sin φ  cos φ]

x_α = X_α cos φ + Y_α sin φ − t_x
y_α = −X_α sin φ + Y_α cos φ − t_y
x_β = X_β cos φ + Y_β sin φ − t_x
y_β = −X_β sin φ + Y_β cos φ − t_y
x_γ = X_γ cos φ + Y_γ sin φ − t_x
y_γ = −X_γ sin φ + Y_γ cos φ − t_y.
(iii) ‖x‖² = min  ⟺  { x_α x_β + x_β x_γ + x_γ x_α = min and y_α y_β + y_β y_γ + y_γ y_α = min }.

The representation of the objective function of type MINOS in terms of the observations Y₁ = S²_αβ, Y₂ = S²_βγ, Y₃ = S²_γα can be proven as follows:
Proof:
S²_αβ = (x_β − x_α)² + (y_β − y_α)² =
= x²_α + y²_α + x²_β + y²_β − 2(x_α x_β + y_α y_β)
½S²_αβ + x_α x_β + y_α y_β = ½(x²_α + y²_α + x²_β + y²_β)
‖x‖² = x²_α + y²_α + x²_β + y²_β + x²_γ + y²_γ =
= ½(S²_αβ + S²_βγ + S²_γα) + x_α x_β + x_β x_γ + x_γ x_α + y_α y_β + y_β y_γ + y_γ y_α
‖x‖² := x²_α + y²_α + x²_β + y²_β + x²_γ + y²_γ = min over (t_x, t_y, φ)
"Lagrangean"
L(t_x, t_y, φ) :=
:= (X_α cos φ + Y_α sin φ − t_x)² + (−X_α sin φ + Y_α cos φ − t_y)²
+ (X_β cos φ + Y_β sin φ − t_x)² + (−X_β sin φ + Y_β cos φ − t_y)²
+ (X_γ cos φ + Y_γ sin φ − t_x)² + (−X_γ sin φ + Y_γ cos φ − t_y)²
(t_x, t_y) = arg{L(t_x, t_y, φ) = min}.
"solution t_x, t_y in the Lagrangean: reduced Lagrangean"
L(φ) :=
:= {X_α cos φ + Y_α sin φ − ⅓[(X_α + X_β + X_γ) cos φ + (Y_α + Y_β + Y_γ) sin φ]}² + …
L(φ) =
= {[X_α − ⅓(X_α + X_β + X_γ)] cos φ + [Y_α − ⅓(Y_α + Y_β + Y_γ)] sin φ}² + …
"centralized coordinates"
ΔX_α := X_α − ⅓(X_α + X_β + X_γ) = ⅓(2X_α − X_β − X_γ)
ΔY_α := Y_α − ⅓(Y_α + Y_β + Y_γ) = ⅓(2Y_α − Y_β − Y_γ)
(and analogously ΔX_β, ΔY_β, ΔX_γ, ΔY_γ)
"reduced Lagrangean"
L₁(φ) = (ΔX_α cos φ + ΔY_α sin φ)² +
+ (ΔX_β cos φ + ΔY_β sin φ)² +
+ (ΔX_γ cos φ + ΔY_γ sin φ)²
step three
‖x‖² = L(t_x, t_y, φ)
step four
[x_α x_β x_γ; y_α y_β y_γ] = [cos φ  sin φ; −sin φ  cos φ] [X_α X_β X_γ; Y_α Y_β Y_γ] − [t_x; t_y] 1₃′.
X_α = 0, X_β = 1.10, X_γ = 0.84,
Y_α = 0, Y_β = 0, Y_γ = 0.86,
test:
[ΔX] = 0, [ΔY] = 0,
[ΔXΔY] = 0.166, [ΔX²] = 0.661, [ΔY²] = 0.493,
tan 2φ = 1.979, φ = 31°.598,828,457,
t_x = 0.701, t_y = −0.095,
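The numerical test above can be reproduced in a few lines (a sketch; it only assumes the coordinate values printed in the text):

```python
import math

# coordinates from the text
X = [0.0, 1.10, 0.84]
Y = [0.0, 0.0, 0.86]

# centralized coordinates Delta X, Delta Y
Xc = [x - sum(X) / 3 for x in X]
Yc = [y - sum(Y) / 3 for y in Y]

sxy = sum(a * b for a, b in zip(Xc, Yc))   # [dX dY]
sxx = sum(a * a for a in Xc)               # [dX^2]
syy = sum(b * b for b in Yc)               # [dY^2]

tan2phi = 2 * sxy / (sxx - syy)
phi = 0.5 * math.atan(tan2phi)

# translation parameters that centralize the transformed coordinates
tx = (sum(X) * math.cos(phi) + sum(Y) * math.sin(phi)) / 3
ty = (-sum(X) * math.sin(phi) + sum(Y) * math.cos(phi)) / 3
```

Running the sketch reproduces tan 2φ ≈ 1.979, φ ≈ 31.6°, t_x ≈ 0.701 and t_y ≈ −0.095 up to the three-decimal rounding of the printed data.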
1-5 Notes
What is the origin of the rank deficiency of three in the linearized observational equations, namely the three distance functions observed in a planar triangular network, which we presented in paragraph three?
well as pseudo-distance ratios and angles equivariant. Sample references are A. O. Barut (1972), H. Bateman (1910), F. Bayen (1976), J. Beckers, J. Harnard, M. Perroud and M. Winternitz (1976), D. G. Boulware, L. S. Brown, R. D. Peccei (1970), P. Carruthers (1971), E. Cunningham (1910), T. tom Dieck (1967), N. Euler and W. H. Steeb (1992), P. G. O. Freund (1974), T. Fulton, F. Rohrlich and L. Witten (1962), J. Haantjes (1937), H. A. Kastrup (1962, 1966), R. Kotecky and J. Niederle (1975), K. H. Marivalla (1971), D. H. Mayer (1975), J. A. Schouten and J. Haantjes (1936), D. E. Soper (1976) and J. Wess (1990).
Box 1.33: Observables and transformation groups
Box 1.33 contains a list of observables in ℝⁿ, equipped with a metric, and their corresponding transformation groups. The number of datum parameters coincides with the injectivity rank deficiency in a consistent system of linear (linearized) observational equations Ax = y subject to A ∈ ℝ^{n×m}, rk A = n < m, d(A) = m − rk A.
2 The first problem of probabilistic regression
– special Gauss-Markov model with datum defect –
Setup of the linear uniformly minimum bias estimator of
type LUMBE for fixed effects.
In the first chapter we have solved a special algebraic regression problem,
namely the inversion of a system of consistent linear equations classified as
“underdetermined”. By means of the postulate of a minimum norm solution
‖x‖² = min we were able to determine m unknowns (m > n, say m = 10⁶) from n observations (more unknowns m than equations n, say n = 10). Indeed such a mathematical solution may surprise the analyst: in the example "MINOS" produces one million unknowns from ten observations.
Though “MINOS” generates a rigorous solution, we are left with some doubts.
How can we interpret such an “unbelievable solution”?
The key for an evaluation of “MINOS” is handed over to us by treating the spe-
cial algebraic regression problem by means of a special probabilistic regression
problem, namely as a special Gauss-Markov model with datum defect. The bias
generated by any solution of an underdetermined or ill-posed problem will be
introduced as a decisive criterion for evaluating “MINOS”, now in the context of
a probabilistic regression problem. In particular, a special form of "LUMBE", the linear uniformly minimum bias estimator ‖LA − I_m‖² = min, leads us to a solution which is equivalent to "MINOS". Alternatively we may say that among the various classes of solutions of an underdetermined problem "LUMBE" generates the solution of minimal bias.
? What is a probabilistic regression problem?
By means of a certain statistical objective function, here of type "minimum bias", we solve the inverse problem of linear and nonlinear equations with "fixed effects" which relates stochastic observations to parameters. According to the Measurement Axiom, observations are elements of a probability space. In terms of second order statistics the observation space Y of integer dimension, dim Y = n, is characterized by the first moment E{y}, the expectation of y ∈ Y, and the central second moment D{y}, the dispersion matrix or variance-covariance matrix Σ_y. In the case of "fixed effects" we consider the parameter space X of integer dimension, dim X = m, to be metrical. Its metric is induced from the probabilistic measure, the variance-covariance matrix Σ_y of the observations y ∈ Y. In particular, the variance-covariance matrix of the parameters is pulled back from the variance-covariance matrix Σ_y. In the special probabilistic regression model, "fixed effects" ξ ∈ Ξ (elements of the parameter space) are estimated.
Fast track reading:
Consult Box 2.2 and read only Theorem 2.3
Please pay attention to the guideline of Chapter two, namely its definitions, lemmas and theorems.
Lemma 2.2: ξ̂ hom S-LUMBE of ξ
Theorem 2.4: equivalence of G_x-MINOS and S-LUMBE
ξ̂ = Ly,  (2.3)
"ansatz"
ξ̂ = Ly  (2.6)
"bias vector"
β := E{ξ̂ − ξ} = E{ξ̂} − ξ ≠ 0  ∀ ξ ∈ ℝ^m  (2.7)
β = L E{y} − ξ = −[I_m − LA] ξ  ∀ ξ ∈ ℝ^m  (2.8)
"bias matrix"
B′ = I_m − LA  (2.9)
"bias norms"
ξ̂ = Ly,  (2.13)
: Proof :
for S-LUMBE. The necessary conditions for the minimum of the quadratic Lagrangean L(L) are
∂L/∂L (L̂) := 2 [ASA′L̂′ − AS]′ = 0.  (2.17)
∂L/∂(vec L) (L̂) = vec 2 [L̂ASA′ − SA′]  (2.20)
∂L/∂(vec L) (L̂) = 2 [ASA′ ⊗ I_m] vec L̂ − 2 vec(SA′).  (2.21)
ξ̂ = SA′(ASA′)⁻¹ y  (2.23)
β := E{ξ̂} − ξ =
= −[I_m − SA′(ASA′)⁻¹A] ξ  (2.25)
for all ξ ∈ ℝ^m.
The proof of Theorem 2.3 is straightforward. At this point we have to comment on what Theorem 2.3 is actually telling us. hom LUMBE has generated the estimate ξ̂ of type (2.23), the dispersion matrix D{ξ̂} of type (2.24) and the bias vector of type (2.25), which all depend on C. R. Rao's substitute matrix S, rk S = m. Indeed we can associate any element of the solution vector, the dispersion matrix as well as the bias vector with a particular weight which can be "designed" by the analyst.
2-2 The Equivalence Theorem of G_x-MINOS and S-LUMBE
We have included the second chapter on hom S-LUMBE in order to interpret G_x-MINOS of the first chapter. The key question remains open.
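The Equivalence Theorem G_x-MINOS ~ S-LUMBE can be illustrated numerically; the sketch below assumes S = I_m, G_x = I_m and hypothetical observations y for the front page matrix:

```python
import numpy as np

# front page matrix of the underdetermined system: n = 2 < m = 3
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([3.0, 7.0])   # hypothetical observations

# hom S-LUMBE with S = I_m: xi = S A'(A S A')^{-1} y
xi_lumbe = A.T @ np.linalg.solve(A @ A.T, y)

# G_x-MINOS with G_x = I_m: the minimum norm solution A'(AA')^{-1} y
xi_minos = np.linalg.pinv(A) @ y

assert np.allclose(xi_lumbe, xi_minos)   # the Equivalence Theorem
assert np.allclose(A @ xi_lumbe, y)      # the observations are reproduced
```

For a full-row-rank A the pseudoinverse coincides with A′(AA′)⁻¹, which is why both estimators agree element by element.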
2-3 Examples
Due to the Equivalence Theorem G_x-MINOS ~ S-LUMBE, the only new items of the first problem of probabilistic regression are the dispersion matrix D{ξ̂ | hom LUMBE} and the bias matrix B{ξ̂ | hom LUMBE}. Accordingly the first example outlines the simple model of the variance-covariance matrix D{ξ̂} =: Σ_ξ̂ and its associated Frobenius matrix bias norm ‖B‖². New territory is taken if we compute the variance-covariance matrix D{ξ̂*} =: Σ_ξ̂* and its related Frobenius matrix bias norm ‖B*‖² for the canonical unknown vector ξ* of star coordinates [ξ̂₁*, …, ξ̂_m*]′, later on rank partitioned.
Example 2.1 (simple variance-covariance matrix D{ξ̂ | hom LUMBE}, Frobenius norm of the bias matrix ‖B(hom LUMBE)‖):
The dispersion matrix Σ_ξ̂ := D{ξ̂} of ξ̂ (hom LUMBE) is called
simple,
if S = I_m and Σ_y := D{y} = I_n σ²_y.
Such a model is abbreviated "i.i.d." (independent identically distributed observations, one variance component) or "u.s." (unity substituted: unity substitute matrix).
Such a simple dispersion matrix is represented by
Σ_ξ̂ = A′(AA′)⁻²A σ²_y.  (2.27)
The Frobenius norm of the bias matrix for such a simple environment is derived by
Table 2.1:
Simple variance-covariance matrix (i.i.d. and u.s.),
Frobenius norm of the simple bias matrix
Front page example 1.1: A ∈ ℝ^{2×3}, n = 2, m = 3
A := [1 1 1; 1 2 4],  AA′ = [3 7; 7 21],  (AA′)⁻¹ = (1/14)[21 −7; −7 3]
rk A = 2
‖B‖ = 1 = √d.
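The entries of Table 2.1 can be checked numerically (a sketch assuming a unit variance component σ²_y = 1):

```python
import numpy as np

# front page example 1.1: A in R^{2x3}, n = 2, m = 3
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
sigma2 = 1.0   # assumed variance component sigma_y^2

AAt = A @ A.T
AAt_inv = np.linalg.inv(AAt)

# simple dispersion matrix (2.27): Sigma = A'(AA')^{-2} A sigma_y^2
Sigma = A.T @ AAt_inv @ AAt_inv @ A * sigma2

# bias matrix B' = I_m - A'(AA')^{-1}A; its Frobenius norm squared
# equals the datum defect d = m - rk A = 1
Bt = np.eye(3) - A.T @ AAt_inv @ A
```

Since I_m − A′(AA′)⁻¹A is an orthogonal projector of rank m − rk A, its squared Frobenius norm equals its rank, reproducing ‖B‖ = 1.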
canonically simple,
if S = I_m and Σ_y* := D{y*} = I_n σ²_y*. In short, we denote such a model by
var ξ̂₁* = Diag(1/λ₁², …, 1/λ_r²) σ²_y*,  ξ₁* ∈ ℝ^{r×1},
cov(ξ̂₁*, ξ̂₂*) = 0,  var ξ̂₂* = 0,  ξ₂* ∈ ℝ^{(m−r)×1}.
Box 2.3:
Canonical bias vector, canonical bias matrix
"ansatz"
ξ̂* = L* y*
"bias vector"
β* := E{ξ̂*} − ξ*  ∀ ξ* ∈ ℝ^m
β* = L* E{y*} − ξ*  ∀ ξ* ∈ ℝ^m
β*(hom LUMBE) = [β₁*; β₂*] = −([I_r 0; 0 I_d] − [Λ⁻¹; 0][Λ, 0]) [ξ₁*; ξ₂*]  (2.31)
for all ξ₁* ∈ ℝ^r, ξ₂* ∈ ℝ^d
β*(hom LUMBE) = [β₁*; β₂*] = −[0 0; 0 I_d] [ξ₁*; ξ₂*]  (2.32)
β*(hom LUMBE) = [β₁*; β₂*] = −[0; ξ₂*]  ∀ ξ₂* ∈ ℝ^d  (2.33)
"bias matrix"
(B*)′ = I_m − L*A*
[B*(hom LUMBE)]′ = [I_r 0; 0 I_d] − [Λ⁻¹; 0][Λ, 0]  (2.34)
[B*(hom LUMBE)]′ = [0 0; 0 I_d]  (2.35)
"Frobenius norm of the canonical bias matrix"
‖B*(hom LUMBE)‖² = tr [0 0; 0 I_d] = d  (2.36)
Lemma 3.2: x_ℓ G_y-LESS of x
Lemma 3.3: x_ℓ G_y-LESS of x
Lemma 3.4: x_ℓ G_y-LESS of x, constrained Lagrangean
Lemma 3.5: x_ℓ G_y-LESS of x, constrained Lagrangean
Theorem 3.6: bilinear form
Lemma 3.7: characterization of G_y-LESS
3-1 Introduction
First, the introductory example solves the front page inconsistent system of linear equations,
x₁ + x₂ ≈ 1        x₁ + x₂ + i₁ = 1
x₁ + 2x₂ ≈ 3  or  x₁ + 2x₂ + i₂ = 3
x₁ + 3x₂ ≈ 4        x₁ + 3x₂ + i₃ = 4,
Box 3.1:
Special linear model: polynomial of degree one, three observations, two unknowns
y = [y₁; y₂; y₃] = [a₁₁ a₁₂; a₂₁ a₂₂; a₃₁ a₃₂] [x₁; x₂]
y = Ax + i:  [1; 3; 4] = [1 1; 1 2; 1 3] [x₁; x₂] + [i₁; i₂; i₃],  A = [1 1; 1 2; 1 3]
r = rk A = dim X = m = 2.
g : R(f) → D(f), y ↦ x, is one-to-one: the mapping f is injective. Alternatively we may identify the kernel N(f), or the null space N(A), with {0}.
For instance, let us solve the first two equations, namely x₁ = −1, x₂ = 2. As soon as we substitute this solution in the third one, the inconsistency 5 ≠ 4 is met. Obviously such a system of linear equations needs general inconsistency parameters (i₁, i₂, i₃) in order to avoid contradiction. Since the right-hand side of the equations, namely the inhomogeneity of the system of linear equations, has been measured, and the model (the model equations) has been fixed, we have no alternative but inconsistency.
Within matrix algebra the index of the linear operator A is the rank r = rk A, here r = 2, which coincides with the dimension of the parameter space X, dim X = m, namely r = rk A = dim X = m, here r = m = 2. In the terminology of the linear mapping f, f is not "onto" (surjective), but "one-to-one" (injective). The left complementary index of the linear operator A ∈ ℝ^{n×m}, which accounts for the surjectivity defect, is given by d_s = n − rk A, also called "degree of freedom" (here d_s = n − rk A = 1). While "surjectivity" relates to the range R(f) or "the range space R(A)" and "injectivity" to the kernel N(f) or "the null space N(A)", we shall constructively introduce the notion of
Box 3.2:
Special linear model: polynomial of degree one, three observations, two unknowns
y = [y₁; y₂; y₃] = [1 t₁; 1 t₂; 1 t₃] [x₁; x₂] + [i₁; i₂; i₃]
t₁ = 1, y₁ = 1;  t₂ = 2, y₂ = 3;  t₃ = 3, y₃ = 4:
[1; 3; 4] = [1 1; 1 2; 1 3] [x₁; x₂] + [i₁; i₂; i₃] ~
~ y = Ax + i,  r = rk A = dim X = m = 2.
Thirdly, let us begin with a more detailed analysis of the linear mapping f : x ↦ Ax ≈ y, or Ax + i = y, namely of the linear operator A ∈ ℝ^{n×m}, r = rk A = dim X = m. We shall pay special attention to the three fundamental partitionings, namely
(i) algebraic partitioning called rank partitioning of the matrix A,
(ii) geometric partitioning called slicing of the linear space Y
(observation space),
(iii) set-theoretical partitioning called fibering of the set Y of
observations.
3-13 Least squares solution of the front page example by means of vertical rank partitioning
Let us go back to the front page inconsistent system of linear equations, namely the problem to determine two unknown polynomial coefficients from three sampling points, which we classified as an overdetermined one. Nevertheless we are able to compute a unique solution of the overdetermined system of inhomogeneous linear equations Ax + i = y, y ∉ R(A), rk A = dim X, here A ∈ ℝ^{3×2}, x ∈ ℝ^{2×1}, y ∈ ℝ^{3×1}, if we determine the coordinates of the unknown vector x as well as the vector i of the inconsistency by least squares (minimal Euclidean length, ℓ₂-norm), here ‖i‖²_I = i₁² + i₂² + i₃² = min.
Box 3.3 outlines the solution of the related optimization problem.
Box 3.3:
Least squares solution of the inconsistent system of inhomogeneous linear equations, vertical rank partitioning
The solution of the optimization problem
{‖i‖²_I = min over x | Ax + i = y, rk A = dim X}
is based upon the vertical rank partitioning of the linear mapping f : x ↦ y = Ax + i, rk A = dim X, which we already introduced. As soon as
[y₁; y₂] = [A₁; A₂] x + [i₁; i₂]  subject to  A₁ ∈ ℝ^{r×r}
x = −A₁⁻¹ i₁ + A₁⁻¹ y₁
y₂ = −A₂A₁⁻¹ i₁ + i₂ + A₂A₁⁻¹ y₁
i₂ = A₂A₁⁻¹ i₁ − A₂A₁⁻¹ y₁ + y₂
is implemented in the norm ‖i‖²_I, we are prepared to compute the first derivatives of the unconstrained Lagrangean
L(i₁, i₂) := ‖i‖²_I = i₁′i₁ + i₂′i₂ =
= i₁′i₁ + i₁′A₁′⁻¹A₂′A₂A₁⁻¹ i₁ − 2 i₁′A₁′⁻¹A₂′(A₂A₁⁻¹y₁ − y₂) +
+ (A₂A₁⁻¹y₁ − y₂)′(A₂A₁⁻¹y₁ − y₂) =
= min over i₁
∂L/∂i₁ (i_{1ℓ}) = 0
−A₁′⁻¹A₂′(A₂A₁⁻¹y₁ − y₂) + [A₁′⁻¹A₂′A₂A₁⁻¹ + I] i_{1ℓ} = 0
i_{1ℓ} = [I + A₁′⁻¹A₂′A₂A₁⁻¹]⁻¹ A₁′⁻¹A₂′(A₂A₁⁻¹y₁ − y₂)
1st term
(I + A₁′⁻¹A₂′A₂A₁⁻¹)⁻¹ A₁′⁻¹A₂′A₂A₁⁻¹ y₁ = (A₁′ + A₂′A₂A₁⁻¹)⁻¹ A₂′A₂A₁⁻¹ y₁ =
= A₁(A₁′A₁ + A₂′A₂)⁻¹ A₂′A₂A₁⁻¹ y₁ = −A₁(A₁′A₁ + A₂′A₂)⁻¹ A₁′ y₁ +
+ A₁(A₁′A₁ + A₂′A₂)⁻¹ (A₂′A₂A₁⁻¹ + A₁′) y₁ = −A₁(A₁′A₁ + A₂′A₂)⁻¹ A₁′ y₁ +
+ A₁(A₁′A₁ + A₂′A₂)⁻¹ (A₂′A₂ + A₁′A₁) A₁⁻¹ y₁ = −A₁(A₁′A₁ + A₂′A₂)⁻¹ A₁′ y₁ + y₁
2nd term
(I + A₁′⁻¹A₂′A₂A₁⁻¹)⁻¹ A₁′⁻¹A₂′ y₂ = A₁(A₁′A₁ + A₂′A₂)⁻¹ A₂′ y₂
The second derivatives
∂²L/∂i₁∂i₁′ (i_{1ℓ}) = 2[(A₂A₁⁻¹)′(A₂A₁⁻¹) + I] > 0,
due to positive-definiteness of the matrix (A₂A₁⁻¹)′(A₂A₁⁻¹) + I, generate the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. Finally let us backward transform
i_{1ℓ} ↦ i_{2ℓ} = A₂A₁⁻¹ i_{1ℓ} − A₂A₁⁻¹ y₁ + y₂,
i_{2ℓ} = −A₂(A₁′A₁ + A₂′A₂)⁻¹(A₁′y₁ + A₂′y₂) + y₂,
or
i_ℓ = −A(A′A)⁻¹A′ y + y.
Finally we are left with the backward step to compute the unknown vector of parameters x ∈ X:
x_ℓ = −A₁⁻¹ i_{1ℓ} + A₁⁻¹ y₁
x_ℓ = (A₁′A₁ + A₂′A₂)⁻¹(A₁′y₁ + A₂′y₂)
or
x_ℓ = (A′A)⁻¹A′ y.
A₁(A₁′A₁ + A₂′A₂)⁻¹ = (1/6)[8 −3; 2 0],
A₁′y₁ + A₂′y₂ = [8; 19],  y₁ = [1; 3],  y₂ = 4
i_{1ℓ} = (1/6)[−1; 2],  i_{2ℓ} = −1/6,  ‖i_ℓ‖_I = (1/6)√6,
x_ℓ = (1/6)[−2; 9],  ‖x_ℓ‖ = (1/6)√85,
y(t) = −1/3 + (3/2) t
(1/2) ∂²L/∂i₁∂i₁′ (i_{1ℓ}) = (A₂A₁⁻¹)′(A₂A₁⁻¹) + I = [2 −2; −2 5] > 0,
"first eigenvalue λ₁([2 −2; −2 5]) = 6", "second eigenvalue λ₂([2 −2; −2 5]) = 1".
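The Box 3.3 solution can be checked against the front page data; the sketch below reproduces x_ℓ, i_ℓ and the test A′i_ℓ = 0:

```python
import numpy as np

# front page data: t = (1, 2, 3), y = (1, 3, 4)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 4.0])
A1, A2 = A[:2, :], A[2:, :]
y1, y2 = y[:2], y[2:]

# LESS via the normal equations of the vertical rank partitioning
N = A1.T @ A1 + A2.T @ A2                        # = A'A
xl = np.linalg.solve(N, A1.T @ y1 + A2.T @ y2)   # = (A'A)^{-1} A'y
il = y - A @ xl                                  # inconsistency vector
```

Running the sketch yields x_ℓ = (1/6)(−2, 9), i_ℓ = (1/6)(−1, 2, −1) and ‖i_ℓ‖ = √6/6, matching the values above.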
The diagnostic algorithm for solving an overdetermined system of linear equations y = Ax + i, rk A = dim X = m, m < n = dim Y, y ∈ Y by means of rank partitioning is presented in Box 3.5.
3-14 The range R(f) and the kernel N(f), interpretation of "LESS" by three partitionings:
(i) algebraic (rank partitioning)
(ii) geometric (slicing)
(iii) set-theoretical (fibering)
Fourthly, let us go into the detailed analysis of R(f), R(f)⊥ and N(f) with respect to the front page example. Beforehand we begin with a comment.
We want to emphasize the two-step procedure of the least squares solution (LESS) once more: the first step of LESS maps the observation vector y onto the range space R(f), while in the second step the LESS point y_ℓ ∈ R(A) is uniquely mapped to the point x_ℓ ∈ X, an element of the parameter space.
A = [1 1; 1 2; 1 3] = [A₁; A₂],  A₁ = [1 1; 1 2],  A₂ = [1 3],
[y₁; y₂] = [1; 3; 4],  y₁ ∈ ℝ^{2×1},  y₂ ∈ ℝ.
By means of the vertical rank partitioning of the inconsistent system of inhomogeneous linear equations, an identification of the range space R(A), namely
R(A) = {y ∈ ℝⁿ | y₂ − A₂A₁⁻¹y₁ = 0},
is based upon
y₁ = A₁x + i₁ ⟹ x = A₁⁻¹(y₁ − i₁)
y₂ = A₂x + i₂ ⟹ y₂ = A₂A₁⁻¹(y₁ − i₁) + i₂
y₂ − A₂A₁⁻¹y₁ = i₂ − A₂A₁⁻¹i₁,
which leads to the range space R(A) for inconsistency zero, particularly in the introductory example
y₃ − [1, 3] [1 1; 1 2]⁻¹ [y₁; y₂] = 0.
For instance, if we introduce the coordinates y₁ = u, y₂ = v, the other coordinate y₃ of the range space R(A) ⊂ Y = ℝ³ amounts to
y₃ = [1, 3] [2 −1; −1 1] [u; v] = [−1, 2] [u; v]
y₃ = −u + 2v.
In geometric language the linear space R(A) is a parameterized plane P²₀ through the origin, illustrated by Figure 3.1. The observation space Y = ℝⁿ (here n = 3) is sliced by the subspace, the linear space (linear manifold) R(A), dim R(A) = rk(A) = r, namely a straight line, a plane (here), or a higher-dimensional plane through the origin O.
Box 3.5:
Algorithm
Diagnostic algorithm for solving an overdetermined system of linear equations y = Ax + i, rk A = dim X, y ∉ R(A) by means of rank partitioning
Determine the rank of the matrix A:
rk A = dim X = m.
Compute the "vertical rank partitioning"
A = [A₁; A₂],  A₁ ∈ ℝ^{r×r} = ℝ^{m×m},  A₂ ∈ ℝ^{(n−r)×r} = ℝ^{(n−m)×m}
"n − r = n − m = d_s is called the left complementary index"
"A as a linear operator is not surjective, but injective"
Compute the range space R(A):
R(A) := {y ∈ ℝⁿ | y₂ − A₂A₁⁻¹y₁ = 0}.
Compute the inconsistency vector of type LESS:
i_ℓ = −A(A′A)⁻¹A′y + y,  test: A′i_ℓ = 0.
Compute the unknown parameter vector of type LESS:
x_ℓ = (A′A)⁻¹A′y.
What is the geometric interpretation of the least-squares solution ‖i‖²_I = min? The second step maps the point y_ℓ ∈ R(f) to the point x_ℓ of the chosen chart of the parameter space X as a differentiable manifold. Examples follow later on.
Let us continue with the geometric interpretation of the linear model of this paragraph. The range space R(A), dim R(A) = rk(A) = m, is a linear space of dimension m, here m = rk A, which slices ℝⁿ. In contrast, the subspace R(A)⊥ corresponds to an (n − rk A) = d_s dimensional linear space L^{n−r}, here n − rk A = n − m, r = rk A = m.
Let the algebraic partitioning and the geometric partitioning be merged to inter-
pret the least squares solution of the inconsistent system of linear equations as a
generalized inverse (g-inverse) of type LESS. As a summary of such a merger we
take reference to Box 3.6.
Ax_ℓ = AA_ℓ⁻ y = AA_ℓ⁻ (Ax_ℓ + i_ℓ)
A′i_ℓ = A′[I − A(A′A)⁻¹A′] y = 0 ⟹ A_ℓ⁻ i_ℓ = (A′A)⁻¹A′ i_ℓ = 0
⟹ Ax_ℓ = AA_ℓ⁻ Ax_ℓ ⟹ AA⁻A = A.
x_ℓ = (A′A)⁻¹A′ y = A_ℓ⁻ y = A_ℓ⁻ (Ax_ℓ + i_ℓ)
A_ℓ⁻ i_ℓ = 0
⟹ x_ℓ = A_ℓ⁻ y = A_ℓ⁻ AA_ℓ⁻ y ⟹ A⁻AA⁻ = A⁻.
y = Ax_ℓ + i_ℓ = [AA_ℓ⁻ + (I − AA_ℓ⁻)] y
y = Ax_ℓ + i_ℓ = A(A′A)⁻¹A′ y + [I − A(A′A)⁻¹A′] y
⟹ y = y_{R(A)} + i_{R(A)⊥}
AA⁻ = P_{R(A)},  (I − AA⁻) = P_{R(A)⊥}.
Box 3.6:
The three conditions of the generalized inverse mapping (generalized inverse matrix) of LESS type
Condition #1:
f(x) = f(g(y)),  f = f ∘ g ∘ f  |  Ax = AA⁻Ax,  AA⁻A = A
Condition #2 (reflexive g-inverse mapping, reflexive g-inverse):
x = g(y) = g(f(x))  |  x = A⁻y = A⁻AA⁻y,  A⁻AA⁻ = A⁻
Condition #3:
f(g(y)) = y_{R(A)},  f ∘ g = proj_{R(f)}  |  AA⁻y = y_{R(A)},  AA⁻ = P_{R(A)}.
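The three conditions of Box 3.6 can be verified numerically for the front page matrix (a sketch; A_ℓ⁻ = (A′A)⁻¹A′ is the LESS g-inverse):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Al = np.linalg.inv(A.T @ A) @ A.T   # A_l^- = (A'A)^{-1} A'

# Condition #1: A A^- A = A
assert np.allclose(A @ Al @ A, A)
# Condition #2 (reflexivity): A^- A A^- = A^-
assert np.allclose(Al @ A @ Al, Al)
# Condition #3: A A^- is the symmetric, idempotent projector onto R(A)
P = A @ Al
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
```

Together the three assertions confirm that the least-squares inverse is a reflexive (1, 2, 3)-generalized inverse.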
The set-theoretical partitioning, the fibering of the set system of points which constitute the observation space Y, the range R(f), will finally be outlined. Since the set system Y (the observation space) is ℝⁿ, the fibering is called "trivial". Non-trivial fibering is reserved for nonlinear models, in which case we are dealing with an observation space as well as a range space which is a differentiable manifold. Here the fibering
Y = R(f) ∪ R(f)⊥
produces the trivial fibers R(f) and R(f)⊥, where the trivial fiber R(f)⊥ is the quotient set ℝⁿ/R(f). By means of a Venn diagram (John Venn 1834-1923), also called Euler circles (Leonhard Euler 1707-1783), Figure 3.3 illustrates the trivial fibers of the set system Y = ℝⁿ generated by R(f) and R(f)⊥. The set system of points which constitute the parameter space X is not subject to fibering since all points of the set system R(f) are mapped into the domain D(f).
Figure 3.3: Venn diagram, trivial fibering of the observation space Y, trivial fibers R(f) and R(f)⊥, f : ℝ^m = X → Y = R(f) ∪ R(f)⊥, X set system of the parameter space, Y set system of the observation space.
3-2 The least squares solution: "LESS"
The system of inconsistent linear equations Ax + i = y subject to A ∈ ℝ^{n×m}, rk A = m < n, allows certain solutions which we introduce by means of Definition 3.1 as a solution of a certain optimization problem. Lemma 3.2 contains the normal equations of the optimization problem. The solution of such a system of normal equations is presented in Lemma 3.3 as the least squares solution with respect to the G_y-norm. Alternatively, Lemma 3.4 shows the least squares solution generated by a constrained Lagrangean. Its normal equations are solved for (i) the Lagrange multiplier and (ii) the unknown vector of inconsistencies by Lemma 3.5. The unconstrained Lagrangean, where the system of linear equations has been implemented, as well as the constrained Lagrangean lead to the identical solution for (i) the vector of inconsistencies and (ii) the vector of unknown parameters. Finally we discuss the metric of the observation space and alternative choices of its metric before we identify the solution of the quadratic optimization problem by Lemma 3.7 in terms of the (1, 2, 3)-generalized inverse.
Ax + i = y,  y ∈ Y ≡ ℝⁿ,  [rk A = dim X = m, or y ∉ R(A)]  (3.1)
: Proof :
G_y-LESS is constructed by means of the Lagrangean
L(x) := ‖i‖²_{G_y} = ‖y − Ax‖²_{G_y} =
= x′A′G_yAx − 2y′G_yAx + y′G_yy = min over x
∂L/∂x (x_ℓ) = ∂(i′G_yi)/∂x (x_ℓ) = 2A′G_y(Ax_ℓ − y) = 0
constitute the necessary conditions. The theory of vector derivative is presented
in Appendix B. The second derivatives
∂²L/∂x∂x′ (x_ℓ) = ∂²(i′G_yi)/∂x∂x′ (x_ℓ) = 2A′G_yA ≥ 0
y = y_ℓ + i_ℓ  (3.12)
is an orthogonal decomposition of the observation vector y ∈ Y ≡ ℝⁿ into the G_y-LESS vector y_ℓ ∈ Y = ℝⁿ and the G_y-LESS vector of inconsistency i_ℓ ∈ Y = ℝⁿ subject to
y_ℓ = Ax_ℓ = A(A′G_yA)⁻¹A′G_y y,  (3.13)
i_ℓ = y − Ax_ℓ = [I_n − A(A′G_yA)⁻¹A′G_y] y.  (3.14)
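Equations (3.11), (3.13) and (3.14) can be sketched numerically; the weight matrix G_y below is an assumed example, not a value from the text:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 4.0])
Gy = np.diag([1.0, 2.0, 4.0])   # assumed positive definite weight matrix

xl = np.linalg.solve(A.T @ Gy @ A, A.T @ Gy @ y)   # (3.11)
yl = A @ xl                                        # (3.13)
il = y - yl                                        # (3.14)
```

A quick check confirms the normal equations A′G_y i_ℓ = 0 and the G_y-orthogonality ⟨y_ℓ | i_ℓ⟩_{G_y} = 0 of the decomposition y = y_ℓ + i_ℓ.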
: Proof :
G_y-LESS is based on the constrained Lagrangean
L(i, x, λ) := i′G_yi + 2λ′(Ax + i − y) = min over (i, x, λ)
[G_y 0 I_n; 0 0 A′; I_n A 0] [i_ℓ; x_ℓ; λ_ℓ] = [0; 0; y]
(1/2) ∂²L/∂i∂i′ = G_y ≥ 0
i_ℓ = [I_n − A(A′G_yA)⁻¹A′G_y] y,  (3.20)
λ_ℓ = [G_yA(A′G_yA)⁻¹A′ − I_n] G_y y.  (3.21)
: Proof :
A basis of the proof could be C. R. Rao's Pandora Box, the theory of inverse partitioned matrices (Appendix A: Fact: Inverse Partitioned Matrix /IPM/).
Multiply the third normal equation by A′G_y, multiply the first normal equation by A′, and substitute A′λ_ℓ from the second normal equation into the modified first one.
A′G_yAx_ℓ + A′G_yi_ℓ − A′G_yy = 0
A′G_yi_ℓ + A′λ_ℓ = 0
A′λ_ℓ = 0
⟹
A′G_yAx_ℓ + A′G_yi_ℓ − A′G_yy = 0
A′G_yi_ℓ = 0
⟹ A′G_yAx_ℓ − A′G_yy = 0,
x_ℓ = (A′G_yA)⁻¹A′G_yy,
λ_ℓ = [G_yA(A′G_yA)⁻¹A′G_y − G_y] y.
∎
Of course the G_y-LESS of type (3.2) and the G_y-LESS solution of type constrained Lagrangean (3.16) are equivalent, namely (3.11) ~ (3.19) and (3.14) ~ (3.20).
In order to analyze the finite dimensional linear space Y called “the observation
space”, namely the case of a singular matrix of its metric, in more detail, let us
take reference to the following.
Box 3.7:
Canonical representation of the rank-deficient matrix of the metric of the observation space Y
rk G_y =: r_y,  Λ_y := Diag(λ₁, …, λ_{r_y})
subject to
U ∈ SO(n) := {U ∈ ℝ^{n×n} | U′U = I_n, |U| = +1}
U₁ ∈ ℝ^{n×r_y},  U₂ ∈ ℝ^{n×(n−r_y)},  Λ_y ∈ ℝ^{r_y×r_y}
0₁ ∈ ℝ^{r_y×(n−r_y)},  0₂ ∈ ℝ^{(n−r_y)×r_y},  0₃ ∈ ℝ^{(n−r_y)×(n−r_y)}
"norms"
‖i‖²_{G_y} = ‖i‖²_{U₁Λ_yU₁′} ~ i′G_yi = i′U₁Λ_yU₁′i  (3.26), (3.27)
The metric of the observation space Y has to be given a priori. We classified LESS according to (i) G_y = I_n, (ii) G_y positive definite and (iii) G_y positive semidefinite. But how do we know the metric of the observation space Y? Obviously we need prior information about the geometry of the observation space Y, namely from the empirical sciences like physics, chemistry, biology, geosciences, social sciences. If the observation space Y ⊂ ℝⁿ is equipped with an inner product ⟨y₁ | y₂⟩ = y₁′G_yy₂, y₁ ∈ Y, y₂ ∈ Y, where the matrix G_y of the metric ‖y‖² = y′G_yy is positive definite, we refer to the metric space Y ⊂ ℝⁿ as Euclidean Eⁿ. In contrast, if the matrix of the metric of the observation space is only positive semidefinite, we call the observation space semi-Euclidean E^{n₁,n₂}: n₁ is the number of positive eigenvalues, n₂ the number of vanishing eigenvalues. For such an observation space LESS has to be generalized to ‖y − Ax‖²_{G_y} = extr, for instance.
There are various alternative scales or objective functions for projection matrices for substituting Euclidean metrics, termed robustifying. In special cases those objective functions operate on
x_ℓ = H_x y subject to H_x = (A′G_yA)⁻¹A′G_y,  (3.11)
y_ℓ = H_y y subject to H_y = A(A′G_yA)⁻¹A′G_y,  (3.13)
i_ℓ = H_ℓ y subject to H_ℓ = I_n − A(A′G_yA)⁻¹A′G_y,  (3.14)
where {H_x, H_y, H_ℓ} are called "hat matrices". In other cases analysts have to accept that the observation space is non-Euclidean. For instance, direction observations in ℝ^p locate points on the hypersphere S^{p−1}. Accordingly we have to accept an objective function of von Mises-Fisher type which measures the spherical distance along a great circle between the measurement points on S^{p−1} and the mean direction. Such an alternative choice of a metric of a non-Euclidean space Y will be presented in Chapter 7.
Here we discuss in some detail alternative objective functions, namely
• optimal choice of the weight matrix G y :
second order design SOD
• optimal choice of the weight matrix G y by means of
condition equations
• robustifying objective functions
G_x = A′(x_ℓ) G_y A(x_ℓ) = (1/2) ∂²L/∂x_ℓ∂x_ℓ′ =: FISHER
at the "point" x_ℓ of type LESS. The first order design problem aims at determining those points x within the Jacobi matrix A by means of a properly chosen risk operating on "FISHER". Here, "FISHER" relates the weight matrix of the observations G_y, previously called the matrix of the metric of the observation space, to the weight matrix G_x of the unknown parameters, previously called the matrix of the metric of the parameter space.
G_x: weight matrix of the unknown parameters, or matrix of the metric of the parameter space X
G_y: weight matrix of the observations, or matrix of the metric of the observation space Y.
Being properly prepared, we are able to outline the optimal choice of the weight matrix G_y or X, also called the second order design problem, from a criterion matrix Y, an ideal weight matrix G_x(ideal) of the unknown parameters. We hope that the translation of G_x and G_y "from metric to weight" does not cause any confusion. Box 3.8 elegantly outlines SOD.
Box 3.8:
Second order design SOD, optimal fit to a criterion matrix of weights
"weight matrix of the parameter space"  versus  "weight matrix of the observation space"
Y := (1/2) ∂²L/∂x_ℓ∂x_ℓ′ = G_x  (3.28)
X := G_y = Diag(g₁^y, …, g_n^y)  (3.29)
x := [g₁^y, …, g_n^y]′
Δ := Y − A′XA,  vec Δ = vec Y − (A′ ⊗ A′) vec X = vec Y − (A′ ⊙ A′) x
vec Δ ∈ ℝ^{m²×1},  vec X ∈ ℝ^{n²×1},  (A′ ⊗ A′) ∈ ℝ^{m²×n²},  (A′ ⊙ A′) ∈ ℝ^{m²×n}
The optimal fit "A′XA to Y" is achieved by the Lagrangean ‖Δ‖² = min, the optimum of the Frobenius norm of the inconsistency matrix Δ. The vectorized form of the inconsistency matrix, vec Δ, leads us first to the matrix (A′ ⊗ A′), the Kronecker-Zehfuss product of A′, and second to the matrix (A′ ⊙ A′), the Khatri-Rao product of A′, as soon as we implement the diagonal matrix X. For a definition of the Kronecker-Zehfuss product as well as of the Khatri-Rao product and related laws we refer to Appendix A. The unknown weight vector x is LESS, if
[Figure: trilateration network with points P_α, P_β, P_δ and observed distances y₁ = 13.58 km, y₂ = 9.15 km, y₃ = 6.94 km]
G_y = Diag(0.511, 0.974, 0.515),
which leads us to the weight G_x = I₂ a posteriori. Note that the weights came out positive.
Box 3.9:
Example for a second order design problem, trilateration network
A = [−0.454 −0.891; −0.809 +0.588; +0.707 +0.707],  X = Diag(x₁, x₂, x₃),  Y = I₂
A′ Diag(x₁, x₂, x₃) A = I₂ = [1 0; 0 1]
"inconsistency Δ = 0"
(1st) 0.206 x₁ + 0.654 x₂ + 0.5 x₃ = 1
(2nd) 0.404 x₁ − 0.476 x₂ + 0.5 x₃ = 0
(3rd) 0.794 x₁ + 0.346 x₂ + 0.5 x₃ = 1
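Solving the three equations of Box 3.9 reproduces the weights quoted above (a sketch; the matrix entries are taken from the box):

```python
import numpy as np

# direction cosines of the three distance observations (Box 3.9)
A = np.array([[-0.454, -0.891],
              [-0.809, +0.588],
              [+0.707, +0.707]])

# the condition A' Diag(x) A = I_2, written out as three scalar equations
M = np.array([[0.206,  0.654, 0.5],
              [0.404, -0.476, 0.5],
              [0.794,  0.346, 0.5]])
rhs = np.array([1.0, 0.0, 1.0])
x = np.linalg.solve(M, rhs)   # weight vector (g_1^y, g_2^y, g_3^y)
```

Up to the three-decimal rounding of the direction cosines, the solution is x ≈ (0.511, 0.974, 0.515), and A′ Diag(x) A ≈ I₂ confirms G_x = I₂ a posteriori.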
Box 3.10:
Taylor-Karman structure of a homogeneous and isotropic tensor-valued two-point function, two-dimensional planar network
G_x = [g_{x₁x₁} g_{x₁y₁} g_{x₁x₂} g_{x₁y₂};
       g_{y₁x₁} g_{y₁y₁} g_{y₁x₂} g_{y₁y₂};
       g_{x₂x₁} g_{x₂y₁} g_{x₂x₂} g_{x₂y₂};
       g_{y₂x₁} g_{y₂y₁} g_{y₂x₂} g_{y₂y₂}] = G_x(x_α, x_β)
"Euclidean distance function of points P_α(x_α, y_α) and P_β(x_β, y_β)"
g_{j₁j₂} = f_m(s_αβ) δ_{j₁j₂} + [f_ℓ(s_αβ) − f_m(s_αβ)] [x_{j₁}(P_α) − x_{j₁}(P_β)][x_{j₂}(P_α) − x_{j₂}(P_β)] / s²_αβ  (3.36)
j₁, j₂ ∈ {1, 2},  (x_α, y_α) = (x₁, y₁),  (x_β, y_β) = (x₂, y₂).
y_{R(A)} = P_{R(A)} y,  y_{R(A)⊥} = P_{R(A)⊥} y,
where y_ℓ ∈ R(A) is an element of the range space R(A), in general the tangent space T_xM of the mapping f(x), versus i_ℓ ∈ R(A)⊥, an element of its orthogonal complement, in general the normal space R(A)⊥. The G_y-orthogonality ⟨y_ℓ | i_ℓ⟩_{G_y} = 0 is proven in Box 3.11.
Box 3.11:
G_y-orthogonality of y_ℓ = y(LESS) and i_ℓ = i(LESS)
"G_y-orthogonality"
⟨y_ℓ | i_ℓ⟩_{G_y} = 0  (3.37)
⟨y_ℓ | i_ℓ⟩_{G_y} = y′[G_yA(A′G_yA)⁻¹A′] G_y [I_n − A(A′G_yA)⁻¹A′G_y] y =
= y′G_yA(A′G_yA)⁻¹A′G_y y − y′G_yA(A′G_yA)⁻¹A′G_yA(A′G_yA)⁻¹A′G_y y =
= 0.
There is an alternative interpretation of the equations of G_y-orthogonality ⟨i_ℓ | y_ℓ⟩_{G_y} = i_ℓ′G_y y_ℓ = 0 of i_ℓ and y_ℓ. First, replace i_ℓ = P_{R(A)⊥} y, where P_{R(A)⊥} is a characteristic projection matrix. Second, substitute y_ℓ = Ax_ℓ, where x_ℓ is G_y-LESS.
Box 3.12:
G_y-orthogonality of A and B
i_ℓ ∈ R(A)⊥,  dim R(A)⊥ = n − rk A = n − m
y_ℓ ∈ R(A),  dim R(A) = rk A = m
⟨i_ℓ | y_ℓ⟩_{G_y} = 0 ⟺ [I_n − A(A′G_yA)⁻¹A′G_y]′ G_yA = 0  (3.38)
rk[I_n − A(A′G_yA)⁻¹A′G_y] = n − rk A = n − m  (3.39)
[I_n − A(A′G_yA)⁻¹A′G_y] = [B, C]  (3.40)
B ∈ ℝ^{n×(n−m)},  C ∈ ℝ^{n×m},  rk B = n − m
⟨i_ℓ | y_ℓ⟩_{G_y} = 0 ⟺ B′G_yA = 0.  (3.41)
[h_αβ; h_βγ; h_γα] = [1 0; 0 1; −1 −1] [h_αβ; h_βγ] + [i_αβ; i_βγ; i_γα]
y := [h_αβ; h_βγ; h_γα],  A := [1 0; 0 1; −1 −1],  x := [h_αβ; h_βγ]
y ∈ ℝ^{3×1},  A ∈ ℝ^{3×2},  rk A = 2,  x ∈ ℝ^{2×1}.
ª y1 º
ª 2 1 1º « »
x A = A A y = ( AcA) 1 Acy = 13 « » « y2 »
¬ 1 2 1¼ « »
¬ y3 ¼
ª hDE º ª 2 y y2 y3 º
xA = « » = 13 « 1 »
h
¬ EJ ¼ A ¬ y1 + 2 y2 y3 ¼
ª 2 1 1º
y A = AxA = AA y = A( A A) A y = 3 « 1 2 1»» y
A
c 1
c 1«
«¬ 1 1 2 »¼
ª 2 y1 y2 y3 º
y A = «« y1 + 2 y2 y3 »»
1
3
«¬ y1 y2 + 2 y3 »¼
( )
i A = y Ax A = I n AA A y = ª¬I n A ( A cA) 1 A cº¼ y
ª1 1 1º ª y1 + y2 + y3 º
i A = ««1 1 1»» y =
1
3
1
3
«y + y + y »
« 1 2 3»
«¬1 1 1»¼ «¬ y1 + y2 + y3 »¼
ª1 1 1º ª y1 º
|| i A ||2 = [ y1 , y2 , y3 ] 13 ««1 1 1»» «« y2 »»
«¬1 1 1»¼ «¬ y3 »¼
ª1 1 1º
G A := I n H y = I n AA = I n A ( A cA ) A c = ««1 1 1»» \ 3×3 ,
A
1 1
3
«¬1 1 1»¼
( )
det G A = det I n AA A = 0, rk(I n AA A ) = n m = 1
ª1 1 1º
« »
G A = ª¬I 3 AA º¼ = [ B, C] = «1 1 1»
A
1
3
«¬1 1 1»¼
B \ 3×1 , C \ 3× 2 .
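The leveling example can be cross-checked numerically. The short sketch below (assuming NumPy, which the book itself does not use) recomputes $\mathbf H_x$, $\mathbf H_y$ and the rank-one projector $\mathbf G_A=\mathbf I_3-\mathbf H_y$ for the design matrix above:

```python
# Numerical cross-check of the leveling example (NumPy sketch, an assumption;
# the book works symbolically). We recompute H_x, the hat matrix H_y, and the
# complementary projector G_A = I_3 - H_y = (1/3) * ones(3, 3).
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # leveling design matrix
Hx = np.linalg.inv(A.T @ A) @ A.T          # x_A = Hx y   (I_3-LESS)
Hy = A @ Hx                                # y_A = Hy y, projector onto R(A)
GA = np.eye(3) - Hy                        # projector onto R(A)-perp

y = np.array([1.0, 2.0, 4.0])              # sample height differences
xA, yA, iA = Hx @ y, Hy @ y, GA @ y
# orthogonality y_A' i_A = 0 and the rank-one structure of G_A
print(np.round(3 * Hx), np.round(3 * GA), float(yA @ iA))
```

Since $\mathbf G_y=\mathbf I_3$ here, the assertion $\mathbf y_A'\mathbf i_A=0$ reproduces (3.37) for this special case.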
$$
\begin{cases}
\mathbf G_y\mathbf A\mathbf L\mathbf A=\mathbf G_y\mathbf A\\
\mathbf G_y\mathbf A\mathbf L=(\mathbf G_y\mathbf A\mathbf L)'.
\end{cases}\tag{3.43}
$$

: Proof :
According to the theory of the generalized inverse presented in Appendix A, $\mathbf x_A=\mathbf L\mathbf y$ is $\mathbf G_y$-LESS of (3.1) if and only if $\mathbf A'\mathbf G_y\mathbf A\mathbf L=\mathbf A'\mathbf G_y$ is fulfilled. Indeed, $\mathbf A'\mathbf G_y\mathbf A\mathbf L=\mathbf A'\mathbf G_y$ is equivalent to the two conditions $\mathbf G_y\mathbf A\mathbf L\mathbf A=\mathbf G_y\mathbf A$ and $\mathbf G_y\mathbf A\mathbf L=(\mathbf G_y\mathbf A\mathbf L)'$. For a proof of such a statement, multiply $\mathbf A'\mathbf G_y\mathbf A\mathbf L=\mathbf A'\mathbf G_y$ on the left by $\mathbf L'$ and receive
$$\mathbf L'\mathbf A'\mathbf G_y\mathbf A\mathbf L=\mathbf L'\mathbf A'\mathbf G_y;$$
since the left-hand side is symmetric and $\mathbf L'\mathbf A'\mathbf G_y=(\mathbf G_y\mathbf A\mathbf L)'$, we are led to
$$(\mathbf G_y\mathbf A\mathbf L)'=\mathbf L'\mathbf A'\mathbf G_y\mathbf A\mathbf L=\mathbf G_y\mathbf A\mathbf L.$$

: Proof :
If $\mathbf G_y$ is positive definite, there exists the inverse matrix $\mathbf G_y^{-1}$. (3.43) can be transformed into the equivalent condition
$$\mathbf A'=\mathbf A'\mathbf L'\mathbf A'\quad\text{and}\quad \mathbf G_y^{-1}\mathbf L'\mathbf A'=(\mathbf G_y^{-1}\mathbf L'\mathbf A')'.$$
Box 3.13:
General bases versus orthonormal bases
spanning the parameter space $\mathbb X$ as well as the observation space $\mathbb Y$
132 3 The second problem of algebraic regression
"left": the "parameter space" versus "right": the "observation space"

$$\mathbf e_x=\mathbf V'\boldsymbol\Lambda_x^{-1/2}\mathbf a\tag{3.50}$$
versus
$$\mathbf e_y=\mathbf U'\boldsymbol\Lambda_y^{-1/2}\mathbf b\tag{3.51}$$

by introducing

- the eigenspace of the rectangular matrix $\mathbf A\in\mathbb R^{n\times m}$ of rank $r:=\operatorname{rk}\mathbf A=m$, $n>m$: $\mathbf A\mapsto\mathbf A^*$
- the left and right canonical coordinates: $\mathbf x\to\mathbf x^*$, $\mathbf y\to\mathbf y^*$

as supported by Box 3.14. The transformations $\mathbf x\mapsto\mathbf x^*$ (3.52), $\mathbf y\mapsto\mathbf y^*$ (3.53) from the original coordinates $(x_1,\dots,x_m)$ to the canonical coordinates $(x_1^*,\dots,x_m^*)$, the left star coordinates, as well as from the original coordinates $(y_1,\dots,y_n)$ to the canonical coordinates $(y_1^*,\dots,y_n^*)$, the right star coordinates, are polar decompositions: a rotation $\{\mathbf U,\mathbf V\}$ is followed by a general stretch $\{\mathbf G_y^{1/2},\mathbf G_x^{1/2}\}$. Those root matrices are generated by product decompositions of type $\mathbf G_y=(\mathbf G_y^{1/2})'\mathbf G_y^{1/2}$ as well as $\mathbf G_x=(\mathbf G_x^{1/2})'\mathbf G_x^{1/2}$. The canonical representation is based upon the left matrix $\mathbf L:=\mathbf G_y^{-1/2}\mathbf U$ (3.64) and the right matrix $\mathbf R:=\mathbf G_x^{-1/2}\mathbf V$ (3.65). Indeed, the left matrix $\mathbf L$ by means of $\mathbf L\mathbf L'=\mathbf G_y^{-1}$ (3.66) reconstructs the inverse matrix of the metric of the observation space $\mathbb Y$. Similarly, the right matrix $\mathbf R$ by means of $\mathbf R\mathbf R'=\mathbf G_x^{-1}$ (3.67) generates the inverse matrix of the metric of the parameter space $\mathbb X$. In terms of "$\mathbf L,\mathbf R$" we have summarized the eigenvalue decompositions (3.68)-(3.73). Such an eigenvalue decomposition helps us to canonically invert $\mathbf y^*=\mathbf A^*\mathbf x^*+\mathbf i^*$ by means of (3.74), (3.75), namely the rank partitioning of the canonical observation vector $\mathbf y^*$ into $\mathbf y_1^*\in\mathbb R^{r\times1}$ and $\mathbf y_2^*\in\mathbb R^{(n-r)\times1}$ to determine $\mathbf x_A^*=\boldsymbol\Lambda^{-1}\mathbf y_1^*$, leaving $\mathbf y_2^*$ "unrecognized". Next we shall prove $\mathbf i_1^*=\mathbf 0$ if $\mathbf i_1^*$ is LESS.
Box 3.14:
Canonical representation, overdetermined system of linear equations

"parameter space $\mathbb X$" versus "observation space $\mathbb Y$"
$$\mathbf x^*=\mathbf V'\mathbf G_x^{1/2}\mathbf x\tag{3.52}$$
versus
$$\mathbf y^*=\mathbf U'\mathbf G_y^{1/2}\mathbf y\tag{3.53}$$
and
$$\mathbf x=\mathbf G_x^{-1/2}\mathbf V\mathbf x^*\tag{3.54}$$
versus
$$\mathbf y=\mathbf G_y^{-1/2}\mathbf U\mathbf y^*\tag{3.55}$$
$$\mathbf y^*=\mathbf U'\mathbf G_y^{1/2}\mathbf A\,\mathbf G_x^{-1/2}\mathbf V\,\mathbf x^*+\mathbf i^*\tag{3.58}$$
versus
$$\mathbf y=\mathbf G_y^{-1/2}\mathbf U\,\mathbf A^*\,\mathbf V'\mathbf G_x^{1/2}\mathbf x+\mathbf i\tag{3.59}$$

"dimension identities"
$$\boldsymbol\Lambda\in\mathbb R^{r\times r},\quad \mathbf U_1\in\mathbb R^{n\times r},\quad \mathbf 0\in\mathbb R^{(n-r)\times r},\quad \mathbf U_2\in\mathbb R^{n\times(n-r)},\quad \mathbf V\in\mathbb R^{r\times r},$$
$$r:=\operatorname{rk}\mathbf A=m,\quad n>m$$

"left eigenspace" versus "right eigenspace"
$$\mathbf L:=\mathbf G_y^{-1/2}\mathbf U,\quad \mathbf L^{-1}=\mathbf U'\mathbf G_y^{1/2}\tag{3.64}$$
versus
$$\mathbf R:=\mathbf G_x^{-1/2}\mathbf V,\quad \mathbf R^{-1}=\mathbf V'\mathbf G_x^{1/2}\tag{3.65}$$
$$\mathbf L_1:=\mathbf G_y^{-1/2}\mathbf U_1,\quad \mathbf L_2:=\mathbf G_y^{-1/2}\mathbf U_2,\qquad
\mathbf L^{-1}=\begin{bmatrix}\mathbf U_1'\\ \mathbf U_2'\end{bmatrix}\mathbf G_y^{1/2}$$
$$\mathbf A=[\mathbf L_1,\mathbf L_2]\,\mathbf A^*\,\mathbf R^{-1}\tag{3.70}$$
versus
$$\mathbf A^*=\begin{bmatrix}\boldsymbol\Lambda\\ \mathbf 0\end{bmatrix}=\mathbf L^{-1}\mathbf A\mathbf R\tag{3.71}$$
$$\begin{cases}\mathbf A\mathbf A^\#\mathbf L_1=\mathbf L_1\boldsymbol\Lambda^2\\ \mathbf A\mathbf A^\#\mathbf L_2=\mathbf 0\end{cases}\tag{3.72}$$
versus
$$\mathbf A^\#\mathbf A\mathbf R=\mathbf R\boldsymbol\Lambda^2\tag{3.73}$$

"overdetermined system of linear equations solved in canonical coordinates"
$$\mathbf y^*=\mathbf A^*\mathbf x^*+\mathbf i^*=\begin{bmatrix}\boldsymbol\Lambda\\ \mathbf 0\end{bmatrix}\mathbf x^*+\begin{bmatrix}\mathbf i_1^*\\ \mathbf i_2^*\end{bmatrix}=\begin{bmatrix}\mathbf y_1^*\\ \mathbf y_2^*\end{bmatrix}\tag{3.74}$$

Figure 3.5: Commutative diagram of coordinate transformations ($\mathbf x\mapsto\mathbf x^*=\mathbf V'\mathbf G_x^{1/2}\mathbf x$, $\mathbf y\mapsto\mathbf y^*=\mathbf U'\mathbf G_y^{1/2}\mathbf y$; $\mathbf x^*\mapsto\mathbf y^*\in\mathcal R(\mathbf A^*)\subset\mathbb Y$ by $\mathbf A^*$)
$$\mathbf A^\#\mathbf A\mathbf R=\mathbf R\boldsymbol\Lambda^2\qquad\text{versus}\qquad
\begin{cases}\mathbf A\mathbf A^\#\mathbf L_1=\mathbf L_1\boldsymbol\Lambda^2\\ \mathbf A\mathbf A^\#\mathbf L_2=\mathbf 0\end{cases}$$
subject to
$$\boldsymbol\Lambda^2=\begin{bmatrix}\lambda_1^2&\cdots&0\\ \vdots&\ddots&\vdots\\ 0&\cdots&\lambda_r^2\end{bmatrix},\qquad
\boldsymbol\Lambda=\operatorname{Diag}\bigl(+\sqrt{\lambda_1^2},\dots,+\sqrt{\lambda_r^2}\bigr).$$

(i) $\mathbf A^\#\mathbf A\mathbf R=\mathbf R\boldsymbol\Lambda^2$:
$$\mathbf A^\#\mathbf A\mathbf R=\mathbf G_x^{-1}\mathbf A'\mathbf G_y\mathbf A\,\mathbf G_x^{-1/2}\mathbf V
=\mathbf G_x^{-1/2}\mathbf V\,[\boldsymbol\Lambda,\mathbf 0']\begin{bmatrix}\boldsymbol\Lambda\\ \mathbf 0\end{bmatrix}
=\mathbf G_x^{-1/2}\mathbf V\boldsymbol\Lambda^2$$
$$\mathbf A^\#\mathbf A\mathbf R=\mathbf R\boldsymbol\Lambda^2.\ \blacksquare$$

(ii) $\mathbf A\mathbf A^\#\mathbf L_1=\mathbf L_1\boldsymbol\Lambda^2$, $\mathbf A\mathbf A^\#\mathbf L_2=\mathbf 0$:
$$\mathbf A\mathbf A^\#\mathbf L=\mathbf A\mathbf G_x^{-1}\mathbf A'\mathbf G_y\mathbf L
=[\mathbf L_1,\mathbf L_2]\begin{bmatrix}\boldsymbol\Lambda\\ \mathbf 0\end{bmatrix}[\boldsymbol\Lambda,\mathbf 0']
\begin{bmatrix}\mathbf U_1'\mathbf U_1&\mathbf U_1'\mathbf U_2\\ \mathbf U_2'\mathbf U_1&\mathbf U_2'\mathbf U_2\end{bmatrix}$$
$$\mathbf A\mathbf A^\#\mathbf L=[\mathbf L_1,\mathbf L_2]
\begin{bmatrix}\boldsymbol\Lambda^2&\mathbf 0'\\ \mathbf 0&\mathbf 0\end{bmatrix}
\begin{bmatrix}\mathbf I_r&\mathbf 0\\ \mathbf 0&\mathbf I_{n-r}\end{bmatrix}$$
$$\mathbf A\mathbf A^\#[\mathbf L_1,\mathbf L_2]=\bigl[\mathbf L_1\boldsymbol\Lambda^2,\mathbf 0\bigr],\qquad
\mathbf A\mathbf A^\#\mathbf L_1=\mathbf L_1\boldsymbol\Lambda^2,\quad \mathbf A\mathbf A^\#\mathbf L_2=\mathbf 0.\ \blacksquare$$

The pair of eigensystems $\{\mathbf A^\#\mathbf A\mathbf R=\mathbf R\boldsymbol\Lambda^2,\ \mathbf A\mathbf A^\#[\mathbf L_1,\mathbf L_2]=[\mathbf L_1\boldsymbol\Lambda^2,\mathbf 0]\}$ is unfortunately based upon the non-symmetric matrices $\mathbf A\mathbf A^\#=\mathbf A\mathbf G_x^{-1}\mathbf A'\mathbf G_y$ and $\mathbf A^\#\mathbf A=\mathbf G_x^{-1}\mathbf A'\mathbf G_y\mathbf A$, which make the left and right eigenspace analysis numerically more complex. It appears that we are forced to use the Arnoldi method rather than the more efficient Lanczos method used for symmetric matrices.
In this situation we look for an alternative. Actually, as soon as we substitute $\{\mathbf L,\mathbf R\}$ by $\{\mathbf G_y^{-1/2}\mathbf U,\mathbf G_x^{-1/2}\mathbf V\}$ into the pair of eigensystems and consequently multiply by the root matrices, we achieve a pair of eigensystems, identified in Corollary 3.10, relying on symmetric matrices. In addition, such a pair of eigensystems produces the canonical base, namely orthonormal eigencolumns:
$$\bigl|\mathbf G_y^{1/2}\mathbf A\mathbf G_x^{-1}\mathbf A'(\mathbf G_y^{1/2})'-\lambda_i^2\mathbf I_n\bigr|=0\tag{3.78}$$
versus
$$\bigl|(\mathbf G_x^{-1/2})'\mathbf A'\mathbf G_y\mathbf A\mathbf G_x^{-1/2}-\lambda_j^2\mathbf I_r\bigr|=0\tag{3.79}$$
is based upon symmetric matrices. The left and right eigencolumns are orthogonal.

Then the rank partitioning of $\mathbf y^*=\bigl[(\mathbf y_1^*)',(\mathbf y_2^*)'\bigr]'$ leads to the canonical unknown vector
$$\mathbf x_A^*=\bigl[\boldsymbol\Lambda^{-1},\mathbf 0\bigr]\begin{bmatrix}\mathbf y_1^*\\ \mathbf y_2^*\end{bmatrix}=\boldsymbol\Lambda^{-1}\mathbf y_1^*\tag{3.80}$$
$$\mathbf y^*=\begin{bmatrix}\mathbf y_1^*\\ \mathbf y_2^*\end{bmatrix},\qquad \mathbf y_1^*\in\mathbb R^{r\times1},\quad \mathbf y_2^*\in\mathbb R^{(n-r)\times1}\tag{3.81}$$
and to the canonical vector of inconsistency
$$\mathbf i_A^*=\begin{bmatrix}\mathbf i_1^*\\ \mathbf i_2^*\end{bmatrix}_A:=\begin{bmatrix}\mathbf y_1^*\\ \mathbf y_2^*\end{bmatrix}_A-\begin{bmatrix}\boldsymbol\Lambda\\ \mathbf 0\end{bmatrix}\boldsymbol\Lambda^{-1}\mathbf y_1^*
\quad\text{or}\quad
\begin{cases}\mathbf i_1^*=\mathbf 0\\ \mathbf i_2^*=\mathbf y_2^*.\end{cases}\tag{3.82}$$
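For the unweighted special case $\mathbf G_x=\mathbf I_m$, $\mathbf G_y=\mathbf I_n$ the canonical machinery reduces to the ordinary singular value decomposition, and the canonical inversion $\mathbf x_A^*=\boldsymbol\Lambda^{-1}\mathbf y_1^*$ can be sketched directly (an assumption made to keep the sketch short; the weighted case would insert the root matrices):

```python
# Canonical LESS via the SVD, sketched for G_x = I_m, G_y = I_n (assumption):
# then L = U, R = V, y* = U'y, and x*_A = Lambda^{-1} y1* reproduces LESS.
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 4.0])
U, s, Vt = np.linalg.svd(A)                # A = U [Lambda; 0] V'
r = len(s)                                 # r = rk A = m = 2

y_star = U.T @ y                           # canonical observations y* = U'y
x_star = y_star[:r] / s                    # x*_A = Lambda^{-1} y1*
x_can = Vt.T @ x_star                      # back-transform x = V x*
x_less = np.linalg.lstsq(A, y, rcond=None)[0]
print(np.round(x_can, 6), np.round(x_less, 6))

# i1* = 0 by construction; i2* = y2* is the "unrecognized" part
i_star = y_star - np.concatenate([s * x_star, np.zeros(len(y) - r)])
```

The agreement of `x_can` and `x_less` is exactly the statement that canonical LESS and ordinary LESS coincide.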
$$\left.\begin{aligned}
&\mathbf x_A=(\mathbf A'\mathbf G_y\mathbf A)^{-1}\mathbf A'\mathbf G_y\mathbf y\\
&\mathbf A=\mathbf G_y^{-1/2}[\mathbf U_1,\mathbf U_2]\begin{bmatrix}\boldsymbol\Lambda\\ \mathbf 0\end{bmatrix}\mathbf V'\mathbf G_x^{1/2}
\end{aligned}\right\}$$
$$\mathbf A'\mathbf G_y\mathbf A=(\mathbf G_x^{1/2})'\mathbf V[\boldsymbol\Lambda,\mathbf 0]\begin{bmatrix}\mathbf U_1'\\ \mathbf U_2'\end{bmatrix}(\mathbf G_y^{-1/2})'\mathbf G_y\mathbf G_y^{-1/2}[\mathbf U_1,\mathbf U_2]\begin{bmatrix}\boldsymbol\Lambda\\ \mathbf 0\end{bmatrix}\mathbf V'\mathbf G_x^{1/2}=(\mathbf G_x^{1/2})'\mathbf V\boldsymbol\Lambda^2\mathbf V'\mathbf G_x^{1/2}$$
$$\mathbf x_A=\mathbf G_x^{-1/2}\mathbf V\boldsymbol\Lambda^{-2}\mathbf V'(\mathbf G_x^{-1/2})'(\mathbf G_x^{1/2})'\mathbf V[\boldsymbol\Lambda,\mathbf 0]\begin{bmatrix}\mathbf U_1'\\ \mathbf U_2'\end{bmatrix}(\mathbf G_y^{-1/2})'\mathbf G_y\mathbf y$$
$$\mathbf x_A=\mathbf G_x^{-1/2}\mathbf V\bigl[\boldsymbol\Lambda^{-1},\mathbf 0\bigr]\begin{bmatrix}\mathbf U_1'\\ \mathbf U_2'\end{bmatrix}\mathbf G_y^{1/2}\mathbf y$$
$$\mathbf x_A=\mathbf G_x^{-1/2}\mathbf V\boldsymbol\Lambda^{-1}\mathbf U_1'\mathbf G_y^{1/2}\mathbf y=\mathbf A_A^-\mathbf y$$
$$\mathbf A_A^-=\mathbf G_x^{-1/2}\mathbf V\boldsymbol\Lambda^{-1}\mathbf U_1'\mathbf G_y^{1/2}\in\mathbf A^{1,2,3}$$
$$\left.\begin{aligned}
&\mathbf x_A^*=\mathbf V'\mathbf G_x^{1/2}\mathbf x_A=\boldsymbol\Lambda^{-1}\mathbf U_1'\mathbf G_y^{1/2}\mathbf y=\bigl[\boldsymbol\Lambda^{-1},\mathbf 0\bigr]\begin{bmatrix}\mathbf U_1'\\ \mathbf U_2'\end{bmatrix}\mathbf G_y^{1/2}\mathbf y\\
&\mathbf y^*=\begin{bmatrix}\mathbf y_1^*\\ \mathbf y_2^*\end{bmatrix}=\begin{bmatrix}\mathbf U_1'\\ \mathbf U_2'\end{bmatrix}\mathbf G_y^{1/2}\mathbf y
\end{aligned}\right\}$$
$$\mathbf x_A^*=\bigl[\boldsymbol\Lambda^{-1},\mathbf 0\bigr]\begin{bmatrix}\mathbf y_1^*\\ \mathbf y_2^*\end{bmatrix}=\boldsymbol\Lambda^{-1}\mathbf y_1^*.$$

Thus we have proven the canonical inversion formula. The proof for the canonical representation of the vector of inconsistency is a consequence of the rank partitioning
$$\mathbf i_A^*=\begin{bmatrix}\mathbf i_1^*\\ \mathbf i_2^*\end{bmatrix}_A:=\begin{bmatrix}\mathbf y_1^*\\ \mathbf y_2^*\end{bmatrix}_A-\begin{bmatrix}\boldsymbol\Lambda\\ \mathbf 0\end{bmatrix}\mathbf x_A^*,\qquad
\mathbf i_1^*,\mathbf y_1^*\in\mathbb R^{r\times1},\quad \mathbf i_2^*,\mathbf y_2^*\in\mathbb R^{(n-r)\times1},$$
$$\mathbf i_A^*=\begin{bmatrix}\mathbf i_1^*\\ \mathbf i_2^*\end{bmatrix}_A=\begin{bmatrix}\mathbf y_1^*\\ \mathbf y_2^*\end{bmatrix}_A-\begin{bmatrix}\boldsymbol\Lambda\\ \mathbf 0\end{bmatrix}\boldsymbol\Lambda^{-1}\mathbf y_1^*=\begin{bmatrix}\mathbf 0\\ \mathbf y_2^*\end{bmatrix}.\ \blacksquare$$

The important result of $\mathbf x_A^*$ based on the canonical $\mathbf G_y$-LESS of $\{\mathbf y^*=\mathbf A^*\mathbf x^*+\mathbf i^*\mid \mathbf A^*\in\mathbb R^{n\times m},\ \operatorname{rk}\mathbf A^*=\operatorname{rk}\mathbf A=m,\ n>m\}$ needs a comment. The rank partitioning of the canonical observation vector $\mathbf y^*$, namely $\mathbf y_1^*\in\mathbb R^r$, $\mathbf y_2^*\in\mathbb R^{n-r}$, again paved the way for an interpretation. First, we appreciate the simple "direct inversion"
$$\mathbf x_A^*=\boldsymbol\Lambda^{-1}\mathbf y_1^*,\qquad \boldsymbol\Lambda=\operatorname{Diag}\bigl(+\sqrt{\lambda_1^2},\dots,+\sqrt{\lambda_r^2}\bigr),\quad\text{for instance}\quad
\begin{bmatrix}x_1^*\\ \vdots\\ x_m^*\end{bmatrix}_A=\begin{bmatrix}\lambda_1^{-1}y_1^*\\ \vdots\\ \lambda_r^{-1}y_r^*\end{bmatrix}.$$
Second, $\mathbf i_1^*=\mathbf 0$ eliminates all elements of the vector of canonical inconsistencies, for instance $[i_1^*,\dots,i_r^*]_A'=\mathbf 0$, while $\mathbf i_2^*=\mathbf y_2^*$ identifies the deficient elements of the vector of canonical inconsistencies with the vector of canonical observations, for instance $[i_{r+1}^*,\dots,i_n^*]_A'=[y_{r+1}^*,\dots,y_n^*]'$. Finally, enjoy the commutative diagram of Figure 3.6, illustrating our previously introduced transformations of type LESS and canonical LESS, by means of $\mathbf A_A^-$ and $(\mathbf A^*)_A^-$, respectively.

Figure 3.6: Commutative diagram of inverse coordinate transformations ($\mathbf y\mapsto\mathbf x_A=\mathbf A_A^-\mathbf y$; $\mathbf y^*\mapsto\mathbf x_A^*=(\mathbf A^*)_A^-\mathbf y^*$, connected by $\mathbf U'\mathbf G_y^{1/2}$ and $\mathbf V'\mathbf G_x^{1/2}$)
$$\mathbf y=\mathbf A\mathbf x+\mathbf i:\quad
\begin{bmatrix}1\\ 2\\ 4\end{bmatrix}=\begin{bmatrix}1&1\\ 1&2\\ 1&3\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix}+\begin{bmatrix}i_1\\ i_2\\ i_3\end{bmatrix},\qquad r:=\operatorname{rk}\mathbf A=2$$
$$\mathbf A\mathbf A'=\begin{bmatrix}2&3&4\\ 3&5&7\\ 4&7&10\end{bmatrix},\qquad \mathbf A'\mathbf A=\begin{bmatrix}3&6\\ 6&14\end{bmatrix}$$

eigenvalues
$$|\mathbf A\mathbf A'-\lambda_i^2\mathbf I_3|=0\qquad |\mathbf A'\mathbf A-\lambda_j^2\mathbf I_2|=0$$
$$\lambda_1^2=\frac{17}{2}+\frac12\sqrt{265},\qquad \lambda_2^2=\frac{17}{2}-\frac12\sqrt{265},\qquad \lambda_3^2=0$$

left eigencolumns versus right eigencolumns

(1st)
$$\begin{bmatrix}2-\lambda_1^2&3&4\\ 3&5-\lambda_1^2&7\\ 4&7&10-\lambda_1^2\end{bmatrix}\begin{bmatrix}u_{11}\\ u_{21}\\ u_{31}\end{bmatrix}=\mathbf 0
\qquad\text{versus}\qquad
\begin{bmatrix}3-\lambda_1^2&6\\ 6&14-\lambda_1^2\end{bmatrix}\begin{bmatrix}v_{11}\\ v_{21}\end{bmatrix}=\mathbf 0$$
subject to $u_{11}^2+u_{21}^2+u_{31}^2=1$ versus $v_{11}^2+v_{21}^2=1$
$$v_{11}^2=\frac{36}{36+(3-\lambda_1^2)^2}=\frac{72}{265+11\sqrt{265}},\qquad
v_{21}^2=\frac{(3-\lambda_1^2)^2}{36+(3-\lambda_1^2)^2}=\frac{193+11\sqrt{265}}{265+11\sqrt{265}}$$
$$\begin{bmatrix}u_{11}^2\\ u_{21}^2\\ u_{31}^2\end{bmatrix}
=\frac{1}{(1+4\lambda_1^2)^2+(2-7\lambda_1^2)^2+(1-7\lambda_1^2+\lambda_1^4)^2}
\begin{bmatrix}(1+4\lambda_1^2)^2\\ (2-7\lambda_1^2)^2\\ (1-7\lambda_1^2+\lambda_1^4)^2\end{bmatrix}
=\frac{2}{43725+2685\sqrt{265}}
\begin{bmatrix}\bigl(35+2\sqrt{265}\bigr)^2\\[0.5ex] \Bigl(\dfrac{115}{2}+\dfrac72\sqrt{265}\Bigr)^2\\[0.5ex] \bigl(80+5\sqrt{265}\bigr)^2\end{bmatrix}$$

(2nd)
$$\begin{bmatrix}2-\lambda_2^2&3&4\\ 3&5-\lambda_2^2&7\\ 4&7&10-\lambda_2^2\end{bmatrix}\begin{bmatrix}u_{12}\\ u_{22}\\ u_{32}\end{bmatrix}=\mathbf 0
\qquad\text{versus}\qquad
\begin{bmatrix}3-\lambda_2^2&6\\ 6&14-\lambda_2^2\end{bmatrix}\begin{bmatrix}v_{12}\\ v_{22}\end{bmatrix}=\mathbf 0$$
subject to $u_{12}^2+u_{22}^2+u_{32}^2=1$ versus $v_{12}^2+v_{22}^2=1$
$$v_{12}^2=\frac{36}{36+(3-\lambda_2^2)^2}=\frac{72}{265-11\sqrt{265}},\qquad
v_{22}^2=\frac{(3-\lambda_2^2)^2}{36+(3-\lambda_2^2)^2}=\frac{193-11\sqrt{265}}{265-11\sqrt{265}}$$
$$\begin{bmatrix}u_{12}^2\\ u_{22}^2\\ u_{32}^2\end{bmatrix}
=\frac{1}{(1+4\lambda_2^2)^2+(2-7\lambda_2^2)^2+(1-7\lambda_2^2+\lambda_2^4)^2}
\begin{bmatrix}(1+4\lambda_2^2)^2\\ (2-7\lambda_2^2)^2\\ (1-7\lambda_2^2+\lambda_2^4)^2\end{bmatrix}
=\frac{2}{43725-2685\sqrt{265}}
\begin{bmatrix}\bigl(35-2\sqrt{265}\bigr)^2\\[0.5ex] \Bigl(\dfrac{115}{2}-\dfrac72\sqrt{265}\Bigr)^2\\[0.5ex] \bigl(80-5\sqrt{265}\bigr)^2\end{bmatrix}$$

(3rd)
$$\begin{bmatrix}2&3&4\\ 3&5&7\\ 4&7&10\end{bmatrix}\begin{bmatrix}u_{13}\\ u_{23}\\ u_{33}\end{bmatrix}=\mathbf 0
\quad\text{subject to}\quad u_{13}^2+u_{23}^2+u_{33}^2=1$$
$$u_{13}^2=\frac16,\qquad u_{23}^2=\frac23,\qquad u_{33}^2=\frac16.$$

There are four combinatorial solutions to generate square roots,
$$\begin{bmatrix}u_{11}&u_{12}&u_{13}\\ u_{21}&u_{22}&u_{23}\\ u_{31}&u_{32}&u_{33}\end{bmatrix}
=\begin{bmatrix}\pm\sqrt{u_{11}^2}&\pm\sqrt{u_{12}^2}&\pm\sqrt{u_{13}^2}\\ \pm\sqrt{u_{21}^2}&\pm\sqrt{u_{22}^2}&\pm\sqrt{u_{23}^2}\\ \pm\sqrt{u_{31}^2}&\pm\sqrt{u_{32}^2}&\pm\sqrt{u_{33}^2}\end{bmatrix},\qquad
\begin{bmatrix}v_{11}&v_{12}\\ v_{21}&v_{22}\end{bmatrix}
=\begin{bmatrix}\pm\sqrt{v_{11}^2}&\pm\sqrt{v_{12}^2}\\ \pm\sqrt{v_{21}^2}&\pm\sqrt{v_{22}^2}\end{bmatrix}.$$
Here we have chosen the one with the positive sign exclusively. In summary, the eigenspace analysis gave the following result.
$$\boldsymbol\Lambda=\operatorname{Diag}\left(\sqrt{\frac{17+\sqrt{265}}{2}},\ \sqrt{\frac{17-\sqrt{265}}{2}}\right)$$
$$\mathbf U=\begin{bmatrix}
\sqrt2\,\dfrac{35+2\sqrt{265}}{\sqrt{43725+2685\sqrt{265}}}&\sqrt2\,\dfrac{35-2\sqrt{265}}{\sqrt{43725-2685\sqrt{265}}}&\dfrac{1}{\sqrt6}\\[1.2ex]
\sqrt2\,\dfrac{\tfrac{115}{2}+\tfrac72\sqrt{265}}{\sqrt{43725+2685\sqrt{265}}}&\sqrt2\,\dfrac{\tfrac{115}{2}-\tfrac72\sqrt{265}}{\sqrt{43725-2685\sqrt{265}}}&\dfrac{2}{\sqrt6}\\[1.2ex]
\sqrt2\,\dfrac{80+5\sqrt{265}}{\sqrt{43725+2685\sqrt{265}}}&\sqrt2\,\dfrac{80-5\sqrt{265}}{\sqrt{43725-2685\sqrt{265}}}&\dfrac{1}{\sqrt6}
\end{bmatrix}=[\mathbf U_1,\mathbf U_2]$$
$$\mathbf V=\begin{bmatrix}
\sqrt{\dfrac{72}{265+11\sqrt{265}}}&\sqrt{\dfrac{72}{265-11\sqrt{265}}}\\[1.2ex]
\sqrt{\dfrac{193+11\sqrt{265}}{265+11\sqrt{265}}}&\sqrt{\dfrac{193-11\sqrt{265}}{265-11\sqrt{265}}}
\end{bmatrix}.$$
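The closed-form eigenspace analysis above is easy to confirm numerically; the sketch below (NumPy assumed) checks the eigenvalues $(17\pm\sqrt{265})/2$ and the squared components of the null eigencolumn:

```python
# Numerical cross-check of the eigenspace analysis (NumPy sketch): the
# nonzero eigenvalues of AA' and A'A agree and equal (17 +/- sqrt(265))/2,
# and the null eigencolumn of AA' has squared components (1/6, 2/3, 1/6).
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
ev3 = np.sort(np.linalg.eigvalsh(A @ A.T))     # three eigenvalues of AA'
ev2 = np.sort(np.linalg.eigvalsh(A.T @ A))     # two eigenvalues of A'A

lam1 = (17 + np.sqrt(265)) / 2
lam2 = (17 - np.sqrt(265)) / 2
print(ev3, ev2, lam1, lam2)

# third left eigencolumn (lambda_3^2 = 0) spans the null space of A'
w, Q = np.linalg.eigh(A @ A.T)
u3 = Q[:, np.argmin(np.abs(w))]
print(np.round(u3**2, 6))                      # approx [1/6, 2/3, 1/6]
```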
3-3 Case study
$$y(t_i)=x_1+t_i\,x_2,\qquad i\in\{1,\dots,n\}.$$

Figure 3.7: Graph of the function $y(t)$ through the data points at $t_1=1$, $t_2=2$, $t_3=3$, $t_4=a$; high leverage point $t_4=a$

Box 3.15 summarizes the right eigenspace analysis of the hat matrix $\mathbf H_y:=\mathbf A(\mathbf A'\mathbf A)^{-1}\mathbf A'$. First, we have computed the spectrum of $\mathbf A'\mathbf A$ and $(\mathbf A'\mathbf A)^{-1}$ for the given matrix $\mathbf A\in\mathbb R^{4\times2}$, namely the eigenvalues squared $\lambda_{1,2}^2=59\pm\sqrt{3281}$. Note the leverage point $t_4=a=10$. Second, we computed the right eigencolumns $\mathbf v_1$ and $\mathbf v_2$, which constitute the orthonormal matrix $\mathbf V\in SO(2)$. Third, we take advantage of the sine-cosine representation (3.85) of $\mathbf V\in SO(2)$, the special orthonormal group over $\mathbb R^2$; indeed, we find the angular parameter $\gamma=81^\circ53'25.4''$. Fourth, we represent the hat matrix $\mathbf H_y$ in terms of the angular parameter, namely (3.86)-(3.89). In this way the general representation (3.90) is obtained, illustrated by four cases; (3.86) is a special case of the general angular representation (3.90) of the hat matrix $\mathbf H_y$. Fifth, we sum up the canonical representation $\mathbf A\mathbf V\boldsymbol\Lambda^{-2}\mathbf V'\mathbf A'$ (3.91) of the hat matrix $\mathbf H_y$, also called right eigenspace synthesis. Note the rank of the hat matrix, namely $\operatorname{rk}\mathbf H_y=\operatorname{rk}\mathbf A=m=2$, as well as the peculiar fourth adjusted observation
$$\hat y_4=y_4(\mathbf I\text{-LESS})=\frac{1}{100}\,(-11y_1+y_2+13y_3+97y_4),$$
which highlights the weight of the leverage point $t_4$. This analysis will be more pronounced if we go through the same type of right eigenspace synthesis for the leverage point $t_4=a$, $a\to\infty$, outlined in Box 3.18.
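The hat-matrix numbers quoted from Box 3.15 can be reproduced in a few lines (NumPy sketch, an addition to the text):

```python
# Numerical sketch of the hat-matrix analysis of Box 3.15: for the design
# matrix with leverage point t4 = 10 the hat matrix equals
# (1/100)[[43,37,31,-11],...], so y-hat_4 has weights (-11, 1, 13, 97)/100.
import numpy as np

t = np.array([1.0, 2.0, 3.0, 10.0])
A = np.column_stack([np.ones(4), t])           # y(t_i) = x1 + t_i x2
N = A.T @ A                                    # = [[4, 16], [16, 114]]
Hy = A @ np.linalg.inv(N) @ A.T

print(np.round(100 * Hy).astype(int))          # 100 * H_y, integer entries
print(np.round(100 * Hy[3]).astype(int))       # weights of y-hat_4

# eigenvalues of A'A: lambda^2 = 59 +/- sqrt(3281)
lam = np.sort(np.linalg.eigvalsh(N))
print(lam)
```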
Box 3.15
Right eigenspace analysis of a linear model of a univariate polynomial of degree one
- high leverage point a = 10 -

"Hat matrix $\mathbf H_y=\mathbf A(\mathbf A'\mathbf A)^{-1}\mathbf A'=\mathbf A\mathbf V\boldsymbol\Lambda^{-2}\mathbf V'\mathbf A'$"

right eigenspace analysis: $\mathbf A'\mathbf A\mathbf V=\mathbf V\boldsymbol\Lambda^2$ subject to $\mathbf V\mathbf V'=\mathbf I_2$
$$\mathbf A:=\begin{bmatrix}1&1\\ 1&2\\ 1&3\\ 1&10\end{bmatrix},\qquad
\mathbf A'\mathbf A=\begin{bmatrix}4&16\\ 16&114\end{bmatrix},\qquad
(\mathbf A'\mathbf A)^{-1}=\frac{1}{100}\begin{bmatrix}57&-8\\ -8&2\end{bmatrix}$$
$$\begin{vmatrix}4-\lambda^2&16\\ 16&114-\lambda^2\end{vmatrix}=0\;\Leftrightarrow\;\lambda^4-118\lambda^2+200=0$$
$$\lambda_{1,2}^2=59\pm\sqrt{3281}=59\pm57.28$$

right eigencolumn analysis: $(\mathbf A'\mathbf A-\lambda_j^2\mathbf I_2)\mathbf v_j=\mathbf 0$ subject to $\mathbf V\mathbf V'=\mathbf I_2$

(1st) $(\mathbf A'\mathbf A-\lambda_1^2\mathbf I_2)\begin{bmatrix}v_{11}\\ v_{21}\end{bmatrix}=\mathbf 0$ subject to $v_{11}^2+v_{21}^2=1$
$$v_{11}=+\sqrt{\frac{256}{256+(4-\lambda_1^2)^2}}=0.141,\qquad
v_{21}=+\sqrt{\frac{(4-\lambda_1^2)^2}{256+(4-\lambda_1^2)^2}}=0.990$$

(2nd) $(\mathbf A'\mathbf A-\lambda_2^2\mathbf I_2)\begin{bmatrix}v_{12}\\ v_{22}\end{bmatrix}=\mathbf 0$ subject to $v_{12}^2+v_{22}^2=1$
$$(4-\lambda_2^2)v_{12}+16v_{22}=0,\qquad v_{12}^2+v_{22}^2=1$$
$$v_{12}=-\sqrt{\frac{256}{256+(4-\lambda_2^2)^2}}=-0.990,\qquad
v_{22}=+\sqrt{\frac{(4-\lambda_2^2)^2}{256+(4-\lambda_2^2)^2}}=0.141$$
$$\mathbf V\in SO(2):=\{\mathbf V\in\mathbb R^{2\times2}\mid\mathbf V\mathbf V'=\mathbf I_2,\ |\mathbf V|=+1\},\qquad
\mathbf V=\begin{bmatrix}\cos\gamma&-\sin\gamma\\ \sin\gamma&\cos\gamma\end{bmatrix}\tag{3.85}$$
$$\gamma=81^\circ.890\,386=81^\circ53'25.4''$$

hat matrix $\mathbf H_y=\mathbf A(\mathbf A'\mathbf A)^{-1}\mathbf A'=\mathbf A\mathbf V\boldsymbol\Lambda^{-2}\mathbf V'\mathbf A'$
$$(\mathbf A'\mathbf A)^{-1}=\mathbf V\boldsymbol\Lambda^{-2}\mathbf V'
=\begin{bmatrix}\dfrac{1}{\lambda_1^2}\cos^2\gamma+\dfrac{1}{\lambda_2^2}\sin^2\gamma&\Bigl(\dfrac{1}{\lambda_1^2}-\dfrac{1}{\lambda_2^2}\Bigr)\sin\gamma\cos\gamma\\[1.2ex]
\Bigl(\dfrac{1}{\lambda_1^2}-\dfrac{1}{\lambda_2^2}\Bigr)\sin\gamma\cos\gamma&\dfrac{1}{\lambda_1^2}\sin^2\gamma+\dfrac{1}{\lambda_2^2}\cos^2\gamma\end{bmatrix}\tag{3.86}$$
$$(\mathbf A'\mathbf A)^{-1}_{j_1j_2}=\sum_{j_3=1}^{m=2}\frac{1}{\lambda_{j_3}^2}\cos\gamma_{j_1j_3}\cos\gamma_{j_2j_3}\tag{3.87}$$
subject to
$$\mathbf V\mathbf V'=\mathbf I_2\ \sim\ \sum_{j_3=1}^{m=2}\cos\gamma_{j_1j_3}\cos\gamma_{j_2j_3}=\delta_{j_1j_2}\tag{3.88}$$
$$\cos^2\gamma_{11}+\cos^2\gamma_{12}=1\qquad \cos\gamma_{11}\cos\gamma_{21}+\cos\gamma_{12}\cos\gamma_{22}=0$$
$$(\cos^2\gamma+\sin^2\gamma=1)\qquad(-\cos\gamma\sin\gamma+\sin\gamma\cos\gamma=0)$$
$$(\mathbf A'\mathbf A)^{-1}=\begin{bmatrix}\lambda_1^{-2}\cos^2\gamma_{11}+\lambda_2^{-2}\cos^2\gamma_{12}&\lambda_1^{-2}\cos\gamma_{11}\cos\gamma_{21}+\lambda_2^{-2}\cos\gamma_{12}\cos\gamma_{22}\\
\lambda_1^{-2}\cos\gamma_{21}\cos\gamma_{11}+\lambda_2^{-2}\cos\gamma_{22}\cos\gamma_{12}&\lambda_1^{-2}\cos^2\gamma_{21}+\lambda_2^{-2}\cos^2\gamma_{22}\end{bmatrix}\tag{3.89}$$
$$\mathbf H_y=\mathbf A\mathbf V\boldsymbol\Lambda^{-2}\mathbf V'\mathbf A'\ \sim\
h_{i_1i_2}=\sum_{j_1,j_2,j_3=1}^{m=2}a_{i_1j_1}a_{i_2j_2}\,\frac{1}{\lambda_{j_3}^2}\cos\gamma_{j_1j_3}\cos\gamma_{j_2j_3}\tag{3.90}$$
$$\mathbf A^\sim:=\mathbf A\mathbf V=\begin{bmatrix}1.131&-0.849\\ 2.121&-0.708\\ 3.111&-0.567\\ 10.041&0.420\end{bmatrix},\qquad
\boldsymbol\Lambda^{-2}=\operatorname{Diag}(8.60\times10^{-3},\ 0.581)$$
$$\mathbf H_y=\mathbf A^\sim\boldsymbol\Lambda^{-2}(\mathbf A^\sim)'
=\frac{1}{100}\begin{bmatrix}43&37&31&-11\\ 37&33&29&1\\ 31&29&27&13\\ -11&1&13&97\end{bmatrix}\tag{3.91}$$
$$\operatorname{rk}\mathbf H_y=\operatorname{rk}\mathbf A=m=2$$
$$\hat y_4=y_4(\mathbf I\text{-LESS})=\frac{1}{100}\,(-11y_1+y_2+13y_3+97y_4).$$
By means of Box 3.16 we repeat the right eigenspace analysis for one leverage point $t_4=a$, later on $a\to\infty$, for both the hat matrices $\mathbf H_x:=(\mathbf A'\mathbf A)^{-1}\mathbf A'$ and $\mathbf H_y:=\mathbf A(\mathbf A'\mathbf A)^{-1}\mathbf A'$. First, $\mathbf H_x$ is the linear operator producing $\hat{\mathbf x}=\mathbf x_A(\mathbf I\text{-LESS})$. Second, $\mathbf H_y$ as linear operator generates $\hat{\mathbf y}=\mathbf y_A(\mathbf I\text{-LESS})$. Third, the complementary operator $\mathbf I_4-\mathbf H_y=:\mathbf R$ as the matrix of partial redundancies leads us to the inconsistency vector $\hat{\mathbf i}=\mathbf i_A(\mathbf I\text{-LESS})$. The structure of the redundancy matrix $\mathbf R$, $\operatorname{rk}\mathbf R=n-m$, is most remarkable; its diagonal elements will be interpreted shortly. Fourth, we have computed the length of the inconsistency vector $\|\hat{\mathbf i}\|^2$, the quadratic form $\mathbf y'\mathbf R\mathbf y$.

The highlight of the analysis of hat matrices is set by computing

1st: $\mathbf H_x(a\to\infty)$ versus 2nd: $\mathbf H_y(a\to\infty)$

for the "highest leverage point" $a\to\infty$, reviewed in detail in Box 3.17. Please notice the two unknowns $\hat x_1$ and $\hat x_2$ as best approximations of type $\mathbf I$-LESS: $\hat x_1$ results in the arithmetic mean of the first three measurements; the point $y_4$, $t_4=a\to\infty$, has no influence at all, and $\hat x_2=0$ is found. The hat matrix $\mathbf H_y(a\to\infty)$ has produced partial hats $h_{11}=h_{22}=h_{33}=1/3$, but $h_{44}=1$ if $a\to\infty$. The best approximations of the $\mathbf I$-LESS observations are $\hat y_1=\hat y_2=\hat y_3$, the arithmetic mean of the first three observations, but $\hat y_4=y_4$ is a reproduction of the fourth observation. Similarly, the redundancy matrix $\mathbf R(a\to\infty)$ produces the weighted means $\hat i_1$, $\hat i_2$ and $\hat i_3$. The partial redundancies $r_{11}=r_{22}=r_{33}=2/3$, $r_{44}=0$ sum up to $r_{11}+r_{22}+r_{33}+r_{44}=n-m=2$. Notice the value $\hat i_4=0$: the observation indexed four is left uncontrolled.
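These limits can be checked numerically with a large but finite leverage point standing in for $a\to\infty$ (NumPy sketch, an addition to the text):

```python
# Numerical check of the a -> infinity limits: the partial hats approach
# (1/3, 1/3, 1/3, 1) and the partial redundancies (2/3, 2/3, 2/3, 0),
# with trace R = n - m = 2 for every finite a.
import numpy as np

def hat(a):
    A = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, a])])
    return A @ np.linalg.inv(A.T @ A) @ A.T

Hy = hat(1e5)                     # "a -> infinity" surrogate
R = np.eye(4) - Hy                # redundancy matrix

print(np.round(np.diag(Hy), 4))  # approx [1/3, 1/3, 1/3, 1]
print(np.round(np.diag(R), 4))   # approx [2/3, 2/3, 2/3, 0]
print(round(float(np.trace(R)), 6))
```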
Box 3.16
The linear model of a univariate polynomial of degree one
- one high leverage point -

$$\mathbf y=\mathbf A\mathbf x+\mathbf i\ \sim\
\begin{bmatrix}y_1\\ y_2\\ y_3\\ y_4\end{bmatrix}=\begin{bmatrix}1&1\\ 1&2\\ 1&3\\ 1&a\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix}+\begin{bmatrix}i_1\\ i_2\\ i_3\\ i_4\end{bmatrix}$$
$$\mathbf x\in\mathbb R^2,\quad \mathbf y\in\mathbb R^4,\quad \mathbf A\in\mathbb R^{4\times2},\quad \operatorname{rk}\mathbf A=m=2$$
$$\mathbf H_x=\frac{1}{20-12a+3a^2}
\begin{bmatrix}8-a+a^2&2-2a+a^2&-4-3a+a^2&14-6a\\ -2-a&2-a&6-a&-6+3a\end{bmatrix}$$
$$\mathbf H_y=\frac{1}{20-12a+3a^2}
\begin{bmatrix}6-2a+a^2&4-3a+a^2&2-4a+a^2&8-3a\\
4-3a+a^2&6-4a+a^2&8-5a+a^2&2\\
2-4a+a^2&8-5a+a^2&14-6a+a^2&-4+3a\\
8-3a&2&-4+3a&14-12a+3a^2\end{bmatrix}$$

"redundancy": $n-\operatorname{rk}\mathbf A=n-m=2$
$$\mathbf R=\frac{1}{20-12a+3a^2}
\begin{bmatrix}14-10a+2a^2&-4+3a-a^2&-2+4a-a^2&-8+3a\\
-4+3a-a^2&14-8a+2a^2&-8+5a-a^2&-2\\
-2+4a-a^2&-8+5a-a^2&6-6a+2a^2&4-3a\\
-8+3a&-2&4-3a&6\end{bmatrix}$$

(1st) $\mathbf H_x(a\to\infty)$
$$\lim_{a\to\infty}\mathbf H_x=\frac13\begin{bmatrix}1&1&1&0\\ 0&0&0&0\end{bmatrix}$$
$$\hat x_1=\frac13\,(y_1+y_2+y_3),\qquad \hat x_2=0$$

(2nd) $\mathbf H_y(a\to\infty)$
$$\mathbf H_y=\frac{1}{\dfrac{20}{a^2}-\dfrac{12}{a}+3}
\begin{bmatrix}
\dfrac{6}{a^2}-\dfrac{2}{a}+1&\dfrac{4}{a^2}-\dfrac{3}{a}+1&\dfrac{2}{a^2}-\dfrac{4}{a}+1&\dfrac{8}{a^2}-\dfrac{3}{a}\\[1ex]
\dfrac{4}{a^2}-\dfrac{3}{a}+1&\dfrac{6}{a^2}-\dfrac{4}{a}+1&\dfrac{8}{a^2}-\dfrac{5}{a}+1&\dfrac{2}{a^2}\\[1ex]
\dfrac{2}{a^2}-\dfrac{4}{a}+1&\dfrac{8}{a^2}-\dfrac{5}{a}+1&\dfrac{14}{a^2}-\dfrac{6}{a}+1&-\dfrac{4}{a^2}+\dfrac{3}{a}\\[1ex]
\dfrac{8}{a^2}-\dfrac{3}{a}&\dfrac{2}{a^2}&-\dfrac{4}{a^2}+\dfrac{3}{a}&\dfrac{14}{a^2}-\dfrac{12}{a}+3
\end{bmatrix}$$
$$\lim_{a\to\infty}\mathbf H_y=\frac13\begin{bmatrix}1&1&1&0\\ 1&1&1&0\\ 1&1&1&0\\ 0&0&0&3\end{bmatrix},\qquad \lim_{a\to\infty}h_{44}=1$$
$$\hat y_1=\hat y_2=\hat y_3=\frac13\,(y_1+y_2+y_3),\qquad \hat y_4=y_4$$

(3rd) $\mathbf R(a\to\infty)$
$$\mathbf R=\frac{1}{\dfrac{20}{a^2}-\dfrac{12}{a}+3}
\begin{bmatrix}
\dfrac{14}{a^2}-\dfrac{10}{a}+2&-\dfrac{4}{a^2}+\dfrac{3}{a}-1&-\dfrac{2}{a^2}+\dfrac{4}{a}-1&-\dfrac{8}{a^2}+\dfrac{3}{a}\\[1ex]
-\dfrac{4}{a^2}+\dfrac{3}{a}-1&\dfrac{14}{a^2}-\dfrac{8}{a}+2&-\dfrac{8}{a^2}+\dfrac{5}{a}-1&-\dfrac{2}{a^2}\\[1ex]
-\dfrac{2}{a^2}+\dfrac{4}{a}-1&-\dfrac{8}{a^2}+\dfrac{5}{a}-1&\dfrac{6}{a^2}-\dfrac{6}{a}+2&\dfrac{4}{a^2}-\dfrac{3}{a}\\[1ex]
-\dfrac{8}{a^2}+\dfrac{3}{a}&-\dfrac{2}{a^2}&\dfrac{4}{a^2}-\dfrac{3}{a}&\dfrac{6}{a^2}
\end{bmatrix}$$
$$\lim_{a\to\infty}\mathbf R(a)=\frac13\begin{bmatrix}2&-1&-1&0\\ -1&2&-1&0\\ -1&-1&2&0\\ 0&0&0&0\end{bmatrix}.$$
$$\hat i_1=\frac13\,(2y_1-y_2-y_3),\quad \hat i_2=\frac13\,(-y_1+2y_2-y_3),\quad \hat i_3=\frac13\,(-y_1-y_2+2y_3),\quad \hat i_4=0$$
$$\lim_{a\to\infty}\|\hat{\mathbf i}(a)\|^2=\frac13\,\mathbf y'\begin{bmatrix}2&-1&-1&0\\ -1&2&-1&0\\ -1&-1&2&0\\ 0&0&0&0\end{bmatrix}\mathbf y$$
$$\lim_{a\to\infty}\|\hat{\mathbf i}(a)\|^2=\frac13\,\bigl(2y_1^2+2y_2^2+2y_3^2-2y_1y_2-2y_2y_3-2y_3y_1\bigr).$$
A fascinating result is achieved upon analyzing the right eigenspace of the hat matrix $\mathbf H_y(a\to\infty)$. First, we computed the spectrum of the matrices $\mathbf A'\mathbf A$ and $(\mathbf A'\mathbf A)^{-1}$. Second, we proved $\lambda_1^2(a\to\infty)=\infty$, $\lambda_2^2(a\to\infty)=3$, or $\lambda_1^{-2}(a\to\infty)=0$, $\lambda_2^{-2}(a\to\infty)=1/3$.
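The limit behaviour of the spectrum is easy to evaluate from the closed form of Box 3.18 (a plain-Python sketch; the stabilized form $\lambda_2^2=\det/\lambda_1^2$ avoids cancellation for large $a$):

```python
# Closed-form spectrum of A'A = [[4, 6+a], [6+a, 14+a^2]]:
# lambda_{1,2}^2 = 9 + a^2/2 +/- sqrt(61 + 12a + 6a^2 + a^4/4);
# numerically, lambda_2^2 is computed as det/lambda_1^2 for stability.
import math

def spectrum(a):
    tr = 18 + a * a                      # tr A'A = 18 + a^2
    det = 20 - 12 * a + 3 * a * a        # det A'A = 20 - 12a + 3a^2
    lam1 = tr / 2 + math.sqrt(tr * tr / 4 - det)
    lam2 = det / lam1                    # stable form of tr/2 - sqrt(...)
    return lam1, lam2

for a in (10.0, 1e3, 1e6):
    print(a, spectrum(a))                # lambda_2^2 approaches 3
```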
Box 3.18
Right eigenspace analysis of a linear model of a univariate polynomial of degree one
- extreme leverage point a → ∞ -

"Hat matrix $\mathbf H_y=\mathbf A(\mathbf A'\mathbf A)^{-1}\mathbf A'$"

right eigenspace analysis: $\mathbf A'\mathbf A\mathbf V=\mathbf V\boldsymbol\Lambda^2$ subject to $\mathbf V\mathbf V'=\mathbf I_m$
$$\begin{vmatrix}4-\lambda^2&6+a\\ 6+a&14+a^2-\lambda^2\end{vmatrix}=0\;\Leftrightarrow\;
\lambda^4-\lambda^2(18+a^2)+20-12a+3a^2=0$$
$$\lambda_{1,2}^2=\frac12\Bigl[\operatorname{tr}(\mathbf A'\mathbf A)\pm\sqrt{(\operatorname{tr}\mathbf A'\mathbf A)^2-4\det\mathbf A'\mathbf A}\Bigr]$$
$$\operatorname{tr}\mathbf A'\mathbf A=18+a^2,\qquad \det\mathbf A'\mathbf A=20-12a+3a^2$$
$$\lambda_{1,2}^2=9+\frac{a^2}{2}\pm\sqrt{61+12a+6a^2+\frac{a^4}{4}}$$
$$\operatorname{spec}(\mathbf A'\mathbf A)=\{\lambda_1^2,\lambda_2^2\}
=\left\{9+\frac{a^2}{2}+\sqrt{61+12a+6a^2+\frac{a^4}{4}},\
9+\frac{a^2}{2}-\sqrt{61+12a+6a^2+\frac{a^4}{4}}\right\}$$

"inverse spectrum"
$$\operatorname{spec}(\mathbf A'\mathbf A)=\{\lambda_1^2,\lambda_2^2\}\;\Rightarrow\;
\operatorname{spec}(\mathbf A'\mathbf A)^{-1}=\Bigl\{\frac{1}{\lambda_1^2},\frac{1}{\lambda_2^2}\Bigr\}$$
$$\frac{1}{\lambda_2^2}=\frac{9+\dfrac{a^2}{2}+\sqrt{61+12a+6a^2+\dfrac{a^4}{4}}}{20-12a+3a^2}
=\frac{\dfrac{9}{a^2}+\dfrac12+\sqrt{\dfrac{61}{a^4}+\dfrac{12}{a^3}+\dfrac{6}{a^2}+\dfrac14}}{\dfrac{20}{a^2}-\dfrac{12}{a}+3}
\;\Rightarrow\;\lim_{a\to\infty}\frac{1}{\lambda_2^2}=\frac13$$
$$\frac{1}{\lambda_1^2}=\frac{9+\dfrac{a^2}{2}-\sqrt{61+12a+6a^2+\dfrac{a^4}{4}}}{20-12a+3a^2}
\;\Rightarrow\;\lim_{a\to\infty}\frac{1}{\lambda_1^2}=0$$
$$\lim_{a\to\infty}\operatorname{spec}(\mathbf A'\mathbf A)(a)=\{\infty,3\}\qquad
\lim_{a\to\infty}\operatorname{spec}(\mathbf A'\mathbf A)^{-1}=\Bigl\{0,\frac13\Bigr\}$$
$$\left.\begin{aligned}\mathbf A'\mathbf A\mathbf V&=\mathbf V\boldsymbol\Lambda^2\\ \mathbf V\mathbf V'&=\mathbf I_m\end{aligned}\right\}
\;\Rightarrow\;\mathbf A'\mathbf A=\mathbf V\boldsymbol\Lambda^2\mathbf V',\qquad
(\mathbf A'\mathbf A)^{-1}=\mathbf V\boldsymbol\Lambda^{-2}\mathbf V'$$

"Hat matrix $\mathbf H_y=\mathbf A\mathbf V\boldsymbol\Lambda^{-2}\mathbf V'\mathbf A'$".
3-32 Multilinear algebra, “join” and ”meet”, the Hodge star operator
Before we can analyze the matrices “hat Hy” and “red R” in more detail, we
have to listen to an “intermezzo” entitled multilinear algebra, “join” and “meet”
as well as the Hodge star operator. The Hodge star operator will lay down the
foundation of “latent restrictions” within our linear model and of Grassmann
coordinates, also referred to as Plücker coordinates.
Box 3.19 summarizes the definitions of multilinear algebra, the relations "join" and "meet", denoted by "$\wedge$" and "$*$", respectively. In terms of orthonormal base vectors $\mathbf e_{i_1},\dots,\mathbf e_{i_m}$, we introduce by (3.94) the exterior product $\mathbf e_{i_1}\wedge\dots\wedge\mathbf e_{i_m}$, also known as "join", "skew product" or 1st Grassmann relation. Indeed, such an exterior product is antisymmetric, as defined by (3.95), (3.96), (3.97) and (3.98).
"antisymmetry":
$$\mathbf e_{i_1\dots i\dots j\dots i_m}=-\mathbf e_{i_1\dots j\dots i\dots i_m},\quad i\neq j\tag{3.95}$$
$$\mathbf e_{i_1}\wedge\dots\wedge\mathbf e_{j_k}\wedge\mathbf e_{j_{k+1}}\wedge\dots\wedge\mathbf e_{i_m}
=-\,\mathbf e_{i_1}\wedge\dots\wedge\mathbf e_{j_{k+1}}\wedge\mathbf e_{j_k}\wedge\dots\wedge\mathbf e_{i_m}\tag{3.96}$$
$$\mathbf e_{i_1\dots i\dots j\dots i_m}=\mathbf 0\quad\text{if } i=j\tag{3.97}$$
$$\mathbf e_{i_1}\wedge\dots\wedge\mathbf e_i\wedge\dots\wedge\mathbf e_j\wedge\dots\wedge\mathbf e_{i_m}=\mathbf 0\quad\text{if } i=j\tag{3.98}$$

Example: $\mathbf e_1\wedge\mathbf e_2=-\mathbf e_2\wedge\mathbf e_1$, or $\mathbf e_i\wedge\mathbf e_j=-\mathbf e_j\wedge\mathbf e_i$ for $i\neq j$
Example: $\mathbf e_1\wedge\mathbf e_1=\mathbf 0$, $\mathbf e_2\wedge\mathbf e_2=\mathbf 0$, or $\mathbf e_i\wedge\mathbf e_j=\mathbf 0$ for $i=j$

"meet": Hodge star operator, Hodge dualizer, 2nd Grassmann relation
$$*:\ \Lambda^m(\mathbb R^n)\to\Lambda^{n-m}(\mathbb R^n)\tag{3.99}$$
output: "meet"
$$*\mathbf X:=\frac{1}{m!\,(n-m)!}\sqrt{g}\ \mathbf e_{j_1}\wedge\dots\wedge\mathbf e_{j_{n-m}}\,\varepsilon_{i_1\dots i_m j_1\dots j_{n-m}}\,X^{i_1\dots i_m}\tag{3.101}$$

For our purposes, two examples of "Hodge's star" will be sufficient for the following analysis of latent restrictions in our linear model. In all detail, Box 3.20 illustrates "join and meet" for
$$*:\ \Lambda^2(\mathbb R^3)\to\Lambda^1(\mathbb R^3),$$
with, as their coordinates, the columns of the matrix $\mathbf A$ with respect to the orthonormal frame of reference $\{\mathbf e_1,\mathbf e_2,\mathbf e_3\,|\,0\}$ at the origin $0$:
$$\mathbf a\wedge\mathbf b=\sum_{i_1,i_2=1}^{n=3}\mathbf e_{i_1}\wedge\mathbf e_{i_2}\,a_{i_11}a_{i_22}\in\Lambda^2(\mathbb R^3),\qquad \binom{n}{m}=\binom{3}{2}=3$$
subdeterminants of $\mathbf A$. If the determinant of the matrix $\mathbf G=\mathbf I_3$ is $\det\mathbf G=1$, $g=1$, then, according to (3.106), (3.107),
$$*(\mathbf a\wedge\mathbf b)\in\mathcal R(\mathbf A)^\perp=\mathbb G_{1,3},\qquad
a_{i_11}=\operatorname{col}_1\mathbf A,\quad a_{i_22}=\operatorname{col}_2\mathbf A$$
$$\mathbf a\wedge\mathbf b=\frac{1}{2!}\sum_{i_1,i_2=1}^{n=3}\mathbf e_{i_1}\wedge\mathbf e_{i_2}\,a_{i_11}a_{i_22}\in\Lambda^2(\mathbb R^3)\tag{3.104}$$
"cyclic order"
$$\mathbf a\wedge\mathbf b=\mathbf e_2\wedge\mathbf e_3\,(a_{21}a_{32}-a_{31}a_{22})
+\mathbf e_3\wedge\mathbf e_1\,(a_{31}a_{12}-a_{11}a_{32})
+\mathbf e_1\wedge\mathbf e_2\,(a_{11}a_{22}-a_{21}a_{12})\in\mathbb G_{2,3}.\tag{3.105}$$
Output: "meet" ($g=1$, $\mathbf G_y=\mathbf I_3$, $m=2$, $n=3$, $n-m=1$)
$$*(\mathbf a\wedge\mathbf b)=\frac{1}{2!}\sum_{i_1,i_2,j=1}^{n=3}\mathbf e_j\,\varepsilon_{i_1i_2j}\,a_{i_11}a_{i_22}\tag{3.106}$$
$$*(\mathbf a\wedge\mathbf b)=\mathbf e_1\,(a_{21}a_{32}-a_{31}a_{22})
+\mathbf e_2\,(a_{31}a_{12}-a_{11}a_{32})
+\mathbf e_3\,(a_{11}a_{22}-a_{21}a_{12})\in\mathcal R(\mathbf A)^\perp=\mathbb G_{1,3}\tag{3.107}$$
$$\binom{n}{m}=\binom{3}{2}\ \text{subdeterminants of }\mathbf A:\ \text{Grassmann coordinates (Plücker coordinates)}$$
$$*(\mathbf e_2\wedge\mathbf e_3)=\mathbf e_1,\qquad
*(\mathbf e_3\wedge\mathbf e_1)=\mathbf e_2,\qquad
*(\mathbf e_1\wedge\mathbf e_2)=\mathbf e_3.\tag{3.108}$$
The second example treats $n=4$, $m=2$ with
$$\binom{n}{m}=\binom{4}{2}=6$$
subdeterminants of the matrix $\mathbf A$, namely
$$a_{11}a_{22}-a_{21}a_{12},\quad a_{11}a_{32}-a_{31}a_{12},\quad a_{11}a_{42}-a_{41}a_{12},$$
$$a_{21}a_{32}-a_{31}a_{22},\quad a_{21}a_{42}-a_{41}a_{22},\quad a_{31}a_{42}-a_{41}a_{32}.$$
Finally, (3.113), for instance $*(\mathbf e_1\wedge\mathbf e_2)=\mathbf e_3\wedge\mathbf e_4$, demonstrates the operations "$\wedge$, $*$", called "join and meet", indeed quite a generalization of the "cross product".

Box 3.21
The second example "join and meet"
$$*:\ \Lambda^2(\mathbb R^4)\to\Lambda^2(\mathbb R^4)$$
"selfdual"
Input: "join"
$$\mathbf a=\sum_{i_1=1}^{n=4}\mathbf e_{i_1}a_{i_11},\qquad \mathbf b=\sum_{i_2=1}^{n=4}\mathbf e_{i_2}a_{i_22}\tag{3.109}$$
$$\mathbf a\wedge\mathbf b=\frac{1}{2!}\sum_{i_1,i_2=1}^{n=4}\mathbf e_{i_1}\wedge\mathbf e_{i_2}\,a_{i_11}a_{i_22}\in\Lambda^2(\mathbb R^4)\tag{3.110}$$
"lexicographical order"
$$\begin{aligned}
\mathbf a\wedge\mathbf b=\ &\mathbf e_1\wedge\mathbf e_2\,(a_{11}a_{22}-a_{21}a_{12})
+\mathbf e_1\wedge\mathbf e_3\,(a_{11}a_{32}-a_{31}a_{12})
+\mathbf e_1\wedge\mathbf e_4\,(a_{11}a_{42}-a_{41}a_{12})\ +\\
+\ &\mathbf e_2\wedge\mathbf e_3\,(a_{21}a_{32}-a_{31}a_{22})
+\mathbf e_2\wedge\mathbf e_4\,(a_{21}a_{42}-a_{41}a_{22})
+\mathbf e_3\wedge\mathbf e_4\,(a_{31}a_{42}-a_{41}a_{32})\in\mathbb G_{2,4}
\end{aligned}\tag{3.111}$$
$$\binom{n}{m}=\binom{4}{2}\ \text{subdeterminants of }\mathbf A:\ \text{Grassmann coordinates (Plücker coordinates)}.$$

Output: "meet" ($g=1$, $\mathbf G_y=\mathbf I_4$, $m=2$, $n=4$, $n-m=2$)
$$*(\mathbf a\wedge\mathbf b)=\frac{1}{2!}\sum_{i_1,i_2,j_1,j_2=1}^{n=4}\mathbf e_{j_1}\wedge\mathbf e_{j_2}\,\varepsilon_{i_1i_2j_1j_2}\,\frac{1}{2!}\,a_{i_11}a_{i_22}$$
$$\begin{aligned}
*(\mathbf a\wedge\mathbf b)=\ &\mathbf e_3\wedge\mathbf e_4\,(a_{11}a_{22}-a_{21}a_{12})
-\mathbf e_2\wedge\mathbf e_4\,(a_{11}a_{32}-a_{31}a_{12})
+\mathbf e_2\wedge\mathbf e_3\,(a_{11}a_{42}-a_{41}a_{12})\ +\\
+\ &\mathbf e_1\wedge\mathbf e_4\,(a_{21}a_{32}-a_{31}a_{22})
-\mathbf e_1\wedge\mathbf e_3\,(a_{21}a_{42}-a_{41}a_{22})
+\mathbf e_1\wedge\mathbf e_2\,(a_{31}a_{42}-a_{41}a_{32})\in\mathbb G_{2,4}
\end{aligned}\tag{3.112}$$
$$\binom{n}{m}=\binom{4}{2}\ \text{subdeterminants of }\mathbf A:\ \text{Grassmann coordinates (Plücker coordinates)}.$$
$$*(\mathbf e_1\wedge\mathbf e_2)=\mathbf e_3\wedge\mathbf e_4,\qquad
*(\mathbf e_1\wedge\mathbf e_3)=-\,\mathbf e_2\wedge\mathbf e_4,\qquad
*(\mathbf e_1\wedge\mathbf e_4)=\mathbf e_2\wedge\mathbf e_3,$$
$$*(\mathbf e_2\wedge\mathbf e_3)=\mathbf e_1\wedge\mathbf e_4,\qquad
*(\mathbf e_2\wedge\mathbf e_4)=-\,\mathbf e_1\wedge\mathbf e_3,\qquad
*(\mathbf e_3\wedge\mathbf e_4)=\mathbf e_1\wedge\mathbf e_2.\tag{3.113}$$
In our case study we may say that we have eliminated the third observation, but kept the leverage point. First, let us go through the routine to compute the hat matrices $\mathbf H_x=(\mathbf A'\mathbf A)^{-1}\mathbf A'$ and $\mathbf H_y=\mathbf A(\mathbf A'\mathbf A)^{-1}\mathbf A'$, identified by (3.115) and (3.117). The corresponding estimations $\hat{\mathbf x}=\mathbf x_A(\mathbf I\text{-LESS})$, (3.116), and $\hat{\mathbf y}=\mathbf y_A(\mathbf I\text{-LESS})$, (3.118), prove the different weights of the observations $(y_1,y_2,y_3)$ influencing $\hat x_1$ and $\hat x_2$ as well as $(\hat y_1,\hat y_2,\hat y_3)$. Notice the great weight of the leverage point $t_3=10$ on $\hat y_3$.
Second, let us interpret the redundancy matrix $\mathbf R=\mathbf I_3-\mathbf A(\mathbf A'\mathbf A)^{-1}\mathbf A'$, in particular the diagonal elements,
$$r_{11}=\frac{(\mathbf A'\mathbf A)_{(1)}}{\det\mathbf A'\mathbf A}=\frac{64}{146},\qquad
r_{22}=\frac{(\mathbf A'\mathbf A)_{(2)}}{\det\mathbf A'\mathbf A}=\frac{81}{146},\qquad
r_{33}=\frac{(\mathbf A'\mathbf A)_{(3)}}{\det\mathbf A'\mathbf A}=\frac{1}{146},$$
$$\operatorname{tr}\mathbf R=\frac{1}{\det\mathbf A'\mathbf A}\sum_{i=1}^{n=3}(\mathbf A'\mathbf A)_{(i)}=n-\operatorname{rk}\mathbf A=n-m=1,$$
the degree of freedom of the $\mathbf I_3$-LESS problem. There, for the first time, we meet the subdeterminants $(\mathbf A'\mathbf A)_{(i)}$, which are generated in a two-step procedure: delete row $i$ of $\mathbf A$ to form $\mathbf A_{(i)}$, then take $\det(\mathbf A_{(i)}'\mathbf A_{(i)})$.

Example: $(\mathbf A'\mathbf A)_{(1)}$
$$\mathbf A_{(1)}=\begin{bmatrix}1&2\\ 1&10\end{bmatrix},\quad
\mathbf A_{(1)}'\mathbf A_{(1)}=\begin{bmatrix}2&12\\ 12&104\end{bmatrix},\quad
(\mathbf A'\mathbf A)_{(1)}=\det\mathbf A_{(1)}'\mathbf A_{(1)}=64,\quad \det\mathbf A'\mathbf A=146$$
Example: $(\mathbf A'\mathbf A)_{(2)}$
$$\mathbf A_{(2)}=\begin{bmatrix}1&1\\ 1&10\end{bmatrix},\quad
\mathbf A_{(2)}'\mathbf A_{(2)}=\begin{bmatrix}2&11\\ 11&101\end{bmatrix},\quad
(\mathbf A'\mathbf A)_{(2)}=\det\mathbf A_{(2)}'\mathbf A_{(2)}=81$$
Example: $(\mathbf A'\mathbf A)_{(3)}$
$$\mathbf A_{(3)}=\begin{bmatrix}1&1\\ 1&2\end{bmatrix},\quad
\mathbf A_{(3)}'\mathbf A_{(3)}=\begin{bmatrix}2&3\\ 3&5\end{bmatrix},\quad
(\mathbf A'\mathbf A)_{(3)}=\det\mathbf A_{(3)}'\mathbf A_{(3)}=1$$

Obviously, the partial redundancies $(r_{11},r_{22},r_{33})$ are associated with the influence of the observation $y_1$, $y_2$ or $y_3$ on the total degree of freedom. Here the observations $y_1$ and $y_2$ had the greatest contribution, the observation $y_3$ at a leverage point a very small influence.
The redundancy matrix $\mathbf R$, properly analyzed, will lead us to the latent restrictions or "from $\mathbf A$ to $\mathbf B$". Third, we introduce the rank partitioning $\mathbf R=[\mathbf B,\mathbf C]$, $\operatorname{rk}\mathbf R=\operatorname{rk}\mathbf B=n-m=1$, (3.120), of the matrix $\mathbf R$ of partial redundancies. Here, $\mathbf b\in\mathbb R^{3\times1}$, (3.121), is normalized to generate $\mathbf b^\circ=\mathbf b/\|\mathbf b\|$, (3.122). Note $\mathbf C\in\mathbb R^{3\times2}$ for the dimension identity. We already introduced the orthogonality conditions
$$\mathbf b'\mathbf A=\mathbf 0\quad\text{or}\quad \mathbf b'\mathbf A\mathbf x_A=\mathbf b'\hat{\mathbf y}=0,$$
$$(\mathbf b^\circ)'\mathbf A=\mathbf 0\quad\text{or}\quad (\mathbf b^\circ)'\mathbf A\mathbf x_A=(\mathbf b^\circ)'\hat{\mathbf y}=0.$$
Fourth, the observation space is spanned by the columns of the matrix $\mathbf A$:
$$\mathbf y=[\mathbf e_1^y,\mathbf e_2^y,\mathbf e_3^y]\begin{bmatrix}a_{11}x_1+a_{12}x_2\\ a_{21}x_1+a_{22}x_2\\ a_{31}x_1+a_{32}x_2\end{bmatrix}=\sum_{i=1}^{n=3}\sum_{j=1}^{m=2}\mathbf e_i^y\,a_{ij}x_j,$$
$$\frac{\partial\mathbf y}{\partial x_1}(x_1,x_2)=[\mathbf e_1^y,\mathbf e_2^y,\mathbf e_3^y]\begin{bmatrix}a_{11}\\ a_{21}\\ a_{31}\end{bmatrix}=\sum_{i=1}^{n=3}\mathbf e_i^y\,a_{i1},$$
$$\frac{\partial\mathbf y}{\partial x_2}(x_1,x_2)=[\mathbf e_1^y,\mathbf e_2^y,\mathbf e_3^y]\begin{bmatrix}a_{12}\\ a_{22}\\ a_{32}\end{bmatrix}=\sum_{i=1}^{n=3}\mathbf e_i^y\,a_{i2};$$
in particular,
$$\mathbf t_1:=\frac{\partial\mathbf y}{\partial x_1}=[\mathbf e_1^y,\mathbf e_2^y,\mathbf e_3^y]\begin{bmatrix}1\\ 1\\ 1\end{bmatrix},\qquad
\mathbf t_2:=\frac{\partial\mathbf y}{\partial x_2}=[\mathbf e_1^y,\mathbf e_2^y,\mathbf e_3^y]\begin{bmatrix}1\\ 2\\ 10\end{bmatrix}.$$
Indeed, the columns of the matrix $\mathbf A$ lay the foundation of GRASSMANN ($\mathbf A$). Fifth, let us turn to GRASSMANN ($\mathbf B$), which is based on the normal space $\mathcal R(\mathbf A)^\perp$. The normal vector $\mathbf n=\mathbf t_1\times\mathbf t_2=*(\mathbf t_1\wedge\mathbf t_2)$ which spans GRASSMANN ($\mathbf B$) is defined by the "cross product", identified by "$\wedge$, $*$", the skew product symbol as well as the Hodge star symbol. Alternatively, we are able to represent the normal vector $\mathbf n$, (3.130), (3.132), (3.133), constituted by the columns $\{\operatorname{col}_1\mathbf A,\operatorname{col}_2\mathbf A\}$ of the matrix, in terms of the Grassmann coordinates (Plücker coordinates)
$$p_{23}=\begin{vmatrix}a_{21}&a_{22}\\ a_{31}&a_{32}\end{vmatrix}=8,\qquad
p_{31}=\begin{vmatrix}a_{31}&a_{32}\\ a_{11}&a_{12}\end{vmatrix}=-9,\qquad
p_{12}=\begin{vmatrix}a_{11}&a_{12}\\ a_{21}&a_{22}\end{vmatrix}=1,$$
of $*(\mathbf t_1\wedge\mathbf t_2)=*\sum_{i_1,i_2=1}^{3}(\mathbf e_{i_1}\wedge\mathbf e_{i_2})\,a_{i_11}a_{i_22}$.

"The vector $\mathbf b^\circ$ which constitutes the latent restriction (latent condition equation) coincides with the normalized normal vector $\mathbf n^\circ\in\mathcal R(\mathbf A)^\perp$, an element of the space $\mathcal R(\mathbf A)^\perp$ which is normal to the column space $\mathcal R(\mathbf A)$ of the matrix $\mathbf A$. The vector $\mathbf b^\circ$ is built on the Grassmann coordinates (Plücker coordinates) $[p_{23},p_{31},p_{12}]'$, the subdeterminants of $\mathbf A$, in agreement with the normalized Grassmann vector $\mathbf g^\circ$."
Box 3.22
Latent restrictions
Grassmann coordinates (Plücker coordinates)
the second example

$$\begin{bmatrix}y_1\\ y_2\\ y_3\end{bmatrix}=\begin{bmatrix}1&1\\ 1&2\\ 1&10\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix},\qquad
\mathbf A=\begin{bmatrix}1&1\\ 1&2\\ 1&10\end{bmatrix},\quad \operatorname{rk}\mathbf A=2$$

(1st) $\mathbf H_x=(\mathbf A'\mathbf A)^{-1}\mathbf A'$
$$\mathbf H_x=(\mathbf A'\mathbf A)^{-1}\mathbf A'=\frac{1}{146}\begin{bmatrix}92&79&-25\\ -10&-7&17\end{bmatrix}\tag{3.115}$$
$$\hat{\mathbf x}=\mathbf x_A(\mathbf I\text{-LESS})=\frac{1}{146}\begin{bmatrix}92y_1+79y_2-25y_3\\ -10y_1-7y_2+17y_3\end{bmatrix}\tag{3.116}$$

(2nd) $\mathbf H_y=\mathbf A(\mathbf A'\mathbf A)^{-1}\mathbf A'$
$$\mathbf H_y=\mathbf A(\mathbf A'\mathbf A)^{-1}\mathbf A'=\frac{1}{146}\begin{bmatrix}82&72&-8\\ 72&65&9\\ -8&9&145\end{bmatrix},\qquad \operatorname{rk}\mathbf H_y=\operatorname{rk}\mathbf A=2\tag{3.117}$$
$$\hat{\mathbf y}=\mathbf y_A(\mathbf I\text{-LESS})=\frac{1}{146}\begin{bmatrix}82y_1+72y_2-8y_3\\ 72y_1+65y_2+9y_3\\ -8y_1+9y_2+145y_3\end{bmatrix}\tag{3.118}$$
$$\hat y_3=\frac{1}{146}\,(-8y_1+9y_2+145y_3)\tag{3.119}$$
(3rd) $\mathbf R=\mathbf I_3-\mathbf A(\mathbf A'\mathbf A)^{-1}\mathbf A'$
$$\mathbf R=\mathbf I_3-\mathbf A(\mathbf A'\mathbf A)^{-1}\mathbf A'=\frac{1}{146}\begin{bmatrix}64&-72&8\\ -72&81&-9\\ 8&-9&1\end{bmatrix}\tag{3.120}$$
$$\mathbf R=[\mathbf B,\mathbf C],\qquad \operatorname{rk}\mathbf R=1$$
$$\mathbf b:=\frac{1}{146}\begin{bmatrix}64\\ -72\\ 8\end{bmatrix}=\begin{bmatrix}0.438\\ -0.493\\ 0.055\end{bmatrix}\tag{3.121}$$
$$\mathbf b^\circ:=\frac{\mathbf b}{\|\mathbf b\|}=\frac{1}{\sqrt{146}}\begin{bmatrix}8\\ -9\\ 1\end{bmatrix}=\begin{bmatrix}0.662\\ -0.745\\ 0.083\end{bmatrix}\tag{3.122}$$
The latent restriction reads
$$8\hat y_1-9\hat y_2+\hat y_3=0.\tag{3.127}$$
"the first tangent vector":
$$\mathbf t_1:=\frac{\partial\mathbf y}{\partial x_1}=[\mathbf e_1^y,\mathbf e_2^y,\mathbf e_3^y]\begin{bmatrix}1\\ 1\\ 1\end{bmatrix}\tag{3.128}$$
"the second tangent vector":
$$\mathbf t_2:=\frac{\partial\mathbf y}{\partial x_2}=[\mathbf e_1^y,\mathbf e_2^y,\mathbf e_3^y]\begin{bmatrix}1\\ 2\\ 10\end{bmatrix}\tag{3.129}$$

"$\mathbb G_{m,n}$"
$$\mathbb G_{2,3}=\operatorname{span}\{\mathbf t_1,\mathbf t_2\}\subset\mathbb R^3:\ \text{Grassmann }(\mathbf A)$$
"the normal vector"
$$\mathbf n:=\mathbf t_1\times\mathbf t_2=*(\mathbf t_1\wedge\mathbf t_2)\tag{3.130}$$
$$\mathbf t_1=\sum_{i_1=1}^{n=3}\mathbf e_{i_1}a_{i_11}\quad\text{and}\quad \mathbf t_2=\sum_{i_2=1}^{n=3}\mathbf e_{i_2}a_{i_22}\tag{3.131}$$
$$\mathbf n=*\sum_{i_1,i_2=1}^{n=3}\mathbf e_{i_1}\wedge\mathbf e_{i_2}\,a_{i_11}a_{i_22}
=\sum_{i_1,i_2=1}^{n=3}*(\mathbf e_{i_1}\wedge\mathbf e_{i_2})\,a_{i_11}a_{i_22}\tag{3.132}$$
$$i,i_1,i_2\in\{1,\dots,n=3\}$$
$$\mathbf n=\mathbf e_2\times\mathbf e_3\,(a_{21}a_{32}-a_{31}a_{22})
+\mathbf e_3\times\mathbf e_1\,(a_{31}a_{12}-a_{11}a_{32})
+\mathbf e_1\times\mathbf e_2\,(a_{11}a_{22}-a_{21}a_{12})\tag{3.133}$$
versus
$$\mathbf n=*(\mathbf e_2\wedge\mathbf e_3)(a_{21}a_{32}-a_{31}a_{22})
+*(\mathbf e_3\wedge\mathbf e_1)(a_{31}a_{12}-a_{11}a_{32})
+*(\mathbf e_1\wedge\mathbf e_2)(a_{11}a_{22}-a_{21}a_{12})\tag{3.134}$$
Hodge star operator:
$$*(\mathbf e_2\wedge\mathbf e_3)=\mathbf e_2\times\mathbf e_3=\mathbf e_1,\qquad
*(\mathbf e_3\wedge\mathbf e_1)=\mathbf e_3\times\mathbf e_1=\mathbf e_2,\qquad
*(\mathbf e_1\wedge\mathbf e_2)=\mathbf e_1\times\mathbf e_2=\mathbf e_3\tag{3.135}$$
$$\mathbf n=\mathbf t_1\times\mathbf t_2=*(\mathbf t_1\wedge\mathbf t_2)=[\mathbf e_1^y,\mathbf e_2^y,\mathbf e_3^y]\begin{bmatrix}8\\ -9\\ 1\end{bmatrix}\tag{3.136}$$
$$\mathbf n^\circ:=\frac{\mathbf n}{\|\mathbf n\|}=\frac{1}{\sqrt{146}}\,[\mathbf e_1^y,\mathbf e_2^y,\mathbf e_3^y]\begin{bmatrix}8\\ -9\\ 1\end{bmatrix}\tag{3.137}$$
Corollary: $\mathbf b^\circ=\mathbf n^\circ$

"Grassmann manifold $\mathbb G_{n-m,n}$"
$$\mathbf A=\begin{bmatrix}1&1\\ 1&2\\ 1&10\end{bmatrix},\qquad
g(\mathbf A):=\left\{\begin{vmatrix}a_{21}&a_{22}\\ a_{31}&a_{32}\end{vmatrix},
\begin{vmatrix}a_{31}&a_{32}\\ a_{11}&a_{12}\end{vmatrix},
\begin{vmatrix}a_{11}&a_{12}\\ a_{21}&a_{22}\end{vmatrix}\right\}
=\left\{\begin{vmatrix}1&2\\ 1&10\end{vmatrix},
\begin{vmatrix}1&10\\ 1&1\end{vmatrix},
\begin{vmatrix}1&1\\ 1&2\end{vmatrix}\right\}=\{8,-9,1\}\tag{3.138}$$
(cyclic order)
$$g(\mathbf A)=\{p_{23},p_{31},p_{12}\}$$
Grassmann vector:
$$\mathbf g:=\begin{bmatrix}p_{23}\\ p_{31}\\ p_{12}\end{bmatrix}=\begin{bmatrix}8\\ -9\\ 1\end{bmatrix}\tag{3.139}$$
normalized Grassmann vector:
$$\mathbf g^\circ:=\frac{\mathbf g}{\|\mathbf g\|}=\frac{1}{\sqrt{146}}\begin{bmatrix}8\\ -9\\ 1\end{bmatrix}\tag{3.140}$$
Corollary: $\mathbf b^\circ=\mathbf n^\circ=\mathbf g^\circ$. (3.141)
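The corollary $\mathbf b^\circ=\mathbf n^\circ=\mathbf g^\circ$ is easy to verify numerically (NumPy sketch, an addition to the text): the normalized first column of the redundancy matrix, the normalized cross product $\mathbf t_1\times\mathbf t_2$, and the normalized Grassmann (Plücker) vector all equal $[8,-9,1]'/\sqrt{146}$.

```python
# Verify b° = n° = g° for the three-point example with leverage point t3 = 10.
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 10.0]])
R = np.eye(3) - A @ np.linalg.inv(A.T @ A) @ A.T   # redundancy matrix (3.120)

b = R[:, 0]
b0 = b / np.linalg.norm(b)                         # normalized latent restriction

n = np.cross(A[:, 0], A[:, 1])                     # normal vector t1 x t2
n0 = n / np.linalg.norm(n)

g = np.array([8.0, -9.0, 1.0]) / np.sqrt(146.0)    # normalized Grassmann vector
print(np.round(b0, 3), np.round(n0, 3), np.round(g, 3))

# latent restriction 8*yhat1 - 9*yhat2 + yhat3 = 0 for any adjusted y
y_hat = (np.eye(3) - R) @ np.array([1.0, 2.0, 4.0])
print(float(g @ y_hat))                            # vanishes up to round-off
```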
Now we are prepared to analyze the matrix $\mathbf A\in\mathbb R^{4\times2}$ of our case study. Box 3.23 outlines first the redundancy matrix $\mathbf R\in\mathbb R^{4\times4}$ (3.142) used for computing the inconsistency coordinate $\hat i_4=i_4(\mathbf I\text{-LESS})$ in particular. Again it is proven that the leverage point $t_4=10$ leaves this fourth coordinate of the inconsistency vector nearly uncontrolled. The diagonal elements $(r_{11},r_{22},r_{33},r_{44})$ of the redundancy matrix are of focal interest. As partial redundancy numbers (3.148), (3.149), (3.150) and (3.151),
$$r_{11}=\frac{(\mathbf A'\mathbf A)_{(1)}}{\det\mathbf A'\mathbf A}=\frac{57}{100},\quad
r_{22}=\frac{(\mathbf A'\mathbf A)_{(2)}}{\det\mathbf A'\mathbf A}=\frac{67}{100},\quad
r_{33}=\frac{(\mathbf A'\mathbf A)_{(3)}}{\det\mathbf A'\mathbf A}=\frac{73}{100},\quad
r_{44}=\frac{(\mathbf A'\mathbf A)_{(4)}}{\det\mathbf A'\mathbf A}=\frac{3}{100},$$
they sum up to
$$\operatorname{tr}\mathbf R=\frac{1}{\det\mathbf A'\mathbf A}\sum_{i=1}^{n=4}(\mathbf A'\mathbf A)_{(i)}=n-\operatorname{rk}\mathbf A=n-m=2,$$
the degree of freedom of the $\mathbf I_4$-LESS problem. Here, for the second time, we meet the subdeterminants $(\mathbf A'\mathbf A)_{(i)}$, which are generated in a two-step procedure.
166 3 The second problem of algebraic regression
Box 3.23:
Redundancy matrix of a linear model of a univariate polynomial of degree one
– leverage point $t_4=10$ –

"Redundancy matrix $R=I_4-A(A'A)^{-1}A'$"

$$I_4-A(A'A)^{-1}A'=\frac{1}{100}\begin{bmatrix}57&-37&-31&11\\-37&67&-29&-1\\-31&-29&73&-13\\11&-1&-13&3\end{bmatrix}\tag{3.142}$$

$$\hat{i}_4=i_4(\text{I-LESS})=\frac{1}{100}(11y_1-y_2-13y_3+3y_4)\tag{3.144}$$

$$r_{11}=\frac{57}{100},\quad r_{22}=\frac{67}{100},\quad r_{33}=\frac{73}{100},\quad r_{44}=\frac{3}{100}\tag{3.145}$$

"rank partitioning"

$$R\in\mathbb{R}^{4\times4},\quad \operatorname{rk}R=n-\operatorname{rk}A=n-m=2,\quad B\in\mathbb{R}^{4\times2},\quad C\in\mathbb{R}^{4\times2}$$

"if
$$B:=\frac{1}{100}\begin{bmatrix}57&-37\\-37&67\\-31&-29\\11&-1\end{bmatrix},\quad\text{then } B'A=0\text{''}\tag{3.147}$$

$$r_{11}=\frac{(A'A)_{(1)}}{\det A'A}\;(3.148),\qquad r_{22}=\frac{(A'A)_{(2)}}{\det A'A}\;(3.149)$$
$$r_{33}=\frac{(A'A)_{(3)}}{\det A'A}\;(3.150),\qquad r_{44}=\frac{(A'A)_{(4)}}{\det A'A}\;(3.151)$$

$$\operatorname{tr}R=\frac{1}{\det A'A}\sum_{i=1}^{n=4}(A'A)_{(i)}=n-\operatorname{rk}A=n-m=2\tag{3.152}$$
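The numbers in Box 3.23 can be checked directly. A minimal pure-Python sketch (no external libraries; variable names are ours):

```python
# Numeric check of Box 3.23: R = I4 - A (A'A)^{-1} A' for the design
# with leverage point t4 = 10; diagonal gives the partial redundancies.
A = [[1, 1], [1, 2], [1, 3], [1, 10]]
n, m = 4, 2

# normal-equation matrix A'A and its 2x2 closed-form inverse
N = [[sum(A[i][a] * A[i][b] for i in range(n)) for b in range(m)] for a in range(m)]
det = N[0][0] * N[1][1] - N[0][1] * N[1][0]          # det A'A = 200
Ninv = [[N[1][1] / det, -N[0][1] / det], [-N[1][0] / det, N[0][0] / det]]

# hat matrix H = A Ninv A' and redundancy matrix R = I - H
H = [[sum(A[i][a] * Ninv[a][b] * A[j][b] for a in range(m) for b in range(m))
      for j in range(n)] for i in range(n)]
R = [[(1.0 if i == j else 0.0) - H[i][j] for j in range(n)] for i in range(n)]

diag = [round(100 * R[i][i]) for i in range(n)]
print(det, diag, round(sum(R[i][i] for i in range(n))))  # 200 [57, 67, 73, 3] 2
```

The trace equals $n-\operatorname{rk}A=2$, the degree of freedom, as claimed in (3.152).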
3-3 Case study 167
Example: $(A'A)_{(1)}$ (row 1 of A deleted):
$$A'_{(1)}A_{(1)}=\begin{bmatrix}3&15\\15&113\end{bmatrix},\quad (A'A)_{(1)}=\det\big(A'_{(1)}A_{(1)}\big)=114,\quad \det A'A=200$$

Example: $(A'A)_{(2)}$ (row 2 of A deleted):
$$A'_{(2)}A_{(2)}=\begin{bmatrix}3&14\\14&110\end{bmatrix},\quad (A'A)_{(2)}=\det\big(A'_{(2)}A_{(2)}\big)=134,\quad \det A'A=200$$

Example: $(A'A)_{(3)}$ (row 3 of A deleted):
$$A'_{(3)}A_{(3)}=\begin{bmatrix}3&13\\13&105\end{bmatrix},\quad (A'A)_{(3)}=\det\big(A'_{(3)}A_{(3)}\big)=146,\quad \det A'A=200$$

Example: $(A'A)_{(4)}$ (row 4 of A deleted):
$$A'_{(4)}A_{(4)}=\begin{bmatrix}3&6\\6&14\end{bmatrix},\quad (A'A)_{(4)}=\det\big(A'_{(4)}A_{(4)}\big)=6,\quad \det A'A=200$$
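The two-step generation of the subdeterminants — delete row $i$ of A, form the reduced normal matrix, take its determinant — can be sketched in a few lines of pure Python (the helper name `reduced_det` is ours):

```python
# Subdeterminants (A'A)_(i): determinant of the normal matrix of A with row i removed.
A = [[1, 1], [1, 2], [1, 3], [1, 10]]

def reduced_det(i):
    rows = [r for k, r in enumerate(A) if k != i]   # A_(i): A without row i
    s0 = len(rows)                                  # sum of 1*1 terms
    s1 = sum(r[1] for r in rows)                    # sum of t
    s2 = sum(r[1] ** 2 for r in rows)               # sum of t^2
    return s0 * s2 - s1 * s1                        # det(A_(i)' A_(i))

subdets = [reduced_det(i) for i in range(4)]
print(subdets, sum(subdets))                        # [114, 134, 146, 6] 400
```

Note that the subdeterminants sum to $400=2\cdot\det A'A$, reproducing $\operatorname{tr}R=400/200=2$ from (3.152).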
Again, the partial redundancies $(r_{11},\dots,r_{44})$ are associated with the influence of the observations $y_1$, $y_2$, $y_3$ and $y_4$ on the total degree of freedom. Here the observations $y_1$, $y_2$ and $y_3$ have the greatest influence; in contrast, the observation $y_4$ at the leverage point has a very small impact.
The redundancy matrix R will now be properly analyzed in order to supply us with the latent restrictions, or the details of "from A to B". The rank partitioning $R=[B,C]$, $\operatorname{rk}R=\operatorname{rk}B=n-m=2$, leads us to (3.22) of the matrix R of partial redundancies. Here $B\in\mathbb{R}^{4\times2}$, with two column vectors, is established. Note that $C\in\mathbb{R}^{4\times2}$ is a dimension identity. We already introduced the orthogonality conditions in (3.22),
$$B'A=0 \quad\text{or}\quad B'A\mathbf{x}_\ell=B'\hat{\mathbf{y}}_\ell=0,$$
$$\mathbf{y}=[\mathbf{e}_1^y,\mathbf{e}_2^y,\mathbf{e}_3^y,\mathbf{e}_4^y]\begin{bmatrix}a_{11}x_1+a_{12}x_2\\a_{21}x_1+a_{22}x_2\\a_{31}x_1+a_{32}x_2\\a_{41}x_1+a_{42}x_2\end{bmatrix}=\sum_{i=1}^{n=4}\sum_{j=1}^{m=2}\mathbf{e}_i^y a_{ij}x_j,$$
$$\frac{\partial\mathbf{y}}{\partial x_1}(x_1,x_2)=[\mathbf{e}_1^y,\mathbf{e}_2^y,\mathbf{e}_3^y,\mathbf{e}_4^y]\begin{bmatrix}a_{11}\\a_{21}\\a_{31}\\a_{41}\end{bmatrix}=\sum_{i=1}^{n=4}\mathbf{e}_i^y a_{i1},$$

$$\frac{\partial\mathbf{y}}{\partial x_2}(x_1,x_2)=[\mathbf{e}_1^y,\mathbf{e}_2^y,\mathbf{e}_3^y,\mathbf{e}_4^y]\begin{bmatrix}a_{12}\\a_{22}\\a_{32}\\a_{42}\end{bmatrix}=\sum_{i=1}^{n=4}\mathbf{e}_i^y a_{i2}.$$
Box 3.24:
Latent restrictions
Grassmann coordinates (Plücker coordinates)
the first example

$$B'A=0\;(3.153)\qquad\Longleftrightarrow\qquad B'\hat{\mathbf{y}}=0\;(3.154)$$

$$A=\begin{bmatrix}1&1\\1&2\\1&3\\1&10\end{bmatrix}\;(3.155)\qquad B=\frac{1}{100}\begin{bmatrix}57&-37\\-37&67\\-31&-29\\11&-1\end{bmatrix}\;(3.156)$$

"first latent restriction":
$$57\hat{y}_1-37\hat{y}_2-31\hat{y}_3+11\hat{y}_4=0\tag{3.157}$$

"second latent restriction":
$$-37\hat{y}_1+67\hat{y}_2-29\hat{y}_3-\hat{y}_4=0\tag{3.158}$$

"the first tangent vector":
$$\mathbf{t}_1:=\frac{\partial\mathbf{y}}{\partial x_1}=[\mathbf{e}_1^y,\mathbf{e}_2^y,\mathbf{e}_3^y,\mathbf{e}_4^y]\begin{bmatrix}1\\1\\1\\1\end{bmatrix}\tag{3.159}$$

"the second tangent vector":
$$\mathbf{t}_2:=\frac{\partial\mathbf{y}}{\partial x_2}=[\mathbf{e}_1^y,\mathbf{e}_2^y,\mathbf{e}_3^y,\mathbf{e}_4^y]\begin{bmatrix}1\\2\\3\\10\end{bmatrix}\tag{3.160}$$
"the first normal vector":
$$\mathbf{n}_1:=\frac{\mathbf{b}_1}{\|\mathbf{b}_1\|}\tag{3.161}$$

$$\mathbf{n}_1=[\mathbf{e}_1^y,\mathbf{e}_2^y,\mathbf{e}_3^y,\mathbf{e}_4^y]\begin{bmatrix}0.755\\-0.490\\-0.411\\0.146\end{bmatrix}\tag{3.163}$$

"the second normal vector":
$$\mathbf{n}_2:=\frac{\mathbf{b}_2}{\|\mathbf{b}_2\|}\tag{3.164}$$

$$\mathbf{n}_2=[\mathbf{e}_1^y,\mathbf{e}_2^y,\mathbf{e}_3^y,\mathbf{e}_4^y]\begin{bmatrix}-0.452\\0.819\\-0.354\\-0.012\end{bmatrix}\tag{3.166}$$
Grassmann coordinates (Plücker coordinates)

$$A=\begin{bmatrix}1&1\\1&2\\1&3\\1&10\end{bmatrix},\quad
g(A):=\left\{\begin{vmatrix}1&1\\1&2\end{vmatrix},\begin{vmatrix}1&1\\1&3\end{vmatrix},\begin{vmatrix}1&1\\1&10\end{vmatrix},\begin{vmatrix}1&2\\1&3\end{vmatrix},\begin{vmatrix}1&2\\1&10\end{vmatrix},\begin{vmatrix}1&3\\1&10\end{vmatrix}\right\}=$$
$$=\{p_{12},p_{13},p_{14},p_{23},p_{24},p_{34}\}\tag{3.167}$$

$$p_{12}=1,\;p_{13}=2,\;p_{14}=9,\;p_{23}=1,\;p_{24}=8,\;p_{34}=7.$$
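All six Plücker coordinates of (3.167) are 2×2 row subdeterminants taken in lexicographical order; a small pure-Python sketch:

```python
# Grassmann (Pluecker) coordinates p_{ij}, i < j, of the 4x2 matrix A:
# one 2x2 subdeterminant per unordered pair of rows, lexicographical order.
from itertools import combinations

A = [[1, 1], [1, 2], [1, 3], [1, 10]]
p = {(i + 1, j + 1): A[i][0] * A[j][1] - A[i][1] * A[j][0]
     for i, j in combinations(range(4), 2)}
print(p)  # {(1, 2): 1, (1, 3): 2, (1, 4): 9, (2, 3): 1, (2, 4): 8, (3, 4): 7}
```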
Again, the columns of the matrix A lay the foundation of GRASSMANN (A). Next we turn to GRASSMANN (B), to be identified as the normal space $\mathcal{R}(A)^\perp$. The normal vectors
$$\mathbf{n}_1=[\mathbf{e}_1^y,\mathbf{e}_2^y,\mathbf{e}_3^y,\mathbf{e}_4^y]\begin{bmatrix}b_{11}\\b_{21}\\b_{31}\\b_{41}\end{bmatrix}\div\|\operatorname{col}_1 B\|,\qquad
\mathbf{n}_2=[\mathbf{e}_1^y,\mathbf{e}_2^y,\mathbf{e}_3^y,\mathbf{e}_4^y]\begin{bmatrix}b_{12}\\b_{22}\\b_{32}\\b_{42}\end{bmatrix}\div\|\operatorname{col}_2 B\|$$
are computed from the normalized column vectors of the matrix $B=[\mathbf{b}_1,\mathbf{b}_2]$.
The normal vectors $\{\mathbf{n}_1,\mathbf{n}_2\}$ span the normal space $\mathcal{R}(A)^\perp$, also called GRASSMANN (B). Alternatively, we may substitute the normal vectors $\mathbf{n}_1$ and $\mathbf{n}_2$ by the Grassmann coordinates (Plücker coordinates) of the matrix A, namely by the Grassmann column vector:
$$p_{12}=\begin{vmatrix}1&1\\1&2\end{vmatrix}=1,\quad p_{13}=\begin{vmatrix}1&1\\1&3\end{vmatrix}=2,\quad p_{14}=\begin{vmatrix}1&1\\1&10\end{vmatrix}=9,$$
$$p_{23}=\begin{vmatrix}1&2\\1&3\end{vmatrix}=1,\quad p_{24}=\begin{vmatrix}1&2\\1&10\end{vmatrix}=8,\quad p_{34}=\begin{vmatrix}1&3\\1&10\end{vmatrix}=7.$$
$$n=4,\quad m=2,\quad n-m=2$$

$$\frac{1}{2!}\sum_{i_1,i_2=1}^{n=4}(\mathbf{e}_{i_1}\wedge\mathbf{e}_{i_2})\,a_{i_1 1}a_{i_2 2}\;\longmapsto\;\frac{1}{2!}\sum_{i_1,i_2,j_1,j_2=1}^{n=4}(\mathbf{e}_{j_1}\wedge\mathbf{e}_{j_2})\,\varepsilon_{i_1 i_2 j_1 j_2}\,a_{i_1 1}a_{i_2 2}$$

$$\mathbf{g}:=\begin{bmatrix}p_{12}\\p_{13}\\p_{14}\\p_{23}\\p_{24}\\p_{34}\end{bmatrix}=\begin{bmatrix}1\\2\\9\\1\\8\\7\end{bmatrix}\in\mathbb{R}^{6\times1}.$$
How do the vectors $\{\mathbf{b}_1,\mathbf{b}_2\}$, $\{\mathbf{n}_1,\mathbf{n}_2\}$ and $\mathbf{g}$ relate to each other? Earlier we already normalized $\{\mathbf{b}_1,\mathbf{b}_2\}$ to $\{\bar{\mathbf{b}}_1,\bar{\mathbf{b}}_2\}$ when we constructed $\{\mathbf{n}_1,\mathbf{n}_2\}$. Then we are left with the question of how to relate $\{\mathbf{b}_1,\mathbf{b}_2\}$ and $\{\mathbf{n}_1,\mathbf{n}_2\}$ to the Grassmann column vector $\mathbf{g}$.
The elements of the Grassmann column vector $g(A)$ associated with the matrix A are the Grassmann coordinates (Plücker coordinates) $\{p_{12},p_{13},p_{14},p_{23},p_{24},p_{34}\}$ in lexicographical order. They originate from the dual exterior form
$$*\alpha_m=\beta_{n-m},$$
where $\alpha_m$ is the original m-exterior form associated with the matrix A.
$$n=4,\quad n-m=2$$

$$\alpha_2:=\frac{1}{2!}\sum_{i_1,i_2=1}^{n=4}\mathbf{e}_{i_1}\wedge\mathbf{e}_{i_2}\,a_{i_1 1}a_{i_2 2}=$$
$$=\frac{1}{2!}\mathbf{e}_1\wedge\mathbf{e}_2(a_{11}a_{22}-a_{21}a_{12})+\frac{1}{2!}\mathbf{e}_1\wedge\mathbf{e}_3(a_{11}a_{32}-a_{31}a_{12})+$$
$$+\frac{1}{2!}\mathbf{e}_1\wedge\mathbf{e}_4(a_{11}a_{42}-a_{41}a_{12})+\frac{1}{2!}\mathbf{e}_2\wedge\mathbf{e}_3(a_{21}a_{32}-a_{31}a_{22})+$$
$$+\frac{1}{2!}\mathbf{e}_2\wedge\mathbf{e}_4(a_{21}a_{42}-a_{41}a_{22})+\frac{1}{2!}\mathbf{e}_3\wedge\mathbf{e}_4(a_{31}a_{42}-a_{41}a_{32})$$
$$\beta_2:=*\alpha_2(\mathbb{R}^4)=\frac{1}{4}\sum_{i_1,i_2,j_1,j_2=1}^{n=4}\mathbf{e}_{j_1}\wedge\mathbf{e}_{j_2}\,\varepsilon_{i_1 i_2 j_1 j_2}\,a_{i_1 1}a_{i_2 2}=$$
$$=\frac{1}{4}\mathbf{e}_3\wedge\mathbf{e}_4\,p_{12}+\frac{1}{4}\mathbf{e}_2\wedge\mathbf{e}_4\,p_{13}+\frac{1}{4}\mathbf{e}_3\wedge\mathbf{e}_2\,p_{14}+$$
$$+\frac{1}{4}\mathbf{e}_4\wedge\mathbf{e}_1\,p_{23}+\frac{1}{4}\mathbf{e}_3\wedge\mathbf{e}_1\,p_{24}+\frac{1}{4}\mathbf{e}_1\wedge\mathbf{e}_2\,p_{34}.$$

The Grassmann coordinates (Plücker coordinates) $\{p_{12},p_{13},p_{14},p_{23},p_{24},p_{34}\}$ refer to the basis
$$\{\mathbf{e}_3\wedge\mathbf{e}_4,\;\mathbf{e}_2\wedge\mathbf{e}_4,\;\mathbf{e}_3\wedge\mathbf{e}_2,\;\mathbf{e}_4\wedge\mathbf{e}_1,\;\mathbf{e}_3\wedge\mathbf{e}_1,\;\mathbf{e}_1\wedge\mathbf{e}_2\}.$$
Indeed, the Grassmann space $G_{2,4}$ spanned by such a basis can alternatively be covered by the chart generated by the column vectors of the matrix B,
$$\gamma_2:=\sum_{j_1,j_2=1}^{n=4}\mathbf{e}_{j_1}\wedge\mathbf{e}_{j_2}\,b_{j_1 1}b_{j_2 2}\in \mathrm{GRASSMANN}(B).$$
"The matrix B constitutes the latent restrictions, also called latent condition equations. The column space $\mathcal{R}(B)$ of the matrix B coincides with the complementary column space $\mathcal{R}(A)^\perp$ orthogonal to the column space $\mathcal{R}(A)$ of the matrix A. The elements of the matrix B are the Grassmann coordinates, also called Plücker coordinates, special subdeterminants of the matrix $A=[\mathbf{a}_{i1},\dots,\mathbf{a}_{im}]$:
$$p_{j_1 j_2}:=\sum_{i_1,\dots,i_m=1}^{n}\varepsilon_{i_1\cdots i_m j_1\cdots j_{n-m}}\,a_{i_1 1}\cdots a_{i_m m}.\text{''}$$
…which is, of course, not unique, since the matrix $Z\in\mathbb{R}^{\ell\times\ell}$ is left undetermined. Such a result is typical for orthogonality conditions.
Second, let us construct the Grassmann space $G_{\ell,n}$, in short GRASSMANN (B), as well as the Grassmann space $G_{n-\ell,n}$, in short GRASSMANN (A), representing $\mathcal{R}(B)$ and $\mathcal{R}(B)^\perp$, respectively.
$$\gamma_\ell=\frac{1}{\ell!}\sum_{j_1,\dots,j_\ell=1}^{n}\mathbf{e}_{j_1}\wedge\dots\wedge\mathbf{e}_{j_\ell}\,b_{j_1 1}\cdots b_{j_\ell\ell}\tag{3.168}$$

$$\delta_{n-\ell}:=*\gamma_\ell=\frac{1}{(n-\ell)!}\sum_{i_1,\dots,i_{n-\ell},\,j_1,\dots,j_\ell=1}^{n}\mathbf{e}_{i_1}\wedge\dots\wedge\mathbf{e}_{i_{n-\ell}}\,\varepsilon_{i_1\cdots i_{n-\ell}j_1\cdots j_\ell}\,\frac{1}{\ell!}\,b_{j_1 1}\cdots b_{j_\ell\ell}.$$

The exterior form $\gamma_\ell$, which is built on the column vectors $\{\mathbf{b}_{j_1},\dots,\mathbf{b}_{j_\ell}\}$ of the matrix $B\in\mathbb{R}^{n\times\ell}$, is an element of the column space $\mathcal{R}(B)$. Its dual exterior form $*\gamma_\ell=\delta_{n-\ell}$, in contrast, is an element of the orthogonal complement $\mathcal{R}(B)^\perp$.

$$q_{i_1\cdots i_{n-\ell}}:=\varepsilon_{i_1\cdots i_{n-\ell}j_1\cdots j_\ell}\,b_{j_1 1}\cdots b_{j_\ell\ell}\tag{3.169}$$

denote the Grassmann coordinates (Plücker coordinates) which are dual to the Grassmann coordinates (Plücker coordinates) $p_{j_1\cdots j_{n-m}}$. The vector q is constituted by subdeterminants of the matrix B, while the vector p is constituted by subdeterminants of the matrix A.

Figure 3.8 (commutative diagram): $\alpha_m\mapsto *\alpha_m=\beta_{n-m}$ and $*\beta_{n-m}=(-1)^{m(n-m)}\alpha_m$; identifying $\ell=n-m$, $\delta_{n-\ell}=*\gamma_\ell$.
$$A=\begin{bmatrix}1&1\\1&2\\1&10\end{bmatrix}\in\mathbb{R}^{3\times2}:\quad
\mathbf{a}_1=\sum_{i=1}^{n=3}\mathbf{e}_i a_{i1}\ \text{and}\ \mathbf{a}_2=\sum_{i=1}^{n=3}\mathbf{e}_i a_{i2}$$

$$\alpha_2:=\frac{1}{2!}\sum_{i_1,i_2=1}^{n=3}\mathbf{e}_{i_1}\wedge\mathbf{e}_{i_2}\,a_{i_1 1}a_{i_2 2}\in\Lambda^2(\mathbb{R}^3)=\Lambda^m(\mathbb{R}^n)$$

$$\beta_1:=*\alpha_2=\frac{1}{2!}\sum_{i_1,i_2,j=1}^{n=3}\mathbf{e}_{j}\,\varepsilon_{i_1 i_2 j}\,a_{i_1 1}a_{i_2 2}\in\Lambda^1(\mathbb{R}^3)=\Lambda^{n-m}(\mathbb{R}^n)$$

$$\delta_2:=*\beta_1=\frac{1}{2!}\sum_{i_1,i_2,j_1,j_2,j_3=1}^{n=3}\mathbf{e}_{i_1}\wedge\mathbf{e}_{i_2}\,\varepsilon_{i_1 i_2 j_1}\varepsilon_{j_2 j_3 j_1}\,a_{j_2 1}a_{j_3 2}$$

$$\delta_2=\frac{1}{2!}\sum_{i_1,i_2,j_2,j_3=1}^{n=3}\mathbf{e}_{i_1}\wedge\mathbf{e}_{i_2}\,\big(\delta_{i_1 j_2}\delta_{i_2 j_3}-\delta_{i_1 j_3}\delta_{i_2 j_2}\big)\,a_{j_2 1}a_{j_3 2}$$

$$\delta_2=\frac{1}{2!}\sum_{i_1,i_2=1}^{n=3}\mathbf{e}_{i_1}\wedge\mathbf{e}_{i_2}\,a_{i_1 1}a_{i_2 2}=\alpha_2\in\Lambda^2(\mathbb{R}^3)=\Lambda^m(\mathbb{R}^n)$$
$$A=\begin{bmatrix}1&1\\1&2\\1&3\\1&10\end{bmatrix}\in\mathbb{R}^{4\times2}:$$

$$\alpha_2:=\frac{1}{2!}\sum_{i_1,i_2=1}^{n=4}\mathbf{e}_{i_1}\wedge\mathbf{e}_{i_2}\,a_{i_1 1}a_{i_2 2}\in\Lambda^2(\mathbb{R}^4)=\Lambda^m(\mathbb{R}^n)$$

$$\beta_2:=*\alpha_2=\frac{1}{2!}\sum_{i_1,i_2,j_1,j_2=1}^{n=4}\mathbf{e}_{j_1}\wedge\mathbf{e}_{j_2}\,\varepsilon_{i_1 i_2 j_1 j_2}\,\frac{1}{2!}\,a_{i_1 1}a_{i_2 2}\in\Lambda^2(\mathbb{R}^4)=\Lambda^{n-m}(\mathbb{R}^n)$$

$$\beta_2=\frac{1}{4}\mathbf{e}_3\wedge\mathbf{e}_4\,p_{12}+\frac{1}{4}\mathbf{e}_2\wedge\mathbf{e}_4\,p_{13}+\frac{1}{4}\mathbf{e}_3\wedge\mathbf{e}_2\,p_{14}+\frac{1}{4}\mathbf{e}_4\wedge\mathbf{e}_1\,p_{23}+\frac{1}{4}\mathbf{e}_3\wedge\mathbf{e}_1\,p_{24}+\frac{1}{4}\mathbf{e}_1\wedge\mathbf{e}_2\,p_{34}$$

$$p_{12}=1,\;p_{13}=2,\;p_{14}=9,\;p_{23}=1,\;p_{24}=8,\;p_{34}=7$$

$$\delta_2:=*\beta_2=\frac{1}{2!}\sum_{i_1,i_2,j_1,j_2,j_3,j_4=1}^{n=4}\mathbf{e}_{i_1}\wedge\mathbf{e}_{i_2}\,\varepsilon_{i_1 i_2 j_1 j_2}\varepsilon_{j_1 j_2 j_3 j_4}\,\frac{1}{2!}\,a_{j_3 1}a_{j_4 2}=\alpha_2\in\Lambda^2(\mathbb{R}^4)=\Lambda^m(\mathbb{R}^n)$$

$$\delta_2=\frac{1}{4}\mathbf{e}_3\wedge\mathbf{e}_4\,q_{12}+\frac{1}{4}\mathbf{e}_2\wedge\mathbf{e}_4\,q_{13}+\frac{1}{4}\mathbf{e}_3\wedge\mathbf{e}_2\,q_{14}+\frac{1}{4}\mathbf{e}_4\wedge\mathbf{e}_1\,q_{23}+\frac{1}{4}\mathbf{e}_3\wedge\mathbf{e}_1\,q_{24}+\frac{1}{4}\mathbf{e}_1\wedge\mathbf{e}_2\,q_{34}$$

$$q_{12}=p_{12},\;q_{13}=p_{13},\;q_{14}=p_{14},\;q_{23}=p_{23},\;q_{24}=p_{24},\;q_{34}=p_{34}.$$
Table 3.1: Test "break point" observations for a piecewise linear model

 i    y_i    t_i
 1    1.0     1
 2    2.0     2
 3    2.0     3
 4    3.0     4
 5    2.0     5
 6    1.0     6
 7    0.5     7
 8    2.0     8
 9    4.0     9
10    4.5    10
Table 3.1 summarises a set of observations $y_i$ with n = 10 elements. Those measurements have been taken at the time instants $\{t_1,\dots,t_{10}\}$. Figure 3.9 illustrates the graph of the corresponding function y(t). By means of the celebrated Gauss-Jacobi Combinatorial Algorithm we aim at localizing the break points. First, as outlined in Box 3.27, we determine all combinations of two points which allow the fit of a straight line without any approximation error. As a determined linear model $\mathbf{y}=A\mathbf{x}$, $A\in\mathbb{R}^{2\times2}$, $\operatorname{rk}A=2$, namely $\mathbf{x}=A^{-1}\mathbf{y}$, we calculate $x_1$ (3.172) and $x_2$ (3.173) in closed form. For instance, the pair of observations $(y_1,y_2)$, in short (1, 2), at $(t_1,t_2)=(1,2)$ determines $(x_1,x_2)=(0,1)$. Alternatively, the pair of observations $(y_1,y_3)$, in short (1, 3), at $(t_1,t_3)=(1,3)$ leads us to $(x_1,x_2)=(0.5,0.5)$. Table 3.2 contains the 45 possible combinations which determine $(x_1,x_2)$ from $(y_1,\dots,y_{10})$. Those solutions are plotted in Figures 3.10, 3.11 and 3.12.
Box 3.27:
Piecewise linear model
Gauss-Jacobi combinatorial algorithm
1st step

$$\mathbf{y}=\begin{bmatrix}y(t_i)\\y(t_j)\end{bmatrix}=\begin{bmatrix}1&t_i\\1&t_j\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}=A\mathbf{x},\qquad i<j\in\{1,\dots,n\}\tag{3.171}$$
$$\mathbf{y}\in\mathbb{R}^2,\quad A\in\mathbb{R}^{2\times2},\quad \operatorname{rk}A=2,\quad \mathbf{x}\in\mathbb{R}^2$$

$$\mathbf{x}=A^{-1}\mathbf{y}\;\Longleftrightarrow\;\begin{bmatrix}x_1\\x_2\end{bmatrix}=\frac{1}{t_j-t_i}\begin{bmatrix}t_j&-t_i\\-1&1\end{bmatrix}\begin{bmatrix}y(t_i)\\y(t_j)\end{bmatrix}\tag{3.172}$$

$$x_1=\frac{t_j y_i-t_i y_j}{t_j-t_i}\quad\text{and}\quad x_2=\frac{y_j-y_i}{t_j-t_i}.\tag{3.173}$$
Example: $t_i=t_1=1$, $t_j=t_2=2$:
$$y(t_1)=y_1=1,\;y(t_2)=y_2=2\;\Rightarrow\;x_1=0,\;x_2=1.$$

Example: $t_i=t_1=1$, $t_j=t_3=3$:
$$y(t_1)=y_1=1,\;y(t_3)=y_3=2\;\Rightarrow\;x_1=0.5\;\text{and}\;x_2=0.5.$$
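The closed form (3.172)-(3.173) is easy to check numerically; a pure-Python sketch (the function name `two_point_fit` is ours):

```python
# Two-point exact fit of y(t) = x1 + x2*t through (ti, yi) and (tj, yj),
# following the closed-form inverse (3.172)-(3.173).
def two_point_fit(ti, yi, tj, yj):
    x1 = (tj * yi - ti * yj) / (tj - ti)   # intercept
    x2 = (yj - yi) / (tj - ti)             # slope
    return x1, x2

print(two_point_fit(1, 1, 2, 2))   # (0.0, 1.0)
print(two_point_fit(1, 1, 3, 2))   # (0.5, 0.5)
```

Running this over all 45 pairs of Table 3.1 reproduces the combinatorial solutions of Table 3.2.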
Table 3.2
"if $G_y=I_2$, then $G_x=A'A$"

$$G_x=A'A=\begin{bmatrix}2&t_i+t_j\\t_i+t_j&t_i^2+t_j^2\end{bmatrix},\qquad i<j\in\{1,\dots,n\}.\tag{3.175}$$

Example: $t_i=t_1=1$, $t_j=t_2=2$:
$$G_x=\begin{bmatrix}2&3\\3&5\end{bmatrix},\quad \operatorname{vech}G_x=[2,3,5]'.$$

Example: $t_i=t_1=1$, $t_j=t_3=3$:
$$G_x=\begin{bmatrix}2&4\\4&10\end{bmatrix},\quad \operatorname{vech}G_x=[2,4,10]'.$$
Third, we are left with the problem of identifying the break points. C.F. Gauss (1828) and C.G.J. Jacobi (1841) proposed to take the weighted arithmetic mean of the combinatorial solutions $(x_1,x_2)(1,2)$, $(x_1,x_2)(1,3)$, in general $(x_1,x_2)(i,j)$, $i<j$, which are considered as pseudo-observations.
Box 3.29:
Piecewise linear model: Gauss-Jacobi combinatorial algorithm
3rd step
pseudo-observations
Example

$$\begin{bmatrix}x_1^{(1,2)}\\x_2^{(1,2)}\\x_1^{(1,3)}\\x_2^{(1,3)}\end{bmatrix}=\begin{bmatrix}1&0\\0&1\\1&0\\0&1\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}+\begin{bmatrix}i_1\\i_2\\i_3\\i_4\end{bmatrix}\in\mathbb{R}^{4\times1}\tag{3.176}$$

$G_x$-LESS:

$$\mathbf{x}_\ell:=\hat{\mathbf{x}}=\begin{bmatrix}\hat{x}_1\\\hat{x}_2\end{bmatrix}=\big[G_x^{(1,2)}+G_x^{(1,3)}\big]^{-1}\big[G_x^{(1,2)},G_x^{(1,3)}\big]\begin{bmatrix}x_1^{(1,2)}\\x_2^{(1,2)}\\x_1^{(1,3)}\\x_2^{(1,3)}\end{bmatrix}\in\mathbb{R}^{2\times1}\tag{3.177}$$

$$\operatorname{vech}G_x^{(1,2)}=[2,3,5]',\qquad \operatorname{vech}G_x^{(1,3)}=[2,4,10]'$$

$$G_x^{(1,2)}=\begin{bmatrix}2&3\\3&5\end{bmatrix},\qquad G_x^{(1,3)}=\begin{bmatrix}2&4\\4&10\end{bmatrix}$$

$$\big[G_x^{(1,2)}+G_x^{(1,3)}\big]^{-1}=\begin{bmatrix}4&7\\7&15\end{bmatrix}^{-1}=\frac{1}{11}\begin{bmatrix}15&-7\\-7&4\end{bmatrix}$$

$$\mathbf{x}_\ell:=\begin{bmatrix}\hat{x}_1\\\hat{x}_2\end{bmatrix}=\frac{1}{11}\begin{bmatrix}6\\6\end{bmatrix},\qquad \hat{x}_1=\hat{x}_2=\frac{6}{11}=0.545454\ldots$$
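The weighted arithmetic mean (3.177) is a small 2×2 linear-algebra exercise; a pure-Python sketch (helper names are ours):

```python
# G_x-LESS (3.177) for the two pseudo-observations of Box 3.29.
def matvec(M, v):  return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]
def madd(M, N):    return [[M[i][j] + N[i][j] for j in range(2)] for i in range(2)]
def inv2(M):
    d = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [[M[1][1]/d, -M[0][1]/d], [-M[1][0]/d, M[0][0]/d]]

G12, G13 = [[2, 3], [3, 5]], [[2, 4], [4, 10]]   # weight matrices from Table 3.2
x12, x13 = [0.0, 1.0], [0.5, 0.5]                # combinatorial solutions
rhs = [a + b for a, b in zip(matvec(G12, x12), matvec(G13, x13))]  # [6, 12]
xhat = matvec(inv2(madd(G12, G13)), rhs)
print([round(v, 6) for v in xhat])               # [0.545455, 0.545455]
```

Both components equal 6/11, the weighted arithmetic mean of the text.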
Box 3.29 provides us with an example for establishing the third step of the Gauss-Jacobi Combinatorial Algorithm. We outline $G_x$-LESS for the set of pseudo-observations (3.176), $(x_1,x_2)(1,2)$ and $(x_1,x_2)(1,3)$, solved by (3.177), $\mathbf{x}_\ell=(\hat{x}_1,\hat{x}_2)$. The matrices $G_x^{(1,2)}$ and $G_x^{(1,3)}$, representing the metric of the parameter space X derived from $(x_1,x_2)(1,2)$ and $(x_1,x_2)(1,3)$, are additively composed and inverted, a result which is motivated by the special design matrix $A=[I_2,I_2]'$ of "direct" pseudo-observations. As soon as we implement the weight matrices $G_x^{(1,2)}$ and $G_x^{(1,3)}$ from Table 3.2 as well as $(x_1,x_2)(1,2)$ and $(x_1,x_2)(1,3)$, we are led to the weighted arithmetic mean $\hat{x}_1=\hat{x}_2=6/11$. Such a result has to be compared with the componentwise median $x_1(\text{median})=1/4$ and $x_2(\text{median})=3/4$.
Here, the arithmetic mean of $x_1^{(1,2)},x_1^{(1,3)}$ and of $x_2^{(1,2)},x_2^{(1,3)}$ coincides with the median when the weights of the pseudo-observations are neglected.
Box 3.30:
Piecewise linear models and two break points
"Example"

$$\begin{bmatrix}\mathbf{y}_1\\\mathbf{y}_2\\\mathbf{y}_3\end{bmatrix}=\begin{bmatrix}\mathbf{1}_{n_1}&\mathbf{t}_{n_1}&0&0&0&0\\0&0&\mathbf{1}_{n_2}&\mathbf{t}_{n_2}&0&0\\0&0&0&0&\mathbf{1}_{n_3}&\mathbf{t}_{n_3}\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\\x_6\end{bmatrix}+\begin{bmatrix}\mathbf{i}_{y_1}\\\mathbf{i}_{y_2}\\\mathbf{i}_{y_3}\end{bmatrix}\tag{3.178}$$

$I_{n_1}$-LESS, $I_{n_2}$-LESS, $I_{n_3}$-LESS:

$$\begin{bmatrix}x_1\\x_2\end{bmatrix}_\ell=\frac{1}{n_1\mathbf{t}'_{n_1}\mathbf{t}_{n_1}-(\mathbf{1}'_{n_1}\mathbf{t}_{n_1})^2}\begin{bmatrix}(\mathbf{t}'_{n_1}\mathbf{t}_{n_1})(\mathbf{1}'_{n_1}\mathbf{y}_1)-(\mathbf{1}'_{n_1}\mathbf{t}_{n_1})(\mathbf{t}'_{n_1}\mathbf{y}_1)\\-(\mathbf{1}'_{n_1}\mathbf{t}_{n_1})(\mathbf{1}'_{n_1}\mathbf{y}_1)+n_1\mathbf{t}'_{n_1}\mathbf{y}_1\end{bmatrix}\tag{3.179}$$

$$\begin{bmatrix}x_3\\x_4\end{bmatrix}_\ell=\frac{1}{n_2\mathbf{t}'_{n_2}\mathbf{t}_{n_2}-(\mathbf{1}'_{n_2}\mathbf{t}_{n_2})^2}\begin{bmatrix}(\mathbf{t}'_{n_2}\mathbf{t}_{n_2})(\mathbf{1}'_{n_2}\mathbf{y}_2)-(\mathbf{1}'_{n_2}\mathbf{t}_{n_2})(\mathbf{t}'_{n_2}\mathbf{y}_2)\\-(\mathbf{1}'_{n_2}\mathbf{t}_{n_2})(\mathbf{1}'_{n_2}\mathbf{y}_2)+n_2\mathbf{t}'_{n_2}\mathbf{y}_2\end{bmatrix}\tag{3.180}$$

and analogously $[x_5,x_6]'_\ell$ for the third segment (3.181).
Box 3.31:
Piecewise linear models and two break points
"Example"

1st interval: $n=4$, $m=2$, $t\in\{t_1,t_2,t_3,t_4\}$

$$\mathbf{y}_1=\begin{bmatrix}1&t_1\\1&t_2\\1&t_3\\1&t_4\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}+\mathbf{i}_{y_1}=\mathbf{1}_n x_1+\mathbf{t}_n x_2+\mathbf{i}_{y_1}\tag{3.182}$$

The second and third intervals are set up analogously, with design matrices whose rows run up to $(1,t_7)$ and $(1,t_{10})$, respectively.
Figures 3.10, 3.11 and 3.12 have illustrated the three clusters of combinatorial solutions referring to the first, second and third straight line. As outlined in Box 3.30 and Box 3.31, namely by (3.178) to (3.181), with $n_1=n_2=n_3=4$, we have computed $(x_1,x_2)_\ell$ for the first segment, $(x_3,x_4)_\ell$ for the second segment and $(x_5,x_6)_\ell$ for the third segment of the least squares fit of the straight line. Table 3.3 contains the results explicitly. Similarly, by means of the Gauss-Jacobi Combinatorial Algorithm of Table 3.4 we have computed the identical solutions $(x_1,x_2)_\ell$, $(x_3,x_4)_\ell$ and $(x_5,x_6)_\ell$ as "weighted arithmetic means", numerically presented only for the first segment.
Table 3.3:
$I_n$-LESS solutions for the three segments of a straight line, two break points

$$I_4\text{-LESS}:\;\begin{bmatrix}x_1\\x_2\end{bmatrix}_\ell=\begin{bmatrix}0.5\\0.6\end{bmatrix}:\quad y(t)=0.5+0.6\,t$$

$$I_4\text{-LESS}:\;\begin{bmatrix}x_3\\x_4\end{bmatrix}_\ell=\frac{1}{20}\begin{bmatrix}126\\-17\end{bmatrix}:\quad y(t)=6.30-0.85\,t$$

$$I_4\text{-LESS}:\;\begin{bmatrix}x_5\\x_6\end{bmatrix}_\ell=\frac{1}{20}\begin{bmatrix}-183\\28\end{bmatrix}:\quad y(t)=-9.15+1.40\,t.$$
Table 3.4:
Gauss-Jacobi Combinatorial Algorithm for the first segment of a straight line

$$\begin{bmatrix}\hat{x}_1\\\hat{x}_2\end{bmatrix}=\big[G_x^{(1,2)}+G_x^{(1,3)}+G_x^{(1,4)}+G_x^{(2,3)}+G_x^{(2,4)}+G_x^{(3,4)}\big]^{-1}\times$$
$$\times\big[G_x^{(1,2)},G_x^{(1,3)},G_x^{(1,4)},G_x^{(2,3)},G_x^{(2,4)},G_x^{(3,4)}\big]\begin{bmatrix}\mathbf{x}^{(1,2)}\\\mathbf{x}^{(1,3)}\\\mathbf{x}^{(1,4)}\\\mathbf{x}^{(2,3)}\\\mathbf{x}^{(2,4)}\\\mathbf{x}^{(3,4)}\end{bmatrix}\tag{3.185}$$

$$\big[G_x^{(1,2)}+G_x^{(1,3)}+G_x^{(1,4)}+G_x^{(2,3)}+G_x^{(2,4)}+G_x^{(3,4)}\big]^{-1}=\frac{1}{30}\begin{bmatrix}15&-5\\-5&2\end{bmatrix}$$

$$G_x^{(1,2)}\mathbf{x}^{(1,2)}=\begin{bmatrix}2&3\\3&5\end{bmatrix}\begin{bmatrix}0\\1\end{bmatrix}=\begin{bmatrix}3\\5\end{bmatrix},\qquad
G_x^{(1,3)}\mathbf{x}^{(1,3)}=\begin{bmatrix}2&4\\4&10\end{bmatrix}\begin{bmatrix}0.5\\0.5\end{bmatrix}=\begin{bmatrix}3\\7\end{bmatrix}$$

$$G_x^{(1,4)}\mathbf{x}^{(1,4)}=\begin{bmatrix}2&5\\5&17\end{bmatrix}\begin{bmatrix}0.333\\0.666\end{bmatrix}=\begin{bmatrix}4\\13\end{bmatrix},\qquad
G_x^{(2,3)}\mathbf{x}^{(2,3)}=\begin{bmatrix}2&5\\5&13\end{bmatrix}\begin{bmatrix}2\\0\end{bmatrix}=\begin{bmatrix}4\\10\end{bmatrix}$$

$$G_x^{(2,4)}\mathbf{x}^{(2,4)}=\begin{bmatrix}2&6\\6&20\end{bmatrix}\begin{bmatrix}1\\0.5\end{bmatrix}=\begin{bmatrix}5\\16\end{bmatrix},\qquad
G_x^{(3,4)}\mathbf{x}^{(3,4)}=\begin{bmatrix}2&7\\7&25\end{bmatrix}\begin{bmatrix}-1\\1\end{bmatrix}=\begin{bmatrix}5\\18\end{bmatrix}$$

$$G_x^{(1,2)}\mathbf{x}^{(1,2)}+\dots+G_x^{(3,4)}\mathbf{x}^{(3,4)}=\begin{bmatrix}24\\69\end{bmatrix}$$

$$\begin{bmatrix}\hat{x}_1\\\hat{x}_2\end{bmatrix}=\frac{1}{30}\begin{bmatrix}15&-5\\-5&2\end{bmatrix}\begin{bmatrix}24\\69\end{bmatrix}=\begin{bmatrix}0.5\\0.6\end{bmatrix}.$$
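The whole first-segment combinatorial step condenses into one loop over the six pairs; a pure-Python sketch accumulating $\sum G^{(i,j)}$ and $\sum G^{(i,j)}\mathbf{x}^{(i,j)}$ (variable names ours):

```python
# Gauss-Jacobi combinatorial estimate for the first segment (t, y) of Table 3.1:
# weighted arithmetic mean of all two-point solutions, weights G = A'A per pair.
t = [1, 2, 3, 4]
y = [1, 2, 2, 3]

S = [[0.0, 0.0], [0.0, 0.0]]   # running sum of weight matrices
r = [0.0, 0.0]                 # running sum of G x products
for i in range(4):
    for j in range(i + 1, 4):
        x1 = (t[j] * y[i] - t[i] * y[j]) / (t[j] - t[i])   # (3.173)
        x2 = (y[j] - y[i]) / (t[j] - t[i])
        G = [[2, t[i] + t[j]], [t[i] + t[j], t[i] ** 2 + t[j] ** 2]]  # (3.175)
        for a in range(2):
            for b in range(2):
                S[a][b] += G[a][b]
            r[a] += G[a][0] * x1 + G[a][1] * x2

d = S[0][0] * S[1][1] - S[0][1] * S[1][0]
xhat = [(S[1][1] * r[0] - S[0][1] * r[1]) / d,
        (-S[1][0] * r[0] + S[0][0] * r[1]) / d]
print([round(v, 6) for v in xhat])   # [0.5, 0.6]
```

The result coincides with the $I_4$-LESS solution $y(t)=0.5+0.6\,t$ of Table 3.3, as claimed.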
3-4 Special linear and nonlinear models 184
Table 3.5: A family of means

arithmetic mean:
$$x_A=\frac{1}{n}(y_1+\dots+y_n);\qquad n=2:\;x_A=\frac{1}{2}(y_1+y_2)$$

weighted arithmetic mean:
$$x_A=(\mathbf{1}'G_y\mathbf{1})^{-1}\mathbf{1}'G_y\mathbf{y};\qquad \text{if } G_y=\operatorname{Diag}(g_1,\dots,g_n)\text{ then } x_A=\frac{g_1y_1+\dots+g_ny_n}{g_1+\dots+g_n}$$

Wassel's family of means:
$$x_p=\frac{(y_1)^{p+1}+\dots+(y_n)^{p+1}}{(y_1)^{p}+\dots+(y_n)^{p}}=\sum_{i=1}^{n}(y_i)^{p+1}\Big/\sum_{i=1}^{n}(y_i)^{p}$$
Case $p=0$: $x_p=x_A$.
Case $p=-1/2$, $n=2$: $x_p=\sqrt{y_1y_2}$, the geometric mean.

Hellenic mean (case $p=-1$, $n=2$):
$$H=H(y_1,y_2)=\left(\frac{y_1^{-1}+y_2^{-1}}{2}\right)^{-1}=\frac{2y_1y_2}{y_1+y_2}.$$
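Wassel's one-parameter family interpolates between the classical means; a pure-Python sketch (the function name `wassel` is ours):

```python
# Wassel's family x_p = sum(y^(p+1)) / sum(y^p):
# p = 0 gives the arithmetic mean, p = -1 the Hellenic (harmonic) mean,
# and p = -1/2 with n = 2 the geometric mean.
def wassel(y, p):
    return sum(v ** (p + 1) for v in y) / sum(v ** p for v in y)

y = [2.0, 8.0]
print(wassel(y, 0))              # 5.0  (arithmetic mean)
print(wassel(y, -1))             # 3.2  (harmonic mean 2*y1*y2/(y1+y2))
print(round(wassel(y, -0.5), 6)) # 4.0  (geometric mean sqrt(y1*y2))
```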
3-5 A historical note on C.F. Gauss, A.M. Legendre and the inventions of Least Squares and its generalization
The historian S.M. Stigler (1999, pp. 320, 330-331) made the following comments on the history of Least Squares.
"The method of least squares is the automobile of modern statistical analysis: despite its limitations, occasional accidents, and incidental pollution, this method and its numerous variations, extensions, and related conveyances carry the bulk of statistical analyses, and are known and valued by nearly all. But there has been some dispute, historically, as to who is the Henry Ford of statistics. Adrien Marie Legendre published the method in 1805, an American, Robert Adrain, published the method in late 1808 or early 1809, and Carl Friedrich Gauss published the method in 1809. Legendre appears to have discovered the method in early 1805, and Robert Adrain may have "discovered" it in Legendre's 1805 book (Stigler 1977c, 1978c), but in 1809 Gauss had the temerity to claim that he had been using the method since 1795, and one of the most famous priority disputes in the history of science was off and running. It is unnecessary to repeat the details of the dispute – R.L. Plackett (1972) has done a masterly job of presenting and summarizing the evidence in the case.
Let us grant, then, that Gauss's later accounts were substantially accurate, and that he did devise the method of least squares between 1794 and 1799, independently of Legendre or any other discoverer. There still remains the question, what importance did he attach to the discovery? Here the answer must be that while Gauss himself may have felt the method useful, he was unsuccessful in communicating its importance to others before 1805. He may indeed have mentioned the method to Olbers, Lindenau, or von Zach before 1805, but the total lack of applications by others, despite ample opportunity, suggests the message was not understood. The fault may have been more in the listener than in the teller, but in this case its failure serves only to enhance our admiration for Legendre's 1805 success. For Legendre's description of the method had an immediate and widespread effect – as we have seen, it even caught the eye and understanding of at least one of those astronomers (Lindenau) who had been deaf to Gauss's message, and perhaps it also had an influence upon the form and emphasis of Gauss's exposition of the method.
When Gauss did publish on least squares, he went far beyond Legendre in both conceptual and technical development, linking the method to probability and providing algorithms for the computation of estimates. His work has been discussed often, including by H.L. Seal (1967), L. Eisenhart (1968), H. Goldstine (1977, §§ 4.9, 4.10), D.A. Sprott (1978), O.S. Sheynin (1979, 1993, 1994), S.M. Stigler (1986), J.L. Chabert (1989), W.C. Waterhouse (1990), G.W. Stewart (1995), and J. Dutka (1996). Gauss's development had to wait a long time before finding an appreciative audience, and much was intertwined with others' work, notably Laplace's. Gauss was the first among mathematicians of the age, but it was Legendre who crystallized the idea in a form that caught the mathematical public's eye. Just as the automobile was not the product of one man of genius, so too the method of least squares is due to many, including at least two independent discoverers. Gauss may well have been the first of these, but he was no Henry Ford of statistics. If there was any single scientist who first put the method within the reach of the common man, it was Legendre."
Indeed, there is not much to be added. G.W. Stewart (1995) recently translated the original Gauss text "Theoria Combinationis Observationum Erroribus Minimis Obnoxiae, Pars Prior, Pars Posterior" from the Latin original into English. F. Pukelsheim (1998) critically reviewed the sources, the reset Latin text and the quality of the translation. Since the English translation appeared in the SIAM series "Classics in Applied Mathematics", he concluded: "Opera Gaussii contra SIAM defensa".
"Calculus probabilitatis contra La Place defensus." This is Gauss's famous diary entry of 17 June 1798 that he later quoted to defend his priority on the Method of Least Squares (Werke, Band X, 1, p. 533).
C.F. Gauss goes Internet

Under the Internet address http://gallica.bnf.fr you may reach the catalogues of digital texts of the Bibliothèque Nationale de France. Fill the window "Auteur" with "Carl Friedrich Gauss" and you reach "Types de documents". Continue with "Tous les documents" and click "Rechercher", where you find 35 documents numbered 1 to 35. In total, 12732 "Gauss pages" are available. Only the Gauss-Gerling correspondence is missing. The origin of all texts is the resources of the Library of the École Polytechnique.
Meanwhile Gauss's Werke are also available under http://www.sub.uni-goettingen.de/. A CD-ROM is available from the "Niedersächsische Staats- und Universitätsbibliothek".
For the early impact of the Method of Least Squares on Geodesy, namely W. Jordan, we refer to the documentary by S. Nobre and M. Teixeira (2000).
4 The second problem of probabilistic regression
– special Gauss-Markov model without datum defect –
Setup of BLUUE for the moments of first order and of BIQUUE for the central moment of second order

Guide to this chapter:
Lemma 4.2: $\hat{\boldsymbol{\xi}}$, $\Sigma_y$-BLUUE of $\boldsymbol{\xi}$
Lemma 4.4: $E\{\hat{\mathbf{y}}\}$, $\Sigma_y$-BLUUE of $E\{\mathbf{y}\}$; $\tilde{\mathbf{e}}_y$, $D\{\tilde{\mathbf{e}}_y\}$, $D\{\mathbf{y}\}$
Theorem 4.5: equivalence of $\Sigma_y$-BLUUE and $G_y$-LESS
Corollary 4.6: multinomial inverse
Lemma 4.13: variance-covariance components: IQUUE
Corollary 4.14: translational invariance
Corollary 4.15: IQUUE of Helmert type: HIQUUE
Corollary 4.16: Helmert equation, $\det H\neq 0$
Corollary 4.17: Helmert equation, $\det H=0$
Lemma 4.20: Best IQUUE
Theorem 4.21: $\hat{\sigma}^2$, BIQUUE of $\sigma^2$
The key for evaluating "LESS" is handed over to us by treating the special algebraic regression problem by means of a special probabilistic regression problem, namely a special Gauss-Markov model without datum defect. We shall prove that uniformly unbiased estimations of the unknown parameters of type "fixed effects" exist. "LESS" is replaced by "BLUUE" (Best Linear Uniformly Unbiased Estimation). The fixed effects constitute the moments of first order of the underlying probability distributions of the observations to be specified. In contrast, their central moments of second order, known as the variance-covariance matrix or dispersion matrix, open the door to associating with the estimated fixed effects an objective accuracy measure.
Table 4.1: "direct" observations, data set one

number of    observation  difference of  difference of   ordered set of       ordered       ordered set
observation  y_i          observation    observation     observations y_(i)   |y_(i)-med y|  y_(i)-mean y
                          and mean       and median
1            15           +2             +2              11                   0             +2
2            12           -1             -1              12                   1             -1
3            14           +1             +1              13                   1             +1
4            11           -2             -2              14                   2             -1
5            13            0              0              15                   2              0

In contrast, Table 4.2 presents an augmented observation vector: observation six is an outlier. Again we have computed the new arithmetic mean, 30.16, as well as the standard deviation, 42.1. In addition, for both examples we have calculated the sample median and the median absolute deviation for comparison. All definitions will be given in context, together with a careful analysis of the two data sets.
$$\hat{\sigma}^2=\sum_{p,q=1}^{n}m_{pq}\,y_p y_q \qquad\text{or}\qquad \hat{\sigma}^2=\mathbf{y}'M\mathbf{y}=(\mathbf{y}'\otimes\mathbf{y}')(\operatorname{vec}M)=(\operatorname{vec}M)'(\mathbf{y}\otimes\mathbf{y})$$
First, we begin with the postulate that the fixed unknown parameters $(\mu,\sigma^2)$ are estimated by means of a certain linear form $\hat{\mu}=\boldsymbol{\lambda}'\mathbf{y}+\kappa=\mathbf{y}'\boldsymbol{\lambda}+\kappa$ and by means of a certain quadratic form $\hat{\sigma}^2=\mathbf{y}'M\mathbf{y}+\boldsymbol{\xi}'\mathbf{y}+\omega=(\operatorname{vec}M)'(\mathbf{y}\otimes\mathbf{y})+\boldsymbol{\xi}'\mathbf{y}+\omega$ of the observation vector $\mathbf{y}$, subject to the symmetry condition $M\in\mathrm{SYM}:=\{M\in\mathbb{R}^{n\times n}\mid M=M'\}$, namely the space of symmetric matrices. Second, we demand $E\{\hat{\mu}\}=\mu$, $E\{\hat{\sigma}^2\}=\sigma^2$, namely unbiasedness of the estimations $(\hat{\mu},\hat{\sigma}^2)$. Since the estimators $(\hat{\mu},\hat{\sigma}^2)$ are special forms of the observation vector $\mathbf{y}\in\mathbb{R}^n$, an intuitive understanding of the postulate of unbiasedness is the following: if the dimension of the observation space $\mathbf{Y}\ni\mathbf{y}$, $\dim\mathbf{Y}=n$, goes to infinity, we expect information about the "two values" $(\mu,\sigma^2)$, namely
$$\lim_{n\to\infty}\hat{\mu}(n)=\mu,\qquad \lim_{n\to\infty}\hat{\sigma}^2(n)=\sigma^2.$$
$$\sigma^2\hat{\boldsymbol{\lambda}}'+\mathbf{1}'_n\hat{\lambda}_0=\mathbf{0}'$$
$$\sigma^2\hat{\boldsymbol{\lambda}}'\mathbf{1}_n+\mathbf{1}'_n\mathbf{1}_n\hat{\lambda}_0=\sigma^2+\mathbf{1}'_n\mathbf{1}_n\hat{\lambda}_0=0$$
$$\hat{\lambda}_0=-\frac{\sigma^2}{\mathbf{1}'_n\mathbf{1}_n}=-\frac{\sigma^2}{n}$$
$$\sigma^2\hat{\boldsymbol{\lambda}}+\mathbf{1}_n\hat{\lambda}_0=\sigma^2\hat{\boldsymbol{\lambda}}-\frac{\sigma^2}{n}\mathbf{1}_n=\mathbf{0}$$
$$\hat{\boldsymbol{\lambda}}=\frac{1}{n}\mathbf{1}_n\quad\text{and}\quad\hat{\mu}=\hat{\boldsymbol{\lambda}}'\mathbf{y}=\frac{1}{n}\mathbf{1}'_n\mathbf{y}.$$
4-1 Introduction 195
The second derivatives
$$\frac{1}{2}\frac{\partial^2 L}{\partial\boldsymbol{\lambda}\partial\boldsymbol{\lambda}'}(\hat{\boldsymbol{\lambda}},\hat{\lambda}_0)=\sigma^2 I_n>0$$
constitute the sufficiency condition, which is automatically satisfied. The theory of vector differentiation is presented in detail in Appendix B. Let us briefly summarize the first result: $\hat{\mu}$, BLUUE of $\mu$,
$$\operatorname{tr}D=\operatorname{tr}(I_n)-\operatorname{tr}\big[\mathbf{1}_n(\mathbf{1}'_n\mathbf{1}_n)^{-1}\mathbf{1}'_n\big]=n-1$$
with respect to the linear model of "direct" observations. Under the action of such a transformation group, the quadratic estimation $\hat{\sigma}^2$ of $\sigma^2$ is specialized by
$$E\{\hat{\sigma}^2\mid\Sigma_y=I_n\sigma^2\}=\sigma^2\;\Longleftrightarrow\;\operatorname{tr}M=1,\quad M\mathbf{1}_n=\mathbf{0}.$$
$$\hat{\sigma}^2=\mathbf{y}'M\mathbf{y}=\frac{1}{n-1}\,\mathbf{y}'\Big(I_n-\frac{1}{n}\mathbf{1}_n\mathbf{1}'_n\Big)\mathbf{y}\qquad\text{IQUUE}$$
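This IQUUE quadratic form is exactly the classical sample variance with divisor n-1; a quick pure-Python check on data set one:

```python
# sigma^2-hat = y'(I - (1/n) 1 1') y / (n-1): the centering projector subtracts
# the mean, so the quadratic form reduces to the sum of squared deviations.
y = [15.0, 12.0, 14.0, 11.0, 13.0]
n = len(y)
mean = sum(y) / n
q = sum((yi - mean) ** 2 for yi in y) / (n - 1)   # the IQUUE value
print(mean, q)   # 13.0 2.5
```

With $\hat{\sigma}^2=2.5$, the r.m.s. value is $\sqrt{2.5}\approx 1.6$, the figure quoted later for data set one.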
2nd: $E\{\hat{\sigma}^4\}$
$$E\{\hat{\sigma}^4\}=m_{ij}m_{kl}\,E\{e_i^y e_j^y e_k^y e_l^y\}=m_{ij}m_{kl}\,\pi_{ijkl}$$
subject to
$$\pi_{ijkl}:=E\{e_i^y e_j^y e_k^y e_l^y\},\qquad i,j,k,l\in\{1,\dots,n\}.$$
For a normal pdf, the fourth-order moment $\pi_{ijkl}$ can be reduced to second-order moments. For a more detailed presentation of "normal models" we refer to Appendix D.
$$\pi_{ijkl}=\pi_{ij}\pi_{kl}+\pi_{ik}\pi_{jl}+\pi_{il}\pi_{jk}=\sigma^4(\delta_{ij}\delta_{kl}+\delta_{ik}\delta_{jl}+\delta_{il}\delta_{jk})$$
$$E\{\hat{\sigma}^4\}=\sigma^4\big[(\operatorname{tr}M)^2+2\operatorname{tr}M'M\big].$$
…which accounts for the symmetry of the unknown matrix. We conclude with the normal equations for BIQUUE, generated from
$$\frac{\partial L}{\partial(\operatorname{vec}M)}=0,\quad \frac{\partial L}{\partial\lambda_0}=0,\quad \frac{\partial L}{\partial\lambda_1}=0,\quad \frac{\partial L}{\partial\lambda_2}=0.$$
$$D\{\hat{\sigma}^2\}=\frac{2}{n-1}\sigma^4.$$
Finally, we are going to outline the simultaneous estimation of $\{\mu,\sigma^2\}$ for the linear model of direct observations.

• first postulate: inhomogeneous, multilinear (bilinear) estimation:
$$\hat{\mu}=\kappa_1+\boldsymbol{\lambda}'_1\mathbf{y}+\mathbf{m}'_1(\mathbf{y}\otimes\mathbf{y})$$
$$\hat{\sigma}^2=\kappa_2+\boldsymbol{\lambda}'_2\mathbf{y}+(\operatorname{vec}M_2)'(\mathbf{y}\otimes\mathbf{y})$$

$$\begin{bmatrix}\hat{\mu}\\\hat{\sigma}^2\end{bmatrix}=\begin{bmatrix}\kappa_1\\\kappa_2\end{bmatrix}+\begin{bmatrix}\boldsymbol{\lambda}'_1&\mathbf{m}'_1\\\boldsymbol{\lambda}'_2&(\operatorname{vec}M_2)'\end{bmatrix}\begin{bmatrix}\mathbf{y}\\\mathbf{y}\otimes\mathbf{y}\end{bmatrix}$$

$$\mathbf{x}=XY,\qquad \mathbf{x}:=\begin{bmatrix}\hat{\mu}\\\hat{\sigma}^2\end{bmatrix},\qquad X=\begin{bmatrix}\kappa_1&\boldsymbol{\lambda}'_1&\mathbf{m}'_1\\\kappa_2&\boldsymbol{\lambda}'_2&(\operatorname{vec}M_2)'\end{bmatrix},\qquad Y:=\begin{bmatrix}1\\\mathbf{y}\\\mathbf{y}\otimes\mathbf{y}\end{bmatrix}$$
4-13 BLUUE and BIQUUE of the front page example, sample median, median absolute deviation

According to Table 4.1 and Table 4.2 we presented you with two sets of observations $y_i\in\mathbf{Y}$, $\dim\mathbf{Y}=n$, $i\in\{1,\dots,n\}$; the second one qualifies as containing "one outlier". We aim at a definition of the median and of the median absolute deviation, which is compared to the definition of the mean (weighted mean) and of the root mean square error. First, we order the observations according to $y_{(1)}<y_{(2)}<\dots<y_{(n-1)}<y_{(n)}$ by means of a permutation matrix:
$$\begin{bmatrix}y_{(1)}\\y_{(2)}\\\vdots\\y_{(n-1)}\\y_{(n)}\end{bmatrix}=P\begin{bmatrix}y_1\\y_2\\\vdots\\y_{n-1}\\y_n\end{bmatrix},$$
namely

data set one:
$$\begin{bmatrix}11\\12\\13\\14\\15\end{bmatrix}=P_5\begin{bmatrix}15\\12\\14\\11\\13\end{bmatrix}
\qquad\text{versus}\qquad\text{data set two:}\quad
\begin{bmatrix}11\\12\\13\\14\\15\\116\end{bmatrix}=P_6\begin{bmatrix}15\\12\\14\\11\\13\\116\end{bmatrix}$$

$$P_5=\begin{bmatrix}0&0&0&1&0\\0&1&0&0&0\\0&0&0&0&1\\0&0&1&0&0\\1&0&0&0&0\end{bmatrix}
\qquad\text{versus}\qquad
P_6=\begin{bmatrix}0&0&0&1&0&0\\0&1&0&0&0&0\\0&0&0&0&1&0\\0&0&1&0&0&0\\1&0&0&0&0&0\\0&0&0&0&0&1\end{bmatrix}.$$
Table 4.3: "direct" observations, comparison of the two data sets by means of med y, mad y (I-LESS, $G_y$-LESS), r.m.s. (I-BIQUUE)

data set one: n = 5 ("odd"); n/2 = 2.5, [n/2] = 2, [n/2] + 1 = 3; med y = y_(3) = 13
data set two: n = 6 ("even"); n/2 = 3, n/2 + 1 = 4; med y = 13.5

$$G_y=\operatorname{Diag}\Big(1,1,1,1,1,\frac{24}{1000}\Big)$$

such that
$$x=\frac{y_1+y_2+y_3+y_4+y_5-5\,\text{med } y}{\text{med } y-y_6},$$
here
$$x=0.024390243\ldots\approx\frac{24}{1000}.$$
Indeed the weighted mean with respect to the weight matrix $G_y=\operatorname{Diag}(1,1,1,1,1,24/1000)$ reproduces the median of the second data set. The extreme value has been down-weighted by a weight of approximately 24/1000.
Fourth, with respect to the simple linear model $E\{\mathbf{y}\}=\mathbf{1}\mu$, $D\{\mathbf{y}\}=I\sigma^2$, we compute I-BLUUE of $\mu$ and I-BIQUUE of $\sigma^2$, namely
$$\hat{\mu}=(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}'\mathbf{y}=\frac{1}{n}\mathbf{1}'\mathbf{y}$$
$$\hat{\sigma}^2=\frac{1}{n-1}\mathbf{y}'\big[I-\mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}'\big]\mathbf{y}=\frac{1}{n-1}\mathbf{y}'\Big[I-\frac{1}{n}\mathbf{1}\mathbf{1}'\Big]\mathbf{y}=\frac{1}{n-1}(\mathbf{y}-\mathbf{1}\hat{\mu})'(\mathbf{y}-\mathbf{1}\hat{\mu}).$$
Obviously the extreme value $y_6$ in the second data set has spoiled the specification of the simple linear model. The r.m.s. (I-BIQUUE) = 1.6 of the first data set is increased to the r.m.s. (I-BIQUUE) = 42.1 of the second data set.
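The contrast between the moment-based and order-based statistics for both data sets can be reproduced directly; a pure-Python sketch (here `mad` is the raw median of absolute deviations, without any consistency factor):

```python
# I-BLUUE mean and I-BIQUUE r.m.s. versus median and median absolute deviation
# for data set one and the outlier-contaminated data set two.
def median(v):
    s = sorted(v)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

for y in ([15, 12, 14, 11, 13], [15, 12, 14, 11, 13, 116]):
    n = len(y)
    mu = sum(y) / n                                  # I-BLUUE of mu
    s2 = sum((v - mu) ** 2 for v in y) / (n - 1)     # I-BIQUUE of sigma^2
    med = median(y)
    mad = median([abs(v - med) for v in y])
    # data set one: 13.0 1.6 13 1 ; data set two: 30.17 42.1 13.5 1.5
    print(round(mu, 2), round(s2 ** 0.5, 1), med, mad)
```

The outlier inflates the mean and the r.m.s. dramatically, while the median moves only from 13 to 13.5, the robustness point made in the text.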
Fifth, we set up the alternative linear model for the second data set, namely
$$E\Bigg\{\begin{bmatrix}y_1\\y_2\\y_3\\y_4\\y_5\\y_6\end{bmatrix}\Bigg\}=\begin{bmatrix}\mu_1\\\mu_1\\\mu_1\\\mu_1\\\mu_1\\\mu_2\end{bmatrix}=\begin{bmatrix}\mu\\\mu\\\mu\\\mu\\\mu\\\mu+\nu\end{bmatrix}=\begin{bmatrix}1\\1\\1\\1\\1\\1\end{bmatrix}\mu+\begin{bmatrix}0\\0\\0\\0\\0\\1\end{bmatrix}\nu$$

$$E\{\mathbf{y}\}=A\boldsymbol{\xi}:\quad A:=[\mathbf{1},\mathbf{a}]\in\mathbb{R}^{6\times2},\quad \mathbf{1}:=[1,1,1,1,1,1]'\in\mathbb{R}^{6\times1},$$
$$\boldsymbol{\xi}:=[\mu,\nu]'\in\mathbb{R}^{2\times1},\quad \mathbf{a}:=[0,0,0,0,0,1]'\in\mathbb{R}^{6\times1}$$

$$D\{\mathbf{y}\}=I_6\sigma^2\in\mathbb{R}^{6\times6},\quad \sigma^2\in\mathbb{R}^+,$$

adding to the observation $y_6$ the bias term $\nu$. Still we assume the variance-covariance matrix $D\{\mathbf{y}\}$ of the observation vector $\mathbf{y}\in\mathbb{R}^{6\times1}$ to be isotropic, with one variance component as an unknown. $(\hat{\mu},\hat{\nu})$ is $I_6$-BLUUE if
$$\begin{bmatrix}\hat{\mu}\\\hat{\nu}\end{bmatrix}=(A'A)^{-1}A'\mathbf{y}$$

$$\begin{bmatrix}\hat{\mu}\\\hat{\nu}\end{bmatrix}=\begin{bmatrix}13\\103\end{bmatrix}$$

$$\hat{\mu}=13,\quad \hat{\nu}=103,\quad \hat{\mu}_1=\hat{\mu}=13,\quad \hat{\mu}_2=\hat{\mu}+\hat{\nu}=116$$

$$D\Big\{\begin{bmatrix}\hat{\mu}\\\hat{\nu}\end{bmatrix}\Big\}=(A'A)^{-1}\sigma^2=\frac{\sigma^2}{5}\begin{bmatrix}1&-1\\-1&6\end{bmatrix}$$

$$\sigma^2_{\hat{\mu}}=\frac{\sigma^2}{5},\qquad \sigma^2_{\hat{\nu}}=\frac{6}{5}\sigma^2,\qquad \sigma_{\hat{\mu}\hat{\nu}}=-\frac{1}{5}\sigma^2$$

$\hat{\sigma}^2$ is $I_6$-BIQUUE if
$$\hat{\sigma}^2=\frac{1}{n-\operatorname{rk}A}\,\mathbf{y}'\big[I_6-A(A'A)^{-1}A'\big]\mathbf{y}$$

$$I_6-A(A'A)^{-1}A'=\frac{1}{5}\begin{bmatrix}4&-1&-1&-1&-1&0\\-1&4&-1&-1&-1&0\\-1&-1&4&-1&-1&0\\-1&-1&-1&4&-1&0\\-1&-1&-1&-1&4&0\\0&0&0&0&0&5\end{bmatrix}$$

$$r_i:=\big[I_6-A(A'A)^{-1}A'\big]_{ii}=\Big(\frac45,\frac45,\frac45,\frac45,\frac45,1\Big),\quad i\in\{1,\dots,6\}$$

are the redundancy numbers.

$$\mathbf{y}'\big(I_6-A(A'A)^{-1}A'\big)\mathbf{y}=13466$$
$$\hat{\sigma}^2=\frac{1}{4}\,13466=3366.5,\qquad \hat{\sigma}=58.02$$
$$\sigma^2_{\hat{\mu}}(\hat{\sigma}^2)=\frac{3366.5}{5}=673.3,\qquad \sigma_{\hat{\mu}}(\hat{\sigma})=26$$
$$\sigma^2_{\hat{\nu}}(\hat{\sigma}^2)=\frac{6}{5}\,3366.5=4039.8,\qquad \sigma_{\hat{\nu}}(\hat{\sigma})=63.6.$$

Indeed, the r.m.s. values of the partial mean $\hat{\mu}$ as well as of the estimated bias $\hat{\nu}$ have changed the results remarkably, namely from r.m.s. (simple linear model) 42.1 to r.m.s. (linear model) 26. An r.m.s. value of the bias $\hat{\nu}$ in the order of 63.6 has been documented. Finally, let us compute the empirical "error vector" $\tilde{\mathbf{e}}_y$ and its variance-covariance matrix by means of
$$\tilde{\mathbf{e}}_y=[2,\,-1,\,1,\,-2,\,0,\,116]'$$

$$D\{\tilde{\mathbf{e}}_y\}=\frac{\sigma^2}{5}\begin{bmatrix}4&-1&-1&-1&-1&0\\-1&4&-1&-1&-1&0\\-1&-1&4&-1&-1&0\\-1&-1&-1&4&-1&0\\-1&-1&-1&-1&4&0\\0&0&0&0&0&5\end{bmatrix}$$

$$D\{\tilde{\mathbf{e}}_y\mid\hat{\sigma}^2\}=673.3\begin{bmatrix}4&-1&-1&-1&-1&0\\-1&4&-1&-1&-1&0\\-1&-1&4&-1&-1&0\\-1&-1&-1&4&-1&0\\-1&-1&-1&-1&4&0\\0&0&0&0&0&5\end{bmatrix}.$$
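The BLUUE pair $(\hat{\mu},\hat{\nu})=(13,103)$ follows from a 2×2 normal-equation system; a pure-Python sketch of just that step (variable names ours):

```python
# BLUUE in the augmented model E{y} = 1*mu + a*nu with a = (0,0,0,0,0,1)':
# solve the 2x2 normal equations A'A xi = A'y by Cramer's rule.
y = [15.0, 12.0, 14.0, 11.0, 13.0, 116.0]
n = len(y)
N = [[n, 1.0], [1.0, 1.0]]                  # A'A for A = [1, a]
c = [sum(y), y[-1]]                         # A'y
d = N[0][0] * N[1][1] - N[0][1] * N[1][0]   # det A'A = 5
mu = (N[1][1] * c[0] - N[0][1] * c[1]) / d
nu = (-N[1][0] * c[0] + N[0][0] * c[1]) / d
print(mu, nu)   # 13.0 103.0
```

The bias parameter soaks up the outlier: $\hat{\mu}$ returns to the mean of the first five observations, and $\hat{\mu}_2=\hat{\mu}+\hat{\nu}=116$ reproduces $y_6$.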
$$\ell=-\frac{n}{2}\ln 2\pi-\frac{n}{2}\ln\sigma^2-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2=\max_{\mu,\sigma^2}$$
$$\ln f\big(y_1,\dots,y_n\mid\mu,\sigma^2\big)=-\frac{n}{2}\ln 2\pi-\frac{n}{2}\ln\sigma^2-\frac{n}{2\sigma^2}\big(m_2-2m_1\mu+\mu^2\big).$$
Now we are able to define the optimization problem
$$\ell(\mu,\sigma^2):=\ln f\big(y_1,\dots,y_n\mid\mu,\sigma^2\big)=\max_{\mu,\sigma^2}$$
more precisely.
Definition (Maximum Likelihood Estimation, linear model $E\{\mathbf{y}\}=\mathbf{1}_n\mu$, $D\{\mathbf{y}\}=I_n\sigma^2$, independent, identically normally distributed observations $\{y_1,\dots,y_n\}$):
A 2×1 vector $[\mu_\ell,\sigma_\ell^2]'$ is called MALE of $[\mu,\sigma^2]'$ (Maximum Likelihood Estimation) with respect to the linear model 0.1 if its log-likelihood function
$$\ell(\mu,\sigma^2):=\ln f\big(y_1,\dots,y_n\mid\mu,\sigma^2\big)=-\frac{n}{2}\ln 2\pi-\frac{n}{2}\ln\sigma^2-\frac{n}{2\sigma^2}\big(m_2-2m_1\mu+\mu^2\big)$$
is maximal.
The simultaneous estimation of (μ, σ²) of type MALE can be characterized as follows.

Corollary (MALE with respect to the linear model E{y} = 1ₙμ, D{y} = Iₙσ², independent, identically normally distributed observations {y₁,…,yₙ}):
The log-likelihood function 𝓛(μ, σ²) with respect to the linear model E{y} = 1ₙμ, D{y} = Iₙσ², (μ, σ²) ∈ ℝ × ℝ⁺, of independent, identically normally distributed observations {y₁,…,yₙ} is maximal if
$$\mu_{\ell}=m_{1}=\frac{1}{n}\mathbf 1'\mathbf y,\qquad \sigma_{\ell}^{2}=m_{2}-m_{1}^{2}=\frac{1}{n}(\mathbf y-\mathbf y_{\ell})'(\mathbf y-\mathbf y_{\ell})$$
is the simultaneous estimation of the mean value (first moment) μ_ℓ and of the variance (second moment) σ²_ℓ.
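The closed-form MALE estimates can be verified numerically; the five observations below are an assumed sample chosen for illustration, and the check confirms that σ²_ℓ = m₂ − m₁² coincides with the 1/n-scaled sum of squared residuals.

```python
import numpy as np

# MALE for the model E{y} = 1_n * mu, D{y} = I_n * sigma^2:
# mu-hat = m1 and sigma^2-hat = m2 - m1^2 (note the 1/n divisor,
# not the unbiased 1/(n-1)).  The sample is an assumption.
y = np.array([13.0, 11.0, 15.0, 12.0, 14.0])
n = len(y)

m1 = y.sum() / n              # first sample moment
m2 = (y ** 2).sum() / n       # second sample moment

mu_hat = m1
sigma2_hat = m2 - m1 ** 2     # equals (1/n) * ||y - 1*mu_hat||^2
```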
:Proof:
The Lagrange function
$$\mathcal L(\mu,\sigma^{2}):=-\frac{n}{2}\ln\sigma^{2}-\frac{n}{2\sigma^{2}}(m_{2}-2m_{1}\mu+\mu^{2})=\max_{\mu,\sigma^{2}}$$
leads to the necessary conditions
$$\frac{\partial\mathcal L}{\partial\mu}(\mu,\sigma^{2})=\frac{n\,m_{1}}{\sigma^{2}}-\frac{n\,\mu}{\sigma^{2}}=0$$
$$\frac{\partial\mathcal L}{\partial\sigma^{2}}(\mu,\sigma^{2})=-\frac{n}{2\sigma^{2}}+\frac{n}{2\sigma^{4}}(m_{2}-2\mu m_{1}+\mu^{2})=0,$$
also called the likelihood normal equations. Their solution is
$$\begin{bmatrix}\mu_{\ell}\\ \sigma_{\ell}^{2}\end{bmatrix}=\begin{bmatrix}m_{1}\\ m_{2}-m_{1}^{2}\end{bmatrix}=\frac{1}{n}\begin{bmatrix}\mathbf 1'\mathbf y\\ \mathbf y'\mathbf y-\frac{1}{n}(\mathbf 1'\mathbf y)^{2}\end{bmatrix}.$$
The matrix of second derivatives, being negative definite, constitutes the sufficiency condition:
$$\frac{\partial^{2}\mathcal L}{\partial(\mu,\sigma^{2})\,\partial(\mu,\sigma^{2})'}(\mu_{\ell},\sigma_{\ell}^{2})=-n\begin{bmatrix}\dfrac{1}{\sigma_{\ell}^{2}} & 0\\ 0 & \dfrac{1}{2\sigma_{\ell}^{4}}\end{bmatrix}<0\,.$$
h
Finally, we can immediately check that 𝓛(μ, σ²) → −∞ as (μ, σ²) approaches the boundary of the parameter space. If the log-likelihood function is sufficiently regular, we can expand it as
$$\mathcal L(\mu,\sigma^{2})=\mathcal L(\mu_{\ell},\sigma_{\ell}^{2})+D\mathcal L(\mu_{\ell},\sigma_{\ell}^{2})\begin{bmatrix}\mu-\mu_{\ell}\\ \sigma^{2}-\sigma_{\ell}^{2}\end{bmatrix}+\frac{1}{2}\begin{bmatrix}\mu-\mu_{\ell}\\ \sigma^{2}-\sigma_{\ell}^{2}\end{bmatrix}'D^{2}\mathcal L(\mu_{\ell},\sigma_{\ell}^{2})\begin{bmatrix}\mu-\mu_{\ell}\\ \sigma^{2}-\sigma_{\ell}^{2}\end{bmatrix}+O_{3}\,.$$
Due to the likelihood normal equations, D𝓛(μ_ℓ, σ²_ℓ) vanishes. Therefore the behavior of 𝓛(μ, σ²) near (μ_ℓ, σ²_ℓ) is largely determined by D²𝓛(μ_ℓ, σ²_ℓ), which is a measure of the local curvature of the log-likelihood function 𝓛(μ, σ²). The non-negative Hesse matrix of second derivatives
$$\mathbf I(\mu_{\ell},\sigma_{\ell}^{2})=-\frac{\partial^{2}\mathcal L}{\partial(\mu,\sigma^{2})\,\partial(\mu,\sigma^{2})'}(\mu_{\ell},\sigma_{\ell}^{2})>0$$
                        μ_ℓ       σ²_ℓ       |σ_ℓ|
1st example (n = 5)     13        2          2
2nd example (n = 6)     30.16     1474.65    36.40
$$\hat{\boldsymbol\xi}=\mathbf L\mathbf y+\boldsymbol\kappa\,, \tag{4.3}$$
4-2 Setup of the best linear uniformly unbiased estimators 209
Note that there are locally unbiased estimations such that (LA − I_m)ξ₀ = 0 for LA − I_m ≠ 0. Alternatively, B. Schaffrin (2000) has softened the constraint of unbiasedness (4.9) by replacing it by the stochastic matrix constraint A'L' = I_m + E₀ subject to E{vec E₀} = 0, D{vec E₀} = (I_m ⊗ Σ₀), Σ₀ a positive definite matrix. For Σ₀ → 0, uniform unbiasedness is restored. Estimators which fulfill the stochastic matrix constraint A'L' = I_m + E₀ for finite Σ₀ are called "softly unbiased" or "unbiased in the mean".
Second, the choice of norm for "best" of type minimum variance has to be discussed more specifically. Under the condition of a linear uniformly unbiased estimation, let us derive the specific representation of the weighted Frobenius matrix norm of L'. Indeed, let us define the dispersion matrix
for Σ_y-BLUUE. The necessary conditions for the minimum of the quadratic constraint Lagrangean 𝓛(L, Λ) are
$$\frac{\partial\mathcal L}{\partial\mathbf L}(\hat{\mathbf L},\hat{\boldsymbol\Lambda}):=2(\boldsymbol\Sigma_{y}\hat{\mathbf L}'+\mathbf A\hat{\boldsymbol\Lambda})'=\mathbf 0 \tag{4.18}$$
$$\frac{\partial\mathcal L}{\partial\boldsymbol\Lambda}(\hat{\mathbf L},\hat{\boldsymbol\Lambda}):=2(\mathbf A'\hat{\mathbf L}'-\mathbf I_{m})=\mathbf 0\,, \tag{4.19}$$
which agree with the normal equations (4.14). The theory of matrix derivatives is reviewed in Appendix B, namely (d3) and (d4).
The second derivatives
$$\frac{\partial^{2}\mathcal L}{\partial(\mathrm{vec}\,\mathbf L)\,\partial(\mathrm{vec}\,\mathbf L)'}(\hat{\mathbf L},\hat{\boldsymbol\Lambda})=2(\boldsymbol\Sigma_{y}\otimes\mathbf I_{m})>0 \tag{4.20}$$
constitute the sufficiency conditions for 𝓛(L, Λ) = min, due to the positive-definiteness of the matrix Σ_y. (The Kronecker–Zehfuss product A ⊗ B of two arbitrary matrices A and B is explained in Appendix A.)
h
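The Kronecker structure of the second derivatives (4.20) rests on the identity tr(LΣL') = (vec L)'(Σ ⊗ I_m)(vec L). A small numerical check, with assumed sizes m = 2, n = 3 and a randomly generated positive-definite Σ, can be sketched as:

```python
import numpy as np

# Verify tr(L Sigma L') = (vec L)' (Sigma kron I_m) (vec L), the identity
# behind the Hessian 2(Sigma kron I_m) in (4.20).  Sizes and the random
# matrices are assumptions for illustration.
rng = np.random.default_rng(1)
m, n = 2, 3
S = rng.normal(size=(n, n))
Sigma = S @ S.T + n * np.eye(n)          # a positive-definite "Sigma_y"
L = rng.normal(size=(m, n))
vecL = L.reshape(-1, order="F")          # column stacking: vec L

lhs = np.trace(L @ Sigma @ L.T)
rhs = vecL @ np.kron(Sigma, np.eye(m)) @ vecL

# positive-definiteness of the Hessian 2(Sigma kron I_m)
eigmin = np.linalg.eigvalsh(2 * np.kron(Sigma, np.eye(m))).min()
```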
Obviously, a homogeneous linear form ξ̂ = Ly is sufficient to generate Σ_y-BLUUE for the special Gauss–Markov model (4.1), (4.2). Explicit representations of Σ_y-BLUUE of type ξ̂, as well as of its dispersion matrix D{ξ̂ | ξ̂ Σ_y-BLUUE}, generated by solving the normal equations (4.14), are collected in

Theorem 4.3 (ξ̂ Σ_y-BLUUE of ξ):
Let ξ̂ = Ly be Σ_y-BLUUE of ξ in the special linear Gauss–Markov model (4.1), (4.2). Then
$$\hat{\boldsymbol\xi}=(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf y \tag{4.21}$$
:Proof:
We shall present two proofs of the above theorem: the first one is based upon Gauss elimination in solving the normal equations (4.14), the second one uses the power of the IPM method (Inverse Partitioned Matrix, C. R. Rao's Pandora Box).
(i) forward step (Gauss elimination):
Multiply the first normal equation by Σ_y⁻¹, multiply the reduced equation by A' and subtract the result from the second normal equation. Solve for Λ:
$$\left.\begin{aligned}\mathbf A'\hat{\mathbf L}'+\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A\hat{\boldsymbol\Lambda}&=\mathbf 0\\ \mathbf A'\hat{\mathbf L}'&=\mathbf I_{m}\end{aligned}\right\}\;\Rightarrow\;\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A\hat{\boldsymbol\Lambda}=-\mathbf I_{m}\;\Rightarrow\;\hat{\boldsymbol\Lambda}=-(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}. \tag{4.23}$$
(ii) backward step:
$$\left.\begin{aligned}\hat{\mathbf L}'+\boldsymbol\Sigma_{y}^{-1}\mathbf A\hat{\boldsymbol\Lambda}=\mathbf 0\;&\Rightarrow\;\hat{\mathbf L}=-\hat{\boldsymbol\Lambda}'\mathbf A'\boldsymbol\Sigma_{y}^{-1}\\ \hat{\boldsymbol\Lambda}&=-(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\end{aligned}\right\}\;\Rightarrow\;\hat{\mathbf L}=(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'\boldsymbol\Sigma_{y}^{-1}.$$
(iii) Inverse Partitioned Matrix (IPM):
$$\begin{bmatrix}\boldsymbol\Sigma_{y} & \mathbf A\\ \mathbf A' & \mathbf 0\end{bmatrix}=\begin{bmatrix}\mathbf A_{11} & \mathbf A_{12}\\ \mathbf A_{12}' & \mathbf 0\end{bmatrix}.$$
According to Appendix A (Fact on Inverse Partitioned Matrix: IPM) its Cayley inverse is partitioned as well:
$$\begin{bmatrix}\boldsymbol\Sigma_{y} & \mathbf A\\ \mathbf A' & \mathbf 0\end{bmatrix}^{-1}=\begin{bmatrix}\mathbf A_{11} & \mathbf A_{12}\\ \mathbf A_{12}' & \mathbf 0\end{bmatrix}^{-1}=\begin{bmatrix}\mathbf B_{11} & \mathbf B_{12}\\ \mathbf B_{12}' & \mathbf B_{22}\end{bmatrix}$$
$$\hat{\mathbf L}=\mathbf B_{12}'=(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'\boldsymbol\Sigma_{y}^{-1}$$
$$\hat{\boldsymbol\Lambda}=\mathbf B_{22}=-(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}. \tag{4.25}$$
(iv) dispersion matrix:
The related dispersion matrix is computed by means of the "Error Propagation Law":
$$D\{\hat{\boldsymbol\xi}\}=D\{\mathbf L\mathbf y\mid\hat{\mathbf L}=(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'\boldsymbol\Sigma_{y}^{-1}\}=\hat{\mathbf L}\,D\{\mathbf y\}\,\hat{\mathbf L}'$$
$$D\{\hat{\boldsymbol\xi}\}=(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'\boldsymbol\Sigma_{y}^{-1}\boldsymbol\Sigma_{y}\boldsymbol\Sigma_{y}^{-1}\mathbf A(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}=(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}.$$
:Proof:
$$(i)\;\;\hat{\mathbf y}=\mathbf A\hat{\boldsymbol\xi}=\mathbf A(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf y,\qquad (ii)\;\;D\{\mathbf A\hat{\boldsymbol\xi}\}=\mathbf A(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'.$$
As soon as we substitute Σ_y-BLUUE,
$$E\{\tilde{\mathbf e}_{y}\}=[\mathbf I_{n}-\mathbf A(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'\boldsymbol\Sigma_{y}^{-1}]\,E\{\mathbf y\}.$$
The additive decomposition of the residual vector y − E{y} left us with two terms, namely the predicted residual vector ẽ_y and a term which is a linear functional of ξ̂ − ξ. The corresponding product decomposition
$$[\mathbf y-E\{\mathbf y\}][\mathbf y-E\{\mathbf y\}]'=\mathbf A(\hat{\boldsymbol\xi}-\boldsymbol\xi)(\hat{\boldsymbol\xi}-\boldsymbol\xi)'\mathbf A'+\mathbf A(\hat{\boldsymbol\xi}-\boldsymbol\xi)\tilde{\mathbf e}_{y}'+\tilde{\mathbf e}_{y}(\hat{\boldsymbol\xi}-\boldsymbol\xi)'\mathbf A'+\tilde{\mathbf e}_{y}\tilde{\mathbf e}_{y}'$$
leads to
$$C\{\mathbf A\hat{\boldsymbol\xi},\tilde{\mathbf e}_{y}\}=\mathbf A(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'\boldsymbol\Sigma_{y}^{-1}\,E\{(\mathbf y-E\{\mathbf y\})(\mathbf y-E\{\mathbf y\})'\}\,[\mathbf I_{n}-\mathbf A(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'\boldsymbol\Sigma_{y}^{-1}]'$$
$$C\{\mathbf A\hat{\boldsymbol\xi},\tilde{\mathbf e}_{y}\}=\mathbf A(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'\boldsymbol\Sigma_{y}^{-1}\boldsymbol\Sigma_{y}[\mathbf I_{n}-\boldsymbol\Sigma_{y}^{-1}\mathbf A(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A']$$
$$C\{\mathbf A\hat{\boldsymbol\xi},\tilde{\mathbf e}_{y}\}=\mathbf A(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'-\mathbf A(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'=\mathbf 0.$$
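The estimator (4.21), its dispersion matrix, and the vanishing covariance C{Aξ̂, ẽ_y} = 0 can be verified numerically. The design matrix, Σ_y, and the data below are assumptions for illustration.

```python
import numpy as np

# Sigma_y-BLUUE: xi-hat = (A' Si A)^{-1} A' Si y with Si = Sigma_y^{-1},
# dispersion D{xi-hat} = (A' Si A)^{-1}, and C{A xi-hat, e_y-tilde} = 0.
rng = np.random.default_rng(2)
n, m = 6, 2
A = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # assumed design
S = rng.normal(size=(n, n))
Sigma = S @ S.T + n * np.eye(n)                               # assumed SPD Sigma_y
Si = np.linalg.inv(Sigma)
y = rng.normal(size=n)                                        # assumed data

N = A.T @ Si @ A
xi_hat = np.linalg.solve(N, A.T @ Si @ y)
D_xi = np.linalg.inv(N)                       # dispersion of the estimator

# L-hat = N^{-1} A' Si; covariance of fitted values with residuals:
Lhat = np.linalg.solve(N, A.T @ Si)
C = A @ Lhat @ Sigma @ (np.eye(n) - A @ Lhat).T
```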
The proof is straightforward if we compare the solution (3.11) of G_y-LESS and (4.21) of Σ_y-BLUUE. Obviously, the inverse dispersion matrix D{y}, y ∈ {Y, pdf}, is equivalent to the matrix of the metric G_y of the observation space Y. Or, conversely, the inverse matrix of the metric of the observation space Y determines the variance-covariance matrix D{y} = Σ_y of the random variable y ∈ {Y, pdf}.
4-3 Setup of BIQUUE 217
dispersion vector (4.39), (4.44). Indeed, the dispersion vector d(y) = Xσ builds up a linear form where the second-order design matrix X, namely
$$\mathbf X:=[\mathrm{vec}\,\mathbf C_{1},\dots,\mathrm{vec}\,\mathbf C_{\ell(\ell+1)/2}]\in\mathbb R^{n^{2}\times\ell(\ell+1)/2},$$
reflects the block structure. There are ℓ(ℓ+1)/2 matrices C_j, j ∈ {1,…, ℓ(ℓ+1)/2}. For instance, for ℓ = 2 we are left with 3 block matrices {C₁, C₂, C₃}.
Before we analyze the variance-covariance component model in more detail, we briefly mention the multinomial inverse Σ⁻¹ of the block partitioned matrix Σ. For instance, by "IPM" and "SCHUR" we gain the block partitioned inverse matrix Σ⁻¹ with elements {U₁₁, U₁₂, U₂₂} (4.51)–(4.54) derived from the block partitioned matrix Σ with elements {V₁₁, V₁₂, V₂₂} (4.47). "Sequential IPM" solves the block inverse problem for any block partitioned matrix. With reference to Box 4.2 and Box 4.3,
$$\boldsymbol\Sigma=\mathbf C_{1}\sigma_{1}+\mathbf C_{2}\sigma_{2}+\mathbf C_{3}\sigma_{3}\;\Rightarrow\;\boldsymbol\Sigma^{-1}=\mathbf E_{1}(\boldsymbol\sigma)+\mathbf E_{2}(\boldsymbol\sigma)+\mathbf E_{3}(\boldsymbol\sigma)$$
is an example.
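The assembly Σ = C₁σ₁² + C₂σ₁₂ + C₃σ₂² and the relation vec Σ = Xσ with the second-order design matrix X can be sketched as follows; the block sizes and component values are assumptions for illustration.

```python
import numpy as np

# Build Sigma from the block matrices C1, C2, C3 of Box 4.3 and check
# vec Sigma = X * sigma (eq. 4.49).  n1, n2, V-blocks and sigma are assumed.
n1, n2 = 2, 2
V11, V22 = np.eye(n1), np.eye(n2)
V12 = 0.5 * np.ones((n1, n2))
Z12 = np.zeros((n1, n2))

C1 = np.block([[V11, Z12], [Z12.T, np.zeros((n2, n2))]])
C2 = np.block([[np.zeros((n1, n1)), V12], [V12.T, np.zeros((n2, n2))]])
C3 = np.block([[np.zeros((n1, n1)), Z12], [Z12.T, V22]])

sigma = np.array([2.0, 0.3, 1.5])        # [sigma1^2, sigma12, sigma2^2]
Sigma = C1 * sigma[0] + C2 * sigma[1] + C3 * sigma[2]

# second-order design matrix: columns are vec C_j
X = np.column_stack([Cj.reshape(-1, order="F") for Cj in (C1, C2, C3)])
```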
Box 4.2
Partitioning of the variance-covariance matrix
$$\boldsymbol\Sigma=\begin{bmatrix}\mathbf V_{11}\sigma_{1}^{2} & \mathbf V_{12}\sigma_{12} & \cdots & \mathbf V_{1,\ell-1}\sigma_{1,\ell-1} & \mathbf V_{1\ell}\sigma_{1\ell}\\ \mathbf V_{12}'\sigma_{12} & \mathbf V_{22}\sigma_{2}^{2} & \cdots & \mathbf V_{2,\ell-1}\sigma_{2,\ell-1} & \mathbf V_{2\ell}\sigma_{2\ell}\\ \vdots & \vdots & & \vdots & \vdots\\ \mathbf V_{1,\ell-1}'\sigma_{1,\ell-1} & \mathbf V_{2,\ell-1}'\sigma_{2,\ell-1} & \cdots & \mathbf V_{\ell-1,\ell-1}\sigma_{\ell-1}^{2} & \mathbf V_{\ell-1,\ell}\sigma_{\ell-1,\ell}\\ \mathbf V_{1\ell}'\sigma_{1\ell} & \mathbf V_{2\ell}'\sigma_{2\ell} & \cdots & \mathbf V_{\ell-1,\ell}'\sigma_{\ell-1,\ell} & \mathbf V_{\ell\ell}\sigma_{\ell}^{2}\end{bmatrix}>0 \tag{4.38}$$
$$\boldsymbol\Sigma=\sum_{j=1}^{\ell(\ell+1)/2}\mathbf C_{j}\sigma_{j}\in\mathbb R^{n\times n} \tag{4.42}$$
Box 4.3
Multinomial inverse
:Input:
$$\mathbf C_{11}:=\mathbf C_{1}:=\begin{bmatrix}\mathbf V_{11} & \mathbf 0\\ \mathbf 0 & \mathbf 0\end{bmatrix},\quad \mathbf C_{12}:=\mathbf C_{2}:=\begin{bmatrix}\mathbf 0 & \mathbf V_{12}\\ \mathbf V_{12}' & \mathbf 0\end{bmatrix},\quad \mathbf C_{22}:=\mathbf C_{3}:=\begin{bmatrix}\mathbf 0 & \mathbf 0\\ \mathbf 0 & \mathbf V_{22}\end{bmatrix} \tag{4.47}$$
$$\boldsymbol\Sigma=\mathbf C_{11}\sigma_{1}^{2}+\mathbf C_{12}\sigma_{12}+\mathbf C_{22}\sigma_{2}^{2}=\mathbf C_{1}\sigma_{1}+\mathbf C_{2}\sigma_{2}+\mathbf C_{3}\sigma_{3}=\sum_{j=1}^{3}\mathbf C_{j}\sigma_{j} \tag{4.48}$$
$$\mathrm{vec}\,\boldsymbol\Sigma=\sum_{j=1}^{3}(\mathrm{vec}\,\mathbf C_{j})\sigma_{j}=[\mathrm{vec}\,\mathbf C_{1},\mathrm{vec}\,\mathbf C_{2},\mathrm{vec}\,\mathbf C_{3}]\begin{bmatrix}\sigma_{1}\\ \sigma_{2}\\ \sigma_{3}\end{bmatrix}=\mathbf X\boldsymbol\sigma \tag{4.49}$$
$$\mathrm{vec}\,\mathbf C_{j}\in\mathbb R^{n^{2}\times 1},\qquad j\in\{1,\dots,\ell(\ell+1)/2\}$$
"X is called second-order design matrix"
$$\mathbf X:=[\mathrm{vec}\,\mathbf C_{1},\dots,\mathrm{vec}\,\mathbf C_{\ell(\ell+1)/2}]\in\mathbb R^{n^{2}\times\ell(\ell+1)/2}$$
here: ℓ = 2
:Output:
$$\boldsymbol\Sigma^{-1}=\begin{bmatrix}\mathbf U_{11} & \mathbf 0\\ \mathbf 0 & \mathbf 0\end{bmatrix}\sigma_{1}^{-2}+\begin{bmatrix}\mathbf 0 & \mathbf U_{12}\\ \mathbf U_{12}' & \mathbf 0\end{bmatrix}\sigma_{12}^{-1}+\begin{bmatrix}\mathbf 0 & \mathbf 0\\ \mathbf 0 & \mathbf U_{22}\end{bmatrix}\sigma_{2}^{-2} \tag{4.50}$$
subject to
$$\mathbf U_{11}=\mathbf V_{11}^{-1}+q\,\mathbf V_{11}^{-1}\mathbf V_{12}\mathbf S\mathbf V_{12}'\mathbf V_{11}^{-1},\qquad \mathbf U_{12}=\mathbf U_{21}'=-q\,\mathbf V_{11}^{-1}\mathbf V_{12}\mathbf S \tag{4.51, 4.52}$$
$$\mathbf U_{22}=\mathbf S=(\mathbf V_{22}-q\,\mathbf V_{12}'\mathbf V_{11}^{-1}\mathbf V_{12})^{-1};\qquad q:=\frac{\sigma_{12}^{2}}{\sigma_{1}^{2}\sigma_{2}^{2}} \tag{4.53, 4.54}$$
$$\boldsymbol\Sigma^{-1}=\mathbf E_{1}+\mathbf E_{2}+\mathbf E_{3}=\sum_{j=1}^{\ell(\ell+1)/2\,=\,3}\mathbf E_{j} \tag{4.55}$$
$$\mathbf E_{1}(\boldsymbol\sigma):=\begin{bmatrix}\mathbf U_{11} & \mathbf 0\\ \mathbf 0 & \mathbf 0\end{bmatrix}\sigma_{1}^{-2},\quad \mathbf E_{2}(\boldsymbol\sigma):=\begin{bmatrix}\mathbf 0 & \mathbf U_{12}\\ \mathbf U_{12}' & \mathbf 0\end{bmatrix}\sigma_{12}^{-1},\quad \mathbf E_{3}(\boldsymbol\sigma):=\begin{bmatrix}\mathbf 0 & \mathbf 0\\ \mathbf 0 & \mathbf U_{22}\end{bmatrix}\sigma_{2}^{-2}. \tag{4.56}$$
The general result that inversion of a block partitioned symmetric matrix conserves the block structure is presented in

Corollary 4.6 (multinomial inverse):
$$\boldsymbol\Sigma=\sum_{j=1}^{\ell(\ell+1)/2}\mathbf C_{j}\sigma_{j}\;\Rightarrow\;\boldsymbol\Sigma^{-1}=\sum_{j=1}^{\ell(\ell+1)/2}\mathbf E_{j}(\boldsymbol\sigma)\,. \tag{4.57}$$
$$d\{\mathbf y\}=\mathrm{vec}\,\boldsymbol\Sigma=[\mathrm{vec}\,\mathbf C_{11},\dots,\mathrm{vec}\,\mathbf C_{jj}]\begin{bmatrix}\sigma_{1}^{2}\\ \vdots\\ \sigma_{\ell}^{2}\end{bmatrix} \tag{4.59}$$
$$d\{\mathbf y\}=\mathbf X\boldsymbol\sigma,\qquad \boldsymbol\sigma\in\mathbb R^{+}. \tag{4.60}$$
Example 4.3 (two variance components, two sets of totally uncorrelated observations, "heterogeneous observations"):
$$D\{\mathbf y\}=:\boldsymbol\Sigma_{y}=\begin{bmatrix}\mathbf I_{n_{1}}\sigma_{1}^{2} & \mathbf 0\\ \mathbf 0 & \mathbf I_{n_{2}}\sigma_{2}^{2}\end{bmatrix}\quad\text{subject to}\quad \begin{cases}n=n_{1}+n_{2}\\ \boldsymbol\Sigma_{y}\in\mathrm{SYM}(\mathbb R^{n\times n})\\ \sigma_{1}^{2}\in\mathbb R^{+},\ \sigma_{2}^{2}\in\mathbb R^{+}.\end{cases} \tag{4.63}$$
Example 4.4 (two variance components, one covariance component, two sets of correlated observations, "heterogeneous observations"):
$$D\{\mathbf y\}=:\boldsymbol\Sigma_{y}=\begin{bmatrix}\mathbf V_{11}\sigma_{1}^{2} & \mathbf V_{12}\sigma_{12}\\ \mathbf V_{12}'\sigma_{12} & \mathbf V_{22}\sigma_{2}^{2}\end{bmatrix}\quad\text{subject to}\quad \begin{cases}n=n_{1}+n_{2}\\ \mathbf V_{11}\in\mathbb R^{n_{1}\times n_{1}},\ \mathbf V_{22}\in\mathbb R^{n_{2}\times n_{2}}\\ \mathbf V_{12}\in\mathbb R^{n_{1}\times n_{2}}.\end{cases}$$
Those conditions of IQE, represented in Lemma 4.7 and Lemma 4.9, enable us to separate the estimation process of the first moments ξ_j (like BLUUE) from the estimation process of the central second moments σ_k (like BIQUUE). Finally, we provide you with the general solution (4.75) of the homogeneous matrix equations M_k^{1/2}A = 0 (orthogonality conditions) for all k ∈ {1,…, ℓ(ℓ+1)/2}, where ℓ(ℓ+1)/2 is the number of variance-covariance components, restricted to the special Gauss–Markov model E{y} = Aξ, d{y} = Xσ of "full column rank", A ∈ ℝⁿˣᵐ, rk A = m.

Definition 4.7 (invariant quadratic estimation σ̂² of σ²: IQE):
The scalar σ̂² is called IQE (Invariant Quadratic Estimation) of σ² ∈ ℝ⁺ with respect to the special Gauss–Markov model of full column rank
$$E\{\mathbf y\}=\mathbf A\boldsymbol\xi,\quad \mathbf A\in\mathbb R^{n\times m},\ \mathrm{rk}\,\mathbf A=m \tag{4.69}$$
$$D\{\mathbf y\}=\mathbf V\sigma^{2},\quad \mathbf V\in\mathbb R^{n\times n},\ \mathrm{rk}\,\mathbf V=n,\ \sigma^{2}\in\mathbb R^{+},$$
if the variance component σ² is
(i) a quadratic estimation
$$\hat\sigma^{2}=\mathbf y'\mathbf M\mathbf y=(\mathrm{vec}\,\mathbf M)'(\mathbf y\otimes\mathbf y)=(\mathbf y'\otimes\mathbf y')(\mathrm{vec}\,\mathbf M) \tag{4.70}$$
subject to
$$\mathbf M\in\mathrm{SYM}:=\{\mathbf M\in\mathbb R^{n\times n}\mid \mathbf M'=\mathbf M\} \tag{4.71}$$
$$\mathbf y'\mathbf M\mathbf y=\boldsymbol\xi'\mathbf A'(\mathbf M^{1/2})'\mathbf M^{1/2}\mathbf A\boldsymbol\xi+\boldsymbol\xi'\mathbf A'(\mathbf M^{1/2})'\mathbf M^{1/2}\mathbf e_{y}+\mathbf e_{y}'(\mathbf M^{1/2})'\mathbf M^{1/2}\mathbf A\boldsymbol\xi+\mathbf e_{y}'\mathbf M\mathbf e_{y}\,.$$
$$\hat\sigma_{k}=\mathbf y'\mathbf M_{k}\mathbf y=\mathbf e_{y}'\mathbf M_{k}\mathbf e_{y} \tag{4.82}$$
Note the fundamental lemma "σ̂_k IQE of σ_k", whose proof follows the same line as the proof of Lemma 4.7.

Lemma 4.10 (invariant quadratic estimation σ̂_k of σ_k: IQE):
The problem is left with the orthogonality conditions (4.75), (4.76) and (4.84), (4.85). Box 4.4 reviews the general solutions of the homogeneous equations (4.86) and (4.88) for our "full column rank linear model".

Box 4.4
General solutions of homogeneous matrix equations
$$\mathbf M_{k}^{1/2}\mathbf A=\mathbf 0\;\Leftrightarrow\;\mathbf M_{k}^{1/2}=\mathbf Z_{k}(\mathbf I_{n}-\mathbf A\mathbf A^{-})\qquad\text{subject to } \mathrm{rk}\,\mathbf A=m \tag{4.86}$$
$$\mathbf A^{-}=\mathbf A_{L}^{-}=(\mathbf A'\mathbf G_{y}\mathbf A)^{-1}\mathbf A'\mathbf G_{y} \tag{4.87}$$
$$\left.\begin{aligned}\mathbf M_{k}^{1/2}\mathbf A&=\mathbf 0\\ \mathrm{rk}\,\mathbf A&\le m\end{aligned}\right\}\;\Leftrightarrow\;\mathbf M_{k}^{1/2}=\mathbf Z_{k}[\mathbf I_{n}-\mathbf A(\mathbf A'\mathbf G_{y}\mathbf A)^{-}\mathbf A'\mathbf G_{y}] \tag{4.88}$$
$$\hat\sigma_{k}=\mathbf y'\mathbf M_{k}\mathbf y=\mathbf e_{y}'\mathbf M_{k}\mathbf e_{y} \tag{4.94}$$
or
$$\hat\sigma_{k}=(\mathrm{vec}\,\mathbf M_{k})'(\mathbf y\otimes\mathbf y)=(\mathrm{vec}\,\mathbf M_{k})'(\mathbf e_{y}\otimes\mathbf e_{y}) \tag{4.95}$$
or
$$\hat\sigma_{k}=\mathrm{tr}\,\mathbf M_{k}\mathbf y\mathbf y'=\mathrm{tr}\,\mathbf M_{k}\mathbf e_{y}\mathbf e_{y}'\,, \tag{4.96}$$
if and only if
$$(i)\;\mathbf M^{1/2}\mathbf A=\mathbf 0\qquad\text{and}\qquad (ii)\;\mathrm{tr}\,\mathbf V\mathbf M=1\,. \tag{4.99, 4.100}$$
:Proof:
First, we compute E{σ̂²}:
$$\hat\sigma^{2}=\mathrm{tr}\,\mathbf M\mathbf e_{y}\mathbf e_{y}'\;\Rightarrow\;E\{\hat\sigma^{2}\}=\mathrm{tr}\,\mathbf M\boldsymbol\Sigma_{y}=\mathrm{tr}\,\boldsymbol\Sigma_{y}\mathbf M\,.$$
Second, uniform unbiasedness:
$$E\{\hat\sigma^{2}\}=\sigma^{2}\quad\forall\,\sigma^{2}\in\mathbb R^{+}\;\Leftrightarrow\;\mathrm{tr}\,\mathbf V\mathbf M=1\,.$$
Third, we adopt the first condition of type "IQE".
h
The conditions for "σ̂_k IQUUE of σ_k" are only slightly more complicated.
$$E\left\{\begin{bmatrix}\mathbf y_{1}\\ \mathbf y_{2}\\ \vdots\\ \mathbf y_{\ell-1}\\ \mathbf y_{\ell}\end{bmatrix}\right\}=\begin{bmatrix}\mathbf A_{1}\\ \mathbf A_{2}\\ \vdots\\ \mathbf A_{\ell-1}\\ \mathbf A_{\ell}\end{bmatrix}\boldsymbol\xi=\mathbf A\boldsymbol\xi,\qquad \mathbf A\in\mathbb R^{(n_{1}+n_{2}+\cdots+n_{\ell})\times m},\ \mathrm{rk}\,\mathbf A=m \tag{4.101}$$
$$n_{1}+n_{2}+\cdots+n_{\ell-1}+n_{\ell}=n$$
$$D\{\mathbf y\}=\sum_{j=1}^{\ell(\ell+1)/2}\mathbf C_{j}\sigma_{j} \tag{4.104}$$
$$\mathbf V_{11}\in\mathbb R^{n_{1}\times n_{1}},\ \mathbf V_{12}\in\mathbb R^{n_{1}\times n_{2}},\ \dots,\ \mathbf V_{\ell-1,\ell}\in\mathbb R^{n_{\ell-1}\times n_{\ell}},\ \mathbf V_{\ell\ell}\in\mathbb R^{n_{\ell}\times n_{\ell}} \tag{4.107}$$
$$E\{\hat\sigma_{k}\}=\sigma_{k}\;\Leftrightarrow\;\sum_{j=1}^{\ell(\ell+1)/2}(\mathrm{tr}\,\mathbf C_{j}\mathbf M_{k})\sigma_{j}=\sigma_{k}\quad\forall\,\boldsymbol\sigma\in\mathbb R^{\ell(\ell+1)/2} \tag{4.112}$$
$$\mathrm{tr}\,\mathbf C_{j}\mathbf M_{k}-\delta_{jk}=0\,.$$
Box 4.5
IQUUE: one variance component
1st variant:
$$\{E\{\mathbf y\}=\mathbf A\boldsymbol\xi,\ \mathbf A\in\mathbb R^{n\times m},\ \mathrm{rk}\,\mathbf A=m,\ D\{\mathbf y\}=\mathbf V\sigma^{2},\ \mathrm{rk}\,\mathbf V=n,\ \sigma^{2}\in\mathbb R^{+}\}$$
$$\mathrm{tr}\,\mathbf V\mathbf M=\alpha\,\mathrm{tr}[\mathbf I_{n}-\mathbf A(\mathbf A'\mathbf V^{-1}\mathbf A)^{-1}\mathbf A'\mathbf V^{-1}]=1$$
$$\left.\begin{aligned}\mathrm{tr}\,\mathbf I_{n}&=n\\ \mathrm{tr}[\mathbf A(\mathbf A'\mathbf V^{-1}\mathbf A)^{-1}\mathbf A'\mathbf V^{-1}]&=\mathrm{tr}(\mathbf A'\mathbf V^{-1}\mathbf A)^{-1}\mathbf A'\mathbf V^{-1}\mathbf A=\mathrm{tr}\,\mathbf I_{m}=m\end{aligned}\right\}$$
$$\mathrm{tr}\,\mathbf V\mathbf M=\alpha(n-m)=1\;\Rightarrow\;\alpha=\frac{1}{n-m}\,. \tag{4.114}$$
Let us make a statement about the translational invariance of ẽ_y predicted by Σ_y-BLUUE and specified by the "one variance component" model Σ_y = Vσ²:
$$\tilde{\mathbf e}_{y}=\tilde{\mathbf e}_{y}(\boldsymbol\Sigma_{y}\text{-BLUUE})=[\mathbf I_{n}-\mathbf A(\mathbf A'\boldsymbol\Sigma_{y}^{-1}\mathbf A)^{-1}\mathbf A'\boldsymbol\Sigma_{y}^{-1}]\,\mathbf y\,. \tag{4.115}$$
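The two IQUUE conditions can be checked numerically for the factor α = 1/(n − m): with the symmetric choice M = α V⁻¹[Iₙ − A(A'V⁻¹A)⁻¹A'V⁻¹] — one admissible M built from the residual projector — both tr(VM) = 1 and MA = 0 hold. The matrices A and V below are assumed test data.

```python
import numpy as np

# Check tr(VM) = 1 and M A = 0 for alpha = 1/(n-m) and
# M = alpha * V^{-1} [I_n - A (A'V^{-1}A)^{-1} A'V^{-1}].
rng = np.random.default_rng(3)
n, m = 7, 3
A = rng.normal(size=(n, m))            # assumed full-column-rank design
S = rng.normal(size=(n, n))
V = S @ S.T + n * np.eye(n)            # assumed SPD cofactor matrix
Vi = np.linalg.inv(V)

alpha = 1.0 / (n - m)
P = A @ np.linalg.solve(A.T @ Vi @ A, A.T @ Vi)   # oblique projector onto R(A)
M = alpha * Vi @ (np.eye(n) - P)

trVM = np.trace(V @ M)
MA = M @ A
```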
Box 4.6
Helmert's ansatz:
one variance component
$$\tilde{\mathbf e}_{y}'\boldsymbol\Sigma_{y}^{-1}\tilde{\mathbf e}_{y}=\mathbf e_{y}'\mathbf P'\boldsymbol\Sigma_{y}^{-1}\mathbf P\mathbf e_{y}=\mathrm{tr}\,\mathbf P'\boldsymbol\Sigma_{y}^{-1}\mathbf P\,\mathbf e_{y}\mathbf e_{y}' \tag{4.118}$$
$$E\{\tilde{\mathbf e}_{y}'\boldsymbol\Sigma_{y}^{-1}\tilde{\mathbf e}_{y}\}=\mathrm{tr}(\mathbf P'\boldsymbol\Sigma_{y}^{-1}\mathbf P\,E\{\mathbf e_{y}\mathbf e_{y}'\})=\mathrm{tr}(\mathbf P'\boldsymbol\Sigma_{y}^{-1}\mathbf P\boldsymbol\Sigma_{y}) \tag{4.119}$$
$$\hat\sigma^{2}:=\frac{1}{n-m}\tilde{\mathbf e}_{y}'\mathbf V^{-1}\tilde{\mathbf e}_{y}\;\Rightarrow\;E\{\hat\sigma^{2}\}=\sigma^{2}\,. \tag{4.123}$$
Let us finally collect the result of "Helmert's ansatz" in
Corollary 4.15 (σ̂² HIQUUE of σ²):
Helmert's ansatz
$$\hat\sigma^{2}=\frac{1}{n-m}\tilde{\mathbf e}_{y}'\mathbf V^{-1}\tilde{\mathbf e}_{y} \tag{4.124}$$
is IQUUE, also called HIQUUE.
4-35 Invariant quadratic uniformly unbiased estimators of variance-covariance components of Helmert type: HIQUUE versus HIQE
In the previous paragraphs we succeeded in proving, first, that M^{1/2} generated by ẽ_y = ẽ_y(Σ_y-BLUUE) with respect to "one variance component" leads to IQUUE and, second, that Helmert's ansatz generates "σ̂² IQUUE of σ²". Here we reverse the order. First, we prove that Helmert's ansatz for estimating variance-covariance components may (or, in general, may not) lead to "σ̂_k IQUUE of σ_k". Second, we discuss the proper choice of M_k^{1/2} and test whether (i) M_k^{1/2}A = 0 and (ii) tr H_jM_k = δ_jk are fulfilled by HIQUUE, or whether only M_k^{1/2}A = 0 is fulfilled by HIQE.
Box 4.7
Helmert's ansatz:
variance-covariance components
step one: order the variance-covariance components:
$$\boldsymbol\sigma_{0}:=[\sigma_{1}^{2},\sigma_{12},\sigma_{2}^{2},\sigma_{13},\sigma_{23},\dots,\sigma_{\ell}^{2}]_{0}'$$
step two: compute
$$\boldsymbol\Sigma_{0}:=(\boldsymbol\Sigma_{y})_{0}=\sum_{j=1}^{\ell(\ell+1)/2}\mathbf C_{j}\sigma_{j}(\boldsymbol\sigma_{0}) \tag{4.125}$$
"variance-covariance components"
$$\boldsymbol\Sigma_{y}=\sum_{k=1}^{\ell(\ell+1)/2}\mathbf C_{k}\sigma_{k} \tag{4.130}$$
input: σ₀, Σ₀; output: E_k(σ₀).
step six: Helmert's equation
$$E\{\tilde{\mathbf e}_{y}'\mathbf E_{i}(\boldsymbol\sigma_{0})\tilde{\mathbf e}_{y}\}=\sum_{j=1}^{\ell(\ell+1)/2}\bigl(\mathrm{tr}\,\mathbf P(\boldsymbol\sigma_{0})\mathbf E_{i}(\boldsymbol\sigma_{0})\mathbf P'(\boldsymbol\sigma_{0})\mathbf C_{j}\bigr)\sigma_{j},\qquad i,j\in\{1,\dots,\ell(\ell+1)/2\} \tag{4.133}$$
"Helmert's choice"
$$\tilde{\mathbf e}_{y}'\mathbf E_{i}(\boldsymbol\sigma_{0})\tilde{\mathbf e}_{y}=\sum_{j=1}^{\ell(\ell+1)/2}\bigl(\mathrm{tr}\,\mathbf P(\boldsymbol\sigma_{0})\mathbf E_{i}(\boldsymbol\sigma_{0})\mathbf P'(\boldsymbol\sigma_{0})\mathbf C_{j}\bigr)\hat\sigma_{j} \tag{4.134}$$
$$\mathbf q=\mathbf H\hat{\boldsymbol\sigma}\qquad\begin{cases}q_{i}:=\tilde{\mathbf e}_{y}'\mathbf E_{i}(\boldsymbol\sigma_{0})\tilde{\mathbf e}_{y}\\ H_{ij}:=\mathrm{tr}\,\mathbf P(\boldsymbol\sigma_{0})\mathbf E_{i}(\boldsymbol\sigma_{0})\mathbf P'(\boldsymbol\sigma_{0})\mathbf C_{j}\ \ (\text{"Helmert's process"})\\ \hat{\boldsymbol\sigma}:=[\hat\sigma_{1}^{2},\hat\sigma_{12},\hat\sigma_{2}^{2},\hat\sigma_{13},\hat\sigma_{23},\hat\sigma_{3}^{2},\dots,\hat\sigma_{\ell}^{2}]'.\end{cases} \tag{4.135}$$
Care has to be taken if we replace Σ₀⁻¹ by the block partitioned inverse matrix in "Helmert's ansatz". The fundamental expectation equation maps the variance-covariance components σ_j by means of the "Helmert traces" H to the quadratic terms q(σ₀). Skipping the expectation operator on the left side, we replace σ_j by their estimates σ̂_j. As a result we have found the abbreviated Helmert equation q = Hσ̂, which has to be inverted. Note that E{q} = Hσ reproduces unbiasedness.
Let us classify the solution of the Helmert equation q = Hσ̂ with respect to bias. First, let us assume that the Helmert matrix is of full rank, rk H = ℓ(ℓ+1)/2, the number of unknown variance-covariance components. The inverse solution, Box 4.8, produces an update σ̂₁ = H⁻¹(σ̂₀)q(σ̂₀) out of the zero-order information σ̂₀ we have implemented. For the next step, we iterate σ̂₂ = H⁻¹(σ̂₁)q(σ̂₁) up to the reproducing point σ̂_ω = σ̂_{ω−1}, within computer arithmetic, when the iteration ends. Indeed, we assume "Helmert is contracting".
Box 4.8
Solving Helmert's equation,
the first case: rk H = ℓ(ℓ+1)/2, det H ≠ 0
"iterated Helmert equation":
$$\hat{\boldsymbol\sigma}_{1}=\mathbf H^{-1}(\hat{\boldsymbol\sigma}_{0})\mathbf q(\hat{\boldsymbol\sigma}_{0}),\;\dots,\;\hat{\boldsymbol\sigma}_{\omega}=\mathbf H_{\omega-1}^{-1}(\hat{\boldsymbol\sigma}_{\omega-1})\mathbf q(\hat{\boldsymbol\sigma}_{\omega-1}) \tag{4.136}$$
"reproducing point"
start: σ₀ = σ̂₀ → σ̂₁ = H₀⁻¹q₀ → σ̂₂ = H₁⁻¹q₁ subject to H₁ := H(σ̂₁), q₁ := q(σ̂₁), …, σ̂_ω = σ̂_{ω−1} (computer arithmetic): end.
h
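The iterated Helmert equation above is a fixed-point loop. The following is a schematic sketch only: H(·) and q(·) are simple stand-ins (assumptions for illustration), not the actual Helmert traces tr(P Eᵢ P′ C_j); they merely exhibit iteration until the reproducing point is reached in computer arithmetic.

```python
import numpy as np

# Schematic fixed-point iteration sigma_{k+1} = H(sigma_k)^{-1} q(sigma_k),
# stopping at the "reproducing point" sigma_{k+1} = sigma_k.

def H(s):
    # hypothetical, diagonally dominant "Helmert matrix" depending on s
    return np.array([[2.0 + s[0], 0.5],
                     [0.5, 3.0 + s[1]]])

def q(s):
    # hypothetical quadratic-form vector depending on s
    return np.array([4.0 + s[0], 6.0 + s[1]])

sigma = np.array([1.0, 1.0])        # zero-order information sigma_0
for _ in range(200):
    sigma_next = np.linalg.solve(H(sigma), q(sigma))
    if np.allclose(sigma_next, sigma, rtol=0.0, atol=1e-12):
        sigma = sigma_next
        break                        # reproducing point reached
    sigma = sigma_next
```

The loop converges here because the assumed map is contracting ("Helmert is contracting"); with a rank-deficient H the direct inversion fails and the second case of the classification applies.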
For the second case of our classification, let us assume that the Helmert matrix is no longer of full rank, rk H < ℓ(ℓ+1)/2, det H = 0. Now we are left with the central question:
"MINOLESS" ⇒ "IQUUE"?
Unfortunately, the MINOLESS of the rank-factorized Helmert equation q = JKσ̂, outlined in Box 4.9 by the weighted Moore-Penrose solution, indicates a negative answer. Instead, Corollary 4 proves that σ̂ is only HIQE, but it also succeeds in establishing estimable variance-covariance components as "Helmert linear combinations" of them.
Box 4.9
Solving Helmert's equation, the second case:
rk H < ℓ(ℓ+1)/2, det H = 0
$$E\{e_{i}^{y}\}=0\,,$$
$$E\{e_{i}^{y}e_{j}^{y}\}=\pi_{ij}=v_{ij}\sigma^{2} \tag{4.142}$$
$$E\{e_{i}^{y}e_{j}^{y}e_{k}^{y}\}=\pi_{ijk}=0\quad(\text{obliquity}) \tag{4.143}$$
$$E\{e_{i}^{y}e_{j}^{y}e_{k}^{y}e_{l}^{y}\}=\pi_{ijkl}=\pi_{ij}\pi_{kl}+\pi_{ik}\pi_{jl}+\pi_{il}\pi_{jk}=(v_{ij}v_{kl}+v_{ik}v_{jl}+v_{il}v_{jk})\sigma^{4} \tag{4.144}$$
relate to the "centralized random variable"
$$\mathbf e_{y}:=\mathbf y-E\{\mathbf y\}=[e_{i}^{y}]\,. \tag{4.145}$$
The moment arrays are taken over the index set i, j, k, l ∈ {1,…, n}, where the natural number n is identified as the number of observations; n is the dimension of the observation space Y = {ℝⁿ, pdf}.
The scalar σ̂² is called BIQUUE of σ² (Best Invariant Quadratic Uniformly Unbiased Estimation) with respect to the special Gauss–Markov model of full column rank,
"first moments":
$$E\{\mathbf y\}=\mathbf A\boldsymbol\xi,\quad \mathbf A\in\mathbb R^{n\times m},\ \boldsymbol\xi\in\mathbb R^{m},\ \mathrm{rk}\,\mathbf A=m \tag{4.146}$$
"central second moments":
$$D\{\mathbf y\}=:\boldsymbol\Sigma_{y}=\mathbf V\sigma^{2},\quad \mathbf V\in\mathbb R^{n\times n},\ \sigma^{2}\in\mathbb R^{+},\ \mathrm{rk}\,\mathbf V=n\,, \tag{4.147}$$
where ξ ∈ ℝᵐ is the first unknown vector and σ² ∈ ℝ⁺ the second unknown, the "one variance component", if it is
(i) a quadratic estimation (IQE):
$$\hat\sigma^{2}=\mathbf y'\mathbf M\mathbf y=(\mathrm{vec}\,\mathbf M)'(\mathbf y\otimes\mathbf y)=\mathrm{tr}\,\mathbf M\mathbf y\mathbf y' \tag{4.148}$$
subject to
$$\mathbf M=\tfrac12(\mathbf M+\mathbf M')\in\mathrm{SYM}:=\{\mathbf M\in\mathbb R^{n\times n}\mid \mathbf M=\mathbf M'\} \tag{4.149}$$
(ii) translationally invariant, in the sense of
$$\mathbf y\mapsto\mathbf y-E\{\mathbf y\}=:\mathbf e_{y} \tag{4.150}$$
$$\hat\sigma^{2}=\mathbf y'\mathbf M\mathbf y=\mathbf e_{y}'\mathbf M\mathbf e_{y} \tag{4.151}$$
or equivalently
$$\hat\sigma^{2}=(\mathrm{vec}\,\mathbf M)'(\mathbf y\otimes\mathbf y)=(\mathrm{vec}\,\mathbf M)'(\mathbf e_{y}\otimes\mathbf e_{y}) \tag{4.152}$$
$$\hat\sigma^{2}=\mathrm{tr}\,\mathbf M\mathbf y\mathbf y'=\mathrm{tr}\,\mathbf M\mathbf e_{y}\mathbf e_{y}' \tag{4.153}$$
(iii) uniformly unbiased in the sense of
$$E\{\hat\sigma^{2}\}=\sigma^{2}\quad\forall\,\sigma^{2}\in\mathbb R^{+} \tag{4.154}$$
and
(iv) of minimal variance in the sense of
$$D\{\hat\sigma^{2}\}:=E\{[\hat\sigma^{2}-E\{\hat\sigma^{2}\}]^{2}\}=\min_{\mathbf M}\,. \tag{4.155}$$
$$E\{[\hat\sigma^{2}-E\{\hat\sigma^{2}\}]^{2}\}=E\{[\mathrm{tr}\,\mathbf M\mathbf e_{y}\mathbf e_{y}'-(\mathrm{tr}\,\mathbf M\mathbf V)\sigma^{2}][\mathrm{tr}\,\mathbf M\mathbf e_{y}\mathbf e_{y}'-(\mathrm{tr}\,\mathbf M\mathbf V)\sigma^{2}]\}=E\{(\mathrm{tr}\,\mathbf M\mathbf e_{y}\mathbf e_{y}')(\mathrm{tr}\,\mathbf M\mathbf e_{y}\mathbf e_{y}')\}-(\mathrm{tr}\,\mathbf M\mathbf V)^{2}\sigma^{4}\,. \tag{4.156}$$
h
With the "ansatz" σ̂² IQE of σ² we have achieved the first decomposition of var{σ̂²}. The second decomposition of the first term will lead us to central moments of fourth order, which will be decomposed into central moments of second order for a Gauss normal random variable y. The computation is easiest in "Ricci calculus". An alternative computation of the reduction "fourth moments to second moments" in "Cayley calculus", which is a bit more advanced, is given in Appendix D.
$$E\{(\mathrm{tr}\,\mathbf M\mathbf e_{y}\mathbf e_{y}')(\mathrm{tr}\,\mathbf M\mathbf e_{y}\mathbf e_{y}')\}=\sum_{i,j,k,l=1}^{n}m_{ij}m_{kl}\,E\{e_{i}^{y}e_{j}^{y}e_{k}^{y}e_{l}^{y}\}=\sum_{i,j,k,l=1}^{n}m_{ij}m_{kl}\,\pi_{ijkl}=$$
$$=\sum_{i,j,k,l=1}^{n}m_{ij}m_{kl}(\pi_{ij}\pi_{kl}+\pi_{ik}\pi_{jl}+\pi_{il}\pi_{jk})=\sum_{i,j,k,l=1}^{n}m_{ij}m_{kl}(v_{ij}v_{kl}+v_{ik}v_{jl}+v_{il}v_{jk})\sigma^{4}$$
$$\mathrm{tr}\,\mathbf V\mathbf M=\alpha\,\mathrm{tr}[\mathbf I_{n}-\mathbf A(\mathbf A'\mathbf V^{-1}\mathbf A)^{-1}\mathbf A'\mathbf V^{-1}]=\alpha(n-m)$$
$$\mathcal L(\alpha,\lambda_{0})=\alpha^{2}(n-m)+2\lambda_{0}[\alpha(n-m)-1]=\min_{\alpha,\lambda_{0}}\,. \tag{4.164}$$
$$\begin{bmatrix}1 & 1\\ n-m & 0\end{bmatrix}\begin{bmatrix}\hat\alpha\\ \hat\lambda_{0}\end{bmatrix}=\begin{bmatrix}0\\ 1\end{bmatrix},$$
solved by
$$\hat\alpha=-\hat\lambda_{0}=\frac{1}{n-m}\,.$$
$$\frac{1}{2}\frac{\partial^{2}\mathcal L}{\partial\alpha^{2}}(\hat\alpha,\hat\lambda_{0})=n-m>0$$
constitutes the sufficiency condition, automatically fulfilled. Such a solution for the parameter α leads us to the "BIQUUE" representation of the matrix M:
$$\mathbf M=\frac{1}{n-m}\mathbf V^{-1}[\mathbf I_{n}-\mathbf A(\mathbf A'\mathbf V^{-1}\mathbf A)^{-1}\mathbf A'\mathbf V^{-1}]\,. \tag{4.167}$$
h
Explicit representations of σ̂² BIQUUE of σ², of the variance D{σ̂²} and of its estimate D̂{σ̂²} are highlighted by
(iii) D̂{σ̂²}:
An estimate of BIQUUE's variance is
$$\hat D\{\hat\sigma^{2}\}=\frac{2}{n-m}(\hat\sigma^{2})^{2} \tag{4.171}$$
$$\hat D\{\hat\sigma^{2}\}=\frac{2}{(n-m)^{3}}(\tilde{\mathbf e}_{y}'\mathbf V^{-1}\tilde{\mathbf e}_{y})^{2}\,. \tag{4.172}$$
:Proof:
We have already prepared the proof for (i). Therefore we continue to prove (ii) and (iii).
(i) D{σ̂² BIQUUE}
(iii) D{σ̂²}:
Just replace within D{σ̂²} the variance σ² by the estimate σ̂²:
$$\hat D\{\hat\sigma^{2}\}=\frac{2}{n-m}(\hat\sigma^{2})^{2}\,.\qquad\text{h}$$
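The variance formula D{σ̂²} = 2σ⁴/(n − m) underlying (4.171) can be checked by Monte-Carlo simulation; the design matrix, the true parameter values, and the simulation size below are assumptions for illustration only.

```python
import numpy as np

# Monte-Carlo check of D{sigma^2-hat} = 2 sigma^4 / (n - m) for
# sigma^2-hat = e'V^{-1}e / (n - m) under Gauss normal observations.
rng = np.random.default_rng(4)
n, m, sigma2 = 8, 2, 4.0
A = np.column_stack([np.ones(n), np.arange(n, dtype=float)])   # assumed design
V = np.eye(n)                                  # one variance component, V = I_n
Vi = np.linalg.inv(V)
R = np.eye(n) - A @ np.linalg.solve(A.T @ Vi @ A, A.T @ Vi)    # residual map

estimates = np.empty(20000)
for k in range(estimates.size):
    y = A @ np.array([1.0, 0.5]) + rng.normal(scale=np.sqrt(sigma2), size=n)
    e = R @ y                                  # predicted residuals
    estimates[k] = e @ Vi @ e / (n - m)

theory = 2.0 * sigma2 ** 2 / (n - m)           # = 2 sigma^4 / (n - m)
```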
Additional Reading
Seely, J. and Lee, Y. (confidence interval for a variance, 1994); Azzam, A., Birkes, D. and Seely, J. (admissibility in linear models, polyhedral covariance structure, 1988); Seely, J. and Rady, E. (random effects - fixed effects, linear hypothesis, 1988); Seely, J. and Hogg, R.V. (unbiased estimation in linear models, 1982); Seely, J. (confidence intervals for positive linear combinations of variance components, 1980); Seely, J. (minimal sufficient statistics and completeness, 1977); Olsen, A., Seely, J. and Birkes, D. (invariant quadratic unbiased estimators for two variance components, 1975); Seely, J. (quadratic subspaces and completeness, 1971); and Seely, J. (linear spaces and unbiased estimation, 1970).
5 The third problem of algebraic regression
- inconsistent system of linear observational equations with datum defect: overdetermined-underdetermined system of linear equations:
$$\{\mathbf A\mathbf x+\mathbf i=\mathbf y\mid \mathbf A\in\mathbb R^{n\times m},\ \mathbf y\notin R(\mathbf A),\ \mathrm{rk}\,\mathbf A<\min\{m,n\}\}$$
Lemma 5.2: G_x-minimum norm, G_y-least squares solution
Lemma 5.3: G_x-minimum norm, G_y-least squares solution
Lemma 5.5: MINOLESS, additive rank partitioning
Lemma 5.6: characterization of G_x, G_y-MINOS
Lemma 5.7: eigenspace analysis versus eigenspace synthesis
Definition 5.8: α-HAPS
Lemma 5.9: α-HAPS
Lemma 5.10: α-HAPS
We shall outline three aspects of the general inverse problem given in discrete form: (i) set-theoretic (fibering), (ii) algebraic (rank partitioning; "IPM", the Implicit Function Theorem) and (iii) geometrical (slicing).
Here we treat the third problem of algebraic regression, also called the general linear inverse problem: an inconsistent system of linear observational equations
$$\{\mathbf A\mathbf x+\mathbf i=\mathbf y\mid \mathbf A\in\mathbb R^{n\times m},\ \mathrm{rk}\,\mathbf A<\min\{n,m\}\},$$
also called an "underdetermined - overdetermined system of linear equations", is solved by means of an optimization problem. The introduction presents us with the front page example of inhomogeneous equations with unknowns. In terms of boxes and figures we review the minimum norm, least squares solution ("MINOLESS") of such an inconsistent, rank deficient system of linear equations, which is based upon the trinity
5-1 Introduction
With the introductory paragraph we explain the fundamental concepts and basic notions of this section. For you, the analyst, who has the difficult task of dealing with measurements, observational data, modeling and modeling equations, we present numerical examples and graphical illustrations of all abstract notions. The elementary introduction is written not for a mathematician, but for you, the analyst, with limited remote control of the notions given hereafter. May we gain your interest?
Assume an n-dimensional observation space, here a linear space parameterized by n observations (finite, discrete) as coordinates y = [y₁,…,yₙ]' ∈ ℝⁿ, in which an m-dimensional model manifold is embedded (immersed). The model manifold is described as the range of a linear operator f from an m-dimensional parameter space X into the observation space Y. As a mapping, f is established by the mathematical equations which relate all observables to the unknown parameters. Here the parameter space X, the domain of the linear operator f, will also be restricted to a linear space which is parameterized by coordinates x = [x₁,…,x_m]' ∈ ℝᵐ. In this way the linear operator f can be understood as a coordinate mapping A: x ↦ y = Ax. The linear mapping f: X → Y is geometrically characterized by its range R(f), namely R(A), defined by R(f) := {y ∈ ℝⁿ | y = f(x) for some x ∈ X}, which in general is a linear subspace of Y, and its kernel N(f), namely N(A), defined by N(f) := {x ∈ X | f(x) = 0}. Here the range R(f), namely the range space R(A), does not coincide with the n-dimensional observation space Y, such that y ∉ R(f), namely y ∉ R(A). In addition, we shall assume here that the kernel N(f), namely the null space N(A), is not trivial: we may write N(f) ≠ {0}.
First, Example 1.3 confronts us with an inconsistent system of linear equations with a datum defect. Second, such a system of equations is formulated as a special linear model in terms of matrix algebra. In particular we are aiming at an explanation of the terms "inconsistent" and "datum defect". The rank of the matrix A is introduced as the index of the linear operator A. The left complementary index n − rk A is responsible for the surjectivity defect, while its right complementary index m − rk A accounts for the injectivity defect (datum defect). As a linear mapping, f is neither "onto" nor "one-to-one", that is, neither surjective nor injective. Third, we are going to open the toolbox of partitioning. By means of additive rank partitioning (horizontal and vertical rank partitioning) we construct the minimum norm - least squares solution (MINOLESS) of the inconsistent system of linear equations with datum defect Ax + i = y, rk A ≤ min{n, m}. Box 5.3 is an explicit solution of the MINOLESS of our front page example.
Fourth, we present an alternative solution of type "MINOLESS" of the front page example by multiplicative rank partitioning. Fifth, we succeed in identifying the range space R(A) and the null space N(A) using the door opener "rank partitioning".
5-11 The front page example
Example 5.1 (inconsistent system of linear equations with datum defect: Ax + i = y, x ∈ X = ℝᵐ, y ∈ Y ⊂ ℝⁿ, A ∈ ℝⁿˣᵐ, r = rk A ≤ min{n, m}):
Firstly, the introductory example solves the front page inconsistent system of linear equations with datum defect,
$$\begin{aligned}-x_{1}+x_{2}&\approx 1\\ -x_{2}+x_{3}&\approx 1\\ +x_{1}-x_{3}&\approx -3\end{aligned}\qquad\text{or}\qquad \begin{aligned}-x_{1}+x_{2}+i_{1}&=1\\ -x_{2}+x_{3}+i_{2}&=1\\ +x_{1}-x_{3}+i_{3}&=-3\end{aligned}$$
$$\mathbf A:=\begin{bmatrix}-1 & 1 & 0\\ 0 & -1 & 1\\ 1 & 0 & -1\end{bmatrix}\in\mathbb Z^{3\times 3}\subset\mathbb R^{3\times 3},\qquad r=\mathrm{rk}\,\mathbf A=2\,.$$
Box 5.2:
The measurement process of leveling and its relation to the linear model
$$y_{1}=y_{\alpha\beta}=h_{\alpha\beta}+i_{\alpha\beta}=-h_{\alpha}+h_{\beta}+i_{\alpha\beta}$$
$$y_{2}=y_{\beta\gamma}=h_{\beta\gamma}+i_{\beta\gamma}=-h_{\beta}+h_{\gamma}+i_{\beta\gamma}$$
$$y_{3}=y_{\gamma\alpha}=h_{\gamma\alpha}+i_{\gamma\alpha}=-h_{\gamma}+h_{\alpha}+i_{\gamma\alpha}$$
$$\begin{bmatrix}y_{1}\\ y_{2}\\ y_{3}\end{bmatrix}=\begin{bmatrix}-h_{\alpha}+h_{\beta}+i_{\alpha\beta}\\ -h_{\beta}+h_{\gamma}+i_{\beta\gamma}\\ -h_{\gamma}+h_{\alpha}+i_{\gamma\alpha}\end{bmatrix}=\begin{bmatrix}-x_{1}+x_{2}+i_{1}\\ -x_{2}+x_{3}+i_{2}\\ x_{1}-x_{3}+i_{3}\end{bmatrix}=\begin{bmatrix}-1 & 1 & 0\\ 0 & -1 & 1\\ 1 & 0 & -1\end{bmatrix}\begin{bmatrix}x_{1}\\ x_{2}\\ x_{3}\end{bmatrix}+\begin{bmatrix}i_{1}\\ i_{2}\\ i_{3}\end{bmatrix}.$$
Thirdly, let us begin with a more detailed analysis of the linear mapping f: Ax → y or Ax + i = y, namely of the linear operator A ∈ ℝⁿˣᵐ, r = rk A ≤ min{n, m}. We shall pay special attention to the three fundamental partitionings, namely
(i) algebraic partitioning, called additive and multiplicative rank partitioning of the matrix A,
(ii) geometric partitioning, called slicing of the linear space X (parameter space) as well as of the linear space Y (observation space),
(iii) set-theoretical partitioning, called fibering of the set X of parameters and the set Y of observations.
5-13 Minimum norm - least squares solution of the front page example by means of additive rank partitioning
Box 5.3 is a setup of the minimum norm - least squares solution of the inconsistent system of inhomogeneous linear equations with datum defect, following the first principle, "additive rank partitioning". The term "additive" is taken from the additive decomposition y₁ = A₁₁x₁ + A₁₂x₂ and y₂ = A₂₁x₁ + A₂₂x₂ of the observational equations subject to A₁₁ ∈ ℝʳˣʳ, rk A₁₁ = r ≤ min{n, m}.

Box 5.3:
Minimum norm - least squares solution of the inconsistent system of inhomogeneous linear equations with datum defect, "additive rank partitioning".
The solution of the hierarchical optimization problem
(1st) ‖i‖²_I = min over x:
$$\mathbf x_{\ell}=\arg\{\|\mathbf y-\mathbf A\mathbf x\|_{\mathbf I}^{2}=\min_{\mathbf x}\mid \mathbf A\mathbf x+\mathbf i=\mathbf y,\ \mathbf A\in\mathbb R^{n\times m},\ \mathrm{rk}\,\mathbf A\le\min\{n,m\}\}$$
$$\begin{bmatrix}\mathbf y_{1}\\ \mathbf y_{2}\end{bmatrix}=\begin{bmatrix}\mathbf A_{11} & \mathbf A_{12}\\ \mathbf A_{21} & \mathbf A_{22}\end{bmatrix}\begin{bmatrix}\mathbf x_{1}\\ \mathbf x_{2}\end{bmatrix}+\begin{bmatrix}\mathbf i_{1}\\ \mathbf i_{2}\end{bmatrix},\qquad \begin{aligned}\mathbf y_{1}&\in\mathbb R^{r\times 1}, & \mathbf x_{1}&\in\mathbb R^{r\times 1}\\ \mathbf y_{2}&\in\mathbb R^{(n-r)\times 1}, & \mathbf x_{2}&\in\mathbb R^{(m-r)\times 1}.\end{aligned}$$
First, as shown before, we compute the least-squares solution ‖i‖²_I = min over x, or ‖y − Ax‖²_I = min over x, which generates the standard normal equations
$$\mathbf A'\mathbf A\mathbf x_{\ell}=\mathbf A'\mathbf y$$
or
$$\begin{bmatrix}\mathbf A_{11}'\mathbf A_{11}+\mathbf A_{21}'\mathbf A_{21} & \mathbf A_{11}'\mathbf A_{12}+\mathbf A_{21}'\mathbf A_{22}\\ \mathbf A_{12}'\mathbf A_{11}+\mathbf A_{22}'\mathbf A_{21} & \mathbf A_{12}'\mathbf A_{12}+\mathbf A_{22}'\mathbf A_{22}\end{bmatrix}\begin{bmatrix}\mathbf x_{1}\\ \mathbf x_{2}\end{bmatrix}=\begin{bmatrix}\mathbf A_{11}' & \mathbf A_{21}'\\ \mathbf A_{12}' & \mathbf A_{22}'\end{bmatrix}\begin{bmatrix}\mathbf y_{1}\\ \mathbf y_{2}\end{bmatrix}$$
or
$$\begin{bmatrix}\mathbf N_{11} & \mathbf N_{12}\\ \mathbf N_{21} & \mathbf N_{22}\end{bmatrix}\begin{bmatrix}\mathbf x_{1\ell}\\ \mathbf x_{2\ell}\end{bmatrix}=\begin{bmatrix}\mathbf m_{1}\\ \mathbf m_{2}\end{bmatrix}$$
subject to
$$\mathbf N_{11}:=\mathbf A_{11}'\mathbf A_{11}+\mathbf A_{21}'\mathbf A_{21},\quad \mathbf N_{12}:=\mathbf A_{11}'\mathbf A_{12}+\mathbf A_{21}'\mathbf A_{22},\quad \mathbf m_{1}=\mathbf A_{11}'\mathbf y_{1}+\mathbf A_{21}'\mathbf y_{2}$$
$$\mathbf N_{21}:=\mathbf A_{12}'\mathbf A_{11}+\mathbf A_{22}'\mathbf A_{21},\quad \mathbf N_{22}:=\mathbf A_{12}'\mathbf A_{12}+\mathbf A_{22}'\mathbf A_{22},\quad \mathbf m_{2}=\mathbf A_{12}'\mathbf y_{1}+\mathbf A_{22}'\mathbf y_{2}\,,$$
which are consistent linear equations with an (injectivity) defect d = m − rk A. The front page example leads us to
$$\mathbf A=\begin{bmatrix}\mathbf A_{11} & \mathbf A_{12}\\ \mathbf A_{21} & \mathbf A_{22}\end{bmatrix}=\begin{bmatrix}-1 & 1 & 0\\ 0 & -1 & 1\\ 1 & 0 & -1\end{bmatrix}$$
or
$$\mathbf A_{11}=\begin{bmatrix}-1 & 1\\ 0 & -1\end{bmatrix},\quad \mathbf A_{12}=\begin{bmatrix}0\\ 1\end{bmatrix},\quad \mathbf A_{21}=[1\;\;0],\quad A_{22}=-1$$
$$\mathbf A'\mathbf A=\begin{bmatrix}2 & -1 & -1\\ -1 & 2 & -1\\ -1 & -1 & 2\end{bmatrix}$$
$$\mathbf N_{11}=\begin{bmatrix}2 & -1\\ -1 & 2\end{bmatrix},\quad \mathbf N_{12}=\begin{bmatrix}-1\\ -1\end{bmatrix},\quad |\mathbf N_{11}|=3\ne 0,\quad \mathbf N_{21}=[-1\;\;-1],\quad N_{22}=2$$
$$\mathbf y_{1}=\begin{bmatrix}1\\ 1\end{bmatrix},\quad \mathbf m_{1}=\mathbf A_{11}'\mathbf y_{1}+\mathbf A_{21}'y_{2}=\begin{bmatrix}-4\\ 0\end{bmatrix},\qquad y_{2}=-3,\quad m_{2}=\mathbf A_{12}'\mathbf y_{1}+A_{22}'y_{2}=4\,.$$
Second, the minimum norm condition with respect to x₂:
$$\frac{\partial\mathcal L}{\partial\mathbf x_{2}}(\mathbf x_{2\ell m})=\mathbf 0\;\Leftrightarrow\;\frac{1}{2}\frac{\partial\mathcal L_{1}}{\partial\mathbf x_{2}}(\mathbf x_{2\ell m})+\frac{1}{2}\frac{\partial\mathcal L_{2}}{\partial\mathbf x_{2}}(\mathbf x_{2\ell m})=\mathbf 0$$
$$-\mathbf N_{12}'\mathbf N_{11}^{-2}\mathbf m_{1}+(\mathbf I+\mathbf N_{12}'\mathbf N_{11}^{-2}\mathbf N_{12})\mathbf x_{2\ell m}=\mathbf 0$$
$$\mathbf x_{2\ell m}=(\mathbf I+\mathbf N_{12}'\mathbf N_{11}^{-2}\mathbf N_{12})^{-1}\mathbf N_{12}'\mathbf N_{11}^{-2}\mathbf m_{1}\,,$$
$$\frac{1}{2}\frac{\partial^{2}\mathcal L}{\partial\mathbf x_{2}\,\partial\mathbf x_{2}'}(\mathbf x_{2\ell m})=\mathbf N_{12}'\mathbf N_{11}^{-2}\mathbf N_{12}+\mathbf I>0\,.$$
This representation is transformed by means of $(\mathbf N_{12}\mathbf N_{12}'+\mathbf N_{11}\mathbf N_{11}')^{-1}$ such that
$$\mathbf x_{2\ell m}=\mathbf N_{12}'(\mathbf N_{12}\mathbf N_{12}'+\mathbf N_{11}\mathbf N_{11}')^{-1}\mathbf m_{1}\,,\qquad \mathbf x_{1\ell m}=\mathbf N_{11}^{-1}(\mathbf m_{1}-\mathbf N_{12}\mathbf x_{2\ell m})\,.$$
$$\mathbf N_{11}\mathbf N_{11}'=\begin{bmatrix}5 & -4\\ -4 & 5\end{bmatrix},\qquad \mathbf N_{12}\mathbf N_{12}'=\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}$$
$$\mathbf N_{12}\mathbf N_{12}'+\mathbf N_{11}\mathbf N_{11}'=\begin{bmatrix}6 & -3\\ -3 & 6\end{bmatrix},\qquad [\mathbf N_{12}\mathbf N_{12}'+\mathbf N_{11}\mathbf N_{11}']^{-1}=\frac{1}{27}\begin{bmatrix}6 & 3\\ 3 & 6\end{bmatrix}$$
$$\mathbf m_{1}=\begin{bmatrix}-4\\ 0\end{bmatrix}\;\Rightarrow\;\mathbf x_{1\ell m}=\frac{1}{3}\begin{bmatrix}-4\\ 0\end{bmatrix},\quad x_{2\ell m}=\frac{4}{3},\quad \|\mathbf x_{\ell m}\|_{\mathbf I}=\frac{4}{3}\sqrt{2}\,.$$
$$x_{1\ell m}=\hat h_{\alpha}=-\frac{4}{3},\qquad x_{2\ell m}=\hat h_{\beta}=0,\qquad x_{3\ell m}=\hat h_{\gamma}=\frac{4}{3}$$
$$\|\mathbf x_{\ell m}\|_{\mathbf I}=\frac{4}{3}\sqrt 2$$
$$x_{1\ell m}+x_{2\ell m}+x_{3\ell m}=0\;\sim\;\hat h_{\alpha}+\hat h_{\beta}+\hat h_{\gamma}=0\,.$$
$$\mathbf i_{\ell m}=-\frac{1}{3}\begin{bmatrix}1\\ 1\\ 1\end{bmatrix},\qquad \mathbf A'\mathbf i_{\ell}=\mathbf 0,\qquad \|\mathbf i_{\ell m}\|_{\mathbf I}=\frac{1}{3}\sqrt 3\,.$$
Box 5.4:
Minimum norm - least squares solution of the inconsistent system of inhomogeneous linear equations with datum defect, "multiplicative rank partitioning".
The solution of the hierarchical optimization problem
(1st) ‖i‖²_I = min over x:
$$\mathbf x_{\ell}=\arg\{\|\mathbf y-\mathbf A\mathbf x\|_{\mathbf I}^{2}=\min_{\mathbf x}\mid \mathbf A\mathbf x+\mathbf i=\mathbf y,\ \mathbf A\in\mathbb R^{n\times m},\ \mathrm{rk}\,\mathbf A\le\min\{n,m\}\}$$
(2nd) ‖x_ℓ‖²_I = min over x_ℓ:
with the multiplicative rank factorization
$$\mathbf A=\mathbf D\mathbf E,\qquad \mathbf D\in\mathbb R^{n\times r},\ \mathrm{rk}\,\mathbf D=\mathrm{rk}\,\mathbf A=:r\le\min\{n,m\};\qquad \mathbf E\in\mathbb R^{r\times m},\ \mathrm{rk}\,\mathbf E=\mathrm{rk}\,\mathbf A=:r\le\min\{n,m\}$$
with respect to the linear model
$$\mathbf y=\mathbf A\mathbf x+\mathbf i=\mathbf D\mathbf E\mathbf x+\mathbf i,\qquad \mathbf E\mathbf x=:\mathbf z\;\Rightarrow\;\mathbf D\mathbf E\mathbf x=\mathbf D\mathbf z\;\Rightarrow\;\mathbf y=\mathbf D\mathbf z+\mathbf i\,.$$
First, as shown before, we compute the least-squares solution ‖i‖²_I = min over x, which generates the standard normal equations
$$\mathbf D'\mathbf D\mathbf z_{\ell}=\mathbf D'\mathbf y\;\Rightarrow\;\mathbf z_{\ell}=(\mathbf D'\mathbf D)^{-1}\mathbf D'\mathbf y=\mathbf D_{\ell}^{-}\mathbf y\,,$$
which are consistent linear equations of rank rk D = rk D'D = rk A = r. The front page example leads us to
$$\mathbf A=\mathbf D\mathbf E=\begin{bmatrix}\mathbf A_{11} & \mathbf A_{12}\\ \mathbf A_{21} & \mathbf A_{22}\end{bmatrix}=\begin{bmatrix}-1 & 1 & 0\\ 0 & -1 & 1\\ 1 & 0 & -1\end{bmatrix},\qquad \mathbf D=\begin{bmatrix}-1 & 1\\ 0 & -1\\ 1 & 0\end{bmatrix}\in\mathbb R^{3\times 2}$$
or
$$\mathbf z_{\ell}=(\mathbf D'\mathbf D)^{-1}\mathbf D'\mathbf y=\frac{1}{3}\begin{bmatrix}-1 & -1 & 2\\ 1 & -2 & 1\end{bmatrix}\mathbf y$$
$$\mathbf y=\begin{bmatrix}1\\ 1\\ -3\end{bmatrix}\;\Rightarrow\;\mathbf z_{\ell}=-\frac{4}{3}\begin{bmatrix}2\\ 1\end{bmatrix}.$$
$$\mathbf E=\begin{bmatrix}1 & 0 & -1\\ 0 & 1 & -1\end{bmatrix},\qquad \mathbf E\mathbf E'=\begin{bmatrix}2 & 1\\ 1 & 2\end{bmatrix}$$
$$(\mathbf E\mathbf E')^{-1}=\frac{1}{3}\begin{bmatrix}2 & -1\\ -1 & 2\end{bmatrix},\qquad \mathbf E'(\mathbf E\mathbf E')^{-1}=\frac{1}{3}\begin{bmatrix}2 & -1\\ -1 & 2\\ -1 & -1\end{bmatrix}$$
$$\mathbf x_{\ell m}=\mathbf E'(\mathbf E\mathbf E')^{-1}(\mathbf D'\mathbf D)^{-1}\mathbf D'\mathbf y=\frac{1}{3}\begin{bmatrix}-1 & 0 & 1\\ 1 & -1 & 0\\ 0 & 1 & -1\end{bmatrix}\mathbf y$$
$$\mathbf y=\begin{bmatrix}1\\ 1\\ -3\end{bmatrix}\;\Rightarrow\;\mathbf x_{\ell m}=\frac{1}{3}\begin{bmatrix}-4\\ 0\\ +4\end{bmatrix}=\frac{4}{3}\begin{bmatrix}-1\\ 0\\ +1\end{bmatrix},\qquad \|\mathbf x_{\ell m}\|=\frac{4}{3}\sqrt 2$$
$$\mathbf x_{\ell m}=\begin{bmatrix}x_{1\ell m}\\ x_{2\ell m}\\ x_{3\ell m}\end{bmatrix}=\begin{bmatrix}\hat h_{\alpha}\\ \hat h_{\beta}\\ \hat h_{\gamma}\end{bmatrix}=\frac{4}{3}\begin{bmatrix}-1\\ 0\\ +1\end{bmatrix},\qquad \|\mathbf x_{\ell m}\|=\frac{4}{3}\sqrt 2$$
$$x_{1\ell m}+x_{2\ell m}+x_{3\ell m}=0\;\sim\;\hat h_{\alpha}+\hat h_{\beta}+\hat h_{\gamma}=0\,.$$
$$\mathbf i_{\ell m}=-\frac{1}{3}\begin{bmatrix}1\\ 1\\ 1\end{bmatrix},\qquad \mathbf A'\mathbf i_{\ell}=\mathbf 0,\qquad \|\mathbf i_{\ell m}\|=\frac{1}{3}\sqrt 3\,.$$
h
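The MINOLESS of the front-page leveling example coincides with the Moore-Penrose (pseudoinverse) solution x_ℓm = A⁺y. In the sketch below the signs of A and y are reconstructed from the leveling model (height differences h_β − h_α, etc.), an interpretation of the garbled print.

```python
import numpy as np

# Minimum norm - least squares solution via the pseudoinverse:
# x_lm = pinv(A) @ y; the inconsistency vector follows the convention
# A x + i = y, i.e. i = y - A x.
A = np.array([[-1.0, 1.0, 0.0],
              [0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0]])      # reconstructed leveling design, rk A = 2
y = np.array([1.0, 1.0, -3.0])        # reconstructed observations

x_lm = np.linalg.pinv(A) @ y          # = (4/3) * [-1, 0, 1]
i_lm = y - A @ x_lm                   # = -(1/3) * [1, 1, 1]
```

The minimum-norm property forces x_ℓm to be orthogonal to the null space span{(1,1,1)'}, which is exactly the condition x₁ + x₂ + x₃ = 0 stated in the boxes above.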
Box 5.5 summarizes the algorithmic steps for the diagnosis of the simultaneous horizontal and vertical rank partitioning to generate $(I_m, G_y)$-MINOS.

Box 5.5: The diagnostic algorithm for solving a general rank-deficient system of linear equations

$$y = Ax, \qquad A \in \mathbb{R}^{n\times m},\ \operatorname{rk}A < \min\{n,m\}$$

by means of simultaneous horizontal and vertical rank partitioning.

Determine the rank of the matrix A: $\operatorname{rk}A < \min\{n,m\}$.

Compute "the simultaneous horizontal and vertical rank partitioning"

$$A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix}, \qquad A_{11} \in \mathbb{R}^{r\times r},\ A_{12} \in \mathbb{R}^{r\times(m-r)},\ A_{21} \in \mathbb{R}^{(n-r)\times r},\ A_{22} \in \mathbb{R}^{(n-r)\times(m-r)}.$$

Compute the range space $\mathcal{R}(A)$ and the null space $\mathcal{N}(A)$ of the linear operator A:

$$\mathcal{R}(A) = \operatorname{span}\{w_{\ell 1}(A),\dots,w_{\ell r}(A)\}$$

$$\mathcal{N}(A) = \{x \in \mathbb{R}^m \mid N_{11}x_{1A} + N_{12}x_{2A} = 0\}$$

or

$$x_{1A} = -N_{11}^{-1}N_{12}x_{2A}.$$

256 5 The third problem of algebraic regression

Compute $(I_m, G_y)$-MINOS

$$x_{\ell m} = \begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} N_{11}'\\ N_{12}' \end{bmatrix}(N_{12}N_{12}' + N_{11}N_{11}')^{-1}\,[\,A_{11}'G_{11}^y,\ A_{21}'G_{22}^y\,]\begin{bmatrix} y_1\\ y_2 \end{bmatrix}$$

$$N_{11} := A_{11}'G_{11}^yA_{11} + A_{21}'G_{22}^yA_{21}, \qquad N_{12} := A_{11}'G_{11}^yA_{12} + A_{21}'G_{22}^yA_{22}$$

$$N_{21} := N_{12}', \qquad N_{22} := A_{12}'G_{11}^yA_{12} + A_{22}'G_{22}^yA_{22}.$$
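The diagnostic steps of Box 5.5 can be sketched for the front-page matrix; this is a minimal illustration assuming the unweighted case $G_y = I$, so the $N$-blocks reduce to the unweighted sums:

```python
import numpy as np

# Box 5.5 sketch: rank, simultaneous horizontal/vertical rank partitioning,
# and the null space x_1 = -N11^{-1} N12 x_2 (unweighted case G_y = I assumed).
A = np.array([[-1.0, 1.0, 0.0],
              [ 0.0,-1.0, 1.0],
              [ 1.0, 0.0,-1.0]])
r = np.linalg.matrix_rank(A)                     # rk A = 2 < min{n, m}

A11, A12 = A[:r, :r], A[:r, r:]                  # horizontal and vertical
A21, A22 = A[r:, :r], A[r:, r:]                  # rank partitioning

N11 = A11.T @ A11 + A21.T @ A21                  # [[2,-1],[-1,2]]
N12 = A11.T @ A12 + A21.T @ A22                  # [[-1],[-1]]

# null space: x_1 = -N11^{-1} N12 x_2, here x_1 = (u, u) for x_2 = u
basis = np.vstack([-np.linalg.solve(N11, N12), np.eye(1)])
assert np.allclose(A @ basis, 0)                 # N(A) = span{(1,1,1)'}
print(basis.ravel())  # -> [1. 1. 1.]
```

The computed basis vector $(1,1,1)'$ is exactly the datum defect of the leveling network: a common shift of all heights is unobservable.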
Here we will outline by means of Box 5.6 the range space as well as the null space of the general inconsistent system of linear equations.

Box 5.6: The range space and the null space of the general inconsistent system of linear equations

$$Ax + i = y, \qquad A \in \mathbb{R}^{n\times m},\ \operatorname{rk}A \le \min\{n,m\}$$

$$\begin{bmatrix} y_1\\ y_2 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ -3 \end{bmatrix}, \qquad A = \begin{bmatrix} -1 & 1 & 0\\ 0 & -1 & 1\\ 1 & 0 & -1 \end{bmatrix} \in \mathbb{R}^{3\times 3}, \qquad \operatorname{rk}A =: r = 2$$

$$\mathcal{R}(A) = \operatorname{span}\{e_1a_{11} + e_2a_{21} + e_3a_{31},\ e_1a_{12} + e_2a_{22} + e_3a_{32}\} \subset \mathbb{R}^3$$

or

$$\mathcal{R}(A) = \operatorname{span}\{-e_1 + e_3,\ e_1 - e_2\} \subset \mathbb{R}^3 = Y$$

$$c_1 = [-1, 0, 1]', \quad c_2 = [1, -1, 0]', \quad \mathbb{R}^3 = \operatorname{span}\{e_1, e_2, e_3\}$$

[Figure: the observation vector $y$ and the range space $\mathcal{R}(A)$, spanned by $c_1$ and $c_2$, drawn in the orthonormal frame $\{e_1, e_2, e_3\}$ at the origin $O$.]

$$N_{11} = \begin{bmatrix} 2 & -1\\ -1 & 2 \end{bmatrix}, \qquad N_{11}^{-1} = \frac{1}{3}\begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix}, \qquad N_{12} = \begin{bmatrix} -1\\ -1 \end{bmatrix}$$

$$x_{1A} = u, \quad x_{2A} = u, \quad x_{3A} = u$$

$$\mathcal{N}(A) = L_0^1 = \mathbb{G}_{1,3}.$$
$$\mathcal{N}(A) = L_0^1, \qquad \mathcal{N}(A) = \mathbb{G}_{1,3} \subset \mathbb{R}^3$$

Figure 5.2: Kernel $\mathcal{N}(f)$, null space $\mathcal{N}(A)$: "the null space $\mathcal{N}(A)$ as the linear manifold $L_0^1$ (Grassmann space $\mathbb{G}_{1,3}$) slices the parameter space" (drawn in the coordinates $x_1$, $x_2$).
Box 5.7: MINOLESS of a general inconsistent system of linear equations:

$$f: x \mapsto y = Ax + i, \qquad x \in X = \mathbb{R}^m \ \text{(parameter space)}, \qquad y \in Y = \mathbb{R}^n \ \text{(observation space)}$$

$$r := \operatorname{rk}A < \min\{n,m\}$$

$A^-$ generalized inverse of MINOS type: $A^{1,2,3,4}$ or $A^-_{\ell m}$.

Condition # 1: $f(x) = f(g(y))$, i.e. $Ax = AA^-Ax$; $f = f \circ g \circ f$, i.e. $AA^-A = A$.

Condition # 2: $g(y) = g(f(x))$, i.e. $A^-y = A^-AA^-y$; $g = g \circ f \circ g$, i.e. $A^-AA^- = A^-$.

Condition # 3: $f(g(y)) = y_{\mathcal{R}(A)}$, i.e. $AA^-y = y_{\mathcal{R}(A)}$; $f \circ g = P_{\mathcal{R}(A)}$, i.e. $AA^- = P_{\mathcal{R}(A)}$.

Condition # 4: $g(f(x)) = x_{\mathcal{R}(A^-)}$; $g \circ f = P_{\mathcal{R}(g)}$, i.e. $A^-A = P_{\mathcal{R}(A^-)}$.

5-1 Introduction 259

[Figure: commutative diagram relating the domains $D(A)$, $D(A^-)$ and the ranges $\mathcal{R}(A)$, $\mathcal{R}(A^-)$ under $A$ and $A^-$, with the projections $AA^- = P_{\mathcal{R}(A)}$, $f \circ g = P_{\mathcal{R}(f)}$, and $A^-A = P_{\mathcal{R}(A^-)}$, $g \circ f = P_{\mathcal{R}(g)}$.]
In addition, we follow Figures 5.4 and 5.5 for the characteristic diagrams describing the transformations of the spaces and of the matrices of the metric.

[Figures 5.4 and 5.5: commutative diagrams of the parameter space $X$, the observation space $Y$ and their dual (starred) spaces, relating the operator $A$, its generalized inverses, and the matrices of the metric $G_x$, $G_y$ with $G_x^* = (G_x)^{-1}$ and $(G_y^*)^{-1} = G_y$.]

$$y \mapsto y^* = G_yy, \qquad y \in Y,\ y^* \in Y^*$$

$$G_y^* = \begin{bmatrix} \Lambda_y & 0\\ 0 & 0 \end{bmatrix} \qquad \text{(synthesis versus analysis)}$$

$$G_y^* = U'G_yU = \begin{bmatrix} U_1'\\ U_2' \end{bmatrix}G_y[\,U_1,\ U_2\,] = \begin{bmatrix} U_1'G_yU_1 & U_1'G_yU_2\\ U_2'G_yU_1 & U_2'G_yU_2 \end{bmatrix}, \qquad G_y = UG_y^*U' = U_1\Lambda_yU_1'$$

$$\Lambda_y = U_1'G_yU_1, \qquad U_1\Lambda_y = G_yU_1, \qquad G_yU_2 = 0 \ \text{and}\ U_1'G_yU_2 = 0$$

$$\|y - Ax\|_{G_y}^2 = \|i\|^2 = i'G_yi, \qquad G_y = U_1\Lambda_yU_1'$$

$$(y - Ax)'U_1\Lambda_yU_1'(y - Ax) = \min_x$$

$$U_1'(y - Ax) = U_1'i =: i^*.$$
If we use the simultaneous horizontal and vertical rank partitioning

$$y = \begin{bmatrix} y_1\\ y_2 \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} x_1\\ x_2 \end{bmatrix} + \begin{bmatrix} i_1\\ i_2 \end{bmatrix}$$

subject to special dimension identities

$$y_1 \in \mathbb{R}^{r\times1},\ y_2 \in \mathbb{R}^{(n-r)\times1}, \qquad A_{11} \in \mathbb{R}^{r\times r},\ A_{12} \in \mathbb{R}^{r\times(m-r)}, \qquad A_{21} \in \mathbb{R}^{(n-r)\times r},\ A_{22} \in \mathbb{R}^{(n-r)\times(m-r)},$$

we arrive at Lemma 5.5:

$$y_1 \in \mathbb{R}^{r\times1},\ y_2 \in \mathbb{R}^{(n-r)\times1},\ x_1 \in \mathbb{R}^{r\times1},\ x_2 \in \mathbb{R}^{(m-r)\times1}$$

$$A_{11} \in \mathbb{R}^{r\times r},\ A_{12} \in \mathbb{R}^{r\times(m-r)}, \qquad A_{21} \in \mathbb{R}^{(n-r)\times r},\ A_{22} \in \mathbb{R}^{(n-r)\times(m-r)}$$

is a simultaneous horizontal and vertical rank partitioning of the linear model (5.1).
$$(x_2)_{\ell m} = [\,N_{12}'N_{11}^{-1}G_{11}^xN_{11}^{-1}N_{12} - 2G_{21}^xN_{11}^{-1}N_{12} + G_{22}^x\,]^{-1}(N_{12}'N_{11}^{-1}G_{11}^xN_{11}^{-1} - 2G_{21}^xN_{11}^{-1})\,m_1. \tag{5.4}$$
The symmetric matrices $(G_x, G_y)$ of the metric of the parameter space $X$ as well as of the observation space $Y$ are consequently partitioned as

$$G_y = \begin{bmatrix} G_{11}^y & G_{12}^y\\ G_{21}^y & G_{22}^y \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} G_{11}^x & G_{12}^x\\ G_{21}^x & G_{22}^x \end{bmatrix} = G_x \tag{5.5}$$

subject to

$$N_{11} := A_{11}'G_{11}^yA_{11} + A_{21}'G_{21}^yA_{11} + A_{11}'G_{12}^yA_{21} + A_{21}'G_{22}^yA_{21} \tag{5.8}$$

$$N_{12} := A_{11}'G_{11}^yA_{12} + A_{21}'G_{21}^yA_{12} + A_{11}'G_{12}^yA_{22} + A_{21}'G_{22}^yA_{22}, \tag{5.9}$$

$$N_{21} = N_{12}', \tag{5.10}$$

$$N_{22} := A_{12}'G_{11}^yA_{12} + A_{22}'G_{21}^yA_{12} + A_{12}'G_{12}^yA_{22} + A_{22}'G_{22}^yA_{22}, \tag{5.11}$$

$$M_{11} := A_{11}'G_{11}^y + A_{21}'G_{21}^y, \qquad M_{12} := A_{11}'G_{12}^y + A_{21}'G_{22}^y, \tag{5.12}$$
5-2 MINOLESS and related solutions 263
$$M_{21} := A_{12}'G_{11}^y + A_{22}'G_{21}^y, \qquad M_{22} := A_{12}'G_{12}^y + A_{22}'G_{22}^y, \tag{5.13}$$

$$\begin{bmatrix} G_x & A'G_yA\\ A'G_yA & 0 \end{bmatrix}\begin{bmatrix} x_{\ell m}\\ \lambda_{\ell m} \end{bmatrix} = \begin{bmatrix} 0\\ A'G_yy \end{bmatrix} \tag{5.18}$$

:Proof:

$G_x$-MINOS of the system of normal equations $A'G_yAx_\ell = A'G_yy$ is constructed by means of the constrained Lagrangean

$$\mathcal{L}(x_\ell, \lambda_\ell) := x_\ell'G_xx_\ell + 2\lambda_\ell'(A'G_yAx_\ell - A'G_yy) = \min_{x,\lambda},$$

$$\begin{bmatrix} G_x & A'G_yA\\ A'G_yA & 0 \end{bmatrix}\begin{bmatrix} x_{\ell m}\\ \lambda_{\ell m} \end{bmatrix} = \begin{bmatrix} 0\\ A'G_yy \end{bmatrix}$$

$$\frac{1}{2}\frac{\partial^2\mathcal{L}}{\partial x\,\partial x'}(x_{\ell m}, \lambda_{\ell m}) = G_x \ge 0 \tag{5.20}$$

due to the positive semidefiniteness of the matrix $G_x$ generates the sufficiency condition for obtaining the minimum of the constrained Lagrangean. Due to the assumption $A'G_yy \in \mathcal{R}(A'G_yA)$ the existence of $G_x$-MINOS $x_{\ell m}$ is granted. In order to prove uniqueness of $G_x$-MINOS $x_{\ell m}$ we have to consider case (i) and case (ii): $G_x$ positive definite, $G_x$ positive semidefinite.

$$\begin{vmatrix} G_x & A'G_yA\\ A'G_yA & 0 \end{vmatrix} = \pm\,|G_x|\,|A'G_yAG_x^{-1}A'G_yA| = 0. \tag{5.21}$$
First, we solve the system of normal equations which characterizes $x_{\ell m}$, the $G_x, G_y$-MINOLESS of $x$, for the case of a full-rank matrix of the metric $G_x$ of the parameter space $X$, $\operatorname{rk}G_x = m$ in particular. The system of normal equations is solved for

$$\begin{bmatrix} x_{\ell m}\\ \lambda_{\ell m} \end{bmatrix} = \begin{bmatrix} G_x & A'G_yA\\ A'G_yA & 0 \end{bmatrix}^-\begin{bmatrix} 0\\ A'G_yy \end{bmatrix} = \begin{bmatrix} C_1 & C_2\\ C_3 & C_4 \end{bmatrix}\begin{bmatrix} 0\\ A'G_yy \end{bmatrix} \tag{5.22}$$

subject to

$$\begin{bmatrix} G_x & A'G_yA\\ A'G_yA & 0 \end{bmatrix}\begin{bmatrix} C_1 & C_2\\ C_3 & C_4 \end{bmatrix}\begin{bmatrix} G_x & A'G_yA\\ A'G_yA & 0 \end{bmatrix} = \begin{bmatrix} G_x & A'G_yA\\ A'G_yA & 0 \end{bmatrix} \tag{5.23}$$

as a postulate for the g-inverse of the partitioned matrix. Cayley multiplication of the three partitioned matrices leads us to four matrix identities:

$$G_xC_1G_x + G_xC_2A'G_yA + A'G_yAC_3G_x + A'G_yAC_4A'G_yA = G_x \tag{5.24}$$

$$G_xC_1A'G_yA + A'G_yAC_3A'G_yA = A'G_yA \tag{5.25}$$

$$A'G_yAC_1G_x + A'G_yAC_2A'G_yA = A'G_yA \tag{5.26}$$

$$A'G_yAC_1A'G_yA = 0. \tag{5.27}$$

Multiply the third identity by $G_x^{-1}A'G_yA$ from the right side and substitute the fourth identity in order to solve for $C_2$:

$$A'G_yAC_2A'G_yAG_x^{-1}A'G_yA = A'G_yAG_x^{-1}A'G_yA \tag{5.28}$$

$$C_2 = G_x^{-1}A'G_yA(A'G_yAG_x^{-1}A'G_yA)^-$$

solves the fifth equation

$$A'G_yAG_x^{-1}A'G_yA(A'G_yAG_x^{-1}A'G_yA)^-A'G_yAG_x^{-1}A'G_yA = A'G_yAG_x^{-1}A'G_yA \tag{5.29}$$

by the axiom of a generalized inverse.

$$x_{\ell m} = C_2A'G_yy \tag{5.30}$$
$$\begin{bmatrix} G_x + A'G_yA & A'G_yA\\ A'G_yA & 0 \end{bmatrix}\begin{bmatrix} x_{\ell m}\\ \lambda_{\ell m} \end{bmatrix} = \begin{bmatrix} A'G_yy\\ A'G_yy \end{bmatrix} \tag{5.32}$$

$$G_x + A'G_yA = [\,G_x,\ A'G_yA\,]\begin{bmatrix} G_x^- & 0\\ 0 & (A'G_yA)^- \end{bmatrix}\begin{bmatrix} G_x\\ A'G_yA \end{bmatrix}, \tag{5.34}$$

namely $|G_x + A'G_yA| \ne 0$. The modified system of normal equations is solved for

$$\begin{bmatrix} x_{\ell m}\\ \lambda_{\ell m} \end{bmatrix} = \begin{bmatrix} G_x + A'G_yA & A'G_yA\\ A'G_yA & 0 \end{bmatrix}^-\begin{bmatrix} A'G_yy\\ A'G_yy \end{bmatrix} = \begin{bmatrix} C_1 & C_2\\ C_3 & C_4 \end{bmatrix}\begin{bmatrix} A'G_yy\\ A'G_yy \end{bmatrix} = \begin{bmatrix} C_1A'G_yy + C_2A'G_yy\\ C_3A'G_yy + C_4A'G_yy \end{bmatrix} \tag{5.35}$$

subject to

$$\begin{bmatrix} G_x + A'G_yA & A'G_yA\\ A'G_yA & 0 \end{bmatrix}\begin{bmatrix} C_1 & C_2\\ C_3 & C_4 \end{bmatrix}\begin{bmatrix} G_x + A'G_yA & A'G_yA\\ A'G_yA & 0 \end{bmatrix} = \begin{bmatrix} G_x + A'G_yA & A'G_yA\\ A'G_yA & 0 \end{bmatrix}. \tag{5.36}$$
"element (1,2)"

$$(G_x + A'G_yA)C_1A'G_yA + A'G_yAC_3A'G_yA = A'G_yA \tag{5.38}$$

"element (2,1)"

$$A'G_yAC_1(G_x + A'G_yA) + A'G_yAC_2A'G_yA = A'G_yA \tag{5.39}$$

"element (2,2)"

$$A'G_yAC_1A'G_yA = 0. \tag{5.40}$$

First, we realize that the right sides of the matrix identities are symmetric matrices. Accordingly, the left sides have to constitute symmetric matrices, too:

(1,1): $$(G_x + A'G_yA)C_1'(G_x + A'G_yA) + (G_x + A'G_yA)C_3'A'G_yA + A'G_yAC_2'(G_x + A'G_yA) + A'G_yAC_4'A'G_yA = G_x + A'G_yA.$$

Second, we are going to solve for $C_1$, $C_2$, $C_3 = C_2'$ and $C_4$:

$$C_1 = (G_x + A'G_yA)^{-1}\{I_m - A'G_yA[A'G_yA(G_x + A'G_yA)^{-1}A'G_yA]^-A'G_yA(G_x + A'G_yA)^{-1}\} \tag{5.42}$$

$$C_4 = -[A'G_yA(G_x + A'G_yA)^{-1}A'G_yA]^- \tag{5.45}$$

or

$$A'G_yA(G_x + A'G_yA)^{-1}A'G_yA[A'G_yA(G_x + A'G_yA)^{-1}A'G_yA]^-A'G_yA(G_x + A'G_yA)^{-1}A'G_yA = A'G_yA(G_x + A'G_yA)^{-1}A'G_yA \tag{5.46}$$
as an exercise. Similarly,

$$C_1 = (I_m - C_2A'G_yA)(G_x + A'G_yA)^{-1} \tag{5.47}$$

solves (2,2), where we again take advantage of the axiom of the g-inverse, namely

$$A'G_yAC_1A'G_yA = 0 \tag{5.48}$$

$$A'G_yA(G_x + A'G_yA)^{-1}A'G_yA - A'G_yA(G_x + A'G_yA)^{-1}A'G_yA[A'G_yA(G_x + A'G_yA)^{-1}A'G_yA]^-A'G_yA(G_x + A'G_yA)^{-1}A'G_yA = 0.$$

Substituting $C_1$ and $C_2$ into the symmetrized identity (1,1), we receive

$$2A'G_yA(G_x + A'G_yA)^{-1}A'G_yA[A'G_yA(G_x + A'G_yA)^{-1}A'G_yA]^-A'G_yA(G_x + A'G_yA)^{-1}A'G_yA + A'G_yA(G_x + A'G_yA)^{-1}A'G_yAC_4A'G_yA(G_x + A'G_yA)^{-1}A'G_yA = A'G_yA(G_x + A'G_yA)^{-1}A'G_yA.$$

Finally, substitute

$$C_4 = -[A'G_yA(G_x + A'G_yA)^{-1}A'G_yA]^- \tag{5.50}$$

to conclude

$$A'G_yA(G_x + A'G_yA)^{-1}A'G_yA[A'G_yA(G_x + A'G_yA)^{-1}A'G_yA]^-A'G_yA(G_x + A'G_yA)^{-1}A'G_yA = A'G_yA(G_x + A'G_yA)^{-1}A'G_yA,$$

namely the axiom of the g-inverse. Obviously, $C_4$ is a symmetric matrix such that $C_4 = C_4'$.
Here ends my elaborate proof.
The results of the constructive proof of Lemma 5.2 are collected in Lemma 5.3.
Lemma 5.3 ($G_x$-minimum norm, $G_y$-least squares solution: MINOLESS):

$x_{\ell m} = \hat Ly$ is the $G_x$-minimum norm, $G_y$-least squares solution of (5.1) subject to

$$r := \operatorname{rk}A = \operatorname{rk}(A'G_yA) < \min\{n,m\}, \qquad \operatorname{rk}(G_x + A'G_yA) = m$$

if and only if

$$\hat L = A^+_{G_y,G_x} = (A^-_{\ell m})_{G_y,G_x} \tag{5.51}$$

$$x_{\ell m} = (G_x + A'G_yA)^{-1}A'G_yA[A'G_yA(G_x + A'G_yA)^{-1}A'G_yA]^-A'G_yy, \tag{5.53}$$

where $A^+_{G_y,G_x} = A^{1,2,3,4}_{G_y,G_x}$ is the $G_y, G_x$-weighted Moore-Penrose inverse. If $\operatorname{rk}G_x = m$, then

$$\hat L = E_R^-D_L^-, \qquad E_R^- = E_m^- = E'(EE')^{-1} \ \text{(right inverse)}, \qquad D_L^- = D_\ell^- = (D'D)^{-1}D' \ \text{(left inverse)}. \tag{5.58}$$
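Formula (5.53) can be checked numerically on the front-page example; this sketch assumes the unweighted metrics $G_x = G_y = I_3$ and uses the Moore-Penrose inverse as the admissible g-inverse:

```python
import numpy as np

# MINOLESS x_lm from (5.53) with G_x = I_3, G_y = I_3 (an assumption),
# checked against the Moore-Penrose solution of the rank-deficient system.
A = np.array([[-1.0, 1.0, 0.0],
              [ 0.0,-1.0, 1.0],
              [ 1.0, 0.0,-1.0]])
y = np.array([1.0, 1.0, -3.0])
Gx, Gy = np.eye(3), np.eye(3)

N = A.T @ Gy @ A                       # normal-equation matrix, rk N = 2 < 3
S = Gx + N                             # regular since rk(G_x + A'G_yA) = m
M = N @ np.linalg.inv(S) @ N
# any g-inverse of M will do; the Moore-Penrose inverse is one choice
x_lm = np.linalg.inv(S) @ N @ np.linalg.pinv(M) @ A.T @ Gy @ y

assert np.allclose(x_lm, np.linalg.pinv(A) @ y)
print(x_lm)   # approx [-1.333, 0., 1.333]
```

Even though the normal-equation matrix $N$ is singular, the bordered matrix $G_x + N$ is regular, which is exactly the rank condition of Lemma 5.3.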
$$y = y_{\ell m} + i_{\ell m} \tag{5.61}$$

is an orthogonal decomposition of the observation vector $y \in Y = \mathbb{R}^n$ into

$$Ax_{\ell m} = y_{\ell m} \in \mathcal{R}(A) \quad\text{and}\quad y - Ax_{\ell m} = i_{\ell m} \in \mathcal{R}(A)^\perp, \tag{5.62}$$

the vector of inconsistency:

$$y_{\ell m} = Ax_{\ell m} = AA^+y = D(D'D)^{-1}D'y = DD_L^-y$$

and

$$i_{\ell m} = y - y_{\ell m} = (I_n - AA^+)y = [I_n - D(D'D)^{-1}D']y = (I_n - DD_L^-)y.$$

$AA^+y = D(D'D)^{-1}D'y = DD_L^-y = y_{\ell m}$ is the projection $P_{\mathcal{R}(A)}$ and $(I_n - AA^+)y = [I_n - D(D'D)^{-1}D']y = (I_n - DD_L^-)y$ is the projection $P_{\mathcal{R}(A)^\perp}$.

$$x_{\ell m} = (A^-_{\ell m})_{G_y,G_x}y = A^+_{G_y,G_x}y = G_x^{-1}E'(EG_x^{-1}E')^{-1}(D'G_yD)^{-1}D'G_yy. \tag{5.66}$$

$$y = y_{\ell m} + i_{\ell m} \tag{5.68}$$

again decomposes the observation vector into its consistent part and the vector of inconsistency:

$$y_{\ell m} = AA^+_{G_y,G_x}y \quad\text{and}\quad i_{\ell m} = (I_n - AA^+_{G_y,G_x})y \tag{5.70}$$

$$AA^+_{G_y,G_x} = P_{\mathcal{R}(A)}, \qquad I_n - AA^+_{G_y,G_x} = P_{\mathcal{R}(A)^\perp}$$

are $G_y$-orthogonal:

$$\langle i_{\ell m} \mid y_{\ell m}\rangle_{G_y} = 0 \quad\text{or}\quad (I_n - AA^+_{(\text{weighted})})'G_yA = 0. \tag{5.71}$$

The "goodness of fit" of $G_x, G_y$-MINOLESS is

$$\|y - Ax_{\ell m}\|^2_{G_y} = \|i_{\ell m}\|^2_{G_y} = y'[I_n - AA^+_{G_y,G_x}]'G_y[I_n - AA^+_{G_y,G_x}]y = y'[I_n - D(D'G_yD)^{-1}D'G_y]'G_y[I_n - D(D'G_yD)^{-1}D'G_y]y. \tag{5.72}$$
While Lemma 5.4 took advantage of rank factorization, Lemma 5.5 will alternatively focus on additive rank partitioning.

$$\hat L = A^-_{\ell m} = \begin{bmatrix} N_{11}'\\ N_{12}' \end{bmatrix}(N_{12}N_{12}' + N_{11}N_{11}')^{-1}[\,A_{11}',\ A_{21}'\,] \tag{5.75}$$

subject to

$$N_{11} := A_{11}'A_{11} + A_{21}'A_{21}, \qquad N_{12} := A_{11}'A_{12} + A_{21}'A_{22} \tag{5.76}$$

$$N_{21} := A_{12}'A_{11} + A_{22}'A_{21}, \qquad N_{22} := A_{12}'A_{12} + A_{22}'A_{22} \tag{5.77}$$

or

$$x_{\ell m} = \begin{bmatrix} N_{11}'\\ N_{12}' \end{bmatrix}(N_{12}N_{12}' + N_{11}N_{11}')^{-1}[\,A_{11}',\ A_{21}'\,]\begin{bmatrix} y_1\\ y_2 \end{bmatrix}. \tag{5.78}$$

The unknown vector $x_{\ell m}$ has the minimum Euclidean length

$$\|x_{\ell m}\|^2 = x_{\ell m}'x_{\ell m} = [\,y_1',\ y_2'\,]\begin{bmatrix} A_{11}\\ A_{21} \end{bmatrix}(N_{12}N_{12}' + N_{11}N_{11}')^{-1}[\,A_{11}',\ A_{21}'\,]\begin{bmatrix} y_1\\ y_2 \end{bmatrix}. \tag{5.79}$$
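The additive rank-partitioning formula (5.78) can be sketched on the front-page example; the unweighted metrics are an assumption here:

```python
import numpy as np

# Lemma 5.5 sketch: MINOLESS via (5.78),
# x_lm = [N11'; N12'] (N12 N12' + N11 N11')^{-1} [A11', A21'] [y1; y2].
A = np.array([[-1.0, 1.0, 0.0],
              [ 0.0,-1.0, 1.0],
              [ 1.0, 0.0,-1.0]])
y = np.array([1.0, 1.0, -3.0])
r = 2

A11, A12 = A[:r, :r], A[:r, r:]
A21, A22 = A[r:, :r], A[r:, r:]

N11 = A11.T @ A11 + A21.T @ A21
N12 = A11.T @ A12 + A21.T @ A22

L_hat = np.vstack([N11.T, N12.T]) \
        @ np.linalg.inv(N12 @ N12.T + N11 @ N11.T) \
        @ np.hstack([A11.T, A21.T])
x_lm = L_hat @ y

# (5.78) reproduces the Moore-Penrose (minimum norm, least squares) solution
assert np.allclose(L_hat, np.linalg.pinv(A))
assert np.allclose(x_lm, np.array([-4/3, 0.0, 4/3]))
```

Note that only the blocks $A_{11}$, $A_{21}$ of the partitioned observation equations enter the data term, while $A_{12}$, $A_{22}$ enter through $N_{12}$.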
$$y = y_{\ell m} + i_{\ell m}, \qquad \hat L = (A^-_{\ell m})_{G_y,G_x}.$$
The left side of the equation $L'A'G_yAL = L'A'G_y$ is a symmetric matrix. Consequently the right side has to be symmetric, too. Indeed we have proven condition (iii): $(G_yAL)' = G_yAL$. Let us transplant the symmetric condition (iii) into the original normal equations in order to benefit from

$$A'G_yAL = A'G_y \quad\text{or}\quad G_yA = L'A'G_yA = (G_yAL)'A = G_yALA.$$

(1st)

$$(1)\ AA^-A = A, \qquad (2)\ A^-AA^- = A^-, \qquad (3)\ (AA^-)'G_y = G_yAA^-, \qquad (4)\ (A^-A)'G_x = G_xA^-A$$

(2nd)

$$A'G_yAA^- = A'G_y, \qquad (A^-)'G_xA^-A = (A^-)'G_x$$

(3rd)

$$AA^- = P_{\mathcal{R}(A)}, \qquad A^-A = P_{\mathcal{R}(A^-)}.$$

Equivalently,

$$(2)\ G_xA^-AA^- = G_xA^-, \qquad (3)\ (AA^-)'G_y = G_yAA^-, \qquad (4)\ (A^-A)'G_x = G_xA^-A.$$
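The four conditions can be verified numerically; this minimal check assumes the unweighted case $G_x = G_y = I$, in which they are exactly the Moore-Penrose axioms:

```python
import numpy as np

# Check of conditions (1)-(4) for G_x = G_y = I (an assumption):
# then A^- = A^+ and the conditions are the Moore-Penrose axioms.
A = np.array([[-1.0, 1.0, 0.0],
              [ 0.0,-1.0, 1.0],
              [ 1.0, 0.0,-1.0]])
Am = np.linalg.pinv(A)

assert np.allclose(A @ Am @ A, A)          # (1) A A^- A = A
assert np.allclose(Am @ A @ Am, Am)        # (2) A^- A A^- = A^-
assert np.allclose((A @ Am).T, A @ Am)     # (3) (A A^-)' G_y = G_y A A^-
assert np.allclose((Am @ A).T, Am @ A)     # (4) (A^- A)' G_x = G_x A^- A
```

For general positive-definite $G_x$, $G_y$, conditions (3) and (4) replace plain symmetry by $G$-symmetry of the two projectors.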
Note 3:

A g-inverse which satisfies the conditions of Note 1 is denoted by $A^+_{G_y,G_x}$ and referred to as the $G_y, G_x$-MINOLESS g-inverse of A. $A^+_{G_y,G_x}$ is unique if $G_x$ is positive definite. When both $G_x$ and $G_y$ are general positive semidefinite matrices, $A^+_{G_y,G_x}$ may not be unique. If $|G_x + A'G_yA| \ne 0$ holds, $A^+_{G_y,G_x}$ is unique.

Note 4:

If the matrices of the metric are positive definite, $|G_x| \ne 0$, $|G_y| \ne 0$, then

$$(i)\ (A^+_{G_y,G_x})^+_{G_x,G_y} = A, \qquad (ii)\ (A^+_{G_y,G_x})^\# = (A')^+_{G_x^{-1},G_y^{-1}}.$$
$$e_x = V'\Lambda_x^{1/2}a \quad\text{(5.96)}\qquad\text{versus}\qquad e_y = U'\Lambda_y^{1/2}b \quad\text{(5.97)}$$

$$\operatorname{span}\{e_1^x,\dots,e_m^x\} = X, \qquad Y = \operatorname{span}\{e_1^y,\dots,e_n^y\}.$$

Second, we are solving the general system of linear equations

$$\{y = Ax \mid A \in \mathbb{R}^{n\times m},\ \operatorname{rk}A < \min\{n,m\}\}$$

by introducing

- the eigenspace of the rank-deficient, rectangular matrix of rank $r := \operatorname{rk}A < \min\{n,m\}$: $A \mapsto A^*$
- the left and right canonical coordinates: $x \mapsto x^*$, $y \mapsto y^*$,

as supported by Box 5.9. The transformations $x \mapsto x^*$ (5.97), $y \mapsto y^*$ (5.98) from the original coordinates $(x_1,\dots,x_m)$ to the canonical coordinates $(x_1^*,\dots,x_m^*)$, the left star coordinates, as well as from the original coordinates $(y_1,\dots,y_n)$ to the canonical coordinates $(y_1^*,\dots,y_n^*)$, the right star coordinates, are polar decompositions: a rotation $\{U, V\}$ is followed by a general stretch $\{G_y^{1/2}, G_x^{1/2}\}$. Those root matrices are generated by product decompositions of type $G_y = (G_y^{1/2})'G_y^{1/2}$ as well as $G_x = (G_x^{1/2})'G_x^{1/2}$. Let us substitute the inverse transformations (5.99) $x^* \mapsto x = G_x^{-1/2}Vx^*$ and (5.100) $y^* \mapsto y = G_y^{-1/2}Uy^*$ into the system of linear equations (5.1), (5.101) $y = Ax + i$, $\operatorname{rk}A < \min\{n,m\}$, or its dual (5.102) $y^* = A^*x^* + i^*$. Such an operation leads us to (5.103) $y^* = f(x^*)$ as well as (5.104) $y = f(x)$. Subject to the orthonormality conditions (5.105) $U'U = I_n$ and (5.106) $V'V = I_m$ we have generated the left-right eigenspace analysis (5.107)

$$A^* = \begin{bmatrix} \Lambda & O_1\\ O_2 & O_3 \end{bmatrix},$$

which is based upon the left matrix (5.109) $L := G_y^{-1/2}U$, decomposed into (5.111) $L_1 := G_y^{-1/2}U_1$ and $L_2 := G_y^{-1/2}U_2$, and the right matrix (5.110) $R := G_x^{-1/2}V$, decomposed into $R_1 := G_x^{-1/2}V_1$ and $R_2 := G_x^{-1/2}V_2$. Indeed the left matrix $L$ by means of (5.113) $LL' = G_y^{-1}$ reconstructs the inverse matrix of the metric of the observation space $Y$. Similarly, the right matrix $R$ by means of (5.114) $RR' = G_x^{-1}$ generates the inverse matrix of the metric of the parameter space $X$. In terms of "L, R" we have summarized the eigenvalue decompositions (5.117)-(5.122).
Such an eigenvalue decomposition helps us to canonically invert $y^* = A^*x^* + i^*$ by means of (5.123), namely the "full rank partitioning" of the system of canonical linear equations $y^* = A^*x^* + i^*$. The observation vector $y^* \in \mathbb{R}^n$ is decomposed into $y_1^* \in \mathbb{R}^{r\times1}$ and $y_2^* \in \mathbb{R}^{(n-r)\times1}$, while the vector $x^* \in \mathbb{R}^m$ of unknown parameters is decomposed into $x_1^* \in \mathbb{R}^{r\times1}$ and $x_2^* \in \mathbb{R}^{(m-r)\times1}$:

$$(x_1^*)_{\ell m} = \Lambda^{-1}y_1^*$$

"dimension identities"

$$\Lambda \in \mathbb{R}^{r\times r},\quad O_1 \in \mathbb{R}^{r\times(m-r)},\quad U_1 \in \mathbb{R}^{n\times r},\quad V_1 \in \mathbb{R}^{m\times r}$$

$$O_2 \in \mathbb{R}^{(n-r)\times r},\quad O_3 \in \mathbb{R}^{(n-r)\times(m-r)},\quad U_2 \in \mathbb{R}^{n\times(n-r)},\quad V_2 \in \mathbb{R}^{m\times(m-r)}$$

$$L^{-1} = \begin{bmatrix} U_1'\\ U_2' \end{bmatrix}G_y^{1/2} =: \begin{bmatrix} L^1\\ L^2 \end{bmatrix} \ \text{(5.116)} \qquad\text{versus}\qquad R^{-1} = \begin{bmatrix} V_1'\\ V_2' \end{bmatrix}G_x^{1/2} =: \begin{bmatrix} R^1\\ R^2 \end{bmatrix} \ \text{(5.117)}$$

$$A = LA^*R^{-1} \ \text{(5.118)} \qquad\text{versus}\qquad A^* = L^{-1}AR \ \text{(5.119)}$$

$$A = [\,L_1,\ L_2\,]A^*\begin{bmatrix} R^1\\ R^2 \end{bmatrix} \ \text{(5.120)} \qquad\text{versus}\qquad A^* = \begin{bmatrix} \Lambda & O_1\\ O_2 & O_3 \end{bmatrix} = \begin{bmatrix} L^1\\ L^2 \end{bmatrix}A[\,R_1,\ R_2\,] \ \text{(5.121)}$$

$$y^* = A^*x^* + i^* = \begin{bmatrix} \Lambda & O_1\\ O_2 & O_3 \end{bmatrix}\begin{bmatrix} x_1^*\\ x_2^* \end{bmatrix} + \begin{bmatrix} i_1^*\\ i_2^* \end{bmatrix} = \begin{bmatrix} y_1^*\\ y_2^* \end{bmatrix} \tag{5.124}$$

Consult the commutative diagram of Figure 5.6 for a shortened summary of the newly introduced transformation of coordinates, both of the parameter space X as well as the observation space Y.
Figure 5.6: Commutative diagram of coordinate transformations: $X \xrightarrow{A} \mathcal{R}(A) \subset Y$, with $V'G_x^{1/2}: X \to X^*$ and $U'G_y^{1/2}: Y \to Y^*$, and $X^* \xrightarrow{A^*} \mathcal{R}(A^*) \subset Y^*$.
Third, we prepare ourselves for MINOLESS of the general system of linear equations

$$\{y = Ax + i \mid A \in \mathbb{R}^{n\times m},\ \operatorname{rk}A < \min\{n,m\}\}.$$

$$A^* = \begin{bmatrix} \Lambda & O_1\\ O_2 & O_3 \end{bmatrix} = \begin{bmatrix} L^1\\ L^2 \end{bmatrix}A[\,R_1,\ R_2\,] \quad\text{versus}\quad A = [\,L_1,\ L_2\,]A^*\begin{bmatrix} R^1\\ R^2 \end{bmatrix}$$

are determined by the eigenvalue-eigencolumn equations (eigenspace equations)

$$A^\#AR_1 = R_1\Lambda^2 \quad\text{versus}\quad AA^\#L_1 = L_1\Lambda^2$$

$$A^\#AR_2 = 0 \quad\text{versus}\quad AA^\#L_2 = 0$$

subject to

$$\Lambda^2 = \begin{bmatrix} \lambda_1^2 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_r^2 \end{bmatrix}, \qquad \Lambda = \operatorname{Diag}(+\lambda_1,\dots,+\lambda_r).$$
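The canonical (left-right eigenspace) decomposition can be sketched numerically; in the unweighted case $G_x = G_y = I$ (an assumption here) it reduces to the singular value decomposition:

```python
import numpy as np

# Left-right eigenspace decomposition for G_x = G_y = I: plain SVD.
A = np.array([[-1.0, 1.0, 0.0],
              [ 0.0,-1.0, 1.0],
              [ 1.0, 0.0,-1.0]])
r = np.linalg.matrix_rank(A)           # r = 2

U, s, Vt = np.linalg.svd(A)
Lambda = np.diag(s[:r])                # Lambda = Diag(lambda_1, ..., lambda_r)

# A* = U'AV has the block structure [[Lambda, O1], [O2, O3]]
A_star = U.T @ A @ Vt.T
assert np.allclose(A_star[:r, :r], Lambda)
assert np.allclose(A_star[r:, :], 0) and np.allclose(A_star[:r, r:], 0)

# canonical inversion: (x1*)_lm = Lambda^{-1} y1*, x2* = 0 reproduces MINOS
y = np.array([1.0, 1.0, -3.0])
y_star = U.T @ y
x_star = np.concatenate([np.linalg.solve(Lambda, y_star[:r]), np.zeros(3 - r)])
x_lm = Vt.T @ x_star
assert np.allclose(x_lm, np.linalg.pinv(A) @ y)
```

Setting the free canonical coordinates $x_2^*$ to zero is precisely what makes the canonical inverse the minimum-norm solution.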
5-24 Notes
The algebra of eigensystems is treated in varying degrees by most books on lin-
ear algebra, in particular tensor algebra. Special mention should be made of R.
Bellman’s classic “Introduction to matrix analysis” (1970) and Horn’s and John-
son’s two books (1985, 1991) on introductory and advanced matrix analysis.
More or less systematic treatments of eigensystems are found in books on matrix
computations. The classics of the field are Householder’s “Theory of matrices in
numerical analysis” (1964) and Wilkinson’s “The algebraic eigenvalue prob-
lem” (1965) . G. Golub’s and Van Loan’s “Matrix computations” (1996) is the
currently definitive survey of the field. Trefethen’s and Bau’s “Numerical linear
algebra” (1997) is a high-level, insightful treatment with a welcome stress on
geometry. G.W. Stewart’s “Matrix algorithms: eigensystems” (2001) is becoming
a classic as well.
The term “eigenvalue” derives from the German Eigenwert, which was intro-
duced by D. Hilbert (1904) to denote for integral equations the reciprocal of the
matrix eigenvalue. At some point Hilbert’s Eigenwerte inverted themselves and
became attached to matrices. Eigenvalues have been called many things in their
day. The “characteristic value” is a reasonable translation of Eigenwert. How-
ever, “characteristic” has an inconveniently large number of syllables and sur-
vives only in the terms “characteristic equation” and “characteristic polyno-
mial”. For symmetric matrices the characteristic equation and its equivalent are
also called the secular equation owing to its connection with the secular pertur-
bations in the orbits of planets. Other terms are “latent value” and “proper value”
from the French “valeur propre”.
Indeed the day when purists and pedants could legitimately object to “eigen-
value” as a hybrid of German and English has long since passed. The German
“eigen” has become a thoroughly naturalized English prefix meaning having to
do with eigenvalues and eigenvectors. Thus we can use “eigensystem”, “eigen-
space” or “eigenexpansion” without fear of being misunderstood. The term “ei-
genpair” used to denote an eigenvalue and eigenvector is a recent innovation.
5-3 The hybrid approximation solution: α-HAPS and Tykhonov-Phillips regularization

$G_x, G_y$-MINOLESS has been built on sequential approximations. First, the surjectivity defect was secured by $G_y$-LESS. The corresponding normal equations suffered from the effect of the injectivity defect. Accordingly, second, $G_x$-MINOS generated a unique solution of the rank-deficient normal equations. Alternatively, we may constitute a unique solution of the system of inconsistent, rank-deficient equations

$$\{Ax + i = y \mid A \in \mathbb{R}^{n\times m},\ r := \operatorname{rk}A < \min\{n,m\}\}$$

by the α-weighted hybrid norm of type "LESS" and "MINOS". Such a solution of a general algebraic regression problem is also called

- Tykhonov-Phillips regularization
- ridge estimator
- α-HAPS.

Indeed, α-HAPS is the most popular inversion operation, namely to regularize improperly posed problems. An example is the discretized version of an integral equation of the first kind.

Definition 5.8 (α-HAPS):

An $m \times 1$ vector $x_h$ is called weighted α-HAPS (Hybrid APproximative Solution) with respect to an α-weighted $G_x, G_y$-seminorm of (5.1) if

$$x_h = \arg\{\,\|y - Ax\|^2_{G_y} + \alpha\|x\|^2_{G_x} = \min \mid Ax + i = y,\ A \in \mathbb{R}^{n\times m},\ \operatorname{rk}A \le \min\{n,m\}\,\}. \tag{5.125}$$

Note that we may apply weighted α-HAPS even for the case of rank identity, $\operatorname{rk}A = \min\{n,m\}$. The factor $\alpha \in \mathbb{R}^+$ balances the least-squares norm and the minimum norm of the unknown vector, which is illustrated by Figure 5.7.

:Proof:

α-HAPS is constructed by means of the Lagrangean

$$\mathcal{L}(x) := \|y - Ax\|^2_{G_y} + \alpha\|x\|^2_{G_x} = (y - Ax)'G_y(y - Ax) + \alpha\,x'G_xx = \min_x,$$
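The α-HAPS minimizer is $x_h = (A'G_yA + \alpha G_x)^{-1}A'G_yy$; the following minimal sketch (unweighted metrics assumed, leveling matrix reused for illustration) shows the regularizing effect of α:

```python
import numpy as np

# alpha-HAPS (Tykhonov-Phillips / ridge): minimizer of
# ||y - Ax||^2_{G_y} + alpha ||x||^2_{G_x}, unique for alpha > 0.
A = np.array([[-1.0, 1.0, 0.0],
              [ 0.0,-1.0, 1.0],
              [ 1.0, 0.0,-1.0]])
y = np.array([1.0, 1.0, -3.0])
Gx, Gy = np.eye(3), np.eye(3)

def haps(alpha):
    # regular system even though rk A = 2 < 3
    return np.linalg.solve(A.T @ Gy @ A + alpha * Gx, A.T @ Gy @ y)

# as alpha -> 0, alpha-HAPS tends to the MINOLESS / pseudoinverse solution
assert np.allclose(haps(1e-10), np.linalg.pinv(A) @ y, atol=1e-6)
# a larger alpha shrinks the solution toward zero
assert np.linalg.norm(haps(1.0)) < np.linalg.norm(haps(0.01))
```

The two assertions express the two limits of the hybrid norm: for α → 0 the least-squares term dominates and α-HAPS approaches MINOLESS, while growing α pulls the solution toward the minimum-norm target.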
Lemma 6.2: $\hat\xi$ hom $\Sigma_y$, S-BLUMBE of $\xi$

Theorem 6.3: hom $\Sigma_y$, S-BLUMBE of $\xi$

Theorem 6.5: $\hat\sigma^2$ BIQUUE of $\sigma^2$

Theorem 6.6: $\hat\sigma^2$ BIQE of $\sigma^2$

286 6 The third problem of probabilistic regression

Theorem 6.13: $\hat\xi$ hom α-BLE

Definition 6.1 and Lemma 6.2, Theorem 6.3, Lemma 6.4, Theorems 6.5 and 6.6 review $\hat\xi$ of type hom $\Sigma_y$, S-BLUMBE, BIQE, followed by the first example. Alternatively, estimators of type best linear, namely hom BLE, hom S-BLE and hom α-BLE, are presented. Definitions 6.7, 6.8 and 6.9 relate to various estimators, followed by Lemma 6.10, Theorems 6.11, 6.12 and 6.13.
In the fifth chapter we have solved a special algebraic regression problem, namely the inversion of a system of inconsistent linear equations with a datum defect. By means of a hierarchic postulate of a minimum norm $\|x\|^2 = \min$, least squares solution $\|y - Ax\|^2 = \min$ ("MINOLESS") we were able to determine $m$ unknowns from $n$ observations though the rank of the linear operator, $\operatorname{rk}A = r < \min\{n,m\}$, was less than the number of observations or less than the number of unknowns. Though "MINOLESS" generates a rigorous solution, we were left with the problem to interpret our results.

The key for an evolution of "MINOLESS" is handed over to us by treating the special algebraic problem by means of a special probabilistic regression problem, namely as a special Gauss-Markov model with datum defect. The bias

6-1 Setup of the best linear minimum bias estimator of type BLUMBE 287

1st moments

$$E\{y\} = A\xi, \qquad \operatorname{rk}A < \min\{n,m\} \tag{6.1}$$

2nd moments

$$D\{y\} =: \Sigma_y, \qquad \Sigma_y\ \text{positive definite},\ \operatorname{rk}\Sigma_y = n, \tag{6.2}$$
Box 6.2: Bias vector, bias matrix; vector and matrix bias norms. Special linear Gauss-Markov model of fixed effects subject to datum defect:

$$A \in \mathbb{R}^{n\times m}, \qquad \operatorname{rk}A < \min\{n,m\}$$

"ansatz"

$$\hat\xi = Ly \tag{6.4}$$

bias vector, bias matrix

$$B := I_m - LA \tag{6.7}$$

"bias norms"

$$\|\beta\|^2 = \beta'\beta = \xi'[I_m - LA]'[I_m - LA]\xi \tag{6.8}$$

"dispersion matrix", "decomposition"

Definition 6.1 defines (1st) $\hat\xi$ as a linear homogeneous form, (2nd) of type "minimum bias" and (3rd) of type "smallest average variance". Chapter 6-11 collects the definitions, lemmas and theorems basic for the later developments.

6-11 Definitions, lemmas and theorems

Definition 6.1 ($\hat\xi$ hom $\Sigma_y$, S-BLUMBE):
:Proof:

First, we minimize the S-modified bias matrix norm, second the MSE($\hat\xi$) matrix norm. All matrix norms have been chosen "Frobenius".

(i) $\|(I_m - LA)'\|^2_S = \min_L$.

The S-weighted Frobenius matrix norm $\|(I_m - LA)'\|^2_S$ establishes the Lagrangean

$$\mathcal{L}(L) := \operatorname{tr}(I_m - LA)S(I_m - LA)' = \min_L$$

for S-BLUMBE:

$$\mathcal{L}(\hat L) = \min \iff ASA'\hat L' - AS = 0, \qquad (ASA') \otimes I_m \ge 0.$$

Let us multiply the third identity by $\Sigma_y^{-1}ASA'$ from the right side and substitute the fourth identity in order to solve for $C_2$:

$$ASA'C_2ASA'\Sigma_y^{-1}ASA' = ASA'\Sigma_y^{-1}ASA' \tag{6.30}$$

$$\frac{\partial^2\mathcal{L}}{\partial(\operatorname{vec}L)\,\partial(\operatorname{vec}L')}(\hat L, \hat\Lambda) = 2(\Sigma_y \otimes I_m) > 0,$$

to be a positive-definite matrix, the sufficiency condition. Indeed, the first matrix derivatives have been identified as the normal equations of the sequential optimization problem.
For an explicit representation of $\hat\xi$, hom $\Sigma_y$, S-BLUMBE of $\xi$, we solve the normal equations for

$$\hat\xi = SA'(ASA'\Sigma_y^{-1}ASA')^-ASA'\Sigma_y^{-1}y, \tag{6.33}$$

completed by the dispersion matrix

$$D\{\hat\xi\} = SA'(ASA'\Sigma_y^{-1}ASA')^-AS, \tag{6.34}$$

modified by

$$\operatorname{rk}\operatorname{MSE}\{\hat\xi\} = \operatorname{rk}S \tag{6.38}$$

is singular.

$$\begin{vmatrix} \Sigma_y & ASA'\\ ASA' & 0 \end{vmatrix} = \pm\,|\Sigma_y|\,|ASA'\Sigma_y^{-1}ASA'| = 0,$$

$$\hat L' = C_2AS \tag{6.40}$$

$$\hat L = SA'C_2' \tag{6.41}$$

such that

$$\hat\xi = \hat Ly = SA'(ASA'\Sigma_y^{-1}ASA')^-ASA'\Sigma_y^{-1}y. \tag{6.43}$$

$$D\{\hat\xi\} = \hat L\Sigma_y\hat L'$$

$$\beta := E\{\hat\xi - \xi\} = -(I_m - \hat LA)\xi = -[I_m - SA'(ASA'\Sigma_y^{-1}ASA')^-ASA'\Sigma_y^{-1}A]\xi. \tag{6.46}$$
6-12 The first example: BLUMBE versus BLE, BIQUUE versus BIQE, triangular leveling network

For the first example for the special Gauss-Markov model with datum defect,

(i) I, I-BLUMBE of $\xi \in \mathbb{R}^m$
(ii) V, S-BLUMBE of $\xi \in \mathbb{R}^m$
(iii) I, I-BLE of $\xi \in \mathbb{R}^m$
(iv) V, S-BLE of $\xi \in \mathbb{R}^m$
(v) BIQUUE of $\sigma^2 \in \mathbb{R}^+$
(vi) BIQE of $\sigma^2 \in \mathbb{R}^+$

will be considered. In particular, we use consecutive results of Appendix A, namely from solving linear systems of equations based upon generalized inverses, in short g-inverses. For the analyst, the special Gauss-Markov model with datum defect constituted by the problem of estimating absolute heights $[h_\alpha, h_\beta, h_\gamma]$ of points $\{P_\alpha, P_\beta, P_\gamma\}$ from height differences is formulated in Box 6.3.

Box 6.3: The first example

$$E\Big\{\begin{bmatrix} h_{\alpha\beta}\\ h_{\beta\gamma}\\ h_{\gamma\alpha} \end{bmatrix}\Big\} = \begin{bmatrix} -1 & +1 & 0\\ 0 & -1 & +1\\ +1 & 0 & -1 \end{bmatrix}\begin{bmatrix} h_\alpha\\ h_\beta\\ h_\gamma \end{bmatrix}$$

$$y := \begin{bmatrix} h_{\alpha\beta}\\ h_{\beta\gamma}\\ h_{\gamma\alpha} \end{bmatrix}, \qquad A := \begin{bmatrix} -1 & +1 & 0\\ 0 & -1 & +1\\ +1 & 0 & -1 \end{bmatrix} \in \mathbb{R}^{3\times3}, \qquad \xi := \begin{bmatrix} h_\alpha\\ h_\beta\\ h_\gamma \end{bmatrix}$$

$$D\Big\{\begin{bmatrix} h_{\alpha\beta}\\ h_{\beta\gamma}\\ h_{\gamma\alpha} \end{bmatrix}\Big\} = D\{y\} = V\sigma^2, \qquad \sigma^2 \in \mathbb{R}^+$$

:dimensions:

$$\xi \in \mathbb{R}^3,\ \dim\xi = 3, \qquad y \in \mathbb{R}^3,\ \dim\{Y, \mathrm{pdf}\} = 3$$

$$m = 3,\ n = 3,\ \operatorname{rk}A = 2,\ \operatorname{rk}V = 3.$$
$$\hat\xi = A'(AA'AA')^-AA'y$$

$$AA'AA' = 3\begin{bmatrix} 2 & -1 & -1\\ -1 & 2 & -1\\ -1 & -1 & 2 \end{bmatrix}, \qquad (AA'AA')^- = \frac{1}{9}\begin{bmatrix} 2 & 1 & 0\\ 1 & 2 & 0\\ 0 & 0 & 0 \end{bmatrix}.$$

$$\hat\xi = \begin{bmatrix} \hat h_\alpha\\ \hat h_\beta\\ \hat h_\gamma \end{bmatrix}(\mathrm{I}_3, \mathrm{I}_3\text{-BLUMBE}) = \frac{1}{3}\begin{bmatrix} -y_1 + y_3\\ y_1 - y_2\\ y_2 - y_3 \end{bmatrix}$$

$$D\{\hat\xi\} = \sigma^2A'(AA'AA')^-A$$

$$D\{\hat\xi\} = \frac{\sigma^2}{9}\begin{bmatrix} +2 & -1 & -1\\ -1 & +2 & -1\\ -1 & -1 & +2 \end{bmatrix}$$
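The I₃, I₃-BLUMBE estimator and its dispersion can be checked numerically; this sketch uses the Moore-Penrose inverse as the g-inverse of $AA'AA'$, one admissible choice:

```python
import numpy as np

# I3, I3-BLUMBE for the leveling network: estimator matrix A'(AA'AA')^- AA'.
A = np.array([[-1.0, 1.0, 0.0],
              [ 0.0,-1.0, 1.0],
              [ 1.0, 0.0,-1.0]])
K = A @ A.T                                   # [[2,-1,-1],[-1,2,-1],[-1,-1,2]]
L_hat = A.T @ np.linalg.pinv(K @ K) @ K       # BLUMBE estimator matrix

# for S = I3, Sigma_y = I3 sigma^2 this reduces to the pseudoinverse solution
assert np.allclose(L_hat, np.linalg.pinv(A))

# dispersion D{xi_hat} = sigma^2 A'(AA'AA')^- A = (sigma^2/9) K (sigma^2 = 1)
D = A.T @ np.linalg.pinv(K @ K) @ A
assert np.allclose(D, K / 9)
```

With identity weights BLUMBE thus coincides with MINOLESS of Chapter 5, which is the link between the algebraic and the probabilistic treatment of the datum defect.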
"replace $\sigma^2$ by $\hat\sigma^2$ (BIQUUE): $\hat\sigma^2 = (n - \operatorname{rk}A)^{-1}e_y'e_y$"

$$e_y(\mathrm{I}_3, \mathrm{I}_3\text{-BLUMBE}) = [I_3 - A(AA')^-A']y$$

$$e_y = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}y = \frac{1}{3}(y_1 + y_2 + y_3)\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$$

$$e_y'e_y = \frac{1}{9}y'\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}y$$

$$e_y'e_y = \frac{1}{3}(y_1^2 + y_2^2 + y_3^2 + 2y_1y_2 + 2y_2y_3 + 2y_3y_1)$$

$$\hat\sigma^2(\text{BIQUUE}) = \frac{1}{3}(y_1^2 + y_2^2 + y_3^2 + 2y_1y_2 + 2y_2y_3 + 2y_3y_1)$$

$$D\{\hat\xi\} = \frac{1}{9}\begin{bmatrix} +2 & -1 & -1\\ -1 & +2 & -1\\ -1 & -1 & +2 \end{bmatrix}\hat\sigma^2(\text{BIQUUE})$$

"replace $\sigma^2$ by $\hat\sigma^2$ (BIQE): $\hat\sigma^2 = (n - \operatorname{rk}A + 2)^{-1}e_y'e_y$"

$$e_y(\mathrm{I}_3, \mathrm{I}_3\text{-BLUMBE}) = [I_3 - A(AA')^-A']y, \qquad e_y = \frac{1}{3}(y_1 + y_2 + y_3)\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$$

$$\hat\sigma^2(\text{BIQE}) = \frac{1}{9}(y_1^2 + y_2^2 + y_3^2 + 2y_1y_2 + 2y_2y_3 + 2y_3y_1)$$

$$D\{\hat\xi\} = \frac{1}{9}\begin{bmatrix} +2 & -1 & -1\\ -1 & +2 & -1\\ -1 & -1 & +2 \end{bmatrix}\hat\sigma^2(\text{BIQE}).$$
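The two variance estimators differ only in their divisor; a minimal sketch (the numerical observations are illustrative, not from the text of this subsection):

```python
import numpy as np

# BIQUUE and BIQE of sigma^2 from the I3, I3-BLUMBE residuals
# e_y = (I3 - A(AA')^- A') y for the leveling example.
A = np.array([[-1.0, 1.0, 0.0],
              [ 0.0,-1.0, 1.0],
              [ 1.0, 0.0,-1.0]])
y = np.array([1.0, 1.0, -3.0])      # illustrative observations (an assumption)
n, r = 3, 2

e_y = (np.eye(3) - A @ np.linalg.pinv(A)) @ y      # = (y1+y2+y3)/3 * (1,1,1)'
assert np.allclose(e_y, np.full(3, y.sum() / 3))

s2_biquue = e_y @ e_y / (n - r)                    # unbiased, divisor n - rk A
s2_biqe   = e_y @ e_y / (n - r + 2)                # biased, divisor n - rk A + 2
assert s2_biqe < s2_biquue                         # BIQE shrinks the estimate
```

The residual vector is the constant vector carrying the misclosure $(y_1 + y_2 + y_3)/3$, so both estimators are quadratic forms in the loop misclosure alone.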
$$\beta = -\frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}\xi,$$

"replace $\xi$, which is inaccessible, by $\hat\xi$ (I₃, I₃-BLUMBE)":

$$\beta = -\frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}\hat\xi(\mathrm{I}_3, \mathrm{I}_3\text{-BLUMBE}), \qquad \beta = 0$$

$$\operatorname{MSE}\{\hat\xi\} = \frac{\sigma^2}{9}\begin{bmatrix} 5 & 2 & 2\\ 2 & 5 & 2\\ 2 & 2 & 5 \end{bmatrix}.$$

"replace $\sigma^2$ by $\hat\sigma^2$ (BIQUUE): $\hat\sigma^2 = (n - \operatorname{rk}A)^{-1}e_y'e_y$"

$$\hat\sigma^2(\text{BIQUUE}) = \frac{1}{3}(y_1^2 + y_2^2 + y_3^2 + 2y_1y_2 + 2y_2y_3 + 2y_3y_1),$$

$$\operatorname{MSE}\{\hat\xi\} = \frac{1}{9}\begin{bmatrix} 5 & 2 & 2\\ 2 & 5 & 2\\ 2 & 2 & 5 \end{bmatrix}\hat\sigma^2(\text{BIQUUE}).$$

"replace $\sigma^2$ by $\hat\sigma^2$ (BIQE): $\hat\sigma^2 = (n - \operatorname{rk}A + 2)^{-1}e_y'e_y$"

$$\hat\sigma^2(\text{BIQE}) = \frac{1}{9}(y_1^2 + y_2^2 + y_3^2 + 2y_1y_2 + 2y_2y_3 + 2y_3y_1)$$

$$\operatorname{MSE}\{\hat\xi\} = \frac{1}{9}\begin{bmatrix} 5 & 2 & 2\\ 2 & 5 & 2\\ 2 & 2 & 5 \end{bmatrix}\hat\sigma^2(\text{BIQE}).$$
$$e_y = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}y = \frac{1}{3}(y_1 + y_2 + y_3)\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$$

$$D\{e_y\} = \sigma^2[I_3 - A(A'A)^-A']$$

$$D\{e_y\} = \frac{\sigma^2}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}.$$

$$D\{e_y\} = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}\hat\sigma^2(\text{BIQUUE}) \quad\text{or}\quad D\{e_y\} = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}\hat\sigma^2(\text{BIQE}).$$
$$V = \frac{1}{2}\begin{bmatrix} 2 & 1 & 1\\ 1 & 2 & 1\\ 1 & 1 & 2 \end{bmatrix}, \qquad \operatorname{rk}V = 3 = n$$

$$V^{-1} = \frac{1}{2}\begin{bmatrix} +3 & -1 & -1\\ -1 & +3 & -1\\ -1 & -1 & +3 \end{bmatrix}, \quad\text{but}\quad S = \operatorname{Diag}(0,1,1),\ \operatorname{rk}S = 2$$

$$ASA'V^{-1}ASA' = 2\begin{bmatrix} 2 & -3 & 1\\ -3 & 6 & -3\\ 1 & -3 & 2 \end{bmatrix}$$

$$(ASA'V^{-1}ASA')^- = \frac{1}{6}\begin{bmatrix} 2 & 0 & 1\\ 0 & 0 & 3\\ 1 & 0 & 2 \end{bmatrix}$$

$$\hat\xi = \begin{bmatrix} \hat h_\alpha\\ \hat h_\beta\\ \hat h_\gamma \end{bmatrix}_{V,S\text{-BLUMBE}} = \frac{1}{3}\begin{bmatrix} 0\\ 2y_1 - y_2 - y_3\\ y_1 + y_2 - 2y_3 \end{bmatrix}.$$
$$D\{\hat\xi\} = \frac{\sigma^2}{6}\begin{bmatrix} 0 & 0 & 0\\ 0 & 2 & 1\\ 0 & 1 & 2 \end{bmatrix}$$

"replace $\sigma^2$ by $\hat\sigma^2$ (BIQUUE): $\hat\sigma^2 = (n - \operatorname{rk}A)^{-1}e_y'e_y$"

$$e_y = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}y = \frac{y_1 + y_2 + y_3}{3}\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$$

$$e_y'e_y = \frac{1}{9}y'\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}y = \frac{1}{3}(y_1^2 + y_2^2 + y_3^2 + 2y_1y_2 + 2y_2y_3 + 2y_3y_1)$$

$$\hat\sigma^2(\text{BIQUUE}) = \frac{1}{3}(y_1^2 + y_2^2 + y_3^2 + 2y_1y_2 + 2y_2y_3 + 2y_3y_1)$$

$$D\{\hat\xi\} = \frac{2}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}\hat\sigma^2(\text{BIQUUE}).$$

"replace $\sigma^2$ by $\hat\sigma^2$ (BIQE): $\hat\sigma^2 = (n - \operatorname{rk}A + 2)^{-1}e_y'e_y$"

$$e_y(V, S\text{-BLUMBE}) = [I_3 - A(A'V^{-1}A)^-A'V^{-1}]y$$

$$e_y = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}y = \frac{y_1 + y_2 + y_3}{3}\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$$

$$\hat\sigma^2(\text{BIQE}) = \frac{1}{9}(y_1^2 + y_2^2 + y_3^2 + 2y_1y_2 + 2y_2y_3 + 2y_3y_1)$$

$$D\{\hat\xi\} = \frac{1}{9}\begin{bmatrix} +2 & -1 & -1\\ -1 & +2 & -1\\ -1 & -1 & +2 \end{bmatrix}\hat\sigma^2(\text{BIQE}).$$

$$\beta = -\begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 0\\ 1 & 0 & 0 \end{bmatrix}\xi = -\begin{bmatrix} \xi_1\\ \xi_1\\ \xi_1 \end{bmatrix},$$

"replace $\xi$, which is inaccessible, by $\hat\xi$ (V, S-BLUMBE)"

$$\beta = -\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}\hat\xi_1(V, S\text{-BLUMBE}) \ne 0.$$

Mean Square Estimation Error $\operatorname{MSE}\{\hat\xi(V, S\text{-BLUMBE})\}$:

$$\operatorname{MSE}\{\hat\xi\} = D\{\hat\xi\} + [S - SA'(ASA'V^{-1}ASA')^-ASA'V^{-1}AS]\sigma^2$$

$$\operatorname{MSE}\{\hat\xi\} = \frac{\sigma^2}{6}\begin{bmatrix} 0 & 0 & 0\\ 0 & 2 & 1\\ 0 & 1 & 2 \end{bmatrix} = D\{\hat\xi\}.$$

"replace $\sigma^2$ by $\hat\sigma^2$ (BIQUUE): $\hat\sigma^2 = (n - \operatorname{rk}A)^{-1}e_y'e_y$"

$$\hat\sigma^2(\text{BIQUUE}) = 3\,\hat\sigma^2(\text{BIQE})$$
$$\operatorname{MSE}\{\hat\xi\} = D\{\hat\xi\}.$$

$$e_y = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}y = \frac{y_1 + y_2 + y_3}{3}\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$$

$$D\{e_y\} = \sigma^2[V - A(A'V^{-1}A)^-A']$$

$$D\{e_y\} = \frac{2\sigma^2}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}.$$

"replace $\sigma^2$ by $\hat\sigma^2$ (BIQE): $\hat\sigma^2 = (n - \operatorname{rk}A + 2)^{-1}e_y'e_y$"

$$\hat\sigma^2(\text{BIQE}) \quad\text{versus}\quad \operatorname{MSE}\{\hat\xi\} = \frac{1}{6}\begin{bmatrix} 0 & 0 & 0\\ 0 & 2 & 1\\ 0 & 1 & 2 \end{bmatrix}\hat\sigma^2(\text{BIQE}).$$

$$D\{e_y\} = \frac{2}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}\hat\sigma^2(\text{BIQUUE}) \quad\text{or}\quad D\{e_y\} = \frac{2}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}\hat\sigma^2(\text{BIQE}).$$
$$\hat\xi(\text{BLE}) = (I_3 + A'A)^{-1}A'y$$

$$I_3 + A'A = \begin{bmatrix} +3 & -1 & -1\\ -1 & +3 & -1\\ -1 & -1 & +3 \end{bmatrix}, \qquad (I_3 + A'A)^{-1} = \frac{1}{4}\begin{bmatrix} 2 & 1 & 1\\ 1 & 2 & 1\\ 1 & 1 & 2 \end{bmatrix}$$

$$\hat\xi(\text{BLE}) = \frac{1}{4}\begin{bmatrix} -1 & 0 & 1\\ 1 & -1 & 0\\ 0 & 1 & -1 \end{bmatrix}y = \frac{1}{4}\begin{bmatrix} -y_1 + y_3\\ +y_1 - y_2\\ +y_2 - y_3 \end{bmatrix}$$

$$D\{\hat\xi \mid \text{BLE}\} = \frac{\sigma^2}{16}\begin{bmatrix} +2 & -1 & -1\\ -1 & +2 & -1\\ -1 & -1 & +2 \end{bmatrix}.$$

$$\beta = -[I_3 + A'A]^{-1}\xi$$

$$\beta = -\frac{1}{4}\begin{bmatrix} 2 & 1 & 1\\ 1 & 2 & 1\\ 1 & 1 & 2 \end{bmatrix}\xi = -\frac{1}{4}\begin{bmatrix} 2\xi_1 + \xi_2 + \xi_3\\ \xi_1 + 2\xi_2 + \xi_3\\ \xi_1 + \xi_2 + 2\xi_3 \end{bmatrix}.$$

$$\operatorname{MSE}\{\hat\xi(\text{BLE})\} = \frac{\sigma^2}{4}\begin{bmatrix} 2 & 1 & 1\\ 1 & 2 & 1\\ 1 & 1 & 2 \end{bmatrix}.$$
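The shrinkage character of hom BLE relative to BLUMBE can be seen numerically; the observation values below are illustrative assumptions:

```python
import numpy as np

# hom BLE for the leveling example: xi_hat = (I3 + A'A)^{-1} A'y,
# compared with I3, I3-BLUMBE (the pseudoinverse solution).
A = np.array([[-1.0, 1.0, 0.0],
              [ 0.0,-1.0, 1.0],
              [ 1.0, 0.0,-1.0]])
y = np.array([1.0, 1.0, -3.0])      # illustrative observations (an assumption)

xi_ble = np.linalg.solve(np.eye(3) + A.T @ A, A.T @ y)
xi_blumbe = np.linalg.pinv(A) @ y

# BLE shrinks: here xi_ble = A'y/4 while BLUMBE gives A'y/3
assert np.allclose(xi_ble, A.T @ y / 4)
assert np.allclose(xi_blumbe - xi_ble, A.T @ y / 12)
assert np.linalg.norm(xi_ble) < np.linalg.norm(xi_blumbe)
```

The difference $\hat\xi_{\text{BLUMBE}} - \hat\xi_{\text{BLE}} = A'y/12$ is exactly the comparison matrix displayed below.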
$$e_y(\text{BLE}) = \frac{1}{4}\begin{bmatrix} 2 & 1 & 1\\ 1 & 2 & 1\\ 1 & 1 & 2 \end{bmatrix}y = \frac{1}{4}\begin{bmatrix} 2y_1 + y_2 + y_3\\ y_1 + 2y_2 + y_3\\ y_1 + y_2 + 2y_3 \end{bmatrix}$$

$$D\{e_y(\text{BLE})\} = \frac{\sigma^2}{16}\begin{bmatrix} 6 & 5 & 5\\ 5 & 6 & 5\\ 5 & 5 & 6 \end{bmatrix}.$$

Correlations:

$$C\{e_y, A\hat\xi\} = \frac{\sigma^2}{16}\begin{bmatrix} +2 & -1 & -1\\ -1 & +2 & -1\\ -1 & -1 & +2 \end{bmatrix}.$$

Comparisons BLUMBE-BLE:

$$\hat\xi_{\text{BLUMBE}} - \hat\xi_{\text{BLE}} = \frac{1}{12}\begin{bmatrix} -1 & 0 & 1\\ 1 & -1 & 0\\ 0 & 1 & -1 \end{bmatrix}y = \frac{1}{12}\begin{bmatrix} -y_1 + y_3\\ +y_1 - y_2\\ +y_2 - y_3 \end{bmatrix}.$$

$$D\{\hat\xi_{\text{BLUMBE}}\} - D\{\hat\xi_{\text{BLE}}\} = \frac{7\sigma^2}{144}\begin{bmatrix} +2 & -1 & -1\\ -1 & +2 & -1\\ -1 & -1 & +2 \end{bmatrix}\ \text{positive semidefinite,}$$

$$\operatorname{MSE}\{\hat\xi_{\text{BLUMBE}}\} - \operatorname{MSE}\{\hat\xi_{\text{BLE}}\} = \frac{\sigma^2}{36}\begin{bmatrix} +2 & -1 & -1\\ -1 & +2 & -1\\ -1 & -1 & +2 \end{bmatrix}\ \text{positive semidefinite.}$$
$$V^{-1} = \frac{1}{2}\begin{bmatrix} +3 & -1 & -1\\ -1 & +3 & -1\\ -1 & -1 & +3 \end{bmatrix} \quad\text{and}\quad S = \operatorname{Diag}(0,1,1),\ \operatorname{rk}S = 2,$$

$$\hat\xi = (I_3 + SA'V^{-1}A)^{-1}SA'V^{-1}y,$$

$$I_3 + SA'V^{-1}A = \begin{bmatrix} 1 & 0 & 0\\ -2 & 5 & -2\\ -2 & -2 & 5 \end{bmatrix}, \qquad (I_3 + SA'V^{-1}A)^{-1} = \frac{1}{21}\begin{bmatrix} 21 & 0 & 0\\ 14 & 5 & 2\\ 14 & 2 & 5 \end{bmatrix},$$

$$\hat\xi(V, S\text{-BLE}) = \begin{bmatrix} \hat h_\alpha\\ \hat h_\beta\\ \hat h_\gamma \end{bmatrix}_{V,S\text{-BLE}} = \frac{1}{21}\begin{bmatrix} 0 & 0 & 0\\ 10 & -6 & -4\\ 4 & 6 & -10 \end{bmatrix}y = \frac{1}{21}\begin{bmatrix} 0\\ 10y_1 - 6y_2 - 4y_3\\ 4y_1 + 6y_2 - 10y_3 \end{bmatrix}.$$

"fixed effects"

$$D\{\hat\xi \mid V, S\text{-BLE}\} = \frac{\sigma^2}{441}\begin{bmatrix} 0 & 0 & 0\\ 0 & 76 & 22\\ 0 & 22 & 76 \end{bmatrix}.$$

$$\beta = -\frac{1}{21}\begin{bmatrix} 21 & 0 & 0\\ 14 & 5 & 2\\ 14 & 2 & 5 \end{bmatrix}\xi = -\frac{1}{21}\begin{bmatrix} 21\xi_1\\ 14\xi_1 + 5\xi_2 + 2\xi_3\\ 14\xi_1 + 2\xi_2 + 5\xi_3 \end{bmatrix}.$$

$$\operatorname{MSE}\{\hat\xi \mid V, S\text{-BLE}\} = \frac{\sigma^2}{21}\begin{bmatrix} 0 & 0 & 0\\ 0 & 5 & 2\\ 0 & 2 & 5 \end{bmatrix}.$$

$$e_y\{V, S\text{-BLE}\} = \frac{1}{21}\begin{bmatrix} 11 & 6 & 4\\ 6 & 9 & 6\\ 4 & 6 & 11 \end{bmatrix}y = \frac{1}{21}\begin{bmatrix} 11y_1 + 6y_2 + 4y_3\\ 6y_1 + 9y_2 + 6y_3\\ 4y_1 + 6y_2 + 11y_3 \end{bmatrix}$$

Correlations:

$$C\{e_y, A\hat\xi\} = \frac{\sigma^2}{441}\begin{bmatrix} 29 & -9 & -20\\ -9 & 18 & -9\\ -20 & -9 & 29 \end{bmatrix}.$$
Comparisons BLUMBE-BLE:

$$\hat\xi_{V,S\text{-BLUMBE}} - \hat\xi_{V,S\text{-BLE}} = \frac{1}{21}\begin{bmatrix} 0 & 0 & 0\\ 4 & -1 & -3\\ 3 & 1 & -4 \end{bmatrix}y = \frac{1}{21}\begin{bmatrix} 0\\ 4y_1 - y_2 - 3y_3\\ 3y_1 + y_2 - 4y_3 \end{bmatrix}.$$

$$D\{\hat\xi_{V,S\text{-BLUMBE}}\} - D\{\hat\xi_{V,S\text{-BLE}}\} = \frac{\sigma^2}{882}\begin{bmatrix} 0 & 0 & 0\\ 0 & 142 & 103\\ 0 & 103 & 142 \end{bmatrix}\ \text{positive semidefinite,}$$

$$\operatorname{MSE}\{\hat\xi_{V,S\text{-BLUMBE}}\} - \operatorname{MSE}\{\hat\xi_{V,S\text{-BLE}}\} = \frac{\sigma^2}{42}\begin{bmatrix} 0 & 0 & 0\\ 0 & 4 & 3\\ 0 & 3 & 4 \end{bmatrix}\ \text{positive semidefinite.}$$
ȟˆ BLUMBE ȟˆ BLE , D{ȟˆ BLUMBE } D{ȟˆ BLE } and MSE{ȟˆ BLUMBE } MSE{ȟˆ BLE } as well
as ȟˆ V ,S -BLUMBE ȟˆ V,S -BLE , D{ȟˆ V ,S -BLUMBE } D{ȟˆ V ,S -BLE } and MSE{ȟˆ V ,S -BLUMBE }
MSE{ȟˆ V ,S -BLE } result positive semidefinite: In consequence, for three different
measures of distorsions
BLE is in favor of BLIMBE:
BLE produces smaller errors in comparing with BLIMBE!
Finally let us compare weighted BIQUUE and weighted BIQE:

(i) weighted BIQUUE σ̂² versus weighted BIQE σ̂²:

$$\hat\sigma^2_{\rm BIQUUE} = (n-r)^{-1}\, y' V^{-1}\tilde e_y = (n-r)^{-1}\,\tilde e_y' V^{-1}\tilde e_y \quad\text{versus}\quad \hat\sigma^2_{\rm BIQE} = (n-r+2)^{-1}\, y' V^{-1}\tilde e_y = (n-r+2)^{-1}\,\tilde e_y' V^{-1}\tilde e_y,$$

subject to

$$(\tilde e_y)_{V,S\text{-BLUMBE}} = \frac{1}{6}\begin{bmatrix} 4 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 4 \end{bmatrix} y, \qquad r = \mathrm{rk}\,A = 2,\quad n = 3,\quad V^{-1} = \frac{1}{2}\begin{bmatrix} +3 & -1 & -1 \\ -1 & +3 & -1 \\ -1 & -1 & +3 \end{bmatrix},$$

so that

$$\hat\sigma^2_{\rm BIQUUE} = \frac{1}{2}(y_1^2 + y_2^2 + y_3^2) \quad\text{versus}\quad \hat\sigma^2_{\rm BIQE} = \frac{1}{6}(y_1^2 + y_2^2 + y_3^2).$$

(ii) D{σ̂² | BIQUUE} versus D{σ̂² | BIQE}:

$$D\{\hat\sigma^2_{\rm BIQUUE}\} = 2\sigma^4 \quad\text{versus}\quad D\{\hat\sigma^2_{\rm BIQE}\} = \frac{2}{9}\sigma^4,$$

$$\hat D\{\hat\sigma^2_{\rm BIQUUE}\} = \frac{1}{2}(y_1^2 + y_2^2 + y_3^2)^2 \quad\text{versus}\quad \hat E\{\hat\sigma^2_{\rm BIQE} - \sigma^2\} = -\frac{1}{9}(y_1^2 + y_2^2 + y_3^2),$$

$$\hat E\{(\hat\sigma^2_{\rm BIQE} - \sigma^2)^2\} = \frac{1}{54}(y_1^2 + y_2^2 + y_3^2)^2.$$

$$\hat\sigma^2_{\rm BIQUUE} - \hat\sigma^2_{\rm BIQE} = \frac{1}{3}(y_1^2 + y_2^2 + y_3^2) \quad\text{positive}.$$

We repeat that the difference σ̂²_BIQUUE − σ̂²_BIQE is in favor of BIQE: σ̂²_BIQE < σ̂²_BIQUUE.
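A small numerical sketch of the two variance estimators, using the residual matrix and V⁻¹ of this example; the observation values y below are illustrative, not from the book:

```python
import numpy as np

# Example setting: n = 3 observations, redundancy n - r = 1.
y = np.array([1.0, 2.0, 3.0])
n, r = 3, 2

V_inv = 0.5 * np.array([[ 3., -1., -1.],
                        [-1.,  3., -1.],
                        [-1., -1.,  3.]])

# Residual operator of type V,S-BLUMBE as given above.
R = (1.0 / 6.0) * np.array([[4., 1., 1.],
                            [1., 4., 1.],
                            [1., 1., 4.]])
e_y = R @ y

q = e_y @ V_inv @ e_y            # quadratic form e_y' V^{-1} e_y
sigma2_biquue = q / (n - r)      # unbiased: divisor n - r = 1
sigma2_biqe   = q / (n - r + 2)  # biased, smaller variance: divisor 3

# Closed forms of the example: 1/2 and 1/6 of (y1^2 + y2^2 + y3^2).
s = np.sum(y**2)
print(sigma2_biquue, s / 2)
print(sigma2_biqe, s / 6)
```

The ratio of the two estimates is (n − r + 2)/(n − r) = 3, so BIQE always shrinks the unbiased estimate toward zero in this example.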
6-2 Setup of the best linear estimators of type hom BLE, hom S-BLE and hom α-BLE for fixed effects
Numerical tests have documented that ξ̂ of type Σ-BLUUE of ξ is not robust against outliers in the stochastic observation vector y. For this reason we give up the postulate of unbiasedness, but keep the setup of a linear estimation ξ̂ = Ly of homogeneous type. The matrix L is uniquely determined by the α-weighted hybrid norm of type minimum variance, ||D{ξ̂}||², and minimum bias, ||I − LA||². For such a homogeneous linear estimation (2.21), by means of Box 6.4 let us specify the real-valued, nonstochastic bias vector β := E{ξ̂ − ξ} = E{ξ̂} − ξ of type (6.11), (6.12), (6.13) and the real-valued, nonstochastic bias matrix I_m − LA (6.74) in more detail.

First, let us discuss why a setup of an inhomogeneous linear estimation is not suited to solve our problem. In the case of an unbiased estimator, the setup of an inhomogeneous linear estimation ξ̂ = Ly + κ led us to E{ξ̂} = ξ, the postulate of unbiasedness, if and only if E{ξ̂} − ξ = LE{y} − ξ + κ = −(I_m − LA)ξ + κ = 0 for all ξ ∈ ℝ^m, that is LA = I_m and κ = 0. Indeed the postulate of unbiasedness restricted the linear operator L to be a (non-unique) left inverse L = A⁻_L as well as the vector κ of inhomogeneity to be zero. In contrast, the bias vector β := E{ξ̂ − ξ} = E{ξ̂} − ξ = LE{y} − ξ + κ = −(I_m − LA)ξ + κ of a setup of an inhomogeneous linear estimation should approach zero if ξ = 0 is chosen as a special case. In order to include this case in the linear biased estimation procedure we set κ = 0.
Second, we focus on the norm (2.79), namely ||β||² := E{(ξ̂ − ξ)′(ξ̂ − ξ)} of the bias vector β, also called the Mean Square Error MSE{ξ̂} of ξ̂. In terms of a setup of a homogeneous linear estimation, ξ̂ = Ly, the norm of the bias vector is represented by (I_m − LA)′ξξ′(I_m − LA) or by the weighted Frobenius matrix norm ||(I_m − LA)′||²_{ξξ′}, where the weight matrix ξξ′, rk ξξ′ = 1, has rank one. Obviously ||(I_m − LA)′||²_{ξξ′} is only a semi-norm. In addition, ξξ′ is not accessible since ξ is unknown. In this problematic case we replace the matrix ξξ′ by a fixed, positive definite m×m matrix S and define the S-weighted Frobenius matrix norm ||(I_m − LA)′||²_S of type (2.82) of the bias matrix I_m − LA. Indeed, by means of the rank identity rk S = m we have chosen a weight matrix of maximal rank. Now we are prepared to understand intuitively the following.
Here we focus on best linear estimators of type hom BLE, hom S-BLE and hom α-BLE of fixed effects ξ, which turn out to be better than the best linear uniformly unbiased estimator of type hom BLUUE, but suffer from being biased. At first let us begin with a discussion of the bias vector and the bias matrix as well as the Mean Square Estimation Error MSE{ξ̂} with respect to a homogeneous linear estimation ξ̂ = Ly of fixed effects ξ based upon Box 6.4.
Box 6.4:
Bias vector, bias matrix, Mean Square Estimation Error in the special Gauss-Markov model with fixed effects

$$E\{y\} = A\xi \qquad(6.71)$$

$$D\{y\} = \Sigma_y \qquad(6.72)$$

"ansatz"

$$\hat\xi = Ly \qquad(6.73)$$

bias vector, bias matrix:

$$B := I_m - LA \qquad(6.76)$$

decomposition $(E\{\hat\xi - E\{\hat\xi\}\} = 0)$:

$$\|\mathrm{MSE}\{\hat\xi\}\|^2 = \mathrm{tr}\, L D\{y\} L' + \mathrm{tr}\,[I_m - LA]\,\xi\xi'\,[I_m - LA]' = \|L'\|^2_{\Sigma_y} + \|(I_m - LA)'\|^2_{\xi\xi'} \qquad(6.83)$$
error and bias, respectively. The double decomposition of the vector ξ̂ − ξ leads straightforwardly to the double representation of the matrix MSE{ξ̂} of type (6.80). Such a representation suffers from two effects: firstly, the vector ξ of fixed effects is unknown; secondly, the matrix ξξ′ has only rank 1. In consequence, the matrix [I_m − LA] ξξ′ [I_m − LA]′ has only rank 1, too. In this situation it has been proposed to modify MSE{ξ̂} by replacing ξξ′ with the regular matrix S. MSE_S{ξ̂} has been defined by (6.81). Scalar measures of MSE{ξ̂} as well as MSE_S{ξ̂} are the Frobenius norms (6.82), (6.83), (6.84). Those scalars constitute the optimal risk in Definition 6.7 (hom BLE) and Definition 6.8 (hom S-BLE). Alternatively, a homogeneous α-weighted hybrid minimum variance-minimum bias estimation (hom α-BLE) is presented in Definition 6.9, which is based upon the weighted sum of two norms of type (6.85), namely

• average variance $\|L'\|^2_{\Sigma_y} = \mathrm{tr}\, L\Sigma_y L'$,
• average bias $\|(I_m - LA)'\|^2_S = \mathrm{tr}\,[I_m - LA]\, S\, [I_m - LA]'$.

The very important estimator α-BLE balances variance and bias by the weight factor α, which is illustrated by Figure 6.1.
$$\hat\xi = Ly, \qquad(6.87)$$

(2nd) in comparison to all other linear estimations, ξ̂ has the minimum Mean Square Estimation Error in the sense of

$$\|\mathrm{MSE}\{\hat\xi\}\|^2 = \mathrm{tr}\, L D\{y\} L' + \mathrm{tr}\,[I_m - LA]\,\xi\xi'\,[I_m - LA]' = \|L'\|^2_{\Sigma_y} + \|(I_m - LA)'\|^2_{\xi\xi'}. \qquad(6.88)$$

$$\hat\xi = Ly, \qquad(6.89)$$

(2nd) in comparison to all other linear estimations, ξ̂ has the minimum S-modified Mean Square Estimation Error in the sense of

$$\hat\xi = Ly, \qquad(6.91)$$

$$\mathrm{tr}\, L D\{y\} L' + \frac{1}{\alpha}\,\mathrm{tr}\,(I_m - LA)\, S\, (I_m - LA)' = \|L'\|^2_{\Sigma_y} + \frac{1}{\alpha}\|(I_m - LA)'\|^2_S = \min_L, \qquad(6.92)$$

$$\Big(\Sigma_y + \frac{1}{\alpha} A S A'\Big)\hat L' = \frac{1}{\alpha} A S. \qquad(6.95)$$
:Proof:

(i) hom BLE:
The hybrid norm $\|\mathrm{MSE}\{\hat\xi\}\|^2$ establishes the Lagrangean

$$\mathcal{L}(L) := \mathrm{tr}\, L\Sigma_y L' + \mathrm{tr}\,(I_m - LA)\,\xi\xi'\,(I_m - LA)' = \min_L$$

for ξ̂ hom BLE of ξ. The necessary conditions for the minimum of the quadratic Lagrangean $\mathcal{L}(L)$ are

$$\frac{\partial\mathcal{L}}{\partial L}(\hat L) := 2\,[\Sigma_y \hat L' + A\xi\xi' A'\hat L' - A\xi\xi'] = 0,$$

which agree with the normal equations (6.93). (The theory of matrix derivatives is reviewed in Appendix B (Facts: derivative of a scalar-valued function of a matrix: trace).) The second derivatives

$$\frac{\partial^2\mathcal{L}}{\partial(\mathrm{vec}\,L)\,\partial(\mathrm{vec}\,L)'}(\hat L) > 0$$

constitute the sufficiency conditions, obtained from

$$\frac{\partial\mathcal{L}}{\partial(\mathrm{vec}\,L)}(\hat L) := \mathrm{vec}\,[2\hat L(\Sigma_y + A\xi\xi' A') - 2\,\xi\xi' A'].$$

For ξ̂ hom S-BLE of ξ, following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean $\mathcal{L}(L)$

$$\frac{\partial\mathcal{L}}{\partial L}(\hat L) := 2\,[\Sigma_y\hat L' + ASA'\hat L' - AS]' = 0,$$

as well as to the sufficiency conditions

$$\frac{\partial^2\mathcal{L}}{\partial(\mathrm{vec}\,L)\,\partial(\mathrm{vec}\,L)'}(\hat L) = 2\,[(\Sigma_y + ASA')\otimes I_m] > 0.$$

For ξ̂ hom α-BLE of ξ, following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean $\mathcal{L}(L)$

$$\frac{\partial\mathcal{L}}{\partial L}(\hat L) = 2\,[(\Sigma_y + A\xi\xi' A')\otimes I_m]\,\mathrm{vec}\,\hat L - 2\,\mathrm{vec}(\xi\xi' A').$$

The Kronecker-Zehfuss product $A\otimes B$ of two arbitrary matrices, as well as $(A+B)\otimes C = A\otimes C + B\otimes C$ of three arbitrary matrices subject to dim A = dim B, is introduced in Appendix A, "Definition of Matrix Algebra: multiplication of matrices of the same dimension (internal relation) and Laws". The vec operation (vectorization of an array) is reviewed in Appendix A, too: "Definition, Facts: $\mathrm{vec}(AB) = (B'\otimes I)\,\mathrm{vec}\,A$ for suitable matrices A and B". Now we are prepared to compute

$$\frac{\partial^2\mathcal{L}}{\partial(\mathrm{vec}\,L)\,\partial(\mathrm{vec}\,L)'}(\hat L) = 2\,[(\Sigma_y + A\xi\xi' A')\otimes I_m] > 0,$$

$$\frac{\partial^2\mathcal{L}}{\partial(\mathrm{vec}\,L)\,\partial(\mathrm{vec}\,L)'}(\hat L) = 2\,\Big[\Big(\frac{1}{\alpha}ASA' + \Sigma_y\Big)\otimes I_m\Big] > 0.$$
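The vec rule quoted from Appendix A can be illustrated numerically. This sketch checks the closely related identity vec(AXB) = (B′ ⊗ A) vec X with NumPy, using column-wise vectorization via Fortran-order reshape; the matrices are random illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

def vec(M):
    # Column-wise vectorization (stack the columns of M).
    return M.reshape(-1, order="F")

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)

# The Kronecker-vec rule holds exactly up to floating-point round-off.
assert np.allclose(lhs, rhs)
print("max deviation:", np.max(np.abs(lhs - rhs)))
```

Setting A = I or B = I recovers the two one-sided forms used in the matrix-derivative computations above.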
Besides the explicit representations of ξ̂ of type hom BLE, hom S-BLE and hom α-BLE, we compute the related dispersion matrix D{ξ̂}, the Mean Square Estimation Error MSE{ξ̂}, and the modified Mean Square Estimation Errors MSE_S{ξ̂} and MSE_{α,S}{ξ̂}:

$$\beta := E\{\hat\xi\} - \xi = -[I_m - \xi\xi' A'(A\xi\xi' A' + \Sigma_y)^{-1}A]\,\xi \qquad(6.98)$$

$$\mathrm{MSE}\{\hat\xi\} := D\{\hat\xi\} + [I_m - \xi\xi' A'(A\xi\xi' A' + \Sigma_y)^{-1}A]\;\xi\xi'\;[I_m - A'(A\xi\xi' A' + \Sigma_y)^{-1}A\xi\xi'] \qquad(6.100)$$

At this point we have to comment on what Theorem 6.11 tells us. hom BLE has generated the estimation ξ̂ of type (6.96), the dispersion matrix D{ξ̂} of type (6.97), the bias vector of type (6.98) and the Mean Square Estimation Error of type (6.100), which all depend on the vector ξ and the matrix ξξ′, respectively. We already mentioned that ξ and the matrix ξξ′ are not accessible from measurements. The situation is similar to the one in hypothesis testing. As shown later in this section, we can produce only an estimator ξ̂ and consequently can set up a hypothesis ξ₀ of the "fixed effect" ξ. Indeed, a similar argument applies to the second central moment D{y} ~ Σ_y of the "random effect" y, the observation vector. Such a dispersion matrix has to be known in order to be able to compute ξ̂, D{ξ̂}, and MSE{ξ̂}. Again we have to apply the argument that we are only able to construct an estimate Σ̂_y and to set up a hypothesis about D{y} ~ Σ_y.
Theorem 6.12 (ξ̂ hom S-BLE):
Let ξ̂ = Ly be hom S-BLE of ξ in the special linear Gauss-Markov model with fixed effects of Box 6.3. Then equivalent representations of the solutions of the normal equations (6.94) are

$$\hat\xi = SA'(\Sigma_y + ASA')^{-1}\, y \qquad(6.101)$$

$$\beta := E\{\hat\xi\} - \xi = -[I_m - SA'(ASA' + \Sigma_y)^{-1}A]\,\xi$$

$$\hat\xi = \frac{1}{\alpha}SA'\Big(\Sigma_y + \frac{1}{\alpha}ASA'\Big)^{-1} y \qquad(6.110)$$

$$\hat\xi = (A'\Sigma_y^{-1}A + \alpha S^{-1})^{-1}A'\Sigma_y^{-1}\, y \qquad(6.111)$$

$$\beta := E\{\hat\xi\} - \xi = -\Big[I_m - \frac{1}{\alpha}SA'\Big(\frac{1}{\alpha}ASA' + \Sigma_y\Big)^{-1}A\Big]\,\xi$$

$$\beta = -[I_m - (A'\Sigma_y^{-1}A + \alpha S^{-1})^{-1}A'\Sigma_y^{-1}A]\,\xi \qquad(6.115)$$

$$+ \Big[I_m - \frac{1}{\alpha}SA'\Big(\frac{1}{\alpha}ASA' + \Sigma_y\Big)^{-1}A\Big]\,\xi\xi'\,\Big[I_m - A'\Big(\frac{1}{\alpha}ASA' + \Sigma_y\Big)^{-1}AS\frac{1}{\alpha}\Big] = \frac{1}{\alpha}S - \frac{1}{\alpha}SA'\Big(\frac{1}{\alpha}ASA' + \Sigma_y\Big)^{-1}AS\frac{1}{\alpha} \qquad(6.117)$$

$$\mathrm{MSE}_{\alpha,S}\{\hat\xi\} = (A'\Sigma_y^{-1}A + \alpha S^{-1})^{-1}A'\Sigma_y^{-1}A\,(A'\Sigma_y^{-1}A + \alpha S^{-1})^{-1} + [I_m - (A'\Sigma_y^{-1}A + \alpha S^{-1})^{-1}A'\Sigma_y^{-1}A]\,\xi\xi'\,[I_m - A'\Sigma_y^{-1}A\,(A'\Sigma_y^{-1}A + \alpha S^{-1})^{-1}] \qquad(6.118)$$

$$= (A'\Sigma_y^{-1}A + \alpha S^{-1})^{-1}$$
inverse dispersion part and the second inverse bias part. While the experiment informs us of the variance-covariance matrix Σ_y, say Σ̂_y, the bias weight matrix S and the weight factor α are at the disposal of the analyst. For instance, by the choice S = Diag(s₁, ..., s_m) we may emphasize the increase or decrease of certain bias matrix elements. The choice of an equally weighted bias matrix is S = I_m. In contrast, the weight factor α can be determined by an A-optimal design of type

• tr D{ξ̂} = min over α,
• ββ′ = min over α,
• tr MSE_{α,S}{ξ̂} = min over α.

In the first case we optimize the trace of the variance-covariance matrix D{ξ̂} of type (6.113), (6.114). Alternatively, by means of ββ′ = min we optimize the quadratic bias, where the bias vector β of type (6.115) is chosen, regardless of the dependence on ξ. Finally, in the third case, the most popular one, we minimize the trace of the Mean Square Estimation Error MSE_{α,S}{ξ̂} of type (6.118), regardless of the dependence on ξξ′. But beforehand let us present the proof of Theorem 6.10, Theorem 6.11 and Theorem 6.8.
Proof:

(i) $\hat\xi = \xi\xi' A'[\Sigma_y + A\xi\xi' A']^{-1} y$:
If the matrix $\Sigma_y + A\xi\xi' A'$ of the normal equations of type hom BLE is of full rank, namely $\mathrm{rk}(\Sigma_y + A\xi\xi' A') = n$, then a straightforward solution of (6.93) is

$$\hat L = \xi\xi' A'[\Sigma_y + A\xi\xi' A']^{-1}.$$
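The closed-form solution L̂ can be checked against the normal equations (6.93) numerically; a minimal sketch with illustrative values for A, Σ_y and ξ (none of them from the book):

```python
import numpy as np

# Toy model: n = 4 observations, m = 2 parameters (illustrative values).
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.],
              [1., 3.]])
Sigma_y = np.diag([1., 2., 1., 2.])
xi = np.array([0.5, -1.0])
G = np.outer(xi, xi)              # the rank-one matrix xi xi'

# hom BLE solution of the normal equations (6.93).
N = Sigma_y + A @ G @ A.T         # full rank n
L_hat = G @ A.T @ np.linalg.inv(N)

# Normal equations: Sigma_y L' + A xi xi' A' L' - A xi xi' = 0.
residual = Sigma_y @ L_hat.T + A @ G @ A.T @ L_hat.T - A @ G
assert np.allclose(residual, 0.0)
```

Note that Σ_y + Aξξ′A′ is invertible whenever Σ_y is positive definite, since Aξξ′A′ is positive semidefinite.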
(ii) $\hat\xi = SA'(\Sigma_y + ASA')^{-1} y$:
If the matrix $\Sigma_y + ASA'$ of the normal equations of type hom S-BLE is of full rank, namely $\mathrm{rk}(\Sigma_y + ASA') = n$, then a straightforward solution of (6.94) is

$$\hat L = SA'(\Sigma_y + ASA')^{-1},$$

if $S^{-1}$ and $\Sigma_y^{-1}$ exist. Such a result concludes this part of the proof.

(iv) $\hat\xi = (I_m + SA'\Sigma_y^{-1}A)^{-1}SA'\Sigma_y^{-1} y$:
Let us apply, by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(9)), the fundamental matrix identity.

(v) $\hat\xi = \frac{1}{\alpha}SA'(\Sigma_y + \frac{1}{\alpha}ASA')^{-1} y$:
If the matrix $\Sigma_y + \frac{1}{\alpha}ASA'$ of the normal equations of type hom α-BLE is of full rank, namely $\mathrm{rk}(\Sigma_y + \frac{1}{\alpha}ASA') = n$, then a straightforward solution of (6.95) is

$$\hat L = \frac{1}{\alpha}SA'\Big[\Sigma_y + \frac{1}{\alpha}ASA'\Big]^{-1}.$$

(vi) $\hat\xi = (A'\Sigma_y^{-1}A + \alpha S^{-1})^{-1}A'\Sigma_y^{-1} y$:
Let us apply, by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan-Guttman matrix identity), the fundamental matrix identity

$$\frac{1}{\alpha}SA'\Big(\Sigma_y + \frac{1}{\alpha}ASA'\Big)^{-1} = (A'\Sigma_y^{-1}A + \alpha S^{-1})^{-1}A'\Sigma_y^{-1},$$

if $S^{-1}$ and $\Sigma_y^{-1}$ exist. Such a result concludes this part of the proof.
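The Duncan-Guttman identity is easy to verify numerically; a sketch with illustrative full-rank matrices (assumed values, not from the book's example):

```python
import numpy as np

alpha = 0.7
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
Sigma_y = np.diag([1.0, 0.5, 2.0])
S = np.diag([2.0, 3.0])

# Left-hand side: (1/alpha) S A' (Sigma_y + (1/alpha) A S A')^{-1}
lhs = (1/alpha) * S @ A.T @ np.linalg.inv(Sigma_y + (1/alpha) * A @ S @ A.T)

# Right-hand side: (A' Sigma_y^{-1} A + alpha S^{-1})^{-1} A' Sigma_y^{-1}
Si = np.linalg.inv(Sigma_y)
rhs = np.linalg.inv(A.T @ Si @ A + alpha * np.linalg.inv(S)) @ A.T @ Si

# Duncan-Guttman matrix identity, valid whenever S^{-1}, Sigma_y^{-1} exist.
assert np.allclose(lhs, rhs)
```

This is why the "inverse dispersion" representation (6.111) of the α-BLE is equivalent to the "covariance" representation (6.110).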
By means of the definition of the dispersion matrix D{ξ̂} and the substitution of ξ̂ of type hom BLE, the proof has been straightforward.
(ix) hom S-BLE: D{ξ̂} (1st representation):
By means of the definition of the dispersion matrix D{ξ̂} and the substitution of ξ̂ of type hom S-BLE, the proof of the first representation has been straightforward.

$$= \frac{1}{\alpha}SA'\Big(\Sigma_y + \frac{1}{\alpha}ASA'\Big)^{-1}\Sigma_y\Big(\Sigma_y + \frac{1}{\alpha}ASA'\Big)^{-1}AS\frac{1}{\alpha}.$$

By means of the definition of the dispersion matrix D{ξ̂} and the substitution of ξ̂ of type hom α-BLE, the proof of the first representation has been straightforward.

(xii) hom α-BLE: D{ξ̂} (2nd representation):
if S⁻¹ and Σ_y⁻¹ exist. By means of the definition of the dispersion matrix D{ξ̂} and the substitution of ξ̂ of type hom α-BLE, the proof of the second representation has been straightforward.

(xiii) bias β for hom BLE, hom S-BLE and hom α-BLE:
As soon as we substitute into the bias β := E{ξ̂} − ξ = −ξ + E{ξ̂} the various estimators ξ̂ of type hom BLE, hom S-BLE and hom α-BLE, we are directly led to the various bias representations β of type hom BLE, hom S-BLE and hom α-BLE.

(xiv) MSE{ξ̂} of type hom BLE, hom S-BLE and hom α-BLE:

$$\mathrm{MSE}\{\hat\xi\} := E\{(\hat\xi - \xi)(\hat\xi - \xi)'\}$$

$$\hat\xi - \xi = (\hat\xi - E\{\hat\xi\}) + (E\{\hat\xi\} - \xi)$$

At first we have defined the Mean Square Estimation Error MSE{ξ̂} of ξ̂. Secondly we have decomposed the difference ξ̂ − ξ into the two terms

• ξ̂ − E{ξ̂},
• E{ξ̂} − ξ,

in order to derive, thirdly, the decomposition of MSE{ξ̂} into

• the dispersion matrix of ξ̂, namely D{ξ̂},
• the quadratic bias ββ′.

As soon as we substitute into MSE{ξ̂} the dispersion matrix D{ξ̂} and the bias vector β of the various estimators ξ̂ of type hom BLE, hom S-BLE and hom α-BLE, we are directly led to the various representations of the Mean Square Estimation Error MSE{ξ̂}.

Here is my proof's end. ∎
7 A spherical problem of algebraic representation - inconsistent system of directional observational equations - overdetermined system of nonlinear equations on curved manifolds

"Least squares regression is not appropriate when the response variable is circular, and can lead to erroneous results. The reason for this is that the squared difference is not an appropriate measure of distance on the circle."
U. Lund (1999)
A typical example of a nonlinear model is the inconsistent system of nonlinear observational equations generated by directional measurements (angular observations, longitudinal data). Here the observation space Y as well as the parameter space X is the hypersphere S^p ⊂ ℝ^{p+1}: for p = 1 the von Mises circle S¹, for p = 2 the Fisher sphere S², and in general the Langevin sphere S^p. For instance, assume repeated measurements of horizontal directions to one target, which are distributed as polar coordinates on a unit circle clustered around a central direction. Alternatively, assume repeated measurements of horizontal and vertical directions to one target, which are similarly distributed as spherical coordinates (longitude, latitude) on a unit sphere clustered around a central direction. By means of a properly chosen loss function we aim at a determination of the central direction. Let us connect all points on S¹, S², or in general S^p (the measurement points) by a geodesic, here the great circle, to the point of the central direction. Indeed the loss function will be optimal at a point on S¹, S², or in general S^p, called the central point. The result for such a minimum geodesic distance mapping will be presented.

Please pay attention to the guideline of Chapter 7.

Lemma 7.2: minimum geodesic distance: S¹
Lemma 7.6: minimum geodesic distance: S²
7-1 Introduction

Directional data, also called "longitudinal data" or "angular data", arise in several situations, notably geodesy, geophysics, geology, oceanography, atmospheric science, meteorology and others. The von Mises or circular normal distribution CN(μ, κ) with mean direction parameter μ (0 ≤ μ ≤ 2π) and concentration parameter κ (κ > 0), the reciprocal of a dispersion measure, plays the role in circular data parallel to that of the Gauss normal distribution in linear data. A natural extension of the CN distribution to a distribution on the p-dimensional sphere S^p ⊂ ℝ^{p+1} leads to the Fisher - von Mises or Langevin distribution L(μ, κ). For p = 2, namely for spherical data (spherical longitude, spherical latitude), this distribution has been studied by R. A. Fisher (1953), generalizing the result of R. von Mises (1918) for p = 1, and is often quoted as the Fisher distribution. Further details can be taken from K. V. Mardia (1972), K. V. Mardia and P. E. Jupp (2000), G. S. Watson (1986, 1998) and A. Sen Gupta and R. Maitra (1998).
Box 7.1:
Fisher - von Mises or Langevin distribution

p = 1 (R. von Mises 1918):

$$f(\Lambda \mid \mu, \kappa) = [2\pi I_0(\kappa)]^{-1}\exp[\kappa\cos(\Lambda - \mu_\Lambda)] \qquad(7.1)$$

$$f(\Lambda \mid \mu, \kappa) = [2\pi I_0(\kappa)]^{-1}\exp\kappa\langle\mu \mid X\rangle \qquad(7.2)$$

$$\cos\Psi := \langle\mu \mid X\rangle = \mu_x X + \mu_y Y = \cos\mu_\Lambda\cos\Lambda + \sin\mu_\Lambda\sin\Lambda \qquad(7.3)$$

$$\cos\Psi = \cos(\Lambda - \mu_\Lambda) \qquad(7.4)$$

$$\mu = e_1\mu_x + e_2\mu_y + e_3\mu_z = e_1\cos\mu_\Phi\cos\mu_\Lambda + e_2\cos\mu_\Phi\sin\mu_\Lambda + e_3\sin\mu_\Phi \in S^2 \qquad(7.9)$$

$$X = e_1 X + e_2 Y + e_3 Z = e_1\cos\Phi\cos\Lambda + e_2\cos\Phi\sin\Lambda + e_3\sin\Phi \in S^2. \qquad(7.10)$$
Box 7.1 is a review of the Fisher - von Mises or Langevin distribution. First, we set up the circular normal distribution on S¹ with longitude Λ as the stochastic variable and (μ_Λ, κ) the distributional parameters, called "mean direction μ_Λ" and "concentration measure" κ, the reciprocal of a dispersion measure. Due to the normalization of the circular probability density function ("pdf"), I₀(κ), the zero-order modified Bessel function of the first kind of κ, appears. The circular distance between the circular mean vector μ ∈ S¹ and the placement vector X ∈ S¹ is measured by "cos Ψ", namely the inner product ⟨μ | X⟩, both μ and X represented in polar coordinates (μ_Λ, Λ), respectively. In summary, (7.1) is the circular normal pdf, namely an element of the exponential class.

Second, we refer to the spherical normal pdf on S² with spherical longitude Λ and spherical latitude Φ as the stochastic variables and (μ_Λ, μ_Φ, κ) the distributional parameters, called "longitudinal mean direction, lateral mean direction (μ_Λ, μ_Φ)" and "concentration measure κ", the reciprocal of a dispersion measure. Here the normalization factor of the spherical pdf is κ/(4π sinh κ). The spherical distance between the spherical mean vector μ ∈ S² and the placement vector X ∈ S² is measured by "cos Ψ", namely the inner product ⟨μ | X⟩, both μ and X represented in polar coordinates, the spherical coordinates (μ_Λ, μ_Φ, Λ, Φ), respectively. In summary, (7.7) is the spherical normal pdf, namely an element of the exponential class.
Box 7.2:
Loss function

p = 1: longitudinal data

$$\text{type 1:}\quad \sum_{i=1}^n\cos\Psi_i = \max \;\sim\; \mathbf{1}'\cos\Psi = \max \qquad(7.11)$$

$$\text{type 2:}\quad \sum_{i=1}^n(1 - \cos\Psi_i) = \min \;\sim\; \mathbf{1}'(\mathbf{1} - \cos\Psi) = \min \qquad(7.12)$$

$$\text{type 3:}\quad \sum_{i=1}^n\sin^2(\Psi_i/2) = \min \;\sim\; \Big(\sin\frac{\Psi}{2}\Big)'\Big(\sin\frac{\Psi}{2}\Big) = \min \qquad(7.13)$$

transformation:

$$1 - \cos\Psi = 2\sin^2(\Psi/2) \qquad(7.14)$$

"geodetic distance":

$$\cos\Psi_i = \cos(\Lambda_i - x) = \cos\Lambda_i\cos x + \sin\Lambda_i\sin x \qquad(7.15)$$

$$2\sin^2(\Psi_i/2) = 1 - \cos\Psi_i = 1 - \cos\Lambda_i\cos x - \sin\Lambda_i\sin x \qquad(7.16)$$

transformation:

$$1 - \cos\Psi = 2\sin^2(\Psi/2) \qquad(7.22)$$

"geodetic distance":

$$\cos\Psi_i = \cos\Phi_i\cos x_2\cos(\Lambda_i - x_1) + \sin\Phi_i\sin x_2 = \cos\Phi_i\cos\Lambda_i\cos x_1\cos x_2 + \cos\Phi_i\sin\Lambda_i\sin x_1\cos x_2 + \sin\Phi_i\sin x_2 \qquad(7.23)$$
7-2 Minimal geodesic distance: MINGEODISC
is minimal.

$$\Lambda_g = \arg\Big\{\sum_{i=1}^n 2(1-\cos\Psi_i) = \min \;\Big|\; \cos\Psi_i = \cos(\Lambda_i - \Lambda_g)\Big\}. \qquad(7.27)$$

Proof:
Λ_g is generated by means of the Lagrangean (loss function)

$$\mathcal{L}(x) := \sum_{i=1}^n 2[1 - \cos(\Lambda_i - x)] = 2n - 2\cos x\sum_{i=1}^n\cos\Lambda_i - 2\sin x\sum_{i=1}^n\sin\Lambda_i = \min_x.$$

$$\frac{d^2\mathcal{L}(x)}{dx^2}(\Lambda_g) = 2\cos\Lambda_g\sum_{i=1}^n\cos\Lambda_i + 2\sin\Lambda_g\sum_{i=1}^n\sin\Lambda_i > 0$$
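Setting dL(x)/dx = 0 in the Lagrangean above yields tan Λ_g = Σ sin Λᵢ / Σ cos Λᵢ, the circular mean direction. A minimal numerical sketch (the angles are illustrative):

```python
import math

# Directional data on S^1 (radians), clustered around 0 (= 2*pi).
Lam = [0.1, 0.2, 6.2, 0.05]

C = sum(math.cos(L) for L in Lam)
S = sum(math.sin(L) for L in Lam)

# Minimizer of sum 2(1 - cos(Lam_i - x)): atan2 picks the right quadrant.
Lam_g = math.atan2(S, C) % (2 * math.pi)

# Sufficiency condition: 2 cos(Lam_g) C + 2 sin(Lam_g) S > 0 at the minimum.
assert 2 * math.cos(Lam_g) * C + 2 * math.sin(Lam_g) * S > 0
print(Lam_g)
```

Note that the naive arithmetic mean of these angles is near π and would be badly wrong, which is exactly the point of the circular loss function.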
function

$$\mathcal{L}(\Lambda_g, \Phi_g) := \sum_{i=1}^n 2(1 - \cos\Psi_i) \qquad(7.33)$$

$$= \sum_{i=1}^n 2[1 - \cos\Phi_i\cos\Phi_g\cos(\Lambda_i - \Lambda_g) - \sin\Phi_i\sin\Phi_g] = \min_{\Lambda_g, \Phi_g}$$

is minimal.

$$(\Lambda_g, \Phi_g) = \arg\Big\{\sum_{i=1}^n 2(1 - \cos\Psi_i) = \min \;\Big|\; \cos\Psi_i = \cos\Phi_i\cos\Phi_g\cos(\Lambda_i - \Lambda_g) + \sin\Phi_i\sin\Phi_g\Big\}. \qquad(7.34)$$

$$\cos x_2\cos x_1\sum_{i=1}^n\cos\Phi_i\sin\Lambda_i - \cos x_2\sin x_1\sum_{i=1}^n\cos\Phi_i\cos\Lambda_i = 0. \qquad(7.36)$$
Proof:
(Λ_g, Φ_g) is generated by means of the Lagrangean (loss function)

$$\mathcal{L}(x_1, x_2) := \sum_{i=1}^n 2[1 - \cos\Phi_i\cos\Lambda_i\cos x_1\cos x_2 - \cos\Phi_i\sin\Lambda_i\sin x_1\cos x_2 - \sin\Phi_i\sin x_2].$$

$$\frac{\partial\mathcal{L}(x)}{\partial x_2}(\Lambda_g, \Phi_g) = 2\cos\Lambda_g\sin\Phi_g\sum_{i=1}^n\cos\Phi_i\cos\Lambda_i + 2\sin\Lambda_g\sin\Phi_g\sum_{i=1}^n\cos\Phi_i\sin\Lambda_i - 2\cos\Phi_g\sum_{i=1}^n\sin\Phi_i = 0$$

constitute the necessary conditions. The matrix of second derivatives

$$\frac{\partial^2\mathcal{L}(x)}{\partial x\,\partial x'}(\Lambda_g, \Phi_g) \ge 0$$

builds up the sufficiency condition for the minimum at (Λ_g, Φ_g):

$$\frac{\partial^2\mathcal{L}(x)}{\partial x_1^2}(\Lambda_g, \Phi_g) = 2\cos\Lambda_g\cos\Phi_g[\cos\Phi\cos\Lambda] + 2\sin\Lambda_g\cos\Phi_g[\cos\Phi\sin\Lambda]$$

$$\frac{\partial^2\mathcal{L}(x)}{\partial x_1\,\partial x_2}(\Lambda_g, \Phi_g) = -2\sin\Lambda_g\sin\Phi_g[\cos\Phi\cos\Lambda] + 2\cos\Lambda_g\sin\Phi_g[\cos\Phi\sin\Lambda]$$

$$\frac{\partial^2\mathcal{L}(x)}{\partial x_2^2}(\Lambda_g, \Phi_g) = 2\cos\Lambda_g\cos\Phi_g[\cos\Phi\cos\Lambda] + 2\sin\Lambda_g\cos\Phi_g[\cos\Phi\sin\Lambda] + 2\sin\Phi_g[\sin\Phi].$$

∎
Lemma 7.6 (minimum geodesic distance, solution of the normal equations: S²):
Let the point (Λ_g, Φ_g) ∈ S² be at minimum geodesic distance to the other points (Λᵢ, Φᵢ) ∈ S², i ∈ {1, ..., n}. Then the corresponding normal equations ((7.35), (7.36)) are uniquely solved by

$$\tan\Lambda_g = \frac{[\cos\Phi\sin\Lambda]}{[\cos\Phi\cos\Lambda]} \qquad(7.37)$$

$$\tan\Phi_g = \frac{[\sin\Phi]}{\sqrt{[\cos\Phi\cos\Lambda]^2 + [\cos\Phi\sin\Lambda]^2}}$$

such that the spherical solution point is

$$X_g = e_1\cos\Phi_g\cos\Lambda_g + e_2\cos\Phi_g\sin\Lambda_g + e_3\sin\Phi_g = \frac{e_1[\cos\Phi\cos\Lambda] + e_2[\cos\Phi\sin\Lambda] + e_3[\sin\Phi]}{\sqrt{[\cos\Phi\cos\Lambda]^2 + [\cos\Phi\sin\Lambda]^2 + [\sin\Phi]^2}} \qquad(7.38)$$

subject to

$$[\cos\Phi\cos\Lambda] := \sum_{i=1}^n\cos\Phi_i\cos\Lambda_i \qquad(7.39)$$

$$[\cos\Phi\sin\Lambda] := \sum_{i=1}^n\cos\Phi_i\sin\Lambda_i \qquad(7.40)$$

$$[\sin\Phi] := \sum_{i=1}^n\sin\Phi_i. \qquad(7.41)$$
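Formulas (7.37)-(7.41) translate directly into code; a minimal sketch with illustrative directional data (not from the book):

```python
import math

# Spherical directional data (Lam_i, Phi_i) in radians; illustrative values
# clustered around (1.0, 0.5).
data = [(1.0, 0.5), (1.1, 0.45), (0.95, 0.55), (1.05, 0.5)]

cc = sum(math.cos(P) * math.cos(L) for L, P in data)  # [cos Phi cos Lam]
cs = sum(math.cos(P) * math.sin(L) for L, P in data)  # [cos Phi sin Lam]
s  = sum(math.sin(P) for L, P in data)                # [sin Phi]

# Solution of the normal equations (7.37): atan2 resolves the quadrant.
Lam_g = math.atan2(cs, cc)
Phi_g = math.atan2(s, math.hypot(cc, cs))
print(Lam_g, Phi_g)
```

Equivalently, (Λ_g, Φ_g) are the spherical coordinates of the normalized resultant vector X_g of (7.38).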
7-3 Special models

Element          Al      Sb      Ar      As      Ba      Be      Bi
Atomic Weight W  26.98   121.76  39.93   74.91   137.36  9.01    209.00

He asked the question "Does a typical element in some sense have integer atomic weight?" A natural interpretation of the question is "Do the fractional parts of the weights cluster near 0 and 1?" The atomic weight W can be identified in a natural way with points on the unit circle, in such a way that equal fractional parts correspond to identical points. This can be done under the mapping

$$W \to x = \begin{bmatrix}\cos\theta_1 \\ \sin\theta_1\end{bmatrix}, \qquad \theta_1 = 2\pi(W - [W]),$$

where [u] is the largest integer not greater than u. Von Mises' question can now be seen to be equivalent to asking "Do these points on the circle cluster near e₁ = [1 0]′?"

Incidentally, the mapping W → x can be made in another way:

$$W \to x = \begin{bmatrix}\cos\theta_2 \\ \sin\theta_2\end{bmatrix}, \qquad \theta_2 = 2\pi\Big(W - \Big[W + \frac{1}{2}\Big]\Big).$$

The two sets of angles for the two mappings are then as follows:

Table 7.2
Element    Al    Sb    Ar    As    Ba    Be    Bi    Average
θ₁/2π      0.98  0.76  0.93  0.91  0.36  0.01  0.00  0.566

We note from the discrepancy between the averages in the final column that our usual ways of describing data, e.g., means and standard deviations, are likely to fail us when it comes to measurements of direction.

If the points do cluster near e₁, then the resultant vector $\sum_{j=1}^N x_j$ (here N = 7) should point in that direction, i.e., we should have approximately $\bar x/\|\bar x\| = e_1$, where $\bar x = \sum x_j/N$ and $\|\bar x\| = (\bar x'\bar x)^{1/2}$ is the length of $\bar x$. For the seven elements whose weights are considered here, we find

$$\bar x/\|\bar x\| = \begin{bmatrix} 0.9617 \\ -0.2741 \end{bmatrix} = \begin{bmatrix} \cos 344.09^\circ \\ \sin 344.09^\circ \end{bmatrix} = \begin{bmatrix} \cos(-15.91^\circ) \\ \sin(-15.91^\circ) \end{bmatrix},$$

a direction not far removed from e₁.
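The resultant-vector computation can be reproduced from the printed weights. A minimal sketch; with the two-decimal fractional parts of the table one obtains approximately (0.958, -0.288), close to the book's (0.9617, -0.2741), which was computed from more precise weights:

```python
import math

# Atomic weights from von Mises' example, as printed in the table.
W = [26.98, 121.76, 39.93, 74.91, 137.36, 9.01, 209.00]

# Map each weight to a point on the unit circle via its fractional part.
theta = [2 * math.pi * (w - math.floor(w)) for w in W]
xbar = sum(math.cos(t) for t in theta) / len(W)
ybar = sum(math.sin(t) for t in theta) / len(W)

# Normalized resultant direction.
norm = math.hypot(xbar, ybar)
u = (xbar / norm, ybar / norm)
print(u)
```

The direction is indeed close to e₁ = (1, 0), supporting the hypothesis of clustering near integer atomic weight.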
Von Mises then asked "For what distribution on the unit circle is the unit vector $\hat\mu = [\cos\hat\theta_0 \;\; \sin\hat\theta_0]' = \bar x/\|\bar x\|$ a maximum likelihood estimator (MLE) of a direction θ₀ of clustering or concentration?" The answer is the distribution now known as the von Mises or circular normal distribution. It has density, expressed in terms of the random angle θ,

$$\frac{\exp\{k\cos(\theta - \theta_0)\}}{I_0(k)}\,\frac{d\theta}{2\pi},$$

where θ₀ is the direction of concentration and the normalizing constant I₀(k) is a Bessel function. An alternative expression is

$$\frac{\exp(k\,x'\mu)}{I_0(k)}\,\frac{dS}{2\pi}, \qquad \mu = \begin{bmatrix}\cos\theta_0 \\ \sin\theta_0\end{bmatrix}, \quad \|x\| = 1.$$

Von Mises' question clearly has to do with the truth of the hypothesis

$$H_0: \theta_0 = 0 \quad\text{or}\quad \mu = e_1 = \begin{bmatrix}1 \\ 0\end{bmatrix}.$$
It is worth mentioning that Fisher found the same distribution in another context (Fisher, 1956, SMSI, pp. 133-138) as the conditional distribution of x, given ||x|| = 1, when x is N₂(μ, k⁻¹I₂).
7-32 Oblique map projection

$$\alpha = A, \quad r = 2R\sin(\Psi/2)$$

versus

$$x = 2R\sin(\Psi/2)\cos A, \quad y = 2R\sin(\Psi/2)\sin A.$$

Second, we intend to derive the transformation from the local surface element $dA\,d\Psi\,\sin\Psi$ to the alternate local surface element $|J|\,d\alpha\,dr\,\sin\Psi(\alpha, r)$ by means of the inverse Jacobian determinant

$$J^{-1} = \begin{bmatrix} \dfrac{\partial\alpha}{\partial A} & \dfrac{\partial\alpha}{\partial\Psi} \\[2mm] \dfrac{\partial r}{\partial A} & \dfrac{\partial r}{\partial\Psi} \end{bmatrix} = \begin{bmatrix} D_A\alpha & 0 \\ 0 & D_\Psi r \end{bmatrix}$$

$$J = \begin{bmatrix} D_\alpha A & D_r A \\ D_\alpha\Psi & D_r\Psi \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & \dfrac{1}{R\cos(\Psi/2)} \end{bmatrix}, \qquad |J| = \frac{1}{R\cos(\Psi/2)}.$$

$$A = \alpha, \quad \sin\frac{\Psi}{2} = \frac{r}{2R}, \quad \cos\frac{\Psi}{2} = \sqrt{1 - \frac{r^2}{4R^2}}$$

direct equations:

$$\alpha = A, \quad r = 2R\sin(\Psi/2)$$

inverse equations:

$$A = \alpha, \quad \sin\Psi = \frac{r}{R}\sqrt{1 - \Big(\frac{r}{2R}\Big)^2}.$$

Lemma 7.9:

$$dA\,d\Psi\,\sin\Psi = \frac{1}{R^2}\,dx\,dy = \frac{1}{R^2}\,d\alpha\,r\,dr.$$

$$\sin\Psi = \frac{\sqrt{x^2 + y^2}}{R}\sqrt{1 - \frac{x^2 + y^2}{4R^2}} = \frac{1}{2R^2}\sqrt{x^2 + y^2}\,\sqrt{4R^2 - (x^2 + y^2)}.$$
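The direct and inverse equations and the sin Ψ relation of Lemma 7.9 can be checked numerically; a minimal sketch with illustrative values of R, Ψ and A:

```python
import math

# Oblique azimuthal projection: illustrative values.
R, Psi, A = 2.0, 0.8, 1.3

# Direct equations: polar coordinates (alpha, r) in the projection plane.
alpha = A
r = 2 * R * math.sin(Psi / 2)
x, y = r * math.cos(A), r * math.sin(A)

# Inverse relation of Lemma 7.9: recover sin(Psi) from (x, y).
rho2 = x**2 + y**2
sin_Psi = math.sqrt(rho2) * math.sqrt(4 * R**2 - rho2) / (2 * R**2)
assert abs(sin_Psi - math.sin(Psi)) < 1e-9
```

The identity follows from sin Ψ = 2 sin(Ψ/2) cos(Ψ/2) together with r = 2R sin(Ψ/2), which is why r dr = R² sin Ψ dΨ and the projection is equal-area.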
The parameters (χ_x, χ_y) determine the initial values of the elliptic curve representing the canonical data set, namely (1/χ_x, 1/χ_y). The circular normal distribution is achieved for the data set χ_x = χ_y = 1.

Second, we intend to transform the representation of coordinates of the oblique normal distribution from Cartesian coordinates in the oblique equatorial plane to curvilinear coordinates in the spherical reference frame:

$$x^2(\cos^2\varepsilon\,\chi_x + \sin^2\varepsilon\,\chi_y) + y^2(\sin^2\varepsilon\,\chi_x + \cos^2\varepsilon\,\chi_y) + xy\sin\varepsilon\cos\varepsilon\,(\chi_x - \chi_y) =$$

$$= r^2\cos^2\alpha\,(\cos^2\varepsilon\,\chi_x + \sin^2\varepsilon\,\chi_y) + r^2\sin^2\alpha\,(\sin^2\varepsilon\,\chi_x + \cos^2\varepsilon\,\chi_y) + r^2\sin\alpha\cos\alpha\sin\varepsilon\cos\varepsilon\,(\chi_x - \chi_y)$$

$$= r^2(\Psi, A)\cos^2 A\,(\cos^2\varepsilon\,\chi_x + \sin^2\varepsilon\,\chi_y) + r^2(\Psi, A)\sin^2 A\,(\sin^2\varepsilon\,\chi_x + \cos^2\varepsilon\,\chi_y) + r^2(\Psi, A)\sin A\cos A\sin\varepsilon\cos\varepsilon\,(\chi_x - \chi_y).$$
to fulfill

M1: d(x, y) ≥ 0
M2: d(x, y) = 0 ⇔ x = y
M3: d(x, y) = d(y, x)
M4: d(x, z) ≤ d(x, y) + d(y, z): "Triangle Inequality".

Assume ||x||₂ = ||y||₂ = ||z||₂ = 1. Axioms M1 and M3 are easy to prove, Axiom M2 is not complicated, but the Triangle Inequality requires work. Let x, y, z ∈ X, α = d(x, y), β = d(y, z) and γ = d(x, z), i.e.

$$\alpha, \beta, \gamma \in [0, \pi], \qquad \cos\alpha = \langle x, y\rangle, \quad \cos\beta = \langle y, z\rangle, \quad \cos\gamma = \langle x, z\rangle.$$

We wish to prove γ ≤ α + β. This result is trivial in the case α + β ≥ π, so we may assume α + β ∈ [0, π]. The desired inequality is then equivalent to cos γ ≥ cos(α + β). The proof of the basic formulas relies heavily on the properties of the inner product:

$$\langle u + u', v\rangle = \langle u, v\rangle + \langle u', v\rangle$$
$$\langle u, v + v'\rangle = \langle u, v\rangle + \langle u, v'\rangle \qquad \text{for all } u, u', v, v' \in \mathbb{R}^3 \text{ and } \lambda \in \mathbb{R}.$$
$$\langle \lambda u, v\rangle = \lambda\langle u, v\rangle = \langle u, \lambda v\rangle$$
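The geodesic metric d(x, y) = arccos⟨x, y⟩ and its axioms can be probed numerically; a minimal sketch with illustrative unit vectors:

```python
import math

def unit(v):
    n = math.sqrt(sum(a * a for a in v))
    return tuple(a / n for a in v)

def geodesic(u, v):
    # Great-circle distance on S^2: the angle between unit vectors,
    # with the dot product clamped against floating-point round-off.
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.acos(dot)

x = unit((1.0, 0.2, 0.1))
y = unit((0.3, 1.0, 0.0))
z = unit((0.1, 0.2, 1.0))

alpha, beta, gamma = geodesic(x, y), geodesic(y, z), geodesic(x, z)

assert gamma <= alpha + beta + 1e-12            # M4, triangle inequality
assert abs(geodesic(x, x)) < 1e-7               # M2 (necessity direction)
assert abs(geodesic(x, y) - geodesic(y, x)) < 1e-12  # M3, symmetry
```

A single numerical check is of course no proof; the general argument is the cos γ ≥ cos(α + β) inequality developed in the text.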
Table 7.4:
Computation of theodolite data: comparison of Λ̂ and Λ_g

Alternatively, let us present a second example. Let there be given observed azimuths Λᵢ and vertical directions Φᵢ, by Table 7.3 in detail. First, we compute the solution of the optimization problem $\sum_{i=1}^n 2(1 - \cos\Psi_i) = \min$, namely

$$\tan\hat\Lambda = \frac{\sum_{i=1}^n\cos\Phi_i\sin\Lambda_i}{\sum_{i=1}^n\cos\Phi_i\cos\Lambda_i}, \qquad \tan\hat\Phi = \frac{\sum_{i=1}^n\sin\Phi_i}{\sqrt{\Big(\sum_{i=1}^n\cos\Phi_i\cos\Lambda_i\Big)^2 + \Big(\sum_{i=1}^n\cos\Phi_i\sin\Lambda_i\Big)^2}}.$$
Table 7.5:
Data of type azimuth Λᵢ and vertical direction Φᵢ

This accounts for measurements of data on the horizontal circle and the vertical circle being Fisher normal distributed. We want to tackle two problems:

Problem 1: Compare (Λ̂, Φ̂) with the arithmetic mean (Λ̄, Φ̄) of the data set. Why do the results not coincide?
Problem 2: In which case do (Λ̂, Φ̂) and (Λ̄, Φ̄) coincide?

Solving Problem 1:
Let us compute

(Λ̄, Φ̄) = (125°.2125, 88°.125) and (Λ̂, Φ̂) = (125°.206 664 5, 88°.125 050 77).

The results do not coincide due to the fact that the arithmetic means are obtained by adjusting direct observations with least-squares technology.

Solving Problem 2:
The results do coincide if the following conditions are met:

• all vertical directions are zero;
• Λ̂ = Λ̄ if the observations Λᵢ, Φᵢ fluctuate only "a little" around the constant values Λ₀, Φ₀;
• Λ̂ = Λ̄ if Φᵢ = const.;
• Φ̂ = Φ̄ if the fluctuation of Λᵢ around Λ₀ is considerably smaller than the fluctuation of Φᵢ around Φ₀.
Note the expansions

$$\Lambda_i = \Lambda_0 + \delta\Lambda_i \quad\text{versus}\quad \Phi_i = \Phi_0 + \delta\Phi_i$$

$$\overline{\delta\Lambda} = \frac{1}{n}\sum_{i=1}^n\delta\Lambda_i \quad\text{versus}\quad \overline{\delta\Phi} = \frac{1}{n}\sum_{i=1}^n\delta\Phi_i,$$

$$\tan\bar\Lambda = \frac{\sin\Lambda_0 + \overline{\delta\Lambda}\cos\Lambda_0}{\cos\Lambda_0 - \overline{\delta\Lambda}\sin\Lambda_0},$$

$$\Big(\sum_{i=1}^n\cos\Phi_i\sin\Lambda_i\Big)^2 + \Big(\sum_{i=1}^n\cos\Phi_i\cos\Lambda_i\Big)^2 = n^2\big(\cos^2\Phi_0 + \overline{\delta\Lambda}^2\cos^2\Phi_0 + \overline{\delta\Phi}^2\sin^2\Phi_0 - 2\,\overline{\delta\Phi}\sin\Phi_0\cos\Phi_0\big),$$

$$\tan\hat\Phi = \frac{n\,(\sin\Phi_0 + \overline{\delta\Phi}\cos\Phi_0)}{n\sqrt{\cos^2\Phi_0 + \overline{\delta\Lambda}^2\cos^2\Phi_0 + \overline{\delta\Phi}^2\sin^2\Phi_0 - 2\,\overline{\delta\Phi}\sin\Phi_0\cos\Phi_0}},$$

$$\tan\bar\Phi = \frac{\sin\Phi_0 + \overline{\delta\Phi}\cos\Phi_0}{\cos\Phi_0 - \overline{\delta\Phi}\sin\Phi_0}.$$

In consequence, Λ̂ ≠ Λ̄ and Φ̂ ≠ Φ̄ hold in general.
References

Anderson, T.W. and M.A. Stephens (1972), Arnold, K.J. (1941), Barndorff-Nielsen, O. (1978), Batschelet, E. (1965), Batschelet, E. (1971), Batschelet, E. (1981), Beran, R.J. (1968), Beran, R.J. (1979), Bingham, C. (1964), Bingham, C. (1974), Chang, T. (1986), Downs, T.D. and Gould, A.L. (1967), Durand, D. and J.A. Greenwood (1957), Enkin, R. and Watson, G.S. (1996), Fisher, R.A. (1953), Fisher, N.J. (1985), Fisher, N.I. (1993), Fisher, N.I. and Lee, A.J. (1983), Fisher, N.I. and Lee, A.J. (1986), Fujikoshi, Y. (1980), Girko, V.L. (1985), Goldmann, J. (1976), Gordon, L. and M. Hudson (1977), Grafarend, E.W. (1970), Greenwood, J.A. and D. Durand (1955), Gumbel, E.J., Greenwood, J.A. and Durand, D. (1953), Hammersley, J.M. (1950), Hartman, P. and G.S. Watson (1974), Hetherington, T.J. (1981), Jensen, J.L. (1981), Jupp, P.E. and Mardia, K.V. (1980), Kariya, T. (1989), Kendall, D.G. (1974), Kent, J. (1976), Kent, J.T. (1982), Kent, J.T. (1983), Krumbein, W.C. (1939), Langevin, P. (1905), Laycock, P.J. (1975), Lenmitz, C. (1995), Lenth, R.V. (1981), Lord, R.D. (1948), Lund, U. (1999), Mardia, K.V. (1972), Mardia, K.V. (1975), Mardia, K.V. (1988), Mardia, K.V. and Jupp, P.E. (1999), Mardia, K.V. et al. (2000), Mhaskar, H.N., Narcowich, F.J. and J.D. Ward (2001), Muller, C. (1966), Neudecker, H. (1968), Okamoto, M. (1973), Parker, R.L. et al. (1979), Pearson, K. (1905), Pearson, K. (1906), Pitman, J. and M. Yor (1981), Presnell, B., Morrison, S.P. and R.C. Littell (1998), Rayleigh, L. (1880), Rayleigh, L. (1905), Rayleigh, R. (1919), Rivest, L.P. (1982), Rivest, L.P. (1988), Roberts, P.H. and H.D. Ursell (1960), Sander, B. (1930), Saw, J.G. (1978), Saw, J.G. (1981), Scheidegger, A.E. (1965), Schmidt-Koenig, K. (1972), Selby, B. (1964), Sen Gupta, A. and R. Maitra (1998), Sibuya, M. (1962), Stam, A.J. (1982), Stephens, M.A. (1963), Stephens, M.A. (1964), Stephens, M.A. (1969), Stephens, M.A. (1979), Tashiro, Y. (1977), Teicher, H. (1961), Von Mises, R. (1918), Watson, G.S. (1956a, 1956b), Watson, G.S. (1960), Watson, G.S. (1961), Watson, G.S. (1962), Watson, G.S. (1965), Watson, G.S. (1966), Watson, G.S. (1967a, 1967b), Watson, G.S. (1968), Watson, G.S. (1969), Watson, G.S. (1970), Watson, G.S. (1974), Watson, G.S. (1981a, 1981b), Watson, G.S. (1982a, 1982b, 1982c, 1982d), Watson, G.S. (1983), Watson, G.S. (1986), Watson, G.S. (1988), Watson, G.S. (1998), Watson, G.S. and E.J. Williams (1956), Watson, G.S. and E. Irving (1957), Watson, G.S. and M.R. Leadbetter (1963), Watson, G.S. and S. Wheeler (1964), Watson, G.S. and R.J. Beran (1967), Watson, G.S., R. Epp and J.W. Tukey (1971), Wellner, J. (1979), Wood, A. (1982), Xu, P.L. (1999), Xu, P.L. (2001), Xu, P. (2002), Xu, P.L. et al. (1996a, 1996b), Xu, P.L. and Shimada, S. (1997).
8 The fourth problem of probabilistic regression – special
Gauss-Markov model with random effects – Setup of BLIP
and VIP for the central moments of first order

[Chapter overview diagram: Lemma 8.4 – hom BLIP, hom S-BLIP and hom α-VIP]

The general model of type "fixed effects", "random effects" and "error-in-variables" will be presented in our final chapter. Here we focus on "random effects".
Three sets of unknowns appear within the simple special Gauss-Markov model with random effects: (i) the vector z of random effects is unknown, (ii) the fixed vectors E{y}, E{z} of expectations of the vector y of observations and of the vector z of random effects (first moments) are unknown, and (iii) the fixed dispersion matrices Σ_y, Σ_z of D{y}, D{z} (second central moments) are unknown.
Box 8.1:
Special Gauss-Markov model with random effects
y = Cz + e_{y−Cz}
E{y} = C E{z} ∈ ℝⁿ
D{y} = D{y − Cz} + C D{z} C' ∈ ℝ^{n×n}
C{y − Cz, z} = 0
z, E{z}, E{y}, Σ_{y−Cz}, Σ_z unknown
dim R(C') = rk C = l.
Here we focus on best linear predictors of type hom BLIP, hom S-BLIP and hom α-VIP of random effects z, which turn out to be better than the best linear uniformly unbiased predictor of type hom BLUUP. At first let us begin with a discussion of the bias vector and the bias matrix as well as of the Mean Square Prediction Error MSPE{z̃} with respect to a homogeneous linear prediction z̃ = Ly of random effects z, based upon Box 8.2.
Box 8.2:
Bias vector, bias matrix,
Mean Square Prediction Error
in the special Gauss–Markov model with random effects

E{y} = C E{z} (8.1)
D{y} = D{y − Cz} + C D{z} C' (8.2)
"ansatz"
z̃ = Ly (8.3)
bias vector
β := E{z̃ − z} = E{z̃} − E{z} (8.4)
z̃ − z = (z̃ − E{z̃}) − (z − E{z}) + (E{z̃} − E{z}) (8.7)
z̃ − z = L(y − E{y}) − (z − E{z}) − [I_l − LC] E{z} (8.8)

The bias vector β is conventionally defined by E{z̃} − E{z} subject to the homogeneous prediction form z̃ = Ly. Accordingly the bias vector can be represented by (8.5): β = −[I_l − LC] E{z}. Since the expectation E{z} of the vector z of random effects is unknown, it has been proposed to use instead the matrix I_l − LC as a matrix-valued measure of bias. A measure of the prediction error is the Mean Square Prediction Error MSPE{z̃} of type (8.9).
MSPE{z̃} can be decomposed into three basic parts: the dispersion of the prediction, the dispersion of z itself, and a bias part.
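The homogeneous ansatz z̃ = Ly and the bias vector can be illustrated numerically. The sketch below solves hom BLIP normal equations of the form (Σ_y + C E{z}E{z}'C') L̂' = C E{z}E{z}' for the weight matrix; all dimensions and input values are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 6, 2                                   # observations, random effects

C = rng.standard_normal((n, l))
Ez = np.array([1.0, -0.5])                    # assumed first moment E{z}
M = np.outer(Ez, Ez)                          # the matrix E{z}E{z}'
W = rng.standard_normal((n, n))
Sigma_y = W @ W.T + n * np.eye(n)             # an SPD dispersion matrix D{y}

# normal equations: (Sigma_y + C M C') L' = C M
L = np.linalg.solve(Sigma_y + C @ M @ C.T, C @ M).T

# bias vector of the homogeneous prediction z~ = L y: beta = -[I_l - L C] E{z}
beta = -(np.eye(l) - L @ C) @ Ez
print(L.shape, beta.shape)                    # (2, 6) (2,)
```

Because E{z} is not accessible from measurements, a real computation would insert a hypothesized value for it, exactly as discussed in the text.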
The predictions z̃ of type hom BLIP, hom S-BLIP and hom α-VIP can be characterized as follows:

Lemma 8.4 (hom BLIP, hom S-BLIP and hom α-VIP):
An l×1 vector z̃ is hom BLIP, hom S-BLIP or hom α-VIP of z in the special linear Gauss-Markov model with random effects of Box 8.1, if and only if the matrix L̂ fulfils the normal equations
(1st) hom BLIP:
(Σ_y + C E{z}E{z}' C') L̂' = C E{z}E{z}' (8.23)
(2nd) hom S-BLIP:
(Σ_y + C S C') L̂' = C S (8.24)
(3rd) hom α-VIP:
(Σ_y + (1/α) C S C') L̂' = (1/α) C S. (8.25)
:Proof:
(i) hom BLIP:
The hybrid norm ||MSPE{z̃}||² establishes the Lagrangean
L(L) := tr L Σ_y L' + tr (I_l − LC) E{z}E{z}' (I_l − LC)' + tr Σ_z = min over L
for z̃ hom BLIP of z. The necessary conditions for the minimum of the quadratic Lagrangean L(L) are
∂L/∂L (L̂) := 2 [Σ_y L̂' + C E{z}E{z}' C' L̂' − C E{z}E{z}']' = 0,
which agree with the normal equations (8.23). (The theory of matrix derivatives is reviewed in Appendix B (Facts: derivative of a scalar-valued function of a matrix: trace).)
The second derivatives
∂²L / (∂(vec L) ∂(vec L)') (L̂) > 0
constitute the sufficiency conditions. Equivalently, the first derivative can be written
∂L/∂L (L̂) := 2 L̂ (Σ_y + C E{z}E{z}' C') − 2 E{z}E{z}' C'
∂L/∂(vec L) (L̂) := vec[2 L̂ (Σ_y + C E{z}E{z}' C') − 2 E{z}E{z}' C'].
(ii) hom S-BLIP:
The corresponding Lagrangean, with the substitute matrix S in place of E{z}E{z}', is minimized for z̃ hom S-BLIP of z. Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L(L)
∂L/∂L (L̂) := 2 [Σ_y L̂' + C S C' L̂' − C S]' = 0
as well as to the sufficiency conditions
∂²L / (∂(vec L) ∂(vec L)') (L̂) = 2 [(Σ_y + C S C') ⊗ I_l] > 0.
(iii) hom α-VIP:
Following the first part of the proof, for z̃ hom α-VIP of z we are led to the necessary conditions for the minimum of the quadratic Lagrangean L(L)
∂L/∂(vec L) (L̂) = 2 [(Σ_y + C E{z}E{z}' C') ⊗ I_l] vec L̂ − 2 vec(E{z}E{z}' C').
The Kronecker-Zehfuss product A ⊗ B of two arbitrary matrices, as well as the distributive law (A + B) ⊗ C = A ⊗ C + B ⊗ C of three arbitrary matrices subject to dim A = dim B, is introduced in Appendix A (Definition of Matrix Algebra: multiplication of matrices of the same dimension (internal relation), multiplication of matrices (external relation), and Laws). The vec operation (vectorization of an array) is reviewed in Appendix A, too (Definition, Facts: vec(AB) = (B' ⊗ I) vec A for suitable matrices A and B). Now we are prepared to compute
∂²L / (∂(vec L) ∂(vec L)') (L̂) = 2 [(Σ_y + C E{z}E{z}' C') ⊗ I_l] > 0
∂²L / (∂(vec L) ∂(vec L)') (L̂) = 2 [((1/α) C S C' + Σ_y) ⊗ I_l] > 0.
Beside the explicit representation of z̃ of type hom BLIP, hom S-BLIP and hom α-VIP we compute the related dispersion matrix D{z̃}, the Mean Square Prediction Error MSPE{z̃}, the modified Mean Square Prediction Errors MSPE_S{z̃} and MSPE_{α,S}{z̃}, and the covariance matrices C{z̃, z̃ − z} in
β := E{z̃} − E{z} =
= −[I_l − E{z}E{z}' C' (C E{z}E{z}' C' + Σ_y)⁻¹ C] E{z} (8.28)
MSPE{z̃} :=
D{z̃} + D{z} + [I_l − E{z}E{z}' C' (C E{z}E{z}' C' + Σ_y)⁻¹ C] × (8.30)
× E{z}E{z}' [I_l − C' (C E{z}E{z}' C' + Σ_y)⁻¹ C E{z}E{z}'].
At this point we have to comment on what Theorem 8.5 tells us. hom BLIP has generated the prediction z̃ of type (8.26), the dispersion matrix D{z̃} of type (8.27), the bias vector of type (8.28) and the Mean Square Prediction Error of type (8.30), which all depend on the vector E{z} and the matrix E{z}E{z}', respectively. We already mentioned that E{z} and E{z}E{z}' are not accessible from measurements. The situation is similar to the one in hypothesis theory. As shown later in this section, we can produce only an estimator Ê{z} and consequently can set up a hypothesis about the first moment E{z} of the "random effect" z. Indeed, a similar argument applies to the second central moment D{y} ~ Σ_y of the "random effect" y, the observation vector. Such a dispersion matrix has to be known in order to be able to compute z̃, D{z̃} and MSPE{z̃}. Again we have to apply the argument that we are only able to construct an estimate Σ̂_y and to set up a hypothesis about D{y} ~ Σ_y.
Theorem 8.6 (z̃ hom S-BLIP):
Let z̃ = Ly be hom S-BLIP of z in the special linear Gauss-Markov model with random effects of Box 8.1. Then equivalent representations of the solutions of the normal equations (8.24) are
z̃ = S C' (Σ_y + C S C')⁻¹ y = (8.31)
= S C' (Σ_{y−Cz} + C Σ_z C' + C S C')⁻¹ y
z̃ = (1/α) S C' (Σ_y + (1/α) C S C')⁻¹ y = (8.40)
= (1/α) S C' (Σ_{y−Cz} + C Σ_z C' + (1/α) C S C')⁻¹ y
z̃ = (C' Σ_y⁻¹ C + α S⁻¹)⁻¹ C' Σ_y⁻¹ y (8.41)
MSPE{z̃} =
= Σ_z + S C' (C S C' + Σ_y)⁻¹ Σ_y (C S C' + Σ_y)⁻¹ C S +
• tr D{z̃} = min
• ββ' = min
• tr MSPE{z̃} = min.
L̂ = E{z}E{z}' C' [Σ_y + C E{z}E{z}' C']⁻¹.
L̂ = S C' [Σ_y + C S C']⁻¹,
if S⁻¹ and Σ_y⁻¹ exist. Such a result concludes this part of the proof.
(iv) z̃ = (I_l + S C' Σ_y⁻¹ C)⁻¹ S C' Σ_y⁻¹ y
Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(9)) the fundamental matrix identity
S C' (Σ_y + C S C')⁻¹ = (I_l + S C' Σ_y⁻¹ C)⁻¹ S C' Σ_y⁻¹,
L̂ = (1/α) S C' [Σ_y + (1/α) C S C']⁻¹.
(vi) z̃ = (C' Σ_y⁻¹ C + α S⁻¹)⁻¹ C' Σ_y⁻¹ y
Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan-Guttman matrix identity) the fundamental matrix identity
(1/α) S C' (Σ_y + (1/α) C S C')⁻¹ = (C' Σ_y⁻¹ C + α S⁻¹)⁻¹ C' Σ_y⁻¹,
if S⁻¹ and Σ_y⁻¹ exist. Such a result concludes this part of the proof.
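Both "Cayley inverse: sum of two matrices" identities invoked above are easy to verify numerically. The sketch below uses random symmetric positive definite test matrices; the sizes and the weight α are arbitrary choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, l, alpha = 5, 3, 2.0

# random SPD matrices Sigma_y (n x n) and S (l x l), arbitrary C (n x l)
A = rng.standard_normal((n, n)); Sigma_y = A @ A.T + n * np.eye(n)
B = rng.standard_normal((l, l)); S = B @ B.T + l * np.eye(l)
C = rng.standard_normal((n, l))

inv = np.linalg.inv

# s(9): S C'(Sigma_y + C S C')^-1 = (I_l + S C' Sigma_y^-1 C)^-1 S C' Sigma_y^-1
lhs9 = S @ C.T @ inv(Sigma_y + C @ S @ C.T)
rhs9 = inv(np.eye(l) + S @ C.T @ inv(Sigma_y) @ C) @ S @ C.T @ inv(Sigma_y)
print(np.allclose(lhs9, rhs9))  # True

# Duncan-Guttman: (1/a) S C'(Sigma_y + (1/a) C S C')^-1
#               = (C' Sigma_y^-1 C + a S^-1)^-1 C' Sigma_y^-1
lhs10 = (1 / alpha) * S @ C.T @ inv(Sigma_y + (1 / alpha) * C @ S @ C.T)
rhs10 = inv(C.T @ inv(Sigma_y) @ C + alpha * inv(S)) @ C.T @ inv(Sigma_y)
print(np.allclose(lhs10, rhs10))  # True
```

Such a check is a useful guard when implementing the equivalent forms (8.31), (8.40) and (8.41), since they trade an n×n inversion for an l×l one.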
By means of the definition of the dispersion matrix D{z̃} and the substitution of z̃ of type hom BLIP the proof has been straightforward.
(ix) hom S-BLIP: D{z̃} (1st representation)
By means of the definition of the dispersion matrix D{z̃} and the substitution of z̃ of type hom S-BLIP the proof of the first representation has been straightforward.
(x) hom S-BLIP: D{z̃} (2nd representation)
8-2 Examples

… = E{[f'(μ_z)(z − μ_z) + (1/2!) f''(μ_z)(z − μ_z)² − (1/2!) f''(μ_z) σ_z² + O(3)]²},
hence E{[y − E{y}][y − E{y}]} is given by
σ_y² = f'²(μ_z) σ_z² − (1/4) f''²(μ_z) σ_z⁴ + f' f''(μ_z) E{(z − μ_z)³} + (1/4) f''² E{(z − μ_z)⁴} + O(3).
Finally, if z is quasi-normally distributed, we have
σ_y² = f'²(μ_z) σ_z² + (1/2) f''²(μ_z) σ_z⁴ + O(3).
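The quasi-normal formula is easy to check by a small Monte-Carlo experiment; the function f and the moments μ_z, σ_z below are arbitrary illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_z, sigma_z = 2.0, 0.05      # small sigma so the O(3) remainder is negligible

fp = np.cos                    # f'  for f = sin
fpp = lambda z: -np.sin(z)     # f''

z = rng.normal(mu_z, sigma_z, 2_000_000)
var_mc = np.var(np.sin(z))     # Monte-Carlo variance of y = f(z)

# quasi-normal case: sigma_y^2 = f'^2(mu) sigma^2 + (1/2) f''^2(mu) sigma^4
var_taylor = fp(mu_z)**2 * sigma_z**2 + 0.5 * fpp(mu_z)**2 * sigma_z**4

print(abs(var_mc - var_taylor) / var_taylor < 0.01)  # True
```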
y = f(z) = f(ξ₀) + f'(ξ₀)(z − ξ₀) + (1/2!) f''(ξ₀)(z − ξ₀)² + O(3)
and
E{y} = E{f(z)} = f(ξ₀) + f'(ξ₀)(μ_z − ξ₀) + (1/2) f''(ξ₀) σ_z² + (1/2) f''(ξ₀)(μ_z − ξ₀)² + O(3)
leading to E{[y − E{y}][y − E{y}]} as
σ_y² = f'²(ξ₀) σ_z² + f' f''(ξ₀) E{(z − μ_z)³} + 2 f' f''(ξ₀) σ_z² (μ_z − ξ₀) +
+ (1/4) f''²(ξ₀) E{(z − μ_z)⁴} + f''²(ξ₀) E{(z − μ_z)³}(μ_z − ξ₀) −
− (1/4) f''²(ξ₀) σ_z⁴ + f''²(ξ₀) σ_z² (μ_z − ξ₀)² + O(3)
and, with z being quasi-normally distributed, we have
σ_y² = f'²(ξ₀) σ_z² + 2 f' f''(ξ₀) σ_z² (μ_z − ξ₀) + (1/2) f''²(ξ₀) σ_z⁴ + f''²(ξ₀) σ_z² (μ_z − ξ₀)² + O(3),
with the first and third terms (on the right-hand side) being the right-hand-side terms of case 1 (cf. E. Grafarend and B. Schaffrin 1983, p. 470).
E{y} = f(μ_{z₀}) + f'(μ_{z₀}) β₀ + (1/2) f''(μ_{z₀}) σ_{z₀}² + (1/2) f''(μ_{z₀}) β₀² + O(3)
and
σ_y² = f'²(μ_{z₀}) σ_{z₀}² + f' f''(μ_{z₀}) E{(z₀ − E{z₀})³} + 2 f' f''(μ_{z₀}) σ_{z₀}² β₀ +
+ (1/4) f''²(μ_{z₀}) E{(z₀ − E{z₀})⁴} + f''²(μ_{z₀}) E{(z₀ − E{z₀})³} β₀ + f''²(μ_{z₀}) σ_{z₀}² β₀² +
+ (1/4) f''²(μ_{z₀}) σ_{z₀}⁴ − (1/2) f''²(μ_{z₀}) E{(z₀ − E{z₀})²} σ_{z₀}² + O(3)
with the first and third terms (on the right-hand side) being the right-hand-side terms of case 1.
[Figure: planar four-point network P1, P2, P3, P4 with base vectors e1, e2]

Point  x          y
P1     100.00 m   100.00 m
P2     110.00 m   117.32 m
P3     101.34 m   122.32 m
P4     91.34 m    105.00 m

Table 8.2: Longitudinal and lateral correlation functions Σ_m(|x|) and Σ_l(|x|) for a Taylor-Karman structured 4-point network, tabulated as the correlation matrix of the coordinates (x1, y1, …, x4, y4):

      x1     y1     x2     y2     x3     y3     x4     y4
x1    1      0      0.438  -0.022 0.441  -0.005 0.512  0.108
y1           1      -0.022 0.412  -0.005 0.361  0.108  0.638
x2                  1      0      0.512  0.108  0.381  -0.037
y2                         1      0.108  0.634  -0.037 0.417
x3    (symmetric)                 1      0      0.438  -0.022
y3                                       1      -0.022 0.412
x4                                              1      0
y4                                                     1
Finally, we have computed the Jacobi matrix of first derivatives in Table 8.6 and
the Hesse matrix of second derivatives in Table 8.7.
:Note:
∂F/∂x_i = (y_{i+1} − y_{i−1}) / 2
∂F/∂y_i = −(x_{i+1} − x_{i−1}) / 2
with the cyclic conventions x₀ = x₄, y₀ = y₄, x₅ = x₁, y₅ = y₁.
Results
At first, we list the distances {P1P2, P2P3, P3P4, P4P1} of the trapezoidal finite
element by |P1P2|=20 (for instance 20m), |P2P3|=10 (for instance 10m), |P3P4|=20
(for instance 20m) and |P4P1|=10 (for instance 10m).
Second, we compute σ_F²(first term) = J Σ J' by
σ_F²(first term) =
= (1/2) [12.32, −18.66, 22.32, −1.34, −12.32, 18.62, −22.32, 1.34] ×
× [1 0 0.438 −0.022 0.442 −0.005 0.512 0.108;
   0 1 −0.022 0.412 −0.005 0.362 0.108 0.638;
   0.438 −0.022 1 0 0.512 0.108 0.386 −0.037;
   −0.022 0.412 0 1 0.108 0.638 −0.037 0.418;
   0.442 −0.005 0.512 0.108 1 0 0.438 −0.022;
   −0.005 0.362 0.108 0.638 0 1 −0.022 0.412;
   0.512 0.108 0.386 −0.037 0.438 −0.022 1 0;
   0.108 0.638 −0.037 0.418 −0.022 0.412 0 1] ×
× (1/2) [12.32, −18.66, 22.32, −1.34, −12.32, 18.62, −22.32, 1.34]' =
= 334.7117.
Third, we need to compute σ_F²(second term) = (1/2) H (vec Σ)(vec Σ)' H' by
σ_F²(second term) = (1/2) H (vec Σ)(vec Σ)' H' = 7.2222 × 10⁻³⁵
where
where
ª 0 0 0 12 0 0 0 12 º
« 0 0 1 0 0 0 1 0 »
« 1
2
1
2 »
« 0 2 0 0 0 2 0 0 »
« 1 0 0 0 1 0 0 0 »
H = vec « 2 1
2
1 »,
« 0 0 01 2 0 0 01 2 »
« 0 0 2 0 0 0 2 0 »
« 0 1 0 0 0 1 0 0 »
« 1 2 1
2 »
¬« 2 0 0 0 2 0 0 0 ¼»
and
vec Σ = vec [1 0 0.438 −0.022 0.442 −0.005 0.512 0.108;
             0 1 −0.022 0.412 −0.005 0.362 0.108 0.638;
             0.438 −0.022 1 0 0.512 0.108 0.386 −0.037;
             −0.022 0.412 0 1 0.108 0.638 −0.037 0.418;
             0.442 −0.005 0.512 0.108 1 0 0.438 −0.022;
             −0.005 0.362 0.108 0.638 0 1 −0.022 0.412;
             0.512 0.108 0.386 −0.037 0.438 −0.022 1 0;
             0.108 0.638 −0.037 0.418 −0.022 0.412 0 1].
Finally, we get the variance of the planar surface element F as the sum of the two terms, σ_F² = 334.7117 + 7.2222 × 10⁻³⁵ ≈ 334.7117.
F = √((x₂ − x₁)² + (y₂ − y₁)²).
∂/∂x₁ (∂F/∂x₁, …, ∂F/∂y₄) = [1/F − (x₂ − x₁)²/F³, −(x₂ − x₁)(y₂ − y₁)/F³,
  −1/F + (x₂ − x₁)²/F³, (x₂ − x₁)(y₂ − y₁)/F³, 0, 0, 0, 0],
∂/∂y₁ (∂F/∂x₁, …, ∂F/∂y₄) = [−(x₂ − x₁)(y₂ − y₁)/F³, 1/F − (y₂ − y₁)²/F³,
  (x₂ − x₁)(y₂ − y₁)/F³, −1/F + (y₂ − y₁)²/F³, 0, 0, 0, 0],
∂/∂x₂ (∂F/∂x₁, …, ∂F/∂y₄) = [−1/F + (x₂ − x₁)²/F³, (x₂ − x₁)(y₂ − y₁)/F³,
  1/F − (x₂ − x₁)²/F³, −(x₂ − x₁)(y₂ − y₁)/F³, 0, 0, 0, 0],
∂/∂y₂ (∂F/∂x₁, …, ∂F/∂y₄) = [(x₂ − x₁)(y₂ − y₁)/F³, −1/F + (y₂ − y₁)²/F³,
  −(x₂ − x₁)(y₂ − y₁)/F³, 1/F − (y₂ − y₁)²/F³, 0, 0, 0, 0],
∂/∂x_i (∂F/∂x₁, …, ∂F/∂y₄) = [0, 0, 0, 0, 0, 0, 0, 0]
∂/∂y_i (∂F/∂x₁, …, ∂F/∂y₄) = [0, 0, 0, 0, 0, 0, 0, 0], i = 3, 4.
Results
At first, we list the distance {P1P2} of the distance element: |P1P2| = 20 (for instance 20 m).
Second, we compute σ_F²(first term) = J Σ J' by
σ_F²(first term) =
= [−0.5, −0.866, 0.5, 0.866, 0, 0, 0, 0] ×
× [1 0 0.438 −0.022 0.442 −0.005 0.512 0.108;
   0 1 −0.022 0.412 −0.005 0.362 0.108 0.638;
   0.438 −0.022 1 0 0.512 0.108 0.386 −0.037;
   −0.022 0.412 0 1 0.108 0.638 −0.037 0.418;
   0.442 −0.005 0.512 0.108 1 0 0.438 −0.022;
   −0.005 0.362 0.108 0.638 0 1 −0.022 0.412;
   0.512 0.108 0.386 −0.037 0.438 −0.022 1 0;
   0.108 0.638 −0.037 0.418 −0.022 0.412 0 1] ×
× [−0.5, −0.866, 0.5, 0.866, 0, 0, 0, 0]' =
= 1.2000.
Third, we need to compute σ_F²(second term) = (1/2) H (vec Σ)(vec Σ)' H' by
σ_F²(second term) = (1/2) H (vec Σ)(vec Σ)' H' = 0.0015
where
where
ª 0.0375 -0.0217 -0.0375 0.0217 0 0 0 0º
« -0.0217 0.0125 0.0217 -0.0125 0 0 0 0»
« »
« -0.0375 0.0217 0.0375 -0.0217 0 0 0 0»
0.0217 -0.0125 -0.0217 0.0125 0 0 0 0»
H = vec «« ,
0 0 0 0 0 0 0 0»
« 0 0 0 0 0 0 0 0»»
«
« 0 0 0 0 0 0 0 0»
«¬ 0 0 0 0 0 0 0 0»¼
and
ª 1 0 0.438 -0.022 0.442 -0.005 0.512 0.108 º
« 0 1 -0.022 0.412 -0.005 0.362 0.108 0.638 »
« »
« 0.438 -0.022 1 0 0.512 0.108 0.386 -0.037 »
-0.022 0.412 0 1 0.108 0.638 -0.037 0.418 »
vec Ȉ = vec «« .
0.442 -0.005 0.512 0.108 1 0 0.438 -0.022 »
« -0.005 0.362 0.108 0.638 0 1 -0.022 0.412 »»
«
« 0.512 0.108 0.386 -0.037 0.438 -0.022 1 0 »
«¬ 0.108 0.638 -0.037 0.418 -0.022 0.412 0 1 »¼
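As a cross-check of the two propagation terms for the distance element, here is a small numerical sketch of the first-order (Jacobian) term and of the second-order term written, as above, in the form (1/2)(vec H · vec Σ)²; the coordinates and the correlation matrix Σ are taken from the example, the variable names are mine:

```python
import numpy as np

# coordinates of P1 and P2 from the four-point network
x1, y1, x2, y2 = 100.00, 100.00, 110.00, 117.32
dx, dy = x2 - x1, y2 - y1
F = np.hypot(dx, dy)                      # distance element, ~20 m

# Jacobian of F w.r.t. (x1, y1, x2, y2, x3, y3, x4, y4)
J = np.array([-dx, -dy, dx, dy, 0, 0, 0, 0]) / F

Sigma = np.array([
    [ 1,      0,      0.438, -0.022,  0.442, -0.005,  0.512,  0.108],
    [ 0,      1,     -0.022,  0.412, -0.005,  0.362,  0.108,  0.638],
    [ 0.438, -0.022,  1,      0,      0.512,  0.108,  0.386, -0.037],
    [-0.022,  0.412,  0,      1,      0.108,  0.638, -0.037,  0.418],
    [ 0.442, -0.005,  0.512,  0.108,  1,      0,      0.438, -0.022],
    [-0.005,  0.362,  0.108,  0.638,  0,      1,     -0.022,  0.412],
    [ 0.512,  0.108,  0.386, -0.037,  0.438, -0.022,  1,      0],
    [ 0.108,  0.638, -0.037,  0.418, -0.022,  0.412,  0,      1]])

first_term = J @ Sigma @ J                # sigma_F^2, first-order part
print(round(first_term, 3))               # 1.201 (the text rounds to 1.2000)

# second-order part: (1/2) (vec H . vec Sigma)^2, H = Hessian of F
H = np.zeros((8, 8))
H[:4, :4] = np.array([
    [ 1/F - dx**2/F**3, -dx*dy/F**3,      -1/F + dx**2/F**3,  dx*dy/F**3],
    [-dx*dy/F**3,        1/F - dy**2/F**3,  dx*dy/F**3,      -1/F + dy**2/F**3],
    [-1/F + dx**2/F**3,  dx*dy/F**3,        1/F - dx**2/F**3, -dx*dy/F**3],
    [ dx*dy/F**3,       -1/F + dy**2/F**3, -dx*dy/F**3,       1/F - dy**2/F**3]])

second_term = 0.5 * (H.ravel() @ Sigma.ravel())**2
print(round(second_term, 4))              # 0.0015
```

The analytic Hessian reproduces the numerical entries ±0.0375, ±0.0217, ±0.0125 above, and the second term is three orders of magnitude smaller than the first, as expected.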
9 The fifth problem of algebraic regression

[Chapter overview diagram: Definition 9.1 – "inconsistent homogeneous condition equations"; Lemma 9.2, Lemma 9.3 – G_y-norm: least squares solution; Theorem 9.4 – G_y-seminorm: least squares solution]

Here we shall outline two systems of condition equations, namely homogeneous and inhomogeneous inconsistent condition equations. First, Definition 9.1 gives us G_y-LESS of a system of inconsistent homogeneous condition equations, which we characterize as the least squares solution with respect to the G_y-seminorm (G_y-norm) by means of Lemma 9.2, Lemma 9.3 (G_y-norm) and Lemma 9.4 (G_y-seminorm). Second, Definition 9.5 specifies G_y-LESS of a system of inconsistent inhomogeneous condition equations.
[G_y B'; B 0] [i_l; λ_l] = [0; By] (9.4)
L(i, λ) := i' G_y i + 2 λ' (Bi − By) = min over i, λ.
The solution
i_l = G_y⁻¹ B' (B G_y⁻¹ B')⁻¹ By (9.6)
:Proof:
A basis of the proof could be C. R. Rao's Pandora Box, the theory of inverse partitioned matrices (Appendix A: Fact: Inverse Partitioned Matrix /IPM/ of a symmetric matrix). Due to the rank identity rk G_y = n, the normal equations (9.4) can be solved faster by Gauss elimination.
G_y i_l + B' λ_l = 0
B i_l = By.
Multiply the first normal equation by B G_y⁻¹ and substitute the second normal equation for B i_l:
B G_y⁻¹ B' λ_l = −By.
Finally we substitute the "Lagrange multiplier" λ_l back into the first normal equation in order to prove
G_y i_l + B' λ_l = G_y i_l − B' (B G_y⁻¹ B')⁻¹ By = 0.
The normal equations of the inconsistent condition equations read for the case G_y = I₃:
i_l = B' (B B')⁻¹ B y,
B' (B B')⁻¹ B = (1/3) [1 1 1; 1 1 1; 1 1 1],
(i_αβ)_l = (i_βγ)_l = (i_γα)_l = (1/3)(h_αβ + h_βγ + h_γα).
(ii) the second example: angle sum of a planar triangle
Alternatively, we assume three observed angles of a planar triangle, which sum to
α + β + γ = 180°,
namely
B := [1, 1, 1], y := [α; β; γ], i := [i_α; i_β; i_γ], c := 180°.
The normal equations of the inconsistent condition equation read in our case G_y = I₃:
i_l = B' (B B')⁻¹ (By − c),
B' (B B')⁻¹ = (1/3) [1; 1; 1], By − c = α + β + γ − 180°,
[(i_α)_l; (i_β)_l; (i_γ)_l] = (1/3) [1; 1; 1] (α + β + γ − 180°).
10 The fifth problem of probabilistic regression
– general Gauss-Markov model with mixed effects –
Setup of BLUUE for the moments of first order
(Kolmogorov-Wiener prediction)
“Prediction company’s chance of success is not zero, but close to it.”
Eugene Fama
“The best way to predict the future is to invent it.”
Alan Kay
[Chapter overview diagram: Definition 10.1 (ξ̂, Ê{z}: Σ_y-BLUUE of ξ and E{z}); Lemma 10.2 (Σ_y-BLUUE of ξ and E{z}); Theorem 10.3 (ξ̂, Ê{z}: Σ_y-BLUUE of ξ and E{z}); Lemma 10.4 (Ê{y}: ξ̂, Ê{z} Σ_y-BLUUE of ξ and E{z})]
The inhomogeneous general linear Gauss-Markov model with fixed effects and random effects will be presented first. We review the special Kolmogorov-Wiener model and extend it by the proper stochastic model of type BIQUUE given by Theorem 10.5.
The extensive example for the general linear Gauss-Markov model with fixed effects and random effects concentrates on a height network observed at two epochs. At the first epoch we assume three measured height differences. Between the first and the second epoch we assume height differences which change linearly in time, for instance as a result of an earthquake; we have found the height difference model
h_αβ(τ) = h_αβ(0) + ḣ_αβ τ + O(τ²).
Namely, τ indicates the time interval from the first epoch to the second epoch, relative to the height difference velocity ḣ_αβ. Unknown are
• the fixed effects h_αβ and
• the expected values of stochastic effects of type height difference velocities ḣ_αβ,
given the singular dispersion matrix of height differences. Alternative estimation and prediction procedures are of
• type (V + CZC')-BLUUE for the unknown fixed parameter vector ξ of height differences of the initial epoch and the expectation data E{z} of stochastic height difference velocities z,
• type (V + CZC')-BLUUE for the expectation data E{y} of height difference measurements y,
• type ẽ_y of the empirical error vector,
• as well as type (V + CZC')-BLUUP of the stochastic vector z of height difference velocities.
For the unknown variance component σ² of height difference observations we review estimates of type BIQUUE.
At the end, we intend to generalize the concept of estimation and prediction of fixed and random effects by a short historical remark.
10-1 Inhomogeneous general linear Gauss-Markov model
(fixed effects and random effects)
Here we focus on the general inhomogeneous linear Gauss-Markov model including fixed effects and random effects. By means of Definition 10.1 we review Σ_y-BLUUE of ξ and E{z}, followed by the related Lemma 10.2, Theorem 10.3 and Lemma 10.4.
Box 10.1
Inhomogeneous general linear Gauss–Markov model
(fixed effects and random effects)
Σ_z := D{z}, Σ_y := D{y}
C{y, z} = 0
[ξ̂; η̂] = [ξ̂; Ê{z}] = [L₁; L₂] y + [κ₁; κ₂]
tr D{η̂} := tr D{Ê{z}} := E{(η̂ − η)'(η̂ − η)} = (10.6)
= E{(Ê{z} − E{z})'(Ê{z} − E{z})} = tr L₂ Σ_y L₂' = ||L₂'||²_{Σ_y} = min over L₂.
or
Σ_y L₁' − A Λ₁₁ − C Λ₂₁ = 0, Σ_y L₂' − A Λ₁₂ − C Λ₂₂ = 0
A' L₁' = I_m, A' L₂' = 0 (10.8)
C' L₁' = 0, C' L₂' = I_l
with suitable matrices Λ₁₁, Λ₁₂, Λ₂₁ and Λ₂₂ of "Lagrange multipliers".
Theorem 10.3 specifies the solution of the special normal equations by means of (10.9) relative to the specific "Schur complements" (10.10)-(10.13).
D{ξ̂} = {A' Σ_y⁻¹ [I_n − C (C' Σ_y⁻¹ C)⁻¹ C' Σ_y⁻¹] A}⁻¹ =: Σ_ξ̂
C{ξ̂, η̂} = C{ξ̂, Ê{z}} =
= −{A' Σ_y⁻¹ [I_n − C (C' Σ_y⁻¹ C)⁻¹ C' Σ_y⁻¹] A}⁻¹ A' Σ_y⁻¹ C (C' Σ_y⁻¹ C)⁻¹
D{η̂} := D{Ê{z}} = {C' Σ_y⁻¹ [I_n − A (A' Σ_y⁻¹ A)⁻¹ A' Σ_y⁻¹] C}⁻¹ =: Σ_η̂
D{ξ̂} = S_A⁻¹, D{η̂} = D{Ê{z}} = S_C⁻¹
C{ξ̂, z} = 0
C{η̂, z} := C{Ê{z}, z} = 0,
where the "Schur complements" are defined by
S_A := A' Σ_y⁻¹ [I_n − C (C' Σ_y⁻¹ C)⁻¹ C' Σ_y⁻¹] A, (10.10)
s_A := A' Σ_y⁻¹ [I_n − C (C' Σ_y⁻¹ C)⁻¹ C' Σ_y⁻¹] y (10.11)
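The Schur-complement representations give ξ̂ = S_A⁻¹ s_A and, symmetrically, Ê{z} = S_C⁻¹ s_C directly. The following sketch assembles them for a random test model and checks them against the full generalized least squares solution of [A, C]; all input values are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, l = 8, 2, 2
A = rng.standard_normal((n, m))
C = rng.standard_normal((n, l))
W = rng.standard_normal((n, n))
Sigma_y = W @ W.T + n * np.eye(n)     # SPD dispersion matrix D{y}
y = rng.standard_normal(n)

Si = np.linalg.inv(Sigma_y)

def schur(X, Y):
    # [S_X, s_X] := X' Si [I - Y (Y' Si Y)^-1 Y' Si] [X, y]
    P = np.eye(n) - Y @ np.linalg.solve(Y.T @ Si @ Y, Y.T @ Si)
    return X.T @ Si @ P @ X, X.T @ Si @ P @ y

S_A, s_A = schur(A, C)                # (10.10), (10.11)
S_C, s_C = schur(C, A)                # (10.12), (10.13)

xi_hat = np.linalg.solve(S_A, s_A)    # BLUUE of the fixed effects xi
Ez_hat = np.linalg.solve(S_C, s_C)    # BLUUE of E{z}

# check: [xi_hat; Ez_hat] equals the full GLS fit of the stacked design [A, C]
X = np.hstack([A, C])
full = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(np.allclose(np.concatenate([xi_hat, Ez_hat]), full))  # True
```

This is the partitioned-normal-equation (block elimination) view of Theorem 10.3: eliminating one parameter block leaves the Schur complement system for the other.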
Our final result (10.14)-(10.23) summarizes (i) the two forms (10.14) and (10.15) of estimating Ê{y} and D{Ê{y}} as derived covariance matrices, (ii) the empirical error vector ẽ_y and the related variance-covariance matrices (10.19)-(10.21), and (iii) the dispersion matrices D{y} by means of (10.22)-(10.23).
Lemma 10.4 (Ê{y}: ξ̂, Ê{z} Σ_y-BLUUE of ξ and E{z}):
D{Ê{y}} = D{A ξ̂ + C Ê{z}} =
= A D{ξ̂} A' + A C{ξ̂, Ê{z}} C' + C C{ξ̂, Ê{z}}' A' + C D{Ê{z}} C'
D{Ê{y}} = D{A ξ̂ + C Ê{z}} =
= C (C' Σ_y⁻¹ C)⁻¹ C' + [I_n − C (C' Σ_y⁻¹ C)⁻¹ C' Σ_y⁻¹] A S_A⁻¹ A' [I_n − Σ_y⁻¹ C (C' Σ_y⁻¹ C)⁻¹ C']
D{Ê{y}} = D{A ξ̂ + C Ê{z}} =
= A (A' Σ_y⁻¹ A)⁻¹ A' + [I_n − A (A' Σ_y⁻¹ A)⁻¹ A' Σ_y⁻¹] C S_C⁻¹ C' [I_n − Σ_y⁻¹ A (A' Σ_y⁻¹ A)⁻¹ A'],
where S_A, s_A, S_C, s_C are "Schur complements" (10.10), (10.11), (10.12) and (10.13).
The covariance matrix of Ê{y} and z amounts to
C{Ê{y}, z} = C{A ξ̂ + C Ê{z}, z} = 0. (10.16)
subject to
[S_A, s_A] := A' (V + CZC')⁻¹ {I_n − C [C' (V + CZC')⁻¹ C]⁻¹ C' (V + CZC')⁻¹} [A, y] =
= A' Q (V + CZC')⁻¹ [A, y]
and
[S_C, s_C] := C' (V + CZC')⁻¹ {I_n − A [A' (V + CZC')⁻¹ A]⁻¹ A' (V + CZC')⁻¹} [C, y],
where S_A and S_C are "Schur complements".
Alternately, we receive the empirical data based upon
σ̂² = (n − m − l)⁻¹ y' (V + CZC')⁻¹ ẽ_y = (n − m − l)⁻¹ ẽ_y' (V + CZC')⁻¹ ẽ_y
and the related variances
D{σ̂²} = 2 (n − m − l)⁻¹ σ⁴ = 2 (n − m − l)⁻¹ (σ²)²
or, replacing σ² by its estimate,
D̂{σ̂²} = 2 (n − m − l)⁻¹ (σ̂²)² = 2 (n − m − l)⁻³ [ẽ_y' (V + CZC')⁻¹ ẽ_y]².
(ii) If the cofactor matrix V is positive definite, we will find for the simple representations of type BIQUUE of σ² the equivalent representations
σ̂² = (n − m − l)⁻¹ y' {V⁻¹ − V⁻¹ [A, C] [A' V⁻¹ A, A' V⁻¹ C; C' V⁻¹ A, C' V⁻¹ C]⁻¹ [A'; C'] V⁻¹} y
σ̂² = (n − m − l)⁻¹ y' V⁻¹ [I_n − A (A' V⁻¹ A)⁻¹ A' V⁻¹] (y − C S_C⁻¹ s_C)
subject to the projection matrix
Q = I_n − V⁻¹ C (C' V⁻¹ C)⁻¹ C'
and
[S_A, s_A] := A' V⁻¹ [I_n − C (C' V⁻¹ C)⁻¹ C' V⁻¹] [A, y] = A' Q V⁻¹ [A, y],
where τ denotes the time difference from the first epoch to the second epoch, related to the height difference. Unknown are the fixed height differences h_αβ and the expected values of the random height difference velocities ḣ_αβ. Given is the singular dispersion matrix of height difference measurements. Alternative estimation and prediction data are of type (V + CZC')-BLUUE for the unknown parameter ξ of height differences at the initial epoch and the expected data E{z} of stochastic height difference velocities z, of type (V + CZC')-BLUUE for the expected data E{y} of height difference observations y, of type ẽ_y of the empirical error vector of observations, and of type (V + CZC')-BLUUP for the stochastic vector z of height difference velocities. For the unknown variance component σ² of height difference observations we use estimates of type BIQUUE. In detail, our model assumptions are
epoch 1
E{[h_αβ; h_βγ; h_γα]} = [1 0; 0 1; −1 −1] [h_αβ; h_βγ]
epoch 2
E{[h_αβ; h_βγ; h_γα]} = [1 0; 0 1; −1 −1] [h_αβ; h_βγ] + [τ 0; 0 τ; −τ −τ] [E{ḣ_αβ}; E{ḣ_βγ}]
epoch 1 and 2
E{[h_αβ; h_βγ; h_γα; k_αβ; k_βγ; k_γα]} =
= [1 0; 0 1; −1 −1; 1 0; 0 1; −1 −1] [h_αβ; h_βγ] + [0 0; 0 0; 0 0; τ 0; 0 τ; −τ −τ] [E{ḣ_αβ}; E{ḣ_βγ}]
with
y := [h_αβ; h_βγ; h_γα; k_αβ; k_βγ; k_γα], A := [1 0; 0 1; −1 −1; 1 0; 0 1; −1 −1], ξ := [h_αβ; h_βγ],
C := [0 0; 0 0; 0 0; τ 0; 0 τ; −τ −τ], E{z} = [E{ḣ_αβ}; E{ḣ_βγ}],
rank identities
rk A = 2, rk C = 2, rk [A, C] = m + l = 4.
The singular dispersion matrix D{y} = V σ² of the observation vector y and the singular dispersion matrix D{z} = Z σ² are determined in the following. We separate three cases.
(i) rk V = 6, rk Z = 1:
V = I₆, Z = (1/τ²) [1 −1; −1 1]
(ii) rk V = 5, rk Z = 2:
V = Diag(1, 1, 1, 1, 1, 0), Z = (1/τ²) I₂, rk(V + CZC') = 6
(iii) rk V = 4, rk Z = 2:
V = Diag(1, 1, 1, 1, 0, 0), Z = (1/τ²) I₂, rk(V + CZC') = 6.
In order to be as simple as possible we use the time interval τ = 1.
With the numerical values of matrix inversion and of "Schur complements", e.g.
Table 1: (V + CZC')⁻¹
Table 2: {I_n − A [A' (V + CZC')⁻¹ A]⁻¹ A' (V + CZC')⁻¹}
Table 3: {I_n − C [C' (V + CZC')⁻¹ C]⁻¹ C' (V + CZC')⁻¹}
Table 4: "Schur complements" S_A, S_C
Table 5: vectors s_A, s_C
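Case (i) can be assembled numerically to check the block structure of V + CZC' reported in Table 1. The sign pattern of the rank-one matrix Z below is a reconstruction chosen so that the lower 3×3 block comes out as [2 −1 0; −1 2 0; 0 0 1], consistent with Table 1; it is an assumption, since the extraction of the original matrix lost its signs:

```python
import numpy as np

tau = 1.0
A = np.array([[1, 0], [0, 1], [-1, -1],
              [1, 0], [0, 1], [-1, -1]], dtype=float)
C = np.vstack([np.zeros((3, 2)),
               tau * np.array([[1.0, 0], [0, 1], [-1, -1]])])

V = np.eye(6)
Z = (1 / tau**2) * np.array([[1.0, -1], [-1, 1]])   # rank-one Z, case (i)

Sigma = V + C @ Z @ C.T      # cofactor matrix of y, D{y} = Sigma * sigma^2
print(np.linalg.matrix_rank(Z))   # 1
print(Sigma[3:, 3:])
# [[ 2. -1.  0.]
#  [-1.  2.  0.]
#  [ 0.  0.  1.]]
```

The epoch-1 block of Sigma stays the identity, only the second-epoch block is inflated by the random velocities.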
10-3 An example for collocation

1st case: ξ̂, D{ξ̂}, Ê{z}, D{Ê{z}}
ξ̂ = (1/3) [2 −1 −1 0 0 0; −1 2 −1 0 0 0] y = (1/3) [2y₁ − y₂ − y₃; −y₁ + 2y₂ − y₃],
D{ξ̂} = (σ²/3) [2 −1; −1 2],
Ê{z} = (1/3) [−2 1 1 2 −1 −1; 1 −2 1 −1 2 −1] y =
= (1/3) [−2y₁ + y₂ + y₃ + 2y₄ − y₅ − y₆; y₁ − 2y₂ + y₃ − y₄ + 2y₅ − y₆],
D{Ê{z}} = (σ²/3) [7 −5; −5 7],
2nd case: ξ̂, D{ξ̂}, Ê{z}, D{Ê{z}}
ξ̂ = (1/3) [2 −1 −1 0 0 0; −1 2 −1 0 0 0] y = (1/3) [2y₁ − y₂ − y₃; −y₁ + 2y₂ − y₃],
D{ξ̂} = (σ²/3) [2 −1; −1 2],
Ê{z} = (1/6) [−4 2 2 3 −3 −3; 2 −4 2 −3 3 −3] y,
D{Ê{z}} = (σ²/6) [13 −5; −5 13],
3rd case: ξ̂, D{ξ̂}, Ê{z}, D{Ê{z}}
ξ̂ = (1/3) [2 −1 −1 0 0 0; −1 2 −1 0 0 0] y = (1/3) [2y₁ − y₂ − y₃; −y₁ + 2y₂ − y₃],
D{ξ̂} = (σ²/3) [2 −1; −1 2],
Ê{z} = (1/3) [−2 1 1 3 0 0; 1 −2 1 0 3 0] y = (1/3) [−2y₁ + y₂ + y₃ + 3y₄; y₁ − 2y₂ + y₃ + 3y₅],
D{Ê{z}} = (σ²/3) [5 −1; −1 5].
Table 1:
Matrix inverse (V + CZC')⁻¹ for a mixed Gauss-Markov model with fixed and random effects

1st case:
V + CZC' = [1 0 0 0 0 0; 0 1 0 0 0 0; 0 0 1 0 0 0; 0 0 0 2 −1 0; 0 0 0 −1 2 0; 0 0 0 0 0 1],
(V + CZC')⁻¹ = (1/3) [3 0 0 0 0 0; 0 3 0 0 0 0; 0 0 3 0 0 0; 0 0 0 2 1 0; 0 0 0 1 2 0; 0 0 0 0 0 3]
2nd case:
V + CZC' = [1 0 0 0 0 0; 0 1 0 0 0 0; 0 0 1 0 0 0; 0 0 0 2 0 −1; 0 0 0 0 2 −1; 0 0 0 −1 −1 2],
(V + CZC')⁻¹ = (1/4) [4 0 0 0 0 0; 0 4 0 0 0 0; 0 0 4 0 0 0; 0 0 0 3 1 2; 0 0 0 1 3 2; 0 0 0 2 2 4]
3rd case:
V + CZC' = [1 0 0 0 0 0; 0 1 0 0 0 0; 0 0 1 0 0 0; 0 0 0 1 0 −1; 0 0 0 0 1 −1; 0 0 0 −1 −1 3],
(V + CZC')⁻¹ = [1 0 0 0 0 0; 0 1 0 0 0 0; 0 0 1 0 0 0; 0 0 0 2 1 1; 0 0 0 1 2 1; 0 0 0 1 1 1]
Table 2:
Matrices {I_n − A [A' (V + CZC')⁻¹ A]⁻¹ A' (V + CZC')⁻¹} for a mixed Gauss-Markov model with fixed and random effects

1st case:
(1/24) [13 7 4 −5 1 4; 7 13 4 1 −5 4; 4 4 16 4 4 −8; −11 7 4 19 1 4; 7 −11 4 1 19 4; 4 4 −8 4 4 16]
2nd case:
(1/24) [13 5 6 −4 4 3; 5 13 6 4 −4 3; 6 6 12 0 0 −6; −11 5 6 20 4 3; 5 −11 6 4 20 3; 6 6 −12 0 0 18]
3rd case:
(1/8) [5 1 2 −3 1 0; 1 5 2 1 −3 0; 2 2 4 2 2 0; −3 1 2 5 1 0; 1 −3 2 1 5 0; 2 2 −4 2 2 8]
Table 3:
Matrices {I_n − C [C' (V + CZC')⁻¹ C]⁻¹ C' (V + CZC')⁻¹} for a mixed Gauss-Markov model with fixed and random effects

2nd case:
(1/2) [2 0 0 0 0 0; 0 2 0 0 0 0; 0 0 2 0 0 0; 0 0 0 1 1 1; 0 0 0 1 1 1; 0 0 0 0 0 0]
3rd case:
[1 0 0 0 0 0; 0 1 0 0 0 0; 0 0 1 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 1 1 1]
Table 4:
"Schur complements" S_A, S_C and their inverses S_A⁻¹, S_C⁻¹ ((10.10), (10.12)), three cases, for a mixed Gauss-Markov model with fixed and random effects

Table 5:
Vectors s_A and s_C ((10.11), (10.13)), three cases, for a mixed Gauss-Markov model with fixed and random effects

1st case:
s_A = [1 0 −1 0 0 0; 0 1 −1 0 0 0] y, s_C = (1/24) [−9 −3 12 9 3 −12; −3 −9 12 3 9 −12] y
2nd case:
s_A = [1 0 −1 0 0 0; 0 1 −1 0 0 0] y, s_C = (1/24) [−7 1 6 4 −4 −9; 1 −7 6 −4 4 −9] y
3rd case:
s_A = [1 0 −1 0 0 0; 0 1 −1 0 0 0] y, s_C = (1/8) [−3 1 2 5 1 0; 1 −3 2 1 5 0] y
For each of the three cases we have thus computed ξ̂, D{ξ̂}, Ê{z} and D{Ê{z}}. Here are the results of computing Ê{y}, D{Ê{y}}, ẽ_y and D{ẽ_y}:

Table 6:
Numerical values, Case 1
1st case: Ê{y}, D{Ê{y}}, ẽ_y, D{ẽ_y}
Ê{y} = (1/3) [2 −1 −1 0 0 0; −1 2 −1 0 0 0; −1 −1 2 0 0 0; 0 0 0 2 −1 −1; 0 0 0 −1 2 −1; 0 0 0 −1 −1 2] y =
= (1/3) [2y₁ − y₂ − y₃; −y₁ + 2y₂ − y₃; −y₁ − y₂ + 2y₃; 2y₄ − y₅ − y₆; −y₄ + 2y₅ − y₆; −y₄ − y₅ + 2y₆]
D{Ê{y}} = (σ²/3) [2 −1 −1 0 0 0; −1 2 −1 0 0 0; −1 −1 2 0 0 0; 0 0 0 5 −4 −1; 0 0 0 −4 5 −1; 0 0 0 −1 −1 2]
ẽ_y = (1/12) [1 5 8 5 1 4; 5 1 8 1 5 4; 8 8 4 4 4 8; 11 7 4 7 11 8; 7 11 4 11 7 8; 4 4 8 8 8 4] y
D{ẽ_y} = (σ²/3) [1 1 1 0 0 0; 1 1 1 0 0 0; 1 1 1 0 0 0; 0 0 0 1 1 1; 0 0 0 1 1 1; 0 0 0 1 1 1].
Table 7:
Numerical values, Case 2
2nd case: Ê{y}, D{Ê{y}}, ẽ_y, D{ẽ_y}
Ê{y} = (1/6) [4 −2 −2 0 0 0; −2 4 −2 0 0 0; −2 −2 4 0 0 0; 0 0 0 3 −3 −3; 0 0 0 −3 3 −3; 0 0 0 0 0 6] y
D{Ê{y}} = (σ²/6) [4 −2 −2 0 0 0; −2 4 −2 0 0 0; −2 −2 4 0 0 0; 0 0 0 9 −3 −6; 0 0 0 −3 9 −6; 0 0 0 −6 −6 12]
ẽ_y = (1/6) [2 2 2 0 0 0; 2 2 2 0 0 0; 2 2 2 0 0 0; 0 0 0 3 3 3; 0 0 0 3 3 3; 0 0 0 0 0 0] y =
= (1/6) [2(y₁ + y₂ + y₃); 2(y₁ + y₂ + y₃); 2(y₁ + y₂ + y₃); 3(y₄ + y₅ + y₆); 3(y₄ + y₅ + y₆); 0]
D{ẽ_y} = (σ²/6) [2 2 2 0 0 0; 2 2 2 0 0 0; 2 2 2 0 0 0; 0 0 0 3 3 0; 0 0 0 3 3 0; 0 0 0 0 0 0]
Table 8:
Numerical values, Case 3
3rd case: Ê{y}, D{Ê{y}}, ẽ_y, D{ẽ_y}
Ê{y} = (1/3) [2 −1 −1 0 0 0; −1 2 −1 0 0 0; −1 −1 2 0 0 0; 0 0 0 3 0 0; 0 0 0 0 3 0; 0 0 0 −3 −3 0] y
D{Ê{y}} = (σ²/3) [2 −1 −1 0 0 0; −1 2 −1 0 0 0; −1 −1 2 0 0 0; 0 0 0 3 0 −3; 0 0 0 0 3 −3; 0 0 0 −3 −3 6]
ẽ_y = (1/3) [1 1 1 0 0 0; 1 1 1 0 0 0; 1 1 1 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0] y =
= (1/3) [y₁ + y₂ + y₃; y₁ + y₂ + y₃; y₁ + y₂ + y₃; 0; 0; 0]
D{ẽ_y} = (σ²/3) [1 1 1 0 0 0; 1 1 1 0 0 0; 1 1 1 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0].
Table 9:
Data of type z̃, D{z̃} for the three cases
1st case:
z̃ = (1/3) [−2 1 1 2 −1 −1; 1 −2 1 −1 2 −1] y,
D{z̃} = (σ²/3) [7 −5; −5 7],
2nd case:
z̃ = (1/6) [−4 2 2 3 −3 −3; 2 −4 2 −3 3 −3] y,
D{z̃} = (σ²/6) [13 −5; −5 13],
3rd case:
z̃ = (1/3) [−2 1 1 3 0 0; 1 −2 1 0 3 0] y,
D{z̃} = (σ²/3) [5 −1; −1 5].
Table 10:
Data of type σ̂², D{σ̂²}, D̂{σ̂²} for the three cases
1st case: n = 6, m = 2, l = 2, n − m − l = 2
σ̂² = (1/12) y' [7 1 2 5 1 4; 1 7 2 1 5 4; 2 2 10 4 4 8; 5 1 4 7 1 2; 1 5 4 1 7 2; 4 4 8 2 2 10] y, D{σ̂²} = σ⁴,
D̂{σ̂²} = (1/144) {y' [7 1 2 5 1 4; 1 7 2 1 5 4; 2 2 10 4 4 8; 5 1 4 7 1 2; 1 5 4 1 7 2; 4 4 8 2 2 10] y}²,
2nd case: n = 6, m = 2, l = 2, n − m − l = 2
σ̂² = (1/12) y' [2 2 2 0 0 0; 2 2 2 0 0 0; 2 2 2 0 0 0; 0 0 0 3 3 3; 0 0 0 3 3 3; 0 0 0 3 3 3] y, D{σ̂²} = σ⁴,
D̂{σ̂²} = (1/144) {y' [2 2 2 0 0 0; 2 2 2 0 0 0; 2 2 2 0 0 0; 0 0 0 3 3 3; 0 0 0 3 3 3; 0 0 0 3 3 3] y}²,
3rd case: n = 6, m = 2, l = 2, n − m − l = 2
σ̂² = (1/6) y' [1 1 1 0 0 0; 1 1 1 0 0 0; 1 1 1 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0] y, D{σ̂²} = σ⁴,
D̂{σ̂²} = (1/36) {y' [1 1 1 0 0 0; 1 1 1 0 0 0; 1 1 1 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0; 0 0 0 0 0 0] y}².
Here is my journey’s end.
10-4 Comments
(i)
In their original contributions A. N. Kolmogorov (1941 a, b, c) and N. Wiener ("yellow devil", 1939) did not depart from our general setup of fixed effects and random effects. Instead they departed from the model
"y_{Pα} = c_{Pα} + Σ_{β=1..q} c_{PαQβ} y_{Qβ} + Σ_{β,γ=1..q} c_{PαQβQγ} y_{Qβ} y_{Qγ} + O(y³)"
and, in the linear case, from the system
y_{P₂} = y_{Q₁} c_{P₂Q₁} + y_{Q₂} c_{P₂Q₂} + … + y_{Q_q} c_{P₂Q_q}
…
y_{P_{p−1}} = y_{Q₁} c_{P_{p−1}Q₁} + y_{Q₂} c_{P_{p−1}Q₂} + … + y_{Q_q} c_{P_{p−1}Q_q}
y_{P_p} = y_{Q₁} c_{P_pQ₁} + y_{Q₂} c_{P_pQ₂} + … + y_{Q_q} c_{P_pQ_q}.
“$P_1 - P_2 = Q_1 - Q_2$”
$$\widehat{E\{\mathbf y_P\}} = \mathbf C(\mathbf C'\boldsymbol\Sigma_y^{-1}\mathbf C)^{-1}\mathbf C'\boldsymbol\Sigma_y^{-1}\mathbf y_P$$
$$D\{\widehat{E\{\mathbf y_P\}}\} = \mathbf C(\mathbf C'\boldsymbol\Sigma_y^{-1}\mathbf C)^{-1}\mathbf C'$$
or
for all $y_P \in \{y_{P_1},\dots,y_{P_p}\}$, $Q = Q_q$:
“Kolmogorov-Wiener prediction”
$$c_{pj}(KW) = \sum_{k=1}^{q}[\operatorname{cov}(Q_j,Q_k)]^{-1}\operatorname{cov}(Q_k,P)$$
$$E\{(y_P-\hat y_P)^2\,|\,KW\} = \operatorname{cov}(P,P) - \sum_{j=1}^{q}\sum_{k=1}^{q}\operatorname{cov}(P,Q_j)\operatorname{cov}(P,Q_k)[\operatorname{cov}(Q_j,Q_k)]^{-1}$$
constrained to
$$\operatorname{cov}(Q_j,Q_k) = \operatorname{cov}(|Q_j - Q_k|),$$
the empirical covariance function being computed over all pairs at distance $\Delta = |Q_j - Q_k|$,
$$\operatorname{cov}(\Delta) = \frac{1}{N_\Delta}\sum_{|Q_j-Q_k|=\Delta}(y_{Q_j}-E\{y_{Q_j}\})(y_{Q_k}-E\{y_{Q_k}\}).$$
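What the text calls Kolmogorov-Wiener prediction coincides with what geostatistics calls simple kriging. A minimal numerical sketch follows; the exponential covariance model, the test positions and all names are illustrative assumptions, not taken from the text:

```python
import numpy as np

def kw_predict(cov_PP, cov_PQ, cov_QQ, y_Q, mean_Q=None, mean_P=0.0):
    """Kolmogorov-Wiener (simple kriging) prediction of y_P from y_Q.

    cov_PP : scalar cov(P, P)
    cov_PQ : (q,) vector of cov(P, Q_j)
    cov_QQ : (q, q) matrix of cov(Q_j, Q_k)
    Returns the prediction and its error variance E{(y_P - y_P_hat)^2}.
    """
    if mean_Q is None:
        mean_Q = np.zeros_like(y_Q)
    # weights c_pj = sum_k [cov(Q_j, Q_k)]^{-1} cov(Q_k, P)
    c = np.linalg.solve(cov_QQ, cov_PQ)
    y_hat = mean_P + c @ (y_Q - mean_Q)
    # error variance: cov(P,P) - cov(P,Q) [cov(Q,Q)]^{-1} cov(Q,P)
    var = cov_PP - cov_PQ @ c
    return y_hat, var

# stationarity: cov(Q_j, Q_k) = cov(|Q_j - Q_k|); here an exponential model
pos_Q = np.array([0.0, 1.0, 2.0])
pos_P = 1.5
cov = lambda d: np.exp(-np.abs(d))            # assumed covariance function
C_QQ = cov(pos_Q[:, None] - pos_Q[None, :])
c_PQ = cov(pos_P - pos_Q)
y_Q = np.array([0.3, -0.1, 0.2])
y_hat, var = kw_predict(cov(0.0), c_PQ, C_QQ, y_Q)
```

Note that the predictor interpolates the data exactly: predicting at an observation point reproduces the observation with zero error variance.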
(ii)
The first model applies if we want to use data of one type to predict data of the same type. Indeed, we have to generalize if we want to predict, for instance, vertical deflections, gravity gradients, or gravity values from gravity disturbances.
The second model has to start from relating one set of heterogeneous data to another set of heterogeneous data. In that case we have to relate the various data sets to each other. An obvious alternative setup is
(iii)
The level of collocation is reached if we include a trend model in addition to
Kolmogorov-Wiener prediction, namely
$$E\{\mathbf y\} = \mathbf A\boldsymbol\xi + \mathbf C E\{\mathbf z\},$$
the trend being represented by $\mathbf A\boldsymbol\xi$. The decomposition of "trend" and "signal"
is well represented in E. Grafarend (1976), E. Groten (1970), E. Groten and H.
Moritz (1964), S. Heitz (1967, 1968, 1969), S. Heitz and C.C. Tscherning (1972),
R.A. Hirvonen (1956, 1962), S. K. Jordan (1972 a, b, c, 1973), W.M. Kaula
(1959, 1963, 1966 a, b, c, 1971), K.R. Koch (1973 a, b), K.R. Koch and S. Lauer
(1971), L. Kubackova (1973, 1974, 1975), S. Lauer (1971 a, b), S.L. Lauritzen
(1972, 1973, 1975), D. Lelgemann (1972, 1974), P. Meissl (1970, 1971), in par-
ticular H. Moritz (1961, 1962, 1963 a, b, c, d, 1964, 1965, 1967, 1969, 1970 a, b,
c, d, e, 1971, 1972, 1973 a, b, c, d, e, f, 1974 a, b, 1975), H. Moritz and K.P.
Schwarz (1973), W. Mundt (1969), P. Naicho (1967, 1968), G. Obenson (1968,
1970), A.M. Obuchow (1947, 1954), I. Parzen (1960, 1963, 1972), L.P. Pellinen
(1966, 1970), V.S. Pugachev (1962), C.R. Rao (1971, 1972, 1973 a, b), R. Rupp
(1962, 1963, 1964 a, b, 1966 a, b, c, 1972, 1973 a, b, c, 1974, 1975), H. P.
Robertson (1940), M. Rosenblatt (1959, 1966), R. Rummel (1975 a, b), U. Schatz
(1970), I. Schoenberg (1942), W. Schwahn (1973, 1975), K.P. Schwarz (1972,
$$\hat{\mathbf y}_2 = \hat{\mathbf L}\mathbf y\quad\text{(best homogeneous linear unbiased prediction)}$$
dispersion identities
$$D_3 \le D_2 \le D_4.$$
(v)
In spite of the fact that "trend components" and "KW prediction" may serve well the needs of an analyst, generalizations are obvious. For instance, in Krige's prediction concept it is postulated that only
$$\|y_{p_1}-y_{p_2}\|^2 := E\{(y_{p_1}-y_{p_2}-(E\{y_{p_1}\}-E\{y_{p_2}\}))^2\}$$
$$\|(y_{Q_1}-E\{y_{Q_1}\})\ \cdots\ (y_{Q_n}-E\{y_{Q_n}\})\|$$
By means of Figure 11.1 we review the mixed model (fixed effects plus random
effects), total least squares (fixed effects plus “errors-in-variables”) and a spe-
cial type of the mixed model which superimposes random effects and “errors-in-
variables”. Here we will concentrate on the model
“errors-in-variables”.
In the context of the general probabilistic regression problem
$$E\{\mathbf y\} = \mathbf A\boldsymbol\xi + \mathbf C E\{\boldsymbol\zeta\} + E\{\mathbf X\}\boldsymbol\gamma,$$
we specialize here to the model "errors-in-variables", namely
$$E\{\mathbf y\} = E\{\mathbf X\}\boldsymbol\gamma$$
in which y as well as X (vector y and matrix X) are unknown. A simple example is the straight line fit, abbreviated by
$$“\,\mathbf y = a\mathbf x + b\mathbf 1\,”.$$
(x, y) is assumed to be measured, in detail
$$E\{\mathbf y\} = aE\{\mathbf x\} + b\mathbf 1 = [E\{\mathbf x\},\ \mathbf 1]\begin{bmatrix}a\\ b\end{bmatrix}$$
$$E\{(\mathbf x-E\{\mathbf x\})^2\} \neq 0,\quad E\{(\mathbf y-E\{\mathbf y\})^2\} \neq 0,$$
but $\operatorname{Cov}\{\mathbf x,\mathbf y\} = 0$.
Note
$$\gamma_1 := a,\quad \gamma_2 := b$$
$$E\{\mathbf y\} = \mathbf y - \mathbf e_y,\quad E\{\mathbf x\} = \mathbf x - \mathbf e_x$$
$$\operatorname{Cov}\{\mathbf e_x,\mathbf e_y\} = 0$$
and
$$\mathbf y - \mathbf e_y = [\mathbf x - \mathbf e_x,\ \mathbf 1]\begin{bmatrix}\gamma_1\\ \gamma_2\end{bmatrix}$$
$$\mathbf y = \mathbf x\gamma_1 + \mathbf 1\gamma_2 - \mathbf e_x\gamma_1 + \mathbf e_y.$$
11 The sixth problem of probabilistic regression 403
constrained by the Lagrangean
$$\mathcal L(\gamma_1,\gamma_2,\mathbf e_x,\mathbf e_y,\boldsymbol\lambda) := \mathbf e_x'\mathbf P_x\mathbf e_x + \mathbf e_y'\mathbf P_y\mathbf e_y + 2\boldsymbol\lambda'(\mathbf y - \mathbf x\gamma_1 - \mathbf 1\gamma_2 + \mathbf e_x\gamma_1 - \mathbf e_y) = \min_{\gamma_1,\gamma_2,\mathbf e_x,\mathbf e_y,\boldsymbol\lambda}.$$
[Figure 11.2: The straight line fit of total least squares ($E\{\mathbf y\} = aE\{\mathbf x\} + \mathbf 1 b$, $E\{\mathbf x\} = \mathbf x - \mathbf e_x$, $E\{\mathbf y\} = \mathbf y - \mathbf e_y$); the plot shows the measured point $P(x,y)$ with its error components $e_x$, $e_y$ relative to the fitted point $(E\{x\}, E\{y\})$ on the line.]
We can further solve the optimality problem using the Frobenius $\boldsymbol\Sigma$-seminorms:
$$\begin{bmatrix}\mathbf y-\mathbf 1\gamma_2-\gamma_1E_n\{\mathbf x\}\\ \mathbf x-E_n\{\mathbf x\}\end{bmatrix}'\boldsymbol\Sigma^{-1}\begin{bmatrix}\mathbf y-\mathbf 1\gamma_2-\gamma_1E_n\{\mathbf x\}\\ \mathbf x-E_n\{\mathbf x\}\end{bmatrix}$$
$$[\hat{\mathbf e}_y',\ \hat{\mathbf e}_x']\,\boldsymbol\Sigma^{-1}\begin{bmatrix}\hat{\mathbf e}_y\\ \hat{\mathbf e}_x\end{bmatrix} = \hat{\mathbf e}_x'\mathbf P_x\hat{\mathbf e}_x + \hat{\mathbf e}_y'\mathbf P_y\hat{\mathbf e}_y = \min_{\gamma_1,\gamma_2}$$
and $n \ge m$ (11.10)
$$\mathcal L =: \|\mathbf i_y\|^2_{\mathbf W_y} + \|\mathbf I_X\|^2_{\mathbf W_X} \quad(11.11)$$
subject to
$$\mathbf y - \mathbf X\boldsymbol\gamma + \mathbf I_X\boldsymbol\gamma - \mathbf i_y = \mathbf 0. \quad(11.12)$$
$$\frac12\frac{\partial\mathcal L}{\partial\mathbf I_X} = \mathbf W_X\mathbf I_X + \boldsymbol\lambda_\ell\boldsymbol\gamma' \quad(11.15)$$
$$\frac12\frac{\partial\mathcal L}{\partial\boldsymbol\lambda} = \mathbf y - \mathbf X\boldsymbol\gamma + \mathbf I_X\boldsymbol\gamma - \mathbf i_y = \mathbf 0 \quad(11.16)$$
and
$$\det\Bigl(\frac{\partial^2\mathcal L}{\partial\gamma_i\,\partial\gamma_j}\Bigr) \ge 0. \quad(11.17)$$
:Proof:
First, we begin with the modified risk function
if and only if
$$\frac12\frac{\partial\mathcal L}{\partial\boldsymbol\gamma} = -\mathbf X'\boldsymbol\lambda_\ell + \mathbf I_X'\boldsymbol\lambda_\ell = \mathbf 0 \quad(11.19)$$
$$\frac12\frac{\partial\mathcal L}{\partial\mathbf i_y} = \mathbf W_y\mathbf i_y - \boldsymbol\lambda_\ell = \mathbf 0 \quad(11.20)$$
$$\frac12\frac{\partial\mathcal L}{\partial\mathbf I_X} = \mathbf W_X\mathbf I_X + \boldsymbol\lambda_\ell\boldsymbol\gamma' = \mathbf 0 \quad(11.21)$$
$$\frac12\frac{\partial\mathcal L}{\partial\boldsymbol\lambda_\ell} = \mathbf y - \mathbf X\boldsymbol\gamma + \mathbf I_X\boldsymbol\gamma - \mathbf i_y = \mathbf 0 \quad(11.22)$$
and
$$\frac{\partial^2\mathcal L}{\partial\boldsymbol\gamma\,\partial\boldsymbol\gamma'} \quad(11.23)\ \text{positive semidefinite.}$$
The first derivatives guarantee the necessity of the solution, while the second derivatives being positive semidefinite assure the sufficient condition.
$$= [E\{\mathbf x\},\ \mathbf 1]\begin{bmatrix}a\\ b\end{bmatrix}$$
or
$$\gamma_1 := a,\ \gamma_2 := b,\qquad \mathbf x\gamma_1 + \mathbf 1\gamma_2 = [\mathbf x,\ \mathbf 1]\begin{bmatrix}\gamma_1\\ \gamma_2\end{bmatrix}$$
and
$$\mathbf y - \mathbf x\gamma_1 - \mathbf 1\gamma_2 + \mathbf e_x\gamma_1 - \mathbf e_y = \mathbf 0.$$
$(\gamma_1, \gamma_2)$ are the two unknowns in the parameter space. It has to be noted that the term $\mathbf e_x\gamma_1$ includes two coupled unknowns, namely $\mathbf e_x$ and $\gamma_1$.
Second, we formulate the modified method of least squares.
$$\mathcal L(\gamma_1,\gamma_2,\mathbf e_x,\mathbf e_y,\boldsymbol\lambda) = \mathbf i'\mathbf W\mathbf i + 2\boldsymbol\lambda'(\mathbf y - \mathbf x\gamma_1 - \mathbf 1\gamma_2 + \mathbf i_x\gamma_1 - \mathbf i_y)$$
or
$$= \mathbf i_y'\mathbf W_y\mathbf i_y + \mathbf i_x'\mathbf W_x\mathbf i_x + 2\boldsymbol\lambda'(\mathbf y - \mathbf x\gamma_1 - \mathbf 1\gamma_2 + \mathbf i_x\gamma_1 - \mathbf i_y).$$
Third, we present the necessary and sufficient conditions for obtaining the mini-
mum of the modified method of least squares.
$$\frac12\frac{\partial\mathcal L}{\partial\gamma_1} = -\mathbf x'\boldsymbol\lambda_\ell + \mathbf i_x'\boldsymbol\lambda_\ell = 0 \quad(11.28)$$
$$\frac12\frac{\partial\mathcal L}{\partial\gamma_2} = -\mathbf 1'\boldsymbol\lambda_\ell = 0 \quad(11.29)$$
$$\frac12\frac{\partial\mathcal L}{\partial\mathbf i_y} = \mathbf W_y\mathbf i_y - \boldsymbol\lambda_\ell = \mathbf 0 \quad(11.30)$$
$$\frac12\frac{\partial\mathcal L}{\partial\mathbf i_x} = \mathbf W_x\mathbf i_x + \boldsymbol\lambda_\ell\gamma_1 = \mathbf 0 \quad(11.31)$$
$$\frac12\frac{\partial\mathcal L}{\partial\boldsymbol\lambda} = \mathbf y - \mathbf x\gamma_1 - \mathbf 1\gamma_2 + \mathbf i_x\gamma_1 - \mathbf i_y = \mathbf 0 \quad(11.32)$$
and
$$\det\begin{bmatrix}\partial^2\mathcal L/\partial\gamma_1^2 & \partial^2\mathcal L/\partial\gamma_1\partial\gamma_2\\ \partial^2\mathcal L/\partial\gamma_1\partial\gamma_2 & \partial^2\mathcal L/\partial\gamma_2^2\end{bmatrix} \ge 0. \quad(11.33)$$
Indeed, these conditions are necessary and sufficient for obtaining the minimum
of the modified method of least squares.
By Gauss elimination we receive the results
$$(-\mathbf x' + \mathbf i_x')\boldsymbol\lambda_\ell = 0 \quad(11.34)$$
$$\lambda_1 + \dots + \lambda_n = 0 \quad(11.35)$$
$$\mathbf W_y\mathbf i_y = \boldsymbol\lambda_\ell \quad(11.36)$$
$$\mathbf W_x\mathbf i_x = -\boldsymbol\lambda_\ell\gamma_1 \quad(11.37)$$
$$\mathbf W_y\mathbf y = \mathbf W_y\mathbf x\gamma_1 + \mathbf W_y\mathbf 1\gamma_2 - \mathbf W_y\mathbf i_x\gamma_1 + \mathbf W_y\mathbf i_y \quad(11.38)$$
or
$$\mathbf W_y\mathbf y - \mathbf W_y\mathbf x\gamma_1 - \mathbf W_y\mathbf 1\gamma_2 - (1+\gamma_1^2)\mathbf I_n\boldsymbol\lambda_\ell = \mathbf 0 \quad(11.39)$$
if $\mathbf W_y = \mathbf W_x = \mathbf W$,
and
$$\mathbf x'\mathbf W\mathbf y - \mathbf x'\mathbf W\mathbf x\gamma_1 - \mathbf x'\mathbf W\mathbf 1\gamma_2 - \mathbf x'(1+\gamma_1^2)\boldsymbol\lambda_\ell = 0 \quad(11.40)$$
$$+\mathbf x'\boldsymbol\lambda_\ell = +\mathbf i_x'\boldsymbol\lambda_\ell \quad(11.42)$$
$$\lambda_1 + \dots + \lambda_n = 0 \quad(11.43)$$
$$\begin{bmatrix}\mathbf x'\\ \mathbf y'\end{bmatrix}\mathbf W\mathbf y - \begin{bmatrix}\mathbf x'\\ \mathbf y'\end{bmatrix}\mathbf W\mathbf x\gamma_1 - \begin{bmatrix}\mathbf x'\\ \mathbf y'\end{bmatrix}\mathbf W\mathbf 1\gamma_2 - \begin{bmatrix}\mathbf x'\\ \mathbf y'\end{bmatrix}(1+\gamma_1^2)\boldsymbol\lambda_\ell = \mathbf 0 \quad(11.44)$$
subject to
$$\lambda_1 + \dots + \lambda_n = 0, \quad(11.45)$$
$$\mathbf x'\boldsymbol\lambda_\ell = \mathbf i_x'\boldsymbol\lambda_\ell. \quad(11.46)$$
Let us iterate the solution.
$$\begin{bmatrix}\mathbf 0&\mathbf 0&\mathbf 0&\mathbf 0&(\mathbf x_n'-\mathbf i_x')\\ \mathbf 0&\mathbf 0&\mathbf 0&\mathbf 0&\mathbf 1_n'\\ \mathbf 0&\mathbf 0&\mathbf W_x&\mathbf 0&\gamma_1\mathbf I_n\\ \mathbf 0&\mathbf 0&\mathbf 0&\mathbf W_y&-\mathbf I_n\\ \mathbf x_n&\mathbf 1_n&-\gamma_1\mathbf I_n&\mathbf I_n&\mathbf 0\end{bmatrix}\begin{bmatrix}\gamma_1\\ \gamma_2\\ \mathbf i_x\\ \mathbf i_y\\ \boldsymbol\lambda_\ell\end{bmatrix} = \begin{bmatrix}0\\ 0\\ \mathbf 0\\ \mathbf 0\\ \mathbf y_n\end{bmatrix}.$$
We meet again the problem that the nonlinear terms $\gamma_1$ and $\mathbf i_x$ appear. Our iteration is based on the initial data
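The block system above can be iterated directly, re-inserting the previous values of the nonlinear quantities $\gamma_1$ and $\mathbf i_x$ into the coefficient matrix at each step. A minimal sketch, assuming unit weights $\mathbf W_x = \mathbf W_y = \mathbf I_n$ (the function name and test data are illustrative):

```python
import numpy as np

def tls_line_fit(x, y, n_iter=20):
    """Iterative straight-line fit y = x*g1 + 1*g2 with errors in both
    coordinates, solving the 5-block system above with W_x = W_y = I_n.
    Unknown vector: [g1, g2, i_x (n), i_y (n), lambda (n)]."""
    n = len(x)
    g1 = np.polyfit(x, y, 1)[0]   # initial slope from ordinary least squares
    i_x = np.zeros(n)
    g2 = 0.0
    for _ in range(n_iter):
        M = np.zeros((3 * n + 2, 3 * n + 2))
        b = np.zeros(3 * n + 2)
        I = np.eye(n)
        # row 0: (x - i_x)' lambda = 0 ; row 1: 1' lambda = 0
        M[0, 2 + 2 * n:] = x - i_x
        M[1, 2 + 2 * n:] = np.ones(n)
        # rows 2 .. n+1:  i_x + g1 * lambda = 0
        M[2:2 + n, 2:2 + n] = I
        M[2:2 + n, 2 + 2 * n:] = g1 * I
        # rows n+2 .. 2n+1:  i_y - lambda = 0
        M[2 + n:2 + 2 * n, 2 + n:2 + 2 * n] = I
        M[2 + n:2 + 2 * n, 2 + 2 * n:] = -I
        # rows 2n+2 .. 3n+1:  x*g1 + 1*g2 - g1*i_x + i_y = y
        M[2 + 2 * n:, 0] = x
        M[2 + 2 * n:, 1] = 1.0
        M[2 + 2 * n:, 2:2 + n] = -g1 * I
        M[2 + 2 * n:, 2 + n:2 + 2 * n] = I
        b[2 + 2 * n:] = y
        u = np.linalg.solve(M, b)
        g1, g2, i_x = u[0], u[1], u[2:2 + n]
    return g1, g2
```

For data lying exactly on a line the first iteration already reproduces the slope and intercept, since the system is then consistent with vanishing residuals and Lagrange multipliers.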
11-2 Example: The straight line fit 409
The five unknowns have led us to the example within Figure 11.3.
C.F. Gauss and F.R. Helmert introduced the generalized algebraic regression problem, which can be identified as a system of condition equations with unknowns.
:Fast track reading:
Read only Lemma 12.2, Lemma 12.5 and Lemma 12.8.

Definition 12.1 (W-LESS: Ax + Bi = By); Lemma 12.2 (normal equations: Ax + Bi = By); Lemma 12.3 (relation between A and B)
Definition 12.4 (R, W-MINOLESS: Ax + Bi = By); Lemma 12.5 (R, W-MINOLESS: Ax + Bi = By); Lemma 12.6 (relation between A and B)
Definition 12.7 (R, W-HAPS: Ax + Bi = By); Lemma 12.8 (R, W-HAPS: normal equations)
Definition 12.9 (W-LESS: Ax + Bi = By − c); Lemma 12.10 (W-LESS: Ax + Bi = By − c); Lemma 12.11 (relation between A and B)
Definition 12.12 (R, W-MINOLESS: Ax + Bi = By − c); Lemma 12.13 (R, W-MINOLESS: Ax + Bi = By − c); Lemma 12.14 (relation between A and B)
The necessary conditions for obtaining the minimum are given by the first derivatives
$$\frac12\frac{\partial\mathcal L}{\partial\mathbf i}(\mathbf i_\ell,\mathbf x_\ell,\boldsymbol\lambda_\ell) = \mathbf W\mathbf i_\ell + \mathbf B'\boldsymbol\lambda_\ell = \mathbf 0$$
$$\frac12\frac{\partial\mathcal L}{\partial\mathbf x}(\mathbf i_\ell,\mathbf x_\ell,\boldsymbol\lambda_\ell) = \mathbf A'\boldsymbol\lambda_\ell = \mathbf 0$$
$$\frac12\frac{\partial\mathcal L}{\partial\boldsymbol\lambda}(\mathbf i_\ell,\mathbf x_\ell,\boldsymbol\lambda_\ell) = \mathbf A\mathbf x_\ell + \mathbf B\mathbf i_\ell - \mathbf B\mathbf y = \mathbf 0.$$
Details for obtaining the derivatives of vectors are given in Appendix B. The second derivatives
$$\frac12\frac{\partial^2\mathcal L}{\partial\mathbf i\,\partial\mathbf i'}(\mathbf i_\ell,\mathbf x_\ell,\boldsymbol\lambda_\ell) = \mathbf W \ge 0$$
are the sufficient conditions for the minimum, due to the matrix W being positive semidefinite.
Due to the condition
416 12 The sixth problem of generalized algebraic regression
$$\mathcal R(\mathbf B') \subset \mathcal R(\mathbf W)$$
we have
$$\mathbf W\mathbf W^-\mathbf B' = \mathbf B'.$$
As shown in Appendix A, $\mathbf B\mathbf W^-\mathbf B'$ is invariant with respect to the choice of the generalized inverse $\mathbf W^-$. In fact, the matrix $\mathbf B\mathbf W^-\mathbf B'$ is uniquely invertible. Elimination of the vector $\mathbf i_\ell$ leads us to the system of reduced normal equations
$$\begin{bmatrix}\mathbf B\mathbf W^-\mathbf B' & \mathbf A\\ \mathbf A' & \mathbf 0\end{bmatrix}\begin{bmatrix}\boldsymbol\lambda_\ell\\ \mathbf x_\ell\end{bmatrix} = \begin{bmatrix}\mathbf B\mathbf y\\ \mathbf 0\end{bmatrix}$$
and, finally eliminating $\boldsymbol\lambda_\ell$, to
$$\mathbf A'(\mathbf B\mathbf W^-\mathbf B')^{-1}\mathbf A\,\mathbf x_\ell = \mathbf A'(\mathbf B\mathbf W^-\mathbf B')^{-1}\mathbf B\mathbf y; \quad(12.8)$$
because of $\mathbf B\mathbf W^-\mathbf W = \mathbf B$ there follows the existence of $\mathbf x_\ell$. Uniqueness is assured
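The reduced normal equations (12.8) translate into a small numerical sketch. It assumes W positive definite (so that $\mathbf W^- = \mathbf W^{-1}$); the function name and test data are illustrative:

```python
import numpy as np

def w_less(A, B, y, W=None):
    """W-LESS of the Gauss-Helmert model  A x + B i = B y  (a sketch,
    assuming W positive definite so that W^- = W^{-1})."""
    q, n = B.shape
    if W is None:
        W = np.eye(n)
    BWB = B @ np.linalg.solve(W, B.T)          # B W^{-1} B'
    N = A.T @ np.linalg.solve(BWB, A)          # A'(B W^- B')^{-1} A
    rhs = A.T @ np.linalg.solve(BWB, B @ y)    # A'(B W^- B')^{-1} B y
    x = np.linalg.lstsq(N, rhs, rcond=None)[0]  # pseudoinverse if N singular
    lam = np.linalg.solve(BWB, A @ x - B @ y)   # Lagrange multipliers
    i = -np.linalg.solve(W, B.T @ lam)          # from W i + B' lambda = 0
    return x, i

# sanity check: for B = I, W = I the model reduces to ordinary LESS
A = np.array([[1.0], [1.0], [1.0]])
B = np.eye(3)
y = np.array([1.0, 2.0, 6.0])
x, i = w_less(A, B, y)
```

With B = I and W = I the estimate of the single unknown is the arithmetic mean of the observations, and the constraint Ax + Bi = By is satisfied by construction.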
12-12 R, W – MINOLESS
R, W - MINOLESS is built on Definition 12.4, Lemma 12.5 and Lemma 12.6.
Definition 12.4 (R, W-MINOLESS, homogeneous condition equations with unknowns):
An m × 1 vector $\mathbf x_{\ell m}$ is called R, W-MINOLESS (Minimum NOrm with respect to the R-Seminorm, LEast Squares Solution with respect to the W-Seminorm) of the inconsistent system of linear equations (12.3) if $\mathbf x_{\ell m}$ is R-MINOS of (12.3).
12-1 Solving the system of homogeneous condition equations with unknowns 417
$$\begin{bmatrix}\mathbf R & \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A\\ \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A & \mathbf 0\end{bmatrix}\begin{bmatrix}\mathbf x_{\ell m}\\ \boldsymbol\lambda_{\ell m}\end{bmatrix} = \begin{bmatrix}\mathbf 0\\ \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf B\mathbf y\end{bmatrix} \quad(12.11)$$
with the m × 1 vector $\boldsymbol\lambda_{\ell m}$ of "Lagrange multipliers". $\mathbf x_{\ell m}$ always exists and is uniquely determined if
$$\operatorname{rk}[\mathbf R,\ \mathbf A'] = m \quad(12.12)$$
holds, or equivalently, if the matrix
$$\mathbf R + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A \quad(12.13)$$
is regular.
The proof of Lemma 12.5 is based on applying Lemma 1.2 to the normal equations (12.5). The rest is based on the identity
$$\mathbf R + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A = [\mathbf R,\ \mathbf A']\begin{bmatrix}\mathbf R^- & \mathbf 0\\ \mathbf 0 & (\mathbf{BW^-B'})^{-1}\end{bmatrix}\begin{bmatrix}\mathbf R\\ \mathbf A\end{bmatrix}.$$
$$\mathbf R\mathbf L\mathbf A = (\mathbf R\mathbf L\mathbf A)' \quad(12.17)$$
is fulfilled. In this case
$$\mathbf R\mathbf x_{\ell m} = \mathbf R\mathbf L\mathbf y \quad(12.18)$$
is always unique. In the special case that R is positive definite, the matrix L is unique, fulfilling (12.14)-(12.17).
:Proof:
Earlier we have shown that the representation
$$(\mathbf{BW^-B'})^{-1}\mathbf A\mathbf L = [(\mathbf{BW^-B'})^{-1}\mathbf A\mathbf L]' \quad(12.19)$$
:Proof:
With the "Lagrange function"
$$\mathcal L(\mathbf i,\mathbf x,\boldsymbol\lambda) := \mathbf i'\mathbf W\mathbf i + \mathbf x'\mathbf R\mathbf x + 2\boldsymbol\lambda'(\mathbf A\mathbf x + \mathbf B\mathbf i - \mathbf B\mathbf y) = \min_{\mathbf i,\mathbf x,\boldsymbol\lambda}$$
we can assure the existence of our solution $\mathbf x_h$ and, in addition, the equivalence of the regularity of the matrix $\mathbf R + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A$ with the condition $\operatorname{rk}[\mathbf R,\mathbf A'] = m$, the basis of the uniqueness of $\mathbf x_h$.
12-2 Examples 421
we treat a height network, consisting of four points whose relative and absolute heights are derived from height difference measurements according to the network graph in Chapter 9. We shall study various optimality criteria of type I-LESS; I, I-MINOLESS; I, I-HAPS; and R, W-MINOLESS with R positive semidefinite and W positive semidefinite. We use constructive details of the theory of generalized inverses according to Appendix A.
Throughout we take advantage of holonomic height difference measurements, also called "gravimetric leveling",
$$\{h_{\alpha\beta} := h_\beta - h_\alpha,\ h_{\gamma\alpha} := h_\alpha - h_\gamma,\ h_{\beta\delta} := h_\delta - h_\beta,\ h_{\delta\gamma} := h_\gamma - h_\delta\}$$
within the triangles $\{P_\alpha, P_\beta, P_\gamma\}$ and $\{P_\beta, P_\delta, P_\gamma\}$. In each triangle we have the holonomity condition, namely
$$\{h_{\alpha\beta} + h_{\beta\gamma} + h_{\gamma\alpha} = 0,\ (h_{\alpha\beta} + h_{\beta\gamma} = -h_{\gamma\alpha})\}$$
$$\mathbf A'(\mathbf B\mathbf B')^{-1}\mathbf A\,\mathbf x_\ell = \mathbf A'(\mathbf B\mathbf B')^{-1}\mathbf B\mathbf y$$
$$\mathbf A'(\mathbf B\mathbf B')^{-1}\mathbf A = \begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix} =: \mathbf D\mathbf E,\qquad \mathbf D = \begin{bmatrix}1\\ -1\end{bmatrix},\ \mathbf E = [1,\ -1].$$
For the matrix of the normal equations $\mathbf A'(\mathbf B\mathbf B')^{-1}\mathbf A = \mathbf D\mathbf E$ we did rank factorizing:
$$O(\mathbf D) = m\times r,\quad O(\mathbf E) = r\times m,\quad \operatorname{rk}\mathbf D = \operatorname{rk}\mathbf E = r = 1$$
$$[\mathbf A'(\mathbf B\mathbf B')^{-1}\mathbf A]^+ = \mathbf E'(\mathbf E\mathbf E')^{-1}(\mathbf D'\mathbf D)^{-1}\mathbf D' = \frac14\mathbf A'(\mathbf B\mathbf B')^{-1}\mathbf A = \frac14\begin{bmatrix}1&-1\\ -1&1\end{bmatrix},$$
$$\mathbf A'(\mathbf B\mathbf B')^{-1}\mathbf B = \frac12\begin{bmatrix}1&1&-1&-1\\ -1&-1&1&1\end{bmatrix}.$$
I, I-MINOLESS due to $\operatorname{rk}[\mathbf R, \mathbf A'] = 2$ leads to the unique solution. With
$$\mathbf W = \frac12\begin{bmatrix}1&1&0&0\\ 1&1&0&0\\ 0&0&1&1\\ 0&0&1&1\end{bmatrix}$$
and $\mathbf W^- = \mathbf W$, such that $\mathcal R(\mathbf B') \subset \mathcal R(\mathbf W)$ holds. The positive semidefinite matrix $\mathbf R = \operatorname{Diag}(0,1)$ has been chosen in such a way that the rank-partitioned unknown vector $\mathbf x = [\mathbf x_1', \mathbf x_2']'$, $O(\mathbf x_1) = r\times 1$, $O(\mathbf x_2) = (m-1)\times 1$, $\operatorname{rk}\mathbf A =: r = 1$, relates to the partial solution $\mathbf x_2 = x_\gamma = 0$, namely
$$\mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A = \begin{bmatrix}1&-1\\ -1&1\end{bmatrix},\qquad \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf B = \frac12\begin{bmatrix}1&1&-1&-1\\ -1&-1&1&1\end{bmatrix}$$
and
$$\begin{bmatrix}1&-1\\ -1&1\end{bmatrix}\begin{bmatrix}x_\beta\\ x_\gamma\end{bmatrix}_{\ell m} = \frac12\begin{bmatrix}1&1&-1&-1\\ -1&-1&1&1\end{bmatrix}\mathbf y,$$
$$(x_\beta)_{\ell m} = \frac12(h_{\alpha\beta} + h_{\gamma\alpha} - h_{\beta\delta} - h_{\delta\gamma}),\qquad (x_\gamma)_{\ell m} = 0.$$
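The closed-form result of the height-network example can be checked numerically with the normal-equation blocks stated above. The height-difference values below are made up for illustration, and the sign pattern of $\mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf B$ is the one assumed in this reconstruction:

```python
import numpy as np

# normal-equation blocks of the height-network example
N  = np.array([[1.0, -1.0], [-1.0, 1.0]])        # A'(B W^- B')^{-1} A
R  = np.array([[0.0, 0.0], [0.0, 1.0]])          # R = Diag(0, 1)
AB = 0.5 * np.array([[1.0, 1.0, -1.0, -1.0],
                     [-1.0, -1.0, 1.0, 1.0]])    # A'(B W^- B')^{-1} B (signs assumed)
y  = np.array([0.10, 0.04, 0.03, 0.02])          # h_ab, h_ga, h_bd, h_dg (made up)

# bordered MINOLESS system  [[R, N], [N, 0]] [x; lam] = [0; AB y]
M   = np.block([[R, N], [N, np.zeros((2, 2))]])
rhs = np.concatenate([np.zeros(2), AB @ y])
sol = np.linalg.lstsq(M, rhs, rcond=None)[0]     # min-norm solution; system is singular
x_beta, x_gamma = sol[0], sol[1]
# closed form: x_beta = (h_ab + h_ga - h_bd - h_dg)/2, x_gamma = 0
```

The least-squares solver returns the unique x-part of the singular bordered system, reproducing $(x_\beta)_{\ell m} = \tfrac12(h_{\alpha\beta}+h_{\gamma\alpha}-h_{\beta\delta}-h_{\delta\gamma})$ and $(x_\gamma)_{\ell m} = 0$.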
12-3 Solving the system of inhomogeneous condition equations with unknowns
First, we solve the problem of inhomogeneous condition equations by the method of minimizing the W-seminorm of least squares. We review by Definition 12.9, Lemma 12.10 and Lemma 12.11 the characteristic normal equations and the linear form which build up the solution of type W-LESS. Second, we extend the method of W-LESS to R, W-MINOLESS by means of Definition 12.12, Lemma 12.13 and Lemma 12.14. R, W-MINOLESS stands for Minimum Norm LEast Squares Solution (R-Seminorm, W-Seminorm of type LEast Squares). Third, we alternatively present by Definition 12.15 and Lemma 12.16 R, W-HAPS (Hybrid APproximate Solution with respect to the combined R- and W-Seminorm). Fourth, we again compare R, W-MINOLESS and R, W-HAPS by means of computing the difference vector $\mathbf x_h - \mathbf x_{\ell m}$.
12-31 W – LESS
W-LESS of our system of inconsistent inhomogeneous condition equations with unknowns $\mathbf{Ax} + \mathbf{Bi} = \mathbf{By} - \mathbf c$, $\mathbf{By} - \mathbf c \in \mathcal R(\mathbf A)$, is built on Definition 12.9, Lemma 12.10 and Lemma 12.11.
Definition 12.9 (W-LESS, inhomogeneous condition equations with unknowns):
An m × 1 vector $\mathbf x_\ell$ is called W-LESS (LEast Squares Solution with respect to the W-seminorm) of the inconsistent system of inhomogeneous linear equations
$$\mathbf A\mathbf x + \mathbf B\mathbf i = \mathbf B\mathbf y - \mathbf c \quad(12.46)$$
$$\begin{bmatrix}\mathbf R & \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A\\ \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A & \mathbf 0\end{bmatrix}\begin{bmatrix}\mathbf x_{\ell m}\\ \boldsymbol\lambda_{\ell m}\end{bmatrix} = \begin{bmatrix}\mathbf 0\\ \mathbf A'(\mathbf{BW^-B'})^{-1}(\mathbf{By}-\mathbf c)\end{bmatrix} \quad(12.59)$$
with the m × 1 vector $\boldsymbol\lambda_{\ell m}$ of "Lagrange multipliers". $\mathbf x_{\ell m}$ always exists and is uniquely determined if
$$\operatorname{rk}[\mathbf R,\ \mathbf A'] = m \quad(12.60)$$
holds, or equivalently, if the matrix
$$\mathbf R + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A \quad(12.61)$$
is regular. In this case the solution can be represented by
$$\mathbf x_{\ell m} = [\mathbf R + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A]^{-1}\mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A\,\{\mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A\,[\mathbf R + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A]^{-1}\mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A\}^-\,\mathbf A'(\mathbf{BW^-B'})^{-1}(\mathbf{By}-\mathbf c), \quad(12.62)$$
12-33 R, W – HAPS
R, W-HAPS is alternatively built on Definition 12.15 and Lemma 12.16 for the special case of inconsistent, inhomogeneous condition equations with unknowns $\mathbf{Ax} + \mathbf{Bi} = \mathbf{By} - \mathbf c$, $\mathbf{By} - \mathbf c \in \mathcal R(\mathbf A)$.
Definition 12.15 (R, W-HAPS, inhomogeneous condition equations with unknowns):
An m × 1 vector $\mathbf x_h$ with $\mathbf B\mathbf i_h = \mathbf{By} - \mathbf c - \mathbf A\mathbf x_h$ is called R, W-HAPS (Hybrid APproximate Solution with respect to the combined R- and W-Seminorm) if, compared to all other vectors $\mathbf x \in \mathbb R^m$ of type $\mathbf{Bi} = \mathbf{By} - \mathbf c - \mathbf{Ax}$, the combined seminorm $\mathbf i'\mathbf W\mathbf i + \mathbf x'\mathbf R\mathbf x$ is minimal.
The solution of type R, W-HAPS can be computed by
Lemma 12.16 (R, W-HAPS, inhomogeneous condition equations with unknowns: normal equations):
An m × 1 vector $\mathbf x_h$ is R, W-HAPS of the Gauß-Helmert model of inconsistent, inhomogeneous condition equations with unknowns if and only if the normal equations
$$\begin{bmatrix}\mathbf W & \mathbf B' & \mathbf 0\\ \mathbf B & \mathbf 0 & \mathbf A\\ \mathbf 0 & \mathbf A' & \mathbf R\end{bmatrix}\begin{bmatrix}\mathbf i_h\\ \boldsymbol\lambda_h\\ \mathbf x_h\end{bmatrix} = \begin{bmatrix}\mathbf 0\\ \mathbf{By}-\mathbf c\\ \mathbf 0\end{bmatrix} \quad(12.70)$$
with the q × 1 vector $\boldsymbol\lambda_h$ of "Lagrange multipliers" are fulfilled. $\mathbf x_h$ exists certainly in case of
$$\mathcal R(\mathbf B') \subset \mathcal R(\mathbf W) \quad(12.71)$$
and is a solution of the system of normal equations
$$[\mathbf R + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A]\mathbf x_h = \mathbf A'(\mathbf{BW^-B'})^{-1}(\mathbf{By}-\mathbf c), \quad(12.72)$$
which is uniquely defined independently of the choice of the generalized inverse $\mathbf W^-$. $\mathbf x_h$ is uniquely defined if and only if the matrix $[\mathbf R + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A]$ is regular, equivalently if
$$\operatorname{rk}[\mathbf R,\ \mathbf A'] = m \quad(12.73)$$
holds. In this case the solution can be represented by
$$\mathbf x_h = [\mathbf R + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A]^{-1}\mathbf A'(\mathbf{BW^-B'})^{-1}(\mathbf{By}-\mathbf c). \quad(12.74)$$
The proof of Lemma 12.16 follows the lines of Lemma 12.8.
12-34 R, W - MINOLESS against R, W - HAPS
Again we note the relations between R, W-MINOLESS and R, W-HAPS: R, W-HAPS is unique because the representations (12.59) and (12.12) are identical. Let us replace (12.59) by the equivalent system
$$(\mathbf R + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A)\mathbf x_{\ell m} + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A\boldsymbol\lambda_{\ell m} = \mathbf A'(\mathbf{BW^-B'})^{-1}(\mathbf{By}-\mathbf c) \quad(12.75)$$
and
$$\mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A\,\mathbf x_{\ell m} = \mathbf A'(\mathbf{BW^-B'})^{-1}(\mathbf{By}-\mathbf c), \quad(12.76)$$
such that the difference
$$\mathbf x_h - \mathbf x_{\ell m} = [\mathbf R + \mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A]^{-1}\mathbf A'(\mathbf{BW^-B'})^{-1}\mathbf A\boldsymbol\lambda_{\ell m} \quad(12.77)$$
follows automatically.
12-4 Conditional equations with unknowns: from the algebraic approach to the stochastic one
Let us consider the stochastic portrait of the model "condition equations with unknowns", namely the stochastic Gauß-Helmert model. Consider the model equations
$$\mathbf A E\{\mathbf x\} = \mathbf B E\{\mathbf y\} - \boldsymbol\gamma \quad\text{or}\quad \mathbf A\boldsymbol\xi = \mathbf B\boldsymbol\eta - \boldsymbol\gamma$$
subject to
$$O(\mathbf A) = q\times m,\quad O(\mathbf B) = q\times n$$
$$\boldsymbol\gamma = \mathbf B\boldsymbol\delta\ \text{for some}\ \boldsymbol\delta \in \mathbb R^n$$
$$\operatorname{rk}\mathbf A = m < \operatorname{rk}\mathbf B = q < n$$
$$E\{\mathbf x\} = \boldsymbol\xi,\quad E\{\mathbf y\} = \boldsymbol\eta$$
$$E\{\mathbf x\} = \mathbf x - \mathbf e_x,\quad E\{\mathbf y\} = \mathbf y - \mathbf e_y$$
$$\begin{cases}E\{\mathbf x - E\{\mathbf x\}\} = \mathbf 0\\ E\{(\mathbf x - E\{\mathbf x\})(\mathbf x - E\{\mathbf x\})'\} = \sigma_x^2\boldsymbol\Theta_x\end{cases}\quad\text{versus}\quad\begin{cases}E\{\mathbf y - E\{\mathbf y\}\} = \mathbf 0\\ E\{(\mathbf y - E\{\mathbf y\})(\mathbf y - E\{\mathbf y\})'\} = \sigma_y^2\boldsymbol\Theta_y.\end{cases}$$
$$\mathbf w = \mathbf A\boldsymbol\xi + \mathbf B\mathbf e_y.$$
$$\begin{bmatrix}\boldsymbol\xi = E\{\hat{\boldsymbol\xi}\}\\ \widehat{E\{\mathbf x\}} = E\{\widehat{E\{\mathbf x\}}\}\end{bmatrix}\quad\text{or}\quad \boldsymbol\xi = \mathbf K_1\mathbf B E\{\mathbf y\} + \mathbf l_1 = \mathbf K_1(\mathbf A\boldsymbol\xi + \boldsymbol\gamma) + \mathbf l_1\ \text{for all}\ \boldsymbol\xi \in \mathbb R^m.$$
The unknown $\widehat{E\{\mathbf y\}} = \mathbf y - \boldsymbol\tau = \mathbf y - (\mathbf K_2\mathbf B\mathbf y - \mathbf l_2)$ is uniformly unbiasedly estimable if and only if
$$E\{\widehat{E\{\mathbf y\}}\} = (\mathbf I_n - \mathbf K_2\mathbf B)E\{\mathbf y\} + \mathbf l_2 = E\{\mathbf y\} - \mathbf K_2(\mathbf A\boldsymbol\xi + \boldsymbol\gamma) + \mathbf l_2$$
12-43 The first step: unbiased estimation of $\hat{\boldsymbol\xi}$ and $\widehat{E\{\mathbf y\}}$
The key lemma of unbiased estimation of $\hat{\boldsymbol\xi}$ and $\widehat{E\{\mathbf y\}}$ will be presented first.
versus
which is built on the scalar $x_1$, the vector $\mathbf x_2$ and the matrix $\mathbf X_3$. In addition, by the matrices
$$\mathbf Y_1 := \begin{bmatrix}x_1 & x_2 & \dots & x_{n-1} & x_n\\ y_1 & y_2 & \dots & y_{n-1} & y_n\\ z_1 & z_2 & \dots & z_{n-1} & z_n\end{bmatrix} \in \mathbb R^{3\times n}$$
and
$$\mathbf Y_2 := \begin{bmatrix}X_1 & X_2 & \dots & X_{n-1} & X_n\\ Y_1 & Y_2 & \dots & Y_{n-1} & Y_n\\ Z_1 & Z_2 & \dots & Z_{n-1} & Z_n\end{bmatrix} \in \mathbb R^{3\times n}$$
$$\mathbf Y_1 := \begin{bmatrix}x_1 & x_2 & x_3 & x_4\\ y_1 & y_2 & y_3 & y_4\\ z_1 & z_2 & z_3 & z_4\end{bmatrix}',\qquad \begin{bmatrix}X_1 & X_2 & X_3 & X_4\\ Y_1 & Y_2 & Y_3 & Y_4\\ Z_1 & Z_2 & Z_3 & Z_4\end{bmatrix}' =: \mathbf Y_2$$
Obviously, the coordinate errors $(e_{11}, e_{12}, e_{13})$ have the same weight $w_1$, $(e_{21}, e_{22}, e_{23})$ have the same weight $w_2$, $(e_{31}, e_{32}, e_{33})$ have the same weight $w_3$, and finally $(e_{41}, e_{42}, e_{43})$ have the same weight $w_4$. We may also say that the error weight is pointwise isotropic,
$$\text{weight } e_{11} = \text{weight } e_{12} = \text{weight } e_{13} = w_1$$
etc. However, the error weight is not homogeneous, since
$$w_1 = \text{weight } e_{11} \neq \text{weight } e_{21} = w_2.$$
Of course, an ideal homogeneous and isotropic weight distribution is guaranteed by the criterion $w_1 = w_2 = w_3 = w_4$.
13-1 The 3d datum transformation and the Procrustes Algorithm
First, we present W-LESS for our nonlinear adjustment problem for the unknowns of type scalar, vector and special orthonormal matrix. Second, we review the Procrustes Algorithm for the parameters $\{x_1, \mathbf x_2, \mathbf X_3\}$.
Definition 13.1 (nonlinear analysis for the three-dimensional datum transformation: the conformal group $\mathbb C_7(3)$):
The parameter array $\{x_{1\ell}, \mathbf x_{2\ell}, \mathbf X_{3\ell}\}$ is called W-LESS (LEast Squares Solution with respect to the W-Seminorm) of the inconsistent linear system of equations
$$\mathbf Y_2\mathbf X_3'x_1 + \mathbf 1\mathbf x_2' + \mathbf E = \mathbf Y_1 \quad(13.1)$$
subject to
$$\mathbf X_3'\mathbf X_3 = \mathbf I_3,\quad |\mathbf X_3| = +1 \quad(13.2)$$
434 13 The nonlinear problem
with respect to $\mathbf x_2$!
$$\mathcal L(x_1,\mathbf x_2,\mathbf X_3) := \frac12\|\mathbf E\|^2_{\mathbf W} = \frac12\|\mathbf Y_1 - \mathbf Y_2\mathbf X_3'x_1 - \mathbf 1\mathbf x_2'\|^2_{\mathbf W} = \frac12\operatorname{tr}[(\mathbf Y_1 - \mathbf Y_2\mathbf X_3'x_1 - \mathbf 1\mathbf x_2')'\mathbf W(\mathbf Y_1 - \mathbf Y_2\mathbf X_3'x_1 - \mathbf 1\mathbf x_2')] = \min_{x_1\ge 0,\ \mathbf x_2\in\mathbb R^{3\times 1},\ \mathbf X_3'\mathbf X_3 = \mathbf I_3}$$
$$\frac{\partial\mathcal L}{\partial\mathbf x_2}(\mathbf x_{2\ell}) = (\mathbf 1'\mathbf W\mathbf 1)\mathbf x_2 - (\mathbf Y_1 - \mathbf Y_2\mathbf X_3'x_1)'\mathbf W\mathbf 1 = \mathbf 0$$
$$\mathcal L(x_1,\mathbf X_3) = \frac12\operatorname{tr}\{[(\mathbf I - (\mathbf 1'\mathbf W\mathbf 1)^{-1}\mathbf 1\mathbf 1'\mathbf W)(\mathbf Y_1 - \mathbf Y_2\mathbf X_3'x_1)]'\mathbf W[(\mathbf I - (\mathbf 1'\mathbf W\mathbf 1)^{-1}\mathbf 1\mathbf 1'\mathbf W)(\mathbf Y_1 - \mathbf Y_2\mathbf X_3'x_1)]\}$$
with
$$\mathbf C := \mathbf I_n - (\mathbf 1'\mathbf W\mathbf 1)^{-1}\mathbf 1\mathbf 1'\mathbf W$$
being a definition of the centering matrix, namely $\mathbf C = \mathbf I_n - \frac1n\mathbf 1\mathbf 1'$ for $\mathbf W = \mathbf I_n$, which is symmetric. Substituting the centering matrix into the reduced Lagrangean $\mathcal L(x_1,\mathbf X_3)$, we gain the centralized Lagrangean
$$\mathcal L(x_1,\mathbf X_3) = \frac12\operatorname{tr}\{[\mathbf Y_1 - \mathbf Y_2\mathbf X_3'x_1]'\mathbf C'\mathbf W\mathbf C[\mathbf Y_1 - \mathbf Y_2\mathbf X_3'x_1]\}. \quad(13.9)$$
Step two: $x_1$
subject to $\mathbf X_3'\mathbf X_3 = \mathbf I_3$.
$$\frac{\partial\mathcal L}{\partial x_1}(x_{1\ell}) = x_{1\ell}\operatorname{tr}(\mathbf X_3\mathbf Y_2'\mathbf C'\mathbf W\mathbf C\mathbf Y_2\mathbf X_3') - \operatorname{tr}(\mathbf Y_1'\mathbf C'\mathbf W\mathbf C\mathbf Y_2\mathbf X_3') = 0$$
constitutes the second necessary condition. Due to
$$\operatorname{tr}(\mathbf X_3\mathbf Y_2'\mathbf C'\mathbf W\mathbf C\mathbf Y_2\mathbf X_3') = \operatorname{tr}(\mathbf Y_2'\mathbf C'\mathbf W\mathbf C\mathbf Y_2\mathbf X_3'\mathbf X_3) = \operatorname{tr}(\mathbf Y_2'\mathbf C'\mathbf W\mathbf C\mathbf Y_2)$$
this leads us to
$$x_{1\ell} = \frac{\operatorname{tr}(\mathbf Y_1'\mathbf C'\mathbf W\mathbf C\mathbf Y_2\mathbf X_3')}{\operatorname{tr}(\mathbf Y_2'\mathbf C'\mathbf W\mathbf C\mathbf Y_2)}.$$
While the forward computation of $(\partial\mathcal L/\partial x_1)(x_{1\ell}) = 0$ enjoyed a representation of the optimal scale parameter $x_{1\ell}$, its backward substitution into the Lagrangean $\mathcal L(x_1,\mathbf X_3)$ amounts to
$$\mathcal L(\mathbf X_3) = \frac12\operatorname{tr}(\mathbf Y_1'\mathbf C'\mathbf W\mathbf C\mathbf Y_1) - \frac{[\operatorname{tr}(\mathbf Y_1'\mathbf C'\mathbf W\mathbf C\mathbf Y_2\mathbf X_3')]^2}{\operatorname{tr}(\mathbf Y_2'\mathbf C'\mathbf W\mathbf C\mathbf Y_2)} + \frac12\frac{[\operatorname{tr}(\mathbf Y_1'\mathbf C'\mathbf W\mathbf C\mathbf Y_2\mathbf X_3')]^2}{\operatorname{tr}(\mathbf Y_2'\mathbf C'\mathbf W\mathbf C\mathbf Y_2)}.$$
Third, we are left with the proof of Corollary 13.4, namely $\mathbf X_3$.
Step three: $\mathbf X_3$
Let $\mathbf A := \mathbf Y_1'\mathbf C'\mathbf W\mathbf C\mathbf Y_2 = \mathbf U\boldsymbol\Sigma_s\mathbf V'$ be a singular value decomposition with respect to a left orthonormal matrix $\mathbf U$, $\mathbf U'\mathbf U = \mathbf I_3$, a right orthonormal matrix $\mathbf V$, $\mathbf V\mathbf V' = \mathbf I_3$, and $\boldsymbol\Sigma_s = \operatorname{Diag}(\sigma_1,\sigma_2,\sigma_3)$ a diagonal matrix of singular values $(\sigma_1,\sigma_2,\sigma_3)$. Then
$$\operatorname{tr}(\mathbf A\mathbf X_3') = \operatorname{tr}(\mathbf U\boldsymbol\Sigma_s\mathbf V'\mathbf X_3') = \operatorname{tr}(\boldsymbol\Sigma_s\mathbf V'\mathbf X_3'\mathbf U) = \sum_{i=1}^3\sigma_i r_{ii} \le \sum_{i=1}^3\sigma_i$$
holds, since
$$\mathbf R = \mathbf V'\mathbf X_3'\mathbf U = [r_{ij}] \in \mathbb R^{3\times 3} \quad(13.15)$$
is orthonormal with $|r_{ii}| \le 1$. The identity $\operatorname{tr}(\mathbf A\mathbf X_3') = \sum_{i=1}^3\sigma_i$ applies if
$$\mathbf X_3 = \mathbf U\mathbf V', \quad(13.19)$$
namely
$$[(\mathbf Y_1'\mathbf C'\mathbf W\mathbf C\mathbf Y_2)'(\mathbf Y_1'\mathbf C'\mathbf W\mathbf C\mathbf Y_2) - \sigma_i^2\mathbf I]\mathbf v_i = \mathbf 0. \quad(13.21)$$
In the proof of Corollary 13.6 we only sketch the result that the matrix $\mathbf I_n - \frac1n\mathbf 1\mathbf 1'$ is idempotent:
$$(\mathbf I_n - \tfrac1n\mathbf 1\mathbf 1')(\mathbf I_n - \tfrac1n\mathbf 1\mathbf 1') = \mathbf I_n - \tfrac2n\mathbf 1\mathbf 1' + \tfrac1{n^2}(\mathbf 1\mathbf 1')^2 = \mathbf I_n - \tfrac2n\mathbf 1\mathbf 1' + \tfrac1{n^2}\,n\,\mathbf 1\mathbf 1' = \mathbf I_n - \tfrac1n\mathbf 1\mathbf 1'.$$
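The idempotency of the centering matrix, together with its symmetry and its action on the constant vector, can be confirmed numerically (a trivial check, with n = 6 chosen arbitrarily):

```python
import numpy as np

n = 6
C = np.eye(n) - np.ones((n, n)) / n     # centering matrix I_n - (1/n) 1 1'
assert np.allclose(C @ C, C)            # idempotent, as derived above
assert np.allclose(C, C.T)              # symmetric
assert np.allclose(C @ np.ones(n), 0)   # annihilates the constant vector
```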
As a summary of the various steps of Corollary 2-4, 5 and Theorem 5, Table 13.1 presents the celebrated Procrustes Algorithm, followed by one short and interesting citation about "Procrustes".
Table 13.1: Procrustes Algorithm

Step 1: Read
$$\mathbf Y_1 = \begin{bmatrix}x_1 & y_1 & z_1\\ \vdots & \vdots & \vdots\\ x_n & y_n & z_n\end{bmatrix}\quad\text{and}\quad \begin{bmatrix}X_1 & Y_1 & Z_1\\ \vdots & \vdots & \vdots\\ X_n & Y_n & Z_n\end{bmatrix} = \mathbf Y_2.$$
Step 2: Compute $\mathbf Y_1'\mathbf C\mathbf Y_2$ subject to $\mathbf C := \mathbf I_n - \frac1n\mathbf 1\mathbf 1'$.
Step 3: Compute the SVD $\mathbf Y_1'\mathbf C\mathbf Y_2 = \mathbf U\operatorname{Diag}(\sigma_1,\sigma_2,\sigma_3)\mathbf V'$:
3-1: $|(\mathbf Y_1'\mathbf C\mathbf Y_2)'(\mathbf Y_1'\mathbf C\mathbf Y_2) - \sigma_i^2\mathbf I| = 0 \Rightarrow (\sigma_1,\sigma_2,\sigma_3)$;
3-2: $((\mathbf Y_1'\mathbf C\mathbf Y_2)'(\mathbf Y_1'\mathbf C\mathbf Y_2) - \sigma_i^2\mathbf I)\mathbf v_i = \mathbf 0,\ i\in\{1,2,3\}$; $\mathbf V = [\mathbf v_1,\mathbf v_2,\mathbf v_3]$ right eigenvectors (right eigencolumns);
3-3: $\mathbf U = \mathbf Y_1'\mathbf C\mathbf Y_2\,\mathbf V\operatorname{Diag}(\sigma_1^{-1},\sigma_2^{-1},\sigma_3^{-1})$ left eigenvectors (left eigencolumns).
Step 4: Compute $\mathbf X_{3\ell} = \mathbf U\mathbf V'$ (rotation).
Step 5: Compute $x_{1\ell} = \operatorname{tr}(\mathbf Y_1'\mathbf C\mathbf Y_2\mathbf X_3')/\operatorname{tr}(\mathbf Y_2'\mathbf C\mathbf Y_2)$ (dilatation).
Step 6: Compute $\mathbf x_{2\ell} = \frac1n(\mathbf Y_1 - \mathbf Y_2\mathbf X_3'x_1)'\mathbf 1$ (translation).
Step 7: Compute $\mathbf E_\ell = \mathbf C\bigl(\mathbf Y_1 - \mathbf Y_2\mathbf V\mathbf U'\,\operatorname{tr}(\mathbf Y_1'\mathbf C\mathbf Y_2\mathbf V\mathbf U')/\operatorname{tr}(\mathbf Y_2'\mathbf C\mathbf Y_2)\bigr)$ (error matrix); 'optional control': $\mathbf E_\ell := \mathbf Y_1 - (\mathbf Y_2\mathbf X_3'x_{1\ell} + \mathbf 1\mathbf x_{2\ell}')$.
Step 8: Compute $\|\mathbf E_\ell\|_{\mathbf I} := \sqrt{\operatorname{tr}(\mathbf E_\ell'\mathbf E_\ell)}$ (error matrix norm).
Step 9: Compute $\sqrt{\operatorname{tr}(\mathbf E_\ell'\mathbf E_\ell)/3n}$ (mean error matrix norm).
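Table 13.1 translates almost line by line into code. A minimal sketch for W = I_n, using the SVD routine directly in place of steps 3-1 to 3-3; the determinant-sign correction enforcing |X3| = +1 is an addition not spelled out in the table:

```python
import numpy as np

def procrustes(Y1, Y2):
    """Procrustes Algorithm of Table 13.1 for W = I_n.
    Rows of Y1, Y2 are corresponding 3d points; the model is
    Y1 = Y2 X3' x1 + 1 x2' + E with X3'X3 = I_3, |X3| = +1."""
    n = Y1.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n                 # centering matrix (step 2)
    A = Y1.T @ C @ Y2                                   # Y1' C Y2
    U, s, Vt = np.linalg.svd(A)                         # step 3
    if np.linalg.det(U @ Vt) < 0:                       # enforce |X3| = +1
        U[:, -1] = -U[:, -1]
    X3 = U @ Vt                                         # rotation (step 4)
    x1 = np.trace(A @ X3.T) / np.trace(Y2.T @ C @ Y2)   # dilatation (step 5)
    x2 = (Y1 - x1 * Y2 @ X3.T).T @ np.ones(n) / n       # translation (step 6)
    E = Y1 - (x1 * Y2 @ X3.T + np.outer(np.ones(n), x2))  # error matrix (step 7)
    return X3, x1, x2, E
```

Applying the routine to points generated by a known rotation, scale and translation recovers those parameters with a vanishing error matrix.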
Procrustes (the subduer), son of Poseidon, kept an inn benefiting from what he
claimed to be a wonderful all-fitting bed. He lopped off excessive limbage from
tall guests and either flattened short guests by hammering or stretched them by
racking. The victim fitted the bed perfectly but, regrettably, died. To exclude the
embarrassment of an initially exact-fitting guest, variants of the legend allow
Procrustes two, different-sized beds. Ultimately, in a crackdown on robbers and
monsters, the young Theseus fitted Procrustes to his own bed.
13-2 The variance - covariance matrix of the error matrix E 441
The proof follows directly from "error propagation". Obviously the variance-covariance matrix $\boldsymbol\Sigma_{\operatorname{vec}\mathbf E'}$ can be decomposed into the variance-covariance matrix $\boldsymbol\Sigma_{\operatorname{vec}\mathbf Y_1'}$, the product $(\mathbf I_n \otimes x_1\mathbf X_3)\boldsymbol\Sigma_{\operatorname{vec}\mathbf Y_2'}(\mathbf I_n \otimes x_1\mathbf X_3)'$ using prior information of $x_1$ and $\mathbf X_3$, and the covariance matrix $\boldsymbol\Sigma_{\operatorname{vec}\mathbf Y_1',\,(\mathbf I_n\otimes x_1\mathbf X_3)\operatorname{vec}\mathbf Y_2'}$, again using prior information of $x_1$ and $\mathbf X_3$.
13-3 Case studies: The 3d datum transformation and the Procrustes
Algorithm
By Table 13.1 and Table 13.2 we present two sets of coordinates, first for the local system A, second for the global system B, also called "World Geodetic System 84". The units are in meters. The results of I-LESS, the Procrustes Algorithm, are listed in Table 13.3, completed by Table 13.5 of residuals from the linearized least squares and by Table 13.6 listing the weight matrix.
Discussion
By means of the Procrustes Algorithm, which is based upon W-LESS with respect to the Frobenius matrix W-seminorm, we have succeeded in solving the normal equations of Corollary 13.2 and 13.5 (necessary conditions) of the matrix-valued "error equations"
$$\operatorname{vec}\mathbf E' = \operatorname{vec}\mathbf Y_1' - (\mathbf I_n \otimes x_1\mathbf X_3)\operatorname{vec}\mathbf Y_2' - \operatorname{vec}(\mathbf x_2\mathbf 1')$$
subject to
$$\mathbf X_3'\mathbf X_3 = \mathbf I_3,\quad |\mathbf X_3| = +1.$$
Mean error matrix norm: $\||\mathbf E_\ell\||_{\mathbf W} := \sqrt{\operatorname{tr}(\mathbf E_\ell'\mathbf W\mathbf E_\ell)/3n} = 0.0930$ m.
$$\operatorname{Diag}(1.8110817,\ 2.1843373,\ 2.1145291,\ 1.9918578,\ 2.6288452,\ 2.1642460,\ 2.359370)$$
13-4 References
Here is a list of important references:
Awange LJ (1999), Awange LJ (2002), Awange LJ, Grafarend E (2001 a, b, c), Awange LJ, Grafarend E (2002), Bernhardt T (2000), Bingham C, Chang T, Richards D (1992), Borg I, Groenen P (1997), Brokken FB (1983), Chang T, Ko DJ (1995), Chu MT, Driessel R (1990), Chu MT, Trendafilov NT (1998), Crosilla F (1983a, b), Dryden IL (1998), Francesco D, Mathien PP, Senechal D (1997), Golub GH (1987), Goodall C (1991), Gower JC (1975), Grafarend E and Awange LJ (2000, 2003), Grafarend E, Schaffrin B (1993), Grafarend E, Knickmeyer EH, Schaffrin B (1982), Green B (1952), Gulliksson M (1995a, b), Kent JT, Mardia KV (1997), Koch KR (2001), Krarup T (1979), Lenzmann E, Lenzmann L (2001a, b), Mardia K (1978), Mathar R (1997), Mathias R (1993), Mooijaart A, Commandeur JJF (1990), Preparata FP, Shamos MI (1985), Reinking J (2001), Schönemann PH (1966), Schönemann PH, Carroll RM (1970), Schottenloher M (1997), Ten Berge JMF (1977), Teunissen PJG (1988), Trefethen LN, Bau D (1997) and Voigt C (1998).
14 The seventh problem of generalized algebraic regression revisited: The Grand Linear Model:
The split level model of conditional equations with unknowns (general Gauss-Helmert model)
There are many examples of such a model. As a holonomity condition it is stated, for instance, that the "true observations" fulfill an equation of type
$$\mathbf 0 = \mathbf B_1 E\{\mathbf y\} - \mathbf c_1.$$
Example
Let there be given two connected triangular networks of type height difference measurements, which we already presented in Chapter 9-3, namely for $\mathbf c_1 := \mathbf 0$ and
$$\{h_{\alpha\beta} + h_{\beta\gamma} + h_{\gamma\alpha} = 0\},\qquad \{h_{\gamma\beta} + h_{\beta\delta} + h_{\delta\gamma} = 0\}$$
$$\mathbf B_1 = \begin{bmatrix}1 & 1 & 1 & 0 & 0\\ 0 & -1 & 0 & 1 & 1\end{bmatrix},\qquad \mathbf y := [h_{\alpha\beta},\ h_{\beta\gamma},\ h_{\gamma\alpha},\ h_{\beta\delta},\ h_{\delta\gamma}]'.$$
The second equation: $\mathbf A_2\mathbf x + \mathbf B_2\mathbf i = \mathbf B_2\mathbf y - \mathbf c_2$, $\mathbf c_2 \in \mathcal R(\mathbf B_2)$
The second condition equation with unknowns is assumed to be the general model, characterized by the inconsistent, inhomogeneous system of linear equations
$$\mathbf A_2\mathbf x + \mathbf B_2\mathbf i = \mathbf B_2\mathbf y - \mathbf c_2,\qquad \mathbf c_2 \in \mathcal R(\mathbf B_2).$$
We refer to our old example of fixing a triangular network in the plane whose position coordinates are derived from distance measurements and fixed by a datum constraint.
The other linear models of type Chapters 1, 3, 5, 9 and 12 can be considered as special cases. Lemma 14.1 refers to the solution of type W-LESS, Lemma 14.2 of type R, W-MINOLESS and Lemma 14.3 of type R, W-HAPS.
$$\begin{bmatrix}\mathbf W & \mathbf B' & \mathbf 0\\ \mathbf B & \mathbf 0 & \mathbf A\\ \mathbf 0 & \mathbf A' & \mathbf 0\end{bmatrix}\begin{bmatrix}\mathbf i_\ell\\ \boldsymbol\lambda_\ell\\ \mathbf x_\ell\end{bmatrix} = \begin{bmatrix}\mathbf 0\\ \mathbf{By}-\mathbf c\\ \mathbf 0\end{bmatrix} \quad(14.1)$$
with the $q_3\times 1$ vector $\boldsymbol\lambda_\ell$ of "Lagrange multipliers". $\mathbf x_\ell$ exists in the case of
$$\mathcal R(\mathbf B') \subset \mathcal R(\mathbf W)$$
and is a solution of the system of normal equations
$$\begin{bmatrix}\mathbf A_2'(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf W^-\mathbf B_2')^{-1}\mathbf A_2 & \mathbf A_3'\\ \mathbf A_3 & \mathbf 0\end{bmatrix}\begin{bmatrix}\mathbf x_\ell\\ \boldsymbol\lambda_3\end{bmatrix} = \begin{bmatrix}\mathbf A_2'(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf W^-\mathbf B_2')^{-1}(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf y - \mathbf k_2)\\ \mathbf c_3\end{bmatrix} \quad(14.2)$$
with
$$\mathbf W_1 := \mathbf W - \mathbf B_1'(\mathbf B_1\mathbf W^-\mathbf B_1')^{-1}\mathbf B_1 \quad(14.3)$$
$$\mathbf k_2 := \mathbf c_2 - \mathbf B_2\mathbf W^-\mathbf B_1'(\mathbf B_1\mathbf W^-\mathbf B_1')^{-1}\mathbf c_1 \quad(14.4)$$
$$\frac{\partial\mathcal L}{\partial\mathbf x}(\mathbf i_\ell,\mathbf x_\ell,\boldsymbol\lambda_\ell) = 2\mathbf A'\boldsymbol\lambda_\ell = \mathbf 0$$
$$\frac{\partial\mathcal L}{\partial\boldsymbol\lambda}(\mathbf i_\ell,\mathbf x_\ell,\boldsymbol\lambda_\ell) = 2(\mathbf A\mathbf x_\ell + \mathbf B\mathbf i_\ell - \mathbf B\mathbf y + \mathbf c) = \mathbf 0$$
are necessary conditions. Note that the theory of vector derivatives is summarized in Appendix B. The second derivatives
$$\frac{\partial^2\mathcal L}{\partial\mathbf i\,\partial\mathbf i'}(\mathbf i_\ell,\mathbf x_\ell,\boldsymbol\lambda_\ell) = 2\mathbf W \ge 0$$
constitute, due to the positive semidefiniteness of the matrix W, the sufficiency condition. In addition, due to the identity $\mathbf W\mathbf W^-\mathbf B' = \mathbf B'$ and the invariance of $\mathbf B\mathbf W^-\mathbf B'$ with respect to the choice of the g-inverse, with the matrices $\mathbf B\mathbf W^-\mathbf B'$ and $\mathbf B_1\mathbf W^-\mathbf B_1'$ the "Schur complement"
$$\mathbf B_2\mathbf W^-\mathbf B_2' - \mathbf B_2\mathbf W^-\mathbf B_1'(\mathbf B_1\mathbf W^-\mathbf B_1')^{-1}\mathbf B_1\mathbf W^-\mathbf B_2' = \mathbf B_2\mathbf W^-\mathbf W_1\mathbf W^-\mathbf B_2'$$
is uniquely invertible. Once the $(q_1+q_2+q_3)\times 1$ vector $\boldsymbol\lambda_\ell$ is partitioned with respect to $\boldsymbol\lambda_\ell' := [\boldsymbol\lambda_1', \boldsymbol\lambda_2', \boldsymbol\lambda_3']$, $O(\boldsymbol\lambda_i) = q_i\times 1$ for all $i = 1, 2, 3$, then by eliminating $\mathbf i_\ell$ we arrive at the reduced system of normal equations
$$\begin{bmatrix}\mathbf B_1\mathbf W^-\mathbf B_1' & \mathbf B_1\mathbf W^-\mathbf B_2' & \mathbf 0 & \mathbf 0\\ \mathbf B_2\mathbf W^-\mathbf B_1' & \mathbf B_2\mathbf W^-\mathbf B_2' & \mathbf 0 & \mathbf A_2\\ \mathbf 0 & \mathbf 0 & \mathbf 0 & \mathbf A_3\\ \mathbf 0 & \mathbf A_2' & \mathbf A_3' & \mathbf 0\end{bmatrix}\begin{bmatrix}\boldsymbol\lambda_1\\ \boldsymbol\lambda_2\\ \boldsymbol\lambda_3\\ \mathbf x_\ell\end{bmatrix} = \begin{bmatrix}\mathbf B_1\mathbf y - \mathbf c_1\\ \mathbf B_2\mathbf y - \mathbf c_2\\ \mathbf c_3\\ \mathbf 0\end{bmatrix} \quad(14.9)$$
:Proof:
The proof follows the line of Lemma 12.13 if we refer to the reduced system of normal equations (12.62). The rest is subject to the identity (12.59)
$$\mathbf R + \mathbf N = [\mathbf R,\ \mathbf A_2',\ \mathbf A_3']\begin{bmatrix}\mathbf R^- & \mathbf 0 & \mathbf 0\\ \mathbf 0 & (\mathbf B_2\mathbf W^-\mathbf W_1\mathbf W^-\mathbf B_2')^{-1} & \mathbf 0\\ \mathbf 0 & \mathbf 0 & \mathbf I\end{bmatrix}\begin{bmatrix}\mathbf R\\ \mathbf A_2\\ \mathbf A_3\end{bmatrix}. \quad(14.19)$$
$$\mathbf R + \mathbf N \quad(14.25)$$
with
$$\mathbf N := \mathbf A_2'(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf W^-\mathbf B_2')^{-1}\mathbf A_2 + \mathbf A_3'\mathbf A_3 \quad(14.26)$$
$$\mathbf x_h = (\mathbf R+\mathbf N)^{-1}\{\mathbf R+\mathbf N - \mathbf A_3'[\mathbf A_3(\mathbf R+\mathbf N)^{-1}\mathbf A_3']^-\mathbf A_3\}(\mathbf R+\mathbf N)^{-1}\mathbf A_2'(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf W^-\mathbf B_2')^{-1}(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf y - \mathbf k_2) + (\mathbf R+\mathbf N)^{-1}\mathbf A_3'[\mathbf A_3(\mathbf R+\mathbf N)^{-1}\mathbf A_3']^-\mathbf c_3, \quad(14.28)$$
:Proof:
$$\frac{\partial^2\mathcal L}{\partial\mathbf i\,\partial\mathbf i'}(\mathbf i_h,\mathbf x_h,\boldsymbol\lambda_h) = 2\mathbf W \ge 0$$
$$\frac{\partial^2\mathcal L}{\partial\mathbf x\,\partial\mathbf x'}(\mathbf i_h,\mathbf x_h,\boldsymbol\lambda_h) = 2\mathbf R \ge 0$$
constitute, due to the positive semidefiniteness of the matrices W and R, a sufficiency condition for obtaining a minimum. Because of the condition (14.21) $\mathcal R(\mathbf B') \subset \mathcal R(\mathbf W)$ we are able to first eliminate the vector $\mathbf i_\ell$ in order to be left with the system of normal equations
and
$$\begin{bmatrix}\mathbf R+\mathbf N & \mathbf A_3'\\ \mathbf A_3 & \mathbf 0\end{bmatrix}\begin{bmatrix}\mathbf x_h\\ \boldsymbol\lambda_3\end{bmatrix} = \begin{bmatrix}\mathbf A_2'(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf W^-\mathbf B_2')^{-1}(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf y - \mathbf k_2) + \mathbf A_3'\mathbf c_3\\ \mathbf c_3\end{bmatrix}.$$
For any g-inverse $(\mathbf R+\mathbf N)^-$ there holds
$$\mathbf c_3 = \mathbf A_3\mathbf x_h = \mathbf A_3(\mathbf R+\mathbf N)^-(\mathbf R+\mathbf N)\mathbf x_h = \mathbf A_3(\mathbf R+\mathbf N)^-[\mathbf A_2'(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf W^-\mathbf B_2')^{-1}(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf y - \mathbf k_2) + \mathbf A_3'(\mathbf c_3 + \boldsymbol\lambda_3)]$$
$$\mathbf N\mathbf x_{\ell m} = [\mathbf N - \mathbf A_3'(\mathbf A_3\mathbf N^-\mathbf A_3')^-\mathbf A_3]\mathbf N^-\mathbf A_2'(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf W^-\mathbf B_2')^{-1}(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf y - \mathbf k_2) + \mathbf A_3'(\mathbf A_3\mathbf N^-\mathbf A_3')^-\mathbf c_3, \quad(14.32)$$
$$\mathbf x_h - \mathbf x_{\ell m} = (\mathbf R+\mathbf N)^{-1}\mathbf N\boldsymbol\lambda_{\ell m} + (\mathbf R+\mathbf N)^{-1}\{[\mathbf R+\mathbf N - \mathbf A_3'(\mathbf A_3(\mathbf R+\mathbf N)^{-1}\mathbf A_3')^-\mathbf A_3](\mathbf R+\mathbf N)^{-1} - [\mathbf N - \mathbf A_3'(\mathbf A_3\mathbf N^-\mathbf A_3')^-\mathbf A_3]\mathbf N^-\}\mathbf A_2'(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf W^-\mathbf B_2')^{-1}(\mathbf B_2\mathbf W^-\mathbf W_1\mathbf y - \mathbf k_2) + (\mathbf R+\mathbf N)^{-1}\mathbf A_3'\{(\mathbf A_3\mathbf N^-\mathbf A_3')^- - [\mathbf A_3(\mathbf R+\mathbf N)^{-1}\mathbf A_3']^-\}\mathbf c_3. \quad(14.33)$$
The special cases of the general model are summarized as follows:
Ax = y, y ∈ R(A): A₂ = A, A₃ = 0, B₁ = 0, B₂ = I (i = 0), c₁ = c₂ = c₃ = 0
Ax + i = y: A₂ = A, A₃ = 0, B₁ = 0, B₂ = I, c₁ = c₂ = c₃ = 0
Ax = By, By ∈ R(A): A₂ = A, A₃ = 0, B₁ = 0, B₂ = B (i = 0), c₁ = c₂ = c₃ = 0
Ax + Bi = By: A₂ = A, A₃ = 0, B₁ = 0, B₂ = B, c₁ = c₂ = c₃ = 0
Bi = By: A₂ = 0, A₃ = 0, B₁ = B, B₂ = 0, c₁ = c₂ = c₃ = 0
Ax = By − c, By − c ∈ R(A): A₂ = A, A₃ = 0, B₁ = 0, B₂ = B (i = 0), c₂ = c, c₁ = c₃ = 0
Ax + Bi = By − c: A₂ = A, A₃ = 0, B₁ = 0, B₂ = B, c₂ = c, c₁ = c₃ = 0
Bi = By − c: A₂ = 0, A₃ = 0, B₁ = B, B₂ = 0, c₁ = c, c₂ = c₃ = 0
Example 14.1
As an example of a partitioned general linear system of equations of type $\mathbf{Ax} + \mathbf{Bi} = \mathbf{By} - \mathbf c$ we treat a planar triangle whose coordinates are derived from three distance measurements under a datum condition. As approximate coordinates for the three points we choose
$$x_\alpha = \sqrt3/2,\ y_\alpha = 1/2,\qquad x_\beta = \sqrt3,\ y_\beta = 1,\qquad x_\gamma = \sqrt3/2,\ y_\gamma = 3/2$$
$$\mathbf A_2 = \begin{bmatrix}\sqrt3/2 & 1/2 & -\sqrt3/2 & -1/2 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & -1\\ 0 & 0 & \sqrt3/2 & -1/2 & -\sqrt3/2 & 1/2\end{bmatrix}.$$
The three degrees of freedom of the network, of type translation and rotation, are fixed by three conditions:
$$\mathbf A_3\mathbf x = \mathbf c_3\ (\mathbf c_3 \in \mathcal R(\mathbf A_3)),\qquad \mathbf A_3 = \begin{bmatrix}1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&0&0&1&0\end{bmatrix},\qquad \mathbf c_3 = \begin{bmatrix}0.01\\ 0.02\\ 0.01\end{bmatrix},$$
or, for $\mathbf Y$ of order $n\times p$, $\mathbf A$ of order $n\times m$ and $\mathbf X$ of order $m\times p$,
$$(\operatorname{vec}\mathbf Y - (\mathbf I_p\otimes\mathbf A)\operatorname{vec}\mathbf X)'(\mathbf I_n\otimes\mathbf G_y)(\operatorname{vec}\mathbf Y - (\mathbf I_p\otimes\mathbf A)\operatorname{vec}\mathbf X) = \min_{\mathbf x}. \quad(15.12)$$
Thanks to the weight matrix $\mathbf G_y$, the multivariate least squares solution (15.3) differs from the special univariate model (15.9). The analogue to the general LESS model (15.10)-(15.12) of type multivariate BLUUE is given next.
Theorem 15.4 (multivariate Gauss-Markov model of type ξ_i,
in particular (Σ, I_n)-BLUUE):

A multivariate Gauss-Markov model is (Σ, I_n)-BLUUE if the vector vec Ξ of the array Ξ of unknowns, dim Ξ = m × p, dim(vec Ξ) = mp × 1, is estimated by

vec Ξ̂ = [(I_p ⊗ A)′(Σ ⊗ I_n)⁻¹(I_p ⊗ A)]⁻¹(I_p ⊗ A)′(Σ ⊗ I_n)⁻¹ vec Y  (15.15)

subject to

rk(Σ ⊗ I_n)⁻¹ = np.  (15.16)
Σ ~ σ_ij denotes the variance-covariance matrix of the multivariate effects y_αi for all α = 1, ..., n and i = 1, ..., p. An unbiased estimator of the variance-covariance matrix of multivariate effects is

i = j:  σ̂_i² = (y_i − Aξ̂_i)′(y_i − Aξ̂_i)/(n − q)
                                                      (15.17)
i ≠ j:  σ̂_ij = (y_i − Aξ̂_i)′(y_j − Aξ̂_j)/(n − q)

because of

E{(y_i − Aξ̂_i)′(y_j − Aξ̂_j)} = E{y_i′(I − A(A′A)⁻¹A′)y_j} = σ_ij (n − q).  (15.18)
A nice example is given in K.R. Koch (1988, pp. 281-286). For practical applications we need the incomplete multivariate models, which do not admit a full-rank matrix [σ_ij]. For instance, in the standard multivariate model it is assumed that the matrix A of coefficients is identical for the p vectors y_i and that the vectors y_i are completely given.
If these assumptions are not fulfilled, due to a change in the observational program in the case of repeated measurements or due to a loss of measurements, an incomplete multivariate model results. If all the matrices of coefficients are different, but the p vectors y_i of observations agree in their dimension, the variance-covariance matrix Σ and the vectors ξ_i of first-order parameters can be estimated iteratively.
15-1 The multivariate Gauss-Markov model 459
For example, if both the parameters of first order, namely ξ_i, and the parameters of second order, namely σ_ij, the elements of the variance-covariance matrix, are unknown, we may use the hybrid estimation of first- and second-order parameters of type {ξ_i, σ_ij} as outlined in Chapter 3, namely the Helmert-type simultaneous estimation of {ξ_i, σ_ij} (B. Schaffrin 1983, p. 101).
An important generalization of the standard multivariate Gauss-Markov model, taking into account constraints, for instance caused by rank deficiencies, e.g. the datum problem at r epochs, is the

multivariate Gauss-Markov model
with constraints

which we will treat at the end.
Definition 15.5 (multivariate Gauss-Markov model with constraints):

If in a multivariate model (15.1) and (15.2) the vectors ξ_i of parameters of first order are subject to constraints

Hξ_i = w_i,  (15.19)

where H denotes the r × m matrix of known coefficients with the restriction

H(A′A)⁻A′A = H,  rk H = r ≤ m  (15.20)

and w_i are known r × 1 vectors, then

E{y_i} = Aξ_i,  (15.21)

D{y_i, y_j} = I_n σ_ij  (15.22)

subject to

Hξ_i = w_i  (15.23)

is called "the multivariate Gauss-Markov model with linear homogeneous constraints". If the p vectors w_i are collected in the r × p matrix W, dim W = r × p, the corresponding matrix model reads

E{Y} = AΞ,  D{vec Y} = Σ ⊗ I_n,  HΞ = W  (15.24)

subject to

O{Σ} = p × p,  O{Σ ⊗ I_n} = np × np,
O{vec Y} = np × 1,  O{Y} = n × p,
O{D{vec Y}} = np × np,  (15.25)
O{H} = r × m,  O{Ξ} = m × p,  O{W} = r × p.
A key result is Lemma 15.6, in which we solve, for a given multivariate weight matrix G_ij equivalent to (Σ ⊗ I_n)⁻¹, a multivariate LESS problem.
Theorem 15.6 (multivariate Gauss-Markov model with constraints):

A multivariate Gauss-Markov model with linear homogeneous constraints is (Σ, I_n)-BLUUE if

vec Ξ̂ = (I_p ⊗ (A′A)⁻A′) vec Y +
       + (I_p ⊗ (A′A)⁻H′[H(A′A)⁻H′]⁻¹)(vec W − (I_p ⊗ H(A′A)⁻A′) vec Y)
where y_ij is the investment index of the jth person in the ith education level, μ is a general mean, α_i is the effect on investment of the ith level of education and e_ij is the random error term peculiar to y_ij. For the data of Table 15.1 there are 3 educational levels and j takes the values 1, 2, ..., n_i, where n_i is the number of observations in the ith educational level, in our case n_1 = 3, n_2 = 2 and n_3 = 2 in Table 15.1.
Our model is the model of the 1-way classification. In general, groupings such as educational levels are called classes; with y_ij as the response and the levels of education as the classes, this is a model we can apply to many situations.
The normal equations arise from writing the data of Table 15.1 in terms of our
model equation.
[74]   [y11]   [μ + α1 + e11]
[68]   [y12]   [μ + α1 + e12]
[77]   [y13]   [μ + α1 + e13]
[76] = [y21] = [μ + α2 + e21],  O(y) = 7 × 1
[80]   [y22]   [μ + α2 + e22]
[85]   [y31]   [μ + α3 + e31]
[93]   [y32]   [μ + α3 + e32]
462 15 Special problems of algebraic regression and stochastic estimation
or
[74]       [1 1 0 0]        [e11]
[68]       [1 1 0 0] [μ ]   [e12]
[77]       [1 1 0 0] [α1]   [e13]
[76] = y = [1 0 1 0] [α2] + [e21] = Ax + e_y
[80]       [1 0 1 0] [α3]   [e22]
[85]       [1 0 0 1]        [e31]
[93]       [1 0 0 1]        [e32]
and
    [1 1 0 0]
    [1 1 0 0]
    [1 1 0 0]        [μ ]
A = [1 0 1 0],  x =  [α1],  O(A) = 7 × 4,  O(x) = 4 × 1
    [1 0 1 0]        [α2]
    [1 0 0 1]        [α3]
    [1 0 0 1]
with y being the vector of observations and e y the vector of corresponding error
terms. As an inconsistent linear equation
y − e_y = Ax,  O{y} = 7 × 1, O{A} = 7 × 4, O{x} = 4 × 1

we pose the key question:

? What is the rank of the design matrix A ?

Most notably, the first column is 1_n and the sum of the other three columns is also 1_n, namely c₂ + c₃ + c₄ = 1_n! Indeed, we have a proof of linear dependence: c₁ = c₂ + c₃ + c₄. The rank of A is only three, rk A = 3, which differs from O{A} = 7 × 4. We have to build in this rank deficiency. For example, we could postulate the condition x₄ = α₃ = 0, eliminating one component of the unknown vector. A more reasonable approach is based on the computation of the symmetric reflexive generalized inverse
such that
x_lm = (A′A)⁻_rs A′y,  (15.33)

which guarantees a least squares minimum norm solution, i.e. a V, S-BLUMBE solution (Best Linear V-Norm Uniformly Minimum Bias S-Norm Estimation) for V = I, S = I, and

rk A = rk A′A = rk(A′A)⁻_rs = rk A⁺.  (15.34)

A′A is a symmetric matrix ⇒ (A′A)⁻_rs is a symmetric matrix

or called

:rank preserving identity:
!symmetry preserving identity!
15-2 n-way classification models 463
    [7 3 2]
    [3 3 0]
D = [2 0 2],  E to be determined:
    [2 0 0]

D′A′A = D′DE  ⇒  E = (D′D)⁻¹D′A′A

       [7 3 2 2] [7 3 2]   [66 30 18]
D′D =  [3 3 0 0] [3 3 0] = [30 18  6]
       [2 0 2 0] [2 0 2]   [18  6  8]
                 [2 0 0]
(A′A)⁻_rs A′ =

[ 0.0833  0.0833  0.0833  0.1250  0.1250  0.1250  0.1250]
[ 0.2500  0.2500  0.2500 −0.1250 −0.1250 −0.1250 −0.1250]
[−0.0833 −0.0833 −0.0833  0.3750  0.3750 −0.1250 −0.1250]
[−0.0833 −0.0833 −0.0833 −0.1250 −0.1250  0.3750  0.3750]

x_lm = (A′A)⁻_rs A′y = [60.0; 13.0; 18.0; 29.0].
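The MINOLESS numbers above can be reproduced with the Moore-Penrose inverse, since the minimum norm least squares solution coincides with the pseudoinverse solution. A minimal sketch, assuming NumPy (not part of the book):

```python
import numpy as np

# 1-way classification: 3 education levels with n1 = 3, n2 = 2, n3 = 2 observations
y = np.array([74., 68., 77., 76., 80., 85., 93.])
A = np.column_stack([
    np.ones(7),                               # column for the general mean mu
    np.repeat(np.eye(3), [3, 2, 2], axis=0),  # indicator columns alpha_1..alpha_3
])

# MINOLESS: least squares solution of minimum norm = pseudoinverse solution
x_lm = np.linalg.pinv(A) @ y
print(x_lm)   # [60. 13. 18. 29.]  (mu, alpha_1, alpha_2, alpha_3)
```

The group means 73, 78 and 89 are split into μ̂ = 60 and the level effects 13, 18, 29; minimizing ||x||² fixes this particular representative of the solution set.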
Summary
The general formulation of our 1-way classification problem is generated by
identifying the vector of responses as well as the vector of parameters:
Table 15.3: 1-way classification
y′ := [y11, y12, ..., y1(n1−1), y1n1 | y21, y22, ..., y2(n2−1), y2n2 | ... | yp1, yp2, ..., yp(np−1), ypnp]

x′ := [μ, α1, α2, ..., α(p−1), αp]

     [1 1 0 … 0]
     [⋮ ⋮     ⋮]
     [1 1 0 … 0]
     [1 0 1 … 0]
A := [⋮ ⋮     ⋮],  O(A) = n × (p + 1)
     [1 0 1 … 0]
     [1 0 0 … 1]
     [⋮ ⋮     ⋮]
     [1 0 0 … 1]

n = n1 + n2 + ... + np = Σ_{i=1}^{p} ni  (15.35)

experimental design:
number of observations:      n = n1 + n2 + … + np
number of parameters:        1 + p
rank of the design matrix:   1 + (p − 1) = p  (15.36)

:MINOLESS:

(15.37) ||y − Ax||² = min   and   (15.38) ||x||² = min_x
The factor A appears in p levels and the factor B in q levels. If n_ij denotes the number of observations under the influence of the ith level of factor A and the jth level of factor B, then the results of the experiment can be condensed in Table 15.4. If α_i and β_j denote the effects of the factors A and B, and μ the mean of all observations, we receive

μ + α_i + β_j = E{y_ijk} for all i ∈ {1,...,p}, j ∈ {1,...,q}, k ∈ {1,...,n_ij}  (15.44)

as our model equation.
Table 15.4 (level of factors):

level of factor A \ level of factor B    1     2     …    q
1                                        n11   n12   …    n1q
2                                        n21   n22   …    n2q
…                                        …     …          …
p                                        np1   np2   …    npq
If n_ij = 0 for at least one pair {i, j}, then our experimental design is called incomplete. An experimental design for which n_ij is equal for all pairs {i, j} is said to be balanced.
The data of Table 15.5 describe such a general model of observations y_ijk in the ith row (brand of stove) and jth column (make of the pan); μ is the mean, α_i is the effect of the ith row, β_j is the effect of the jth column, and e_ijk is the error term.
Outside the context of rows and columns, α_i is equivalently the effect due to the ith level of the α factor and β_j is the effect due to the jth level of the β factor. In general, we have p levels of the α factor with i = 1,...,p and q levels of the β factor with j = 1,...,q: in our example p = 4 and q = 3.
With balanced data every one of the pq cells in Table 15.5 would have one (or n) observations, and n ≥ 1 would be the only symbol needed to describe the number of observations in each cell. In our Table 15.5 some cells have zero observations and some have one. We therefore need n_ij as the number of observations in the ith row and jth column. Then all n_ij = 0 or 1, and the numbers of observations are the values of
n_i = Σ_{j=1}^{q} n_ij,   n_j = Σ_{i=1}^{p} n_ij,   n = Σ_{i=1}^{p} Σ_{j=1}^{q} n_ij.  (15.45)
Corresponding totals and means of the observations are shown, too. For the ob-
servations in Table 15.5 the linear equations of the model are given as follows,
[18]   [y11]   [1 1 . . . 1 . .]        [e11]
[12]   [y12]   [1 1 . . . . 1 .] [μ ]   [e12]
[24]   [y13]   [1 1 . . . . . 1] [α1]   [e13]
[ 9] = [y23] = [1 . 1 . . . . 1] [α2] + [e23],
[ 3]   [y31]   [1 . . 1 . 1 . .] [α3]   [e31]
[15]   [y33]   [1 . . 1 . . . 1] [α4]   [e33]
[ 6]   [y41]   [1 . . . 1 1 . .] [β1]   [e41]
[ 3]   [y42]   [1 . . . 1 . 1 .] [β2]   [e42]
[18]   [y43]   [1 . . . 1 . . 1] [β3]   [e43]
where dots represent zeros. In summary,
[18]       [1 1 0 0 0 1 0 0]        [e11]
[12]       [1 1 0 0 0 0 1 0] [μ ]   [e12]
[24]       [1 1 0 0 0 0 0 1] [α1]   [e13]
[ 9]       [1 0 1 0 0 0 0 1] [α2]   [e23]
[ 3] = y = [1 0 0 1 0 1 0 0] [α3] + [e31]
[15]       [1 0 0 1 0 0 0 1] [α4]   [e33]
[ 6]       [1 0 0 0 1 1 0 0] [β1]   [e41]
[ 3]       [1 0 0 0 1 0 1 0] [β2]   [e42]
[18]       [1 0 0 0 1 0 0 1] [β3]   [e43]
    [1 1 0 0 0 1 0 0]
    [1 1 0 0 0 0 1 0]        [μ ]
    [1 1 0 0 0 0 0 1]        [α1]
    [1 0 1 0 0 0 0 1]        [α2]
A = [1 0 0 1 0 1 0 0],  x =  [α3],  O(A) = 9 × 8,  O(x) = 8 × 1
    [1 0 0 1 0 0 0 1]        [α4]
    [1 0 0 0 1 1 0 0]        [β1]
    [1 0 0 0 1 0 1 0]        [β2]
    [1 0 0 0 1 0 0 1]        [β3]
with y being the vector of observations and e y the vector of corresponding error
terms. As an inconsistent linear equation
y − e_y = Ax,  O{y} = 9 × 1, O{A} = 9 × 8, O{x} = 8 × 1
we pose the key question:
? What is the rank of the design matrix A ?
Most notably, the first column is 1_n, the sum of the next 4 columns is also 1_n, and the sum of the remaining 3 columns is 1_n, too, namely c₁ = c₂ + c₃ + c₄ + c₅ = c₆ + c₇ + c₈. Hence rk A = 8 − 2 = 6.
or called

:rank preserving identity:
:symmetry preserving identity:

    [9 3 1 2 3 2]
    [3 3 0 0 1 1]
    [1 0 1 0 0 0]
D = [2 0 0 2 1 0],  E to be determined
    [3 0 0 0 1 1]
    [3 1 0 1 3 0]
    [2 1 0 0 0 2]
    [4 1 1 1 0 0]
x_lm = (A′A)⁻_rs A′y = [5.3684; 10.8421; −6.1579; −1.1579; 1.8421; −0.2105; −4.2105; 9.7895].
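The double dependency among the columns and the resulting rank deficiency can be checked numerically; a sketch assuming NumPy (not part of the book):

```python
import numpy as np

# 2-way classification without interaction: 9 observations, 8 parameters
# column order: mu | alpha_1..alpha_4 | beta_1..beta_3
A = np.array([
    [1, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 1],
], dtype=float)
y = np.array([18., 12., 24., 9., 3., 15., 6., 3., 18.])

# c1 = c2+c3+c4+c5 and c1 = c6+c7+c8: two dependencies, hence rank 8 - 2 = 6
print(np.linalg.matrix_rank(A))   # 6

# MINOLESS via the pseudoinverse satisfies the normal equations A'Ax = A'y
x_lm = np.linalg.pinv(A) @ y
assert np.allclose(A.T @ A @ x_lm, A.T @ y)
```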
Summary
The general formulation of our 2-way classification problem without interaction is generated by identifying the vector of responses as well as the vector of parameters.

Table 15.7: 2-way classification without interaction

y′ := [y′11, ..., y′p1, y′12, ..., y′p(q−1), y′pq]

x′ := [μ, α1, ..., αp, β1, ..., βq]

A := [1_n, c2, ..., cp, c(p+1), ..., cq]

subject to

c2 + ... + cp = 1_n,  c(p+1) + ... + cq = 1_n

n_i = Σ_{j=1}^{q} n_ij,   n_j = Σ_{i=1}^{p} n_ij,   n = Σ_{i=1,j=1}^{p,q} n_ij  (15.48)

:MINOLESS:

(15.50) ||y − Ax||² = min   and   (15.51) ||x||² = min_x
The equations of a suitable linear model for analyzing data of the nature of Table 15.8 are given for y_ijk as the kth observation in the ith treatment and jth variety. In our top table, μ is the mean, α_i is the effect of the ith treatment, β_j is the effect of the jth variety, (αβ)_ij is the interaction effect for the ith treatment and the jth variety, and e_ijk is the error term.
With balanced data every one of the pq cells of our table would have n observations. In addition there would be pq levels of the (αβ) factor, the interaction factor. However, with unbalanced data, when some cells have no observations, there are only as many (αβ)_ij levels in the data as there are non-empty cells. Let the number of such cells be s (s = 8 in Table 15.8). Then n_ij is the number of observations in the (i, j)th cell of type "treatment i and variety j", s is the number of cells in which n_ij ≠ 0, and n_ij = 0 in all other cells. For these cells
y_ij = Σ_{k=1}^{n_ij} y_ijk

is the total yield in the (i, j)th cell, and ȳ_ij is the corresponding mean. Similarly,

y = Σ_{i=1}^{p} Σ_{j=1}^{q} Σ_{k=1}^{n_ij} y_ijk

is the total yield for all plots, the number of observations called "plots" therein being

n = Σ_{i=1}^{p} n_i = Σ_{j=1}^{q} n_j = Σ_{i,j}^{p,q} n_ij.  (15.59)
We shall continue with the corresponding normal equations being derived from
the observational equations.
column order:  μ | α1 α2 α3 | β1 β2 β3 β4 | (αβ)11 (αβ)13 (αβ)14 (αβ)21 (αβ)22 (αβ)32 (αβ)33 (αβ)34

[ 8]   [y111]          [e111]
[13]   [y112]          [e112]
[ 9]   [y113]          [e113]
[12]   [y131]          [e131]
[ 7]   [y141]          [e141]
[11]   [y142]          [e142]
[ 6]   [y211]          [e211]
[12]   [y212]          [e212]
[12] = [y221] = Ax +   [e221],
[14]   [y222]          [e222]
[ 9]   [y321]          [e321]
[ 7]   [y322]          [e322]
[14]   [y331]          [e331]
[16]   [y332]          [e332]
[10]   [y341]          [e341]
[14]   [y342]          [e342]
[11]   [y343]          [e343]
[13]   [y344]          [e344]

where x = [μ, α1, α2, α3, β1, β2, β3, β4, (αβ)11, (αβ)13, (αβ)14, (αβ)21, (αβ)22, (αβ)32, (αβ)33, (αβ)34]′ and each row of the 18 × 16 matrix A carries a 1 exactly in the columns μ, α_i, β_j and (αβ)_ij belonging to the observation y_ijk, zeros elsewhere.
[18 6 4 8 5 4 3 6 3 1 2 2 2 2 2 4] [μ     ]   [y   ]   [198]
[ 6 6 . . 3 . 1 2 3 1 2 . . . . .] [α1    ]   [y1  ]   [ 60]
[ 4 . 4 . 2 2 . . . . . 2 2 . . .] [α2    ]   [y2  ]   [ 44]
[ 8 . . 8 . 2 2 4 . . . . . 2 2 4] [α3    ]   [y3  ]   [ 94]
[ 5 3 2 . 5 . . . 3 . . 2 . . . .] [β1    ]   [y·1 ]   [ 48]
[ 4 . 2 2 . 4 . . . . . . 2 2 . .] [β2    ]   [y·2 ]   [ 42]
[ 3 1 . 2 . . 3 . . 1 . . . . 2 .] [β3    ]   [y·3 ]   [ 42]
[ 6 2 . 4 . . . 6 . . 2 . . . . 4] [β4    ] = [y·4 ] = [ 66] ~ A′Ax_A = A′y,
[ 3 3 . . 3 . . . 3 . . . . . . .] [(αβ)11]   [y11 ]   [ 30]
[ 1 1 . . . . 1 . . 1 . . . . . .] [(αβ)13]   [y13 ]   [ 12]
[ 2 2 . . . . . 2 . . 2 . . . . .] [(αβ)14]   [y14 ]   [ 18]
[ 2 . 2 . 2 . . . . . . 2 . . . .] [(αβ)21]   [y21 ]   [ 18]
[ 2 . 2 . . 2 . . . . . . 2 . . .] [(αβ)22]   [y22 ]   [ 26]
[ 2 . . 2 . 2 . . . . . . . 2 . .] [(αβ)32]   [y32 ]   [ 16]
[ 2 . . 2 . . 2 . . . . . . . 2 .] [(αβ)33]   [y33 ]   [ 30]
[ 4 . . 4 . . . 4 . . . . . . . 4] [(αβ)34]   [y34 ]   [ 48]

where dots represent zeros.
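The right-hand side of these normal equations is just the vector of totals, which can be rebuilt from the cell data; a sketch assuming NumPy, with the observations transcribed from the display above:

```python
import numpy as np

# non-empty cells (i, j) of the 2-way layout and their observations y_ijk
cells = {(1, 1): [8, 13, 9], (1, 3): [12], (1, 4): [7, 11],
         (2, 1): [6, 12], (2, 2): [12, 14],
         (3, 2): [9, 7], (3, 3): [14, 16], (3, 4): [10, 14, 11, 13]}
p, q = 3, 4
labels = sorted(cells)          # order of the interaction columns

rows, y = [], []
for (i, j) in labels:
    for v in cells[(i, j)]:
        r = np.zeros(1 + p + q + len(labels))
        r[0] = 1.0                                   # mu
        r[i] = 1.0                                   # alpha_i
        r[p + j] = 1.0                               # beta_j
        r[1 + p + q + labels.index((i, j))] = 1.0    # (alpha beta)_ij
        rows.append(r)
        y.append(v)
A, y = np.array(rows), np.array(y, dtype=float)

print((A.T @ y).astype(int))
# [198  60  44  94  48  42  42  66  30  12  18  18  26  16  30  48]
```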
or called
:rank preserving identity:
!symmetry preserving identity!
Table 15.9 summarizes all the details of 2-way classification with interaction. In
general, for complete models our table lists the general number of parameters and
the rank of the design matrix which differs from our incomplete design model.
x_Am = (A′A)⁻_rs A′y.  (15.66)

For our key example we get from the symmetric normal equations A′Ax_A = A′y the solution

x_Am = (A′A)⁻_rs A′y

given A′A and A′y:
    [3 1 2 2 2 2 2 4]
    [3 1 2 0 0 0 0 0]
    [0 0 0 2 2 0 0 0]
    [0 0 0 0 0 2 2 4]
    [3 0 0 2 0 0 0 0]
    [0 0 0 0 2 2 0 0]
    [0 1 0 0 0 0 2 0]
D = [0 0 2 0 0 0 0 4]
    [3 0 0 0 0 0 0 0]
    [0 1 0 0 0 0 0 0]
    [0 0 2 0 0 0 0 0]
    [0 0 0 2 0 0 0 0]
    [0 0 0 0 2 0 0 0]
    [0 0 0 0 0 2 0 0]
    [0 0 0 0 0 0 2 0]
    [0 0 0 0 0 0 0 4]
x_Am = (A′A)⁻_rs A′y =
= [6.4602, 1.5543, 2.4425, 2.4634, 0.6943, 1.0579, 3.3540, 1.3540,
   1.2912, 0.6315, −0.3685, −0.5969, 3.0394, −1.9815, 2.7224, 1.7224]′

for all

i ∈ {1,…,p}, j ∈ {1,…,q}, k ∈ {1,…,r}, …, ℓ ∈ {1,…,n_ijk,…}.
Summary
Δα = α2 − α1,  Δβ = β2 − β1  for all i ∈ {1, 2}, j ∈ {1, 2}  (15.70)

as well as

Δ(αβ1), Δ(αβ2), Δ(αβ2)  (15.71)

are unbiasedly estimable.
At the end we review the number of parameters and the rank of the design matrix for a 3-way classification with interactions according to the following example.

number of observations:    n = Σ_{i,j,k=1}^{p,q,r} n_ijk

number of parameters:      1 + p + q + r + pq + pr + qr + pqr

rank of the design matrix:
1 + (p − 1) + (q − 1) + (r − 1) +
+ (p − 1)(q − 1) + (p − 1)(r − 1) + (q − 1)(r − 1) +          (15.72)
+ (p − 1)(q − 1)(r − 1) = pqr
Tracking a satellite's orbit around the Earth might be based on the unknown state vector z(t), a function of the position and the speed of the satellite at time t with respect to a spherical coordinate system with origin at the mass center of the Earth. Position and speed of a satellite can be measured by GPS, for instance. If distances and accompanying angles are measured, they establish the observation y(t). The principles of space-time geometry, namely the mapping of z(t) into y(t), would be incorporated in the matrix C, while e_y(t) would reflect the measurement errors at the time instant t. The matrix A reflects how position and speed change in time according to the physical laws governing orbiting bodies, while e_z would allow for deviations from these laws owing to factors such as the nonuniformity of the Earth's gravity field.
Example 2 (statistical quality control):

Here the observation y(t) is a simple, approximately normal transformation of the number of defectives observed in a sample obtained at time t, while z1(t) and z2(t) represent respectively the refractive index of the process and the drift of the index. We have the observation equation and the system equations

y(t) = z1(t) + e_y(t)   and   z1(t) = z1(t − 1) + z2(t − 1) + e_z1,
                              z2(t) = z2(t − 1) + e_z2.

In vector notation, this system of equations becomes

z(t) = Az(t − 1) + e_z,

namely

z(t) = [z1(t)]   e_z = [e_z1]   A = [1 1]
       [z2(t)],        [e_z2],      [0 1].
In practice, most time series are non-stationary. In order to fit a stationary model,
it is necessary to remove non-stationary sources of variation. If the observed
time series is non-stationary in the mean, then we can difference the series. Differencing is widely used in all scientific disciplines. If z(t), t ∈ {1,…,T}, is replaced by ∇^d z(t), then we have a model capable of describing certain types of non-stationary signals. Such a model is called an "integrated model" because the stationary model that fits the differenced data has to be summed or "integrated" to provide a model for the original non-stationary data. Writing

w(t) = ∇^d z(t) = (1 − B)^d z(t)  (15.80)
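The effect of the difference operator (1 − B)^d can be illustrated directly; a sketch assuming NumPy (not part of the book), where twofold differencing removes a quadratic trend:

```python
import numpy as np

# a series non-stationary in the mean: quadratic trend
t = np.arange(10)
z = 0.5 * t**2 + 3.0

# w(t) = (1 - B)^d z(t): d-fold differencing, here d = 2
w = np.diff(z, n=2)
print(w)   # second differences of 0.5 t^2 + 3 are constant: all equal to 1.0
```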
called the prediction equations. The last equations follow from the standard results on variance-covariance matrices for random vector variables. When the new observation at time t, namely y(t), has been observed, the estimator for E{z(t)} can be modified to take account of the extra information. At time (t − 1), the best forecast of y(t) is given by h′(t)E{z(t) | ẑ(t − 1)}, so that the prediction error is given by

ê_y(t) = y(t) − h′(t)E{z(t) | ẑ(t − 1)}.  (15.86)

This quantity can be used to update the estimate of E{z(t)} and its variance-covariance matrix:

Ê{z(t)} = E{z(t) | ẑ(t − 1)} + K(t)ê_y(t).  (15.87)

K(t) is called the gain matrix. In the univariate case, K(t) is just a vector of size (m × 1). The previous equations constitute the second updating stage of the Kalman filter, thus they are called the updating equations.
A major practical advantage of the Kalman filter is that the calculations are re-
cursive so that, although the current estimates are based on the whole past history
of measurements, there is no need for an ever expanding memory. Rather the new estimate of the signal is based solely on the previous estimate and the latest
observations. A second advantage of the Kalman filter is that it converges fairly
quickly when there is a constant underlying model, but can also follow the
movement of a system where the underlying model is evolving through time.
For special cases, there exist much simpler equations. An example is the random walk plus noise model, where the state vector z(t) consists of one state variable, the current level μ(t). It can be shown that the Kalman filter for this model in the steady-state case, for t → ∞, reduces to the simple recurrence relation

μ̂(t) = μ̂(t − 1) + α ê(t),

where the smoothing constant α is a complicated function of the signal-to-noise ratio σ_w²/σ_n². Our equation is simple exponential smoothing. When σ_w² tends to zero, μ(t) is a constant and we find that α → 0, as would intuitively be expected, while as σ_w²/σ_n² becomes large, α approaches unity.
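The steady-state recursion is ordinary exponential smoothing; a minimal sketch assuming NumPy, with the two limiting cases of the smoothing constant α checked:

```python
import numpy as np

def exp_smooth(y, alpha, mu0):
    """mu(t) = mu(t-1) + alpha * e(t) with innovation e(t) = y(t) - mu(t-1)."""
    mu, out = mu0, []
    for yt in y:
        mu = mu + alpha * (yt - mu)   # equivalently (1 - alpha)*mu + alpha*yt
        out.append(mu)
    return np.array(out)

y = np.array([10., 12., 11., 13., 12.])
s = exp_smooth(y, alpha=0.3, mu0=y[0])

assert np.allclose(exp_smooth(y, 1.0, y[0]), y)      # alpha -> 1: track the data
assert np.allclose(exp_smooth(y, 0.0, y[0]), y[0])   # alpha -> 0: frozen level
```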
For a multivariate time series approach we may start from the vector-valued equation of type

E{y(t)} = C E{z(t)},  (15.90)

where C is a known nonsingular m × m matrix. By LESS we are able to predict

Ê{z(t)} = C⁻¹ Ê{y(t)}.  (15.91)
Once a model has been put into the state-space form, the Kalman filter can be
used to provide estimates of the signal, and they in turn lead to algorithms for
various other calculations, such as making prediction and handling missing val-
ues. For instance, forecasts may be obtained from the state-space model using
the latest estimates of the state vector. Given data to time N, the best estimate of
the state vector is written Ê{z(N)} and the h-step-ahead forecast is given by

Ê{y(N + h)} = h′(N + h) Ê{z(N + h)} =
            = h′(N + h) G(N + h) G(N + h − 1) … G(N + 1) Ê{z(N)}  (15.92)

or, for time-invariant G,

Ê{y(N | h)} = h′(N + h) G^h Ê{z(N)}.  (15.93)
If future values of h(t ) or G(t ) are not known, then they must themselves be
forecasted or otherwise guessed.
To this day a lot of research has been done on nonlinear models in prediction theory relating to state vectors and observational equations. There are excellent reviews, for instance by P. H. Frances (1988), C. W. J. Granger and P. Newbold
15-3 Dynamical Systems 481
The key question is now whether for the characteristic state equation there exists a transformation matrix such that, for specific matrices Ã and B̃, there exists an integer number r, 0 ≤ r < n, of the form

Ã = [Ã11  Ã12]     O{Ã} = [r × r        r × (n − r)      ]
    [ 0   Ã22],           [(n − r) × r  (n − r) × (n − r)]

B̃ = [B̃1]     O{B̃} = [r × q      ]
    [ 0 ],           [(n − r) × q].
In this case the state equation separates into two distinct parts:

d/dt z̃1(t) = Ã11 z̃1(t) + Ã12 z̃2(t) + B̃1 h(t),   z̃1(0) = z̃10   (15.97)

d/dt z̃2(t) = Ã22 z̃2(t),   z̃2(0) = z̃20.   (15.98)
The last n − r elements of z̃ cannot be influenced in their time development. Influence is restricted to the initial conditions and to the eigendynamics of the partial system 2 (characterized by the matrix Ã22). The state of the whole system therefore cannot be steered completely to an arbitrarily given point of the state space. Accordingly, the state differential equation in terms of the matrix pair (A, B) is not steerable.
Example 3 (steerable state):

If we apply the dynamic matrix A and the input matrix B of a state model of type

A = [−4/3   2/3]     B = [ 1 ]
    [ 1/3  −5/3],        [0.5],

we are led to the alternative matrices after using the similarity transformation

Ã = [−1   1]     B̃ = [1]
    [ 0  −2],        [0].
If the initial state is located along the z̃1-axis, for instance z̃20 = 0, then the state vector remains at all times along this axis. It is only possible to move the state along this axis "up and down".
If no such similarity transformation exists, we call the state matrices (A, B) steerable. Steerability of a state differential equation may be tested by

Lemma 15.7 (steerability):

The pair (A, B) is steerable if and only if

(15.99) rk[B, AB, …, A^(n−1)B] = rk F(A, B) = n.
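Lemma 15.7 translates directly into a rank computation; a sketch assuming NumPy, with matrices in the spirit of Example 3 (B is an eigenvector of A, so rk[B, AB] = 1 and the pair is not steerable):

```python
import numpy as np

def steerable(A, B):
    """Rank test of Lemma 15.7: rk[B, AB, ..., A^(n-1) B] = n."""
    n = A.shape[0]
    F = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
    return np.linalg.matrix_rank(F) == n

A = np.array([[-4/3, 2/3], [1/3, -5/3]])
B = np.array([[1.0], [0.5]])        # A B = -B: B is an eigenvector of A

print(steerable(A, B))                          # False: rk F(A, B) = 1 < 2
print(steerable(A, np.array([[1.0], [0.0]])))   # True once B leaves the eigenvector
```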
In this case the state equation and the observational equations read

d/dt z̃1(t) = Ã11 z̃1(t) + B̃1 u(t),   z̃1(0) = z̃10   (15.102)

d/dt z̃2(t) = Ã21 z̃1(t) + Ã22 z̃2(t) + B̃2 u(t),   z̃2(0) = z̃20   (15.103)

y(t) = C̃1 z̃1(t) + D u(t).   (15.104)
The last n − r elements of the vector z̃ are not used in the exit variable y. Since they have no effect on z̃1, the vector y contains no information about these components of the state vector. This state moves in the (n − r)-dimensional subspace of R^n without any change in the exit variables. Our model (C, A) is in this case called non-observable.
Example 4 (observability):

If the exit matrix and the dynamic matrix of a state model can be characterized by the matrices

C = [4, 2],   A = [ 0   1]
                  [−2  −3],

an application of the transformation matrix T leads to the matrices

C̃ = [1, 0],   Ã = [−1   0]
                  [ 1  −2].
An arbitrary motion of the state in the direction of the z̃2-axis hence has no influence on the exit variable.
If no such transformation T exists, we call the state model observable.
A rank study helps again!
Lemma 15.8 (observability test):

The pair (C, A) is observable if and only if

     [C       ]
     [CA      ]
rk   [⋮       ] = rk G(C, A) = n.   (15.105)
     [CA^(n−1)]

G(C, A) is called the observability matrix. If its rank r < n, then there exists a transformation matrix T such that Ã = T⁻¹AT and C̃ = CT is of the form

Ã = [Ã11   0 ]     O{Ã} = [r × r        r × (n − r)      ]   (15.106)
    [Ã21  Ã22],           [(n − r) × r  (n − r) × (n − r)]
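The observability test is the dual rank computation; a sketch assuming NumPy, with matrices in the spirit of Example 4 (CA = −C, so the observability matrix has rank 1):

```python
import numpy as np

def observable(C, A):
    """Rank test of Lemma 15.8: rk[C; CA; ...; C A^(n-1)] = n."""
    n = A.shape[0]
    G = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
    return np.linalg.matrix_rank(G) == n

C = np.array([[4.0, 2.0]])
A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # C A = -C

print(observable(C, A))                        # False: rk G(C, A) = 1 < 2
print(observable(np.array([[1.0, 0.0]]), A))   # True
```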
With Lemma 15.7 and Lemma 15.8 we can only state whether a state model is steerable or observable or not, and which dimension a partial system has that is classified as non-steerable or non-observable. In order to determine which part of a system is non-steerable or non-observable, i.e. which eigenmotion is not ex-
For details we recommend checking the reference list. We only refer to solving both the state differential equation and the initial equation: eliminating the state vector z(s) leads us to the algebraic relation between u(s) and y(s):

(15.110) G(s) = C(sI_n − A)⁻¹B + D

or

(15.111) G(s) = [C̃1 C̃2] ( s [I_r  0; 0  I_(n−r)] − [Ã11  Ã12; 0  Ã22] )⁻¹ [B̃1; 0] + D =
              = C̃1 (sI_r − Ã11)⁻¹ B̃1 + D.
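The reduction in (15.111) can be checked numerically for a block-triangular pair; the values of C̃1, C̃2 and D below are hypothetical, chosen only for illustration:

```python
import numpy as np

A = np.array([[-1.0, 1.0], [0.0, -2.0]])   # [A11 A12; 0 A22] with r = 1
B = np.array([[1.0], [0.0]])               # [B1; 0]
C = np.array([[3.0, 5.0]])                 # [C1 C2]  (hypothetical values)
D = np.array([[0.5]])                      # (hypothetical value)

def G(s):
    """Transfer function G(s) = C (sI - A)^(-1) B + D of (15.110)."""
    return (C @ np.linalg.inv(s * np.eye(2) - A) @ B + D)[0, 0]

# (15.111): the second block drops out, G(s) = C1 (s - A11)^(-1) B1 + D
for s in [0.5, 1.0, 2.0, 10.0]:
    assert np.isclose(G(s), 3.0 / (s + 1.0) + 0.5)
```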
Recently, the topic of chaos has attracted much attention. Chaotic behavior arises from certain types of nonlinear models, and a loose definition is apparently random behavior that is generated by a purely deterministic, nonlinear system.
Refer to the contributions of K. S. Chan and H. Tong (2001), J. Gleick (1987), V. Isham (1983), H. Kantz and T. Schreiber (1997).
Appendix A: Matrix Algebra
A1 Matrix-Algebra
A matrix is a rectangular or a quadratic array of numbers. The format or "order" of A is given by the number n of rows and the number m of columns,

O(A) := n × m.

Fact:

Two matrices are identical if they have identical format and if at each place (i, j) the entries are identical, namely

A = B ⇔ a_ij = b_ij for all i ∈ {1, ..., n}, j ∈ {1, ..., m}.
486 Appendix A: Matrix Algebra
A + B = B + A (commutativity)
(A + B) + C = A + (B + C) (associativity)
Compatibility

(α + β)A = αA + βA  (distributivity)
α(A + B) = αA + αB  (distributivity)

(A + B)′ = A′ + B′.
3(ii) "Kronecker-Zehfuss product"

A = [a_ij], O(A) = n × m
B = [b_ij], O(B) = k × l

C := B ⊗ A = [c_ij],  B ⊗ A := [b_ij A],  O(C) = O(B ⊗ A) = kn × lm
3(iii) "Khatri-Rao product"
(of two rectangular matrices of identical column number)

A = [a_1, ..., a_m], O(A) = n × m
B = [b_1, ..., b_m], O(B) = k × m

C := B : A := [b_1 ⊗ a_1, …, b_m ⊗ a_m],  O(C) = kn × m

3(iv) "Hadamard product"
(of two rectangular matrices of the same order; elementwise product)

G = [g_ij], O(G) = n × m
H = [h_ij], O(H) = n × m

C := G ∘ H := [g_ij h_ij],  O(C) = n × m
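The three products can be compared side by side; a sketch assuming NumPy, where np.kron gives the Kronecker-Zehfuss product, the Khatri-Rao product is assembled column by column, and * is the Hadamard product:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])   # O(A) = 2 x 2  (n = m = 2)
B = np.array([[0., 1.], [2., 3.]])   # O(B) = 2 x 2  (k = l = 2)

K = np.kron(B, A)                    # Kronecker-Zehfuss: O = kn x lm = 4 x 4
assert K.shape == (4, 4)

# Khatri-Rao: columnwise Kronecker product, O = kn x m = 4 x 2
KR = np.column_stack([np.kron(B[:, j], A[:, j]) for j in range(A.shape[1])])
assert KR.shape == (4, 2)

H = B * A                            # Hadamard: elementwise, O = n x m
assert np.allclose(H, [[0., 2.], [6., 12.]])
```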
The existence of the product A B does not imply the existence of the product
B A . If both products exist, they are in general not equal. Two quadratic matri-
ces A and B, for which holds A B = B A , are called commutative.
Laws

(i) (A B) C = A (B C)
A (B + C) = A B + A C
(A + B) C = A C + B C
(A B)′ = B′ A′.
(ii)
(A ⊗ B) ⊗ C = A ⊗ (B ⊗ C) = A ⊗ B ⊗ C
(A + B) ⊗ C = (A ⊗ C) + (B ⊗ C)
A ⊗ (B + C) = (A ⊗ B) + (A ⊗ C)
(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
(A ⊗ B)′ = A′ ⊗ B′.
(iii)
(A : B) : C = A : (B : C) = A : B : C
(A + B) : C = (A : C) + (B : C)
A : (B + C) = (A : B) + (A : C)
(AC) : (BD) = (A ⊗ B)(C : D)
A : (BD) = (A : B)D, if d_ij = 0 for i ≠ j.
(iv)
A ∘ B = B ∘ A
(A ∘ B) ∘ C = A ∘ (B ∘ C) = A ∘ B ∘ C
(A + B) ∘ C = (A ∘ C) + (B ∘ C)
(A₁B₁C₁) ∘ (A₂B₂C₂) = (A₁ : A₂)′ (B₁ ⊗ B₂) (C₁ : C₂)
(D A) ∘ (B D) = D (A ∘ B) D, if d_ij = 0 for i ≠ j
(A ∘ B)′ = A′ ∘ B′.
A2 Special Matrices
We will collect special matrices of symmetric, antisymmetric, diagonal, unity,
zero, idempotent, normal, orthogonal, orthonormal, positive-definite and posi-
tive-semidefinite, special orthonormal matrices, for instance of type Helmert or
of type Hankel.
unity  I_{n×n}:  a_ij = 0 for i ≠ j,  a_ij = 1 for i = j

zero matrix  0_{n×n}:  a_ij = 0 for all i, j ∈ {1, ..., n}

upper triangular:  a_ij = 0 for i > j
lower triangular:  a_ij = 0 for i < j
{X = [x1 x2; x3 x4] ∈ R^(2×2) | x1² + x2² = 1, x3² + x4² = 1, x1x3 + x2x4 = 0, x1x4 − x2x3 = +1}

(i)  X = [ cos φ   sin φ]  ∈ R^(2×2),  φ ∈ [0, 2π]
         [−sin φ   cos φ]

(iii)  X = [ (1 − x²)/(1 + x²)    2x/(1 + x²)      ]  ∈ R^(2×2),  x ∈ R
           [ −2x/(1 + x²)         (1 − x²)/(1 + x²)]
atlas of the special orthogonal group SO(n) has at least four distinct charts and
there is one with exactly four charts. (“minimal atlas”: Lusternik – Schnirelmann
category)
(i) X = (I_n + S)(I_n − S)⁻¹,
where S = −S′ is a skew matrix (antisymmetric matrix), is called a Cayley-Lipschitz representation of X ∈ SO(n).
(n!/(2(n − 2)!) = n(n − 1)/2 is the number of independent parameters/coordinates of X.)
(ii) If each of the matrices R₁, …, R_k is an n × n orthonormal matrix, then their product

R₁R₂ ⋯ R_{k−1}R_k ∈ SO(n).
Let a′ = [a₁, …, a_n] represent any row vector whose elements are all nonzero (a_i ≠ 0 for i ∈ {1, …, n}). Suppose that we require an n × n orthonormal matrix one row of which is proportional to a′. In what follows one such matrix R is derived.
Let [r₁′, …, r_n′] represent the rows of R and take the first row r₁′ to be the row of R that is proportional to a′. Take the second row r₂′ to be proportional to the n-dimensional row vector

[a₁, −a₁²/a₂, 0, 0, …, 0],  (H2)

the third row r₃′ proportional to

[a₁, a₂, −(a₁² + a₂²)/a₃, 0, 0, …, 0]  (H3)

and more generally the second through nth rows r₂′, …, r_n′ proportional to

[a₁, a₂, …, a_{k−1}, −Σ_{i=1}^{k−1} a_i²/a_k, 0, 0, …, 0]  (Hn−1)

for k ∈ {2, …, n},
respectively. Confirm for yourself that the n − 1 vectors (H_{n−1}) are orthogonal to each other and to the vector a′. In order to obtain explicit expressions for r₁′, …, r_n′ it remains to normalize a′ and the vectors (H_{n−1}). The Euclidean norm of the kth of the vectors (H_{n−1}) is

{Σ_{i=1}^{k−1} a_i² + (Σ_{i=1}^{k−1} a_i²)²/a_k²}^(1/2) = {(Σ_{i=1}^{k−1} a_i²)(Σ_{i=1}^{k} a_i²)/a_k²}^(1/2).

Accordingly, for the orthonormal vectors r₁′, …, r_n′ we finally find

(1st row)  r₁′ = [Σ_{i=1}^{n} a_i²]^(−1/2) (a₁, …, a_n)

(kth row)  r_k′ = [a_k² / ((Σ_{i=1}^{k−1} a_i²)(Σ_{i=1}^{k} a_i²))]^(1/2) (a₁, a₂, …, a_{k−1}, −Σ_{i=1}^{k−1} a_i²/a_k, 0, 0, …, 0)

(nth row)  r_n′ = [a_n² / ((Σ_{i=1}^{n−1} a_i²)(Σ_{i=1}^{n} a_i²))]^(1/2) [a₁, a₂, …, a_{n−1}, −Σ_{i=1}^{n−1} a_i²/a_n].
The recipe is complicated. When a′ = [1, 1, …, 1, 1], the Helmert factors in the 1st row, …, kth row, …, nth row simplify to

r₁′ = n^(−1/2) [1, 1, …, 1, 1] ∈ R^n

[r₁′; r₂′; …; r_{k−1}′; r_k′; …; r_{n−1}′; r_n′] ∈ SO(n)

Example (Helmert matrix of order 4):

[ 1/2     1/2     1/2     1/2  ]
[ 1/√2   −1/√2    0       0    ]
[ 1/√6    1/√6   −2/√6    0    ]  ∈ SO(4).
[ 1/√12   1/√12   1/√12  −3/√12]
Check that the rows are orthogonal and normalized.
Example (Helmert matrix of order n):
[ 1/√n             1/√n             1/√n            …   1/√n                    1/√n             ]
[ 1/√2            −1/√2             0               …   0                       0                ]
[ 1/√6             1/√6            −2/√6            …   0                       0                ]
[ …                                                                                              ]  ∈ SO(n).
[ 1/√((n−1)(n−2))  1/√((n−1)(n−2))  …  −(n−2)/√((n−1)(n−2))                     0                ]
[ 1/√(n(n−1))      1/√(n(n−1))      …   1/√(n(n−1))                            (1 − n)/√(n(n−1)) ]
Check that the rows are orthogonal and normalized. An example is the nth row
1 2n + n
2 2
1 1 (1 n ) n 1
+"+ + = + =
n ( n 1) n( n 1) n( n 1) n( n 1) n( n 1)
2
n n n ( n 1)
= = = 1,
n ( n 1) n( n 1)
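The Helmert construction just described is easy to sanity-check numerically. The following short Python sketch (our own illustration, not the book's code; the function name `helmert` is ours) builds the Helmert matrix of order n for a' = [1, ..., 1] and verifies that its rows are orthonormal.

```python
import math

def helmert(n):
    """Helmert matrix of order n: first row 1/sqrt(n) everywhere,
    k-th row (k >= 2) has 1/sqrt(k(k-1)) in the first k-1 slots,
    -(k-1)/sqrt(k(k-1)) in slot k and zeros elsewhere."""
    H = [[1.0 / math.sqrt(n)] * n]
    for k in range(2, n + 1):
        c = 1.0 / math.sqrt(k * (k - 1))
        H.append([c] * (k - 1) + [-(k - 1) * c] + [0.0] * (n - k))
    return H

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

H = helmert(4)
# H H' = I: every pair of rows is orthonormal
for i in range(4):
    for j in range(4):
        expected = 1.0 if i == j else 0.0
        assert abs(dot(H[i], H[j]) - expected) < 1e-12
```

For n = 4 the rows reproduce exactly the SO(4) example displayed above.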
A matrix whose entries are constant along each antidiagonal is determined by its first column $[a_{11}, a_{21}, \ldots, a_{n-1,1}, a_{n1}]'$ and its last row $[a_{n1}, a_{n2}, \ldots, a_{nm}]$:
$$\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{n-1,1} \\ a_{n1}\ \ a_{n2}\ \cdots\ a_{nm} \end{bmatrix}$$
A is a Hankel matrix.
Definition (Vandermonde matrix):
Vandermonde matrix: $V \in \mathbb{R}^{n \times n}$
$$V := \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \end{bmatrix}, \qquad \det V = \prod_{\substack{i,j=1 \\ i > j}}^{n} (x_i - x_j).$$
$$V := \begin{bmatrix} 1 & 1 & 1 \\ x_1 & x_2 & x_3 \\ x_1^2 & x_2^2 & x_3^2 \end{bmatrix}, \qquad \det V = (x_2 - x_1)(x_3 - x_2)(x_3 - x_1).$$
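The product formula for the Vandermonde determinant can be checked against a direct cofactor expansion. The sketch below is our own illustration (function names `vandermonde_det` and `det3` are ours), using the 3x3 case displayed above.

```python
def vandermonde_det(xs):
    # product formula: det V = prod over i > j of (x_i - x_j)
    p = 1.0
    for i in range(len(xs)):
        for j in range(i):
            p *= xs[i] - xs[j]
    return p

def det3(m):
    # cofactor expansion of a 3x3 determinant
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

xs = [2.0, 3.0, 5.0]
V = [[1.0, 1.0, 1.0], xs[:], [x * x for x in xs]]
# (3-2)(5-2)(5-3) = 6
assert abs(det3(V) - vandermonde_det(xs)) < 1e-9
```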
$$P = [a_1, a_2, a_3]$$
$$\begin{bmatrix} \sum_i \alpha_i x_i & \sum_i \alpha_i x_i^2 & \sum_i \alpha_i x_i^3 \\[2pt] \sum_i \alpha_i x_i^2 & \sum_i \alpha_i x_i^3 & \sum_i \alpha_i x_i^4 \\[2pt] \sum_i \alpha_i x_i^3 & \sum_i \alpha_i x_i^4 & \sum_i \alpha_i x_i^5 \end{bmatrix} \qquad (i = 1, 2, 3),$$
a Hankel matrix of weighted power sums.
$$\left.\begin{aligned} n &= 3 \\ m &= 2 \end{aligned}\right\}\quad K_{32} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$
Vectors for which there exist coefficients $x_1, \ldots, x_n$, not all equal to zero, such that the corresponding linear combination vanishes are called linearly dependent.
Let A be a rectangular matrix of the order $O(A) = n \times m$. The column rank of the matrix A is the largest number of linearly independent columns, while the row rank is the largest number of linearly independent rows. Actually, the column rank of the matrix A is identical to its row rank. The rank of a matrix is thus simply denoted by
$$\mathrm{rk}\,A.$$
Obviously,
$$\mathrm{rk}\,A \le \min\{n, m\}.$$
If $\mathrm{rk}\,A = n$ holds, we say that the matrix A has full row rank. In contrast, if the rank identity $\mathrm{rk}\,A = m$ holds, we say that the matrix A has full column rank.
Facts
A matrix A has the column space
$$R(A)$$
formed by its column vectors. The dimension of this vector space is $\dim R(A) = \mathrm{rk}\,A$. In particular,
$$R(A) = R(AA')$$
holds.
(i) $(A\,B)\,C = A\,(B\,C)$ (associativity)
are equivalent. The Cayley inverse $A^{-1}$ is left and right identical. The Cayley inverse is unique.
$$A := \begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix}, \qquad A_{11}' = A_{11},\ A_{22}' = A_{22}.$$
$$A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} [\mathrm{I} + A_{11}^{-1}A_{12}(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}A_{12}']A_{11}^{-1} & -A_{11}^{-1}A_{12}(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1} \\ -(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}A_{12}'A_{11}^{-1} & (A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1} \end{bmatrix}$$
if $A_{11}^{-1}$ exists,
$$A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} (A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1} & -(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{12}'(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1} & [\mathrm{I} + A_{22}^{-1}A_{12}'(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}A_{12}]A_{22}^{-1} \end{bmatrix}$$
if $A_{22}^{-1}$ exists.
$$S_{11} := A_{22} - A_{12}'A_{11}^{-1}A_{12} \quad\text{and}\quad S_{22} := A_{11} - A_{12}A_{22}^{-1}A_{12}'$$
are the minors determined by properly chosen rows and columns of the matrix A, called "Schur complements", such that
$$A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} (\mathrm{I} + A_{11}^{-1}A_{12}S_{11}^{-1}A_{12}')A_{11}^{-1} & -A_{11}^{-1}A_{12}S_{11}^{-1} \\ -S_{11}^{-1}A_{12}'A_{11}^{-1} & S_{11}^{-1} \end{bmatrix}$$
if $A_{11}^{-1}$ exists,
$$A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} S_{22}^{-1} & -S_{22}^{-1}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{12}'S_{22}^{-1} & (\mathrm{I} + A_{22}^{-1}A_{12}'S_{22}^{-1}A_{12})A_{22}^{-1} \end{bmatrix}$$
if $A_{22}^{-1}$ exists.
The formulae $S_{11}$ and $S_{22}$ were first used by J. Schur (1917). The term "Schur complements" was introduced by E. Haynsworth (1968). A. Albert (1969) replaced the Cayley inverse $A^{-1}$ by the Moore-Penrose inverse $A^+$. For a survey we recommend R. W. Cottle (1974), D. V. Ouellette (1981) and D. Carlson (1986).
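The Schur-complement inverse can be sanity-checked numerically. The sketch below is our own illustration (not the book's code); it uses scalar "blocks" A11 = 4, A12 = 1, A22 = 3 so that every block inverse is an ordinary reciprocal, and verifies A A^{-1} = I blockwise.

```python
# Scalar-block check of the partitioned inverse via the Schur complement S11.
A11, A12, A22 = 4.0, 1.0, 3.0

S11 = A22 - A12 * (1 / A11) * A12              # Schur complement of A11
B11 = (1 + (1 / A11) * A12 * (1 / S11) * A12) * (1 / A11)
B12 = -(1 / A11) * A12 * (1 / S11)
B22 = 1 / S11

# A * A^{-1} = I, checked block by block (B21 = B12 by symmetry)
assert abs(A11 * B11 + A12 * B12 - 1) < 1e-12
assert abs(A11 * B12 + A12 * B22) < 1e-12
assert abs(A12 * B11 + A22 * B12) < 1e-12
assert abs(A12 * B12 + A22 * B22 - 1) < 1e-12
```

The same identities hold verbatim for genuine matrix blocks once the reciprocals are replaced by matrix inverses.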
:Proof:
For the proof of the "inverse partitioned matrix" $A^{-1}$ (Cayley inverse) of the partitioned matrix A of full rank we apply Gauss elimination (without pivoting).
$$AA^{-1} = A^{-1}A = \mathrm{I}$$
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix},\quad A_{11}' = A_{11},\ A_{22}' = A_{22}, \qquad A_{11} \in \mathbb{R}^{m\times m},\ A_{12} \in \mathbb{R}^{m\times l},\ A_{12}' \in \mathbb{R}^{l\times m},\ A_{22} \in \mathbb{R}^{l\times l}$$
$$A^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{12}' & B_{22} \end{bmatrix},\quad B_{11}' = B_{11},\ B_{22}' = B_{22}, \qquad B_{11} \in \mathbb{R}^{m\times m},\ B_{12} \in \mathbb{R}^{m\times l},\ B_{12}' \in \mathbb{R}^{l\times m},\ B_{22} \in \mathbb{R}^{l\times l}$$
$AA^{-1} = A^{-1}A = \mathrm{I}$ is equivalent to the blockwise equations
$$\begin{aligned} A_{11}B_{11} + A_{12}B_{12}' &= \mathrm{I}_m, & B_{11}A_{11} + B_{12}A_{12}' &= \mathrm{I}_m & (1)\\ A_{11}B_{12} + A_{12}B_{22} &= 0, & B_{11}A_{12} + B_{12}A_{22} &= 0 & (2)\\ A_{12}'B_{11} + A_{22}B_{12}' &= 0, & B_{12}'A_{11} + B_{22}A_{12}' &= 0 & (3)\\ A_{12}'B_{12} + A_{22}B_{22} &= \mathrm{I}_l, & B_{12}'A_{12} + B_{22}A_{22} &= \mathrm{I}_l & (4). \end{aligned}$$
Case (i): $A_{11}^{-1}$ exists
"forward step"
$$\left.\begin{aligned} &A_{11}B_{11} + A_{12}B_{12}' = \mathrm{I}_m \ \text{(multiply by } A_{12}'A_{11}^{-1}\text{)}\\ &A_{12}'B_{11} + A_{22}B_{12}' = 0 \end{aligned}\right\}$$
$$\left.\begin{aligned} &A_{12}'B_{11} + A_{12}'A_{11}^{-1}A_{12}B_{12}' = A_{12}'A_{11}^{-1}\\ &A_{12}'B_{11} + A_{22}B_{12}' = 0 \end{aligned}\right\}$$
$$\begin{cases} A_{11}B_{11} + A_{12}B_{12}' = \mathrm{I}_m\\ (A_{22} - A_{12}'A_{11}^{-1}A_{12})B_{12}' = -A_{12}'A_{11}^{-1} \end{cases}$$
$$B_{12}' = -(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}A_{12}'A_{11}^{-1}$$
$$B_{12}' = -S_{11}^{-1}A_{12}'A_{11}^{-1}$$
or
$$\left.\begin{aligned} &A_{11}B_{11} + A_{12}B_{12}' = \mathrm{I}_m\\ &B_{12}' = -(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}A_{12}'A_{11}^{-1} \end{aligned}\right\}$$
$$B_{11} = A_{11}^{-1}(\mathrm{I}_m - A_{12}B_{12}') = (\mathrm{I}_m - B_{12}A_{12}')A_{11}^{-1}$$
$$B_{11} = [\mathrm{I}_m + A_{11}^{-1}A_{12}(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}A_{12}']A_{11}^{-1}$$
$$B_{11} = A_{11}^{-1} + A_{11}^{-1}A_{12}S_{11}^{-1}A_{12}'A_{11}^{-1}$$
$$B_{12} = -A_{11}^{-1}A_{12}B_{22} = -A_{11}^{-1}A_{12}(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}$$
$$B_{22} = (A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}$$
$$B_{22} = S_{11}^{-1}.$$
Case (ii): $A_{22}^{-1}$ exists
"forward step"
$$\left.\begin{aligned} &A_{11}B_{12} + A_{12}B_{22} = 0\\ &A_{12}'B_{12} + A_{22}B_{22} = \mathrm{I}_l \ \text{(multiply by } A_{12}A_{22}^{-1}\text{)} \end{aligned}\right\}$$
$$\left.\begin{aligned} &A_{11}B_{12} + A_{12}B_{22} = 0\\ &A_{12}A_{22}^{-1}A_{12}'B_{12} + A_{12}B_{22} = A_{12}A_{22}^{-1} \end{aligned}\right\}$$
$$\begin{cases} A_{12}'B_{12} + A_{22}B_{22} = \mathrm{I}_l\\ (A_{11} - A_{12}A_{22}^{-1}A_{12}')B_{12} = -A_{12}A_{22}^{-1} \end{cases}$$
$$B_{12} = -(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}A_{12}A_{22}^{-1}$$
$$B_{12} = -S_{22}^{-1}A_{12}A_{22}^{-1}$$
or, in terms of Gauss elimination,
$$\begin{bmatrix} \mathrm{I}_m & -A_{12}A_{22}^{-1} \\ 0 & \mathrm{I}_l \end{bmatrix} \begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix} = \begin{bmatrix} A_{11} - A_{12}A_{22}^{-1}A_{12}' & 0 \\ A_{12}' & A_{22} \end{bmatrix}.$$
"backward step"
$$\left.\begin{aligned} &A_{12}'B_{12} + A_{22}B_{22} = \mathrm{I}_l\\ &B_{12} = -(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}A_{12}A_{22}^{-1} \end{aligned}\right\}$$
$$B_{22} = A_{22}^{-1}(\mathrm{I}_l - A_{12}'B_{12}) = (\mathrm{I}_l - B_{12}'A_{12})A_{22}^{-1}$$
$$B_{22} = [\mathrm{I}_l + A_{22}^{-1}A_{12}'(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}A_{12}]A_{22}^{-1}$$
$$B_{22} = A_{22}^{-1} + A_{22}^{-1}A_{12}'S_{22}^{-1}A_{12}A_{22}^{-1}$$
$$A_{12}'B_{11} + A_{22}B_{12}' = 0 \ \text{(third left equation)}$$
$$B_{12}' = -A_{22}^{-1}A_{12}'B_{11} = -A_{22}^{-1}A_{12}'(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}$$
$$B_{11} = (A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}$$
$$B_{11} = S_{22}^{-1}.$$
□
h
The representations $\{B_{11}, B_{12}, B_{21} = B_{12}', B_{22}\}$ in terms of $\{A_{11}, A_{12}, A_{21} = A_{12}', A_{22}\}$ have been derived by T. Banachiewicz (1937). Generalizations can be found in T. Ando (1979), R. A. Brualdi and H. Schneider (1963), F. Burns, D. Carlson, E. Haynsworth and T. Markham (1974), D. Carlson (1980), C. D. Meyer (1973), S. K. Mitra (1982) and C. K. Li and R. Mathias (2000).
We leave the proof of the following fact as an exercise.
$$A := \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}.$$
$$A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}S_{11}^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}S_{11}^{-1} \\ -S_{11}^{-1}A_{21}A_{11}^{-1} & S_{11}^{-1} \end{bmatrix}$$
if $A_{11}^{-1}$ exists ($S_{11} := A_{22} - A_{21}A_{11}^{-1}A_{12}$),
$$A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} S_{22}^{-1} & -S_{22}^{-1}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}S_{22}^{-1} & A_{22}^{-1} + A_{22}^{-1}A_{21}S_{22}^{-1}A_{12}A_{22}^{-1} \end{bmatrix}$$
if $A_{22}^{-1}$ exists ($S_{22} := A_{11} - A_{12}A_{22}^{-1}A_{21}$).
(1992). (s10) has been noted by W. J. Duncan (1944) and L. Guttman (1946). The result is directly derived from the identity
$$(A + CBD)(A + CBD)^{-1} = \mathrm{I}.$$
□
Certain results follow directly from their definitions.
Facts (inverses):
(i) $(A \cdot B)^{-1} = B^{-1} \cdot A^{-1}$
(ii) $(A')^{-1} = (A^{-1})'$
(iii) A positive definite $\Rightarrow$ $A^{-1}$ positive definite
(iv) if A and B are positive definite, then $(A \circ B)$, $(A \circ B)^{-1}$ and $(A^{-1} \circ B^{-1})$ are positive definite, and $(A^{-1} \circ B^{-1}) - (A \circ B)^{-1}$ is positive semidefinite, as well as $(A^{-1} \circ A) - \mathrm{I}$ and $\mathrm{I} - (A^{-1} \circ A)^{-1}$ ($\circ$ the Hadamard product).
plays a similar role as a second scalar measure. Here the summation extends over all permutations $(j_1, \ldots, j_n)$ of the set of integers $(1, \ldots, n)$; $\Phi(j_1, \ldots, j_n)$ is the number of transpositions which transform $(1, \ldots, n)$ into $(j_1, \ldots, j_n)$.
Laws (determinant)
(i) $|\alpha A| = \alpha^n |A|$ for an arbitrary scalar $\alpha \in \mathbb{R}$
(ii) $|A\,B| = |A|\,|B|$
(iii) $|A \otimes B| = |A|^m\,|B|^n$ if A is an $n \times n$ and B an $m \times m$ matrix
(iv) $|A'| = |A|$
(v) $|\tfrac{1}{2}(A + A')| \le |A|$ if $A + A'$ is positive definite
(vi) $|A^{-1}| = |A|^{-1}$ if $A^{-1}$ exists
(vii) $|A| = 0 \Leftrightarrow$ A is singular ($A^{-1}$ does not exist)
(viii) $|A| = 0$ if A is idempotent, $A \ne \mathrm{I}$
(ix) $|A| = \prod_{i=1}^{n} a_{ii}$ if A is a diagonal or a triangular matrix
(x) $0 \le |A| \le \prod_{i=1}^{n} a_{ii} = |A \circ \mathrm{I}|$ if A is positive definite
(xi) $|A|\,|B| \le |A| \prod_{i=1}^{n} b_{ii} \le |A \circ B|$ if A and B are positive definite ($\circ$ the Hadamard product)
(xii)
$$\det\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = \begin{cases} \det A_{11}\,\det(A_{22} - A_{21}A_{11}^{-1}A_{12}), & A_{11} \in \mathbb{R}^{m_1\times m_1},\ \mathrm{rk}\,A_{11} = m_1,\\[2pt] \det A_{22}\,\det(A_{11} - A_{12}A_{22}^{-1}A_{21}), & A_{22} \in \mathbb{R}^{m_2\times m_2},\ \mathrm{rk}\,A_{22} = m_2. \end{cases}$$
A3 Scalar Measures and Inverse Matrices 505
$$A = [a_{ij}] = [a_{ji}] = A' \ \Rightarrow\ \mathrm{vech}\,A := \begin{bmatrix} a_{11} \\ \vdots \\ a_{n1} \\ a_{22} \\ \vdots \\ a_{n2} \\ \vdots \\ a_{nn} \end{bmatrix}.$$
(iii) Let A be a quadratic, antisymmetric matrix, $A = -A'$, of order $O(A) = n \times n$. Then veck A ("vec-skew") is the $[n(n-1)/2] \times 1$ vector which is generated by columnwise stacking of those matrix elements which are below its diagonal.
$$A = [a_{ij}] = [-a_{ji}] = -A' \ \Rightarrow\ \mathrm{veck}\,A := \begin{bmatrix} a_{21} \\ \vdots \\ a_{n1} \\ a_{32} \\ \vdots \\ a_{n2} \\ \vdots \\ a_{n,n-1} \end{bmatrix}.$$
Examples
(i) $$A = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \ \Rightarrow\ \mathrm{vec}\,A = [a, d, b, e, c, f]'$$
(ii) $$A = \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} = A' \ \Rightarrow\ \mathrm{vech}\,A = [a, b, c, d, e, f]'$$
(iii) $$A = \begin{bmatrix} 0 & -a & -b & -c \\ a & 0 & -d & -e \\ b & d & 0 & -f \\ c & e & f & 0 \end{bmatrix} = -A' \ \Rightarrow\ \mathrm{veck}\,A = [a, b, c, d, e, f]'.$$
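The three stacking operators are simple to implement. The sketch below is our own illustration (function names are ours); it reproduces exactly the three examples above, with matrices stored as lists of rows.

```python
def vec(A):
    # stack the columns of A on top of each other
    m, n = len(A), len(A[0])
    return [A[i][j] for j in range(n) for i in range(m)]

def vech(A):
    # columnwise stack of the elements on and below the diagonal
    n = len(A)
    return [A[i][j] for j in range(n) for i in range(j, n)]

def veck(A):
    # columnwise stack of the elements strictly below the diagonal
    n = len(A)
    return [A[i][j] for j in range(n) for i in range(j + 1, n)]

A = [['a', 'b', 'c'], ['d', 'e', 'f']]
assert vec(A) == ['a', 'd', 'b', 'e', 'c', 'f']

S = [['a', 'b', 'c'], ['b', 'd', 'e'], ['c', 'e', 'f']]
assert vech(S) == ['a', 'b', 'c', 'd', 'e', 'f']

K = [[0, '-a', '-b', '-c'], ['a', 0, '-d', '-e'],
     ['b', 'd', 0, '-f'], ['c', 'e', 'f', 0]]
assert veck(K) == ['a', 'b', 'c', 'd', 'e', 'f']
```

Note the lengths: vec gives mn entries, vech gives n(n+1)/2, veck gives n(n-1)/2.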
Useful identities relating scalar- and vector-valued measures of matrices will be reported finally.
$$U_2 := [u_{r+1}, \ldots, u_m], \qquad O(U_2) = m \times (m - r), \qquad A\,U_2 = 0.$$
$$\Lambda^{1/2} := \mathrm{Diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_r}).$$
$$A^+ := U_1\Lambda^{-1}U_1'$$
is the representation of its pseudoinverse, namely
(i) $$AA^+A = (U_1\Lambda U_1')(U_1\Lambda^{-1}U_1')(U_1\Lambda U_1') = U_1\Lambda U_1' = A$$
(ii) $$A^+AA^+ = (U_1\Lambda^{-1}U_1')(U_1\Lambda U_1')(U_1\Lambda^{-1}U_1') = U_1\Lambda^{-1}U_1' = A^+$$
(iii) $$AA^+ = (U_1\Lambda U_1')(U_1\Lambda^{-1}U_1') = U_1U_1' = (AA^+)'.$$
At the end we compute the eigenvalues and eigenvectors which relate to the variational problem $x'Ax = \mathrm{extr}$ subject to the condition $x'x = 1$, namely
$$x'Ax + \lambda(1 - x'x) = \mathop{\mathrm{extr}}_{x,\,\lambda}.$$
A6 Generalized Inverses
Because Cayley inversion is only possible for quadratic nonsingular matrices, we introduce a slightly more general definition in order to invert arbitrary matrices A of the order $O(A) = n \times m$ by so-called generalized inverses or, for short, g-inverses.
An $m \times n$ matrix G is called a g-inverse of the matrix A if it fulfils the equation
$$AGA = A$$
in the sense of Cayley multiplication. Such g-inverses always exist; they are unique if and only if A is a nonsingular quadratic matrix. In this case
$$G = A^{-1} \ \text{if } A \text{ is invertible};$$
in the other cases we use the notation
$$G = A^- \ \text{if } A^{-1} \text{ does not exist}.$$
For the rank of all g-inverses the inequality
$$r := \mathrm{rk}\,A \le \mathrm{rk}\,A^- \le \min\{n, m\}$$
holds. Conversely, for any given number d in this interval there exists a g-inverse $A^-$ such that
$$d = \mathrm{rk}\,A^- = \dim R(A^-).$$
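The non-uniqueness of g-inverses is easy to see in a toy computation. The sketch below is our own illustration (not the book's code): for a singular diagonal matrix, a whole one-parameter family of matrices G satisfies AGA = A.

```python
def matmul(A, B):
    # plain list-of-rows matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2.0, 0.0], [0.0, 0.0]]          # singular: rk A = 1

for t in (0.0, 1.0, -5.0):            # a family of g-inverses of A
    G = [[0.5, 0.0], [0.0, t]]
    assert matmul(matmul(A, G), A) == A   # AGA = A for every t
```

The free parameter t changes rk G between 1 and 2, illustrating that rk A <= rk A^- <= min{n, m} can be attained for any value in between.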
Reflexive g-inverses fulfil in addition
$$A_r^-AA_r^- = A_r^-,$$
but they are not necessarily symmetric for symmetric matrices A. In general,
$$A = A' \ \text{and} \ A^- \ \text{g-inverse of} \ A \ \Rightarrow\ (A^-)' \ \text{g-inverse of} \ A.$$
$$A(A'A)^-A' \quad\text{and}\quad A'(AA')^-A$$
can be used to generate special g-inverses of $A'A$ or $AA'$. For instance,
$$A_\ell^- := (A'A)^-A' \quad\text{and}\quad A_m^- := A'(AA')^-$$
have the special reproducing properties
$$A(A'A)^-A'A = AA_\ell^-A = A$$
and
$$AA'(AA')^-A = AA_m^-A = A,$$
which can be generalized, in case that W and S are positive semidefinite matrices, to
$$WA(A'WA)^-A'WA = WA$$
$$ASA'(ASA')^-AS = AS,$$
where the matrices
$$WA(A'WA)^-A'W \quad\text{and}\quad SA'(ASA')^-AS$$
are independent of the choice of the g-inverses $(A'WA)^-$ and $(ASA')^-$.
A beautiful interpretation of the various g-inverses is based on the fact that the matrices
$$(AA^-)(AA^-) = (AA^-A)A^- = AA^- \quad\text{and}\quad (A^-A)(A^-A) = A^-(AA^-A) = A^-A$$
are idempotent and can therefore be geometrically interpreted as projections. The image of $AA^-$, namely
$$R(AA^-) = R(A) = \{Ax \mid x \in \mathbb{R}^m\} \subset \mathbb{R}^n,$$
is independent of the special choice of the g-inverse $A^-$; the complementary subspaces are determined by $R(A^-A)$ and $N(AA^-)$, respectively:
$$\mathbb{R}^n = R(AA^-) \oplus N(AA^-), \qquad \mathbb{R}^m = R(A^-A) \oplus N(A^-A).$$
(i) $AA^+A = A$ (g-inverse)
(ii) $A^+AA^+ = A^+$ (reflexivity)
(iii) $AA^+ = (AA^+)'$, (iv) $A^+A = (A^+A)'$ (symmetry due to orthogonal projection).
$$R(AA^+) = R(G) \perp N(G') = N(AA^+) = N(A').$$
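The four Penrose conditions can be verified numerically on a small example. The sketch below is our own (not the book's code); it uses the fact that for the rank-one symmetric matrix A = vv' with v = [1, 2], the pseudoinverse is A+ = vv'/(v'v)^2 = A/25.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def T(M):
    return [list(r) for r in zip(*M)]

def close(M, N, tol=1e-12):
    return all(abs(M[i][j] - N[i][j]) < tol
               for i in range(len(M)) for j in range(len(M[0])))

A  = [[1.0, 2.0], [2.0, 4.0]]                  # symmetric, rank 1
Ap = [[v / 25.0 for v in row] for row in A]    # pseudoinverse A+ = A/25

assert close(matmul(matmul(A, Ap), A), A)      # (i)   A A+ A = A
assert close(matmul(matmul(Ap, A), Ap), Ap)    # (ii)  A+ A A+ = A+
assert close(matmul(A, Ap), T(matmul(A, Ap)))  # (iii) A A+ symmetric
assert close(matmul(Ap, A), T(matmul(Ap, A)))  # (iv)  A+ A symmetric
```

A generic g-inverse satisfies only (i); the reflexive ones satisfy (i) and (ii); the pseudoinverse is singled out by all four conditions.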
If we want to give up the orthogonality conditions, in the case of a quadratic matrix $A = GF$ we could take advantage of the projections
$$A_r^-A = AA_r^-$$
and postulate
$$R(A_r^-A) = R(AA_r^-) = R(G),$$
$$N(AA_r^-) = N(A_r^-A) = N(F).$$
In consequence, if FG is a nonsingular matrix, we enjoy the representation
$$A_r^- := G(FG)^{-1}F,$$
which reduces, in case that A is a symmetric matrix, to the pseudoinverse $A^+$.
Dual methods of computing g-inverses $A^-$ are based on bases of the null space, both for F and G, or for A and $A'$. On the one side we need the matrix $E_F$ defined by
$$FE_F' = 0, \quad \mathrm{rk}\,E_F = m - r \qquad\text{versus}\qquad G'E_{G'} = 0, \quad \mathrm{rk}\,E_{G'} = n - r$$
on the other side. The enlarged matrix of the order $(n + m - r) \times (n + m - r)$ is automatically nonsingular and has the Cayley inverse
$$\begin{bmatrix} A & E_{G'} \\ E_F & 0 \end{bmatrix}^{-1} = \begin{bmatrix} A^+ & E_F^+ \\ E_{G'}^+ & 0 \end{bmatrix}$$
with the pseudoinverse $A^+$ in the upper left block. Details can be found in A. Ben-Israel and T. N. E. Greville (1974, p. 228).
If the null spaces are normalized in the sense of
$$\langle E_F \mid E_F' \rangle = \mathrm{I}_{m-r}, \qquad \langle E_{G'}' \mid E_{G'} \rangle = \mathrm{I}_{n-r},$$
then, because of
$$E_F^+ = E_F'\langle E_F \mid E_F' \rangle^{-1} = E_F'$$
and
$$E_{G'}^+ = \langle E_{G'}' \mid E_{G'} \rangle^{-1}E_{G'}' = E_{G'}',$$
$$\begin{bmatrix} A & E_{G'} \\ E_F & 0 \end{bmatrix}^{-1} = \begin{bmatrix} A^+ & E_F' \\ E_{G'}' & 0 \end{bmatrix}.$$
These formulas gain a special structure if the matrix A is symmetric of order $O(A) = m \times m$. In this case
$$E_{G'} = E_F' =: E', \qquad O(E) = (m - r) \times m, \quad \mathrm{rk}\,E = m - r,$$
and
$$\begin{bmatrix} A & E' \\ E & 0 \end{bmatrix}^{-1} = \begin{bmatrix} A^+ & E'\langle E \mid E' \rangle^{-1} \\ \langle E \mid E' \rangle^{-1}E & 0 \end{bmatrix}.$$
On the basis of such a relation, namely $EA^+ = 0$, there follows
$$\mathrm{I}_m = AA^+ + E'\langle E \mid E' \rangle^{-1}E = (A + E'E)[A^+ + E'(EE'EE')^{-1}E]$$
and, with the projection (S-transformation),
$$A^+A = \mathrm{I}_m - E'\langle E \mid E' \rangle^{-1}E = (A + E'E)^{-1}A$$
and
$$A^+ = (A + E'E)^{-1} - E'(EE'EE')^{-1}E$$
as the pseudoinverse of A,
$$R(A^+A) = R(AA^+) = R(A) \perp N(A) = R(E').$$
In the case of a symmetric, reflexive g-inverse $A_{rs}^-$ there holds the orthogonality or complementarity
$$R(A_{rs}^-A) \perp N(AA_{rs}^-),$$
$$N(AA_{rs}^-) \ \text{complementary to} \ R(AA_{rs}^-),$$
which is guaranteed by a matrix K, $\mathrm{rk}\,K = m - r$, $O(K) = (m - r) \times m$, such that $KE'$ is a nonsingular matrix.
At the same time we take advantage of the bordering of the matrix A by K and $K'$, by a nonsingular matrix of the order $(2m - r) \times (2m - r)$:
$$\begin{bmatrix} A & K' \\ K & 0 \end{bmatrix}^{-1} = \begin{bmatrix} A_{rs}^- & K^R \\ (K^R)' & 0 \end{bmatrix}.$$
$K^R := E'(KE')^{-1}$ is the right inverse of K. Obviously, we gain the symmetric reflexive g-inverse $A_{rs}^-$ whose columns are orthogonal to $K'$.
For the special case of a symmetric and positive semidefinite $m \times m$ matrix A the matrix sets U and V reduce to one. Based on the matrix decomposition
$$A = [U_1, U_2] \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} U_1' \\ U_2' \end{bmatrix} = U_1\Lambda U_1',$$
we find the different g-inverses listed as follows:
$$A^- = [U_1, U_2] \begin{bmatrix} \Lambda^{-1} & L_{12} \\ L_{21} & L_{21}\Lambda L_{12} \end{bmatrix} \begin{bmatrix} U_1' \\ U_2' \end{bmatrix},$$
$$A^+ = [U_1, U_2] \begin{bmatrix} \Lambda^{-1} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} U_1' \\ U_2' \end{bmatrix} = U_1\Lambda^{-1}U_1' = (A + U_2U_2')^{-1} - U_2U_2'.$$
The main target of our discussion of various g-inverses is the easy handling of representations of solutions of arbitrary linear equations and their characterizations.
We depart from the solution of a consistent system of linear equations:
$$Ax = c, \quad O(A) = n \times m, \quad c \in R(A) \ \Rightarrow\ x = A^-c \ \text{for any g-inverse} \ A^-.$$
$x = A^-c$ is, as $A^-$ varies over all g-inverses, the general solution of such a linear system of equations. If we want to use one special g-inverse, we can represent the general solution by
$$x = A^-c + (\mathrm{I}_m - A^-A)z \quad\text{for all} \ z \in \mathbb{R}^m,$$
since the subspaces $N(A)$ and $R(\mathrm{I}_m - A^-A)$ are identical. We test the consistency of our system by means of the identity
$$AA^-c = c:$$
c is mapped by the projection $AA^-$ to itself.
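The consistency test and the general solution can be exercised on a tiny singular system. The sketch below is our own illustration (not the book's code); A is a rank-one matrix, G is one of its g-inverses, and every member of the solution family solves Ax = c.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[1.0, 2.0], [2.0, 4.0]]       # rk A = 1, singular
G = [[1.0, 0.0], [0.0, 0.0]]       # one g-inverse: A G A = A holds
c = matvec(A, [1.0, 1.0])          # c = [3, 6] lies in R(A) by construction

assert matvec(A, matvec(G, c)) == c    # consistency test: A A^- c = c

GA = matmul(G, A)
x0 = matvec(G, c)                      # particular solution x = A^- c
for z in ([0.0, 1.0], [7.0, -2.0]):    # homogeneous part (I - A^- A) z
    x = [x0[i] + z[i] - matvec(GA, z)[i] for i in range(2)]
    assert matvec(A, x) == c           # every member solves A x = c
```

If c were chosen outside R(A), the consistency test A A^- c = c would fail and no solution would exist.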
Similarly we solve the matrix equation $AXB = C$ by the consistency test: the existence of a solution is granted by the identity $AA^-CB^-B = C$. The general solution reads
$$X = A^-CB^- + Z - A^-AZBB^-,$$
where Z is an arbitrary matrix of suitable order. We can use arbitrary g-inverses $A^-$ and $B^-$, for instance the pseudoinverses $A^+$ and $B^+$, which for $Z = 0$ coincide with two-sided orthogonal projections.
How can we reduce the matrix equation $AXB = C$ to a vector equation? The vec-operator is the door opener:
$$AXB = C \ \Leftrightarrow\ (B' \otimes A)\,\mathrm{vec}\,X = \mathrm{vec}\,C.$$
The general solution of our matrix equation reads
$$\mathrm{vec}\,X = (B' \otimes A)^-\,\mathrm{vec}\,C + [\mathrm{I} - (B' \otimes A)^-(B' \otimes A)]\,\mathrm{vec}\,Z.$$
Here we can use the identity
$$(A \otimes B)^- = A^- \otimes B^-,$$
generated by two g-inverses of the Kronecker-Zehfuss product.
At this end we solve the more general equation $Ax = By$ of consistent type, $R(B) \subset R(A)$, by
Lemma (consistent system of homogeneous equations $Ax = By$):
Given the homogeneous system of linear equations $Ax = By$, constrained by $By \in R(A)$. Then the solution $x = Ly$ can be given under the condition
$$R(B) \subset R(A).$$
In this case the matrix L may be decomposed by
$$L = A^-B \quad\text{for a certain g-inverse} \ A^-.$$
Appendix B: Matrix Analysis
A short version of matrix analysis is presented. Arbitrary derivatives of scalar-valued, vector-valued and matrix-valued vector and matrix functions of functionally independent variables are defined. Extensions for differentiating symmetric and antisymmetric matrices are given. Special examples for functionally dependent matrix variables are reviewed.
B1 Derivatives of Scalar-valued and Vector-valued Vector Functions
Here we present the analysis of differentiating scalar-valued and vector-valued vector functions, enriched by examples.
Definition (derivative of a scalar-valued vector function):
Let a scalar-valued function $f(x)$ of a vector x be given, where $x'$ is of the order $O(x') = 1 \times m$ (row vector); then we call
$$Df(x) = [D_1 f(x), \ldots, D_m f(x)] := \frac{\partial f}{\partial x'}$$
the first derivative of $f(x)$ with respect to $x'$.
$$\Big[\frac{\partial f_{ij}}{\partial x_{kl}}\Big] = \big[J_{ijkl}\big] \in \mathbb{R}^{n \times q \times m \times p}$$
is four-dimensional and automatically outside the usual frame of matrix algebra of two-dimensional arrays. By means of the operations vec F and vec X we will vectorize the matrices F and X. Accordingly we will take advantage of the derivative $J_F$ of the vector vec F(X) with respect to the vector vec X, a two-dimensional array.
B2 Derivatives of Trace Forms 523
Examples
(i) $$f(x) = x'Ax = a_{11}x_1^2 + (a_{12} + a_{21})x_1x_2 + a_{22}x_2^2$$
$$Df(x) = [D_1 f(x), D_2 f(x)] = \frac{\partial f}{\partial x'} = [2a_{11}x_1 + (a_{12} + a_{21})x_2 \mid (a_{12} + a_{21})x_1 + 2a_{22}x_2] = x'(A + A')$$
(ii) $$f(x) = Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix}$$
$$J_F = Df(x) = \frac{\partial f}{\partial x'} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A.$$
$$\frac{\partial\,\mathrm{tr}(AX)}{\partial(\mathrm{vech}\,X)'} = [\mathrm{vech}(A' + A - \mathrm{Diag}(a_{11}, \ldots, a_{nn}))]', \ \text{if the } n \times n \text{ matrix } X \text{ is symmetric};$$
$$\frac{\partial\,\mathrm{tr}(AX)}{\partial(\mathrm{veck}\,X)'} = [\mathrm{veck}(A' - A)]', \ \text{if the } n \times n \text{ matrix } X \text{ is antisymmetric};$$
for instance
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}.$$
$$\frac{\partial}{\partial X} = \begin{bmatrix} \partial/\partial x_{11} & \partial/\partial x_{12} \\ \partial/\partial x_{21} & \partial/\partial x_{22} \end{bmatrix}, \qquad \frac{\partial\,\mathrm{tr}(AX)}{\partial X} = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} = A'.$$
Case # 2: "the $n \times n$ matrix X is symmetric: $X = X'$"
$$x_{12} = x_{21}$$
$$\mathrm{tr}(AX) = a_{11}x_{11} + (a_{12} + a_{21})x_{21} + a_{22}x_{22}$$
$$\frac{\partial}{\partial X} = \begin{bmatrix} \dfrac{\partial}{\partial x_{11}} & \dfrac{dx_{21}}{dx_{12}}\dfrac{\partial}{\partial x_{21}} \\[6pt] \dfrac{\partial}{\partial x_{21}} & \dfrac{\partial}{\partial x_{22}} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial}{\partial x_{11}} & \dfrac{\partial}{\partial x_{21}} \\[6pt] \dfrac{\partial}{\partial x_{21}} & \dfrac{\partial}{\partial x_{22}} \end{bmatrix}$$
$$\frac{\partial\,\mathrm{tr}(AX)}{\partial X} = \begin{bmatrix} a_{11} & a_{12} + a_{21} \\ a_{12} + a_{21} & a_{22} \end{bmatrix} = A' + A - \mathrm{Diag}(a_{11}, \ldots, a_{nn}).$$
Case # 3: "the $n \times n$ matrix X is antisymmetric: $X = -X'$"
$$\frac{\partial\,\mathrm{tr}(AX)}{\partial X} = \begin{bmatrix} 0 & a_{21} - a_{12} \\ a_{12} - a_{21} & 0 \end{bmatrix} = A' - A.$$
Let us now assume that the matrix X of variables $x_{ij}$ always consists of functionally independent elements. We note some useful identities of first derivatives.
Scalar-valued functions of vectors:
$$\frac{\partial}{\partial x'}(a'x) = a' \qquad (B1)$$
$$\frac{\partial}{\partial x'}(x'Ax) = x'(A + A'). \qquad (B2)$$
$$\frac{\partial}{\partial X}\,\mathrm{tr}\,X^\alpha = \alpha\,(X')^{\alpha-1}, \ \text{if } X \text{ is quadratic}; \qquad (B7)$$
especially:
$$\frac{\partial\,\mathrm{tr}\,X}{\partial(\mathrm{vec}\,X)'} = (\mathrm{vec}\,\mathrm{I})'.$$
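Identity (B2) can be verified by finite differences. The sketch below is our own illustration (not the book's code): it compares the analytic row vector x'(A + A') with central difference quotients of the quadratic form.

```python
def f(x, A):
    # quadratic form x'Ax for a 2x2 matrix A
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

A = [[1.0, 2.0], [3.0, 4.0]]
x = [0.5, -1.5]

# analytic derivative: the k-th entry of x'(A + A')
analytic = [sum(x[i] * (A[i][k] + A[k][i]) for i in range(2)) for k in range(2)]

# central finite differences
h = 1e-6
numeric = []
for k in range(2):
    xp = x[:]; xp[k] += h
    xm = x[:]; xm[k] -= h
    numeric.append((f(xp, A) - f(xm, A)) / (2 * h))

assert all(abs(a - b) < 1e-6 for a, b in zip(analytic, numeric))
```

The same pattern (analytic formula versus difference quotient) is a convenient check for any of the identities (B1), (B2), (B7).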
$$\frac{\partial}{\partial X}|AXBXC| = B'X'A'(\mathrm{adj}\,AXBXC)'C' + A'(\mathrm{adj}\,AXBXC)'C'X'B'; \qquad (B11)$$
$$\frac{\partial}{\partial X}|XBX| = B'X'(\mathrm{adj}\,XBX)' + (\mathrm{adj}\,XBX)'X'B';$$
especially:
$$\frac{\partial\,|X^2|}{\partial(\mathrm{vec}\,X)'} = \big(\mathrm{vec}[X'\mathrm{adj}(X^2)' + \mathrm{adj}(X^2)'X']\big)'.$$
$$\frac{\partial(\mathrm{vec}\,X'X)}{\partial(\mathrm{vec}\,X)'} = (\mathrm{I}_{p^2} + K_{pp})(\mathrm{I}_p \otimes X') \quad\text{for all } X \in \mathbb{R}^{m \times p}$$
$$\frac{\partial(\mathrm{vec}\,X^{-1})}{\partial(\mathrm{vec}\,X)'} = -(X^{-1})' \otimes X^{-1} \quad\text{if } X \text{ is nonsingular}$$
$$\frac{\partial(\mathrm{vec}\,X^\alpha)}{\partial(\mathrm{vec}\,X)'} = \sum_{j=1}^{\alpha}(X')^{\alpha-j} \otimes X^{j-1} \quad\text{for all } \alpha \in \mathbb{N}, \text{ if } X \text{ is a square matrix}$$
$$\frac{\partial\,\mathrm{vec}(X \otimes Y)}{\partial(\mathrm{vec}\,X)'} = \mathrm{I}_p \otimes [(K_{qm} \otimes \mathrm{I}_n)(\mathrm{I}_m \otimes \mathrm{vec}\,Y)],$$
$$\frac{\partial\,\mathrm{vec}(X \otimes Y)}{\partial(\mathrm{vec}\,Y)'} = [(\mathrm{I}_p \otimes K_{qm})(\mathrm{vec}\,X \otimes \mathrm{I}_q)] \otimes \mathrm{I}_n.$$
$$\mathrm{tr}(AX) = (a_{12} - a_{21})\,x_{21}$$
$$\frac{\partial}{\partial(\mathrm{veck}\,X)'} = \frac{\partial}{\partial x_{21}}, \qquad \frac{\partial\,\mathrm{tr}(AX)}{\partial(\mathrm{veck}\,X)'} = a_{12} - a_{21},$$
$$\frac{\partial\,\mathrm{tr}(AX)}{\partial(\mathrm{veck}\,X)'} = [\mathrm{veck}(A' - A)]' = \Big[\mathrm{veck}\,\frac{\partial\,\mathrm{tr}(AX)}{\partial X}\Big]'.$$
$$Df(x) = \frac{\partial f}{\partial x'} = [2a_{11}x_1 + (a_{12} + a_{21})x_2 \mid (a_{12} + a_{21})x_1 + 2a_{22}x_2]$$
(ii) $$f(x) = Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix}$$
$$Df(x) = \frac{\partial f}{\partial x'} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A$$
$$DD'f(x) = \frac{\partial^2 f}{\partial x\,\partial x'} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad O(DD'f(x)) = 2 \times 2.$$
$$H_F = \frac{\partial}{\partial(\mathrm{vec}\,X)'} \otimes J_F = \frac{\partial}{\partial(\mathrm{vec}\,X)'} \otimes \frac{\partial}{\partial(\mathrm{vec}\,X)'}\,\mathrm{vec}\,F(X), \qquad \frac{\partial}{\partial(\mathrm{vec}\,X)'} = \Big[\frac{\partial}{\partial x_{11}}, \frac{\partial}{\partial x_{21}}, \frac{\partial}{\partial x_{12}}, \frac{\partial}{\partial x_{22}}\Big]$$
$$= \begin{bmatrix} 2 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 2 \end{bmatrix}$$
$$O(H_F) = 4 \times 16.$$
At the end we want to define the derivative of order l of a matrix-valued matrix function, whose structure is derived from the postulate of a suitable array.
Definition (l-th derivative of a matrix-valued matrix function):
Let F(X) be an $n \times q$ matrix-valued function of an $m \times p$ matrix of functionally independent variables X. The $nq \times m^lp^l$ matrix of the l-th derivative is defined by
$$D^lF(X) := \underbrace{\frac{\partial}{\partial(\mathrm{vec}\,X)'} \otimes \cdots \otimes \frac{\partial}{\partial(\mathrm{vec}\,X)'}}_{l\text{-times}}\,\mathrm{vec}\,F(X) = \frac{\partial^l\,\mathrm{vec}\,F(X)}{\underbrace{\partial(\mathrm{vec}\,X)' \otimes \cdots \otimes \partial(\mathrm{vec}\,X)'}_{l\text{-times}}} \quad\text{for all } l \in \mathbb{N}.$$
Appendix C: Lagrange Multipliers
$$\mathrm{rk}\Big(\frac{\partial F_i}{\partial x_j}\Big) = r < m. \qquad (C3)$$
$$(\mathbf{x}_1, \mathbf{x}_2) \mapsto F(\mathbf{x}_1, \mathbf{x}_2) = \begin{bmatrix} F_1(x_1, \ldots, x_{m-r};\ x_{m-r+1}, \ldots, x_m) \\ F_2(x_1, \ldots, x_{m-r};\ x_{m-r+1}, \ldots, x_m) \\ \ldots \\ F_{r-1}(x_1, \ldots, x_{m-r};\ x_{m-r+1}, \ldots, x_m) \\ F_r(x_1, \ldots, x_{m-r};\ x_{m-r+1}, \ldots, x_m) \end{bmatrix} \qquad (C6)$$
defines a continuously differentiable function with $F(\mathbf{x}_1, \mathbf{x}_2) = 0$. In case of a Jacobi determinant j not zero, or a Jacobi matrix J of rank r,
$$j := \det J \ne 0 \quad\text{or}\quad \mathrm{rk}\,J = r, \qquad J := \frac{\partial(F_1, \ldots, F_r)}{\partial(x_{m-r+1}, \ldots, x_m)}, \qquad (C7)$$
there exist neighborhoods $U := U_\delta(\mathbf{x}_1) \subset \mathbb{R}^{m-r}$ and $V := U_\delta(\mathbf{x}_2) \subset \mathbb{R}^r$ such that the equation $F(\mathbf{x}_1, \mathbf{x}_2) = 0$ for any $\mathbf{x}_1 \in U$ has exactly one solution in V,
$$\mathbf{x}_2 = G(\mathbf{x}_1) \quad\text{or}\quad \begin{bmatrix} x_{m-r+1} \\ x_{m-r+2} \\ \ldots \\ x_{m-1} \\ x_m \end{bmatrix} = \begin{bmatrix} G_1(x_1, \ldots, x_{m-r}) \\ G_2(x_1, \ldots, x_{m-r}) \\ \ldots \\ G_{r-1}(x_1, \ldots, x_{m-r}) \\ G_r(x_1, \ldots, x_{m-r}) \end{bmatrix}. \qquad (C8)$$
:Example C1:
$$F_1(x_1, x_2, x_3) = Z(x, y, z) = 0 \ \Rightarrow\ y = \pm\tfrac{1}{2}\sqrt{2}\,\sqrt{1 - x^2}$$
$$F_2(x_1, x_2, x_3) = E(x, y, z) = 0 \ \Rightarrow\ z = \tfrac{3}{4}x$$
$${}^1f(x_1, x_2, x_3) = {}^1f(x, y, z) = f\Big(x,\ +\tfrac{1}{2}\sqrt{2}\sqrt{1 - x^2},\ \tfrac{3}{4}x\Big) = \frac{x}{4} - \frac{1}{2}\sqrt{2}\,\sqrt{1 - x^2}$$
$${}^2f(x_1, x_2, x_3) = {}^2f(x, y, z) = f\Big(x,\ -\tfrac{1}{2}\sqrt{2}\sqrt{1 - x^2},\ \tfrac{3}{4}x\Big) = \frac{x}{4} + \frac{1}{2}\sqrt{2}\,\sqrt{1 - x^2}$$
$${}^1f'(x) = 0: \quad \frac{1}{4} + \frac{1}{\sqrt{2}}\,\frac{x}{\sqrt{1 - x^2}} = 0 \ \Rightarrow\ {}^1x = -\frac{1}{3}$$
$${}^2f'(x) = 0: \quad \frac{1}{4} - \frac{1}{\sqrt{2}}\,\frac{x}{\sqrt{1 - x^2}} = 0 \ \Rightarrow\ {}^2x = +\frac{1}{3}$$
$${}^1f\Big({-\frac{1}{3}}\Big) = -\frac{3}{4} \ \text{(minimum)}, \qquad {}^2f\Big({+\frac{1}{3}}\Big) = +\frac{3}{4} \ \text{(maximum)}.$$
At the position $x = -1/3$, $y = 2/3$, $z = -1/4$ we find a global minimum, but at the position $x = +1/3$, $y = -2/3$, $z = +1/4$ a global maximum.
$$F_1(x_1, x_2, x_3) = Z(x, y, z) = x^2 + 2y^2 - 1 = 0$$
$$F_2(x_1, x_2, x_3) = E(x, y, z) = 3x - 4z = 0,$$
representing an elliptic cylinder and a plane. In this case the $(m-r)$-dimensional surface $M_F$ is the intersection manifold of the elliptic cylinder and of the plane, namely the $m - r = 1$ dimensional manifold in $\mathbb{R}^3$, a "spatial curve". Secondly, the risk function $f(x_1, \ldots, x_m) = \mathrm{extr}$ generates an $(m-1)$-dimensional surface $M_f$ which is a special level surface. The level parameter of the $(m-1)$-dimensional surface $M_f$ should be extremal. In our Example C1 the risk function can be interpreted as the plane
$$f(x_1, x_2, x_3) = f(x, y, z) = x - y - z.$$
We summarize our result within Lemma C2.
:Proof:
$$t_k(p) := \frac{\partial \mathbf{x}}{\partial x_k}\Big|_{\mathbf{x} = p} \in T_pM_F \quad (k = 1, \ldots, m - r).$$
:Example C2:
Let the $m - r = 2$ dimensional level surface $M_F$ of the sphere $S_r^2 \subset \mathbb{R}^3$ of radius r ("level parameter $r^2$") be given by the side condition
$$F(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - r^2 = 0.$$
:Normal space:
$$\mathbf{n}(p) = \mathrm{grad}\,F(p) = \mathbf{e}_1\frac{\partial F}{\partial x_1} + \mathbf{e}_2\frac{\partial F}{\partial x_2} + \mathbf{e}_3\frac{\partial F}{\partial x_3} = [\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3]\begin{bmatrix} 2x_1 \\ 2x_2 \\ 2x_3 \end{bmatrix}_p.$$
The orthonormal vectors $[\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3]$ span $\mathbb{R}^3$. The normal space is generated locally by the normal vector $\mathbf{n}(p) = \mathrm{grad}\,F(p)$.
:Tangent space:
The implicit representation is the characteristic element of the level surface. In order to gain an explicit representation, we take advantage of the Implicit Function Theorem according to the following equations:
$$\left.\begin{aligned} &F(x_1, x_2, x_3) = 0\\ &\mathrm{rk}\Big(\frac{\partial F}{\partial x_j}\Big) = r = 1 \end{aligned}\right\} \ \Rightarrow\ x_3 = G(x_1, x_2)$$
$$x_1^2 + x_2^2 + x_3^2 - r^2 = 0 \quad\text{and}\quad \Big(\frac{\partial F}{\partial x_j}\Big) = [2x_1,\ 2x_2,\ 2x_3], \quad \mathrm{rk}\Big(\frac{\partial F}{\partial x_j}\Big) = 1$$
$$x_3 = G(x_1, x_2) = +\sqrt{r^2 - (x_1^2 + x_2^2)}.$$
The negative root leads into another domain of the sphere; here holds the domain $0 < x_1 < r$, $0 < x_2 < r$, $r^2 - (x_1^2 + x_2^2) > 0$.
$$\mathbf{x}(p) = \mathbf{e}_1x_1 + \mathbf{e}_2x_2 + \mathbf{e}_3\sqrt{r^2 - (x_1^2 + x_2^2)},$$
$$t_1(p) = \frac{\partial \mathbf{x}}{\partial x_1}(p) = \mathbf{e}_1 - \mathbf{e}_3\frac{x_1}{\sqrt{r^2 - (x_1^2 + x_2^2)}} = [\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3]\begin{bmatrix} 1 \\ 0 \\ -\dfrac{x_1}{\sqrt{r^2 - (x_1^2 + x_2^2)}} \end{bmatrix}$$
$$t_2(p) = \frac{\partial \mathbf{x}}{\partial x_2}(p) = \mathbf{e}_2 - \mathbf{e}_3\frac{x_2}{\sqrt{r^2 - (x_1^2 + x_2^2)}} = [\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3]\begin{bmatrix} 0 \\ 1 \\ -\dfrac{x_2}{\sqrt{r^2 - (x_1^2 + x_2^2)}} \end{bmatrix}.$$
:Example C3:
Let us assume that there is given a point $X \in \mathbb{R}^3$. Unknown is the point on the $m - r = 2$ dimensional level surface $M_F$ of type sphere $S_r^2 \subset \mathbb{R}^3$ which is at extremal distance, either minimal or maximal, from the point $X \in \mathbb{R}^3$.
The distance function $\|X - x\|^2$ for $X \in \mathbb{R}^3$ and $x \in S_r^2$ describes the risk function
$$f(x_1, x_2, x_3) = (X_1 - x_1)^2 + (X_2 - x_2)^2 + (X_3 - x_3)^2 = R^2 = \mathop{\mathrm{extr}}_{x_1, x_2, x_3};$$
$$\mathbf{n}(p) := \mathrm{grad}\,F(p) = [\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3]\begin{bmatrix} 2x_1 \\ 2x_2 \\ 2x_3 \end{bmatrix}$$
is an element of the normal space $N_pM_F$.
The normal equation:
$$\mathrm{grad}\,f(p) = \lambda\,\mathrm{grad}\,F(p).$$
$$\mathcal{L}(x_1, \ldots, x_m;\ \lambda_1, \ldots, \lambda_r) = f(x_1, \ldots, x_m) - \sum_{i=1}^{r}\lambda_iF_i(x_1, \ldots, x_m) = \mathop{\mathrm{extr}}_{x_1, \ldots, x_m;\ \lambda_1, \ldots, \lambda_r}$$
$$\begin{cases} \dfrac{\partial\mathcal{L}}{\partial x_j} = \dfrac{\partial f}{\partial x_j} - \displaystyle\sum_{i=1}^{r}\lambda_i\dfrac{\partial F_i}{\partial x_j} = 0 & (j = 1, \ldots, m)\\[6pt] \dfrac{\partial\mathcal{L}}{\partial \lambda_i} = -F_i(x_j) = 0 & (i = 1, \ldots, r). \end{cases}$$
:Example C4:
We continue our third example by solving the alternative system of equations.
$$\mathcal{L}(x_1, x_2, x_3; \lambda) = (X_1 - x_1)^2 + (X_2 - x_2)^2 + (X_3 - x_3)^2 - \lambda(x_1^2 + x_2^2 + x_3^2 - r^2) = \mathop{\mathrm{extr}}_{x_1, x_2, x_3;\ \lambda}$$
$$\left.\begin{aligned} \frac{\partial\mathcal{L}}{\partial x_j} &= -2(X_j - x_j) - 2\lambda x_j = 0 \quad (j = 1, 2, 3)\\ \frac{\partial\mathcal{L}}{\partial \lambda} &= -(x_1^2 + x_2^2 + x_3^2 - r^2) = 0 \end{aligned}\right\}$$
$$\left.\begin{aligned} x_1 &= \frac{X_1}{1 - \lambda};\ x_2 = \frac{X_2}{1 - \lambda};\ x_3 = \frac{X_3}{1 - \lambda}\\ x_1^2 &+ x_2^2 + x_3^2 - r^2 = 0 \end{aligned}\right\}$$
$$\frac{X_1^2 + X_2^2 + X_3^2}{(1 - \lambda)^2} - r^2 = 0 \ \Leftrightarrow\ -(1 - \lambda)^2r^2 + X_1^2 + X_2^2 + X_3^2 = 0$$
$$(1 - \lambda)^2 = \frac{X_1^2 + X_2^2 + X_3^2}{r^2} \ \Leftrightarrow\ 1 - \lambda_{1,2} = \pm\frac{1}{r}\sqrt{X_1^2 + X_2^2 + X_3^2}$$
$$\lambda_{1,2} = 1 \mp \frac{1}{r}\sqrt{X_1^2 + X_2^2 + X_3^2} = \frac{r \mp \sqrt{X_1^2 + X_2^2 + X_3^2}}{r}$$
$$(x_1)_{1,2} = \pm\frac{rX_1}{\sqrt{X_1^2 + X_2^2 + X_3^2}}, \qquad (x_2)_{1,2} = \pm\frac{rX_2}{\sqrt{X_1^2 + X_2^2 + X_3^2}}, \qquad (x_3)_{1,2} = \pm\frac{rX_3}{\sqrt{X_1^2 + X_2^2 + X_3^2}}.$$
$$H = \Big(\frac{\partial^2\mathcal{L}}{\partial x_j\,\partial x_k}\Big) = \big(2\delta_{jk}(1 - \lambda)\big) = 2(1 - \lambda)\,\mathrm{I}_3$$
Our example illustrates how we can find the global optimum under side conditions by means of the technique of Lagrange multipliers.
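The closed-form result of Example C4, namely the extremal points x = +-rX/||X||, can be confirmed numerically. The sketch below is our own illustration (not the book's code), using a concrete point X with ||X|| = 13 and a sphere of radius r = 2.

```python
import math

X = (3.0, 4.0, 12.0)                       # given point, ||X|| = 13
r = 2.0                                    # sphere radius
norm = math.sqrt(sum(t * t for t in X))    # 13

near = tuple(+r * t / norm for t in X)     # minimal-distance point on the sphere
far  = tuple(-r * t / norm for t in X)     # maximal-distance point on the sphere

def dist(p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(X, p)))

assert abs(dist(near) - (norm - r)) < 1e-12   # minimum distance ||X|| - r = 11
assert abs(dist(far)  - (norm + r)) < 1e-12   # maximum distance ||X|| + r = 15
```

Both candidate points lie on the ray through X, exactly as the Lagrange analysis predicts: the gradient of the risk function is parallel to the gradient of the side condition.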
:Example C5:
Search for the global extremum of the function $f(x_1, x_2, x_3)$ subject to the two side conditions $F_1(x_1, x_2, x_3) = 0$ and $F_2(x_1, x_2, x_3) = 0$, namely
$$f(x_1, x_2, x_3) = f(x, y, z) = x - y - z \quad\text{(plane)}$$
$$\begin{cases} F_1(x_1, x_2, x_3) = Z(x, y, z) := x^2 + 2y^2 - 1 = 0 & \text{(elliptic cylinder)}\\ F_2(x_1, x_2, x_3) = E(x, y, z) := 3x - 4z = 0 & \text{(plane)} \end{cases}$$
$$J = \Big(\frac{\partial F_i}{\partial x_j}\Big) = \begin{bmatrix} 2x & 4y & 0 \\ 3 & 0 & -4 \end{bmatrix}, \qquad \mathrm{rk}\,J\,(x \ne 0 \ \text{or} \ y \ne 0) = r = 2.$$
:Variational Problem:
$$\mathcal{L}(x_1, x_2, x_3; \lambda_1, \lambda_2) = \mathcal{L}(x, y, z; \lambda, \mu) = x - y - z - \lambda(x^2 + 2y^2 - 1) - \mu(3x - 4z) = \mathop{\mathrm{extr}}_{x_1, x_2, x_3;\ \lambda, \mu}$$
$$\left.\begin{aligned} \frac{\partial\mathcal{L}}{\partial x} &= 1 - 2\lambda x - 3\mu = 0\\ \frac{\partial\mathcal{L}}{\partial y} &= -1 - 4\lambda y = 0 \ \Rightarrow\ \lambda = -\frac{1}{4y}\\ \frac{\partial\mathcal{L}}{\partial z} &= -1 + 4\mu = 0 \ \Rightarrow\ \mu = \frac{1}{4}\\ \frac{\partial\mathcal{L}}{\partial \lambda} &= -(x^2 + 2y^2 - 1) = 0\\ \frac{\partial\mathcal{L}}{\partial \mu} &= -(3x - 4z) = 0. \end{aligned}\right\}$$
We multiply the first equation $\partial\mathcal{L}/\partial x$ by $4y$, the second equation $\partial\mathcal{L}/\partial y$ by $(-2x)$, the third equation $\partial\mathcal{L}/\partial z$ by $3y$, and add:
$$4y - 8\lambda xy - 12\mu y + 2x + 8\lambda xy - 3y + 12\mu y = y + 2x = 0.$$
Inserting $y = -2x$ into the side condition $x^2 + 2y^2 = 1$ yields $9x^2 = 1$, hence $x = \pm 1/3$, $y = \mp 2/3$, $z = \tfrac{3}{4}x = \pm 1/4$.
542 Appendix C: Lagrange Multipliers
$$H = \Big(\frac{\partial^2\mathcal{L}}{\partial x_j\,\partial x_k}\Big) = \begin{bmatrix} -2\lambda & 0 & 0 \\ 0 & -4\lambda & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
$$H\Big(\lambda_1 = -\frac{3}{8}\Big) = \begin{bmatrix} 3/4 & 0 & 0 \\ 0 & 3/2 & 0 \\ 0 & 0 & 0 \end{bmatrix} \ge 0 \ \text{(minimum)}: \quad (x, y, z; \lambda, \mu)_1 = \Big({-\frac{1}{3}}, \frac{2}{3}, -\frac{1}{4}; -\frac{3}{8}, \frac{1}{4}\Big)$$
is the restricted minimal solution point;
$$H\Big(\lambda_2 = +\frac{3}{8}\Big) = \begin{bmatrix} -3/4 & 0 & 0 \\ 0 & -3/2 & 0 \\ 0 & 0 & 0 \end{bmatrix} \le 0 \ \text{(maximum)}: \quad (x, y, z; \lambda, \mu)_2 = \Big({+\frac{1}{3}}, -\frac{2}{3}, +\frac{1}{4}; +\frac{3}{8}, \frac{1}{4}\Big)$$
is the restricted maximal solution point.
The geometric interpretation of the Hesse matrix follows from E. Grafarend and P. Lohse (1991). The matrix of second derivatives H decides whether at the points $(x_1, x_2, x_3, \lambda)_{1,2}$ we enjoy a maximum or a minimum.
Appendix D: Sampling distributions and their use: confidence intervals and confidence regions
D1 A first vehicle: Transformation of random variables
If the probability density function (p.d.f.) of a random vector $y = [y_1, \ldots, y_n]'$ is known, but we want to derive the p.d.f. of a random vector $x = [x_1, \ldots, x_n]'$ which is generated by an injective mapping $x = g(y)$, or $x_i = g_i[y_1, \ldots, y_n]$ for all $i \in \{1, \ldots, n\}$, we need the results of Lemma D1.
Lemma D1 (transformation of p.d.f.):
Let the random vector $y := [y_1, \ldots, y_n]'$ be transformed into the random vector $x = [x_1, \ldots, x_n]'$ by an injective mapping $x = g(y)$, or $x_i = g_i[y_1, \ldots, y_n]$ for all $i \in \{1, \ldots, n\}$, which is of continuity class $C^1$ (first derivatives are continuous). Let the Jacobi matrix $J_x := (\partial g_i/\partial y_j)$ be regular ($\det J_x \ne 0$); then the inverse transformation $y = g^{-1}(x)$, or $y_i = g_i^{-1}[x_1, \ldots, x_n]$, is unique. Let $f_x(x_1, \ldots, x_n)$ be the unknown p.d.f., but $f_y(y_1, \ldots, y_n)$ the given p.d.f.; then
$$f_x(x_1, \ldots, x_n) = f_y\big(g_1^{-1}(x_1, \ldots, x_n), \ldots, g_n^{-1}(x_1, \ldots, x_n)\big)\,|\det J_y|.$$
$$y_1^\pm = \frac{1}{2}x_1 \pm \sqrt{\frac{1}{4}x_1^2 - \frac{1}{2}x_2}$$
$$y_2^\pm = \frac{1}{2}x_1 \mp \sqrt{\frac{1}{4}x_1^2 - \frac{1}{2}x_2}.$$
At first we computed the Jacobi matrix $J_x$; secondly we aimed at an inversion of the direct transformation $(y_1, y_2) \mapsto (x_1, x_2)$. As the detailed inversion step proves, namely the solution of a quadratic equation, the mapping $x = g(y)$ is not injective.
Example D2:
Suppose $(x_1, x_2)$ is a random vector having the p.d.f.
$$f_x(x_1, x_2) = \begin{cases} \exp(-x_1 - x_2), & x_1 \ge 0,\ x_2 \ge 0\\ 0, & \text{otherwise}. \end{cases}$$
$$f_y(y_1, y_2) = \begin{cases} \dfrac{y_1}{(1 + y_2)^2}\exp(-y_1), & y_1 > 0,\ y_2 > 0\\ 0, & \text{otherwise}. \end{cases}$$
Proof:
The probability that the random variables $y_1, \ldots, y_n$ take on values in the region $\Omega_y$ is given by the integral of $f_y$ over $\Omega_y$.
In applying the transformation theorems of p.d.f. we meet quite often the problem that the function $x_i = g_i(y_1, \ldots, y_n)$ for all $i \in \{1, \ldots, n\}$ is given, but not the inverse function $y_i = g_i^{-1}(x_1, \ldots, x_n)$ for all $i \in \{1, \ldots, n\}$. Then the following results are helpful.
Corollary D2 (Jacobian):
If the Jacobian $|\det J_x| = |\det(\partial g_i/\partial y_j)|$ is given, we are able to compute
$$|\det J_y| = \Big|\det\frac{\partial g_i^{-1}(x_1, \ldots, x_n)}{\partial x_j}\Big| = |\det J_x|^{-1} = \Big|\det\frac{\partial g_i(y_1, \ldots, y_n)}{\partial y_j}\Big|^{-1}.$$
Example D3 (Jacobian):
Let us continue Example D2. The inverse map:
$$y = \begin{bmatrix} g_1^{-1}(x_1, x_2) \\ g_2^{-1}(x_1, x_2) \end{bmatrix} = \begin{bmatrix} x_1 + x_2 \\ x_2/x_1 \end{bmatrix} \ \Rightarrow\ \frac{\partial y}{\partial x'} = \begin{bmatrix} 1 & 1 \\ -x_2/x_1^2 & 1/x_1 \end{bmatrix}$$
$$j_y = |\det J_y| = \Big|\det\frac{\partial y}{\partial x'}\Big| = \frac{1}{x_1} + \frac{x_2}{x_1^2} = \frac{x_1 + x_2}{x_1^2}$$
$$j_x = |\det J_x| = j_y^{-1} = |\det J_y|^{-1} = \Big|\det\frac{\partial x}{\partial y'}\Big| = \frac{x_1^2}{x_1 + x_2} = \frac{y_1}{(1 + y_2)^2}$$
$$x = \begin{bmatrix} g_1(y_1, y_2) \\ g_2(y_1, y_2) \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} y_1/(1 + y_2) \\ y_1y_2/(1 + y_2) \end{bmatrix}$$
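The reciprocal relation of Corollary D2 is easy to check at a single point. The sketch below is our own illustration (not the book's code), using the transformation of Examples D2 and D3.

```python
# direct map x = g(y): x1 = y1/(1+y2), x2 = y1*y2/(1+y2)
y1, y2 = 2.0, 3.0
x1 = y1 / (1 + y2)          # = 0.5
x2 = y1 * y2 / (1 + y2)     # = 1.5

jy = (x1 + x2) / x1 ** 2    # |det dy/dx'| from the worked example
jx = y1 / (1 + y2) ** 2     # |det dx/dy'|

assert abs(jx * jy - 1.0) < 1e-12   # |det J_y| = |det J_x|^{-1}
```

The same one-line check works for any point with y1 > 0, y2 > 0, i.e. everywhere the transformation is regular.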
For the special case that the Jacobi matrix is given in a partitioned form, the results of Corollary D3 are useful.
Corollary D3 (Jacobian):
If the Jacobi matrix $J_x$ is given in the partitioned form
$$J_x = \Big(\frac{\partial g_i}{\partial y_j}\Big) = \begin{bmatrix} U \\ V \end{bmatrix},$$
then
if $\det(UU') \ne 0$,
if $\det(VV') \ne 0$.
Proof:
The proof is based upon the determinantal relations of a partitioned matrix of type
$$\det\begin{bmatrix} A & U \\ V & D \end{bmatrix} = \det A\,\det(D - VA^{-1}U) \quad\text{if } \det A \ne 0$$
$$\det\begin{bmatrix} A & U \\ V & D \end{bmatrix} = \det D\,\det(A - UD^{-1}V) \quad\text{if } \det D \ne 0$$
$$\det\begin{bmatrix} A & U \\ V & D \end{bmatrix} = D\,\det A - V(\mathrm{adj}\,A)\,U \quad\text{if } D \text{ is a scalar},$$
which have been introduced by G. Frobenius (1908): Über Matrizen aus positiven Elementen, Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, 471-476, Berlin 1908, and J. Schur (1917): Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind, J. reine und angew. Math. 147 (1917), 205-232.
Table D1
Cartesian and polar coordinates of a two-dimensional observation space; total measure of the arc of the circle:
$$dy_1\,dy_2 = r\,dr\,d\phi_1$$
$$S^1 := \{y \in \mathbb{R}^2 \mid y_1^2 + y_2^2 = 1\}$$
$$\omega_1 = \int_0^{2\pi}d\phi_1 = 2\pi.$$
Table D2
Cartesian and polar coordinates of a three-dimensional observation space; total measure of the surface of the 2-sphere:
$$(\phi_1, \phi_2, r) \in [0, 2\pi] \times \Big]{-\frac{\pi}{2}}, \frac{\pi}{2}\Big[ \times\ ]0, \infty[\,, \qquad (y_1, y_2, y_3) \in \mathbb{R}^3$$
$$\omega_2 = \int_0^{2\pi}d\phi_1 \int_{-\pi/2}^{+\pi/2}d\phi_2\,\cos\phi_2 = 4\pi.$$
Table D3
Cartesian and polar coordinates of a four-dimensional observation space; total measure of the hypersurface of the 3-sphere:
$$y_1 = r\cos\phi_3\cos\phi_2\cos\phi_1, \quad y_2 = r\cos\phi_3\cos\phi_2\sin\phi_1, \quad y_3 = r\cos\phi_3\sin\phi_2, \quad y_4 = r\sin\phi_3,$$
$$(\phi_1, \phi_2, \phi_3, r) \in [0, 2\pi] \times \Big]{-\frac{\pi}{2}}, \frac{\pi}{2}\Big[ \times \Big]{-\frac{\pi}{2}}, \frac{\pi}{2}\Big[ \times\ ]0, \infty[$$
$$dy_1\,dy_2\,dy_3\,dy_4 = r^3\cos^2\phi_3\cos\phi_2\,dr\,d\phi_3\,d\phi_2\,d\phi_1$$
$$J_y := \frac{\partial(y_1, y_2, y_3, y_4)}{\partial(\phi_1, \phi_2, \phi_3, r)} = \begin{bmatrix} -r\cos\phi_3\cos\phi_2\sin\phi_1 & -r\cos\phi_3\sin\phi_2\cos\phi_1 & -r\sin\phi_3\cos\phi_2\cos\phi_1 & \cos\phi_3\cos\phi_2\cos\phi_1 \\ +r\cos\phi_3\cos\phi_2\cos\phi_1 & -r\cos\phi_3\sin\phi_2\sin\phi_1 & -r\sin\phi_3\cos\phi_2\sin\phi_1 & \cos\phi_3\cos\phi_2\sin\phi_1 \\ 0 & +r\cos\phi_3\cos\phi_2 & -r\sin\phi_3\sin\phi_2 & \cos\phi_3\sin\phi_2 \\ 0 & 0 & r\cos\phi_3 & \sin\phi_3 \end{bmatrix}$$
$$|\det J_y| = r^3\cos^2\phi_3\cos\phi_2$$
$$\omega_3 = 2\pi^2.$$
Lemma D4 (polar coordinates, hypervolume element, hypersurface element):
$$(\phi_1, \phi_2, \ldots, \phi_{n-2}, \phi_{n-1}, r) \in [0, 2\pi] \times \Big]{-\frac{\pi}{2}}, +\frac{\pi}{2}\Big[ \times \cdots \times \Big]{-\frac{\pi}{2}}, +\frac{\pi}{2}\Big[ \times\ ]0, \infty[\,;$$
then the local hypervolume element is
$$dy_1\cdots dy_n = r^{n-1}\,dr\,\cos^{n-2}\phi_{n-1}\cos^{n-3}\phi_{n-2}\cdots\cos^2\phi_3\cos\phi_2\,d\phi_{n-1}\,d\phi_{n-2}\cdots d\phi_3\,d\phi_2\,d\phi_1,$$
as well as the global hypersurface measure
$$\omega_{n-1} = \frac{2\pi^{n/2}}{\Gamma(n/2)} := \int_{-\pi/2}^{+\pi/2}\cos^{n-2}\phi_{n-1}\,d\phi_{n-1}\cdots\int_{-\pi/2}^{+\pi/2}\cos\phi_2\,d\phi_2\int_0^{2\pi}d\phi_1,$$
where $\Gamma(x)$ is the gamma function.
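The closed form of the hypersurface measure, here taken as omega_{n-1} = 2 pi^{n/2} / Gamma(n/2), reproduces the values of Tables D1 to D3. The sketch below is our own illustration (not the book's code); the function name `omega` is ours.

```python
import math

def omega(n):
    # total surface measure of the unit (n-1)-sphere embedded in R^n
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)

assert abs(omega(2) - 2 * math.pi) < 1e-12        # circle, Table D1
assert abs(omega(3) - 4 * math.pi) < 1e-12        # 2-sphere, Table D2
assert abs(omega(4) - 2 * math.pi ** 2) < 1e-12   # 3-sphere, Table D3
```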
Before we care for the proof, let us define Euler’s gamma function.
Definition D5 (Euler's gamma function):
$$\Gamma(x) = \int_0^\infty e^{-t}t^{x-1}\,dt\qquad(x>0)$$
subject to
$$\Gamma(n+1) = n!\ \text{ if } n\in\mathbb Z^+\text{ is an integer},\qquad \Gamma\!\left(\frac pq\right) = \frac{p-q}{q}\,\Gamma\!\left(\frac{p-q}{q}\right)\ \text{ if } p/q\in\mathbb Q^+\text{ is a rational number},$$
for instance
$$\text{(i)}\ \Gamma(1) = 1 \hspace{3cm} \text{(i)}\ \Gamma\!\left(\tfrac12\right) = \sqrt\pi$$
$$\text{(ii)}\ \Gamma(2) = 1 \hspace{3cm} \text{(ii)}\ \Gamma\!\left(\tfrac32\right) = \tfrac12\Gamma\!\left(\tfrac12\right) = \tfrac12\sqrt\pi$$
$$\text{(iii)}\ \Gamma(3) = 1\cdot2 = 2 \hspace{2cm} \text{(iii)}\ \Gamma\!\left(\tfrac52\right) = \tfrac32\Gamma\!\left(\tfrac32\right) = \tfrac34\sqrt\pi$$
$$\text{(iv)}\ \Gamma(4) = 1\cdot2\cdot3 = 6 \hspace{1.6cm} \text{(iv)}\ \Gamma\!\left(\tfrac72\right) = \tfrac52\Gamma\!\left(\tfrac52\right) = \tfrac{15}{8}\sqrt\pi.$$
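The recursion and the special values of Definition D5 can be replayed with Python's standard library; a minimal sketch (`math.gamma` implements Euler's gamma function):

```python
# Numerical check of Definition D5 using only the standard library.
import math

assert math.isclose(math.gamma(1.0), 1.0)
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))
# recursion Gamma(p/q) = ((p-q)/q) Gamma((p-q)/q), e.g. Gamma(7/2) = (5/2) Gamma(5/2)
assert math.isclose(math.gamma(3.5), 2.5 * math.gamma(2.5))
# Gamma(n+1) = n! for positive integers n
for n in range(1, 8):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))
# special value (iv): Gamma(7/2) = (15/8) sqrt(pi)
assert math.isclose(math.gamma(3.5), 15.0 / 8.0 * math.sqrt(math.pi))
print("all gamma identities verified")
```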
Proof:
Our proof of Lemma D4 will be based upon computing the image of the tangent space $T_y\mathbb S^{n-1}\subset\mathbb E^n$ of the hypersphere $\mathbb S^{n-1}\subset\mathbb E^n$. Let us embed the hypersphere $\mathbb S^{n-1}$, parameterized by $(\phi_1,\phi_2,\dots,\phi_{n-2},\phi_{n-1})$, in $\mathbb E^n$, parameterized by $(y_1,\dots,y_n)$, namely $\mathbf y\in\mathbb E^n$,
$$\mathbf y = \mathbf e_1\,r\cos\phi_{n-1}\cos\phi_{n-2}\cdots\cos\phi_2\cos\phi_1 + \mathbf e_2\,r\cos\phi_{n-1}\cos\phi_{n-2}\cdots\cos\phi_2\sin\phi_1 + \cdots + \mathbf e_{n-1}\,r\cos\phi_{n-1}\sin\phi_{n-2} + \mathbf e_n\,r\sin\phi_{n-1}.$$
$\{\mathbf g_1,\dots,\mathbf g_{n-1}\}$ span the image of the tangent space in $\mathbb E^n$; $\mathbf g_n$ is the hypersphere normal vector, $\|\mathbf g_n\| = 1$. From the inner products $\langle\mathbf g_i\mid\mathbf g_j\rangle = g_{ij}$, $i,j\in\{1,\dots,n\}$, we derive the Gauss matrix of the metric $\mathbf G := [g_{ij}]$:
$$\langle\mathbf g_1\mid\mathbf g_1\rangle = r^2\cos^2\phi_{n-1}\cos^2\phi_{n-2}\cdots\cos^2\phi_3\cos^2\phi_2$$
$$\langle\mathbf g_2\mid\mathbf g_2\rangle = r^2\cos^2\phi_{n-1}\cos^2\phi_{n-2}\cdots\cos^2\phi_3$$
$$\cdots$$
$$\langle\mathbf g_{n-1}\mid\mathbf g_{n-1}\rangle = r^2,\qquad \langle\mathbf g_n\mid\mathbf g_n\rangle = 1.$$
The off-diagonal elements of the Gauss matrix of the metric are zero. Accordingly the square root $\sqrt{\det\mathbf G}$ elegantly represents the Jacobian determinant
$$J_y := \frac{\partial(y_1,y_2,\dots,y_n)}{\partial(\phi_1,\phi_2,\dots,\phi_{n-1},r)} = \sqrt{\det\mathbf G}.$$
$$\int_0^{2\pi}d\phi_1 = 2\pi$$
$$\int_{-\pi/2}^{+\pi/2}\cos\phi_2\,d\phi_2 = 2$$
$$\int_{-\pi/2}^{+\pi/2}\cos^2\phi_3\,d\phi_3 = \frac12\left[\cos\phi_3\sin\phi_3+\phi_3\right]_{-\pi/2}^{+\pi/2} = \frac\pi2$$
$$\int_{-\pi/2}^{+\pi/2}\cos^3\phi_4\,d\phi_4 = \frac13\left[\cos^2\phi_4\sin\phi_4+2\sin\phi_4\right]_{-\pi/2}^{+\pi/2} = \frac43$$
$$\dots$$
D3 A first confidence interval of Gauss-Laplace normally distributed observations 553
$$\int_{-\pi/2}^{+\pi/2}(\cos\phi_{n-1})^{n-2}\,d\phi_{n-1} = \frac{1}{n-2}\left[(\cos\phi_{n-1})^{n-3}\sin\phi_{n-1}\right]_{-\pi/2}^{+\pi/2} + \frac{n-3}{n-2}\int_{-\pi/2}^{+\pi/2}(\cos\phi_{n-1})^{n-4}\,d\phi_{n-1}$$
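The angular integrals above multiply out to the closed form of Lemma D4. A numerical sketch, using the standard Wallis-type closed form of the cosine moment integral (an assumption of this sketch, not stated in the text):

```python
# Verify omega_1 = 2 pi, omega_2 = 4 pi, omega_3 = 2 pi^2 and the closed form
# omega_{n-1} = 2 pi^{n/2} / Gamma(n/2) of Lemma D4.
import math

def cos_integral(k):
    # int_{-pi/2}^{+pi/2} cos^k(phi) dphi = sqrt(pi) Gamma((k+1)/2) / Gamma(k/2+1)
    return math.sqrt(math.pi) * math.gamma((k + 1) / 2) / math.gamma(k / 2 + 1)

def omega(n):
    # total surface measure of the unit sphere S^{n-1} in R^n
    result = 2 * math.pi                 # int_0^{2 pi} dphi_1
    for k in range(1, n - 1):            # cos phi_2 up to cos^{n-2} phi_{n-1}
        result *= cos_integral(k)
    return result

for n, expected in [(2, 2 * math.pi), (3, 4 * math.pi), (4, 2 * math.pi**2)]:
    assert math.isclose(omega(n), expected)
    assert math.isclose(omega(n), 2 * math.pi**(n / 2) / math.gamma(n / 2))
print("omega_1, omega_2, omega_3 confirmed")
```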
The interval has a left boundary $l = x_1$ and a right boundary $r = x_2$; its length is $x_2 - x_1 = r - l$ and its center is $(x_1+x_2)/2$ or $\mu$. Here we have taken advantage of the Gauss-Laplace pdf in generating the cumulative probability
$$P(x_1<X<x_2) = F(x_2)-F(x_1),\qquad F(x_2)-F(x_1) = F(\mu+c\sigma)-F(\mu-c\sigma).$$
$$\int_{x_1}^{x_2} f(x\mid\mu,\sigma^2)\,dx = \gamma$$
$$\int_{x_1}^{x_2} f(x\mid\mu,\sigma^2)\,dx = \int_{-c}^{+c} f(z\mid 0,1)\,dz = \gamma$$
$$\int_{-c}^{+c} f(z\mid 0,1)\,dz = 2\int_0^{+c} f(z\mid 0,1)\,dz = \gamma$$
$$\gamma(z) = 2\int_0^{z} f(z^*)\,dz^*$$
advantage of the Gauss inequality which has been reviewed in this context by F. Pukelsheim (1994); there also the Vysochanskii-Petunin inequality has been discussed. We follow here a two-step procedure. First, we divide the domain $z\in[0,\infty[$ into subintervals. In the interval $z\in[0,1]$, $f(z)$ is differentiable and concave, $f''(z) = f(z)(z^2-1)<0$, while beyond $z=1$ it is differentiable and convex, $f''(z) = f(z)(z^2-1)>0$; $z=1$ is the point of inflection. Second, we set up Taylor series of $f(z)$ in the interval $z\in[0,1]$ at the point $z=0$, in the interval $z\in[1,2]$ at the point $z=1$, and in the interval $z\in[2,3]$ at the point $z=2$.
Three examples of such a forward solution of the characteristic linear Volterra
integral equation of the first kind will follow. They establish:
Box D1
Operational calculus applied to
the Gauss-Laplace probability distribution
"generating differential equation"
$$f''(z) + z f'(z) + f(z) = 0$$
subject to
$$\int_{-\infty}^{+\infty} f(z)\,dz = 1$$
"recursive differentiation"
$$f(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac12 z^2\right)$$
$$f'(z) = -z f(z) =: g(z)$$
$$f''(z) = g'(z) = -f(z) - z g(z) = (z^2-1) f(z)$$
$$f'''(z) = 2z f(z) + (z^2-1) g(z) = (-z^3+3z) f(z)$$
$$f^{(4)}(z) = (-3z^2+3) f(z) + (-z^3+3z) g(z) = (z^4-6z^2+3) f(z)$$
$$f^{(5)}(z) = (4z^3-12z) f(z) + (z^4-6z^2+3) g(z) = (-z^5+10z^3-15z) f(z)$$
$$f^{(6)}(z) = (-5z^4+30z^2-15) f(z) + (-z^5+10z^3-15z) g(z) = (z^6-15z^4+45z^2-15) f(z)$$
$$f^{(7)}(z) = (6z^5-60z^3+90z) f(z) + (z^6-15z^4+45z^2-15) g(z) = (-z^7+21z^5-105z^3+105z) f(z)$$
$$f^{(8)}(z) = (-7z^6+105z^4-315z^2+105) f(z) + (-z^7+21z^5-105z^3+105z) g(z) = (z^8-28z^6+210z^4-420z^2+105) f(z)$$
$$f^{(9)}(z) = (8z^7-168z^5+840z^3-840z) f(z) + (z^8-28z^6+210z^4-420z^2+105) g(z) = (-z^9+36z^7-378z^5+1260z^3-945z) f(z)$$
$$f^{(10)}(z) = (-9z^8+252z^6-1890z^4+3780z^2-945) f(z) + (-z^9+36z^7-378z^5+1260z^3-945z) g(z) = (z^{10}-45z^8+630z^6-3150z^4+4725z^2-945) f(z)$$
$$\begin{bmatrix} f(z)\\ f'(z)\\ f''(z)\\ f'''(z)\\ f^{(4)}(z)\\ f^{(5)}(z)\\ f^{(6)}(z)\\ f^{(7)}(z)\\ f^{(8)}(z)\\ f^{(9)}(z)\\ f^{(10)}(z) \end{bmatrix}
= f(z)
\begin{bmatrix}
1&0&0&0&0&0&0&0&0&0&0\\
0&-1&0&0&0&0&0&0&0&0&0\\
-1&0&1&0&0&0&0&0&0&0&0\\
0&3&0&-1&0&0&0&0&0&0&0\\
3&0&-6&0&1&0&0&0&0&0&0\\
0&-15&0&10&0&-1&0&0&0&0&0\\
-15&0&45&0&-15&0&1&0&0&0&0\\
0&105&0&-105&0&21&0&-1&0&0&0\\
105&0&-420&0&210&0&-28&0&1&0&0\\
0&-945&0&1260&0&-378&0&36&0&-1&0\\
-945&0&4725&0&-3150&0&630&0&-45&0&1
\end{bmatrix}
\begin{bmatrix} 1\\ z\\ z^2\\ z^3\\ z^4\\ z^5\\ z^6\\ z^7\\ z^8\\ z^9\\ z^{10} \end{bmatrix}.$$
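The rows of this matrix are, up to sign, the probabilists' Hermite polynomials, $f^{(n)}(z) = (-1)^n\,\mathrm{He}_n(z)\,f(z)$; that identity is an observation of this sketch, not stated in the text, and makes the whole table checkable by the three-term recurrence:

```python
# Check Box D1 via the Hermite recurrence He_{n+1}(z) = z He_n(z) - n He_{n-1}(z).
import math

def he(n, z):
    # probabilists' Hermite polynomial He_n evaluated at z
    h0, h1 = 1.0, z
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, z * h1 - k * h0
    return h1

# spot checks against the derivative table:
# f''(z)/f(z) = z^2 - 1,  f^(6)(1)/f(1) = 16,  and the full 10th-order row
z = 1.7
assert math.isclose((-1)**2 * he(2, z), z**2 - 1)
assert math.isclose((-1)**6 * he(6, 1.0), 16.0)
poly10 = z**10 - 45*z**8 + 630*z**6 - 3150*z**4 + 4725*z**2 - 945
assert math.isclose((-1)**10 * he(10, z), poly10)
print("Hermite recursion reproduces the derivative table")
```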
$$\gamma(z=1) := 2\int_0^1 f(z^*)\,dz^*$$
558 Appendix D: Sampling distributions and their use
$$\int_0^z f(z^*)\,dz^* = \frac{1}{\sqrt{2\pi}}\int_0^z\exp\left(-\frac12 z^{*2}\right)dz^* = \frac{1}{\sqrt{2\pi}}\left(z-\frac{z^3}{6}+\frac{z^5}{40}-\frac{z^7}{336}+\frac{z^9}{3456}-\frac{z^{11}}{42240}+\frac{z^{13}}{599040}+O(z^{15})\right)$$
"the specific value z=1"
$$\gamma(1) = 2\int_0^1 f(z^*)\,dz^* = \frac{2}{\sqrt{2\pi}}\left(1-\frac16+\frac{1}{40}-\frac{1}{336}+\frac{1}{3456}-\frac{1}{42240}+\frac{1}{599040}-\cdots\right)$$
$$= \frac{2}{\sqrt{2\pi}}\left(1-0.166{,}667+0.025{,}000-0.002{,}976+0.000{,}289-0.000{,}024+0.000{,}002\right)$$
$$= \frac{2}{\sqrt{2\pi}}\;0.855{,}624 = 0.682{,}689 = 0.683$$
"coefficient of confidence"
$$\gamma = 0.683 = 1-317.311\cdot 10^{-3}\approx 1-\frac13.$$
Table D1
Special values of the derivatives of the
Gauss-Laplace probability distribution at z=0
$$\frac{1}{\sqrt{2\pi}} = 0.398{,}942$$
$$f(0) = \frac{1}{\sqrt{2\pi}},\qquad f'(0) = 0,$$
$$f''(0) = -\frac{1}{\sqrt{2\pi}},\qquad \frac{1}{2!}f''(0) = -0.199{,}471$$
$$f'''(0) = 0,\qquad f^{(4)}(0) = +\frac{3}{\sqrt{2\pi}},\qquad \frac{1}{4!}f^{(4)}(0) = +0.049{,}868$$
$$f^{(5)}(0) = 0,\qquad f^{(6)}(0) = -\frac{15}{\sqrt{2\pi}},\qquad \frac{1}{6!}f^{(6)}(0) = -0.008{,}311.$$
Example D9 (Series expansion of the Gauss-Laplace integral, 2nd interval):
Let us solve the integral
$$\gamma(z=2) := 2\int_0^2 f(z^*)\,dz^* = \gamma(z=1) + 2\int_1^2 f(z^*)\,dz^*.$$
The termwise integration of the Taylor series of $f(z)$ around the point $z=1$, up to order O(12), has led us to the coefficient of confidence $\gamma(2) = 0.954$. The result
$$P(\mu-2\sigma<X<\mu+2\sigma) = 0.954$$
Table D2
Special values of the derivatives of the
Gauss-Laplace probability distribution at z=1
$$\frac{1}{\sqrt{2\pi}} = 0.398{,}942,\qquad \exp\left(-\frac12\right) = 0.606{,}531$$
$$f(1) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac12\right) = 0.241{,}971$$
$$f'(1) = -f(1) = -0.241{,}971,\qquad f''(1) = 0$$
$$f'''(1) = 2f(1) = \frac{2}{\sqrt{2\pi}}\exp\left(-\frac12\right) = +0.483{,}941,\qquad \frac{1}{3!}f'''(1) = +0.080{,}657$$
$$f^{(4)}(1) = -2f(1) = -0.483{,}941,\qquad \frac{1}{4!}f^{(4)}(1) = -0.020{,}164$$
$$f^{(5)}(1) = -6f(1) = -\frac{6}{\sqrt{2\pi}}\exp\left(-\frac12\right) = -1.451{,}826,\qquad \frac{1}{5!}f^{(5)}(1) = -0.012{,}098$$
$$f^{(6)}(1) = 16f(1) = \frac{16}{\sqrt{2\pi}}\exp\left(-\frac12\right) = +3.871{,}536,\qquad \frac{1}{6!}f^{(6)}(1) = +0.005{,}377$$
$$f^{(7)}(1) = 20f(1) = \frac{20}{\sqrt{2\pi}}\exp\left(-\frac12\right) = +4.839{,}420,\qquad \frac{1}{7!}f^{(7)}(1) = +0.000{,}960$$
$$f^{(8)}(1) = -132f(1) = -31.940{,}172,\qquad \frac{1}{8!}f^{(8)}(1) = -0.000{,}792$$
$$f^{(9)}(1) = -28f(1) = -6.775{,}188,\qquad \frac{1}{9!}f^{(9)}(1) = -0.000{,}019$$
$$f^{(10)}(1) = 1216f(1) = +294.237,\qquad \frac{1}{10!}f^{(10)}(1) = +0.000{,}081.$$
Example D10 (Series expansion of the Gauss-Laplace integral, 3rd interval):
Let us solve the integrals
$$\gamma(z=3) := 2\int_0^3 f(z^*)\,dz^* = 2\int_0^1 f(z^*)\,dz^* + 2\int_1^2 f(z^*)\,dz^* + 2\int_2^3 f(z^*)\,dz^*$$
$$\gamma(z=3) = \gamma(z=2) + 2\int_2^3 f(z^*)\,dz^*,$$
namely in the 3rd interval $2\le z\le 3$. First, we set up the Taylor series of $f(z)$ around the point $z=2$. The derivatives of $f(z)$ at the point $z=2$ are collected up to order 10 in Table D3. Second, we integrate the Taylor series termwise and obtain the specific integral of Box D4. Note that termwise integration is permitted since the Taylor series converges uniformly. The detailed computation up to order O(12) leads us to the coefficient of confidence $\gamma(3) = 0.997$. The result
$$P(\mu-3\sigma<X<\mu+3\sigma) = 0.997$$
Table D3
Special values of the derivatives of the
Gauss-Laplace probability distribution at z=2
$$\frac{1}{\sqrt{2\pi}} = 0.398{,}942,\qquad \exp(-2) = 0.135{,}335$$
$$f(2) = \frac{1}{\sqrt{2\pi}}\exp(-2) = 0.053{,}991$$
$$f'(2) = -2f(2) = -0.107{,}982$$
$$f''(2) = 3f(2),\qquad \frac{1}{2!}f''(2) = +0.080{,}986$$
$$f'''(2) = -2f(2),\qquad \frac{1}{3!}f'''(2) = -0.017{,}997$$
$$f^{(4)}(2) = -5f(2),\qquad \frac{1}{4!}f^{(4)}(2) = -0.011{,}248$$
$$f^{(5)}(2) = 18f(2),\qquad \frac{1}{5!}f^{(5)}(2) = +0.008{,}099$$
$$f^{(6)}(2) = -11f(2),\qquad \frac{1}{6!}f^{(6)}(2) = -0.000{,}825$$
$$f^{(7)}(2) = -86f(2),\qquad \frac{1}{7!}f^{(7)}(2) = -0.000{,}921$$
$$f^{(8)}(2) = +249f(2),\qquad \frac{1}{8!}f^{(8)}(2) = +0.000{,}333$$
$$f^{(9)}(2) = 190f(2),\qquad \frac{1}{9!}f^{(9)}(2) = +0.000{,}028$$
$$f^{(10)}(2) = -2621f(2),\qquad \frac{1}{10!}f^{(10)}(2) = -0.000{,}039.$$
Box D4
A specific integral
"expansion of the exponential function"
$$f(z) := \frac{1}{\sqrt{2\pi}}\exp\left(-\frac12 z^2\right)$$
$$f(z) = f(2) + \frac{1}{1!}f'(2)(z-2) + \frac{1}{2!}f''(2)(z-2)^2 + \cdots + \frac{1}{10!}f^{(10)}(2)(z-2)^{10} + O(11)$$
"the specific integral"
$$\int_2^z f(z^*)\,dz^* = f(2)(z-2) + \frac12\frac{1}{1!}f'(2)(z-2)^2 + \frac13\frac{1}{2!}f''(2)(z-2)^3 + \frac14\frac{1}{3!}f'''(2)(z-2)^4 + \frac15\frac{1}{4!}f^{(4)}(2)(z-2)^5 + \frac16\frac{1}{5!}f^{(5)}(2)(z-2)^6 + \cdots + \frac{1}{11}\frac{1}{10!}f^{(10)}(2)(z-2)^{11} + O(12)$$
case z=3
$$\gamma(3) = \gamma(2) + 2\int_2^3 f(z^*)\,dz^* = 0.997$$
"coefficient of confidence"
$$\gamma = 0.997 = 1-2.65\cdot 10^{-3}\approx 1-\frac{1}{377}.$$
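All three coefficients of confidence can be reproduced by termwise integration of the Taylor series on each subinterval; a sketch (the Hermite identity $f^{(n)}(z)=(-1)^n\mathrm{He}_n(z)f(z)$ used for the derivatives is an assumption of this sketch, not stated in the text):

```python
# gamma(3) = P(mu - 3 sigma < X < mu + 3 sigma) from piecewise Taylor integration.
import math

def f(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def he(n, z):
    # probabilists' Hermite polynomial He_n(z) via its three-term recurrence
    h0, h1 = 1.0, z
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, z * h1 - k * h0
    return h1

def deriv(n, z):
    # f^(n)(z) = (-1)^n He_n(z) f(z)
    return (-1)**n * he(n, z) * f(z)

def piece(a, b, order=10):
    # int_a^b f = sum_n f^(n)(a) (b-a)^(n+1) / ((n+1) n!), truncated
    return sum(deriv(n, a) * (b - a)**(n + 1) / ((n + 1) * math.factorial(n))
               for n in range(order + 1))

gamma3 = 2.0 * (piece(0.0, 1.0) + piece(1.0, 2.0) + piece(2.0, 3.0))
assert abs(gamma3 - 0.9973) < 1e-3
print(round(gamma3, 3))
```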
D32 The backward computation of a first confidence interval of Gauss-Laplace normally distributed observations: $\mu$, $\sigma^2$ known
Finally we solve the Volterra integral equation of the first kind by the technique of series inversion, also called series reversion. Let us recognize that the interval integration of a Taylor-series-expanded Gauss-Laplace normal density distribution led us to a univariate homogeneous polynomial of arbitrary order. Such a univariate homogeneous polynomial $y = a_1x + a_2x^2 + \cdots + a_nx^n$ ("input") can be reversed as a univariate homogeneous polynomial $x = b_1y + b_2y^2 + \cdots + b_ny^n$ ("output") as outlined in Table D4. Consult M. Abramowitz and I. A. Stegun (1965, p. 16) for a review, and E. Grafarend, T. Krarup and R. Syffus (1996) for a derivation based upon computer algebra.
Table D4
Series inversion
E. Grafarend, T. Krarup, R. Syffus:
Journal of Geodesy 70 (1996) 276-286
"input: univariate homogeneous polynomial"
$$y = a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5 + a_6x^6 + a_7x^7 + O(x^8)$$
"coefficient relation"
(i) $a_1b_1 = 1$
(ii) $a_1^3b_2 = -a_2$
(iii) $a_1^5b_3 = 2a_2^2 - a_1a_3$
(iv) $a_1^7b_4 = 5a_1a_2a_3 - a_1^2a_4 - 5a_2^3$
(v) $a_1^9b_5 = 6a_1^2a_2a_4 + 3a_1^2a_3^2 + 14a_2^4 - a_1^3a_5 - 21a_1a_2^2a_3$
(vi) $a_1^{11}b_6 = 7a_1^3a_2a_5 + 7a_1^3a_3a_4 + 84a_1a_2^3a_3 - a_1^4a_6 - 28a_1^2a_2a_3^2 - 42a_2^5 - 28a_1^2a_2^2a_4$
(vii) $a_1^{13}b_7 = 8a_1^4a_2a_6 + 8a_1^4a_3a_5 + 4a_1^4a_4^2 + 120a_1^2a_2^3a_4 + 180a_1^2a_2^2a_3^2 + 132a_2^6 - a_1^5a_7 - 36a_1^3a_2^2a_5 - 72a_1^3a_2a_3a_4 - 12a_1^3a_3^3 - 330a_1a_2^4a_3$
Table D5
Series inversion quantile $c_{0.90}$
(i) input
$$\gamma(z) = 2\int_0^1 f(z^*)\,dz^* + 2\int_1^z f(z^*)\,dz^* = 0.682{,}689 + 2\left[f(1)(z-1) + \frac12\frac{1}{1!}f'(1)(z-1)^2 + \cdots + \frac1n\frac{1}{(n-1)!}f^{(n-1)}(1)(z-1)^n + O(n+1)\right]$$
$$\frac12\left[\gamma(z)-0.682{,}689\right] = f(1)(z-1) + \frac12\frac{1}{1!}f'(1)(z-1)^2 + \cdots + \frac1n\frac{1}{(n-1)!}f^{(n-1)}(1)(z-1)^n + O(n+1)$$
$$y = a_1x + a_2x^2 + \cdots + a_nx^n + O(n+1)$$
$$x := z-1$$
$$y := (0.900{,}000 - 0.682{,}689)/2 = 0.108{,}656$$
$$a_1 := f(1) = +0.241{,}971$$
$$a_2 := \frac12\frac{1}{1!}f'(1) = -0.120{,}986$$
$$a_3 := \frac13\frac{1}{2!}f''(1) = 0$$
$$a_4 := \frac14\frac{1}{3!}f'''(1) = +0.020{,}164$$
$$a_5 := \frac15\frac{1}{4!}f^{(4)}(1) = -0.004{,}033$$
$$\dots$$
$$a_n := \frac1n\frac{1}{(n-1)!}f^{(n-1)}(1).$$
(ii) output
$$b_1 = \frac{1}{a_1} = 4.132{,}7$$
$$b_2 = -\frac{1}{a_1^3}a_2 = 8.539{,}7$$
$$b_3 = \frac{1}{a_1^5}\left(2a_2^2 - a_1a_3\right) = 35.292$$
$$b_4 = \frac{1}{a_1^7}\left(5a_1a_2a_3 - a_1^2a_4 - 5a_2^3\right) = 158.01$$
$$b_5 = \frac{1}{a_1^9}\left(6a_1^2a_2a_4 + 3a_1^2a_3^2 + 14a_2^4 - a_1^3a_5 - 21a_1a_2^2a_3\right) = 773.56$$
$$z = x + 1 = 1.628{,}9 = c_{0.90}$$
(truncated series; the exact quantile is $c_{0.90} = 1.644{,}9$, the five-term reversion undershoots it).
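The reversion can be replayed numerically; a sketch that recomputes the input coefficients $a_k$ from the exact derivatives at $z=1$, applies the reversion formulas of Table D4, and validates the result by a round trip through the forward series (for reference, the exact two-sided 90% quantile is $c_{0.90}=1.6449$, so the truncated reversion is only an approximation):

```python
# Series reversion for the 90% quantile of the Gauss-Laplace distribution.
import math

f1 = math.exp(-0.5) / math.sqrt(2.0 * math.pi)   # f(1) = 0.241971
# a_n = f^(n-1)(1) / n!  (coefficients of the integrated Taylor series at z = 1)
a1, a2, a3 = f1, -f1 / 2.0, 0.0                  # f'(1) = -f(1), f''(1) = 0
a4, a5 = 2.0 * f1 / 24.0, -2.0 * f1 / 120.0      # f'''(1) = 2f(1), f^(4)(1) = -2f(1)

# classical series-reversion coefficients (Table D4)
b1 = 1.0 / a1
b2 = -a2 / a1**3
b3 = (2.0 * a2**2 - a1 * a3) / a1**5
b4 = (5.0 * a1 * a2 * a3 - a1**2 * a4 - 5.0 * a2**3) / a1**7
b5 = (6.0 * a1**2 * a2 * a4 + 3.0 * a1**2 * a3**2 + 14.0 * a2**4
      - a1**3 * a5 - 21.0 * a1 * a2**2 * a3) / a1**9

y = (0.90 - 0.682689) / 2.0                       # y = 0.108656
x = b1*y + b2*y**2 + b3*y**3 + b4*y**4 + b5*y**5  # truncated reversion

# round trip: the forward series must reproduce y up to the truncation error
y_back = a1*x + a2*x**2 + a3*x**3 + a4*x**4 + a5*x**5
assert abs(y_back - y) < 5e-3
assert 1.60 < x + 1.0 < 1.66                      # truncated estimate of c_0.90
print(round(x + 1.0, 4))
```

With only five coefficients the reverted series lands somewhat below the exact quantile; the truncation of the input polynomial dominates the error.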
At this end we would like to give some sample references on computing the "inverse error function"
$$y = F(x) := \int_0^x \frac{1}{\sqrt{2\pi}}\exp\left(-\frac12 z^2\right)dz =: \operatorname{erf} x$$
versus
$$x = F^{-1}(y) = \operatorname{inv\,erf} y,$$
namely L. Carlitz (1963) and A. J. Strecok (1968).
D4 Sampling from the Gauss-Laplace normal distribution:
a second confidence interval for the mean, variance known
The second confidence interval of Gauss-Laplace i.i.d. observations will be constructed for the mean $\hat\mu$ BLUUE of $\mu$, when the variance $\sigma^2$ is known. "n" is the size of the sample, namely the number of observations.
Before we present the general sampling distribution we shall work through two examples. Example D11 has been chosen for a sample size n = 2, while Example D12 for n = 3 observations. Afterwards the general result is obvious and sufficiently motivated.
Figure D2: Special Helmert pdf $f_2(x)$, $x := \hat\sigma^2/\sigma^2$, for one degree of freedom p = 1
Figure D3: Special Gauss-Laplace normal pdf $f_1(z\mid 0,1)$ of $z = (\hat\mu-\mu)/\sqrt{\sigma^2/2}$
The first action item
Let us assume an experiment of two Gauss-Laplace i.i.d. observations. Their pdf is given by
$$f(y_1,y_2) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{1}{2\sigma^2}\left[(y_1-\mu)^2+(y_2-\mu)^2\right]\right),$$
$$f(y_1,y_2) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{1}{2\sigma^2}(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu)\right).$$
The second action item
The coordinates of the observation vector have been denoted by $[y_1,y_2]' = \mathbf y\in Y$, $\dim Y = 2$. The quadratic form $(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu)$ allows the fundamental decomposition
$$\mathbf y-\mathbf 1\mu = (\mathbf y-\mathbf 1\hat\mu) + \mathbf 1(\hat\mu-\mu)$$
$$(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu) = \hat\sigma^2 + 2(\hat\mu-\mu)^2.$$
The cumulative pdf
$$dF = f(y_1,y_2)\,dy_1dy_2 = f_1(\hat\mu)f_2(x)\,d\hat\mu\,dx = f_1(\hat\mu)\,f_2\!\left(\frac{\hat\sigma^2}{\sigma^2}\right)d\hat\mu\;d\frac{\hat\sigma^2}{\sigma^2}$$
has to be decomposed into the first pdf $f_1(\hat\mu\mid\mu,\sigma^2/n)$ representing the pdf of the sample mean $\hat\mu$ and the second pdf $f_2(x)$ of the new variable $x := (y_1-y_2)^2/(2\sigma^2) = \hat\sigma^2/\sigma^2$ representing the sample variance $\hat\sigma^2$, normalized by $\sigma^2$.
? How can the second decomposition $f_1f_2$ be understood?
Let us replace the quadratic form $(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu) = \hat\sigma^2+2(\hat\mu-\mu)^2$ in the cumulative pdf
$$dF = f(y_1,y_2)\,dy_1dy_2 = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{1}{2\sigma^2}(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu)\right)dy_1dy_2 = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{1}{2\sigma^2}\left[\hat\sigma^2+2(\hat\mu-\mu)^2\right]\right)dy_1dy_2$$
$$dF = f(y_1,y_2)\,dy_1dy_2 = \frac{1}{\sqrt{2\pi}}\frac{1}{\sigma/\sqrt2}\exp\left(-\frac12\frac{(\hat\mu-\mu)^2}{\sigma^2/2}\right)\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt2\,\sigma}\exp\left(-\frac12\frac{\hat\sigma^2}{\sigma^2}\right)dy_1dy_2.$$
The quadratic form $\hat\sigma^2$, conventionally given in terms of the residual vector $\mathbf y-\mathbf 1\hat\mu$, will be rewritten in terms of the coordinates $[y_1,y_2]' = \mathbf y$ of the observation vector:
$$\hat\sigma^2 = (y_1-\hat\mu)^2+(y_2-\hat\mu)^2 = \left[y_1-\tfrac12(y_1+y_2)\right]^2+\left[y_2-\tfrac12(y_1+y_2)\right]^2$$
$$\hat\sigma^2 = \tfrac14(y_1-y_2)^2+\tfrac14(y_2-y_1)^2 = \tfrac12(y_1-y_2)^2.$$
The fourth action item
The new variable $x := \frac{1}{2\sigma^2}(y_1-y_2)^2$ will be introduced in the cumulative pdf $dF = f(y_1,y_2)\,dy_1dy_2$. The new surface element $d\hat\mu\,dx$ will be related to the old surface element $dy_1dy_2$:
$$d\hat\mu\,dx = \left|\det\begin{bmatrix} D_{y_1}\hat\mu & D_{y_2}\hat\mu\\ D_{y_1}x & D_{y_2}x \end{bmatrix}\right| dy_1dy_2 = J\,dy_1dy_2$$
$$D_{y_1}\hat\mu := \frac{\partial\hat\mu}{\partial y_1} = \frac12,\qquad D_{y_2}\hat\mu := \frac{\partial\hat\mu}{\partial y_2} = \frac12$$
D4 A second confidence interval for the mean, variance known 571
$$D_{y_1}x := \frac{\partial x}{\partial y_1} = \frac{y_1-y_2}{\sigma^2},\qquad D_{y_2}x := \frac{\partial x}{\partial y_2} = -\frac{y_1-y_2}{\sigma^2}$$
$$dy_1dy_2 = \frac{\sigma^2}{y_1-y_2}\,d\hat\mu\,dx = \frac{\sigma}{\sqrt{2x}}\,d\hat\mu\,dx$$
based upon
$$x = \frac{1}{2\sigma^2}(y_1-y_2)^2\;\Rightarrow\;\sqrt{2x} = \frac{y_1-y_2}{\sigma}.$$
In collecting all detailed partial results we can formulate a corollary.
Corollary D6 (marginal probability distributions of $\hat\mu$, $\sigma^2$ given, and of $\hat\sigma^2$):
The cumulative pdf of a set of two observations is represented by
$$dF = f(y_1,y_2)\,dy_1dy_2 = f_1(\hat\mu\mid\mu,\sigma^2/2)\,f_2(x)\,d\hat\mu\,dx$$
subject to
$$f_1(\hat\mu\mid\mu,\sigma^2/2) := \frac{1}{\sqrt{2\pi}}\frac{1}{\sigma/\sqrt2}\exp\left(-\frac12\frac{(\hat\mu-\mu)^2}{\sigma^2/2}\right)$$
$$f_2(x) = f_2\!\left(\frac{\hat\sigma^2}{\sigma^2}\right) := \frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt x}\exp\left(-\frac12 x\right)$$
subject to
$$\int_{-\infty}^{+\infty} f_1(\hat\mu)\,d\hat\mu = 1\quad\text{and}\quad\int_0^{+\infty} f_2(x)\,dx = 1.$$
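Corollary D6 can be probed by simulation; a sketch with hypothetical parameters $\mu=2$, $\sigma=1.5$ (any values would do), checking that $x=(y_1-y_2)^2/(2\sigma^2)$ behaves like Helmert's $\chi^2$ with one degree of freedom, whose mean is 1 and variance is 2:

```python
# Monte Carlo check of Corollary D6 for n = 2 observations.
import random, math

random.seed(42)
mu, sigma, n_trials = 2.0, 1.5, 200_000     # hypothetical illustration values
xs = []
for _ in range(n_trials):
    y1 = random.gauss(mu, sigma)
    y2 = random.gauss(mu, sigma)
    xs.append((y1 - y2)**2 / (2.0 * sigma**2))

mean_x = sum(xs) / n_trials
var_x = sum((x - mean_x)**2 for x in xs) / n_trials
assert abs(mean_x - 1.0) < 0.02             # chi-square_1 mean
assert abs(var_x - 2.0) < 0.1               # chi-square_1 variance
print(round(mean_x, 2), round(var_x, 2))
```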
Figure D4: Special Helmert pdf $f_2(x)$ of $x := 2\hat\sigma^2/\sigma^2$
Figure D5: Special Gauss-Laplace normal pdf $f_1(\hat\mu\mid\mu,\sigma^2/3)$ of $(\hat\mu-\mu)/\sqrt{\sigma^2/3}$
The first action item
Let us assume an experiment of three Gauss-Laplace i.i.d. observations. Their pdf is given by
$$f(y_1,y_2,y_3) = f(y_1)f(y_2)f(y_3),$$
$$f(y_1,y_2,y_3) = (2\pi)^{-3/2}\sigma^{-3}\exp\left(-\frac{1}{2\sigma^2}\left[(y_1-\mu)^2+(y_2-\mu)^2+(y_3-\mu)^2\right]\right),$$
$$f(y_1,y_2,y_3) = (2\pi)^{-3/2}\sigma^{-3}\exp\left(-\frac{1}{2\sigma^2}(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu)\right).$$
The coordinates of the observation vector have been denoted by $[y_1,y_2,y_3]' = \mathbf y\in Y$, $\dim Y = 3$.
The second action item
The quadratic form $(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu)$ allows the fundamental decomposition
$$\mathbf y-\mathbf 1\mu = (\mathbf y-\mathbf 1\hat\mu)+\mathbf 1(\hat\mu-\mu),$$
$$(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu) = 2\hat\sigma^2+3(\hat\mu-\mu)^2.$$
$$\hat\mu\ \text{BLUUE of }\mu:\quad \hat\mu = \frac13(y_1+y_2+y_3)$$
$$\hat\sigma^2\ \text{BIQUUE of }\sigma^2:\quad \hat\sigma^2 = \frac12\left[(y_1-\hat\mu)^2+(y_2-\hat\mu)^2+(y_3-\hat\mu)^2\right].$$
As soon as we substitute $\hat\mu$ and $\hat\sigma^2$ we arrive at
$$(y_1-\mu)^2+(y_2-\mu)^2+(y_3-\mu)^2 = \left[(y_1-\hat\mu)+(\hat\mu-\mu)\right]^2+\left[(y_2-\hat\mu)+(\hat\mu-\mu)\right]^2+\left[(y_3-\hat\mu)+(\hat\mu-\mu)\right]^2 = 2\hat\sigma^2+3(\hat\mu-\mu)^2.$$
$$(y_1-\hat\mu)^2 = \frac49 y_1^2+\frac19 y_2^2+\frac19 y_3^2-\frac49 y_1y_2+\frac29 y_2y_3-\frac49 y_3y_1$$
$$(y_2-\hat\mu)^2 = \frac19 y_1^2+\frac49 y_2^2+\frac19 y_3^2-\frac49 y_1y_2-\frac49 y_2y_3+\frac29 y_3y_1$$
$$(y_3-\hat\mu)^2 = \frac19 y_1^2+\frac19 y_2^2+\frac49 y_3^2+\frac29 y_1y_2-\frac49 y_2y_3-\frac49 y_3y_1$$
and
$$(y_1-\hat\mu)^2+(y_2-\hat\mu)^2+(y_3-\hat\mu)^2 = \frac23\left(y_1^2+y_2^2+y_3^2-y_1y_2-y_2y_3-y_3y_1\right).$$
We shall prove
$$(\mathbf y-\mathbf 1\hat\mu)'(\mathbf y-\mathbf 1\hat\mu) = \mathbf y'\mathbf M\mathbf y = z_1^2+z_2^2,\qquad \operatorname{rk}\mathbf M = 2,\ \mathbf M\in\mathbb R^{3\times3}$$
$$\mathbf M = \frac13\begin{bmatrix} 2&-1&-1\\ -1&2&-1\\ -1&-1&2 \end{bmatrix}$$
$$(\mathbf y-\mathbf 1\hat\mu)'(\mathbf y-\mathbf 1\hat\mu) = \mathbf y'\mathbf M\mathbf y = \frac13\,[y_1,y_2,y_3]\begin{bmatrix} 2&-1&-1\\ -1&2&-1\\ -1&-1&2 \end{bmatrix}\begin{bmatrix} y_1\\ y_2\\ y_3 \end{bmatrix}.$$
F. R. Helmert (1875, 1876a, b) had the bright idea to implement what we call nowadays the forward Helmert transformation
$$z_1 = \frac{1}{\sqrt2}(y_1-y_2)$$
$$z_2 = \frac{1}{\sqrt6}(y_1+y_2-2y_3)$$
or
$$z_1^2 = \frac12\left(y_1^2-2y_1y_2+y_2^2\right)$$
$$z_2^2 = \frac16\left(y_1^2+y_2^2+4y_3^2+2y_1y_2-4y_2y_3-4y_3y_1\right)$$
$$z_1^2+z_2^2 = \frac23\left(y_1^2+y_2^2+y_3^2-y_1y_2-y_2y_3-y_3y_1\right).$$
Indeed we found
$$(\mathbf y-\mathbf 1\hat\mu)'(\mathbf y-\mathbf 1\hat\mu) = (y_1-\hat\mu)^2+(y_2-\hat\mu)^2+(y_3-\hat\mu)^2 = z_1^2+z_2^2.$$
$$\mathbf z = \mathbf H_{23}\mathbf y,\qquad \mathbf H_{23} := \begin{bmatrix} \dfrac{1}{\sqrt2} & -\dfrac{1}{\sqrt2} & 0\\[2mm] \dfrac{1}{\sqrt6} & \dfrac{1}{\sqrt6} & -\dfrac{2}{\sqrt6} \end{bmatrix}.$$
The rectangular Helmert matrix is right orthogonal,
$$\mathbf H_{23}\mathbf H_{23}' = \begin{bmatrix} \dfrac{1}{\sqrt2} & -\dfrac{1}{\sqrt2} & 0\\[2mm] \dfrac{1}{\sqrt6} & \dfrac{1}{\sqrt6} & -\dfrac{2}{\sqrt6} \end{bmatrix}\begin{bmatrix} \dfrac{1}{\sqrt2} & \dfrac{1}{\sqrt6}\\[2mm] -\dfrac{1}{\sqrt2} & \dfrac{1}{\sqrt6}\\[2mm] 0 & -\dfrac{2}{\sqrt6} \end{bmatrix} = \begin{bmatrix} 1&0\\ 0&1 \end{bmatrix} = \mathbf I_2.$$
It brings
$$(\mathbf y-\mathbf 1\hat\mu)'(\mathbf y-\mathbf 1\hat\mu) = \frac23\left(y_1^2+y_2^2+y_3^2-y_1y_2-y_2y_3-y_3y_1\right)$$
via
$$\mathbf y = \mathbf H_{23}'\mathbf z\quad\text{or}\quad\begin{cases} y_1 = \dfrac{1}{\sqrt2}z_1+\dfrac{1}{\sqrt6}z_2\\[2mm] y_2 = -\dfrac{1}{\sqrt2}z_1+\dfrac{1}{\sqrt6}z_2\\[2mm] y_3 = -\dfrac{2}{\sqrt6}z_2, \end{cases}$$
$$y_1^2+y_2^2+y_3^2 = \left(\frac12 z_1^2+\frac16 z_2^2+\frac{1}{\sqrt3}z_1z_2\right)+\left(\frac12 z_1^2+\frac16 z_2^2-\frac{1}{\sqrt3}z_1z_2\right)+\frac23 z_2^2 = z_1^2+z_2^2,$$
$$y_1y_2+y_2y_3+y_3y_1 = -\frac12 z_1^2+\frac16 z_2^2+\frac{1}{\sqrt3}z_1z_2-\frac13 z_2^2-\frac{1}{\sqrt3}z_1z_2-\frac13 z_2^2 = -\frac12 z_1^2-\frac12 z_2^2,$$
$$\frac23\left(y_1^2+y_2^2+y_3^2-y_1y_2-y_2y_3-y_3y_1\right) = \frac23\cdot\frac32\left(z_1^2+z_2^2\right) = z_1^2+z_2^2,$$
into the canonical form.
The fifth action item
Let us go back to the partitioned pdf in order to inject the canonical representation of the deficient quadratic form $\mathbf y'\mathbf M\mathbf y$, $\mathbf M\in\mathbb R^{3\times3}$, $\operatorname{rk}\mathbf M = 2$. Here we meet first the problem to transform
$$dF = (2\pi)^{-3/2}\sigma^{-3}\exp\left(-\frac{1}{2\sigma^2}\left[2\hat\sigma^2+3(\hat\mu-\mu)^2\right]\right)dy_1dy_2dy_3$$
by an extended vector $[z_1,z_2,z_3]' =: \mathbf z$ into the canonical form
$$dF = (2\pi)^{-3/2}\exp\left(-\frac12\left(z_1^2+z_2^2+z_3^2\right)\right)dz_1dz_2dz_3,$$
which is generated by the general forward Helmert transformation
$$\mathbf z = \sigma^{-1}\mathbf H(\mathbf y-\mathbf 1\mu)$$
578 Appendix D: Sampling distributions and their use
$$\begin{bmatrix} z_1\\ z_2\\ z_3 \end{bmatrix} = \frac1\sigma\begin{bmatrix} \dfrac{1}{\sqrt2} & -\dfrac{1}{\sqrt2} & 0\\[2mm] \dfrac{1}{\sqrt6} & \dfrac{1}{\sqrt6} & -\dfrac{2}{\sqrt6}\\[2mm] \dfrac{1}{\sqrt3} & \dfrac{1}{\sqrt3} & \dfrac{1}{\sqrt3} \end{bmatrix}\begin{bmatrix} y_1-\mu\\ y_2-\mu\\ y_3-\mu \end{bmatrix}$$
or its backward Helmert transformation, also called the general inverse Helmert transformation,
$$\begin{bmatrix} y_1-\mu\\ y_2-\mu\\ y_3-\mu \end{bmatrix} = \sigma\begin{bmatrix} \dfrac{1}{\sqrt2} & \dfrac{1}{\sqrt6} & \dfrac{1}{\sqrt3}\\[2mm] -\dfrac{1}{\sqrt2} & \dfrac{1}{\sqrt6} & \dfrac{1}{\sqrt3}\\[2mm] 0 & -\dfrac{2}{\sqrt6} & \dfrac{1}{\sqrt3} \end{bmatrix}\begin{bmatrix} z_1\\ z_2\\ z_3 \end{bmatrix}.$$
$$\mathbf y-\mathbf 1\mu = \sigma\mathbf H'\mathbf z$$
$$z_3 = \sigma^{-1}\left[\frac{1}{\sqrt3},\frac{1}{\sqrt3},\frac{1}{\sqrt3}\right]\begin{bmatrix} y_1-\mu\\ y_2-\mu\\ y_3-\mu \end{bmatrix}$$
$$z_3 = \sigma^{-1}\,\frac{y_1+y_2+y_3-3\mu}{\sqrt3}\qquad\left[\frac{y_1+y_2+y_3}{3} = \hat\mu\ \Leftrightarrow\ y_1+y_2+y_3 = 3\hat\mu\right]$$
$$z_3 = \sqrt3\,\sigma^{-1}(\hat\mu-\mu),\qquad z_3^2 = \frac{1}{\sigma^2}\,3(\hat\mu-\mu)^2.$$
Indeed the extended Helmert matrix $\mathbf H_3$ is ingenious to decompose
$$\frac{1}{\sigma^2}(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu) = z_1^2+z_2^2+z_3^2$$
into a canonical quadratic form relating $z_1^2+z_2^2$ to $\hat\sigma^2$ and $z_3^2$ to $(\hat\mu-\mu)^2$. At this point, we have to interpret the general Helmert transformation $\mathbf z = \sigma^{-1}\mathbf H(\mathbf y-\mathbf 1\mu)$:
Structure elements of the Helmert transformation:
scale or dilatation $\sigma^{-1}$, rotation $\mathbf H$, translation $\mathbf 1\mu$.
$\sigma^{-1}\in\mathbb R^+$ produces a dilatation or scale change, $\mathbf H\in SO(3) := \{\mathbf H\in\mathbb R^{3\times3}\mid \mathbf H'\mathbf H = \mathbf I_3\ \text{and}\ \det\mathbf H = +1\}$ a rotation (3 parameters), and $\mathbf 1\mu\in\mathbb R^3$ a translation. Please prove for yourself that the quadratic Helmert matrix is orthonormal, that is $\mathbf H\mathbf H' = \mathbf H'\mathbf H = \mathbf I_3$ and $\det\mathbf H = +1$.
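The suggested exercise is easily checked by machine; a sketch with plain Python lists (no external libraries assumed):

```python
# Verify that the quadratic Helmert matrix for n = 3 lies in SO(3).
import math

s2, s6, s3 = math.sqrt(2), math.sqrt(6), math.sqrt(3)
H = [[1/s2, -1/s2,  0.0 ],
     [1/s6,  1/s6, -2/s6],
     [1/s3,  1/s3,  1/s3]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

HHt = matmul(H, transpose(H))
for i in range(3):
    for j in range(3):
        assert math.isclose(HHt[i][j], 1.0 if i == j else 0.0, abs_tol=1e-12)

det = (H[0][0] * (H[1][1]*H[2][2] - H[1][2]*H[2][1])
     - H[0][1] * (H[1][0]*H[2][2] - H[1][2]*H[2][0])
     + H[0][2] * (H[1][0]*H[2][1] - H[1][1]*H[2][0]))
assert math.isclose(det, 1.0)
print("H H' = H' H = I_3, det H = +1")
```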
The sixth action item
Finally we are left with the problem to split the cumulative pdf into one part $f_1(\hat\mu)$, which is a marginal distribution of the arithmetic mean $\hat\mu$ BLUUE of $\mu$, and another part $f_2(x)$, which is a marginal distribution related to the standard deviation $\hat\sigma$, $\hat\sigma^2$ BIQUUE of $\sigma^2$: Helmert's $\chi_2^2$ with two degrees of freedom.
First let us introduce polar coordinates $(\phi_1,r)$ which represent the Cartesian coordinates $z_1 = r\cos\phi_1$, $z_2 = r\sin\phi_1$. The index 1 is needed for later generalization to higher dimension. As a longitude, the domain of $\phi_1$ is $\phi_1\in[0,2\pi]$ or $0\le\phi_1\le2\pi$. The new random variable $z_1^2+z_2^2 = \|\mathbf z\|^2 =: x$, or the radius $r$, relates to Helmert's
$$\chi^2 = z_1^2+z_2^2 = \frac{2\hat\sigma^2}{\sigma^2} = \frac{1}{\sigma^2}(\mathbf y-\mathbf 1\hat\mu)'(\mathbf y-\mathbf 1\hat\mu).$$
$$f_1\!\left(\hat\mu\,\middle|\,\mu,\frac{\sigma^2}{3}\right)d\hat\mu = (2\pi)^{-1/2}\,\frac{\sqrt3}{\sigma}\exp\left(-\frac{3}{2\sigma^2}(\hat\mu-\mu)^2\right)d\hat\mu.$$
Third, the marginal distribution of the sample variance $2\hat\sigma^2/\sigma^2 = z_1^2+z_2^2 =: x$, Helmert's $\chi^2$ distribution for $p = n-1 = 2$ degrees of freedom,
$$f_2\!\left(\frac{2\hat\sigma^2}{\sigma^2}\right) = f_2(x) = \frac{1}{2^{p/2}\Gamma(p/2)}\,x^{\frac p2-1}\exp\left(-\frac x2\right)$$
is generated by
$$dF_2 = \left[\int_{-\infty}^{+\infty}(2\pi)^{-1/2}\exp\left(-\frac12 z_3^2\right)dz_3\right]\left[\int_0^{2\pi}d\phi_1\right](2\pi)^{-1}\,\frac12\exp\left(-\frac12 x\right)dx$$
$$dF_2 = (2\pi)^{-1}\omega_1\,\frac12\exp\left(-\frac12 x\right)dx\quad\text{subject to}\quad\omega_1 = \int_0^{2\pi}d\phi_1 = 2\pi$$
$$dF_2 = \frac12\exp\left(-\frac12 x\right)dx$$
and
$$x := z_1^2+z_2^2 = r^2\;\Rightarrow\;dx = 2r\,dr,\quad dr = \frac{dx}{2r}$$
$$dz_1dz_2 = r\,dr\,d\phi_1 = \frac12\,dx\,d\phi_1$$
is the transformation of the surface element $dz_1dz_2$. In collecting all detailed results let us formulate a corollary.
results let us formulate a corollary.
$$f_1\!\left(\hat\mu\,\middle|\,\mu,\frac{\sigma^2}{3}\right) := \frac{1}{\sqrt{2\pi}}\,\frac{1}{\sigma/\sqrt3}\exp\left(-\frac12\frac{(\hat\mu-\mu)^2}{\sigma^2/3}\right)$$
$$f_2(x) = \frac12\exp\left(-\frac12 x\right)$$
subject to
$$x := z_1^2+z_2^2 = \frac{2\hat\sigma^2}{\sigma^2},\qquad dx = \frac{2}{\sigma^2}\,d\hat\sigma^2$$
and
$$\int_{-\infty}^{+\infty} f_1(\hat\mu)\,d\hat\mu = 1\quad\text{versus}\quad\int_0^{+\infty} f_2(x)\,dx = 1.$$
Equivalently,
$$f_1(\hat\mu\mid\mu,\sigma^2/3)\,d\hat\mu = (2\pi)^{-1/2}\,\frac{\sqrt3}{\sigma}\exp\left(-\frac12\frac{(\hat\mu-\mu)^2}{\sigma^2/3}\right)d\hat\mu\qquad\text{versus}\qquad f_2(x)\,dx = \frac{1}{\sigma^2}\exp\left(-\frac{\hat\sigma^2}{\sigma^2}\right)d\hat\sigma^2$$
or
$$f_1(z_3\mid 0,1)\,dz_3 = (2\pi)^{-1/2}\exp\left(-\frac12 z_3^2\right)dz_3\qquad\text{versus}\qquad f_2(x)\,dx = \frac12\exp\left(-\frac12 x\right)dx$$
subject to
$$z_3 := \frac{\hat\mu-\mu}{\sigma/\sqrt3},\qquad x := \frac{2\hat\sigma^2}{\sigma^2}.$$
D41 Sampling distributions of the sample mean $\hat\mu$, $\sigma^2$ known, and of the sample variance $\hat\sigma^2$
The two examples have prepared us for the general sampling distribution of the sample mean $\hat\mu$, $\sigma^2$ known, and of the sample variance $\hat\sigma^2$ for Gauss-Laplace i.i.d. observations, namely samples of size n. By means of Lemma D8 on the rectangular Helmert transformation and Lemma D9 on the quadratic Helmert transformation we prepare for Theorem D10, which summarizes the pdfs for $\hat\mu$ BLUUE of $\mu$, $\sigma^2$ known, for the standard deviation $\hat\sigma$, and for $\hat\sigma^2$ BIQUUE of $\sigma^2$. Corollary D11 focusses on the pdf of $\bar\sigma = q\hat\sigma$, where $\bar\sigma$ is an unbiased estimator of the standard deviation $\sigma$, namely $E\{\bar\sigma\} = \sigma$.
Lemma D8 (rectangular Helmert transformation):
The rectangular Helmert matrix $\mathbf H_{n-1,n}\in\mathbb R^{(n-1)\times n}$ transforms the degenerate quadratic form
$$(n-1)\hat\sigma^2 := (\mathbf y-\mathbf 1\hat\mu)'(\mathbf y-\mathbf 1\hat\mu) = \mathbf y'\mathbf M\mathbf y,\qquad \operatorname{rk}\mathbf M = n-1,$$
subject to
$$\hat\mu = \frac1n\mathbf 1'\mathbf y,$$
into the canonical form
$$(n-1)\hat\sigma^2 = \mathbf z_{n-1}'\mathbf z_{n-1} = z_1^2+\cdots+z_{n-1}^2.$$
The additional coordinate generated by the quadratic Helmert matrix satisfies
$$z_n^2 = n(\hat\mu-\mu)^2.$$
Since the quadratic Helmert matrix is orthonormal, the inverse Helmert transformation is generated by
$$\mathbf z\mapsto\mathbf y-\mathbf 1\mu = \sigma\mathbf H'\mathbf z.$$
The proofs of Lemma D8 and Lemma D9 are based on generalizations of the special cases for n = 2, Example D11, and for n = 3, Example D12; they will be omitted here. The highlight of this paragraph is the following theorem.
Theorem D10 (marginal probability distribution of $(\hat\mu,\sigma^2)$ and $\hat\sigma^2$):
The cumulative pdf of a set of n observations,
$$dF = f(y_1,\dots,y_n)\,dy_1\cdots dy_n,$$
is represented as the product of the marginal pdf $f_1(\hat\mu)$ of the sample mean $\hat\mu = n^{-1}\mathbf 1'\mathbf y$ and the marginal pdf $f_4(\hat\sigma)$ of the sample standard deviation $\hat\sigma = \sqrt{(\mathbf y-\mathbf 1\hat\mu)'(\mathbf y-\mathbf 1\hat\mu)/(n-1)}$, also called r.m.s. (root mean square error), or the marginal pdf $f_2(\hat\sigma^2)$ of the sample variance $\hat\sigma^2$. Those marginal pdfs are represented by
$$dF_1 = f_1(\hat\mu)\,d\hat\mu$$
$$f_1(\hat\mu) = f_1\!\left(\hat\mu\,\middle|\,\mu,\frac{\sigma^2}{n}\right) := \frac{1}{\sqrt{2\pi}}\frac{\sqrt n}{\sigma}\exp\left(-\frac12\frac{(\hat\mu-\mu)^2}{\sigma^2/n}\right)$$
$$z := \frac{\sqrt n}{\sigma}(\hat\mu-\mu):\qquad f_1(z)\,dz = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac12 z^2\right)dz$$
$$f_4(\hat\sigma) = \frac{2p^{p/2}}{\sigma^p\,2^{p/2}\,\Gamma(p/2)}\,\hat\sigma^{p-1}\exp\left(-\frac p2\frac{\hat\sigma^2}{\sigma^2}\right)$$
$$x := \sqrt{n-1}\,\frac{\hat\sigma}{\sigma} = \frac{\sqrt p}{\sigma}\,\hat\sigma:\qquad f_4(x)\,dx = \frac{2}{2^{p/2}\Gamma(p/2)}\,x^{p-1}\exp\left(-\frac12 x^2\right)dx$$
$$dF_4 = f_4(x)\,dx,\qquad p := n-1$$
$$f_2(\hat\sigma^2) = \frac{p^{p/2}}{\sigma^p\,2^{p/2}\,\Gamma(p/2)}\,\hat\sigma^{p-2}\exp\left(-\frac p2\frac{\hat\sigma^2}{\sigma^2}\right),$$
$$x := (n-1)\frac{\hat\sigma^2}{\sigma^2} = \frac{p}{\sigma^2}\,\hat\sigma^2:\qquad f_2(x)\,dx = \frac{1}{2^{p/2}\Gamma(p/2)}\,x^{\frac p2-1}\exp\left(-\frac12 x\right)dx.$$
$f_1(\hat\mu\mid\mu,\sigma^2/n)$, as the marginal pdf of the sample mean BLUUE of $\mu$, is a Gauss-Laplace pdf with mean $\mu$ and variance $\sigma^2/n$. $f_4(x)$, $x := \sqrt p\,\hat\sigma/\sigma$, is the standard pdf of the normalized root-mean-square error with p degrees of freedom. In contrast, $f_2(x)$, $x := p\hat\sigma^2/\sigma^2$, is a Helmert Chi Square $\chi_p^2$ pdf with $p = n-1$ degrees of freedom.
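The pdfs of Theorem D10 can be sanity-checked numerically; a sketch with the illustrative choices $p=4$, $\sigma=2$ (any values would do), verifying that $f_2(\hat\sigma^2)$ integrates to one and agrees pointwise with the $\chi_p^2$ density under $x = p\hat\sigma^2/\sigma^2$:

```python
# Numeric check of the marginal pdf of the sample variance in Theorem D10.
import math

p, sigma = 4, 2.0   # illustrative assumption, not from the text

def f2_var(s2):
    # marginal pdf of the sample variance, p = n - 1 degrees of freedom
    return (p**(p / 2) / (sigma**p * 2**(p / 2) * math.gamma(p / 2))
            * s2**((p - 2) / 2) * math.exp(-p * s2 / (2 * sigma**2)))

def chi2_pdf(x):
    # Helmert's chi-square density with p degrees of freedom
    return x**(p / 2 - 1) * math.exp(-x / 2) / (2**(p / 2) * math.gamma(p / 2))

# trapezoidal rule on [0, 30 sigma^2], far into the negligible tail
N, upper = 100_000, 30 * sigma**2
h = upper / N
total = sum(f2_var(i * h) for i in range(1, N)) * h + 0.5 * h * f2_var(upper)
assert abs(total - 1.0) < 1e-4

# change of variables x = p s2 / sigma^2: densities must agree pointwise
s2 = 1.7
assert math.isclose(f2_var(s2), chi2_pdf(p * s2 / sigma**2) * p / sigma**2)
print("f2(sigma_hat^2) integrates to 1 and matches chi^2_p")
```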
Before we present a sketch of a proof of Theorem D10, which will be run with five action items and a special reference to the first and second vehicle, we give some historical comments. S. Kullback (1934) refers the marginal pdf $f_1(\hat\mu)$ of the "arithmetic mean" $\hat\mu$ to S. D. Poisson (1827), F. Hausdorff (1901) and J. O. Irwin (1927). He has also solved the problem to find the marginal pdf of the "geometric mean". The marginal pdf $f_2(\hat\sigma^2)$ of the sample variance $\hat\sigma^2$ was originally derived by F. R. Helmert (1875, 1876a, b). A historical discussion of Helmert's distribution is offered by H. A. David (1957), W. Kruskal (1946), H. O. Lancaster (1965, 1966), K. Pearson (1931) and O. Sheynin (1995).
The marginal pdf $f_4(\hat\sigma)$ has not found any interest in practice so far. The reason may be found in the effect that $\hat\sigma$ is not an unbiased estimate of the standard deviation $\sigma$, namely $E\{\hat\sigma\}\ne\sigma$. According to E. Czuber (1891, p. 162), K. D. P. Rosen (1948, p. 37), L. Schmetterer (1956, p. 203), R. Storm (1967, p. 199, 218), M. Fisz (1971, p. 240) and H. Richter and V. Mammitzsch (1973, p. 42) it is documented that
$$\bar\sigma = \sqrt{\frac p2}\,\frac{\Gamma\!\left(\frac p2\right)}{\Gamma\!\left(\frac{p+1}{2}\right)}\,\hat\sigma = q\hat\sigma$$
is an unbiased estimator of $\sigma$. Its pdf follows from $f_4(\hat\sigma)$ by the change of variable $\bar\sigma = q\hat\sigma$:
$$f_4(\bar\sigma) = \frac{2\,\Gamma^{p}\!\left(\frac{p+1}{2}\right)}{\sigma^p\,\Gamma^{p+1}\!\left(\frac p2\right)}\;\bar\sigma^{\,p-1}\exp\left(-\left(\frac{\Gamma\!\left(\frac{p+1}{2}\right)}{\Gamma\!\left(\frac p2\right)}\right)^2\frac{\bar\sigma^2}{\sigma^2}\right)$$
and
$$dF_4 = f_4(x)\,dx,\qquad x := \sqrt2\,\frac{\Gamma\!\left(\frac{p+1}{2}\right)}{\Gamma\!\left(\frac p2\right)}\,\frac{\bar\sigma}{\sigma}$$
$$f_4(x) = \frac{2}{2^{p/2}\Gamma(p/2)}\,x^{p-1}\exp\left(-\frac12 x^2\right)$$
subject to
$$E\{x\} = \sqrt2\,\frac{\Gamma\!\left(\frac{p+1}{2}\right)}{\Gamma\!\left(\frac p2\right)}.$$
Figure D6: Marginal pdf $f_4(\bar\sigma)$ for the sample standard deviation $\bar\sigma$ (r.m.s.)
Proof:
The first action item
The pdf of n Gauss-Laplace i.i.d. observations is given by
$$f(y_1,\dots,y_n) = f(y_1)\cdots f(y_n) = \prod_{i=1}^n f(y_i)$$
$$f(y_1,\dots,y_n) = (2\pi)^{-n/2}\sigma^{-n}\exp\left(-\frac{1}{2\sigma^2}(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu)\right).$$
The coordinates of the observation space Y have been denoted by $[y_1,\dots,y_n]' = \mathbf y$. Note $\dim Y = n$.
The second action item
The quadratic form $(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu)>0$ allows the fundamental decomposition
$$\mathbf y-\mathbf 1\mu = (\mathbf y-\mathbf 1\hat\mu)+\mathbf 1(\hat\mu-\mu),$$
$$(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu) = (\mathbf y-\mathbf 1\hat\mu)'(\mathbf y-\mathbf 1\hat\mu)+\mathbf 1'\mathbf 1(\hat\mu-\mu)^2,\qquad \mathbf 1'\mathbf 1 = n$$
$$(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu) = (n-1)\hat\sigma^2+n(\hat\mu-\mu)^2.$$
The third action item
$$dF = f(y_1,\dots,y_n)\,dy_1\cdots dy_n = (2\pi)^{-n/2}\sigma^{-n}\exp\left(-\frac{1}{2\sigma^2}(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu)\right)dy_1\cdots dy_n = (2\pi)^{-n/2}\sigma^{-n}\exp\left(-\frac{1}{2\sigma^2}\left[(n-1)\hat\sigma^2+n(\hat\mu-\mu)^2\right]\right)dy_1\cdots dy_n$$
$$\mathbf z = \sigma^{-1}\mathbf H(\mathbf y-\mathbf 1\mu)\quad\text{or}\quad\mathbf y-\mathbf 1\mu = \sigma\mathbf H'\mathbf z$$
$$\frac{1}{\sigma^2}(\mathbf y-\mathbf 1\mu)'(\mathbf y-\mathbf 1\mu) = z_1^2+z_2^2+\cdots+z_{n-1}^2+z_n^2.$$
Here we have substituted the direct Helmert transformation (quadratic Helmert matrix $\mathbf H$) and its inverse. Again $\sigma^{-1}$ is the scale factor, also called dilatation, $\mathbf H$ an orthonormal matrix, also called rotation matrix, and $\mathbf 1\mu\in\mathbb R^n$ the translation, also called shift.
$$dF = f(y_1,\dots,y_n)\,dy_1\cdots dy_n = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac12 z_n^2\right)\frac{1}{(2\pi)^{(n-1)/2}}\exp\left(-\frac12\left(z_1^2+\cdots+z_{n-1}^2\right)\right)dz_1dz_2\cdots dz_{n-1}dz_n$$
based upon
$$dy_1dy_2\cdots dy_{n-1}dy_n = \sigma^n\,dz_1dz_2\cdots dz_{n-1}dz_n,\qquad J = \sigma^n|\det\mathbf H'| = \sigma^n|\det\mathbf H| = \sigma^n.$$
J again denotes the absolute value of the Jacobian determinant introduced by the first vehicle.
The fourth action item
First, we identify the marginal distribution of the sample mean $\hat\mu$:
$$dF_1 = f_1\!\left(\hat\mu\,\middle|\,\mu,\frac{\sigma^2}{n}\right)d\hat\mu.$$
$$z_n = \sigma^{-1}\left[\frac{1}{\sqrt n},\dots,\frac{1}{\sqrt n}\right]\begin{bmatrix} y_1-\mu\\ \vdots\\ y_n-\mu \end{bmatrix} = \sigma^{-1}\,\frac{y_1+\cdots+y_n-n\mu}{\sqrt n},$$
upon substituting
$$\hat\mu = \frac1n\mathbf 1'\mathbf y = \frac{y_1+\cdots+y_n}{n}\;\Leftrightarrow\;y_1+\cdots+y_n = n\hat\mu$$
$$z_n = \sigma^{-1}\sqrt n\,(\hat\mu-\mu),\qquad z_n^2 = n\,\frac{(\hat\mu-\mu)^2}{\sigma^2},\qquad dz_n = \frac{\sqrt n}{\sigma}\,d\hat\mu.$$
Let us implement $dz_n$ in the marginal distribution:
$$dF_1 = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac12 z_n^2\right)dz_n\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}(2\pi)^{-(n-1)/2}\exp\left[-\frac12\left(z_1^2+\cdots+z_{n-1}^2\right)\right]dz_1\cdots dz_{n-1},$$
$$\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}(2\pi)^{-(n-1)/2}\exp\left[-\frac12\left(z_1^2+\cdots+z_{n-1}^2\right)\right]dz_1\cdots dz_{n-1} = 1$$
$$dF_1 = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac12 z_n^2\right)dz_n$$
$$dF_1 = \frac{1}{\sqrt{2\pi}}\frac{\sqrt n}{\sigma}\exp\left(-\frac12\frac{(\hat\mu-\mu)^2}{\sigma^2/n}\right)d\hat\mu = f_1\!\left(\hat\mu\,\middle|\,\mu,\frac{\sigma^2}{n}\right)d\hat\mu.$$
The fifth action item
Second, we identify the marginal distribution of the sample variance $\hat\sigma^2$. We depart from the ansatz
$$dF_2 = f_2(\hat\sigma)\,d\hat\sigma = f_2(\hat\sigma^2)\,d\hat\sigma^2 = \frac{1}{(2\pi)^{(n-1)/2}}\exp\left[-\frac12\left(z_1^2+\cdots+z_{n-1}^2\right)\right]dz_1\cdots dz_{n-1}.$$
Transform the Cartesian coordinates $(z_1,\dots,z_{n-1})\in\mathbb R^{n-1}$ to spherical coordinates $(\phi_1,\phi_2,\dots,\phi_{n-2},r)$. From the operational point of view, $p = n-1$, the number of "degrees of freedom", is an optional choice. Let us substitute the global hypersurface element $\omega_{p-1}$ into $dF_2$, namely
$$\int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac12 z_n^2\right)dz_n = 1$$
$$dF_2 = \frac{1}{(2\pi)^{(n-1)/2}}\,r^{n-2}\exp\left(-\frac12 r^2\right)dr\int_{-\pi/2}^{+\pi/2}\cos^{n-3}\phi_{n-2}\,d\phi_{n-2}\int_{-\pi/2}^{+\pi/2}\cos^{n-4}\phi_{n-3}\,d\phi_{n-3}\cdots\int_{-\pi/2}^{+\pi/2}\cos\phi_2\,d\phi_2\int_0^{2\pi}d\phi_1$$
$$dF_2 = \frac{1}{(2\pi)^{p/2}}\,r^{p-1}\exp\left(-\frac12 r^2\right)dr\int_{-\pi/2}^{+\pi/2}\cos^{p-2}\phi_{p-1}\,d\phi_{p-1}\int_{-\pi/2}^{+\pi/2}\cos^{p-3}\phi_{p-2}\,d\phi_{p-2}\cdots\int_{-\pi/2}^{+\pi/2}\cos^2\phi_3\,d\phi_3\int_{-\pi/2}^{+\pi/2}\cos\phi_2\,d\phi_2\int_0^{2\pi}d\phi_1$$
$$\omega_{p-1} = \frac{2\pi^{p/2}}{\Gamma(p/2)} = \int_{-\pi/2}^{+\pi/2}\cos^{p-2}\phi_{p-1}\,d\phi_{p-1}\int_{-\pi/2}^{+\pi/2}\cos^{p-3}\phi_{p-2}\,d\phi_{p-2}\cdots\int_{-\pi/2}^{+\pi/2}\cos^2\phi_3\,d\phi_3\int_{-\pi/2}^{+\pi/2}\cos\phi_2\,d\phi_2\int_0^{2\pi}d\phi_1$$
$$dF_2 = \frac{\omega_{p-1}}{(2\pi)^{p/2}}\,r^{p-1}\exp\left(-\frac12 r^2\right)dr$$
$$dF_2 = \frac{2}{2^{p/2}\Gamma(p/2)}\,r^{p-1}\exp\left(-\frac12 r^2\right)dr.$$
The marginal distribution $f_2(\hat\sigma)$ of the r.m.s. is generated as soon as we substitute the radius $r = \sqrt{z_1^2+\cdots+z_{n-1}^2} = \sqrt{n-1}\,\hat\sigma/\sigma$. Alternatively the marginal distribution $f_2(\hat\sigma^2)$ of the sample variance is produced when we substitute the radius squared $r^2 = z_1^2+\cdots+z_{n-1}^2 = (n-1)\hat\sigma^2/\sigma^2$.
Project A
$$r = \sqrt{n-1}\,\frac{\hat\sigma}{\sigma} = \frac{\sqrt p}{\sigma}\,\hat\sigma\;\Rightarrow\;dr = \frac{\sqrt p}{\sigma}\,d\hat\sigma$$
$$dF_2 = f_2(\hat\sigma)\,d\hat\sigma$$
$$f_2(\hat\sigma) = \frac{2p^{p/2}}{\sigma^p\,2^{p/2}\,\Gamma(p/2)}\,\hat\sigma^{p-1}\exp\left(-\frac{p}{2\sigma^2}\,\hat\sigma^2\right).$$
Indeed, $f_2(\hat\sigma)$ establishes the marginal distribution of the root-mean-square error $\hat\sigma$ with $p = n-1$ degrees of freedom.
Project B
$$x := r^2 = (n-1)\frac{\hat\sigma^2}{\sigma^2} = p\,\frac{\hat\sigma^2}{\sigma^2} =: \chi_p^2\qquad\left[\,dx = 2r\,dr,\quad dr = \frac{dx}{2r} = \frac{dx}{2\sqrt x}\,\right]$$
$$r^{p-1}\,dr = \frac12\,x^{\frac{p-2}{2}}\,dx$$
$$dF_2 = f_2(x)\,dx$$
$$f_2(x) := \frac{1}{2^{p/2}\Gamma(p/2)}\,x^{\frac p2-1}\exp\left(-\frac12 x\right).$$
Finally, we have derived Helmert's Chi Square $\chi_p^2$ distribution $f_2(x)\,dx = dF_2$ by substituting $r^2$, $r^{p-1}$ and $dr$ in favor of $x := r^2$ and $dx = 2r\,dr$.
Project C
Replace the radial coordinate squared $r^2 = (n-1)\hat\sigma^2/\sigma^2 = p\hat\sigma^2/\sigma^2$ by rescaling on the basis $p/\sigma^2$:
$$x = r^2 = z_1^2+\cdots+z_{n-1}^2 = z_1^2+\cdots+z_p^2 = (n-1)\frac{\hat\sigma^2}{\sigma^2} = \frac{p}{\sigma^2}\,\hat\sigma^2$$
$$dx = \frac{p}{\sigma^2}\,d\hat\sigma^2$$
within Helmert's $\chi_p^2$ with $p = n-1$ degrees of freedom:
$$dF_2 = f_2(\hat\sigma^2)\,d\hat\sigma^2$$
$$f_2(\hat\sigma^2) = \frac{p^{p/2}}{\sigma^p\,2^{p/2}\,\Gamma(p/2)}\,\hat\sigma^{p-2}\exp\left(-\frac{p}{2\sigma^2}\,\hat\sigma^2\right).$$
Recall that $f_2(\hat\sigma^2)$ establishes the marginal distribution of the sample variance $\hat\sigma^2$ with $p = n-1$ degrees of freedom.
Both the marginal pdf $f_2(\hat\sigma)$ of the sample standard deviation $\hat\sigma$, also called root-mean-square error, and the marginal pdf $f_2(\hat\sigma^2)$ of the sample variance $\hat\sigma^2$ document the dependence on the variance $\sigma^2$ and its power $\sigma^p$.
"Here is my journey's end."
(W. Shakespeare: Othello)
D42 The confidence interval for the sample mean, variance known
Lemma D12 (confidence interval for the sample mean, variance known):
$$\mu\in\left]\hat\mu-c_{1-\alpha/2}\frac{\sigma}{\sqrt n},\ \hat\mu+c_{1-\alpha/2}\frac{\sigma}{\sqrt n}\right[$$
with confidence
$$P\left\{\hat\mu-c_{1-\alpha/2}\frac{\sigma}{\sqrt n}<\mu<\hat\mu+c_{1-\alpha/2}\frac{\sigma}{\sqrt n}\right\} = 1-\alpha$$
of level $1-\alpha$. For three values of the coefficient of confidence $\gamma = 1-\alpha$, Table D7 lists the associated quantiles $c_{1-\alpha/2}$.
$$\mathbf y := \begin{bmatrix} y_1\\ y_2\\ y_3\\ y_4 \end{bmatrix} = \begin{bmatrix} 1.2\\ 3.4\\ 0.6\\ 5.6 \end{bmatrix},\qquad \hat\mu = 2.7,\quad \sigma^2 = 9,\quad\text{r.m.s.}\ \sigma = 3$$
We look for two boundaries which are reasonably certain to contain the unknown parameter $\mu$ between them. Previously, for samples of size 4 we have known that the random variable
$$z = \frac{\hat\mu-\mu}{\sigma/\sqrt n} = \frac{\hat\mu-\mu}{3/2}$$
is normally distributed with mean zero and unit variance. $\hat\mu$ is the sample mean 2.7 and 3/2 is $\sigma/\sqrt n$. The probability $\gamma = 1-\alpha$ that z will be between any two arbitrarily chosen numbers $c_1 = -c$ and $c_2 = +c$ is
$$\int_{c_1}^{c_2} f(z)\,dz = \int_{-c}^{+c} f(z)\,dz = \gamma = 1-\alpha$$
subject to
$$\int_{-\infty}^{+c} f(z)\,dz = 1-\frac\alpha2,\qquad \int_{-\infty}^{-c} f(z)\,dz = \frac\alpha2.$$
Table D7 collects the quantiles c_{1-\alpha/2} for given coefficients of confidence \gamma and their complements \alpha.

Table D7
Quantiles for the confidence interval of the sample mean, variance known

1-\alpha/2 = (1+\gamma)/2 | \gamma  | \alpha | c_{1-\alpha/2}
0.975                     | 0.95   | 0.05  | 1.960
0.995                     | 0.99   | 0.01  | 2.576
0.999,5                   | 0.999  | 0.001 | 3.291
Given the quantiles c_{1-\alpha/2}, we are going to construct the confidence interval for the sample mean \hat\mu, the variance \sigma^2 being known. For this purpose, we invert the forward transformation \hat\mu \to z = \sqrt{n}\,(\hat\mu-\mu)/\sigma with respect to \mu:

\mu = \hat\mu - \frac{\sigma}{\sqrt{n}}\,z

\hat\mu_1 := \hat\mu - c_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu <
\hat\mu + c_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} =: \hat\mu_2.

The interval \hat\mu_1 < \mu < \hat\mu_2 for the fixed value z = c_{1-\alpha/2} contains the "true" mean \mu with probability \gamma = 1-\alpha.
P\Bigl\{\hat\mu - c_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu <
\hat\mu + c_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\Bigr\}
= \int_{-c}^{+c} f(z)\,dz
= \int_{\hat\mu_1}^{\hat\mu_2} f\Bigl(\hat\mu \,\Big|\, \mu, \frac{\sigma^2}{n}\Bigr)\,d\hat\mu
= \gamma = 1-\alpha

since

\int_{-\infty}^{\hat\mu_2} f(\hat\mu)\,d\hat\mu
= \int_{-\infty}^{\hat\mu + c_{1-\alpha/2}\sigma/\sqrt{n}} f(\hat\mu)\,d\hat\mu
= \int_{-\infty}^{c_{1-\alpha/2}} f(z)\,dz = 1-\frac{\alpha}{2}.

Figure D7: Two-sided confidence interval, quantile c_{1-\alpha/2}; pdf f(\hat\mu \mid \mu, \sigma^2) with limits
\hat\mu_1 = \hat\mu - c_{1-\alpha/2}\,\sigma/\sqrt{n} and \hat\mu_2 = \hat\mu + c_{1-\alpha/2}\,\sigma/\sqrt{n}
Let us specify all the integrals for our example:

\int_{-\infty}^{c_{1-\alpha/2}} f(z)\,dz = 1-\frac{\alpha}{2}

P\Bigl\{2.7 - 3.291\cdot\frac{3}{2} < \mu < 2.7 + 3.291\cdot\frac{3}{2}\Bigr\} = 0.999

P\{-2.236 < \mu < 7.636\} = 0.999.
With probability 95% the “true” mean P is an element of the interval ]-0.24,
+5.64[. In contrast, with probability 99% the “true” mean P is an element of the
larger interval ]-1.164, +6.564[. Finally, with probability 99.9% the “true” mean
P is an element of the largest interval ]-2.236, +7.636[.
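The three intervals above can be reproduced with a few lines of code. The following minimal sketch (the function name and its arrangement are ours, not the book's) obtains the quantile c_{1-\alpha/2} from Python's standard library:

```python
from statistics import NormalDist

def mean_ci_known_variance(mu_hat, sigma, n, gamma):
    """Two-sided interval mu_hat -/+ c_{1-alpha/2} * sigma / sqrt(n), sigma^2 known."""
    c = NormalDist().inv_cdf((1 + gamma) / 2)  # Gauss-Laplace quantile c_{1-alpha/2}
    half = c * sigma / n ** 0.5
    return mu_hat - half, mu_hat + half

# data of the example: mu_hat = 2.7, sigma = 3, n = 4
intervals = {g: mean_ci_known_variance(2.7, 3.0, 4, g) for g in (0.95, 0.99, 0.999)}
```

For γ = 0.95, 0.99, 0.999 this reproduces the intervals ]-0.24, 5.64[, ]-1.164, 6.564[ and ]-2.236, 7.636[ up to rounding.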
D5 Sampling from the Gauss-Laplace normal distribution:
a third confidence interval for the mean, variance unknown
In order to derive the sampling distributions for the sample mean, variance unknown, of Gauss-Laplace i.i.d. observations, D51 introduces two examples (two and three observations, respectively) for generating Student's t distribution. Lemma D13 reviews Student's t-distribution of the random variable \sqrt{n}\,(\hat\mu-\mu)/\hat\sigma, where the sample mean \hat\mu is BLUUE of \mu, whereas the sample variance \hat\sigma^2 is BIQUUE of \sigma^2. D52, by means of Lemma D13, introduces the confidence interval for the "true" mean \mu, variance \sigma^2 unknown, which is based on Student's probability distribution. For easy computation, Table D12 is its flow chart. D53 discusses the Uncertainty Principle generated by the Magic Triangle of (i) the length of the confidence interval, (ii) the coefficient of negative confidence, also called the uncertainty number, and (iii) the number of observations. Various figures and examples pave the way for the routine analyst's use of the confidence interval for the mean, variance unknown.
D51 Student's sampling distribution of the random variable (\hat\mu-\mu)/\hat\sigma
Two examples for n=2 or n=3 Gauss-Laplace i.i.d. observations help us derive Student's t-distribution for the random variable \sqrt{n}\,(\hat\mu-\mu)/\hat\sigma, where \hat\mu is BLUUE of \mu, whereas \hat\sigma^2 is BIQUUE of \sigma^2. Lemma D13 and its proof are the highlight of this paragraph in generating the sampling probability distribution of Student's t.
Example D15 (Student's t-distribution for two Gauss-Laplace i.i.d. observations):
First, assume an experiment of two Gauss-Laplace i.i.d. observations called y_1 and y_2. We want to prove that (y_1+y_2)/2 and (y_1-y_2)^2/2, or the sample mean \hat\mu and the sample variance \hat\sigma^2, are stochastically independent. y_1 and y_2 are elements of the joint pdf

f(y_1, y_2) = f(y_1)\,f(y_2)

f(y_1, y_2) = \frac{1}{2\pi\sigma^2}
\exp\Bigl(-\frac{1}{2\sigma^2}\bigl[(y_1-\mu)^2 + (y_2-\mu)^2\bigr]\Bigr)
f(y_1, y_2) = \frac{1}{2\pi\sigma^2}
\exp\Bigl(-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{1}\mu)'(\mathbf{y}-\mathbf{1}\mu)\Bigr).

The quadratic form (\mathbf{y}-\mathbf{1}\mu)'(\mathbf{y}-\mathbf{1}\mu) is decomposed into the sample variance \hat\sigma^2, BIQUUE of \sigma^2, and the deviate of the sample mean \hat\mu, BLUUE of \mu, from \mu by means of the fundamental separation

\mathbf{y}-\mathbf{1}\mu = \mathbf{y}-\mathbf{1}\hat\mu + \mathbf{1}(\hat\mu-\mu)

(\mathbf{y}-\mathbf{1}\mu)'(\mathbf{y}-\mathbf{1}\mu) = \hat\sigma^2 + 2(\hat\mu-\mu)^2

f(y_1, y_2) = \frac{1}{2\pi\sigma^2}
\exp\Bigl(-\frac{1}{2\sigma^2}\,\hat\sigma^2\Bigr)
\exp\Bigl(-\frac{2}{2\sigma^2}(\hat\mu-\mu)^2\Bigr).

The joint pdf f(y_1, y_2) is transformed into a special form if we replace

\hat\mu = (y_1+y_2)/2,
\qquad
\hat\sigma^2 = (y_1-\hat\mu)^2 + (y_2-\hat\mu)^2 = \tfrac12(y_1-y_2)^2,

namely

f(y_1, y_2) = \frac{1}{2\pi\sigma^2}
\exp\Bigl(-\frac{1}{2\sigma^2}\,\frac{1}{2}(y_1-y_2)^2\Bigr)
\exp\Bigl(-\frac{1}{\sigma^2}\Bigl(\frac{y_1+y_2}{2}-\mu\Bigr)^2\Bigr).

Obviously the product decomposition of the joint pdf documents that (y_1+y_2)/2 and (y_1-y_2)^2/2, or \hat\mu and \hat\sigma^2, are independent random variables.
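The claimed independence can be probed empirically. A minimal Monte Carlo sketch (our code, not the book's; sample size and seed are arbitrary choices) estimates the correlation of the sample mean and the sample variance for two i.i.d. Gauss-Laplace draws, which should be close to zero:

```python
import random

rng = random.Random(42)
n = 100_000
means, variances = [], []
for _ in range(n):
    y1, y2 = rng.gauss(2.0, 3.0), rng.gauss(2.0, 3.0)
    means.append((y1 + y2) / 2)          # sample mean
    variances.append((y1 - y2) ** 2 / 2)  # sample variance for n = 2

mm = sum(means) / n
mv = sum(variances) / n
cov = sum((a - mm) * (b - mv) for a, b in zip(means, variances)) / n
sm = (sum((a - mm) ** 2 for a in means) / n) ** 0.5
sv = (sum((b - mv) ** 2 for b in variances) / n) ** 0.5
corr = cov / (sm * sv)   # should be near 0 under independence
```

Zero correlation is of course weaker than independence, but it is the symptom the product decomposition predicts.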
Second, we intend to derive the pdf of Student's random variable t := \sqrt{2}\,(\hat\mu-\mu)/\hat\sigma, the deviate of the sample mean \hat\mu from the "true" mean \mu, normalized by the sample standard deviation \hat\sigma. Let us introduce the direct Helmert transformation

z_1 = \frac{1}{\sigma}\,\frac{y_1-y_2}{\sqrt{2}} = \frac{\hat\sigma}{\sigma},
\qquad
z_2 = \frac{\sqrt{2}}{\sigma}\Bigl(\frac{y_1+y_2}{2}-\mu\Bigr) = \sqrt{2}\,\frac{\hat\mu-\mu}{\sigma},

or

\begin{bmatrix} z_1 \\ z_2 \end{bmatrix}
= \frac{1}{\sigma}
\begin{bmatrix} \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} \\ \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{bmatrix}
\begin{bmatrix} y_1-\mu \\ y_2-\mu \end{bmatrix},

as well as the inverse

\begin{bmatrix} y_1-\mu \\ y_2-\mu \end{bmatrix}
= \sigma
\begin{bmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\ -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} \end{bmatrix}
\begin{bmatrix} z_1 \\ z_2 \end{bmatrix},
which brings the joint pdf dF = f(y_1, y_2)\,dy_1\,dy_2 = f(z_1, z_2)\,dz_1\,dz_2 into the canonical form

dF = \frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{z_1^2}{2}\Bigr)\,
\frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{z_2^2}{2}\Bigr)\,dz_1\,dz_2,
\qquad
dy_1\,dy_2 = \sigma^2\,dz_1\,dz_2.

The Helmert random variable x := z_1^2, i.e. z_1 = \pm\sqrt{x}, replaces the random variable z_1. Per branch dz_1 = dx/(2\sqrt{x}); folding the two branches z_1 = \pm\sqrt{x} into one doubles the density, so that effectively

dz_1\,dz_2 = \frac{1}{\sqrt{x}}\,dx\,dz_2

dF = \frac{1}{\sqrt{2\pi}}\,\frac{1}{\sqrt{x}}\exp\Bigl(-\frac{x}{2}\Bigr)\,
\frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{z_2^2}{2}\Bigr)\,dx\,dz_2
is the joint pdf of x and z_2. Finally, we introduce Student's random variable

t := \sqrt{2}\,\frac{\hat\mu-\mu}{\hat\sigma},

decomposed into

z_2 = \frac{\sqrt{2}}{\sigma}(\hat\mu-\mu)
\;\Leftrightarrow\;
\hat\mu-\mu = \frac{\sigma}{\sqrt{2}}\,z_2,
\qquad
z_1 = \sqrt{x} = \frac{\hat\sigma}{\sigma}
\;\Leftrightarrow\;
\hat\sigma = \sigma z_1 = \sigma\sqrt{x},

t = \frac{z_2}{\sqrt{x}}
\;\Leftrightarrow\;
z_2 = \sqrt{x}\;t
\;\Leftrightarrow\;
z_2^2 = x\,t^2.

Let us transform dF = f(x, z_2)\,dx\,dz_2 to dF = f(t, x)\,dt\,dx, namely from the joint pdf of the Helmert random variable x and the Gauss-Laplace normal variate z_2 to the joint pdf of the Student random variable t and the Helmert random variable x.

dz_2\,dx =
\begin{vmatrix} D_t z_2 & D_x z_2 \\ D_t x & D_x x \end{vmatrix} dt\,dx
= \begin{vmatrix} \sqrt{x} & \tfrac12 x^{-1/2}\,t \\ 0 & 1 \end{vmatrix} dt\,dx
= \sqrt{x}\;dt\,dx

dF = f(t, x)\,dt\,dx

f(t, x) = \frac{1}{2\pi}\exp\Bigl[-\frac12(1+t^2)\,x\Bigr].

The marginal distribution of Student's random variable t, namely dF_3 = f_3(t)\,dt, is generated by

f_3(t) := \frac{1}{2\pi}\int_0^{\infty}\exp\Bigl[-\frac12(1+t^2)\,x\Bigr]dx,
\qquad
\beta := \frac12(1+t^2),\quad \frac{1}{\beta} = \frac{2}{1+t^2},

such that

f_3(t) = \frac{1}{\pi}\,\frac{1}{1+t^2},
\qquad
dF_3 = \frac{1}{\pi}\,\frac{1}{1+t^2}\,dt,

characterized by a pdf f_3(t) which is reciprocal to (1+t^2): Student's t-distribution with p = 1 degree of freedom, the Cauchy distribution.
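The Cauchy character of t for n = 2 is easy to check by simulation, since t = \sqrt{2}(\hat\mu-\mu)/\hat\sigma reduces to a ratio of independent standard normals, and the Cauchy distribution satisfies P(|t| < 1) = 1/2. A minimal sketch (our code; the parameter values and seed are arbitrary):

```python
import random

rng = random.Random(7)
mu, sigma, trials = 2.7, 3.0, 200_000
inside = 0
for _ in range(trials):
    y1, y2 = rng.gauss(mu, sigma), rng.gauss(mu, sigma)
    mu_hat = (y1 + y2) / 2
    sigma_hat = abs(y1 - y2) / 2 ** 0.5
    t = 2 ** 0.5 * (mu_hat - mu) / sigma_hat  # = z2/|z1|, Cauchy-distributed
    if abs(t) < 1:
        inside += 1
frac = inside / trials   # should be close to 1/2
```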
Example D16 (Student's t-distribution for three Gauss-Laplace i.i.d. observations):
First, assume an experiment of three Gauss-Laplace i.i.d. observations called y_1, y_2 and y_3. We want to derive the joint pdf f(y_1, y_2, y_3) in terms of the sample mean \hat\mu, BLUUE of \mu, and the sample variance \hat\sigma^2, BIQUUE of \sigma^2.

f(y_1, y_2, y_3) = f(y_1)\,f(y_2)\,f(y_3)

f(y_1, y_2, y_3) = \frac{1}{(2\pi)^{3/2}\sigma^3}
\exp\Bigl(-\frac{1}{2\sigma^2}\bigl[(y_1-\mu)^2+(y_2-\mu)^2+(y_3-\mu)^2\bigr]\Bigr)

f(y_1, y_2, y_3) = \frac{1}{(2\pi)^{3/2}\sigma^3}
\exp\Bigl(-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{1}\mu)'(\mathbf{y}-\mathbf{1}\mu)\Bigr).

The quadratic form (\mathbf{y}-\mathbf{1}\mu)'(\mathbf{y}-\mathbf{1}\mu) is decomposed into the sample variance \hat\sigma^2 and the deviate of the sample mean \hat\mu from the "true" mean \mu by means of the fundamental separation

\mathbf{y}-\mathbf{1}\mu = \mathbf{y}-\mathbf{1}\hat\mu + \mathbf{1}(\hat\mu-\mu)

(\mathbf{y}-\mathbf{1}\mu)'(\mathbf{y}-\mathbf{1}\mu) = 2\hat\sigma^2 + 3(\hat\mu-\mu)^2

dF = f(y_1, y_2, y_3)\,dy_1\,dy_2\,dy_3
= \frac{1}{(2\pi)^{3/2}\sigma^3}
\exp\Bigl(-\frac{1}{2\sigma^2}\,2\hat\sigma^2\Bigr)
\exp\Bigl(-\frac{3}{2\sigma^2}(\hat\mu-\mu)^2\Bigr)\,dy_1\,dy_2\,dy_3.
Second, we intend to derive the pdf of Student's random variable t := \sqrt{3}\,(\hat\mu-\mu)/\hat\sigma, the deviate of the sample mean \hat\mu from the "true" mean \mu, normalized by the sample standard deviation \hat\sigma. Let us introduce the direct Helmert transformation

z_1 = \frac{1}{\sigma\sqrt{2}}\bigl[(y_1-\mu)-(y_2-\mu)\bigr]
= \frac{1}{\sigma\sqrt{2}}\,(y_1-y_2)

z_2 = \frac{1}{\sigma\sqrt{6}}\bigl[(y_1-\mu)+(y_2-\mu)-2(y_3-\mu)\bigr]
= \frac{1}{\sigma\sqrt{6}}\,(y_1+y_2-2y_3)

z_3 = \frac{1}{\sigma\sqrt{3}}\bigl[(y_1-\mu)+(y_2-\mu)+(y_3-\mu)\bigr]
= \frac{1}{\sigma\sqrt{3}}\,(y_1+y_2+y_3-3\mu)

or

\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}
= \frac{1}{\sigma}
\begin{bmatrix}
\frac{1}{\sqrt2} & -\frac{1}{\sqrt2} & 0 \\
\frac{1}{\sqrt6} & \frac{1}{\sqrt6} & -\frac{2}{\sqrt6} \\
\frac{1}{\sqrt3} & \frac{1}{\sqrt3} & \frac{1}{\sqrt3}
\end{bmatrix}
\begin{bmatrix} y_1-\mu \\ y_2-\mu \\ y_3-\mu \end{bmatrix}

as well as its inverse

\begin{bmatrix} y_1-\mu \\ y_2-\mu \\ y_3-\mu \end{bmatrix}
= \sigma
\begin{bmatrix}
\frac{1}{\sqrt2} & \frac{1}{\sqrt6} & \frac{1}{\sqrt3} \\
-\frac{1}{\sqrt2} & \frac{1}{\sqrt6} & \frac{1}{\sqrt3} \\
0 & -\frac{2}{\sqrt6} & \frac{1}{\sqrt3}
\end{bmatrix}
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix},

in general

\mathbf{z} = \sigma^{-1}\mathbf{H}(\mathbf{y}-\mathbf{1}\mu)
\quad\text{versus}\quad
\mathbf{y}-\mathbf{1}\mu = \sigma\,\mathbf{H}'\mathbf{z},

which helps us to bring the joint pdf dF = f(y_1,y_2,y_3)\,dy_1\,dy_2\,dy_3 = f(z_1,z_2,z_3)\,dz_1\,dz_2\,dz_3 into the canonical form

dF = \frac{1}{(2\pi)^{3/2}}
\exp\Bigl(-\frac12(z_1^2+z_2^2)\Bigr)\exp\Bigl(-\frac12 z_3^2\Bigr)\,dz_1\,dz_2\,dz_3,
\qquad
dy_1\,dy_2\,dy_3 = \sigma^3\,dz_1\,dz_2\,dz_3.
The Helmert random variable x := z_1^2 + z_2^2 replaces the pair z_1, z_2 as soon as we introduce polar coordinates z_1 = r\cos\phi_1, z_2 = r\sin\phi_1, z_1^2+z_2^2 = r^2 =: x and compute the marginal pdf

dF(z_3, x) = \frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{z_3^2}{2}\Bigr)dz_3\;
\frac{1}{2\pi}\,\frac12\exp\Bigl(-\frac{x}{2}\Bigr)dx \int_0^{2\pi} d\phi_1

by means of

x := z_1^2+z_2^2 = r^2,\qquad dx = 2r\,dr,\qquad dr = \frac{dx}{2r},

dz_1\,dz_2 = r\,dr\,d\phi_1 = \frac12\,dx\,d\phi_1,

dF(z_3, x) = \frac12\exp\Bigl(-\frac{x}{2}\Bigr)\,
\frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{z_3^2}{2}\Bigr)\,dx\,dz_3,
the joint pdf of x and z_3. Finally, we inject Student's random variable

t := \sqrt{3}\,\frac{\hat\mu-\mu}{\hat\sigma},

decomposed into

z_3 = \frac{\sqrt3}{\sigma}(\hat\mu-\mu)
\;\Leftrightarrow\;
\hat\mu-\mu = \frac{\sigma}{\sqrt3}\,z_3

\sqrt{z_1^2+z_2^2} = \sqrt{x} = \sqrt2\,\frac{\hat\sigma}{\sigma}
\;\Leftrightarrow\;
\hat\sigma = \frac{\sigma}{\sqrt2}\sqrt{z_1^2+z_2^2} = \frac{\sigma}{\sqrt2}\sqrt{x}

t = \sqrt2\,\frac{z_3}{\sqrt{x}}
\;\Leftrightarrow\;
z_3 = \sqrt{\frac{x}{2}}\;t
\;\Leftrightarrow\;
z_3^2 = \frac12\,x\,t^2.

Let us transform dF = f(x, z_3)\,dx\,dz_3 to dF = f(t, x)\,dt\,dx. Alternatively we may say that we transform the joint pdf of the Helmert random variable x and the Gauss-Laplace normal variate z_3 to the joint pdf of the Student random variable t and the Helmert random variable x.

dz_3\,dx =
\begin{vmatrix} D_t z_3 & D_x z_3 \\ D_t x & D_x x \end{vmatrix} dt\,dx
= \begin{vmatrix} \sqrt{\frac{x}{2}} & \frac{t}{2\sqrt{2x}} \\ 0 & 1 \end{vmatrix} dt\,dx
= \sqrt{\frac{x}{2}}\;dt\,dx

dF = f(t, x)\,dt\,dx

f(t, x) = \frac12\,\frac{1}{\sqrt2}\,\frac{1}{\sqrt{2\pi}}\,\sqrt{x}\,
\exp\Bigl[-\frac12\Bigl(1+\frac{t^2}{2}\Bigr)x\Bigr].
The marginal distribution of Student's random variable t, namely dF_3 = f_3(t)\,dt, is generated by

f_3(t) := \frac12\,\frac{1}{\sqrt2}\,\frac{1}{\sqrt{2\pi}}
\int_0^{\infty}\sqrt{x}\,\exp\Bigl[-\frac12\Bigl(1+\frac{t^2}{2}\Bigr)x\Bigr]dx

via the standard integral

\int_0^{\infty} x^{\alpha}\exp(-\beta x)\,dx
= \frac{\Gamma(\alpha+1)}{\beta^{\alpha+1}},
\qquad
\alpha = \frac12,\quad \beta = \frac12\Bigl(1+\frac{t^2}{2}\Bigr),

such that

\int_0^{\infty}\sqrt{x}\,\exp\Bigl[-\frac12\Bigl(1+\frac{t^2}{2}\Bigr)x\Bigr]dx
= 2^{3/2}\,\frac{\Gamma(\frac32)}{\bigl(1+\frac{t^2}{2}\bigr)^{3/2}},
\qquad
\Gamma\Bigl(\frac32\Bigr) = \frac{\sqrt\pi}{2},

f_3(t) = \Gamma\Bigl(\frac32\Bigr)\frac{1}{\sqrt{2\pi}}\,
\frac{1}{\bigl(1+\frac{t^2}{2}\bigr)^{3/2}},
\qquad
dF_3 = \frac{\sqrt2}{4}\,\frac{1}{\bigl(1+\frac{t^2}{2}\bigr)^{3/2}}\,dt.
Again Student's t-distribution is reciprocal to (1+t^2/2)^{3/2}.
Lemma D13 (Student's t-distribution for the deviate of the mean (\hat\mu-\mu)/\hat\sigma, W.S. Gosset 1908):
Let the random vector of observations \mathbf{y} = [y_1, \ldots, y_n]' be Gauss-Laplace i.i.d. Student's random variable

t := \frac{\hat\mu-\mu}{\hat\sigma}\,\sqrt{n},

where the sample mean \hat\mu is BLUUE of \mu and the sample variance \hat\sigma^2 is BIQUUE of \sigma^2, is associated to the pdf

f(t) = \frac{\Gamma\bigl(\frac{p+1}{2}\bigr)}{\Gamma\bigl(\frac{p}{2}\bigr)}\,
\frac{1}{\sqrt{p\pi}}\,
\frac{1}{\bigl(1+\frac{t^2}{p}\bigr)^{(p+1)/2}},
\qquad p = n-1,
due to the fact that z_n and z_1^2+\cdots+z_{n-1}^2, or \hat\mu-\mu and (n-1)\hat\sigma^2, are stochastically independent. Let us take reference to the specific pdfs with n and n-1 = p degrees of freedom

f_1(z_n) = \frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{z_n^2}{2}\Bigr)
\quad\text{and}\quad
f_2(x) = \frac{1}{2^{p/2}\,\Gamma(\frac{p}{2})}\,x^{(p-2)/2}\exp\Bigl(-\frac{x}{2}\Bigr),

or

f_1(\hat\mu) = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}
\exp\Bigl(-\frac{n\,(\hat\mu-\mu)^2}{2\sigma^2}\Bigr)
\quad\text{and}\quad
f_2(\hat\sigma^2) = \frac{p^{p/2}}{\sigma^p\,2^{p/2}\,\Gamma(\frac{p}{2})}\,
\hat\sigma^{\,p-2}\exp\Bigl(-\frac{p\,\hat\sigma^2}{2\sigma^2}\Bigr).

With

t := \frac{z_n}{\sqrt{x}}\,\sqrt{n-1} = \frac{\hat\mu-\mu}{\hat\sigma}\,\sqrt{n}
\quad\text{or}\quad
z_n = \sqrt{\frac{x}{n-1}}\;t = \sqrt{\frac{x}{p}}\;t,

\begin{bmatrix} dt \\ dx \end{bmatrix}
= \begin{bmatrix} D_{z_n} t & D_x t \\ D_{z_n} x & D_x x \end{bmatrix}
\begin{bmatrix} dz_n \\ dx \end{bmatrix}
= \mathbf{J}\begin{bmatrix} dz_n \\ dx \end{bmatrix},

\mathbf{J} = \begin{bmatrix} \sqrt{\frac{p}{x}} & -\frac12 x^{-3/2}\,z_n\sqrt{p} \\ 0 & 1 \end{bmatrix},
\qquad
|\mathbf{J}| = \sqrt{\frac{p}{x}},\qquad |\mathbf{J}|^{-1} = \sqrt{\frac{x}{p}},

we transform the surface element dz_n\,dx to the surface element |\mathbf{J}|^{-1}\,dt\,dx, namely

dz_n\,dx = \sqrt{\frac{x}{p}}\;dt\,dx.
dF_3 = f_3(t)\,dt

f_3(t) := \frac{1}{\sqrt{2p\pi}}\,\frac{1}{\Gamma(\frac{p}{2})}\,
\Bigl(\frac12\Bigr)^{p/2}
\int_0^{\infty} x^{(p-1)/2}
\exp\Bigl[-\frac12\Bigl(1+\frac{t^2}{p}\Bigr)x\Bigr]dx.

Consult W. Gröbner and N. Hofreiter (1973, p. 55) for the standard integral

\int_0^{\infty} x^{\alpha}\exp(-\beta x)\,dx
= \frac{\alpha!}{\beta^{\alpha+1}} = \frac{\Gamma(\alpha+1)}{\beta^{\alpha+1}}

and

\int_0^{\infty} x^{(p-1)/2}
\exp\Bigl[-\frac12\Bigl(1+\frac{t^2}{p}\Bigr)x\Bigr]dx
= 2^{(p+1)/2}\,
\frac{\Gamma\bigl(\frac{p+1}{2}\bigr)}{\bigl(1+\frac{t^2}{p}\bigr)^{(p+1)/2}},

where p = n-1 is the rank of the quadratic Helmert matrix \mathbf{H}_n. Notice a result of the gamma function: \Gamma\bigl(\frac{p+1}{2}\bigr) = \bigl(\frac{p-1}{2}\bigr)!. In summary, substituting the standard integral,

dF_3 = f_3(t)\,dt

f_3(t) = \frac{\Gamma\bigl(\frac{p+1}{2}\bigr)}{\Gamma\bigl(\frac{p}{2}\bigr)}\,
\frac{1}{\sqrt{p\pi}}\,
\frac{1}{\bigl(1+\frac{t^2}{p}\bigr)^{(p+1)/2}}

results in Student's t distribution, namely the pdf of Student's random variable \sqrt{n}\,(\hat\mu-\mu)/\hat\sigma.
□
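The closed form of f_3(t) can be checked against a library implementation. A minimal sketch (our code; scipy is assumed to be available):

```python
import math
from scipy.stats import t as student_t

def f3(tval, p):
    """Student pdf with p degrees of freedom, as derived above."""
    return (math.gamma((p + 1) / 2) / math.gamma(p / 2)
            / math.sqrt(p * math.pi)
            * (1 + tval ** 2 / p) ** (-(p + 1) / 2))

# compare against scipy's Student t pdf over a few (t, p) pairs
pairs = [(tv, p) for tv in (-2.0, 0.0, 1.5) for p in (1, 2, 3, 9)]
max_err = max(abs(f3(tv, p) - student_t.pdf(tv, p)) for tv, p in pairs)
```

For p = 1 the formula reduces to the Cauchy pdf of Example D15, for p = 2 to the pdf of Example D16.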
D52 The confidence interval for the mean, variance unknown
Lemma D13 is the basis for the construction of the confidence interval of the "true" mean, variance unknown, which we summarize in Lemma D14. Example D17 contains all details for computing such a confidence interval, namely Table D8, a collection of the most popular values of the coefficient of confidence, as well as Tables D9, D10 and D11, listing the quantiles for the confidence interval of the Student random variable with p = n-1 degrees of freedom. Figure D8 and Figure D9 illustrate the probability of the two-sided confidence interval for the mean, variance unknown, and the limits of the confidence interval. Table D12 as a flow chart paves the way for the "fast computation" of the confidence interval for the "true" mean, variance unknown.
Lemma D14 (confidence interval for the sample mean, variance unknown):
The random variable t = \sqrt{n}\,(\hat\mu-\mu)/\hat\sigma, characterized by the ratio of the deviate of the sample mean \hat\mu = n^{-1}\mathbf{1}'\mathbf{y}, BLUUE of \mu, from the "true" mean \mu and the standard deviation \hat\sigma, \hat\sigma^2 = (\mathbf{y}-\mathbf{1}\hat\mu)'(\mathbf{y}-\mathbf{1}\hat\mu)/(n-1) BIQUUE of the "true" variance \sigma^2, has the Student t-distribution with p = n-1 "degrees of freedom". The "true" mean \mu is an element of the two-sided confidence interval

\mu \in \Bigl]\hat\mu - c_{1-\alpha/2}\frac{\hat\sigma}{\sqrt{n}},\;
\hat\mu + c_{1-\alpha/2}\frac{\hat\sigma}{\sqrt{n}}\Bigr[

with confidence

P\Bigl\{\hat\mu - c_{1-\alpha/2}\frac{\hat\sigma}{\sqrt{n}} < \mu <
\hat\mu + c_{1-\alpha/2}\frac{\hat\sigma}{\sqrt{n}}\Bigr\} = 1-\alpha

of level 1-\alpha. For three values of the coefficient of confidence \gamma = 1-\alpha, Table D9 is a list of associated quantiles c_{1-\alpha/2}.
Example D17 (confidence interval for the sample mean \hat\mu, \sigma^2 unknown):
Suppose that a random sample

\mathbf{y} = [y_1, y_2, y_3, y_4]' = [1.2,\; 3.4,\; 0.6,\; 5.6]',
\qquad \hat\mu = 2.7,\quad \hat\sigma^2 = 5.2,\quad \hat\sigma = 2.3,

of four observations is characterized by the sample mean \hat\mu = 2.7 and the sample variance \hat\sigma^2 = 5.2. Then 2(2.7-\mu)/2.3 = \sqrt{n}\,(\hat\mu-\mu)/\hat\sigma = t has Student's pdf with p = n-1 = 3 degrees of freedom. The probability \gamma = 1-\alpha that t will be between any two arbitrarily chosen numbers c_1 = -c and c_2 = +c is

\int_{-c}^{+c} f(t)\,dt = 1-\alpha,
\qquad
\int_{-\infty}^{-c} f(t)\,dt = \frac{\alpha}{2}.
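The Student-based interval of Example D17 is computed like the Gauss-Laplace one, with the t quantile replacing the normal quantile. A minimal sketch (our code; scipy assumed available):

```python
from scipy.stats import t as student_t

def mean_ci_unknown_variance(mu_hat, sigma_hat, n, gamma):
    """Two-sided CI for mu, sigma^2 unknown: Student t with p = n-1 d.o.f."""
    c = student_t.ppf((1 + gamma) / 2, n - 1)   # quantile c_{1-alpha/2}
    half = c * sigma_hat / n ** 0.5
    return mu_hat - half, mu_hat + half

# data of Example D17: mu_hat = 2.7, sigma_hat^2 = 5.2, n = 4
lo, hi = mean_ci_unknown_variance(2.7, 5.2 ** 0.5, 4, 0.95)
```

Note that c_{1-\alpha/2}/\sqrt{n} = 3.182/2 ≈ 1.59 for γ = 0.95 and p = 3, matching case (i) of the quantile tables below.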
\alpha/2:                  0.025 | 0.005 | 0.000,5
1-\alpha/2 = (1+\gamma)/2: 0.975 | 0.995 | 0.999,5

In solving the linear Volterra integral equation of the first kind

\int_{-\infty}^{t^*} f(t)\,dt = 1 - \frac{\alpha(t^*)}{2} = \frac12\bigl[1+\gamma(t^*)\bigr],
Table D9 (continued)
Quantiles c_{1-\alpha/2}/\sqrt{n} for the confidence interval of the Student random variable with p = n-1 degrees of freedom, 1-\alpha/2 = (1+\gamma)/2 = 0.975, \gamma = 0.95, \alpha = 0.05

p  n   c_{1-\alpha/2}/\sqrt{n} | p    n    c_{1-\alpha/2}/\sqrt{n}
5  6   1.05                    | 39   40   0.320
6  7   0.925                   | 49   50   0.284
7  8   0.836                   | 99   100  0.198
8  9   0.769                   | 199  200  0.139
9  10  0.715                   | 499  500  0.088
Table D10
Quantiles c_{1-\alpha/2}/\sqrt{n} for the confidence interval of the Student random variable with p = n-1 degrees of freedom, 1-\alpha/2 = (1+\gamma)/2 = 0.995, \gamma = 0.990, \alpha = 0.010

p  n   c_{1-\alpha/2}/\sqrt{n} | p    n    c_{1-\alpha/2}/\sqrt{n}
1  2   45.01                   | 14   15   0.769
2  3   5.73                    | 19   20   0.640
3  4   2.92                    | 24   25   0.559
4  5   2.06                    | 29   30   0.503
5  6   1.65                    | 39   40   0.428
6  7   1.40                    | 49   50   0.379
7  8   1.24                    | 99   100  0.263
8  9   1.12                    | 199  200  0.184
9  10  1.03                    | 499  500  0.116
Table D11
Quantiles c_{1-\alpha/2}/\sqrt{n} for the confidence interval of the Student random variable with p = n-1 degrees of freedom, 1-\alpha/2 = (1+\gamma)/2 = 0.999,5, \gamma = 0.999, \alpha = 0.001

p  n  c_{1-\alpha/2}  c_{1-\alpha/2}/\sqrt{n} | p  n  c_{1-\alpha/2}  c_{1-\alpha/2}/\sqrt{n}

because of

\int_{-\infty}^{-c} f(t)\,dt = \int_{-\infty}^{-c_{1-\alpha/2}} f(t)\,dt = \frac{\alpha}{2}.
Figure D8 and Figure D9 illustrate the coefficient of confidence and the probability function of a confidence interval.

Figure D8: Two-sided confidence interval \mu \in ]\hat\mu_1, \hat\mu_2[, Student's pdf f(t) for p = 3 degrees of freedom (n = 4), \gamma = 1-\alpha between the quantiles -c_{1-\alpha/2} and +c_{1-\alpha/2}, tail probability \alpha/2 on each side; \hat\mu_1 = \hat\mu - \hat\sigma c_{1-\alpha/2}/\sqrt{n}, \hat\mu_2 = \hat\mu + \hat\sigma c_{1-\alpha/2}/\sqrt{n}

Figure D9: Two-sided confidence interval for the "true" mean \mu, quantile c_{1-\alpha/2}
\int_{-\infty}^{c_{1-\alpha/2}} f(t)\,dt = 1-\frac{\alpha}{2}.

These data substituted into Tables D9-D11 lead to the triplet of confidence intervals for the "true" mean:

case (i):   \gamma = 0.95,\; \alpha = 0.05:  p = 3, n = 4, c_{1-\alpha/2}/\sqrt{n} = 1.59
case (ii):  \gamma = 0.99,\; \alpha = 0.01:  p = 3, n = 4, c_{1-\alpha/2}/\sqrt{n} = 2.92
case (iii): \gamma = 0.999, \alpha = 0.001: p = 3, n = 4, c_{1-\alpha/2}/\sqrt{n} = 6.470
Figure D10: Length of the confidence interval for the mean against the number of observations

Figure D10 is the graph of the function \Delta\mu(n;\alpha), the length of the confidence interval of the "true" mean, plotted for fixed values of the uncertainty number \alpha.

Fact #2: For a constant number of observations n, the smaller the number of uncertainty \alpha is chosen, the larger is the confidence interval \Delta\mu.

Evidently, the diverse influences of (i) the length of the confidence interval \Delta\mu, (ii) the uncertainty number \alpha and (iii) the number of observations n, which we collected in the Magic Triangle of Figure D11, constitute the Uncertainty Principle, in formula

\Delta\mu\,(n-1) \ge k_{\mu\alpha},

where k_{\mu\alpha} is called the quantum number for the mean \mu, which depends on the uncertainty number \alpha. Table D13 is a list of those quantum numbers. Let us interpret the uncertainty relation \Delta\mu\,(n-1) \ge k_{\mu\alpha}. The product \Delta\mu\,(n-1) defines geometrically a hyperbola which we approximated from the graph of Figure D10. Given the uncertainty number \alpha, the product \Delta\mu\,(n-1) has a smallest number, here denoted by k_{\mu\alpha}. For instance, choose \alpha = 1\% such that k_{\mu\alpha}/(n-1) \le \Delta\mu or 16.4/(n-1) \le \Delta\mu. For n taking the values 2, 11, 101, we get the inequalities 8.2 \le \Delta\mu, 1.64 \le \Delta\mu, 0.164 \le \Delta\mu.
Table D13
Coefficient of complementary confidence \alpha (uncertainty number) versus quantum number of the mean k_{\mu\alpha} (E. Grafarend 1970)

\alpha  k_{\mu\alpha}
10%    6.6
5%     9.6
1%     16.4
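The quantum numbers of Table D13 can be roughly reproduced numerically: with \hat\sigma = 1, the interval length is \Delta\mu = 2\,c_{1-\alpha/2}\,\hat\sigma/\sqrt{n}, and k_{\mu\alpha} is the minimum of \Delta\mu\,(n-1) over n. A minimal sketch (our code; scipy assumed available):

```python
from scipy.stats import t as student_t

def delta_mu_times(alpha, n, sigma_hat=1.0):
    """(n-1) times the length of the mean's confidence interval, sigma_hat = 1."""
    c = student_t.ppf(1 - alpha / 2, n - 1)   # quantile c_{1-alpha/2}, p = n-1
    return (n - 1) * 2 * c * sigma_hat / n ** 0.5

k = {alpha: min(delta_mu_times(alpha, n) for n in range(2, 200))
     for alpha in (0.10, 0.05, 0.01)}
```

The minima come out near 6.7, 9.5 and 16.5, close to the tabulated 6.6, 9.6 and 16.4 (small differences are expected, since the table values were read off a graph).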
D6 A fourth confidence interval for the variance 613
of level 1-\alpha. Tables D15, D16 and D14 list the quantiles c_1(p;\alpha/2) and c_2(p;1-\alpha/2) associated to three values, collected in Table D17, of the coefficient of complementary confidence 1-\alpha.
In order to make yourself more familiar with Helmert's Chi Square distribution we recommend solving the problems of Exercise D1.

Exercise D1 (Helmert's Chi Square \chi_p^2 distribution):
Helmert's random variable x := (n-1)\hat\sigma^2/\sigma^2 = p\,\hat\sigma^2/\sigma^2 has the non-symmetric \chi_p^2 pdf. Prove that the first four central moments are

(i)   \pi_1 = 0,\qquad E\{x\} = \mu_x = p
(ii)  \pi_2 = \sigma_x^2 = 2p
(iii) \pi_3 = (2p)^{3/2}\,(8/p)^{1/2} = 8p
      (coefficient of skewness \pi_3^2/\pi_2^3 = 8/p)
(iv)  \pi_4 = 12\,p\,(p+4)
      (coefficient of kurtosis \pi_4/\pi_2^2 - 3 = 12/p).
Guide

(i) From

E\{x\} = \frac{p}{\sigma^2}\,E\{\hat\sigma^2\}
\quad\text{and}\quad
E\{\hat\sigma^2\} = \sigma^2 \;(\text{"unbiasedness"})
\quad\Rightarrow\quad
E\{x\} = p.

(ii) From

D\{\hat\sigma^2\} = \frac{2}{n-1}\,\sigma^4 = \frac{2}{p}\,\sigma^4
\quad\text{and}\quad
D\{x\} = \frac{p^2}{\sigma^4}\,D\{\hat\sigma^2\}
\quad\Rightarrow\quad
D\{x\} = 2p.
Example D18 (confidence interval for the sample variance \hat\sigma^2):
Suppose that a random sample (y_1, \ldots, y_n) \in Y of size n = 100 has led to an empirical variance \hat\sigma^2 = 20.6. Then x = (n-1)\hat\sigma^2/\sigma^2, or 99 \cdot 20.6/\sigma^2 = 2039.4/\sigma^2, has Helmert's pdf with p = n-1 = 99 degrees of freedom. The probability \gamma = 1-\alpha that x will be between c_1(p;\alpha/2) and c_2(p;1-\alpha/2) is given by

c_1(p;\alpha/2) < x < c_2(p;1-\alpha/2)
\;\Leftrightarrow\;
c_1(p;\alpha/2) < (n-1)\frac{\hat\sigma^2}{\sigma^2} < c_2(p;1-\alpha/2)

or

\frac{1}{c_2} < \frac{\sigma^2}{(n-1)\hat\sigma^2} < \frac{1}{c_1}
\;\Leftrightarrow\;
\frac{(n-1)\hat\sigma^2}{c_2} < \sigma^2 < \frac{(n-1)\hat\sigma^2}{c_1},

P\Bigl\{\frac{(n-1)\hat\sigma^2}{c_2(p;1-\alpha/2)} < \sigma^2 <
\frac{(n-1)\hat\sigma^2}{c_1(p;\alpha/2)}\Bigr\} = 1-\alpha = \gamma.
[Figure: Helmert pdf f(x) of x := (n-1)\hat\sigma^2/\sigma^2 = p\,\hat\sigma^2/\sigma^2, with quantiles c_1(p;\alpha/2) and c_2(p;1-\alpha/2), inducing the bounds (n-1)\hat\sigma^2/c_2(p;1-\alpha/2) < \sigma^2 < (n-1)\hat\sigma^2/c_1(p;\alpha/2)]
Table D15
Quantiles c_1(p;\alpha/2), c_2(p;1-\alpha/2) for the confidence interval of the Helmert random variable with p = n-1 degrees of freedom, \alpha/2 = 0.025, 1-\alpha/2 = 0.975, \alpha = 0.05, \gamma = 0.95

P\Bigl\{\frac{99\cdot 20.6}{128} < \sigma^2 < \frac{99\cdot 20.6}{73.4}\Bigr\} = 0.95

P\Bigl\{\frac{99\cdot 20.6}{139} < \sigma^2 < \frac{99\cdot 20.6}{66.5}\Bigr\} = 0.99

P\Bigl\{\frac{99\cdot 20.6}{147.3} < \sigma^2 < \frac{99\cdot 20.6}{60.3}\Bigr\} = 0.996

\sigma^2 \in \Bigl]\frac{(n-1)\hat\sigma^2}{c_2(p;1-\alpha/2)},\;
\frac{(n-1)\hat\sigma^2}{c_1(p;\alpha/2)}\Bigr[.
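The interval of Example D18 can be computed directly from the chi-square quantiles. A minimal sketch (our code; scipy assumed available):

```python
from scipy.stats import chi2

def variance_ci(sigma_hat_sq, n, gamma):
    """Two-sided CI for sigma^2 from Helmert's chi-square with p = n-1 d.o.f."""
    p, alpha = n - 1, 1 - gamma
    c1 = chi2.ppf(alpha / 2, p)       # lower quantile c1(p; alpha/2)
    c2 = chi2.ppf(1 - alpha / 2, p)   # upper quantile c2(p; 1-alpha/2)
    return p * sigma_hat_sq / c2, p * sigma_hat_sq / c1

# data of Example D18: sigma_hat^2 = 20.6, n = 100, gamma = 0.95
lo, hi = variance_ci(20.6, 100, 0.95)
```

For γ = 0.95 this gives roughly 15.9 < σ² < 27.8, in agreement with the quantiles 73.4 and 128 used above.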
Figure D14: Length of the confidence interval for the variance against the number of observations

Figure D14 is the graph of the function

\Delta\sigma^2(n;\alpha) := (n-1)\,\hat\sigma^2
\Bigl(\frac{1}{c_1(n-1;\alpha/2)} - \frac{1}{c_2(n-1;1-\alpha/2)}\Bigr),

where \Delta\sigma^2 is the length of the confidence interval of the "true" variance \sigma^2. The independent variable of the function is the number of observations n. The function \Delta\sigma^2(n;\alpha) is plotted for fixed values of the coefficient of complementary confidence \alpha, namely \alpha = 5\%, 1\%, 0.2\%. For reasons given later on, the coefficient of complementary confidence \alpha is called the uncertainty number. The graph of the function \Delta\sigma^2(n;\alpha) illustrates two important facts.

Fact #1: For a constant uncertainty number \alpha, the length of the confidence interval \Delta\sigma^2 is smaller, the larger the number of observations n is chosen.

Fact #2: For a constant number of observations n, the smaller the number of uncertainty \alpha is chosen, the larger is the confidence interval \Delta\sigma^2.

Evidently, the diverse influences of (i) the length of the confidence interval \Delta\sigma^2, (ii) the uncertainty number \alpha and (iii) the number of observations n, which we collect in the Magic Triangle of Figure D15, constitute the Uncertainty Principle, formulated by the inequality

\Delta\sigma^2\,(n-1) \ge k_{\sigma^2\alpha},

where k_{\sigma^2\alpha} is called the quantum number for the variance \sigma^2. The quantum number is the smallest value of the product \Delta\sigma^2\,(n-1), which defines geometrically a hyperbola we approximated from the graph of Figure D14. Given the uncertainty number \alpha, the product \Delta\sigma^2\,(n-1) has a smallest number denoted by k_{\sigma^2\alpha}. For the number of observations n, for instance n = 2, 11, 101, we find the inequalities 42 \le \Delta\sigma^2, 4.2 \le \Delta\sigma^2, 0.42 \le \Delta\sigma^2.
Table D19
Uncertainty number \alpha versus quantum number of the variance k_{\sigma^2\alpha}

\alpha  k_{\sigma^2\alpha}
10%    19.5
5%     25.9
1%     42.0
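As for the mean, the quantum numbers for the variance can be roughly reproduced numerically: with \hat\sigma^2 = 1, minimize \Delta\sigma^2\,(n-1) over n. A minimal sketch (our code; scipy assumed available):

```python
from scipy.stats import chi2

def delta_var_times(alpha, n, sigma_hat_sq=1.0):
    """(n-1) times the length of the variance's confidence interval, sigma_hat^2 = 1."""
    p = n - 1
    c1 = chi2.ppf(alpha / 2, p)       # lower quantile
    c2 = chi2.ppf(1 - alpha / 2, p)   # upper quantile
    return (n - 1) ** 2 * sigma_hat_sq * (1.0 / c1 - 1.0 / c2)

k = {alpha: min(delta_var_times(alpha, n) for n in range(2, 400))
     for alpha in (0.10, 0.05, 0.01)}
```

The minima come out near 19.1, 25.7 and 41.7, close to the tabulated 19.5, 25.9 and 42.0 (again the tabulated values were read off a graph).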
At the end, pay attention to the quantum numbers k_{\sigma^2\alpha} listed in Table D19.

D7 Sampling from the multidimensional Gauss-Laplace normal distribution
The special linear Gauss-Markov model of our example is defined by the first moment

E\{\mathbf{y}\} = \mathbf{A}\boldsymbol\xi:
\quad
E\Bigl\{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}\Bigr\}
= \begin{bmatrix} 1 & k_1 \\ 1 & k_2 \\ 1 & k_3 \end{bmatrix}
\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix},
\qquad \operatorname{rk}\mathbf{A} = 2.

The central second moment:

D\{\mathbf{y}\} = \mathbf{I}_n\sigma^2:
\quad
D\Bigl\{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}\Bigr\}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\sigma^2,
\qquad \sigma^2 > 0.

k represents the abscissa as a fixed effect, y the ordinate as the observation, naturally a random effect. Samples of the straight line are taken at k_1 = 0, k_2 = 1, k_3 = 2, calling for y(k_1) = y(0) = y_1, y(k_2) = y(1) = y_2, y(k_3) = y(2) = y_3, respectively. E\{\mathbf{y}\} is a consistent equation. Alternatively, we may say E\{\mathbf{y}\} \in R(\mathbf{A}). The matrix \mathbf{A} \in \mathbb{R}^{3\times2} leaves the redundancy p = n - \operatorname{rk}\mathbf{A} = 1, also called "degree of freedom". The dispersion matrix D\{\mathbf{y}\}, the central moment of second order, is represented as a linear model, too, namely by one variance component \sigma^2. The joint probability function of the three Gauss-Laplace i.i.d. observations

dF = f(y_1, y_2, y_3)\,dy_1\,dy_2\,dy_3 = f(y_1)\,f(y_2)\,f(y_3)\,dy_1\,dy_2\,dy_3

f(y_1, y_2, y_3) = (2\pi)^{-3/2}\sigma^{-3}
\exp\Bigl(-\frac{1}{2\sigma^2}(\mathbf{y}-E\{\mathbf{y}\})'(\mathbf{y}-E\{\mathbf{y}\})\Bigr)

will be transformed by means of the special linear Gauss-Markov model with one variance component.
The second action item
For such a transformation, we need \hat{\boldsymbol\xi} BLUUE of \boldsymbol\xi and \hat\sigma^2 BIQUUE of \sigma^2:

\hat{\boldsymbol\xi} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{y},
\qquad
\mathbf{A}'\mathbf{A} = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix},
\quad
(\mathbf{A}'\mathbf{A})^{-1} = \frac{1}{6}\begin{bmatrix} 5 & -3 \\ -3 & 3 \end{bmatrix},

\hat{\boldsymbol\xi} BLUUE of \boldsymbol\xi:
\quad
\hat{\boldsymbol\xi} = \frac{1}{6}
\begin{bmatrix} 5 & 2 & -1 \\ -3 & 0 & 3 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix},

\hat\sigma^2 BIQUUE of \sigma^2:
\quad
\hat\sigma^2 = \frac{1}{n-\operatorname{rk}\mathbf{A}}\,
(\mathbf{y}-\mathbf{A}\hat{\boldsymbol\xi})'(\mathbf{y}-\mathbf{A}\hat{\boldsymbol\xi})
= \frac{1}{n-\operatorname{rk}\mathbf{A}}\,
\mathbf{y}'\bigl(\mathbf{I}_n-\mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\bigr)\mathbf{y},

\hat\sigma^2 = \frac{1}{6}\bigl(y_1^2 - 4y_1y_2 + 2y_1y_3 + 4y_2^2 - 4y_2y_3 + y_3^2\bigr).
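The second action item can be verified numerically. A minimal sketch (numpy assumed; the sample values y are ours, not the book's):

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.5, 2.9])            # arbitrary sample values (illustration only)

AtA = A.T @ A                            # should equal [[3, 3], [3, 5]]
xi_hat = np.linalg.solve(AtA, A.T @ y)   # BLUUE of xi
M = np.eye(3) - A @ np.linalg.inv(AtA) @ A.T
p = 3 - np.linalg.matrix_rank(A)         # degree of freedom, here 1
sigma_hat_sq = (y @ M @ y) / p           # BIQUUE of sigma^2

# closed form derived in the text
y1, y2, y3 = y
closed = (y1**2 - 4*y1*y2 + 2*y1*y3 + 4*y2**2 - 4*y2*y3 + y3**2) / 6
```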
The third action item
The quadratic form (\mathbf{y}-E\{\mathbf{y}\})'(\mathbf{y}-E\{\mathbf{y}\}) allows the fundamental decomposition

(\mathbf{y}-E\{\mathbf{y}\})'(\mathbf{y}-E\{\mathbf{y}\})
= \hat\sigma^2 + (\hat{\boldsymbol\xi}-\boldsymbol\xi)'
\begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}
(\hat{\boldsymbol\xi}-\boldsymbol\xi).

The fourth action item
In order to bring the quadratic form (\mathbf{y}-E\{\mathbf{y}\})'(\mathbf{y}-E\{\mathbf{y}\}) = (n-\operatorname{rk}\mathbf{A})\hat\sigma^2 + (\hat{\boldsymbol\xi}-\boldsymbol\xi)'\mathbf{A}'\mathbf{A}(\hat{\boldsymbol\xi}-\boldsymbol\xi) into a canonical form, we introduce the generalized forward and backward Helmert transformation

\mathbf{H}\mathbf{H}' = \mathbf{I}_n,
\qquad
\mathbf{z} = \sigma^{-1}\mathbf{H}(\mathbf{y}-E\{\mathbf{y}\}) = \sigma^{-1}\mathbf{H}(\mathbf{y}-\mathbf{A}\boldsymbol\xi)

and

\mathbf{y}-E\{\mathbf{y}\} = \sigma\,\mathbf{H}'\mathbf{z},
\qquad
\frac{1}{\sigma^2}(\mathbf{y}-E\{\mathbf{y}\})'(\mathbf{y}-E\{\mathbf{y}\}) = \mathbf{z}'\mathbf{z} = z_1^2+z_2^2+z_3^2.
How do we relate the sample variance \hat\sigma^2 and the sample quadratic form (\hat{\boldsymbol\xi}-\boldsymbol\xi)'\mathbf{A}'\mathbf{A}(\hat{\boldsymbol\xi}-\boldsymbol\xi) to the canonical quadratic form \mathbf{z}'\mathbf{z}?
Previously, for the example of direct observations in the special linear Gauss-Markov model E\{\mathbf{y}\} = \mathbf{1}\mu, D\{\mathbf{y}\} = \mathbf{I}_n\sigma^2, we succeeded in relating z_1^2+\cdots+z_{n-1}^2 to \hat\sigma^2 and z_n^2 to (\hat\mu-\mu)^2. Here the sample variance \hat\sigma^2, BIQUUE of \sigma^2, as well as the quadratic form of the deviate of the sample parameter vector \hat{\boldsymbol\xi} from the "true" parameter vector \boldsymbol\xi, have been represented by

\hat\sigma^2 = \frac{1}{n-\operatorname{rk}\mathbf{A}}\,
(\mathbf{y}-\mathbf{A}\hat{\boldsymbol\xi})'(\mathbf{y}-\mathbf{A}\hat{\boldsymbol\xi})
= \frac{1}{n-\operatorname{rk}\mathbf{A}}\,
\mathbf{y}'\bigl[\mathbf{I}_n-\mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\bigr]\mathbf{y},

\operatorname{rk}\bigl[\mathbf{I}_n-\mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\bigr]
= n-\operatorname{rk}\mathbf{A} = n-m = 1.

The matrices

\mathbf{M} := \mathbf{I}_n-\mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'
= \frac{1}{6}\begin{bmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{bmatrix}
\quad\text{versus}\quad
\mathbf{N} := \mathbf{A}'\mathbf{A} = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}

will be analyzed.
The eigenspace analysis of the matrix \mathbf{M} (indices j \in \{1,\ldots,n\}):

\mathbf{V}'\mathbf{M}\mathbf{V}
= \begin{bmatrix} \mathbf{V}_1' \\ \mathbf{V}_2' \end{bmatrix}
\mathbf{M}\,[\mathbf{V}_1, \mathbf{V}_2]
= \operatorname{Diag}(\mu_1,\ldots,\mu_{n-m},0,\ldots,0) = \boldsymbol\Lambda_M,
\qquad
\operatorname{rk}\mathbf{M} = n-m.

The eigenspace analysis of the matrices \mathbf{N}, \mathbf{N}^{-1} (indices i \in \{1,\ldots,m\}):

\mathbf{U}'\mathbf{N}\mathbf{U} = \operatorname{Diag}(\gamma_1,\ldots,\gamma_m) = \boldsymbol\Lambda_N,
\qquad
\mathbf{U}'\mathbf{N}^{-1}\mathbf{U} = \operatorname{Diag}(\lambda_1,\ldots,\lambda_m) = \boldsymbol\Lambda_{N^{-1}},

\gamma_1 = \frac{1}{\lambda_1},\;\ldots,\;\gamma_m = \frac{1}{\lambda_m}
\quad\text{or}\quad
\lambda_1 = \frac{1}{\gamma_1},\;\ldots,\;\lambda_m = \frac{1}{\gamma_m},
\qquad
\operatorname{rk}\mathbf{N} = \operatorname{rk}\mathbf{N}^{-1} = m.

Orthonormality of the eigencolumns:

\mathbf{V}_1'\mathbf{V}_1 = \mathbf{I}_{n-m},\quad
\mathbf{V}_2'\mathbf{V}_2 = \mathbf{I}_m,\quad
\mathbf{V}_1'\mathbf{V}_2 = \mathbf{0} \in \mathbb{R}^{(n-m)\times m},
\qquad
\mathbf{U}'\mathbf{U} = \mathbf{I}_m,

\begin{bmatrix} \mathbf{V}_1' \\ \mathbf{V}_2' \end{bmatrix}
[\mathbf{V}_1\;\;\mathbf{V}_2]
= \begin{bmatrix} \mathbf{I}_{n-m} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_m \end{bmatrix},

that is v_1'v_1 = 1, v_1'v_2 = 0, \ldots, v_{n-1}'v_n = 0, v_n'v_n = 1 and u_1'u_1 = 1, u_1'u_2 = 0, \ldots, u_{m-1}'u_m = 0, u_m'u_m = 1, with \mathbf{V} \in SO(n), \mathbf{U} \in SO(m).

Eigencolumns:
(\mathbf{M}-\mu_j\mathbf{I}_n)\mathbf{v}_j = 0,
\qquad
(\mathbf{N}-\gamma_i\mathbf{I}_m)\mathbf{u}_i = 0.
Eigenvalues:
|\mathbf{M}-\mu_j\mathbf{I}_n| = 0,
\qquad
|\mathbf{N}-\gamma_i\mathbf{I}_m| = 0.

In particular, for the eigenspace analysis of the matrix \mathbf{M} (\operatorname{rk}\mathbf{M} = n-m = 1, \mathbf{M} \in \mathbb{R}^{3\times3}, \mathbf{A} \in \mathbb{R}^{3\times2}) and of the matrix \mathbf{N} (\operatorname{rk}\mathbf{N} = m = 2, \mathbf{N} \in \mathbb{R}^{2\times2}):

\boldsymbol\Lambda_M = \operatorname{Diag}(1, 0, 0),
\qquad
\boldsymbol\Lambda_N = \operatorname{Diag}(0.8377,\; 7.1623),

\mathbf{V} = [\mathbf{V}_1, \mathbf{V}_2]
= \begin{bmatrix}
0.4082 & 0.7024 & -0.5830 \\
-0.8165 & 0.5667 & 0.1109 \\
0.4082 & 0.4307 & 0.8049
\end{bmatrix},
\qquad
\mathbf{U} = \begin{bmatrix}
0.8112 & 0.5847 \\
-0.5847 & 0.8112
\end{bmatrix},

\mathbf{V}_1 \in \mathbb{R}^{3\times1},\quad \mathbf{V}_2 \in \mathbb{R}^{3\times2},\quad \mathbf{U} \in \mathbb{R}^{2\times2},

to be completed by the eigenspace synthesis of the matrix \mathbf{M} and of the matrices \mathbf{N}, \mathbf{N}^{-1}:

\mathbf{M} = \mathbf{V}\boldsymbol\Lambda_M\mathbf{V}'
= [\mathbf{V}_1, \mathbf{V}_2]\operatorname{Diag}(\mu_1,\ldots,\mu_{n-m},0,\ldots,0)
\begin{bmatrix} \mathbf{V}_1' \\ \mathbf{V}_2' \end{bmatrix}
= \mathbf{V}_1\operatorname{Diag}(\mu_1,\ldots,\mu_{n-m})\mathbf{V}_1',

\mathbf{N} = \mathbf{U}\boldsymbol\Lambda_N\mathbf{U}'
= \mathbf{U}\operatorname{Diag}(\gamma_1,\ldots,\gamma_m)\mathbf{U}',
\qquad
\mathbf{N}^{-1} = \mathbf{U}\boldsymbol\Lambda_{N^{-1}}\mathbf{U}'
= \mathbf{U}\operatorname{Diag}(\lambda_1,\ldots,\lambda_m)\mathbf{U}'.

In particular,

\mathbf{M} = \begin{bmatrix} 0.4082 \\ -0.8165 \\ 0.4082 \end{bmatrix}
\mu_1\,[\,0.4082\;\; -0.8165\;\; 0.4082\,],
\qquad \mu_1 = 1,

\mathbf{N} = \begin{bmatrix} 0.8112 & 0.5847 \\ -0.5847 & 0.8112 \end{bmatrix}
\begin{bmatrix} \gamma_1 & 0 \\ 0 & \gamma_2 \end{bmatrix}
\begin{bmatrix} 0.8112 & -0.5847 \\ 0.5847 & 0.8112 \end{bmatrix},
\qquad \gamma_1 = 0.8377,\quad \gamma_2 = 7.1623,

\mathbf{N}^{-1} = \begin{bmatrix} 0.8112 & 0.5847 \\ -0.5847 & 0.8112 \end{bmatrix}
\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}
\begin{bmatrix} 0.8112 & -0.5847 \\ 0.5847 & 0.8112 \end{bmatrix},
\qquad
\lambda_1 = \gamma_1^{-1} = 1.1937,\quad \lambda_2 = \gamma_2^{-1} = 0.1396.

The non-vanishing eigenvalues of the matrix \mathbf{M} have been denoted by (\mu_1,\ldots,\mu_{n-m}); m eigenvalues are zero such that eigen(\mathbf{M}) = (\mu_1,\ldots,\mu_{n-m},0,\ldots,0). The eigenvalues of the regular matrix \mathbf{N} span eigen(\mathbf{N}) = (\gamma_1,\ldots,\gamma_m). Since the dispersion matrix D\{\hat{\boldsymbol\xi}\} = (\mathbf{A}'\mathbf{A})^{-1}\sigma^2 = \mathbf{N}^{-1}\sigma^2 is generated by the inverse of the matrix \mathbf{A}'\mathbf{A} = \mathbf{N}, we have computed, in addition, the eigenvalues of the matrix \mathbf{N}^{-1} by means of eigen(\mathbf{N}^{-1}) = (\lambda_1,\ldots,\lambda_m). The eigenvalues of \mathbf{N} and \mathbf{N}^{-1}, respectively, are related by \gamma_1 = \lambda_1^{-1},\ldots,\gamma_m = \lambda_m^{-1} or \lambda_1 = \gamma_1^{-1},\ldots,\lambda_m = \gamma_m^{-1}. In the example, the matrix \mathbf{M} had only one non-vanishing eigenvalue \mu_1 = 1. In contrast, the regular matrix \mathbf{N} was characterized by two eigenvalues \gamma_1 = 0.8377 and \gamma_2 = 7.1623, its inverse matrix \mathbf{N}^{-1} by \lambda_1 = 1.1937 and \lambda_2 = 0.1396.
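A quick numeric check of these eigenvalues (our code; numpy assumed available):

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
N = A.T @ A                               # [[3, 3], [3, 5]]
M = np.eye(3) - A @ np.linalg.inv(N) @ A.T

gammas = np.sort(np.linalg.eigvalsh(N))   # eigenvalues of N, ascending
lambdas = np.sort(np.linalg.eigvalsh(np.linalg.inv(N)))  # eigenvalues of N^{-1}
mus = np.sort(np.linalg.eigvalsh(M))      # eigenvalues of M: 0, 0, 1
```

The eigenvalues of N are 4 ± √10 ≈ 0.8377 and 7.1623, and the eigenvalues of N^{-1} are exactly their reciprocals.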
The two quadratic forms are related by the transformed variables

\mathbf{y}^* := \mathbf{V}'\mathbf{y}
\quad\text{and}\quad
\hat{\boldsymbol\eta}-\boldsymbol\eta := \mathbf{U}'(\hat{\boldsymbol\xi}-\boldsymbol\xi)

such that

\frac{1}{\sigma^2}(\mathbf{y}-E\{\mathbf{y}\})'(\mathbf{y}-E\{\mathbf{y}\})
= \frac{1}{\sigma^2}(\mathbf{y}^*)'\boldsymbol\Lambda_M\,\mathbf{y}^*
+ \frac{1}{\sigma^2}(\hat{\boldsymbol\eta}-\boldsymbol\eta)'\boldsymbol\Lambda_N(\hat{\boldsymbol\eta}-\boldsymbol\eta)

\frac{1}{\sigma^2}(\mathbf{y}-E\{\mathbf{y}\})'(\mathbf{y}-E\{\mathbf{y}\})
= \frac{1}{\sigma^2}\sum_{j=1}^{n-m}(y_j^*)^2\mu_j
+ \frac{1}{\sigma^2}\sum_{i=1}^{m}(\hat\eta_i-\eta_i)^2\gamma_i

\frac{1}{\sigma^2}(\mathbf{y}-E\{\mathbf{y}\})'(\mathbf{y}-E\{\mathbf{y}\})
= z_1^2+\cdots+z_{n-m}^2 + z_{n-m+1}^2+\cdots+z_n^2,

here

z_1^2 = \frac{1}{\sigma^2}(y_1^*)^2\mu_1 = \frac{\hat\sigma^2}{\sigma^2}

and

z_2^2+z_3^2 = \frac{1}{\sigma^2}\bigl[(\hat\eta_1-\eta_1)^2\gamma_1 + (\hat\eta_2-\eta_2)^2\gamma_2\bigr],

or

z_1^2 = \frac{1}{6\sigma^2}\bigl(y_1^2-4y_1y_2+2y_1y_3+4y_2^2-4y_2y_3+y_3^2\bigr)

and

z_2^2+z_3^2 = \frac{1}{\sigma^2}\bigl[0.8377(\hat\eta_1-\eta_1)^2 + 7.1623(\hat\eta_2-\eta_2)^2\bigr]
= \frac{1}{\sigma^2}(\hat{\boldsymbol\xi}-\boldsymbol\xi)'
\begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}
(\hat{\boldsymbol\xi}-\boldsymbol\xi).
The fifth action item
We are left with the problem to transform the cumulative probability dF = f(y_1,y_2,y_3)\,dy_1\,dy_2\,dy_3 into the canonical form dF = f(z_1,z_2,z_3)\,dz_1\,dz_2\,dz_3. Here we take advantage of Corollary D3. First, we introduce Helmert's random variable x := z_1^2 (per branch dz_1 = dx/(2\sqrt{x}); folding the two branches z_1 = \pm\sqrt{x} gives dz_1 \to dx/\sqrt{x}) and the random variables \hat\xi_1 and \hat\xi_2 of the unknown parameter vector \hat{\boldsymbol\xi} of fixed effects, \sigma^{-2}(\hat{\boldsymbol\xi}-\boldsymbol\xi)'\mathbf{A}'\mathbf{A}(\hat{\boldsymbol\xi}-\boldsymbol\xi) = z_2^2+z_3^2 = \|\mathbf{z}^*\|^2 if we denote \mathbf{z}^* := [z_2, z_3]'.

dF = \frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{x}{2}\Bigr)\frac{dx}{\sqrt{x}}\cdot
\frac{|\mathbf{A}'\mathbf{A}|^{1/2}}{2\pi\sigma^2}
\exp\Bigl[-\frac{1}{2\sigma^2}(\hat{\boldsymbol\xi}-\boldsymbol\xi)'\mathbf{A}'\mathbf{A}(\hat{\boldsymbol\xi}-\boldsymbol\xi)\Bigr]d\hat\xi_1\,d\hat\xi_2.

The left pdf establishes Helmert's pdf of x = z_1^2 = \hat\sigma^2/\sigma^2, dx = \sigma^{-2}\,d\hat\sigma^2. In contrast, the right pdf characterizes the bivariate Gauss-Laplace pdf of \hat{\boldsymbol\xi}.
\|\hat{\boldsymbol\xi}-\boldsymbol\xi\|_{\mathbf{A}'\mathbf{A}}^2
:= (\hat{\boldsymbol\xi}-\boldsymbol\xi)'\mathbf{A}'\mathbf{A}(\hat{\boldsymbol\xi}-\boldsymbol\xi)
= (\hat{\boldsymbol\xi}-\boldsymbol\xi)'\mathbf{U}\operatorname{Diag}\Bigl(\frac{1}{\lambda_1},\frac{1}{\lambda_2}\Bigr)\mathbf{U}'(\hat{\boldsymbol\xi}-\boldsymbol\xi)

|\mathbf{A}'\mathbf{A}|^{1/2} = \sqrt{\gamma_1\gamma_2} = \frac{1}{\sqrt{\lambda_1\lambda_2}}

\hat{\boldsymbol\eta}-\boldsymbol\eta := \mathbf{U}'(\hat{\boldsymbol\xi}-\boldsymbol\xi)
\quad\Leftrightarrow\quad
\hat{\boldsymbol\xi}-\boldsymbol\xi = \mathbf{U}(\hat{\boldsymbol\eta}-\boldsymbol\eta)

\|\hat{\boldsymbol\xi}-\boldsymbol\xi\|_{\mathbf{A}'\mathbf{A}}^2
= (\hat{\boldsymbol\eta}-\boldsymbol\eta)'\operatorname{Diag}\Bigl(\frac{1}{\lambda_1},\frac{1}{\lambda_2}\Bigr)(\hat{\boldsymbol\eta}-\boldsymbol\eta)
= \|\hat{\boldsymbol\eta}-\boldsymbol\eta\|_D^2

\|\hat{\boldsymbol\xi}-\boldsymbol\xi\|_{\mathbf{A}'\mathbf{A}}^2
= (\hat{\boldsymbol\xi}-\boldsymbol\xi)'\begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}(\hat{\boldsymbol\xi}-\boldsymbol\xi)
= (\hat{\boldsymbol\eta}-\boldsymbol\eta)'
\begin{bmatrix} \frac{1}{1.1937} & 0 \\ 0 & \frac{1}{0.1396} \end{bmatrix}
(\hat{\boldsymbol\eta}-\boldsymbol\eta)
= \|\hat{\boldsymbol\eta}-\boldsymbol\eta\|_D^2.

dF = \frac{1}{\sqrt{2\pi}}\,\frac{\exp(-x/2)}{\sqrt{x}}\,dx\;
\frac{1}{2\pi\sigma^2\sqrt{\lambda_1\lambda_2}}
\exp\Bigl[-\frac{1}{2\sigma^2}\Bigl(\frac{(\hat\eta_1-\eta_1)^2}{\lambda_1}+\frac{(\hat\eta_2-\eta_2)^2}{\lambda_2}\Bigr)\Bigr]d\hat\eta_1\,d\hat\eta_2

or

dF = f(x)\,f(\hat\eta_1)\,f(\hat\eta_2)\,dx\,d\hat\eta_1\,d\hat\eta_2.
With \hat{\boldsymbol\xi} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{y}, the Jacobian of the map \mathbf{y} \mapsto (x, \hat\xi_1, \hat\xi_2) reads

\mathbf{J}_x = \begin{bmatrix}
D_1 x & D_2 x & D_3 x \\
D_1\hat\xi_1 & D_2\hat\xi_1 & D_3\hat\xi_1 \\
D_1\hat\xi_2 & D_2\hat\xi_2 & D_3\hat\xi_2
\end{bmatrix}
= \begin{bmatrix} \mathbf{a}' \\ (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}' \end{bmatrix}
\quad
\begin{matrix} 1\times 3 \\ 2\times 3 \end{matrix}

\mathbf{a} = \begin{bmatrix} D_1 x \\ D_2 x \\ D_3 x \end{bmatrix}
= \frac{2}{\sigma^2}\,\mathbf{M}\mathbf{y}
= \frac{1}{3\sigma^2}
\begin{bmatrix} y_1-2y_2+y_3 \\ -2y_1+4y_2-2y_3 \\ y_1-2y_2+y_3 \end{bmatrix}

x = \frac{1}{\sigma^2}\,\mathbf{y}'\mathbf{M}\mathbf{y}
= \frac{1}{6\sigma^2}\bigl(y_1^2-4y_1y_2+2y_1y_3+4y_2^2-4y_2y_3+y_3^2\bigr)

(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}' = \frac{1}{6}
\begin{bmatrix} 5 & 2 & -1 \\ -3 & 0 & 3 \end{bmatrix}

\mathbf{J}_x = \frac{1}{6\sigma^2}
\begin{bmatrix}
2y_1-4y_2+2y_3 & -4y_1+8y_2-4y_3 & 2y_1-4y_2+2y_3 \\
5\sigma^2 & 2\sigma^2 & -\sigma^2 \\
-3\sigma^2 & 0 & 3\sigma^2
\end{bmatrix}

\mathbf{a}'\mathbf{a} = \frac{4}{\sigma^4}\,\mathbf{y}'\mathbf{M}'\mathbf{M}\mathbf{y}
= \frac{4}{\sigma^4}\,\mathbf{y}'\mathbf{M}\mathbf{y}

\mathbf{a}'\mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{a}
= \frac{4}{\sigma^4}\,\mathbf{y}'\mathbf{M}'\mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{M}\mathbf{y}
= \frac{4}{\sigma^4}\,
\mathbf{y}'\bigl[\mathbf{I}_3-\mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\bigr]\mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\bigl[\mathbf{I}_3-\mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\bigr]\mathbf{y} = 0

\det\mathbf{J}_x = \frac{2\sqrt{\mathbf{y}'\mathbf{M}\mathbf{y}}}{\sigma^2\sqrt{\det(\mathbf{A}'\mathbf{A})}},
\qquad
|\det\mathbf{J}_y| = |\det\mathbf{J}_x|^{-1}
= \frac{\sigma^2\sqrt{\det(\mathbf{A}'\mathbf{A})}}{2\sqrt{\mathbf{y}'\mathbf{M}\mathbf{y}}}.

The various representations of the Jacobian will finally lead us to the special form of the volume element

dx\,d\hat\xi_1\,d\hat\xi_2
= \frac{2}{\sigma^2}\,\frac{\sqrt{\mathbf{y}'\mathbf{M}\mathbf{y}}}{|\mathbf{A}'\mathbf{A}|^{1/2}}\;dy_1\,dy_2\,dy_3

and the cumulative probability

dF = \frac{1}{(2\pi)^{3/2}}\exp\Bigl[-\frac12(z_1^2+z_2^2+z_3^2)\Bigr]dz_1\,dz_2\,dz_3
= \frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{x}{2}\Bigr)\frac{dx}{\sqrt{x}}\cdot
\frac{|\mathbf{A}'\mathbf{A}|^{1/2}}{2\pi\sigma^2}
\exp\Bigl[-\frac{1}{2\sigma^2}(\hat{\boldsymbol\xi}-\boldsymbol\xi)'\mathbf{A}'\mathbf{A}(\hat{\boldsymbol\xi}-\boldsymbol\xi)\Bigr]d\hat\xi_1\,d\hat\xi_2
= \frac{1}{\sigma^3(2\pi)^{3/2}}
\exp\Bigl[-\frac{1}{2\sigma^2}(\mathbf{y}-E\{\mathbf{y}\})'(\mathbf{y}-E\{\mathbf{y}\})\Bigr]dy_1\,dy_2\,dy_3

(the factor 1/\sqrt{x} instead of 1/(2\sqrt{x}) accounts for the two preimages z_1 = \pm\sqrt{x} of x).
The sixth action item
The first target is to generate the marginal pdf of the unknown parameter vector \hat{\boldsymbol\xi}, BLUUE of \boldsymbol\xi:

dF_1 = \int_0^{\infty}\frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{x}{2}\Bigr)\frac{dx}{\sqrt{x}}\;
\frac{|\mathbf{A}'\mathbf{A}|^{1/2}}{2\pi\sigma^2}
\exp\Bigl(-\frac{1}{2\sigma^2}(\hat{\boldsymbol\xi}-\boldsymbol\xi)'\mathbf{A}'\mathbf{A}(\hat{\boldsymbol\xi}-\boldsymbol\xi)\Bigr)d\hat\xi_1\,d\hat\xi_2

dF_1 = \int_0^{\infty}\frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{x}{2}\Bigr)\frac{dx}{\sqrt{x}}\;
\frac{1}{2\pi\sigma^2\sqrt{\lambda_1\lambda_2}}
\exp\Bigl(-\frac{1}{2\sigma^2}\sum_{i=1}^{2}\frac{(\hat\eta_i-\eta_i)^2}{\lambda_i}\Bigr)d\hat\eta_1\,d\hat\eta_2

f_1(\hat{\boldsymbol\eta}) = \frac{1}{2\pi\sigma^2\sqrt{\lambda_1\lambda_2}}
\exp\Bigl(-\frac{1}{2\sigma^2}\sum_{i=1}^{2}\frac{(\hat\eta_i-\eta_i)^2}{\lambda_i}\Bigr).

The second target is to generate the marginal pdf of Helmert's random variable x := \hat\sigma^2/\sigma^2, \hat\sigma^2 BIQUUE of \sigma^2, namely Helmert's Chi Square pdf \chi_p^2 with p = n-\operatorname{rk}\mathbf{A} (here p = 1) "degree of freedom":

dF_2 = \frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{x}{2}\Bigr)\frac{dx}{\sqrt{x}}
\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}
\frac{|\mathbf{A}'\mathbf{A}|^{1/2}}{2\pi\sigma^2}
\exp\Bigl(-\frac{1}{2\sigma^2}\|\hat{\boldsymbol\xi}-\boldsymbol\xi\|_{\mathbf{A}'\mathbf{A}}^2\Bigr)d\hat\xi_1\,d\hat\xi_2.

Let us substitute the integral

\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}
\frac{|\mathbf{A}'\mathbf{A}|^{1/2}}{2\pi\sigma^2}
\exp\Bigl(-\frac{1}{2\sigma^2}\|\hat{\boldsymbol\xi}-\boldsymbol\xi\|_{\mathbf{A}'\mathbf{A}}^2\Bigr)d\hat\xi_1\,d\hat\xi_2 = 1

subject to

p = n-\operatorname{rk}\mathbf{A} = n-m,\quad \text{here } p = 1,
\qquad
\Gamma\Bigl(\frac12\Bigr) = \sqrt{\pi},

f_2(x) = \frac{1}{\sqrt{2\pi}}\,\frac{1}{\sqrt{x}}\exp\Bigl(-\frac{x}{2}\Bigr).
The results of the example will be generalized in Theorem D16.
Theorem D16 (marginal probability distributions, special linear Gauss-Markov model):
Let

E\{\mathbf{y}\} = \mathbf{A}\boldsymbol\xi,\qquad D\{\mathbf{y}\} = \mathbf{I}_n\sigma^2
\quad\text{subject to}\quad
\mathbf{A} \in \mathbb{R}^{n\times m},\;
\operatorname{rk}\mathbf{A} = m,\;
E\{\mathbf{y}\} \in R(\mathbf{A}),\;
\sigma^2 \in \mathbb{R}^{+},

and

\hat\sigma^2 = \frac{1}{n-\operatorname{rk}\mathbf{A}}\,
(\mathbf{y}-\mathbf{A}\hat{\boldsymbol\xi})'(\mathbf{y}-\mathbf{A}\hat{\boldsymbol\xi})
\quad\text{subject to}\quad
E\{\hat\sigma^2\} = \sigma^2,
\qquad
D\{\hat\sigma^2\} = \frac{2\sigma^4}{n-\operatorname{rk}\mathbf{A}},
identify \hat\xi, BLUUE of \xi, and \hat\sigma^2, BIQUUE of \sigma^2. The cumulative pdf of the multidimensional Gauss-Laplace probability distribution of the observation vector y = [y_1,\ldots,y_n]' \in Y,

f(y \mid E\{y\}, D\{y\} = I_n\sigma^2)\,dy_1\cdots dy_n =

= \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\left(-\frac{1}{2\sigma^2}(y-E\{y\})'(y-E\{y\})\right)dy_1\cdots dy_n =

= f_1(\hat\xi)\,f_2(\hat\sigma^2)\,d\hat\xi_1\cdots d\hat\xi_m\,d\hat\sigma^2,

can be split into two marginal pdfs, f_1(\hat\xi) of \hat\xi, BLUUE of \xi, and f_2(\hat\sigma^2) of \hat\sigma^2, BIQUUE of \sigma^2.
(i) \hat\xi BLUUE of \xi

The marginal pdf of \hat\xi, BLUUE of \xi, is represented by

(1st version)

f_1(\hat\xi) = \frac{1}{(2\pi)^{m/2}\sigma^m}\,|A'A|^{1/2}\exp\left(-\frac{1}{2\sigma^2}(\hat\xi-\xi)'A'A(\hat\xi-\xi)\right)

or

(2nd version)

f_1(\hat\eta) = \frac{1}{(2\pi)^{m/2}(\sigma^2)^{m/2}}\,(\lambda_1\lambda_2\cdots\lambda_{m-1}\lambda_m)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{m}\frac{(\hat\eta_i-\eta_i)^2}{\lambda_i}\right),

by means of Principal Component Analysis (PCA), also called Singular Value Decomposition (SVD) or eigenvalue analysis (EIGEN), of (A'A)^{-1}:

\eta = U'_\xi\,\xi,\quad \hat\eta = U'_\xi\,\hat\xi \quad\text{subject to}\quad U'_\xi (A'A)^{-1} U_\xi = \Lambda = \mathrm{Diag}(\lambda_1,\lambda_2,\ldots,\lambda_{m-1},\lambda_m),\quad U'_\xi U_\xi = I_m,\ \det U_\xi = +1

f(\hat\eta_i) = \frac{1}{\sigma\sqrt{\lambda_i}\sqrt{2\pi}}\exp\left(-\frac{(\hat\eta_i-\eta_i)^2}{2\sigma^2\lambda_i}\right)\quad\forall\, i\in\{1,\ldots,m\}.

The transformed fixed effects (\hat\eta_1,\ldots,\hat\eta_m), BLUUE of (\eta_1,\ldots,\eta_m), are mutually independent and Gauss-Laplace normal,

\hat\eta_i \sim N(\eta_i,\ \sigma^2\lambda_i)\quad\forall\, i\in\{1,\ldots,m\}.

(3rd version)

z_i := \frac{\hat\eta_i-\eta_i}{\sigma\sqrt{\lambda_i}}:\qquad f_1(z_i)\,dz_i = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z_i^2}{2}\right)dz_i\quad\forall\, i\in\{1,\ldots,m\}.
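The mutual independence of the canonical estimates can be illustrated numerically. The sketch below is my own example (A, ξ and σ are arbitrary): it diagonalizes (A'A)⁻¹ for a two-parameter model and checks by simulation that var(η̂ᵢ) ≈ σ²λᵢ while cov(η̂₁, η̂₂) ≈ 0:

```python
# Simulation sketch: eta_hat = U' xi_hat has independent components with
# variances sigma^2 * lambda_i, the eigenvalues of (A'A)^{-1}.
import math, random
random.seed(7)

A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # n = 4, m = 2 (arbitrary)
xi = [2.0, 0.5]
sigma = 0.4
n, m = 4, 2

AtA = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(m)] for i in range(m)]
d = AtA[0][0] * AtA[1][1] - AtA[0][1] ** 2
Ninv = [[AtA[1][1] / d, -AtA[0][1] / d], [-AtA[0][1] / d, AtA[0][0] / d]]

# eigen-decomposition of the symmetric 2x2 matrix Ninv = (A'A)^{-1}
a, b, c = Ninv[0][0], Ninv[0][1], Ninv[1][1]
t = math.hypot(a - c, 2 * b)
lam = [(a + c + t) / 2, (a + c - t) / 2]               # lambda_1 >= lambda_2
theta = 0.5 * math.atan2(2 * b, a - c)
U = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]

etas = []
for _ in range(100000):
    y = [sum(A[k][i] * xi[i] for i in range(m)) + random.gauss(0.0, sigma) for k in range(n)]
    Aty = [sum(A[k][i] * y[k] for k in range(n)) for i in range(m)]
    xih = [sum(Ninv[i][j] * Aty[j] for j in range(m)) for i in range(m)]
    etas.append([U[0][0] * xih[0] + U[1][0] * xih[1],  # eta_hat = U' xi_hat
                 U[0][1] * xih[0] + U[1][1] * xih[1]])

mean = [sum(e[i] for e in etas) / len(etas) for i in range(m)]
var = [sum((e[i] - mean[i]) ** 2 for e in etas) / len(etas) for i in range(m)]
cov = sum((e[0] - mean[0]) * (e[1] - mean[1]) for e in etas) / len(etas)
print(var, [sigma**2 * l for l in lam], cov)   # variances match sigma^2*lambda_i; cov near 0
```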
(ii) \hat\sigma^2 BIQUUE of \sigma^2

The marginal pdf of \hat\sigma^2, BIQUUE of \sigma^2, is represented by

(1st version)

p = n - \mathrm{rk}\,A

f_2(\hat\sigma^2) = \frac{p^{p/2}}{\sigma^p\,2^{p/2}\,\Gamma(p/2)}\,\hat\sigma^{p-2}\exp\left(-\frac{p}{2}\,\frac{\hat\sigma^2}{\sigma^2}\right).

(2nd version)

dF_2 = f_2(x)\,dx

x := (n-\mathrm{rk}\,A)\,\frac{\hat\sigma^2}{\sigma^2} = \frac{p}{\sigma^2}\,\hat\sigma^2 = \frac{1}{\sigma^2}(y-A\hat\xi)'(y-A\hat\xi)

f_2(x) = \frac{1}{2^{p/2}\,\Gamma(p/2)}\,x^{(p-2)/2}\exp\left(-\frac{x}{2}\right).

f_2(x), as the standard pdf of the normalized sample variance, is a Helmert Chi-Square \chi_p^2 pdf with p = n - \mathrm{rk}\,A "degrees of freedom".
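The two stated moments of σ̂² can be verified by simulation; the sketch below is my own illustration with an arbitrarily chosen model (n = 5, m = 2, so p = 3):

```python
# Monte Carlo check of E{sigma_hat^2} = sigma^2 and D{sigma_hat^2} = 2 sigma^4 / (n - rk A)
# for an arbitrary illustrative model (n = 5, m = 2, p = 3).
import random
random.seed(1)

A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
xi = [1.0, 2.0]
sigma2 = 0.5
n, m, p = 5, 2, 3

AtA = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(m)] for i in range(m)]
d = AtA[0][0] * AtA[1][1] - AtA[0][1] ** 2
Ninv = [[AtA[1][1] / d, -AtA[0][1] / d], [-AtA[0][1] / d, AtA[0][0] / d]]

def sigma_hat2():
    """One draw of the unbiased variance estimate from least-squares residuals."""
    y = [sum(A[k][i] * xi[i] for i in range(m)) + random.gauss(0.0, sigma2 ** 0.5)
         for k in range(n)]
    Aty = [sum(A[k][i] * y[k] for k in range(n)) for i in range(m)]
    xih = [sum(Ninv[i][j] * Aty[j] for j in range(m)) for i in range(m)]
    return sum((y[k] - sum(A[k][i] * xih[i] for i in range(m))) ** 2 for k in range(n)) / p

s = [sigma_hat2() for _ in range(200000)]
mean = sum(s) / len(s)
var = sum((v - mean) ** 2 for v in s) / len(s)
print(mean, var)    # approaches sigma^2 = 0.5 and 2*sigma^4/p = 1/6
```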
:Proof:
The first action item
First, let us decompose the quadratic form \|y - E\{y\}\|^2 by means of the estimate \widehat{E\{y\}} of E\{y\}:

y - E\{y\} = (y - \widehat{E\{y\}}) + (\widehat{E\{y\}} - E\{y\})

and

(y - E\{y\})'(y - E\{y\}) = (y - \widehat{E\{y\}})'(y - \widehat{E\{y\}}) + (\widehat{E\{y\}} - E\{y\})'(\widehat{E\{y\}} - E\{y\})

\|y - E\{y\}\|^2 = \|y - \widehat{E\{y\}}\|^2 + \|\widehat{E\{y\}} - E\{y\}\|^2,\qquad \|y - \widehat{E\{y\}}\|^2 = y'My = (n - \mathrm{rk}\,A)\,\hat\sigma^2

and

y - E\{y\} = \sigma H'z

\frac{1}{\sigma^2}(y - E\{y\})'(y - E\{y\}) = z'HH'z = z'z

\frac{1}{\sigma^2}\|y - E\{y\}\|^2 = \|z\|^2.
The transformations V'y =: y^* and U'(\hat\xi - \xi) = \hat\eta - \eta lead to

\frac{1}{\sigma^2}(y - E\{y\})'(y - E\{y\}) = \frac{1}{\sigma^2}(y^*)'\Lambda_M\,y^* + \frac{1}{\sigma^2}(\hat\eta - \eta)'\Lambda_N(\hat\eta - \eta)

\frac{1}{\sigma^2}(y - E\{y\})'(y - E\{y\}) = \frac{1}{\sigma^2}\sum_{j=1}^{n-m}(y_j^*)^2\mu_j + \frac{1}{\sigma^2}\sum_{i=1}^{m}(\hat\eta_i - \eta_i)^2\gamma_i

\frac{1}{\sigma^2}(y - E\{y\})'(y - E\{y\}) = z_1^2 + \cdots + z_{n-m}^2 + z_{n-m+1}^2 + \cdots + z_n^2

subject to

z_1^2 + \cdots + z_{n-m}^2 = \frac{1}{\sigma^2}\sum_{j=1}^{n-m}(y_j^*)^2\mu_j \quad\text{and}\quad z_{n-m+1}^2 + \cdots + z_n^2 = \frac{1}{\sigma^2}\sum_{i=1}^{m}(\hat\eta_i - \eta_i)^2\gamma_i,

split into the pdf of the Helmert random variable x := z_1^2 + \cdots + z_{n-m}^2 = \sigma^{-2}(n - \mathrm{rk}\,A)\,\hat\sigma^2 = \sigma^{-2}(n-m)\,\hat\sigma^2 and the pdf of the difference random parameter vector, z_{n-m+1}^2 + \cdots + z_n^2 = \sigma^{-2}(\hat\xi - \xi)'A'A(\hat\xi - \xi).
dF = f(z_1,\ldots,z_{n-m}, z_{n-m+1},\ldots,z_n)\,dz_1\cdots dz_{n-m}\,dz_{n-m+1}\cdots dz_n

f(z_1,\ldots,z_n) = \frac{1}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}z'z\right) =

= \left(\frac{1}{\sqrt{2\pi}}\right)^{n-m}\exp\left(-\frac{1}{2}(z_1^2+\cdots+z_{n-m}^2)\right)\left(\frac{1}{\sqrt{2\pi}}\right)^{m}\exp\left(-\frac{1}{2}(z_{n-m+1}^2+\cdots+z_n^2)\right).
Part A
x := r^2 = z_1^2 + \cdots + z_{n-m}^2,\qquad dx = 2(z_1\,dz_1 + \cdots + z_{n-m}\,dz_{n-m})

[z_1,\ldots,z_{n-m}]' = \frac{1}{\sigma}\,\mathrm{Diag}(\sqrt{\mu_1},\ldots,\sqrt{\mu_{n-m}})\,V_1'y

VV' = I_n,\quad V\in\mathbb{R}^{n\times n},\ V_1\in\mathbb{R}^{n\times(n-m)},\ V_2\in\mathbb{R}^{n\times m}

and

[z_{n-m+1},\ldots,z_n]' = \frac{1}{\sigma}\,\mathrm{Diag}(\sqrt{\gamma_1},\ldots,\sqrt{\gamma_m})\,U'(\hat\xi-\xi)
altogether
[z_1,\ldots,z_{n-m},\ z_{n-m+1},\ldots,z_n]' = \frac{1}{\sigma}\left[\begin{array}{c}\mathrm{Diag}(\sqrt{\mu_1},\ldots,\sqrt{\mu_{n-m}})\,V_1'\,y\\ \mathrm{Diag}(\sqrt{\gamma_1},\ldots,\sqrt{\gamma_m})\,U'(\hat\xi-\xi)\end{array}\right].

The partitioned vector of the standard random variable z is associated with the norms \|z_{n-m}\|^2 and \|z_m\|^2, namely

\|z_{n-m}\|^2 + \|z_m\|^2 = z_1^2 + \cdots + z_{n-m}^2 + z_{n-m+1}^2 + \cdots + z_n^2 =

= \frac{1}{\sigma^2}\,y'V_1\,\mathrm{Diag}(\sqrt{\mu_1},\ldots,\sqrt{\mu_{n-m}})\,\mathrm{Diag}(\sqrt{\mu_1},\ldots,\sqrt{\mu_{n-m}})\,V_1'y +

+ \frac{1}{\sigma^2}(\hat\xi-\xi)'U\,\mathrm{Diag}(\sqrt{\gamma_1},\ldots,\sqrt{\gamma_m})\,\mathrm{Diag}(\sqrt{\gamma_1},\ldots,\sqrt{\gamma_m})\,U'(\hat\xi-\xi) =

= \frac{1}{\sigma^2}\,y'V_1\,\mathrm{Diag}(\mu_1,\ldots,\mu_{n-m})\,V_1'y + \frac{1}{\sigma^2}(\hat\xi-\xi)'U\,\mathrm{Diag}(\gamma_1,\ldots,\gamma_m)\,U'(\hat\xi-\xi).

In spherical coordinates,

dz_1\,dz_2\cdots dz_{n-m-1}\,dz_{n-m} = r^{n-m-1}\,dr\,(\cos\phi_{n-m-1})^{n-m-2}(\cos\phi_{n-m-2})^{n-m-3}\cdots\cos^2\phi_3\cos\phi_2\;d\phi_{n-m-1}\,d\phi_{n-m-2}\cdots d\phi_3\,d\phi_2\,d\phi_1.
dF_A = \left(\frac{1}{\sqrt{2\pi}}\right)^{n-m}\exp\left(-\frac{1}{2}(z_1^2+\cdots+z_{n-m}^2)\right)dz_1\cdots dz_{n-m} =

= \frac{1}{2}\left(\frac{1}{\sqrt{2\pi}}\right)^{n-m}x^{(n-m-2)/2}\exp\left(-\frac{x}{2}\right)dx\,(\cos\phi_{n-m-1})^{n-m-2}(\cos\phi_{n-m-2})^{n-m-3}\cdots\cos^2\phi_3\cos\phi_2\;d\phi_{n-m-1}\,d\phi_{n-m-2}\cdots d\phi_3\,d\phi_2\,d\phi_1.
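Integrated over x ∈ (0, ∞) and the full solid angle, dF_A carries total probability one; with the unit-sphere surface ω_{n-m-1} = 2π^{(n-m)/2}/Γ((n-m)/2), this can be confirmed both in closed form and by crude quadrature. A sketch of mine (n and m are arbitrary):

```python
# Check: omega_{n-m-1} * Int_0^inf (1/2) (2 pi)^{-(n-m)/2} x^{(n-m-2)/2} e^{-x/2} dx = 1,
# using the closed form Int = 2^{(n-m)/2} Gamma((n-m)/2) and a rectangle-rule quadrature.
import math

n, m = 7, 2
q = n - m                                              # dimension of the residual space

omega = 2 * math.pi ** (q / 2) / math.gamma(q / 2)     # surface of the unit sphere S^{q-1}
closed = 2 ** (q / 2) * math.gamma(q / 2)              # closed-form radial integral

# crude numerical quadrature of the same radial integral
h, upper = 1e-3, 60.0
numeric = h * sum((i * h) ** ((q - 2) / 2) * math.exp(-i * h / 2)
                  for i in range(1, int(upper / h) + 1))

total_closed = omega * 0.5 * (2 * math.pi) ** (-q / 2) * closed
total_quad = omega * 0.5 * (2 * math.pi) ** (-q / 2) * numeric
print(total_closed, total_quad)                        # both near 1
```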
Part B
Part B focuses on the representation of the right pdf, first in terms of the random variables \hat\xi, second in terms of the canonical random variables \hat\eta.

dF_r = \left(\frac{1}{\sqrt{2\pi}}\right)^{m}\exp\left(-\frac{1}{2}(z_{n-m+1}^2+\cdots+z_n^2)\right)dz_{n-m+1}\cdots dz_n

z_{n-m+1}^2+\cdots+z_n^2 = \frac{1}{\sigma^2}(\hat\xi-\xi)'A'A(\hat\xi-\xi)

dz_{n-m+1}\cdots dz_n = \frac{1}{\sigma^m}\,|A'A|^{1/2}\,d\hat\xi_1\cdots d\hat\xi_m.

The computation of the local m-dimensional hypervolume element dz_{n-m+1}\cdots dz_n has followed Corollary D3, which is based upon the matrix of the metric \sigma^{-2}A'A. The first representation of the right pdf is given by

dF_r = \left(\frac{1}{\sqrt{2\pi}}\right)^{m}\exp\left(-\frac{1}{2}(z_{n-m+1}^2+\cdots+z_n^2)\right)dz_{n-m+1}\cdots dz_n =

= \left(\frac{1}{\sqrt{2\pi}}\right)^{m}\frac{|A'A|^{1/2}}{\sigma^m}\exp\left(-\frac{1}{2\sigma^2}(\hat\xi-\xi)'A'A(\hat\xi-\xi)\right)d\hat\xi_1\cdots d\hat\xi_m.
Let us introduce the canonical random variables (\hat\eta_1,\ldots,\hat\eta_m), which are generated by the correlating quadratic form \|\hat\xi-\xi\|^2_{A'A}:

(\hat\xi-\xi)'A'A(\hat\xi-\xi) = (\hat\xi-\xi)'U\,\mathrm{Diag}\left(\frac{1}{\lambda_1},\ldots,\frac{1}{\lambda_m}\right)U'(\hat\xi-\xi)

N := A'A = U\,\mathrm{Diag}(\gamma_1,\ldots,\gamma_m)\,U'

versus

N^{-1} := (A'A)^{-1} = U\,\mathrm{Diag}(\lambda_1,\ldots,\lambda_m)\,U'

subject to

\gamma_1 = \lambda_1^{-1},\ldots,\gamma_m = \lambda_m^{-1} \quad\text{or}\quad \lambda_1 = \gamma_1^{-1},\ldots,\lambda_m = \gamma_m^{-1}

|A'A|^{1/2} = \sqrt{\gamma_1\cdots\gamma_m} = \frac{1}{\sqrt{\lambda_1\cdots\lambda_m}}

\hat\eta - \eta := U'(\hat\xi - \xi) \quad\Leftrightarrow\quad \hat\xi - \xi = U(\hat\eta - \eta)
D7 Sampling from the multidimensional Gauss-Laplace normal distribution 639
\|\hat\xi-\xi\|^2_{A'A} := (\hat\xi-\xi)'A'A(\hat\xi-\xi) = (\hat\eta-\eta)'\,\mathrm{Diag}\left(\frac{1}{\lambda_1},\ldots,\frac{1}{\lambda_m}\right)(\hat\eta-\eta).

The local m-dimensional hypervolume element d\hat\xi_1\cdots d\hat\xi_m is transformed to the local m-dimensional hypervolume element d\hat\eta_1\cdots d\hat\eta_m by

dF_r = \left(\frac{1}{\sqrt{2\pi}}\right)^{m}\frac{1}{\sigma^m\sqrt{\lambda_1\cdots\lambda_m}}\exp\left(-\frac{1}{2\sigma^2}(\hat\eta-\eta)'\,\mathrm{Diag}\left(\frac{1}{\lambda_1},\ldots,\frac{1}{\lambda_m}\right)(\hat\eta-\eta)\right)d\hat\eta_1\cdots d\hat\eta_m.
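The reciprocity γᵢ = λᵢ⁻¹ and the determinant identity |A'A|^{1/2} = (λ₁⋯λ_m)^{-1/2} can be verified on a small example; the 2 × 2 numbers below are my own arbitrary choice, not from the text:

```python
# Check gamma_i = 1/lambda_i and |A'A|^{1/2} = 1/sqrt(lambda_1*...*lambda_m)
# for a 2x2 normal-equation matrix N = A'A (arbitrary illustrative numbers).
import math

N = [[4.0, 1.0], [1.0, 3.0]]                      # symmetric positive definite
detN = N[0][0] * N[1][1] - N[0][1] * N[1][0]
Ninv = [[N[1][1] / detN, -N[0][1] / detN], [-N[1][0] / detN, N[0][0] / detN]]

def eig2(M):
    """Eigenvalues of a symmetric 2x2 matrix, largest first."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    t = math.hypot(a - c, 2 * b)
    return [(a + c + t) / 2, (a + c - t) / 2]

gamma = eig2(N)                                   # eigenvalues of N = A'A
lam = eig2(Ninv)                                  # eigenvalues of N^{-1} = (A'A)^{-1}

# the smallest gamma pairs with the largest lambda: gamma_i * lambda_i = 1
pairs = list(zip(sorted(gamma), sorted(lam, reverse=True)))
print(pairs)                                      # each pair multiplies to 1
print(math.sqrt(detN), 1 / math.sqrt(lam[0] * lam[1]))
```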
Part C
dF = dF_A\,dF_r = \frac{1}{2}\left(\frac{1}{\sqrt{2\pi}}\right)^{n-m}x^{(n-m-2)/2}\exp\left(-\frac{x}{2}\right)dx\,d\omega_{n-m-1}\;\left(\frac{1}{\sqrt{2\pi}}\right)^{m}\frac{|A'A|^{1/2}}{\sigma^m}\exp\left(-\frac{1}{2\sigma^2}(\hat\xi-\xi)'A'A(\hat\xi-\xi)\right)d\hat\xi_1\cdots d\hat\xi_m

or

dF = dF_A\,dF_r = \frac{1}{2}\left(\frac{1}{\sqrt{2\pi}}\right)^{n-m}x^{(n-m-2)/2}\exp\left(-\frac{x}{2}\right)dx\,d\omega_{n-m-1}\;\left(\frac{1}{\sqrt{2\pi}}\right)^{m}\frac{1}{\sigma^m\sqrt{\lambda_1\cdots\lambda_m}}\exp\left(-\frac{1}{2\sigma^2}(\hat\eta-\eta)'\,\mathrm{Diag}\left(\frac{1}{\lambda_1},\ldots,\frac{1}{\lambda_m}\right)(\hat\eta-\eta)\right)d\hat\eta_1\cdots d\hat\eta_m,

as well as

dF_1 = f_1(\hat\eta)\,d\hat\eta_1\cdots d\hat\eta_m

leads us to

f_1(\hat\xi) = \left(\frac{1}{\sqrt{2\pi}}\right)^{m}\frac{|A'A|^{1/2}}{\sigma^m}\exp\left(-\frac{1}{2\sigma^2}(\hat\xi-\xi)'A'A(\hat\xi-\xi)\right).
Unfortunately, such a general multivariate Gauss-Laplace normal distribution cannot be tabulated. An alternative is offered by introducing the canonical unknown parameters \hat\eta as random variables.
The definition

f_1(\hat\eta) := \int_0^{\infty}dx\oint d\omega_{n-m-1}\,\frac{1}{2}\left(\frac{1}{\sqrt{2\pi}}\right)^{n-m}x^{(n-m-2)/2}\exp\left(-\frac{x}{2}\right)\;\left(\frac{1}{\sqrt{2\pi}}\right)^{m}\frac{1}{\sigma^m}(\lambda_1\lambda_2\cdots\lambda_{m-1}\lambda_m)^{-1/2}\exp\left(-\frac{1}{2\sigma^2}(\hat\eta-\eta)'\,\mathrm{Diag}\left(\frac{1}{\lambda_1},\ldots,\frac{1}{\lambda_m}\right)(\hat\eta-\eta)\right)

subject to

\int_0^{\infty}dx\oint d\omega_{n-m-1}\,\frac{1}{2}\left(\frac{1}{\sqrt{2\pi}}\right)^{n-m}x^{(n-m-2)/2}\exp\left(-\frac{x}{2}\right) = 1

alternatively leads us to

f_1(\hat\eta) = \left(\frac{1}{\sqrt{2\pi}}\right)^{m}\frac{1}{\sigma^m\sqrt{\lambda_1\cdots\lambda_m}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{m}\frac{(\hat\eta_i-\eta_i)^2}{\lambda_i}\right)

f_1(\hat\eta_i) := \frac{1}{\sigma\sqrt{\lambda_i}\sqrt{2\pi}}\exp\left(-\frac{(\hat\eta_i-\eta_i)^2}{2\sigma^2\lambda_i}\right)\quad\forall\, i\in\{1,\ldots,m\}.
Obviously, the transformed random variables (\hat\eta_1,\ldots,\hat\eta_m), BLUUE of (\eta_1,\ldots,\eta_m), are mutually independent and Gauss-Laplace normal.
The sixth action item
Sixth, we shall compute the marginal pdf of Helmert's random variable x = (n-\mathrm{rk}\,A)\,\hat\sigma^2/\sigma^2 = (n-m)\,\hat\sigma^2/\sigma^2, \hat\sigma^2 BIQUUE of \sigma^2:
dF_2 = f_2(x)\,dx

includes the second marginal pdf f_2(x).
The definition

f_2(x) := \oint d\omega_{n-m-1}\,\frac{1}{2}\left(\frac{1}{\sqrt{2\pi}}\right)^{n-m}x^{(n-m-2)/2}\exp\left(-\frac{x}{2}\right)\int_{-\infty}^{+\infty}d\hat\xi_1\cdots\int_{-\infty}^{+\infty}d\hat\xi_m\,\left(\frac{1}{\sqrt{2\pi}}\right)^{m}\frac{|A'A|^{1/2}}{\sigma^m}\exp\left(-\frac{1}{2\sigma^2}(\hat\xi-\xi)'A'A(\hat\xi-\xi)\right)

subject to

\omega_{n-m-1} = \oint d\omega_{n-m-1} = \frac{2\pi^{(n-m)/2}}{\Gamma\left(\dfrac{n-m}{2}\right)},
according to Lemma D4,

\int_{-\infty}^{+\infty}d\hat\xi_1\cdots\int_{-\infty}^{+\infty}d\hat\xi_m\,\left(\frac{1}{\sqrt{2\pi}}\right)^{m}\frac{|A'A|^{1/2}}{\sigma^m}\exp\left(-\frac{1}{2\sigma^2}(\hat\xi-\xi)'A'A(\hat\xi-\xi)\right) =

= \int_{-\infty}^{+\infty}dz_1\cdots\int_{-\infty}^{+\infty}dz_m\,\left(\frac{1}{\sqrt{2\pi}}\right)^{m}\exp\left(-\frac{1}{2}(z_1^2+\cdots+z_m^2)\right) = 1,

leads us to

p := n - \mathrm{rk}\,A = n - m

\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi},\qquad \Gamma\left(\frac{n-m-1}{2}+\frac{1}{2}\right) = \Gamma\left(\frac{n-m}{2}\right) = \Gamma\left(\frac{p}{2}\right)
f_2(x) = \frac{1}{2^{p/2}\,\Gamma(p/2)}\,x^{(p-2)/2}\exp\left(-\frac{x}{2}\right),
namely the standard pdf of the normalised sample variance, known as Helmert's Chi-Square pdf \chi_p^2 with p = n - \mathrm{rk}\,A = n - m "degrees of freedom". If you substitute x = (n-\mathrm{rk}\,A)\,\hat\sigma^2/\sigma^2 = (n-m)\,\hat\sigma^2/\sigma^2, dx = (n-\mathrm{rk}\,A)\,\sigma^{-2}\,d\hat\sigma^2 = (n-m)\,\sigma^{-2}\,d\hat\sigma^2, we arrive at the pdf of the sample variance \hat\sigma^2, in particular

dF_2 = f_2(\hat\sigma^2)\,d\hat\sigma^2

f_2(\hat\sigma^2) = \frac{p^{p/2}}{\sigma^p\,2^{p/2}\,\Gamma(p/2)}\,\hat\sigma^{p-2}\exp\left(-\frac{p}{2}\,\frac{\hat\sigma^2}{\sigma^2}\right).
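The substitution can be verified pointwise: f₂(σ̂²) must equal f₂(x) · dx/dσ̂² with x = pσ̂²/σ² and dx/dσ̂² = p/σ². A small check of mine (p and σ² chosen arbitrarily):

```python
# Pointwise check of the change of variables x = p*sh2/sigma^2:
# f2_sigma(sh2) == f2_chi(x) * p / sigma^2   (since dx/d(sigma_hat^2) = p / sigma^2).
import math

p, sigma2 = 4, 0.8        # degrees of freedom and sigma^2 (arbitrary illustration)

def f2_chi(x):
    """Helmert chi-square pdf with p degrees of freedom."""
    return x ** ((p - 2) / 2) * math.exp(-x / 2) / (2 ** (p / 2) * math.gamma(p / 2))

def f2_sigma(sh2):
    """pdf of the sample variance; note sigma^p = (sigma^2)^{p/2}."""
    sh = math.sqrt(sh2)
    return (p ** (p / 2) / (sigma2 ** (p / 2) * 2 ** (p / 2) * math.gamma(p / 2))
            * sh ** (p - 2) * math.exp(-p * sh2 / (2 * sigma2)))

checks = []
for sh2 in (0.2, 0.8, 1.7):
    x = p * sh2 / sigma2
    checks.append((f2_sigma(sh2), f2_chi(x) * p / sigma2))
print(checks)              # the two columns agree
```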
defines a special linear Gauss-Markov model with datum defect of fixed effects \xi \in \mathbb{R}^m and a positive definite variance-covariance matrix D\{y\} = \Sigma_y of multivariate Gauss-Laplace distributed observations y := [y_1,\ldots,y_n]'.
\hat\xi = Ly \quad\text{"linear"}

\hat\xi = A^{+}y \quad\text{subject to}\quad \|LA - I_m\|^2 = \min \quad\text{"minimum bias"}

\mathrm{tr}\,D\{\hat\xi\} = \mathrm{tr}\,L\Sigma_y L' = \min \quad\text{"best"}

and

\hat\sigma^2 = \frac{1}{n-\mathrm{rk}\,A}\,(y - A\hat\xi)'\,\Sigma_y^{-1}(y - A\hat\xi)\quad\text{subject to}\quad E\{\hat\sigma^2\} = \sigma^2,\quad D\{\hat\sigma^2\} = \frac{2\sigma^4}{n-\mathrm{rk}\,A}
\Sigma_y^{1/2} := \mathrm{Diag}(\sqrt{\sigma_1},\ldots,\sqrt{\sigma_n})\,W:\qquad \Sigma_y = (\Sigma_y^{1/2})'\,\Sigma_y^{1/2}

versus
\Sigma_y^{-1} = W'\,\mathrm{Diag}\left(\frac{1}{\sigma_1},\ldots,\frac{1}{\sigma_n}\right)W =

= W'\,\mathrm{Diag}\left(\frac{1}{\sqrt{\sigma_1}},\ldots,\frac{1}{\sqrt{\sigma_n}}\right)\mathrm{Diag}\left(\frac{1}{\sqrt{\sigma_1}},\ldots,\frac{1}{\sqrt{\sigma_n}}\right)W

\Sigma_y^{-1/2} := \mathrm{Diag}\left(\frac{1}{\sqrt{\sigma_1}},\ldots,\frac{1}{\sqrt{\sigma_n}}\right)W

= (y - E\{y\})'\,W'\,\mathrm{Diag}\left(\frac{1}{\sqrt{\sigma_1}},\ldots,\frac{1}{\sqrt{\sigma_n}}\right)\mathrm{Diag}\left(\frac{1}{\sqrt{\sigma_1}},\ldots,\frac{1}{\sqrt{\sigma_n}}\right)W\,(y - E\{y\}) =

= (\bar y - E\{\bar y\})'(\bar y - E\{\bar y\}) =: \|\bar y - E\{\bar y\}\|^2_{I_n},\qquad \bar y := \Sigma_y^{-1/2}\,y.
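The whitening step above — multiplying by Σ_y^{-1/2} so that the quadratic form in the Σ_y metric becomes a plain Euclidean norm — can be sketched numerically. The two-dimensional example below is my own, with an arbitrary rotation W and eigenvalues σ₁, σ₂:

```python
# Whitening sketch: with Sigma^{-1/2} = Diag(1/sqrt(sigma_i)) W as above,
# (y - mu)' Sigma^{-1} (y - mu) equals the Euclidean norm of the whitened residual.
import math

theta = 0.3                                         # arbitrary rotation angle for W
W = [[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]]
svals = [2.0, 0.5]                                  # the "sigma_i" (eigenvalues of Sigma)

y = [1.2, -0.7]
mu = [0.4, 0.1]
r = [y[0] - mu[0], y[1] - mu[1]]

# whitened residual: rbar = Diag(1/sqrt(sigma_i)) W r
rbar = [sum(W[i][j] * r[j] for j in range(2)) / math.sqrt(svals[i]) for i in range(2)]
lhs = rbar[0] ** 2 + rbar[1] ** 2                   # || rbar ||^2 in the I_n metric

# direct evaluation of r' Sigma^{-1} r with Sigma^{-1} = W' Diag(1/sigma_i) W
Wr = [sum(W[i][j] * r[j] for j in range(2)) for i in range(2)]
rhs = Wr[0] ** 2 / svals[0] + Wr[1] ** 2 / svals[1]
print(lhs, rhs)                                     # identical up to rounding
```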
Definitions and lemmas relating to statistics are given, their proofs omitted. First, we review the statistical moments of a probability distribution of random vectors and list the Gauss-Laplace normal distribution and its moments. We slightly generalize the Gauss-Laplace normal distribution by introducing the notion of a quasi-normal distribution. At the end of Chapter E1 we review two lemmas about the Gauss-Laplace 3σ rule, namely the Gauss-Laplace inequality and the Vysochanskii-Petunin inequality. Chapter E2 reviews the special linear error propagation as well as the general nonlinear error propagation based on y = g(x), introducing the moments of second, third and fourth order. The special role of the Jacobi matrix as well as the Hesse matrix is clarified. Chapter E3 reviews useful identities like E\{yy' \otimes y\} and E\{yy' \otimes yy'\} as well as \Psi := E\{(y-E\{y\})(y-E\{y\})' \otimes (y-E\{y\})\}, relating to the matrix of obliquity, and \Gamma := E\{(y-E\{y\})(y-E\{y\})' \otimes (y-E\{y\})(y-E\{y\})'\} + \ldots, relating to the matrix of kurtosis. The various notions of identifiability and unbiased estimability in Chapter E4 are reviewed, both for the moments of first order and for the central moments of second order.
E1: Moments of a probability distribution, the Gauss-Laplace
normal distribution and the quasi-normal distribution
First, we define the moments of a probability distribution of a vector-valued random function and explain the notion of a Gauss-Laplace normal distribution and its moments. In particular, we define the terminology of a quasi Gauss-Laplace normal distribution.
Definition E1 (statistical moments of a probability distribution):

(1) The expectation or first moment of a continuous random vector [X_i] for all i = 1,\ldots,n with probability density f(x_1,\ldots,x_n) is defined as the mean vector \mu := [\mu_1,\ldots,\mu_n]' of E\{X_i\} = \mu_i:

\mu_i := E\{X_i\} = \int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} x_i\,f(x_1,\ldots,x_n)\,dx_1\cdots dx_n. (E1)
E\{e_{i_1}^{n_1}e_{i_2}^{n_2}e_{i_3}^{n_3}\} = \prod_{j=1}^{3}E\{e_{i_j}^{n_j}\}\quad\text{subject to}\quad 1 \le i_1 \ne i_2 \ne i_3 \le n \quad\text{and}\quad 0 \le n_1+n_2+n_3 \le 3. (E9)

(5) The central moments of the nth order relative to the random vector [e_i] := [X_i - E\{X_i\}] are defined by

\pi_{i_1\ldots i_n} := E\{e_{i_1}\cdots e_{i_n}\} = E\{(X_{i_1}-\mu_{i_1})\cdots(X_{i_n}-\mu_{i_n})\}. (E13)
\pi_{ij} = \sigma_{ij},\quad \pi_{ii} = \sigma_i^2, (E18)

\pi_{ijk} = 0, (E19)

\pi_{ijk\ell m} = 0, (E22)

\pi_{i_1 i_2 \ldots i_{2m-1} i_{2m}} = \pi_{i_1 i_2 \ldots i_{2m-2}}\,\pi_{i_{2m-1} i_{2m}} + \pi_{i_1 i_2 \ldots i_{2m-3} i_{2m-1}}\,\pi_{i_{2m-2} i_{2m}} + \ldots + \pi_{i_2 i_3 \ldots i_{2m-1}}\,\pi_{i_1 i_{2m}}. (E23)
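For fourth moments, (E23) reduces to the familiar Gaussian pairing rule, e.g. E{e₁e₁e₂e₂} = π₁₁π₂₂ + 2π₁₂². A Monte Carlo sketch of mine (the covariances are arbitrary illustration values) makes this concrete:

```python
# Monte Carlo illustration of the Gaussian pairing rule (E23) for fourth moments:
# E{e1*e1*e2*e2} = pi_11*pi_22 + 2*pi_12^2 for a zero-mean Gaussian pair.
import random
random.seed(3)

s11, s22, s12 = 1.0, 0.5, 0.4           # arbitrary covariance matrix entries
# Cholesky factor of [[s11, s12], [s12, s22]]
l11 = s11 ** 0.5
l21 = s12 / l11
l22 = (s22 - l21 ** 2) ** 0.5

N = 400000
acc = 0.0
for _ in range(N):
    u, v = random.gauss(0, 1), random.gauss(0, 1)
    e1 = l11 * u                         # zero-mean, variance s11
    e2 = l21 * u + l22 * v               # zero-mean, variance s22, cov s12
    acc += e1 * e1 * e2 * e2
emp = acc / N
isserlis = s11 * s22 + 2 * s12 ** 2
print(emp, isserlis)                     # Monte Carlo estimate vs. pairing formula
```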
E1 Moments of a probability distribution 647
subject to \tau^2 := E\{(X-\nu)^2\}.

References for the two inequalities are C.F. Gauss (1823), F. Pukelsheim (1994), J.R. Savage (1961), B. Ulin (1953), and D.F. Vysochanskii and Y.E. Petunin (1980, 1983).
E2: Error propagation
At the beginning we note some properties of the expectation operator E and the dispersion operator D. Afterwards we review the special and the general, also nonlinear, error propagation.
Lemma E7 (expectation operator E\{x\}):

E is defined as a linear operator on the space of random variables in \mathbb{R}^n, also called the expectation operator. For arbitrary constants \alpha, \beta, \gamma \in \mathbb{R} there holds the identity

E\{\alpha X_i + \beta X_j + \gamma\} = \alpha E\{X_i\} + \beta E\{X_j\} + \gamma. (E29)

Let A and B be two m \times n and m \times \ell matrices, \delta an m \times 1 vector of constants, and x := [X_1,\ldots,X_n]' and y := [Y_1,\ldots,Y_\ell]' two n \times 1 and \ell \times 1 random vectors such that

E\{Ax + By + \delta\} = A\,E\{x\} + B\,E\{y\} + \delta (E30)

holds. The expectation operator E is not multiplicative, that is,

E\{X_i X_j\} = E\{X_i\}E\{X_j\} + C\{X_i, X_j\} \ne E\{X_i\}E\{X_j\}, (E31)

if X_i and X_j are correlated.
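Both properties can be illustrated by simulation; the sketch below is my own, with all numeric choices arbitrary. The sample mean reproduces the linearity (E30) exactly, while the product moment picks up the covariance term of (E31):

```python
# Monte Carlo illustration of (E30)-type linearity and of (E31):
# E{X*Y} = E{X}E{Y} + C{X,Y}. All numeric choices are arbitrary illustrations.
import random
random.seed(11)

N = 200000
a, b, g = 2.0, -1.5, 0.3                   # alpha, beta, gamma
sum_x = sum_y = sum_xy = sum_lin = 0.0
for _ in range(N):
    u = random.gauss(0.0, 1.0)
    x = 1.0 + u                             # X with E{X} = 1
    y = 0.5 + 0.8 * u + random.gauss(0.0, 0.6)   # Y correlated with X, C{X,Y} = 0.8
    sum_x += x; sum_y += y; sum_xy += x * y
    sum_lin += a * x + b * y + g

Ex, Ey, Exy = sum_x / N, sum_y / N, sum_xy / N
cov = Exy - Ex * Ey
print(sum_lin / N, a * Ex + b * Ey + g)     # linearity: the two sample means coincide
print(Exy, Ex * Ey, cov)                    # E{XY} differs from E{X}E{Y} by C{X,Y} ~ 0.8
```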
E3: Useful identities

Notable identities for higher-order moments are the following.
Lemma E11 (identities: higher order moments):
(i) Kronecker products #1
The above statements are the reason that for the vectors \mu = F\xi only linear estimations are analyzed.
Each issue contains an Author Index where all the abstracts are listed according to the names of all the authors, showing the abstract number and the classification number,

Classification Scheme 2000

(http://www.cbs.nl/isi/stma.htm)
Appendix F: Bibliographic Indexes 657
Abbe, E. (1906): Über die Gesetzmäßigkeit in der Verteilung der Fehler bei Beobach-
tungsreihen, Gesammelte Abhandlungen, vol. II, Jena 1863 (1906)
Abdous, B. and R. Theodorescu (1998): Mean, median, mode IV, Statistica Neerlandica
52 (1998) 356-359
Absil, P.-A. et al. (2002): A Grassmann-Rayleigh quotient iteration for computing invariant subspaces, SIAM Review 44 (2002) 57-73
Abramowitz, M. and I.A. Stegun (1965): Handbook of mathematical functions, Dover Publications, New York 1965
Adams, M. and V. Guillemin (1996): Measure theory and probability, 2nd edition, Birk-
häuser Verlag, Boston 1996
Adatia, A. (1996): Asymptotic blues of the parameters of the logistic distribution based on
doubly censored samples, J. Statist. Comput. Simul. 55 (1996) 201-211
Abdelmalek, N.N. (1974): On the discrete linear L1 approximation and L1 solutions of overdetermined linear equations, J. Approximation Theory 11 (1974) 38-53
Afifi, A.A. and V. Clark (1996): Computer-aided multivariate analysis, Chapman and
Hall, Boca Raton 1996
Agostinelli, C. and M. Markatou (1998): A one-step robust estimator for regression based
on the weighted likelihood reweighting scheme, Stat.& Prob. Letters 37 (1998) 341-
350
Agrò, G. (1995): Maximum likelihood estimation for the exponential power function
parameters, Comm. Statist. Simul. Comput. 24 (1995) 523-536
Aickin, M. and C. Ritenbaugh (1996): Analysis of multivariate reliability structures and
the induced bias in linear model estimation, Statistics in Medicine 15 (1996) 1647-
1661
Aitchinson, J. and I.R. Dunsmore (1975): Statistical prediction analysis, Cambridge Uni-
versity Press, Cambridge 1975
Aitken, A.C. (1935): On least squares and linear combinations of observations, Proc. Roy.
Soc. Edinburgh 55 (1935) 42-48
Airy, G.B. (1861): On the algebraical and numerical theory of errors of observations and
the combination of observations, Macmillan Publ., London 1861
Akdeniz, F., Erol, H. and F. Oeztuerk (1999): MSE comparisons of some biased estima-
tors in linear regression model, J. Applied Statistical Science 9 (1999) 73-85
Albert, A. (1969): Conditions for positive and nonnegative definiteness in terms of pseudo
inverses, SIAM J. Appl. Math. 17 (1969) 434-440
Albertella, A. and F. Sacerdote (1995): Spectral analysis of block averaged data in geopo-
tential global model determination, J. Geodesy 70 (1995) 166-175
Alcala, J.T., Cristobal, J.A. and W. Gonzalez-Manteiga (1999): Goodness-of-fit test for
linear models based on local polynomials, Statistics & Probability Letters 42 (1999)
39-46
Aldrich, J. (1999): Determinacy in the linear model: Gauss to Bose and Koopmans, Inter-
national Statistical Review 67 (1999) 211-219
Alefeld, G. and J. Herzberger (1983): Introduction to interval computation. Computer
science and applied mathematics, Academic Press, New York - London 1983
Ali, A.K.A., Lin, C.Y. and E.B. Burnside (1997): Detection of outliers in mixed model
analysis, The Egyptian Statistical Journal 41 (1997) 182-194
Allende, S., Bouza, C. and I. Romero (1995): Fitting a linear regression model by combin-
ing least squares and least absolute value estimation, Questiio 19 (1995) 107-121
660 References
Allmer, F. (2001): Louis Krüger (1857-1923), 25 pages, Technical University of Graz,
Graz 2001
Alzaid, A.A. and L. Benkherouf (1995): First-order integer-valued autoregressive process
with Euler marginal distributions, J. Statist. Res. 29 (1995) 89-92
Ameri, A. (2000): Automatic recognition and 3D reconstruction of buildings through
computer vision and digital photogrammetry, Dissertation, Stuttgart University, Stutt-
gart 2000
An, H.-Z., F.J. Hickernell and L.-X. Zhu (1997): A new class of consistent estimators for
stochastic linear regressive models, J. Multivar. Anal. 63 (1997) 242-258
Anderson, P.L. and M.M. Meerscaert (1997): Periodic moving averages of random vari-
ables with regularly varying tails, Ann. Statist. 25 (1997) 771-785
Anderson, T.W. (1973): Asymptotically efficient estimation of covariance matrices with
linear structure, The Annals of Statistics 1 (1973) 135-141
Anderson, T.W. (2003): An introduction to multivariate statistical analysis, Wiley, Stan-
ford CA, 2003
Anderson, T.W. and M.A. Stephens (1972): Tests for randomness of directions against
equatorial and bimodal alternatives, Biometrika 59 (1972) 613-621
Anderson, W.N. and R.J. Duffin (1969): Series and parallel addition of matrices, J. Math.
Anal. Appl. 26 (1969) 576-594
Ando, T. (1979): Generalized Schur complements, Linear Algebra Appl. 27 (1979) 173-
186
Andrews, D.F. (1971): Transformations of multivariate data, Biometrics 27 (1971) 825-
840
Andrews, D.F. (1974): A robust method for multiple linear regression, Technometrics 16
(1974) 523-531
Andrews, D.F., Bickel, P.J. and F.R. Hampel (1972): Robust estimates of location, Prince-
ton University Press, Princeton 1972
Anh, V.V. and T. Chelliah (1999): Estimated generalized least squares for random coef-
ficient regression models, Scandinavian J. Statist. 26 (1999) 31-46
Anido, C. and T. Valdés (2000): Censored regression models with double exponential
error distributions: an iterative estimation procedure based on medians for correcting
bias, Revista Matemática Complutense 13 (2000) 137-159
Ansley, C.F. (1985): Quick proofs of some regression theorems via the QR Algorithm,
The American Statistician 39 (1985) 55-59
Antoch, J. and H. Ekblom (2003): Selected algorithms for robust M- and L-Regression
estimators, Developments in Robust Statistics, pp. 32-49, Physica Verlag, Heidelberg
2003
Anton, H. (1994): Elementary linear algebra, Wiley, New York 1994
Arnold, B.C. and N. Balakrishnan (1989): Relations, bounds and approximations for order
statistics, Lecture Notes in Statistics 53 (1989) 1-37
Arnold, B.C. and R.M. Shavelle (1998): Joint confidence sets for the mean and variance
of a normal distribution, American Statistical Association 52 (1998) 133-140
Arnold, B.F. and P. Stahlecker (1998): Prediction in linear regression combining crisp
data and fuzzy prior information, Statistics & Decisions 16 (1998) 19-33
Arnold, B.F. and P. Stahlecker (1999): A note on the robustness of the generalized least
squares estimator in linear regression, Allg. Statistisches Archiv 83 (1999) 224-229
Arnold, K.J. (1941): On spherical probability distributions, Ph.D. Thesis, Boston 1941
Arnold, S.F. (1981): The theory of linear models and multivariate analysis, J. Wiley, New
York 1981
Arrowsmith, D.K. and C.M. Place (1995): Differential equations, maps and chaotic behaviour, Chapman and Hall, London 1995
Arun, K.S. (1992): A unitarily constrained total least squares problem in signal process-
ing, SIAM J. Matrix Anal. Appl. 13 (1992) 729-745
Atkinson, A.C. and L.M. Haines (1996): Designs for nonlinear and generalized linear
models, Handbook of Statistik 13 (1996) 437-475
Aven, T. (1993): Reliability and risk analysis, Chapman and Hall, Boca Raton 1993
Awange, J.L. (2002): Gröbner bases, multipolynomial resultants and the Gauss-Jacobi
combinatorial algorithms – adjustment of nonlinear GPS/LPS observations, Schriften-
reihe der Institute des Studiengangs Geodäsie und Geoinformatik, Report 2002.1
Awange, J. and E.W. Grafarend (2002): Linearized Least Squares and nonlinear Gauss-
Jacobi combinatorial algorithm applied to the 7-parameter datum transformation C7(3)
problem, Z. Vermessungswesen 127 (2002) 109-116
Axler, S., Bourdon, P. and W. Ramey, (2001): Harmonic function theory, 2nd ed.,
Springer Verlag, New York 2001
Azzalini, A. (1996): Statistical inference, Chapman and Hall, Boca Raton 1996
Azzam, A.M.H. (1996): Inference in linear models with nonstochastic biased factors,
Egyptian Statistical Journal, ISSR - Cairo University 40 (1996) 172-181
Azzam, A., Birkes, D. and J. Seely (1988): Admissibility in linear models with polyhedral
covariance structure, probability and statistics, essays in honor of Franklin A. Gray-
bill, J.N. Srivastava, Ed. Elsevier Science Publishers, B.V. (North-Holland) 1988
Baarda, W. (1967): A generalization of the concept strength of the figure, Publications on
Geodesy, New Series 2, Delft 1967
Baarda, W. (1968): A testing procedure for use in geodetic networks, Netherlands Geo-
detic Commission, New Series, Delft, Netherlands, 2 (5) 1968
Baarda, W. (1973): S-transformations and criterion matrices, Netherlands Geodetic
Commission, Vol. 5, No. 1, Delft 1973
Baarda, W. (1977): Optimization of design and computations of control networks, F.
Halmos and J. Somogyi (eds.) Akademiai Kiado, Budapest 1977, 419-436
Babai, L. (1986): On Lovasz' lattice reduction and the nearest lattice point problem, Com-
binatorica 6 (1986) 1-13
Babu, G.J. and E.D. Feigelson (1996): Astrostatistics, Chapman and Hall, Boca Raton
1996
Baddeley, A.J. (2000): Time-invariance estimating equations, Bernoulli 6 (2000) 783-808
Bai, J. (1994): Least squares estimation of a shift in linear processes, J.Time Series
Analysis 15 (1994) 453-472
Bai, Z.D. and Y. Wu (1997): General M-estimation, J. Multivariate Analysis 63 (1997)
119-135
Bai, Z.D., Rao, C.R. and Y.H. Wu (1997): Robust inference in multivariate linear regres-
sion using difference of two convex functions as the discrepancy measure, Handbook
of Statistics 15 (1997) 1-19
Bai, Z.D., Chan, X.R., Krishnaiah, P.R. and L.C. Zhao (1987): Asymptotic properties of
the EVLP estimation for superimposed exponential signals in noise, Technical Report
87-19, CMA, University of Pittsburgh, Pittsburgh 1987
Baksalary, J.K. and A. Markiewicz (1988): Admissible linear estimators in the general
Gauss-Markov model, J. Statist. Planning and Inference 19 (1988) 349-359
Baksalary, J.K. and F. Pukelsheim (1991b): On the Löwner, minus and star partial order-
ings of nonnegative definite matrices and their squares, Linear Algebra and its Appli-
cations 151 (1991) 135-141
Baksalary, J.K., Liski, E.P. and G. Trenkler (1989): Mean square error matrix improve-
ments and admissibility of linear estimators, J. Statist. Planning and Inference 23
(1989) 313-325
Baksalary, J.K., Rao, C.R. and A. Markiewicz (1992): A study of the influence of the
'natural restrictions' on estimation problems in the singular Gauss-Markov model, J.
Statist. Planning and Inference 31 (1992) 335-351
Balakrishnan, N. and Basu, A.P. (1995) (eds.): The exponential distribution, Gordon and
Breach Publishers, Amsterdam 1995
Balakrishnan, N. and R.A. Sandhu (1996): Best linear unbiased and maximum likelihood
estimation for exponential distributions under general progressive type-II censored
samples, Sankhya 58 (1996) 1-9
Bamler, R. and P. Hartl (1998): Synthetic aperture radar interferometry, Inverse Problems
14 (1998) R1-R54
Banachiewicz, T. (1937): Zur Berechnung der Determinanten, wie auch der Inversen und
zur darauf basierten Auflösung der Systeme linearen Gleichungen, Acta Astronom.
Ser. C3 (1937) 41-67
Bansal, N.K., Hamedani, G.G. and H. Zhang (1999): Non-linear regression with multidi-
mensional indices, Statistics & Probability Letters 45 (1999) 175-186
Bapat, R.B. (2000): Linear algebra and linear models, Springer, New York 2000
Barankin, E.W. (1949): Locally best unbiased estimates, Ann. Math. Statist. 20 (1949)
477-501
Barbieri, M.M. (1998): Additive and innovational outliers in autoregressive time series: a
unified Bayesian approach, Statistica 3 (1998) 395-409
Barham, R.H. and W. Drane (1972): An algorithm for least squares estimation of nonlin-
ear parameters when some of the parameters are linear, Technometrics 14 (1972) 757-
766
Bar-Itzhack, I.Y. and F.L. Markley (1990): Minimal parameter solution of the orthogonal matrix differential equation, IEEE Transactions on Automatic Control 35 (1990) 314-317
Barlow, J.L. (1993): Numerical aspects of solving linear least squares problems, C.R.
Rao, ed., Handbook of Statistic 9 (1993) 303-376
Barlow, R.E. and F. Proschan (1966): Tolerance and confidence limits for classes of
distributions based on failure rate, Ann. Math. Statist 37 (1966) 1593-1601
Barlow, R.E., Clarotti, C.A., and F. Spizzichino (eds) (1993): Reliability and decision
making, Chapman and Hall, Boca Raton 1993
Barnard, J., McCulloch, R. and X.-L. Meng (2000): Modeling covariance matrices in
terms of standard deviations and correlations, with application to shrinkage, Statistica
Sinica 10 (2000) 1281-1311
Barnard, M.M. (1935): The scalar variations of skull parameters in four series of Egyptian
skulls, Ann. Eugen. 6 (1935) 352-371
Barndorff-Nielsen, O.E. (1978): Information and exponential families in statistical theory,
Wiley & Sons, Chichester & New York 1978
Barndorff-Nielsen, O.E., Cox, D.R. and C. Klüppelberg (2001): Complex stochastic
systems, Chapman and Hall, Boca Raton, Florida 2001
Barnett, V. (1999): Comparative statistical inference, 3rd ed., Wiley, Chichester 1999
Barone, J. and A. Novikoff (1978): A history of the axiomatic formulation of probability
from Borel to Kolmogorov, Part I, Archive for History of Exact Sciences 18 (1978)
123-190
Barrio, R. (2000): Parallel algorithms to evaluate orthogonal polynomial series, SIAM J.
Sci. Comput 21 (2000) 2225-2239
Barrlund, A. (1998): Efficient solution of constrained least squares problems with
Kronecker product structure, SIAM J. Matrix Anal. Appl. 19 (1998) 154-160
Barrodale, I. and D.D. Oleski (1981): Exponential approximation using Prony's method,
The Numerical Solution of Nonlinear Problems, eds. Baker, C.T.H. and C. Phillips,
258-269, 1998
Bartlett, M.S. and D.G. Kendall (1946): The statistical analysis of variance-heterogeneity
and the logarithmic transformation, Queen's College Cambridge, Magdalen College
Oxford, Cambridge/Oxford 1946
Bartoszynski, R. and M. Niewiadomska-Bugaj (1996): Probability and statistical infer-
ence, Wiley, New York 1996
Barut, A.O. and R.B. Haugen (1972): Theory of the conformally invariant mass, Annals of
Physics 71 (1972) 519-541
Bassett Jr., G. and R. Koenker (1978): Asymptotic theory of least absolute error regression, J. Amer. Statist. Assoc. 73 (1978) 618-622
Bateman, H. (1910a): The transformation of the electrodynamical equations, Proc. Lon-
don Math. Soc. 8 (1910) 223-264, 469-488
Bateman, H. (1910b): The transformation of coordinates which can be used to transform
one physical problem into another, Proc. London Math. Soc. 8 (1910) 469-488
Bates, D.M. and M.J. Lindstorm (1986): Nonlinear least squares with conditionally linear
parameters, Proceedings of the Statistical Computation Section, American Statistical
Association, Washington 1986
Bates, D.M. and D.G. Watts (1980): Relative curvature measures of nonlinearity (with
discussion), J. Royal Statist. Soc. Ser. B 42 (1980) 1-25
Bates, D.M. and D.G. Watts (1988a): Nonlinear regression analysis and its applications,
John Wiley, New York 1988
Bates, D.M. and D.G. Watts (1988b): Applied nonlinear regression, J. Wiley, New York
1988
Bates, R.A., Riccomagno, E., Schwabe, R. and H.P. Wynn (1998): Lattices and dual
lattices in optimal experimental design for Fourier models, Computational Statistics &
Data Analysis 28 (1998) 283-296
Batschelet, E. (1965): Statistical methods for the analysis of problems in animal orienta-
tion and certain biological rhythms, Amer. Inst. Biol. Sciences, Washington 1965
Batschelet, E. (1971): Recent statistical methods for orientation, (Animal Orientation
Symposium 1970 on Wallops Island), Amer. Inst. Biol. Sciences, Washington, D.C.,
1971
Batschelet, E. (1981): Circular statistics in biology, Academic Press, London 1981
Bauer, H. (1992): Maß- und Integrationstheorie, 2. Auflage, Walter de Gruyter, Berlin /
New York 1992
Bauer, H. (1996): Probability theory, de Gruyter Verlag, Berlin-New York 1996
Bayen, F. (1976): Conformal invariance in physics, in: Cahen, C. and M. Flato (eds.),
Differential geometry and relativity, Reidel Publ., pages 171-195, Dordrecht 1976
Beale, E.M. (1960): Confidence regions in non-linear estimation, J. Royal Statist. Soc. B
22 (1960) 41-89
Beaton, A.E. and J.W. Tukey (1974): The fitting of power series, meaning polynomials,
illustrated on band-spectroscopic data, Technometrics 16 (1974) 147-185
Becker, T., Weispfennig, V. and H. Kredel (1998): Gröbner bases: a computational ap-
proach to commutative algebra, New York, Springer 1998
Beckermann, B. and E.B. Saff (1999): The sensitivity of least squares polynomial ap-
proximation, Int. Series of Numerical Mathematics, vol. 131: Applications and com-
putation of orthogonal polynomials (eds. W. Gautschi, G.H. Golub, G. Opfer) pp. 1-
19, Birkhäuser Verlag, Basel 1999
Beckers, J., Harnad, J., Perroud, M. and P. Winternitz (1978): Tensor fields invariant
under subgroups of the conformal group of space-time, J. Math. Phys. 19 (1978)
2126-2153
Behnken, D.W. and N.R. Draper (1972): Residuals and their variance, Technometrics 11
(1972) 101-111
Behrens, W.A. (1929): Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen,
Landwirtschaftliche Jahrbücher 68 (1929) 807-837
Beichelt, F. (1997): Stochastische Prozesse für Ingenieure, Teubner Stuttgart 1997
Belikov, M.V. (1991): Spherical harmonic analysis and synthesis with the use of column-
wise recurrence relations, Manuscripta Geodaetica 16 (1991) 384-410
Belikov, M.V. and K.A. Taybatorov (1992): An efficient algorithm for computing the
Earth’s gravitational potential and its derivatives at satellite altitudes, Manuscripta
Geodaetica 17 (1992) 104-116
Belmehdi, S., Lewanowicz, S. and A. Ronveaux (1997): Linearization of the product of
orthogonal polynomials of a discrete variable, Applicationes Mathematicae 24 (1997)
445-455
Ben-Israel, A. and T. Greville (1974): Generalized inverses: Theory and applications,
Wiley, New York 1974
Benbow, S.J. (1999): Solving generalized least-squares problems with LSQR, SIAM J.
Matrix Anal. Appl. 21 (1999) 166-177
Benda, N. and R. Schwabe (1998): Designing experiments for adaptively fitted models,
in: MODA 5 – Advances in model-oriented data analysis and experimental design,
Proceedings of the 5th International Workshop in Marseilles, eds. Atkinson, A.C.,
Pronzato, L. and H.P. Wynn, Physica-Verlag, Heidelberg 1998
Bennett, R.J. (1979): Spatial time series, Pion Limited, London 1979
Beran, R.J. (1968): Testing for uniformity on a compact homogeneous space, J. App.
Prob. 5 (1968) 177-195
Beran, R.J. (1979): Exponential models for directional data, Ann. Statist. 7 (1979) 1162-
1178
Beran, R.J. (1994): Statistical methods for long memory processes, Chapman and Hall,
Boca Raton 1994
Berberan, A. (1992): Outlier detection and heterogeneous observations – a simulation
case study, Australian J. Geodesy, Photogrammetry and Surveying 56 (1992) 49-61
Berger, M.P.F. and F.E.S. Tan (1998): Optimal designs for repeated measures experi-
ments, Kwantitatieve Methoden 59 (1998) 45-67
Berman, A. and R.J. Plemmons (1979): Nonnegative matrices in the mathematical sci-
ences, Academic Press, New York 1979
Bertsekas, D.P. (1996): Incremental least squares methods and the extended Kalman
filter, Siam J. Opt. 6 (1996) 807-822
Berry, J.C. (1994): Improving the James-Stein estimator using the Stein variance estima-
tor, Statist. Probab. Lett. 20 (1994) 241-245
Bertuzzi, A., Gandolfi, A. and C. Sinisgalli (1998): Preference regions of ridge regression
and OLS according to Pitman’s criterion, Sankhya: The Indian J.Statistics 60 (1998)
437-447
Bessel, F.W. (1838): Untersuchungen über die Wahrscheinlichkeit der Beobachtungsfeh-
ler, Astronomische Nachrichten 15 (1838) 368-404
Betensky, R.A. (1997): Local estimation of smooth curves for longitudinal data, Statistics
in Medicine 16 (1997) 2429-2445
Beylkin, G. and N. Saito (1993): Wavelets, their autocorrelation function and multidimen-
sional representation of signals, in: Proceedings of SPIE - The international society of
optical engineering, Vol. LB 26, Int. Soc. for Optical Engineering, Bellingham 1993
References 665
Bhatia, R. (1996): Matrix analysis, Springer Verlag, New York 1996
Bhattacharya, R.N. and C.R. Rao (1976): Normal approximation and asymptotic expan-
sions, Wiley, New York 1976
Bhattacharya, R.N. and E.C. Waymire (2001): Iterated random maps and some classes of
Markov processes, D. N. Shanbhag and C.R. Rao, eds., Handbook of Statistics 19
(2001) 145-170
Bibby, J. (1974): Minimum mean square error estimation, ridge regression, and some
unanswered questions, colloquia mathematica societatis Janos Bolyai, Progress in sta-
tistics, ed. J. Gani, K. Sarkadi, I. Vincze, Vol. I, Budapest 1972, North Holland Publi-
cation Comp., Amsterdam 1974
Bickel, P.J. and K.A. Doksum (1977a): Mathematical statistics - Distribution theory for
transformations of random vectors, pp. 9-41, Holden-Day Inc 1977
Bickel, P.J. and K.A. Doksum (1977b): Mathematical statistics – Optimal tests and confi-
dence intervals: Likelihood ratio tests and related procedures, pp. 192-247, Holden-
Day Inc 1977
Bickel, P.J. and K.A. Doksum (1977c): Mathematical statistics – Basic ideas and selected
topics, pp. 369-406, Holden-Day Inc 1977
Bickel, P.J. and K.A. Doksum (1981): An analysis of transformations revisited, J. American Statistical Association 76 (1981) 296-311
Bickel, P.J., Doksum, K. and J.L. Hodges (1982): A Festschrift for Erich L. Lehmann,
Chapman and Hall, Boca Raton 1982
Bierman, G.J. (1977): Factorization methods for discrete sequential estimation, Academic Press, New York 1977
Bill, R. (1985b): Kriteriummatrizen ebener geodätischer Netze, Deutsche Geodätische
Kommission, München, Reihe A, No. 102
Bilodeau, M. and D. Brenner (1999): Theory of multivariate statistics, Springer Verlag
1999
Bilodeau, M. and P. Duchesne (2000): Robust estimation of the SUR model, The Canadian J. Statistics 28 (2000) 277-288
Bingham, C. (1964): Distributions on the sphere and projective plane, PhD. Thesis, Yale
University 1964
Bingham, C. (1974): An antipodally symmetric distribution on the sphere, Ann. Statist. 2
(1974) 1201-1225
Bingham, C., Chang, T. and D. Richards (1992): Approximating the matrix Fisher and
Bingham distributions: Applications to spherical regression and Procrustes analysis,
J. Multivariate Analysis 41 (1992) 314-337
Bingham, N.H. (2001): Random Walk and fluctuation theory, D. N. Shanbhag and C.R.
Rao, eds., Handbook of Statistics 19 (2001) 171-213
Bini, D. and V. Pan (1994): Polynomial and matrix computations, Vol. 1: Fundamental
Algorithms, Birkhäuser, Boston 1994
Bill, R. (1984): Eine Strategie zur Ausgleichung und Analyse von Verdichtungsnetzen,
Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Re-
port C295, 91 pp., München 1984
Bischof, C.H. and G. Quintana-Orti (1998): Computing rank-revealing QR factorizations
of dense matrices, ACM Transactions on Mathematical Software 24 (1998) 226-253
Bischoff, W. (1992): On exact D-optimal designs for regression models with correlated
observations, Ann. Inst. Statist. Math. 44 (1992) 229-238
Bjerhammar, A. (1951a): Rectangular reciprocal matrices with special reference to calcu-
lation, Bull. Géodésique 20 (1951) 188-210
Bjerhammar, A. (1951b): Application of calculus of matrices to the method of least-
squares with special reference to geodetic calculations, Trans. RIT, No. 49, Stock-
holm 1951
Bjerhammar, A. (1955): En ny matrisalgebra, SLT 211-288, Stockholm 1955
Bjerhammar, A. (1958): A generalized matrix algebra, Trans. RIT, No. 124, Stockholm
1958
Bjerhammar, A. (1973): Theory of errors and generalized matrix inverses, Elsevier, Am-
sterdam 1973
Björck, A. (1967): Solving linear least squares problems by Gram-Schmidt orthogonaliza-
tion, Nordisk Tidskr. Informationsbehandlung (BIT) 7 (1967) 1-21
Björck, A. (1996): Numerical methods for least squares problems, SIAM, Philadelphia
1996
Björck, A. and G.H. Golub (1973): Numerical methods for computing angles between
linear subspaces, Mathematics of Computation 27 (1973) 579-594
Björkström, A. and R. Sundberg (1999): A generalized view on continuum regression,
Scandinavian J. Statistics 26 (1999) 17-30
Blaker, H. (1999): Shrinkage and orthogonal decomposition, Scandinavian J. Statistics 26
(1999) 1-15
Blewitt, G. (2000): Geodetic network optimization for geophysical parameters, Geophysi-
cal Research Letters 27 (2000) 2615-3618
Bloomfield, P. and W.L. Steiger (1983): Least absolute deviations - theory, applications
and algorithms, Birkhäuser Verlag, Boston 1983
Bobrow, J.E. (1989): A direct minimization approach for obtaining the distance between
convex polyhedra, Int. J. Robotics Research 8 (1989) 65-76
Boggs, P.T., Byrd, R.H. and R.B. Schnabel (1987): A stable and efficient algorithm for
nonlinear orthogonal distance regression, SIAM J. Sci. Stat. Comput. 8 (1987) 1052-
1078
Bolfarine, H. and M. de Castro (2000): ANOCOVA models with measurement errors,
Statistics & Probability Letters 50 (2000) 257-263
Bollerslev, T. (1986): Generalized autoregressive conditional heteroskedasticity, J.
Econometrics 31 (1986) 307-327
Booth, J.G. and J.P. Hobert (1996): Standard errors of prediction in generalized linear
mixed models, J. American Statist. Assoc. 93 (1996) 262-272
Bordes, L., Nikulin, M. and V. Voinov (1997): Unbiased estimation for a multivariate
exponential whose components have a common shift, J. Multivar. Anal. 63 (1997)
199-221
Borg, I. and P. Groenen (1997): Modern multidimensional scaling, Springer Verlag, New
York 1997
Borovkov, A.A. (1998): Mathematical statistics, Gordon and Breach Science Publishers,
Amsterdam 1998
Borre, K. (2001): Plane networks and their applications, Birkhäuser Verlag, Basel 2001
Bose, R.C. (1944): The fundamental theorem of linear estimation, Proc. 31st Indian Sci-
entific Congress (1944) 2-3
Bossler, J. (1973): A note on the meaning of generalized inverse solutions in geodesy, J.
Geoph. Res. 78 (1973) 2616
Bossler, J., Grafarend, E.W. and R. Kelm (1973): Optimal design of geodetic nets II, J.
Geoph. Res. 78 (1973) 5887-5897
Boulware, D.G., Brown, L.S. and R.D. Peccei (1970): Deep inelastic electroproduction
and conformal symmetry, Physical Review D2 (1970) 293-298
Box, G.E.P. and D.R. Cox (1964): An analysis of transformations, J. Royal Statistical
Society, Series B 26 (1964) 211-252
Box, G.E.P. and G. Tiao (1973): Bayesian inference in statistical analysis, Addison-
Wesley, Reading 1973
Box, G.E.P. and N.R. Draper (1987): Empirical model-building and response surfaces, J.
Wiley, New York 1987
Box, M.J. (1971): Bias in nonlinear estimation, J. Royal Statistical Society B33 (1971)
171-201
Branco, M.D. (2001): A general class of multivariate skew-elliptical distributions,
J. Multivariate Analysis 79 (2001) 99-113
Brandt, S. (1992): Datenanalyse. Mit statistischen Methoden und Computerprogrammen,
3. Aufl., BI Wissenschaftsverlag, Mannheim 1992
Brandt, S. (1999): Data analysis: statistical and computational methods for scientists and
engineers, 3rd ed., Springer Verlag, New York 1999
Braess, D. (1986): Nonlinear approximation theory, Springer-Verlag, Berlin 1986
Breckling, J. (1989): Analysis of directional time series: application to wind speed and
direction, Springer Verlag, Berlin 1989
Brémaud, P. (1999): Markov Chains – Gibbs Fields, Monte Carlo Simulation and Queues,
Springer Verlag New York 1999
Breslow, N.E. and D.G. Clayton (1993): Approximate inference in generalized linear
mixed models, J. Amer. Statist. Assoc. 88 (1993) 9-25
Brezinski, C. (1999): Error estimates for the solution of linear systems, SIAM J. Sci.
Comput. 21 (1999) 764-781
Brill, M. and E. Schock (1987): Iterative solution of ill-posed problems - a survey, in:
Model optimization in exploration geophysics, ed. A. Vogel, Vieweg, Braunschweig
1987
Bro, R. and S. de Jong (1997): A fast non-negativity-constrained least squares algorithm,
J. Chemometrics 11 (1997) 393-401
Brock, J.E. (1968): Optimal matrices describing linear systems, AIAA J. 6 (1968) 1292-
1296
Brockwell, P.J. (2001): Continuous-time ARMA Processes, D. N. Shanbhag and C.R.
Rao, eds., Handbook of Statistics 19 (2001) 249-276
Brovelli, M.A., Sanso, F. and G. Venuti (2003): A discussion on the Wiener-Kolmogorov
prediction principle with easy-to-compute and robust variants, J. Geodesy 76 (2003)
673-683
Brown, B. and R. Mariano (1989): Measures of deterministic prediction bias in nonlinear
models, Int. Econ. Rev. 30 (1989) 667-684
Brown, B.M., Hall, P. and G.A. Young (1997): On the effect of inliers on the spatial
median, J. Multivar. Anal. 63 (1997) 88-104
Brown, H. and R. Prescott (1999): Applied mixed models in medicine, J. Wiley, Chiches-
ter 1999
Brown, K.G. (1976): Asymptotic behavior of Minque-type estimators of variance compo-
nents, The Annals of Statistics 4 (1976) 746-754
Brualdi, R.A. and H. Schneider (1983): Determinantal identities: Gauss, Schur, Cauchy,
Sylvester, Kronecker, Jacobi, Binet, Laplace, Muir and Cayley, Linear Algebra Appl.
52/53 (1983) 765-791
Brunk, H.D. (1958): On the estimation of parameters restricted by inequalities, Ann.
Math. Statist. 29 (1958) 437-454
Brunner, F.K., Hartinger, H. and L. Troyer (1999): GPS signal diffraction modelling: the
stochastic SIGMA-δ model, J. Geodesy 73 (1999) 259-267
Bruno, A.D. (2000): Power geometry in algebraic and differential equations, Elsevier,
Amsterdam-Lausanne-New York-Oxford-Shannon-Singapore-Tokyo 2000
Brzézniak, Z. and T. Zastawniak (1999): Basic stochastic processes, Springer Verlag, Berlin 1999
Buja, A. (1996): What Criterion for a Power Algorithm?, Rieder, H. (editor): Robust
statistics, data analysis and computer intensive methods, In honour of Peter Huber’s
60th Birthday, Springer 1996
Buhmann, M.D. (2001): Approximation and interpolation with radial functions, In: Multi-
variate Approximation and Applications, pp. 25-43, Cambridge University Press,
Cambridge 2001
Bunday, B.D., Bokhari S.M.H. and K.H. Khan (1997): A new algorithm for the normal
distribution function, Sociedad de Estadistica e Investigacion Operativa 6 (1997) 369-
377
Bunke, H. und O. Bunke (1974): Identifiability and estimability, Math. Operations-
forschg. Statist. 5 (1974) 223-233
Bunke, H. and O. Bunke (1986): Statistical inference in linear models, J. Wiley, New
York 1986
Bunke, O. (1977): Mixed models, empirical Bayes and Stein estimators, Math. Opera-
tionsforschg. Ser. Statistics 8 (1977) 55-68
Buonaccorsi, J., Demidenko, E. and T. Tosteson (2000): Estimation in longitudinal ran-
dom effects models with measurement error, Statistica Sinica 10 (2000) 885-903
Burgio, G. and Y. Nikitin (1998): Goodness-of-fit tests for normal distribution of order p and their asymptotic efficiency, Statistica 58 (1998) 213-230
Burns, F., Carlson, D., Haynsworth, E., and T. Markham (1974): Generalized inverse
formulas using the Schur complement, SIAM J. Appl. Math. 26 (1974) 254-259
Businger, P. and G.H. Golub (1965): Linear least squares solutions by Householder trans-
formations, Numer. Math., 7 (1965) 269-276
Butler, N.A. (1999): The efficiency of ordinary least squares in designed experiments
subject to spatial or temporal variation, Statistics & Probability Letters 41 (1999) 73-
81
Caboara, M. and E. Riccomagno (1998): An algebraic computational approach to the
identifiability of Fourier models, J. Symbolic Computation 26 (1998) 245-260
Cadet, A. (1996): Polar coordinates in R^np, application to the computation of the Wishart and Beta laws, Sankhya: The Indian J. Statistics 58 (1996) 101-114
Cai, J., Grafarend, E.W. and B. Schaffrin (2004): The A-optimal regularization parameter
in uniform Tykhonov-Phillips regularization - α-weighted BLE -, V Hotine-Marussi
Symposium on Mathematical Geodesy, Matera / Italy 2003, in: International Associa-
tion of Geodesy Symposia 127, pp. 309-324, Springer Verlag Berlin – Heidelberg
2004
Cambanis, S. and I. Fakhre-Zakeri (1996): Forward and reversed time prediction of auto-
regressive sequences, J. Appl. Prob. 33 (1996) 1053-1060
Campbell, H.G. (1977): An introduction to matrices, vectors and linear programming, 2nd
ed., Prentice Hall, Englewood Cliffs 1977
Candy, J.V. (1988): Signal processing, McGraw-Hill, New York 1988
Cantoni, E. (2003): Robust inference based on quasi-likelihoods for generalized linear
models and longitudinal data, Developments in Robust Statistics, pp. 114-124,
Physica Verlag, Heidelberg 2003
Carlin, B.P. and T.A. Louis (1996): Bayes and empirical Bayes methods, Chapman and
Hall, Boca Raton 1996
Carlitz, L. (1963): The inverse of the error function, Pacific J. Math. 13 (1963) 459-470
Carlson, D., Haynsworth, E. and T. Markham (1974): A generalization of the Schur com-
plement by means of the Moore-Penrose inverse, SIAM J. Appl. Math. 26 (1974) 169-
179
Carlson, D. (1986): What are Schur complements, anyway?, Linear Algebra and its Ap-
plications 74 (1986) 257-275
Carroll, J.D., Green, P.E. and A. Chaturvedi (1999): Mathematical tools for applied mul-
tivariate analysis, Academic Press, San Diego 1999
Carroll, J.D. and P.E. Green (1997): Mathematical tools for applied multivariate analysis,
Academic Press, San Diego 1997
Carroll, R.J. and D. Ruppert (1982): A comparison between maximum likelihood and
generalized least squares in a heteroscedastic linear model, J. American Statist.
Assoc. 77 (1982) 878-882
Carroll, R., Ruppert, D. and L. Stefanski (1995): Measurement error in nonlinear models,
Chapman and Hall, Boca Raton 1995
Carruthers, P. (1971): Broken scale invariance in particle physics, Phys. Lett. Rep. 1
(1971) 1-30
Caspary, W. (2000): Zur Analyse geodätischer Zeitreihen, Schriftenreihe, Heft 60-1,
Neubiberg 2001
Caspary, W. and K. Wichmann (1994): Lineare Modelle. Algebraische Grundlagen und
statistische Anwendungen, Oldenbourg Verlag, München / Wien 1994
Castillo, J. (1994): The singly truncated normal distribution, a non-steep exponential
family, Ann. Inst. Statist. Math 46 (1994) 57-66
Castillo, J. and M. Perez-Casany (1998): Weighted Poisson distributions for overdispersion and underdispersion situations, Ann. Inst. Statist. Math. 50 (1998) 567-585
Castillo, J. and P. Puig (1997): Testing departures from gamma, Rayleigh and truncated
normal distributions, Ann. Inst. Statist.Math. 49 (1997) 255-269
Cayley, A. (1855): Sept différents mémoires d'analyse, No. 3, Remarque sur la notation
des fonctions algebriques, Journal für die reine und angewandte Mathematik 50
(1855) 282-285
Cayley, A. (1858): A memoir on the theory of matrices, Phil. Transactions, Royal Society
of London 148 (1858) 17-37
Cenkov, N.N. (1972): Statistical decision rules and optimal inference, Nauka, Moscow 1972
Chan, K.-S. and H. Tong (2001): Chaos, a statistical perspective, Springer-Verlag, New
York 2001
Chan, K.-S. and H. Tong (2002): A note on the equivalence of two approaches for speci-
fying a Markov process, Bernoulli 8 (2002) 117-122
Chan, L.-Y. (2000): Optimal designs for experiments with mixtures: a survey, Commun.
Statist.-Theory Meth. 29 (2000) 2281-2312
Chan, T.F. and P.C. Hansen (1991): Some applications of the rank revealing QR factori-
zations, Numer. Linear Algebra Appl., 1 (1991) 33-44
Chan, T.F. and P.C. Hansen (1992): Some applications of the rank revealing QR factori-
zation, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 727-741
Chandrasekaran, S. (2000): An efficient and stable algorithm for the symmetric-definite
generalized eigenvalue problem, SIAM J. Matrix Anal. Appl. 21 (2000) 1202-1228
Chandrasekaran, S. and I.C.F. Ipsen (1995): Analysis of a QR algorithm for computing
singular values, SIAM J. Matrix Anal. Appl. 16 (1995) 520-535
Chandrasekaran, S., Gu, M. and A.H. Sayed (1998): A stable and efficient algorithm for
the indefinite linear least-squares problem, SIAM J. Matrix Anal. Appl. 20 (1998)
354-362
Chandrasekaran, S., Golub, G.H., Gu, M. and A.H. Sayed (1998): Parameter estimation in
the presence of bounded data uncertainties, SIAM J. Matrix Anal. Appl. 19 (1998)
235-252
Chang, F.-C. (1999): Exact D-optimal designs for polynomial regression without inter-
cept, Statistics & Probability Letters 44 (1999) 131-136
Chang, F.-C. and Y.-R. Yeh (1998): Exact A-optimal designs for quadratic regression,
Statistica Sinica 8 (1998) 527-533
Chang, T. (1986): Spherical regression, Annals of Statistics 14 (1986) 907-924
Chang, T. (1988): Estimating the relative rotations of two tectonic plates from boundary
crossings, J. American Statist. Assoc. 83 (1988) 1178-1183
Chang, T. (1993): Spherical regression and the statistics of tectonic plate reconstructions,
International Statist. Rev. 61 (1993) 299-316
Chapman, D.G. and H. Robbins (1951): Minimum variance estimation without regularity
assumptions, Ann. Math. Statist. 22 (1951) 581-586
Chartres, B.A. (1963): A geometrical proof of a theorem due to Slepian, SIAM Review 5
(1963) 335-341
Chatfield, C. and A.J. Collins (1981): Introduction to multivariate analysis, Chapman and
Hall, Boca Raton 1981
Chatterjee, S. and A.S. Hadi (1988): Sensitivity analysis in linear regression, J. Wiley,
New York 1988
Chatterjee, S. and M. Mächler (1997): Robust regression: a weighted least-squares ap-
proach, Commun. Statist. Theor. Meth. 26 (1997) 1381-1394
Chaturvedi, A. and A.T.K. Wan (1998): Stein-rule estimation in a dynamic linear model,
J. Appl. Stat. Science 7 (1998) 17-25
Chaturvedi, A. and A.T.K. Wan (2001): Stein-rule restricted regression estimator in a
linear regression model with nonspherical disturbances, Commun. Statist.-Theory
Meth. 30 (2001) 55-68
Chaturvedi, A. and A.T.K. Wan (1999): Estimation of regression coefficients subject to
interval constraints in models with non-spherical errors, Indian J. Statistics 61 (1999)
433-442
Chauby, Y.P. (1980): Minimum norm quadratic estimators of variance components,
Metrika 27 (1980) 255-262
Chen, C. (2003): Robust tools in SAS, Developments in Robust Statistics, pp. 125-133,
Physica Verlag, Heidelberg 2003
Chen, H.-C. (1998): Generalized reflexive matrices: Special properties and applications,
Society for Industrial and Applied Mathematics, 9 (1998) 141-153
Chen, R.-B. and M.-N.L., Huang (2000): Exact D-optimal designs for weighted polyno-
mial regression model, Computational Statistics & Data Analysis 33 (2000) 137-149
Chen, X. (2001): On maxima of dual function of the CDT subproblem, J. Comput.
Mathematics 19 (2001) 113-124
Chen, Z. and J. Mi (1996): Confidence interval for the mean of the exponential distribu-
tion, based on grouped data, IEEE Transactions on Reliability 45 (1996) 671-677
Cheng, C.L. (1998): Polynomial regression with errors in variables, J. Royal Statistical
Soc. B60 (1998) 189-199
Cheng, C.L. and J.W. van Ness (1999): Statistical regression with measurement error,
Arnold Publ., London 1999
Cheng, C.-S. (1996): Optimal design: exact theory, Handbook of Statistics 13 (1996) 977-
1006
Cherrie, J.B., Beatson, R.K. and G.N. Newsam (2002): Fast evaluation of radial basis
functions: methods for generalized multiquadrics in R^N, SIAM J. Sci. Comput. 23
(2002) 1459-1571
Chiang, C.-Y. (1998): Invariant parameters of measurement scales, British J. Mathematical
and Statistical Psychology 51 (1998) 89-99
Chiang, C.L. (2003): Statistical methods of analysis, University of California, Berkeley,
USA 2003
Chikuse, Y. (1999): Procrustes analysis on some special manifolds, Commun. Statist.
Theory Meth. 28 (1999) 885-903
Chilès, J.P. and P. Delfiner (1999): Geostatistics - modelling spatial uncertainty, J. Wiley,
New York 1999
Chiodi, M. (1986): Procedures for generating pseudo-random numbers from a normal distribution of order p, Riv. Stat. Applic. 19 (1986) 7-26
Chmielewski, M.A. (1981): Elliptically symmetric distributions: a review and bibliogra-
phy, International Statistical Review 49 (1981) 67-74
Chow, T.L. (2000): Mathematical methods for physicists, Cambridge University Press,
Cambridge 2000
Chow, Y.S. and H. Teicher (1978): Probability theory, Springer Verlag, New York 1978
Christensen, R. (1996): Analysis of variance, design and regression, Chapman and Hall,
Boca Raton 1996
Chu, M.T. and N.T. Trendafilov (1998): Orthomax rotation problem. A differential equa-
tion approach, Behaviormetrika 25 (1998) 13-23
Chui, C.K. and G. Chen (1989): Linear Systems and optimal control, Springer Verlag,
New York 1989
Chui, C.K. and G. Chen (1991): Kalman filtering with real time applications, Springer
Verlag, New York 1991
Clarke, G.P.Y. (1980): Moments of the least squares estimators in a nonlinear regression model, J. Royal Statist. Soc. B42 (1980) 227-237
Clerc-Bérod, A. and S. Morgenthaler (1997): A close look at the hat matrix, Student 2
(1997) 1-12
Cobb, G.W. (1997): Introduction to design and analysis of experiments, Springer Verlag,
New York 1997
Cobb, L., Koppstein, P. and N.H. Chen (1983): Estimation and moment recursions rela-
tions for multimodal distributions of the exponential family, J. American Statist.
Assoc. 78 (1983) 124-130
Cochran, W. (1972a): Some effects of errors of measurement on linear regression, in:
Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Prob-
ability, pages 527-539, UCP, Berkeley 1972
Cochran, W. (1972b): Stichprobenverfahren, de Gruyter, Berlin 1972
Cohen, A. (1966): All admissible linear estimates of the mean vector, Ann. Math. Statist.
37 (1966) 458-463
Cohen, C. and A. Ben-Israel (1969): On the computation of canonical correlations, Ca-
hiers Centre Études Recherche Opér 11 (1969) 121-132
Collett, D. (1992): Modelling binary data, Chapman and Hall, Boca Raton 1992
Collett, D. and T. Lewis (1981): Discriminating between the von Mises and wrapped
normal distributions, Austr. J. Statist. 23 (1981) 73-79
Colton, D., Coyle, J. and P. Monk (2000): Recent developments in inverse acoustic scat-
tering theory, SIAM Review 42 (2000) 369-414
Cook, R.D., Tsai, C.L. and B.C. Wei (1986): Bias in nonlinear regression, Biometrika 73
(1986) 615-623
Cook, R.D. and S. Weisberg (1982): Residuals and influence in regression, Chapman and
Hall, London 1982
Cottle, R.W. (1974): Manifestations of the Schur complement, Linear Algebra Appl. 8
(1974) 189-211
Cox, A.J. and N.J. Higham (1999): Row-wise backward stable elimination methods for
the equality constrained least squares problem, SIAM J. Matrix Anal. Appl. 21 (1999)
313-326
Cox, D., Little, J. and D. O’Shea (1992): Ideals varieties and algorithms, Springer Verlag,
New York 1992
Cox, D.R. and D.V. Hinkley (1979): Theoretical statistics, Chapman and Hall, Boca
Raton 1979
Cox, D.R. and V. Isham (1980): Point processes, Chapman and Hall, Boca Raton 1980
Cox, D.R. and E.J. Snell (1989): Analysis of binary data, Chapman and Hall, Boca Raton
1989
Cox, D.R. and N. Reid (2000): The theory of the design of experiments, Chapman & Hall,
Boca Raton 2000
Cox, D.R. and N. Wermuth (1996): Multivariate dependencies, Chapman and Hall, Boca
Raton 1996
Cox, T.F. and M.A.A. Cox (2001): Multidimensional scaling, Chapman and Hall, Boca
Raton, Florida 2001
Cox, D.R. and P.J. Salomon (2003): Components of variance, Chapman & Hall/CRC,
Boca Raton – London – New York – Washington D.C. 2003
Craig, A.T. (1943): Note on the independence of certain quadratic forms, The Annals of
Mathematical Statistics 14 (1943) 195-197
Cross, P.A. (1985): Numerical methods in network design, in: Grafarend, E.W. and F.
Sanso (eds.), Optimization and design of geodetic networks, pp. 429-435, Springer,
Berlin - Heidelberg - New York 1985
Crowder, M.J. (1987): On linear and quadratic estimating function, Biometrika 74 (1987)
591-597
Crowder, M.J. and D.J. Hand (1990): Analysis of repeated measures, Chapman and Hall,
Boca Raton 1990
Crowder, M.J., Sweeting, T. and R. Smith (1994): Statistical analysis of reliability data,
Chapman and Hall, Boca Raton 1994
Crowder, M.J., Kimber, A., Sweeting, T., and R. Smith (1993): Statistical analysis of
reliability data, Chapman and Hall, Boca Raton 1993
Csörgö, M. and L. Horvath (1993): Weighted approximations in probability and statistics,
J. Wiley, Chichester 1993
Csörgö, S. and L. Viharos (1997): Asymptotic normality of least-squares estimators of tail
indices, Bernoulli 3 (1997) 351-370
Csörgö, S. and J. Mielniczuk (1999): Random-design regression under long-range de-
pendent errors, Bernoulli 5 (1999) 209-224
Cummins, D. and A.C. Webster (1995): Iteratively reweighted partial least-squares: a
performance analysis by Monte Carlo simulation, J. Chemometrics 9 (1995) 489-507
Cunningham, E. (1910): The principle of relativity in electrodynamics and an extension
thereof, Proc. London Math. Soc. 8 (1910) 77-98
Czuber, E. (1891): Theorie der Beobachtungsfehler, Leipzig 1891
D'Agostino, R. and M.A. Stephens (1986): Goodness-of-fit techniques, Marcel Dekker,
New York 1986
Daniel, J.W. (1967): The conjugate gradient method for linear and nonlinear operator
equations, SIAM J. Numer. Anal. 4 (1967) 10-26
Dantzig, G.B. (1940): On the non-existence of tests of 'Student's' hypothesis having power functions independent of σ, Ann. Math. Statist. 11 (1940) 186-192
Das, I. (1996): Normal-boundary intersection: A new method for generating the Pareto
surface in nonlinear multicriteria optimization problems, SIAM J. Optim. 3 (1998)
631ff.
Das, R. and B.K. Sinha (1987): Robust optimum invariant unbiased tests for variance
components. In Proc. of the Second International Tampere Conference in Statistics. T.
Pukkila and S. Puntanen (eds.), University of Tampere - Finland (1987) 317-342
Das Gupta, S. (1980): Distribution of the correlation coefficient, in: Fienberg, S., Gani, J.,
Kiefer, J. and K. Krickeberg (eds): Lecture notes in statistics, pp. 9-16, Springer Ver-
lag 1980
Das Gupta, S., Mitra, S.K., Rao, P.S., Ghosh, J.K., Mukhopadhyay, A.C. and Y.R. Sarma
(1994a): Selected papers of C.R.Rao, vol. 1, John Wiley, New York 1994
Das Gupta, S., Mitra, S.K., Rao, P.S., Ghosh, J.K., Mukhopadhyay, A.C. and Y.R. Sarma
(1994b): Selected papers of C.R.Rao, vol. 2, John Wiley, New York 1994
David, F.N. and N.L. Johnson (1948): The probability integral transformation when pa-
rameters are estimated from the sample, Biometrika 35 (1948) 182-190
David, F.N. (1954): Tables of the ordinates and probability integral of the distribution of
the correlation coefficient in small samples, Cambridge University Press, London
1954
David, H.A. (1957): Some notes on the statistical papers of Friedrich Robert Helmert
(1843-1917), Bull. Stat. Soc. NSW 19 (1957) 25-28
David, H.A. (1970): Order Statistics, J. Wiley, New York 1970
David, H.A., Hartley, H.O. and E.S. Pearson (1954): The distribution of the ratio in a
single normal sample, of range to standard deviation, Biometrika 41 (1954) 482-493
Davidian, M. and A.R. Gallant (1993): The nonlinear mixed effects model with a smooth
random effects density, Biometrika 80 (1993) 475-488
Davidian, M., and D.M. Giltinan (1995): Nonlinear models for repeated measurement
data, Chapman and Hall, Boca Raton 1995
Davis, R.A. (1997): M-estimation for linear regression with infinite variance, Probability
and Mathematical Statistics 17 (1997) 1-20
Davis, R. A. and W.T.M. Dunsmuir (1997): Least absolute deviation estimation for re-
gression with ARMA errors, J. theor. Prob. 10 (1997) 481-497
Davis, J.H. (2002): Foundations of deterministic and stochastic control, Birkhäuser, Bos-
ton-Basel-Berlin 2002
Davison, A.C. and D.V. Hinkley (1997): Bootstrap methods and their application, Cam-
bridge University Press, Cambridge 1997
Dax, A. (1997): An elementary proof of Farkas’ lemma, SIAM Rev. 39 (1997) 503-507
Decreusefond, L. and A.S. Üstünel (1999): Stochastic analysis of the fractional Brownian
motion, Potential Anal. 10 (1999) 177-214
Dedekind, R. (1901): Gauß in seiner Vorlesung über die Methode der kleinsten Quadrate,
Berlin 1901
Defant, A. and K. Floret (1993): Tensor norms and operator ideals, North Holland, Am-
sterdam 1993
Deitmar, A. (2002): A first course in harmonic analysis, Springer Verlag, New York 2002
Demidenko, E. (2000): Is this the least squares estimate?, Biometrika 87 (2000) 437-452
Denham, M.C. (1997): Prediction intervals in partial least-squares, J. Chemometrics 11
(1997) 39-52
Denham, W. and S. Pines (1966): Sequential estimation when measurement function
nonlinearity is comparable to measurement error, AIAA J. 4 (1966) 1071-1076
Denis, J.-B. and A. Pazman (1999): Bias of LS estimators in nonlinear regression models
with constraints. Part II: Biadditive models, Applications of Mathematics 44 (1999)
375-403
Denison, D.G.T., Walden, A.T., Balogh, A. and R.J. Forsyth (1999): Multilayer testing of
spectral lines and the detection of the solar rotation frequency and its harmonics,
Appl. Statist. 48 (1999) 427-439
Dermanis, A. (1977): Geodetic linear estimation techniques and the norm choice problem,
Manuscripta Geodaetica 2 (1977) 15-97
Dermanis, A. (1978): Adjustment of geodetic observations in the presence of signals,
International School of Advanced Geodesy, Erice, Sicily, May-June 1978, Bollettino
di Geodesia e Scienze Affini 38 (1979) 513-539
Dermanis, A. (1998): Generalized inverses of nonlinear mappings and the nonlinear geo-
detic datum problem, J. Geodesy 72 (1998) 71-100
Dermanis, A. and E.W. Grafarend (1981): Estimability analysis of geodetic, astrometric
and geodynamical quantities in Very Long Baseline Interferometry, Geoph. J. R. As-
tronom. Soc. 64 (1981) 31-56
Dermanis, A. and F. Sanso (1995): Nonlinear estimation problems for nonlinear models,
Manuscripta Geodaetica 20 (1995) 110-122
Dermanis, A. and R. Rummel (2000a): Parameter estimation as an inverse problem, Lec-
ture Notes in Earth Sciences 95 (2000) 24-47
Dermanis, A. and R. Rummel (2000b): The statistical approach to parameter determina-
tion: Estimation and prediction, Lecture Notes in Earth Sciences 95 (2000) 53-73
Dermanis, A. and R. Rummel (2000c): From finite to infinite-dimensional models (or
from discrete to continuous models), Lecture Notes in Earth Sciences 95 (2000) 53-73
Dermanis, A. and R. Rummel (2000d): Data analysis methods in geodesy, Lecture Notes
in Earth Sciences 95, Springer 2000
De Santis, A. (1991): Translated origin spherical cap harmonic analysis, Geoph. J. Int. 106
(1991) 253-263
Dette, H. (1993): A note on E-optimal designs for weighted polynomial regression, Ann.
Stat. 21 (1993) 767-771
Dette, H. (1997a): Designing experiments with respect to 'standardized' optimality crite-
ria, J.R. Statist. Soc. B 59 (1997) 97-110
Dette, H. (1997b): E-optimal designs for regression models with quantitative factors – a
reasonable choice?, The Canadian J. Statistics 25 (1997) 531-543
Dette, H. and W. J. Studden (1993): Geometry of E-optimality, Ann. Stat. 21 (1993) 416-
433
Dette, H. and W. J. Studden (1997): The theory of canonical moments with applications in
statistics, probability, and analysis, J. Wiley, New York 1997
Dette, H. and T.E. O'Brien (1999): Optimality criteria for regression models based on
predicted variance, Biometrika 86 (1999) 93-106
Deutsch, F. (2001): Best approximation in inner product spaces, Springer Verlag, New
York 2001
Devidas, M. and E.O. George (1999): Monotonic algorithms for maximum likelihood
estimation in generalized linear models, The Indian J.Statistics 61 (1999) 382-396
Dewess, G. (1973): Zur Anwendung der Schätzmethode MINQUE auf Probleme der
Prozeßbilanzierung, Math. Operationsforschg. Statistik 4 (1973) 299-313
DiCiccio, T.J. and B. Efron (1996): Bootstrap confidence intervals, Statistical Science 11
(1996) 189-228
Diebolt, J. and J. Zuber (1999): Goodness-of-fit tests for nonlinear heteroscedastic regres-
sion models, Statistics & Probability Letters 42 (1999) 53-60
References 675
Dieck, T. (1987): Transformation groups, W de Gruyter, Berlin - New York 1987
Diggle, P.J., Liang, K.Y. and S.L. Zeger (1994): Analysis of longitudinal data, Clarendon
Press, Oxford 1994
Ding, C.G. (1999): An efficient algorithm for computing quantiles of the noncentral chi-
squared distribution, Computational Statistics & Data Analysis 29 (1999) 253-259
Dixon, W.J. (1951): Ratios involving extreme values, Ann. Math. Statistics 22 (1951) 68-
78
Dobson, A.J. (1990): An introduction to generalized linear models, Chapman and Hall,
Boca Raton 1990
Dobson, A.J. (2002): An introduction to generalized linear models, 2nd ed., Chapman -
Hall - CRC, Boca Raton 2002
Dodge, Y. (1987): Statistical data analysis based on the L1-norm and related methods,
Elsevier, Amsterdam 1987
Dodge, Y. (1997): LAD Regression for Detecting Outliers in Response and Explanatory
Variables, J. Multivariate Analysis 61 (1997) 144-158
Dodge, Y. and A.S. Hadi (1999): Simple graphs and bounds for the elements of the hat
matrix, J. Applied Statistics 26 (1999) 817-823
Dodge, Y. and D. Majumdar (1979): An algorithm for finding least square generalized
inverses for classification models with arbitrary patterns, J. Statist. Comput. Simul. 9
(1979) 1-17
Dodge, Y. and J. Jurecková (1997): Adaptive choice of trimming proportion in trimmed
least-squares estimation, Statistics & Probability Letters 33 (1997) 167-176
Donoho, D.L. and P.J. Huber (1983): The notion of breakdown point, Festschrift für Erich
L. Lehmann, eds. P.J. Bickel, K.A. Doksum and J.L. Hodges, Wadsworth, Belmont,
Calif. 157-184, 1983
Dorea, C.C.Y. (1997): L1-convergence of a class of algorithms for global optimization,
Student 2 (1997)
Downs, T.D. and A.L. Gould (1967): Some relationships between the normal and von
Mises distributions, Biometrika 54 (1967) 684-687
Dragan, V. and A. Halanay (1999): Stabilization of linear systems, Birkhäuser Boston –
Basel – Berlin 1999
Draper, N.R. and R. C. van Nostrand (1979): Ridge regression and James-Stein estima-
tion: review and comments, Technometrics 21 (1979) 451-466
Draper, N.R. and J.A. John (1981): Influential observations and outliers in regression,
Technometrics 23 (1981) 21-26
Draper, N.R. and F. Pukelsheim (1996): An overview of design of experiments, Statistical
Papers 37 (1996) 1-32
Draper, N.R. and F. Pukelsheim (2000): Ridge analysis of mixture response surfaces,
Statistics & Probability Letters 48 (2000) 131-140
Driscoll, M.F. (1999): An improved result relating quadratic forms and Chi-Square Dis-
tributions, The American Statistician 53 (1999) 273-275
Driscoll, M.F. and B. Krasnicka (1995): An accessible proof of Craig’s theorem in the
general case, The American Statistician 49 (1995) 59-62
Droge, B. (1998): Minimax regret analysis of orthogonal series regression estimation:
selection versus shrinkage, Biometrika 85 (1998) 631-643
Drygas, H. (1975): Estimation and prediction for linear models in general spaces, Math.
Operationsforsch. Statistik 6 (1975) 301-324
Drygas, H. (1983): Sufficiency and completeness in the general Gauss-Markov model,
Sankhya Ser. A 45 (1983) 88-98
Du, Z. and D.P. Wiens (2000): Jackknifing, weighting, diagnostics and variance estima-
tion in generalized M-estimation, Statistics & Probability Letters 46 (2000) 287-299
Duan, J.C. (1997): Augmented GARCH (p,q) process and its diffusion limit, J. Economet-
rics 79 (1997) 97-127
Duncan, W.J. (1944): Some devices for the solution of large sets of simultaneous linear
equations, London, Edinburgh and Dublin Philosophical Magazine and J.Science (7th
series) 35 (1944) 660-670
Dufour, J.M. (1986): Bias of s2 in linear regression with dependent errors, The American
Statistician 40 (1986) 284-285
Dunnett, C.W. and M. Sobel (1954): A bivariate generalization of Student’s t-distribution,
with tables for certain special cases, Biometrika 41 (1954) 153-169
Dupuis, D.J. and C.A. Field (1998): A comparison of confidence intervals for generalized
extreme-value distributions, J. Statist. Comput. Simul. 61 (1998) 341-360
Durand, D. and J.A. Greenwood (1957): Random unit vectors II: usefulness of Gram-
Charlier and related series in approximating distributions, Ann. Math. Statist. 28
(1957) 978-986
Durbin, J. and G.S. Watson (1950): Testing for serial correlation in least squares regres-
sion, Biometrika 37 (1950) 409-428
Durbin, J. and G.S. Watson (1951): Testing for serial correlation in least squares regres-
sion II, Biometrika 38 (1951) 159-177
D’Urso, P. and T. Gastaldi (2000): A least-squares approach to fuzzy linear regression
analysis, Computational Statistics & Data Analysis 34 (2000) 427-440
Dyn, N., Leviatan, D., Levin, D. and A. Pinkus (2001): Multivariate approximation and
applications, Cambridge University Press, Cambridge 2001
Ecker, E. (1977): Ausgleichung nach der Methode der kleinsten Quadrate, Öst. Z. Ver-
messungswesen 64 (1977) 41-53
Eckert, M. (1935): Eine neue flächentreue (azimutale) Erdkarte, Petermann’s Mitteilungen
81 (1935) 190-192
Eckart, C. and G. Young (1939): A principal axis transformation for non-Hermitian
matrices, Bull. Amer. Math. Soc. 45 (1939) 118-121
Eckl, M.C., Snay, R.A., Soler, T., Cline, M.W. and G.L. Mader (2001): Accuracy of
GPS-derived positions as a function of interstation distance and observing-session du-
ration, J. Geodesy 75 (2001) 633-640
Edelman, A. (1989): Eigenvalues and condition numbers of random matrices, PhD disser-
tation, Massachusetts Institute of Technology 1989
Edelman, A., Arias, T.A. and Smith, S.T. (1998): The geometry of algorithms with or-
thogonality constraints, SIAM J. Matrix Anal. Appl. 20 (1998) 303-353
Edelman, A., Elmroth, E. and B. Kagström (1997): A geometric approach to perturbation
theory of matrices and matrix pencils. Part I: Versal deformations, SIAM J. Matrix
Anal. Appl. 18 (1997) 653-692
Edgar, G.A. (1998): Integral, probability, and fractal measures, Springer Verlag, New
York 1998
Edgeworth, F.Y. (1883): The law of error, Philosophical Magazine 16 (1883) 300-309
Edlund, O., Ekblom, H. and K. Madsen (1997): Algorithms for non-linear M-estimation,
Computational Statistics 12 (1997) 373-383
Eeg, J. and T. Krarup (1973): Integrated geodesy, Danish Geodetic Institute, Report No.
7, Copenhagen 1973
Effros, E.G. (1997): Dimensions and C* algebras, Regional Conference Series in Mathe-
matics 46, Rhode Island 1997
Efromovich, S. (2000): Can adaptive estimators for Fourier series be of interest to wave-
lets?, Bernoulli 6 (2000) 699-708
Efron, B. and R.J. Tibshirani (1994): An introduction to the bootstrap, Chapman and Hall,
Boca Raton 1994
Ekblom, H. and S. Henriksson (1969): Lp-criteria for the estimation of location parame-
ters, SIAM J. Appl. Math. 17 (1969) 1130-1141
Elden, L. (1977): Algorithms for the regularization of ill-conditioned least squares prob-
lems, BIT 17 (1977) 134-145
Elhay, S., Golub, G.H. and J. Kautsky (1991): Updating and downdating of orthogonal
polynomials with data fitting applications, SIAM J. Matrix Anal. Appl. 12 (1991)
327-353
Elian, S.N. (2000): Simple forms of the best linear unbiased predictor in the general linear
regression model, American Statistician 54 (2000) 25-28
Ellis, R.L. and I. Gohberg (2003): Orthogonal systems and convolution operators, Birk-
häuser Verlag, Basel-Boston-Berlin 2003
Ellenberg, J.H. (1973): The joint distribution of the standardized least squares residuals
from a general linear regression, J. the American Statistical Association 68 (1973)
941-943
Elpelt, B. (1989): On linear statistical models of commutative quadratic type, Commun.
Statist.-Theory Method 18 (1989) 3407-3450
El-Bassiouni, M.Y. and Seely, J. (1980): Optimal tests for certain functions of the pa-
rameters in a covariance matrix with linear structure, Sankhya A42 (1980) 64-77
Elfving, G. (1952): Optimum allocation in linear regression theory, Ann. Math. Stat. 23
(1952) 255-263
El-Sayed, S.M. (1996): The sampling distribution of ridge parameter estimator, Egyptian
Statistical Journal, ISSR – Cairo University 40 (1996) 211-219
Engel, J. and A. Kneip (1995): Model estimation in nonlinear regression, Lecture Notes in
Statistics 104 (1995) 99-107
Engl, H.W., Hanke, M. and A. Neubauer (1996): Regularization of inverse problems,
Kluwer Academic Publishers, Dordrecht 1996
Engl, H.W., Louis, A.K. and W. Rundell (1997): Inverse problems in geophysical applica-
tions, SIAM, Philadelphia 1997
Engler, K., Grafarend, E.W., Teunissen, P. and J. Zaiser (1982): Test computations of
three-dimensional geodetic networks with observables in geometry and gravity space,
Proceedings of the International Symposium on Geodetic Networks and Computa-
tions. Vol. VII, 119-141. Report B 258/VII. Deutsche Geodätische Kommission, Bay-
erische Akademie der Wissenschaften, München 1982.
Ernst, M.D. (1998): A multivariate generalized Laplace distribution, Computational Sta-
tistics 13 (1998) 227-232
Eubank, R.L. and P. Speckman (1991): Convergence rates for trigonometric and polyno-
mial-trigonometric regression estimators, Statistical & Probability Letters 11 (1991)
119-124
Euler, N. and W.H. Steeb (1992): Continuous symmetry, Lie algebras and differential
equations, B.I. Wissenschaftsverlag, Mannheim 1992
Even-Tzur, G. (1998): Application of the set covering problem to GPS measurements,
Surveying and Land Information Systems 58 (1998) 25-29
Even-Tzur, G. (1999): Reliability designs and control of geodetic networks, Z. Vermes-
sungswesen 4 (1999) 128-134
Even-Tzur, G. (2001): Graph theory application to GPS networks, GPS Solution 5 (2001)
31-38
Everitt, B.S. (1987): Introduction to optimization methods and their application in statis-
tics, Chapman and Hall, London 1987
Fagnani, F. and L. Pandolfi (2002): A singular perturbation approach to a recursive de-
convolution problem, SIAM J. Control Optim 40 (2002) 1384-1405
Fahrmeir, L. and G. Tutz (2001): Multivariate statistical modelling based on generalized
linear models, Springer Verlag, New York 2001
Fakeev, A.G. (1981): A class of iterative processes for solving degenerate systems of
linear algebraic equations, USSR. Comp. Maths. Math. Phys. 21 (1981) 15-22
Falk, M., Hüsler, J. and R.D. Reiss (1994): Law of small numbers, extremes and rare
events, Birkhäuser Verlag, Basel 1994
Fan, J. and I. Gijbels (1996): Local polynomial modelling and its applications, Chapman
and Hall, Boca Raton 1996
Fang, K.-T. and Y. Wang (1993): Number-theoretic methods in statistics, Chapman and
Hall, Boca Raton 1993
Fang, K.-T. and Y.-T. Zhang (1990): Generalized multivariate analysis, Science Press
Beijing - Springer Verlag, Bejing - Berlin 1990
Fang, K.-T., Kotz, S. and K.W. Ng (1990): Symmetric multivariate and related distribu-
tions, Chapman and Hall, London 1990
Fang, Z. and D.P. Wiens (2000): Integer-valued, minimax robust designs for estimation
and extrapolation in heteroscedastic, approximately linear models, J. the American
Statistical Association 95 (2000) 807-818
Farahmand, K. (1996): Random polynomials with complex coefficients, Statistics &
Probability Letters 27 (1996) 347-355
Farahmand, K. (1999): On random algebraic polynomials, Proceedings of the American
Math. Soc. 127 (1999) 3339-3344
Farebrother, R.W. (1987): The historical development of the L1 and L∞ estimation proce-
dures, Statistical Data Analysis Based on the L1-Norm and Related Methods, Y.
Dodge (ed.) 1987
Farebrother, R.W. (1988): Linear least squares computations, Dekker, New York 1988
Farebrother, R.W. (1999): Fitting linear relationships, Springer Verlag, New York 1999
Farrel, R.H. (1964): Estimators of a location parameter in the absolutely continuous case,
Ann. Math. Statist. 35 (1964) 949-998
Fassò, A. (1997): On a rank test for autoregressive conditional heteroscedasticity, Student
2 (1997) 85-94
Faulkenberry, G.D. (1973): A method of obtaining prediction intervals, J. Amer. Statist.
Ass. 68 (1973) 433-435
Fausett, D.W. and C.T. Fulton (1994): Large least squares problems involving Kronecker
products, SIAM J. Matrix Anal. Appl. 15 (1994) 219-227
Fedi, M. and G. Florio (2002): A stable downward continuation by using the ISVD
method, Geophys. J. Int. 151 (2002) 146-156
Fedorov, V.V. and P. Hackl (1997): Model-oriented design of experiments, Springer
Verlag, New York 1997
Fedorov, V.V., Montepiedra, G. and C.J. Nachtsheim (1999): Design of experiments for
locally weighted regression, J.Statistical Planning and Inference 81 (1999) 363-382
Feinstein, A.R. (1996): Multivariate analysis, Yale University Press, New Haven 1996
Fengler, M., Freeden, W. and V. Michel (2003): The Kaiserslautern multiscale geopoten-
tial model SWITCH-03 from orbit perturbations of the satellite CHAMP and its com-
parison to the models EGM96, UCPH2002_02_0.5, EIGEN-1s, and EIGEN-2, Geo-
physical Journal International (submitted) 2003
Feuerverger, A. and P. Hall (1998): On statistical inference based on record values, Ex-
tremes 1:2 (1998) 169-190
Fiebig, D.G., Bartels, R. and W. Krämer (1996): The Frisch-Waugh theorem and general-
ized least squares, Econometric Reviews 15 (1996) 431-443
Fierro, R.D. (1996): Perturbation analysis for two-sided (or complete) orthogonal decom-
positions, SIAM J. Matrix Anal. Appl. 17 (1996) 383-400
Fierro, R.D. and J.R. Bunch (1995): Bounding the subspaces from rank revealing two-
sided orthogonal decompositions, SIAM J. Matrix Anal. Appl. 16 (1995) 743-759
Fierro, R.D. and P.C. Hansen (1995): Accuracy of TSVD solutions computed from rank-
revealing decompositions, Numer. Math. 70 (1995) 453-471
Fierro, R.D. and P.C. Hansen (1997): Low-rank revealing UTV decompositions, Numer.
Algorithms 15 (1997) 37-55
Fill, J.A. and D.E. Fishkind (1999): The Moore-Penrose generalized inverse for sums of
matrices, SIAM J. Matrix Anal. Appl. 21 (1999) 629-635
Fisher, N.I. (1993): Statistical analysis of circular data, Cambridge University Press,
Cambridge 1993
Fisher, N.I. (1985): Spherical medians, J. Royal Statistical Society, Series B 47 (1985)
342-348
Fisher, N.I. and A.J. Lee (1983): A correlation coefficient for circular data, Biometrika 70
(1983) 327-332
Fisher, N.I. and A.J. Lee (1986): Correlation coefficients for random variables on a sphere
or hypersphere, Biometrika 73 (1986) 159-164
Fisher, N.I. and P. Hall (1989): Bootstrap confidence regions for directional data, J.
American Statist. Assoc. 84 (1989) 996-1002
Fisher, R.A. (1915): Frequency distribution of the values of the correlation coefficient in
samples from an indefinitely large population, Biometrika 10 (1915) 507-521
Fisher, R.A. (1935): The fiducial argument in statistical inference, Annals of Eugenics 6
(1935) 391-398
Fisher, R.A. (1939): The sampling distribution of some statistics obtained from nonlinear
equations, Ann. Eugen. 9 (1939) 238-249
Fisher, R.A. (1953): Dispersion on a sphere, Pro. Roy. Soc. Lond. A 217 (1953) 295-305
Fisher, R.A. and F. Yates (1942): Statistical tables for biological, agricultural and medical
research, 2nd edition, Oliver and Boyd, Edinburgh 1942
Fisz, M. (1970): Wahrscheinlichkeitsrechnung und mathematische Statistik, WEB deut-
scher Verlag der Wissenschaften, Berlin 1970
Fitzgerald, W.J., Smith, R.L., Walden, A.T. and P.C. Young (2001): Non-linear and non-
stationary signal processing, Cambridge University Press, Cambridge 2001
Fletcher, R. and C.M. Reeves (1964): Function minimization by conjugate gradients,
Comput. J. 7 (1964) 149-154
Flury, B. (1997): A first course in multivariate statistics, Springer Verlag, New York 1997
Focke, J. and G. Dewess (1972): Über die Schätzmethode MINQUE von C.R. Rao und
ihre Verallgemeinerung, Math. Operationsforschg. Statistik 3 (1972) 129-143
Foerstner, W. (1979a): Konvergenzbeschleunigung bei der a posteriori Varianzschätzung,
Z. Vermessungswesen 104 (1979) 149-156
Foerstner, W. (1979b): Ein Verfahren zur Schätzung von Varianz- und Kovarianz- Kom-
ponenten, Allg. Vermessungsnachrichten 86 (1979) 446-453
Foerstner, W. (1983): Reliability and discernability of extended Gauss-Markov models,
in: Seminar – Mathematical models of geodetic/Photogrammetric point determination
with regard to outliers and systematic errors, Ackermann, F.E. (ed), München 1983
Foerstner, W. and B. Moonen (2003): A metric for covariance matrices, in: E.W. Grafar-
end, F. Krumm and V. Schwarze: Geodesy – the Challenge of the 3rd Millenium, pp.
299-309, Springer Verlag, Berlin 2003
Forsgren, A. and W. Murray (1997): Newton methods for large-scale linear inequality-
constrained minimization, Siam J. Optim. 7 (1997) 162-176
Forsgren, A. and G. Sporre (2001): On weighted linear least-squares problems related to
interior methods for convex quadratic programming, SIAM J. Matrix Anal. Appl. 23
(2001) 42-56
Forsythe, A.B. (1972): Robust estimation of straight line regression coefficients by mini-
mizing p-th power deviations, Technometrics 14 (1972) 159-166
Foster, L.V. (2003): Solving rank-deficient and ill-posed problems using UTV and QR
factorizations, SIAM J. Matrix Anal. Appl. 25 (2003) 582-600
Fotiou, A. and D. Rossikopoulos (1993): Adjustment, variance component estimation and
testing with the affine and similarity transformations, Z. Vermessungswesen 118
(1993) 494-503
Foucart, T. (1999): Stability of the inverse correlation matrix. Partial ridge regression,
J.Statistical Planning and Inference 77 (1999) 141-154
Fox, M. and H. Rubin (1964): Admissibility of quantile estimates of a single location
parameter, Ann. Math. Statist. 35 (1964) 1019-1031
Franses, P.H. (1998): Time series models for business and economic forecasting, Cam-
bridge University Press, Cambridge 1998
Fraser, D.A.S. (1963): On sufficiency and the exponential family, J. Roy. Statist. Soc. 25
(1963) 115-123
Fraser, D.A.S. (1968): The structure of inference, J. Wiley, New York 1968
Fraser, D.A.S. and I. Guttman (1956): Tolerance regions, Ann. Math. Statist. 27 (1956)
162-179
Freeman, R.A. and P.V. Kokotovic (1996): Robust nonlinear control design, Birkhäuser
Verlag, Boston 1996
Freiberg, B. (1985): Exact design for regression models with correlated errors, Statistics
16 (1985) 479-484
Freund, P.G.O. (1974): Local scale invariance and gravitation, Annals of Physics 84
(1974) 440-454
Frey, M. and J.C. Kern (1997): The Pitman Closeness of a Class of Scaled Estimators,
The American Statistician, May 1997, Vol. 51 (1997) 151-154
Fristedt, B. and L. Gray (1997): A modern approach to probability theory, Birkhäuser,
Basel 1997
Frobenius, F.G. (1893): Gedächtnisrede auf Leopold Kronecker (1893), Ferdinand Georg
Frobenius, Gesammelte Abhandlungen, ed. J.-P. Serre, Band III, pages 705-724,
Springer Verlag, Berlin 1968
Fujikoshi, Y. (1980): Asymptotic expansions for the distributions of sample roots under
non-normality, Biometrika 67 (1980) 45-51
Fulton, T., Rohrlich, F. and L. Witten (1962): Conformal invariance in physics, Reviews
of Modern Physics 34 (1962) 442-457
Furno, M. (1997): A robust heteroskedasticity consistent covariance matrix estimator,
Statistics 30 (1997) 201-219
Gabor, D. (1946): Theory of communication, J. the Electrical Engineers 93 (1946) 429-
441
Gaffke, N. and B. Heiligers (1996): Approximate designs for polynomial regression:
invariance, admissibility and optimality, Handbook of Statistics 13 (1996) 1149-1199
Galil, Z. (1985): Computing d-optimum weighing designs: Where statistics, combina-
torics, and computation meet, in: Proceedings of the Berkeley Conference in Honor of
Jerzy Neyman and Jack Kiefer, Vol. II, eds. L.M. LeCam and R.A. Olshen,
Wadsworth 1985
Gallant, A.R. (1987): Nonlinear statistical models, John Wiley, New York 1987
Gallavotti, G. (1999): Statistical mechanics: A short treatise, Springer-Verlag, New York
1999
Gander, W. (1981): Least squares with a quadratic constraint, Numer. Math. 36 (1981)
291-307
Gao, S. and T.M.F. Smith (1995): On the nonexistence of a global nonnegative minimum
bias invariant quadratic estimator of variance components, Statistics and Probability
Letters 25 (1995) 117-120
Gao, S. and T.M.F. Smith (1998): A constrained MINQUE estimator of correlated response
variance from unbalanced data in complex surveys, Statistica Sinica 8 (1998) 1175-
1188
Gao, Y., Lahaye, F., Heroux, P., Liao, X., Beck, N. and M. Olynik (2001): Modelling and
estimation of C1-P1 bias in GPS receivers, J. Geodesy 74 (2001) 621-626
Garcia, A.G. (2000): Orthogonal sampling formulas: a unified approach, SIAM Review
42 (2000) 499-512
García-Escudero, L.A., Gordaliza, A. and C. Matrán (1997): k-Medians and trimmed k-
medians, Student 2 (1997) 139-148
Garcia-Ligero, M.J., Hermoso, A. and J. Linares (1998): Least squared estimation for
distributed parameter systems with uncertain observations: Part 1: Linear prediction
and filtering, Applied Stochastic Models and Data Analysis 14 (1998) 11-18
Gassmann, H. (1989): Einführung in die Regelungstechnik, Verlag Harri Deutsch, Frank-
furt am Main 1989
Gather, U. and C. Becker (1997): Outlier identification and robust methods, Handbook of
Statistics 15 (1997) 123-143
Gauss, C.F. (1809): Theoria Motus Corporum Coelestium, Lib. 2, Sec. III, Perthes u.
Besser Publ., 205-224, Hamburg 1809
Gauss, C.F. (1816): Bestimmung der Genauigkeit der Beobachtungen, Z. Astronomie 1
(1816) 185-197
Gautschi, W. (1982): On generating orthogonal polynomials, SIAM Journal on Scientific
and Statistical Computing 3 (1982) 289-317
Gautschi, W. (1985): Orthogonal polynomials - constructive theory and applications, J.
Comput. Appl. Math. 12/13 (1985) 61-76
Gautschi, W. (1997): Numerical analysis - an introduction, Birkhäuser Verlag, Boston-
Basel-Berlin 1997
Gelfand, A.E. and D.K. Dey (1988): Improved estimation of the disturbance variance in a
linear regression model, J. Econometrics 39 (1988) 387-395
Gelman, A., Carlin, J.B., Stern, H.S. and D.B. Rubin (1995): Bayesian data analysis,
Chapman and Hall, London 1995
Genton, M.G. (1998): Asymptotic variance of M-estimators for dependent Gaussian
random variables, Statistics and Probability Lett. 38 (1998) 255-261
Genton, M.G. and Y. Ma (1999): Robustness properties of dispersion estimators, Statistics
& Probability Letters 44 (1999) 343-350
Ghosh, M. and G. Meeden (1978): Admissibility of the MLE of the normal integer mean,
The Indian J.Statistics 40 (1978) 1-10
Ghosh, M., Mukhopadhyay, N. and P.K. Sen (1997): Sequential estimation, Wiley, New
York 1997
Ghosh, S. (1996): Wishart distribution via induction, The American Statistician 50 (1996)
243-246
Ghosh, S. (1999a): Multivariate analysis, design of experiments, and survey sampling,
Marcel Dekker, Basel 1999
Ghosh, S. (ed.)(1999b): Multivariate analysis, design of experiments, and survey sam-
pling, Marcel Dekker, New York 1999
Ghosh, S., Beran, J. and J. Innes (1997): Nonparametric conditional quantile estimation in
the presence of long memory, Student 2 (1997) 109-117
Giacolone, M. (1997): Lp-norm estimation for nonlinear regression models, Student 2
(1997) 119-130
Gil, A. and J. Segura (1998): A code to evaluate prolate and oblate spheroidal harmonics,
Computer Physics Communications 108 (1998) 267-278
Gil, J.A. and R. Romera (1998): On robust partial least squares (PLS) methods, J.
Chemometrics 12 (1998) 365-378
Gilbert, E.G. and C.P. Foo (1990): Computing the distance between general convex ob-
jects in three-dimensional space, IEEE Transactions on Robotics and Automation 6
(1990) 53-61
Gilberg, F., Urfer, F. and L. Edler (1999): Heteroscedastic nonlinear regression models
with random effects and their application to enzyme kinetic data, Biometrical Journal
41 (1999) 543-557
Gilchrist, R. and G. Portides (1995): M-estimation: some remedies, Lectures Notes in
Statistics 104 (1995) 117-124
Gill, P.E., Murray, W. and M.A. Saunders (2002): Snopt: An SQP algorithm for large
scale constrained optimization, Siam J. Optim. 12 (2002) 979-1006
Gille, J.C., Pelegrin, M. and P. Decaulne (1964): Lehrgang der Regelungstechnik, Verlag
Technik, Berlin 1964
Giri, N. (1977): Multivariate statistical inference, Academic Press, New York 1977
Giri, N. (1993): Introduction to probability and statistics, 2nd edition, Marcel Dekker,
New York 1993
Giri, N. (1996a): Multivariate statistical analysis, Marcel Dekker, New York 1996
Giri, N. (1996b): Group invariance in statistical inference, World Scientific, Singapore
1996
Girko, V.L. (1988): Spectral theory of random matrices, Nauka, Moscow 1988
Girko, V.L. (1990): Theory of random determinants, Kluwer Academic Publishers,
Dordrecht 1990
Girko, V.L. and A.K. Gupta (1996): Multivariate elliptically contoured linear models and
some aspects of the theory of random matrices, in: Multidimensional statistical analy-
sis and theory of random matrices, Proceedings of the Sixth Lukacs Symposium, eds.
Gupta, A.K. and V.L. Girko, pages 327-386, VSP, Utrecht 1996
Glatzer, E. (1999): Über Versuchsplanungsalgorithmen bei korrelierten Beobachtungen,
Master's Thesis, Wirtschaftsuniversität Wien
Gleason, J.R. (2000): A note on a proposed student t approximation, Computational Sta-
tistics & Data Analysis 34 (2000) 63-66
Gleick, J. (1987): Chaos, Viking, New York 1987
Gleser, L.J. and I. Olkin (1972): Estimation for a regression model with an unknown
covariance matrix, in: Proceedings of the Sixth Berkeley Symposium on Mathematical
Statistics and Probability, pp. 541-569, eds. L.M. Le Cam, J. Neyman and E.L. Scott,
University of California Press, Berkeley and Los Angeles 1972
Glimm, J. (1960): On a certain class of operator algebras, Trans. American Mathematical
Society 95 (1960) 318-340
Gnedenko, B.V. and A.N. Kolmogorov (1968): Limit distributions for sums of independ-
ent random variables, Addison-Wesley Publ., Reading, Mass. 1968
Gnedin, A.V. (1993): On multivariate extremal processes, J. Multivariate Analysis 46
(1993) 207-213
Gnedin, A.V. (1994): On a best choice problem with dependent criteria, J. Applied Prob-
ability 31 (1994) 221-234
Gneiting, T. (1999): Correlation functions for atmospheric data analysis, Q. J. R. Meteo-
rol. Soc. 125 (1999) 2449-2464
Gnot, S. and A. Michalski (1994): Tests based on admissible estimators in two variance
components models, Statistics 25 (1994) 213-223
Gnot, S. and G. Trenkler (1996): Nonnegative quadratic estimation of the mean squared
errors of minimax estimators in the linear regression model, Acta Applicandae
Mathematicae 43 (1996) 71-80
Goad, C.C. (1996): Single-site GPS models, in: GPS for Geodesy, pp. 219-237, Teunis-
sen, P.J.G. and A. Kleusberg (eds), Berlin 1996
Godambe, V.P. (1991): Estimating Functions, Oxford University Press 1991
Godambe, V.P. (1955): A unified theory of sampling from finite populations, J. Roy.
Statist. Soc. B17 (1955) 268-278
Göbel, M. (1998): A constructive description of SAGBI bases for polynomial invariants
of permutation groups, J. Symbolic Computation 26 (1998) 261-272
Goldberger, A.S. (1962): Best linear unbiased prediction in the generalized linear regres-
sion model, J. Amer. Statist. Ass. 57 (1962) 369-375
Goldie, C.M. and S. Resnick (1989): Records in a partially ordered set, Annals Probability
17 (1989) 678-689
Goldie, C.M. and S. Resnick (1995): Many multivariate records, Stochastic Processes
Appl. 59 (1995) 185-216
Goldie, C.M. and S. Resnick (1996): Ordered independent scattering, Commun. Statist.
Stochastic Models 12 (1996) 523-528
Goldstine, H. (1977): A history of numerical analysis from the 16th through the 19th
century, Springer Verlag, New York 1977
Golshstein, E.G. and N.V. Tretyakov (1996): Modified Lagrangian and monotone maps in
optimization, J. Wiley, New York 1996
Golub, G.H. (1965): Numerical methods for solving linear least squares problems, Numer.
Math. 7 (1965) 206-216
Golub G.H. (1968): Least squares, singular values and matrix approximations, Aplikace
Matematiky 13 (1968) 44-51
Golub, G.H. (1973): Some modified matrix eigenvalue problems, SIAM Review 15
(1973) 318-334
Golub, G.H. and C.F. van Loan (1996): Matrix computations, 3rd edition, John Hopkins
University Press, Baltimore 1996
Golub, G.H. and W. Kahan (1965): Calculating the singular values and pseudo-inverse of
a matrix, SIAM J. Numer. Anal. 2 (1965) 205-224
Golub, G.H. and C. Reinsch (1970): Singular value decomposition and least squares
solutions, Numer. Math. 14 (1970) 403-420
Golub, G.H. and U. von Matt (1991): Quadratically constrained least squares and quad-
ratic problems, Numer. Math. 59 (1991) 561-580
Golub, G.H., Hansen, P.C. and D.P. O'Leary (1999): Tikhonov regularization and total
least squares, SIAM J. Matrix Anal. Appl. 21 (1999) 185-194
Gómez, E., Gómez-Villegas, M.A. and J.M. Marín (1998): A multivariate generalization
of the power exponential family of distributions, Commun. Statist. - Theory Meth. 27
(1998) 589-600
Gonin, R. and A.H. Money (1987a): Outliers in physical processes: L1- or adaptive Lp-
norm estimation?, in: Statistical Data Analysis Based on the L1 Norm and Related
Methods, Dodge Y. (ed), North-Holland 1987
Gonin, R. and A.H. Money (1987b): A review of computational methods for solving the
nonlinear L1 norm estimation problem, in: Statistical data analysis based on the L1
norm and related methods, Ed. Y. Dodge, North Holland 1987
Gonin, R. and A.H. Money (1989): Nonlinear lp-norm estimation, Marcel Dekker, New
York 1989
Goodall, C. (1991): Procrustes methods in the statistical analysis of shape, J. Royal Statis-
tical Society B 53 (1991) 285-339
Goodall, C.R. (1993): Computation using the QR decomposition, C.R. Rao, ed., Handbook
of Statistics 9 (1993) 467-508
Goodman, J.W. (1985): Statistical optics, Wiley, New York 1985
Gordon, A.D. (1997): L1-norm and L2-norm methodology in cluster analysis, Student 2
(1997) 181-193
Gordon, A.D. (1999): Classification, 2nd edition, Chapman and Hall, New York 1999
Gordon, L. and M. Hudson (1977): A characterization of the Von Mises Distribution,
Ann. Statist. 5 (1977) 813-814
Gordonova, V.I. (1973): The validation of algorithms for choosing the regularization
parameter, Zh. vychisl. mat. mat. fiz. 13 (1973) 1328-1332
Gorman, T.W. (2001): Adaptive estimation using weighted least squares, Aust. N. Z. J.
Stat. 43 (2001) 287-297
Gotthardt, E. (1978): Einführung in die Ausgleichungsrechnung, 2. Auflage, Karlsruhe
1978
Gould, A.L. (1969): A regression technique for angular variates, Biometrics 25 (1969)
683-700
Gower, J.C. and G.B. Dijksterhuis (2004): Procrustes Problems, Oxford Statistical Scien-
ce Series 30, Oxford 2004
Grafarend, E.W. (1967a): Bergbaubedingte Deformation und ihr Deformationstensor
Bergbauwissenschaften 14 (1967) 125-132
Grafarend, E.W. (1967b): Allgemeiner Fehlertensor bei a priori und a posteriori Korrela-
tionen, Z. Vermessungswesen 92 (1967) 157-165
Grafarend, E.W. (1969): Helmertsche Fußpunktkurve oder Mohrscher Kreis?, Allg. Ver-
messungsnachrichten 76 (1969) 239-240
Grafarend, E.W. (1970a): Verallgemeinerte Methode der kleinsten Quadrate für zyklische
Variable, Z. Vermessungswesen 4 (1970) 117-121
Grafarend, E.W. (1970b): Die Genauigkeit eines Punktes im mehrdimensionalen Euklidi-
schen Raum, Deutsche Geodätische Kommission bei der Bayerischen Akademie der
Wissenschaften C 153, München 1970
Grafarend, E.W. (1970c): Fehlertheoretische Unschärferelation, Festschrift Professor Dr.-
Ing. Helmut Wolf, 60. Geburtstag, Bonn 1970
Grafarend, E.W. (1971a): Mittlere Punktfehler und Vorwärtseinschneiden, Z. Vermes-
sungswesen 96 (1971) 41-54
Grafarend, E.W. (1971b): Isotropietests von Lotabweichungen Westdeutschlands, Z.
Geophysik 37 (1971) 719-733
References 685
Grafarend, E.W. (1972a): Nichtlineare Prädiktion, Z. Vermessungswesen 97 (1972) 245-
255
Grafarend, E.W. (1972b): Isotropietests von Lotabweichungsverteilungen Westdeutsch-
lands II, Z. Geophysik 38 (1972) 243-255
Grafarend, E.W. (1972c): Genauigkeitsmaße geodätischer Netze, Deutsche Geodätische
Kommission bei der Bayerischen Akademie der Wissenschaften A 73, München 1972
Grafarend, E.W. (1973a): Nichtlokale Gezeitenanalyse, Mitt. Institut für Theoretische
Geodäsie No. 13, Bonn 1973
Grafarend, E.W. (1973b): Optimales Design geodätischer Netze 1 (zus. P. Harland), Deutsche
Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften A
74, München 1973
Grafarend, E.W. (1974): Optimization of geodetic networks, Bollettino di Geodesia e
Scienze Affini 33 (1974) 351-406
Grafarend, E.W. (1975): Second order design of geodetic nets, Z. Vermessungswesen 100
(1975) 158-168
Grafarend, E.W. (1976): Geodetic applications of stochastic processes, Physics of the
Earth and Planetary Interiors 12 (1976) 151-179
Grafarend, E.W. (1978): Operational geodesy, in: Approximation Methods in Geodesy,
eds. H. Moritz and H. Sünkel, pp. 235-284, H. Wichmann Verlag, Karlsruhe 1978
Grafarend, E.W. (1979): Kriterion-Matrizen I - zweidimensional homogene und isotrope
geodätische Netze - Z. Vermessungswesen 104 (1979) 133-149
Grafarend, E.W. (1983): Stochastic models for point manifolds, in: Mathematical models
of geodetic/ photogrammetric point determination with regard to outliers and system-
atic errors, ed. F.E. Ackermann, Report A 98, 29-52, Deutsche Geodätische Kommis-
sion, Bayerische Akademie der Wissenschaften, München 1983
Grafarend, E.W. (1984): Variance-covariance component estimation of Helmert type in
the Gauss-Helmert model, Z. Vermessungswesen 109 (1984) 34-44
Grafarend, E.W. (1985a): Variance-covariance component estimation, theoretical results
and geodetic applications, Statistics and Decision, Supplement Issue No. 2 (1985)
407-447
Grafarend, E.W. (1985b): Criterion matrices of heterogeneously observed threedimen-
sional networks, Manuscripta Geodaetica 10 (1985) 3-22
Grafarend, E.W. (1985c): Criterion matrices for deforming networks, in: Optimization
and Design of Geodetic Networks, E.W. Grafarend and F. Sanso (eds.) pages 363-
428, Springer-Verlag, Berlin-Heidelberg-New York-Tokyo 1985
Grafarend, E.W. (1986): Generating classes of equivalent linear models by nuisance
parameter elimination - applications to GPS observations, Manuscripta Geodaetica 11
(1986) 262-271
Grafarend, E.W. (1989a): Four lectures on special and general relativity, Lecture Notes in
Earth Sciences, F. Sanso and R. Rummel (eds.), Theory of Satellite Geodesy and
Gravity Field Determination, Nr. 25, pages 115-151, Springer Verlag Berlin - Heidel-
berg - New York - London - Paris - Tokyo - Hongkong 1989
Grafarend, E.W. (1989b): Photogrammetrische Positionierung, Festschrift Prof. Dr.-Ing.
Dr. h.c. Friedrich Ackermann zum 60. Geburtstag, Institut für Photogrammetrie, Universität
Stuttgart, Report 14, pages 45-55, Stuttgart 1989.
Grafarend, E.W. (1991a): Relativistic effects in geodesy, Report Special Study Group
4.119, International Association of Geodesy, Contribution to "Geodetic Theory and
Methodology" ed. F. Sanso, 163-175, Politecnico di Milano, Milano/Italy 1991
Grafarend, E.W. (1991b): The Frontiers of Statistical Scientific Theory and Industrial
Applications (Volume II of the Proceedings of ICOSCO-I), American Sciences Press,
pages 405-427, New York 1991
Grafarend, E.W. (1998): Helmut Wolf – das wissenschaftliche Werk - the scientific work,
Heft A 115, Deutsche Geodätische Kommission, Bayerische Akademie der Wissen-
schaften, C.H. Beck’sche Verlagsbuchhandlung, 97 Seiten, München 1998
Grafarend, E.W. (2000): Mixed integer-real valued adjustment (IRA) problems, GPS
Solutions 4 (2000) 31-45
Grafarend, E.W. and J. Awange (2002a): Nonlinear adjustment of GPS observations of
type pseudo-ranges, GPS Solutions 5 (2002) 80-93
Grafarend, E.W. and J. Awange (2002b): Algebraic solution of GPS pseudo-ranging
equations, GPS Solutions 5 (2002) 20-32
Grafarend, E.W. and J. Shan (1997): Estimable quantities in projective networks, Z. Ver-
messungswesen, Part I, 122 (1997) 218-226, Part II, 122 (1997) 323-333
Grafarend, E.W. and J. Shan (2002): GPS Solutions: closed forms, critical and special
configurations of P4P, GPS Solutions 5 (2002) 29-42
Grafarend, E.W. and A. d'Hone (1978): Gewichtsschätzung in geodätischen Netzen,
Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaf-
ten A 88, München 1978
Grafarend, E.W. and B. Richter (1978): Threedimensional geodesy II-the datum problem,
Z. Vermessungswesen 103 (1978) 44-59
Grafarend, E.W. and G. Kampmann (1996): C10(3): The ten parameter conformal group
as a datum transformation in threedimensional Euclidean space, Z. Vermessungswe-
sen 121 (1996) 68-77
Grafarend, E.W. and A. Kleusberg (1980): Expectation and variance component estimation
of multivariate gyrotheodolite observations, I. Allg. Vermessungsnachrichten
87 (1980) 129-137
Grafarend, E.W. and F. Krumm (1985): Continuous networks I, in: Optimization and
Design of Geodetic Networks, E.W. Grafarend and F. Sanso (eds.) pp. 301-341,
Springer-Verlag, Berlin-Heidelberg-New York-Tokyo 1985
Grafarend, E.W. and A. Mader (1989): A graph-theoretical algorithm for detecting con-
figuration defects in triangular geodetic networks, Bulletin Géodésique 63 (1989)
387-394
Grafarend, E.W. and V. Mueller (1985): The critical configuration of satellite networks,
especially of Laser and Doppler type, for planar configurations of terrestrial points,
Manuscripta Geodaetica 10 (1985) 131-152
Grafarend, E.W. and F. Sanso (1985): Optimization and design of geodetic networks,
Springer Verlag, Berlin-Heidelberg-New York-Tokyo 1985
Grafarend, E.W. and B. Schaffrin (1974): Unbiased free net adjustment, Surv. Rev. XXII,
171 (1974) 200-218
Grafarend, E.W. and B. Schaffrin (1976): Equivalence of estimable quantities and invariants
in geodetic networks, Z. Vermessungswesen 101 (1976) 485-491
Grafarend, E.W. and B. Schaffrin (1979): Kriterion-Matrizen I – zweidimensional homogene
und isotrope geodätische Netze – Z. Vermessungswesen 104 (1979), 133-149
Grafarend, E.W. and B. Schaffrin (1982): Kriterion Matrizen II: Zweidimensionale ho-
mogene und isotrope geodätische Netze, Teil II a: Relative cartesische Koordinaten,
Z. Vermessungswesen 107 (1982), 183-194, Teil IIb: Absolute cartesische Koordina-
ten, Z. Vermessungswesen 107 (1982) 485-493
Grafarend, E.W. and B. Schaffrin (1988): Von der statistischen zur dynamischen Auffassung
geodätischer Netze, Z. Vermessungswesen 113 (1988) 79-103
Grafarend, E.W. and B. Schaffrin (1989): The geometry of nonlinear adjustment - the
planar trisection problem, Festschrift to Torben Krarup eds. E. Kejlso, K. Poder and
C.C. Tscherning, Geodaetisk Institut, Meddelelse No. 58, pages 149-172, Kobenhavn
1989
Grafarend, E.W. and B. Schaffrin (1991): The planar trisection problem and the impact of
curvature on non-linear least-squares estimation, Comput. Stat. Data Anal. 12 (1991)
187-199
Grafarend, E.W. and B. Schaffrin (1993): Ausgleichungsrechnung in linearen Modellen,
Brockhaus, Mannheim 1993
Grafarend, E.W. and G. Offermanns (1975): Eine Lotabweichungskarte Westdeutschlands
nach einem geodätisch konsistenten Kolmogorov-Wiener Modell, Deutsche Geodätische
Kommission bei der Bayerischen Akademie der Wissenschaften A 82, München
1975
Grafarend, E.W. and P. Xu (1994): Observability analysis of integrated INS/GPS system,
Bollettino di Geodesia e Scienze Affini 103 (1994) 266-284
Grafarend, E.W. and P. Xu (1995): A multi-objective second-order optimal design for
deforming networks, Geoph. Journal Int. 120 (1995) 577-589
Grafarend, E.W., Kleusberg, A. and B. Schaffrin (1980): An introduction to the variance-
covariance- component estimation of Helmert type, Z. Vermessungswesen 105 (1980)
161-180
Grafarend, E.W., Krarup, T. and R. Syffus (1996): An algorithm for the inverse of a
multivariate homogeneous polynomial of degree n, J. Geodesy 70 (1996) 276-286
Grafarend, E.W., Krumm, F. and F. Okeke (1995): Curvilinear geodetic datum transfor-
mations, Z. Vermessungswesen 120 (1995) 334-350
Grafarend, E.W., Knickemeyer, E.H. and B. Schaffrin (1982): Geodätische Datumstrans-
formationen, Z. Vermessungswesen 107 (1982) 15-25
Grafarend, E.W., Krumm, F. and B. Schaffrin (1985): Criterion matrices of heterogene-
ously observed threedimensional networks, Manuscripta Geodaetica 10 (1985) 3-22
Grafarend, E.W., Krumm, F. and B. Schaffrin (1986): Kriterion-Matrizen III: Zweidi-
mensional homogene und isotrope geodätische Netze, Z. Vermessungswesen 111
(1986) 197-207
Grafarend, E.W., Schmitt, G. and B. Schaffrin (1976): Über die Optimierung lokaler
geodätischer Netze (Optimal design of local geodetic networks), 7th course, High pre-
cision Surveying Engineering (7.Int. Kurs für Ingenieurvermessung hoher Präzision)
29 Sept - 8 Oct 1976, Darmstadt 1976
Grafarend, E.W., Mueller, J.J., Papo, H.B. and B. Richter (1979): Concepts for reference
frames in geodesy and geodynamics: the reference directions, Bulletin Géodésique 53
(1979) 195-213
Graham, A. (1981): Kronecker products and matrix calculus, J. Wiley, New York 1981
Gram, J.P. (1883): Über die Entwicklung reeller Funktionen in Reihen mittelst der Me-
thode der kleinsten Quadrate, J. Reine Angew. Math. 94 (1883) 41-73
Granger, C.W.J. and P. Newbold (1986): Forecasting economic time series, 2nd ed., Aca-
demic Press, New York 1986
Granger, C.W.J. and T. Teräsvirta (1993): Modelling nonlinear economic relations, Ox-
ford University Press, New York 1993
Graybill, F.A. (1954): On quadratic estimates of variance components, The Annals of
Mathematical Statistics 25 (1954) 367-372
Graybill, F.A. (1983): Matrices with applications in statistics, 2nd ed., Wadsworth, Belmont
1983
Graybill, F.A. and R.A. Hultquist (1961): Theorems concerning Eisenhart’s model II, The
Annals of Mathematical Statistics 32 (1961) 261-269
Green, B. (1952): The orthogonal approximation of an oblique structure in factor analysis,
Psychometrika 17 (1952) 429-440
Green, P.J. and B.W. Silverman (1993): Nonparametric regression and generalized linear
models, Chapman and Hall, Boca Raton 1993
Greenbaum, A. (1997): Iterative methods for solving linear systems, SIAM, Philadelphia
1997
Greenwood, J.A. and D. Durand (1955): The distribution of length and components of the
sum of n random unit vectors, Ann. Math. Statist. 26 (1955) 233-246
Greenwood, P.E. and G. Hooghiemstra (1991): On the domain of an operator between
supremum and sum, Probability Theory Related Fields 89 (1991) 201-210
Grenander, U. (1981): Abstract inference, Wiley, New York 1981
Griffiths, D.F. and D.J. Higham (1997): Learning LaTeX, SIAM, Philadelphia 1997
Grimstad, A-A. and T. Mannseth (2000): Nonlinearity, scale and sensitivity for parameter
estimation problems, SIAM J. Sci. Comput. 21 (2000) 2096-2113
Grodecki, J. (1999): Generalized maximum-likelihood estimation of variance components
with inverted gamma prior, J. Geodesy 73 (1999) 367-374
Groechenig, K. (2001): Foundations of time-frequency analysis, Birkhäuser Verlag, Boston-Basel-Berlin 2001
Gross, J. (1996a): On a class of estimators in the general Gauss-Markov model, Commun.
Statist. – Theory Meth. 25 (1996) 381-388
Gross, J. (1996b): Estimation using the linear regression model with incomplete ellipsoi-
dal restrictions, Acta Applicandae Mathematicae 43 (1996) 81-85
Gross, J. (1998): Statistical estimation by a linear combination of two given statistics,
Statistics and Probability Lett. 39 (1998) 379-384
Gross, J. and G. Trenkler (1997): When do linear transforms of ordinary least squares and
Gauss-Markov estimator coincide?, Sankhya 59 (1997) 175-178
Gross, J., Trenkler, G. and E.P. Liski (1998): Necessary and sufficient conditions for
superiority of misspecified restricted least squares regression estimator, J. Statist.
Planning and Inference 71 (1998) 109-116
Gross, J., Trenkler, G. and H.J. Werner (2001): The equality of linear transforms of the
ordinary least squares estimator and the best linear unbiased estimator, The Indian J.
Statistics 63 (2001) 118-127
Grossmann, W. (1973): Grundzüge der Ausgleichungsrechnung, Springer-Verlag, Berlin
1973
Grubbs, F.E. (1973): Errors of measurement, precision, accuracy and the statistical com-
parison of measuring instruments, Technometrics 15 (1973) 53-66
Grubbs, F.E. and G. Beck (1972): Extension of sample sizes and percentage points for
significance tests of outlying observations, Technometrics 14 (1972) 847-854
Guenther, W.C. (1964): Another derivation of the non-central Chi-Square distribution, J.
American Statistical Association 59 (1964) 957-960
Guérin, C.-A. (2000): Wavelet analysis and covariance structure of some classes of non-stationary
processes, J. Fourier Analysis and Applications 4 (2000) 403-425
Gui, Q. and J. Zhang (1998): Robust biased estimation and its applications in geodetic
adjustments, J. Geodesy 72 (1998) 430-435
Gui, Q.M. and J.S. Liu (2000): Biased estimation in the Gauss-Markov model, Allg.
Vermessungsnachrichten 107 (2000) 104-108
Gulliksson, M. and P.A. Wedin (2000): The use and properties of Tikhonov filter matri-
ces, SIAM J. Matrix Anal. Appl. 22 (2000) 276-281
Gulliksson, M., Soederkvist, I. and P.A. Wedin (1997): Algorithms for constrained and
weighted nonlinear least-squares, SIAM J. Optim. 7 (1997) 208-224
Gumbel, E.J., Greenwood, J.A. and D. Durand (1953): The circular normal distribution:
theory and tables, J. Amer. Statist. Assoc. 48 (1953) 131-152
Guolin, L. (2000): Nonlinear curvature measures of strength and nonlinear diagnosis,
Allg. Vermessungsnachrichten 107 (2000) 109-111
Guolin, L., Jinyun, G. and T. Huaxue (2000): Two kinds of explicit methods to nonlinear
adjustments of free-networks with rank deficiency, Geomatics Research Australasia
73 (2000) 25-32
Guolin, L., Lianpeng, Z. and J. Tao (2001): Linear Space [L,M] N and the law of general-
ized variance-covariance propagation, Allg. Vermessungsnachrichten 10 (2001) 352-
356
Gupta, A.K. and D.G. Kabe (1997): Linear restrictions and two step multivariate least
squares with applications, Statistics & Probability Letters 32 (1997) 413-416
Gupta, A.K. and D.G. Kabe (1999a): On multivariate Liouville distribution, Metron 57
(1999) 173-179
Gupta, A.K. and D.G. Kabe (1999b): Distributions of Hotelling's T2 and multiple and
partial correlation coefficients for the mixture of two multivariate Gaussian populations,
Statistics 32 (1999) 331-339
Gupta, A.K. and D.K. Nagar (1998): Quadratic forms in disguised matrix T-variate, Sta-
tistics 30 (1998) 357-374
Gupta, S.S. (1963): Probability integrals of multivariate normal and multivariate t, Annals
of Mathematical Statistics 34 (1963) 792-828
Gut, A. (2002): On the moment problem, Bernoulli 8 (2002) 407-421
Guttman, I. (1982): Linear models: An Introduction, J. Wiley & Sons 1982
Guttman, L. (1946): Enlargement methods for computing the inverse matrix, Ann. Math.
Statist. 17 (1946) 336-343
Guu, S.M., Lur, Y.Y. and C.T. Pang (2001): On infinite products of fuzzy matrices, SIAM
J. Matrix Anal. Appl. 22 (2001) 1190-1203
Haantjes, J. (1937): Conformal representations of an n-dimensional Euclidean space with
a non-definite fundamental form on itself, in: Nederl. Akademie van Wetenschappen,
Proc. Section of Sciences, vol. 40, pages 700-705, Noord-Hollandsche Uitgevers-
maatschappij, Amsterdam 1937
Haantjes, J. (1940): Die Gleichberechtigung gleichförmig beschleunigter Beobachter für
die elektromagnetischen Erscheinungen, in: Nederl. Akademie van Wetenschappen,
Proc. Section of Sciences, vol. 43, pages 1288-1299, Noord-Hollandsche Uitgevers-
maatschappij, Amsterdam 1940
Habermann, S.J. (1996): Advanced statistics, volume I: description of populations,
Springer Verlag, New York 1996
Hadamard, J. (1899): Théorème sur les séries entières, Acta Math. 22 (1899) 1-28
Haerdle, W., Liang, H. and J. Gao (2000): Partially linear models, Physica-Verlag, Hei-
delberg 2000
Hager, W.W. (1989): Updating the inverse of a matrix, SIAM Rev. 31 (1989) 221-239
Hager, W.W. (2000): Iterative methods for nearly singular linear systems, SIAM J. Sci.
Comput. 22 (2000) 747-766
Hager, W.W. (2002): Minimizing the profile of a symmetric matrix, SIAM J. Sci. Com-
put. 23 (2002) 1799-1816
Hahn, M. and R. Bill (1984): Ein Vergleich der L1- und L2-Norm am Beispiel Helmerttransformation,
Allg. Vermessungsnachrichten 91 (1984) 441-450
Hahn, W. and P. Weibel (1996): Evolutionäre Symmetrietheorie, Wiss. Verlagsgesell-
schaft, Stuttgart 1996
Haimo, D. (eds) (1967): Orthogonal expansions and their continuous analogues, Southern
Illinois University Press, Carbondale 1967
Haines, G.V. (1985): Spherical cap harmonic analysis, J. Geophysical Research 90 (1985)
2583-2591
Hald, A. (1998): A history of mathematical statistics from 1750 to 1930, J. Wiley, New
York 1998
Hald, A. (2000): The early history of the cumulants and the Gram-Charlier series, Interna-
tional Statistical Review 68 (2000) 137-153
Halmos, P.R. (1946): The theory of unbiased estimation, Ann. Math. Statist. 17 (1946)
34-43
Hammersley, J.M. (1950): On estimating restricted parameters, J.R. Statist. Soc. (B) 12
(1950) 192-
Hampel, F.R. (1973): Robust estimation: a condensed partial survey, Zeitschrift für Wahr-
scheinlichkeitstheorie und verwandte Gebiete 27 (1973) 87-104
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and W.A. Stahel (1986): Robust statis-
tics, J. Wiley, New York 1986
Hanagal, D.D. (1996): UMPU tests for testing symmetry and stress-passing in some
bivariate exponential models, Statistics 28 (1996) 227-239
Hand, D.J. and M.J. Crowder (1996): Practical longitudinal data analysis, Chapman and
Hall, Boca Raton 1996
Hand, D.J., Daly, F., McConway, K., Lunn, D. and E. Ostrowski (1993): Handbook of
small data sets, Chapman and Hall, Boca Raton 1993
Hand, D.J. and C.C. Taylor (1987): Multivariate analysis of variance and repeated meas-
ures, Chapman and Hall, Boca Raton 1987
Handl, A.: Multivariate Analysemethoden. Theorie und Praxis multivariater Verfahren
unter besonderer Berücksichtigung von S-PLUS, Springer-Verlag
Hanke, M. (1991): Accelerated Landweber iterations for the solution of ill-posed equa-
tions, Numer. Math. 60 (1991) 341-375
Hanke, M. and P.C. Hansen (1993): Regularization methods for large-scale problems,
Surveys Math. Indust. 3 (1993) 253-315
Hansen, P.C. (1987): The truncated SVD as a method for regularization, BIT 27 (1987)
534-553
Hansen, P.C. (1990): The discrete Picard condition for discrete ill-posed problems, BIT
30 (1990) 658-672
Hansen, P.C. (1990): Truncated singular value decomposition solutions to discrete ill-
posed problems with ill-determined numerical rank, SIAM J. Sci. Statist. Comput. 11
(1990) 503-518
Hansen, P.C. (1994): Regularization tools: a matlab package for analysis and solution of
discrete ill-posed problems, Numer. Algorithms 6 (1994) 1-35
Hansen, P.C. (1995): Test matrices for regularization methods, SIAM J. Sci. Comput. 16
(1995) 506-512
Hansen, P.C. (1998): Rank-deficient and discrete ILL-posed problems, SIAM, Philadel-
phia 1998
Hardtwig, E. (1968): Fehler- und Ausgleichsrechnung, Bibliographisches Institut, Mannheim
1968
Harley, B.I. (1956): Some properties of an angular transformation for the correlation
coefficient, Biometrika 43 (1956) 219-223
Harter, H.L. (1964): Criteria for best substitute interval estimators with an application to
the normal distribution, J. Amer. Statist. Assoc 59 (1964) 1133-1140
Harter, H.L. (1974/75): The method of least squares and some alternatives (five parts)
International Statistics Review 42 (1974) 147-174, 235-264, 43 (1975) 1-44, 125-190,
269-278
Harter, H.L. (1977): The non-uniqueness of absolute values regression, Commun. Statist.
Simul. Comput. 6 (1977) 829-838
Hartley, H.O. and J.N.K. Rao (1967): Maximum likelihood estimation for the mixed
analysis of variance model, Biometrika 54 (1967) 93-108
Hartman, P. and G.S. Watson (1974): „Normal“ distribution functions on spheres and the
modified Bessel function, Ann. Prob. 2 (1974) 593-607
Hartmann, C., Van Keer Berghen P., Smeyersverbeke, J. and D.L. Massart (1997): Robust
orthogonal regression for the outlier detection when comparing two series of meas-
urement results, Analytica Chimica Acta 344 (1997) 17-28
Hartung, J. (1981): Non-negative minimum biased invariant estimation in variance component
models, Annals of Statistics 9 (1981) 278-292
Hartung, J. (1999): Ordnungserhaltende positive Varianzschätzer bei gepaarten Messun-
gen ohne Wiederholungen, Allg. Statistisches Archiv 83 (1999) 230-247
Hartung, J. and B. Elpelt (1989): Multivariate Statistik, Oldenbourg Verlag, München
1989
Hartung, J. and K.H. Jöckel (1982): Zuverlässigkeits- und Wirtschaftlichkeitsüberlegun-
gen bei Straßenverkehrssignalanlagen, Qualität und Zuverlässigkeit 27 (1982) 65-68
Hartung, J. and D. Kalin (1980): Zur Zuverlässigkeit von Straßenverkehrssignalanlagen,
Qualität und Zuverlässigkeit 25 (1980) 305-308
Hartung, J. and B. Voet (1986): Best invariant unbiased estimators for the mean squared
error of variance component estimators, J. American Statist. Assoc. 81 (1986) 689-
691
Hartung, J. and H.J. Werner (1980): Zur Verwendung der restringierten Moore-Penrose-
Inversen beim Testen von linearen Hypothesen, Z. Angew. Math. Mechanik 60 (1980)
T344-T346
Hartung, J., Elpelt, B. and K.H. Klösener (1995): Statistik, Oldenbourg Verlag, München
1995
Hartung, J. et al (1982): Statistik, R. Oldenbourg Verlag, München 1982
Harvey, A.C. (1993): Time series models, 2nd ed., Harvester Wheatsheaf, New York 1993
Harville, D.A. (1976): Extension of the Gauss-Markov theorem to include the estimation
of random effects, Annals of Statistics 4 (1976) 384-395
Harville, D.A. (1977): Maximum likelihood approaches to variance component estimation
and to related problems, J. American Statistical Association 72 (1977) 320-339
Harville, D.A. (1997): Matrix algebra from a statistician’s perspective, Springer Verlag,
New York 1997
Harville, D.A. (2001): Matrix algebra: exercises and solutions, Springer Verlag, New
York 2001
Hassanein, K.M. and E.F. Brown (1996): Moments of order statistics from the Rayleigh
distribution, J. Statistical Research 30 (1996) 133-152
Hassibi, A. and S. Boyd (1998): Integer parameter estimation in linear models with applications
to GPS, IEEE Trans. on Signal Processing 46 (1998) 2938-2952
Haslett, J. and K. Hayes (1998): Residuals for the linear model with general covariance
structure, J. Royal Statistical Soc. B60 (1998) 201-215
Hastie, T.J. and R.J. Tibshirani (1990): Generalized additive models, Chapman and Hall,
Boca Raton 1990
Hauser, M.A., Pötscher, B.M. and E. Reschenhofer (1999): Measuring persistence in
aggregate output: ARMA models, fractionally integrated ARMA models and non-
parametric procedures, Empirical Economics 24 (1999) 243-269
Hausdorff, F. (1901): Beiträge zur Wahrscheinlichkeitsrechnung, Königlich Sächsische
Gesellschaft der Wissenschaften zu Leipzig, Berichte Math.-Phys. Classe 53 (1901)
152-178
Hawkins, D.M. (1993): The accuracy of elemental set approximation for regression, J.
Amer. Statist. Assoc. 88 (1993) 580-589
Hayes, K. and J. Haslett (1999): Simplifying general least squares, American Statistician
53 (1999) 376-381
He, K. (1995): The robustness of bootstrap estimator of variance, J. Ital. Statist. Soc. 2
(1995) 183-193
He, X. (1991): A local breakdown property of robust tests in linear regression, J. Multi-
var. Analysis 38, 294-305, 1991
He, X., Simpson, D.G. and Portnoy, S.L. (1990): Breakdown robustness of tests, J. Am.
Statis. Assn 85, 446-452, 1990
Healy, D.M. (1998): Spherical Deconvolution, J. Multivariate Analysis 67 (1998) 1-22
Heck, B. (1981): Der Einfluss einzelner Beobachtungen auf das Ergebnis einer Ausglei-
chung und die Suche nach Ausreißern in den Beobachtungen, Allg. Vermessungs-
nachrichten 88 (1981) 17-34
Heideman, M.T., Johnson, D.H. and C.S. Burrus (1984): Gauss and the history of the fast
Fourier transform, IEEE ASSP Magazine 1 (1984) 14-21
Heiligers, B. (1994): E-optimal designs in weighted polynomial regression, Ann. Stat. 22
(1994) 917-929
Heine, V. (1955): Models for two-dimensional stationary stochastic processes, Biometrika
42 (1955) 170-178
Heinrich, L. (1985): Nonuniform estimates, moderate and large deviations in the central
limit theorem for m-dependent random variables, Math. Nachr. 121 (1985) 107-121
Hekimoglu, S. (1998): Application of equiredundancy design to M-estimation,
J. Surveying Engineering 124 (1998) 103-124
Hekimoglu, S. (2005): Do robust methods identify outliers more reliably than conventional
tests for outliers?, Z. Vermessungswesen 3 (2005) 174-180
Hekimoglu, S. and M. Berber (2003): Effectiveness of robust methods in heterogeneous
linear models, J. Geodesy 76 (2003) 706-713
Hekimoglu, S. and K.-R. Koch (2000): How can reliability of the test for outliers be measured?,
Allg. Vermessungsnachrichten 7 (2000) 247-253
Helmert, F.R. (1875): Über die Berechnung des wahrscheinlichen Fehlers aus einer endlichen
Anzahl wahrer Beobachtungsfehler, Z. Math. u. Physik 20 (1875) 300-303
Helmert, F.R. (1876): Diskussion der Beobachtungsfehler in Koppes Vermessung für die
Gotthardtunnelachse, Z. Vermessungswesen 5 (1876) 129-155
Helmert, F.R. (1876a): Die Genauigkeit der Formel von Peters zur Berechnung des wahrscheinlichen
Fehlers direkter Beobachtungen gleicher Genauigkeit, Astron. Nachrichten
88 (1876) 113-132
Helmert, F.R. (1876b): Über die Wahrscheinlichkeit der Potenzsummen der Beobachtungsfehler,
Z. Math. u. Phys. 21 (1876) 192-218
Helmert, F.R. (1907): Die Ausgleichungsrechnung nach der Methode der kleinsten Quad-
rate, mit Anwendungen auf die Geodäsie, die Physik und die Theorie der Messinstru-
mente, B.G. Teubner, Leipzig – Berlin 1907
Henderson, H.V. (1981): The vec-permutation matrix, the vec operator and Kronecker
products: a review, Linear and Multilinear Algebra 9 (1981) 271-288
Henderson, H.V. and S.R. Searle (1981a): Vec and vech operators for matrices, with some
uses in Jacobians and multivariate statistics
Henderson, H.V. and S.R. Searle (1981b): On deriving the inverse of a sum of matrices,
SIAM Review 23 (1981) 53-60
Henderson, H.V., Pukelsheim, F. and S.R. Searle (1983): On the history of the Kronecker
product, Linear and Multilinear Algebra 14 (1983) 113-120
Hendriks, H. and Z. Landsman (1998): Mean location and sample mean location on mani-
folds: Asymptotics, tests, confidence regions, J. Multivar. Analysis 67 (1998) 227-243
Hengst, M. (1967): Einführung in die Mathematische Statistik und ihre Anwendung,
Bibliographisches Institut, Mannheim 1967
Henrici, P. (1962): Bounds for iterates, inverses, spectral variation and fields of values of
non-normal matrices, Numer. Math. 4 (1962) 24-40
Herzberg, A.M. and A.V. Tsukanov (1999): A note on the choice of the best selection
criterion for the optimal regression model, Utilitas Mathematica 55 (1999) 243-254
Hesse, K. (2003): Domain decomposition methods in multiscale geopotential determination
from SST and SGG, Berichte aus der Mathematik, Shaker Verlag, Aachen 2003
Hetherington, T.J. (1981): Analysis of directional data by exponential models, PhD. The-
sis, University of California, Berkeley 1981
Hext, G.R. (1963): The estimation of second-order tensors, with related tests and designs,
Biometrika 50 (1963) 353-373
Heyde, C.C. (1997): Quasi-likelihood and its application. A general approach to optimal
parameter estimation, Springer Verlag, New York 1997
Hickernell, F.J. (1999): Goodness-of-fit statistics, discrepancies and robust designs, Sta-
tistics and Probability Letters 44 (1999) 73-78
Hida, T. and S. Si (2004): An innovation approach to random fields, application of white
noise theory, Probability and Statistics 2004
Higham, N.J. and F. Tisseur (2000): A block algorithm for matrix 1-norm estimation, with
an application to 1-norm pseudospectra, SIAM J. Matrix Anal. Appl. 21 (2000) 1185-
1201
Hinde, J. (1998): Overdispersion: models and estimation, Comput. Stat. & Data Anal. 27
(1998) 151-170
Hinkelmann, K. (ed) (1984): Experimental design, statistical models, and genetic statis-
tics, Marcel Dekker, Inc. 1984
Hinkley, D. (1979): Predictive likelihood, Ann. Statist. 7 (1979) 718-728
Hinkley, D., Reid, N. and E.J. Snell (1990): Statistical theory and modelling, Chapman
and Hall, Boca Raton 1990
Hjorth, J.S.U. (1993): Computer intensive statistical methods, Chapman and Hall, Boca
Raton 1993
Ho, L.L. (1997): Regression models for bivariate counts, Brazilian J. Probability and
Statistics 11 (1997) 175-197
Hoaglin, D.C. and R.E. Welsh (1978): The Hat Matrix in regression and ANOVA, The
American Statistician 32 (1978) 17-22
Hocking, R.R. (1996): Methods and applications of linear models – regression and the
analysis of variance, John Wiley & Sons, Inc. 1996
Hodge, W. and D. Pedoe (1968): Methods of algebraic geometry, I, Cambridge University
Press, Cambridge 1968
Hoel, P.G. (1965): Minimax distance designs in two dimensional regression, Ann. Math.
Statist. 36 (1965) 1097-1106
Hoel, P.G., S.C. Port and C.J. Stone (1972): Introduction to stochastic processes, Hough-
ton Mifflin Publ., Boston 1972
Hoerl, A.E. and R.W. Kennard (2000): Ridge regression: biased estimation for nonor-
thogonal problems, Technometrics 42 (2000) 80-86
Hoepke, W. (1980): Fehlerlehre und Ausgleichungsrechnung, De Gruyter, Berlin 1980
Hoffmann, K. (1992): Improved estimation of distribution parameters: Stein-type estima-
tors, Teubner-Texte zur Mathematik, Stuttgart/Leipzig 1992
Hofmann, B. (1986): Regularization for applied inverse and ill-posed problems, Teubner
Texte zur Mathematik 85, Leipzig 1986
Hogg, R.V. (1972): Adaptive robust procedures: a partial review and some suggestions
for future applications and theory, J. American Statistical Association 43 (1972) 1041-
1067
Hogg, R.V. (1974): Adaptive robust procedures: a partial review and some suggestions
for future applications and theory, J. American Statistical Association 69 (1974) 909-
923
Hogg, R.V. and R.H. Randles (1975): Adaptive distribution free regression methods and
their applications, Technometrics 17 (1975) 399-407
Holota, P. (2001): Variational methods in the representation of the gravitational potential,
Cahiers du Centre Européen de Géodynamique et de Séismologie 2001
Holota, P. (2002): Green’s function and external masses in the solution of geodetic
boundary-value problems, Presented at the 3rd Meeting of the IAG Intl. Gravity and
Geoid Commission, Thessaloniki, Greece, August 26-30, 2002
Holschneider, M. (2000): Introduction to continuous wavelet analysis, in: Klees, R. and R.
Haagmans (eds): Wavelets in the geosciences, Springer 2000
Hong, C.S. and H.J. Choi (1997): On L1 regression coefficients, Commun. Statist. Simul.
Comp. 26 (1997) 531-537
Hora, R.B. and R.J. Buehler (1965): Fiducial theory and invariant estimation, Ann. Math.
Statist. 37 (1965) 643-656
Horn, R.A. (1989): The Hadamard product, in Matrix Theory and Applications, C.R.
Johnson, ed., Proc. Sympos. Appl. Math. 40 (1989) 87-169
Horn, R.A. and C.R. Johnson (1990): Matrix analysis, Cambridge University Press, Cam-
bridge 1990
Horn, R.A. and C.R. Johnson (1991): Topics on Matrix analysis, Cambridge University
Press, Cambridge 1991
Hornoch, A.T. (1950): Über die Zurückführung der Methode der kleinsten Quadrate auf
das Prinzip des arithmetischen Mittels, Österr. Z. Vermessungswesen 38 (1950) 13-18
Hosking, J.R.M. and J.R. Wallis (1997): Regional frequency analysis. An approach based
on L-moments, Cambridge University Press 1997
Hosoda, Y. (1999): Truncated least-squares least norm solutions by applying the QR
decomposition twice, Trans. Inform. Process. Soc. Japan 40 (1999) 1051-1055
Hotelling, H. (1953): New light on the correlation coefficient and its transform, J. Royal
Stat. Society, Series B, 15 (1953) 225-232
Hoyle, M.H. (1973): Transformations- an introduction and a bibliography, Int. Statist.
Review 41 (1973) 203-223
Hsu, J.C. (1996): Multiple comparisons, Chapman and Hall, Boca Raton 1996
Hsu, P.L. (1940): An algebraic derivation of the distribution of rectangular coordinates,
Proc. Edinburgh Math. Soc. 6 (1940) 185-189
Hsu, R. (1999): An alternative expression for the variance factors in using Iterated Almost
Unbiased Estimation, J. Geodesy 73 (1999) 173-179
Hsu, Y.S., Metry, M.H. and Y.L. Tong (1999): Hypotheses testing for means of depend-
ent and heterogeneous normal random variables, J. Statist. Planning and Inference 78
(1999) 89-99
Huang, J.S. (1999): Third-order expansion of mean squared error of medians, Statistics &
Probability Letters 42 (1999) 185-192
Huber, P.J. (1964): Robust estimation of a location parameter, Annals Mathematical
Statistics 35 (1964) 73-101
Huber, P.J. (1972): Robust statistics: a review, Annals Mathematical Statistics 43 (1972)
1041-1067
Huber, P.J. (1981): Robust Statistics, J. Wiley, New York 1981
Huda, S. and A.A. Al-Shiha (1999): On D-optimal designs for estimating slope, The
Indian J. Statistics 61 (1999) 485-495
Huet, S., A. Bouvier, M.A. Gruet and E. Jolivet (1996): Statistical tools for nonlinear
regression, Springer Verlag, New York 1996
Hunter, D.B. (1995): The evaluation of Legendre functions of the second kind, Numerical
Algorithms 10 (1995) 41-49
Huwang, L. and Y.H.S. Huang (2000): On errors-in-variables in polynomial regression -
Berkson case, Statistica Sinica 10 (2000) 923-936
Hwang, C. (1993): Fast algorithm for the formation of normal equations in a least-squares
spherical harmonic analysis by FFT, Manuscripta Geodaetica 18 (1993) 46-52
Ibragimov, F.A. and R.Z. Kasminskii (1981): Statistical estimation, asymptotic theory,
Springer Verlag, New York 1981
Ihorst, G. and G. Trenkler (1996): A general investigation of mean square error matrix
superiority in linear regression, Statistica 56 (1996) 15-23
Imhof, L. (2000): Exact designs minimising the integrated variance in quadratic regres-
sion, Statistics 34 (2000) 103-115
Imhof, J.P. (1961): Computing the distribution of quadratic forms in normal variables,
Biometrika 48 (1961) 419-426
Inda, M.A. de et al. (1999): Parallel fast Legendre transform, Proceedings of the ECMWF
Workshop “Towards TeraComputing – the Use of Parallel Processors in Meteorol-
ogy”, World Scientific Publishing Co. 1999
Irle, A. (1990): Sequentialanalyse: Optimale sequentielle Tests, Teubner Skripten zur
Mathematischen Stochastik. Stuttgart 1990
Irle, A. (2001): Wahrscheinlichkeitstheorie und Statistik, Teubner 2001
Irwin, J.O. (1927): On the frequency distribution of the means of samples from a popula-
tion having any law of frequency with finite moments with special reference to Pear-
son’s Type II, Biometrika 19 (1927) 225-239
Isham, V. (1993): Statistical aspects of chaos, in: Networks and Chaos, Statistical and
Probabilistic Aspects (ed. D.E. Barndorff-Nielsen et al) 124-200, Chapman and Hall,
London 1993
Ishibuchi, H., Nozaki, K. and H. Tanaka (1992): Distributed representation of fuzzy rules
and its application to pattern classification, Fuzzy Sets and Systems 52 (1992) 21-32
Ishibuchi, H., Nozaki, K., Yamamoto, N. and H. Tanaka (1995): Selecting fuzzy if-then
rules for classification problems using genetic algorithms, IEEE Transactions on
Fuzzy Systems 3 (1995) 260-270
Ishibuchi, H. and T. Murata (1997): Minimizing the fuzzy rule base and maximizing its
performance by a multi-objective genetic algorithm, in: Sixth FUZZ-IEEE Confer-
ence, Barcelona 1997, pp. 259-264
Izenman, A.J. (1975): Reduced-rank regression for the multivariate linear model, J. Mul-
tivariate Analysis 5 (1975) 248-264
Jacob, N. (1996): Pseudo-differential operators and Markov processes, Akademie Verlag,
Berlin 1996
Jacobi, C.G.J. (1841): Deformatione et proprietatibus determinatum, Crelle's J. reine
angewandte Mathematik, Bd.22
Jacod, J. and P. Protter (2000): Probability essentials, Springer Verlag, Berlin 2000
Jaeckel, L.A. (1972): Estimating regression coefficients by minimizing the dispersion of
the residuals, Annals Mathematical Statistics 43 (1972) 1449-1458
Jajuga, K. (1995): On the principal components of time series, Statistics in Transition 2
(1995) 201-205
James, A.T. (1954): Normal multivariate analysis and the orthogonal group, Ann. Math.
Statist. 25 (1954) 40-75
Jammalamadaka, S.R. and A. SenGupta (2001): Topics in circular statistics, World Scien-
tific, Singapore 2001
Janacek, G. (2001): Practical time series, Arnold, London 2001
Jennison, C. and B.W. Turnbull (1997): Distribution theory of group sequential t, χ² and
F-tests for general linear models, Sequential Analysis 16 (1997) 295-317
Jennrich, R.I. (1969): Asymptotic properties of nonlinear least squares estimation, Ann.
Math. Statist. 40 (1969) 633-643
Jensen, J.L. (1981): On the hyperboloid distribution, Scand. J. Statist. 8 (1981) 193-206
Jiancheng, L., Dingbo, C. and N. Jinsheng (1995): Spherical cap harmonic expansion for
local gravity field representation, Manuscripta Geodaetica 20 (1995) 265-277
Jiang, J. (1997): A derivation of BLUP - Best linear unbiased predictor, Statistics & Prob-
ability Letters 32 (1997) 321-324
Jiang, J. (1999): On unbiasedness of the empirical BLUE and BLUP, Statistics & Prob-
ability 41 (1999) 19-24
Jiang, J., Jia, H. and H. Chen (2001): Maximum posterior estimation of random effects in
generalized linear mixed models, Statistica Sinica 11 (2001) 97-120
Joe, H. (1997): Multivariate models and dependence concepts, Chapman and Hall, Boca
Raton 1997
John, P.W.M. (1998): Statistical design and analysis of experiments, SIAM 1998
Johnson, N.L. and S. Kotz (1970): Continuous univariate distributions - 1, distributions in
statistics, Houghton Mifflin Company, Boston 1970
Johnson, N.L., Kotz, S. and A.W. Kemp (1992): Univariate discrete distributions, J.
Wiley & Sons 1992
Joergensen, B. (1984): The delta algorithm and GLIM, Int. Statist. Review 52 (1984) 283-
300
Joergensen, B. (1997): The theory of dispersion models, Chapman and Hall, Boca Raton
1997
Joergensen, B., Lundbye-Christensen, S., Song, P.X.-K. and L. Sun (1996b): State space
models for multivariate longitudinal data of mixed types, Canad. J. Statist. 24 (1996b)
385-402
Jorgensen, P.C., Kubik, K., Frederiksen, P. and W. Weng (1985): Ah, robust estimation!,
Australian J. Geodesy, Photogrammetry and Surveying 42 (1985) 19-32
John, S. (1962): A tolerance region for multivariate normal distributions, Sankhya A24
(1962) 363-368
Johnson, N.L. and S. Kotz (1970a): Continuous univariate distributions – 2, Houghton
Mifflin Company, Boston 1970
Johnson, N.L. and S. Kotz (1970b): Discrete distributions, Houghton Mifflin Company,
Boston 1970
Johnson, N.L. and S. Kotz (1972): Distributions in statistics: continuous multivariate
distributions, J. Wiley, New York 1972
Johnson, N.L., Kotz, S. and X. Wu (1991): Inspection errors for attributes in quality con-
trol, Chapman and Hall, Boca Raton 1991
Joshi, V.M. (1966): Admissibility of confidence intervals, Ann. Math. Statist. 37 (1966)
629-638
Judge, G.G. and M.E. Bock (1978): The statistical implications of pre-test and Stein-rule
estimators in econometrics, Amsterdam 1978
Judge, G.G. and T.A. Yancey (1981): Sampling properties of an inequality restricted
estimator, Economics Lett. 7 (1981) 327-333
Judge, G.G. and T.A. Yancey (1986): Improved methods of inference in econometrics,
Amsterdam 1986
Jukić, D. and R. Scitovski (1997): Existence of optimal solution for exponential model by
least squares, J. Comput. Appl. Math. 78 (1997) 317-328
Jupp, P.E. and K.V. Mardia (1980): A general correlation coefficient for directional data
and related regression problems, Biometrika 67 (1980) 163-173
Jupp, P.E. and K.V. Mardia (1989): A unified view of the theory of directional statistics,
1975-1988, International Statist. Rev. 57 (1989) 261-294
Jureckova, J. (1995): Affine- and scale-equivariant M-estimators in linear model, Prob-
ability and Mathematical Statistics 15 (1995) 397-407
Jurisch, R. and G. Kampmann (1997): Eine Verallgemeinerung des arithmetischen Mittels
für einen Freiheitsgrad bei der Ausgleichung nach vermittelnden Beobachtungen, Z.
Vermessungswesen 11 (1997) 509-520
Jurisch, R. and G. Kampmann (1998): Vermittelnde Ausgleichungsrechnung mit balan-
cierten Beobachtungen – erste Schritte zu einem neuen Ansatz, Z. Vermessungswesen
123 (1998) 87-92
Jurisch, R. and G. Kampmann (2001): Plücker-Koordinaten – ein neues Hilfsmittel zur
Geometrie- Analyse und Ausreissersuche, Vermessung, Photogrammetrie und Kultur-
technik 3 (2001) 146-150
Jurisch, R. and G. Kampmann (2002): Teilredundanzen und ihre natürlichen Verallge-
meinerungen, Z. Vermessungswesen 127 (2002) 117-123
Jurisch, R., Kampmann, G. and B. Krause (1997): Über eine Eigenschaft der Methode der
kleinsten Quadrate unter Verwendung von balancierten Beobachtungen, Z. Vermes-
sungswesen 122 (1997) 159-166
Jurisch, R., Kampmann, G. and J. Linke (1999a): Über die Analyse von Beobachtungen in
der Ausgleichungsrechnung - Teil I, Z. Vermessungswesen 124 (1999) 350-357
Jurisch, R., Kampmann, G. and J. Linke (1999b): Über die Analyse von Beobachtungen
in der Ausgleichungsrechnung - Teil II, Z. Vermessungswesen 124 (1999) 350-357
Kagan, A.M., Linnik, J.V. and C.R. Rao (1965): Characterization problems of the normal
law based on a property of the sample average, Sankhya Ser. A 27 (1965) 405-406
Kagan, A. and L.A. Shepp (1998): Why the variance?, Statist. Prob. Letters 38 (1998)
329-333
Kagan, A. and Z. Landsman (1999): Relation between the covariance and Fisher informa-
tion matrices, Statistics & Probability Letters 42 (1999) 7-13
Kahn, M., Mackisack, M.S., Osborne, M.R. and G.K. Smyth (1992): On the consistency
of Prony's method and related algorithms, J. Comp. and Graph. Statist. 1 (1992) 329-
349
Kahng, M.W. (1995): Testing outliers in nonlinear regression, J. the Korean Stat. Soc. 24
(1995) 419-437
Kakihara, Y. (2001): The Kolmogorov isomorphism theorem and extensions to some
nonstationary processes, D.N. Shanbhag and C.R. Rao, eds., Handbook of Statistics 19
(2001) 443-470
Kallianpur, G. (1963): Von Mises functionals and maximum likelihood estimation,
Sankhya A25 (1963) 149-158
Kallianpur, G. and Y.-T. Kim (1996): A curious example from statistical differential
geometry, Theory Probab. Appl. 43 (1996) 42-62
Kallenberg, O. (1997): Foundations of modern probability, Springer Verlag, New York
1997
Kaminsky, K.S. and P.I. Nelson (1975): Best linear unbiased prediction of order statistics
in location and scale families, J. Amer. Statist. Ass. 70 (1975) 145-150
Kaminsky, K.S. and L.S. Rhodin (1985): Maximum likelihood prediction, Ann. Inst.
Statist. Math. 37 A (1985), 507-517
Kampmann, G. (1988): Zur kombinativen Norm-Schätzung mit Hilfe der L1-, der L2- und
der Boskovic-Laplace-Methode mit den Mitteln der linearen Programmierung, PhD.
Thesis, Bonn University, Bonn 1988
Kampmann, G. (1992): Zur numerischen Überführung verschiedener linearer Modelle der
Ausgleichungsrechnung, Z. Vermessungswesen 117 (1992), 278-287
Kampmann, G. (1994): Robuste Deformationsanalyse mittels balancierter Ausgleichung,
Allg. Vermessungsnachrichten 1 (1994) 8-17
Kampmann, G. (1997): Eine Beschreibung der Geometrie von Beobachtungen in der
Ausgleichungsrechnung, Z. Vermessungswesen 122 (1997) 369-377
Kampmann, G. and B. Krause (1996): Balanced observations with a straight line fit,
Bolletino di Geodesia e Scienze Affini 2 (1996) 134-141
Kampmann, G. and B. Krause (1997a): Minimierung von Residuenfunktionen unter
Ganzzahligkeitsrestriktionen, Allg. Vermessungsnachrichten 8-9 (1997) 325-331
Kampmann, G. and B. Krause (1997b): A breakdown point analysis for the straight line
fit based on balanced observations, Bolletino di Geodesia e Scienze Affini 3 (1997)
294-303
Kampmann, G. and B. Krause (2004): Zur statistischen Begründung des Regressionsmo-
dells der balanzierten Ausgleichungsrechnung, Z. Vermessungswesen 129 (2004)
176-183
Kampmann, G. and B. Renner (1999): Über Modellüberführungen bei der linearen Aus-
gleichungsrechnung, Allg. Vermessungsnachrichten 2 (1999) 42-52
Kampmann, G. and B. Renner (2000): Numerische Beispiele zur Bearbeitung latenter
Bedingungen und zur Interpretation von Mehrfachbeobachtungen in der Ausglei-
chungsrechnung, Z. Vermessungswesen 125 (2000) 190-197
Kanani, E. (2000): Robust estimators for geodetic transformations and GIS, Institut für
Geodäsie und Photogrammetrie an der Eidgenössischen Technischen Hochschule Zü-
rich, Mitteilungen Nr. 70, Zürich 2000
Kannan, N. and D. Kundu (1994): On modified EVLP and ML methods for estimating
superimposed exponential signals, Signal Processing 39 (1994) 223-233
Kantz, H. and T. Schreiber (1997): Nonlinear time series analysis, Cambridge University
Press, Cambridge 1997
Karatzas, I. and S.E. Shreve (1991): Brownian motion and stochastic calculus, Springer-
Verlag, New York 1991
Karian, Z.A. and E.J. Dudewicz (2000): Fitting statistical distributions, CRC Press 2000
Kariya, T. (1989): Equivariant estimation in a model with an ancillary statistic, Ann.
Statist 17 (1989) 920-928
Karlin, S. and W.J. Studden (1966a): Tchebycheff systems, Interscience, New York
1966
Karlin, S. and W.J. Studden (1966b): Optimal experimental designs, Ann. Math. Statist.
37 (1966) 783-815
Karr, A.F. (1993): Probability, Springer Verlag, New York 1993
Kasala, S. and T. Mathew (1997): Exact confidence regions and tests in some linear func-
tional relationships, Statistics & Probability Letters 32 (1997) 325-328
Kasietczuk, B. (2000): Geodetic network adjustment by the maximum likelihood method
with application of local variance, asymmetry and excess coefficients, Anno LIX -
Bollettino di Geodesia e Scienze Affini 3 (2000) 221-235
Kass, R.E. and P.W. Vos (1997): Geometrical foundations of asymptotic inference,
Wiley, New York 1997
Kastrup, H.A. (1962): Zur physikalischen Deutung und darstellungstheoretischen Analyse
der konformen Transformationen von Raum und Zeit, Annalen der Physik 9 (1962)
388-428
Kastrup, H.A. (1966): Gauge properties of the Minkowski space, Physical Review 150
(1966) 1183-1193
Kay, S.M. (1988): Sinusoidal parameter estimation, Prentice Hall, Englewood Cliffs, N.J.
1988
Keller, J.B. (1975): Closest unitary, orthogonal and Hermitian operators to a given opera-
tor, Math. Mag. 46 (1975) 192-197
Kelly, R.J. and T. Mathew (1993): Improved estimators of variance components having
smaller probability of negativity, J. Royal Stat. Soc. B 55 (1993) 897-911
Kemperman, J.H.B. (1956): Generalized tolerance limits, Ann. Math. Statist. 27 (1956)
180-186
Kendall, D.G. (1974): Pole seeking Brownian motion and bird navigation, J. Roy. Stat.
Soc. B 36 (1974) 365-417
Kendall, D.G. (1984): Shape manifolds, Procrustean metrics, and complex projective
space, Bulletin of the London Mathematical Society 16 (1984) 81-121
Kendall, M.G. (1960): The evergreen correlation coefficient, pages 274-277, in: Essays on
Honor of Harold Hotelling, ed. I. Olkin, Stanford University Press, Stanford 1960
Kenney, C.S., A.J. Laub and M.S. Reese (1998): Statistical condition estimation for linear
systems, SIAM J. Scientific Computing 19 (1998) 566-584
Kent, J.T. (1976): Distributions, processes and statistics on “spheres”, PhD. Thesis, Uni-
versity of Cambridge
Kent, J.T. (1983): Information gain and a general measure of correlation, Biometrika 70
(1983) 163-173
Kent, J.T. (1997): Consistency of Procrustes estimators, J. R. Statist. Soc. B 59 (1997)
281-290
Kent, J.T. and K.V. Mardia (1997): Consistency of Procrustes estimators, J. R. Statist.
Soc. 59 (1997) 281-290
Kent, J.T. and M. Mohammadzadeh (2000): Global optimization of the generalized cross-
validation criterion, Statistics and Computing 10 (2000) 231-236
Khan, R.A. (1973): On some properties of Hammersley’s estimator of an integer mean,
The Annals of Statistics 1 (1973) 756-762
Khan, R.A. (1978): A note on the admissibility of Hammersley’s estimator of an integer
mean, The Canadian J. Statistics 6 (1978) 113-119
Khan, R.A. (1998): Fixed-width confidence sequences for the normal mean and the bino-
mial probability, Sequential Analysis 17 (1998) 205-217
Khan, R.A. (2000): A note on Hammersley's estimator of an integer mean, J. Statist.
Planning and Inference 88 (2000) 37-45
Khatri, C.G. and C.R. Rao (1968): Solutions to some fundamental equations and their
applications to characterization of probability distributions, Sankhya, Series A, 30
(1968) 167-180
Khatri, C.G. and S.K. Mitra (1976): Hermitian and nonnegative definite solutions of
linear matrix equations, SIAM J. Appl. Math. 31 (1976) 579-585
Khuri, A.I. (1999): A necessary condition for a quadratic form to have a chi-squared
distribution: an accessible proof, Int. J. Math. Educ. Sci. Technol. 30 (1999) 335-339
Khuri, A.I., Mathew, T. and B.K. Sinha (1998): Statistical tests for mixed linear models,
Wiley, New York 1998
Kidd, M. and N.F. Laubscher (1995): Robust confidence intervals for scale and its appli-
cation to the Rayleigh distribution, South African Statist. J. 29 (1995) 199-217
Kiefer, J. (1974): General equivalence theory for optimal designs (approximate theory),
Ann. Stat. 2 (1974) 849-879
Kiefer, J.C. and J. Wolfowitz (1959): Optimum design in regression problem, Ann. Math.
Statist. 30 (1959) 271-294
Kilmer, M.E. and D.P. O’Leary (2001): Choosing regularization parameters in iterative
methods for ill-posed problems, SIAM J. Matrix Anal. Appl. 22 (2001) 1204-1221
Kim, C. and B.E. Storer (1996): Reference values for Cook’s distance, Commun. Statist. –
Simula. 25 (1996) 691-708
King, J.T. and D. Chillingworth (1979): Approximation of generalized inverses by iter-
ated regularization, Numer. Funct. Anal. Optim. 1 (1979) 499-513
King, M.L. (1980): Robust tests for spherical symmetry and their application to least
squares regression, Ann. Statist. 8 (1980) 1265-1271
Kirkwood, B.H., Royer, J.Y., Chang, T.C. and R.G. Gordon (1999): Statistical tools for
estimating and combining finite rotations and their uncertainties, Geoph. J. Int.
137 (1999) 408-428
Kirsch, A. (1996): An introduction to the mathematical theory of inverse problems,
Springer Verlag, New York 1996
Kitagawa, G. and W. Gersch (1996): Smoothness priors analysis of time series, Springer
Verlag, New York 1996
Klebanov, L.B. (1976): A general definition of unbiasedness, Theory of Probability and
Appl. 21 (1976) 571-585
Klees, R., Ditmar, P. and P. Broersen (2003): How to handle colored observation noise in
large least-squares problems, J. Geodesy 76 (2003) 629-640
Kleffe, J. (1976): A note on MINQUE for normal models, Math. Operationsforschg.
Statist. 7 (1976) 707-714
Kleffe, J. and R. Pincus (1974): Bayes and the best quadratic unbiased estimators for
variance components and heteroscedastic variances in linear models, Math. Opera-
tionsforschg. Statistik 5 (1974) 147-159
Kleffe, J. and J.N.K. Rao (1986): The existence of asymptotically unbiased nonnegative
quadratic estimates of variance components in ANOVA models, J. American Statisti-
cal Assoc. 81(1986) 692-698
Kleusberg, A. and E.W. Grafarend (1981): Expectation and variance component estima-
tion of multivariate gyrotheodolite observation II, Allg. Vermessungsnachrichten 88
(1981) 104-108
Klonecki, W. and S. Zontek (1996): Improved estimators for simultaneous estimation of
variance components, Statistics & Probability Letters 29 (1996) 33-43
Knautz, H. (1996): Linear plus quadratic (LPQ) quasiminimax estimation in the linear
regression model, Acta Applicandae Mathematicae 43 (1996) 97-111
Knautz, H. (1999): Nonlinear unbiased estimation in the linear regression model with
nonnormal disturbances, J. Statistical Planning and Inference 81 (1999) 293-309
Knickmeyer, E.H. (1984): Eine approximative Lösung der allgemeinen linearen Geodäti-
schen Randwertaufgabe durch Reihenentwicklungen nach Kugelfunktionen, Deutsche
Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften, Mün-
chen 1984
Knobloch, E. (1992): Historical aspects of the foundations of error theory, in: Echeveria,
J., Ibarra, J. and T. Mormann (eds): The space of mathematics – philosophical, epis-
temological and historical explorations, Walter de Gruyter 1992
Kobilinsky, A. (1990): Complex linear models and cyclic designs, Linear Algebra and its
Application 127 (1990) 227-282
Koch, G.G. (1968): Some further remarks on “A general approach to the estimation of
variance components“, Technometrics 10 (1968) 551-558
Koch, K.R. (1979): Parameter estimation in the Gauß-Helmert model, Boll. Geod. Sci.
Affini 38 (1979) 553-563
Koch, K.R. (1982): S-transformations and projections for obtaining estimable parameters,
in: Blotwijk, M.J. et al. (eds.): 40 Years of Thought, Anniversary volume for Prof.
Baarda’s 65th Birthday Vol. 1. pp. 136-144, Technische Hogeschool Delft, Delft 1982
Koch, K.R. (1987): Parameterschaetzung und Hypothesentests in linearen Modellen,
2nd ed., Duemmler, Bonn 1987
Koch, K.R. (1988): Parameter estimation and hypothesis testing in linear models, Sprin-
ger-Verlag, Berlin – Heidelberg – New York, 1988
Koch, K.R. (1999): Parameter estimation and hypothesis testing in linear models, 2nd ed.,
Springer Verlag, Berlin 1999
Koch, K.R. and J. Kusche (2002): Regularization of geopotential determination from
satellite data by variance components, J. Geodesy 76 (2002) 259-268
Koch, K.R. and Y. Yang (1998): Konfidenzbereiche und Hypothesentests für robuste
Parameterschätzungen, Z. Vermessungswesen 123 (1998) 20-26
König, D. and V. Schmidt (1992): Zufällige Punktprozesse, Teubner Skripten zur Mathe-
matischen Stochastik, Stuttgart 1992
Koenker, R. and G. Basset (1978): Regression quantiles, Econometrica 46 (1978) 33-50
Kokoszka, P. and T. Mikosch (2000): The periodogram at the Fourier frequencies, Sto-
chastic Processes and their Applications 86 (2000) 49-79
Kollo, T. and H. Neudecker (1993): Asymptotics of eigenvalues and unit-length eigenvec-
tors of sample variance and correlation matrices, J. Multivariate Anal. 47 (1993)
283-300
Kollo, T. and D. von Rosen (1996): Formal density expansions via multivariate mixtures,
in: Multidimensional statistical analysis and theory of random matrices, pp. 129-138,
Proceedings of the Sixth Lukacs Symposium, eds. Gupta, A.K. and V.L. Girko, VSP,
Utrecht 1996
Koopmans, T.C. and O. Reiersol (1950): The identification of structural characteristics,
Ann. Math. Statistics 21 (1950) 165-181
Kosko, B. (1992): Networks and fuzzy systems, Prentice-Hall, Englewood Cliffs 1992
Kotecky, R. and J. Niederle (1975): Conformally covariant field equations: First order
equations with non-vanishing mass, Czech. J. Phys. B25 (1975) 123-149
Kotlarski, I. (1967): On characterizing the gamma and the normal distribution, Pacific J.
Mathematics 20 (1967) 69-76
Kotsakis, C. and M.G. Sideris (2001): A modified Wiener-type filter for geodetic estima-
tion problems with non-stationary noise, J. Geodesy 75 (2001) 647-660
Kott, P.S. (1998): A model-based evaluation of several well-known variance estimators
for the combined ratio estimator, Statistica Sinica 8 (1998) 1165-1173
Kotz, S., Kozubowski, T.J. and K. Podgórski (2001): The Laplace distribution and gener-
alizations, Birkhäuser 2001
Koukouvinos, C. and J. Seberry (1996): New weighing matrices, Sankhya: The Indian J.
Statistics B58 (1996) 221-230
Kowalewski, G. (1995): Robust estimators in regression, Statistics in Transition 2 (1995)
123-135
Krämer, W., Bartels, R. and D.G. Fiebig (1996): Another twist on the equality of OLS and
GLS, Statistical Papers 37 (1996) 277-281
Krantz, S.G. and H.R. Parks (2002): The implicit function theorem – history, theory and
applications, Birkhäuser, Boston 2002
Krarup, T., Juhl, J. and K. Kubik (1980): Götterdämmerung over least squares adjustment,
in: Proc. 14th Congress of the International Society of Photogrammetry, vol. B3,
Hamburg 1980, 369-378
Krengel, U. (1985): Ergodic theorems, de Gruyter, Berlin-New York 1985
Kres, H. (1983): Statistical tables for multivariate analysis, Springer, Berlin-Heidelberg-
New York 1983
Krishnakumar, J. (1996): Towards a general robust estimation approach for generalised
regression models, Physics Abstract, Science Abstract Series A, INSPEC 1996
Kronecker, L. (1903): Vorlesungen über die Theorie der Determinanten, Erster Band,
Bearbeitet und fortgeführt von K. Hensel, B.G. Teubner, Leipzig 1903
Krumbein, W.C. (1939): Preferred orientation of pebbles in sedimentary deposits, J. Geol.
47 (1939) 673-706
Krumm, F. (1987): Geodätische Netze im Kontinuum: Inversionsfreie Ausgleichung und
Konstruktion von Kriterionmatrizen, Deutsche Geodätische Kommission, Bayerische
Akademie der Wissenschaften, PhD. Thesis, Report C334, München 1987
Krumm, F. and F. Okeke (1998): Graph, graph spectra, and partitioning algorithms in a
geodetic network structural analysis and adjustment, Bolletino di Geodesia e Science
Affini 57 (1998) 1-24
Krumm, F., Grafarend, E.W. and B. Schaffrin (1986): Continuous networks, Fourier
analysis and criterion matrices, Manuscripta Geodaetica 11 (1986) 57-78
Kruskal, W. (1946): Helmert’s distribution, American Math. Monthly 53 (1946) 435-438
Kruskal, W. (1968): When are Gauß-Markov and least squares estimators identical? A
coordinate-free approach, Ann. Statistics 39 (1968) 70-75
Kryanev, A.V. (1974): An iterative method for solving incorrectly posed problems, USSR
Comp. Math. Math. Phys. 14 (1974) 24-33
Krzanowski, W.J. (1995): Recent advances in descriptive multivariate analysis, Clarendon
Press, Oxford 1995
Krzanowski, W.J. and F.H.C. Marriott (1994): Multivariate analysis: part I - distribution,
ordination and inference, Arnold Publ., London 1994
Krzanowski, W.J. and F.H.C. Marriott (1995): Multivariate analysis: part II - classifica-
tion, covariance structures and repeated measurements, Arnold Publ., London 1995
Kshirsagar, A.M. (1983): A course in linear models, Marcel Dekker Inc, New York –
Basel 1983
Kuang, S.L. (1991): Optimization and design of deformations monitoring schemes, PhD
dissertation, Department of Surveying Engineering, University of New Brunswick,
Tech. Rep. 91, Fredericton 1991
Kuang, S. (1996): Geodetic network analysis and optimal design, Ann Arbor Press, Chel-
sea, Michigan 1996
Kubácek, L. (1996a): Linear model with inaccurate variance components, Applications of
Mathematics 41 (1996) 433-445
Kubácek, L. (1996b): Nonlinear error propagation law, Applications of Mathematics 41
(1996) 329-345
Kubik, K. (1982): Kleinste Quadrate und andere Ausgleichsverfahren, Vermessung Pho-
togrammetrie Kulturtechnik 80 (1982) 369-371
Kubik, K., and Y. Wang (1991): Comparison of different principles for outlier detection,
Australian J. Geodesy, Photogrammetry and Surveying 54 (1991) 67-80
Kuechler, U. and M. Soerensen (1997): Exponential families of stochastic processes,
Springer Verlag, Berlin 1997
Kullback, S. (1934): An application of characteristic functions to the distribution problem
of statistics, Annals Math. Statistics 4 (1934) 263-305
Kumaresan, R. (1982): Estimating the parameters of exponentially damped or undamped
sinusoidal signals in noise, PhD thesis, The University of Rhode Island, Rhode Island
1982
Kundu, D. (1993a): Estimating the parameters of undamped exponential signals, Tech-
nometrics 35 (1993) 215-218
Kundu, D. (1993b): Asymptotic theory of least squares estimator of a particular nonlinear
regression model, Statistics and Probability Letters 18 (1993) 13-17
Kundu, D. (1994a): Estimating the parameters of complex valued exponential signals,
Computational Statistics and Data Analysis 18 (1994) 525-534
Kundu, D. (1994b): A modified Prony algorithm for sum of damped or undamped expo-
nential signals, Sankhya A 56 (1994) 525-544
Kundu, D. (1997): Asymptotic theory of the least squares estimators of sinusoidal signal,
Statistics 30 (1997) 221-238
Kundu, D. and R.D. Gupta (1998): Asymptotic properties of the least squares estimators
of a two dimensional model, Metrika 48 (1998) 83-97
Kundu, D. and A. Mitra (1998a): Fitting a sum of exponentials to equispaced data, The
Indian J. Statistics 60 (1998) 448-463
Kundu, D. and A. Mitra (1998b): Different methods of estimating sinusoidal frequencies:
a numerical comparison, J. Statist. Comput. Simul. 62 (1998) 9-27
Kunst, R.M. (1997): Fourth order moments of augmented ARCH processes, Commun.
Statist. Theor. Meth. 26 (1997) 1425-1441
Kunz, E. (1985): Introduction to commutative algebra and algebraic geometry, Birkhäuser
Boston – Basel – Berlin 1985
Kuo, W., Prasad, V.R., Tillman, F.A. and C.-L. Hwang (2001): Optimal reliability design,
Cambridge University Press, Cambridge 2001
Kuo, Y. (1976): An extended Kronecker product of matrices, J. Math. Anal. Appl. 56
(1976) 346-350
Kupper, L.L. (1972): Fourier series and spherical harmonic regression, J. Royal Statist.
Soc. C21 (1972) 121-130
Kupper, L.L. (1973): Minimax designs of Fourier series and spherical harmonic regres-
sions: a characterization of rotatable arrangements, J. Royal Statist. Soc. B35 (1973)
493-500
Kurata, H. (1998): A generalization of Rao's covariance structure with applications to
several linear models, J. Multivariate Analysis 67 (1998) 297-305
Kurz, L. and M.H. Benteftifa (1997): Analysis of variance in statistical image processing,
Cambridge University Press, Cambridge 1997
Kusche, J. (2001): Implementation of multigrid solvers for satellite gravity anomaly re-
covery, J. Geodesy 74 (2001) 773-782
Kusche, J. (2002): Inverse Probleme bei der Gravitationsfeldbestimmung mittels SST-
und SGG-Satellitenmissionen, Deutsche Geodätische Kommission, Report C548,
München 2002
Kusche, J. (2003): A Monte-Carlo technique for weight estimation in satellite geodesy, J.
Geodesy 76 (2003) 641-652
Kusche, J. and R. Klees (2002): Regularization of gravity field estimation from satellite
gravity gradients, J. Geodesy 76 (2002) 359-368
Kushner, H. (1967a): Dynamical equations for optimal nonlinear filtering, J. Diff. Eq. 3
(1967) 179-190
Kushner, H. (1967b): Approximations to optimal nonlinear filters, IEEE Trans. Auto.
Contr. AC-12 (1967) 546-556
Kutoyants, Y.A. (1984): Parameter estimation for stochastic processes, Heldermann,
Berlin 1984
Kutterer, H. (1994): Intervallmathematische Behandlung endlicher Unschärfen linearer
Ausgleichsmodelle, PhD Thesis, Deutsche Geodätische Kommission DGK C423,
München 1994
Kutterer, H. and S. Schoen (1999): Statistische Analyse quadratischer Formen - der De-
terminantenansatz, Allg. Vermessungsnachrichten 10 (1999) 322-330
Kutterer, H. (1999): On the sensitivity of the results of least-squares adjustments concern-
ing the stochastic model, J. Geodesy 73 (1999) 350-361
Kutterer, H. (2002): Zum Umgang mit Ungewissheit in der Geodäsie – Bausteine für eine
neue Fehlertheorie - , Deutsche Geodätische Kommission, Report C553, München
2002
Laeuter, H. (1970): Optimale Vorhersage und Schätzung in regulären und singulären
Regressionsmodellen, Math. Operationsforschg. Statistik 1 (1970) 229-243
Laeuter, H. (1971): Vorhersage bei stochastischen Prozessen mit linearem Regressionsan-
teil, Math. Operationsforschg. Statistik 2 (1971) 69-85, 147-166
Laha, R.G. (1956): On the stochastic independence of two second-degree polynomial
statistics in normally distributed variates, The Annals of Mathematical Statistics 27
(1956) 790-796
Laha, R.G. and E. Lukacs (1960): On a problem connected with quadratic regression,
Biometrika 47 (1960) 335-343
Lai, T.L. and C.Z. Wei (1982): Least squares estimates in stochastic regression model
with applications to stochastic regression in linear dynamic systems, Annals of Statis-
tics 10 (1982) 154-166
Laird, N.M. and J.H. Ware (1982): Random-effects models for longitudinal data, Biomet-
rics 38 (1982) 963-974
LaMotte, L.R. (1973): Quadratic estimation of variance components, Biometrics 29
(1973) 311-330
LaMotte, L.R. (1973): On non-negative quadratic unbiased estimation of variance com-
ponents, J. American Statist. Assoc. 68 (1973) 728-730
LaMotte, L.R. (1976): Invariant quadratic estimators in the random, one-way ANOVA
model, Biometrics 32 (1976) 793-804
Lamotte, L.R., McWhorter, A. and R.A. Prasad (1988): Confidence intervals and tests on
the variance ratio in random models with two variance components, Com. Stat. –
Theory Meth. 17 (1988) 1135-1164
Lamperti, J. (1966): Probability, Benjamin Publ. 1966
Lancaster, H.O. (1965): The Helmert matrices, American Math. Monthly 72 (1965) 4-11
Lancaster, H.O. (1966): Forerunners of the Pearson χ², Australian J. Statistics 8 (1966)
117-126
Langevin, P. (1905): Magnetisme et theorie des electrons, Ann. de Chim. et de Phys. 5
(1905) 70-127
Lanzinger, H. and U. Stadtmüller (2000): Weighted sums for i.i.d. random variables with
relatively thin tails, Bernoulli 6 (2000) 45-61
Lardy, L.J. (1975): A series representation for the generalized inverse of a closed linear
operator, Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 58 (1975) 152-157
Lauritzen, S. (1973): The probabilistic background of some statistical methods in Physical
Geodesy, Danish Geodetic Institute, Maddelelse 48, Copenhagen 1973
Lauritzen, S. (1974): Sufficiency, prediction and extreme models, Scand. J. Statist. 1
(1974) 128-134
Lawson, C.L. and R.J. Hanson (1995): Solving least squares problems, SIAM, Philadel-
phia 1995 (reprinting with corrections and a new appendix of a 1974 Prentice Hall
text)
Lawson, C.L. and R.J. Hanson (1974): Solving least squares problems, Prentice-Hall,
Englewood Cliffs 1974
Laycock, P.J. (1975): Optimal design: regression models for directions, Biometrika 62
(1975) 305-311
LeCam, L. (1960): Locally asymptotically normal families of distributions, University of
California Publication 3, Los Angeles 1960
LeCam, L. (1970): On the assumptions used to prove asymptotic normality of maximum
likelihood estimators, Ann. Math. Statistics 41 (1970) 802-828
LeCam, L. (1986): Proceedings of the Berkeley conference in honor of Jerzy Neyman and
Jack Kiefer, Chapman and Hall, Boca Raton 1986
Lee, J.C. and S. Geisser (1996): On the Prediction of Growth Curves, in: Lee, C., John-
son, W.O. and A. Zellner (eds.): Modelling and Prediction Honoring Seymour Geis-
ser, pp. 77-103, Springer Verlag, New York 1996
Lee, J.C., Johnson, W.O. and A. Zellner (1996): Modeling and prediction, Springer Ver-
lag, New York 1996
Lee, M. (1996): Methods of moments and semiparametric econometrics for limited de-
pendent variable models, Springer Verlag, New York 1996
Lee, P. (1992): Bayesian statistics, Wiley, New York 1992
Lee, S.L. (1995): A practical upper bound for departure from normality, SIAM J. Matrix
Anal. Appl. 16 (1995) 462-468
Lee, S.L. (1996): Best available bounds for departure from normality, SIAM J. Matrix
Anal. Appl. 17 (1996) 984-991
Lee, S.Y. and J.-Q. Shi (1998): Analysis of covariance structures with independent and
non-identically distributed observations, Statistica Sinica 8 (1998) 543-557
Lee, Y. and J.A. Nelder (1996): Hierarchical generalized linear models, J. Roy. Statist.
Soc., Series B, 58 (1996) 619-678
Lehmann, E.L. (1959): Testing statistical hypotheses, J. Wiley, New York 1959
Lehmann, E.L. and H. Scheffé (1950): Completeness, similar regions and unbiased esti-
mation, Part I, Sankhya 10 (1950) 305-340
Lehmann, E.L. and G. Casella (1998): Theory of point estimation, Springer Verlag, New
York 1998
Lenk, U. (2001a): Schnellere Multiplikation großer Matrizen durch Verringerung der
Speicherzugriffe und ihr Einsatz in der Ausgleichungsrechnung, Z. Vermessungswe-
sen 4 (2001) 201-207
Lenk, U. (2001b): 2.5D-GIS und Geobasisdaten – Integration von Höheninformation und
Digitalen Situationsmodellen, Wiss. Arbeiten der Fachrichtung Vermessungswesen
der Uni. Hannover, Hannover 2001
Lenstra, A.K., Lenstra, H.W. and L. Lovász (1982): Factoring polynomials with rational
coefficients, Math. Ann. 261 (1982) 515-534
Lenstra, H.W. (1983): Integer programming with a fixed number of variables, Math.
Operations Res. 8 (1983) 538-548
Lenth, R.V. (1981): Robust measures of location for directional data, Technometrics 23
(1981) 77-81
Lentner, M.N. (1969): Generalized least-squares estimation of a subvector of parameters
in randomized fractional factorial experiments, Ann. Math. Statist. 40 (1969) 1344-
1352
Lenzmann, L. (2003): Strenge Auswertung des nichtlinearen Gauß-Helmert-Modells, Allg.
Vermessungsnachrichten 2 (2004) 68-73
Lesaffre, E. and G. Verbeke (1998): Local influence in linear mixed models, Biometrics
54 (1998) 570-582
Letac, G. and M. Mora (1990): Natural real exponential families with cubic variance
functions, Ann. Statist. 18 (1990) 1-37
Lether, F.G. and P.R. Wenston (1995): Minimax approximations to the zeros of Pn(x) and
Gauss-Legendre quadrature, J. Comput. Appl. Math. 59 (1995) 245-252
Levenberg, K. (1944): A method for the solution of certain non-linear problems in least-
squares, Quart. Appl. Math. Vol. 2 (1944) 164-168
Levin, J. (1976): A parametric algorithm for drawing pictures of solid objects composed
of quadratic surfaces, Communications of the ACM 19 (1976) 555-563
Lewis, R.M. and V. Torczon (2000): Pattern search methods for linearly constrained
minimization, SIAM J. Optim. 10 (2000) 917-941
Lewis, T.O. and T.G. Newman (1968): Pseudoinverses of positive semidefinite matrices,
SIAM J. Appl. Math. 16 (1968) 701-703
Li, B.L. and C. Loehle (1995): Wavelet analysis of multiscale permeabilities in the sub-
surface, Geoph. Res. Lett. 22 (1995) 3123-3126
Li, C.K. and R. Mathias (2000): Extremal characterizations of the Schur complement and
resulting inequalities, SIAM Review 42 (2000) 233-246
Li, T. (2000): Estimation of nonlinear errors-in-variables models: a simulated minimum
distance estimator, Statistics & Probability Letters 47 (2000) 243-248
Liang, K. and K. Ryu (1996): Selecting the form of combining regressions based on re-
cursive prediction criteria, in: Lee, C., Johnson, W.O. and A. Zellner (eds.): Model-
ling and prediction honoring Seymour Geisser, Springer Verlag, New York 1996,
122-135
Liang, K.Y. and S.L. Zeger (1986): Longitudinal data analysis using generalized linear
models, Biometrika 73 (1986) 13-22
Liang, K.Y. and S.L. Zeger (1995): Inference based on estimating functions in the pres-
ence of nuisance parameters, Statist. Sci. 10 (1995) 158-199
Liesen, J., Rozlozník, M. and Z. Strakos (2002): Least squares residuals and minimal
residual methods, SIAM J. Sci. Comput. 23 (2002) 1503-1525
Lindley, D.V. and A.F.M. Smith (1972): Bayes estimates for the linear model, J. Roy.
Stat. Soc. 34 (1972) 1-41
Lilliefors, H.W. (1967): On the Kolmogorov-Smirnov test for normality with mean and
variance unknown, J. American Statistical Assoc. 62 (1967) 399-402
Lin, A. and S.-P. Han (2004): A class of methods for projection on the intersection of
several ellipsoids, SIAM J. Optim. 15 (2004) 129-138
Lin, X. and N.E. Breslow (1996): Bias correction in generalized linear mixed models with
multiple components of dispersion, J. American Statistical Assoc. 91 (1996) 1007-
1016
Lin, X.H. (1997): Variance component testing in generalised linear models with random
effects, Biometrika 84 (1997) 309-326
Lindsey, J.K. (1997): Applying generalized linear models, Springer Verlag, New York
1997
Lindsey, J.K. (1999): Models for repeated measurements, 2nd edition, Oxford University
Press, Oxford 1999
Lingjaerde, O. and N. Christophersen (2000): Shrinkage structure of partial least squares,
Board of the Foundation of the Scandinavian J. Statistics 27 (2000) 459-473
Linke, J., Jurisch, R., Kampmann, G. and H. Runne (2000): Numerisches Beispiel zur
Strukturanalyse von geodätischen und mechanischen Netzen mit latenten Restriktio-
nen, Allg. Vermessungsnachrichten 107 (2000) 364-368
Linnik, J.V. and I.V. Ostrovskii (1977): Decomposition of random variables and vectors,
Transl. Math. Monographs Vol. 48, American Mathematical Society, Providence
1977
Liptser, R.S. and A.N. Shiryayev (1977): Statistics of random processes, Vol. 1, Springer
Verlag, New York 1977
Liski, E.P. and T. Nummi (1996): Prediction in repeated-measures models with engineer-
ing applications, Technometrics 38 (1996) 25-36
Liski, E.P. and S. Puntanen (1989): A further note on a theorem on the difference of the
generalized inverses of two nonnegative definite matrices, Commun. Statist.-Theory
Meth. 18 (1989) 1747-1751
Liski, E.P., Luoma, A. and A. Zaigraev (1999): Distance optimality design criterion in
linear models, Metrika 49 (1999) 193-211
Liski, E.P., Luoma, A., Mandal, N.K. and B.K. Sinha (1998): Pitman nearness, distance
criterion and optimal regression designs, Calcutta Statistical Ass. Bull. 48 (1998) 191-
192
Liski, E.P., Mandal, N.K., Shah, K.R. and B.K. Sinha (2002): Topics in optimal design,
Springer Verlag, New York 2002
Liu, J. (2000): MSEM dominance of estimators in two seemingly unrelated regressions, J.
Statistical Planning and Inference 88 (2000) 255-266
Liu, S. (2000): Efficiency comparisons between the OLSE and the BLUE in a singular
linear model, J. Statistical Planning and Inference 84 (2000) 191-200
Liu, X.-W. and Y.-X. Yuan (2000): A robust algorithm for optimization with general
equality and inequality constraints, SIAM J. Sci. Comput. 22 (2000) 517-534
Liu, W., Yao, Y. and C. Shi (2001): Theoretic research on robustified least squares esti-
mator based on equivalent variance-covariance, Geo-spatial Information Science 4
(2001) 1-8
Liu, W., Xia, Z. and M. Deng (2001): Modelling fuzzy geographic objects within fuzzy
fields, Geo-spatial Information Science 4 (2001) 37-42
Ljung, L. (1979): Asymptotic behavior of the extended Kalman filter as a parameter
estimator for linear systems, IEEE Trans. Auto. Contr. AC-24 (1979) 36-50
Lloyd, E.H. (1952): Least squares estimation of location and scale parameters using order
statistics, Biometrika 39 (1952) 88-95
Lohse, P. (1994): Ausgleichungsrechnung in nichtlinearen Modellen, Deutsche Geod.
Kommission C 429, München 1994
Long, J.S. and L.H. Ervin (2000): Using heteroscedasticity consistent standard errors in the
linear regression model, The American Statistician 54 (2000) 217-224
Longford, N.T. (1993): Random coefficient models, Clarendon Press, Oxford 1993
Longford, N. (1995): Random coefficient models, Oxford University Press, 1995
Longley, J.W. and R.D. Longley (1997): Accuracy of Gram-Schmidt orthogonalization
and Householder transformation for the solution of linear least squares problems, Nu-
merical Linear Algebra with Applications 4 (1997) 295-303
López-Blázquez, F. (2000): Unbiased estimation in the non-central Chi-Square distribu-
tion, J. Multivariate Analysis 75 (2000) 1-12
Lord, R.D. (1948): A problem with random vectors, Phil. Mag. 39 (1948) 66-71
Lord, R.D. (1954): The use of the Hankel transform in statistics, I. General theory and
examples, Biometrika 41 (1954) 44-55
Lorentz, G.G. (1966): Metric entropy and approximation, Bull. American Math. Soc. 72
(1966) 903-937
Ludlow, J. and W. Enders (2000): Estimating non-linear ARMA models using Fourier
coefficients, International J. Forecasting 16 (2000) 333-347
Lütkepohl, H. (1996): Handbook of matrices, J. Wiley, Chichester U.K. 1996
Lund, U. (1999): Least circular distance regression for directional data, J. Applied Statis-
tics 26 (1999) 723-733
Macinnes, C.S. (1999): The solution to a structured matrix approximation problem using
Grassmann coordinates, SIAM J. Matrix Analysis Appl. 21 (1999) 446-453
Madansky, A. (1959): The fitting of straight lines when both variables are subject to error,
J. American Statistical Ass. 54 (1959) 173-205
Madansky, A. (1962): More on length of confidence intervals, J. Amer. Statist. Assoc. 57
(1962) 586-589
Maejima, M. (1978): Some Lp versions for the central limit theorem, Ann. Probability 6
(1978) 341-344
Maekkinen, J. (2002): A bound for the Euclidean norm of the difference between the best
linear unbiased estimator and a linear unbiased estimator, J. Geodesy 76 (2002) 317-
322
Maess, G. (1988): Vorlesungen über numerische Mathematik II. Analysis, Birkhäuser
Verlag, Basel Boston 1988
Magder, L.S. and S.L. Zeger (1996): A smooth nonparametric estimate of a mixing distri-
bution using mixtures of Gaussians, J. Amer. Statist. Assoc. 91 (1996) 1141-1151
Magee, L. (1998): Improving survey-weighted least squares regression, J. Roy. Statist.
Soc. B 60 (1998) 115-126
Magness, T.A. and J.B. McGuire (1962): Comparison of least-squares and minimum
variance estimates of regression parameters, Ann. Math. Statist. 33 (1962) 462-470
Magnus, J.R. and H. Neudecker (1988): Matrix differential calculus with applications in
statistics and econometrics, Wiley, Chichester 1988
Mahalanabis, A. and M. Farooq (1971): A second-order method for state estimation of
nonlinear dynamical systems, Int. J. Contr. 14 (1971) 631-639
Mahalanobis, P.C., Bose, R.C. and S.N. Roy (1937): Normalisation of statistical variates
and the use of rectangular coordinates in the theory of sampling distributions, Sank-
hya 3 (1937) 1-40
Mallet, A. (1986): A maximum likelihood estimation method for random coefficient
regression models, Biometrika 73 (1986) 645-656
Malliavin, P. (1997): Stochastic analysis, Springer Verlag, New York 1997
Mallows, C.L. (1961): Latent vectors of random symmetric matrices, Biometrika 48
(1961) 133-149
Malyutov, M.B. and R.S. Protassov (1999): Functional approach to the asymptotic nor-
mality of the non-linear least squares estimator, Statistics & Probability Letters 44
(1999) 409-416
Mamontov, Y. and M. Willander (2001): High-dimensional nonlinear diffusion stochastic
processes, World Scientific, Singapore 2001
Mandel, J. (1994): The analysis of two-way layouts, Chapman and Hall, Boca Raton 1994
Mangasarian, O.L. and D.R. Musicant (2000): Robust linear and support vector regres-
sion, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 950-
955
Mangoubi, R.S. (1998): Robust estimation and failure detection, Springer Verlag, Berlin-
Heidelberg New York 1998
Manly, B.F.J. (1976): Exponential data transformations, The Statistician 25 (1976) 37-42
Mardia, K.V. (1962): Multivariate Pareto distributions, Ann. Math. Statistics 33 (1962)
1008-1015
Mardia, K.V. (1970): Measures of multivariate skewness and kurtosis with applications,
Biometrika 57 (1970) 519-530
Mardia, K.V. (1972): Statistics of directional data, Academic Press, New York 1972
Mardia, K.V. (1975a): Characterization of directional distributions, Statistical Distribu-
tions, Scientific Work 3 (1975), G.P. Patil et al. (eds.), 365-385
Mardia, K.V. (1975b): Statistics of directional data, J. Royal Statistical Society, Series B,
37 (1975) 349-393
Mardia, K.V. (1976): Linear-circular correlation coefficients and rhythmometry, Bio-
metrika 63 (1976) 403-405
Mardia, K.V. (1988): Directional data analysis: an overview, J. Applied Statistics 2
(1988) 115-122
Mardia, K.V. and M.L. Puri (1978): A robust spherical correlation coefficient against
scale, Biometrika 65 (1978) 391-396
Mardia, K.V., Kent, J. and J.M. Bibby (1979): Multivariate analysis, Academic Press, London
1979
Mardia, K.V., Southworth, H.R. and C.C. Taylor (1999): On bias in maximum likelihood
estimators, J. Statist. Planning and Inference 76 (1999) 31-39
Mardia, K.V. and P.E. Jupp (1999): Directional statistics, J. Wiley, England 1999
Marinkovic, P., Grafarend, E.W. and T. Reubelt (2003): Space gravity spectroscopy: the
benefits of Taylor-Karman structured criterion matrices, Advances in Geosciences 1
(2003) 113-120
Mariwalla, K.H. (1971): Coordinate transformations that form groups in the large, in: De
Sitter and Conformal Groups and their Applications, A.O. Barut and W.E. Brittin
(eds.), vol. 13, pages 177-191, Colorado Associated University Press, Boulder 1971
Markatou, M. and X. He (1994): Bounded influence and high breakdown point testing
procedures in linear models, J. Am. Statist. Assn. 89 (1994) 543-549
Markiewicz, A. (1996): Characterization of general ridge estimators, Statistics & Prob-
ability Letters 27 (1996) 145-148
Markov, A.A. (1912): Wahrscheinlichkeitsrechnung, 2nd edition, Teubner, Leipzig 1912
Marošević, T. and D. Jukić (1997): Least orthogonal absolute deviations problem for
exponential function, Student 2 (1997) 131-138
Marquardt, D.W. (1963): An algorithm for least-squares estimation of nonlinear parame-
ters, J. Soc. Indust. Appl. Math. 11 (1963) 431-441
Marquardt, D.W. (1970): Generalized inverses, ridge regression, biased linear estimation
and nonlinear estimation, Technometrics 12 (1970) 591-612
Marriott, J. and P. Newbold (1998): Bayesian comparison of ARIMA and stationary
ARMA models, International Statistical Review 66 (1998) 323-336
Marsaglia, G. and G.P.H. Styan (1972): When does rank (A+B) = rank A + rank B?,
Canad. Math. Bull. 15 (1972) 451-452
Marshall, J. (2002): L1-norm pre-analysis measures for geodetic networks, J. Geodesy 76
(2002) 334-344
Martinek, Z. (2002a): Spherical harmonic analysis of regularly distributed data on a
sphere with a uniform or a non-uniform distribution of data uncertainties, or, shortly,
what I know about: Scalar surface spherical harmonics, Geodätisches Oberseminar
Stuttgart 2002
Martinek, Z. (2002b): Lecture Notes 2002. Scalar surface spherical harmonics, Geo For-
schungs Zentrum Potsdam 2002
Martinez, W.L. and E.J. Wegman (2000): An alternative criterion useful for finding exact
E-optimal designs, Statistic & Probability Letters 47 (2000) 325-328
Maruyama, Y. (1998): Minimax estimators of a normal variance, Metrika 48 (1998) 209-
214
Masry, E. (1997): Additive nonlinear ARX time series and projection estimates, Economet-
ric Theory 13 (1997) 214-252
Mastronardi, N., Lemmerling, P. and S. van Huffel (2000): Fast structured total least
squares algorithm for solving the basic deconvolution problem, SIAM J. Matrix Anal.
Appl. 22 (2000) 533-553
Matérn, B. (1989): Precision of area estimation: a numerical study, J. Microscopy 153
(1989) 269-284
Mathai, A.M. (1997): Jacobians of matrix transformations and functions of matrix argu-
ments, World Scientific, Singapore 1997
Mathar, R. (1997): Multidimensionale Skalierung: mathematische Grundlagen und algo-
rithmische Aspekte, Teubner Stuttgart 1997
Mathew, T. (1989): Optimum invariant tests in mixed linear models with two variance
components, Statistical Data Analysis and Inference, Y. Dodge (ed.), North-Holland
(1989) 381-388
Mathew, T. (1997): Wishart and Chi-Square Distributions Associated with Matrix Quad-
ratic Forms, J. Multivariate Analysis 61 (1997) 129-143
Mathew, T. and B.K. Sinha (1988): Optimum tests in unbalanced two-way models with-
out interaction, Ann. Statist. 16 (1988) 1727-1740
Mathew, T. and S. Kasala (1994): An exact confidence region in multivariate calibration,
The Annals of Statistics 22 (1994) 94-105
Mathew, T. and W. Zha (1996): Conservative confidence regions in multivariate calibra-
tion, The Annals of Statistics 24 (1996) 707-725
Mathew, T. and K. Nordstroem (1997): An inequality for a measure of deviation in linear
models, The American Statistician 51 (1997) 344-349
Mathew, T. and W. Zha (1997): Multiple use confidence regions in multivariate calibra-
tion, J. American Statist. Assoc. 92 (1997) 1141-1150
Mathew, T. and W. Zha (1998): Some single use confidence regions in a multivariate
calibration problem, Applied Statist. Science III (1998) 351-363
Mathew, T., Sharma, M.K. and K. Nordström (1998): Tolerance regions and multiple-use
confidence regions in multivariate calibration, The Annals of Statistics 26 (1998)
1989-2013
Mathias, R. and G.W. Stewart (1993): A block QR algorithm and the singular value de-
composition, Linear Algebra Appl. 182 (1993) 91-100
Mautz, R. (2002): Solving nonlinear adjustment problems by global optimization, Boll. di
Geodesia e Scienze Affini 61 (2002) 123-134
Maxwell, S.E. (2003): Designing experiments and analyzing data. A model comparison
perspective, Lawrence Erlbaum Associates, Publishers, London - New Jersey 2003
Maybeck, P. (1979): Stochastic models, estimation and control, vol. 1, Academic Press,
New York 1979
Mayer, D.H. (1975): Vector and tensor fields on conformal space, J. Math. Physics 16
(1975) 884-893
Mazya, V. and T. Shaposhnikova (1998): Jacques Hadamard, a universal mathematician,
American Mathematical Society, Providence 1998
McCullagh, P. (1983): Quasi-likelihood functions, The Annals of Statistics 11 (1983) 59-
67
McCullagh, P. (1987): Tensor methods in statistics, Chapman and Hall, London 1987
McCullagh, P. and J.A. Nelder (1989): Generalized linear models, Chapman and Hall,
London 1989
McCulloch, C.E. (1997): Maximum likelihood algorithms for generalized linear mixed
models, J. American Statist. Assoc. 92 (1997) 162-170
McCulloch, C.E. and S.R. Searle (2002): Generalized, linear, and mixed models, Wiley
Series in Probability and Statistics, 2002
McElroy, F.W. (1967): A necessary and sufficient condition that ordinary least-squares
estimators be best linear unbiased, J. American Statistical Association 62 (1967)
1302-1304
McGilchrist, C.A. (1994): Estimation in generalized mixed models, J. Roy. Statist. Soc.,
Series B, 56 (1994) 61-69
McGilchrist, C.A. and C.W. Aisbett (1991): Restricted BLUP for mixed linear models,
Biometrical Journal 33 (1991) 131-141
McGilchrist, C.A. and K.K.W. Yau (1995): The derivation of BLUP, ML, REML estima-
tion methods for generalised linear mixed models, Commun. Statist.-Theor. Meth. 24
(1995) 2963-2980
McMorris, F.R. (1997): The median function on structured metric spaces, Student 2
(1997) 195-201
Mehta, M.L. (1991): Random matrices, Academic Press, New York 1991
Meissl, P. (1965): Über die Innere Genauigkeit dreidimensionaler Punkthaufen, Z. Ver-
messungswesen 90 (1965) 198-118
Meissl, P. (1969): Zusammenfassung und Ausbau der inneren Fehlertheorie eines Punkt-
haufens, Deutsche Geod. Kommission A 61, München 1969, 8-21
Meissl, P. (1976): Hilbert spaces and their application to geodetic least-squares problems,
Bolletino di Geodesia e Scienze Affini 35 (1976) 49-80
Melbourne, W. (1985): The case for ranging in GPS-based geodetic systems, Proc. 1st Int.
Symp. on Precise Positioning with GPS, Rockville, Maryland (1985) 373-386
Menz, J. (2000): Forschungsergebnisse zur Geomodellierung und deren Bedeutung, Ma-
nuskript 2000
Menz, J. and N. Kolesnikov (2000): Bestimmung der Parameter der Kovarianzfunktionen
aus den Differenzen zweier Vorhersageverfahren, Manuskript, 2000
Merriman, M. (1877): On the history of the method of least squares, The Analyst 4 (1877)
Merriman, M. (1884): A textbook on the method of least squares, Wiley, New York 1884
Meyer, C.D. (1973): Generalized inverses and ranks of block matrices, SIAM J. Appl.
Math. 25 (1973) 597-602
Mhaskar, H.N., Narcowich, F.J and J.D. Ward (2001): Representing and analyzing scat-
tered data on spheres, In: Multivariate approximations and applications, Cambridge
University Press, Cambridge 2001, 44-72
Midi, H. (1999): Preliminary estimators for robust non-linear regression estimation, J.
Applied Statistics 26 (1999) 591-600
Migon, H.S. and D. Gammermann (1999): Statistical inference, Arnold London 1999
Miller, R.G. (1981): Simultaneous statistical inference, Springer Verlag 1981
Minami, M. and K. Shimizu (1998): Estimation for a common correlation coefficient in
bivariate normal distributions with missing observations, Biometrics 54 (1998) 1136-
1146
Minzhong, J. and C. Xiru (1999): Strong consistency of least squares estimate in multiple
regression when the error variance is infinite, Statistica Sinica 9 (1999) 289-296
Minkler, G. and J. Minkler (1993): Theory and application of Kalman filtering, Magellan
Book Company 1993
Misra, R.K. (1996): A multivariate procedure for comparing mean vectors for populations
with unequal regression coefficient and residual covariance matrices, Biom. J. 38
(1996) 415-424
Mitra, S.K. (1982): Simultaneous diagonalization of rectangular matrices, Linear Algebra
Appl. 47 (1982) 139-150
Mulekar, M.S. and S.N. Mishra (2000): Confidence interval estimation of overlap: Equal
means case, Computational Statistics & Data Analysis 34 (2000) 121-137
Mohan, S.R. and S.K. Neogy (1996): Algorithms for the generalized linear complemen-
tarity problem with a vertical block Z-matrix, SIAM J. Optimization 6 (1996) 994-
1006
Moire, C. and J.A. Dawson (1992): Distribution, Chapman and Hall, Boca Raton 1996
Money, A.H. et al. (1982): The linear regression model: Lp-norm estimation and the
choice of p, Commun. Statist. Simul. Comput. 11 (1982) 89-109
Monin, A.S. and A.M. Yaglom (1981): Statistical fluid mechanics: mechanics of turbu-
lence, vol. 2, The MIT Press, Cambridge 1981
Montgomery, D.C. (1996): Introduction to statistical quality control, 3rd edition, J. Wiley,
New York 1996
Mood, A.M., F.A. Graybill and D.C. Boes (1974): Introduction to the theory of statistics,
3rd ed., McGraw-Hill, New York 1974
Moon, M.S. and R.F. Gunst (1994): Estimation of the polynomial errors-in-variables
model with decreasing error variances, J. Korean Statist. Soc. 23 (1994) 115-134
Moon, M.S. and R.F. Gunst (1995): Polynomial measurement error modelling, Comput.
Statist. Data Anal. 19 (1995) 1-21
Moore, E.H. (1900): A fundamental remark concerning determinantal notations with the
evaluation of an important determinant of special form, Ann. Math. 2 (1900) 177-188
Moore, E.H. (1920): On the reciprocal of the general algebraic matrix, Bull. Amer. Math.
Soc. 26 (1920) 394-395
Morgan, B.J.T. (1992): Analysis of quantal response data, Chapman and Hall, Boca Raton
1992
Morgenthaler, S. (1992): Least-absolute-deviations fits for generalized linear models,
Biometrika 79 (1992) 747-754
Morgera, S. (1992): The role of abstract algebra in structured estimation theory, IEEE
Trans. Inform. Theory 38 (1992) 1053-1065
Moritz, H. (1976): Covariance functions in least-squares collocation, Rep. Ohio State Uni.
240, 1976
Morris, C.N. (1982): Natural exponential families with quadratic variance functions, Ann.
Statist. 10 (1982) 65-80
Morrison, D.F. (1967): Multivariate statistical methods, McGraw-Hill Book Comp., New
York 1967
Morrison, T.P. (1997): The art of computerized measurement, Oxford University Press,
Oxford 1997
Morsing, T. and C. Ekman (1998): Comments on construction of confidence intervals in
connection with partial least-squares, J. Chemometrics 12 (1998) 295-299
Moser, B.K. and M.H. McCann (1996): Maximum likelihood and restricted maximum
likelihood estimators as functions of ordinary least squares and analysis of variance
estimators, Commun. Statist.-Theory Meth. 25 (1996) 631-646
Moser, B.K. and J.K. Sawyer (1998): Algorithms for sums of squares and covariance
matrices using Kronecker Products, The American Statistician 52 (1998) 54-57
Moutard, T. (1894): Notes sur la propagation des ondes et les équations de l'hydrodyna-
mique, Paris 1893, reprint Chelsea Publ., New York 1949
Mudholkar, G.S. (1997): On the efficiencies of some common quick estimators, Commun.
Statist.-Theory Meth. 26 (1997) 1623-1647
Mueller, C.H. (1997): Robust planning and analysis of experiments, Springer Verlag,
New York 1997
Mueller, C.H. (1998): Breakdown points of estimators for aspects of linear models, in:
MODA 5 – Advances in model-oriented data analysis and experimental design, pp.
137-144, Atkinson, A.K., Pronzato, L. and H.P. Wynn (eds), Physica Verlag 1998
Mueller, C.H. (2003): Robust estimators for estimating discontinuous functions, Devel-
opments in Robust Statistics, pp. 266-277, Physica Verlag, Heidelberg 2003
Mueller-Gronbach, T. (1996): Optimal designs for approximating the path of a stochastic
process, J. Statistical Planning and Inference 49 (1996) 371-385
Mueller, H. (1983): Strenge Ausgleichung von Polygonnetzen unter rechentechnischen
Aspekten, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaf-
ten C279, München 1983
Mueller, H. (1985): Second-order design of combined linear-angular geodetic networks,
Bulletin Geodésique, 59 (1985), 316-331
Mueller, J. (1987): Sufficiency and completeness in the linear model, J. Multivariate
Anal. 21 (1987) 312-323
Mueller, J., Rao, C.R. and B.K. Sinha (1984): Inference on parameters in a linear model: a
review of recent results, in: Experimental design, statistical models and genetic statis-
tics, K. Hinkelmann (ed.), Chap. 16, Marcel Dekker, New York 1984
Mueller, W. (1995): Ein Beispiel zur Versuchsplanung bei korrelierten Beobachtun-
gen, Österreichische Zeitschrift für Statistik 24 (1995) 9-15
Mueller, W. (1998a): Spatial data collection, contributions to statistics, Physica Verlag,
Heidelberg 1998
Mueller, W. (1998b): Collecting spatial data - optimum design of experiments for random
fields, Physica-Verlag, Heidelberg 1998
Mueller, W. (2001): Collecting spatial data - optimum design of experiments for random
fields, 2nd ed., Physica-Verlag, Heidelberg 2001
Mueller, W. and A. Pázman (1998): Design measures and extended information matrices
for experiments without replications, J. Statist. Planning and Inference 71 (1998) 349-
362
Mueller, W. and A. Pázman (1999): An algorithm for the computation of optimum de-
signs under a given covariance structure, Computational Statistics 14 (1999) 197-211
Mukhopadhyay, P. and R. Schwabe (1998): On the performance of the ordinary least
squares method under an error component model, Metrika 47 (1998) 215-226
Muir, T. (1911): The theory of determinants in the historical order of development, vol-
umes 1-4, Dover, New York 1911, reprinted 1960
Muirhead, R.J. (1982): Aspects of multivariate statistical theory, J. Wiley, New York
1982
Mukherjee, K. (1996): Robust estimation in nonlinear regression via minimum distance
method, Mathematical Methods of Statistics 5 (1996) 99-112
Mukhopadhyay, N. (2000): Probability and statistical inference, Dekker, New York 2000
Muller, C. (1966): Spherical harmonics – Lecture notes in mathematics 17 (1966),
Springer-Verlag, New York, 45pp.
Muller, D. and W.W.S. Wei (1997): Iterative least squares estimation and identification of
the transfer function model, J. Time Series Analysis 18 (1997) 579-592
Muller, H. and M. Illner (1984): Gewichtsoptimierung geodätischer Netze. Zur Anpas-
sung von Kriteriumsmatrizen bei der Gewichtsoptimierung, Allg. Vermessungsnach-
richten (1984), 253-269
Muller, H. and G. Schmitt (1985): SODES2 – Ein Programm-System zur Gewichtsopti-
mierung zweidimensionaler geodätischer Netze. Deutsche Geodätische Kommission,
München, Reihe B, 276 (1985)
Munoz-Pichardo, J.M., Munoz-García, J., Fernández-Ponce, J.M. and F. López-Blázquez
(1998): Local influence on the general linear model, Sankhya: The Indian J. Statistics
60 (1998) 269-292
Murray, J.K. and J.W. Rice (1993): Differential geometry and statistics, Chapman and
Hall, Boca Raton 1993
Myers, J.L. (1979): Fundamentals of experimental designs, Allyn and Bacon, Boston
1979
Nadaraya, E. (1993): Limit distribution of the integrated squared error of trigonometric
series regression estimator, Proceedings of the Georgian Academy of Sciences.
Mathematics 1 (1993) 221-237
Naes, T. and H. Martens (1985): Comparison of prediction methods for multicollinear
data, Commun. Statist. Simulat. Computa. 14 (1985) 545-576
Naether, W. (1985): Exact designs for regression models with correlated errors, Statistics
16 (1985) 479-484
Nagaev, S.V. (1979): Large deviations of sums of independent random variables, Ann.
Probability 7 (1979) 745-789
Nagar, D.K. and A.K. Gupta (1996): On a test statistic useful in Manova with structured
covariance matrices, J. Appl. Stat. Science 4 (1996) 185-202
Nagaraja, H.N. (1982): Record values and extreme value distributions, J. Appl. Prob. 19
(1982) 233-239
Nagarsenker, B.N. (1977): On the exact non-null distributions of the LR criterion in a
general MANOVA model, Sankhya A39 (1977) 251-263
Nahin, P.J. (2004): When least is best, how mathematicians discovered many clever ways
to make things as small (or as large) as possible, Princeton University Press, Prince-
ton and Oxford 2004
Nakamura, N. and S. Konishi (1998): Estimation of a number of components for multi-
variate normal mixture models based on information criteria, Japanese J. Appl. Statist.
27 (1998) 165-180
Nakamura, T. (1990): Corrected score function for errors-in-variables models: methodol-
ogy and application to generalized linear models, Biometrika 77 (1990) 127-137
Nandi, S. and D. Kundu (1999): Least-squares estimators in a stationary random field, J.
Indian Inst. Sci. 79 (1999) 75-88
Nashed, M.Z. (1976): Generalized inverses and applications, Academic Press, New York
1976
Nelson, R. (1995): Probability, stochastic processes, and queueing theory, Springer Ver-
lag, New York 1995
Nesterov, Y.E. and A.S. Nemirovskii (1992): Interior point polynomial methods in con-
vex programming, Springer Verlag, New York 1992
Neudecker, H. (1968): The Kronecker matrix product and some of its applications in
econometrics, Statistica Neerlandica 22 (1968) 69-82
Neudecker, H. (1969): Some theorems on matrix differentiation with special reference to
Kronecker matrix products, J. American Statist. Assoc. 64 (1969) 953-963
Neudecker, H. (1977): Bounds for the bias of the least squares estimator of σ² in the case
of a first-order autoregressive process (positive autocorrelation), Econometrica 45
(1977) 1257-1262
Neudecker, H. (1978): Bounds for the bias of the LS estimator of σ² in the case of a first-
order (positive) autoregressive process when the regression contains a constant term,
Econometrica 46 (1978) 1223-1226
Neumaier, A. (1998): Solving ill-conditioned and singular systems: a tutorial on regu-
larization, SIAM Rev. 40 (1998) 636-666
Neuts, M.F. (1995): Algorithmic probability, Chapman and Hall, Boca Raton 1995
Neutsch, W. (1995): Koordinaten: Theorie und Anwendungen, Spektrum Akademischer
Verlag, Heidelberg 1995
Newey, W.K. and J.L. Powell (1987): Asymmetric least squares estimation and testing,
Econometrica 55 (1987) 819-847
Newman, D. (1939): The distribution of range in samples from a normal population,
expressed in terms of an independent estimate of standard deviation, Biometrika 31
(1939) 20-30
Neykov, N.M. and C.H. Mueller (2003): Breakdown point and computation of trimmed
likelihood estimators in generalized linear models, in: R. Dutter, P. Filzmoser, U.
Gather, P.J. Rousseeuw (Eds.), Developments in Robust Statistics, pp. 277-286,
Physica Verlag, Heidelberg 2003
Neyman, J. (1934): On the two different aspects of the representative method, J. Royal
Statist. Soc. 97 (1934) 558-606
Neyman, J. (1937): Outline of the theory of statistical estimation based on the classical
theory of probability, Phil. Trans. Roy. Soc. London 236 (1937) 333-380
Ng, M.K. (2000): Preconditioned Lanczos methods for the minimum eigenvalue of a
symmetric positive definite Toeplitz matrix, SIAM J. Sci. Comput. 21 (2000) 1973-
1986
Nicholson, W.K. (1999): Introduction to abstract algebra, 2nd ed., J. Wiley, New York
1999
Niemeier, W. (1985): Deformationsanalyse, in: Geodätische Netze in Landes- und Ingeni-
eurvermessung II, Kontaktstudium, ed. H. Pelzer, Wittwer, Stuttgart 1985
Nkuite, G. (1998): Ausgleichung mit singulärer Varianzkovarianzmatrix am Beispiel der
geometrischen Deformationsanalyse, Dissertation, TU München, München 1998
Nobre, S. and M. Teixeiras (2000): Der Geodät Wilhelm Jordan und C.F. Gauss, Gauss-
Gesellschaft e.V. Goettingen, Mitt. 38, pp. 49-54 Goettingen 2000
Nyquist, H. (1988): Least orthogonal absolute deviations, Comput. Statist. Data Anal. 6
(1988) 361-367
O'Neill, M., Sinclair, L.G. and F.J. Smith (1969): Polynomial curve fitting when abscissa
and ordinate are both subject to error, Comput. J. 12 (1969) 52-56
O'Neill, M.E. and K. Mathews (2000): A weighted least squares approach to Levene's test
of homogeneity of variance, Austral. & New Zealand J. Statist. 42 (2000) 81-100
Offlinger, R. (1998): Least-squares and minimum distance estimation in the three-
parameter Weibull and Fréchet models with applications to river drain data, in: Kahle,
et al. (eds.) Advances in stochastic models for reliability, quality and safety, pages 81-
97, Birkhäuser Verlag, Boston 1998
Ogawa, J. (1950): On the independence of quadratic forms in a non-central normal sys-
tem, Osaka Mathematical Journal 2 (1950) 151-159
Ohtani, K. (1988a): Optimal levels of significance of a pre-test in estimating the distur-
bance variance after the pre-test for a linear hypothesis on coefficients in a linear re-
gression, Econom. Lett. 28 (1988) 151-156
Ohtani, K. (1998b): On the sampling performance of an improved Stein inequality re-
stricted estimator, Austral. and New Zealand J. Statis. 40 (1998) 181-187
Ohtani, K. (1998c): The exact risk of a weighted average estimator of the OLS and Stein-
rule estimators in regression under balanced loss, Statistics & Decisions 16 (1998) 35-
45
Ohtani, K. (1996): Further improving the Stein-rule estimator using the Stein variance
estimator in a misspecified linear regression model, Statist. Probab. Lett. 29 (1996)
191-199
Okamoto, M. (1973): Distinctness of the eigenvalues of a quadratic form in a multivariate
sample, Ann. Stat. 1 (1973) 763-765
Okeke, F. and F. Krumm (1998): Graph, graph spectra and partitioning algorithms in a
geodetic network structural analysis and adjustment, Bolletino di Geodesia e Scienze
Affini 57 (1998) 1-24
Olkin, I. (1998): The density of the inverse and pseudo-inverse of a random matrix, Statis-
tics and Probability Letters 38 (1998) 131-135
Olkin, I. (2000): The 70th anniversary of the distribution of random matrices: a survey,
Technical Report No. 2000-06, Department of Statistics, Stanford University, Stan-
ford 2000
Olkin, I. and S.N. Roy (1954): On multivariate distribution theory, Ann. Math. Statist. 25
(1954) 329-339
Olkin, I. and A.R. Sampson (1972): Jacobians of matrix transformations and induced
functional equations, Linear Algebra Appl. 5 (1972) 257-276
Olkin, I. and J.W. Pratt (1958): Unbiased estimation of certain correlation coefficients,
Annals Mathematical Statistics 29 (1958) 201-211
Olsen, A., Seely, J. and D. Birkes (1976): Invariant quadratic unbiased estimators for two
variance components, Annals of Statistics 4 (1976) 878-890
Ord, J.K. and S. Arnold (1997): Kendall’s advanced theory of statistics, volume IIA,
classical inference, Arnold Publ., 6th edition, London 1997
Ortega, J.M. and W.C. Rheinboldt (2000): Iterative solution of nonlinear equations in
several variables, SIAM 2000
Osborne, M.R. (1972): Some aspects of nonlinear least squares calculations, Numerical
Methods for Nonlinear Optimization, ed. F.A. Lootsma, Academic Press, New York
1972
Osborne, M.R. (1976): Nonlinear least squares the Levenberg algorithm revisited, J. Aust.
Math. Soc. B 19 (1976) 343-357
Osborne, M.R. and G.K. Smyth (1986): An algorithm for exponential fitting revisited, J.
App. Prob. (1986) 418-430
Osborne, M.R. and G.K. Smyth (1995): A modified Prony algorithm for fitting sums of
exponential functions, SIAM J. Sc. and Stat. Comp. 16 (1995) 119-138
Osiewalski, J. and M.F.J. Steel (1993): Robust Bayesian inference in elliptical regression
models, J. Econometrics 57 (1993) 345-363
Ouellette, D.V. (1981): Schur complements and statistics, Linear Algebra Appl. 36 (1981)
187-295
Owens, W.H. (1973): Strain modification of angular density distributions, Tectonophysics
16 (1973) 249-261
Oyet, A.J. and D.P. Wiens (2000): Robust designs for wavelet approximations of regres-
sion models, Nonparametric Statistics 12 (2000) 837-859
Ovtchinnikov, E.E. and L.S. Xanthis (2001): Successive eigenvalue relaxation: a new
method for the generalized eigenvalue problem and convergence estimates, Proc. R.
Soc. Lond. A 457 (2001) 441-451
Padmawar, V.R. (1998): On estimating nonnegative definite quadratic forms, Metrika 48
(1998) 231-244
Pagano, M. (1978): On periodic and multiple autoregressions, Annals of Statistics 6
(1978) 1310-1317
Paige, C.C. and M.A. Saunders (1975): Solution of sparse indefinite systems of linear
equations, SIAM J. Numer. Anal. 12 (1975) 617-629
Paige, C. and C. van Loan (1981): A Schur decomposition for Hamiltonian matrices,
Linear Algebra and its Applications 41 (1981) 11-32
Pakes, A.G. (1999): On the convergence of moments of geometric and harmonic means,
Statistica Neerlandica 53 (1999) 96-110
Pal, N. and W.K. Lim (2000): Estimation of a correlation coefficient: some second order
decision – theoretic results, Statistics and Decisions 18 (2000) 185-203
Pal, S.K. and P.P. Wang (1996): Genetic algorithms for pattern recognition, CRC Press,
Boca Raton 1996
Papoulis, A. (1991): Probability, random variables and stochastic processes, McGraw
Hill, New York 1991
Park, H. (1991): A parallel algorithm for the unbalanced orthogonal Procrustes problem,
Parallel Computing 17 (1991) 913-923
Parker, W.V. (1945): The characteristic roots of matrices, Duke Math. J. 12 (1945) 519-
526
Parthasarathy, K.R. (1967): Probability measures on metric spaces, Academic Press, New
York 1967
Parzen, E. (1962): On estimation of a probability density function and mode, Ann. Math.
Statistics 33 (1962) 1065-1073
Patel, J.K. and C.B. Read (1982): Handbook of the normal distribution, Marcel Dekker,
New York and Basel 1982
Patil, V.H. (1965): Approximation to the Behrens-Fisher distribution, Biometrika 52
(1965) 267-271
Pázman, A. (1986): Foundations of optimum experimental design, Mathematics and its
applications, D. Reidel, Dordrecht 1986
Pázman, A. and J.-B. Denis (1999): Bias of LS estimators in nonlinear regression models
with constraints. Part I: General case, Applications of Mathematics 44 (1999) 359-374
Pázman, A. and W.G. Mueller (1998): A new interpretation of design measures, in:
MODA 5 – Advances in model-oriented data analysis and experimental design, pp.
239-246, Atkinson, A.K., Pronzato, L. and H.P. Wynn (eds), Physica Verlag 1998
Pearson, E.S. (1970): William Sealy Gosset, 1876-1937: "Student" as a statistician, Stud-
ies in the History of Statistics and Probability (E.S. Pearson and M.G. Kendall, eds.)
Hafner Publ., 360-403, New York 1970
Pearson, E.S. and H.O. Hartley (1958): Biometrika Tables for Statisticians Vol. 1, Cam-
bridge University Press, Cambridge 1958
Pearson, K. (1905): The problem of the random walk, Nature 72 (1905) 294
Pearson, K. (1906): A mathematical theory of random migration, Mathematical Contribu-
tions to the Theory of Evolution, XV Draper’s Company Research Memoirs, Bio-
metrik Series III, London 1906
Pearson, K. (1931): Historical note on the distribution of the Standard Deviations of
Samples of any size from any indefinitely large Normal Parent Population, Bio-
metrika 23 (1931) 416-418
Peddada, S.D. and T. Smith (1997): Consistency of a class of variance estimators in linear
models under heteroscedasticity, Sankhya 59 (1997) 1-10
Pelzer, H. (1971): Zur Analyse geodätischer Deformationsmessungen, Deutsche Geodäti-
sche Kommission, Akademie der Wissenschaften, Reihe C (164), München 1971
Pelzer, H. (1974): Zur Behandlung singulärer Ausgleichungsaufgaben, Z. Vermessungs-
wesen 99 (1974) 181-194, 479-488
Pena, D., Tiao, G.C., and R.S. Tsay (2001): A course in time series analysis, Wiley, New
York 2001
Penrose, R. (1955): A generalised inverse for matrices, Proc. Cambridge Phil. Soc. 51
(1955) 406-413
Penny, K.I. (1996): Appropriate critical values when testing for a single multivariate
outlier by using the Mahalanobis distance, in: Applied Statistics, ed. S.M. Lewis and
D.A. Preece, J. Royal Stat. Soc. 45 (1996) 73-81
Percival, D.B. and A.T. Walden (1993): Spectral analysis for physical applications,
Cambridge University Press, Cambridge 1993
Pereyra, V. and G. Scherer (1973): Efficient computer manipulation of tensor products
with application to multidimensional approximation, Math. Computation 27 (1973)
595-605
Perron, F. and N. Giri (1992): Best equivariant estimation in curved covariance models, J.
Multivariate Analysis 40 (1992) 46-55
Perron, F. and N. Giri (1990): On the best equivariant estimator of mean of a multivariate
normal population, J. Multivariate Analysis 32 (1990) 1-16
Percival, D.B. and A.T. Walden (1999): Wavelet methods for time series analysis, Cam-
bridge University Press, Cambridge 1999
Petrov, V.V. (1975): Sums of independent random variables, Berlin 1975
Pfeufer, A. (1990): Beitrag zur Identifikation und Modellierung dynamischer Deformati-
onsprozesse, Vermessungstechnik 38 (1990) 19-22
Pfeufer, A. (1993): Analyse und Interpretation von Überwachungsmessungen - Termino-
logie und Klassifikation, Z. Vermessungswesen 118 (1993) 19-22
Phillips, G.M. (2000): Two millennia of mathematics – From Archimedes to Gauss,
Springer 2000
Piepho, H.-P. (1998): An algorithm for fitting the shifted multiplicative model, J. Statist.
Comput. Simul. 62 (1998) 29-43
Pilz, J. (1983): Bayesian estimation and experimental design in linear regression models,
Teubner-Texte zur Mathematik 55, Teubner, Leipzig 1983
Pincus, R. (1974): Estimability of parameters of the covariance matrix and variance com-
ponents, Math. Operationsforschg. Statistik 5 (1974) 245-248
Pinheiro, J.C. and D.M. Bates (2000): Mixed-effects models in S and S-Plus, Statistics
and Computing, Springer-Verlag, New York 2000
Pison, G., Van Aelst, S. and G. Willems (2003): Small sample corrections for LTS and
MCD, Developments in Robust Statistics, pp. 330-343, Physica Verlag, Heidelberg
2003
Pistone, G. and M.P. Rogantin (1999): The exponential statistical manifold: mean pa-
rameters. Orthogonality and space transformations, Bernoulli 5 (1999) 721-760
Pitman, E.J.G. (1979): Some basic theory for statistical inference, Chapman and Hall,
Boca Raton 1979
Pitman, J. and M. Yor (1981): Bessel processes and infinitely divisible laws, unpublished
report, University of California, Berkeley
Plachky, D. (1993): An estimation-theoretical characterization of the Poisson distribution,
Statistics and Decisions, Supplement Issue 3 (1993) 175-178
Plackett, R.L. (1949): A historical note on the method of least-squares, Biometrika 36
(1949) 458-460
Plackett, R.L. (1972): The discovery of the method of least squares, Biometrika 59 (1972)
239-251
Plato, R. (1990): Optimal algorithms for linear ill-posed problems yield regularization
methods, Numer. Funct. Anal. Optim. 11 (1990) 111-118
Plemmons, R.J. (1990): Recursive least squares computation, Proceedings of the Interna-
tional Symposium MTNS 3 (1990) 495-502
Pohst, M. (1987): A modification of the LLL reduction algorithm, J. Symbolic Computa-
tion 4 (1987) 123-127
Poirier, D.J. (1995): Intermediate statistics and econometrics, The MIT Press, Cambridge
1995
Poisson, S.D. (1827): Connaissance des temps de l’annee 1827
Polasek, W. and S. Liu (1997): On generalized inverses and Bayesian analysis in simple
ANOVA models, Student 2 (1997) 159-168
Pollock, D.S.G. (1999): A handbook of time-series analysis, signal processing and dy-
namics, Academic Press, Cambridge 1999
Polya, G. (1919): Zur Statistik der sphaerischen Verteilung der Fixsterne, Astr. Nachr.
208 (1919) 175-180
Polya, G. (1930): Sur quelques points de la théorie des probabilités, Ann. Inst. H. Poin-
care 1 (1930) 117-161
Pope, A.J. (1976): The statistics of residuals and the detection of outliers, NOAA Techni-
cal Report, NOS 65 NGS 1, U.S. Dept. of Commerce, Rockville, Md., 1976
Popinski, W. (1999): Least-squares trigonometric regression estimation, Applicationes
Mathematicae 26 (1999) 121-131
Portnoy, S. and R. Koenker (1997): The Gaussian hare and the Laplacian tortoise: com-
putability of squared error versus absolute-error estimators, Statistical Science 12
(1997) 279-300
Potts, D., Steidl, G. and M. Tasche (1996): Kernels of spherical harmonics and spherical
frames, in: Advanced Topics in Multivariate Approximation pp. 287-301, Fontanella,
F., Jetter, K. and P.J. Laurent (eds), World Scientific Publishing 1996
Powers, D.L. (1999): Boundary value problems, Harcourt Academic Press 1999
Pratt, J.W. (1961): Length of confidence intervals, J. American Statistical Assoc. 56
(1961) 549-567
Pratt, J.W. (1963): Shorter confidence intervals for the mean of a normal distribution with
known variance, Ann. Math. Statist. 34 (1963) 574-586
Prescott, P. (1975): An approximate test for outliers in linear models, Technometrics 17
(1975) 129-132
Presnell, B., Morrison, S.P. and R.C. Littell (1998): Projected multivariate linear models
for directional data, J. American Statist. Assoc. 93 (1998) 1068-1077
Press, S.J. (1989): Bayesian statistics: Principles, models and applications, Wiley, New
York 1989
Press, W.H., Teukolsky, S.A., Vetterling, W.T. and B.P. Flannery (1992): Numerical
Recipes in FORTRAN (2nd edition), Cambridge University Press, Cambridge 1992
Priestley, M.B. (1981): Spectral analysis and time series, Vol. 1 and 2, Academic Press,
London 1981
Priestley, M.B. (1988): Nonlinear and nonstationary time series analysis, Academic Press,
London 1988
Prony, R. (1795): Essai experimentale et analytique, J. Ecole Polytechnique (Paris) 1
(1795) 24-76
Prószynski, W. (1997): Measuring the robustness potential of the least-squares estimation:
geodetic illustration, J. Geodesy 71 (1997) 652-659
Pruscha, H. (1996): Angewandte Methoden der Mathematischen Statistik, Teubner Skrip-
ten zur Mathematischen Stochastik, Stuttgart 1996
Pugachev, V.S. and I.N. Sinitsyn (2002): Stochastic systems, Theory and applications,
Russian Academy of Sciences 2002
Puntanen, S., Styan, G.P.H. and H.J. Werner (2000): Two matrix-based proofs that the
linear estimator Gy is the best linear unbiased estimator, J. Statist. Planning and Infer-
ence 88 (2000) 173-179
Pukelsheim, F. (1981a): Linear models and convex geometry: Aspects of non-negative
variance estimation, Math. Operationsforsch. u. Stat. 12 (1981) 271-286
Pukelsheim, F. (1981b): On the existence of unbiased nonnegative estimates of variance
covariance components, Ann. Statist. 9 (1981) 293-299
Pukelsheim, F. (1993): Optimal design of experiments, Wiley, New York 1993
Pukelsheim, F. (1994): The three sigma rule, American Statistician 48 (1994) 88-91
Pukelsheim, F. and B. Torsney (1991): Optimal weights for experimental designs on
linearly independent support points, The Annals of Statistics 19 (1991) 1614-1625
Pukelsheim, F. and W.J. Studden (1993): E-optimal designs for polynomial regression,
Ann. Stat. 21 (1993) 402-415
Qingming, G. and L. Jinshan (2000): Biased estimation in the Gauss-Markov model, Allg.
Vermessungsnachrichten 107 (2000) 104-108
Qingming, G., Yuanxi, Y. and G. Jianfeng (2001): Biased estimation in the Gauss-
Markov model with constraints, Allg. Vermessungsnachrichten 108 (2001) 28-30
Qingming, G., Lifen, S., Yuanxi, Y. and G. Jianfeng (2001): Biased estimation in the
Gauss-Markov model not of full rank, Allg. Vermessungsnachrichten 108 (2001) 390-
393
Quintana, E.S., Quintana, G., Sun, X. and R. van de Geijn (2001): A note on parallel
matrix inversion, SIAM J. Sci. Comput 22 (2001) 1762-1771
Quintana-Orti, G., Sun, X. and C.H. Bischof (1998): A BLAS-3 version of the QR factoriza-
tion with column pivoting, SIAM J. Sci. Comput. 19 (1998) 1486-1494
Rader, C. and A.O. Steinhardt (1988): Hyperbolic householder transforms, SIAM. J.
Matrix Anal. Appl. 9 (1988) 269-290
Rafajlowicz, E. (1988): Nonparametric least squares estimation of a regression function,
Statistics 19 (1988) 349-358
Raj, D. (1968): Sampling theory, McGraw-Hill Book Comp., Bombay 1968
Ramsey, J.O. and B.W. Silverman (1997): Functional data analysis, Springer Verlag, New
York 1997
Rao, B.L.S.P. (1997a): Variance components, Chapman and Hall, Boca Raton 1997
Rao, B.L.S.P. (1997b): Weighted least squares and nonlinear regression, J. Ind. Soc. Ag.
Statistics 50 (1997) 182-191
Rao, B.L.S.P. and B.R. Bhat (1996): Stochastic processes and statistical inference, New
Age International, New Delhi 1996
Rao, C.R. (1945): Generalisation of Markoff’s Theorem and tests of linear hypotheses,
Sankhya 7 (1945) 9-16
Rao, C.R. (1952a): Some theorems on Minimum Variance Estimation, Sankhya 12, 27-42
Rao, C.R. (1952b): Advanced statistical methods in biometric research, Wiley, New York
1952
Rao, C.R. (1965a): Linear statistical inference and its applications, Wiley, New York
1965
Rao, C.R. (1965b): The theory of least squares when the parameters are stochastic and its
application to the analysis of growth curves, Biometrika 52 (1965) 447-458
Rao, C.R. (1970): Estimation of heteroscedastic variances in linear models, J. Am. Stat.
Assoc. 65 (1970) 161-172
Rao, C.R. (1971a): Estimation of variance and covariance components - MINQUE theory,
J. Multivar. Anal. 1 (1971) 257-275
Rao, C.R. (1971b): Unified theory of linear estimation, Sankhya Ser. A33 (1971) 371-394
Rao, C.R. (1971c): Minimum variance quadratic unbiased estimation of variance compo-
nents, J. Multivar. Anal. 1 (1971) 445-456
Rao, C.R. (1972a): Unified theory of least squares, Communications in Statistics 1 (1972)
1-8
Rao, C.R. (1972b): Estimation of variance and covariance components in linear models. J.
Am. Stat. Ass. 67 (1972) 112-115
Rao, C.R. (1973a): Linear statistical inference and its applications, 2nd ed., Wiley, New
York 1973
Rao, C.R. (1973b): Representation of best linear unbiased estimators in the Gauss-
Markoff model with a singular dispersion matrix, J. Multivariate Analysis 3 (1973)
276-292
Rao, C.R. (1976): Estimation of parameters in a linear model, Ann. Statist. 4 (1976) 1023-
1037
Rao, C.R. (1985): The inefficiency of least squares: extensions of the Kantorovich ine-
quality, Linear algebra and its applications 70 (1985) 249-255
Rao, C.R. and S.K. Mitra (1971): Generalized inverse of matrices and its applications,
Wiley, New York 1971
Rao, C.R. and J. Kleffe (1988): Estimation of variance components and applications,
North Holland, Amsterdam 1988
Rao, C.R. and R. Mukerjee (1997): Comparison of LR, score and Wald tests in a non-IID
setting, J. Multivariate Analysis 60 (1997) 99-110
Rao, C.R. and H. Toutenburg (1995a): Linear models, least-squares and alternatives,
Springer-Verlag, New York 1995
Rao, C.R. and H. Toutenburg (1995b): Linear models, Springer Verlag, New York 1995
Rao, C.R. and H. Toutenburg (1999): Linear models, Least squares and alternatives, 2nd
ed., Springer Verlag, New York 1999
Rao, C.R. and G. J. Szekely (2000): Statistics for the 21st century - Methodologies for
applications of the future, Marcel Dekker, Basel 2000
Rao, P.S.R.S. and Y.P. Chaubey (1978): Three modifications of the principle of the
MINQUE, Commun. Statist. Theor. Methods A7 (1978) 767-778
Ravishanker, N. and D.K. Dey (2002): A first course in linear model theory – Multivariate
normal and related distributions, Chapman & Hall/CRC 2002
Ravi, V. and H.-J. Zimmermann (2000): Fuzzy rule based classification with FeatureSe-
lector and modified threshold accepting, European J.Operational Research 123 (2000)
16-28
Ravi, V., Reddy, P.J. and H.-J. Zimmermann (2000): Pattern classification with principal
component analysis and fuzzy rule bases, European J.Operational Research 126
(2000) 526-533
Rayleigh, L. (1880): On the resultant of a large number of vibrations of the same pitch
and of arbitrary phase, Phil. Mag. 5 (1880) 73-78
Rayleigh, L. (1905): The problem of random walk, Nature 72 (1905) 318
Rayleigh, L. (1919): On the problem of random vibrations, and of random flights in one,
two or three dimensions, Phil Mag. 37 (1919) 321-347
Reeves, J. (1998): A bivariate regression model with serial correlation, The Statistician 47
(1998) 607-615
Reich, K. (2000): Gauss' Schüler. Studierten bei Gauss und machten Karriere. Gauss'
Erfolg als Hochschullehrer (Gauss's students: studied with him and were successful.
Gauss's success as a university professor), Gauss Gesellschaft E.V.Göttingen, Mittei-
lungen Nr. 37, pages 33-62, Göttingen 2000
Relles, D.A. (1968): Robust regression by modified least squares, PhD. Thesis, Yale
University, Yale 1968
Remondi, B.W. (1984): Using the Global Positioning System (GPS) phase observable for
relative geodesy: modelling, processing and results. PhD.Thesis, Center for Space Re-
search, The University of Texas, Austin 1984
Ren, H. (1996): On the error analysis and implementation of some eigenvalue decomposi-
tion and singular value decomposition algorithms, UT-CS-96-336, LAPACK working
note 115 (1996)
Rencher, A.C. (2000): Linear models in statistics, J. Wiley, New York 2000
Renfer, J.D. (1997): Contour lines of L1 -norm regression, Student 2 (1997) 27-36
Resnikoff, G.J. and G.J. Lieberman (1957): Tables of the noncentral t-distribution, Stan-
ford University Press, Stanford 1957
Riccomagno, E., Schwabe, R. and H.P. Wynn (1997): Lattice-based optimum design for
Fourier regression, Ann. Statist. 25 (1997) 2313-2327
Rice, J.R. (1969): The approximation of functions, vol. II - Nonlinear and multivariate
theory, Addison-Wesley, Reading 1969
Richards, F.S.G. (1961): A method of maximum likelihood estimation, J. Royal Stat. Soc.
B 23 (1961) 469-475
Richter, H. and V. Mammitzsch (1973): Methode der kleinsten Quadrate, Stuttgart 1973
Riedel, K.S. (1992): A Sherman-Morrison-Woodbury identity for rank augmenting matri-
ces with application to centering, SIAM J. Matrix Anal. Appl. 13 (1992) 659-662
Riedwyl, H. (1997): Lineare Regression, Birkhäuser Verlag, Basel 1997
Richter, W.D. (1985): Laplace-Gauß integrals, Gaussian measure asymptotic behaviour
and probabilities of moderate deviations, Z. Analysis und ihre Anwendungen 4 (1985)
257-267
Rilstone, P., Srivastava, V.K. and A. Ullah (1996): The second order bias and mean
squared error of nonlinear estimators, J. Econometrics 75 (1996) 369-395
Rivest, L.P. (1982): Some statistical methods for bivariate circular data, J. Royal Statisti-
cal Society, Series B: 44 (1982) 81-90
Rivest, L.P. (1988): A distribution for dependent unit vectors, Comm. Statistics A: Theory
Methods 17 (1988) 461-483
Rivest, L.P. (1989): Spherical regression for concentrated Fisher-von Mises distributions,
Annals of Statistics 17 (1989) 307-317
Roberts, P.H. and H.D. Ursell (1960): Random walk on the sphere and on a Riemannian
manifold, Phil. Trans. Roy. Soc. A252 (1960) 317-356
Robinson, G.K. (1982): Behrens-Fisher problem, Encyclopedia of the Statistical Sciences,
Vol. 1, Wiley, New York 1982
Robinson, P.M. and C. Velasco (1997): Autocorrelation-robust inference, Handbook
of Statistics 15 (1997) 267-298
Rodgers, J.L. and W.A. Nicewander (1988): Thirteen ways to look at the correlation
coefficient, The American Statistician 42 (1988) 59-66
Rohatgi, V.K. (1987): Statistical inference, J. Wiley & Sons 1987
Rohde, C.A. (1966): Some results on generalized inverses, SIAM Rev. 8 (1966) 201-205
Romano, J.P. and A.F. Siegel (1986): Counterexamples in probability and statistics,
Chapman and Hall, Boca Raton 1986
Romanowski, M. (1979): Random errors in observations and the influence of modulation
on their distribution, K. Wittwer Verlag, Stuttgart 1979
Rosen, J.B., Park, H. and J. Glick (1996): Total least norm formulation and solution for
structured problems, SIAM J. Matrix Anal. Appl. 17 (1996) 110-126
Rosén, K.D.P. (1948): Gauss’s mean error and other formulae for the precision of direct
observations of equal weight, Tätigkeitsbereiche Balt. Geod. Komm. 1944-1947, pp.
38-62, Helsinki 1948
Rosenblatt, M. (1971): Curve estimates, Ann. Math. Statistics 42 (1971) 1815-1842
Rosenblatt, M. (1997): Some simple remarks on an autoregressive scheme and an implied
problem, J. theor. Prob. 10 (1997) 295-305
Ross, G.J.S. (1982): Non-linear models, Math. Operationsforschung Statistik 13 (1982)
445-453
Ross, S.M. (1983): Stochastic processes, Wiley, New York 1983
Rousseeuw, P.J. and A.M. Leroy (1987): Robust regression and outlier detection, J.
Wiley, New York 1987
Roy, T. (1995): Robust non-linear regression analysis, J. Chemometrics 9 (1995) 451-457
Rozanski, I.P. and R. Velez (1998): On the estimation of the mean and covariance pa-
rameter for Gaussian random fields, Statistics 31 (1998) 1-20
Rueda, C., Salvador, B. and M.A. Fernández (1997): Simultaneous estimation in a re-
stricted linear model, J. Multivariate Analysis 61 (1997) 61-66
Rueschendorf, L. (1988): Asymptotische Statistik, Teubner, Stuttgart 1988
Rummel, R. (1975): Zur Behandlung von Zufallsfunktionen und –folgen in der physikali-
schen Geodäsie, Deutsche Geodätische Kommission bei der Bayerischen Akademie
der Wissenschaften, Report No. C 208, München 1975
Rummel, R. and K.P. Schwarz (1977): On the nonhomogeneity of the global covariance
function, Bull. Géodésique 51 (1977) 93-103
Ruppert, D. and R.J. Carroll (1980): Trimmed least squares estimation in the linear model,
J. American Statistical Association 75 (1980) 828-838
Rutherford, A. (2001): Introducing Anova and Ancova – a GLM approach, Sage, London
2001
Rutherford, D.E. (1933): On the condition that two Zehfuss matrices be equal, Bull.
American Math. Soc. 39 (1933) 801-808
Saalfeld, A. (1999): Generating basis sets of double differences, J. Geodesy 73 (1999)
291-297
Sacks, J. and D. Ylvisaker (1966): Design for regression problems with correlated errors,
Annals of Mathematical Statistics 37 (1966) 66-89
Sahai, H. (2000): The analysis of variance: fixed, random and mixed models, 778 pages,
Birkhaeuser Verlag, Basel 2000
Saichev, A.I. and W.A. Woyczynski (1996): Distributions in the physical and engineering
sciences, Vol. 1, Birkhäuser Verlag, Basel 1996
Sakallioglu, S., Kaciranlar, S. and F. Akdeniz (2001): Mean squared error comparisons of
some biased regression estimators, Commun. Statist. - Theory Meth. 30 (2001) 347-
361
Samorodnitsky, G. and M.S. Taqqu (1994): Stable non-Gaussian random processes,
Chapman and Hall, Boca Raton 1994
Sampson, P.D. and P. Guttorp (1992): Nonparametric estimation of nonstationary spatial
covariance structure, J. American Statistical Association 87 (1992) 108-119
Sander, B. (1930): Gefügekunde der Gesteine, J. Springer, Vienna 1930
Sansò, F. (1990): On the aliasing problem in the spherical harmonic analysis, Bull. Géod.
64 (1990) 313-330
Sansò, F. and G. Sona (1995): The theory of optimal linear estimation for continuous
fields of measurements, Manuscripta Geodetica 20 (1995) 204-230
Sastry, S. (1999): Nonlinear systems: Analysis, stability and control, Springer-Verlag,
New York 1999
Sathe, S.T. and H.D. Vinod (1974): Bound on the variance of regression coefficients due
to heteroscedastic or autoregressive errors, Econometrica 42 (1974) 333-340
Saw, J.G. (1978): A family of distributions on the m-sphere and some hypothesis tests,
Biometrika 65 (1978) 69-73
Saw, J.G. (1981): On solving the likelihood equations which derive from the Von Mises
distribution, Technical Report, University of Florida, 1981
Saxe, K. (2002): Beginning functional analyses, Springer 2002
Sayed, A.H., Hassibi, B. and T. Kailath (1996): Fundamental inertia conditions for the
minimization of quadratic forms in indefinite metric spaces, Oper. Theory: Adv.
Appl., Birkhäuser Verlag, Cambridge/ Mass 1996
Schach, S. and T. Schäfer (1978): Regressions- und Varianzanalyse, Springer, Berlin
1978
Schafer, J.L. (1997): Analysis of incomplete multivariate data, Chapman and Hall, Lon-
don 1997
Schaffrin, B. (1979): Einige ergänzende Bemerkungen zum empirischen mittleren Fehler
bei kleinen Freiheitsgraden, Z. Vermessungswesen 104 (1979) 236-247
Schaffrin, B. (1983a): Varianz-Kovarianz Komponentenschätzung bei der Ausgleichung
heterogener Wiederholungsmessungen, Deutsche Geodätische Kommission, Report C
282, München, 1983
Schaffrin, B. (1983b): Model choice and adjustment techniques in the presence of prior
information, Ohio State University Department of Geodetic Science and Surveying,
Report 351, Columbus 1983
Schaffrin, B. (1985): The geodetic datum with stochastic prior information, Publ. C313,
German Geodetic Commission, München 1985
Schaffrin, B. (1991): Generalized robustified Kalman filters for the integration of GPS
and INS, Tech. Rep. 15, Geodetic Institute, Stuttgart University 1991
Schaffrin, B. (1995): A generalized Lagrange function approach to include fiducial con-
straints, Z. Vermessungswesen 7 (1995) 325-350
Schaffrin, B. (1997): Reliability measures for correlated observations, J. Surveying Engin-
eering 123 (1997) 126-137
Schaffrin, B. (1999): Softly unbiased estimation Part 1: The Gauss-Markov model, Linear
Algebra and its Applications 289 (1999) 285-296
Schaffrin, B. (2001a): Equivalent systems for various forms of kriging, including least-
squares collocation, Z. Vermessungswesen 126 (2001) 87-94
Schaffrin, B. (2001b): Minimum mean square error adjustment, Part I: the empirical BLE
and the repro-BLE for direct observation, J. Geodetic Society of Japan 46 (2000)
21-30
Schaffrin, B. and E.W. Grafarend (1982a): Kriterion-Matrizen II: Zweidimensionale ho-
mogene und isotrope geodätische Netze, Teil II a: Relative cartesische Koordinaten,
Z. Vermessungswesen 107 (1982) 183-194
Schaffrin, B. and E.W. Grafarend (1982b): Kriterion-Matrizen II: Zweidimensionale ho-
mogene und isotrope geodätische Netze. Teil II b: Absolute cartesische Koordinaten,
Z. Vermessungswesen 107 (1982) 485-493
Schaffrin, B., Grafarend, E.W. and G. Schmitt (1977): Kanonisches Design Geodätischer
Netze I, Manuscripta Geodaetica 2 (1977) 263-306
Schaffrin, B., Grafarend, E.W. and G. Schmitt (1978): Kanonisches Design Geodätischer
Netze II, Manuscripta Geodaetica 3 (1978) 1-22
Schaffrin, B. and E.W. Grafarend (1986): Generating classes of equivalent linear models
by nuisance parameter elimination, Manuscripta Geodaetica 11 (1986) 262-271
Schaffrin, B. and E.W. Grafarend (1991): A unified computational scheme for traditional
and robust prediction of random effects with some applications in geodesy, The Fron-
tiers of Statistical Scientific Theory & Industrial Applications 2 (1991) 405-427
Schaffrin, B. and J.H. Kwon (2002): A Bayes filter in Friedland form for INS/GPS vector
gravimetry, Geoph. J. Int. 149 (2002) 64-75
Shanbhag, D.N. and C.R. Rao (eds.) (2001): Stochastic processes: Theory and methods,
Elsevier 2001
Scheidegger, A.E. (1965): On the statistics of the orientation of bedding planes, grain axes
and similar sedimentological data, U.S. Geol. Survey Prof. Paper 525-C (1965) 164-
167
Schetzen, M. (1980): The Volterra and Wiener theories of nonlinear systems, Wiley, New
York 1980
Schick, A. (1999): Improving weighted least-squares estimates in heteroscedastic linear
regression when the variance is a function of the mean response, J. Statistical Planning
and Inference 76 (1999) 127-144
Schiebler, R. (1988): Giorgio de Chirico and the theory of relativity, Lecture given at
Stanford University, Wuppertal 1988
Schmetterer, L. (1956): Einführung in die mathematische Statistik, Wien 1956
Schmidt, E. (1907): Entwicklung willkürlicher Funktionen, Math. Annalen 63 (1907) 433-
476
Schmidt, K. (1996): A comparison of minimax and least squares estimators in linear
regression with polyhedral prior information, Acta Applicandae Mathematicae 43
(1996) 127-138
Schmidt, K.D. (1996): Lectures on risk theory, Teubner Skripten zur Mathematischen
Stochastik, Stuttgart 1996
Schmidt, P. (1976): Econometrics, Marcel Dekker, New York 1976
Schmidt-Koenig, K. (1972): New experiments on the effect of clock shifts on homing
pigeons in animal orientation and navigation, Eds.: S.R. Galler, K. Schmidt-Koenig,
G.J. Jacobs and R.E. Belleville, NASA SP-262, Washington D.C. 1972
Schmitt, G. (1975): Optimaler Schnittwinkel der Bestimmungsstrecken beim einfachen
Bogenschnitt, Allg. Vermessungsnachrichten 6 (1975) 226-230
Schmitt, G. (1977a): Experiences with the second-order design problem in theoretical and
practical geodetic networks, Proceedings International Symposium on Optimization of
Design and Computation of Control Networks, Sopron 1977
Schmitt, G. (1977b): Experiences with the second-order design problem in theoretical and
practical geodetic networks, Optimization of design and computation of control net-
works. F. Halmos and J. Somogyi eds, Akadémiai Kiadó, Budapest (1979), 179-206
Schmitt, G. (1978): Gewichtsoptimierung bei Mehrpunkteinschaltung mit Streckenmes-
sung, Allg. Vermessungsnachrichten 85 (1978) 1-15
Schmitt, G. (1979): Zur Numerik der Gewichtsoptimierung in geodätischen Netzen, Deut-
sche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Report
256, München 1979
Schmitt, G. (1980): Second order design of a free distance network considering different
types of criterion matrices, Bulletin Geodetique 54 (1980) 531-543
Schmitt, G. (1985): Second Order Design, Third Order Design, Optimization and design
of geodetic networks, Springer Verlag, Berlin (1985), 74-121
Schmitt, G., Grafarend, E.W. and B. Schaffrin (1977): Kanonisches Design geodätischer
Netze I, Manuscripta Geodaetica 2 (1977) 263-306
Schmitt, G., Grafarend, E.W. and B. Schaffrin (1978): Kanonisches Design geodätischer
Netze II, Manuscripta Geodaetica 3 (1978) 1-22
Schneeweiß, H. and H.J. Mittag (1986): Lineare Modelle mit fehlerbehafteten Daten,
Physica-Verlag, Heidelberg 1986
Schock, E. (1987): Implicit iterative methods for the approximate solution of ill-posed
problems, Bollettino U.M.I., Series 1-B, 7 (1987) 1171-1184
Schoenberg, I.J. (1938): Metric spaces and completely monotone functions, Ann. Math.
39 (1938) 811-841
Schott, J.R. (1997): Matrix analysis for statistics, J. Wiley, New York 1997
Schott, J.R. (1998): Estimating correlation matrices that have common eigenvectors,
Comput. Stat. & Data Anal. 27 (1998) 445-459
Schouten, J.A. and J. Haantjes (1936): Über die konforminvariante Gestalt der relativisti-
schen Bewegungsgleichungen, in: Koningl. Ned. Akademie van Wetenschappen,
Proc. Section of Sciences, vol. 39, Noord-Hollandsche Uitgeversmaatschappij, Am-
sterdam 1936
Schultz, C. and G. Malay (1998): Orthogonal projections and the geometry of estimating
functions, J. Statistical Planning and Inference 67 (1998) 227-245
Schultze, J. and J. Steinebach (1996): On least squares estimates of an exponential tail
coefficient, Statistics & Decisions 14 (1996) 353-372
Schur, J. (1911): Bemerkungen zur Theorie der verschränkten Bilinearformen mit unend-
lich vielen Veränderlichen, J. Reine und angew. Math. 140 (1911) 1-28
Schur, J. (1917): Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind, J.
Reine angew. Math. 147 (1917) 205-232
Schwarz, G. (1978): Estimating the dimension of a model, The Annals of Statistics 6
(1978) 461-464
Schwarz, H. (1960): Stichprobentheorie, Oldenbourg, München 1960
Schwarz, K. P. (1976): Least-squares collocation for large systems, Boll. Geodesia e
Scienze Affini 35 (1976) 309-324
Scitovski, R. and D. Jukić (1996): Total least squares problem for exponential function,
Inverse Problems 12 (1996) 341-349
Seal, H.L. (1967): The historical development of the Gauss linear model, Biometrika 54
(1967) 1-24
Searle, S.R. and C.R. Henderson (1961): Computing procedures for estimating compo-
nents of variance in the two-way classification, mixed model, Biometrics 17 (1961)
607-616
Searle, S.R. (1971a): Linear models, Wiley, New York 1971
Searle, S.R. (1971b): Topics in variance components estimation, Biometrics 27 (1971) 1-
76
Searle, S.R. (1974): Prediction, mixed models, and variance components, Reliability and
Biometry (1974) 229-266
Searle, S.R., Casella, G. and C.E. McCulloch (1992): Variance components, Wiley, New
York 1992
Seber, G.A.F. (1977): Linear regression analysis, Wiley, New York 1977
Seely, J. (1970): Linear spaces and unbiased estimation, Ann. Math. Statist. 41 (1970)
1725-1734
Seely, J. (1970): Linear spaces and unbiased estimation – Application to the mixed linear
model, Ann. Math. Statist. 41 (1970) 1735-1748
Seely, J. (1971): Quadratic subspaces and completeness, Ann. Math. Statist. 42 (1971)
710-721
Seely, J. (1975): An example of an inadmissible analysis of variance estimator for a
variance component, Biometrika 62 (1975) 689
Seely, J. (1977): Minimal sufficient statistics and completeness, Sankhya, Series A, Part
2, 39 (1977) 170-185
Seely, J. (1980): Some remarks on exact confidence intervals for positive linear combina-
tions of variance components, J. American Statistical Association 75 (1980) 372-
374
Seely, J. and R.V. Hogg (1982): Unbiased estimation in linear models, Communication in
Statistics 11 (1982) 721-729
Seely, J. and Y. El-Bassiouni (1983): Applying Wald’s variance components test, Ann.
Statist. 11 (1983) 197-201
Seely, J. and E.-H. Rady (1988): When can random effects be treated as fixed effects for
computing a test statistic for a linear hypothesis?, Communications in Statistics 17
(1988) 1089-1109
Seely, J. and Y. Lee (1994): A note on the Satterthwaite confidence interval for a vari-
ance, Communications in Statistics 23 (1994) 859-869
Seely, J., Birkes, D. and Y. Lee (1997): Characterizing sums of squares by their distribu-
tion, American Statistician 51 (1997) 55-58
Seemkooei, A.A. (2001): Comparison of reliability and geometrical strength criteria in
geodetic networks, J. Geodesy 75 (2001) 227-233
Segura, J. and A. Gil (1999): Evaluation of associated Legendre functions off the cut and
parabolic cylinder functions, Electronic Transactions on Numerical Analysis 9 (1999)
137-146
Selby, B. (1964): Girdle distributions on the sphere, Biometrika 51 (1964) 381-392
Sen Gupta, A. and R. Maitra (1998): On best equivariance and admissibility of simultaneous
MLE for mean direction vectors of several Langevin distributions, Ann. Inst. Statist.
Math. 50 (1998) 715-727
Sengupta, D. and S.R. Jammalamadaka (2003): Linear models, an integrated approach,
Series on Multivariate Analysis 6, World Scientific, Singapore 2003
Serfling, R.J. (1980): Approximation theorems of mathematical statistics, J. Wiley, New
York 1980
Shaban, A.M.M. (1994): Anova, minque, PSD-minqmbe, canova and cminque in estima-
ting variance components, Statistica 54 (1994) 481-489
Shah, B.V. (1959): On a generalisation of the Kronecker product designs, Ann. Math.
Statistics 30 (1959) 48-54
Shalabh (1998): Improved estimation in measurement error models through Stein rule
procedure, J. Multivar. Analysis 67 (1998) 35-48
Shao, Q.-M. (1996): p-variation of Gaussian processes with stationary increments, Studia
Scientiarum Mathematicarum Hungarica 31 (1996) 237-247
Shapiro, S.S. and M.B. Wilk (1965): An analysis of variance test for normality (complete
samples), Biometrika 52 (1965) 591-611
Shapiro, S.S., Wilk, M.B. and M.J. Chen (1968): A comparative study of various tests for
normality, J. American Statistical Ass. 63 (1968) 1343-1372
Sheppard, W.F. (1912): Reduction of errors by means of negligible differences, Proc. 5th
Int. Congress Mathematicians (Cambridge) 2 (1912) 348-384
Shevlyakov, G.L. and T.Y. Khcatova (1998): On robust estimation of a correlation coeffi-
cient and correlation matrix, Contributions to statistics, pp. 153-162, 1998
Sheynin, O.B. (1966): Origin of the theory of errors, Nature 211 (1966) 1003-1004
Sheynin, O.B. (1979): Gauß and the theory of errors, Archive for History of Exact Sci-
ences 20 (1979)
Sheynin, O. (1995): Helmert’s work in the theory of errors, Arch. Hist. Exact Sci. 49
(1995) 73-104
Shin, D.W. and S.H. Song (2000): Asymptotic efficiency of the OLSE for polynomial
regression models with spatially correlated errors, Statistics Probability Letters 47
(2000) 1-10
Shiryayev, A.N. (1973): Statistical sequential analysis, Transl. Mathematical Monographs
8, American Mathematical Society, Providence/R.I. 1973
Shkarofsky, I.P. (1968): Generalized turbulence space-correlation and wave-number
spectrum-function pairs, Canadian J. Physics 46 (1968) 2133-2153
Shorack, G.R. (1969): Testing and estimating ratios of scale parameters, J. Am. Statist.
Assn 64, 999-1013, 1969
Shumway, R.H. and D.S. Stoffer (2000): Time series analysis and its applications,
Springer Verlag, New York 2000
Shwartz, A. and A. Weiss (1995): Large deviations for performance analysis, Chapman
and Hall, Boca Raton 1995
Sibuya, M. (1960): Bivariate extreme statistics, Ann. Inst. Statist. Math. 11 (1960) 195-
210
Sibuya, M. (1962): A method of generating uniformly distributed points on n-dimensional
spheres, Ann. Inst. Statist. Math. 14 (1962) 81-85
Sillard, P., Altamimi, Z. and C. Boucher (1998): The ITRF96 realization and its associ-
ated velocity field, Geophysical Research Letters 25 (1998) 3223-3226
Silvey, S.D. (1975): Statistical inference, Chapman and Hall, Boca Raton 1975
Silvey, S.D. (1980): Optimal design, Chapman and Hall, 1980
Sima, V. (1996): Algorithms for linear-quadratic optimization, Dekker, New York 1996
Simmonet, M. (1996): Measures and probabilities, Springer Verlag, New York 1996
Simon, H.D. and H. Zha (2000): Low-rank matrix approximation using the Lanczos bidi-
agonalization process with applications, SIAM J. Sci. Comput. 21 (2000) 2257-2274
Simoncini, V. and F. Perotti (2002): On the numerical solution of (λ²A + λB + C)x = b
and application to structural dynamics, SIAM J. Sci. Comput. 23 (2002) 1875-1897
Simonoff, J.S. (1996): Smoothing methods in statistics, Springer Verlag, New York 1996
Singh, R. (1963): Existence of bounded length confidence intervals, Ann. Math. Statist.
34 (1963) 1474-1485
Singh, S. and D.S. Tracy (1999): Ridge regression using scrambled responses, Metron 57
(1999) 147-157
Singh, S., Horn, S., Chowdhury, S. and F. Yu (1999): Calibration of the estimators of
variance, Austral. & New Zealand J. Statist. 41 (1999) 199-212
Sjoeberg, L.E. (2003): The BLUE of the GPS double difference satellite-to-receiver
range for precise positioning, Z. Vermessungswesen 1 (2003) 26-30
Slakter, M.J. (1965): A comparison of the Pearson chi-square and Kolmogorov goodness-
of-fit-tests with respect to validity, J. American Statist. Assoc. 60 (1965) 854-858
Small, C.G. (1996): The statistical theory of shape, Springer Verlag, New York 1996
Smith, A.F.M. (1973): A general Bayesian linear model, J. Royal Statistical Society B 35
(1973) 67-75
Smith, P.J. (1995): A recursive formulation of the old problem of obtaining moments
from cumulants and vice versa, The American Statistician 49 (1995) 217-218
Smith, T. and S.D. Peddada (1998): Analysis of fixed effects linear models under hetero-
scedastic errors, Statistics & Probability Letters 37 (1998) 399-408
Smyth, G.K. (1989): Generalized linear models with varying dispersion, J. Royal Statisti-
cal Society B51 (1989) 47-60
Sneeuw, N. and R. Bun (1996): Global spherical harmonic computation by two-
dimensional Fourier methods, J. Geodesy 70 (1996) 224-232
Solari, H.G., Natiello, M.A. and G.B. Mindlin (1996): Nonlinear dynamics, IOP, Bristol
1996
Solomon, P.J. (1985): Transformations for components of variance and covariance,
Biometrika 72 (1985) 233-239
Somogyi, J. (1998): The robust estimation of the 2D-projective transformation, Acta
Geod. Geoph. Hung. 33 (1998) 279-288
Song, S.H. (1996): Consistency and asymptotic unbiasedness of S2 in the serially corre-
lated error components regression model for panel data, Statistical Papers 37 (1996)
267-275
Song, S.H. (1999): A note on S2 in a linear regression model based on two-stage sampling
data, Statistics & Probability Letters 43 (1999) 131-135
Soper, H.E. (1916): On the distribution of the correlation coefficient in small samples,
Biometrika 11 (1916) 328-413
Soroush, M. and K.R. Muske (2000): Analytical model predictive control, Progress in
Systems and Control Theory 26 (2000) 166-179
Spanos, A. (1999): Probability theory and statistical inference, Cambridge University
Press, Cambridge 1999
Spaeth, H. (1997): Zum Ausgleich von sphärischen Messdaten mit Kleinkreisen, Allg.
Vermessungsnachrichten 11 (1997) 408-410
Spaeth, H. and G.A. Watson (1987): On orthogonal linear l1 approximation, Numerische
Mathematik 51 (1987) 531-543
Sposito, V.A. (1982): On unbiased Lp regression. J. Amer. Statist. Assoc. 77 (1982) 652-
653
Sprent, P. and N.C. Smeeton (1989): Applied nonparametric statistical methods, Chapman
and Hall, Boca Raton, Florida 1989
Sprott, D.A. (1978): Gauss’s contributions to statistics, Historia Mathematica 5 (1978)
183-203
Strecok, A.J. (1968): On the calculation of the inverse of the error function, Math.
Computation 22 (1968) 144-158
Srivastava, A.K., Dube, M. and V. Singh (1996): Ordinary least squares and Stein-rule
predictions in regression models under inclusion of some superfluous variables, Sta-
tistical Papers 37 (1996) 253-265
Srivastava, A.K. and Shalabh, S. (1996): Efficiency properties of least squares and Stein-
Rule predictions in linear regression models, J. Appl. Stat. Science 4 (1996) 141-145
Srivastava, A.K. and Shalabh, S. (1997): A new property of Stein procedure in measure-
ment error model, Statistics & Probability Letters 32 (1997) 231-234
Srivastava, M.S. and D. von Rosen (1998): Outliers in multivariate regression models, J.
Multivariate Analysis 65 (1998) 195-208
Stahlecker, P. and K. Schmidt (1996): Biased estimation and hypothesis testing in linear
regression, Acta Applicandae Mathematicae 43 (1996) 145-151
Stahlecker, P., Knautz, H. and G. Trenkler (1996): Minimax adjustment technique in a
parameter restricted linear model, Acta Applicandae Mathematicae 43 (1996) 139-144
Stam, A.J. (1982): Limit theorems for uniform distributions on spheres in high dimen-
sional euclidean spaces, J. Appl. Prob. 19 (1982) 221-229
Steele, B.M. (1996): A modified EM algorithm for estimation in generalized mixed mod-
els, Biometrics 52 (1996) 1295-1310
Stefanski, L.A. (1989): Unbiased estimation of a nonlinear function of a normal mean
with application to measurement error models, Communications Statist. Theory
Method. 18 (1989) 4335-4358
Stefansky, W. (1971): Rejecting outliers by maximum normal residual, Ann. Math. Statis-
tics 42 (1971) 35-45
Stein, C. (1945): A two-sample test for a linear hypothesis whose power is independent of
the variance, Ann. Math. Statistics 16 (1945) 243-258
Stein, C. (1950): Unbiased estimates with minimum variance, Ann. Math. Statist. 21
(1950) 406-415
Stein, C. (1959): An example of wide discrepancy between fiducial and confidence inter-
vals, Ann. Math. Statist. 30 (1959) 877-880
Stein, C. (1964): Inadmissibility of the usual estimator for the variance of a normal distri-
bution with unknown mean, Ann. Inst. Statist. Math. 16 (1964) 155-160
Stein, C. and A. Wald (1947): Sequential confidence intervals for the mean of a normal
distribution with known variance, Ann. Math. Statist. 18 (1947) 427-433
Steiner, F. and B. Hajagos (1999a): A more sophisticated definition of the sample median,
Acta Geod. Geoph. Hung. 34 (1999) 59-64
Steiner, F. and B. Hajagos (1999b): Insufficiency of asymptotic results demonstrated on
statistical efficiencies of the L1 Norm calculated for some types of the supermodel
fp(x), Acta Geod. Geoph. Hung. 34 (1999) 65-69
Steiner, F. and B. Hajagos (1999c): Error characteristics of MAD-S (of sample medians)
in case of small samples for some parent distribution types chosen from the super-
models fp(x) and fa(x), Acta Geod. Geoph. Hung. 34 (1999) 87-100
Steinmetz, V. (1973): Regressionsmodelle mit stochastischen Koeffizienten, Proc. Oper.
Res. 2, DGOR Ann. Meet., Hamburg 1973, pp. 95-104
Stenger, H. (1971): Stichprobentheorie, Physica-Verlag, Würzburg 1971
Stephens, M.A. (1963): Random walk on a circle, Biometrika 50 (1963) 385-390
Stephens, M.A. (1964): The testing of unit vectors for randomness, J. Amer. Statist. Assoc.
59 (1964) 160-167
Stephens, M.A. (1969a): Tests for randomness of directions against two circular alterna-
tives, J. Amer. Statist. Ass. 64 (1969) 280-289
Stephens, M.A. (1969b): Test for the von Mises distribution, Biometrika 56 (1969) 149-
160
Stephens, M.A. (1979): Vector correlations, Biometrika 66 (1979) 41-48
Stepniak, C. (1985): Ordering of nonnegative definite matrices with application to com-
parison of linear models, Linear Algebra and Its Applications 70 (1985) 67-71
Stewart, C.W. (1997): Bias in robust estimation caused by discontinuities and multiple
structures, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997)
818-833
Stewart, G.W. (1977): On the perturbation of pseudo-inverses, projections and linear least
squares, SIAM Review 19 (1977) 634-663
Stewart, G.W. (1995a): Gauss, statistics and Gaussian elimination, J. Computational and
Graphical Statistics 4 (1995) 1-11
Stewart, G.W. (1995b): Afterword, in Translation: Theoria Combinationis Observationum
Erroribus Minimis Obnoxiae, pars prior-pars posterior-supplementum by Carl Frie-
drich Gauss (Theory of the Combination of Observations Least Subject to Errors,
Classics in Applied Mathematics, SIAM edition, pages 205-236, Philadelphia 1995
Stewart, G.W. (1992): An updating algorithm for subspace tracking, IEEE Trans. Signal
Proc. 40 (1992) 1535-1541
Stewart, G.W. (1998): Matrix algorithms, vol. 1: Basic decompositions, SIAM, Philadel-
phia 1998
Stewart, G.W. (1999): The QLP approximation to the singular value decomposition,
SIAM J. Sci. Comput. 20 (1999) 1336-1348
Stewart, G.W. (2001): Matrix algorithms, vol. 2: Eigensystems, SIAM, Philadelphia 2001
Stewart, G.W. and Sun Ji-Guang (1990): Matrix perturbation theory, Academic Press,
New York 1990
Stewart, K.G. (1997): Exact testing in multivariate regression, Econometric reviews 16
(1997) 321-352
Stigler, S.M. (1973a): Laplace, Fisher and the discovery of the concept of sufficiency,
Biometrika 60 (1973) 439-445
Stigler, S.M. (1973b): Simon Newcomb, Percy Daniell, and the history of robust estima-
tion 1885-1920, J. American Statistical Association 68 (1973) 872-879
Stigler, S.M. (1977): An attack on Gauss, published by Legendre in 1820, Historia
Mathematica 4 (1977) 31-35
Stigler, S.M. (1986): The history of statistics, the measurement of uncertainty before
1900, Belknap Press-Harvard University Press, Cambridge/Mass. 1986
Stigler, S.M. (1999): Statistics on the table, the history of statistical concepts and meth-
ods, Harvard University Press, Cambridge-London 1999
Stigler, S.M. (2000): International statistics at the millennium: progressing or regressing,
International Statistical Review 68 (2000) 111-121
Stoica, P. and T. Soederstroem (1998): Partial least squares: A first-order analysis, Board
of the Foundation of the Scandinavian J. Statistics 25 (1998) 17-24
Stopar, B. (1999): Design of horizontal GPS net regarding non-uniform precision of GPS
baseline vector components, Bollettino di Geodesia e Scienze Affini 58 (1999) 255-
272
Stopar, B. (2001): Second order design of horizontal GPS net, Survey Review 36 (2001)
44-53
Storm, R. (1967): Wahrscheinlichkeitsrechnung, mathematische Statistik und statistische
Qualitätskontrolle, Leipzig 1967
Stoyanov, J. (1997): Regularly perturbed stochastic differential systems with an internal
random noise, Nonlinear Analysis, Theory, Methods & Applications 30 (1997) 4105-
4111
Stoyanov, J. (1998): Global dependency measure for sets of random elements: "The Ital-
ian problem" and some consequences, in: Ioannis Karatzas et al. (Eds.), Stochastic
process and related topics in memory of Stamatis Cambanis 1943-1995, Birkhäuser,
Boston/Basel/Berlin 1998
Stoyanov, J. (1999): Inverse Gaussian distribution and the moment problem, J. Appl.
Statist. Science 9 (1999) 61-71
Stoyanov, J. (2000): Krein condition in probabilistic moment problems, Bernoulli 6 (2000)
939-949
Strang, G. and K. Borre (1997): Linear algebra, geodesy and GPS, Wellesley, Cambridge
Press 1997
Stroebel, D. (1997): Die Anwendung der Ausgleichungsrechnung auf elastomechanische
Systeme, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaf-
ten, Report C478, München 1997
Strohmer, T. (2000): A Levinson-Galerkin algorithm for regularized trigonometric ap-
proximation, SIAM J. Sci. Comput 22 (2000) 1160-1183
Stroud, A.H. (1966): Gaussian quadrature formulas, Prentice Hall, Englewood Cliffs, N.J.
1966
Stroud, A.H. and D. Secrest (1963): Approximate integration formulas for certain spheri-
cally symmetric regions, Mathematics of Computation 17 (1963) 105-135
Stuart, A. and J.K. Ord (1994): Kendall's advanced theory of statistics: volume I, distribu-
tion theory, Arnold Publ., 6th edition, London 1994
Student (Gosset, W.S.) (1908): The probable error of a mean, Biometrika 6 (1908) 1-25
Stulajter, F. (1978): Nonlinear estimators of polynomials in mean values of a Gaussian
stochastic process, Kybernetika 14 (1978) 206-220
Sturmfels, B. (1996): Gröbner bases and convex polytopes, American Mathematical
Society, Providence 1996
Subrahamanyan, M. (1972): A property of simple least squares estimates, Sankhya B 34
(1972) 355-356
Suenkel, H. (1985): Fourier analysis of geodetic networks, in: Optimization and design
of geodetic networks, pp. 257-302, Grafarend E.W. and F. Sansò (eds), Springer Ver-
lag 1985
Sugiura, N. and Y. Fujikoshi (1969): Asymptotic expansions of the non-null distributions
of the likelihood ratio criteria for multivariate linear hypothesis and independence,
Ann. Math. Stat. 40 (1969) 942-952
Sun, J.-G. (2000): Condition number and backward error for the generalized singular
value decomposition, Siam J. Matrix Anal. Appl. 22 (2000) 323-341
Sundaram, R.K. (1996): A first course in optimisation theory, Cambridge University Press
1996
Swallow, W.H. and F. Kianifard (1996): Using robust scale estimates in detecting multi-
ple outliers in linear regression, Biometrics 52 (1996) 545-556
Swallow, W.H. and S.R. Searle (1978): Minimum variance quadratic unbiased estimation
(MIVQUE) of variance components, Technometrics 20 (1978) 265-272
Swamy, P.A.V.B. (1971): Statistical inference in random coefficient regression models,
Springer-Verlag, Berlin 1971
Sylvester, J.J. (1850): Additions to the articles, "On a new class of theorems", and "On
Pascal's theorem", Phil. Mag. 37 (1850) 363-370
Sylvester, J.J. (1851): On the relation between the minor determinants of linearly equiva-
lent quadratic functions, Phil. Mag. 14 (1851) 295-305
Szabados, T. (1996): An elementary introduction to the Wiener process and stochastic
integrals, Studia Scientiarum Mathematicarum Hungarica 31 (1996) 249-297
Szasz, D. (1996): Boltzmann’s ergodic hypothesis, a conjecture for centuries?, Studia
Scientiarum Mathematicarum Hungarica 31 (1996) 299-322
Takos, I. (1999): Adjustment of observation equations without full rank, Bollettino di
Geodesia e Scienze Affini 58 (1999) 195-208
Tanana, V.P. (1997): Methods for solving operator equations, VSP, Utrecht, Netherlands
1997
Tanizaki, H. (1993): Nonlinear filters – estimation and applications, Springer, Berlin
Heidelberg New York 1993
Tanizaki, H. (2000): Bias correction of OLSE in the regression model with lagged de-
pendent variables, Computational Statistics & Data Analysis 34 (2000) 495-511
Tanner, A. (1996): Tools for statistical inference, 3rd ed., Springer Verlag, New York
1996
Tarpey, T. (2000): A note on the prediction sum of squares statistic for restricted least
squares, The American Statistician 54 (2000) 116-118
Tarpey, T. and B. Flury (1996): Self-consistency, a fundamental concept in statistics,
Statistical Science 11 (1996) 229-243
Tasche, D. (2003): Unbiasedness in least quantile regression, in: R. Dutter, P. Filzmoser,
U. Gather, P.J. Rousseeuw (Eds.), Developments in Robust Statistics, pp. 377-386,
Physica Verlag, Heidelberg 2003
Tashiro, Y. (1977): On methods for generating uniform random points on the surface of a
sphere, Ann. Inst. Statist. Math. 29 (1977) 295-300
Tate, R.F. (1959): Unbiased estimation: functions of location and scale parameters, Ann.
Math. Statist. 30 (1959) 341-366
Tate, R.F. and G.W. Klett (1959): Optimal confidence intervals for the variance of a
normal distribution, J. American Statistical Assoc. 54 (1959) 674-682
Teicher, H. (1961): Maximum likelihood characterization of distributions, Ann. Math.
Statist. 32 (1961) 1214-1222
Taylor, J.R. (1982): An introduction to error analysis, University Science Books, Sausa-
lito 1982
Tenorio, L. (2001): Statistical regularization of inverse problems, SIAM Review 43
(2001) 347-366
Teunissen, P.J.G. (1985a): The geometry of geodetic inverse linear mapping and non-
linear adjustment, Netherlands Geodetic Commission, Publications on Geodesy, New
Series, Vol. 8/1, Delft 1985
Teunissen, P.J.G. (1985b): Zero order design: generalized inverses, adjustment, the datum
problem and S-transformations, In: Optimization and design of geodetic networks,
Grafarend, E.W. and F. Sanso eds., Springer Verlag, Berlin-Heidelberg-New York-
Tokyo 1985
Teunissen, P.J.G. (1989a): Nonlinear inversion of geodetic and geophysical data: diagnos-
ing nonlinearity, In: Brunner, F.K. and C. Rizos (eds): Developments in four-
dimensional geodesy, Lecture Notes in Earth Sciences 29 (1989), 241-264
Teunissen, P.J.G. (1989b): First and second moments of non-linear least-squares estima-
tors, Bull. Geod. 63 (1989) 253-262
Teunissen, P.J.G. (1990): Non-linear least-squares estimators, Manuscripta Geodaetica 15
(1990) 137-150
Teunissen, P.J.G. (1993): Least-squares estimation of the integer GPS ambiguities, LGR
series, No. 6, Delft Geodetic Computing Centre, Delft 1993
Teunissen, P.J.G. (1995a): The invertible GPS ambiguity transformation, Manuscripta
Geodaetica 20 (1995) 489-497
Teunissen, P.J.G. (1995b): The least-squares ambiguity decorrelation adjustment: a
method for fast GPS integer ambiguity estimation, J. Geodesy 70 (1995) 65-82
Teunissen, P.J.G. (1997a): A canonical theory for short GPS baselines. Part I: The base-
line precision, J. Geodesy 71 (1997) 320-336
Teunissen, P.J.G. (1997b): On the sensitivity of the location, size and shape of the GPS
ambiguity search space to certain changes in the stochastic model, J. Geodesy 71
(1997) 541-551
Teunissen, P.J.G. (1997c): On the GPS widelane and its decorrelating property, J. Geod-
esy 71 (1997) 577-587
Teunissen, P.J.G. (1997d): The least-squares ambiguity decorrelation adjustment: its
performance on short GPS baselines and short observation spans, J. Geodesy 71
(1997) 589-602
Teunissen, P.J.G. and A. Kleusberg (1998): GPS observation and positioning concepts, in:
GPS for Geodesy, pp. 187-229, Teunissen, P.J.G. and A. Kleusberg (eds), Berlin 1998
Teunissen, P.J.G., de Jonge, P.J. and C.C.J.M. Tiberius (1997): The least-squares ambigu-
ity decorrelation adjustment: its performance on short GPS baselines and short obser-
vation spans, J. Geodesy 71 (1997) 589-602
Théberge, A. (2000): Calibration and restricted weights, Survey Methodology 26 (2000)
99-107
Theil, H. (1965): The analysis of disturbances in regression analysis, J. American
Statistical Association 60 (1965) 1067-1079
Thompson, R. (1969): Iterative estimation of variance components for non-orthogonal
data, Biometrics 25 (1969) 767-773
Thompson, W.A. (1955): The ratio of variances in variance components model, Ann.
Math. Statist. 26 (1955) 325-329
Thomson, D.J. (1982): Spectrum estimation and harmonic analysis, Proceedings of the
IEEE 70 (1982) 1055-1096
Tian, G.-L. (1998): The comparison between polynomial regression and orthogonal poly-
nomial regression, Statistics & Probability Letters 38 (1998) 289-294
Tiberius, C.C.J.M. and F. Kenselaar (2000): Estimation of the stochastic model for GPS
code and phase observables, Survey Review 35 (2000) 441-455
Tikhonov, A.N. and V.Y. Arsenin (1977): Solutions of ill-posed problems, J. Wiley, New
York 1977
Tikhonov, A.N., A.S. Leonov and A.G. Yagola (1998a): Nonlinear ill-posed problems,
vol.1, Appl. Math. and Math. Comput. 14, Chapman & Hall, London 1998
Tikhonov, A.N., A.S. Leonov and A.G. Yagola (1998b): Nonlinear ill-posed problems,
vol.2, Appl. Math. and Math. Comput. 14, Chapman & Hall, London 1998
Tjoestheim, D. (1990): Non-linear time series and Markov chains, Adv. Appl. Prob. 22
(1990) 587-611
Tjur, T. (1998): Nonlinear regression, quasi likelihood, and overdispersion in generalized
linear models, American Statistician 52 (1998) 222-227
Tobias, P.A. and D.C. Trindade (1995): Applied reliability, Chapman and Hall, Boca
Raton 1995
Tolimieri, R., An, A. and C. Lu (1989): Algorithms for discrete Fourier transform and
convolution, Springer Verlag 1989
Tominaga, Y. and I. Fujiwara (1997): Prediction-weighted partial least-squares regression
(PWPLS), Chemometrics and Intelligent Lab Systems 38 (1997) 139-144
Tong, H. (1990): Non-linear time series, Oxford University Press, Oxford 1990
Toranzos, F.I. (1952): An asymmetric bell-shaped frequency curve, Ann. Math. Statist. 23
(1952) 467-469
Tornatore, V. and F. Migliaccio (2001): Stochastic modelling of non-stationary smooth
phenomena, International Association of Geodesy Symposia 122 “IV Hotine – Ma-
russi Symposium on Mathematical Geodesy”, Springer Verlag, Berlin – Heidelberg
2001
Toutenburg, H. (1970): Vorhersage im allgemeinen linearen Regressionsmodell mit sto-
chastischen Regressoren, Math. Operationsforschg. Statistik 2 (1970) 105-116
Toutenburg, H. (1975): Vorhersage in linearen Modellen, Akademie Verlag, Berlin 1975
Toutenburg, H. (1996): Estimation of regression coefficients subject to interval con-
straints, Sankhya: The Indian J. Statistics A, 58 (1996) 273-282
Toutenburg, H. (2000): Improved predictions in linear regression models with stochastic
linear constraints, Biometrical Journal 42 (2000) 71-86
Townsend, E.C. and S.R. Searle (1971): Best quadratic unbiased estimation of variance
components from unbalanced data in the 1-way classification, Biometrics 27 (1971)
643-657
Trefethen, L.N. and D. Bau (1997): Numerical linear algebra, Society for Industrial and
Applied Mathematics (SIAM), Philadelphia 1997
Troskie, C.G. and D.O. Chalton (1996): Detection of outliers in the presence of multicol-
linearity, in: Multidimensional statistical analysis and theory of random matrices, Pro-
ceedings of the Sixth Lukacs Symposium, eds. Gupta, A.K. and V.L.Girko, pages
273-292, VSP, Utrecht 1996
Troskie, C.G., Chalton, D.O. and M. Jacobs (1999): Testing for outliers and influential
observations in multiple regression using restricted least squares, South African Sta-
tist. J. 33 (1999) 1-40
Tsai, H. and K.S. Chan (2000): A note on the covariance structure of a continuous-time
ARMA process, Statistica Sinica 10 (2000) 989-998
Tsimikas, J.V. and J. Ledolter (1997): Mixed model representation of state space models:
new smoothing results and their application to REML estimation, Statistica Sinica 7
(1997) 973-991
Tufts, D.W. and R. Kumaresan (1982): Estimation of frequencies of multiple sinusoids:
making linear prediction perform like maximum likelihood, Proc. of IEEE Special is-
sue on Spectral estimation 70 (1982) 975-989
Turkington, D. (2000): Generalised vec operators and the seemingly unrelated regression
equations model with vector correlated disturbances, J. Econometrics 99 (2000) 225-
253
Ulrych, T.J. and R.W. Clayton (1976): Time series modelling and maximum entropy,
Phys. Earth and Planetary Interiors 12 (1976) 188-200
Vainikko, G.M. (1982): The discrepancy principle for a class of regularization methods,
USSR. Comp. Math. Math. Phys. 22 (1982) 1-19
Vainikko, G.M. (1983): The critical level of discrepancy in regularization methods,
USSR. Comp. Math. Math. Phys. 23 (1983) 1-9
Van der Veen, A.-J. (1996): A Schur method for low-rank matrix approximation, SIAM J.
Matrix Anal. Appl. 17 (1996) 139-160
Van Garderen, K.J. (1999): Exact geometry of autoregressive models, J. Time Series
Analysis 20 (1999) 1-21
Van Gelderen, M. and R. Rummel (2001): The solution of the general geodetic boundary
value problem by least squares, J. Geodesy 75 (2001) 1-11
Van Huffel, S. (1990): Solution and properties of the restricted total least squares prob-
lem, Proceedings of the International Mathematical Theory of Networks and Systems
Symposium (MTNS '89) 521-528
Van Huffel, S. (1997): Recent advances in total least squares techniques and errors-in-
variables modelling, SIAM, Philadelphia 1997
Van Huffel, S. and H. Zha (1991a): The restricted total least squares problem: Formula-
tion, algorithm, and properties, SIAM J. Matrix Anal. Appl. 12 (1991) 292-309
Van Huffel, S. and H. Zha (1991b): The total least squares problem, SIAM J. Matrix
Anal. Appl. 12 (1991) 377-407
Van Mierlo, J. (1980): Free network adjustment and S-transformations, Deutsche Geod.
Kommission B 252, München 1980, 41-54
Van Montfort, K. (1988): Estimating in structural models with non-normal distributed
variables: some alternative approaches, Leiden 1988
Van Montfort, K. (1989): Estimating in structural models with non-normal distributed
variables: some alternative approaches, 'Reprodienst, Subfaculteit Psychologie', Lei-
den 1989
Van Rosen, D. (1988): Moments for matrix normal variables, Statistics 19 (1988) 575-583
Vanicek, P. and E.W. Grafarend (1980): On the weight estimation in leveling, National
Oceanic and Atmospheric Administration, Report NOS 86, NGS 17, Rockville 1980
Varadhan, S.R.S. (2001): Diffusion processes, in: D.N. Shanbhag and C.R. Rao (eds.),
Handbook of Statistics 19 (2001) 853-871
Vasconcellos, K.L.P. and M.C. Gauss (1997): Approximate bias for multivariate nonlin-
ear heteroscedastic regressions, Brazilian J. Probability and Statistics 11 (1997) 141-
159
Vedel-Jensen, E.B. and L. Stougaard-Nielsen (2000): Inhomogeneous Markov point
processes by transformation, Bernoulli 6 (2000) 761-782
Ventsel, A.D. and M.I. Freidlin (1969): On small random perturbations of dynamical
systems, Report delivered at the meeting of the Moscow Mathematical Society on
March 25, 1969, Moscow 1969
Ventzell, A.D. and M.I. Freidlin (1970): On small random perturbations of dynamical
systems, Russian Math. Surveys 25 (1970) 1-55
Verbeke, G. and G. Molenberghs (1997): Linear Mixed Models in Practice, Springer,
New York 1997
Verbeke, G. and G. Molenberghs (2000): Linear mixed models for longitudinal data,
Springer-Verlag, New York 2000
Vernizzi, A., Goller, R. and P. Sais (1995): On the use of shrinkage estimators in filtering
extraneous information, Giorn. Econ. Annal. Economia 54 (1995) 453-480
Verdooren, L.R. (1980): On estimation of variance components, Statistica Neerlandica 34
(1980) 83-106
Vichi, M. (1997): Fitting L2 norm classification models to complex data sets, Student 2
(1997) 203-213
Vigneau, E., Devaux, M.F., Qannari, E.M. and P. Robert (1997): Principal component
regression, ridge regression and ridge principal component regression in spectroscopy
calibration, J. Chemometrics 11 (1997) 239-249
Vinod, H.D. and L.R. Shenton (1996): Exact moments for autoregressive and random
walk models for a zero or stationary initial value, Econometric Theory 12 (1996) 481-
499
Vinograde, B. (1950): Canonical positive definite matrices under internal linear transfor-
mations, Proc. Amer. Math. Soc. 1 (1950) 159-161
Voinov, V.G. and M.S. Nikulin (1993a): Unbiased estimators and their applications,
volume 1: univariate case, Kluwer-Academic Publishers, Dordrecht 1993
Voinov, V.G. and M.S. Nikulin (1993b): Unbiased estimators and their applications,
volume 2: multivariate case, Kluwer-Academic Publishers, Dordrecht 1993
Volterra, V. (1930): Theory of functionals, Blackie, London 1930
Vonesh, E.F. and V.M. Chinchilli (1997): Linear and nonlinear models for the analysis of
repeated measurements, Marcel Dekker Inc, New York – Basel – Hong Kong 1997
Von Mises, R. (1918): Über die „Ganzzahligkeit“ der Atomgewichte und verwandte
Fragen, Phys. Z. 19 (1918) 490-500
Wagner, H. (1959): Linear programming techniques for regression analysis, J. American
Statistical Association 56 (1959) 206-212
Wahba, G. (1975): Smoothing noisy data with spline functions, Numer. Math. 24 (1975)
282-292
Wald, A. (1939): Contributions to the theory of statistical estimation and testing hypothe-
ses, Ann. Math. Statistics 10 (1939) 299-326
Wald, A. (1945): Sequential tests for statistical hypothesis, Ann. Math. Statistics 16
(1945) 117-186
Walker, J.S. (1996): Fast Fourier transforms, 2nd edition, CRC Press, Boca Raton 1996
Walker, P.L. (1996): Elliptic functions, J. Wiley, Chichester U.K. 1996
Walker, S. (1996): An EM algorithm for nonlinear random effect models, Biometrics 52
(1996) 934-944
Wallace, D.L. (1980): The Behrens-Fisher and Fieller-Creasy problems, in: R.A. Fisher:
an appreciation, Fienberg and Hinkley, eds, Springer, pp 117-147, New York 1980
Wan, A.T.K. (1994a): The sampling performance of inequality restricted and pre-test
estimators in a misspecified linear model, Austral. J. Statist. 36 (1994) 313-325
Wan, A.T.K. (1994b): Risk comparison of the inequality constrained least squares and
other related estimators under balanced loss, Econom. Lett. 46 (1994) 203-210
Wan, A.T.K. (1994c): The non-optimality of interval restricted and pre-test estimators
under squared error loss, Comm. Statist. A – Theory Methods 23 (1994) 2231-2252
Wan, A.T.K. (1999): A note on almost unbiased generalized ridge regression estimator
under asymmetric loss, J. Statist. Comput. Simul. 62 (1999) 411-421
Wan, A.T.K. and K. Ohtani (2000): Minimum mean-squared error estimation in linear
regression with an inequality constraint, J. Statistical Planning and Inference 86 (2000)
157-173
Wang, J. (1996): Asymptotics of least-squares estimators for constrained nonlinear re-
gression, Annals of Statistics 24 (1996) 1316-1326
Wang, J. (2000): An approach to GLONASS ambiguity resolution, J. Geodesy 74 (2000)
421-430
Wang, M. C. and G.E. Uhlenbeck (1945): On the theory of the Brownian motion II, Re-
view of Modern Physics 17 (1945) 323-342
Wang, N., Lin, X. and R.G. Guttierrez (1999): A bias correction regression calibration
approach in generalized linear mixed measurement error models, Commun. Statist.
Theory Meth. 28 (1999) 217-232
Wang, Q-H. and B-Y. Jing (1999): Empirical likelihood for partial linear models with
fixed designs, Statistics & Probability Letters 41 (1999) 425-433
Wang, T. (1996): Cochran Theorems for multivariate components of variance models,
Sankhya: The Indian J. Statistics A, 58 (1996) 238-342
Wassel, S.R. (2002): Rediscovering a family of means, Mathematical Intelligencer 24
(2002) 58-65
Waterhouse, W.C. (1990): Gauss’s first argument for least squares, Archive for the His-
tory of Exact Sciences 41 (1990) 41-52
Watson, G.S. (1983): Statistics on spheres, Wiley, New York 1983
Watson, G.S. (1956a): Analysis of dispersion on a sphere, Monthly Notices of the Royal
Astronomical Society, Geophysical Supplement 7 (1956) 153-159
Watson, G.S. (1956b): A test for randomness of directions, Monthly Notices Roy. Astro.
Soc. Geoph. Suppl. 7 (1956) 160-161
Watson, G.S. (1960): More significance tests on the sphere, Biometrika 47 (1960) 87-91
Watson, G.S. (1961): Goodness-of-fit tests on a circle, Biometrika 48 (1961) 109-114
Watson, G.S. (1962): Goodness-of-fit tests on a circle-II, Biometrika 49 (1962) 57-63
Watson, G.S. (1964): Smooth regression analysis, Sankhya: The Indian J. Statistics: Se-
ries A (1964), 359-372
Watson, G.S. (1965): Equatorial distributions on a sphere, Biometrika 52 (1965) 193-201
Watson, G.S. (1966): Statistics of orientation data, J. Geology 74 (1966) 786-797
Watson, G.S. (1967a): Another test for the uniformity of a circular distribution, Bio-
metrika 54 (1967) 675-677
Watson, G.S. (1967b): Some problems in the statistics of directions, Bull. of I.S.I. 42
(1967) 374-385
Watson, G.S. (1968): Orientation statistic in the earth sciences, Bull of the Geological
Institutions of the Univ. of Uppsala 2 (1968) 73-89
Watson, G.S. (1969): Density estimation by orthogonal series, Ann. Math. Stat. 40
(1969) 1469-1498
Watson, G.S. (1970): The statistical treatment of orientation data, Geostatistics – a collo-
quium (Ed. D.F. Merriam), Plenum Press, New York 1970, 1-10
Watson, G.S. (1974): Optimal invariant tests for uniformity, Studies in Probability and
Statistics, Jerusalem Academic Press (1974) 121-128
Watson, G.S. (1983): Large sample theory of the Langevin distributions, J. Stat. Planning
Inference 8 (1983) 245-256
Watson, G.S. (1982): The estimation of palaeomagnetic pole positions, in: Statistics and
Probability: Essays in honor of C.R. Rao, North-Holland, Amsterdam and New York 1982
Watson, G.S. (1982): Distributions on the circle and sphere, Essays in Statistical Science,
J. App. Prob. Special Volume 19A (1982) 265-280
Watson, G.S. (1984): The theory of concentrated Langevin distributions, J. Mult. Anal. 14
(1984) 74-82
Watson, G.S. (1986): Some estimation theory on the sphere, Ann. Inst. Statist. Math. 38
(1986) 263-275
Watson, G.S. (1987): The total approximation problem, in: Approximation theory IV, eds.
Chui, C.K. et al, pages 723-728, Academic Press 1987
Watson, G.S. (1988): The Langevin distribution on high dimensional spheres, J. Applied
Statistics 15 (1988) 123-130
Watson, G.S. (1998): On the role of statistics in palaeomagnetic proof of continental drift,
Canadian J. Statistics 26 (1998) 383-392
Watson, G.S. and E.J. Williams (1956): On the construction of significance tests on the
circle and the sphere, Biometrika 43 (1956) 344-352
Watson, G.S. and E. Irving (1957): Statistical methods in rock magnetism, Monthly No-
tices Roy. Astro. Soc. 7 (1957) 290-300
Watson, G.S. and M.R. Leadbetter (1963): On the estimation of the probability density-I,
Ann. Math. Stat 34 (1963) 480-491
Watson, G.S. and S. Wheeler (1964): A distribution-free two-sample test on a circle,
Biometrika 51 (1964) 256
Watson, G.S. and R.J. Beran (1967): Testing a sequence of unit vectors for serial correla-
tion, J. Geophysical Research 72 (1967) 5655-5659
Watson, G.S., Epp, R. and J.W. Tukey (1971): Testing unit vectors for correlation, J.
Geophysical Research 76 (1971) 8480-8483
Wedderburn, R. (1974): Quasi-likelihood functions, generalized linear models and the
Gauß-Newton method, Biometrika 61 (1974) 439-447
Wei, B.-C. (1998): Exponential family: nonlinear models, Springer Verlag, Singapore
1998
Wei, B.-C. (1998): Testing for varying dispersion in exponential family nonlinear models,
Ann. Inst. Statist. Math. 50 (1998) 277-294
Wei, B.-C. and Y.-Q. Hu (1998): Generalized leverage and its applications, Scandinavian
J. Statistics 25 (1998) 25-37
Wei, M. (1997): Equivalent formulae for the supremum and stability of weighted pseudo-
inverses, Mathematics of Computation 66 (1997) 1487-1508
Wei, M. (2001): Supremum and stability of weighted pseudoinverses and weighted least
squares problems analysis and computations, Nova Science Publishers, New York
2001
Wei, M. and A.R. de Pierro (2000): Upper perturbation bounds of weighted projections,
weighted and constrained least squares problems, SIAM J. Matrix Anal. Appl. 21
(2000) 931-951
Wei, M. and A.R. de Pierro (2000): Some new properties of the equality constrained and
weighted least squares problem, Linear Algebra and its Applications 320 (2000) 145-
165
Weiss, A. (2002): Determination of thermal stratification and turbulence of the atmos-
pheric surface layer over various types of terrain by optical scintillometry, Disserta-
tion Swiss Federal Institute of Technology Zurich 2002
Weiss, G. and R. Rebarber (2000): Optimizability and estimatability for infinite-
dimensional linear systems, Siam J. Control Optim. 39 (2000) 1204-1232
Weisstein, E.W. (1999): Legendre Polynomial, CRC Press LLC, Wolfram Research Inc.
1999
Wellisch, S. (1910): Theorie und Praxis der Ausgleichungsrechnung Band 2: Probleme
der Ausgleichungsrechnung, pp. 46-49, Kaiserliche und königliche Hof-
Buchdruckerei und Hof-Verlags-Buchhandlung, Carl Fromme, Wien und Leipzig
1910
Wellner, J. (1979): Permutation tests for directional data, Ann. Statist. 7 (1979) 924-943
Wells, D.E., Lindlohr, W., Schaffrin, B. and E.W. Grafarend (1987): GPS design: Undif-
ferenced carrier beat phase observations and the fundamental difference theorem,
University of New Brunswick, Surveying Engineering, Technical Report Nr. 116,
141 pages, Fredericton/Canada 1987
Welsh, A.H. (1996): Aspects of statistical inference, J. Wiley, New York 1996
Wenzel, H.G. (1977): Zur Optimierung von Schwerenetzen, Z. Vermessungswesen 102
(1977) 452-457
Werkmeister, P. (1916): Graphische Ausgleichung bei trigonometrischer Punktbestim-
mung durch Einschneiden, Z. Vermessungswesen 45 (1916) 113-126
Wernstedt, J. (1989): Experimentelle Prozeßanalyse, Oldenbourg Verlag, München 1989
Wertz, J.R. (1978): Spacecraft attitude determination and control, Kluwer Academic
Publishers, Dordrecht – Boston – London 1978
Wess, J. (1960): The conformal invariance in quantum field theory, in: Il Nuovo Cimento,
Nicola Zanichelli (ed.), vol 18, Bologna 1960
Wetzel, W., Jöhnk, M.D. and P. Naeve (1967): Statistische Tabellen, de Gruyter, Berlin
1967
Whittaker, E.T. and G. Robinson (1924): The calculus of observations, Blackie, London
1924
Whittle, P. (1963a): Prediction and regulation, D. van Nostrand Co., Inc., Princeton 1963
Whittle, P. (1963b): Stochastic processes in several dimensions, Bull. Inst. Int. Statist. 40
(1963) 974-994
Whittle, P. (1973): Some general points in the theory of optimal experimental design, J.
Royal Statist. B35 (1973) 123-130
Wickerhauser, M.V. (1996): Adaptive Wavelet-Analysis, Theorie und Software, Vieweg
& Sohn Verlag, Braunschweig/Wiesbaden 1996
Wieser, A. (2000): Equivalent weight matrix, Graz University of Technology, Graz 2000
Wigner, E.P. (1958): On the distribution of the roots of certain symmetric matrices, Ann.
Math. 67 (1958)
Wilcox, R.R. (1997): Introduction to robust estimation and hypothesis testing, Academic
Press, San Diego 1997
Wilcox, R.R. (2001): Fundamentals of modern statistical methods, Springer Verlag, New
York 2001
Wilders, P. and E. Brakkee (1999): Schwarz and Schur: an algebraical note on equiva-
lence properties, SIAM J. Sci. Comput. 20 (1999) 2297-2303
Wilkinson, J. (1965): The algebraic eigenvalue problem, Clarendon Press, Oxford 1965
Wilks, S.S. (1962): Mathematical statistics, J. Wiley, New York 1962
Wilks, S.S. (1963): Multivariate statistical outliers, Sankhya A 25 (1963) 407-426
Williams, E.J. (1963): A comparison of the direct and fiducial arguments in the estimation
of a parameter, J. Royal Statistical Society, Series B, 25 (1963) 95-99
Wimmer, G. (1995): Properly recorded estimate and confidence regions obtained by an
approximate covariance operator in a special nonlinear model, Applications of
Mathematics 40 (1995) 411-431
Wimmer, H. (1981a): Ein Beitrag zur Gewichtsoptimierung geodätischer Netze, Deutsche
Geodätische Kommission, München, Reihe C (1981), 269
Wimmer, H. (1981b): Second-order design of geodetic networks by an iterative approxi-
mation of a given criterion matrix, in: Proc. of the IAG Symposium on geodetic net-
works and computations, R. Sigle ed., Deutsche Geodätische Kommission, München,
Reihe B, Nr. 258 (1981), Heft Nr. III, 112-127
Wishart, J. (1928): The generalized product moment distribution in samples from a nor-
mal multivariate population, Biometrika 20 (1928) 32-52
Wishner, R., Tabaczynski, J. and M. Athans (1969): A comparison of three non-linear
filters, Automatica 5 (1969) 487-496
Witkovsky, V. (1998): Modified minimax quadratic estimation of variance compo-
nents, Kybernetika 34 (1998) 535-543
Witting, H. and G. Nölle (1970): Angewandte Mathematische Statistik, Teubner Verlag,
Stuttgart 1970
Wolf, H. (1968): Ausgleichungsrechnung nach der Methode der kleinsten Quadrate,
Ferdinand Dümmlers Verlag, Bonn 1968
Wolf, H. (1973): Die Helmert-Inverse bei freien geodätischen Netzen, Z. Vermessungs-
wesen 98 (1973) 396-398
Wolf, H. (1975): Ausgleichungsrechnung I, Formeln zur praktischen Anwendung,
Duemmler, Bonn 1975
Wolf, H. (1976): The Helmert block method, its origin and development, Proc. 2nd Int.
Symp. on problems related to the Redefinition of North American Geodetic Networks,
pp. 319-326, Arlington 1976
Wolf, H. (1980a): Ausgleichungsrechnung II, Aufgaben und Beispiel zur praktischen
Anwendung, Duemmler, Bonn 1980
Wolf, H. (1980b): Hypothesentests im Gauß-Helmert-Modell, Allg. Vermessungsnach-
richten 87 (1980) 277-284
Wolf, H. (1997): Ausgleichungsrechnung I und II, 3. Auflage, Ferdinand Dümmler Ver-
lag, Bonn 1997
Wolfowitz, J. and J. Kiefer (1959): Optimum design in regression problems, Ann. Math.
Statist. 30 (1959) 271-294
Wolkowicz, H. and G.P.H. Styan (1980): More bounds for eigenvalues using traces,
Linear Algebra Appl. 31 (1980) 1-17
Wolter, K.H. and W.A. Fuller (1982): Estimation of the quadratic errors-in-variables
model, Biometrika 69 (1982) 175-182
Wong, C.S. (1993): Linear models in a general parametric form, Sankhya 55 (1993) 130-
149
Wong, W.K. (1992): A unified approach to the construction of minimax designs, Bio-
metrika 79 (1992) 611-620
Wood, A. (1982): A bimodal distribution for the sphere, Applied Statistics 31 (1982) 52-
58
Woolson, R.F. and W.R. Clarke (1984): Analysis of categorical incomplete data, J. Royal
Statist. Soc. Series A 147 (1984) 87-99
Worbs, E. (1955): Carl Friedrich Gauß, ein Lebensbild, Leipzig 1955
Wu, C.F.J. (1981): Asymptotic theory of nonlinear least squares estimation, Ann. Stat. 9
(1981) 501-513
Wu, Q. and Z. Jiang (1997): The existence of the uniformly minimum risk equivariant
estimator in Sure model, Commun. Statist. - Theory Meth. 26 (1997) 113-128
Wunsch, G. (1986): Handbuch der Systemtheorie, Oldenbourg Verlag, München 1986
Xi, Z. (1993): Iterated Tikhonov regularization for linear ill-posed problems, PhD. Thesis,
Universität Kaiserslautern, Kaiserslautern 1993
Xu, P. (1989): On robust estimation with correlated observations, Bulletin Géodésique 63
(1989) 237-252
Xu, P. (1991): Least squares collocation with incorrect prior information, Z. Verm. 116
(1991) 266-273
Xu, P. (1992): The value of minimum norm estimation of geopotential fields, Geoph. J.
Int. 111 (1992) 170-178
Xu, P. (1995): Testing the hypotheses of non-estimable functions in free net adjustment
models, Manuscripta Geodaetica 20 (1995) 73-81
Xu, P. (1999a): Biases and accuracy of, and an alternative to, discrete nonlinear filters, J.
Geodesy 73 (1999) 35-46
Xu, P. (1999b): Spectral theory of constrained second-rank symmetric random tensors,
Geoph. J. Int. 138 (1999) 1-24
Xu, P. (2001): Random simulation and GPS decorrelation, J. Geodesy 75 (2001) 408-423
Xu, P. (2002): Isotropic probabilistic models for directions, planes and referential sys-
tems, Proc. Royal Soc. London A 458 (2002) 2017-2038
Xu, P. and E.W. Grafarend (1996): Statistics and geometry of the eigenspectra of three-
dimensional second-rank symmetric random tensors, Geophysical Journal Interna-
tional 127 (1996) 744-756
Xu, P. and E.W. Grafarend (1996): Probability distribution of eigenspectra and eigendi-
rections of a twodimensional, symmetric rank two random tensor, J. Geodesy 70
(1996) 419-430
Yaglom, A.M. (1961): Second-order homogeneous random fields in: Proceedings of the
Fourth Berkeley Symposium on Mathematical Statistics and Probability, pages 593-
622, University of California Press, Berkeley 1961
Yang, H. (1996): Efficiency matrix and the partial ordering of estimate, Commun. Statist.
- Theory Meth. 25(2) (1996) 457-468
Yang, Y. (1994): Robust estimation for dependent observations, Manuscripta Geodaetica
19 (1994) 10-17
Yang, Y. (1999a): Robust estimation of geodetic datum transformation, J. Geodesy 73
(1999) 268-274
Yang, Y. (1999b): Robust estimation of systematic errors of satellite laser range, J. Geod-
esy 73 (1999) 345-349
Yazji, S. (1998): The effect of the characteristic distance of the correlation function on the
optimal design of geodetic networks, Acta Geod. Geoph. Hung., Vol. 33 (2-4) (1998)
215-234
Ye, Y. (1997): Interior point algorithms: Theory and analysis, Wiley, New York 1997
Yeh, A.B. (1998): A bootstrap procedure in linear regression with nonstationary errors,
The Canadian J. Stat. 26 (1998) 149-160
Yeung, M.-C. and T.F. Chan (1997): Probabilistic analysis of Gaussian elimination with-
out pivoting, SIAM J. Matrix Anal. Appl. 18 (1997) 499-517
Ylvisaker, D. (1977): Test resistance, J. American Statist. Assoc. 72 (1977) 551-557
Yohai, V.J. and R.H. Zamar (1997): Optimal locally robust M-estimates of regression, J.
Statistical Planning and Inference 64 (1997) 309-323
Yor, M. (1992): Some aspects of Brownian motion, Part I: Some special functionals,
Birkhäuser Verlag, Basel 1992
Yor, M. (1997): Some aspects of Brownian motion, Part II: Some recent martingale prob-
lems, Birkhäuser Verlag, Basel 1997
Youcai, H. and S.P. Mertikas (1995): On the designs of robust regression estimators,
Manuscripta Geodaetica 20 (1995) 145-160
Youssef, A.H.A. (1998): Coefficient of determination for random regression model,
Egypt. Statist. J. 42 (1998) 188-196
Yu, Z.C. (1996): A universal formula of maximum likelihood estimation of variance-
covariance components, J. Geodesy 70 (1996) 233-240
Yuan, K.H. and P.M. Bentler (1997): Mean and covariance structure analysis: theoretical
and practical improvements, J. American Statist. Assoc. 92 (1997) 767-774
Yuan, K.H. and P.M. Bentler (1998a): Robust mean and covariance structure analysis,
British J. Mathematical and Statistical Psychology (1998) 63-88
Yuan, K.H. and P.M. Bentler (1998b): Robust mean and covariance structure analysis
through iteratively reweighed least squares, Psychometrika 65 (2000) 43-58
Yuan, Y. (2000): On the truncated conjugate gradient method, Math. Prog. 87 (2000) 561-
571
Yusuf, S., Peto, R., Lewis, J., Collins, R. and P. Sleight (1985): Beta blockade during and
after myocardial infarction: An overview of the randomized trials, Progress in Car-
diovascular Diseases 27 (1985) 335-371
Zabell, S. (1992): R.A. Fisher and the fiducial argument, Statistical Science 7 (1992) 369-
387
Zackin, R., de Gruttola, V. and N. Laird (1996): Nonparametric mixed-effects for re-
peated binary data arising in serial dilution assays: Application to estimating viral
burden in AIDS, J. American Statist. Assoc. 91 (1996) 52-61
Zacks, S. (1971): The theory of statistical inference, J. Wiley, New York 1971
Zacks, S. (1996): Adaptive designs for parametric models, 151-180
Zadeh, L. (1965): Fuzzy sets, Information and Control 8 (1965) 338-353
Závoti, J. (1999): Modified versions of estimates based on least squares and minimum
norm, Acta Geod. Geoph. Hung. 34 (1999) 79-86
Závoti, J. (2001): Filtering of earth’s polar motion using trigonometric interpolation, Acta
Geod. Geoph. Hung. 36 (2001) 345-352
Zehfuss, G. (1858): Über eine gewisse Determinante, Zeitschrift für Mathematik und
Physik 3 (1858) 298-301
Zellner, A. (1971): An introduction to Bayesian inference in econometrics, J. Wiley, New
York 1971
Zha, H. (1995): Comments on large least squares problems involving Kronecker products,
SIAM J. Matrix Anal. Appl. 16 (1995) 1172
Zhan, X. (2000): Singular values of differences of positive semidefinite matrices, Siam J.
Matrix Anal. Appl. 22 (2000) 819-823
Zhang, Y. (1985): The exact distribution of the Moore-Penrose inverse of X with a den-
sity, in: Multivariate Analysis VI, Krishnaiah, P.R. (ed), pages 633-635, Elsevier,
New York 1985
Zhang, J.Z., Chen, L.H. and N.Y. Deng (2000): A family of scaled factorized Broyden-
like methods for nonlinear least squares problems, SIAM J. Optim. 10 (2000) 1163-
1179
Zhang, Z. and Y. Huang (2003): A projection method for least squares problems with a
quadratic equality constraint, SIAM J. Matrix Anal. Appl. 25 (2003) 188-212
Zhao, L. (2000): Some contributions to M-estimation in linear models, J. Statistical Plan-
ning and Inference 88 (2000) 189-203
Zhao, Y. and S. Konishi (1997): Limit distributions of multivariate kurtosis and moments
under Watson rotational symmetric distributions, Statistics & Probability Letters 32
(1997) 291-299
Zhdanov, M.S. (2002): Geophysical inverse theory and regularization problems, Methods
in Geochemistry and Geophysics 36, Elsevier, Amsterdam-Boston-London 2002
Zhen-Su She, Jackson, E. and S. Orszag (1990): Intermittency of turbulence, in: The
Legacy of John von Neumann (J. Glimm, J. Impagliazzo, I. Singer eds.) Proc. Symp.
Pure Mathematics, vol. 50, pages 197-211, American Mathematical Society, Provi-
dence, Rhode Island 1990
Zhong, D. (1997): Robust estimation and optimal selection of polynomial parameters for
the interpolation of GPS geoid heights, J. Geodesy 71 (1997) 552-561
Zhou, J. (2001): Two robust design approaches for linear models with correlated errors,
Statistica Sinica 11 (2001) 261-272
Zhou, K.Q. and S.L. Portnoy (1998): Statistical inference on heteroskedastic models
based on regression quantiles, J. Nonparametric Statistics 9 (1998) 239-260
Zhu, J. (1996): Robustness and the robust estimate, J. Geodesy 70 (1996) 586-590
Ziegler, A., Kastner, C. and M. Blettner (1998): The generalised estimating equations: an
annotated bibliography, Biom. J. 40 (1998) 115-139
Zimmermann, H.-J. (1991): Fuzzy set theory and its applications, 2nd ed., Kluwer Aca-
demic Publishers, Dordrecht 1991
Zioutas, G., Camarinopoulos, L. and E.B. Senta (1997): Theory and Methodology: Robust
autoregressive estimates using quadratic programming, European J. Operational Re-
search 101 (1997) 486-498
Zippel, R. (1993): Effective polynomial computations, Kluwer Academic Publishers,
Boston 1993
Zolotarev, V.M. (1997): Modern theory of summation of random variables, VSP, Utrecht
1997
Zucker, D.M., Lieberman, O. and O. Manor (2000): Improved small sample inference in
the mixed linear model: Bartlett correction and adjusted likelihood, J. R. Statist. Soc.
B. 62 (2000) 827-838
Zurmuehl, R. and S. Falk (1984): Matrizen und ihre Anwendungen, Teil 1: Grundlagen,
5.ed., Springer-Verlag, Berlin 1984
Zurmuehl, R. and S. Falk (1986): Matrizen und ihre Anwendungen, Teil 2: Numerische
Methoden, 5.ed., Springer-Verlag, Berlin 1986
Zwet, W.R. van and J. Oosterhoff (1967): On the combination of independent test statistics,
Ann. Math. Statist. 38 (1967) 659-680
Zyskind, G. (1969): On best linear estimation and a general Gauss-Markov theorem in
linear models with arbitrary nonnegative covariance structure, SIAM J. Appl. Math.
17 (1969) 1190-1202
Index
1-way classification, 460, 461, 463, 464
2-way classification, 464, 467, 469, 470, 473, 475
3-way classification, n-way classification, 476
3d datum transformation, 431, 433, 441
algebraic regression, 40
A-optimal design, 323, 359
ARIMA, 455, 477, 478
arithmetic mean, 191, 195, 184
ARMA, 455, 478
ARMA process, 477
associativity, 486, 497
augmented Helmert matrix, 583
autoregressive integrated moving-average process, see ARIMA
best homogeneous linear prediction, 400
best homogeneous linear unbiased prediction, 400
best inhomogeneous linear prediction, 400
Best Invariant Quadratic Uniformly Unbiased Estimation, see BIQUUE
Best Linear Uniformly Unbiased Estimation, see BLUUE
Best Linear V-Norm Uniformly Minimum Bias S-Norm Estimation, 462
bias matrix, 87, 88, 91, 93, 94, 288, 312, 313, 349
bias vector, 87, 91, 93, 90, 288, 293, 292, 300, 304, 309, 312, 313, 314, 320, 321, 322, 349, 350, 355, 357, 358
bias weight matrix, 359
BIQE, 285, 294, 299, 300, 301, 303, 304, 305, 311
BIQUUE, 187, 189-196, 198-201, 217, 236-241, 285, 294, 298-301, 303, 304, 305, 311, 380, 379, 385, 386, 387, 569, 571, 572, 574, 579, 581, 582, 588, 596, 597, 599, 603, 606, 613, 621, 622, 623, 631, 632, 633, 634, 640, 642
bivariate Gauss-Laplace pdf, 627
Bjerhammar formula, 485, 516
BLE, 285, 287
BLIMBE, 311, 642
BLIP, 347
BLUMBE, 285, 291, 293, 287, 298, 299, 300
BLUUE, 188, 187, 194, 195, 201, 208, 210, 379, 380, 387, 430, 467, 567, 569, 571, 572, 574, 579, 581, 582, 585, 586, 588, 592, 596, 597, 599, 603, 606, 621, 622, 629, 630, 632, 633, 639, 640, 643
BLUUP, 380, 387
bordering, 302, 485
break points, 176, 177, 181, 182, 143
Brown process, 476
canonical LESS, 135, 137, 139, 131
canonical MINOLESS, 281
canonical MINOS, 1, 13, 26, 36, 37, 41, 212, 485
Cayley inverse, 322, 323, 324, 359, 360, 497, 498, 499, 501, 502, 513, 517, 576
Cayley multiplication, 513
Cayley-product, 486, 497
characteristic equation, 509
characteristic polynomial, 509
Choleski decomposition, 503
collocation, 386, 399
column space, 6, 496, 497
commutativity, 486
condition equations with unknowns, 429
conditional equations, 413
confidence coefficient, 554
confidence interval, 543, 553, 557, 564, 567, 592, 596, 605, 606, 611, 612, 613, 614, 617, 619, 620
confidence region, 543, 621
consistent linear equation, 2, 6
cumulative pdf, 569, 571, 581