
Linear and

Nonlinear Models:
Fixed Effects, Random
Effects, and Mixed Models

Erik W. Grafarend

Walter de Gruyter
Grafarend · Linear and Nonlinear Models
Erik W. Grafarend

Linear and
Nonlinear Models
Fixed Effects, Random Effects, and Mixed Models


Walter de Gruyter · Berlin · New York
Author
Erik W. Grafarend, em. Prof. Dr.-Ing. habil. Dr. tech. h.c. mult Dr.-Ing. E.h. mult
Geodätisches Institut
Universität Stuttgart
Geschwister-Scholl-Str. 24/D
70174 Stuttgart, Germany
E-Mail: grafarend@gis.uni-stuttgart.de


∞ Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence
and durability.

Library of Congress Cataloging-in-Publication Data

Grafarend, Erik W.
Linear and nonlinear models : fixed effects, random effects, and
mixed models / by Erik W. Grafarend.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-3-11-016216-5 (hardcover : acid-free paper)
ISBN-10: 3-11-016216-4 (hardcover : acid-free paper)
1. Regression analysis. 2. Mathematical models. I. Title.
QA278.2.G726 2006
519.5'36–dc22
2005037386

Bibliographic information published by Die Deutsche Bibliothek


Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>.

ISBN-13: 978-3-11-016216-5
ISBN-10: 3-11-016216-4

© Copyright 2006 by Walter de Gruyter GmbH & Co. KG, 10785 Berlin
All rights reserved, including those of translation into foreign languages. No part of this book may
be reproduced or transmitted in any form or by any means, electronic or mechanical, including
photocopy, recording, or any information storage and retrieval system, without permission in
writing from the publisher.
Printed in Germany
Cover design: Rudolf Hübler, Berlin
Typeset using the author’s word files: M. Pfizenmaier, Berlin
Printing and binding: Hubert & Co. GmbH & Co. Kg, Göttingen
Preface

“All exact science is dominated by the idea of approximation.”


B. Russell

“You must always invert.”


C.G.J. Jacobi

“Well, Mr. Jacobi; here it is: all the generalized inversion of two generations of
inventors who knowingly or unknowingly subscribed and extended your dictum.
Please, forgive us if we have over-inverted, or if we have not always inverted in
the natural and sensible way. Some of us have inverted with labor and pain by
using hints from a dean or a tenure and promotion committee that “you better
invert more, or else you would be inverted.””
M.Z. Nashed, L.B. Rall

There is a certain intention in reviewing linear and nonlinear models from the
point-of-view of fixed effects, random effects and mixed models. First, we want
to portray the different models from the algebraic point of view – for instance a
minimum norm, least squares solution (MINOLESS) – versus the stochastic
point-of-view – for instance a minimum bias, minimum variance “best” solution
(BLIMBE). We are especially interested in the question under which assumption
the algebraic solution coincides with the stochastic solution, for instance when
MINOLESS is identical to BLIMBE.
The stochastic approach is richer with respect to modeling. Besides the first order moments, the expectation of a random variable, we also need a design for the central second order moments, the variance-covariance matrix of the random variable, as long as we deal with second order statistics. Second, we therefore set up a unified approach to estimate (predict) the first order moments, for instance by BLUUE (BLUUP), and the central second order moments, for instance by BIQUUE, if they exist. In short, BLUUE (BLUUP) stands for “Best
Linear Uniformly Unbiased Estimation” (Prediction) and BIQUUE alternatively
for “Best Invariant Quadratic Uniformly Unbiased Estimation”.
A third criterion is the decision whether the observation vector is inconsistent or
random, whether the unknown parameter vector is random or not, whether the
“first design matrix” within a linear model is random or not and finally whether
the “mixed model” E{y} = Aξ + C E{z} + E{X}γ has to be applied if we restrict
ourselves to linear models. How to handle a nonlinear model where we have a
priori information about approximate values will be outlined in detail. As a
special case we also deal with “condition equations with unknowns”
B E{y} − c = Aξ, where the matrices/vector {A, B, c} are given and the observation
vector y is again a random variable.

A fourth problem is related to the question of what is happening when we take


observations not over ℝⁿ (real line, n-dimensional linear space) but over
Sⁿ (circle S¹, sphere S², …, hypersphere Sⁿ), over Eⁿ (ellipse E¹, ellipsoid
E², …, hyperellipsoid Eⁿ), in short over a curved manifold. We show in par-
ticular that the circular variables are elements of a von Mises distribution or that
the spherical variables are elements of a von Mises-Fisher distribution.
A more detailed discussion is in front of you.
The first problem of algebraic regression in Chapter one is constituted by a
consistent system of linear observational equations of type underdetermined
system of linear equations. So we may say “more unknowns than equations”. We
solve the corresponding system of linear equations by an optimization problem
which we call the minimum norm solution (MINOS). We discuss the semi-norm
solution of Special Relativity and General Relativity and alternative norms of
type ℓ_p. For “MINOS” we identify the typical generalized inverse and the eigenvalue decomposition for G_x-MINOS. For our Front Page Example we compute canonical MINOS. Special examples are Fourier series and Fourier-Legendre series, namely circular harmonic and spherical harmonic regression. Special nonlinear models include Taylor polynomials and the generalized Newton iteration, with a planar triangular network whose nodal points are a priori coordinated as an example. The representation of the proper objective function of type “MINOS” is finally given for a defective network (P-diagram, E-diagram). The transformation groups for observed coordinate differences (translation groups T(2), T(3), …, T(n)), for observed distances (groups of motion T(2) ⋊ SO(2), T(3) ⋊ SO(3), …, T(n) ⋊ SO(n)), for observed angles or distance ratios (conformal groups C_4(2), C_7(3), …, C_{(n+1)(n+2)/2}(n)) and for observed cross-ratios of area elements in the projective plane (projective group) are reviewed with their datum parameters.
Alternatively, the first problem of probabilistic regression – the special Gauss-
Markov model with datum defect – namely the setup of the linear uniformly
minimum bias estimator of type LUMBE for fixed effects is introduced in Chap-
ter two. We define the first moment equations Aξ = E{y} and the second central
moment equations Σ_y = D{y} and estimate the fixed effects by the homogeneous
linear setup ξ̂ = Ly of type S-LUMBE by the additional postulate of minimum
bias ‖B‖²_S = ‖I_m − LA‖²_S, where B := I_m − LA is the bias matrix. When are
G_x-MINOS and S-LUMBE equivalent? The necessary and sufficient condition
is G_x = S⁻¹ or G_x⁻¹ = S, a key result. We give at the end an extensive example.
The second problem of algebraic regression in Chapter three treats an inconsis-
tent system of linear observational equations of type overdetermined system of
linear equations. Or we may say “more observations than unknowns”. We solve
the corresponding system of linear equations by an optimization problem which
we call the least squares solution (LESS). We discuss the signature of the obser-
vation space when dealing with Special Relativity and alternative norms of type
l p , namely l2 ,… , l p ,… , lf . For extensive applications we discuss various objec-
tive functions like (i) optimal choice of the weight matrix G y : second order
design SOD, (ii) optimal choice of weight matrix G y by means of condition
equations, and (iii) robustifying objective functions. In all detail we introduce the
second order design SOD by an optimal choice of a criterion matrix of weights.
What is the proper choice of an ideal weight matrix G_x? Here we propose the Taylor-Karman matrix borrowed from the Theory of Turbulence, which generates a homogeneous and isotropic weight matrix G_x (ideal). Based upon the fundamental work of G. Kampmann, R. Jurisch and B. Krause we robustify G_y-LESS and identify outliers. In particular we identify Grassmann-Plücker coordinates which span the normal space R(A)^⊥. We pay a short tribute to Fuzzy Sets. In some detail we identify G_y-LESS and its generalized inverse. Canonical LESS is based on the eigenvalue decomposition of G_y-LESS, illustrated by an extensive example. As a case study we pay attention to partial redundancies, latent
conditions, high leverage points versus break points, direct and inverse
Grassmann coordinates, Plücker coordinates, “hat” matrix, right eigenspace
analysis, multilinear algebra, “join” and “meet”, the Hodge star operator, dual
Grassmann coordinates, dual Plücker coordinates, Gauss-Jacobi Combinatorial
Algorithm concluding with a historical note on C.F. Gauss, A.M. Legendre and
the invention of Least Squares and its generalization.
Alternatively, the second problem of probabilistic regression in Chapter four the
special Gauss-Markov model without datum effect – namely the setup of the best
linear uniformly unbiased estimator for the first order moments of type BLUUE
and of the best invariant quadratic uniformly unbiased estimator for the central
second order moments of type BIQUUE for random observations is introduced.
First, we define ξ̂ and Σ_y-BLUUE, two lemmas and a theorem. Alternatively, second, we set up by four definitions and by six corollaries, five lemmas and two theorems IQE (“invariant quadratic estimation”) and best IQUUE (“best invariant quadratic uniformly unbiased estimator”). Alternative estimators of type MALE (“maximum likelihood”) are reviewed. Special attention is paid to the “IQUUE” of variance-covariance components of Helmert type, called “HIQUUE” and “MIQE”. For the case of one variance component, we are able to find necessary and sufficient conditions when LESS agrees with BLUUE, namely G_y = Σ_y⁻¹ or G_y⁻¹ = Σ_y, a key result.
The third problem of algebraic regression in Chapter five – the inconsistent
system of linear observational equations with datum defect: overdetermined-
underdetermined system – presents us with three topics. First, by one definition
and five lemmas we document the minimum norm, least squares solution
(“MINOLESS”). Second, we review the general eigenspace analysis versus the general eigenspace synthesis. Third, special estimators of type “α-hybrid approximation solution” (“α-HAPS”) and “Tykhonov-Phillips regularization” round up the alternative estimators.
Alternatively, the third problem of probabilistic regression in Chapter six – the special Gauss-Markov model with datum problem – namely the setup of estimators of type “BLIMBE” and “BLE” for the moments of first order and of type “BIQUUE” and “BIQE” for the central moments of second order, is reviewed. First, we define ξ̂ as homogeneous Σ_y, S-BLUMBE (“Σ_y, S best linear uniformly minimum bias estimator”) and compute via two lemmas and three theorems “hom Σ_y, S-BLUMBE”, E{y}, D{Aξ̂}, D{ε_y} as well as “σ̂²-BIQUUE” and “σ̂²-BIQE” of σ². Second, by three definitions, one lemma and three theorems we work on “hom BLE”, “hom S-BLE”, “hom α-BLE”. Extensive
examples are given. For the case of one variance component we are able to find

necessary and sufficient conditions when MINOLESS agrees with BLIMBE, namely G_x = S⁻¹, G_y = Σ_y⁻¹ or G_x⁻¹ = S, G_y⁻¹ = Σ_y, a key result.
As a spherical problem of algebraic representation we treat an incomplete system of directional observational equations, namely an overdetermined system of nonlinear equations on curved manifolds (circle, sphere, hypersphere Sᵖ). We define what we mean by minimum geodesic distance on S¹ and S² and present two lemmas on S¹ and two lemmas on S² of type minimum geodesic distances. In particular, we take reference to the von Mises distribution on the circle, to the Fisher spherical distribution on the sphere and, in general, to the Langevin sphere Sᵖ ⊂ ℝᵖ⁺¹. The minimal geodesic distance (“MINGEODISC”) is computed for Λ_g and (Λ_g, Φ_g). We solve the corresponding nonlinear normal equations. In conclusion, we present a historical note on the von Mises distribution
and generalize to the two-dimensional generalized Fisher distribution by an
oblique map projection. At the end, we summarize the notion of angular metric
and give an extensive case study.
The fourth problem of probabilistic regression in Chapter eight as a special
Gauss-Markov model with random effects is described as “BLIP” and “VIP” for
the central moments of first order. Definitions are given by hom BLIP (“homo-
geneous best linear Mean Square Predictor”), S-hom BLIP (“homogeneous linear
minimum S-modified Mean Square Predictor”) and hom α-VIP (“homogeneous linear minimum variance-minimum bias in the sense of a weighted hybrid norm solution”). One lemma and three theorems collect the results for (i) hom BLIP, (ii) hom S-BLIP and (iii) hom α-VIP. In all detail, we compute the predicted
solution for the random effects, its bias vector, the Mean Square Prediction Error
MSPE. Three cases for nonlinear error propagation with random effects are
discussed.
In Chapter nine we specialize towards the fifth problem of algebraic regression,
namely the system of conditional equations of homogeneous and inhomogeneous
type. We follow two definitions, one theorem and three lemmas of type G y -
LESS before we present an example from angular observations.
As Chapter ten we treat the fifth problem of probabilistic regression, the Gauss-
Markov model with mixed effects in setting up BLUUE estimators for the mo-
ments of first order, a special case of Kolmogorov-Wiener prediction. After defining Σ_y-BLUUE of ξ and E{z}, where z is a random variable, we present two lemmas and one theorem on how to construct the estimators ξ̂ and Ê{z} on the basis of Σ_y-BLUUE of ξ and E{z}. By a separate theorem we fix a homogeneous quadratic setup of the variance component σ̂² within the first model of fixed effects
and random effects superimposed. As an example we present “collocation” en-
riched by a set of comments about A.N. Kolmogorov – N. Wiener prediction, the
so-called “yellow devil”.
Chapter eleven leads us to the “sixth problem of probabilistic regression”, the
celebrated random effect model “errors-in-variables”. We outline the model and
sum up the theory of normal equations. Our example is the linear equation
E{y} = E{X}γ, where the first order design matrix is random. An alternative
name is “straight line fit by total least squares”. Finally we give a detailed ex-
ample and a literature list.
C.F. Gauss and F.R. Helmert introduced the sixth problem of generalized alge-
braic regression, the system of conditional equations with unknowns which we
proudly present in Chapter twelve. First, we define W-LESS of the model Ax + Bi = By, where i is an inconsistency parameter. In two lemmas we solve its normal equations and discuss the condition on the matrices A and B. Two alternative solutions, based on R, W-MINOLESS (two lemmas, one definition) and on R, W-HAPS (one lemma, one definition), are given separately. An example is reviewed as a height network. For shifted models of type Ax + Bi = By − c similar results are summarized.
For the special nonlinear problem of the 3d datum transformation of Chapter
thirteen we review the famous Procrustes Algorithm. With the algorithm we
consider the coupled unknowns of type dilatation, also called scale factor, trans-
lation and rotation for random variables of 3d coordinates in an “old system”
and in a “new system”. After the definition of the conformal group C7(3) in a
three-dimensional network with 7 unknown parameters we present four corollar-
ies and one theorem: First, we reduce the translation parameters, second the
scale parameters and last, not least, third the rotation parameters bound together
in a theorem. A special result is the computation of the variance-covariance ma-
trix of the observation array E := Y₁ − Y₂X₃′x₁ − 1x₂′ as a function of Σ_{vec Y₁′}, Σ_{vec Y₂′}, Σ_{vec Y₁′, vec Y₂′} and (I_n ⊗ x₁X₃). A detailed example of type I-LESS is given, including a discussion about ‖E_l‖ and |||E_l|||, precisely defined.
Here we conclude with a reference list.
Chapter fourteen as our sixth problem of type generalized algebraic regression
“revisited” deals with “The Grand Linear Model”, namely the split level of con-
ditional equations with unknowns (general Gauss-Helmert model). The linear
model consists of 3 components: (i) B₁i = B₁y − c₁, (ii) A₂x + B₂i = B₂y − c₂, c₂ ∈ R(B₂), and (iii) A₃x − c₃ = 0 or A₃x + c₃ = 0, c₃ ∈ R(A₃). The first equation contains only conditions on the observation vector. In contrast, the second equation balances condition equations between the unknown parameters in the form of A₂x and the conditions on B₂y. Finally, the third equation is a condition exclu-
sively on the unknown parameter vector. For our model Lemma 14.1 presents the
W-LESS solution, Lemma 14.2 the R, W-MINOLESS solution and Lemma 14.3
the R, W-HAPS solution. As an example we treat a planar triangle whose coordinates are determined from three measured distances under a datum condition.
Chapter fifteen is concerned with three topics. First, we generalize the univariate
Gauss-Markov model to the multivariate Gauss-Markov model with and without
constraints. We present two definitions, one lemma about multivariate LESS,
one lemma about its counterpart of type multivariate Gauss-Markov modeling
and one theorem of type multivariate Gauss-Markov modeling with constraints.
Second, by means of a MINOLESS solution we present the celebrated “n-way
classification model” to answer the question of how to compute a basis of unbi-
ased estimable quantities. Third, we take into account the fact that in addition to
observational models we also have dynamical system equations. In some detail,
we review the Kalman Filter (Kalman-Bucy Filter), models of type ARMA and
ARIMA. We illustrate the notions of “steerability” and “observability” by two
examples. The state differential equation as well as the observational equation
are simultaneously solved by Laplace transformation. At the end we focus on

the modern theory of dynamic nonlinear models and comment on the theory of
chaotic behavior as its up-to-date counterpart.
In the appendices we specialize on specific topics. Appendix A is a review on
matrix algebra, namely special matrices, scalar measures and inverse matrices,
eigenvalues and eigenvectors and generalized inverses. The counterpart is matrix
analysis which we outline in Appendix B. We begin with derivations of scalar-
valued and vector-valued vector functions, followed by a chapter on derivations
of trace forms and determinantal forms. A specialty is the derivation of a vec-
tor/matrix function of a vector/matrix. We learn how to derive the Kronecker-
Zehfuß product and matrix-valued symmetric or antisymmetric matrix function.
Finally we show how to compute higher order derivatives. Appendix C is an
elegant review of Lagrange multipliers. The lengthy Appendix D introduces
sampling distributions and their use: confidence intervals and confidence re-
gions. As peculiar vehicles we show how to transform random variables. A first
confidence interval of Gauss-Laplace normally distributed observations is com-
puted for the case μ, σ² known, for example the Three Sigma Rule. A second confidence interval, for sampling from the Gauss-Laplace normal distribution, is computed for the mean, built on the assumption that the variance is known. The alternative sampling from the Gauss-Laplace normal distribution leads to the third confidence interval for the mean, variance unknown, based on the Student sampling distribution. The fourth confidence interval, for the variance, is based on the analogue sampling for the variance based on the χ²-Helmert distribution. For both intervals of confidence, namely based on the Student sampling distribution for the mean, variance unknown, and based on the χ²-Helmert distri-
bution for the sample variance, we compute the corresponding Uncertainty Prin-
ciple. The case of a multidimensional Gauss-Laplace normal distribution is out-
lined for the computation of confidence regions for fixed parameters in the linear
Gauss-Markov model. Key statistical notions like moments of a probability dis-
tribution, the Gauss-Laplace normal distribution (quasi-normal distribution),
error propagation as well as important notions of identifiability and unbiased-
ness are reviewed. We close with bibliographical indices.
Here we are not solving rank-deficient or ill-posed problems using UTV or QR factorization techniques. Instead we refer to Å. Björck (1996), P. Businger and G. H.
Golub (1965), T. F. Chan and P. C. Hansen (1991, 1992), S. Chandrasekaran
and I. C. Ipsen (1995), R. D. Fierro (1998), R. D. Fierro and J. R. Bunch (1995),
R. D. Fierro and P. C. Hansen (1995, 1997), L. V. Foster (2003), G. Golub and
C. F. van Loan (1996), P. C. Hansen (1990 a, b, 1992, 1994, 1995, 1998), Y.
Hosada (1999), C. L. Lawson and R. J. Hanson (1974), R. Mathias and G.W.
Stewart (1993), A. Neumaier (1998), H. Ren (1996), G. W. Stewart (1992, 1992,
1998), L. N. Trefethen and D. Bau (1997).
My special thanks for numerous discussions go to J. Awange (Kyoto/Japan), A.
Bjerhammar (Stockholm/Sweden), F. Brunner (Graz/Austria), J. Cai (Stutt-
gart/Germany), A. Dermanis (Thessaloniki/Greece), W. Freeden (Kaiserslautern
/Germany), R. Jurisch (Dessau/Germany), J. Kakkuri (Helsinki/Finland), G.
Kampmann (Dessau/Germany), K. R. Koch (Bonn/Germany), F. Krumm (Stutt-
gart/Germany), O. Lelgemann (Berlin/Germany), H. Moritz (Graz/Austria), F.
Sanso (Milano/Italy), B. Schaffrin (Columbus/Ohio/USA), L. Sjoeberg (Stock-
holm/Sweden), N. Sneeuw (Calgary/Canada), L. Svensson (Gävle/Sweden), P.
Vanicek (Fredericton/New Brunswick/Canada).
For the book production I want to thank in particular J. Cai, F. Krumm, A.
Vollmer, M. Paweletz, T. Fuchs, A. Britchi, and D. Wilhelm (all from Stuttgart/
Germany).
At the end my sincere thanks go to the Walter de Gruyter Publishing Company
for including my book into their Geoscience Series, in particular to Dr. Manfred
Karbe and Dr. Robert Plato for all support.

Stuttgart, December 2005 Erik W. Grafarend


Contents

1 The first problem of algebraic regression


– consistent system of linear observational equations –
underdetermined system of linear equations:
{Ax = y | A ∈ ℝ^{n×m}, y ∈ R(A) ∧ rk A = n, n = dim Y} 1
1-1 Introduction 3
1-11 The front page example 4
1-12 The front page example in matrix algebra 5
1-13 Minimum norm solution of the front page example by
means of horizontal rank partitioning 7
1-14 The range R( f ) and the kernel N(A) 9
1-15 Interpretation of “MINOS” by three partitionings 12
1-2 The minimum norm solution: “MINOS” 17
1-21 A discussion of the metric of the parameter space X 23
1-22 Alternative choice of the metric of the parameter space X 24
1-23 G_x-MINOS and its generalized inverse 25
1-24 Eigenvalue decomposition of G_x-MINOS:
canonical MINOS 26
1-3 Case study:
Orthogonal functions, Fourier series versus Fourier-Legendre
series, circular harmonic versus spherical harmonic regression 40
1-31 Fourier series 41
1-32 Fourier-Legendre series 52
1-4 Special nonlinear models 68
1-41 Taylor polynomials, generalized Newton iteration 68
1-42 Linearized models with datum defect 74
1-5 Notes 82
2 The first problem of probabilistic regression – special Gauss-
Markov model with datum defect – Setup of the linear uniformly
minimum bias estimator of type LUMBE for fixed effects 85
2-1 Setup of the linear uniformly minimum bias estimator of type
LUMBE 86
2-2 The Equivalence Theorem of G_x-MINOS and S-LUMBE 90
2-3 Examples 91
3 The second problem of algebraic regression
– inconsistent system of linear observational equations –
overdetermined system of linear equations:
{Ax + i = y | A ∈ ℝ^{n×m}, y ∉ R(A) ∧ rk A = m, m = dim X} 95
3-1 Introduction 97
3-11 The front page example 97

3-12 The front page example in matrix algebra 98


3-13 Least squares solution of the front page example by means
of vertical rank partitioning 100
3-14 The range R( f ) and the kernel N( f ), interpretation of the
least squares solution by three partitionings 103
3-2 The least squares solution: “LESS” 111
3-21 A discussion of the metric of the parameter space X 118
3-22 Alternative choices of the metric of the
observation space Y 119
3-221 Optimal choice of weight matrix: SOD 120
3-222 The Taylor-Karman criterion matrix 124
3-223 Optimal choice of the weight matrix:
the space R(A) and R(A)^⊥ 125

3-224 Fuzzy sets 129


3-23 G_x-LESS and its generalized inverse 129
3-24 Eigenvalue decomposition of G_y-LESS: canonical LESS 131
3-3 Case study
Partial redundancies, latent conditions, high leverage points
versus break points, direct and inverse Grassmann coordinates,
Plücker coordinates 143
3-31 Canonical analysis of the hat matrix,
partial redundancies, high leverage points 143
3-32 Multilinear algebra, ”join” and “meet”,
the Hodge star operator 152
3-33 From A to B: latent restrictions, Grassmann coordinates,
Plücker coordinates 158
3-34 From B to A: latent parametric equations,
dual Grassmann coordinates, dual Plücker coordinates 172
3-35 Break points 176
3-4 Special linear and nonlinear models
A family of means for direct observations 184
3-5 A historical note on C. F. Gauss, A.-M. Legendre and the
invention of Least Squares and its generalization 185
4 The second problem of probabilistic regression
– special Gauss-Markov model without datum defect –
Setup of BLUUE for the moments of first order and of BIQUUE
for the central moment of second order 187
4-1 Introduction 190
4-11 The front page example 191
4-12 Estimators of type BLUUE and BIQUUE of the
front page example 192
4-13 BLUUE and BIQUUE of the front page example, sample
median, median absolute deviation 201

4-14 Alternative estimation Maximum Likelihood (MALE) 205


4-2 Setup of the best linear uniformly unbiased estimators of type
BLUUE for the moments of first order 208
4-21 The best linear uniformly unbiased estimation
ξ̂ of ξ: Σ_y-BLUUE 208
4-22 The Equivalence Theorem of G_y-LESS and Σ_y-BLUUE 216
4-3 Setup of the best invariant quadratic uniformly
unbiased estimator of type BIQUUE for the
central moments of second order 217
4-31 Block partitioning of the dispersion matrix and linear
space generated by variance-covariance components 218
4-32 Invariant quadratic estimation of variance-covariance
components of type IQE 223
4-33 Invariant quadratic uniformly unbiased estimations of
variance-covariance components of type IQUUE 226
4-34 Invariant quadratic uniformly unbiased estimations
of one variance component (IQUUE) from
Σ_y-BLUUE: HIQUUE 230
4-35 Invariant quadratic uniformly unbiased estimators of
variance covariance components of Helmert type:
HIQUUE versus HIQE 232
4-36 Best quadratic uniformly unbiased estimations of one
variance component: BIQUUE 236
5 The third problem of algebraic regression
– inconsistent system of linear observational equations
with datum defect: overdetermined-underdetermined system of linear
equations: {Ax + i = y | A ∈ ℝ^{n×m}, y ∉ R(A) ∧ rk A < min{m, n}} 243
5-1 Introduction 245
5-11 The front page example 246
5-12 The front page example in matrix algebra 246
5-13 Minimum norm - least squares solution of the front page
example by means of additive rank partitioning 248
5-14 Minimum norm - least squares solution of the front page
example by means of multiplicative rank partitioning: 252
5-15 The range R( f ) and the kernel N( f ) interpretation of
“MINOLESS” by three partitionings 256
5-2 MINOLESS and related solutions like weighted minimum norm-
weighted least squares solutions 263
5-21 The minimum norm-least squares solution: "MINOLESS" 263
5-22 (G_x, G_y)-MINOS and its generalized inverse 273
5-23 Eigenvalue decomposition of (G_x, G_y)-MINOLESS 277
5-24 Notes 282

5-3 The hybrid approximation solution: α-HAPS and Tykhonov-
Phillips regularization 282
6 The third problem of probabilistic regression
– special Gauss-Markov model with datum problem –
Setup of BLUMBE and BLE for the moments of first order and
of BIQUUE and BIQE for the central moment of second order 285
6-1 Setup of the best linear minimum bias estimator
of type BLUMBE 287
6-11 Definitions, lemmas and theorems 289
6-12 The first example: BLUMBE versus BLE, BIQUUE
versus BIQE, triangular leveling network 296
6-121 The first example: I₃, I₃-BLUMBE 297
6-122 The first example: V, S-BLUMBE 301
6-123 The first example: I₃, I₃-BLE 306
6-124 The first example: V, S-BLE 308
6-2 Setup of the best linear estimators of type hom BLE,
hom S-BLE and hom α-BLE for fixed effects 312
7 A spherical problem of algebraic representation
– Inconsistent system of directional observational equations-
overdetermined system of nonlinear equations on curved
manifolds 327
7-1 Introduction 328
7-2 Minimal geodesic distance: MINGEODISC 331
7-3 Special models: from the circular normal distribution to the
oblique normal distribution 335
7-31 A historical note of the von Mises distribution 335
7-32 Oblique map projection 337
7-33 A note on the angular metric 340
7-4 Case study 341
8 The fourth problem of probabilistic regression
– special Gauss-Markov model with random effects–
Setup of BLIP and VIP for the moments of first order 347
8-1 The random effect model 348
8-2 Examples 362
9 The fifth problem of algebraic regression - the system of
conditional equations: homogeneous and inhomogeneous equations -
{By = Bi versus c + By = Bi} 373
9-1 G_y-LESS of a system of inconsistent homogeneous
conditional equations 374
9-2 Solving a system of inconsistent
inhomogeneous conditional equations 376

9-3 Examples 377

10 The fifth problem of probabilistic regression


– general Gauss-Markov model with mixed effects–
Setup of BLUUE for the moments of first order
(Kolmogorov-Wiener prediction) 379
10-1 Inhomogeneous general linear Gauss-Markov model
(fixed effects and random effects) 380
10-2 Explicit representations of errors in the general
Gauss-Markov model with mixed effects 385
10-3 An example for collocation 386
10-4 Comments 397
11 The sixth problem of probabilistic regression
– the random effect model – “errors-in-variables” 401
11-1 Solving the nonlinear system of the model
“errors-in-variables” 404
11-2 Example: The straight line fit 406
11-3 References 410
12 The sixth problem of generalized algebraic regression
– the system of conditional equations with unknowns –
(Gauss-Helmert model) 411
12-1 Solving the system of homogeneous condition equations
with unknowns 414
12-11 W-LESS 414
12-12 R, W-MINOLESS 416
12-13 R, W-HAPS 419
12-14 R, W-MINOLESS against R, W - HAPS 421
12-2 Examples for the generalized algebraic regression problem:
homogeneous conditional equations with unknowns 421
12-21 The first case: I-LESS 422
12-22 The second case: I, I-MINOLESS 422
12-23 The third case: I, I-HAPS 423
12-24 The fourth case: R, W-MINOLESS,
R positive semidefinite, W positive semidefinite 423
12-3 Solving the system of inhomogeneous condition equations
with unknowns 424
12-31 W-LESS 424
12-32 R, W-MINOLESS 426
12-33 R, W-HAPS 427
12-34 R, W-MINOLESS against R, W-HAPS 428
12-4 Conditional equations with unknowns: from the algebraic
approach to the stochastic one 429

12-41 Shift to the center 429


12-42 The condition of unbiased estimators 429
12-43 The first step: unbiased estimation of ξ̂ and Ê{ξ} 430
12-44 The second step: unbiased estimation of N₁ and N₂ 430
13 The nonlinear problem of the 3d datum transformation and the
Procrustes Algorithm 431
13-1 The 3d datum transformation and the Procrustes Algorithm 433
13-2 The variance - covariance matrix of the error matrix E 441
13-3 Case studies: The 3d datum transformation and the
Procrustes Algorithm 441
13-4 References 444
14 The seventh problem of generalized algebraic regression
revisited: The Grand Linear Model:
The split level model of conditional equations with unknowns
(general Gauss-Helmert model) 445
14-1 Solutions of type W-LESS 446
14-2 Solutions of type R, W-MINOLESS 449
14-3 Solutions of type R, W-HAPS 450
14-4 Review of the various models: the sixth problem 453

15 Special problems of algebraic regression and stochastic estimation:


multivariate Gauss-Markov model, the n-way classification model,
dynamical systems 455
15-1 The multivariate Gauss-Markov model – a special problem
of probabilistic regression – 455
15-2 n-way classification models 460
15-21 A first example: 1-way classification 460
15-22 A second example: 2-way classification
without interaction 464
15-23 A third example: 2-way classification with interaction 469
15-24 Higher classifications with interaction 474
15-3 Dynamical Systems 476

Appendix A: Matrix Algebra 485


A1 Matrix-Algebra 485
A2 Special Matrices 488
A3 Scalar Measures and Inverse Matrices 495
A4 Vectorvalued Matrix Forms 506
A5 Eigenvalues and Eigenvectors 509
A6 Generalized Inverses 513

Appendix B: Matrix Analysis 522


B1 Derivations of Scalar-valued and Vector-valued
Vector Functions 522
B2 Derivations of Trace Forms 523
B3 Derivations of Determinantal Forms 526
B4 Derivations of a Vector/Matrix Function of a Vector/Matrix 527
B5 Derivations of the Kronecker-Zehfuß product 528
B6 Matrix-valued Derivatives of Symmetric or
Antisymmetric Matrix Functions 528
B7 Higher order derivatives 530
Appendix C: Lagrange Multipliers 533
C1 A first way to solve the problem 533
Appendix D: Sampling distributions and their use:
Confidence Intervals and Confidence Regions 543
D1 A first vehicle: Transformation of random variables 543
D2 A second vehicle: Transformation of random variables 547
D3 A first confidence interval of Gauss-Laplace normally
distributed observations: μ, σ² known, the Three Sigma Rule 553
D31 The forward computation of a first confidence interval of
Gauss-Laplace normally distributed observations: μ, σ² known 557
D32 The backward computation of a first confidence interval of
Gauss-Laplace normally distributed observations: μ, σ² known 564
D4 Sampling from the Gauss-Laplace normal distribution:
a second confidence interval for the mean, variance known 567
D41 Sampling distributions of the sample mean μ̂, σ² known,
and of the sample variance σ̂² 582
D42 The confidence interval for the sample mean, variance known 592
D5 Sampling from the Gauss-Laplace normal distribution:
a third confidence interval for the mean, variance unknown 596
D51 Student’s sampling distribution of the random variable (μ̂ − μ)/σ̂ 596
D52 The confidence interval for the sample mean, variance unknown 605
D53 The Uncertainty Principle 611
D6 Sampling from the Gauss-Laplace normal distribution:
a fourth confidence interval for the variance 613
D61 The confidence interval for the variance 613
D62 The Uncertainty Principle 619

D7 Sampling from the multidimensional Gauss-Laplace


normal distribution: the confidence region for the fixed
parameters in the linear Gauss-Markov model 621

Appendix E: Statistical Notions 643

E1 Moments of a probability distribution, the Gauss-Laplace


normal distribution and the quasi-normal distribution 644
E2 Error propagation 648
E3 Useful identities 651
E4 The notions of identifiability and unbiasedness 652

Appendix F: Bibliographic Indexes 655

References 659

Index 745
1 The first problem of algebraic regression
– consistent system of linear observational equations –
underdetermined system of linear equations:
{Ax = y | A ∈ ℝ^{n×m}, y ∈ R(A) ∧ rk A = n, n = dim Y}

: Fast track reading: Read only Lemma 1.3.

[Guideline diagram of Chapter 1: Definition 1.1 (x_m, G_x-MINOS of x), Lemma 1.2 (x_m, G_x-MINOS of x), Lemma 1.3 (x_m, G_x-MINOS of x), Lemma 1.4 (characterization of G_x-MINOS), Definition 1.5 (adjoint operator A^#), Lemma 1.6 (adjoint operator A^#), Lemma 1.7 (eigenspace analysis versus eigenspace synthesis), Corollary 1.8 (symmetric pair of eigensystems), Lemma 1.9 (canonical MINOS).]

“The guideline of chapter one: definitions, lemmas and corollary”



The minimum norm solution of a system of consistent linear equations Ax = y


subject to A ∈ ℝ^{n×m}, rk A = n, n < m, is presented by Definition 1.1, Lemma 1.2
and Lemma 1.3. Lemma 1.4 characterizes the solution of the quadratic optimiza-
tion problem in terms of the (1,2,4)-generalized inverse, in particular
the right inverse.
The system of consistent nonlinear equations Y = F( X) is solved by means of
two examples. Both examples are based on distance measurements in a planar
network, namely a planar triangle. In the first example Y = F( X) is linearized at
the point x, which is given by prior information and solved by means of Newton
iteration. The minimum norm solution is applied to the consistent system of
linear equations Δy = AΔx and interpreted by means of first and second mo-
ments of the nodal points. In contrast, the second example aims at solving the
consistent system of nonlinear equations Y = F( X) in a closed form. Since dis-
tance measurements as Euclidean distance functions are left equivariant under
the action of the translation group as well as the rotation group – they are invariant
under translation and rotation of the Cartesian coordinate system – at first a TR-
basis (translation-rotation basis) is established. Namely the origin and the axes of
the coordinate system are fixed. With respect to the TR-basis (a set of free pa-
rameters has been fixed) the bounded parameters are analytically fixed. Since no
prior information is built in, we prove that two solutions of the consistent system
of nonlinear equations Y = F( X) exist. In the chosen TR-basis the solution vec-
tor X is not of minimum norm. Accordingly, we apply a datum transformation
X ↦ x of type group of motion (decomposed into the translation group and the
rotation group). The parameters of the group of motion (2 for translation, 1 for
rotation) are determined under the condition of minimum norm of the unknown
vector x, namely by means of a special Procrustes algorithm. As soon as the
optimal datum parameters are determined we are able to compute the unknown
vector x which is minimum norm. Finally, the Notes are an attempt to explain the
origin of the injectivity rank deficiency, namely the dimension of the null space N(A), m − rk A, of the consistent system of linear equations Ax = y subject to A ∈ ℝ^{n×m} and rk A = n, n < m, as well as of the consistent system of nonlinear equations F(X) = Y subject to a Jacobi matrix J ∈ ℝ^{n×m} and rk J = n, n < m =
dim X. The fundamental relation to the datum transformation, also called trans-
formation groups (conformal group, dilatation group /scale/, translation group,
rotation group and projective group) as well as to the “soft” Implicit Function
Theorem is outlined.
By means of a certain algebraic objective function which geometrically is called
minimum distance function we solve the first inverse problem of linear and
nonlinear equations, in particular of algebraic type, which relate observations to
parameters. The system of linear or nonlinear equations we are solving here is
classified as underdetermined. The observations, also called measurements, are
elements of a certain observation space Y of integer dimension, dim Y = n,

which may be metrical, especially Euclidean, pseudo–Euclidean, in general a


differentiable manifold. In contrast, the parameter space X of integer dimension,
dim X = m, is metrical as well, especially Euclidean, pseudo–Euclidean, in gen-
eral a differentiable manifold, but its metric is unknown. A typical feature of
algebraic regression is the fact that the unknown metric of the parameter space
X is induced by the functional relation between observations and parameters.
We shall outline three aspects of any discrete inverse problem: (i) set-theoretic
(fibering), (ii) algebraic (rank partitioning, “IPM”, the Implicit Function Theo-
rem) and (iii) geometrical (slicing)
Here we treat the first problem of algebraic regression:
A consistent system of linear observational equations: Ax = y, A ∈ ℝ^{n×m},
rk A = n, n < m, also called “underdetermined system of linear equations”, in
short
“more unknowns than equations”
is solved by means of an optimization problem. The Introduction presents us with a front page example of two inhomogeneous linear equations with three unknowns. In terms of five boxes and five figures we review the minimum norm solution of
such a consistent system of linear equations which is based upon the trinity

1-1 Introduction
With the introductory paragraph we explain the fundamental
concepts and basic notions of this section. For you, the analyst,
who has the difficult task to deal with measurements,
observational data, modeling and modeling equations we present
numerical examples and graphical illustrations of all abstract
notions. The elementary introduction is written not for a mathe-
matician, but for you, the analyst, with limited remote control of
the notions given hereafter. May we gain your interest.
Assume an n-dimensional observation space Y, here a linear space parameter-
ized by n observations (finite, discrete) as coordinates y = [y₁, …, y_n]′ ∈ ℝⁿ in
which an r-dimensional model manifold is embedded (immersed). The model

manifold is described as the range of a linear operator f from an m-dimensional


parameter space X into the observation space Y. The mapping f is established by the mathematical equations which relate all observables to the unknown parameters. Here the parameter space X, the domain of the linear operator f, will also be restricted to a linear space which is parameterized by coordinates x = [x₁, …, x_m]′ ∈ ℝ^m. In this way the linear operator f can be understood as a coordinate mapping A : x ↦ y = Ax. The linear mapping f : X → Y is geometrically characterized by its range R(f), namely R(A), defined by R(f) := {y ∈ Y | y = f(x), x ∈ X}, which in general is a linear subspace of Y, and by its null space defined by N(f) := {x ∈ X | f(x) = 0}. Here we restrict the range R(f), namely R(A), to coincide with the n = r-dimensional observation space Y such that y ∈ R(f), namely y ∈ R(A).
Example 1.1 will therefore demonstrate the range space R(f), namely R(A),
which here coincides with the observation space Y (f is surjective or “onto”), as well as the null space N(f), namely N(A), which is not empty (f is not injective or one-to-one).
Box 1.1 will introduce the special linear model of interest. By means of Box 1.2
it will be interpreted as a polynomial of degree two based upon two observations
and three unknowns, namely as an underdetermined system of consistent linear
equations. Box 1.3 reviews the formal procedure in solving such a system of
linear equations by means of “horizontal” rank partitioning and the postulate of
the minimum norm solution of the unknown vector. In order to identify the range
space R(A), the null space N(A) and its orthogonal complement N(A)^⊥, Box 1.4 by means of algebraic partitioning (“horizontal” rank partitioning) outlines the general solution of the system of homogeneous linear equations Ax = 0. With this background, Box 1.5 presents the diagnostic algorithm for solving an underdetermined system of linear equations. In contrast, Box 1.6 is a geometric interpretation of a special solution of a consistent system of inhomogeneous linear equations of type “minimum norm” (MINOS). The g-inverse A⁻_m of type “MINOS” is finally characterized by three conditions collected in Box 1.7.
Figure 1.1 demonstrates the range space R(A), while Figures 1.2 and 1.3 demonstrate the null space N(A) as well as its orthogonal complement N(A)^⊥. Figure 1.4 illustrates the orthogonal projection of an element of the null space N(A) onto the range space R(A⁻), where A⁻ is a generalized inverse. In terms of fibering the set of points of the parameter space as well as of the observation space, Figure 1.5 introduces the related Venn diagrams.
1-11 The front page example
Example 1.1 (polynomial of degree two, consistent system of linear
equations Ax = y, x ∈ X = ℝ^m, dim X = m,
y ∈ Y = ℝⁿ, dim Y = n, r = rk A = dim Y):

First, the introductory example solves the front page consistent system of linear
equations,

x₁ + x₂ + x₃ = 2
x₁ + 2x₂ + 4x₃ = 3,

obviously in general dealing with the linear space X = ℝ^m ∋ x, dim X = m, here
m = 3, called the parameter space, and the linear space Y = ℝⁿ ∋ y, dim Y = n,
here n = 2, called the observation space.
1-12 The front page example in matrix algebra
Second, by means of Box 1.1 and according to A. Cayley’s doctrine let us specify
the consistent system of linear equations in terms of matrix algebra.
Box 1.1:
Special linear model: polynomial of degree two,
two observations, three unknowns

y = [y₁; y₂] = [a₁₁ a₁₂ a₁₃; a₂₁ a₂₂ a₂₃] [x₁; x₂; x₃] ⇔

⇔ y = Ax : [2; 3] = [1 1 1; 1 2 4] [x₁; x₂; x₃] ⇔

⇔ x′ = [x₁, x₂, x₃], y′ = [y₁, y₂] = [2, 3],
x ∈ ℝ^{3×1}, y ∈ ℤ₊^{2×1} ⊂ ℝ^{2×1}

A := [1 1 1; 1 2 4] ∈ ℤ₊^{2×3} ⊂ ℝ^{2×3}

r = rk A = dim Y = n = 2.

The matrix A ∈ ℝ^{n×m} is an element of ℝ^{n×m}, the n×m array of real numbers. dim X = m defines the number of unknowns (here: m = 3), dim Y = n the number of observations (here: n = 2). A mapping f is called linear if f(x₁ + x₂) = f(x₁) + f(x₂) and f(λx) = λ f(x) hold. Beside the range R(f), the range space R(A), the linear mapping is characterized by the kernel N(f) := {x ∈ ℝ^m | f(x) = 0}, the null space N(A) := {x ∈ ℝ^m | Ax = 0}, to be specified later on.
? Why is the front page system of linear equations called “underdetermined”?

Just observe that we are left with only two linear equations for three unknowns (x₁, x₂, x₃). Indeed the system of inhomogeneous linear equations is “underdetermined”. Without any additional postulate we shall be unable to invert those equations for (x₁, x₂, x₃). In particular we shall outline how to find such an
additional postulate. Beforehand we have to introduce some special notions from
the theory of operators.
Within matrix algebra the index of the linear operator A is the rank r = rk A,
here r = 2, which coincides with the dimension of the observation space, here
n = dim Y = 2. A system of linear equations is called consistent if rk A = dim Y.
Alternatively we say that the mapping f : x ↦ y = f(x) ∈ R(f) or A : x ↦ Ax = y ∈ R(A) takes an element x ∈ X into the range R(f) or the range space R(A), also called the column space of the matrix A.

f : x ↦ y = f(x), y ∈ R(f)

A : x ↦ Ax = y, y ∈ R(A).

Here the column space is spanned by the first column c₁ and the second column c₂ of the matrix A, the 2×3 array, namely

R(A) = span{[1; 1], [1; 2]}.
Let us continue with operator theory. The right complementary index of the linear operator A ∈ ℝ^{n×m}, which accounts for the injectivity defect, is given by d = m − rk A (here d = m − rk A = 1). “Injectivity” relates to the kernel N(f), or “the null space”, which we shall constructively introduce later on.
How can such a linear model of interest, namely a system of consistent linear
equations, be generated?
Let us assume that we have observed a dynamical system y(t) which is represented by a polynomial of degree two with respect to time t ∈ ℝ, namely

y(t) = x₁ + x₂t + x₃t².

Due to ÿ(t) = 2x₃ it is a dynamical system with constant acceleration or constant second derivative with respect to time t. The unknown polynomial coefficients are collected in the column array x = [x₁, …, x_m]′, x ∈ X = ℝ³, dim X = 3, and constitute the coordinates of the three-dimensional parameter space X. If the dynamical system y(t) is observed at two instants, say y(t₁) = y₁ = 2 and y(t₂) = y₂ = 3 at t₁ = 1 and t₂ = 2, respectively, and if we collect the observations in the column array y = [y₁, y₂]′ = [2, 3]′, y ∈ Y = ℝ², dim Y = 2, they constitute the coordinates of the two-dimensional observation space Y. Thus we are left with a special linear model interpreted in Box 1.2. We use “~” as the symbol for “equivalence”.

Box 1.2:
Special linear model: polynomial of degree two,
two observations, three unknowns

y = [y₁; y₂] = [1 t₁ t₁²; 1 t₂ t₂²] [x₁; x₂; x₃] ⇔

⇔ (t₁ = 1, y₁ = 2; t₂ = 2, y₂ = 3) : [2; 3] = [1 1 1; 1 2 4] [x₁; x₂; x₃] ~

~ y = Ax, r = rk A = dim Y = n = 2.
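As a numerical aside (not part of the original text; a minimal sketch assuming Python with NumPy), the special linear model of Box 1.2 can be reproduced by building the design matrix from the sampling instants t₁ = 1, t₂ = 2 and checking that r = rk A = dim Y = n = 2 < m = 3, i.e. the system is consistent but underdetermined.

```python
import numpy as np

# sampling instants and observations of the front page example
t = np.array([1.0, 2.0])          # t_1 = 1, t_2 = 2
y = np.array([2.0, 3.0])          # y_1 = 2, y_2 = 3

# design matrix of the degree-two polynomial y(t) = x1 + x2*t + x3*t^2
A = np.column_stack([np.ones_like(t), t, t**2])   # rows [1, t_i, t_i^2]

n, m = A.shape                    # n = dim Y = 2, m = dim X = 3
r = np.linalg.matrix_rank(A)      # r = rk A

print(A)                          # [[1. 1. 1.], [1. 2. 4.]]
print(n, m, r)                    # 2 3 2  ->  rk A = n < m: underdetermined
```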

Third, let us begin with a more detailed analysis of the linear mapping
f : Ax = y, namely of the linear operator A ∈ ℝ^{n×m}, r = rk A = dim Y = n. We
shall pay special attention to the three fundamental partitionings, namely
(i) algebraic partitioning called rank partitioning of the matrix A,
(ii) geometric partitioning called slicing of the linear space X,
(iii) set-theoretical partitioning called fibering of the domain D(f).

1-13 Minimum norm solution of the front page example by means of


horizontal rank partitioning
Let us go back to the front page consistent system of linear equations, namely the
problem to determine three unknown polynomial coefficients from two sampling
points which we classified as an underdetermined one. Nevertheless we are able
to compute a unique solution of the underdetermined system of inhomogeneous
linear equations Ax = y, y ∈ R(A) or rk A = dim Y, here A ∈ ℝ^{2×3}, x ∈ ℝ^{3×1}, y ∈ ℝ^{2×1}, if we determine the coordinates of the unknown vector x of minimum norm (minimal Euclidean length, ℓ₂-norm), here ‖x‖²_I = x′x = x₁² + x₂² + x₃² = min.
Box 1.3 outlines the solution of the related optimization problem.
Box 1.3:
Minimum norm solution of the consistent system of inhomogeneous
linear equations, horizontal rank partitioning
The solution of the optimization problem
{‖x‖²_I = min_x | Ax = y, rk A = dim Y}
is based upon the horizontal rank partitioning of the linear mapping

f : x ↦ y = Ax, rk A = dim Y, which we already introduced.
As soon as we decompose x₁ = −A₁⁻¹A₂x₂ + A₁⁻¹y and implement it
in the norm ‖x‖²_I, we are prepared to compute the first derivatives
of the unconstrained Lagrangean

L(x₁, x₂) := ‖x‖²_I = x₁² + x₂² + x₃² =
= (y − A₂x₂)′(A₁A₁′)⁻¹(y − A₂x₂) + x₂′x₂ =
= y′(A₁A₁′)⁻¹y − 2x₂′A₂′(A₁A₁′)⁻¹y + x₂′A₂′(A₁A₁′)⁻¹A₂x₂ + x₂′x₂ =
= min over x₂

∂L/∂x₂ (x_{2m}) = 0 ⇔
⇔ −A₂′(A₁A₁′)⁻¹y + [A₂′(A₁A₁′)⁻¹A₂ + I]x_{2m} = 0 ⇔
⇔ x_{2m} = [A₂′(A₁A₁′)⁻¹A₂ + I]⁻¹A₂′(A₁A₁′)⁻¹y,

which constitute the necessary conditions. (The theory of vector
derivatives is presented in Appendix B.) Following Appendix A
devoted to matrix algebra, namely (I + AB)⁻¹A = A(I + BA)⁻¹,
(BA)⁻¹ = A⁻¹B⁻¹, for appropriate dimensions of the involved
matrices, such that the identities hold

x_{2m} = [A₂′(A₁A₁′)⁻¹A₂ + I]⁻¹A₂′(A₁A₁′)⁻¹y =
= A₂′(A₁A₁′)⁻¹[A₂A₂′(A₁A₁′)⁻¹ + I]⁻¹y =
= A₂′[(A₂A₂′(A₁A₁′)⁻¹ + I)(A₁A₁′)]⁻¹y,

we finally derive

x_{2m} = A₂′(A₁A₁′ + A₂A₂′)⁻¹y.

The second derivatives

∂²L/(∂x₂∂x₂′) (x_{2m}) = 2[A₂′(A₁A₁′)⁻¹A₂ + I] > 0,

due to the positive-definiteness of the matrix A₂′(A₁A₁′)⁻¹A₂ + I,
generate the sufficiency condition for obtaining the minimum of
the unconstrained Lagrangean. Finally let us backward transform
x_{2m} ↦ x_{1m} = −A₁⁻¹A₂x₂ + A₁⁻¹y:

x_{1m} = −A₁⁻¹A₂A₂′(A₁A₁′ + A₂A₂′)⁻¹y + A₁⁻¹y.

Let us right multiply the identity A₁A₁′ = −A₂A₂′ + (A₁A₁′ + A₂A₂′)
by (A₁A₁′ + A₂A₂′)⁻¹ such that

A₁A₁′(A₁A₁′ + A₂A₂′)⁻¹ = −A₂A₂′(A₁A₁′ + A₂A₂′)⁻¹ + I

holds, and left multiply by A₁⁻¹, namely

A₁′(A₁A₁′ + A₂A₂′)⁻¹ = −A₁⁻¹A₂A₂′(A₁A₁′ + A₂A₂′)⁻¹ + A₁⁻¹.

Obviously we have generated the linear form

x_{1m} = A₁′(A₁A₁′ + A₂A₂′)⁻¹y
x_{2m} = A₂′(A₁A₁′ + A₂A₂′)⁻¹y

or

[x_{1m}; x_{2m}] = [A₁′; A₂′] (A₁A₁′ + A₂A₂′)⁻¹y

or

x_m = A′(AA′)⁻¹y.

A numerical computation with respect to the introductory example is

A₁A₁′ + A₂A₂′ = [3 7; 7 21], (A₁A₁′ + A₂A₂′)⁻¹ = (1/14) [21 −7; −7 3]

A₁′(A₁A₁′ + A₂A₂′)⁻¹ = (1/14) [14 −4; 7 −1]

A₂′(A₁A₁′ + A₂A₂′)⁻¹ = (1/14) [−7, 5]

x_{1m} = [8/7; 11/14], x_{2m} = 1/14, ‖x_m‖_I = (3/14)√42 ⇒

y(t) = 8/7 + (11/14) t + (1/14) t²

∂²L/(∂x₂∂x₂′) (x_{2m}) = 2[A₂′(A₁A₁′)⁻¹A₂ + I] = 28 > 0.
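The hand computation above can be verified numerically. The following minimal sketch (an addition for illustration, assuming NumPy) evaluates x_m = A′(AA′)⁻¹y and confirms x_m = [8/7, 11/14, 1/14]′, ‖x_m‖ = (3/14)√42 and that the fitted polynomial reproduces both observations.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])

# minimum norm solution of the consistent underdetermined system Ax = y
x_m = A.T @ np.linalg.inv(A @ A.T) @ y

print(x_m)                          # [1.1428..., 0.7857..., 0.0714...] = [8/7, 11/14, 1/14]
print(np.linalg.norm(x_m))          # 1.3887...
print(3/14 * np.sqrt(42))           # same value, (3/14)*sqrt(42)

# the fitted polynomial y(t) = 8/7 + (11/14) t + (1/14) t^2 reproduces the data
for t in (1.0, 2.0):
    print(x_m[0] + x_m[1]*t + x_m[2]*t**2)   # 2.0 and 3.0
```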

1-14 The range R(f) and the kernel N(f)


Fourthly, let us go into the detailed analysis of R(f), N(f), N(f)^⊥ with respect to the front page example. How can we actually identify the range space R(A), the null space N(A) or its orthogonal complement N(A)^⊥?
The range space R(A) := {y ∈ ℝⁿ | Ax = y, x ∈ ℝ^m} is conveniently described by the first column c₁ = [1, 1]′ and the second column c₂ = [1, 2]′ of the matrix A, namely the 2-leg

{e₁ + e₂, e₁ + 2e₂ | O}
or
{e_{c₁}, e_{c₂} | O},

with respect to the orthogonal base vectors e₁ and e₂, respectively, attached to the origin O. Symbolically we write

R(A) = span{e₁ + e₂, e₁ + 2e₂ | O}.

As a linear space, R(A) ⊆ Y is illustrated by Figure 1.1.

Figure 1.1: Range R(f), range space R(A), y ∈ R(A)


By means of Box 1.4 we identify N(f) or “the null space N(A)” and give its
illustration by Figure 1.2. Such a result has paved the way to the diagnostic algo-
rithm for solving an underdetermined system of linear equations by means of
rank partitioning presented in Box 1.5.
Box 1.4:
The general solution of the system of homogeneous
linear equations Ax = 0, “horizontal” rank partitioning

The matrix A is called “horizontally rank partitioned” if

{A ∈ ℝ^{n×m} ∧ A = [A₁, A₂], A₁ ∈ ℝ^{n×r}, A₂ ∈ ℝ^{n×d} | r = rk A = rk A₁ = n, d = d(A) = m − rk A}

holds. (In the introductory example A ∈ ℝ^{2×3}, A₁ ∈ ℝ^{2×2},
A₂ ∈ ℝ^{2×1}, rk A = 2, d(A) = 1 applies.) A consistent system of
linear equations Ax = y, rk A = dim Y, is “horizontally rank
partitioned” if

Ax = y, rk A = dim Y ⇔ A₁x₁ + A₂x₂ = y

for a partitioned unknown vector

{x ∈ ℝ^m ∧ x = [x₁; x₂] | x₁ ∈ ℝ^{r×1}, x₂ ∈ ℝ^{d×1}}

applies. The “horizontal” rank partitioning of the matrix A as
well as the “horizontally rank partitioned” consistent system of
linear equations Ax = y, rk A = dim Y, of the introductory
example is

A = [1 1 1; 1 2 4], A₁ = [1 1; 1 2], A₂ = [1; 4],

Ax = y, rk A = dim Y ⇔ A₁x₁ + A₂x₂ = y

x₁ = [x₁, x₂]′ ∈ ℝ^{2×1}, x₂ = [x₃] ∈ ℝ

[1 1; 1 2] [x₁; x₂] + [1; 4] x₃ = y.

By means of the horizontal rank partitioning of the system of
homogeneous linear equations an identification of the null space
N(A), namely

N(A) = {x ∈ ℝ^m | Ax = A₁x₁ + A₂x₂ = 0},

is

A₁x₁ + A₂x₂ = 0 ⇔ x₁ = −A₁⁻¹A₂x₂,

particularly in the introductory example

[x₁; x₂] = −[2 −1; −1 1] [1; 4] x₃,

x₁ = 2x₃ = 2τ, x₂ = −3x₃ = −3τ, x₃ = τ.

Here the two equations Ax = 0 for any x ∈ X = ℝ³ constitute the linear space N(A), dim N(A) = 1, a one-dimensional subspace of X = ℝ³. For instance, if we introduce the parameter x₃ = τ, the other coordinates of the parameter space X = ℝ³ amount to x₁ = 2τ, x₂ = −3τ. In geometric language the linear space N(A) is a parameterized straight line L¹₀ through the origin, illustrated by Figure 1.2. The parameter space X = ℝ^m (here m = 3) is sliced by the subspace, the linear space N(A), also called a linear manifold, dim N(A) = d(A) = d, here a straight line L¹₀ through the origin O.
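As a numerical companion to Box 1.4 (an added sketch assuming NumPy, not part of the original text), the relation x₁ = −A₁⁻¹A₂x₂ yields the null space direction [2, −3, 1]′ for x₃ = τ = 1, which is indeed annihilated by A.

```python
import numpy as np

A  = np.array([[1.0, 1.0, 1.0],
               [1.0, 2.0, 4.0]])
A1 = A[:, :2]                      # horizontal rank partitioning: A = [A1, A2]
A2 = A[:, 2:]

# general solution of A x = 0: x1 = -A1^{-1} A2 x2, with x2 = x3 = tau
tau = 1.0
x12 = -np.linalg.inv(A1) @ A2 * tau      # [[2.], [-3.]]
x_null = np.append(x12.ravel(), tau)     # [ 2., -3., 1.]

print(x_null)                      # direction spanning the null space N(A)
print(A @ x_null)                  # [0. 0.]  ->  x_null lies in N(A)
```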

1-15 Interpretation of “MINOS” by three partitionings:


(i) algebraic (rank partitioning)
(ii) geometric (slicing)
(iii) set-theoretical (fibering)

Figure 1.2: The parameter space X = ℝ³ (x₃ is not displayed) sliced by the null space, the linear manifold N(A) = L¹₀ ⊂ ℝ²
The diagnostic algorithm for solving an underdetermined system of linear equations y = Ax, rk A = dim Y = n, n < m = dim X, y ∈ R(A), by means of rank partitioning is presented to you by Box 1.5.
Box 1.5:
The diagnostic algorithm for solving an underdetermined system of
linear equations y = Ax, rk A = dim Y, y ∈ R(A), by means of rank
partitioning

Determine
the rank of the matrix A:
rk A = dim Y = n

Compute
the “horizontal rank partitioning”:
A = [A₁, A₂], A₁ ∈ ℝ^{r×r} = ℝ^{n×n}, A₂ ∈ ℝ^{n×(m−r)} = ℝ^{n×(m−n)}

“m − r = m − n = d is called the right complementary index.”
“A as a linear operator is not injective, but surjective.”

Compute
the null space N(A):
N(A) := {x ∈ ℝ^m | Ax = 0} = {x ∈ ℝ^m | x₁ + A₁⁻¹A₂x₂ = 0}

Compute
the unknown parameter vector of type MINOS (Minimum Norm Solution x_m):
x_m = A′(AA′)⁻¹y.

While we have characterized the general solution of the system of homogenous


linear equations Ax = 0, we are left with the problem of solving the consistent
system of inhomogeneous linear equations. Again we take advantage of the rank
partitioning of the matrix A summarized in Box 1.4.
Box 1.6:
A special solution of a consistent system of
inhomogeneous linear equations Ax = y,
“horizontal” rank partitioning
ª rk A = dim Y,
Ax = y , « œ A1 x1 + A 2 x 2 = y .
¬ y  R ( A)
Since the matrix A1 is of full rank it can be regularly inverted
(Cayley inverse). In particular, we solve for
x1 =  A11 A 2 x 2 + A11 y ,

or
ª x1 º ª 2 1º ª1 º ª 2 1º ª y1 º
« x » =  « 1 1 » « 4 » x3 + « 1 1 » « y » œ
¬ 2¼ ¬ ¼¬ ¼ ¬ ¼¬ 2¼
œ x1 = 2 x3 + 2 y1  y2 , x2 = 3x3  y1 + y2 .

For instance, if we introduce the parameter x3 = W , the other


coordinates of the parameter space X = R 2 amount to
x1 = 2W + 2 y1  y2 , x2 = 3W  y1 + y2 . In geometric language
the admissible parameter space is a family of a one-dimensional
linear space, a family of one-dimensional parallel straight
lines dependent on y = [ y1 , y2 ]c, here [2, 3]c, in particular
14 1 The first problem of algebraic regression

L1( y1 , y2 ) := { x  R 3 | x1 = 2 x3 + 2 y1  y2 , x2 = 3x3  y1 + y2 },

including the null space L1(0, 0) = N ( A). Figure 1.3 illustrates


(i) the admissible parameter space L1( y1 , y2 ) ,
(ii) the line L1A which is orthogonal to the null
space called N ( A) A ,
(iii) the intersection L1( y1 , y2 ) ˆ N ( A ) A , generating
the solution point xm as will be proven now.

x2
1
A ~ N (A)A

x1

N (A) ~ L1(0,0) L1( 2 , 3 )

Figure 1.3: The range space R ( A  ) (the admissible parameter space)


parallel straight lines L1( y , y ) , namely L1(2, 3) :
1 2

L ( y1 , y2 ) := { x  R | x1 = 2 x3 + 2 y1  y2 , x2 = 3x3  y1 + y2 } .
1 3

The geometric interpretation of the minimum norm solution & x & I = min is the
following: With reference to Figure 1.4 we decompose the vector
x = x N (A) + x N (A) A

where x N ( A ) is an element of the null space N ( A) (here: the straight line


L1(0, 0) ) and x N ( A ) is an element of the orthogonal complement N ( A) A of the
A

null space N ( A) (here: the straight line L1(0, 0) , while the inconcistency parame-
ter i N ( A ) = i m is an element of the range space R ( A  ) (here: the straight
line L1( y , y ) , namely L1(2, 3) ) of the generalized inverse matrix A  of type
1 2
MINOS (“minimum norm solution”). & x &2I =& x N ( A ) + x N ( A ) &2
A

=& x N ( A ) & +2 < x | i > + & i & is minimal if and only if the inner prod-
2 2

uct ¢ x N ( A ) | i ² = 0 , x N ( A ) and i m = i N ( A ) are orthogonal. The solution point


A

x m is the orthogonal projection of the null point onto R ( A  ) :


1-1 Introduction 15

PR ( A ) = A  Ax = A  y for all x  D ( A). Alternatively, if the vector x m of




minimal length is orthogonal to the null space N ( A ) , being an element of


N ( A) A (here: the line L1(0, 0) ) we may say that N ( A) A intersects R ( A  ) in
the solution point x m . Or the normal space NL10 with respect to the tangent
space TL10 – which is in linear models identical to L10 , the null space N ( A ) –
intersects the tangent space TL1y , the range space R ( A  ) in the solution point
x m . In summary, x m  N ( A ) A ˆ R ( A  ).

Figure 1.4: Orthogonal projection of an element of N ( A)


onto the range space R ( A  )
Let the algebraic partitioning and the geometric partitioning be merged to inter-
pret the minimum norm solution of the consistent system of linear equations of
type “underdetermined” MINOS. As a summary of such a merger we take refer-
ence to Box 1.7.
The first condition
AA  A = A

Let us depart from MINOS of y = Ax, x  X = R m , y  Y = R n , r = rk A = n,


namely
x m = A m y = A c( AA c) 1 y.
Ax m = AA m y = AA m Ax m Ÿ

Ÿ Ax m = AA m Ax m œ AA  A.

The second condition


A  AA  = A 
16 1 The first problem of algebraic regression

x m = A c( AA c) 1 y = A m y = A m Ax m Ÿ
x m = A m y = A m AA m y Ÿ

Ÿ A m y = A m AA m y œ A  AA  = A  .

rk A m = rk A is interpreted as follows: the g-inverse of type MINOS is the gen-


eralized inverse of maximal rank since in general rk A  d rk A holds
The third condition
AA  = PR ( A ) 

x m = A m y = A m Ax m œ A  A = PR ( A ) .


Obviously A m A is an orthogonal projection onto R ( A  ) , but i m = I  A m A


onto its orthogonal complemert R ( A  ) A .
If the linear mapping f : x 6 y = f (x), y  R (f ) is given we are aiming at a
generalized inverse (linear) mapping y 6 x  = g(y ) such that y = f (x) =
= f ( g (y ) = f ( g ( f ( x))) or f = f D g D f as a first condition is fulfilled. Alterna-
tively we are going to construct a generalized inverse A  : y 6 A  y = x  such
that the first condition y = Ax  = AA  Ay or AA  A = A holds. Though the
linear mapping f : x 6 y = f (x)  R (f ), or the system of linear equations
Ax = y , rk A = dim Y , is consistent, it suffers from the (injectivity) deficiency
of the linear mapping f(x) or of the matrix A. Indeed it recovers from the (injec-
tivity) deficiency if we introduce the projection x 6 g ( f (x)) = q  R (g ) or
x 6 A  Ax = q  R (A  ) as the second condition. Note that the projection
matrix A  A is idempotent which follows from P 2 = P or
( A  A)( A  A) = A  AA  A = A  A.

Box 1.7:
The general solution of a consistent system of linear equations;
f : x 6 y = Ax, x  X = R m (parameter space), y  Y = R n
(observation space) r = rk A = dim Y , A  generalized inverse
of MINOS type
Condition #1 Condition #1
f (x) = f ( g (y )) Ax = AA  Ax
œ œ
f = f DgD f . AA  A = A.
Condition #2 Condition #2
(reflexive g-inverse mapping) (reflexive g-inverse)
1-2 The minimum norm solution: “MINOS” 17

x = g (y ) = x  = A  y = A  AA  y œ
= g ( f (x)). œ A  AA  = A  .
Condition #3 Condition #3
g ( f (x)) = x R ( A 
)
A  A = x R (A )
œ œ
g D f = projR ( A 
) A A = projR (A ) .


The set-theoretical partitioning, the fibering of the set system of points which
constitute the parameters space X, the domain D(f), will be finally outlined.
Since the set system X (the parameters space) is R r , the fibering is called “triv-
ial”. Non-trivial fibering is reserved for nonlinear models in which case we are
dealing with a parameters space X which is a differentiable manifold. Here the
fibering

D( f ) = N ( f ) ‰ N ( f )A

produces the trivial fibers N ( f ) and N ( f ) A where the trivial fibers N ( f ) A is


the quotient set R n /N ( f ) . By means of a Venn diagram (John Venn 1834-
1928) also called Euler circles (Leonhard Euler 1707–1783) Figure 1.5 illus-
trates the trivial fibers of the set system X = R m generated by N ( f ) and
N ( f ) A . The set system of points which constitute the observation space Y is
not subject to fibering since all points of the set system D(f) are mapped into the
range R(f).

Figure 1.5: Venn diagram, trivial fibering of the domain D(f), trivial fibers
N(f) and N ( f ) A , f : R m = X o Y = R n , Y = R (f ) , X set system
of the parameter space, Y set system of the observation space.
1-2 The minimum norm solution: “MINOS”
The system of consistent linear equations Ax = y subject to A  R n× m , rk A =
n < m , allows certain solutions which we introduce by means of Definition 1.1
18 1 The first problem of algebraic regression

as a solution of a certain optimization problem. Lemma 1.2 contains the normal


equations of the optimization problem. The solution of such a system of normal
equations is presented in Lemma 1.3 as the minimum norm solution with respect
to the G x -seminorm. Finally we discuss the metric of the parameter space and
alternative choices of its metric before we identify by Lemma 1.4 the solution of
the quadratic optimisation problem in terms of the (1,2,4)-generalized inverse.
Definition 1.1 (minimum norm solution with respect to the
G x -seminorm):
A vector xm is called G x -MINOS (Minimum Norm Solution
with respect to the G x -seminorm) of the consistent system of
linear equations
­rk A = rk( A, y ) = n < m
°
Ax = y, y  Y { R , ®
n
A  R n×m (1.1)
° y  R ( A),
¯
if both
Ax m = y (1.2)

and in comparison to all other vectors of solution x  X { R m ,


the inequality

|| x m ||G2 x := xcmG x x m d xcG x x =:|| x ||G2 x (1.3)

holds. The system of inverse linear equations A  y + i = x,



rk A z m or x  R ( A  ) is inconsistent.
By Definition 1.1 we characterized G x -MINOS of the consistent system of lin-
ear equations Ax = y subject to A  R n× m , rk A = n < m (algebraic condition)
or y  R ( A) (geometric condition). Loosely speaking we are confronted with a
system of linear equations with more unknowns m than equations n, namely
n < m . G x -MINOS will enable us to find a solution of this underdetermined
problem. By means of Lemma 1.2 we shall write down the “normal equations”
of G x -MINOS.
Lemma 1.2 (minimum norm solution with respect to the
G x -(semi)norm) :
A vector x m  X { R m is G x -MINOS of (1.1) if and only if the
system of normal equations
ªG x A cº ª x m º ª 0 º
= (1.4)
«A
¬ 0 »¼ «¬ Ȝ m »¼ «¬ y »¼
1-2 The minimum norm solution: “MINOS” 19

with the vector Ȝ m  R n×1 of “Lagrange multipliers” is fulfilled.


x m exists always and is in particular unique, if
rk[G x , A c] = m (1.5)

holds or equivalently if the matrix G x + AcA is regular.


: Proof :
G x -MINOS is based on the constraint Lagrangean
L(x, Ȝ ) := xcG x x + 2Ȝ c( Ax  y ) = min
x, O

such that the first derivatives


1 wL ½
(x m , Ȝ m ) = G x x m + A cȜ m = 0 °
2 wx °
¾œ
1 wL °
(x m , Ȝ m ) = Ax m  y = 0
2 wO ¿°
ªG A c º ª x m º ª 0 º
œ« x »« » = « »
¬ A 0 ¼ ¬Ȝ m ¼ ¬y ¼
constitute the necessary conditions. The second derivatives

1 w 2L
(x m , Ȝ m ) = G x t 0
2 wxwxc
due to the positive semidefiniteness of the matrix G x generate the sufficiency
condition for obtaining the minimum of the constrained Lagrangean. Due to the
assumption rk A = rk [ A, y ] = n or y  R ( A) the existence of G x -MINOS x m
is guaranteed. In order to prove uniqueness of G x -MINOS x m we have to con-
sider case (i) G x positive definite and case (ii) G x positive semidefinite.
Case (i) : G x positive definite
Due to rk G x = m , G x z 0 , the partitioned system of normal equations

ªG A c º ª x m º ª 0 º
G x z 0, « x »« » = « »
¬ A 0 ¼ ¬Ȝ m ¼ ¬y ¼
is uniquely solved. The theory of inverse partitioned matrices (IPM) is presented
in Appendix A.
Case (ii) : G x positive semidefinite
Follow these algorithmic steps: Multiply the second normal equation by Ac in
order to produce A cAx  Acy = 0 or A cAx = Acy and add the result to the first
normal equation which generates
20 1 The first problem of algebraic regression

G x x m + A cAx m + A cȜ m = A cy or (G x + A cA)x m + A cȜ m = A cy .

The augmented first normal equation and the original second normal equation
build up the equivalent system of normal equations

ªG + A cA A cº ª x m º ª A cº
G x + A cA z 0, « x » « » = « » y,
¬ A 0 ¼ ¬ Ȝ m ¼ ¬I n ¼

which is uniquely solved due to rk (G x + A cA ) = m , G x + A cA z 0 . ƅ


The solution of the system of normal equations leads to the linear form x m = Ly
which is G x -MINOS of (1.1) and can be represented as following.

Lemma 1.3 (minimum norm solution with respect to the


G x -(semi-) norm):
x m = Ly is G x -MINOS of the consistent system of linear equations
(1.1)
Ax = y , rk A = rk( A, y ) = n (or y  R ( A) ), if and only if L  R m × n
is represented by
Case (i): G x = I m
L = A R = Ac( AAc) 1 (right inverse) (1.6)
x m = A y = Ac( AAc) y

R
1
(1.7)
x = xm + i m , (1.8)

is an orthogonal decomposition of the unknown vector x  X { R m


into the I-MINOS vector x m  Ln and the I-MINOS vector of in-
consistency i m  Ld subject to
x m = A c( AA c) 1 Ax , (1.9)
i m = x  x m =[I  A c( AA c) 1 A]x .
m
(1.10)

(Due to x m = A c( AA c) 1 Ax , I-MINOS has the reproducing property.


As projection matrices A c( AAc) 1 A , rk A c( AA c) 1 A = rk A = n and
[I m  A c( AA c) 1 A] , rk[I m  A c( AA c) 1 A] = n  rk A = d , are inde-
pendent). Their corresponding norms are positive semidefinite,
namely
& x m ||I2 = y c( AA c) 1 y = xcA c( AA c) 1 Ax = xcG m x (1.11)

|| i m ||I2 = xc[I m  A c( AA c) 1 A]x. (1.12)


1-2 The minimum norm solution: “MINOS” 21

Case (ii): G x positive definite


L = G x 1 A c( AG x 1 A c) 1 (weighted right inverse) (1.13)
x m = G A c( AG A c) y
1
x
1
x
1
(1.14)
x = xm + i m (1.15)

is an orthogonal decomposition of the unknown vector x  X { R m


into the G x -MINOS vector x m  Ln and the G x -MINOS vector of in-
consistency i m  Ld subject to
x m = G x 1 A c( AG x 1 A c) 1 Ax , (1.16)

i m = x  x m = [I m  G x 1 A c( AG x 1 A c) 1 A]x . (1.17)

(Due to x m = G x 1 A c( AG x 1 A c) 1Ax G x -MINOS has the reproducing


property. As projection matrices G x 1 A c( AG x 1 A c) 1A , rk G x 1 A c (A
G x 1A c) 1A =n , and [I m  G x 1 A c( AG x 1 A c) 1 A] , rk[I  G x 1A c( A
G x 1A c) 1 A ] = n  rk A = d , are independent.) The corresponding
norms are positive semidefinite, namely
|| x m ||G2 = y c( AG x A c) 1 y = xcA c( AG x A c) 1 Ax = xcG m x
x
(1.18)

|| i m ||G2 = xc[G x  A c( AG x1A c) 1 A]x .


x
(1.19)

Case (iii): G x positive semidefinite


L = (G x + A cA) 1 A c [ A(G x + A cA) 1 A c]1 (1.20)

x m = (G x + A cA) 1 A c [ A(G x + A cA) 1 A c]1 y (1.21)

x = xm + i m (1.22)

is an orthogonal decomposition of the unknown vector x  X = Ln


into the ( G x + A cA )-MINOS vector x m  Ln and the G x + AA c -
MINOS vector of inconsistency i m  Ld subject to
x m = (G x + A cA) 1 A c [ A(G x + A cA) 1 A c]1 Ax (1.23)

i m = {I m  ( G x + A cA ) A c[ A ( G x + A cA )
1 1
A c]1 A}x . (1.24)

Due to x m = (G x + A cA) 1 A c [ A(G x + A cA) 1 A c]1 Ax


(G x + AcA) -MINOS has the reproducing property.
As projection matrices
(G x + A cA) 1 A c[ A(G x + A cA ) 1 A c]1 A,

rk (G x + A cA ) 1 A c[ A(G x + A cA ) 1 A c]1 A = rk A = n,
22 1 The first problem of algebraic regression

and
{I m = (G x + A cA ) 1 A c[ A (G x + A cA ) 1 A c]1 A},
rk{I m  (G x + A cA) 1 A c[ A(G x + A cA) 1 A c]1 A} = n  rk A = d ,
are independent. The corresponding norms are positive semidefinite,
namely
|| x m ||G2 x + AcA = y c[ A(G x + A cA) 1 A c]1 y
(1.25)
= xcA c[ A(G x + A cA) 1 A c]1 Ax = xcG m x,

|| i m ||G2 x + AcA = xc{I m  (G x + A cA ) 1 A c[ A (G x + A cA ) 1 A c]1 A}x . (1.26)

: Proof :
A basis of the proof could be C. R. Rao´s Pandora Box, the theory of inverse
partitioned matrices (Appendix A: Fact: Inverse Partitioned Matrix /IPM/ of a
symmetric matrix). Due to the rank identity rk A = rk ( AG x 1 A c) = n < m the
normal equations of the case (i) as well as case (ii) can be faster directly solved
by Gauss elimination.
ªG x A c º ª x m º ª 0 º
« » « »=« »
¬ A 0 ¼ ¬Om ¼ ¬ y ¼
G x x m + A cOm = 0
Ax m = y.

Multiply the first normal equation by AG x 1 and subtract the second normal
equation from the modified first one.

Ax m + AG x 1A cOm = 0
Ax m = y
Om = ( AG A c) 1 y.
1
x

The forward reduction step is followed by the backward reduction step. Imple-
ment Om into the first normal equation and solve for x m .
G x x m  A c( AG x 1A c) 1 y = 0
x m = G x1A c( AG x1A c) 1 y

Thus G x -MINOS x m and Om are represented by

x m = G x 1 A c( AG x 1 A c) 1 y , Ȝ m =  ( AG x 1 A c) 1 y.
1-2 The minimum norm solution: “MINOS” 23

For the Case (iii), to the first normal equation we add the term AA cx m = Acy and
write the modified normal equation

ªG x + A cA A cº ª x m º ª A cº
« » « » = « » y.
¬ A 0 ¼ ¬Om ¼ ¬ I n ¼

Due to the rank identity rk A = rk[ A c(G x + AA c) 1 A] = n < m the modified


normal equations of the case (i) as well as case (ii) are directly solved by Gauss
elimination.
(G x + A cA)x m + A cOm = A cy
Ax m = y.

Multiply the first modified normal equation by AG x 1 and subtract the second
normal equation from the modified first one.
Ax m + A(G x + A cA) 1 A cOm = A (G x + A cA ) 1 A cy
Ax m = y
A(G x + A cA) 1 A cOm = [ A(G x + A cA) 1 A c  I n ]y
Om = [ A(G x + A cA) 1 A c]1[ A(G x + A cA) 1 A c  I n ]y
Om = [I n  ( A(G x + A cA) 1 A c) 1 ]y.
The forward reduction step is followed by the backward reduction step. Imple-
ment Om into the first modified normal equation and solve for x m .
(G x + A cA )x m  Ac[ A(G x + A cA ) 1 A c]1 y + A cy = A cy
(G x + A cA )x m  Ac[ A(G x + A cA ) 1 A c]1 y = 0
x m = (G x + A cA) 1 Ac[ A(G x + A cA ) 1 A c]1 y.
Thus G x -MINOS of (1.1) in terms of particular generalized inverse is obtained
as
x m = (G x + A cA) 1 Ac[ A(G x + A cA ) 1 A c]1 y ,
Om = [I n  ( A(G x + A cA) 1 A c) 1 ]y .
ƅ
1-21 A discussion of the metric of the parameter space X
With the completion of the proof we have to discuss the basic results of Lemma
1.3 in more detail. At first we have to observe that the matrix G x of the metric of
the parameter space X has to be given a priori. We classified MINOS accord-
ing to (i) G x = I m , (ii) G x positive definite and (iii) G x positive semidefinite.
But how do we know the metric of the parameter space? Obviously we need
prior information about the geometry of the parameter space X , namely from
24 1 The first problem of algebraic regression

the empirical sciences like physics, chemistry, biology, geosciences, social sci-
ences. If the parameter space X  R m is equipped with an inner product
¢ x1 | x 2 ² = x1cG x x 2 , x1  X, x 2  X where the matrix G x of the metric & x &2 =
xcG x x is positive definite, we refer to the metric space X  R m as Euclidean
E m . In contrast, if the parameter space X  R m is restricted to a metric space
with a matrix G x of the metric which is positive semidefinite, we call the pa-
rameter space semi Euclidean E m , m . m1 is the number of positive eigenvalues,
1 2

m2 the number of zero eigenvalues of the positive semidefinite matrix G x of the


metric (m = m1 + m2 ). In various applications, namely in the adjustment of ob-
servations which refer to Special Relativity or General Relativity we have to
generalize the metric structure of the parameter space X : If the matrix G x of
the pseudometric & x &2 = xcG x x is built on m1 positive eigenvalues (signature +),
m2 zero eigenvalues and m3 negative eigenvalues (signature –), we call the
pseudometric parameter space pseudo Euclidean E m , m , m , m = m1 + m2 + m3 .
1 2 3

For such a parameter space MINOS has to be generalized to & x &2G = extr , for x

instance "maximum norm solution" .


1-22 Alternative choice of the metric of the parameter space X
Another problem associated with the parameter space X is the norm choice
problem. Here we have used the A 2 -norm, for instance
A 2 -norm: & x & 2 := xcx = x12 + x22 + ... + xm2 1 + xm2 ,
p p p p
A p -norm: & x & p := p
x1 + x2 + ... + xm 1 + xm ,
as A p -norms (1 d p < f ) are alternative norms of choice.
Beside the choice of the matrix G x of the metric within the A 2 -norm and of the
norm A p itself we like to discuss the result of the MINOS matrix G m of the
metric. Indeed we have constructed MINOS from an a priori choice of the metric
G called G x and were led to the a posteriori choice of the metric G m of type
(1.27), (1.28) and (1.29). The matrices
(i) G m = A c( AA c) 1 A (1.27)
(ii) G m = A c( AG Ac) A
1
x
1
(1.28)
(iii) G m = A c[ A(G x + A cA) A c] A 1 1
(1.29)

are (i) idempotent, (ii) G x idempotent and (iii) [ A(G x + AcA) 1 Ac]1 idempotent,
2
namely projection matrices. Similarly, the norms i m of the type (1.30), (1.31)
and (1.32) measure the distance of the solution point x m  X from the null space
N ( A ) . The matrices
(i) I m  A c( AA c) 1 A (1.30)
1 1
(ii) G x  A c( AG A c) A
x (1.31)
(iii) I m  (G x + A cA ) A c[ A(G x + A cA ) A c] A
1 1 1
(1.32)
1-2 The minimum norm solution: “MINOS” 25

are (i) idempotent, (ii) G x 1 idempotent and (iii) (G x + A cA) 1 A idempotent,


namely projection matrices.
1-23 G x -MINOS and its generalized inverse
A more formal version of the generalized inverse which is characteristic for
G x -MINOS is presented by

Lemma 1.4 (characterization of G x -MINOS):


x m = Ly is I – MINOS of the consistent system of linear equations
(1.1) Ax = y , rk A = rk ( A, y ) (or y  R ( A) ) if and only if the matrix
L  R m × n fulfils
ALA = A ½
¾. (1.33)
LA = (LA)c¿

The reflexive matrix L is the A1,2,4 generalized inverse.


x m = Ly is G x -MINOS of the consistent system of linear
equations (1.1) Ax = y , rk A = rk( A, y ) (or y  R ( A) ) if and
only if the matrix L  R m × n fulfils the two conditions
ALA = A ½
¾. (1.34)
G x LA = (G x LA ) ¿
c

The reflexive matrix L is the G x -weighted A1,2,4 generalized in-


verse.
: Proof :
According to the theory of the general solution of a system of linear equations
which is presented in Appendix A, the conditions ALA = L or L = A  guarantee
the solution x = Ly of (1.1) , rk A = rk( A, y ) . The general solution
x = x m + (I  LA)z

with an arbitrary vector z  R m×1 leads to the appropriate representation of the


G x -seminorm by means of
|| x m ||G2 = || Ly || G2 d || x ||G2 = || x m + (I  LA )z ||G2
x x x x

=|| x m ||G2 +2xcmG x (I  LA )z + || (I  LA)z ||G2


x x

=|| Ly ||2
Gx +2y cLcG x (I  LA )z + || (I  LA)z ||G2 x

= y cLcG x Ly + 2y cLcG x (I  LA)z + z c(I  A cLc)G x (I  LA)z

where the arbitrary vectors y  Y { R n holds if and only if y cLcG x (I  LA)z = 0


for all z  R m×1 or A cLcG x (I  LA) = 0 or A cLcG xc = A cLcG x LA . The right
26 1 The first problem of algebraic regression

hand side is a symmetric matrix. Accordingly the left hand side must have this
property, too, namely G x LA = (G x LA)c , which had to be shown.
Reflexivity of the matrix L originates from the consistency condition, namely
(I  AL)y = 0 for all y  R m×1 or AL = I . The reflexive condition of the G x -
weighted, minimum norm generalized inverse, (1.17) G x LAL = G x L , is a direct
consequence.
Consistency of the normal equations (1.4) or equivalently the uniqueness of
G x x m follows from G x L1y = A cL1cG x L1y = G x L1 AL1 y = G x L1 AL 2 y = A cL1c A ×
×Lc2 G x L 2 y = A cL 2 G x L 2 y = G x L 2 y for arbitrary matrices L1  R m × n and
L 2  R m × n which satisfy (1.16).
ƅ
1-24 Eigenvalue decomposition of G x -MINOS:
canonical MINOS
In the empirical sciences we meet quite often the inverse problem to determine
the infinite set of coefficients of a series expansion of a function of a functional
(Taylor polynomials) from a finite set of observations.
First example:
Determine the Fourier coefficients (discrete Fourier transform, trigonometric
polynomials) of a harmonic function with circular support from observations in a
one-dimensional lattice.
Second example:
Determine the spherical harmonic coefficients (discrete Fourier-Legendre trans-
form) of a harmonic function with spherical support from observations n a two-
dimensional lattice.
Both the examples will be dealt with lateron in a case study. Typically such ex-
pansions generate an infinite dimensional linear model based upon orthogonal
(orthonormal) functions. Naturally such a linear model is underdetermined since
a finite set of observations is confronted with an infinite set of unknown parame-
ters. In order to make such an infinite dimensional linear model accessible to the
computer, the expansion into orthogonal (orthonormal) functions is truncated or
band-limited.
Observables y  Y , dim Y = n , are related to parameters x  X , dim X =
= m  n = dim Y , namely the unknown coefficients, by a linear operator
A  \ n× m which is given in the form of an eigenvalue decomposition. We are
confronted with the problem to construct “canonical MINOS”, also called the
eigenvalue decomposition of G x -MINOS.
First, we intend to canonically represent the parameter space X as well as the
observation space Y . Here, we shall assume that both spaces are Euclidean
1-2 The minimum norm solution: “MINOS” 27

equipped with a symmetric, positive definite matrix of the metric G x and G y ,


respectively. Enjoy the diagonalization procedure of both matrices reviewed in
Box 1.19. The inner products aac and bbc , respectively, constitute the matrix of
the metric G x and G y , respectively. The base vectors {a1 ,..., am | O} span the
parameter space X , dim X = m , the base vectors {b1 ,..., bm | O} the observation
space, dim Y = n . Note the rank identities rk G x = m , rk G y = n , respectively.
The left norm || x ||G2 = xcG x x is taken with respect to the left matrix of the metric
x
G x . In contrast, the right norm || y ||G2 = y cG y y refers to the right matrix of the
y
metric G y . In order to diagonalize the left quadratic form as well as the right
quadratic form we transform G x 6 G *x = Diag(Ȝ 1x ,..., Ȝ mx ) = 9 cG x 9 - (1.35),
(1.37), (1.39) - as well as G y 6 G *y = Diag(Ȝ 1y ,..., Ȝ ny ) = 8 cG y 8 - (1.36), (1.38),
(1.40) - into the canonical form by means of the left orthonormal matrix V and
by means of the right orthonormal matrix U . Such a procedure is called “eigen-
space analysis of the matrix G x ” as well as “eigenspace analysis of the matrix
G y ”. ȁ x constitutes the diagonal matrix of the left positive eigenvalues
(Ȝ 1x ,..., Ȝ mx ) , the right positive eigenvalues (Ȝ 1y ,..., Ȝ ny ) the n-dimensional right
spectrum. The inverse transformation G *x = ȁ x 6 G x - (1.39) - as well as
G *y = ȁ y 6 G y - (1.40) - is denoted by “left eigenspace synthesis” as well as
“right eigenspace synthesis”.
Box 1.8:
Canonical representation of the matrix of the metric
parameter space versus observation space
“parameter space X ” “observation space”
span{a1 ,..., am } = X Y = span{b1 ,..., bn }
aj |aj
1 2
= gj ,j 
1 2
ai | ai
1 2
= g i ,i 
1 2

 aac = G x  bbc = G y
j1 , j2  {1,..., m} i1 , i2  {1,..., n}
rk G x = m rk G y = n

“left norms” “right norms”


|| x ||G2 = xcG x x = (x* )cx*
x
(y * )cy * = y cG y y =|| y ||G2 y

“eigenspace analysis “eigenspace analysis


of the matrix G x ” of the matrix G y ”

G *x = V cG x V = G *y = U cG y U =
(1.35) (1.36)
= Diag(Ȝ 1x ,..., Ȝ mx ) =: ȁ x = Diag(Ȝ 1y ,..., Ȝ ny ) =: ȁ y
28 1 The first problem of algebraic regression

subject to subject to
(1.37) VV c = V cV = I m UU c = U cU = I n (1.38)

(1.39) (G x  Ȝ xj I m ) v j = 0 (G y  Ȝ iy I n )u i = 0 (1.40)

“eigenspace synthesis “eigenspace synthesis


of the matrix G x ” of the matrix G y ”

(1.41) G x = VG *x V c = Vȁ x V c Uȁ y U c = UG *y U c = G y . (1.42)

Second, we study the impact of the left diagonalization of the metric of the met-
ric G x as well as right diagonalization of the matrix of the metric G y on the
coordinates x  X and y  Y , the parameter systems of the left Euclidean space
X , dim X = m , and of the right Euclidian space Y . Enjoy the way how we
have established the canonical coordinates x* := [ x1* ,..., xm* ]c of X as well as the
canonical coordinates y * := [ y1* ,..., yn* ] called the left and right star coordinates
of X and Y , respectively, in Box 1.9. In terms of those star coordinates (1.45)
as well as (1.46) the left norm || x* ||2 of the type (1.41) as well as the right norm
|| y * ||2 of type (1.42) take the canonical left and right quadratic form. The trans-
formations x 6 x* as well as y 6 y * of type (1.45) and (1.46) are special ver-
sions of the left and right polar decomposition: A rotation constituted by the 1 1
matrices {U, V} is followed by a stretch constituted by the matrices {ȁ x , ȁ y } as 2 2

diagonal matrices. The forward transformations (1.45), (1.46), x 6 x* and


y 6 y * are computed by the backward transformations x* 6 x and y * 6 y .
1 1
ȁ x and ȁ y , respectively, denote those diagonal matrices which are generated
2 2

by the positive roots of the left and right eigenvalues, respectively. (1.49) –
(1.52) are corresponding direct and inverse matrix identities. We conclude with
the proof that the ansatz (1.45), (1.46) indeed leads to the canonical representa-
tion (1.43), (1.44) of the left and right norms.
Box 1.9:
Canonical coordinates x*  X and y *  Y ,
parameter space versus observation space
“canonical coordinates “canonical coordinates
of the parameter space” of the observation space”
|| x* ||2 = (x* )cx* = || y * ||2 = (y * )cy * =
(1.43) (1.44)
= xcG x x =|| x ||G2 x
= y cG y y =|| y ||G2 y

ansatz
1 1
(1.45) x* = V cȁ x x2
y * = U cȁ y y2
(1.46)
1-2 The minimum norm solution: “MINOS” 29

versus versus
- 12 - 12
(1.47) x = ȁ Vx x
*
y = ȁ y Uy * (1.48)

1
(1.49) ȁ x := Diag
2
( O1x ,..., Omx ) Diag ( O1y ,..., Ony := ȁ y (1.50)) 1
2

1
§ 1 1 · § 1 1 · 1
(1.51) ȁ x := Diag ¨ ¸ =: ȁ -y (1.52)
- 2
,..., ¸ Diag ¨ ,..., 2

¨ O x
Omx ¸ ¨ O y
Ony ¸
© 1 ¹ © 1 ¹
“the ansatz proof” “the ansatz proof”
G x = Vȁ x V c G y = Uȁ y Uc

|| x ||G2 = xcG x x =
x
|| y ||G2 = y cG y y =
y

1 1 1 1
= xcVȁ x ȁ x V cx =
2 2
= y cUȁ y ȁ y U cy =
2 2

- 12 1
- 12
= (x* )cȁ x V cVȁ x ȁ x V cVȁ x x* = - 12 1
- 12
2
= (y * )cȁ y U cUȁ y ȁ y U cUȁ y y * = 2

= (x* )cx* =|| x* ||2 = (y * )cy * =|| y * ||2 .

Third, let us discuss the dual operations of coordinate transformations x 6 x* ,


y 6 y * , namely the behavior of canonical bases, also called orthonormal bases
{
e x , e y , or Cartan frames of reference e1x ,..., emx | 0 spanning the parameter }
{ }
space X as well as e1y ,..., eny | 0 spanning the observation space Y , here
a 6 e x , b 6 e y . In terms of orthonormal bases e x and e y as outlined in Box
1.10, the matrix of the metric e x e xc = I m and e yce y = I n takes the canonical form
(“modular”). Compare (1.53) with (1.55) and (1.54) with (1.56) are achieved by
the changes of bases (“CBS”) of type left e x 6 a , a 6 ex - (1.57), (1.59) - and
of type right e y 6 b , b 6 e y - (1.58), (1.60). Indeed these transformations
x 6 x* , a 6 e x - (1.45), (1.57) - and y 6 y * , b 6 e y - (1.46), (1.58) - are
dual or inverse.

Box 1.10:
General bases versus orthonormal bases spanning the parameter space X as
well as the observation space Y
“left” “right”
“parameter space X ” “observation space”
“general left base” “general right base”
span {a1 ,..., am } = X Y = span {b1 ,..., bn }
30 1 The first problem of algebraic regression

: matrix of the metric : : matrix of the metric :


(1.53) aac = G x bbc = G y (1.54)

“orthonormal left base” “orthonormal right base”


{ x
span e ,..., e
1
x
m }=X {
Y = span e1y ,..., eny }
: matrix of the metric : : matrix of the metric :
(1.55) e x ecx = I m e y ecy = I n (1.56)
“base transformation” “base transformation”
1 1
(1.57) a = ȁ x Ve x
2
b = ȁ y Ue y
2
(1.58)

versus versus
- 12 - 12
(1.59) e x = V cȁ x a e y = Ucȁ y b (1.60)

{ }
span e1x ,..., emx = X {
Y = span e1y ,..., eny . }
Fourth, let us begin the eigenspace analysis versus eigenspace synthesis of the
rectangular matrix A  \ n× m , r := rk A = n , n < m . Indeed the eigenspace of
the rectangular matrix looks differently when compared to the eigenspace of the
quadratic, symmetric, positive definite matrix G x  \ m × m , rk G x = m and
G y  \ n×n , rk G y = n of the left and right metric. At first we have to generalize
the transpose of a rectangular matrix by introducing the adjoint operator A #
which takes into account the matrices {G x , G y } of the left, right metric. Defini-
tion 1.5 of the adjoint operator A # is followed by its representation, namely
Lemma 1.6.
Definition 1.5 (adjoint operator A # ):
The adjoint operator A #  \ m× n of the matrix A  \ n× m is defined
by the inner product identity
y | Ax G = x | A # y , (1.61)
y Gx

where the left inner product operates on the symmetric, full rank
matrix G y of the observation space Y , while the right inner prod-
uct is taken with respect to the symmetric full rank matrix G x of the
parameter space X .

Lemma 1.6 (adjoint operator A # ):


A representation of the adjoint operator A #  \ m × n of the matrix
A  \ n× m is
A # = G -1x A cG y . (1.62)
1-2 The minimum norm solution: “MINOS” 31

For the proof we take advantage of the symmetry of the left inner product,
namely
y | Ax Gy
= y cG y Ax versus x | A#y = xcG x A # y
Gx

y cG y Ax = xcA cG y y = xcG x A # y œ
A cG y = G x A # œ G x1A cG y = A # .
ƅ
Five, we solve the underdetermined system of linear equations

{y = Ax | A  \ n× m
, rk A = n, n < m }
by introducing
• the eigenspace of the rectangular matrix A  \ n× m of rank
r := rk A = n , n < m : A 6 A*
• the left and right canonical coordinates: x o x* , y o y *
as supported by Box 1.11. The transformations (1.63) x 6 x* , (1.64) y 6 y *
from the original coordinates ( x1 ,..., xm ) , the parameters of the parameter space
( )
X , to the canonical coordinates x1* ,..., xm* , the left star coordinates, as well as
from the original coordinates ( y1 ,..., yn ) , the parameters of the observation
( )
space Y , to the canonical coordinates y1* ,..., yn* , the right star coordinates are
polar decompositions: a rotation {U, V} is followed by a general stretch
{ 1 1
2
} 1
2
1
G y , G x . The matrices G y as well as G x are product decompositions of type
2 2

G y = S y S yc and G x = S xcS x . If we substitute S y = G y or S x = G x symbolically,


1 1
2 2
1 1
we are led to the methods of general stretches G y and G x respectively. Let us
2 2
1
substitute the inverse transformations (1.65) x* 6 x = G x Vx* and (1.66)
- 2
1
y 6 y = G y Uy into our system of linear equations (1.67) y = Ax or its dual
* - 2*

(1.68) y * = A* x* . Such an operation leads us to (1.69) y * = f x* as well as ( )


(1.70) y = f ( x ) . Subject to the orthonormality conditions (1.71) U cU = I n and
(1.72) V cV = I m we have generated the matrix A* of left–right eigenspace
analysis (1.73)
A* = [ ȁ, 0]

subject to the horizontal rank partitioning of the matrix V = [ V1 , V2 ] . Alterna-


tively, the left-right eigenspace synthesis (1.74)

ªV c º
A = G y U [ ȁ, 0 ] « 1 » G x
1 1
- 2 2

«V c »
¬ 2¼
- 12
is based upon the left matrix (1.75) L := G y U and the right matrix (1.76)
1
R := G x V . Indeed the left matrix L by means of (1.77) LLc = G -1y reconstructs
-
2

the inverse matrix of the metric of the observation space Y . Similarly, the right
32 1 The first problem of algebraic regression

matrix R by means of (1.78) RR c = G -1x generates the inverse matrix of the


metric of the parameter space X . In terms of “L, R” we have summarized the
eigenvalue decompositions (1.79)-(1.84). Such an eigenvalue decomposition
helps us to canonically invert y * = A* x* by means of (1.85), (1.86), namely the
rank partitioning of the canonical unknown vector x* into x*1  \ r and
x*2  \ m  r to determine x*1 = ȁ -1 y * , but leaving x*2 underdetermined. Next we
shall proof that x*2 = 0 if x* is MINOS.

A
X
x y Y

1 1
V cG x 2
U cG y 2

X
x* y*  Y
A*
Figure 1.6: Commutative diagram of coordinate transformations
Consult the commutative diagram for a short hand summary of the introduced
transformations of coordinates, both of the parameter space X as well as the
observation space Y .
Box 1.11:
Canonical representation,
underdetermined system of linear equations
“parameter space X ” versus “observation space Y ”
1 1
(1.63) x* = V cG x x 2
y * = U cG y y (1.64) 2

and and
- 12 - 12
(1.65) x = G x Vx* y = G y Uy * (1.66)

“underdetermined system of linear equations”


{
y = Ax | A  \ n× m , rk A = n, n < m }
(1.67) y = Ax versus y * = A * x* (1.68)
- 12 - 12 1 1
G y Uy * = AG x Vx* U cG y y = A* V cG x x
2 2

( ) ( )
1
- 12 - 12 1
(1.69) y * = U cG y AG x V x*
2
y = G y UA* V cG x x (1.70) 2
1-2 The minimum norm solution: “MINOS” 33

subject to

(1.71) U cU = UUc = I n versus V cV = VV c = I m (1.72)

“left and right eigenspace”


“left-right eigenspace “left-right eigenspace
analysis” synthesis”
A* = U cG y AG x [ V1 , V2 ]
1 1
-
2 2 ªV c º
A = G y U [ ȁ, 0] « 1 » G x (1.74)
1 1
-
(1.73) 2 2

= [ ȁ, 0] «V c »
¬ 2¼
“dimension identities”
r×r
ȁ\ , 0  \ r × ( m  r ) , r := rk A = n, n < m
V1  \ m × r , V2  \ m × ( m  r ) , U  \ r × r
“left eigenspace” “right eigenspace”
- 12 1 - 12 1
(1.75) L := G U Ÿ L = U cG y
y
-1 2
R := G x V Ÿ R -1 = V cG x (1.76) 2

- 12 - 12
R 1 := G x V1 , R 2 := G x V2
1 1
Ÿ
R 1- := V1cG x , R -2 := V2cG x
2 2

(1.77) LLc = G -1y Ÿ (L-1 )cL-1 = G y RR c = G -1x Ÿ (R -1 )cR -1 = G x (1.78)

(1.79) A = LA* R -1 versus A* = L-1 AR (1.80)


ªR º -
A = [ ȁ, 0] =
*

(1.81) A = L [ ȁ, 0] « - » 1
versus (1.82)
¬« R 2 ¼» = L-1 A [ R 1 , R 2 ]

ª A # AR 1 = R 1 ȁ 2
(1.83) AA # L = Lȁ 2 versus « # (1.84)
«¬ A AR 2 = 0
“underdetermined system of linear
equations solved in canonical coordinates”
ª x* º x*  \ r ×1
(1.85) y * = A* x* = [ ȁ, 0] « 1* » = ȁx*1 , * 1 ( m  r )×1 Ÿ
«¬ x 2 »¼ x2  \

ª x*1 º ª ȁ -1 y * º
« *» = « * » (1.86)
¬« x 2 ¼» ¬ x 2 ¼
“if x* is MINOS, then x*2 = 0 : x1* ( ) m
= ȁ -1 y * .”
34 1 The first problem of algebraic regression

Six, we prepare ourselves for MINOS of the underdetermined system of linear


equations

{y = Ax | A  \ n× m
, rk A = n, n < m, || x ||G2 = min x
}
by introducing Lemma 1.7, namely the eigenvalue - eigencolumn equations of
the matrices A # A and AA # , respectively, as well as Lemma 1.9, our basic result
on “canonical MINOS”, subsequently completed by proofs.
Lemma 1.7 (eigenspace analysis versus eigenspace synthesis
{
of the matrix A  \ n× m , r := rkA = n < m ) }
The pair of matrices {L, R} for the eigenspace analysis and the eigenspace
synthesis of the rectangular matrix A  \ n× m of rank r := rkA = n < m ,
namely
A* = L-1 AR versus A = LA* R -1
or or
A = [ ȁ, 0 ] = L A [ R 1 , R 2 ]
* -1
versus ª R -1 º
A = L [ ȁ, 0] « 1-1 » ,
¬« R 2 ¼»
are determined by the eigenvalue – eigencolumn equations (eigenspace
equations) of the matrices A # A and AA # , respectively, namely
A # AR 1 = R 1 ȁ 2 versus AA # L = Lȁ 2

subject to
ªO12 … 0 º
« »
ȁ 2 = « # % # » , ȁ = Diag + O12 ,..., + Or2 . ( )
« 0 " Or2 »
¬ ¼
Let us prove first AA # L = Lȁ 2 , second A # AR 1 = R 1 ȁ 2 .
(i) AA # L = Lȁ 2

AA # L = AG -1x A cG y L =
ªV c º ªȁº
= L [ ȁ, 0] « 1 » G x G -1x (G x )c [ V1 , V2 ] « » U c(G y )cG y G y U,
1 1 1 1
2
- 2
- - 2 2

«V c » 0
¬ ¼c
¬ 2¼
ª V cV V1c V2 º ª ȁ º
AA # L = L [ ȁ, 0] « 1 1 » « »,
«V cV
¬ 2 1 V2c V2 »¼ ¬ 0c ¼

ªI 0 º ªȁº
AA # L = L [ ȁ, 0] « r . ƅ
¬0 I m -r »¼ «¬ 0c »¼
1-2 The minimum norm solution: “MINOS” 35

(ii) A # AR 1 = R 1 ȁ 2

A # AR = G -1x AcG y AR =
ªȁº
= G -1xG x V « » U c(G y )cG y G y U [ ȁ, 0] V cG x G x V,
1 1 1 1 1
2
- - 2
- 2 2 2

¬ 0c ¼
ªȁº ª ȁ 2 0º
A # AR = G x V « » [ ȁ, 0] = G x [ V1 , V2 ] «
1 1
- -
»,
2 2

¬ 0c ¼ ¬ 0 0¼
A # A [ R 1 , R 2 ] = G x ª¬ V1 ȁ 2 , 0 º¼
1
- 2

A # AR 1 = R 1 ȁ 2 . ƅ
{
The pair of eigensystems AA # L = Lȁ 2 , A # AR 1 = R 1 ȁ 2 is unfortunately }
based upon non-symmetric matrices AA # = AG -1x A cG y and A # A = G -1x A cG y A
which make the left and right eigenspace analysis numerically more complex. It
appears that we are forced to use the Arnoldi method rather than the more effi-
cient Lanczos method used for symmetric matrices. In this situation we look out
for an alternative. Indeed when we substitute
{L, R} { - 12
by G y U, G x V
- 12
}
- 12
into the pair of eigensystems and consequently left multiply AA # L by G x , we
achieve a pair of eigensystems identified in Corollary 1.8 relying on symmetric
matrices. In addition, such a symmetric pair of eigensystems produces the ca-
nonical base, namely orthonormal eigencolumns.
Corollary 1.8 (symmetric pair of eigensystems):
The pair of eigensystems
1 1
- 12 - 12
(1.87) G y AG -1x A c(G y )cU = ȁ 2 U versus
2 2
(G x )cA cG y AG x V1 = V1 ȁ 2 (1.88)
1
- 12 - 12 - 12
(1.89) G y AG -1x Ac(G y )c  Ȝ i2 I r = 0 versus (G x )cA cG y AG x  Ȝ 2j I m = 0 (1.90)
2

is based upon symmetric matrices. The left and right eigencolumns are
orthogonal.

Such a procedure requires two factorizations,


1
1 1
- 12 - 12 - 12 - 12
G x = (G x )cG x , G -1x = G x (G x )c
2 2
and G y = (G y )cG y , G -1y = G y (G y )c
2

via Cholesky factorization or eigenvalue decomposition of the matrices G x and


Gy .
36 1 The first problem of algebraic regression

Lemma 1.9 (canonical MINOS):

Let y * = A* x* be a canonical representation of the underdetermined


system of linear equations

{y = Ax | A  \ n× m
, r := rkA = n, n < m . }
Then the rank partitioning of x*m

ª x* º ª ȁ -1 y * º x1* = ȁ -1 y * *
x*m = « *1 » = « » or , x1  \ r ×1 , x*2  \ ( m  r )×1 (1.91)
¬x2 ¼ ¬ 0 ¼ x2 = 0
*

is G x -MINOS. In terms of the original coordinates [ x1 ,..., xm ]c of the


parameter space X a canonical representation of G x -MINOS is

ª ȁ -1 º
xm = G x [ V1 , V2 ] « » U cG y y ,
1 1
- 2 2

¬ 0c ¼
- 12 1
xm = G x V1 ȁ -1 U cG y = 5 1 ȁ -1 /-1 y. 2

The G x -MINOS solution xm = A m- y


- 12 1
A m- = G x V1 ȁ -1 U cG y 2

is built on the canonical ( G x , G y ) weighted reflexive inverse of A .

For the proof we depart from G x -MINOS (1.14) and replace the ma-
trix A  \ n× m by its canonical representation, namely eigenspace synthesis.

( )
-1
xm = G -1x Ac AG -1x Ac y

ªV c º
A = G y U [ ȁ, 0 ] « 1 » G x
1 1
- 2 2

«V c »
¬ 2¼

ªVc º ªȁº
AG -1x Ac = G y U [ ȁ, 0] « 1 » G x G -1x (G x )c [ V1 , V2 ] « » Uc(G y )c
1 1 1 1
- 2 2
- 2 2

«V c » ¬0¼
¬ 2¼

- 12
AG -1x Ac = G y Uȁ 2 Uc(G y )c, AG -1x Ac
- 12
( )
-1
( )
c 1
= G y Uȁ -2 UcG y
2
1
2

( )c [V , V ] «¬ªȁ0 »¼º Uc (G )c (G )c Uȁ
xm = G -1x G x
1
2
1 2
- 12
y
1
2
y
-2
1
U cG y y
2
1-2 The minimum norm solution: “MINOS” 37

ª ȁ -1 º
xm = G x [ V1 , V2 ] « » U cG y y
1 1
- 2 2

¬ 0 ¼
- 12 1
xm = G x V1 ȁ -1 U cG y y = A m- y
2

- 12 1
A m- = G x V1 ȁ -1 U cG y  A1,2,4
G
2
x

( G x weighted reflexive inverse of A )

ª x* º 1 ª ȁ -1 º ª ȁ -1 º
1 ª ȁ -1 y * º
x*m = « *1 » = V cG x xm = « » U cG y y = « » y * = «
2 2
». ƅ
¬x2 ¼ ¬ 0 ¼ ¬ 0 ¼ ¬ 0 ¼
The important result of x*m based on the canonical G x -MINOS of {y * = A* x* |
A*  \ n× m , rkA* = rkA = n, n < m} needs a short comment. The rank partition-
ing of the canonical unknown vector x* , namely x*1  \ r , x*2  \ m  r again
paved the way for an interpretation. First, we acknowledge the “direct inversion”

(
x*1 = ȁ -1 y * , ȁ = Diag + O12 ,..., + Or2 , )
for instance [ x1* ,..., xr* ]c = [O11 y1 ,..., Or1 yr ]c . Second, x*2 = 0 , for instance
[ xr*+1 ,..., xm* ]c = [0,..., 0]c introduces a fixed datum for the canonical coordinates
( xr +1 ,..., xm ) . Finally, enjoy the commutative diagram of Figure 1.7 illustrating
our previously introduced transformations of type MINOS and canonical
MINOS, by means of A m and ( A* )m .
A m
Y
y xm  X

1 1
UcG y2
V cG x
2

Y
y* x*m  X
(A )
*

m

Figure 1.7: Commutative diagram of inverse coordinate transformations

Finally, let us compute canonical MINOS for the Front Page Example,
specialized by G x = I 3 , G y = I 2 .
38 1 The first problem of algebraic regression

ª x1 º
ª 2 º ª1 1 1 º « »
y = Ax : « » = « » « x2 » , r := rk A = 2
¬ 3 ¼ ¬1 2 4 ¼ « »
¬ x3 ¼
left eigenspace right eigenspace
A # AV1 = A cAV1 = V1 ȁ 2
AA U = AAcU = Uȁ
# 2

A # AV2 = A cAV2 = 0

ª2 3 5 º
ª3 7 º « 3 5 9 » = A cA
AA c = « »
¬7 21¼ « »
«¬ 5 9 17 »¼

eigenvalues

AA c  Oi2 I 2 = 0 œ A cA  O j2 I 3 = 0 œ

œ O12 = 12 + 130, O22 = 12  130, O32 = 0

left eigencolumns right eigencolumns

ª 2  O12 3 5 º ª v11 º
2
ª3  O 7 º ª u11 º « »
(1st) « 1
»« » = 0 (1st) « 3 5  O12
9 » «« v 21 »» = 0
¬ 7 21  O12 ¼ ¬u21 ¼ « 5
¬ 9 17  O12 »¼ «¬ v31 »¼

subject to subject to
u112 + u21
2
=1 v112 + v 221 + v31
2
=1

ª(2  O12 )v11 + 3v 21 + 5v31 = 0


(3  O12 )u11 + 7u21 = 0 versus «
¬3v11 + (5  O1 )v 21 + 9v31 = 0
2

ª 2 49 49
« u11 = 49 + (3  O 2 ) 2 = 260 + 18 130
« 1

« 2 2 2
(3  O1 ) 211 + 18 130
«u21 = =
¬« 49 + (3  O 1
2 2
) 260 + 18 130
2
ª v11 º ª (2 + 5O12 ) 2 º
« 2» 1 « »
« v 21 » = (2 + 5O 2 ) 2 + (3  9O 2 ) 2 + (1 + 7O 2  O 4 ) 2
2 2
« (3  9O1 ) »
« v31
2 » 1 1 1 1 « (1  7O12 + O14 ) 2 »
¬ ¼ ¬ ¼
1-2 The minimum norm solution: “MINOS” 39

2
ªv º
ª 62 + 5 130 2 º
« (
» )
11
« 2»
« »
«v » =
2
21
1
(
« 105  9 130 » )
« v » 102700 + 9004 130 «
2 »
¬ ¼
31

¬«
(
« 191 + 17 130 2 »
¼»
)
ª 2  O22 3 5 º ª v12 º
ª3  O22 7 º ª u12 º « »
(2nd) « 2»« » = 0 (2nd) « 3 2
5  O2 9 » «« v 22 »» = 0
¬ 7 21  O2 ¼ ¬u22 ¼ « 5
¬ 9 17  O22 »¼ «¬ v32 »¼
subject to subject to
u +u =1
2
12
2
22 v + v 222 + v32
2
12
2
=1
(3  O22 )u12 + 7u22 = 0 versus ª (2  O22 )v12 + 3v 22 + 5v32 = 0
«
¬ 3v12 + (5  O2 )v 22 + 9v32 = 0
2

ª 2 49 49
« u12 = 49 + (3  O 2 ) 2 = 260  18 130
« 2

« 2 2 2
(3  O2 ) 211  18 130
«u22 = =
«¬ 49 + (3  O 2 2
2 ) 260  18 130
2
ª v12 º ª (2 + 5O22 ) 2 º
« 2 » 1 « »
« v 22 » = (2 + 5O 2 ) 2 + (3  9O 2 ) 2 + (1 + 7O 2  O 4 ) 2
2 2
« (3  9O2 ) »
« v32
2 » 2 2 2 2 « (1  7O22 + O24 ) 2 »
¬ ¼ ¬ ¼

2
ª v12 º
(
ª 62  5 130 2 º
« » )
« 2»
« 2 »
« v 22 » =
1
(
« 105 + 9 130 »
102700  9004 130 « »
)
« v32
2 »
¬ ¼
«¬ (
« 191  17 130 2 »
»¼ )
ª 2 3 5 º ª v13 º
(3rd) «« 3 5 9 »» «« v 23 »» = 0 subject to v132 + v 223 + v33
2
=1
«¬ 5 9 17 »¼ «¬ v33 »¼

2v13 + 3v 23 + 5v33 = 0
3v13 + 5v 23 + 9v33 = 0
ª 2 3º ª v13 º ª 5º ª v13 º ª 5 3º ª 5º
« 3 5» « v » = « 9» v33 œ « v » =  « 3 2 » «9» v33
¬ ¼ ¬ 23 ¼ ¬ ¼ ¬ 23 ¼ ¬ ¼¬ ¼
v13 = 2v33 , v 23 = 3v33
40 1 The first problem of algebraic regression

2 9 1
v132 = 2
, v 23 = , v33
2
= .
7 14 14
There are four combinatorial solutions to generate square roots.

ª u11 u12 º ª ± u11 ± u122 º


2

«u » = « »
¬ 21 u22 ¼ «¬ ± u21
2 2 »
± u22 ¼
ª 2 º
v13 º « ± v11 ± v12 ± v13
2 2
ª v11 v12 »
«v v 22 v 23 »» = « ± v 221 ± v 222 ± v 223 » .
« 21 « »
«¬ v31 v32 v33 »¼ « ± v 2 ± v32
2
± v332 »
31
¬ ¼
Here we have chosen the one with the positive sign exclusively. In summary, the
eigenspace analysis gave the result as follows.

ȁ = Diag ( 12 + 130 , 12  130 )


ª 7 7 º
« »
« 260 + 18 130 260  18 130 »
U=« »
« 211 + 18 130 211 + 18 130 »
« »
¬ 260 + 18 130 260  18 130 ¼

ª 62 + 5 130 62  5 130 º
« 2 »
« 102700 + 9004 130 102700  9004 130 »
« 14 »
« 105 + 9 130 105  9 130 3 »
V=« » = [ V1 , V2 ] .
« 102700 + 9004 130 102700  9004 130 14 »
« 1 »
« 191 + 17 130 191 + 17 130 »
«« 102700 + 9004 130 14 »
¬ 102700  9004 130 »¼

1-3 Case study:


Orthogonal functions, Fourier series versus Fourier-Legendre
series, circular harmonic versus spherical harmonic regression
In empirical sciences, we continuously meet the problems of underdetermined
linear equations. Typically we develop a characteristic field variable into or-
thogonal series, for instance into circular harmonic functions (discrete Fourier
transform) or into spherical harmonics (discrete Fourier-Legendre transform)
with respect to a reference sphere. We are left with the problem of algebraic
regression to determine the values of the function at sample points, an infinite set
of coefficients of the series expansion from a finite set of observations. An infi-
1-3 Case study 41

nite set of coefficients, the coordinates in an infinite-dimensional Hilbert space,


cannot be determined by finite computer manipulations. Instead, band-limited
functions are introduced. Only a finite set of coefficients of a circular harmonic
expansion or of a spherical harmonic expansion can be determined. It is the art
of the analyst to fix the degree / order of the expansion properly. In a peculiar
way the choice of the highest degree / order of the expansion is related to the
Uncertainty Principle, namely to the width of lattice of the sampling points.

Another aspect of any series expansion is the choice of the function space. For
instance, if we develop scalar-valued, vector-valued or tensor-valued functions
into scalar-valued, vector-valued or tensor-valued circular or spherical harmon-
ics, we generate orthogonal functions with respect to a special inner product,
also called “scalar product” on the circle or spherical harmonics are eigenfunc-
tions of the circular or spherical Laplace-Beltrami operator. Under the postulate
of the Sturm-Liouville boundary conditions the spectrum (“eigenvalues”) of the
Laplace-Beltrami operator is
positive and integer.
The eigenvalues of the circular Laplace-Beltrami operator are l 2 for integer
{
values l  {0,1,..., f} , of the spherical Laplace-Beltrami operator k (k + 1), l 2 }
for integer values k {0,1,..., f} , l {k , k + 1,..., 1, 0,1,..., k  1, k} . Thanks to
such a structure of the infinite-dimensional eigenspace of the Laplace-Beltrami
operator we discuss the solutions of the underdetermined regression problem
(linear algebraic regression) in the context of “canonical MINOS”. We solve the
system of linear equations
{Ax = y | A  \ n× m , rk A = n, n  m}
by singular value decomposition as shortly outlined in Appendix A.
1-31 Fourier series
? What are Fourier series ?
Fourier series (1.92) represent the periodic behavior of a function x(O ) on a
circle S1 . They are also called trigonometric series since trigonometric functions
{1,sin O , cos O ,sin 2O , cos 2O ,...,sin AO , cos AO} represent such a periodic signal.
Here we have chosen the parameter “longitude O ” to locate a point on S1 .
Instead we could exchange the parameter O by time t , if clock readings would
substitute longitude, a conventional technique in classical navigation. In such a
setting,
2S
O = Zt = t = 2SQ t ,
T
t
AO = AZt = 2S A = 2S AQ t
T
42 1 The first problem of algebraic regression

longitude O would be exchanged by 2S , the product of ground period T and


time t or by 2S , the product of ground frequency Q . In contrast, AO for all
A  {0,1,..., L} would be substituted by 2S the product of overtones A / T or AQ
and time t . According to classical navigation, Z would represent the rotational
speed of the Earth. Notice that A is integer, A  Z .
Box 1.12:
Fourier series
x(O ) = x1 + (sin O ) x2 + (cos O ) x3 + (1.92)
+(sin 2O ) x4 + (cos 2O ) x5 + O3 (sin AO , cos AO )
+L
x(O ) = lim ¦ e (O ) x
A A (1.93)
L of
A = L

ª cos AO A > 0
«
eA (O ) := « 1 A = 0 (1.94)
«¬sin A O A < 0.

Example (approximation of order three):


x (O ) = e0 x1 + e 1 x2 + e +1 x3 + e 2 x4 + e +2 x5 + O3 .
š
(1.95)

Fourier series (1.92), (1.93) can be understood as an infinite-dimensional vector


space (linear space, Hilbert space) since the base functions (1.94) eA (O ) gener-
ate a complete orthogonal (orthonormal) system based on trigonometric
functions. The countable base, namely the base functions eA (O ) or
{1,sin O , cos O , sin 2O , cos 2O , ..., sin AO , cos AO} span the Fourier space
L2 [0, 2S [ . According to the ordering by means of positive and negative indices
{ L,  L + 1,..., 1, 0, +1, ..., L  1, L} (1.95) x š (O ) is an approximation of the
function x(O ) up to order three, also denoted by x L . Let us refer to Box 1.12 as
a summary of the Fourier representation of a function x(O ), O  S1 .
Box 1.13:
The Fourier space
“The base functions eA (O ), A { L,  L + 1,..., 1, 0, +1,..., L  1, L} ,
span the Fourier space L2 [ 0, 2S ] : they generate a complete or-
thogonal (orthonormal) system of trigonometric functions.”
“inner product”
: x  FOURIER and y  FOURIER :
f 2S
1 1
x y := ³ ds * x( s*) y ( s*) = ³ d O x(O ) y (O ) (1.96)
s0 2S 0
1-3 Case study 43
“normalization”
2S
1
< eA (O ) | eA (O ) >:= ³ dO e A1 (O ) eA (O ) = OA G A A (1.97)
1 2
2S 0
2 1 1 2

ª OA = 1 A1 = 0 1
subject to «
«¬ OA = 2 A1 z 0
1
1

“norms, convergence”
2S +L
1
|| x ||
2
= ³ d O x (O ) = lim
2
¦Ox A
2
A <f (1.98)
2S 0
Lof
A=L

lim || x  x šL ||2 = 0 (convergence in the mean) (1.99)


Lof

“synthesis versus analysis”


1
xA = < eA | x >=
+L OA
(1.100) x = lim ¦e x A A versus 2S
(1.101)
L of 1
³ dO e (O ) x (O )
A = L
:= A
2SOA 0
+L
1
x = lim ¦Oe A < x | eA > (1.102)
L of
A = L A

“canonical basis of the Hilbert space FOURIER”


ª 2 sin AO A > 0
«
e := «

A 1 A = 0 (1.103)
«
¬ 2 cos AO A < 0
(orthonormal basis)
1
(1.104) e*A = eA versus eA = OA e*A (1.105)
OA
1
(1.106) xA* = OA xA versus xA = xA* (1.107)
OA
+L
x = lim ¦e *
A < x | e*A > (1.108)
L of
A = L

“orthonormality”
< e*A (x) | e*A (x) >= G A A
1 2 1 2
(1.109)
44 1 The first problem of algebraic regression

Fourier space
Lof
FOURIER = span{e  L , e  L +1 ,..., e 1 , e0 , e1 ,..., e L 1 , e L }

dim FOURIER = lim(2 L + 1) = f


L of

“ FOURIER = HARM L ( S ) ”. 2 1

? What is an infinite dimensional vector space ?


? What is a Hilbert space ?
? What makes up the Fourier space ?
An infinite dimensional vector space (linear space) is similar to a finite dimen-
sional vector space: As in an Euclidean space an inner product and a norm is
defined. While the inner product and the norm in a finite dimensional vector
space required summation of their components, the inner product (1.96), (1.97)
and the norm (1.98) in an infinite-dimensional vector space force us to integra-
tion. Indeed the inner products (scalar products) (1.96), (1.97) are integrals over
the line element of S1r applied to the vectors x(O ) , y (O ) or eA , eA , respec- 1 2
tively. Those integrals are divided by the length s of a total arc of S1r . Alterna-
tive representations of < x | y > and < eA | eA > (Dirac’s notation of brackets,
1 2
decomposed into “bra” and “ket”) based upon ds = rd O , s = 2S r , lead us
directly to the integration over S1 , the unit circle.
A comment has to be made to the normalization (1.97). Thanks to
< eA (O ) | eA (O ) >= 0 for all A1 z A 2 ,
1 2

for instance < e1 (O ) | e1 (O ) > = 0 , the base functions eA (O ) are called orthogo-
nal. But according to
< eA (O ) | eA (O ) > = 12 ,

for instance < e1 (O ) | e1 (O ) > = || e1 (O ) ||2 = 12 , < e 2 (O ) | e 2 (O ) > = || e 2 (O ) ||2 = 12 ,


they are not normalized to 1. A canonical basis of the Hilbert space FOURIER
has been introduced by (1.103) e*A . Indeed the base functions e*A (O ) fulfil the
condition (1.109) of orthonormality.
The crucial point of an infinite dimensional vector space is convergency. When
we write (1.93) x(O ) as an identity of infinite series we must be sure that the
series converge. In infinite dimensional vector space no pointwise convergency is
required. In contrast, (1.99) “convergence in the mean” is postulated. The norm
(1.98) || x ||2 equals the limes of the infinite sum of the OA weighted, squared
coordinate xA , the coefficient in the trigonometric function (1.92),
1-3 Case study 45
+L
|| x ||2 = lim ¦Ox 2
A A < f,
L of
A = L

which must be finite. As soon as “convergence in the mean” is guaranteed, we


move from a pre-Fourier space of trigonometric functions to a Fourier space we
shall define more precisely lateron.
Fourier analysis as well as Fourier synthesis, represented by (1.100) versus
(1.101), is meanwhile well prepared. First, given the Fourier coefficients x A we
are able to systematically represent the vector x  FOURIER in the orthogonal
base eA (O ) . Second, the projection of the vector x  FOURIER onto the base
vectors eA (O ) agrees analytically to the Fourier coefficients as soon as we take
into account the proper matrix of the metric of the Fourier space. Note the re-
producing representation (1.37) “from x to x ”.
The transformation from the orthogonal base eA (O ) to the orthonormal base e*A ,
also called canonical or modular as well as its inverse is summarized by (1.104)
as well as (1.105). The dual transformations from Fourier coefficients x A to
canonical Fourier coefficients x*A as well as its inverse is highlighted by (1.106)
as well as (1.107). Note the canonical reproducing representation (1.108) “from
x to x ”.
The space
ª FOURIER = span {e  L , e  L +1 ,..., e L 1 , e L }º
« L of »
« »
«¬ dim FOURIER = Llim(2
of
L + 1) = f »
¼
has the dimension of hyperreal number f . As already mentioned in the intro-
duction
FOURIER = HARM L ( S ) 2 1

is identical with the Hilbert space L2 (S1 ) of harmonic functions on the circle
S1 .
? What is a harmonic function which has the unit circle S1 as a support ?

A harmonic function “on the unit circle S1 ” is a function x(O ) , O  S1 , which


fulfils
(i) the one-dimensional Laplace equation (the differential equation of a
harmonic oscillator) and
(ii) a special Sturm-Liouville boundary condition.
d2
(1st) '1 x(O ) = 0 œ ( + Z 2 ) x (O ) = 0
dO2
46 1 The first problem of algebraic regression

ª x(0) = x(2S )
(2nd) «
«[ d x(O )](0) = [ d x(O )](2S ).
«¬ d O dO
The special Sturm-Liouville equations force the frequency to be integer, shortly
proven now.
ansatz: x(O ) = cZ cos ZO + sZ sin ZO

x(0) = x(2S ) œ

œ cZ = cZ cos 2SZ + sZ sin 2SZ

d d
[ x(O )](0) = [ x(2S )](2S ) œ
dO dO
œ sZZ = cZZ sin 2SZ + sZZ cos 2SZ œ

cos 2SZ = 0 º
œ Ÿ Z = A A  {0,1,..., L  1, L} .
sin 2SZ = 0 »¼

Indeed, Z = A , A  {0,1,..., L  1, L} concludes the proof.


Box 1.14:
Fourier analysis as an underdetermined linear model
“The observation space Y ”
ª y1 º ª x(O1 ) º
« y » « x (O ) »
« 2 » « 2 »
« # » := « # » = [ x(Oi ) ] i  {1,.., I }, O  [ 0, 2S ] (1.110)
« » « »
« yn 1 » « x(On 1 ) »
«¬ yn »¼ «¬ x(On ) »¼

dim Y = n  I
“equidistant lattice on S1 ”
2S
Oi = (i  1) i  {1,..., I } (1.111)
I
Example ( I = 2) : O1 = 0, O2 = S  180°
Example ( I = 3) : O1 = 0, O2 = 2S
3
 120°, O3 = 4S
3
 240°
Example ( I = 4) : O1 = 0, O2 = 2S
4
 90°, O3 = S  180°, O4 = 3S
2
 270°
Example ( I = 5) : O1 = 0, O2 = 2S
5
 72°, O3 = 4S
5
 144°,
O4 = 6S
5
 216°, O5 = 8S
5
 288°
1-3 Case study 47
“The parameter space X ”
x1 = x0 , x2 = x1 , x3 = x+1 , x4 = x2 , x5 = x+2 ,..., xm 1 = x L , xm = xL (1.112)
dim X = m  2 L + 1
“The underdetermined linear model”
n < m : I < 2L + 1
ª y1 º ª1 sin O1 cos O1 ... sin LO1 cos LO1 º ª x1 º
« y » «1 sin O cos O2 ... sin LO2 cos LO2 »» «« x2 »»
« 2 » « 2

y := « ... » = « » « ... » . (1.113)


« » « »« »
« yn 1 » «1 sin On 1 cos On 1 ... sin LOn 1 cos LOn 1 » « xm 1 »
«¬ yn »¼ «¬1 sin On cos On ... sin LOn cos LOn »¼ «¬ xm »¼

? How can we setup a linear model for Fourier analysis ?


The linear model of Fourier analysis which relates the elements x  X of the
parameter space X to the elements y  Y of the observation space Y is setup
in Box 1.14. Here we shall assume that the observed data have been made avail-
able on an equidistant angular grid, in short “equidistant lattice” of the unit circle
S1 parameterized by ( O1 ,..., On ) . For the optimal design of the Fourier linear
model it has been proven that the equidistant lattice
2S
Oi = (i  1) i  {1,..., I }
I
is “D-optimal”. Box 1.14 contains three examples for such a lattice. In summary,
the finite dimensional observation space Y , dim Y = n , n = I , has integer di-
mension I .
I =2
0° 180° 360°

level L = 0 I =3
level L = 1 0° 120° 240° 360°
level L = 2

level L = 3 I =4
0° 90° 180° 270° 360°

I =5
0° 72° 144° 216° 288° 360°

Figure 1.8: Fourier series, Pascal Figure 1.9: Equidistant lattice on S1


triangular graph, weights I = 2 or 3 or 4 or 5
of the graph: unknown
coefficients of Fourier
series
48 1 The first problem of algebraic regression

In contrast, the parameter space X , dim X = f , is infinite dimensional. The


unknown Fourier coefficients, conventionally collected in a Pascal triangular
graph of Figure 1.8, are vectorized by (1.112) in a peculiar order.
X = span{x0 , x1 , x+1 ,..., x L , x+ L }
L of

dim X = m = f .

Indeed, the linear model (1.113) contains m = 2 L + 1 , L o f , m o f , un-


knowns, a hyperreal number. The linear operator A : X o Y is generated by the
base functions of lattice points.
L
yi = y (Oi ) = lim ¦ e (O ) x
A i A i {1,..., n}
L of
A = L

is a representation of the linear observational equations (1.113) in Ricci calculus


which is characteristic for Fourier analysis.
number of observed data number of unknown
versus
at lattice points Fourier coefficients
n=I m = 2L + 1 o f
(finite) (infinite)

Such a portray of Fourier analysis summarizes its peculiarities effectively. A


finite number of observations is confronted with an infinite number of observa-
tions. Such a linear model of type “underdetermined of power 2” cannot be
solved in finite computer time. Instead one has to truncate the Fourier series, a
technique or approximation to make up Fourier series “finite” or “bandlimited”.
We have to consider three cases.
n>m n=m n<m
overdetermined case regular case underdetermined case
First, we can truncate the infinite Fourier series such that n > m holds. In this
case of an overdetermined problem , we have more observations than equations.
Second, we alternatively balance the number of unknown Fourier coefficients
such that n = m holds. Such a model choice assures a regular linear system.
Both linear Fourier models which are tuned to the number of observations suffer
from a typical uncertainty. What is the effect of the forgotten unknown Fourier
coefficients m > n ? Indeed a significance test has to decide upon any truncation
to be admissible. We are in need of an
objective criterion
to decide upon the degree m of bandlimit. Third, in order to be as objective as
possible we follow the third case of “less observations than unknowns” such that
1-3 Case study 49

n < m holds. Such a Fourier linear model which generates an underdetermined


system of linear equations will consequently be considered.
The first example (Box 1.15: n  m = 1 ) and the second example (Box 1.16:
n  m = 2 ) demonstrate “MINOS” of the Fourier linear model.
Box 1.15:
The first example:
Fourier analysis as an underdetermined linear model:
n  rk A = n  m = 1, L = 1
“ dim Y = n = 2, dim X = m = 3 ”
ªx º
ª y1 º ª1 sin O1 cos O1 º « 1 »
« y » = «1 sin O x  y = Ax
cos O2 »¼ « 2 »
¬ 2¼ ¬ 2
«¬ x3 »¼
Example ( I = 2) : O1 = 0°, O2 = 180°
sin O1 = 0, cos O1 = 1,sin O2 = 0, cos O2 = 1
ª1 0 1 º ª1 sin O1 cos O1 º
A := « »=«  \ 2× 3
¬1 0 1¼ ¬1 sin O2 cos O2 »¼
AA c = 2I 2 œ ( AA c) 1 = 12 I 2

ª 2 1 + sin O1 sin O2 + cos O1 cos O2 º


AA c = « »
¬1 + sin O2 sin O1 + cos O2 cos O1 2 ¼
2S
if Oi = (i  O ) , then
I
1 + 2sin O1 sin O2 + 2 cos O1 cos O2 = 0
or
+L
L = 1: ¦ e (O A i1 )eA (Oi ) = 0 i1 z i2
2
A = L

+L
L = 1: ¦ e (O A i1 )eA (Oi ) = L + 1 i1 = i2
2
A = L

ª x1 º ª1 1 º ª y1 + y2 º
1« 1«
x A = « x2 » = A c( AA c) y = « 0 0 » y = « 0 »»
« » 1 »
2 2
«¬ x3 »¼ A «¬1 1»¼ «¬ y1  y2 »¼

|| x A ||2 = 12 y cy .
50 1 The first problem of algebraic regression

Box 1.16:
The second example:
Fourier analysis as an underdetermined linear model:
n  rk A = n  m = 2, L = 2

“ dim Y = n = 3, dim X = m = 5 ”

ª x1 º
ª y1 º ª1 sin O1 cos O1 sin 2O1 cos 2O1 º «« x2 »»
« y » = «1 sin O cos O2 sin 2O2 cos 2O2 »» « x3 »
« 2» « 2
« »
«¬ y3 »¼ «¬1 sin O3 cos O3 sin 2O3 cos 2O3 »¼ « x4 »
«¬ x5 »¼

Example ( I = 3) : O1 = 0°, O2 = 120° , O3 = 240°

sin O1 = 0,sin O2 = 1
2
3,sin O3 =  12 3
cos O1 = 1, cos O2 =  12 , cos O3 =  12
sin 2O1 = 0,sin 2O2 =  12 3,sin 2O3 = 1
2
3
cos 2O1 = 1, cos 2O2 =  12 , cos 2O3 =  12

ª1 0 1 0 1º
« »
A := «1 2 3  12
1
 1
2
3  12 »
« 1 1 1 »
¬1  2 3  2 2
3  12 ¼

AA c = 3I 3 œ ( AAc) 1 = 13 I 3

AA c =
ª 3
1 + sin O1 sin O2 + cos O1 cos O2 + 1 + sin O1 sin O3 + cos O1 cos O3 + º
« + sin 2O1 sin 2O2 + cos 2O1 cos 2O 2 + sin 2O1 sin 2O3 + cos 2O1 cos 2O3 »
« »
1 + sin O2 sin O3 + cos O2 cos O3 +
«1 + sin O
2
sin O1 + cos O2 cos O1 +
3 »
« + sin 2O
2
sin 2O1 + cos 2O2 cos 2O1 + sin 2O2 sin 2O3 + cos 2O 2 cos 2O3 »
«1 + sin O sin O1 + cos O3 cos O1 + 1 + sin O3 sin O2 + cos O3 cos O 2 + »
« + sin 2O 3
3 »
¬ 3
sin 2O1 + cos 2O3 cos 2O1 + sin 2O3 sin 2O2 + cos 2O3 cos 2O2 ¼

2S
if Oi = (i  1) , then
I
1 + sin O1 sin O2 + cos O1 cos O2 + sin 2O1 sin 2O2 + cos 2O1 cos 2O2 =
= 1  12  12 = 0
1-3 Case study 51

1 + sin O1 sin O3 + cos O1 cos O3 + sin 2O1 sin 2O3 + cos 2O1 cos 2O3 =
= 1  12  12 = 0

1 + sin O2 sin O3 + cos O2 cos O3 + sin 2O2 sin 2O3 + cos 2O2 cos 2O3 =
= 1  34  14  14 + 14 = 0
+L
L = 2: ¦ e (O A i1 )eA (Oi ) = 0 i1 z i2
2
A = L
+L
L = 2: ¦ e (O A i1 )eA (Oi ) = L + 1 i1 = i2
2
A = L

ª1 1 1 º
« 1 1
»
«0 2 3  2 3 » ª y1 º
1
x A = Ac( AAc) 1 y = «1  12  12 » «« y2 »»
3 « »
«0  12 3 12 3 » «¬ y3 »¼
« »
«¬1  12  12 »¼

ª x1 º ª y1 + y2 + y3 º
«x » « 1 1
»
« 2» « 2 3 y2  2 3 y3 »
1 1
x A = « x3 » = « y1  12 y2  12 y3 » , || x ||2 = y cy .
« » 3« » 3
« x4 » «  12 3 y2 + 12 3 y3 »
«¬ x5 »¼ « »
A «¬ y1  12 y2  12 y3 »¼

Lemma 1.10 (Fourier analysis):


If finite Fourier series

ª x1 º
« x2 »
« x3 »
yi = y (Oi ) = [1,sin Oi , cos Oi ,..., cos( L  1)Oi ,sin LOi , cos LOi ] « # » (1.114)
« xm  2 »
«x »
« xm 1 »
¬ m ¼
or
y = Ax, A  \ n× m , rk A = n, I = n < m = 2 L + 1

A O ( n ) := {A  \ n × m | AA c = ( L + 1)I n } (1.115)

are sampled at observations points Oi  S1 on an equidistance


lattice (equiangular lattice)
52 1 The first problem of algebraic regression

2S
Oi = (i  1) i, i1 , i2  {1,..., I } , (1.116)
I
then discrete orthogonality
+L
ª 0 i1 z i2
AA c = ( L + 1)I n  ¦ e (O )eA (Oi ) = « (1.117)
¬ L + 1 i1 = i2
A i1 2
A=L

applies. A is an element of the orthogonal group O(n) . MINOS


of the underdetermined system of linear equations (1.95) is
1 2 1
xm = A cy, xm = y cy. (1.118)
L +1 L +1
1-32 Fourier-Legendre series
? What are Fourier-Legendre series ?
Fourier-Legendre series (1.119) represent the periodic behavior of a function
x(O , I ) on a sphere S 2 . They are called spherical harmonic functions since
{1, P11 (sin I ) sin O , P10 (sin I ), P11 (sin I ) cos I ,..., Pkk (sin I ) cos k O} represent such a
periodic signal. Indeed they are a pelicular combination of Fourier’s trigonomet-
ric polynomials {sin AO , cos AO} and Ferrer’s associated Legendre polynomials
Pk A (sin I ) . Here we have chosen the parameters “longitude O and latitude I ”
to locate a point on S 2 . Instead we could exchange the parameter O by time t ,
if clock readings would submit longitude, a conventional technique in classical
navigation. In such a setting,
2S
O = Zt = t = 2SQ t ,
T
t
AO = AZt = 2S A = 2S AQ t ,
T
longitude O would be exchanged by 2S the product of ground period T and
time t or by 2S the product of ground frequency Q . In contrast, AO for all
A  { k ,  k + 1,..., 1, 0,1,..., k  1, k} would be substituted by 2S the product of
overtones A / T or AQ and time t . According to classical navigation, Z would
represent the rotational speed of the Earth. Notice that both k , A are integer,
k, A  Z .
Box 1.17:
Fourier-Legendre series
x (O , I ) = (1.119)
P00 (sin I ) x1 + P11 (sin I ) sin O x2 + P10 (sin I ) x3 + P1+1 (sin I ) cos O x4 +
+ P2 2 (sin I ) sin 2O x5 + P21 (sin I ) sin O x6 + P20 (sin I ) x7 + P21 (sin I ) cos O x8 +
1-3 Case study 53

+ P22 (sin I ) cos 2O x9 + O3 ( Pk A (sin I )(cos AO ,sin AO ))


K +k
x(O , I ) = lim ¦ ¦ e k A (O , I ) xk A (1.120)
K of
k = 0 A = k

­ Pk A (sin I ) cos AO A > 0


°
ek A (O , I ) := ® Pk 0 (sin I ) A = 0 (1.121)
° P (sin I ) sin | A | O A < 0
¯ kA
K k
x(O , I ) = lim ¦ ¦ Pk A (sin I )(ck A cos AO + sk A sin AO ) (1.122)
K of
k = 0 A = k

“Legendre polynomials of the first kind”


recurrence relation
k Pk (t ) = (2k  1) t Pk 1 (t )  (k  1) Pk  2 (t ) º
initial data : P0 (t ) = 1, P1 (t ) = t »Ÿ
¼
Example: 2 P2 (t ) = 3tP1 (t )  P0 (t ) = 3t 2  1 Ÿ

P2 (t ) = 32 t 2  12

“if t = sin I , then P2 (sin I ) = 32 sin 2 I  12 ”

“Ferrer’s associates Legendre polynomials of the first kind”


d A Pk (t )
Pk A (t ) := (1  t 2 )
l
2
(1.123)
dt A
d
Example: P11 (t ) = 1  t 2 P1 (t )
dt

P11 (t ) = 1  t 2

“if t = sin I , then P11 (sin I ) = cos I ”


d
Example: P21 (t ) = 1  t 2 P2 (t )
dt
d 3 2 1
P21 (t ) = 1  t 2 ( t  2)
dt 2

P21 (t ) = 3t 1  t 2
54 1 The first problem of algebraic regression

“if t = sin I , then P21 (sin I ) = 3sin I cos I ”

d2
Example: P22 (t ) = (1  t 2 ) P2 (t )
dt 2

P22 (t ) = 3(1  t 2 )

“if t = sin I , then P22 (sin I ) = 3cos 2 I ”

Example (approximation of order three):


x š (O , I ) = e00 x1 + e11 x2 + e10 x3 + e11 x4 + (1.124)

+e 2 2 x5 + e 21 x6 + e 20 x7 + e 21 x8 + e 22 x9 + O3

recurrence relations
vertical recurrence relation

Pk A (sin I ) = sin I Pk 1, A (sin I )  cos I ¬ª Pk 1, A +1 (sin I )  Pk 1, A 1 (sin I ) ¼º

initial data: P0 0 (sin I ) = 1,

Pk A = Pk A A < 0

k  1, A  1 k  1, A k  1, A + 1

k,A

Fourier-Legendre series (1.119) can be understood as an infinite-dimensional


vector space (linear space, Hilbert space) since the base functions (1.120)
e k A (O , I ) generate a complete orthogonal (orthonormal) system based on surface
spherical functions. The countable base, namely the base functions e k A (O , I ) or
{1, cos I sin O ,sin I , cos I cos O ,..., Pk A (sin I ) sin AO}

span the Fourier-Legendre space L2 {[0, 2S [ × ]  S / 2, +S / 2[} . According to


our order xˆ(O , I ) is an approximation of the function x(O , I ) up to order
Pk A (sin I ) {cos AO ,sin A O } for all A > 0, A = 0 and A < 0, respectively. Let us
refer to Box 1.17 as a summary of the Fourier-Legendre representation of a
function x(O , I ), O  [0, 2S [, I ]  S/2, +S/2[.
1-3 Case study 55

Box 1.18:
The Fourier-Legendre space
“The base functions e k A (O , I ) , k {1,..., K } , A  { K ,  K + 1,...,
1, 0,1,..., K  1, K } span the Fourier-Legendre space

L2 {[0, 2S ] ×]  S 2, + S 2[} :
they generate a complete orthogonal (orthonormal) system of sur-
face spherical functions.”
“inner product”
: x  FOURIER  LEGENDRE and y  FOURIER  LEGENDRE :
1
< x | y >= dS x(O , I )T y (O , I ) =

(1.125)

2S +S 2
1
= ³ dO ³ dI cosI x(O , I )y (O , I )
4S 0 S 2

“normalization”
2S +S 2
1
<: e k A (O , I ) | e k A > (O , I ) > = ³ dO ³ dI cos I e k A (O , I )e k A (O , I ) (1.126)
1 1 2 2
4S 0 S 2
1 1 2 2

k1 , k2 {0,..., K }
= Ok A G k k G A A 
1 1 1 2 1 2
A 1 , A 2 { k ,..., + k}
1 (k1  A1 )!
Ok A = (1.127)
1 1
2k1 + 1 (k1 + A1 )!
“norms, convergence”
2S +S 2
1
|| x ||2 = O dI cos I x 2 (O , I ) =
4S ³0 ³
d (1.128)
S 2

K +k
= lim ¦ ¦ Ok A x k2A < f
K of
k = 0 A = k

lim || x  x šK ||2 = 0 (convergence in the mean) (1.129)


K of

“synthesis versus analysis”


K +k
1
(1.130) x = lim ¦ ¦ e k A xk A versus xk A = < e k A | x >:= (1.131)
K of
k = 0 A = k O
2S +S 2
1
:= ³ dO ³ dI cos I e k A (O , I )x(O , I )
4SOk A 0 S 2
56 1 The first problem of algebraic regression
K +k
1
x = lim ¦ ¦ ekA < x | ekA > (1.132)
k = 0 A = k Ok A
K of

“canonical basis of the Hilbert space FOURIER-LEGENDRE”

ª 2 cos AO A > 0
(k + A )! «
e (O , I ) := 2k + 1
*
kA Pk A (sin I ) « 1 A = 0 (1.133)
(k  A )! «
¬ 2 sin A O A < 0
(orthonormal basis)
1
(1.134) e*k A = ek A versus e k A = Ok A e*k A (1.135)
Ok A

1
(1.136) xk*A = Ok A xk A versus xk A = xk*A (1.137)
Ok A
K +k
x = lim ¦ ¦ e*k A < x | e*k A > (1.138)
K of
k = 0 A = k

“orthonormality”
< e*k A (O , I ) | e*k A (O , I ) > = G k k G A A
1 2 1 2
(1.139)

Fourier-Legendre space

K of

FOURIER  LEGENDRE = span {e K ,  L , e K ,  L +1 ,..., e K , 1 , e K ,0 , e K ,1 ,..., e K , L 1 , e K , L }

dim FOURIER  LEGENDRE = lim ( K + 1) 2 = f


K of

“ FOURIER  LEGENDRE = HARM L ( S ) ”. 2 2

An infinite-dimensional vector space (linear space) is similar to a finite-


dimensional vector space: As in an Euclidean space an inner product and a norm
is defined. While the inner product and the norm in a finite-dimensional vector
space required summation of their components, the inner product (1.125), (1.126)
and the norm (1.128) in an infinite-dimensional vector space force us to integra-
tion. Indeed the inner products (scalar products) (1.125) (1.126) are integrals
over the surface element of S 2r applied to the vectors x(O , I ), y (O , I ) or e k A , 1 1
e k A respectively.
2 2

Those integrals are divided by the size of the surface element 4S of S 2r . Alter-
native representations of < x, y > and <e k A , e k A > (Dirac’s notation of a bracket
1 1 2 2
1-3 Case study 57

decomposed into “bra” and “txt” ) based upon dS = rd O dI cos I , S = 4S r 2 , lead


us directly to the integration over S 2r , the unit sphere.
Next we adopt the definitions of Fourier-Legendre analysis as well as Fourier-
Legendre synthesis following (1.125) - (1.139). Here we concentrate on the key
problem:
?What is a harmonic function which has the sphere S 2 as a support?
A harmonic function “on the unit sphere S 2 ” is a function x(O, I), (O, I) 
[0, 2S[ × ]  S / 2, +S / 2[ which fulfils
(i) the two-dimensional Laplace equation (the differential equation of
a two-dimensional harmonic osculator) and
(ii) a special Sturm-Liouville boundary condition
d2
(1st) ' k A x(O , I ) = 0 œ ( + Z ) x(O ) = 0
dO2
plus the harmonicity condition for ' k
ª x(0) = x(2S )
(2nd) « d d
«¬[ d O x(O )](0) = [ d O x(O )](2S ).

The special Strum-Liouville equation force the frequency to be integer!

Box 1.19:
Fourier-Legendre analysis as an underdetermined linear model
- the observation space Y -
“equidistant lattice on S 2 ”
(equiangular)
S S
O  [0, 2S [, I  ]  , + [
2 2
ª 2S
O = (i  1) i {1,..., I }
I = 2J : « i I
« Ij j {1,..., I }
¬
ª S S J
« Ik = J + (k  1) J k  {1,..., }
2
J even: «
«Ik =  S  (k  1) S k  { J + 2 ,..., J }
«¬ J J 2
ª S J +1
« Ik = (k  1) J + 1 k  {1,..., 2 }
J odd: «
«Ik = (k  1) S J +3
k  { ,..., J }
¬« J +1 2
58 1 The first problem of algebraic regression

2S
longitudinal interval: 'O := Oi +1  Oi =
I
ª S
« J even : 'I := I j +1  I j = J
lateral interval: «
« J odd : 'I := I j +1  I j = S
«¬ J +1
“initiation: choose J , derive I = 2 J ”
ª 'I J
« Ik = 2 + (k  1)'I k  {1,..., }
2
J even: «
«Ik =  'I J + 2
 (k  1)'I k  { ,..., J }
¬ 2 2
ª J +1
« Ik = (k  1)'I k  {1,...,
2
}
J odd: «
«Ik = (k  1)'I k  { J + 3 ,..., J }
¬ 2
Oi = (i  1)'O i  {1,..., I } and I = 2 J
“multivariate setup of the observation space X ”
yij = x(Oi , I j )
“vectorizations of the matrix of observations”
Example ( J = 1, I = 2) :
Sample points Observation vector y
(O1 , I1 ) ª x(O1 , I1 ) º 2×1
(O2 , I1 ) « x (O , I ) » = y  \
¬ 2 1 ¼
Example ( J = 2, I = 4) :
Sample points Observation vector y
(O1 , I1 ) ª x(O1 , I1 ) º
(O2 , I1 ) « x (O , I ) »
« 2 1 »
(O3 , I1 ) « x(O3 , I1 ) »
(O4 , I1 ) « »
« x(O4 , I1 ) » = y  \8×1
(O1 , I2 ) « x(O1 , I2 ) »
(O2 , I2 ) « »
« x(O2 , I2 ) »
(O3 , I2 ) « x (O , I ) »
« 3 2 »
(O4 , I2 )
¬« x(O4 , I2 ) ¼»
Number of observations: n = IJ = 2 J 2
Example: J = 1 Ÿ n = 2, J = 3 Ÿ n = 18
J = 2 Ÿ n = 8, J = 4 Ÿ n = 32.
1-3 Case study 59

?How can we setup a linear model for Fourier-Legendre analysis?


The linear model of Fourier-Legendre analysis which relates the elements of the
parameter space X to the elements y  Y of the observations space Y is again
setup in Box 1.19. Here we shall assume that the observed data have been made
available on a special grid which extents to
O  [0, 2S[, I ]  S / 2, +S / 2[
ª 2S
O = (i  1) ,  i {1,..., I }
I = 2J : « i I
« Ij,  j {1,..., I }!
¬
longitudinal interval:
2S
'O =: O i +1  O i =
I
lateral interval:
S
J even: 'I =: I j + i  I j =
J
S
J odd: 'I =: I j +1  I j = .
J +1
In addition, we shall review the data sets fix J even as well as for J odd.
Examples are given for (i) J = 1, I = 2 and (ii) J = 2, I = 4 . The number of
observations which correspond to these data sets have been (i) n = 18 and
(ii) n = 32 .
For the optimal design of the Fourier-Legendre linear model it has been shown
that the equidistant lattice
2S ª J even:
O i = (i  1) , I j = «
I ¬ J odd:
ª S S ­ J½
«Ik = J + (k  1) J , k  ®1,..., 2 ¾
¯ ¿
J even: «
« S S ­J +2 ½
«Ik =   (k  1) , k  ® ,..., J ¾
¬ J J ¯ 2 ¿
ª S ­ J + 1½
«Ik = (k  1) J + 1 , k  ®1,..., 2 ¾
¯ ¿
J odd: «
« S ­J +3 ½
«Ik = (k  1) , k  ® ,..., J ¾
¬ J + 1 ¯ 2 ¿
is “D-optimal”.
Table 1.1 as well Table 1.2 are samples of an equidistant lattice on S 2 espe-
cially in a lateral and a longitudinal lattice.
60 1 The first problem of algebraic regression

Table 1.1:
Equidistant lattice on S 2
- the lateral lattice -
lateral grid
J 'I
1 2 3 4 5 6 7 8 9 10
1 - 0°
2 90° +45° -45°
3 45° 0° +45° -45°
4 45° +22,5° +67.5° -22.5° -67.5°
5 30° 0° +30° +60° -30° -60°
6 30° 15° +45° +75° -15° -45° -75°
7 22.5° 0° +22.5° +45° +67.5° -22.5° -45° -67.5°
8 22.5° +11.25° +33.75° +56.25° +78.75° -11.25° -33.75° -56.25° -78.75°
9 18° 0° +18° +36° +54° +72° -18° -36° -54° -72°
10 18° +90° +27° +45° +63° +81° -9° -27° -45° -63° -81°

Table 1.2:
Equidistant lattice on S 2
- the longitudinal lattice -
Longitudinal grid
J I = 2 J 'O
1 2 3 4 5 6 7 8 9 10
1 2 180° 0° 180°
2 4 90° 0° 90° 180° 270°
3 6 60° 0° 60° 120° 180° 240° 300°
4 8 45° 0° 45° 90° 135° 180° 225° 270° 315°
5 10 36° 0° 36° 72° 108° 144° 180° 216° 252° 288° 324°

In summary, the finite-dimensional observation space Y, dimY = IJ , I = 2J has


integer dimension.
As samples, we have computed via Figure 1.10 various horizontal and vertical
sections of spherical lattices for instants (i) J = 1, I = 2, (ii) J = 2, I = 4,
(iii) J = 3, I = 6, (iv) J = 4, I = 8 and (v) J = 5, I = 10 . By means of Figure
1.11 we have added the corresponding Platt-Carré Maps.
Figure 1.10: Spherical lattice
left: vertical section, trace of parallel circles
right: horizontal section, trace of meridians
vertical section horizontal section
meridian section perspective of parallel circles
1-3 Case study 61

J = 1, I = 2 J = 1, I = 2

J = 2, I = 4 J = 2, I = 4

J = 3, I = 6 J = 3, I = 6

J = 4, I = 8 J = 4, I = 8

J = 5, I = 10 J = 5, I = 10

S S
+ +
2 2
Figure 1.11 a:
Platt-Carré Map of S 2
I 0° S 2S I longitude-latitude lat-
tice, case:
J = 1, I = 2, n = 2
S S
 
2 2
O
62 1 The first problem of algebraic regression
S S
+ +
2 2

Figure 1.11 b:
I I Platt-Carré Map of S 2
0° S 2S
longitude-latitude lat-
tice, case:
S S J = 2, I = 4, n = 8
 
2 2
S
O S
+ +
2 2

Figure 1.11 c:
I I Platt-Carré Map of S 2
0° S 2S
longitude-latitude lat-
tice, case:
S S J = 3, I = 6, n = 18
 
2 2
S
O S
+ +
2 2

Figure 1.11 d:
I I Platt-Carré Map of S 2
0° S 2S
longitude-latitude lat-
tice, case:
S S J = 4, I = 8, n = 32
 
2 2

S
O S
+ +
2 2

Figure 1.11 e:
Platt-Carré Map of S 2
I 0° S 2S I longitude-latitude lat-
tice, case:
S S J = 5, I = 10, n = 50.
 
2 2
O
In contrast, the parameter space X, dimX = f is infinite-dimensional. The
unknown Fourier-Legendre coefficients, collected in a Pascal triangular graph
of Figure 1.10 are vectorized by
X = span{x00 , x11 , x10 , x11 ,..., xk A }
k of
k =0ok
| A |= 0 o k
dim X = m = f .
1-3 Case study 63

Indeed the linear model (1.138) contains m = IJ = 2 J 2 , m o f, unknows, a


hyperreal number. The linear operator A : X o Y is generated by the base func-
tions of lattice points
K +k
jij = y ( xij ) = lim ¦ ¦ e k A (Oi , I j )x k A
K of
k = 0 A = k

 i, j  {1,..., n}
is a representation of the linear observational equations (1.138) in Ricci calculus
which is characteristic for Fourier-Legendre analysis.
number of observed data number of unknown
at lattice points Fourier-Legendre coef-
ficients
K +k
n = IJ = 2 J 2 versus m = lim ¦ ¦ e k A
K of
k = 0 A = k

(finite) (infinite)
Such a portray of Fourier-Legendre analysis effectivly summarizes its peculiar-
rities. A finite number of observations is confronted with an infinite number of
observations. Such a linear model of type
“underdetermined of power 2”
cannot be solved in finite computer time. Instead one has to truncate the Fourier-
Legendre series, leaving the serier “bandlimited”. We consider three cases.
n>m n=m n<m
overdetermined case regular cases underdetermined case
First, we have to truncate the infinite Fourier-Legendre series that n > m hold.
In this case of an overdetermined problem, we have more observations than
equations. Second, we alternativly balance the number of unknown Fourier-
Legendre coefficients such that n = m holds. Such a model choice assures a
regular linear system. Both linear Fourier-Legendre models which are tuned to
the number of observations suffer from a typical uncertainty. What is the effect
of the forgotten unknown Fourier-Legendre coefficients m > n ? Indeed a sig-
nificance test has to decide upon any truncation to be admissible. We need an
objective criterion
to decide upon the degree m of bandlimit. Third, in order to be as obiective as
possible we again follow the third case of “less observations than unknows” such
that n < m holds. Such a Fourier-Legendre linear model generating an underde-
termined system of linear equations will consequently be considered.
64 1 The first problem of algebraic regression

A first example presented in Box 1.20 demonstrates “MINOS” of the Fourier-


Legendre linear model for n = IJ = 2 J 2 = 2 and k = 1, m = (k + 1) 2 = 4 as un-
knowns and observations. Solving the system of linear equations Z and
four unknows [x1 , x2 , x3 , x4 ](MINOS) =
ª1 1 º ª y1 + y2 º
«0 0 » « »
1
= « » = 1 « 0 ».
2 «0 0 » 2 « 0 »
« » « »
¬1 1¼ ¬ y1  y2 ¼
The second example presented in Box 1.21 refers to “MINOS” of the Fourierr-
Legendre linear model for n = 8 and m = 9 . We have computed the
design matrix A .

Box 1.20
The first example:
Fourier-Legendre analysis as an underdetermined linear model:
m  rk A = m  n = 2

dim Y = n = 2 versus dim X = m = 4

J = 1, I = 2 J = 2 Ÿ n = IJ = 2 J 2 = 2 versus K = 1 Ÿ m = ( k + 1) 2 = 4

ª x1 º
« »
ª y1 º 1 P11 ( sin I1 ) sin O1 P10 ( sin I1 ) P11 ( sin I1 ) cos O1 « x2 »
ª º
= «
« y » 1 P sin I sin O P sin I P sin I cos O « x » »
¬ 2 ¼ «¬ 11 ( 2) 2 10 ( 2 ) 11 ( 2) 2»
¼« 3»
¬ x4 ¼
subject to

( O1 , I1 ) = (0D , 0D ) and ( O2 , I2 ) = (180D , 0D )


{y = Ax A  \ 2×4 , rkA = n = 2, m = 4, n = m = 2}
ª1 0 0 1 º ª1 P11 ( sin I1 ) sin O1 P10 ( sin I1 ) P11 ( sin I1 ) cos O1 º
A := « »=« »
¬1 0 0 1¼ «¬1 P11 ( sin I2 ) sin O2 P10 ( sin I2 ) P11 ( sin I2 ) cos O2 »¼

P11 ( sin I ) = cos I , P10 ( sin I ) = sin I


P11 ( sin I1 ) = P11 (sin I2 ) = 1 , P10 ( sin I1 ) = P10 (sin I2 ) = 0
sin O1 = sin O2 = 0, cos O1 = 1, cos O2 = 1
1-3 Case study 65

AAc =
ª 1 + P11 ( sin I1 ) P11 (sin I2 ) sin O1 sin O2 + º
« »
« 1 + P11 ( sin I1 ) + P10 (sin I1 ) + P10 (sin I1 ) P10 (sin I2 ) +
2 2
»
« »
« + P11 ( sin I1 ) P11 (sin I2 ) cos O1 cos O2 »
«1 + P ( sin I ) P (sin I ) sin O sin O + »
11 2 11 1 2 1
« »
« + P10 ( sin I2 ) P10 (sin I2 ) + 1 + P112 ( sin I2 ) + P102 (sin I2 ) »
« »
¬« + P11 ( sin I2 ) P11 (sin I1 ) cos O2 cos O1 ¼»
1
AAc = 2I 2 œ (AAc)-1 = I 2
2

K =1 + k1 , + k 2

¦ ¦ e k A (Oi , Ii ) e k A (Oi , Ii ) = 0, i1 z i2


1 1 1 1 2 2 2 2
k1 , k 2 = 0 A1 =-k1 , A 2 =  k 2

K =1 + k1 , + k 2

¦ ¦ e k A (Oi , Ii ) e k A (Oi , Ii ) = 2, i1 = i2


1 1 1 1 2 2 2 2
k1 , k 2 = 0 A1 =-k1 , A 2 =  k 2

ª x1 º ª c00 º
«x » «s »
« 2» 1
xA = ( MINOS ) = « 11 » ( MINOS ) = A cy =
« x3 » « c10 » 2
« » « »
¬ x4 ¼ ¬ c11 ¼
ª1 1 º ª y1 + y2 º
«0 0 » y « »
1
= « » ª 1º = 1 « 0 ».
« »
2 « 0 0 » ¬ y2 ¼ 2 « 0 »
« » « »
¬1 1¼ ¬ y1  y2 ¼

Box 1.21
The second example:
Fourier-Legendre analysis as an underdetermined linear model:
m  rk A = m  n = 1
dim Y = n = 8 versus dim X 1 = m = 9
J = 2, I = 2 J = 4 Ÿ n = IJ = 2 J 2 = 8
versus
k = 2 Ÿ m = (k + 1) 2 = 9
66 1 The first problem of algebraic regression

ª y1 º ª1 P11 ( sin I1 ) sin O1 P10 ( sin I1 ) P11 ( sin I1 ) cos O1


« ... » = «" … … …
« » «
¬ y2 ¼ «¬ 1 P11 ( sin I8 ) sin O8 P10 ( sin I8 ) P11 ( sin I8 ) cos O8
P22 ( sin I1 ) sin 2O1 P21 ( sin I1 ) sin O1 P20 ( sin I1 )
… … …
P22 ( sin I8 ) sin 2O8 P21 ( sin I8 ) sin O8 P20 ( sin I8 )

P21 ( sin I1 ) cos O1 P22 ( sin I1 ) cos 2O1 º ª x1 º


»« »
… … » «... »
P21 ( sin I8 ) cos O8 P22 ( sin I8 ) cos 2O8 »¼ «¬ x9 »¼

“equidistant lattice,
longitudinal width 'O , lateral width 'I ”
'O = 90D , 'I = 90D

(O1 , I1 ) = (0D , +45D ), (O2 , I2 ) = (90D , +45D ), (O3 , I3 ) = (180D , +45D ),


(O4 , I4 ) = (270D , +45D ), (O5 , I5 ) = (0D , 45D ), (O6 , I6 ) = (90D , 45D ),
(O7 , I7 ) = (180D , 45D ), (O8 , I8 ) = (270D , 45D )

P11 (sin I ) = cos I , P10 (sin I ) = sin I

P11 (sin I1 ) = P11 (sin I2 ) = P11 (sin I3 ) = P11 (sin I4 ) = cos 45D = 0,5 2
P11 (sin I5 ) = P11 (sin I6 ) = P11 (sin I7 ) = P11 (sin I8 ) = cos( 45D ) = 0,5 2
P10 (sin I1 ) = P10 (sin I2 ) = P10 (sin I3 ) = P10 (sin I4 ) = sin 45D = 0,5 2
P10 (sin I5 ) = P10 (sin I6 ) = P10 (sin I7 ) = P10 (sin I8 ) = sin( 45D ) = 0,5 2
P22 (sin I ) = 3cos 2 I , P21 (sin I ) = 3sin I cos I , P20 (sin I ) = (3 / 2) sin 2 I  (1/ 2)
P22 (sin I1 ) = P22 (sin I2 ) = P22 (sin I3 ) = P22 (sin I4 ) = 3cos 2 45D = 3 / 2
P22 (sin I5 ) = P22 (sin I6 ) = P22 (sin I7 ) = P22 (sin I8 ) = 3cos 2 ( 45D ) = 3 / 2
P21 (sin I1 ) = P21 (sin I2 ) = P21 (sin I3 ) = P21 (sin I4 ) = 3sin 45D cos 45D = 3 / 2
P21 (sin I5 ) = P21 (sin I6 ) = P21 (sin I7 ) = P21 (sin I8 ) = 3sin( 45D ) cos( 45D ) = 3 / 2
P20 (sin I1 ) = P20 (sin I2 ) = P20 (sin I3 ) = P20 (sin I4 ) = (3 / 2) sin 2 45D  (1/ 2) = 1/ 4
P20 (sin I5 ) = P20 (sin I6 ) = P20 (sin I7 ) = P20 (sin I8 ) = (3 / 2) sin 2 ( 45D )  (1/ 2) = 1/ 4
sin O1 = sin O3 = sin O5 = sin O7 = 0
sin O2 = sin O6 = +1, sin O4 = sin O8 = 1
cos O1 = cos O5 = +1, cos O2 = cos O4 = cos O6 = cos O8 = 0
1-3 Case study 67

cos O3 = cos O7 = 1
sin 2O1 = sin 2O2 = sin 2O3 = sin 2O4 = sin 2O5 = sin 2O6 = sin 2O7 = sin 2O8 = 0
cos 2O1 = cos 2O3 = cos 2O5 = cos 2O7 = +1
cos 2O2 = cos 2O4 = cos 2O6 = cos 2O8 = 1
A  \8×9
ª1 0 2/2 2/2 0 0 1/ 4 3 / 2 3 / 2 º
«1 2/2 2/2 0 0 3 / 2 1/ 4 0 3 / 2 »
« »
«1 0 2/2  2/2 0 0 1/ 4 3 / 2 3 / 2 »
«
A=« 1  2/2 2/2 0 0 3 / 2 1/ 4 0 3 / 2 » .
1 0  2/2 2/2 0 0 1/ 4 3 / 2 3 / 2 »
« »
«1 2/2  2/2 0 0 3 / 2 1/ 4 0 3 / 2 »
« 1 0  2/2  2/2 0 0 1/ 4 3 / 2 3 / 2 »
«¬1  2 / 2  2/2 0 0 3 / 2 1/ 4 0 3 / 2 »¼
rkA < min{n, m} < 8.
Here “the little example” ends, since the matrix A is a rank smaller than 8!
In practice, one is taking advantage of
• Gauss elimination
or
• weighting functions
in order to directly compute the Fourier-Legendre series. In order to understand
the technology of “weighting function” better, we begin with rewriting the
spherical harmonic basic equations. Let us denote the letters
+S / 2 2S
1
f k A := ³ dI cos I ³ d O Z (I )e k A (O , I ) f (O , I ) ,
4S S / 2 0

the spherical harmonic expansion f k A of a spherical function f (O , I ) weighted


by Z (I ) , a function of latitude. A band limited representation could be specified
by
+S / 2 2S K
1
f k A := I I O Z I ¦ e k A (O , I )e k A (O , I ) f kšA
4S S³/ 2 ³0
d cos d ( ) 1 1 1 1
k ,A 1 1

K S /2 2S
1
fkA = ¦ f kš,A ³ dI cos I ³ d O w(I )e k A (O , I )e k A ( O , I ) =
k1 , A1
1 1
4S S / 2 0
1 1

K
= ¦ f ¦g e š
k1 , A1 ij kA ( Oi , I j )e k A ( Oi , I j ) =
1 1
k1 , A1 i, j

K
= ¦ gij ¦ gij ekA (Oi ,I j )ek A (Oi ,I j ) = 1 1
i, j k1 ,A1
68 1 The first problem of algebraic regression

= ¦ gij f ( Oi , I j )e k A ( Oi , I j ) .
i, j

As a summary, we design the weighted representation of Fourier-Legendre syn-


thesis
J
f š = ¦ g j Pk*A (sin I j ) f Aš (I j )
kA
j =1
J
1st: Fourier f š (I j ) = ¦ g j eA (O ) f (Oi , I j )
A
i =1
J
2nd: Legendre f kš,A ( I , J ) = ¦ g j Pk*A (sin I j ) f Aš (I j )
j =1

lattice: (Oi , I j ) .

1-4 Special nonlinear models


As an example of a consistent system of linearized observational equations
Ax = y , rk A = rk( A, y ) where the matrix A  R n× m is the Jacobi matrix (Jacobi
map) of the nonlinear model, we present a planar triangle whose nodal points
have to be coordinated from three distance measurements and the minimum
norm solution of type I -MINOS.
1-41 Taylor polynomials, generalized Newton iteration
In addition we review the invariance properties of the observational equations
with respect to a particular transformation group which makes the a priori inde-
terminism of the consistent system of linearized observational equations plausi-
ble. The observation vector Y  Y { R n is an element of the column space
Y  R ( A) . The geometry of the planar triangle is illustrated in Figure 1.12.
The point of departure for the linearization process of nonlinear observational
equations is the nonlinear mapping X 6 F ( X) = Y . The B. Taylor expansion
Y = F( X) = F(x) + J (x)( X  x) + H( x)( X  x) … ( X  x) +
+ O [( X  x) … ( X  x) … ( X  x)],
which is truncated to the order O [( X  x) … ( X  x) … ( X  x)], J ( x), H( x) ,
respectively, represents the Jacobi matrix of the first partial derivatives,
while H , the Hesse matrix of second derivatives, respectively, of the vector-
valued function F ( X) with respect to the coordinates of the vector X , both
taken at the evaluation point x . A linearized nonlinear model is generated by
truncating the vector-valued function F(x) to the order O [( X  x) … ( X  x)] ,
namely
'y := F( X)  F(x) = J (x)( X  x) + O [( X  x) … ( X  x)].
A generalized Newton iteration process for solving the nonlinear observational
equations by solving a sequence of linear equations of (injectivity) defect by
means of the right inverse of type G x -MINOS is the following algorithm.
1-4 Special nonlinear models 69

Newton iteration
Level 0: x 0 = x 0 , 'y 0 = F( X)  F(x 0 )
'x1 = [ J (x 0 ) ]R 'y 0


Level 1: x1 = x 0 + 'x1 , 'y1 = F (x)  F (x1 )


'x 2 = [ J (x1 ) ]R 'y1


Level i: xi = xi 1 + 'xi , 'y i = F(x)  F(xi )


'xi +1 = [ J (xi ) ]R 'y i


Level n: 'x n +1 = 'x n


(reproducing point in the computer arithmetric)
I -MINOS, rk A = rk( A, y )

The planar triangle PD PE PJ is approximately an equilateral triangle pD pE pJ


whose nodal points are a priori coordinated by Table 1.3.
Table 1.3: Barycentric rectangular coordinates of the equilateral
triangle pD pE pJ of Figure 1.12

ª 1 ª 1
« xD =  2 « xE = 2 ª xJ = 0
pD = « , pE = « , pJ = «
«y =  1 «y =  1 «y = 1 3
«¬ D 3
«¬ E 3 «¬ J 3
6 6
Obviously the approximate coordinates of the three nodal points are barycentric,
namely characterized by Box 1.22: Their sum as well as their product sum van-
ish.

Figure 1.12: Barycentric rectangular coordinates of the nodal points,


namely of the equilateral triangle
70 1 The first problem of algebraic regression

Box 1.22: First and second moments of nodal points, approximate coordinates
x B + x C + x H = 0, yB + y C + y H = 0

J xy = xD yD + xE yE + xJ yJ = 0

J xx = ( yD2 + yE2 + yJ2 ) =  12 , J yy = ( xD2 + xE2 + xJ2 ) =  12 : J xx = J yy

ªI º ª xD + xE + xJ º ª0 º
[ Ii ] = « I x » = « »=« »
¬ y¼ ¬« yD + yE + yJ ¼» ¬0 ¼

for all i  {1, 2}


ª I xx I xy º ª ( yD + yE + yJ ) xD yD + yE xE + xJ yJ º
2 2 2

ª¬ I ij º¼ = « »=« »=
«¬ I xy I yy »¼ «¬ xD yD + yE xE + xJ yJ  ( xD + xE + xJ ) »¼
2 2 2

ª 1 0 º
=« 2 1»
=  12 I 2
¬ 0  2¼

for all i, j  {1, 2} .

Box 1.23: First and second moments of nodal points, inertia tensors
2
I1 = ¦ ei I i = e1 I1 + e 2 I 2
i =1
+f +f

for all i, j  {1, 2} : I i = ³ dx ³ dy U ( x, y ) xi


f f
2
I2 = ¦e i
… e j I ij = e1 … e1 I11 + e1 … e 2 I12 + e 2 … e1 I 21 + e 2 … e 2 I 22
i , j =1
+f +f

for all i, j  {1, 2} : I ij = ³ dx ³ dy U ( x, y )( xi x j  r 2G ij )


f f

subject to r 2 = x 2 + y 2

U ( x, y ) = G ( x, y, xD yD ) + G ( x, y, xE yE ) + G ( x, y , xJ yJ ) .

The product sum of the approximate coordinates of the nodal points constitute
the rectangular coordinates of the inertia tensor
2
I= ¦e i
… e j I ij
i , j =1
+f +f

I ij = ³ dx ³ dy U ( x, y)( x x i j  r 2G ij )
f f
1-4 Special nonlinear models 71

for all i, j  {1, 2} , r 2 = x 2 + y 2 ,


U ( x, y ) = G ( x, y, xD yD ) + G ( x, y, xE yE ) + G ( x, y , xJ yJ ) .

The mass density distribution U ( x, y ) directly generates the coordinates


I xy , I xx , I yy of the inertia tensor in Box 1.22. ( G (., .) denotes the Dirac general-
ized function.). The nonlinear observational equations of distance measurements
are generated by the Pythagoras representation presented in
Box 1.24: Nonlinear observational equations of distance
measurements in the plane, (i) geometric notation versus
(ii) algebraic notation
Y1 = F1 ( X) = SDE
2
= ( X E  X D ) 2 + (YE  YD ) 2
Y2 = F2 ( X) = S EJ2 = ( X J  X E ) 2 + (YJ  YE ) 2
Y3 = F3 ( X) = SJD2 = ( X D  X J ) 2 + (YD  YJ ) 2 .

sB. Taylor expansion of the nonlinear distance observational equationss


Y c := ª¬ SDE
2
, S EJ2 , SJD2 º¼ , Xc := ª¬ X D , YD , X E , YE , X J , YJ º¼

xc = ª¬ xD , yD , xE , yE , xJ , yJ º¼ = ª¬  12 ,  16 3, 12 ,  16 3, 0, 13 3 º¼

sJacobi maps

ª w F1 w F1 w F1 w F1 w F1 w F1 º
« »
« wX D wYD wX E wYE wX J wYJ »
«wF w F2 w F2 w F2 w F2 w F2 »
J (x) := « 2 » ( x) =
« wX D wYD wX E wYE wX J wYJ »
« »
« w F3 w F3 w F3 w F3 w F3 w F3 »
« wX D wYD wX E wYE wX J wYJ »¼
¬

ª 2( xE  xD ) 2( y E  yD ) 2( xE  xD ) 2( y E  yD ) 0 0 º
« »
« 0 0 2( xJ  xE ) 2( yJ  y E ) 2( xJ  xE ) 2( yJ  y E ) » =
« 2( xD  xJ ) 2( yD  yJ ) 0 0 2( xD  xJ ) 2( yD  yJ ) »¼
¬
ª 2 0 2 0 0 0º
« »
=«0 0 1  3 1 3»
« »
¬ 1  3 0 0 1 3¼

Let us analyze sobserved minus computed s


72 1 The first problem of algebraic regression

'y := F( X)  F(x) = J (x)( X  x) + O [ ( X  x) … ( X  x) ] =


= J'x + O [ ( X  x) … ( X  x) ] ,

here specialized to
Box 1.25: Linearized observational equations of distance measurements
in the plane, I -MINOS, rkA = dimY
sObserved minus computeds
'y := F( X)  F(x) = J (x)( X  x) + O [ ( X  x) … ( X  x) ] =
= J'x + O [ ( X  x) … ( X  x) ] ,
2 2 2
ª 'sDE º ª SDE  sDE º ª1.1  1 º ª 1 º
10
« 2 » « 2 » « » « 1»
« 'sEJ » = « S EJ  sEJ » = «0.9  1» = «  10 »
2

« 2 » « 2 2 » « » « 1»
«¬ 'sJD »¼ «¬ SJD  sJD »¼ ¬1.2  1 ¼ ¬ 5 ¼
ª 'xD º
« »
'yD »
0 º«
2
ª 'sDE º ª aDE bDE  aDE  bDE 0
« 2 » « » « 'xE »
« 'sEJ » = « 0 0 aEJ bEJ  aEJ  bEJ » « »
« 2 » « « 'yE »
a  bJD 0 0 aJD bJD »¼ « »
¬« 'sJD ¼» ¬ JD
« 'xJ »
« 'y »
¬ J¼
slinearized observational equationss

y = Ax, y  R 3 , x  R 6 , rkA = 3
ª 2 0 2 0 0 0º
« »
A=«0 0 1  3 1 3»
« »
¬ 1  3 0 0 1 3¼

ª 9 3 3 º
« »
« 3 3 5 3 »
1 «« 9 3 3 »
A c( AA c) 1 = »
36 « 3 5 3 3 »
« »
« 0 6 6 »
« 2 3 4 3 4 3 »¼
¬

sminimum norm solutions


1-4 Special nonlinear models 73

ª 'xD º ª  9 y1 + 3 y2  3 y3 º
« » « »
« 'yD » « 3 y1 + 3 y2  5 3 y3 »
« 'xE » 1 « 9 y1 + 3 y2  3 y3 »
xm = « »= « »
« 'yE » 36 « 3 y1  5 3 y2 + 3 y3 »
« » « »
« 'xJ » «  6 y2 + 6 y3 »
« 'y » « »
¬ J¼ ¬ 2 3 y1 + 4 3 y2 + 4 3 y3 ¼

xcm = 180
1 ª
¬ 9, 5 3, 0, 4 3, +9, 3 ¼
º

(x + 'x)c = ª¬ xD + 'xD , yD + 'yD , xE + 'xE , yE + 'yE , xJ + 'xJ , yJ + 'yJ º¼ =

= 180
1 ª
¬ 99, 35 3, +90, 26 3, +9, +61 3 ¼ .
º

The sum of the final coordinates is zero, but due to the non-symmetric displace-
ment field ['xD , 'yD , 'xE , 'yE , 'xJ , 'yJ ]c the coordinate J xy of the inertia
tensor does not vanish. These results are collected in Box 1.26.
Box 1.26: First and second moments of nodal points, final coordinates
yD + 'yD + yE + 'yE + yJ + 'yJ = yD + yE + yJ + 'yD + 'yE + 'yJ = 0

J xy = I xy + 'I xy =
= ( xD + 'xD )( yD + 'yD ) + ( xE + 'xE )( yE + 'yE ) + ( xJ + 'xJ )( yJ + 'yJ ) =
= xD yD + xE yE + xJ yJ + xD 'yD + yD 'xD + xE 'yE + yE 'xE + xJ 'yJ + yJ 'xJ +
+ O ('xD 'yD , 'xE 'yE , 'xJ 'yJ )
= 3 /15
J xx = I xx + 'I xx =
= ( yD + 'yD ) 2  ( yE + 'yE ) 2  ('yJ  yJ ) 2 =
= ( yD2 + yE2 + yJ2 )  2 yD 'yD  2 yE 'yE  2 yJ 'yJ  O ('yD2 , 'yE2 , 'yJ2 ) =
= 7 /12
J yy = I yy + 'I yy =
= ( xD + 'xD ) 2  ( xE + 'xE ) 2  ('xJ  xJ ) 2 =
= ( xD2 + xE2 + xJ2 )  2 xD 'xD  2 xE 'xE  2 xJ 'xJ  O ('xD2 , 'xE2 , 'xJ2 )
= 11/ 20 .
ƅ
74 1 The first problem of algebraic regression

1-42 Linearized models with datum defect


More insight into the structure of a consistent system of observational equations
with datum defect is gained in the case of a nonlinear model. Such a nonlinear
model may be written Y = F ( X) subject to Y  R n , X  R m , or
{Yi = Fi ( X j ) | i  {1, ..., n}, j  {1, ..., m}}.

A classification of such a nonlinear function can be based upon the "soft" Im-
plicit Function Theorem which is a substitute for the theory of algebraic partion-
ing, namely rank partitioning. (The “soft” Implicit Function Theorem is re-
viewed in Appendix C.) Let us compute the matrix of first derivatives
wFi
[ ]  R n× m ,
wX j

a rectangular matrix of dimension n × m. The set of n independent columns


builds up the Jacobi matrix
ª wF1 wF1 wF1 º
« wX "
wX 2 wX n »
« 1 »
« wF2 wF2 wF2 »
« " »
A := « wX 1 wX 2 wX n » , r = rk A = n,
«" »
« »
« wFn wFn
"
wFn »
« wX wX 2 wX n »¼
¬ 1
the rectangular matrix of first derivatives
A := [ A1 , A 2 ] = [J, K ]

subject to
A  R n× m , A1 = J  R n× n = R r × r , A 2 = K  R n× ( m  n ) = R n× ( m  r ) .

m-rk A is called the datum defect of the consistent system of nonlinear equations
Y = F ( X) which is a priori known. By means of such a rank partitioning we
have decomposed the vector of unknowns
Xc = [ X1c , Xc2 ]

into “bounded parameters” X1 and “free parameters” X 2 subject to


X1  R n = R r , and X 2  R m  n = R m  r .

Let us apply the “soft” Implicit Function Theorem to the nonlinear observational
equations of distance measurements in the plane which we already have intro-
1-4 Special nonlinear models 75

duced in the previous example. Box 1.27 outlines the nonlinear observational
equations for Y1 = SDE 2
, Y2 = S EJ2 , Y3 = SJD2 . The columns c1 , c 2 , c3 of the matrix
[wFi / wX j ] are linearly independent and accordingly build up the Jacobi matrix J
of full rank. Let us partition the unknown vector Xc = [ X1c , Xc2 ] , namely into the
"free parameters" [ X D , YD , YE ]c and the "bounded parameters" [ X E , X J , YJ ]c.
Here we have made the following choice for the "free parameters": We have
fixed the origin of the coordinate system by ( X D = 0, YD = 0). Obviously the
point PD is this origin. The orientation of the X-axis is given by YE = 0. In
consequence the "bounded parameters" are now derived by solving a quadratic
equation, indeed a very simple one: Due to the datum choice we find

(1st) X E = ± SDE
2
= ± Y1
(2 nd) X J = ± ( SDE
2
 S EJ2 + SJD2 ) /(2SDE ) = ±(Y1  Y2 + Y3 ) /(2 Y1 )
(3rd) YJ = ± SJD2  ( SDE
2
 S EJ2 + SJD2 ) 2 /(4 SDE
2
) = ± Y3  (Y1  Y2 + Y3 ) 2 /(4Y1 ) .

Indeed we meet the characteristic problem of nonlinear observational equations.


There are two solutions which we indicated by "± " . Only prior information can
tell us what the realized one in our experiment is. Such prior information has
been built into by “approximate coordinates” in the previous example, a prior
information we lack now. For special reasons here we have chosen the "+" solu-
tion which is in agreement with Table 1.3.
An intermediate summary of our first solution of a set of nonlinear observational
equations is as following: By the choice of the datum parameters (here: choice of
origin and orientation of the coordinate system) as "free parameters" we were
able to compute the "bounded parameters" by solving a quadratic equation. The
solution space which could be constructed in a closed form was non-unique.
Uniqueness was only achieved by prior information.
The closed form solution X = [ X 1 , X 2 , X 3 , X 4 , X 5 , X 6 ]c = [ X D , YD , X E , YE , X J , YJ ]c
has another deficiency. X is not MINOS: It is for this reason that we apply the
datum transformation ( X , Y ) 6 ( x, y ) outlined in Box 1.28 subject to & x &2 = min,
namely I-MINOS. Since we have assumed distance observations, the datum
transformation is described as rotation (rotation group SO(2) and a translation
(translation group T(2) ) in toto with three parameters (1 rotation parameter
called I and two translational parameters called t x , t y ). A pointwise transforma-
tion ( X D , YD ) 6 ( xD , yD ), ( X E , YE ) 6 ( xE , yE ) and ( X J , YJ ) 6 ( xJ , yJ ) is pre-
sented in Box 1.26. The datum parameters ( I , t x , t y ) will be determined by I-
MINOS, in particular by
a special Procrustes algorithm
contained in Box 1.28. There are various representations of the Lagrangean of
type MINOS outlined in Box 1.27. For instance, we could use the representation
76 1 The first problem of algebraic regression

of & x &2 in terms of observations ( Y1 = SDE 2


, Y2 = S EJ2 , Y3 = SJD2 ) which transforms
(i) & x & into (ii) & x & (Y1 , Y2 , Y3 ) . Finally (iii) & x &2 is equivalent to minimizing
2 2

the product sums of Cartesian coordinates.


Box 1.27: nonlinear observational equations of distance
measurements in the plane
(i) geometric notation versus (ii) algebraic notation
"geometric notation"
2
SDE = ( X E  X D ) 2 + (YE  YD ) 2
S EJ2 = ( X J  X E ) 2 + (YJ  YE ) 2
SJD2 = ( X D  X J ) 2 + (YD  YJ ) 2

"algebraic notation"
Y1 = F1 ( X) = SDE
2
= ( X E  X D ) 2 + (YE  YD ) 2
Y2 = F2 ( X) = S EJ2 = ( X J  X E ) 2 + (YJ  YE ) 2
Y3 = F3 ( X) = SJD2 = ( X D  X J ) 2 + (YD  YJ ) 2

Y c := [Y1 , Y2 , Y3 ] = [ SDE
2
, S EJ2 , SJD2 ]
Xc := [ X 1 , X 2 , X 3 , X 4 , X 5 , X 6 ] = [ X D , YD , X E , YE , X J , YJ ]

"Jacobi matrix"
wFi
[ ]=
wX j
ª ( X 3  X 1 ) ( X 4  X 2 ) ( X 3  X 1 ) ( X 4  X 2 ) 0 0 º
=2 « 0 0 ( X 5  X 3 ) ( X 6  X 4 ) ( X 5  X 3 ) ( X 6  X 4 ) »
« »
«¬ ( X 1  X 5 ) ( X 2  X 6 ) 0 0 ( X 1  X 5 ) ( X 2  X 6 ) ¼»
wF wF
rk[ i ] = 3, dim[ i ] = 3 × 6
wX j wX j

ª ( X 3  X 1 )  ( X 4  X 2 ) ( X 3  X 1 ) º
J = «« 0 0 ( X 5  X 3 ) »» , rk J = 3
«¬ ( X 1  X 5 ) (X2  X6) 0 »¼

ª (X4  X2) 0 0 º
K = «« ( X 6  X 4 ) ( X 5  X 3 ) ( X 6  X 4 ) »» .
«¬ 0 ( X 1  X 5 ) ( X 2  X 6 ) »¼
1-4 Special nonlinear models 77

"free parameters" "bounded parameters"

X1 = X D = 0 X 3 = X E = + SDE
( )

X 2 = YD = 0 X 5 = X J = + SDE
2
 S EJ2 + SJD2 = + Y32  Y22 + Y12
( ) ()

X 4 = YE = 0 X 6 = YJ = + S EJ2  SDE
2
= + Y22  Y12
( ) ( )

Box 1.28: Datum transformation of Cartesian coordinates

ª xº ª X º ªtx º
« y » = R « Y » + «t »
¬ ¼ ¬ ¼ ¬ y¼

R  SO(2):={R  R 2× 2 | R cR = I 2 and det R = +1}

Reference:
Facts :(representation of a 2×2 orthonormal matrix) of Appendix A:
ª cos I sin I º
R=«
¬  sin I cos I »¼

xD = X D cos I + YD sin I  t x
yD =  X D sin I + YD cos I  t y

xE = X E cos I + YE sin I  t x
yE =  X E sin I + YE cos I  t y

xJ = X J cos I + YJ sin I  t x
yJ =  X J sin I + YJ cos I  t y .

Box 1.29: Various forms of MINOS


(i ) & x &2 = xD2 + yD2 + xE2 + yE2 + xJ2 + yJ2 = min
I ,tx ,t y

(ii ) & x &2 = 12 ( SDE


2
+ S EJ2 + SJD2 ) + xD xE + xE xJ + xJ xD + yD yE + yE yJ + yJ yD = min
I ,tx ,t y

ª xD xE + xE xJ + xJ xD = min
(iii ) & x &2 = min œ « y y + y y + y y = min .
¬ D E E J J D

The representation of the objective function of type MINOS in term of the obser-
vations Y1 = SDE
2
, Y2 = S EJ2 , Y3 = SJD2 can be proven as follows:
78 1 The first problem of algebraic regression

Proof:
2
SDE = ( xE  xD ) 2 + ( yE  yD ) 2
= xD2 + yD2 + xE2 + yE2  2( xD xE + yD yE ) Ÿ
Ÿ 1
2
2
SDE + xD xE + yD yE = 12 ( xD2 + yD2 + xE2 + yE2 )
& x &2 = xD2 + yD2 + xE2 + yE2 + xJ2 + yJ2 =
= 12 ( SDE
2
+ S EJ2 + SJD2 ) + xD xE + xE xJ + xJ xD + yD yE + yE yJ + yJ yD

& x &2 = 12 (Y1 + Y2 + Y3 ) + xD xE + xE xJ + xJ xD + yD yE + yE yJ + yJ yD . ƅ

Figure1.13: Commutative diagram Figure1.14: Commutative diagram


(P-diagram) (E-diagram)
P0 : centre of polyhedron P0 : centre of polyhedron (triangle PD PE PJ
(triangle PD PE PJ ) orthonormal 2-legs {E1 , E1 | P0 } and
action of the translation group {e1 , e1 | P0 } ) at P0
action of the translation group

As soon as we substitute the datum transformation of Box 1.28 which we illus-


trated by Figure 1.9 and Figure 1.10 into the Lagrangean L (t x , t y , I ) of type
MINOS ( & x &2 = min ) we arrive at the quadratic objective function of Box 1.30.
In the first forward step of the special Procrustes algorithm we obtain the mini-
mal solution for the translation parameters (tˆx , tˆy ) . The second forward step of
the special Procrustes algorithm is built on (i) the substitution of (tˆx , tˆy ) in the
original Lagrangean which leads to the reduced Lagrangean of Box 1.29 and (ii)
the minimization of the reduced Lagrangean L (I ) with respect to the rotation
parameter I . In an intermediate phase we introduce "centralized coordinates"
('X , 'Y ) , namely coordinate differences with respect to the centre Po = ( X o , Yo )
of the polyhedron, namely the triangle PD , PE , PJ . In this way we are able to
generate the simple (standard form) tan 2I š of the solution I š the argument of
L1 = L1 (I ) = min or L2 = L2 (I ) .
1-4 Special nonlinear models 79

Box 1.30: Minimum norm solution, special Procrustes algorithm,


1st forward step

& x &2 :=
:= xD2 + yD2 + xE2 + yE2 + xJ2 + yJ2 = min
tx ,t y ,I

"Lagrangean "
L (t x , t y , I ) :=
:= ( X D cos I + YD sin I  t x ) 2 + (  X D sin I + YD cos I  t y ) 2
+ ( X E cos I + YE sin I  t x ) 2 + (  X E sin I + YE cos I  t y ) 2
+ ( X J cos I + YJ sin I  t x ) 2 + ( X J sin I + YJ cos I  t y ) 2

1st forward step


1 wL š
(t x ) = ( X D + X E + X J ) cos I + (YD + YE + YJ ) sin I  3t xš = 0
2 wt x
1 wL š
(t y ) = ( X D + X E + X J ) sin I + (YD + YE + YJ ) cos I  3t yš = 0
2 wt y

t xš = + 13 {( X D + X E + X J ) cos I + (YD + YE + YJ ) sin I}


t yš = + 13 {( X D + X E + X J ) sin I + (YD + YE + YJ ) cos I}

(t xš , t yš ) = arg{L (t x , t y , I ) = min} .

Box 1.31: Minimum norm solution, special Procrustes algorithm,


2nd forward step

"solution t xš , t yš in Lagrangean:
reduced Lagrangean"
L (I ) :=
:= { X D cos I + YD sin I  [( X D + X E + X J ) cos I + (YD + YE + YJ ) sin I ]}2 +
1
3

+ { X E cos I + YE sin I  13 [( X D + X E + X J ) cos I + (YD + YE + YJ ) sin I ]}2 +


+ { X J cos I + YJ sin I  13 [( X D + X E + X J ) cos I + (YD + YE + YJ ) sin I ]}2 +
+ { X D sin I + YD cos I  13 [( X D + X E + X J ) sin I + (YD + YE + YJ ) cos I ]}2 +
+ { X E sin I + YE cos I  13 [( X D + X E + X J ) sin I + (YD + YE + YJ ) cos I ]}2 +
+ { X J sin I + YJ cos I  13 [( X D + X E + X J ) sin I + (YD + YE + YJ ) cos I ]}2
= min
I
80 1 The first problem of algebraic regression

L (I ) =
= {[ X D  ( X D + X E + X J )]cos I + [YD  13 (YD + YE + YJ )]sin I }2 +
1
3

+ {[ X E  13 ( X D + X E + X J )]cos I + [YE  13 (YD + YE + YJ )]sin I }2 +


+ {[ X J  13 ( X D + X E + X J )]cos I + [YJ  13 (YD + YE + YJ )]sin I }2 +

+ {[ X D  13 ( X D + X E + X J )]sin I + [YD  13 (YD + YE + YJ )]cos I }2 +


+ {[ X E  13 ( X D + X E + X J )]sin I + [YE  13 (YD + YE + YJ )]cos I }2 +
+ {[ X J  13 ( X D + X E + X J )]sin I + [YJ  13 (YD + YE + YJ )]cos I }2

"centralized coordinate"
'X := X D  13 ( X D + X E + X J ) = 13 (2 X D  X E  X J )
'Y := YD  13 (YD + YE + YJ ) = 13 (2YD  YE  YJ )

"reduced Lagrangean"
L1 (I ) = ('X D cos I + 'YD sin I ) 2 +
+ ('X E cos I + 'YE sin I ) 2 +
+ ('X J cos I + 'YJ sin I ) 2

L2 (I ) = ('X D sin I + 'YD cos I ) 2 +


+ ('X E sin I + 'YE cos I ) 2 +
+ ('X J sin I + 'YJ cos I ) 2
1 wL š
(I ) = 0
2 wI

œ ('X D cos I š + 'YD sin I š )('X D sin I š + 'YD cos I š ) +


+ ('X E cos I š + 'YE sin I š ) 2 ('X E sin I š + 'YE cos I š ) +
+ ('X J cos I š + 'YJ sin I š ) 2 ('X J sin I š + 'YJ cos I š ) = 0 œ

œ  ('X D2 + 'X E2 + 'X J2 ) sin I š cos I š +


+ ('X D 'YD + 'X E 'YE + 'X J 'YJ ) cos 2 I š
 ('X D 'YD + 'X E 'YE + 'X J 'YJ ) sin 2 I š
+ ('YD2 + 'YE2 + 'YJ2 ) sin I š cos I š = 0 œ

œ [('X D2 + 'X E2 + 'X J2 )  ('YD2 + 'YE2 + 'YJ2 )]sin 2I š =


= 2['X D 'YD + 'X E 'YE + 'X J 'Y ]cos 2I š
1-4 Special nonlinear models 81

'X D 'YD + 'X E 'YE + 'X J 'Y


tan 2I š = 2
('X + 'X E2 + 'X J2 )  ('YD2 + 'YE2 + 'YJ2 )
2
D

"Orientation parameter in terms of Gauss brackets"


2['X'Y]
tan 2I š =
['X 2 ]  ['Y 2 ]
I š = arg{L1 (I ) = min} = arg{L2 (I ) = min}.
The special Procrustes algorithm is completed by the backforward steps outlined
in Box 1.32: At first we convert tan 2I š to (cos I š ,sin I š ) . Secondly we substi-
tute (cos I š ,sin I š ) into the translation formula (t xš , t yš ) . Thirdly we substitute
(t xš , t yš , cos I š ,sin I š ) into the Lagrangean L (t x , t x , I ) , thus generating the opti-
mal objective function & x &2 at (t xš , t yš , I š ) . Finally as step four we succeed to
compute the centric coordinates
ª xD xE xJ º
«y »
¬ D yE yJ ¼
with respect to the orthonormal 2-leg {e1 , e 2 | Po } at Po from the given coordi-
nates
ª XD X E XJ º
«Y »
¬ D YE YJ ¼
with respect to the orthonormal 2-leg {E1 , E2 | o} at o , and the optimal datum
parameters
t xš , t yš , cos I š ,sin I š .

Box 1.32: Special Procrustes algorithm


backward steps
step one
2['X'Y]
tan 2I š = Ÿ
['X 2 ]  ['Y 2 ]
ª cos I š
Ÿ « š
¬ sin I
step two
t xš = 13 ([ X]cos I š + [Y]sin I š )
t yš = 13 ([ X]sin I š + [Y]cos I š )

step three
82 1 The first problem of algebraic regression

& x š &2 = L (t xš , t yš , I š )

step four

ª xD xE xJ º ª cos I š sin I š º ª X D XE X J º ªt xš º
«y =  « » 1c .
¬ D yE yJ »¼ «¬  sin I š »«
cos I š ¼ ¬ YD YE YJ »¼ «¬t yš »¼

We leave the proof for [x] = xD + xE + xJ = 0, [y ] = yD + yE + yJ = 0,


[xy ] = xD yD + xE yE + xJ yJ z 0 to the reader as an exercise. A numerical example
is
SDE = 1.1, S EJ = 0.9, SJD = 1.2,
Y1 = 1.21, Y2 = 0.81, Y3 = 1.44,

X D = 0, X E = 1.10, X J = 0.84,
YD = 0 , YE = 0 , YJ = 0.86,

'X D = 0.647, 'X E = +0.453, 'X J = +0.193,


'YD = 0.287 , 'YE = 0.287, 'YJ = +0.573,

test:
['X] = 0, ['Y] = 0,
['X'Y] = 0.166, ['X 2 ] = 0.661, ['Y 2 ] = 0.493,
tan 2I š = 1.979, I š = 31D.598,828, 457,

I š = 31D 35c 55cc.782,

cos I š = 0.851, 738, sin I š = 0.523,968,

t xš = 0.701, t yš = 0.095,

ª xD = 0.701, xE = +0.236, xJ = +0.465,


«
¬ yD = +0.095, yE = 0.481, yJ = +0.387,
test: [x] = xD + xE + xJ = 0, [y ] = yD + yE + yJ = 0, [xy ] = +0.019 z 0 . ƅ

1-5 Notes
What is the origin of the rank deficiency three of the linearized observational
equations, namely the three distance functions observed in a planar triangular
network we presented in paragraph three?
1-5 Notes 83

In geometric terms the a priori indeterminancy of relating observed distances to


absolute coordinates placing points in the plane can be interpreted easily: The
observational equation of distances in the plane P 2 is invariant with respect to a
translation and a rotation of the coordinate system. The structure group of the
twodimensional Euclidean space E 2 is the group of motion decomposed into the
translation group (two parameters) and the rotation group (one parameter). Un-
der the action of the group of motion (three parameters) Euclidean distance func-
tions are left equivariant. The three parameters of the group of motion cannot be
determined from distance measurements: They produce a rank deficiency of
three in the linearized observational equations. A detailed analysis of the relation
between the transformation groups and the observational equations has been
presented by E. Grafarend and B. Schaffrin (1974, 1976).
More generally the structure group of a threedimensional Weitzenboeck space
W 3 is the conformal group C7 (3) which is decomposed into the translation
group T3 (3 parameters), the special orthogonal group SO(3) (3 parameters) and
the dilatation group ("scale", 1 parameter). Under the action of the conformal
group C7 (3) – in total 7 parameters – distance ratios and angles are left equi-
variant. The conformal group C7 (3) generates a transformation of Cartesian
coordinates covering R 3 which is called similarity transformation or datum
transformation. Any choice of an origin of the coordinate system, of the axes
orientation and of the scale constitutes an S-base following W. Baarda
(1962,1967,1973,1979,1995), J. Bossler (1973), M. Berger (1994), A. Dermanis
(1998), A. Dermanis and E. Grafarend (1993), A. Fotiou and D. Rossikopoulis
(1993), E. Grafarend (1973,1979,1983), E. Grafarend, E. H. Knickmeyer and B.
Schaffrin (1982), E. Grafarend and G. Kampmann (1996), G. Heindl (1986), M.
Molenaar (1981), H. Quee (1983), P. J. G. Teunissen (1960, 1985) and H. Wolf
(1990).
In projective networks (image processing, photogrammetry, robot vision) the
projective group is active. The projective group generates a perspective trans-
formation which is outlined in E. Grafarend and J. Shan (1997). Under the ac-
tion of the projective group cross-ratios of areal elements in the projective plane
are left equivariant. For more details let us refer to M. Berger (1994), M. H. Brill
and E. B. Barrett (1983), R. O. Duda and P.E. Heart (1973), E. Grafarend and J.
Shan (1997), F. Gronwald and F. W. Hehl (1996), M. R. Haralick (1980), R. J.
Holt and A. N. Netrawalli (1995), R. L. Mohr, L. Morin and E. Grosso (1992), J.
L. Mundy and A. Zisserman (1992a, b), R. F. Riesenfeldt (1981), J. A. Schonten
(1954).
In electromagnetism (Maxwell equations) the conformal group C16 (3,1) is ac-
tive. The conformal group C16 (3,1) generates a transformation of "space-time"
by means of 16 parameters (6 rotational parameters – three for rotation, three for
"hyperbolic rotation", 4 translational parameters, 5 "involutary" parameters, 1
dilatation – scale – parameter) which leaves the Maxwell equations in vacuum as
84 1 The first problem of algebraic regression

well as pseudo – distance ratios and angles equivariant. Sample references are
A. O. Barut (1972), H. Bateman (1910), F. Bayen (1976), J. Beckers, J. Harnard,
M, Perroud and M. Winternitz (1976), D. G. Boulware, L. S. Brown, R. D. Pec-
cei (1970), P. Carruthers (1971), E. Cunningham (1910), T. tom Dieck (1967),
N. Euler and W. H. Steeb (1992), P. G. O. Freund (1974), T. Fulton, F. Rohrlich
and L. Witten (1962), J. Haantjes (1937), H. A. Kastrup (1962,1966), R. Kotecky
and J. Niederle (1975), K. H. Marivalla (1971), D. H. Mayer (1975), J. A. Scho-
uten and J. Haantjes (1936), D. E. Soper (1976) and J. Wess (1990).
Box 1.33: Observables and transformation groups

observed quantities transformation group datum parame-


ters
coordinate differences translation group 2
in R 2 T(2)
coordinate differences translation group 3
in R 3 T(3)
coordinate differences translation group n
in R n T( n )
Distances group of motion 3
in R 2 T(2) , SO(2)
Distances group of motion 3+3=6
in R 3 T(3) , SO(3)
Distances group of motion n+(n+1)/2
in R n T(n) , SO(n)
angles, distance ratios conformal group 4
in R 2 C 4 (2)
angles, distance ratios conformal group 7
in R 3 C7 (3)
angles, distance ratios conformal group (n+1)(n+2)/2
in R n C( n +1)( n + 2) / 2 (n)
cross-ratios of area elements projective group 8
in the projective plane

Box 1.33 contains a list of observables in R n , equipped with a metric, and their
corresponding transformation groups. The number of the datum parameters coin-
cides with the injectivity rank deficiency in a consistent system of linear (lin-
earized) observational equations Ax = y subject to A  R n× m , rk A = n < m,
d ( A) = m  rk A .
2 The first problem of probabilistic regression
– special Gauss-Markov model with datum defect –
Setup of the linear uniformly minimum bias estimator of
type LUMBE for fixed effects.
In the first chapter we have solved a special algebraic regression problem,
namely the inversion of a system of consistent linear equations classified as
“underdetermined”. By means of the postulate of a minimum norm solution
|| x ||2 = min we were able to determine m unknowns ( m > n , say m = 106 ) from
n observations (more unknowns m than equations n, say n = 10 ). Indeed such a
mathematical solution may surprise the analyst: In the example “MINOS” pro-
duces one million unknowns from ten observations.
Though “MINOS” generates a rigorous solution, we are left with some doubts.
How can we interpret such an “unbelievable solution”?
The key for an evaluation of “MINOS” is handed over to us by treating the spe-
cial algebraic regression problem by means of a special probabilistic regression
problem, namely as a special Gauss-Markov model with datum defect. The bias
generated by any solution of an underdetermined or ill-posed problem will be
introduced as a decisive criterion for evaluating “MINOS”, now in the context of
a probabilistic regression problem. In particular, a special form of “LUMBE”,
the linear uniformly minimum bias estimator || LA − I ||² = min, leads us to a
solution which is equivalent to “MINOS”. Alternatively we may say that, among the
various classes of solutions of an underdetermined problem, “LUMBE” generates the
solution of minimal bias.
? What is a probabilistic regression problem?
By means of a certain statistical objective function, here of type “minimum bias”,
we solve the inverse problem of linear and nonlinear equations with “fixed ef-
fects” which relates stochastic observations to parameters. According to the
Measurement Axiom observations are elements of a probability space. In terms of
second order statistics the observation space Y of integer dimension, dim Y = n, is characterized by the first moment E{y}, the expectation of y ∈ Y, and the central second moment D{y}, the dispersion matrix or variance-covariance matrix Σ_y. In the case of “fixed effects” we consider the parameter space X of integer dimension, dim X = m, to be metrical. Its metric is induced from the probabilistic measure of the metric, the variance-covariance matrix Σ_y of the observations y ∈ Y. In particular, the variance-covariance matrix of the parameters is pulled back from the variance-covariance matrix Σ_y. In the special probabilistic regression model the “fixed effects” ξ ∈ Ξ (elements of the parameter space) are estimated.
Fast track reading:
Consult Box 2.2 and read only Theorem 2.3

Please pay attention to the guideline of Chapter two, say definitions, lemma and theorems.

[Guideline of Chapter two: Definition 2.1 (ξ̂ hom S-LUMBE of ξ) → Lemma 2.2 (ξ̂ hom S-LUMBE of ξ) → Theorem 2.3 (ξ̂ hom S-LUMBE of ξ) → Theorem 2.4 (equivalence of G_x-MINOS and S-LUMBE)]

“The guideline of chapter two: definition, lemma and theorems”

2-1 Setup of the linear uniformly minimum bias estimator of type LUMBE
Let us introduce the special consistent linear Gauss-Markov model specified in Box 2.1, which is given for the first order moments again in the form of a consistent system of linear equations relating the non-stochastic (“fixed”), real-valued vector ξ of unknowns to the expectation E{y} of the stochastic, real-valued vector y of observations, Aξ = E{y}. Here, the rank of the matrix A, rk A, equals the number n of observations, y ∈ ℝⁿ. In addition, the second order central moments, the regular variance-covariance matrix Σ_y of the observations, also called dispersion matrix D{y}, constitute a second matrix Σ_y ∈ ℝ^{n×n} of unknowns; its estimation within a linear model is postponed to the fourth chapter.
Box 2.1:
Special consistent linear Gauss-Markov model
{y = Aξ | A ∈ ℝ^{n×m}, rk A = n, n < m}
1st moments:
Aξ = E{y}   (2.1)
2nd moments:
Σ_y = D{y} ∈ ℝ^{n×n}, Σ_y positive-definite, rk Σ_y = n   (2.2)
ξ unknown,
Σ_y unknown or known from prior information.

Since we deal with a linear model, it is “a natural choice” to set up a homogeneous linear form to estimate the parameters ξ of fixed effects, at first, namely

ξ̂ = Ly,   (2.3)

where L ∈ ℝ^{m×n} is a matrix-valued fixed unknown. In order to determine the real-valued m × n matrix L, the homogeneous linear estimation ξ̂ of the vector ξ of fixed effects has to fulfil a certain optimality condition we shall outline.
Second, we are trying to analyze the bias in solving an underdetermined system of linear equations. Take reference to Box 2.2 where we systematically introduce (i) the bias vector β, (ii) the bias matrix, (iii) the S-modified bias matrix norm as a weighted Frobenius norm. In detail, let us discuss the bias terminology: for a homogeneous linear estimation ξ̂ = Ly the vector-valued bias β := E{ξ̂ − ξ} = E{ξ̂} − ξ takes over the special form

β := E{ξ̂} − ξ = −[I − LA] ξ,   (2.4)

which led us to the definition of the bias matrix B by B' = I_m − LA. The norm of the bias vector β, namely || β ||² := β'β, coincides with the ξξ'-weighted Frobenius norm of the bias matrix B, namely || B ||²_{ξξ'}. Here we meet the central problem that the weight matrix ξξ', rk ξξ' = 1, has rank one. In addition, ξξ' is not accessible since ξ is unknown. In this problematic case we replace the matrix ξξ' by a fixed positive-definite m × m matrix S, rk S = m, C. R. Rao’s substitute matrix, and define the S-weighted Frobenius matrix norm

|| B ||²_S := tr B'SB = tr (I_m − LA) S (I_m − LA)'.   (2.5)

Indeed, the substitute matrix S constitutes the matrix of the metric of the bias space.
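To make the bias terminology concrete, here is a minimal numerical sketch in Python; the matrices A, L and S below are illustrative choices, not taken from the text. It evaluates the bias matrix B' = I_m − LA and its S-weighted Frobenius norm ||B||²_S = tr(I_m − LA)S(I_m − LA)'.

```python
import numpy as np

# illustrative underdetermined model: n = 2 observations, m = 3 unknowns
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])          # A in R^(n x m), rk A = n
L = np.linalg.pinv(A)                    # one admissible linear estimator xi_hat = L y
S = np.eye(3)                            # substitute matrix S (here the unit matrix)

m = A.shape[1]
Bt = np.eye(m) - L @ A                   # bias matrix transposed: B' = I_m - L A
bias_norm_sq = np.trace(Bt @ S @ Bt.T)   # ||B||_S^2 = tr (I - LA) S (I - LA)'

print(f"||B||_S^2 = {bias_norm_sq:.6f}") # equals m - rk A = 1 for this choice of L
```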
Box 2.2:
Bias vector, bias matrix,
vector and matrix bias norms
Special consistent linear Gauss-Markov model of fixed effects
A ∈ ℝ^{n×m}, rk A = n, n < m
E{y} = Aξ, D{y} = Σ_y

“ansatz”
ξ̂ = Ly   (2.6)

“bias vector”
β := E{ξ̂ − ξ} = E{ξ̂} − ξ ≠ 0 for all ξ ∈ ℝ^m   (2.7)
β = L E{y} − ξ = −[I_m − LA]ξ ≠ 0 for all ξ ∈ ℝ^m   (2.8)

“bias matrix”
B' = I_m − LA   (2.9)

“bias norms”
|| β ||² = β'β = ξ'[I_m − LA]'[I_m − LA]ξ   (2.10)
|| β ||² = tr ββ' = tr [I_m − LA] ξξ' [I_m − LA]' = || B ||²_{ξξ'}   (2.11)
|| β ||²_S := tr [I_m − LA] S [I_m − LA]' =: || B ||²_S.   (2.12)

Being prepared for optimality criteria we give a precise definition of ξ̂ of type hom S-LUMBE.

Definition 2.1 (ξ̂ hom S-LUMBE of ξ):
An m × 1 vector ξ̂ is called hom S-LUMBE (homogeneous Linear Uniformly Minimum Bias Estimator) of ξ in the special consistent linear Gauss-Markov model of fixed effects of Box 2.1, if
(1st) ξ̂ is a homogeneous linear form
ξ̂ = Ly,   (2.13)
(2nd) in comparison to all other linear estimations ξ̂ has the minimum bias in the sense of
|| B ||²_S := || (I_m − LA)' ||²_S.   (2.14)

The estimations ξ̂ of type hom S-LUMBE can be characterized by

Lemma 2.2 (ξ̂ hom S-LUMBE of ξ):
An m × 1 vector ξ̂ is hom S-LUMBE of ξ in the special consistent linear Gauss-Markov model with fixed effects of Box 2.1, if and only if the matrix L̂ fulfils the normal equations
ASA'L̂' = AS.   (2.15)

: Proof :
The S-weighted Frobenius norm || (I_m − LA)' ||²_S establishes the Lagrangean
L(L) := tr (I_m − LA) S (I_m − LA)' = min_L   (2.16)



for S-LUMBE. The necessary conditions for the minimum of the quadratic Lagrangean L(L) are

∂L/∂L (L̂) := 2 [ASA'L̂' − AS]' = 0.   (2.17)

The theory of matrix derivatives is reviewed in Appendix B “Facts: derivative of a scalar-valued function of a matrix: trace”. The second derivatives

∂²L / (∂(vec L) ∂(vec L)') (L̂) > 0   (2.18)

at the “point” L̂ constitute the sufficiency conditions. In order to compute such an mn × mn matrix of second derivatives we have to vectorize the matrix normal equation by

∂L/∂L (L̂) = 2 [L̂ASA' − SA']   (2.19)

∂L/∂(vec L) (L̂) = vec 2 [L̂ASA' − SA']   (2.20)

∂L/∂(vec L) (L̂) = 2 [ASA' ⊗ I_m] vec L̂ − 2 vec(SA').   (2.21)

The Kronecker-Zehfuß product A ⊗ B of two arbitrary matrices as well as (A + B) ⊗ C = A ⊗ C + B ⊗ C of three arbitrary matrices subject to the dimension condition dim A = dim B is introduced in Appendix A, “Definition of Matrix Algebra: multiplication of matrices of the same dimension (internal relation) and Laws”. The vec operation (vectorization of an array) is reviewed in Appendix A as well, namely “Definition, Facts: vec AB = (B' ⊗ I) vec A for suitable matrices A and B”. Now we are prepared to compute

∂²L / (∂(vec L) ∂(vec L)') (L̂) = 2 (ASA') ⊗ I_m > 0   (2.22)

as a positive-definite matrix. The useful theory of matrix derivatives which applies here is reviewed in Appendix B, “Facts: derivative of a matrix-valued function of a matrix, namely ∂(vec X)/∂(vec X)'”.

The normal equations of hom S-LUMBE, ∂L/∂L (L̂) = 0, agree with (2.15).
∎
For an explicit representation of ξ̂ as hom LUMBE in the special consistent linear Gauss-Markov model of fixed effects of Box 2.1, we solve the normal equations (2.15) for
L̂ = arg{L(L) = min}.

Beside the explicit representation of ξ̂ of type hom LUMBE we compute the related dispersion matrix D{ξ̂} in

Theorem 2.3 (ξ̂ hom LUMBE of ξ):
Let ξ̂ = Ly be hom LUMBE in the special consistent linear Gauss-Markov model of fixed effects of Box 2.1. Then the solution of the normal equations is
ξ̂ = SA'(ASA')⁻¹ y   (2.23)
completed by the dispersion matrix
D{ξ̂} = SA'(ASA')⁻¹ Σ_y (ASA')⁻¹ AS   (2.24)
and by the bias vector
β := E{ξ̂} − ξ = −[I_m − SA'(ASA')⁻¹A] ξ   (2.25)
for all ξ ∈ ℝ^m.

The proof of Theorem 2.3 is straightforward. At this point we have to comment on what Theorem 2.3 is actually telling us. hom LUMBE has generated the estimation ξ̂ of type (2.23), the dispersion matrix D{ξ̂} of type (2.24) and the bias vector of type (2.25), which all depend on C. R. Rao’s substitute matrix S, rk S = m. Indeed we can associate any element of the solution vector, the dispersion matrix as well as the bias vector with a particular weight which can be “designed” by the analyst.
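As a minimal numerical sketch in Python (with an illustrative 2×3 matrix A, a substitute matrix S = I_3, an observation vector and Σ_y = σ²I_2 chosen here only for demonstration), the three quantities of Theorem 2.3 can be evaluated directly:

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])        # A in R^(n x m), rk A = n = 2, m = 3
y = np.array([2.0, 5.0])               # illustrative observation vector
S = np.eye(3)                          # C. R. Rao's substitute matrix, rk S = m
Sigma_y = 0.01 * np.eye(2)             # dispersion matrix of the observations

ASA_inv = np.linalg.inv(A @ S @ A.T)
xi_hat  = S @ A.T @ ASA_inv @ y                          # (2.23)
D_xi    = S @ A.T @ ASA_inv @ Sigma_y @ ASA_inv @ A @ S  # (2.24)
Bias_op = -(np.eye(3) - S @ A.T @ ASA_inv @ A)           # beta = Bias_op @ xi, (2.25)

print("xi_hat =", xi_hat)
print("D{xi_hat} =\n", D_xi)
print("bias operator -(I - SA'(ASA')^{-1}A) =\n", Bias_op)
```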
2-2 The Equivalence Theorem of Gx -MINOS and S -LUMBE
We have included the second chapter on hom S-LUMBE in order to interpret G_x-MINOS of the first chapter. The key question is open:

? When are hom S-LUMBE and G_x-MINOS equivalent ?

The answer will be given by

Theorem 2.4 (equivalence of G_x-MINOS and S-LUMBE):
With respect to the special consistent linear Gauss-Markov model (2.1), (2.2), ξ̂ = Ly is hom S-LUMBE for a positive-definite matrix S, if ξ_m = Ly is G_x-MINOS of the underdetermined system of linear equations (1.1) for
G_x = S⁻¹ ⇔ G_x⁻¹ = S.   (2.26)

The proof is straightforward if we directly compare the solution (1.14) of G_x-MINOS and (2.23) of hom S-LUMBE. Obviously the inverse matrix of the metric of the parameter space X is equivalent to the matrix of the metric of the bias space B. Or conversely, the inverse matrix of the metric of the bias space B determines the matrix of the metric of the parameter space X. In particular, the bias vector β of type (2.25) depends on the vector ξ which is inaccessible. The situation is similar to the one in hypothesis testing: we can produce only an estimation β̂ of the bias vector β if we identify ξ with the hypothesis ξ₀ = ξ̂. A similar argument applies to the second central moment D{y} = Σ_y of the “random effect” y, the observation vector. Such a dispersion matrix D{y} = Σ_y has to be known a priori in order to be able to compute the dispersion matrix D{ξ̂} = Σ_ξ̂. Again we have to apply the argument that we are only able to construct an estimate Σ̂_y and to set up a hypothesis about Σ_y.
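A minimal numerical check of the Equivalence Theorem in Python (the matrices A, G_x and the observation vector are illustrative assumptions): writing G_x-MINOS in the form x_m = G_x⁻¹A'(A G_x⁻¹A')⁻¹y, the choice S = G_x⁻¹ of Theorem 2.4 makes hom S-LUMBE and G_x-MINOS coincide.

```python
import numpy as np

A  = np.array([[1.0, 1.0, 1.0],
               [1.0, 2.0, 4.0]])            # underdetermined: n = 2 < m = 3
y  = np.array([2.0, 5.0])                   # illustrative observations
Gx = np.diag([1.0, 2.0, 4.0])               # illustrative metric of the parameter space

S = np.linalg.inv(Gx)                       # S = Gx^{-1}  (Theorem 2.4)

x_minos = S @ A.T @ np.linalg.solve(A @ S @ A.T, y)   # Gx-MINOS
x_lumbe = S @ A.T @ np.linalg.inv(A @ S @ A.T) @ y    # hom S-LUMBE (2.23)

print(np.allclose(x_minos, x_lumbe))        # True: the two solutions coincide
```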

2-3 Examples
Due to the Equivalence Theorem G_x-MINOS ~ S-LUMBE the only new items of the first problem of probabilistic regression are the dispersion matrix D{ξ̂ | hom LUMBE} and the bias matrix B{ξ̂ | hom LUMBE}. Accordingly the first example outlines the simple model of the variance-covariance matrix D{ξ̂} =: Σ_ξ̂ and its associated Frobenius matrix bias norm || B ||². New territory is entered if we compute the variance-covariance matrix D{ξ̂*} =: Σ_ξ̂* and its related Frobenius matrix bias norm || B* ||² for the canonical unknown vector ξ* of star coordinates [ξ̂₁*, ..., ξ̂_m*]', later on rank partitioned.
Example 2.1 (simple variance-covariance matrix D{ξ̂ | hom LUMBE}, Frobenius norm of the bias matrix || B(hom LUMBE) ||):
The dispersion matrix Σ_ξ̂ := D{ξ̂} of ξ̂ (hom LUMBE) is called

simple,

if S = I_m and Σ_y := D{y} = I_n σ_y². Such a model is abbreviated “i.i.d.” (independent identically distributed observations, one variance component) and “u.s.” (unity substituted, unity substitute matrix). Such a simple dispersion matrix is represented by

Σ_ξ̂ = A'(AA')⁻² A σ_y².   (2.27)

The Frobenius norm of the bias matrix for such a simple environment is derived by

|| B ||² = tr[I_m − A'(AA')⁻¹A]   (2.28)

|| B ||² = d = m − n = m − rk A,   (2.29)

since I_m − A'(AA')⁻¹A and A'(AA')⁻¹A are idempotent. According to Appendix A, notice the fact “tr A = rk A if A is idempotent”. Indeed the Frobenius norm of the u.s. bias matrix B(hom LUMBE) equals the square root √(m − n) = √d of the right complementary index of the matrix A.
Table 2.1 summarizes those data of the front page example of the first chapter relating to
D{ξ̂ | hom LUMBE} and || B(hom LUMBE) ||.

Table 2.1:
Simple variance-covariance matrix (i.i.d. and u.s.),
Frobenius norm of the simple bias matrix

Front page example 1.1: A ∈ ℝ^{2×3}, n = 2, m = 3

A = [1 1 1; 1 2 4],  AA' = [3 7; 7 21],  (AA')⁻¹ = (1/14)[21 −7; −7 3],  rk A = 2

A'(AA')⁻¹ = (1/14)[14 −4; 7 −1; −7 5],  A'(AA')⁻¹A = (1/14)[10 6 −2; 6 5 3; −2 3 13]

(AA')⁻² = (1/98)[245 −84; −84 29],  A'(AA')⁻²A = (1/98)[106 51 −59; 51 25 −27; −59 −27 37]

Σ_ξ̂ = A'(AA')⁻² A σ_y² = (1/98)[106 51 −59; 51 25 −27; −59 −27 37] σ_y²

|| B ||² = tr[I_m − A'(AA')⁻¹A] = tr I_3 − tr A'(AA')⁻¹A
|| B ||² = 3 − (1/14)(10 + 5 + 13) = 3 − 2 = 1 = d
|| B || = 1 = √d.
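The entries of Table 2.1 can be reproduced with a few lines of Python (a sketch; only numpy is assumed):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
m = A.shape[1]

AAt_inv = np.linalg.inv(A @ A.T)
P = A.T @ AAt_inv @ A                    # A'(AA')^{-1}A, idempotent
Sigma_xi = A.T @ AAt_inv @ AAt_inv @ A   # A'(AA')^{-2}A  (times sigma_y^2)

print("14 * A'(AA')^{-1}A =\n", np.round(14 * P))
print("98 * A'(AA')^{-2}A =\n", np.round(98 * Sigma_xi))
print("||B||^2 =", round(np.trace(np.eye(m) - P), 10))   # = 1 = d
```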

Example 2.2 (canonically simple variance-covariance matrix D{ξ̂* | hom LUMBE}, Frobenius norm of the canonical bias matrix || B*(hom LUMBE) ||):
The dispersion matrix Σ_ξ̂* := D{ξ̂*} of the rank partitioned vector of canonical coordinates ξ̂* (star coordinates) of type hom LUMBE is called

canonically simple,

if S = I_m and Σ_y* := D{y*} = I_n σ_y*². In short, we denote such a model by “i.i.d.” (independent identically distributed observations, one variance component) and “u.s.” (unity substituted, unity substitute matrix). Such a canonically simple dispersion matrix is represented by

D{ξ̂*} = D{[ξ̂₁*; ξ̂₂*]} = [Λ⁻² σ_y*²  0; 0  0]   (2.30)

or

var ξ̂₁* = Diag(1/λ₁², ..., 1/λ_r²) σ_y*²,  ξ₁* ∈ ℝ^{r×1},
cov(ξ̂₁*, ξ̂₂*) = 0,  var ξ̂₂* = 0,  ξ₂* ∈ ℝ^{(m−r)×1}.

If the right complementary index d := m − rk A = m − n is interpreted as a datum defect, we may say that the variances of the “free parameters” ξ̂₂* ∈ ℝ^d are zero. Let us specialize the canonical bias vector β* as well as the canonical bias matrix B*, which relates to ξ̂* = L*y* of type “canonical hom LUMBE”, as follows.

Box 2.3:
Canonical bias vector, canonical bias matrix

“ansatz”
ξ̂* = L*y*
E{y*} = A*ξ*,  D{y*} = Σ_y*

“bias vector”
β* := E{ξ̂*} − ξ*  for all ξ* ∈ ℝ^m
β* = L* E{y*} − ξ*  for all ξ* ∈ ℝ^m
β* = −(I_m − L*A*)ξ*  for all ξ* ∈ ℝ^m

β*(hom LUMBE) = [β₁*; β₂*] = −([I_r 0; 0 I_d] − [Λ⁻¹; 0][Λ, 0]) [ξ₁*; ξ₂*]   (2.31)
for all ξ₁* ∈ ℝ^r, ξ₂* ∈ ℝ^d

β*(hom LUMBE) = [β₁*; β₂*] = −[0 0; 0 I_d][ξ₁*; ξ₂*]   (2.32)

β*(hom LUMBE) = [β₁*; β₂*] = −[0; ξ₂*]  for all ξ₂* ∈ ℝ^d   (2.33)

“bias matrix”
(B*)' = I_m − L*A*

[B*(hom LUMBE)]' = [I_r 0; 0 I_d] − [Λ⁻¹; 0][Λ, 0]   (2.34)

[B*(hom LUMBE)]' = [0 0; 0 I_d]   (2.35)

“Frobenius norm of the canonical bias matrix”
|| B*(hom LUMBE) ||² = tr [0 0; 0 I_d] = d   (2.36)

|| B*(hom LUMBE) || = √d = √(m − n).   (2.37)

It is no surprise that the Frobenius norm of the canonical bias matrix, √d = √(m − n) = √(m − rk A), of Box 2.3 agrees with the value of the Frobenius norm of the ordinary bias matrix.
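The canonical picture can be checked numerically via the singular value decomposition; the sketch below (Python, numpy only, with the same illustrative 2×3 matrix) verifies that the Frobenius norm of the bias matrix of hom I-LUMBE equals √(m − rk A) = √d, in agreement with (2.29) and (2.37).

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])            # n = 2, m = 3, d = m - rk A = 1
n, m = A.shape

L = A.T @ np.linalg.inv(A @ A.T)           # hom I-LUMBE: L = A'(AA')^{-1}
B_t = np.eye(m) - L @ A                    # bias matrix transposed B' = I_m - LA

frob = np.linalg.norm(B_t, 'fro')          # Frobenius norm ||B||
d = m - np.linalg.matrix_rank(A)
print(frob, np.sqrt(d))                    # both equal 1.0

# the right singular vectors beyond the rank span the bias:  B' = V2 V2'
U, s, Vt = np.linalg.svd(A, full_matrices=True)
V2 = Vt[n:, :].T                           # m x d basis of the null space of A
print(np.allclose(B_t, V2 @ V2.T))         # True
```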
3 The second problem of algebraic regression
– inconsistent system of linear observational equations –
overdetermined system of linear equations:
{Ax + i = y | A ∈ ℝ^{n×m}, y ∉ R(A), rk A = m, m = dim X}

:Fast track reading:
Read only Lemma 3.7.

[Guideline of Chapter three:
Lemma 3.2 (x_l G_y-LESS of x) → Lemma 3.3 (x_l G_y-LESS of x) → Lemma 3.4 (x_l G_y-LESS of x, constrained Lagrangean) → Lemma 3.5 (x_l G_y-LESS of x, constrained Lagrangean) → Theorem 3.6 (bilinear form) → Lemma 3.7 (characterization of G_y-LESS)]

“The guideline of chapter three: theorem and lemmas”



By means of a certain algebraic objective function which geometrically is called


a minimum distance function, we solve the second inverse problem of linear and
nonlinear equations, in particular of algebraic type, which relate observations to
parameters. The system of linear or nonlinear equations we are solving here is
classified as overdetermined. The observations, also called measurements, are
elements of a certain observation space Y of integer dimension, dim Y = n,
which may be metrical, especially Euclidean, pseudo–Euclidean, in general a
differentiable manifold. In contrast, the parameter space X of integer dimension,
dim X = m, is metrical as well, especially Euclidean, pseudo–Euclidean, in gen-
eral a differentiable manifold, but its metric is unknown. A typical feature of
algebraic regression is the fact that the unknown metric of the parameter space
X is induced by the functional relation between observations and parameters.
We shall outline three aspects of any discrete inverse problem: (i) set-theoretic
(fibering), (ii) algebraic (rank partitioning, “IPM”, the Implicit Function Theo-
rem) and (iii) geometrical (slicing).
Here we treat the second problem of algebraic regression:
An inconsistent system of linear observational equations Ax + i = y, A ∈ ℝ^{n×m}, rk A = m, n > m, also called an “overdetermined system of linear equations”, in short
“more observations than unknowns”,
is solved by means of an optimization problem. The introduction presents a front page example of three inhomogeneous equations in two unknowns. In terms of 31 boxes and 12 figures we review the least-squares solution of such an inconsistent system of linear equations, which is based upon the trinity.

3-1 Introduction

With the introductory paragraph we explain the fundamental
concepts and basic notions of this section. For you, the analyst,
who has the difficult task to deal with measurements,
who has the difficult task to deal with measurements,
observational data, modeling and modeling equations we
present numerical examples and graphical illustrations of all
abstract notions. The elementary introduction is written not for
a mathematician, but for you, the analyst, with limited remote
control of the notions given hereafter. May we gain your interest.

Assume an n-dimensional observation space Y, here a linear space parameterized by n observations (finite, discrete) as coordinates y = [y₁, …, y_n]' ∈ ℝⁿ, in which an m-dimensional model manifold is embedded (immersed). The model manifold is described as the range of a linear operator f from an m-dimensional parameter space X into the observation space Y. The mapping f is established by the mathematical equations which relate all observables to the unknown parameters. Here the parameter space X, the domain of the linear operator f, will also be restricted to a linear space which is parameterized by coordinates x = [x₁, …, x_m]' ∈ ℝ^m. In this way the linear operator f can be understood as a coordinate mapping A : x ↦ y = Ax. The linear mapping f : X → Y is geometrically characterized by its range R(f), namely R(A), defined by R(f) := {y ∈ Y | y = f(x) for all x ∈ X}, which in general is a linear subspace of Y, and its kernel N(f), namely N(A), defined by N(f) := {x ∈ X | f(x) = 0}. Here the range R(f), namely R(A), does not coincide with the n-dimensional observation space Y, such that y ∉ R(f), namely y ∉ R(A). In contrast, we shall assume that the null space N(f) = {0} “is empty”: it contains only the element x = 0.
Example 3.1 will therefore demonstrate the range space R(f), namely the range space R(A), which does not coincide with the observation space Y (f is not surjective or “onto”), as well as the null space N(f), namely N(A), which is empty. f is not surjective, but injective.
Box 3.20 will introduce the special linear model of interest. By means of Box
3.21 it will be interpreted.

3-11 The front page example

Example 3.1 (polynomial of degree one, inconsistent system of linear equations Ax + i = y, x ∈ X = ℝ^m, dim X = m, y ∈ Y = ℝⁿ, r = rk A = dim X = m, y ∉ R(A)):

First, the introductory example solves the front page inconsistent system of linear equations,

x₁ + x₂ ≈ 1          x₁ + x₂ + i₁ = 1
x₁ + 2x₂ ≈ 3   or    x₁ + 2x₂ + i₂ = 3
x₁ + 3x₂ ≈ 4         x₁ + 3x₂ + i₃ = 4,

obviously in general dealing with the linear space X = ℝ^m ∋ x, dim X = m, here m = 2, called the parameter space, and the linear space Y = ℝⁿ ∋ y, dim Y = n, here n = 3, called the observation space.

3-12 The front page example in matrix algebra

Second, by means of Box 3.1 and according to A. Cayley’s doctrine let us specify the inconsistent system of linear equations in terms of matrix algebra.

Box 3.1:
Special linear model:
polynomial of degree one, three observations, two unknowns

y = [y₁; y₂; y₃] = [a₁₁ a₁₂; a₂₁ a₂₂; a₃₁ a₃₂] [x₁; x₂] ⇔
⇔ y = Ax + i : [1; 3; 4] = [1 1; 1 2; 1 3] [x₁; x₂] + [i₁; i₂; i₃] ⇒ A = [1 1; 1 2; 1 3],

x' = [x₁, x₂],  y' = [y₁, y₂, y₃] = [1, 3, 4],  i' = [i₁, i₂, i₃],
A ∈ ℤ₊^{3×2} ⊂ ℝ^{3×2},  x ∈ ℝ^{2×1},  y ∈ ℤ₊^{3×1} ⊂ ℝ^{3×1},
r = rk A = dim X = m = 2.

As a linear mapping f : x ↦ y = Ax can be classified as follows: f is injective, but not surjective. (A mapping f is called linear if f(λ₁x₁ + λ₂x₂) = λ₁ f(x₁) + λ₂ f(x₂) holds.) Denote the set of all x ∈ X by the domain D(f) or the domain space D(A). Under the mapping f we generate a particular set called the range R(f) or the range space R(A). Since the set of all y ∈ Y is not in the range R(f) or the range space R(A), namely y ∉ R(f) or y ∉ R(A), the mapping f is not surjective. Beside the range R(f), the range space R(A), the linear mapping is characterized by the kernel N(f) := {x ∈ ℝ^m | f(x) = 0} or the null space N(A) := {x ∈ ℝ^m | Ax = 0}. Since the inverse mapping g : R(f) ∋ y ↦ x ∈ D(f) is one-to-one, the mapping f is injective. Alternatively we may identify the kernel N(f), or the null space N(A), with {0}.

? Why is the front page system of linear equations called inconsistent ?

For instance, let us solve the first two equations, namely x₁ = −1, x₂ = 2. As soon as we substitute this solution in the third one, the inconsistency 5 ≠ 4 is met. Obviously such a system of linear equations needs general inconsistency parameters (i₁, i₂, i₃) in order to avoid contradiction. Since the right-hand side of the equations, namely the inhomogeneity of the system of linear equations, has been measured and the model (the model equations) has been fixed, we have no alternative but inconsistency.
Within matrix algebra the index of the linear operator A is the rank r = rk A, here r = 2, which coincides with the dimension of the parameter space X, dim X = m, namely r = rk A = dim X = m, here r = m = 2. In the terminology of the linear mapping f, f is not “onto” (surjective), but “one-to-one” (injective). The left complementary index of the linear operator A ∈ ℝ^{n×m}, which accounts for the surjectivity defect, is given by d_s = n − rk A, also called “degree of freedom” (here d_s = n − rk A = 1). While “surjectivity” relates to the range R(f) or “the range space R(A)” and “injectivity” to the kernel N(f) or “the null space N(A)”, we shall constructively introduce the notion of

range R(f) versus kernel N(f)
range space R(A) versus null space N(A)

by consequently solving the inconsistent system of linear equations. But beforehand let us ask:
How can such a linear model of interest, namely a system of inconsistent linear equations, be generated?
With reference to Box 3.2 let us assume that we have observed a dynamical system y(t) which is represented by a polynomial of degree one with respect to time t ∈ ℝ₊, namely
y(t) = x₁ + x₂ t.
(Due to ẏ(t) = x₂ it is a dynamical system with constant velocity or constant first derivative with respect to time t.) The unknown polynomial coefficients are collected in the column array x = [x₁, x₂]', x ∈ X = ℝ², dim X = 2, and constitute the coordinates of the two-dimensional parameter space X. If the dynamical system y(t) is observed at three instants, say y(t₁) = y₁ = 1, y(t₂) = y₂ = 3, y(t₃) = y₃ = 4, and if we collect the observations in the column array y = [y₁, y₂, y₃]' = [1, 3, 4]', y ∈ Y = ℝ³, dim Y = 3, they constitute the coordinates of the three-dimensional observation space Y. Thus we are left with the problem to compute two unknown polynomial coefficients from three measurements.

Box 3.2:
Special linear model: polynomial of degree one,
three observations, two unknowns

y = [y₁; y₂; y₃] = [1 t₁; 1 t₂; 1 t₃] [x₁; x₂] + [i₁; i₂; i₃] ⇔

⇔ t₁ = 1, y₁ = 1; t₂ = 2, y₂ = 3; t₃ = 3, y₃ = 4 :
[1; 3; 4] = [1 1; 1 2; 1 3] [x₁; x₂] + [i₁; i₂; i₃] ~

~ y = Ax + i,  r = rk A = dim X = m = 2.

Thirdly, let us begin with a more detailed analysis of the linear mapping f : Ax ≈ y or Ax + i = y, namely of the linear operator A ∈ ℝ^{n×m}, r = rk A = dim X = m. We shall pay special attention to the three fundamental partitionings, namely
(i) algebraic partitioning, called rank partitioning of the matrix A,
(ii) geometric partitioning, called slicing of the linear space Y (observation space),
(iii) set-theoretical partitioning, called fibering of the set Y of observations.

3-13 Least squares solution of the front page example by means of vertical rank partitioning

Let us go back to the front page inconsistent system of linear equations, namely the problem to determine two unknown polynomial coefficients from three sampling points, which we classified as an overdetermined one. Nevertheless we are able to compute a unique solution of the overdetermined system of inhomogeneous linear equations Ax + i = y, y ∉ R(A), rk A = dim X, here A ∈ ℝ^{3×2}, x ∈ ℝ^{2×1}, y ∈ ℝ^{3×1}, if we determine the coordinates of the unknown vector x as well as the vector i of inconsistency by least squares (minimal Euclidean length, ℓ₂-norm), here ‖i‖²_I = i₁² + i₂² + i₃² = min.
Box 3.3 outlines the solution of the related optimization problem.

Box 3.3:
Least squares solution of the inconsistent system of inhomogeneous linear equations, vertical rank partitioning

The solution of the optimization problem
{‖i‖²_I = min_x | Ax + i = y, rk A = dim X}
is based upon the vertical rank partitioning of the linear mapping f : x ↦ y = Ax + i, rk A = dim X, which we already introduced. As soon as

[y₁; y₂] = [A₁; A₂] x + [i₁; i₂]  subject to  A₁ ∈ ℝ^{r×r},

x = −A₁⁻¹ i₁ + A₁⁻¹ y₁
y₂ = −A₂A₁⁻¹ i₁ + i₂ + A₂A₁⁻¹ y₁ ⇒
⇒ i₂ = A₂A₁⁻¹ i₁ − A₂A₁⁻¹ y₁ + y₂

is implemented in the norm ‖i‖²_I, we are prepared to compute the first derivatives of the unconstrained Lagrangean

L(i₁, i₂) := ‖i‖²_I = i₁'i₁ + i₂'i₂ =
= i₁'i₁ + i₁'A₁'⁻¹A₂'A₂A₁⁻¹i₁ − 2 i₁'A₁'⁻¹A₂'(A₂A₁⁻¹y₁ − y₂) + (A₂A₁⁻¹y₁ − y₂)'(A₂A₁⁻¹y₁ − y₂) =
= min over i₁,

∂L/∂i₁ (i₁ₗ) = 0 ⇔
⇔ −A₁'⁻¹A₂'(A₂A₁⁻¹y₁ − y₂) + [A₁'⁻¹A₂'A₂A₁⁻¹ + I] i₁ₗ = 0 ⇔
⇔ i₁ₗ = [I + A₁'⁻¹A₂'A₂A₁⁻¹]⁻¹ A₁'⁻¹A₂'(A₂A₁⁻¹y₁ − y₂),

which constitute the necessary conditions. The theory of vector derivatives is presented in Appendix B. Following Appendix A, “Facts: Cayley inverse: sum of two matrices, namely (s9), (s10) for appropriate dimensions of the involved matrices”, we are led to the following identities:
1st term
(I + A₁'⁻¹A₂'A₂A₁⁻¹)⁻¹ A₁'⁻¹A₂'A₂A₁⁻¹ y₁ = (A₁' + A₂'A₂A₁⁻¹)⁻¹ A₂'A₂A₁⁻¹ y₁ =
= A₁(A₁'A₁ + A₂'A₂)⁻¹ A₂'A₂A₁⁻¹ y₁ =
= −A₁(A₁'A₁ + A₂'A₂)⁻¹ A₁'y₁ + A₁(A₁'A₁ + A₂'A₂)⁻¹ (A₂'A₂A₁⁻¹ + A₁')y₁ =
= −A₁(A₁'A₁ + A₂'A₂)⁻¹ A₁'y₁ + A₁(A₁'A₁ + A₂'A₂)⁻¹ (A₂'A₂ + A₁'A₁)A₁⁻¹y₁ =
= −A₁(A₁'A₁ + A₂'A₂)⁻¹ A₁'y₁ + y₁

2nd term
−(I + A₁'⁻¹A₂'A₂A₁⁻¹)⁻¹ A₁'⁻¹A₂' y₂ = −(A₁' + A₂'A₂A₁⁻¹)⁻¹ A₂' y₂ =
= −A₁(A₁'A₁ + A₂'A₂)⁻¹ A₂'y₂ ⇒

⇒ i₁ₗ = −A₁(A₁'A₁ + A₂'A₂)⁻¹ (A₁'y₁ + A₂'y₂) + y₁.

The second derivatives
∂²L/(∂i₁ ∂i₁') (i₁ₗ) = 2 [(A₂A₁⁻¹)'(A₂A₁⁻¹) + I] > 0,
due to positive-definiteness of the matrix (A₂A₁⁻¹)'(A₂A₁⁻¹) + I, generate the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. Finally let us backward transform
i₁ₗ ↦ i₂ₗ = A₂A₁⁻¹i₁ₗ − A₂A₁⁻¹y₁ + y₂:
i₂ₗ = −A₂(A₁'A₁ + A₂'A₂)⁻¹(A₁'y₁ + A₂'y₂) + y₂.

Obviously we have generated the linear form
i₁ₗ = −A₁(A₁'A₁ + A₂'A₂)⁻¹(A₁'y₁ + A₂'y₂) + y₁
i₂ₗ = −A₂(A₁'A₁ + A₂'A₂)⁻¹(A₁'y₁ + A₂'y₂) + y₂
or
[i₁ₗ; i₂ₗ] = −[A₁; A₂](A₁'A₁ + A₂'A₂)⁻¹[A₁', A₂'][y₁; y₂] + [y₁; y₂]
or
i_l = −A(A'A)⁻¹A'y + y.

Finally we are left with the backward step to compute the unknown vector of parameters x ∈ X:
x_l = −A₁⁻¹i₁ₗ + A₁⁻¹y₁
x_l = (A₁'A₁ + A₂'A₂)⁻¹(A₁'y₁ + A₂'y₂)
or
x_l = (A'A)⁻¹A'y.
A numerical computation with respect to the introductory example is

A₁'A₁ + A₂'A₂ = [3 6; 6 14],  (A₁'A₁ + A₂'A₂)⁻¹ = (1/6)[14 −6; −6 3],

A₁(A₁'A₁ + A₂'A₂)⁻¹ = (1/6)[8 −3; 2 0],

A₂(A₁'A₁ + A₂'A₂)⁻¹ = (1/6)[−4, 3],

A₁'y₁ + A₂'y₂ = [8; 19],  y₁ = [1; 3],  y₂ = 4,

i₁ₗ = (1/6)[−1; 2],  i₂ₗ = −1/6,  ‖i_l‖_I = (1/6)√6,

x_l = (1/6)[−2; 9],  ‖x_l‖ = (1/6)√85,

y(t) = −1/3 + (3/2) t,

(1/2) ∂²L/(∂i₁ ∂i₁') (i₁ₗ) = (A₂A₁⁻¹)'(A₂A₁⁻¹) + I = [2 −2; −2 5] > 0,

“first eigenvalue λ₁([2 −2; −2 5]) = 6”, “second eigenvalue λ₂([2 −2; −2 5]) = 1”.

The diagnostic algorithm for solving an overdetermined system of linear equations y = Ax + i, rk A = dim X = m, m < n = dim Y, y ∈ Y, by means of rank partitioning is presented to you by Box 3.5.
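The numbers above can be reproduced directly (a minimal sketch in Python; only numpy is assumed, and the observation vector y = (1, 3, 4)' is the one used in the computation above):

```python
import numpy as np
from fractions import Fraction

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 4.0])

x_l = np.linalg.solve(A.T @ A, A.T @ y)   # x_l = (A'A)^{-1} A'y
i_l = y - A @ x_l                         # inconsistency vector of type LESS

print([str(Fraction(float(v)).limit_denominator()) for v in x_l])  # ['-1/3', '3/2']
print([str(Fraction(float(v)).limit_denominator()) for v in i_l])  # ['-1/6', '1/3', '-1/6']
print(np.isclose(np.linalg.norm(i_l), np.sqrt(6) / 6))             # True
print(np.allclose(A.T @ i_l, 0))          # test A'i_l = 0 (cf. Box 3.5)
```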
3-14 The range R(f) and the kernel N(f): interpretation of “LESS” by three partitionings:
(i) algebraic (rank partitioning)
(ii) geometric (slicing)
(iii) set-theoretical (fibering)

Fourthly, let us go into the detailed analysis of R(f), R(f)⊥, N(f) with respect to the front page example. Beforehand we begin with a comment.
We want to emphasize the two-step procedure of the least squares solution (LESS) once more: the first step of LESS maps the observation vector y onto the range space R(f), while in the second step the LESS point y_l ∈ R(A) is uniquely mapped to the point x_l ∈ X, an element of the parameter space. Of course, we directly produce x_l = (A'A)⁻¹A'y just by substituting the inconsistency vector i = y − Ax into the ℓ₂-norm ‖i‖²_I = (y − Ax)'(y − Ax) = min. Such a direct procedure, which is common practice in LESS, does not give any insight into the geometric structure of LESS.
But how to identify the range R(f), namely the range space R(A), or the kernel N(f), namely the null space N(A), in the front page example?
By means of Box 3.4 we identify R(f) or “the range space R(A)” and give its illustration by Figure 3.1. Such a result has paved the way to the diagnostic algorithm for solving an overdetermined system of linear equations by means of rank partitioning presented in Box 3.5.
The kernel N(f) or “the null space” is immediately identified as {0} = N(A) = {x ∈ ℝ^m | Ax = 0} = {x ∈ ℝ^m | A₁x = 0} by means of rank partitioning (A₁x = 0 ⇔ x = 0).
Box 3.4:
The range space of the system of inconsistent linear equations Ax + i = y, “vertical” rank partitioning

The matrix A is called “vertically rank partitioned” if

{A ∈ ℝ^{n×m} ∧ A = [A₁; A₂] | A₁ ∈ ℝ^{r×r}, A₂ ∈ ℝ^{d×r}},  r = rk A = rk A₁ = m,  d = d(A) = n − rk A,

holds. (In the introductory example A ∈ ℝ^{3×2}, A₁ ∈ ℝ^{2×2}, A₂ ∈ ℝ^{1×2}, rk A = 2, d(A) = 1 applies.) An inconsistent system of linear equations Ax + i = y, rk A = dim X = m, is “vertically rank partitioned” if

Ax + i = y, rk A = dim X ⇔ y = [A₁; A₂] x + [i₁; i₂] ⇔
⇔ y₁ = A₁x + i₁,  y₂ = A₂x + i₂,

for a partitioned observation vector
{y ∈ ℝⁿ ∧ y = [y₁; y₂] | y₁ ∈ ℝ^{r×1}, y₂ ∈ ℝ^{d×1}}
and a partitioned inconsistency vector
{i ∈ ℝⁿ ∧ i = [i₁; i₂] | i₁ ∈ ℝ^{r×1}, i₂ ∈ ℝ^{d×1}},
respectively. (The “vertical” rank partitioning of the matrix A as well as the “vertically rank partitioned” inconsistent system of linear equations Ax + i = y, rk A = dim X = m, of the introductory example is

A = [A₁; A₂] = [1 1; 1 2; 1 3],  A₁ = [1 1; 1 2],  A₂ = [1 3],

y = [y₁; y₂],  y₁ = [1; 3] ∈ ℝ^{2×1},  y₂ = 4 ∈ ℝ.)

By means of the vertical rank partitioning of the inconsistent system of inhomogeneous linear equations an identification of the range space R(A), namely
R(A) = {y ∈ ℝⁿ | y₂ − A₂A₁⁻¹y₁ = 0},
is based upon
y₁ = A₁x + i₁ ⇒ x = A₁⁻¹(y₁ − i₁)
y₂ = A₂x + i₂ ⇒ y₂ = A₂A₁⁻¹(y₁ − i₁) + i₂ ⇒
⇒ y₂ − A₂A₁⁻¹y₁ = i₂ − A₂A₁⁻¹i₁,
which leads to the range space R(A) for inconsistency zero, particularly in the introductory example
y₃ − [1, 3] [1 1; 1 2]⁻¹ [y₁; y₂] = 0.
For instance, if we introduce the coordinates y₁ = u, y₂ = v, the other coordinate y₃ of the range space R(A) ⊂ Y = ℝ³ amounts to
y₃ = [1, 3] [2 −1; −1 1] [u; v] = [−1, 2] [u; v] ⇒
⇒ y₃ = −u + 2v.
In geometric language the linear space R(A) is a parameterized plane P²₀ through the origin, illustrated by Figure 3.1. The observation space Y = ℝⁿ (here n = 3) is sliced by the subspace, the linear space (linear manifold) R(A), dim R(A) = rk A = r, namely a straight line, a plane (here), or a higher dimensional plane through the origin O.
Figure 3.1: Range R(f), range space R(A), y ∉ R(A), observation space Y = ℝ³, sliced by R(A) = P²₀ ⊂ ℝ³, y = e₁u + e₂v + e₃(−u + 2v) ∈ R(A)
Box 3.5:
Algorithm
Diagnostic algorithm for solving an overdetermined system of linear equations y = Ax + i, rk A = dim X, y ∉ R(A), by means of rank partitioning

Determine the rank of the matrix A: rk A = dim X = m.
↓
Compute the “vertical rank partitioning”
A = [A₁; A₂],  A₁ ∈ ℝ^{r×r} = ℝ^{m×m},  A₂ ∈ ℝ^{(n−r)×r} = ℝ^{(n−m)×m}
(“n − r = n − m = d_s is called the left complementary index”; “A as a linear operator is not surjective, but injective”).
↓
Compute the range space R(A):
R(A) := {y ∈ ℝⁿ | y₂ − A₂A₁⁻¹y₁ = 0}.
↓
Compute the inconsistency vector of type LESS:
i_l = −A(A'A)⁻¹A'y + y;  test: A'i_l = 0.
↓
Compute the unknown parameter vector of type LESS:
x_l = (A'A)⁻¹A'y.

What is the geometric interpretation of the least-squares solution ‖i‖²_I = min ?

With reference to Figure 3.2 we additively decompose the observation vector according to
y = y_{R(A)} + y_{R(A)⊥},
where y_{R(A)} ∈ R(A) is an element of the range space R(A), but the inconsistency vector i_l = i_{R(A)⊥} ∈ R(A)⊥ an element of its orthogonal complement, the normal space R(A)⊥. Here R(A) is the central plane P²₀, y_{R(A)} ∈ P²₀, but R(A)⊥ the straight line L¹, i_l ∈ R(A)⊥. ‖i‖²_I = ‖y − y_{R(A)}‖² = min can be understood as the minimum distance mapping of the observation point y ∈ Y onto the range space R(A). Such a mapping is minimal if and only if the inner product ⟨y_{R(A)} | i_{R(A)⊥}⟩ = 0 approaches zero; we say
“y_{R(A)} and i_{R(A)⊥} are orthogonal”.
The solution point y_{R(A)} is the orthogonal projection of the observation point y ∈ Y onto the range space R(A), an m-dimensional linear manifold, also called a Grassmann manifold G_{n,m}.

Figure 3.2: Orthogonal projection of the observation vector y ∈ Y onto the range space R(A), R(A) := {y ∈ ℝⁿ | y₂ − A₂A₁⁻¹y₁ = 0}, i_l ∈ R(A)⊥; here: y_{R(A)} ∈ P²₀ (central plane), i_l ∈ L¹ (straight line); representation of y_{R(A)} (LESS): y = e₁u + e₂v + e₃(−u + 2v) ∈ ℝ³, R(A) = span{e_u, e_v};
Gram-Schmidt: e_u := D_u y_{R(A)} / ‖D_u y_{R(A)}‖ = (e₁ − e₃)/√2,
e_v := (D_v y_{R(A)} − ⟨D_v y_{R(A)} | e_u⟩ e_u) / ‖D_v y_{R(A)} − ⟨D_v y_{R(A)} | e_u⟩ e_u‖ = (e₁ + e₂ + e₃)/√3,
⟨e_u | e_v⟩ = 0,  D_v y_{R(A)} = e₂ + 2e₃

As an “intermezzo” let us consider for a moment the nonlinear model by means of the nonlinear mapping
“X ∋ x ↦ f(x) = y_{R(A)}, y ∈ Y”.
In general, the observation space Y as well as the parameter space X may be considered as differentiable manifolds, for instance “curved surfaces”. The range R(f) may be interpreted as the differentiable manifold X embedded, or more generally immersed, in the observation space Y = ℝⁿ, for instance X ⊂ Y. The parameters [x₁, …, x_m] constitute a chart of the differentiable manifold X = M^m ⊂ M^n = Y. Let us assume that a point p ∈ R(f) is given and we are going to attach the tangent space T_pM^m locally. Such a tangent space T_pM^m at p ∈ R(f) may be constructed by means of the Jacobi map, parameterized by the Jacobi matrix J, rk J = m, a standard procedure in differential geometry. An observation point y ∈ Y = ℝⁿ is orthogonally projected onto the tangent space T_pM^m at p ∈ R(f), namely by LESS as a minimum distance mapping. In a second step – in common use is the equidistant mapping – we bring the point q ∈ T_pM^m, which is located in the tangent space T_pM^m at p ∈ R(f), back to the differentiable manifold, namely to a point y* ∈ R(f). The inverse map
“R(f) ∋ y* ↦ g(y*) = x_l ∈ X”
maps the point y* ∈ R(f) to the point x_l of the chosen chart of the parameter space X as a differentiable manifold. Examples follow later on.
Let us continue with the geometric interpretation of the linear model of this paragraph. The range space R(A), dim R(A) = rk A = m, is a linear space of dimension m, here m = rk A, which slices ℝⁿ. In contrast, the subspace R(A)⊥ corresponds to an (n − rk A) = d_s dimensional linear space L^{n−r}, here n − rk A = n − m, r = rk A = m.
Let the algebraic partitioning and the geometric partitioning be merged to interpret the least squares solution of the inconsistent system of linear equations as a generalized inverse (g-inverse) of type LESS. As a summary of such a merger we take reference to Box 3.6.

The first condition:
AA⁻A = A

Let us depart from LESS of y = Ax + i, namely
x_l = A⁻_l y = (A'A)⁻¹A'y,  i_l = (I − AA⁻_l)y = [I − A(A'A)⁻¹A']y.

Ax_l = AA⁻_l y = AA⁻_l (Ax_l + i_l)
A'i_l = A'[I − A(A'A)⁻¹A']y = 0 ⇒ A⁻_l i_l = (A'A)⁻¹A'i_l = 0
⇒ Ax_l = AA⁻_l Ax_l ⇔ AA⁻A = A.

The second condition:
A⁻AA⁻ = A⁻

x_l = (A'A)⁻¹A'y = A⁻_l y = A⁻_l (Ax_l + i_l)
A⁻_l i_l = 0
⇒ x_l = A⁻_l y = A⁻_l AA⁻_l y ⇒
⇒ A⁻_l y = A⁻_l AA⁻_l y ⇔ A⁻AA⁻ = A⁻.

rk A⁻_l = rk A is interpreted as follows: the g-inverse of type LESS is the generalized inverse of maximal rank, since in general rk A⁻ ≤ rk A holds.

The third condition:
AA⁻ = P_{R(A)}

y = Ax_l + i_l = AA⁻_l y + (I − AA⁻_l)y
y = Ax_l + i_l = A(A'A)⁻¹A'y + [I − A(A'A)⁻¹A']y
y = y_{R(A)} + i_{R(A)⊥}
⇒ AA⁻_l = P_{R(A)},  (I − AA⁻_l) = P_{R(A)⊥}.

Obviously AA⁻_l is an orthogonal projection onto R(A), but I − AA⁻_l onto its orthogonal complement R(A)⊥.

Box 3.6:
The three conditions of the generalized inverse mapping (generalized inverse matrix) of LESS type

Condition #1:
f(x) = f(g(y)) ⇔ f = f ∘ g ∘ f    |    Ax = AA⁻Ax ⇔ AA⁻A = A

Condition #2 (reflexive g-inverse mapping, reflexive g-inverse):
x = g(y) = g(f(x))    |    x⁻ = A⁻y = A⁻AA⁻y ⇔ A⁻AA⁻ = A⁻

Condition #3:
f(g(y)) = y_{R(A)} ⇔ f ∘ g = proj_{R(f)}    |    AA⁻y = y_{R(A)} ⇔ AA⁻ = P_{R(A)}.

The set-theoretical partitioning, the fibering of the set system of points which constitute the observation space Y, the range R(f), will finally be outlined. Since the set system Y (the observation space) is ℝⁿ, the fibering is called “trivial”. Non-trivial fibering is reserved for nonlinear models, in which case we are dealing with an observation space as well as a range space which is a differentiable manifold. Here the fibering
Y = R(f) ∪ R(f)⊥
produces the trivial fibers R(f) and R(f)⊥, where the trivial fiber R(f)⊥ is the quotient set ℝⁿ/R(f). By means of a Venn diagram (John Venn 1834-1923), also called Euler circles (Leonhard Euler 1707-1783), Figure 3.3 illustrates the trivial fibers of the set system Y = ℝⁿ generated by R(f) and R(f)⊥. The set system of points which constitute the parameter space X is not subject to fibering since all points of the set system R(f) are mapped into the domain D(f).

Figure 3.3: Venn diagram, trivial fibering of the observation space Y, trivial fibers R(f) and R(f)⊥, f : ℝ^m = X → Y = R(f) ∪ R(f)⊥, X set system of the parameter space, Y set system of the observation space.
3-2 The least squares solution: “LESS”
The system of inconsistent linear equations Ax + i = y subject to A ∈ ℝ^{n×m}, rk A = m < n, allows certain solutions which we introduce by means of Definition 3.1 as a solution of a certain optimization problem. Lemma 3.2 contains the
normal equations of the optimization problem. The solution of such a system of
normal equations is presented in Lemma 3.3 as the least squares solution with
respect to the G y - norm . Alternatively Lemma 3.4 shows the least squares solu-
tion generated by a constrained Lagrangean. Its normal equations are solved for
(i) the Lagrange multiplier, (ii) the unknown vector of inconsistencies by Lemma
3.5. The unconstrained Lagrangean where the system of linear equations has
been implemented as well as the constrained Lagrangean lead to the identical
solution for (i) the vector of inconsistencies and (ii) the vector of unknown pa-
rameters. Finally we discuss the metric of the observation space and alternative
choices of its metric before we identify the solution of the quadratic optimization
problem by Lemma 3.7 in terms of the (1, 2, 3)-generalized inverse.

Definition 3.1 (least squares solution w.r.t. the G_y-seminorm):
A vector x_l ∈ X = ℝ^m is called G_y-LESS (LEast Squares Solution with respect to the G_y-seminorm) of the inconsistent system of linear equations

Ax + i = y, y ∈ Y ≡ ℝⁿ, rk A = dim X = m or y ∉ R(A)   (3.1)

(the system of inverse linear equations A⁻y = x, rk A⁻ = dim X = m or x ∈ R(A⁻), is consistent) if in comparison to all other vectors x ∈ X ≡ ℝ^m the inequality

‖y − Ax_l‖²_{G_y} = (y − Ax_l)'G_y(y − Ax_l) ≤ (y − Ax)'G_y(y − Ax) = ‖y − Ax‖²_{G_y}   (3.2)

holds, in particular if the vector of inconsistency i_l := y − Ax_l has the least G_y-seminorm.

The solution of type G_y-LESS can be computed as follows.

Lemma 3.2 (least squares solution with respect to the G_y-seminorm):
A vector x_l ∈ X ≡ ℝ^m is G_y-LESS of (3.1) if and only if the system of normal equations
A'G_yAx_l = A'G_yy   (3.3)
is fulfilled. x_l always exists and is in particular unique if A'G_yA is regular.

: Proof :
G_y-LESS is constructed by means of the Lagrangean
L(x) := ‖i‖²_{G_y} = ‖y − Ax‖²_{G_y} = x'A'G_yAx − 2y'G_yAx + y'G_yy = min_x,
such that the first derivatives
∂L/∂x (x_l) = ∂(i'G_yi)/∂x (x_l) = 2A'G_y(Ax_l − y) = 0
constitute the necessary conditions. The theory of vector derivatives is presented in Appendix B. The second derivatives

∂²L/(∂x ∂x') (x_l) = ∂²(i'G_yi)/(∂x ∂x') (x_l) = 2A'G_yA ≥ 0,

due to the positive semidefiniteness of the matrix A'G_yA, generate the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. Because of R(A'G_yA) = R(A'G_y) there always exists a solution x_l whose uniqueness is guaranteed by means of the regularity of the matrix A'G_yA.
∎
It is obvious that the matrix A'G_yA is in particular regular if rk A = dim X = m and, on the other side, the matrix G_y is positive definite, namely ‖i‖²_{G_y} is a G_y-norm. The linear form x_l = Ly which for arbitrary observation vectors y ∈ Y ≡ ℝⁿ leads to G_y-LESS of (3.1) can be represented as follows.

Lemma 3.3 (least squares solution with respect to the G_y-norm, rk A = dim X = m or x ∈ R(A⁻)):
x_l = Ly is G_y-LESS of the inconsistent system of linear equations (3.1) Ax + i = y, restricted to rk(A'G_yA) = rk A = dim X (or R(A'G_y) = R(A') and x ∈ R(A⁻)), if and only if L ∈ ℝ^{m×n} is represented by

Case (i): G_y = I

L̂ = A⁻_L = (A'A)⁻¹A' (left inverse)   (3.4)
x_l = A⁻_L y = (A'A)⁻¹A'y.   (3.5)
y = y_l + i_l   (3.6)
is an orthogonal decomposition of the observation vector y ∈ Y ≡ ℝⁿ into the I-LESS vector y_l ∈ Y = ℝⁿ and the I-LESS vector of inconsistency i_l ∈ Y = ℝⁿ subject to
y_l = Ax_l = A(A'A)⁻¹A'y   (3.7)
i_l = y − y_l = [I_n − A(A'A)⁻¹A']y.   (3.8)
Due to y_l = A(A'A)⁻¹A'y, I-LESS has the reproducing property. As projection matrices, A(A'A)⁻¹A' and [I_n − A(A'A)⁻¹A'] are idempotent. The “goodness of fit” of I-LESS is
‖y − Ax_l‖²_I = ‖i_l‖²_I = y'[I_n − A(A'A)⁻¹A']y.   (3.9)

Case (ii): G_y positive definite, rk(A'G_yA) = rk A

L̂ = (A'G_yA)⁻¹A'G_y (weighted left inverse)   (3.10)
x_l = (A'G_yA)⁻¹A'G_yy.   (3.11)

y = y_l + i_l   (3.12)
is an orthogonal decomposition of the observation vector y ∈ Y ≡ ℝⁿ into the G_y-LESS vector y_l ∈ Y = ℝⁿ and the G_y-LESS vector of inconsistency i_l ∈ Y = ℝⁿ subject to
y_l = Ax_l = A(A'G_yA)⁻¹A'G_yy,   (3.13)
i_l = y − Ax_l = [I_n − A(A'G_yA)⁻¹A'G_y]y.   (3.14)
Due to y_l = A(A'G_yA)⁻¹A'G_yy, G_y-LESS has the reproducing property. As projection matrices, A(A'G_yA)⁻¹A'G_y and [I_n − A(A'G_yA)⁻¹A'G_y] are idempotent. The “goodness of fit” of G_y-LESS is
‖y − Ax_l‖²_{G_y} = ‖i_l‖²_{G_y} = y'G_y[I_n − A(A'G_yA)⁻¹A'G_y]y.   (3.15)
The third case, G_y positive semidefinite, will be treated independently.
The proof of Lemma 3.3 is straightforward. The result that LESS generates the left inverse, G_y-LESS the weighted left inverse, will be proved later.
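A minimal numerical sketch in Python (illustrative A, y and a positive-definite weight matrix G_y assumed for demonstration) verifies the orthogonal decomposition y = y_l + i_l and the idempotency of the two projection matrices of Case (ii):

```python
import numpy as np

A  = np.array([[1.0, 1.0],
               [1.0, 2.0],
               [1.0, 3.0]])
y  = np.array([1.0, 3.0, 4.0])
Gy = np.diag([1.0, 2.0, 4.0])                  # illustrative positive-definite weight matrix

N   = A.T @ Gy @ A                             # normal equation matrix A'GyA
x_l = np.linalg.solve(N, A.T @ Gy @ y)         # (3.11)
P   = A @ np.linalg.inv(N) @ A.T @ Gy          # projector onto R(A):  A(A'GyA)^{-1}A'Gy
Q   = np.eye(3) - P                            # complementary projector

y_l, i_l = P @ y, Q @ y
print(np.allclose(y, y_l + i_l))               # True: decomposition (3.12)
print(np.allclose(P @ P, P), np.allclose(Q @ Q, Q))   # True True: idempotent
print(np.allclose(A.T @ Gy @ i_l, 0))          # True: normal equations (3.3) imply A'Gy i_l = 0
```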
An alternative way of producing the least squares solution with respect to the G_y-seminorm of the linear model is based upon the constrained Lagrangean (3.16), namely L(i, x, λ). Indeed L(i, x, λ) incorporates the linear model (3.1) by a vector-valued Lagrange multiplier into the objective function of type “least squares”, namely the distance function in a finite dimensional Hilbert space. Such an approach will be useful when we apply “total least squares” to the mixed linear model (errors-in-variables model).

Lemma 3.4 (least squares solution with respect to the G_y-norm, rk A = dim X, constrained Lagrangean):
G_y-LESS is assumed to be defined with respect to the constrained Lagrangean
L(i, x, λ) := i'G_yi + 2λ'(Ax + i − y) = min over i, x, λ.   (3.16)
A vector [i_l', x_l', λ_l']' ∈ ℝ^{(n+m+n)×1} is G_y-LESS of (3.1) in the sense of the constrained Lagrangean L(i, x, λ) = min if and only if the system of normal equations

[G_y 0 I_n; 0 0 A'; I_n A 0] [i_l; x_l; λ_l] = [0; 0; y]   (3.17)

with the vector λ_l ∈ ℝ^{n×1} of “Lagrange multipliers” is fulfilled. (i_l, x_l, λ_l) exists and is in particular unique if G_y is positive semidefinite. There holds
(i_l, x_l, λ_l) = arg{L(i, x, λ) = min}.   (3.18)

: Proof :
G_y-LESS is based on the constrained Lagrangean
L(i, x, λ) := i'G_yi + 2λ'(Ax + i − y) = min over i, x, λ,
such that the first derivatives
∂L/∂i (i_l, x_l, λ_l) = 2(G_yi_l + λ_l) = 0
∂L/∂x (i_l, x_l, λ_l) = 2A'λ_l = 0
∂L/∂λ (i_l, x_l, λ_l) = 2(Ax_l + i_l − y) = 0
or
[G_y 0 I_n; 0 0 A'; I_n A 0] [i_l; x_l; λ_l] = [0; 0; y]
constitute the necessary conditions. (The theory of vector derivatives is presented in Appendix B.) The second derivatives
(1/2) ∂²L/(∂i ∂i') (x_l) = G_y ≥ 0,
due to the positive semidefiniteness of the matrix G_y, generate the sufficiency condition for obtaining the minimum of the constrained Lagrangean.
∎
Lemma 3.5 (least squares solution with respect to the G_y-norm, rk A = dim X, constrained Lagrangean):
If G_y-LESS of the linear equations (3.1) is generated by the constrained Lagrangean (3.16) with respect to a positive definite weight matrix G_y, rk G_y = n, then the normal equations (3.17) are uniquely solved by
x_l = (A'G_yA)⁻¹A'G_yy,   (3.19)
i_l = [I_n − A(A'G_yA)⁻¹A'G_y]y,   (3.20)
λ_l = [G_yA(A'G_yA)⁻¹A' − I_n]G_yy.   (3.21)

: Proof :
A basis of the proof could be C. R. Rao's Pandora Box, the theory of inverse partitioned matrices (Appendix A: Fact: Inverse Partitioned Matrix /IPM/ of a symmetric matrix). Due to the rank identities rk G_y = n, rk A = rk(A'G_yA) = m < n, the normal equations can be solved faster directly by Gauss elimination:
G_yi_l + λ_l = 0
A'λ_l = 0
Ax_l + i_l − y = 0.

Multiply the third normal equation by A'G_y, multiply the first normal equation by A' and substitute A'λ_l from the second normal equation in the modified first one:
A'G_yAx_l + A'G_yi_l − A'G_yy = 0
A'G_yi_l + A'λ_l = 0
A'λ_l = 0
⇒ A'G_yAx_l + A'G_yi_l − A'G_yy = 0,  A'G_yi_l = 0
⇒ A'G_yAx_l − A'G_yy = 0 ⇒
x_l = (A'G_yA)⁻¹A'G_yy.

Let us return to the third normal equation and solve for i_l:
i_l = y − Ax_l,
i_l = [I_n − A(A'G_yA)⁻¹A'G_y]y.

Finally we determine the Lagrange multiplier: substitute i_l in the first normal equation in order to find
λ_l = −G_yi_l,
λ_l = [G_yA(A'G_yA)⁻¹A'G_y − G_y]y.
∎
Of course the G_y-LESS of type (3.2) and the G_y-LESS solution of type constrained Lagrangean (3.16) are equivalent, namely (3.11) ~ (3.19) and (3.14) ~ (3.20).
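This equivalence can be checked numerically by solving the bordered system of normal equations (3.17) directly and comparing with the explicit formulas (3.19)-(3.21); the sketch below in Python uses illustrative A, y, G_y:

```python
import numpy as np

A  = np.array([[1.0, 1.0],
               [1.0, 2.0],
               [1.0, 3.0]])
y  = np.array([1.0, 3.0, 4.0])
Gy = np.diag([1.0, 2.0, 4.0])
n, m = A.shape

# bordered system (3.17): unknowns [i_l, x_l, lambda_l]
K = np.block([[Gy,               np.zeros((n, m)), np.eye(n)],
              [np.zeros((m, n)), np.zeros((m, m)), A.T],
              [np.eye(n),        A,                np.zeros((n, n))]])
rhs = np.concatenate([np.zeros(n), np.zeros(m), y])
sol = np.linalg.solve(K, rhs)
i_l, x_l, lam_l = sol[:n], sol[n:n + m], sol[n + m:]

# explicit solution (3.19), (3.20)
x_direct = np.linalg.solve(A.T @ Gy @ A, A.T @ Gy @ y)
i_direct = y - A @ x_direct

print(np.allclose(x_l, x_direct), np.allclose(i_l, i_direct))  # True True
print(np.allclose(lam_l, -Gy @ i_l))                           # True: lambda_l = -Gy i_l
```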
In order to analyze the finite dimensional linear space Y, called “the observation space” – namely the case of a singular matrix of its metric – in more detail, let us take reference to the following.

Theorem 3.6 (bilinear form):
Suppose that the bracket ⟨·|·⟩ or g(·,·) : Y × Y → ℝ is a bilinear form on a finite dimensional linear space Y, dim Y = n, for instance a vector space over the field of real numbers. There exists a basis {e₁, ..., e_n} such that
(i) ⟨e_i | e_j⟩ = 0 or g(e_i, e_j) = 0 for i ≠ j,
(ii) ⟨e_{i₁} | e_{i₁}⟩ = +1 or g(e_{i₁}, e_{i₁}) = +1 for 1 ≤ i₁ ≤ p,
⟨e_{i₂} | e_{i₂}⟩ = −1 or g(e_{i₂}, e_{i₂}) = −1 for p + 1 ≤ i₂ ≤ p + q = r,
⟨e_{i₃} | e_{i₃}⟩ = 0 or g(e_{i₃}, e_{i₃}) = 0 for r + 1 ≤ i₃ ≤ n.
The numbers r and p are determined exclusively by the bilinear form. r is called the rank, r − p = q is called the relative index and the ordered pair (p, q) the signature. The theorem states that any two spaces of the same dimension with bilinear forms of the same signature are isometrically isomorphic. A scalar product (“inner product”) in this context is a nondegenerate bilinear form, for instance a form with rank equal to the dimension of Y. When dealing with low dimensional spaces as we do, we will often indicate the signature with a series of plus and minus signs when appropriate. For instance the signature of ℝ₁⁴ may be written (+ + + −) instead of (3,1). Such an observation space Y is met when we are dealing with observations in Special Relativity.
For instance, let us summarize the peculiar LESS features if the matrix G_y ∈ ℝ^{n×n} of the observation space is semidefinite, rk G_y =: r_y < n. By means of Box 3.7 we have collected the essential items of the eigenspace analysis as well as the eigenspace synthesis, G*_y versus G_y, of the metric. Λ_y = Diag(λ₁, ..., λ_{r_y}) denotes the matrix of non-vanishing eigenvalues {λ₁, ..., λ_{r_y}}. Note the norm identity
‖i‖²_{G_y} = ‖i‖²_{U₁Λ_yU₁'},   (3.22)
which leads to the U₁Λ_yU₁'-LESS normal equations
A'U₁Λ_yU₁'x_l = A'U₁Λ_yU₁'y.   (3.23)

Box 3.7:
Canonical representation of the rank deficient matrix of the metric of the observation space Y
rk G_y =: r_y,  Λ_y := Diag(λ₁, ..., λ_{r_y}).

“eigenspace analysis” versus “eigenspace synthesis”

G*_y = [U₁'; U₂'] G_y [U₁, U₂] = [Λ_y 0₁; 0₂ 0₃] ∈ ℝ^{n×n}   (3.24)
G_y = [U₁, U₂] [Λ_y 0₁; 0₂ 0₃] [U₁'; U₂'] ∈ ℝ^{n×n}   (3.25)

subject to
U ∈ SO(n) := {U ∈ ℝ^{n×n} | U'U = I_n, det U = +1} (dim SO(n) = n(n−1)/2),
U₁ ∈ ℝ^{n×r_y}, U₂ ∈ ℝ^{n×(n−r_y)}, Λ_y ∈ ℝ^{r_y×r_y},
0₁ ∈ ℝ^{r_y×(n−r_y)}, 0₂ ∈ ℝ^{(n−r_y)×r_y}, 0₃ ∈ ℝ^{(n−r_y)×(n−r_y)}

“norms”
‖i‖²_{G_y} = ‖i‖²_{U₁Λ_yU₁'}   (3.26)   ~   i'G_yi = i'U₁Λ_yU₁'i   (3.27)

LESS: ‖i‖²_{G_y} = min over x ⇔ ‖i‖²_{U₁Λ_yU₁'} = min over x
⇔ A'U₁Λ_yU₁'x_l = A'U₁Λ_yU₁'y.

Another example relates to an observation space Y = ℝ₁^{2k} (k ∈ {1, ..., K}) of even dimension, but with one negative eigenvalue. In such a pseudo-Euclidean space of signature (+ ⋯ + −) the determinant of the matrix of the metric G_y is negative, namely det G_y = λ₁ ⋯ λ_{2k−1} λ_{2k} < 0. Accordingly
x_max = arg{‖i‖²_{G_y} = max | y = Ax + i, rk A = m}
is G_y-MORE (Maximal ObseRvational inconsistEncy solution), but not G_y-LESS. Indeed, the structure of the observation space, either pseudo-Euclidean or Euclidean, decides upon
MORE or LESS.
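For the positive semidefinite case, the norm identity (3.22) suggests working in the eigenspace of G_y; the following Python sketch (with an illustrative rank-deficient G_y) forms the U₁Λ_yU₁'-LESS normal equations (3.23) and compares them with the direct use of G_y:

```python
import numpy as np

A  = np.array([[1.0, 1.0],
               [1.0, 2.0],
               [1.0, 3.0]])
y  = np.array([1.0, 3.0, 4.0])
Gy = np.diag([1.0, 2.0, 0.0])          # illustrative semidefinite weight, rk Gy = 2 < n

# eigenspace analysis of Gy: keep the r_y non-vanishing eigenvalues
lam, U = np.linalg.eigh(Gy)
keep = lam > 1e-12
U1, Lam_y = U[:, keep], np.diag(lam[keep])
Gy_synth = U1 @ Lam_y @ U1.T           # eigenspace synthesis (3.25), equals Gy

x_eig = np.linalg.solve(A.T @ Gy_synth @ A, A.T @ Gy_synth @ y)   # (3.23)
x_dir = np.linalg.solve(A.T @ Gy @ A, A.T @ Gy @ y)               # (3.3) with semidefinite Gy

print(np.allclose(Gy, Gy_synth))       # True
print(np.allclose(x_eig, x_dir))       # True: same G_y-LESS normal equations
```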

3-21 A discussion of the metric of the observation space Y

With the completion of the proof we have to discuss the basic results of Lemma 3.3 in more detail. At first we have to observe that the matrix G_y of the metric of the observation space Y has to be given a priori. We classified LESS according to (i) G_y = I_n, (ii) G_y positive definite and (iii) G_y positive semidefinite. But how do we know the metric of the observation space Y? Obviously we need prior information about the geometry of the observation space Y, namely from the empirical sciences like physics, chemistry, biology, geosciences and social sciences. If the observation space Y ⊂ ℝⁿ is equipped with an inner product ⟨y₁ | y₂⟩ = y₁'G_yy₂, y₁ ∈ Y, y₂ ∈ Y, where the matrix G_y of the metric ‖y‖² = y'G_yy is positive definite, we refer to the metric space Y ⊂ ℝⁿ as Euclidean, 𝔼ⁿ. In contrast, if the matrix of the metric is positive semidefinite we call the observation space semi-Euclidean, 𝔼^{n₁,n₂}; n₁ is the number of positive eigenvalues, n₂ the number of zero eigenvalues of the positive semidefinite matrix G_y of the metric (n = n₁ + n₂). In various applications, namely in the adjustment of observations which refer to Special Relativity or General Relativity, we have to generalize the metric structure of the observation space Y: if the matrix G_y of the pseudometric ‖y‖² = y'G_yy is built on n₁ positive eigenvalues (signature +), n₂ zero eigenvalues and n₃ negative eigenvalues (signature −), we call the pseudometric observation space pseudo-Euclidean, 𝔼^{n₁,n₂,n₃}, n = n₁ + n₂ + n₃. For such an observation space LESS has to be generalized to ‖y − Ax‖²_{G_y} = extr, for instance a
“maximum norm solution”.

3-22 Alternative choices of the metric of the observation space Y

Another problem associated with the observation space Y is the norm choice problem. Up to now we have used the ℓ₂-norm; for instance

ℓ₂-norm: ‖y − Ax‖₂ := [(y − Ax)'(y − Ax)]^{1/2} = (i'i)^{1/2} = (i₁² + i₂² + ⋯ + i²_{n−1} + i²_n)^{1/2},

while

ℓ_p-norm: ‖y − Ax‖_p := (|i₁|^p + |i₂|^p + ⋯ + |i_{n−1}|^p + |i_n|^p)^{1/p}, 1 < p < ∞,
ℓ∞-norm: ‖i‖∞ := max_{1≤i≤n} |i_i|

are alternative norms of choice.

Beside the choice of the matrix G_y of the metric within the weighted ℓ₂-norm we like to discuss the resulting LESS matrix G_l of the metric. Indeed we have constructed LESS from an a priori choice of the metric G, called G_y, and were led to the a posteriori choice of the metric G_l of type (3.9) and (3.15). The matrices
(i) G_l = I_n − A(A'A)⁻¹A'   (3.9)
(ii) G_l = G_y[I_n − A(A'G_yA)⁻¹A'G_y]   (3.15)
are (i) idempotent and (ii) G_y⁻¹-idempotent, in addition.


There are various alternative scales or objective functions for projection matrices
for substituting Euclidean metrics termed robustifying. In special cases those
objective functions operate on
(3.11) xl = Hy subject to H x = ( AcG y A) 1 AG y ,
(3.13) y A = H y y subject to H y = A( A cG y A) 1 AG y ,
(3.14) i A = H A y subject to H A = ª¬I n  A( A cG y A) 1 AG y º¼ y ,
where {H x , H y , H A } are called “hat matrices”. In other cases analysts have to
accept that the observation space is non-Euclidean. For instance, direction ob-
servations in R p locate points on the hypersphere S p 1 . Accordingly we have to
accept an objective function of von Mises-Fisher type which measures the spheri-
cal distance along a great circle between the measurement points on S p 1 and the
mean direction. Such an alternative choice of a metric of a non- Euclidean space
Y will be presented in chapter 7.
Here we discuss in some detail alternative objective functions, namely
• optimal choice of the weight matrix G y :
second order design SOD
• optimal choice of the weight matrix G y by means of
condition equations
• robustifying objective functions

3-221 Optimal choice of the weight matrix: SOD

The optimal choice of the weight matrix G_y, also called second order design (SOD), is a traditional topic in the design of geodetic networks. Let us refer to the review papers by A. A. Seemkooei (2001), W. Baarda (1968, 1973), P. Cross (1985), P. Cross and K. Thapa (1979), E. Grafarend (1970, 1972, 1974, 1975), E. Grafarend and B. Schaffrin (1979), B. Schaffrin (1981, 1983, 1985), F. Krumm (1985), S. L. Kuang (1991), P. Vanicek, K. Thapa and D. Schröder (1981), B. Schaffrin, E. Grafarend and G. Schmitt (1977), B. Schaffrin, F. Krumm and D. Fritsch (1980), J. van Mierlo (1981), G. Schmitt (1980, 1985), C. C. Wang (1970), P. Whittle (1954, 1963), H. Wimmer (1982) and the textbooks by E. Grafarend, H. Heister, R. Kelm, H. Knopff and B. Schaffrin (1979) and E. Grafarend and F. Sanso (1985, editors).
What is an optimal choice of the weight matrix G_y, what is "a second order design problem"?
Let us begin with Fisher's information matrix which agrees with half of the Hesse matrix, the matrix of second derivatives of the Lagrangean L(x) := ‖i‖²_{G_y} = ‖y − Ax‖²_{G_y}, namely

$G_x = A'(x)\,G_y\,A(x) = \tfrac{1}{2}\,\frac{\partial^2 L}{\partial x_\ell\,\partial x_\ell'} =: \mathrm{FISHER}$

at the "point" x_ℓ of type LESS. The first order design problem aims at determining those points x within the Jacobi matrix A by means of a properly chosen risk operating on "FISHER". Here, "FISHER" relates the weight matrix of the observations G_y, previously called the matrix of the metric of the observation space, to the weight matrix G_x of the unknown parameters, previously called the matrix of the metric of the parameter space.

G_x: weight matrix of the unknown parameters, or matrix of the metric of the parameter space X — versus — G_y: weight matrix of the observations, or matrix of the metric of the observation space Y.

Being properly prepared, we are able to outline the optimal choice of the weight matrix G_y, or X, also called the second order design problem, as an optimal fit to a criterion matrix Y, an ideal weight matrix G_x(ideal) of the unknown parameters. We hope that the translation of G_x and G_y "from metric to weight" does not cause any confusion. Box 3.8 outlines SOD.
Box 3.8:
Second order design SOD, optimal fit to a criterion matrix of weights

"weight matrix of the parameter space" versus "weight matrix of the observation space":

$Y := \tfrac{1}{2}\,\frac{\partial^2 L}{\partial x_\ell\,\partial x_\ell'} = G_x$  (3.28) versus $X := G_y = \mathrm{Diag}(g_1^y, \dots, g_n^y)$, $\;x := [g_1^y, \dots, g_n^y]'$  (3.29)

"inconsistent matrix equation of the second order design problem"

$A'XA + \Delta = Y$  (3.30)

"optimal fit"

$\|\Delta\|^2 = \mathrm{tr}\,\Delta'\Delta = (\mathrm{vec}\,\Delta)'(\mathrm{vec}\,\Delta) = \min_X$  (3.31)

$x_S := \arg\{\,\|\Delta\|^2 = \min \mid A'XA + \Delta = Y,\ X = \mathrm{Diag}\,x\,\}$  (3.32)

$\mathrm{vec}\,\Delta = \mathrm{vec}\,Y - \mathrm{vec}(A'XA) = \mathrm{vec}\,Y - (A' \otimes A')\,\mathrm{vec}\,X$  (3.33)

$\mathrm{vec}\,\Delta = \mathrm{vec}\,Y - (A' \odot A')\,x$  (3.34)

$x \in \mathbb{R}^n,\ \mathrm{vec}\,Y \in \mathbb{R}^{m^2\times1},\ \mathrm{vech}\,Y \in \mathbb{R}^{m(m+1)/2\times1},\ \mathrm{vec}\,\Delta \in \mathbb{R}^{m^2\times1},\ \mathrm{vec}\,X \in \mathbb{R}^{n^2\times1},\ A'\otimes A' \in \mathbb{R}^{m^2\times n^2},\ A'\odot A' \in \mathbb{R}^{m^2\times n}$

$x_S = [\,(A'\odot A')'(A'\odot A')\,]^{-1}(A'\odot A')'\,\mathrm{vec}\,Y.$  (3.35)

In general, the matrix equation A′XA + Δ = Y is inconsistent. Such a matrix inconsistency we have called Δ ∈ ℝ^{m×m}: For a given ideal weight matrix G_x(ideal), A′G_yA is only an approximation. The unknown weight matrix of the observations G_y, here called X ∈ ℝ^{n×n}, can only be designed in its diagonal form. A general weight matrix G_y does not make any sense since "oblique weights" cannot be associated with experiments. A natural restriction is therefore X = Diag(g₁^y, ..., g_n^y). The "diagonal weights" are collected in the unknown vector of weights

$x := [g_1^y, \dots, g_n^y]' \in \mathbb{R}^n.$

The optimal fit of "A′XA to Y" is achieved by the Lagrangean ‖Δ‖² = min, the optimum of the Frobenius norm of the inconsistency matrix Δ. The vectorized form of the inconsistency matrix, vec Δ, leads us first to the matrix A′ ⊗ A′, the Kronecker-Zehfuss product of A′ with itself, and second to the matrix A′ ⊙ A′, the Khatri-Rao product of A′ with itself, as soon as we implement the diagonal matrix X. For a definition of the Kronecker-Zehfuss product as well as of the Khatri-Rao product and related laws we refer to Appendix A. The unknown weight vector x is LESS if

$x_S = [\,(A'\odot A')'(A'\odot A')\,]^{-1}(A'\odot A')'\,\mathrm{vec}\,Y.$

Unfortunately, the weights x_S may come out negative. Accordingly we have to build in the extra condition that X = Diag(x₁, ..., x_n) be positive definite. The given references address this problem as well as the datum problem inherent in G_x(ideal).
Example 3.2 (Second order design):

[Figure 3.4: Directed graph of a trilateration network; known points {P_α, P_β, P_γ}, unknown point P_δ, distance observations [y₁, y₂, y₃]′ ∈ Y with y₁ = 13.58 km, y₂ = 9.15 km, y₃ = 6.94 km.]
The introductory example we outline here may serve as a firsthand insight into the observational weight design, also known as second order design. According to Figure 3.4 we present you with the graph of a two-dimensional planar network. From three given points {P_α, P_β, P_γ} we measure distances to the unknown point P_δ, a typical problem in densifying a geodetic network. For the weight matrix G_x ≡ Y of the unknown point we postulate I₂, unity. In contrast, we aim at an observational weight design characterized by a weight matrix G_y ≡ X = Diag(x₁, x₂, x₃).

The second order design equation

$A'\,\mathrm{Diag}(x_1, x_2, x_3)\,A + \Delta = I_2$

is supposed to supply us with a circular weight matrix G_x of the Cartesian coordinates (x_δ, y_δ) of P_δ. The observational equations for the distances (s_{αδ}, s_{βδ}, s_{γδ}) = (13.58 km, 9.15 km, 6.94 km) have already been derived in chapter 1-4. Here we just take advantage of the first design matrix A as given in Box 3.9, together with all further matrix operations.
A peculiar situation for the matrix equation A′XA + Δ = I₂ is met: In the special configuration of the trilateration network the characteristic equation of the second order design problem is consistent. Accordingly we have no problem to get the weights

$G_y = \mathrm{Diag}(0.511,\ 0.974,\ 0.515),$

which lead us to the weight matrix G_x = I₂ a posteriori. Note that the weights came out positive.
Box 3.9:
Example for a second order design problem, trilateration network

$A = \begin{bmatrix} -0.454 & -0.891 \\ -0.809 & +0.588 \\ +0.707 & +0.707 \end{bmatrix}, \quad X = \mathrm{Diag}(x_1, x_2, x_3), \quad Y = I_2$

$A'\,\mathrm{Diag}(x_1, x_2, x_3)\,A = I_2$

$\Leftrightarrow \begin{bmatrix} 0.206x_1 + 0.654x_2 + 0.5x_3 & 0.404x_1 - 0.476x_2 + 0.5x_3 \\ 0.404x_1 - 0.476x_2 + 0.5x_3 & 0.794x_1 + 0.346x_2 + 0.5x_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

"inconsistency Δ = 0"

(1st) 0.206x₁ + 0.654x₂ + 0.5x₃ = 1
(2nd) 0.404x₁ − 0.476x₂ + 0.5x₃ = 0
(3rd) 0.794x₁ + 0.346x₂ + 0.5x₃ = 1

x₁ = 0.511, x₂ = 0.974, x₃ = 0.515.
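For a diagonal weight matrix X the inconsistent matrix equation A′XA + Δ = Y of Box 3.8 turns into an ordinary least squares problem for the weight vector x via the Khatri-Rao product. The following minimal numerical sketch (NumPy; the variable names are ours) reproduces Example 3.2 from the tabulated, rounded direction cosines: the system is consistent and returns weights close to (0.511, 0.974, 0.515), with A′ Diag(x) A = I₂ a posteriori.

```python
import numpy as np

# first design matrix of the trilateration network (Box 3.9)
A = np.array([[-0.454, -0.891],
              [-0.809,  0.588],
              [ 0.707,  0.707]])
Y = np.eye(2)                      # criterion (ideal) weight matrix G_x(ideal)

# Khatri-Rao structure: vec(A' Diag(x) A) = sum_i x_i * kron(a_i, a_i),
# one column of the coefficient matrix per observation
KR = np.column_stack([np.kron(A[i, :], A[i, :]) for i in range(A.shape[0])])

# optimal fit of A'XA to Y in the Frobenius norm, eq. (3.35)
x, *_ = np.linalg.lstsq(KR, Y.flatten(), rcond=None)
print(np.round(x, 2))              # -> approx. [0.51 0.97 0.51], cf. (0.511, 0.974, 0.515)

# a posteriori check: A' Diag(x) A reproduces the criterion matrix I_2
print(np.round(A.T @ np.diag(x) @ A, 3))
```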

3-222 The Taylor-Karman criterion matrix

? What is a proper choice of the ideal weight matrix G_x(ideal)?

A great variety of proposals has been made.
First, G_x(ideal) has been chosen simple: A weight matrix G_x is called ideally simple if G_x(ideal) = I_m. For such a simple weight matrix of the unknown parameters, Example 3.2 is an illustration of SOD for a densification problem in a trilateration network.
Second, nearly all geodetic networks have been SOD-optimized by a criterion matrix G_x(ideal) which is homogeneous and isotropic in a two-dimensional or three-dimensional Euclidean space. In particular, the Taylor-Karman structure of a homogeneous and isotropic weight matrix G_x(ideal) has taken over the SOD network design. Box 3.10 summarizes the TK-G_x(ideal) of a two-dimensional, planar network. Worth to be mentioned, TK-G_x(ideal) has been developed in the Theory of Turbulence, namely in analyzing the two-point correlation function of the velocity field in a turbulent medium (G. I. Taylor 1935, 1936, T. Karman (1937), T. Karman and L. Howarth (1936), C. C. Wang (1970), P. Whittle (1954, 1963)).

Box 3.10:
Taylor-Karman structure of a homogeneous and isotropic tensor-valued, two-point function, two-dimensional, planar network

$G_x = \begin{bmatrix} g_{x_1x_1} & g_{x_1y_1} & g_{x_1x_2} & g_{x_1y_2} \\ g_{y_1x_1} & g_{y_1y_1} & g_{y_1x_2} & g_{y_1y_2} \\ g_{x_2x_1} & g_{x_2y_1} & g_{x_2x_2} & g_{x_2y_2} \\ g_{y_2x_1} & g_{y_2y_1} & g_{y_2x_2} & g_{y_2y_2} \end{bmatrix} \ni G_x(x_\alpha, x_\beta)$

"Euclidean distance function of points P_α = (x_α, y_α) and P_β = (x_β, y_β)"

$s_{\alpha\beta} := \|x_\alpha - x_\beta\| = \sqrt{(x_\alpha - x_\beta)^2 + (y_\alpha - y_\beta)^2}$

"decomposition of the tensor-valued, two-point weight function G_x(x_α, x_β) into the longitudinal weight function f_ℓ and the transversal weight function f_m"

$G_x(x_\alpha, x_\beta) = [\,g_{j_1 j_2}(x_\alpha, x_\beta)\,] = f_m(s_{\alpha\beta})\,\delta_{j_1 j_2} + [\,f_\ell(s_{\alpha\beta}) - f_m(s_{\alpha\beta})\,]\,\frac{[x_{j_1}(P_\alpha) - x_{j_1}(P_\beta)]\,[x_{j_2}(P_\alpha) - x_{j_2}(P_\beta)]}{s_{\alpha\beta}^2}$  (3.36)

$j_1, j_2 \in \{1, 2\}$, $(x_\alpha, y_\alpha) = (x_1, y_1)$, $(x_\beta, y_\beta) = (x_2, y_2)$.
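The Taylor-Karman structure (3.36) is easy to evaluate numerically once the longitudinal and transversal weight functions f_ℓ and f_m have been chosen. The sketch below (NumPy) assembles the 4×4 criterion matrix of a two-point configuration; the exponential correlation functions and the point coordinates are purely illustrative assumptions, since the choice of f_ℓ, f_m is left to the application.

```python
import numpy as np

def tk_block(xa, xb, f_l, f_m):
    """2x2 Taylor-Karman block G_x(x_alpha, x_beta) of eq. (3.36)."""
    d = np.asarray(xa, float) - np.asarray(xb, float)
    s = np.linalg.norm(d)
    if s == 0.0:                      # coincident points: isotropic block
        return f_m(0.0) * np.eye(2)
    return f_m(s) * np.eye(2) + (f_l(s) - f_m(s)) * np.outer(d, d) / s**2

# hypothetical longitudinal / transversal weight functions (assumed for illustration)
f_l = lambda s: np.exp(-s / 10.0)
f_m = lambda s: np.exp(-s / 20.0)

points = [(0.0, 0.0), (3.0, 4.0)]     # P_alpha, P_beta in km (assumed coordinates)
G_x = np.block([[tk_block(pa, pb, f_l, f_m) for pb in points] for pa in points])
print(np.round(G_x, 3))               # homogeneous, isotropic 4x4 criterion matrix
```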

3-223 Optimal choice of the weight matrix: the spaces R(A) and R(A)⊥

In the introductory paragraph we already outlined the additive basic decomposition of the observation vector into

$y = y_{R(A)} + y_{R(A)^\perp} = y_\ell + i_\ell, \quad y_{R(A)} = P_{R(A)}\,y, \quad y_{R(A)^\perp} = P_{R(A)^\perp}\,y,$

where P_{R(A)} and P_{R(A)⊥} are projectors, as well as

y_ℓ ∈ R(A) is an element of the range space R(A), in general the tangent space T_x M of the mapping f(x), versus i_ℓ ∈ R(A)⊥ is an element of its orthogonal complement, in general the normal space R(A)⊥.

The G_y-orthogonality ⟨y_ℓ | i_ℓ⟩_{G_y} = 0 is proven in Box 3.11.

Box 3.11
G_y-orthogonality of y_ℓ = y(LESS) and i_ℓ = i(LESS)

"G_y-orthogonality"

$\langle y_\ell \mid i_\ell \rangle_{G_y} = 0$  (3.37)

$\langle y_\ell \mid i_\ell \rangle_{G_y} = y'\,G_yA(A'G_yA)^{-1}A'\,G_y\,[\,I_n - A(A'G_yA)^{-1}A'G_y\,]\,y =$
$= y'G_yA(A'G_yA)^{-1}A'G_y\,y - y'G_yA(A'G_yA)^{-1}A'G_yA(A'G_yA)^{-1}A'G_y\,y = 0.$

There is an alternative interpretation of the equations of G_y-orthogonality ⟨i_ℓ | y_ℓ⟩_{G_y} = i_ℓ′G_y y_ℓ = 0 of i_ℓ and y_ℓ. First, replace i_ℓ = P_{R(A)⊥} y, where P_{R(A)⊥} is a characteristic projection matrix. Second, substitute y_ℓ = Ax_ℓ, where x_ℓ is G_y-LESS of x. As outlined in Box 3.12, the G_y-orthogonality i_ℓ′G_y y_ℓ = 0 of the vectors i_ℓ and y_ℓ is transformed into the G_y-orthogonality of the matrices A and B. The columns of the matrices A and B are G_y-orthogonal. Indeed we have derived the basic equations for transforming parametric adjustment into adjustment of condition equations,

$y_\ell = Ax_\ell, \quad B'G_y y_\ell = 0, \quad \text{by means of} \quad B'G_y A = 0.$

Box 3.12
G_y-orthogonality of A and B

$i_\ell \in R(A)^\perp, \quad \dim R(A)^\perp = n - \mathrm{rk}\,A = n - m$
$y_\ell \in R(A), \quad \dim R(A) = \mathrm{rk}\,A = m$

$\langle i_\ell \mid y_\ell \rangle_{G_y} = 0 \;\Leftrightarrow\; [\,I_n - A(A'G_yA)^{-1}A'G_y\,]'\,G_yA = 0$  (3.38)

$\mathrm{rk}\,[\,I_n - A(A'G_yA)^{-1}A'G_y\,] = n - \mathrm{rk}\,A = n - m$  (3.39)

"horizontal rank partitioning"

$[\,I_n - A(A'G_yA)^{-1}A'G_y\,] = [\,B, C\,]$  (3.40)

$B \in \mathbb{R}^{n\times(n-m)}, \quad C \in \mathbb{R}^{n\times m}, \quad \mathrm{rk}\,B = n - m$

$\langle i_\ell \mid y_\ell \rangle_{G_y} = 0 \;\Leftrightarrow\; B'G_yA = 0.$  (3.41)

Example 3.3 finally illustrates the G_y-orthogonality of the matrices A and B.

Example 3.3 (gravimetric leveling, G_y-orthogonality of A and B):

Let us consider a triangular leveling network {P_α, P_β, P_γ} which consists of three observations of height differences (h_{αβ}, h_{βγ}, h_{γα}). These height differences are considered holonomic, determined from gravity potential differences, known as gravimetric leveling. Due to

$h_{\alpha\beta} := h_\beta - h_\alpha, \quad h_{\beta\gamma} := h_\gamma - h_\beta, \quad h_{\gamma\alpha} := h_\alpha - h_\gamma$

the holonomity condition

$\oint dh = 0 \quad \text{or} \quad h_{\alpha\beta} + h_{\beta\gamma} + h_{\gamma\alpha} = 0$

applies. In terms of a linear model the observational equations can accordingly be established by

$\begin{bmatrix} h_{\alpha\beta} \\ h_{\beta\gamma} \\ h_{\gamma\alpha} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{bmatrix} \begin{bmatrix} h_{\alpha\beta} \\ h_{\beta\gamma} \end{bmatrix} + \begin{bmatrix} i_{\alpha\beta} \\ i_{\beta\gamma} \\ i_{\gamma\alpha} \end{bmatrix}$

$y := \begin{bmatrix} h_{\alpha\beta} \\ h_{\beta\gamma} \\ h_{\gamma\alpha} \end{bmatrix}, \quad A := \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{bmatrix}, \quad x := \begin{bmatrix} h_{\alpha\beta} \\ h_{\beta\gamma} \end{bmatrix}$

$y \in \mathbb{R}^{3\times1}, \quad A \in \mathbb{R}^{3\times2}, \quad \mathrm{rk}\,A = 2, \quad x \in \mathbb{R}^{2\times1}.$

First, let us compute (x_ℓ, y_ℓ, i_ℓ, ‖i_ℓ‖), the I-LESS of (x, y, i, ‖i‖). A. Bjerhammar's left inverse supplies us with

$x_\ell = A_\ell^- y = (A'A)^{-1}A'y = \tfrac{1}{3}\begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$

$x_\ell = \begin{bmatrix} h_{\alpha\beta} \\ h_{\beta\gamma} \end{bmatrix}_\ell = \tfrac{1}{3}\begin{bmatrix} 2y_1 - y_2 - y_3 \\ -y_1 + 2y_2 - y_3 \end{bmatrix}$

$y_\ell = Ax_\ell = AA_\ell^- y = A(A'A)^{-1}A'y = \tfrac{1}{3}\begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} y, \quad y_\ell = \tfrac{1}{3}\begin{bmatrix} 2y_1 - y_2 - y_3 \\ -y_1 + 2y_2 - y_3 \\ -y_1 - y_2 + 2y_3 \end{bmatrix}$

$i_\ell = y - Ax_\ell = (I_n - AA_\ell^-)\,y = [\,I_n - A(A'A)^{-1}A'\,]\,y$

$i_\ell = \tfrac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} y = \tfrac{1}{3}\begin{bmatrix} y_1 + y_2 + y_3 \\ y_1 + y_2 + y_3 \\ y_1 + y_2 + y_3 \end{bmatrix}$

$\|i_\ell\|^2 = y'(I_n - AA_\ell^-)\,y = y'[\,I_n - A(A'A)^{-1}A'\,]\,y = \tfrac{1}{3}(y_1^2 + y_2^2 + y_3^2 + 2y_1y_2 + 2y_2y_3 + 2y_3y_1).$

Second, we identify the orthogonality of A and B. A is given; finding B is the problem of horizontal rank partitioning of the projection matrix

$G_\ell := I_n - H_y = I_n - AA_\ell^- = I_n - A(A'A)^{-1}A' = \tfrac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \in \mathbb{R}^{3\times3},$

with special reference to the "hat matrix H_y := A(A'A)^{-1}A'". The diagonal elements of G_ℓ are of special interest for robust approximation. They amount to the uniform values

$h_{ii} = \tfrac{1}{3}(2, 2, 2), \quad (g_{ii})_\ell = 1 - h_{ii} = \tfrac{1}{3}(1, 1, 1).$
Note

$\det G_\ell = \det(I_n - AA_\ell^-) = 0, \quad \mathrm{rk}(I_n - AA_\ell^-) = n - m = 1$

$G_\ell = [\,I_3 - AA_\ell^-\,] = [\,B, C\,] = \tfrac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \quad B \in \mathbb{R}^{3\times1},\ C \in \mathbb{R}^{3\times2}.$

The holonomity condition h_{αβ} + h_{βγ} + h_{γα} = 0 is reestablished by the orthogonality B′A = 0:

$B'A = 0 \;\Leftrightarrow\; \tfrac{1}{3}\,[1, 1, 1]\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{bmatrix} = [0, 0].$
∎
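A short numerical sketch (NumPy, with G_y = I₃ as in the example) reproduces the essential steps of Example 3.3: Bjerhammar's left inverse, the redundancy matrix G_ℓ = I₃ − A(A′A)^{-1}A′ of rank n − m = 1, and the latent condition B′A = 0 which re-establishes the holonomity condition.

```python
import numpy as np

# observational equations of the leveling triangle (Example 3.3)
A = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
n, m = A.shape

AL = np.linalg.inv(A.T @ A) @ A.T          # Bjerhammar's left inverse (A'A)^{-1}A'
Hy = A @ AL                                # hat matrix A(A'A)^{-1}A'
Gl = np.eye(n) - Hy                        # redundancy matrix, here (1/3)*ones(3,3)

print(np.round(AL * 3))                    # -> [[ 2 -1 -1], [-1  2 -1]]
print(np.round(Gl * 3))                    # -> matrix of ones
print(np.linalg.matrix_rank(Gl), n - m)    # rank 1 = n - m

# latent condition ("from A to B"): a column of Gl spans the R(A)-orthogonal space
b = Gl[:, [0]]
print(np.round(b.T @ A, 12))               # B'A = 0  ->  holonomity condition
```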
The G_y-orthogonality condition of the matrices A and B has been successfully used by G. Kampmann (1992, 1994, 1997), G. Kampmann and B. Krause (1996, 1997), R. Jurisch, G. Kampmann and B. Krause (1997), R. Jurisch and G. Kampmann (1997, 1998, 2001 a, b, 2002), G. Kampmann and B. Renner (1999), R. Jurisch, G. Kampmann and J. Linke (1999 a, b, c, 2000) in order to balance the observational weights, to robustify G_y-LESS and to identify outliers. The Grassmann-Plücker coordinates which span the normal space R(A)⊥ will be discussed in Chapter 10 when we introduce condition equations.

3-224 Fuzzy sets

While so far we have used geometry to classify the objective functions as well as the observation space Y, an alternative concept considers the observations as elements of the set Y = {y₁, ..., y_n}. The elements of the set get certain attributes which make them fuzzy sets. In short, we supply some references on "fuzzy sets", namely G. Alefeld and J. Herzberger (1983), B. F. Arnold and P. Stahlecker (1999), A. Chaturvedi and A. T. K. Wan (1999), S. M. Guu, Y. Y. Lur and C. T. Pang (2001), H. Ishibuchi, K. Nozaki and H. Tanaka (1992), H. Ishibuchi, K. Nozaki, N. Yamamoto and H. Tanaka (1995), B. Kosko (1992), H. Kutterer (1994, 1999), V. Ravi, P. J. Reddy and H. J. Zimmermann (2000), V. Ravi and H. J. Zimmermann (2000), S. Wang, T. Shi and C. Wu (2001), L. Zadeh (1965), H. J. Zimmermann (1991).

3-23 G_y-LESS and its generalized inverse

A more formal version of the generalized inverse which is characteristic for G_y-LESS is presented by

Lemma 3.7 (characterization of G_y-LESS):

x_ℓ = Ly is I-LESS of the inconsistent system of linear equations (3.1) Ax + i = y, rk A = m (or y ∉ R(A)), if and only if the matrix L ∈ ℝ^{m×n} fulfils

$ALA = A, \quad AL = (AL)'.$  (3.42)

The matrix L is the unique A^{1,2,3} generalized inverse, also called left inverse A_ℓ^-.

x_ℓ = Ly is G_y-LESS of the inconsistent system of linear equations (3.1) Ax + i = y, rk A = m (or y ∉ R(A)), if and only if the matrix L fulfils

$G_yALA = G_yA, \quad G_yAL = (G_yAL)'.$  (3.43)

The matrix L is the G_y-weighted A^{1,2,3} generalized inverse, in short A_ℓ^-, also called weighted left inverse.

: Proof :

According to the theory of the generalized inverse presented in Appendix A, x_ℓ = Ly is G_y-LESS of (3.1) if and only if A′G_yAL = A′G_y is fulfilled. Indeed A′G_yAL = A′G_y is equivalent to the two conditions G_yALA = G_yA and G_yAL = (G_yAL)′. For a proof of such a statement multiply A′G_yAL = A′G_y from the left by L′ and receive

$L'A'G_yAL = L'A'G_y.$

The left-hand side of such a matrix identity is a symmetric matrix. In consequence, the right-hand side has to be symmetric, too. When applying the central symmetry condition to

$A'G_yAL = A'G_y \quad \text{or} \quad G_yA = L'A'G_yA,$

we are led to

$G_yAL = L'A'G_yAL = (G_yAL)',$

what had to be proven.

? How to prove the uniqueness of A^{1,2,3} = A_ℓ^-?

Let us fulfil G_yAx_ℓ by

$G_yAL_1y = G_yAL_1AL_1y = L_1'A'G_yAL_1y = L_1'A'L_1'A'G_y y =$
$= L_1'A'L_2'A'G_y y = L_1'A'G_yAL_2y = G_yAL_1AL_2y = G_yAL_2y,$

in particular by two arbitrary matrices L₁ and L₂, respectively, which fulfil (i) G_yALA = G_yA as well as (ii) G_yAL = (G_yAL)′. Indeed we have derived one result irrespective of L₁ or L₂.
∎
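The two conditions (3.43) are easy to check numerically for the standard choice L = (A′G_yA)^{-1}A′G_y. The following minimal sketch (NumPy, with a randomly generated full-column-rank A and a positive definite G_y chosen only for illustration) verifies that this L is indeed a G_y-weighted left inverse in the sense of Lemma 3.7.

```python
import numpy as np

rng = np.random.default_rng(0)
A  = rng.normal(size=(5, 2))                       # any A with full column rank, n > m
W  = rng.normal(size=(5, 5))
Gy = W @ W.T + 5 * np.eye(5)                       # a positive definite metric G_y

L = np.linalg.inv(A.T @ Gy @ A) @ A.T @ Gy         # candidate G_y-weighted left inverse

# the two characteristic conditions (3.43) of Lemma 3.7
print(np.allclose(Gy @ A @ L @ A, Gy @ A))         # G_y A L A = G_y A
print(np.allclose(Gy @ A @ L, (Gy @ A @ L).T))     # G_y A L is symmetric
print(np.allclose(L @ A, np.eye(2)))               # and indeed L is a left inverse
```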
If the matrix of the metric G_y of the observation space is positive definite, we can prove the following duality.

Theorem 3.8 (duality):

Let the matrix of the metric G_y of the observation space be positive definite. Then x_ℓ = Ly is G_y-LESS of the linear model (3.1) for any observation vector y ∈ ℝⁿ, if x̃_m = L′ỹ is G_y^{-1}-MINOS of the linear model ỹ = A′x̃ for all m × 1 columns ỹ ∈ R(A′).

: Proof :

If G_y is positive definite, there exists the inverse matrix G_y^{-1}. (3.43) can be transformed into the equivalent condition

$A' = A'L'A' \quad \text{and} \quad G_y^{-1}L'A' = (G_y^{-1}L'A')',$

which is equivalent to (1.33).

3-24 Eigenvalue decomposition of G_y-LESS: canonical LESS

For the system analysis of an inverse problem the eigenspace analysis and eigenspace synthesis of x_ℓ, G_y-LESS of x, is very useful and gives some peculiar insight into a dynamical system. Accordingly we are confronted with the problem to construct "canonical LESS", also called the eigenvalue decomposition of G_y-LESS.
First, we refer to the canonical representation of the parameter space X as well as the observation space Y introduced to you in the first chapter, Box 1.8 and Box 1.9. But here we add, by means of Box 3.13, the comparison of the general bases versus the orthonormal bases spanning the parameter space X as well as the observation space Y. In addition, we refer to Definition 1.5 and Lemma 1.6 where the adjoint operator A# has been introduced and represented.

Box 3.13:
General bases versus orthonormal bases spanning the parameter space X as well as the observation space Y

"left": parameter space — "right": observation space

"general left base" span{a₁, ..., a_m} = X versus "general right base" Y = span{b₁, ..., b_n}

matrix of the metric: $aa' = G_x$ (3.44) versus $bb' = G_y$ (3.45)

"orthonormal left base" span{e₁ˣ, ..., e_mˣ} = X versus "orthonormal right base" Y = span{e₁ʸ, ..., e_nʸ}

matrix of the metric: $e_x e_x' = I_m$ (3.46) versus $e_y e_y' = I_n$ (3.47)

"base transformation":

$a = \Lambda_x^{1/2} V e_x$ (3.48) versus $b = \Lambda_y^{1/2} U e_y$ (3.49)

$e_x = V'\Lambda_x^{-1/2} a$ (3.50) versus $e_y = U'\Lambda_y^{-1/2} b$ (3.51)

span{e₁ˣ, ..., e_mˣ} = X, Y = span{e₁ʸ, ..., e_nʸ}.

Second, we are going to solve the overdetermined system of linear equations

{y = Ax | A ∈ ℝ^{n×m}, rk A = m, n > m}

by introducing
• the eigenspace of the rectangular matrix A ∈ ℝ^{n×m} of rank r := rk A = m, n > m: A ↦ A*
• the left and right canonical coordinates: x → x*, y → y*,
as supported by Box 3.14. The transformations x ↦ x* (3.52) and y ↦ y* (3.53) from the original coordinates (x₁, ..., x_m) to the canonical coordinates (x₁*, ..., x_m*), the left star coordinates, as well as from the original coordinates (y₁, ..., y_n) to the canonical coordinates (y₁*, ..., y_n*), the right star coordinates, are polar decompositions: a rotation {U, V} is followed by a general stretch {G_y^{1/2}, G_x^{1/2}}. Those root matrices are generated by product decompositions of type G_y = (G_y^{1/2})′G_y^{1/2} as well as G_x = (G_x^{1/2})′G_x^{1/2}. Let us substitute the inverse transformations (3.54) x* ↦ x = G_x^{-1/2}Vx* and (3.55) y* ↦ y = G_y^{-1/2}Uy* into the system of linear equations (3.1) y = Ax + i or its dual (3.57) y* = A*x* + i*. Such an operation leads us to (3.58) y* = f(x*) as well as (3.59) y = f(x). Subject to the orthonormality conditions (3.60) U′U = I_n and (3.61) V′V = I_m we have generated the left-right eigenspace analysis (3.62)

$\Lambda^* = \begin{bmatrix} \Lambda \\ 0 \end{bmatrix}$

subject to the horizontal rank partitioning of the matrix U = [U₁, U₂]. Alternatively, the left-right eigenspace synthesis (3.63)

$A = G_y^{-1/2}\,[U_1, U_2]\begin{bmatrix} \Lambda \\ 0 \end{bmatrix} V'G_x^{1/2}$

is based upon the left matrix (3.64) L := G_y^{-1/2}U and the right matrix (3.65) R := G_x^{-1/2}V. Indeed the left matrix L by means of (3.66) LL′ = G_y^{-1} reconstructs the inverse matrix of the metric of the observation space Y. Similarly, the right matrix R by means of (3.67) RR′ = G_x^{-1} generates the inverse matrix of the metric of the parameter space X. In terms of "L, R" we have summarized the eigenvalue decompositions (3.68)-(3.73). Such an eigenvalue decomposition helps us to canonically invert y* = A*x* + i* by means of (3.74), (3.75), namely the rank partitioning of the canonical observation vector y* into y₁* ∈ ℝ^{r×1} and y₂* ∈ ℝ^{(n−r)×1}, to determine x_ℓ* = Λ^{-1}y₁*, leaving y₂* "unrecognized". Next we shall prove i₁* = 0 if i* is LESS.

Box 3.14:
Canonical representation,
overdetermined system of linear equations
“parameter space X ” 1
versus “observation space Y ” 1
(3.52) x* = V cG x x 2
y * = U cG y y (3.53) 2

and and
- 12 - 12
(3.54) x = G Vx x
*
y = G y Uy * (3.55)

“overdetermined system of linear equations”


{y = Ax + i | A  \ n× m , rk A = m, n > m}

(3.56) y = Ax + i versus y * = A * x* + i * (3.57)


- 12 - 12 - 12 1 1 1
G y Uy * = AG x Vx* + G y Ui* U cG y y = A* V cG x x + U cG y i
2 2 2

( 1
(3.58) y * = UcG y AG x V x* + i*
2
- 12
) ( 1
y = G y UA* V cG x x + i (3.59)
- 2
1
2
)

subject to subject to

(3.60) U cU = UUc = I n versus V cV = VV c = I m (3.61)

“left and right eigenspace”


“left-right eigenspace “left-right eigenspace
analysis” synthesis”
ª Uc º ªȁº ªȁº
A = G y [ U1 , U 2 ] « » V cG x (3.63)
1 1 1 1
(3.62) A* = « 1 » G y AG x V = « »
- 2 2
- 2 2

¬ U c2 ¼ ¬0¼ ¬0¼
“dimension identities”
ȁ  \ r × r , U1  \ n × r

0  \ ( n  r )× r , U 2  \ n × ( n  r ) , V  \ r × r

r := rk A = m, n > m
“left eigenspace” “right eigenspace”
- 12 1
- 12 1
(3.64) L := G y U Ÿ L-1 = U cG y 2
versus R := G x V Ÿ R -1 = V cG x (3.65) 2

 12  12
L1 := G y U1 , L 2 := G y U 2 Ÿ

(3.66) LLc = G -y1 Ÿ (L-1 )cL-1 = G y versus RR c = G x Ÿ (R )cR = G x (3.67)


-1 -1 -1

ª Uc º ª L º
1
L1 = « 1 » G y =: « 1 »
2

¬ U c2 ¼ ¬L 2 ¼

(3.68) A = LA* R -1 versus A* = L-1 AR (3.69)

ª ȁ º ª L º
(3.70) A = [ L1 , L 2 ] A # R 1 versus A* = « » = « 1 » AR (3.71)
¬ 0 ¼ ¬L 2 ¼

ª A # AL1 = L1 ȁ 2
(3.72) « # versus AA # R = Rȁ 2 (3.73)
«¬ A AL 2 = 0
“overdetermined system of linear
equations solved in canonical coordinates”

ªȁº ªi* º ª y * º
(3.74) y * = A* x* + i* = « » x* + « 1* » = « *1 »
¬0¼ ¬«i 2 ¼» ¬ y 2 ¼
“dimension identities”

y *1  \ r ×1 , y *2  \ ( n  r )×1 , i*1  \ r ×1 , i*2  \ ( n  r )×1

y *1 = ȁx* + i*1 Ÿ x* = ȁ 1 (y *1  i*1 ) (3.75)

“if i* is LESS, then x*A = ȁ 1 y *1 , i*1 = 0 ”.


Consult the commutative diagram of Figure 3.5 for a shorthand summary of the
newly introduced transformations of coordinates, both of the parameter space X
as well as the observation space Y .

Third, we prepare ourselves for LESS of the overdetermined system of linear equations

{y = Ax + i | A ∈ ℝ^{n×m}, rk A = m, n > m, ‖i‖²_{G_y} = min}

by introducing Lemma 3.9, namely the eigenvalue-eigencolumn equations of the matrices A#A and AA#, respectively, as well as Lemma 3.11, our basic result of "canonical LESS", subsequently completed by proofs. Throughout we refer to the adjoint operator which has been introduced by Definition 1.5 and Lemma 1.6.
[Figure 3.5: Commutative diagram of coordinate transformations — X ∋ x ↦ y ∈ R(A) ⊂ Y under A; x ↦ x* under V′G_x^{1/2}; y ↦ y* under U′G_y^{1/2}; X ∋ x* ↦ y* ∈ R(A*) ⊂ Y under A*.]

Lemma 3.9 (eigenspace analysis versus eigenspace synthesis of the matrix A ∈ ℝ^{n×m}, r := rk A = m < n):

The pair of matrices {L, R} for the eigenspace analysis and the eigenspace synthesis of the rectangular matrix A ∈ ℝ^{n×m} of rank r := rk A = m < n, namely

$A^* = L^{-1}AR$ versus $A = LA^*R^{-1}$

or

$A^* = \begin{bmatrix} \Lambda \\ 0 \end{bmatrix} = \begin{bmatrix} L_1^- \\ L_2^- \end{bmatrix} AR$ versus $A = [L_1, L_2]\begin{bmatrix} \Lambda \\ 0 \end{bmatrix} R^{-1},$

are determined by the eigenvalue-eigencolumn equations (eigenspace equations) of the matrices A#A and AA#, respectively, namely

$A^\# A R = R\Lambda^2$ versus $AA^\# L_1 = L_1\Lambda^2, \quad AA^\# L_2 = 0$

subject to

$\Lambda^2 = \begin{bmatrix} \lambda_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_r^2 \end{bmatrix}, \quad \Lambda = \mathrm{Diag}\big(+\sqrt{\lambda_1^2}, \dots, +\sqrt{\lambda_r^2}\big).$
Let us prove first A # AR = Rȁ 2 , second A # AL1 = L1 ȁ 2 , AA # L 2 = 0 .


(i) A # AR = Rȁ 2
A # AR = G -1x AcG y AR =
ª Uc º - ªȁº
= G -1xG x V [ ȁ, 0c] « 1 » (G y )cG y G y [ U1 , U 2 ] « » V cG x G x V
1 1 1 1 1
2 2
- 2
- 2 2

¬ U c2 ¼ ¬0 ¼
ªȁº
A # AR = G x V [ ȁ, 0c] « » = G x Vȁ 2
1
- - 1
2 2

¬ ¼0
A # AR = Rȁ 2 . ƅ
(ii) AA # L1 = L1 ȁ 2 , AA # L 2 = 0

AA # L = AG -1x A cG y L =
ªȁº ª Uc º -
= G y [ U1 , U 2 ] « » V cG x G -1x G x V [ ȁ, 0c] « 1 » (G y )cG y G y [ U1 , U 2 ]
1 1 1 1 1
-2 2 2
-
2 2

¬0¼ c
¬U2 ¼
ªȁº ª U c U U1c U 2 º
AA # L = [ L1 , L 2 ] « » [ ȁ, 0c] « 1 1 »
¬0¼ ¬ U c2 U1 U c2 U 2 ¼
ªȁ2 0c º ª I r 0 º
AA # L = [ L1 , L 2 ] « »«
¬0 0¼¬0 I n-r »¼

AA # [ L1 , L 2 ] = ª¬ L1 ȁ 2 , 0 º¼ , AA # L1 = L1 ȁ 2 , AA # L 2 = 0. ƅ
The pair of eigensystems {A # AR = Rȁ 2 , AA # [L1 , L 2 ] = ª¬L1 ȁ 2 ,0º¼} is unfortu-
nately based upon non-symmetric matrices AA # = AG -1x A cG y and
A # A = G -1x AcG y A which make the left and right eigenspace analysis numerically
more complex. It appears that we are forced to use the Arnoldi method rather
than the more efficient Lanczos method used for symmetric matrices.
In this situation we look out for an alternative. Actually as soon as we substitute
- 12 - 12
{L, R} by {G y U, G x V}
- 12
into the pair of eigensystems and consequently multiply AA # L by G x , we
achieve a pair of eigensystems identified in Corollary 3.10 relying on symmetric
matrices. In addition, such a pair of eigensystems produces the canonical base,
namely orthonormal eigencolumns.

Corollary 3.10 (symmetric pair of eigensystems):


The pair of eigensystems
1 1
- 12 - 12
(3.76) G y AG -1x A c(G y )cU1 = ȁ 2 U1 versus (G x )cA cG y AG x V = Vȁ 2 (3.77)
2 2

- 12 - 12
1
- 12 | (G x )cA cG y AG x  Ȝ 2j I r |= 0
(3.78) | G y AG Ac(G )c  Ȝ I |= 0 versus
2 -1
x y
2
i n
(3.79)
is based upon symmetric matrices. The left and right eigencolumns are
orthogonal.

Such a procedure requires two factorizations,


1 1
1 1
- 12 - 12 - 12 - 12
G x = (G x )cG x , G -1x = G x (G x )c
2 2
and G y = (G y )cG y , G -1y = G y (G y )c
2 2

via Choleskifactorization or eigenvalue decomposition of the matrices G x and


Gy .

Lemma 3.11 (canonical LESS):

Let y * = A* x* + i* be a canonical representation of the overdeter-


mined system of linear equations
{y = Ax + i | A  \ n× m , r := rk A = m, n > m} .

Then the rank partitioning of y * = ª¬(y *1 )c, (y *2 )cº¼c leads to the canoni-
cal unknown vector

ª y* º ª y* º y *  \ r ×1
(3.80) x*A = ª¬ ȁ -1 , 0 º¼ « *1 » = ȁ -1 y *1 , y * = « *1 » , * 1 ( n  r )×1 (3.81)
¬y 2 ¼ ¬y 2 ¼ y 2  \
and to the canonical vector of inconsistency

ª i* º ª y* º ª ȁ º i* = 0
(3.82) i*A = « *1 » := « *1 »  « » ȁ -1 y *1 or *1 *
¬i 2 ¼ A ¬ y 2 ¼ A ¬ 0 ¼ i2 = y2

of type G y -LESS. In terms if the original coordinates x  X a


canonical representation of G y -LESS is
1 ª Uc º 1
x A = G x V ª¬ ȁ -1 , 0 º¼ « 1 » G y y
- 2 2
(3.83)
¬ U c2 ¼
- 12 1
x A = G x Vȁ -1 U1c G y y = Rȁ -1 L-1 y. 2
(3.84)

x A = A A y is built on the canonical (G x , G y ) weighted right in-


verse.
For the proof we depart from G y -LESS (3.11) and replace the matrix A  \ n× m
by its canonical representation, namely by eigenspace synthesis.

x A = ( A cG y A ) A cG y y º
-1

»
ªȁº » Ÿ
A = G y [ U1 , U 2 ] « » V cG x »
1 1
- 2 2

¬0¼ ¼»
ª Uc º  ªȁº
Ÿ A cG y A = (G x )cV [ ȁ, 0] « 1 » (G y )cG y G y [ U1 , U 2 ] « » V cG x
1 1 1 1
2
 2 2 2

¬ U c2 ¼ ¬0¼

A cG y A = (G x )cVȁ 2 V cG x , ( AcG y A ) = G x Vȁ -2 V c(G x )c


1 1 -1 - 12 - 12
2 2

ª Uc º -
Ÿ x A = G x Vȁ 2 V c(G x )c(G x )cV [ ȁ, 0] « 1 » (G y )cG y y
- 1 - 1 1 1
2 2 2 2

¬ U c2 ¼
1 ª Uc º 1
x A = G x V ª¬ ȁ -1 , 0 º¼ « 1 » G y y
- 2 2

¬ U c2 ¼
- 12 1
x A = G x Vȁ -1 U1c G y y = A -A y 2

- 12 1
A A- = G x Vȁ -1 U1c G y  A1,2,3
G
2
y

( G y weighted reflexive inverse)

1 ª Uc º
1 º 1
x*A = V cG x x A = ȁ -1 U1c G y y = ª¬ ȁ -1 , 0 º¼ « 1 » G y y »
2 2 2

¬ U c2 ¼ »Ÿ
ª y º ª Uc º
* » 1
y * = « *1 » = « 1 » G y y » 2

y
¬ 2¼ ¬ 2¼ U c »¼

ª y* º
Ÿ x*A = ª¬ ȁ -1 , 0 º¼ « *1 » = ȁ -1 y 1* .
¬y 2 ¼

Thus we have proven the canonical inversion formula. The proof for the canoni-
cal representation of the vector of inconsistency is a consequence of the rank
partitioning
ª i* º ª y* º ª ȁ º i* , y *  \ r ×1
i*l = « 1* » := « 1* »  « » x*A , * 1 * 1 ( n  r )×1 ,
¬i 2 ¼ A ¬ y 2 ¼ ¬ 0 ¼ i2 , y2  \

ª i* º ª y * º ª ȁ º ª0º
i*A = « 1* » = « 1* »  « » ȁ -1 y1* = « * » .
¬i 2 ¼ A ¬ y 2 ¼ ¬ 0 ¼ ¬y 2 ¼
ƅ
The important result of x_ℓ* based on the canonical G_y-LESS of {y* = A*x* + i* | A* ∈ ℝ^{n×m}, rk A* = rk A = m, n > m} needs a comment. The rank partitioning of the canonical observation vector y*, namely y₁* ∈ ℝ^r, y₂* ∈ ℝ^{n−r}, again paved the way for an interpretation. First, we appreciate the simple "direct inversion" x_ℓ* = Λ^{-1}y₁*, Λ = Diag(+√λ₁², ..., +√λ_r²), for instance

$\begin{bmatrix} x_1^* \\ \vdots \\ x_m^* \end{bmatrix}_\ell = \begin{bmatrix} \lambda_1^{-1} y_1^* \\ \vdots \\ \lambda_r^{-1} y_r^* \end{bmatrix}.$

Second, i₁* = 0 eliminates all elements of the vector of canonical inconsistencies, for instance [i₁*, ..., i_r*]′_ℓ = 0, while i₂* = y₂* identifies the deficient elements of the vector of canonical inconsistencies with the vector of canonical observations, for instance [i*_{r+1}, ..., i_n*]′_ℓ = [y*_{r+1}, ..., y_n*]′. Finally, enjoy the commutative diagram of Figure 3.6 illustrating our previously introduced transformations of type LESS and canonical LESS, by means of A_ℓ^- and (A*)_ℓ^-, respectively.

[Figure 3.6: Commutative diagram of inverse coordinate transformations — Y ∋ y ↦ x_ℓ ∈ X under A_ℓ^-; y ↦ y* under U′G_y^{1/2}; x_ℓ ↦ x_ℓ* under V′G_x^{1/2}; Y ∋ y* ↦ x_ℓ* ∈ X under (A*)_ℓ^-.]

A first example is canonical LESS of the Front Page Example by G y = I 3 ,


Gx = I2 .

ª1 º ª1 1 º ª i1 º
ªx º
y = Ax + i : «« 2 »» = ««1 2 »» « 1 » + ««i2 »» , r := rk A = 2
x
«¬ 4 »¼ «¬1 3 »¼ ¬ 2 ¼ «¬ i3 »¼

left eigenspace right eigenspace


AA # U1 = AAcU1 = U1 ȁ 2
A # AV = A cAV = Vȁ 2
AA U 2 = AAcU 2 = 0
#

ª2 3 4 º
ª3 6 º
AA c = «« 3 5 7 »» «6 14 » = A cA
«¬ 4 7 10 »¼ ¬ ¼

eigenvalues
| AAc  Oi2 I 3 |= 0 œ | A cA  O j2 I 2 |= 0 œ

i  {1, 2,3} j  {1, 2}

17 1 17 1
œ O12 = + 265, O22 =  265, O32 = 0
2 2 2 2
left eigencolumns right eigencolumns

ª 2  O12 3 4 º ª u11 º
« » ª3  O12 6 º ª v11 º
(1st) « 3 5  O12
7 » ««u21 »» = 0 (1st) « »« » = 0
« 4 ¬ 6 14  O12 ¼ ¬ v 21 ¼
¬ 7 10  O12 »¼ «¬u31 »¼

subject to subject to
u112 + u21
2
+ u312 = 1 v112 + v 221 = 1

ª(2  O12 )u11 + 3u21 + 4u31 = 0


« versus (3  O12 ) v11 + 6 v 21 = 0
¬ 3u11 + (5  O1 )u21 + 7u31 = 0
2

ª 2 36 72
« v11 = 36 + (3  O 2 ) 2 = 265 + 11 265
« 1

« 2 (3  O1 ) 2
2
193 + 11 265
« v 21 = =
«¬ 36 + (3  O1 )
2 2
265 + 11 265

ª u112 º ª (1 + 4O12 ) 2 º
« 2» 1 « »
«u21 » = (1 + 4O 2 ) 2 + (2  7O 2 ) 2 + (1  7O 2 + O 4 ) 2
2 2
« (2  7O1 ) »
« 2»
¬ (1  7O1 + O1 ) ¼
1 1 1 1 « 2 4 2»
¬u31 ¼

(
ª 35 + 2 265 2 º
« » )
ª u112 º « 2»
« 2» 2 «§ 115 + 7 265 · »
«u21 » = ¨ ¸ »
«u31
2 » 43725 + 2685 265 «© 2 2 ¹
¬ ¼ « »
( )
2
« 80 + 5 265 »
¬« »¼

ª 2  O22 3 5 º ª v12 º
ª3  O22 7 º ª u12 º « »
(2nd) « »« » = 0 (2nd) « 3 2
5  O2 9 » «« v 22 »» = 0
¬ 7 21  O22 ¼ ¬u22 ¼ « 5
¬ 9 17  O22 »¼ «¬ v32 »¼
subject to subject to
u122 + u22
2
+ u322 = 1 v122 + v 22
2
=1

ª(2  O22 )u12 + 3u22 + 4u32 = 0


« versus (3  O22 ) v12 + 6 v 22 = 0
¬ 3u12 + (5  O2 )u22 + 7u32 = 0
2

ª 2 36 72
« v12 = 36 + (3  O 2 ) 2 = 265  11 265
« 2

« 2 2 2
(3  O2 ) 193  11 265
« v 22 = =
¬« 36 + (3  O 2 2
2 ) 265  11 265

ª u122 º ª (1 + 4O22 ) 2 º
« 2» 1 « »
«u22 » = (1 + 4O 2 ) 2 + (2  7O 2 ) 2 + (1  7O 2 + O 4 ) 2
2 2
« (2  7O2 ) »
«u32
2 » 2 2 2 2 « (1  7O22 + O24 ) 2 »
¬ ¼ ¬ ¼
ª (35  2 265) 2 º
ª u122 º « »
« 2» 2 « 115 7 2»
«u22 » = (  265)
«u32
2 » 43725  2685 265 « 2 2 »
¬ ¼ « 2
»
¬« (80  5 265) »¼
ª 2 3 4 º ª u13 º
(3rd) «« 3 5 7 »» ««u23 »» = 0 subject to u132 + u23
2
+ u332 = 1
«¬ 4 7 10 »¼ «¬u33 »¼

2u13 + 3u23 + 4u33 = 0


3u13 + 5u23 + 7u33 = 0

ª 2 3º ª u13 º ª 4 º ª u13 º ª 5 3º ª 4º


« 3 5» «u » = « 7 » u33 œ «u » =  « 3 2 » « 7 » u33
¬ ¼ ¬ 23 ¼ ¬ ¼ ¬ 23 ¼ ¬ ¼¬ ¼

u13 = +u33 , u23 = 2u33

1 2 1
u132 = , u23
2
= , u332 = .
6 3 6
There are four combinatorial solutions to generate square roots.

ª ± u132 º
u13 º « ± u11 ± u122
2
ª u11 u12 »
«u u23 »» = « ± u21
2
± u22
2 2 »
± u23
« 21 u22 « »
«¬u31 u32 u33 »¼ « ± u 2 ± u32
2
± u33
2 »
31
¬ ¼

ª v11 v12 º ª ± v11 2


± v122 º

«v = « ».
¬ 21 v 22 »¼ « ± v 2 ± v 222 »¼
¬ 21

Here we have chosen the one with the positive sign exclusively. In summary, the
eigenspace analysis gave the result as follows.

§ 17 + 265 17  265 ·
ȁ = Diag ¨ , ¸
¨ 2 2 ¸
© ¹

ª 35 + 2 265 35  2 265 º
« 2 2 1 »
« 43725 + 2685 265 43725  2685 265 6»
« 6 »
« 2 115 + 7 265 2 115  7 265 1 »
U=« 6 = [ U1 , U 2 ]
2 43725 + 2685 265 2 43725  2685 265 3 »
« »
« 1 »
« 80 + 5 265 80  5 265 6
2 2 6 »
« 43725 + 2685 265 43725  2685 265 »
¬ ¼
ª 72 72 º
« »
« 265 + 11 265 265  11 265 »
V=« ».
« 193 + 11 265 193  11 265 »
« »
¬ 265 + 11 265 265  11 265 ¼
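For G_y = I₃ and G_x = I₂ the canonical LESS of the Front Page Example reduces to the singular value decomposition of A. A minimal numerical sketch (NumPy) confirms the eigenvalues λ²₁,₂ = 17/2 ± √265/2 obtained above and shows that x_ℓ* = Λ^{-1}y₁*, transformed back to the original coordinates, coincides with the ordinary least squares solution.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 4.0])

# right eigenspace: A'A V = V Lambda^2
lam2 = np.linalg.eigvalsh(A.T @ A)[::-1]
print(np.round(lam2, 3))                       # -> [16.639  0.361]
print(np.round(17/2 + np.sqrt(265)/2, 3), np.round(17/2 - np.sqrt(265)/2, 3))

# canonical LESS via the singular value decomposition (G_y = I_3, G_x = I_2)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
y_star = U.T @ y                               # canonical observations y*
x_star = y_star[:2] / s                        # x*_l = Lambda^{-1} y*_1, i*_1 = 0
x_l = Vt.T @ x_star                            # back to the original coordinates
print(np.round(x_l, 3))                        # -> [-0.667  1.5]
print(np.round(np.linalg.lstsq(A, y, rcond=None)[0], 3))   # same result
```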

3-3 Case study:
Partial redundancies, latent conditions, high leverage points versus break points, direct and inverse Grassmann coordinates, Plücker coordinates

This case study has various targets. First we aim at a canonical analysis of the hat matrices H_x and H_y for a simple linear model with a leverage point. The impact of a high leverage point is studied in all detail. Partial redundancies are introduced and interpreted in their peculiar role of weighting observations. Second, preparatory in nature, we briefly introduce multilinear algebra, the operations "join and meet", namely the Hodge star operator. Third, we go "from A to B": Given the column space R(A) = G_{m,n}(A), identified as the Grassmann space G_{m,n} ⊂ ℝⁿ of the matrix A ∈ ℝ^{n×m}, n > m, rk A = m, we construct the column space R(B) = R(A)⊥ = G_{n−m,n} ⊂ ℝⁿ of the matrix B, which agrees with the orthogonal column space R(A)⊥ of the matrix A. R(A)⊥ is identified as the Grassmann space G_{n−m,n} ⊂ ℝⁿ and is covered by Grassmann coordinates, also called Plücker coordinates p_ij. The matrix B, alternatively the Grassmann coordinates (Plücker coordinates), constitutes the latent restrictions, also called latent condition equations, which control the parameter adjustment and lead to a proper choice of observational weights. Fourth, we reverse our path and go "from B to A": Given the column space R(B) of the matrix of restrictions B ∈ ℝ^{ℓ×n}, ℓ < n, rk B = ℓ, we construct the column space R(B)⊥ = R(A) ⊂ ℝⁿ, the orthogonal column space of the matrix B, which agrees with the column space R(A) of the matrix A. The matrix A, alternatively the Grassmann coordinates (Plücker coordinates) of the matrix B, constitutes the latent parametric equations which are "behind a conditional adjustment". Fifth, we break up the linear model into pieces and introduce the notion of break points and their determination.
The present analysis of partial redundancies and latent restrictions has been pioneered by G. Kampmann (1992), R. Jurisch, G. Kampmann and J. Linke (1999a, b) as well as R. Jurisch and G. Kampmann (2002 a, b). Additional useful references are D. W. Behnken and N. R. Draper (1972), S. Chatterjee and A. S. Hadi (1988), R. D. Cook and S. Weisberg (1982). Multilinear algebra, the operations "join and meet" and the Hodge star operator are reviewed in W. Hodge and D. Pedoe (1968), C. Macinnes (1999), S. Morgera (1992), W. Neutsch (1995), B. F. Doolin and C. F. Martin (1990). A sample reference for break point synthesis is C. H. Mueller (1998), N. M. Neykov and C. H. Mueller (2003) and D. Tasche (2003).

3-31 Canonical analysis of the hat matrix, partial redundancies, high leverage points

A beautiful example for the power of eigenspace synthesis is the least squares fit of a straight line to a set of observations: Let us assume that we have observed a dynamical system y(t) which is represented by a polynomial of degree one with respect to time t.

y (ti ) = 1i x1 + ti x2  i  {1," , n} .

Due to y • (t ) = x2 it is a dynamical system with constant velocity or constant first


derivative with result to time t0. The unknown polynomial coefficients are col-
lected in the column array x = [ x1 , x2 ]c, x  X = R 2 , dim X = 2 and constitute the
coordinates of the two-dimensional parameter space X . For this example we
choose n = 4 observations, namely y = [ y (t1 ), y (t2 ), y (t3 ), y (t4 )]c , y  Y = R 4 ,
dim Y = 4 . The samples of the polynomial are taken at t1 = 1, t2 = 2, t3 = 3 and t4
= a. With such a choice of t4 we aim at modeling the behavior of high leverage
points, e.g. a >> (t1 , t2 , t3 ) or a o f , illustrated by Figure 3.7.

y4 *
y3 *
y (t )
y2
y1 * *

t1 = 1 t2 = 2 t3 = 3 t4 = a
t
Figure 3.7: Graph of the function y(t), high leverage point t4=a
Box 3.15 summarizes the right eigenspace analysis of the hat matrix H_y := A(A′A)^{-1}A′. First, we have computed the spectrum of A′A and (A′A)^{-1} for the given matrix A ∈ ℝ^{4×2}, namely the eigenvalues squared λ²₁,₂ = 59 ± √3281. Note the leverage point t₄ = a = 10. Second, we computed the right eigencolumns v₁ and v₂ which constitute the orthonormal matrix V ∈ SO(2). The angular representation of the orthonormal matrix V ∈ SO(2) follows: Third, we take advantage of the sine-cosine representation (3.85) of V ∈ SO(2), the special orthonormal group over ℝ². Indeed, we find the angular parameter γ = 81°53′25.4″. Fourth, we are going to represent the hat matrix H_y in terms of the angular parameter, namely (3.86)-(3.89). In this way, the general representation (3.90) is obtained, illustrated by four cases. (3.86) is a special case of the general angular representation (3.90) of the hat matrix H_y. Fifth, we sum up the canonical representation AVΛ^{-2}V′A′ (3.91) of the hat matrix H_y, also called right eigenspace synthesis. Note the rank of the hat matrix, namely rk H_y = rk A = m = 2, as well as the peculiar fourth adjusted observation

$\hat y_4 = y_4(\mathrm{I\text{-}LESS}) = \tfrac{1}{100}(-11\,y_1 + y_2 + 13\,y_3 + 97\,y_4),$

which highlights the weight of the leverage point t₄. This analysis will be more pronounced if we go through the same type of right eigenspace synthesis for the leverage point t₄ = a, a → ∞, outlined in Box 3.18.

Box 3.15
Right eigenspace analysis
of a linear model of an univariate
polynomial of degree one
- high leverage point a =10 -
“Hat matrix H y = A( A cA) 1 A = AVȁ 2 V cAc ”
ª A cAV = Vȁ 2
«
right eigenspace analysis: «subject to
« VV c = I 2
¬

ª1 1 º
«1 2 »
A := « » , A cA = ª 4 16 º , ( AA) 1 = 1 ª 57 8º
«1 3 » « » 100 ¬« 8 2 ¼»
¬16 114¼
« »
¬1 10 ¼

spec( A cA) = {O12 , O 22 } : A cA  O 2j I 2 = 0,  j  {1, 2} œ

4  O2 16
2
= 0 œ O 4  118O 2 + 200 = 0
16 114  O
2
O1,2 = 59 ± 3281 = 59 ± 57.26 = 0

spec( A cA ) = {O12 , O 22 } = {116.28, 1.72}


versus
1 1
spec( A cA) 1 = { , } = {8.60 *103 , 0.58}
O12 O 22

ª( A cA  O 2j I 2 )V = 0
«
right eigencolumn analysis: «subject to
« VV c = I
¬ 2

ªv º
(1st) ( A cA  O12 I ) « 11 » = 0 subject to v112 + v21
2
=1
v
¬ 21 ¼

(4  O12 )v11 + 16v21 = 0 º


»Ÿ
v112 + v21
2
=1 »¼

16
v11 = + v112 = = 0.141
256 + (4  O12 ) 2

4  O12
v21 = + v21
2
= = 0.990
256 + (4  O12 ) 2

ªv º
(2nd) ( A cA  O 22 I 2 ) « 12 » = 0 subject to v122 + v22
2
=1
v
¬ 22 ¼

(4  O 22 )v12 + 16v22 = 0 º
»Ÿ
v122 + v222
=1 ¼
16
v12 = + v122 = = 0.990
256 + (4  O 22 ) 2

4  O 22
v22 = + v22
2
= = 0.141
256 + (4  O 22 ) 2

spec( A cA) = {116.28, 1.72}


right eigenspace: spec( A cA) 1 = {8.60 *103 , 0.58}
ªv v12 º ª 0.141 0.990 º
V = « 11 =  SO(2)
¬ v21 v22 »¼ «¬ 0.990 0.141»¼

V  SO(2) := {V  R 2×2 VV c = I 2 , V = 1}

“Angular representation of V  SO(2) ”


ª cos J sin J º ª 0.141 0.990º
V=« »=« » (3.85)
¬  sin J cos J ¼ ¬ 0.990 0.141¼
sin J = 0.990, cos J = 0.141, tan J = 7.021

J=81o.890,386 = 81o53’25.4”
hat matrix H y = A( A cA) 1 Ac = AVȁ 2 V cAc

ª 1 1 1 1 º
« O 2 cos J + O 2 sin J ( O 2 + O 2 ) sin J cos J »
2 2

( A cA) 1 = V/ 2 V = « 1 2 1 2 » (3.86)
« 1 1 1 1 »
«( 2 + 2 ) sin J cos J sin J + 2 cos J »
2 2
2
¬ O1 O 2 O1 O2 ¼
m=2
1
( A cA) j 1j =
1 2 ¦O 2
cos J j j cos J j
1 3 2 j3
(3.87)
j3 =1 j3

subject to
m=2
VV c = I 2 ~ ¦ cos J j1 j3 cos J j 2 j3
= Gj j 1 2
(3.88)
j3 =1

case 1: j1=1, j2=1: case 2: j1=1, j2=2:

cos 2 J11 + cos 2 J12 = 1 cos J11 cos J 21 + cos J12 cos J 22 = 0
(cos 2 J + sin 2 J = 1) ( cos J sin J + sin J cos J = 0)

case 3: j1=2, j2=1: case 4: j1=2, j2=2:


cos J 21 cos J11 + cos J 22 cos J12 = 0 cos 2 J 21 + cos 2 J 22 = 1
( sin J cos J + cos J sin J = 0) (sin 2 J + cos 2 J = 1)

( A cA) 1 =
ª O12 cos 2 J 11 + O22 cos 2 J 12 O12 cos J 11 cos J 21 + O22 cos J 12 cos J 22 º
« 2 »
¬O1 cos J 21 cos J 11 + O2 cos J 22 cos J 12 O12 cos 2 J 21 + O22 cos 2 J 22
2
¼
(3.89)
m=2
1
H y = AVȁ 2 V cA c ~ hi i = ¦ ai j ai 2 j2
cos J j j cos J j 2 j3
(3.90)
12
j1 , j2 , j3 =1
1 1
O j2 3
1 3

H y = A( A cA) 1 Ac = AVȁ 2 V cAc (3.91)

ª 0.849 1.131 º
« 1.839 1.272 »» 2
A ~ := AV = « , ȁ = Diag(8.60 × 103 , 0.58)
« 2.829 1.413 »
« »
¬ 9.759 2.400 ¼

ª 43 37 31 11º
« »
1 « 37 33 29 1 »
H y = A ~ ȁ 2 ( A ~ )c =
100 « 31 29 27 13 »
« »
¬« 11 1 13 97 ¼»
rk H y = rk A = m = 2

1
yˆ 4 = y4 (I -LESS) = ( 11 y1 + y2 + 13 y3 + 97 y4 ) .
100
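The entries of Box 3.15 are easily verified numerically. A minimal sketch (NumPy) reproduces the spectrum {116.28, 1.72} = 59 ± √3281 of A′A, the hat matrix H_y scaled by 100, the partial hat h₄₄ = 0.97 of the leverage point and the trace tr H_y = rk A = 2.

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0, 10.0])             # sampling epochs, leverage point t4 = 10
A = np.column_stack([np.ones_like(t), t])

N = A.T @ A                                     # [[4, 16], [16, 114]]
print(np.round(np.linalg.eigvalsh(N)[::-1], 2)) # -> [116.28   1.72] = 59 +- sqrt(3281)

Hy = A @ np.linalg.inv(N) @ A.T                 # hat matrix
print(np.round(100 * Hy).astype(int))
# -> [[ 43  37  31 -11]
#     [ 37  33  29   1]
#     [ 31  29  27  13]
#     [-11   1  13  97]]
print(np.round(np.diag(Hy), 2), np.round(np.trace(Hy), 3))   # h44 = 0.97, trace = 2
```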

By means of Box 3.16 we repeat the right eigenspace analysis for one leverage point t₄ = a, later on a → ∞, for both the hat matrix H_x := (A′A)^{-1}A′ and H_y := A(A′A)^{-1}A′. First, H_x is the linear operator producing x̂ = x_ℓ(I-LESS). Second, H_y as linear operator generates ŷ = y_ℓ(I-LESS). Third, the complementary operator I₄ − H_y =: R, the matrix of partial redundancies, leads us to the inconsistency vector î = i_ℓ(I-LESS). The structure of the redundancy matrix R, rk R = n − m, is most remarkable. Its diagonal elements will be interpreted soonest. Fourth, we have computed the length of the inconsistency vector ‖î‖², the quadratic form y′Ry.
The highlight of the analysis of hat matrices is set by computing

1st: H_x(a → ∞) versus 2nd: H_y(a → ∞), and 3rd: R(a → ∞) versus 4th: ‖î(a → ∞)‖²

for the "highest leverage point" a → ∞, in detail reviewed in Box 3.17. Please notice the two unknowns x̂₁ and x̂₂ as best approximations of type I-LESS. x̂₁ resulted in the arithmetic mean of the first three measurements. The point y₄, t₄ = a → ∞, had no influence at all. Here, x̂₂ = 0 was found. The hat matrix H_y(a → ∞) has produced partial hats h₁₁ = h₂₂ = h₃₃ = 1/3, but h₄₄ = 1 if a → ∞. The best approximations of the I-LESS observations were ŷ₁ = ŷ₂ = ŷ₃, the arithmetic mean of the first three observations, but ŷ₄ = y₄ has been a reproduction of the fourth observation. Similarly the redundancy matrix R(a → ∞) produced the weighted means î₁, î₂ and î₃. The partial redundancies r₁₁ = r₂₂ = r₃₃ = 2/3, r₄₄ = 0 sum up to r₁₁ + r₂₂ + r₃₃ + r₄₄ = n − m = 2. Notice the value î₄ = 0: The observation indexed four is left uncontrolled.
Box 3.16
The linear model of a univariate
polynomial of degree one
- one high leverage point -
ª y1 º ª1 1º ª i1 º
« y » «1 2 » ª x1 º ««i2 »»
»
y = Ax + i ~ « 2 » = « +
« y3 » «1 3 » «¬ x2 »¼ « i3 »
« » « » « »
¬« y4 ¼» ¬«1 a ¼» ¬«i4 ¼»

x  R 2 , y  R 4 , A  R 4× 2 , rk A = m = 2

dim X = m = 2 versus dim Y = n = 4

(1st) xˆ = xA (I -LESS) = ( A cA) 1 A cy = H x y (3.92)



ª y1 º
ª8  a + a « »
1 2
2  2a + a 2
4  3a + a 2
14  6a º « y2 »
Hx = « »
18  12a + 3a 2 ¬ 2  a 2a 6a 6 + 3a ¼ « y3 »
« »
«¬ y4 »¼

(2nd ) yˆ = y A (I -LESS) = A c( A cA) 1 A cy = H y y (3.93)

“hat matrix”: H y = A c( A cA) 1 A c, rk H y = m = 2

ª 6  2a + a 2 4  3a + a 2 2  4a + a 2 8  3a º
« »
« 4  3a + a 6  4a + a 8  5a + a
2 2 2
1 2 »
Hy =
18  18a + 3a 2 « 2  4a + a 2 6  5a + a 2 14  6a + a 2 4 + 3a »
« »
¬« 8  3a 2 4 + 3a 14  12a + 3a 2 ¼»

(3rd) ˆi = i A (I -LESS) = (I 4  A( A cA) 1 A c) y = Ry

“redundancy matrix”: R = I 4  A( AcA) 1 Ac, rk R = n  m = 2

“redundancy”: n – rk A = n – m = 2

ª12  10a + 2a 2 4 + 3a  a 2 2 + 4 a  a 2 8 + 3a º
« »
« 4 + 3a  a 12  6a + 2a 2 8 + 5a  a 2
2
1 2 »
R=
18  12a + 3a 2 « 2 + 4a  a 2 8 + 5a  a 2 4  6a + 2a 2 4  3a »
« »
«¬ 8 + 3a 2 4  3a 4 »¼

(4th) || ˆi ||2 =|| i A (I -LESS) ||2 = y cRy .

At this end we shall compute the LESS fit

lim || iˆ(a ) ||2 ,


a of

which turns out to be independent of the fourth observation.


Box 3.17
The linear model of a univariate polynomial of degree one
- extreme leverage point a o f -
(1st ) H x (a o f)
ª8 1 2 2 4 3 14 6 º
1 « a2  a + 1 + a2  a + 1  a2  a + 1 a2  a »
Hx = « »
18 12 « 2 1 2 1 6 1 6 3»
 + 3 + 2 + 2  2+
a2 a «¬ a 2 a a a a a a a »¼

1 ª1 1 1 0 º
lim H x = « Ÿ
aof 3 ¬0 0 0 0 »¼

1
xˆ1 = ( y1 + y2 + y3 ), xˆ2 = 0
3
(2nd ) H y (a o f)
ª6 2 4 3 2 4 8 3 º
« a2  a + 1 a2  a + 1 a 2
 +1
a

a2 a »
« »
« 4  3 +1 6  4 +1 8 5
 +1
2 »
1 « a2 a a2 a a 2
a a 2 »
Hy = « »
18 12 2 4 8 5 14 6 4 3 »
 + 3 « 2  +1 2  +1  +1  2 +
a2 a «a a a a a2 a a a »
« 8 3 2 4 3 14 12 »
« 2  2+  + 3»
¬ a a a2 a a a2 a ¼

ª1 1 1 0º
« 0 »»
1 1 1 1
lim H y = « , lim h44 = 1 Ÿ
a of 3 «1 1 1 0 » a of
« »
¬«0 0 0 3 ¼»

1
yˆ1 = yˆ 2 = yˆ 3 = ( y1 + y2 + y3 ), yˆ 4 = y4
3
(3rd ) R (a o f)
ª 10 10 4 3 2 4 8 3º
« a2  a + 2  a2 + a 1  a2 + a 1  a2 + a »
« »
«  4 + 3  1 12  8 + 2  8 + 5  1  2
2 »
1 « a2 a a2 a a2 a a »
R= « »
18 12 2 4 8 5 4 6 4 3 »
 + 3 «  +  1  +  1  + 2 
a2 a « a2 a a2 a a2 a a2 a »
« 8 3 2 4 3 4 »
«  2+  2  »
¬ a a a a2 a a2 ¼

ª 2 1 1 0º
« 0 »»
1 1 2 1
lim R (a ) = « .
a of 3 « 1 1 2 0»
« »
¬0 0 0 0¼

1 1 1
iˆ1 = (2 y1  y2  y3 ), iˆ2 = ( y1 + 2 y2  y3 ), iˆ3 = ( y1  y2 + 2 y3 ), iˆ4 = 0
3 3 3

(4th ) LESS fit : || iˆ ||2

ª 2 1 1 0º
« 1 2 1 0 »»
1
lim || iˆ(a ) ||2 = y c « y
a of 3 « 1 1 2 0»
« »
«¬ 0 0 0 0 »¼

1
lim || iˆ(a ) ||2 = (2 y12 + 2 y22 + 2 y32  2 y1 y2  2 y2 y3  2 y3 y1 ) .
aof 3
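The limit behaviour of the partial redundancies in Box 3.17 can be observed numerically by letting the leverage point grow. The following sketch (NumPy) evaluates diag R for increasing a and shows r₁₁, r₂₂, r₃₃ → 2/3 and r₄₄ → 0, i.e. the fourth observation escapes any control.

```python
import numpy as np

def redundancy_matrix(a):
    t = np.array([1.0, 2.0, 3.0, a])
    A = np.column_stack([np.ones_like(t), t])
    return np.eye(4) - A @ np.linalg.inv(A.T @ A) @ A.T

# partial redundancies r_ii for a growing leverage point t4 = a
for a in (10.0, 100.0, 1e4, 1e6):
    print(a, np.round(np.diag(redundancy_matrix(a)), 4))
# r11, r22, r33 -> 2/3 and r44 -> 0, exactly the limit matrix of Box 3.17;
# their sum stays equal to the redundancy n - m = 2
```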
A fascinating result is achieved upon analyzing the right eigenspace of the hat matrix H_y(a → ∞). First, we computed the spectrum of the matrices A′A and (A′A)^{-1}. Second, we proved λ₁²(a → ∞) = ∞, λ₂²(a → ∞) = 3, or λ₁^{-2}(a → ∞) = 0, λ₂^{-2}(a → ∞) = 1/3.

Box 3.18
Right eigenspace analysis
of a linear model of a univariate polynomial of degree one
- extreme leverage point a o f -
“Hat matrix H y = A c( A cA) 1 A c ”

ª A cAV = Vȁ 2
«
right eigenspace analysis: «subject to
« VV c = I mc
¬

spec( A cA) = {O12 , O22 } : A cA  O j2 I = 0  j  {1, 2} œ

4  O2 6+a
= 0 œ O 4  O 2 (18 + a 2 ) + 20  12a + 3a 2 = 0
6 + a 14 + a  O
2 2

1
2
O1,2 = tr ( A cA) ± (tr A cA) 2  4 det A cA
2
tr A cA = 18 + a 2 , det A cA = 20  12a + 3a 3

(tr A cA) 2  4 det AcA = 244 + 46a + 25a 2 + a 4



a2 a4
2
O1,2 = 9+ ± 61 + 12a + 6a 2 +
2 4
spec( A cA) = {O1 , O2 } =
2 2

­° a 2 a4 a2 a4 ½°
= ®9 + + 61 + 12a + 6a 2 + , 9 +  61 + 12a + 6a 2 + ¾
°¯ 2 4 2 4 °¿
“inverse spectrum ”
1 1
spec( A cA) = {O12 , O22 } œ spec( A cA) 1 = { 2 , 2 }
O1 O2
a2 a4 9 1 61 12 6 1
9+  61 + 12a + 6a 2 + 2
+  4+ 3+ 2+
1 2 4 =a 2 a a a 4
=
O22 20  12a + 3a 2 20 12
 +3
a2 a
1
lim =0
a of O 2
1

a2 a4 9 1 61 12 6 1
9+ + 61 + 12a + 6a 2 + 2
+ + + + +
1
= 2 4 = a 2 a 4 a3 a 2 4
O12 20  12a + 3a 2 20 12
 +3
a2 a
1 1
lim =
a of O 2 3
2

1
lim spec( A cA)(a) = {f,3} œ lim spec( A cA) 1 = {0, }
aof aof 3
A cAV = Vȁ º 2

» Ÿ A cA = Vȁ V c œ ( A cA ) = Vȁ V c
2 1 2

VV c = I m ¼
“Hat matrix H y = AVȁ 2 V cA c ”.
3-32 Multilinear algebra, "join" and "meet", the Hodge star operator

Before we can analyze the matrices "hat H_y" and "red R" in more detail, we have to listen to an "intermezzo" entitled multilinear algebra, "join" and "meet" as well as the Hodge star operator. The Hodge star operator will lay down the foundation of "latent restrictions" within our linear model and of Grassmann coordinates, also referred to as Plücker coordinates.
Box 3.19 summarizes the definitions of multilinear algebra, the relations "join" and "meet", denoted by "∧" and "∗", respectively. In terms of orthonormal base vectors e_{i₁}, ..., e_{i_m} we introduce by (3.94) the exterior product e_{i₁} ∧ ⋯ ∧ e_{i_m}, also known as "join", "skew product" or 1st Grassmann relation. Indeed, such an exterior product is antisymmetric as defined by (3.95), (3.96), (3.97) and (3.98). The examples show e₁ ∧ e₂ = −e₂ ∧ e₁ and e₁ ∧ e₁ = 0, e₂ ∧ e₂ = 0. Though the operation "join", namely the exterior product, can be digested without too much of an effort, the operation "meet", namely the Hodge star operator, needs much more attention. Loosely speaking, the Hodge star operator or 2nd Grassmann relation is a generalization of the conventional "cross product" symbolized by "×". Let there be given an exterior form of degree m as an element of Λᵐ(ℝⁿ) over the field of real numbers. Then the "Hodge ∗" transforms the input exterior form of degree m into the output exterior form of degree n − m, namely an element of Λ^{n−m}(ℝⁿ):

Input: X ∈ Λᵐ(ℝⁿ) → Output: ∗X ∈ Λ^{n−m}(ℝⁿ).

Applying the summation convention over repeated indices, (3.100) introduces the input operation "join", while (3.101) provides the output operation "meet". We say that ∗X, (3.101), is a representation of the adjoint form based on the original form X, (3.100). The Hodge dualizer is a complicated exterior form (3.101) which is based upon Levi-Civita's symbol of antisymmetry (3.102), illustrated by three examples. ε_{k₁⋯k_ℓ} is also known as the permutation operator. Unfortunately, we have no space and time to go deeper into "join" and "meet". Instead we refer to those excellent textbooks on exterior algebra and exterior analysis, differential topology, in short exterior calculus.
Box 3.19
“join and meet”
Hodge star operator “ š, ”
I := {i1 ," , ik , ik +1 ," , in }  {1," , n}

“join”: exterior product, skew product,


1st Grassmann relation
ei "i := ei š " š e j š e j +1 š " š ei
1 m 1 m
(3.94)

“antisymmetry”:
ei ...ij ...i =  ei ... ji...i i z j
1 m 1 m
(3.95)
ei š ... š e j š e j š ... š ei = ei š ... š e j š e j š ... š ei
1 k k +1 m 1 k +1 k m
(3.96)
ei "i i "i = 0 i = j
1 i j m
(3.97)
ei š " ei š e j š " š ei = 0 i = j
1 m
(3.98)
Example: e1 š e 2 = e 2 š e1 or e i š e j = e j š e i i z j

Example: e1 š e1 = 0, e 2 š e 2 = 0 or e i š e j = 0 i = j
“meet”: Hodge star operator,
Hodge dualizer
2nd Grassmann relation

: ȁ m ( R n ) o n  m ȁ ( R n ) (3.99)

“a m degree exterior form X  ȁ m ( R n ) over R n is related to a n-m de-


gree exterior form *X called the adjoint form”
:summation convention:
“sum up over repeated indices”
input: “join”
1
X= e i š " š e i X i "i 1 m
(3.100)
m! 1 m

output: “meet”
1
*X := g e j š" š e j H i "i j " j Xi "i 1 m
(3.101)
m !(n  m)! 1 nm 1 m 1 nm

antisymmetry operator ( “Eddington’s epsilons” ):


ª +1 for an even permutation of the indices k1 " kA
H k "k
1 A
:= «« 1 for an oded permutation of the indices k1 " kA (3.102)
«¬ 0 otherwise (for a repetition of the indices).

Example: H123 = H 231 = H 312 = +1


Example: H 213 = H 321 = H132 = 1
Example: H112 = H 223 = H 331 = 0.

For our purposes two examples on “Hodge’s star” will be sufficient for
the following analysis of latent restrictions in our linear model. In all
detail, Box 3.20 illustrates “join and meet” for
: ȁ 2 ( R 3 ) o ȁ 1 ( R 3 ) .

Given the exterior product a š b of two vectors a and b in R 3 with


ai 1 = col1 A, ai 2 = col 2 A
1 2

as their coordinates, the columns of the matrix A with respect to the or-
thonormal frame of reference {e1 , e 2 , e 3 |0} at the origin 0.
n =3
ašb = ¦e i1 š ei ai 1ai 2  ȁ 2 (R 3 )
2 1 2
i1 ,i2 =1

is the representation of the exterior form a š b =: X in the multibasis ei i = ei š ei . 12 1 2


By cyclic ordering, (3.105) is an explicit write-up of a š b  R ( A) . Please,
notice that there are

§n · §3·
¨ ¸=¨ ¸=3
© m¹ © 2¹
subdeterminants of A . If the determinant of the matrix G = I 4 , det G = 1
g = 1 , then according to (3.106), (3.107)
(a š b)  R ( A) A = G1,3

represent the exterior form *X , which is an element of R ( A) called Grassmann


space G1,3 . Notice that (a š b) is a vector whose Grassmann coordinate
(Plücker coordinate) are
§n · §3·
¨ ¸=¨ ¸=3
© m¹ © 2¹
subdeterminants of the matrix A, namely
a21a32  a31a22 , a31a12  a11a32 , a11a23  a21a12 .

Finally, (3.108) (e 2 š e 3 ) = e 2 × e 3 = e1 for instance demonstrates the relation


between " š, " called “join, meet” and the “cross product”.
Box 3.20
The first example:
“join and meet”
: ȁ 2 (R 3 ) o ȁ1 (R 3 )
Input: “join”
n =3 n =3
a = ¦ ei ai 1 ,
1 1
b =¦ ei ai 2 2 2 (3.103)
i =1 i =1

ai 1 = col1 A; ai 2 = col 2 A
1 2

1 n =3
ašb = ¦ ei šei ai 1ai 2  ȁ 2 (R 3 ) (3.104)
2! i ,i =1
1 2
1 2 1 2

“cyclic order
1
ašb = e 2 š e3 (a21a32  a31a22 ) +
2!
1
+ e3 š e1 (a31a12  a11a32 ) + (3.105)
2!
1
+ e1 š e 2 (a11a23  a21a12 )  R ( A ) = G 2,3 .
2!

Output: “meet” ( g = 1, G y = I 3 , m = 2, n = 3, n  m = 1)
n=2
1
(a š b) = ¦ e j H i ,i , j ai 1ai 2 (3.106)
i ,i , j =1 2!
1 2 1 2
1 2

1
*(a š b) = e1 ( a21a32  a31a22 ) +
2!
1
+ e 2 (a31a12  a11a32 ) + (3.107)
2!
1
+ e3 ( a11a23  a21a12 )  R A ( A ) = G1,3
2!

§n · §3·
¨ ¸ = ¨ ¸ subdeterminant of A
© m¹ © 2¹
Grassmann coordinates (Plücker coordinates)
(e 2 š e3 ) = e1 , (e3 š e1 ) = e 2 , (e1 š e 2 ) = e3 . (3.108)
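For ∗: Λ²(ℝ³) → Λ¹(ℝ³) the Grassmann (Plücker) coordinates of a ∧ b are the (3 choose 2) = 3 subdeterminants of A = [a, b], and the Hodge dual is the ordinary cross product. A minimal sketch (NumPy), here fed with the columns of the leveling design matrix of Example 3.3, shows that ∗(a ∧ b) spans R(A)⊥ and reproduces the holonomity condition [1, 1, 1].

```python
import numpy as np
from itertools import combinations

a = np.array([1.0, 0.0, -1.0])         # columns of the leveling design matrix A (Example 3.3)
b = np.array([0.0, 1.0, -1.0])
A = np.column_stack([a, b])

# Grassmann (Pluecker) coordinates: the (3 choose 2) = 3 subdeterminants of A
p = np.array([np.linalg.det(A[list(rows), :]) for rows in combinations(range(3), 2)])
print(p)                               # p_12, p_13, p_23

# the Hodge dual *(a ^ b) collects them (with signs) into the cross product
print(np.cross(a, b))                  # -> [1. 1. 1.], spans R(A)-orthogonal = G_{1,3}
print(np.round(A.T @ np.cross(a, b), 12))   # orthogonality check: = 0
```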

Alternatively, Box 3.21 illustrates “join and meet” for selfduality


: ȁ 2 ( R 4 ) o ȁ 2 ( R 4 ) .

Given the exterior product a š b of two vectors a  R 4 and b  R 4 , namely the


two column vectors of the matrix A  R 4× 2 ,
ai 1 = col1 A, ai 2 = col 2 A
1 2

as their coordinates with respect to the orthonormal frame of reference


{e1 , e 2 , e 3 , e 4 | 0 } at the origin 0.
n=4
ašb = ¦e i1 š ei 2
ai 1ai 2  ȁ 2 (R 4 )
1 2
i1 ,i2 =1

is the representation of the exterior form a š b := X in the multibasis


ei i = ei š ei . By lexicographic ordering, (3.111) is an explicit write-up of
12 1 2

a š b ( R ( A)) . Notice that these are


§ n · § 4·
¨ ¸=¨ ¸=6
© m¹ © 2¹
subdeterminants of A . If the determinant of the matrix G of the metric is one
G = I 4 , det G = g = 1 , then according to (3.112), (3.113)
(a š b)  R ( A) A =: G 2,4

represents the exterior form X , an element of R ( A) A , called Grassmann space


G 2,4 . Notice that (a š b) is an exterior 2-form which has been generated by an
exterior 2-form, too. Such a relation is called “selfdual”. Its Grassmann coordi-
nates (Plücker coordinates) are

§ n · § 4·
¨ ¸=¨ ¸=6
© m¹ © 2¹
subdeterminants of the matrix A, namely
a11a12  a21a12 , a11a32  a31a22 , a11a42  a41a12 ,
a21a32  a31a22 , a21a42  a41a22 , a31a41  a41a32 .
Finally, (3.113), for instance (e1 š e 2 ) = e3 š e 4 , demonstrates the operation
" š, " called “join and meet”, indeed quite a generalization of the “cross prod-
uct”.
Box 3.21
The second example
“join and meet”
: / 2 ( R 4 ) o / 2 ( R 4 )
“selfdual”
Input : “join”
n=4 n=4
a = ¦ ei ai 1 , b = ¦ ei ai
1 1 2 2 2 (3.109)
i1 =1 i2 =1

(ai 1 = col1 ( A), ai 2 = col 2 ( A))


1 2

1 n=4
ašb = ¦ ei šei ai 1ai 2  ȁ 2 (R 4 ) (3.110)
2! i ,i =1
1 2
1 2 1 2

“lexicographical order”
1
ašb = e1 š e 2 ( a11a22  a21a12 ) +
2!
1
+ e1 š e 3 ( a11a32  a31a22 ) +
2! (3.111)
1
+ e1 š e 4 (a11a42  a41a12 ) +
2!
158 3 The second problem of algebraic regression

1
+ e 2 š e3 (a21a32  a31a22 ) +
2!
1
+ e 2 š e 4 (a21a42  a41a22 ) +
2!
1
+ e3 š e 4 (a31a42  a41a32 ) R ( A) A = G 2,4
2!

§ n · § 4·
¨ ¸ = ¨ ¸ subdeterminants of A:
© m¹ © 2¹
Grassmann coordinates ( Plücker coordinates).

Output: “meet” g = 1, G y = I 4 , m = 2, n = 4, n  m = 2
1 n=4 1
(a š b) = ¦ e j š e j Hi i
1 2 j1 j2
ai 1ai 2
2! i ,i , j , j
1 2 1 2 =1 2!
1 2 1 2

1
= e3 š e 4 (a11a22  a21a12 ) +
4
1
+ e 2 š e 4 (a11a32  a31a22 ) +
4
1
+ e3 š e 2 (a11a42  a41a12 ) + (3.112)
4
1
+ e 4 š e1 (a21a32  a31a22 ) +
4
1
+ e3 š e1 (a21a22  a41a22 ) +
4
1
+ e1 š e 2 (a31a42  a41a32 )  R ( A) A = G 2,4
4
§ n · § 4·
¨ ¸ = ¨ ¸ subdeterminants of A :
© m¹ © 2¹
Grassmann coordinates (Plücker coordinates).
(e1 š e 2 ) = e3 š e 4 , (e1 š e3 ) = e 2 š e 4 , (e1 š e 4 ) = e3 š e 2 ,
(e 2 š e3 ) = e 4 š e1 , (e 2 š e 4 ) = e3 š e1 , (3.113)
(e3 š e 4 ) = e1 š e 2 .

3-33 From A to B: latent restrictions, Grassmann coordinates, Plücker


coordinates.
Before we return to the matrix A  R 4× 2 of our case study, let us analyze the
matrix A  R 2×3 of Box 3.22 for simplicity. In the perspective of the example of
3-3 Case study 159

our case study we may say that we have eliminated the third observation, but
kept the leverage point. First, let us go through the routine to compute the hat
matrices H x = ( A c A) 1 A c and H y = A( A c A) 1 A c , to be identified by (3.115)
and (3.116). The corresponding estimations xˆ = x A (I -LESS) , (3.116), and
y = y A (I -LESS) , (3.118), prove the different weights of the observations
( y1 , y2 , y3 ) influencing x̂1 and x̂2 as well as ( yˆ1 , yˆ 2 , yˆ3 ) . Notice the great weight
of the leverage point t3 = 10 on ŷ3 .
Second, let us interpret the redundancy matrix R = I 3  A( AcA) 1 Ac , in particu-
lar the diagonal elements.
A cA (1) 64 A cA (2) 81 A cA (3) 1
r11 = = , r22 = = , r33 = = ,
det A cA 146 det AcA 146 det AcA 146
n =3
1
tr R = ¦ (AcA)(i ) = n  rk A = n  m = 1,
det A cA i =1
the degrees of freedom of the I 3 -LESS problem. There, for the first time, we
meet the subdeterminants ( A cA )( i ) which are generated in a two step procedure.

“First step” “Second step”


eliminate the ith row from A as well as compute the determinant A c( i ) A ( i ) .
the ith column of A.
Example : ( A cA)1

1 1
1 2 ( A cA )(1) = det A c(1) A (1) = 64
A c(1) A (1)
1 10 det A cA = 146

1 1 1 2 12
1 2 10 12 104

Example: ( AcA) 2

1 1
1 2 ( A cA )(2) = det A c(2) A (2) = 81
A c( 2) A ( 2)
1 10 det A cA = 146

1 1 1 2 11
1 2 10 11 101
160 3 The second problem of algebraic regression

Example: ( AcA)3

1 1
A c(3) A (3) 1 2 ( A cA )(3) = det A c(3) A (3) = 1
1 10 det A cA = 146
1 1 1 2 3
1 2 10 3 5

Obviously, the partial redundancies (r11 , r22 , r33 ) are associated with the influence
of the observation y1, y2 or y3 on the total degree of freedom. Here the observa-
tion y1 and y2 had the greatest contribution, the observation y3 at a leverage point
a very small influence.
The redundancy matrix R, properly analyzed, will lead us to the latent restric-
tions or “from A to B”. Third, we introduce the rank partitioning R = [ B, C] ,
rk R = rk B = n  m = 1, (3.120), of the matrix R of spatial redundancies. He-
re, b  R 3×1 , (3.121), is normalized to generate b = b / || b || 2 , (3.122). Note,
C  R 3× 2 is a dimension identity. We already introduced the orthogonality condi-
tion
bcA = 0 or bcAxA = bcyˆ = 0
(b )cA = 0 or (b )cAxA = (b )cyˆ = 0,

which establishes the latent restrictions (3.127)


8 yˆ1  9 yˆ 2 + yˆ 3 = 0.

We shall geometrically interpret this essential result as soon as possible. Fourth,


we aim at identifying R ( A) and R ( A) A for the linear model {Ax + i = y,
A  R n ×m , rk A = m = 2}
ª1º
wy y « »
t1 := = [e1 , e 2 , e3 ] «1» ,
y y

wx1
«¬1»¼
ª1 º
wy
= [e1 , e 2 , e3 ] « 2 »» ,
y «
t2 :=
y y

wx 2
«¬10 »¼

as derivatives of the observation functional y = f (x1 , x 2 ) establish the tangent


vectors which span a linear manifold called Grassmann space.
G 2,3 = span{t1 , t2 }  R 3 ,
3-3 Case study 161

in short GRASSMANN (A). Such a notation becomes more obvious if we com-


pute
ª a11 x1 + a12 x2 º n =3 m = 2
y = [e1 , e 2 , e3 ] « a21 x1 + a22 x2 »» = ¦ ¦ eiy aij x j ,
y y y «

«¬ a31 x1 + a32 x2 »¼ i =1 j =1
ª a11 º n =3
wy
(x1 , x 2 ) = [e1 , e 2 , e3 ] « a21 »» = ¦ eiy ai1
y y y «

wx1
«¬ a31 »¼ i =1
ª a12 º n =3
wy
(x1 , x 2 ) = [e1 , e 2 , e3 ] « a22 »» = ¦ eiy ai2 .
y y y «

wx 2
«¬ a32 »¼ i =1

Indeed, the columns of the matrix A lay the foundation of GRASSMANN (A).
Five, let us turn to GRASSMANN (B) which is based on the normal space
R ( A) A . The normal vector n = t1 × t 2 = (t1 š t 2 ) which spans GRASSMANN
(B) is defined by the “cross product” identified by " š, " , the skew product
symbol as well as the Hodge star symbol. Alternatively, we are able to represent
the normal vector n, (3.130), (3.132), (3.133), constituted by the columns {col1A,
col2A} of the matrix, in terms of the Grassmann coordinates (Plücker coordi-
nates).
a a22 a a32 a a12
p23 = 21 = 8, p31 = 31 = 9, p12 = 11 = 1,
a31 a32 a11 a12 a21 a22

identified as the subdeterminants of the matrix A, generated by


n =3

¦ (e
i1 ,i2 =1
i1 š ei )ai 1ai 2 .
2 1 2

If we normalize the vector b to b = b / || b ||2 and the vector n to n = n / || n ||2 ,


we are led to the first corollary b = n . The space spanned by the normal vector
n, namely the linear manifold G1,3  R 3 defines GRASSMANN (B). In exterior
calculus, the vector built on Grassmann coordinates (Plücker coordinates) is
called Grassmann vector g or normalized Grassmann vector g*, here
ª p23 º ª 8 º ª 8º
« » « » g 1 « »
g := p31 = 9 , g :=
= 9 .
« » « » & g & 2 146 « »
«¬ p12 »¼ «¬ 1 »¼ «¬ 1 »¼

The second corollary identifies b = n = g .


162 3 The second problem of algebraic regression

“The vector b which constitutes the latent restriction (latent condition equation)
coincides with the normalized normal vector n  R ( A) A , an element of the
space R ( A) A , which is normal to the column space R ( A) of the matrix A. The
vector b is built on the Grassmann coordinates (Plücker coordinates),
[ p23 , p31 , p12 ]c , subdeterminant of vector g in agreement with b .”

Box 3.22
Latent restrictions
Grassmann coordinates (Plücker coordinates)
the second example
ª y1 º ª1 1 º ª1 1 º
« y » = «1 2 » ª x1 º œ A = «1 2 » , rk A = 2
« 2» « » «x » « »
«¬ y3 »¼ «¬1 10 »¼ ¬ 2 ¼ «¬1 10 »¼

(1st) H x = ( A cA ) 1 A c

1 ª 92 79 25º
H x = ( AcA) 1 Ac = (3.115)
146 «¬ 10 7 17 »¼

1 ª 92 y1 + 79 y2  25 y3 º
xˆ = x A (I  LESS) = (3.116)
146 «¬ 10 y1  7 y2 + 17 y3 »¼

(2nd) H y = A( A cA) 1 A c

ª 82 72 8 º
1 «
H y = ( A cA) Ac =
1
72 65 9 »» , rk H y = rk A = 2 (3.117)
146 «
«¬ 8 9 145»¼

ª82 y1 + 72 y2  8 y3 º
1 «
yˆ = y A (I  LESS) = 72 y1 + 65 y2 + 3 y3 »» (3.118)
146 «
«¬ 8 y1 + 9 y2 + 145 y3 »¼

1
yˆ 3 = (8 y1 + 9 y2 + 145 y3 ) (3.119)
146

(3rd) R = I 3  A( A cA ) 1 Ac
3-3 Case study 163

ª 64 72 8 º
1 «
R = I 3  A( A cA) 1 Ac = « 72 81 9 »» (3.120)
146
«¬ 8 9 1 »¼

64 A cA (1) 81 A cA (2) 1 A cA (3)


r11 = = , r22 = = , r33 = =
146 det A cA 146 det A cA 146 det A cA
n =3
1
tr R = ¦ ( A cA)(i ) = n  rk A = n  m = 1
det A cA i =1
latent restriction

ª 64 72 8º
1 «
R = [B, C] = 72 81 9 » , rk R = 1 (3.120)
146 « »
«¬ 8 9 1»¼

ª 64 º ª 0.438 º
1 «
b := « 72 »» = ««  0.493»» (3.121)
146
«¬ 8 »¼ «¬ 0.053 »¼

ª 8º ª 0.662 º
b 1 « » «
b :=

= « 9 » = « 0.745 »» (3.122)
&b& 146
«¬ 1 »¼ «¬ 0.083 »¼

(3.123) bcA = 0 œ ( b )cA = 0 (3.124)

(3.125) bcyˆ = 0 œ (b )cyˆ = 0 (3.126)

8 yˆ1  9 yˆ 2 + yˆ 3 = 0 (3.127)

" R (A) and R ( A) A :


tangent space Tx M 2 versus normalspace N x M 2 ,
Grassmann manifold G m2,3  R 3 versus Grassmann manifold G1,3
nm  R "
3

ª1º
wy y « »
“the first tangent vector”: t1 := = [e1 , e 2 , e 3 ] 1
y y
(3.128)
wx1 « »
«¬1»¼
164 3 The second problem of algebraic regression

ª1 º
wy
“the second tangent vector”: t 2 := = [e1y , e 2y , e 3y ] « 2 » (3.129)
wx2 « »
«¬10»¼

“ Gm,n ”
G 2,3 = span{t1 , t 2 }  R 3 : Grassmann ( A )
“the normal vector”
n := t1 × t 2 = ( t1 š t 2 ) (3.130)
n =3 n =3
t1 = ¦ ei ai 1 1 1
and t1 = ¦ ei ai 2 2 2 (3.131)
i =1 i =1

n =3 n =3
n= ¦ee i1 i2 ai 1ai 2 =
1 2 ¦ (e i1 š ei )ai 1ai
2 1 2 2 (3.132)
i1 ,i2 =1 i1 ,i2 =1

i, i1 , i2  {1," , n = 3}

n= versus n=
= e 2 × e3 (a21a32  a31a22 ) = (e 2 × e3 )( a21a32  a31a22 ) +
(3.133) +e3 × e1 (a31a12  a11a32 ) + (e3 × e1 )(a31a12  a11a32 ) + (3.134)
+e1 × e 2 (a11a22  a21a12 ) + (e1 × e 2 )( a11a22  a21a12 )

ª (e 2 š e 3 ) = e 2 × e 3 = e1
Hodge star operator : « (e š e ) = e × e = e (3.135)
« 3 1 3 1 2

«¬ (e1 š e 2 ) = e1 × e 2 = e 3

ª8 º
n = t1 × t 2 = ( t1 × t 2 ) = [e , e , e ] « 9 » y y y
(3.136)
« » 1 2 3

«¬1 »¼

ª8 º
n 1 « »
n :=

= [e1 , e 2 , e3 ]
y y y
9 (3.137)
|| n || 146 « »
«¬1 »¼

Corollary: b = n

“Grassmann manifold G n  m ,n “
3-3 Case study 165

G1,3 = span n  R 3 : Grassmann(B)


Grassmann coordinates (Plücker coordinates)

ª1 1º
a a22 a31 a32 a11 a12
A = « 1 2 » , g ( A ) := { 21 , , }=
« » a31 a32 a11 a12 a21 a22
«¬10 10»¼ (3.138)
1 2 1 10 1 1
={ , , } = {8, 9,1}
1 10 1 1 1 2

(cyclic order)
g ( A) = { p23 , p31 , p12 }

p23 = 8, p31 = 9, p12 = 1

ª p23 º ª8 º
Grassmann vector : g := «« p31 »» = «« 9 »» (3.139)
«¬ p12 »¼ ¬«1 ¼»

ª8 º
g 1 « »
normalized Grassmann vector: g :=
= 9 (3.140)
|| g || 146 « »
«¬1 »¼

Corollary : b = n = g . (3.141)

Now we are prepared to analyze the matrix A  R 2× 4 of our case study. Box 3.23
outlines first the redundancy matrix R  R 2× 4 (3.142) used for computing the
inconsistency coordinates iˆ4 = i4 (I  LESS) , in particular. Again it is proven that
the leverage point t4=10 has little influence on this fourth coordinate of the in-
consistency vector. The diagonal elements (r11, r22, r33, r44) of the redundancy
matrix are of focal interest. As partial redundancy numbers (3.148), (3.149),
(3.150) and (3.151)
AA (1) 57 AA ( 2) 67 AA (3) 73 AA ( 4) 3
r11 = = , r22 = = , r33 = = , r44 = = ,
det A cA 100 det A cA 100 det A cA 100 det A cA 100
they sum up to
n=4
1
tr R = ¦ (AcA)(i ) = n  rk A = n  m = 2 ,
det A cA i =1
the degree of freedom of the I 4 -LESS problem. Here for the second time we
meet the subdeterminants ( A cA )( i ) which are generated in a two-step procedure.
166 3 The second problem of algebraic regression

“First step” “Second step”


eliminate the ith row from A as well as compute the determinant of
the ith column of Ac . A c( i ) A ( i )

Box 3.23
Redundancy matrix of a linear model of a uninvariant polynomial of degree
one
- light leverage point a=10 -
“Redundancy matrix R = (I 4  A( A cA) 1 A c) ”

ª 57 37 31 11 º
« »
1 « 37 67 29 1 »
I 4  A( AcA) 1 Ac = (3.142)
100 « 31 29 73 13»
« »
«¬ 11 1 13 3 »¼

iˆ4 = i4 (I -LESS) = Ry (3.143)

1
iˆ4 = i4 (I -LESS) = (11 y1  y2  13 y3 + 3 y4 ) (3.144)
100
57 67 73 3
r11 = , r22 = , r33 = , r44 = (3.145)
100 100 100 100
“rank partitioning”
R  R 4×4 , rk R = n  rk A = n  m = 2, B  R 4×2 , C  R 4×2

R = I 4  A( A cA) 1 A c = [B, C] (3.146)

ª 57 37 º
« 67 »
1 « 37 » , then BcA = 0 ”
“ if B := (3.147)
100 « 31 29 »
« »
¬ 11 1 ¼
A cA (1) A cA ( 2 )
(3.148) r11 = , r22 = (3.149)
det A cA det A cA
c
A A (3) A cA ( 4 )
(3.150) r33 = , r44 = (3.151)
det A cA det A cA
n =4
1
tr R = ¦ (AcA)(i ) = n  rk A = n  m = 2 (3.152)
det A cA i =1
3-3 Case study 167

Example: ( A cA )(1)

1 1
A c(1) A (1) 1 2 ( A cA )(1) =det ( A c(1) A (1) ) =114
1 3
det A cA = 200
1 10

1 1 1 1 3 15
1 2 3 10 15 113

Example: ( A cA)( 2)

1 1
A c( 2) A ( 2) 1 2 ( A cA)( 2) =det ( A c( 2) A ( 2) ) =134
1 3
det A cA = 200
1 10

1 1 1 1 3 14
1 2 3 10 14 110

Example: ( A cA)(3)

1 1
A c(3) A (3) 1 2 ( A cA)(3) =det ( A c(3) A (3) ) =146
1 3
det A cA = 200
1 10

1 1 1 1 3 13
1 2 3 10 13 105

Example: ( A cA)( 4)

1 1
A c( 2) A ( 2) 1 2 ( A cA)( 4) =det ( A c( 4) A ( 4) ) =6
1 3
det A cA = 200
1 10

1 1 1 1 3 6
1 2 3 10 6 10
168 3 The second problem of algebraic regression

Again, the partial redundancies (r11 ," , r44 ) are associated with the influence of
the observation y1, y2, y3 or y4 on the total degree of freedom. Here the observa-
tions y1, y2 and y3 had the greatest influence, in contrast the observation y4 at the
leverage point a very small impact.
The redundancy matrix R will be properly analyzed in order to supply us with
the latent restrictions or the details of “from A to B”. The rank partitioning
R = [B, C], rk R = rk B = n  m = 2 , leads us to (3.22) of the matrix R of partial
redundancies. Here, B  R 4× 2 , with two column vectors is established. Note
C  R 4×2 is a dimension identity. We already introduced the orthogonality condi-
tions in (3.22)
BcA = 0 or BcAxA = Bcy A = 0 ,

which establish the two latent conditions


57 37 31 11
yˆ1  yˆ 2  yˆ 3 + yˆ 4 = 0
100 100 100 100
37 67 29 1
 yˆ1 + yˆ 2  yˆ 3  yˆ 4 = 0.
100 100 100 100
Let us identify in the context of this paragraph R( A) and R ( A) A for the linear
model
{Ax + i := y , A  R n× m , rk A = m = 2} .
The derivatives
ª1º ª1 º
«1» «2»
wy wy
t1 := = [e1y , e 2y , e 3y , e 4y ] « » , t 2 := = [e1y , e 2y , e 3y , e 4y ] « » ,
wx1 «1» wx 2 «3»
« » « »
¬1¼ ¬10¼
of the observational functional y = f (x1 , x 2 ) generate the tangent vectors which
span a linear manifold called Grassmann space
G 2,4 = span{t1 , t 2 }  R 4 ,
in short GRASSMANN (A). An illustration of such a linear manifold is

ª a11 x1 + a12 x2 º
« a x + a x » n=4 m=2
y = [e1y , e 2y , e3y , e 4y ] « 21 1 22 2 » = ¦ ¦ eiy aij x j ,
« a31 x1 + a32 x2 » i =1 j =1
« »
¬« a41 x1 + a42 x2 ¼»
3-3 Case study 169

ª a11 º
«a » n=4
wy y « 21 »
( x1 , x2 ) = [e1 , e 2 , e3 , e 4 ]
y y y
= ¦ eiy ai1 ,
wx1 « a31 » i =1
« »
¬« a41 ¼»
ª a12 º
«a » n=4
wy
( x1 , x2 ) = [e1y , e 2y , e3y , e 4y ] « » = ¦ eiy ai 2 .
22

wx2 « a32 » i =1
« »
«¬ a42 »¼

Box 3.24
Latent restrictions
Grassmann coordinates (Plücker coordinates)
the first example
(3.153) BcA = 0 œ Bcy = 0 (3.154)

ª1 1 º ª 57 37 º
«1 2 » « »
(3.155) A=« » Ÿ B = 1 « 37 67 » (3.156)
«1 3 » 100 « 31 29 »
« » « »
«¬1 10 »¼ «¬ 11 1 »¼

“ latent restriction”
57 yˆ1  37 yˆ 2  31yˆ 3 + 11yˆ 4 = 0 (3.157)

37 yˆ1 + 67 yˆ 2  29 yˆ 3  yˆ 4 = 0 (3.158)

“ R( A) : the tangent space Tx M 2


the Grassmann manifold G 2,4 ”
ª1º
«»
wy y y y y «1»
“the first tangent vector”: t1 := [e1 , e 2 , e3 , e 4 ] (3.159)
wx1 «1»
«»
«¬1»¼

ª1 º
« »
wy y y y y « 2 »
“the second tangent vector”: t 2 := [e1 , e 2 , e3 , e 4 ] (3.160)
wx 2 «3 »
« »
¬«10 ¼»
170 3 The second problem of algebraic regression

G 2,4 = span{t1 , t 2 }  R 4 : Grassmann ( A )

b1
“the first normal vector”: n1 := (3.161)
|| b1 ||

|| b1 ||2 = 104 (572 + 372 + 312 + 112 ) = 57 102 (3.162)

ª 0.755 º
« 0.490»
n1 = [e1y , e 2y , e 3y , e 4y ] « » (3.163)
« 0.411»
« »
¬ 0.146¼
b2
“the second normal vector”: n 2 := (3.164)
|| b 2 ||

|| b 2 ||2 = 104 (37 2 + 67 2 + 292 + 12 ) = 67 102 (3.165)

ª 0.452 º
« »
y « 0.819 »
n 2 = [e1 , e 2 , e3 , e 4 ]
y y y
(3.166)
« 0.354 »
« »
¬« 0.012 ¼»
Grassmann coordinates (Plücker coordinates)

ª1 1 º
«1 2 »
A=« » Ÿ g ( A) := °­® 1 1 , 1 1 , 1 1 , 1 2 , 1 2 , 1 3 °½¾ =
«1 3 » °¯ 1 2 1 3 1 10 1 3 1 10 1 10 °¿
« »
¬1 10 ¼
= { p12 , p13 , p14 , p23 , p24 , p34 }
(3.167)
p12 = 1, p13 = 2, p14 = 9, p23 = 1, p24 = 8, p34 = 7.

Again, the columns of the matrix A lay the foundation of GRASSMANN (A).
Next we turn to GRASSMANN (B) to be identified as the normal space R ( A) A .
The normal vectors
ªb11 º ªb21 º
«b » «b »
n1 = [e1y , e 2y , e3y , e 4y ] « 21 » ÷ || col1 B ||, n 2 = [e1y , e 2y , e3y , e 4y ] « 22 » ÷ || col2 B ||
«b31 » «b32 »
« » « »
¬«b41 ¼» ¬«b42 ¼»
are computed from the normalized column vectors of the matrix B = [b1 , b 2 ] .
3-3 Case study 171

The normal vectors {n1 , n 2 } span the normal space R ( A) A , also called GRASS-
MANN(B). Alternatively, we may substitute the normal vectors n1 and n 2 by the
Grassmann coordinates (Plücker coordinates) of the matrix A, namely by the
Grassmann column vector.
1 1 1 1 1 1
p12 = = 1, p13 = = 2, p14 = =9
1 2 1 3 1 10
1 2 1 2 1 3
p23 = = 1, p24 = = 8, p34 = =7
1 3 1 10 1 10

n = 4, m = 2, n–m = 2
n=4
1 n=4
¦ (ei š ei )ai 1ai 2 = ¦ e j še j H i ,i , j , j ai 1ai 2
i1 ,i2 =1
1 2 1 2
2! i ,i , j , j =1
1 2 1 2
1 2 1 2 1 2 1 2

ª p12 º ª1 º
«p » « »
« 13 » « 2 »
« p14 » «9 »
g := « » = « »  R 6×1 .
« p23 » «1 »
« p » «8 »
« 24 » « »
«¬ p34 »¼ ¬«7 ¼»

?How do the vectors {b1, b2},{n1, n2} and g relate to each other?
Earlier we already normalized, {b1 , b 2 } to {b1 , b 2 }, when we constructed
{n1 , n 2 } . Then we are left with the question how to relate {b1 , b 2 } and
{n1 , n 2 } to the Grassmann column vector g.
The elements of the Grassmann column vector g(A) associated with matrix A
are the Grassmann coordinates (Plücker coordinates){ p12 , p13 , p14 , p23 , p24 , p34 }
in lexicographical order. They originate from the dual exterior form D m = E n  m
where D m is the original m-exterior form associated with the matrix A.
n = 4, n–m = 2
1 n=4
D 2 := ¦ ei š ei ai ai =
2! i i =1
1, 2
1 2 1 2

1 1
= e1 š e 2 (a11a22  a21a12 ) + e1 š e3 ( a11a32  a31a22 ) +
2! 2!
1 1
+ e1 š e 4 (a11a42  a41a12 ) + e 2 š e3 ( a21a32  a31a22 ) +
2! 2!
1 1
+ e 2 š e 4 (a21a42  a41a22 ) + e3 š e 4 (a31a42  a41a32 )
2! 2!
172 3 The second problem of algebraic regression

1 n=4
E := D 2 (R 4 ) = ¦ e j š e j Hi i 1 2 j1 j2
ai 1ai 2 =
4 i i , j , j =1
1, 2 1 2
1 2 1 1

1 1 1
= e3 š e 4 p12 + e 2 š e 4 p13 + e3 š e 2 p14 +
4 4 4
1 1 1
+ e 4 š e1 p23 + e3 š e1 p24 + e1 š e 2 p34 .
4 4 4
The Grassmann coordinates (Plücker coordinates) { p12 , p13 , p14 , p23 , p24 , p34 }
refer to the basis
{e3 š e 4 , e 2 š e 4 , e3 š e 2 , e 4 š e1 , e3 š e1 , e1 š e 2 } .

Indeed the Grassmann space G 2,4 spanned by such a basis can be alternatively
covered by the chart generated by the column vectors of the matrix B,
n=4
J 2 := ¦e j1 e j b j b j  GRASSMANN(Ǻ),
2 1 2
j1 , j2

a result which is independent of the normalisation of {b j 1 , b j 2 } . 1 2

As a summary of the result of the two examples (i) A  \ 3× 2 and (ii)


A  \ 4× 2 for a general rectangular matrix A  \ n × m , n > m, rkA = m is needed.

“The matrix B constitutes the latent restrictions also called latent condition
equations. The column space R (B) of the matrix B coincides with complemen-
tary column space R ( A) A orthogonal to column space R ( A) of the matrix A.
The elements of the matrix B are the Grassmann coordinates, also called Plücker
coordinates, special sub determinants of the matrix A = [a i1 ," , a im ]
n
p j j :=
1 2 ¦ Hi "i
1 m j1 " jn-m
ai 1 "ai
1 mm
.
i1 ," , im =1

The latent restrictions control the parameter adjustment in the


sense of identifying outliers or blunders in observational data.”

3-34 From B to A: latent parametric equations, dual Grassmann


coordinates, dual Plücker coordinates
While in the previous paragraph we started from a given matrix A  \ n ×m ,
n > m, rk A = m representing a special inconsistent systems of linear equations
y=Ax+i, namely in order to construct the orthogonal complement R ( A) A of
R ( A) , we now reverse the problem. Let us assume that a matrix B  \ A× n ,
A < n , rk B = A is given which represents a special inconsistent system of linear
homogeneous condition equations Bcy = Bci . How can we construct the or-
thogonal complement R ( A) A of R (B) and how can we relate the elements of
R (B) A to the matrix A of parametric adjustment?
3-3 Case study 173

First, let us depart from the orthogonality condition BcA = 0 or A cB = 0 we


already introduced and discussed at length. Such an orthogonality condition had
been the result of the orthogonality of the vectors y A = yˆ (LESS) and
i A = ˆi (LESS) . We recall the general condition of the homogeneous matrix equa-
tion.
BcA = 0 œ A = [I A  B(BcB) 1 Bc]Z ,

which is, of course, not unique since the matrix Z  \ A× A is left undetermined.
Such a result is typical for an orthogonality conditions.
Second, let us construct the Grassmann space G A ,n , in short GRASSMANN (B)
as well as the Grassmann space G n  A , n , in short GRASSMANN (A) representing
R (B) and R (B) A , respectively.
1 n
JA = ¦ e j š" š e j b j 1"b j A (3.168)
A ! j " j =11 A
1 A 1 A

n
1 1
G n  A := J A = ¦ ei š " š ei H i "i n  A j1 " jA
b j 1 "b j A .
(n  A)! i ,", i
1 nA , j1 ," , jA =1 A !
1 nA 1 1 A

The exterior form J A which is built on the column vectors {b j 1 ," , b j A } of the 1 A
matrix B  \ A× n is an element of the column space R (B) . Its dual exterior form
J = G nA , in contrast, is an element of the orthogonal complement R (B) A .
q i "i
1 nA
:= Hi "i 1 n  A j1 " jA
b j 1"b j A
1 A
(3.169)

denote the Grassmann coordinates (Plücker coordinates) which are dual to the
Grassmann coordinates (Plücker coordinates) p j " j . q := [ q i … q n A ] is con- 1 nm
stituted by subdeterminants of the matrix B, while p := [ p j … p n  m ] by subde-
1

terminants of the matrix A.


The (D, E, J, G) -diagram of Figure 3.8 is commutative. If R (B) = R ( A) A , then
R (B) A = R ( A) . Identify A = n  m in order to convince yourself about the
(D, E, J, G) - diagram to be commutative.

G n A = J A JA

id ( A = n  m ) id ( A = n  m )

Dm E n  m = D m
Figure 3.8: Commutative diagram
D m o D m = En-m = J n-m o J n-m = En-m = D m = (1) m ( n-m ) D m
174 3 The second problem of algebraic regression

Third, let us specialize R ( A) = R (B) A and R ( A) A = R (B) by A = n - m .


D m o D m = En-m = J n-m o J n-m = En-m = D m = (1) m ( n-m ) D m (3.170)
The first and second example will be our candidates for test computations of the
diagram of Figure 3.8 to be commutative. Box 3.25 reviews direct and inverse
Grassmann coordinates (Plücker coordinates) for A  \ 3× 2 , B  \ 3×1 , Box 3.26
for A  \ 4× 2 , B  \ 4× 2 .
Box 3.25
Direct and inverse Grassmann coordinates
(Plücker coordinates)
first example
The forward computation

ª1 1 º n =3 n =3
A = ««1 2 »»  \ 3×2 : a1 = ¦ ei ai 1 and a 2 = ¦ ei ai 1 1 2 2 2
i =1 i =1
«¬1 10 »¼ 1 2

n =3
1
D 2 := ¦ ei š ei ai 1ai 2  ȁ 2 (\ 3 )  ȁ m ( \ n )
i ,i =1 2! 1 2
1 2 1 2

n =3
1
E1 := D 2 := ¦ e j Hi i j ai 1ai 2  ȁ 2 (\ 3 )  ȁ m ( \ n )
i ,i , j =1
1 2
2! 1
1 12 1 1 2

Grassmann coordinates (Plücker coordinates)


1 1 1
E1 = e1 p23 + e 2 p31 + e3 p12
2 2 2
p23 = a21 a32  a31 a22 , p31 = a31 a12  a11 a32 ,
p12 = a11 a22  a21 a12 , p23 = 8, p31 = 9, p12 = 1

The backward computation


n =3
1
J1 := ¦ e j Hi i j ai 1 ai 2 = e1 p23 + e 2 p31 + e3 p12  ȁ1 (\ 3 )
i1 ,i2 , j1 =1
1! 1 12 1 1 2

n =3
1
G 2 := J1 := ¦ ei š ei H i i j H j 2 j3 j1
a j 1a j 2 Ÿ
2! i1 ,i2 , j1 , j2 , j3 =1
1 2 12 1 2 3

n =3
1
G2 = ¦ e i š e i (G i j G i 2 j3
 Gi j Gi j ) a j 1 a j 2
2! i1 ,i2 , j1 , j2 , j3 =1
1 2 1 2 1 3 2 2 2 3

n =3
1
G2 = ¦e i1 š ei ai 1 ai 2 = D 2  ȁ 2 (\ 3 )  ȁ m ( \ n )
2 i1 ,i2 =1
2 1 2

inverse Grassmann coordinates


(dual Grassmann coordinates, dual Plücker coordinates)
3-3 Case study 175
1 1
G2 = D 2 = e 2 š e 3 ( a 21 a 32  a 31 a 22 ) + e 3 š e1 ( a 31 a12  a11 a 32 ) +
2 2
1
+ e1 š e 2 ( a11 a 22  a 21 a12 )
2
1 1 1
G2 = D 2 = e 2 š e3 q23 + e 2 š e3 q31 + e 2 š e3 q12  ȁ 2 (\ 3 ) .
2 2 2
Box 3.26
Direct and inverse Grassmann coordinates
(Plücker coordinates)
second example
The forward computation
ª1 1 º n=4 n=4
A = «1 2 »  \ 4× 2 : a1 = ¦ e i ai 1 and a 2 = ¦ e i ai 2
«1 3 » i =1
1 1
i =1
2 2

«¬1 10 »¼ 1 2

n=4
1
D 2 := ¦ ei š ei ai 1ai 2  ȁ 2 (\ 4 )  ȁ m (\ n )
i1 ,i2 =1 2! 1 2 1 2

1 n=4 1
E2 := D 2 := ¦ e j š e j Hi i j j ai 1ai 2  ȁ 2 (\ 4 )  ȁ n-m (\ n )
2! i ,i , j , j =1 2!
1 2 1 2
1 2 12 1 2 1 2

1 1 1
E2 = e3 š e 4 p12 + e 2 š e 4 p13 + e3 š e 2 p14 +
4 4 4
1 1 1
+ e 4 š e1 p23 + e3 š e1 p24 + e1 š e 2 p34
4 4 4
p12 = 1, p13 = 2, p14 = 9, p23 = 1, p34 = 7

The backward computation


1 n=4
J 2 := ¦ e j š e j Hi i 1 2 j1 j2
ai 1ai 2  ȁ 2 (\ 4 )  ȁ n-m (\ n )
2! i ,i , j , j =1
1 2 1 2
1 2 1 2

n=4
1 1
G 2 := J 2 := ¦ ei š ei H i i 1 2 j1 j2
Hj j1 2 j3 j4
a j 1a j 2 =
2! i ,i , j , j , j , j
1 2 1 2 3 4 =1 2!
1 2 3 4

= D 2  ȁ 2 (\ 4 )  ȁ m (\ n )
1 n=4
G2 = D 2 = ¦ ei š ei ai 1ai 2
4 i ,i =1
1 2
1 2 1 2

1 1 1
G2 = D 2 = e3 š e 4 q12 + e 2 š e 4 q13 + e3 š e 2 q14 +
4 4 4
1 1 1
+ e 4 š e1q23 + e3 š e1q24 + e1 š e 2 q34
4 4 4
q12 = p12 ,q13 = p13 ,q14 = p14 ,q23 = p23 ,q24 = p24 ,q34 = p34 .
176 3 The second problem of algebraic regression

3-35 Break points


Throughout the analysis of high leverage points and outliers within the observa-
tional data we did assume a fixed linear model. In reality such an assumption
does not apply. The functional model may change with time as Figure 3.9 indi-
cates. Indeed we have to break-up the linear model into pieces. Break points
have to be introduced as those points when the linear model changes. Of course,
a hypothesis test has to decide whether the break point exists with a certain prob-
ability. Here we only highlight the notion of break points in the context of lever-
age points. For localizing break points we apply the Gauss-Jacobi Combinatorial
Algorithm following J. L. Awange (2002), A. T. Hornoch (1950), S. Wellisch
(1910).
Figure 3.9: Figure 3.10:
Graph of the function y(t), two Gauss-Jacobi Combinatorial
break points Algorithm, piecewise linear model,
1st cluster : ( ti , t j )

Figure 3.11: Figure 3.12:


Gauss-Jacobi Combinatorial Gauss-Jacobi Combinatorial
Algorithm, 2nd cluster : ( ti , t j ) Algorithm, 3rd cluster : ( ti , t j ).
3-3 Case study 177

Table 3.1: Test “ break points” observations for a piecewise linear model
y t
y1 1 1 t1
y2 2 2 t2
y3 2 3 t3
y4 3 4 t4
y5 2 5 t5
y6 1 6 t6
y7 0.5 7 t7
y8 2 8 t8
y9 4 9 t9
y10 4.5 10 t10
Table 3.1 summarises a set of observations yi with n=10 elements. Those measure-
ments have been taken at time instants {t1 ," , t10 } . Figure 3.9 illustrates the graph
of the corresponding function y(t). By means of the celebrated Gauss-Jacobi
Combinatorial Algorithm we aim at localizing break points. First, outlined in
Box 3.27 we determine all the combinations of two points which allow the fit of
a straight line without any approximation error. As a determined linear model
y = Ax , A  \ 2× 2 , r k A = 2 namely x = A 1 y we calculate (3.172) x1 and
(1.173) x 2 in a closed form. For instance, the pair of observations ( y1 , y2 ) , in
short (1, 2) at (t1 , t2 ) = (1, 2) determines ( x1 , x2 ) = (0,1) . Alternatively, the pair of
observations ( y1 , y3 ) , in short (1, 3), at (t1, t3) = (1, 3) leads us to (x1, x2) = (0.5,
0.5). Table 3.2 contains the possible 45 combinations which determine ( x1 , x2 )
from ( y1 ," , y10 ) . Those solutions are plotted in Figure 3.10, 3.11 and 3.12.
Box 3.27
Piecewise linear model
Gauss-Jacobi combinatorial algorithm
1st step
ª y (ti ) º ª1 ti º ª x1 º
y=« »=« » « » = Ax i < j  {1," , n} (3.171)
¬ y (t j ) ¼ ¬1 t j ¼ ¬ x2 ¼

y  R 2 , A  R 2× 2 , rk A = 2, x  R 2

ªx º 1 ª t j ti º ª y (ti ) º
x = A 1y œ « 1 » = « 1 1 » « y (t ) » (3.172)
x
¬ 2¼ t j  ti ¬ ¼¬ j ¼

t j y1  ti y2 y j  yi
x1 = and x2 = . (3.173)
t j  ti t j  ti
178 3 The second problem of algebraic regression

Example: ti = t1 = 1, t j = t2 = 2
y (t1 ) = y1 = 1, y (t2 ) = y2 = 2
x1 = 0, x2 = 1.
Example: ti = t1 = 1, t j = t3 = 3
y (t1 ) = y1 = 1, y (t3 ) = y3 = 2
x1 = 0.5 and x2 = 0.5 .

Table 3.2
3-3 Case study 179

Second, we introduce the pullback operation G y o G x . The matrix of the met-


ric G y of the observation space Y is pulled back to generate by (3.174) the ma-
trix of the metric G x of the parameter space X for the “determined linear
model” y = Ax , A  R 2× 2 , rk A = 2 , namely G x = A cG y A . If the observation
space Y = span{e1y ,e2y} is spanned by two orthonormal vectors e1y ,e 2y relating to a
pair of observations (yi, yj), i<j, i, j  {1," ,10} , then the matrix of the metric
G y = I 2 of the observation space is the unit matrix. In such an experimental
situation (3.175) G x = A cA is derived. For the first example (ti, tj)=(1, 2) we are
led to vech G x = [2,3,5]c . “Vech half” shortens the matrix of the metric
G x  R 2× 2 of the parameter space X
( x1 , x2 ) by stacking the columns of the
lower triangle of the symmetric matrix G x . Similarly, for the second example
(ti, tj)=(1,3) we produce vech G x = [2, 4,10]c . For all the 45 combinations of
observations (yi, yj).
In the last column Table 3.2 contains the necessary information of the matrix of
the metric G x of the parameter space X formed by (vech G x )c . Indeed, such a
notation is quite economical.
Box 3.28
Piecewise linear model: Gauss-Jacobi combinatorial algorithm
2nd step
pullback of the matrix G X the metric from G y
G x = A cG y A (3.174)

“if G y = I 2 , then G x = A cA ”
ª 2 ti + t j º
G x = A cA = «  i < j  {1," , n}. (3.175)
¬ ti + t j ti2 + t 2j »¼

Example: ti = t1 = 1 , t j = t2 = 2

ª 2 3º
Gx = « » , vech G x = [2,3,5]c.
¬ 3 5¼
Example: ti = t1 = 1 , t j = t3 = 3

ª2 4 º
Gx = « » , vech G x = [2, 4,10]c .
¬ 4 10¼
Third, we are left the problem to identify the break points. C.F. Gauss (1828)
and C.G.J. Jacobi (1841) have proposed to take the weighted arithmetic mean of
the combinatorial solutions (x1,x2)(1,2), (x1,x2)(1,3), in general (x1,x2)(i,j), i<j, are
considered as
Pseudo-observations.
180 3 The second problem of algebraic regression

Box 3.29
Piecewise linear model: Gauss-Jacobi combinatorial algorithm
3rd step
pseudo-observations
Example

ª x1(1,2) º ª1 0º ª i1 º
« (1,2) » «
« x2 » = «0 1 » ª x1 º ««i2 »»
»
+  \ 4×1 (3.176)
« x1(1,3) » «1 0 » «¬ x2 »¼ «i3 »
« (1,3) » « » « »
¬« x2 ¼» ¬«0 1 ¼» ¬«i4 ¼»
G x -LESS
ª x1(1,2) º
« (1,2) »
ª xˆ1 º (1,3) 1 (1,3) « x2
x A := xˆ = « » = ¬ªG x + G x º¼ ª¬G x , G x º¼ (1,3) »
(1,2) (1,2)
 \ 2×1 (3.177)
¬ xˆ2 ¼ « x1 »
« (1,3) »
«¬ x2 »¼
vech G (1,2
x
)
= [2, 3,5]c , vech G(1,3)
x = [2, 4,10]c

ª 2 3º ª2 4 º
G (1,2
x
)
=« », G(1,3)
x =« »
¬ 3 5¼ ¬ 4 10¼

ª x1(1,2) º ª0º ª x1(1,3) º ª0.5º


« (1,2) » = « » , « (1,3) » = « »
¬ x2 ¼ ¬ 1 ¼ ¬ x2 ¼ ¬0.5¼
1
ª4 7 º 1 ª 15 7 º
G =«
1
x » = « » = [G x + G x ]
(1,2) (1,3) 1

¬ 7 15¼ 11 ¬  7 4 ¼

ª xˆ º 1 ª6º 6
x A := « 1 » = « » œ xˆ1 = xˆ2 = = 0.545, 454 .
ˆ
¬ x2 ¼ 11 ¬6¼ 11

Box 3.29 provides us with an example. For establishing the third step of the
Gauss-Jacobi Combinatorial Algorithm. We outline G X  LESS for the set of
pseudo-observations (3.176) ( x1 , x2 )(1,2) and ( x1 , x2 )(1,3) solved by (3.177)
xA = ( xˆ1 , xˆ2 ) . The matrices G(1,2)
x and G (1,3)
x representing the metric of the pa-
rameter space X derived from ( x1 , x2 )(1,2 ) and ( x1 , x2 )(1,3) are additively com-
posed and inverted, a result which is motivated by the special design matrix
A = [I 2 , I 2 ]c of “direct” pseudo-observations. As soon as we implement the
weight matrices G (1,2 x
)
and G (1,3)
x from Table 3.2 as well as ( x1 , x2 )(1,2 ) and
( x1 , x2 ) (1,3)
we are led to the weighted arithmetic mean xˆ1 = xˆ2 = 6 /11 . Such a
result has to be compared with the componentwise median x1 ( median) = 1/4
and x2 ( median) = 3/4.
3-3 Case study 181

ª (1, 2), (1,3) (1, 2), (1,3)


« combination combination
«
« G x  LESS median
« xˆ1 = 0.545, 454 x1 (median) = 0.250
«
«¬ xˆ2 = 0.545, 454 x2 (median) = 0.750.

Here, the arithmetic mean of x1(1,2) , x1(1,3) and x2(1,2) , x2(1,3) coincides with the me-
dian neglecting the weight of the pseudo-observations.

Box 3.30
Piecewise linear models and two break points
“Example”
ª x1 º
« »
ª y1 º ª1n tn 0 0 0 0 º « x2 » ª i y º
« y2 » = « 0
1 1
» x « 1
»
0 1n tn 0 0 » « 3 » + «i y (3.178)
« » « 2 2
« x4 » 2 »
¬ y3 ¼ «¬ 0 0 0 0 1n 3
t n »¼ « x » «¬i y
3 3
»¼
5
«x »
¬« 6 ¼»
I n -LESS, I n -LESS, I n -LESS,
1 2 3

ª x1 º 1 ª(t cn t n )(1cn y n )  (1cn t n )(t cn y n )º


«x » = c
1 1 1 1 1 1 1 1

2 « » (3.179)
¬ 2 ¼ A n1t n t n  (1cn t n ) «¬ (1cn t n )(1cn y1 ) + n1t cn y1
1 1 1 1 1 1 1 1 »¼

ª x3 º 1 ª(t cn t n )(1cn y n )  (1cn t n )(t cn y n ) º


«x » =
2 2 2 2 2 2 2 2

2 « » (3.180)
¬ 4 ¼ A n2 t cn t n  (1cn t n ) «¬ (1cn t n )(1cn y 2 ) + n2 t cn y 2 ¼»
2 2 2 2 2 2 2 2

ª x5 º 1 ª(t cn t n )(1cn y n )  (1cn t n )(t cn y n ) º


=
3 3 3 3 3 3 3 3
« » « ». (3.181)
¬ x6 ¼ A n3t cn t n  (1cn t n ) «¬ (1cn t n )(1cn y 3 ) + n3t cn y 3 ¼»
2
3 3 3 3 3 3 3 3

Box 3.31
Piecewise linear models and two break points
“ Example”
1st interval: n = 4, m = 2, t  {t1 , t2 , t3 , t4 }

ª1 t1 º
«1 t2 » ª x1 º
y1 = « » + i y = 1n x1 + t n x2 + i y (3.182)
«1 t3 » «¬ x2 »¼ 1 1

« »
¬1 t4 ¼
182 3 The second problem of algebraic regression

2nd interval: n = 4, m = 2, t  {t4 , t5 , t6 , t7 }


ª1 t4 º
«1 t5 » ª x 3 º
y2 = « » + i = 1n x3 + t n x4 + i y (3.183)
«1 t6 » «¬ x4 »¼ y2 2

« »
¬1 t7 ¼

3rd interval: n = 4, m = 2, t  {t7 , t8 , t9 , t10 }


ª1 t7 º
«1 t8 » ª x5 º
y3 = « » + i = 1n x5 + t n x6 + i y . (3.184)
«1 t9 » «¬ x6 »¼ y
3 3

« »
¬1 t10 ¼

Figure 3.10, 3.11 and 3.12 have illustrated the three clusters of combinatorial
solutions referring to the first, second and third straight line. Outlined in Box
3.30 and Box 3.31, namely by (3.178) to (3.181), n1 = n2 = n3 = 4 , we have com-
puted ( x1 , x2 ) A for the first segment, ( x3 , x4 ) A for the second segment and
( x5 , x6 ) A for the third segment of the least squares fit of the straight line. Table
3.3 contains the results explicitly. Similarly, by means of the Gauss-Jacobi
Combinatorial Algorithm of Table 3.4 we have computed the identical solution
( x1 , x2 ) A , ( x3 , x4 ) A and ( x5 , x6 ) A as “weighted arithmetic means” numerically
presented only for the first segment.
Table 3.3
I n -LESS solutions for those segments of a straight line,
two break points
ªx º ª 0.5º
I 4 -LESS : « 1 » = « » : y (t ) = 0.5 + 0.6t
¬ x2 ¼ A ¬0.6 ¼

ªx º 1 ª126 º
I 4 -LESS : « 3 » = « » : y (t ) = 6.30  0.85t
¬ x4 ¼ A 20 ¬ 17 ¼
ªx º 1 ª 183º
I 4 -LESS : « 5 » = « » : y (t ) = 9.15 + 1.40t .
¬ x6 ¼ A 20 ¬ 28 ¼

Table 3.4
Gauss-Jacobi Combinatorial Algorithm
for the first segment of a straight line,
3-3 Case study 183

ª xˆ1 º (3,4) 1
« xˆ » = ª¬G x + G x + G x + G x + G x + G x º¼
(1,2) (1,3) (1,4) (2,3) (2,4)

¬ 2¼
ª x1(1,2) º
« (1,2) »
« x2 »
« x1(1,3) »
« (1,3) »
« x2 »
« x (1,4) »
« 1(1,4) » (3.185)
x
ª¬G (1,2)
x , G (1,3)
x , G (1,4)
x , G (2,3)
x , G(2,4)
x , G(3,4)
x º¼ «« 2(2,3) »»
x
« 1(2,3) »
« x2 »
« x (2,4) »
« 1 »
« x2(2,4) »
« (3,4) »
« x1 »
«¬« x2(3,4) »¼»

) 1 1 ª 15 5º
ª¬G (1,2 )
+ G (1,3) + G (1,4 )
+ G (x2,3) + G (x2,4 ) + G (3,4 º¼ =
30 «¬ 5 2 »¼
x x x x

ª x1(1,2) º ª 2 3º ª 0º ª 3º
G (1,2)
x « (1,2) » = « »« » = « »
¬ x2 ¼ ¬ 3 5¼ ¬1¼ ¬5¼
ª x1(1,3) º ª 2 4 º ª 0.5º ª 3º
G (1,3)
x « (1,3) » = « »« » = « »
¬ x2 ¼ ¬ 4 10¼ ¬ 0.5¼ ¬ 7¼
ª x1(1,4) º ª 2 5 º ª 0.333º ª 4 º
G (1,4)
x « (1,4) » = « »« »=« »
¬ x2 ¼ ¬ 5 17 ¼ ¬ 0.666¼ ¬13¼
ª x1(2,3) º ª 2 5 º ª 2 º ª 4 º
G (2,3)
x « (2,3) » = « »« » = « »
¬ x2 ¼ ¬ 5 13¼ ¬ 0 ¼ ¬10¼
ª x1(2,4) º ª 2 6 º ª 1 º ª 5 º
G (2,4)
x « (2,4) » = « »« » = « »
¬ x2 ¼ ¬ 6 20¼ ¬ 0.5¼ ¬16¼
ª x1(3,4) º ª 2 7 º ª 1º ª 5 º
G (3,4)
x « (3,4) » = « »« » = « »
¬ x2 ¼ ¬ 7 25¼ ¬ 1 ¼ ¬18¼
ª x1(1,2) º (3,4)
(3,4) ª x1 º ª 24 º
G (1,2)
x « (1,2) » + " + G x « (3,4) » = « »
¬ x2 ¼ ¬ x2 ¼ ¬ 69 ¼
ª xˆ1 º 1 ª15 5º ª 24 º ª 0.5º
« xˆ » = 30 « 5 2 » « 69 » = « 0.6» .
¬ 2¼ ¬ ¼¬ ¼ ¬ ¼
3-4 Special linear and nonlinear models 184

3-4 Special linear and nonlinear models:


A family of means for direct observations
In case of direct observations, LESS of the inconsistent linear model y = 1n x + i
has led to
1
x A = arg{y = 1n x + i || i ||2 = min} , x A = (1c1) 11cy = ( y1 + " + yn ) .
n
Such a mean has been the starting point of many alternatives we present to you
by Table 3.5 based upon S.R. Wassel´s (2002) review.

Table 3.5
A family of means
Name Formula
1
xA = ( y1 + " + yn )
arithmetic mean n
1
n = 2 : xA = ( y1 + y2 )
2
x A = (1cG y 1) 11cG y y
if G y = Diag( g1 ," , g1 )
weighted arithmetic mean then
g1 y1 + g n yn
xA =
g1 + " + g n
1
§ n · n

geometric mean xg = n y1 " yn = ¨ – yi ¸


© i =1 ¹
n = 2 : xg = y1 y2
1
xlog = (ln y1 + " + ln yn )
logarithmic mean n
xlog = ln xg
y(1) < " < y( n )
ordered set of observations
ª y( k +1) if n = 2k + 1
« " add "
med y = «
median «[ y( k ) + y( k +1) ] / 2 if n = 2k
«
¬ " even "
3-5 A historical note 185

( y1 ) p +1 + " + ( yn ) p +1
xp =
Wassel´s family of means ( y1 ) p + " + ( yn ) p
n n
x p = ¦ ( yi ) p +1 ¦(y ) i
p

i =1 i =1

Case p=0: x p = xA
Case p= 1/ 2 , n=2: x p = xA
Hellenic mean Case p= –1:
n=2:
1
§ y 1 + y21 · 2 y1 y2
H = H ( y1 , y2 ) = ¨ 1 ¸ = .
© 2 ¹ y1 + y2

3-5 A historical note on C.F. Gauss, A.M. Legendre and the inven-
tions of Least Squares and its generalization
The historian S.M. Stigler (1999, pp 320, 330-331) made the following com-
ments on the history of Least Squares.
“The method of least squares is the automobile of modern statistical
analysis: despite its limitations, occasional accidents, and incidental pollu-
tion, this method and its numerous variations, extensions, and related
conveyances carry the bulk of statistical analyses, and are known and val-
ued by nearly all. But there has been some dispute, historically, as to who
is the Henry Ford of statistics. Adrian Marie Legendre published the
method in 1805, an American, Robert Adrian, published the method in
late 1808 or early 1809, and Carl Fiedrich Gauss published the method in
1809. Legendre appears to have discovered the method in early 1805, and
Robert Adrain may have “discovered” it in Legendre’s 1805 book (Stigler
1977c, 1978c), but in 1809 Gauss had the temerity to claim that he had
been using the method since 1795, and one of the most famous priority
disputes in the history of science was off and running. It is unnecessary to
repeat the details of the dispute – R.L. Plackett (1972) has done a masterly
job of presenting and summarizing the evidence in the case.
Let us grant, then, that Gauss’s later accurate were substantially accunate,
and that he did device the method of least squares between 1794 and
1799, independently of Legendre or any other discoverer. There still re-
mains the question, what importance did he attach to the discovery? Here
the answer must be that while Gauss himself may have felt the method
useful, he was unsuccessful in communicating its importance to other be-
fore 1805. He may indeed have mentioned the method to Olbers, Linde-
mau, or von Zach before 1805, but in the total lack of applications by oth-
ers, despite ample opportunity, suggests the message was not understood.
The fault may have been more in the listener than in the teller, but in this
case its failure serves only to enhance our admiration for Legendre’s 1805
success. For Legendre’s description of the method had an immediate and
186 3 The second problem of algebraic regression

widespread effect – as we have seen, it even caught the eye and under-
standing of at least one of those astronomers (Lindemau) who had been
deaf to Gauss’s message, and perhaps it also had an influence upon the
form and emphasis of Gauss’s exposition of the method.
When Gauss did publish on least squares, he went far beyond Legendre in
both conceptual and technical development, linking the method to prob-
ability and providing algorithms for the computation of estimates. His
work has been discussed often, including by H.L. Seal (1967), L. Eisen-
hart (1968), H. Goldsteine (1977, §§ 4.9, 4.10), D.A. Sprott (1978),O.S.
Sheynin (1979, 1993, 1994), S.M. Stigler (1986), J.L. Chabert (1989),
W.C. Waterhouse (1990), G.W. Stewart (1995), and J. Dutka (1996).
Gauss’s development had to wait a long time before finding an apprecia-
tive audience, and much was intertwined with other’s work, notably
Laplace’s. Gauss was the first among mathematicians of the age, but it
was Legendre who crystallized the idea in a form that caught the mathe-
matical public’s eye. Just as the automobile was not the product of one
man of genius, so too the method of least squares is due to many, includ-
ing at least two independent discoverers. Gauss may well have been the
first of these, but he was no Henry Ford of statistics. If these was any sin-
gle scientist who first put the method within the reach of the common
man, it was Legendre.”
Indeed, these is not much to be added. G.W. Stewart (1995) recently translated
the original Gauss text “Theoria Combinationis Observationum Erroribus Mini-
mis Obmaxial, Pars Prior. Pars Posterior “ from the Latin origin into English.
F. Pukelsheim (1998) critically reviewed the sources, the reset Latin text and the
quality of the translation. Since the English translation appeared in the SIAM
series “ Classics in Applied Mathematics”, he concluded: “ Opera Gaussii contra
SIAM defensa”.
“Calculus probilitatis contra La Place defenses.” This is Gauss’s famous diary
entry of 17 June 1798 that he later quoted to defend priority on the Method of
Least Squares (Werke, Band X, 1, p.533).
C.F. Gauss goes Internet
With the Internet Address http://gallica.bnf.fr you may reach the catalogues of
digital texts of Bibliotheque Nationale de France. Fill the window “Auteur” by
“Carl Friedrich Gauss” and you reach “Types de documents”. Continue with
“Touts les documents” and click “Rechercher” where you find 35 documents
numbered 1 to 35. In total, 12732 “Gauss pages” are available. Only the Gauss-
Gerling correspondence is missing. The origin of all texts are the resources of
the Library of the Ecole Polytechnique.
Meanwhile Gauss’s Werke are also available under http://www.sub.uni-
goettingen.de/. A CD-Rom is available from “Niedersächsische Staats- und
Universitätsbibliothek.”
For the early impact of the Method of Least Squares on Geodesy, namely W.
Jordan, we refer to the documentary by S. Nobre and M. Teixeira (2000).
4 The second problem of probabilistic regression
– special Gauss-Markov model without datum defect –
Setup of BLUUE for the moments of first order and of
BIQUUE for the central moment of second order

: Fast track reading :


Read only Theorem 4.3 and Theorem 4.13.

Lemma 4.2
ȟ̂ : Ȉ y -BLUUE of ȟ

Definition 4.1 Theorem 4.3


ȟ̂ : Ȉ y -BLUUE of ȟ ȟ̂ : Ȉ y -BLUUE of ȟ

Lemma 4.4
E{yˆ }, Ȉ y -BLUUE of E{y}
e y , D{e y }, D{y}

“The first guideline of chapter four: definition, lemmas and theorem”

In 1823, supplemented in 1828, C. F. Gauss put forward a new substantial gen-


eralization of “least squares” pointing out that an integral measure of loss, more
definitely the principle of minimum variance, was preferable to least squares and
to maximum likelihood. He abandoned both his previous postulates and set high
store by the formula Vˆ 2 which provided an unbiased estimate of variance V 2 .
C. F. Gauss’s contributions to the treatment of erroneous observations, lateron
extended by F. R. Helmert, defined the state of the classical theory of errors.
To the analyst C. F. Gauss’s preference to estimators of type BLUUE (Best
Linear Uniformly Unbiased Estimator) for the moments of first order as well as
of type BIQUUE (Best Invariant Quadratic Uniformly Unbiased Estimator) for
the moments of second order is completely unknown. Extended by A. A. Markov
who added correlated observations to the Gauss unbiased minimum variance
188 4 The second problem of probabilistic regression

estimator we present to you BLUUE of fixed effects and Ȉ y -BIQUUE of the


variance component.

“The second guideline of chapter four:


definitions, lemmas, corollaries and theorems”

Theorem 4.5
equivalence of Ȉ y -BLUUE
and G y -LESS

Corollary 4.6
multinomial inverse

Definition 4.7 Lemma 4.8


invariant quadratic estimation invariant quadratic estimation
Vˆ 2 of V 2 : IQE Vˆ 2 of V 2 : IQE

Definition 4.9 Lemma 4.10


variance-covariance components invariant quadratic estima-
model Vˆ k IQE of V k tion Vˆ k of V k : IQE eigenspace

Definition 4.11 Lemma 4.12


invariant quadratic unformly invariant quadratic unformly
unbiased estimation: IQUUE unbiased estimation: IQUUE

Lemma 4.13
var-cov components:
IQUUE

Corollary 4.14
translational invariance
4 The second problem of probabilistic regression 189

Corollary 4.15
IQUUE of Helmert type:
HIQUUE

Corollary 4.16
Helmert equation
det H z 0

Corollary 4.17
Helmert equation
det H = 0

Definition 4.18 Corollary 4.19


best IQUUE Gauss normal IQE

Lemma 4.20
Best IQUUE

Theorem 4.21
2
Vˆ BIQUUE of V

In the third chapter we have solved a special algebraic regression problem,


namely the inversion of a system of inconsistent linear equations of full column
rank classified as “overdetermined”. By means of the postulate of a least squares
solution || i ||2 =|| y  Ax ||2 = min we were able to determine m unknowns from n
observations ( n > m : more equations n than unknowns m). Though “LESS”
generated a unique solution to the “overdetermined” system of linear equations
with full column rank, we are unable to classify “LESS”. There are two key
questions we were not able to answer so far: In view of “MINOS” versus
“LUMBE” we want to know whether “LESS” produces an unbiased estimation
or not. How can we attach to an objective accuracy measure “LESS”?
190 4 The second problem of probabilistic regression

The key for evaluating “LESS” is handed over to us by treating the special alge-
braic regression problem by means of a special probabilistic regression problem,
namely a special Gauss-Markov model without datum defect. We shall prove that
uniformly unbiased estimations of the unknown parameters of type “fixed ef-
fects” exist. “LESS” is replaced by “BLUUE” (Best Linear Uniformly Unbiased
Estimation). The fixed effects constitute the moments of first order of the under-
lying probability distributions of the observations to be specified. In contrast, its
central moments of second order, known as the variance-covariance matrix or
dispersion matrix, open the door to associate to the estimated fixed effects an
objective accuracy measure.

? What is a probabilistic problem ?

By means of certain statistical objective function, here of type


“best linear uniformly “best quadratic invariant uniformly
unbiased estimation” unbiased estimation”
(BLUUE) (BIQUUE)
for moments of for the central moments of
first order second order
we solve the inverse problem of linear, lateron nonlinear equations with fixed
effects which relates stochastic observations to parameters. According to the
Measurement Axiom, observations are elements of a probability space. In terms
of second order statistics the observation space Y of integer dimension,
dim Y = n , is characterized by
the first moment E{y} , the central second moment D{y}
the expectation of and the dispersion matrix or
y  {Y, pdf } variance-covariance matrix Ȉ y .
In the case of “fixed effects” we consider the parameter space Ȅ , dim Ȅ = m , to
be metrical. Its metric is induced from the probabilistic measure of the metric,
the variance-covariance matrix Ȉ y of the observations y  {Y, pdf } .
In particular, its variance-covariance matrix is pulled-back from the variance-
covariance matrix Ȉ y . In the special probabilistic regression model with un-
known “fixed effects” ȟ  Ȅ (elements of the parameter space) are estimated
while the random variables like y  E{y} are predicted.
4-1 Introduction
Our introduction has four targets. First, we want to introduce P̂ , a linear estima-
tion of the mean value of “direct” observations, and Vˆ 2 , a quadratic estimation
of their variance component. For such a simple linear model we outline the pos-
tulates of uniform unbiasedness and of minimum variance. We shall pay special
4-1 Introduction 191

attention to the key role of the invariant quadratic estimation (“IQE”) Vˆ 2 of


V 2 . Second, we intend to analyse two data sets, the second one containing an
outlier, by comparing the arithmetic mean and the median as well as the “root
mean square error” (r.m.s.) of type BIQUUE and the “median absolute devia-
tion” (m.a.d.). By proper choice of the bias term we succeed to prove identity of
the weighted arithmetic mean and the median for the data set corrupted by an
obvious outlier. Third, we discuss the competitive estimator “MALE”, namely
Maximum Likelihood Estimation which does not produce an unbiased estimation
Vˆ 2 of V 2 , in general. Fourth, in order to develop the best quadratic uniformly
unbiased estimation Vˆ 2 of V 2 , we have to highlight the need for fourth order
statistic. “IQE” as well as “IQUUE” depend on the central moments of fourth
order which are reduced to central moments of second order if we assume
“quasi-normal distributed” observations.
4-11 The front page example
By means of Table 4.1 let us introduce a set of “direct” measurements yi , i  {1,
2, 3, 4, 5} of length data. We shall outline how we can compute the arithmetic
mean 13.0 as well as the standard deviation of 1.6.
Table 4.1: “direct” observations, comparison of mean and median
(mean y = 13, med y = 13, [n / 2] = 2, [n / 2]+1 = 3, med y = y (3) ,
mad y = med| y ( i )  med y | = 1, r.m.s. (I-BIQUUE) = 1.6)
number of observation difference of difference of ordered set ordered set of ordered set
observation yi observation observation of observa- | y ( i )  med y | of
and mean and median tions y( i ) y( i )  mean y

1 15 +2 +2 11 0 +2
2 12 -1 -1 12 1 -1
3 14 +1 +1 13 1 +1
4 11 -2 -2 14 2 -1
5 13 0 0 15 2 0
In contrast, Table 4.2 presents an augmented observation vector: The observa-
tions six is an outlier. Again we have computed the new arithmetic mean 30.16
as well as the standard deviation 42.1. In addition, for both examples we have
calculated the sample median and the sample absolute deviation for comparison.
All definitions will be given in the context as well as a careful analysis of the
two data sets.

Table 4.2: “direct” observations, effect of one outlier


(mean y = 30.16 , med y = (13+14) / 2 = 13.5,
r.m.s. (I-BLUUE) = 42.1, med y ( i )  med y = mad y = 1.5)
192 4 The second problem of probabilistic regression

number of observation difference of difference of ordered set ordered set of ordered set
observation yi observation observation of observa- y( i )  med y of
and mean and median tions y( i ) y( i )  mean y

1 15 15.16 +1.5 11 0.5 15.16


2 12 18.16 -1.5 12 0.5 16.16
3 14 16.16 +0.5 13 1.5 17.16
4 11 19.16 -2.5 14 1.5 18.16
5 13 17.16 -0.5 15 2.5 19.16
6 116 +85.83 +102.5 116 +102.5 +85.83
4-12 Estimators of type BLUUE and BIQUUE of the front page example
In terms of a special Gauss-Markov model our data set can be described as fol-
lowing. The statistical moment of first order, namely the expectation E{y} = 1P
of the observation vector y  R n , here n = 5, and the central statistical moment
of second order, namely the variance-covariance matrix Ȉ y , also called the
dispersion matrix D{y} = I nV 2 , D{y} =: Ȉ y  R n×n , rk Ȉ y = n, of the observation
vector y  R n , with the variance V 2 characterize the stochastic linear model.
The mean P  R of the “direct” observations and the variance factor V 2 are
unknown. We shall estimate ( P , V 2 ) by means of three postulates:
• first postulate: Pˆ : linear estimation, V̂ 2 : quadratic estimation
n
Pˆ = ¦ l p y p or P̂ = l cy
p =1

n
Vˆ 2 = y cMy = (y c … y c)(vec M )
Vˆ 2 = ¦m pq y p yq or
p , q =1 = (vecM )c(y … y )

• the second postulate: uniform unbiasedness


E{Pˆ } = P for all P  R
E{Vˆ 2 } = V 2 for all V 2  R +

• the third postulate: minimum variance


D{Pˆ } = E{[ Pˆ  E{Pˆ }]2 } = min and D{Vˆ 2 } = E{[Vˆ 2  E{Vˆ 2 }]2 } = min
A M

Pˆ = arg min D{Pˆ | Pˆ = A cy, E{Pˆ } = P}


A

Vˆ 2 = arg min D{Vˆ 2 | Vˆ 2 = y cMy, E{Vˆ 2 } = V 2 } .


M
4-1 Introduction 193

First, we begin with the postulate that the fixed unknown parameters ( P , V 2 )
are estimated by means of a certain linear form P̂ = A cy + N = y cA + N and by
means of a certain quadratic form Vˆ 2 = y cMy + xcy + Z = (vec M )c(y … y ) +
+ xcy + Z of the observation vector y, subject to the symmetry condition
M  SYM := {M  R n× n | M = M c}, namely the space of symmetric matrices.
Second we demand E{Pˆ } = P , E{Vˆ 2 } = V 2 , namely unbiasedness of the estima-
tions ( Pˆ , Vˆ 2 ) . Since the estimators ( Pˆ , Vˆ 2 ) are special forms of the observation
vector y  R n , an intuitive understanding of the postulate of unbiasedness is the
following: If the dimension of the observation space Y
y , dim Y = n , is going
to infinity, we expect information about the “two values” ( P , V 2 ) , namely
lim Pˆ (n) = P , lim Vˆ 2 ( n) = V 2 .
nof nof

Let us investigate how LUUE (Linear Uniformly Unbiased Estimation) of P as


well as IQUUE (Invariant Quadratic Uniformly Unbiased Estimation) operate.
LUUE
E{Pˆ } = E{A cy + N } = A cE{y} + N º
E{y} = 1n P »Ÿ
¼
E{Pˆ } = A cE{y} + N = A c1n P + N
E{Pˆ } = P œ N = 0, (A c1n  1) P = 0
œ N = 0, A c1n  1 = 0 for all P  R.

Indeed P̂ is LUUE if and only if N = 0 and (A c1n  1) P = 0 for all P  R. The


zero identity (A c1n  1) P = 0 is fulfilled by means of A c1n  1 = 0, A c1n = 1, if we
restrict the solution by the quantor “ for all P  R ”. P = 0 is not an admissible
solution. Such a situation is described as “uniformly unbiased”. We summarize
that LUUE is constrained by the zero identity
A c1n  1 = 0 .

Next we shall prove that Vˆ 2 is IQUUE if and only if


IQUUE
E{Vˆ 2 } = E{y cMy + xcy + Z } = E{Vˆ 2 } = E{y cMy + xcy + Z } =
E{(vec M )c( y … y ) + xcy + Z} = E{(y c … y c)(vec M )c + y cx + Z} =
(vec M )c E{y … y} + xcE{y} + Z E{y c … y c}(vec M )c + E{y c}x + Z .

Vˆ 2 is called translational invariant with respect to y 6 y  E{y} if

Vˆ 2 = y cMy + xcy + Z = (y  E{y})cM (y  E{y}) + xc( y  E{y}) + Z


and uniformly unbiased if
194 4 The second problem of probabilistic regression

E{Vˆ 2 } = (vec M )c E{y … y} + xcE{y} + Z = V 2 for all V 2  R + .

Finally we have to discuss the postulate of a best estimator of type BLUUE of P


and BIQUUE of V 2 . We proceed sequentially, first we determine P̂ of type
BLUUE and second Vˆ 2 of type BIQUUE. At the end we shall discuss simulta-
neous estimation of ( Pˆ , Vˆ 2 ) .
The scalar Pˆ = A cy is BLUUE of P (Best Linear Uniformly Unbiased
Estimation) with respect to the linear model
E{y} = 1n P , D{y} = I nV 2 ,
if it is uniformly unbiased in the sense of E{Pˆ } = P for all P  R
and in comparison of all linear, uniformly unbiased estimations
possesses the smallest variance in the sense of

D{Pˆ } = E{[ Pˆ  E{Pˆ }]2 } =


V 2 A cA = V 2 tr A cA = V 2 || A ||2 = min .
The constrained Lagrangean L (A, O ) , namely
L (A, O ) := V 2 A cA + 2O (A c1n  1) =
= V 2 A cA + 2(1n A  1)O = min,
A ,O

produces by means of the first derivatives


1 wL ˆ ˆ
(A, O ) =V 2 Aˆ +1n Oˆ = 0
2 wA
1 wL ˆ ˆ ˆ
(A, O ) = A c1n 1= 0
2 wO
the normal equations for the augmented unknown vector (A, O ) , also known as
the necessary conditions for obtaining an optimum. Transpose the first normal
equation, right multiply by 1n , the unit column and substitute the second normal
equation in order to solve for the Lagrange multiplier Ô . If we substitute the
solution Ô in the first normal equation, we directly find the linear operator  .

V 2 Aˆ c + 1cn Oˆ = 0c

V 2 Aˆ c1 n + 1 cn 1 n Oˆ = V 2
+ 1 cn 1 n Oˆ = 0 Ÿ

V2 V2
Ÿ Oˆ =  =
1cn1n n
2
V
V 2 Aˆ +1n Oˆ =V 2 lˆ 1n = 0c Ÿ
n
1 1
Ÿ Aˆ = 1n and Pˆ = Aˆ cy = 1cn y .
n n
4-1 Introduction 195
The second derivatives
1 w 2L ˆ ˆ
( A , O ) = V 2 I n > 0c
2 wAwA c
constitute the sufficiency condition which is automatically satisfied. The theory
of vector differentiation is presented in detail in Appendix B. Let us briefly sum-
marize the first result P̂ BLUUE of P .

The scalar P̂ = A cy is BLUUE of P with respect to the linear model


E{y}= 1n P , D{y}= I nV 2 ,
if and only if
1 1
Aˆ c = 1cn and Pˆ = 1cn y
n n
is the arithmetic mean. The observation space y{Y, pdf } is decomposed into
y (BLUUE):= 1n Pˆ versus e y (BLUUE):= y  y (BLUUE),
1 1
y (BLUUE) = 1n 1cn y versus e y (BLUUE) =[I n  (1n 1cn )]y,
n n
which are orthogonal in the sense of
1 1
e y (BLUUE) y (BLUUE) = 0 or [I n  (1n1cn )] (1n1cn ) = 0.
n n

Before we continue with the setup of the Lagrangean which guarantees


BIQUUE, we study beforehand e y := y  E{y} and e y (BLUUE):= y  y (BLUUE) .
Indeed the residual vector e y (BLUUE) is a linear form of residual vector e y .
1
e y (BLUUE) =[I n  (1n1cn )] e y .
n
For the proof we depart from
1
e y (BLUUE):= y 1n Pˆ =[I n  (1n1cn )]y
n
1
=[I n  (1n1cn )]( y  E{y})
n
1
= I n  (1n1cn ) ,
n
where we have used the invariance property y 6 y  E{y} based upon the idem-
potence of the matrix [I n  (1n1cn ) / n] .

Based upon the fundamental relation e y (BLUUE) = De y , where D:= I n 


(1n1cn ) / n is a projection operator onto the normal space R (1n ) A , we are able to
derive an unbiased estimation of the variance component V 2 . Just compute
196 4 The second problem of probabilistic regression

E{ecy ( BLUUE )e y ( BLUUE )} =


= tr E{e y (BLUUE)ecy (BLUUE)} =
= tr D E{e y ecy }Dc = V 2 tr D Dc = V 2 tr D

tr D = tr ( I n )  tr 1n (1n1cn ) = n  1

E{ecy (BLUUE)e y (BLUUE)} = V 2 ( n  1) .

Let us define the quadratic estimator Vˆ 2 of V 2 by


1
Vˆ 2 = ecy (BLUUE)e y (BLUUE) ,
n 1
which is unbiased according to
1
E{Vˆ 2 } = E{ecy (BLUUE)e y (BLUUE)} = V 2 .
n 1
Let us briefly summarize the first result Vˆ 2 IQUUE of V 2 .
The scalar Vˆ 2 = ecy (BLUUE)e y (BLUUE) /( n  1) is IQUUE of V 2 based
upon the BLUUE-residual vector
e y (BLUUE) = ª¬I n  1n (1n1cn ) º¼ y .

Let us highlight Vˆ 2 BIQUUE of V 2 .


A scalar Vˆ 2 is BIQUUE of V 2 (Best Invariant Quadratic Uniformly Un-
biased Estimation) with respect to the linear model
E{y} = 1n P , D{y} = I nV 2 ,
if it is
(i) uniformly unbiased in the sense of E{Vˆ 2 } = V 2 for all V 2  \ + ,
(ii) quadratic in the sense of Vˆ 2 = y cMy for all M = M c ,
(iii) translational invariant in the sense of
y cMy = (y  E{y})cM ( y  E{y}) = ( y  1n P )cM ( y  1n P ) ,
(iv) best if it possesses the smallest variance in the sense of
D{Vˆ 2 } = E{[Vˆ 2  E{Vˆ 2 }]2 } = min .
M

First, let us consider the most influential postulate of translational invariance of


the quadratic estimation
Vˆ 2 = y cMy = (vec M )c(y … y ) = (y c … y c)(vec M)
to comply with
Vˆ 2 = ecy Me y = (vec M )c(e y … e y ) = (ecy … ecy )(vec M )
4-1 Introduction 197
subject to
M  SYM := {M  \ n× n| M = M c} .

Translational invariance is understood as the action of transformation group


y = E{y} + e y = 1n P + e y

with respect to the linear model of “direct” observations. Under the action of
such a transformation group the quadratic estimation Vˆ 2 of V 2 is specialized to

Vˆ 2 = y cMy = ª¬ E{y} + e y º¼c M ª¬ E{y} + e y º¼ = (1cn P + ecy )M (1n P + e y )

Vˆ 2 = P 2 1cn M1n + P 1cn Me y + P ecy M1n + ecy Me y


y cMy = ecy Me y œ 1cn M = 0c and 1cn M c = 0c .

IQE, namely 1cn M = 0c and 1cn M c = 0c has a definite consequence. It is independ-


ent of P , the first moment of the probability distribution (“pdf”). Indeed, the
estimation procedure of the central second moment V 2 is decoupled from the
estimation of the first moment P . Here we find the key role of the invariance
principle. Another aspect is the general solution of the homogeneous equation
1cn M = 0c subject to the symmetry postulate M = M c .

ªM = ªI n  1cn (1cn1n ) 11cn º Z


1cM = 0c œ « ¬ ¼
«¬ M = (I n  1n 1n1cn )Z ,

where Z is an arbitrary matrix. The general solution of the homogeneous matrix


equation contains the left inverse (generalized inverse (1cn 1n ) 1 1cn = 1-n ) which
takes an exceptionally simple form, here. The general form of the matrix
Z  \ n× n is in no agreement with the symmetry postulate M = M c .
1cn M = 0c
œ M = D (I n  1n 1n1cn ).
M = Mc

Indeed, we made the choice Z = D I n which reduces the unknown parameter


space to one dimension. Now by means of the postulate “best” under the con-
straint generated by “uniform inbiasedness” Vˆ 2 of V 2 we shall determine the
parameter D = 1/(n  1) .
The postulate IQUUE is materialized by

E{Vˆ 2 } = V 2 º ª E{ecy Me y } = mij E{eiy e jy }


» œ « +
Vˆ 2 = ecy Me y ¼» ¬« = mij S ij = V mij G ij = V V  \
2 2 2

E{Vˆ 2 | Ȉ y = I nV 2 } = V 2 œ tr M = 1 œ tr M  1 = 0 .
198 4 The second problem of probabilistic regression

For the simple case of “i.i.d.” observations, namely Ȉ y = I nV 2 , E{Vˆ 2 } = V 2 for


an IQE, IQUUE is equivalent to tr M = 1 or (tr M )  1 = 0 as a condition equa-
tion.
D tr(I n  1n 1n1cn ) = D (n  1) = 1
1
tr M = 1 œ D= .
n 1
IQUUE of the simple case
invariance : (i ) 1cM = 0c and M = M cº 1
» Ÿ M= (I n  1n 1n1cn )
QUUE : (ii ) tr M  1 = 0 ¼ n  1

has already solved our problem of generating the symmetric matrix M .

1
Vˆ 2 = y cMy = y c(I n  1n 1n1cn )y  IQUUE
n 1

? Is there still a need to apply “best” as an optimability condition for BIQUUE ?


Yes, there is! The general solution of the homogeneous equations 1cn M = 0c and
M c1n = 0 generated by the postulate of translational invariance of IQE did not
produce a symmetric matrix. Here we present the simple symmetrization. An
alternative approach worked depart from
1
2
(M + M c) = 12 {[I n  1n (1cn1n ) 11cn ]Z + Zc[I n  1n (1cn1n ) 11cn ]} ,

leaving the general matrix Z as an unknown to be determined. Let us therefore


develop BIQUUE for the linear model
E{y} = 1n P , D{y} = I nV 2

D{Vˆ 2 } = E{(Vˆ 2  E{Vˆ 2 }) 2 } = E{Vˆ 4 }  E{Vˆ 2 }2 .

Apply the summation convention over repeated indices i, j , k , A  {1,..., n}.


1st : E{Vˆ 2 }2
E{Vˆ 2 }2 = mij E{eiy e jy }mk A E{eky eAy } = mij mklS ijS k A
subject to
S ij := E{e e } = V G ij and S k A := E{eky eAy } = V 2G k A
y y
i j
2

E{Vˆ 2 }2 = V 4 mijG ij mk AG k A = V 4 (tr M ) 2

2nd : E{Vˆ 4 }
E{Vˆ 4 } = mij mk A E{eiy e jy eky eAy } = mij mk AS ijk A
4-1 Introduction 199
subject to
S ijk A := E{eiy e jy eky eAy }  i, j , k , A  {1,..., n} .

For a normal pdf, the fourth order moment S ijk A can be reduced to second order
moments. For a more detailed presentation of “normal models” we refer to Ap-
pendix D.
S ijk A = S ijS k A + S ik S jA + S iAS jk = V 4 (G ijG k A + G ik G j A + G iAG jk )

E{Vˆ 4 } = V 4 mij mk A (G ijG k A + G ik G jA + G iAG jk )

E{Vˆ 4 } = V 4 [(tr M ) 2 + 2 tr M cM ].

Let us briefly summarize the representation of the variance D{Vˆ 2 } =


E{(Vˆ 2  E{Vˆ 2 }) 2 } for normal models.
Let the linear model of i.i.d. direct observations be
defined by
E{y | pdf } = 1n P , D{y | pdf } = I nV 2 .
The variance of a normal IQE can be represented by
D{Vˆ 2 } := E{(Vˆ 2  E{Vˆ 2 }) 2 } =
= 2V 4 [(tr M ) 2 + tr(M 2 )].
In order to construct BIQUUE, we shall define a constrained Lagrangean which
takes into account the conditions of translational invariance, uniform unbiased-
ness and symmetry.
L (M, O0 , O1 , O 2 ) := 2 tr M cM + 2O0 (tr M  1) + 2O11cn M1 n + 2O 2 1cn M c1 n = min .
M , O0 , O1 , O2

Here we used the condition of translational invariance in the special form


1cn 12 (M + M c)1 n = 0 œ 1cn M1 n = 0 and 1cn M c1 n = 0 ,

which accounts for the symmetry of the unknown matrix. We here conclude with
the normal equations for BIQUUE generated from
wL wL wL wL
= 0, = 0, = 0, = 0.
w (vec M ) wO0 wO1 wO2

ª 2(I n … I n ) vec I n I n … 1 n 1 n … I n º ª vec M º ª0 º


« (vec I )c 0 0 0 »» «« O0 »» ««1 »»
« n
= .
« I n … 1cn 0 0 0 » « O1 » «0 »
« »« » « »
¬ 1cn … I n 0 0 0 ¼ ¬ O2 ¼ ¬0¼
200 4 The second problem of probabilistic regression

These normal equations will be solved lateron. Indeed M = (I n  1n 1 n1cn ) / ( n  1)


is a solution.
1
Vˆ 2 = y c(I n  1n 1n1cn )y
n 1
BIQUUE:
2
D{Vˆ 2 } = V4
n 1
Such a result is based upon
1 1
(tr M ) 2 (BIQUUE) = , (tr M 2 )(BIQUUE) = ,
n 1 n 1

D{Vˆ 2 | BIQUUE} = D{Vˆ 2 } = 2V 4 [(tr M ) 2 + (tr M 2 )](BIQUUE),

2
D{Vˆ 2 } = V 4.
n 1
Finally, we are going to outline the simultaneous estimation of {P , V 2 } for the
linear model of direct observations.
• first postulate: inhomogeneous, multilinear (bilinear) estimation
Pˆ = N 1 + A c1y + mc1 (y … y )

Vˆ 2 = N 2 + A c2 y + (vec M 2 )c( y … y )

ª Pˆ º ªN 1 º ª A c1 mc1 º ª y º
«Vˆ 2 » = «N » + « A c (vec M )c» « y … y »
¬ ¼ ¬ 2¼ ¬ 2 2 ¼¬ ¼

ª Pˆ º ªN A c1 m1c º
x = XY œ x := « 2 » , X = « 1 »
¬Vˆ ¼ ¬N 2 A c2 (vec M 2 )c¼

ª 1 º
Y := «« y »»
«¬ y … y »¼

• second postulate: uniform unbiasedness


ª Pˆ º ª P º
E {x} = E{« 2 »} = « 2 »
¬Vˆ ¼ ¬V ¼
• third postulate: minimum variance

D{x} := tr E{ª¬ x  E {x}º¼ ª¬ x  E {x}º¼ c } = min .


4-1 Introduction 201

4-13 BLUUE and BIQUUE of the front page example, sample median,
median absolute deviation
According to Table 4.1 and Table 4.2 we presented you with two sets of observa-
tions yi  Y, dim Y = n, i  {1,..., n} , the second one qualifies to certain “one
outlier”. We aim at a definition of the median and of the median absolute devia-
tion which is compared to the definition of the mean (weighted mean) and of the
root mean square error. First we order the observations according to
y(1) < y( 2) < ... < y( n1) < y( n ) by means of the permutation matrix
ª y(1) º ª y1 º
« y » « y2 »
« (2) » « »
« ... » = P « ... » ,
« y( n 1) » « yn 1 »
« » «¬ yn »¼
¬« y( n ) ¼»

namely
data set one data set two
ª 11 º ª 0 0 0 1 0 0º ª 15 º
ª11º ª 0 0 0 1 0º ª15º « 12 » « 0
«12 » « 0 1 0 0 0 0»» «« 12 »»
« » « 1 0 0 0 »» ««12»» « » «
« 13 » « 0 0 0 0 1 0» « 14 »
«13» = « 0 0 0 0 1 » «14» versus « »=« »« »
« » « »« » « 14 » « 0 0 1 0 0 0» « 11 »
«14 » « 0 0 1 0 0 » «11»
« 15 » «1 0 0 0 0 0» « 13 »
«¬15»¼ «¬1 0 0 0 0»¼ «¬13»¼ « » « »« »
«¬116 »¼ «¬ 00 0 0 0 1 »¼ «¬116»¼
ª0 0 0 1 0 0º
ª0 0 0 1 0º «0 1 0 0 0 0»
«0 1 0 0 0»» « »
« «0 0 0 0 1 0»
P5 = « 0 0 0 0 1» versus P6 = « ».
« » «0 0 1 0 0 0»
«0 0 1 0 0»
«1 0 0 0 0 0 »
«¬1 0 0 0 0»¼ « »
«¬0 0 0 0 0 1 »¼

Note PP c = I , P 1 = P c . Second, we define the sample median med y as well as


the median absolute deviation mad y of y  Y by means of

ª y([ n / 2]+1) if n is an odd number


med y := « 1
¬ 2 ( y( n / 2) + y( n / 2+1) ) if n is an even number
mad y := med | y( i )  med y | ,

where [n/2] denotes the largest integer (“natural number”) d n / 2 .


202 4 The second problem of probabilistic regression

Table 4.3: “direct” observations, comparison two data sets by means of med y,
mad y (I-LESS, G y -LESS), r.m.s. (I-BIQUUE)
data set one data set two
n = 5 (“odd”) n = 6 (“even”)
n / 2 = 2.5, [n / 2] = 2 n / 2 = 3, n / 2 + 1 = 4
[n / 2] + 1 = 3
med y = y(3) = 13 med y = 13.5

mad y = 1 mad y = 1.5

mean y (I -LESS) = 13 mean y (I -LESS) = 30.16


weighted mean y
(G y -LESS) = 13.5

G y = Diag(1,1,1,1,1, 1000
24
)

Pˆ (I -BLUUE ) = 13 Pˆ (I -BLUUE ) = 30.16

Vˆ 2 (I -BIQUUE ) = 2.5 Vˆ 2 (I-BIQUUE ) = 1770.1


r.m.s. (I -BIQUUE ) = r.m.s. (I -BIQUUE ) =
Vˆ (I -BIQUUE ) = 1.6 Vˆ (I-BIQUUE ) = 42.1

Third, we compute I-LESS, namely mean y = (1c1) 1 y = 1n 1cy listed in Table


4.3. Obviously for the second observational data set the Euclidean metric of the
observation space Y is not isotropic. Indeed let us compute G y -LESS, namely
the weighted mean y = (1cG y 1) 1 1cG y y . A particular choice of the matrix
G y  \ 6×6 of the metric, also called “weight matrix”, is G y = Diag(1,1,1,1,1, x)
such that
y + y2 + y3 + y4 + y5 + y6 x
weighted mean y = 1 ,
5+ x
where x is the unknown weight of the extreme value (“outlier”) y6 . A special
robust design of the weighted mean y is the median y , namely
weighted mean y = med y

such that
y1 + y2 + y3 + y4 + y5  5med y
x=
med y  y6
4-1 Introduction 203
here
24
x = 0.024, 390, 243  .
1000
Indeed the weighted mean with respect to the weight matrix G y = Diag(1,1,1,
1,1, 24 /1000) reproduces the median of the second data set. The extreme value
has been down-weighted by a weight 24 /1000 approximately.
Four, with respect to the simple linear model E{y} = 1P , D{y} = IV 2 we com-
pute I-BLUUE of P and I-BIQUUE of V 2 , namely
Pˆ = (1c1) 11cy = 1n 1cy
1 1 1
Vˆ 2 = y c ªI  1(1c1) 1 º¼ y = y c ªI  1 11cº y = (y  1Pˆ )c(y  1Pˆ ) .
n 1 ¬ n 1 ¬ n ¼ n 1
Obviously the extreme value y6 in the second data set has spoiled the specifica-
tion of the simple linear model. The r.m.s. (I-BLUUE) = 1.6 of the first data set
is increased to the r.m.s. (I-BIQUUE) = 42.1 of the second data set.
Five, we setup the alternative linear model for the second data set, namely
ª y1 º ª P1 º ª P º ª1º ª0º
« y2 » « P » « P » «1» «0»
« » « 1» « » « » « »
y P P » 1 0
E{« 3 »} = « 1 » = « = « » P + « »Q
« y4 » « P1 » « P » «1» «0»
« y5 » « P1 » « P » «1» «0»
« » « » « » «» «1 »
«¬ y6 »¼ ¬ P 2 ¼ ¬ P + Q ¼ ¬1¼ ¬ ¼

[ ] [ ]
ª A := 1, a  \ 5×2 , 1 := 1,1,1,1,1,1 c  \ 6×1
E{y} = Aȟ : «
«
¬ȟ := [ P ,Q ]  \ , a := [ 0, 0, 0, 0, 0,1]  \
c 2×1 c 6×1

ª1 0 0 0 0 0º
«0 1 0 0 0 0»
« »
0 0 1 0 0 0» 2
D{y} = « V  \ 6×6
«0 0 0 1 0 0»
«0 0 0 0 1 0»
«0 0 0 0 0 1 »¼
¬
D{y} = I 6V 2 , V 2  \ + ,
adding to the observation y6 the bias term Q . Still we assume the variance-
covariance matrix D{y} of the observation vector y  \ 6×1 to be isotropic with
one variance component as an unknown. ( Pˆ ,Qˆ ) is I 6 -BLUUE if
204 4 The second problem of probabilistic regression

ª Pˆ º
«Qˆ » = ( A cA) A cy
1

¬ ¼

ª Pˆ º ª 13 º
«Qˆ » = «103»
¬ ¼ ¬ ¼
Pˆ = 13, Qˆ = 103, P1 = Pˆ = 13, yˆ 2 = 116

ª Pˆ º ª Pˆ º V 2 ª 1 1º
D{« »} = ( A cA) 1 V 2 D{« »} = « 1 6 »
¬Qˆ ¼ ¬Qˆ ¼ 5 ¬ ¼
V2 6 1 2
V P2ˆ = , V Q2ˆ = V 2 , V PQ
ˆˆ =  V
5 5 5
Vˆ 2 is I 6 -BIQUUE if

1
Vˆ 2 = y c ªI 6  A ( A cA ) 1 A cº¼ y
n  rk A ¬

ª4 1 1 1 1 0º
« 1 4 1 1 1 0»
1 « 1 1 4 1 1 0»
»
I 6  A ( A cA ) 1 A c = «
5 « 1 1 1 4 1 0»
« 1 1 1 1 4 0»
«0 0 0 0 0 5 »¼
¬
§4 4 4 4 4 ·
ri := ª¬I 6  A( A cA) 1 A cº¼ = ¨ , , , , ,1¸ i  {1,..., 6}
ii
©5 5 5 5 5 ¹
are the redundancy numbers.
y c(I 6  A( A cA) 1 A c) y = 13466
1
Vˆ 2 = 13466 = 3366.5, Vˆ = 58.02
4
3366.5
V Pˆ (Vˆ ) =
2 2
= 673.3, V Pˆ (Vˆ ) = 26
5
6
V Q2ˆ (Vˆ 2 ) = 3366.5 = 4039.8, V Qˆ (Vˆ ) = 63.6 .
5
Indeed the r.m.s. value of the partial mean P̂ as well as of the estimated bias Qˆ
have changed the results remarkably, namely from r.m.s. (simple linear model)
42.1 to r.m.s. (linear model) 26. A r.m.s. value of the bias Qˆ in the order of 63.6
has been documented. Finally let us compute the empirical “error vector” l and
is variance-covariance matrix by means of
4-1 Introduction 205

e y = ª¬I 6  A ( A cA ) 1 A cº¼ y , D{e y } = ª¬I 6  A( A cA) 1 A cº¼ V 2 ,

e y = [ 2 1 1 2 0 116]c

ª4 1 1 1 1 0º
« 1 4 1 1 1 0» 2
{} «
D l = « 1
1
1
1
4
1
1
4
1
1
0» V
0» 5
« 1 1 1 1 4 0»
«¬ 0 0 0 0 0 5 »¼
ª4 1 1 1  1 0º
« 1 4 1 1  1 0»
« 0» .
D{l | Vˆ } = 673.3 « 1
 2 1 4 1  1
1 1 1 4 1 0»
« 1 1 1 1 4 0»
«¬ 0 0 0 0 0 5 »¼

4-14 Alternative estimation Maximum Likelihood (MALE)


Maximum Likelihood Estimation ("MALE") is a competitor to BLUUE of the
first moments E{y} and to the BIQUUE of the second central moments D{y} of
a random variable y…{Y, pdf } , which we like to present at least by an example.
Maximum Likelihood Estimation
:linear model:
E{y} = 1n P , D{y} = I nV 2
"independent, identically normal distributed observations"
[ y1 ,..., yn ]c
"direct observations"
unknown parameter:{P , V }  {R, R + } =: X
"simultaneous estimations of {P , V 2 } ".
Given the above linear model of independent, identically, normal distributed
observations [y 1 , ..., y n ]c = y  {R n , pdf } . The first moment P as well as the
central second moment V 2 constitute the unknown parameters ( P , V 2 )  R × R +
where R × R + is the admissible parameter space. The estimation of the unknown
parameters ( P , V 2 ) is based on the following optimization problem
Maximize the log-likelihood function
n
ln f ( y1 , ..., yn P , V 2 ) = ln – f ( yi P , V 2 ) =
i =1
n
1 1
= ln{ n n
exp( 2 ¦(y i  P ) 2 )} =
V 2V
( 2S ) 2 i =1

n
n n 1
=  ln 2S  ln V 2  2 ¦(y i  P ) 2 = max
2 2 V i =1 P ,V 2
206 4 The second problem of probabilistic regression

of the independent, identically normal distributed random variables { y1 ,… , yn } .


The log-likelihood function is simple if we introduce the first sample moment
m1 and the second sample moment m2 , namely
1 n 1 1 n 2 1
m1 := ¦ i ny = 1c y , m2 := ¦ yi = n y cy
n i =1 n i =1

( ) n
2
n
2
n
(
ln f y1 , " , yn P , V 2 =  ln 2S  ln V 2  2 m2  2m1 P + P 2 .
2V
)
Now we are able to define the optimization problem
( ) (
A P , V 2 := ln f y1 , " , yn P , V 2 = max) P, V 2

more precisely.
Definition (Maximum Likelihood Estimation, linear model
E{y} = 1n P , D{y} = I nV 2 , independent, identically normal
distributed observations { y1 ,… , yn } ):
A 2x1 vector [ PA , V A2 ]' is called MALE of [ P , V 2 ]' , (Maximum Likeli-
hood Estimation) with respect to the linear model 0.1 if its log-
likelihood function

A( P , V 2 ) := ln f ( y1 ,… , yn P , V 2 ) =
n n n
=  ln 2S  ln V 2  2 (m2  2m1 + P 2 )
2 2 2V
is minimal.
The simultaneous estimation of ( P , V 2 ) of type MALE can be characterized as
following.
Corollary (MALE with respect to the linear model
E{y}= 1n P , D{y}= I nV 2 , independent identically normal dis-
tributed observations { y1 ,… , yn } ):
( )
The log-likelihood function A P , V 2 with respect to the linear model
E{y} = 1n P , D{y} = I nV 2 , ( P , V 2  R × R + ) , of independent, identically
normal distributed observations { y1 ,… , yn } is maximal if
1 1
PA = m1 = 1c y, V A2 = m2  m12 = ( y  yA )c( y  yA )
n n
is a simultaneous estimation of the mean volume (first moment) PA and
of the variance (second moment) V A2 .
:Proof:
The Lagrange function
n n
L( P , V 2 ) :=  ln V 2  2 (m2  2m1 P + P 2 ) = max
2 2V P ,V 2
4-1 Introduction 207
leads to the necessary conditions
wL nm nP
( P , V 2 ) = 21 = 2 = 0
wP V V
wL n n
( P , V 2 ) =  2 + 4 (m2  2 P m1 + P 2 ) = 0,
wV 2 2V 2V
also called the likelihood normal equations.
Their solution is
ª P1 º ª m1 º 1 ª 1cy º
«V 2 » = « m  m 2 » = « y cy - (1cy)2 » .
¬ A¼ ¬ 2 1 ¼ n¬ ¼
The matrix of second derivates constitutes as a negative matrix the sufficiency
conditions.

ª 1 º
2 «V 2 0 »
w L
( P A , V A ) =  n « A »>0.
w ( P , V 2 )w ( P , V 2 ) ' « 0 1 »
« V A4 »¼
¬
h
Finally we can immediately check that A( P , V 2 ) o f as ( P , V 2 ) approaches the
boundary of the parameter space. If the log-likelihood function is sufficiently
regular, we can expand it as
ª PP º
A( P , V 2 ) = A( PA , V A2 ) + DA( PA , V A2 ) « 2 A2 » +
¬V  V A ¼
1 ª PP º ª PP º
+ D 2 A( PA , V A2 ) « 2 A2 » … « 2 A2 » + O3 .
2 ¬V  V A ¼ ¬V  V A ¼
Due to the likelihood normal likelihood equations DA( PA , V A2 ) vanishes. There-
fore the behavior of A( P , V 2 ) near ( PA , V A2 ) is largely determined by
D 2 A( PA , V A2 )  R × R + , which is a measure of the local curvature the log-
likelihood function A( P , V 2 ) . The non-negative Hesse matrix of second deriva-
tives

w2A
I ( PA , V A2 ) =  2 2
( PA , V A2 ) > 0
w ( P , V )w ( P , V ) '

is called observed Fischer information. It can be regarded as an index of the


steepness of the log-likelihood function moving away from ( P , V 2 ) , and as an
indicator of the strength of preference for the MLE point with respect to the
other points of the parameter space.
208 4 The second problem of probabilistic regression

Finally, compare by means of Table 4.4 ( PA , V A2 ) MALE of ( P , V 2 ) for the front


page example of Table 4.1 and Table 4.2
Table 4.4 : ( PA , V A2 ) MALE of ( P , V 2 )  {R, R + } : the front page examples

PA V A2 | VA |
st
1 example
(n=5) 13 2 2
2nd example
(n=6) 30.16 1474.65 36.40

4-2 Setup of the best linear uniformly unbiased estimator of type


BLUUE for the moments of first order
Let us introduce the special Gauss-Markov model y = Aȟ + e specified in Box
4.1, which is given for the first order moments in the form of a inconsistent sys-
tem of linear equations relating the first non-stochastic (“fixed”), real-valued
vector ȟ of unknowns to the expectation E{y} of the stochastic, real-valued
vector y of observations, Aȟ = E{y} , since E{y}  R ( A) is an element of the
column space R ( A ) of the real-valued, non-stochastic ("fixed") "first order
design matrix" A  \ n× m . The rank of the fixed matrix A, rk A, equals the num-
ber m of unknowns, ȟ  \ m . In addition, the second order central moments, the
regular variance-covariance matrix Ȉ y , also called dispersion matrix D{y}
constitute the second matrix Ȉ y  \ n×n of unknowns to be specified as a linear
model furtheron.
Box 4.1:
Special Gauss-Markov model
y = Aȟ + e
1st moments
Aȟ = E{y}, A  \ n× m , E{y}  R ( A), rk A = m (4.1)
2nd moments
Ȉ y = D{y}  \ n×n , Ȉ y positive definite, rk Ȉ y = n (4.2)

ȟ, E{y}, y  E{y} = e unknown


Ȉ y unknown .

4-21 The best linear uniformly unbiased estimation ȟ̂ of ȟ : Ȉ y -BLUUE


Since we are dealing with a linear model, it is "a natural choice" to setup a lin-
ear form to estimate the parameters ȟ of fixed effects, namely

ȟˆ = Ly + ț , (4.3)
4-2 Setup of the best linear uniformly unbiased estimators 209

where {L  \ m × n , ț  \ m } are fixed unknowns. In order to determine the real-


valued m×n matrix L and the real-valued m×1 vector ț , independent of the
variance-covariance matrix Ȉ y , the inhomogeneous linear estimation ȟ̂ of the
vector ȟ of fixed effects has to fulfil certain optimality conditions.
(1st) ȟ̂ is an inhomogeneous linear unbiased estimation of ȟ

E{ȟˆ} = E{Ly + ț} = ȟ for all ȟ  R m ,


(4.4)
and
(2nd) in comparison to all other linear uniformly unbiased estimations
ȟ̂ has minimum variance

tr D{ȟˆ}: = E{(ȟˆ  ȟ )c(ȟˆ  ȟ )} =


(4.5)
= tr L Ȉ y Lc =|| Lc ||Ȉ = min .
L

First the condition of a linear uniformly unbiased estimation E{ȟˆ} =


E{Ly + ț} = ȟ for all ȟ  R m with respect to the Special Gauss-Markov model
(4.1), (4.2) has to be considered in more detail. As soon as we substitute the
linear model (4.1) into the postulate of uniformly unbiasedness (4.4) we are led
to
E{ȟˆ} = E{Ly + ț} = LE{y} + ț = ȟ for all ȟ  R m (4.6)
and
LAȟ + ț = ȟ for all ȟ  R m . (4.7)

Beside ț = 0 the postulate of linear uniformly unbiased estimation with respect


to the special Gauss-Markov model (4.1), (4.2) leaves us with one condition,
namely
(LA  I m )ȟ = 0 for all ȟ  R m (4.8)
or
LA  I m = 0. (4.9)

Note that there are locally unbiased estimations such that (LA  I m )ȟ 0 = 0 for
LA  I m z 0. Alternatively, B. Schaffrin (2000) has softened the constraint of
unbiasedness (4.9) by replacing it by the stochastic matrix constraint
A cLc = I m + E0 subject to E{vec E0 } = 0, D{vec E0 } = (I m … Ȉ 0 ), Ȉ 0 a positive
definite matrix. For Ȉ0 o 0 , uniform unbiasedness is restored. Estimators which
fulfill the stochastic matrix constraint A cLc = I m + E0 for finite Ȉ0 are called
“softly unbiased” or “unbiased in the mean”.
Second, the choice of norm for "best" of type minimum variance has to be dis-
cussed more specifically. Under the condition of a linear uniformly unbiased
estimation let us derive the specific representation of the weighted Frobenius
matrix norm of Lc . Indeed let us define the dispersion matrix
210 4 The second problem of probabilistic regression

D{ȟˆ} := E{(ȟˆ  E{ȟˆ})(ȟˆ  E{ȟˆ})c} =


(4.10)
= E{(ȟˆ  ȟ )(ȟˆ  ȟ )c},

which by means of the inhomogeneous linear form ȟˆ = Ly + ț is specified to

D{ȟˆ} = LD{y}Lc (4.11)


and
Definition 4.1 ( ȟ̂ Ȉ y - BLUUE of ȟ ):
An m×1 vector ȟˆ = Ly + ț is called Ȉ y - BLUUE of ȟ (Best Linear
Uniformly Unbiased Estimation) with respect to the Ȉ y -norm in
(4.1) if
(1st) ȟ̂ is uniformly unbiased in the sense of

tr D{ȟˆ} : = tr L D{y} Lc =|| Lc ||Ȉ .


y
(4.12)

Now we are prepared for


Lemma 4.2: ( ȟˆ Ȉ y - BLUUE of ȟ ):
An m×1 vector ȟˆ = Ly + ț is Ȉ y - BLUUE of ȟ in (4.1) if and only if
ț=0 (4.13)
holds and the matrix L fulfils the system of normal equations
ª Ȉ y A º ª Lcº ª 0 º
« Ac 0 » « ȁ » = «I » (4.14)
¬ ¼¬ ¼ ¬ m¼
or
Ȉ y Lc + Aȁ = 0 (4.15)
and
A cLc = I m (4.16)
with the m × m matrix of "Lagrange multipliers".
:Proof:
Due to the postulate of an inhomogeneous linear uniformly unbiased estimation
with respect to the parameters ȟ  \ m of the special Gauss-Markov model we
were led to ț = 0 and one conditional constraint which makes it plausible to
minimize the constraint Lagrangean
L ( L, ȁ ) := tr LȈ y Lc + 2tr ȁ( A cLc  I m ) = min (4.17)
L ,ȁ

for Ȉ y - BLUUE. The necessary conditions for the minimum of the quadratic
constraint Lagrangean L (L, ȁ ) are
4-2 Setup of the best linear uniformly unbiased estimators 211

wL ˆ ˆ ˆ )c = 0
( L, ȁ ) := 2( Ȉ y Lˆ c + Aȁ (4.18)
wL
wL ˆ ˆ
(L, ȁ ) := 2( A cLˆ c  I m ) = 0 , (4.19)

which agree to the normal equations (4.14). The theory of matrix derivatives is
reviewed in Appendix B, namely (d3) and (d4).
The second derivatives
w 2L ˆ ) = 2( Ȉ … I ) > 0
(Lˆ , ȁ y m (4.20)
w (vecL)w (vecL)c
constitute the sufficiency conditions due to the positive-definiteness of the matrix
Ȉ for L (L, ȁ ) = min . (The Kronecker-Zehfuss Product A … B of two arbitrary
matrices A and B, is explained in Appendix A.)
h
Obviously, a homogeneous linear form ȟˆ = Ly is sufficient to generate Ȉ -
BLUUE for the special Gauss-Markov model (4.1), (4.2). Explicit representa-
tions of Ȉ - BLUUE of type ȟ̂ as well as of its dispersion matrix D{ȟˆ | ȟˆ Ȉ y -
BLUUE} generated by solving the normal equations (4.14) are collected in
Theorem 4.3 ( ȟˆ Ȉ y -BLUUE of ȟ ):
Let ȟˆ = Ly be Ȉ - BLUUE of ȟ in the special linear Gauss-Markov
model (4.1),(4.2). Then
ȟˆ = ( A cȈ y 1 A) 1 A cȈ y1 y (4.21)

ȟˆ = Ȉȟˆ A cȈ y1y (4.22)


are equivalent to the representation of the solution of the normal
equations (4.14) subjected to the related dispersion matrix

D{ȟˆ}:= Ȉȟˆ = ( A cȈ y1A ) 1 .

:Proof:
We shall present two proofs of the above theorem: The first one is based upon
Gauss elimination in solving the normal equations (4.14), the second one uses
the power of the IPM method (Inverse Partitioned Matrix, C. R. Rao's Pandora
Box).
(i) forward step (Gauss elimination):
Multiply the first normal equation by Ȉ y1 , multiply the reduced equation by Ac
and subtract the result from the second normal equation. Solve for ȁ
212 4 The second problem of probabilistic regression

Ȉ y Lˆ c + Aȁˆ = 0 (first equation: º


multiply by -A cȈy1 ) » œ
»
A cLˆ c = I m (second equation) »¼

 A cLˆ c  A cȈy1Aȁˆ = 0º
œ » œ
A cLˆc=I »¼
m

œ  A cȈ y1Aȁ ˆ =I Ÿ

ˆ = ( A cȈ 1A ) 1 .
Ÿ ȁ (4.23)
y

(ii) backward step (Gauss elimination):


Substitute ȁ̂ in the modified first normal equation and solve for L̂ .

Lˆ c + Ȉy1Aȁˆ = 0 œ Lˆ = ȁ ˆ cA cȈ 1 º
y
œ » œ
ˆ
ȁ = ( A cȈ y A )
1 1
»¼

Ÿ Lˆ = ( A cȈy1A ) 1 A cȈy1 . (4.24)

(iii) IPM (Inverse Partitioned Matrix):


Let us partition the symmetric matrix of the normal equations (4.14)

ª Ȉ y A º ª A11 A12 º
« Ac 0 » = « Ac 0 »¼
.
¬ ¼ ¬ 12
According to Appendix A (Fact on Inverse Partitioned Matrix: IPM) its Cayley
inverse is partitioned as well.
1 1
ª Ȉy Aº ªA A12 º ªB B12 º
= « 11 = « 11
« Ac
¬ 0 »¼ c
¬ A12 0 »¼ c
¬B12 B 22 »¼

B11 = I m  Ȉ y1A( A cȈ y1A) 1 A cȈ y1


c = ( A cȈ y1A) 1 A cȈ y1
B12
B 22 = ( AcȈ y1A) 1.

The normal equations are now solved by


1
ªLˆ cº ª A11 A12 º ª 0 º ª B11 B12 º ª 0 º
« »=« =
ˆ c
¬« ȁ ¼» ¬ A12 0 »¼ «¬ I m »¼ «¬ B12
c B 22 »¼ «¬ I m »¼

Lˆ = B12
c = ( AcȈ y1A) 1 AcȈ y1
ˆ = B = ( A cȈ 1A) 1. (4.25)
ȁ 22 y
4-2 Setup of the best linear uniformly unbiased estimators 213
(iv) dispersion matrix
The related dispersion matrix is computed by means of the "Error Propagation
Law".
D{ȟˆ} = D{Ly | Lˆ = ( A cȈ y1A) 1} = Lˆ D{y}Lˆ c
D{ȟˆ} = ( A cȈ y1A) 1 A cȈ y1Ȉ y Ȉ y1A( A cȈ y1 A) 1

D{ȟˆ} = ( A cȈ y1A ) 1 . (4.26)

Here is my proof's end.


h
By means of Theorem 4.3 we succeeded to produce ȟ̂ - BLUUE of ȟ . In conse-
n
quence, we have to estimate E{y} as Ȉ y - BLUUE of E{y} as well as the "er-
ror vector"
e y := y  E{y} (4.27)
n
e y := y  E {y} = y  Aȟˆ = ( I n  AL) y (4.28)
out of
Lemma 4.4: ( En{y} Ȉ y - BLUUE of E{y} , e y , D{e y }, D{y} ):
n
(i) Let E{y} be Ȉ - BLUUE of E{y} = Aȟ with respect to the
special Gauss-Markov model (4.1), (4.2) , Then
n
E {y} = Aȟˆ = A( A cȈ y 1A ) 1 A cȈy 1 y (4.29)
leads to the singular variance-covariance matrix (dispersion matrix)

D{Aȟˆ} = A( A cȈ y1A ) 1 A c . (4.30)


(ii) If the error vector e is empirically determined, we receive for
the residual vector
e y = [I n  A( A cȈy1A ) 1 A cȈy1 ]y (4.31)
and its singular variance-covariance matrix (dispersion matrix)
D{e y } = Ȉ y  A( A cȈ y1A ) 1 A c, rk D{e y } = n  m . (4.32)

(iii) The dispersion matrices of the special Gauss-Markov model (4.1),


(4.2) are related by

D{y} = D{Aȟˆ + e y } = D{Aȟˆ} + D{e y } =


(4.33)
= D{e y  e y } + D{e y },

C{e y , Aȟˆ} = 0, C{e y , e y  e y } = 0. (4.34)

e y and Aȟˆ are uncorrelated .


214 4 The second problem of probabilistic regression

:Proof:
n
(i ) E{y} = Aȟˆ = A ( A cȈ y 1 A ) 1 A cȈ y 1 y

As soon as we implement ȟ̂ Ȉ y - BLUUE of ȟ , namely (4.21), into Aȟˆ we are


directly led to the desired result.

(ii ) D{Aȟˆ} = A ( A cȈ y 1 A ) 1 A c

ȟ̂ Ȉ y - BLUUE of ȟ , namely (4.21), implemented in

D{Aȟˆ} := E{A(ȟˆ  E{ȟˆ})(ȟˆ  E{ȟˆ})cA c}

D{Aȟˆ} = AE{(ȟˆ  E{ȟˆ})(ȟˆ  E{ȟˆ})c}A c


D{Aȟˆ} = A( A cȈ y1A) 1 AcȈ y1 E{(y  E{y})( y  E{y})c}Ȉ y1A( AcȈ y1A) 1 Ac
D{Aȟˆ} = A( A cȈ y1A) 1 AcȈ y1A( A cȈ y1A) 1 Ac
D{Aȟˆ} = A( A cȈ y1A) 1 Ac

leads to the proclaimed result.


(iii ) e y = [I n  A( AcȈ y1 A) 1 AcȈ y1 ]y.

Similarly if we substitute Ȉ y - BLUUE of ȟ , namely (4.21), in


n
e y = y  E {y} = y  Aȟˆ = [I n  A ( A cȈ y 1A ) 1 A cȈ y 1 ]y

we gain what we wanted!


(iv ) D{eˆ y } = Ȉ  A( A cȈ y1A ) 1 A c

D{e y } := E{(e y  E{e y })(e y  E{e y })c}.

As soon as we substitute
E{e y } = [I n  A( A cȈy1A ) 1 A cȈy1 ]E{y}

in the definition of the dispersion matrix D{e y } , we are led to

D{e y }:= [I n  A ( A cȈy1A ) 1 A cȈy1 ] Ȉ [I n  Ȉy1A ( A cȈy1A ) 1 A c],


D{e y } =
= [ Ȉ y  A ( A cȈ A ) A c][I n  Ȉ y1A( A cȈ y1A ) 1 A c] =
1
y
1

= Ȉ y  A ( A cȈ y1A ) 1 A c  A( A cȈy1A ) 1 A c + A( A cȈy1A ) 1 A c =


= Ȉ y  A ( A cȈ y1A ) 1 A c.
4-2 Setup of the best linear uniformly unbiased estimators 215

rk D{e y } = rk D{y}  rk A( AȈ y1A) 1 A c = n  m.

( v ) D{y} = D{Aȟˆ + e y } = D{Aȟˆ} + D{e y } = D{e y  e y } + D{e y }

y  E{y} = y  Aȟ = y  Aȟˆ + A(ȟˆ  ȟ )


y  E{y} = A(ȟˆ  ȟ ) + e . y

The additive decomposition of the residual vector y-E{y} left us with two terms,
namely the predicted residual vector e y and the term which is a linear functional
of ȟˆ  ȟ. The corresponding product decomposition
[ y  E{y}][ y  E{y}]c =
= A (ȟˆ  ȟ )(ȟˆ  ȟ )c + A(ȟˆ  ȟ )e cy + e y (ȟˆ  ȟ )cA c + e y e cy

for ȟ̂ Ȉ y - BLUUE of ȟ , in particular E{ȟˆ} = ȟ, and


[ y  E{y}][ y  E{y}]c =
= A (ȟˆ  E{ȟˆ})(ȟˆ  E{ȟˆ})c + A(ȟˆ  E{ȟˆ})e cy + e y (ȟˆ  E{ȟˆ})cA c + e y e cy

D{y} = E{[y  E{y}][ y  E{y}]c} = D{Aȟˆ} + D{e y } = D{e y  e y } + D{e y }

due to

E{A (ȟˆ  E{ȟˆ})e cy } = E{A(ȟˆ  E{ȟˆ})( y  Aȟˆ )c} = 0


E{e y (ȟˆ  E{ȟˆ})cA c} = E{e y (ȟˆ  E{ȟˆ})cA c} = 0

or

C{Aȟˆ , e y } = 0, C{e y , Aȟˆ} = 0.

These covariance identities will be proven next.

C{Aȟˆ , e y } =
A( A cȈ y1A  ) 1 A cȈ y1 E{(y  E{y})( y  E{y})c}[I n  A( AcȈ y1A) 1 AcȈ y1 ]c

C{Aȟˆ , e y } =
A( A cȈ y1A) 1 A cȈ y1Ȉ y [I n  Ȉ y1A( A cȈ y1A) 1 A c]

C{Aȟˆ , e y } =
A( A cȈ y1A) 1 A c  A( A cȈ y1A) 1 A c = 0.

Here is my proof’s end.


h
216 4 The second problem of probabilistic regression

We recommend to consider the exercises as follows.


Exercise 4.1 (translation invariance: y 6 y  E{y}) :
Prove that the error prediction of type ȟ̂ Ȉ y - BLUUE of ȟ , namely
e y = [I n  A( A cȈy1A ) 1 A cȈy1 ]y
is translation invariant in the sense of y 6 y  E{y} that is
e y = [I n  A( A cȈy1A ) 1 A cȈy1 ]e y
subject to e y := y  E{y} .
Exercise 4.2 (idempotence):
Is the matrix I n  A ( A cȈ A ) 1 A cȈy1 idempotent ?
1
y
Exercise 4.3 (projection matrices):

Are the matrices A ( A cȈy1A ) 1 A cȈy1 and I n  A ( A cȈy1A ) 1 A cȈy1 projection


matrices?
4-22 The Equivalence Theorem of G y -LESS and Ȉ y -BLUUE
We have included the fourth chapter on Ȉ y -BLUUE in order to interpret G y -
LESS of the third chapter. The key question is open:
?When are Ȉ y -BLUUE and G y -LESS equivalent?
The answer will be given by
Theorem 4.5 (equivalence of Ȉ y -BLUUE and G y -LESS):
With respect to the special linear Gauss-Markov model of full column
rank (4.1), (4.2) ȟˆ = Ly is Ȉ y -BLUUE, if ȟ A = Ly is G y -LESS of (3.1)
for
G y = Ȉ y1  G y1 = Ȉ y . (4.35)
In such a case, ȟˆ = ȟ A is the unique solution of the system of normal
equations
( A cȈ1A )ȟˆ = A cȈ1y
y y (4.36)
attached with the regular dispersion matrix

D{ȟˆ} = ( A cȈ y1A ) 1 . (4.37)

The proof is straight forward if we compare the solution (3.11) of G y -LESS and
(4.21) of Ȉ y -BLUUE. Obviously the inverse dispersion matrix D{y},
y{Y,pdf} is equivalent to the matrix of the metric G y of the observation space
Y. Or conversely the inverse matrix of the metric of the observation space Y
determines the variance-covariance matrix D{y}  Ȉ y of the random variable
y  {Y, pdf} .
4-3 Setup of BIQUUE 217

4-3 Setup of the best invariant quadratic uniformly unbiased


estimator of type BIQUUE for the central moments of second
order
The subject of variance -covariance component estimation within Mathematical
Statistics has been one of the central research topics in the nineteen eighties. In a
remarkable bibliography up-to-date to the year 1977 H. Sahai listed more than
1000 papers on variance-covariance component estimations, where his basic
source was “Statistical Theory and Method“ abstracts (published for the Interna-
tional Statistical Institute by Longman Groups Limited), "Mathematical Re-
views" and "Abstract Service of Quality Control and Applied Statistics". Excel-
lent review papers and books exist on the topic of variance-covariance estimation
such as C.R. Rao and J. Kleffe, R.S. Rao (1977) S. B. Searle (1978), L.R. Ver-
dooren (1980), J. Kleffe (1980), and R. Thompson (1980). The PhD Thesis of B.
Schaffrin (1983) offers a critical review of state-of-the-art of variance-covariance
component estimation.
In Geodetic Sciences variance components estimation originates from F. R. Hel-
mert (1924) who used least squares residuals to estimate heterogeneous variance
components. R. Kelm (1974) and E. Grafarend, A. Kleusberg and B. Schaffrin
(1980) proved the relation of Ȉ0 Helmert type IQUUE balled Ȉ - HIQUUE to
BIQUUE and MINQUUE invented by C. R. Rao. Most notable is the Ph. D.
Thesis of M. Serbetci (1968) whose gravimetric measurements were analyzed by
Ȉ 0 -HIQUUE Geodetic extensions of the Helmert method to compete variance
components originate from H. Ebner (1972, 1977), W. Förstner (1979, 1980),
W. Welsch (1977, 1978, 1979, 1980), K. R. Koch (1978, 1981), C. G. Persson
(1981), L. Sjoeberg (1978), E. Grafarend and A. d'Hone (1978), E. Grafarend
(1984) B. Schaffrin (1979, 1980, 1981). W. Förstner (1979), H. Fröhlich (1980),
and K.R. Koch (1981) used the estimation of variance components for the ad-
justment of geodetic networks and the estimation of a length dependent variance
of distances. A special field of geodetic application has been oscillation analysis
based upon a fundamental paper by H. Wolf (1975), namely M. Junasevic (1977)
for the estimation of signal-to-noise ratio in gyroscopic azimuth observations.
The Helmert method of variance component estimation was used by E. Grafar-
end and A. Kleusberg (1980) and A. Kleusberg and E. Grafarend (1981) to esti-
mate variances of signal and noise in gyrocompass observations. Alternatively
K. Kubik (1967a, b, c, 1970) pioneered the method of Maximum Likelihood
(MALE) for estimating weight ratios in a hybrid distance – direction network.
"MALE" and "FEMALE" extensions were proposed by B. Schaffrin (1983), K.
R. Koch (1986), and Z. C. Yu (1996).
A typical problem with Ȉ0 -Helmert type IQUUE is that it does not produce
positive variances in general. The problem of generating a positive-definite vari-
ance-covariance matrix from variance-covariance component estimation has
218 4 The second problem of probabilistic regression

already been highlighted by J. R. Brook and T. Moore (1980), K.G. Brown


(1977, 1978), O. Bemk and H. Wandl (1980), V. Chew (1970), Han Chien-Pai
(1978), R. R. Corbeil and S. R. Searle (REML, 1976), F. J. H. Don and J. R.
Magnus (1980), H. Drygas (1980), S. Gnot, W. Klonecki and R. Zmyslony
(1977). H. O. Hartley and J. N. K. Rao (ML, 1967), in particular J. Hartung
(1979, 1980), J. L. Hess (1979), S. D. Horn and R. A. Horn (1975), S. D. Horn,
R. A. Horn and D. B. Duncan (1975), C. G. Khatri (1979), J. Kleffe (1978,
1980), ), J. Kleffe and J. Zöllner (1978), in particular L. R. Lamotte (1973,
1980), S. K. Mitra (1971), R. Pincus (1977), in particular F. Pukelsheim (1976,
1977, 1979, 1981 a, b), F. Pukelsheim and G. P. Styan (1979), C. R. Rao (1970,
1978), S. R. Searle (1979), S. R. Searle and H. V. Henderson (1979), J. S. Seely
(1972, 1977), in particular W. A. Thompson (1962, 1980), L. R. Verdooren
(1979), and H. White (1980).
In view of available textbooks, review papers and basic contributions in scien-
tific journals we are only able to give a short introduction. First, we outline the
general model of variance-covariance components leading to a linear structure
for the central second order moment, known as the variance-covariance matrix.
Second, for the example of one variance component we discuss the key role of
the postulate's (i) symmetry, (ii) invariance, (iii) uniform unbiasedness, and (iv)
minimum variance. Third, we review variance-covariance component estima-
tions of Helmert type.
4-31 Block partitioning of the dispersion matrix and linear space gener-
ated by variance-covariance components
The variance-covariance component model is defined by the block partitioning
(4.33) of a variance-covariance matrix Ȉ y , also called dispersion matrix D{y} ,
which follows from a corresponding rank partitioning of the observation vector
y = [ y1c,… , yAc ]c . The integer number A is the number of blocks. For instance, the
variance-covariance matrix Ȉ  R n× n in (4.41) is partitioned into A = 2 blocks.
The various blocks consequently factorized by variance V 2j and by covariances
V jk = U jk V jV k . U jk  [1, +1] denotes the correlation coefficient between the
blocks. For instance, D{y1 } = V11V 12 is a variance factorization, while
D{y1 , y 2 } = V12V 12 = V12 U12V 1 V 2 is a covariance factorization. The matrix blocks
V jj are built into the matrix C jj , while the off-diagonal blocks V jk , V jkc into the
matrix C jk of the same dimensions.
dim Ȉ = dim C jj = dim C jk = n × n .
The collective matrices C jj and C jk enable us to develop an additive decompo-
sition (4.36), (4.43) of the block partitioning variance-covariance matrix Ȉ y . As
soon as we collect all variance-covariance components in an peculiar true order,
namely
ı := [V 12 , V 12 , V 22 , V 13 , V 23 , V 32 ,..., V A 1A , , V A2 ]c ,
we are led to a linear form of the dispersion matrix (4.37), (4.43) as well as of the
4-3 Setup of BIQUUE 219

dispersion vector (4.39), (4.44). Indeed the dispersion vector d(y ) = Xı builds
up a linear form where the second order design matrix X, namely
2
× A ( A +1) / 2
X := [vec C1 ," , vec CA ( A +1) ]  R n ,
reflects the block structure. There are A(A+1)/2 matrices C j , j{1," , A(A +1) / 2} .
For instance, for A = 2 we are left with 3 block matrices {C1 , C2 , C3 } .
Before we analyze the variance-covariance component model in more detail,
we briefly mention the multinominal inverse Ȉ 1 of the block partitioned ma-
trix Ȉ . For instance by “JPM” and “SCHUR” we gain the block partitioned
inverse matrix Ȉ 1 with elements {U11 , U12 , U 22 } (4.51) – (4.54) derived from
the block partitioned matrix Ȉ with elements {V11 , V12 , V22 } (4.47).
“Sequential JPM” solves the block inverse problems for any block parti-
tioned matrix. With reference to Box 4.2 and Box 4.3
Ȉ = C1V 1 + C2V 2 + C3V 3 Ÿ Ȉ 1 = E1 (V ) + E2 (V ) + E3 (V )
is an example.
Box 4.2
Partitioning of variance-covariance matrix
ª V11V 12 V12V 12 " V1A 1V 1A 1 V1AV 1A º
« Vc V V22V 22 " V2A 1V 2A 1 V2AV 2A »
« 12 12 »
Ȉ=« # # # # »>0 (4.38)
« V1cA 1V 1A 1 V2cA 1V 2A 1 " VA 1A 1V A21 VA 1AV A 1A »
«¬ V1cAV 1A V2cAV 2A " VAc1AV A 1A VAAV A2 »¼

"A second moments V 2 of type variance, A (A  1) / 2 second moment


V jk of type covariance
matrix blocks of second order design
ª0 " 0º
C jj := « # V jj # » j  {1," , A } (4.39)
« »
¬« 0 " 0 ¼»
ª0 0º
«" 0 V jk " » ª subject to j < k
C jk := « Vkj » « and j , k  {1," , A} (4.40)
«" " » ¬
«0 0 »¼
¬
A A 1, A
Ȉ = ¦ C jjV 2j + ¦ C jk V jk (4.41)
j =1 j =1, k = 2, j < k

A ( A +1) / 2
Ȉ= ¦ C jV j  R n× m (4.42)
j =1
220 4 The second problem of probabilistic regression

[V 12 , V 12 , V 22 , V 13 , V 23 , V 32 ,..., V A 1A , , V A2 ]' =: V (4.43)


"dispersion vector"
D{y} := Ȉ y œ d {y} = vec D{y} = vec Ȉ
A ( A +1) / 2
d (y ) = ¦ (vec C j )V j = XV (4.44)
j =1
" X is called second order design matrix"
X := [vec C1 ," , vec CA ( A +1) / 2 ] (4.45)
"dimension identities"
d (y )  R n ×1 , V  R, X  R n ×A ( A +1) / 2 .
2 2

Box 4.3
Multinomial inverse
:Input:

ªȈ Ȉ12 º ª V11V 12 V12V 12 º


Ȉ = « 11' »=« »=
¬ Ȉ12 Ȉ 22 ¼ ¬ V12c V 12 V22V 22 ¼
(4.46)
ª V 0 º 2 ª 0 V12 º ª0 0 º 2 n× m
= « 11 » V1 + « » V 12 + « »V 2  R
¬ 0 0 ¼ V
¬ 12c 0 ¼ ¬ 0 V22 ¼

ªV 0º ª 0 V12 º ª0 0 º
C11 := C1 := « 11 » , C12 := C2 := « » , C22 := C3 := « » (4.47)
¬ 0 0¼ ¬ V12c 0 ¼ ¬ 0 V22 ¼
3
Ȉ = C11V 12 + C12V 12 + C22V 22 = C1V 1 + C2V 2 + C3V 3 = ¦ C jV j (4.48)
j =1

3
ªV 1 º
vec 6 = ¦ (vec C j )V j =[vec C1 , vec C2 , vec C3 ] ««V 2 »» = XV (4.49)
j =1
«¬V 3 »¼
2
×1
vec C j  R n j  {1,..., A(A + 1) / 2}
" X is called second order design matrix"
2
×A ( A +1) / 2
X := [vec C1 ," , vec CA (A +1) / 2 ]  R n
here: A=2
:output:
ªU 0 º 2 ª 0 U12 º 1 ª 0 0 º 2
Ȉ 1 = « 11 V1 + « V 12 + « »V 2 (4.50)
¬ 0 0 »¼ c
¬ U12 0 »¼ ¬ 0 U 22 ¼
4-3 Setup of BIQUUE 221

subject to
(4.51) U11 = V111 + qV111 V12 SV12c V111 , U12 = Uc21 =  qV111 V12 S (4.52)
V 122
(4.53) U 22 = S = (V22  qV12c V111 V12 ) 1 ; q := (4.54)
V 12V 22
A ( A +1) / 2 = 3
Ȉ 1 = E1 + E2 + E3 = ¦ Ej (4.55)
j =1

ªU 0 º 2 ª 0 U12 º 1 ª 0 0 º 2
E1 (V ) := « 11 » V 1 , E 2 (V ) := « » V 12 , E3 (V ) := « »V 2 . (4.56)
¬ 0 0¼ c
¬ U12 0 ¼ ¬ 0 U 22 ¼
The general result that inversion of a block partitioned symmetric matrix con-
serves the block structure is presented in
Corollary 4.6 (multinomial inverse):
A ( A +1) / 2 A ( A +1) / 2
Ȉ= ¦ C j V j œ Ȉ 1 = ¦ Ǽ j (V ) . (4.57)
j =1 j =1

We shall take advantage of the block structured multinominal inverse when we


are reviewing HIQUUE or variance-covariance estimations of Helmert type.
The variance component model as well as the variance-covariance model are
defined next. A variance component model is a linear model of type
ª V11V 12 0 " 0 0 º
« 0 V22V 22 " 0 0 »»
«
Ȉ=« # # % # # » (4.58)
« 0 0 " VA 1A 1V A21 0 »
« »
¬ 0 0 " 0 VAAV A2 ¼

ªV 12 º
d {y} = vec Ȉ = [vec C11 ,… , vec C jj ] « " » (4.59)
«V 2 »
¬ A¼
+
d {y} = XV  V  R . (4.60)

In contrast, the general model (4.49) is the variance-covariance model with a


linear structure of type
ª V 12 º
«V »
12
d {y} = vec Ȉ = [vec C11 , vec C12 , vec C12 ,… , vec CAA ] « V 22 » (4.61)
« »
"
« 2»
«¬ V A »¼
222 4 The second problem of probabilistic regression

d {y} = Xı  V 2j  R + , Ȉ positive definite. (4.62)

The most popular cases of variance-covariance components are collected in the


examples.
Example 4.1 (one variance components, i.i.d. observations)
D{y} : Ȉ y = , nV 2 subject to Ȉ y  SYM (R n×n ), V 2  R + .

Example 4.2 (one variance component, correlated observations)


D{y} : Ȉ y = 9nV 2 subject to Ȉ y  SYM (R n×n ), V 2  R + .

Example 4.3. (two variance components, two sets of totally uncorrected observa-
tions "heterogeneous observations")
ª n = n1 + n2
ª I n V 12 0 º «
D{y} : Ȉ y = « 1

subject to « Ȉ y  SYM (R n×n ) (4.63)
¬« 0 I n V »
2¼ « 2 + +
¬V 1  R , V 2  R .
2 2

Example 4.4 (two variance components, one covariance components, two sets of
correlated observations "heterogeneous observations")
ª n = n1 + n2
ªV V 2 V12V 12 º « n ×n n ×n
D{y} : Ȉ y = « 11 1 » subject to « V11  R , V22  R
1 1 2 2

¬ V12c V 12 V11V 22 ¼ n ×n
«¬ V12  R 1 2

Ȉ y  SYM (R n×n ), V 12  R + , V 22  R + , Ȉ y positive definite. (4.64)

Special case: V11 = I n , V22 = I n .


1 2

Example 4.5 (elementary error model, random effect model)


A A
e y = y  E{y z} = ¦ A j (z j  E{z j }) = ¦ A j e zj (4.65)
j =1 j =1

(4.66) E{e zj } = 0, E{e zj , eczk } = G jk I q (4.67)


A A
D{y} : Ȉ y = ¦ A j A cjV 2j + ¦ ( A j A ck + A k A cj )V jk . (4.68)
j =1 j , k =1, j < k

At this point, we should emphasize that a linear space of variance-covariance


components can be build up independently of the block partitioning of the dis-
persion matrix D{y} . For future details and explicit examples let us refer to B.
Schaffrin (1983).
4-3 Setup of BIQUUE 223

4-32 Invariant quadratic estimation of variance-covariance components


of type IQE
By means of Definition 4.7 (one variance component) and Definition 4.9 (vari-
ance-covariance components) we introduce
Vˆ 2 IQE of V 2 and Vˆ k IQE of V k .

Those conditions of IQE, represented in Lemma 4.7 and Lemma 4.9 enable us to
separate the estimation process of first moments ȟ j (like BLUUE) from the esti-
mation process of central second moments V k (like BIQUUE). Finally we pro-
vide you with the general solution (4.75) of the in homogeneous matrix equa-
tions M1/k 2 A = 0 (orthogonality conditions) for all k  {1, " ,A(A+1)/2} where
A(A+1)/2 is the number of variance-covariance components, restricted to the
special Gauss–Markov model E {y} = Aȟ , d {y} = XV of "full column rank",
A  R n× m , rk A = m .
Definition 4.7 (invariant quadratic estimation Vˆ 2 of V 2 : IQE ):
The scalar Vˆ 2 is called IQE (Invariant Quadratic Estimation) of
V 2  R + with respect to the special Gauss-Markov model of full col-
umn rank.
E{y} = Aȟ, A  R n×m , rk A = m
(4.69)
D{y} = VV 2 , V  R n×n , rk V = n, V 2  R + ,
if the “variance component V 2 is V ”
(i) a quadratic estimation
Vˆ 2 = y cMy = (vec M )c(y … y ) = (y c … y c)(vec M) (4.70)

subject to
M  SYM := {M  R n× n | M c = M} (4.71)

(ii) transformational invariant : y o y  E{y} =: ey in the sense of

Vˆ 2 = y cMy = e ycMe y (4.72)


or
Vˆ = (vec M )c(y … y ) = (vec M )c(e y … e y )
2
(4.73)
or
Vˆ = tr(Myy c) = tr(Me e c ) .
2
y y (4.74)

Already in the introductory paragraph we emphasized the key of "IQE". Indeed


by the postulate "IQE" the estimation of the first moments E{y} = Aȟ is
224 4 The second problem of probabilistic regression

supported by the estimation of the central second moments D{y} = VV 2 or


d {y} = XV . Let us present to you the fundamental result of " Vˆ 2 IQE OF V 2 ".
Lemma 4.8 (invariant quadratic estimation Vˆ 2 of Vˆ 2 :IQE) :
Let M = (M1/ 2 )cM1/ 2 be a multiplicative decomposition of the symmet-
ric matrix M . The scalar Vˆ 2 is IQE of V 2 , if and only if
(4.75) M1/ 2 = 0 or A c(M1/ 2 )c = 0 (4.76)
n× n
for all M 1/ 2
R .
:Proof:
First, we substitute the transformation y = E{y} + e y subject to expectation iden-
tity E{y} = Aȟ, A  R n× m , rk A = m, into y cMy.

y ' My = ȟ cA cMAȟ + ȟ cA cMe y + e ycMAȟ + e ycMe y .

Second, we take advantage of the multiplicative decomposition of the matrix M ,


namely
M = (M1/ 2 )cM1/ 2 , (4.77)

which generates the symmetry of the matrix

M  SYM := {M  R m×n | M c = M}
y cMy = ȟ cA c(M1/ 2 )cM1/ 2 Aȟ + ȟ cA c(M1/ 2 )cM1/ 2e y + e yc (M1/ 2 )cM1/ 2 Aȟ + e ycMe y .

Third, we postulate "IQE".

y cMy = e ycMe y œ M1/ 2 A = 0 œ A c(M1/ 2 )c = 0.

For the proof, here is my journey's end.


h
Let us extend " IQE " from a " one variance component model " to a " variance-
covariance components model ". First, we define " IQE " ( 4.83 ) for variance-
covariance components, second we give necessary and sufficient conditions
identifying " IQE " .
Definition 4.9 (variance-covariance components model Vˆ k IQE of V k ) :

The dispersion vector dˆ (y ) is called IQE ("Invariant Quadratic Estimation")


with respect to the special Gauss-Markov model of full column rank.
ª E{y} = Aȟ, A  {R n×m }; rk A = m
« (4.78)
«¬ d {y} = Xı, D{y} ~ Ȉ y positive definite, rk Ȉ y = n,
4-3 Setup of BIQUUE 225
if the variance-covariance components
ı := [V 12 , V 12 , V 22 , V 13 , V 23 ," , V A2 ]c (4.79)

(i) bilinear estimations


Vˆ k = y cMy = (vec M )c(y … y ) = tr M k yy c
(4.80)
M k  R n× n× A ( A +1) / 2 ,
subject to

M k  SYM := {M k  R n× n× A ( A +1) / 2 | M k = M k c }, (4.81)

(ii) translational invariant


y o y  E{y} =: e y

Vˆ k = y cM k y = e ycM k e y (4.82)

Vˆ k = (vec M k )c(y … y ) = (vec M k )c(e y … e y ). (4.83)

Note the fundamental lemma " Vˆ k IQE of V k " whose proof follows the
same line as the proof of Lemma 4.7.
Lemma 4.10 (invariant quadratic estimation Vˆ k of V k : IQE):

Let M k = (M1/k 2 )cM1/k 2 be a multiplicative decomposition of the sym-


metric matrix M k . The dispersion vector Vˆ k is IQE of V k , if and only
if
(4.84) M1/k 2 A = 0 or A c(M1/k 2 )c = 0 (4.85)

for all M1/k 2  R n× n× A ( A +1) / 2 .

? How can we characterize " Vˆ 2 IQE of V 2 " or " Vˆ k IQE of V k " ?

The problem is left with the orthogonality conditions (4.75), (4.76) and (4.84),
(4.85). Box 4.4 reviews the general solutions of the homogeneous equations
(4.86) and (4.88) for our " full column rank linear model ".
Box 4.4
General solutions of homogeneous matrix equations
M1/k 2 = 0 œ M k = Z k (I n - AA  ) (4.86)

" for all A   G := {A   R n× m | AA  A = A} "

: rk A = m
226 4 The second problem of probabilistic regression

A  = A L = ( A cG y A) 1 A cG y (4.87)

" for all left inverses A L  {A   R m× n | ( A  A)c = A  A} "

M1/k 2 = 0 º
» Ÿ M k = Z k [I n  A( A cG y A) A cG y ]
1/ 2 1
(4.88)
rk A d m ¼

"unknown matrices : Z k and G y .

First, (4.86) is a representation of the general solutions of the inhomogeneous


matrix equations (4.84) where Z k , k  {1," , A(A + 1) / 2}, are arbitrary matrices.
Note that k = 1, M1 describes the " one variance component model ", otherwise
the general variance-covariance components model. Here we are dealing with a
special Gauss-Markov model of " full column rank ", rk A = m . In this case, the
generalized inverse A  is specified as the " weighted left inverse " A L of type
(4.71) whose weight G y is unknown. In summarizing, representations of two
matrices Z k and G y to be unknown, given H1/k 2 , M k is computed by
M k = (M1/k 2 )cM1/k 2 = [I n  G y A(A cG y A)1 A ']Zck Z k [I n  A( A cG y A) 1 A cG y ]
(4.89)
definitely as a symmetric matrix.
4-33 Invariant quadratic uniformly unbiased estimations of variance-
covariance components of type IQUUE
Unbiased estimations have already been introduced for the first moments
E{y} = Aȟ, A  R n× m , rk A = m . Similarly we like to develop the theory of the
one variance component V 2 and the variance-covariance unbiased estimations
for the central second moments, namely components Vk ,
k{1,…,A(A +1)/2}, where A is the number of blocks. Definition 4.11 tells us when
we use the terminology " invariant quadratic uniformly unbiased estimation "
Vˆ 2 of V 2 or Vˆ k of V k , in short " IQUUE ". Lemma 4.12 identifies Vˆ 2 IQUUE
of V 2 by the additional tr VM = 1 . In contrast, Lemma 4.12 focuses on Vˆ k
IQUUE of V k by means of the additional conditions tr C j M k = į jk . Examples
are given in the following paragraphs.

Definition 4.11 (invariant quadratic uniformly unbiased estimation Vˆ 2


of V 2 and Vˆ k of V k : IQUUE) :
The vector of variance-covariance components Vˆ k is called IQUUE
(Invariant Quadratic Uniformly Unbiased Estimation ) of V k with re-
spect to the special Gauss-Markov model of full column rank.
4-3 Setup of BIQUUE 227

ª E{y}= Aȟ, AR n×m , rk A = m


« d {y}= Xı, XR n ×A ( A+1) / 2 , D{y}~ Ȉ positive definite, rk Ȉ
2

« y y

«¬ rk Ȉ y = n, vech D{y}= d{y},


(4.90)
if the variance-covariance components
ı := [V 12 , V 12 , V 22 , V 13 , V 23 ," , V A2 ] (4.91)
are
(i) a bilinear estimation
Vˆ k = y cM k y = (vec M k )c(y … y ) = tr M k yy c
(4.92)
M k  R n× n× A ( A +1) / 2
subject to

M k = (M1/k 2 )c(M1/k 2 )  SYM := {M k  R n× m× A ( A +1) / 2 | M k = M k c } (4.93)

(ii) translational invariant in the sense of


y o y  E{y} =: ey

Vˆ k = y cM k y = e ycM k e y (4.94)
or
Vˆ k = (vec M k )c(y … y ) = (vec M k )c(e y … e y ) (4.95)
or
Vˆ k = tr Ȃ k yy c = tr M k e y e yc , (4.96)

(iii) uniformly unbiased in the sense of


k = 1 (one variance component) :
E{Vˆ 2 } = V 2 , V 2  R + , (4.97)
k t 1 (variance-covariance components):
E{Vˆ k } = V k , V k  {R A ( A +1) / 2 | Ȉ y positive definite}, (4.98)
with A variance components and A(A-1)/2 covariance components.
Note the quantor “for all V 2  R + ” within the definition of uniform unbiased-
ness (4.81) for one variance component. Indeed, weakly unbiased estimators
exist without the quantor (B. Schaffrin 2000). A similar comment applies to the
quantor “for all V k  {R A ( A +1) / 2 | Ȉ y positive definite} ” within the definition of
uniform unbiasedness (4.82) for variance-covariance components. Let us charac-
terize “ Vˆ 2 IQUUE of V 2 ”.
228 4 The second problem of probabilistic regression

Lemma 4.12 ( Vˆ 2 IQUUE of V 2 ):


The scalar Vˆ 2 is IQUUE of V 2 with respect to the special Gauss-
Markov model of full column rank.

ª"first moment " : E{y} = Aȟ, A  R n×m , rk A = m


« n× n +
«¬"centralsecond moment " : D{y} = V , V  R , rk V = n, V  R ,
2 2

if and only if
(4.99) (i) M1/ 2 A = 0 and (ii) tr VM = 1 . (4.100)

:Proof:
First, we compute E{Vˆ 2 } .

Vˆ 2 = tr Me y e yc Ÿ E{Vˆ 2 } = tr MȈ y = tr Ȉ y M.

Second, we substitute the “one variance component model” Ȉ y = VV 2 .

E{Vˆ 2 } := V 2 V 2  R  œ tr VM = 1.
Third, we adopt the first condition of type “IQE”.
h
The conditions for “ Vˆ k IQUUE of V k ” are only slightly more complicated.

Lemma 4.13 ( Vˆ k IQUUE of V 2 ):

The vector Vˆ k , k  {1," , A(A + 1) / 2} is IQUUE of V k with respect to


the block partitioned special Gauss-Markov model of full column rank.
" first moment"

ª y1 º ª A1 º
«y » «A »
« 2 » « 2 »
A 1 , nA × m
E{« # »} = « # » ȟ = Aȟ, A  \ n n "n 1 2
, rk A = m (4.101)
« » « »
« y A 1 » « A A 1 »
«¬ y A »¼ «¬ A A »¼

n1 + n2 + " + nA 1 + nA = n

" central second moment"


4-3 Setup of BIQUUE 229
2
ª y1 º ª V11V 1 V12V 12 " V1A 1V 1A 1 V1AV 1l º
«y » « »
« 2 » « V12V 12 V22V 22 " V2A 1V 2A 1 V2AV 2A »
D{« # »} = « # # # # » (4.102)
« » « »
« y A 1 » « V1A 1V 1A 1 V2A 1V 2A 1 " VA 1,A 1V A21 VA 1,AV A 1,A »
«¬ y A »¼ « V1AV 1l V1AV 1l " VA 1,AV A 1,A VA ,AV A2 »¼
¬
A A ( A +1) / 2
D{y} = ¦ C jjV 2j + ¦ C jk V jk (4.103)
j =1 j , k =1
j <k

A ( A +1) / 2
D{y} = ¦ C jV j (4.104)
j =1

ı := [V 12 , V 12 , V 22 , V 13 , V 23 , V 32 ," , V A2 ]  \ A ( A +1) / 2 +1 (4.105)

C j  \ n×n×A ( A +1) / 2 ( 3d array) (4.106)

A 1 × nl
V11  \ n ×n , V12  \ n ×n ," , VA 1,A  \ n
1 1 1 2
, VAA  \ n ×n
A A
(4.107)

D{y}  Ȉ y  \ n×n = \ ( n +...+ n )×( n +...+ n )


1 l 1 l
(4.108)

rk Ȉ y = n, Ȉ y positive definite (4.109)


if and only if
(4.110) (i) M1/k 2 A = 0 and (ii) tr C j M k = į jk . (4.111)
Before we continue with the proof we have to comment on our setup of the vari-
ance-covariance components model. For a more easy access of an analyst we
have demonstrated the blocks partitioning of
the observation vector and the variance-covariance

ª y1 º , dim y1 = n1 ª V11V 12 " V1AV 1A º


«#» « »
# , Ȉy = « # # ».
« »
«¬ y A »¼ , dim y A = nA « V1AV 1A " VAAV A2 »
¬ ¼
n1 observations build up the observation vector y1 as well as the variance factor
V11. Similarly, n2 observations build up the variance factor V22. Both
observations collected in the observations vectors y1 and y2, constitute the
covariance factor V12.
This scheme is to be continued for the other observations and their
corresponding variance and covariance factors. The matrices C jj and C jk which
230 4 The second problem of probabilistic regression

map variance components V jk (k>j) to the variance-covariance matrix Ȉ y contain


the variance factors V jj at {colj, rowj} while the covariance factors
contain {V jkc , V jk } at {colk, rowj} and {colj, rowk}, respectively. The following
proof of Lemma 4.12 is based upon the linear structure (4.88).
:Proof:
First, we compute E{Vˆ k } .
E{Vˆ k } = tr M k Ȉ y = tr Ȉ y M k .

Second, we substitute the block partitioning of the variance-covariance matrix


Ȉy .
A ( A +1) / 2
º
Ȉy = ¦ C jV j » A ( A +1) / 2
j =1 » Ÿ tr Ȉ y M k = tr ¦ C j M kV j
j =1
E{Vˆ k } = tr Ȉ y M k ¼
»

A ( A +1) / 2
E{Vˆ k } = V k œ ¦ (tr C j M k )V j = V k , V i  R A ( A +1) / 2 œ (4.112)
j =1

tr C j M k  G jk = 0 .

Third, we adopt the first conditions of the type “IQE”.


4-34 Invariant quadratic uniformly unbiased estimations of one variance
component (IQUUE) from Ȉ y  BLUUE: HIQUUE
Here is our first example of “how to use IQUUE“. Let us adopt the residual vec-
tor e y as predicted by Ȉ y -BLUUE for a “one variance component“ dispersion
model, namely D{y} = VV 2 , rk V = m . First, we prove that M1/ 2 generated by
V-BLUUE fulfils both the conditions of IQUUE namely M1/ 2 A = 0 and
tr VM = tr V (M1/ 2 )cM1/ 2 = 1 . As outlined in Box 4.5, the one condition of uni-
form unbiasedness leads to the solutions for one unknown D within the “ansatz”
Z cZ = D V 1 , namely the number n-m of “degrees of freedom” or the “surjectiv-
ity defect”. Second, we follow “Helmert’s” ansatz to setup IQUUE of Helmert
type, in Short “HIQUUE”.

Box 4.5
IQUUE : one variance component
1st variations
{E{y} = Ax, A  R n× m , rk A = m, D{y} = VV 2 , rk V = m, V 2  R + }

e y = [I n  A ( A ' V 1A ) 1 A ' V 1 ]y (4.31)


4-3 Setup of BIQUUE 231
1st test: IQE
M1/ 2 A = 0
"if M1/ 2 = Z[I n  A( A ' V 1 A ) 1 A ' V 1 ] , then M1/ 2 A = 0 "
2nd test : IQUUE
"if tr VM = 1 , then
tr{V[I n  V 1 A( A ' V 1 A) 1 A ']Z cZ[I n  A( A ' V 1 A) 1 A ' V 1 ]} = 1
ansatz : ZcZ = D V 1 (4.113)
tr VM = D tr{V[V  V A( A cV A) ][I n  A( AcV A) AcV ]} = 1
1 1 1 1 1 1 1

tr VM = D tr[I n  A( A cV 1 A) A cV 1 ] = 1
tr I n = 0 º
»Ÿ
tr[ A( A ' V A) A ' V ] = tr A( A ' V A) A ' V A = tr I m = m ¼
1 1 1 1 1 1

1
tr VM = D (n  m) = 1 Ÿ D = . (4.114)
nm
Let us make a statement about the translational invariance of e y predicted by
Ȉ y - BLUUE and specified by the “one variance component” model Ȉ y = VV 2 .
e y = e y ( Ȉ y - BLUUE) = [I n  A( A ' Ȉ y1A) 1 A ' Ȉ y1 ]y . (4.115)

Corollary 4.14 (translational invariance):


e y = [I  A( A ' Ȉ y1A) 1 A ' Ȉ y1 ]e y = Pe y (4.116)
subject to
P := I n  A ( A ' Ȉ y1A) 1 A ' Ȉ y1 . (4.117)

The proof is “a nice exercise”: Use e y = Py and replace y = E{y} + e y =


A[ + e y . The result is our statement, which is based upon the “orthogonality
condition” PA = 0 . Note that P is idempotent in the sense of P = P 2 .
In order to generate “ Vˆ 2 IQUUE of V 2 ” we start from “Helmert’s ansatz”.

Box 4.6
Helmert’s ansatz
one variance component
e cy Ȉ y1e y = ecy P ' Ȉ y1Pe y = tr PȈ y1Pe y ecy (4.118)

E{e cy Ȉ y1e y } = tr(P ' Ȉ y1P E{e y ecy }) = tr(P ' Ȉ y1PȈ y ) (4.119)
232 4 The second problem of probabilistic regression

”one variance component“


Ȉ y = VV 2 = C1V 2

E{e cy V 1e y }= (tr P ' V 1PV )V 2 V 2 \ 2 (4.120)

tr P ' V 1 PV = tr[I n  V 1 A ( A ' V 1 A ) A '] = n  m (4.121)


E{e cy V 1e y }= (n  m)V 2 (4.122)

1
Vˆ 2 := e cy V 1e y Ÿ E{Vˆ 2 }=V 2 . (4.123)
nm
Let us finally collect the result of “Helmert’s ansatz” in
Corollary 4.15 ( Vˆ 2 of HIQUUE of V 2 ):
Helmert’s ansatz
1
Vˆ 2 = e y ' V 1e y (4.124)
nm
is IQUUE, also called HIQUUE.
4-35 Invariant quadratic uniformly unbiased estimators of variance
covariance components of Helmert type: HIQUUE versus HIQE
In the previous paragraphs we succeeded to prove that first M 1/ 2 generated by
e y = e y ( Ȉ y - BLUUE) with respect to “one variance component” leads to
IQUUE and second Helmert’s ansatz generated “ Vˆ 2 IQUUE of V 2 ”. Here we
reverse the order. First, we prove that Helmert’s ansatz for estimating variance-
covariance components may lead (or may, in general, not) lead to
“ Vˆ k IQUUE of V k ”.
Second, we discuss the proper choice of M1/k 2 and test whether (i) M1/k 2 A = 0
and (ii) tr H j M k = G jk is fulfilled by HIQUUE of whether M1/k 2 A = 0 is fulfilled
by HIQE.
Box 4.7
Helmert's ansatz
variance-covariance components
step one: make a sub order device of variance-covariance components:

V 0 := [V 12 , V 12 , V 2 2 , V 13 , V 12 ,..., V A 2 ]0c
A ( A +1) / 2
step two: compute Ȉ 0 := ( Ȉ y )0 = Ȉ ¦ C jV j (V 0 ) (4.125)
j =1
4-3 Setup of BIQUUE 233

step three: compute e y = e y ( Ȉ 0 - BLUUE), namely


1 1
P (V 0 ) := (I  A( A cȈ 0 A) 1 A cȈ 0 (4.126)
e y = P0 y = P0 e y (4.127)
step four: Helmert's ansatz
e cy Ȉ 01e y = ecy P0cȈ 0-1P0e y = tr(P0 Ȉ 01P0ce y ecy ) (4.128)
E{eˆ cy Ȉ e } = tr (P0 Ȉ P c Ȉ)
-1
0 y
1
0 0 (4.129)

''variance-covariance components''
A ( A +1) / 2
Ȉy = Ȉ ¦ CkV k (4.130)
k =1

E{e cy Ȉ -10 e cy } = tr(P0 Ȉ 01P0cCk )V k (4.131)


step five: multinomial inverse
A ( A +1) / 2 A ( A +1) / 2
Ȉ= ¦ Ck V k œ Ȉ 1 = ¦ Ek (V j ) (4.132)
k =1 k =1

input: V 0 , Ȉ 0 , output: Ek (V 0 ).
step six: Helmert's equation
i, j  {1," , A(A + 1) / 2}
A ( A +1) / 2
(4.133)
E{e cy Ei (V 0 )e y } = ¦ (tr P(V 0 )Ei (V 0 )P c(V 0 )C j )V j
k =1

"Helmert's choice''
A ( A +1) / 2
ecy Ei (V 0 ) e y = ¦ (tr P(V 0 )Ei (V 0 )P c(V 0 )C j )V j (4.134)
j =1

ª q := ey
 cEi (V 0 )ey 
«
q = Hıˆ « H := tr P (V 0 )Ei (V 0 )P '(V 0 )C j (" Helmert ' s process ") (4.135)
«
¬ıˆ := [Vˆ1 , Vˆ12 , Vˆ 2 , Vˆ13 , Vˆ 23 , Vˆ 3 ,..., Vˆ A ] .
2 2 2 2

Box 4.7 summarizes the essential steps which lead to “ Vˆ k HIQUUE of V k ” if


det H = 0 , where H is the Helmert matrix. For the first step, we use some prior
information V 0 = Vˆ 0 for the unknown variance-covariance components. For
instance, ( Ȉ y )0 = Ȉ 0 = Diag[(V 12 ) 0 ,..., (V A2 ) 0 ] may be the available information
on variance components, but leaving the covariance components with zero. Step
two enforces the block partitioning of the variance-covariance matrix generating
the linear space of variance-covariance components. e y = D0 e y in step three is
the local generator of the Helmert ansatz in step four. Here we derive the key
equation E{e y ' Ȉ -10 e y } = tr (D0 Ȉ 01D0c Ȉ) V k . Step five focuses on the multinormal
inverse of the block partitioned matrix Ȉ , also called “multiple IPM”. Step six is
234 4 The second problem of probabilistic regression

taken if we replace 6 01 by the block partitioned inverse matrix, on the “Hel-
mert’s ansatz”. The fundamental expectation equation which maps the variance-
covariance components V j by means of the “Helmert traces” H to the quadratic
terms q (V 0 ) . Shipping the expectation operator on the left side, we replace V j
by their estimates Vˆ j . As a result we have found the aborted Helmert equation
q = Hıˆ which has to be inverted. Note E{q} = Hı reproducing unbiasedness.
Let us classify the solution of the Helmert equation q = Hı with respect to bias.
First let us assume that the Helmert matrix is of full rank, vk H = A(A + 1) / 2 the
number of unknown variance-covariance components. The inverse solution, Box
4.8, produces an update ıˆ 1 = H 1 (ıˆ 0 ) ' q(ıˆ 0 ) out of the zero order information
Vˆ 0 we have implemented. For the next step, we iterate ıˆ 2 = H 1 (ıˆ 1 )q(ıˆ 1 ) up to
the reproducing point Vˆ w = Vˆ w1 with in computer arithmetic when iteration
ends. Indeed, we assume “Helmert is contracting”.
Box 4.8
Solving Helmert's equation
the fast case : rk H = A ( A + 1) / 2, det H z 0
:"iterated Helmert equation":
Vˆ1 = H 1 (Vˆ 0 )q (Vˆ 0 ),..., VˆZ = HZ1 (VˆZ 1 ) q(VˆZ 1 ) (4.136)
"reproducing point"
start: V 0 = Vˆ 0 Ÿ Vˆ1 = H 01 q0 Ÿ Vˆ 2 = H11 q1
subject to H1 := H (Vˆ1 ), q1 := q(Vˆ1 ) Ÿ
Ÿ ... Ÿ VˆZ = VˆZ 1 (computer arithmetic): end.

?Is the special Helmert variance-covariance estimator ıˆ x = H 1 q " JQUUE "?


Corollary 4.16 gives a positive answer.
Corollary 4.16 (Helmert equation, det H z 0);
In case the Helmert matrix H is a full rank matrix, namely
rk H = A ( A + 1) / 2
ıˆ = H 1q (4.137)
is Ȉ f -HIQUUE at reproducing point.
: Proof:
q := e cy Ei e y
E{ıˆ } = H E{q} = H 1 Hı = ı .
1

h
For the second case of our classification, let us assume that Helmert matrix is no
longer of full rank, rk H < A(A + 1) / 2 , det H=0. Now we are left with the central
question.
4-3 Setup of BIQUUE 235

? Is the special Helmert variance-covariance estimator ı = H l q = H + q of type n 1

“ MINOLESS” “ IQUUE”?
Unfortunately, the MINOLESS of the rank factorized Helmert equation
q = JKıˆ outlined in Box 4.9 by the weighted Moore-Penrose solution, indicates
a negative answer. Instead, Corollary 4 proves Vˆ is only HIQE, but resumes
also in establishing estimable variance-covariance components as “ Helmert
linear combinations” of them.
Box 4.9
Solving Helmert´s equation the second case:
rk H < A(A + 1) / 2 , det H=0

" rank factorization"


" MINOLESS"
H = JK , rkH = rkF = rkG =: v (4.138)

" dimension identities"


H  \ A ( A +1) / 2× A ( A +1) / 2 , J  \ A ( A +1) / 2× v , G  \ v × A ( A +1) / 2

H lm = H + ( weighted ) = K R ( weighted ) = J L ( weighted ) (4.139)

ıˆ lm = G ı-1K c(KG V-1K 1 )(J cG q J ) 1 G q q = HV+ , q q . (4.140)

In case “ detH=0” Helmert´s variance-covariance components estimation is no


longer unbiased, but estimable functions like Hıˆ exist:
Corollary 4.17 (Helmert equation, det H=0):
In case the Helmert matrix H, rkH< A(A + 1) / 2 , det H=0, is rank defi-
cient, the Helmert equation in longer generates an unbiased IQE. An es-
timable parameter set is H Vˆ :
(i) Hıˆ = HH + q is Ȉ 0  HIQUUE (4.141)
(ii) Vˆ is IQE .
:Proof:
(i) E{ıˆ } = H + E{q} = H + Hı z ı , ıˆ  IQE
(ii) E{Hıˆ } = HH + E{q} = HH + Hı = Hı , Hıˆ  HIQUUE.
h
In summary, we lost a bit of our illusion that ı y ( Ȉ y  BLUUE) now always
produces IQUUE.
236 4 The second problem of probabilistic regression

“ The illusion of progress is short, but exciting”

“ Solving the Helmert equations”


IQUUE versus IQE

det H z 0 det H=0


ıˆ k is Ȉ 0  HIQUUE of V k ıˆ k is only HIQE of ı k
Hıˆ k is Ȉ 0 -IQUUE .
Figure 4.1 : Solving the Helmert equation for estimating variance-covariance-
components
Figure 4.1 illustrates the result of Corollary 4 and Corollary 5. Another draw-
back is that we have no guarantee that
HIQE or HIQUUE
generates a positive definite variance-covariance matrix Ȉ̂ . Such a postulate can
be enforced by means of an inequality constraint on the Helmert equation
Hıˆ = q of type “ ıˆ > 0 ” or “ ıˆ > ı ” in symbolic writing. Then consult the text
books on “ positive variance-covariance component estimation”. At this end, we
have to give credit to B. Schaffrin (1.83, p.62) who classified Helmert´s vari-
ance-covariance components estimation for the first time correctly.
4-36 Best quadratic uniformly unbiased estimations of one variance
component: BIQUUE
First, we give a definition of “best” Vˆ 2 IQUUE of V 2 within Definition 4.18
namely for a Gauss normal random variable y  Y = {\ n , pdf} . Definition 4.19
presents a basic result representing “Gauss normal” BIQUUE. In particular we
outline the reduction of fourth order moments to second order moments if the
random variable y is Gauss normal or, more generally, quasi-normal. At same
length we discuss the suitable choice of the proper constrained Lagrangean gen-
erating Vˆ 2 BIQUUE of V 2 . The highlighted is Lemma 4 where we resume the
normal equations typical for BIQUUE and Theorem 4 with explicit representa-
tions of Vˆ 2 , D{Vˆ 2 } and Dˆ {Vˆ 2 } of type BIQUUE with respect to the special
Gauss-Markov model with full column rank.
? What is the " best" Vˆ 2 IQUUE of V 2 ?
First, let us define what is "best" IQUUE.
Definition 4.18 ( Vˆ 2 best invariant quadratic uniformly unbiased esti-
mation of V 2 : BIQUUE)
Let y  {\ n , pdf } be a Gauss normal random variable representing the
stochastic observation vector. Its central moments up to order four
4-3 Setup of BIQUUE 237

E{eiy } = 0 ,
E{eiy e yj } = S ij = vijV 2 (4.142)
E{e e e } = S ijk = 0, (obliquity)
y y y
i j k (4.143)
E{eiy e yj eky ely } = S ijkl = S ijS kl + S ik S jl + S ilS jk =
(4.144)
= (vij vkl + vik v jl + vil v jk )V 4
relate to the "centralized random variable"
e y := y  E{y} = [eiy ] . (4.145)
The moment arrays are taken over the index set i, j, k, l {1000, n}
when the natural number n is identified as the number of observations. n
is the dimension of the observation space Y = {\ n , pdf } .
The scalar Vˆ 2 is called BIQUUE of V 2 ( Best Invariant Quadratic Uni-
formly Unbiased Estimation) of the special Gauss-Markov model of full
column rank.
"first moments" :
E{y} = Aȟ, A  \ n× m , ȟ  \ m , rk A = m (4.146)
"central second moments":
D{y}  å y = VV 2 , V  \ n×m , V 2  \ + , rk V = n (4.147)
m 2 +
where ȟ  \ is the first unknown vector and V  \ the second un-
known " one variance component", if it is.
(i) a quadratic estimation (IQE):
Vˆ 2 = y cMy = (vec M )cy … y = tr Myy c (4.148)
subject to
1 1
M = (M )cM  SYM := {M  \ n×m M = M c}
2 2
(4.149)
(ii) translational invariant, in the sense of
y o y  E{y} =: e y (4.150)
V 2 = y cMy = ecy Mey
ˆ (4.151)
or equivalently
Vˆ 2 = (vec M )c y … y = (vec M )ce y … e y (4.152)
Vˆ 2 = tr Myy c = tr Me y ecy (4.153)
(iii) uniformly unbiased in the sense of
E{Vˆ 2 } = V 2 , V 2  \ + (4.154)
and
(iv) of minimal variance in the sense
D{Vˆ 2 } := E{[Vˆ 2  E{Vˆ 2 }]2 } = min . (4.155)
M
238 4 The second problem of probabilistic regression

In order to produce "best" IQUUE we have to analyze the variance


E{[Vˆ 2  E{Vˆ 2 }]1 } of the invariant quadratic estimation Vˆ 2 the "one variance
component", of V 2 . In short, we present to you the result in
Corollary 4.19 (the variance of Vˆ with respect to a Gauss normal
IQE):
If Vˆ 2 is IQE of V 2 , then for a Gauss normal observation space
Y = {\ n , pdf } the variance of V 2 of type IQE is represented by
E{[Vˆ 2  E{Vˆ 2 }]2 } = 2 tr M cVMV . (4.156)
: Proof:
ansatz: IQE
Vˆ = tr Me y ecy Ÿ E{Vˆ 2 } = (tr MV )V 2
2

E{[Vˆ 2  E{Vˆ 2 }]} = E{[tr Me y ecy  (tr MV)V 2 ][tr Me y ecy  (tr MV)V 2 ]} =
= E{(tr Me y ecy )(tr Me y ecy )}  (tr MV) 2 V 4 Ÿ (4.156).
h
2 2
With the “ansatz” Vˆ IQE of V we have achieved the first decomposition of
var {Vˆ 2 } . The second decomposition of the first term will lead us to central mo-
ments of fourth order which will be decomposed into central moments of second
order for a Gauss normal random variable y. The computation is easiest in
“Ricci calculus“. An alternative computation of the reduction “fourth moments
to second moments” in “Cayley calculus” which is a bit more advanced, is gives
in Appendix D.
E{(tr Me y ecy )(tr Me y ecy )} =
n n
= ¦ m ij m kl E{eiy e yj e ky ely } = ¦ m ij m kl ʌijkl =
i , j , k ,l =1 i , j , k ,l =1
n
= ¦ m ij m kl ( ʌij ʌ kl + ʌik ʌ jl + ʌil ʌ jk ) =
i , j , k ,l =1
n
= ¦ m ij m kl ( v ij v kl + v ik v jl + v il v jk )ı 4
i , j , k ,l =1

Ǽ{(tr Me y ecy )(tr 0 e y ecy )} = V 4 (tr MV ) 2 + 2V 4 tr(MV ) 2 . (4.157)


A combination of the first and second decomposition leads to the final result.
E{[Vˆ 2  E{Vˆ 2 }]} = E{(tr Me y ecy )(tr Me y ecy )}  V 4 (tr MV) =
= 2V 4 (tr MVMV ).
h
A first choice of a constrained Lagrangean for the optimization problems
“BIQUUE”, namely (4.158) of Box 4.10, is based upon the variance
E{[Vˆ 2  E{Vˆ 2 }] IQE}
4-3 Setup of BIQUUE 239

constrained to “IQE” and


(i) the condition of uniform unbiasedness ( tr VM ) -1 = 0
as well as
(ii) the condition of the invariant quadratic estimation A c(M1/ 2 ) = 0 .
A second choice of a constrained Lagrangean generating Vˆ 2 BIQUUE of V 2 ,
namely (4.163) of Box 4.10, takes advantage of the general solution of the ho-
mogeneous matrix equation M1/ 2 A = 0 which we already obtained for “IQE”.
(4.73) is the matrix container for M. In consequence, building into the La-
grangean the structure of the matrix M, desired by the condition of the invari-
ance quadratic estimation Vˆ 2 IQE of V 2 reduces the first Lagrangean by the
second condition. Accordingly, the second choice of the Lagrangean (4.163)
includes only one condition, in particular the condition for an uniformly unbiased
estimation ( tr VM )-1=0 . Still we are left with the problem to make a proper
choice for the matrices ZcZ and G y . The first "ansatz" ZcZ = ĮG y produces a
specific matrix M, while the second "ansatz" G y = V 1 couples the matrix of the
metric of the observation space to the inverse variance factor V 1 . Those " natu-
ral specifications" reduce the second Lagrangean to a specific form (4.164), a
third Lagrangean which only depends on two unknowns, D and O0 . Now we are
prepared to present the basic result for Vˆ 2 BIQUUE of V 2 .
Box 4.10
Choices of constrained Lagrangeans generating Vˆ 2 BIQUUE of V 2
"a first choice"
L(M1/ 2 , O0 , A1 ) := 2 tr(MVMV ) + 2O0 [(tr VM )  1] + 2 tr A1 Ac(M1/ 2 )c (4.158)
"a second choice"
M = (M 1/ 2
)cM 1/ 2
= [I n - G y A(A cG y A) 1 A c]Z cZ[I n  A(A cG y A)-1 A cG y ] (4.159)
ansatz : ZcZ = ĮG y
M = ĮG y [I n  A( A cG y A) 1 AcG y ] (4.160)
VM = ĮVG y [I n  A( A cG y A) 1 AcG y ] (4.161)
ansatz : G y = V 1
VM = Į[I n  A( AcV 1 A) 1 A cV 1 ] (4.162)

L(Į, O0 ) = tr MVMV + 2O0 [( VM  1)] (4.163)


tr MVMV = Į tr[I n  A( AcV A) A cV ] = Į ( n  m)
2 1 1 1 2

tr VM = Į tr[I n  A( A cV 1 A) 1 A cV 1 ] = Į ( n  m)
L(Į, O0 ) = Į 2 (n  m) + 2O0 [D (n  m)  1] = min . (4.164)
Į , O0
240 4 The second problem of probabilistic regression

Lemma 4.20 ( Vˆ 2 BIQUUE of V 2 ):


The scalar Vˆ 2 = y cMy is BIQUUE of V 2 with respect to special Gauss-
Markov model of full column rank, if and only if the matrix D together
with the "Lagrange multiplier" fulfills the system of normal equations
ª 1 1 º ª Dˆ º ª 0 º
«¬ n  m 0 »¼ «Oˆ » = «¬1 »¼ (4.165)
¬ 0¼
solved by
1 1
Dˆ = , O0 =  . (4.166)
nm nm
: Proof:
Minimizing the constrained Lagrangean
L(D , O0 ) = D 2 (n  m) + 2O0 [D ( n  m)  1] = min
D , O0

leads us to the necessary conditions


1 wL
(Dˆ , Oˆ0 ) = Dˆ (n  m) + Oˆ0 ( n  m) = 0
2 wD
1 wL
(Dˆ , Oˆ0 ) = Dˆ (n  m)  1 = 0
2 wO0
or

ª 1 1 º ª Dˆ º ª 0 º
«¬ n  m 0 »¼ « Oˆ » = «¬1 »¼
¬ 0¼
1
solved by Dˆ = Oˆ0 = .
nm
1 w2 L
(Dˆ , Oˆ0 ) = n  m × 0
2 wD 2
constitutes the necessary condition, automatically fulfilled. Such a solution for
the parameter D leads us to the " BIQUUE" representation of the matrix M.
1
M= V 1 [I n  A( AcV 1 A) 1 A cV 1 ] . (4.167)
nm
h
2 2 2
Explicit representations Vˆ BIQUUE of V , of the variance D{Vˆ } and its esti-
mate D{ıˆ 2 } are highlighted by
4-3 Setup of BIQUUE 241

Theorem 4.21 ( Vˆ BIQUUE of V 2 ):


Let Vˆ 2 = y cMy = (vec M )c(y … y ) = tr Myy c be BIQUUE of V 2 with re-
seat to the special Gauss-Markov model of full column rank.
(i) Vˆ 2 BIQUUE of V 2
Explicit representations of Vˆ 2 BIQUUE of V 2
Vˆ 2 = (n  m) 1 y c[V 1  V 1 A( A cV 1 A) 1 A cV 1 ]y (4.168)
Vˆ 2 = (n  m) 1 e cV 1e (4.169)
subject to e = e ( BLUUE).
(ii) D{ Vˆ 2 BIQUUE}
BIQUUE´s variance is explicitly represented by
D{Vˆ 2 | BIQUUE} = E{[Vˆ 2  E{Vˆ 2 }]2 BIQUUE} = 2(n  m) 1 (V 2 ) 2 .
(4.170)

(iii) D {Vˆ 2 }
An estimate of BIQUUE´s variance is
Dˆ {Vˆ 2 } = 2(n  m) 1 (Vˆ 2 ) (4.171)
Dˆ {Vˆ } = 2(n  m) (e cV e ) .
2 3 1 2
(4.172)

: Proof:
We have already prepared the proof for (i). Therefore we continue to prove (ii)
and (iii)
(i) D{ıˆ 2 BIQUUE}

D{Vˆ 2 } = E{[Vˆ 2  E{Vˆ 2 }]2 } = 2V 2 tr MVMV,


1
MV = [I n  A( AcV 1 A) 1 AcV 1 ],
nm
1
MVMV = [I n  A( A cV 1A) 1 A cV 1 ],
( n  m) 2
1
tr MVMV = Ÿ
nm
Ÿ D{Vˆ 2 } = 2(n  m) 1 (V 2 ) 2 .

(iii) D{Vˆ 2 }
Just replace within D{Vˆ 2 } the variance V 2 by the estimate Vˆ 2 .
Dˆ {Vˆ 2 } = 2(n  m) 1 (Vˆ 2 ) 2 . h
242 4 The second problem of probabilistic regression

Upon writing the chapter on variance-covariance component estimation I


learnt about the untimely death of J.F. Seely, Professor of Statistics at Oregon
State University, on 23 February 2002. J.F. Seely, born on 11 February 1941
in the small town of Mt. Pleasant, Utah, who made various influential contri-
butions to the theory of Gauss-Markov linear model, namely the quadratic
statistics for estimation of variance components. His Ph.D. adviser G. Zys-
kind had elegantly characterized the situation where ordinary least squares
approximation of fixed effects remains optimal for mixed models: the regres-
sion space should be invariant under multiplication by the variance-
covariance matrix. J.F. Seely extended this idea to variance-covariance com-
ponent estimation, introducing the notion of invariant quadratic subspaces
and their relation to completeness. By characterizing the class of admissible
embiased estimators of variance-covariance components. In particular, the
usual ANOVA estimator in 2-variance component models is inadmissible.
Among other contributions to the theory of mixed models, he succeeded in
generalizing and improving on several existing procedures for tests and con-
fidence intervals on variance-covariance components.

Additional Reading
Seely. J. and Lee, Y. (confidence interval for a variance: 1994), Azzam, A.,
Birkes, A.D. and Seely, J. (admissibility in linear models, polyhydral covariance
structure: 1988), Seely, J. and Rady, E. (random effects – fixed effects, linear
hypothesis: 1988), Seely, J. and Hogg, R.V. (unbiased estimation in linear mod-
els: 1982), Seely, J. (confidence intervals for positive linear combinations of
variance components, 1980), Seely, J. (minimal sufficient statistics and com-
pleteness, 1977), Olsen, A., Seely, J. and Birkes, D. (invariant quadratic embi-
ased estimators for two variance components, 1975), Seely, J. (quadratic sub-
spaces and completeness, 1971) and Seely, J. (linear spaces and unbiased estima-
tion, 1970).
5 The third problem of algebraic regression
- inconsistent system of linear observational equations
with datum defect: overdetermined- undertermined sys-
tem of linear equations:
{Ax + i = y | A  \ n×m , y  R ( A )  rk A < min{m, n}}

:Fast track reading:


Read only Lemma 5 (MINOS) and Lemma 5.9 (HAPS)

Lemma 5.2
G x -minimum norm,
G y -least squares solution

Lemma 5.3
G x -minimum norm,
G y -least squares solution

Definition 5.1 Lemma 5.4


G x -minimum norm, MINOLESS,
G y -least squares solution rank factorization

Lemma 5.5
MINOLESS
additive rank partitioning

Lemma 5.6
characterization of
G x , G y -MINOS

Lemma 5.7
eigenspace analysis
versus eigenspace synthesis
244 5 The third problem of algebraic regression

Lemma 5.9
D -HAPS

Definition 5.8
D -HAPS

Lemma 5.10
D -HAPS

We shall outline three aspects of the general inverse problem given in discrete
form (i) set-theoretic (fibering), (ii) algebraic (rank partitioning; “IPM”, the
Implicit Function Theorem) and (iii) geometrical (slicing).
Here we treat the third problem of algebraic regression, also called the general
linear inverse problem:
An inconsistent system of linear observational equations
{Ax + i = y | A  \ n× m , rk A < min {n, m}}
also called “under determined - over determined system of linear equations” is
solved by means of an optimization problem. The introduction presents us with
the front page example of inhomogeneous equations with unknowns. In terms of
boxes and figures we review the minimum norm, least squares solution
(“MINOLESS”) of such an inconsistent, rank deficient system of linear equa-
tions which is based upon the trinity
5-1 Introduction 245

5-1 Introduction
With the introductory paragraph we explain the fundamental concepts and basic
notions of this section. For you, the analyst, who has the difficult task to deal
with measurements, observational data, modeling and modeling equations we
present numerical examples and graphical illustrations of all abstract notions.
The elementary introduction is written not for a mathematician, but for you, the
analyst, with limited remote control of the notions given hereafter. May we gain
your interest?
Assume an n-dimensional observation space, here a linear space parameterized
by n observations (finite, discrete) as coordinates y = [ y1 ," , yn ]c  R n in which
an m-dimensional model manifold is embedded (immersed). The model manifold
is described as the range of a linear operator f from an m-dimensional parameter
space X into the observation space Y. As a mapping f is established by the
mathematical equations which relate all observables to the unknown parameters.
Here the parameter space X , the domain of the linear operator f, will be also
restricted to a linear space which is parameterized by coordinates
x = [ x1 ," , xm ]c  R m . In this way the linear operator f can be understood as a
coordinate mapping A : x 6 y = Ax. The linear mapping f : X o Y is geo-
metrically characterized by its range R(f), namely R(A), defined by R(f):=
{y  R n | y = f(x) for some x  X} which in general is a linear subspace of Y
and its kernel N(f), namely N(A), defined by N ( f ) := {x  X | f (x) = 0}. Here
the range R(f), namely the range space R(A), does not coincide with the n-
dimensional observation space Y such that y  R (f ) , namely y  R (A) . In
addition, we shall assume here that the kernel N(f), namely null space N(A) is
not trivial: Or we may write N(f) z {0}.
First, Example 1.3 confronts us with an inconsistent system of linear equations
with a datum defect. Second, such a system of equations is formulated as a spe-
cial linear model in terms of matrix algebra. In particular we are aiming at an
explanation of the terms “inconsistent” and “datum defect”. The rank of the
matrix A is introduced as the index of the linear operator A. The left complemen-
tary index n – rk A is responsible for surjectivity defect, which its right comple-
mentary index m – rk A for the injectivity (datum defect). As a linear mapping f
is neither “onto”, nor “one-to-one” or neither surjective, nor injective. Third, we
are going to open the toolbox of partitioning. By means of additive rank parti-
tioning (horizontal and vertical rank partitioning) we construct the minimum
norm – least squares solution (MINOLESS) of the inconsistent system of linear
equations with datum defect Ax + i = y , rk A d min{n, m }. Box 5.3 is an explicit
solution of the MINOLESS of our front page example.
Fourth, we present an alternative solution of type “MINOLESS” of the front
page example by multiplicative rank partitioning. Fifth, we succeed to identify
246 5 The third problem of algebraic regression

the range space R(A) and the null space N(A) using the door opener “rank
partitioning”.
5-11 The front page example
Example 5.1 (inconsistent system of linear equations with datum
defect: Ax + i = y, x  X = R m , y  Y  R n
A  R n× m , r = rk A d min{n, m} ):
Firstly, the introductory example solves the front page inconsistent system of
linear equations with datum defect,
 x1 + x2  1  x1 + x2 + i1 = 1
 x2 + x3  1 or  x2 + x3 + i2 = 1
+ x1  x3  3 + x1  x3 + i3 = 3

obviously in general dealing with the linear space X = R m


x, dim X = m, here
m=3, called the parameter space, and the linear space Y = R n
y , dim Y = n,
here n = 3 , called the observation space.
5-12 The front page example in matrix algebra
Secondly, by means of Box 5 and according to A. Cayley’s doctrine let us specify
the inconsistent system of linear equations with datum defect in terms of matrix
algebra.
Box 5.1:
Special linear model:
three observations, three unknowns, rk A =2

ª y1 º ª a11 a12 a13 º ª x1 º ª i1 º


y = «« y2 »» = «« a21 a22 a23 »» «« x2 »» + ««i2 »» œ
¬« y3 ¼» ¬« a31 a32 a33 ¼» ¬« x3 ¼» ¬« i3 ¼»
ª 1 º ª 1 1 0 º ª x1 º ª i1 º
œ y = Ax + i : «« 1 »» = «« 0 1 1 »» «« x2 »» + ««i2 »» œ
«¬ 3»¼ «¬ 1 0 1»¼ «¬ x3 »¼ «¬ i3 »¼

œ x c = [ x1 , x2 , x3 ], y c = [ y1 , y2 , y3 ] = [1, 1,  3], i c = [i1 , i2 , i3 ]


x  R 3×1 , y  Z 3×1  R 3×1

ª 1 1 0 º
A := «« 0 1 1 »»  Z 3×3  R 3×3
«¬ 1 0 1»¼

r = rk A = 2 .
5-1 Introduction 247

The matrix A  R n× m , here A  R 3×3 , is an element of R n× m generating a linear


mapping f : x 6 Ax. A mapping f is called linear if f (O x1 + x2 ) =
O f ( x1 ) + f ( x2 ) holds. The range R(f), in geometry called “the range space
R(A)”, and the kernel N(f), in geometry called “the null space N(A)” character-
ized the linear mapping as we shall see.
? Why is the front page system of linear equations called inconsistent ?
For instance, let us solve the first two equations, namely -x1 + x3 = 2 or x1 – x3 =
-2, in order to solve for x1 and x3. As soon as we compare this result to the third
equation we are led to the inconsistency 2 = 3. Obviously such a system of
linear equations needs general inconsistency parameters (i1 , i2 , i3 ) in order to
avoid contradiction. Since the right-hand side of the equations, namely the in
homogeneity of the system of linear equations, has been measured as well as the
linear model (the model equations) has been fixed, we have no alternative but
inconsistency.
Within matrix algebra the index of the linear operator A is the rank r = rk A ,
here r = 2, which coincides neither with dim X = m, (“parameter space”) nor
with dim Y = n (“observation space”). Indeed r = rk A < min {n, m}, here
r = rk A < min{3, 3}. In the terminology of the linear mapping f, f is neither
onto (“surjective”), nor one-to-one (“injective”). The left complementary index
of the linear mapping f, namely the linear operator A, which accounts for the
surjectivity defect, is given by d s = n  rkA, also called “degree of freedom”
(here d s = n  rkA = 1 ). In contrast, the right complementary index of the linear
mapping f, namely the linear operator A, which accounts for the injectivity defect
is given by d = m  rkA (here d = m  rkA = 1 ). While “surjectivity” relates to
the range R(f) or “the range space R(A)” and “injectivity” to the kernel N(f) or
“the null space N(A)” we shall constructively introduce the notion of
range R ( f ) versus kernel N ( f )
range space R (A) null space N ( f )

by consequently solving the inconsistent system of linear equations with datum


defect. But beforehand let us ask:
? Why is the inconsistent system of linear equations called
deficient with respect to the datum ?
At this point we have to go back to the measurement process. Our front page
numerical example has been generated from measurements with a leveling in-
strument: Three height differences ( yDE , yEJ , yJD ) in a triangular network have
been observed. They are related to absolute height x1 = hD , x2 = hE , x3 = hJ by
means of hDE = hE  hD , hEJ = hJ  hE , hJD = hD  hJ at points {PD , PE , PJ } , out-
lined in more detail in Box 5.1.
248 5 The third problem of algebraic regression

Box 5.2:
The measurement process of leveling and
its relation to the linear model
y1 = yDE = hDE + iDE =  hD + hE + iDE
y2 = yEJ = hEJ + iEJ =  hE + hJ + iEJ
y3 = yJD = hJD + iJD =  hJ + hD + iJD

ª y1 º ª hD + hE + iDE º ª  x1 + x2 + i1 º ª 1 1 0 º ª x1 º ª i1 º
« y » = «  h + h + i » = «  x + x + i » = « 0 1 1 » « x » + « i » .
« 2» « E J EJ » « 2 3 2» « »« 2» « 2»
«¬ y3 »¼ «¬ hJ + hD + iJD »¼ «¬  x3 + x1 + i3 »¼ «¬ 1 0 1»¼ «¬ x3 »¼ «¬ i3 »¼

Thirdly, let us begin with a more detailed analysis of the linear mapping
f : Ax  y or Ax + i = y , namely of the linear operator A  R n× m , r = rk A d
min{n, m}. We shall pay special attention to the three fundamental partitioning,
namely
(i) algebraic partitioning called additive and multiplicative
rank partitioning of the matrix A,
(ii) geometric partitioning called slicing of the linear space X
(parameter space) as well as of the linear space Y
(observation space),
(iii) set-theoretical partitioning called fibering of the set X of
parameter and the set Y of observations.
5-13 Minimum norm - least squares solution of the front page example
by means of additive rank partitioning
Box 5.3 is a setup of the minimum norm – least squares solution of the inconsis-
tent system of inhomogeneous linear equations with datum defect following the
first principle “additive rank partitioning”. The term “additive” is taken from the
additive decomposition y1 = A11x1 + A12 x 2 and y 2 = A 21x1 + A 22 x 2 of the ob-
servational equations subject to A11  R r × r , rk A11 d min{ n, m}.
Box 5.3:
Minimum norm-least squares solution of the inconsistent system
of inhomogeneous linear equations with datum defect ,
“additive rank partitioning”.
The solution of the hierarchical optimization problem
(1st) || i ||2I = min :
x
xl = arg{|| y  Ax || I = min | Ax + i = y, A  R n×m , rk A d min{ n, m }}
2
5-1 Introduction 249

(2nd) || x l ||2I = min :


xl
xlm = arg{|| xl ||2I = min | AcAxl = Acy, AcA  R m×m , rk AcA d m}

is based upon the simultaneous horizontal and vertical rank parti-


tioning of the matrix A, namely
ªA A12 º
A = « 11 , A  R r × r , rk A11 = rk A =: r
¬ A 21 A 22 »¼ 11
with respect to the linear model
y = Ax + i

ª y1 º ª A11 A12 º ª x1 º ª i1 º y1  R r ×1 , x1  R r ×1
«y » = « A » « » + « »,
¬ 2 ¼ ¬ 21 A 22 ¼ ¬ x 2 ¼ ¬ i 2 ¼ y 2  R ( n  r )×1 , x 2  R ( m  r )×1 .
First, as shown before, we compute the least-squares solution
|| i ||2I = min or ||y  Ax ||2I = min which generates standard normal
x x
equations
A cAxl = A cy
or
c A11 + Ac21 A 21
ª A11 c A12 + Ac21 A 22 º ª x1 º ª A11
A11 c Ac21 º ª y1 º
« Ac A + Ac A » « » =«
¬ 12 11 22 21
c A12 + A c22 A 22 ¼ ¬ x 2 ¼ ¬ A12
A12 c A c22 »¼ ¬« y 2 ¼»
or
ª N11 N12 º ª x1l º ª m1 º
=
«N
¬ 21 N 22 »¼ «¬ x 2 l »¼ «¬m 2 »¼
subject to
N11 := A11
c A11 + A c21 A 21 , N12 := A11
c A12 + Ac21 A 22 , m1 = A11c y1 + A c21y 2
N 21 := A12
c A11 + A c22 A 21 , N 22 := A12
c A12 + A c22 A 22 , m 2 = A12
c y1 + A c22 y 2 ,
which are consistent linear equations with an (injectivity) defect
d = m  rkA . The front page example leads us to

ª 1 1 0 º
ªA A12 º «
A = « 11 = 0 1 1 »»
¬ A 21 A 22 »¼ «
«¬ 1 0 1»¼

or
ª 1 1 º ª0º
A11 = « » , A12 = « »
¬ 0 1¼ ¬1 ¼
A 21 = [1 0] , A 22 = 1
250 5 The third problem of algebraic regression

ª 2 1 1º
A cA = «« 1 2 1»»
«¬ 1 1 2 »¼

ª 2 1º ª 1º
N11 = « » , N12 = « » , | N11 |= 3 z 0,
¬ 1 2 ¼ ¬ 1¼
N 21 = [ 1 1] , N 22 = 2

ª1º ª 4 º
y1 = « » , m1 = A11
c y1 + A c21y 2 = « »
¬1¼ ¬0¼
y 2 = 3, m 2 = A12
c y1 + A c22 y 2 = 4 .

Second, we compute as shown before the minimum norm solution


|| x l ||2I = min or x1cx1 + x c2 x 2 which generates the standard normal
l x
equations in the following way.
L (x1 , x 2 ) = x1cx1 + xc2 x 2 =
= (xc2l N12 1
c N11 1
 m1cN11 1
)(N11 1
N12 x 2l  N11 m1 ) + xc2l x 2l = min
x2
“additive decomposition of the Lagrangean”
L = L 0 + L1 + L 2
L 0 := m1cN11
2
m1 , L1:= 2xc2l N12 2
c N11 m1
L 2 := xc2l N12 2
c N11 N12 x 2l + xc2l x 2l

wL 1 wL1 1 wL2
(x 2lm ) = 0 œ (x 2lm ) + (x 2lm ) = 0
wx 2 2 wx 2 2 wx 2
2
c N11
œ  N12 m1 + (I + N12 2
c N11 N12 )x 2lm = 0 œ
œ x 2lm = (I + N12 2
c N11 N12 ) 1 N12 2
c N11 m1 ,

which constitute the necessary conditions. The theory of vector de-


rivatives is presented in Appendix B. Following Appendix A Facts:
Cayley inverse: sum of two matrices, formula (s9), (s10), namely
(I + BC1 A c) 1 BC1 = B( AB + C) 1 for appropriate dimensions of
the involved matrices, such that the identities holds
( I + N12 2
c N11 N12 ) 1 N12 2
c N11 = N12 c + N11
c ( N12 N12 2 1
)
we finally find
x 2 lm = N12 c + N112 ) 1 m1 .
c (N12 N12
The second derivatives

1 w2L
(x 2 lm ) = (N12
c N112 N12 + I ) > 0
2 wx 2 wxc2
5-1 Introduction 251

due to positive-definiteness of the matrix I + N12 2


c N11 N12 generate
the sufficiency condition for obtaining the minimum of the uncon-
strained Lagrangean. Finally let us backward transform
x 2 l 6 x1m = N11
1
N12 x 2 l + N11
1
m1 ,
x1lm = N111 N12 N12 c + N11
c (N12 N12 ) m1 + N11
2 1 1
m1 .

Let us right multiply the identity


c =  N11N11
N12 N12 c + N12 N12
c + N11N11
c

c + N11 N11
by (N12 N12 c ) 1 such that
c + N11 N11
c (N12 N12
N12 N12 c ) 1 = N11 N11 c + N11N11
c (N12 N12 c ) 1 + I

holds, and left multiply by N111 , namely


N111 N12 N12 c + N11 N11
c (N12 N12 c ) 1 = N11 c + N11 N11
c (N12 N12 c ) 1 + N11
1
.

Obviously we have generated the linear form


ª x1lm = N11 c + N11N11
c (N12 N12 c ) 1 m1
«
¬ x 2lm = N12 c + N11N11
c (N12 N12 c ) 1 m1
or
ª x º ª Nc º
xlm = « 1lm » = « 11 » (N12 N12
c + N11N11
c ) 1 m1
c ¼
¬ x 2lm ¼ ¬ N12
or
ª A c A + A c21A 21 º
x lm = « 11 11
c A11 + A c22 A 21 »¼
¬ A12
c A12 + A c21A 22 )( A12
[( A11 c A11 + A c22 A 21 ) + ( A11
c A11 + A c21A 21 ) 2 ]1
c y1 + A c21y 2 ].
[( A11

Let us compute numerically xlm for the front page example.

ª 5 4 º ª1 1º
c =«
N11N11 » c =«
, N12 N12 »
¬ 4 5 ¼ ¬1 1¼
ª 6 3 º 1 ª6 3º
c + N11N11
N12 N12 c =« » c + N11N11
, [N12 N12 c ]1 =
¬ 3 6 ¼ 27 «¬ 3 6 »¼
ª 4 º 1 ª 4 º 4 4
m1 = « » Ÿ x1lm = « » , x 2lm = , || xlm ||2I = 2
0
¬ ¼ 3 ¬ ¼0 3 3
252 5 The third problem of algebraic regression

4 4
x1lm = hˆD =  , x2lm = hˆE = 0, x3lm = hˆJ =
3 3
4
|| xlm ||2I = 2
3
x + x + x = 0 ~ hˆ + hˆ + hˆ = 0.
1lm 2lm 3lm D E J

The vector i lm of inconsistencies has to be finally computed by


means of
i lm = y  Axlm

ª1º
1 1
i lm =  ««1»» , Aci l = 0, || i lm ||2I = 3.
3 3
«¬1»¼

The technique of horizontal and vertical rank partitioning has been


pioneered by H. Wolf (1972,1973).
h
5-14 Minimum norm - least squares solution of the front page example
by means of multiplicative rank partitioning:
Box 5.4 is a setup of the minimum norm-least squares solution of the inconsistent
system of inhomogeneous linear equations with datum defect following the first
principle “multiplicative rank partitioning”. The term “multiplicative” is taken
from the multiplicative decomposition y = Ax + i = DEy + i of the observational
equations subject to
A = DE, D  R n×r , E  R r × m , rk A = rk D = rk E d min{n, m} .

Box 5.4:
Minimum norm-least squares solution of the inconsistent
system of inhomogeneous linear equations with datum defect
multiplicative rank partitioning
The solution of the hierarchical optimization problem
(1st) ||i ||2I = min :
x
xl = arg{|| y  Ax ||2I = min | Ax + i = y , A  R n×m , rk A d min{ n, m }}
(2nd) ||x l ||2I = min :
xl

x lm = arg{|| x l || I = min | A cAx l = A cy, AcA  R m×m , rk AcA d m}


2

is based upon the rank factorization A = DE of the matrix


A  R n× m subject to simultaneous horizontal and vertical rank
partitioning of the matrix A, namely
5-1 Introduction 253

ª D  R n×r , rk D = rk A =: r d min{n, m}
A = DE = « r ×m
¬E  R , rk E = rk A =: r d min{n, m}
with respect to the linear model
y = Ax + i

ª Ex =: z
y = Ax + i = DEx + i « DEx = Dz Ÿ y = Dz + i .
¬
First, as shown before, we compute the least-squares solution
|| i ||2I = min or ||y  Ax ||2I = min which generates standard normal
x x
equations
DcDz l = Dcy Ÿ z l = (DcD) 1 Dcy = Dcl y ,
which are consistent linear equations of rank rk D = rk DcD = rk A = r.
The front page example leads us to

ª 1 1 0 º ª 1 1 º
ªA A12 º «
A = DE = « 11 » = « 0 1 1 » , D = «« 0 1»»  R 3×2
»
¬ A 21 A 22 ¼
«¬ 1 0 1»¼ «¬ 1 0 »¼
or

DcDE = DcA Ÿ E = (DcD) 1 DcA


ª 2 1º 1 ª2 1º ª1 0 1º 2×3
DcD = « » , (DcD) 1 = « » ŸE=« »R
¬ 1 2 ¼ 3 ¬1 2¼ ¬ 0 1 1¼
1 ª1 0 1º
z l = (DcD) 1 Dc = « y
3 ¬0 1 1»¼
ª1º
4 ª2º
y = «« 1 »» Ÿ z l =  « »
3 ¬1 ¼
«¬ 3»¼

1 ª1 0 1º
z l = (DcD) 1 Dc = « y
3 ¬ 0 1 1»¼
ª1º
4 ª2º
y = «« 1 »» Ÿ z l =  « » .
3 ¬1 ¼
«¬ 3»¼

Second, as shown before, we compute the minimum norm solu-


tion || x A ||2I = min of the consistent system of linear equations with
A x
datum defect, namely
254 5 The third problem of algebraic regression

xlm = arg{|| xl ||2I = min | Exl = ( DcD) 1 Dcy }.


xl

As outlined in Box1.3 the minimum norm solution of consistent equa-


tions with datum defect namely Exl = (DcD) 1 Dcy, rk E = rk A = r is

xlm = Ec(EEc) 1 (DcD) 1 Dcy


xlm = Em Dl y = A lm

y = A+y ,

which is limit on the minimum norm generalized inverse. In sum-


mary, the minimum norm-least squares solution generalized in-
verse (MINOLESS g-inverse) also called pseudo-inverse A + or
Moore-Penrose inverse is the product of the MINOS g-inverse Em
(right inverse) and the LESS g-inverse Dl (left inverse). For the
front page example we are led to compute

ª1 0 1º ª2 1º
E=« » , EEc = « »
¬0 1 1¼ ¬1 2¼
ª2 1º
1 ª 2 1º 1«
(EEc) = «
1
, Ec(EEc) = « 1
1
2 »»
3 ¬ 1 2 »¼ 3
«¬ 1 1 »¼
ª 1 0 1º

xlm = Ec(EEc) (DcD) Dcy = « 1 1
1 1
0 »» y
3
«¬ 0 1 1»¼
ª1º ª 4 º ª 1º
« » 1« » 4« » 4
y = « 1 » Ÿ xlm = « 0 » = « 0 » , || xlm ||= 2
3 3 3
«¬ 3»¼ «¬ +4 »¼ «¬ +1»¼

ˆ
ª x1lm º ª« hD º» ª 1º
« » ˆ 4« »
xlm = « x2lm » = « hE » = « 0 »
3
«¬ x3lm »¼ «« hˆ »» «¬ +1»¼
¬ ¼
J

4
|| xlm ||= 2
3
x1lm + x2lm + x3lm = 0 ~ hˆD + hˆE + hˆJ = 0.

The vector i lm of inconsistencies has to be finally computed by


means of
i lm := y  Axlm = [I n  AAlm

]y,
i lm = [I n  Ec(EEc) 1 (DcD) 1 Dc]y;
5-1 Introduction 255

ª1º
1 1
i lm =  ««1»» , Aci l = 0, || i lm ||= 3.
3 3
«¬1»¼

h
Box 5.5 summarizes the algorithmic steps for the diagnosis of the simultaneous
horizontal and vertical rank partitioning to generate ( Fm Gy )-MINOS.
1

Box 5.5:
algorithm
The diagnostic algorithm for solving a general rank deficient system
of linear equations
y = Ax, A  \ n× m , rk A < min{n, m}
by means of simultaneous horizontal and vertical rank partioning
Determine
the rank of the matrix A
rk A < min{n, m} .

Compute
“the simultaneous horizontal and vertical
rank partioning”
r ×r r ×( m  r )
ª A11 A12 º A11  \ , A12  \
A=« »,
¬ A 21 A 22 ¼ A 21  \ ( ) , A 22  \ ( ) ( )
n  r ×r n  r × m r

“n-r is called the left complementary index, m-r the right


complementary index”
“A as a linear operator is neither injective ( m  r z 0 ) ,
nor surjective ( n  r = 0 ) . ”

Compute
the range space R(A) and the null space N(A)
of the linear operator A
R(A) = span {wl1 ( A )," , wlr ( A )}
N(A) = {x  \ n | N11x1A + N12 x 2 A = 0}
or
x1A =  N111
N12 x 2 A .
256 5 The third problem of algebraic regression

Compute
(Tm , Gy ) -MINOS
ª x º ª Nc º ªy º
x Am = « 1 » = « 11 » = [ N12 N12 c + N11N11 c G11y , A c21G12y ] « 1 »
c ]1 [ A11
c ¼
¬ x 2 ¼ ¬ N12 ¬y2 ¼
N11 := A11 c G11A11 + A c21G 22 A 21 , N12 := A11
y y
c G11A12 + A c21G 22 A 22
y y

N 21 := N12
c , N 22 := A12
c G11y A12 + A c21G 22
y
A 22 .

5-15 The range R(f) and the kernel N(f) interpretation of


“MINOLESS” by three partitionings
(i) algebraic (rank partitioning)
(ii) geometric (slicing)
(iii) set-theoretical (fibering)

Here we will outline by means of Box 5.6 the range space as well as the null
space of the general inconsistent system of linear equations.

Box 5.6:
The range space and the null space of the general
inconsistent system of linear equations
n ×m
Ax + i = y , A  \ , rk A d min{n, m}

“additive rank partitioning”.


The matrix A is called a simultaneous horizontal and vertical rank
partitioning, if
ªA A12 º r ×r
A = « 11 » , A11 = \ , rk A11 = rk A =: r
A
¬ 21 A 22 ¼

with respect to the linear model


n ×m
y = Ax + i, A  \ , rk A d min{n, m}

identification of the range space


n
R(A) = span {¦ e i aij | j  {1," , r}}
i =1
“front page example”
5-1 Introduction 257

ª1º ª 1 1 0 º
ª y1 º « » « » 3× 3
«¬ y2 »¼ = « 1 » , A = « 0 1 1 »  \ , rk A =: r = 2
¬ 3 ¼ ¬ 1 0 1¼
R(A)=span {e1 a11 + e 2 a21 + e3 a31 , e1 a12 + e 2 a22 + e3 a32 }  \ 3
or
R(A) = span {e1 + e3 , e1  e 2 }  \ 3 = Y
c1 = [1, 0,1], c 2 = [1,  1, 0], \ 3 = span{ e1 , e 2 , e3 }

ec2 ec1
O

e3

y
e1 e2

Figure 5.1 Range R (f ), range space R ( A) , (y  R ( A))

identification of the null space


N11x1A + N12 x 2 A = A11c y1 + A c21 y 2
N12 x1A + N 22 x 2 A = A12
c y1 + A c22 y 2

N ( A ):= {x  \ n | N11x1A + N12 x 2 A = 0}


or
N11x1A + N12 x 2 A = 0 œ x1A =  N11
1
N12 x 2 A
“front page example”
ª x1 º 1 ª 2 1 º ª 1º ªx3 º
« x » =  3 « 1 2 » « 1» x 3A = « x »
¬ 3 ¼A ¬ ¼¬ ¼ ¬ 3 ¼A

ª 2 1º 1 ª2 1º ª 1º
N11 = « »
1
, N11 = « » , N12 = « »
¬ 1 2 ¼ 3 ¬1 2¼ ¬ 1¼
x1A = u, x 2A = u, x 3A = u
N(A)= H 01 = G1,3 .
258 5 The third problem of algebraic regression

N(A)= L 0
1

N(A)=G1,3  \
3

x2

x1
Figure 5.2 : Kernel N( f ), null space N(A), “the null space N(A) as
the linear manifold L 0 (Grassmann space G1,3) slices the
1

parameter space X = \ 3 ”, x3 is not displayed .

Box 5.7 is a summary of MINOLESS of a general inconsistent system of linear


equations y = Ax + i. Based on the notion of the rank r = rk A < min{n, m}, we
designed the generalized inverse of
MINOS type
or
A Am or A1,2,3,4 .

Box 5.7
MINOLESS of a general inconsistent system of linear equations :
f : x o y = Ax + i, x  X = \ m (parameter space),
y Y = \ n (observation space)
r := rk A < min{n, m}
A- generalized inverse of MINOS type:
A1,2,3,4 or A Am
Condition # 1 Condition # 1
f(x)=f(g(y)) Ax =AA-Ax
œ œ
f = f D gD f AA-A=A
Condition # 2 Condition # 2
g(y)=g(f(x)) A y=A-Ax=A-AA-y
-

œ œ
g = gD f Dg A-AA-=A-
Condition # 3 Condition # 3
f(g(y))=yR(A) A-Ay= yR(A)
œ œ
5-1 Introduction 259

f D g = PR ( A ) A  A = PR ( A 
)

Condition # 4 Condition # 4
g(f(x))= y R ( A ) A AA  = PR ( A )

g D f = PR (g)
A  A = PR ( A ) .


R(A-) A
R(A )  D(A)
-
D(A-) R(A)  D(A-)
D(A) A- R(A)
P R(A)
A
R(A-)
A-

AA  = PR ( A )
f D g = PR ( f )

Figure 5.3 : Least squares, minimum norm generalized inverse


A Am ( A1,2,3,4 or A + ) , the Moore-Penrose-inverse
(Tseng inverse)

A similar construction of the generalized inverse of a matrix applies


to the diagrams of the mappings:

(1) under the mapping A:


D(A) o R(A)
AA  = PR ( A )
f D g = PR ( f )

(2) under the mapping A-:


R (A ) o PR ( A 
)

A A = PR ( A  )


g D f = PR ( g ) .

In addition, we follow Figure 5.4 and 5.5 for the characteristic diagrams de-
scribing:
260 5 The third problem of algebraic regression

(i) orthogonal inverses and adjoints in reflexive dual Hilbert spaces


Figure 5.4

X A Y Y
X
+
A =A 
A =A

( G y ) 1 = G y
*
Gx A G y A ( A G y A ) 

(A Gy A) 
A G y A Gy *
G x = (G x ) 1
*

X A Y X A Y

y 6 y = G y y  y  Y, y  Y

(ii) Venn diagrams, trivial fiberings

Figure 5.5 : Venn diagram, trivial fibering of the domain D(f):


Trivial fibers N ( f ) A , trivial fibering of the range
R( f ): trivial fibers R ( f ) and R ( f ) A ,
f : \m = X o Y = \n ,
X set system of the parameter space, Y set system of the
observation space

In particularly, if Gy is rank defect we proceed as follows.

ª / 0º
Gy G y = « y »
¬ 0 0¼
synthesis analysis
5-1 Introduction 261

ª Uc º
G*y = UcG y U = « 1 » G y [U1 , U 2 ]
G y = UG *y U c ¬ Uc2 ¼
= U1ȁ y U1c ª U1cG y U1 U1cG y U 2 º
=« »
¬ Uc2G y U1c Uc2G y U 2 ¼

ȁ y = U1cG y U1 Ÿ U1ȁ y = G y U1
0 = G y Uc2 and U1cG y U 2 = 0

|| y  Ax ||G2 = || i ||2 = i cG y i º
y
»Ÿ
G y = U1c ȁ y U1 »¼
(y  Ax)cU1c ȁ y U1 (y  Axc) = min Ÿ
x

Ÿ U1 ( y  Ax ) = U1i = i
If we use simultaneous horizontal and vertical rank partitioning
ª y º ªi º ªA A12 º ª x1 º ª i1 º
y = « 1 » + « 1 » = « 11 » « » + «i »
y i
¬ 2 ¼ ¬ 2 ¼ ¬ 21 A A 22 ¼ ¬ x 2 ¼ ¬ 2¼
subject to special dimension identities

y1  \ r ×1 , y 2  \ ( n  r )×1
A11  \ r × r , A12  \ r × ( m  r )
A 21  \ ( n  r )× r , A 22  \ ( n  r )× ( m  r ) ,
we arrive at Lemma 5.0.

Lemma 5.0 ((Gx, Gy) –MINOLESS, simultaneous horizontal and


vertical rank partitioning):
ªy º ªi º ªA A12 º ª x1 º ª i1 º
y = « 1 » + « 1 » = « 11 + (5.1)
¬ y 2 ¼ ¬i 2 ¼ ¬ A 21 A 22 »¼ «¬ x 2 »¼ «¬ i 2 »¼
subject to the dimension identities

y1  \ r ×1 , y 2  \ ( n  r )×1 , x1  \ r ×1 , x 2  \ ( m  r )×1
A11  \ r × r , A12  \ r ×( m  r )
A 21  \ ( n  r )× r , A 22  \ ( n  r )×( m  r )
is a simultaneous horizontal and vertical rank partitioning of the
linear model (5.1)
262 5 The third problem of algebraic regression

{y = Ax + i, A  \ n× m , r := rk A < min{n, m}} (5.2)

r is the index of the linear operator A,


n-r is the left complementary index and
m-r is the right complementary index.
x A is Gy-LESS if it fulfils the rank
x Am is MINOS of A cG y Ax A = A cG y y , if
(x1 )Am = N111 N12 [N12
c N111G11x  2G 21
x 1
N11 N12 + G x22 ]1
(5.3)
c N111G11x N111  2G x21 N11
(N12 1
)m1 + N111 m1

(x 2 )Am = [N12
c N111G11x N111 N12  2G 21
x
N111 N12 + G x22 ]1
(5.4)
c N111G11x N11
(N12 1 1
 2G x21 N11 )m1 .
The symmetric matrices (Gx, Gy) of the metric of the parameter space X
as well as of the observation space Y are consequently partitioned as

ª G y G12
y
º x
ª G11 x
G12 º
G y = « 11y y »
and « x x »
= Gx (5.5)
¬G 21 G 22 ¼ ¬ G 21 G 22 ¼

subject to the dimension identities


y
G11  \ r×r , G12
y
 \ r×( n  r ) x
G11  \ r×r , G12
x
 \ r×( m r )
versus
y
G 21  \ ( n r )×r , G 22
y
 \ ( n r )×( n r ) G x21  \ ( m r )×r , G x22  \ ( m r )×( m r )

deficient normal equations


A cG y Ax A = A cG y y (5.6)
or
ª N11 N12 º ª x1 º ª M11 M12 º ª y1 º ª m1 º
«N = = (5.7)
¬ 21 N 22 »¼ «¬ x 2 »¼ A «¬ M 21 M 22 »¼ «¬ y 2 »¼ «¬ m2 »¼

subject to
N11 := A11
c G11y A11 + A c21G 21
y
A11 + A11 y
c G12 A 21 + A c21G 22
y
A 21 (5.8)

N12 := A11
c G11y A12 + A c21G 21
y
A12 + A11 y
c G12 A 22 + A c21G 22
y
A 22 , (5.9)

N 21 = N12
c , (5.10)

N 22 := A12 y
c G11 A12 + A c22 G 21
y
A12 + A12 y
c G12 A 22 + Ac22 G 22
y
A 22 , (5.11)

M11 := A11 y
c G11 + A c21G 21
y
, M12 := A11 y
c G12 + A c21G 22
y
, (5.12)
5-2 MINOLESS and related solutions 263

M 21 := A12
c G11y + Ac22 G y21 , M 22 := A12 y
c G12 + A c22 G y22 , (5.13)

m1 := M11y1 + M12 y 2 , m2 := M 21y1 + M 22 y 2 . (5.14)

5-2 MINOLESS and related solutions like weighted minimum


norm-weighted least squares solutions
5-21 The minimum norm-least squares solution: "MINOLESS"
The system of the inconsistent, rank deficient linear equations Ax + i = y subject
to A  \ n× m , rk A < min{n, m} allows certain solutions which we introduce by
means of Definition 5.1 as a solution of a certain hierarchical optimization prob-
lem. Lemma 5.2 contains the normal equations of the hierarchical optimization
problems. The solution of such a system of the normal equations is presented in
Lemma 5.3 for the special case (i) | G x |z 0 and case (ii) | G x + A cG y A |z 0,
but | G x |= 0 . For the analyst:

Lemma 5.4 Lemma 5.5


presents the toolbox of presents the toolbox of
MINOLESS for multiplicative and MINOLESS for additive
rank partitioning, known as rank partitioning.
rank factorization.

Definition 5.1 ( G x -minimum norm- G y -least squares solution):


A vector x Am  X = \ m is called G x , G y -MINOLESS (MInimum NOrm
with respect to the G x -seminorm-Least Squares Solution with respect to
the G y -seminorm) of the inconsistent system of linear equations with
datum defect
ª rk A d min {n, m} n× m
A\
«
Ax + i = y « y  R ( A), N ( A) z {0} (5.15)
«x  X = \n , y  Y = \n ,
«¬
if and only if
first
x A = arg{|| i ||G = min | Ax + i = y, rk A d min{n, m}} ,
y
(5.16)
x
second
x Am = arg{|| x ||G = min | A cG y Ax A = AcG y y}
x
(5.17)
x
is G y -MINOS of the system of normal equations A cG y Ax A = A cG y y
which are G x -LESS.

The solutions of type G x , G y -MINOLESS can be characterized as following.


264 5 The third problem of algebraic regression

Lemma 5.2 ( G x -minimum norm, G y least squares solution):


A vector x Am  X = \ m is called G x , G y -MINOLESS of (5.1), if and
only if the system of normal equations

ª Gx A cG y A º ª x Am º ª 0 º
« A cG A = (5.18)
¬ y 0 »¼ «¬ OAm »¼ «¬ A cG y y »¼

with respect to the vector OAm of “Lagrange multipliers” is fulfilled. x Am


always exists and is uniquely determined, if the augmented matrix
[G x , A cG y A ] agrees to the rank identity
rk[G x , A cG y A ] = m (5.19)
or, equivalently, if the matrix G x + A cG y A is regular.

:Proof:
G y -MINOS of the system of normal equations A cG y Ax A = A cG y is constructed
by means of the constrained Lagrangean
L( x A , OA ) := xcA G x x A + 2OAc( A cG y Ax A  A cG y y ) = min ,
x ,O

such that the first derivatives


1 wL º
(x Am , OAm ) = G x x Am + A cG y AOAm = 0 »
2 wx
»œ
1 wL
(x Am , OAm ) = AcG y AxAm  AcG y y = 0 »
2 wO »¼

ª Gx A cG y A º ª x Am º ª 0 º
œ« =
¬ A cG y A 0 »¼ «¬ OAm »¼ «¬ AcG y y »¼

constitute the necessary conditions. The second derivatives

1 w 2L
( x Am , OAm ) = G x t 0 (5.20)
2 wxwx c
due to the positive semidefiniteness of the matrix G x generate the sufficiency
condition for obtaining the minimum of the constrained Lagrangean. Due to the
assumption A cG y y  R( A cG x A ) the existence of G y -MINOS x Am is granted. In
order to prove uniqueness of G y -MINOS x Am we have to consider
case (i) and case (ii)
G x positive definite G x positive semidefinite .

case (i): G x positive definite


5-2 MINOLESS and related solutions 265

Gx A cG y A
= G x  A cG y AG x1A cG y A = 0. (5.21)
A cG y A 0
First, we solve the system of normal equations which characterize x Am G x , G y -
MINOLESS of x for the case of a full rank matrix of the metric G x of the para-
metric space X, rk G x = m in particular. The system of normal equations is
solved for

ª x Am º ª G x A cG y A º ª 0 º ª C1 C2 º ª 0 º
« O » = « A cG A =
0 »¼ «¬ A cG y y »¼ «¬ C3 C4 »¼ «¬ A cG y y »¼
(5.22)
¬ Am ¼ ¬ y

subject to

ª Gx A cG y A º ª C1 C2 º ª G x A cG y A º ª G x A cG y A º
« A cG A » « » =«
¬ y
« »
0 ¼ ¬ C3 C4 ¼ ¬ A cG y A 0 ¼ ¬ A cG y A 0 »¼
(5.23)
as a postulate for the g-inverse of the partitioned matrix. Cayley multiplication of
the three partitioned matrices leads us to four matrix identities.
G x C1G x + G x C2 A cG y A + A cG y AC3G x + A cG y AC4 A cG y A = G x (5.24)
G x C1A cG y A + A cG y AC3 A cG y A = A cG y A (5.25)
A cG y AC1G x + A cG y AC2 A cG y A = AcG y A (5.26)
A cG y AC1 A cG y A = 0. (5.27)

Multiply the third identity by G x1A cG y A from the right side and substitute the
fourth identity in order to solve for C2.
A cG y AC2 A cG y AG x1A cG y A = A cG y AG x1A cG y A (5.28)
C2 = G x1A cG y A ( A cG y AG x1A cG y A ) 
solves the fifth equation
A cG y AG x1A cG y A ( A cG y AG x1A cG y A )  A cG y AG x1A cG y A =
(5.29)
= A cG y AG x1A cG y A
by the axiom of a generalized inverse
x Am = C2 A cG y y (5.30)

x Am = G y1A cG y A ( A cG y AG x1A cG y A )  A cG y y . (5.31)

We leave the proof for


“ G x1A cG y A ( A cG y AG x1A cG y A )  A cG y y is the weighted
pseudo-inverse or Moore Penrose inverse A G+ G ” as an exercise.
y x
266 5 The third problem of algebraic regression

case (ii): G x positive semidefinite


Second, we relax the condition rk G x = m by the alternative rk[G x , A cG y A ] = m
G x positive semidefinite. Add the second normal equation to the first one in
order to receive the modified system of normal equations

ªG x + A cG y A A cG y A º ª x Am º ª A cG y y º
« A cG A = (5.32)
¬ y 0 »¼ «¬ OAm »¼ «¬ A cG y y »¼

rk(G x + A cG y A ) = rk[G x , A cG y A ] = m . (5.33)

The condition rk[G x , A cG y A ] = m follows from the identity

ªG  0 º ª Gx º
G x + A cG y A = [G x , A cG y A ] « x  »« », (5.34)
¬ 0 ( A cG y A ) ¼ ¬ A cG y A ¼
namely G x + AcG y A z 0. The modified system of normal equations is solved
for

ª x Am º ªG x + A cG y A AcG y A º ª A cG y y º
« O » = « A cG A 0 »¼ «¬ A cG y y »¼
=
¬ Am ¼ ¬ y
(5.35)
ª C C2 º ª A cG y y º ª C1A cG y y + C2 A cG y y º
=« 1 »« »=« »
¬C3 C4 ¼ ¬ A cG y y ¼ ¬ C3A cG y y + C4 A cG y y ¼
subject to
ªG x + A cG y A A cG y A º ª C1 C2 º ªG x + A cG y A A cG y A º
« A cG A =
¬ y 0 »¼ «¬C3 C4 »¼ «¬ A cG y A 0 »¼
(5.36)
ªG x + A cG y A A cG y A º

¬ A cG y A 0 »¼

as a postulate for the g-inverse of the partitioned matrix. Cayley multiplication of


the three partitioned matrices leads us to the four matrix identities
“element (1,1)”
(G x + A cG y A)C1 (G x + A cG y A) + A cG y AC3 (G x + A cG y A) +
(5.37)
+(G x + A cG y A)C2 A cG y A + A cG y AC4 A cG y A = G x + A cG y A

“element (1,2)”
(G x + A cG y A)C1 A cG y A + A cG y AC3 = A cG y A (5.38)
5-2 MINOLESS and related solutions 267

“element (2,1)”
A cG y AC1 (G x + AcG y A) + AcG y AC2 AcG y A = AcG y A (5.39)

“element (2,2)”
A cG y AC1 A cG y A = 0. (5.40)

First, we realize that the right sides of the matrix identities are symmetric matri-
ces. Accordingly the left sides have to constitute symmetric matrices, too.
(G x + A cG y A)C1c (G x + A cG y A) + (G x + A cG y A)Cc3 A cG y A +
(1,1):
+ A cG y ACc2 (G x + A cG y A) + A cG y ACc4 AcG y A = G x + A cG y A

(1,2): A cG y AC1c (G x + AcG y A) + Cc3 A cG y A = A cG y A

(2,1): (G x + A cG y A )C1cA cG y A + A cG y ACc2 A cG y A = A cG y A

(2,2): A cG y AC1cA cG y A = A cG y AC1A cG y A = 0 .


We conclude
C1 = C1c , C2 = Cc3 , C3 = Cc2 , C4 = Cc4 . (5.41)

Second, we are going to solve for C1, C2, C3= C2 and C4.
C1 = (G x + A cG y A) 1{I m  A cG y A[ AcG y A(G x + A cG y A) 1 A cG y A]
(5.42)
A cG y A(G x + A cG y A ) 1}

C2 = (G x + A cG y A) 1 A cG y A[ A cG y A(G x + A cG y A) 1 A cG y A] (5.43)

C3 = [ A cG y A(G x + A cG y A) 1 A cG y A] AcG y A(G x + A cG y A) 1 (5.44)

C4 = [ A cG y A (G x + A cG y A ) 1 A cG y A ] . (5.45)

For the proof, we depart from (1,2) to be multiplied by A cG y A(G x + A cG y A) 1


from the left and implement (2,2)
A cG y AC2 AcG y A(G x + A cG y A ) 1 A cG y A = A cG y A (G x + A cG y A ) 1 A cG y A .
Obviously, C2 solves the fifth equation on the basis of the g-inverse
[ A cG y A(G x + A cG y A) 1 A cG y A]

or
A cG y A(G x + A cG y A) 1 A cG y A[ A cG y A(G x + A cG y A) 1 A cG y A]
(5.46)
A cG y A(G x + A cG y A ) 1 A cG y A = A cG y A(G x + A cG y A) 1 A cG y A .
268 5 The third problem of algebraic regression

We leave the proof for


“ (G x + A cG y A) 1 AcG y A[ A cG y A(G x + A cG y A ) 1 A cG y A] A cG y is the
weighted pseudo-inverse or Moore-Penrose inverse A G+ y ( G x + AcG y A ) ”

as an exercise. Similarly,
C1 = (I m  C2 A cG y A)(G x + A cG y A) 1 (5.47)

solves (2,2) where we again take advantage of the axiom of the g-inverse,
namely
A cG y AC1 AcG y A = 0 œ (5.48)

A cG y A(G x + A cG y A) 1 A cG y A(G x + A cG y A) 1 A cG y A 
 A cG y A(G x + A cG y A) 1 A cG y A[ A cG y A(G x + A cG y A) 1 A cG y A]
A cG y A(G x + A cG y A) 1 A cG y A = 0 œ

œ A cG y A(G x + A cG y A ) 1 A cG y A 
 A cG y A(G x + A cG y A ) 1 A cG y A( A cG y A(G x + A cG y A ) 1 A cG y A) 
A cG y A(G x + A cG y A ) 1 A cG y A = 0.

For solving the system of modified normal equations, we have to compute


C1 A cG y = 0 œ C1 = A cG y A = 0 œ A cG y AC1 AcG y A = 0 ,
a zone identity due to (2,2). In consequence,
x Am = C2 A cG y y (5.49)
has been proven. The element (1,1) holds the key to solve for C4 . As soon as we
substitute C1 , C2 = Cc3 , C3 = Cc2 into (1,1) and multiply
left by and right by
A cG y A(G x + AcG y A) 1 (G x + AcG y A) 1 AcG y A,

we receive
2AcGy A(Gx + AcGy A)1 AcGy A[AcGy A(Gx + AcGy A)1 AcGy A] AcGy A
(Gx + AcGy A)1 + AcGy A(Gx + AcGy A)1 AcGy AC4 AcGy A(Gx + AcGy A)1 AcGy A =
= AcGy A(Gx + AcGy A)1 AcGy A.
Finally, substitute
C4 = [ A cG y A(G x + A cG y A) 1 A cG y A] (5.50)
5-2 MINOLESS and related solutions 269

to conclude
A cG y A(G x + A cG y A) 1 A cG y A[ A cG y A(G x + A cG y A) 1 A cG y A] A cG y A
(G x + A cG y A) 1 AcG y A = AcG y A(G x + AcG y A) 1 AcG y A ,
namely the axiom of the g-inverse. Obviously, C4 is a symmetric matrix such
that C4 = Cc4 .
Here ends my elaborate proof.
The results of the constructive proof of Lemma 5.2 are collected in Lemma 5.3.
Lemma 5.3 ( G x -minimum norm, G y -least squares solution:
MINOLESS):
x Am = Ly
ˆ is G -minimum norm, G -least squares solution of (5.1)
x y
subject to
r := rk A = rk( A cG y A ) < min{n, m}
rk(G x + A cG y A ) = m

if and only if

Lˆ = A G+ y Gx
= ( A Am ) G y Gx
(5.51)

Lˆ = (G x + A cG y A ) 1 A cG y A[ A cG y A(G x + A cG y A ) 1 A cG y A ] AcG y (5.52)

xAm = (G x + AcG y A)1 AcG y A[ AcG y A(G x + AcG y A)1 AcG y A] AcG y y , (5.53)

where A G+ G = A1,2,3,4
y G G is the G y , G x -weighted Moore-Penrose inverse. If
x y x

rk G x = m , then

Lˆ = G x1A cG y A( A cG y AG x1A cG y A )  A cG y (5.54)

x Am = G x1A cG y A ( A cG y AG x1A cG y A )  A cG y y (5.55)

is an alternative unique solution of type MINOLLES.


Perhaps the lengthy formulae which represent G y , G x - MINOLLES in terms of
a g-inverse motivate to implement
explicit representations for the analyst
of the G x -minimum norm (seminorm), G y -least squares solution, if multiplica-
tion rank partitioning, also known as rank factorization, or additive rank parti-
tioning of the first order design matrix A is available. Here, we highlight both
representations of A + = A Am .
Lemma 5.4 (G_x-minimum norm, G_y-least squares solution: MINOLESS, rank factorization):

x_lm = L̂y is the G_x-minimum norm, G_y-least squares solution (MINOLESS) of (5.1),

{Ax + i = y | A ∈ ℝ^{n×m}, r := rk A = rk(A'G_y A) < min{n, m}} ,

if it is represented by multiplicative rank partitioning or rank factorization

A = DE,  D ∈ ℝ^{n×r},  E ∈ ℝ^{r×m}                                                  (5.56)

as

case (i): G_y = I_n, G_x = I_m

L̂ = A^-_{lm} = E'(EE')^{-1}(D'D)^{-1}D'                                             (5.57)

L̂ = E^-_R D^-_L,  E^-_R = E^-_m = E'(EE')^{-1} the right inverse,
                   D^-_L = D^-_l = (D'D)^{-1}D' the left inverse                     (5.58)

x_lm = A^-_{lm} y = A^+ y = E'(EE')^{-1}(D'D)^{-1}D'y .                              (5.59)

The unknown vector x_lm has the minimum Euclidean length

||x_lm||^2 = x_lm' x_lm = y'(A^+)'A^+ y = y'(D^-_L)'(EE')^{-1}D^-_L y .              (5.60)

y = y_lm + i_lm                                                                      (5.61)

is an orthogonal decomposition of the observation vector y ∈ Y = ℝ^n into

Ax_lm = y_lm ∈ R(A)  and  y - Ax_lm = i_lm ∈ R(A)^⊥ ,                                (5.62)

the vector of inconsistency:

y_lm = Ax_lm = AA^+ y =                    and   i_lm = y - y_lm = (I_n - AA^+)y =
     = D(D'D)^{-1}D'y = DD^-_L y                 = [I_n - D(D'D)^{-1}D']y = (I_n - DD^-_L)y .

AA^+ y = D(D'D)^{-1}D'y = DD^-_L y = y_lm is the projection P_{R(A)} and
(I_n - AA^+)y = [I_n - D(D'D)^{-1}D']y = (I_n - DD^-_L)y is the projection P_{R(A)^⊥}.
i_lm and y_lm are orthogonal in the sense of <i_lm | y_lm> = 0 or
(I_n - AA^+)'A = [I_n - D(D'D)^{-1}D']'D = 0. The "goodness of fit" of MINOLESS is

||y - Ax_lm||^2 = ||i_lm||^2 = y'(I_n - AA^+)y =
                = y'[I_n - D(D'D)^{-1}D']y = y'(I_n - DD^-_L)y .                     (5.63)
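As a numerical companion to case (i), the following short sketch (not from the book; D, E and y are hypothetical full-rank factors and observations) builds A = DE and evaluates A^+ = E'(EE')^{-1}(D'D)^{-1}D' of (5.57), together with the orthogonality of the inconsistency vector.

import numpy as np

rng = np.random.default_rng(1)
n, m, r = 5, 4, 2
D = rng.standard_normal((n, r))          # rank factor, rk D = r
E = rng.standard_normal((r, m))          # rank factor, rk E = r
A = D @ E                                # rk A = r < min(n, m)
y = rng.standard_normal(n)

A_plus = E.T @ np.linalg.inv(E @ E.T) @ np.linalg.inv(D.T @ D) @ D.T   # (5.57)
assert np.allclose(A_plus, np.linalg.pinv(A))

x_lm = A_plus @ y                        # MINOLESS
i_lm = y - A @ x_lm                      # vector of inconsistency
assert np.allclose(A.T @ i_lm, 0.0)      # i_lm is orthogonal to R(A)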
case (ii): G_x and G_y positive definite

L̂ = (A^-_{lm})_{weighted} = G_x^{-1}E'(EG_x^{-1}E')^{-1}(D'G_y D)^{-1}D'G_y         (5.64)

L̂ = E^-_R(weighted) D^-_L(weighted),  E^-_R = E^-_m the weighted right inverse,
                                       D^-_L = D^-_l the weighted left inverse       (5.65)

x_lm = (A^-_{lm})_{G_y, G_x} y = A^+_{G_y, G_x} y =
     = G_x^{-1}E'(EG_x^{-1}E')^{-1}(D'G_y D)^{-1}D'G_y y .                           (5.66)

The unknown vector x_lm has the weighted minimum Euclidean length

||x_lm||^2_{G_x} = x_lm' G_x x_lm = y'(A^+)'G_x A^+ y =
= y'G_y D(D'G_y D)^{-1}(EG_x^{-1}E')^{-1}(D'G_y D)^{-1}D'G_y y .                     (5.67)

y = y_lm + i_lm                                                                      (5.68)

is an orthogonal decomposition of the observation vector y ∈ Y = ℝ^n into

Ax_lm = y_lm ∈ R(A)  and  y - Ax_lm =: i_lm ∈ R(A)^⊥ ,                               (5.69)

the vector of inconsistency:

y_lm = AA^+_{G_y, G_x} y  and  i_lm = (I_n - AA^+_{G_y, G_x})y ,                     (5.70)

AA^+_{G_y, G_x} = P_{R(A)} ,  I_n - AA^+_{G_y, G_x} = P_{R(A)^⊥} ,

which are G_y-orthogonal:

<i_lm | y_lm>_{G_y} = 0  or  (I_n - AA^+(weighted))'G_y A = 0 .                      (5.71)

The "goodness of fit" of G_x, G_y-MINOLESS is

||y - Ax_lm||^2_{G_y} = ||i_lm||^2_{G_y} =
= y'[I_n - AA^+_{G_y, G_x}]'G_y[I_n - AA^+_{G_y, G_x}]y =
= y'[I_n - D(D'G_y D)^{-1}D'G_y]'G_y[I_n - D(D'G_y D)^{-1}D'G_y]y =
= y'[G_y - G_y D(D'G_y D)^{-1}D'G_y]y .                                              (5.72)

While Lemma 5.4 took advantage of rank factorization, Lemma 5.5 will alterna-
tively focus on additive rank partitioning.
Lemma 5.5 (G_x-minimum norm, G_y-least squares solution: MINOLESS, additive rank partitioning):

x_lm = L̂y is the G_x-minimum norm, G_y-least squares solution (MINOLESS) of (5.1),

{Ax + i = y | A ∈ ℝ^{n×m}, r := rk A = rk(A'G_y A) < min{n, m}} ,

if it is represented by additive rank partitioning

A = [A_11, A_12; A_21, A_22],  A_11 ∈ ℝ^{r×r}, A_12 ∈ ℝ^{r×(m-r)},
                               A_21 ∈ ℝ^{(n-r)×r}, A_22 ∈ ℝ^{(n-r)×(m-r)} ,          (5.73)

subject to the rank identity

rk A = rk A_11 = r                                                                   (5.74)

as

case (i): G_y = I_n, G_x = I_m

L̂ = A^-_{lm} = [N_11'; N_12'] (N_12 N_12' + N_11 N_11')^{-1} [A_11', A_21']          (5.75)

subject to

N_11 := A_11'A_11 + A_21'A_21 ,  N_12 := A_11'A_12 + A_21'A_22                       (5.76)
N_21 := A_12'A_11 + A_22'A_21 ,  N_22 := A_12'A_12 + A_22'A_22                       (5.77)

or

x_lm = [N_11'; N_12'] (N_12 N_12' + N_11 N_11')^{-1} [A_11', A_21'] [y_1; y_2] .     (5.78)

The unknown vector x_lm has the minimum Euclidean length

||x_lm||^2 = x_lm' x_lm =
= [y_1', y_2'] [A_11; A_21] (N_12 N_12' + N_11 N_11')^{-1} [A_11', A_21'] [y_1; y_2] .  (5.79)

y = y_lm + i_lm

is an orthogonal decomposition of the observation vector y ∈ Y = ℝ^n into

Ax_lm = y_lm ∈ R(A)  and  y - Ax_lm =: i_lm ∈ R(A)^⊥ ,                               (5.80)

the vector of inconsistency.

y_lm = Ax_lm = AA^-_{lm} y  and  i_lm = y - Ax_lm = (I_n - AA^-_{lm})y

are projections onto R(A) and R(A)^⊥, respectively. i_lm and y_lm are orthogonal in the sense of <i_lm | y_lm> = 0 or (I_n - AA^-_{lm})'A = 0.

The "goodness of fit" of MINOLESS is

||y - Ax_lm||^2 = ||i_lm||^2 = y'(I_n - AA^-_{lm})y .

I_n - AA^-_{lm}, with rk(I_n - AA^-_{lm}) = n - rk A = n - r, is the rank deficient a posteriori weight matrix (G_y)_lm.

case (ii): G_x and G_y positive definite

L̂ = (A^-_{lm})_{G_y, G_x} .

5-22 (G_x, G_y)-MINOS and its generalized inverse

A more formal version of the generalized inverse which is characteristic for G_x-MINOS, G_y-LESS or (G_x, G_y)-MINOS is presented by

Lemma 5.6 (characterization of G_x, G_y-MINOS):

rk(A'G_y A) = rk A  ~  R(A'G_y) = R(A')                                    (5.81), (5.82)

is assumed. x_lm = Ly is (G_x, G_y)-MINOLESS of (5.1) for all y ∈ ℝ^n if and only if the matrix L ∈ ℝ^{m×n} fulfils the four conditions

G_y A L A = G_y A                                                                    (5.83)
G_x L A L = G_x L                                                                    (5.84)
G_y A L = (G_y A L)'                                                                 (5.85)
G_x L A = (G_x L A)' .                                                               (5.86)

In this case G_x x_lm = G_x Ly is always unique. L, fulfilling the four conditions, is called the weighted MINOS inverse or weighted Moore-Penrose inverse.
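The four conditions (5.83)-(5.86) can be verified numerically. The sketch below is not from the book; A, G_x and G_y are hypothetical (G_x and G_y positive definite), and the representation (5.54) is used to build L.

import numpy as np

rng = np.random.default_rng(2)
n, m, r = 6, 4, 2
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # rank deficient design
Gx = np.eye(m) + 0.1 * np.ones((m, m))                          # positive definite metric of X
Gy = np.diag(rng.uniform(0.5, 2.0, size=n))                     # positive definite metric of Y

N = A.T @ Gy @ A
L = np.linalg.inv(Gx) @ N @ np.linalg.pinv(N @ np.linalg.inv(Gx) @ N) @ A.T @ Gy

assert np.allclose(Gy @ A @ L @ A, Gy @ A)        # (5.83)
assert np.allclose(Gx @ L @ A @ L, Gx @ L)        # (5.84)
assert np.allclose(Gy @ A @ L, (Gy @ A @ L).T)    # (5.85)
assert np.allclose(Gx @ L @ A, (Gx @ L @ A).T)    # (5.86)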
:Proof:

The equivalence of (5.81) and (5.82) follows from R(A'G_y) = R(A'G_y A).

(i) G_y ALA = G_y A and G_y AL = (G_y AL)'.

Condition (i) G_y ALA = G_y A and condition (iii) G_y AL = (G_y AL)' are a consequence of G_y-LESS:

||i||^2_{G_y} = ||y - Ax||^2_{G_y} = min over x  ⇒  A'G_y A x_l = A'G_y y .

If G_x is positive definite, we can represent the four conditions (i)-(iv) of L, the (G_x, G_y)-MINOS inverse of A, by two alternative solutions L_1 and L_2, namely

AL_1 = A(A'G_y A)^- A'G_y A L_1 = A(A'G_y A)^- A'L_1'A'G_y =
     = A(A'G_y A)^- A'G_y =
     = A(A'G_y A)^- A'L_2'A'G_y = A(A'G_y A)^- A'G_y A L_2 = AL_2

and

L_2 A = G_x^{-1}(A'L_2'G_x) = G_x^{-1}(A'L_2'A'L_2'G_x) = G_x^{-1}(A'L_1'A'L_2'G_x) =
      = G_x^{-1}(A'L_1'G_x L_2 A) = G_x^{-1}(G_x L_1 A L_2 A) =
      = G_x^{-1}(G_x L_1 A L_1 A) = L_1 A ,

L_1 = G_x^{-1}(G_x L_1 A L_1) = G_x^{-1}(G_x L_2 A L_2) = L_2

concludes our proof.


The inequality

||x_lm||^2_{G_x} = ||Ly||^2_{G_x} ≤ ||Ly||^2_{G_x} + 2y'L'G_x(I_m - LA)z +
                 + ||(I_m - LA)z||^2_{G_x}   for all y ∈ ℝ^n and arbitrary z ∈ ℝ^m   (5.87)

is fulfilled if and only if the "condition of G_x-orthogonality"

L'G_x(I_m - LA) = 0                                                                  (5.88)

applies. An equivalent statement is

L'G_x = L'G_x LA  or  L'G_x L = L'G_x LAL ,

the second identity being produced by multiplying the first from the left by L. The left side of this equation is a symmetric matrix. Consequently, the right side has to be a symmetric matrix, too:

G_x LA = (G_x LA)' .

Such an identity agrees with condition (iv). As soon as we substitute it in the "condition of G_x-orthogonality" we are led to

L'G_x = L'G_x LA ⇒ G_x L = (G_x LA)'L = G_x LAL ,

a result which agrees with condition (ii).

? How to prove uniqueness of A^{1,2,3,4} = A^-_{lm} = A^+ ?
Uniqueness of G_x x_lm can be taken from Lemma 1.4 (characterization of G_x-MINOS).

Substitute x_l = Ly and multiply the left side by L:

A'G_y A L y = A'G_y y ⇔ A'G_y AL = A'G_y ,

L'A'G_y AL = L'A'G_y ⇔ G_y AL = (G_y AL)' = L'A'G_y .

The left side of the equation L'A'G_y AL = L'A'G_y is a symmetric matrix. Consequently, the right side has to be symmetric, too. Indeed we have proven condition (iii), (G_y AL)' = G_y AL. Let us transplant the symmetry condition (iii) into the original normal equations in order to benefit from

A'G_y AL = A'G_y  or  G_y A = L'A'G_y A = (G_y AL)'A = G_y ALA .

Indeed, we have succeeded in proving condition (i).

(ii) G_x LAL = G_x L and G_x LA = (G_x LA)'.

Condition (ii) G_x LAL = G_x L and condition (iv) G_x LA = (G_x LA)' are a consequence of G_x-MINOS.

The general solution of the normal equations A'G_y A x_l = A'G_y y is

x_l = x_lm + [I_m - (A'G_y A)^-(A'G_y A)]z                                           (5.89)

for an arbitrary vector z ∈ ℝ^m. A'G_y ALA = A'G_y A implies

x_l = x_lm + [I_m - (A'G_y A)^- A'G_y ALA]z =
    = x_lm + [I_m - LA]z .
Note 1:
The following conditions are equivalent:

(1st)  (1) AA^- A = A
       (2) A^- AA^- = A^-
       (3) (AA^-)'G_y = G_y AA^-
       (4) (A^- A)'G_x = G_x A^- A

(2nd)  A^# G_y AA^- = A'G_y
       (A^-)'G_x A^- A = (A^-)'G_x

"if G_x and G_y are positive definite matrices, then

A^# G_y = G_x A^#   or   A^# = G_x^{-1}A'G_y

are representations for the adjoint matrix"

"if G_x and G_y are positive definite matrices, then

A'G_y AA^- = A'G_y ,
(A^-)'G_x A^- A = (A^-)'G_x"

(3rd)  AA^- = P_{R(A)}
       A^- A = P_{R(A^-)} .


The concept of a generalized inverse of an arbitrary matrix is originally due to E.H. Moore (1920), who used the 3rd definition. R. Penrose (1955), unaware of E.H. Moore's work, defined a generalized inverse by the 1st definition with G_x = I_m, G_y = I_n as unit matrices, which is the same as the Moore inverse. Y. Tseng (1949a, b, 1956) defined a generalized inverse of a linear operator between function spaces by means of

AA^- = P_{cl R(A)} ,   A^- A = P_{cl R(A^-)} ,

where cl R(A) and cl R(A^-), respectively, are the closures of R(A) and R(A^-). The Tseng inverse has been reviewed by B. Schaffrin, E. Heidenreich and E. Grafarend (1977). A. Bjerhammar (1951, 1956, 1957) initiated the notion of the least-squares generalized inverse. C.R. Rao (1967) presented the first classification of g-inverses.
Note 2:
Let ||y||_{G_y} = (y'G_y y)^{1/2} and ||x||_{G_x} = (x'G_x x)^{1/2}, where G_y and G_x are positive semidefinite. If there exists a matrix A^- which satisfies the definitions of Note 1, then it is necessary, but not sufficient, that

(1) G_y AA^- A = G_y A
(2) G_x A^- AA^- = G_x A^-
(3) (AA^-)'G_y = G_y AA^-
(4) (A^- A)'G_x = G_x A^- A .

Note 3:
A g-inverse which satisfies the conditions of Note 1 is denoted by A^+_{G_y, G_x} and referred to as the G_y, G_x-MINOLESS g-inverse of A. A^+_{G_y, G_x} is unique if G_x is positive definite. When both G_x and G_y are general positive semidefinite matrices, A^+_{G_y, G_x} may not be unique. If |G_x + A'G_y A| ≠ 0 holds, A^+_{G_y, G_x} is unique.

Note 4:
If the matrices of the metric are positive definite, G_x ≠ 0, G_y ≠ 0, then
(i)  (A^+_{G_y, G_x})^+_{G_x, G_y} = A ,
(ii) (A^+_{G_y, G_x})^# = (A')^+_{G_x^{-1}, G_y^{-1}} .

5-23 Eigenvalue decomposition of (G_x, G_y)-MINOLESS

For the system analysis of an inverse problem the eigenspace analysis and eigenspace synthesis of x_lm, the (G_x, G_y)-MINOLESS of x, is very useful and gives some peculiar insight into a dynamical system. Accordingly we are confronted with the problem to develop "canonical MINOLESS", also called the eigenvalue decomposition of (G_x, G_y)-MINOLESS.

First we refer again to the canonical representation of the parameter space X as well as the observation space Y introduced in the first chapter, Box 1.6 and Box 1.9. But we add here, by means of Box 5.8, the forward and backward transformations between the general bases and the orthogonal bases spanning the parameter space X as well as the observation space Y. In addition, we refer to Definition 1.5 and Lemma 1.6 where the adjoint operator A^# has been introduced and represented.
Box 5.8
General bases versus orthogonal bases spanning the parameter space X as well as the observation space Y

"left": parameter space X                 "right": observation space Y

general left base:                        general right base:
span{a_1, ..., a_m} = X                   Y = span{b_1, ..., b_n}

matrix of the metric:                     matrix of the metric:
aa' = G_x              (5.90)             bb' = G_y              (5.91)

orthogonal left base:                     orthogonal right base:
span{e^x_1, ..., e^x_m} = X               Y = span{e^y_1, ..., e^y_n}

matrix of the metric:                     matrix of the metric:
e_x e_x' = I_m         (5.92)             e_y e_y' = I_n         (5.93)

base transformation:                      base transformation:
a = Λ_x^{1/2} V e_x    (5.94)             b = Λ_y^{1/2} U e_y    (5.95)
versus                                    versus
e_x = V'Λ_x^{-1/2} a   (5.96)             e_y = U'Λ_y^{-1/2} b   (5.97)

span{e^x_1, ..., e^x_m} = X               Y = span{e^y_1, ..., e^y_n} .
Second, we are solving the general system of linear equations

{y = Ax | A ∈ ℝ^{n×m}, rk A < min{n, m}}

by introducing

• the eigenspace of the rank deficient, rectangular matrix of rank r := rk A < min{n, m}: A → A*
• the left and right canonical coordinates: x → x*, y → y* ,

as supported by Box 5.9. The transformations x → x* (5.98), y → y* (5.99) from the original coordinates (x_1, ..., x_m) to the canonical coordinates (x*_1, ..., x*_m), the left star coordinates, as well as from the original coordinates (y_1, ..., y_n) to the canonical coordinates (y*_1, ..., y*_n), the right star coordinates, are polar decompositions: a rotation {U, V} is followed by a general stretch {G_y^{1/2}, G_x^{1/2}}. Those root matrices are generated by product decompositions of type G_y = (G_y^{1/2})'G_y^{1/2} as well as G_x = (G_x^{1/2})'G_x^{1/2}. Let us substitute the inverse transformations (5.100) x = G_x^{-1/2}Vx* and (5.101) y = G_y^{-1/2}Uy* into the system of linear equations (5.1), namely (5.102) y = Ax + i, rk A < min{n, m}, or its dual (5.103) y* = A*x* + i*. Such an operation leads us to (5.104) y* = f(x*) as well as (5.105) y = f(x). Subject to the orthonormality conditions (5.106) U'U = I_n and (5.107) V'V = I_m we have generated the left-right eigenspace analysis (5.108)

A* = [Λ, O_1; O_2, O_3]

subject to the rank partitioning of the matrices U = [U_1, U_2] and V = [V_1, V_2]. Alternatively, the left-right eigenspace synthesis (5.109)

A = G_y^{-1/2}[U_1, U_2] [Λ, O_1; O_2, O_3] [V_1'; V_2'] G_x^{1/2}

is based upon the left matrix (5.110) L := G_y^{-1/2}U, decomposed into (5.112) L_1 := G_y^{-1/2}U_1 and L_2 := G_y^{-1/2}U_2, and the right matrix (5.111) R := G_x^{-1/2}V, decomposed into R_1 := G_x^{-1/2}V_1 and R_2 := G_x^{-1/2}V_2. Indeed the left matrix L by means of (5.114) LL' = G_y^{-1} reconstructs the inverse matrix of the metric of the observation space Y. Similarly, the right matrix R by means of (5.115) RR' = G_x^{-1} generates the inverse matrix of the metric of the parameter space X. In terms of "L, R" we have summarized the eigenvalue decompositions (5.118)-(5.123).

Such an eigenvalue decomposition helps us to canonically invert y* = A*x* + i* by means of (5.124), namely the "full rank partitioning" of the system of canonical linear equations y* = A*x* + i*. The observation vector y* ∈ ℝ^n is decomposed into y*_1 ∈ ℝ^{r×1} and y*_2 ∈ ℝ^{(n-r)×1}, while the vector x* ∈ ℝ^m of unknown parameters is decomposed into x*_1 ∈ ℝ^{r×1} and x*_2 ∈ ℝ^{(m-r)×1}.

(x*_1)_lm = Λ^{-1} y*_1

is canonical MINOLESS, leaving y*_2 "unrecognized" and fixing x*_2 = 0 as a "fixed datum".
Box 5.9:
Canonical representation, the general case: overdetermined and underdetermined system without full rank

"parameter space X"  versus  "observation space Y"

x* = V'G_x^{1/2} x    (5.98)        y* = U'G_y^{1/2} y    (5.99)
and
x = G_x^{-1/2} V x*   (5.100)       y = G_y^{-1/2} U y*   (5.101)

"overdetermined and underdetermined system without full rank"

{y = Ax + i | A ∈ ℝ^{n×m}, rk A < min{n, m}}

y = Ax + i  (5.102)   versus   y* = A*x* + i*  (5.103)

G_y^{-1/2}Uy* = AG_x^{-1/2}Vx* + G_y^{-1/2}Ui*   versus   U'G_y^{1/2}y = A*V'G_x^{1/2}x + U'G_y^{1/2}i

y* = (U'G_y^{1/2} A G_x^{-1/2} V)x* + i*  (5.104)   versus   y = (G_y^{-1/2} U A* V'G_x^{1/2})x + i  (5.105)

subject to

U'U = UU' = I_n  (5.106)   versus   V'V = VV' = I_m  (5.107)

"left and right eigenspace"
"left-right eigenspace analysis" versus "left-right eigenspace synthesis"

A* = [U_1'; U_2'] G_y^{1/2} A G_x^{-1/2} [V_1, V_2] = [Λ, O_1; O_2, O_3]             (5.108)

A = G_y^{-1/2}[U_1, U_2] [Λ, O_1; O_2, O_3] [V_1'; V_2'] G_x^{1/2}                   (5.109)

"dimension identities"
Λ ∈ ℝ^{r×r}, O_1 ∈ ℝ^{r×(m-r)}, U_1 ∈ ℝ^{n×r}, V_1 ∈ ℝ^{m×r}
O_2 ∈ ℝ^{(n-r)×r}, O_3 ∈ ℝ^{(n-r)×(m-r)}, U_2 ∈ ℝ^{n×(n-r)}, V_2 ∈ ℝ^{m×(m-r)}

"left eigenspace"                              "right eigenspace"
L := G_y^{-1/2}U ⇒ L^{-1} = U'G_y^{1/2} (5.110)   R := G_x^{-1/2}V ⇒ R^{-1} = V'G_x^{1/2} (5.111)
L_1 := G_y^{-1/2}U_1, L_2 := G_y^{-1/2}U_2 (5.112)   R_1 := G_x^{-1/2}V_1, R_2 := G_x^{-1/2}V_2 (5.113)
LL' = G_y^{-1} ⇒ (L^{-1})'L^{-1} = G_y (5.114)   RR' = G_x^{-1} ⇒ (R^{-1})'R^{-1} = G_x (5.115)
L^{-1} = [U_1'; U_2']G_y^{1/2} =: [L^1; L^2] (5.116)   R^{-1} = [V_1'; V_2']G_x^{1/2} =: [R^1; R^2] (5.117)

A = L A* R^{-1}  (5.118)   versus   A* = L^{-1} A R  (5.119)

A = [L_1, L_2] A* [R^1; R^2]  (5.120)   versus   A* = [Λ, O_1; O_2, O_3] = [L^1; L^2] A [R_1, R_2]  (5.121)

AA^# L_1 = L_1 Λ², AA^# L_2 = 0  (5.122)   versus   A^# A R_1 = R_1 Λ², A^# A R_2 = 0  (5.123)

"inconsistent system of linear equations without full rank"

y* = A* x* + i* = [Λ, O_1; O_2, O_3][x*_1; x*_2] + [i*_1; i*_2] = [y*_1; y*_2]       (5.124)

y*_1 ∈ ℝ^{r×1}, y*_2 ∈ ℝ^{(n-r)×1}, i*_1 ∈ ℝ^{r×1}, i*_2 ∈ ℝ^{(n-r)×1}
x*_1 ∈ ℝ^{r×1}, x*_2 ∈ ℝ^{(m-r)×1}

"if (x*, i*) is MINOLESS, then x*_2 = 0, i*_1 = 0:

(x*_1)_lm = Λ^{-1} y*_1 ."

Consult the commutative diagram of Figure 5.6 for a shortened summary of the newly introduced transformations of coordinates, both of the parameter space X as well as of the observation space Y.

[Figure 5.6: Commutative diagram of coordinate transformations: X → R(A) ⊂ Y under A, with the canonical maps V'G_x^{1/2} and U'G_y^{1/2} relating X and Y to the starred spaces X*, Y*.]
Third, we prepare ourselves for MINOLESS of the general system of linear equations

{y = Ax + i | A ∈ ℝ^{n×m}, rk A < min{n, m},
 ||i||^2_{G_y} = min subject to ||x||^2_{G_x} = min}

by introducing Lemma 5.4-5.5, namely the eigenvalue-eigencolumn equations of the matrices A^#A and AA^#, respectively, as well as Lemma 5.6, our basic result of "canonical MINOLESS", subsequently completed by proofs. Throughout we refer to the adjoint operator which has been introduced by Definition 1.5 and Lemma 1.6.

Lemma 5.7 (eigenspace analysis versus eigenspace synthesis, A ∈ ℝ^{n×m}, r := rk A < min{n, m}):

The pair of matrices {L, R} for the eigenspace analysis and the eigenspace synthesis of the rectangular matrix A ∈ ℝ^{n×m} of rank r := rk A < min{n, m}, namely

A* = L^{-1} A R   versus   A = L A* R^{-1}

or

A* = [Λ, O_1; O_2, O_3] = [L^1; L^2] A [R_1, R_2]   versus   A = [L_1, L_2] A* [R^1; R^2] ,

are determined by the eigenvalue-eigencolumn equations (eigenspace equations)

A^# A R_1 = R_1 Λ²   versus   AA^# L_1 = L_1 Λ²
A^# A R_2 = 0        versus   AA^# L_2 = 0

subject to

Λ² = Diag(λ_1², ..., λ_r²) ,  Λ = Diag(+λ_1, ..., +λ_r) .
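For the unweighted case G_x = I_m, G_y = I_n the canonical construction reduces to the singular value decomposition. The compact sketch below (not from the book; A and y are hypothetical) inverts the full-rank block Λ, sets x*_2 = 0, and reproduces the Moore-Penrose solution.

import numpy as np

rng = np.random.default_rng(3)
n, m, r = 6, 4, 2
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # rk A = r
y = rng.standard_normal(n)

U, s, Vt = np.linalg.svd(A)        # left/right eigenspaces and singular values
y_star = U.T @ y                   # canonical observations y* = U'y
x_star = np.zeros(m)
x_star[:r] = y_star[:r] / s[:r]    # (x1*)_lm = Lambda^{-1} y1*, x2* = 0
x_lm = Vt.T @ x_star               # back transformation x = V x*

assert np.allclose(x_lm, np.linalg.pinv(A) @ y)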
5-24 Notes

The algebra of eigensystems is treated in varying degrees by most books on linear algebra, in particular tensor algebra. Special mention should be made of R. Bellman's classic "Introduction to matrix analysis" (1970) and Horn and Johnson's two books (1985, 1991) on introductory and advanced matrix analysis. More or less systematic treatments of eigensystems are found in books on matrix computations. The classics of the field are Householder's "Theory of matrices in numerical analysis" (1964) and Wilkinson's "The algebraic eigenvalue problem" (1965). Golub and Van Loan's "Matrix computations" (1996) is the currently definitive survey of the field. Trefethen and Bau's "Numerical linear algebra" (1997) is a high-level, insightful treatment with a welcome stress on geometry. G.W. Stewart's "Matrix algorithms: eigensystems" (2001) is becoming a classic as well.

The term "eigenvalue" derives from the German Eigenwert, which was introduced by D. Hilbert (1904) to denote, for integral equations, the reciprocal of the matrix eigenvalue. At some point Hilbert's Eigenwerte inverted themselves and became attached to matrices. Eigenvalues have been called many things in their day. "Characteristic value" is a reasonable translation of Eigenwert. However, "characteristic" has an inconveniently large number of syllables and survives only in the terms "characteristic equation" and "characteristic polynomial". For symmetric matrices the characteristic equation and its equivalents are also called the secular equation owing to their connection with the secular perturbations in the orbits of planets. Other terms are "latent value" and "proper value", from the French "valeur propre".

Indeed, the day when purists and pedants could legitimately object to "eigenvalue" as a hybrid of German and English has long since passed. The German "eigen" has become a thoroughly naturalized English prefix meaning having to do with eigenvalues and eigenvectors. Thus we can use "eigensystem", "eigenspace" or "eigenexpansion" without fear of being misunderstood. The term "eigenpair", used to denote an eigenvalue and its eigenvector, is a recent innovation.
5-3 The hybrid approximation solution: α-HAPS and Tykhonov-Phillips regularization

G_x, G_y-MINOLESS has been built on sequential approximations. First, the surjectivity defect was secured by G_y-LESS. The corresponding normal equations suffered from the effect of the injectivity defect. Accordingly, second, G_x-MINOS generated a unique solution of the rank deficient normal equations.

Alternatively, we may constitute a unique solution of the system of inconsistent, rank deficient equations

{Ax + i = y | A ∈ ℝ^{n×m}, r := rk A < min{n, m}}

by the α-weighted hybrid norm of type "LESS" and "MINOS". Such a solution of a general algebraic regression problem is also called

• Tykhonov-Phillips regularization
• ridge estimator
• α-HAPS.

Indeed, α-HAPS is the most popular inversion operation, namely to regularize improperly posed problems. An example is the discretized version of an integral equation of the first kind.

Definition 5.8 (α-HAPS):
An m×1 vector x_h is called weighted α-HAPS (Hybrid APproximative Solution) with respect to an α-weighted G_x, G_y-seminorm of (5.1), if

x_h = arg{||y - Ax||^2_{G_y} + α||x||^2_{G_x} = min | Ax + i = y,
          A ∈ ℝ^{n×m}, rk A ≤ min{n, m}} .                                          (5.125)

Note that we may apply weighted α-HAPS even for the case of the rank identity rk A = min{n, m}. The factor α ∈ ℝ^+ balances the least squares norm and the minimum norm of the unknown vector, which is illustrated by Figure 5.7.

Figure 5.7. Balance of LESS and MINOS to general MORE


Lemma 5.9 (α-HAPS):
x_h is weighted α-HAPS of x in the general system of inconsistent, possibly rank deficient linear equations (5.1) if and only if the system of normal equations

(αG_x + A'G_y A)x_h = A'G_y y   or   (G_x + (1/α)A'G_y A)x_h = (1/α)A'G_y y          (5.126), (5.127)

is fulfilled. x_h always exists and is uniquely determined if the matrix αG_x + A'G_y A is regular or

rk[G_x, A'G_y A] = m .                                                               (5.128)

: Proof :

α-HAPS is constructed by means of the Lagrangean

L(x) := ||y - Ax||^2_{G_y} + α||x||^2_{G_x} = (y - Ax)'G_y(y - Ax) + α(x'G_x x) = min ,

such that the first derivatives

(dL/dx)(x_h) = 2(αG_x + A'G_y A)x_h - 2A'G_y y = 0

constitute the necessary conditions. Let us refer to the theory of vector derivatives in Appendix B. The second derivatives

(∂²L/∂x∂x')(x_h) = 2(αG_x + A'G_y A) ≥ 0

generate the sufficiency conditions for obtaining the minimum of the unconstrained Lagrangean. If αG_x + A'G_y A is regular or rk[G_x, A'G_y A] = m, there exists a unique solution.

Lemma 5.10 (α-HAPS):
If x_h is α-HAPS of x in the general system of inconsistent, possibly rank deficient linear equations (5.1) fulfilling the rank identity

rk[G_x, A'G_y A] = m   or   det(αG_x + A'G_y A) ≠ 0 ,

then

x_h = (αG_x + A'G_y A)^{-1} A'G_y y
or
x_h = (1/α)(G_x + (1/α)A'G_y A)^{-1} A'G_y y
or
x_h = (αI_m + G_x^{-1}A'G_y A)^{-1} G_x^{-1}A'G_y y
or
x_h = (1/α)(I_m + (1/α)G_x^{-1}A'G_y A)^{-1} G_x^{-1}A'G_y y

are four representations of the unique solution.
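A minimal numerical sketch of Lemma 5.10 (not from the book; A, y and the value of α are hypothetical): for G_x = I_m, G_y = I_n the α-HAPS solution is the classical ridge/Tykhonov-Phillips estimate, and for α → 0 it approaches the MINOLESS solution A^+ y.

import numpy as np

rng = np.random.default_rng(4)
n, m, r = 6, 4, 2
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
y = rng.standard_normal(n)
Gx, Gy = np.eye(m), np.eye(n)

def alpha_haps(alpha):
    # first representation of Lemma 5.10: (alpha*Gx + A'GyA)^{-1} A'Gy y
    return np.linalg.solve(alpha * Gx + A.T @ Gy @ A, A.T @ Gy @ y)

x_ridge = alpha_haps(1.0)                  # biased but stable solution
x_limit = alpha_haps(1e-8)                 # close to the MINOLESS limit
assert np.allclose(x_limit, np.linalg.pinv(A) @ y, atol=1e-4)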
6 The third problem of probabilistic regression
– special Gauss-Markov model with datum problem –
Setup of BLUMBE and BLE for the moments of first order and of BIQUUE and BIQE for the central moment of second order

{y = Aξ + e_y, A ∈ ℝ^{n×m}, rk A < min{n, m}}

:Fast track reading:
Read only Definition 6.1, Theorem 6.3, Definition 6.4-6.6, Theorem 6.8-6.11.

Reading guide:
Definition 6.1: ξ̂ hom Σ_y, S-BLUMBE
Lemma 6.2: ξ̂ hom Σ_y, S-BLUMBE of ξ
Theorem 6.3: hom Σ_y, S-BLUMBE of ξ
Lemma 6.4: Ê{y}, D{Aξ̂}, D{ẽ_y}
Theorem 6.5: σ̂² BIQUUE of σ²
Theorem 6.6: σ̂² BIQE of σ²
Definition 6.7: ξ̂ hom BLE of ξ
Definition 6.8: ξ̂ S-BLE of ξ
Definition 6.9: ξ̂ hom hybrid var-min bias
Lemma 6.10: hom BLE, hom S-BLE, hom α-BLE
Theorem 6.11: ξ̂ hom BLE
Theorem 6.12: ξ̂ hom S-BLE
Theorem 6.13: ξ̂ hom α-BLE

Definition 6.1 and Lemma 6.2, Theorem 6.3, Lemma 6.4, Theorems 6.5 and 6.6 review ξ̂ of type hom Σ_y, S-BLUMBE together with BIQUUE and BIQE, followed by the first example. Alternatively, estimators of type best linear, namely hom BLE, hom S-BLE and hom α-BLE, are presented. Definitions 6.7, 6.8 and 6.9 relate to these various estimators, followed by Lemma 6.10 and Theorems 6.11, 6.12 and 6.13.
In the fifth chapter we have solved a special algebraic regression problem, namely the inversion of a system of inconsistent linear equations with a datum defect. By means of a hierarchic postulate of a minimum norm ||x||² = min, least squares solution ||y - Ax||² = min ("MINOLESS") we were able to determine m unknowns from n observations, though the rank of the linear operator, rk A = r < min{n, m}, was less than the number of observations or less than the number of unknowns. Though "MINOLESS" generates a rigorous solution, we were left with the problem to interpret our results.

The key for an evaluation of "MINOLESS" is handed over to us by treating the special algebraic problem by means of a special probabilistic regression problem, namely as a special Gauss-Markov model with datum defect. The bias generated by any solution of a rank deficient system of linear equations will again be introduced as a decisive criterion for evaluating "MINOLESS", now in the context of a probabilistic regression problem. In particular, a special form of "LUMBE", the linear uniformly minimum bias estimator ||LA - I|| = min, leads us to a solution which is equivalent to "MINOS". "Best" LUMBE in the sense of the average variance ||D{ξ̂}||² = tr D{ξ̂} = min, also called BLUMBE, will give us a unique solution ξ̂ as a linear estimation based on the observation vector y ∈ {Y, pdf} with respect to the linear model E{y} = Aξ, D{y} = Σ_y of "fixed effects" ξ ∈ Ξ.

Alternatively, in the fifth chapter we had solved the ill-posed problem y = Ax + i, A ∈ ℝ^{n×m}, rk A < min{n, m}, by means of α-HAPS. Here, with respect to a special probabilistic regression problem, we succeed to compute α-BLE (α-weighted, S-modified Best Linear Estimation) as an equivalent to α-HAPS of a special algebraic regression problem. Most welcome is the analytical optimization problem to determine the regularization parameter α by means of a special form of ||MSE{α}||² = min, the weighted Mean Square Estimation Error. Such an optimal design of the regulator α is not possible in the Tykhonov-Phillips regularization in the context of α-HAPS, but definitely in the context of α-BLE.

6-1 Setup of the best linear minimum bias estimator of type BLUMBE

Box 6.1 is a definition of our special linear Gauss-Markov model with datum defect. We assume (6.1) E{y} = Aξ, rk A < min{n, m} (1st moments) and (6.2) D{y} = Σ_y, Σ_y positive definite, rk Σ_y = n (2nd moments). Box 6.2 reviews the bias vector as well as the bias matrix including the related norms.

Box 6.1
Special linear Gauss-Markov model with datum defect

{y = Aξ + e_y, A ∈ ℝ^{n×m}, rk A < min{n, m}}

1st moments:
E{y} = Aξ , rk A < min{n, m}                                                         (6.1)

2nd moments:
D{y} =: Σ_y , Σ_y positive definite, rk Σ_y = n,                                      (6.2)

ξ ∈ ℝ^m, vector of "fixed effects", unknown;
Σ_y unknown or known from prior information.
Box 6.2
Bias vector, bias matrix, vector and matrix bias norms
Special linear Gauss-Markov model of fixed effects subject to datum defect

A ∈ ℝ^{n×m}, rk A < min{n, m}
E{y} = Aξ, D{y} = Σ_y                                                                (6.3)

"ansatz"
ξ̂ = Ly                                                                              (6.4)

bias vector
β := E{ξ̂ - ξ} = E{ξ̂} - ξ ≠ 0                                                       (6.5)
β = LE{y} - ξ = -[I_m - LA]ξ ≠ 0                                                     (6.6)

bias matrix
B := I_m - LA                                                                        (6.7)

"bias norms"
||β||² = β'β = ξ'[I_m - LA]'[I_m - LA]ξ                                              (6.8)
||β||² = tr ββ' = tr[I_m - LA]ξξ'[I_m - LA]' = ||B||²_{ξξ'}                           (6.9)
||β||²_S := tr[I_m - LA]S[I_m - LA]' = ||B||²_S                                       (6.10)

"dispersion matrix"
D{ξ̂} = LD{y}L' = LΣ_y L'                                                            (6.11)

"dispersion norm, average variance"
||D{ξ̂}||² := tr LD{y}L' = tr LΣ_y L' =: ||L'||²_{Σ_y}                                (6.12)

"decomposition"
ξ̂ - ξ = (ξ̂ - E{ξ̂}) + (E{ξ̂} - ξ)                                                   (6.13)
ξ̂ - ξ = L(y - E{y}) - [I_m - LA]ξ                                                    (6.14)

"Mean Square Estimation Error"
MSE{ξ̂} := E{(ξ̂ - ξ)(ξ̂ - ξ)'}                                                       (6.15)
MSE{ξ̂} = LD{y}L' + [I_m - LA]ξξ'[I_m - LA]'                                          (6.16)

"modified Mean Square Estimation Error"
MSE_S{ξ̂} := LD{y}L' + [I_m - LA]S[I_m - LA]'                                         (6.17)

"MSE norms, average MSE"
||MSE{ξ̂}||² := tr E{(ξ̂ - ξ)(ξ̂ - ξ)'}                                               (6.18)
||MSE{ξ̂}||² = tr LD{y}L' + tr[I_m - LA]ξξ'[I_m - LA]' =
            = ||L'||²_{Σ_y} + ||(I_m - LA)'||²_{ξξ'}                                  (6.19)
||MSE_S{ξ̂}||² := tr LD{y}L' + tr[I_m - LA]S[I_m - LA]' =
              = ||L'||²_{Σ_y} + ||(I_m - LA)'||²_S .                                  (6.20)
Definition 6.1 defines ξ̂ (1st) as a linear homogeneous form, (2nd) of type "minimum bias" and (3rd) of type "smallest average variance".

Chapter 6-11 is a collection of definitions, lemmas and theorems basic for the developments in the future.

6-11 Definitions, lemmas and theorems

Definition 6.1 (ξ̂ hom Σ_y, S-BLUMBE):
An m×1 vector ξ̂ = Ly is called homogeneous Σ_y, S-BLUMBE (homogeneous Best Linear Uniformly Minimum Bias Estimation) of ξ in the special inconsistent linear Gauss-Markov model of fixed effects of Box 6.1, if

(1st) ξ̂ is a homogeneous linear form

ξ̂ = Ly ,                                                                            (6.21)

(2nd) in comparison to all other linear estimations ξ̂ has the minimum bias in the sense of

||B||²_S := ||(I_m - LA)'||²_S = min over L ,                                         (6.22)

(3rd) in comparison to all other minimum bias estimations ξ̂ has the smallest average variance in the sense of

||D{ξ̂}||² = tr LΣ_y L' = ||L'||²_{Σ_y} = min over L .                                (6.23)
The estimation ξ̂ of type hom Σ_y, S-BLUMBE can be characterized by

Lemma 6.2 (ξ̂ hom Σ_y, S-BLUMBE of ξ):
An m×1 vector ξ̂ = Ly is hom Σ_y, S-BLUMBE of ξ in the special inconsistent linear Gauss-Markov model with fixed effects of Box 6.1 if and only if the matrix L fulfils the normal equations

[Σ_y, ASA'; ASA', 0] [L'; Λ] = [0; AS]                                               (6.24)

with the n×n matrix Λ of "Lagrange multipliers".

: Proof :
First, we minimize the S-modified bias matrix norm, second the MSE( ȟ̂ ) matrix
norm. All matrix norms have been chosen “Frobenius”.
(i) || (I m  LA) ' ||S2 = min .
L

The S -weighted Frobenius matrix norm || (I m  LA ) ' ||S2 establishes the La-
grangean
L (L) := tr(I m  LA)S(I m  LA) ' = min
L

for S-BLUMBE .
ª ASA ' Lˆ ' AS = 0
L (L) = min œ «
¬ ( ASA ') … I m > 0,
L

according to Theorem 2.3.


ª Ȉy ASA cº ª C1 C2 º ª Ȉ y ASA cº ª Ȉ y ASA cº
=« (6.25)
« ASA c
¬ 0 »¼ «¬C3 » «
C4 ¼ ¬ ASA c »
0 ¼ ¬ ASA c 0 »¼

Ȉ y C1 Ȉ y + Ȉ y C2 ASA c + ASAcC3 Ȉ y + ASAcC4 ASAc = Ȉ y (6.26)

Ȉ y C1 ASAc + ASA cC3 ASAc = ASAc (6.27)

ASA cC1 Ȉ y + ASA cC2 ASA c = ASA c (6.28)

ASA cC1ASA c = 0. (6.29)

Let us multiply the third identity by Ȉ -1y ASAc = 0 from the right side and substi-
tute the fourth identity in order to solve for C2 .
ASAcC2 ASAcȈ y1 ASAc = ASAcȈ y1 ASAc (6.30)
6-1 Setup of the best linear minimum bias estimator of type BLUMBE 291

C2 = Ȉ -1y ASA c( ASA cȈ y1 ASAc)  (6.31)

solves the fifth equation


A cSAȈ-1y ASA c( ASA cȈ-y1ASA c)  ASA cȈ-y1ASA c =
(6.32)
= ASA cȈ-y1ASA c

by the axiom of a generalized inverse.


(ii) || L ' ||2Ȉ = min .
y L

The Ȉ y -weighted Frobenius matrix norm of L subject to the condition of


LUMBE generates the constrained Lagrangean
L (L, ȁ) = tr LȈ y L '+ 2 tr ȁ '( ASA ' L ' AS) = min .
L,ȁ

According to the theory of matrix derivatives outlined in Appendix B


wL ˆ ˆ ˆ ) ' = 0,
(L, ȁ ) = 2( Ȉ y Lˆ '+ ASA ' ȁ
wL
wL ˆ ˆ
(L, ȁ ) = 2( ASA ' L ' AS) = 0 ,

ˆ ) constitute the necessary conditions, while
at the “point” (Lˆ , ȁ

w2L ˆ ) = 2( Ȉ … I ) > 0 ,
(Lˆ , ȁ y m
w (vec L) w (vec L ')

to be a positive definite matrix, the sufficiency conditions. Indeed, the first matrix
derivations have been identified as the normal equations of the sequential opti-
mization problem.
h
For an explicit representation of ȟ̂ hom Ȉ y , S-BLUMBE of ȟ we solve the nor-
mal equations for

Lˆ = arg{|| D(ȟˆ ) ||= min | ASA ' L ' AS = 0} .


L

In addition, we compute the dispersion matrix D{ȟˆ | hom BLUMBE} as well as


the mean square estimation error MSE{ȟˆ | hom BLUMBE}.
Theorem 6.3 (hom Σ_y, S-BLUMBE of ξ):
Let ξ̂ = Ly be hom Σ_y, S-BLUMBE in the special Gauss-Markov model of Box 6.1. Then, independent of the choice of the generalized inverse (ASA'Σ_y^{-1}ASA')^-, the unique solution of the normal equations (6.24) is

ξ̂ = SA'(ASA'Σ_y^{-1}ASA')^- ASA'Σ_y^{-1} y ,                                         (6.33)

completed by the dispersion matrix

D{ξ̂} = SA'(ASA'Σ_y^{-1}ASA')^- AS ,                                                  (6.34)

the bias vector

β = -[I_m - SA'(ASA'Σ_y^{-1}ASA')^- ASA'Σ_y^{-1}A]ξ ,                                 (6.35)

and the matrix MSE{ξ̂} of mean square estimation errors

E{(ξ̂ - ξ)(ξ̂ - ξ)'} = D{ξ̂} + ββ'                                                    (6.36)

modified by

E{(ξ̂ - ξ)(ξ̂ - ξ)'} = D{ξ̂} + [I_m - LA]S[I_m - LA]' =
= D{ξ̂} + [S - SA'(ASA'Σ_y^{-1}ASA')^- ASA'Σ_y^{-1}AS] ,                               (6.37)

based upon the substitution of ξξ' by S.

rk MSE{ξ̂} = rk S                                                                    (6.38)

is the corresponding rank identity.
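The BLUMBE formula (6.33) is easy to evaluate numerically. The sketch below is not from the book; A and y are hypothetical, numpy.linalg.pinv serves as one admissible g-inverse, and the special choice S = I_m, Σ_y = I_n is used so that the result can be compared with the MINOLESS solution of Chapter 5 (that equivalence is spelled out in Section 6-121).

import numpy as np

rng = np.random.default_rng(5)
n, m, r = 5, 4, 2
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # rk A = 2 < min(n, m)
y = rng.standard_normal(n)
S, Sigma_y = np.eye(m), np.eye(n)                               # substitute matrix and dispersion

W = np.linalg.inv(Sigma_y)
M = A @ S @ A.T                                                 # A S A'
xi_hat = S @ A.T @ np.linalg.pinv(M @ W @ M) @ M @ W @ y        # (6.33)
D_xi = S @ A.T @ np.linalg.pinv(M @ W @ M) @ A @ S              # (6.34), up to sigma^2

# for S = I, Sigma_y = I the BLUMBE estimate equals the MINOLESS solution
assert np.allclose(xi_hat, np.linalg.pinv(A) @ y)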


:Proof:

(i) ȟ̂ hom Ȉ y , S-BLUMBE of ȟ .

First, we prove that the matrix of the normal equations


ª Ȉy ASA cº ª Ȉy ASAcº
« ASA c , =0
¬ 0 »¼ « ASA c
¬ 0 »¼

is singular.
Ȉy ASAc
=| Ȉ y | |  ASAcȈ y 1 ASAc |= 0 ,
ASA c 0

due to rk[ ASAcȈ y1ASAc] = rk A < min{n, m} assuming rk S = m , rk Ȉ y = n .


Note
A11 A12 ª A  \ m ×m
1 1

=| A11 | | A 22  A 21 A111 A12 | if « 11


A 21 A 22 ¬ rk A11 = m1

with reference to Appendix A. Thanks to the rank deficiency of the partitioned


normal equation matrix, we are forced to compute secondly its generalized in-
verse.
The system of normal equations is solved for
ªLˆ cº ª Ȉ y ASA cº ª 0 º ª C1 C2 º ª 0 º
« »=« = (6.39)
«¬ ȁ »¼ ¬ ASA c
ˆ 0 »¼ «¬ AS »¼ «¬C3 C4 »¼ «¬ AS »¼

Lˆ c = C2 AS (6.40)

œ Lˆ = SA cCc2 (6.41)

Lˆ = SA( ASA cȈ y1ASA c)  ASA cȈ y1 (6.42)

such that

ȟˆ = Ly
ˆ = SA c( ASA cȈ 1ASA c)  ASAcȈ 1y.
y y (6.43)

We leave the proof for


“ SA c( ASA cȈ y1ASA c)  ASA cȈ y1
is a weighted pseudo-inverse or Moore-Penrose inverse” as an exercise.

(ii) Dispersion matrix D{ȟˆ} .


The residual vector

ȟˆ  E{y} = Lˆ (y  E{y}) (6.44)

leads to the variance-covariance matrix

D{ȟˆ} = LȈ
ˆ Lˆ c =
y

= SA c( ASA cȈ y1ASA c)  ASAcȈ y1 ASA c( ASA cȈ y1 ASA c)  AS = (6.45)


= SAc( ASAcȈ ASAc) AS .
1
y


(iii) Bias vector E

ȕ := E{ȟˆ  ȟ} = (I m  LA
ˆ )ȟ =
(6.46)
= [I m  SA c( ASA cȈ y1ASA c)  ASA cȈ y1A]ȟ .

Such a bias vector is not accessible to observations since ȟ is unknown. Instead


it is common practice to replace ȟ by ȟ̂ (BLUMBE), the estimation ȟ̂ of ȟ of
type BLUMBE.
(iv) Mean Square Estimation Error MSE{ȟˆ}

MSE{ȟˆ} := E{(ȟˆ  ȟ )(ȟˆ  ȟ )c} = D{ȟˆ} + ȕȕ c =


(6.47)
ˆ Lˆ c + (I  LA
= LȈ ˆ )ȟȟ c(I  LA ˆ )c .
y m m

Neither D{ξ̂ | Σ_y} nor ββ' is accessible to measurements. ξξ' is replaced by C.R. Rao's substitute matrix S, and Σ_y = Vσ² by a one-variance-component model, with σ² estimated by σ̂²(BIQUUE) or σ̂²(BIQE), for instance.
h
Lemma 6.4 (Ê{y}, D{Aξ̂}, ẽ_y, D{ẽ_y} for ξ̂ of type V, S-BLUMBE of ξ):

(i) With respect to the model (1st) Aξ = E{y}, E{y} ∈ R(A), rk A =: r ≤ m, and Vσ² = D{y}, V positive definite, rk V = n, and under the condition dim R(SA') = rk(SA') = rk A = r, namely V, S-BLUMBE, the estimate of E{y} is given by

Ê{y} = Aξ̂ = A(A'V^{-1}A)^- A'V^{-1} y                                                (6.48)

with the related singular dispersion matrix

D{Aξ̂} = σ² A(A'V^{-1}A)^- A'                                                         (6.49)

for any choice of the generalized inverse (A'V^{-1}A)^-.

(ii) The empirical error vector ẽ_y = y - Ê{y} results in the residual error vector ẽ_y = y - Aξ̂ of type

ẽ_y = [I_n - A(A'V^{-1}A)^- A'V^{-1}]y                                               (6.50)

with the related singular dispersion matrix

D{ẽ_y} = σ²[V - A(A'V^{-1}A)^- A']                                                   (6.51)

for any choice of the generalized inverse (A'V^{-1}A)^-.

(iii) The various dispersion matrices are related by

D{y} = D{Aξ̂ + ẽ_y} = D{Aξ̂} + D{ẽ_y} = D{e_y - ẽ_y} + D{ẽ_y} ,                       (6.52)

where the random vectors

ẽ_y and Aξ̂                                                                          (6.53)

are uncorrelated; in particular,

C{ẽ_y, Aξ̂} = C{ẽ_y, e_y - ẽ_y} = 0 .                                                (6.54)


When we compute the solution σ̂² of type BIQUUE and of type BIQE we arrive at Theorem 6.5 and Theorem 6.6.

Theorem 6.5 (σ̂² BIQUUE of σ², special Gauss-Markov model: E{y} = Aξ, D{y} = Vσ², A ∈ ℝ^{n×m}, rk A = r ≤ m, V ∈ ℝ^{n×n}, rk V = n):

Let σ̂² = y'My = (vec M)'(y ⊗ y) be BIQUUE with respect to the special Gauss-Markov model of Box 6.1. Then

σ̂² = (n-r)^{-1} y'[V^{-1} - V^{-1}A(A'V^{-1}A)^- A'V^{-1}]y                           (6.55)
σ̂² = (n-r)^{-1} y'[V^{-1} - V^{-1}ASA'(ASA'V^{-1}ASA')^- ASA'V^{-1}]y                 (6.56)
σ̂² = (n-r)^{-1} y'V^{-1}ẽ_y = (n-r)^{-1} ẽ_y'V^{-1}ẽ_y                                (6.57)

are equivalent representations of the BIQUUE variance component σ̂², which are independent of the generalized inverses (A'V^{-1}A)^- or (ASA'V^{-1}ASA')^-.

The residual vector ẽ_y, namely

ẽ_y(BLUMBE) = [I_n - A(A'V^{-1}A)^- A'V^{-1}]y ,                                      (6.58)

is of type BLUMBE. The variance of σ̂², BIQUUE of σ²,

D{σ̂²} = 2(n-r)^{-1} σ⁴ = 2(n-r)^{-1}(σ²)²                                             (6.59)

can be substituted by the estimate

D̂{σ̂²} = 2(n-r)^{-1}(σ̂²)² = 2(n-r)^{-3}(ẽ_y'V^{-1}ẽ_y)² .                             (6.60)

Theorem 6.6 (σ̂² BIQE of σ², special Gauss-Markov model: E{y} = Aξ, D{y} = Vσ², A ∈ ℝ^{n×m}, rk A = r ≤ m, V ∈ ℝ^{n×n}, rk V = n):

Let σ̂² = y'My = (vec M)'(y ⊗ y) be BIQE with respect to the special Gauss-Markov model of Box 6.1. Independent of the matrix S and of the generalized inverses (A'V^{-1}A)^- or (ASA'V^{-1}ASA')^-,

σ̂² = (n-r+2)^{-1} y'[V^{-1} - V^{-1}A(A'V^{-1}A)^- A'V^{-1}]y                         (6.61)
σ̂² = (n-r+2)^{-1} y'[V^{-1} - V^{-1}ASA'(ASA'V^{-1}ASA')^- ASA'V^{-1}]y               (6.62)
σ̂² = (n-r+2)^{-1} y'V^{-1}ẽ_y = (n-r+2)^{-1} ẽ_y'V^{-1}ẽ_y                            (6.63)

are equivalent representations of the BIQE variance component σ̂². The residual vector ẽ_y, namely

ẽ_y(BLUMBE) = [I_n - A(A'V^{-1}A)^- A'V^{-1}]y ,                                      (6.64)

is of type BLUMBE. The variance of σ̂², BIQE of σ²,

D{σ̂²} = 2(n-r)(n-r+2)^{-2} σ⁴ = 2(n-r)[(n-r+2)^{-1}σ²]²                               (6.65)

can be substituted by the estimate

D̂{σ̂²} = 2(n-r)(n-r+2)^{-4}(ẽ_y'V^{-1}ẽ_y)² .                                         (6.66)

The special bias

β_{σ²} := E{σ̂² - σ²} = -2(n-r+2)^{-1}σ²                                               (6.67)

can be substituted by the estimate

β̂_{σ²} = Ê{σ̂² - σ²} = -2(n-r+2)^{-2} ẽ_y'V^{-1}ẽ_y .                                 (6.68)

Its MSE{σ̂²} (Mean Square Estimation Error)

MSE{σ̂²} := E{(σ̂² - σ²)²} = D{σ̂²} + (σ² - E{σ̂²})² = 2(n-r+2)^{-1}(σ²)²               (6.69)

can be substituted by the estimate

MSÊ{σ̂²} = D̂{σ̂²} + (β̂_{σ²})² = 2(n-r+2)^{-3}(ẽ_y'V^{-1}ẽ_y)² .                       (6.70)
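The two variance-component estimates differ only in the divisor. The sketch below (not from the book; A, V and y are hypothetical) builds the BLUMBE residual vector (6.58) and evaluates (6.57) and (6.63).

import numpy as np

rng = np.random.default_rng(6)
n, m, r = 6, 4, 2
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
V = np.diag(rng.uniform(0.5, 2.0, size=n))           # cofactor matrix, rk V = n
y = rng.standard_normal(n)

Vi = np.linalg.inv(V)
P = A @ np.linalg.pinv(A.T @ Vi @ A) @ A.T @ Vi      # projector onto R(A)
e_res = y - P @ y                                    # BLUMBE residual vector (6.58)
q = e_res @ Vi @ e_res                               # e'V^{-1}e

sigma2_biquue = q / (n - r)                          # (6.57): unbiased, larger variance
sigma2_biqe = q / (n - r + 2)                        # (6.63): biased, smaller mean square error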

6-12 The first example: BLUMBE versus BLE, BIQUUE versus BIQE, triangular leveling network

The first example for the special Gauss-Markov model with datum defect

{E{y} = Aξ, A ∈ ℝ^{n×m}, rk A < min{n, m},
 D{y} = Vσ², V ∈ ℝ^{n×n}, σ² ∈ ℝ^+, rk V = n}

is taken from a triangular leveling network. Three nodal points are connected by leveling measurements [h_αβ, h_βγ, h_γα]', also called potential differences, of the absolute potential heights [h_α, h_β, h_γ]' of type "fixed effects". Alternative estimations of type

(i)   I, I-BLUMBE of ξ ∈ ℝ^m
(ii)  V, S-BLUMBE of ξ ∈ ℝ^m
(iii) I, I-BLE of ξ ∈ ℝ^m
(iv)  V, S-BLE of ξ ∈ ℝ^m
(v)   BIQUUE of σ² ∈ ℝ^+
(vi)  BIQE of σ² ∈ ℝ^+

will be considered. In particular, we use consecutive results of Appendix A, namely on solving linear systems of equations based upon generalized inverses, in short g-inverses. For the analyst, the special Gauss-Markov model with datum defect constituted by the problem of estimating absolute heights [h_α, h_β, h_γ] of points {P_α, P_β, P_γ} from height differences is formulated in Box 6.3.

Box 6.3
The first example

E{[h_αβ, h_βγ, h_γα]'} = [-1, +1, 0; 0, -1, +1; +1, 0, -1] [h_α, h_β, h_γ]'

y := [h_αβ, h_βγ, h_γα]',  A := [-1, +1, 0; 0, -1, +1; +1, 0, -1] ∈ ℝ^{3×3},  ξ := [h_α, h_β, h_γ]'

D{[h_αβ, h_βγ, h_γα]'} = D{y} = Vσ², σ² ∈ ℝ^+

dimensions:
ξ ∈ ℝ³, dim Ξ = 3, y ∈ ℝ³, dim{Y, pdf} = 3
m = 3, n = 3, rk A = 2, rk V = 3.
6-121 The first example: I₃, I₃-BLUMBE

In the first case, we assume a dispersion matrix D{y} = I₃σ² of i.i.d. observations [y₁, y₂, y₃]' and a unity substitute matrix S = I₃, in short "u.s.". Under such a specification ξ̂ is I₃, I₃-BLUMBE of ξ in the special Gauss-Markov model with datum defect.

ȟˆ = A c( AA cAA c)  AA cy

ª 2 1 1º ª2 1 0º

AA AA = 3 «« 1 2 1»» ,
c c ( AA cAA c)  = « 1 2 0 »» .
9
«¬ 1 1 2 »¼ «¬ 0 0 0 »¼

?How did we compute the g-inverse ( AA cAAc)  ?


The computation of the g-inverse ( AAcAAc)  has been based upon bordering.

ª 6 3  3 º ª ª 6 3º 1 0 º ª 2 1 0º
« » «« » 1 « »
( AA cAAc) = « 3 6 3» = « ¬ 3 6 ¼»

0 » = «1 2 0 » .
« 0 0 9
«¬ 3 3 6 »¼
¬ 0 »¼ «¬ 0 0 0 »¼

Please, check by yourself the axiom of a g-inverse:



ª +6 3 3º ª +6 3 3º ª +6 3 3º ª +6 3 3º
« »« » « » « »
« 3 +6 3» « 3 +6 3» « 3 +6 3» = « 3 +6 3»
¬« 3 3 +6 ¼» ¬« 3 3 +6 ¼» ¬« 3 3 +6¼» ¬« 3 3 +6¼»
or

ª +6 3 3º ª 2 1 0 º ª +6 3 3º ª +6 3 3º
« »1« » « » « »
« 3 +6 3» 9 « 1 2 0 » « 3 +6 3» = « 3 +6 3»
«¬ 3 3 +6 »¼ «¬ 0 0 0 »¼ «¬ 3 3 +6 »¼ «¬ 3 3 +6»¼

ª hĮ º ª  y1 + y3 º
« » 1
ȟˆ = « hȕ » (I 3 , I 3 -BLUMBE) = «« y1  y2 »»
3
« hȖ » «¬ y2  y3 »¼
¬ ¼

[ˆ1 + [ˆ2 + [ˆ3 = 0 .

Dispersion matrix D{ȟˆ} of


the unknown vector of
“fixed effects”

D{ȟˆ} = V 2 A c( AA cAAc)  A

ª +2 1 1º
V2 «
ˆ
D{ȟ} = 1 + 2  1 »
9 « »
¬« 1 1 +2 ¼»

“replace V 2 by Vˆ 2 (BIQUUE):

Vˆ 2 = (n  rk A) 1 e cy e y ”

e y (I 3 , I 3 -BLUMBE) = [I 3  A ( AA c)  A c]y

ª1 1 1º ª1º
1 1
e y = «1 1 1» y = ( y1 + y2 + y3 ) «1»
3« » 3 «»
«¬1 1 1»¼ «¬1»¼

ª1 1 1º ª1 1 1º
1
e cy e y = y c ««1 1 1»» ««1 1 1»» y
9
«¬1 1 1»¼ «¬1 1 1»¼

1
e cy e y = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 )
3
1
Vˆ 2 (BIQUUE) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 )
3

ª +2 1 1º

D{ȟ} = « 1 +2 1»» Vˆ 2 (BIQUUE)
ˆ
9
«¬ 1 1 +2 »¼

“replace V 2 by Vˆ 2 (BIQE):
Vˆ 2 = ( n + 2  rk A ) 1 e cy e y ”

e y (I 3 , I 3 -BLUMBE) = [I 3  A ( AA c)  A c]y

ª1 1 1º ª1º
1 1
e y = ««1 1 1»» y = ( y1 + y2 + y3 ) ««1»»
3 3
«¬1 1 1»¼ «¬1»¼

1
Vˆ 2 ( BIQE ) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 )
9

ª +2 1 1º
1
D{ȟˆ} = «« 1 +2 1»» Vˆ 2 (BIQE) .
9
«¬ 1 1 +2 »¼

For practice, we recommend D{ȟˆ (BLUMBE), Vˆ 2 (BIQE)} , since the disper-


sion matrix D{ȟˆ} is remarkably smaller when compared to D{ȟˆ (BLUMBE),
Vˆ 2 (B IQUUE)} , a result which seems to be unknown!

Bias vector ȕ(BLUMBE) of the


unknown vector of “fixed effects”
ȕ = [I 3  A c( AA cAA c)  AA cA]ȟ ,

ª1 1 1º

ȕ =  «1 1 1»» ȟ ,
3
«¬1 1 1»¼

“replace ȟ which is inaccessible by ȟˆ (I 3 , I 3 -BLUMBE) ”

ª1 1 1º

ȕ =  «1 1 1»» ȟˆ (I 3 , I 3 -BLUMBE) ,
3
¬«1 1 1»¼
ȕ=0

(due to [ˆ1 + [ˆ2 + [ˆ3 = 0 ).

Mean Square Estimation Error MSE{ȟˆ (I 3 , I 3 -BLUMBE)}

MSE{ȟˆ} = D{ȟˆ} + [I 3  A c( AA cAA c)  AA cA]V 2 ,

ª5 2 2º
V2 «
ˆ
MSE{ȟ} =  2 5 2 »» .
9 «
«¬ 2 2 5 »¼

“replace V 2 by Vˆ 2 (BIQUUE):
Vˆ 2 = (n  rk A) 1 ecy e y ”

1
Vˆ 2 (BIQUUE) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 ) ,
3

ª5 2 2º

MSE{ȟ} =  « 2 5 2 »» Vˆ 2 (BIQUUE) .
ˆ
9
«¬ 2 2 5 »¼

“replace V 2 by Vˆ 2 (BIQE):

Vˆ 2 = ( n + 2  rk A ) 1 ecy e y ”

1
Vˆ 2 (BIQE) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 )
9

ª5 2 2º
1
MSE{ȟˆ} =  «« 2 5 2 »» Vˆ 2 (BIQE) .
9
«¬ 2 2 5 »¼

Residual vector e y and dispersion matrix D{e y } of the “random effect” e y


e y (I 3 , I 3 -BLUMBE) = [I 3  A ( A cA )  A c]y

ª1 1 1º ª1º
1« 1
e y = «1 1 1» y = ( y1 + y2 + y3 ) ««1»»
»
3 3
«¬1 1 1»¼ «¬1»¼

D{e y } = V 2 [I 3  A( A cA)  A c]

ª1 1 1º
V2 «
D{e y } = 1 1 1»» .
3 «
«¬1 1 1»¼

“replace V 2 by Vˆ 2 (BIQUUE) or Vˆ 2 (BIQE)”:

ª1 1 1º

D{e y } = «1 1 1»» Vˆ 2 (BIQUUE)
3
«¬1 1 1»¼

or

ª1 1 1º

D{e y } = «1 1 1»» Vˆ 2 (BIQE) .
3
«¬1 1 1»¼

Finally note that ξ̂(I₃, I₃-BLUMBE) corresponds to x_lm(I₃, I₃-MINOLESS) discussed in Chapter 5. In addition, D{ẽ_y | I₃, I₃-BLUUE} = D{ẽ_y | I₃, I₃-BLUMBE}.
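A short numerical check of this subsection (the observation values are hypothetical; numpy.linalg.pinv reproduces A'(AA'AA')^- AA' for identity weights): the I₃, I₃-BLUMBE estimate obeys the closed formulas derived above, the datum condition ξ̂₁ + ξ̂₂ + ξ̂₃ = 0, and the BIQUUE/BIQE divisors n - rk A = 1 and n - rk A + 2 = 3.

import numpy as np

A = np.array([[-1.0,  1.0,  0.0],
              [ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0]])
y = np.array([1.2, -0.4, -0.7])          # hypothetical height differences

xi_hat = np.linalg.pinv(A) @ y           # I3, I3-BLUMBE = I3, I3-MINOLESS
assert np.allclose(xi_hat, np.array([-y[0] + y[2], y[0] - y[1], y[1] - y[2]]) / 3)
assert np.isclose(xi_hat.sum(), 0.0)     # datum condition

e_res = y - A @ xi_hat                   # loop misclosure spread equally over the residuals
sigma2_biquue = e_res @ e_res / (3 - 2)
sigma2_biqe = e_res @ e_res / (3 - 2 + 2)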

6-122 The first example: V, S-BLUMBE

In the second case, we assume a dispersion matrix D{y} = Vσ² of weighted observations [y₁, y₂, y₃]' and a weighted substitute matrix S, in short "w.s.". Under such a specification ξ̂ is V, S-BLUMBE of ξ in the special Gauss-Markov model with datum defect:

ξ̂ = SA'(ASA'V^{-1}ASA')^- ASA'V^{-1}y .

As dispersion matrix D{y} = VV 2 we choose

ª2 1 1º

V = «1 2 1 »» , rk V = 3 = n
2
«¬1 1 2 »¼

ª 3 1 1º

V = « 1 3 1»» , but
1

2
«¬ 1 1 3 »¼

S = Diag(0,1,1), rk S = 2

as the bias semi-norm. The matrix S fulfils the condition


rk(SA c) = rk A = r = 2 .

?How did we compute the g-inverse ( ASAcV 1 ASA c)  ?


The computation of the g-inverse ( ASAcV 1 ASA c)  has been based upon bor-
dering.

ª +3 1 1º

V = « 1 +3 1»» , S = Diag(0,1,1), rk S = 2
1

2
«¬ 1 1 +3»¼

ȟˆ = SA c( ASA cV 1ASA c)  ASA cV 1

ª 2 3 1 º
ASAcV ASAc = 2 «« 3 6 3»»
1

«¬ 1 3 2 »¼

ª 2 0 1º

( ASAcV ASAc) = « 0 0 3»»
1 

6
«¬ 1 0 2 »¼
ξ̂ = [h_α, h_β, h_γ]'_{V,S-BLUMBE} = (1/3) [0, 2y₁ - y₂ - y₃, y₁ + y₂ - 2y₃]' .

Dispersion matrix D{ȟˆ} of


the unknown vector of
“fixed effects”

D{ȟˆ} = V 2SA c( ASA cV 1ASA c)  AS

ª0 0 0 º
V2 «
ˆ
D{ȟ} = « 0 2 1 »»
6
«¬0 1 2 »¼

“replace V 2 by Vˆ 2 (BIQUUE):
Vˆ 2 = (n  rk A) 1 e cy e y ”

e y = (V, S-BLUMBE) = [I 3  A( A cV 1A)  A cV 1 ]y

ª1 1 1º ª1º
1« y + y2 + y3
e y = «1 1 1»» y = 1 «1»
«»
3 3
«¬1 1 1»¼ «¬1»¼

ª1 1 1º ª1 1 1º
1 «
e cy e y = y c «1 1 1»» ««1 1 1»» y
9
«¬1 1 1»¼ «¬1 1 1»¼

1
e cy e y = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 )
3
1
Vˆ 2 (BIQUUE) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 )
3

D{ȟˆ} = [V + A( A cV 1A)  A c]Vˆ 2 (BIQUUE)

ª1 1 1º

D{ȟ} = «1 1 1»» Vˆ 2 (BIQUUE) .
ˆ
3
«¬1 1 1»¼
“replace V 2 by Vˆ 2 (BIQE):
Vˆ 2 = (n + 2  rk A) 1 e cy e y ”

e y (V , S-BLUMBE) = [I 3  A ( A cV 1A )  A cV 1 ]y

ª1 1 1º ª1º
1 y + y2 + y3
e y = ««1 1 1»» y = 1 «1»
«»
3 3
«¬1 1 1»¼ «¬1»¼

1
Vˆ 2 (BIQE) = ( y12 + y22 + y32 + 2 y1 y2 + 2 y2 y3 + 2 y3 y1 )
9

ª +2 1 1º
1
D{ȟˆ} = «« 1 +2 1»» Vˆ 2 (BIQE) .
9
«¬ 1 1 +2 »¼

We repeat the statement that we recommend the use of


D{ȟˆ (BLUMBE), Vˆ (BIQE)} since the dispersion matrix D{ȟˆ} is remarkably
smaller when compared to D{ȟˆ (BLUMBE), Vˆ 2 (BIQUUE)} !
Bias vector ȕ(BLUMBE) of the unknown vector of “fixed effects”
ȕ = [I 3  SA c( ASA cV 1 ASA c)  ASA cV 1 A ]ȟ

ª1 0 0 º ª[1 º
ȕ =  «1 0 0 » ȟ =  ««[1 »» ,
« »

¬«1 0 0 »¼ ¬«[1 ¼»
“replace ȟ which is inaccessible by ȟ̂ (V,S-BLUMBE)”

ª1º
ȕ =  ««1»» ȟˆ , (V , S-BLUMBE) z 0 .
¬«1¼»
Mean Square Estimation Error MSE{ȟˆ (V , S-BLUMBE)}

MSE{ȟˆ} =
= D{ȟˆ} + [S  SA c( ASA cV 1ASA c)  ASA cV 1AS]V 2

ª0 0 0 º
V2 «
ˆ
MSE{ȟ} = 0 2 1 »» = D{ȟˆ} .
6 «
«¬0 1 2 »¼

“replace V 2 by Vˆ 2 (BIQUUE):
Vˆ 2 = (n  rk A) 1 ecy e y ”

Vˆ 2 (BIQUUE)=3V 2

MSE{ȟˆ} = D{ȟˆ} .

Residual vector e y and dispersion matrix D{e y } of the “random effect” e y


e y (V , S-BLUMBE) = [I 3  A ( A cV 1A )  A cV 1 ]y

ª1 1 1º ª1º
1« y + y2 + y3
e y = «1 1 1»» y = 1 «1»
«»
3 3
«¬1 1 1»¼ «¬1»¼

D{e y } = V 2 [V  A( A cV 1A)  A c]

ª1 1 1º
2 2«
D{e y } = V «1 1 1»» .
3
«¬1 1 1»¼
“replace V by Vˆ 2
2
(BIQE):
Vˆ 2 = (n + 2  rk A) 1 ecy e y ”

Vˆ 2 (BIQE) versus

ª0 0 0 º

MSE{ȟ} = «0 2 1 »» V 2 ( BIQE ) .
ˆ
6
«¬0 1 2 »¼

Residual vector e y and dispersion matrix D{e y } of the “random effects” e y


e y (V , S-BLUMBE) = [I 3  A ( A cV 1A )  A cV 1 ]y

ª1 1 1º ª1º
1« y + y2 + y3
e y = «1 1 1»» y = 1 «1»
«»
3 3
«¬1 1 1»¼ «¬1»¼

D{e y } = V 2 [V  A( A cV 1A)  A c]

ª1 1 1º
2 2«
D{e y } = V «1 1 1»» .
3
«¬1 1 1»¼

“replace V 2 by Vˆ 2 (BIQUUE) or Vˆ 2 (BIQE)”:



ª1 1 1º

D{e y } = « 1 1 1»» Vˆ 2 (BIQUUE)
3
«¬1 1 1»¼

or

ª1 1 1º

D{e y } = «1 1 1»» Vˆ 2 (BIQE) .
3
«¬1 1 1»¼
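The V, S-BLUMBE result of this subsection can be reproduced numerically. The sketch below is not from the book; the observation values are hypothetical and numpy.linalg.pinv serves as the g-inverse in (6.33).

import numpy as np

A = np.array([[-1.0,  1.0,  0.0],
              [ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0]])
V = 0.5 * np.array([[2.0, 1.0, 1.0],
                    [1.0, 2.0, 1.0],
                    [1.0, 1.0, 2.0]])
S = np.diag([0.0, 1.0, 1.0])             # bias seminorm fixing the datum h_alpha
y = np.array([1.1, 0.9, -2.1])           # hypothetical observations

Vi = np.linalg.inv(V)
M = A @ S @ A.T
xi_hat = S @ A.T @ np.linalg.pinv(M @ Vi @ M) @ M @ Vi @ y      # (6.33)

target = np.array([0.0,
                   2*y[0] - y[1] - y[2],
                   y[0] + y[1] - 2*y[2]]) / 3.0
assert np.allclose(xi_hat, target)       # closed formula of this subsection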

6-123 The first example: I₃, I₃-BLE

In the third case, we assume a dispersion matrix D{y} = I₃σ² of i.i.d. observations [y₁, y₂, y₃]' and a unity substitute matrix S = I₃, in short "u.s.". Under such a specification ξ̂ is I₃, I₃-BLE of ξ in the special Gauss-Markov model with datum defect:

ξ̂(BLE) = (I₃ + A'A)^{-1}A'y .

ª +3 1 1º ª2 1 1º

I 3 + A cA = «« 1 +3 1»» , (I 3 + AcA) = «1 2 1 »»
1

4
«¬ 1 1 +3»¼ «¬1 1 2 »¼
ª 1 0 1º ª  y1 + y3 º
ˆȟ (BLE) = 1 « 1 1 0 » y = 1 « + y  y »
4« » 4«
1 2»
«¬ 0 1 1»¼ «¬ + y2  y3 »¼

[ˆ1 + [ˆ2 + [ˆ3 = 0 .

Dispersion matrix D{ȟˆ | BLE} of the unknown vector of


“fixed effects”

D{ȟˆ | BLE} = V 2 A cA (I 3 + AcA) 2

2
ª +2 1 1º
V « 1 +2 1» .
D{ȟˆ | BLE} =
16 « »
«¬ 1 1 +2 »¼

Bias vector ȕ(BLE) of the unknown vector of “fixed effects”



ȕ = [I 3 + A cA]1 ȟ

ª2 1 1º ª 2[1 + [ 2 + [3 º
1« 1«
ȕ =  « 1 2 1 » ȟ =  «[1 + 2[ 2 + [3 »» .
»
4 4
«¬ 1 1 2 »¼ «¬[1 + [ 2 + 2[3 »¼

Mean Square Estimation Error MSE{ȟ (BLE)}

MSE{ȟˆ (BLE)} = V 2 [I 3 + A cA]1

ª2 1 1º
V2 «
ˆ
MSE{ȟ (BLE)} = 1 2 1 »» .
4 «
«¬1 1 2 »¼

Residual vector e y and dispersion matrix D{e y } of the “random effect” e y


e y (BLE) = [ AA c + I 3 ]1 y

ª2 1 1º ª 2 y1 + y2 + y3 º
1« 1«
e y (BLE) = « 1 2 1 » y = « y1 + 2 y2 + y3 »»
»
4 4
«¬1 1 2 »¼ «¬ y1 + y2 + 2 y3 »¼

D{e y (BLE)} = V 2 [I 3 + AA c]2

ª6 5 5º
V2 «
D{e y (BLE)} = « 5 6 5 »» .
16
«¬5 5 6 »¼

Correlations

C{e y , Aȟˆ} = V 2 [I 3 + AA c]2 AA c

ª +2 1 1º
V2 «
ˆ
C{e y , Aȟ} =
 1 +2 1» .
16 « »
¬« 1 1 +2 ¼»
Comparisons BLUMBE-BLE

(i) ȟˆ BLUMBE  ȟˆ BLE

ȟˆ BLUMBE  ȟˆ BLE = A c( AA cAA c)  AAc( AAc + I 3 ) 1 y



ª 1 0 1 º ª  y1 + y3 º
1 1
ȟˆ BLUMBE  ȟˆ BLE = «« 1 1 0 »» y = «« + y1  y2 »» .
12 12
«¬ 0 1 1»¼ «¬ + y2  y3 »¼

(ii) D{ȟˆ BLUMBE }  D{ȟˆ BLE }

D{ȟˆ BLUMBE }  D{ȟˆ BLE } =


= V 2 A c( AA cAAc)  AAc( AAc + I 3 ) 1 AAc( AAcAAc)  A +
+V 2 A c( AA cAAc)  AAc( AAc + I 3 ) 1 AAc( AAc + I 3 ) 1 AAc( AAcAAc)  A

ª +2 1 1º
7
D{ȟˆ BLUMBE }  D{ȟˆ BLE } = V « 1 +2 1»» positive semidefinite.
2

144 «
«¬ 1 1 +2 »¼

(iii) MSE{ȟˆ BLUMBE }  MSE{ȟˆ BLE }

MSE{ȟˆ BLUMBE }  MSE{ȟˆ BLE } =


= ı 2 A c( AA cAAc)  AAc( AAc + I 3 ) 1 AAc( AAcAAc)  A

ª +2 1 1º
V2 «
ˆ ˆ
MSE{ȟ BLUMBE }  MSE{ȟ BLE } = 1 +2 1»» positive semidefinite.
36 «
«¬ 1 1 +2 »¼
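A compact numerical comparison of I₃, I₃-BLE with I₃, I₃-BLUMBE on the leveling example (hypothetical observation values): on this network the BLE estimate is a shrunken copy of the BLUMBE estimate, which is why the dispersion and MSE differences above come out positive semidefinite.

import numpy as np

A = np.array([[-1.0,  1.0,  0.0],
              [ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0]])
y = np.array([0.8, -0.3, -0.6])          # hypothetical observations

xi_ble = np.linalg.solve(np.eye(3) + A.T @ A, A.T @ y)   # (I3 + A'A)^{-1} A'y
xi_blumbe = np.linalg.pinv(A) @ y

assert np.allclose(xi_ble, 0.75 * xi_blumbe)   # here BLE = (3/4) BLUMBE
assert np.isclose(xi_ble.sum(), 0.0)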

6-124 The first example: V, S-BLE


In the fourth case, we assume
a dispersion matrix D{y} = VV 2 of and a weighted substitute matrix S,
weighted observations [ y1 , y2 , y3 ] in short w.s. .
We choose
ª2 1 1º

V = «1 2 1 »» positive definite, rk V = 3 = n ,
2
«¬1 1 2 »¼

ª +3 1 1º

V = « 1 +3 1»» ,
1

2
«¬ 1 1 +3»¼

and
S = Diag(0,1,1), rk S = 2 ,

ȟˆ = (I 3 + SA cV 1A) 1 SA cV 1y ,

ª 1 0 0º ª 21 0 0 º
1 «
I 3 + SA V A = « 2 5 2 » , (I 3 + SA V A) = «14 5 2 »» ,
c 1 « » c 1 1

21
«¬ 2 2 5 »¼ «¬14 2 5 »¼

ª hĮ º
ˆȟ (V, S-BLE) = « h » =
« ȕ»
« hȖ »
¬ ¼ V ,S -BLE
ª0 0 0º ª 0 º
1 « 1 «
= «14 6 4 » y = «10 y1  6 y2  4 y3 »» .
»
21 21
«¬ 4 6 10 »¼ «¬ 4 y1 + 6 y2  10 y3 »¼

Dispersion matrix D{ȟˆ | V, S-BLE} of the unknown vector of

“fixed effects”

D{ȟˆ | V, S-BLE} = V 2SA cV 1A[I 3 + SA cV 1A]1 S ,

ª0 0 0 º
V2 «
ˆ
D{ȟ | V, S-BLE} = « 0 76 22 »» .
441
«¬0 22 76 »¼

Bias vector ȕ(V, S-BLE) of the unknown vector of “fixed effects”


ȕ = [I 3 + SA cV 1 A]1 ȟ

ª 21 0 0 º ª 21[1 º
1 « 1 «
ȕ =  «14 5 2 » ȟ =  «14[1 + 5[ 2 + 2[ 3 »» .
»
21 21
«¬14 2 5 »¼ «¬14[1 + 2[ 2 + 5[ 3 »¼

Mean Square Estimation Error MSE{ȟ | V , S-BLE}


MSE{ȟ | V, S-BLE} = V 2 [I 3 + SA cVA]1 S

ª0 0 0 º
V2 «
MSE{ȟ | V, S-BLE} = 0 5 2 »» .
21 «
«¬0 2 5 »¼

Residual vector e y and dispersion matrix D{e y } of the “random effect” e y


e y (V , S-BLE) = [I 3 + ASA cV 1 ]1 y

ª11 6 4 º ª11 y1 + 6 y2 + 4 y3 º
1 « » y = 1 « 6y + 9y + 6y »
e y {V , S-BLE} = 6 9 6
21 « » 21 «
1 2 3 »
«¬ 4 6 11»¼ «¬ 4 y1 + 6 y2 + 11y3 »¼

D{e y (V, S-BLE)} = V 2 [I 3 + ASA cV 1 ]2 V

ª614 585 565 º


V2 «
D{e y (V, S-BLE)} = 585 594 585 »» .
882 «
«¬565 585 614 »¼

Correlations

C{e y , Aȟˆ} = V 2 (I 3 + ASAcV 1 ) 2 ASAc

2
ª 29 9 20 º
V «  9 18  9 » .
C{e y , Aȟˆ} =
441 « »
«¬ 20 9 29 »¼

Comparisons BLUMBE-BLE

(i) ȟˆ BLUMBE  ȟˆ BLE

ȟˆ V ,S  BLUMBE  ȟˆ V ,S -BLE = SA c( ASA cV 1ASA c)  ASA c( ASA c + V ) 1 y

ª0 0 0º ª 0 º
1 « 1 «
ȟˆ V ,S  BLUMBE  ȟˆ V ,S -BLE = « 4 1 3 » y = « 4 y1  y2  3 y3 »» .
»
21 21
«¬ 3 1 4 »¼ «¬3 y1 + y2  4 y3 »¼

(ii) D{ȟˆ BLUMBE }  D{ȟˆ BLE }

D{ȟˆ V ,S -BLUMBE }  D{ȟˆ V ,S -BLE } =


= V SA c( ASA cV ASA c)  ASA c( ASA c + V ) 1 ASA c(ASA cV 1ASA c)  AV +
2 1

V 2 SA c( ASA cV 1 ASA c)  ASA c( ASA c + V ) 1 ASA c(ASA c + V ) 1 ASA c(ASA cV 1ASA c)  AS

2
ª0 0 0º
V «
D{ȟˆ V ,S -BLUMBE }  D{ȟˆ V ,S -BLE } = 0 142 103»» positive semidefinite.
882 «
«¬ 0 103 142 »¼

(iii) MSE{ȟˆ BLUMBE }  MSE{ȟˆ BLE }

MSE{ȟˆ V ,S -BLUMBE }  MSE{ȟˆ V ,S -BLE } =


= V 2SA c( ASAcV 1ASAc)  ASAc( ASAc + V ) 1 ASAc( ASAcV 1 ASAc)  AS

ª0 0 0 º
V2 «
MSE{ȟˆ V ,S  BLUMBE }  MSE{ȟˆ V ,S  BLE } = « 0 4 3 »» positive semidefinite.
42
«¬0 3 4 »¼

Summarizing, let us compare

I, I-BLUMBE versus I, I-BLE
and
V, S-BLUMBE versus V, S-BLE!

The differences ξ̂_BLUMBE - ξ̂_BLE, D{ξ̂_BLUMBE} - D{ξ̂_BLE} and MSE{ξ̂_BLUMBE} - MSE{ξ̂_BLE}, as well as ξ̂_{V,S-BLUMBE} - ξ̂_{V,S-BLE}, D{ξ̂_{V,S-BLUMBE}} - D{ξ̂_{V,S-BLE}} and MSE{ξ̂_{V,S-BLUMBE}} - MSE{ξ̂_{V,S-BLE}}, all turn out to be positive semidefinite. In consequence, for these three different measures of distortion

BLE is favored over BLUMBE:
BLE produces smaller errors in comparison with BLUMBE!
Finally let us compare weighted BIQUUE and weighted BIQE:
(i) Weighted BIQUUE Vˆ 2 and weighted BIQE Vˆ 2
Vˆ 2 = (n  r ) 1 y cV 1e y = versus Vˆ 2 = (n  r + 2)y cV 1e y =
= (n  r ) 1 e cy V 1e y = (n  r + 2)e cy V 1e y

ª4 1 1º

(e y ) V ,S -BLUMBE = «1 4 1 »» y
6
«¬1 1 4 »¼

ª +3 1 1º

r = rk A = 2, n = 3, V = « 1 +3 1»»
1

2
«¬ 1 1 +3»¼

1 versus 1
Vˆ 2 = ( y12 + y22 + y32 ) Vˆ 2 = ( y12 + y22 + y32 )
2 6
(ii) D{Vˆ 2 | BIQUUE} versus D{Vˆ 2 | BIQE}

D{Vˆ 2 } = 2(n  r ) 1 V 4 versus D{Vˆ 2 } = 2(n  r )(n  r + 2) 1 V 4

D{Vˆ 2 } = 2V 4 versus 2
D{Vˆ 2 } = V 4
9

Dˆ {Vˆ 2 } = 2(n  r ) 1 (Vˆ 2 ) 2 versus Eˆ {Vˆ 2  V 2 } =


= 2(n  r + 2) 1 e cy V 1e y

1 versus 1
Dˆ {Vˆ 2 } = ( y12 + y22 + y32 ) Eˆ {Vˆ 2  V 2 } =  ( y12 + y22 + y32 )
2 9

Eˆ {Vˆ 2  V 2 } = 2(n  r + 2) 1 (e cy V 1e y ) 2

1
Eˆ {(Vˆ 2  V 2 )} = ( y12 + y22 + y32 ) .
54

(iii) (e y ) BLUMBE = [I n  A( A cV 1A)  AV 1 ](e y ) BLE

(Vˆ 2 ) BIQUUE = ( n  r )(e cy ) BLE [ V 1  V 1A( A cV 1A)  AV 1 ](e y ) BLE

1
Vˆ 2BIQUUE  Vˆ 2BIQE = ( y12 + y22 + y32 ) positive.
3
We repeat that the difference σ̂²_BIQUUE - σ̂²_BIQE is positive, in favor of σ̂²_BIQE < σ̂²_BIQUUE.
6-2 Setup of the best linear estimators of type hom BLE, hom S-BLE and hom α-BLE for fixed effects
Numerical tests have documented that ξ̂ of type Σ-BLUUE of ξ is not robust against outliers in the stochastic observation vector y. For this reason we give up the postulate of unbiasedness, but keep the setup of a linear estimation ξ̂ = Ly of homogeneous type. The matrix L is uniquely determined by the α-weighted hybrid norm of type minimum variance ||D{ξ̂}||^2 and minimum bias ||I - LA||^2. For such a homogeneous linear estimation (2.21), by means of Box 6.4 let us specify the real-valued, nonstochastic bias vector β := E{ξ̂ - ξ} = E{ξ̂} - ξ of type (6.11), (6.12), (6.13) and the real-valued, nonstochastic bias matrix I_m - LA (6.74) in more detail.
First, let us discuss why a setup of an inhomogeneous linear estimation is not suited to solve our problem. In the case of an unbiased estimator, the setup of an inhomogeneous linear estimation ξ̂ = Ly + κ leads to the postulate of unbiasedness E{ξ̂} = ξ if and only if E{ξ̂} - ξ := LE{y} - ξ + κ = -(I_m - LA)ξ + κ = 0 for all ξ ∈ R^m, or LA = I_m and κ = 0. Indeed, the postulate of unbiasedness restricts the linear operator L to be a (non-unique) left inverse L = A^-_L and forces the vector κ of inhomogeneity to zero. In contrast, the bias vector β := E{ξ̂ - ξ} = E{ξ̂} - ξ = LE{y} - ξ + κ = -(I_m - LA)ξ + κ of an inhomogeneous linear estimation should approach zero if ξ = 0 is chosen as a special case. In order to include this case in the linear biased estimation procedure we set κ = 0.

Second, we focus on the norm (2.79), namely ||β||^2 := E{(ξ̂ - ξ)'(ξ̂ - ξ)}, of the bias vector β, also called the Mean Square Error MSE{ξ̂} of ξ̂. In terms of a setup of a homogeneous linear estimation, ξ̂ = Ly, the norm of the bias vector is represented by tr (I_m - LA) ξξ' (I_m - LA)' or by the weighted Frobenius matrix norm ||(I_m - LA)'||^2_{ξξ'}, where the weight matrix ξξ', rk ξξ' = 1, has rank one. Obviously ||(I_m - LA)'||^2_{ξξ'} is only a semi-norm. In addition, ξξ' is not accessible since ξ is unknown. In this problematic case we replace the matrix ξξ' by a fixed, positive definite m×m matrix S and define the S-weighted Frobenius matrix norm ||(I_m - LA)'||^2_S of type (2.82) of the bias matrix I_m - LA. Indeed, by means of the rank identity rk S = m we have chosen a weight matrix of maximal rank. Now we are prepared to understand intuitively the following.
Here we focus on best linear estimators of type hom BLE, hom S-BLE and hom α-BLE of the fixed effects ξ, which turn out to be better than the best linear uniformly unbiased estimator of type hom BLUUE, but suffer from being biased. At first let us begin with a discussion of the bias vector and the bias matrix as well as of the Mean Square Estimation Error MSE{ξ̂} with respect to a homogeneous linear estimation ξ̂ = Ly of fixed effects ξ, based upon Box 6.4.
Box 6.4
Bias vector, bias matrix,
Mean Square Estimation Error
in the special Gauss–Markov model with fixed effects

E{y} = Aξ                                                          (6.71)

D{y} = Σ_y                                                         (6.72)

"ansatz"

ξ̂ = Ly                                                             (6.73)

bias vector

β := E{ξ̂ - ξ} = E{ξ̂} - ξ                                           (6.74)

β = LE{y} - ξ = -[I_m - LA] ξ                                      (6.75)

bias matrix

B := I_m - LA                                                      (6.76)

decomposition

ξ̂ - ξ = (ξ̂ - E{ξ̂}) + (E{ξ̂} - ξ)                                    (6.77)

ξ̂ - ξ = L(y - E{y}) - [I_m - LA] ξ                                 (6.78)

Mean Square Estimation Error

MSE{ξ̂} := E{(ξ̂ - ξ)(ξ̂ - ξ)'}                                       (6.79)

MSE{ξ̂} = LD{y}L' + [I_m - LA] ξξ' [I_m - LA]'                      (6.80)

(E{ξ̂ - E{ξ̂}} = 0)

modified Mean Square Estimation Error

MSE_S{ξ̂} := LD{y}L' + [I_m - LA] S [I_m - LA]'                     (6.81)

Frobenius matrix norms

||MSE{ξ̂}||^2 := tr E{(ξ̂ - ξ)(ξ̂ - ξ)'}                              (6.82)

||MSE{ξ̂}||^2 = tr LD{y}L' + tr [I_m - LA] ξξ' [I_m - LA]'
            = ||L'||^2_{Σ_y} + ||(I_m - LA)'||^2_{ξξ'}             (6.83)

||MSE_S{ξ̂}||^2 := tr LD{y}L' + tr [I_m - LA] S [I_m - LA]'
             = ||L'||^2_{Σ_y} + ||(I_m - LA)'||^2_S                (6.84)

hybrid minimum variance – minimum bias norm
α-weighted

L(L) := ||L'||^2_{Σ_y} + (1/α) ||(I_m - LA)'||^2_S                 (6.85)

special model

dim R(SA') = rk SA' = rk A = m.                                    (6.86)

The bias vector β is conventionally defined by E{ξ̂} - ξ subject to the homogeneous estimation form ξ̂ = Ly. Accordingly the bias vector can be represented by (6.75), β = -[I_m - LA]ξ. Since the vector ξ of fixed effects is unknown, the proposal has been made to use instead the matrix I_m - LA as a matrix-valued measure of bias. A measure of the estimation error is the Mean Square Estimation Error MSE{ξ̂} of type (6.79). MSE{ξ̂} can be decomposed into two basic parts:
• the dispersion matrix D{ξ̂} = LD{y}L'
• the bias product ββ'.
Indeed the vector ξ̂ - ξ can be decomposed as well into two parts of type (6.77), (6.78), namely (i) ξ̂ - E{ξ̂} and (ii) E{ξ̂} - ξ, which may be called estimation error and bias, respectively. The double decomposition of the vector ξ̂ - ξ leads straightforwardly to the double representation of the matrix MSE{ξ̂} of type (6.80). Such a representation suffers from two effects: firstly, the vector ξ of fixed effects is unknown; secondly, the matrix ξξ' has only rank 1. In consequence, the matrix [I_m - LA] ξξ' [I_m - LA]' has only rank 1, too. In this situation the proposal has been made to modify MSE{ξ̂} with respect to ξξ' by the regular matrix S. MSE_S{ξ̂} has been defined by (6.81). Scalar measures of MSE{ξ̂} as well as MSE_S{ξ̂} are the Frobenius norms (6.82), (6.83), (6.84). Those scalars constitute the optimal risk in Definition 6.7 (hom BLE) and Definition 6.8 (hom S-BLE). Alternatively, a homogeneous α-weighted hybrid minimum variance-minimum bias estimation (hom α-BLE) is presented in Definition 6.9, which is based upon the weighted sum of two norms of type (6.85), namely
• the average variance ||L'||^2_{Σ_y} = tr LΣ_yL'
• the average bias ||(I_m - LA)'||^2_S = tr [I_m - LA] S [I_m - LA]'.
The very important estimator α-BLE balances variance and bias by the weight factor α, which is illustrated by Figure 6.1.

Figure 6.1: Balance of variance and bias by the weight factor α (minimum bias versus minimum variance)
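The balance of Figure 6.1 can be made concrete by evaluating the two components of the α-weighted hybrid norm (6.85), the average variance tr LΣ_yL' and the average bias tr (I_m - LA) S (I_m - LA)', for any candidate matrix L. The following minimal sketch (Python with numpy; A, Σ_y, S, L and α are illustrative placeholders, not data from the text) does exactly that.

import numpy as np

def hybrid_risk(L, A, Sigma_y, S, alpha):
    """alpha-weighted hybrid norm (6.85): variance term + (1/alpha) * bias term."""
    variance_term = np.trace(L @ Sigma_y @ L.T)            # ||L'||^2_{Sigma_y}
    B = np.eye(A.shape[1]) - L @ A                         # bias matrix I_m - LA
    bias_term = np.trace(B @ S @ B.T)                      # ||(I_m - LA)'||^2_S
    return variance_term + bias_term / alpha, variance_term, bias_term

# illustrative placeholders
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
Sigma_y = np.eye(3)
S = np.eye(2)
L = np.linalg.pinv(A)                                      # e.g. the least-squares inverse as a candidate
print(hybrid_risk(L, A, Sigma_y, S, alpha=0.5))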

Definition 6.7 ( ȟ̂ hom BLE of ȟ ):


An m×1 vector ȟ̂ is called homogeneous BLE of ȟ in the special
linear Gauss-Markov model with fixed effects of Box 6-3, if
(1st) ȟ̂ is a homogeneous linear form

ȟˆ = Ly , (6.87)
(2nd) in comparison to all other linear estimations ȟ̂ has the
minimum Mean Square Estimation Error in the sense of

||MSE{ξ̂}||^2 = tr LD{y}L' + tr [I_m - LA] ξξ' [I_m - LA]'
             = ||L'||^2_{Σ_y} + ||(I_m - LA)'||^2_{ξξ'} .             (6.88)

Definition 6.8 ( ȟ̂ S-BLE of ȟ ):


An m×1 vector ȟ̂ is called homogeneous S-BLE of ȟ in the special
linear Gauss-Markov model with fixed effects of Box 6.3, if
(1st) ȟ̂ is a homogeneous linear form

ȟˆ = Ly , (6.89)
(2nd) in comparison to all other linear estimations ȟ̂ has the mini-
mum S-modified Mean Square Estimation Error in the sense
of

||MSE_S{ξ̂}||^2 := tr LD{y}L' + tr [I_m - LA] S [I_m - LA]'
               = ||L'||^2_{Σ_y} + ||(I_m - LA)'||^2_S = min_L .       (6.90)

Definition 6.9 ( ȟ̂ hom hybrid min var-min bias solution,


Į-weighted or hom Į-BLE):
An m×1 vector ȟ̂ is called homogeneous Į-weighted hybrid mini-
mum variance- minimum bias estimation (hom Į-BLE) of ȟ in the
special linear Gauss-Markov model with fixed effects of Box 6.3, if
(1st) ȟ̂ is a homogeneous linear form

ȟˆ = Ly , (6.91)

(2nd) in comparison to all other linear estimations ȟ̂ has the mini-


mum variance-minimum bias in the sense of the Į-weighted hybrid
norm

tr LD{y}L' + (1/α) tr (I_m - LA) S (I_m - LA)'
= ||L'||^2_{Σ_y} + (1/α) ||(I_m - LA)'||^2_S = min_L ,                (6.92)

in particular with respect to the special model

α ∈ R^+ ,  dim R(SA') = rk SA' = rk A = m.
The estimations ξ̂ of type hom BLE, hom S-BLE and hom α-BLE can be characterized as follows:

Lemma 6.10 (hom BLE, hom S-BLE and hom α-BLE):
An m×1 vector ξ̂ is hom BLE, hom S-BLE or hom α-BLE of ξ in the special linear Gauss-Markov model with fixed effects of Box 6.3, if and only if the matrix L̂ fulfils the normal equations

(1st) hom BLE:

(Σ_y + Aξξ'A') L̂' = Aξξ'                                             (6.93)

(2nd) hom S-BLE:

(Σ_y + ASA') L̂' = AS                                                 (6.94)

(3rd) hom α-BLE:

(Σ_y + (1/α) ASA') L̂' = (1/α) AS .                                   (6.95)

:Proof:
(i) hom BLE:
The hybrid norm || MSE{ȟˆ} ||2 establishes the Lagrangean
L (L) := tr L6 y Lc + tr (I m  LA) ȟȟ c (I m  LA)c = min
L

for ȟ̂ hom BLE of ȟ . The necessary conditions for the minimum of the quad-
ratic Lagrangean L (L) are
wL ˆ
(L) := 2[6 y Lˆ c + Aȟȟ cA cLˆ c  Aȟȟ c] = 0
wL
which agree to the normal equations (6.93). (The theory of matrix derivatives is
reviewed in Appendix B (Facts: derivative of a scalar-valued function of a ma-
trix: trace).
The second derivatives

w2 L
(Lˆ ) > 0
w (vecL)w (vecL)c

at the “point” L̂ constitute the sufficiency conditions. In order to compute such


an mn×mn matrix of second derivatives we have to vectorize the matrix normal
equation
wL ˆ
( L ) := 2Lˆ (6 y + Aȟȟ cA c)  2ȟ ȟ cA c ,
wL

wL
( Lˆ ) := vec[2 Lˆ (6 y + Aȟȟ cA c)  2ȟȟ cA c] .
w (vecL )

(ii) hom S-BLE:

The hybrid norm ||MSE_S{ξ̂}||^2 establishes the Lagrangean

L(L) := tr LΣ_yL' + tr (I_m - LA) S (I_m - LA)' = min_L

for ξ̂ hom S-BLE of ξ. Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L(L),

∂L/∂L (L̂) := 2[Σ_y L̂' + ASA'L̂' - AS]' = 0,

as well as to the sufficiency conditions

∂^2 L / (∂(vec L) ∂(vec L)') (L̂) = 2[(Σ_y + ASA') ⊗ I_m] > 0 .

The normal equations of hom S-BLE, ∂L/∂L (L̂) = 0, agree with (6.94).

(iii) hom α-BLE:

The hybrid norm ||L'||^2_{Σ_y} + (1/α) ||(I_m - LA)'||^2_S establishes the Lagrangean

L(L) := tr LΣ_yL' + (1/α) tr (I_m - LA) S (I_m - LA)' = min_L

for ξ̂ hom α-BLE of ξ. Before we proceed, let us complete the sufficiency argument of the first part of the proof: in vectorized form the necessary conditions of hom BLE read

∂L/∂(vec L) (L̂) = 2[(Σ_y + Aξξ'A') ⊗ I_m] vec L̂ - 2 vec(ξξ'A') .

The Kronecker-Zehfuss product A ⊗ B of two arbitrary matrices as well as the law (A + B) ⊗ C = A ⊗ C + B ⊗ C for three arbitrary matrices subject to dim A = dim B is introduced in Appendix A, "Definition of Matrix Algebra: multiplication of matrices of the same dimension (internal relation) and Laws". The vec operation (vectorization of an array) is reviewed in Appendix A, too: "Definition, Facts: vec(AB) = (B' ⊗ I) vec A for suitable matrices A and B". Now we are prepared to compute

∂^2 L / (∂(vec L) ∂(vec L)') (L̂) = 2[(Σ_y + Aξξ'A') ⊗ I_m] > 0

as a positive definite matrix. The theory of matrix derivatives is reviewed in Appendix B, "Facts: derivative of a matrix-valued function of a matrix, namely ∂(vec X)/∂(vec X)'". Following the first part of the proof, for hom α-BLE we are led to the necessary conditions

∂L/∂L (L̂) = 2[(1/α) ASA'L̂' + Σ_y L̂' - (1/α) AS]' = 0

as well as to the sufficiency conditions

∂^2 L / (∂(vec L) ∂(vec L)') (L̂) = 2[((1/α) ASA' + Σ_y) ⊗ I_m] > 0 .

The normal equations of hom α-BLE, ∂L/∂L (L̂) = 0, agree with (6.95).


∎
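The normal equations of Lemma 6.10 can be solved with any standard linear solver. The following minimal sketch (Python with numpy; A, Σ_y and S are illustrative placeholders) solves (Σ_y + ASA') L̂' = AS for the hom S-BLE case and checks the result against the closed form L̂ = SA'(Σ_y + ASA')^-1 used below.

import numpy as np

# illustrative placeholders
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])    # n x m design matrix
Sigma_y = np.diag([1.0, 2.0, 1.5])                    # dispersion matrix of y
S = np.array([[2.0, 0.5], [0.5, 1.0]])                # bias weight matrix, rk S = m

# hom S-BLE normal equations (6.94): (Sigma_y + A S A') L' = A S
lhs = Sigma_y + A @ S @ A.T
Lhat = np.linalg.solve(lhs, A @ S).T                  # solve for L̂', then transpose

# closed-form representation used in Theorem 6.12
Lhat_closed = S @ A.T @ np.linalg.inv(Sigma_y + A @ S @ A.T)
print(np.allclose(Lhat, Lhat_closed))                 # True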
For an explicit representation of ξ̂ as hom BLE, hom S-BLE and hom α-BLE of ξ in the special Gauss–Markov model with fixed effects of Box 6.3, we solve the normal equations (6.93), (6.94) and (6.95) for

L̂ = arg{L(L) = min} .

Beside the explicit representation of ξ̂ of type hom BLE, hom S-BLE and hom α-BLE we compute the related dispersion matrix D{ξ̂}, the Mean Square Estimation Error MSE{ξ̂}, and the modified Mean Square Estimation Errors MSE_S{ξ̂} and MSE_{α,S}{ξ̂} in

Theorem 6.11 ( ȟ̂ hom BLE):


Let ȟˆ = Ly be hom BLE of ȟ in the special linear Gauss-Markov
model with fixed effects of Box 6.3. Then equivalent representa-
tions of the solutions of the normal equations (6.93) are

ȟˆ = ȟȟ cA c[ Ȉ y + Aȟȟ cA c]1 y (6.96)

(if [6 y + Aȟȟ cA c]1 exists)

and completed by the dispersion matrix

D{ȟˆ} = ȟȟ cA c[ Ȉ y + Aȟȟ cA c]1 Ȉ y ×


(6.97)
× [ Ȉ y + Aȟȟ cA c]1 Aȟȟ c ,

by the bias vector

ȕ := E{ȟˆ}  ȟ
(6.98)
= [I m  ȟȟ cA c( Aȟȟ cA c + Ȉ y ) 1 A] ȟ

and by the matrix of the Mean Square Estimation Error MSE{ȟˆ} :

MSE{ȟˆ}:= E{(ȟˆ  ȟ)(ȟˆ  ȟ)c}


(6.99)
= D{ȟˆ} + ȕȕc

MSE{ξ̂} := D{ξ̂} + [I_m - ξξ'A'(Aξξ'A' + Σ_y)^-1 A] ξξ' [I_m - A'(Aξξ'A' + Σ_y)^-1 Aξξ'] .   (6.100)

At this point we have to comment on what Theorem 6.11 tells us. hom BLE has generated the estimation ξ̂ of type (6.96), the dispersion matrix D{ξ̂} of type (6.97), the bias vector of type (6.98) and the Mean Square Estimation Error of type (6.100), which all depend on the vector ξ and the matrix ξξ', respectively. We already mentioned that ξ and the matrix ξξ' are not accessible from measurements. The situation is similar to the one in hypothesis testing. As shown later in this section we can produce only an estimate ξ̂ and consequently can set up a hypothesis ξ_0 of the "fixed effect" ξ. Indeed, a similar argument applies to the second central moment D{y} ~ Σ_y of the "random effect" y, the observation vector. Such a dispersion matrix has to be known in order to be able to compute ξ̂, D{ξ̂}, and MSE{ξ̂}. Again we have to apply the argument that we are only able to construct an estimate Σ̂_y and to set up a hypothesis about D{y} ~ Σ_y.
Theorem 6.12 ( ȟ̂ hom S-BLE):
Let ȟˆ = Ly be hom S-BLE of ȟ in the special linear Gauss-
Markov model with fixed effects of Box 6.3. Then equivalent repre-
sentations of the solutions of the the normal equations (6.94) are

ȟˆ = SA c( Ȉ y + ASA c) 1 y (6.101)

ȟˆ = ( A cȈ y1A + S 1 ) 1 AcȈ y1y (6.102)

ȟˆ = (I m + SA cȈ y1A) 1 SA c6 y1y (6.103)

(if S 1 , Ȉ y1 exist)


are completed by the dispersion matrices

D{ξ̂} = SA'(ASA' + Σ_y)^-1 Σ_y (ASA' + Σ_y)^-1 AS                      (6.104)

D{ξ̂} = (A'Σ_y^-1 A + S^-1)^-1 A'Σ_y^-1 A (A'Σ_y^-1 A + S^-1)^-1        (6.105)

(if S^-1, Σ_y^-1 exist)


by the bias vector

ȕ := E{ȟˆ}  ȟ
= [I m  SA c( ASA c + Ȉ y ) 1 A] ȟ

ȕ = [I m  ( A cȈ y1A + S 1 ) 1 A c6 y1A] ȟ (6.106)



(if S 1 , Ȉ y1 exist)


and by the matrix of the modified Mean Square Estimation Error MSE{ȟˆ} :

MSES {ȟˆ} := E{(ȟˆ  ȟ )(ȟˆ  ȟ )c}


(6.107)
= D{ȟˆ} + ȕȕc

MSES {ȟˆ} = SA c( ASA c + Ȉ y ) 1 Ȉ y ( ASA c + Ȉ y ) 1 AS +


+[I m  SA c( ASA c + Ȉ y ) 1 A] ȟȟ c [I m  Ac( ASAc + Ȉ y ) 1 AS] = (6.108)
= S  SA c( ASA c + Ȉ y ) AS 1

MSES {ȟˆ} = ( A cȈ y1A + S 1 ) 1 A cȈ y1A( A cȈ y1A + S 1 )1 +


+ [I m  ( A cȈ y1A + S 1 ) 1 A cȈ y1A] ȟȟ c ×
(6.109)
× [I m  A cȈ y1A( A cȈ y1A + S 1 ) 1 ]
= ( A cȈ y1A + S 1 ) 1

(if S 1 , Ȉ y1 exist).


The interpretation of hom S-BLE is even more complex. In extension of the comments on hom BLE we have to live with another matrix-valued degree of freedom: ξ̂ of type (6.101), (6.102), (6.103) and D{ξ̂} of type (6.104), (6.105) no longer depend on the inaccessible rank-one matrix ξξ', but on the "bias weight matrix" S, rk S = m. Indeed we can associate any element of the bias matrix with a particular weight which can be "designed" by the analyst. Again the bias vector β of type (6.106) as well as the Mean Square Estimation Error of type (6.107), (6.108), (6.109) depend on the vector ξ, which is inaccessible. Beside the "bias weight matrix" S, the quantities ξ̂, D{ξ̂}, β and MSE_S{ξ̂} are vector-valued or matrix-valued functions of the dispersion matrix D{y} ~ Σ_y of the stochastic observation vector, which is inaccessible as well. By hypothesis testing we may decide upon the construction of D{y} ~ Σ_y from an estimate Σ̂_y.
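The three equivalent representations (6.101)-(6.103) of ξ̂ hom S-BLE rest on the matrix identities quoted from Appendix A; they are easy to confirm numerically. A minimal sketch follows (Python with numpy; A, Σ_y, S and y are illustrative placeholders).

import numpy as np

# illustrative placeholders
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
Sigma_y = np.diag([1.0, 2.0, 1.5])
S = np.array([[2.0, 0.5], [0.5, 1.0]])
y = np.array([0.7, 1.1, 2.2])

Sy_inv = np.linalg.inv(Sigma_y)
S_inv = np.linalg.inv(S)
m = A.shape[1]

xi_1 = S @ A.T @ np.linalg.inv(Sigma_y + A @ S @ A.T) @ y                        # (6.101)
xi_2 = np.linalg.inv(A.T @ Sy_inv @ A + S_inv) @ A.T @ Sy_inv @ y                # (6.102)
xi_3 = np.linalg.inv(np.eye(m) + S @ A.T @ Sy_inv @ A) @ S @ A.T @ Sy_inv @ y    # (6.103)
print(np.allclose(xi_1, xi_2), np.allclose(xi_1, xi_3))                          # True True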

Theorem 6.13 (ξ̂ hom α-BLE):

Let ξ̂ = L̂y be hom α-BLE of ξ in the special linear Gauss-Markov model with fixed effects of Box 6.3. Then equivalent representations of the solutions of the normal equations (6.95) are

ξ̂ = (1/α) SA'(Σ_y + (1/α) ASA')^-1 y                                  (6.110)

ξ̂ = (A'Σ_y^-1 A + α S^-1)^-1 A'Σ_y^-1 y                               (6.111)

ξ̂ = (I_m + (1/α) SA'Σ_y^-1 A)^-1 (1/α) SA'Σ_y^-1 y                    (6.112)

(if S^-1, Σ_y^-1 exist),

completed by the dispersion matrices

D{ξ̂} = (1/α) SA'(Σ_y + (1/α) ASA')^-1 Σ_y (Σ_y + (1/α) ASA')^-1 AS (1/α)      (6.113)

D{ξ̂} = (A'Σ_y^-1 A + α S^-1)^-1 A'Σ_y^-1 A (A'Σ_y^-1 A + α S^-1)^-1            (6.114)

(if S^-1, Σ_y^-1 exist),

by the bias vector

β := E{ξ̂} - ξ = -[I_m - (1/α) SA'((1/α) ASA' + Σ_y)^-1 A] ξ

β = -[I_m - (A'Σ_y^-1 A + α S^-1)^-1 A'Σ_y^-1 A] ξ                     (6.115)

(if S^-1, Σ_y^-1 exist)

and by the matrix of the Mean Square Estimation Error MSE{ξ̂}:

MSE{ξ̂} := E{(ξ̂ - ξ)(ξ̂ - ξ)'} = D{ξ̂} + ββ'                             (6.116)

MSE_{α,S}{ξ̂} = (1/α) SA'((1/α) ASA' + Σ_y)^-1 Σ_y ((1/α) ASA' + Σ_y)^-1 AS (1/α) +
  + [I_m - (1/α) SA'((1/α) ASA' + Σ_y)^-1 A] ξξ' [I_m - A'((1/α) ASA' + Σ_y)^-1 AS (1/α)] =
  = (1/α) S - (1/α) SA'((1/α) ASA' + Σ_y)^-1 AS (1/α)                  (6.117)

MSE_{α,S}{ξ̂} = (A'Σ_y^-1 A + α S^-1)^-1 A'Σ_y^-1 A (A'Σ_y^-1 A + α S^-1)^-1 +
  + [I_m - (A'Σ_y^-1 A + α S^-1)^-1 A'Σ_y^-1 A] ξξ' [I_m - A'Σ_y^-1 A (A'Σ_y^-1 A + α S^-1)^-1] =
  = (A'Σ_y^-1 A + α S^-1)^-1                                           (6.118)

(if S^-1, Σ_y^-1 exist).

The interpretation of the very important estimator hom α-BLE ξ̂ of ξ is as follows: ξ̂ of type (6.111), also called ridge estimator or Tykhonov-Phillips regulator, contains the Cayley inverse of the normal equation matrix which is additively decomposed into A'Σ_y^-1 A and α S^-1. The weight factor α balances the first inverse dispersion part and the second inverse bias part. While the experiment informs us of the variance-covariance matrix Σ_y, say Σ̂_y, the bias weight matrix S and the weight factor α are at the disposal of the analyst. For instance, by the choice S = Diag(s_1, ..., s_m) we may emphasize the increase or decrease of certain bias matrix elements. The choice of an equally weighted bias matrix is S = I_m. In contrast, the weight factor α can be determined by an A-optimal design of type
• tr D{ξ̂} = min_α
• ββ' = min_α
• tr MSE_{α,S}{ξ̂} = min_α .
In the first case we optimize the trace of the variance-covariance matrix D{ξ̂} of type (6.113), (6.114). Alternatively, by means of ββ' = min_α we optimize the quadratic bias where the bias vector β of type (6.115) is chosen, regardless of the dependence on ξ. Finally, for the third case – the most popular one – we minimize the trace of the Mean Square Estimation Error MSE_{α,S}{ξ̂} of type (6.118), regardless of the dependence on ξξ'. But beforehand let us present the proof of Theorem 6.11, Theorem 6.12 and Theorem 6.13.
Proof:
(i) ȟˆ = ȟȟ cA c[ Ȉ y + Aȟȟ cA c]1 y
If the matrix Ȉ y + Aȟȟ cA c of the normal equations of type hom BLE is of full
rank, namely rk(Ȉ y + Aȟȟ cA c) = n, then a straightforward solution of (6.93) is

Lˆ = ȟȟ cA c[ Ȉ y + Aȟȟ cA c]1.

(ii) ȟˆ = SA c( Ȉ y + ASA c) 1 y
If the matrix Ȉ y + ASAc of the normal equations of type hom S-BLE is of full
rank, namely rk(Ȉ y + ASA c) = n, then a straightforward solution of (6.94) is

Lˆ = SAc( Ȉ y + ASAc) 1.

(iii) z = ( A cȈ y1A + S 1 ) 1 AcȈ y1y


Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matri-
ces, s(10), Duncan-Guttman matrix identity) the fundamental matrix identity
SA c( Ȉ y + ASA c) 1 = ( A cȈ y1A + S 1 ) 1 A cȈ y1 ,

if S 1 and Ȉ y1 exist. Such a result concludes this part of the proof.
(iv) ȟˆ = (I m + SA cȈ y1A) 1 SA cȈ y1y
Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matri-
ces, s(9)) the fundamental matrix identity

SA c( Ȉ y + ASAc) 1 = (I m + SAcȈ y1 A) 1 SAcȈ y1 ,

if Ȉ y1 exists. Such a result concludes this part of the proof.

(v) ξ̂ = (1/α) SA'(Σ_y + (1/α) ASA')^-1 y

If the matrix Σ_y + (1/α) ASA' of the normal equations of type hom α-BLE is of full rank, namely rk(Σ_y + (1/α) ASA') = n, then a straightforward solution of (6.95) is

L̂ = (1/α) SA'[Σ_y + (1/α) ASA']^-1 .

(vi) ξ̂ = (A'Σ_y^-1 A + α S^-1)^-1 A'Σ_y^-1 y

Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan-Guttman matrix identity) the fundamental matrix identity

(1/α) SA'(Σ_y + (1/α) ASA')^-1 = (A'Σ_y^-1 A + α S^-1)^-1 A'Σ_y^-1 ,

if S^-1 and Σ_y^-1 exist. Such a result concludes this part of the proof.

(vii) ȟˆ = (I m + 1 SA cȈ y1A) 1 1 SA cȈ y1y


D D
Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matri-
ces, s(9), Duncan-Guttman matrix identity) the fundamental matrix identity
1 SA c( Ȉ + ASA c) 1 = (I + 1 SA cȈ 1A ) 1 1 SA cȈ 1
y m y y
D D D
if Ȉ y1 exist. Such a result concludes this part of the proof.
(viii) hom BLE: D{ȟˆ}

D{ȟˆ} := E{[ȟˆ  E{ȟˆ}][ȟˆ  E{ȟˆ}]c} =


= ȟȟ cA c[ Ȉ y + Aȟȟ cA c]1 Ȉ y [ Ȉ y + Aȟȟ cA c]1 Aȟȟ c.

By means of the definition of the dispersion matrix D{ȟˆ} and the substitution of
ȟ̂ of type hom BLE the proof has been straightforward.
(ix) hom S-BLE: D{ȟˆ} (1st representation)

D{ȟˆ} := E{[ȟˆ  E{ȟˆ}][ȟˆ  E{ȟˆ}]c} =


= SA c( ASA c + Ȉ y ) 1 Ȉ y ( ASA c + Ȉ y ) 1 AS.

By means of the definition of the dispersion matrix D{ȟˆ} and the substitution of
ȟ̂ of type hom S-BLE the proof of the first representation has been straightfor-
ward.

(x) hom S-BLE: D{ȟˆ} (2nd representation)

D{ȟˆ} := E{[ȟˆ  E{ȟˆ}][ȟˆ  E{ȟˆ}]c} =


= ( A cȈ y1A + S 1 ) 1 Ac6 y1A( A cȈ y1A + S 1 )1 ,

if S 1 and Ȉ y1 exist. By means of the definition of the dispersion matrix


D{ȟˆ} and the substitution of ȟ̂ of type hom S-BLE the proof of the second repre-
sentation has been straightforward.
(xi) hom Į-BLE: D{ȟˆ} (1st representation)
D{ȟ} := E{[ȟˆ  E{ȟˆ}][ȟˆ  E{ȟˆ}]c} =
ˆ

= 1 SA c( Ȉ y + 1 ASA c) 1 Ȉ y ( Ȉ y + 1 ASA c) 1 AS 1 .
D D D D
By means of the definition of the dispersion matrix D{ȟˆ} and the substitution of
ȟ̂ of type hom Į-BLE the proof of the first representation has been straightfor-
ward.
(xii) hom Į-BLE: D{ȟˆ} (2nd representation)

D{ȟˆ} := E{[ȟˆ  E{ȟˆ}][ȟˆ  E{ȟˆ}]c} =


= ( A cȈ y1A + D S 1 ) 1 AcȈ y1A( AcȈ y1A + D S 1 )1 ,

if S 1 and Ȉ y1 exist. By means of the definition of the dispersion matrix and the
D{ȟˆ} substitution of ȟ̂ of type hom Į-BLE the proof of the second representation
has been straightforward.
(xiii) bias ȕ for hom BLE, hom S-BLE and hom Į-BLE
As soon as we substitute into the bias ȕ := E{ȟˆ}  ȟ = ȟ + E{ȟˆ} the various esti-
mators ȟ̂ of the type hom BLE, hom S-BLE and hom Į-BLE we are directly
led to various bias representations ȕ of type hom BLE, hom S-BLE and hom Į-
BLE.
(xiv) MSE{ȟˆ} of type hom BLE, hom S-BLE and hom Į-BLE
MSE{ȟˆ} := E{(ȟˆ  ȟ )(ȟˆ  ȟ )c}
ȟˆ  ȟ = ȟˆ  E{ȟˆ} + ( E{ȟˆ}  ȟ )

E{(ȟˆ  ȟ )(ȟˆ  ȟ )c} =


E{(ȟˆ  E{ȟˆ})((ȟˆ  E{ȟˆ})c}
+( E{ȟˆ}  ȟ )( E{ȟˆ}  ȟ )c
MSE{ȟˆ} = D{ȟˆ} + ȕȕc .

At first we have defined the Mean Square Estimation Error MSE{ȟˆ} of ȟ̂ . Sec-
ondly we have decomposed the difference ȟˆ  ȟ into the two terms
• ȟˆ  E{ȟˆ}
• E{ȟˆ}  ȟ
in order to derive thirdly the decomposition of MSE{ȟˆ} , namely
• the dispersion matrix of ȟ̂ , namely D{ȟˆ} ,
• the quadratic bias ȕȕc .
As soon as we substitute MSE{ȟˆ} the dispersion matrix D{ȟˆ} and the bias vector
ȕ of various estimators ȟ̂ of the type hom BLE, hom S-BLE and hom D -BLE
we are directly led to various representations ȕ of the Mean Square Estimation
Error MSE{ȟˆ} .
Here is my proof's end. ∎
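Theorem 6.13 identifies hom α-BLE with a ridge-type (Tykhonov-Phillips) estimator. The following sketch (Python with numpy; A, Σ_y, S, y and the grid of α-values are illustrative placeholders) computes ξ̂ via the two equivalent forms (6.110) and (6.111) and evaluates tr D{ξ̂} from (6.114) and tr MSE_{α,S}{ξ̂} = tr (A'Σ_y^-1 A + αS^-1)^-1 from (6.118) over a grid of weight factors α, mirroring the A-optimal design criteria listed above.

import numpy as np

# illustrative placeholders
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
Sigma_y = np.diag([1.0, 2.0, 1.5])
S = np.eye(2)
y = np.array([0.7, 1.1, 2.2])

Sy_inv = np.linalg.inv(Sigma_y)
N = A.T @ Sy_inv @ A                                   # A' Sigma_y^-1 A

for alpha in (0.1, 1.0, 10.0):
    # (6.110): xi = (1/alpha) S A' (Sigma_y + (1/alpha) A S A')^-1 y
    xi_110 = (S / alpha) @ A.T @ np.linalg.inv(Sigma_y + A @ (S / alpha) @ A.T) @ y
    # (6.111): ridge / Tykhonov-Phillips form
    M = np.linalg.inv(N + alpha * np.linalg.inv(S))
    xi_111 = M @ A.T @ Sy_inv @ y
    trace_D = np.trace(M @ N @ M)                      # tr D{xi}, (6.114)
    trace_MSE = np.trace(M)                            # tr MSE_{alpha,S}{xi}, (6.118)
    print(alpha, np.allclose(xi_110, xi_111), trace_D, trace_MSE)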
7 A spherical problem of algebraic representation - inconsistent system of directional observational equations - overdetermined system of nonlinear equations on curved manifolds
“Least squares regression is not appropriate when the response variable is cir-
cular, and can lead to erroneous results. The reason for this is that the squared
difference is not an appropriate measure of distance on the circle.”
U. Lund (1999)
A typical example of a nonlinear model is the inconsistent system of nonlinear
observational equations generated by directional measurements (angular obser-
vations, longitudinal data). Here the observation space Y as well as the parame-
ter space X is the hypersphere S p  R p +1 : the von Mises circle S1 , p = 2 the
Fisher sphere S 2 , in general the Langevin sphere S p . For instance, assume re-
peated measurements of horizontal directions to one target which are distributed
as polar coordinates on a unit circle clustered around a central direction. Alter-
natively, assume repeated measurements of horizontal and vertical directions to
one target which are similarly distributed as spherical coordinates (longitude,
latitude) on a unit sphere clustered around a central direction. By means of a
properly chosen loss function we aim at a determination of the central direction.
Let us connect all points on S1 , S 2 , or in general S p the measurement points, by
a geodesic, here the great circle, to the point of the central direction. Indeed the
loss function will be optimal at a point on S1 , S 2 , or in general S p , called the
central point. The result for such a minimum geodesic distance mapping will be
presented.
Please pay attention to the guideline of Chapter 7.

Definition 7.1 (minimum geodesic distance: S^1) – Lemma 7.2 (minimum geodesic distance: S^1) – Lemma 7.3 (minimum geodesic distance: S^1)
Definition 7.4 (minimum geodesic distance: S^2) – Lemma 7.5 (minimum geodesic distance: S^2) – Lemma 7.6 (minimum geodesic distance: S^2)

7-1 Introduction
Directional data, also called "longitudinal data" or "angular data", arise in several situations, notably geodesy, geophysics, geology, oceanography, atmospheric science, meteorology and others. The von Mises or circular normal distribution CN(μ, κ) with mean direction parameter μ (0 ≤ μ ≤ 2π) and concentration parameter κ (κ > 0), the reciprocal of a dispersion measure, plays the role for circular data that the Gauss normal distribution plays for linear data. A natural extension of the CN distribution to a distribution on the p-dimensional sphere S^p ⊂ R^(p+1) leads to the Fisher - von Mises or Langevin distribution L(μ, κ). For p = 2, namely for spherical data (spherical longitude, spherical latitude), this distribution has been studied by R. A. Fisher (1953), generalizing the result of R. von Mises (1918) for p = 1, and is often quoted as the Fisher distribution. Further details can be taken from K. V. Mardia (1972), K. V. Mardia and P. E. Jupp (2000), G. S. Watson (1986, 1998) and A. Sen Gupta and R. Maitra (1998).
Box 7.1:
Fisher - von Mises or Langevin distribution

p = 1 (R. von Mises 1918)

f(Λ | μ, κ) = [2π I_0(κ)]^-1 exp[κ cos(Λ - μ_Λ)]                        (7.1)

f(Λ | μ, κ) = [2π I_0(κ)]^-1 exp κ <μ | X>                              (7.2)

cos Ψ := <μ | X> = μ_x X + μ_y Y = cos μ_Λ cos Λ + sin μ_Λ sin Λ        (7.3)

cos Ψ = cos(Λ - μ_Λ)                                                    (7.4)

μ = e_1 cos μ_Λ + e_2 sin μ_Λ ∈ S^1                                     (7.5)

X = e_1 cos Λ + e_2 sin Λ ∈ S^1                                         (7.6)

p = 2 (R. A. Fisher 1953)

f(Λ, Φ | μ_Λ, μ_Φ, κ) =
 = κ/(4π sinh κ) exp κ[cos Φ cos μ_Φ cos(Λ - μ_Λ) + sin Φ sin μ_Φ]      (7.7)
 = κ/(4π sinh κ) exp κ <μ | X>

cos Ψ := <μ | X> = μ_x X + μ_y Y + μ_z Z =                              (7.8)
 = cos μ_Φ cos μ_Λ cos Φ cos Λ + cos μ_Φ sin μ_Λ cos Φ sin Λ + sin μ_Φ sin Φ

cos Ψ = cos Φ cos μ_Φ cos(Λ - μ_Λ) + sin Φ sin μ_Φ

μ = e_1 μ_x + e_2 μ_y + e_3 μ_z =
 = e_1 cos μ_Φ cos μ_Λ + e_2 cos μ_Φ sin μ_Λ + e_3 sin μ_Φ ∈ S^2        (7.9)

X = e_1 X + e_2 Y + e_3 Z =
 = e_1 cos Φ cos Λ + e_2 cos Φ sin Λ + e_3 sin Φ ∈ S^2 .                (7.10)
Box 7.1 is a review of the Fisher- von Mises or Langevin distribution. First, we
setup the circular normal distribution on S1 with longitude / as the stochastic
variable and ( P/ , N ) the distributional parameters called “mean direction ȝ ”
and “concentration measure”, the reciprocal of a dispersion measure. Due to the
normalization of the circular probability density function (“pdf”) I 0 (N ) as the
zero order modified Bessel function of the first kind of N appears. The circular
distance between the circular mean vector ȝ  S1 and the placement vector
X  S1 is measured by “ cos < ”, namely the inner product < ȝ | X > , both P
and X represented in polar coordinates ( P / , / ) , respectively. In summary,
(7.1) is the circular normal pdf, namely an element of the exponential class.
Second, we refer to the spherical normal pdf on S 2 with spherical longitude / ,
spherical latitude ) as the stochastic variables and ( P / , P) , N ) the distribu-
tional parameters called “longitudinal mean direction, lateral mean direction
( P/ , P) ) ” and “concentration measure N ”, the reciprocal of a dispersion
measure. Here the normalization factor of the spherical pdf is N /(4S sinh N ) .
The spherical distance between the spherical mean vector ȝ  S 2 and the place-
ment vector X  S 2 is measured by “ cos < ”, namely the inner product
< ȝ | X > , both ȝ and X represented in polar coordinates – spherical coordi-
nates ( P / , P) , /, ) ) , respectively. In summary, (7.7) is the spherical normal pdf,
namely an element of the exponential class.
Box 7.2:
Loss function
p=1: longitudinal data
n
type1: ¦ cos < i = max ~ 1c cos Ȍ = max (7.11)
i =1
n
type 2 : ¦ (1  cos < i ) = min ~ 1c(1  cos Ȍ) = min (7.12)
i =1
n
Ȍ Ȍ
type 3 : ¦ sin 2
< i / 2 = min ~ (sin )c (sin ) = min (7.13)
i =1 2 2
transformation
1  cos < = 2sin 2 < / 2 (7.14)
" geodetic distance"
cos< i = cos(/ i  x) = cos / i cos x + sin / i sin x (7.15)
2sin < i / 2 = 1  cos < i = 1  cos / i cos x + sin / i sin x
2
(7.16)

ª cos <1 º ª y1 º ª cos /1 º


cos Ȍ = «« # »» , cos y := cos ȁ := «« # »» = «« # »» (7.17)
«¬cos < n »¼ «¬ yn »¼ «¬ cos / n »¼
cos Ȍ = cos y cos x + sin y sin x . (7.18)

? How to generate a loss function substituting least squares ?


Obviously the von Mises pdf ( p { 1) has maximum likelihood if 6in=1 cos < i =
6in=1 cos(/ i  x) is maximal. Equivalently 6in=1 (1  cos < i ) is minimal. By trans-
forming 1  cos < i by (7.14) into 2sin 2 < / 2 , an equivalent loss function is
6in=1 sin 2 < / 2 to be postulated minimal. According to Box 7.2 the geodetic dis-
tance is represented as a nonlinear function of the unknown mean direction
P  x . (7.17) constitutes the observation vector y  S1 .
Similarly the Fisher pdf ( p { 2) has maximum likelihood if 6in=1 cos < i is
maximal. Equivalent postulates are (7.20) 6in=1 (1  cos < i ) = min and (7.21)
6in=1 sin 2 < i / 2 = min . According to Box 7.3 the geodetic distance (7.23) is
represented as a nonlinear function of the unknown mean direction ( P/ , P) ) 
( x1 , x2 ) . (7.24), (7.25) constitute the nonlinear observational equations for direct
observations of type “longitude latitude” (/ i , ) i )  S 2 , the observation space
Y , and unknown parameters of type mean longitudinal, lateral direction
( P/ , P) )  S 2 , the parameter space X .
Box 7.3:
Loss function
p=2: longitudinal data
n
type1: ¦ cos < i = max ~ 1c cos Ȍ = max (7.19)
i =1
n
type 2 : ¦ (1  cos < i ) = min ~ 1c(1  cos Ȍ) = min (7.20)
i =1
n
Ȍ Ȍ
type 3 : ¦ sin 2
< i / 2 = min ~ (sin )c (sin ) = min (7.21)
i =1 2 2

transformation
1  cos < = 2sin 2 < / 2 (7.22)
" geodetic distance"
cos< i = cos ) i cos x2 cos(/ i  x1 ) + sin ) i sin x2 =
(7.23)
= cos ) i cos / i cos x1 cos x2 + cos ) i sin / i sin x1 cos x2 + sin ) i sin x2

ª cos <1 º ª cos /1 º


cos Ȍ := «« # »» , cos y1 := cos ȁ := «« # »» (7.24)
«¬ cos < n »¼ «¬ cos / n »¼
ª cos )1 º
cos y 2 := cos ĭ := «« # »» , sin y1 , sin y 2 correspondingly
¬« cos ) n ¼»

ª cos )1 cos /1 º ª cos )1 sin /1 º ª sin )1 º


«
cos Ȍ = « » « » « »
# » cos x1 cos x2 + « # » sin x1 cos x2 + « # » sin x2 .
«¬ cos ) n cos / n »¼ «¬ cos ) n sin / n »¼ «¬sin ) n »¼
(7.25)
7-2 Minimal geodesic distance: MINGEODISC
By means of Definition 7.1 we define the minimal geodetic distance solution
(MINGEODISC) on S 2 . Lemma 7.2 presents you the corresponding nonlinear
normal equation whose close form solution is explicitly given by Lemma 7.3 in
terms of Gauss brackets (special summation symbols). In contrast Definition 7.4
confronts us with the definition of the minimal geodetic distance solution
(MINGEODISC) on S 2 . Lemma 7.5 relates to the corresponding nonlinear nor-
mal equations which are solved in a closed form via Lemma 7.6 , again taking
advantage of the Gauss brackets.

Definition 7.1 (minimum geodesic distance: S1 ):


A point / g  S1 is called at minimum geodesic distance to other
points / i  S1 , i  {1, " , n} if the circular distance function
n n
L(/ g ) := ¦ 2(1  cos < i ) = ¦ 2[1  cos(/ i  / g )] = min (7.26)
/g
i =1 i =1

is minimal.
n
/ g = arg {¦ 2(1  cos < i ) = min | cos < i = cos(/ i  / g )} . (7.27)
i=1

Lemma 7.2 (minimum geodesic distance, normal equation: S1 ):


A point / g  S1 is called at minimum geodesic distance to other
points / i  S1 , i  {1, " , n} if / g = x fulfils the normal equation
n n
 sin x (¦ cos / i ) + cos x (¦ sin / i ) = 0. (7.28)
i =1 i =1

Proof:
/ g is generated by means of the Lagrangean (loss function)
n
L( x) := ¦ 2[1  cos(/ i  x)] =
i =1
n n
= 2n  2 cos x ¦ cos / i  2sin x ¦ sin / i = min.
x
i =1 i =1

The first derivatives


d L( x ) n n
(/ g ) = 2sin / g ¦ cos / i  2 cos / g ¦ sin / i = 0
dx i =1 i =1

constitute the necessary conditions. The second derivative

d 2 L( x ) n n

2
(/ g ) = 2 cos / g ¦ cos / i + 2sin / g ¦ sin / i > 0
dx i =1 i =1

builds up the sufficiency condition for the minimum at / g .

Lemma 7.3 (minimum geodesic distance, solution of the normal equa-


tion: S1 ):
Let the point / g  S1 be at minimum geodesic distance to other
points / i  S1 , i  {1, " , n} . Then the corresponding normal
equation (7.28) is uniquely solved by
tan / g = [sin /] /[cos / ] , (7.29)

such that the circular solution point is


1
X g = e1 cos / g + e 2 sin / g = {e1 [cos /] + e 2 [sin /]} (7.30)
[sin / ] + [cos / ]2
2

with respect to the Gauss brackets


n
[sin / ]2 := (¦ sin / i ) 2 (7.31)
i=1
n
[cos / ]2 := (¦ cos / i ) 2 . (7.32)
i=1
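The closed-form solution of Lemma 7.3 is only a few lines in any numerical language. A minimal sketch (Python with numpy; the sample directions are illustrative placeholders) computes the Gauss brackets [sin Λ], [cos Λ] and the minimum-geodesic-distance direction Λ_g via the two-argument arctangent.

import numpy as np

# illustrative sample of directions on the circle, in degrees
Lam = np.radians([12.0, 15.0, 9.0, 14.0, 11.0])

sin_sum = np.sum(np.sin(Lam))                 # Gauss bracket [sin Lambda]
cos_sum = np.sum(np.cos(Lam))                 # Gauss bracket [cos Lambda]

Lam_g = np.arctan2(sin_sum, cos_sum)          # tan Lambda_g = [sin Lambda]/[cos Lambda]
X_g = np.array([cos_sum, sin_sum]) / np.hypot(cos_sum, sin_sum)   # circular solution point (7.30)
print(np.degrees(Lam_g), X_g)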

Next we generalize MINGEODISC ( p = 1) on S1 to MINGEODISC ( p = 2)


on S 2 .

Definition 7.4 (minimum geodesic distance: S 2 ):


A point (/ g , ) g )  S 2 is called at minimum geodesic distance to
other points (/ i , ) i )  S 2 , i  {1, " , n} if the spherical distance

function
n
L(/ g , ) g ) := ¦ 2(1  cos < i ) = (7.33)
i=1
n
= ¦ 2[1  cos ) i cos ) g cos(/ i  / g )  sin ) i sin ) g ] = min
/g ,)g
i =1

is minimal.
n
(/ g , ) g ) = arg {¦ 2(1  cos < i ) = min | (7.34)
i=1
| cos < i = cos ) i cos ) g cos(/ i  / g ) + sin ) i sin ) g } .

Lemma 7.5 (minimum geodesic distance, normal equation: S 2 ):


A point (/ g , ) g )  S 2 is called at minimum geodesic distance to
other points (/ i , ) i )  S 2 , i  {1, " , n} if / g = x1 , ) g = x2 fulfils
the normal equations
n n
 sin x2 cos x1 ¦ cos ) i cos / i  sin x2 sin x1 ¦ cos ) i sin / i +
i =1 i =1
n
(7.35)
+ cos x2 ¦ sin ) i = 0
i =1

n n
cos x2 cos x1 ¦ cos ) i sin / i  cos x2 sin x1 ¦ cos ) i cos / i = 0 . (7.36)
i =1 i =1

Proof:
(/ g , ) g ) is generated by means of the Lagrangean (loss function)
n
L( x1 , x2 ) := ¦ 2[1  cos ) i cos / i cos x1 cos x2 
i =1

 cos ) i sin / i sin x1 cos x2  sin ) i sin x2 ] =


n
= 2n  2 cos x1 cos x2 ¦ cos ) i cos / i 
i =1
n n
2sin x1 cos x2 ¦ cos ) i sin / i  2sin x2 ¦ sin ) i .
i =1 i =1

The first derivatives


wL( x) n
(/ g , ) g ) = 2sin / g cos ) g ¦ cos ) i cos / i 
w x1 i =1
n
 2 cos / g cos ) g ¦ cos ) i sin / i = 0
i =1

wL( x) n
(/ g , ) g ) = 2 cos / g sin ) g ¦ cos ) i cos / i +
w x2 i =1
n
+ 2sin / g sin ) g ¦ cos ) i sin / i 
i =1
n
 2 cos ) g ¦ sin ) i = 0
i =1
constitute the necessary conditions. The matrix of second derivative

w 2 L( x )
(/ g , ) g ) t 0
w xw xc
builds up the sufficiency condition for the minimum at (/ g , ) g ) .
w 2 L( x )
(/ g , ) g ) = 2 cos / g cos ) g [cos ) cos / ] +
w x12
+ 2sin / g cos ) g [cos ) sin / ]
w 2 L( x )
(/ g , ) g ) = 2sin / g sin ) g [cos ) cos / ] +
w x1 x2
+ 2 cos / g sin ) g [cos ) sin / ]
w L( x )
2
(/ g , ) g ) = 2 cos / g cos ) g [cos ) cos / ] +
w x22
+ 2sin / g cos ) g [cos ) sin / ] +
+ sin ) g [sin ) ]. .
h
Lemma 7.6 (minimum geodesic distance, solution of the normal
equation: S 2 ):
Let the point (/ g , ) g )  S 2 be at minimum geodesic distance to
other points (/ i , ) i )  S 2 , i  {1, " , n} . Then the corresponding
normal equations ((7.35), (7.36)) are uniquely solved by
tan / g = [cos ) sin /] /[cos ) cos /] (7.37)
[sin )]
tan ) g =
[cos ) cos / ]2 + [cos ) sin /]2
such that the circular solution point is
X g = e1 cos ) g cos / g + e 2 cos ) g sin / g + e3 sin ) g =
1
= *
[cos ) cos / ] + [cos ) sin /]2 + [sin )]2
2
(7.38)
*{e1[cos ) cos / ] + e 2 [cos ) sin /] + e3 [sin )]}
2 2

subject to
n
[cos ) cos / ] := ¦ cos ) i cos / i (7.39)
i=1
n
[cos ) sin / ] := ¦ cos ) i sin / i (7.40)
i=1
n
[sin )] := ¦ sin ) i .
i=1
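Lemma 7.6 can be coded just as directly. The sketch below (Python with numpy; the first four directions of Table 7.5 serve as illustrative input) evaluates the Gauss brackets [cos Φ cos Λ], [cos Φ sin Λ], [sin Φ] and returns (Λ_g, Φ_g).

import numpy as np

# illustrative spherical directions (longitude Lambda, latitude Phi), in degrees
Lam = np.radians([124.9, 125.2, 126.1, 125.7])
Phi = np.radians([88.1, 88.3, 88.2, 88.1])

cc = np.sum(np.cos(Phi) * np.cos(Lam))        # [cos Phi cos Lambda]
cs = np.sum(np.cos(Phi) * np.sin(Lam))        # [cos Phi sin Lambda]
s  = np.sum(np.sin(Phi))                      # [sin Phi]

Lam_g = np.arctan2(cs, cc)                    # (7.37)
Phi_g = np.arctan2(s, np.hypot(cc, cs))       # tan Phi_g = [sin Phi]/sqrt([..]^2 + [..]^2)
print(np.degrees(Lam_g) % 360.0, np.degrees(Phi_g))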

7-3 Special models: from the circular normal distribution to the


oblique normal distribution
First, we present a historical note about the von Mises distribution on the circle.
Second, we aim at constructing a twodimensional generalization of the Fisher
circular normal distribution to its elliptic counterpart. We present 5 lemmas of
different type. Third, we intend to prove that an angular metric fulfils the four
axioms of a metric.

7-31 A historical note of the von Mises distribution


Let us begin with a historical note:
The von Mises Distribution on the Circle
In the early part of the last century, Richard von Mises (1918) considered the
table of the atomic weights of elements, seven entries of which are as follows:
Table 7.1

Element Al Sb Ar As Ba Be Bi
Atomic
W 26.98 121.76 39.93 74.91 137.36 9.01 209.00
Weight

He asked the question “Does a typical element in some sense have integer atomic
weight ?” A natural interpretation of the question is “Do the fractional parts of
the weight cluster near 0 and 1?” The atomic weight W can be identified in a
natural way with points on the unit circle, in such a way that equal fractional
parts correspond to identical points. This can be done under the mapping
ªcos T1 º
W ox=« , T1 = 2S (W  [W ]),
 ¬sin T1 »¼
where [u ] is the largest integer not greater than u . Von Mises’ question can now
be seen to be equivalent to asking “Do this points on the circle cluster near
e1 = [1 0]c ?”.

Incidentally, the mapping W o x can be made in another way:


ªcos T 2 º 1
W ox=« » , T 2 = 2S (W  [W + ]) .
 ¬sin T 2 ¼ 2
The two sets of angles for the two mappings are then as follows:
Table 7.2
Element Al Sb Ar As Ba Be Bi Average
T1 / 2S 0.98 0.76 0.93 0.91 0.36 0.01 0.00 T1 / 2S = 0.566

T 2 / 2S -0.02 -0.24 -0.06 -0.09 0.36 0.01 0.00 T 2 / 2S = 0.006

We note from the discrepancy between the averages in the final column that our
usual ways of describing data, e.g., means and standard deviations, are likely to
fail us when it comes to measurements of direction.
If the points do cluster near e1 then the resultant vector 6 Nj=1x j (here N =7)
 we should have approximately x / || x ||= e ,
should point in that direction, i.e., 1
 elements
where x = 6 x j / N and ||x|| = (xcx)1/ 2 is the length of x . For the seven  
  are considered
whose weights    we find
here, 

x̄ / ||x̄|| = [0.9617, -0.2741]' = [cos 344.09°, sin 344.09°]' = [cos(-15.91°), sin(-15.91°)]' ,

a direction not far removed from e_1.
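The resultant-vector computation of von Mises' example can be reproduced in a few lines. The sketch below (Python with numpy) maps the seven atomic weights of Table 7.1 to points on the unit circle via θ_1 = 2π(W - [W]) and prints the normalized resultant x̄/||x̄||.

import numpy as np

# atomic weights of Table 7.1 (Al, Sb, Ar, As, Ba, Be, Bi)
W = np.array([26.98, 121.76, 39.93, 74.91, 137.36, 9.01, 209.00])

theta1 = 2.0 * np.pi * (W - np.floor(W))          # theta_1 = 2*pi*(W - [W])
x = np.column_stack([np.cos(theta1), np.sin(theta1)])

x_bar = x.mean(axis=0)                            # resultant direction (1/N) * sum of x_j
unit = x_bar / np.linalg.norm(x_bar)
print(unit, np.degrees(np.arctan2(unit[1], unit[0])))   # close to e_1 = [1 0]', about -15.9 degrees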

Von Mises then asked “For what distribution of the unit circle is the unit vec-
tor Pˆ = [cos Tˆ0 sin Tˆ0 ]c = x / || x || a maximum likelihood estimator (MLE) of a
 
direction T 0 of clustering or concentration ?” The answer is the distribution now
known as the von Mises or circular normal distribution. It has density, expressed
in terms of random angle T ,
exp{k cos(T  T 0 )} dT
,
I 0 (k ) 2S
where T 0 is the direction of concentration and the normalizing constant I 0 (k ) is
a Bessel function. An alternative expression is
exp(kx ' ȝ) dS ªcos Tˆ0 º
  , ȝ=« » , ||x|| = 1.
I 0 (k ) 2S  «¬sin Tˆ0 »¼

Von Mises’ question clearly has to do with the truth of the hypothesis
ª1 º
H 0 : T 0 = 0 or ȝ = e1 = « » .
  ¬0¼
It is worth mentioning that Fisher found the same distribution in another context
(Fisher, 1956, SMSI, pp. 133-138) as the conditional distribution of x , given

|| x ||= 1, when x is N 2 (ȝ, k 1 I 2 ) .
  
7-32 Oblique map projection

A special way to derive the general representation of the twodimensional gener-


alized Fisher sphere is by forming the general map of S 2 onto \ 2 . In order to
follow a systematic approach, let us denote

by {A, Ψ} the "real" meta-longitude / meta-colatitude representing a point on the sphere given by {μ_Λ, μ_Φ},
versus
by {α, r}, resp. {x, y}, the meta-longitude / meta-latitude representing the mapped values on the tangential plane given by {μ_Λ, μ_Φ}, alternatively by its polar coordinates {x, y}.

At first, we want to derive the equations generating an oblique map projection of


S 2R onto T0 S 2R of equiareal type.

D = A, r = 2 R sin < / 2
versus
x = 2 R sin < / 2 cos A
y = 2 R sin < / 2sin A.
Second, we intend to derive the transformation from the local surface element
dAd< sin < to the alternate local surface element | J | dD dr sin < (D , r ) by means
of the inverse Jacobian determinant
ª dD º ª dA º
« dA 0 » « dD 0 »
« » = | J 1 | ~ J = « »
« 0 dr » « 0 d< »
¬« d < ¼» «¬ dr ¼»
ª wD wD º
« wA w< » ª DAD 0 º
J 1 =« »=
« wr wr » «¬ 0 D< r »¼
«¬ wA w< »¼
ª1 0 º
ªD A Dr A º « », J = 1
J=« D = 1 .
¬ DD < Dr < ¼» «0 » R cos < / 2
¬« R cos < / 2 ¼»

Third, we read the inverse equations of an oblique map projection of S 2\ of


equiareal type.

< r < r2
A = D , sin = , cos = 1  2
2 2R 2 4R

< < < < r r2


sin < = 2sin cos = 2sin 1  sin 2 =  1 2 .
2 2 2 2 R 4R
We collect our basic results in a few lemmas.
Lemma 7.7:
rdD dr
dAd < sin < = .
R2

Lemma 7.8 (oblique azimuthal map projection of S 2\ of equiareal type):

direct equations
D = A, r = 2 R sin < / 2
inverse equations
r r
A = D , sin < = 1  ( )2 .
R 2R

Lemma 7.9:
1 dD rdr
dAd < sin < = dxdy = .
R2 R2

Lemma 7.10 (oblique azimuthal map projection of S 2\ of equiareal


type):
direct equations
< <
x = 2 R sin cos A, y = 2R sin sin A
2 2
inverse equations
y < 1
tan A = , sin = x2 + y 2
x 2 2R

1 x2 + y 2 1
sin < = x2 + y 2 1  2
= x2 + y 2 4R 2  ( x 2 + y 2 ) .
R 4R 2R2
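The direct and inverse equations of Lemma 7.10 translate immediately into code. A minimal sketch (Python with numpy; the radius R and the test direction are illustrative placeholders) maps (Ψ, A) to tangential-plane coordinates (x, y) and back.

import numpy as np

R = 1.0                                            # sphere radius (placeholder)

def direct(Psi, A):
    """Equiareal azimuthal projection, Lemma 7.10 direct equations: (Psi, A) -> (x, y)."""
    x = 2.0 * R * np.sin(Psi / 2.0) * np.cos(A)
    y = 2.0 * R * np.sin(Psi / 2.0) * np.sin(A)
    return x, y

def inverse(x, y):
    """Lemma 7.10 inverse equations: (x, y) -> (Psi, A)."""
    A = np.arctan2(y, x)                           # tan A = y/x
    Psi = 2.0 * np.arcsin(np.hypot(x, y) / (2.0 * R))
    return Psi, A

Psi0, A0 = np.radians(40.0), np.radians(25.0)      # test direction (placeholder)
x, y = direct(Psi0, A0)
print(np.allclose(inverse(x, y), (Psi0, A0)))      # True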

Lemma 7.11 (change from one chart in another chart


(“cha-cha-cha”), Kartenwechsel ):
The direct equations of the transformation of spherical longitude and
spherical latitude {/* , )* } into spherical meta-longitude and spherical
meta-colatitude { A, <} are established by

ª cot A = [  cos ) tan ) * + sin ) cos( / *  / )] / sin( / *  / )


«
¬ cos < = cos ) cos ) cos( /  / ) + sin ) sin ) )
* * *

with respect to a meta-North pole {/ , )} .


In contrast, the inverse equations of the transformation of spherical
meta-longitude and spherical meta-colatitude { A, <} into spherical
longitude and spherical latitude {/* , )* } read
ªcot(/*  / ) = [ sin ) cos A + cos ) cot < ] / sin A
«
«¬sin ) = sin ) cos < + cos ) sin < cos A .
*

We report two key problems.

Tangential plane Tangential plane


Figure 7.1 Figure 7.2

First, in the plane located at ( P/ , P) ) we place the circular normal distribution

x = 2(1  cos < ) cos A = r (< ) cos D


y = 2(1  cos < ) sin A = r (< ) sin D
or
A = D , r = 2(1  cos < )

as an alternative. A natural generalization towards an oblique normal distribution


will be given by

x* = x cos H + y sin H x = x* cos H  y* sin H


or
y =  x sin H + y cos H
*
y = x* sin H + y * cos H
and
( x ) F x + ( y ) F y = ( x cos H + y sin H ) 2 F x + ( x sin H + y cos H ) 2 F y =
* 2 * 2

= x 2 cos 2 H F x + x 2 sin 2 H F y + y 2 sin 2 H F x + y 2 cos 2 H F y +


+ xy (sin H cos H F x  sin H cos H F y ) =
= x 2 (cos 2 H F x + sin 2 H F y ) + y 2 (sin 2 H F x + cos 2 H F y ) +
+ xy sin H cos H ( F x  F y ).

The parameters ( F x , F y ) determine the initial values of the elliptic curve repre-
senting the canonical data set, namely (1/ F x , 1/ F y ) . The circular normal
distribution is achieved for the data set F x = F y = 1.
Second, we intend to transform the representation of coordinates of the oblique
normal distribution from Cartesian coordinates in the oblique equatorial plane to
curvilinear coordinates in the spherical reference frame:
x 2 (cos 2 H F x + sin 2 H F y ) + y 2 (sin 2 H F x + cos 2 H F y ) + xy sin H cos H ( F x  F y ) =
= r 2 cos 2 D (cos 2 H F x + sin 2 H F y ) + r 2 sin 2 D (sin 2 H F x + cos 2 H F y ) +
+ r 2 sin D cos D sin H cos H ( F x  F y )
= r 2 (< , A) cos 2 A(cos 2 H F x + sin 2 H F y ) + r 2 (< , A) sin 2 A(sin 2 H F x + cos 2 H F y ) +
+ r 2 (< , A) sin A cos A sin H cos H ( F x  F y ).

Characteristically, the radical component r (< , A) be a function of the colatitude


< and of the azimuth A. The angular coordinate is preserved, namely D = A .
Here, our comments on the topic of the oblique normal distribution are finished.
7-33 A note on the angular metric
We intend to prove that an angular metric fulfils all the four axioms of a metric.
Let us begin with these axioms of a metric, namely
< x|y > < x|y >
cos D = ,  0 d D d S œ D = arccos 1
|| x |||| y || || x |||| y ||
based on Euclidean metric forms. Let us introduce the distance function
x y < x |y >
d( , ) = arccos
|| x || || y || || x |||| y ||

to fulfill

M 1: d ( x, y ) t 0
M 2 : d ( x, y ) = 0 œ x = y
M 3 : d ( x, y ) = d ( y, x)
M 4 : d ( x, z ) d d ( x, y ) + ( y, z) : "Triangular Inequality".

Assume || x ||2 =|| y ||2 =|| z ||2 = 1 : Axioms M1 and M3 are easy to prove, Axiom M 2
is not complicated, but the Triangular Inequality requires work. Let x, y , z
 X, D = d (x, y ), E = d ( y, z ) and J = d ( x, z ) , i.e.
D , E , J  [0, S ],
cos D =< x, y >, cos E =< y, z >, cos J =< x, z > .
We wish to prove γ ≤ α + β. This result is trivial in the case α + β ≥ π, so we may assume α + β ∈ [0, π]. The desired inequality is then equivalent to cos γ ≥ cos(α + β). The proof of the basic formulas relies heavily on the properties of the inner product:
< u + uc, v >=< u, v > + < uc, v > º
< u, v + v c >=< u, v > + < u, v c > »» for all u,uc, v, v c  \ 3 , and O  \.
< O u, v >= O < u, v >=< u, O v > »¼

Define xc, z c  \ 3 by x = (cos D )y + xc, z = (cos E )y  z c, then


< xc, z c >=< x  (cos D )y,  z + (cos E ) y >=
=  < x, z > + (cos D ) < y, z > + (cos E ) < x, y > (cos D )(cos E ) < y, y >=
=  cos J + cos D cos E + cos D cos E  cos D cos E = cos J + cos D cos E .
In the same way
|| xc ||2 =< x, xc >= 1  cos 2 D = sin 2 D
so that, since 0 d D d S , || xc ||= sin D . Similarly, || z c ||= sin E . But by Schwarz’
Inequality, < xc, z c > d || xc |||| z c || . It follows that cos J t cos D cos E  sin D sin E =
= cos(D + E ) and we are done!
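The four metric axioms, and in particular the triangle inequality just proven, can be spot-checked numerically on random vectors. A minimal sketch follows (Python with numpy); the clipping guards against round-off outside [-1, 1].

import numpy as np

rng = np.random.default_rng(0)

def d(u, v):
    """Angular distance d(x, y) = arccos(<x|y> / (||x|| ||y||))."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

ok = True
for _ in range(1000):
    x, y, z = (rng.normal(size=3) for _ in range(3))
    ok &= d(x, z) <= d(x, y) + d(y, z) + 1e-12      # M4: triangle inequality
    ok &= abs(d(x, y) - d(y, x)) < 1e-12            # M3: symmetry
print(ok)                                           # True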
7-4 Case study
Table 7.3 collects 30 angular observations with two different theodolites. The
first column contains the number of the directional observations / i ,
i  {1,...,30}, n = 30 . The second column lists in fractions of seconds the direc-
tional data, while the third column / fourth column is a printout of cos / i / sin / i .
Table 7.4 is a comparison of / g and /̂ as the arithmetic mean. Obviously, on the
level of concentration of data, / g and /̂ are nearly the same.

Table 7.3 The directional observation data using two theodolites


and its calculation
Theodolite 1 Theodolite 2
Value of Value of
No.
observation cos / i sin / i observation cos / i sin / i
( /i ) ( /i )
1 76 42 c 17.2 cc
D 0.229969 0.973198 D
76 42 c19.5cc 0.229958 0.973201
2 19.5 0.229958 0.973201 19.0 0.229960 0.973200
3 19.2 0.229959 0.973200 18.8 0.229961 0.973200
4 16.5 0.229972 0.973197 16.9 0.229970 0.973198
5 19.6 0.229957 0.973201 18.6 0.229962 0.973200
6 16.4 0.229972 0.973197 19.1 0.229960 0.973200
7 15.5 0.229977 0.973196 18.2 0.229964 0.973199
8 19.9 0.229956 0.973201 17.7 0.229966 0.973199
9 19.2 0.229959 0.973200 17.5 0.229967 0.973198
10 16.8 0.229970 0.973198 18.6 0.229962 0.973200
11 15.0 0.229979 0.973196 16.0 0.229974 0.973197
12 16.9 0.229970 0.973198 17.3 0.229968 0.973198
13 16.6 0.229971 0.973197 17.2 0.229969 0.973198
14 20.4 0.229953 0.973202 16.8 0.229970 0.973198
15 16.3 0.229973 0.973197 18.8 0.229961 0.973200
16 16.7 0.229971 0.973197 17.7 0.229966 0.973199
17 16.0 0.229974 0.973197 18.6 0.229962 0.973200
18 15.5 0.229977 0.973196 18.8 0.229961 0.973200
19 19.1 0.229960 0.973200 17.7 0.229966 0.973199
20 18.8 0.229961 0.973200 17.1 0.229969 0.973198
21 18.7 0.229962 0.973200 16.9 0.229970 0.973198
22 19.2 0.229959 0.973200 17.6 0.229967 0.973198
23 17.5 0.229967 0.973198 17.0 0.229970 0.973198
24 16.7 0.229971 0.973197 17.5 0.229967 0.973198
25 19.0 0.229960 0.973200 18.2 0.229964 0.973199
26 16.8 0.229970 0.973198 18.3 0.229963 0.973199
27 19.3 0.229959 0.973200 19.8 0.229956 0.973201
28 20.0 0.229955 0.973201 18.6 0.229962 0.973200
29 17.4 0.229968 0.973198 16.9 0.229970 0.973198
30 16.2 0.229973 0.973197 16.7 0.229971 0.973197
Theodolite 1:  Λ̂ = 76°42'17.73'',  Σ cos Λ_i = 6.898982,  Σ sin Λ_i = 29.195958,  ŝ = ±1.55''
Theodolite 2:  Λ̂ = 76°42'17.91'',  Σ cos Λ_i = 6.898956,  Σ sin Λ_i = 29.195968,  ŝ = ±0.94''

Table 7.4:
Computation of theodolite data: comparison of Λ̂ and Λ_g
Left data set versus right data set

Theodolite 1: Λ̂ = 76°42'17.73'', ŝ = 1.55''
Theodolite 2: Λ̂ = 76°42'17.91'', ŝ = 0.94''

"The precision of theodolite two is higher compared to that of theodolite one."

Alternatively, let us present a second example. Let there be given observed azi-
muths / i and vertical directions ) i , by Table 7.3 in detail. First, we compute
the solution of the optimization problem
n n

¦ 2(1  cos < ) = ¦ 2[1  cos )


i =1
i
i =1
i cos ) P cos(/ i  / P )  sin ) i sin ) P ] =

= min
/P , )P

subject to values of the central direction


n n

¦ cos ) sin /
i =1
i i ¦ sin )
i =1
i
ˆ =
tan / ˆ =
, tan ) .
n n n
¦ cos )i cos /i
i =1
(¦ cos ) i cos / i ) + (¦ cos ) i sin / i )
2 2

i =1 i =1

Table 7.5:
Data of type azimuth Λ_i and vertical direction Φ_i

   Λ_i        Φ_i        Λ_i        Φ_i
 124°.9      88°.1     125°.0      88°.0
 125°.2      88°.3     124°.9      88°.2
 126°.1      88°.2     124°.8      88°.1
 125°.7      88°.1     125°.1      88°.0
This accounts for measurements of data on the horizontal circle and the vertical circle being Fisher normal distributed. We want to tackle two problems:

Problem 1: Compare (Λ̂, Φ̂) with the arithmetic mean (Λ̄, Φ̄) of the data set. Why do the results not coincide?
Problem 2: In which case do (Λ̂, Φ̂) and (Λ̄, Φ̄) coincide?

Solving Problem 1
Let us compute

(Λ̄, Φ̄) = (125°.2125, 88°.125) and (Λ̂, Φ̂) = (125°.2066645, 88°.12505077)

ΔΛ = 0°.0058355 = 21''.0078,
ΔΦ = 0°.0000507 = 0''.18.

The results do not coincide due to the fact that the arithmetic means are obtained by adjusting direct observations with least-squares technology.
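Problem 1 can be reproduced with a few lines of code. The sketch below (Python with numpy) evaluates the closed-form central direction from the eight observations of Table 7.5 and compares it with the arithmetic means; the discrepancies are of the order of the values ΔΛ, ΔΦ quoted above.

import numpy as np

# azimuths and vertical directions of Table 7.5 (degrees)
Lam = np.radians([124.9, 125.2, 126.1, 125.7, 125.0, 124.9, 124.8, 125.1])
Phi = np.radians([88.1, 88.3, 88.2, 88.1, 88.0, 88.2, 88.1, 88.0])

cc = np.sum(np.cos(Phi) * np.cos(Lam))
cs = np.sum(np.cos(Phi) * np.sin(Lam))
s  = np.sum(np.sin(Phi))

Lam_hat = np.degrees(np.arctan2(cs, cc)) % 360.0       # central direction, longitude
Phi_hat = np.degrees(np.arctan2(s, np.hypot(cc, cs)))  # central direction, latitude

Lam_bar, Phi_bar = np.degrees(Lam).mean(), np.degrees(Phi).mean()
print(Lam_hat, Lam_bar, (Lam_bar - Lam_hat) * 3600.0)  # difference in arc seconds
print(Phi_hat, Phi_bar, (Phi_hat - Phi_bar) * 3600.0)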

Solving Problem 2
The results do coincide if the following conditions are met.
• All vertical directions are zero.
• Λ̂ = Λ̄ if the observations Λ_i, Φ_i fluctuate only "a little" around the constant values Λ_0, Φ_0.
• Λ̂ = Λ̄ if Φ_i = const.
• Φ̂ = Φ̄ if the fluctuation of Λ_i around Λ_0 is considerably smaller than the fluctuation of Φ_i around Φ_0.
Note the values

Σ_{i=1}^{8} cos Φ_i sin Λ_i = 0.2138662,     (Σ_{i=1}^{8} cos Φ_i sin Λ_i)^2 = 0.0453788
Σ_{i=1}^{8} cos Φ_i cos Λ_i = -0.15090327,   (Σ_{i=1}^{8} cos Φ_i cos Λ_i)^2 = 0.0227718
Σ_{i=1}^{8} sin Φ_i = 7.9957053

and
/ i = / 0 + G/ i versus ) i = ) 0 + G) i

1 n 1 n
G/ = ¦ G/ i versus G) = ¦ G) i
n i =1 n i =1
n n n

¦ cos )i sin /i = n cos ) 0 sin / 0 + cos ) 0 cos / 0 ¦ G/i  sin ) 0 sin / 0 ¦ G) i =


i =1 i =1 i =1

= n cos ) 0 (sin / 0 + G/ cos / 0  G) tan / 0 sin / 0 )


n n n

¦ cos ) i cos / i = n cos ) 0 cos / 0  cos ) 0 sin / 0 ¦ G/ i  sin ) 0 sin / 0 ¦ G) i =


i =1 i =1 i =1

= n cos ) 0 (cos / 0  G/ sin / 0  G) tan / 0 cos / 0 )

ˆ = sin / 0 + G/ cos / 0  G) tan ) 0 sin / 0


tan /
cos / 0  G/ sin / 0  G) tan ) 0 cos / 0

sin / 0 + G/ cos / 0
tan / =
cos / 0  G/ sin / 0
n n
(¦ cos ) i sin / i ) 2 + (¦ cos ) i cos / i ) 2 =
i =1 i =1
2
= n 2 (cos 2 ) 0 + cos 2 G/ + sin 2 G)  2sin ) 0 cos G))
n

¦ sin ) i = n sin ) 0 + G) cos ) 0


i =1

ˆ = n sin ) 0 + G) cos ) 0
tan )
n cos ) 0 + G/ cos 2 ) 0 + G) 2 sin 2 ) 0  2G) sin ) 0 cos ) 0
2 2

sin ) 0 + G) cos ) 0
tan ) = .
cos ) 0  G) sin ) 0

In consequence, Λ̂ ≠ Λ̄ and Φ̂ ≠ Φ̄ hold in general.

At the end we refer to additional references such as E. Batschelet (1965), T. D. Downs and A. L. Gould (1967), E. W. Grafarend (1970), E. J. Gumbel et al. (1953), P. Hartman et al. (1974), and M. A. Stephens (1969).

References

Anderson, T.W. and M.A. Stephens (1972), Arnold, K.J. (1941), Barndorff-
Nielsen, O. (1978), Batschelet, E. (1965), Batschelet, E. (1971), Batschelet, E.
(1981), Beran, R.J. (1968), Beran, R.J. (1979), Blingham, C. (1964), Blingham,
C. (1974), Chang, T. (1986) Downs, T. D. and Gould, A. L. (1967), Durand, D.
and J.A. Greenwood (1957), Enkin, R. and Watson, G. S. (1996), Fisher, R. A.
(1953), Fisher, N.J. (1985), Fisher, N.I. (1993), Fisher, N.I. and Lee A. J.
(1983), Fisher, N.I. and Lee A. J. (1986), Fujikoshi, Y. (1980), Girko, V.L.
(1985), Goldmann, J. (1976), Gordon, L. and M. Hudson (1977), Grafarend, E.
W. (1970), Greenwood, J.A. and D. Durand (1955), Gumbel, E. J., Greenwood,
J. A. and Durand, D. (1953), Hammersley, J.M. (1950), Hartman, P. and G. S.
Watson (1974), Hetherington, T.J. (1981), Jensen, J.L. (1981), Jupp, P.E. and…
(1980), Jupp, P. E. and Mardia, K. V. , Kariya, T. (1989), (1989), Kendall, D.G.
(1974), Kent, J. (1976), Kent, J.T. (1982), Kent, J.T. (1983), Krumbein, W.C.
(1939), Langevin, P. (1905), Laycock, P.J. (1975), Lenmitz, C. (1995), Lenth,
R.V. (1981), Lord, R.D. (1948), Lund, U. (1999), Mardia, K.V. (1972), Mardia,
K.V. (1975), Mardia, K. V. (1988), Mardia, K. V. and Jupp, P. E. (1999),
Mardia, K.V. et al. (2000), Mhaskar, H.N., Narcowich, F.J. and J.D. Ward
(2001), Muller, C. (1966), Neudecker, H. (1968), Okamoto, M. (1973), Parker,
R.L. et al (1979), Pearson, K. (1905), Pearson, K. (1906), Pitman, J. and M. Yor
(1981), Presnell, B., Morrison, S.P. and R.C. Littell (1998), Rayleigh, L. (1880),
Rayleigh, L. (1905), Rayleigh, R. (1919), Rivest, L.P. (1982), Rivest, L.P.

(1988), Roberts, P.H. and H.D. Ursell (1960), Sander, B. (1930), Saw, J.G.
(1978), Saw, J.G. (1981), Scheidegger, A.E. (1965), Schmidt-Koenig, K. (1972),
Selby, B. (1964), Sen Gupta, A. and R. Maitra (1998), Sibuya, M. (1962), Stam,
A.J. (1982), Stephens, M.A. (1963), Stephens, M.A. (1964), Stephens, M. A.
(1969), Stephens, M.A. (1979), Tashiro, Y. (1977), Teicher, H. (1961), Von
Mises, R. (1918), Watson, G.S. (1956a, 1956b), Watson, G.S. (1960), Watson,
G.S.(1961), Watson, G.S. (1962), Watson, G.S. (1965), Watson, G.S. (1966),
Watson, G.S. (1967a, 1967b), Watson, G.S. (1968), Watson, G.S. (1969), Wat-
son, G.S. (1970), Watson, G.S. (1974), Watson, G.S. (1981a, 1981b), Watson,
G.S. (1982a, 1982b, 1982c, 1982d), Watson, G.S. (1983), Watson, G.S. (1986),
Watson, G.S. (1988), Watson, G.S. (1998), Watson, G.S. and E.J. Williams
(1956), Watson, G.S. and E..Irving (1957), Watson, G.S. and M.R. Leadbetter
(1963), Watson, G.S. and S. Wheeler (1964), Watson, G.S. and R.J. Beran
(1967), Watson, G.S., R. Epp and J.W. Tukey (1971), Wellner, J. (1979), Wood,
A. (1982), Xu, P.L. (1999), Xu, P.L. (2001), Xu, P. (2002), Xu, P.L. et al. (1996a,
1996b), Xu, P.L., and Shimada, S.(1997).
8 The fourth problem of probabilistic regression – special
Gauss-Markov model with random effects – Setup of BLIP
and VIP for the central moments of first order

: Fast track reading :


Read only Theorem 8.5, 8.6 and 8.7.

Lemma 8.4 (hom BLIP, hom S-BLIP and hom α-VIP)
Definition 8.1 (z̃: hom BLIP of z) – Theorem 8.5 (z̃: hom BLIP of z)
Definition 8.2 (z̃: hom S-BLIP of z) – Theorem 8.6 (z̃: hom S-BLIP of z)
Definition 8.3 (z̃: hom α-VIP of z) – Theorem 8.7 (z̃: hom α-VIP of z)

The general model of type “fixed effects”, “random effects” and “error-in-
variables” will be presented in our final chapter:
Here we focus on “random effects”.

Figure 8.1: Magic triangle


8-1 The random effect model
Let us introduce the special Gauss-Markov model with random effects
y = Cz + e y  Ce z specified in Box 8.1. Such a model is governed by two identi-
ties, namely the first identity CE{z} = E{y} of moments of first order and the
second identity D{y  Cz} + CD{z}Cc = D{y} of central moments of second
order. The first order moment identity CE{z} = E{y} relates the expectation
E{z} of the stochastic, real-valued vector z of unknown random effects ( “Zu-
fallseffekte”) to the expectation E{y} of the stochastic, real-valued vector y of
observations by means of the non-stochastic (“fixed”) real-valued matrix
C  \ n×l of rank rk C = l. n = dim Y is the dimension of the observation space Y,
l=dim Z the dimension of the parameter space Z of random effects z. The sec-
ond order central moment identity Ȉ y -Cz + CȈ z Cc = Ȉ y relates the variance-
covariance matrix Ȉ y -Cz of the random vector y  Cz , also called dispersion
matrix D{y  Cz} and the variance-covariance matrix Ȉ z of the random vector z,
also called dispersion matrix D{z} , to the variance-covariance matrix Ȉ y of the
random vector y of the observations, also called dispersion matrix D{y} . In the
simple random effect model we shall assume (i) rk Ȉ y = n and (ii) C{y, z} = 0 ,
namely zero correlation between the random vector y of observations and the
vector z of random effects. (In the random effect model of type Kolmogorov-
Wiener we shall give up such a zero correlation.) There are three types of un-
8-1 The random effect model 349

knowns within the simple special Gauss-Markov model with random effects: (i)
the vector z of random effects is unknown, (ii) the fixed vectors E{y}, E{z} of
expectations of the vector y of observations and of the vector z of random effects
(first moments) are unknown and (iii) the fixed matrices Ȉ y , Ȉ z of dispersion
matrices D{y}, D{z} (second central moments) are unknown.
Box 8.1:
Special Gauss-Markov model with random effects

y = Cz + e_y - Ce_z
E{y} = CE{z} ∈ R^n
D{y} = D{y - Cz} + CD{z}C' ∈ R^(n×n)
C{y, z} = 0
z, E{z}, E{y}, Σ_{y-Cz}, Σ_z unknown
dim R(C') = rk C = l.

Here we focus on best linear predictors of type hom BLIP, hom S-BLIP and hom α-VIP of random effects z, which turn out to be better than the best linear uniformly unbiased predictor of type hom BLUUP. At first let us begin with a discussion of the bias vector and the bias matrix as well as of the Mean Square Prediction Error MSPE{ẑ} with respect to a homogeneous linear prediction ẑ = Ly of random effects z based upon Box 8.2.
Box 8.2:
Bias vector, bias matrix,
Mean Square Prediction Error
in the special Gauss–Markov model with random effects

E{y} = C E{z}   (8.1)
D{y} = D{y − Cz} + C D{z} C'   (8.2)

“ansatz”
ẑ = Ly   (8.3)

bias vector
β := E{ẑ − z} = E{ẑ} − E{z}   (8.4)
β = L E{y} − E{z} = −[I_ℓ − LC] E{z}   (8.5)

bias matrix
B := I_ℓ − LC   (8.6)

decomposition
ẑ − z = (ẑ − E{ẑ}) − (z − E{z}) + (E{ẑ} − E{z})   (8.7)
ẑ − z = L(y − E{y}) − (z − E{z}) − [I_ℓ − LC] E{z}   (8.8)

Mean Square Prediction Error
MSPE{ẑ} := E{(ẑ − z)(ẑ − z)'}   (8.9)
MSPE{ẑ} = L D{y} L' + D{z} + [I_ℓ − LC] E{z}E{z}' [I_ℓ − LC]'   (8.10)
(C{y, z} = 0, E{ẑ − E{ẑ}} = 0, E{z − E{z}} = 0)

modified Mean Square Prediction Error
MSPE_S{ẑ} := L D{y} L' + D{z} + [I_ℓ − LC] S [I_ℓ − LC]'   (8.11)

Frobenius matrix norms
||MSPE{ẑ}||² := tr E{(ẑ − z)(ẑ − z)'}   (8.12)
||MSPE{ẑ}||² = tr L D{y} L' + tr D{z} + tr [I_ℓ − LC] E{z}E{z}' [I_ℓ − LC]'
            = ||L'||²_{Σ_y} + ||(I_ℓ − LC)'||²_{E{z}E{z}'} + tr E{(z − E{z})(z − E{z})'}   (8.13)
||MSPE_S{ẑ}||² := tr L D{y} L' + tr [I_ℓ − LC] S [I_ℓ − LC]' + tr D{z}
              = ||L'||²_{Σ_y} + ||(I_ℓ − LC)'||²_S + tr E{(z − E{z})(z − E{z})'}   (8.14)

hybrid minimum variance – minimum bias norm, α-weighted
L(L) := ||L'||²_{Σ_y} + (1/α) ||(I_ℓ − LC)'||²_S   (8.15)

special model
dim R(SC') = rk SC' = rk C = ℓ.   (8.16)
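To make the quantities of Box 8.2 concrete, here is a minimal numerical sketch (all values are toy assumptions, not taken from the text): it forms the bias matrix (8.6), the bias vector (8.5) for a hypothesized E{z}, and the α-weighted hybrid norm (8.15) for one candidate prediction matrix L.

```python
import numpy as np

# Toy illustration of Box 8.2: bias matrix B = I_l - L C, bias vector
# beta = -B E{z}, and the alpha-weighted hybrid norm (8.15).
rng = np.random.default_rng(0)
n, l = 5, 2
C = rng.normal(size=(n, l))                            # assumed design matrix
Sigma_y = np.diag([1.0, 2.0, 1.5, 1.0, 0.5])           # hypothesized D{y}
S = np.eye(l)                                          # bias weight matrix
alpha = 0.1
Ez = np.array([1.0, -2.0])                             # hypothesized E{z}

L = 0.1 * rng.normal(size=(l, n))                      # some homogeneous ansatz z_hat = L y

B = np.eye(l) - L @ C                                  # bias matrix (8.6)
beta = -B @ Ez                                         # bias vector (8.5)
hybrid = np.trace(L @ Sigma_y @ L.T) + np.trace(B @ S @ B.T) / alpha   # (8.15)
print("bias vector beta =", beta)
print("hybrid objective L(L) =", hybrid)
```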

The bias vector β is conventionally defined by E{ẑ} − E{z} subject to the homogeneous prediction form ẑ = Ly. Accordingly the bias vector can be represented by (8.5), β = −[I_ℓ − LC] E{z}. Since the expectation E{z} of the vector z of random effects is unknown, the proposal has been made to use instead the matrix I_ℓ − LC as a matrix-valued measure of bias. A measure of the prediction error is the Mean Square Prediction Error MSPE{ẑ} of type (8.9). MSPE{ẑ} can be decomposed into three basic parts:
• the dispersion matrix D{ẑ} = L D{y} L'
• the dispersion matrix D{z}
• the bias product ββ'.
Indeed the vector ẑ − z can also be decomposed into the three parts of type (8.7), (8.8), namely (i) ẑ − E{ẑ}, (ii) z − E{z} and (iii) E{ẑ} − E{z}, which may be called prediction error, random effect error and bias, respectively. The triple decomposition of the vector ẑ − z leads straightforwardly to the triple representation of the matrix MSPE{ẑ} of type (8.10). Such a representation suffers from two effects: firstly, the expectation E{z} of the vector z of random effects is unknown; secondly, the matrix E{z}E{z}' has only rank 1. In consequence, the matrix [I_ℓ − LC] E{z}E{z}' [I_ℓ − LC]' has only rank 1, too. In this situation the proposal has been made to modify MSPE{ẑ} by replacing the matrix E{z}E{z}' by a regular matrix S. MSPE_S{ẑ} has been defined by (8.11). Scalar measures of MSPE{ẑ} as well as MSPE_S{ẑ} are the Frobenius norms (8.12), (8.13), (8.14). Those scalars constitute the optimal risk in Definition 8.1 (hom BLIP) and Definition 8.2 (hom S-BLIP). Alternatively, a homogeneous α-weighted hybrid minimum variance–minimum bias prediction (hom α-VIP) is presented in Definition 8.3, which is based upon the weighted sum of two norms of type (8.15), namely
• average variance ||L'||²_{Σ_y} = tr L Σ_y L'
• average bias ||(I_ℓ − LC)'||²_S = tr [I_ℓ − LC] S [I_ℓ − LC]'.
The very important predictor α-VIP balances variance and bias by the weight factor α, which is illustrated by Figure 8.1.

Figure 8.1: Balance between variance (min variance) and bias (min bias) by the weight factor α


Definition 8.1 (ẑ hom BLIP of z):
An ℓ×1 vector ẑ is called homogeneous BLIP of z in the special linear Gauss-Markov model with random effects of Box 8.1, if
(1st) ẑ is a homogeneous linear form
ẑ = Ly,   (8.17)
(2nd) in comparison to all other linear predictions ẑ has the minimum Mean Square Prediction Error in the sense of
||MSPE{ẑ}||² = tr L D{y} L' + tr D{z} + tr [I_ℓ − LC] E{z}E{z}' [I_ℓ − LC]'
             = ||L'||²_{Σ_y} + ||(I_ℓ − LC)'||²_{E{z}E{z}'} + tr E{(z − E{z})(z − E{z})'}.   (8.18)

Definition 8.2 (ẑ S-hom BLIP of z):
An ℓ×1 vector ẑ is called homogeneous S-BLIP of z in the special linear Gauss-Markov model with random effects of Box 8.1, if
(1st) ẑ is a homogeneous linear form
ẑ = Ly,   (8.19)
(2nd) in comparison to all other linear predictions ẑ has the minimum S-modified Mean Square Prediction Error in the sense of
||MSPE_S{ẑ}||² := tr L D{y} L' + tr [I_ℓ − LC] S [I_ℓ − LC]' + tr E{(z − E{z})(z − E{z})'}
               = ||L'||²_{Σ_y} + ||(I_ℓ − LC)'||²_S + tr E{(z − E{z})(z − E{z})'} = min_L.   (8.20)

Definition 8.3 (ẑ hom hybrid min var–min bias solution, α-weighted, or hom α-VIP):
An ℓ×1 vector ẑ is called homogeneous α-weighted hybrid minimum variance–minimum bias prediction (hom α-VIP) of z in the special linear Gauss-Markov model with random effects of Box 8.1, if
(1st) ẑ is a homogeneous linear form
ẑ = Ly,   (8.21)
(2nd) in comparison to all other linear predictions ẑ has the minimum variance–minimum bias in the sense of the α-weighted hybrid norm
tr L D{y} L' + (1/α) tr (I_ℓ − LC) S (I_ℓ − LC)' = ||L'||²_{Σ_y} + (1/α) ||(I_ℓ − LC)'||²_S = min_L,   (8.22)
in particular with respect to the special model
α ∈ ℝ⁺,  dim R(SC') = rk SC' = rk C = ℓ.

The predictions ẑ of type hom BLIP, hom S-BLIP and hom α-VIP can be characterized as follows:

Lemma 8.4 (hom BLIP, hom S-BLIP and hom α-VIP):
An ℓ×1 vector ẑ is hom BLIP, hom S-BLIP or hom α-VIP of z in the special linear Gauss-Markov model with random effects of Box 8.1, if and only if the matrix L̂ fulfils the normal equations
(1st) hom BLIP:
(Σ_y + C E{z}E{z}' C') L̂' = C E{z}E{z}'   (8.23)
(2nd) hom S-BLIP:
(Σ_y + C S C') L̂' = C S   (8.24)
(3rd) hom α-VIP:
(Σ_y + (1/α) C S C') L̂' = (1/α) C S.   (8.25)

:Proof:
(i) hom BLIP:
The hybrid norm ||MSPE{ẑ}||² establishes the Lagrangean
L(L) := tr L Σ_y L' + tr (I_ℓ − LC) E{z}E{z}' (I_ℓ − LC)' + tr Σ_z = min_L
for ẑ hom BLIP of z. The necessary conditions for the minimum of the quadratic Lagrangean L(L) are
∂L/∂L (L̂) := 2[Σ_y L̂' + C E{z}E{z}' C' L̂' − C E{z}E{z}'] = 0,
which agree with the normal equations (8.23). (The theory of matrix derivatives is reviewed in Appendix B (Facts: derivative of a scalar-valued function of a matrix: trace).) The second derivatives
∂²L/∂(vec L)∂(vec L)' (L̂) > 0
at the “point” L̂ constitute the sufficiency conditions. In order to compute such an ℓn×ℓn matrix of second derivatives we have to vectorize the matrix normal equation:
∂L/∂L (L̂) := 2 L̂ (Σ_y + C E{z}E{z}' C') − 2 E{z}E{z}' C',
∂L/∂(vec L) (L̂) := vec[2 L̂ (Σ_y + C E{z}E{z}' C') − 2 E{z}E{z}' C'] = 2[(Σ_y + C E{z}E{z}' C') ⊗ I_ℓ] vec L̂ − 2 vec(E{z}E{z}' C').
The Kronecker-Zehfuss product A ⊗ B of two arbitrary matrices as well as (A + B) ⊗ C = A ⊗ C + B ⊗ C of three arbitrary matrices subject to dim A = dim B is introduced in Appendix A (Definition of Matrix Algebra: multiplication of matrices of the same dimension (internal relation) and multiplication of matrices (external relation) and laws). The vec operation (vectorization of an array) is reviewed in Appendix A, too (Definition, Facts: vec AB = (B' ⊗ I) vec A for suitable matrices A and B). Now we are prepared to compute
∂²L/∂(vec L)∂(vec L)' (L̂) = 2[(Σ_y + C E{z}E{z}' C') ⊗ I_ℓ] > 0
as a positive definite matrix. (The theory of matrix derivatives is reviewed in Appendix B (Facts: derivative of a matrix-valued function of a matrix, namely ∂(vec X)/∂(vec X)').)
(ii) hom S-BLIP:
The hybrid norm ||MSPE_S{ẑ}||² establishes the Lagrangean
L(L) := tr L Σ_y L' + tr (I_ℓ − LC) S (I_ℓ − LC)' + tr Σ_z = min_L
for ẑ hom S-BLIP of z. Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L(L),
∂L/∂L (L̂) := 2[Σ_y L̂' + C S C' L̂' − C S]' = 0,
as well as to the sufficiency conditions
∂²L/∂(vec L)∂(vec L)' (L̂) = 2[(Σ_y + C S C') ⊗ I_ℓ] > 0.
The normal equations of hom S-BLIP, ∂L/∂L (L̂) = 0, agree with (8.24).
(iii) hom α-VIP:
The hybrid norm ||L'||²_{Σ_y} + (1/α) ||(I_ℓ − LC)'||²_S establishes the Lagrangean
L(L) := tr L Σ_y L' + (1/α) tr (I_ℓ − LC) S (I_ℓ − LC)' = min_L
for ẑ hom α-VIP of z. Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L(L),
∂L/∂L (L̂) = 2[(1/α) C S C' L̂' + Σ_y L̂' − (1/α) C S]' = 0,
as well as to the sufficiency conditions
∂²L/∂(vec L)∂(vec L)' (L̂) = 2[((1/α) C S C' + Σ_y) ⊗ I_ℓ] > 0.
The normal equations of hom α-VIP, ∂L/∂L (L̂) = 0, agree with (8.25).
∎
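As a quick plausibility check of Lemma 8.4, the following sketch (toy data; Σ_y, C, S, α and the hypothesized E{z} are all assumed values) solves the three normal equations (8.23)–(8.25) numerically and compares the results with the closed-form predictors stated in Theorems 8.5–8.7 below.

```python
import numpy as np

# Solve the normal equations (8.23)-(8.25) for L_hat and compare with the
# closed forms (8.26), (8.31), (8.40) of Theorems 8.5-8.7 (toy data only).
rng = np.random.default_rng(1)
n, l, alpha = 6, 2, 0.25
C = rng.normal(size=(n, l))
G = rng.normal(size=(n, n))
Sigma_y = G @ G.T + n * np.eye(n)        # hypothesized D{y}, positive definite
S = np.diag([1.0, 2.0])                  # bias weight matrix
Ez = np.array([1.0, -1.0])               # hypothesized E{z}
M = np.outer(Ez, Ez)                     # E{z}E{z}'

L_blip  = np.linalg.solve(Sigma_y + C @ M @ C.T, C @ M).T          # (8.23)
L_sblip = np.linalg.solve(Sigma_y + C @ S @ C.T, C @ S).T          # (8.24)
L_avip  = np.linalg.solve(Sigma_y + C @ S @ C.T / alpha, C @ S / alpha).T  # (8.25)

L_blip_cf  = M @ C.T @ np.linalg.inv(Sigma_y + C @ M @ C.T)        # (8.26)
L_sblip_cf = S @ C.T @ np.linalg.inv(Sigma_y + C @ S @ C.T)        # (8.31)
L_avip_cf  = (S @ C.T / alpha) @ np.linalg.inv(Sigma_y + C @ S @ C.T / alpha)  # (8.40)

for a, b in [(L_blip, L_blip_cf), (L_sblip, L_sblip_cf), (L_avip, L_avip_cf)]:
    assert np.allclose(a, b)
print("normal-equation solutions match the closed-form predictors")
```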
For an explicit representation of ẑ as hom BLIP, hom S-BLIP and hom α-VIP of z in the special Gauss–Markov model with random effects of Box 8.1, we solve the normal equations (8.23), (8.24) and (8.25) for
L̂ = arg{L(L) = min_L}.
Beside the explicit representation of ẑ of type hom BLIP, hom S-BLIP and hom α-VIP we compute the related dispersion matrix D{ẑ}, the Mean Square Prediction Error MSPE{ẑ}, the modified Mean Square Prediction Errors MSPE_S{ẑ} and MSPE_{α,S}{ẑ} and the covariance matrices C{ẑ, z − ẑ} in

Theorem 8.5 (ẑ hom BLIP):
Let ẑ = Ly be hom BLIP of z in the special linear Gauss-Markov model with random effects of Box 8.1. Then the equivalent representations of the solutions of the normal equations (8.23)
ẑ = E{z}E{z}' C' [Σ_y + C E{z}E{z}' C']⁻¹ y
  = E{z}E{z}' C' [Σ_{y−Cz} + C Σ_z C' + C E{z}E{z}' C']⁻¹ y   (8.26)
(if [Σ_y + C E{z}E{z}' C']⁻¹ exists)
are completed by the dispersion matrix
D{ẑ} = E{z}E{z}' C' [Σ_y + C E{z}E{z}' C']⁻¹ Σ_y [Σ_y + C E{z}E{z}' C']⁻¹ C E{z}E{z}',   (8.27)
by the bias vector (8.5)
β := E{ẑ} − E{z} = −[I_ℓ − E{z}E{z}' C' (C E{z}E{z}' C' + Σ_y)⁻¹ C] E{z}   (8.28)
and by the matrix of the Mean Square Prediction Error MSPE{ẑ}:
MSPE{ẑ} := E{(ẑ − z)(ẑ − z)'} = D{ẑ} + D{z} + ββ'   (8.29)
MSPE{ẑ} = D{ẑ} + D{z} + [I_ℓ − E{z}E{z}' C' (C E{z}E{z}' C' + Σ_y)⁻¹ C] E{z}E{z}' [I_ℓ − C' (C E{z}E{z}' C' + Σ_y)⁻¹ C E{z}E{z}'].   (8.30)

At this point we have to comment on what Theorem 8.5 tells us. hom BLIP has generated the prediction ẑ of type (8.26), the dispersion matrix D{ẑ} of type (8.27), the bias vector of type (8.28) and the Mean Square Prediction Error of type (8.30), which all depend on the vector E{z} and the matrix E{z}E{z}', respectively. We already mentioned that E{z} and E{z}E{z}' are not accessible from measurements. The situation is similar to the one in hypothesis theory. As shown later in this section we can produce only an estimate Ê{z} and consequently can set up a hypothesis about the first moment E{z} of the “random effect” z. Indeed, a similar argument applies to the second central moment D{y} ~ Σ_y of the random observation vector y. Such a dispersion matrix has to be known in order to be able to compute ẑ, D{ẑ} and MSPE{ẑ}. Again we have to apply the argument that we are only able to construct an estimate Σ̂_y and to set up a hypothesis about D{y} ~ Σ_y.
Theorem 8.6 (ẑ hom S-BLIP):
Let ẑ = Ly be hom S-BLIP of z in the special linear Gauss-Markov model with random effects of Box 8.1. Then the equivalent representations of the solutions of the normal equations (8.24)
ẑ = S C' (Σ_y + C S C')⁻¹ y = S C' (Σ_{y−Cz} + C Σ_z C' + C S C')⁻¹ y   (8.31)
ẑ = (C' Σ_y⁻¹ C + S⁻¹)⁻¹ C' Σ_y⁻¹ y   (8.32)
ẑ = (I_ℓ + S C' Σ_y⁻¹ C)⁻¹ S C' Σ_y⁻¹ y   (8.33)
(if S⁻¹, Σ_y⁻¹ exist)
are completed by the dispersion matrices
D{ẑ} = S C' (C S C' + Σ_y)⁻¹ Σ_y (C S C' + Σ_y)⁻¹ C S   (8.34)
D{ẑ} = (C' Σ_y⁻¹ C + S⁻¹)⁻¹ C' Σ_y⁻¹ C (C' Σ_y⁻¹ C + S⁻¹)⁻¹   (8.35)
(if S⁻¹, Σ_y⁻¹ exist),
by the bias vector (8.5)
β := E{ẑ} − E{z} = −[I_ℓ − S C' (C S C' + Σ_y)⁻¹ C] E{z}
β = −[I_ℓ − (C' Σ_y⁻¹ C + S⁻¹)⁻¹ C' Σ_y⁻¹ C] E{z}   (8.36)
(if S⁻¹, Σ_y⁻¹ exist)
and by the matrix of the modified Mean Square Prediction Error MSPE_S{ẑ}:
MSPE_S{ẑ} := E{(ẑ − z)(ẑ − z)'} = D{ẑ} + D{z} + ββ'   (8.37)
MSPE_S{ẑ} = Σ_z + S C' (C S C' + Σ_y)⁻¹ Σ_y (C S C' + Σ_y)⁻¹ C S
          + [I_ℓ − S C' (C S C' + Σ_y)⁻¹ C] E{z}E{z}' [I_ℓ − C' (C S C' + Σ_y)⁻¹ C S]   (8.38)
MSPE_S{ẑ} = Σ_z + (C' Σ_y⁻¹ C + S⁻¹)⁻¹ C' Σ_y⁻¹ C (C' Σ_y⁻¹ C + S⁻¹)⁻¹
          + [I_ℓ − (C' Σ_y⁻¹ C + S⁻¹)⁻¹ C' Σ_y⁻¹ C] E{z}E{z}' [I_ℓ − C' Σ_y⁻¹ C (C' Σ_y⁻¹ C + S⁻¹)⁻¹]   (8.39)
(if S⁻¹, Σ_y⁻¹ exist).

The interpretation of hom S-BLIP is even more complex. In extension of the comments on hom BLIP we have to live with another matrix-valued degree of freedom: ẑ of type (8.31), (8.32), (8.33) and D{ẑ} of type (8.34), (8.35) no longer depend on the inaccessible rank-one matrix E{z}E{z}', but on the “bias weight matrix” S with rk S = ℓ. Indeed we can associate any element of the bias matrix with a particular weight which can be “designed” by the analyst. Again the bias vector β of type (8.36) as well as the Mean Square Prediction Error of type (8.37), (8.38), (8.39) depend on the vector E{z}, which is inaccessible. Beside the “bias weight matrix” S, the quantities ẑ, D{ẑ}, β and MSPE_S{ẑ} are vector-valued or matrix-valued functions of the dispersion matrix D{y} ~ Σ_y of the stochastic observation vector, which is inaccessible. By hypothesis testing we may decide upon the construction of D{y} ~ Σ_y from an estimate Σ̂_y.
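The three equivalent hom S-BLIP representations (8.31)–(8.33) can be checked numerically. The following sketch uses toy values for C, Σ_y, S and a synthetic observation vector y (none of these numbers come from the text) and verifies that the three formulas agree.

```python
import numpy as np

# Toy check of the equivalent hom S-BLIP representations (8.31)-(8.33).
rng = np.random.default_rng(2)
n, l = 6, 2
C = rng.normal(size=(n, l))
G = rng.normal(size=(n, n))
Sigma_y = G @ G.T + n * np.eye(n)      # hypothesized D{y}
S = np.diag([0.5, 2.0])                # bias weight matrix, regular
y = rng.normal(size=n)                 # one synthetic observation vector

Si = np.linalg.inv(Sigma_y)
z1 = S @ C.T @ np.linalg.solve(Sigma_y + C @ S @ C.T, y)                 # (8.31)
z2 = np.linalg.solve(C.T @ Si @ C + np.linalg.inv(S), C.T @ Si @ y)      # (8.32)
z3 = np.linalg.solve(np.eye(l) + S @ C.T @ Si @ C, S @ C.T @ Si @ y)     # (8.33)
assert np.allclose(z1, z2) and np.allclose(z1, z3)
print("hom S-BLIP prediction:", z1)
```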

Theorem 8.7 (ẑ hom α-VIP):
Let ẑ = Ly be hom α-VIP of z in the special linear Gauss-Markov model with random effects of Box 8.1. Then the equivalent representations of the solutions of the normal equations (8.25)
ẑ = (1/α) S C' (Σ_y + (1/α) C S C')⁻¹ y = (1/α) S C' (Σ_{y−Cz} + C Σ_z C' + (1/α) C S C')⁻¹ y   (8.40)
ẑ = (C' Σ_y⁻¹ C + α S⁻¹)⁻¹ C' Σ_y⁻¹ y   (8.41)
ẑ = (I_ℓ + (1/α) S C' Σ_y⁻¹ C)⁻¹ (1/α) S C' Σ_y⁻¹ y   (8.42)
(if S⁻¹, Σ_y⁻¹ exist)
are completed by the dispersion matrices
D{ẑ} = (1/α) S C' (Σ_y + (1/α) C S C')⁻¹ Σ_y (Σ_y + (1/α) C S C')⁻¹ C S (1/α)   (8.43)
D{ẑ} = (C' Σ_y⁻¹ C + α S⁻¹)⁻¹ C' Σ_y⁻¹ C (C' Σ_y⁻¹ C + α S⁻¹)⁻¹   (8.44)
(if S⁻¹, Σ_y⁻¹ exist),
by the bias vector (8.5)
β := E{ẑ} − E{z} = −[I_ℓ − (1/α) S C' ((1/α) C S C' + Σ_y)⁻¹ C] E{z}
β = −[I_ℓ − (C' Σ_y⁻¹ C + α S⁻¹)⁻¹ C' Σ_y⁻¹ C] E{z}   (8.45)
(if S⁻¹, Σ_y⁻¹ exist)
and by the matrix of the Mean Square Prediction Error MSPE{ẑ}:
MSPE{ẑ} := E{(ẑ − z)(ẑ − z)'} = D{ẑ} + D{z} + ββ'   (8.46)
MSPE{ẑ} = Σ_z + (1/α) S C' (Σ_y + (1/α) C S C')⁻¹ Σ_y (Σ_y + (1/α) C S C')⁻¹ C S (1/α)
        + [I_ℓ − (1/α) S C' ((1/α) C S C' + Σ_y)⁻¹ C] E{z}E{z}' [I_ℓ − C' ((1/α) C S C' + Σ_y)⁻¹ C S (1/α)]   (8.47)
MSPE{ẑ} = Σ_z + (C' Σ_y⁻¹ C + α S⁻¹)⁻¹ C' Σ_y⁻¹ C (C' Σ_y⁻¹ C + α S⁻¹)⁻¹
        + [I_ℓ − (C' Σ_y⁻¹ C + α S⁻¹)⁻¹ C' Σ_y⁻¹ C] E{z}E{z}' [I_ℓ − C' Σ_y⁻¹ C (C' Σ_y⁻¹ C + α S⁻¹)⁻¹]   (8.48)
(if S⁻¹, Σ_y⁻¹ exist).

The interpretation of the very important predictor hom α-VIP ẑ of z is as follows: ẑ of type (8.41), also called ridge estimator or Tykhonov-Phillips regulator, contains the Cayley inverse of the normal equation matrix, which is additively decomposed into C'Σ_y⁻¹C and αS⁻¹. The weight factor α balances the first, inverse dispersion part and the second, inverse bias part. While the experiment informs us of the variance-covariance matrix Σ_y, say Σ̂_y, the bias weight matrix S and the weight factor α are at the disposal of the analyst. For instance, by the choice S = Diag(s₁, …, s_ℓ) we may emphasize increase or decrease of certain bias matrix elements. The choice of an equally weighted bias matrix is S = I_ℓ. In contrast, the weight factor α can be determined by an A-optimal design of type
• tr D{ẑ} = min_α
• β'β = min_α
• tr MSPE{ẑ} = min_α.
In the first case we optimize the trace of the variance-covariance matrix D{ẑ} of type (8.43), (8.44). Alternatively, by means of β'β = min_α we optimize the quadratic bias, where the bias vector β of type (8.45) is chosen, regardless of the dependence on E{z}. Finally, for the third case – the most popular one – we minimize the trace of the Mean Square Prediction Error MSPE{ẑ} of type (8.48), regardless of the dependence on E{z}E{z}'. A small numerical sketch of this α-balancing follows; afterwards we present the proofs of Theorem 8.5, Theorem 8.6 and Theorem 8.7.
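The following sketch illustrates the third (A-optimal) criterion: the hom α-VIP of type (8.41) is evaluated on a grid of weight factors α, and tr MSPE{ẑ} built from (8.44), (8.45) and Σ_z is minimized. All matrices and the hypothesized E{z} are toy assumptions, since E{z}E{z}' is not accessible from data.

```python
import numpy as np

# alpha-balancing for hom alpha-VIP: minimize tr MSPE{z_hat} over a grid
# (toy assumptions for Sigma_y, Sigma_z, C, S and the hypothesized E{z}).
rng = np.random.default_rng(3)
n, l = 8, 3
C = rng.normal(size=(n, l))
G = rng.normal(size=(n, n))
Sigma_y = G @ G.T + n * np.eye(n)
Sigma_z = 0.5 * np.eye(l)
S = np.eye(l)
Ez = np.array([1.0, 0.5, -1.0])
N = C.T @ np.linalg.inv(Sigma_y) @ C          # C' Sigma_y^{-1} C

def trace_mspe(alpha):
    K = np.linalg.inv(N + alpha * np.linalg.inv(S))   # ridge-type inverse of (8.41)
    D_zhat = K @ N @ K                                # dispersion matrix (8.44)
    beta = -(np.eye(l) - K @ N) @ Ez                  # bias vector (8.45)
    return np.trace(D_zhat) + np.trace(Sigma_z) + beta @ beta   # tr of (8.48)

alphas = np.logspace(-3, 2, 60)
best = min(alphas, key=trace_mspe)
print("alpha minimizing tr MSPE on the grid:", best)
```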
Proof:
(i) ẑ = E{z}E{z}' C' [Σ_y + C E{z}E{z}' C']⁻¹ y:
If the matrix Σ_y + C E{z}E{z}' C' of the normal equations of type hom BLIP is of full rank, namely rk(Σ_y + C E{z}E{z}' C') = n, then a straightforward solution of (8.23) is
L̂ = E{z}E{z}' C' [Σ_y + C E{z}E{z}' C']⁻¹.
(ii) ẑ = S C' (Σ_y + C S C')⁻¹ y:
If the matrix Σ_y + C S C' of the normal equations of type hom S-BLIP is of full rank, namely rk(Σ_y + C S C') = n, then a straightforward solution of (8.24) is
L̂ = S C' [Σ_y + C S C']⁻¹.
(iii) ẑ = (C' Σ_y⁻¹ C + S⁻¹)⁻¹ C' Σ_y⁻¹ y:
Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan-Guttman matrix identity) the fundamental matrix identity
S C' (Σ_y + C S C')⁻¹ = (C' Σ_y⁻¹ C + S⁻¹)⁻¹ C' Σ_y⁻¹,
if S⁻¹ and Σ_y⁻¹ exist. Such a result concludes this part of the proof.
(iv) ẑ = (I_ℓ + S C' Σ_y⁻¹ C)⁻¹ S C' Σ_y⁻¹ y:
Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(9)) the fundamental matrix identity
S C' (Σ_y + C S C')⁻¹ = (I_ℓ + S C' Σ_y⁻¹ C)⁻¹ S C' Σ_y⁻¹,
if Σ_y⁻¹ exists. Such a result concludes this part of the proof.
(v) ẑ = (1/α) S C' (Σ_y + (1/α) C S C')⁻¹ y:
If the matrix Σ_y + (1/α) C S C' of the normal equations of type hom α-VIP is of full rank, namely rk(Σ_y + (1/α) C S C') = n, then a straightforward solution of (8.25) is
L̂ = (1/α) S C' [Σ_y + (1/α) C S C']⁻¹.
(vi) ẑ = (C' Σ_y⁻¹ C + α S⁻¹)⁻¹ C' Σ_y⁻¹ y:
Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan-Guttman matrix identity) the fundamental matrix identity
(1/α) S C' (Σ_y + (1/α) C S C')⁻¹ = (C' Σ_y⁻¹ C + α S⁻¹)⁻¹ C' Σ_y⁻¹,
if S⁻¹ and Σ_y⁻¹ exist. Such a result concludes this part of the proof.
(vii) ẑ = (I_ℓ + (1/α) S C' Σ_y⁻¹ C)⁻¹ (1/α) S C' Σ_y⁻¹ y:
Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(9), Duncan-Guttman matrix identity) the fundamental matrix identity
(1/α) S C' (Σ_y + (1/α) C S C')⁻¹ = (I_ℓ + (1/α) S C' Σ_y⁻¹ C)⁻¹ (1/α) S C' Σ_y⁻¹,
if Σ_y⁻¹ exists. Such a result concludes this part of the proof.
(viii) hom BLIP: D{ẑ}:
D{ẑ} := E{[ẑ − E{ẑ}][ẑ − E{ẑ}]'} = E{z}E{z}' C' [Σ_y + C E{z}E{z}' C']⁻¹ Σ_y [Σ_y + C E{z}E{z}' C']⁻¹ C E{z}E{z}'.
By means of the definition of the dispersion matrix D{ẑ} and the substitution of ẑ of type hom BLIP the proof has been straightforward.
(ix) hom S-BLIP: D{ẑ} (1st representation):
D{ẑ} := E{[ẑ − E{ẑ}][ẑ − E{ẑ}]'} = S C' (C S C' + Σ_y)⁻¹ Σ_y (C S C' + Σ_y)⁻¹ C S.
By means of the definition of the dispersion matrix D{ẑ} and the substitution of ẑ of type hom S-BLIP the proof of the first representation has been straightforward.
(x) hom S-BLIP: D{ẑ} (2nd representation):
D{ẑ} := E{[ẑ − E{ẑ}][ẑ − E{ẑ}]'} = (C' Σ_y⁻¹ C + S⁻¹)⁻¹ C' Σ_y⁻¹ C (C' Σ_y⁻¹ C + S⁻¹)⁻¹,
if S⁻¹ and Σ_y⁻¹ exist. By means of the definition of the dispersion matrix D{ẑ} and the substitution of ẑ of type hom S-BLIP the proof of the second representation has been straightforward.
(xi) hom α-VIP: D{ẑ} (1st representation):
D{ẑ} := E{[ẑ − E{ẑ}][ẑ − E{ẑ}]'} = (1/α) S C' (Σ_y + (1/α) C S C')⁻¹ Σ_y (Σ_y + (1/α) C S C')⁻¹ C S (1/α).
By means of the definition of the dispersion matrix D{ẑ} and the substitution of ẑ of type hom α-VIP the proof of the first representation has been straightforward.
(xii) hom α-VIP: D{ẑ} (2nd representation):
D{ẑ} := E{[ẑ − E{ẑ}][ẑ − E{ẑ}]'} = (C' Σ_y⁻¹ C + α S⁻¹)⁻¹ C' Σ_y⁻¹ C (C' Σ_y⁻¹ C + α S⁻¹)⁻¹,
if S⁻¹ and Σ_y⁻¹ exist. By means of the definition of the dispersion matrix D{ẑ} and the substitution of ẑ of type hom α-VIP the proof of the second representation has been straightforward.
(xiii) bias β for hom BLIP, hom S-BLIP and hom α-VIP:
As soon as we substitute into the bias β := E{ẑ} − E{z} = −E{z} + E{ẑ} the various predictors ẑ of type hom BLIP, hom S-BLIP and hom α-VIP, we are directly led to the various bias representations β of type hom BLIP, hom S-BLIP and hom α-VIP.
(xiv) MSPE{ẑ} of type hom BLIP, hom S-BLIP and hom α-VIP:
MSPE{ẑ} := E{(ẑ − z)(ẑ − z)'},
ẑ − z = (ẑ − E{ẑ}) − (z − E{z}) + (E{ẑ} − E{z}),
E{(ẑ − z)(ẑ − z)'} = E{(ẑ − E{ẑ})(ẑ − E{ẑ})'} + E{(z − E{z})(z − E{z})'} + (E{ẑ} − E{z})(E{ẑ} − E{z})',
MSPE{ẑ} = D{ẑ} + D{z} + ββ'.
At first we have defined the Mean Square Prediction Error MSPE{ẑ} of ẑ. Secondly we have decomposed the difference ẑ − z into the three terms
• ẑ − E{ẑ}
• z − E{z}
• E{ẑ} − E{z}
in order to derive thirdly the decomposition of MSPE{ẑ}, namely
• the dispersion matrix of ẑ, namely D{ẑ},
• the dispersion matrix of z, namely D{z},
• the quadratic bias ββ'.
As soon as we substitute into MSPE{ẑ} the dispersion matrix D{ẑ} and the bias vector β of the various predictors ẑ of type hom BLIP, hom S-BLIP and hom α-VIP, we are directly led to the various representations of the Mean Square Prediction Error MSPE{ẑ}. Here is my proof's end.
∎
8-2 Examples
Example 8.1 Nonlinear error propagation with random effect models
Consider a function y = f(z) where y is a scalar-valued observation and z a random effect. Three cases can be specified as follows:
Case 1 (μ_z assumed to be known):
By Taylor series expansion we have
f(z) = f(μ_z) + (1/1!) f'(μ_z)(z − μ_z) + (1/2!) f''(μ_z)(z − μ_z)² + O(3)
E{y} = E{f(z)} = f(μ_z) + (1/2!) f''(μ_z) E{(z − μ_z)²} + O(3),
leading to (cf. E. Grafarend and B. Schaffrin 1983, p. 470)
E{y} = f(μ_z) + (1/2!) f''(μ_z) σ_z² + O(3)
E{(y − E{y})²} = E{[f'(μ_z)(z − μ_z) + (1/2!) f''(μ_z)(z − μ_z)² + O(3) − (1/2!) f''(μ_z) σ_z² − O(3)]²},
hence E{[y − E{y}][y − E{y}]} is given by
σ_y² = f'²(μ_z) σ_z² − (1/4) f''²(μ_z) σ_z⁴ + f' f''(μ_z) E{(z − μ_z)³} + (1/4) f''² E{(z − μ_z)⁴} + O(3).
Finally, if z is quasi-normally distributed, we have
σ_y² = f'²(μ_z) σ_z² + (1/2) f''²(μ_z) σ_z⁴ + O(3).

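A minimal sketch of Case 1 follows: it propagates mean and variance of y = f(z) through the second-order Taylor formulas above and compares with a Monte Carlo estimate. The test function f and the numerical values of μ_z, σ_z are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

# Case 1: second-order propagation of mean and variance of y = f(z),
# z ~ N(mu_z, sigma_z^2), compared with Monte Carlo (toy function f = sin).
f  = np.sin
f1 = np.cos                       # f'
f2 = lambda t: -np.sin(t)         # f''

mu_z, sigma_z = 0.8, 0.05
E_y   = f(mu_z) + 0.5 * f2(mu_z) * sigma_z**2
var_y = f1(mu_z)**2 * sigma_z**2 + 0.5 * f2(mu_z)**2 * sigma_z**4

rng = np.random.default_rng(4)
z = rng.normal(mu_z, sigma_z, 200_000)
print("E{y}:   series %.6f  Monte Carlo %.6f" % (E_y, f(z).mean()))
print("var{y}: series %.3e  Monte Carlo %.3e" % (var_y, f(z).var()))
```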
Case 2 (μ_z unknown, but ξ₀ known as a fixed effect approximation;
this model is implied in E. Grafarend and B. Schaffrin 1983, p. 470, ξ₀ ≠ μ_z):
By Taylor series expansion we have
f(z) = f(ξ₀) + (1/1!) f'(ξ₀)(z − ξ₀) + (1/2!) f''(ξ₀)(z − ξ₀)² + O(3).
Using
ξ₀ = μ_z + (ξ₀ − μ_z)  ⟹  z − ξ₀ = (z − μ_z) + (μ_z − ξ₀)
we have
f(z) = f(ξ₀) + (1/1!) f'(ξ₀)(z − μ_z) + (1/1!) f'(ξ₀)(μ_z − ξ₀)
     + (1/2!) f''(ξ₀)(z − μ_z)² + (1/2!) f''(ξ₀)(μ_z − ξ₀)² + f''(ξ₀)(z − μ_z)(μ_z − ξ₀) + O(3)
and
E{y} = E{f(z)} = f(ξ₀) + f'(ξ₀)(μ_z − ξ₀) + (1/2) f''(ξ₀) σ_z² + (1/2) f''(ξ₀)(μ_z − ξ₀)² + O(3),
leading to E{[y − E{y}][y − E{y}]} as
σ_y² = f'²(ξ₀) σ_z² + f' f''(ξ₀) E{(z − μ_z)³} + 2 f' f''(ξ₀) σ_z² (μ_z − ξ₀)
     + (1/4) f''²(ξ₀) E{(z − μ_z)⁴} + f''²(ξ₀) E{(z − μ_z)³}(μ_z − ξ₀)
     − (1/4) f''²(ξ₀) σ_z⁴ + f''²(ξ₀) σ_z² (μ_z − ξ₀)² + O(3),
and with z being quasi-normally distributed we have
σ_y² = f'²(ξ₀) σ_z² + 2 f' f''(ξ₀) σ_z² (μ_z − ξ₀) + (1/2) f''²(ξ₀) σ_z⁴ + f''²(ξ₀) σ_z² (μ_z − ξ₀)² + O(3),
with the first and third terms (on the right-hand side) being the right-hand-side terms of case 1 (cf. E. Grafarend and B. Schaffrin 1983, p. 470).

Case 3 (μ_z unknown, but z₀ known as a random effect approximation):
By Taylor series expansion we have
f(z) = f(μ_z) + (1/1!) f'(μ_z)(z − μ_z) + (1/2!) f''(μ_z)(z − μ_z)² + (1/3!) f'''(μ_z)(z − μ_z)³ + O(4).
Changing
z − μ_z = z₀ − μ_z = z₀ − E{z₀} − (μ_z − E{z₀})
and introducing the initial bias
−(μ_z − E{z₀}) = E{z₀} − μ_z =: β₀
leads to
z − μ_z = z₀ − E{z₀} + β₀.
Considering
(z − μ_z)² = (z₀ − E{z₀})² + β₀² + 2 (z₀ − E{z₀}) β₀
we have
f(z) = f(μ_z) + (1/1!) f'(μ_z)(z₀ − E{z₀}) + (1/1!) f'(μ_z) β₀
     + (1/2!) f''(μ_z)(z₀ − E{z₀})² + (1/2!) f''(μ_z) β₀² + f''(μ_z)(z₀ − E{z₀}) β₀ + O(3)
E{y} = f(μ_z) + f'(μ_z) β₀ + (1/2) f''(μ_z) σ_{z₀}² + (1/2) f''(μ_z) β₀² + O(3),
leading to E{[y − E{y}][y − E{y}]} as
σ_y² = f'²(μ_z) σ_{z₀}² + f' f''(μ_z) E{(z₀ − E{z₀})³} + 2 f' f''(μ_z) σ_{z₀}² β₀
     + (1/4) f''²(μ_z) E{(z₀ − E{z₀})⁴} + f''²(μ_z) E{(z₀ − E{z₀})³} β₀ + f''²(μ_z) σ_{z₀}² β₀²
     + (1/4) f''²(μ_z) σ_{z₀}⁴ − (1/2) f''²(μ_z) E{(z₀ − E{z₀})²} σ_{z₀}² + O(3),
and with z₀ being quasi-normally distributed we have
σ_y² = f'²(μ_z) σ_{z₀}² + 2 f' f''(μ_z) σ_{z₀}² β₀ + (1/2) f''²(μ_z) σ_{z₀}⁴ + f''²(μ_z) σ_{z₀}² β₀² + O(3),
with the first and third terms (on the right-hand side) being the right-hand-side terms of case 1.

Example 8.2 Nonlinear vector-valued error propagation with random effect models
In a GeoInformation System we ask for the quality of a nearly rectangular planar surface element. Four points {P₁, P₂, P₃, P₄} of an element are assumed to have the coordinates (x₁, y₁), (x₂, y₂), (x₃, y₃), (x₄, y₄) and form an 8×8 full variance-covariance matrix (central moments of order two) and moments of higher order. The planar surface element will be computed according to the Gauss trapezoidal formula
F = Σ_{i=1}^{4} (y_i + y_{i+1})/2 · (x_i − x_{i+1})
with the side condition x₅ = x₁, y₅ = y₁. Note that within the Error Propagation Law
∂²F/∂x∂y ≠ 0
holds.

Figure 8.2: Surface element of a building in the map

First question:
? What is the structure of the variance-covariance matrix of the four points if we assume statistical homogeneity and isotropy of the network (Taylor-Karman structure)?
Second question:
! Approach the criterion matrix in terms of absolute coordinates. Interpolate the correlation function linearly!

Table 8.1: Coordinates of a four dimensional simplex

Point   x          y
P₁      100.00 m   100.00 m
P₂      110.00 m   117.32 m
P₃      101.34 m   122.32 m
P₄       91.34 m   105.00 m

Table 8.2: Longitudinal and lateral correlation functions Σ_m and Σ_ℓ for a Taylor-Karman structured 4-point network

|x|     Σ_m(|x|)   Σ_ℓ(|x|)
10 m    0.700      0.450
20 m    0.450      0.400
30 m    0.415      0.238

Our example refers to the Taylor-Karman structure or the structure function introduced in Chapter 3-222.
:Solution:
The Gauss trapezoidal surface element has the size
F = (y₁ + y₂)/2 (x₁ − x₂) + (y₂ + y₃)/2 (x₂ − x₃) + (y₃ + y₄)/2 (x₃ − x₄) + (y₄ + y₁)/2 (x₄ − x₁).
Once we apply the “error propagation law” we have to use (E44):
σ_F² = J Σ J' + (1/2) H (vec Σ)(vec Σ)' H'.
In our case n = 1 holds, since we have only one function to be computed. In contrast, the variance-covariance matrix enjoys the format 8×8, while the Jacobi matrix of first derivatives is a 1×8 matrix and the Hesse matrix of second derivatives is a 1×64 matrix.
(i) The structure of the homogeneous and isotropic variance-covariance matrix is such that locally 2×2 variance-covariance matrices appear as unit matrices, generating local error circles of identical radius.
(ii) The celebrated Taylor-Karman matrix for absolute coordinates is given by
Σ_ij(x_p, x_q) = Σ_m(|x_p − x_q|) δ_ij + [Σ_ℓ(|x_p − x_q|) − Σ_m(|x_p − x_q|)] Δx_i Δx_j / |x_p − x_q|²
subject to
Δx₁ := Δx = x_p − x_q,  Δx₂ := Δy = y_p − y_q,  i, j ∈ {1, 2};  p, q ∈ {1, 2, 3, 4}.

By means of a linear interpolation we have derived the Taylor-Karman matrix by Table 8.3 and Table 8.4.

Table 8.3: Distances and meridian correlation function Σ_m and longitudinal correlation function Σ_ℓ

p–q   |x_p − x_q|   |x_p − x_q|²   Σ_m    Σ_ℓ    x_p − x_q   y_p − y_q
1–2   20.000        399.982        0.45   0.40   −10.00      −17.32
1–3   22.360        499.978        0.44   0.36    −1.34      −22.32
1–4   10.000        100.000        0.70   0.45     8.66       −5.00
2–3   10.000        100.000        0.70   0.45     8.66       −5.00
2–4   22.360        499.978        0.44   0.36    18.66       12.32
3–4   20.000        399.982        0.45   0.40    10.00       17.32

Table 8.4: Distance function versus Σ_m(x), Σ_ℓ(x)

|x|      Σ_m(x)               Σ_ℓ(x)
10–20    0.95 − 0.025 |x|     0.5 − 0.005 |x|
20–30    0.52 − 0.0035 |x|    0.724 − 0.0162 |x|

Once we take care of Σ_m and Σ_ℓ as functions of the distance for given values of tabulated distances, we arrive at the Taylor-Karman correlation values of type Table 8.5.

Table 8.5: Taylor-Karman matrix for the case study

       x₁      y₁      x₂      y₂      x₃      y₃      x₄      y₄
x₁     1       0       0.438  −0.022   0.441  −0.005   0.512   0.108
y₁             1      −0.022   0.412  −0.005   0.361   0.108   0.638
x₂                     1       0       0.512   0.108   0.381  −0.037
y₂                             1       0.108   0.634  −0.037   0.417
x₃     symmetric                       1       0       0.438  −0.022
y₃                                             1      −0.022   0.412
x₄                                                     1       0
y₄                                                             1

Finally, we have computed the Jacobi matrix of first derivatives in Table 8.6 and the Hesse matrix of second derivatives in Table 8.7.

Table 8.6: Table of the Jacobi matrix

“Jacobi matrix”
J = [∂F/∂x₁, ∂F/∂y₁, …, ∂F/∂x₄, ∂F/∂y₄]
J = ½ [y₂ − y₄, x₄ − x₂, y₃ − y₁, x₁ − x₃, y₄ − y₂, x₂ − x₄, y₁ − y₃, x₃ − x₁]
J = ½ [12.32, −18.66, 22.32, −1.34, −12.32, 18.66, −22.32, 1.34]

:Note:
∂F/∂x_i = (y_{i+1} − y_{i−1})/2,  ∂F/∂y_i = (x_{i−1} − x_{i+1})/2,  with x₀ = x₄, y₀ = y₄, x₅ = x₁, y₅ = y₁.

Table 8.7: Table of the Hesse matrix

“Hesse matrix”
H = (∂/∂x') ⊗ (∂/∂x') F(x)
  = (∂/∂x') ⊗ [∂F/∂x₁, ∂F/∂y₁, …, ∂F/∂x₄, ∂F/∂y₄]
  = [∂²F/∂x₁², ∂²F/∂x₁∂y₁, …, ∂²F/∂x₁∂y₄, ∂²F/∂y₁∂x₁, …, ∂²F/∂y₄∂x₄, ∂²F/∂y₄²]
  = [∂/∂x₁ (∂F/∂x₁, …, ∂F/∂y₄), …, ∂/∂y₄ (∂F/∂x₁, …, ∂F/∂y₄)].

Note the detailed computation in Table 8.8.

Table 8.8: Second derivatives {0, +1/2, −1/2}
:“interim formulae: Hesse matrix”:
∂²F/∂x_i∂x_j = ∂²F/∂y_i∂y_j = 0   ∀ i, j = 1, 2, 3, 4
∂²F/∂x_i∂y_i = ∂²F/∂y_i∂x_i = 0   ∀ i = 1, 2, 3, 4
∂²F/∂x_i∂y_{i−1} = ∂²F/∂y_i∂x_{i+1} = −1/2   ∀ i = 1, 2, 3, 4
∂²F/∂y_i∂x_{i−1} = ∂²F/∂x_i∂y_{i+1} = +1/2   ∀ i = 1, 2, 3, 4.

Results
At first, we list the distances {P₁P₂, P₂P₃, P₃P₄, P₄P₁} of the trapezoidal finite element by |P₁P₂| = 20 (for instance 20 m), |P₂P₃| = 10 (for instance 10 m), |P₃P₄| = 20 (for instance 20 m) and |P₄P₁| = 10 (for instance 10 m).
Second, we compute σ_F²(first term) = J Σ J' by
σ_F²(first term) = ½ [12.32, −18.66, 22.32, −1.34, −12.32, 18.66, −22.32, 1.34] × Σ × ½ [12.32, −18.66, 22.32, −1.34, −12.32, 18.66, −22.32, 1.34]' = 334.7117
with
    ⎡ 1      0      0.438 −0.022  0.442 −0.005  0.512  0.108 ⎤
    ⎢ 0      1     −0.022  0.412 −0.005  0.362  0.108  0.638 ⎥
    ⎢ 0.438 −0.022  1      0      0.512  0.108  0.386 −0.037 ⎥
Σ = ⎢−0.022  0.412  0      1      0.108  0.638 −0.037  0.418 ⎥ .
    ⎢ 0.442 −0.005  0.512  0.108  1      0      0.438 −0.022 ⎥
    ⎢−0.005  0.362  0.108  0.638  0      1     −0.022  0.412 ⎥
    ⎢ 0.512  0.108  0.386 −0.037  0.438 −0.022  1      0     ⎥
    ⎣ 0.108  0.638 −0.037  0.418 −0.022  0.412  0      1     ⎦
Third, we need to compute σ_F²(second term) = ½ H (vec Σ)(vec Σ)' H' by
σ_F²(second term) = ½ H (vec Σ)(vec Σ)' H' = 7.2222 × 10⁻³⁵ ≈ 0,
where
        ⎡  0    0    0    ½    0    0    0   −½ ⎤
        ⎢  0    0   −½    0    0    0    ½    0 ⎥
        ⎢  0   −½    0    0    0    ½    0    0 ⎥
H = vec ⎢  ½    0    0    0   −½    0    0    0 ⎥
        ⎢  0    0    0   −½    0    0    0    ½ ⎥
        ⎢  0    0    ½    0    0    0   −½    0 ⎥
        ⎢  0    ½    0    0    0   −½    0    0 ⎥
        ⎣ −½    0    0    0    ½    0    0    0 ⎦
and vec Σ is the vectorized matrix Σ displayed above.
Finally, we get the variance of the planar surface element F
σ_F² = 334.7117 + 7.2222 × 10⁻³⁵ = 334.7117,
i.e.
σ_F = ±18.2951 (m²).
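The dominant first term of the error propagation law above can be recomputed directly. The sketch below uses the Jacobi vector of Table 8.6 and the 8×8 correlation matrix as transcribed in the computation above (so the printed value ≈ 334.7 should be reproduced up to transcription and rounding of the matrix entries).

```python
import numpy as np

# First ("Jacobian") term J Sigma J' of the surface-element variance.
J = 0.5 * np.array([12.32, -18.66, 22.32, -1.34, -12.32, 18.66, -22.32, 1.34])
Sigma = np.array([
    [ 1.000,  0.000,  0.438, -0.022,  0.442, -0.005,  0.512,  0.108],
    [ 0.000,  1.000, -0.022,  0.412, -0.005,  0.362,  0.108,  0.638],
    [ 0.438, -0.022,  1.000,  0.000,  0.512,  0.108,  0.386, -0.037],
    [-0.022,  0.412,  0.000,  1.000,  0.108,  0.638, -0.037,  0.418],
    [ 0.442, -0.005,  0.512,  0.108,  1.000,  0.000,  0.438, -0.022],
    [-0.005,  0.362,  0.108,  0.638,  0.000,  1.000, -0.022,  0.412],
    [ 0.512,  0.108,  0.386, -0.037,  0.438, -0.022,  1.000,  0.000],
    [ 0.108,  0.638, -0.037,  0.418, -0.022,  0.412,  0.000,  1.000],
])
print("sigma_F^2 (first term) = %.4f" % (J @ Sigma @ J))   # approx. 334.7
```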
Example 8.3 Nonlinear vector-valued error propagation with random effect models
The distance element between P₁ and P₂ has the size
F = √((x₂ − x₁)² + (y₂ − y₁)²).
Once we apply the “error propagation law” we have to use (E44):
σ_F² = J Σ J' + (1/2) H (vec Σ)(vec Σ)' H'.

Table 8.6: Table of the Jacobi matrix

“Jacobi matrix”
J = [∂F/∂x₁, ∂F/∂y₁, …, ∂F/∂x₄, ∂F/∂y₄]
J = [−(x₂ − x₁)/F, −(y₂ − y₁)/F, (x₂ − x₁)/F, (y₂ − y₁)/F, 0, 0, 0, 0]
J = [−0.5, −0.866, 0.5, 0.866, 0, 0, 0, 0].

Table 8.7: Table of the Hesse matrix

“Hesse matrix”
H = (∂/∂x') ⊗ (∂/∂x') F(x)
  = (∂/∂x') ⊗ [∂F/∂x₁, ∂F/∂y₁, …, ∂F/∂x₄, ∂F/∂y₄]
  = [∂²F/∂x₁², ∂²F/∂x₁∂y₁, …, ∂²F/∂x₁∂y₄, ∂²F/∂y₁∂x₁, …, ∂²F/∂y₄∂x₄, ∂²F/∂y₄²]
  = [∂/∂x₁ (∂F/∂x₁, …, ∂F/∂y₄), …, ∂/∂y₄ (∂F/∂x₁, …, ∂F/∂y₄)].

Note the detailed computation in Table 8.8.


Table 8.8: Second derivatives
:“interim formulae: Hesse matrix”:
∂/∂x₁ (∂F/∂x₁, …, ∂F/∂y₄) = [ 1/F − (x₂−x₁)²/F³, −(x₂−x₁)(y₂−y₁)/F³, −1/F + (x₂−x₁)²/F³, (x₂−x₁)(y₂−y₁)/F³, 0, 0, 0, 0 ],
∂/∂y₁ (∂F/∂x₁, …, ∂F/∂y₄) = [ −(x₂−x₁)(y₂−y₁)/F³, 1/F − (y₂−y₁)²/F³, (x₂−x₁)(y₂−y₁)/F³, −1/F + (y₂−y₁)²/F³, 0, 0, 0, 0 ],
∂/∂x₂ (∂F/∂x₁, …, ∂F/∂y₄) = [ −1/F + (x₂−x₁)²/F³, (x₂−x₁)(y₂−y₁)/F³, 1/F − (x₂−x₁)²/F³, −(x₂−x₁)(y₂−y₁)/F³, 0, 0, 0, 0 ],
∂/∂y₂ (∂F/∂x₁, …, ∂F/∂y₄) = [ (x₂−x₁)(y₂−y₁)/F³, −1/F + (y₂−y₁)²/F³, −(x₂−x₁)(y₂−y₁)/F³, 1/F − (y₂−y₁)²/F³, 0, 0, 0, 0 ],
∂/∂x_i (∂F/∂x₁, …, ∂F/∂y₄) = ∂/∂y_i (∂F/∂x₁, …, ∂F/∂y₄) = [0, 0, 0, 0, 0, 0, 0, 0]   ∀ i = 3, 4.

Results
At first, we list the distance {P₁P₂} of the distance element: |P₁P₂| = 20 (for instance 20 m).
Second, we compute σ_F²(first term) = J Σ J' by
σ_F²(first term) = [−0.5, −0.866, 0.5, 0.866, 0, 0, 0, 0] × Σ × [−0.5, −0.866, 0.5, 0.866, 0, 0, 0, 0]' = 1.2000,
where Σ is the Taylor-Karman matrix of Example 8.2.
Third, we need to compute σ_F²(second term) = ½ H (vec Σ)(vec Σ)' H' by
σ_F²(second term) = ½ H (vec Σ)(vec Σ)' H' = 0.0015,
where
        ⎡  0.0375  −0.0217  −0.0375   0.0217   0   0   0   0 ⎤
        ⎢ −0.0217   0.0125   0.0217  −0.0125   0   0   0   0 ⎥
        ⎢ −0.0375   0.0217   0.0375  −0.0217   0   0   0   0 ⎥
H = vec ⎢  0.0217  −0.0125  −0.0217   0.0125   0   0   0   0 ⎥
        ⎢  0        0        0        0        0   0   0   0 ⎥
        ⎢  0        0        0        0        0   0   0   0 ⎥
        ⎢  0        0        0        0        0   0   0   0 ⎥
        ⎣  0        0        0        0        0   0   0   0 ⎦
and vec Σ is the vectorized Taylor-Karman matrix of Example 8.2.
Finally, we get the variance of the distance element F
σ_F² = 1.2000 + 0.0015 = 1.2015,
i.e.
σ_F = ±1.0961 (m).
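For the distance element only the 4×4 block of the Taylor-Karman matrix belonging to (x₁, y₁, x₂, y₂) enters the first term. The following sketch rebuilds J from the coordinates of Table 8.1 and recomputes J Σ J' from that block (values transcribed from Table 8.5); it should reproduce the value 1.2 reported above.

```python
import numpy as np

# First ("Jacobian") term of the distance variance |P1P2|, Example 8.3.
x1, y1, x2, y2 = 100.0, 100.0, 110.0, 117.32
F = np.hypot(x2 - x1, y2 - y1)                                     # = 20 m
J = np.array([-(x2 - x1), -(y2 - y1), (x2 - x1), (y2 - y1)]) / F   # [-0.5, -0.866, 0.5, 0.866]
Sigma4 = np.array([                                                # block for (x1, y1, x2, y2)
    [ 1.000,  0.000,  0.438, -0.022],
    [ 0.000,  1.000, -0.022,  0.412],
    [ 0.438, -0.022,  1.000,  0.000],
    [-0.022,  0.412,  0.000,  1.000],
])
print("sigma_F^2 (first term) = %.4f" % (J @ Sigma4 @ J))          # approx. 1.2
```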
9 The fifth problem of algebraic regression
– the system of conditional equations:
homogeneous and inhomogeneous equations –
{By = −Bi versus c + By = −Bi}

:Fast track reading:
Read Lemma 9.2, 9.3 and 9.6.

Definition 9.1                      Lemma 9.2
“inconsistent homogeneous           “inconsistent homogeneous
conditions”                         conditions”
                                    Lemma 9.3
                                    G_y-norm: least squares solution
                                    Lemma 9.4
                                    G_y-seminorm: least squares solution
Definition 9.5                      Lemma 9.6
“inconsistent inhomogeneous         “inconsistent inhomogeneous
conditions”                         conditions”
Here we shall outline two systems of poor conditional equations, namely homogeneous and inhomogeneous inconsistent equations. First, Definition 9.1 gives us G_y-LESS of a system of
inconsistent homogeneous conditional equations,
which we characterize as the least squares solution with respect to the G_y-seminorm (G_y-norm) by means of Lemma 9.2, Lemma 9.3 (G_y-norm) and Lemma 9.4 (G_y-seminorm). Second, Definition 9.5 specifies G_y-LESS of a system of
inconsistent inhomogeneous conditional equations,
which we alternatively characterize as the corresponding least squares solution with respect to the G_y-seminorm by means of Lemma 9.6. Third, we come up with examples.

9-1 G_y-LESS of a system of inconsistent homogeneous conditional equations
Our point of departure is Definition 9.1, by which we define G_y-LESS of a system of inconsistent homogeneous condition equations.

Definition 9.1 (G_y-LESS of a system of inconsistent homogeneous condition equations):
An n×1 vector i_ℓ of inconsistency is called G_y-LESS (LEast Squares Solution with respect to the G_y-seminorm) of the inconsistent system of linear condition equations
−Bi = By,   (9.1)
if in comparison to all other vectors i ∈ ℝⁿ the inequality
||i_ℓ||²_{G_y} := i_ℓ' G_y i_ℓ ≤ i' G_y i =: ||i||²_{G_y}   (9.2)
holds, in particular if the vector of inconsistency i_ℓ has the least G_y-seminorm.
Lemma 9.2 characterizes the normal equations for the least squares solution of the system of inconsistent homogeneous condition equations with respect to the G_y-seminorm.

Lemma 9.2 (least squares solution of the system of inconsistent homogeneous condition equations with respect to the G_y-seminorm):
An n×1 vector i_ℓ of the system of inconsistent homogeneous condition equations
−Bi = By   (9.3)
is G_y-LESS if and only if the system of normal equations
[G_y  B';  B  0] [i_ℓ;  λ_ℓ] = [0;  −By]   (9.4)
with the q×1 vector λ_ℓ of “Lagrange multipliers” is fulfilled.


:Proof:
G_y-LESS of −Bi = By is constructed by means of the Lagrangean
L(i, λ) := i' G_y i + 2 λ'(Bi + By) = min_{i, λ}.
The first derivatives
∂L/∂i (i_ℓ, λ_ℓ) = 2(G_y i_ℓ + B' λ_ℓ) = 0
∂L/∂λ (i_ℓ, λ_ℓ) = 2(B i_ℓ + By) = 0
constitute the necessary conditions. (The theory of vector-valued derivatives is presented in Appendix B.) The second derivatives
∂²L/∂i∂i' (i_ℓ, λ_ℓ) = 2 G_y ≥ 0
build up, due to the positive semidefiniteness of the matrix G_y, the sufficiency condition for the minimum. The normal equations (9.4) are derived from the two equations of first derivatives, namely
[G_y  B';  B  0] [i_ℓ;  λ_ℓ] = [0;  −By].
∎
Lemma 9.3 is a short review of the system of inconsistent homogeneous condition equations with respect to the G_y-norm, Lemma 9.4 alternatively with respect to the G_y-seminorm.

Lemma 9.3 (least squares solution of the system of inconsistent homogeneous condition equations with respect to the G_y-norm):
An n×1 vector i_ℓ of the system of inconsistent homogeneous condition equations −Bi = By is the least squares solution with respect to the G_y-norm if and only if it solves the normal equations
G_y i_ℓ = −B'(B G_y⁻¹ B')⁻¹ By.   (9.5)
The solution
i_ℓ = −G_y⁻¹ B'(B G_y⁻¹ B')⁻¹ By   (9.6)
is unique. The “goodness of fit” of G_y-LESS is
||i_ℓ||²_{G_y} = i_ℓ' G_y i_ℓ = y' B'(B G_y⁻¹ B')⁻¹ B y.   (9.7)

:Proof:
A basis of the proof could be C. R. Rao's Pandora Box, the theory of inverse partitioned matrices (Appendix A: Fact: Inverse Partitioned Matrix /IPM/ of a symmetric matrix). Due to the rank identity rk G_y = n, the normal equations (9.4) can be solved faster by Gauss elimination:
G_y i_ℓ + B' λ_ℓ = 0
B i_ℓ = −By.
Multiply the first normal equation by B G_y⁻¹ and substitute the second normal equation for B i_ℓ:
B G_y⁻¹ G_y i_ℓ = B i_ℓ = −B G_y⁻¹ B' λ_ℓ  and  B i_ℓ = −By
⟹ B G_y⁻¹ B' λ_ℓ = By
⟹ λ_ℓ = (B G_y⁻¹ B')⁻¹ By.
Finally we substitute the “Lagrange multiplier” λ_ℓ back into the first normal equation in order to prove
G_y i_ℓ + B' λ_ℓ = G_y i_ℓ + B'(B G_y⁻¹ B')⁻¹ By = 0,
i_ℓ = −G_y⁻¹ B'(B G_y⁻¹ B')⁻¹ By.
∎
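A short numerical sketch of Lemma 9.3 (toy B, G_y and y; not from the text) computes i_ℓ via (9.6), checks the condition equations and evaluates the “goodness of fit” (9.7).

```python
import numpy as np

# G_y-norm least squares inconsistency vector i_l = -G_y^{-1} B'(B G_y^{-1} B')^{-1} B y.
rng = np.random.default_rng(9)
n, q = 5, 2
B = rng.normal(size=(q, n))
Gy = np.diag([1.0, 2.0, 1.0, 0.5, 1.0])              # positive definite weight matrix
y = rng.normal(size=n)

Gyi = np.linalg.inv(Gy)
lam = np.linalg.solve(B @ Gyi @ B.T, B @ y)          # Lagrange multipliers
i_l = -Gyi @ B.T @ lam                               # (9.6)
assert np.allclose(B @ i_l, -B @ y)                  # the condition -B i = B y is met
print("||i_l||^2_Gy =", i_l @ Gy @ i_l, "=", y @ B.T @ lam)   # (9.7)
```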
We switch immediately to Lemma 9.4.

Lemma 9.4 (least squares solution of the system of inconsistent homogeneous condition equations with respect to the G_y-seminorm):
An n×1 vector i_ℓ of the system of inconsistent homogeneous condition equations −Bi = By is the least squares solution with respect to the G_y-seminorm if the compatibility condition
R(B') ⊂ R(G_y)   (9.8)
is fulfilled, and solves the system of normal equations
G_y i_ℓ = −B'(B G_y⁻ B')⁻¹ By,   (9.9)
which is independent of the choice of the g-inverse G_y⁻.

9-2 Solving a system of inconsistent inhomogeneous conditional equations
The point of departure is Definition 9.5, a definition of G_y-LESS of a system of inconsistent inhomogeneous condition equations.

Definition 9.5 (G_y-LESS of a system of inconsistent inhomogeneous condition equations):
An n×1 vector i_ℓ of inconsistency is called G_y-LESS (LEast Squares Solution with respect to the G_y-seminorm) of the inconsistent system of inhomogeneous condition equations
c + By = −Bi   (9.10)
(the minus sign is conventional),
if in comparison to all other vectors i ∈ ℝⁿ the inequality
||i_ℓ||²_{G_y} := i_ℓ' G_y i_ℓ ≤ i' G_y i =: ||i||²_{G_y}   (9.11)
holds, in particular if the vector of inconsistency i_ℓ has the least G_y-seminorm.
Lemma 9.6 characterizes the normal equations for the least squares solution of the system of inconsistent inhomogeneous condition equations with respect to the G_y-seminorm.

Lemma 9.6 (least squares solution of the system of inconsistent inhomogeneous condition equations with respect to the G_y-seminorm):
An n×1 vector i_ℓ of the system of inconsistent inhomogeneous condition equations
−Bi = By − c = B(y − d)   (9.12)
is G_y-LESS if and only if the system of normal equations
[G_y  B';  B  0] [i_ℓ;  λ_ℓ] = [0;  −(By − c)]   (9.13)
with the q×1 vector λ_ℓ of Lagrange multipliers is fulfilled. i_ℓ surely exists if
R(B') ⊂ R(G_y),   (9.14)
and it solves the normal equations
G_y i_ℓ = −B'(B G_y⁻ B')⁻¹ (By − c),   (9.15)
which is independent of the choice of the g-inverse G_y⁻. i_ℓ is unique if the matrix G_y is regular and in consequence positive definite.
9-3 Examples
Our two examples relate to the triangular condition, the so-called zero misclosure, within a triangular network, and to the condition that the angle sum within a flat triangle amounts to 180°.
(i) The first example: triplet of height difference observations
We assume that three observations of height differences within the triangle P_α P_β P_γ sum up to zero. The condition of holonomic heights says
h_αβ + h_βγ + h_γα = 0,
namely
B := [1, 1, 1],  y := [h_αβ, h_βγ, h_γα]',  i := [i_αβ, i_βγ, i_γα]'.
The normal equations of the inconsistent condition equations read for the case G_y = I₃:
i_ℓ = −B'(BB')⁻¹ B y,
B'(BB')⁻¹ B = (1/3) [1 1 1; 1 1 1; 1 1 1],
(i_αβ)_ℓ = (i_βγ)_ℓ = (i_γα)_ℓ = −(1/3)(h_αβ + h_βγ + h_γα).
(ii) The second example: angle sum of a planar triangle
Alternatively, we assume three angles which form a planar triangle and sum to
α + β + γ = 180°,
namely
B := [1, 1, 1],  y := [α, β, γ]',  i := [i_α, i_β, i_γ]',  c := 180°.
The normal equations of the inconsistent condition equation read in our case G_y = I₃:
i_ℓ = −B'(BB')⁻¹ (By − c),
B'(BB')⁻¹ = (1/3) [1; 1; 1],  By − c = α + β + γ − 180°,
(i_α)_ℓ = (i_β)_ℓ = (i_γ)_ℓ = −(1/3)(α + β + γ − 180°).
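A compact numerical check of both examples follows (the observation values are assumed sample numbers only): in each case the misclosure is distributed in equal parts onto the three observations.

```python
import numpy as np

# Equal distribution of the misclosure for the two triangle examples.
B = np.array([[1.0, 1.0, 1.0]])

# (i) height-difference loop, homogeneous case, G_y = I_3
h = np.array([1.203, -0.517, -0.689])                       # assumed values (3 mm misclosure)
i_hom = -B.T @ np.linalg.solve(B @ B.T, B @ h)
print("i (homogeneous):", i_hom.ravel())                    # each = -(h_ab + h_bg + h_ga)/3

# (ii) planar triangle, inhomogeneous case with c = 180 degrees
ang = np.array([59.998, 60.004, 60.001])                    # assumed observed angles [deg]
c = 180.0
i_inh = -B.T @ np.linalg.solve(B @ B.T, B @ ang - np.array([c]))
print("i (inhomogeneous):", i_inh.ravel())                  # each = -(a + b + g - 180)/3
```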
10 The fifth problem of probabilistic regression
– general Gauss-Markov model with mixed effects–
Setup of BLUUE for the moments of first order
(Kolmogorov-Wiener prediction)
“Prediction company’s chance of success is not zero, but close to it.”
Eugene Fama
“The best way to predict the future is to invent it.”
Alan Kay

: Fast track reading :
Read only Theorem 10.3 and Theorem 10.5.

Definition 10.1                     Lemma 10.2
Σ_y-BLUUE of ξ and E{z}             Σ_y-BLUUE of ξ and E{z}: normal equations

Theorem 10.3
ξ̂, Ê{z}: Σ_y-BLUUE of ξ and E{z}

Lemma 10.4
Ê{y}: ξ̂, Ê{z} Σ_y-BLUUE of ξ and E{z}

Theorem 10.5
Homogeneous quadratic setup of σ̂²

The inhomogeneous general linear Gauss-Markov model with fixed effects and random effects will be presented first. We review the special Kolmogorov-Wiener model and extend it by the proper stochastic model of type BIQUUE given by Theorem 10.5.

The extensive example for the general linear Gauss-Markov model with fixed effects and random effects concentrates on a height network observed at two epochs. At the first epoch we assume three measured height differences. Between the first and the second epoch we assume height differences which change linearly in time, for instance as a result of an earthquake; we have found the height difference model
h_αβ(τ) = h_αβ(0) + ḣ_αβ τ + O(τ²).
Namely, τ indicates the time interval from the first epoch to the second epoch relative to the height difference h_αβ. Unknown are
• the fixed effects h_αβ and
• the expected values of stochastic effects of type height difference velocities ḣ_αβ,
given the singular dispersion matrix of height differences. Alternative estimation and prediction procedures are of
• type (V + CZC')-BLUUE for the unknown fixed parameter vector ξ of height differences of the initial epoch and the expectation data E{z} of stochastic height difference velocities z,
• type (V + CZC')-BLUUE for the expectation data E{y} of height difference measurements y,
• type ẽ_y of the empirical error vector,
• as well as type (V + CZC')-BLUUP of the stochastic vector z of height difference velocities.
For the unknown variance component σ² of height difference observations we review estimates of type BIQUUE.
At the end, we intend to generalize the concept of estimation and prediction of fixed and random effects by a short historical remark.
10-1 Inhomogeneous general linear Gauss-Markov model (fixed effects and random effects)
Here we focus on the general inhomogeneous linear Gauss-Markov model including fixed effects and random effects. By means of Definition 10.1 we review Σ_y-BLUUE of ξ and E{z}, followed by the related Lemma 10.2, Theorem 10.3 and Lemma 10.4.

Box 10.1:
Inhomogeneous general linear Gauss–Markov model (fixed effects and random effects)
Aξ + C E{z} + γ = E{y}   (10.1)
Σ_z := D{z},  Σ_y := D{y}
C{y, z} = 0
ξ, E{z}, E{y}, Σ_z, Σ_y unknown
γ known
E{y} − γ ∈ R([A, C]).   (10.2)

The n×1 stochastic vector y of observations is transformed by means of y − γ =: ȳ to the new n×1 stochastic vector ȳ of reduced observations, which is characterized by second order statistics, in particular by the first moments E{ȳ} and by the central second moments D{ȳ}.

Definition 10.1 (Σ_y-BLUUE of ξ and E{z}):
The partitioned vector ζ̂ = Ly + κ, namely
[ξ̂; η̂] = [ξ̂; Ê{z}] = [L₁; L₂] y + [κ₁; κ₂],
is called Σ_y-BLUUE of ζ (Best Linear Uniformly Unbiased Estimation with respect to the Σ_y-norm) in (10.1) if
(1st) ζ̂ is uniformly unbiased in the sense of
E{ζ̂} = E{Ly + κ} = ζ for all ζ ∈ ℝ^{m+ℓ}   (10.3)
or
E{ξ̂} = E{L₁y + κ₁} = ξ for all ξ ∈ ℝ^m,
E{η̂} = E{Ê{z}} = E{L₂y + κ₂} = η = E{z} for all η ∈ ℝ^ℓ,   (10.4)
and
(2nd) in comparison to all other linear uniformly unbiased estimations ζ̂ has minimum variance:
tr D{ζ̂} := E{(ζ̂ − ζ)'(ζ̂ − ζ)} = tr L Σ_y L' = ||L'||²_{Σ_y} = min_L   (10.5)
or
tr D{ξ̂} := E{(ξ̂ − ξ)'(ξ̂ − ξ)} = tr L₁ Σ_y L₁' = ||L₁'||²_{Σ_y} = min_{L₁},
tr D{η̂} := tr D{Ê{z}} := E{(η̂ − η)'(η̂ − η)} = E{(Ê{z} − E{z})'(Ê{z} − E{z})} = tr L₂ Σ_y L₂' = ||L₂'||²_{Σ_y} = min_{L₂}.   (10.6)

We shall specify Σ_y-BLUUE of ξ and E{z} by means of κ₁ = 0, κ₂ = 0 and by writing the related normal equations by means of “Lagrange multipliers”.

Lemma 10.2 (Σ_y-BLUUE of ξ and E{z}):
An (m+ℓ)×1 vector [ξ̂', η̂']' = [ξ̂', Ê{z}']' = [L₁', L₂']' y + [κ₁', κ₂']' is Σ_y-BLUUE of [ξ', E{z}']' in (10.1) if and only if
κ₁ = 0,  κ₂ = 0
hold and the matrices L₁ and L₂ fulfil the system of normal equations
[Σ_y  A  C;  A'  0  0;  C'  0  0] [L₁'  L₂';  Λ₁₁  Λ₁₂;  Λ₂₁  Λ₂₂] = [0  0;  I_m  0;  0  I_ℓ]   (10.7)
or
Σ_y L₁' − A Λ₁₁ − C Λ₂₁ = 0,  Σ_y L₂' − A Λ₁₂ − C Λ₂₂ = 0,
A' L₁' = I_m,  A' L₂' = 0,
C' L₁' = 0,  C' L₂' = I_ℓ   (10.8)
with suitable matrices Λ₁₁, Λ₁₂, Λ₂₁ and Λ₂₂ of “Lagrange multipliers”.
Theorem 10.3 specifies the solution of the special normal equations by means of (10.9) relative to the specific “Schur complements” (10.10)–(10.13).

Theorem 10.3 (ξ̂, Ê{z} Σ_y-BLUUE of ξ and E{z}):
Let [ξ̂', Ê{z}']' be Σ_y-BLUUE of [ξ', E{z}']' in the mixed Gauss-Markov model (10.1). Then the equivalent representations of the solution of the normal equations (10.7)
ζ̂ := [ξ̂; η̂] := [ξ̂; Ê{z}] = [A'Σ_y⁻¹A  A'Σ_y⁻¹C;  C'Σ_y⁻¹A  C'Σ_y⁻¹C]⁻¹ [A'; C'] Σ_y⁻¹ y   (10.9)
ξ̂ = {A'Σ_y⁻¹[I_n − C(C'Σ_y⁻¹C)⁻¹C'Σ_y⁻¹]A}⁻¹ A'Σ_y⁻¹[I_n − C(C'Σ_y⁻¹C)⁻¹C'Σ_y⁻¹] y
η̂ = Ê{z} = {C'Σ_y⁻¹[I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]C}⁻¹ C'Σ_y⁻¹[I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹] y
ξ̂ = S_A⁻¹ s_A,   η̂ := Ê{z} = S_C⁻¹ s_C
ξ̂ = (A'Σ_y⁻¹A)⁻¹ A'Σ_y⁻¹ (y − C Ê{z}),   η̂ := Ê{z} = (C'Σ_y⁻¹C)⁻¹ C'Σ_y⁻¹ (y − A ξ̂)
are completed by the dispersion matrices and the covariance matrices
D{ζ̂} := D{[ξ̂; Ê{z}]} = [A'Σ_y⁻¹A  A'Σ_y⁻¹C;  C'Σ_y⁻¹A  C'Σ_y⁻¹C]⁻¹ =: Σ_ζ̂
D{ξ̂} = {A'Σ_y⁻¹[I_n − C(C'Σ_y⁻¹C)⁻¹C'Σ_y⁻¹]A}⁻¹ =: Σ_ξ̂
C{ξ̂, η̂} = C{ξ̂, Ê{z}} = −{A'Σ_y⁻¹[I_n − C(C'Σ_y⁻¹C)⁻¹C'Σ_y⁻¹]A}⁻¹ A'Σ_y⁻¹C(C'Σ_y⁻¹C)⁻¹
          = −(A'Σ_y⁻¹A)⁻¹ A'Σ_y⁻¹C {C'Σ_y⁻¹[I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]C}⁻¹
D{η̂} := D{Ê{z}} = {C'Σ_y⁻¹[I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]C}⁻¹ =: Σ_η̂
D{ξ̂} = S_A⁻¹,   D{η̂} = D{Ê{z}} = S_C⁻¹
C{ξ̂, z} = 0,   C{η̂, z} := C{Ê{z}, z} = 0,
where the “Schur complements” are defined by
S_A := A'Σ_y⁻¹[I_n − C(C'Σ_y⁻¹C)⁻¹C'Σ_y⁻¹]A,   (10.10)
s_A := A'Σ_y⁻¹[I_n − C(C'Σ_y⁻¹C)⁻¹C'Σ_y⁻¹]y,   (10.11)
S_C := C'Σ_y⁻¹[I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]C,   (10.12)
s_C := C'Σ_y⁻¹[I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹]y.   (10.13)

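The Schur-complement form of Theorem 10.3 can be verified against the joint normal equations (10.9). The following sketch uses toy matrices A, C, Σ_y and a synthetic reduced observation vector y (all assumed values) and checks that both routes give the same (ξ̂, Ê{z}).

```python
import numpy as np

# Toy check of Theorem 10.3: Schur-complement solution vs. joint normal equations.
rng = np.random.default_rng(10)
n, m, l = 8, 2, 2
A = rng.normal(size=(n, m))
C = rng.normal(size=(n, l))
G = rng.normal(size=(n, n))
Sy = G @ G.T + n * np.eye(n)                 # hypothesized Sigma_y
Syi = np.linalg.inv(Sy)
y = rng.normal(size=n)                       # reduced observations y - gamma

N = np.block([[A.T @ Syi @ A, A.T @ Syi @ C],
              [C.T @ Syi @ A, C.T @ Syi @ C]])
zeta = np.linalg.solve(N, np.concatenate([A.T @ Syi @ y, C.T @ Syi @ y]))   # (10.9)

PC = np.eye(n) - C @ np.linalg.solve(C.T @ Syi @ C, C.T @ Syi)   # I - C(C'Sy^-1 C)^-1 C'Sy^-1
PA = np.eye(n) - A @ np.linalg.solve(A.T @ Syi @ A, A.T @ Syi)
xi_hat = np.linalg.solve(A.T @ Syi @ PC @ A, A.T @ Syi @ PC @ y)  # S_A^{-1} s_A
Ez_hat = np.linalg.solve(C.T @ Syi @ PA @ C, C.T @ Syi @ PA @ y)  # S_C^{-1} s_C
assert np.allclose(zeta, np.concatenate([xi_hat, Ez_hat]))
print("xi_hat =", xi_hat, " E_hat{z} =", Ez_hat)
```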
Our final result (10.14)–(10.23) summarizes (i) the two forms (10.14) and (10.15) of estimating Ê{y} and D{Ê{y}} as derived covariance matrices, (ii) the empirical error vector ẽ_y and the related variance-covariance matrices (10.19)–(10.21) and (iii) the dispersion matrix D{y} by means of (10.22)–(10.23).

Lemma 10.4 (Ê{y}: ξ̂, Ê{z} Σ_y-BLUUE of ξ and E{z}):
(i) With respect to the mixed Gauss-Markov model (10.1), Σ_y-BLUUE of E{y} = Aξ + C E{z} is given by
Ê{y} = Aξ̂ + C Ê{z} = A S_A⁻¹ s_A + C (C'Σ_y⁻¹C)⁻¹ C'Σ_y⁻¹ (y − A S_A⁻¹ s_A)   (10.14)
or
Ê{y} = Aξ̂ + C Ê{z} = A (A'Σ_y⁻¹A)⁻¹ A'Σ_y⁻¹ (y − C S_C⁻¹ s_C) + C S_C⁻¹ s_C   (10.15)
with the corresponding dispersion matrices
D{Ê{y}} = D{Aξ̂ + C Ê{z}} = A D{ξ̂} A' + A C{ξ̂, Ê{z}} C' + C C{Ê{z}, ξ̂} A' + C D{Ê{z}} C'
D{Ê{y}} = C (C'Σ_y⁻¹C)⁻¹ C' + [I_n − C(C'Σ_y⁻¹C)⁻¹C'Σ_y⁻¹] A S_A⁻¹ A' [I_n − Σ_y⁻¹C(C'Σ_y⁻¹C)⁻¹C']
D{Ê{y}} = A (A'Σ_y⁻¹A)⁻¹ A' + [I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹] C S_C⁻¹ C' [I_n − Σ_y⁻¹A(A'Σ_y⁻¹A)⁻¹A'],
where S_A, s_A, S_C, s_C are the “Schur complements” (10.10), (10.11), (10.12) and (10.13).
The covariance matrix of Ê{y} and z amounts to
C{Ê{y}, z} = C{Aξ̂ + C Ê{z}, z} = 0.   (10.16)
(ii) If the “error vector” e_y is empirically determined by means of the residual vector ẽ_y = y − Ê{y}, we gain the various representations of type
ẽ_y = [I_n − C(C'Σ_y⁻¹C)⁻¹C'Σ_y⁻¹] (y − A S_A⁻¹ s_A)   (10.17)
or
ẽ_y = [I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹] (y − C S_C⁻¹ s_C)   (10.18)
with the corresponding dispersion matrices
D{ẽ_y} = Σ_y − C(C'Σ_y⁻¹C)⁻¹C' − [I_n − C(C'Σ_y⁻¹C)⁻¹C'Σ_y⁻¹] A S_A⁻¹ A' [I_n − Σ_y⁻¹C(C'Σ_y⁻¹C)⁻¹C']   (10.19)
or
D{ẽ_y} = Σ_y − A(A'Σ_y⁻¹A)⁻¹A' − [I_n − A(A'Σ_y⁻¹A)⁻¹A'Σ_y⁻¹] C S_C⁻¹ C' [I_n − Σ_y⁻¹A(A'Σ_y⁻¹A)⁻¹A'],   (10.20)
where S_A, s_A, S_C, s_C are the “Schur complements” (10.10), (10.11), (10.12) and (10.13). ẽ_y and z are uncorrelated because of
C{ẽ_y, z} = 0.   (10.21)
(iii) The dispersion matrix of the observation vector is given by
D{y} = D{Aξ̂ + C Ê{z} + ẽ_y} = D{Aξ̂ + C Ê{z}} + D{ẽ_y} = D{e_y − ẽ_y} + D{ẽ_y}.   (10.22)
ẽ_y and Ê{y} are uncorrelated since
C{ẽ_y, Ê{y}} = C{ẽ_y, Aξ̂ + C Ê{z}} = C{ẽ_y, e_y − ẽ_y} = 0.   (10.23)
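The residual representation (10.17) can be checked numerically: for any numeric y, the projector form equals the direct residual y − Ê{y}. The sketch below uses toy A, C, Σ_y (assumed values only).

```python
import numpy as np

# Toy check of (10.17): y - E_hat{y} equals the projector form of the residual.
rng = np.random.default_rng(12)
n, m, l = 7, 2, 2
A, C = rng.normal(size=(n, m)), rng.normal(size=(n, l))
G = rng.normal(size=(n, n))
Sy = G @ G.T + n * np.eye(n)
Syi = np.linalg.inv(Sy)
y = rng.normal(size=n)

PC = np.eye(n) - C @ np.linalg.solve(C.T @ Syi @ C, C.T @ Syi)
PA = np.eye(n) - A @ np.linalg.solve(A.T @ Syi @ A, A.T @ Syi)
xi_hat = np.linalg.solve(A.T @ Syi @ PC @ A, A.T @ Syi @ PC @ y)   # S_A^{-1} s_A
Ez_hat = np.linalg.solve(C.T @ Syi @ PA @ C, C.T @ Syi @ PA @ y)   # S_C^{-1} s_C

e_direct = y - A @ xi_hat - C @ Ez_hat           # y - E_hat{y}
e_form   = PC @ (y - A @ xi_hat)                 # right-hand side of (10.17)
assert np.allclose(e_direct, e_form)
print("residual vector e_y =", e_direct.round(4))
```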

10-2 Explicit representations of errors in the general Gauss-Markov model with mixed effects
A collection of explicit representations of errors in the general Gauss-Markov model with mixed effects will be presented: ξ, E{z}, y − γ = ȳ, Σ_z, Σ_y will be assumed to be unknown, γ known. In addition, C{y, z} will be assumed to vanish. The prediction of random effects will be summarized here. Note our simple model:
Aξ + C E{z} = E{y},  E{y} ∈ R([A, C]),  rk[A, C] = m + ℓ < n,
E{z} unknown,  Z σ² = D{z},  Z positive semidefinite,  rk Z = s ≤ ℓ,
V σ² = D{y − Cz},  V positive semidefinite,
rk V = t ≤ n,  rk[V, CZ] = n,  C{z, y − Cz} = 0.
A homogeneous-quadratic ansatz σ̂² = y'My will be specified now.


Theorem 10.5 (homogeneous-quadratic setup of σ̂²):
(i) Let σ̂² = y'My = (vec M)'(y ⊗ y) be BIQUUE of σ² with respect to the model above. Then
σ̂² = (n−m−ℓ)⁻¹ [ y'{I_n − (V + CZC')⁻¹ A [A'(V + CZC')⁻¹ A]⁻¹ A'} (V + CZC')⁻¹ y − s_C' S_C⁻¹ s_C ]
σ̂² = (n−m−ℓ)⁻¹ [ y' Q (V + CZC')⁻¹ y − s_A' S_A⁻¹ s_A ],
Q := I_n − (V + CZC')⁻¹ C [C'(V + CZC')⁻¹ C]⁻¹ C',
subject to
[S_A, s_A] := A'(V + CZC')⁻¹ {I_n − C[C'(V + CZC')⁻¹ C]⁻¹ C'(V + CZC')⁻¹} [A, y] = A' Q (V + CZC')⁻¹ [A, y]
and
[S_C, s_C] := C'(V + CZC')⁻¹ {I_n − A[A'(V + CZC')⁻¹ A]⁻¹ A'(V + CZC')⁻¹} [C, y],
where S_A and S_C are “Schur complements”.
Alternatively, we receive the representation based upon the empirical error vector,
σ̂² = (n−m−ℓ)⁻¹ y'(V + CZC')⁻¹ ẽ_y = (n−m−ℓ)⁻¹ ẽ_y'(V + CZC')⁻¹ ẽ_y,
and the related variance
D{σ̂²} = 2(n−m−ℓ)⁻¹ σ⁴ = 2(n−m−ℓ)⁻¹ (σ²)²
or, replacing σ² by its estimate,
D̂{σ̂²} = 2(n−m−ℓ)⁻¹ (σ̂²)².
(ii) If the cofactor matrix V is positive definite, we find for the simple representations of type BIQUUE of σ² the equivalent representations
σ̂² = (n−m−ℓ)⁻¹ y'{V⁻¹ − V⁻¹ [A, C] [A'V⁻¹A  A'V⁻¹C;  C'V⁻¹A  C'V⁻¹C]⁻¹ [A'; C'] V⁻¹} y
σ̂² = (n−m−ℓ)⁻¹ ( y' Q V⁻¹ y − s_A' S_A⁻¹ s_A )
σ̂² = (n−m−ℓ)⁻¹ y' V⁻¹ [I_n − A(A'V⁻¹A)⁻¹ A'V⁻¹] (y − C S_C⁻¹ s_C)
subject to the projection matrix
Q = I_n − V⁻¹ C (C'V⁻¹C)⁻¹ C'
and
[S_A, s_A] := A'V⁻¹ [I_n − C(C'V⁻¹C)⁻¹ C'V⁻¹] [A, y] = A' Q V⁻¹ [A, y],
[S_C, s_C] := {I_ℓ + C'V⁻¹[I_n − A(A'V⁻¹A)⁻¹A'V⁻¹] C Z}⁻¹ C'V⁻¹[I_n − A(A'V⁻¹A)⁻¹A'V⁻¹] [C, y].
Alternatively, we receive the representation based upon the empirical error vector,
σ̂² = (n−m−ℓ)⁻¹ y'V⁻¹ ẽ_y = (n−m−ℓ)⁻¹ ẽ_y'V⁻¹ ẽ_y,
and the related variances
D{σ̂²} = 2(n−m−ℓ)⁻¹ σ⁴ = 2(n−m−ℓ)⁻¹ (σ²)²,
D̂{σ̂²} = 2(n−m−ℓ)⁻¹ (σ̂²)².
The proofs are straightforward.
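A minimal simulation sketch of the variance-component estimate of Theorem 10.5 (i) follows. All cofactor matrices, the true σ² and the fixed effects are toy assumptions; the mean parameters ξ and E{z} are estimated jointly (equivalent to the Schur-complement route above), and σ̂² is the weighted residual quadratic form divided by n − m − ℓ.

```python
import numpy as np

# Toy sketch: sigma^2_hat = e'_y (V + C Z C')^{-1} e_y / (n - m - l).
rng = np.random.default_rng(11)
n, m, l = 10, 2, 2
A = rng.normal(size=(n, m))
C = rng.normal(size=(n, l))
V = np.eye(n)                                   # cofactor matrix of y - Cz
Z = 0.5 * np.eye(l)                             # cofactor matrix of z
Sy = V + C @ Z @ C.T                            # Sigma_y / sigma^2
Syi = np.linalg.inv(Sy)

sigma2_true = 0.3
xi = np.array([1.0, -2.0])
z = rng.multivariate_normal(np.zeros(l), sigma2_true * Z)   # E{z} = 0 in this toy run
e = rng.multivariate_normal(np.zeros(n), sigma2_true * V)
y = A @ xi + C @ z + e

X = np.hstack([A, C])                           # estimate xi and E{z} jointly
beta = np.linalg.solve(X.T @ Syi @ X, X.T @ Syi @ y)
e_y = y - X @ beta                              # empirical error vector
sigma2_hat = e_y @ Syi @ e_y / (n - m - l)
print("sigma^2_hat =", sigma2_hat, " (true value", sigma2_true, ")")
```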


10-3 An example for collocation
Here we will focus on a special model with fixed effects and random effects, in particular with ξ, E{z}, E{y}, Σ_z, Σ_y unknown, but γ known.
We depart from analyzing a height network observed at two epochs. At the initial epoch three height differences have been observed. From the first epoch to the second epoch we assume height differences which change linearly in time, for instance caused by an earthquake. There is a height variation model
h_αβ(τ) = h_αβ(0) + ḣ_αβ(0) τ + O(τ²),
where τ denotes the time difference from the first epoch to the second epoch, related to the height difference. Unknown are the fixed height differences h_αβ and the expected values of the random height difference velocities ḣ_αβ. Given is the singular dispersion matrix of height difference measurements. Alternative estimation and prediction data are of type (V + CZC')-BLUUE for the unknown parameter vector ξ of height differences at the initial epoch and the expected data E{z} of stochastic height difference velocities z, of type (V + CZC')-BLUUE of the expected data E{y} of height difference observations y, of type ẽ_y of the empirical error vector of observations, and of type (V + CZC')-BLUUP for the stochastic vector z of height difference velocities. For the unknown variance component σ² of height difference observations we use estimates of type BIQUUE. In detail, our model assumptions are
epoch 1
E{[h_αβ, h_βγ, h_γα]'} = [1 0; 0 1; −1 −1] [h_αβ, h_βγ]'
epoch 2
E{[k_αβ, k_βγ, k_γα]'} = [1 0; 0 1; −1 −1] [h_αβ, h_βγ]' + [τ 0; 0 τ; −τ −τ] [E{ḣ_αβ}, E{ḣ_βγ}]'
epoch 1 and 2
E{[h_αβ, h_βγ, h_γα, k_αβ, k_βγ, k_γα]'} = A [h_αβ, h_βγ]' + C [E{ḣ_αβ}, E{ḣ_βγ}]'
with
y := [h_αβ, h_βγ, h_γα, k_αβ, k_βγ, k_γα]',
A := [1 0; 0 1; −1 −1; 1 0; 0 1; −1 −1],
ξ := [h_αβ, h_βγ]'.

C := [0 0; 0 0; 0 0; τ 0; 0 τ; −τ −τ],   E{z} = [E{ḣ_αβ}, E{ḣ_βγ}]'
rank identities
rk A = 2,  rk C = 2,  rk [A, C] = m + ℓ = 4.
The singular dispersion matrix D{y} = V σ² of the observation vector y and the singular dispersion matrix D{z} = Z σ² are determined in the following. We separate three cases.
(i) rk V = 6, rk Z = 1:
V = I₆,  Z = (1/τ²) [1 1; 1 1]
(ii) rk V = 5, rk Z = 2:
V = Diag(1, 1, 1, 1, 1, 0),  Z = (1/τ²) I₂,  rk(V + CZC') = 6
(iii) rk V = 4, rk Z = 2:
V = Diag(1, 1, 1, 1, 0, 0),  Z = (1/τ²) I₂,  rk(V + CZC') = 6.
In order to be as simple as possible we use the time interval τ = 1.
With the numerical values of matrix inversion and of “Schur-complements”, e.g.
Table 1: (V +CZCc)-1
Table 2: {I n -A[Ac(V +CZCc)-1 A]-1 Ac(V +CZCc)-1 }
Table 3: {I n -C[Cc(V +CZCc)-1C]-1Cc(V +CZCc)-1 }
Table 4: “Schur-complements” SA, SC
Table 5: vectors sA, sC

1st case: n
ȟ̂ , D{ȟˆ} , E n
{z} , D{E{y}}
10-3 An example for collocation 389

1 ª 2 1 1 0 0 0 º 1 ª 2 y + y2  y3 º
ȟˆ = « y= « 1 ,
3 ¬1 2 1 0 0 0¼ » 3 ¬ y1 + 2 y2 + y3 »¼
V 2 ª2 1º
D{ȟˆ} = ,
3 «¬1 2 »¼

n ª 2 1 1 2 1 1º ª 2 y1  y2 + y3 + 2 y4 + y5  y6 º
E{z} = « y=« ,
¬ 1  2  1 1 2 1 »
¼ ¬  y1  2 y2  y3 + y4 + 2 y5 + y6 »¼
2
n V ª7 5º
D{E{z}} = «¬ 5 7 »¼ ,
3

2nd case: n
ȟ̂ , D{ȟˆ} , E n
{z} , D{E{z}}
1 ª 2 1 1 0 0 0 º 1 ª 2 y + y2  y3 º
ȟˆ = « y= « 1 ,
3 ¬1 2 1 0 0 0¼ » 3 ¬ y1 + 2 y2 + y3 »¼
V 2 ª2 1º
D{ȟˆ} = ,
3 «¬1 2 »¼

n ª 4 2 2 3 3 3º
E{z} = « y,
¬ 2 4 2 3 3 3 »¼

n V 2 ª13 5 º
D{E{z}} = ,
6 «¬ 5 13»¼

3rd case: n
ȟ̂ , D{ȟˆ} , E n
{z} , D{E{z}}
1 ª 2 1 1 0 0 0 º 1 ª 2 y + y2  y3 º
ȟˆ = « y= « 1 ,
3¬ 1 2 1 0 0 0 »
¼ 3 ¬ y1 + 2 y2 + y3 »¼
V 2 ª2 1º
D{ȟˆ} = ,
3 «¬1 2 »¼

n ª 2 1 1 3 0 0 º 1 ª 2 y1  y2 + y3 + 3 y4 º
E{z} = « ,
¬ 1 2 1 0 3 0 »¼ 3 «¬  y1  2 y2  y3 + 3 y5 »¼
2
n V ª5 1 º
D{E{z}} = «¬1 5»¼ .
3

Table 1:
Matrix inverse (V +CZCc)-1 for a mixed Gauss-Markov
model with fixed and random effects

V +CZCc (V +CZCc)-1
390 10 The fifth problem of probabilistic regression

1st case ª1 0 0 0 0 0º ª3 0 0 0 0 0 º
«0 1 0 0 0 0» «0 3 0 0 0 0 »
« » 1 ««0 »
«0 0 1 0 0 0» 0 3 0 0 0 »
«0 0 0 2 1 0» 3 «0 0 0 2 1 0»
«0 0 0 1 2 0» «0 0 0 1 2 0»
«0 0 0 0 0 1 »¼ «0 0 0 0 0 3»¼
¬ ¬
2nd case ª1 0 0 0 0 0 º ª4 0 0 0 0 0 º
«0 1 0 0 0 0 » «0 4 0 0 0 0 »
« » 1 «« 0 »
«0 0 1 0 0 0 » 0 4 0 0 0 »
«0 0 0 2 0 1» 4 «0 0 0 3 1 2»
«0 0 0 0 2 1» «0 0 0 1 3 2 »
«0 0 0 1 1 2 »¼ «0 0 0 2 2 4 »¼
¬ ¬
3rd case ª1 0 0 0 0 0 º ª1 0 0 0 0 0 º
«0 1 0 0 0 0 » «0 1 0 0 0 0 »
« » « »
«0 0 1 0 0 0 » «0 0 1 0 0 0 »
«0 0 0 1 0 1» «0 0 0 2 1 1»
«0 0 0 0 1 1» «0 0 0 1 2 1»
«0 0 0 1 1 3 »¼ «0 0 0 1 1 1 »¼
¬ ¬

Table 2:
Matrices {I n +[ Ac( V +CZCc) -1 A]1 Ac(V +CZCc) -1} for a mixed
Gauss-Markov model with fixed and random effects

{I n -A[ A c(V +CZCc) -1 A]-1 Ac(V +CZCc) -1 }

1st case ª 13 7 4 5 1 4º
« 7 13 4 1 5 4 »
1 «« 4 4 16 4 4 8 »
»
24 « 11 7 4 19 1 4»
« 7 11 4 1 19 4 »
« 4 4 8 4 4 16 »¼
¬
2nd case ª 13 5 6 4 4 3º
« 5 13 6 4 4 3 »
1 «« 6 6 12 0 0 6 »
»
24 « 11 5 6 20 4 3»
« 5 11 6 4 20 3 »
« 6 6 12 0 0 18 »¼
¬
10-3 An example for collocation 391

3rd case ª5 1 2 3 1 0 º
« 1 5 2  1  3 0 »
1 «« 2 2 4 2 2 0 »
»
8 « 3 1 2 5 1 0 »
« 1 3 2 1 5 0 »
«2 2 4 2 2 8 »¼
¬

Table 3:
Matrices {I n  C[Cc(V +CZCc)-1Cc]1 Cc(V +CZCc)-1 } for a mixed Gauss-Markov
model with fixed and random effects

{I n -C[Cc(V +CZCc)-1C]-1Cc(V +CZCc)-1 }


1st case ª3 0 0 0 0 0º
«0 3 0 0 0 0»
1 «« 0 0 3 0 0 0»
»
3 «0 0 0 1 1 1»
«0 0 0 1 1 1»
«0 0 0 0 0 0»¼
¬

2nd case ª2 0 0 0 0 0º
«0 2 0 0 0 0»
« »
«0 0 2 0 0 0»
«0 0 0 1 1 1 »
«0 0 0 1 1 1»
«0 0 0 0 0 0 »¼
¬
3rd case ª1 0 0 0 0 0º
«0 1 0 0 0 0»
« »
«0 0 1 0 0 0»
«0 0 0 0 0 0»
«0 0 0 0 0 0»
«0 0 0 1 1 1 »¼
¬

Table 4:
“Schur-complements” SA, SC , three cases, for a
mixed Gauss-Markov model with fixed and random effects
SA SC
S A1 SC1
(10.10) (10.12)
392 10 The fifth problem of probabilistic regression

1st case ª 2 1º 1 ª2 1º 1 ª 7 5º 1 ª7 5º


«¬ 1 2 »¼ 3 «¬1 2 »¼ 8 «¬ 5 7 »¼ 3 «¬ 5 7 »¼

2nd case ª 2 1º 1 ª2 1º 1 ª13 5º 1 ª13 5 º


«¬ 1 2 »¼ 3 «¬1 2 »¼ 24 «¬ 5 13 »¼ 6 «¬ 5 13»¼

3rd case ª 2 1º 1 ª2 1º 1 ª 5 1º 1 ª5 1 º


«¬ 1 2 »¼ 3 «¬1 2 »¼ 8 «¬ 1 5 »¼ 3 «¬1 5»¼

Table 5:
Vectors sA and sC, three cases, for a mixed
Gauss-Markov model with fixed and random effects
sA sC
(10.11) (10.13)
1st case ª1 0 1 0 0 0 º 1 ª 9 3 12 9 3 12 º
«0 1 1 0 0 0 » y 8 ¬ 3 9 12 3 9 12 »¼
« y
¬ ¼
2nd case ª1 0 1 0 0 0 º 1 ª 7 1 6 4 4 9 º
«0 1 1 0 0 0 » y 24 «¬ 1 7 6 4 4 9 »¼
y
¬ ¼
3rd case ª1 0 1 0 0 0 º 1 ª 3 1 2 5 1 0 º
«0 1 1 0 0 0 » y 8 «¬ 1 3 2 1 5 0 »¼
y
¬ ¼

First case:
n
ȟ̂ , D{ȟˆ} , E n
{z} , D{E{z}}

Second case:
n
ȟ̂ , D{ȟˆ} , E n
{z} , D{E{z}}

Third case:
n
ȟ̂ , D{ȟˆ} , E n
{z} , D{E{z}}.
Here are the results of computing
n
E n
{y} , D{E{z}}, e y and D{e y } ,

ordered as case 1, case 2, and case 3.


10-3 An example for collocation 393

Table 6:
Numerical values, Case 1
n
1st case: E n
{y} , D{E {y}} , e y , D{e y }
1

ª2 1 1 0 0 0º ª 2 y1 + y2  y3 º
«1 2 1 0 0 0 » « y1 + 2 y2 + y3 »
« » « »
n 1 1 2 0 0 0»  y + 2 y2 + 2 y3 »
E{y} = « y=« 1
«0 0 0 2 1 1» « 2 y4 + y5  y6 »
«0 0 0 1 2 1 » « y4 + 2 y5  y6 »
«0 0 0 1 1 2¼» « y + y + 2y »
¬ ¬ 4 5 6 ¼

ª2 1 1 0 0 0º
«1 2 1 0 0 0»
2 « »
n V « 1 1 2 0 0 0»
D{E{y}} =
3 «0 0 0 5 4 1»
«0 0 0 4 5 1»
«0 0 0 1 1 2 »¼
¬
ª 1 5 8 5 1 4 º
« 5 1 8 1 5 4»
1 «« 8 8 4 4 4 8»
»
e y =
12 « 11 7 4 7 11 8 »
«7 11 4 11 7 8»
« 4 4 8 8 8 4 »¼
¬
ª 1 1 1 0 0 0º
« 1 1 1 0 0 0»
2 « »
V « 1 1 1 0 0 0»
D{e y } = .
3 «0 0 0 1 1 1»
«0 0 0 1 1 1»
«0 0 0 1 1 1 »¼
¬

Table 7
Numerical values, Case 2
n
2nd case: E n
{y} , D{E{y}} , e y , D{e y }
1

ª4 2 2 0 0 0º
«2 4 2 0 0 0»
n 1 « 2 2 4 0 0 0»
E{y} = «
6« 0 0 0 3 3 3»
»
«0 0 0 3 3 3»
¬0 0 0 0 0 6¼
394 10 The fifth problem of probabilistic regression

ª2 1 1 0 0 0º
«1 2 1 0 0 0»
n V 2
« 1 1 2 0 0 0»
D{E{y}} = «
3 «0 0 0 5 4 1»
»
«0 0 0 4 5 1»
¬0 0 0 1 1 2¼

ª 2 2 2 0 0 0 º ª 2 y1  2 y2 + 2 y3 º
« 2 2 2 0 0 0 » « 2 y1 + 2 y2  2 y3 »
« »
1 «« 2 2 2 0 0 0 »» 2 y  2 y2 + 2 y3 »
e y = y=« 1
6 « 0 0 0 3 3 3 » « 3 y4  3 y5 + 3 y6 »
« 0 0 0 3 3 3» « 3 y4  3 y5 + 3 y6 »
«0 0 0 0 0 0» « »
¬ ¼ ¬ 3 y4 + 3 y5  3 y6 ¼
ª 2 2 2 0 0 0º
« 2 2 2 0 0 0»
2 « »
V « 2 2 2 0 0 0»
D{e y } =
6 « 0 0 0 3 3 0»
« 0 0 0 3 3 0»
«0 0 0 0 0 0 »¼
¬
n
1st case: E n
{y} , D{E{y}} , e y , D{e y }

ª2 1 1 0 0 0º ª 2 y1 + y2  y3 º
«1 2 1 0 0 0» « y1 + 2 y2 + y3 »
« » « »
n 1 1 2 0 0 0»  y + 2 y2 + 2 y3 »
E{y} = « y=« 1
«0 0 0 2 1 1» « 2 y4 + y5  y6 »
«0 0 0 1 2 1 » « y4 + 2 y5  y6 »
«0 0 0 1 1 2¼» « y + y + 2y »
¬ ¬ 4 5 6 ¼

ª2 1 1 0 0 0º
«1 2 1 0 0 0»
2 « »
n V « 1 1 2 0 0 0»
D{E{y}} =
3 «0 0 0 5 4 1»
«0 0 0 4 5 1»
«0 0 0 1 1 2 »¼
¬
ª 1 5 8 5 1 4 º
« 5 1 8 1 5 4»
1 «8 8 4 4 4 8»
»
e y = «
12 « 11 7 4 7 11 8 »
«7 11 4 11 7 8»
« 4 4 8 8 8 4 »¼
¬
10-3 An example for collocation 395

ª 1 1 1 0 0 0º
« 1 1 1 0 0 0»
2 « »
V « 1 1 1 0 0 0»
D{e y } = .
3 «0 0 0 1 1 1»
«0 0 0 1 1 1»
«0 0 0 1 1 1 »¼
¬

Table 8
Numerical values, Case 3
n
3rd case: E n
{y} , D{E{y}} , e y , D{e y }

ª0 0 0 0 0 0º
«0 0 0 0 0 0»
n 1« 0 0 0 0 0 0»
»
E{y} = «
3 « 2 1 1 0 0 0»
« 1 2 1 0 3 0»
« 1 1 0 3 3 0»¼
¬
ª2 1 1 0 0 0º
«1 2 1 0 0 0»
2 « »
n V « 1 1 2 0 0 0»
D{E{y}} =
3 «0 0 0 3 0 3»
«0 0 0 0 3 3»
«0 0 0 3 3 6 »¼
¬
ª 1 1 1 0 0 0º ª y1  y2 + y3 º
« 1 1  1 0 0 0» «  y1 + 2 y2  y3 »
1 « 1 1 1 0 0
»

«
y  y2 + y3 »
»
e y = « y=« 1
3« 0 0 0 0 0 0» « 0 »
«0 0 0 0 0 0» « 0 »
«0 0 0 0 0 »
0¼ « 0 »
¬ ¬ ¼
ª 1 1 1 0 0 0º
« 1 1 1 0 0 0»
V 2 «« 1 1 1 0 0 0»
»
D{e y } = .
3 «0 0 0 0 0 0»
«0 0 0 0 0 0»
«0 0 0 0 0 0 »¼
¬
396 10 The fifth problem of probabilistic regression

Table 9
Data of type z , D{z} for 3 cases
1st case:
1 ª 2 1 1 2 1 1º
z = « y,
3 ¬ 1 2 1 1 2 2 »¼
V ª7 5 º
D{z} = ,
3 «¬ 5 7 »¼
2nd case:
1 ª 4 2 2 3 3 3º
z = « y,
3 ¬ 2 4 2 3 3 3 »¼
V ª13 5 º
D{z} = ,
6 «¬ 5 13»¼
3rd case:
1 ª 2 1 1 3 0 0 º
z = « y,
3 ¬ 1 2 1 0 3 0 »¼
V ª5 1 º
D{z} = ,
3 «¬1 5»¼

Table 10
Data of type Vˆ 2 , D{Vˆ 2 }, Dˆ {Vˆ 2 } for 3 cases
1st case: n=6, m=2, A = 2 , n  m  A = 2
ª7 1 2 5 1 4º
« 1 7 2 1 5 4 »
1 « 2 2 10 4 4 8 »
»
Vˆ 2 = y c « y , D{Vˆ 2 } = V 4 ,
12 « 5 1 4 7 1 2 »
« 1 5 4 1 7 2»
«4 4 8 2 2 10 »¼
¬
ª7 1 2 5 1 4º
« 1 7 2 1 5 4 »
« »
1 2 2 10 4 4 8 » 2
Dˆ {Vˆ 2 } = {y c « y} ,
144 « 5 1 4 7 1 2 »
« 1 5 4 1 7 2»
«4 4 8 2 2 10 »¼
¬
2nd case: n=6, m=2, A = 2 , n  m  A = 2
10-4 Comments 397

ª 2 2 2 0 0 0 º
« 2 2 2 0 0 0 »
1 « 2 2 2 0 0 0 »
Vˆ = y c « » y , D{Vˆ } = V ,
2 2 4

12 « 0 0 0 3 3 3 »
« 0 0 0 3 3 3»
¬ 0 0 0 3 3 3 ¼
ª2 2 2 0 0 0 º
« 2 2 2 0 0 0 »
ˆ 1 «2 2 2 0 0 0 » 2
D{Vˆ } =
2
{y c « y} ,
144 « 0 0 0 3 3 3 »
»
«0 0 0 3 3 3»
¬0 0 0 3 3 3 ¼
3rd case: n=6, m=2, A = 2, nmA = 2
ª 1 1 1 0 0 0º
« 1 1 1 0 0 0»
1 « 1 1 1 0 0 0»
Vˆ 2 = y c « y , D{Vˆ 2 } = V 4 ,
6 «0 0 0 0 0 0»
»
«0 0 0 0 0 0»
¬0 0 0 0 0 0¼

ª 1 1 1 0 0 0º
« 1 1 1 0 0 0»
1 « 1 1 1 0 0 0» 2
Dˆ {Vˆ } =
2
{y c « y} .
144 « 0 0 0 0 0 0»
»
«0 0 0 0 0 0»
¬0 0 0 0 0 0¼
Here is my journey’s end.
10-4 Comments
(i)
In their original contribution A. N. Kolmogorov (1941 a, b, c) and N. Wiener
(“yellow devil”, 1939) did not depart from our general setup of fixed effects and
random effects. Instead they departed from the model
q q
“ yPD = cPD + ¦ cPD QE yQE + ¦c PD QE QJ yQE yQJ + O ( y 3 ) ”
E =1 E J , =1

of nonlinear prediction, e. g. homogeneous linear prediction


yP = yQ cPQ + yQ cPQ + " + yQ cPQ
1 1 1 1 2 1 2 q 1 q

yP = yQ cP Q + yQ cP Q + " + yQ cP Q
2 1 2 1 2 2 2 q 2 q


yP = yQ cP Q + yQ cP Q + " + yQ cP Q
p 1 1 p 1 1 2 p 1 2 q p 1 q

yP = yQ cP Q + yQ cP Q + " + yQ cP Q .
p 1 p 1 2 p 2 q p q
398 10 The fifth problem of probabilistic regression

From given values of “random effects” ( yQ ," , yQ ) other values of “random


1 p
effects” ( yP ," , yP ) have been predicted, namely under the assumption of
1 p
“equal correlation” P of type

P1 P2 = Q1 Q2“

n
E{ yP } = C(CȈ y1 C) 1 CȈ y1 yP
P P

n
D{E{ yP }} = C(CȈ y1 C) 1 Cc P

or
for all yP  { yP ," , yP }
1 P

|| yP  yˆ P ||:= E{( yP  yˆ P )}2 d min ,


Qq
E{( yP  yˆ P ) 2 } = E{( y2i  ¦ y1c j cij ) 2 } = min( KW )
Q1

for homogeneous linear prediction.


“ansatz”
E{ yP  E{ yˆ P }} := 0 , E{( yP  E{ yˆ P })( yP  E{ yˆ P })} = cov( yP , yP )
1 1 2 2 1 2

Q = Qq

E{( yP  yˆ P ) 2 } = cov( P, P )  ¦c pj cov( P, Q j ) + ¦¦ c pj c pk cov(Q j , Qk ) = min


Q = Q1 Qj Qk

“Kolmogorov-Wiener prediction”
k =q
c pj ( KW ) = ¦ [ cov(Q j , Qk )]1 cov(Qk , P)
k =1

q q
E{( yP  yˆ P ) 2 | KW } = cov( P, P)  ¦¦ cov( P, Q j ) cov( P, Qk )[cov(Q j , Qk )]1
j =1 k =1

constrained to

| Q j  Qk |
Qj Qk

cov(Q j , Qk ) = cov(Q j  Qk )

“ yP  E{ yP } is weakly translational invariant”


KW prediction suffers from the effect that “apriori” we know only one realiza-
tion of the random function y (Q1 )," , y (Qq ) , for instance.
10-4 Comments 399

n 1
cov(W) = ¦ ( yQ  E{ yQ })( yQ  E{ yQ }).
NG |Q j  Qk |
j j k k

Modified versions of the KW prediction exist if we work with random fields in


the plane (weak isotropy) or on the plane (rotational invariances).
As a model of “random effects” we may write

¦c PD QE E{ yQE } = E{ yPD }  CE{z} = E{ y}.


QE

(ii)
The first model applies if we want to predict data of one type to predict data of
the same type. Indeed, we have to generalize if we want to predict, for instance,
vertical deflections,
gravity gradients, from gravity disturbances.
gravity values,
The second model has to start from relating one set of heterogeneous data to
another set of heterogeneous data. In the case we have to relate the various data
sets to each other. An obvious alternative setup is

¦c 1 PD QE E{z1QE } + ¦ c2 PD RJ E{z2 RJ } + " = E{ yPD } .


QE RJ

(iii)
The level of collocation is reached if we include a trend model in addition to
Kolmogorov-Wiener prediction, namely
E{y} = Aȟ + CE{z} ,
the trend being represented by Aȟ . The decomposition of „trend“ and “signal”
is well represented in E. Grafarend (1976), E. Groten (1970), E. Groten and H.
Moritz (1964), S. Heitz (1967, 1968, 1969), S. Heitz and C.C. Tscherning (1972),
R.A. Hirvonen (1956, 1962), S. K. Jordan (1972 a, b, c, 1973), W.M. Kaula
(1959, 1963, 1966 a, b, c, 1971), K.R. Koch (1973 a, b), K.R. Koch and S. Lauer
(1971), L. Kubackova (1973, 1974, 1975), S. Lauer (1971 a, b), S.L. Lauritzen
(1972, 1973, 1975), D. Lelgemann (1972, 1974), P. Meissl (1970, 1971), in par-
ticular H. Moritz (1961, 1962, 1963 a, b, c, d, 1964, 1965, 1967, 1969, 1970 a, b,
c, d, e, 1971, 1972, 1973 a, b, c, d, e, f, 1974 a, b, 1975), H. Moritz and K.P.
Schwarz (1973), W. Mundt (1969), P. Naicho (1967, 1968), G. Obenson (1968,
1970), A.M. Obuchow (1947, 1954), I. Parzen (1960, 1963, 1972), L.P. Pellinen
(1966, 1970), V.S. Pugachev (1962), C.R. Rao (1971, 1972, 1973 a, b), R. Rupp
(1962, 1963, 1964 a, b, 1966 a, b, c, 1972, 1973 a, b, c, 1974, 1975), H. P.
Robertson (1940), M. Rosenblatt (1959, 1966), R. Rummel (1975 a, b), U. Schatz
(1970), I. Schoenberg (1942), W. Schwahn (1973, 1975), K.P. Schwarz (1972,
400 10 The fifth problem of probabilistic regression

1974 a, b, c, 1975 a, b, c), G. Seeber (1972), H.S. Shapiro (1970), L. Shaw et al


(1969), L. Sjoeberg (1975), G. N. Smith (1974), F. Sobel (1970), G.F. Taylor
(1935, 1938), C. Tscherning (1972, a, b, 1973, 1974 a, b, 1975 a, b, c, d, e), C.
Tscherning and R.H Rapp (1974), V. Vyskocil (1967, 1974 a, b, c), P. Whittle
(1963 a, b), N. Wiener (1958, 1964), H. Wolf (1969, 1974), E. Wong (1969), E.
Wong and J.B. Thoma (1961), A.M. Yaglom (1959, 1961).
(iv)
An interesting comparison is the various solutions of type
• ŷ2 = lˆ + Ly
ˆ
1 (best inhomogeneous linear prediction)
• ˆ
ŷ2 = Ly (best homogeneous linear prediction)
1

• ˆ
ŷ2 = Ly (best homogeneous linear unbiased prediction)
1

dispersion identities
D3 d D2 d D4 .
(v)
In spite of the effect that “trend components” and “KW prediction” may serve
well the needs of an analyst, generalizations are obvious. For instance, in Krige’s
prediction concept it is postulated that only
|| y p  y p ||2 = : E{( y p  y p  ( E{ y p }  E{ y p })) 2 }
1 2 1 2 1 2

is a weakly relative translational invariant stochastic process. A.N. Kolmogorov


has called the weakly relation translational invariant random function
“structure function”
Alternatively, higher order variance-covariance functions have been proposed:
|| y1 , y2 , y3 ||2 = : E{( y1  2 y2 + y3 )  ( E{ y1 }  2 E{ y2 } + E{ y3 }) 2 }
|| y1 , y2 , y3 , y4 ||2 = : E{( y1  3 y2 + 3 y3  y4 )  ( E{ y1}  3E{ y2 } + 3E{ y3 }  E{ y 4 }) 2 }
etc..
(vi)
Another alternative has been the construction of higher order absolute variance-
covariance functions of type
|| ( yQ  E{ yQ }) ( yQ  E{ yQ }) ( yQ  E{ yQ }) ||
1 1 2 2 3 3

|| ( yQ  E{ yQ }) (...) ( yQ  E{ yQ }) ||
1 1 n n

like in E. Grafarend (1984) derived from the characteristic functional, namely a


series expression of higher order variance-covariance functions.
11 The sixth problem of probabilistic regression
– the random effect model – “errors-in-variables”
“In difference to classical regression error-in-variables models here measure-
ments occurs in the regressors. The naive use of regression estimators leads to
severe bias in this situation. There are consistent estimators like the total least
squares estimator (TLS) and the moment estimator (MME).
J. Polzehl and S. Zwanzig
Department of Mathematics
Uppsala University, 2003 A.D.”
Read only Definition 11.1 and Lemma 11.2
Please, pay attention to the guideline of Chapter 11, namely to Figure
11.1, 11.2 and 11.3, and the Chapter 11.3: References

Definition 11.1 Lemma 11.2


“random effect model: ”error-in-variables model,
errors-in variables” normal equations”

Figure 11.1: Magic triangle


402 11 The sixth problem of probabilistic regression

By means of Figure 11.1 we review the mixed model (fixed effects plus random
effects), total least squares (fixed effects plus “errors-in-variables”) and a spe-
cial type of the mixed model which superimposes random effects and “errors-in-
variables”. Here we will concentrate on the model
“errors-in-variables”.
In the context of the general probabilistic regression problem
E{y} = Aȟ + &E{]} + E{;}Ȗ ,
we specialize here to the model
“errors in variables”,
namely
E{y} = E{X}Ȗ
in which y as well as X (vector y and matrix X) are unknown. A simple example
is
the straight line fit,
abbreviated by
“y = ax + b1”.
(x, y) is assumed to be measured, in detail
E{y} = aE{x} + b1 =
ªa º
= [ E{x}, 1] « »
¬b ¼
E{(x  E{x}) } z 0, E{(y  E{y})2 } z 0
2

but Cov{x, y} = 0.
Note
Ȗ1 := a, Ȗ 2 := b
E{y} = y  ey , E{x} = x  ex
Cov{ex , ey } = 0

and

ªȖ º
y  e y = [x  e x , 1] « 1 »
¬Ȗ2 ¼
Ÿ
y = xȖ1 + 1Ȗ 2  e x Ȗ1 + e y .
11 The sixth problem of probabilistic regression 403
constrained by the Lagrangean
L ( Ȗ1 , Ȗ 2 , e x , e y , Ȝ ) =:
= ecx Px e x + ecy Py e y + 2Ȝ c( y  xȖ1  1Ȗ 2 + e x Ȗ1  e y ) =
= min .
Ȗ1 , Ȗ 2 , ex , e y , Ȝ

The first derivatives constitute the necessary conditions, namely


1 wL
= x cȜ = 0,
2 wȖ1
1 wL
= 1cȜ = 0,
2 wȖ 2
1 wL
= Px e x + Ȗ1Ȝ = 0,
2 we x
1 wL
= Py e y  2 = 0,
2 we y
1 wL
= y  xȖ1  1Ȗ 2 + e x Ȗ1  e y = 0,
2 wȜ
while the second derivatives “ z 0 ” refer the sufficiency conditions.
Figure 11.2 is a geometric interpretation of the nonlinear model of type “errors-
in-variables”, namely the straight line fit of total least squares.

y •

( E{x}, E{ y}) •
• •


ey

ex P ( x, y )

x
Figure11.2. The straight line fit of total least squares
( E{y} = a E{x} + 1b, E{x} = x  e x , E{y} = y  e y )
404 11 The sixth problem of probabilistic regression

An alternative model for total least squares is the rewrite


E{y} = y  e y = Ȗ1 E{x} + 1Ȗ 2  y  1Ȗ 2 = Ȗ1 E{x} + e y
E{x} = x  e x  x = E{x} + e x ,
for instance
ª y  1Ȗ 2 º ª Ȗ1, n º ªe y º
« x » = « I » E{x} + «e » ,
¬ ¼ ¬ n ¼ ¬ x¼
we get for Ȉ - BLUUE of E{x}
n ª y  1Ȗ 2 º
E {x} = ( A cȈ 1A ) 1 A cȈ 1 « »
¬ x ¼
subject to
1
ªȖ I º ªȈ 0 º ªP 0º
A := « 1 n » , Ȉ1 = « yy » =« y .
¬ In ¼ ¬ 0 Ȉxx ¼ ¬0 Px »¼

We can further solve the optimality problem using the Frobenius Ȉ - semi-
norms:
ª y  1Ȗ 2  Ȗ1 E n {x}º 1 ª y  1Ȗ 2  Ȗ1 E n{x}º
« »Ȉ « »
«¬ xE n {x} »¼ «¬ xE n {x} »¼
ªeˆ º
 ª¬eˆ cy , eˆ cx º¼ Ȉ 1 « y » = eˆ cx Px eˆ x + eˆ cy Py eˆ y = min .
¬eˆ x ¼ 1 2 Ȗ ,Ȗ

11-1 Solving the nonlinear system of the model “errors-in-


variables”
First, we define the random effect model of type “errors-in-variables” subject to
the minimum condition i ycWy i y + tr I 'X WX I X = min . Second, we form the deri-
vations, the partial derivations i ycWy i y + tr I 'X WX I X + 2Ȝ c{ y  XJ + I XJ  i y } ,
the neccessary conditions for obtaining the minimum.
Definition 11.1 (the random effect model: errors-in-
variables)
The nonlinear model of type “errors-in-variables” is solved by
“total least squares” based on the risk function
(11.1) y = E{y} + e y ~ y = y0 + iy (11.2)

(11.3) X = E{X} + E X ~ X = X 0 + IX (11.4)


subject to
11-1 Solving the nonlinear system of the model “errors-in-variables” 405

(11.5) E{y}  \ n y0  \n (11.6)


~
(11.7) E{X}  \ nxm X0  \ nxm (11.8)
rk E{X} = m ~ rk X0 = m (11.9)

and n t m (11.10)

L1 := ¦ wii i2 ii1 ii2 and L2 := ¦ wk1k2 ii1k1 ii1k2


i1 ,i2 i1 , k1 , k2

L1 := icy Wi y and L2 := tr I 'X WX I X


L1 + L2 = icy Wy i y + tr I 'X WX I X = min
y0 ,X0

L =: || i y ||2 W + || I X ||2 W
y X
(11.11)

subject to
y  X Ȗ + I X Ȗ  i y = 0. (11.12)

The result of the minimization process is given by Lemma 11.2:


Lemma 11.2 (error-in-variables model, normal equations):
The risk function of the model “errors-in-variables” is minimal,
if and only if
1 wL
=  X cȜ A + I'X Ȝ A = 0 (11.13)
2 wȖ
1 wL
= Wy i y  Ȝ A = 0 (11.14)
2 wi y

1 wL
= WX I X + ȖȜ cA (11.15)
2 wI X

1 wL
= y  X y + IX Ȗ  i y = 0 (11.16)
2 wȜ
and

w2L
det ( ) t 0. (11.17)
wJ i wJ j

:Proof:
First, we begin with the modified risk function
406 11 The sixth problem of probabilistic regression

|| i y ||2 W + || I X ||2 +2 Ȝ c( y  XȖ + I X Ȗ  i y ) = min ,


y

where the minimum condition is extended over


y, X, i y , I X , Ȝ ,

when Ȝ denotes the Lagrange parameter.


icy Wy i y + tr I'X WX I X + 2Ȝ '( y  XȖ + I X Ȗ  i y ) = min (11.18)

if and only if
1 wL
=  X cȜ A + I'X Ȝ A = 0 (11.19)
2 wȖ
1 wL
= Wy i y  Ȝ A (11.20)
2 wi y

1 wL
= WX I X = 0 (11.21)
2 w (tr I'X WX I X )

1 wL
= y  XȖ + I X Ȗ  i y = 0 (11.22)
2 wȜ A
and

w2L
(11.23) positive semidefinite.
wȖ wȖ '
The first derivatives guarantee the necessity of the solution, while the second
derivatives being positive semidefinite assure the sufficient condition.

Second, we have the nonlinear equations, namely


(11.24) (  X c + I'X ) Ȝ A = 0 (bilinear)
(11.25) Wy i y  Ȝ A = 0 (linear)
(11.26) WX I X = 0 (bilinear)
(11.27) y  XȖ + I X Ȗ  i y = 0 , (bilinear)

which is a problem outside our orbit-of-interest. An example is given in the next


chapter. Consult the literature list at the end of this chapter.

11-2 Example: The straight line fit


Our example is based upon the “straight line fit”
" y = ax + b1 ",
11-2 Example: The straight line fit 407
where (x, y) has been measured, in detail
E{y} = a E{x} + b 1 =

ªa º
= [ E{x}, 1] « »
¬ b¼
or
ªJ º
J 1 := a, J 2 := b, xJ 1 + 1J 2 = [ x, 1] « 1 »
¬J 2 ¼
and
y  xJ 1  1J 2 + ecx J 1  ecy = 0.

( J 1 , J 2 ) are the two unknowns in the parameter space. It has to be noted that the
term exJ 1 includes two coupled unknowns, namely e x and J 1 .
Second, we formulate the modified method of least squares.
L (J 1 , J 2 , e x , e y , Ȝ ) =
= icWi + 2( y c  x cJ 1  1J 2 + i 'x J 1  i ' y )Ȝ =
= icWi + 2Ȝ c( y  xJ 1  1J 2 + i 'x J 1  i y )
or
i cy Wy i y + i cx Wx i x +
+2(y c  xcJ 1  1cJ 2 + i cxJ 1  i cy )Ȝ.

Third, we present the necessary and sufficient conditions for obtaining the mini-
mum of the modified method of least squares.
1 wL
(11.28) =  x cȜ A + icx Ȝ A = 0
2 wJ 1
1 wL
(11.29) = 1cȜ A = 0
2 wJ 2
1 wL
(11.30) = Wy i y  Ȝ A = 0
2 wi y
1 wL
(11.31) = Wx i x + Ȝ lJ 1 = 0
2 wi x
1 wL
(11.32) = y  xJ 1  1J 2 + i xJ  i y = 0
2 wȜ
and

ª w 2 L / w 2J 1 w 2 L / wJ 1wJ 2 º
det « 2 » t 0. (11.33)
¬w L / wJ 1wJ 2 w 2 L / w 2J 2 ¼
408 11 The sixth problem of probabilistic regression

Indeed, these conditions are necessary and sufficient for obtaining the minimum
of the modified method of least squares.
By Gauss elimination we receive the results
(11.34) (-x c + icx ) Ȝ A = 0

(11.35) Ȝ1 + ... + Ȝ n = 0

(11.36) Wy i y = Ȝ A

(11.37) Wx i x =  Ȝ A Ȗ1

(11.38) Wy y = Wy xJ 1 + Wy 1J 2  Wy i xJ 1 + Wy i y
or
Wy y = Wy xJ 1  Wy 1J 2  ( I x  J 12 ) Ȝ A = 0
(11.39)
if Wy = Wx = W
and
(11.40) xcWy  xcWxJ 1  xcW1J 2  xc(I x  J 12 )Ȝ A = 0

(11.41) y cWy  y cWxJ 1  y cW1J 2  y c( I x  J 12 ) Ȝ A = 0

(11.42) + x cȜ l = + i 'x Ȝ A

(11.43) Ȝ , +... + Ȝ n = 0
œ

ª x cº ª y cº ª x cº ª x cº
« y c» Wy  « x c» WxJ 1  « y c» W1J 2  « y c» ( I x  J 1 ) Ȝ A = 0
2
(11.44)
¬ ¼ ¬ ¼ ¬ ¼ ¬ ¼
subject to
(11.45) Ȝ1 + ... + Ȝ n = 0,

(11.46) x1c Ȝ l = i cx Ȝ l .
Let us iterate the solution.

ª0 0 0 0 (xcn  i cx ) º ªJ1 º ª 0 º
«0 0 0 0 1cn »» «J » « 0 »
« « 2» « »
«0 0 Wx 0 J 1I n » « ix » = « 0 » .
« » « » « »
«0 0 0 Wy I n » « iy » « 0 »
«¬ x n 1n J 1I n In 0 »¼ «¬ Ȝ A »¼ «¬ y n »¼

We meet again the problem that the nonlinear terms J 1 and i x appear. Our itera-
tion is based on the initial data
11-2 Example: The straight line fit 409

(i ) Wx = Wy = W = I n , (ii) (i x ) 0 = 0, (iii) (J 1 ) n = J 1 (y  (i y ) 0 = x(J 1 ) 0 + 1n (J 2 ) 0 )


in general

ª0 0 0 0 (xc  i cx ) º ª J 1(i +1) º ª 0 º


«0 0 0 0 1 »» «J » « »
« « 2(i +1) » « 0 »
«0 0 Wx 0 J 1(i ) I n » « i x (i +1) » = « 0 »
« » « » « »
«0 0 0 Wy I n » « i y (i +1) » « 0 »
«xn 1n J 1(i ) I n In 0 »¼ « Ȝ l (i +1) » « y »
¬ ¬ ¼ ¬ n¼
x1 = J 1 , x2 = J 1 , x3 = i x , x4 = i y , x5 = Ȝ A .

The five unknowns have led us to the example within Figure 11.3.

(1, 0.5) (7, 5.5) (10, 5.05) (11, 9)


Figure 11.3: General linear regression
Our solutions are collected as follows:
J 1 : 0.752, 6 0.866,3 0.781, 6
J 2 : 0.152, 0 -1.201,6 -0.244, 6
case : case : our general
V y (i ) = 0 V X (i ) = 0 results after
V X (i ) z 0 V y (i ) z 0 iteration.
11-3 References
Please, contact the following references.
Abatzoglou, J.T. and Mendel, J.M. (1987),,Abatzoglou, J.T. and Mendel, J.M.
and Harada, G.A. (1991), Bajen, M.T., Puchal, T., Gonzales, A., Gringo, J.M.,
Castelao, A ., Mora, J. and Comin, M. (1997), Berry, S.M., Carroll, R.J. and
Ruppert, D. (2002), Björck, A. (1996), Björck, A. (1997), Björck, A., Elfving, T.
and Strakos, Z. (1998), Björck, A., Heggernes, P. and Matstoms, P. (2000), Bo-
janczyk, A.W., Bront, R.P., Van Dooren, P., de Hoog, F.R. (1987), Bunch, J.R.,
Nielsen, C.P. and Sorensen, D.C. (1978), Bunke, H. and Bunke, O. (1989), Cap-
derou, A., Douguet, D., Simislowski, T., Aurengo, A. and Zelter, M. (1997), Car-
roll, R.J., Ruppert, D. and Stefanski, L.A. (1996), Carroll, R.J., Küschenhoff, H.,
Lombard, F. and Stefanski, L.A. (1996), Carroll, R.J. and Stefanski, L.A. (1997),
Chandrasekuran, S. and Sayed, A.H. (1996), Chun, J. Kailath, T. and Lev-Ari, H.
(1987), Cook, J.R. and Stefanski, L.A. (1994), Dembo, R.S., Eisenstat, S.C. and
Steihaug, T. (1982), De Moor, B. (1994), Fuller, W.A. (1987), Golub, G.H. and
Van Loan, C.F. (1980), Hansen, P.C. (1998), Higham, N.J. (1996), Holcomb, J.
P. (1996), Humak, K.M.S. (1983), Kamm, J. and Nagy, J.G. (1998), Kailath, T.
Kung, S. and More, M. (1979), Kailath, T. and Chun, J. (1994), Kailath, T. and
Sayed, A.H.(1995), Kleming, J.S. and Goddard, B.A. (1974), Kung, S.Y., Arun,
K.S. and Braskar Rao, D.V. (1983), Lemmerling, P., Van Huffel, S. and de Moor,
B. (1997), Lemmerling, P., Dologlou, I. and Van Huffel, S. (1998), Lin, X. and
Carroll, R.J. (1999), Lin, X. and Carroll, R.J. (2000), Mackens, W. and Voss, H.
(1997), Mastronardi, N, Van Dooren, P. and Van Huffel, S., Mastronardi, N.,
Lemmerling, P. and Van Huffel, S. (2000), Nagy, J.G. (1993), Park, H. and El-
den, L. (1997), Pedroso, J.J. (1996), Polzehl, J. and Zwanzig, S. (2003), Rosen,
J.B., Park, H. and Glick, J. (1996), Stefanski, L.A. and Cook, J.R. (1995), Stew-
art, M. and Van Dooren, P. (1997), Van Huffel, S. (1991), Van Huffel, S. and
Vandewalle, J. (1991), Van Huffel, S., Decanniere, C., Chen, H. and Van Hecke,
P.V. (1994), Van Huffel, S., Park, H. and Rosen, J.B. (1996), Van Huffel, S.,
Vandewalle, J., de Rao, M.C. and Willems, J.L.,Wang, N., Lin, X., Gutierrez,
R.G. and Carroll, R.J. (1998), and Yang, T. (1996).
12 The sixth problem of generalized algebraic regression
– the system of conditional equations with unknowns –
(Gauss-Helmert model)

C.F. Gauss and F.R. Helmert introduced the generalized algebraic regres-
sion problem which can be identified as a system of conditional equations
with unknowns.
:Fast track reading:
Read only Lemma 12.2, Lemma 12.5,
Lemma 12.8

Lemma 12.2
Normal equations: Ax + Bi = By

Definition 12.1
W - LESS: Ax + Bi = By

Lemma 12.3
Condition A  B

“The guideline of chapter twelve:


first definition and lemmas”

Lemma 12.5
R, W - MINOLEES: Ax + Bi = By

Definition 12.4
R, W - MINOLESS: Ax + Bi = By

Lemma 12.6
relation between A und B

“The guideline of chapter twelve:


second definition and lemmas”
412 12 The sixth problem of generalized algebraic regression

Lemma 12.8
Definition 12.7
R, W – HAPS: normal equations:
R, W – HAPS: Ax + Bi = By Ax + Bi = By

“The guideline of chapter twelve:


third definition and lemmas”
The inconsistent linear system
Ax + Bi = By
called generalized algebraic regression with unknowns or homogeneous
Gauß - Helmert model
will be characterized by certain solutions which we present in Definition 12.1,
Definition 12.4 and Definition 12.7 solving special optimization problems.
Because of rk B = q there holds automatically
R ( A )  R (B).
Lemma 12.2, Lemma 12.5 and Lemma 12.8 contain the normal equations as
special optimizational problems. Lemma 12.3 and Lemma 12.6 refer to special
solutions as linear forms of the observation vector, in particular which are char-
acterized by products of certain generalized inverses of the coefficient matrices
A and B of conditional equations. In addition, we compare R, W - MINOLESS
and R, W - HAPS by a special lemma. As examples we treat a
height network
which is characterized by absolute and relation height difference measurements
called “leveling” of type I - LESS, I, I - MINOLESS, I, I - HAPS and R,W-
MINOLESS.

Lemma 12.10
W - LESS: Ax + Bi = By  c

Definition 12.9
W - LESS: Ax + Bi = By  c

Lemma 12.11
Condition A  B

“The guideline of Chapter twelve:


fourth definition and lemmas”
12 The sixth problem of generalized algebraic regression 413

Lemma 12.13
R, W - MINOLESS:
Ax + Bi = By  c

Definition 12.12
R, W - MINOLESS:
Ax + Bi = By  c

Lemma 12.14
relation between A und B

“The guideline of chapter twelve:


fifth definition and lemmas”

Definition 12.15 Lemma 12.16


R, W – HAPS: Ax + Bi = By  c R, W – HAPS: Ax + Bi = By  c

“The guideline of chapter twelve:


sixth definition and lemmas”

The inconsistent linear system


Ax + Bi = By
- note the constant shift – called generalized algebraic regression with unknowns
or inhomogeneous
Gauß-Helmert model
will be characterized by certain solutions which we present in Definition 12.9,
Definition 12.12 and Definition 12.15, solving special optimization problems.
Because of the rank identity rk B = q there holds automatically
R ( A)  R (B).
Lemma 12.10, Lemma 12.13 and Lemma 12.16 contain the normal equations as
special optimizational problems. Lemma 12.11 and Lemma 12.14 refer to special
solutions as linear forms of the observation vector, in particular which can be
characterized by products of certain generalized inverses of the coefficient ma-
trices A and B of the conditional equations. In addition, we compare R, W-
MINOLESS and R, W - HAPS by a special lemma.
414 12 The sixth problem of generalized algebraic regression

At this point we have to mention that we were not dealing with a


consistent system of homogeneous or
inhomogeneous condition equations
with unknowns of type
Ax = By , By  R ( A ) or Ax = By  c, By  c + R ( A ) .

For further details we refer to E. Grafarend and B. Schaffrin


(1993, pages 28-34 and 54-57).
We conclude with Chapter 4 (conditional equations with unknowns, namely
“bias estimation” within an equivalent stochastic model) and Chapter 2 (Exam-
ples for the generalized algebraic regression problem: W - LESS, R,W - MI-
NOLESS and R, W - HAPS).
12-1 Solving the system of homogeneous condition equations with
unknowns
First, we solve the problem of homogeneous condition equations by the method
of minimizing the W - seminorm of Least Squares. We review by Definition
12.1, Lemma 12.2 and Lemma 12.3 the characteristic normal equations and linear
form which build up the solution of type W - LESS. Instead, secondly by Defini-
tion 12.4 and Lemma 12.5 and Lemma 12.6 R we present, W - MINOLESS as
MInimum NOrm LEast Squares Solution (R - SemiNorm, W - SemiNorm of type
Least Squares). Third, we alternatively concentrate by Definition 12.7 and
Lemma 12.8 and Lemma 12.9 on R, W - HAPS (Hybrid APproximate Solution
with respect to the combined R - and W - Seminorm). Fourth, we compare R, W
- MINOLESS and R, W - HAPS x h  xlm .
12-11 W - LESS
W - LESS is built on Definition 12.1, Lemma 12.2 and Lemma 12.3.
Definition 12.1 (W - LESS, homogeneous conditions with
unknowns):
An m × 1 vector xl is called W - LESS (LEast Squares Solu-
tion with respect to the W -seminorm ) of the inconsistent sys-
tem of linear equations
Ax + Bi = By (12.1)

with Bi A := By  Ax A , if compared to alternative vectors x  R m


with Bi := Bi  Ax the inequality
|| i A ||2 W := i cA Wi A d i cWi =:|| i ||2W (12.2)

holds, if in consequence i A has the smallest W - seminorm.


The solutions of type W - LESS are computed as follows.
12-1 Solving the system of homogeneous condition equations with unknowns 415

Lemma 12.2 (W - LESS, homogeneous conditions with


unknowns: normal equations):
An m × 1 vector x A is W-LESS of (12.1) if and only if it solves
the system of normal equations
ª W Bc 0 º ª i A º ª 0 º
« B 0 ǹ » « Ȝ A » = «B y » (12.3)
«¬ 0 A c 0 »¼ « x » « 0 »
¬ A¼ ¬ ¼
with the q × 1 vector OA of “Lagrange multipliers”. x A exists
in the case of
R (B)  R ( W) (12.4)
and is solving the system of normal equations
A c(BW  Bc) 1 AxA = A c(BW  Bc) 1 By . (12.5)
which is independent of the choice of the g - inverse W  and
unique. x A is unique if and only if the matrix
A c( BW  Bc) 1 A (12.6)
is regular, and equivalently, if
rk A = m (12.7)
holds.
:Proof:
W - LESS is constructed by means of the “Lagrange function”
L (i, x, Ȝ ):= i cWi + 2Ȝ c( Ax + Bi  By ) = min .
i , x, Ȝ

The necessary conditions for obtaining the minimum are given by the first de-
rivatives
1 wL
(i A , x A , Ȝ A ) = Wi A + BcȜ A = 0
2 wi
1 wL
(i A , x A , Ȝ A ) = A cȜ A = 0
2 wx
1 wL
(i A , x A , Ȝ A ) = Ax A + Bi A  By = 0.
2 wȜ
Details for obtaining the derivatives of vectors are given in Appendix B. The
second derivatives
1 w2L
(i A , x A , Ȝ A ) = W t 0
2 wiwic
are the sufficient conditions for the minimum due to the matrix W being positive
semidefinite.
Due to the condition
416 12 The sixth problem of generalized algebraic regression

R ( Bc )  R ( W )
we have
WW  Bc = Bc.
As shown in Appendix A, BW 1B is invariant with respect to the choice of the
generalized inverse W  . In fact, the matrix BW - Bc is uniquely invertible.
Elimination of the vector i A leads us to the system of reduced normal equations.
ª BW  Bc A º ª Ȝ A º ª By º
« »« » = « »
¬ Ac 0 ¼ ¬xA ¼ ¬ 0 ¼
and finally eliminating Ȝ A to
A c( BW  Bc) 1 Ax A = A c( BW  Bc) 1 By, (12.8)
because of BW W = B there follows the existence of x A . Uniqueness is assured


due to the regularity of the matrix


A c(BW  Bc) 1 A,
which is equivalent to rk A = m .
The linear forms x A = Ly , which lead to W - LESS of arbitrary observation
vectors y  R n because of (12.4), can be characterized by Lemma 12.3.
Lemma 12.3 (W - LESS, relation between A and B):
Under the condition
R (Bc)  R ( W)
is x A = L y W - LESS of (12.1) for all y  R n if and only if the matrix L
obeys the condition
L = AB (12.9)
subject to
( BW  Bc) 1 AA  = [( BW  Bc) 1 AA 1 ]c. (12.10)
In this case, the vector Ax A = AA  By is always unique.

12-12 R, W – MINOLESS
R, W - MINOLESS is built on Definition 12.4, Lemma 12.5 and Lemma 12.6.
Definition 12.4 (R, W - MINOLESS, homogeneous conditions
with unknowns):
An m × 1 vector xAm is called R, W - MINOLESS (Minimum
NOrm with respect to the R – Seminorm, LEast Squares Solution
with respect to the W – Seminorm) of the inconsistent system of
linear equations if (12.3) is consistent and x Am is R- MINOS of
(12.3).
12-1 Solving the system of homogeneous condition equations with unknowns 417

In case of R (Bc)  R ( W) we can compute the solutions of type R, W –


MINOLESS as follows.
Lemma 12.5 (R, W – MINOLESS, homogeneous conditions
with unknowns: normal equations):
Under the assumption
R (Bc)  R ( W)
is an m × 1 vector xAm R- MINOLESS of (12.1) if and only if it
solves the normal equation

ª R A c(BW  Bc)-1 A º ª x Am º
« A c(BW  Bc)-1 A 0 » «Ȝ » =
¬ ¼ ¬ Am ¼

ª 0 º
=«  -1 » (12.11)
¬ A c(BW Bc) By ¼
with the m × 1 vector Ȝ Am of “Lagrange – multipliers”. x Am
exists always and is uniquely determined, if
rk[R, Ac] = m (12.12)
holds, or equivalently, if the matrix
R + A c(BW  Bc)-1 A (12.13)
is regular.
The proof of Lemma 12.5 is based on applying Lemma 1.2 on the normal equa-
tions (12.5). The rest is based on the identity
ªR  º ªR º
R + A c(BW  Bc)-1 A = [ R , A c] «
0
 -1 » « » .
¬ 0 c
(BW B ) ¼ ¬ A ¼

Obviously, the condition (12.12) is fulfilled if the matrix R is positive definite, or


if R describes an R  norm. The linear forms x Am = Ly , which lead to the R, W
– MINOLESS solutions, are characterized as follows.
Lemma 12.6 (R, W – MINOLESS, relation between A und B):
Under the assumption R (B') = R ( W ) is x Am = Ly of type R, W
– MINOLESS of (12.1) for all y  R n if and only if the matrix
A  B follows the condition
L = AB (12.14)
subject to
(BW  Bc)-1 AA  = [(BW  Bc)-1 AA ']' (12.15)
RA  AA  = RA  (12.16)
418 12 The sixth problem of generalized algebraic regression

RA  A = ( RA  A ) ' (12.17)
is fulfilled. In this case
Rx Am = RLy (12.18)
is always unique. In the special case that R is positive definite,
the matrix L is unique, fulfilling (12.14) - (12.17).
:Proof:
Earlier we have shown that the representation
(BW  Bc)-1 AA  = [(BW  Bc)-1 AA  ]' (12.19)

leads us to L = A  B uniquely. The general solution of the system


A c(BW  Bc)-1 Ax A = A c(BW  Bc)-1 By (12.20)
is given by
x A = x Am + (I  [ A c(BW  B c)-1 A ] A c(BW  B c)-1 A ]z (12.21)
or
x A = x Am + (I  LBcA )z (12.22)
for an arbitrary m × 1 vector z , such that the related R - seminorm follows the
inequality
|| x Am ||2R =|| L y ||R2 d|| L y ||R2 +2y cLcR (I  LB  A )z + || (I  LB  A )z ||R2 . (12.23)

For arbitrary y  R n , we have the result that


LcR (I  LB  A) = 0 (12.24)
must be zero! Or
(12.25) RLB R AL = RL and RLB R A = (RLB R A )c. (12.26)

To prove these identities we must multiply from the right by L, namely


LcR = LcRLB R A : LcRL = LcRLB R AL. (12.27)
Due to the fact that the left hand side is a symmetric matrix, the right-hand side
must have the same property, in detail
RLB R A = (RLB R A)c q.e.d .
Add L = A  B and taking advantage of B R BA = A , we find

RA  AA  = RLB R ALW  Bc(BW  Bc) 1 =


= RLW  Bc(BW  Bc) 1 = RA  (12.28)
and
RA A = (RA  A)c .

(12.29)
12-1 Solving the system of homogeneous condition equations with unknowns 419

Uniqueness of xAm follows automatically. In case that the matrix R is positive


definite and, of course, invertible, it is granted that the matrix L = A  B is
unique!
12-13 R, W – HAPS
R, W – HAPS is alternatively built on Definition 12.7 and Lemma 12.8.
Definition 12.7 (R, W - HAPS, homogeneous conditions with un-
knowns):
An m × 1 vector x h with Bi h = By  Ax h is called R, W - HAPS
(Hybrid APproximate Solution with respect to the combined R-
and W- Seminorm if compared to all other vectors x  R n of
type Bi = By  Ax the inequality
|| i h ||2W + || x h ||2R := i ch Wi h + xch Rx h d

d i cWi + xcRx =:|| i ||2W + || x ||2R (12.30)


holds, in other words if the hybrid risk function || i ||2W + || x ||2R is
minimal.
The solutions of type R, W – HAPS can be computed by
Lemma 12.8 (R, W – HAPS homogeneous conditions with
unknowns: normal equations):
An m × 1 vector x h is R, W – HAPS of the Gauß – Helmert
model of conditional equations with unknowns if and only if the
normal equations
ª W B' 0 º ª i h º ª 0 º
« B 0 A » « Ȝ h » = « By » (12.31)
«¬ 0 A' R »¼ « x » «¬ 0 »¼
¬ h¼
with the q × 1 vector Ȝ h of “Lagrange – multpliers” are fulfilled.
x A certainly exists in case of
R (Bc)  R ( W) (12.32)
and is solution of the system of normal equations
(R +A c(BW  Bc)-1 A)x h = A c(BW  Bc)-1 By , (12.33)
which is independent of the choice of the generalized inverse W 
uniquely defined. x h is uniquely defined if and only if the matrix
(R +A c(BW  Bc)-1 A) is regular, equivalently if
rk[R, Ac] = m (12.34)
holds.
420 12 The sixth problem of generalized algebraic regression

:Proof:
With the “Lagrange function” Ȝ R, W – HAPS is defined by
L(i, x, Ȝ ) := i cWi + xcRx + 2Ȝ c(Ax + Bi - By ) = min .
i,x,Ȝ

The first derivatives


1 wL
(i h , x h , Ȝ h ) = Wi h + BcȜ h = 0 (12.35)
2 wi
1 wL
(i h , x h , Ȝ h ) = A cȜ h + Rx h = 0 (12.36)
2 wx
1 wL
( i h , x h , Ȝ h ) = Ax h + Bi h  By = 0 (12.37)
2 wȜ
establish the necessary conditions. The second derivatives
1 w 2L
(i h , x h , Ȝ h ) = W t 0 (12.38)
2 wiwi c
1 w 2L
(i h , x h , Ȝ h ) = R t 0 (12.39)
2 wxwxc
due to the positive definiteness of the matrices W and R a sufficient criterion for
the minimum.
If in addition
R (Bc)  R ( W)
holds, we are able to reduce i h , namely to device the reduced system of normal
equations
ª(BW  Bc) A º ª Ȝ h º ªBy º
« »« » = « » (12.40)
¬ A' R ¼ ¬ xh ¼ ¬ 0 ¼

and by reducing Ȝ h , in addition,

(R +A c(BW  Bc)-1 A)x h = A c(BW  B c)-1 By. (12.41)


Because of the identity
ªR- 0 º ªR º
R +A c(BW  B c)-1 A = [R , A '] «  -1 » « », (12.42)
¬0 (BW Bc) ¼ ¬ A ¼

we can assure the existence of our solution x h and, in addition, the equivalence
of the regularity of the matrix (R +A c(BW  Bc)-1 A) with the condition
rk[ R, A c] = m , the basis of the uniqueness of x h .
12-2 Examples 421

12-14 R, W - MINOLESS against R, W - HAPS


Obviously, R, W – HAPS with respect to the model (12.32) is unique if and only
if R, W – MINOLESS is unique, because the representations (12.34) and (12.12)
are identical. Let us replace (12.11) by the equivalent system
(R +A c(BW  Bc)-1 A)x Am + A c(BW  Bc)-1 AȜ Am = A c(BW  Bc)-1 By (12.43)
and
A c(BW Bc) Ax Am = A c(BW  Bc)-1 By
 -1
(12.44)
such that the difference
x h  x Am = (R +A c(BW  B c)-1 A)1 A c(BW  B c)-1 AȜ Am (12.45)
follows automatically.
12-2 Examples for the generalized algebraic regression problem:
homogeneous conditional equations with unknowns
As an example of inconsistent linear equations
Ax + Bi = By

we treat a height network, consisting of four points whose relative and absolute
heights are derived from height difference measurements according to the net-
work graph in Chapter 9. We shall study various optimal criteria of type
I-LESS
I, I-MINOLESS
I, I-HAPS,
and
R, W-MINOLESS: R positive semidefinite
W positive semidefinite.
We use constructive details of the theory of generalized inverses according to
Appendix A.
Throughout we take advantage of holonomic height difference measurements,
also called “gravimetric leveling”
{ hDE := hE  hD , hJD := hD  hJ , hEG := hG  hE , hGJ := hJ  hG }

within the triangles {PD , PE , PJ } and {PE , PG , PJ }. In each triangle we have the
holonomity condition, namely
{hDE + hEJ + hJD = 0, (hDE + hEJ =  hJD )}

{hJE + hEG + hGJ = 0, (hEG + hGJ =  hJE )} .


422 12 The sixth problem of generalized algebraic regression

12-21 The first case: I - LESS


In the first example we order four height difference measurements to the system
of linear equations
ªiDE º ª hDE º
« » « »
ª 1º ª1 1 0 0º «iJD » ª1 1 0 0º « hJD »
«1 » hEJ + « 0 0 1 1 » «i » = « 0 0 1 1 » « h »
¬ ¼ ¬ ¼ « EG » ¬ ¼ « EG »
«iGJ » « hGJ »
¬ ¼ ¬ ¼
as an example of homogeneous inconsistent condition equations with unknowns:
ª hDE º
« »
ª 1º ª 1 1 0 0 º « hJD »
A := « » , x := hEJ , B := , y := « »
¬1 ¼ ¬« 0 0 1 1 ¼» h
« EG »
« hGJ »
¬ ¼
n = 4 , m = 1, rkA = 1, rkB = q = 2
1
A c(BBc)-1 A = 1, A c(BBc)-1 B = [1, 1,1,1]
2
1
xA = (hEJ ) A = (  hDE  hJD + hEG + hGJ ) .
2
12-22 The second case: I, I – MINOLESS
In the second example, we solve I, I – MINOLESS for the problem of four
height difference measurements associated with the system of linear equations
ªiDE º ª hDE º
« » « »
ª 1 1º ª hE º ª1 1 0 0 º «iJD » ª1 1 0 0 º « hJD »
«¬ 1 1 »¼ « h » + « 0 0 1 1 » «i » = « 0 0 1 1 » « h »
«¬ J »¼ ¬ ¼ « EG » ¬ ¼ « EG »
«iGJ » « hGJ »
¬ ¼ ¬ ¼
as our second example of homogeneous inconsistent condition equations with
unknowns:
ª hDE º
« »
ª 1 1º ª hE º ª1 1 0 0 º « hJD »
A := « , x := « » , B := « , y := « »
¬ 1 1 »¼ «¬ hJ »¼ ¬ 0 0 1 1 »¼ h
« EG »
« hGJ »
¬ ¼
n = 4 , m = 2 , rkA = 1, rkB = q = 2 .

I, I – LESS solves the system of normal equations


12-2 Examples 423

A c(BBc)-1 Ax A = A c(BBc)-1 By
ª 1 1º
A c(BBc)-1 A = « » =: DE
¬ 1 1 ¼
ª1 º
D = « » , E = [1, 1] .
¬ 1¼
For the matrix of the normal equations
A c(BBc)-1 A = DE
we did rank factorizing:
O(D) = m × r , O(E) = r × m, rkD = rkE = r = 1

1 1 ª 1 1º
[ A c(BBc)-1 A ]' = Ec(EEc) 1 (DcD) 1 Dc = A c(BB c)-1 A = « ,
4 4 ¬ 1 1 »¼
1 ª 1 1 1 1º
A '(BB ')-1 B = .
2 «¬ 1 1 1 1 »¼
I, I – MINOLESS due to rk[R , A ' ] = 2 leads to the unique solution

ª hE º 1 ª hDE + hJD  hEG  hGJ º


x Am = « » = « »,
¬« hJ ¼» Am 4 «¬  hDE  hJD + hEG + hGJ ¼»
which leads to the centric equation
(hE )Am + (hJ )Am = 0.

12-23 The third case: I, I - HAPS


Relating to the second design we compute the solution vector x h of type I, I –
HAPS by the normal equations, namely
1 ª 5 3º ª 5 3º
I + A c(BBc) 1 A = , [I + A c(BBc) 1 A]1 = «
4 «¬3 5»¼ ¬ 3 5 »¼

ª hE º ª hDE + hJE  hEG  hGJ º


xh = « » = 4 « ».
«¬ hJ »¼ h «¬ hDE  hJE + hEG + hGJ »¼

12-24 The fourth case: R, W – MINOLESS, R positive semidefinite, W


positive semidefinite
This time we refer the characteristic vectors x, y and the matrices A, B to the
second design. The weight matrix of inconsistency parameters will be chosen to
424 12 The sixth problem of generalized algebraic regression

ª1 1 0 0º
1 «1 1 0 0»
W= «
2 0 0 1 1»
«0 0 1 1 »¼
¬
and W  = W , such that R (B ')  R ( W) holds. The positive semidefinite ma-
trix R = Diag(0,1) has been chosen in such a way that the rank partitioned un-
known vector x = [x1c , xc2 ]', O(x1 ) = r × 1, O( x 2 ) = ( m  1) × 1, rkA =: r = 1 relat-
ing to the partial solution x 2 = xJ = 0, namely
ª 1 1º 1 ª 1 1 1 1º
A c(BW  Bc)-1 A = « » , A c(BW  Bc)-1 B = «
¬ 1 1 ¼ 2 ¬ 1 1 1 1 »¼
and
ª 1 1º ª x E º 1 ª 1 1 1 1º
«¬ 1 1 »¼ « x » = 2 «¬ 1 1 1 1 »¼ y ,
«¬ J »¼ Am
1
( x E ) Am = ( hDE + hJD  hEG  hGJ ), ( xJ ) Am = 0.
2
12-3 Solving the system of inhomogeneous condition equations with
unknowns
First, we solve the problem of inhomogeneous condition equations by the
method of minimizing the W – seminorm of Least Squares. We review by Defi-
nition 12.9 and Lemma 12.10 and Lemma 12.11 the characteristic normal equa-
tions and linear form which build up the solution of type W – LESS. Second, we
extend the method of W – LESS by R, W – MINOLESS by means of Definition
12.12 and Lemma 12.13 and Lemma 12.14. R, W – MINOLESS stands for Mini-
mum Norm East Squares Solution (R – Seminorm, W – Seminorm of type (LEast
Squares). Third, we alternatively present by Definition 12.15 and Lemma 12.16
R, W – HAPS (Hybrid AProximate Solution with respect to the combined R- and
W– Seminorm). Fourth, we again compare R, W – MINOLESS and R, W –
HAPS by means of computing the difference vector x A  x Am .
12-31 W – LESS
W – LESS of our system of inconsistent inhomogeneous condition equations
with unknowns Ax + Bi = By - c, By  c + R(A) is built on Definition 12.9,
Lemma 12.10 and Lemma 12.11.
Definition 12.9 (W - LESS , inhomogeneous conditions with un-
knowns):
An m × 1 vector x A is called W - LESS (LEast Squares Solution
with respect to the W- seminorm) of the inconsistent system of
inhomogeneous linear equations
12-3 Solving the system of inhomogeneous condition equations with unknowns 425

Ax + Bi = By - c (12.46)

with Bi A := By  c  Ax A , if compared to alternative vector


x  R m with Bi := By  c  Ax the inequality
|| i A ||2W := i 'A Wi A d i'Wi =:|| i ||2W (12.47)
holds. As a consequence i A has the smallest W – seminorm.

The solutions of the type W- LESS are computed as follows.


Lemma 12.10 (W – LESS, inhomogeneous conditions with
unknowns: normal equations):
An m × 1 vector x A is W – LESS of (12.46) if and only if it
solves the system of normal equation
ª W Bc 0 º ª i A º ª 0 º
« B 0 A » « Ȝ A » = « By  c » (12.48)
« »« » « »
¬ 0 Ac 0 ¼ ¬ xA ¼ ¬ 0 ¼
with the q × 1 vector Ȝ A of “Lagrange – multipliers”. x A exists
indeed in case of
R (B ')  R ( W) (12.49)
and is solving the system of normal equations
A c(BW  Bc)-1 Ax A = A c(BW  Bc)-1 (By  c) , (12.50)

which is independent of the choice of the g – inverse W  and


unique. x A is unique if and only if the matrix

A c(BW  Bc)-1 A (12.51)


is regular, or equivalently, if
rkA = m (12.52)
holds. In this case the solution can be represented by
x A = [ A c(BW  Bc)-1 A]1 A c(BW  Bc)-1 (By  c). (12.53)
The proof follows the same line-of-thought of (12.3) – (12.7). The linear
form x A = L(y  d) = Ly  A follows the basic definitions and can be
characterized by Lemma 12.11.
Lemma 12.11 (W – LESS, relation between A and B):
Under the condition
R (Bc)  R ( W) (12.54)
426 12 The sixth problem of generalized algebraic regression

is x A = Ly  1 W – LESS of (12.46) for y  R n if and only if the


matrix L and m × 1 vector 1 obey the conditions

(12.55) L = AB and 1 = Ac (12.56)


subject to
(BW  Bc)-1 AA  = [(BW  B c)-1 AA  ]'. (12.57)

In this case, the vector Ax A = AA  (By  c) is always unique.


The proof is obvious.
12-32 R, W – MINOLESS
R, W – MINOLESS of our system of inconsistent, inhomogeneous condition
equations with unknowns Ax + Bi = By  c , By  c + R(A) is built on Definition
12.12, Lemma 12.13 and Lemma 12.14.
Definition 12.12 (R, W - MINOLESS, inhomogeneous conditions
with unknowns):
An m × 1 vector xAm is called R, W - MINOLESS (Minimum
NOrm with respect to the R – Seminorm LEast Squares Solution
with respect to the W – Seminorm) if the inconsistent system of
linear equations of (12.46) is inconsistent and x Am R- MINOS of
(12.46).
In case of R (B')  R ( W ) we can compute the solutions of type R, W –
MINOLESS of (12.46) as follows.
Lemma 12.13 (R, W – MINOLESS, inhomogeneous condi-
tions with unknowns: normal equations):
Under the assumption
R (Bc)  R ( W) (12.58)
is an m × 1 vector xAm R, W - MINOLESS of (12.46) if and only
if it solves the normal equation

ª R A c(BW  Bc)-1 A º ª x Am º
« A c(BW  Bc)-1 A 0 » «Ȝ » =
¬ ¼ ¬ Am ¼

ª 0 º
=«  -1 » (12.59)
¬ A c(BW Bc) (By  c) ¼
with the m × 1 vector Ȝ Am of “Lagrange – multipliers”. x Am exists al-
ways and is uniquely determined, if
rk[R, A '] = m (12.60)
12-3 Solving the system of inhomogeneous condition equations with unknowns 427
holds, or equivalently, if the matrix
R + A c(BW  Bc)-1 A (12.61)
is regular. In this case the solution can be represented by
x Am = [R + A c(BW  Bc)-1 A c(BW  Bc)-1 A ] ×
×{A c(BW  Bc)-1 A[R + A c(BW  Bc)-1 A ]1 × (12.62)
×A c(BW Bc) A} A c(BW Bc) (By  c) ,
 -1   -1

which is independent of the choice of the generalized inverse.


The proof follows similar lines as in Chapter 12-5. Instead we present the linear
forms x = Ly which lead to the R, W – MINOLESS solutions and which can be
characterized as follows.
Lemma 12.14 (R, W – MINOLESS, relation between A und B):
Under the assumption R (Bc)  R ( W) is x Am = Ly of type R, W
– MINOLESS of (12.46) for all y  R n if and only if the matrix
A  B follows the condition
(12.63) L = AB and 1 = Ac (12.64)
and
(BW  Bc)-1 AA  = [(BW  Bc)-1 AA  ]c (12.65)
RA  AA  = RA  (12.66)
RA  A = (RA  A)c (12.67)
are fulfilled. In this case is
Rx Am = R ( Ly  1) (12.68)
always unique. In the special case that R is positive definite,
the matrix L is unique, following (12.59) - (12.62).
The proof is similar to Lemma 12.6 if we replace everywhere By by By  c .

12-33 R, W – HAPS
R, W – HAPS is alternatively built on Definition 12.15 and Lemma 12.16 for the
special case of inconsistent, inhomogeneous conditions equations with unknowns
Ax + Bi = By  c , By  c + R(A) .
Definition 12.15 (R, W - HAPS, inhomogeneous conditions with un-
knowns):
An m × 1 vector x h with Bi h = By  c  Ax h is called R, W -
HAPS (Hybrid APproximate Solution with respect to the com-
bined R- and W- Seminorm if compared to all other vectors
x  R n of type Bi = By  c  Ax the inequality
428 12 The sixth problem of generalized algebraic regression

|| i h ||2W + || x h ||2R =: i ch Wi h + xch Rx h d


d i cWi + xcRx =:|| i ||2W + || x ||2R (12.69)
holds, in other words if the hybrid risk function || i || + || x || is mini-
2
W
2
R

mal.
The solution of type R, W – HAPS can be computed by
Lemma 12.16 (R, W – HAPS inhomogeneous conditions
with unknowns: normal equations):
An m × 1 vector x h is R, W – HAPS of the Gauß – Helmert
model of inconsistent, inhomogeneous condition equations with
unknowns if and only if the normal equations
ª W Bc 0 º ª i h º ª 0 º
« B 0 A » « Ȝ h » = « By  c » (12.70)
« »« » « »
¬ 0 Ac R ¼ ¬ x h ¼ ¬ 0 ¼
with the q × 1 vector Ȝ h of “Lagrange – multpliers” are fulfilled. x A ex-
ists certainly in case of
R (Bc)  R ( W) (12.71)
and is solution of the system of normal equations
[R +A c(BW  Bc)-1 A ]x h = A c(BW  B c)-1 (By  c) , (12.72)
which is independent of the choice of the generalized inverse W 
uniquely defined. x h is uniquely defined if and only if the matrix
[R +A c(BW  Bc)-1 A] is regular, equivalently if
rk[R, A c] = m (12.73)
holds. In this case the solution can be represented by
x h = [R +A c(BW  Bc)-1 A]1 A c(BW  Bc)-1 (By  c). (12.74)
The proof of Lemma 12.16 follows the lines of Lemma 12.8.
12-34 R, W - MINOLESS against R, W - HAPS
Again we note the relations between R, W-MINOLESS and R, W-HAPS: R, W-
HAPS is unique because the representations (12.59) and (12.12) are identical. Let
us replace (12.59) by the equivalent system
(R +A c(BW  Bc)-1 A)x Am + A c(BW  B c)-1 AȜ Am = A c(BW  B c)-1 (By  c) (12.75)
and

A c(BW Bc) x Am -1
= A c(BW  Bc)-1 (By  c) , (12.76)
12-4 Conditional equations with unknowns 429
such that the difference
x h  x Am = [R +A'(BW - B')-1 A ]1 A'(BWB')-1 AȜ Am (12.77)

follows automatically.
12-4 Conditional equations with unknowns: from the algebraic ap-
proach to the stochastic one
Let us consider the stochastic portray of the model “condition equations with
unknowns”, namely the stochastic Gauß-Helmert model. Consider the model
equations
AE{x} = BE{y}  Ȗ or Aȟ = BȘ  Ȗ
subject to
O( A) = q × m, O(B) = q × n
Ȗ = Bį for some į  R n
rkA = m < rkB = q < n
E{x} = ȟ, E{y} = Ș
E{x} = x  e x , E{y} = y  e y
ª E{x  E{x}} = 0
«
¬ E{(x  E{x})( x  E{x}) '} = ı x 4 x
2

versus
ª E{y  E{y}} = 0
«
«¬ E{(y  E{y})( y  E{y}) '} = ı y 4 y .
2

12-41 Shift to the centre


From the identity Ȗ = Bį to the centre we gain another identity of type
AE{x} = BE{y  į} = B{y  į}  Be y = w  Be y
such that
B(y  į) =: w

w = Aȟ + Be y .

12-42 The condition of unbiased estimators

The unknown ȟˆ = K1 By + A 1 is uniformly unbiased estimable, if and only if

ȟ  E{ȟˆ} º ªȟ = K1BE{y} + l1 = K1 ( Aȟ + Ȗ ) + l1
» or «
n
E{x} = E{E{x}}¼» ¬ for all ȟ  R m .
430 12 The sixth problem of generalized algebraic regression

ȟ̂ is unbiased estimable if and only if


A1 = K1J or K1 A = I m .
In consequence, K1 = A  must be a left generalized inverse.
L

n
The unknown E{y} = y  IJ = y  (K2 By  A 2 ) is uniformly unbiased estimable if
and only if
n
B{y} = E{E{y}} = (I n  K2 B) E{y} + A 2 = E{y}  K2 ( Aȟ + Ȗ ) + l2

for all ȟ  R m and


for all E{y} = R n .
n
E{y} is unbiased estimable if and only if
A 2 = K2 Ȗ or K2 A = 0.

12-43 n
The first step: unbiased estimation of ȟ̂ and E{ȟ}
n
The key lemma of unbiased estimation of ȟ̂ and E{ȟ} will be presented first.

ȟ̂ is unbiased estimable if and only if


n
ȟˆ = L1 y + A 1 = K1By + A 1 E{y} is unbiased estimable if and only if
n
E{y} = L 2 y + A 2 = (I n  K2 B)y + A 2
or
BL 2 = AL1 = AA  B
since
Ȗ + R ( A )  R (B ) = R 9 .

12-44 The second step: unbiased estimation K1 and K2

The bias parameter for K1 and K2 are to be estimated by


K1 = [ A '(BQ y B ') 1 A]A '(BQ y B ') 1
K2 = Q 'y B '(BQ y B ') 1 (I q  AK1 )
A 1 = K1Ȗ , A 2 = +K2 Ȗ

generating BLUUE of E{x} = ȟ and E{y} = Ș , respectively.


13 The nonlinear problem of the 3d datum transformation
and the Procrustes Algorithm

A special nonlinear problem is the three-dimensional datum transformation


solved by the Procrustes Algorithm. A definition of the three-dimensional datum
transformation with the coupled unknowns of type dilatation unknown, also
called scale factor, translation and rotation unknown follows afterwards.

:Fast track reading:


Read Definition 13.1, Corollary 13.2-13.4, 6
and Lemma 13.5 and Lemma 13.7

Corollary 13.2: translation partial


W - LESS for x 2

Corollary 13.3: scale partial


W - LESS for x1

Definition 13.1: ^ 7(3) Corollary 13.4: rotation partial


3d datum transformation W - LESS for X 3

Theorem 13.5: W – LESS of


( Y '1 = Y2 X '3 x1 + 1x '2 + E)

Corollary 13.6: I – LESS of


( Y1  Y2 X '3 x1 + 1x '2 + E)

“The guideline of Chapter 13: definition, corollaries and lemma”


Let us specify the parameter space X, namely
x1 the dilatation parameter x 2 the column vector
– the scale factor – versus of translation parameter
x1  \ x 2  \ 3×1

versus

X 3 O + (3) =: {X 3 || \ 3×3 | X 3* X 3 = I 3 and |X 3 |= +1}


X 3 is an orthonormal matrix, rotation matrix of three parameters
432 13 The nonlinear problem

which is built on the scalar x1 , the vector x 2 and the matrix X 3 . In addition, by
the matrices

ª x1 x2 ... xn 1 xn º
Y1 := «« y1 y2 ... yn 1 yn »»  \ 3×n
«¬ z1 z2 ... zn 1 zn »¼

and

ª X1 X 2 ... X n 1 Xn º
Y2 := «« Y1 Y2 ... Yn 1 Yn »»  \ 3×n
«¬ Z1 Z2 ... Z n 1 Z n »¼

we define a left and right three-dimensional coordinate arrays as an n-


dimensional simplex of observed data. Our aim is
to determine the parameters of the 3 - dimensional datum trans-
formation {x1 , x 2 , X 3} out of a nonlinear transformation (confor-
mal group ^ 7 (3) ). x1  \ stands for the dilatation parameter,
also called scale factor, x 2  \ 3×1 denotes the column vector of
translation parameters, and X 3 O + (3) =: {X 3  \ 3×3 | X '3 X 3 = I 3 ,
|X 3 |= +1} the orthonormal matrix, also called rotation matrix of
three parameters.
The key problem is
how to determine the parameters for the unknowns of type
{x1 , x 2 , X 3} , namely scalar dilatation x1 , vector of translation
and matrix of rotation, for instance by weighted least squares.
Example 1 (simplex of minimal dimension, n = 4 points tetrahedron):

ª x1 x2 x3 x4 ºc ª X1 X2 X3 X 4 ºc
Y1 := «« y1 y2 y3 y4 » œ «« Y1
» Y2 Y3 Y4 »» =: Y
«¬ z1 z2 z3 z4 »¼ «¬ Z1 Z2 Z3 Z 4 »¼

ª x1 y1 z1 º ª X 1 Y1 Z1 º ª e11 e12 e13 º


«x y2 z2 »» «« X 2 Y2 Z2 »» «e e e 23 »»
« 2 = X3 x1 + 1x '2 + « 21 22 .
« x3 y3 z3 » « X 3 Y3 Z3 » «e31 e32 e33 »
« » « » « »
¬« x4 y4 z4 ¼» ¬« X 4 Y4 Z 4 ¼» ¬«e 41 e 42 e 43 ¼»
Example 2 (W – LESS)
We depart from the setup of the pseudo-observation equation given in Example 1
(simplex of minimal dimension, n = 4 points, tetrahedron). For a diagonal

weight $W=\mathrm{Diag}(w_1,\dots,w_4)\in\mathbb{R}^{4\times 4}$ we compute the Frobenius error matrix W-seminorm
$$\|E\|_W^2:=\mathrm{tr}(E'WE)=\mathrm{tr}\Big(\begin{bmatrix} e_{11} & e_{21} & e_{31} & e_{41} \\ e_{12} & e_{22} & e_{32} & e_{42} \\ e_{13} & e_{23} & e_{33} & e_{43} \end{bmatrix}\begin{bmatrix} w_1 & 0 & 0 & 0 \\ 0 & w_2 & 0 & 0 \\ 0 & 0 & w_3 & 0 \\ 0 & 0 & 0 & w_4 \end{bmatrix}\begin{bmatrix} e_{11} & e_{12} & e_{13} \\ e_{21} & e_{22} & e_{23} \\ e_{31} & e_{32} & e_{33} \\ e_{41} & e_{42} & e_{43} \end{bmatrix}\Big)=$$
$$=\mathrm{tr}\Big(\begin{bmatrix} e_{11}w_1 & e_{21}w_2 & e_{31}w_3 & e_{41}w_4 \\ e_{12}w_1 & e_{22}w_2 & e_{32}w_3 & e_{42}w_4 \\ e_{13}w_1 & e_{23}w_2 & e_{33}w_3 & e_{43}w_4 \end{bmatrix}\begin{bmatrix} e_{11} & e_{12} & e_{13} \\ e_{21} & e_{22} & e_{23} \\ e_{31} & e_{32} & e_{33} \\ e_{41} & e_{42} & e_{43} \end{bmatrix}\Big)=$$
$$= w_1e_{11}^2 + w_2e_{21}^2 + w_3e_{31}^2 + w_4e_{41}^2 + w_1e_{12}^2 + w_2e_{22}^2 + w_3e_{32}^2 + w_4e_{42}^2 + w_1e_{13}^2 + w_2e_{23}^2 + w_3e_{33}^2 + w_4e_{43}^2.$$

Obviously, the coordinate errors $(e_{11}, e_{12}, e_{13})$ have the same weight $w_1$, $(e_{21}, e_{22}, e_{23})$ have the same weight $w_2$, $(e_{31}, e_{32}, e_{33})$ have the same weight $w_3$, and finally $(e_{41}, e_{42}, e_{43})$ have the same weight $w_4$. We may also say that the error weight is pointwise isotropic,
$$\text{weight } e_{11}=\text{weight } e_{12}=\text{weight } e_{13}=w_1$$
etc. However, the error weight is not homogeneous since
$$w_1=\text{weight } e_{11}\neq\text{weight } e_{21}=w_2.$$
Of course, an ideal homogeneous and isotropic weight distribution is guaranteed by the criterion $w_1=w_2=w_3=w_4$.
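The weighted seminorm of Example 2 can be checked numerically. The following Python/NumPy sketch (the error values and weights below are arbitrary illustrations, not data from this chapter) confirms that $\mathrm{tr}(E'WE)$ equals the pointwise weighted sum of squares derived above.

```python
# Check ||E||_W^2 = tr(E'WE) = sum_i w_i * (e_i1^2 + e_i2^2 + e_i3^2)
import numpy as np

E = np.array([[ 0.01, -0.02,  0.03],
              [ 0.00,  0.01, -0.01],
              [-0.02,  0.02,  0.00],
              [ 0.01, -0.01,  0.02]])      # hypothetical 4x3 error matrix
W = np.diag([1.0, 2.0, 3.0, 4.0])          # pointwise isotropic, non-homogeneous weights

norm_trace = np.trace(E.T @ W @ E)
norm_sum   = sum(W[i, i] * np.sum(E[i, :]**2) for i in range(4))
print(norm_trace, norm_sum)                # both give the same value
```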
13-1 The 3d datum transformation and the Procrustes Algorithm
First, we present W - LESS for our nonlinear adjustment problem for the un-
knowns of type scalar, vector and special orthonormal matrix. Second, we re-
view the Procrustes Algorithm for the parameters {x1 , x 2 , X 3} .
Definition 13.1: (nonlinear analysis for the three-dimensional
datum transformation: the conformal group
^ 7 (3) ):
The parameter array {x1A , x 2 A , X 3A } is called W – LESS (LEast
Squares Solution with respect to the W – Seminorm) of the in-
consistent linear system of equations
Y2 Xc3 x1 + 1xc2 + E = Y1 (13.1)
subject to
Xc3 X3 = I 3 , | X3 |= +1 (13.2)

of the field of parameters in comparison with alternative parame-


ter arrays {x1A , x 2 A , X 3A } fulfils the inequality equation
|| Y1  Y2 Xc3 A x1A  1xc2 A ||2W :=
:= tr(( Y1  Y2 Xc3A x1A  1xc2 A )cW( Y1  Y2 Xc3 A x1A  1xc2 A )) d
=: tr ((Y1  Y2 Xc3 x1  1xc2 )cW( Y1  Y2 Xc3 x1  1xc2 )) =:
=:|| Y1  Y2 Xc3 x1  1xc2 ||2W (13.3)
in other words if
EA := Y1  Y2 Xc3 A x1A  1xc2 A (13.4)
has the least W – seminorm.

? How to compute the three unknowns {x1 , x 2 , X 3} by means of W – LESS ?


Here we will outline the computation of the parameter vector by means of partial
W – LESS: At first, by means of W – LESS we determine x 2 A , secondly by
means of W – LESS x1A , followed by thirdly means of W – LESS X 3 . In total,
we outline the Procrustes Algorithm.
Step one: x 2

Corollary 13.2 (partial W – LESS for x 2 A ):

A $3\times 1$ vector $x_{2\ell}$ is partial W – LESS of (13.1) subject to (13.2) if and only if $x_{2\ell}$ fulfils the system of normal equations
$$1'W1\,x_{2\ell} = (Y_1 - Y_2X_3'x_1)'W1. \qquad (13.5)$$
$x_{2\ell}$ always exists and is represented by
$$x_{2\ell} = (1'W1)^{-1}(Y_1 - Y_2X_3'x_1)'W1. \qquad (13.6)$$
For the special case $W=I_n$ the translational parameter vector $x_{2\ell}$ is given by
$$x_{2\ell} = \frac{1}{n}(Y_1 - Y_2X_3'x_1)'1. \qquad (13.7)$$
For the proof, we shall first minimize the risk function
$$(Y_1 - Y_2X_3'x_1 - 1x_2')'(Y_1 - Y_2X_3'x_1 - 1x_2') = \min_{x_2}$$
with respect to $x_2$!

:Detailed Proof of Corollary 13.2:


W – LESS is constructed by the unconstrained Lagrangean

$$\mathcal{L}(x_1, x_2, X_3) := \tfrac{1}{2}\|E\|_W^2 = \tfrac{1}{2}\|Y_1 - Y_2X_3'x_1 - 1x_2'\|_W^2 = \tfrac{1}{2}\,\mathrm{tr}\,(Y_1 - Y_2X_3'x_1 - 1x_2')'W(Y_1 - Y_2X_3'x_1 - 1x_2') = \min_{x_1\ge 0,\; x_2\in\mathbb{R}^{3\times 1},\; X_3'X_3=I_3}$$

$$\frac{\partial\mathcal{L}}{\partial x_2}(x_{2\ell}) = (1'W1)x_{2\ell} - (Y_1 - Y_2X_3'x_1)'W1 = 0$$

constitutes the first necessary condition. Basics of the vector-valued differentials
are found in E. Grafarend and B. Schaffrin (1993, pp. 439-451). As soon as we
backward substitute the translational parameter $x_{2\ell}$, we are led to the centralized
Lagrangean
$$\mathcal{L}(x_1, X_3) = \tfrac{1}{2}\,\mathrm{tr}\{[Y_1 - (Y_2X_3'x_1 + (1'W1)^{-1}11'W(Y_1 - Y_2X_3'x_1))]'\,W\,[Y_1 - (Y_2X_3'x_1 + (1'W1)^{-1}11'W(Y_1 - Y_2X_3'x_1))]\}$$
$$\mathcal{L}(x_1, X_3) = \tfrac{1}{2}\,\mathrm{tr}\{[(I_n - (1'W1)^{-1}11'W)(Y_1 - Y_2X_3'x_1)]'\,W\,[(I_n - (1'W1)^{-1}11'W)(Y_1 - Y_2X_3'x_1)]\},$$
with
$$C := I_n - \tfrac{1}{n}11'$$
being a definition of the centering matrix for the special case $W=I_n$, and in general
$$C := I_n - (1'W1)^{-1}11'W, \qquad (13.8)$$
an idempotent matrix. Substituting the centering matrix into the reduced
Lagrangean $\mathcal{L}(x_1, X_3)$, we gain the centralized Lagrangean
$$\mathcal{L}(x_1, X_3) = \tfrac{1}{2}\,\mathrm{tr}\{[Y_1 - Y_2X_3'x_1]'\,C'WC\,[Y_1 - Y_2X_3'x_1]\}. \qquad (13.9)$$
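The two properties of the centering matrix used above, its reduction to $I_n-\tfrac{1}{n}11'$ for $W=I_n$ and its idempotence, can be verified with a small Python/NumPy sketch (the weights below are arbitrary illustrative values):

```python
# Verify C = I_n - (1'W1)^{-1} 1 1' W is idempotent and reduces to I_n - (1/n)11' for W = I_n
import numpy as np

n = 5
W = np.diag([1.0, 2.0, 0.5, 3.0, 1.5])       # hypothetical positive weights
one = np.ones((n, 1))

C_general = np.eye(n) - (one @ one.T @ W) / (one.T @ W @ one).item()
C_special = np.eye(n) - one @ one.T / n       # the W = I_n special case
C_check   = np.eye(n) - (one @ one.T) / (one.T @ one).item()

print(np.allclose(C_general @ C_general, C_general))   # True: idempotent
print(np.allclose(C_check, C_special))                  # True: W = I_n reduction
```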
Step two: x1

Corollary 13.3 (partial W – LESS for x1A ):


A scalar $x_{1\ell}$ is partial W – LESS of (13.1) subject to (13.2) if and only if
$$x_{1\ell} = \frac{\mathrm{tr}\,Y_1'C'WCY_2X_3'}{\mathrm{tr}\,Y_2'C'WCY_2} \qquad (13.10)$$
holds. For the special case $W=I_n$ the real parameter is given by
$$x_{1\ell} = \frac{\mathrm{tr}\,Y_1'C'CY_2X_3'}{\mathrm{tr}\,Y_2'C'CY_2}. \qquad (13.11)$$
The general case is subject to the centering matrix
$$C := I_n - (1'W1)^{-1}11'W. \qquad (13.12)$$

:Detailed Proof of Corollary 13.3:


For the proof we now minimize the risk function
$$\mathcal{L}(x_1, X_3) = \tfrac{1}{2}\,\mathrm{tr}\{[Y_1 - Y_2X_3'x_1]'C'WC[Y_1 - Y_2X_3'x_1]\} = \min_{x_1}$$
subject to
$$X_3'X_3 = I_3.$$
$$\frac{\partial\mathcal{L}}{\partial x_1}(x_{1\ell}) = x_{1\ell}\,\mathrm{tr}\,X_3Y_2'C'WCY_2X_3' - \mathrm{tr}\,Y_1'C'WCY_2X_3' = 0$$
constitutes the second necessary condition. Due to
$$\mathrm{tr}\,X_3Y_2'C'WCY_2X_3' = \mathrm{tr}\,Y_2'C'WCY_2X_3'X_3 = \mathrm{tr}\,Y_2'C'WCY_2$$
this leads us to $x_{1\ell}$. While the forward computation of $(\partial\mathcal{L}/\partial x_1)(x_{1\ell})=0$ enjoyed a
representation of the optimal scale parameter $x_{1\ell}$, its backward substitution into
the Lagrangean $\mathcal{L}(x_1, X_3)$ amounts to
$$\mathcal{L}(X_3) = \tfrac{1}{2}\,\mathrm{tr}\Big\{\Big[Y_1 - Y_2X_3'\frac{\mathrm{tr}\,Y_1'C'WCY_2X_3'}{\mathrm{tr}\,Y_2'C'WCY_2}\Big]'C'WC\Big[Y_1 - Y_2X_3'\frac{\mathrm{tr}\,Y_1'C'WCY_2X_3'}{\mathrm{tr}\,Y_2'C'WCY_2}\Big]\Big\}$$
$$\mathcal{L}(X_3) = \tfrac{1}{2}\Big\{\mathrm{tr}(Y_1'C'WCY_1) - \mathrm{tr}(Y_1'C'WCY_2X_3')\frac{\mathrm{tr}\,Y_1'C'WCY_2X_3'}{\mathrm{tr}\,Y_2'C'WCY_2} - \mathrm{tr}(X_3Y_2'C'WCY_1)\frac{\mathrm{tr}\,Y_1'C'WCY_2X_3'}{\mathrm{tr}\,Y_2'C'WCY_2} + \mathrm{tr}(X_3Y_2'C'WCY_2X_3')\frac{[\mathrm{tr}\,Y_1'C'WCY_2X_3']^2}{[\mathrm{tr}\,Y_2'C'WCY_2]^2}\Big\}$$
$$\mathcal{L}(X_3) = \tfrac{1}{2}\,\mathrm{tr}(Y_1'C'WCY_1) - \frac{[\mathrm{tr}\,Y_1'C'WCY_2X_3']^2}{\mathrm{tr}\,Y_2'C'WCY_2} + \tfrac{1}{2}\frac{[\mathrm{tr}\,Y_1'C'WCY_2X_3']^2}{\mathrm{tr}\,Y_2'C'WCY_2}$$
$$\mathcal{L}(X_3) = \tfrac{1}{2}\,\mathrm{tr}(Y_1'C'WCY_1) - \tfrac{1}{2}\frac{[\mathrm{tr}\,Y_1'C'WCY_2X_3']^2}{\mathrm{tr}\,Y_2'C'WCY_2} = \min_{X_3'X_3=I_3}. \qquad (13.13)$$

Third, we are left with the proof for the Corollary 13.4, namely X 3 .

Step three: X 3

Corollary 13.4 (partial W – LESS for $X_{3\ell}$):

A $3\times 3$ orthonormal matrix $X_{3\ell}$ is partial W – LESS of (13.1)
subject to (13.2) if and only if
$$X_{3\ell} = UV' \qquad (13.14)$$
holds, where $A := Y_1'C'WCY_2 = U\Sigma_s V'$ is a singular value decomposition with respect to a left orthonormal matrix $U$, $U'U=I_3$, a right orthonormal matrix $V$, $VV'=I_3$, and $\Sigma_s = \mathrm{Diag}(\sigma_1, \sigma_2, \sigma_3)$ a diagonal matrix of singular values $(\sigma_1, \sigma_2, \sigma_3)$. The singular values are the canonical coordinates of the right eigenspace $(A'A - \Sigma_s^2 I)V = 0$. The left eigenspace is based upon $U = AV\Sigma_s^{-1}$.

:Detailed Proof of Corollary 13.4:


The form $\mathcal{L}(X_3)$ subject to $X_3'X_3=I_3$ is minimal if
$$\mathrm{tr}(Y_1'C'WCY_2X_3') = \max_{x_1\ge 0,\; X_3'X_3=I_3}.$$
Let $A := Y_1'C'WCY_2 = U\Sigma_s V'$ be a singular value decomposition with respect to a left orthonormal matrix $U$, $U'U=I_3$, a right orthonormal matrix $V$, $VV'=I_3$, and $\Sigma_s = \mathrm{Diag}(\sigma_1, \sigma_2, \sigma_3)$ a diagonal matrix of singular values $(\sigma_1, \sigma_2, \sigma_3)$. Then
$$\mathrm{tr}(AX_3') = \mathrm{tr}(U\Sigma_s V'X_3') = \mathrm{tr}(\Sigma_s V'X_3'U) = \sum_{i=1}^{3}\sigma_i r_{ii} \le \sum_{i=1}^{3}\sigma_i$$
holds, since
$$R = V'X_3'U = [r_{ij}] \in \mathbb{R}^{3\times 3} \qquad (13.15)$$
is orthonormal with $|r_{ii}|\le 1$. The identity $\mathrm{tr}(AX_3') = \sum_{i=1}^{3}\sigma_i$ applies if
$$V'X_3'U = I_3, \quad \text{i.e.}\quad X_3' = VU', \quad X_3 = UV',$$
namely, if $\mathrm{tr}(AX_3')$ is maximal:
$$\mathrm{tr}(AX_3') = \max_{X_3'X_3=I_3} \;\Leftrightarrow\; \mathrm{tr}\,AX_3' = \sum_{i=1}^{3}\sigma_i \;\Leftrightarrow\; R = V'X_3'U = I_3. \qquad (13.16)$$

An alternative proof of Corollary 13.4 based on formal differentiation of traces


and determinants has been given by P.H. Schönemann (1966) and P.H. Schöne-
mann and R.M. Carroll (1970). Finally, we collect our sequential results in
Theorem 13.5 identifying the stationary point of W – LESS specialized for
W = I in Corollary 13.5. The highlight is the Procrustes Algorithm we review
in Table 13.1.

Theorem 13.5 (W – LESS of $Y_1 = Y_2X_3'x_1 + 1x_2' + E$):

(i) The parameter array $\{x_{1\ell}, x_{2\ell}, X_{3\ell}\}$ is W – LESS if
$$x_{1\ell} = \frac{\mathrm{tr}\,Y_1'C'WCY_2X_{3\ell}'}{\mathrm{tr}\,Y_2'C'WCY_2} \qquad (13.17)$$
$$x_{2\ell} = (1'W1)^{-1}(Y_1 - Y_2X_{3\ell}'x_{1\ell})'W1 \qquad (13.18)$$
$$X_{3\ell} = UV' \qquad (13.19)$$
subject to the singular value decomposition of the general $3\times 3$ matrix
$$Y_1'C'WCY_2 = U\,\mathrm{Diag}(\sigma_1, \sigma_2, \sigma_3)\,V', \qquad (13.20)$$
namely
$$[(Y_1'C'WCY_2)'(Y_1'C'WCY_2) - \sigma_i^2 I]\,v_i = 0 \qquad (13.21)$$
$$V = [v_1, v_2, v_3], \quad VV' = I_3 \qquad (13.22)$$
$$U = Y_1'C'WCY_2\,V\,\mathrm{Diag}(\sigma_1^{-1}, \sigma_2^{-1}, \sigma_3^{-1}), \quad U'U = I_3 \qquad (13.23),\ (13.24)$$
as well as the centering matrix
$$C := I_n - (1'W1)^{-1}11'W. \qquad (13.25)$$

(ii) The empirical error matrix of type W – LESS accounts for
$$E_\ell = [I_n - 11'W(1'W1)^{-1}]\Big(Y_1 - Y_2VU'\frac{\mathrm{tr}\,Y_1'C'WCY_2VU'}{\mathrm{tr}\,Y_2'C'WCY_2}\Big) \qquad (13.26)$$
with the related Frobenius matrix W – seminorm
$$\|E_\ell\|_W^2 = \mathrm{tr}(E_\ell'WE_\ell) = \mathrm{tr}\Big\{\Big(Y_1 - Y_2VU'\frac{\mathrm{tr}\,Y_1'C'WCY_2VU'}{\mathrm{tr}\,Y_2'C'WCY_2}\Big)'[I_n - 11'W(1'W1)^{-1}]'\,W\,[I_n - 11'W(1'W1)^{-1}]\Big(Y_1 - Y_2VU'\frac{\mathrm{tr}\,Y_1'C'WCY_2VU'}{\mathrm{tr}\,Y_2'C'WCY_2}\Big)\Big\} \qquad (13.27)$$
and the representative scalar measure of the error of type W – LESS
$$|||E_\ell|||_W = \sqrt{\mathrm{tr}(E_\ell'WE_\ell)/3n}. \qquad (13.28)$$


A special result is obtained if we specialize Theorem 13.5 to the case W = I n :

Corollary 13.6 (I – LESS of $Y_1 = Y_2X_3'x_1 + 1x_2' + E$):

(i) The parameter array $\{x_{1\ell}, x_{2\ell}, X_{3\ell}\}$ is I – LESS of $Y_1 = Y_2X_3'x_1 + 1x_2' + E$ if
$$x_{1\ell} = \frac{\mathrm{tr}\,Y_1'CY_2X_{3\ell}'}{\mathrm{tr}\,Y_2'CY_2} \qquad (13.29)$$
$$x_{2\ell} = \frac{1}{n}(Y_1 - Y_2X_{3\ell}'x_{1\ell})'1 \qquad (13.30)$$
$$X_{3\ell} = UV' \qquad (13.31)$$
subject to the singular value decomposition of the general $3\times 3$ matrix
$$Y_1'CY_2 = U\,\mathrm{Diag}(\sigma_1, \sigma_2, \sigma_3)\,V', \qquad (13.32)$$
namely
$$[(Y_1'CY_2)'(Y_1'CY_2) - \sigma_i^2 I]\,v_i = 0,\ i\in\{1,2,3\}, \quad V = [v_1, v_2, v_3],\ VV' = I_3 \qquad (13.33)$$
$$U = Y_1'CY_2\,V\,\mathrm{Diag}(\sigma_1^{-1}, \sigma_2^{-1}, \sigma_3^{-1}), \quad U'U = I_3 \qquad (13.34)$$
as well as the centering matrix
$$C := I_n - \frac{1}{n}11'. \qquad (13.35)$$
(ii) The empirical error matrix of type I – LESS accounts for
$$E_\ell = \Big[I_n - \frac{1}{n}11'\Big]\Big(Y_1 - Y_2VU'\frac{\mathrm{tr}\,Y_1'CY_2VU'}{\mathrm{tr}\,Y_2'CY_2}\Big) \qquad (13.36)$$
with the related Frobenius matrix I – seminorm
$$\|E_\ell\|_I^2 = \mathrm{tr}(E_\ell'E_\ell) = \mathrm{tr}\Big\{\Big(Y_1 - Y_2VU'\frac{\mathrm{tr}\,Y_1'CY_2VU'}{\mathrm{tr}\,Y_2'CY_2}\Big)'\Big[I_n - \frac{1}{n}11'\Big]\Big(Y_1 - Y_2VU'\frac{\mathrm{tr}\,Y_1'CY_2VU'}{\mathrm{tr}\,Y_2'CY_2}\Big)\Big\} \qquad (13.37)$$
and the representative scalar measure of the error of type I – LESS
$$|||E_\ell|||_I = \sqrt{\mathrm{tr}(E_\ell'E_\ell)/3n}. \qquad (13.38)$$

In the proof of Corollary 13.6 we only sketch the result that the matrix $I_n - (1/n)11'$ is idempotent:
$$\Big(I_n - \frac{1}{n}11'\Big)\Big(I_n - \frac{1}{n}11'\Big) = I_n - \frac{2}{n}11' + \frac{1}{n^2}(11')^2 = I_n - \frac{2}{n}11' + \frac{1}{n^2}\,n\,11' = I_n - \frac{1}{n}11'.$$

As a summary of the various steps of Corollaries 13.2-13.4, Theorem 13.5 and Corollary 13.6, Table 13.1 presents the celebrated Procrustes Algorithm, which is followed by a short and interesting citation about "Procrustes".
Table 13.1: Procrustes Algorithm

Step 1: Read $Y_1 = \begin{bmatrix} x_1 & y_1 & z_1 \\ \vdots & \vdots & \vdots \\ x_n & y_n & z_n \end{bmatrix}$ and $\begin{bmatrix} X_1 & Y_1 & Z_1 \\ \vdots & \vdots & \vdots \\ X_n & Y_n & Z_n \end{bmatrix} = Y_2$

Step 2: Compute $Y_1'CY_2$ subject to $C := I_n - \frac{1}{n}11'$

Step 3: Compute the SVD $Y_1'CY_2 = U\,\mathrm{Diag}(\sigma_1, \sigma_2, \sigma_3)\,V'$
  3-1: $|(Y_1'CY_2)'(Y_1'CY_2) - \sigma_i^2 I| = 0 \Rightarrow (\sigma_1, \sigma_2, \sigma_3)$
  3-2: $((Y_1'CY_2)'(Y_1'CY_2) - \sigma_i^2 I)v_i = 0,\ i\in\{1,2,3\}$, $V = [v_1, v_2, v_3]$ right eigenvectors (right eigencolumns)
  3-3: $U = Y_1'CY_2\,V\,\mathrm{Diag}(\sigma_1^{-1}, \sigma_2^{-1}, \sigma_3^{-1})$ left eigenvectors (left eigencolumns)

Step 4: Compute $X_{3\ell} = UV'$ (rotation)

Step 5: Compute $x_{1\ell} = \dfrac{\mathrm{tr}\,Y_1'CY_2X_{3\ell}'}{\mathrm{tr}\,Y_2'CY_2}$ (dilatation)

Step 6: Compute $x_{2\ell} = \dfrac{1}{n}(Y_1 - Y_2X_{3\ell}'x_{1\ell})'1$ (translation)

Step 7: Compute $E_\ell = C\Big(Y_1 - Y_2VU'\dfrac{\mathrm{tr}\,Y_1'CY_2VU'}{\mathrm{tr}\,Y_2'CY_2}\Big)$ (error matrix); 'optional control': $E_\ell := Y_1 - (Y_2X_{3\ell}'x_{1\ell} + 1x_{2\ell}')$

Step 8: Compute $\|E_\ell\|_I := \sqrt{\mathrm{tr}(E_\ell'E_\ell)}$ (error matrix norm)

Step 9: Compute $|||E_\ell|||_I := \sqrt{\mathrm{tr}(E_\ell'E_\ell)/3n}$ (mean error matrix norm)
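Table 13.1 translates almost line by line into code. The following Python/NumPy sketch of the I – LESS case ($W = I_n$) is an illustration, not the book's software; the function name is ours, and the determinant safeguard in the rotation step is a common practical addition that is not part of Table 13.1. The coordinate arrays $Y_1$, $Y_2$ are stored with one point per row, i.e. as $n\times 3$ arrays, in agreement with the model $Y_1 = Y_2X_3'x_1 + 1x_2' + E$.

```python
import numpy as np

def procrustes_i_less(Y1, Y2):
    """I-LESS Procrustes (W = I_n), following Steps 1-9 of Table 13.1.
    Y1, Y2 : (n, 3) arrays, one point per row."""
    n = Y1.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n                    # Step 2: centering matrix
    A = Y1.T @ C @ Y2                                      # Y1' C Y2, a 3 x 3 matrix
    U, s, Vt = np.linalg.svd(A)                            # Step 3: SVD, A = U diag(s) V'
    X3 = U @ Vt                                            # Step 4: rotation X3 = UV'
    if np.linalg.det(X3) < 0:                              # safeguard (not in Table 13.1):
        U[:, -1] *= -1.0                                   # enforce a proper rotation, |X3| = +1
        X3 = U @ Vt
    x1 = np.trace(Y1.T @ C @ Y2 @ X3.T) / np.trace(Y2.T @ C @ Y2)   # Step 5: dilatation
    x2 = (Y1 - x1 * Y2 @ X3.T).T @ np.ones((n, 1)) / n              # Step 6: translation (3 x 1)
    E = Y1 - (x1 * Y2 @ X3.T + np.ones((n, 1)) @ x2.T)              # Step 7: error matrix
    norm_E = np.sqrt(np.trace(E.T @ E))                             # Step 8: ||E||_I
    mean_norm_E = np.sqrt(np.trace(E.T @ E) / (3 * n))              # Step 9: |||E|||_I
    return X3, x1, x2, E, norm_E, mean_norm_E
```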

Procrustes (the subduer), son of Poseidon, kept an inn benefiting from what he
claimed to be a wonderful all-fitting bed. He lopped off excessive limbage from
tall guests and either flattened short guests by hammering or stretched them by
racking. The victim fitted the bed perfectly but, regrettably, died. To exclude the
embarrassment of an initially exact-fitting guest, variants of the legend allow
Procrustes two different-sized beds. Ultimately, in a crackdown on robbers and
monsters, the young Theseus fitted Procrustes to his own bed.

13-2 The variance-covariance matrix of the error matrix E

By Lemma 13.7 we review the variance-covariance matrix of the vector-valued form of the transposed error matrix as a function of $\Sigma_{\mathrm{vec}\,Y_1'}$, $\Sigma_{\mathrm{vec}\,Y_2'}$ and the covariance matrix $\Sigma_{\mathrm{vec}\,Y_1',\,(I_n\otimes x_1X_3)\mathrm{vec}\,Y_2'}$.

Lemma 13.7 (variance-covariance "error propagation"):

Let $\mathrm{vec}\,E'$ be the vector-valued form of the transposed error matrix $E := Y_1 - Y_2X_3'x_1 - 1x_2'$. Then
$$\Sigma_{\mathrm{vec}\,E'} = \Sigma_{\mathrm{vec}\,Y_1'} + (I_n\otimes x_1X_3)\,\Sigma_{\mathrm{vec}\,Y_2'}\,(I_n\otimes x_1X_3)' - 2\,\Sigma_{\mathrm{vec}\,Y_1',\,(I_n\otimes x_1X_3)\mathrm{vec}\,Y_2'} \qquad (13.39)$$
is the exact representation of the dispersion matrix (variance-covariance matrix) $\Sigma_{\mathrm{vec}\,E'}$ of $\mathrm{vec}\,E'$ in terms of the dispersion matrices (variance-covariance matrices) $\Sigma_{\mathrm{vec}\,Y_1'}$ and $\Sigma_{\mathrm{vec}\,Y_2'}$ of the two coordinate sets $\mathrm{vec}\,Y_1'$ and $\mathrm{vec}\,Y_2'$, as well as their covariance matrix $\Sigma_{\mathrm{vec}\,Y_1',\,(I_n\otimes x_1X_3)\mathrm{vec}\,Y_2'}$.

The proof follows directly from "error propagation". Obviously the variance-covariance matrix $\Sigma_{\mathrm{vec}\,E'}$ can be decomposed into the variance-covariance matrix $\Sigma_{\mathrm{vec}\,Y_1'}$, the product $(I_n\otimes x_1X_3)\,\Sigma_{\mathrm{vec}\,Y_2'}\,(I_n\otimes x_1X_3)'$ using prior information of $x_1$ and $X_3$, and the covariance matrix $\Sigma_{\mathrm{vec}\,Y_1',\,(I_n\otimes x_1X_3)\mathrm{vec}\,Y_2'}$, again using prior information of $x_1$ and $X_3$.
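The "error propagation" (13.39) is a single matrix identity and can be evaluated directly once the three dispersion/covariance blocks and prior (approximate) values of $x_1$ and $X_3$ are available. A minimal sketch (all names below are illustrative assumptions, not notation from the book):

```python
# Evaluate eq. (13.39): Sigma_vecE' = Sigma_vecY1' + F Sigma_vecY2' F' - 2 Sigma_cross,
# with F = I_n kron (x1 * X3) built from prior values of x1 and X3.
import numpy as np

def dispersion_vecE(Sigma_vecY1t, Sigma_vecY2t, Sigma_cross, x1, X3, n):
    F = np.kron(np.eye(n), x1 * X3)        # (3n x 3n) propagation factor
    return Sigma_vecY1t + F @ Sigma_vecY2t @ F.T - 2.0 * Sigma_cross
```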
13-3 Case studies: The 3d datum transformation and the Procrustes Algorithm

By Table 13.2 and Table 13.3 we present two sets of coordinates, first for the local system A, second for the global system B, also called "World Geodetic System 84". The units are in meters. The results of the
I – LESS Procrustes Algorithm
are listed in Table 13.4, especially
$$\|E_\ell\|_I := \sqrt{\mathrm{tr}(E_\ell'E_\ell)}, \qquad |||E_\ell|||_I := \sqrt{\mathrm{tr}(E_\ell'E_\ell)/3n},$$
and of the
W – LESS Procrustes Algorithm
in Table 13.5, especially
$$\|E_\ell\|_W := \sqrt{\mathrm{tr}(E_\ell'WE_\ell)}, \qquad |||E_\ell|||_W := \sqrt{\mathrm{tr}(E_\ell'WE_\ell)/3n},$$
completed by Table 13.6 of residuals from the linearized least squares solution and by Table 13.7 listing the weight matrix.

Discussion
By means of the Procrustes Algorithm, which is based upon W – LESS with respect to the Frobenius matrix W – seminorm, we have succeeded in solving the normal equations (necessary conditions) of Corollaries 13.2-13.4 for the matrix-valued "error equations"
$$\mathrm{vec}\,E' = \mathrm{vec}\,Y_1' - (I_n\otimes x_1X_3)\,\mathrm{vec}\,Y_2' - \mathrm{vec}\,x_21'$$
subject to
$$X_3'X_3 = I_3, \quad |X_3| = +1.$$
The scalar-valued unknown $x_1\in\mathbb{R}$ represents the dilatation (scale factor), the vector-valued unknown $x_2\in\mathbb{R}^{3\times 1}$ the translation vector, and the matrix-valued unknown $X_3\in SO(3)$ the orthonormal matrix. The conditions of sufficiency, namely the Hesse matrix of second derivatives of the Lagrangean $\mathcal{L}(x_1, x_2, X_3)$, are not discussed here. They are given in the Procrustes references.
In order to present a proper choice of the isotropic weight matrix W, we introduced the corresponding "random regression model"
$$E\{\mathrm{vec}\,E'\} = E\{\mathrm{vec}\,Y_1'\} - (I_n\otimes x_1X_3)\,E\{\mathrm{vec}\,Y_2'\} - \mathrm{vec}\,x_21' = 0$$
(first moment identity),
$$D\{\mathrm{vec}\,E'\} = D\{\mathrm{vec}\,Y_1'\} + (I_n\otimes x_1X_3)\,D\{\mathrm{vec}\,Y_2'\}(I_n\otimes x_1X_3)' - 2\,C\{\mathrm{vec}\,Y_1',\,(I_n\otimes x_1X_3)\mathrm{vec}\,Y_2'\}$$
(second central moment identity).

Table 13.2. Coordinates for system A (local system)


positional
Station name X(m) Y(m) Z(m) error sphere
Solitude 4 157 222.543 664 789.307 4 774 952.099 0.1433
Buoch Zeil 4 149 043.336 688 836.443 4 778 632.188 0.1551
Hohenneuffen 4 172 803.511 690 340.078 4 758 129.701 0.1503
Kuehlenberg 4 177 148.376 642 997.635 4 760 764.800 0.1400
Ex Mergelaec 4 137 012.190 671 808.029 4 791 128.215 0.1459
Ex Hof Asperg 4 146 292.729 666 952.887 4 783 859.856 0.1469
Ex Kaisersbach 4 138 759.902 702 670.738 4 785 552.196 0.1220

Table 13.3. Coordinates for system B (WGS 84)


positional
Station name X(m) Y(m) Z(m) error sphere
Solitude 4 157 870.237 664 818.678 4 775 416.524 0.0103
Buoch Zeil 4 149 691.049 688 865.785 4 779 096.588 0.0038
Hohenneuffen 4 173 451.354 690 369.375 4 758 594.075 0.0006
Kuehlenberg 4 177 796.064 643 026.700 4 761 228.899 0.0114
Ex Mergelaec 4 137 659.549 671 837.337 4 791 592.531 0.0068
Ex Hof Asperg 4 146 940.228 666 982.151 4 784 324.099 0.0002
Ex Kaisersbach 4 139 407.506 702 700.227 4 786 016.645 0.0041
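For illustration, the two coordinate sets of Tables 13.2 and 13.3 can be fed into the I – LESS sketch given after Table 13.1. In the call below the WGS-84 coordinates are taken as $Y_1$ and the local ones as $Y_2$; this assignment appears consistent with the sign of the translation vector reported in Table 13.4, while interchanging the two arguments yields the transformation in the opposite direction.

```python
# Apply the procrustes_i_less sketch to the station coordinates of Tables 13.2 and 13.3.
import numpy as np

Y_local = np.array([   # Table 13.2, system A (local): X, Y, Z in metres
    [4157222.543, 664789.307, 4774952.099],
    [4149043.336, 688836.443, 4778632.188],
    [4172803.511, 690340.078, 4758129.701],
    [4177148.376, 642997.635, 4760764.800],
    [4137012.190, 671808.029, 4791128.215],
    [4146292.729, 666952.887, 4783859.856],
    [4138759.902, 702670.738, 4785552.196]])
Y_wgs = np.array([     # Table 13.3, system B (WGS 84): X, Y, Z in metres
    [4157870.237, 664818.678, 4775416.524],
    [4149691.049, 688865.785, 4779096.588],
    [4173451.354, 690369.375, 4758594.075],
    [4177796.064, 643026.700, 4761228.899],
    [4137659.549, 671837.337, 4791592.531],
    [4146940.228, 666982.151, 4784324.099],
    [4139407.506, 702700.227, 4786016.645]])

X3, x1, x2, E, nE, mE = procrustes_i_less(Y_wgs, Y_local)   # function from the earlier sketch
print(x1)          # scale, compare with Table 13.4
print(x2.ravel())  # translation in metres, compare with Table 13.4
print(nE, mE)      # error norms, compare with 0.2890 and 0.0631
```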
Table 13.4. Results of the I-LESS Procrustes transformation
Values
Rotation matrix 0.999999999979023 -4.33275933098276e- 6 4.81462518486797e-6
X 3  \3 x 3 -4.8146461589238e-6 0.999999999976693 -4.84085332591588e-6
4.33273602401529e-6 4.84087418647916e-6 0.999999999978896
Translation 641.8804
x 2  \ 3 x1 (m) 68.6553
416.3982
Scale x1  \ 1.00000558251985
Site X(m) Y(m) Z(m)
Residual matrix
Solitud 0.0940 0.1351 0.1402
E(m)
Buoch Zeil 0.0588 -0.0497 0.0137
Hohenneuffen -0.0399 -0.0879 -0.0081
Kuelenberg 0.0202 -0.0220 -0.0874
Ex Mergelaec -0.0919 0.0139 -0.0055
Ex Hof Asperg -0.0118 0.0065 -0.0546
Ex Keisersbach -0.0294 0.0041 0.0017
Error matrix norm
(m) 0.2890
|| EA ||I := tr(EcA EA )
Mean error matrix
norm (m) 0.0631
||| EA |||I := tr(EcA EA ) / 3n

Table 13.5. Results of the W-LESS Procrustes transformation


Values
Rotation matrix 0.999999999979141 4.77975830372179e-6 -4.34410139438235e-6
X 3  \3 x 3 -4.77977931759299e-6 0.999999999976877 -4.83729276438971e-6
4.34407827309968e-6 4.83731352815542e-6 0.999999999978865
Translation 641.8377
x 2  \ 3 x1 (m) 68.4743
416.2159
Scale x1  \ 1.00000561120732
Site X(m) Y(m) Z(m)
Residual matrix
E(m) Solitude 0.0948 0.1352 0.1407
Buoch Zeil 0.0608 -0.0500 0.0143
Hohenneuffen -0.0388 -0.0891 -0.0072
Kuelenberg 0.0195 -0.0219 -0.0868
Ex Mergelaec -0.0900 0.0144 -0.0052
Ex Hof Asperg -0.0105 0.0067 -0.0542
Ex Keisersbach -0.0266 0.0036 0.0022
Error matrix
norm (m)
||| EA |||W := tr( E*A WEA ) 0.4268

Mean error
matrix norm
(m) 0.0930
||| EA |||W := tr( E WEA ) / 3n
*
A

Table 13.6. Residuals from the linearized LS solution


Site X(m) Y(m) Z(m)
Solitude 0.0940 0.1351 0.1402
Buoch Zeil 0.0588 -0.0497 0.0137
Hohenneuffen -0.0399 -0.0879 -0.0081
Kuelenberg 0.0202 -0.0220 -0.0874
Ex Mergelaec -0.0919 0.0139 -0.0055
Ex Hof Asperg -0.0118 0.0065 -0.0546
Ex Keisersbach -0.0294 0.0041 -0.0017

Table 13.7. Weight matrix

1.8110817 0 0 0 0 0 0
0 2.1843373 0 0 0 0 0
0 0 2.1145291 0 0 0 0
0 0 0 1.9918578 0 0 0
0 0 0 0 2.6288452 0 0
0 0 0 0 0 2.1642460 0
0 0 0 0 0 0 2.359370

13-4 References
Here is a list of important references:
Awange LJ (1999), Awange LJ (2002), Awange LJ, Grafarend E (2001a, b, c), Awange LJ, Grafarend E (2002), Bernhardt T (2000), Bingham C, Chang T, Richards D (1992), Borg I, Groenen P (1997), Brokken FB (1983), Chang T, Ko DJ (1995), Chu MT, Driessel R (1990), Chu MT, Trendafilov NT (1998), Crosilla F (1983a, b), Dryden IL (1998), Francesco D, Mathien PP, Senechal D (1997), Golub GH (1987), Goodall C (1991), Gower JC (1975), Grafarend E, Awange LJ (2000, 2003), Grafarend E, Schaffrin B (1993), Grafarend E, Knickmeyer EH, Schaffrin B (1982), Green B (1952), Gulliksson M (1995a, b), Kent JT, Mardia KV (1997), Koch KR (2001), Krarup T (1979), Lenzmann E, Lenzmann L (2001a, b), Mardia K (1978), Mathar R (1997), Mathias R (1993), Mooijaart A, Commandeur JJF (1990), Preparata FP, Shamos MI (1985), Reinking J (2001), Schönemann PH (1966), Schönemann PH, Carroll RM (1970), Schottenloher M (1997), Ten Berge JMF (1977), Teunissen PJG (1988), Trefethen LN, Bau D (1997) and Voigt C (1998).
14 The seventh problem of generalized algebraic regression
revisited: The Grand Linear Model:
The split level model of conditional equations with unknowns
(general Gauss-Helmert model)

The reaction of one man can be


forecast by no known mathematics;
the reaction of a billion is something else again.
Isaac Asimov

:Fast track reading:


Read only Lemma 14.1, Lemma 14.2,
Lemma 14.3

Lemma 14.10: W - LESS Lemma 14.1: W - LESS

Lemma 14.13:R, W - MINOLESS Lemma 14.2: R, W - MINOLESS

14-11 Lemma 14.16: R, W - Lemma 14.3: R, W - HAPS


HAPS relation between A and B

“The guideline of Chapter 14: three lemmas”


The inconsistent, inhomogeneous system of linear equations
$$Ax + Bi = By - c$$
we treated before will be specialized for arbitrary condition equations between the observation vector y on the one side and the unknown parameter vector x on the other side. We assume in addition that those condition equations which do not contain the observation vector y are consistent. The $n\times 1$ vector $i$ of inconsistency is specialized such that $B(y - i) \in c + \mathcal{R}(A)$.
The first equation: B1i = B1y  c1
The first condition equation is specialized to contain only conditions acting on
the observation vector y, namely as an inconsistent equation
B1i = B1y  c1.

There are many examples of such a model. As a holonomity condition it is stated, for instance, that the "true observations" fulfil an equation of type
$$0 = B_1E\{y\} - c_1.$$
Example
Let there be given two connected triangular networks of type height difference measurements, which we already presented in Chapter 9-3, namely for $c_1 := 0$ and
$$\{h_{\alpha\beta} + h_{\beta\gamma} + h_{\gamma\alpha} = 0\}, \qquad \{h_{\gamma\beta} + h_{\beta\delta} + h_{\delta\gamma} = 0\},$$
$$B_1 = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & -1 & 0 & 1 & 1 \end{bmatrix}, \qquad y := [h_{\alpha\beta},\, h_{\beta\gamma},\, h_{\gamma\alpha},\, h_{\beta\delta},\, h_{\delta\gamma}]'.$$
The second equation: A 2 x + B 2 i = B 2 y  c 2 , c 2  R (B 2 )
The second condition equation with unknowns is assumed to be the general
model which is characterized by the inconsistent, inhomogeneous system of lin-
ear equations, namely
A 2 x + B 2 i = B 2 y  c 2 , c 2  R (B 2 ).

Examples have been given earlier.


The third equation: A 3 x = c3 , c3  R ( A 3 )
The third condition equation is specialized to contain only a restriction acting on
the unknown vector x in the sense of a fixed constraint, namely
A 3 x = c3 or A 3 x + c3 = 0, c3  R ( A 3 ).

We refer to our old example of fixing a triangular network in the plane whose
position coordinates are derived from distance measurements and fixed by a
datum constraint.
The other linear model of type Chapters 1, 3, 5, 9 and 12 can be consid-
ered as special cases. Lemma 14.1 refers to the solution of type W -
LESS, Lemma 14.2 of type R, W – MINOLESS and Lemma 14.3 of
type R, W – HAPS.

14-1 Solutions of type W-LESS


The solutions of our model equation Ax + Bi = By  c of type W – LESS can be
characterized by Lemma 14.1.
Lemma 14.1 (The Grand Linear Model, W - LESS):
The m × 1 vector xl is W – LESS of Ax + Bi = By  c if and only if

$$\begin{bmatrix} W & B' & 0 \\ B & 0 & A \\ 0 & A' & 0 \end{bmatrix}\begin{bmatrix} i_\ell \\ \lambda_\ell \\ x_\ell \end{bmatrix} = \begin{bmatrix} 0 \\ By - c \\ 0 \end{bmatrix} \qquad (14.1)$$
with the $q\times 1$ vector $\lambda_\ell$ of "Lagrange multipliers". $x_\ell$ exists in the case of
$$\mathcal{R}(B') \subset \mathcal{R}(W)$$
and is a solution of the system of normal equations
$$\begin{bmatrix} A_2'(B_2W^-W_1W^-B_2')^{-1}A_2 & A_3' \\ A_3 & 0 \end{bmatrix}\begin{bmatrix} x_\ell \\ \lambda_3 \end{bmatrix} = \begin{bmatrix} A_2'(B_2W^-W_1W^-B_2')^{-1}(B_2W^-W_1y - k_2) \\ -c_3 \end{bmatrix} \qquad (14.2)$$
with
$$W_1 := W - B_1'(B_1W^-B_1')^{-1}B_1 \qquad (14.3)$$
$$k_2 := c_2 - B_2W^-B_1'(B_1W^-B_1')^{-1}c_1 \qquad (14.4)$$
and the $q_3\times 1$ vector $\lambda_3$ of "Lagrange multipliers", which are independent of the choice of the g-inverse $W^-$ and uniquely determined if $Ax_\ell$ is uniquely determined. $x_\ell$ is unique if the matrix
$$N := A_2'(B_2W^-W_1W^-B_2')^{-1}A_2 + A_3'A_3 \qquad (14.5)$$
is regular or, equivalently, if
$$\mathrm{rk}[A_2', A_3'] = \mathrm{rk}\,A = m. \qquad (14.6)$$
In this case, $x_\ell$ has the representation
$$x_\ell = N^{-1}N_3N^{-1}A_2'(B_2W^-W_1W^-B_2')^{-1}(B_2W^-W_1y - k_2) - N^{-1}A_3'(A_3N^{-1}A_3')^-c_3 \qquad (14.7)$$
with
$$N_3 := N - A_3'(A_3N^{-1}A_3')^-A_3 \qquad (14.8)$$
independent of the choice of the g-inverse $(A_3N^{-1}A_3')^-$.


:Proof:
W – LESS will be constructed by means of the “Lagrange function”
L ( i, x, Ȝ ) := icWi + 2Ȝ c( Ax + Bi  By + c ) = min
i, x, Ȝ

for which the first derivatives


wL
(i l , xl , Ȝ l ) = 2( Wi l + BcȜ l ) = 0
wi

wL
(i l , xl , Ȝ l ) = 2 A cȜ l = 0
wx
wL
(i l , xl , Ȝ l ) = 2( Axl + Bi l  By + c) = 0

are necessary conditions. Note the theory of vector derivatives is summarized in
Appendix B. The second derivatives
w2L
(i l , x l , Ȝ l ) = 2W t 0
wiwic
constitute due to the positive-semidefiniteness of the matrix W the sufficiency
condition. In addition, due to the identity WW  Bc = Bc and the invariance of
BW  Bc with respect to the choice of the g-inverse such that with the matrices
BW  Bc and B1 W  B1c the “Schur complements” B 2 W  Bc2  B 2 W  B1c
(B1 W  B1c ) 1 B1 W  Bc2 = B 2 W  W1 W  Bc2 is uniquely invertible. Once if the vector
( q1 + q2 + q3 ) × 1 vector Ȝ l is partitioned with respect to Ȝ cl := [Ȝ1c , Ȝ c2 , Ȝ c3 ] with
O(Ȝ i ) = q1 × 1 for all i = 1, 2, 3, then by eliminating of i l we arrive at the reduced
system of normal equations
ª B1W  B1c B1W  Bc2 0 0 º ª Ȝ1 º ª B1y  c1 º
« B W  B c B W  B c 0 A » « Ȝ » « B y  c »
« 2 1 2 2 2»
« 2»= « 2 2
» (14.9)
« 0 0 0 A 3»« 3»
Ȝ « c 3 »
¬« 0 A c2 A c3 0 »¼ «¬ Ȝ l »¼ «¬ 0 »¼

and by further eliminating Ȝ1 and Ȝ 2


1
ª B W  Bc B1 W  Bc2 º
[ 0, Ac2 ] «B1 W  B1c
B W 
B c » = A c(B 2 W W1 W Bc2 ) [B 2 W B1c (B1 W B1c ) , I]
  1   1

¬ 2 1 2 2¼

leads with c1  \ q = R (B1 ), c 2  \ q = R (B 2 ) and c3  R ( A 3 ) to the existence


1 2

of x l . An equivalent system of equations is


ª N A c3 º ª xl º ª A c2 (B 2 W  W1W  Bc2 ) 1 (B 2 W  W1y  k 2 )  Ac3c3 º
«¬ A 3 0 »¼ «¬ Ȝ 3 »¼ = « c3 »
¬ ¼
subject to
N := A c2 ( B 2 W W1 W  Bc2 ) 1 A 2 + A c3 A 3 ,


which we can solve for


c3 = A 3 xl = A 3 N  Nxl =
A 3 N  Ac2 (B 2 W  W1 W  Bc2 ) 1 (B 2 W  W1 y  k 2 )  A3 N  Ac3 (c3 + Ȝ 3 )

for an arbitrary g-inverse N  . In addition, we solve further for a g-inverse


( A 3 N  A c3 ) 

A c3 (c3 + Ȝ 3 ) = A c3 ( A 3 N  A c3 )  A c3 N  Ac3 (c3  Ȝ 3 ) =


= A c3 ( A 3 N  A c3 )  A 3 N  Ac2 (B 2 W  W1 W  Bc2 ) 1 (B 2 W  W1 y  k 2 ) + Ac3 ( A 3 N  Ac3 )  c3
subject to
Nxl = A c2 (B 2 W  W1 W  Bc2 ) 1 (B 2 W  W1 y  k 2 )  Ac3 (c3 + Ȝ 3 ) =
[N  A c3 ( A 3 N  A c3 )  A 3 ]N  A c2 (B 2 W  W1 W  Bc2 ) 1 (B 2 W  W1 y  k 2 ) 
 A c3 ( A 3 N  A c3 )  c3 .

With the identity


ª(B W  W1W  Bc2 ) 1 0 º ª A 2 º
N = [ A c2 , A c3 ] « 2 (14.10)
¬ 0 I »¼ «¬ A 3 »¼
we recognize that with Nx l also Ax l = AN  Nx l is independent of the g-inverse
N  and is always uniquely determinable. x l is unique if and only if N is regu-
lar. We summarize our results by specializing the matrices A and B and the
vector c and find x l of type (14.7) and (14.8).

14-2 Solutions of type R, W-MINOLESS


The solutions of our model equation Ax + Bi = By  c of type R, W –
MINOLESS are characterized by Lemma 14.2.
Lemma 14.2 (The Grand Linear Model, R, W – MINOLESS):
Under the assumption
R (Bc)  R ( W) (14.11)
is the vector x lm R, W – MINOLESS of Ax + Bi = By  c if and only if
the system of normal equations
$$\begin{bmatrix} R & N \\ N & 0 \end{bmatrix}\begin{bmatrix} x_{\ell m} \\ \lambda_{\ell m} \end{bmatrix} = \begin{bmatrix} 0 \\ N_3N^-A_2'(B_2W^-W_1W^-B_2')^{-1}(B_2W^-W_1y - k_2) - A_3'(A_3N^-A_3')^-c_3 \end{bmatrix} \qquad (14.12)$$
is fulfilled, with the $m\times 1$ vector $\lambda_{\ell m}$ of "Lagrange multipliers", subject to
$$W_1 := W - B_1'(B_1W^-B_1')^{-1}B_1, \qquad k_2 := c_2 - B_2W^-B_1'(B_1W^-B_1')^{-1}c_1 \qquad (14.13),\ (14.14)$$
$$N := A_2'(B_2W^-W_1W^-B_2')^{-1}A_2 + A_3'A_3, \qquad N_3 := N - A_3'(A_3N^-A_3')^-A_3. \qquad (14.15),\ (14.16)$$

All definitions are independent of the choice of the g – inverse


N  and ( A 3 N  A c3 )  . xAm exists always if and only if the matrix
R+N (14.17)
is regular, or equivalently if
rk[R, A] = m (14.18)
holds.

:Proof:
The proof follows the line of Lemma 12.13 if we refer to the reduced system of
normal equations (12.62). The rest is subject to the identity (12.59)
ªR  0 0 0º
« »
0 I 0 0» ªR º
R + N = [R , A c] «

« ». (14.19)
« 0 0 (B 2 W  W1W  Bc2 )1 0 » ¬ A ¼
« »
¬« 0 0 0 I ¼»

It is obvious that the condition (14.18) is fulfilled if the matrix R is positive


definite and consequently if it is describing a R - norm. In specifying the matri-
ces A and B and the vector c, we receive a system of normal equations of type
(14.12)-(14.18).
14-3 Solutions of type R, W-HAPS
The solutions of our model equation Ax + Bi = By  c of type R, W – HAPS will
be characterized by Lemma 14.3.
Lemma 14.3 (The Grand Linear Model, R, W - HAPS):
An m × 1 vector x h is R, W – HAPS of Ax + Bi = By  c if it solves the
system of normal equations
$$\begin{bmatrix} W & B' & 0 \\ B & 0 & A \\ 0 & A' & R \end{bmatrix}\begin{bmatrix} i_h \\ \lambda_h \\ x_h \end{bmatrix} = \begin{bmatrix} 0 \\ By - c \\ 0 \end{bmatrix} \qquad (14.20)$$
with the $q\times 1$ vector $\lambda_h$ of "Lagrange multipliers". $x_h$ exists if
$$\mathcal{R}(B') \subset \mathcal{R}(W) \qquad (14.21)$$
and if it solves the system of normal equations
$$\begin{bmatrix} R + A_2'(B_2W^-W_1W^-B_2')^{-1}A_2 & A_3' \\ A_3 & 0 \end{bmatrix}\begin{bmatrix} x_h \\ \lambda_3 \end{bmatrix} = \begin{bmatrix} A_2'(B_2W^-W_1W^-B_2')^{-1}(B_2W^-W_1y - k_2) \\ -c_3 \end{bmatrix} \qquad (14.22)$$
with
$$W_1 := W - B_1'(B_1W^-B_1')^{-1}B_1, \qquad k_2 := c_2 - B_2W^-B_1'(B_1W^-B_1')^{-1}c_1 \qquad (14.23),\ (14.24)$$

and the q3 × 1 vector Ȝ 3 of “Lagrange multipliers” which are defined


independent of the choice of the g – inverse W  and in such a way
that both Ax h and Rx A are uniquely determined. x h is unique if
and only if the matrix

R+N (14.25)
with
N := A c2 ( B 2 W  W1 W  Bc2 ) 1 A 2 + A c3 A 3 (14.26)

being regular or equivalently, if


rk[R , A ] = m. (14.27)

In this case, x h can be represented by

x h = (R + N ) 1{R + N  A c3 [ A 3 (R + N) 1 A c3 ] A 3 ×
×(R + N ) 1 A c2 (B 2 W  W1 W  Bc2 ) 1 × (14.28)
×(B 2 W W1 y  k 2 )  ( R + N) Ac3 [ A 3 ( R + N) A c3 ) c3 ,
 1 1 

which is independent of the choice of the g - inverse [ A 3 ( R + N) 1 A c3 ] .

:Proof:

R, W – HAPS will be constructed by means of the “Lagrange function”


L ( i, x, Ȝ ) := icWi + x cRx + 2 Ȝ c( Ax + Bi  By + c) = min
i, x, Ȝ

for which the first derivatives


wL
( i h , x h , Ȝ h ) = 2( Wi h + BcȜ h ) = 0
wi
wL
( i h , x h , Ȝ h ) = 2( A cȜ h + Rx h ) = 0
wx
wL
( i h , x h , Ȝ h ) = 2( Ax h + Bi h  By + c ) = 0

are necessary conditions. Note the theory of vector derivatives is summarized in
Appendix B. The second derivatives

w 2L
(i h , x h , Ȝ h ) = 2 W t 0
wiwi c
w 2L
( i h , x h , Ȝ h ) = 2R t 0
wxwx c
constitute due to the positive-semidefiniteness of the matrices W and R a suffi-
ciency condition for obtaining a minimum. Because of the condition (14.21)
R (Bc)  R ( W) we are able to reduce first the vector i A in order to be left with
the system of normal equations.

ª -B1W  B1c -B1 W  B c2 0 0 º ª Ȝ1 º ª B1y - c1 º


«-B W  Bc -B W  B c 0 A » « Ȝ » «B y - c »
« 2 1 2 2 3»
« 2»= « 2 2
» (14.29)
« 0 0 0 A 3 » « Ȝ 3 » « -c3 »
¬« 0 A c3 A c3 R ¼» «¬ x h »¼ «¬ 0 ¼»

is produced by partitioning the ( q1 + q2 + q3 ) × 1 vector due to Ȝ ch = [Ȝ1c , Ȝ c2 , Ȝ c3 ]


and 0( Ȝ i ) = qi for i = 1, 2, 3 . Because of BW  Bc and B1 W  B1c with respect to
the “Schur complement”, B 2 W  Bc2  B 2 W  B1c (B1 W  B1c ) 1 B1 W  Bc2 = B 2 W 
W1 W  Bc2 is uniquely invertible leading to a further elimination of Ȝ1 and Ȝ 2
because of
1
ª B W  Bc B1W  Bc2 º
» = A c2 (B 2 W W1 W Bc2 ) [ B 2 W B1c (B1W B1c ) , I ]
  1   1
[0, A c2 ] « 1  1 
B
¬ 2 W B1
c B 2 W B c

and
ª R + N A c3 º ª x h º ª A c2 (B 2 W  W1 W  Bc2 )1 (B 2 W  W1 y  k 2 )  A c3 c3 º
=« ».
« A
¬ 3 0 »¼ «¬ Ȝ 3 »¼ ¬ c3 ¼
For any g – inverse ( R + N )  there holds

c3 = A 3 x h = A 3 (R + N) 1 (R + N)x h =
A 3 (R + N)[ A c2 (B 2 W  W1 W  Bc2 )-1 (B 2 W  W1 y  k 2 )  A 3 (c3 + Ȝ 3 )]

and for an arbitrary g – inverse [ A 3 ( R + N)  A c3 ]

(R + N)x h = A c2 (B 2 W  W1W  Bc2 )-1 (B 2 W  W1y  k 2 )  A c3 (c3 = Ȝ 3 ) =


= {R + N  A c3 [ A 3 (R + N)  A c3 ] A c3 }(R + N)  (B 2 W  B1 W  Bc2 )-1 ×
×(B 2 W  W1y  k 2 )  A3c [ A3 ( R + N)  Ac3 ] c3

A c3 (c3 = Ȝ 3 ) = A c3 [ A 3 (R + N)  A 3 ](R + N)  A c3 (c3 = Ȝ 3 ) =


= A c3 [ A 3 ( R + N)  A c3 ] A 3 (R + N) 1 A c2 (B 2 W  W1 W  Bc2 )-1 (B 2 W  W1 y  k 2 ) +
+ A c3 [ A 3 (R + N)  A c3 ] c3 .
(14.30)
Thanks to the identity
ªR  0 0º ª R º
« »
R + N = [R, A c2 , A c3 ] « 0  
(B 2 W W1W B c2 ) 1
0 » «« A 2 »»
« 0 0 I »¼ «¬ A 3 »¼
¬
it is obvious that

(i) the solution x h exists always and


(ii) is unique when the matrix ( R + N ) is regular which co-
incide with (14.28).
Under the condition R (Bc)  R ( W) , R, W – HAPS is unique if R, W –
MINOLESS is unique. Indeed the forms for R, W – HAPS and R, W –
MINOLESS are identical in this case. A special form, namely
(R + N)x Am + NȜ Am =
(14.31)
= N 3 N A c2 (B 2 W W1 W  Bc2 )-1 (B 2 W  W1y  k 2 )  A c3 ( A 3 N  Ac3 ] c3
 

Nx Am =
(14.32)
= N 3 N Ac2 (B 2 W W1W Bc2 ) (B 2 W  W1y  k 2 )  A c3 ( A 3 N  A c3 ] c3 ,
   1

leads us to the representation

xh  xAm = (R + N)1 NȜ Am +
+(R + N)1{R + N  Ac3 ( A3 (R + N)1 Ac3 ) A3 ](R + N)1 
(14.33)
N3 N- }Ac2 (B2 W W1 W B'2 )-1 ×
×(B2 W W1y  k 2 ) + (R + N)1 Ac3 {(A3 N1 Ac3 )  [A3 (R + N)1 Ac3 ] }c3 .

14-4 Review of the various models: the sixth problem


Table 14.1 gives finally a review of the various models of type “split level”.
Table 14.1 (Special cases of the general linear model of type condi-
tional equations with unknowns (general Gauss-Helmert model)):
$$\left.\begin{aligned} B_1i &= B_1y - c_1 \\ A_2x + B_2i &= B_2y - c_2 \\ A_3x &= -c_3 \end{aligned}\right\} \;\Leftrightarrow\; \begin{bmatrix} 0 \\ A_2 \\ A_3 \end{bmatrix}x + \begin{bmatrix} B_1 \\ B_2 \\ 0 \end{bmatrix}i = \begin{bmatrix} B_1 \\ B_2 \\ 0 \end{bmatrix}y - \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$$

Ax = y Ax + i = y AX = By Ax + Bi = By Bi = By Ax = By - c Ax + Bi = By - c Bi = By - c
y  R(A) y  R (A) By  R(A) By  R(A) By  c + R(A) By  c + R(A)

A2 = A A2 = A A2 = A A2 = A A2 = 0 A2 = A A2 = A A2 = 0
A3 = 0 A3 = 0 A3 = 0 A3 = 0 A3 = 0 A3 = 0 A3 = 0 A3 = 0
B1 = 0 B1 = 0 B1 = 0 B1 = 0 B1 = 0 B1 = 0 B1 = 0 B1 = B
B2 = I B2 = I B2 = B B2 = B B2 = 0 B2 = B B2 = B B2 = 0
(i = 0) (i = 0) (i = 0)
c1 = 0 c1 = 0 c1 = 0 c1 = 0 c1 = 0 c1 = 0 c1 = 0 c1 = c
c2 = 0 c2 = 0 c2 = 0 c2 = 0 c2 = 0 c2 = c c2 = c c2 = 0
c3 = 0 c3 = 0 c3 = 0 c3 = 0 c3 = 0 c3 = 0 c3 = 0 c3 = 0

Example 14.1
As an example of a partitioned general linear system of equations of type
$Ax + Bi = By - c$ we treat a planar triangle whose coordinates are derived from three
distance measurements under a datum condition. As approximate coordinates for
the three points we choose
$$x_\alpha = \sqrt{3}/2,\ y_\alpha = 1/2, \quad x_\beta = \sqrt{3},\ y_\beta = 1, \quad x_\gamma = \sqrt{3}/2,\ y_\gamma = 3/2,$$
such that the linearized observation equations can be represented as
$$A_2x = y \quad (B_2 = I,\ c_2 = 0,\ y \in \mathcal{R}(A_2)),$$
$$A_2 = \begin{bmatrix} -\sqrt{3}/2 & -1/2 & \sqrt{3}/2 & 1/2 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 & 1 \\ 0 & 0 & \sqrt{3}/2 & -1/2 & -\sqrt{3}/2 & 1/2 \end{bmatrix}.$$
The three degrees of freedom of the network, of type translation and rotation, are fixed by three conditions:
$$A_3x = c_3 \quad (c_3 \in \mathcal{R}(A_3)), \qquad A_3 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \quad c_3 = \begin{bmatrix} 0.01 \\ 0.02 \\ 0.01 \end{bmatrix},$$
especially with the rank and order conditions
$$\mathrm{rk}\,A_3 = m - \mathrm{rk}\,A_2 = 3, \quad O(A_2) = n\times m = 3\times 6, \quad O(A_3) = (m - \mathrm{rk}\,A_2)\times m = 3\times 6,$$
and the $6\times 6$ matrix $[A_2', A_3']'$ is of full rank. Choose the observation vector
$$y = [10^{-4},\ 5\times 10^{-4},\ -4\times 10^{-4}]'$$
and find the solution
$$x = \begin{bmatrix} A_2 \\ A_3 \end{bmatrix}^{-1}\begin{bmatrix} y \\ c_3 \end{bmatrix},$$
in detail:
$$x = [0.01,\ 0.02,\ 0.00968,\ 0.02075,\ 0.01,\ 0.02050]'.$$
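Example 14.1 can be verified numerically. The sketch below assumes the reconstructed signs of $A_2$, $c_3$ and $y$ given above; since $[A_2', A_3']'$ has full rank 6, the solution follows from a single linear solve.

```python
# Numerical check of Example 14.1: solve [A2; A3] x = [y; c3]
import numpy as np

s = np.sqrt(3.0) / 2.0
A2 = np.array([[-s, -0.5,  s,  0.5,  0.0, 0.0],
               [0.0, -1.0, 0.0, 0.0, 0.0, 1.0],
               [0.0,  0.0,  s, -0.5, -s,  0.5]])
A3 = np.array([[1.0, 0, 0, 0, 0, 0],
               [0, 1.0, 0, 0, 0, 0],
               [0, 0, 0, 0, 1.0, 0]])
y  = np.array([1e-4, 5e-4, -4e-4])
c3 = np.array([0.01, 0.02, 0.01])

x = np.linalg.solve(np.vstack([A2, A3]), np.concatenate([y, c3]))
print(x)   # compare with x = [0.01, 0.02, 0.00968, 0.02075, 0.01, 0.02050]'
```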
15 Special problems of algebraic regression and stochastic esti-
mation: multivariate Gauss-Markov model, the n-way clas-
sification model, dynamical systems
Up to now, we have only considered a "univariate Gauss-Markov model". Its
generalization towards a multivariate Gauss-Markov model will be given in
Chapter 15-1. At first, we define a multivariate linear model by Definition 15.1
by giving its first and second order moments. Its algebraic counterpart via multi-
variate LESS is the subject of Definition 15.2. Lemma 15.3 characterizes the multi-
variate LESS solution. Its multivariate Gauss-Markov counterpart is given by
Theorem 15.4. In case we have constraints in addition, we define by Definition
15.5 what we mean by a "multivariate Gauss-Markov model with constraints".
The complete solution by means of the "multivariate Gauss-Markov model with
constraints" is given by Theorem 15.6.
In contrast, by means of a MINOLESS solution we present the celebrated “n-way
classification model”. Examples are given for a 1-way classification model, for a
2-way classification model without interaction, for a 2-way classification model
with interaction with all numerical details for computing the reflexive, symmetric
generalized inverse ( A cA ) rs . The higher classification with interaction is finally
reviewed. We especially deal with the problem how to compute a basis of unbi-
ased estimable quantities from biased solutions.
Finally, we take account of the fact that in addition to observational models, we
have dynamical system equations. Additionally, we therefore review the Kalman
Filter (Kalman - Bucy Filter). Two examples from tracking a satellite orbit and
from statistical quality control are given. In detail, we define the stochastic proc-
ess of type ARMA and ARIMA. A short introduction on “dynamical system the-
ory” is presented. By two examples we illustrate the notions of “a steerable
state” and of “observability”. A careful review of the conditions “steerability” by
Lemma 15.7 and “observability” by Lemma 15.8 is presented. Traditionally the
state differential equation as well as the observational equation are solved by a
typical Laplace transformation which we will review shortly. At the end, we
focus on the modern theory of dynamic nonlinear models and comment on the
theory of chaotic behaviour as its up-to date counterpart.
15-1 The multivariate Gauss-Markov model – a special problem of
probabilistic regression –
Let us introduce the multivariate Gauss-Markov model as a special problem of
the probabilistic regression. If for one matrix A of dimension O( A) = n × m in a
Gauss-Markov model instead of one vector of observations several observation
vectors y i of dimension O ( y i ) = n × p with identical variance-covariance matrix
Ȉij are given and the fixed array of parameters ȟ i has to be determined, the
model is referred to as a

Multivariate Gauss-Markov model.


The standard Gauss-Markov model is then called a univariate Gauss-Markov
model. The analysis of variance-covariance is applied afterwards to a multivari-
ate model if the effect of factors can be referred to not only by one characteristic
of the phenomenon to be observed, but by several characteristics. Indeed this is
the multivariate analysis of variance-covariance. For instance, the effects of
different regions on the effect of a species of animals are to be investigated, the
weight of the animals can serve as one characteristic and the height of the ani-
mals as a second one.
Multivariate models can also be setup, if observations are repeated at different
times, in order to record temporal changes of a phenomenon. If measurements in
order to detect temporal changes of manmade constructions are repeated with
identical variance-covariance matrices under the same observational program,
the matrix A of coefficients in the Gauss-Markov model stays the same for each
repetition and each repeated measurement corresponds to one characteristic.
Definition 15.1 (multivariate Gauss-Markov model):
Let the matrix A of the order n × m be given, called the first or-
der design matrix, let ȟ i denote the matrix of the order m × p of
fixed unknown parameters, and let y i be the matrix of the order
n × p called the matrix of observations subject to p d n . Then
we speak of a “multivariate Gauss-Markov model” if
$$E\{y_i\} = A\xi_i \quad\text{subject to}\quad O\{y_i\} = n\times p,\ O\{A\} = n\times m,\ O\{\xi_i\} = m\times p \qquad (15.1)$$
$$D\{y_i, y_j\} = I_n\sigma_{ij} \quad\text{subject to}\quad O\{[\sigma_{ij}]\} = p\times p \ \text{and p.d.} \qquad (15.2)$$
for all $i, j \in \{1, \dots, p\}$ apply for a second order statistics. Equivalent vector and matrix forms are
$$E\{Y\} = A\Xi \qquad (15.3) \qquad\qquad E\{\mathrm{vec}\,Y\} = (I_p\otimes A)\,\mathrm{vec}\,\Xi \qquad (15.4)$$
$$D\{\mathrm{vec}\,Y\} = \Sigma\otimes I_n \qquad (15.5) \qquad\qquad d\{\mathrm{vec}\,Y\} = d(\Sigma\otimes I_n) \qquad (15.6)$$
subject to
$$O\{\Sigma\} = p\times p,\quad O\{\Sigma\otimes I_n\} = np\times np,\quad O\{\mathrm{vec}\,Y\} = np\times 1,\quad O\{Y\} = n\times p,$$
$$O\{D\{\mathrm{vec}\,Y\}\} = np\times np,\quad O\{d(\mathrm{vec}\,Y)\} = np(np+1)/2.$$

The matrix $D\{\mathrm{vec}\,Y\}$ builds up the second order design matrix as the Kronecker-Zehfuss product of $\Sigma$ and $I_n$.

In the multivariate Gauss-Markov model both the matrices ȟ i and V ij or ; and


Ȉ are unknown. An algebraic equivalent of the multivariate linear model would
read as given by Definition 15.2.
Definition 15.2 (multivariate linear model):
Let the matrix A of the order n × m be given, called first order
algebraic design matrix, let x i denote the matrix of the order of
fixed unknown parameter, and y i be the matrix of order
n × p called the matrix of observations subject to p d n . Then
we speak of an algebraic multivariate linear model if
$$\sum_{i=1}^{p}\|y_i - Ax_i\|_{G_y}^2 = \min_{x_i} \;\sim\; \|\mathrm{vec}\,Y - (I_p\otimes A)\,\mathrm{vec}\,X\|_{G_{\mathrm{vec}\,Y}}^2 = \min, \qquad (15.7)$$
establishing a $G_y$- or $G_{\mathrm{vec}\,Y}$-weighted least squares solution of type multivariate LESS.
It is a standard solution of type multivariate LESS if
$$X = (A'G_yA)^-A'G_yY \quad\text{subject to}\quad O\{X\} = m\times p,\ O\{A\} = n\times m,\ O\{Y\} = n\times p, \qquad (15.8)$$
which nicely demonstrates that the multivariate LESS solution is built on a series
of univariate LESS solutions. If the matrix A is regular in the sense of
$\mathrm{rk}(A'G_yA) = \mathrm{rk}\,A = m$, our multivariate solution reads
$$X = (A'G_yA)^{-1}A'G_yY, \qquad (15.9)$$
excluding any rank deficiency caused by a datum problem. Such a result may be obtained by fixing the datum parameters of type translation (3 parameters at any epoch), rotation (3 parameters at any epoch) and scale (1 parameter at any epoch). These parameters make up the seven parameter conformal group $C_7(3)$ at any epoch in a three-dimensional Euclidean space (pseudo-Euclidean space).
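A minimal sketch of the standard multivariate LESS solution (15.8)/(15.9): every column of Y is fitted with the same design matrix A and weight matrix $G_y$. The Moore-Penrose inverse used below is one admissible g-inverse (it yields the minimum norm solution) and reduces to the regular inverse in the full-rank case; the function name is ours.

```python
# Multivariate LESS: X = (A' G_y A)^- A' G_y Y, applied column by column of Y at once.
import numpy as np

def multivariate_less(A, Y, G_y=None):
    n = A.shape[0]
    G = np.eye(n) if G_y is None else G_y
    N = A.T @ G @ A
    return np.linalg.pinv(N) @ A.T @ G @ Y   # pinv covers the rank-deficient case

# usage: A (n x m), Y (n x p)  ->  X (m x p), one univariate LESS per column of Y
```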
Lemma 15.3 (general multivariate linear model):
A general multivariate linear model is multivariate LESS if
p, p

¦ (y i  Ax i )Gij ( y j  Ax j ) = min (15.10)


x
i , j =1

or
n m, m p , p

¦ ¦ ¦ ( yD i  aDE xE i )G ij ( yD j  aDJ xJ j ) = min (15.11)


x
=1
D E J , i, j

or
(vec Y  (I p … A) vec X)c(I n … G y (vec Y  (I p … A) vec X) = min . (15.12)
x

An array X , dim X = m × p is multivariate LESS, if


458 15 Special problems of algebraic regression and stochastic estimation

vec X = [( I p … A )c( I n … G Y ) ( I p … A )]1 ( I p … A ) c( I n … G Y ) vec Y (15.13)


and
rk(I n … G Y ) = np. (15.14)

Thanks to the weight matrix Gij the multivariate least squares solution (15.3)
differs from the special univariate model (15.9). The analogue to the general
LESS model (15.10)-(15.12) of type multivariate BLUUE is given next.
Theorem 15.4 (multivariate Gauss-Markov model of type $\xi_i$, in particular $(\Sigma, I_n)$-BLUUE):
A multivariate Gauss-Markov model is $(\Sigma, I_n)$-BLUUE if the vector $\mathrm{vec}\,\Xi$ of an array $\Xi$, $\dim\Xi = m\times p$, $\dim(\mathrm{vec}\,\Xi) = mp\times 1$, of unknowns is estimated by
$$\mathrm{vec}\,\hat{\Xi} = [(I_p\otimes A)'(\Sigma\otimes I_n)^{-1}(I_p\otimes A)]^{-1}(I_p\otimes A)'(\Sigma\otimes I_n)^{-1}\,\mathrm{vec}\,Y \qquad (15.15)$$
subject to
$$\mathrm{rk}(\Sigma\otimes I_n)^{-1} = np. \qquad (15.16)$$
$\Sigma \sim \sigma_{ij}$ denotes the variance-covariance matrix of the multivariate effects $y_{\alpha i}$ for all $\alpha = 1, \dots, n$ and $i = 1, \dots, p$. An unbiased estimator of the variance-covariance matrix of multivariate effects is
$$\hat{\sigma}_i^2 = \frac{1}{n-q}(y_i - A\hat{\xi}_i)'(y_i - A\hat{\xi}_i)\ \ (i = j), \qquad \hat{\sigma}_{ij} = \frac{1}{n-q}(y_i - A\hat{\xi}_i)'(y_j - A\hat{\xi}_j)\ \ (i \neq j) \qquad (15.17)$$
because of
$$E\{(y_i - A\hat{\xi}_i)'(y_j - A\hat{\xi}_j)\} = E\{y_i'(I - A(A'A)^{-1}A')y_j\} = \sigma_{ij}(n - q). \qquad (15.18)$$

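A small numerical sketch (with arbitrary test data, not from the book) illustrating (15.15): because the dispersion matrix has the Kronecker structure $\Sigma\otimes I_n$, the BLUUE of $\mathrm{vec}\,\Xi$ collapses column by column to the ordinary estimator $(A'A)^{-1}A'Y$, i.e. it does not depend on $\Sigma$.

```python
# Check that (15.15) with dispersion Sigma kron I_n equals columnwise (A'A)^{-1} A' Y.
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 8, 3, 2
A = rng.normal(size=(n, m))
Y = rng.normal(size=(n, p))
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])

IA = np.kron(np.eye(p), A)                  # (I_p kron A)
S  = np.kron(Sigma, np.eye(n))              # Sigma kron I_n
vecY = Y.T.ravel()                          # vec Y (columns of Y stacked)
xi_vec = np.linalg.solve(IA.T @ np.linalg.inv(S) @ IA,
                         IA.T @ np.linalg.inv(S) @ vecY)     # eq. (15.15)
xi_col = np.linalg.solve(A.T @ A, A.T @ Y)                   # columnwise LESS

print(np.allclose(xi_vec, xi_col.T.ravel()))                  # True
```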
A nice example is given in K.R. Koch (1988 pp. 281-286). For practical applica-
tions we need the incomplete multivariate models which do not allow a full rank
matrix V ij . For instance, in the standard multivariate model, it is assumed that
the matrix A of coefficients has to be identical for p vectors y i and the vectors
y i have to be completely given.
If due to a change in the observational program in the case of repeated measure-
ments or due to a loss of measurements, these assumptions are not fulfilled, an
incomplete multivariate model results. If all the matrices of coefficients are dif-
ferent, but if p vectors y i of observations agree with their dimension, the vari-
ance-covariance matrix Ȉ and the vectors ȟ i of first order parameters can be
iteratively estimated.

For example, if the parameters of first order, namely ȟ i , and the parameters of
second order, namely V ij , the elements of the variance-covariance matrix, are
unknown, we may use the hybrid estimation of first and second order parame-
ters of type {ȟ i , V ij } as outlined in Chapter 3, namely Helmert type simultaneous
estimation of {ȟ i , V ij } (B. Schaffrin 1983, p.101).
An important generalization of the standard multivariate Gauss-Markov model
taking into account constraints, for instance caused by rank definitions, e.g. the
datum problem at r epochs, is the
multivariate Gauss-Markov model
with constraints
which we will treat at the end.
Definition 15.5 (multivariate Gauss-Markov model with constraints):
If in a multivariate model (15.1) and (15.2) the vectors ȟ i of pa-
rameters of first order are subject to constraints
Hȟ i = w i , (15.19)
where H denotes the r × m matrix of known coefficients with the
restriction
H ( A cA)  A cA = H, rk H = r d m (15.20)
and w i known r × 1 vectors, then
E{y i } = Aȟ i , (15.21)
D{y i , y j } = I nV ij (15.22)
subject to
Hȟ i = w i (15.23)
is called “the multivariate Gauss-Markov model with linear ho-
mogeneous constraints”. If the p vectors w i are collected in the
r × p matrix W , dim W = r × p, the corresponding matrix model
reads
E{Y} = A;, D{vec Y} = Ȉ … I n , HȄ = W (15.24)
subject to
O{Ȉ} = p × p, O{Ȉ … I n } = np × np,
O{vec Y} = np × 1, O{Y} = n × p
(15.25)
O{D{vec Y}} = np × np,
O{H} = r × m, O{Ȅ} = m × p, O{W} = r × p.

The vector forms


E{vec Y} = (I p … A) vec Ȅ, '{vec Y} = Ȉ … I n , vec W = (I p … H) vec Ȅ
are equivalent to the matrix forms.

A key result is Theorem 15.6, in which we solve, for a given multivariate weight
matrix $G_{ij}$ - being equivalent to $(\Sigma\otimes I_n)^{-1}$ - a multivariate LESS problem.
Theorem 15.6 (multivariate Gauss-Markov model with constraints):
A multivariate Gauss-Markov model with linear homogeneous constraints is $(\Sigma, I_n)$-BLUUE if
$$\mathrm{vec}\,\hat{\Xi} = (I_p\otimes(A'A)^-A')\,\mathrm{vec}\,Y + (I_p\otimes(A'A)^-H'(H(A'A)^-H')^{-1})\,\mathrm{vec}\,W - (I_p\otimes(A'A)^-H'(H(A'A)^-H')^{-1}H(A'A)^-A')\,\mathrm{vec}\,Y \qquad (15.26)$$
or
$$\hat{\Xi} = (A'A)^-\big(A'Y + H'(H(A'A)^-H')^{-1}(W - H(A'A)^-A'Y)\big). \qquad (15.27)$$
An unbiased estimation of the variance-covariance matrix $\Sigma$ is
$$\hat{\Sigma} = \frac{1}{n-m+r}\big\{(A\hat{\Xi} - Y)'(A\hat{\Xi} - Y) + (H\hat{\Xi} - W)'(H(A'A)^-H')^{-1}(H\hat{\Xi} - W)\big\}. \qquad (15.28)$$
15-2 n-way classification models


Another special model is called
n-way classification model.
We will define it and show how to solve its basic equations. Namely, we begin
with the 1-way classification and to continue with the 2- and 3-way classification
models. A specific feature of any classification model is the nature of the specific
unknown vector which is either zero or one. The methods to solve the normal
equation vary: In one approach, one assumption is that the unknown vector of
zeros or ones is a fixed effect. The corresponding normal equations are solved
by standard MINOLESS, weighted or not. Alternatively, one assumes that the
parameter vector consists of random effects. Methods of variance-covariance
component estimation are applied.
Here we only follow a MINOLESS approach, weighted or not. The interested
reader of the alternative technique of variance-covariance component estimation
is referred to our Chapter 3 or to the literature, for instance H. Ahrens and J.
Laeuter (1974) or S.R. Searle (1971), my favorite.
15-21 A first example: 1-way classification
A one-way classification model is defined by
$$y_{ij} = E\{y_{ij}\} + e_{ij} = \mu + \alpha_i + e_{ij} \qquad (15.29)$$
$$y' := [y_1'\ y_2'\ \dots\ y_{p-1}'\ y_p'], \qquad x' := [\mu\ \alpha_1\ \alpha_2\ \dots\ \alpha_{p-1}\ \alpha_p], \qquad (15.30),\ (15.31)$$
where the parameters $\mu$ and $\alpha_i$ are unknown. It is characteristic for the model
that the coefficients of the unknowns are either one or zero. A MINOLESS
(MInimum NOrm LEast Squares Solution) for the unknown parameters $\{\mu, \alpha_i\}$ is based on
$$\|y - Ax\|_I^2 = \min_x \quad\text{and}\quad \|x\|_I^2 = \min_x,$$
which we build up around a numerical example.


Numerical example: 1-way classification
Here we will investigate data concerning the investment on consumer durables of
people with different levels of education. Assuming that investment is measured
by an index number, namely supposing that available data consist of values of
this index for 7 people: Table 15.1 illustrates a very small example, but adequate
for our purposes.
Table 15.1 (investment indices of seven people):
Level of education number of people Indices Total
1 (High School incomplete) 3 74, 68, 77 219
2 (High School graduate) 2 76, 80 156
3 (College graduate) 2 85,93 178
Total 7 553
A suitable model for these data is
yij = P + Di + eij , (15.32)

where yij is investment index of the jth person in the ith education level, P is a
general mean, Di is the effect on investment of the ith level of education and eij
is the random error term peculiar to yij . For the data of Table 15.1 there are 3
educational levels and i takes the values j = 1, 2,..., ni  1, ni where ni is the
number of observations in the ith educational level, in our case n1 = 3, n2 = 2 and
n3 = 2 in Table 15.1.
Our model is the model for the 1-way classification. In general, the groupings
such as educational levels are called classes and in our model yij as the response
and levels of education as the classes, this is a model we can apply to many
situations.
The normal equations arise from writing the data of Table 15.1 in terms of our
model equation.

$$\begin{bmatrix} 74 \\ 68 \\ 77 \\ 76 \\ 80 \\ 85 \\ 93 \end{bmatrix} = \begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{31} \\ y_{32} \end{bmatrix} = \begin{bmatrix} \mu + \alpha_1 + e_{11} \\ \mu + \alpha_1 + e_{12} \\ \mu + \alpha_1 + e_{13} \\ \mu + \alpha_2 + e_{21} \\ \mu + \alpha_2 + e_{22} \\ \mu + \alpha_3 + e_{31} \\ \mu + \alpha_3 + e_{32} \end{bmatrix}, \qquad O(y) = 7\times 1,$$
or
$$\begin{bmatrix} 74 \\ 68 \\ 77 \\ 76 \\ 80 \\ 85 \\ 93 \end{bmatrix} = y = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} + \begin{bmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{21} \\ e_{22} \\ e_{31} \\ e_{32} \end{bmatrix} = Ax + e_y$$
and
$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}, \quad x = \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix}, \quad O(A) = 7\times 4, \quad O(x) = 4\times 1,$$

with y being the vector of observations and e y the vector of corresponding error
terms. As an inconsistent linear equation
y  e y = Ax, O{y} = 7 × 1, O{A} = 7 × 4, O{x} = 4 × 1
we pose the key question:
?What is the rank of the design matrix A?
Most notable, the first column is 1n and the sum of the other three columns is
also one, namely c 2 + c 3 + c 4 = 1n ! Indeed, we have a proof for a linear depend-
ence: c1 = c 2 + c 3 + c 4 . The rank rk A = 3 is only three which differs from
O{A} = 7 × 4. We have to build in this rank deficiency. For example, we could
postulate the condition x4 = D 3 = 0 eliminating one component of the unknown
vector. A more reasonable approach would be based on the computation of the
symmetric reflexive generalized inverse
such that
$$x_{\ell m} = (A'A)^-_{rs}A'y, \qquad (15.33)$$
which would guarantee a least squares minimum norm solution or a V, S-BLUMBE solution (Best Linear V-Norm Uniformly Minimum Bias S-Norm Estimation) for V = I, S = I, and
$$\mathrm{rk}\,A = \mathrm{rk}\,A'A = \mathrm{rk}(A'A)^-_{rs} = \mathrm{rk}\,A^{+}, \qquad (15.34)$$
$$A'A \ \text{is a symmetric matrix} \;\Rightarrow\; (A'A)^-_{rs} \ \text{is a symmetric matrix},$$
also called the
:rank preserving identity:
:symmetry preserving identity:

We intend to compute xlm for our example.


Table 15.2: 1-way classification, example: normal equation

$$A'A = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 7 & 3 & 2 & 2 \\ 3 & 3 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 2 & 0 & 0 & 2 \end{bmatrix}$$

Full-rank factorization $A'A = DE$, $O\{D\} = 4\times 3$, $O\{E\} = 3\times 4$:
$$D = \begin{bmatrix} 7 & 3 & 2 \\ 3 & 3 & 0 \\ 2 & 0 & 2 \\ 2 & 0 & 0 \end{bmatrix}, \quad E \ \text{to be determined:}\quad D'A'A = D'DE \;\Rightarrow\; E = (D'D)^{-1}D'A'A$$
$$D'D = \begin{bmatrix} 7 & 3 & 2 & 2 \\ 3 & 3 & 0 & 0 \\ 2 & 0 & 2 & 0 \end{bmatrix}\begin{bmatrix} 7 & 3 & 2 \\ 3 & 3 & 0 \\ 2 & 0 & 2 \\ 2 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 66 & 30 & 18 \\ 30 & 18 & 6 \\ 18 & 6 & 8 \end{bmatrix}$$
compute $(D'D)^{-1}$ and $(D'D)^{-1}D'$:
$$E = (D'D)^{-1}D'A'A = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{bmatrix}$$
$$(A'A)^-_{rs} = E'(EE')^{-1}(D'D)^{-1}D' = \begin{bmatrix} 0.0833 & 0 & 0.0417 & 0.0417 \\ 0 & 0.2500 & -0.1250 & -0.1250 \\ 0.0417 & -0.1250 & 0.3333 & -0.1667 \\ 0.0417 & -0.1250 & -0.1667 & 0.3333 \end{bmatrix}$$
$$(A'A)^-_{rs}A' = \begin{bmatrix} 0.0833 & 0.0833 & 0.0833 & 0.1250 & 0.1250 & 0.1250 & 0.1250 \\ 0.2500 & 0.2500 & 0.2500 & -0.1250 & -0.1250 & -0.1250 & -0.1250 \\ -0.0833 & -0.0833 & -0.0833 & 0.3750 & 0.3750 & -0.1250 & -0.1250 \\ -0.0833 & -0.0833 & -0.0833 & -0.1250 & -0.1250 & 0.3750 & 0.3750 \end{bmatrix}$$
$$x_{\ell m} = (A'A)^-_{rs}A'y = \begin{bmatrix} 60.0 \\ 13.0 \\ 18.0 \\ 29.0 \end{bmatrix}.$$
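The computation of Table 15.2 is easily reproduced. The sketch below follows the full-rank factorization $A'A = DE$ used above and also checks that the resulting reflexive symmetric g-inverse coincides, in this example, with the Moore-Penrose inverse.

```python
# Reproduce Table 15.2: MINOLESS of the 1-way classification example.
import numpy as np

A = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
              [1, 0, 1, 0], [1, 0, 1, 0],
              [1, 0, 0, 1], [1, 0, 0, 1]], dtype=float)
y = np.array([74, 68, 77, 76, 80, 85, 93], dtype=float)

N = A.T @ A                               # = [[7,3,2,2],[3,3,0,0],[2,0,2,0],[2,0,0,2]]
D = N[:, :3]                              # full-rank factorization A'A = D E
E = np.linalg.solve(D.T @ D, D.T @ N)
N_rs = E.T @ np.linalg.inv(E @ E.T) @ np.linalg.inv(D.T @ D) @ D.T
x_lm = N_rs @ A.T @ y

print(x_lm)                                   # [60. 13. 18. 29.]
print(np.allclose(N_rs, np.linalg.pinv(N)))   # True
```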

Summary
The general formulation of our 1-way classification problem is generated by
identifying the vector of responses as well as the vector of parameters:

Table 15.3: 1-way classification
$$y' := [y_{11}\, y_{12} \dots y_{1(n_1-1)}\, y_{1n_1} \,|\, y_{21}\, y_{22} \dots y_{2(n_2-1)}\, y_{2n_2} \,|\, \dots \,|\, y_{p1}\, y_{p2} \dots y_{p(n_p-1)}\, y_{pn_p}]$$
$$x' := [\mu\ \alpha_1\ \alpha_2 \dots \alpha_{p-1}\ \alpha_p]$$
$$A := \begin{bmatrix} 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{bmatrix}, \qquad O(A) = n\times(p+1),$$
$$n = n_1 + n_2 + \dots + n_p = \sum_{i=1}^{p}n_i \qquad (15.35)$$
experimental design: number of observations $n = n_1 + n_2 + \dots + n_p$, number of parameters $1 + p$, rank of the design matrix $1 + (p-1) = p$ (15.36)

:MINOLESS:
$$\|y - Ax\|^2 = \min_x \quad\text{and}\quad \|x\|^2 = \min_x \qquad (15.37),\ (15.38)$$
$$x_{\ell m} = (A'A)^-_{rs}A'y. \qquad (15.39)$$


15-22 A second example: 2-way classification without interaction
A two-way classification model without interaction is defined by
$$y_{ijk} = E\{y_{ijk}\} + e_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk} \qquad (15.40)$$
$$y' := [y_{11}',\, y_{21}',\, \dots,\, y_{p1}',\, y_{12}',\, y_{22}',\, \dots,\, y_{pq}']$$
$$x' = [\mu, \alpha_1, \dots, \alpha_p, \beta_1, \dots, \beta_q] \qquad (15.41)$$
and "MINOLESS"
$$\|y - Ax\|_I^2 = \min_x \quad\text{and}\quad \|x\|^2 = \min_x. \qquad (15.42),\ (15.43)$$

The factor A appears in p levels and the factor B in g levels. If nij denotes the
number of observations under the influence of the ith level of the factor A and
the jth level of the factor B, then the results of the experiment can be condensed

in Table 15.4. If $\alpha_i$ and $\beta_j$ denote the effects of the factors A and B and $\mu$ the mean of all observations, we receive
$$\mu + \alpha_i + \beta_j = E\{y_{ijk}\} \quad\text{for all}\ i\in\{1,\dots,p\},\ j\in\{1,\dots,q\},\ k\in\{1,\dots,n_{ij}\} \qquad (15.44)$$
as our model equation.
Table 15.4 (level of factors):
level of the factor B 1 2 … q
level of factor A 1 n11 n12 … n1q
2 n21 n22 n2q
… … … …
p np1 np2 npq

If $n_{ij} = 0$ for at least one pair $\{i, j\}$, then our experimental design is called incomplete. An experimental design for which $n_{ij}$ is equal for all pairs $\{i, j\}$ is said to be balanced.
The data of Table 15.5 describe such a general model of y ijk observations in the
ith row (brand of stove) and jth column (make of the pan), P is the mean, Di is
the effect of the ith row, E j is the effect of the jth column, and eijk is the error
term.
Outside the context of rows and columns Di is equivalently the effect due to the
ith level of the D factor and E j is the effect due to the jth level of the E factor.
In general, we have p levels of the D factor with i = 1,..., p and q levels of the E
factor with j = 1,..., q : in our example p = 4 and q = 3 .

Table 15.5 (number of seconds beyond 3 minutes, taken to boil 2 quarts of water):

                        Make of pan               number of
                      A      B      C     total   observations   mean
Brand of    X        18     12     24      54          3          18
Stove       Y         -      -      9       9          1           9
            Z         3      -     15      18          2           9
            W         6      3     18      27          3           9
Total                27     15     66     108
number of
observations          3      2      4                   9
mean                  9    7 1/2  16 1/2               12

With balanced data every one of the pq cells in Table 15.5 would have one (or n) observations, and $n \ge 1$ would be the only symbol needed to describe the number of observations in each cell. In our Table 15.5 some cells have zero observations and some have one. We therefore need $n_{ij}$ as the number of observations in the ith row and jth column. Then all $n_{ij} = 0$ or 1, and the numbers of observations are the values of
$$n_{i\cdot} = \sum_{j=1}^{q}n_{ij}, \qquad n_{\cdot j} = \sum_{i=1}^{p}n_{ij}, \qquad n = \sum_{i=1}^{p}\sum_{j=1}^{q}n_{ij}. \qquad (15.45)$$

Corresponding totals and means of the observations are shown, too. For the ob-
servations in Table 15.5 the linear equations of the model are given as follows,

\[
\begin{bmatrix} 18\\ 12\\ 24\\ 9\\ 3\\ 15\\ 6\\ 3\\ 18 \end{bmatrix}
=
\begin{bmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{23}\\ y_{31}\\ y_{33}\\ y_{41}\\ y_{42}\\ y_{43} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 0\\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 1\\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1\\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0\\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 1\\
1 & 0 & 0 & 0 & 1 & 1 & 0 & 0\\
1 & 0 & 0 & 0 & 1 & 0 & 1 & 0\\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \alpha_3\\ \alpha_4\\ \beta_1\\ \beta_2\\ \beta_3 \end{bmatrix}
+
\begin{bmatrix} e_{11}\\ e_{12}\\ e_{13}\\ e_{23}\\ e_{31}\\ e_{33}\\ e_{41}\\ e_{42}\\ e_{43} \end{bmatrix},
\]

where zeros are often printed as dots. In summary,

\[
x = [\,\mu, \alpha_1, \alpha_2, \alpha_3, \alpha_4, \beta_1, \beta_2, \beta_3\,]',\qquad
O(A) = 9 \times 8,\qquad O(x) = 8 \times 1,
\]

with y being the vector of observations and e_y the vector of corresponding error terms. As an inconsistent linear equation

\[
y - e_y = Ax,\qquad O\{y\} = 9 \times 1,\quad O\{A\} = 9 \times 8,\quad O\{x\} = 8 \times 1,
\]
we pose the key question:
? What is the rank of the design matrix A ?

Most notably, the first column is 1_n, the sum of the next 4 columns is also 1_n, and the sum of the remaining 3 columns is 1_n, too, namely

c₂ + c₃ + c₄ + c₅ = 1_n and c₆ + c₇ + c₈ = 1_n.

The rank rk A = 1 + (p − 1) + (q − 1) = 1 + 3 + 2 = 6 is only six, which differs from O{A} = 9 × 8. We have to take account of this rank deficiency. For example, we could postulate the conditions x₅ = 0 and x₈ = 0, eliminating two components of the unknown vector. A more reasonable approach would be based on the computation of the

symmetric reflexive generalized inverse

such that

\[
x_{lm} = (A'A)^{-}_{rs} A'y, \qquad (15.46)
\]

which would guarantee a least squares, minimum norm solution or a I, I–BLUMBE solution (Best Linear I–Norm Uniformly Minimum Bias I–Norm Estimation) and

\[
\operatorname{rk} A = \operatorname{rk} A'A = \operatorname{rk} (A'A)^{-}_{rs} = \operatorname{rk} A' \qquad (15.47)
\]

\[
A'A \text{ is a symmetric matrix} \;\Rightarrow\; (A'A)^{-}_{rs} \text{ is a symmetric matrix},
\]

also called

:rank preserving identity:
:symmetry preserving identity:

Table 15.6: 2-way classification without interaction, example: normal equations

With the columns (and rows) of A'A ordered as μ, α₁, α₂, α₃, α₄, β₁, β₂, β₃, the product A'A = A'·A reads

\[
A'A =
\begin{bmatrix}
9 & 3 & 1 & 2 & 3 & 3 & 2 & 4\\
3 & 3 & 0 & 0 & 0 & 1 & 1 & 1\\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1\\
2 & 0 & 0 & 2 & 0 & 1 & 0 & 1\\
3 & 0 & 0 & 0 & 3 & 1 & 1 & 1\\
3 & 1 & 0 & 1 & 1 & 3 & 0 & 0\\
2 & 1 & 0 & 0 & 1 & 0 & 2 & 0\\
4 & 1 & 1 & 1 & 1 & 0 & 0 & 4
\end{bmatrix}.
\]
\[
A'A = DE,\qquad O\{D\} = 8 \times 6,\quad O\{E\} = 6 \times 8,
\]

where D consists of six linearly independent columns of A'A (those associated with μ, α₁, α₂, α₃, β₁, β₂),

\[
D =
\begin{bmatrix}
9 & 3 & 1 & 2 & 3 & 2\\
3 & 3 & 0 & 0 & 1 & 1\\
1 & 0 & 1 & 0 & 0 & 0\\
2 & 0 & 0 & 2 & 1 & 0\\
3 & 0 & 0 & 0 & 1 & 1\\
3 & 1 & 0 & 1 & 3 & 0\\
2 & 1 & 0 & 0 & 0 & 2\\
4 & 1 & 1 & 1 & 0 & 0
\end{bmatrix},\qquad E \text{ to be determined,}
\]

\[
D'A'A = D'DE \;\Rightarrow\; (D'D)^{-1}D'A'A = E
\]

\[
D'D =
\begin{bmatrix}
133 & 45 & 14 & 29 & 44 & 28\\
45 & 21 & 4 & 8 & 15 & 11\\
14 & 4 & 3 & 3 & 3 & 2\\
29 & 8 & 3 & 10 & 11 & 4\\
44 & 15 & 3 & 11 & 21 & 8\\
28 & 11 & 2 & 4 & 8 & 10
\end{bmatrix}.
\]

Compute (D'D)⁻¹ and (D'D)⁻¹D'; then

\[
E = (D'D)^{-1}D'A'A =
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 1\\
0 & 1 & 0 & 0 & -1 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & -1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & -1\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1
\end{bmatrix}
\]

\[
(A'A)^{-}_{rs} = E'(EE')^{-1}(D'D)^{-1}D' =
\]
ª 0.0665 -0.0360 0.1219 0.0166 -0.0360 0.0222 0.0748 -0.0305 º
« -0.0360 0.3112 -0.2327 -0.0923 -0.0222 -0.0120 -0.0822 0.0582 »
« 0.1219 -0.2327 0.8068 -0.2195 -0.2327 0.1240 0.1371 -0.1392 »
« »
0.0166 -0.0923 -0.2195 0.4208 -0.0923 -0.0778 0.1020 -0.0076 »

« -0.0360 -0.0222 -0.2327 -0.0923 0.3112 -0.0120 -0.0822 0.0582 »
« 0.0222 -0.0120 0.1240 -0.0778 -0.0120 0.2574 -0.1417 -0.0935 »
« »
« 0.0748 -0.0822 0.1371 0.1020 -0.0822 -0.1417 0.3758 -0.1593 »
«¬ -0.0305 0.0582 -0.1392 -0.0076 0.0582 -0.0935 -0.1593 0.2223 »¼
(A'A)⁻_rs A′ =
ª 0.0526 0.1053 0.0000 0.1579 0.1053 0.0526 0.0526 0.1053 0.0000 º
« 0.2632 0.1930 0.3333 -0.2105 -0.1404 -0.0702 -0.0702 -0.1404 0.0000»
« 0.0132 0.0263 -0.2500 0.7895 0.0263 -0.2368 0.0132 0.0263 -0.2500 »
« »
0.1535 0.0263 -0.0833 -0.2105 0.3596 0.4298 -0.1535 0.0263 -0.0833 »

«-0.0702 -0.1404 0.0000 -0.2105 -0.1404 -0.0702 0.2632 0.1930 0.3333 »
« 0.2675 -0.1316 -0.0833 0.0526 0.2018 -0.1491 0.2675 -0.1316 -0.0833 »
« »
«-0.1491 0.3684 -0.1667 0.0526 0.0351 0.0175 -0.1491 0.3684 -0.1667 »
¬«-0.0658 -0.1316 0.2500 0.0526 -0.1316 0.1842 -0.0658 -0.1316 0.2500 ¼»
\[
x_{lm} = (A'A)^{-}_{rs} A'y =
\begin{bmatrix}
5.3684\\ 10.8421\\ -6.1579\\ -1.1579\\ 1.8421\\ -0.2105\\ -4.2105\\ 9.7895
\end{bmatrix}.
\]
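As a numerical cross-check of this example, the following short script (a sketch in Python/NumPy, not part of the original computation) rebuilds the 9 × 8 design matrix of Table 15.5, computes the minimum norm least squares solution via the Moore-Penrose inverse — which coincides with (A'A)⁻_rs A'y for the reflexive symmetric g-inverse constructed above — and also checks the rank factorization formula.

```python
import numpy as np

# design matrix of the 2-way classification without interaction (Table 15.5);
# columns: mu, alpha_1..alpha_4, beta_1..beta_3
A = np.array([
    [1, 1, 0, 0, 0, 1, 0, 0],   # y_11
    [1, 1, 0, 0, 0, 0, 1, 0],   # y_12
    [1, 1, 0, 0, 0, 0, 0, 1],   # y_13
    [1, 0, 1, 0, 0, 0, 0, 1],   # y_23
    [1, 0, 0, 1, 0, 1, 0, 0],   # y_31
    [1, 0, 0, 1, 0, 0, 0, 1],   # y_33
    [1, 0, 0, 0, 1, 1, 0, 0],   # y_41
    [1, 0, 0, 0, 1, 0, 1, 0],   # y_42
    [1, 0, 0, 0, 1, 0, 0, 1],   # y_43
], dtype=float)
y = np.array([18, 12, 24, 9, 3, 15, 6, 3, 18], dtype=float)

print(np.linalg.matrix_rank(A))           # 6 = 1 + (p-1) + (q-1)

# MINOLESS: minimum norm least squares solution x_lm = A^+ y = (A'A)^+ A'y
x_lm = np.linalg.pinv(A) @ y
print(np.round(x_lm, 4))                  # [5.3684 10.8421 -6.1579 -1.1579 1.8421 -0.2105 -4.2105 9.7895]

# the same solution via a rank factorization A'A = D E (D: six independent columns of A'A)
N = A.T @ A
D = N[:, [0, 1, 2, 3, 5, 6]]              # columns mu, alpha_1..alpha_3, beta_1, beta_2
E = np.linalg.inv(D.T @ D) @ D.T @ N
N_rs = E.T @ np.linalg.inv(E @ E.T) @ np.linalg.inv(D.T @ D) @ D.T
print(np.allclose(N_rs, np.linalg.pinv(N)))   # True: reflexive symmetric g-inverse = pseudoinverse
print(np.round(N_rs @ A.T @ y, 4))            # reproduces x_lm
```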

Summary

The general formulation of our 2-way classification problem without interaction is generated by identifying the vector of responses as well as the vector of parameters.

Table 15.7: 2-way classification without interaction

\[
y' := [\,y'_{11}, \dots, y'_{p1}, y'_{12}, \dots, y'_{p\,q-1}, y'_{pq}\,]
\]
\[
x' := [\,\mu, \alpha_1, \dots, \alpha_p, \beta_1, \dots, \beta_q\,]
\]
\[
A := [\,1_n, c_2, \dots, c_{p+1}, c_{p+2}, \dots, c_{p+q+1}\,]
\]

subject to

\[
c_2 + \dots + c_{p+1} = 1_n,\qquad c_{p+2} + \dots + c_{p+q+1} = 1_n
\]

\[
n_{i\cdot} = \sum_{j=1}^{q} n_{ij},\qquad n_{\cdot j} = \sum_{i=1}^{p} n_{ij},\qquad n = \sum_{i=1,\,j=1}^{p,\,q} n_{ij} \qquad (15.48)
\]

experimental design:            number of       rank of the
number of observations          parameters:     design matrix:
n = Σ_{i,j=1}^{p,q} n_ij        1 + p + q       1 + (p−1) + (q−1) = p + q − 1     (15.49)

:MINOLESS:

\[
\| y - Ax \|^2 = \min \quad (15.50) \qquad\text{and}\qquad \| x \|^2 = \min_{x} \quad (15.51)
\]

\[
x_{lm} = (A'A)^{-}_{rs} A'y.
\]


15-22 A third example: 2-way classification with interaction

A two-way classification model with interaction is defined by

\[
y_{ijk} = E\{y_{ijk}\} + e_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk} \qquad (15.52)
\]

subject to

\[
i \in \{1,\dots,p\},\; j \in \{1,\dots,q\},\; k \in \{1,\dots,n_{ij}\},
\]

\[
y' := [\,y'_{11}, \dots, y'_{p-1\,1}, y'_{p1}, y'_{12}, y'_{22}, \dots, y'_{p\,q-1}, y'_{pq}\,],\qquad
x' := [\,\mu, \alpha_1, \dots, \alpha_p, \beta_1, \dots, \beta_q, (\alpha\beta)_{11}, \dots, (\alpha\beta)_{pq}\,] \qquad (15.53)
\]

and solved by :MINOLESS:

\[
\| y - Ax \|^2_{I} = \min \quad (15.54) \qquad\text{and}\qquad \| x \|^2_{I} = \min_{x}. \quad (15.55)
\]

We have seen in the second example on 2-way classification without interaction that the effects of different levels of the factors A and B were additive. An alternative model is a model in which the additivity does not hold: the observations are not independent of each other. Such a model is called a model with interaction between the factors, whose effect (αβ)_ij has to be reflected by means of

\[
\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} = E\{y_{ijk}\} \quad \text{for all } i \in \{1,\dots,p\},\; j \in \{1,\dots,q\},\; k \in \{1,\dots,n_{ij}\} \qquad (15.56)
\]

as our model equation. As an example we consider by means of Table 15.8 a plant breeder carrying out a series of experiments with three fertilizer treatments on each of four varieties of grain. For each treatment-by-variety combination he or she plants several 4′ × 4′ plots. At harvest time she or he finds that many of the plots have been lost due to being wrongly ploughed up, and all he or she is left with are the data of Table 15.8.
Table 15.8 (weight of grain from 4′ × 4′ trial plots):

                                    Variety
Treatment            1           2           3            4         Totals
    1             8, 13, 9       —          12          7, 11
                  30 (n=3)                  12 (n=1)    18 (n=2)       60
    2              6, 12      12, 14         —            —
                  18 (n=2)    26 (n=2)                                 44
    3                —         9, 7       14, 16     10, 14, 11, 13
                              16 (n=2)    30 (n=2)      48 (n=4)       94
Totals               48          42          42           66         198

(Cell totals are denoted y_{ij·}, with n_{ij} observations in the (i, j)th cell.)
With four of the treatment-by-variety combinations there are no data at all, and with the others there are varying numbers of plots, ranging from 1 to 4, with a total of 18 plots in all. Table 15.8 shows the yield of each plot, the total yields, the number of plots in each total and the corresponding mean, for each treatment-variety combination having data. Totals, numbers of observations and means are also shown for the three treatments, the four varieties and for all 18 plots. The symbols for the entries in the table are also shown in terms of the model.

The equations of a suitable linear model for analyzing data of the nature of Table 15.8 are written for y_ijk as the kth observation in the ith treatment and jth variety. In this model, μ is the mean, α_i is the effect of the ith treatment, β_j is the effect of the jth variety, (αβ)_ij is the interaction effect for the ith treatment and the jth variety, and e_ijk is the error term.

With balanced data every one of the pq cells of our table would have n observations. In addition there would be pq levels of the (αβ) factor, the interaction factor. However, with unbalanced data, when some cells have no observations, there are only as many (αβ)_ij levels in the data as there are non-empty cells. Let the number of such cells be s (s = 8 in Table 15.8). Then, if n_ij is the number of observations in the (i, j)th cell of type "treatment i and variety j", s is the number of cells in which n_ij ≠ 0, that is n_ij > 0 for these cells, and
\[
y_{ij\cdot} = \sum_{k=1}^{n_{ij}} y_{ijk},\qquad \bar{y}_{ij} = y_{ij\cdot}/n_{ij} \qquad (15.57)
\]

is the total yield in the (i, j)th cell, and ȳ_ij is the corresponding mean. Similarly,

\[
y = \sum_{i=1}^{p} y_{i\cdot} = \sum_{j=1}^{q} y_{\cdot j} = \sum_{i,j=1}^{p,q} y_{ij\cdot} = \sum_{i=1,\,j=1,\,k=1}^{p,\,q,\,n_{ij}} y_{ijk} \qquad (15.58)
\]

is the total yield for all plots, the number of observations called "plots" therein being

\[
n = \sum_{i=1}^{p} n_{i\cdot} = \sum_{j=1}^{q} n_{\cdot j} = \sum_{i,j}^{p,q} n_{ij}. \qquad (15.59)
\]

We shall continue with the corresponding normal equations being derived from
the observational equations.
(DE )
 
P D1 D 2 D 3 E1 E 2 E 3 E 4 11 13 14 21 22 31 33 34
ª 8 º ª y111 º ª1 1 ˜ ˜ 1 ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ º ª e111 º
«13» « y112 » «1 1 ˜ ˜ 1 ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ » ª P º « e112 »
« 9 » « y113 » «1 1 ˜ ˜ 1 ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ » « D1 » « e113 »
«12 » « y131 » «1 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ » « D 2 » « e131 »
« 7 » « y141 » «1 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ » « D 3 » « e141 »
«11» « y142 » «1 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ » « E1 » « e142 »
« 6 » « y211 » «1 ˜ 1 ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ » « E 2 » « e211 »
«12 » « y212 » «1 ˜ 1 ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ » « E 3 » «e212 »
«12 » « y » «1 ˜ 1 ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ 1 ˜ ˜ ˜ » « E » « e »
«14 » = « y 221 » = «1 ˜ 1 ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ 1 ˜ ˜ ˜ » ˜ « (DE4) » + «e221 »
« 9 » « y222 » «1 ˜ ˜ 1 ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 1 ˜ ˜ » « (DE )11 » « e222 »
« 7 » « y321 » «1 ˜ ˜ 1 ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 1 ˜ ˜ » « (DE )13 » « e321 »
« » « 322 » « » « 14 » « 322 »
14
« » « y331 » «1 ˜ ˜ 1 ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 1 ˜ » « (DE ) 21 » « e331 »
«16 » « y332 » «1 ˜ ˜ 1 ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 1 ˜ » « (DE ) 22 » « e332 »
«10 » « y341 » «1 ˜ ˜ 1 ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 1» « (DE )32 » « e341 »
«14 » « y342 » «1 ˜ ˜ 1 ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 1» « (DE )33 » « e342 »
«11» « y343 » «1 ˜ ˜ 1 ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 1» «¬ (DE )34 »¼ « e343 »
¬«13¼» ««¬ y344 »»¼ ¬«1 ˜ ˜ 1 ˜ ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 1¼» ««¬ e344 ¼»»
where the dots represent zeros.

ª18 6 4 8 5 4 3 6 3 1 2 2 2 2 2 4 º ª P º ª y º ª198º
«6 6 ˜ ˜ 3 ˜ 1 2 3 1 2 ˜ ˜ ˜ ˜ ˜ » « D1 » « y1 » « 60 »
«4 ˜ 4 ˜ 2 2 ˜ ˜ ˜ ˜ ˜ 2 2 ˜ ˜ ˜ » « D 2 » « y2 » « 44 »
«8 ˜ ˜ 8 ˜ 2 2 4 ˜ ˜ ˜ ˜ ˜ 2 2 4 » « D 3 » « y3 » « 94 »
«5 3 2 ˜ 5 ˜ ˜ ˜ 3 ˜ ˜ 2 ˜ ˜ ˜ ˜ » « E1 » « y˜1 » « 48 »
«4 ˜ 2 2 ˜ 4 ˜ ˜ ˜ ˜ ˜ ˜ 2 2 ˜ ˜ » « E 2 » « y˜2 » « 42 »
«3 1 ˜ 2 ˜ ˜ 3 ˜ ˜ 1 ˜ ˜ ˜ ˜ 2 ˜ » «« E 3 »» «« y˜3 »» « 42 »
«6 2 ˜ 4 ˜ ˜ ˜ 6 ˜ ˜ 2 ˜ ˜ ˜ ˜ 4 » « E 4 » = « y˜4 » = « 66 » ~ AcAx = Acy.
«3 3 ˜ ˜ 3 ˜ ˜ ˜ 3 ˜ ˜ ˜ ˜ ˜ ˜ ˜ » « (DE )11 » « y11 » « 30 » A
«1 1 ˜ ˜ ˜ ˜ 1 ˜ ˜ 1 ˜ ˜ ˜ ˜ ˜ ˜ » « (DE )13 » « y13 » « 12 »
«2 2 ˜ ˜ ˜ ˜ ˜ 2 ˜ ˜ 2 ˜ ˜ ˜ ˜ ˜ »» « (DE )14 » « y14 » «« 18 »»
«
«2 ˜ 2 ˜ 2 ˜ ˜ ˜ ˜ ˜ ˜ 2 ˜ ˜ ˜ ˜ » « (DE ) 21 » « y21 » « 18 »
«2 ˜ 2 ˜ ˜ 2 ˜ ˜ ˜ ˜ ˜ ˜ 2 ˜ ˜ ˜ » «(DE ) 22 » « y22 » « 26 »
«2 ˜ ˜ 2 ˜ 2 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 2 ˜ ˜ » « (DE )32 » « y32 » « 16 »
«2 ˜ ˜ 2 ˜ ˜ 2 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 2 ˜ » « (DE )33 » « y33 » « 30 »
¬« 4 ˜ ˜ 4 ˜ ˜ ˜ 4 ˜ ˜ ˜ ˜ ˜ ˜ ˜ 4 ¼» «¬ (DE )34 »¼ «¬ y34 »¼ ¬« 48 ¼»

Now we again pose the key question:

? What is the rank of the design matrix A ?

The first column is 1_n, and for the sums of the other columns we have c₂ + c₃ + c₄ = 1_n and c₅ + c₆ + c₇ + c₈ = 1_n. How do we handle the remaining interaction columns of our incomplete model? Obviously, we experience rk[c₉,…,c₁₆] = 8, namely

\[
\operatorname{rk}[(\alpha\beta)_{ij}] = 8 \quad \text{for } (\alpha\beta)_{ij} \in \{(\alpha\beta)_{11}, \dots, (\alpha\beta)_{34}\}. \qquad (15.60)
\]

As a summary, we have computed rk(A'A) = 8, a surprise for our special case. A more reasonable approach would be based on the computation of the

symmetric reflexive generalized inverse

such that

\[
x_{lm} = (A'A)^{-}_{rs} A'y, \qquad (15.61)
\]

which would assure a minimum norm, least squares solution or a I, I–BLUMBE solution (Best Linear I–Norm Uniformly Minimum Bias I–Norm Estimation) and

\[
\operatorname{rk} A = \operatorname{rk} A'A = \operatorname{rk}(A'A)^{-}_{rs} = \operatorname{rk} A^{+} \qquad (15.62)
\]

\[
A'A \text{ is a symmetric matrix} \;\Rightarrow\; (A'A)^{-}_{rs} \text{ is a symmetric matrix},
\]

also called

:rank preserving identity:
:symmetry preserving identity:

Table 15.9 summarizes all the details of 2-way classification with interaction. In general, for complete models our table lists the general number of parameters and the rank of the design matrix, which differs from our incomplete design model.
Table 15.9: 2-way classification with interaction

\[
y' := [\,y'_{11}, \dots, y'_{p1}, y'_{12}, \dots, y'_{p\,q-1}, y'_{pq}\,]
\]
\[
x' := [\,\mu, \alpha_1, \dots, \alpha_p, \beta_1, \dots, \beta_q, (\alpha\beta)_{11}, \dots, (\alpha\beta)_{pq}\,]
\]
\[
A := [\,1_n, c_2, \dots, c_{p+1}, c_{p+2}, \dots, c_{p+q+1}, c_{11}, \dots, c_{pq}\,]
\]

subject to

\[
c_2 + \dots + c_{p+1} = 1_n,\qquad c_{p+2} + \dots + c_{p+q+1} = 1_n,\qquad \sum_{i,j=1}^{p,q} c_{ij} = 1_n
\]

\[
n = \sum_{i=1}^{p} n_{i\cdot} = \sum_{j=1}^{q} n_{\cdot j} = \sum_{i,j}^{p,q} n_{ij}
\]

experimental design:            number of           rank of the
number of observations          parameters:         design matrix:
n = Σ_{i,j}^{p,q} n_ij          1 + p + q + pq      1 + (p−1) + (q−1) + (p−1)(q−1) = pq      (15.63)

\[
\| y - Ax \|^2 = \min \quad (15.64) \qquad\text{and}\qquad \| x \|^2 = \min_{x} \quad (15.65)
\]

\[
x_{lm} = (A'A)^{-}_{rs} A'y. \qquad (15.66)
\]

For our key example we get from the symmetric normal equations A'Ax_l = A'y the solution

\[
x_{lm} = (A'A)^{-}_{rs} A'y
\]

given A'A and A'y,

\[
O\{A'A\} = 16 \times 16,\qquad O\{x\} = O\{[\mu, \dots, (\alpha\beta)_{34}]'\} = 16 \times 1,\qquad O\{A'y\} = 16 \times 1,
\]

and the rank factorization

\[
A'A = DE,\qquad O\{D\} = 16 \times 8,\quad O\{E\} = 8 \times 16,
\]

where D collects the eight columns of A'A associated with the non-empty cells (αβ)₁₁, …, (αβ)₃₄:
ª3 1 2 2 2 2 2 4º
«3 1 2 0 0 0 0 0»
«0 0 0 2 2 0 0 0»
«0 0 0 0 0 2 2 4»
«3 0 0 2 0 0 0 0»
«0 0 0 0 2 2 0 0»
«0 1 0 0 0 0 2 0»
« 4»
D = «0 0 2 0 0 0 0
3 0 0 0 0 0 0 0»
«0 1 0 0 0 0 0 0»
«0 0 2 0 0 0 0 0»
«0 0 0 2 0 0 0 0»
«0 0 0 0 2 0 0 0»
«0 0 0 0 0 2 0 0»
«0 0 0 0 0 0 2 0»
«0 0 0 0 0 0 0 4 »¼»
¬«
DcA cA = DcDE Ÿ ( DcD) 1 DcA cA = E


E = (DcD) 1 DcA cA =
ª1.0000 1.0000 0.0000 0.0000 1.0000 0.0000 0.0000 º
«1.0000 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000 »
« »
«1.0000 1.0000 0.0000 0.0000 0.0000 0 0.0000 »
«1.0000 0.0000 1.0000 0.0000 1.0000 0.0000 0.0000 »
«1.0000 0 1.0000 0.0000 0 1.0000 0.0000 »
«1.0000 0 0 1.0000 0 1.0000 0 »
« »
«1.0000 0.0000 0.0000 1.0000 0 0.0000 1.0000 »
«¬1.0000 0 0.0000 1.0000 0 0.0000 0 »¼

x Am = ( A cA) rs A cy =
= [6.4602, 1.5543, 2.4425, 2.4634, 0.6943, 1.0579, 3.3540, 1.3540,
1.2912, 0.6315, -0.3685, -0.5969, 3.0394, -1.9815, 2.7224,1.72245]c .
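The computation can again be cross-checked numerically; the following sketch (Python/NumPy, not part of the original text) assembles the 18 × 16 design matrix directly from the cell occupancies of Table 15.8 and recovers the minimum norm least squares solution from the pseudoinverse, which should agree (up to rounding) with the vector just quoted.

```python
import numpy as np

# Table 15.8: observations per (treatment i, variety j) cell; empty cells omitted
cells = {(1, 1): [8, 13, 9], (1, 3): [12], (1, 4): [7, 11],
         (2, 1): [6, 12],    (2, 2): [12, 14],
         (3, 2): [9, 7],     (3, 3): [14, 16], (3, 4): [10, 14, 11, 13]}
p, q = 3, 4
occupied = sorted(cells)                      # s = 8 non-empty cells

rows, y = [], []
for (i, j), obs in cells.items():
    for value in obs:
        row = np.zeros(1 + p + q + len(occupied))
        row[0] = 1.0                                      # mu
        row[i] = 1.0                                      # alpha_i
        row[p + j] = 1.0                                  # beta_j
        row[1 + p + q + occupied.index((i, j))] = 1.0     # (alpha beta)_ij
        rows.append(row)
        y.append(value)
A, y = np.array(rows), np.array(y, dtype=float)

print(A.shape, np.linalg.matrix_rank(A))      # (18, 16), rank 8 = number of non-empty cells
x_lm = np.linalg.pinv(A) @ y                  # MINOLESS, cf. (15.66)
print(np.round(x_lm, 4))
```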

15-24 Higher classifications with interaction

If we generalize 1-way and 2-way classifications with interactions we arrive at a higher classification of type

\[
\mu + \alpha_i + \beta_j + \gamma_k + \dots + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \dots + (\alpha\beta\gamma)_{ijk} + \dots = E\{y_{ijk\dots\ell}\} \qquad (15.67)
\]

for all

\[
i \in \{1,\dots,p\},\; j \in \{1,\dots,q\},\; k \in \{1,\dots,r\},\; \dots,\; \ell \in \{1,\dots,n_{ijk\dots}\}.
\]

An alternative stochastic model assumes a fully occupied variance-covariance matrix of the observations, namely D{y} := E{[y − E{y}][y − E{y}]′} = Σ_y. Variance-covariance estimation techniques are to be applied. In addition, a mixed model for the effects may be applied, for instance of type

\[
E\{y\} = A\xi + C\,E\{z\} \qquad (15.68)
\]
\[
D\{y\} = C\,D\{z\}\,C'. \qquad (15.69)
\]

Here we conclude with a discussion of what is unbiased estimable:

Example: 1-way classification

If we depart from the model E{y_ij} = μ + α_i and note rk A = rk A'A = 1 + (p − 1) = p for i ∈ {1,…,p}, we realize that the 1 + p parameters are not unbiased estimable: the first column results from the sum of the other columns. It is obvious that the difference α_i − α_1 is unbiased estimable. This difference produces a column matrix with full rank.

Summary: the quantities α_i − α_1 are unbiased estimable.
Example: 2-way classification with interaction

Our first statement relates to the unbiased estimability of the terms α₁,…,α_p and β₁,…,β_q: obviously, the differences α_i − α_1 and β_j − β_1 for i, j > 1 are unbiased estimable. The first column is the sum of the other terms. For instance, the second column can be eliminated, which is equivalent to estimating α_i − α_1, in order to obtain a design matrix of full column rank. The same effect can be seen with the other effect β_j for the properly chosen design matrix: β_j − β_1 for all j > 1 is unbiased estimable! If we add the pq effects (αβ)_ij of interactions, only those interactions increase the rank of the design matrix by one, respectively, which refer to the differences α_i − α_1 and β_j − β_1, altogether (p − 1)(q − 1) interactions. For the effects (αβ)_ij of the interactions to be estimable, pq − (p − 1)(q − 1) = p + q − 1 constants may be added, that is to the interactions

(αβ)_{i1}, (αβ)_{i2}, …, (αβ)_{iq}   with   i ∈ {1,…,p}

the constants

Δ(αβ_{·1}), Δ(αβ_{·2}), …, Δ(αβ_{·q})

and to the interactions

(αβ)_{2j}, (αβ)_{3j}, …, (αβ)_{pj}   with   j ∈ {1,…,q}

the constants

Δ(αβ_{2·}), Δ(αβ_{3·}), …, Δ(αβ_{p·}).

The constant Δ(αβ_{1·}) need not be added, which can be interpreted by Δ(αβ_{·1}) = Δ(αβ_{1·}). A numerical example is

p = 2, q = 2,   x′ = [μ, α₁, α₂, β₁, β₂, (αβ)₁₁, (αβ)₁₂, (αβ)₂₂].

Summary

Δα = α₂ − α₁,  Δβ = β₂ − β₁  for all i ∈ {1, 2}, j ∈ {1, 2}    (15.70)

as well as

Δ(αβ_{·1}), Δ(αβ_{·2}), Δ(αβ_{2·})    (15.71)

are unbiased estimable.
At the end we review the number of parameters and the rank of the design matrix for a 3-way classification with interactions according to the following example.

3-way classification with interactions

experimental design:              number of              rank of the
number of observations            parameters:            design matrix:

n = Σ_{i,j,k=1}^{p,q,r} n_ijk     1 + p + q + r +        1 + (p−1) + (q−1) + (r−1) +
                                  + pq + pr + qr +       + (p−1)(q−1) + (p−1)(r−1) +        (15.72)
                                  + pqr                  + (q−1)(r−1) + (p−1)(q−1)(r−1) = pqr

15-3 Dynamical Systems

There are two essential items in the analysis of dynamical systems. First, there exists a "linear or linearized observational equation y(t) = Cz(t)" connecting a vector of stochastic observations y to a stochastic vector z of so-called "state variables". Second, the other essential is the characteristic differential equation of type "z′(t) = F(t, z(t))", especially the linearized "z′(t) = Az(t)", which relates the first derivative of the "state variable" to the "state variable" itself. Both y(t) and z(t) are functions of a parameter, called "time t". The second equation describes the time development of the dynamical system. An alternative formulation of the dynamical system equation is "z(t) = Az(t − 1)". Due to the random nature of the two functions "y(t) = Cz(t)" and "z′ = Az", the complete equations read

\[
E\{y(t)\} = C\,E\{z(t)\} \quad (15.73) \qquad\text{and}\qquad \mathrm{V}\{e_y(t_1), e_y(t_2)\} = \Sigma_y(t_1, t_2), \quad (15.74)
\]
\[
D\{e_y(t)\} = \Sigma_y(t), \qquad (15.75)
\]
\[
E\{z'(t)\} = A\,E\{z(t)\} \quad (15.76) \qquad\text{and}\qquad D\{e_z(t)\} = \Sigma_z(t), \quad (15.77)
\]
\[
\mathrm{V}\{e_z(t_1), e_z(t_2)\} = \Sigma_z(t_1, t_2). \qquad (15.78)
\]

Here we only introduce "the time invariant system equations" characterized by A(t) = A; z′(t) abbreviates the functional dz(t)/dt. There may be the case that the variance-covariance functions Σ_y(t) and Σ_z(t) do not change in time: Σ_y(t) = Σ_y, Σ_z(t) = Σ_z equal a constant. Various models exist for the variance-covariance functions Σ_y(t₁, t₂) and Σ_z(t₁, t₂), e.g. linear functions as in the case of a Gauss process or a Brownian process Σ_y(t₂ − t₁), Σ_z(t₂ − t₁).

The analysis of dynamic system theory was initiated by R. E. Kalman (1960) and by R. E. Kalman and R. S. Bucy (1961): "KF" stands for "Kalman filtering".

Example 1 (tracking a satellite orbit):

Tracking a satellite's orbit around the Earth might be based on the unknown state vector z(t) being a function of the position and the speed of the satellite at time t with respect to a spherical coordinate system with origin at the mass center of the Earth. Position and speed of a satellite can be measured by GPS, for instance. If distances and accompanying angles are measured, they establish the observation y(t). The principles of space-time geometry, namely mapping y(t) into z(t), would be incorporated in the matrix C, while e_y(t) would reflect the measurement errors at the time instant t. The matrix A reflects how position and speed change in time according to the physical laws governing orbiting bodies, while e_z would allow for deviations from those laws owing to factors such as the nonuniformity of the Earth's gravity field.
Example 2 (statistical quality control):

Here the observation y(t) is a simple, approximately normal transformation of the number of defectives observed in a sample obtained at time t, while z₁(t) and z₂(t) represent respectively the current level of the process and the drift of that level. We have the observation equation and the system equations

\[
y(t) = z_1(t) + e_y(t) \qquad\text{and}\qquad
\begin{cases}
z_1(t) = z_1(t-1) + z_2(t-1) + e_{z_1}\\
z_2(t) = z_2(t-1) + e_{z_2}.
\end{cases}
\]

In vector notation, this system of equations becomes

\[
z(t) = Az(t-1) + e_z,
\]

namely

\[
z(t) = \begin{bmatrix} z_1(t)\\ z_2(t) \end{bmatrix},\qquad
e_z = \begin{bmatrix} e_{z_1}\\ e_{z_2} \end{bmatrix},\qquad
A = \begin{bmatrix} 1 & 1\\ 0 & 1 \end{bmatrix},
\]

where A does not change in time.


If we examine y (t )  y (t  1) for this model, we observe that under the assump-
tion of constant variance, namely e y (t ) = e y and e z (t ) = e z , the autocorrelation
structure of the difference is identical to that of an ARIMA (0,1,1) process. Al-
though such a correspondence is sometimes easily discernible, we should in
general not consider the two approaches to be equivalent.

A stochastic process is called an ARMA process of the order (p, q) if

\[
z(t) = a_1 z(t-1) + a_2 z(t-2) + \dots + a_p z(t-p) + b_0 u(t) + b_1 u(t-1) + b_2 u(t-2) + \dots + b_q u(t-q) \qquad (15.79)
\]

for all t ∈ {1,…,T};

it is also called a mixed autoregressive/moving-average process.
In practice, most time series are non-stationary. In order to fit a stationary model, it is necessary to remove non-stationary sources of variation. If the observed time series is non-stationary in the mean, then we can use the differences of the series. Differencing is widely used in all scientific disciplines. If z(t), t ∈ {1,…,T}, is replaced by ∇^d z(t), then we have a model capable of describing certain types of non-stationary signals. Such a model is called an "integrated model" because the stationary model that fits the differenced data has to be summed or "integrated" to provide a model for the original non-stationary data. Writing

\[
W(t) = \nabla^d z(t) = (1 - B)^d z(t) \qquad (15.80)
\]

for all t ∈ {1,…,T},

the general autoregressive integrated moving-average (ARIMA) process is of the form

ARIMA

\[
W(t) = a_1 W(t-1) + \dots + a_p W(t-p) + b_0 u(t) + \dots + b_q u(t-q) \qquad (15.81)
\]

or

\[
\Phi(B)\,W(t) = \Gamma(B)\,u(t) \qquad (15.82)
\]
\[
\Phi(B)(1-B)^d z(t) = \Gamma(B)\,u(t). \qquad (15.83)
\]

Thus we have an ARMA (p, q) model for W(t), t ∈ {1,…,T}, while the model for W(t) describing the dth differences of z(t) is said to be an ARIMA process of order (p, d, q). For our case, ARIMA (0,1,1) means the specific process with p = 0, d = 1, q = 1. The model for z(t), t ∈ {1,…,T}, is clearly non-stationary, as the AR operator Φ(B)(1−B)^d has d roots on the unit circle, since putting B = 1 makes the AR operator equal to zero. In practice, first differencing is often found to be adequate to make a series stationary, and accordingly the value of d is often taken to be one. Note that the random walk could be considered as an ARIMA (0,1,0) process.
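As a small numerical illustration (a sketch in Python/NumPy with simulated data, not taken from the text): a random walk observed with noise is non-stationary, but its first differences ∇y(t) = y(t) − y(t−1) behave like an MA(1) process with a negative lag-one autocorrelation, which is the ARIMA (0,1,1) signature mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
sigma_w, sigma_n = 1.0, 2.0                      # hypothetical signal and noise standard deviations

level = np.cumsum(rng.normal(0.0, sigma_w, T))   # random walk signal mu(t)
y = level + rng.normal(0.0, sigma_n, T)          # observed series y(t) = mu(t) + noise

dy = np.diff(y)                                  # W(t) = (1 - B) y(t), cf. (15.80) with d = 1

def acf(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# theoretical lag-1 autocorrelation of the differences: -sigma_n^2 / (sigma_w^2 + 2 sigma_n^2)
print(round(acf(dy, 1), 3), round(-sigma_n**2 / (sigma_w**2 + 2 * sigma_n**2), 3))
print(round(acf(dy, 2), 3))                      # approximately zero beyond lag one
```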
It is a special problem of time-series analysis that the error variances are generally not known a priori. This can be dealt with by guessing them and then updating them in an appropriate way, or, alternatively, by estimating them from a set of data over a suitable fit period.
In the state space modeling, the prime objective is to predict the signal in the
presence of noise. In other words, we want to estimate the m × 1 state vector
E{z(t )} which cannot usually be observed directly. The Kalman filter provides a
general method for doing this. It consists of a set of equations that allow us to
update the estimate of E{z(t )} when a new observation becomes available. We
will outline this updating procedure with two stages, called
• the prediction stage

and

• the updating stage.
Suppose we have observed a univariate time series up to time (t − 1), and that Ê{z(t − 1)} is "the best estimator" of E{z(t − 1)} based on the information up to this time. For instance, "best" is defined as a BLUUP estimator. Note that z(t), z(t − 1) etc. are random variables. Further suppose that we have evaluated the m × m variance-covariance matrix of Ê{z(t − 1)}, which we denote by V{t − 1}. The first stage, called the prediction stage, is concerned with forecasting E{z(t)} from data up to the time (t − 1), and we denote the resulting estimator in an obvious notation by Ê{z(t) | z(t − 1)}. Considering the state equations, where D{e_z(t − 1)} is still unknown at time (t − 1), the obvious estimator for E{z(t)} is given by

\[
\hat{E}\{z(t) \mid \hat{z}(t-1)\} = G(t)\,\hat{E}\{z(t-1)\} \qquad (15.84)
\]

and the variance-covariance matrix

\[
\mathrm{V}\{t \mid t-1\} = G(t)\,\mathrm{V}\{t-1\}\,G'(t) + W\{t\}, \qquad (15.85)
\]

called the prediction equations. The last equation follows from the standard results on variance-covariance matrices for random vector variables. When the new observation at time t, namely y(t), has been observed, the estimator for E{z(t)} can be modified to take account of the extra information. At time (t − 1), the best forecast of y(t) is given by h′(t) Ê{z(t) | ẑ(t − 1)}, so that the prediction error is given by

\[
\hat{e}_y(t) = y(t) - h'(t)\,\hat{E}\{z(t) \mid z(t-1)\}. \qquad (15.86)
\]

This quantity can be used to update the estimate of E{z(t)} and its variance-covariance matrix:

\[
\hat{E}\{z(t)\} = \hat{E}\{z(t) \mid \hat{z}(t-1)\} + K(t)\,\hat{e}_y(t) \qquad (15.87)
\]
\[
\mathrm{V}\{t\} = \mathrm{V}\{t \mid t-1\} - K(t)\,h'(t)\,\mathrm{V}\{t \mid t-1\} \qquad (15.88)
\]
\[
K(t) = \mathrm{V}\{t \mid t-1\}\,h(t)\,/\,[\,h'(t)\,\mathrm{V}\{t \mid t-1\}\,h(t) + \sigma_n^2\,]. \qquad (15.89)
\]

K(t) is called the gain matrix; in the univariate case, K(t) is just a vector of size m × 1. The previous equations constitute the second, updating stage of the Kalman filter, thus they are called the updating equations.
A major practical advantage of the Kalman filter is that the calculations are recursive, so that, although the current estimates are based on the whole past history of measurements, there is no need for an ever expanding memory. Rather, the new estimate of the signal is based solely on the previous estimate and the latest observation. A second advantage of the Kalman filter is that it converges fairly quickly when there is a constant underlying model, but can also follow the movement of a system where the underlying model is evolving through time.

For special cases, there exist much simpler equations. An example is the random walk plus noise model, where the state vector z(t) consists of one state variable, the current level μ(t). It can be shown that the Kalman filter for this model in the steady state case for t → ∞ reduces to the simple recurrence relation

\[
\hat{\mu}(t) = \hat{\mu}(t-1) + \alpha\,\hat{e}(t),
\]

where the smoothing constant α is a (complicated) function of the signal-to-noise ratio σ_w²/σ_n². Our equation is simple exponential smoothing. When σ_w² tends to zero, μ(t) is a constant and we find that α → 0, as would intuitively be expected, while as σ_w²/σ_n² becomes large, α approaches unity.
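A minimal sketch of the two stages for the random walk plus noise model (Python, with hypothetical variances σ_w² and σ_n² chosen only for illustration); it also shows the gain K(t) settling down to the constant smoothing factor α of the steady state recursion above.

```python
import numpy as np

rng = np.random.default_rng(1)
T, sigma_w2, sigma_n2 = 200, 0.5, 2.0          # hypothetical process and measurement variances

mu = np.cumsum(rng.normal(0.0, np.sqrt(sigma_w2), T))     # true level: random walk
y = mu + rng.normal(0.0, np.sqrt(sigma_n2), T)            # observations

# Kalman filter for z(t) = mu(t): G(t) = 1, h(t) = 1, W(t) = sigma_w2
z_hat, V = 0.0, 1.0e6                                     # diffuse start
gains = []
for t in range(T):
    # prediction stage, cf. (15.84)-(15.85)
    z_pred = z_hat
    V_pred = V + sigma_w2
    # updating stage, cf. (15.86)-(15.89)
    e = y[t] - z_pred
    K = V_pred / (V_pred + sigma_n2)
    z_hat = z_pred + K * e
    V = V_pred - K * V_pred
    gains.append(K)

alpha = gains[-1]                                         # steady-state gain = smoothing constant
print(round(alpha, 4))
print(round(np.mean(np.abs(np.array(gains[20:]) - alpha)), 6))   # gain is essentially constant
```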
For a multivariate time series approach we may start from the vector-valued equation of type

\[
E\{y(t)\} = C\,E\{z(t)\}, \qquad (15.90)
\]

where C is a known nonsingular m × m matrix. By LESS we are able to predict

\[
\hat{E}\{z(t)\} = C^{-1}\hat{E}\{y(t)\}. \qquad (15.91)
\]

Once a model has been put into the state-space form, the Kalman filter can be used to provide estimates of the signal, and they in turn lead to algorithms for various other calculations, such as making predictions and handling missing values. For instance, forecasts may be obtained from the state-space model using the latest estimates of the state vector. Given data to time N, the best estimate of the state vector is written Ê{z(N)} and the h-step-ahead forecast is given by

\[
\hat{E}\{y(N+h \mid N)\} = h'(N+h)\,\hat{E}\{z(N+h \mid N)\}
= h'(N+h)\,G(N+h)\,G(N+h-1)\cdots G(N+1)\,\hat{E}\{z(N)\}, \qquad (15.92)
\]

where we assume that h(N + h) and future values of G(t) are known. Of course, if G(t) is a constant, say G, then we get

\[
\hat{E}\{y(N+h \mid N)\} = h'(N+h)\,G^{h}\,\hat{E}\{z(N)\}. \qquad (15.93)
\]

If future values of h(t) or G(t) are not known, then they must themselves be forecasted or otherwise guessed.
Up to this day a lot of research has been done on nonlinear models in prediction
theory relating to state-vectors and observational equations. There are excellent
reviews, for instance by P. H. Frances (1988), C. W. J. Granger and P. Newbold
(1986), A. C. Harvey (1993), M. B. Priestley (1981, 1988) and H. Tong (1990).


C. W. Granger and T. Teräsvirta (1993) is a more advanced text.
In terms of dynamical system theory we regularly meet the problem that the observational equation is not of full column rank. A state variable leads to a relation between the system input and output, especially a statement on how the system develops in time. Very often it is reasonable to switch from a state variable in one reference system to another one with special properties. Let T this time be a similarity transformation, namely described by a non-singular matrix, of type

\[
z = T\bar{z} \;\Leftrightarrow\; \bar{z} = T^{-1}z \qquad (15.94)
\]
\[
\frac{d}{dt}\bar{z} = T^{-1}AT\,\bar{z} + T^{-1}B\,u(t),\qquad \bar{z}_0 = T^{-1}z_0 \qquad (15.95)
\]
\[
y(t) = CT\,\bar{z}(t) + D\,u(t). \qquad (15.96)
\]

The key question is now whether for the characteristic state equation there belongs a transformation matrix such that for specific matrices Ā and B̄ there exists an integer number r, 0 ≤ r < n, of the form

\[
\bar{A} = \begin{bmatrix} \bar{A}_{11} & \bar{A}_{12}\\ 0 & \bar{A}_{22} \end{bmatrix},\qquad
O\{\bar{A}\} = \begin{bmatrix} r \times r & r \times (n-r)\\ (n-r) \times r & (n-r) \times (n-r) \end{bmatrix},
\]
\[
\bar{B} = \begin{bmatrix} \bar{B}_1\\ 0 \end{bmatrix},\qquad
O\{\bar{B}\} = \begin{bmatrix} r \times q\\ (n-r) \times q \end{bmatrix}.
\]

In this case the state equation separates into two distinct parts,

\[
\frac{d}{dt}\bar{z}_1(t) = \bar{A}_{11}\bar{z}_1(t) + \bar{A}_{12}\bar{z}_2(t) + \bar{B}_1 u(t),\qquad \bar{z}_1(0) = \bar{z}_{10} \qquad (15.97)
\]
\[
\frac{d}{dt}\bar{z}_2(t) = \bar{A}_{22}\bar{z}_2(t),\qquad \bar{z}_2(0) = \bar{z}_{20}. \qquad (15.98)
\]

The last n − r elements of z̄ cannot be influenced in their time development. Influence is restricted to the initial conditions and to the eigendynamics of the partial system 2 (characterized by the matrix Ā₂₂). The state of the whole system cannot be steered completely to an arbitrarily given point of the state space. Accordingly, the state differential equation in terms of the matrix pair (A, B) is not steerable.
Example 3 (steerable state):

If we apply the dynamic matrix A and the input matrix B of a state model of type

\[
A = \begin{bmatrix} -\tfrac{4}{3} & \tfrac{2}{3}\\[2pt] \tfrac{1}{3} & -\tfrac{5}{3} \end{bmatrix},\qquad
B = \begin{bmatrix} 1\\ 0.5 \end{bmatrix},
\]

we are led to the alternative matrices after using the similarity transformation

\[
\bar{A} = \begin{bmatrix} -1 & 1\\ 0 & -2 \end{bmatrix},\qquad
\bar{B} = \begin{bmatrix} 1\\ 0 \end{bmatrix}.
\]

If the initial state is located along the z̄₁-axis, for instance z̄₂₀ = 0, then the state vector remains at all times along this axis. It is only possible to move "up and down" along this straight line.
In case that there exists no similarity transformation we call the state matrices
(A, B) steerable. Steerability of a state differential equation may be tested by
Lemma 15.7 (Steerability):

The pair (A, B) is steerable if and only if

\[
\operatorname{rk}[\,B,\; AB,\; \dots,\; A^{n-1}B\,] = \operatorname{rk} F(A, B) = n. \qquad (15.99)
\]

F(A, B) is called the matrix of steerability. If its rank is r < n, then there exists a transformation T such that Ā = T⁻¹AT and B̄ = T⁻¹B have the form

\[
\bar{A} = \begin{bmatrix} \bar{A}_{11} & \bar{A}_{12}\\ 0 & \bar{A}_{22} \end{bmatrix} \quad (15.100) \qquad\qquad
\bar{B} = \begin{bmatrix} \bar{B}_1\\ 0 \end{bmatrix} \quad (15.101)
\]

and (Ā₁₁, B̄₁) is steerable.
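A minimal numerical check of Lemma 15.7 (Python/NumPy sketch, using the transformed pair (Ā, B̄) of Example 3 with the signs as reconstructed above): the steerability matrix F(Ā, B̄) = [B̄, ĀB̄] has rank 1 < n = 2, so the pair is not steerable and the second state component evolves on its own.

```python
import numpy as np

def steerability_matrix(A, B):
    """F(A, B) = [B, AB, ..., A^(n-1) B], cf. (15.99)."""
    n = A.shape[0]
    blocks, M = [], B.copy()
    for _ in range(n):
        blocks.append(M)
        M = A @ M
    return np.hstack(blocks)

# transformed pair of Example 3: the zero block below A11_bar decouples z2_bar
A_bar = np.array([[-1.0,  1.0],
                  [ 0.0, -2.0]])
B_bar = np.array([[1.0],
                  [0.0]])

F = steerability_matrix(A_bar, B_bar)
print(F)                                  # [[1, -1], [0, 0]]
print(np.linalg.matrix_rank(F))           # 1 < n = 2: (A_bar, B_bar) is not steerable
```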

Alternatively we could search for a transformation matrix T that transforms the dynamic matrix and the exit (output) matrix of a state model to the form

\[
\bar{A} = \begin{bmatrix} \bar{A}_{11} & 0\\ \bar{A}_{21} & \bar{A}_{22} \end{bmatrix},\qquad
O\{\bar{A}\} = \begin{bmatrix} r \times r & r \times (n-r)\\ (n-r) \times r & (n-r) \times (n-r) \end{bmatrix},
\]
\[
\bar{C} = [\,\bar{C}_1,\; 0\,],\qquad O\{\bar{C}\} = [\,p \times r,\; p \times (n-r)\,].
\]

In this case the state equation and the observational equations read

\[
\frac{d}{dt}\bar{z}_1(t) = \bar{A}_{11}\bar{z}_1(t) + \bar{B}_1 u(t),\qquad \bar{z}_1(0) = \bar{z}_{10} \qquad (15.102)
\]
\[
\frac{d}{dt}\bar{z}_2(t) = \bar{A}_{21}\bar{z}_1(t) + \bar{A}_{22}\bar{z}_2(t) + \bar{B}_2 u(t),\qquad \bar{z}_2(0) = \bar{z}_{20} \qquad (15.103)
\]
\[
y(t) = \bar{C}_1\bar{z}_1(t) + D\,u(t). \qquad (15.104)
\]
The last n − r elements of the vector z̄ are not used in the exit variable y. Since they do not have an effect on z̄₁ either, the vector y contains no information on this component of the state vector. This state moves in the (n − r)-dimensional subspace of ℝⁿ without any change in the exit variables. Our model (C, A) is in this case called non-observable.
Example 4: (observability):
If the exit matrix and the dynamic matrix of a state model can be characterized
by the matrices
ª4 2º ª 0 1º
C=«  », A = « ,
¬5 3¼ ¬ 2 3»¼
an application of the transformation matrix T leads to the matrices
ª 1 0º
C = [1, 0] , A = « ».
¬ 1 2 ¼
An arbitrary motion of the state in the direction of the z̄₂ axis has no influence on the exit variable. If there does not exist such a transformation T, we call the state vector observable. A rank study helps again!
Lemma 15.8 (Observability test):

The pair (C, A) is observable if and only if

\[
\operatorname{rk}
\begin{bmatrix} C\\ CA\\ \vdots\\ CA^{n-1} \end{bmatrix}
= \operatorname{rk} G(C, A) = n. \qquad (15.105)
\]

G(C, A) is called the observability matrix. If its rank is r < n, then there exists a transformation matrix T such that Ā = T⁻¹AT and C̄ = CT are of the form

\[
\bar{A} = \begin{bmatrix} \bar{A}_{11} & 0\\ \bar{A}_{21} & \bar{A}_{22} \end{bmatrix},\qquad
O\{\bar{A}\} = \begin{bmatrix} r \times r & r \times (n-r)\\ (n-r) \times r & (n-r) \times (n-r) \end{bmatrix} \qquad (15.106)
\]
\[
\bar{C} = [\,\bar{C}_1,\; 0\,],\qquad O\{\bar{C}\} = [\,p \times r,\; p \times (n-r)\,] \qquad (15.107)
\]

and (C̄₁, Ā₁₁) is observable.
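The rank test of Lemma 15.8, sketched in Python/NumPy for the transformed pair (C̄, Ā) of Example 4 (with the sign pattern read off the block-triangular form): the observability matrix has rank 1 < n = 2, so the pair is not observable.

```python
import numpy as np

def observability_matrix(C, A):
    """G(C, A) = [C; CA; ...; CA^(n-1)], cf. (15.105)."""
    n = A.shape[0]
    blocks, M = [], C.copy()
    for _ in range(n):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

# transformed pair of Example 4: the zero column in C_bar hides z2_bar from the output
C_bar = np.array([[1.0, 0.0]])
A_bar = np.array([[-1.0,  0.0],
                  [ 1.0, -2.0]])

G = observability_matrix(C_bar, A_bar)
print(G)                                  # [[1, 0], [-1, 0]]
print(np.linalg.matrix_rank(G))           # 1 < n = 2: (C_bar, A_bar) is not observable
```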

With Lemma 15.7 and Lemma 15.8 we can only state whether a state model is steerable or observable or not, and what dimension a partial system classified as non-steerable or non-observable has. In order to determine which part of a system is non-steerable or non-observable – which eigenmotion is not excited or not observed – we have to be able to construct proper transformation matrices T. A tool is the PBH test, which we do not analyze here.

Both the state differential equation and the output equation can easily be Laplace transformed. We only need the relations between the input, output and state variables via polynomial matrices. If the initial conditions z₀ vanish, we get the Laplace transformed characteristic equations

\[
(sI_n - A)\,z(s) = B\,u(s) \qquad (15.108)
\]
\[
y(s) = C\,z(s) + D\,u(s). \qquad (15.109)
\]

For details we recommend the reference list. We only refer to solving both the state differential equation and the output equation: eliminating the state vector z(s) leads us to the algebraic relation between u(s) and y(s):

\[
G(s) = C(sI_n - A)^{-1}B + D \qquad (15.110)
\]

or

\[
G(s) = [\,\bar{C}_1,\ \bar{C}_2\,]
\left( s \begin{bmatrix} I_r & 0\\ 0 & I_{n-r} \end{bmatrix} - \begin{bmatrix} \bar{A}_{11} & \bar{A}_{12}\\ 0 & \bar{A}_{22} \end{bmatrix} \right)^{-1}
\begin{bmatrix} \bar{B}_1\\ 0 \end{bmatrix} + D
= \bar{C}_1 (sI_r - \bar{A}_{11})^{-1}\bar{B}_1 + D. \qquad (15.111)
\]
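The cancellation of the non-steerable part in (15.111) can be checked symbolically; here is a small sketch with SymPy (the numbers are those of the transformed pair from Example 3 together with a hypothetical output matrix C̄ = [1 1] and D = 0, chosen only for illustration): the full 2 × 2 resolvent and the reduced 1 × 1 resolvent give the same transfer function.

```python
import sympy as sp

s = sp.symbols('s')

# transformed pair of Example 3 plus a hypothetical output row C_bar = [1 1], D = 0
A_bar = sp.Matrix([[-1, 1], [0, -2]])
B_bar = sp.Matrix([[1], [0]])
C_bar = sp.Matrix([[1, 1]])
D = sp.Matrix([[0]])

# full transfer function G(s) = C (sI - A)^(-1) B + D, cf. (15.110)
G_full = sp.simplify(C_bar * (s * sp.eye(2) - A_bar).inv() * B_bar + D)

# reduced transfer function C1 (sI_r - A11)^(-1) B1 + D, cf. (15.111) with r = 1
G_red = sp.simplify(C_bar[:, :1] * (s * sp.eye(1) - A_bar[:1, :1]).inv() * B_bar[:1, :] + D)

print(G_full)   # Matrix([[1/(s + 1)]])
print(G_red)    # Matrix([[1/(s + 1)]]): the non-steerable eigenvalue -2 cancels
```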

Recently, the topic of chaos has attracted much attention. Chaotic behavior
arises from certain types of nonlinear models, and a loose definition is appar-
ently random behavior that is generated by a purely deterministic, nonlinear
system.
Refer to the contributions of K. S. Chan and H. Tong (2001), J. Gleick (1987), V. Isham (1983), and H. Kantz and T. Schreiber (1997).
Appendix A: Matrix Algebra

As a two-dimensional array we define a quadratic or rectangular matrix. First, we review matrix algebra with respect to two inner and one external relation, namely multiplication of a matrix by a scalar, addition of matrices of the same order, and matrix multiplication of type Cayley, Kronecker-Zehfuss, Khatri-Rao and Hadamard. Second, we introduce special matrices of type symmetric, antisymmetric, diagonal, unity, null, idempotent, normal, orthogonal, orthonormal (special facts on representing a 2×2 orthonormal matrix, a general n×n orthonormal matrix, the Helmert representation of an orthonormal matrix with examples, special facts about the representation of a Hankel matrix with examples, the definition of a Vandermonde matrix), the permutation matrix and the commutation matrix. Third, we treat scalar measures like rank, determinant, trace and norm. In detail, we review the Inverse Partitioned Matrix /IPM/ and the Cayley inverse of the sum of two matrices. We summarize the notion of a division algebra. Fourth, a special paragraph is devoted to vector-valued matrix forms like vec, vech and veck. Fifth, we introduce the notion of eigenvalue-eigenvector decomposition (analysis versus synthesis) and the singular value decomposition. Sixth, we give details of generalized inverses, namely g-inverse, reflexive g-inverse, reflexive symmetric g-inverse, pseudo inverse, Zlobec formula, Bjerhammar formula, rank factorization, left and right inverse, projections, bordering, singular value representation and the theory of solving linear equations.

A1 Matrix-Algebra

A matrix is a rectangular or a quadratic array of numbers,

\[
A := [a_{ij}] =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1\,m-1} & a_{1m}\\
a_{21} & a_{22} & \cdots & a_{2\,m-1} & a_{2m}\\
\cdots & \cdots & \cdots & \cdots & \cdots\\
a_{n-1\,1} & a_{n-1\,2} & \cdots & a_{n-1\,m-1} & a_{n-1\,m}\\
a_{n1} & a_{n2} & \cdots & a_{n\,m-1} & a_{nm}
\end{bmatrix},
\qquad a_{ij} \in \mathbb{R},\; [a_{ij}] \in \mathbb{R}^{n \times m}.
\]

The format or "order" of A is given by the number n of rows and the number m of columns,

\[
O(A) := n \times m.
\]

Fact:

Two matrices are identical if they have identical format and if at each place (i, j) there are identical numbers, namely

\[
A = B \;\Leftrightarrow\; a_{ij} = b_{ij}\quad \text{for all } i \in \{1,\dots,n\},\; j \in \{1,\dots,m\}.
\]

Beside the identity of two matrices, the transpose of an n × m matrix A = [a_{ij}] is the m × n matrix A′ = [a_{ji}] whose (i, j) element is a_{ji}.

Fact:

(A′)′ = A.

A matrix algebra is defined by the following operations:

• multiplication of a matrix by a scalar (external relation)
• addition of two matrices of the same order (internal relation)
• multiplication of two matrices (internal relation)

Definition (matrix additions and multiplications):

(1) Multiplication by a scalar

\[
A = [a_{ij}],\; \alpha \in \mathbb{R} \;\Rightarrow\; \alpha A = A\alpha = [\alpha a_{ij}].
\]

(2) Addition of two matrices of the same order

\[
A = [a_{ij}],\; B = [b_{ij}] \;\Rightarrow\; A + B := [a_{ij} + b_{ij}]
\]

A + B = B + A (commutativity)
(A + B) + C = A + (B + C) (associativity)
A − B = A + (−1)B (inverse addition).

Compatibility

(α + β)A = αA + βA and α(A + B) = αA + αB (distributivity)
(A + B)′ = A′ + B′.

(3) Multiplication of matrices

3(i) "Cayley product" ("matrix product")

\[
A = [a_{ij}],\; O(A) = n \times l,\quad B = [b_{ij}],\; O(B) = l \times m \;\Rightarrow\;
C := A \cdot B = [c_{ij}],\quad c_{ij} := \sum_{k=1}^{l} a_{ik} b_{kj},\quad O(C) = n \times m
\]

3(ii) "Kronecker-Zehfuss product"

\[
A = [a_{ij}],\; O(A) = n \times m,\quad B = [b_{ij}],\; O(B) = k \times l \;\Rightarrow\;
C := B \otimes A := [b_{ij} A],\quad O(C) = O(B \otimes A) = kn \times lm
\]

3(iii) "Khatri-Rao product" (of two rectangular matrices of identical column number)

\[
A = [a_1, \dots, a_m],\; O(A) = n \times m,\quad B = [b_1, \dots, b_m],\; O(B) = k \times m \;\Rightarrow\;
C := B \odot A := [b_1 \otimes a_1, \dots, b_m \otimes a_m],\quad O(C) = kn \times m
\]

3(iv) "Hadamard product" (of two rectangular matrices of the same order; elementwise product)

\[
G = [g_{ij}],\; O(G) = n \times m,\quad H = [h_{ij}],\; O(H) = n \times m \;\Rightarrow\;
K := G \ast H = [k_{ij}],\quad k_{ij} := g_{ij} h_{ij},\quad O(K) = n \times m.
\]
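The four products can be compared numerically; here is a small Python/NumPy sketch (array sizes chosen arbitrarily) in which the Khatri-Rao product is assembled column by column from Kronecker products of the corresponding columns, as in 3(iii).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.integers(0, 5, size=(2, 3)).astype(float)   # O(A) = n x m = 2 x 3
B = rng.integers(0, 5, size=(4, 3)).astype(float)   # O(B) = k x m = 4 x 3
G = rng.integers(0, 5, size=(2, 3)).astype(float)
H = rng.integers(0, 5, size=(2, 3)).astype(float)

cayley = A @ B.T                 # Cayley product, here of A (2x3) and B' (3x4): 2 x 4
kron = np.kron(B, A)             # Kronecker-Zehfuss product B (x) A: (kn) x (lm) = 8 x 9

# Khatri-Rao product B (.) A: columnwise Kronecker products, order kn x m = 8 x 3
khatri_rao = np.column_stack([np.kron(B[:, j], A[:, j]) for j in range(A.shape[1])])

hadamard = G * H                 # Hadamard (elementwise) product, same order as G and H

print(cayley.shape, kron.shape, khatri_rao.shape, hadamard.shape)
```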

The existence of the product A·B does not imply the existence of the product B·A. If both products exist, they are in general not equal. Two quadratic matrices A and B for which A·B = B·A holds are called commutative.

Laws

(i) (A·B)·C = A·(B·C)
    A·(B + C) = A·B + A·C
    (A + B)·C = A·C + B·C
    (A·B)′ = B′·A′.

(ii) (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C) = A ⊗ B ⊗ C
     (A + B) ⊗ C = (A ⊗ C) + (B ⊗ C)
     A ⊗ (B + C) = (A ⊗ B) + (A ⊗ C)
     (A ⊗ B)·(C ⊗ D) = (A·C) ⊗ (B·D)
     (A ⊗ B)′ = A′ ⊗ B′.

(iii) (A ⊙ B) ⊙ C = A ⊙ (B ⊙ C) = A ⊙ B ⊙ C
      (A + B) ⊙ C = (A ⊙ C) + (B ⊙ C)
      A ⊙ (B + C) = (A ⊙ B) + (A ⊙ C)
      (A·C) ⊙ (B·D) = (A ⊗ B)·(C ⊙ D)
      A ⊙ (B·D) = (A ⊙ B)·D, if d_ij = 0 for i ≠ j.

The transposed Khatri-Rao product generates a row-wise product which we do not follow here.

(iv) A ∗ B = B ∗ A
     (A ∗ B) ∗ C = A ∗ (B ∗ C) = A ∗ B ∗ C
     (A + B) ∗ C = (A ∗ C) + (B ∗ C)
     (A₁·B₁·C₁) ∗ (A₂·B₂·C₂) = (A₁′ ⊙ A₂′)′·(B₁ ⊗ B₂)·(C₁ ⊙ C₂)
     (D·A) ∗ (B·D) = D·(A ∗ B)·D, if d_ij = 0 for i ≠ j
     (A ∗ B)′ = A′ ∗ B′.

A2 Special Matrices

We will collect special matrices of type symmetric, antisymmetric, diagonal, unity, zero, idempotent, normal, orthogonal, orthonormal, positive-definite and positive-semidefinite, and special orthonormal matrices, for instance of type Helmert or of type Hankel.

Definitions (special matrices):

A quadratic matrix A = [a_ij] of the order O(A) = n × n is called

symmetric ⟺ a_ij = a_ji for all i, j ∈ {1,…,n}: A = A′

antisymmetric ⟺ a_ij = −a_ji for all i, j ∈ {1,…,n}: A = −A′

diagonal ⟺ a_ij = 0 for all i ≠ j, A = Diag[a₁₁,…,a_nn]

unity ⟺ I_{n×n}: a_ij = 0 for all i ≠ j, a_ij = 1 for all i = j

zero matrix 0_{n×n}: a_ij = 0 for all i, j ∈ {1,…,n}

upper (lower) triangular: a_ij = 0 for all i > j (a_ij = 0 for all i < j)

idempotent if and only if A·A = A

normal if and only if A·A′ = A′·A.

Definition (orthogonal matrix):

The matrix A is called orthogonal if AA′ and A′A are diagonal matrices. (The rows and columns of A are orthogonal.)
Definition (orthonormal matrix):

The matrix A is called orthonormal if AA′ = A′A = I. (The rows and columns of A are orthonormal.)

Facts (representation of a 2×2 orthonormal matrix X ∈ SO(2)):

A 2×2 orthonormal matrix X ∈ SO(2) is an element of the special orthogonal group SO(2) defined by

\[
SO(2) := \{X \in \mathbb{R}^{2\times 2} \mid X'X = I_2 \text{ and } \det X = +1\}
= \Big\{ X = \begin{bmatrix} x_1 & x_2\\ x_3 & x_4 \end{bmatrix} \in \mathbb{R}^{2\times 2} \;\Big|\;
x_1^2 + x_2^2 = 1,\; x_1x_3 + x_2x_4 = 0,\; x_3^2 + x_4^2 = 1,\; x_1x_4 - x_2x_3 = +1 \Big\}.
\]

(i)
\[
X = \begin{bmatrix} \cos\phi & \sin\phi\\ -\sin\phi & \cos\phi \end{bmatrix} \in \mathbb{R}^{2\times 2},\qquad \phi \in [0, 2\pi],
\]
is a trigonometric representation of X ∈ SO(2).

(ii)
\[
X = \begin{bmatrix} x & \sqrt{1-x^2}\\ -\sqrt{1-x^2} & x \end{bmatrix} \in \mathbb{R}^{2\times 2},\qquad x \in [-1, +1],
\]
is an algebraic representation of X ∈ SO(2)
\[
(x_{11}^2 + x_{12}^2 = 1,\; x_{11}x_{21} + x_{12}x_{22} = -x\sqrt{1-x^2} + \sqrt{1-x^2}\,x = 0,\; x_{21}^2 + x_{22}^2 = 1).
\]

(iii)
\[
X = \begin{bmatrix} \dfrac{1-x^2}{1+x^2} & +\dfrac{2x}{1+x^2}\\[8pt] -\dfrac{2x}{1+x^2} & \dfrac{1-x^2}{1+x^2} \end{bmatrix} \in \mathbb{R}^{2\times 2},\qquad x \in \mathbb{R},
\]
is called a stereographic projection of X (stereographic projection of SO(2) ~ S¹ onto L¹).

(iv)
\[
X = (I_2 + S)(I_2 - S)^{-1},\qquad S = \begin{bmatrix} 0 & x\\ -x & 0 \end{bmatrix},
\]
where S = −S′ is a skew matrix (antisymmetric matrix), is called a Cayley-Lipschitz representation of X ∈ SO(2).

(v) X ∈ SO(2) forms a commutative group ("Abel"): for example, X₁ ∈ SO(2), X₂ ∈ SO(2) implies X₁X₂ = X₂X₁. (SO(n) for n = 2 is the only commutative case; SO(n), n ≠ 2, is not "Abel".)
Facts (representation of an n×n orthonormal matrix X ∈ SO(n)):

An n×n orthonormal matrix X ∈ SO(n) is an element of the special orthogonal group SO(n) defined by

\[
SO(n) := \{X \in \mathbb{R}^{n\times n} \mid X'X = I_n \text{ and } \det X = +1\}.
\]

As a differentiable manifold, SO(n) inherits a Riemann structure from the ambient space ℝ^{n²} with a Euclidean metric (vec X′ ∈ ℝ^{n²}, dim vec X′ = n²). Any atlas of the special orthogonal group SO(n) has at least four distinct charts and there is one with exactly four charts ("minimal atlas": Lusternik-Schnirelmann category).

(i) X = (I_n + S)(I_n − S)⁻¹, where S = −S′ is a skew matrix (antisymmetric matrix), is called a Cayley-Lipschitz representation of X ∈ SO(n). (n!/[2(n−2)!] = n(n−1)/2 is the number of independent parameters/coordinates of X.)

(ii) If each of the matrices R₁,…,R_k is an n×n orthonormal matrix, then their product

R₁R₂⋯R_{k−1}R_k ∈ SO(n)

is an n×n orthonormal matrix.

Facts (orthonormal matrix: Helmert representation):

Let a′ = [a₁,…,a_n] represent any row vector whose elements are all nonzero, a_i ≠ 0 (i ∈ {1,…,n}). Suppose that we require an n×n orthonormal matrix one row of which is proportional to a′. In what follows one such matrix R is derived.

Let [r₁′,…,r_n′] represent the rows of R and take the first row r₁′ to be the row of R that is proportional to a′. Take the second row r₂′ to be proportional to the n-dimensional row vector

\[
[a_1,\; -a_1^2/a_2,\; 0,\; 0,\; \dots,\; 0], \tag{H2}
\]

the third row r₃′ proportional to

\[
[a_1,\; a_2,\; -(a_1^2 + a_2^2)/a_3,\; 0,\; 0,\; \dots,\; 0] \tag{H3}
\]

and more generally the second through nth rows r₂′,…,r_n′ proportional to

\[
\Big[a_1,\; a_2,\; \dots,\; a_{k-1},\; -\sum_{i=1}^{k-1} a_i^2 / a_k,\; 0,\; 0,\; \dots,\; 0\Big] \tag{Hk}
\]

for k ∈ {2,…,n}, respectively. Confirm for yourself that the n − 1 vectors (Hk) are orthogonal to each other and to the vector a′. In order to obtain explicit expressions for r₁′,…,r_n′ it remains to normalize a′ and the vectors (Hk). The Euclidean norm of the kth of the vectors (Hk) is

\[
\Big\{ \sum_{i=1}^{k-1} a_i^2 + \Big(\sum_{i=1}^{k-1} a_i^2\Big)^2 / a_k^2 \Big\}^{1/2}
= \Big\{ \Big(\sum_{i=1}^{k-1} a_i^2\Big)\Big(\sum_{i=1}^{k} a_i^2\Big) / a_k^2 \Big\}^{1/2}.
\]

Accordingly, for the orthonormal vectors r₁′,…,r_n′ we finally find

(1st row)
\[
r_1' = \Big[\sum_{i=1}^{n} a_i^2\Big]^{-1/2} (a_1,\dots,a_n),
\]

(kth row)
\[
r_k' = \Big[\frac{a_k^2}{\big(\sum_{i=1}^{k-1} a_i^2\big)\big(\sum_{i=1}^{k} a_i^2\big)}\Big]^{1/2}
\Big(a_1,\; a_2,\; \dots,\; a_{k-1},\; -\sum_{i=1}^{k-1}\frac{a_i^2}{a_k},\; 0,\; 0,\; \dots,\; 0\Big),
\]

(nth row)
\[
r_n' = \Big[\frac{a_n^2}{\big(\sum_{i=1}^{n-1} a_i^2\big)\big(\sum_{i=1}^{n} a_i^2\big)}\Big]^{1/2}
\Big[a_1,\; a_2,\; \dots,\; a_{n-1},\; -\sum_{i=1}^{n-1}\frac{a_i^2}{a_n}\Big].
\]

The recipe is complicated; when a′ = [1, 1,…,1, 1], the Helmert factors in the 1st row, …, kth row, …, nth row simplify to

\[
r_1' = n^{-1/2}[1, 1, \dots, 1, 1] \in \mathbb{R}^n
\]
\[
r_k' = [k(k-1)]^{-1/2}[1, 1, \dots, 1, 1-k, 0, 0, \dots, 0, 0] \in \mathbb{R}^n
\]
\[
r_n' = [n(n-1)]^{-1/2}[1, 1, \dots, 1, 1-n] \in \mathbb{R}^n.
\]

The orthonormal matrix

\[
\begin{bmatrix} r_1'\\ r_2'\\ \vdots\\ r_{k-1}'\\ r_k'\\ \vdots\\ r_{n-1}'\\ r_n' \end{bmatrix} \in SO(n)
\]

is known as the Helmert matrix of order n. (Alternatively the transpose of such a matrix is called the Helmert matrix.)
Example (Helmert matrix of order 3):

\[
\begin{bmatrix}
1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3}\\
1/\sqrt{2} & -1/\sqrt{2} & 0\\
1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6}
\end{bmatrix} \in SO(3).
\]

Check that the rows are orthogonal and normalized.

Example (Helmert matrix of order 4):

\[
\begin{bmatrix}
1/2 & 1/2 & 1/2 & 1/2\\
1/\sqrt{2} & -1/\sqrt{2} & 0 & 0\\
1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6} & 0\\
1/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & -3/\sqrt{12}
\end{bmatrix} \in SO(4).
\]

Check that the rows are orthogonal and normalized.

Example (Helmert matrix of order n):

\[
\begin{bmatrix}
1/\sqrt{n} & 1/\sqrt{n} & 1/\sqrt{n} & \cdots & 1/\sqrt{n} & 1/\sqrt{n}\\
1/\sqrt{2} & -1/\sqrt{2} & 0 & \cdots & 0 & 0\\
1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6} & \cdots & 0 & 0\\
\vdots & & & & & \vdots\\
\frac{1}{\sqrt{(n-1)(n-2)}} & \frac{1}{\sqrt{(n-1)(n-2)}} & \cdots & \frac{-(n-2)}{\sqrt{(n-1)(n-2)}} & 0 & 0\\[6pt]
\frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \cdots & \frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \frac{1-n}{\sqrt{n(n-1)}}
\end{bmatrix} \in SO(n).
\]

Check that the rows are orthogonal and normalized. An example is the nth row:

\[
\frac{1}{n(n-1)} + \dots + \frac{1}{n(n-1)} + \frac{(1-n)^2}{n(n-1)}
= \frac{n-1}{n(n-1)} + \frac{n^2 - 2n + 1}{n(n-1)}
= \frac{n^2 - n}{n(n-1)} = \frac{n(n-1)}{n(n-1)} = 1,
\]

where (n−1) terms 1/[n(n−1)] have to be summed.
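A short constructive sketch (Python/NumPy, not part of the original text) that builds the Helmert matrix of order n from the simplified rows above and verifies R R′ = I numerically.

```python
import numpy as np

def helmert(n):
    """Helmert matrix of order n for a' = [1, ..., 1]: first row proportional to a',
    kth row proportional to [1, ..., 1, 1-k, 0, ..., 0] (k-1 ones), each row normalized."""
    R = np.zeros((n, n))
    R[0, :] = 1.0 / np.sqrt(n)
    for k in range(2, n + 1):
        R[k - 1, :k - 1] = 1.0
        R[k - 1, k - 1] = 1.0 - k
        R[k - 1, :] /= np.sqrt(k * (k - 1))
    return R

R = helmert(4)
print(np.round(R, 4))                          # reproduces the order-4 example above
print(np.allclose(R @ R.T, np.eye(4)))         # True: rows are orthonormal
print(np.allclose(R.T @ R, np.eye(4)))         # True: columns are orthonormal as well
```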

Definition (Hankel matrix):

A rectangular matrix A = [a_ij] ∈ ℝ^{n×m} is called "a Hankel matrix" if a_ij depends on i + j only, so that the n + m − 1 distinct elements of A,

\[
\begin{bmatrix}
a_{11} & & &\\
a_{21} & & &\\
\vdots & & &\\
a_{n-1\,1} & & &\\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{bmatrix},
\]

already appear in the first column and the last row.

Example: Hankel matrix of power sums

Let A ∈ ℝ^{n×m} be an n×m rectangular matrix (n ≤ m) whose entries are power sums,

\[
A := \begin{bmatrix}
\sum_{i=1}^{n} \alpha_i x_i & \sum_{i=1}^{n} \alpha_i x_i^2 & \cdots & \sum_{i=1}^{n} \alpha_i x_i^{m}\\[4pt]
\sum_{i=1}^{n} \alpha_i x_i^2 & \sum_{i=1}^{n} \alpha_i x_i^3 & \cdots & \sum_{i=1}^{n} \alpha_i x_i^{m+1}\\
\vdots & \vdots & & \vdots\\
\sum_{i=1}^{n} \alpha_i x_i^{n} & \sum_{i=1}^{n} \alpha_i x_i^{n+1} & \cdots & \sum_{i=1}^{n} \alpha_i x_i^{n+m-1}
\end{bmatrix}.
\]

A is a Hankel matrix.
Definition (Vandermonde matrix):

A Vandermonde matrix V ∈ ℝ^{n×n} is

\[
V := \begin{bmatrix}
1 & 1 & \cdots & 1\\
x_1 & x_2 & \cdots & x_n\\
\vdots & \vdots & & \vdots\\
x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1}
\end{bmatrix},\qquad
\det V = \prod_{\substack{i,j\\ i>j}}^{n} (x_i - x_j).
\]

Example: Vandermonde matrix V ∈ ℝ^{3×3}

\[
V := \begin{bmatrix} 1 & 1 & 1\\ x_1 & x_2 & x_3\\ x_1^2 & x_2^2 & x_3^2 \end{bmatrix},\qquad
\det V = (x_2 - x_1)(x_3 - x_2)(x_3 - x_1).
\]

Example: Submatrix of a Hankel matrix of power sums

Consider the submatrix P = [a₁, a₂,…,a_n] of the Hankel matrix A ∈ ℝ^{n×m} (n ≤ m) whose entries are power sums. The determinant of the power sums matrix P is

\[
\det P = \Big(\prod_{i=1}^{n} \alpha_i\Big)\Big(\prod_{i=1}^{n} x_i\Big)(\det V)^2,
\]

where det V is the Vandermonde determinant.

Example: Submatrix P ∈ ℝ^{3×3} of a 3×4 Hankel matrix of power sums (n = 3, m = 4)

With the shorthand s_k := α₁x₁^k + α₂x₂^k + α₃x₃^k,

\[
A = \begin{bmatrix}
s_1 & s_2 & s_3 & s_4\\
s_2 & s_3 & s_4 & s_5\\
s_3 & s_4 & s_5 & s_6
\end{bmatrix},\qquad
P = [a_1, a_2, a_3] = \begin{bmatrix}
s_1 & s_2 & s_3\\
s_2 & s_3 & s_4\\
s_3 & s_4 & s_5
\end{bmatrix}.
\]
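The determinant identity can be verified numerically; the following Python/NumPy sketch (with arbitrarily chosen α_i and x_i) builds P from power sums and compares det P with (∏α_i)(∏x_i)(det V)².

```python
import numpy as np

alpha = np.array([2.0, -1.5, 0.7])        # arbitrary nonzero weights
x = np.array([0.5, 1.3, -2.0])            # arbitrary distinct nodes
n = len(x)

# P_{jk} = sum_i alpha_i x_i^(j+k-1), j,k = 1..n (leading n x n block of the Hankel matrix)
P = np.array([[np.sum(alpha * x**(j + k - 1)) for k in range(1, n + 1)]
              for j in range(1, n + 1)])

V = np.vander(x, increasing=True).T       # Vandermonde matrix with rows 1, x, x^2, ...
det_V = np.linalg.det(V)

lhs = np.linalg.det(P)
rhs = np.prod(alpha) * np.prod(x) * det_V**2
print(np.isclose(lhs, rhs))               # True
```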

Definitions (positive definite and positive semidefinite matrices):

A matrix A is called positive definite if and only if x′Ax > 0 for all x ∈ ℝⁿ, x ≠ 0.

A matrix A is called positive semidefinite if and only if x′Ax ≥ 0 for all x ∈ ℝⁿ.

An example follows.

Example (idempotence):

All idempotent matrices are positive semidefinite, as are B′B and BB′ for an arbitrary matrix B.

What are "permutation matrices" or "commutation matrices"? After their definitions we will give some applications.

Definitions (permutation matrix, commutation matrix):

A matrix A is called a permutation matrix if and only if each column of A and each row of A has exactly one element 1, all other elements being zero. There holds AA′ = I.

A matrix K is called a commutation matrix if and only if it is of the order n² × n² and there holds K = K′ and K² = I_{n²}.

The commutation matrix is symmetric and orthonormal.


Example (commutation matrix)

\[
n = 2 \;\Rightarrow\; K_4 = \begin{bmatrix}
1 & 0 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 0 & 1
\end{bmatrix} = K_4'.
\]

A general definition of matrices K_{nm} of the order nm × nm with n ≠ m is to be found in J. R. Magnus and H. Neudecker (1988, p. 46-48). This definition does not lead to a symmetric matrix anymore. Nevertheless, the transpose of a commutation matrix is again a commutation matrix, since we have K′_{nm} = K_{mn} and K_{nm}K_{mn} = I_{nm}.

Example (commutation matrix)

\[
n = 2,\; m = 3 \;\Rightarrow\; K_{2\cdot 3} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

\[
n = 3,\; m = 2 \;\Rightarrow\; K_{3\cdot 2} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

\[
K_{3\cdot 2}K_{2\cdot 3} = I_6 = K_{2\cdot 3}K_{3\cdot 2}.
\]
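A constructive sketch (Python/NumPy; the index convention K_{nm} vec(A) = vec(A′) for an n × m matrix A, with column-wise vec, is one common choice and is stated here as an assumption): the commutation matrix is built as a permutation of the identity and reproduces the matrices K_{2·3} and K_{3·2} displayed above.

```python
import numpy as np

def commutation_matrix(n, m):
    """K of order nm x nm with K @ vec(A) = vec(A') for A of order n x m,
    where vec stacks columns (assumed convention)."""
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            # position of a_ij in vec(A) is j*n + i, in vec(A') it is i*m + j
            K[i * m + j, j * n + i] = 1.0
    return K

K23 = commutation_matrix(2, 3)
K32 = commutation_matrix(3, 2)

A = np.arange(6, dtype=float).reshape(2, 3)
vecA = A.flatten(order='F')                               # column stacking
print(np.allclose(K23 @ vecA, A.T.flatten(order='F')))    # True
print(np.allclose(K23.T, K32))                            # K'_{nm} = K_{mn}
print(np.allclose(K23 @ K32, np.eye(6)))                  # K_{nm} K_{mn} = I_{nm}
```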

A3 Scalar Measures and Inverse Matrices


We will refer to some scalar measures, also called scalar functions, of matrices.
Beforehand we will introduce some classical definitions of type
• linear independence
• column and row rank
• rank identities.

Definitions (linear independence, column and row rank):


A set of vectors x₁,…,x_n is called linearly independent if an arbitrary linear combination Σ_{i=1}^{n} α_i x_i = 0 only holds when all scalars α₁,…,α_n vanish, that is if α₁ = α₂ = … = α_{n−1} = α_n = 0 holds. Otherwise the vectors x₁,…,x_n are called linearly dependent.

Let A be a rectangular matrix of the order O(A) = n × m. The column rank of the matrix A is the largest number of linearly independent columns, while the row rank is the largest number of linearly independent rows. Actually the column rank of the matrix A is identical to its row rank. The rank of a matrix is thus denoted

rk A.

Obviously,

rk A ≤ min{n, m}.

If rk A = n holds, we say that the matrix A has full row rank. In contrast, if the rank identity rk A = m holds, we say that the matrix A has full column rank.

We list the following important rank identities.


Facts (rank identities):
(i) rk A = rk A c = rk A cA = rk AA c
(ii) rk( A + B) d rk A + rk B
(iii) rk( A ˜ B) d min{rk A, rk B}
(iv) rk( A ˜ B) = rk A if B has full row rank,
(v) rk( A ˜ B) = rk B if A has full column rank.
(vi) rk( A ˜ B ˜ C) + rk B t rk( A ˜ B) + rk(B ˜ C)
(vii) rk( A … B) = (rk A) ˜ (rk B).

If A is a rectangular matrix of the order O(A) = n × m and, in addition, Ax = 0 holds for a certain vector x ≠ 0, then

rk A ≤ m − 1.
Let us define what is a rank factorization, the column space, a singular matrix
and, especially, what is division algebra.
Facts (rank factorization)
We call a rank factorization
A = G˜F ,
if rk A = rk G = rk F holds for certain matrices G and F of the order
O(G ) = n × rk A and O(F) = rk A × m.
Facts
A matrix A has the column space
R ( A)
formed by the column vectors. The dimension of such a vector space is dim
R ( A) = rk A . In particular,
R ( A) = R ( AA c)
holds.

Definition (non-singular matrix versus singular matrix)


Let a quadratic matrix A of the order O( A) be given. A is called non-
singular or regular if rk A = n holds. In case rk A < n, the matrix A is
called singular.

Definition (division algebra):


Let the matrices A, B, C be quadratic and non-singular of the order
O( A) = O(B) = O(C) = n × n . In terms of the Cayley-product an inner
relation can be based on
A = [aij ], B = [bij ], C = [cij ], O( A) = O(B) = O(C) = n × n

(i) ( A ˜ B ) ˜ C = A ˜ ( B ˜ C) (associativity)

(ii) A˜I = A (identity)


(iii) A ˜ A 1 = I (inverse).

The non-singular matrix A 1 = B is called Cayley-inverse. The conditions


A ˜ B = In œ B ˜ A = In

are equivalent. The Cayley-inverse A 1 is left and right identical. The Cayley-
inverse is unique.

Fact: ( A 1 ) c = ( A c) 1 : A is symmetric œ A 1 is symmetric.

Facts (Inverse Partitioned Matrix /IPM/ of a symmetric matrix):

Let the symmetric matrix A be partitioned as

\[
A := \begin{bmatrix} A_{11} & A_{12}\\ A_{12}' & A_{22} \end{bmatrix},\qquad A_{11}' = A_{11},\; A_{22}' = A_{22}.
\]

Then its Cayley inverse A⁻¹ is symmetric and can be partitioned as well,

\[
A^{-1} = \begin{bmatrix} A_{11} & A_{12}\\ A_{12}' & A_{22} \end{bmatrix}^{-1}
= \begin{bmatrix}
[I + A_{11}^{-1}A_{12}(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}A_{12}']A_{11}^{-1} &
-A_{11}^{-1}A_{12}(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}\\
-(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}A_{12}'A_{11}^{-1} &
(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}
\end{bmatrix},
\]

if A₁₁⁻¹ exists, and

\[
A^{-1} = \begin{bmatrix} A_{11} & A_{12}\\ A_{12}' & A_{22} \end{bmatrix}^{-1}
= \begin{bmatrix}
(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1} &
-(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}A_{12}A_{22}^{-1}\\
-A_{22}^{-1}A_{12}'(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1} &
[I + A_{22}^{-1}A_{12}'(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}A_{12}]A_{22}^{-1}
\end{bmatrix},
\]

if A₂₂⁻¹ exists.

\[
S_{11} := A_{22} - A_{12}'A_{11}^{-1}A_{12} \qquad\text{and}\qquad S_{22} := A_{11} - A_{12}A_{22}^{-1}A_{12}'
\]

are the minors determined by properly chosen rows and columns of the matrix A, called "Schur complements", such that

\[
A^{-1} = \begin{bmatrix} A_{11} & A_{12}\\ A_{12}' & A_{22} \end{bmatrix}^{-1}
= \begin{bmatrix}
(I + A_{11}^{-1}A_{12}S_{11}^{-1}A_{12}')A_{11}^{-1} & -A_{11}^{-1}A_{12}S_{11}^{-1}\\
-S_{11}^{-1}A_{12}'A_{11}^{-1} & S_{11}^{-1}
\end{bmatrix},
\]

if A₁₁⁻¹ exists, and

\[
A^{-1} = \begin{bmatrix} A_{11} & A_{12}\\ A_{12}' & A_{22} \end{bmatrix}^{-1}
= \begin{bmatrix}
S_{22}^{-1} & -S_{22}^{-1}A_{12}A_{22}^{-1}\\
-A_{22}^{-1}A_{12}'S_{22}^{-1} & [I + A_{22}^{-1}A_{12}'S_{22}^{-1}A_{12}]A_{22}^{-1}
\end{bmatrix},
\]

if A₂₂⁻¹ exists, are representations of the Cayley inverse of the partitioned matrix A in terms of "Schur complements".
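These block formulas are easy to verify numerically; the sketch below (Python/NumPy, random symmetric positive definite test matrix) assembles A⁻¹ from the Schur complement S₁₁ and compares it with a direct inversion.

```python
import numpy as np

rng = np.random.default_rng(3)
m, l = 3, 2                                   # block sizes
M = rng.normal(size=(m + l, m + l))
A = M @ M.T + (m + l) * np.eye(m + l)         # symmetric positive definite => all blocks invertible

A11, A12 = A[:m, :m], A[:m, m:]
A22 = A[m:, m:]

A11_inv = np.linalg.inv(A11)
S11 = A22 - A12.T @ A11_inv @ A12             # Schur complement of A11
S11_inv = np.linalg.inv(S11)

# block inverse in terms of the Schur complement S11 (case A11 invertible)
top_left = (np.eye(m) + A11_inv @ A12 @ S11_inv @ A12.T) @ A11_inv
top_right = -A11_inv @ A12 @ S11_inv
block_inv = np.block([[top_left, top_right],
                      [top_right.T, S11_inv]])

print(np.allclose(block_inv, np.linalg.inv(A)))   # True
```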
The formulae S11 and S 22 were first used by J. Schur (1917). The term “Schur
complements” was introduced by E. Haynsworth (1968). A. Albert (1969) re-
placed the Cayley inverse A 1 by the Moore-Penrose inverse A + . For a survey
we recommend R. W. Cottle (1974), D.V. Oullette (1981) and D. Carlson (1986).
:Proof:
For the proof of the "inverse partitioned matrix" $A^{-1}$ (Cayley inverse) of the partitioned matrix $A$ of full rank we apply Gauss elimination (without pivoting).
$$AA^{-1} = A^{-1}A = I$$
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix},\; A_{11}' = A_{11},\; A_{22}' = A_{22},\quad A_{11}\in\mathbb{R}^{m\times m},\; A_{12}\in\mathbb{R}^{m\times l},\; A_{12}'\in\mathbb{R}^{l\times m},\; A_{22}\in\mathbb{R}^{l\times l}$$
$$A^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{12}' & B_{22} \end{bmatrix},\; B_{11}' = B_{11},\; B_{22}' = B_{22},\quad B_{11}\in\mathbb{R}^{m\times m},\; B_{12}\in\mathbb{R}^{m\times l},\; B_{12}'\in\mathbb{R}^{l\times m},\; B_{22}\in\mathbb{R}^{l\times l}$$
$$AA^{-1} = A^{-1}A = I \iff \begin{cases} A_{11}B_{11} + A_{12}B_{12}' = B_{11}A_{11} + B_{12}A_{12}' = I_m & (1)\\ A_{11}B_{12} + A_{12}B_{22} = B_{11}A_{12} + B_{12}A_{22} = 0 & (2)\\ A_{12}'B_{11} + A_{22}B_{12}' = B_{12}'A_{11} + B_{22}A_{12}' = 0 & (3)\\ A_{12}'B_{12} + A_{22}B_{22} = B_{12}'A_{12} + B_{22}A_{22} = I_l & (4). \end{cases}$$
Case (i): $A_{11}^{-1}$ exists.
"forward step"
$$\left.\begin{array}{l} A_{11}B_{11} + A_{12}B_{12}' = I_m \;(\text{first left equation: multiply by } -A_{12}'A_{11}^{-1})\\ A_{12}'B_{11} + A_{22}B_{12}' = 0 \;(\text{second right equation}) \end{array}\right\} \iff$$
$$\iff \left.\begin{array}{l} -A_{12}'B_{11} - A_{12}'A_{11}^{-1}A_{12}B_{12}' = -A_{12}'A_{11}^{-1}\\ A_{12}'B_{11} + A_{22}B_{12}' = 0 \end{array}\right\} \iff \left\{\begin{array}{l} A_{11}B_{11} + A_{12}B_{12}' = I_m\\ (A_{22} - A_{12}'A_{11}^{-1}A_{12})B_{12}' = -A_{12}'A_{11}^{-1} \end{array}\right. \Rightarrow$$
$$B_{12}' = -(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}A_{12}'A_{11}^{-1} = -S_{11}^{-1}A_{12}'A_{11}^{-1}$$
or
$$\begin{bmatrix} I_m & 0 \\ -A_{12}'A_{11}^{-1} & I_l \end{bmatrix}\begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} - A_{12}'A_{11}^{-1}A_{12} \end{bmatrix}.$$
Note the "Schur complement" $S_{11} := A_{22} - A_{12}'A_{11}^{-1}A_{12}$.
"backward step"
$$\left.\begin{array}{l} A_{11}B_{11} + A_{12}B_{12}' = I_m\\ B_{12}' = -(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}A_{12}'A_{11}^{-1} \end{array}\right\} \Rightarrow B_{11} = A_{11}^{-1}(I_m - A_{12}B_{12}') = (I_m - B_{12}A_{12}')A_{11}^{-1}$$
$$B_{11} = [I_m + A_{11}^{-1}A_{12}(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}A_{12}']A_{11}^{-1} = A_{11}^{-1} + A_{11}^{-1}A_{12}S_{11}^{-1}A_{12}'A_{11}^{-1}$$
$$A_{11}B_{12} + A_{12}B_{22} = 0 \;(\text{second left equation}) \Rightarrow B_{12} = -A_{11}^{-1}A_{12}B_{22} = -A_{11}^{-1}A_{12}(A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1}$$
$$\iff B_{22} = (A_{22} - A_{12}'A_{11}^{-1}A_{12})^{-1} = S_{11}^{-1}.$$
Case (ii): $A_{22}^{-1}$ exists.
"forward step"
$$\left.\begin{array}{l} A_{11}B_{12} + A_{12}B_{22} = 0 \;(\text{third right equation})\\ A_{12}'B_{12} + A_{22}B_{22} = I_l \;(\text{fourth left equation: multiply by } -A_{12}A_{22}^{-1}) \end{array}\right\} \iff$$
$$\iff \left.\begin{array}{l} A_{11}B_{12} + A_{12}B_{22} = 0\\ -A_{12}A_{22}^{-1}A_{12}'B_{12} - A_{12}B_{22} = -A_{12}A_{22}^{-1} \end{array}\right\} \iff \left\{\begin{array}{l} A_{12}'B_{12} + A_{22}B_{22} = I_l\\ (A_{11} - A_{12}A_{22}^{-1}A_{12}')B_{12} = -A_{12}A_{22}^{-1} \end{array}\right. \Rightarrow$$
$$B_{12} = -(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}A_{12}A_{22}^{-1} = -S_{22}^{-1}A_{12}A_{22}^{-1}$$
or
$$\begin{bmatrix} I_m & -A_{12}A_{22}^{-1} \\ 0 & I_l \end{bmatrix}\begin{bmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{bmatrix} = \begin{bmatrix} A_{11} - A_{12}A_{22}^{-1}A_{12}' & 0 \\ A_{12}' & A_{22} \end{bmatrix}.$$
Note the "Schur complement" $S_{22} := A_{11} - A_{12}A_{22}^{-1}A_{12}'$.
"backward step"
$$\left.\begin{array}{l} A_{12}'B_{12} + A_{22}B_{22} = I_l\\ B_{12} = -(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}A_{12}A_{22}^{-1} \end{array}\right\} \Rightarrow B_{22} = A_{22}^{-1}(I_l - A_{12}'B_{12}) = (I_l - B_{12}'A_{12})A_{22}^{-1}$$
$$B_{22} = [I_l + A_{22}^{-1}A_{12}'(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}A_{12}]A_{22}^{-1} = A_{22}^{-1} + A_{22}^{-1}A_{12}'S_{22}^{-1}A_{12}A_{22}^{-1}$$
$$A_{12}'B_{11} + A_{22}B_{12}' = 0 \;(\text{third left equation}) \Rightarrow B_{12}' = -A_{22}^{-1}A_{12}'B_{11} = -A_{22}^{-1}A_{12}'(A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1}$$
$$\iff B_{11} = (A_{11} - A_{12}A_{22}^{-1}A_{12}')^{-1} = S_{22}^{-1}. \qquad\square$$
The representations $\{B_{11}, B_{12}, B_{21} = B_{12}', B_{22}\}$ in terms of $\{A_{11}, A_{12}, A_{21} = A_{12}', A_{22}\}$ have been derived by T. Banachiewicz (1937). Generalizations are referred to T. Ando (1979), R. A. Brualdi and H. Schneider (1963), F. Burns, D. Carlson, E. Haynsworth and T. Markham (1974), D. Carlson (1980), C. D. Meyer (1973), S. K. Mitra (1982), and C. K. Li and R. Mathias (2000).
We leave the proof of the following fact as an exercise.

Fact (Inverse Partitioned Matrix /IPM/ of a quadratic matrix):

Let the quadratic matrix $A$ be partitioned as
$$A := \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}.$$
Then its Cayley inverse $A^{-1}$ can be partitioned as well:
$$A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}S_{11}^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}S_{11}^{-1} \\ -S_{11}^{-1}A_{21}A_{11}^{-1} & S_{11}^{-1} \end{bmatrix}\quad\text{if } A_{11}^{-1}\text{ exists},$$
$$A^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} S_{22}^{-1} & -S_{22}^{-1}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}S_{22}^{-1} & A_{22}^{-1} + A_{22}^{-1}A_{21}S_{22}^{-1}A_{12}A_{22}^{-1} \end{bmatrix}\quad\text{if } A_{22}^{-1}\text{ exists},$$
and the "Schur complements" are defined by
$$S_{11} := A_{22} - A_{21}A_{11}^{-1}A_{12} \quad\text{and}\quad S_{22} := A_{11} - A_{12}A_{22}^{-1}A_{21}.$$

Facts (Cayley inverse: sum of two matrices):

(s1) $(A + B)^{-1} = A^{-1} - A^{-1}(A^{-1} + B^{-1})^{-1}A^{-1}$
(s2) $(A - B)^{-1} = A^{-1} + A^{-1}(B^{-1} - A^{-1})^{-1}A^{-1}$
(s3) $(A + CBD)^{-1} = A^{-1} - A^{-1}(I + CBDA^{-1})^{-1}CBDA^{-1}$
(s4) $(A + CBD)^{-1} = A^{-1} - A^{-1}C(I + BDA^{-1}C)^{-1}BDA^{-1}$
(s5) $(A + CBD)^{-1} = A^{-1} - A^{-1}CB(I + DA^{-1}CB)^{-1}DA^{-1}$
(s6) $(A + CBD)^{-1} = A^{-1} - A^{-1}CBD(I + A^{-1}CBD)^{-1}A^{-1}$
(s7) $(A + CBD)^{-1} = A^{-1} - A^{-1}CBDA^{-1}(I + CBDA^{-1})^{-1}$
(s8) $(A + CBD)^{-1} = A^{-1} - A^{-1}C(B^{-1} + DA^{-1}C)^{-1}DA^{-1}$
(Sherman-Morrison-Woodbury matrix identity)
(s9) $B(AB + C)^{-1} = (I + BC^{-1}A)^{-1}BC^{-1}$
(s10) $BD(A + CBD)^{-1} = (B^{-1} + DA^{-1}C)^{-1}DA^{-1}$
(Duncan-Guttman matrix identity).
W. J. Duncan (1944) calls (s8) the Sherman-Morrison-Woodbury matrix identity.
If the matrix A is singular consult H. V. Henderson and G. S. Searle (1981), D.
V. Ouellette (1981), W. M. Hager (1989), G. W. Stewart (1977) and K. S. Riedel

(1992). (s10) has been noted by W. J. Duncan (1944) and L. Guttman (1946):
The result is directly derived from the identity
$$(A + CBD)(A + CBD)^{-1} = I \Rightarrow$$
$$\Rightarrow A(A + CBD)^{-1} + CBD(A + CBD)^{-1} = I$$
$$(A + CBD)^{-1} = A^{-1} - A^{-1}CBD(A + CBD)^{-1}$$
$$A^{-1} = (A + CBD)^{-1} + A^{-1}CBD(A + CBD)^{-1}$$
$$DA^{-1} = D(A + CBD)^{-1} + DA^{-1}CBD(A + CBD)^{-1}$$
$$DA^{-1} = (I + DA^{-1}CB)D(A + CBD)^{-1}$$
$$DA^{-1} = (B^{-1} + DA^{-1}C)BD(A + CBD)^{-1}$$
$$(B^{-1} + DA^{-1}C)^{-1}DA^{-1} = BD(A + CBD)^{-1}. \qquad\square$$
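As a quick plausibility check of (s8) and (s10), the following sketch verifies both identities numerically with NumPy; the diagonal shifts only serve to keep the random test matrices safely nonsingular and are an assumption of the example, not of the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # safely nonsingular
B = rng.standard_normal((k, k)) + k * np.eye(k)
C = rng.standard_normal((n, k))
D = rng.standard_normal((k, n))

iA, iB = np.linalg.inv(A), np.linalg.inv(B)

# (s8) Sherman-Morrison-Woodbury
lhs = np.linalg.inv(A + C @ B @ D)
rhs = iA - iA @ C @ np.linalg.inv(iB + D @ iA @ C) @ D @ iA
print(np.allclose(lhs, rhs))

# (s10) Duncan-Guttman
lhs = B @ D @ np.linalg.inv(A + C @ B @ D)
rhs = np.linalg.inv(iB + D @ iA @ C) @ D @ iA
print(np.allclose(lhs, rhs))
```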
Certain results follow directly from the definitions.
Facts (inverses):
(i) $(A \cdot B)^{-1} = B^{-1}\cdot A^{-1}$
(ii) $(A \otimes B)^{-1} = A^{-1}\otimes B^{-1}$
(iii) $A$ positive definite $\iff A^{-1}$ positive definite
(iv) if $A$ and $B$ are positive definite, then $(A \circ B)^{-1}$ and $A^{-1}\circ B^{-1}$ are positive definite and $A^{-1}\circ B^{-1} - (A\circ B)^{-1}$ is positive semidefinite, as well as $(A^{-1}\circ A) - I$ and $I - (A^{-1}\circ A)^{-1}$
(here $\otimes$ denotes the Kronecker-Zehfuss product and $\circ$ the Hadamard product).

Facts (rank factorization):

(i) If the $n\times n$ matrix $A$ is symmetric and positive semidefinite, then its rank factorization is
$$A = \begin{bmatrix} G_1 \\ G_2 \end{bmatrix}[\,G_1' \;\; G_2'\,],$$
where $G_1$ is a lower triangular matrix of the order $O(G_1) = \mathrm{rk}\,A \times \mathrm{rk}\,A$ with $\mathrm{rk}\,G_1 = \mathrm{rk}\,A$, whereas $G_2$ has the format $O(G_2) = (n - \mathrm{rk}\,A)\times \mathrm{rk}\,A$. In this case we speak of a Choleski decomposition.

(ii) In case that the matrix $A$ is positive definite, the matrix block $G_2$ is not needed anymore: $G_1$ is uniquely determined. There holds
$$A^{-1} = (G_1^{-1})'\,G_1^{-1}.$$
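The positive definite case (ii) can be checked with NumPy's Cholesky factorization as in the following sketch; the test matrix is an arbitrary assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # positive definite test matrix

G1 = np.linalg.cholesky(A)             # lower triangular, A = G1 G1'
iG1 = np.linalg.inv(G1)
print(np.allclose(A, G1 @ G1.T))
print(np.allclose(np.linalg.inv(A), iG1.T @ iG1))   # A^{-1} = (G1^{-1})' G1^{-1}
```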
Beside the rank of a quadratic matrix $A$ of the order $O(A) = n\times n$ as the first scalar measure of a matrix, its determinant
$$|A| = \sum_{(j_1,\dots,j_n)} (-1)^{\Phi(j_1,\dots,j_n)} \prod_{i=1}^{n} a_{i j_i}$$
plays a similar role as a second scalar measure. Here the summation is extended over all permutations $(j_1,\dots,j_n)$ of the set of integer numbers $(1,\dots,n)$. $\Phi(j_1,\dots,j_n)$ is the number of pairwise interchanges which transform $(1,\dots,n)$ into $(j_1,\dots,j_n)$.

Laws (determinant)
(i) $|\alpha\cdot A| = \alpha^{n}\cdot|A|$ for an arbitrary scalar $\alpha\in\mathbb{R}$
(ii) $|A\cdot B| = |A|\cdot|B|$
(iii) $|A\otimes B| = |A|^{m}\cdot|B|^{n}$ for an arbitrary $m\times m$ matrix $B$
(iv) $|A'| = |A|$
(v) $|\tfrac12(A + A')| \le |A|$ if $A + A'$ is positive definite
(vi) $|A^{-1}| = |A|^{-1}$ if $A^{-1}$ exists
(vii) $|A| = 0 \iff A$ is singular ($A^{-1}$ does not exist)
(viii) $|A| = 0$ if $A$ is idempotent, $A \ne I$
(ix) $|A| = \prod_{i=1}^{n} a_{ii}$ if $A$ is a diagonal or a triangular matrix
(x) $0 \le |A| \le \prod_{i=1}^{n} a_{ii} = |A\circ I|$ if $A$ is positive definite
(xi) $|A|\cdot|B| \le |A|\cdot\prod_{i=1}^{n} b_{ii} \le |A\circ B|$ if $A$ and $B$ are positive definite
(xii) $\left|\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\right| = \begin{cases} \det A_{11}\,\det(A_{22} - A_{21}A_{11}^{-1}A_{12}), & A_{11}\in\mathbb{R}^{m_1\times m_1},\;\mathrm{rk}\,A_{11} = m_1,\\[2pt] \det A_{22}\,\det(A_{11} - A_{12}A_{22}^{-1}A_{21}), & A_{22}\in\mathbb{R}^{m_2\times m_2},\;\mathrm{rk}\,A_{22} = m_2. \end{cases}$

A submatrix of a rectangular matrix $A$ is the result of a cancelling procedure of certain rows and columns of the matrix $A$. A minor is the determinant of a quadratic submatrix of the matrix $A$. If the matrix $A$ is a quadratic matrix, to any element $a_{ij}$ there exists a minor, namely the determinant of the submatrix of the matrix $A$ which is the result of deleting the $i$-th row and the $j$-th column. By multiplying with $(-1)^{i+j}$ we gain a new element $c_{ij}$ of a matrix $C = [c_{ij}]$. The transposed matrix $C'$ is called the adjoint matrix of the matrix $A$, written $\mathrm{adj}\,A$. Its order is the same as that of the matrix $A$.

Laws (adjoint matrix)

(i) $|A| = \sum_{j=1}^{n} a_{ij}c_{ij},\quad i = 1,\dots,n$
(ii) $|A| = \sum_{j=1}^{n} a_{jk}c_{jk},\quad k = 1,\dots,n$
(iii) $A\cdot(\mathrm{adj}\,A) = (\mathrm{adj}\,A)\cdot A = |A|\cdot I$
(iv) $\mathrm{adj}(A\cdot B) = (\mathrm{adj}\,B)\cdot(\mathrm{adj}\,A)$
(v) $\mathrm{adj}(A\otimes B) = (\mathrm{adj}\,A)\otimes(\mathrm{adj}\,B)$
(vi) $\mathrm{adj}\,A = |A|\cdot A^{-1}$ if $A$ is nonsingular
(vii) $\mathrm{adj}\,A$ positive definite $\iff A$ positive definite.

As a third scalar measure of a quadratic matrix $A$ of the order $O(A) = n\times n$ we introduce the trace $\mathrm{tr}\,A$ as the sum of the diagonal elements,
$$\mathrm{tr}\,A = \sum_{i=1}^{n} a_{ii}.$$

Laws (trace of a matrix)

(i) $\mathrm{tr}(\alpha\cdot A) = \alpha\cdot\mathrm{tr}\,A$ for an arbitrary scalar $\alpha\in\mathbb{R}$
(ii) $\mathrm{tr}(A + B) = \mathrm{tr}\,A + \mathrm{tr}\,B$ for an arbitrary $n\times n$ matrix $B$
(iii) $\mathrm{tr}(A\otimes B) = (\mathrm{tr}\,A)\cdot(\mathrm{tr}\,B)$ for an arbitrary $m\times m$ matrix $B$
(iv) $\mathrm{tr}\,A = \mathrm{tr}(B\cdot C)$ for any factorization $A = B\cdot C$
(v) $\mathrm{tr}\,A'(B\circ C) = \mathrm{tr}(A'\circ B')C$ for arbitrary $n\times n$ matrices $B$ and $C$
(vi) $\mathrm{tr}\,A' = \mathrm{tr}\,A$
(vii) $\mathrm{tr}\,A = \mathrm{rk}\,A$ if $A$ is idempotent
(viii) $0 < \mathrm{tr}\,A = \mathrm{tr}(A\circ I)$ if $A$ is positive definite
(ix) $\mathrm{tr}(A\circ B) \le (\mathrm{tr}\,A)\cdot(\mathrm{tr}\,B)$ if $A$ and $B$ are positive semidefinite.

In correspondence to the $W$-weighted vector (semi)norm
$$\|x\|_W = (x'Wx)^{1/2}$$
the $W$-weighted matrix (semi)norm is
$$\|A\|_W = (\mathrm{tr}\,A'WA)^{1/2}$$
for a given positive (semi)definite matrix $W$ of proper order.
Laws (trace of matrices):
(i) $\mathrm{tr}\,A'WA \ge 0$
(ii) $\mathrm{tr}\,A'WA = 0 \iff WA = 0 \iff A = 0$ if $W$ is positive definite.

A4 Vector-valued Matrix Forms


If $A$ is a rectangular matrix of the order $O(A) = n\times m$ and $a_j$ its $j$-th column, then $\mathrm{vec}\,A$ is the $nm\times 1$ vector
$$\mathrm{vec}\,A = [a_1',\,a_2',\,\dots,\,a_{m-1}',\,a_m']'.$$
In consequence, the operator "vec" transforms a matrix into a vector in such a way that the columns are stacked one after the other.

Definitions (vec, vech, veck):

(i) $\mathrm{vec}\,A = [a_1',\,a_2',\,\dots,\,a_{m-1}',\,a_m']'$.
(ii) Let $A$ be a quadratic symmetric matrix, $A = A'$, of order $O(A) = n\times n$. Then $\mathrm{vech}\,A$ ("vec-half") is the $[n(n+1)/2]\times 1$ vector which is the result of stacking, column by column, those matrix elements which lie on and below its diagonal:
$$A = [a_{ij}] = [a_{ji}] = A' \Rightarrow \mathrm{vech}\,A := [a_{11},\dots,a_{n1},\,a_{22},\dots,a_{n2},\,\dots,\,a_{nn}]'.$$
(iii) Let $A$ be a quadratic, antisymmetric matrix, $A = -A'$, of order $O(A) = n\times n$. Then $\mathrm{veck}\,A$ ("vec-skew") is the $[n(n-1)/2]\times 1$ vector which is generated by stacking, column by column, those matrix elements which lie below its diagonal:
$$A = [a_{ij}] = [-a_{ji}] = -A' \Rightarrow \mathrm{veck}\,A := [a_{21},\dots,a_{n1},\,a_{32},\dots,a_{n2},\,\dots,\,a_{n,n-1}]'.$$

Examples
(i) $A = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \Rightarrow \mathrm{vec}\,A = [a,d,b,e,c,f]'$
(ii) $A = \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} = A' \Rightarrow \mathrm{vech}\,A = [a,b,c,d,e,f]'$
(iii) $A = \begin{bmatrix} 0 & -a & -b & -c \\ a & 0 & -d & -e \\ b & d & 0 & -f \\ c & e & f & 0 \end{bmatrix} = -A' \Rightarrow \mathrm{veck}\,A = [a,b,c,d,e,f]'$.
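A minimal NumPy sketch of the three operators, reproducing examples (i)-(iii) with numerical entries; the helper names vec, vech and veck are this example's own choices.

```python
import numpy as np

def vec(A):                      # stack the columns of A
    return A.reshape(-1, order="F")

def vech(A):                     # elements on and below the diagonal, column-wise (A symmetric)
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def veck(A):                     # elements strictly below the diagonal, column-wise (A antisymmetric)
    return np.concatenate([A[j + 1:, j] for j in range(A.shape[0])])

A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(vec(A))                    # [1 4 2 5 3 6], compare example (i)

S = np.array([[1, 2, 3],
              [2, 4, 5],
              [3, 5, 6]])
print(vech(S))                   # [1 2 3 4 5 6], compare example (ii)

K = np.array([[ 0, -1, -2, -3],
              [ 1,  0, -4, -5],
              [ 2,  4,  0, -6],
              [ 3,  5,  6,  0]])
print(veck(K))                   # [1 2 3 4 5 6], compare example (iii)
```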
Useful identities relating scalar- and vector-valued measures of matrices are reported finally. Here $\otimes$ denotes the Kronecker-Zehfuss product, $\circ$ the Hadamard product and $\odot$ the Khatri-Rao (column-wise Kronecker) product.

Facts (vec and trace forms):

(i) $\mathrm{vec}(A\cdot B\cdot C') = (C\otimes A)\,\mathrm{vec}\,B$
(ii) $\mathrm{vec}(A\cdot B) = (B'\otimes I_n)\,\mathrm{vec}\,A = (B'\otimes A)\,\mathrm{vec}\,I_m = (I_q\otimes A)\,\mathrm{vec}\,B,\quad A\in\mathbb{R}^{n\times m},\,B\in\mathbb{R}^{m\times q}$
(iii) $A\cdot B\cdot c = (c'\otimes A)\,\mathrm{vec}\,B = (A\otimes c')\,\mathrm{vec}\,B',\quad c\in\mathbb{R}^{q}$
(iv) $\mathrm{tr}(A'\cdot B) = (\mathrm{vec}\,A)'\,\mathrm{vec}\,B = (\mathrm{vec}\,A')'\,\mathrm{vec}\,B' = \mathrm{tr}(A\cdot B')$
(v) $\mathrm{tr}(A\cdot B\cdot C'\cdot D') = (\mathrm{vec}\,D)'(C\otimes A)\,\mathrm{vec}\,B = (\mathrm{vec}\,D')'(A\otimes C)\,\mathrm{vec}\,B'$
(vi) $K_{nm}\,\mathrm{vec}\,A = \mathrm{vec}\,A',\quad A\in\mathbb{R}^{n\times m}$
(vii) $K_{qn}(A\otimes B) = (B\otimes A)K_{pm}$
(viii) $K_{qn}(A\otimes B)K_{mp} = B\otimes A$
(ix) $K_{qn}(A\otimes c) = c\otimes A$
(x) $K_{nq}(c\otimes A) = A\otimes c,\quad A\in\mathbb{R}^{n\times m},\,B\in\mathbb{R}^{q\times p},\,c\in\mathbb{R}^{q}$
(xi) $\mathrm{vec}(A\otimes B) = (I_m\otimes K_{pn}\otimes I_q)(\mathrm{vec}\,A\otimes\mathrm{vec}\,B)$
(xii) $A = (a_1,\dots,a_m),\;B := \mathrm{Diag}\,b,\;O(B) = m\times m,\;C' = [c_1,\dots,c_m] \Rightarrow$
$$\mathrm{vec}(A\cdot B\cdot C') = \mathrm{vec}\Big[\sum_{j=1}^{m}(a_j b_j c_j')\Big] = \sum_{j=1}^{m}(c_j\otimes a_j)b_j = [c_1\otimes a_1,\dots,c_m\otimes a_m]\,b = (C\odot A)\,b$$
(xiii) $A = [a_{ij}],\;C = [c_{ij}],\;B := \mathrm{Diag}\,b,\;b = [b_1,\dots,b_m]'\in\mathbb{R}^{m} \Rightarrow$
$$\mathrm{tr}(A\cdot B\cdot C'\cdot B) = (\mathrm{vec}\,B)'\,\mathrm{vec}(C\cdot B\cdot A') = b'(I_m\odot I_m)'(A\odot C)\,b = b'(A\circ C)\,b$$
(xiv) $B := I_m \Rightarrow \mathrm{tr}(A\cdot C') = \tau_m'(A\circ C)\tau_m$ ($\tau_m$ is the $m\times 1$ summation vector $\tau_m := [1,\dots,1]'\in\mathbb{R}^{m}$)
(xv) $\mathrm{vec}\,\mathrm{Diag}\,D := (I_m\circ D)\tau_m = [I_m\circ(A'\cdot B\cdot C)]\tau_m = (I_m\odot I_m)'[I_m\otimes(A'\cdot B\cdot C)]\,\mathrm{vec}\,\mathrm{Diag}\,I_m =$
$$= (I_m\odot I_m)'\,\mathrm{vec}(A'\cdot B\cdot C) = (I_m\odot I_m)'(C'\otimes A')\,\mathrm{vec}\,B = (C\odot A)'\,\mathrm{vec}\,B$$
when $D = A'\cdot B\cdot C$ is factorized.

Facts (Löwner partial ordering):

For any quadratic matrix $A\in\mathbb{R}^{m\times m}$ there holds, in the Löwner partial ordering, the inequality
$$I_m\circ(A'\cdot A) \ge I_m\circ A'\circ A = I_m\circ[(A\odot I_m)'(I_m\odot A)],$$
that is, the difference matrix $I_m\circ(A'\cdot A) - I_m\circ A'\circ A$ is at least positive semidefinite.
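A short NumPy sketch of identities (i), (iv) and (vi), assuming the commutation-matrix convention K_{nm} vec(A) = vec(A') used above; sizes and the seed are arbitrary.

```python
import numpy as np

def vec(A):
    return A.reshape(-1, 1, order="F")

def commutation(n, m):
    # K_{nm} with K_{nm} vec(A) = vec(A') for A of order n x m
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            K[i * m + j, j * n + i] = 1.0
    return K

rng = np.random.default_rng(7)
n, m, q = 3, 4, 2
A = rng.standard_normal((n, m))
B = rng.standard_normal((m, q))
C = rng.standard_normal((q, q))

# (i)  vec(A B C') = (C kron A) vec(B)
print(np.allclose(vec(A @ B @ C.T), np.kron(C, A) @ vec(B)))
# (iv) tr(A' B) = vec(A)' vec(B) for B of the same order as A
B2 = rng.standard_normal((n, m))
print(np.allclose(np.trace(A.T @ B2), (vec(A).T @ vec(B2)).item()))
# (vi) K_{nm} vec(A) = vec(A')
print(np.allclose(commutation(n, m) @ vec(A), vec(A.T)))
```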

A5 Eigenvalues and Eigenvectors


To any quadratic matrix $A$ of the order $O(A) = m\times m$ there exists an eigenvalue $\lambda$ as a scalar which makes the matrix $A - \lambda I_m$ singular. As an equivalent statement, we say that the characteristic equation $|\lambda I_m - A| = 0$ has a zero which could be multiple of degree $s$, if $s$ is the dimension of the related null space $N(A - \lambda I)$. A non-vanishing element $x$ of this null space, for which $Ax = \lambda x$, $x \ne 0$ holds, is called a right eigenvector of $A$. Related vectors $y$ for which $y'A = \lambda y'$, $y \ne 0$, holds, are called left eigenvectors of $A$; they are the right eigenvectors of $A'$. Eigenvectors always belong to a certain eigenvalue and are usually normed in the sense of $x'x = 1$, $y'y = 1$ as long as they have real components. At the same time, eigenvectors which belong to different eigenvalues are always linearly independent: they obviously span a subspace of $R(A)$.
In general, the eigenvalues of a matrix $A$ are complex! There are important exceptions: the orthonormal matrices, also called rotation matrices, whose eigenvalues are $+1$ or $-1$, and idempotent matrices, whose eigenvalues can only be $0$ or $1$, possibly multiple. Generally, we call a matrix with a null eigenvalue a singular matrix.
There is the special case of a symmetric matrix $A = A'$ of order $O(A) = m\times m$. It can be shown that all roots of the characteristic polynomial are real numbers and accordingly $m$ - not necessarily different - real eigenvalues exist. In addition, for different eigenvalues $\lambda$ and $\mu$ the corresponding eigenvectors $x$ and $y$ are orthogonal, that is
$$(\lambda - \mu)\,x'\cdot y = (x'\cdot A')\cdot y - x'(A\cdot y) = 0,\quad \lambda - \mu \ne 0.$$
In case that the eigenvalue $\lambda$ of degree $s$ appears $s$-times, the eigenspace $N(A - \lambda\cdot I_m)$ is $s$-dimensional: we can choose $s$ orthonormal eigenvectors which are orthogonal to all others! In total, we can organize $m$ orthonormal eigenvectors which span the entire $\mathbb{R}^m$. If we restrict ourselves to eigenvectors belonging to eigenvalues $\lambda \ne 0$, we receive the column space $R(A)$. The rank of $A$ coincides with the number of non-vanishing eigenvalues $\{\lambda_1,\dots,\lambda_r\}$:
$$U := [U_1, U_2],\quad O(U) = m\times m,\quad U\cdot U' = U'U = I_m$$
$$U_1 := [u_1,\dots,u_r],\quad O(U_1) = m\times r,\quad r = \mathrm{rk}\,A$$
$$U_2 := [u_{r+1},\dots,u_m],\quad O(U_2) = m\times(m-r),\quad A\cdot U_2 = 0.$$
With the definition of the $r\times r$ diagonal matrix $\Lambda := \mathrm{Diag}(\lambda_1,\dots,\lambda_r)$ of non-vanishing eigenvalues we gain
$$A\cdot U = A\cdot[U_1, U_2] = [U_1\Lambda, 0] = [U_1, U_2]\begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix}.$$

Due to the orthonormality of the matrix $U := [U_1, U_2]$ we achieve the results about eigenvalue-eigenvector analysis and eigenvalue-eigenvector synthesis.
Lemma (eigenvalue-eigenvector analysis: decomposition):
Let $A = A'$ be a symmetric matrix of the order $O(A) = m\times m$. Then there exists an orthonormal matrix $U$ such that
$$U'AU = \mathrm{Diag}(\lambda_1,\dots,\lambda_r, 0,\dots,0)$$
holds. $(\lambda_1,\dots,\lambda_r)$ denotes the set of non-vanishing eigenvalues of $A$ with $r = \mathrm{rk}\,A$, ordered decreasingly.

Lemma (eigenvalue-eigenvector synthesis):

Let $A = A'$ be a symmetric matrix of the order $O(A) = m\times m$. Then there exists a synthetic representation in terms of eigenvalues and eigenvectors of type
$$A = U\cdot\mathrm{Diag}(\lambda_1,\dots,\lambda_r, 0,\dots,0)\,U' = U_1\Lambda U_1'.$$

In the class of symmetric matrices the positive (semi)definite matrices play a special role. Actually, the square roots of their positive (nonnegative) eigenvalues exist:
$$\Lambda^{1/2} := \mathrm{Diag}(\sqrt{\lambda_1},\dots,\sqrt{\lambda_r}).$$
The matrix $A$ is positive semidefinite if and only if there exists a quadratic $m\times m$ matrix $G$ such that $A = GG'$ holds, for instance $G := [U_1\Lambda^{1/2}, 0]$. The quadratic matrix $A$ is positive definite if and only if the $m\times m$ matrix $G$ is not singular. Such a representation leads to the rank factorization $A = G_1\cdot G_1'$ with $G_1 := U_1\cdot\Lambda^{1/2}$. In general, we have
Lemma (representation of the matrix $\bar U_1$):
If $A$ is a positive semidefinite matrix of the order $O(A) = m\times m$ with non-vanishing eigenvalues $\{\lambda_1,\dots,\lambda_r\}$, then there exists an $m\times r$ matrix
$$\bar U_1 := G_1\cdot\Lambda^{-1} = U_1\cdot\Lambda^{-1/2}$$
with
$$U_1'\cdot U_1 = I_r,\quad R(\bar U_1) = R(U_1) = R(A),$$
such that
$$\bar U_1'\cdot A\cdot\bar U_1 = (\Lambda^{-1/2}\cdot U_1')\cdot(U_1\cdot\Lambda\cdot U_1')\cdot(U_1\cdot\Lambda^{-1/2}) = I_r.$$
The synthetic relation of the matrix $A$ is
$$A = G_1\cdot G_1' = U_1\cdot\Lambda\cdot U_1'.$$
The pseudoinverse has a peculiar representation if we introduce the matrices $U_1$, $\bar U_1$ and $\Lambda^{-1}$.
Definition (pseudoinverse):
If we use the representation of the matrix $A$ of type $A = G_1\cdot G_1' = U_1\Lambda U_1'$, then
$$A^{+} := \bar U_1\cdot\bar U_1' = U_1\cdot\Lambda^{-1}\cdot U_1'$$
is the representation of its pseudoinverse, namely
(i) $AA^{+}A = (U_1\Lambda U_1')(U_1\Lambda^{-1}U_1')(U_1\Lambda U_1') = U_1\Lambda U_1' = A$
(ii) $A^{+}AA^{+} = (U_1\Lambda^{-1}U_1')(U_1\Lambda U_1')(U_1\Lambda^{-1}U_1') = U_1\Lambda^{-1}U_1' = A^{+}$
(iii) $AA^{+} = (U_1\Lambda U_1')(U_1\Lambda^{-1}U_1') = U_1U_1' = (AA^{+})'$
(iv) $A^{+}A = (U_1\Lambda^{-1}U_1')(U_1\Lambda U_1') = U_1U_1' = (A^{+}A)'$.


The pseudoinverse $A^{+}$ exists and is unique, even if $A$ is singular. For a nonsingular matrix $A$, the matrix $A^{+}$ is identical with $A^{-1}$. Indeed, in the sense of the pseudoinverse (or any other generalized inverse) the generalized inverse of a rectangular matrix exists. The singular value decomposition is an excellent tool which generalizes the classical eigenvalue-eigenvector decomposition of symmetric matrices.
Lemma (singular value decomposition):
(i) Let $A$ be an $n\times m$ matrix of rank $r := \mathrm{rk}\,A \le \min(n,m)$. Then the matrices $A'A$ and $AA'$ are symmetric positive (semi)definite matrices whose non-vanishing eigenvalues $\{\lambda_1^2,\dots,\lambda_r^2\}$ are positive. Especially
$$r = \mathrm{rk}(A'A) = \mathrm{rk}(AA')$$
holds. $A'A$ contains $0$ as a multiple eigenvalue of degree $m - r$, and $AA'$ has $0$ as a multiple eigenvalue of degree $n - r$.
(ii) With the support of orthonormal eigenvectors of $A'A$ and $AA'$ we are able to introduce an $m\times m$ matrix $V$ and an $n\times n$ matrix $U$ such that $UU' = U'U = I_n$, $VV' = V'V = I_m$ holds and
$$U'AA'U = \mathrm{Diag}(\lambda_1^2,\dots,\lambda_r^2, 0,\dots,0),\qquad V'A'AV = \mathrm{Diag}(\lambda_1^2,\dots,\lambda_r^2, 0,\dots,0).$$
The diagonal matrices on the right-hand side have the different formats $n\times n$ and $m\times m$.
(iii) The original $n\times m$ matrix $A$ can be decomposed according to
$$U'AV = \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix},\quad O(U'AV) = n\times m,$$
with the $r\times r$ diagonal matrix
$$\Lambda := \mathrm{Diag}(\lambda_1,\dots,\lambda_r)$$
of singular values representing the positive roots of the non-vanishing eigenvalues of $A'A$ and $AA'$.
(iv) A synthetic form of the $n\times m$ matrix $A$ is
$$A = U\begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix}V'.$$
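A small NumPy sketch follows which synthesizes the pseudoinverse from the singular value decomposition and checks the four relations (i)-(iv) given for $A^{+}$; the rank-deficient test matrix is an arbitrary assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, r = 6, 4, 2
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # rank-deficient n x m matrix

U, s, Vt = np.linalg.svd(A)              # A = U diag(s) V'
Lam = s[:r]                              # non-vanishing singular values
U1, V1 = U[:, :r], Vt[:r, :].T

A_plus = V1 @ np.diag(1.0 / Lam) @ U1.T  # synthesis of the pseudoinverse from the SVD
print(np.allclose(A_plus, np.linalg.pinv(A)))

# the four defining relations of the pseudoinverse
print(np.allclose(A @ A_plus @ A, A),
      np.allclose(A_plus @ A @ A_plus, A_plus),
      np.allclose(A @ A_plus, (A @ A_plus).T),
      np.allclose(A_plus @ A, (A_plus @ A).T))
```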
We note here that all transformed matrices of type $T^{-1}AT$ of a quadratic matrix $A$ have the same eigenvalues as $A = (AT)T^{-1}$, a fact often used as an invariance property.
What is the relation between eigenvalues and the trace, the determinant, the rank? The answer will be given now.
Lemma (relation between eigenvalues and other scalar measures):
Let $A$ be a quadratic matrix of the order $O(A) = m\times m$ with eigenvalues in decreasing order. Then we have
$$|A| = \prod_{j=1}^{m}\lambda_j,\quad \mathrm{tr}\,A = \sum_{j=1}^{m}\lambda_j,\quad \mathrm{rk}\,A = \mathrm{tr}\,A \;\text{ if } A \text{ is idempotent}.$$
If $A = A'$ is a symmetric matrix with real eigenvalues, then we gain
$$\lambda_1 \ge \max\{a_{jj}\,|\,j = 1,\dots,m\},\qquad \lambda_m \le \min\{a_{jj}\,|\,j = 1,\dots,m\}.$$
At the end we compute the eigenvalues and eigenvectors which relate to the variational problem $x'Ax = \mathrm{extr}$ subject to the condition $x'x = 1$, namely
$$x'Ax - \lambda(x'x - 1) = \underset{x,\,\lambda}{\mathrm{extr}}.$$
The eigenvalue $\lambda$ is the Lagrange multiplier of the optimization problem.

A6 Generalized Inverses
Because the inversion by Cayley inversion is only possible for quadratic nonsingular matrices, we introduce a slightly more general definition in order to invert arbitrary matrices $A$ of the order $O(A) = n\times m$ by so-called generalized inverses or, for short, g-inverses.
An $m\times n$ matrix $G$ is called a g-inverse of the matrix $A$ if it fulfils the equation
$$AGA = A$$
in the sense of Cayley multiplication. Such g-inverses always exist and are unique if and only if $A$ is a nonsingular quadratic matrix. In this case
$$G = A^{-1}\ \text{if } A \text{ is invertible};$$
in other cases we use the notation
$$G = A^{-}\ \text{if } A^{-1} \text{ does not exist}.$$
For the rank of all g-inverses the inequality
$$r := \mathrm{rk}\,A \le \mathrm{rk}\,A^{-} \le \min\{n, m\}$$
holds. In reverse, for any integer $d$ in this interval there exists a g-inverse $A^{-}$ such that
$$d = \mathrm{rk}\,A^{-} = \dim R(A^{-})$$
holds. Especially, even for a singular quadratic matrix $A$ of the order $O(A) = n\times n$ there exist g-inverses $A^{-}$ of full rank $\mathrm{rk}\,A^{-} = n$. In particular, such g-inverses $A_r^{-}$ are of interest which have the same rank as the matrix $A$, namely
$$\mathrm{rk}\,A_r^{-} = r = \mathrm{rk}\,A.$$
Those reflexive g-inverses $A_r^{-}$ are equivalently characterized by the additional condition
$$A_r^{-}AA_r^{-} = A_r^{-},$$
but they are not necessarily symmetric for symmetric matrices $A$. In general,
$$A = A' \text{ and } A^{-} \text{ g-inverse of } A \;\Rightarrow\; (A^{-})' \text{ g-inverse of } A \;\Rightarrow\; A_{rs}^{-} := A^{-}A(A^{-})' \text{ is a reflexive symmetric g-inverse of } A.$$
For the construction of $A_{rs}^{-}$ we only need an arbitrary g-inverse of $A$. On the other side, $A_{rs}^{-}$ is not unique. There exist certain matrix functions which are independent of the choice of the g-inverse. For instance,

$$A(A'A)^{-}A' \quad\text{and}\quad A'(AA')^{-}A$$
are such functions, and g-inverses of $A'A$ or $AA'$ can be used to generate special g-inverses of $A$. For instance,
$$A_\ell^{-} := (A'A)^{-}A' \quad\text{and}\quad A_m^{-} := A'(AA')^{-}$$
have the special reproducing properties
$$A(A'A)^{-}A'A = AA_\ell^{-}A = A$$
and
$$AA'(AA')^{-}A = AA_m^{-}A = A,$$
which can be generalized, in case that $W$ and $S$ are positive semidefinite matrices, to
$$WA(A'WA)^{-}A'WA = WA,\qquad ASA'(ASA')^{-}AS = AS,$$
where the matrices
$$WA(A'WA)^{-}A'W \quad\text{and}\quad SA'(ASA')^{-}AS$$
are independent of the choice of the g-inverses $(A'WA)^{-}$ and $(ASA')^{-}$.
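The invariance claim can be illustrated numerically as in the sketch below: two different g-inverses of A'A are built (the pseudoinverse and a perturbed one), and A(A'A)^-A' comes out the same. NumPy, the seed and the construction of the second g-inverse are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(9)
n, m, r = 5, 4, 2
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # rank-deficient n x m

M = A.T @ A
M_plus = np.linalg.pinv(M)
Z = rng.standard_normal((m, m))
M_g = M_plus + (np.eye(m) - M_plus @ M) @ Z    # another g-inverse of M = A'A: M M_g M = M
print(np.allclose(M @ M_g @ M, M))

# A (A'A)^- A' does not depend on the choice of the g-inverse ...
print(np.allclose(A @ M_plus @ A.T, A @ M_g @ A.T))

# ... and A_l^- := (A'A)^- A' reproduces A: A A_l^- A = A
A_l = M_g @ A.T
print(np.allclose(A @ A_l @ A, A))
```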

A beautiful interpretation of the various g-inverses is based on the fact that the matrices
$$(AA^{-})(AA^{-}) = (AA^{-}A)A^{-} = AA^{-} \quad\text{and}\quad (A^{-}A)(A^{-}A) = A^{-}(AA^{-}A) = A^{-}A$$
are idempotent and can therefore be geometrically interpreted as projections. The image of $AA^{-}$, namely
$$R(AA^{-}) = R(A) = \{Ax\,|\,x\in\mathbb{R}^m\} \subset \mathbb{R}^n,$$
can be completed by the projection $A^{-}A$ along the null space
$$N(A^{-}A) = N(A) = \{x\,|\,Ax = 0\} \subset \mathbb{R}^m.$$
By the choice of the g-inverse we are able to choose the projection direction of $AA^{-}$ and the image of the projection $A^{-}A$ if we take advantage of the complementary subspaces
$$R(A^{-}A)\oplus N(A^{-}A) = \mathbb{R}^m \quad\text{and}\quad R(AA^{-})\oplus N(AA^{-}) = \mathbb{R}^n,$$
using the symbol "$\oplus$" as the sign of the "direct sum" of linear spaces which only have the zero element in common. Finally we use the corresponding dimensions
$$\dim R(A^{-}A) = r = \mathrm{rk}\,A = \dim R(AA^{-}) \;\Rightarrow\; \begin{cases} \dim N(A^{-}A) = m - \mathrm{rk}\,A = m - r\\ \dim N(AA^{-}) = n - \mathrm{rk}\,A = n - r, \end{cases}$$
independent of the special rank of the g-inverse $A^{-}$, which is determined by the subspaces $R(A^{-}A)$ and $N(AA^{-})$, respectively.

[Figure: the projections $AA^{-}$ with $N(AA^{-})$ in $\mathbb{R}^n$ and $A^{-}A$ with $N(A^{-}A)$, $R(A^{-}A)$ in $\mathbb{R}^m$]
Example (geodetic networks):

In a geodetic network, the projections $A^{-}A$ correspond to S-transformations in the sense of W. Baarda (1973).

Example ($A_\ell^{-}$ and $A_m^{-}$ g-inverses):

The projections $AA_\ell^{-} = A(A'A)^{-}A'$ guarantee that the subspaces $R(AA_\ell^{-})$ and $N(AA_\ell^{-})$ are orthogonal to each other. The same holds for the subspaces $R(A_m^{-}A)$ and $N(A_m^{-}A)$ of the projections $A_m^{-}A = A'(AA')^{-}A$.
In general, there exists more than one g-inverse which leads to identical projections $AA^{-}$ and $A^{-}A$. For instance, following A. Ben-Israel and T. N. E. Greville (1974, p. 59) we learn that the reflexive g-inverse which follows from
$$A_r^{-} = (A^{-}A)A^{-}(AA^{-}) = A^{-}AA^{-}$$
comprises the class of all reflexive g-inverses. Therefore it is obvious that the reflexive g-inverses $A_r^{-}$ correspond to exactly one pair of projections $AA^{-}$ and $A^{-}A$, and conversely. In the special case of a symmetric matrix $A$, $A = A'$, and $n = m$ we know, due to
$$R(AA^{-}) = R(A) \perp N(A') = N(A) = N(A^{-}A),$$
that the column space $R(AA^{-})$ is orthogonal to the null space $N(A^{-}A)$, illustrated by the sign "$\perp$". If the complementary subspaces $R(A^{-}A)$ and $N(AA^{-})$ are orthogonal to each other, the postulate of a symmetric reflexive g-inverse leads to
$$A_{rs}^{-} := (A^{-}A)A^{-}(A^{-}A)' = A^{-}A(A^{-})',$$
if $A^{-}$ is a suited g-inverse.

There is no insurance that the complementary subspaces $R(A^{-}A)$, $N(A^{-}A)$ and $R(AA^{-})$, $N(AA^{-})$ are orthogonal. If such a result is to be reached, we should use
the uniquely defined pseudoinverse $A^{+}$,
also called Moore-Penrose inverse,
for which holds
$$R(A^{+}A) \perp N(A^{+}A),\qquad R(AA^{+}) \perp N(AA^{+})$$
or, equivalently,
$$AA^{+} = (AA^{+})',\qquad A^{+}A = (A^{+}A)'.$$
If we depart from an arbitrary g-inverse $(AA'A)^{-}$, the pseudoinverse $A^{+}$ can be built as
$$A^{+} := A'(AA'A)^{-}A' \quad\text{(Zlobec formula)}$$
or
$$A^{+} := A'(AA')^{-}A(A'A)^{-}A' \quad\text{(Bjerhammar formula)},$$
if both the g-inverses $(AA')^{-}$ and $(A'A)^{-}$ exist. The Moore-Penrose inverse fulfils the Penrose equations:

(i) $AA^{+}A = A$ (g-inverse)
(ii) $A^{+}AA^{+} = A^{+}$ (reflexivity)
(iii) $AA^{+} = (AA^{+})'$
(iv) $A^{+}A = (A^{+}A)'$, (iii) and (iv) expressing symmetry due to orthogonal projection.

Lemma (Penrose equations)

Let a rectangular matrix $A$ of the order $O(A)$ be given. A generalized matrix inverse which is rank preserving, $\mathrm{rk}(A) = \mathrm{rk}(A^{+})$, fulfils the axioms of the Penrose equations (i)-(iv).

For the special case of a symmetric matrix $A$ the pseudoinverse $A^{+}$ is also symmetric, fulfilling
$$R(A^{+}A) = R(AA^{+}) \perp N(AA^{+}) = N(A^{+}A),$$
in addition
$$A^{+} = A(A^{2})^{-}A(A^{2})^{-}A.$$
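The Zlobec formula can be checked numerically as sketched below, using the pseudoinverse of AA'A as one admissible choice of g-inverse; the rank-deficient test matrix is an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(10)
n, m, r = 5, 4, 2
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # rank-deficient n x m

# Zlobec formula A^+ = A'(A A' A)^- A' with pinv as the g-inverse of A A' A
A_plus = A.T @ np.linalg.pinv(A @ A.T @ A) @ A.T
print(np.allclose(A_plus, np.linalg.pinv(A)))
```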

Various formulas for computing certain g-inverses exist, for instance by the method of rank factorization. Let $A$ be an $n\times m$ matrix of rank $r := \mathrm{rk}\,A$ such that
$$A = GF,\quad O(G) = n\times r,\quad O(F) = r\times m.$$
Due to the inequality $r \le \mathrm{rk}\,G^{-} \le \min\{r, n\} = r$, $G$ possesses only reflexive g-inverses $G_r^{-}$, because of
$$I_{r\times r} = [(G'G)^{-1}G']G = [(G'G)^{-1}G'](GG_r^{-}G) = G_r^{-}G,$$
represented by left inverses in the sense of $G_L^{-}G = I$. In a similar way, all g-inverses of $F$ are reflexive and right inverses, for instance $F_R^{-} := F'(FF')^{-1}$.
The whole class of reflexive g-inverses of $A$ can be represented by
$$A_r^{-} := F_R^{-}G_r^{-} = F_R^{-}G_L^{-}.$$
In this case we also find the pseudoinverse, namely
$$A^{+} := F'(FF')^{-1}(G'G)^{-1}G',$$
because of
$$R(A^{+}A) = R(F') \perp N(F) = N(A^{+}A) = N(A),$$
$$R(AA^{+}) = R(G) \perp N(G') = N(AA^{+}) = N(A').$$
If we want to give up the orthogonality conditions, in the case of a quadratic matrix $A = GF$ we could take advantage of the projections
$$A_r^{-}A = AA_r^{-}$$
and postulate
$$R(A_r^{-}A) = R(AA_r^{-}) = R(G),\qquad N(AA_r^{-}) = N(A_r^{-}A) = N(F).$$
In consequence, if $FG$ is a nonsingular matrix, we enjoy the representation
$$A_r^{-} := G(FG)^{-2}F,$$
which reduces, in case that $A$ is a symmetric matrix, to the pseudoinverse $A^{+}$.
Dual methods of computing g-inverses $A^{-}$ are based on bases of the null spaces, both for $F$ and $G$, or for $A$ and $A'$. On the one side we need a matrix $E_F$ with
$$FE_F' = 0,\quad \mathrm{rk}\,E_F = m - r,\qquad\text{versus}\qquad G'E_{G'} = 0,\quad \mathrm{rk}\,E_{G'} = n - r$$
on the other side. The bordered matrix of the order $(n + m - r)\times(n + m - r)$ is automatically nonsingular and has the Cayley inverse
$$\begin{bmatrix} A & E_{G'} \\ E_F & 0 \end{bmatrix}^{-1} = \begin{bmatrix} A^{+} & E_F^{+} \\ E_{G'}^{+} & 0 \end{bmatrix}$$
with the pseudoinverse $A^{+}$ on the upper left side. Details can be derived from A. Ben-Israel and T. N. E. Greville (1974, p. 228).
If the null space bases are normalized in the sense of
$$\langle E_F\,|\,E_F'\rangle = I_{m-r},\qquad \langle E_{G'}'\,|\,E_{G'}\rangle = I_{n-r},$$
then because of
$$E_F^{+} = E_F'\langle E_F\,|\,E_F'\rangle^{-1} = E_F' \quad\text{and}\quad E_{G'}^{+} = \langle E_{G'}'\,|\,E_{G'}\rangle^{-1}E_{G'}' = E_{G'}'$$
we obtain
$$\begin{bmatrix} A & E_{G'} \\ E_F & 0 \end{bmatrix}^{-1} = \begin{bmatrix} A^{+} & E_F' \\ E_{G'}' & 0 \end{bmatrix}.$$

These formulas gain a special structure if the matrix $A$ is symmetric of the order $O(A) = m\times m$. In this case
$$E_{G'} = E_F' =: E',\quad O(E) = (m-r)\times m,\quad \mathrm{rk}\,E = m - r$$
and
$$\begin{bmatrix} A & E' \\ E & 0 \end{bmatrix}^{-1} = \begin{bmatrix} A^{+} & E'\langle E\,|\,E'\rangle^{-1} \\ \langle E\,|\,E'\rangle^{-1}E & 0 \end{bmatrix}.$$
On the basis of such a relation, namely $EA^{+} = 0$, there follows
$$I_m = AA^{+} + E'\langle E\,|\,E'\rangle^{-1}E = (A + E'E)[A^{+} + E'(EE'EE')^{-1}E]$$
and, with the projection (S-transformation),
$$A^{+}A = I_m - E'\langle E\,|\,E'\rangle^{-1}E = (A + E'E)^{-1}A$$
and
$$A^{+} = (A + E'E)^{-1} - E'(EE'EE')^{-1}E,$$
the pseudoinverse of $A$, with
$$R(A^{+}A) = R(AA^{+}) = R(A) \perp N(A) = R(E').$$
In the case of a symmetric, reflexive g-inverse $A_{rs}^{-}$ there holds the orthogonality or complementarity
$$R(A_{rs}^{-}A) \perp N(AA_{rs}^{-}),\qquad N(AA_{rs}^{-}) \text{ complementary to } R(AA_{rs}^{-}),$$
which is guaranteed by a matrix $K$, $\mathrm{rk}\,K = m - r$, $O(K) = (m-r)\times m$, such that $KE'$ is a non-singular matrix.
At the same time we take advantage of the bordering of the matrix $A$ by $K$ and $K'$, a non-singular matrix of the order $(2m-r)\times(2m-r)$:
$$\begin{bmatrix} A & K' \\ K & 0 \end{bmatrix}^{-1} = \begin{bmatrix} A_{rs}^{-} & K_R^{-} \\ (K_R^{-})' & 0 \end{bmatrix}.$$
$K_R^{-} := E'(KE')^{-1}$ is the right inverse of $K$. Obviously we gain the symmetric reflexive g-inverse $A_{rs}^{-}$ whose columns are orthogonal to $K'$:
$$R(A_{rs}^{-}A) \perp R(K') = N(AA_{rs}^{-}),$$
$$KA_{rs}^{-} = 0 \;\Rightarrow\; I_m = AA_{rs}^{-} + K'(EK')^{-1}E = (A + K'K)[A_{rs}^{-} + E'(EK'EK')^{-1}E]$$
and, with the projection (S-transformation),
$$A_{rs}^{-}A = I_m - E'(KE')^{-1}K = (A + K'K)^{-1}A',$$
$$A_{rs}^{-} = (A + K'K)^{-1} - E'(EK'EK')^{-1}E,$$
the symmetric reflexive g-inverse.

For the special case of a symmetric and positive semidefinite $m\times m$ matrix $A$ the two matrix sets $U$ and $V$ reduce to one. Based on the matrix decomposition
$$A = [U_1, U_2]\begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} U_1' \\ U_2' \end{bmatrix} = U_1\Lambda U_1',$$
we find the different g-inverses listed in the following lemma.

Lemma (g-inverses of symmetric and positive semidefinite matrices):

(i) $A^{-} = [U_1, U_2]\begin{bmatrix} \Lambda^{-1} & L_{12} \\ L_{21} & L_{22} \end{bmatrix}\begin{bmatrix} U_1' \\ U_2' \end{bmatrix}$,
(ii) reflexive g-inverse
$$A_r^{-} = [U_1, U_2]\begin{bmatrix} \Lambda^{-1} & L_{12} \\ L_{21} & L_{21}\Lambda L_{12} \end{bmatrix}\begin{bmatrix} U_1' \\ U_2' \end{bmatrix}$$
(iii) reflexive and symmetric g-inverse
$$A_{rs}^{-} = [U_1, U_2]\begin{bmatrix} \Lambda^{-1} & L_{12} \\ L_{12}' & L_{12}'\Lambda L_{12} \end{bmatrix}\begin{bmatrix} U_1' \\ U_2' \end{bmatrix}$$
(iv) pseudoinverse
$$A^{+} = [U_1, U_2]\begin{bmatrix} \Lambda^{-1} & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} U_1' \\ U_2' \end{bmatrix} = U_1\Lambda^{-1}U_1'.$$

We look at a representation of the Moore-Penrose inverse in terms of $U_2$, the basis of the null space $N(A^{-}A) = N(A)$. In these terms we find
$$E' := U_2 \;\Rightarrow\; \begin{bmatrix} A & U_2 \\ U_2' & 0 \end{bmatrix}^{-1} = \begin{bmatrix} A^{+} & U_2 \\ U_2' & 0 \end{bmatrix},$$
and by means of the fundamental relation
$$A^{+}A = \lim_{\delta\to 0}(A + \delta I_m)^{-1}A = AA^{+} = I_m - U_2U_2' = U_1U_1'$$
we generate the fundamental relation of the pseudoinverse
$$A^{+} = (A + U_2U_2')^{-1} - U_2U_2'.$$
The main target of our discussion of various g-inverses is the easy handling of representations of solutions of arbitrary linear equations and their characterizations.
We depart from the solution of a consistent system of linear equations,
$$Ax = c,\quad O(A) = n\times m,\quad c\in R(A) \;\Rightarrow\; x = A^{-}c \;\text{ for any g-inverse } A^{-}.$$
$x = A^{-}c$ is the general solution of such a linear system of equations. If we want to work with one special g-inverse, we can represent the general solution by
$$x = A^{-}c + (I_m - A^{-}A)z \quad\text{for all } z\in\mathbb{R}^m,$$
since the subspaces $N(A)$ and $R(I_m - A^{-}A)$ are identical. We test the consistency of our system by means of the identity
$$AA^{-}c = c:$$
$c$ is mapped by the projection $AA^{-}$ onto itself.
Similarly we solve the matrix equation $AXB = C$ by a consistency test: the existence of a solution is granted by the identity
$$AA^{-}CB^{-}B = C \quad\text{for any g-inverses } A^{-} \text{ and } B^{-}.$$
If this condition is fulfilled, we are able to generate the general solution by
$$X = A^{-}CB^{-} + Z - A^{-}AZBB^{-},$$
where $Z$ is an arbitrary matrix of suitable order. We can use arbitrary g-inverses $A^{-}$ and $B^{-}$, for instance the pseudoinverses $A^{+}$ and $B^{+}$, which for $Z = 0$ leads to a particular solution built from two-sided orthogonal projections.
How can we reduce the matrix equation $AXB = C$ to a vector equation? The vec-operator is the door opener:
$$AXB = C \iff (B'\otimes A)\,\mathrm{vec}\,X = \mathrm{vec}\,C.$$
The general solution of our matrix equation reads
$$\mathrm{vec}\,X = (B'\otimes A)^{-}\mathrm{vec}\,C + [I - (B'\otimes A)^{-}(B'\otimes A)]\,\mathrm{vec}\,Z.$$
Here we can use the identity
$$(A\otimes B)^{-} = A^{-}\otimes B^{-},$$
a g-inverse of the Kronecker-Zehfuss product generated by two g-inverses.
At the end we solve the more general equation $Ax = By$ of consistent type $R(B)\subset R(A)$ by
Lemma (consistent system of homogeneous equations $Ax = By$):
Given the homogeneous system of linear equations $Ax = By$ for all $y$ constrained by $By\in R(A)$. Then a solution $x = Ly$ can be given under the condition
$$R(B)\subset R(A).$$
In this case the matrix $L$ may be chosen as
$$L = A^{-}B \quad\text{for a certain g-inverse } A^{-}.$$
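The vec route for AXB = C is sketched below with NumPy, using pseudoinverses as the g-inverses; the consistency of the test equation is arranged by construction, which is an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(14)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((2, 5))
X_true = rng.standard_normal((3, 2))
C = A @ X_true @ B                         # guarantees consistency of A X B = C

# consistency test with the pseudoinverses as g-inverses: A A^- C B^- B = C
A_g, B_g = np.linalg.pinv(A), np.linalg.pinv(B)
print(np.allclose(A @ A_g @ C @ B_g @ B, C))

# vec form: (B' kron A) vec X = vec C, particular solution via a g-inverse
vecC = C.reshape(-1, 1, order="F")
M = np.kron(B.T, A)
vecX = np.linalg.pinv(M) @ vecC
X = vecX.reshape(3, 2, order="F")
print(np.allclose(A @ X @ B, C))           # X solves the matrix equation (not necessarily X_true)
```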
Appendix B: Matrix Analysis
A short version of matrix analysis is presented. Arbitrary derivatives of scalar-valued, vector-valued and matrix-valued vector and matrix functions of functionally independent variables are defined. Extensions for differentiating symmetric and antisymmetric matrices are given. Special examples for functionally dependent matrix variables are reviewed.
B1 Derivatives of Scalar valued and Vector valued Vector Functions
Here we present the analysis of differentiating scalar-valued and vector-valued
vector functions enriched by examples.
Definition (derivative of a scalar-valued vector function):
Let a scalar-valued function $f(x)$ of a vector $x$ of the order $O(x) = m\times 1$ be given, with $x'$ the corresponding $1\times m$ row vector. Then we call
$$Df(x) = [D_1 f(x),\dots,D_m f(x)] := \frac{\partial f}{\partial x'}$$
the first derivative of $f(x)$ with respect to $x'$.

Matrix differentiation is based on the following definition.

Definition (derivative of a matrix-valued matrix function):
Let an $n\times q$ matrix-valued function $F(X)$ of an $m\times p$ matrix of functionally independent variables $X$ be given. Then the $nq\times mp$ Jacobi matrix of first derivatives of $F$ is defined by
$$J_F = DF(X) := \frac{\partial\,\mathrm{vec}\,F(X)}{\partial(\mathrm{vec}\,X)'}.$$

The definition of first derivatives of matrix functions can be motivated as follows. The matrices $F = [f_{ij}]\in\mathbb{R}^{n\times q}$ and $X = [x_{k\ell}]\in\mathbb{R}^{m\times p}$ are based on two-dimensional arrays. In contrast, the array of first derivatives
$$\Big[\frac{\partial f_{ij}}{\partial x_{k\ell}}\Big] = [J_{ijk\ell}]\in\mathbb{R}^{n\times q\times m\times p}$$
is four-dimensional and automatically outside the usual frame of matrix algebra of two-dimensional arrays. By means of the operations $\mathrm{vec}\,F$ and $\mathrm{vec}\,X$ we vectorize the matrices $F$ and $X$. Accordingly we take advantage of the derivative of the vector $\mathrm{vec}\,F(X)$ with respect to the vector $\mathrm{vec}\,X$, collected in the Jacobi matrix $J_F$, a two-dimensional array.

Examples
(i) $f(x) = x'Ax = a_{11}x_1^2 + (a_{12}+a_{21})x_1x_2 + a_{22}x_2^2$
$$Df(x) = [D_1 f(x), D_2 f(x)] = \frac{\partial f}{\partial x'} = [\,2a_{11}x_1 + (a_{12}+a_{21})x_2 \;\;|\;\; (a_{12}+a_{21})x_1 + 2a_{22}x_2\,] = x'(A + A')$$
(ii) $f(x) = Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix}$
$$J_F = Df(x) = \frac{\partial f}{\partial x'} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A$$
(iii) $F(X) = X^2 = \begin{bmatrix} x_{11}^2 + x_{12}x_{21} & x_{11}x_{12} + x_{12}x_{22} \\ x_{21}x_{11} + x_{22}x_{21} & x_{21}x_{12} + x_{22}^2 \end{bmatrix}$
$$\mathrm{vec}\,F(X) = \begin{bmatrix} x_{11}^2 + x_{12}x_{21} \\ x_{21}x_{11} + x_{22}x_{21} \\ x_{11}x_{12} + x_{12}x_{22} \\ x_{21}x_{12} + x_{22}^2 \end{bmatrix},\qquad (\mathrm{vec}\,X)' = [x_{11}, x_{21}, x_{12}, x_{22}]$$
$$J_F = DF(X) = \frac{\partial\,\mathrm{vec}\,F(X)}{\partial(\mathrm{vec}\,X)'} = \begin{bmatrix} 2x_{11} & x_{12} & x_{21} & 0 \\ x_{21} & x_{11}+x_{22} & 0 & x_{21} \\ x_{12} & 0 & x_{11}+x_{22} & x_{12} \\ 0 & x_{12} & x_{21} & 2x_{22} \end{bmatrix},\qquad O(J_F) = 4\times 4.$$
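A small NumPy sketch of example (iii): the Jacobi matrix is computed by central finite differences and compared with the closed form d vec(X^2) = (X' kron I + I kron X) d vec X, a known identity not derived in the text; the step size and test matrix are assumptions of the example.

```python
import numpy as np

def vec(A):
    return A.reshape(-1, 1, order="F")

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# closed-form Jacobi matrix of F(X) = X^2
J = np.kron(X.T, np.eye(2)) + np.kron(np.eye(2), X)

# finite-difference check of J_F = d vec F / d (vec X)'
h = 1e-6
J_num = np.zeros((4, 4))
for k in range(4):
    E = np.zeros(4); E[k] = h
    dX = E.reshape(2, 2, order="F")
    J_num[:, k] = (vec((X + dX) @ (X + dX)) - vec((X - dX) @ (X - dX))).ravel() / (2 * h)

print(np.allclose(J, J_num, atol=1e-5))
```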

B2 Derivatives of Trace Forms


Up to now we have assumed that the vector $x$ or the matrix $X$ are functionally independent. For instance, the matrix $X$ cannot be a symmetric matrix $X = [x_{ij}] = [x_{ji}] = X'$ or an antisymmetric matrix $X = [x_{ij}] = [-x_{ji}] = -X'$. In the case of functionally dependent variables, for instance $x_{ij} = x_{ji}$ or $x_{ij} = -x_{ji}$, we can take advantage of the chain rule in order to derive the differentiation procedure.

$$\frac{\partial}{\partial X}\mathrm{tr}(AX) = \begin{cases} A', & \text{if } X \text{ consists of functionally independent elements};\\ A' + A - \mathrm{Diag}[a_{11},\dots,a_{nn}], & \text{if the } n\times n \text{ matrix } X \text{ is symmetric};\\ A' - A, & \text{if the } n\times n \text{ matrix } X \text{ is antisymmetric}. \end{cases}$$
$$\frac{\partial}{\partial(\mathrm{vec}\,X)'}\mathrm{tr}(AX) = \begin{cases} [\mathrm{vec}\,A']', & \text{if } X \text{ consists of functionally independent elements};\\ [\mathrm{vec}(A' + A - \mathrm{Diag}[a_{11},\dots,a_{nn}])]', & \text{if the } n\times n \text{ matrix } X \text{ is symmetric};\\ [\mathrm{vec}(A' - A)]', & \text{if the } n\times n \text{ matrix } X \text{ is antisymmetric}; \end{cases}$$
for instance
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},\qquad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}.$$

Case #1: "the matrix $X$ consists of functionally independent elements"
$$\frac{\partial}{\partial X} = \begin{bmatrix} \frac{\partial}{\partial x_{11}} & \frac{\partial}{\partial x_{12}} \\ \frac{\partial}{\partial x_{21}} & \frac{\partial}{\partial x_{22}} \end{bmatrix},\qquad \frac{\partial}{\partial X}\mathrm{tr}(AX) = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} = A'.$$

Case #2: "the $n\times n$ matrix $X$ is symmetric: $X = X'$"
$$x_{12} = x_{21} \;\Rightarrow\; \mathrm{tr}(AX) = a_{11}x_{11} + (a_{12}+a_{21})x_{21} + a_{22}x_{22}$$
$$\frac{\partial}{\partial X} = \begin{bmatrix} \frac{\partial}{\partial x_{11}} & \frac{dx_{21}}{dx_{12}}\frac{\partial}{\partial x_{21}} \\ \frac{\partial}{\partial x_{21}} & \frac{\partial}{\partial x_{22}} \end{bmatrix} = \begin{bmatrix} \frac{\partial}{\partial x_{11}} & \frac{\partial}{\partial x_{21}} \\ \frac{\partial}{\partial x_{21}} & \frac{\partial}{\partial x_{22}} \end{bmatrix}$$
$$\frac{\partial}{\partial X}\mathrm{tr}(AX) = \begin{bmatrix} a_{11} & a_{12}+a_{21} \\ a_{12}+a_{21} & a_{22} \end{bmatrix} = A' + A - \mathrm{Diag}(a_{11},\dots,a_{nn}).$$

Case #3: "the $n\times n$ matrix $X$ is antisymmetric: $X = -X'$"
$$x_{11} = x_{22} = 0,\; x_{12} = -x_{21} \;\Rightarrow\; \mathrm{tr}(AX) = (a_{12} - a_{21})x_{21}$$
$$\frac{\partial}{\partial X} = \begin{bmatrix} \frac{\partial}{\partial x_{11}} & \frac{dx_{21}}{dx_{12}}\frac{\partial}{\partial x_{21}} \\ \frac{\partial}{\partial x_{21}} & \frac{\partial}{\partial x_{22}} \end{bmatrix} = \begin{bmatrix} \frac{\partial}{\partial x_{11}} & -\frac{\partial}{\partial x_{21}} \\ \frac{\partial}{\partial x_{21}} & \frac{\partial}{\partial x_{22}} \end{bmatrix}$$
$$\frac{\partial}{\partial X}\mathrm{tr}(AX) = \begin{bmatrix} 0 & -a_{12}+a_{21} \\ a_{12}-a_{21} & 0 \end{bmatrix} = A' - A.$$

Let us now assume that the matrix $X$ of variables $x_{ij}$ always consists of functionally independent elements. We note some useful identities for first derivatives.
Scalar-valued functions of vectors
$$\frac{\partial}{\partial x'}(a'x) = a' \hspace{4em} (B1)$$
$$\frac{\partial}{\partial x'}(x'Ax) = x'(A + A'). \hspace{4em} (B2)$$

Scalar-valued functions of a matrix: trace

$$\frac{\partial\,\mathrm{tr}(AX)}{\partial X} = A'; \hspace{4em} (B3)$$
especially:
$$\frac{\partial\,a'Xb}{\partial(\mathrm{vec}\,X)'} = \frac{\partial\,\mathrm{tr}(ba'X)}{\partial(\mathrm{vec}\,X)'} = b'\otimes a';$$
$$\frac{\partial}{\partial X}\mathrm{tr}(X'AX) = (A + A')X; \hspace{4em} (B4)$$
especially:
$$\frac{\partial\,\mathrm{tr}(X'X)}{\partial(\mathrm{vec}\,X)'} = 2(\mathrm{vec}\,X)';$$
$$\frac{\partial}{\partial X}\mathrm{tr}(XAX) = X'A' + A'X'; \hspace{4em} (B5)$$
especially:
$$\frac{\partial\,\mathrm{tr}\,X^2}{\partial(\mathrm{vec}\,X)'} = 2(\mathrm{vec}\,X')';$$
$$\frac{\partial}{\partial X}\mathrm{tr}(AX^{-1}) = -(X^{-1}AX^{-1})',\ \text{if } X \text{ is nonsingular}; \hspace{4em} (B6)$$
especially:
$$\frac{\partial\,\mathrm{tr}(X^{-1})}{\partial(\mathrm{vec}\,X)'} = -[\mathrm{vec}(X^{-2})']';$$
$$\frac{\partial\,a'X^{-1}b}{\partial(\mathrm{vec}\,X)'} = \frac{\partial\,\mathrm{tr}(ba'X^{-1})}{\partial(\mathrm{vec}\,X)'} = -b'(X^{-1})'\otimes a'X^{-1};$$
$$\frac{\partial}{\partial X}\mathrm{tr}\,X^{\alpha} = \alpha(X')^{\alpha-1},\ \text{if } X \text{ is quadratic}; \hspace{4em} (B7)$$
especially:
$$\frac{\partial\,\mathrm{tr}\,X}{\partial(\mathrm{vec}\,X)'} = (\mathrm{vec}\,I)'.$$

B3 Derivatives of Determinantal Forms

The derivatives of scalar-valued determinantal forms will be listed now.

$$\frac{\partial}{\partial X}|AXB'| = A'(\mathrm{adj}\,AXB')'B = |AXB'|\,A'(BX'A')^{-1}B,\ \text{if } AXB' \text{ is nonsingular}; \hspace{3em} (B8)$$
especially:
$$\frac{\partial\,a'Xb}{\partial(\mathrm{vec}\,X)'} = b'\otimes a',\ \text{where } \mathrm{adj}(a'Xb) = 1.$$
$$\frac{\partial}{\partial X}|AXBX'C| = C(\mathrm{adj}\,AXBX'C)AXB + A'(\mathrm{adj}\,AXBX'C)'C'XB'; \hspace{3em} (B9)$$
especially:
$$\frac{\partial}{\partial X}|XBX'| = (\mathrm{adj}\,XBX')XB + (\mathrm{adj}\,XB'X')XB';$$
$$\frac{\partial\,|XSX'|}{\partial(\mathrm{vec}\,X)'} = 2(\mathrm{vec}\,X)'(S\otimes\mathrm{adj}\,XSX'),\ \text{if } S \text{ is symmetric};$$
$$\frac{\partial\,|XX'|}{\partial(\mathrm{vec}\,X)'} = 2(\mathrm{vec}\,X)'(I\otimes\mathrm{adj}\,XX').$$
$$\frac{\partial}{\partial X}|AX'BXC| = BXC(\mathrm{adj}\,AX'BXC)A + B'XA'(\mathrm{adj}\,AX'BXC)'C'; \hspace{3em} (B10)$$
especially:
$$\frac{\partial}{\partial X}|X'BX| = BX(\mathrm{adj}\,X'BX) + B'X(\mathrm{adj}\,X'B'X);$$
$$\frac{\partial\,|X'SX|}{\partial(\mathrm{vec}\,X)'} = 2(\mathrm{vec}\,X)'(\mathrm{adj}\,X'SX\otimes S),\ \text{if } S \text{ is symmetric};$$
$$\frac{\partial\,|X'X|}{\partial(\mathrm{vec}\,X)'} = 2(\mathrm{vec}\,X)'(\mathrm{adj}\,X'X\otimes I).$$
$$\frac{\partial}{\partial X}|AXBXC| = B'X'A'(\mathrm{adj}\,AXBXC)'C' + A'(\mathrm{adj}\,AXBXC)'C'X'B'; \hspace{3em} (B11)$$
$$\frac{\partial}{\partial X}|XBX| = B'X'(\mathrm{adj}\,XBX)' + (\mathrm{adj}\,XBX)'X'B';$$
especially:
$$\frac{\partial\,|X^2|}{\partial(\mathrm{vec}\,X)'} = (\mathrm{vec}[X'\,\mathrm{adj}(X^2)' + \mathrm{adj}(X^2)'X'])' = |X|^2(\mathrm{vec}[X'(X')^{-2} + (X')^{-2}X'])' = 2|X|^2[\mathrm{vec}(X^{-1})']',\ \text{if } X \text{ is non-singular}.$$
$$\frac{\partial}{\partial X}|X^{\alpha}| = \alpha|X|^{\alpha}(X^{-1})',\ \alpha\in\mathbb{N},\ \text{if } X \text{ is non-singular}, \hspace{3em} (B12)$$
$$\frac{\partial\,|X|}{\partial X} = |X|(X^{-1})',\ \text{if } X \text{ is non-singular};$$
especially:
$$\frac{\partial\,|X|}{\partial(\mathrm{vec}\,X)'} = [\mathrm{vec}(\mathrm{adj}\,X')]'.$$
B4 Derivatives of a Vector/Matrix Function of a Vector/Matrix
If we differentiate the vector or matrix valued function of a vector or matrix, we
will find the results of type (B13) – (B20).

vector-valued functions of a vector or a matrix

$$\frac{\partial}{\partial x'}Ax = A \hspace{4em} (B13)$$
$$\frac{\partial}{\partial(\mathrm{vec}\,X)'}AXa = \frac{\partial\,(a'\otimes A)\mathrm{vec}\,X}{\partial(\mathrm{vec}\,X)'} = a'\otimes A \hspace{4em} (B14)$$

matrix-valued functions of a matrix

$$\frac{\partial(\mathrm{vec}\,X)}{\partial(\mathrm{vec}\,X)'} = I_{mp} \quad\text{for all } X\in\mathbb{R}^{m\times p} \hspace{4em} (B15)$$
$$\frac{\partial(\mathrm{vec}\,X')}{\partial(\mathrm{vec}\,X)'} = K_{m\cdot p} \quad\text{for all } X\in\mathbb{R}^{m\times p} \hspace{4em} (B16)$$
where $K_{m\cdot p}$ is the commutation matrix of suitable order;
$$\frac{\partial\,\mathrm{vec}(XX')}{\partial(\mathrm{vec}\,X)'} = (I_{m^2} + K_{m\cdot m})(X\otimes I_m) \quad\text{for all } X\in\mathbb{R}^{m\times p},$$
where the matrix $\tfrac12(I_{m^2} + K_{m\cdot m})$ is symmetric and idempotent,
$$\frac{\partial\,\mathrm{vec}(X'X)}{\partial(\mathrm{vec}\,X)'} = (I_{p^2} + K_{p\cdot p})(I_p\otimes X') \quad\text{for all } X\in\mathbb{R}^{m\times p},$$
$$\frac{\partial\,\mathrm{vec}(X^{-1})}{\partial(\mathrm{vec}\,X)'} = -(X^{-1})'\otimes X^{-1} \quad\text{if } X \text{ is non-singular},$$
$$\frac{\partial\,\mathrm{vec}(X^{\alpha})}{\partial(\mathrm{vec}\,X)'} = \sum_{j=1}^{\alpha}(X')^{\alpha-j}\otimes X^{j-1} \quad\text{for all } \alpha\in\mathbb{N},\ \text{if } X \text{ is a square matrix}.$$

B5 Derivatives of the Kronecker – Zehfuss product


Let a matrix-valued function of two matrices $X$ and $Y$ as variables be given. In particular, we assume the function
$$F(X,Y) = X\otimes Y \quad\text{for all } X\in\mathbb{R}^{m\times p},\ Y\in\mathbb{R}^{n\times q},$$
the Kronecker-Zehfuss product of the variables $X$ and $Y$, to be well defined. Then the identities for the first differential and the first derivative follow:
$$dF(X,Y) = (dX)\otimes Y + X\otimes dY,$$
$$d\,\mathrm{vec}\,F(X,Y) = \mathrm{vec}(dX\otimes Y) + \mathrm{vec}(X\otimes dY),$$
$$\mathrm{vec}(dX\otimes Y) = (I_p\otimes K_{qm}\otimes I_n)(\mathrm{vec}\,dX\otimes\mathrm{vec}\,Y) = (I_p\otimes K_{qm}\otimes I_n)(I_{mp}\otimes\mathrm{vec}\,Y)\,d(\mathrm{vec}\,X) = (I_p\otimes[(K_{qm}\otimes I_n)(I_m\otimes\mathrm{vec}\,Y)])\,d(\mathrm{vec}\,X),$$
$$\mathrm{vec}(X\otimes dY) = (I_p\otimes K_{qm}\otimes I_n)(\mathrm{vec}\,X\otimes\mathrm{vec}\,dY) = (I_p\otimes K_{qm}\otimes I_n)(\mathrm{vec}\,X\otimes I_{nq})\,d(\mathrm{vec}\,Y) = ([(I_p\otimes K_{qm})(\mathrm{vec}\,X\otimes I_q)]\otimes I_n)\,d(\mathrm{vec}\,Y),$$
$$\frac{\partial\,\mathrm{vec}(X\otimes Y)}{\partial(\mathrm{vec}\,X)'} = I_p\otimes[(K_{qm}\otimes I_n)(I_m\otimes\mathrm{vec}\,Y)],$$
$$\frac{\partial\,\mathrm{vec}(X\otimes Y)}{\partial(\mathrm{vec}\,Y)'} = [(I_p\otimes K_{qm})(\mathrm{vec}\,X\otimes I_q)]\otimes I_n.$$
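The underlying vectorization rule vec(X kron Y) = (I_p kron K_{qm} kron I_n)(vec X kron vec Y) can be checked numerically as sketched below; NumPy, the explicit commutation-matrix helper and the test sizes are assumptions of the example.

```python
import numpy as np

def vec(A):
    return A.reshape(-1, 1, order="F")

def commutation(n, m):
    # K_{nm} vec(A) = vec(A') for A of order n x m
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            K[i * m + j, j * n + i] = 1.0
    return K

rng = np.random.default_rng(18)
m, p, n, q = 2, 3, 2, 2
X = rng.standard_normal((m, p))
Y = rng.standard_normal((n, q))

lhs = vec(np.kron(X, Y))
rhs = np.kron(np.kron(np.eye(p), commutation(q, m)), np.eye(n)) @ np.kron(vec(X), vec(Y))
print(np.allclose(lhs, rhs))
```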

B6 Matrix-valued Derivatives of Symmetric or Antisymmetric Matrix Functions
Many matrix functions f ( X) or F(X) force us to pay attention to dependencies
within the variables. As examples we treat here first derivatives of symmetric or
antisymmetric matrix functions of X.

Definition (derivative of a matrix-valued function of a symmetric matrix):
Let $F(X)$ be an $n\times q$ matrix-valued function of an $m\times m$ symmetric matrix $X = X'$. The $nq\times[m(m+1)/2]$ Jacobi matrix of first derivatives of $F$ is defined by
$$J_F^{s} = DF(X = X') := \frac{\partial\,\mathrm{vec}\,F(X)}{\partial(\mathrm{vech}\,X)'}.$$

Definition (derivative of a matrix-valued function of an antisymmetric matrix):
Let $F(X)$ be an $n\times q$ matrix-valued function of an $m\times m$ antisymmetric matrix $X = -X'$. The $nq\times[m(m-1)/2]$ Jacobi matrix of first derivatives of $F$ is defined by
$$J_F^{a} = DF(X = -X') := \frac{\partial\,\mathrm{vec}\,F(X)}{\partial(\mathrm{veck}\,X)'}.$$
Examples
(i) Given is the scalar-valued matrix function $\mathrm{tr}(AX)$ of a symmetric variable matrix $X = X'$, for instance
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},\quad X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix} = X',\quad \mathrm{vech}\,X = [x_{11}, x_{21}, x_{22}]'$$
$$\mathrm{tr}(AX) = a_{11}x_{11} + (a_{12}+a_{21})x_{21} + a_{22}x_{22}$$
$$\frac{\partial}{\partial(\mathrm{vech}\,X)'} = \Big[\frac{\partial}{\partial x_{11}}, \frac{\partial}{\partial x_{21}}, \frac{\partial}{\partial x_{22}}\Big]$$
$$\frac{\partial\,\mathrm{tr}(AX)}{\partial(\mathrm{vech}\,X)'} = [a_{11}, a_{12}+a_{21}, a_{22}] = [\mathrm{vech}(A' + A - \mathrm{Diag}[a_{11},\dots,a_{nn}])]' = \Big[\mathrm{vech}\,\frac{\partial\,\mathrm{tr}(AX)}{\partial X}\Big]'.$$

(ii) Given is the scalar-valued matrix function $\mathrm{tr}(AX)$ of an antisymmetric variable matrix $X = -X'$, for instance
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},\quad X = \begin{bmatrix} 0 & -x_{21} \\ x_{21} & 0 \end{bmatrix},\quad \mathrm{veck}\,X = x_{21},$$
$$\mathrm{tr}(AX) = (a_{12} - a_{21})x_{21}$$
$$\frac{\partial}{\partial(\mathrm{veck}\,X)'} = \frac{\partial}{\partial x_{21}},\qquad \frac{\partial\,\mathrm{tr}(AX)}{\partial(\mathrm{veck}\,X)'} = a_{12} - a_{21},$$
$$\frac{\partial\,\mathrm{tr}(AX)}{\partial(\mathrm{veck}\,X)'} = [\mathrm{veck}(A' - A)]' = \Big[\mathrm{veck}\,\frac{\partial\,\mathrm{tr}(AX)}{\partial X}\Big]'.$$

B7 Higher order derivatives


Up to now we computed only first derivatives of scalar-valued, vector-valued and matrix-valued functions. Second derivatives are our target now; they will be needed for the classification of optimization problems of type minimum or maximum.
Definition (second derivatives of a scalar-valued vector function):
Let $f(x)$ be a scalar-valued function of the $m\times 1$ vector $x$. Then the $m\times m$ matrix
$$DD'f(x) = D(Df(x))' := \frac{\partial^2 f}{\partial x\,\partial x'}$$
denotes the second derivatives of $f(x)$ with respect to $x$ and $x'$. Correspondingly
$$D^2 f(x) := \frac{\partial}{\partial x'}\otimes\frac{\partial}{\partial x'}\,f(x) = [\mathrm{vec}\,DD'f(x)]'$$
denotes the $1\times m^2$ row vector of second derivatives.
Definition (second derivative of a vector-valued vector function):
Let $f(x)$ be an $n\times 1$ vector-valued function of the $m\times 1$ vector $x$. Then the $n\times m^2$ matrix of second derivatives
$$H_f = D^2 f(x) = D(Df(x)) =: \frac{\partial}{\partial x'}\otimes\frac{\partial}{\partial x'}\,f(x) = \frac{\partial^2 f(x)}{\partial x'\otimes\partial x'}$$
is the Hesse matrix of the function $f(x)$.
Definition (second derivatives of a matrix-valued matrix function):
Let $F(X)$ be an $n\times q$ matrix-valued function of an $m\times p$ matrix of functionally independent variables $X$. The $nq\times m^2p^2$ Hesse matrix of second derivatives of $F$ is defined by
$$H_F = D^2 F(X) = D(DF(X)) := \frac{\partial}{\partial(\mathrm{vec}\,X)'}\otimes\frac{\partial}{\partial(\mathrm{vec}\,X)'}\,\mathrm{vec}\,F(X) = \frac{\partial^2\,\mathrm{vec}\,F(X)}{\partial(\mathrm{vec}\,X)'\otimes\partial(\mathrm{vec}\,X)'}.$$
The definition of second derivatives of matrix functions can be motivated as follows. The matrices $F = [f_{ij}]\in\mathbb{R}^{n\times q}$ and $X = [x_{k\ell}]\in\mathbb{R}^{m\times p}$ are the elements of two-dimensional arrays. In contrast, the array of second derivatives
$$\Big[\frac{\partial^2 f_{ij}}{\partial x_{k\ell}\,\partial x_{pq}}\Big] = [k_{ijk\ell pq}]\in\mathbb{R}^{n\times q\times m\times p\times m\times p}$$
is six-dimensional and beyond the common matrix algebra of two-dimensional arrays. The following operations map the six-dimensional array of second derivatives to a two-dimensional array.
(i) $\mathrm{vec}\,F(X)$ is the vectorized form of the matrix-valued function;
(ii) $\mathrm{vec}\,X$ is the vectorized form of the variable matrix;
(iii) the Kronecker-Zehfuss product $\frac{\partial}{\partial(\mathrm{vec}\,X)'}\otimes\frac{\partial}{\partial(\mathrm{vec}\,X)'}$ vectorizes the matrix of second derivatives;
(iv) the formal product of the $1\times m^2p^2$ row vector of second derivatives with the $nq\times 1$ column vector $\mathrm{vec}\,F(X)$ leads to the $nq\times m^2p^2$ Hesse matrix of second derivatives.
Again we assume that the vector of variables $x$ and the matrix of variables $X$ consist of functionally independent elements. If this is not the case, we must, according to the chain rule, apply an alternative differential calculus similar to the first derivative; see the case studies of symmetric and antisymmetric variable matrices.
Examples:
(i) $f(x) = x'Ax = a_{11}x_1^2 + (a_{12}+a_{21})x_1x_2 + a_{22}x_2^2$
$$Df(x) = \frac{\partial f}{\partial x'} = [\,2a_{11}x_1 + (a_{12}+a_{21})x_2 \;\;|\;\; (a_{12}+a_{21})x_1 + 2a_{22}x_2\,]$$
$$D^2 f(x) = D(Df(x))' = \frac{\partial^2 f}{\partial x\,\partial x'} = \begin{bmatrix} 2a_{11} & a_{12}+a_{21} \\ a_{12}+a_{21} & 2a_{22} \end{bmatrix} = A + A'$$
(ii) $f(x) = Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix}$
$$Df(x) = \frac{\partial f}{\partial x'} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = A$$
$$DD'f(x) = \frac{\partial^2 f}{\partial x\,\partial x'} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},\quad O(DD'f(x)) = 2\times 2$$
$$D^2 f(x) = [0\ 0\ 0\ 0],\quad O(D^2 f(x)) = 1\times 4$$
(iii) $F(X) = X^2 = \begin{bmatrix} x_{11}^2 + x_{12}x_{21} & x_{11}x_{12} + x_{12}x_{22} \\ x_{21}x_{11} + x_{22}x_{21} & x_{21}x_{12} + x_{22}^2 \end{bmatrix}$
$$\mathrm{vec}\,F(X) = \begin{bmatrix} x_{11}^2 + x_{12}x_{21} \\ x_{21}x_{11} + x_{22}x_{21} \\ x_{11}x_{12} + x_{12}x_{22} \\ x_{21}x_{12} + x_{22}^2 \end{bmatrix},\quad O(F) = O(X) = 2\times 2,\quad (\mathrm{vec}\,X)' = [x_{11}, x_{21}, x_{12}, x_{22}]$$
$$J_F = \frac{\partial\,\mathrm{vec}\,F(X)}{\partial(\mathrm{vec}\,X)'} = \begin{bmatrix} 2x_{11} & x_{12} & x_{21} & 0 \\ x_{21} & x_{11}+x_{22} & 0 & x_{21} \\ x_{12} & 0 & x_{11}+x_{22} & x_{12} \\ 0 & x_{12} & x_{21} & 2x_{22} \end{bmatrix},\quad O(J_F) = 4\times 4$$
$$H_F = \frac{\partial}{\partial(\mathrm{vec}\,X)'}\otimes\frac{\partial}{\partial(\mathrm{vec}\,X)'}\,\mathrm{vec}\,F(X) = \Big[\frac{\partial}{\partial x_{11}}, \frac{\partial}{\partial x_{21}}, \frac{\partial}{\partial x_{12}}, \frac{\partial}{\partial x_{22}}\Big]\otimes J_F =$$
$$= \begin{bmatrix} 2&0&0&0&\;0&0&1&0&\;0&1&0&0&\;0&0&0&0 \\ 0&1&0&0&\;1&0&0&1&\;0&0&0&0&\;0&1&0&0 \\ 0&0&1&0&\;0&0&0&0&\;1&0&0&1&\;0&0&1&0 \\ 0&0&0&0&\;0&0&1&0&\;0&1&0&0&\;0&0&0&2 \end{bmatrix},$$
$$O(H_F) = 4\times 16.$$
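The 4 x 16 Hesse matrix of example (iii) can be reproduced numerically as sketched below: the Jacobi matrix of X^2 is differentiated once more by central differences, block by block. The closed form of J_F used here, the test matrix and the step size are assumptions of the example.

```python
import numpy as np

def jacobian(X):
    # J_F of F(X) = X^2: d vec(X^2) = (X' kron I + I kron X) d vec X
    return np.kron(X.T, np.eye(2)) + np.kron(np.eye(2), X)

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# H_F = [d/dx11, d/dx21, d/dx12, d/dx22] kron J_F : differentiate J_F once more, numerically
h = 1e-6
H = np.zeros((4, 16))
for k in range(4):
    E = np.zeros(4); E[k] = h
    dX = E.reshape(2, 2, order="F")
    dJ = (jacobian(X + dX) - jacobian(X - dX)) / (2 * h)   # d J_F / d x_k
    H[:, 4 * k: 4 * k + 4] = dJ

print(np.round(H).astype(int))   # compare with the 4 x 16 Hesse matrix displayed above
```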
At the end we want to define the derivative of order $l$ of a matrix-valued matrix function whose structure is derived from the postulate of a suitable array.
Definition ($l$-th derivative of a matrix-valued matrix function):
Let $F(X)$ be an $n\times q$ matrix-valued function of an $m\times p$ matrix of functionally independent variables $X$. The $nq\times m^l p^l$ matrix of $l$-th derivatives is defined by
$$D^l F(X) := \underbrace{\frac{\partial}{\partial(\mathrm{vec}\,X)'}\otimes\dots\otimes\frac{\partial}{\partial(\mathrm{vec}\,X)'}}_{l\text{-times}}\,\mathrm{vec}\,F(X) = \frac{\partial^l\,\mathrm{vec}\,F(X)}{\underbrace{\partial(\mathrm{vec}\,X)'\otimes\dots\otimes\partial(\mathrm{vec}\,X)'}_{l\text{-times}}} \quad\text{for all } l\in\mathbb{N}.$$
Appendix C: Lagrange Multipliers

How can we find extrema with side conditions?

We generate solutions of such extremal problems first on the basis of algebraic manipulations, namely by the lemma of implicit functions, and secondly by a geometric toolbox, by means of interpreting a risk function and side conditions as level surfaces (specific normal images, Lagrange multipliers).

C1 A first way to solve the problem


A first way to find extrema with side conditions is based on a risk function
$$f(x_1,\dots,x_m) = \mathrm{extr} \hspace{4em} (C1)$$
with unknowns $(x_1,\dots,x_m)\in\mathbb{R}^m$, which are restricted by side conditions of type
$$[F_1(x_1,\dots,x_m),\,F_2(x_1,\dots,x_m),\,\dots,\,F_r(x_1,\dots,x_m)]' = 0 \hspace{4em} (C2)$$
$$\mathrm{rk}\Big(\frac{\partial F_i}{\partial x_j}\Big) = r < m. \hspace{4em} (C3)$$
The side conditions $F_i(x_j)$ $(i = 1,\dots,r,\ j = 1,\dots,m)$ are reduced by the lemma of the implicit function: solve for
$$x_{m-r+1} = G_1(x_1,\dots,x_{m-r}),\quad x_{m-r+2} = G_2(x_1,\dots,x_{m-r}),\quad\dots,\quad x_{m-1} = G_{r-1}(x_1,\dots,x_{m-r}),\quad x_m = G_r(x_1,\dots,x_{m-r}) \hspace{2em} (C4)$$
and replace the result within the risk function
$$f(x_1, x_2,\dots,x_{m-r}, G_1(x_1,\dots,x_{m-r}),\dots,G_r(x_1,\dots,x_{m-r})) = \mathrm{extr}. \hspace{4em} (C5)$$
The "free" unknowns $(x_1, x_2,\dots,x_{m-r-1}, x_{m-r})\in\mathbb{R}^{m-r}$ can be found by taking advantage of the implicit function theorem as follows.

Lemma C1 ("implicit function theorem"):

Let $\Omega$ be an open set of $\mathbb{R}^m = \mathbb{R}^{m-r}\times\mathbb{R}^{r}$ and $F: \Omega\to\mathbb{R}^{r}$ with vectors $x_1\in\mathbb{R}^{m-r}$ and $x_2\in\mathbb{R}^{r}$. The map
$$(x_1, x_2)\mapsto F(x_1, x_2) = \begin{bmatrix} F_1(x_1,\dots,x_{m-r};\,x_{m-r+1},\dots,x_m) \\ F_2(x_1,\dots,x_{m-r};\,x_{m-r+1},\dots,x_m) \\ \vdots \\ F_{r-1}(x_1,\dots,x_{m-r};\,x_{m-r+1},\dots,x_m) \\ F_r(x_1,\dots,x_{m-r};\,x_{m-r+1},\dots,x_m) \end{bmatrix} \hspace{2em} (C6)$$
is a continuously differentiable function with $F(x_1, x_2) = 0$. In case of a Jacobi determinant $j$ not zero or a Jacobi matrix $J$ of rank $r$,
$$j := \det J \ne 0 \quad\text{or}\quad \mathrm{rk}\,J = r,\qquad J := \frac{\partial(F_1,\dots,F_r)}{\partial(x_{m-r+1},\dots,x_m)}, \hspace{2em} (C7)$$
there exist neighbourhoods $U := U_\delta(x_1)\subset\mathbb{R}^{m-r}$ and $V := V_\delta(x_2)\subset\mathbb{R}^{r}$ such that the equation $F(x_1, x_2) = 0$ for any $x_1\in U$ has exactly one solution in $V$,
$$x_2 = G(x_1) \quad\text{or}\quad \begin{bmatrix} x_{m-r+1} \\ x_{m-r+2} \\ \vdots \\ x_{m-1} \\ x_m \end{bmatrix} = \begin{bmatrix} G_1(x_1,\dots,x_{m-r}) \\ G_2(x_1,\dots,x_{m-r}) \\ \vdots \\ G_{r-1}(x_1,\dots,x_{m-r}) \\ G_r(x_1,\dots,x_{m-r}) \end{bmatrix}. \hspace{2em} (C8)$$

The function $G: U\to V$ is continuously differentiable.

A sample reference is any textbook treating analysis, e.g. C. Blatter.
Lemma C1 is based on the Implicit Function Theorem, whose result we insert within the risk function (C1) in order to gain (C5) in the free variables $(x_1,\dots,x_{m-r})\in\mathbb{R}^{m-r}$. Our Example C1 explains the solution technique for finding extrema with side conditions within our first approach. Lemma C1 illustrates that there exists a local inverse of the side conditions towards the $r$ unknowns $(x_{m-r+1}, x_{m-r+2},\dots,x_{m-1}, x_m)\in\mathbb{R}^{r}$, which in the case of nonlinear side conditions is not necessarily unique.

:Example C1:

Search for the global extremum of the function
$$f(x_1, x_2, x_3) = f(x, y, z) = x - y - z$$
subject to the side conditions
$$\begin{cases} F_1(x_1, x_2, x_3) = Z(x, y, z) := x^2 + 2y^2 - 1 = 0 & \text{(elliptic cylinder)}\\ F_2(x_1, x_2, x_3) = E(x, y, z) := 3x - 4z = 0 & \text{(plane)} \end{cases}$$
$$J = \Big(\frac{\partial F_i}{\partial x_j}\Big) = \begin{bmatrix} 2x & 4y & 0 \\ 3 & 0 & -4 \end{bmatrix},\qquad \mathrm{rk}\,J\,(x\ne 0 \text{ or } y\ne 0) = r = 2$$
$$F_1(x_1, x_2, x_3) = Z(x, y, z) = 0 \;\Rightarrow\; \begin{cases} {}_1y = +\tfrac12\sqrt{2}\sqrt{1-x^2}\\ {}_2y = -\tfrac12\sqrt{2}\sqrt{1-x^2} \end{cases}$$
$$F_2(x_1, x_2, x_3) = E(x, y, z) = 0 \;\Rightarrow\; z = \tfrac34 x$$
$${}_1f(x_1, x_2, x_3) = f\big(x, +\tfrac12\sqrt{2}\sqrt{1-x^2}, \tfrac34 x\big) = \tfrac{x}{4} - \tfrac12\sqrt{2}\sqrt{1-x^2}$$
$${}_2f(x_1, x_2, x_3) = f\big(x, -\tfrac12\sqrt{2}\sqrt{1-x^2}, \tfrac34 x\big) = \tfrac{x}{4} + \tfrac12\sqrt{2}\sqrt{1-x^2}$$
$${}_1f'(x) = 0 \iff \tfrac14 + \tfrac12\sqrt{2}\,\frac{x}{\sqrt{1-x^2}} = 0 \iff {}_1x = -\tfrac13$$
$${}_2f'(x) = 0 \iff \tfrac14 - \tfrac12\sqrt{2}\,\frac{x}{\sqrt{1-x^2}} = 0 \iff {}_2x = +\tfrac13$$
$${}_1f(-\tfrac13) = -\tfrac34 \ \text{(minimum)},\qquad {}_2f(\tfrac13) = +\tfrac34 \ \text{(maximum)}.$$
At the position $x = -1/3$, $y = 2/3$, $z = -1/4$ we find a global minimum, but at the position $x = +1/3$, $y = -2/3$, $z = 1/4$ a global maximum.
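Example C1 can be reproduced numerically as sketched below by parametrizing the constraint curve (the intersection of the elliptic cylinder and the plane) and scanning the risk function on it; the parametrization and grid resolution are assumptions of the example.

```python
import numpy as np

# parametrize the constraint curve: x = cos(t), y = sin(t)/sqrt(2) solves x^2 + 2 y^2 = 1,
# and z = 3 x / 4 solves 3 x - 4 z = 0
t = np.linspace(0.0, 2.0 * np.pi, 200001)
x = np.cos(t)
y = np.sin(t) / np.sqrt(2.0)
z = 0.75 * x
f = x - y - z

i_min, i_max = np.argmin(f), np.argmax(f)
print(f[i_min], x[i_min], y[i_min], z[i_min])   # approx -0.75 at (-1/3,  2/3, -1/4)
print(f[i_max], x[i_max], y[i_max], z[i_max])   # approx +0.75 at (+1/3, -2/3, +1/4)
```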

An alternative path to find extrema with side conditions is based on the geometric interpretation of the risk function and the side conditions. First, we form the conditions
$$F_1(x_1,\dots,x_m) = 0,\quad F_2(x_1,\dots,x_m) = 0,\quad\dots,\quad F_r(x_1,\dots,x_m) = 0,\qquad \mathrm{rk}\Big(\frac{\partial F_i}{\partial x_j}\Big) = r,$$
by continuously differentiable real functions on an open set $\Omega\subset\mathbb{R}^m$. The $r$ equations $F_i(x_1,\dots,x_m) = 0$ for all $i = 1,\dots,r$ with the rank condition $\mathrm{rk}(\partial F_i/\partial x_j) = r$ define geometrically an $(m-r)$ dimensional surface $M_F\subset\Omega$ which can be seen as a level surface. See as an example our Example C1 which describes as side conditions
$$F_1(x_1, x_2, x_3) = Z(x, y, z) = x^2 + 2y^2 - 1 = 0,\qquad F_2(x_1, x_2, x_3) = E(x, y, z) = 3x - 4z = 0,$$
representing an elliptic cylinder and a plane. In this case the $(m-r)$ dimensional surface $M_F$ is the intersection manifold of the elliptic cylinder and of the plane, an $m - r = 1$ dimensional manifold in $\mathbb{R}^3$, namely a "spatial curve".
Secondly, the risk function $f(x_1,\dots,x_m) = \mathrm{extr}$ generates an $(m-1)$ dimensional surface $M_f$ which is a special level surface. The level parameter of the $(m-1)$ dimensional surface $M_f$ should be extremal. In our Example C1 the risk function can be interpreted as the plane
$$f(x_1, x_2, x_3) = f(x, y, z) = x - y - z.$$
We summarize our result within Lemma C2.

Lemma C2 (extrema with side conditions)

The side conditions $F_i(x_1,\dots,x_m) = 0$ for all $i\in\{1,\dots,r\}$ are built on continuously differentiable functions on an open set $\Omega\subset\mathbb{R}^m$ which are subject to the rank condition $\mathrm{rk}(\partial F_i/\partial x_j) = r$, generating an $(m-r)$ dimensional level surface $M_F$. The function $f(x_1,\dots,x_m)$ produces, for certain constants, an $(m-1)$ dimensional level surface $M_f$. $f(x_1,\dots,x_m)$ is, geometrically, at a point $p\in M_F$ conditionally extremal (stationary) if and only if the $(m-1)$ dimensional level surface $M_f$ is in contact with the $(m-r)$ dimensional level surface $M_F$ in $p$. That is, there exist numbers $\lambda_1,\dots,\lambda_r$, the Lagrange multipliers, with
$$\mathrm{grad}\,f(p) = \sum_{i=1}^{r}\lambda_i\,\mathrm{grad}\,F_i(p).$$
The unnormalized surface normal vector $\mathrm{grad}\,f(p)$ of the $(m-1)$ dimensional level surface $M_f$ lies in the normal space $\mathcal{N}_p M_F$ of the level surface $M_F$, spanned by the unnormalized surface normal vectors $\mathrm{grad}\,F_i(p)$ at the point $p$. To this equation belongs the variational problem
$$\mathcal{L}(x_1,\dots,x_m;\lambda_1,\dots,\lambda_r) = f(x_1,\dots,x_m) - \sum_{i=1}^{r}\lambda_i F_i(x_1,\dots,x_m) = \mathrm{extr}.$$
:Proof:

First, the side conditions $F_i(x_j) = 0$, $\mathrm{rk}(\partial F_i/\partial x_j) = r$ for all $i = 1,\dots,r$; $j = 1,\dots,m$ generate an $(m-r)$ dimensional level surface $M_F$ whose normal vectors
$$n_i(p) := \mathrm{grad}\,F_i(p)\in\mathcal{N}_p M_F \quad(i = 1,\dots,r)$$
span the $r$ dimensional normal space $\mathcal{N}_p M_F$ of the level surface $M_F\subset\Omega$. The $r$ dimensional normal space $\mathcal{N}_p M_F$ of the $(m-r)$ dimensional level surface $M_F$ is the orthogonal complement of the tangent space $T_p M_F\subset\mathbb{R}^m$ of $M_F$ in the point $p$, spanned by the $m-r$ tangent vectors
$$t_k(p) := \frac{\partial x}{\partial x_k}\Big|_{x=p}\in T_p M_F \quad(k = 1,\dots,m-r).$$

:Example C2:
Let the $m - r = 2$ dimensional level surface $M_F$ of the sphere $S_r^2\subset\mathbb{R}^3$ of radius $r$ ("level parameter $r^2$") be given by the side condition
$$F(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - r^2 = 0.$$
:Normal space:
$$n(p) = \mathrm{grad}\,F(p) = e_1\frac{\partial F}{\partial x_1} + e_2\frac{\partial F}{\partial x_2} + e_3\frac{\partial F}{\partial x_3} = [e_1, e_2, e_3]\begin{bmatrix} 2x_1 \\ 2x_2 \\ 2x_3 \end{bmatrix}_p.$$
The orthonormal vectors $[e_1, e_2, e_3]$ span $\mathbb{R}^3$. The normal space is generated locally by the normal vector $n(p) = \mathrm{grad}\,F(p)$.
:Tangent space:
The implicit representation is the characteristic element of the level surface. In order to gain an explicit representation, we take advantage of the Implicit Function Theorem according to the following equations:
$$\left.\begin{array}{l} F(x_1, x_2, x_3) = 0\\ \mathrm{rk}\big(\frac{\partial F}{\partial x_j}\big) = r = 1 \end{array}\right\} \;\Rightarrow\; x_3 = G(x_1, x_2)$$
$$x_1^2 + x_2^2 + x_3^2 - r^2 = 0 \ \text{ and }\ \Big(\frac{\partial F}{\partial x_j}\Big) = [2x_1,\,2x_2,\,2x_3],\quad \mathrm{rk}\Big(\frac{\partial F}{\partial x_j}\Big) = 1$$
$$\Rightarrow\; x_3 = G(x_1, x_2) = +\sqrt{r^2 - (x_1^2 + x_2^2)}.$$
The negative root leads into another domain of the sphere: here holds the domain $0 < x_1 < r$, $0 < x_2 < r$, $r^2 - (x_1^2 + x_2^2) > 0$.
The spherical position vector $x(p)$ allows the representation
$$x(p) = e_1 x_1 + e_2 x_2 + e_3\sqrt{r^2 - (x_1^2 + x_2^2)},$$
which is the basis to produce
$$t_1(p) = \frac{\partial x}{\partial x_1}(p) = e_1 - e_3\frac{x_1}{\sqrt{r^2 - (x_1^2 + x_2^2)}} = [e_1, e_2, e_3]\begin{bmatrix} 1 \\ 0 \\ -\dfrac{x_1}{\sqrt{r^2 - (x_1^2 + x_2^2)}} \end{bmatrix},$$
$$t_2(p) = \frac{\partial x}{\partial x_2}(p) = e_2 - e_3\frac{x_2}{\sqrt{r^2 - (x_1^2 + x_2^2)}} = [e_1, e_2, e_3]\begin{bmatrix} 0 \\ 1 \\ -\dfrac{x_2}{\sqrt{r^2 - (x_1^2 + x_2^2)}} \end{bmatrix},$$
which span the tangent space $T_p M_F\cong\mathbb{R}^2$ at the point $p$.


:The general case:
In the general case of an $(m-r)$ dimensional level surface $M_F$, implicitly produced by the $r$ side conditions
$$F_1(x_1,\dots,x_m) = 0,\quad F_2(x_1,\dots,x_m) = 0,\quad\dots,\quad F_r(x_1,\dots,x_m) = 0,\qquad \mathrm{rk}\Big(\frac{\partial F_i}{\partial x_j}\Big) = r,$$
the explicit surface representation, produced by the Implicit Function Theorem, reads
$$x(p) = e_1 x_1 + e_2 x_2 + \dots + e_{m-r}x_{m-r} + e_{m-r+1}G_1(x_1,\dots,x_{m-r}) + \dots + e_m G_r(x_1,\dots,x_{m-r}).$$
The orthonormal vectors $[e_1,\dots,e_m]$ span $\mathbb{R}^m$.

Secondly, the at least once continuously differentiable risk function $f(x_1,\dots,x_m)$ describes, for special constants, an $(m-1)$ dimensional level surface $M_f$ whose normal vector
$$n_f := \mathrm{grad}\,f(p)\in\mathcal{N}_p M_f$$
spans the one-dimensional normal space $\mathcal{N}_p M_f$ of the level surface $M_f\subset\Omega$ in the point $p$. The level parameter of the level surface is chosen, in the extremal case, such that the level surface $M_f$ touches the other level surface $M_F$ in the point $p$. That means that the normal vector $n_f(p)$ in the point $p$ is an element of the normal space $\mathcal{N}_p M_F$. Or we may say: the normal vector $\mathrm{grad}\,f(p)$ is a linear combination of the normal vectors $\mathrm{grad}\,F_i(p)$ in the point $p$,
$$\mathrm{grad}\,f(p) = \sum_{i=1}^{r}\lambda_i\,\mathrm{grad}\,F_i(p),$$
where the Lagrange multipliers $\lambda_i$ are the coordinates of the vector $\mathrm{grad}\,f(p)$ in the basis $\mathrm{grad}\,F_i(p)$.

:Example C3:
Let us assume that there is given a point $X\in\mathbb{R}^3$. Unknown is the point on the $m - r = 2$ dimensional level surface $M_F$ of type sphere $S_r^2\subset\mathbb{R}^3$ which is at extremal distance, either minimal or maximal, from the point $X\in\mathbb{R}^3$.
The distance function $\|X - x\|^2$ for $X\in\mathbb{R}^3$ and $x\in S_r^2$ describes the risk function
$$f(x_1, x_2, x_3) = (X_1 - x_1)^2 + (X_2 - x_2)^2 + (X_3 - x_3)^2 = R^2 = \underset{x_1, x_2, x_3}{\mathrm{extr}},$$
which represents an $m - 1 = 2$ dimensional level surface $M_f$ of type sphere $S_R^2\subset\mathbb{R}^3$ centred at $(X_1, X_2, X_3)$ with level parameter $R^2$. The conditional extremal problem is solved if the sphere $S_R^2$ touches the other sphere $S_r^2$. This result is expressed in the language of the normal vectors:
$$n_f(p) := \mathrm{grad}\,f(p) = e_1\frac{\partial f}{\partial x_1} + e_2\frac{\partial f}{\partial x_2} + e_3\frac{\partial f}{\partial x_3} = [e_1, e_2, e_3]\begin{bmatrix} -2(X_1 - x_1) \\ -2(X_2 - x_2) \\ -2(X_3 - x_3) \end{bmatrix}_p\in\mathcal{N}_p M_f,$$
$$n(p) := \mathrm{grad}\,F(p) = [e_1, e_2, e_3]\begin{bmatrix} 2x_1 \\ 2x_2 \\ 2x_3 \end{bmatrix}$$
is an element of the normal space $\mathcal{N}_p M_F$.
The normal equation
$$\mathrm{grad}\,f(p) = \lambda\,\mathrm{grad}\,F(p)$$
leads directly to three equations
$$x_i - X_i = \lambda x_i \iff x_i(1-\lambda) = X_i \quad(i = 1, 2, 3),$$
which are completed by the fourth equation
$$F(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 - r^2 = 0.$$
Later on we solve these four equations.

Third, we interpret the differential equations
$$\mathrm{grad}\,f(p) = \sum_{i=1}^{r}\lambda_i\,\mathrm{grad}\,F_i(p)$$
by the variational problem, by direct differentiation, namely
$$\mathcal{L}(x_1,\dots,x_m;\lambda_1,\dots,\lambda_r) = f(x_1,\dots,x_m) - \sum_{i=1}^{r}\lambda_i F_i(x_1,\dots,x_m) = \underset{x_1,\dots,x_m;\,\lambda_1,\dots,\lambda_r}{\mathrm{extr}}$$
$$\begin{cases} \dfrac{\partial\mathcal{L}}{\partial x_j} = \dfrac{\partial f}{\partial x_j} - \displaystyle\sum_{i=1}^{r}\lambda_i\dfrac{\partial F_i}{\partial x_j} = 0 & (j = 1,\dots,m)\\[6pt] -\dfrac{\partial\mathcal{L}}{\partial\lambda_i} = F_i(x_j) = 0 & (i = 1,\dots,r). \end{cases}$$

:Example C4:
We continue our third example by solving the alternative system of equations.
L ( x1 , x2 , x3 ; O ) = ( X 1  x1 ) 2 + ( X 2  x2 ) 2 + ( X 3  x3 )
 O ( x12 + x22 + x32  r 2 ) = extr
x1 , x2 , x3 ; O

wL º
= 2( X j  x j )  2O x j = 0 »
wx j
»Ÿ
wL »
 = x1 + x2 + x3  r = 0 »
2 2 2 2

wO ¼
X X º
x1 = 1 ; x2 = 2 »
1 O 1 O Ÿ
»
x12 + x22 + x32  r 2 = 0 ¼
X 12 + X 2 2 + X 32
 r 2 = 0 Ÿ (1  O ) 2 r 2 + X 12 + X 2 2 + X 32 = 0 œ
(1  O ) 2
X 12 + X 2 2 + X 32 1
œ (1  O ) 2 = œ 1  O1, 2 = ± X 12 + X 2 2 + X 32 œ
r2 r

1 r ± X 12 + X 2 2 + X 32
O1, 2 = 1 ± X 12 + X 2 2 + X 32 =
r r
rX 1
( x1 )1, 2 = ± ,
X 12 + X 2 2 + X 32
rX 2
( x2 )1, 2 = ± ,
X 12 + X 2 2 + X 32
rX 3
( x3 )1, 2 = ± .
X + X 2 2 + X 32
1
2

The matrix of second derivatives H decides upon whether at the point


( x1 , x2 , x3 , O )1, 2 we enjoy a maximum or minimum.
C1 A first way to solve the problem 541

w 2L
H=( ) = (G jk (1  O )) = (1  O )I 3
wx j xk

H (1  O > 0) > 0 ( minimum) º ª H(1  O < 0) < 0 ( maximum)


( x1 , x2 , x3 ) is the point of minimum »¼ «( x , x , x ) is the point of maximum .
¬ 1 2 3

Our example illustrates how we can find the global optimum under side condi-
tions by means of the technique of Lagrange multipliers.

:Example C5:
Search for the global extremum of the function f ( x1 , x2 , x3 ) subject to two side
conditions F1 ( x1 , x2 , x3 ) and F2 ( x1 , x2 , x3 ) , namely
f ( x1 , x2 , x3 ) = f ( x, y , z ) = x  y  z (plane)

ª F1 ( x1 , x2 , x3 ) = Z ( x, y, z ) := x 2 + 2 y 2  1 = 0 (elliptic cylinder)
« F ( x , x , x ) = E ( x, y , z ) := 3x  4 z = 0 (plane)
¬ 2 1 2 3
wFi ª2 x 4 y 0 º
J=( )=« , rk J ( x z 0 oder y z 0) = r = 2 .
wx j ¬ 3 0 4 »¼

:Variational Problem:
L ( x1 , x2 , x3 ; O1 , O2 ) = L ( x, y, z; O , P )
= x  y  z  O ( x + 2 y 2  1)  P (3 x  4 z ) =
2
extr
x1 , x2 , x3 ; O , P

wL º
= 1  2O x  3P = 0 »
wx
»
wL 1
= 1  4O y = 0 Ÿ O =  »
wy 4y »
»
wL 1 »Ÿ
= 1  4 P = 0 Ÿ O = 
wz 4 »
wL »
 = x2 + 2 y 2  1 = 0 »
wO »
wL »
 = 3 x  4 z = 0. »
wP ¼
We multiply the first equation wL / wx by 4y, the second equation wL / wy by
(2 x) and the third equation wL / wz by 3 and add !
4 y  8O xy  12P y + 2 x + 8O xy  3 y + 12P y = y + 2 x = 0 .
542 Appendix C: Lagrange Multipliers

Replace in the cylinder equation (first side condition) Z(x, y, z)= x 2 + 2 y 2  1 = 0 ,


that is x1,2 = ±1/ 3. From the second condition of the plane (second side condi-
tion) E ( x, y, z ) = 3 x  4 z = 0 we gain z1,2 = ±1/ 4. As a result we find x1,2 , z1,2
and finally y1,2 = B 2 / 3.
The matrix of second derivatives H decides upon whether at the point
O 1,2 = B 3 / 8 we find a maximum or minimum.

w 2L ª 2O 0 0º
H=( )=« 0 4O 0 »
wx j xk «¬ 0 0 0 »¼

3
ºª ª - 34 0 0 º
3 ª 4 03 0 º » « H (O2 = 3 ) = « 0 - 3 0 » d 0
H (O1 =  ) = «0 2 0 » t 0
8 «0 0 0» »« 8 « 0 02 0 »
¬ ¼ »« ¬ ¼
(minimum) » «(maximum)
»«
( x, y, z; O , P )1 =( 13 ,- 32 , 14 ;- 83 , 14 ) 1 2 1 3 1
» «( x, y, z; O , P ) 2 =(- 3 , 3 ,- 4 ; 8 , 4 )
is the restricted minmal solution point.»¼ «¬is the restricted maximal solution point.

The geometric interpretation of the Hesse matrix follows from E. Grafarend and
P. Lohle (1991).
The matrix of second derivatives H decides upon whether at the point
( x1 , x2 , x3 , O )1, 2 we enjoy a maximum or minimum.
Apendix D: Sampling distributions and their use:
confidence intervals and confidence regions
D1 A first vehichle: Transformation of random variables
If the probability density function (p.d.f.) of a random vector y = [ y1 ,… , yn ]c is
known, but we want to derive the probability density function (p.d.f.) of a ran-
dom vector x = [ x1 ,… , xn ]c (p.d.f.) which is generated by an injective mapping x
=g(y) or xi = g i [ y1 ,… , yn ] for all i  {1," , n} we need the results of Lemma D1.
Lemma D1 (transformation of p.d.f.):
Let the random vector y := [ y1 ,… , yn ]c be transformed into the random vector
x = [ x1 ,… , xn ]c by an injective mapping x = g(y) or xi = gi [ y1 ,… , yn ] for all
i  {1,… , n} which is of continuity class C1 (first derivatives are continuous). Let
the Jacobi matrix J x := (wg i / wyi ) be regular ( det J x z 0 ), then the inverse
transformation y = g-1(x) or yi = gi1 [ x1 ,… , xn ] is unique. Let f x ( x1 ,… , xn ) be
the unknown p.d.f., but f y ( y1 ,… , yn ) the given p.d.f., then

f x ( x1 ,… , xn ) = f ( g11 ( x1 ,… , xn ),… , g11 ( x1 ,… , xn )) det J y

with respect to the Jacobi matrix


wyi wg 1
J y := [ ]=[ i ]
wx j wx j
for all i, j  {1,… , n} holds.
Before we sketch the proof we shall present two examples in order to make you
more familiar with the notation.
Example D1 (“counter example”):
The vector-valued random variable (y1, y2) is transformed into the vector-valued
random variable (x1, x2) by means of
x1 = y1 + y2 , x2 = y12 + y22
wx ª wx / wy1 wx1 / wy2 º ª 1 1 º
J x := [ ]= « 1 =
wy c ¬wx2 / wy1 wx2 / wy2 »¼ «¬ 2 y1 2 y2 »¼

x12 = y12 + 2 y1 y2 + y22 , x2 + 2 y1 y2 = y12 + 2 y1 y2 + y22


x12 = x2 + 2 y1 y2 , y2 = ( x12  x2 ) /(2 y1 )
1 x12  x2 1
x1 = y1 + y2 = y1 + , x1 y1 = y12 + ( x12  x2 )
2 y1 2
1
y12  x1 y1 + ( x12  x2 ) = 0
2
544 Appendix D: Sampling distributions and their use

1 x2 1
y1± =  x1 ± 1  ( x12  x2 )
2 4 2

1 x12 1 2
y2± = x1 B  ( x1  x2 ) .
2 4 2
At first we have computed the Jacobi matrix J x, secondly we aimed at an inver-
sion of the direct transformation ( y1 , y2 ) 6 ( x1 , x2 ) . As the detailed inversion
step proves, namely the solution of a quadratic equation, the mapping x = g(y) is
not injective.
Example D2:
Suppose (x1, x2) is a random variable having p.d.f.
ªexp( x1  x2 ), x1 t 0, x2 t 0
f x ( x1 , x2 ) = «
¬0 , otherwise.

We require to find the p.d.f. of the random variable


(x1+ x2, x2 / x1).
The transformation
x2
y1 = x1 + x2 , y2 =
x1

has the inverse


y1 yy
x1 = , x2 = 1 2 .
1 + y2 1 + y2

The transformation provides a one-to-one mapping between points in the first


quadrant of the (x1, x2) - plane Px2 and in the first quadrant of the (y1, y2) - plane
Py2 . The absolute value of the Jacobian of the transformation for all points in the
first quadrant is
wx1 wx1
w ( x1 , x2 ) wy1 wy2 (1 + y2 ) 1  y1 (1 + y2 ) 2
= = =
w ( y1 , y2 ) wx2 wx2 y2 (1 + y2 ) 1 y1 (1 + y2 ) 2 [(1 + y2 )  y2 ]
wy1 wy2
y1
= y1 (1 + y2 ) 3 + y1 y2 (1 + y2 ) 3 = .
(1 + y2 ) 2

Hence we have found for the p.d.f. of (y1, y2)


D1 A first vehichle: Transformation of random variables 545

ª y1
exp( y1 ) , y1 > 0, y2 > 0
f y ( y1 , y2 ) = « (1 + y2 ) 2
«
«¬0 , otherwise.

Incidentally it should be noted that y1 and y2 are independent random variables,


namely
f y ( y1 , y2 ) = f1 ( y1 ) f ( y2 ) = y1 exp( y1 )(1 + y2 ) 2 . h

Proof:
The probability that the random variables y1 ,… , yn take on values in the region
: y is given by

³" ³ f y ( y1 ,… , yn )dy1 " dyn .


:y

If the random variables of this integral are transformed by the function


xi = gi ( y1 ,… , yn ) for all i  {1,… , n} which map the region :y onto the regions
:x , we receive

³" ³ f y ( y1 ,… , yn )dy1 " dyn =


:y

³" ³ f y ( g11 ( x1 ,… , xn ),… , g n1 ( x1 ,… , xn )) det J y dx1 " dxn


:x

from the standard theory of transformation of hypervolume elements, namely

dy1 " dyn = | det J y | dx1 " dxn


or
*(dy1 š " š dyn ) = | det J y | * (dx1 š " š dxn ).
Here we have taken advantage of the oriented hypervolume element
dy1 š " š dyn (Grassmann product, skew product, wedge product) and the
Hodge star operator * applied to the n - differential form dy1 š " š dyn  / n
(the exterior algebra / n ).
The star * : / p o / n  p in R n maps a p - differential form onto a (n-p) - differ-
ential form, in general. Here p = n, n – p = 0 applies. Finally we define
f x ( x1 ,… , xn ) := f ( g11 ( x1 ,… , xn ),… , g n1 ( x1 ,… , xn )) | det J y |

as a function which is certainly non-negative and integrated over :x to one.


h
546 Appendix D: Sampling distributions and their use

In applying the transformation theorems of p.d.f. we meet quite often the prob-
lem that the function xi = gi ( y1 ,… , yn ) for all i  {1,… , n} is given but not the
inverse function yi = gi1 ( x1 ,… , xn ) for all i  {1,… , n} . Then the following
results are helpful.
Corollary D2 (Jacobian):
If the inverse Jacobian | det J x | = | det(wgi / wy j ) | is given, we are able to
compute.
wgi1 ( x1 ,… , xn ) wgi ( y1 ,… , yn ) 1
| det J y | = | det | = | det J |1 = | det | .
wx j wy j

Example D3 (Jacobian):
Let us continue Example D2. The inverse map

ª g 1 ( y , y ) º ª x + x º wy ª 1 1 º
y = « 11 1 2 » = « 1 2 » Ÿ =« 2 »
¬« g 2 ( y1 , y2 ) ¼» ¬ x2 / x1 ¼ wxc ¬  x2 / x1 1/ x1 ¼

wy 1 x x +x
jy = | J y | = = + 22 = 1 2 2
wx c x1 x1 x1

wx x12 x2
jx = | J x | = j y1 = | J y |1 = = = 1
wy c x1 + x2 y1

allows us to compute the Jacobian Jx from Jy. The direct map

ª g ( y , y ) º ª x º ª y /(1 + y2 ) º
x=« 1 1 2 »=« 1»=« 1 »
«¬ g 2 ( y1 , y2 ) »¼ ¬ x2 ¼ ¬ y1 y2 /(1 + y2 ) ¼

leads us to the final version of the Jacobian.


y1
jx = | J x | = .
(1 + y2 ) 2

For the special case that the Jacobi matrix is given in a partitioned form, the
results of Corollary D3 are useful.
Corollary D3 (Jacobian):
If the Jacobi matrix Jx is given in the partitioned form
wg i ªU º
| J x |= ( )=« »,
wy j ¬V ¼
then
D2 A second vehicle: Transformation of random variables 547

det J x = | det J x J xc | = det(UU c) det[VV c  (VUc)(UU c) 1 UV c]

if det(UU c) z 0

det J x = | det J x J xc | = det(VV c) det[UU c  UV c(VV c) 1 VU c] ,

if det(VV c) z 0

| det J y | = | det J x |1 .

Proof:
The Proof is based upon the determinantal relations of a partitioned matrix of
type
ªA Uº
» = det A det(D  VA U ) if det A z 0
1
det «
¬ V D ¼
ªA Uº
» = det D det( A  UD U) if det D z 0
1
det «
¬ V D ¼
ªA Uº
det « » = D det A  V (adj A )U ,
¬ V D¼
which have been introduced by G. Frobenius (1908): Über Matrizen aus positi-
ven Elementen, Sitzungsberichte der Königlich Preussischen Akademie der Wis-
senschaften von Berlin, 471-476, Berlin 1908 and J. Schur (1917): Über Potenz-
reihen, die im Innern des Einheitskreises beschränkt sind, J. reine und angew.
Math 147 (1917) 205-232.

D2 A second vehicle: Transformation of random variables


Previously we analyzed the transformation of the p.d.f. under an injective map of
random variables y 6 g ( y ) = x . Here we study the transformation of polar
coordinates [I1 , I2 ,… , In 1 , r ]  Y as parameters of an Euclidian observation
space to Cartesian coordinates [ y1 ,… , yn ]  Y . In addition we introduce the
hypervolume element of a sphere S n 1  Y, dim Y = n . First, we give three
examples. Second, we summarize the general results in Lemma D4.
Example D4 (polar coordinates: “2d”):
Table D1 collects characteristic elements of the transformation of polar coordi-
nates (I1 , r ) of type “longitude, radius” to Cartesian coordinates ( y1 , y2 ), their
domain and range, the planar elements dy1 , dy2 as well as the circle S1 embedded
into E 2 := {R 2 , G kl } , equipped with the canonical metric I 2 = [G kl ] and its total
measure of arc Z1.
548 Appendix D: Sampling distributions and their use

Table D1
Cartesian and polar coordinates of a two-dimensional observation space,
total measure of the arc of the circle

(I1 , r )  [0, 2S ] × ]0, f[ , ( y1 , y2 )  R 2

dy1dy2 = rdrdI1

S1 := {y  R 2 | y12 + y22 = 1}
2S

Z1 = ³ dI 1 = 2S .
0

Example D5 (polar coordinates: “3d”):


Table D2 is a collectors’ item for characteristic elements of the transformation of
polar coordinates (I1 , I2 , r ) of type “longitude, latitude, radius” to Cartesian
coordinates ( y1 , y2 , y3 ), their domain and range, the volume element
dy1 , dy2 , dy3 as well as of the sphere S2 embedded into E3 := {R 3 , G kl }
equipped with the canonical metric I 3 = [G kl ] and its total measure of surface Z2.
Table D2
Cartesian and polar coordinates of a three-dimensional observation space,
total measure of the surface of the circle
y1 = r cos I2 cos I1 , y2 = r cos I2 sin I1 , y3 = r sin I2

S S
(I1 , I2 , r )  [0, 2S ] × ]  , [ × ]0, r[ , ( y1 , y2 )  R 2
2 2
( y1 , y2 , y3 ), R 3

dy1dy2 dy3 = r 2 dr cos I2 dI1dI2

S 2 := { y  R 3 | y12 + y22 + y32 = 1}


2S +S / 2

Z2 = ³ dI ³ 1 dI2 cos I2 = 4S .
0 S / 2

Example D6 (polar coordinates: “4d”):


Table D3 is a collection of characteristic elements of the transformation of polar
coordinates (I1 , I2 , I3 , r ) to Cartesian coordinates ( y1 , y2 , y3 , y4 ), their domain
and range, the hypervolume element dy1 , dy2 , dy3 , dy4 as well as of the 3 - sphere
S3 embedded into E 4 := {R 4 , G kl } equipped with the canonical metric I 4 = [G kl ]
and its total measure of hypersurface.
D2 A second vehicle: Transformation of random variables 549

Table D3
Cartesian and polar coordinates
of a four-dimensional observation space total measure of the
hypersurface of the 3-sphere
y1 = r cos I3 cos I2 cos I1 , y2 = r cos I3 cos I2 sin I1 ,

y3 = r cos I3 sin I2 , y4 = r sin I3

S S S S
(I1 , I2 , I3 , r )  [0, 2S ] × ]  , [ × ]  , [ × ]0, 2S [
2 2 2 2
dy1dy2 dy3dy4 = r 3 cos2 I3 cos I2 drdI3 dI2 dI1

w ( y1 , y2 , y3 , y4 )
J y := =
w (I1 , I2 , I3 , r )

ª r cos I3 cos I2 sin I1 r cos I3 sin I2 cos I1 r sin I3 cos I2 cos I1 cos I3 cos I2 cos I1 º
« »
« + r cos I3 cos I2 cos I1  r cos I3 sin I2 sin I1  r sin I3 cos I2 sin I1 cos I3 cos I2 sin I1 »
« 0 + r cos I3 cos I2  r sin I3 sin I2 cos I3 cos I2 »
« »
¬ 0 0 r cos I3 sin I3 ¼
| det J y |= r 3 cos 2 I3 cos I2

S3 := { y  R 4 | y12 + y22 + y32 + y42 = 1}

Z3 = 2S 2 .
Lemma D4 (polar coordinates, hypervolume element, hypersurface element):

ª cos I ˜ cos I ˜ cos I ˜ " ˜ cos I ˜ cos I º


« n 1 n2 n3 2 1»
ª y1 º « »
« y » « cos In 1 ˜ cos In  2 ˜ cos In  3 ˜ " ˜ cos I 2 ˜ sin I1 »
« 2 » « »
« y3 » « cos In 1 ˜ cos In  2 ˜ cos In  3 ˜ " ˜ sin I2 »
« » « »
y
« 4 » « cos In 1 ˜ cos In  2 ˜ " ˜ cos I3 »
« " » = r« »
Let « »
y « " »
« n 3 » « »
« yn  2 » « cos In 1 ˜ cos In  2 ˜ sin In  3 »
« » « »
« yn 1 » cos In 1 ˜ cos In  2
« »
«« y »» « cos I ˜ sin I »
¬ n ¼ n 1 n  2
««sin I »»
¬ n 1 ¼
550 Appendix D: Sampling distributions and their use

be a transformation of polar coordinates (I1 , I2 ,… , In  2 , In 1 , r ) to Cartesian co-


ordinates ( y1 , y2 ,… , yn 1 , yn ) , their domain and range given by

S S S S S S
(I1 , I2 ,…, In2 , In1 , r )  [0, 2S ] × ]  , + [ ×"× ]  , + [ × ]  , + [ × ]0, f[,
2 2 2 2 2 2
then the local hypervolume element
dy1 ...dyn = r n 1dr cos n  2 In 1 cos n  3 In  2 ...cos 2 I3 cos I2 dIn 1dIn  2 ...dI3dI2 dI1 ,
as well as the global hypersurface element
+S / 2 +S / 2 2S
2 ˜ S ( n 1) / 2
Z n 1 = := ³ cos Inn12 dIn 1 " ³ cos I2 dI2 ³ dI1 ,
n 1
*( ) S / 2 S / 2 0
2
where J ( X ) is the gamma function.
Before we care for the proof, let us define Euler’s gamma function.
Definition D5 (Euler’s gamma function):
f

*( x) = ³ e  t t x 1 dt ( x > 0)
0

is Euler’s gamma function which enjoys the recurrence relation


*( x + 1) = x*( x)

subject to

1
*(1) = 1 or *( ) = S
2

3 1 1 1
*(2) = 1! *( ) = *( ) = S
2 2 2 2

… …

p pq pq
*(n + 1) = n ! *( ) = *( )
q q q

p
if is a rational
+
if n is integer, n  Z q
number, p / q  Q+ .

Example D7 (Euler’s gamma function):


D2 A second vehicle: Transformation of random variables 551

1
(i) *(1) = 1 (i) *( ) = S
2
3 1 1 1
(ii) *(2) = 1 (ii) *( ) = *( ) = S
2 2 2 2

5 3 3 3
(iii) *(3) = 1 ˜ 2 = 2 (iii) *( ) = *( ) = S
2 2 2 4

7 5 5 15
(iv) *(4) = 1 ˜ 2 ˜ 3 = 6 (iv) *( ) = *( ) = S.
2 2 2 8

Proof:
Our proof of Lemma D4 will be based upon computing the image of the tangent
space Ty S n 1  E n of the hypersphere S n 1  E n . Let us embed the hypersphere
S n 1 parameterized by (I1 , I2 ,… , In  2 , In 1 ) in E n parameterized by ( y1 ,… , yn ) ,
namely y  E n ,
y = e1r cos In 1 cos In  2 " cos I2 cos I1 +
+e 2 r cos In 1 cos In  2 " cos I2 sin I1 + " +
+e n 1r cos In 1 sin In  2 +
+e n r sin In 1.

Note that I1 is a parameter of type longitude, 0 d I1 d 2S , while I2 ,… , In1 are


parameters of type latitude, S / 2 < I2 < +S / 2,… , S / 2 < In1 < +S / 2 (open
intervals). The images of the tangent vectors which span the local tangent space
are given in the orthonormal n- leg {e1 , e 2 ,… , e n 1 , e n | 0} by
g1 := DI y =  e1r cos In 1 cos In  2 " cos I2 sin I1 +
1

+e 2 r cos In 1 cos In  2 " cos I2 cos I1

g 2 := DI y =  e1r cos In 1 cos In  2 "sin I2 cos I1 


2

e 2 r cos In 1 cos In  2 " sin I2 sin I1 +


+e3 r cos In 1 cos In  2 " cos I2
...
g n 1 := DI y =  e1r sin In 1 cos In  2 " cos I2 cos I1  " 
n 1

e n 1r sin In 1 sin In  2 +


+e n r cos In 1
552 Appendix D: Sampling distributions and their use

g n := DI y = e1r cos In 1 cos In  2 " cos I2 cos I1 +


n

+ e 2 r cos In 1 cos In  2 " sin I2 sin I1 + " +


+e n 1r cos In 1 cos In  2
+ e n r sin In 1 = y / r.

{g1 ,… , g n 1} span the image of the tangent space in E n . gn is the hypersphere
normal vector, || gn|| = 1. From the inner products < gi | gj > = gij, i, j  {1,… , n} ,
we derive the Gauss matrix of the metric G:= [ gij].
< g1 | g1 > = r 2 cos 2 In 1 cos 2 In  2 " cos 2 I3 cos 2 I2
< g 2 | g 2 > = r 2 cos 2 In 1 cos 2 In  2 " cos 2 I3
"
< g n 1 | g n 1 > = r 2 ,
< g n | g n > = 1.

The off-diagonal elements of the Gauss matrix of the metric are zero.
Accordingly

det G n = det G n 1 = r n 1 (cos In 1 ) n  2 (cos In  2 ) n 3 " (cos I3 ) 2 cos I2 .

The square root det G n , det G n1 elegantly represents the Jacobian deter-
minant
w ( y1 , y2 ,… , yn )
J y := = det G n .
w (I1 , I2 ,… , In 1 , r )

Accordingly we have found the local hypervolume element det G n


dr dIn 1 dIn  2 " dI3 dI2 dI1 . For the global hypersurface element Z n 1 , we inte-
grate
2S

³ dI 1 = 2S
0
+S / 2

³ cos I2 dI2 = [sin I2 ]+SS // 22 = 2


S / 2

+S / 2
1
³ cos 2 I3 dI3 = [cos I3 sin I3  I3 ]+SS // 22 = S / 2
S / 2
2
+S / 2
1 4
³ cos3 I4 dI4 = [cos 2 I4 sin I4  2sin I4 ]+SS // 22 = 
S / 2
3 3

...
D3 A first confidence interval of Gauss-Laplace normally distributed observations 553
+S / 2 +S / 2
1 1
³ (cos In 1 ) n  2 dIn 1 = [(cos In 1 ) n  3 ]+SS // 22 + (cos In 1 ) n  4 dIn 1
S / 2
n2 n  3 S³/ 2

recursively. As soon as we substitute the gamma function, we arrive at Zn-1.


h
D3 A first confidence interval of Gauss-Laplace normally dis-
tributed observations P ,V 2 known, the Three Sigma Rule
The first confidence interval of Gauss-Laplace normally distributed observations
constrained to ( P , V 2 ) known, will be computed as an introductory example. An
application is the Three Sigma Rule.
In the empirical sciences, estimates of certain quantities derived from observa-
tions are often given in the form of the estimate plus or minus a certain amount.
For instance, the distance between a benchmark on the Earth’s surface and a
satellite orbiting the Earth may be estimated to be
(20, 101, 104.132 ± 0.023) m
with the idea that the first factor is very unlikely to be outside the range
20, 101, 104.155 m to 20, 101, 104.109 m.
A cost accountant for a publishing company in trying to allow for all factors
which enter into the cost of producing a certain book,
actual production costs,
proportion of plant overhead,
proportion of executive salaries,
may estimate the cost to be 21 ± 1,1 Euro per volume with the implication that
the correct cost very probably lies between 19.9 and 22.1 Euro per volume. The
Bureau of Labor Statistics may estimate the number of unemployed in a certain
area to be 2.4 ± .3 million at a given time though intuitively it should be between
2.1 and 2.7 million. What we are saying is that in practice we are quite accus-
tomed to seeing estimates in the form of intervals.
In order to give precision to these ideas we shall consider a particular example.
Suppose that a random sample x  {R, pdf } is taken from a Gauss-Laplace
normal distribution with known mean P and known variance V 2 . We ask the
key question.
?What is the probability J of the random interval ( P  cV , P + cV )
to cover the mean P as a quantile c of the standard deviation V ?
To put this question into a mathematical form we write the probabilistic two-
sided interval identity.
554 Appendix D: Sampling distributions and their use

P ( x1 < X < x2 ) = P ( P  cV < X < P + cV ) = J ,


x2 = P + cV
1 § 1 ·
³ exp ¨  2 ( x  P ) 2 ¸ dx = J
x1 =  cV V 2S
P © 2V ¹

with a left boundary l = x1 and a right boundary r = x2 . The length of the inter-
val is x2  x1 = r  l . The center of the interval is ( x1 + x2 ) / 2 or P . Here we
have taken advantage of the Gauss-Laplace pdf in generating the cumulative
probability
P( x1 < X < x2 ) = F( x2 )  F( x1 )
F( x2 )  F( x1 ) = F( P + cV )  F( P + cV ).

Typical values for the confidence coefficient J are J = 0.95 ( J = 95% or


1  J = 5% negative confidence), J =0.99 ( J = 1% or 1  J = 1% negative con-
fidence) or J = 0.999 ( J = 999% or 1  J = 1% negative confidence).
O O

f(x)

P-3V P-2V P-V P P+V P+2V P+3V


x

Figure D1: Probability mass in a two-sided confidence interval x1 < X< x2 or


P  cV < X< P + cV , three cases: (i) c = 1 , (ii) c = 2 and (iii) c = 3.
D3 A first confidence interval of Gauss-Laplace normally distributed observations 555

Consult Figure D1 for a geometric interpretation. The confidence coefficient J


is a measure of the probability mass between x1 = P  cV and x2 = P + cV . For
a given confidence coefficient J
x2

³ f ( x | P ,V
2
)dx = J
x1

establishes an integral equation. To make this point of view to be better under-


stood let us transform the integral equations to its standard form.
1
x6z= (x  P) œ x = P + V z
V
x2 = P + cV +c
1 § 1 · 1 § 1 ·
³ exp ¨  2 ( x  P ) 2 ¸ dx = ³ exp ¨  z 2 ¸ dz = J
x = P  cV V 2S
1
© 2V ¹ c 2S © 2 ¹
x2 +c

³ f ( x | P , V 2 )dx = ³ f ( z | 0,1)dz = J .
x1 c

The special Helmert transformation maps x to z, now being standard Gauss-


Laplace normal: V 1 is the dilatation factor, also called scale variation, but P
the translation parameter.
The Gauss-Laplace pdf is symmetric, namely f (  x ) = f ( + x ) or f (  z ) = f ( + z ) .
Accordingly we can write the integral identity
x2 x2

³ f ( x | P , V 2 )dx = 2 ³ f ( x | P , V 2 )dx = J œ
x1 0
+c c
œ ³ f ( z | 0,1)dz = 2 ³ f ( z | 0,1)dz = J .
c 0

The classification of integral equations tells us that


z

J ( z ) = 2 ³ f ( z )dz
0

is a linear Volterra integral equation the first kind.


In case of a Gauss-Laplace standard normal pdf, such an integral equation is
solved by a table. In a forward computation
z z
1 § 1 2 ·
F( z ) := ³ f ( z | 0,1)dz or ) ( z ) := ³ 2S exp ¨©  2 z ¸ dz
f f ¹
are tabulated in a regular grid. For a given value F ( z1 ) or F ( z2 ) , z1 or z2 are
determined by interpolation. C. F. Gauss did not use such a procedure. He took
556 Appendix D: Sampling distributions and their use

advantage of the Gauss inequality which has been reviewed in this context by F.
Pukelsheim (1994). There also the Vysochanskii-Petunin inequality has been
discussed. We follow here a two-step procedure. First, we divide the domain
z  [0, f] into two intervals z  [0,1] and z  [1, f ] . In the first interval f ( z ) is
isotonic, differentiable and convex, f cc( z ) = f ( z )( z 2  1) < 0, while in the second
interval isotonic, differentiable and concave, f cc( z ) = f ( z )( z 2  1) > 0 . z = 1 is
the point of inflection. Second, we setup Taylor series of f ( z ) in the interval
z  [0,1] at the point z = 0 , while in the interval z  [1, f ] at the point z = 1 and
z  [2, f] at the point z = 2 .
Three examples of such a forward solution of the characteristic linear Volterra
integral equation of the first kind will follow. They establish:

The One Sigma Rule

The Two Sigma Rule The Three Sigma Rule

Box D1
Operational calculus applied to
the Gauss-Laplace probability distribution
“generating differential equations”
f cc( z ) + 2 f c( z ) + f ( z ) = 0
subject to
+f

³ f ( z )dz = 1
f

“recursive differentiation”
1 § 1 ·
f ( z) = exp ¨  z 2 ¸
2S © 2 ¹
f c( z ) =  zf ( z ) =: g ( z )
f cc( z ) = g c( z ) =  f ( z )  zg ( z ) = ( z 2  1) f ( z )
f ccc( z ) = 2 zf ( z ) + ( z 2  1) g ( z ) = (  z 3 + 3 z ) f ( z )
f ( 4) ( z ) = (3 z 2 + 3) f ( z ) + ( z 3 + 3 z ) g ( z ) = ( z 4  6 z 2 + 3) f ( z )
f (5) ( z ) = (4 z 3  12 z ) f ( z ) + ( z 4  6 z 2 + 3) g ( z ) = (  z 5 + 10 z 3  15 z ) f ( z )
f (6) ( z ) = (5 z 4 + 30 z 2  15) f ( z ) + (  z 5 + 10 z 3  15 z ) g ( z ) =
= ( z 6  15 z 4 + 45 z 2  15) f ( z )
D3 A first confidence interval of Gauss-Laplace normally distributed observations 557

f (7) ( z ) = (6 z 5  60 z 3 + 90 z ) f ( z ) + ( z 6  15 z 4 + 45 z 2  15) g ( z ) =
= ( z 7 + 21z 5  105 z 3 + 105 z ) f ( z )
f (8) ( z ) = (7 z 6 + 105 z 4  315 z 2 + 105) f ( z ) + ( z 7 + 21z 5  105 z 3 + 105 z ) f ( z ) =
= ( z 8  28 z 6 + 210 z 4  420 z 2 + 105) f ( z )
f (9) ( z ) = (8 z 7  168 z 5 + 840 z 3  840 z ) f ( z ) +
+( z 8  28 z 6 + 210 z 4  420 z 2 + 105) g ( z ) =
= (  z 9 + 36 z 7  378 z 5 + 1260 z 3  945z ) f ( z )
f (10) ( z ) = (9 z 8 + 252 z 6  1890 z 4 + 3780 z 2  945) f ( z ) +
+ ( z 9 + 36 z 7  378 z 5 + 1260 z 3  945) g ( z ) =
= ( z10  45 z 8 + 630 z 6  3150 z 4 + 4725 z 2  945) f ( z )

”upper triangle representation of the matrix transforming f ( z ) o f n ( z ) ”

ª f ( z) º ª1 º
« » ª 1 0 0 0 0 0 0 0 0 0 0º « »
« f c( z ) » « 0 1 0 0 0 0 0 0 0 0 0 »» « »
z
« f cc( z ) » « « 2»
« » « 1 0 1 0 0 0 0 0 0 0 0» « z »
ccc
« f ( z) » « » 3
« 0 3 0 1 0 0 0 0 0 0 0» « z »
« (4) » « 4»
« f ( z) » « 3 0 6 0 1 0 0 0 0 0 0» « z »
« f (5) ( z ) » = f ( z ) « 0 15 0 100 0 0 0 1
»
0 0» « z 5 » .
« » « « »
« f (6) ( z ) » « 15 0 45 01 0 0 15 0 0 0» « z 6 »
« (7) » « »« »
« f ( z) » « 0 105 0 105 0 21 0 1 0 0 0» « z 7 »
« (8) » « »
« 105 0 420 0 210 0 28 0 1 0 0» « z8 »
« f ( z) » « »
« f (9) ( z ) » « 0 945 0 1260 0 378 0 36 0 1 0 » « z 9 »
« (10) » « »
¬ 945 0 4725 0 3150 0 630 0 45 0 1 ¼ « 10 »
¬« f ( z ) ¼» ¬« z ¼»

D31 The forward computation of a first confidence interval of Gauss-


Laplace normally distributed observations: P , V 2 known
We can avoid solving the linear Volterra integral equation of the first kind if we
push forward the integration for a fixed value z.
Example D8 (Series expansion of the Gauss-Laplace integral, 1st interval):
Let us solve the integral
1

J ( z = 1) := 2³ f ( z )dz
0
558 Appendix D: Sampling distributions and their use

in the first interval 0 d z d 1 by Taylor expansion with respect to the successive


differentiation of f ( z ) outlined in Box D1 and the specific derivatives f n (0)
given in Table D1. Based on those auxiliary results, Box D2 presents us the de-
tailed interpretation. First, we expand exp(z 2 / 2) up to order O(14). The spe-
cific Taylor series are uniformly convergent. Accordingly, in order to compute
the integral, second we integrate termwise up to order O(15). For the specific
value z=1, we have computed the coefficient of confidence J (1) = 0.683 . The
result
P( P  V < X < P + V ) = 0.683

is known as the One Sigma Rule.


68.3 per cent of the sample are in the interval ]P  1V , P + 1V [ ,
0.317 per cent outside. If we make 3 experiments, one experiment
is outside the 1V interval.
Box D2
A specific integral
“expansion of the exponential function”
x x 2 x3 xn
exp x = 1 +
+ + + " + + O (n)  |x|<f
1! 2! 3! n!
1 2
x:=- z
2
§ 1 · 1 1 1 1 8 1 10 1
exp ¨  z 2 ¸ = 1  z 2 + z 4  z 6 + z  z + z12 +O (14)
© 2 ¹ 2 8 48 384 3840 46080

“the specific integral”

( )
z z
1 2 1 1 1 5 1 7
³ f ( z )dz = ³ exp  z dz = ( z  z3 + z  z +
0 2S 0 2S 6 40 336
1 9 1 1
+ z  z11 + z13 + O (15))
3456 42240 599040
“the specific values z=1”
1
2 1 1 1 1 1 1
J (1) = 2 ³ f ( z )dz = (1  +  +  + + O (15))
0 2S 6 40 336 3456 42240 599040

2
= (1  0.166, 667 + 0.025, 000) 
2S
0.002,976 + 0.000, 289 

0, 000, 024 + 0.000, 002) =


D3 A first confidence interval of Gauss-Laplace normally distributed observations 559

2
= 0.855, 624 = 0.682, 689 =
2S
= 0.683
“coefficient of confidence”
1
0.683 = 1  317.311 103 = 1  .
3
Table D1
Special values of derivatives
Gauss-Laplace probability distribution
z=0
1
= 0.398, 942
2S
1
f (0) = , f c(0) = 0,
2S
1 1
f cc(0) =  , f cc(0) = 0.199, 471
2S 2!
3 1 (4)
f ccc(0) = 0, f ( 4 ) (0) = + , f (0) = +0.049,868
2S 4!
15 1 (6)
f (5) (0) = 0, f (6) (0) =  , f (0) = 0.008,311.
2S 6!
Example D9 (Series expansion of the Gauss-Laplace integral, 2nd interval):
Let us solve the integrals
2 1 2

J ( z = 2) := 2 ³ f ( z )dz = 2 ³ f ( z )dz + 2 ³ f ( z )dz


0 0 1

J ( z = 2) = J ( z = 1) + 2 ³ f ( z )dz ,
1

namely in the 2nd interval 1 d z d 2 . First, we setup Taylor series of f ( z )


“around the point z=1”. The derivatives of f ( z ) “at the point z=1” are collected
up to order 10 in Table D2. Second, we integrate the Taylor series termwise and
receive the specific integral of Box D3. Note that termwise integrated is permit-
ted since the Taylor series are uniformly convergent. The detailed computation
560 Appendix D: Sampling distributions and their use

up to order O(12) has led us to the coefficient of confidence J (2) = 0.954 . The
result
P( P  2V < X < P + 2V ) = 0.954

is known as the Two Sigma Rule.


95.4 per cent of the sample interval ]P  2V , P + 2V [ , 0.046 per
cent outside. If we make 22 experiments, one experiment is outside
the 2V interval.
Box D3
A specific integral
“expansion of the exponential function”
1 § 1 ·
f ( z ) := exp ¨  z 2 ¸
2S © 2 ¹
1 1 1 10
f ( z ) = f (1) + f c(1)( z  1) + f cc(1)( z  1) 2 + " + f (1)( z  1)10 + O(11)
1! 2! 10!
“the specific integral”
2
11 11

³ f ( z )dz

= f (1)( z  1) + f c(1)( z  1) 2 + f cc(1)( z  1) 3 +
1
2 1! 3 2!
11 1 1 (4)
+
f ccc(1)( z  1) 4 + f (1)( z  1)5 +
4 3! 5 4!
1 1 (5) 1 1 (10)
+ f (1)( z  1)6 + " + f (1)( z  1)11 + O(12)
6 5! 11 10!
case z=2
2

J (2) = J (1) + 2 ³ f ( z )dz =J (1) + 2(0.241, 971 


1

0.120, 985 + 0.020,122  0.004, 024 


0.002, 016 + 0.000, 768 + 0.000,120 
0.000, 088  0.000, 002  0.000, 050 + O(12)) =
= 0.682, 690 + 0.271, 632 =
= 0.954
“coefficient of confidence”
1
0.954 = 1  45.678 103 = 1  .
22
D3 A first confidence interval of Gauss-Laplace normally distributed observations 561

Table D2
Special values of derivatives
Gauss-Laplace probability distribution
z=1
1 § 1·
= 0.398,942, exp ¨  ¸ = 0.606,531
2S © 2¹
1 § 1·
f (1) = exp ¨  ¸ = 0.241,971
2S © 2¹
1 § 1·
f c(1) =  f (1) =  exp ¨  ¸ = 0.241,971, f cc(1) = 0
2S © 2¹
2 § 1·
f ccc(1) = 2 f (1) = exp ¨  ¸ = 0.482,942
2S © 2¹
1
f ccc(1) = +0.080, 490
3!
f ( 4 ) (1) = 2 f (1) = 0.482, 942

1 (4)
f (1) = 0.020,122
4!
6 § 1·
f (5) (1) = 6 f (1) =  exp ¨  ¸ = 1.451,826
2S © 2¹
1 (5)
f (1) = 0.012, 098
5!
16 § 1·
f (6) (1) = 16 f (1) = exp ¨  ¸ = 3.871,536
2S © 2¹
1 (6)
f (1) = 0.005, 377
6!
20 § 1·
f (7) (1) = 20 f (1) = exp ¨  ¸ = 4.829, 420
2S © 2¹
1 (7)
f (1) = 0.000, 958
7!
f (8) (1) = 132 f (1) = 31.940,172
562 Appendix D: Sampling distributions and their use

1 (8)
f (1) = 0.000, 792
8!
f (9) (1) = 28 f (1) = 6.775,188

1 (9)
f (1) = 0.000, 019
9!
f (10) (1) = 8234 f (1) = 1992.389

1 (10)
f (1) = 0.000,549.
10!
Example D10 (Series expansion of the Gauss-Laplace integral, 3rd interval):
Let us solve the integrals
3 1 2 3

J ( z = 3) := 2 ³ f ( z )dz = 2 ³ f ( z )dz + 2 ³ f ( z ) dz + 2 ³ f ( z ) dz
0 0 1 2

J ( z = 3) = J ( z = 1) + J ( z = 2) + 2 ³ f ( z )dz,
2

namely in the 3rd interval 2 d z d 3 . First, we setup Taylor series of f(z) “around
the point z=2”. The derivatives of f(z) “at the point z=2” are collected up to order
10 in Table D3. Second, we integrate the Taylor series termwise and receive the
specific integral of Box D4. Note that termwise integration is permitted since the
Taylor series are uniformly convergent. The detailed computation up to order
O(12) leads us to the coefficient of confidence J (3) = 0.997 . The result
P ( P  3V < X < P + 3V ) = 0.997

is known as the Three Sigma Rule.


99.7 per cent of the sample are in the interval ]P  3V , P + 3V [ ,
0.003 per cent outside. If we make 377 experiments, one experi-
ment is outside the 3V interval.

Table D3
Special values of derivatives
Gauss-Laplace probability distribution
z=2
1
= 0.398,942, exp ( 2 ) = 0.135,335
2S
D3 A first confidence interval of Gauss-Laplace normally distributed observations 563

1
f (2) = exp ( 2 ) = 0.053,991
2S
f c(2) = 2 f (2) = 0.107, 982

1
f cc(2) = 3 f (2), f cc(2) = +0.080, 986
2!
1
f ccc(2) = 2 f (2), f ccc(2) = 0.017, 997
3!
1 (4)
f (4) (2) = 5 f (2), f (2) = 0.011, 248
4!
1 (5)
f (5) (2) = 18 f (2), f (2) = +0.008, 099
5!
1 (6)
f (6) (2) = 11 f (2), f (2) = 0.000,825
6!
1 (7)
f (7) (2) = 86 f (2), f (2) = 0.000, 921
7!
1 (8)
f (8) (2) = +249 f (2), f (2) = +0.000, 333
8!
1 (9)
f (9) (2) = 190 f (2), f (2) = +0.000, 028
9!
1 (10)
f (10) (2) = 2621 f (2), f (2) = 0.000, 039 .
10!

Box D4
A specific integral
“expansion of the exponential function”
1 § 1 ·
f ( z ) := exp ¨  z 2 ¸
2S © 2 ¹
1 1 1 (10)
f ( z ) = f (2) + f c(2)( z  2) + f cc(2)( z  2) 2 + " + f (2)( z  2)10 + O (11)
1! 3! 10!
“the specific integral”
z
11 11
³ f (z

)dz = f (2)( z  2) + f c(2)( z  2) 2 + f cc(2)( z  2)3 +
2
2 1! 3 2!
564 Appendix D: Sampling distributions and their use

11 1 1 ( 4)
+ f ccc(2)( z  2) 4 + f (2)( z  2)5 +
4 3! 5 4!
1 1 (5) 1 1 (10)
+ f (2)( z  2)6 + " + f (2)( z  2)11 + O (12)
6 5! 1110!
case z=3
3
J (3) = J (1) + J (2) + 2 ³ f ( z ) dz =
2

= 0.682, 690 + 0.271, 672 + 2(0.053,991  0.053,991 + 0.026,995  0.004, 499 


0.002, 250 + 0.001,350  0.000,118 + 0.000, 037 + 0.000, 003  0.000, 004 +
+O(12)) = 0.682, 690 + 0.271, 632 + 0.043, 028 = 0.997

“coefficient of confidence”
1
0.997 = 1  2.65 10 3 = 1  .
377
D 32 The backward computation of a first confidence interval of Gauss-
Laplace normally distributed observations: P , V 2 known
Finally we solve the Volterra integral equation of the first kind by the technique
of series inversion, also called series reversion. Let us recognize that the interval
integration of a Taylor series expanded Gauss-Laplace normal density distribu-
tion led us to a univariate homogeneous polynomial of arbitrary order. Such a
univariate homogeneous polynomial y = a1 x + a2 x 2 + " + an x n (“input”) can be
reversed as a univariate homogeneous polynomial x = b1 y + b2 y 2 + " + bn y n
(“output”) as outlined in Table D4. Consult M. Abramowitz and I. A. Stegun
(1965 p. 16) for a review, but E. Grafarend, T. Krarup and R. Syffus (1996) for a
derivation based upon Computer Algebra.
Table D4
Series inversion
E. Grafarend, T. Krarup, R. Syffus:
Journal of Geodesy 70 (1996) 276-286
“input: univariate homogeneous polynomial”
y = a1 x + a2 x 2 + a3 x 3 + a4 x 4 + a5 x 5 + a6 x 6 + a7 x 7 + O ( x 8 )

“output: reverse univariate homogeneous polynomial”


x = b1 y + b2 y 2 + b3 y 3 + b4 y 4 + b5 y 5 + b6 y 6 + b7 y 7 + O ( y 8 )

“coefficient relation”
D3 A first confidence interval of Gauss-Laplace normally distributed observations 565

(i) a1b1 = 1

(ii) a13b2 =  a2

(iii) a15b3 = 2a22  a1 a3

(iv) a17 b4 = 5a1a2 a3  a12 a4  5a23

(v) a19 b5 = 6a12 a2 a4 + 3a12 a32 + 14a24  a13 a5  21a1a22 a3

(vi) a111b6 = 7a13 a2 a5 + 7a13 a2 a4 + 84a1a23 a3  a14 a6  28a1a2 a32  42a25  28a12 a22 a4

(vii) a113b7 = 6a14 a2 a6 + 8a14 a2 a5 + 4a14 a42 + 120a12 a23 a4 + 180a12 a22 a32 + 132a26 

 a15 a7  36a13 a22 a5  72a13 a2 a3 a4  12a13 a33  330a1 a24 a3 .

Example D 11 (Solving the Volterra integral equation of the first kind):


Let us define the coefficient of confidence J = 0.90 or ninety per cent. We want
to know the quantile c0.90 which determines the probability identity
P ( P  cV < X < P + cV ) = 0.90 .
If you follow the detailed computation of Table D5, namely the input as well as
the output data up to order O(5), you find the quantile
c0.90 = 1.64 ,
as well as the confidence interval
P ( P  1.64V < X < P + 1.64V ) = 0.90 .
90 per cent of the sample are in the interval
]P  1.64V , P + 1.64V [ , 10 per cent outside. If we
make 10 experiments one experiment is outside the
1.62 V interval.

Table D5
Series inversion quantile c0.90
(i) input
1 z
11
J ( z ) = 2 ³ f ( z )dz + 2 ³ f ( z )dz = 0.682, 689 + 2[ f (1)( z  1) + f c(1)( z  1) 2 +
0 0
2 1!
1 1
+" + f ( n 1) (1)( z  1) n + O(n + 1)]
n (n  1)!
566 Appendix D: Sampling distributions and their use

1 11
[J ( z )  0.682, 689] = f (1)( z  1) + f c(1)( z  1) 2 + " +
2 2 1!
1 1
+ f ( n 1) (1)( z  1)n + O ( n + 1)]
n ( n  1)!

y = a1 x + a2 x 2 + " + an x n + O ( n + 1)
x := z  1
y := (0.900, 000  0.682, 689) / 2 = 0.108, 656

a1 := f (1) = 0.241, 971

11
a2 := f c(1) = 0.241, 971
2 1!
11
a3 := f cc(1) = 0
3 2!
11
a4 := f ccc(1) = 0.020,125
4 3!
1 1 (4)
a5 := f (1) = 0.004, 024
5 4!

1 1
an := f ( n 1) (1) .
n (n  1)!
(ii) output
1
b1 = = 4.132, 726
a1

1
b2 =  a2 = 8.539, 715
a13

1
b3 = (2a22  a1a3 ) = 35.292,308
a15

1
b4 = (5a1a2 a3  a12 a4  5a23 ) = 158.070
a17

1
b5 = 9
(6a12 a2 a4 + 3a12 a32 + 14a24  a13 a5  21a1a22 a3 ) = 475.452,152
a1

b1 y = 0.449, 045, b2 y 2 = 0.100,821 ,


D4 A second confidence interval for the mean, variance known 567

b3 y 3 = 0.045, 273, b4 y 4 = 0.022, 032 ,


b5 y 5 = 0.007, 201
x = b1 y + b2 y 2 + b3 y 3 + b4 y 4 + b5 y 5 + O (6) = 0.624, 372

z = x + 1 = 1.624,372 = c0.90 .

At this end we would like to give some sample references on computing the
“inverse error function”
x
1 1
y = F ( x ) := ³ exp(  z 2 )dz =: erf x
0 2S 2
versus
x = F 1 ( y ) = inv erf y ,
namely L. Carlitz (1963) and A. J. Strecok (1968).
D4 Sampling from the Gauss-Laplace normal distribution:
a second confidence interval for the mean, variance known
The second confidence interval of Gauss-Laplace i.i.d. observations will be
constructed for the mean P̂ BLUUE of P , when the variance V 2 is known. “n”
is the size of the sample, namely to agree to the number of observations.
Before we present the general sampling distribution we shall work through two
examples. Example D12 has been chosen for a sample size n = 2, while Example
D12 for n=3 observations. Afterwards the general result is obvious and suffi-
ciently motivated.

f 2 ( x ) = f 2 (Vˆ / V )
2 2

x := Vˆ 2 / V 2
2
Figure D2: Special Helmert pdf F p for one degree of freedom p = 1
568 Appendix D: Sampling distributions and their use

f1 ( z 0,1)

z = ( Pˆ  P ) /(V 2 / 2)
Figure D3: Special Gauss Laplace normal pdf of ( Pˆ  P ) / V 2 / 2

Example D12 (Gauss-Laplace i.i.d. observations, observation space Y ,


dim Y = 2 )
In order to derive the marginal distributions of P̂ BLUUE of P and Vˆ 2
BIQUUE of V 2 for a two dimensional Euclidean observation space we have to
introduce various images. First, we define the probability function of two Gauss-
Laplace i.i.d. observations and consequently implement the basic ( Pˆ , Vˆ 2 ) de-
composition into the pdf. Second, we separate the quadratic form
( y  1P )c( y  1P ) into the quadratic form ( y  1Pˆ )c( y  1Pˆ ) , the vehicle to intro-
duce Vˆ 2 BIQUUE of V 2 , and into the quadratic form ( Pˆ  P ) 2 the vehicle to
bring in P̂ BLUUE of P . Third, we aim at transforming the quadratic form
( y  1Pˆ )c( y  1Pˆ ) = y c My, rk M = 1 , into the special form 1 2 ( y1  y2 ) 2 =: x .
Fourth, we generate the marginal distributions f1 ( Pˆ P , V 2 / 2) of the mean P̂
BLUUE of P and f 2 (V 2 ) of the sample variance Vˆ 2 BIQUUE of V 2 . The
basic results of the example are collected in Corollary D6. The special Helmert
pdf F 2 with one degree of freedom is plotted in Figure D2, but the special
Gauss-Laplace normal pdf of variance V 2 / 2 in Figure D3.
The first action item
Let us assume an experiment of two Gauss-Laplace i.i.d. observations. Their pdf
is given by
f ( y1 , y2 ) = f ( y1 ) f ( y2 ) ,
D4 A second confidence interval for the mean, variance known 569

1 § 1 ·
f ( y1 , y2 ) = 2
exp ¨  2 [( y1  P ) 2 + ( y2  P ) 2 ] ¸ ,
2SV © 2V ¹
1 § 1 ·
f ( y1 , y2 ) = exp ¨  2 (y  1P )c(y  1P ) ¸ .
2SV 2 © 2V ¹
The second action item
The coordinates of the observation vector have been denoted by [ y1 , y2 ]c =
= y  Y, dim Y = 2 . The quadratic form ( y  1P )c( y  1P ) allows the fundamen-
tal decomposition
y  1P = ( y  1Pˆ ) + 1( Pˆ  P )

( y  1P )c( y  1P ) = ( y  1Pˆ )c( y  1Pˆ ) + 1c1( Pˆ  P ) 2

( y  1P )c( y  1P ) = Vˆ 2 + 2( Pˆ  P ) 2 .

Here, P̂ is BLUUE of P and Vˆ 2 BIQUUE of V 2 . The detailed computation


proves our statement.
[( y1  Pˆ ) + (Pˆ  P )]2 + [( y2  Pˆ ) + (Pˆ  P)]2 =
= ( y1  Pˆ )2 + ( y2  Pˆ )2 + 2(Pˆ  P )2 + 2(Pˆ  P )[( y1  Pˆ ) + ( y2  Pˆ )] =
= ( y1  Pˆ )2 + ( y2  Pˆ )2 + 2(Pˆ  P )2 + 2Pˆ ( y1  Pˆ ) + 2Pˆ ( y2  Pˆ )  2( y1  Pˆ )P  2( y2  Pˆ )P
1
Pˆ = ( y1 + y2 )
2
Vˆ 2 = ( y1  Pˆ ) 2 + ( y2  Pˆ ) 2 .
As soon as we substitute P̂ and Vˆ 2 we arrive at
( y1  P ) 2 + ( y2  P ) 2 = [( y1  Pˆ ) + ( Pˆ  P )]2 + [( y2  Pˆ ) + ( Pˆ  P )]2 =
= Vˆ 2 + 2( Pˆ  P ) 2 ,

since the residual terms vanish:


2 Pˆ ( y1  Pˆ ) + 2 Pˆ ( y2  Pˆ )  2( y1  Pˆ ) P  2( y2  Pˆ ) P =
= 2 Pˆ y1  2 Pˆ 2 + 2Pˆ y2  2Pˆ 2  2P y1 + 2PPˆ  2 P y2 + 2PPˆ =
= 2 Pˆ ( y1 + y2 )  4 Pˆ 2  2 P ( y1 + y2 ) + 4 PPˆ =
= 4 Pˆ 2  4 Pˆ 2  4PPˆ + 4 PPˆ = 0.

The third action item


The cumulative pdf
570 Appendix D: Sampling distributions and their use

Vˆ 2 Vˆ 2
dF = f ( y1 , y2 )dy1 dy2 = f1 ( Pˆ ) f 2 ( x)d Pˆ dx = f1 ( Pˆ ) f 2 ( 2
) d Pˆ d 2
V V
has to be decomposed into the first pdf f1 ( Pˆ P , V 2 / n ) representing the pdf
of the sample mean P̂ and the second pdf f 2 ( x ) of the new variable
x := ( y1  y2 ) 2 /(2V 2 ) = Vˆ 2 / V 2 representing the sample variance Vˆ 2 , normal-
ized by V 2 .
? How can the second decomposition f1f2 be understood?
Let us replace the quadratic form ( y  1P )c( y  1P ) = Vˆ 2 + 2( Pˆ  P ) 2 in the cu-
mulative pdf
1 § 1 ·
dF = f ( y1 , y2 )dy1 dy2 = 2
exp ¨  2 (y  1P )c(y  1P ) ¸ dy1 dy2 =
2SV © 2V ¹
1 § 1 ·
= 2
exp ¨  2 [Vˆ 2 + 2( Pˆ  P ) 2 ] ¸ dy1 dy2
2SV © 2V ¹

1 1 § 1 (Pˆ  P)2 · 1 1 § 1 Vˆ 2 ·
dF = f ( y1 , y2 )dy1dy2 =
exp ¨  ¸ exp ¨  2 ¸
dy1dy2 .
2S V
2
© 2 V / 2 ¹ 2S 2V © 2V ¹
2
2
The quadratic form Vˆ , conventionally given in terms of the residual vector
y  1P̂ will be rewritten in terms of the coordinates [ y1 , y2 ]c = y of the observa-
tion vector.
1 1
Vˆ 2 = ( y1  Pˆ ) 2 + ( y2  Pˆ ) 2 = [ y1  ( y1 + y2 )]2 + [ y2  ( y1 + y2 )]2
2 2
1 1 1
Vˆ 2 = ( y1  y2 ) 2 + ( y2  y1 ) 2 = ( y1  y2 ) 2 .
4 4 2
The fourth action item
1
The new variable x := ( y1  y2 ) 2 will be introduced in the cumulative pdf
2V 2
dF = f ( y1 , y2 )dy1dy2 . The new surface element d Pˆ dx will be related to the old
surface element dy1dy2 .

ª D y Pˆ D y Pˆ º
d Pˆ dx = | det « 1 2
| dy dy = J dy1dy2
¬ Dy x 1
D y x »¼ 1 2
2

wPˆ 1 wPˆ 1
D y Pˆ := = , D y Pˆ := =
1
wy1 2 2
wy2 2
D4 A second confidence interval for the mean, variance known 571

wx y1  y2 wx y y
D y x := = , D y x := = 1 2 2 .
1
wy1 V2 2
wy2 V

The absolute value of the Jacobi determinant amounts to


ª 1 1 º
« 2 2 » y y V2
| J |=| det « » | = 1 2 2 , | J |1 = .
« y1  y2 
y1  y2 » V y1  y2
«¬ V 2 V 2 »¼
In consequence, we have derived

V2 V
dy1 dy2 = d Pˆ dx = d Pˆ dx
y1  y2 2 x
based upon
1 1
x= 2
( y1  y2 ) 2 Ÿ 2 x = ( y1  y2 ) .
2V V
In collecting all detailed partial results we can formulate a corollary.
Corollary D6: (marginal probability distributions of Pˆ, V 2 given, and Vˆ 2 ):
The cumulative pdf of a set of two observations is represented by
dF = f ( y1 , y2 )dy1 dy2 = f1 ( Pˆ P , V 2 / 2) f 2 ( x)d Pˆ dx

subject to
1 1 § 1 ( Pˆ  P ) 2 ·
f1 ( Pˆ P , V 2 / 2) := exp ¨  ¸
2S V
2
© 2 V /2 ¹
2
2
Vˆ 1 1 § 1 ·
f 2 ( x) = f 2 ( 2 ) := exp ¨  x ¸
V 2S x © 2 ¹
subject to
+f +f

³ f1 ( Pˆ ) d Pˆ = 1 and ³ f 2 ( x) dx = 1.
f 0

f1 ( Pˆ ) is the pdf of the sample mean Pˆ = ( y1 + y2 ) / 2 and f 2 ( x) the pdf of the


sample variance Vˆ 2 = ( y1  y2 ) 2 / 2 = V 2 x . f1 ( Pˆ ) is a Gauss-Laplace pdf with
mean P and variance V 2 / n , while f 2 ( x) is a Helmert F 2 with one degree of
freedom.
Example D13 (Gauss-Laplace i.i.d. observations, observation space Y ,
dim Y = 3 )
572 Appendix D: Sampling distributions and their use

In order to derive the marginal distribution of P̂ BLUUE of P and Vˆ 2


BIQUUE of V 2 for a three-dimensional Euclidean observation space, we have to
act in various scenes. First, we introduce the probability function of three Gauss-
Laplace i.i.d. observations and consequently implement the ( Pˆ , Vˆ 2 ) decomposi-
tion in the pdf. Second, we force the quadratic form ( y  1Pˆ )c( y  1Pˆ ) to be de-
composed into Vˆ 2 and ( Pˆ  P ) 2 , actually a way to introduce the sample vari-
ance Vˆ 2 BIQUUE of Vˆ 2 and the sample mean P̂ BLUUE of P . Third, we
succeed to transform the quadratic form ( y  1Pˆ )c( y  1Pˆ ) = yMy, rkM = 2 into
the canonical form z12 + z22 by means of [ z1 , z2 ]c = H[ y1 , y2 , y3 ]c . Fourth, we
produce the right inverse H k = Hc in order to invert H, namely [ y1 , y2 , y3 ]c =
= H '[ z1 , z2 ]c . Fifth, in order to transform the original quadratic form
V 2 ( y  1Pˆ )c( y  1Pˆ ) into the canonical form z12 + z22 + z32 we review the general
Helmert transformation z = V 1H( y  P ) and its inverse y  P = V Hcz subject
to H  SO(3) and identify its parameters translation, rotation and dilatation
(scale). Sixth, we summarize the marginal probability distributions
f1 ( Pˆ P , V 2 / 3) of the sample mean P̂ BLUUE of P and f 2 (2Vˆ 2 / V 2 ) of the
sample variance Vˆ 2 BIQUUE of V 2 . The special Helmert pdf F 2 with two
degrees of freedom is plotted in Figure D4 while the special Gauss-Laplace
normal pdf of variance V 2 / 3 in Figure D5. The basic results of the example are
collected in Corollary D7.

f 2 (2Vˆ 2 | V 2 )

x := 2V 2 | V 2

Figure D4: Special Helmert pdf F p2 for two degrees of freedom, p = 2


D4 A second confidence interval for the mean, variance known 573

V2
f1 ( Pˆ | P , )
3

V2
( Pˆ  P ) /
3
V2
Figure D5: Special Gauss-Laplace normal pdf of ( Pˆ  P ) /
3
The first action item
Let us assume an experiment of three Gauss-Laplace i.i.d. observations. Their
pdf is given by
f ( y1 , y2 , y3 ) = f ( y1 ) f ( y2 ) f ( y3 ) ,
§ 1 ·
f ( y1 , y2 , y3 ) = (2S ) 3 / 2 V 3 exp ¨  2 [( y1  P ) 2 + ( y2  P ) 2 + ( y3  P ) 2 ] ¸ ,
© 2V ¹
§ 1 ·
f ( y1 , y2 , y3 ) = (2S ) 3 / 2 V 3 exp ¨  2 (y  1P )c( y  1P ) ¸ .
© 2V ¹
The coordinates of the observation vector have been denoted by [ y1 , y2 , y3 ]c = y  Y ,
dim Y = 2 .
The second action item
The quadratic form ( y1  1P )c( y2  1P ) allows the fundamental decomposition
y  1P = ( y  1Pˆ ) + 1( Pˆ  P ) ,

( y  1P )c( y  1P ) = ( y  1Pˆ )c( y  1Pˆ ) + 1c1( Pˆ  P ) 2 ,

( y  1P )c( y  1P ) = 2Vˆ 2 + 3( Pˆ  P ) 2 .
574 Appendix D: Sampling distributions and their use

Here, P̂ is BLUUE of P and Vˆ 2 BIQUUE of V 2 . The detailed computation


proves our statement.
[( y1  Pˆ ) + ( Pˆ  P )]2 + [( y2  Pˆ ) + ( Pˆ  P )]2 + [( y3  Pˆ ) + ( Pˆ  P )]2 =
= ( y1  Pˆ ) 2 + ( y2  Pˆ ) 2 + ( y3  Pˆ ) 2 + 3( Pˆ  P ) 2 + 2 Pˆ ( y1  Pˆ ) + 2 Pˆ ( y2  Pˆ ) +
+2 Pˆ ( y3  Pˆ )  2( y1  Pˆ ) P  2( y2  Pˆ ) P  2( y3  Pˆ ) P

1
Pˆ BLUUE of P : Pˆ = ( y1 + y2 + y3 )
3
1
Vˆ 2 BIQUUE of V 2 : Vˆ 2 = [( y1  Pˆ ) 2 + ( y2  Pˆ ) 2 + ( y3  Pˆ ) 2 ].
2
As soon as we substitute P̂ and Vˆ 2 we arrive at
( y1  P ) 2 + ( y2  P ) 2 + ( y3  P ) 2 = [( y1  Pˆ ) 2 + ( Pˆ  P )]2 +
+[( y2  Pˆ ) 2 + ( Pˆ  P ) 2 ]2 + [( y3  Pˆ ) 2 + ( Pˆ  P ) 2 ]2 =
= 2Vˆ 2 + 3( Pˆ  P ) 2 .

The third action item


Let us begin with transforming the cumulative probability function
dF = f ( y1 , y2 , y3 )dy1 dy2 dy3 =
§ 1 ·
= (2S ) 3 / 2 V 3 exp ¨  2 (y  1P )c(y  1P ) ¸ dy1 dy2 dy3 =
© 2V ¹
§ 1 ·
= (2S ) 3 / 2 V 3 exp ¨  2 [2Vˆ 2 + 3( Pˆ  P ) 2 ] ¸ dy1 dy2 dy3
© 2V ¹
into its canonical form. In detail, we transform the quadratic form Vˆ 2 BIQUUE
of V 2 canonically.
1 1 2 1 2 1 1
y1  Pˆ = y1  y1  ( y2 + y3 ) = y1  ( y2 + y3 ) = y1  y2  y3
3 3 3 3 3 3 3
1 1 1 2 1
y2  Pˆ = y2  y2  ( y1 + y3 ) =  y1 + y2  y3
3 3 3 3 3
1 1 1 1 2
y3  Pˆ = y3  y3  ( y1 + y2 ) =  y1  y2 + y3
3 3 3 3 3
as well as
4 2 1 2 1 2 4 2 4
( y1  Pˆ ) 2 = y1 + y2 + y3  y1 y2 + y2 y3  y3 y1
9 9 9 9 9 9
D4 A second confidence interval for the mean, variance known 575

1 2 4 2 1 2 4 4 2
( y2  Pˆ ) 2 = y1 + y2 + y3  y1 y2  y2 y3 + y3 y1
9 9 9 9 9 9
1 2 1 2 4 2 2 4 4
( y3  Pˆ ) 2 = y1 + y2 + y3 + y1 y2  y2 y3  y3 y1
9 9 9 9 9 9
and
2 2
( y1  Pˆ ) 2 + ( y2  Pˆ ) 2 + ( y3  Pˆ ) 2 = ( y1 + y22 + y32  y1 y2  y2 y3  y3 y1 ).
3
We shall prove
( y  1Pˆ )c( y  1Pˆ ) = y cMy = z12 + z22 , rkM = 2, M  \ 3×3

that the symmetric matrix M,

ª 2 1 1º
M = « 1 2 1»
« »
«¬ 1 1 2 »¼

has rank 2 or rank deficiency 1. Just compute M = 0 as a determinant identity


as well as the subdeterminant
2 1
=3z0
1 2

ª 2 1 1º ª y1 º
1
( y  1P ) ( y  1P ) = y My = [ y1 , y2 , y3 ] « 1 2 1» « y2 » .
ˆ c ˆ c
3 « »« »
«¬ 1 1 2 »¼ «¬ y3 »¼

? How to transform a degenerate quadratic form to a canonical form?

F.R. Helmert (1975, 1976 a, b) had the bright idea to implement what we call
nowadays the forward Helmert transformation
1
z1 = ( y1  y2 )
1˜ 2
1
z2 = ( y1 + y2  2 y3 )
2˜3
or
1 2
z12 = ( y1  2 y1 y2 + y22 )
2
576 Appendix D: Sampling distributions and their use

1
z22 = ( y12 + y22 + 4 y32 + 2 y1 y2  4 y2 y3  4 y3 y1 )
6
2 2
z12 + z22 = ( y1 + y22 + y32  y1 y2  y2 y3  y3 y1 ).
3
Indeed we found
( y  1Pˆ )c( y  1Pˆ ) = ( y1  Pˆ ) 2 + ( y2  Pˆ ) 2 + ( y3  Pˆ ) 2 = z12 + z22 .

In algebraic terms, a representation of the rectangular Helmert matrix is


ª 1 1 º
 0 » ª y1 º
ªz º « 1˜ 2 1˜ 2
z = « 1» = « » « y2 »
¬ z2 ¼ « 1 1 2 »« »
«  » «¬ y3 »¼
¬ 2˜3 2˜3 2˜3 ¼

ª 1 1 º
«  0 »
1˜ 2 1˜ 2
z = H 23 y , H 23 := « ».
« 1 1 2 »
«  »
¬ 2˜3 2˜3 2˜3 ¼
The rectangular Helmert matrix is right orthogonal,

ª 1 1 º
1 1 « »
ª º 1˜ 2 2˜3 »
«  0 »«
1˜ 2 1˜ 2 « 1 1 » ª1 0 º
H 23 H c23 = « »  »=« » = ǿ2.
« 1 1 2 » «« 1˜ 2 2 ˜ 3 » ¬0 1 ¼
«  »
¬ 2˜3 2˜3 2˜3 ¼ « 2 »
« 0  »
¬ 2˜3 ¼

The fourth action item


By means of the forward Helmert transformation we could prove that z12 + z22
represents the quadratic form ( y  1Pˆ )c( y  1Pˆ ) . Unfortunately, the forward
Helmert transformation only allows indirectly by a canonical transformation.
What would be needed is the inverse Helmert transformation z 6 y , also called
backward Helmert transformation. The rectangular Helmert matrix has the dis-
advantage to have no Cayley inverse. Fortunately, its right inverse
H R := H c23 (H 23 H c23 ) 1 = H c23  R 3× 2

solves our problem. The inverse Helmert transformation


y = H R z = H c23 z
D4 A second confidence interval for the mean, variance known 577

brings
2
( y  1Pˆ )c( y  1Pˆ ) = ( y12 + y22 + y32  y1 y2  y2 y3  y3 y1 )
3
via

ª 1 1
« y1 = 1 ˜ 2 z1 + 2 ˜ 3 z2
«
« 1 1
y = H 23z or « y2 = 
'
z1 + z2
1˜ 2 2˜3
«
«y =  2 z ,
«¬ 3 2˜3
2

1 2 1 2 1
y12 + y22 + y32 = z1 + z2 + z1 z2 +
2 6 3
1 1 1
+ z22 + z22  z1 z2 +
2 6 3
2
+ z22 ,
3
1 1 1 1 1 1
y1 y2 + y2 y3 + y3 y1 =  z12 + z22 + z1 z2  z22  z1 z2  z22 ,
2 6 3 3 3 2

2 2 2 3 3
( y1 + y22 + y32  y1 y2  y2 y3  y3 y1 ) = ( z12 + z22 ) = z12 + z22 ,
3 3 2 2
into the canonical form.
The fifth action item
Let us go back to the partitioned pdf in order to inject the canonical representa-
tion of the deficient quadratic form y cMy, M  \ 3×3 , rk M=2. Here we meet first
the problem to transform
§ 1 ·
dF = (2S ) 3/ 2 V 3 exp ¨  2 [2Vˆ 2 + 3( Pˆ  P ) 2 ] ¸ dy1dy2 dy3
© 2V ¹
by an extended vector [ z1 , z2 , z3 ]c =: z into the canonical form
§ 1 ·
dF = (2S ) 3/ 2 exp ¨  ( z12 + z22 + z32 ) ¸ dz1dz2 dz3 ,
© 2 ¹
which is generated by the general forward Helmert transformation
z = V  1 H ( y  1P )
578 Appendix D: Sampling distributions and their use

ª 1 1 º
« 0 »
« 1˜ 2 1˜ 2 » ª y1  P º
ª z1 º
«z » = 1 « 1 1

2 »«
y2  P »
« 2» V « 2˜3 2˜3 2˜3» « »
«¬ z3 »¼ « » ¬« y3  P ¼»
« 1 1 1 »
«¬ 3 3 3 »¼
or its backward Helmert transformation, also called the general inverse Helmert
transformation

ª 1 1 1 º
« 1˜ 2 2˜3 3 »» z
ª y1  P º « ª 1º
« y  P» = V « 1 1 1 »« »
« z2 .
« 2 » 1˜ 2 2˜3 3»« »
«¬ y3  P »¼ « » «z »
« 2 1 » ¬ 3¼
«¬ 0 
2˜3 3 »¼

y  1P = V Hcz

thanks to the orthonormality of the quadratic Helmert matrix H3, namely


H 3 H c3 = H c3 H 3 = I 3 or H 31 = H c3 .
Secondly, notice the transformation of the volume element
dy1 dy2 dy3 = d ( y1  P )d ( y2  P )d ( y3  P ) = V 3 dz1 dz2 dz3 ,

which is due to the Jacobi determinant


J = V 3 Hc = V 3 H = V 3 .

Let us prove that


Pˆ  P
z3 := 3V 1 ( Pˆ  P ) =
V
3
brings the first marginal density
V2 1 1 § 1 ·
f1 = ( Pˆ | P , ) := exp ¨  2 3( Pˆ  P ) 2 ¸
3 2S V © 2V ¹
3
into the canonical form
1
§ 1 ·
f1 ( z3 0,1) =
exp ¨  z32 ¸ .
2S © 2 ¹
Let us compute P̂  P as well as 3( Pˆ  P ) 2 which concludes the proof.
D4 A second confidence interval for the mean, variance known 579

ª y1  P º
1 1 1 «
z3 = V 1[ , , ] y2  P »
3 3 3 « »
«¬ y3  P »¼
y + y2 + y3  3P º
z3 = V 1 1 »
3 »Ÿ
y1 + y2 + y3 »
= Pˆ Ÿ y1 + y2 + y3 = 3Pˆ »
3 ¼
Pˆ  P 2 1
Ÿ z3 = 3V 1 , z3 = 2 3( Pˆ  P ) 2 .
3 V
Indeed the extended Helmert matrix H3 is ingenious to decompose
1
( y  1Pˆ )c( y  1Pˆ ) = z12 + z22 + z32
V2
into a canonical quadratic form relating z12 + z22 to V̂ 2 and z32 to ( Pˆ  P ) 2 . At this
point, we have to interpret the general Helmert transformation z = V 1H( y  P ) :
Structure elements
of the Helmert transformation
scale or dilatation V 1
rotation H
translation P
V 1  \ + produces a dilatation or a scale change, H  SO(3) := {H  \ 3×3
HcH = I 3 and H = +1} a rotation (3 parameters) and 1P  \ 3 a translation.
Please, prove for yourself that the quadratic Helmert matrix is orthonormal, that
is HH c = H cH = I 3 and H = +1 .
The sixth action item
Finally we are left with the problem to split the cumulative pdf into one part
f1 ( Pˆ ) which is a marginal distribution of the arithmetic mean P̂ BLUUE of P
and another part f 2 ( Pˆ ) which is a marginal distribution of the standard devia-
tion V̂ , V̂ 2 BIQUUE of V 2 , Helmert’s F 22 with two degrees of freedom.
First let us introduce polar coordinates (I1 , r ) which represent the Cartesian
coordinates z1 = r cos I1 , z2 = r sin I1 . The index 1 is needed for later generaliza-
tion to higher dimension. As a longitude, the domain of I1 is I1  [0, 2S ] or
0 d I1 d 2S . The new random variable z12 + z22 =|| z ||2 =: x or radius r relates to
Helmert’s

2Vˆ 2 1
F 2 = z12 + z22 = = ( y  1Pˆ )c( y  1Pˆ ) .
V2 V2
580 Appendix D: Sampling distributions and their use

Secondly, the marginal distribution of the arithmetic mean P̂ , P̂ BLUUE of P ,


V V
f1 ( Pˆ P , )d Pˆ = f1 ( z3 0,1)dz3  N ( P , )
3 3
is a Gauss-Laplace normal distribution with mean P and variance V 2 / 3 gen-
erated by
V
dF1 = f1 ( Pˆ P , )d Pˆ = f1 ( z3 0,1) dz3 =
3
+f +f
§ 1 · § 1 ·
= (2S ) 1/ 2 exp ¨  z32 ¸ dz3 ³ ³ (2S ) 1 exp ¨  ( z12 + z22 ) ¸ dz1dz2
© 2 ¹ f f © 2 ¹
or

V 3 § 3 ·
f1 ( Pˆ P , )d Pˆ = (2S ) 1/ 2 exp ¨  2 ( Pˆ  P ) 2 ¸ d Pˆ .
3 V © 2V ¹
Third, the marginal distribution of the sample variance 2Vˆ 2 / V 2 = z12 + z22 =: x ,
Helmert’s F 2 distribution for p=n-1=2 degrees of freedom,
p
1 1 § x·
f 2 (2Vˆ 2 / V 2 ) = f 2 ( x) = x 2 exp ¨  ¸
p © 2¹
2 p / 2 *( )
2
is generated by
+f 2S
§ 1 2· 1 1 § 1 ·
dF2 = 1/ 2
³f (2S ) exp ¨©  2 z3 ¸¹ dz3 ³0 dI1 (2S ) 2 exp ¨©  2 x ¸¹ dx
2S
1 § 1 ·
dF2 = (2S ) 1Z2 exp ¨  x ¸ dx subject to Z 2 = ³ dI 1 = 2S
2 © 2 ¹ 0

1 § 1 ·
dF2 = exp ¨  x ¸ dx
2 © 2 ¹
and
dx
x := z12 + z22 = r 2 Ÿ dx = 2rdr, dr =
2r
1
dz1 dz2 = rdrdI1 = dxdI1
2
is the transformation of the surface element dz1dz2 . In collecting all detailed
results let us formulate a corollary.
D4 A second confidence interval for the mean, variance known 581

Corollary D7 (marginal probability distributions of P̂ , V 2 given, and of Vˆ 2 ):


The cumulative pdf of a set of three observations is represented by
V2
dF = f ( y1 , y2 , y3 )dy1dy2 dy3 = f1 ( Pˆ | P , ) f 2 ( x )d Pˆ dx
3
subject to

V2 1 1 § 1 ( Pˆ  P ) 2 ·
f1 ( Pˆ | P , ) := * * exp ¨  2 ¸
3 2S V / 3 © 2 V /3 ¹

1 § 1 ·
f 2 ( x) = exp ¨  x ¸
2 © 2 ¹
subject to
2
x := z12 + z22 = 2Vˆ 2 / V 2 , dx = dVˆ 2
V2
and
f f

³ f1 ( Pˆ ) d Pˆ = 1 versus ³f 2 ( x ) dx = 1.
f 0

f1 ( Pˆ ) is the pdf of the sample mean Pˆ = ( y1 + y2 + y3 ) / 3 and f 2 ( x) the pdf of


the sample variance Vˆ 2 = ( y c  1Pˆ )c( y  1Pˆ ) / 2 normalized by V2. f1 ( Pˆ ) is a
Gauss-Laplace pdf with mean P̂ and variance V2/3, while f 2 ( x) a Helmert F2
with two degrees of freedom.
In summary, an experiment with three Gauss-Laplace i.i.d. observations is char-
acterized by two marginal probability densities, one for the mean P̂ BLUUE of
P and another one for Vˆ 2 BIQUUE of V 2 :
Marginal probability densities
n=3, Gauss-Laplace i.i.d. observations
Pˆ : V̂ 2 :
V f 2 (2Vˆ 2 / V 2 )dVˆ 2 =
f1 ( Pˆ P , )d Pˆ =
3

1 § 1 (Pˆ  P )2 · Vˆ 2 § Vˆ 2 · 2
= (2S )1/ 2 exp ¨  2 ¸ d Pˆ = exp ¨  2 ¸ dVˆ
V/ 3 © 2 V /3 ¹ V2 © V ¹

or or
582 Appendix D: Sampling distributions and their use

f1 ( z3 0,1)dz3 = f 2 ( x)dx =

1 1 1
= (2S ) 1/ 2 exp( z32 ) dz3 = exp( x)dx
2 2 2
Pˆ  P Vˆ 2
z3 := x := 2
3V V2
D41 Sampling distributions of the sample mean P̂ , V 2 known, and of
the sample variance Vˆ 2
The two examples have prepared us for the general sampling distributions of the sample mean μ̂, σ² known, and of the sample variance σ̂² for Gauss-Laplace i.i.d. observations, namely samples of size n. By means of Lemma D8 on the rectangular Helmert transformation and Lemma D9 on the quadratic Helmert transformation we prepare for Theorem D10, which summarizes the pdfs for μ̂ BLUUE of μ, σ² known, for the standard deviation σ̂ and for σ̂² BIQUUE of σ². Corollary D11 focuses on the pdf of σ̄ = qσ̂ where σ̄ is an unbiased estimate of the standard deviation σ, namely E{σ̄} = σ.
Lemma D8 (rectangular Helmert transformation):
The rectangular Helmert matrix H_{n−1,n} ∈ R^((n−1)×n) transforms the degenerate quadratic form
(n − 1)σ̂² := (y − 1μ̂)′(y − 1μ̂) = y′My, rk M = n − 1,
subject to
μ̂ = (1/n) 1′y,
into the canonical form
(n − 1)σ̂² = z′_{n−1} z_{n−1} = z₁² + … + z²_{n−1}.
The special Helmert transformation y_n ↦ H_{n−1,n} y_n = z_{n−1} is represented row by row (k = 1, …, n−1) by
row k of H_{n−1,n} = [ 1/√(k(k+1)), …, 1/√(k(k+1)) (k entries), −k/√(k(k+1)), 0, …, 0 ],
in particular
row 1 = [ 1/√(1·2), −1/√(1·2), 0, …, 0 ],
row 2 = [ 1/√(2·3), 1/√(2·3), −2/√(2·3), 0, …, 0 ],
row n−1 = [ 1/√((n−1)n), …, 1/√((n−1)n), −(n−1)/√((n−1)n) ].

The inverse Helmert transformation z ↦ y = H^R z = H′z, or y_n = H′_{n,n−1} z_{n−1}, is based on the right inverse of H which, thanks to the right orthogonality H_{n−1,n} H′_{n,n−1} = I_{n−1}, coincides with its transpose,
H^R = H′(HH′)^(−1) = H′ ∈ R^(n×(n−1)).
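The rectangular Helmert matrix is easily generated row by row from the pattern just given; the following sketch (the function name and the choice n = 5 are ours) builds H_{n−1,n}, checks the right orthogonality H H′ = I_{n−1} and verifies the canonical form (n−1)σ̂² = z′z for a random sample.

```python
import numpy as np

def helmert_rectangular(n):
    """Rectangular Helmert matrix H in R^{(n-1) x n}: row k has k entries
    1/sqrt(k(k+1)), then -k/sqrt(k(k+1)), then zeros (illustrative sketch)."""
    H = np.zeros((n - 1, n))
    for k in range(1, n):
        H[k - 1, :k] = 1.0 / np.sqrt(k * (k + 1))
        H[k - 1, k] = -k / np.sqrt(k * (k + 1))
    return H

n = 5
H = helmert_rectangular(n)
print(np.allclose(H @ H.T, np.eye(n - 1)))   # right orthogonality H H' = I_{n-1}

y = np.random.default_rng(1).normal(size=n)
mu_hat = y.mean()
z = H @ y          # H @ (y - mu_hat*1) gives the same z, since every row of H sums to zero
print(np.allclose(z @ z, (y - mu_hat) @ (y - mu_hat)))   # (n-1) sigma_hat^2 = z'z
```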

Lemma D9 (quadratic Helmert transformation):
The quadratic Helmert matrix H ∈ R^(n×n), also called the extended or augmented Helmert matrix, transforms the quadratic form
(1/σ²)(y − 1μ)′(y − 1μ) = (1/σ²)[(y − 1μ̂) + 1(μ̂ − μ)]′[(y − 1μ̂) + 1(μ̂ − μ)],
subject to
μ̂ = (1/n) 1′y,
by means of
z = σ^(−1) H (y − 1μ) or y − 1μ = σ H′z,
into the canonical form
(1/σ²)(y − 1μ)′(y − 1μ) = z₁² + … + z²_{n−1} + z²_n = Σ_{j=1}^{n−1} z_j² + z²_n,
z²_n = n(μ̂ − μ)²/σ².
Such a Helmert transformation y ↦ z = σ^(−1) H (y − 1μ) is represented by the n−1 rows of the rectangular Helmert matrix H_{n−1,n} of Lemma D8, completed by the additional last row
row n of H = [ 1/√n, 1/√n, …, 1/√n ],
so that, for instance, rows 1, 2 and n−1 read
[ 1/√(1·2), −1/√(1·2), 0, …, 0 ], [ 1/√(2·3), 1/√(2·3), −2/√(2·3), 0, …, 0 ], [ 1/√((n−1)n), …, 1/√((n−1)n), −(n−1)/√((n−1)n) ].
Since the quadratic Helmert matrix is orthonormal, the inverse Helmert transformation is generated by
z ↦ y − 1μ = σ H′z.

The proofs for Lemma D8 and Lemma D9 are based on generalizations of the
special cases for n=2, Example D11, and for n=3, Example D12. Any proof will
be omitted here.
The highlight of this paragraph is the following theorem.
Theorem D10 (marginal probability distributions of (μ̂, σ² given) and of σ̂²):
The cumulative pdf of a set of n observations is represented by
dF = f(y₁, …, y_n) dy₁ ⋯ dy_n = f₁(μ̂) f₄(σ̂) dμ̂ dσ̂ = f₁(μ̂) f₂(σ̂²) dμ̂ dσ̂²
as the product of the marginal pdf f₁(μ̂) of the sample mean μ̂ = n^(−1) 1′y and the marginal pdf f₄(σ̂) of the sample standard deviation σ̂ = √[(y − 1μ̂)′(y − 1μ̂)/(n − 1)], also called r.m.s. (root mean square error), or the marginal pdf f₂(σ̂²) of the sample variance σ̂². Those marginal pdfs are represented by
dF₁ = f₁(μ̂) dμ̂, dF₄ = f₄(σ̂) dσ̂ and dF₂ = f₂(σ̂²) dσ̂²,
(i) sample mean μ̂:
f₁(μ̂) = f₁(μ̂ | μ, σ²/n) := (√n/(σ√(2π))) exp(−½ (μ̂ − μ)²/(σ²/n)),
z := √n (μ̂ − μ)/σ: f₁(z) dz = (1/√(2π)) exp(−½ z²) dz;
(ii) sample r.m.s. σ̂:
p := n − 1, dF₄ = f₄(σ̂) dσ̂,
f₄(σ̂) = [2 p^(p/2) / (σ^p 2^(p/2) Γ(p/2))] σ̂^(p−1) exp(−p σ̂²/(2σ²)),
x := √(n − 1) σ̂/σ = √p σ̂/σ:
f₄(x) dx = [2 / (2^(p/2) Γ(p/2))] x^(p−1) exp(−½ x²) dx, dF₄ = f₄(x) dx;
(iii) sample variance σ̂²:
p := n − 1,
f₂(σ̂²) = [p^(p/2) / (σ^p 2^(p/2) Γ(p/2))] σ̂^(p−2) exp(−p σ̂²/(2σ²)),
x := (n − 1) σ̂²/σ² = p σ̂²/σ²:
f₂(x) dx = [1 / (2^(p/2) Γ(p/2))] x^(p/2−1) exp(−½ x) dx.
f₁(μ̂ | μ, σ²/n), the marginal pdf of the sample mean, BLUUE of μ, is a Gauss-Laplace pdf with mean μ and variance σ²/n. f₄(x), x := √p σ̂/σ, is the standard pdf of the normalized root-mean-square error with p degrees of freedom. In contrast, f₂(x), x := p σ̂²/σ², is a Helmert Chi Square χ²_p pdf with p = n − 1 degrees of freedom.
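The three marginal densities of Theorem D10 can be cross-checked against standard distributions, since √p σ̂/σ is chi-distributed and p σ̂²/σ² is chi-square distributed with p = n − 1 degrees of freedom; a sketch (scipy assumed, n and σ arbitrary):

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

n, sigma = 8, 2.0
p = n - 1
s = np.linspace(0.1, 3.0, 5)       # a few test values of sigma_hat

# f4(sigma_hat) of Theorem D10 (ii), written with log-gamma for stability
def f4(sh):
    logc = np.log(2) + 0.5 * p * np.log(p) - p * np.log(sigma) \
           - 0.5 * p * np.log(2) - gammaln(p / 2)
    return np.exp(logc + (p - 1) * np.log(sh) - p * sh**2 / (2 * sigma**2))

# the same density via the chi distribution of sqrt(p)*sigma_hat/sigma
f4_ref = stats.chi.pdf(np.sqrt(p) * s / sigma, df=p) * np.sqrt(p) / sigma
print(np.allclose(f4(s), f4_ref))                     # True

# f2(sigma_hat^2) of Theorem D10 (iii) via the chi-square distribution
def f2(v):
    logc = 0.5 * p * np.log(p) - p * np.log(sigma) - 0.5 * p * np.log(2) - gammaln(p / 2)
    return np.exp(logc + 0.5 * (p - 2) * np.log(v) - p * v / (2 * sigma**2))

v = s**2
f2_ref = stats.chi2.pdf(p * v / sigma**2, df=p) * p / sigma**2
print(np.allclose(f2(v), f2_ref))                     # True
```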
Before we present a sketch of a proof of Theorem D10, which will be run through five action items with special reference to the first and second vehicle, we give some historical comments. S. Kullback (1934) refers the marginal pdf f₁(μ̂) of the "arithmetic mean" μ̂ to S. D. Poisson (1827), F. Hausdorff (1901) and J. O. Irwin (1927). He has also solved the problem of finding the marginal pdf of the "geometric mean". The marginal pdf f₂(σ̂²) of the sample variance σ̂² was originally derived by F. R. Helmert (1875, 1876 a, b). A historical discussion of Helmert's distribution is offered by H. A. David (1957), W. Kruskal (1946), H. O. Lancaster (1965, 1966), K. Pearson (1931) and O. Sheynin (1995).
The marginal pdf f₄(σ̂) has not found much interest in practice so far. The reason may be found in the fact that σ̂ is not an unbiased estimate of the standard deviation σ, namely E{σ̂} ≠ σ. E. Czuber (1891, p. 162), K. D. P. Rosen (1948, p. 37), L. Schmetterer (1956, p. 203), R. Stoom (1967, p. 199, 218), M. Fisz (1971, p. 240) and H. Richter and V. Mammitzsch (1973, p. 42) have documented that
σ̄ = √(p/2) · Γ(p/2)/Γ((p+1)/2) · σ̂ = q σ̂
is an unbiased estimate σ̄ of the standard deviation σ, namely E{σ̄} = σ.

p = n − 1 again denotes the number of degrees of freedom. B. Schaffrin (1979, p. 240) has proven that
σ̂_p := √[ (y − 1μ̂)′(y − 1μ̂) / (p − ½) ] = √( 2p/(2p − 1) ) σ̂
is an asymptotically ("from above") unbiased estimate σ̂_p of the standard deviation σ. Let us implement σ̄, the unbiased estimate of σ, into the marginal pdf f₄(σ̂):

Corollary D11 (marginal probability distribution of σ̄ for Gauss-Laplace i.i.d. observations, E{σ̄} = σ):
The marginal pdf of σ̄, an unbiased estimate of the standard deviation σ, is represented by
dF₄ = f₄(σ̄) dσ̄,
f₄(σ̄) = [ 2 p^(p/2) Γ^p((p+1)/2) / ( σ^p 2^(p/2) (p/2)^(p/2) Γ^(p+1)(p/2) ) ] σ̄^(p−1) exp( −( Γ((p+1)/2)/Γ(p/2) )² σ̄²/σ² )
and
dF₄ = f₄(x) dx,
x := √2 · Γ((p+1)/2)/Γ(p/2) · σ̄/σ,
f₄(x) = [ 2 / (2^(p/2) Γ(p/2)) ] x^(p−1) exp(−½ x²),
subject to
E{x} = √2 · Γ((p+1)/2)/Γ(p/2).

Figure D6 illustrates the marginal pdf f₄(x) = 2 x^(p−1) e^(−x²/2) / [2^(p/2) Γ(p/2)] for "degrees of freedom" p ∈ {1, 2, 3, 4, 5}.
Figure D6: Marginal pdf for the sample standard deviation σ̄ (r.m.s.)
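The unbiasing factor q = √(p/2) Γ(p/2)/Γ((p+1)/2) is conveniently evaluated with the log-gamma function; a small simulation (sketch, arbitrary n, σ and seed) illustrates that E{σ̂} falls short of σ while σ̄ = q σ̂ does not.

```python
import numpy as np
from scipy.special import gammaln

def q_factor(p):
    # q = sqrt(p/2) * Gamma(p/2) / Gamma((p+1)/2)
    return np.sqrt(p / 2) * np.exp(gammaln(p / 2) - gammaln((p + 1) / 2))

n, sigma = 5, 3.0
p = n - 1
rng = np.random.default_rng(0)
y = rng.normal(0.0, sigma, size=(200_000, n))
sig_hat = y.std(axis=1, ddof=1)          # square root of the BIQUUE sample variance

print(sig_hat.mean())                    # < sigma: sigma_hat is biased downwards
print(q_factor(p) * sig_hat.mean())      # ~ sigma: sigma_bar = q * sigma_hat is unbiased
```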
Proof:
The first action item
The pdf of n Gauss-Laplace i.i.d. observations is given by
n
f ( y1 ," , yn ) = f ( y1 )" f ( yn ) = – f ( yi )
i =1

§ 1 ·
f ( y1 ," , yn ) = (2S )  n / 2 V  n exp ¨  2 (y  1P )c(y  1P ) ¸ .
© 2V ¹
The coordinates of the observation space Y have been denoted by
[ y1 ," , yn ]c = y . Note dim Y = n .
The second action item
The quadratic form ( y  1P )c( y  1P ) > 0 allows the fundamental decomposition
y  1P = (y  1Pˆ ) + 1( Pˆ  P ) ,
(y  1P )c(y  1P ) = (y  1Pˆ )c(y  1Pˆ ) + 1c1( Pˆ  P ) 2

1c1 = n

(y  1P )c(y  1P ) = (n  1)Vˆ 2 + n( Pˆ  P ) 2 .

Here, P̂ is BLUUE of P and Vˆ 2 BIQUUE of V 2 . The decomposition of the


quadratic ( y  1P )c( y  1P ) into the sample variance Vˆ 2 and the square
( Pˆ  P ) 2 of the shifted sample mean P̂  P has already been proved for n=2 and
n=3. The general result is obvious.
The third action item
Let us transform the cumulative probability into its canonical forms.
dF = f ( y1 ," , yn )dy1 " dyn =

§ 1 ·
= (2S )  n / 2 V  n exp ¨  2 (y  1P )c(y  1P ) ¸ dy1 " dyn =
© 2V ¹

§ 1 ·
= (2S )  n / 2 V  n exp ¨  2 [( n  1)Vˆ 2 + n( Pˆ  P ) 2 ] ¸ dy1 " dyn
© 2V ¹
z = V  1 H ( y  1P ) or y  1P = V Hcz

1
( y  1P )c( y  1P ) = z12 + z22 + " + zn21 + zn2 .
V2
Here, we have substituted the direct Helmert transformation (quadratic Helmert matrix H) and its inverse. Again σ^(−1) is the scale factor, also called dilatation, H is an orthonormal matrix, also called a rotation matrix, and 1μ ∈ R^n is the translation, also called shift.
dF = f ( y1 ," , yn )dy1 " dyn =
1 § 1 1 · § 1 ·
= exp ¨  zn2 ( n 1) / 2 ¸
exp ¨  ( z12 + " + zn2 ) ¸ dz1dz2 " dzn 1dzn
2S © 2 (2S ) ¹ © 2 ¹

based upon
dy1dy2 " dyn 1dyn = V n dz1dz2 " dzn 1dzn

J = V n | Hc |= V n | H |= V n .

J again denotes the absolute value of the Jacobian determinant introduced by the
first vehicle.
The fourth action item
First, we identify the marginal distribution of the sample mean P̂ .

V2
dF1 = f1 ( Pˆ | P , )d Pˆ .
n

According to the specific structure of the quadratic Helmert matrix zn is gener-


ated by

ª y1  P º
1 1 « y + " + yn  n P
zn = V [ 1
," , ] # » = V 1 1 ,
n n « » n
«¬ yn  P »¼

upon substituting
1 y + " + yn
Pˆ = 1cy = 1 Ÿ y1 + " + yn = n Pˆ Ÿ
n n
n( Pˆ  P ) ( Pˆ  P ) 2 1
zn = V 1 Ÿ zn2 = n 2
, dzn = n d Pˆ .
n V V
Let us implement dzn in the marginal distribution.
1 1
dF1 = exp( zn2 )dzn
2S 2
+f +f
1
³ " ³ (2S )  ( n 1) / 2 exp[  ( z12 + " + zn21 )]dz1 " dzn 1 ,
f f
2
+f +f
1
³f " f³ (2S )
 ( n 1) / 2
exp[  ( z12 + " + zn21 )]dz1 " dzn 1 = 1
2

1 1
dF1 = exp( zn2 )dzn
2S 2

1 1 1 ( Pˆ  P ) 2 V2
dF1 = exp( ) d P
ˆ = f1 ( P
ˆ | P , )d Pˆ .
2S V 2 V2 n
n n
The fifth action item
Second, we identify the marginal distribution of the sample variance Vˆ 2 . We
depart the ansatz
dF2 = f 2 (Vˆ )dVˆ = f 2 (Vˆ 2 )dVˆ 2

in order to determine the marginal distribution f 2 (Vˆ ) of the sample root-mean-


square errors Vˆ and the marginal distribution f 2 (Vˆ 2 ) of the sample variance
Vˆ 2 . A first version of the marginal probability distribution dF2 is
f
1 1
dF2 = ³ exp( 2 z
2
n )dzn
2S f

1 1
( n 1) / 2
exp[  ( z12 + " + zn21 )]dz1 " dzn 1 .
(2S ) 2
Transform the Cartesian coordinates ( z1 ," , zn 1 )  R n 1 to spherical coordinates
()1 , ) 2 ," , ) n 2 , rn 1 ) . From the operational point of view, p = n  1 , the num-
ber of “degrees of freedom”, is an optional choice. Let us substitute the global
hypersurface element Z n 1 or Z p into dF2 , namely
f
1 1
³ exp( 2 z
2
n )dzn = 1
2S f

1 1
dF2 = r n 2 exp(  r 2 )dr
2( n 1) / 2 2
+S / 2 +S / 2 +S / 2 2S
³ cosn 3In 2 dIn 2 ³ cosn 4In 3dIn 3 " ³ cosI2 dI2 ³ dI1
S / 2 S / 2 S / 2 0

1 1
dF2 = p/2
r p 1 exp(  r 2 )dr
(2S ) 2
+S / 2 +S / 2 +S / 2 +S / 2 2S
³ cos p 2I p 1 ³ cos p 3I p 2 dI p 2 " ³ cos2I3dI3 ³ cosI2 dI2 ³ dI1
S / 2 S / 2 S / 2 S / 2 0

2 2
Z n 1 = Z p = S ( n 1) / 2 = S p/2 =
n 1 p
*( ) *( )
2 2
S /2 S /2 S /2 S /2 2S
= ³ cos p 2I p 1dI p 1 ³ cos p 3I p 2 dI p 2 " ³ cos 2I3dI3 ³ cosI2 dI2 ³ dM1
S / 2 S / 2 S / 2 S / 2 0

Zp 1
dF2 = p/2
r p 1 exp( r 2 )dr
(2S ) 2

2 1
dF2 = r p 1 exp( r 2 )dr .
p 2
2 p / 2 *( )
2
The marginal distribution f₂(σ̂) of the r.m.s. is generated as soon as we substitute the radius r = √(z₁² + … + z²_{n−1}) = √(n−1) σ̂/σ. Alternatively, the marginal distribution f₂(σ̂²) of the sample variance is produced when we substitute the radius squared r² = z₁² + … + z²_{n−1} = (n−1) σ̂²/σ².
Project A
r = √(n−1) σ̂/σ = √p σ̂/σ ⇒ dr = (√p/σ) dσ̂
dF₂ = f₂(σ̂) dσ̂
f₂(σ̂) = [2 p^(p/2) / (σ^p 2^(p/2) Γ(p/2))] σ̂^(p−1) exp(−p σ̂²/(2σ²)).
Indeed, f₂(σ̂) establishes the marginal distribution of the root-mean-square error σ̂ with p = n − 1 degrees of freedom.
Project B
x := r² = (n−1) σ̂²/σ² = p σ̂²/σ² =: χ²_p ⇒ dx = 2r dr, dr = dx/(2r) = dx/(2√x),
r^(p−1) dr = ½ x^((p−2)/2) dx
dF₂ = f₂(x) dx
f₂(x) := [1 / (2^(p/2) Γ(p/2))] x^(p/2−1) exp(−½ x).
Finally, we have derived Helmert's Chi Square χ²_p distribution f₂(x) dx = dF₂ by substituting r², r^(p−1) and dr in favor of x := r² and dx = 2r dr.
Project C
Replace the squared radial coordinate r² = (n−1) σ̂²/σ² = p σ̂²/σ² by rescaling on the basis of p/σ²:
x = r² = z₁² + … + z²_{n−1} = z₁² + … + z²_p = (n−1) σ̂²/σ² = p σ̂²/σ², dx = (p/σ²) dσ̂²,
within Helmert's χ²_p with p = n − 1 degrees of freedom:
dF₂ = f₂(σ̂²) dσ̂²
f₂(σ̂²) = [p^(p/2) / (σ^p 2^(p/2) Γ(p/2))] σ̂^(p−2) exp(−p σ̂²/(2σ²)).
Recall that f₂(σ̂²) establishes the marginal distribution of the sample variance σ̂² with p = n − 1 degrees of freedom.

Both the marginal pdf f₂(σ̂) of the sample standard deviation σ̂, also called root-mean-square error, and the marginal pdf f₂(σ̂²) of the sample variance σ̂² document the dependence on the variance σ² through its power σ^p.
h
“Here is my journey’s end.”
(W. Shakespeare: Hamlet)
D42 The confidence interval for the sample mean, variance known

An application of Theorem D10 is Lemma D12 where we construct the confi-


dence interval for the sample mean P̂ , BLUUE of P, variance V 2 known, on the
basis of its sampling distribution. Example D14 is an example of a random sam-
ple of size n=4.

Lemma D12 (confidence interval for the sample mean, variance known):
The sampling distribution of the sample mean μ̂ = n^(−1) 1′y, BLUUE of μ, is Gauss-Laplace normal, μ̂ ∼ N(μ, σ²/n), if the observations yᵢ, i ∈ {1, …, n}, are Gauss-Laplace i.i.d. The "true" mean μ is an element of the two-sided confidence interval
μ ∈ ] μ̂ − c_{1−α/2} σ/√n , μ̂ + c_{1−α/2} σ/√n [
with confidence
P{ μ̂ − c_{1−α/2} σ/√n < μ < μ̂ + c_{1−α/2} σ/√n } = 1 − α
of level 1 − α. For three values of the coefficient of confidence γ = 1 − α, Table D7 lists the associated quantiles c_{1−α/2}.

Example D14 (confidence interval for the sample mean P̂ , V2 known):

Suppose that a random sample

ª y1 º ª1.2 º
« y » «3.4 »
y := « 2 » = « » , Pˆ = 2.7, V 2 = 9, r.m.s. : 3
« y3 » « 0.6 »
« » « »
«¬ y4 »¼ «¬5.6 »¼

of four observations is drawn from a Gauss-Laplace normal distribution with unknown mean μ and a known standard deviation σ = 3. μ̂, BLUUE of μ, is the arithmetic mean. We intend to determine upper and lower limits which are rather

certain to contain the unknown parameter μ between them. Previously, for samples of size 4 we have seen that the random variable
z = (μ̂ − μ)/(σ/√n) = (μ̂ − μ)/(3/2)

is normally distributed with mean zero and unit variance. P̂ is the sample mean
2.7 and 3/2 is σ/√n. The probability γ = 1 − α that z will be between any two arbitrarily chosen numbers c₁ = −c and c₂ = +c is
P{c₁ < z < c₂} = ∫_{c₁}^{c₂} f(z) dz = γ = 1 − α,
P{−c < z < +c} = ∫_{−c}^{+c} f(z) dz = γ = 1 − α,
P{−c < z < +c} = ∫_{−∞}^{+c} f(z) dz − ∫_{−∞}^{−c} f(z) dz = γ = 1 − α,
∫_{−∞}^{+c} f(z) dz = 1 − α/2, ∫_{−∞}^{−c} f(z) dz = α/2.
γ is the coefficient of confidence, α the coefficient of negative confidence, also called the complementary coefficient of confidence. The four representations of the probability γ = 1 − α to include z in the confidence interval −c < z < +c have led to the linear Volterra integral equation of the first kind
∫_{−∞}^{c} f(z) dz = 1 − α/2 = (1 + γ)/2.

Three values of the coefficient of confidence γ and its complement α are popular and listed in Table D6.
Table D6: Values of the coefficient of confidence
γ: 0.950, 0.990, 0.999
α: 0.050, 0.010, 0.001
α/2: 0.025, 0.005, 0.0005
1 − α/2 = (1 + γ)/2: 0.975, 0.995, 0.9995

In solving the linear Volterra integral equation of the first kind
∫_{−∞}^{z} f(z*) dz* = 1 − ½ α(z) = ½ [1 + γ(z)],
Table D7 collects the quantiles c_{1−α/2} for the coefficients of confidence and their complements listed in Table D6.
Table D7: Quantiles for the confidence interval of the sample mean, variance known
1 − α/2 = (1 + γ)/2 | γ | α | c_{1−α/2}
0.975 | 0.95 | 0.05 | 1.960
0.995 | 0.99 | 0.01 | 2.576
0.9995 | 0.999 | 0.001 | 3.291

Given the quantiles c_{1−α/2}, we are going to construct the confidence interval for the sample mean μ̂, the variance σ² being known. For this purpose, we solve the forward transformation μ̂ → z = √n (μ̂ − μ)/σ for μ:
μ̂ − (σ/√n) z = μ,
μ̂₁ := μ̂ − c_{1−α/2} σ/√n < μ < μ̂ + c_{1−α/2} σ/√n =: μ̂₂.
The interval μ̂₁ < μ < μ̂₂ for the fixed value z = c_{1−α/2} contains the "true" mean μ with probability γ = 1 − α:
P{ μ̂ − c_{1−α/2} σ/√n < μ < μ̂ + c_{1−α/2} σ/√n } = ∫_{−c}^{+c} f(z) dz = ∫_{μ̂₁}^{μ̂₂} f(μ̂ | μ, σ²/n) dμ̂ = γ = 1 − α
since
∫_{−∞}^{μ̂₂} f(μ̂) dμ̂ = ∫_{−∞}^{μ̂ + c_{1−α/2} σ/√n} f(μ̂) dμ̂ = ∫_{−∞}^{c_{1−α/2}} f(z) dz = 1 − α/2.

An animation of the coefficient of confidence and the probability functions of a


confidence interval is offered by Figure D6 and D7.

Figure D6: Two-sided confidence interval μ ∈ ]μ̂₁, μ̂₂[ for the pdf f(μ̂ | μ, σ²/n); the central area carries probability γ = 1 − α, each tail α/2.
Figure D7: Two-sided confidence interval, quantile c_{1−α/2}; μ̂₁ = μ̂ − c_{1−α/2} σ/√n, μ̂₂ = μ̂ + c_{1−α/2} σ/√n.
Let us specify all the integrals for our example:
∫_{−∞}^{c_{1−α/2}} f(z) dz = 1 − α/2,
∫_{−∞}^{1.960} f(z) dz = 0.975, ∫_{−∞}^{2.576} f(z) dz = 0.995, ∫_{−∞}^{3.291} f(z) dz = 0.9995.

Those data lead to a triplet of confidence intervals.


case (i): γ = 0.95, α = 0.05, c_{1−α/2} = 1.960
P{2.7 − 1.960 · 3/2 < μ < 2.7 + 1.960 · 3/2} = 0.95
P{−0.24 < μ < +5.64} = 0.95
case (ii): γ = 0.99, α = 0.01, c_{1−α/2} = 2.576
P{2.7 − 2.576 · 3/2 < μ < 2.7 + 2.576 · 3/2} = 0.99
P{−1.164 < μ < +6.564} = 0.99
case (iii): γ = 0.999, α = 0.001, c_{1−α/2} = 3.291
P{2.7 − 3.291 · 3/2 < μ < 2.7 + 3.291 · 3/2} = 0.999
P{−2.236 < μ < +7.636} = 0.999

With probability 95% the “true” mean P is an element of the interval ]-0.24,
+5.64[. In contrast, with probability 99% the “true” mean P is an element of the
larger interval ]-1.164, +6.564[. Finally, with probability 99.9% the “true” mean
P is an element of the largest interval ]-2.236, +7.636[.
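The quantiles and the three intervals of Example D14 can be reproduced from the standard Gauss-Laplace distribution; a sketch (scipy assumed):

```python
from scipy.stats import norm

mu_hat, sigma, n = 2.7, 3.0, 4
for gamma in (0.95, 0.99, 0.999):
    alpha = 1.0 - gamma
    c = norm.ppf(1.0 - alpha / 2)            # 1.960, 2.576, 3.291
    half = c * sigma / n**0.5                # c * sigma / sqrt(n)
    print(gamma, round(c, 3), round(mu_hat - half, 3), round(mu_hat + half, 3))
# reproduces ]-0.24, 5.64[, ]-1.164, 6.564[ and ]-2.236, 7.636[
```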
D5 Sampling from the Gauss-Laplace normal distribution:
a third confidence interval for the mean, variance unknown
In order to derive the sampling distributions for the sample mean, variance unknown, of Gauss-Laplace i.i.d. observations, D51 introduces two examples (two and three observations, respectively) for generating Student's t distribution. Lemma D12 reviews Student's t-distribution of the random variable √n (μ̂ − μ)/σ̂ where the sample mean μ̂ is BLUUE of μ, whereas the sample variance σ̂² is BIQUUE of σ². D52, by means of Lemma D14, introduces the confidence interval for the "true" mean μ, variance σ² unknown, which is based on Student's probability distribution. For easy computation, Table D12 is its flow chart. D53 discusses The Uncertainty Principle generated by The Magic Triangle of (i) the length of the confidence interval, (ii) the coefficient of negative confidence, also called the uncertainty number, and (iii) the number of observations. Various figures and examples pave the way for the routine analyst's use of the confidence interval for the mean, variance unknown.
D51 Student's sampling distribution of the random variable (μ̂ − μ)/σ̂
Two examples for n = 2 and n = 3 Gauss-Laplace i.i.d. observations prepare us to derive Student's t-distribution for the random variable √n (μ̂ − μ)/σ̂ where μ̂ is BLUUE of μ, whereas σ̂² is BIQUUE of σ². Lemma D12 and its proof are the highlight of this paragraph, generating the sampling probability distribution of Student's t.
Example D15 (Student’s t-distribution for two Gauss-Laplace i.i.d. observations):
First, assume an experiment of two Gauss-Laplace i.i.d. observations called y1
and y2: We want to prove that (y1+y2)/2 and (y1-y2)2/2 or the sample mean P̂ and
the sample variance Vˆ 2 are stochastically independent. y1 and y2 are elements of
the joint pdf.
f ( y1 , y2 ) = f ( y1 ) f ( y2 )

1 § 1 ·
f ( y1 , y2 ) = exp ¨  2 [( y1  P ) 2 + ( y2  P ) 2 ] ¸
2SV 2 © 2V ¹

1 § 1 ·
f ( y1 , y2 ) = 2
exp ¨  2 (y  1P )c(y  1P ) ¸ .
2SV © 2V ¹
The quadratic form (y  1P )c(y  1P ) is decomposed into the sample variance
Vˆ 2 , BIQUUE of V2, and the deviate of the sample mean P̂ , BLUUE of P, from
P by means of the fundamental separation
y  1P = y  1Pˆ + 1( Pˆ  P )

(y  1P )c(y  1P ) = (y  1Pˆ )c(y  1Pˆ ) + 1c1( Pˆ  P ) 2

(y  1P )c(y  1P ) = Vˆ 2 + 2( Pˆ  P ) 2

1 § 1 · § 2 ·
f ( y1 , y2 ) = exp ¨  2 Vˆ 2 ¸ exp ¨  2 ( Pˆ  P ) 2 ¸ .
2SV 2 © 2V ¹ © 2V ¹
The joint pdf f (y1,y2) is transformed into a special form if we replace
Pˆ = ( y1 + y2 ) / 2 ,

1
Vˆ 2 = ( y1  Pˆ ) 2 + ( y2  Pˆ ) 2 = ( y1  y2 ) 2 ,
2
namely

1 § 1 1 · § 1 y + y2 ·
f ( y1 , y2 ) = exp ¨  2 ( y1  y2 ) 2 ¸ exp ¨  2 ( 1  P )2 ¸ .
2SV 2 © 2V 2 ¹ © V 2 ¹
Obviously the product decomposition of the joint pdf documented that (y1+y2)/2
and (y1-y2)2/2 or P̂ and Vˆ 2 are independent random variables.
Second, we intend to derive the pdf of Student's random variable t := √2 (μ̂ − μ)/σ̂, the deviate of the sample mean μ̂ from the "true" mean μ, normalized by the sample standard deviation σ̂. Let us introduce the direct Helmert transformation
1 y1  y2 Vˆ 2 y1 + y2 2 Pˆ  P
z1 = = , z2 = (  P) = ,
V 2 V V 2 V 2
or
ª 1 1 º

ª z1 º 1 « 2 2 »» ª y1  P º
«z » = « 1 1 » «¬ y2  P »¼
,
¬ 2¼ V «
« »
¬ 2 2 ¼
as well as the inverse

ª 1 1 º
ª y1  P º « 2 2 »» ª z1 º
«y  P» =V «
1 » «¬ z2 »¼
,
¬ 2 ¼ « 1
« »
¬ 2 2¼
which brings the joint pdf dF = f(y1, y2)dy1dy2 = f(z1, z2)dz1dz2 into the canonical
form.
1 § 1 · 1 § 1 ·
dF = exp ¨  z12 ¸ exp ¨  z22 ¸ dz1dz2
2S © 2 ¹ 2S © 2 ¹
1 1
1 2 2
dy1dy2 = dz1dz2 = dz1dz2 .
V2 1 1

2 2
The Helmert random variable x := z12 or z1 = x replaces the random variable z1
such that
1 1
dz1dz2 = dxdz2
2 x

1 1 § 1 · 1 § 1 ·
dF = exp ¨  x ¸ exp ¨  z22 ¸ dxdz2
2S 2 x © 2 ¹ 2S © 2 ¹
is the joint pdf of x and z2. Finally, we introduce Student’s random variable
Pˆ  P
t := 2

2 V
z2 = ( Pˆ  P ) Ÿ Pˆ  P = z2
V 2

z1 = x = Ÿ Vˆ = V z1 = V x
V
z2
t= œ z2 = x t Ÿ z22 = x t 2 .
x
Let us transform dF =f (x, z2) dx dz2 to dF =f (t, x) dt dx, namely from the joint
pdf of the Helmert random variable x and the Gauss-Laplace normal variate z2
to the joint pdf of the Student random variable t and the Helmert random vari-
able x.

Dt z2 Dx z2
dz2 dx = dt dx
Dt x Dx x

2 3/ 2
x x t
dz2 dx = 2 dt dx
0 1

dz2 dx = x dt dx

dF = f (t, x)dt dx
1 1 1
f (t , x) = exp[ (1 + t 2 ) x ] .
2S 2 2
The marginal distribution of Student’s random variable t, namely dF3=f3(t)dt, is
generated by
f
1 1 1
f 3 (t ) := ³ exp[ (1 + t 2 ) x]dx
2S 2 0 2

subject to the standard integral


f
1 1
³ exp(E x)dx = [ E exp (  E x )] =
f
0
0
E

1 1 2
E := (1 + t 2 ), =
2 E 1+ t2
such that
1 1
f 3 (t ) = ,
2S 1 + t 2
1 1
dF3 = dt
2S 1 + t 2
and characterized by a pdf f3(t) which is reciprocal to (1+t2).
Example D16 (Student’s t-distribution for three Gauss-Laplace i.i.d. observa-
tions):
First, assume an experiment of three Gauss-Laplace i.i.d. observations called y1,
y2 and y3: We want to derive the joint pdf f(y1, y2, y3) in terms of the sample mean
P̂ , BLUUE of P, and the sample variance Vˆ 2 , BIQUUE of V2.
f(y1, y2, y3) = f(y1) f(y2) f(y3)

1 § 1 ·
f ( y1 , y2 , y3 ) = 3/ 2 3
exp ¨  2 [( y1  P ) 2 + ( y2  P ) 2 + ( y3  P ) 2 ] ¸
(2S ) V © 2V ¹

1 § 1 ·
f ( y1 , y2 , y3 ) = 3/ 2 3
exp ¨  2 (y  1P )c(y  1P ) ¸ .
(2S ) V © 2V ¹
The quadratic form (y  1P )c(y  1P ) is decomposed into the sample variance
Vˆ 2 and the deviate sample mean P̂ from the “true” mean P by means of the
fundamental separation
y  1P = y  1Pˆ + 1( Pˆ  P )

(y  1P )c(y  1P ) = (y  1Pˆ )c(y  1Pˆ ) + 1c1( Pˆ  P ) 2

(y  1P )c(y  1P ) = 2Vˆ 2 + 3( Pˆ  P ) 2

dF = f ( y1 , y2 , y3 )dy1dy2 dy3 =
1 § 1 · § 3 ·
= exp ¨  2 2Vˆ 2 ¸ exp ¨  2 ( Pˆ  P ) 2 ¸ dy1dy2 dy3 .
(2S )3/ 2 V 3 © 2V ¹ © 2V ¹
Second, we intend to derive the pdf of Student’s random variable t := 3( Pˆ  P ) / Vˆ ,
the deviate of the sample mean P̂ from the “true” mean P, normalized by the
sample standard deviation V̂ . Let us introduce the direct Helmert transformation
1 1
z1 = ( y1  P + y2  P ) = ( y1 + y2  2 P )
V 2 V 2
1 1
z2 = ( y1  P + y2  P  2 y3 + 2 P ) = ( y1 + y2  2 y3 )
V 2˜3 V 2˜3
1 1
z3 = ( y1  P + y2  P + y3  P ) = ( y1 + y2 + y3  3P )
V 3 V 3
or

ª 1 1 º
« 0 »
« 1˜ 2 1˜ 2 » ª y1  P º
ª z1 º
«z » = 1 « 1 1 2 »« »
« 2» V «  » « y2  P »
« 2˜3 2˜3 2˜3 »
«¬ z3 »¼ « y  P »¼
« 1 1 1 »¬ 3
« »
¬ 3 3 3 ¼
as well as its inverse

ª 1 1 1 º
« »
« 1˜ 2 2˜3 3» z
ª y1  P º ª 1º
«y  P» = V « 1 1 1 »« »
« 2 » « » z2
« 1˜ 2 2˜3 3»« »
«¬ y3  P »¼ «z »
« 2 1 »¬ 3¼
« 0  »
¬ 2˜3 3¼
in general
z = V 1 H (y  P ) versus (y  1P ) = V H cz ,

which help us to bring the joint pdf dF(y1, y2, y3)dy1 dy2 dy3 = f(z1, z2, z3)dz1 dz2
dz3 into the canonical form.
1 § 1 · § 1 ·
dF = exp ¨  ( z12 + z22 ) ¸ exp ¨  z32 ¸ dz1dz2 dz3
(2S )3/ 2 © 2 ¹ © 2 ¹

1 1 1
1˜ 2 2˜3 3
1 1 1 1
dy1dy2 dy3 = dz1dz2 dz3 = dz1dz2 dz3 .
V3 1˜ 2 2˜3 3
2 1
0 
2˜3 3

The Helmert random variable x := z12 + z22 or x = z12 + z22 replaces the ran-
dom variable z12 + z22 as soon as we introduce polar coordinates z1 = r cos I1 ,
z2 = r sin I1 , z12 + z22 = r 2 =: x and compute the marginal pdf
2S
1 § 1 · 1 1 § 1 ·
dF ( z3 , x) = exp ¨  z32 ¸ dz3 exp ¨  x ¸ dx ³ dI1 ,
2S © 2 ¹ 2S 2 © 2 ¹ 0
by means of
dx
x := z12 + z22 = r 2 Ÿ dx = 2rdr , dr = ,
2r
1
dz1dz2 = rdrdI1 = dxdI1 ,
2
1 § 1 · 1 § 1 ·
dF ( z3 , x) = exp ¨  x ¸ exp ¨  z32 ¸ dxdz3 ,
2 © 2 ¹ 2S © 2 ¹
the joint pdf of x and z. Finally, we inject Student’s random variable

Pˆ  P
t := 3 ,

decomposed into

3 V
z3 = ( Pˆ  P ) Ÿ Pˆ  P = z3
V 3
Vˆ V V
z12 + z22 = x = 2 Ÿ Vˆ = z12 + z22 = x
V 2 2
z3 1 1 2
t= 2 œ z3 = x t Ÿ z32 = xt .
x 2 2
Let us transform dF = f (x, z3) dx dz3 to dF = f (t, x) dt dx. Alternatively we may
say that we transform the joint pdf of the Helmert random variable x and the
Gauss-Laplace normal variate z3 to the joint pdf of the Student random variable
t and the Helmert random variable x.
Dt z3 Dx z3
dz3 dx = dtdx
Dt x Dx x

x 1
x 3/ 2t
dz3 dx = 2 2 2 dtdx
0 1

x
dz3 dx = dtdx
2
dF = f (t , x)dtdx

1 1 1 t2
f (t , x) = x exp[ (1 + ) x].
2 2 2S 2 2
The marginal distribution of Student’s random variable t, namely dF3 = f3(t)dt, is
generated by
f
1 1 1 t2
f 3 (t ) := ³ x exp[ (1 + ) x]dx
2 2 2S 0
2 2

subject to the standard integral

*(D + 1)
f
1 1 t2
³ x exp( E x)dx = = = +
D
, D , E (1 )
0
E D +1 2 2 2
such that
3
f *( )
1 t2 2
³ x exp[ (1 + ) x]dx = 23/ 2
2 2 t 2 3/ 2
0
(1 + )
2
3 1
*( ) = S
2 2
3 1 1
f 3 (t ) = *( ) ,
2 2S t2
(1 + )3/ 2
2

2
dF3 = dt .
t 2 3/ 2
4(1 + )
2
Again Student's t-distribution is reciprocal to (1 + t²/2)^(3/2).
Lemma D12 (Student's t-distribution for the deviate of the mean (μ̂ − μ)/σ̂, W. S. Gosset 1908):
Let the random vector of observations y = [y₁, …, y_n]′ be Gauss-Laplace i.i.d. Student's random variable
t := √n (μ̂ − μ)/σ̂,
where the sample mean μ̂ is BLUUE of μ and the sample variance σ̂² is BIQUUE of σ², is associated with the pdf
f(t) = [ Γ((p+1)/2) / (Γ(p/2) √(pπ)) ] (1 + t²/p)^(−(p+1)/2).
p = n − 1 is the "degree of freedom" of Student's distribution f_p(t).
W. S. Gosset published the t-distribution of the ratio √n (μ̂ − μ)/σ̂ under the pseudonym "Student": The probable error of a mean, Biometrika 6 (1908-09) 1-25.
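The pdf of Lemma D12 coincides with the standard Student t density, which offers a quick numerical check; a sketch (scipy assumed, p arbitrary):

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def student_pdf(t, p):
    # f(t) = Gamma((p+1)/2) / (Gamma(p/2) * sqrt(p*pi)) * (1 + t^2/p)^(-(p+1)/2)
    logc = gammaln((p + 1) / 2) - gammaln(p / 2) - 0.5 * np.log(p * np.pi)
    return np.exp(logc - 0.5 * (p + 1) * np.log1p(t**2 / p))

t = np.linspace(-4, 4, 9)
p = 3
print(np.allclose(student_pdf(t, p), stats.t.pdf(t, df=p)))   # True
```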
Proof:
The joint probability distribution of the random variable z_n := √n (μ̂ − μ)/σ and the Helmert random variable x := z₁² + … + z²_{n−1} = (n−1) σ̂²/σ² is represented by
dF = f₁(z_n) f₂(x) dz_n dx
due to the fact that z_n and z₁² + … + z²_{n−1}, or μ̂ − μ and (n−1) σ̂², are stochastically independent. Let us take reference to the specific pdfs with n and n − 1 = p degrees of freedom:

dF₁ = f₁(z_n) dz_n and dF₂ = f₂(x) dx,
f₁(z_n) = (1/√(2π)) exp(−½ z_n²) and f₂(x) = [1/Γ(p/2)] (½)^(p/2) x^((p−2)/2) exp(−½ x),
or
dF₁ = f₁(μ̂) dμ̂ and dF₂ = f₂(σ̂²) dσ̂²,
f₁(μ̂) = (√n/(σ√(2π))) exp(−n (μ̂ − μ)²/(2σ²)) and f₂(σ̂²) = [p^(p/2)/(σ^p 2^(p/2) Γ(p/2))] σ̂^(p−2) exp(−p σ̂²/(2σ²)),
which we derived earlier. Here, let us introduce Student's random variable
t := z_n √(n−1)/√x = √n (μ̂ − μ)/σ̂ or z_n = √(x/(n−1)) t = √(x/p) t.
By means of the Jacobi matrix J and the absolute value of its determinant |J|,
[dt, dx]′ = [∂t/∂z_n  ∂t/∂x ; ∂x/∂z_n  ∂x/∂x] [dz_n, dx]′ = J [dz_n, dx]′,
J = [ √(p/x)   −½ √p z_n x^(−3/2) ; 0   1 ], |J| = √p/√x, |J|^(−1) = √x/√p,
we transform the surface element dz_n dx into the surface element |J|^(−1) dt dx, namely
dz_n dx = √(x/p) dt dx.

Lemma D13 (Gamma Function):


Our final action item is to calculate the marginal probability distribution dF₃ = f₃(t) dt of Student's random variable t, namely
dF₃ = f₃(t) dt,
f₃(t) := (1/√(2pπ)) [1/Γ(p/2)] (½)^(p/2) ∫₀^∞ x^((p−1)/2) exp(−½ (1 + t²/p) x) dx.
Consult W. Gröbner and N. Hofreiter (1973, p. 55) for the standard integral
∫₀^∞ x^α exp(−β x) dx = α!/β^(α+1) = Γ(α+1)/β^(α+1)
and
∫₀^∞ x^((p−1)/2) exp(−½ (1 + t²/p) x) dx = 2^((p+1)/2) Γ((p+1)/2) / (1 + t²/p)^((p+1)/2),
where p = n − 1 is the rank of the matrix M of Lemma D8. Notice the gamma-function result Γ((p+1)/2) = ((p−1)/2)! for odd p. In summary, substituting the standard integral,
dF₃ = f₃(t) dt,
f₃(t) = [Γ((p+1)/2) / (Γ(p/2) √(pπ))] (1 + t²/p)^(−(p+1)/2),
results in Student's t distribution, namely the pdf of Student's random variable √n (μ̂ − μ)/σ̂.
h
D52 The confidence interval for the mean, variance unknown
Lemma D12 is the basis for the construction of the confidence interval of the
“true” mean, variance unknown, which we summarize in Lemma D13. Example
D17 contains all details for computing such a confidence interval, namely Table
D8, a collection of the most popular values of the coefficient of confidence, as
well as Table D9, D10 and D11, listing the quantiles for the confidence interval
of the Student random variable with p = n  1 degrees of freedom. Figure D8
and Figure D9 illustrate the probability of two-sided confidence interval for the
mean, variance unknown, and the limits of the confidence interval. Table D12 as
a flow chart paves the way for the “fast computation” of the confidence interval
for the “true” mean, variance unknown.

Lemma D14 (confidence interval for the sample mean, variance unknown):
The random variable t = √n (μ̂ − μ)/σ̂, characterized by the ratio of the deviate of the sample mean μ̂ = n^(−1) 1′y, BLUUE of μ, from the "true" mean μ and the standard deviation σ̂, with σ̂² = (y − 1μ̂)′(y − 1μ̂)/(n − 1) BIQUUE of the "true" variance σ², has the Student t-distribution with p = n − 1 "degrees of freedom". The "true" mean μ is an element of the two-sided confidence interval
μ ∈ ] μ̂ − c_{1−α/2} σ̂/√n , μ̂ + c_{1−α/2} σ̂/√n [
with confidence
P{ μ̂ − c_{1−α/2} σ̂/√n < μ < μ̂ + c_{1−α/2} σ̂/√n } = 1 − α
of level 1 − α. For three values of the coefficient of confidence γ = 1 − α, Table D9 lists the associated quantiles c_{1−α/2}.
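Lemma D14 translates directly into a small routine; the sketch below (function name ours, scipy assumed) returns the interval limits μ̂₁, μ̂₂ for a given sample and confidence γ. Applied to the sample of Example D17 below, it reproduces the γ = 0.95 interval up to the rounding of σ̂ ≈ 2.3 and of the tabulated quantile used there.

```python
import numpy as np
from scipy.stats import t

def mean_ci_variance_unknown(y, gamma=0.95):
    """Two-sided confidence interval for the 'true' mean, variance unknown
    (sketch of Lemma D14)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    mu_hat = y.mean()                              # BLUUE of mu
    sig_hat = y.std(ddof=1)                        # square root of BIQUUE of sigma^2
    c = t.ppf(1.0 - (1.0 - gamma) / 2, df=n - 1)   # quantile c_{1-alpha/2}
    half = c * sig_hat / np.sqrt(n)
    return mu_hat - half, mu_hat + half

print(mean_ci_variance_unknown([1.2, 3.4, 0.6, 5.6], 0.95))
```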
Example D17 (confidence interval for the sample mean μ̂, σ² unknown):
Suppose that a random sample

ª y1 º ª1.2 º
«y » « »
« 2 » = «3.4 » , Pˆ = 2.7, Vˆ 2 = 5.2, Vˆ = 2.3
« y3 » «0.6 »
« » « »
«¬ y4 »¼ ¬5.6 ¼

of four observations is characterized by the sample mean Pˆ = 2.7 and the sam-
ple variance Vˆ 2 = 5.2 . 2(2.7  P ) / 2.3 = n ( Pˆ  P ) / Vˆ = t has Student’s pdf
with p = n  1 = 3 degrees of freedom. The probability J = 1  D that t will be
between any two arbitrarily chosen numbers c₁ = −c and c₂ = +c is
P{c₁ < t < c₂} = ∫_{c₁}^{c₂} f(t) dt = γ = 1 − α,
P{−c < t < +c} = ∫_{−c}^{+c} f(t) dt = γ = 1 − α,
P{−c < t < +c} = ∫_{−∞}^{+c} f(t) dt − ∫_{−∞}^{−c} f(t) dt = γ = 1 − α,
∫_{−∞}^{+c} f(t) dt = 1 − α/2, ∫_{−∞}^{−c} f(t) dt = α/2.

γ is the coefficient of confidence, α the coefficient of negative confidence, also called the complementary coefficient of confidence. The four representations of the probability γ = 1 − α to include t in the confidence interval −c < t < +c have led to the linear Volterra integral equation of the first kind
∫_{−∞}^{c} f(t) dt = 1 − α/2 = ½ (1 + γ).
Three values of the coefficient of confidence γ and its complement α are popular and listed in Table D8.
Table D8: Values of the coefficient of confidence
γ: 0.950, 0.990, 0.999
α: 0.050, 0.010, 0.001
α/2: 0.025, 0.005, 0.0005
1 − α/2 = (1 + γ)/2: 0.975, 0.995, 0.9995
In solving the linear Volterra integral equation of the first kind
∫_{−∞}^{t} f(t*) dt* = 1 − ½ α(t) = ½ [1 + γ(t)],
which depends on the degrees of freedom p = n − 1, Tables D9-D11 collect the quantiles c_{1−α/2}/√n for the coefficients of confidence and their complements listed in Table D8.
Table D9: Quantiles c_{1−α/2}/√n for the confidence interval of the Student random variable with p = n − 1 degrees of freedom;
1 − α/2 = (1 + γ)/2 = 0.975, γ = 0.95, α = 0.05.
Columns (twice, side by side): p | n | c_{1−α/2}/√n
1 2 8.99 14 15 0.554
2 3 2.48 19 20 0.468
3 4 1.59 24 25 0.413
4 5 1.24 29 30 0.373

5 6 1.05 39 40 0.320
6 7 0.925 49 50 0.284
7 8 0.836 99 100 0.198
8 9 0.769 199 200 0.139
9 10 0.715 499 500 0.088
Table D10: Quantiles c_{1−α/2}/√n for the confidence interval of the Student random variable with p = n − 1 degrees of freedom;
1 − α/2 = (1 + γ)/2 = 0.995, γ = 0.990, α = 0.010.
Columns (twice, side by side): p | n | c_{1−α/2}/√n
1 2 45.01 14 15 0.769
2 3 5.73 19 20 0.640
3 4 2.92 24 25 0.559
4 5 2.06 29 30 0.503
5 6 1.65 39 40 0.428
6 7 1.40 49 50 0.379
7 8 1.24 99 100 0.263
8 9 1.12 199 200 0.184
9 10 1.03 499 500 0.116

Table D11: Quantiles c_{1−α/2} and c_{1−α/2}/√n for the confidence interval of the Student random variable with p = n − 1 degrees of freedom;
1 − α/2 = (1 + γ)/2 = 0.9995, γ = 0.999, α = 0.001.
Columns (twice, side by side): p | n | c_{1−α/2} | c_{1−α/2}/√n
1 2 636.619 450.158 14 15 4.140 1.069


2 3 31.598 18.243 19 20 3.883 0.868
3 4 12.941 6.470 24 25 3.725 0.745
4 5 8.610 3.851 29 30 3.659 0.668
5 6 6.859 2.800 30 31 3.646 0.655
6 7 5.959 2.252 40 41 3.551 0.555
7 8 5.405 1.911 60 61 3.460 0.443
8 9 5.041 1.680 120 121 3.373 0.307
9 10 4.781 1.512 f f 3.291 0

Since Student's pdf depends on p = n − 1, we have tabulated c_{1−α/2}/√n for the confidence interval we are going to construct. Student's random variable t = √n (μ̂ − μ)/σ̂ is solved for μ, which is our motivation for introducing the confidence interval μ̂₁ < μ < μ̂₂:
t = √n (μ̂ − μ)/σ̂ ⇒ μ̂ − μ = t σ̂/√n ⇒ μ = μ̂ − t σ̂/√n,
μ̂₁ := μ̂ − c_{1−α/2} σ̂/√n < μ < μ̂ + c_{1−α/2} σ̂/√n =: μ̂₂.
The interval μ̂₁ < μ < μ̂₂ for the fixed value t = c_{1−α/2} contains the "true" mean μ with probability γ = 1 − α:
P{ μ̂ − c_{1−α/2} σ̂/√n < μ < μ̂ + c_{1−α/2} σ̂/√n } = ∫_{−c}^{+c} f(t) dt = γ = 1 − α
because of
∫_{−∞}^{c} f(t) dt = ∫_{−∞}^{c_{1−α/2}} f(t) dt = 1 − α/2.

Figure D8 and Figure D9 illustrate the coefficient of confidence and the probability function of a confidence interval.
Figure D8: Two-sided confidence interval μ ∈ ]μ̂₁, μ̂₂[, Student's pdf f(t) for p = 3 degrees of freedom (n = 4); the central area carries probability γ = 1 − α, each tail α/2; μ̂₁ = μ̂ − σ̂ c_{1−α/2}/√n, μ̂₂ = μ̂ + σ̂ c_{1−α/2}/√n.
Figure D9: Two-sided confidence interval for the "true" mean μ, quantile c_{1−α/2}.

Let us specify all the integrals for our example:
∫_{−∞}^{c_{1−α/2}} f(t) dt = 1 − α/2,
∫_{−∞}^{c_{0.975}} f(t) dt = 0.975, ∫_{−∞}^{c_{0.995}} f(t) dt = 0.995, ∫_{−∞}^{c_{0.9995}} f(t) dt = 0.9995.
These data, looked up in Tables D9-D11, lead to the triplet of confidence intervals for the "true" mean.
case (i): γ = 0.95, α = 0.05, 1 − α/2 = 0.975
p = 3, n = 4, c_{1−α/2}/√n = 1.59
P{2.7 − 2.3 · 1.59 < μ < 2.7 + 2.3 · 1.59} = 0.95
P{−0.957 < μ < 6.357} = 0.95
case (ii): γ = 0.99, α = 0.01, 1 − α/2 = 0.995
p = 3, n = 4, c_{1−α/2}/√n = 2.92
P{2.7 − 2.3 · 2.92 < μ < 2.7 + 2.3 · 2.92} = 0.99
P{−4.016 < μ < 9.416} = 0.99
case (iii): γ = 0.999, α = 0.001, 1 − α/2 = 0.9995
p = 3, n = 4, c_{1−α/2}/√n = 6.470
P{2.7 − 2.3 · 6.470 < μ < 2.7 + 2.3 · 6.470} = 0.999
P{−12.181 < μ < 17.581} = 0.999

The results may be summarized as follows:


With probability 95% the “true” mean P is an element of the in-
terval ]-0.957,+6.357[. In contrast, with probability 99% the “true”
mean P is an element of the larger interval ]-4.016,+9.416[. Fi-
nally, with probability 99.9% the “true” mean P is an element of
the largest interval ]-12.181,+17.581[. If we compare the confi-
dence intervals for the mean P , V 2 known, versus V 2 unknown,
we realize much larger intervals for the second model. Such a re-
sult is not too much a surprise, since the model “ V 2 unknown” is
much weaker than the model “ V 2 known”.
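The tabulated values c_{1−α/2}/√n used in Example D17 (1.59, 2.92 and 6.470 for p = 3, n = 4) follow from the Student quantiles; a sketch (scipy assumed):

```python
from scipy.stats import t

n = 4
p = n - 1
for gamma in (0.95, 0.99, 0.999):
    c = t.ppf(1.0 - (1.0 - gamma) / 2, df=p)
    print(gamma, round(c, 3), round(c / n**0.5, 3))  # c_{1-alpha/2} and c_{1-alpha/2}/sqrt(n)
# roughly 3.182/1.591, 5.841/2.920 and 12.924/6.462, in close agreement with Tables D9-D11
```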

Table D12: Flow chart: confidence interval for the mean μ, σ² unknown:
• Choose a coefficient of confidence γ according to Table D8.
• Solve the linear Volterra integral equation of the first kind F(c_{1−α/2}) = 1 − α/2 = (1 + γ)/2 by Table D9, Table D10 or Table D11 for a Student pdf with p = n − 1 degrees of freedom: read c_{1−α/2}/√n.
• Compute the sample mean μ̂ and the sample standard deviation σ̂.
• Compute σ̂ c_{1−α/2}/√n and μ̂ − σ̂ c_{1−α/2}/√n, μ̂ + σ̂ c_{1−α/2}/√n.


D53 The Uncertainty Principle

Figure D10: Length of the confidence interval for the mean against the
number of observations
Figure D10 is the graph of the function
Δμ(n; α) := 2 σ̂ c_{1−α/2}/√n,
where Δμ is the length of the confidence interval of the "true" mean μ, σ² unknown. The independent variable of the function is the number of observations n. The function Δμ(n; α) is plotted for fixed values of the coefficient of complementary confidence α, namely α = 10%, 5%, 1%. For reasons given later, the coefficient of complementary confidence α is called the uncertainty number. The graph of the function Δμ(n; α) illustrates two important facts.
Fact #1: For a constant uncertainty number α, the length of the confidence interval Δμ is smaller the larger the number of observations n is chosen.

Fact #2: For a constant number of observations n, the smaller the uncertainty number α is chosen, the larger is the confidence interval Δμ.
Evidently, the diverse influences of (i) the length of the confidence interval Δμ, (ii) the uncertainty number α and (iii) the number of observations n, which we collect in The Magic Triangle of Figure D11, constitute The Uncertainty Principle, in formula
Δμ · (n − 1) ≥ k_{μα},
where k_{μα} is called the quantum number for the mean μ, which depends on the uncertainty number α. Table D13 is a list of those quantum numbers. Let us interpret the uncertainty relation Δμ (n − 1) ≥ k_{μα}. The product Δμ (n − 1) defines geometrically a hyperbola which approximates the graph of Figure D10. Given the uncertainty number α, the product Δμ (n − 1) has a smallest value, here denoted by k_{μα}. For instance, choose α = 1% such that k_{μα}/(n − 1) ≤ Δμ or 16.4/(n − 1) ≤ Δμ. For n taking the values 2, 11, 101, we get the inequalities 8.2 ≤ Δμ, 1.64 ≤ Δμ, 0.164 ≤ Δμ.
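The quantum numbers of Table D13 can be approximated numerically by minimizing the product Δμ · (n − 1) over n; the sketch below assumes a unit sample standard deviation σ̂ = 1, which appears to be the normalization behind the tabulated values.

```python
import numpy as np
from scipy.stats import t

def quantum_number_mean(alpha, sig_hat=1.0, n_max=200):
    # minimize Delta_mu * (n-1) = 2*sig_hat*c_{1-alpha/2}(n-1)/sqrt(n) * (n-1) over n
    best = np.inf
    for n in range(2, n_max + 1):
        c = t.ppf(1.0 - alpha / 2, df=n - 1)
        best = min(best, 2.0 * sig_hat * c / np.sqrt(n) * (n - 1))
    return best

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(quantum_number_mean(alpha), 1))
# roughly 6.7, 9.5, 16.5 -- close to the values 6.6, 9.6, 16.4 of Table D13
```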
Figure D11: The Magic Triangle, constituents: (i) uncertainty number α, (ii) number of observations n, (iii) length of the confidence interval Δμ.

Table D13: Coefficient of complementary confidence α (uncertainty number α) versus quantum number of the mean k_{μα} (E. Grafarend 1970)
α = 10%: k_{μα} = 6.6; α = 5%: 9.6; α = 1%: 16.4; α → 0: k_{μα} → ∞.

D6 Sampling from the Gauss-Laplace normal distribution:


a fourth confidence interval for the variance
Theorem D10 already supplied us with the sampling distribution of the sample
variance, namely with the probability density function f ( x) of Helmert’s ran-
dom variable x = (n  1)Vˆ 2 / V 2 of Gauss-Laplace i.i.d. observations n of sample
variance σ̂², BIQUUE of σ². D61 accordingly introduces the so far missing confidence interval for the "true" variance σ². Lemma D15 contains the details, followed by Example D18. Table D15, Table D16 and Table D17 contain the properly chosen coefficients of complementary confidence and their quantiles
c₁(p; α/2), c₂(p; 1 − α/2),

dependent on the “degrees of freedom” p = n  1 . Table D18 as a flow chart


summarizes various steps in computing a confidence interval for the variance
V 2 . D 62 reviews The Uncertainty Principle which is built on (i) the coefficient
of complementary confidence D , also called uncertainty number, (ii) the number
of observations n and (iii) the length 'V 2 (n;D ) of the confidence interval for
the “true” variance V 2 .
D61 The confidence interval for the variance
Lemma D15 summarizes the construction of a two-sided confidence interval for the "true" variance based upon Helmert's Chi Square distribution of the random variable (n−1) σ̂²/σ², where σ̂² is BIQUUE of σ². Example D18 introduces a random sample of size n = 100 with an empirical variance σ̂² = 20.6. Based upon the coefficients of confidence and complementary confidence given in Table D14, the related confidence intervals are computed. The associated quantiles of Helmert's Chi Square distribution are tabulated in Table D15 (γ = 0.95), Table D16 (γ = 0.99) and Table D17 (γ = 0.998). Finally, Table D18 as a flow chart paves the way for the "fast computation" of the confidence interval for the "true" variance σ².
Lemma D15 (confidence interval for the variance):
The random variable x = (n−1) σ̂²/σ², also called Helmert's χ², characterized by the ratio of the sample variance σ̂², BIQUUE of σ², and the "true" variance σ², has the χ²_p pdf with p = n − 1 degrees of freedom if the random observations yᵢ, i ∈ {1, …, n}, are Gauss-Laplace i.i.d. The "true" variance σ² is an element of the two-sided confidence interval
σ² ∈ ] (n−1) σ̂²/c₂(p; 1−α/2) , (n−1) σ̂²/c₁(p; α/2) [
with confidence
P{ (n−1) σ̂²/c₂(p; 1−α/2) < σ² < (n−1) σ̂²/c₁(p; α/2) } = 1 − α = γ

of level 1 − α. Tables D15, D16 and D17 list the quantiles c₁(p; α/2) and c₂(p; 1 − α/2) associated with the three values of the coefficient of complementary confidence α given in Table D14.
In order to make yourself more familiar with Helmert’s Chi Square distribution
we recommend to solve the problems of Exercise D1.
Exercise D1 (Helmert's Chi Square χ²_p distribution):
Helmert's random variable x := (n−1) σ̂²/σ² = p σ̂²/σ² has the non-symmetric χ²_p pdf. Prove that the first four central moments are
(i) π₁ = 0, E{x} = μ_x = p,
(ii) π₂ = σ_x² = 2p,
(iii) π₃ = 8p = (2p)^(3/2) (2/p)^(1/2)
(coefficient of skewness π₃²/π₂³ = 8/p),
(iv) π₄ = 12p(p + 4)
(coefficient of kurtosis π₄/π₂² − 3 = 12/p).

Guide
(i) E{x} = (p/σ²) E{σ̂²} and E{σ̂²} = σ² ("unbiasedness") ⇒ E{x} = p.
(ii) D{σ̂²} = 2σ⁴/(n−1) = 2σ⁴/p and D{x} = (p²/σ⁴) D{σ̂²} ⇒ D{x} = 2p.
Example D18 (confidence interval for the sample variance Vˆ 2 ):
Suppose that a random sample ( y1, " yn )  Y of size n = 100 has led to an em-
pirical variance Vˆ 2 = 20.6 . x =(n-1)Vˆ 2 / V 2 or 99 20.6 / V 2 = 2039.4 / V 2 has
Helmert’s pdf with p = n  1 = 99 degrees of freedom. The probability J = 1  D
that x will be between c1 ( p;D / 2) and c2 ( p;1  D / 2) is
c2

P{c1 ( p;D / 2) < x < c 2 ( p;1  D / 2)} = ³ f ( x)dx = J = 1  D


c1
c2 c1

P{c1 ( p;D / 2) < x < c 2 ( p;1  D / 2)} = ³ f ( x)dx  ³ f ( x)dx = J = 1  D


0 0

c2 c1

³ f ( x)dx = F(c ) = 1  D / 2 = (1 + J ) / 2, ³ f ( x)dx = F(c ) = D / 2 = (1  J ) / 2


0
2
0
1

P{c1 ( p; D / 2) < x < c2 ( p;1  D / 2)} = F(c2 )  F(c1 ) = (1 + J ) / 2  (1  J ) / 2 = J

Vˆ 2
c1 ( p; D / 2) < x < c2 ( p;1  D / 2) œ c1 ( p; D / 2) < (n  1) < c2 ( p;1  D / 2)
V2
or

1 V2 1 (n  1)Vˆ 2 (n  1)Vˆ 2
< < œ < V 2
<
c2 (n  1)Vˆ 2 c1 c2 c1

(n  1)Vˆ 2 (n  1)Vˆ 2
P{ <V2 < } = 1D = J .
c2 ( p;1  D / 2) c1 ( p; D / 2)

Since Helmert's pdf f(x; p) is non-symmetric, the question arises how to distribute the confidence γ or the complementary confidence α = 1 − γ over the confidence interval limits c₁ and c₂, respectively. If we set F(c₁) = α/2, we define a cumulative probability of half of the complementary confidence. If F(c₂) − F(c₁) = P{c₁(p; α/2) < x < c₂(p; 1−α/2)} = 1 − α = γ is the cumulative probability contained in the interval c₁ < x < c₂, we derive F(c₂) = 1 − α/2. Accordingly c₁(p; α/2) < x < c₂(p; 1−α/2) is the confidence interval based upon the quantile c₁ with cumulative probability α/2 and the quantile c₂ with cumulative probability 1 − α/2. The four representations of the cumulative probability of the confidence interval c₁ < x < c₂ establish two linear Volterra integral equations of the first kind
∫₀^{c₁} f(x) dx = α/2 and ∫₀^{c₂} f(x) dx = 1 − α/2,

dependent on the degree of freedom p = n  1 of Helmert’s pdf f ( x, p ) .


As soon as we have established the confidence interval c1 ( p;D / 2) < x <
c2 ( p;1  D / 2) for Helmert’s random variable x = (n -1)Vˆ 2 / V 2 = pVˆ 2 / V 2 , we
are left with the problem of how to generate a confidence interval for the “true”
variance V 2 , the sample variance Vˆ 2 given. If we take the reciprocal interval
c21 < x 1 < c11 for Helmert’s inverse random variable 1/ x = V 2 /[(n  1)Vˆ 2 ] , we
are able to multiply both sides by (n  1)Vˆ 2 . In summary, a confidence interval
which corresponds to c1 < x < c2 is (n  1)Vˆ 2 / c2 < V 2 < (n  1)Vˆ 2 / c1 .
Three values of the coefficient of confidence γ, or of the complementary confidence α = 1 − γ, which are most popular are listed in Table D14.

Figure D12: Two-sided confidence interval σ² ∈ ] p σ̂²/c₂ , p σ̂²/c₁ [ for Helmert's pdf y = f(x), x := (n−1) σ̂²/σ² = p σ̂²/σ², with quantiles c₁(p; α/2) and c₂(p; 1−α/2); F(c₁) = α/2, F(c₂) = 1 − α/2, F(c₂) − F(c₁) = 1 − α = γ.
Figure D13: Two-sided confidence interval for the "true" variance σ², quantiles c₁(p; α/2) and c₂(p; 1−α/2).
Table D14
Values of the coefficient of confidence
J 0.950 0.990 0.998
D 0.050 0.010 0.002
D /2 0.025 0.005 0.001
1- D / 2 0.975 0.995 0.999

In solving the linear Volterra integral equations of the first kind
∫₀^{c₁} f(x) dx = F(c₁) = α/2 and ∫₀^{c₂} f(x) dx = F(c₂) = 1 − α/2,
which depend on the degrees of freedom p = n − 1, Tables D15-D17 collect the quantiles c₁(p; α/2) and c₂(p; 1−α/2) for given values of p and α.

Table D15: Quantiles c₁(p; α/2), c₂(p; 1−α/2) for the confidence interval of the Helmert random variable with p = n − 1 degrees of freedom;
α/2 = 0.025, 1 − α/2 = 0.975, α = 0.05, γ = 0.95.
Columns (twice, side by side): p | n | c₁(p; α/2) | c₂(p; 1−α/2)
1 2 0.000 5.02 14 15 5.63 26.1


2 3 0.506 7.38 19 20 8.91 32.9
3 4 0.216 9.35 24 25 12.4 39.4
4 5 0.484 11.1 29 30 16.0 45.7
5 6 0.831 12.8 39 40 23.7 58.1
6 7 1.24 14.4 49 50 31.6 70.2
7 8 1.69 16.0 99 100 73.4 128
8 9 2.18 17.5
9 10 2.70 19.0
Table D16: Quantiles c₁(p; α/2), c₂(p; 1−α/2) for the confidence interval of the Helmert random variable with p = n − 1 degrees of freedom;
α/2 = 0.005, 1 − α/2 = 0.995, α = 0.01, γ = 0.99.
Columns (twice, side by side): p | n | c₁(p; α/2) | c₂(p; 1−α/2)
1 2 0.000 7.88 8 9 1.34 22.0


2 3 0.010 9 10 1.73 23.6
3 4 0.072
4 5 0.207 14 15 4.07 31.3
19 20 6.84 38.6
5 6 0.412 24 25 9.89 45.6
6 7 0.676 29 30 13.1 52.3
7 8 0.989
39 40 20.0 65.5
49 50 27.2 78.2
99 100 66.5 139
Table D17: Quantiles c₁(p; α/2), c₂(p; 1−α/2) for the confidence interval of the Helmert random variable with p = n − 1 degrees of freedom;
α/2 = 0.001, 1 − α/2 = 0.999, α = 0.002, γ = 0.998.
Columns (twice, side by side): p | n | c₁(p; α/2) | c₂(p; 1−α/2)
1 2 0.00 10.83 9 10 1.15 27.88



2 3 0.00 13.82 14 15 3.04 36.12


3 4 0.02 16.27 19 20 5.41 43.82
4 5 0.09 18.47 24 25 8.1 51.2
5 6 0.21 20.52 29 30 11.0 58.3
6 7 0.38 22.46 50 51 24.7 86.7
7 8 0.60 24.32 70 71 39.0 112.3
8 9 0.86 26.13 99 100 60.3 147.3
100 101 61.9 149.4
Those data collected in Table D15, Table D16 and Table D17 lead to the triplet
of confidence intervals for the “true” variance
case (i): γ = 0.95, α = 0.05, α/2 = 0.025, 1 − α/2 = 0.975
p = 99, n = 100, c₁(p; α/2) = 73.4, c₂(p; 1−α/2) = 128
P{ 99 · 20.6/128 < σ² < 99 · 20.6/73.4 } = 0.95
P{ 15.9 < σ² < 27.8 } = 0.95
case (ii): γ = 0.99, α = 0.01, α/2 = 0.005, 1 − α/2 = 0.995
p = 99, n = 100, c₁(p; α/2) = 66.5, c₂(p; 1−α/2) = 139
P{ 99 · 20.6/139 < σ² < 99 · 20.6/66.5 } = 0.99
P{ 14.7 < σ² < 30.7 } = 0.99
case (iii): γ = 0.998, α = 0.002, α/2 = 0.001, 1 − α/2 = 0.999
p = 99, n = 100, c₁(p; α/2) = 60.3, c₂(p; 1−α/2) = 147.3
P{ 99 · 20.6/147.3 < σ² < 99 · 20.6/60.3 } = 0.998
P{ 13.8 < σ² < 33.8 } = 0.998

The results can be summarized as follows.


With probability 95%, the “true” variance V 2 is an element of the
interval ]15.9,27.8[. In contrast, with probability 99%, the “true”
variance is an element of the larger interval ]14.7,30.7[. Finally,
with probability 99.8% the “true” variance is an element of the
largest interval ]13.8, 33.8[. If we compare the confidence inter-
vals for the variance V 2 , we realize much larger intervals for
smaller complementary confidence namely 5%, 1% and 0.2%.
Such a result is subject of The Uncertainty Principle.
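Example D18 can be reproduced with Chi Square quantiles; the sketch below (function name ours, scipy assumed) evaluates the three intervals.

```python
from scipy.stats import chi2

def variance_ci(sig2_hat, n, gamma):
    """Two-sided confidence interval for the 'true' variance (sketch of Lemma D15)."""
    p = n - 1
    alpha = 1.0 - gamma
    c1 = chi2.ppf(alpha / 2, df=p)            # c1(p; alpha/2)
    c2 = chi2.ppf(1.0 - alpha / 2, df=p)      # c2(p; 1-alpha/2)
    return p * sig2_hat / c2, p * sig2_hat / c1

for gamma in (0.95, 0.99, 0.998):
    lo, hi = variance_ci(20.6, 100, gamma)
    print(gamma, round(lo, 1), round(hi, 1))
# the first two intervals match Example D18; the third upper limit differs slightly,
# depending on the tabulated quantile c1(99; 0.001)
```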

Table D18: Flow chart: confidence interval for the variance σ²:
• Choose a coefficient of confidence γ according to Table D14.
• Solve the linear Volterra integral equations of the first kind F(c₁(p; α/2)) = α/2, F(c₂(p; 1−α/2)) = 1 − α/2 by Table D15, Table D16 or Table D17 for a Helmert pdf with p = n − 1 degrees of freedom.
• Compute the sample variance σ̂² and the interval limits
σ² ∈ ] (n−1) σ̂²/c₂(p; 1−α/2) , (n−1) σ̂²/c₁(p; α/2) [.

D62 The Uncertainty Principle

Figure D14: Length of the confidence interval for the variance against the number of observations.
Figure D14 is the graph of the function
Δσ²(n; α) := (n−1) σ̂² [ 1/c₁(n−1; α/2) − 1/c₂(n−1; 1−α/2) ],
where Δσ² is the length of the confidence interval of the "true" variance σ². The independent variable of the function is the number of observations n. The function Δσ²(n; α) is plotted for fixed values of the coefficient of complementary confidence α, namely α = 5%, 1%, 0.2%. For reasons given later on, the coefficient of complementary confidence α is called the uncertainty number. The graph of the function Δσ²(n; α) illustrates two important facts.

Fact #1: For a constant uncertainty number α, the length of the confidence interval Δσ² is smaller the larger the number of observations n is chosen.
Fact #2: For a constant number of observations n, the smaller the uncertainty number α is chosen, the larger is the confidence interval Δσ².
Evidently, the diverse influences of (i) the length of the confidence interval Δσ², (ii) the uncertainty number α and (iii) the number of observations n, which we collect in The Magic Triangle of Figure D15, constitute The Uncertainty Principle, formulated by the inequality
Δσ² · (n − 1) ≥ k_{σ²α},
where k_{σ²α} is called the quantum number for the variance σ². The quantum number depends on the uncertainty number α. Let us interpret the uncertainty relation Δσ² (n − 1) ≥ k_{σ²α}. The product Δσ² (n − 1) defines geometrically a hyperbola which approximates the graph of Figure D14. Given the uncertainty number α, the product Δσ² (n − 1) has a smallest value, denoted by k_{σ²α}. For instance, choose α = 1% such that k_{σ²α}/(n − 1) ≤ Δσ² or 42/(n − 1) ≤ Δσ². For the number of observations n = 2, 11, 101, we find the inequalities 42 ≤ Δσ², 4.2 ≤ Δσ², 0.42 ≤ Δσ².
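As for the mean, the quantum numbers of Table D19 can be approximated by minimizing Δσ² · (n − 1) over n, assuming a unit sample variance σ̂² = 1 (our reading of the normalization behind the table):

```python
import numpy as np
from scipy.stats import chi2

def quantum_number_variance(alpha, sig2_hat=1.0, n_max=300):
    best = np.inf
    for n in range(2, n_max + 1):
        p = n - 1
        c1 = chi2.ppf(alpha / 2, df=p)
        c2 = chi2.ppf(1.0 - alpha / 2, df=p)
        delta = p * sig2_hat * (1.0 / c1 - 1.0 / c2)   # length of the interval
        best = min(best, delta * p)
    return best

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(quantum_number_variance(alpha), 1))
# roughly 19.1, 25.7, 41.7 -- close to the values 19.5, 25.9, 42.0 of Table D19
```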
Figure D15: The Magic Triangle, constituents: (i) uncertainty number α, (ii) number of observations n, (iii) length of the confidence interval Δσ².
Table D19: Coefficient of complementary confidence α (uncertainty number α) versus quantum number of the variance k_{σ²α} (E. Grafarend 1970)
α = 10%: k_{σ²α} = 19.5; α = 5%: 25.9; α = 1%: 42.0; α → 0: k_{σ²α} → ∞.

At the end, pay attention to the quantum numbers k_{σ²α} listed in Table D19.
D7 Sampling from the multidimensional Gauss-Laplace normal


distribution: the confidence region for the fixed parameters
in the linear Gauss-Markov model
Example D19 (special linear Gauss-Markov model, marginal probability distributions):
For a simple linear Gauss-Markov model E{y} = Aξ, A ∈ R^(n×m), rk A = m, namely n = 3, m = 2, D{y} = I σ², of Gauss-Laplace i.i.d. observations, we are going to compute
• the sample pdf of ξ̂ BLUUE of ξ,
• the sample pdf of σ̂² BIQUUE of σ².
We follow the action within seven items. First, we identify the pdf of Gauss-
Laplace i.i.d. observations y  \ n . Second, we review the estimations of ȟ̂
BLUUE of ȟ as well as Vˆ 2 BIQUUE of V 2 . Third, we decompose the Euclid-
ean norm of the observation space || y  E{y} ||2 into the Euclidean norms
||y − Aξ̂||² and ||ξ̂ − ξ||²_{A′A}. Fourth, we present the eigenspace analysis and the eigenspace synthesis of the associated matrices M := Iₙ − A(A′A)^(−1)A′ and N := A′A within the Euclidean norms ||y − Aξ̂||² = y′My and ||ξ̂ − ξ||²_{A′A} = (ξ̂ − ξ)′N(ξ̂ − ξ), respectively. The eigenspace representation leads us to canonical random variables (z₁, …, z_{n−m}) relating to the norm ||y − Aξ̂||² = y′My and (z_{n−m+1}, …, z_n) relating to the norm ||ξ̂ − ξ||²_N which are standard
Gauss-Laplace normal. Fifth, we derive the cumulative probability of Helmert’s
random variable x := (n  rk A)Vˆ 2 / V 2 = ( n  m)Vˆ 2 / V 2 and of the unknown
parameter vector ȟ̂ BLUUE of ȟ or its canonical counterpart Ș̂ BLUUE of Ș ,
multivariate Gauss-Laplace normal. Action six generates the marginal pdf of ȟ̂
or Ș̂ , both BLUUE of ȟ or Ș , respectively. Finally, action seven leads us to
Helmert’s Chi Square distribution F p2 with p = n  rk A = n  m (here: n–m = 1)
“degrees of freedom”.
The first action item
Let us assume an experiment of three Gauss-Laplace i.i.d. observations
[ y1 , y2 , y3 ]c = y which constitute the coordinates of the observation space Y,
dim Y = 3. The observations yi , i  {1, 2,3} are related to a parameter space ;
with coordinates [[1, [ 2 ] = ȟ in the sense to generate a straight line
y{k} = [1 + [ 2 k . The fixed effects [ j , j  {1, 2} , define geometrically a straight
line, statistically a special linear Gauss-Markov model of one variance compo-
nent, namely
the first moment
E{y} = Aξ: E{[y₁, y₂, y₃]′} = [1 k₁; 1 k₂; 1 k₃] [ξ₁; ξ₂] = [1 0; 1 1; 1 2] [ξ₁; ξ₂], rk A = 2.
The central second moment
D{y} = Iₙ σ²: D{[y₁, y₂, y₃]′} = [1 0 0; 0 1 0; 0 0 1] σ², σ² > 0.
k represents the abscissa, a fixed quantity, and y the ordinate, the observation, naturally a random effect. Samples of the straight line are taken at k₁ = 0, k₂ = 1, k₃ = 2, calling for y(k₁) = y(0) = y₁, y(k₂) = y(1) = y₂, y(k₃) = y(2) = y₃, respectively. E{y} is a consistent equation; alternatively, we may say E{y} ∈ R(A). The matrix A ∈ R^(3×2) leaves a redundancy of p = n − rk A = 1, also called the "degree of freedom". The dispersion matrix D{y}, the central moment of second order, is represented as a linear model, too, namely by the one variance component σ². The joint probability function of the three Gauss-
Laplace i.i.d. observations
dF = f ( y1 , y2 , y3 )dy1dy2 dy3 = f ( y1 ) f ( y2 ) f ( y3 )dy1dy2 dy3

§ 1 ·
f ( y1 , y2 , y3 ) = (2S ) 3/ 2 V 3 exp ¨  2 (y  E{y})c( y  E{y}) ¸
© 2V ¹
will be transformed by means of the special linear Gauss-Markov model with
one-variance component.
The second action item
For such a transformation, we need ξ̂ BLUUE of ξ and σ̂² BIQUUE of σ².
ξ̂ = (A′A)^(−1) A′y,
A′A = [3 3; 3 5], (A′A)^(−1) = (1/6) [5 −3; −3 3],
ξ̂ BLUUE of ξ:
ξ̂ = (1/6) [5 2 −1; −3 0 3] [y₁; y₂; y₃],
σ̂² BIQUUE of σ²:
σ̂² = (y − Aξ̂)′(y − Aξ̂)/(n − rk A) = y′(Iₙ − A(A′A)^(−1)A′) y/(n − rk A),
σ̂² = (1/6)(y₁² − 4y₁y₂ + 2y₁y₃ + 4y₂² − 4y₂y₃ + y₃²).
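The estimates of the second action item can be checked numerically; the sketch below (numpy assumed, the observation vector is an arbitrary choice) forms ξ̂ = (A′A)^(−1)A′y and σ̂² for the design of the first action item with sample points k = 0, 1, 2.

```python
import numpy as np

k = np.array([0.0, 1.0, 2.0])
A = np.column_stack([np.ones(3), k])          # design matrix, rk A = 2
y = np.array([1.0, 2.2, 2.9])                 # arbitrary observations

N = A.T @ A                                   # normal-equation matrix A'A = [[3,3],[3,5]]
xi_hat = np.linalg.solve(N, A.T @ y)          # BLUUE of xi
M = np.eye(3) - A @ np.linalg.solve(N, A.T)   # M = I - A(A'A)^{-1}A', rk M = 1
sig2_hat = y @ M @ y / (3 - 2)                # BIQUUE of sigma^2, n - rk A = 1

print(N)                                      # [[3. 3.] [3. 5.]]
print(xi_hat, sig2_hat)
print(np.allclose(6 * M, [[1, -2, 1], [-2, 4, -2], [1, -2, 1]]))   # True
```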
The third action item
The quadratic form (y − E{y})′(y − E{y}) allows the fundamental decomposition
y − E{y} = y − Aξ = (y − Aξ̂) + A(ξ̂ − ξ),
(y − E{y})′(y − E{y}) = (y − Aξ)′(y − Aξ) = (y − Aξ̂)′(y − Aξ̂) + (ξ̂ − ξ)′A′A(ξ̂ − ξ),
(y − E{y})′(y − E{y}) = (n − rk A) σ̂² + (ξ̂ − ξ)′A′A(ξ̂ − ξ),
(y − E{y})′(y − E{y}) = σ̂² + (ξ̂ − ξ)′ [3 3; 3 5] (ξ̂ − ξ).
The fourth action item
In order to bring the quadratic form (y  E{y})c(y  E{y}) = ( n  rk A)Vˆ 2 +
+(ȟˆ  ȟ )cAcA(ȟˆ  ȟ ) into a canonical form, we introduce the generalised forward
and backward Helmert transformation
HH c = I n
z = V H (y  E{y}) = V 1 H (y  Aȟ )
1

and
y  E{y} = V H cz

(y  E{y})c(y  E{y}) = V 2 z cHH cz = V 2 z cz

1
2
(y  E{y})c(y  E{y}) = z cz = z12 + z22 + z32 .
V
?How to relate the sample variance Vˆ 2 and the sample quadratic form
(ȟˆ  ȟ )cAcA(ȟˆ  ȟ ) to the canonical quadratic form z cz ?
Previously, for the example of direct observations in the special linear Gauss-
Markov model E{y} = 1P , D{y} = I nV 2 we succeeded to relate z12 + " + zn21 to
Vˆ 2 and zn2 to ( Pˆ  P ) 2 . Here the sample variance Vˆ 2 , BIQUUE V 2 , as well as
624 Appendix D: Sampling distributions and their use

the quadratic form of the deviate of the sample parameter vector ȟ̂ from the
“true” parameter vector ȟ have been represented by
1 1
Vˆ 2 = (y  Aȟˆ )c(y  Aȟˆ ) = y c[I n  A( A cA) 1 A c]y
n  rk A n  rk A

rk[I n  A( A cA) 1 A c] = n  rk A = n  m = 1
versus

(ȟˆ  ȟ )cA cA(ȟˆ  ȟ ), rk( A cA) = rk A = m .

The eigenspace of the matrices M and N, namely

M := I n  A ( A cA ) 1 A = N := A cA =
and

ª 1 2 1 º
1« ª3 3 º
= « 2 4 2 »» =« »,
6 ¬3 5 ¼
¬« 1 2 1 »¼
will be analyzed.
The eigenspace analysis of the The eigenspace analysis of the
matrix M matrices N, N-1
j  {1," , n} i  {1," , m}
V cMV = U cNU = Diag (J 1 ," , J m ) = ȁ N
ª Vcº
= « 1 » M [ V1 , V2 ] = U cN 1U = Diag (O1 ," , Om ) = ȁ N 1

¬ V2c ¼
1 1
= Diag ( P1 ," , P n  m , 0," , 0) = ȁ M J1 = ," , J m =
O1 Om
1 1
O1 = ," , Om =
J1 Jm
rk M = n  m rk N = rk N 1 = m
Orthonormality of the eigencolumns Orthonormality of the eigencolumns
V1cV1 = I n  m , V2cV2 = I m
U cU = I m
V1cV2 = 0  \ ( n  m )× m
ª V1c º ªI 0º
« V c » [ V1 V2 ] = « n  m
I m »¼
¬ 2¼ ¬ 0
D7 Sampling from the multidimensional Gauss-Laplace normal distribution 625

v1cv1 = 1 u1cu1 = 1
v1cv2 = 0 u1cu2 = 0
... ...
vnc 1vn = 0 umc 1um = 0
vnc vn = 1 umc um = 1
V  SO(n) U  SO(m)
:eigencolumns: :eigencolumns:
(M  P j I n ) v j = 0 (N  J j I n )u j = 0
:eigenvalues: :eigenvalues:
| M  P j I n |= 0 | N  J i I m |= 0
in particular
eigenspace analysis of the matrix M, eigenspace analysis of the matrix N,
rkM=n-m, M  \ 3×3 , A  \ 3× 2 , rk M = 1 rkN=m, N  \ 2× 2 , rk N = 2
ȁ M = Diag (1, 0, 0) ȁ N = Diag (0.8377, 7.1623)
ª 0.4082 0.7024 0.5830 º
V = [ V1 , V2 ] = «« 0.8165 0.5667 0.1109»» ª 0.8112 0.5847 º
U=« »
«¬ 0.4082 0.4307 0.8049»¼ ¬ 0.5847 0.8112 ¼
V1  \ 3×1 V2  \ 3×2 U  \ 2×2

to be completed by
eigenspace synthesis of the eigenspace synthesis of the
matrix M matrix N, N-1
M = Vȁ M V c N = Uȁ N Uc, N 1 = Uȁ N Uc1

ª Vcº
M = [V1 , V2 ]ȁ M « 1 » =
¬ V2c ¼
ª Vcº
= [V1 , V2 ]Diag ( P1 ," , P n  m , 0," , 0) « 1 » N = UDiag (J 1 ," , J m )U c
¬ V2c ¼
M = V1 Diag ( P1 ," , P n  m )V1c N 1 = UDiag (O1 ," , Om )U c
in particular
M= N=
ª 0.4082 º ª 0.8112 0.5847 º ªJ 1 0 º
« 0.8165» P 0.4082 0.8165 0.4082 « 0.5847 0.8112 » « 0 J »
« » 1[ ] ¬ ¼¬ 2¼
«¬ 0.4082 »¼ ª 0.8112 0.5847 º
«0.5847 0.8112 »
¬ ¼
P1 = 1 versus
J 1 = 0.8377 , J 2 = 7.1623
626 Appendix D: Sampling distributions and their use

N 1 =
ª 0.8112 0.5847 º ªO1 0 º ª 0.8112 0.5847 º
=« »« »« »
¬ 0.5847 0.8112 ¼ ¬ 0 O2 ¼ ¬ 0.5847 0.8112 ¼
versus O1 = J 1 = 1.1937, O2 = J 2 = 0.1396.
1 1
P1 = 1
The non-vanishing eigenvalues of the matrix M have been denoted by
( P1 ," , P n  m ) , m eigenvalues are zero such that eigen (M ) = ( P1 ," , P n  m ,
, 0," , 0) . The eigenvalues of the regular matrix N span eigen (N) = (J 1 ," , J m ) .
Since the dispersion matrix D{ȟˆ} = ( AcA) 1V 2 = N 1V 2 is generated by the in-
verse of the matrix A cA = N , we have computed, in addition, the eigenvalues of
the matrix N 1 by means of eigen(N 1 ) = (O1 ," , Om ) . The eigenvalues of N and
N 1 , respectively, are related by
J 1 = O11 ," , J m = Om1 or O1 = J 11 ," , Om = J m1 .
In the example, the matrix M had only one non-vanishing eigenvalue P1 = 1 . In
contrast, the regular matrix N was characterized by two eigenvalues J 1 = 0.8377
and J 2 = 7.1623 , its inverse matrix N 1 by O1 = 1.1937 and O2 = 0.1396 .
The two quadratic forms, namely

y cMy = y cVȁ M V cy versus (ȟˆ  ȟ )cN (ȟˆ  ȟ ) = (ȟˆ  ȟ )cUȁ M U c(ȟˆ  ȟ ) ,

build up the original quadratic form


1
(y  E{y})c(y  E{y}) =
V2
1 1
= 2 y cMy + 2 (ȟˆ  ȟ )cN (ȟˆ  ȟ ) =
V V
1 1
= 2 y cVȁ M V cy + 2 (ȟˆ  ȟ )cUȁ N U c(ȟˆ  ȟ )
V V
in terms of the canonical random variables

V cy = y œ y = Vy and Uc(ȟˆ  ȟ ) = Șˆ  Ș

such that
1 1 1
2
( y  E{y})c( y  E{y}) = 2 ( y )cȁ M y + 2 ( Șˆ  Ș) ȁ N ( Șˆ  Ș)
V V V
nm m
1 1 1
2
(y  E{y})c(y  E{y}) = 2 ¦ (y j) Pj +
2
¦ ( Șˆ i  Și ) 2 J i
V V j =1 V2 i =1

1
(y  E{y})c( y  E{y}) = z12 + " + zn2 m + z 2 + " + zn2 .
V2 n  m +1
D7 Sampling from the multidimensional Gauss-Laplace normal distribution 627

The quadratic form z cz splits up into two terms, namely


z12 + " + zn2 m = zn2 m +1 + " + zn2 =
1 nm and 1 m
= ¦ (y ) 2
j Pj = ¦ (Kˆ  K ) J
i i
2
j ,
V2 j =1 V2 i =1

here
1 2 Vˆ 2
z12 = y1* = 2
V2 V
and
1
z22 + z32 = [(Kˆ1  K1 ) 2 J 1 + (Kˆ2  K2 ) 2 J 2 ] ,
V2
or
1
z12 = ( y12  4 y1 y2 + 2 y1 y3 + 4 y22  4 y2 y3 + y32 )
6V 2
and
1 1 ª3 3 º ˆ
z22 + z32 = 2
[0.8377(Kˆ1  K1 )2 + 7.1623(Kˆ2  K 2 ) 2 ] = 2 (ȟˆ  ȟ )c « » (ȟ  ȟ ).
V V ¬3 5 ¼
The fifth action item
We are left with the problem to transform the cumulative probability dF =
= f ( y1 , y2 , y3 )dy1dy2 dy3 into the canonical form dF = f ( z1 , z2 , z3 )dz1dz2 dz3 .
Here we take advantage of Corollary D3. First, we introduce Helmert’s random
variable x := z12 and the random variables [ˆ1 and [ˆ2 of the unknown parameter
vector ȟ̂ of fixed effects (ȟˆ  ȟ )cA cA(ȟˆ  ȟ ) = z22 + z32 = || z ||2AcA if we denote
z := [ z2 , z3 ]c .

dz1dz2 = det A cA d [ˆ1d [ˆ2 = 6 d [ˆ1d [ˆ2 ,

according to Corollary D3 is special represention of the surface element by


means of the matrix of the metric A cA . In summary, the volume element
dx
dz1dz2 dz3 = det A cA d[ˆ1d[ˆ2
x
leads to the first canonical representation of the cumulative probability

1 1 dx | A cA |1/ 2 1
dF = exp( x) 2
exp[ 2 (ȟˆ  ȟ )cA cA(ȟˆ  ȟ )]d [ˆ1d [ˆ2 .
2S 2 x 2SV 2V
The left pdf establishes Helmert’s pdf of x = z12 = Vˆ 2 / V 2 , dx = V 2 dVˆ 2 . In
contrast, the right pdf characterizes the bivariate Gauss-Laplace pdf of || ȟˆ  ȟ ||2 .
628 Appendix D: Sampling distributions and their use

Unfortunately, the bivariate Gauss-Laplace A cA normal pdf is not given in the


canonical form. Therefore, second we do correlate || ȟˆ  ȟ ||2AcA by means of ei-
genspace synthesis.
1 1
A cA = Uȁ AcA Uc = UDiag (J 1 , J 2 )Uc = UDiag ( , )Uc
O1 O2

1 1
|| ȟˆ  ȟ ||2AcA := (ȟˆ  ȟ )cA cA(ȟˆ  ȟ ) = (ȟˆ  ȟ )cUDiag ( , )U c(ȟˆ  ȟ )
O1 O2

1
| A cA |1/ 2 = J 1J 2 =
O1O2

Șˆ  Ș := U c(ȟˆ  ȟ ) œ ȟˆ  ȟ = U ( Șˆ  Ș)

1 1
|| ȟˆ  ȟ ||2AcA = ( Șˆ  Ș)c Diag ( , )( Șˆ  Ș) =|| Șˆ  Ș ||2D
O1 O2

ª 1 º
« 1.1937 0 »
ª 3 3 º
|| ȟˆ  ȟ || 2
A cA = (ȟˆ  ȟ ) c « » (ȟˆ  ȟ ) = ( Șˆ  Ș) c « » ( Șˆ  Ș) =|| Șˆ  Ș ||D2 .
¬3 5 ¼ « 0 1 »
«¬ 0.1396 »¼

By means of the canonical variables Șˆ = Ucȟˆ we derive the cumulative probabil-


ity

1 1 1 1 1 (Kˆ1  K1 )2 (Kˆ2  K2 )2
dF = exp( x)dx exp[  ( + )]dKˆ1dKˆ2
V 2S x 2 2SV 2 O1O2 2V 2 O1 O2
or
dF = f ( x) f (Kˆ1 ) f (Kˆ2 ) dxdKˆ1dKˆ2 .

Third, we prepare ourselves for the cumulative probability dF =


= f ( z1 , z2 , z3 )dz1dz2 dz3 . We depart from the representation of the volume ele-
ment
dx
dz1dz2 dz3 = det A cA d[ˆ1d[ˆ2
2 x
subject to
1 2 1
x = z12 = 2
Vˆ = 2 y cMy
V V

ȟˆ = ( A cA) 1 A cy .
D7 Sampling from the multidimensional Gauss-Laplace normal distribution 629

The Helmert random variable is a quadratic form of the coordinates ( y1 , y2 , y3 )


of the observation vector y. In contrast, ȟ̂ BLUUE of ȟ is a linear form of the
coordinates ( y1 , y2 , y3 ) of observation vector y. The transformation of the vol-
ume element
dxd [ˆ1d [ˆ2 = | J x | dy1dy2 dy3

is based upon the Jacobi matrix J x ( y1 , y2 , y3 )

ª D1 x D2 x D3 x º
« ˆ » ª ac º 1× 3
J x = « D1[1 D [ˆ D3[ˆ1 » = «

» ¬( A A) A ¼ 2 × 3
2 1 1
c
« ˆ
¬ D1[ 2 D2[ˆ2 ˆ
D3[ 2 ¼

ª D1 x º ª y1  2 y2 + y3 º
« » 2 1 «
a = « D2 x » = 2 My = 2 y1 + 4 y2  2 y3 »»
V 3V 2 «
«¬ D3 x »¼ «¬ y1  2 y2 + y3 »¼

1 1
x= 2
y cMy = ( y12  4 y1 y2 + 2 y1 y3 + 4 y22  4 y2 y3 + y32 )
V 6V 2

1 ª 5 2 1º
( A cA) 1 A c =
6 «¬ 3 0 3 »¼

ª 2 y1  4 y2 + 2 y3 4 y1 + 8 y2  4 y3 2 y1  4 y2 + 2 y3 º
1 « »
Jx = 5 2 1
6V 2 « »
«¬ 3 0 3 »¼

det J x = det( A cA) 1 det[aca  acA( AcA) 1 Aca]

det[aca  acA ( A cA ) 1 A ca]


det J x =
det( A cA)

4 4
aca = 4
y cM cMy = 4 y cMy
V V
4
acA( A cA) 1 Aca = 4 y cM cA( AcA) 1 AcMy =
V
4
= y c[I 3  A ( A cA ) 1 A c]A( A cA) 1 Ac[I 3  A( A cA) 1 Ac]y = 0
V4

2 y cMy
det J x = 2
V det( A cA)
630 Appendix D: Sampling distributions and their use

V2 det( A cA)
| det J y |=| det J x |1 = .
2 y cMy

The various representations of the Jacobian will finally lead us to the special
form of the volume element

1 1 y cMy
dx d[ˆ1 d[ˆ2 = 2 dy1 dy2 dy3
2 V | AcA |1/ 2
and the cumulative probability
1 1
dF = 3 3/ 2
exp[ ( z12 + z22 + z32 )]dz1dz2 dz3 =
V (2S ) 2
1 1 dx | A cA |1/ 2 1
= exp( x) 2
exp[ 2 (ȟˆ  ȟ )cA cA(ȟˆ  ȟ )]d[ˆ1d[ˆ2 =
V 2S 2 x 2SV 2V
1 1
= 3 exp[ (y  E{y})c(y  E{y})]dy1dy2 dy3 .
V (2S )3/ 2 2
The sixth action item
The first target is to generate the marginal pdf of the unknown parameter vector
ȟ̂ , BLUUE of ȟ .
f
1 1 dx | A cA |1/ 2 § 1 ·
dF1 = ³ exp( x) 2
exp ¨  2 (ȟˆ  ȟ )cAcA(ȟˆ  ȟ ) ¸ d [ˆ1 d [ˆ2
0 2S 2 x 2SV © 2V ¹
f
1 1 dx 1 § 1 2
(Kˆi  Ki ) 2 ·
dF1 = ³ exp( x) exp ¨  2 ¦ ¸ dKˆ1 dKˆ2
0 2S 2 2
x 2SV O1O2 © 2V i =1 Oi ¹

Let us substitute the standard integral


f
1 1 dx
³ exp( x) =1,
0 2S 2 x
in order to have derived the marginal probability

dF1 = f1 (ȟˆ | ȟ, ( A cA) 1V 2 ) d[ˆ1d[ˆ2


1/ 2
| A cA | § 1 ·
f1 (ȟˆ ) := 2
exp ¨  2 (ȟˆ  ȟ )cA cA(ȟˆ  ȟ ) ¸
2SV © 2V ¹
dF1 = f1 ( Șˆ | Ș, ȁ N V 2 )dKˆ1dKˆ2
1

1 1 § 1 2 (Kˆ  Ki ) 2 ·
f1 ( Șˆ ) = exp ¨  ¦ i ¸.
2SV 2 O1O2 © 2 i =1 Oi ¹
D7 Sampling from the multidimensional Gauss-Laplace normal distribution 631

The seventh action item

The second target is to generate the marginal pdf of Helmert’s random variable
x := Vˆ 2 / V 2 , Vˆ 2 BIQUUE V 2 , namely Helmert’s Chi Square pdf F p2 with p=n-
rkA (here p=1) “degree of freedom”.
+f +f
1 1 dx | A cA |1/ 2 § 1 ·
dF2 = ³f f³ 2SV 2 exp ¨©  2V 2 || ȟˆ  ȟ ||A cA ¸¹ d[ˆ1d[ˆ2 .
2
exp( x)
2S 2 x
Let us substitute the integral
+f +f
| A cA |1/ 2 § 1 ·
³f f³ 2SV 2 exp ¨©  2V 2 || ȟˆ  ȟ ||A cA ¸¹ d[ˆ1d[ˆ2 = 1
2

in order to have derived the marginal distribution


dF2 = f 2 ( x)dx, 0 d x d f
p2
1 1
f 2 ( x) = p/2
x 2
exp( x) ,
2 *( p / 2) 2

subject to
1
p = n  rk A = n  m , here: p = 1 , *( ) = S
2
1 1 1
f 2 ( x) = exp( x) .
2S x 2
The results of the example will be generalized in Lemma D.
Theorem D16 (marginal probability distributions, special linear Gauss-
Markov model):
E{y} = Aȟ ª A  \ n×m , rk A = m, E{y}  R ( A )
subject to « 2
D{y} = I nV 2 ¬V  \
+

defines a special linear Gauss-Markov model of fixed effects ȟ  \ m and


V 2  \ + based upon Gauss-Laplace i.i.d. observations y := [ y1 ," , yn ]c .
ª E{ȟˆ} = ȟ
ȟˆ = ( A cA) 1 A cy subject to «
«¬ D{ȟˆ} = ( A cA) 1V 2

and
632 Appendix D: Sampling distributions and their use

ª E{Vˆ 2 } = Vˆ 2
1
Vˆ =
2
(y  Aȟˆ )c(y  Aȟˆ ) subject to « 4
n  rk A « D{Vˆ 2 } = 2V
«¬ n  rk A
identify ȟ̂ BLUUE ȟ and Vˆ 2 BIQUUE of V 2 . The cumulative pdf of the multi-
dimensional Gauss-Laplace probability distribution of the observation vector
y = [ y1 ," , yn ]c  Y
f (y | E{y}, D{y} = I nV 2 )dy1 " dyn =
1 § 1 ·
= n/2 n
exp ¨  2 (y  E{y})c(y  E{y}) ¸ dy1 " dyn =
(2S ) V © 2V ¹
ˆ ˆ ˆ
= f1 (ȟ ) f 2 (Vˆ )d [1 " d [ m dVˆ
2 2

can be split into two marginal pdfs f1 (ȟˆ ) of ȟ̂ , BLUUE of ȟ , and f 2 (Vˆ 2 ) of
Vˆ 2 , BIQUUE of V 2 .
(i) ȟˆ BLUUE of ȟ
The marginal pdf of ȟˆ , BLUUE of ȟ , is represented by
(1st version)

dF1 = f1 (ȟˆ )d[1 " d[ m

1 1/ 2 § 1 ·
f1 (ȟˆ ) = m/ 2 m
A cA exp ¨  2 (ȟˆ  ȟ )cA cA(ȟˆ  ȟ ) ¸ d [1 " d [ m
(2S ) V © 2V ¹
or
(2nd version)

dF1 = f1 ( Șˆ )dK1 " dKm

1 § 1 m
(Kˆi  K ) 2 ·
f1 (Kˆ ) = (O1O2 " Om 1Om ) 1/ 2 exp ¨  2 ¦ ¸,
(2S ) m / 2 (V 2 ) m / 2 © 2V i =1 Oi ¹
by means of Principal Component Analysis PCA also called Singular Value
Decomposition (SVD) or Eigenvalue Analysis (EIGEN) of ( A cA ) 1 ,
Ș = U[c ȟ ª U[c ( A cA) 1 U[ = ȁ = Diag (O1 , O2 ," , Om 1 , Om )
subject to «
Șˆ = U[c ȟˆ «¬ U[c U[ = I m , det U[ = +1

f1 ( Șˆ | Ș, ȁV 2 ) = f (Kˆ1 ) f (Kˆ2 ) " f (Kˆm 1 ) f (Kˆm )


D7 Sampling from the multidimensional Gauss-Laplace normal distribution 633

1 § 1 (Kˆ  Ki ) 2 ·
f (Kˆi ) = exp ¨  2 i ¸ i  {1," , m}.
V Oi 2S © 2V Oi ¹

The transformed fixed effects (Kˆ1 ," ,Kˆm ) , BLUUE of (K1 ," ,Km ) , are mutually
independent and Gauss-Laplace normal
Kˆi  N (Ki | V 2 Oi ) i  {1," , m} .
(3rd version)
Kˆi  K 1 1
zi := : f1 ( zi )dzi = exp( zi2 )dzi i  {1," , m} .
V Oi2
2S 2

(ii) Vˆ 2 BIQUUE V 2
The marginal pdf of Vˆ 2 , BIQUUE V 2 , is represented by
(1st version)
p = n  rk A

dF2 = f 2 (Vˆ 2 )dVˆ 2

1 § 1 Vˆ 2 ·
f 2 (Vˆ 2 ) = p p / 2 p2
Vˆ exp ¨ p 2 ¸.
V p 2 p / 2 *( p / 2) © 2 V ¹
(2nd version)
dF2 = f 2 ( x)dx

Vˆ 2 p 1
x := (n  rk A) 2
= 2 Vˆ 2 = 2 (y  Aȟˆ )c(y  Aȟˆ )
V V V
p
1 1 1
f 2 ( x) = p / 2 x 2 exp( x) .
2 *( p / 2) 2

f2(x) as the standard pdf of the normalized sample variance is a Helmert Chi
Square F p2 pdf with p = n  rk A “degree of freedom”.
:Proof:
The first action item
n
First, let us decompose the quadratic form || y  E{y} ||2 into estimates E{y} of
E{y} .
n
y  E{y} = y  E n
{y} + ( E{y}  E{y})

y  E{y} = y  A[ˆ + A(ȟˆ  ȟ )


634 Appendix D: Sampling distributions and their use

and
n
(y  E{y})c(y  E{y}) = ( y  E n
{y})c( y  E n
{y}) + ( E n
{y}  E{y})c( E{y}  E{y})

n
|| y  E{y} ||2 =|| y  E n
{y} ||2 + || E{y}  E{y} ||2

( y  E{y})c( y  E{y}) = ( y  Aȟˆ )c( y  Aȟˆ ) + ( ȟˆ  ȟ )cA cA( ȟˆ  ȟ)

|| y  E{y} ||2 = || y  Aȟˆ ||2 + || ȟˆ  ȟ ||2AcA .

Here, we took advantage of the orthogonality relation.

(ȟˆ  ȟ )cAc( y  Aȟˆ )c = (ȟˆ  ȟ)cAc(I n  A( AcA) 1 Ac) y =

= (ȟˆ  ȟ )c( A c  A cA( A cA) 1 A c)y = 0.

The second action item


Second, we implement Vˆ 2 BIQUUE of V 2 into the decomposed quadratic form.

|| y  Aȟˆ ||2 = (y  Aȟˆ )c(y  Aȟˆ ) = y c(I n  A( A cA) 1 A c)y =

= y cMy = (n  rk A)Vˆ 2

|| y  E{y} ||2 = (n  rk A)Vˆ 2 + (ȟˆ  ȟ )cA cA(ȟˆ  ȟ )

|| y  E{y} ||2 = y cMy + (ȟˆ  ȟ )cN(ȟˆ  ȟ ).

The matrix of the normal equations N:=A cA, rk N = rk A cA = rk A = m , and the


matrix of the variance component estimation M := I n  A( A cA) 1 A c, rk M =
= n  rk A = n  m have been introduced since their rank forms the basis of the
generalized forward and backward Helmert transformation.
HH c = I n

z = V 1H( y  E{y}) = V 1H( y  Aȟ )

and
y  E{y} = V H cz

1
(y  E{y})c(y  E{y}) = z cH cHz = z cz
V2
1
|| y  E{y} ||2 =|| z ||2 .
V2
D7 Sampling from the multidimensional Gauss-Laplace normal distribution 635

The standard canonical variable z  \ n has to be associated with norms


|| y  Aȟˆ || and || ȟˆ  ȟ ||AcA .
The third action item
Third, we take advantage of the eigenspace representation of the matrices (M, N)
and their associated norms.

y cMy = y cVȁ M V cy versus (ȟˆ -ȟ )cN(ȟˆ -ȟ )=(ȟˆ -ȟ )cUȁ N U c(ȟˆ -ȟ )

ȁ M = Diag ( P1 ," , P n  m , 0," , 0) versus ȁ N = Diag (J 1 ," , J m ).


 \ n = \ nm × \ m  \m
m eigenvalues of the matrix M are zero, but n  rk A = n  m is the number of its
non-vanishing eigenvalues which we denote by ( P1 ," , P n  m ) . In contrast,
m = rk A is the n-m number of eigenvalues of the matrix N, all non-zero. The
canonical random variable

V cy = y œ y = Vy and Uc(ȟˆ  ȟ ) = Șˆ  Ș

lead to
1 1 1
2
(y  E{y})c(y  E{y}) = 2 ( y ) ȁ M y + 2 ( Șˆ  Ș)cȁ N ( Șˆ  Ș)
V V V
nm m
1 1 1
2
(y  E{y})c(y  E{y}) = 2 ¦ ( y j ) 2 P j + ¦ (Kˆ i  Ki ) 2 J i
V V j =1 V2 i =1

1
2
(y  E{y})c(y  E{y}) = z12 + " + zn2 m + zn2 m +1 + " + zn2
V
subject to
z12 + " + zn2 m = and zn2 m +1 + " + zn2 =
nm m
1 1
= ¦ (y ) 2
j Pj = ¦ (Kˆ i  Ki ) 2 J i
V2 j =1 V2 i =1

|| z ||2 = z cz = z12 + " + zn2 m + zn2 m +1 + " + zn2 =


1 1
y cMy + 2 (ȟˆ  ȟ )cN(ȟˆ  ȟ ) =
=
V2 V
1 1
= 2 || y  E{y} ||2 = 2 (y  E{y})c(y  E{y}).
V V
Obviously, the eigenspace synthesis of the matrices N = A cA and M = I n 
A( A cA) 1 A c has guided us to the proper structure synthesis of the generalized
Helmert transformation.
636 Appendix D: Sampling distributions and their use

The fourth action item


Fourth, the norm decomposition unable us to split the cumulative probability
dF = f ( y1 ," , yn )dy1 " dyn

into the pdf of the Helmert random variable x := z12 + " + zn2 m = V 2 (n  rk A)Vˆ 2 =
= V 2 (n  m)Vˆ 2 and the pdf of the difference random parameter vector
zn2m+1 + " + zn2 = V 2 (ȟˆ  ȟ )cA cA(ȟˆ  ȟ ) .
dF = f ( z1 ," , zn  m , zn  m +1 ," , zn )dz1 " dzn  m dzn  m +1dzn

1 § 1 ·
f ( z1 ," , zn ) = exp ¨  z cz ¸ =
(2S ) n / 2 © 2 ¹

1 n 2m § 1 · 1
m
§ 1 ·
=( ) exp ¨  ( z12 + " + zn2 m ) ¸ ( ) 2 exp ¨  ( zn2 m +1 + " + zn2 ) ¸ .
2S © 2 ¹ 2S © 2 ¹
Part A
x := r 2 = z12 + " + zn2 m Ÿ dx = 2( z1 dz1 + " + zn  m dzn  m )

z1 = r cos In  m 1 cos In  m  2 " cos I2 cos I1


z2 = r cos In  m 1 cos In  m  2 " cos I2 sin I1
...
zn  m 1 = r cos In  m1 sin In  m  2
zn  m = r sin In  m 1

ª z1 º
« " » = 1 Diag ( P ," , P )V cy
« » V 1 nm 1
«¬ zn  m »¼

V = [V1 , V2 ] , V1cV1 = I n  m , V2cV2 = I m , V1cV2 = 0

VV c = I n , V  \ n× n , V1  \ n× ( n  m ) , V2  \ n× m

and

ª zn m+1 º
« " » = 1 Diag ( J ," , J )Uc(ȟˆ  ȟ )
« » V 1 m

«¬ zn »¼

altogether
D7 Sampling from the multidimensional Gauss-Laplace normal distribution 637

ª z1 º
« ... »
« »
« zn  m » ª Diag ( P1 ," , Pn m ) V1cy º
» =V «
1
« ».
« zn m+1 » «¬ Diag ( J 1 ," , J m )Uc(ȟˆ  ȟ ) »¼
« ... »
« »
«¬ zn »¼

The partitioned vector of the standard random variable z is associated with the
norm || zn  m ||2 and || zm ||2 , namely
|| zn  m ||2 + || zm ||2 = z12 + " + zn2 m + zn2 m +1 + " + zn2 =

1
= y cV1 Diag ( P1 ," , P n  m ) Diag ( P1 ," , P n  m )V1cy +
V2
1 ˆ
+ (ȟ  ȟ )cUDiag ( J 1 ," , J m ) Diag ( J 1 ," , J m )Uc(ȟˆ  ȟ ) =
V2
1 1
= 2
y cV1 Diag ( P1 ," , P n  m )V1cy + 2 (ȟˆ  ȟ )cUDiag (J 1 ," , J m )U c(ȟˆ  ȟ ) =
V V
dz1dz2 " dzn  m 1dzn  m = r n  m 1dr (cos In  m 1 ) n  m 1 (cos In  m 2 ) n  m  2 "
" cos 2 I3 cos I2 dIn  m 1dIn  m  2 " dI3 dI2 dI1 .

The representation of the local (n-m) dimensional hypervolume element in terms


of polar coordinates (I1 , I2 ," , In  m 1 , r ) has already been given by Lemma D4.
Here, we only transform the random variable r to Helmert’s random variable x.
dx
x := r 2 : dx = 2rdr , dr = , r n  m 1 = x ( n  m 1) / 2
2 x
1 ( n  m 1) / 2
r n  m 1dr = x dx.
2
Part A concludes with the representation of the left pdf in terms of Helmert’s
polar coordinates

1 n 2m 1
dFA = ( ) exp ( z12 + " + zn2 m )dz1 " dzn  m =
2S 2
1 1 n 2m n  m2  2
= ( ) x dx(cos In  m 1 ) n  m 1 (cos In  m  2 ) n  m  2 "
2 2S
" cos 2 I3 cos I2 dIn  m 1dIn  m  2 " dI3 dI2 dI1 .

Part B
638 Appendix D: Sampling distributions and their use

Part B focuses on the representation of the right pdf, first in terms of the random
variables ȟ̂ , second in terms of the canonical random variables K̂ .

1 m2 § 1 ·
dFr = ( ) exp ¨  ( zn2 m +1 + " + zn2 ) ¸ dzn  m +1 " dzn
2S © 2 ¹
1 ˆ
zn2m+1 + " + zn2 = (ȟ  ȟ )cA cA(ȟˆ  ȟ )
V2
1
dzn  m +1 " dzn = | A cA |1/ 2 d[ˆ1 " d[ˆm .
V m/2
The computation of the local m-dimensional hyper volume element dzn  m +1 " dzn
has followed Corollary D3 which is based upon the matrix of the metric
V 2 AcA . The first representation of the right pdf is given by
1 m2 § 1 ·
dFr = ( ) exp ¨  ( zn2 m +1 + " + zn2 ) ¸ dzn  m +1 " dzn =
2S © 2 ¹
1 m2 | A cA |1/ 2 § 1 ·
=( ) m/ 2
exp ¨  2 (ȟˆ  ȟ )cA cA(ȟˆ  ȟ ) ¸ d [ˆ1 " d [ˆm .
2S V © 2V ¹
Let us introduce the canonical random variables (Kˆ1 ," ,Kˆm ) which are gener-
ated by the correlating quadratic form || ȟˆ  ȟ ||2AcA .
1 1
(ȟˆ  ȟ )cA cA(ȟˆ  ȟ ) = (ȟˆ  ȟ )cUDiag ( ," , )U c(ȟˆ  ȟ ) .
O1 Om

Here, we took advantage of the eigenspace synthesis of the matrix A cA =: N and


( A cA) 1 =: N 1 . Such an inverse normal matrix is the representing dispersion
matrix D{ȟˆ} = ( AcA) 1V 2 = N 1V 2 .
UUc = I m  U  SO(m) := {U  \ n× m | UUc = I m ,| U |= +1}

N := A cA = UDiag (J 1 ," , J m )U c

versus
N 1 := ( A cA) 1 = UDiag (O1 ," , Om )U c
subject to
J 1 = O11 ," , J m = Om1 or O1 = J 11 ," , Om = J m1
1
| A cA |1/ 2 = J 1 "J m =
O1 " Om

Șˆ  Ș := Uc(ȟˆ  ȟ ) œ ȟˆ  ȟ := Uc( Șˆ  Ș)
D7 Sampling from the multidimensional Gauss-Laplace normal distribution 639

1 1
|| ȟˆ  ȟ ||2AcA =: (ȟˆ  ȟ )cA cA (ȟˆ  ȟ ) = ( Șˆ  Ș)cDiag ( ,", )( Șˆ  Ș) .
O1 Om

The local m-dimensional hypervolume element d[ˆ1 " d[ˆm is transformed to the
local m-dimensional hypervolume element dKˆ1 " dKˆm by

d[ˆ1 " d[ˆm =| U | dKˆ1 " dKˆm = dKˆ1 " dKˆm .

Accordingly we have derived the second representation of the right pdf f ( Șˆ ) .

1 m2 1 § 1 1 1 ·
dFr = ( ) exp ¨  2 (Șˆ  Ș)cDiag ( ,", )(Șˆ  Ș) ¸ dKˆ1 " dKˆm .
2S V m/2
O1 "Om © 2V O1 Om ¹

Part C

Part C is an attempt to merge the left and right pdf

1 1 n 2m ( n  m  2) / 2
dF = dFA dFr = ( ) x dxd Zn  m 1
2 2S
1 m2 | A cA |1/ 2 § 1 ·
( ) exp ¨  (ȟˆ  ȟ )cA cA(ȟˆ  ȟ ) ¸ d [ˆ1 " d [ˆm
2S Vm © 2 ¹
or

1 1 n 2m ( n  m  2) / 2
dF = dFA dFr = ( ) x dxd Zn  m 1
2 2S
1 m2 1 § 1 1 1 ·
( ) exp ¨  2 ( Șˆ  Ș)c Diag ( ," , )( Șˆ  Ș) ¸ dKˆ1 " dKˆm .
2S V O1 " Om
m
© 2V O1 Om ¹

The local (n-m-1)-dimensional hypersurface element has been denoted by


dZ n  m 1 according to Lemma D4.
The fifth action item
Fifth, we are going to compute the marginal pdf of ȟ̂ BLUUE of ȟ .

dF1 = f1 (ȟˆ )d[ˆ1 " d[ˆm

as well as
dF1 = f1 ( Șˆ )dKˆ1 " dKˆm

include the first marginal pdf f1 (ȟˆ ) and f1 ( Șˆ ) , respectively.


The definition
640 Appendix D: Sampling distributions and their use
f
1 1 nm 1 m | AcA |1/ 2 § 1 ·
f1 (ȟˆ ) := ³ dx³9dZn  m 1 ( ) 2 x( n  m  2) / 2 ( ) 2 2 m/2
exp ¨  (ȟˆ  ȟ)cAcA(ȟˆ  ȟ) ¸
0
2 2S 2S (V ) © 2 ¹
subject to
f
1 1 n 2m ( n  m  2) / 2
dx 9d Z
³0 ³ n  m 1 2 ( 2S ) x =1

leads us to

1 m2 | A cA |1/ 2 § 1 ·
f1 (ȟˆ ) = ( ) exp ¨  (ȟˆ  ȟ )cA cA(ȟˆ  ȟ ) ¸ .
2S Vm © 2 ¹
Unfortunately, such a general multivariate Gauss-Laplace normal distribution
cannot be tabulated. An alternative is offered by introducing canonical unknown
parameters Ș̂ as random variables.
The definition
f
1 1 nm
f1 ( Șˆ ) := ³ dx ³9d Zn  m 1 ( ) 2 x ( n  m  2) / 2
0 2 2S
1 m2 1 § 1 1 1 ·
( ) m
(O1O2 " Om 1Om ) 1/ 2 exp ¨  2 ( Șˆ  Ș)c Diag ( ," , )( Șˆ  Ș) ¸
2S V © 2V O1 Om ¹
subject to
f
1 1 n 2m ( n  m  2) / 2
³0 ³
dx 9d Z n  m 1 ( ) x =1
2 2S

alternatively leads us to

1 m2 1 1 § 1 m
(Kˆi  K ) 2 ·
f1 ( Șˆ ) = ( ) exp ¨  2 ¦ ¸
2S V m O1 " Om © 2V i =1 Oi ¹

f1 (Kˆ1 ," ,Kˆm ) = f1 (Kˆ1 )" f1 (Kˆm )

1 § 1 (Kˆi  K ) 2 ·
f1 (Kˆi ) := exp ¨  ¸ i  {1," , m} .
V Oi 2S © 2 Oi ¹

Obviously the transformed random variables (Kˆ1 ," ,Kˆm ) BLUUE of (K1 ," ,Km )
are mutually independent and Gauss-Laplace normal.
The sixth action item
Sixth, we shall compute the marginal pdf of Helmert’s random variable
x = (n  rkA)Vˆ 2 / V 2 = (n  m)Vˆ 2 / V 2 , Vˆ 2 BIQUUE V 2 ,
D7 Sampling from the multidimensional Gauss-Laplace normal distribution 641

dF2 = f 2 ( x)dx
includes the second marginal pdf f 2 ( x) .
The definition
+f +f
1 1 nm 1 m | A cA |1/ 2
f 2 ( x) := 9³ d Zn  m 1 ( ) 2 x ( n  m  2) / 2 ³ d [ˆ1 " ³ d [ˆm ( ) 2 exp
2 2S f f
2S Vm

§ 1 ˆ ˆ ·
¨  2 (ȟ  ȟ )cAcA(ȟ  ȟ ) ¸
© 2V ¹

subject to

2 S ( n  m 1) / 2
Zn  m 1 = ³9d Zn  m 1 = ,
n  m 1
*( )
2

according to Lemma D4
+f +f m 1/ 2
[ˆ " d [ˆ ( 1 ) 2 | A cA | exp §  1 (ȟˆ  ȟ )cA cA(ȟˆ  ȟ ) · =
³ f³
f
" d 1 m
2S Vm ¨
© 2V
2 ¸
¹
+f +f
1 m2 § 1 ·
= ³ dz1 " ³ dzm ( ) exp ¨  ( z12 + " + zm2 ) ¸ = 1
f f
2S © 2 ¹

leads us to

p := n  rk A = n  m

n  m 1 1 n  m 1 nm p
S *( ) = * ( )* ( ) = *( ) = *( )
2 2 2 2 2
p
1 1 1
f 2 ( x) = p/2
x 2
exp( x) ,
2 *( p / 2) 2

namely the standard pdf of the normalised sample variance, known as Helmert’s
Chi Square pdf F p2 with p = n  rk A = n  m “degree of freedom”. If you substi-
tute x = (n  rkA)Vˆ 2 / V 2 = (n  m)Vˆ 2 / V 2 , dx = (n  rk A)V 2 dVˆ 2 = ( n  m)V 2 dVˆ 2
we arrive at the pdf of the sample variance Vˆ 2 , in particular
dF2 = f 2 (Vˆ 2 )dVˆ 2
1 § 1 Vˆ 2 ·
f 2 (Vˆ 2 ) = p p / 2 p 2
Vˆ exp ¨ p 2 ¸ .
V r 2 p / 2 *( p / 2) © 2 V ¹
642 Appendix D: Sampling distributions and their use

Here is my proof’s end.


Theorem D17 (marginal probability distributions, special linear Gauss-
Markov model with datum defect):
E{y} = Aȟ A  \ n× m , r := rkA < min{n, m}
subject to E{y}  R ( A )
D{y} = VV 2
V  \ n× n , rkV = n

defines a special linear Gauss-Markov model with datum defect of fixed effects
ȟ  \ m and a positive definite variance-covariance matrix D{y} = Ȉ y of multi-
variate Gauss-Laplace distributed observations y := [ y1 ," , yn ]c .

ȟˆ = Ly “linear”
ȟˆ = A + y subject to || LA  I m ||2 = min “minimum bias”
trD{ȟˆ} = trLȈ y Lc = min “best”

and
ª E{Vˆ 2 } = V 2
1 « 4
Vˆ 2 = (y  Aȟˆ )cȈ y1 (y  Aȟˆ ) subject to « D{Vˆ 2 } = 2V
n  rk A «¬ n  rk A

identify ȟ̂ BLIMBE of ȟ and Vˆ 2 BIQUUE of V 2 .


Part A
The cumulative pdf of the multivariate Gauss-Laplace probability distribution of
the observation vector y = [ y1 ,… , yn ]c  Y
f (y | E{y}, D{y} = Ȉ y )dy1 " dyn =
1 1 § 1 ·
= n/2 1/ 2
exp ¨  (y  E{y})cȈ y1 (y  E{y}) ¸ dy1 " dyn
(2S ) | Ȉ y | © 2 ¹
is transformed by

Ȉ y = : cDiag (V 1 ,… , V n ) : = : cDiag ( V 1 ,… , V n ) Diag ( V 1 ,… , V n ) :

Ȉ1/y 2 := Diag ( V 1 ,… , V n ) :

Ȉ y = ( Ȉ1/y 2 )cȈ1/y 2

versus
D7 Sampling from the multidimensional Gauss-Laplace normal distribution 643

1 1
Ȉ y1 = : cDiag ( ,… , ) : =
V1 Vn
1 1 1 1
= W cDiag ( ,… , ) Diag ( ,… , )W
V1 Vn V1 Vn

1 1
Ȉ 1/ 2 := Diag ( ,… , )W
V1 Vn

Ȉ 1 = ( Ȉ y1/ 2 )cȈ y1/ 2

subject to the orthogonality condition


WW c = I n

|| y  E{y} ||2Ȉ := (y  E{y})cȈ y1 (y  E{y}) =


1
y

1 1 1 1
= (y  E{y})cW cDiag ( ,… , ) Diag ( ,… , ) W( y  E{y})c =
V1 Vn V1 Vn
= (y  E{y })c(y  E{y }) =:|| y  E{y } ||I2 n

subject to the star or canonical coordinates


1 1
y = Diag ( ," , ) Wy = Ȉ y1/ 2 y
V1 Vn

f (y | E{y }, D{y } = I n )dy1 " dyn =


1 § 1 ·
= n/2
exp ¨  (y  E{y })c(y  E{y }) ¸ dy1 " dyn =
(2S ) © 2 ¹
1 § 1 ·
= n/2
exp ¨  || y  E{y } ||2 ¸ dy1 " dyn
(2S ) © 2 ¹
into the canonical Gauss-Laplace pdf.
Part B
The marginal pdf of ȟ̂ BLUUE of ȟ , is represented by
(1st version)

dF1 = f1 (ȟˆ )d[1 " d[ n .


Appendix E: Statistical Notions

Definitions und lemmas relating to statistics are given, neglecting their proofs.
First, we reviews the statistical moments of a probability distribution of random
vectors and list the Gauss-Laplace normal distribution and its moments. We
slightly generalize the Gauss-Laplace normal distribution by introducing the
notion of a quasi - normal distribution. At the end of the Chapter E1 we review
two lemmas about the Gauss-Laplace 3ı rule, namely the Gauss-Laplace ine-
quality and the Vysochainskii – Potunin inequality. Chapter E2 reviews the spe-
cial linear error propagation as well as the general nonlinear error propagation,
namely based on y = g(x) introducing the moments of second, third and fourth
order. The special role of the Jacobi matrix as well as the Hesse matrix is clari-
fied. Chapter E3 reviews useful identities like E{yy c … y} and E{yy c … yy c} as
well as Ȍ := E{(y  E{y})(y  E{y})c …(y  E{y})} relating to the matrix of
obliquity and ī := E{(y  E{y}) (y  E{y})c …(y  E{y})(y  E{y})c} + ... relating
to the matrix of courtosis. The various notions of identifiability and unbiased
estimability in Chapter E4 are reviewed, both for the moments of first order and
for the central moments of second order.
E1: Moments of a probability distribution, the Gauss-Laplace
normal distribution and the quasi-normal distribution
First, we define the moments of a probability distribution of a vector valued
random function and explain the notion of a Gauss-Laplace normal distribution
and its moments. Especially we define the terminology of a quasi – Gauss-
Laplace normal distribution.
Definition E1 (statistical moments of a probability distribution):
(1) The expectation or first moment of a continuous random
vector [ X i ] for all i = 1,… , n of a probability density
f ( x1 ,… xn ) is defined as the mean vector ȝ := [ P1 ,… , P n ] of
E{Xi } = ȝ i
+f +f
ȝ i := E{Xi } = ³ … ³ xi f ( x1 ,… , xn )dx1 … dxn . (E1)
f f

The first moment related to the random vector [ei ] := [ Xi  E{Xi }]


is called first central moment
S i := E{Xi  ȝi } = E{Xi }  ȝi = 0. (E2)
(2) The second moment of a continuous random vector [ X i ]
for all i = 1,… , n of a probability density f ( x1 ,… , xn ) is the
mean matrix
+f +f
ȝ ij := E{Xi X j } = ³ … ³ xi x j f ( x1 ,… , xn )dx1 " dxn . (E3)
f f
E1 Moments of a probability distribution 645

The second moment related to the random vector [ei ] := [ X i  E{Xi }]


is called variance – covariance matrix or dispersion matrix [V ij ] ,
especially variance or dispersion V 2 for i = j or covariance for i z j :
S ii = V i2 = V {Xi } = D{Xi } := E{( Xi  ȝi ) 2 } (E4)
S ij = V ij = C{Xi , X j } := E{( Xi  ȝi )( X j  ȝ j )} (E5)

D{x} = [V ij ] = [C{Xi , X j }] = E{( x  E{x})( x  E{x})c} . (E6)

x := [ X1 ,… , X n ]c is a collection of the n × 1 random vector. The random


variables X i , X j for i z j are called with respect to the central moment
of second order uncorrelated if V ij = 0 for i z j .
(3) The third central moment with respect to the random vector
[ei ] := [ Xi  E{Xi }] defined by
S ijk := E{ei e j e k } = E{( Xi  ȝ i )( X j  ȝ j )( X k  ȝ k )} (E7)

contains for i = j = k the vector of obliquity with the compo-


nents
Ȍ i := E{ei3 } . (E8)
Uncorrelation with respect to the central moment up to third or-
der is defined by

3
ª 1 d i1 z i2 z i3 d n
E{e e e } = – E{e } ««
n1 n2 n3 nj
i1 i2 i3 ij and (E9)
j =1
«¬ 0 d n1 + n2 + n3 d 3.

(4) The fourth central moment relative to the random vector


[ei ] := [ Xi  E{Xi }]
S ijk A := E{ei e j ek eA } = E{( Xi  ȝi )( X j  ȝ j )( Xk  ȝ k )( XA  ȝ A )} (E10)

leads for i1 = i2 = i3 = i4 to the vector of curtosis with the compo-


nents
Ȗ i := E{ei4 }  3{V i2 }2 . (E11)
Uncorrelation with respect the central moments up the fourth
order is defined by
ª 1 d i1 z i2 z i3 z i4 d n
4
E{e e e e } = – E{e } ««
n1 n2 n3 n4 nj
i1 i2 i3 i4 ij und (E12)
j =1
«¬ 0 d n1 + n2 + n3 + n4 d 4.
646 Appendix E: Statistical Notions

(5) The central moments of the nth order relative to the random
vector [ei ] := [ Xi  E{Xi }] are defined by
S i …i := E{ei … ei } = E{( Xi  P i )… ( Xi  P i )}.
1 n 1 n 1 1 n n
(E13)

A special distribution is the Gauss-Laplace normal distribution of random vec-


tors in R n . Note that alternative distributions on manifolds exist in large num-
bers, for instance the von Mises distribution on S 2 or the Fisher distribution on
S 3 of Chapter 7.
Definition E2 (Gauss-Laplace normal distribution):
An n × 1 random vector x := [ X1 ,… , X n ]c is a Gauss-Laplace
normal distribution if its probability density f ( x1 ,… xi ) has the
representation
1
f (x) = (2S )  n / 2 | Ȉ |1/ 2 exp[ (x  E{x})cȈ 1 (x  E{x})]. (E14)
2
Symbolically we can write
x  N (ȝ, Ȉ) (E15)
with the moment of first order or mean vector ȝ := E{x} and the
central moment of second order or variance – covariance matrix
Ȉ := E{(x  E{x})( x  E{x})c}.
The moments of the Gauss-Laplace normal distribution are given next.
Lemma E3: (moments of the Gauss-Laplace normal distribution):
Let the n × 1 random vector x := [ X1 ,… , X n ]c follow a Gauss-
Laplace normal distribution. Then all central moments of odd
order disappear and the central moments of even order are
product sums of the central moments of second order exclusively.
E{ei … ei } = 0  n = 2m + 1, m = 1,… , f
1 n
(E16)

E{ei … ei } = fct(V i2 , V i i ,… , V i2 ,… , V i2 )  n = 2m, m = 1,… , f


1 n 1 12 2 n
(E17)

S ij = V ij , S ii = V i2 , (E18)

S ijk = 0, (E19)

S ijk A = S ijS k A + S ik S jA + S iAS jk , (E20)

S iijj = S iiS jj + 2S ijS ij = V i2V 2j + 2V ij2 , S iiii = 3(V i2 )2 (E21)

S ijk Am = 0, (E22)

S i i …i
12 2 m  2 i2 m 1i2 m
= S i i …i
12 2 m2
Si 2 m 1i2 m
+ S i i …i
12 2 m  3i2 m 1
Si 2 m  2 i2 m
+ … + S i i …i S i i . (E23)
2 3 2 m 1 1 2m
E1 Moments of a probability distribution 647

The vector of obliquity ȥ := [\ 1 ,… ,\ m ]c and the vector of curto-


sis Ȗ := [J 1 , … , J m ]c vanish.
A weaker assumption compared to the Gauss-Laplace normal distribution is the
assumption that the central moments up to the order four are of the form (E18)-
(E21). Thus we allow for a larger class of distributions which have a similar
structure compared to (E18)-(E21).
Definition E4 (quasi - Gauss-Laplace normal distribution):
A random vector x is quasi – Gauss-Laplace normally distrib-
uted if it has a continuous symmetric probability distribution
f ( x ) which allows a representation of its central moments up to
the order four of type (E18)-(E21).
Of special importance is the computation of error bounds for the Gauss-Laplace
normal distribution, for instance. As an example, we have the case called the 3V
rule which states that the probability for the random variable for X falling away,
from its mean by more then 3 standard deviations (SDs) is at most 5 %,
4
P{| X  ȝ |t 3V } d < 0.05 . (E24)
81
Another example is the Gauss-Laplace inequality, that bounds the probability
for the deviation from the mode Q .
Lemma E5 (Gauss inequality):
The expected squared deviation from the mode Ȟ is
4W 2
P{| X  Ȟ |t r} d for all r t 4 / 3 W (E25)
9r 2

P{| X  Ȟ |t r} d 1  (r / 3W ) for all r d 4 / 3 W (E26)

subject to W 2 := E{( X  Ȟ ) 2 } .

Alternatively, we take advantage of the Vysochanskii – Potunin inequality.


Lemma E6 (Vysochanskii – Potunin inequality):
The expected squared deviation from an arbitrary point D  \ is
4U 2
P{| X  D |t r} d for all r t 8 / 3 U (E27)
9r 2
4U 2 1
P{| X  D |t r} d  for all r d 8 / 3 U (E28)
3r 2 3
subject to U 2 := E{( X  D ) 2 }.
648 Appendix E: Statistical Notions

References about the two inequalities are C.F. Gauss (1823), F. Pukelsheim
(1994), J.R. Savage (1961), B. Ulin (1653), D.F. Vysochaniskii and Y. E. Petunin
(1980, 1983).
E2: Error propagation
At the beginning we note some properties of the operators expectation E and
dispersion D. Afterwards we review the special and general, also nonlinear
error propagation.
Lemma E7 (expectation operators E{X} ):
E is defined as a linear operator in the space of random variables
in R n , also called expectation operator. For arbitrary constants
D , E , G  \ there holds the identity
E{D Xi + E Xi + G } . (E29)
Let A and B be two m × n and m × A matrices and į an m × 1
vector of constants x := [ X i ,… , X n ]c and y := [ Yi ,… , Yn ]c two
n × 1 and A × 1 random vectors such that
E{AX + BX + į} = AE{x} + BE{y} + į (E30)
holds. The expectation operator E is not multiplicative that is
E{Xi X j } = E{Xi }E{X j } + C{Xi , X j } z E{Xi }E{X j } , (E31)
if X i and X j are correlated.

Lemma E8 (special error propagation):


Let y be an n × 1 dimensional random vector which depends lin-
ear of the m × 1 dimensional random vector x by means of a
constant n × m dimensional matrix A and of a constant dimen-
sional vector į of the form y := Ax + į . Then hold the “error
propagation law”
D{y} = D{Ax + į} = AD{x}Ac. (E32)
The dispersion function D is not addition, in consequence a
nonlinear operator that is
D{Xi + X j } = D{Xi } + D{X j } + 2C{Xi , X j } z D{Xi } + D{X j } , (E33)

if X i and X j are correlated.

The “special error propagation law” holds for a linear transformation


x o y = Ax + į . The “general nonlinear error propagation” will be presented by
Lemma E7 and E8. The detailed proofs are taken from Chapter 8-2, Examples.
E2 Error propagation 649

Corollary E9 (“nonlinear error propagation”):


Let y = g ( x) be a scalarvalued function between one random
variable x and one random variable y. g(x) is assumed to allow a
Taylor expansion around the fixed approximation point [ 0 :
1
g ( x) = g ([ 0 ) + g c([ 0 )( x  [ 0 ) + g cc([ 0 )( x  [ 0 ) 2 + O (3) =
2 (E34)
= J 0 + J 1 ( x  [ 0 ) + J 2 g ([ 0 ) 2 + O (3).

Then the expectation and dispersion identities hold


1
E{ y} = g ( P x ) + g cc([ 0 )V x2 + O (3) = g ( P x ) + J 2V x2 + O (3), (E35)
2
D{ y} = J 12V x2 + 4V x2 [J 1J 2 ( P x  [ 0 ) + J 22 ( P x  [ 0 ) 2 ]  J 22V x4 +
(E36)
+E{( x  P x )3 }[2J 1J 2 + 4J 22 ( P x  [ 0 )] + E{( x  P x ) 4 }J 22 + O (3).
For the special case of a fixed approximation point ȟ 0 chosen to
coincide with mean value P x = [ 0 and x being quasi – Gauss-
Laplace normal distributed we arrive at the identities
1
E{ y} = g ( P x ) + g cc( P x )V x2 + O (4), (E37)
2
1
D{ y} = [ g c( P )]2 V x2 + [ g cc( P x )]2 V x4 + O (3). (E38)
2
The representation (E37) and (E38) characterize the nonlinear
error propagation which is in general dependent of the central
moments of the order two and higher, especially of the obliquity
and the curtosis.

Lemma E10: (“nonlinear error propagation“):


Let y = f(x) be a vectorvalued function between the m × 1 ran-
dom vector x and the n × 1 random vector y. g(x) is assumed to
allow a Taylor expansion around the m × 1 fixed approximation
vector ȟ 0 :
g(x) = g(ȟ 0 ) + gc(ȟ 0 )( x  ȟ 0 ) +
1
+ g cc(ȟ 0 )[( x  ȟ 0 ) … ( x  ȟ 0 )] + O (3) = (E39)
2
1
= Ȗ 0 + J (x  ȟ 0 ) + H[( x  ȟ 0 ) … ( x  ȟ 0 ) + O (3).
2
650 Appendix E: Statistical Notions

With the n × m Jacobi matrix J := [ J j g i (ȟ 0 )] and the n × m 2


Hesse matrix H := [vecH1 ,… , vecH n ]c,
Hi := [w j w k g i (ȟ 0 )] (i = 1,…, n; j, k = 1,…, m) (E40)

there hold the following expectation and dispersion identities


(“nonlinear error propagation”)
1
E{y} = ȝ y = g (ȝ x ) + HvecȈ + O (3) (E41)
2
1
D{y} = Ȉ y = JȈJ c + J ˜ [ Ȉ … (ȝ x  ȟ 0 )c + (ȝ x  ȟ 0 )c … Ȉ]H c +
2
1
+ H[ Ȉ … (ȝ x  ȟ 0 ) + (ȝ x  ȟ 0 ) … Ȉ]J c +
2
1
+ H[ Ȉ … (ȝ x  ȟ 0 )(ȝ x  ȟ 0 )c + (ȝ x  ȟ 0 )(ȝ x  ȟ 0 )c … Ȉ +
4
+(ȝ x  ȟ 0 ) … Ȉ … (ȝ x  ȟ 0 )c + (ȝ x  ȟ 0 )c … Ȉ … (ȝ x  ȟ 0 )]H c +
1
+ J ˜ E{(x  ȝ x )(x  ȝ x )c … (x  ȝ x )c}H c +
2
1 (E42)
+ H ˜ E{(x  ȝ x ) … (x  ȝ x )(x  ȝ x )c}J c +
2
1
+ H ˜ [ E{( x  ȝ x )( x  ȝ x )c … ( x  ȝ x )} ˜ (ȝ x  ȟ 0 )c +
4
+(ȝ x  ȟ 0 ) ˜ E{( x  ȝ x )c … (x  ȝ x )( x  ȝ x )c} +
+(I … (ȝ x  ȟ 0 )) ˜ E{( x  ȝ x )( x  ȝ x )c … ( x  ȝ x )c} +
+ E{( x  ȝ x ) … (x  ȝ x )( x  ȝ x )c} ˜ ((ȝ x  ȟ 0 )c) … I)]H c +
1
+ H ˜ {(x  ȝ x )(x  ȝ x )c … (x  ȝ x )(x  ȝ x )c} 
4
 vecȈ(vecȈ)c]H c + O (3) .
In the special case that the fixed approximations vector ȟ 0 coin-
cides to the mean vector ȝ x = ȟ 0 and the random vector x is
quasi – Gauss-Laplace normally distributed the following identi-
ties hold:
1
E{y} = ȝ y = g (ȝ x ) + HvecȈ + O (4) (E43)
2
1
D{y} = Ȉ y = JȈJ c  HvecȈ(vecȈ)cH c +
4
1
+ H ˜ E{(x  ȝ x )( x  ȝ x )c … ( x  ȝ x )( x  ȝ x )c} ˜ H c + O (3) = (E44)
4
1
= JȈJ c + H[ Ȉ … Ȉ + E{(x  ȝ x )c … Ȉ … (x  ȝ x )}] ˜ H c + O (3) .
4
E3 Useful identities 651

E3:.Useful identities
Notable identities about higher order moments are the following.
Lemma E11 (identities: higher order moments):
(i) Kronecker products #1

E{yy c} = E{(y  E{y})( y  E{y})c} + E{y}E{y}c (E45)


E{yy c … y} = E{(y  E{y})( y  E{y})c … ( y  E{y})} +
+ E{yy c} … E{y} + E{y … E{y}c … y} + (E46)
+ E{y} … E{yy c}  2 E{y} … E{y}c … E{y}

E{yy c … yy c} = (  Ȍ … E{y}c  E{y}c … Ȍ 


 Ȍc … E{y}  E{y} … Ȍc + E{yy c} … E{yy c} +
(E47)
+ E{y c … E{yy c} … y} + E{y … y}E{y … y}c 
 2 E{y}E{y}c … E{y}E{y}c .
(ii) Kronecker products #2
Ȍ := E{(y  E{y})( y  E{y})c … ( y  E{y})} (E48)
( := E{( y  E{y})( y  E{y})c … ( y  E{y})( y  E{y}) c} 
 E{( y  E{y})( y  E{y})c … E{( y  E{y})( y  E{y})c} 
(E49)
 E{( y  E{y})c … E{( y  E{y})( y  E{y})c} … ( y  E{y})}
 E{( y  E{y}) … ( y  E{y})}E{( y  E{y})c … ( y  E{y})c} .
The n 2 × n matrix Ȍ contains the components of obliquity, the n 2 × n 2
matrix ( the components of curtosis relative to the n × 1 central ran-
dom vector y  E{y} .
(iii) Covariances between linear and quadratic forms
C{F1 y + E1 , F2 y + E 2 } := E{(F1 y + E1  E{F1 y + E1 })
(E50)
×(F2 y + E 2  E{F2 y + E 2 })c } = F1 ȈF2
(linear error propagation)

C{Fy + E, yc Hy} := E{(Fy + E  E{Fy + E})(ycHy  E{ycHy})} =


= F[ E{yy c … yc }  E{y}E{yc … y c}] vec H = (E51)
1
= FȌc vec(H + Hc ) + FȈ(H + Hc ) E{y}
2
652 Appendix E: Statistical Notions

C{ycGy , yc Hy} := E{(ycGy  E{ycGy})(yc Hy  E{ycHy})} =


= (vec G )c[ E{yy c … yyc}  E{y … y}E{yc … y c}] vec H =
1
= [vec(G + Gc )]c ( vec( H + Hc )  (E52)
4
1
 [vec(G + Gc )]c ( Ȍc … E{y} + Ȍ … E{y}c ) vec( H + Hc ) +
2
1
+ tr[(G + Gc ) Ȉ( H + Hc ) Ȉ] + E{y}c (G + Gc ) Ȉ(H + Hc ) E{y}.
2
(iv) quasi-Gauss-Laplace-normally distributed data
C{F1 y + į1 , F2 y + į 2 } = F1 ȈF2 (E53)
(independent from any distribution)

C{Fy + į, y cHy} = FȈ(H + Hc ) E{y} (E54)


1
C{ycGy , yc Hy} = tr[(G + Gc ) Ȉ(H + Hc ) Ȉ] +
2 (E55)
+ E{y}c (G + Gc ) Ȉ(H + Hc ) E{y}.
E4: The notions of identifiability and unbiasedness
The notions of identifiability and unbiasedness are introduced. According to the
classical concept we call an arbitrary vector y := Fȟ in a linear model “identifi-
able” if for a given probability of the observed random vector y with a given
dispersion structure for the likelihood L( y; M) there holds the identity
L(y; M1 ) = L(y; M 2 ) Ÿ M1 = M 2 . (E56)
If the likehood function L( y; M) belongs to the class of a exponential distribu-
tions for instance a Gauss-Laplace normal distribution, the likelihood function is
identifiable if the related “information matrix”
w 2 ln L(y; M) w ln L w ln L wM wM
I (M) :=  E ( ) = E{( )( )c} = [ ]I (ȟ )( ]c (E57)
wM i wM j wM wM wȟ wȟ

is regular. It has be emphasized that the matrix J (ȟ ) in the case of a Gauss-


Laplace normal distribution the corresponding “normal equation matrix” are
identical:
y ~ N (Aȟ, Ȉ) Ÿ J (ȟ ) = A cȈ 1 A . (E58)
In order to rely on the more general notion of a probability distribution, for in-
stance characterized by the moments of order one to order four. We introduce the
notion of “identifiability”.
E4 The notions of identifiability and unbiasedness 653
Definition E 12 (identifiability):
An arbitrary vector M := Fȟ with respect to a linear model is
identifiable, if the implication
Aȟ1 = Aȟ 2 Ÿ M1 = Fȟ1 = Fȟ 2 = M 2 (E59)

holds for all ȟ i  R m (i = 1, 2) .


Corollary E13 informs us about the equivalent formulation, namely that M := FY
is identifiable if R ( Fc)  R( A c) .
Corollary E13 ( R ( Fc)  R( A c) ):

M := Fȟ identifiable in a linear model œ R ( Fc)  R ( A c) .

The concept of “identifiability” is related to the concept of “estimability” or


“unbiasedness”.
Definition E14 (estimability):
An arbitrary vector M := Fȟ is called in a linear model “estima-
ble” if there exists a function such that
E{L(y )} = M (E60)
holds. M = Fȟ with respect to a linear model is “linear estimable”
if there exists a matrix L such that
E{L(y )} = M (E61)
holds. In all these cases L( y ) or Ly is an “unbiased estimable
quantity” of M .
A bridge between “identifiability” and “unbiased estimation” is built on the
result of Theorem E15.
Theorem E15 (“identifiability” versus “unbiased estimability”):
Let M = Fȟ be an arbitrary vector built in a linear model. The fol-
lowing statements are equivalent.
(i) M is identifyable,
(ii) M is estimable,
(iii) M is linear estimable,
(iv) M is invariant with respect to all transformations which do not
influence the observed random vector y (“S-transformation”),
(v) R (Fc)  R( A c) .
654 Appendix E: Statistical Notions

The above statements are the reason that for the vectors M = Fȟ only linear esti-
mations are analyzed.

Up to now we only analyzed the notion of identifiability and estimability of the


unknown parameter vector. Now we tend to the question how to analyze the
notions of “identifiability” and “estimability” of the unknown variance compo-
nent within a linear model.
Definition E16 (“identifiability of a variance component”):
The variance component V 2 with respect to a linear model
E{y} = Aȟ , D{y} = VV 2 is called “identifiable” if the implica-
tion
VV 12 = VV 22 Ÿ V 12 = V 22 (E62)

is fulfilled for all V i2  R + (i = 1, 2) .

Similarly to Corollary E13 there exists an equivalent characterization with a


special comment: Definition E17 works only if we recognize that R + allows
only for positive real numbers. In addition, the matrix V is not allowed to con-
tain the zero matrix in order that V 2 becomes identifiable. Similarly to Defini-
tion E14 we obtain the notion of estimability:
Definition E17 (“estimability of a variance component”):
The variance component V 2 is called with respect to the linear
model E{y} = Aȟ , D{y} = VV 2 “estimable”, if the function M
exists such that
E{M (y )} = V 2 (E63)

is fulfilled. V 2 is called within this linear model “quadratically


estimable” if there are symmetric matrices such that
E{(vec M )c(y … y )} = E{y cMy} = V 2 (E64)
with the n × n matrix M . In these cases we call M( y ) or
(vec M)c( y … y ) “unbiased estimable” with respect to the scalar V 2 .

The necessary condition for a quadratic unbiased estimation of the variance


component V 2 in a linear model is (vec M)c( A … A ) = 0 . In addition, for y z 0
the variance component is always positive, namely if M is a positive definite
matrix.
Appendix F: Bibliographic Indexes
There are various bibliographic indexes to publications in statistics. For instance,
the American Statistical Association and the Institute of Mathematical Statistics
publishes the “Current Index to Statistics” (www.statindex.org). Reference are
drawn from about 120 “core journals” that are fully indexed, about 400 “non-
core journals” from which articles are selected that have Adv. statistical content,
proceedings and edited books, and other sources. The current index to statistics
Extended Database (CIS-ED) includes coverage of about 120 "core journals",
selected articles since 1974 from about 900 additional journals (cumulatively),
and about 8,000 books in statistics published since 1974. The CIS Print Volume
is published annually. Each year's edition indexes primarily material appearing in
the statistics literature in a single year. The Print Volume consists of two main
components. The Author Index lists articles by the names of authors, and in-
cludes the full citation for each article. The Subject Index lists articles by the
significant words appearing in the title and additional key words descriptive of
the subject matter of the article. (For instance, Volume 24 of the Print Volume
contains approximately 9300 entries, 71% of which appeared in 1998. About
4800 of the entries came from 111 core journals.)
Alternatively there is “Statistical Theory & Method Abstracts” published by
“International Statistics Institute” (Voorburg, Netherlands). Abstracts are given
of papers on probability and statistics, as well as important applications. Rele-
vant papers are taken from journals specialized in these subject fields, as well as
from journals that are largely devoted to other fields, but regularly contain papers
of interest. Other sources include collective works such as conference proceed-
ings, Festschriften, and commemorative volumes. In addition, research reports
and monographs in a series, which are not proper textbooks but have the charac-
ter of extended research paper are being abstracted.
On the following journals, abstracting is virtually complete

Appl. Prob. Biom. J.


Ann. Appl. Prob. Calcutta Statist. Ass. Bull.
Ann. Inst. Henri Poincaré Canad. J. Statist.
Ann. Prob. Commun. Statist.-Simul. Comp.
Ann. Statist. Commun. Statist.-Theor. Meth.
Appl. Statist. Comp. Statist. Data Anal.
Appl. Stoch. Models Data anal. Comp. Statist.
Austral. & New Zealand J. Statist. Econometric Rev.
Bernoulli Econometric Theory
Biometrics Egypt. Statist. J.
Biometrika Environmetrics
656 Appendix F: Bibliographic Indexes

Extremes Prob. Theory relat. Fields


Finance and Stochastics Prob. Engrg. Mechanics
Insurance: Math. Econom. Proc. Inst. Statist. Math.
Int. Statist. Rev. Psychometrika
J. Amer. Statist. Ass. Queueing Systems Theory Appl.
J. Appl. Prob. Random Operators & Stoch.
J. Appl. Statist. Equat.
J. Appl. Statist. Sci. Random Structures & Algorithms
J. Biopharmaceutical statist. Rev. Brasil. Prob. Eststist.
J. Chinese Statist. Ass. Rev. Financial Studies
J. Comp. Graph. Statisti. S. Afr. Statist. J.
J. Econometrics Sankhya (A and B)
J. Indian Soc. Agri. Statist. Scand. J. Statist.
J. Indian Statist. Ass. Sequent. Anal.
J. Italian Statist. Soc. Statist. & Dec.
J. Japanist. Statist. Soc. Statist. Med.
J. Korean. Statist. Soc. Statistica
J. Multivariate Anal. Statistica Sinica
J. Nonparameteric Statist. Statistics
J. R. Statist. Soc. B Statist. & computing
J. Risk and Uncertainty Statist. Methods Med. Res.
J. Statist. Comp. Simul. Statist. Neerl.
J. Statist. Planning Infer. Statist. Papers
J. Theor. Prob. Statist. Prob. Letters
J. Time Ser. Anal. Statist. Sci.
Korean J. Appl. Statist. Statistician
Kybernetika
Lifetime Data Anal. Stoch. Anal. Appl.
Markov Process. Rel. Fields Stoch. Models
Math. Methods Statist. Stoch. Processes Appl.
Metrika Stoch. & Stoch. Reports
Metron Student
Monte Carlos Methods Appl. Technometrics
Pakistan J. Statist. Test
Prob. Math. Statist. Theory Prob. Appl.

Each issue contains an Author Index where all the abstracts are listed according
to names of all the authors showing the abstract number and the classification
number,
Classfication Scheme 2000
(http://www.cbs.nl/isi/stma.htm)
Appendix F: Bibliographic Indexes 657

contain 16 subject entries, namely

Appl. Prob. Biom. J.


Ann. Appl. Prob. Calcutta Statist. Ass. Bull.
Ann. Inst. Henri Poincaré Canad. J. Statist.
Ann. Prob. Commun. Statist.-Simul. Comp.
Ann. Statist. Commun. Statist.-Theor. Meth.
Appl. Statist. Comp. Statist. Data Anal.
Appl. Stoch. Models Data anal. Comp. Statist.
Austral. & New Zealand J. Statist. Econometric Rev.
Bernoulli Econometric Theory
Biometrics Egypt. Statist. J.
Biometrika Environmetrics
References

Abbe, E. (1906): Über die Gesetzmäßigkeit in der Verteilung der Fehler bei Beobach-
tungsreihen, Gesammelte Abhandlungen, vol. II, Jena 1863 (1906)
Abdous, B. and R. Theodorescu (1998): Mean, median, mode IV, Statistica Neerlandica
52 (1998) 356-359
Absil, P.-A. et al. (2002) : A Grassmann-Rayleigh quotient iteration for computing invari-
ant subspaces, SIAM Review 44 (2002) 57-73
Abramowitz, M. and J.A. Stegun (1965): Handbook of mathematical functions, Dover
Publication, New York 1965
Adams, M. and V. Guillemin (1996): Measure theory and probability, 2nd edition, Birk-
häuser Verlag, Boston 1996
Adatia, A. (1996): Asymptotic blues of the parameters of the logistic distribution based on
doubly censored samples, J. Statist. Comput. Simul. 55 (1996) 201-211
Adelmalek, N.N. (1974): On the discrete linear L1 approximation and L1 solutions of
overdetermined linear equations, J. Approximation Theory 11 (1974) 38-53
Afifi, A.A. and V. Clark (1996): Computer-aided multivariate analysis, Chapman and
Hall, Boca Raton 1996
Agostinelli, C. and M. Markatou (1998): A one-step robust estimator for regression based
on the weighted likelihood reweighting scheme, Stat.& Prob. Letters 37 (1998) 341-
350
Agrò, G. (1995): Maximum likelihood estimation for the exponential power function
parameters, Comm. Statist. Simul. Comput. 24 (1995) 523-536
Aickin, M. and C. Ritenbaugh (1996): Analysis of multivariate reliability structures and
the induced bias in linear model estimation, Statistics in Medicine 15 (1996) 1647-
1661
Aitchinson, J. and I.R. Dunsmore (1975): Statistical prediction analysis, Cambridge Uni-
versity Press, Cambridge 1975
Aitken, A.C. (1935): On least squares and linear combinations of observations, Proc. Roy.
Soc. Edinburgh 55 (1935) 42-48
Airy, G.B. (1861): On the algebraical and numerical theory of errors of observations and
the combination of observations, Macmillan Publ., London 1861
Akdeniz, F., Erol, H. and F. Oeztuerk (1999): MSE comparisons of some biased estima-
tors in linear regression model, J. Applied Statistical Science 9 (1999) 73-85
Albert, A. (1969): Conditions for positive and nonnegative definiteness in terms of pseudo
inverses, SIAM J. Appl. Math. 17 (1969) 434-440
Albertella, A. and F. Sacerdote (1995): Spectral analysis of block averaged data in geopo-
tential global model determination, J. Geodesy 70 (1995) 166-175
Alcala, J.T., Cristobal, J.A. and W. Gonzalez-Manteiga (1999): Goodness-of-fit test for
linear models based on local polynomials, Statistics & Probability Letters 42 (1999)
39-46
Aldrich, J. (1999): Determinacy in the linear model: Gauss to Bose and Koopmans, Inter-
national Statistical Review 67 (1999) 211-219
Alefeld, G. and J. Herzberger (1983): Introduction to interval computation. Computer
science and applied mathematics, Academic Press, New York - London 1983
Ali, A.K.A., Lin, C.Y. and E.B. Burnside (1997): Detection of outliers in mixed model
analysis, The Egyptian Statistical Journal 41 (1997) 182-194
Allende, S., Bouza, C. and I. Romero (1995): Fitting a linear regression model by combin-
ing least squares and least absolute value estimation, Questiio 19 (1995) 107-121
660 References
Allmer, F. (2001): Louis Krüger (1857-1923), 25 pages, Technical University of Graz,
Graz 2001
Alzaid, A.A. and L. Benkherouf (1995): First-order integer-valued autoregressive process
with Euler marginal distributions, J. Statist. Res. 29 (1995) 89-92
Ameri, A. (2000): Automatic recognition and 3D reconstruction of buildings through
computer vision and digital photogrammetry, Dissertation, Stuttgart University, Stutt-
gart 2000
An, H.-Z., F.J. Hickernell and L.-X. Zhu (1997): A new class of consistent estimators for
stochastic linear regressive models, J. Multivar. Anal. 63 (1997) 242-258
Anderson, P.L. and M.M. Meerscaert (1997): Periodic moving averages of random vari-
ables with regularly varying tails, Ann. Statist. 25 (1997) 771-785
Anderson, T.W. (1973): Asymptotically efficient estimation of covariance matrices with
linear structure, The Annals of Statistics 1 (1973) 135-141
Anderson, T.W. (2003): An introduction to multivariate statistical analysis, Wiley, Stan-
ford CA, 2003
Anderson, T.W. and M.A. Stephens (1972): Tests for randomness of directions against
equatorial and bimodal alternatives, Biometrika 59 (1972) 613-621
Anderson, W.N. and R.J. Duffin (1969): Series and parallel addition of matrices, J. Math.
Anal. Appl. 26 (1969) 576-594
Ando, T. (1979): Generalized Schur complements, Linear Algebra Appl. 27 (1979) 173-
186
Andrews, D.F. (1971): Transformations of multivariate data, Biometrics 27 (1971) 825-
840
Andrews, D.F. (1974): A robust method for multiple linear regression, Technometrics 16
(1974) 523-531
Andrews, D.F., Bickel, P.J. and F.R. Hampel (1972): Robust estimates of location, Prince-
ton University Press, Princeton 1972
Anh, V.V. and T. Chelliah (1999): Estimated generalized least squares for random coef-
ficient regression models, Scandinavian J. Statist. 26 (1999) 31-46
Anido, C. and T. Valdés (2000): Censored regression models with double exponential
error distributions: an iterative estimation procedure based on medians for correcting
bias, Revista Matemática Complutense 13 (2000) 137-159
Ansley, C.F. (1985): Quick proofs of some regression theorems via the QR Algorithm,
The American Statistician 39 (1985) 55-59
Antoch, J. and H. Ekblom (2003): Selected algorithms for robust M- and L-Regression
estimators, Developments in Robust Statistics, pp. 32-49, Physica Verlag, Heidelberg
2003
Anton, H. (1994): Elementary linear algebra, Wiley, New York 1994
Arnold, B.C. and N. Balakrishnan (1989): Relations, bounds and approximations for order
statistics, Lecture Notes in Statistics 53 (1989) 1-37
Arnold, B.C. and R.M. Shavelle (1998): Joint confidence sets for the mean and variance
of a normal distribution, American Statistical Association 52 (1998) 133-140
Arnold, B.F. and P. Stahlecker (1998): Prediction in linear regression combining crisp
data and fuzzy prior information, Statistics & Decisions 16 (1998) 19-33
Arnold, B.F. and P. Stahlecker (1999): A note on the robustness of the generalized least
squares estimator in linear regression, Allg. Statistisches Archiv 83 (1999) 224-229
Arnold, K.J. (1941): On spherical probability distributions, P.H. Thesis, Boston 1941
Arnold, S.F. (1981): The theory of linear models and multivariate analysis, J. Wiley, New
York 1981
References 661
Arrowsmith, D.K. and C.M. Place (1995): Differential equations, maps and chaotic be-
haviour, Champman and Hall, London 1995
Arun, K.S. (1992): A unitarily constrained total least squares problem in signal process-
ing, SIAM J. Matrix Anal. Appl. 13 (1992) 729-745
Atkinson, A.C. and L.M. Haines (1996): Designs for nonlinear and generalized linear
models, Handbook of Statistik 13 (1996) 437-475
Aven, T. (1993): Reliability and risk analysis, Chapman and Hall, Boca Raton 1993
Awange, J.L. (2002): Gröbner bases, multipolynomial resultants and the Gauss-Jacobi
combinatorial algorithms – adjustment of nonlinear GPS/LPS observations, Schriften-
reihe der Institute des Studiengangs Geodäsie und Geoinformatik, Report 2002.1
Awange, J. and E.W. Grafarend (2002): Linearized Least Squares and nonlinear Gauss-
Jacobi combinatorial algorithm applied to the 7-parameter datum transformation C7(3)
problem, Z. Vermessungswesen 127 (2002) 109-116
Axler, S., Bourdon, P. and W. Ramey, (2001): Harmonic function theory, 2nd ed.,
Springer Verlag, New York 2001
Azzalini, A. (1996): Statistical inference, Chapman and Hall, Boca Raton 1996
Azzam, A.M.H. (1996): Inference in linear models with nonstochastic biased factors,
Egyptian Statistical Journal, ISSR - Cairo University 40 (1996) 172-181
Azzam, A., Birkes, D. and J. Seely (1988): Admissibility in linear models with polyhedral
covariance structure, probability and statistics, essays in honor of Franklin A. Gray-
bill, J.N. Srivastava, Ed. Elsevier Science Publishers, B.V. (North-Holland) 1988
Baarda, W. (1967): A generalization of the concept strength of the figure, Publications on
Geodesy, New Series 2, Delft 1967
Baarda, W. (1968): A testing procedure for use in geodetic networks, Netherlands Geo-
detic Commission, New Series, Delft, Netherlands, 2 (5) 1968
Baarda, W. (1973): S-transformations and criterion matrices, Netherlands Geodetic
Commission, Vol. 5, No. 1, Delft 1973
Baarda, W. (1977): Optimization of design and computations of control networks, F.
Halmos and J. Somogyi (eds.) Akademiai Kiado, Budapest 1977, 419-436
Babai, L. (1986): On Lovasz' lattice reduction and the nearest lattice point problem, Com-
binatorica 6 (1986) 1-13
Babu, G.J. and E.D. Feigelson (1996): Astrostatistics, Chapman and Hall, Boca Raton
1996
Baddeley, A.J. (2000): Time-invariance estimating equations, Bernoulli 6 (2000) 783-808
Bai, J. (1994): Least squares estimation of a shift in linear processes, J.Time Series
Analysis 15 (1994) 453-472
Bai, Z.D. and Y. Wu (1997): General M-estimation, J. Multivariate Analysis 63 (1997)
119-135
Bai, Z.D., Rao, C.R. and Y.H. Wu (1997): Robust inference in multivariate linear regres-
sion using difference of two convex functions as the discrepancy measure, Handbook
of Statistics 15 (1997) 1-19
Bai, Z.D., Chan, X.R., Krishnaiah, P.R. and L.C. Zhao (1987): Asymptotic properties of
the EVLP estimation for superimposed exponential signals in noise, Technical Report
87-19, CMA, University of Pittsburgh, Pittsburgh 1987
Baksalary, J.K. and A. Markiewicz (1988): Admissible linear estimators in the general
Gauss-Markov model, J. Statist. Planning and Inference 19 (1988) 349-359
Baksalary, J.K. and F. Pukelsheim (1991b): On the Löwner, minus and star partial order-
ings of nonnegative definite matrices and their squares, Linear Algebra and its Appli-
cations 151 (1991) 135-141
Baksalary, J.K., Liski, E.P. and G. Trenkler (1989): Mean square error matrix improve-
ments and admissibility of linear estimators, J. Statist. Planning and Inference 23
(1989) 313-325
Baksalary, J.K., Rao, C.R. and A. Markiewicz (1992): A study of the influence of the
'natural restrictions' on estimation problems in the singular Gauss-Markov model, J.
Statist. Planning and Inference 31 (1992) 335-351
Balakrishnan, N. and Basu, A.P. (1995) (eds.): The exponential distribution, Gordon and
Breach Publishers, Amsterdam 1995
Balakrishnan, N. and R.A. Sandhu (1996): Best linear unbiased and maximum likelihood
estimation for exponential distributions under general progressive type-II censored
samples, Sankhya 58 (1996) 1-9
Bamler, R. and P. Hartl (1998): Synthetic aperture radar interferometry, Inverse Problems
14 (1998) R1-R54
Banachiewicz, T. (1937): Zur Berechnung der Determinanten, wie auch der Inversen und
zur darauf basierten Auflösung der Systeme linearen Gleichungen, Acta Astronom.
Ser. C3 (1937) 41-67
Bansal, N.K., Hamedani, G.G. and H. Zhang (1999): Non-linear regression with multidi-
mensional indices, Statistics & Probability Letters 45 (1999) 175-186
Bapat, R.B. (2000): Linear algebra and linear models, Springer, New York 2000
Barankin, E.W. (1949): Locally best unbiased estimates, Ann. Math. Statist. 20 (1949)
477-501
Barbieri, M.M. (1998): Additive and innovational outliers in autoregressive time series: a
unified Bayesian approach, Statistica 3 (1998) 395-409
Barham, R.H. and W. Drane (1972): An algorithm for least squares estimation of nonlin-
ear parameters when some of the parameters are linear, Technometrics 14 (1972) 757-
766
Bar-Itzhack, I.Y. and F.L. Markley (1990): Minimal parameter solution of the orthogo-
nal matrix differential equation, IEEE Transactions on automatic control 35 (1990)
314-317
Barlow, J.L. (1993): Numerical aspects of solving linear least squares problems, C.R.
Rao, ed., Handbook of Statistics 9 (1993) 303-376
Barlow, R.E. and F. Proschan (1966): Tolerance and confidence limits for classes of
distributions based on failure rate, Ann. Math. Statist 37 (1966) 1593-1601
Barlow, R.E., Clarotti, C.A., and F. Spizzichino (eds) (1993): Reliability and decision
making, Chapman and Hall, Boca Raton 1993
Barnard, J., McCulloch, R. and X.-L. Meng (2000): Modeling covariance matrices in
terms of standard deviations and correlations, with application to shrinkage, Statistica
Sinica 10 (2000) 1281-1311
Barnard, M.M. (1935): The secular variations of skull characters in four series of Egyptian
skulls, Ann. Eugen. 6 (1935) 352-371
Barndorff-Nielsen, O.E. (1978): Information and exponential families in statistical theory,
Wiley & Sons, Chichester & New York 1978
Barndorff-Nielsen, O.E., Cox, D.R. and C. Klüppelberg (2001): Complex stochastic
systems, Chapman and Hall, Boca Raton, Florida 2001
Barnett, V. (1999): Comparative statistical inference, 3rd ed., Wiley, Chichester 1999
Barone, J. and A. Novikoff (1978): A history of the axiomatic formulation of probability
from Borel to Kolmogorov, Part I, Archive for History of Exact Sciences 18 (1978)
123-190
Barrio, R. (2000): Parallel algorithms to evaluate orthogonal polynomial series, SIAM J.
Sci. Comput 21 (2000) 2225-2239
Barrlund, A. (1998): Efficient solution of constrained least squares problems with
Kronecker product structure, SIAM J. Matrix Anal. Appl. 19 (1998) 154-160
Barrodale, I. and D.D. Oleski (1981): Exponential approximation using Prony's method,
The Numerical Solution of Nonlinear Problems, eds. Baker, C.T.H. and C. Phillips,
258-269, 1981
Bartlett, M.S. and D.G. Kendall (1946): The statistical analysis of variance-heterogeneity
and the logarithmic transformation, Queen's College Cambridge, Magdalen College
Oxford, Cambridge/Oxford 1946
Bartoszynski, R. and M. Niewiadomska-Bugaj (1996): Probability and statistical infer-
ence, Wiley, New York 1996
Barut, A.O. and R.B. Haugen (1972): Theory of the conformally invariant mass, Annals of
Physics 71 (1972) 519-541
Bassett Jr., G. and R. Koenker (1978): Asymptotic theory of least absolute error regression, J. Amer.
Statist. Assoc. 73 (1978) 618-622
Bateman, H. (1910a): The transformation of the electrodynamical equations, Proc. Lon-
don Math. Soc. 8 (1910) 223-264, 469-488
Bateman, H. (1910b): The transformation of coordinates which can be used to transform
one physical problem into another, Proc. London Math. Soc. 8 (1910) 469-488
Bates, D.M. and M.J. Lindstorm (1986): Nonlinear least squares with conditionally linear
parameters, Proceedings of the Statistical Computation Section, American Statistical
Association, Washington 1986
Bates, D.M. and D.G. Watts (1980): Relative curvature measures of nonlinearity (with
discussion), J. Royal Statist. Soc. Ser. B 42 (1980) 1-25
Bates, D.M. and D.G. Watts (1988a): Nonlinear regression analysis and its applications,
John Wiley, New York 1988
Bates, D.M. and D.G. Watts (1988b): Applied nonlinear regression, J. Wiley, New York
1988
Bates, R.A., Riccomagno, E., Schwabe, R. and H.P. Wynn (1998): Lattices and dual
lattices in optimal experimental design for Fourier models, Computational Statistics &
Data Analysis 28 (1998) 283-296
Batschelet, E. (1965): Statistical methods for the analysis of problems in animal orienta-
tion and certain biological rhythms, Amer. Inst. Biol. Sciences, Washington 1965
Batschelet, E. (1971): Recent statistical methods for orientation, (Animal Orientation
Symposium 1970 on Wallops Island), Amer. Inst. Biol. Sciences, Washington, D.C.,
1971
Batschelet, E. (1981): Circular statistics in biology, Academic Press, London 1981
Bauer, H. (1992): Maß- und Integrationstheorie, 2. Auflage, Walter de Gruyter, Berlin /
New York 1992
Bauer, H. (1996): Probability theory, de Gruyter Verlag, Berlin-New York 1996
Bayen, F. (1976): Conformal invariance in physics, in: Cahen, C. and M. Flato (eds.),
Differential geometry and relativity, Reidel Publ., pages 171-195, Dordrecht 1976
Beale, E.M. (1960): Confidence regions in non-linear estimation, J. Royal Statist. Soc. B
22 (1960) 41-89
Beaton, A.E. and J.W. Tukey (1974): The fitting of power series, meaning polynomials,
illustrated on band-spectroscopic data, Technometrics 16 (1974) 147-185
Becker, T., Weispfennig, V. and H. Kredel (1998): Gröbner bases: a computational ap-
proach to commutative algebra, New York, Springer 1998
Beckermann, B. and E.B. Saff (1999): The sensitivity of least squares polynomial ap-
proximation, Int. Series of Numerical Mathematics, vol. 131: Applications and com-
putation of orthogonal polynomials (eds. W. Gautschi, G.H. Golub, G. Opfer) pp. 1-
19, Birkhäuser Verlag, Basel 1999
Beckers, J., Harnad, J., Perroud, M. and P. Winternitz (1978): Tensor fields invariant
under subgroups of the conformal group of space-time, J. Math. Phys. 19 (1978)
2126-2153
Behnken, D.W. and N.R. Draper (1972): Residuals and their variance, Technometrics 11
(1972) 101-111
Behrens, W.A. (1929): Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen,
Landwirtschaftliche Jahrbücher 68 (1929) 807-837
Beichelt, F. (1997): Stochastische Prozesse für Ingenieure, Teubner Stuttgart 1997
Belikov, M.V. (1991): Spherical harmonic analysis and synthesis with the use of column-
wise recurrence relations, Manuscripta Geodaetica 16 (1991) 384-410
Belikov, M.V. and K.A. Taybatorov (1992): An efficient algorithm for computing the
Earth’s gravitational potential and its derivatives at satellite altitudes, Manuscripta
Geodaetica 17 (1992) 104-116
Belmehdi, S., Lewanowicz, S. and A. Ronveaux (1997): Linearization of the product of
orthogonal polynomials of a discrete variable, Applicationes Mathematicae 24 (1997)
445-455
Ben-Israel, A. and T. Greville (1974): Generalized inverses: Theory and applications,
Wiley, New York 1974
Benbow, S.J. (1999): Solving generalized least-squares problems with LSQR, SIAM J.
Matrix Anal. Appl. 21 (1999) 166-177
Benda, N. and R. Schwabe (1998): Designing experiments for adaptively fitted models,
in: MODA 5 – Advances in model-oriented data analysis and experimental design,
Proceedings of the 5th International Workshop in Marseilles, eds. Atkinson, A.C.,
Pronzato, L. and H.P. Wynn, Physica-Verlag, Heidelberg 1998
Bennett, R.J. (1979): Spatial time series, Pion Limited, London 1979
Beran, R.J. (1968): Testing for uniformity on a compact homogeneous space, J. App.
Prob. 5 (1968) 177-195
Beran, R.J. (1979): Exponential models for directional data, Ann. Statist. 7 (1979) 1162-
1178
Beran, R.J. (1994): Statistical methods for long memory processes, Chapman and Hall,
Boca Raton 1994
Berberan, A. (1992): Outlier detection and heterogeneous observations – a simulation
case study, Australian J. Geodesy, Photogrammetry and Surveying 56 (1992) 49-61
Berger, M.P.F. and F.E.S. Tan (1998): Optimal designs for repeated measures experi-
ments, Kwantitatieve Methoden 59 (1998) 45-67
Berman, A. and R.J. Plemmons (1979): Nonnegative matrices in the mathematical sci-
ences, Academic Press, New York 1979
Bertsekas, D.P. (1996): Incremental least squares methods and the extended Kalman
filter, Siam J. Opt. 6 (1996) 807-822
Berry, J.C. (1994): Improving the James-Stein estimator using the Stein variance estima-
tor, Statist. Probab. Lett. 20 (1994) 241-245
Bertuzzi, A., Gandolfi, A. and C. Sinisgalli (1998): Preference regions of ridge regression
and OLS according to Pitman’s criterion, Sankhya: The Indian J.Statistics 60 (1998)
437-447
Bessel, F.W. (1838): Untersuchungen über die Wahrscheinlichkeit der Beobachtungsfeh-
ler, Astronomische Nachrichten 15 (1838) 368-404
Betensky, R.A. (1997): Local estimation of smooth curves for longitudinal data, Statistics
in Medicine 16 (1997) 2429-2445
Beylkin, G. and N. Saito (1993): Wavelets, their autocorrelation function and multidimen-
sional representation of signals, in: Proceedings of SPIE - The international society of
optical engineering, Vol. LB 26, Int. Soc. for Optical Engineering, Bellingham 1993
Bhatia, R. (1996): Matrix analysis, Springer Verlag, New York 1996
Bhattacharya, R.N. and C.R. Rao (1976): Normal approximation and asymptotic expan-
sions, Wiley, New York 1976
Bhattacharya, R.N. and E.C. Waymire (2001): Iterated random maps and some classes of
Markov processes, D. N. Shanbhag and C.R. Rao, eds., Handbook of Statistics 19
(2001) 145-170
Bibby, J. (1974): Minimum mean square error estimation, ridge regression, and some
unanswered questions, colloquia mathematica societatis Janos Bolyai, Progress in sta-
tistics, ed. J. Gani, K. Sarkadi, I. Vincze, Vol. I, Budapest 1972, North Holland Publi-
cation Comp., Amsterdam 1974
Bickel, P.J. and K.A. Doksum (1977a): Mathematical statistics - Distribution theory for
transformations of random vectors, pp. 9-41, Holden-Day Inc 1977
Bickel, P.J. and K.A. Doksum (1977b): Mathematical statistics – Optimal tests and confi-
dence intervals: Likelihood ratio tests and related procedures, pp. 192-247, Holden-
Day Inc 1977
Bickel, P.J. and K.A. Doksum (1977c): Mathematical statistics – Basic ideas and selected
topics, pp. 369-406, Holden-Day Inc 1977
Bickel, P.J. and K.A. Doksum (1981): An analysis of transformations revisited, J. the
American Statistical Association 76 (1981) 296-311
Bickel, P.J., Doksum, K. and J.L. Hodges (1982): A Festschrift for Erich L. Lehmann,
Chapman and Hall, Boca Raton 1982
Bierman, G.J. (1977): Factorization Methods for discrete sequential estimation, Academic
Press, New York 1977
Bill, R. (1985b): Kriteriummatrizen ebener geodätischer Netze, Deutsche Geodätische
Kommission, München, Reihe A, No. 102
Bilodeau, M. and D. Brenner (1999): Theory of multivariate statistics, Springer Verlag
1999
Bilodeau, M. and P. Duchesne (2000): Robust estimation of the SUR model, The Cana-
dian J.Statistics 28 (2000) 277-288
Bingham, C. (1964): Distributions on the sphere and projective plane, Ph.D. Thesis, Yale
University 1964
Bingham, C. (1974): An antipodally symmetric distribution on the sphere, Ann. Statist. 2
(1974) 1201-1225
Bingham, C., Chang, T. and D. Richards (1992): Approximating the matrix Fisher and
Bingham distributions: Applications to spherical regression and Procrustes analysis,
J.Multivariate Analysis 41 (1992) 314-337
Bingham, N.H. (2001): Random Walk and fluctuation theory, D. N. Shanbhag and C.R.
Rao, eds., Handbook of Statistics 19 (2001) 171-213
Bini, D. and V. Pan (1994): Polynomial and matrix computations, Vol. 1: Fundamental
Algorithms, Birkhäuser, Boston 1994
Bill, R. (1984): Eine Strategie zur Ausgleichung und Analyse von Verdichtungsnetzen,
Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Re-
port C295, 91 pp., München 1984
Bischof, C.H. and G. Quintana-Orti (1998): Computing rank-revealing QR factorizations
of dense matrices, ACM Transactions on Mathematical Software 24 (1998) 226-253
Bischoff, W. (1992): On exact D-optimal designs for regression models with correlated
observations, Ann. Inst. Statist. Math. 44 (1992) 229-238
Bjerhammar, A. (1951a): Rectangular reciprocal matrices with special reference to calcu-
lation, Bull. Géodésique 20 (1951) 188-210
Bjerhammar, A. (1951b): Application of calculus of matrices to the method of least-
squares with special reference to geodetic calculations, Trans. RIT, No. 49, Stock-
holm 1951
Bjerhammar, A. (1955): En ny matrisalgebra, SLT 211-288, Stockholm 1955
Bjerhammar, A. (1958): A generalized matrix algebra, Trans. RIT, No. 124, Stockholm
1958
Bjerhammar, A. (1973): Theory of errors and generalized matrix inverses, Elsevier, Am-
sterdam 1973
Björck, A. (1967): Solving linear least squares problems by Gram-Schmidt orthogonaliza-
tion, Nordisk Tidskr. Informationsbehandlung (BIT) 7 (1967) 1-21
Björck, A. (1996): Numerical methods for least squares problems, SIAM, Philadelphia
1996
Björck, A. and G.H. Golub (1973): Numerical methods for computing angles between
linear subspaces, Mathematics of Computation 27 (1973) 579-594
Björkström, A. and R. Sundberg (1999): A generalized view on continuum regression,
Scandinavian J.Statistics 26 (1999) 17-30
Blaker, H. (1999): Shrinkage and orthogonal decomposition, Scandinavian J.Statistics 26
(1999) 1-15
Blewitt, G. (2000): Geodetic network optimization for geophysical parameters, Geophysi-
cal Research Letters 27 (2000) 2615-3618
Bloomfield, P. and W.L. Steiger (1983): Least absolute deviations - theory, applications
and algorithms, Birkhäuser Verlag, Boston 1983
Bobrow, J.E. (1989): A direct minimization approach for obtaining the distance between
convex polyhedra, Int. J. Robotics Research 8 (1989) 65-76
Boggs, P.T., Byrd, R.H. and R.B. Schnabel (1987): A stable and efficient algorithm for
nonlinear orthogonal distance regression, SIAM J. Sci. Stat. Comput. 8 (1987) 1052-
1078
Bolfarine, H. and M. de Castro (2000): ANOCOVA models with measurement errors,
Statistics & Probability Letters 50 (2000) 257-263
Bollerslev, T. (1986): Generalized autoregressive conditional heteroskedasticity, J.
Econometrics 31 (1986) 307-327
Booth, J.G. and J.P. Hobert (1998): Standard errors of prediction in generalized linear
mixed models, J. American Statist. Assoc. 93 (1998) 262-272
Bordes L., Nikulin, M. and V. Voinov (1997): Unbiased estimation for a multivariate
exponential whose components have a common shift, J. Multivar. Anal. 63 (1997)
199-221
Borg, I. and P. Groenen (1997): Modern multidimensional scaling, Springer Verlag, New
York 1997
Borovkov, A.A. (1998): Mathematical statistics, Gordon and Breach Science Publishers,
Amsterdam 1998
Borre, K. (2001): Plane networks and their applications, Birkhäuser Verlag, Basel 2001
Bose, R.C. (1944): The fundamental theorem of linear estimation, Proc. 31st Indian Sci-
entific Congress (1944) 2-3
Bossler, J. (1973): A note on the meaning of generalized inverse solutions in geodesy, J.
Geoph. Res. 78 (1973) 2616
Bossler, J., Grafarend, E.W. and R. Kelm (1973): Optimal design of geodetic nets II, J.
Geoph. Res. 78 (1973) 5887-5897
Boulware, D.G., Brown, L.S. and R.D. Peccei (1970): Deep inelastic electroproduction
and conformal symmetry, Physical Review D2 (1970) 293-298
Box, G.E.P. and D.R. Cox (1964): Analysis of transformations, J. the Royal Statistical
Society, Series B 26 (1964) 211-252
Box, G.E.P. and G. Tiao (1973): Bayesian inference in statistical analysis, Addison-
Wesley, Reading 1973
Box, G.E.P. and N.R. Draper (1987): Empirical model-building and response surfaces, J.
Wiley, New York 1987
Box, M.J. (1971): Bias in nonlinear estimation, J. Royal Statistical Society B33 (1971)
171-201
Branco, M.D. (2001): A general class of multivariate skew-elliptical distributions,
J.Multivariate Analysis 79 (2001) 99-113
Brandt, S. (1992): Datenanalyse. Mit statistischen Methoden und Computerprogrammen,
3. Aufl., BI Wissenschaftsverlag, Mannheim 1992
Brandt, S. (1999): Data analysis: statistical and computational methods for scientists and
engineers, 3rd ed., Springer Verlag, New York 1999
Braess, D. (1986): Nonlinear approximation theory, Springer-Verlag, Berlin 1986
Breckling, J. (1989): Analysis of directional time series: application to wind speed and
direction, Springer Verlag, Berlin 1989
Brémaud P. (1999): Markov Chains – Gibbs Fields, Monte Carlo Simulation and Queues,
Springer Verlag New York 1999
Breslow, N.E. and D.G. Clayton (1993): Approximate inference in generalized linear
mixed models, J. Amer. Statist. Assoc. 88 (1993) 9-25
Brezinski, C. (1999): Error estimates for the solution of linear systems, SIAM J. Sci.
Comput. 21 (1999) 764-781
Brill, M. and E. Schock (1987): Iterative solution of ill-posed problems - a survey, in:
Model optimization in exploration geophysics, ed. A. Vogel, Vieweg, Braunschweig
1987
Bro, R. (1997): A fast non-negativity-constrained least squares algorithm, J. Chemomet-
rics 11 (1997) 393-401
Bro, R. and S. de Jong (1997): A fast non-negativity-constrained least squares algorithm,
J.Chemometrics 11 (1997) 393-401
Brock, J.E. (1968): Optimal matrices describing linear systems, AIAA J. 6 (1968) 1292-
1296
Brockwell, P.J. (2001): Continuous-time ARMA Processes, D. N. Shanbhag and C.R.
Rao, eds., Handbook of Statistics 19 (2001) 249-276
Brovelli, M.A., Sanso, F. and G. Venuti (2003): A discussion on the Wiener-Kolmogorov
prediction principle with easy-to-compute and robust variants, J. Geodesy 76 (2003)
673-683
Brown, B. and R. Mariano (1989): Measures of deterministic prediction bias in nonlinear
models, Int. Econ. Rev. 30 (1989) 667-684
Brown, B.M., Hall, P. and G.A. Young (1997): On the effect of inliers on the spatial
median, J. Multivar. Anal. 63 (1997) 88-104
Brown, H. and R. Prescott (1999): Applied mixed models in medicine, J. Wiley, Chiches-
ter 1999
Brown, K.G. (1976): Asymptotic behavior of Minque-type estimators of variance compo-
nents, The Annals of Statistics 4 (1976) 746-754
Brualdi, R.A. and H. Schneider (1983): Determinantal identities: Gauss, Schur, Cauchy,
Sylvester, Kronecker, Jacobi, Binet, Laplace, Muir and Cayley, Linear Algebra Appl.
52/53 (1983) 765-791
Brunk, H.D. (1958): On the estimation of parameters restricted by inequalities, Ann.
Math. Statist. 29 (1958) 437-454
Brunner, F.K., Hartinger, H. and L. Troyer (1999): GPS signal diffraction modelling: the
stochastic SIGMA-Δ model, J. Geodesy 73 (1999) 259-267
Bruno, A.D. (2000): Power geometry in algebraic and differential equations, Elsevier,
Amsterdam-Lausanne-New York-Oxford-Shannon-Singapore-Tokyo 2000
Brzézniak, Z. and T. Zastawniak (1999): Basic stochastic processes, Springer Verlag,
Berlin 1999
Buja, A. (1996): What Criterion for a Power Algorithm?, Rieder, H. (editor): Robust
statistics, data analysis and computer intensive methods, In honour of Peter Huber’s
60th Birthday, Springer 1996
Buhmann, M.D. (2001): Approximation and interpolation with radial functions, In: Multi-
variate Approximation and Applications, pp. 25-43, Cambridge University Press,
Cambridge 2001
Bunday, B.D., Bokhari S.M.H. and K.H. Khan (1997): A new algorithm for the normal
distribution function, Sociedad de Estadistica e Investigacion Operativa 6 (1997) 369-
377
Bunke, H. and O. Bunke (1974): Identifiability and estimability, Math. Operations-
forschg. Statist. 5 (1974) 223-233
Bunke, H. and O. Bunke (1986): Statistical inference in linear models, J. Wiley, New
York 1986
Bunke, O. (1977): Mixed models, empirical Bayes and Stein estimators, Math. Opera-
tionsforschg. Ser. Statistics 8 (1977) 55-68
Buonaccorsi, J., Demidenko, E. and T. Tosteson (2000): Estimation in longitudinal ran-
dom effects models with measurement error, Statistica Sinica 10 (2000) 885-903
Burgio, G. and Y. Nikitin (1998): Goodness-of-fit tests for normal distribution of order p
and their asymptotic efficiency, Statistica 58 (1998) 213-230
Burns, F., Carlson, D., Haynsworth, E., and T. Markham (1974): Generalized inverse
formulas using the Schur complement, SIAM J. Appl. Math. 26 (1974) 254-259
Businger, P. and G.H. Golub (1965): Linear least squares solutions by Householder trans-
formations, Numer. Math., 7 (1965) 269-276
Butler, N.A. (1999): The efficiency of ordinary least squares in designed experiments
subject to spatial or temporal variation, Statistics & Probability Letters 41 (1999) 73-
81
Caboara, M. and E. Riccomagno (1998): An algebraic computational approach to the
identifiability of Fourier models, J. Symbolic Computation 26 (1998) 245-260
Cadet, A. (1996): Polar coordinates in R^np, application to the computation of the Wishart
and Beta laws, Sankhya: The Indian J.Statistics 58 (1996) 101-114
Cai, J., Grafarend, E.W. and B. Schaffrin (2004): The A-optimal regularization parameter
in uniform Tykhonov-Phillips regularization - α-weighted BLE -, V Hotine-Marussi
Symposium on Mathematical Geodesy, Matera / Italy 2003, in: International Associa-
tion of Geodesy Symposia 127, pp. 309-324, Springer Verlag Berlin – Heidelberg
2004
Cambanis, S. and I. Fakhre-Zakeri (1996): Forward and reversed time prediction of auto-
regressive sequences, J. Appl. Prob. 33 (1996) 1053-1060
Campbell, H.G. (1977): An introduction to matrices, vectors and linear programming, 2nd
ed., Printice Hall, Englewood Cliffs 1977
Candy, J.V. (1988): Signal processing, McGraw-Hill, New York 1988
Cantoni, E. (2003): Robust inference based on quasi-likelihoods for generalized linear
models and longitudinal data, Developments in Robust Statistics, pp. 114-124,
Physica Verlag, Heidelberg 2003
Carlin, B.P. and T.A. Louis (1996): Bayes and empirical Bayes methods, Chapman and
Hall, Boca Raton 1996
Carlitz, L. (1963): The inverse of the error function, Pacific J. Math. 13 (1963) 459-470
Carlson, D., Haynsworth, E. and T. Markham (1974): A generalization of the Schur com-
plement by means of the Moore-Penrose inverse, SIAM J. Appl. Math. 26 (1974) 169-
179
Carlson, D. (1986): What are Schur complements, anyway?, Linear Algebra and its Ap-
plications 74 (1986) 257-275
Carroll, J.D., Green, P.E. and A. Chaturvedi (1999): Mathematical tools for applied mul-
tivariate analysis, Academic Press, San Diego 1999
Carroll, J.D. and P.E. Green (1997): Mathematical tools for applied multivariate analysis,
Academic Press, San Diego 1997
Carroll, R.J. and D. Ruppert (1982): A comparison between maximum likelihood and
generalized least squares in a heteroscedastic linear model, J. American Statist.
Assoc. 77 (1982) 878-882
Carroll, R., Ruppert, D. and L. Stefanski (1995): Measurement error in nonlinear models,
Chapman and Hall, Boca Raton 1995
Carruthers, P. (1971): Broken scale invariance in particle physics, Phys. Lett. Rep. 1
(1971) 1-30
Caspary, W. (2000): Zur Analyse geodätischer Zeitreihen, Schriftenreihe, Heft 60-1,
Neubiberg 2001
Caspary, W. and K. Wichmann (1994): Lineare Modelle. Algebraische Grundlagen und
statistische Anwendungen, Oldenbourg Verlag, München / Wien 1994
Castillo, J. (1994): The singly truncated normal distribution, a non-steep exponential
family, Ann. Inst. Statist. Math 46 (1994) 57-66
Castillo, J. and M. Perez-Casany (1998): Weighted Poisson distributions for overdispersion
and underdispersion situations, Ann. Inst. Statist. Math. 50 (1998) 567-585
Castillo, J. and P. Puig (1997): Testing departures from gamma, Rayleigh and truncated
normal distributions, Ann. Inst. Statist.Math. 49 (1997) 255-269
Cayley, A. (1855): Sept différents mémoires d'analyse, No. 3, Remarque sur la notation
des fonctions algebriques, Journal für die reine und angewandte Mathematik 50
(1855) 282-285
Cayley, A. (1858): A memoir on the theory of matrices, Phil. Transactions, Royal Society
of London 148 (1858) 17-37
Cenkov, N.N. (1972): Statistical decision rule and optimal inference, Nauka 1972
Chan, K.-S. and H. Tong (2001): Chaos, a statistical perspective, Springer-Verlag, New
York 2001
Chan, K.-S. and H. Tong (2002): A note on the equivalence of two approaches for speci-
fying a Markov process, Bernoulli 8 (2002) 117-122
Chan, L.-Y. (2000): Optimal designs for experiments with mixtures: a survey, Commun.
Statist.-Theory Meth. 29 (2000) 2281-2312
Chan, T.F. and P.C. Hansen (1991): Some applications of the rank revealing QR factori-
zations, Numer. Linear Algebra Appl., 1 (1991) 33-44
Chan, T.F. and P.C. Hansen (1992): Some applications of the rank revealing QR factori-
zation, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 727-741
Chandrasekaran, S. (2000): An efficient and stable algorithm for the symmetric-definite
generalized eigenvalue problem, SIAM J. Matrix Anal. Appl. 21 (2000) 1202-1228
Chandrasekaran, S. and I.C.F. Ipsen (1995): Analysis of a QR algorithm for computing
singular values, SIAM J. Matrix Anal. Appl. 16 (1995) 520-535
Chandrasekaran, S., Gu, M. and A.H. Sayed (1998): A stable and efficient algorithm for
the indefinite linear least-squares problem, SIAM J. Matrix Anal. Appl. 20 (1998)
354-362
Chandrasekaran, S., Golub, G.H., Gu, M. and A.H. Sayed (1998): Parameter estimation in
the presence of bounded data uncertainties, SIAM J. Matrix Anal. Appl. 19 (1998)
235-252
Chang, F.-C. (1999): Exact D-optimal designs for polynomial regression without inter-
cept, Statistics & Probability Letters 44 (1999) 131-136
Chang, F.-C. and Y.-R. Yeh (1998): Exact A-optimal designs for quadratic regression,
Statistica Sinica 8 (1998) 527-533
Chang, T. (1986): Spherical regression, Annals of Statistics 14 (1986) 907-924
Chang, T. (1988): Estimating the relative rotations of two tectonic plates from boundary
crossings, American Statis. Assoc. 83 (1988) 1178-1183
Chang, T. (1993): Spherical regression and the statistics of tectonic plate reconstructions,
International Statis. Rev. 51 (1993) 299-316
Chapman, D.G. and H. Robbins (1951): Minimum variance estimation without regularity
assumptions, Ann. Math. Statist. 22 (1951) 581-586
Chartres, B.A. (1963): A geometrical proof of a theorem due to Slepian, SIAM Review 5
(1963) 335-341
Chatfield, C. and A.J. Collins (1981): Introduction to multivariate analysis, Chapman and
Hall, Boca Raton 1981
Chatterjee, S. and A.S. Hadi (1988): Sensitivity analysis in linear regression, J. Wiley,
New York 1988
Chatterjee, S. and M. Mächler (1997): Robust regression: a weighted least-squares ap-
proach, Commun. Statist. Theor. Meth. 26 (1997) 1381-1394
Chaturvedi, A. and A.T.K. Wan (1998): Stein-rule estimation in a dynamic linear model,
J. Appl. Stat. Science 7 (1998) 17-25
Chaturvedi, A. and A.T.K. Wan (2001): Stein-rule restricted regression estimator in a
linear regression model with nonspherical disturbances, Commun. Statist.-Theory
Meth. 30 (2001) 55-68
Chaturvedi, A. and A.T.K. Wan (1999): Estimation of regression coefficients subject to
interval constraints in models with non-spherical errors, Indian J.Statistics 61 (1999)
433-442
Chaubey, Y.P. (1980): Minimum norm quadratic estimators of variance components,
Metrika 27 (1980) 255-262
Chen, C. (2003): Robust tools in SAS, Developments in Robust Statistics, pp. 125-133,
Physica Verlag, Heidelberg 2003
Chen, H.-C. (1998): Generalized reflexive matrices: Special properties and applications,
Society for Industrial and Applied Mathematics, 9 (1998) 141-153
Chen, R.-B. and M.-N.L. Huang (2000): Exact D-optimal designs for weighted polyno-
mial regression model, Computational Statistics & Data Analysis 33 (2000) 137-149
Chen, X. (2001): On maxima of dual function of the CDT subproblem, J. Comput.
Mathematics 19 (2001) 113-124
Chen, Z. and J. Mi (1996): Confidence interval for the mean of the exponential distribu-
tion, based on grouped data, IEEE Transactions on Reliability 45 (1996) 671-677
Cheng, C.L. (1998): Polynomial regression with errors in variables, J. Royal Statistical
Soc. B60 (1998) 189-199
Cheng, C.L. and J.W. van Ness (1999): Statistical regression with measurement error,
Arnold Publ., London 1999
Cheng, C.-S. (1996): Optimal design: exact theory, Handbook of Statistics 13 (1996) 977-
1006
Cherrie, J.B., Beatson, R.K. and G.N. Newsam (2002): Fast evaluation of radial basis
functions: methods for generalized multiquadrics in R^n, SIAM J. Sci. Comput. 23
(2002) 1459-1571
Chiang, C.-Y. (1998): Invariant parameters of measurement scales, British J.Mathematical
and Statistical Psychology 51 (1998) 89-99
Chiang, C.L. (2003): Statistical methods of analysis, University of California, Berkeley,
USA 2003
Chikuse, Y. (1999): Procrustes analysis on some special manifolds, Commun. Statist.
Theory Meth. 28 (1999) 885-903
Chilès, J.P. and P. Delfiner (1999): Geostatistics - modelling spatial uncertainty, J. Wiley,
New York 1999
Chiodi, M. (1986): Procedures for generating pseudo-random numbers from a normal
distribution of order p, Riv. Stat. Applic. 19 (1986) 7-26
Chmielewski, M.A. (1981): Elliptically symmetric distributions: a review and bibliogra-
phy, International Statistical Review 49 (1981) 67-74
Chow, T.L. (2000): Mathematical methods for physicists, Cambridge University Press,
Cambridge 2000
Chow, Y.S. and H. Teicher (1978): Probability theory, Springer Verlag, New York 1978
Christensen, R. (1996): Analysis of variance, design and regression, Chapman and Hall,
Boca Raton 1996
Chu, M.T. and N.T. Trendafilov (1998): Orthomax rotation problem. A differential equa-
tion approach, Behaviormetrika 25 (1998) 13-23
Chui, C.K. and G. Chen (1989): Linear Systems and optimal control, Springer Verlag,
New York 1989
Chui, C.K. and G. Chen (1991): Kalman filtering with real time applications, Springer
Verlag, New York 1991
Clark, G.P.Y. (1980): Moments of the least squares estimators in a nonlinear regression
model, J. R. Statist. Soc. B42 (1980) 227-237
Clerc-Bérod, A. and S. Morgenthaler (1997): A close look at the hat matrix, Student 2
(1997) 1-12
Cobb, G.W. (1997): Introduction to design and analysis of experiments, Springer Verlag,
New York 1997
Cobb, L., Koppstein, P. and N.H. Chen (1983): Estimation and moment recursion rela-
tions for multimodal distributions of the exponential family, J. American Statist.
Assoc. 78 (1983) 124-130
Cochran, W. (1972a): Some effects of errors of measurement on linear regression, in:
Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Prob-
ability, pages 527-539, UCP, Berkeley 1972
Cochran, W. (1972b): Stichprobenverfahren, de Gruyter, Berlin 1972
Cohen, A. (1966): All admissible linear estimates of the mean vector, Ann. Math. Statist.
37 (1966) 458-463
Cohen, C. and A. Ben-Israel (1969): On the computation of canonical correlations, Ca-
hiers Centre Études Recherche Opér 11 (1969) 121-132
Collett, D. (1992): Modelling binary data, Chapman and Hall, Boca Raton 1992
Collett, D. and T. Lewis (1981): Discriminating between the von Mises and wrapped
normal distributions, Austr. J. Statist. 23 (1981) 73-79
Colton, D., Coyle, J. and P. Monk (2000): Recent developments in inverse acoustic scat-
tering theory, SIAM Review 42 (2000) 369-414
Cook, R.D., Tsai, C.L. and B.C. Wei (1986): Bias in nonlinear regression, Biometrika 73
(1986) 615-623
Cook, R.D. and S. Weisberg (1982): Residuals and influence in regression, Chapman and
Hall, London 1982
Cottle, R.W. (1974): Manifestations of the Schur complement, Linear Algebra Appl. 8
(1974) 189-211
Cox, A.J. and N.J. Higham (1999): Row-wise backward stable elimination methods for
the equality constrained least squares problem, SIAM J. Matrix Anal. Appl. 21 (1999)
313-326
Cox, D., Little, J. and D. O’Shea (1992): Ideals, varieties and algorithms, Springer Verlag,
New York 1992
Cox, D.R. and D.V. Hinkley (1979): Theoretical statistics, Chapman and Hall, Boca
Raton 1979
Cox, D.R. and V. Isham (1980): Point processes, Chapman and Hall, Boca Raton 1980
Cox, D.R. and E.J. Snell (1989): Analysis of binary data, Chapman and Hall, Boca Raton
1989
Cox, D.R. and N. Reid (2000): The theory of the design of experiments, Chapman & Hall,
Boca Raton 2000
Cox, D.R. and N. Wermuth (1996): Multivariate dependencies, Chapman and Hall, Boca
Raton 1996
Cox, T.F. and M.A.A. Cox (2001): Multidimensional scaling, Chapman and Hall, Boca
Raton, Florida 2001
Cox, D.R. and P.J. Solomon (2003): Components of variance, Chapman & Hall/CRC,
Boca Raton – London – New York – Washington D.C. 2003
Craig, A.T. (1943): Note on the independence of certain quadratic forms, The Annals of
Mathematical Statistics 14 (1943) 195-197
Cross, P.A. (1985): Numerical methods in network design, in: Grafarend, E.W. and F.
Sanso (eds.), Optimization and design of geodetic networks, pp. 429-435, Springer,
Berlin - Heidelberg - New York 1985,
Crowder, M.J. (1987): On linear and quadratic estimating function, Biometrika 74 (1987)
591-597
Crowder, M.J. and D.J. Hand (1990): Analysis of repeated measures, Chapman and Hall,
Boca Raton 1990
Crowder, M.J., Sweeting, T. and R. Smith (1994): Statistical analysis of reliability data,
Chapman and Hall, Boca Raton 1994
Crowder, M.J., Kimber, A., Sweeting, T., and R. Smith (1993): Statistical analysis of
reliability data, Chapman and Hall, Boca Raton 1993
Csörgö, M. and L. Horvath (1993): Weighted approximations in probability and statistics,
J. Wiley, Chichester 1993
Csörgö, S. and L. Viharos (1997): Asymptotic normality of least-squares estimators of tail
indices, Bernoulli 3 (1997) 351-370
Csörgö, S. and J. Mielniczuk (1999): Random-design regression under long-range de-
pendent errors, Bernoulli 5 (1999) 209-224
Cummins, D. and A.C. Webster (1995): Iteratively reweighted partial least-squares: a
performance analysis by Monte Carlo simulation, J. Chemometrics 9 (1995) 489-507
Cunningham, E. (1910): The principle of relativity in electrodynamics and an extension
thereof, Proc. London Math. Soc. 8 (1910) 77-98
Czuber, E. (1891): Theorie der Beobachtungsfehler, Leipzig 1891
D'Agostino, R. and M.A. Stephens (1986): Goodness-of-fit techniques, Marcel Dekker,
New York 1986
Daniel, J.W. (1967): The conjugate gradient method for linear and nonlinear operator
equations, SIAM J. Numer. Anal. 4 (1967) 10-26
Dantzig, G.B. (1940): On the nonexistence of tests of "Student's" hypothesis having
power functions independent of σ², Ann. Math. Statist. 11 (1940) 186-192
Das, I. (1996): Normal-boundary intersection: A new method for generating the Pareto
surface in nonlinear multicriteria optimization problems, SIAM J. Optim. 3 (1998)
631ff.
Das, R. and B.K. Sinha (1987): Robust optimum invariant unbiased tests for variance
components. In Proc. of the Second International Tampere Conference in Statistics. T.
Pukkila and S. Puntanen (eds.), University of Tampere - Finland (1987) 317-342
Das Gupta, S. (1980): Distribution of the correlation coefficient, in: Fienberg, S., Gani, J.,
Kiefer, J. and K. Krickeberg (eds): Lecture notes in statistics, pp. 9-16, Springer Ver-
lag 1980
Das Gupta, S., Mitra, S.K., Rao, P.S., Ghosh, J.K., Mukhopadhyay, A.C. and Y.R. Sarma
(1994a): Selected papers of C.R.Rao, vol. 1, John Wiley, New York 1994
Das Gupta, S., Mitra, S.K., Rao, P.S., Ghosh, J.K., Mukhopadhyay, A.C. and Y.R. Sarma
(1994b): Selected papers of C.R.Rao, vol. 2, John Wiley, New York 1994
David, F.N. and N.L. Johnson (1948): The probability integral transformation when pa-
rameters are estimated from the sample, Biometrika 35 (1948) 182-190
David, F.N. (1954): Tables of the ordinates and probability integral of the distribution of
the correlation coefficient in small samples, Cambridge University Press, London
1954
David, H.A. (1957): Some notes on the statistical papers of Friedrich Robert Helmert
(1843-1917), Bull. Stat. Soc. NSW 19 (1957) 25-28
David, H.A. (1970): Order Statistics, J. Wiley, New York 1970
David, H.A., Hartley, H.O. and E.S. Pearson (1954): The distribution of the ratio, in a
single normal sample, of range to standard deviation, Biometrika 41 (1954) 482-493
Davidian, M. and A.R. Gallant (1993): The nonlinear mixed effects model with a smooth
random effects density, Biometrika 80 (1993) 475-488
Davidian, M., and D.M. Giltinan (1995): Nonlinear models for repeated measurement
data, Chapman and Hall, Boca Raton 1995
Davis, R.A. (1997): M-estimation for linear regression with infinite variance, Probability
and Mathematical Statistics 17 (1997) 1-20
Davis, R. A. and W.T.M. Dunsmuir (1997): Least absolute deviation estimation for re-
gression with ARMA errors, J. theor. Prob. 10 (1997) 481-497
Davis, J.H. (2002): Foundations of deterministic and stochastic control, Birkhäuser, Bos-
ton-Basel-Berlin 2002
Davison, A.C. and D.V. Hinkley (1997): Bootstrap methods and their application, Cam-
bridge University Press, Cambridge 1997
Dax, A. (1997): An elementary proof of Farkas’ lemma, SIAM Rev. 39 (1997) 503-507
Decreusefond, L. and A.S. Üstünel (1999): Stochastic analysis of the fractional Brownian
motion, Potential Anal. 10 (1999) 177-214
Dedekind, R. (1901): Gauß in seiner Vorlesung über die Methode der kleinsten Quadrate,
Berlin 1901
Defant, A. and K. Floret (1993): Tensor norms and operator ideals, North Holland, Am-
sterdam 1993
Deitmar, A. (2002): A first course in harmonic analysis, Springer Verlag, New York 2002
Demidenko, E. (2000): Is this the least squares estimate?, Biometrika 87 (2000) 437-452
Denham, M.C. (1997): Prediction intervals in partial least-squares, J. Chemometrics 11
(1997) 39-52
Denham, W. and S. Pines (1966): Sequential estimation when measurement function
nonlinearity is comparable to measurement error, AIAA J4 (1966) 1071-1076
Denis, J.-B. and A. Pazman (1999): Bias of LS estimators in nonlinear regression models
with constraints. Part II: Biadditive models, Applications of Mathematics 44 (1999)
375-403
Denison, D.G.T., Walden, A.T., Balogh, A. and R.J. Forsyth (1999): Multilayer testing of
spectral lines and the detection of the solar rotation frequency and its harmonics,
Appl. Statist. 48 (1999) 427-439
Dermanis, A. (1977): Geodetic linear estimation techniques and the norm choice problem,
Manuscripta Geodaetica 2 (1977) 15-97
Dermanis, A. (1978): Adjustment of geodetic observations in the presence of signals,
International School of Advanced Geodesy, Erice, Sicily, May-June 1978, Bollettino
di Geodesia e Scienze Affini 38 (1979) 513-539
Dermanis, A. (1998): Generalized inverses of nonlinear mappings and the nonlinear geo-
detic datum problem, J. Geodesy 72 (1998) 71-100
Dermanis, A. and E.W. Grafarend (1981): Estimability analysis of geodetic, astrometric
and geodynamical quantities in Very Long Baseline Interferometry, Geoph. J. R. As-
tronom. Soc. 64 (1981) 31-56
Dermanis, A. and F. Sanso (1995): Nonlinear estimation problems for nonlinear models,
Manuscripta Geodaetica 20 (1995) 110-122
Dermanis, A. and R. Rummel (2000a): Parameter estimation as an inverse problem, Lec-
ture Notes in Earth Sciences 95 (2000) 24-47
Dermanis, A. and R. Rummel (2000b): The statistical approach to parameter determina-
tion: Estimation and prediction, Lecture Notes in Earth Sciences 95 (2000) 53-73
Dermanis, A. and R. Rummel (2000c): From finite to infinite-dimensional models (or
from discrete to continuous models), Lecture Notes in Earth Sciences 95 (2000) 53-73
Dermanis, A. and R. Rummel (2000d): Data analysis methods in geodesy, Lecture Notes
in Earth Sciences 95, Springer 2000
De Santis, A. (1991): Translated origin spherical cap harmonic analysis, Geoph. J. Int 106
(1991) 253-263
Dette, H. (1993): A note on E-optimal designs for weighted polynomial regression, Ann.
Stat. 21 (1993) 767-771
Dette, H. (1997a): Designing experiments with respect to 'standardized' optimality crite-
ria, J.R. Statist. Soc. B 59 (1997) 97-110
Dette, H. (1997b): E-optimal designs for regression models with quantitative factors – a
reasonable choice?, The Canadian J.Statistics 25 (1997) 531-543
Dette, H. and W. J. Studden (1993): Geometry of E-optimality, Ann. Stat. 21 (1993) 416-
433
Dette, H. and W. J. Studden (1997): The theory of canonical moments with applications in
statistics, probability, and analysis, J. Wiley, New York 1997
Dette, H. and T.E. O'Brien (1999): Optimality criteria for regression models based on
predicted variance, Biometrika 86 (1999) 93-106
Deutsch, F. (2001): Best approximation in inner product spaces, Springer Verlag, New
York 2001
Devidas, M. and E.O. George (1999): Monotonic algorithms for maximum likelihood
estimation in generalized linear models, The Indian J.Statistical 61 (1999) 382-396
Dewess, G. (1973): Zur Anwendung der Schätzmethode MINQUE auf Probleme der
Prozeßbilanzierung, Math. Operationsforschg. Statistik 4 (1973) 299-313
DiCiccio, T.J. and B. Efron (1996): Bootstrap confidence intervals, Statistical Science 11
(1996) 189-228
Diebolt, J. and J. Zuber (1999): Goodness-of-fit tests for nonlinear heteroscedastic regres-
sion models, Statistics & Probability Letters 42 (1999) 53-60
Dieck, T. (1987): Transformation groups, W de Gruyter, Berlin - New York 1987
Diggle, P.J., Liang, K.Y. and S.L. Zeger (1994): Analysis of longitudinal data, Clarendon
Press, Oxford 1994
Ding, C.G. (1999): An efficient algorithm for computing quantiles of the noncentral chi-
squared distribution, Computational Statistics & Data Analysis 29 (1999) 253-259
Dixon, W.J. (1951): Ratios involving extreme values, Ann. Math. Statistics 22 (1951) 68-
78
Dobson, A.J. (1990): An introduction to generalized linear models, Chapman and Hall,
Boca Raton 1990
Dobson, A.J. (2002): An introduction to generalized linear models, 2nd ed., Chapman -
Hall - CRC, Boca Raton 2002
Dodge, Y. (1987): Statistical data analysis based on the L1-norm and related methods,
Elsevier, Amsterdam 1987
Dodge, Y. (1997): LAD Regression for Detecting Outliers in Response and Explanatory
Variables, J. Multivariate Analysis 61 (1997) 144-158
Dodge, Y. and A.S. Hadi (1999): Simple graphs and bounds for the elements of the hat
matrix, J. Applied Statistics 26 (1999) 817-823
Dodge, Y. and D. Majumdar (1979): An algorithm for finding least square generalized
inverses for classification models with arbitrary patterns, J. Statist. Comput. Simul. 9
(1979) 1-17
Dodge, Y. and J. Jurecková (1997): Adaptive choice of trimming proportion in trimmed
least-squares estimation, Statistics & Probability Letters 33 (1997) 167-176
Donoho, D.L. and P.J. Huber (1983): The notion of breakdown point, Festschrift für Erich
L. Lehmann, eds. P.J. Bickel, K.A. Doksum and J.L. Hodges, Wadsworth, Belmont,
Calif. 157-184, 1983
Dorea, C.C.Y. (1997): L1-convergence of a class of algorithms for global optimization,
Student 2 (1997)
Downs, T.D. and A.L. Gould (1967): Some relationships between the normal and von
Mises distributions, Biometrika 54 (1967) 684-687
Dragan, V. and A. Halanay (1999): Stabilization of linear systems, Birkhäuser Boston –
Basel – Berlin 1999
Draper, N.R. and R. C. van Nostrand (1979): Ridge regression and James-Stein estima-
tion: review and comments, Technometrics 21 (1979) 451-466
Draper, N.R. and J.A. John (1981): Influential observations and outliers in regression,
Technometrics 23 (1981) 21-26
Draper, N.R. and F. Pukelsheim (1996): An overview of design of experiments, Statistical
Papers 37 (1996) 1-32
Draper, N.R. and F. Pukelsheim (2000): Ridge analysis of mixture response surfaces,
Statistics & Probability Letters 48 (2000) 131-140
Draper, N.R. and C.R. Van Nostrand (1979): Ridge regression and James Stein estima-
tion: review and comments, Technometrics 21 (1979) 451-466
Driscoll, M.F. (1999): An improved result relating quadratic forms and Chi-Square Dis-
tributions, The American Statistician 53 (1999) 273-275
Driscoll, M.F. and B. Krasnicka (1995): An accessible proof of Craig’s theorem in the
general case, The American Statistician 49 (1995) 59-62
Droge, B. (1998): Minimax regret analysis of orthogonal series regression estimation:
selection versus shrinkage, Biometrika 85 (1998) 631-643
Drygas, H. (1975): Estimation and prediction for linear models in general spaces, Math.
Operationsforsch. Statistik 6 (1975) 301-324
Drygas, H. (1983): Sufficiency and completeness in the general Gauss-Markov model,
Sankhya Ser. A 45 (1983) 88-98
Du, Z. and D.P. Wiens (2000): Jackknifing, weighting, diagnostics and variance estima-
tion in generalized M-estimation, Statistics & Probability Letters 46 (2000) 287-299
Duan, J.C. (1997): Augmented GARCH (p,q) process and its diffusion limit, J. Economet-
rics 79 (1997) 97-127
Duncan, W.J. (1944): Some devices for the solution of large sets of simultaneous linear
equations, London, Edinburgh and Dublin Philosophical Magazine and J.Science (7th
series) 35 (1944) 660-670
Dufour, J.M. (1986): Bias of s² in linear regression with dependent errors, The American
Statistician 40 (1986) 284-285
Dunnett, C.W. and M. Sobel (1954): A bivariate generalization of Student’s t-distribution,
with tables for certain special cases, Biometrika 41 (1954) 153-169
Dupuis, D.J. and C.A. Field (1998): A comparison of confidence intervals for generalized
extreme-value distributions, J. Statist. Comput. Simul. 61 (1998) 341-360
Durand, D. and J.A. Greenwood (1957): Random unit vectors II: usefulness of Gram-
Charlier and related series in approximating distributions, Ann. Math. Statist. 28
(1957) 978-986
Durbin, J. and G.S. Watson (1950): Testing for serial correlation in least squares regres-
sion, Biometrika 37 (1950) 409-428
Durbin, J. and G.S. Watson (1951): Testing for serial correlation in least squares regres-
sion II, Biometrika 38 (1951) 159-177
D’Urso, P. and T. Gastaldi (2000): A least-squares approach to fuzzy linear regression
analysis, Computational Statistics & Data Analysis 34 (2000) 427-440
Dyn, N., Leviatan, D., Levin, D. and A. Pinkus (2001): Multivariate approximation and
applications, Cambridge University Press, Cambridge 2001
Ecker, E. (1977): Ausgleichung nach der Methode der kleinsten Quadrate, Öst. Z. Ver-
messungswesen 64 (1977) 41-53
Eckert, M. (1935): Eine neue flächentreue (azimutale) Erdkarte, Petermann’s Mitteilungen
81 (1935) 190-192
Eckart, C. and G. Young (1939): A principal axis transformation for non-Hermitian
matrices, Bull. Amer. Math. Soc. 45 (1939) 118-121
Eckl, M.C., Snay, R.A., Solder, T., Cline, M.W. and G.L. Mader (2001): Accuracy of
GPS-derived positions as a function of interstation distance and observing-session du-
ration, J. Geodesy 75 (2001) 633-640
Edelman, A. (1989): Eigenvalues and condition numbers of random matrices, PhD disser-
tation, Massachusetts Institute of Technology 1989
Edelman, A. (1998): The geometry of algorithms with orthogonality constraints, SIAM J.
Matrix Anal Appl. 20 (1998) 303-353
Edelman, A., Arias, T.A. and Smith, S.T. (1998): The geometry of algorithms with or-
thogonality constraints, SIAM J. Matrix Anal. Appl. 20 (1998) 303-353
Edelman, A., Elmroth, E. and B. Kagström (1997): A geometric approach to perturbation
theory of matrices and matrix pencils. Part I: Versal deformations, SIAM J. Matrix
Anal. Appl. 18 (1997) 653-692
Edgar, G.A. (1998): Integral, probability, and fractal measures, Springer Verlag, New
York 1998
Edgeworth, F.Y. (1883): The law of error, Philosophical Magazine 16 (1883) 300-309
Edlund, O., Ekblom, H. and K. Madsen (1997): Algorithms for non-linear M-estimation,
Computational Statistics 12 (1997) 373-383
Eeg, J. and T. Krarup (1973): Integrated geodesy, Danish Geodetic Institute, Report No.
7, Copenhagen 1973
Effros, E.G. (1997): Dimensions and C* algebras, Regional Conference Series in Mathe-
matics 46, Rhode Island 1997
Efromovich, S. (2000): Can adaptive estimators for Fourier series be of interest to wave-
lets?, Bernoulli 6 (2000) 699-708
Efron, B. and R.J. Tibshirani (1994): An introduction to the bootstrap, Chapman and Hall,
Boca Raton 1994
El-Bassiouni, M.Y. and J. Seely (1980): Optimal tests for certain functions of the parame-
ters in a covariance matrix with linear structure, Sankhya: The Indian J. Statistics 42
(1980) 64-77
Ekblom, H. and S. Henriksson (1969): Lp-criteria for the estimation of location parame-
ters, SIAM J. Appl. Math. 17 (1969) 1130-1141
Elden, L. (1977): Algorithms for the regularization of ill-conditioned least squares prob-
lems, BIT 17 (1977) 134-145
Elhay, S., Golub, G.H. and J. Kautsky (1991): Updating and downdating of orthogonal
polynomials with data fitting applications, SIAM J. Matrix Anal. Appl. 12 (1991)
327-353
Elian, S.N. (2000): Simple forms of the best linear unbiased predictor in the general linear
regression model, American Statistician 54 (2000) 25-28
Ellis, R.L. and I. Gohberg (2003): Orthogonal systems and convolution operators, Birk-
häuser Verlag, Basel-Boston-Berlin 2003
Ellenberg, J.H. (1973): The joint distribution of the standardized least squares residuals
from a general linear regression, J. the American Statistical Association 68 (1973)
941-943
Elpelt, B. (1989): On linear statistical models of commutative quadratic type, Commun.
Statist.-Theory Method 18 (1989) 3407-3450
Elfving, G. (1952): Optimum allocation in linear regression theory, Ann. Math. Stat. 23
(1952) 255-263
El-Sayed, S.M. (1996): The sampling distribution of ridge parameter estimator, Egyptian
Statistical Journal, ISSR – Cairo University 40 (1996) 211-219
Engel, J. and A. Kneip (1995): Model estimation in nonlinear regression, Lecture Notes in
Statistics 104 (1995) 99-107
Engl, H.W., Hanke, M. and A. Neubauer (1996): Regularization of inverse problems,
Kluwer Academic Publishers, Dordrecht 1996
Engl, H.W., Louis, A.K. and W. Rundell (1997): Inverse problems in geophysical applica-
tions, SIAM, Philadelphia 1997
Engler, K., Grafarend, E.W., Teunissen, P. and J. Zaiser (1982): Test computations of
three-dimensional geodetic networks with observables in geometry and gravity space,
Proceedings of the International Symposium on Geodetic Networks and Computa-
tions. Vol. VII, 119-141. Report B 258/VII. Deutsche Geodätische Kommission, Bay-
erische Akademie der Wissenschaften, München 1982.
Ernst, M.D. (1998): A multivariate generalized Laplace distribution, Computational Sta-
tistics 13 (1998) 227-232
Eubank, R.L. and P. Speckman (1991): Convergence rates for trigonometric and polyno-
mial-trigonometric regression estimators, Statistical & Probability Letters 11 (1991)
119-124
Euler, N. and W.H. Steeb (1992): Continuous symmetry, Lie algebras and differential
equations, B.I. Wissenschaftsverlag, Mannheim 1992
Even-Tzur, G. (1998): Application of the set covering problem to GPS measurements,
Surveying and Land Information Systems 58 (1998) 25-29
Even-Tzur, G. (1999): Reliability designs and control of geodetic networks, Z. Vermes-
sungswesen 4 (1999) 128-134
Even-Tzur, G. (2001): Graph theory application to GPS networks, GPS Solution 5 (2001)
31-38
Everitt, B.S. (1987): Introduction to optimization methods and their application in statis-
tics, Chapman and Hall, London 1987
Fagnani, F. and L. Pandolfi (2002): A singular perturbation approach to a recursive de-
convolution problem, SIAM J. Control Optim 40 (2002) 1384-1405
Fahrmeir, L. and G. Tutz (2001): Multivariate statistical modelling based on generalized
linear models, Springer Verlag, New York 2001
Fakeev, A.G. (1981): A class of iterative processes for solving degenerate systems of
linear algebraic equations, USSR. Comp. Maths. Math. Phys. 21 (1981) 15-22
Falk, M., Hüsler, J. and R.D. Reiss (1994): Laws of small numbers: extremes and rare
events, Birkhäuser Verlag, Basel 1994
Fan, J. and I. Gijbels (1996): Local polynomial modelling and its applications, Chapman
and Hall, Boca Raton 1996
Fang, K.-T. and Y. Wang (1993): Number-theoretic methods in statistics, Chapman and
Hall, Boca Raton 1993
Fang, K.-T. and Y.-T. Zhang (1990): Generalized multivariate analysis, Science Press
Beijing - Springer Verlag, Beijing - Berlin 1990
Fang, K.-T., Kotz, S. and K.W. Ng (1990): Symmetric multivariate and related distribu-
tions, Chapman and Hall, London 1990
Fang, Z. and D.P. Wiens (2000): Integer-valued, minimax robust designs for estimation
and extrapolation in heteroscedastic, approximately linear models, J. the American
Statistical Association 95 (2000) 807-818
Farahmand, K. (1996): Random polynomials with complex coefficients, Statistics &
Probability Letters 27 (1996) 347-355
Farahmand, K. (1999): On random algebraic polynomials, Proceedings of the American
Math. Soc. 127 (1999) 3339-3344
Farebrother, R.W. (1987): The historical development of the L1 and L∞ estimation proce-
dures, Statistical Data Analysis Based on the L1-Norm and Related Methods, Y.
Dodge (ed.) 1987
Farebrother, R.W. (1988): Linear least squares computations, Dekker, New York 1988
Farebrother, R.W. (1999): Fitting linear relationships, Springer Verlag, New York 1999
Farrell, R.H. (1964): Estimators of a location parameter in the absolutely continuous case,
Ann. Math. Statist. 35 (1964) 949-998
Fassò, A. (1997): On a rank test for autoregressive conditional heteroscedasticity, Student
2 (1997) 85-94
Faulkenberry, G.D. (1973): A method of obtaining prediction intervals, J. Amer. Statist.
Ass. 68 (1973) 433-435
Fausett, D.W. and C.T. Fulton (1994): Large least squares problems involving Kronecker
products, SIAM J. Matrix Anal. Appl. 15 (1994) 219-227
Fedi, M. and G. Florio (2002): A stable downward continuation by using the ISVD
method, Geophys. J. Int. 151 (2002) 146-156
Fedorov, V.V. and P. Hackl (1997): Model-oriented design of experiments, Springer
Verlag, New York 1997
Fedorov, V.V., Montepiedra, G. and C.J. Nachtsheim (1999): Design of experiments for
locally weighted regression, J. Statistical Planning and Inference 81 (1999) 363-382
Feinstein, A.R. (1996): Multivariate analysis, Yale University Press, New Haven 1996
Fengler, M., Freeden, W. and V. Michel (2003): The Kaiserslautern multiscale geopoten-
tial model SWITCH-03 from orbit perturbations of the satellite CHAMP and its com-
parison to the models EGM96, UCPH2002_02_0.5, EIGEN-1s, and EIGEN-2, Geo-
physical Journal International (submitted) 2003
Feuerverger, A. and P. Hall (1998): On statistical inference based on record values, Ex-
tremes 1:2 (1998) 169-190
Fiebig, D.G., Bartels, R. and W. Krämer (1996): The Frisch-Waugh theorem and general-
ized least squares, Econometric Reviews 15 (1996) 431-443
Fierro, R.D. (1996): Perturbation analysis for two-sided (or complete) orthogonal decom-
positions, SIAM J. Matrix Anal. Appl. 17 (1996) 383-400
Fierro, R.D. and J.R. Bunch (1995): Bounding the subspaces from rank revealing two-
sided orthogonal decompositions, SIAM J. Matrix Anal. Appl. 16 (1995) 743-759
Fierro, R.D. and P.C. Hansen (1995): Accuracy of TSVD solutions computed from rank-
revealing decompositions, Numer. Math. 70 (1995) 453-471
Fierro, R.D. and P.C. Hansen (1997): Low-rank revealing UTV decompositions, Numer.
Algorithms 15 (1997) 37-55
Fill, J.A. and D.E. Fishkind (1999): The Moore-Penrose generalized inverse for sums of
matrices, SIAM J. Matrix Anal. Appl. 21 (1999) 629-635
Fisher, N.I. (1993): Statistical analysis of circular data, Cambridge University Press,
Cambridge 1993
Fisher, N.J. (1985): Spherical medians, J. Royal Statistical Society, Series B: 47 (1985)
342-348
Fisher, N.J. and A.J. Lee (1983): A correlation coefficient for circular data, Biometrika 70
(1983) 327-332
Fisher, N.J. and A.J. Lee (1986): Correlation coefficients for random variables on a sphere
or hypersphere, Biometrika 73 (1986) 159-164
Fisher, N.I. and P. Hall (1989): Bootstrap confidence regions for directional data, J.
American Statist. Assoc. 84 (1989) 996-1002
Fisher, R.A. (1915): Frequency distribution of the values of the correlation coefficient in
samples from an indefinitely large population, Biometrika 10 (1915) 507-521
Fisher, R.A. (1935): The fiducial argument in statistical inference, Annals of Eugenics 6
(1935) 391-398
Fisher, R.A. (1939): The sampling distribution of some statistics obtained from nonlinear
equations, Ann. Eugen. 9 (1939) 238-249
Fisher, R.A. (1953): Dispersion on a sphere, Pro. Roy. Soc. Lond. A 217 (1953) 295-305
Fisher, R.A. and F. Yates (1942): Statistical tables for biological, agricultural and medical
research, 2nd edition, Oliver and Boyd, Edinburgh 1942
Fisz, M. (1970): Wahrscheinlichkeitsrechnung und mathematische Statistik, WEB deut-
scher Verlag der Wissenschaften, Berlin 1970
Fitzgerald, W.J., Smith, R.L., Walden, A.T. and P.C. Young (2001): Non-linear and non-
stationary signal processing, Cambridge University Press, Cambridge 2001
Fletcher, R. and C.M. Reeves (1964): Function minimization by conjugate gradients,
Comput. J. 7 (1964) 149-154
Flury, B. (1997): A first course in multivariate statistics, Springer Verlag, New York 1997
Focke, J. and G. Dewess (1972): Über die Schätzmethode MINQUE von C.R. Rao und
ihre Verallgemeinerung, Math. Operationsforschg. Statistik 3 (1972) 129-143
Foerstner, W. (1979a): Konvergenzbeschleunigung bei der a posteriori Varianzschätzung,
Z. Vermessungswesen 104 (1979) 149-156
Foerstner, W. (1979b): Ein Verfahren zur Schätzung von Varianz- und Kovarianz- Kom-
ponenten, Allg. Vermessungsnachrichten 86 (1979) 446-453
Foerstner, W. (1983): Reliability and discernability of extended Gauss-Markov models,
in: Seminar – Mathematical models of geodetic/Photogrammetric point determination
with regard to outliers and systematic errors, Ackermann, F.E. (ed), München 1983
Foerstner, W. and B. Moonen (2003): A metric for covariance matrices, in: E.W. Grafar-
end, F. Krumm and V. Schwarze: Geodesy – the Challenge of the 3rd Millenium, pp.
299-309, Springer Verlag, Berlin 2003
Forsgren, A. and W. Murray (1997): Newton methods for large-scale linear inequality-
constrained minimization, SIAM J. Optim. 7 (1997) 162-176
Forsgren, A. and G. Sporre (2001): On weighted linear least-squares problems related to
interior methods for convex quadratic programming, SIAM J. Matrix Anal. Appl. 23
(2001) 42-56
Forsythe, A.B. (1972): Robust estimation of straight line regression coefficients by mini-
mizing p-th power deviations, Technometrics 14 (1972) 159-166
Foster, L.V. (2003): Solving rank-deficient and ill-posed problems using UTV and QR
factorizations, SIAM J. Matrix Anal. Appl. 25 (2003) 582-600
Fotiou, A. and D. Rossikopoulos (1993): Adjustment, variance component estimation and
testing with the affine and similarity transformations, Z. Vermessungswesen 118
(1993) 494-503
Foucart, T. (1999): Stability of the inverse correlation matrix. Partial ridge regression,
J. Statistical Planning and Inference 77 (1999) 141-154
Fox, M. and H. Rubin (1964): Admissibility of quantile estimates of a single location
parameter, Ann. Math. Statist. 35 (1964) 1019-1031
Franses, P.H. (1998): Time series models for business and economic forecasting, Cam-
bridge University Press, Cambridge 1998
Fraser, D.A.S. (1963): On sufficiency and the exponential family, J. Roy. Statist. Soc. 25
(1963) 115-123
Fraser, D.A.S. (1968): The structure of inference, J. Wiley, New York 1968
Fraser, D.A.S. and I. Guttman (1963): Tolerance regions, Ann. Math. Statist. 27 (1957)
162-179
Freeman, R.A. and P.V. Kokotovic (1996): Robust nonlinear control design, Birkhäuser
Verlag, Boston 1996
Freiberg, B. (1985): Exact design for regression models with correlated errors, Statistics
16 (1985) 479-484
Freund, P.G.O. (1974): Local scale invariance and gravitation, Annals of Physics 84
(1974) 440-454
Frey, M. and J.C. Kern (1997): The Pitman Closeness of a Class of Scaled Estimators,
The American Statistician 51 (1997) 151-154
Fristedt, B. and L. Gray (1997): A modern approach to probability theory, Birkhäuser,
Basel 1997
Frobenius, F.G. (1893): Gedächtnisrede auf Leopold Kronecker (1893), Ferdinand Georg
Frobenius, Gesammelte Abhandlungen, ed. J.-P. Serre, Band III, pages 705-724,
Springer Verlag, Berlin 1968
Fujikoshi, Y. (1980): Asymptotic expansions for the distributions of sample roots under
non-normality, Biometrika 67 (1980) 45-51
Fulton, T., Rohrlich, F. and L. Witten (1962): Conformal invariance in physics, Reviews
of Modern Physics 34 (1962) 442-457
Furno, M. (1997): A robust heteroskedasticity consistent covariance matrix estimator,
Statistics 30 (1997) 201-219
Gabor, D. (1946): Theory of communication, J. the Electrical Engineers 93 (1946) 429-
441
Gaffke, N. and B. Heiligers (1996): Approximate designs for polynomial regression:
invariance, admissibility and optimality, Handbook of Statistics 13 (1996) 1149-1199
Galil, Z. (1985): Computing D-optimum weighing designs: Where statistics, combina-
torics, and computation meet, in: Proceedings of the Berkeley Conference in Honor of
Jerzy Neyman and Jack Kiefer, Vol. II, eds. L.M. LeCam and R.A. Olshen,
Wadsworth 1985
Gallant, A.R. (1987): Nonlinear statistical models, John Wiley, New York 1987
Gallavotti, G. (1999): Statistical mechanics: A short treatise, Springer-Verlag, New York
1999
Gander, W. (1981): Least squares with a quadratic constraint, Numer. Math. 36 (1981)
291-307
Gao, S. and T.M.F. Smith (1995): On the nonexistence of a global nonnegative minimum
bias invariant quadratic estimator of variance components, Statistics and Probability
Letters 25 (1995) 117-120
Gao, S. and T.M.F. Smith (1998): A constrained MINQU estimator of correlated response
variance from unbalanced data in complex surveys, Statistica Sinica 8 (1998) 1175-
1188
Gao, Y., Lahaye, F., Heroux, P., Liao, X., Beck, N. and M. Olynik (2001): Modelling and
estimation of C1-P1 bias in GPS receivers, J. Geodesy 74 (2001) 621-626
Garcia, A.G. (2000): Orthogonal sampling formulas: a unified approach, SIAM Review
42 (2000) 499-512
García-Escudero, L.A., Gordaliza, A. and C. Matrán (1997): k-Medians and trimmed k-
medians, Student 2 (1997) 139-148
Garcia-Ligero, M.J., Hermoso, A. and J. Linares (1998): Least squared estimation for
distributed parameter systems with uncertain observations: Part 1: Linear prediction
and filtering, Applied Stochastic Models and Data Analysis 14 (1998) 11-18
Gassmann, H. (1989): Einführung in die Regelungstechnik, Verlag Harri Deutsch, Frank-
furt am Main 1989
Gather, U. and C. Becker (1997): Outlier identification and robust methods, Handbook of
Statistics 15 (1997) 123-143
Gauss, C.F. (1809): Theoria Motus Corporum Coelestium, Lib. 2, Sec. III, Perthes u.
Besser Publ., 205-224, Hamburg 1809
Gauss, C.F. (1816): Bestimmung der Genauigkeit der Beobachtungen, Z. Astronomie 1
(1816) 185-197
Gautschi, W. (1982): On generating orthogonal polynomials, SIAM Journal on Scientific
and Statistical Computing 3 (1982) 289-317
Gautschi, W. (1985): Orthogonal polynomials - constructive theory and applications, J.
Comput. Appl. Math. 12/13 (1985) 61-76
Gautschi, W. (1997): Numerical analysis - an introduction, Birkhäuser Verlag, Boston-
Basel-Berlin 1997
Gelfand, A.E. and D.K. Dey (1988): Improved estimation of the disturbance variance in a
linear regression model, J. Econometrics 39 (1988) 387-395
Gelman, A., Carlin, J.B., Stern, H.S. and D.B. Rubin (1995): Bayesian data analysis,
Chapman and Hall, London 1995
Genton, M.G. (1998): Asymptotic variance of M-estimators for dependent Gaussian
random variables, Statistics and Probability Lett. 38 (1998) 255-261
Genton, M.G. and Y. Ma (1999): Robustness properties of dispersion estimators, Statistics
& Probability Letters 44 (1999) 343-350
Ghosh, M. and G. Meeden (1978): Admissibility of the MLE of the normal integer mean,
The Indian J. Statistics 40 (1978) 1-10
Ghosh, M., Mukhopadhyay, N. and P.K. Sen (1997): Sequential estimation, Wiley, New
York 1997
Ghosh, S. (1996): Wishart distribution via induction, The American Statistician 50 (1996)
243-246
Ghosh, S. (1999a): Multivariate analysis, design of experiments, and survey sampling,
Marcel Dekker, Basel 1999
Ghosh, S. (ed.)(1999b): Multivariate analysis, design of experiments, and survey sam-
pling, Marcel Dekker, New York 1999
Ghosh, S., Beran, J. and J. Innes (1997): Nonparametric conditional quantile estimation in
the presence of long memory, Student 2 (1997) 109-117
Giacolone, M. (1997): Lp-norm estimation for nonlinear regression models, Student 2
(1997) 119-130
Gil, A. and J. Segura (1998): A code to evaluate prolate and oblate spheroidal harmonics,
Computer Physics Communications 108 (1998) 267-278
Gil, J.A. and R. Romera (1998): On robust partial least squares (PLS) methods, J.
Chemometrics 12 (1998) 365-378
Gilbert, E.G. and C.P. Foo (1990): Computing the distance between general convex ob-
jects in three-dimensional space, IEEE Transactions on Robotics and Automation 6
(1990) 53-61
Gilberg, F., Urfer, F. and L. Edler (1999): Heteroscedastic nonlinear regression models
with random effects and their application to enzyme kinetic data, Biometrical Journal
41 (1999) 543-557
Gilchrist, R. and G. Portides (1995): M-estimation: some remedies, Lectures Notes in
Statistics 104 (1995) 117-124
Gill, P.E., Murray, W. and M.A. Saunders (2002): Snopt: An SQP algorithm for large
scale constrained optimization, SIAM J. Optim. 12 (2002) 979-1006
Gille, J.C., Pelegrin, M. and P. Decaulne (1964): Lehrgang der Regelungstechnik, Verlag
Technik, Berlin 1964
Giri, N. (1977): Multivariate statistical inference, Academic Press, New York 1977
Giri, N. (1993): Introduction to probability and statistics, 2nd edition, Marcel Dekker,
New York 1993
Giri, N. (1996a): Multivariate statistical analysis, Marcel Dekker, New York 1996
Giri, N. (1996b): Group invariance in statistical inference, World Scientific, Singapore
1996
Girko, V.L. (1988): Spectral theory of random matrices, Nauka, Moscow 1988
Girko, V.L. (1990): Theory of random determinants, Kluwer Academic Publishers,
Dordrecht 1990
Girko, V.L. and A.K. Gupta (1996): Multivariate elliptically contoured linear models and
some aspects of the theory of random matrices, in: Multidimensional statistical analy-
sis and theory of random matrices, Proceedings of the Sixth Lukacs Symposium, eds.
Gupta, A.K. and V.L. Girko, pages 327-386, VSP, Utrecht 1996
Glatzer, E. (1999): Über Versuchsplanungsalgorithmen bei korrelierten Beobachtungen,
Master's Thesis, Wirtschaftsuniversität Wien
Gleason, J.R. (2000): A note on a proposed Student t approximation, Computational Sta-
tistics & Data Analysis 34 (2000) 63-66
Gleick, J. (1987): Chaos, Viking, New York 1987
Gleser, L.J. and I. Olkin (1972): Estimation for a regression model with an unknown
covariance matrix, in: Proceedings of the Sixth Berkeley Symposium on Mathemati-
cal Statistics and Probability, pp. 541-569, eds. Le Cam, L.M., Neyman, J. and Scott, E.L.,
University of California Press, Berkeley and Los Angeles 1972
Glimm, J. (1960): On a certain class of operator algebras, Trans. American Mathematical
Society 95 (1960) 318-340
Gnedenko, B.V. and A.N. Kolmogorov (1968): Limit distributions for sums of independ-
ent random variables, Addison-Wesley Publ., Reading, Mass. 1968
Gnedin, A.V. (1993): On multivariate extremal processes, J. Multivariate Analysis 46
(1993) 207-213
Gnedin, A.V. (1994): On a best choice problem with dependent criteria, J. Applied Prob-
ability 31 (1994) 221-234
Gneiting, T. (1999): Correlation functions for atmospheric data analysis, Q. J. R. Meteo-
rol. Soc. 125 (1999) 2449-2464
Gnot, S. and A. Michalski (1994): Tests based on admissible estimators in two variance
components models, Statistics 25 (1994) 213-223
Gnot, S. and G. Trenkler (1996): Nonnegative quadratic estimation of the mean squared
errors of minimax estimators in the linear regression model, Acta Applicandae
Mathematicae 43 (1996) 71-80
Goad, C.C. (1996): Single-site GPS models, in: GPS for Geodesy, pp. 219-237, Teunis-
sen, P.J.G. and A. Kleusberg (eds), Berlin 1996
Godambe, V.P. (1991): Estimating Functions, Oxford University Press 1991
Godambe, V.P. (1955): A unified theory of sampling from finite populations, J. Roy.
Statist. Soc. B17 (1955) 268-278
Göbel, M. (1998): A constructive description of SAGBI bases for polynomial invariants
of permutation groups, J. Symbolic Computation 26 (1998) 261-272
Goldberger, A.S. (1962): Best linear unbiased prediction in the generalized linear regres-
sion model, J. Amer. Statist. Ass. 57 (1962) 369-375
Goldie, C.M. and S. Resnick (1989): Records in a partially ordered set, Annals Probability
17 (1989) 678-689
Goldie, C.M. and S. Resnick (1995): Many multivariate records, Stochastic Processes
Appl. 59 (1995) 185-216
Goldie, C.M. and S. Resnick (1996): Ordered independent scattering, Commun. Statist.
Stochastic Models 12 (1996) 523-528
Goldstine, H. (1977): A history of numerical analysis from the 16th through the 19th
century, Springer Verlag, New York 1977
Golshtein, E.G. and N.V. Tretyakov (1996): Modified Lagrangians and monotone maps in
optimization, J. Wiley, New York 1996
Golub, G.H. (1965): Numerical methods for solving linear least squares problems, Numer.
Math. 7 (1965) 206-216
Golub, G.H. (1968): Least squares, singular values and matrix approximations, Aplikace
Matematiky 13 (1968) 44-51
Golub, G.H. (1973): Some modified matrix eigenvalue problems, SIAM Review 15
(1973) 318-334
Golub, G.H. and C.F. van Loan (1996): Matrix computations, 3rd edition, Johns Hopkins
University Press, Baltimore 1996
Golub, G.H. and W. Kahan (1965): Calculating the singular values and pseudo-inverse of
a matrix, SIAM J Numer. Anal. 2 (1965) 205-224
Golub, G.H. and C. Reinsch (1970): Singular value decomposition and least squares
solutions, Numer. Math. 14 (1970) 403-420
Golub, G.H. and U. von Matt (1991): Quadratically constrained least squares and quad-
ratic problems, Numer. Math. 59 (1991) 561-580
Golub, G.H., Hansen, P.C. and D.P. O'Leary (1999): Tikhonov regularization and total
least squares, SIAM J. Matrix Anal. Appl. 21 (1999) 185-194
Gómez, E., Gómez-Villegas, M.A. and J.M. Marín (1998): A multivariate generalization
of the power exponential family of distributions, Commun. Statist. - Theory Meth. 27
(1998) 589-600
Gonin, R. and A.H. Money (1987a): Outliers in physical processes: L1- or adaptive Lp-
norm estimation?, in: Statistical Data Analysis Based on the L1 Norm and Related
Methods, Dodge Y. (ed), North-Holland 1987
Gonin, R. and A.H. Money (1987b): A review of computational methods for solving the
nonlinear L1 norm estimation problem, in: Statistical data analysis based on the L1
norm and related methods, Ed. Y. Dodge, North Holland 1987
Gonin, R. and A.H. Money (1989): Nonlinear lp-norm estimation, Marcel Dekker, New
York 1989
Goodall, C. (1991): Procrustes methods in the statistical analysis of shape, J. Royal Statis-
tical Society B 53 (1991) 285-339
Goodall, C.R. (1993): Computation using the QR decomposition, in: C.R. Rao, ed., Handbook
of Statistics 9 (1993) 467-508
Goodman, J.W. (1985): Statistical optics, Wiley, New York 1985
Gordon, A.D. (1997): L1-norm and L2-norm methodology in cluster analysis, Student 2
(1997) 181-193
Gordon, A.D. (1999): Classification, 2nd edition, Chapman and Hall, New York 1999
Gordon, L. and M. Hudson (1977): A characterization of the Von Mises Distribution,
Ann. Statist. 5 (1977) 813-814
Gordonova, V.I. (1973): The validation of algorithms for choosing the regularization
parameter, Zh. vychisl. mat. mat. fiz. 13 (1973) 1328-1332
Gorman, T.W. (2001): Adaptive estimation using weighted least squares, Aust. N. Z. J.
Stat. 43 (2001) 287-297
Gotthardt, E. (1978): Einführung in die Ausgleichungsrechnung, 2. Auflage, Karlsruhe
1978
Gould, A.L. (1969): A regression technique for angular variates, Biometrics 25 (1969)
683-700
Gower, J.C. and G.B. Dijksterhuis (2004): Procrustes Problems, Oxford Statistical Scien-
ce Series 30, Oxford 2004
Grafarend, E.W. (1967a): Bergbaubedingte Deformation und ihr Deformationstensor,
Bergbauwissenschaften 14 (1967) 125-132
Grafarend, E.W. (1967b): Allgemeiner Fehlertensor bei a priori und a posteriori Korrela-
tionen, Z. Vermessungswesen 92 (1967) 157-165
Grafarend, E.W. (1969): Helmertsche Fußpunktkurve oder Mohrscher Kreis?, Allg. Ver-
messungsnachrichten 76 (1969) 239-240
Grafarend, E.W. (1970a): Verallgemeinerte Methode der kleinsten Quadrate für zyklische
Variable, Z. Vermessungswesen 4 (1970) 117-121
Grafarend, E.W. (1970b): Die Genauigkeit eines Punktes im mehrdimensionalen Euklidi-
schen Raum, Deutsche Geodätische Kommission bei der Bayerischen Akademie der
Wissenschaften C 153, München 1970
Grafarend, E.W. (1970c): Fehlertheoretische Unschärferelation, Festschrift Professor Dr.-
Ing. Helmut Wolf, 60. Geburtstag, Bonn 1970
Grafarend, E.W. (1971a): Mittlere Punktfehler und Vorwärtseinschneiden, Z. Vermes-
sungswesen 96 (1971) 41-54
Grafarend, E.W. (1971b): Isotropietests von Lotabweichungen Westdeutschlands, Z.
Geophysik 37 (1971) 719-733
Grafarend, E.W. (1972a): Nichtlineare Prädiktion, Z. Vermessungswesen 97 (1972) 245-
255
Grafarend, E.W. (1972b): Isotropietests von Lotabweichungsverteilungen Westdeutsch-
lands II, Z. Geophysik 38 (1972) 243-255
Grafarend, E.W. (1972c): Genauigkeitsmaße geodätischer Netze, Deutsche Geodätische
Kommission bei der Bayerischen Akademie der Wissenschaften A 73, München 1972
Grafarend, E.W. (1973a): Nichtlokale Gezeitenanalyse, Mitt. Institut für Theoretische
Geodäsie No. 13, Bonn 1973
Grafarend, E.W. (1973b): Optimales Design geodätischer Netze 1 (zus. P. Harland), Deut-
sche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften A
74, München 1973
Grafarend, E.W. (1974): Optimization of geodetic networks, Bollettino di Geodesia e
Scienze Affini 33 (1974) 351-406
Grafarend, E.W. (1975): Second order design of geodetic nets, Z. Vermessungswesen 100
(1975) 158-168
Grafarend, E.W. (1976): Geodetic applications of stochastic processes, Physics of the
Earth and Planetary Interiors 12 (1976) 151-179
Grafarend, E.W. (1978): Operational geodesy, in: Approximation Methods in Geodesy,
eds. H. Moritz and H. Sünkel, pp. 235-284, H. Wichmann Verlag, Karlsruhe 1978
Grafarend, E.W. (1979): Kriterion-Matrizen I - zweidimensional homogene und isotrope
geodätische Netze - Z. Vermessungswesen 104 (1979) 133-149
Grafarend, E.W. (1983): Stochastic models for point manifolds, in: Mathematical models
of geodetic/ photogrammetric point determination with regard to outliers and system-
atic errors, ed. F.E. Ackermann, Report A 98, 29-52, Deutsche Geodätische Kommis-
sion, Bayerische Akademie der Wissenschaften, München 1983
Grafarend, E.W. (1984): Variance-covariance component estimation of Helmert type in
the Gauss-Helmert model, Z. Vermessungswesen 109 (1984) 34-44
Grafarend, E.W. (1985a): Variance-covariance component estimation, theoretical results
and geodetic applications, Statistics and Decision, Supplement Issue No. 2 (1985)
407-447
Grafarend, E.W. (1985b): Criterion matrices of heterogeneously observed threedimen-
sional networks, Manuscripta Geodaetica 10 (1985) 3-22
Grafarend, E.W. (1985c): Criterion matrices for deforming networks, in: Optimization
and Design of Geodetic Networks, E.W. Grafarend and F. Sanso (eds.) pages 363-
428, Springer-Verlag, Berlin-Heidelberg-New York-Tokyo 1985
Grafarend, E.W. (1986): Generating classes of equivalent linear models by nuisance
parameter elimination - applications to GPS observations, Manuscripta Geodaetica 11
(1986) 262-271
Grafarend, E.W. (1989a): Four lectures on special and general relativity, Lecture Notes in
Earth Sciences, F. Sanso and R. Rummel (eds.), Theory of Satellite Geodesy and
Gravity Field Determination, Nr. 25, pages 115-151, Springer Verlag Berlin - Heidel-
berg - New York - London - Paris - Tokyo - Hongkong 1989
Grafarend, E.W. (1989b): Photogrammetrische Positionierung, Festschrift Prof. Dr.-Ing.
Dr. h.c. Friedrich Ackermann zum 60. Geburtstag, Institut für Photogrammetrie, Uni-
versität Stuttgart, Report 14, pages 45-55, Stuttgart 1989.
Grafarend, E.W. (1991a): Relativistic effects in geodesy, Report Special Study Group
4.119, International Association of Geodesy, Contribution to "Geodetic Theory and
Methodology" ed. F. Sanso, 163-175, Politecnico di Milano, Milano/Italy 1991
Grafarend, E.W. (1991b): The Frontiers of Statistical Scientific Theory and Industrial
Applications (Volume II of the Proceedings of ICOSCO-I), American Sciences Press,
pages 405-427, New York 1991
Grafarend, E.W. (1998): Helmut Wolf – das wissenschaftliche Werk - the scientific work,
Heft A 115, Deutsche Geodätische Kommission, Bayerische Akademie der Wissen-
schaften, C.H. Beck’sche Verlagsbuchhandlung, 97 Seiten, München 1998
Grafarend, E.W. (2000): Mixed integer-real valued adjustment (IRA) problems, GPS
Solutions 4 (2000) 31-45
Grafarend, E.W. and J. Awange (2002a): Nonlinear adjustment of GPS observations of
type pseudo-ranges, GPS Solutions 5 (2002) 80-93
Grafarend, E.W. and J. Awange (2002b): Algebraic solution of GPS pseudo-ranging
equations, GPS Solutions 5 (2002) 20-32
Grafarend, E.W. and J. Shan (1997): Estimable quantities in projective networks, Z. Ver-
messungswesen, Part I, 122 (1997) 218-226, Part II, 122 (1997) 323-333
Grafarend, E.W. and J. Shan (2002): GPS Solutions: closed forms, critical and special
configurations of P4P, GPS Solutions 5 (2002) 29-42
Grafarend, E.W. and A. d'Hone (1978): Gewichtsschätzung in geodätischen Netzen,
Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaf-
ten A 88, München 1978
Grafarend, E.W. and B. Richter (1978): Threedimensional geodesy II-the datum problem,
Z. Vermessungswesen 103 (1978) 44-59
Grafarend, E.W. and G. Kampmann (1996): C10(3): The ten parameter conformal group
as a datum transformation in threedimensional Euclidean space, Z. Vermessungswe-
sen 121 (1996) 68-77
Grafarend, E.W. and A. Kleusberg (1980): Expectation and variance component esti-
mation of multivariate gyrotheodolite observations, I. Allg. Vermessungsnachrichten
87 (1980) 129-137
Grafarend, E.W. and F. Krumm (1985): Continuous networks I, in: Optimization and
Design of Geodetic Networks, E.W. Grafarend and F. Sanso (eds.) pp. 301-341,
Springer-Verlag, Berlin-Heidelberg-New York-Tokyo 1985
Grafarend, E.W. and A. Mader (1989): A graph-theoretical algorithm for detecting con-
figuration defects in triangular geodetic networks, Bulletin Géodésique 63 (1989)
387-394
Grafarend, E.W. and V. Mueller (1985): The critical configuration of satellite networks,
especially of Laser and Doppler type, for planar configurations of terrestrial points,
Manuscripta Geodaetica 10 (1985) 131-152
Grafarend, E.W. and F. Sanso (1985): Optimization and design of geodetic networks,
Springer Verlag, Berlin-Heidelberg-New York-Tokyo 1985
Grafarend, E.W. and B. Schaffrin (1974): Unbiased free net adjustment, Surv. Rev. XXII,
171 (1974) 200-218
Grafarend, E.W. and B. Schaffrin (1976): Equivalence of estimable quantities and invari-
ants in geodetic networks, Z. Vermessungswesen 101 (1976) 485-491
Grafarend, E.W. and B. Schaffrin (1979): Kriterion-Matrizen I – zweidimensional homo-
gene und isotrope geodätische Netze – Z. Vermessungswesen 104 (1979), 133-149
Grafarend, E.W. and B. Schaffrin (1982): Kriterion Matrizen II: Zweidimensionale ho-
mogene und isotrope geodätische Netze, Teil II a: Relative cartesische Koordinaten,
Z. Vermessungswesen 107 (1982), 183-194, Teil IIb: Absolute cartesische Koordina-
ten, Z. Vermessungswesen 107 (1982) 485-493
Grafarend, E.W. and B. Schaffrin (1988): Von der statistischen zur dynamischen Auffas-
sung geodätischer Netze, Z. Vermessungswesen 113 (1988) 79-103
Grafarend, E.W. and B. Schaffrin (1989): The geometry of nonlinear adjustment - the
planar trisection problem, Festschrift to Torben Krarup eds. E. Kejlso, K. Poder and
C.C. Tscherning, Geodaetisk Institut, Meddelelse No. 58, pages 149-172, Kobenhavn
1989
Grafarend, E.W. and B. Schaffrin (1991): The planar trisection problem and the impact of
curvature on non-linear least-squares estimation, Comput. Stat. Data Anal. 12 (1991)
187-199
Grafarend, E.W. and B. Schaffrin (1993): Ausgleichungsrechnung in linearen Modellen,
Brockhaus, Mannheim 1993
Grafarend, E.W. and G. Offermanns (1975): Eine Lotabweichungskarte Westdeutschlands
nach einem geodätisch konsistenten Kolmogorov-Wiener Modell, Deutsche Geodäti-
sche Kommission bei der Bayerischen Akademie der Wissenschaften A 82, München
1975
Grafarend, E.W. and P. Xu (1994): Observability analysis of integrated INS/GPS system,
Bollettino di Geodesia e Scienze Affini 103 (1994) 266-284
Grafarend, E.W. and P. Xu (1995): A multi-objective second-order optimal design for
deforming networks, Geoph. Journal Int. 120 (1995) 577-589
Grafarend, E.W., Kleusberg, A. and B. Schaffrin (1980): An introduction to the variance-
covariance- component estimation of Helmert type, Z. Vermessungswesen 105 (1980)
161-180
Grafarend, E.W., Krarup, T. and R. Syffus (1996): An algorithm for the inverse of a
multivariate homogeneous polynomial of degree n, J. Geodesy 70 (1996) 276-286
Grafarend, E.W., Krumm, F. and F. Okeke (1995): Curvilinear geodetic datum transfor-
mations, Z. Vermessungswesen 120 (1995) 334-350
Grafarend, E.W., Knickemeyer, E.H. and B. Schaffrin (1982): Geodätische Datumstrans-
formationen, Z. Vermessungswesen 107 (1982) 15-25
Grafarend, E.W., Krumm, F. and B. Schaffrin (1985): Criterion matrices of heterogene-
ously observed threedimensional networks, Manuscripta Geodaetica 10 (1985) 3-22
Grafarend, E.W., Krumm, F. and B. Schaffrin (1986): Kriterion-Matrizen III: Zweidi-
mensional homogene und isotrope geodätische Netze, Z. Vermessungswesen 111
(1986) 197-207
Grafarend, E.W., Schmitt, G. and B. Schaffrin (1976): Über die Optimierung lokaler
geodätischer Netze (Optimal design of local geodetic networks), 7th course, High pre-
cision Surveying Engineering (7.Int. Kurs für Ingenieurvermessung hoher Präzision)
29 Sept - 8 Oct 1976, Darmstadt 1976
Grafarend, E.W., Mueller, J.J., Papo, H.B. and B. Richter (1979): Concepts for reference
frames in geodesy and geodynamics: the reference directions, Bulletin Géodésique 53
(1979) 195-213
Graham, A. (1981): Kronecker products and matrix calculus, J. Wiley, New York 1981
Gram, J.P. (1883): Über die Entwicklung reeller Funktionen in Reihen mittelst der Me-
thode der kleinsten Quadrate, J. Reine Angew. Math. 94 (1883) 41-73
Granger, C.W.J. and P. Newbold (1986): Forecasting economic time series, 2nd ed., Aca-
demic Press, New York 1986
Granger, C.W.J. and T. Teräsvirta (1993): Modelling nonlinear economic relations, Ox-
ford University Press, New York 1993
Graybill, F.A. (1954): On quadratic estimates of variance components, The Annals of
Mathematical Statistics 25 (1954) 367-372
Graybill, F.A. (1983): Matrices with applications in statistics, 2nd ed., Wadsworth, Bel-
mont 1983
Graybill, F.A. and R.A. Hultquist (1961): Theorems concerning Eisenhart’s model II, The
Annals of Mathematical Statistics 32 (1961) 261-269
Green, B. (1952): The orthogonal approximation of an oblique structure in factor analysis,
Psychometrika 17 (1952) 429-440
Green, P.J. and B.W. Silverman (1993): Nonparametric regression and generalized linear
models, Chapman and Hall, Boca Raton 1993
Greenbaum, A. (1997): Iterative methods for solving linear systems, SIAM, Philadelphia
1997
Greenwood, J.A. and D. Durand (1955): The distribution of length and components of the
sum of n random unit vectors, Ann. Math. Statist. 26 (1955) 233-246
Greenwood, P.E. and G. Hooghiemstra (1991): On the domain of an operator between
supremum and sum, Probability Theory Related Fields 89 (1991) 201-210
Grenander, U. (1981): Abstract inference, Wiley, New York 1981
Griffith, D.F. and D.J. Higham (1997): Learning LaTeX, SIAM, Philadelphia 1997
Grimstad, A-A. and T. Mannseth (2000): Nonlinearity, scale and sensitivity for parameter
estimation problems, SIAM J. Sci. Comput. 21 (2000) 2096-2113
Grodecki, J. (1999): Generalized maximum-likelihood estimation of variance components
with inverted gamma prior, J. Geodesy 73 (1999) 367-374
Groechenig, K. (2001): Foundations of time-frequency analysis, Birkhäuser Verlag, Bos-
ton-Basel-Berlin 2001
Gross, J. (1996a): On a class of estimators in the general Gauss-Markov model, Commun.
Statist. – Theory Meth. 25 (1996) 381-388
Gross, J. (1996b): Estimation using the linear regression model with incomplete ellipsoi-
dal restrictions, Acta Applicandae Mathematicae 43 (1996) 81-85
Gross, J. (1998): Statistical estimation by a linear combination of two given statistics,
Statistics and Probability Lett. 39 (1998) 379-384
Gross, J. and G. Trenkler (1997): When do linear transforms of ordinary least squares and
Gauss-Markov estimator coincide?, Sankhya 59 (1997) 175-178
Gross, J., Trenkler, G. and E.P. Liski (1998): Necessary and sufficient conditions for
superiority of misspecified restricted least squares regression estimator, J. Statist.
Planning and Inference 71 (1998) 109-116
Gross, J., Trenkler, G. and H.J. Werner (2001): The equality of linear transforms of the
ordinary least squares estimator and the best linear unbiased estimator, The Indian J.
Statistics 63 (2001) 118-127
Grossmann, W. (1973): Grundzüge der Ausgleichungsrechnung, Springer-Verlag, Berlin
1973
Grubbs, F.E. (1973): Errors of measurement, precision, accuracy and the statistical com-
parison of measuring instruments, Technometrics 15 (1973) 53-66
Grubbs, F.E. and G. Beck (1972): Extension of sample sizes and percentage points for
significance tests of outlying observations, Technometrics 14 (1972) 847-854
Guenther, W.C. (1964): Another derivation of the non-central Chi-Square distribution, J.
the American Statistical Association 59 (1964) 957-960
Guérin, C.-A. (2000): Wavelet analysis and covariance structure of some classes of non-
stationary processes, The J. Fourier Analysis and Applications 4 (2000) 403-425
Gui, Q. and J. Zhang (1998): Robust biased estimation and its applications in geodetic
adjustments, J. Geodesy 72 (1998) 430-435
Gui, Q.M. and J.S. Liu (2000): Biased estimation in the Gauss-Markov model, Allg.
Vermessungsnachrichten 107 (2000) 104-108
Gulliksson, M. and P.A. Wedin (2000): The use and properties of Tikhonov filter matri-
ces, SIAM J. Matrix Anal. Appl. 22 (2000) 276-281
Gulliksson, M., Soederkvist, I. and P.A. Wedin (1997): Algorithms for constrained and
weighted nonlinear least-squares, SIAM J. Optim. 7 (1997) 208-224
Gumbel, E.J., Greenwood, J.A. and D. Durand (1953): The circular normal distribution:
theory and tables, J. Amer. Statist. Assoc. 48 (1953) 131-152
Guolin, L. (2000): Nonlinear curvature measures of strength and nonlinear diagnosis,
Allg. Vermessungsnachrichten 107 (2000) 109-111
Guolin, L., Jinyun, G. and T. Huaxue (2000): Two kinds of explicit methods to nonlinear
adjustments of free-networks with rank deficiency, Geomatics Research Australasia
73 (2000) 25-32
Guolin, L., Lianpeng, Z. and J. Tao (2001): Linear Space [L,M] N and the law of general-
ized variance-covariance propagation, Allg. Vermessungsnachrichten 10 (2001) 352-
356
Gupta, A.K. and D.G. Kabe (1997): Linear restrictions and two step multivariate least
squares with applications, Statistics & Probability Letters 32 (1997) 413-416
Gupta, A.K. and D.G. Kabe (1999a): On multivariate Liouville distribution, Metron 57
(1999) 173-179
Gupta, A.K. and D.G. Kabe (1999b): Distributions of Hotelling’s T2 and multiple and
partial correlation coefficients for the mixture of two multivariate Gaussian popula-
tions, Statistics 32 (1999) 331-339
Gupta, A.K. and D.K. Nagar (1998): Quadratic forms in disguised matrix T-variate, Sta-
tistics 30 (1998) 357-374
Gupta, S.S. (1963): Probability integrals of multivariate normal and multivariate t, An-
nals of Mathematical Statistics 34 (1963) 792-828
Gut, A. (2002): On the moment problem, Bernoulli 8 (2002) 407-421
Guttman, I. (1982): Linear models: An Introduction, J. Wiley & Sons 1982
Guttman, L. (1946): Enlargement methods for computing the inverse matrix, Ann. Math.
Statist. 17 (1946) 336-343
Guu, S.M., Lur, Y.Y. and C.T. Pang (2001): On infinite products of fuzzy matrices, SIAM
J. Matrix Anal. Appl. 22 (2001) 1190-1203
Haantjes, J. (1937): Conformal representations of an n-dimensional Euclidean space with
a non-definite fundamental form on itself, in: Nederl. Akademie van Wetenschappen,
Proc. Section of Sciences, vol. 40, pages 700-705, Noord-Hollandsche Uitgevers-
maatschappij, Amsterdam 1937
Haantjes, J. (1940): Die Gleichberechtigung gleichförmig beschleunigter Beobachter für
die elektromagnetischen Erscheinungen, in: Nederl. Akademie van Wetenschappen,
Proc. Section of Sciences, vol. 43, pages 1288-1299, Noord-Hollandsche Uitgevers-
maatschappij, Amsterdam 1940
Haberman, S.J. (1996): Advanced statistics, volume I: description of populations,
Springer Verlag, New York 1996
Hadamard, J. (1899): Théorème sur les séries entières, Acta Math. 22 (1899) 1-28
Haerdle, W., Liang, H. and J. Gao (2000): Partially linear models, Physica-Verlag, Hei-
delberg 2000
Hager, W.W. (1989): Updating the inverse of a matrix, SIAM Rev. 31 (1989) 221-239
Hager, W.W. (2000): Iterative methods for nearly singular linear systems, SIAM J. Sci.
Comput. 22 (2000) 747-766
Hager, W.W. (2002): Minimizing the profile of a symmetric matrix, SIAM J. Sci. Com-
put. 23 (2002) 1799-1816
Hahn, M. and R. Bill (1984): Ein Vergleich der L1- und L2-Norm am Beispiel Hel-
merttransformation, Allg. Vermessungsnachrichten 91 (1984) 441-450
Hahn, W. and P. Weibel (1996): Evolutionäre Symmetrietheorie, Wiss. Verlagsgesell-
schaft, Stuttgart 1996
Haimo, D. (eds) (1967): Orthogonal expansions and their continuous analogues, Southern
Illinois University Press, Carbondale 1967
Haines, G.V. (1985): Spherical cap harmonic analysis, J. Geophysical Research 90 (1985)
2583-2591
Hald, A. (1998): A history of mathematical statistics from 1750 to 1930, J. Wiley, New
York 1998
Hald, A. (2000): The early history of the cumulants and the Gram-Charlier series, Interna-
tional Statistical Review 68 (2000) 137-153
Halmos, P.R. (1946): The theory of unbiased estimation, Ann. Math. Statist. 17 (1946)
34-43
Hammersley, J.M. (1950): On estimating restricted parameters, J.R. Statist. Soc. (B) 12
(1950) 192-
Hampel, F.R. (1973): Robust estimation: a condensed partial survey, Zeitschrift für Wahr-
scheinlichkeitstheorie und verwandte Gebiete 27 (1973) 87-104
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and W.A. Stahel (1986): Robust statis-
tics, J. Wiley, New York 1986
Hanagal, D.D. (1996): UMPU tests for testing symmetry and stress-passing in some
bivariate exponential models, Statistics 28 (1996) 227-239
Hand, D.J. and M.J. Crowder (1996): Practical longitudinal data analysis, Chapman and
Hall, Boca Raton 1996
Hand, D.J., Daly, F., McConway, K., Lunn, D. and E. Ostrowski (1993): Handbook of
small data sets, Chapman and Hall, Boca Raton 1993
Hand, D.J. and C.C. Taylor (1987): Multivariate analysis of variance and repeated meas-
ures, Chapman and Hall, Boca Raton 1987
Handl, A.: Multivariate Analysemethoden. Theorie und Praxis multivariater Verfahren
unter besonderer Berücksichtigung von S-PLUS, Springer-Verlag
Hanke, M. (1991): Accelerated Landweber iterations for the solution of ill-posed equa-
tions, Numer. Math. 60 (1991) 341-375
Hanke, M. and P.C. Hansen (1993): Regularization methods for large-scale problems,
Surveys Math. Indust. 3 (1993) 253-315
Hansen, P.C. (1987): The truncated SVD as a method for regularization, BIT 27 (1987)
534-553
Hansen, P.C. (1990): The discrete Picard condition for discrete ill-posed problems, BIT
30 (1990) 658-672
Hansen, P.C. (1990): Truncated singular value decomposition solutions to discrete ill-
posed problems with ill-determined numerical rank, SIAM J. Sci. Statist. Comput. 11
(1990) 503-518
Hansen, P.C. (1994): Regularization tools: a matlab package for analysis and solution of
discrete ill-posed problems, Numer. Algorithms 6 (1994) 1-35
Hansen, P.C. (1995): Test matrices for regularization methods, SIAM J. Sci. Comput. 16
(1995) 506-512
Hansen, P.C. (1998): Rank-deficient and discrete ill-posed problems, SIAM, Philadel-
phia 1998
Hardtwig, E. (1968): Fehler- und Ausgleichsrechnung, Bibliographisches Institut, Mann-
heim 1968
Harley, B.I. (1956): Some properties of an angular transformation for the correlation
coefficient, Biometrika 43 (1956) 219-223
Harter, H.L. (1964): Criteria for best substitute interval estimators with an application to
the normal distribution, J. Amer. Statist. Assoc 59 (1964) 1133-1140
Harter, H.L. (1974/75): The method of least squares and some alternatives (five parts)
International Statistics Review 42 (1974) 147-174, 235-264, 43 (1975) 1-44, 125-190,
269-278
Harter, H.L. (1977): The non-uniqueness of absolute values regression, Commun. Statist.
Simul. Comput. 6 (1977) 829-838
Hartley, H.O. and J.N.K. Rao (1967): Maximum likelihood estimation for the mixed
analysis of variance model, Biometrika 54 (1967) 93-108
Hartman, P. and G.S. Watson (1974): „Normal“ distribution functions on spheres and the
modified Bessel function, Ann. Prob. 2 (1974) 593-607
Hartmann, C., Van Keer Berghen P., Smeyersverbeke, J. and D.L. Massart (1997): Robust
orthogonal regression for the outlier detection when comparing two series of meas-
urement results, Analytica Chimica Acta 344 (1997) 17-28
Hartung, J. (1981): Non-negative minimum biased invariant estimation in variance com-
ponent models, Annals of Statistics 9 (1981) 278-292
Hartung, J. (1999): Ordnungserhaltende positive Varianzschätzer bei gepaarten Messun-
gen ohne Wiederholungen, Allg. Statistisches Archiv 83 (1999) 230-247
Hartung, J. and B. Elpelt (1989): Multivariate Statistik, Oldenbourg Verlag, München
1989
Hartung, J. and K.H. Jöckel (1982): Zuverlässigkeits- und Wirtschaftlichkeitsüberlegun-
gen bei Straßenverkehrssignalanlagen, Qualität und Zuverlässigkeit 27 (1982) 65-68
Hartung, J. and D. Kalin (1980): Zur Zuverlässigkeit von Straßenverkehrssignalanlagen,
Qualität und Zuverlässigkeit 25 (1980) 305-308
Hartung, J. and B. Voet (1986): Best invariant unbiased estimators for the mean squared
error of variance component estimators, J. American Statist. Assoc. 81 (1986) 689-
691
Hartung, J. and H.J. Werner (1980): Zur Verwendung der restringierten Moore-Penrose-
Inversen beim Testen von linearen Hypothesen, Z. Angew. Math. Mechanik 60 (1980)
T344-T346
Hartung, J., Elpelt, B. and K.H. Klösener (1995): Statistik, Oldenbourg Verlag, München
1995
Hartung, J. et al (1982): Statistik, R. Oldenbourg Verlag, München 1982
Harvey, A.C. (1993): Time series models, 2nd ed., Harvester Wheatsheaf, New York 1993
Harville, D.A. (1976): Extension of the Gauss-Markov theorem to include the estimation
of random effects, Annals of Statistics 4 (1976) 384-395
Harville, D.A. (1977): Maximum likelihood approaches to variance component estimation
and to related problems, J. the American Statistical Association 72 (1977) 320-339
Harville, D.A. (1997): Matrix algebra from a statistician’s perspective, Springer Verlag,
New York 1997
Harville, D.A. (2001): Matrix algebra: exercises and solutions, Springer Verlag, New
York 2001
Hassanein, K.M. and E.F. Brown (1996): Moments of order statistics from the Rayleigh
distribution, J. Statistical Research 30 (1996) 133-152
Hassibi, A. and S. Boyd (1998): Integer parameter estimation in linear models with appli-
cations to GPS, IEEE Trans. on Signal Processing 46 (1998) 2938-2952
Haslett, J. and K. Hayes (1998): Residuals for the linear model with general covariance
structure, J. Royal Statistical Soc. B60 (1998) 201-215
Hastie, T.J. and R.J. Tibshirani (1990): Generalized additive models, Chapman and Hall,
Boca Raton 1990
Hauser, M.A., Pötscher, B.M. and E. Reschenhofer (1999): Measuring persistence in
aggregate output: ARMA models, fractionally integrated ARMA models and non-
parametric procedures, Empirical Economics 24 (1999) 243-269
Hausdorff, F. (1901): Beiträge zur Wahrscheinlichkeitsrechnung, Königlich Sächsische
Gesellschaft der Wissenschaften zu Leipzig, Berichte Math. Phys. Classe 53 (1901)
152-178
Hawkins, D.M. (1993): The accuracy of elemental set approximation for regression, J.
Amer. Statist. Assoc. 88 (1993) 580-589
Hayes, K. and J. Haslett (1999): Simplifying general least squares, American Statistician
53 (1999) 376-381
He, K. (1995): The robustness of bootstrap estimator of variance, J. Ital. Statist. Soc. 2
(1995) 183-193
He, X. (1991): A local breakdown property of robust tests in linear regression, J. Multi-
var. Analysis 38, 294-305, 1991
He, X., Simpson, D.G. and Portnoy, S.L. (1990): Breakdown robustness of tests, J. Am.
Statis. Assn 85, 446-452, 1990
Healy, D.M. (1998): Spherical Deconvolution, J. Multivariate Analysis 67 (1998) 1-22
Heck, B. (1981): Der Einfluss einzelner Beobachtungen auf das Ergebnis einer Ausglei-
chung und die Suche nach Ausreißern in den Beobachtungen, Allg. Vermessungs-
nachrichten 88 (1981) 17-34
Heideman, M.T., Johnson, D.H. and C.S. Burrus (1984): Gauss and the history of the fast
Fourier transform, IEEE ASSP Magazine 1 (1984) 14-21
Heiligers, B. (1994): E-optimal designs in weighted polynomial regression, Ann. Stat. 22
(1994) 917-929
Heine, V. (1955): Models for two-dimensional stationary stochastic processes, Biometrika
42 (1955) 170-178
Heinrich, L. (1985): Nonuniform estimates, moderate and large deviations in the central
limit theorem for m-dependent random variable, Math. Nachr. 121 (1985) 107-121
Hekimoglu, S. (1998): Application of equiredundancy design to M-estimation,
J. Surveying Engineering 124 (1998) 103-124
Hekimoglu, S. (2005): Do robust methods identify outliers more reliably than conven-
tional tests for outliers?, Z. Vermessungswesen 3 (2005) 174-180
Hekimoglu, S. and M. Berber (2003): Effectiveness of robust methods in heterogeneous
linear models, J. Geodesy 76 (2003) 706-713
Hekimoglu, S and K.-R. Koch (2000): How can reliability of the test for outliers be meas-
ured?, Allg. Vermessungsnachrichten 7 (2000) 247-253
Helmert, F.R. (1875): Über die Berechnung des wahrscheinlichen Fehlers aus einer endli-
chen Anzahl wahrer Beobachtungsfehler, Z. Math. U. Physik 20 (1875) 300-303
Helmert, F.R. (1876): Diskussion der Beobachtungsfehler in Koppes Vermessung für die
Gotthardtunnelachse, Z. Vermessungswesen 5 (1876) 129-155
Helmert, F.R. (1876a): Die Genauigkeit der Formel von Peters zur Berechnung des wahr-
scheinlichen Fehlers direkter Beobachtungen gleicher Genauigkeit, Astron. Nachrich-
ten 88 (1876) 113-132
Helmert, F.R. (1876b): Über die Wahrscheinlichkeit der Potenzsummen der Beobach-
tungsfehler, Z. Math. U. Phys. 21 (1876) 192-218
Helmert, F.R. (1907): Die Ausgleichungsrechnung nach der Methode der kleinsten Quad-
rate, mit Anwendungen auf die Geodäsie, die Physik und die Theorie der Messinstru-
mente, B.G. Teubner, Leipzig – Berlin 1907
Henderson, H.V. (1981): The vec-permutation matrix, the vec operator and Kronecker
products: a review, Linear and Multilinear Algebra 9 (1981) 271-288
Henderson, H.V. and S.R. Searle (1981a): Vec and vech operators for matrices, with some
uses in Jacobians and multivariate statistics
Henderson, H.V. and S.R. Searle (1981b): On deriving the inverse of a sum of matrices,
SIAM Review 23 (1981) 53-60
Henderson, H.V., Pukelsheim, F. and S.R. Searle (1983): On the history of the Kronecker
product, Linear and Multilinear Algebra 14 (1983) 113-120
Hendriks, H. and Z. Landsman (1998): Mean location and sample mean location on mani-
folds: Asymptotics, tests, confidence regions, J. Multivar. Analysis 67 (1998) 227-243
Hengst, M. (1967): Einführung in die Mathematische Statistik und ihre Anwendung,
Bibliographisches Institut, Mannheim 1967
Henrici, P. (1962): Bounds for iterates, inverses, spectral variation and fields of values of
non-normal matrices, Numer. Math. 4 (1962) 24-40
Herzberg, A.M. and A.V. Tsukanov (1999): A note on the choice of the best selection
criterion for the optimal regression model, Utilitas Mathematica 55 (1999) 243-254
Hesse, K. (2003): Domain decomposition methods in multiscale geopotential determinati-
on from SST and SGG, Berichte aus der Mathematik, Shaker Verlag, Aachen 2003
Hetherington, T.J. (1981): Analysis of directional data by exponential models, PhD. The-
sis, University of California, Berkeley 1981
Hext, G.R. (1963): The estimation of second-order tensors, with related tests and designs,
Biometrika 50 (1963) 353-373
Heyde, C.C. (1997): Quasi-likelihood and its application. A general approach to optimal
parameter estimation, Springer Verlag, New York 1997
Hickernell, F.J. (1999): Goodness-of-fit statistics, discrepancies and robust designs, Sta-
tistics and Probability Letters 44 (1999) 73-78
Hida, T. and S. Si (2004): An innovation approach to random field, Application to white
noise theory, Probability and Statistics 2004
Higham, N.J. and F. Tisseur (2000): A block algorithm for matrix 1-norm estimation, with
an application to 1-norm pseudospectra, SIAM J. Matrix Anal. Appl. 21 (2000) 1185-
1201
Hinde, J. (1998): Overdispersion: models and estimation, Comput. Stat. & Data Anal. 27
(1998) 151-170
Hinkelmann, K. (ed) (1984): Experimental design, statistical models, and genetic statis-
tics, Marcel Dekker, Inc. 1984
Hinkley, D. (1979): Predictive likelihood, Ann. Statist. 7 (1979) 718-728
Hinkley, D., Reid, N. and E.J. Snell (1990): Statistical theory and modelling, Chapman
and Hall, Boca Raton 1990
Hjorth, J.S.U. (1993): Computer intensive statistical methods, Chapman and Hall, Boca
Raton 1993
Ho, L.L. (1997): Regression models for bivariate counts, Brazilian J. Probability and
Statistics 11 (1997) 175-197
Hoaglin, D.C. and R.E. Welsh (1978): The Hat Matrix in regression and ANOVA, The
American Statistician 32 (1978) 17-22
Hocking, R.R. (1996): Methods and applications of linear models – regression and the
analysis of variance, John Wiley & Sons. Inc 1996
Hodge, W. and D. Pedoe (1968): Methods of algebraic geometry, I, Cambridge University
Press, Cambridge 1968
Hoel, P.G. (1965): Minimax distance designs in two dimensional regression, Ann. Math.
Statist. 36 (1965) 1097-1106
Hoel, P.G., S.C. Port and C.J. Stone (1972): Introduction to stochastic processes, Hough-
ton Mifflin Publ., Boston 1972
Hoerl, A.E. and R.W. Kennard (2000): Ridge regression: biased estimation for nonor-
thogonal problems, Technometrics 42 (2000) 80-86
Hoepke, W. (1980): Fehlerlehre und Ausgleichungsrechnung, De Gruyter, Berlin 1980
Hoffmann, K. (1992): Improved estimation of distribution parameters: Stein-type estima-
tors, Teubner-Texte zur Mathematik, Stuttgart/Leipzig 1992
Hofmann, B. (1986): Regularization for applied inverse and ill-posed problems, Teubner
Texte zur Mathematik 85, Leipzig 1986
Hogg, R.V. (1972): Adaptive robust procedures: a partial review and some suggestions
for future applications and theory, J. American Statistical Association 43 (1972) 1041-
1067
Hogg, R.V. (1974): Adaptive robust procedures: a partial review and some suggestions
for future applications and theory, J. American Statistical Association 69 (1974) 909-
923
Hogg, R.V. and R.H. Randles (1975): Adaptive distribution free regression methods and
their applications, Technometrics 17 (1975) 399-407
Holota, P. (2001): Variational methods in the representation of the gravitational potential,
Cahiers du Centre Européen de Géodynamique et de Sésmologie 2001
Holota, P. (2002): Green’s function and external masses in the solution of geodetic
boundary-value problems, Presented at the 3rd Meeting of the IAG Intl. Gravity and
Geoid Commission, Thessaloniki, Greece, August 26-30, 2002
Holschneider, M. (2000): Introduction to continuous wavelet analysis, in: Klees, R. and R.
Haagmans (eds): Wavelets in the geosciences, Springer 2000
Hong, C.S. and H.J. Choi (1997): On L1 regression coefficients, Commun. Statist. Simul.
Comp. 26 (1997) 531-537
Hora, R.B. and R.J. Buehler (1965): Fiducial theory and invariant estimation, Ann. Math.
Statist. 37 (1965) 643-656
Horn, R.A. (1989): The Hadamard product, in Matrix Theory and Applications, C.R.
Johnson, ed., Proc. Sympos. Appl. Math. 40 (1989) 87-169
Horn, R.A. and C.R. Johnson (1990): Matrix analysis, Cambridge University Press, Cam-
bridge 1990
Horn, R.A. and C.R. Johnson (1991): Topics on Matrix analysis, Cambridge University
Press, Cambridge 1991
Hornoch, A.T. (1950): Über die Zurückführung der Methode der kleinsten Quadrate auf
das Prinzip des arithmetischen Mittels, Österr. Z. Vermessungswesen 38 (1950) 13-18
Hosking, J.R.M. and J.R. Wallis (1997): Regional frequency analysis. An approach based
on L-moments, Cambridge University Press 1997
Hosoda, Y. (1999): Truncated least-squares least norm solutions by applying the QR
decomposition twice, Trans. Inform. Process. Soc. Japan 40 (1999) 1051-1055
Hotelling, H. (1953): New light on the correlation coefficient and its transform, J. Royal
Stat. Society, Series B, 15 (1953) 225-232
Hoyle, M.H. (1973): Transformations- an introduction and a bibliography, Int. Statist.
Review 41 (1973) 203-223
Hsu, J.C. (1996): Multiple comparisons, Chapman and Hall, Boca Raton 1996
Hsu, P.L. (1940): An algebraic derivation of the distribution of rectangular coordinates,
Proc. Edinburgh Math. Soc. 6 (1940) 185-189
Hsu, R. (1999): An alternative expression for the variance factors in using Iterated Almost
Unbiased Estimation, J. Geodesy 73 (1999) 173-179
Hsu, Y.S., Metry, M.H. and Y.L. Tong (1999): Hypotheses testing for means of depend-
ent and heterogeneous normal random variables, J. Statist. Planning and Inference 78
(1999) 89-99
Huang, J.S. (1999): Third-order expansion of mean squared error of medians, Statistics &
Probability Letters 42 (1999) 185-192
Huber, P.J. (1964): Robust estimation of a location parameter, Annals Mathematical
Statistics 35 (1964) 73-101
Huber, P.J. (1972): Robust statistics: a review, Annals Mathematical Statistics 43 (1972)
1041-1067
Huber, P.J. (1981): Robust Statistics, J. Wiley, New York 1981
Huda, S. and A.A. Al-Shiha (1999): On D-optimal designs for estimating slope, The
Indian J. Statistics 61 (1999) 485-495
Huet, S., A. Bouvier, M.A. Gruet and E. Jolivet (1996): Statistical tools for nonlinear
regression, Springer Verlag, New York 1996
Hunter, D.B. (1995): The evaluation of Legendre functions of the second kind, Numerical
Algorithms 10 (1995) 41-49
Huwang, L. and Y.H.S. Huang (2000): On errors-in-variables in polynomial regression-
Berkson case, Statistica Sinica 10 (2000) 923-936
Hwang, C. (1993): Fast algorithm for the formation of normal equations in a least-squares
spherical harmonic analysis by FFT, Manuscripta Geodaetica 18 (1993) 46-52
Ibragimov, F.A. and R.Z. Kasminskii (1981): Statistical estimation, asymptotic theory,
Springer Verlag, New York 1981
Ihorst, G. and G. Trenkler (1996): A general investigation of mean square error matrix
superiority in linear regression, Statistica 56 (1996) 15-23
Imhof, L. (2000): Exact designs minimising the integrated variance in quadratic regres-
sion, Statistics 34 (2000) 103-115
Imhof, J.P. (1961): Computing the distribution of quadratic forms in normal variables,
Biometrika 48 (1961) 419-426
Inda, M.A. de et al (1999): Parallel fast Legendre transform, proceedings of the ECMWF
Workshop “Towards TeraComputing – the Use of Parallel Processors in Meteorol-
ogy”, World Scientific Publishing Co 1999
Irle, A. (1990): Sequentialanalyse: Optimale sequentielle Tests, Teubner Skripten zur
Mathematischen Stochastik. Stuttgart 1990
Irle, A. (2001): Wahrscheinlichkeitstheorie und Statistik, Teubner 2001
Irwin, J.O. (1927): On the frequency distribution of the means of samples from a popula-
tion having any law of frequency with finite moments with special reference to Pear-
son’s Type II, Biometrika 19 (1927) 225-239
Isham, V. (1993): Statistical aspects of chaos, in: Networks and Chaos, Statistical and
Probabilistic Aspects (ed. O.E. Barndorff-Nielsen et al) 124-200, Chapman and Hall,
London 1993
Ishibuchi, H., Nozaki, K. and H. Tanaka (1992): Distributed representation of fuzzy rules
and its application to pattern classification, Fuzzy Sets and Systems 52 (1992) 21-32
Ishibuchi, H., Nozaki, K., Yamamoto, N. and H. Tanaka (1995): Selecting fuzzy if-then
rules for classification problems using genetic algorithms, IEEE Transactions on
Fuzzy Systems 3 (1995) 260-270
Ishibuchi, H. and T. Murata (1997): Minimizing the fuzzy rule base and maximizing its
performance by a multi-objective genetic algorithm, in: Sixth FUZZ-IEEE Confer-
ence, Barcelona 1997, pp. 259-264
Izenman, A.J. (1975): Reduced-rank regression for the multivariate linear model, J. Mul-
tivariate Analysis 5 (1975) 248-264
Jacob, N. (1996): Pseudo-differential operators and Markov processes, Akademie Verlag,
Berlin 1996
Jacobi, C.G.J. (1841): De formatione et proprietatibus determinantium, Crelle's J. reine
angewandte Mathematik, Bd. 22
Jacod, J. and P. Protter (2000): Probability essentials, Springer Verlag, Berlin 2000
Jaeckel, L.A. (1972): Estimating regression coefficients by minimizing the dispersion of
the residuals, Annals Mathematical Statistics 43 (1972) 1449-1458
Jajuga, K. (1995): On the principal components of time series, Statistics in Transition 2
(1995) 201-205
James, A.T. (1954): Normal multivariate analysis and the orthogonal group, Ann. Math.
Statist. 25 (1954) 40-75
Jammalamadaka, S.R. and A. SenGupta (2001): Topics in circular statistics, World Scien-
tific, Singapore 2001
Janacek, G. (2001): Practical time series, Arnold, London 2001
Jennison, C. and B.W. Turnbull (1997): Distribution theory of group sequential t, χ2 and
F-Tests for general linear models, Sequential Analysis 16 (1997) 295-317
Jennrich, R.I. (1969): Asymptotic properties of nonlinear least squares estimation, Ann.
Math. Statist. 40 (1969) 633-643
Jensen, J.L. (1981): On the hyperboloid distribution, Scand. J. Statist. 8 (1981) 193-206
Jiancheng, L., Dingbo, C. and N. Jinsheng (1995): Spherical cap harmonic expansion for
local gravity field representation, Manuscripta Geodaetica 20 (1995) 265-277
Jiang, J. (1997): A derivation of BLUP - Best linear unbiased predictor, Statistics & Prob-
ability Letters 32 (1997) 321-324
Jiang, J. (1999): On unbiasedness of the empirical BLUE and BLUP, Statistics & Prob-
ability 41 (1999) 19-24
Jiang, J., Jia, H. and H. Chen (2001): Maximum posterior estimation of random effects in
generalized linear mixed models, Statistica Sinica 11 (2001) 97-120
Joe, H. (1997): Multivariate models and dependence concepts, Chapman and Hall, Boca
Raton 1997
John, P.W.M. (1998): Statistical design and analysis of experiments, SIAM 1998
Johnson, N.L. and S. Kotz (1970): Continuous univariate distributions-1, distributions in statistics, Houghton Mifflin Company, Boston 1970
Johnson, N.L., Kotz, S. and A.W. Kemp (1992): Univariate discrete distributions, J. Wiley & Sons 1992
Joergensen, B. (1984): The delta algorithm and GLIM, Int. Statist. Review 52 (1984) 283-
300
Joergensen, B. (1997): The theory of dispersion models, Chapman and Hall, Boca Raton
1997
Joergensen, B., Lundbye-Christensen, S., Song, P.X.-K. and L. Sun (1996b): State space
models for multivariate longitudinal data of mixed types, Canad. J. Statist. 24 (1996b)
385-402
Jorgensen, P.C., Kubik, K., Frederiksen, P. and W. Weng (1985): Ah, robust estimation!,
Australian J. Geodesy, Photogrammetry and Surveying 42 (1985) 19-32
John, S. (1962): A tolerance region for multivariate normal distributions, Sankya A24
(1962) 363-368
Johnson, N.L. and S. Kotz (1970a): Continuous univariate distributions – 2, Houghton
Mifflin Company, Boston 1970
Johnson, N.L. and S. Kotz (1970b): Discrete distributions, Houghton Mifflin Company,
Boston 1970
Johnson, N.L. and S. Kotz (1972): Distributions in statistics: continuous multivariate
distributions, J. Wiley, New York 1972
Johnson, N.L., Kotz, S. and X. Wu (1991): Inspection errors for attributes in quality con-
trol, Chapman and Hall, Boca Raton 1991
Joshi, V.M. (1966): Admissibility of confidence intervals, Ann. Math. Statist. 37 (1966)
629-638
Judge, G.G. and M.E. Bock (1978): The statistical implications of pre-test and Stein-rule
estimators in econometrics, Amsterdam 1978
Judge, G.G. and T.A. Yancey (1981): Sampling properties of an inequality restricted
estimator, Economics Lett. 7 (1981) 327-333
Judge, G.G. and T.A. Yancey (1986): Improved methods of inference in econometrics,
Amsterdam 1986
Jukić, D. and R. Scitovski (1997): Existence of optimal solution for exponential model by
least squares, J. Comput. Appl. Math. 78 (1997) 317-328
Jupp, P.E. and K.V. Mardia (1980): A general correlation coefficient for directional data
and related regression problems, Biometrika 67 (1980) 163-173
Jupp, P.E. and K.V. Mardia (1989): A unified view of the theory of directional statistics,
1975-1988, International Statist. Rev. 57 (1989) 261-294
Jureckova, J. (1995): Affine- and scale-equivariant M-estimators in linear model, Prob-
ability and Mathematical Statistics 15 (1995) 397-407
Jurisch, R. and G. Kampmann (1997): Eine Verallgemeinerung des arithmetischen Mittels
für einen Freiheitsgrad bei der Ausgleichung nach vermittelnden Beobachtungen, Z.
Vermessungswesen 11 (1997) 509-520
Jurisch, R. and G. Kampmann (1998): Vermittelnde Ausgleichungsrechnung mit balan-
cierten Beobachtungen – erste Schritte zu einem neuen Ansatz, Z. Vermessungswesen
123 (1998) 87-92
Jurisch, R. and G. Kampmann (2001): Plücker-Koordinaten – ein neues Hilfsmittel zur
Geometrie- Analyse und Ausreissersuche, Vermessung, Photogrammetrie und Kultur-
technik 3 (2001) 146-150
Jurisch, R. and G. Kampmann (2002): Teilredundanzen und ihre natürlichen Verallge-
meinerungen, Z. Vermessungswesen 127 (2002) 117-123
Jurisch, R., Kampmann, G. and B. Krause (1997): Über eine Eigenschaft der Methode der
kleinsten Quadrate unter Verwendung von balancierten Beobachtungen, Z. Vermes-
sungswesen 122 (1997) 159-166
Jurisch, R., Kampmann, G. and J. Linke (1999a): Über die Analyse von Beobachtungen in
der Ausgleichungsrechnung - Teil I, Z. Vermessungswesen 124 (1999) 350-357
Jurisch, R., Kampmann, G. and J. Linke (1999b): Über die Analyse von Beobachtungen
in der Ausgleichungsrechnung - Teil II, Z. Vermessungswesen 124 (1999) 350-357
Kagan, A.M., Linnik, J.V. and C.R. Rao (1965): Characterization problems of the normal
law based on a property of the sample average, Sankya Ser. A 27 (1965) 405-406
Kagan, A. and L.A. Shepp (1998): Why the variance?, Statist. Prob. Letters 38 (1998)
329-333
Kagan, A. and Z. Landsman (1999): Relation between the covariance and Fisher informa-
tion matrices, Statistics & Probability Letters 42 (1999) 7-13
Kahn, M., Mackisack, M.S., Osborne, M.R. and G.K. Smyth (1992): On the consistency
of Prony's method and related algorithms, J. Comp. and Graph. Statist. 1 (1992) 329-
349
Kahng, M.W. (1995): Testing outliers in nonlinear regression, J. the Korean Stat. Soc. 24
(1995) 419-437
Kakihara, Y. (2001): The Kolmogorov isomorphism theorem and extensions to some
nonstationary processes, D.N. Shanbhag and C.R. Rao, eds., Handbook of Statistics 19
(2001) 443-470
Kallianpur, G. (1963): Von Mises functionals and maximum likelihood estimation, San-
kya A25 (1963) 149-158
Kallianpur, G. and Y.-T. Kim (1996): A curious example from statistical differential
geometry, Theory Probab. Appl. 43 (1996) 42-62
Kallenberg, O. (1997): Foundations of modern probability, Springer Verlag, New York
1997
Kaminsky, K.S. and P.I. Nelson (1975): Best linear unbiased prediction of order statistics
in location and scale families, J. Amer. Statist. Ass. 70 (1975) 145-150
Kaminsky, K.S. and L.S. Rhodin (1985): Maximum likelihood prediction, Ann. Inst.
Statist. Math. 37 A (1985), 507-517
Kampmann, G. (1988): Zur kombinativen Norm-Schätzung mit Hilfe der L1-, der L2- und
der Boskovic-Laplace-Methode mit den Mitteln der linearen Programmierung, PhD.
Thesis, Bonn University, Bonn 1988
Kampmann, G. (1992): Zur numerischen Überführung verschiedener linearer Modelle der
Ausgleichungsrechnung, Z. Vermessungswesen 117 (1992), 278-287
Kampmann, G. (1994): Robuste Deformationsanalyse mittels balancierter Ausgleichung,
Allg. Vermessungsnachrichten 1 (1994) 8-17
Kampmann, G. (1997): Eine Beschreibung der Geometrie von Beobachtungen in der
Ausgleichungsrechnung, Z. Vermessungswesen 122 (1997) 369-377
Kampmann, G. and B. Krause (1996): Balanced observations with a straight line fit,
Bolletino di Geodesia e Scienze Affini 2 (1996) 134-141
Kampmann, G. and B. Krause (1997a): Minimierung von Residuenfunktionen unter
Ganzzahligkeitsrestriktionen, Allg. Vermessungsnachrichten 8-9 (1997) 325-331
Kampmann, G. and B. Krause (1997b): A breakdown point analysis for the straight line
fit based on balanced observations, Bolletino di Geodesia e Scienze Affini 3 (1997)
294-303
Kampmann, G. and B. Krause (2004): Zur statistischen Begründung des Regressionsmo-
dells der balanzierten Ausgleichungsrechnung, Z. Vermessungswesen 129 (2004)
176-183
Kampmann, G. and B. Renner (1999): Über Modellüberführungen bei der linearen Aus-
gleichungsrechnung, Allg. Vermessungsnachrichten 2 (1999) 42-52
Kampmann, G. and B. Renner (2000): Numerische Beispiele zur Bearbeitung latenter
Bedingungen und zur Interpretation von Mehrfachbeobachtungen in der Ausglei-
chungsrechnung, Z. Vermessungswesen 125 (2000) 190-197
Kanani, E. (2000): Robust estimators for geodetic transformations and GIS, Institut für
Geodäsie und Photogrammetrie an der Eidgenössischen Technischen Hochschule Zü-
rich, Mitteilungen Nr. 70, Zürich 2000
Kannan, N. and D. Kundu (1994): On modified EVLP and ML methods for estimating
superimposed exponential signals, Signal Processing 39 (1994) 223-233
Kantz, H. and T. Schreiber (1997): Nonlinear time series analysis, Cambridge University
Press, Cambridge 1997
Karatzas, I. and S.E. Shreve (1991): Brownian motion and stochastic calculus, Springer-
Verlag, New York 1991
Karian, Z.A. and E.J. Dudewicz (2000): Fitting statistical distributions, CRC Press 2000
Kariya, T. (1989): Equivariant estimation in a model with an ancillary statistic, Ann.
Statist 17 (1989) 920-928
Karlin, S. and W.J. Studden (1966a): Tchebychev systems, Interscience, New York
(1966)
Karlin, S. and W.J. Studden (1966b): Optimal experimental designs, Ann. Math. Statist. 37 (1966) 783-815
Karr, A.F. (1993): Probability, Springer Verlag, New York 1993
Kasala, S. and T. Mathew (1997): Exact confidence regions and tests in some linear func-
tional relationships, Statistics & Probability Letters 32 (1997) 325-328
Kasietczuk, B. (2000): Geodetic network adjustment by the maximum likelihood method
with application of local variance, asymmetry and excess coefficients, Anno LIX -
Bollettino di Geodesia e Scienze Affini 3 (2000) 221-235
Kass, R.E. and P.W. Vos (1997): Geometrical foundations of asymptotic inference,
Wiley, New York 1997
Kastrup, H.A. (1962): Zur physikalischen Deutung und darstellungstheoretischen Analyse
der konformen Transformationen von Raum und Zeit, Annalen der Physik 9 (1962)
388-428
Kastrup, H.A. (1966): Gauge properties of the Minkowski space, Physical Review 150
(1966) 1183-1193
Kay, S.M. (1988): Sinusoidal parameter estimation, Prentice Hall, Englewood Cliffs, N.J.
1988
Keller, J.B. (1975): Closest unitary, orthogonal and Hermitian operators to a given opera-
tor, Math. Mag. 46 (1975) 192-197
Kelly, R.J. and T. Mathew (1993): Improved estimators of variance components having
smaller probability of negativity, J. Royal Stat. Soc. B 55 (1993) 897-911
Kemperman, J.H.B. (1956): Generalized tolerance limits, Ann. Math. Statist. 27 (1956)
180-186
Kendall, D.G. (1974): Pole seeking Brownian motion and bird navigation, J. Roy. Stat.
Soc. B. 36 (1974) 365-417
Kendall, D.G. (1984): Shape manifolds, Procrustean metrics, and complex projective
space, Bulletin of the London Mathematical Society 16 (1984) 81-121
Kendall, M.G. (1960): The evergreen correlation coefficient, pages 274-277, in: Essays on
Honor of Harold Hotelling, ed. I. Olkin, Stanford University Press, Stanford 1960
Kenney, C.S., A.J. Laub and M.S. Reese (1998): Statistical condition estimation for linear
systems, SIAM J. Scientific Computing 19 (1998) 566-584
Kent, J.T. (1976): Distributions, processes and statistics on “spheres”, PhD. Thesis, Uni-
versity of Cambridge
Kent, J.T. (1983): Information gain and a general measure of correlation, Biometrika 70
(1983) 163-173
Kent, J.T. (1997): Consistency of Procrustes estimators, J. R. Statist. Soc. B 59 (1997)
281-290
Kent, J.T. and K.V. Mardia (1997): Consistency of Procrustes estimators, J. R. Statist.
Soc. 59 (1997) 281-290
Kent, J.T. and M. Mohammadzadeh (2000): Global optimization of the generalized cross-
validation criterion, Statistics and Computing 10 (2000) 231-236
Khan, R.A. (1973): On some properties of Hammersley’s estimator of an integer mean,
The Annals of Statistics 1 (1973) 756-762
Khan, R.A. (1978): A note on the admissibility of Hammersley’s estimator of an integer
mean, The Canadian J.Statistics 6 (1978) 113-119
Khan, R.A. (1998): Fixed-width confidence sequences for the normal mean and the bino-
mial probability, Sequential Analysis 17 (1998) 205-217
Khan, R.A. (2000): A note on Hammersley's estimator of an integer mean, J. Statist.
Planning and Inference 88 (2000) 37-45
Khatri, C.G. and C.R. Rao (1968): Solutions to some fundamental equations and their
applications to characterization of probability distributions, Sankya, Series A, 30
(1968) 167-180
Khatri, C.G. and S.K. Mitra (1976): Hermitian and nonnegative definite solutions of
linear matrix equations, SIAM J. Appl. Math. 31 (1976) 579-585
Khuri, A.I. (1999): A necessary condition for a quadratic form to have a chi-squared
distribution: an accessible proof, Int. J. Math. Educ. Sci. Technol. 30 (1999) 335-339
Khuri, A.I., Mathew, T. and B.K. Sinha (1998): Statistical tests for mixed linear models,
Wiley, New York 1998
Kidd, M. and N.F. Laubscher (1995): Robust confidence intervals for scale and its appli-
cation to the Rayleigh distribution, South African Statist. J. 29 (1995) 199-217
Kiefer, J. (1974): General equivalence theory for optimal designs (approximate theory),
Ann. Stat. 2 (1974) 849-879
Kiefer, J.C. and J. Wolfowitz (1959): Optimum designs in regression problems, Ann. Math.
Statist. 30 (1959) 271-294
Kilmer, M.E. and D.P. O’Leary (2001): Choosing regularization parameters in iterative
methods for ILL-Posed Problems, SIAM J. Matrix Anal. Appl. 22 (2001) 1204-1221
Kim, C. and B.E. Storer (1996): Reference values for Cook’s distance, Commun. Statist. –
Simula. 25 (1996) 691-708
King, J.T. and D. Chillingworth (1979): Approximation of generalized inverses by iter-
ated regularization, Numer. Funct. Anal. Optim. 1 (1979) 499-513
King, M.L. (1980): Robust tests for spherical symmetry and their application to least
squares regression, Ann. Statist. 8 (1980) 1265-1271
Kirkwood, B.H., Royer, J.Y., Chang, T.C. and R.G. Gordon (1999): Statistical tools for estimating and combining finite rotations and their uncertainties, Geoph. J. Int. 137 (1999) 408-428
Kirsch, A. (1996): An introduction to the mathematical theory of inverse problems,
Springer Verlag, New York 1996
Kitagawa, G. and W. Gersch (1996): Smoothness priors analysis of time series, Springer
Verlag, New York 1996
Klebanov, L.B. (1976): A general definition of unbiasedness, Theory of Probability and
Appl. 21 (1976) 571-585
Klees, R., Ditmar, P. and P. Broersen (2003): How to handle colored observation noise in
large least-squares problems, J. Geodesy 76 (2003) 629-640
Kleffe, J. (1976): A note on MINQUE for normal models, Math. Operationsforschg.
Statist. 7 (1976) 707-714
Kleffe, J. and R. Pincus (1974): Bayes and the best quadratic unbiased estimators for
variance components and heteroscedastic variances in linear models, Math. Opera-
tionsforschg. Statistik 5 (1974) 147-159
Kleffe, J. and J.N.K. Rao (1986): The existence of asymptotically unbiased nonnegative
quadratic estimates of variance components in ANOVA models, J. American Statisti-
cal Assoc. 81(1986) 692-698
Kleusberg, A. and E.W. Grafarend (1981): Expectation and variance component estima-
tion of multivariate gyrotheodolite observation II, Allg. Vermessungsnachrichten 88
(1981) 104-108
Klonecki, W. and S. Zontek (1996): Improved estimators for simultaneous estimation of
variance components, Statistics & Probability Letters 29 (1996) 33-43
Knautz, H. (1996): Linear plus quadratic (LPQ) quasiminimax estimation in the linear
regression model, Acta Applicandae Mathematicae 43 (1996) 97-111
Knautz, H. (1999): Nonlinear unbiased estimation in the linear regression model with
nonnormal disturbances, J. Statistical Planning and Inference 81 (1999) 293-309
Knickmeyer, E.H. (1984): Eine approximative Lösung der allgemeinen linearen Geodäti-
schen Randwertaufgabe durch Reihenentwicklungen nach Kugelfunktionen, Deutsche
Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften, Mün-
chen 1984
Knobloch, E. (1992): Historical aspects of the foundations of error theory, in: Echeveria,
J., Ibarra, J. and T. Mormann (eds): The space of mathematics – philosophical, epis-
temological and historical explorations, Walter de Gruyter 1992
Kobilinsky, A. (1990): Complex linear models and cyclic designs, Linear Algebra and its
Application 127 (1990) 227-282
Koch, G.G. (1968): Some further remarks on “A general approach to the estimation of
variance components“, Technometrics 10 (1968) 551-558
Koch, K.R. (1979): Parameter estimation in the Gauß-Helmert model, Boll. Geod. Sci.
Affini 38 (1979) 553-563
Koch, K.R. (1982): S-transformations and projections for obtaining estimable parameters,
in: Blotwijk, M.J. et al. (eds.): 40 Years of Thought, Anniversary volume for Prof.
Baarda’s 65th Birthday Vol. 1. pp. 136-144, Technische Hogeschool Delft, Delft 1982
Koch, K.R. (1987): Parameterschaetzung und Hypothesentests in linearen Modellen,
2nd ed., Duemmler, Bonn 1987
Koch, K.R. (1988): Parameter estimation and hypothesis testing in linear models, Sprin-
ger-Verlag, Berlin – Heidelberg – New York, 1988
Koch, K.R. (1999): Parameter estimation and hypothesis testing in linear models, 2nd ed.,
Springer Verlag, Berlin 1999
Koch, K.R. and J. Kusche (2002): Regularization of geopotential determination from
satellite data by variance components, J. Geodesy 76 (2002) 259-268
Koch, K.R. and Y. Yang (1998): Konfidenzbereiche und Hypothesentests für robuste
Parameterschätzungen, Z. Vermessungswesen 123 (1998) 20-26
König, D. and V. Schmidt (1992): Zufällige Punktprozesse, Teubner Skripten zur Mathe-
matischen Stochastik, Stuttgart 1992
Koenker, R. and G. Bassett (1978): Regression quantiles, Econometrica 46 (1978) 33-50
Kokoszka, P. and T. Mikosch (2000): The periodogram at the Fourier frequencies, Sto-
chastic Processes and their Applications 86 (2000) 49-79
Kollo, T. and H. Neudecker (1993): Asymptotics of eigenvalues and unit-length eigenvec-
tors of sample variance and correlation matrices, J. Multivariate Anal. 47 (1993)
283-300
Kollo, T. and D. von Rosen (1996): Formal density expansions via multivariate mixtures,
in: Multidimensional statistical analysis and theory of random matrices, pp. 129-138,
Proceedings of the Sixth Lukacs Symposium, eds. Gupta, A.K. and V.L.Girko, VSP,
Utrecht 1996
Koopmans, T.C. and O. Reiersol (1950): The identification of structural characteristics,
Ann. Math. Statistics 21 (1950) 165-181
Kosko, B. (1992): Networks and fuzzy systems, Prentice-Hall, Englewood Cliffs 1992
Kotecky, R. and J. Niederle (1975): Conformally covariant field equations: First order
equations with non-vanishing mass, Czech. J. Phys. B25 (1975) 123-149
Kotlarski, I. (1967): On characterizing the gamma and the normal distribution, Pacific J.
Mathematics 20 (1967) 69-76
Kotsakis, C. and M.G. Sideris (2001): A modified Wiener-type filter for geodetic estima-
tion problems with non-stationary noise, J. Geodesy 75 (2001) 647-660
Kott, P.S. (1998): A model-based evaluation of several well-known variance estimators
for the combined ratio estimator, Statistica Sinica 8 (1998) 1165-1173
Kotz, S., Kozubowski, T.J. and K. Podgórski (2001): The Laplace distribution and generalizations, Birkhäuser 2001
Koukouvinos, C. and J. Seberry (1996): New weighing matrices, Sankhya: The Indian J.
Statistics B58 (1996) 221-230
Kowalewski, G. (1995): Robust estimators in regression, Statistics in Transition 2 (1995)
123-135
Krämer, W., Bartels, R. and D.G. Fiebig (1996): Another twist on the equality of OLS and
GLS, Statistical Papers 37 (1996) 277-281
Krantz, S.G. and H.R. Parks (2002): The implicit function theorem – history, theory and
applications, Birkhäuser, Boston 2002
Krarup, T., Juhl, J. and K. Kubik (1980): Götterdämmerung over least squares adjustment,
in: Proc. 14th Congress of the International Society of Photogrammetry, vol. B3,
Hamburg 1980, 369-378
Krengel, U. (1985): Ergodic theorems, de Gruyter, Berlin-New York 1985
Kres, H. (1983): Statistical tables for multivariate analysis, Springer, Berlin-Heidelberg-
New York 1985
Krishnakumar, J. (1996): Towards a general robust estimation approach for generalised
regression models, Physics Abstract, Science Abstract Series A, INSPEC 1996
Kronecker, L. (1903): Vorlesungen über die Theorie der Determinanten, Erster Band,
Bearbeitet und fortgeführt von K.Hensch, B.G. Teubner, Leipzig 1903
Krumbein, W.C. (1939): Preferred orientation of pebbles in sedimentary deposits, J. Geol.
47 (1939) 673-706
Krumm, F. (1987): Geodätische Netze im Kontinuum: Inversionsfreie Ausgleichung und
Konstruktion von Kriterionmatrizen, Deutsche Geodätische Kommission, Bayerische
Akademie der Wissenschaften, PhD. Thesis, Report C334, München 1987
Krumm, F. and F. Okeke (1998): Graph, graph spectra, and partitioning algorithms in a
geodetic network structural analysis and adjustment, Bolletino di Geodesia e Science
Affini 57 (1998) 1-24
Krumm, F., Grafarend, E.W. and B. Schaffrin (1986): Continuous networks, Fourier
analysis and criterion matrices, Manuscripta Geodaetica 11 (1986) 57-78
Kruskal, W. (1946): Helmert’s distribution, American Math. Monthly 53 (1946) 435-438
Kruskal, W. (1968): When are Gauß-Markov and least squares estimators identical? A
coordinate-free approach, Ann. Math. Statist. 39 (1968) 70-75
Kryanev, A.V. (1974): An iterative method for solving incorrectly posed problem, USSR.
Comp. Math. Math. Phys. 14 (1974) 24-33
Krzanowski, W.J. (1995): Recent advances in descriptive multivariate analysis, Clarendon
Press, Oxford 1995
Krzanowski, W.J. and F.H.C. Marriott (1994): Multivariate analysis: part I - distribution,
ordination and inference, Arnold Publ., London 1994
Krzanowski, W.J. and F.H.C. Marriott (1995): Multivariate analysis: part II - classifica-
tion, covariance structures and repeated measurements, Arnold Publ., London 1995
Kshirsagar, A.M. (1983): A course in linear models, Marcel Dekker Inc, New York –
Basel 1983
Kuang, S.L. (1991): Optimization and design of deformations monitoring schemes, PhD
dissertation, Department of Surveying Engineering, University of New Brunswick,
Tech. Rep. 91, Fredericton 1991
Kuang, S. (1996): Geodetic network analysis and optimal design, Ann Arbor Press, Chel-
sea, Michigan 1996
Kubácek, L. (1996a): Linear model with inaccurate variance components, Applications of
Mathematics 41 (1996) 433-445
Kubácek, L. (1996b): Nonlinear error propagation law, Applications of Mathematics 41
(1996) 329-345
Kubik, K. (1982): Kleinste Quadrate und andere Ausgleichsverfahren, Vermessung Pho-
togrammetrie Kulturtechnik 80 (1982) 369-371
Kubik, K., and Y. Wang (1991): Comparison of different principles for outlier detection,
Australian J. Geodesy, Photogrammetry and Surveying 54 (1991) 67-80
Kuechler, U. and M. Soerensen (1997): Exponential families of stochastic processes,
Springer Verlag, Berlin 1997
Kullback, S. (1934): An application of characteristic functions to the distribution problem
of statistics, Annals Math. Statistics 4 (1934) 263-305
Kumaresan, R. (1982): Estimating the parameters of exponentially damped or undamped
sinusoidal signals in noise, PhD thesis, The University of Rhode Island, Rhode Island
1982
Kundu, D. (1993a): Estimating the parameters of undamped exponential signals, Tech-
nometrics 35 (1993) 215-218
Kundu, D. (1993b): Asymptotic theory of least squares estimator of a particular nonlinear
regression model, Statistics and Probability Letters 18 (1993) 13-17
Kundu, D. (1994a): Estimating the parameters of complex valued exponential signals,
Computational Statistics and Data Analysis 18 (1994) 525-534
Kundu, D. (1994b): A modified Prony algorithm for sum of damped or undamped expo-
nential signals, Sankhya A 56 (1994) 525-544
Kundu, D. (1997): Asymptotic theory of the least squares estimators of sinusoidal signal,
Statistics 30 (1997) 221-238
Kundu, D. and R.D. Gupta (1998): Asymptotic properties of the least squares estimators
of a two dimensional model, Metrika 48 (1998) 83-97
Kundu, D. and A. Mitra (1998a): Fitting a sum of exponentials to equispaced data, The
Indian J.Statistics 60 (1998) 448-463
Kundu, D. and A. Mitra (1998b): Different methods of estimating sinusoidal frequencies:
a numerical comparison, J. Statist. Comput. Simul. 62 (1998) 9-27
Kunst, R.M. (1997): Fourth order moments of augmented ARCH processes, Commun.
Statist. Theor. Meth. 26 (1997) 1425-1441
Kunz, E. (1985): Introduction to commutative algebra and algebraic geometry, Birkhäuser
Boston – Basel – Berlin 1985
Kuo, W., Prasad, V.R., Tillman, F.A. and C.-L. Hwang (2001): Optimal reliability design,
Cambridge University Press, Cambridge 2001
Kuo, Y. (1976): An extended Kronecker product of matrices, J. Math. Anal. Appl. 56
(1976) 346-350
Kupper, L.L (1972): Fourier series and spherical harmonic regression, J. Royal Statist.
Soc. C21 (1972) 121-130
Kupper, L.L (1973): Minimax designs of Fourier series and spherical harmonic regres-
sions: a characterization of rotatable arrangements, J. Royal Statist. Soc. B35 (1973)
493-500
Kurata, H. (1998): A generalization of Rao's covariance structure with applications to
several linear models, J. Multivariate Analysis 67 (1998) 297-305
Kurz, L. and M.H. Benteftifa (1997): Analysis of variance in statistical image processing,
Cambridge University Press, Cambridge 1997
Kusche, J. (2001): Implementation of multigrid solvers for satellite gravity anomaly re-
covery, J. Geodesy 74 (2001) 773-782
Kusche, J. (2002): Inverse Probleme bei der Gravitationsfeldbestimmung mittels SST-
und SGG- Satellitenmissionen , Deutsche Geodätische Kommission, Report C548,
München 2002
Kusche, J. (2003): A Monte-Carlo technique for weight estimation in satellite geodesy, J.
Geodesy 76 (2003) 641-652
Kusche, J. and R. Klees (2002): Regularization of gravity field estimation from satellite
gravity gradients, J. Geodesy 76 (2002) 359-368
Kushner, H. (1967a): Dynamical equations for optimal nonlinear filtering, J. Diff. Eq. 3
(1967) 179-190
Kushner, H. (1967b): Approximations to optimal nonlinear filters, IEEE Trans. Auto.
Contr. AC-12 (1967) 546-556
Kutoyants, Y.A. (1984): Parameter estimation for stochastic processes, Heldermann,
Berlin 1984
Kutterer, H. (1994): Intervallmathematische Behandlung endlicher Unschärfen linearer
Ausgleichsmodelle, PhD Thesis, Deutsche Geodätische Kommission DGK C423,
München 1994
Kutterer, H. and S. Schoen (1999): Statistische Analyse quadratischer Formen - der De-
terminantenansatz, Allg. Vermessungsnachrichten 10 (1999) 322-330
Kutterer, H. (1999): On the sensitivity of the results of least-squares adjustments concern-
ing the stochastic model, J. Geodesy 73 (1999) 350-361
Kutterer, H. (2002): Zum Umgang mit Ungewissheit in der Geodäsie – Bausteine für eine
neue Fehlertheorie - , Deutsche Geodätische Kommission, Report C553, München
2002
Laeuter, H. (1970): Optimale Vorhersage und Schätzung in regulären und singulären
Regressionsmodellen, Math. Operationsforschg. Statistik 1 (1970) 229-243
Laeuter, H. (1971): Vorhersage bei stochastischen Prozessen mit linearem Regressionsan-
teil, Math. Operationsforschg. Statistik 2 (1971) 69-85, 147-166
Laha, R.G. (1956): On the stochastic independence of two second-degree polynomial
statistics in normally distributed variates, The Annals of Mathematical Statistics 27
(1956) 790-796
Laha, R.G. and E. Lukacs (1960): On a problem connected with quadratic regression,
Biometrika 47 (1960) 335-343
Lai, T.L. and C.Z. Wei (1982): Least squares estimates in stochastic regression model with applications to stochastic regression in linear dynamic systems, Annals of Statistics 10 (1982) 154-166
Laird, N.M. and J.H. Ware (1982): Random-effects models for longitudinal data, Biomet-
rics 38 (1982) 963-974
LaMotte, L.R. (1973): Quadratic estimation of variance components, Biometrics 29
(1973) 311-330
LaMotte, L.R. (1973): On non-negative quadratic unbiased estimation of variance com-
ponents, J. American Statist. Assoc. 68 (1973) 728-730
LaMotte, L.R. (1976): Invariant quadratic estimators in the random, one-way ANOVA
model, Biometrics 32 (1976) 793-804
LaMotte, L.R., McWhorter, A. and R.A. Prasad (1988): Confidence intervals and tests on
the variance ratio in random models with two variance components, Com. Stat. –
Theory Meth. 17 (1988) 1135-1164
Lamperti, J. (1966): Probability, Benjamin Publ. 1966
Lancaster, H.O. (1965): The Helmert matrices, American Math. Monthly 72 (1965) 4-11
Lancaster, H.O. (1966): Forerunners of the Pearson χ², Australian J. Statistics 8 (1966)
117-126
Langevin, P. (1905): Magnetisme et theorie des electrons, Ann. de Chim. et de Phys. 5
(1905) 70-127
Lanzinger, H. and U. Stadtmüller (2000): Weighted sums for i.i.d. random variables with
relatively thin tails, Bernoulli 6 (2000) 45-61
Lardy, L.J. (1975): A series representation for the generalized inverse of a closed linear
operator, Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 58 (1975) 152-157
Lauritzen, S. (1973): The probabilistic background of some statistical methods in Physical
Geodesy, Danish Geodetic Institute, Maddelelse 48, Copenhagen 1973
Lauritzen, S. (1974): Sufficiency, prediction and extreme models, Scand. J. Statist. 1
(1974) 128-134
Lawson, C.L. and R.J. Hanson (1995): Solving least squares problems, SIAM, Philadel-
phia 1995 (reprinting with corrections and a new appendix of a 1974 Prentice Hall
text)
Lawson, C.L. and R.J. Hanson (1974): Solving least squares problems, Prentice-Hall, Englewood Cliffs 1974
Laycock, P.J. (1975): Optimal design: regression models for directions, Biometrika 62
(1975) 305-311
LeCam, L. (1960): Locally asymptotically normal families of distributions, University of
California Publication 3, Los Angeles 1960
LeCam, L. (1970): On the assumptions used to prove asymptotic normality of maximum
likelihood estimators, Ann. Math. Statistics 41 (1970) 802-828
LeCam, L. (1986): Proceedings of the Berkeley conference in honor of Jerzy Neyman and
Jack Kiefer, Chapman and Hall, Boca Raton 1986
Lee, J.C. and S. Geisser (1996): On the Prediction of Growth Curves, in: Lee, C., John-
son, W.O. and A. Zellner (eds.): Modelling and Prediction Honoring Seymour Geis-
ser, pp. 77-103, Springer Verlag, New York 1996
Lee, J.C., Johnson, W.O. and A. Zellner (1996): Modeling and prediction, Springer Ver-
lag, New York 1996
Lee, M. (1996): Methods of moments and semiparametric econometrics for limited de-
pendent variable models, Springer Verlag, New York 1996
Lee, P. (1992): Bayesian statistics, Wiley, New York 1992
Lee, S.L. (1995): A practical upper bound for departure from normality, SIAM J. Matrix Anal. Appl. 16 (1995) 462-468
Lee, S.L. (1996): Best available bounds for departure from normality, SIAM J. Matrix Anal. Appl. 17 (1996) 984-991
Lee, S.Y. and J.-Q. Shi (1998): Analysis of covariance structures with independent and
non-identically distributed observations, Statistica Sinica 8 (1998) 543-557
Lee, Y. and J.A. Nelder (1996): Hierarchical generalized linear models, J. Roy. Statist.
Soc., Series B, 58 (1996) 619-678
Lehmann, E.L. (1959): Testing statistical hypotheses, J. Wiley, New York 1959
Lehmann, E.L. and H. Scheffé (1950): Completeness, similar regions and unbiased esti-
mation, Part I, Sankya 10 (1950) 305-340
Lehmann, E.L. and G. Casella (1998): Theory of point estimation, Springer Verlag, New
York 1998
Lenk, U. (2001a): Schnellere Multiplikation großer Matrizen durch Verringerung der
Speicherzugriffe und ihr Einsatz in der Ausgleichungsrechnung, Z. Vermessungswe-
sen 4 (2001) 201-207
Lenk, U. (2001b): 2.5D-GIS und Geobasisdaten – Integration von Höheninformation und
Digitalen Situationsmodellen, Wiss. Arbeiten der Fachrichtung Vermessungswesen
der Uni. Hannover, Hannnover 2001
Lenstra, A.K., Lenstra, H.W. and L. Lovacz (1982): Factoring polynomials with rational
coefficients, Math. Ann. 261 (1982) 515-534
Lenstra, H.W. (1983): Integer programming with a fixed number of variables, Math.
Operations Res. 8 (1983) 538-548
Lenth, R.V. (1981): Robust measures of location for directional data, Technometrics 23
(1981) 77-81
Lentner, M.N. (1969): Generalized least-squares estimation of a subvector of parameters
in randomized fractional factorial experiments, Ann. Math. Statist. 40 (1969) 1344-
1352
Lenzmann, L. (2003): Strenge Auswertung des nichtlinearen Gauß-Helmert-Modells, Allg.
Vermessungsnachrichten 2 (2004) 68-73
Lesaffre, E. and G. Verbeke (1998): Local influence in linear mixed models, Biometrics
54 (1998) 570 – 582
Letac, G. and M. Mora (1990): Natural real exponential families with cubic variance
functions, Ann. Statist. 18 (1990) 1-37
Lether, F.G. and P.R. Wentson (1995): Minimax approximations to the zeros of Pn(x) and
Gauss-Legendre quadrature, J. Comput. Appl. Math. 59 (1995) 245-252
Levenberg, K. (1944): A method for the solution of certain non-linear problems in least-
squares, Quart. Appl. Math. Vol. 2 (1944) 164-168
Levin, J. (1976): A parametric algorithm for drawing pictures of solid objects composed
of quadratic surfaces, Communications of the ACM 19 (1976) 555-563
Lewis, R.M. and V. Torczon (2000): Pattern search methods for linearly constrained
minimization, SIAM J. Optim. 10 (2000) 917-941
Lewis, T.O. and T.G. Newman (1968): Pseudoinverses of positive semidefinite matrices,
SIAM J. Appl. Math. 16 (1968) 701-703
Li, B.L. and C. Loehle (1995): Wavelet analysis of multiscale permeabilities in the sub-
surface, Geoph. Res. Lett. 22 (1995) 3123-3126
Li, C.K. and R. Mathias (2000): Extremal characterizations of the Schur complement and
resulting inequalities, SIAM Review 42 (2000) 233-246
Li, T. (2000): Estimation of nonlinear errors-in variables models: a simulated minimum
distance estimator, Statistics & Probability Letters 47 (2000) 243-248
Liang, K. and K. Ryu (1996): Selecting the form of combining regressions based on re-
cursive prediction criteria, in: Lee, C., Johnson, W.O. and A. Zellner (eds.): Model-
ling and prediction honoring Seymour Geisser, Springer Verlag, New York 1996,
122-135
Liang, K.Y. and S.L. Zeger (1986): Longitudinal data analysis using generalized linear
models, Biometrika 73 (1986) 13-22
Liang, K.Y. and S.L. Zeger (1995): Inference based on estimating functions in the pres-
ence of nuisance parameters, Statist. Sci. 10 (1995) 158-199
Liesen, J., Rozlozník, M. and Z. Strakos (2002): Least squares residuals and minimal
residual methods, SIAM J. Sci. Comput. 23 (2002) 1503-1525
Lindley, D.V. and A.F.M. Smith (1972): Bayes estimates for the linear model, J. Roy.
Stat. Soc. 34 (1972) 1-41
Lilliefors, H.W. (1967): On the Kolmogorov-Smirnov test for normality with mean and
variance unknown, J. American Statistical Assoc. 62 (1967) 399-402
Lin, A. and S.-P. Han (2004): A class of methods for projection on the intersection of
several ellipsoids, SIAM J. Optim 15 (2004) 129-138
Lin, X. and N.E. Breslow (1996): Bias correction in generalized linear mixed models with
multiple components of dispersion, J. American Statistical Assoc. 91 (1996) 1007-
1016
Lin, X.H. (1997): Variance component testing in generalised linear models with random
effects, Biometrika 84 (1997) 309-326
Lindsey, J.K. (1997): Applying generalized linear models, Springer Verlag, New York
1997
Lindsey, J.K. (1999): Models for repeated measurements, 2nd edition, Oxford University
Press, Oxford 1999
Lingjaerde, O. and N. Christophersen (2000): Shrinkage structure of partial least squares,
Scandinavian J. Statistics 27 (2000) 459-473
Linke, J., Jurisch, R., Kampmann, G. and H. Runne (2000): Numerisches Beispiel zur
Strukturanalyse von geodätischen und mechanischen Netzen mit latenten Restriktio-
nen, Allg. Vermessungsnachrichten 107 (2000) 364-368
Linnik, J.V. and I.V. Ostrovskii (1977): Decomposition of random variables and vectors,
Transl. Math. Monographs Vol. 48, American Mathematical Society, Providence
1977
Liptser, R.S. and A.N. Shiryayev (1977): Statistics of random processes, Vol. 1, Springer
Verlag, New York 1977
Liski, E.P. and T. Nummi (1996): Prediction in repeated-measures models with engineer-
ing applications, Technometrics 38 (1996) 25-36
Liski, E.P. and S. Puntanen (1989): A further note on a theorem on the difference of the
generalized inverses of two nonnegative definite matrices, Commun. Statist.-Theory
Meth. 18 (1989) 1747-1751
Liski, E.P., Luoma, A. and A. Zaigraev (1999): Distance optimality design criterion in
linear models, Metrika 49 (1999) 193-211
Liski, E.P., Luoma, A., Mandal, N.K. and B.K. Sinha (1998): Pitman nearness, distance
criterion and optimal regression designs, Calcutta Statistical Ass. Bull. 48 (1998) 191-
192
Liski, E.P., Mandal, N.K., Shah, K.R. and B.K. Sinha (2002): Topics in optimal design,
Springer Verlag, New York 2002
Liu, J. (2000): MSEM dominance of estimators in two seemingly unrelated regressions, J.
Statistical Planning and Inference 88 (2000) 255-266
Liu, S. (2000): Efficiency comparisons between the OLSE and the BLUE in a singular
linear model, J. Statistical Planning and Inference 84 (2000) 191-200
Liu, X.-W. and Y.-X. Yuan (2000): A robust algorithm for optimization with general
equality and inequality constraints, SIAM J. Sci. Comput. 22 (2000) 517-534
Liu, W., Yao, Y. and C. Shi (2001): Theoretic research on robustified least squares esti-
mator based on equivalent variance-covariance, Geo-spatial Information Science 4
(2001) 1-8
Liu, W., Xia, Z. and M. Deng (2001): Modelling fuzzy geographic objects within fuzzy
fields, Geo-spatial Information Science 4 (2001) 37-42
Ljung, L. (1979): Asymptotic behavior of the extended Kalman filter as a parameter
estimator for linear systems, IEEE Trans. Auto. Contr. AC-24 (1979) 36-50
Lloyd, E.H. (1952): Least squares estimation of location and scale parameters using order
statistics, Biometrika 39 (1952) 88-95
Lohse, P. (1994): Ausgleichungsrechnung in nichtlinearen Modellen, Deutsche Geod.
Kommission C 429, München 1994
Long, J.S. and L.H. Ervin (2000): Using heteroscedasticy consistent standard errors in the
linear regression model, The American Statistician 54 (2000) 217-224
Longford, N.T. (1993): Random coefficient models, Clarendon Press, Oxford 1993
Longford, N. (1995): Random coefficient models, Oxford University Press, 1995
Longley, J.W. and R.D. Longley (1997): Accuracy of Gram-Schmidt orthogonalization
and householder transformation for the solution of linear least squares problems, Nu-
merical Linear Algebra with Applications 4 (1997) 295-303
López-Blázquez, F. (2000): Unbiased estimation in the non-central Chi-Square distribu-
tion, J. Multivariate Analysis 75 (2000) 1-12
Lord, R.D. (1948): A problem with random vectors, Phil. Mag. 39 (1948) 66-71
Lord, R.D. (1954): The use of the Hankel transform in statistics, I. General theory and
examples, Biometrika 41 (1954) 44-55
Lorentz, G.G. (1966): Metric entropy and approximation, Bull. American Math. Soc. 72
(1966) 903-937
Ludlow, J. and W. Enders (2000): Estimating non-linear ARMA models using Fourier
coefficients, International J.Forecasting 16 (2000) 333-347
Lütkepohl, H. (1996): Handbook of matrices, J. Wiley, Chichester U.K. 1996
Lund, U. (1999): Least circular distance regression for directional data, J. Applied Statis-
tics 26 (1999) 723-733
Macinnes, C.S. (1999): The solution to a structured matrix approximation problem using
Grassmann coordinates, SIAM J. Matrix Analysis Appl. 21 (1999) 446-453
Madansky, A. (1959): The fitting of straight lines when both variables are subject to error,
J. American Statistical Ass. 54 (1959) 173-205
Madansky, A. (1962): More on lenght of confidence intervals, J. Amer. Statist. Assoc. 57
(1962) 586-589
Maejima, M. (1978): Some Lp versions for the central limit theorem, Ann. Probability 6
(1978) 341-344
Maekkinen, J. (2002): A bound for the Euclidean norm of the difference between the best
linear unbiased estimator and a linear unbiased estimator, J. Geodesy 76 (2002) 317-
322
Maess, G. (1988): Vorlesungen über numerische Mathematik II. Analysis, Birkhäuser
Verlag, Basel Boston 1988
Magder, L.S. and S.L. Zeger (1996): A smooth nonparametric estimate of a mixing distri-
bution using mixtures of Gaussians, J. Amer. Statist. Assoc. 91 (1996) 1141-1151
Magee, L. (1998): Improving survey-weighted least squares regression, J. Roy Statist.
Soc. B 60 (1998) 115-126
Magness, T.A. and J.B. McGuire (1962): Comparison of least-squares and minimum
variance estimates of regression parameters, Ann. Math. Statist. 33 (1962) 462-470
Magnus, J.R. and H. Neudecker (1988): Matrix differential calculus with applications in
statistics and econometrics, Wiley, Chichester 1988
Mahalanabis, A. and M. Farooq (1971): A second-order method for state estimation of
nonlinear dynamical systems, Int. J. Contr. 14 (1971) 631-639
Mahalanobis, P.C., Bose, R.C. and S.N. Roy (1937): Normalisation of statistical variates
and the use of rectangular coordinates in the theory of sampling distributions, Sank-
hya 3 (1937) 1-40
Mallet, A. (1986): A maximum likelihood estimation method for random coefficient
regression models, Biometrika 73 (1986) 645-656
Malliavin, P. (1997): Stochastic analysis, Springer Verlag, New York 1997
Mallows, C.L. (1961): Latent vectors of random symmetric matrices, Biometrika 48
(1961) 133-149
Malyutov, M.B. and R.S. Protassov (1999): Functional approach to the asymptotic nor-
mality of the non-linear least squares estimator, Statistics & Probability Letters 44
(1999) 409-416
Mamontov, Y. and M. Willander (2001): High-dimensional nonlinear diffusion stochastic
processes, World Scientific, Singapore 2001
Mandel, J. (1994): The analysis of two-way layouts, Chapman and Hall, Boca Raton 1994
Mangasarian, O.L. and D.R. Musicant (2000): Robust linear and support vector regres-
sion, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 950-955
Mangoubi, R.S. (1998): Robust estimation and failure detection, Springer Verlag, Berlin-
Heidelberg New York 1998
Manly, B.F.J. (1976): Exponential data transformations, The Statistician 25 (1976) 37-42
Mardia, K.V. (1962): Multivariate Pareto distributions, Ann. Math. Statistics 33 (1962)
1008-1015
Mardia, K.V. (1970): Measures of multivariate skewness and kurtosis with applications,
Biometrika 57 (1970) 519-530
Mardia, K.V. (1972): Statistics of directional data, Academic Press, New York 1972
Mardia, K.V. (1975a): Characterization of directional distributions, Statistical Distribu-
tions, Scientific Work 3 (1975), G.P. Patil et al (Eds.), 365-385
Mardia, K.V. (1975b): Statistics of directional data, J. Royal Statistical society, Series B,
37 (1975) 349-393
Mardia, K.V. (1976): Linear-circular correlation coefficients and rhythmometry, Bio-
metrika 63 (1976) 403-405
Mardia, K.V. (1988): Directional data analysis: an overview, J. Applied Statistics 2
(1988) 115-122
Mardia, K.V. and M.L. Puri (1978): A robust spherical correlation coefficient against
scale, Biometrika 65 (1978) 391-396
Mardia, K.V., Kent, J.T. and J.M. Bibby (1979): Multivariate analysis, Academic Press, London 1979
Mardia, K.V., Southworth, H.R. and C.C. Taylor (1999): On bias in maximum likelihood
estimators, J. Statist. Planning and Inference 76 (1999) 31-39
Mardia, K.V. and P.E. Jupp (1999): Directional statistics, J. Wiley, England 1999
Marinkovic, P., Grafarend, E.W. and T. Reubelt (2003): Space gravity spectroscopy: the
benefits of Taylor-Karman structured criterion matrices, Advances in Geosciences 1
(2003) 113-120
Mariwalla, K.H. (1971): Coordinate transformations that form groups in the large, in: De
Sitter and Conformal Groups and their Applications, A.O. Barut and W.E. Brittin
(Hrsg.), vol. 13, pages 177-191, Colorado Associated University Press, Boulder 1971
Markatou, M. and X. He (1994): Bounded influence and high breakdown point testing
procedures in linear models, J. Am. Statist. Assn. 89, 543-49, 1994
Markiewicz, A. (1996): Characterization of general ridge estimators, Statistics & Prob-
ability Letters 27 (1996) 145-148
Markov, A.A. (1912): Wahrscheinlichkeitsrechnung, 2nd edition, Teubner, Leipzig 1912
Marošević, T. and D. Jukić (1997): Least orthogonal absolute deviations problem for
exponential function, Student 2 (1997) 131-138
Marquardt, D.W. (1963): An algorithm for least-squares estimation of nonlinear parame-
ters, J. Soc. Indust. Appl. Math. 11 (1963) 431-441
Marquardt, D.W. (1970): Generalized inverses, ridge regression, biased linear estimation
and nonlinear estimation, Technometrics 12 (1970) 591-612
Marriott, J. and P. Newbold (1998): Bayesian comparison of ARIMA and stationary
ARMA models, J. Statistical Review 66 (1998) 323-336
Marsaglia, G. and G.P.H. Styan (1972): When does rank (A+B) = rank A + rank B?,
Canad. Math. Bull. 15 (1972) 451-452
Marshall, J. (2002): L1-norm pre-analysis measures for geodetic networks, J. Geodesy 76
(2002) 334-344
Martinek, Z. (2002a): Spherical harmonic analysis of regularly distributed data on a
sphere with a uniform or a non-uniform distribution of data uncertainties, or, shortly,
what I know about: Scalar surface spherical harmonics, Geodätisches Oberseminar
Stuttgart 2002
Martinek, Z. (2002b): Lecture Notes 2002. Scalar surface spherical harmonics, GeoForschungsZentrum Potsdam 2002
Martinez, W.L. and E.J. Wegman (2000): An alternative criterion useful for finding exact
E-optimal designs, Statistic & Probability Letters 47 (2000) 325-328
Maruyama, Y. (1998): Minimax estimators of a normal variance, Metrika 48 (1998) 209-
214
Masry, E. (1997): Additive nonlinear ARX time series and projection estimates, Economet-
ric Theory 13 (1997) 214-252
Mastronardi, N., Lemmerling, P. and S. van Huffel (2000): Fast structured total least
squares algorithm for solving the basic deconvolution problem, SIAM J. Matrix Anal.
Appl. 22 (2000) 533-553
Matérn, B. (1989): Precision of area estimation: a numerical study, J. Microscopy 153
(1989) 269-284
Mathai, A.M. (1997): Jacobians of matrix transformations and functions of matrix argu-
ments, World Scientific, Singapore 1997
Mathar, R. (1997): Multidimensionale Skalierung: mathematische Grundlagen und algo-
rithmische Aspekte, Teubner Stuttgart 1997
Mathew, T. (1989): Optimum invariant tests in mixed linear models with two variance
components, Statistical Data Analysis and Inference, Y. Dodge (ed.), North-Holland
(1989) 381-388
Mathew, T. (1997): Wishart and Chi-Square Distributions Associated with Matrix Quad-
ratic Forms, J. Multivariate Analysis 61 (1997) 129-143
Mathew, T. and B.K. Sinha (1988): Optimum tests in unbalanced two-way models with-
out interaction, Ann. Statist. 16 (1988) 1727-1740
Mathew, T. and S. Kasala (1994): An exact confidence region in multivariate calibration,
The Annals of Statistics 22 (1994) 94-105
Mathew, T. and W. Zha (1996): Conservative confidence regions in multivariate calibra-
tion, The Annals of Statistics 24 (1996) 707-725
Mathew, T. and K. Nordstroem (1997): An inequality for a measure of deviation in linear
models, The American Statistician 51 (1997) 344-349
Mathew, T. and W. Zha (1997): Multiple use confidence regions in multivariate calibra-
tion, J. American Statist. Assoc. 92 (1997) 1141-1150
Mathew, T. and W. Zha (1998): Some single use confidence regions in a multivariate
calibration problem, Applied Statist. Science III (1998) 351-363
Mathew, T., Sharma, M.K. and K. Nordström (1998): Tolerance regions and multiple-use
confidence regions in multivariate calibration, The Annals of Statistics 26 (1998)
1989-2013
Mathias, R. and G.W. Stewart (1993): A block QR algorithm and the singular value de-
composition, Linear Algebra Appl. 182 (1993) 91-100
Mautz, R. (2002): Solving nonlinear adjustment problems by global optimization, Boll. di
Geodesia e Scienze Affini 61 (2002) 123-134
Maxwell, S.E. (2003): Designing experiments and analyzing data. A model comparison
perspective, Lawrence Erlbaum Associates, Publishers, London - New Jersey 2003
Maybeck, P. (1979): Stochastic models, estimation and control, vol. 1, Academic Press,
New York 1979
Mayer, D.H. (1975): Vector and tensor fields on conformal space, J. Math. Physics 16
(1975) 884-893
Mazya, V. and T. Shaposhnikova (1998): Jacques Hadamard, a universal mathematician, American Mathematical Society, Providence 1998
McCullagh, P. (1983): Quasi-likelihood functions, The Annals of Statistics 11 (1983) 59-
67
McCullagh, P. (1987): Tensor methods in statistics, Chapman and Hall, London 1987
McCullagh, P. and J.A. Nelder (1989): Generalized linear models, Chapman and Hall,
London 1989
McCulloch, C.E. (1997): Maximum likelihood algorithms for generalized linear mixed
models, J. American Statist. Assoc. 92 (1997) 162-170
McCulloch, C.E. and S.R. Searle (2002): Generalized, linear, and mixed models, Wiley Series in Probability and Statistics 2002
McElroy, F.W. (1967): A necessary and sufficient condition that ordinary least-squares
estimators be best linear unbiased, J. American Statistical Association 62 (1967)
1302-1304
McGilchrist, C.A. (1994): Estimation in generalized mixed models, J. Roy. Statist. Soc.,
Series B, 56 (1994) 61-69
McGilchrist, C.A. and C.W. Aisbett (1991): Restricted BLUP for mixed linear models,
Biometrical Journal 33 (1991) 131-141
McGilchrist, C.A. and K.K.W. Yau (1995): The derivation of BLUP, ML, REML estima-
tion methods for generalised linear mixed models, Commun. Statist.-Theor. Meth. 24
(1995) 2963-2980
McMorris, F.R. (1997): The median function on structured metric spaces, Student 2
(1997) 195-201
Mehta, M.L. (1991): Random matrices, Academic Press, New York 1991
Meissl, P. (1965): Über die Innere Genauigkeit dreidimensionaler Punkthaufen, Z. Ver-
messungswesen 90 (1965) 198-118
Meissl, P. (1969): Zusammenfassung und Ausbau der inneren Fehlertheorie eines Punkt-
haufens, Deutsche Geod. Kommission A 61, München 1969, 8-21
Meissl, P. (1976): Hilbert spaces and their application to geodetic least-squares problems,
Bolletino di Geodesia e Scienze Affini 35 (1976) 49-80
Melbourne, W. (1985): The case for ranging in GPS-based geodetic systems, Proc. 1st Int.
Symp. on Precise Positioning with GPS, Rockville, Maryland (1985) 373-386
Menz, J. (2000): Forschungsergebnisse zur Geomodellierung und deren Bedeutung, Ma-
nuskript 2000
Menz, J. and N. Kolesnikov (2000): Bestimmung der Parameter der Kovarianzfunktionen
aus den Differenzen zweier Vorhersageverfahren, Manuskript, 2000
Merriman, M. (1877): On the history of the method of least squares, The Analyst 4 (1877)
Merriman, M. (1884): A textbook on the method of least squares, Wiley, New York 1884
Meyer, C.D. (1973): Generalized inverses and ranks of block matrices, SIAM J. Appl.
Math. 25 (1973) 597-602
Mhaskar, H.N., Narcowich, F.J and J.D. Ward (2001): Representing and analyzing scat-
tered data on spheres, In: Multivariate approximations and applications, Cambridge
University Press, Cambridge 2001, 44-72
Midi, H. (1999): Preliminary estimators for robust non-linear regression estimation, J.
Applied Statistics 26 (1999) 591-600
Migon, H.S. and D. Gammermann (1999): Statistical inference, Arnold London 1999
Miller, R.G. (1981): Simultaneous statistical inference, Springer Verlag 1981
Minami, M. and K. Shimizu (1998): Estimation for a common correlation coefficient in
bivariate normal distributions with missing observations, Biometrics 54 (1998) 1136-
1146
Minzhong, J. and C. Xiru (1999): Strong consistency of least squares estimate in multiple
regression when the error variance is infinite, Statistica Sinica 9 (1999) 289-296
Minkler, G. and J. Minkler (1993): Theory and application of Kalman filtering, Magellan
Book Company 1993
Misra, R.K. (1996): A multivariate procedure for comparing mean vectors for populations
with unequal regression coefficient and residual covariance matrices, Biom. J. 38
(1996) 415-424
Mitra, S.K. (1982): Simultaneous diagonalization of rectangular matrices, Linear Algebra
Appl. 47 (1982) 139-150
Mjulekar, M.S. and S.N. Mishra (2000): Confidence interval estimation of overlap: Equal
means case, Computational Statistics & Data Analysis 34 (2000) 121-137
Mohan, S.R. and S.K. Neogy (1996): Algorithms for the generalized linear complemen-
tarity problem with a vertical block Z-matrix, Siam J. Optimization 6 (1996) 994-
1006
Moire, C. and J.A. Dawson (1992): Distribution, Chapman and Hall, Boca Raton 1996
Money, A.H. et al. (1982): The linear regression model: Lp-norm estimation and the
choice of p, Commun. Statist. Simul. Comput. 11 (1982) 89-109
Monin, A.S. and A.M. Yaglom (1981): Statistical fluid mechanics: mechanics of turbu-
lence, vol. 2, The Mit Press, Cambridge 1981
Montgomery, D.C. (1996): Introduction to statistical quality control, 3rd edition, J. Wiley,
New York 1996
Mood, A.M., F.A. Graybill and D.C. Boes (1974): Introduction to the theory of statistics,
3rd ed., McGraw-Hill, New York 1974
Moon, M.S. and R.F. Gunst (1994): Estimation of the polynomial errors-in-variables
model with decreasing error variances, J. Korean Statist. Soc. 23 (1994) 115-134
Moon, M.S. and R.F. Gunst (1995): Polynomial measurement error modelling, Comput.
Statist. Data Anal. 19 (1995) 1-21
Moore, E.H. (1900): A fundamental remark concerning determinantal notations with the
evaluation of an important determinant of special form, Ann.Math. 2 (1900) 177-188
Moore, E.H. (1920): On the reciprocal of the general algebraic matrix, Bull. Amer. Math.
Soc 26 (1920) 394-395
Morgan, B.J.T. (1992): Analysis of quantal response data, Chapman and Hall, Boca Raton
1992
Morgenthaler, S. (1992): Least-absolute-deviations fits for generalized linear models,
Biometrika 79 (1992) 747-754
Morgera, S. (1992): The role of abstract algebra in structured estimation theory, IEEE
Trans. Inform. Theory 38 (1992) 1053-1065
Moritz, H. (1976): Covariance functions in least-squares collocation, Rep. Ohio State Uni.
240, 1976
Morris, C.N. (1982): Natural exponential families with quadratic variance functions, Ann.
Statist. 10 (1982) 65-80
Morrison, D.F. (1967): Multivariate statistical methods, Mc Graw-Hill Book Comp., New
York 1967
Morrison, T.P. (1997): The art of computerized measurement, Oxford University Press,
Oxford 1997
Morsing, T. and C. Ekman (1998): Comments on construction of confidence intervals in
connection with partial least-squares, J. Chemometrics 12 (1998) 295-299
Moser, B.K. and M.H. McCann (1996): Maximum likelihood and restricted maximum
likelihood estimators as functions of ordinary least squares and analysis of variance
estimators, Commun. Statist.-Theory Meth. 25 (1996) 631-646
Moser, B.K. and J.K. Sawyer (1998): Algorithms for sums of squares and covariance
matrices using Kronecker Products, The American Statistician 52 (1998) 54-57
Moutard, T. (1894): Notes sur la propagation des ondes et les équations de l'hydrodynamique, Paris 1893, reprint Chelsea Publ., New York 1949
Mudholkar, G.S. (1997): On the efficiencies of some common quick estimators, Commun.
Statist.-Theory Meth. 26 (1997) 1623-1647
Mueller, C.H. (1997): Robust planning and analysis of experiments, Springer Verlag,
New York 1997
Mueller, C.H. (1998): Breakdown points of estimators for aspects of linear models, in:
MODA 5 – Advances in model-oriented data analysis and experimental design, pp.
137-144, Atkinson, A.K., Pronzato, L. and H.P. Wynn (eds), Physica Verlag 1998
Mueller, C.H. (2003): Robust estimators for estimating discontinuous functions, Devel-
opments in Robust Statistics, pp. 266-277, Physica Verlag, Heidelberg 2003
Mueller-Gronbach, T. (1996): Optimal designs for approximating the path of a stochastic
process, J. Statistical Planning and Inference 49 (1996) 371-385
Mueller, H. (1983): Strenge Ausgleichung von Polygonnetzen unter rechentechnischen
Aspekten, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaf-
ten C279, München 1983
Mueller, H. (1985): Second-order design of combined linear-angular geodetic networks,
Bulletin Geodésique, 59 (1985), 316-331
Mueller, J. (1987): Sufficiency and completeness in the linear model, J. Multivariate
Anal. 21 (1987) 312-323
Mueller, J., Rao, C.R. and B.K. Sinha (1984): Inference on parameters in a linear model: a
review of recent results, in: Experimental design, statistical models and genetic statis-
tics, K. Hinkelmann (ed.), Chap. 16, Marcel Dekker, New York 1984
Mueller, W. (1995): Ein Beispiel zur Versuchsplanung bei korrelierten Beobachtungen, Österreichische Zeitschrift für Statistik 24 (1995) 9-15
Mueller, W. (1998a): Spatial data collection, contributions to statistics, Physica Verlag,
Heidelberg 1998
Mueller, W. (1998b): Collecting spatial data - optimum design of experiments for random
fields, Physica-Verlag, Heidelberg 1998
Mueller, W. (2001): Collecting spatial data - optimum design of experiments for random
fields, 2nd ed., Physica-Verlag, Heidelberg 2001
Mueller, W. and A. Pázman (1998): Design measures and extended information matrices
for experiments without replications, J. Statist. Planning and Inference 71 (1998) 349-
362
Mueller, W. and A. Pázman (1999): An algorithm for the computation of optimum de-
signs under a given covariance structure, Computational Statistics 14 (1999) 197-211
Mukhopadhyay, P. and R. Schwabe (1998): On the performance of the ordinary least
squares method under an error component model, Metrika 47 (1998) 215-226
Muir, T. (1911): The theory of determinants in the historical order of development, vol-
umes 1-4, Dover, New York 1911, reprinted 1960
Muirhead, R.J. (1982): Aspects of multivariate statistical theory, J. Wiley, New York
1982
Mukherjee, K. (1996): Robust estimation in nonlinear regression via minimum distance
method, Mathematical Methods of Statistics 5 (1996) 99-112
Mukhopadhyay, N. (2000): Probability and statistical inference, Dekker, New York 2000
Muller, C. (1966): Spherical harmonics – Lecture notes in mathematics 17 (1966),
Springer-Verlag, New York, 45pp.
Muller, D. and W.W.S. Wei (1997): Iterative least squares estimation and identification of
the transfer function model, J. Time Series Analysis 18 (1997) 579-592
Muller, H. and M. Illner (1984): Gewichtsoptimierung geodätischer Netze. Zur Anpas-
sung von Kriteriumsmatrizen bei der Gewichtsoptimierung, Allg. Vermessungsnach-
richten (1984), 253-269
Muller, H. and G. Schmitt (1985): SODES2 – Ein Programm-System zur Gewichtsopti-
mierung zweidimensionaler geodätischer Netze. Deutsche Geodätische Kommission,
München, Reihe B, 276 (1985)
Munoz-Pichardo, J.M., Munoz-García, J., Fernández-Ponce, J.M. and F. López-Blázquez
(1998): Local influence on the general linear model, Sankhya: The Indian J. Statistics
60 (1998) 269-292
Murray, J.K. and J.W. Rice (1993): Differential geometry and statistics, Chapman and
Hall, Boca Raton 1993
Myers, J.L. (1979): Fundamentals of experimental designs, Allyn and Bacon, Boston
1979
Nadaraya, E. (1993): Limit distribution of the integrated squared error of trigonometric
series regression estimator, Proceedings of the Georgian Academy of Sciences.
Mathematics 1 (1993) 221-237
Naes, T. and H. Martens (1985): Comparison of prediction methods for multicollinear
data, Commun. Statist. Simulat. Computa. 14 (1985) 545-576
Naether, W. (1985): Exact designs for regression models with correlated errors, Statistics
16 (1985) 479-484
Nagaev, S.V. (1979): Large deviations of sums of independent random variables, Ann.
Probability 7 (1979) 745-789
Nagar, D.K. and A.K. Gupta (1996): On a test statistic useful in Manova with structured
covariance matrices, J. Appl. Stat. Science 4 (1996) 185-202
Nagaraja, H.N. (1982): Record values and extreme value distributions, J. Appl. Prob. 19
(1982) 233-239
Nagarsenker, B.N. (1977): On the exact non-null distributions of the LR criterion in a
general MANOVA model, Sankhya A39 (1977) 251-263
Nahin, P.J. (2004): When least is best, how mathematicians discovered many clever ways
to make things as small (or as large) as possible, Princeton University Press, Prince-
ton and Oxford 2004
Nakamura, N. and S. Konishi (1998): Estimation of a number of components for multi-
variate normal mixture models based on information criteria, Japanese J. Appl. Statist
27 (1998) 165-180
Nakamura, T. (1990): Corrected score function for errors-invariables models: methodol-
ogy and application to generalized linear models, Biometrika 77 (1990) 127-137
Nandi, S. and D. Kundu (1999): Least-squares estimators in a stationary random field, J.
Indian Inst. Sci. 79 (1999) 75-88
Nashed, M.Z. (1976): Generalized inverses and applications, Academic Press, New York
1976
Nelson, R. (1995): Probability, stochastic processes, and queueing theory, Springer Ver-
lag, New York 1995
Nesterov, Y.E. and A.S. Nemirovskii (1992): Interior point polynomial methods in con-
vex programming, Springer Verlag, New York 1992
Neudecker, H. (1968): The Kronecker matrix product and some of its applications in
econometrics, Statistica Neerlandica 22 (1968) 69-82
Neudecker, H. (1969): Some theorems on matrix differentiation with special reference to
Kronecker matrix products, J. American Statist. Assoc. 64 (1969) 953-963
Neudecker, H. (1977): Bounds for the bias of the least squares estimator of σ² in the case of a first-order autoregressive process (positive autocorrelation), Econometrica 45 (1977) 1257-1262
Neudecker, H. (1978): Bounds for the bias of the LS estimator of σ² in the case of a first-order (positive) autoregressive process when the regression contains a constant term, Econometrica 46 (1978) 1223-1226
Neumaier, A. (1998): Solving ill-conditioned and singular systems: a tutorial on regu-
larization, SIAM Rev. 40 (1998) 636-666
Neuts, M.F. (1995): Algorithmic probability, Chapman and Hall, Boca Raton 1995
Neutsch, W. (1995): Koordinaten: Theorie und Anwendungen, Spektrum Akademischer
Verlag, Heidelberg 1995
Newey, W.K. and J.L. Powell (1987): Asymmetric least squares estimation and testing,
Econometrica 55 (1987) 819-847
Newman, D. (1939): The distribution of range in samples from a normal population,
expressed in terms of an independent estimate of standard deviation, Biometrika 31
(1939) 20-30
Neykov, N.M. and C.H. Mueller (2003): Breakdown point and computation of trimmed
likelihood estimators in generalized linear models, in: R. Dutter, P. Filzmoser, U.
Gather, P.J. Rousseeuw (Eds.), Developments in Robust Statistics, pp. 277-286,
Physica Verlag, Heidelberg 2003
Neyman, J. (1934): On the two different aspects of the representative method, J. Royal
Statist. Soc. 97 (1934) 558-606
Neyman, J. (1937): Outline of the theory of statistical estimation based on the classical
theory of probability, Phil. Trans. Roy. Soc. London 236 (1937) 333-380
Ng, M.K. (2000): Preconditioned Lanczos methods for the minimum eigenvalue of a
symmetric positive definite Toeplitz matrix, SIAM J. Sci. Comput. 21 (2000) 1973-
1986
Nicholson, W.K. (1999): Introduction to abstract algebra, 2nd ed., J. Wiley, New York
1999
Niemeier, W. (1985): Deformationsanalyse, in: Geodätische Netze in Landes- und Ingeni-
eurvermessung II, Kontaktstudium, ed. H. Pelzer, Wittwer, Stuttgart 1985
Nkuite, G. (1998): Ausgleichung mit singulärer Varianzkovarianzmatrix am Beispiel der
geometrischen Deformationsanalyse, Dissertation, TU München, München 1998
Nobre, S. and M. Teixeiras (2000): Der Geodät Wilhelm Jordan und C.F. Gauss, Gauss-
Gesellschaft e.V. Goettingen, Mitt. 38, pp. 49-54 Goettingen 2000
Nyquist, H. (1988): Least orthogonal absolute deviations, Comput. Statist. Data Anal. 6
(1988) 361-367
O'Neill, M., Sinclair, L.G. and F.J. Smith (1969): Polynomial curve fitting when abscissa
and ordinate are both subject to error, Comput. J. 12 (1969) 52-56
O'Neill, M.E. and K. Mathews (2000): A weighted least squares approach to Levene's test
of homogeneity of variance, Austral. & New Zealand J. Statist. 42 (2000) 81-100
Offlinger, R. (1998): Least-squares and minimum distance estimation in the three-
parameter Weibull and Fréchet models with applications to river drain data, in: Kahle,
et al (eds.) Advances in stochastic models for reliability, quality and safety, pages 81-
97, Birkhäuser Verlag, Boston 1998
Ogawa, J. (1950): On the independence of quadratic forms in a non-central normal sys-
tem, Osaka Mathematical Journal 2 (1950) 151-159
Ohtani, K. (1988a): Optimal levels of significance of a pre-test in estimating the distur-
bance variance after the pre-test for a linear hypothesis on coefficients in a linear re-
gression, Econom. Lett. 28 (1988) 151-156
Ohtani, K. (1998b): On the sampling performance of an improved Stein inequality re-
stricted estimator, Austral. and New Zealand J. Statis. 40 (1998) 181-187
Ohtani, K. (1998c): The exact risk of a weighted average estimator of the OLS and Stein-
rule estimators in regression under balanced loss, Statistics & Decisions 16 (1998) 35-
45
Ohtani, K. (1996): Further improving the Stein-rule estimator using the Stein variance
estimator in a misspecified linear regression model, Statist. Probab. Lett. 29 (1996)
191-199
Okamoto, M. (1973): Distinctness of the eigenvalues of a quadratic form in a multivariate
sample, Ann. Stat. 1 (1973) 763-765
Okeke, F. and F. Krumm (1998): Graph, graph spectra and partitioning algorithms in a
geodetic network structural analysis and adjustment, Bollettino di Geodesia e Scienze
Affini 57 (1998) 1-24
Olkin, I. (1998): The density of the inverse and pseudo-inverse of a random matrix, Statis-
tics and Probability Letters 38 (1998) 131-135
Olkin, I. (2000): The 70th anniversary of the distribution of random matrices: a survey,
Technical Report No. 2000-06, Department of Statistics, Stanford University, Stan-
ford 2000
Olkin, I. and S.N. Roy (1954): On multivariate distribution theory, Ann. Math. Statist. 25
(1954) 329-339
Olkin, I. and A.R. Sampson (1972): Jacobians of matrix transformations and induced
functional equations, Linear Algebra Appl. 5 (1972) 257-276
Olkin, I. and J.W. Pratt (1958): Unbiased estimation of certain correlation coefficients,
Annals Mathematical Statistics 29 (1958) 201-211
Olsen, A., Seely, J. and D. Birkes (1976): Invariant quadratic unbiased estimators for two
variance components, Annals of Statistics 4 (1976) 878-890
Ord, J.K. and S. Arnold (1997): Kendall’s advanced theory of statistics, volume IIA,
classical inference, Arnold Publ., 6th edition, London 1997
Ortega, J.M. and W.C. Rheinboldt (2000): Iterative solution of nonlinear equations in
several variables, SIAM 2000
Osborne, M.R. (1972): Some aspects of nonlinear least squares calculations, Numerical
Methods for Nonlinear Optimization, ed. F.A. Lootsma, Academic Press, New York
1972
Osborne, M.R. (1976): Nonlinear least squares - the Levenberg algorithm revisited, J. Aust.
Math. Soc. B 19 (1976) 343-357
Osborne, M.R. and G.K. Smyth (1986): An algorithm for exponential fitting revisited, J.
App. Prob. (1986) 418-430
Osborne, M.R. and G.K. Smyth (1995): A modified Prony algorithm for fitting sums of
exponential functions, SIAM J. Sc. and Stat. Comp. 16 (1995) 119-138
Osiewalski, J. and M.F.J. Steel (1993): Robust Bayesian inference in elliptical regression
models, J. Econometrics 57 (1993) 345-363
Ouellette, D.V. (1981): Schur complements and statistics, Linear Algebra Appl. 36 (1981)
187-295
Owens, W.H. (1973): Strain modification of angular density distributions, Tectonophysics 16 (1973) 249-261
Oyet, A.J. and D.P. Wiens (2000): Robust designs for wavelet approximations of regres-
sion models, Nonparametric Statistics 12 (2000) 837-859
Ovtchinnikov, E.E. and L.S. Xanthis (2001): Successive eigenvalue relaxation : a new
method for the generalized eigenvalue problem and convergence estimates, Proc. R.
Soc. Lond. A 457 (2001) 441-451
Padmawar, V.R. (1998): On estimating nonnegative definite quadratic forms, Metrika 48
(1998) 231-244
Pagano, M. (1978): On periodic and multiple autoregressions, Annals of Statistics 6
(1978) 1310-1317
Paige, C.C. and M.A. Saunders (1975): Solution of sparse indefinite systems of linear
equations, SIAM J. Numer. Anal. 12 (1975) 617-629
Paige, C. and C. van Loan (1981): A Schur decomposition for Hamiltonian matrices,
Linear Algebra and its Applications 41 (1981) 11-32
Pakes, A.G. (1999): On the convergence of moments of geometric and harmonic means,
Statistica Neerlandica 53 (1999) 96-110
Pal, N. and W.K. Lim (2000): Estimation of a correlation coefficient: some second order
decision – theoretic results, Statistics and Decisions 18 (2000) 185-203
Pal, S.K. and P.P. Wang (1996): Genetic algorithms for pattern recognition, CRC Press,
Boca Raton 1996
Papoulis, A. (1991): Probability, random variables and stochastic processes, McGraw
Hill, New York 1991
Park, H. (1991): A parallel algorithm for the unbalanced orthogonal Procrustes problem,
Parallel Computing 17 (1991) 913-923
Parker, W.V. (1945): The characteristic roots of matrices, Duke Math. J. 12 (1945) 519-
526
Parthasarathy, K.R. (1967): Probability measures on metric spaces, Academic Press, New
York 1967
Parzen, E. (1962): On estimation of a probability density function and mode, Ann. Math.
Statistics 33 (1962) 1065-1073
Patel, J.K. and C.B. Read (1982): Handbook of the normal distribution, Marcel Dekker,
New York and Basel 1982
Patil, V.H. (1965): Approximation to the Behrens-Fisher distribution, Biometrika 52
(1965) 267-271
Pázman, A. (1986): Foundations of optimum experimental design, Mathematics and its
applications, D. Reidel, Dordrecht 1986
Pázman, A. and J.-B. Denis (1999): Bias of LS estimators in nonlinear regression models
with constraints. Part I: General case, Applications of Mathematics 44 (1999) 359-374
Pázman, A. and W.G. Mueller (1998): A new interpretation of design measures, in:
MODA 5 – Advances in model-oriented data analysis and experimental design, pp.
239-246, Atkinson, A.K., Pronzato, L. and H.P. Wynn (eds), Physica Verlag 1998
Pearson, E.S. (1970): William Sealy Gosset, 1876-1937: "Student" as a statistician, Stud-
ies in the History of Statistics and Probability (E.S. Pearson and M.G. Kendall, eds.)
Hafner Publ., 360-403, New York 1970
Pearson, E.S. and H.O. Hartley (1958): Biometrika Tables for Statisticians Vol. 1, Cam-
bridge University Press, Cambridge 1958
Pearson, K. (1905): The problem of the random walk, Nature 72 (1905) 294
Pearson, K. (1906): A mathematical theory of random migration, Mathematical Contribu-
tions to the Theory of Evolution, XV Draper’s Company Research Memoirs, Bio-
metric Series III, London 1906
Pearson, K. (1931): Historical note on the distribution of the Standard Deviations of
Samples of any size from any indefinitely large Normal Parent Population, Bio-
metrika 23 (1931) 416-418
Peddada, S.D. and T. Smith (1997): Consistency of a class of variance estimators in linear
models under heteroscedasticity, Sankhya 59 (1997) 1-10
Pelzer, H. (1971): Zur Analyse geodätischer Deformationsmessungen, Deutsche Geodäti-
sche Kommission, Akademie der Wissenschaften, Reihe C (164), München 1971
Pelzer, H. (1974): Zur Behandlung singulärer Ausgleichungsaufgaben, Z. Vermessungs-
wesen 99 (1974) 181-194, 479-488
Pena, D., Tiao, G.C., and R.S. Tsay (2001): A course in time series analysis, Wiley, New
York 2001
Penrose, R. (1955): A generalised inverse for matrices, Proc. Cambridge Phil. Soc. 51
(1955) 406-413
Penny, K.I. (1996): Appropriate critical values when testing for a single multivariate
outlier by using the Mahalanobis distance, in: Applied Statistics, ed. S.M. Lewis and
D.A. Preece, J. Royal Stat. Soc. 45 (1996) 73-81
Percival, D.B. and A.T. Walden (1993): Spectral analysis for physical applications, Cam-
bridge, Cambridge University Press 1993
Pereyra, V. and G. Scherer (1973): Efficient computer manipulation of tensor products
with application to multidimensional approximation, Math. Computation 27 (1973)
595-605
Perron, F. and N. Giri (1992): Best equivariant estimation in curved covariance models, J.
Multivariate Analysis 40 (1992) 46-55
Perron, F. and N. Giri (1990): On the best equivariant estimator of mean of a multivariate
normal population, J. Multivariate Analysis 32 (1990) 1-16
Percival, D.B. and A.T. Walden (1999): Wavelet methods for time series analysis, Cam-
bridge University Press, Cambridge 1999
Petrov, V.V. (1975): Sums of independent random variables, Berlin 1975
Pfeufer, A. (1990): Beitrag zur Identifikation und Modellierung dynamischer Deformati-
onsprozesse, Vermessungstechnik 38 (1990) 19-22
Pfeufer, A. (1993): Analyse und Interpretation von Überwachungsmessungen - Termino-
logie und Klassifikation, Z. Vermessungswesen 118 (1993) 19-22
Phillips, G.M. (2000): Two millennia of mathematics – From Archimedes to Gauss,
Springer 2000
Piepho, H.-P. (1998): An algorithm for fitting the shifted multiplicative model, J. Statist.
Comput. Simul. 62 (1998) 29-43
Pilz, J. (1983): Bayesian estimation and experimental design in linear regression models,
Teubner-Texte zur Mathematik 55, Teubner, Leipzig 1983
Pincus, R. (1974): Estimability of parameters of the covariance matrix and variance com-
ponents, Math. Operationsforschg. Statistik 5 (1974) 245-248
Pinheiro, J.C. and D.M. Bates (2000): Mixed-effects models in S and S-Plus, Statistics
and Computing, Springer-Verlag, New York 2000
Pison, G., Van Aelst, S. and G. Willems (2003): Small sample corrections for LTS and
MCD, Developments in Robust Statistics, pp. 330-343, Physica Verlag, Heidelberg
2003
Pistone, G. and M.P. Rogantin (1999): The exponential statistical manifold: mean pa-
rameters, orthogonality and space transformations, Bernoulli 5 (1999) 721-760
Pitman, E.J.G. (1979): Some basic theory for statistical inference, Chapman and Hall,
Boca Raton 1979
Pitman, J. and M. Yor (1981): Bessel processes and infinitely divisible laws, unpublished
report, University of California, Berkeley
Plachky, D. (1993): An estimation-theoretical characterization of the Poisson distribution,
Statistics and Decisions, Supplement Issue 3 (1993) 175-178
Plackett, R.L. (1949): A historical note on the method of least-squares, Biometrika 36
(1949) 458-460
Plackett, R.L. (1972): The discovery of the method of least squares, Biometrika 59 (1972)
239-251
Plato, R. (1990): Optimal algorithms for linear ill-posed problems yield regularization
methods, Numer. Funct. Anal. Optim. 11 (1990) 111-118
Plemmons, R.J. (1990): Recursive least squares computation, Proceedings of the Interna-
tional Symposium MTNS 3 (1990) 495-502
Pohst, M. (1987): A modification of the LLL reduction algorithm, J. Symbolic Computa-
tion 4 (1987) 123-127
Poirier, D.J. (1995): Intermediate statistics and econometrics, The MIT Press, Cambridge
1995
Poisson, S.D. (1827): Connaissance des temps de l'année 1827
Polasek, W. and S. Liu (1997): On generalized inverses and Bayesian analysis in simple
ANOVA models, Student 2 (1997) 159-168
Pollock, D.S.G. (1999): A handbook of time-series analysis, signal processing and dy-
namics, Academic Press, Cambridge 1999
Polya, G. (1919): Zur Statistik der sphaerischen Verteilung der Fixsterne, Astr. Nachr.
208 (1919) 175-180
Polya, G. (1930): Sur quelques points de la théorie des probabilités, Ann. Inst. H. Poin-
care 1 (1930) 117-161
Pope, A.J. (1976): The statistics of residuals and the detection of outliers, NOAA Techni-
cal Report, NOS 65 NGS 1, U.S. Dept. of Commerce, Rockville, Md., 1976
Popinski, W. (1999): Least-squares trigonometric regression estimation, Applicationes
Mathematicae 26 (1999) 121-131
Portnoy, S. and R. Koenker (1997): The Gaussian hare and the Laplacian tortoise: com-
putability of squared error versus absolute-error estimators, Statistical Science 12
(1997) 279-300
Potts, D., Steidl, G. and M. Tasche (1996): Kernels of spherical harmonics and spherical
frames, in: Advanced Topics in Multivariate Approximation pp. 287-301, Fontanella,
F., Jetter, K. and P.J. Laurent (eds), World Scientific Publishing 1996
Powers, D.L. (1999): Boundary value problems, Harcourt Academic Press 1999
Pratt, J.W. (1961): Length of confidence intervals, J. American Statistical Assoc. 56
(1961) 549-567
Pratt, J.W. (1963): Shorter confidence intervals for the mean of a normal distribution with
known variance, Ann. Math. Statist. 34 (1963) 574-586
Prescott, P. (1975): An approximate test for outliers in linear models, Technometrics 17
(1975) 129-132
Presnell, B., Morrison, S.P. and R.C. Littell (1998): Projected multivariate linear models
for directional data, J. American Statist. Assoc. 93 (1998) 1068-1077
Press, S.J. (1989): Bayesian statistics: Principles, models and applications, Wiley, New
York 1989
Press, W.H., Teukolsky, S.A., Vetterling, W.T. and B.P. Flannery (1992): Numerical
Recipes in FORTRAN (2nd edition), Cambridge University Press, Cambridge 1992
Priestley, M.B. (1981): Spectral analysis and time series, Vol. 1 and 2, Academic Press,
London 1981
Priestley, M.B. (1988): Nonlinear and nonstationary time series analysis, Academic Press,
London 1988
Prony, R. (1795): Essai expérimental et analytique, J. Ecole Polytechnique (Paris) 1
(1795) 24-76
Prószynski, W. (1997): Measuring the robustness potential of the least-squares estimation:
geodetic illustration, J. Geodesy 71 (1997) 652-659
Pruscha, H. (1996): Angewandte Methoden der Mathematischen Statistik, Teubner Skrip-
ten zur Mathematischen Stochastik, Stuttgart 1996
Pugachev, V.S. and I.N. Sinitsyn (2002): Stochastic systems, Theory and applications,
Russian Academy of Sciences 2002
Puntanen, S., Styan, G.P.H. and H.J. Werner (2000): Two matrix-based proofs that the
linear estimator Gy is the best linear unbiased estimator, J. Statist. Planning and Infer-
ence 88 (2000) 173-179
Pukelsheim, F. (1981a): Linear models and convex geometry: Aspects of non-negative
variance estimation, Math. Operationsforsch. u. Stat. 12 (1981) 271-286
Pukelsheim, F. (1981b): On the existence of unbiased nonnegative estimates of variance
covariance components, Ann. Statist. 9 (1981) 293-299
Pukelsheim, F. (1993): Optimal design of experiments, Wiley, New York 1993
Pukelsheim, F. (1994): The three sigma rule, American Statistician 48 (1994) 88-91
Pukelsheim, F. and B. Torsney (1991): Optimal weights for experimental designs on
linearly independent support points, The Annals of Statistics 19 (1991) 1614-1625
Pukelsheim, F. and W.J. Studden (1993): E-optimal designs for polynomial regression,
Ann. Stat. 21 (1993) 402-415
Qingming, G. and L. Jinshan (2000): Biased estimation in Gauss-Markov model, Allg.
Vermessungsnachrichten 107 (2000) 104-108
Qingming, G., Yuanxi, Y. and G. Jianfeng (2001): Biased estimation in the Gauss-
Markov model with constraints, Allg. Vermessungsnachrichten 108 (2001) 28-30
Qingming, G., Lifen, S., Yuanxi, Y. and G. Jianfeng (2001): Biased estimation in the
Gauss-Markov model not full of rank, Allg. Vermessungsnachrichten 108 (2001) 390-
393
Quintana, E.S., Quintana, G., Sun, X. and R. van de Geijn (2001): A note on parallel
matrix inversion, SIAM J. Sci. Comput 22 (2001) 1762-1771
Quintana-Orti, G., Sun, X. and C.H. Bischof (1998): A BLAS-3 version of the QR factoriza-
tion with column pivoting, SIAM J. Sci. Comput. 19 (1998) 1486-1494
Rader, C. and A.O. Steinhardt (1988): Hyperbolic Householder transforms, SIAM J.
Matrix Anal. Appl. 9 (1988) 269-290
Rafajlowicz, E. (1988): Nonparametric least squares estimation of a regression function,
Statistics 19 (1988) 349-358
Raj, D. (1968): Sampling theory, Mc Graw-Hill Book Comp., Bombay 1968
Ramsey, J.O. and B.W. Silverman (1997): Functional data analysis, Springer Verlag, New
York 1997
Rao, B.L.S.P. (1997a): Variance components, Chapman and Hall, Boca Raton 1997
Rao, B.L.S.P. (1997b): Weighted least squares and nonlinear regression, J. Ind. Soc. Ag.
Statistics 50 (1997) 182-191
Rao, B.L.S.P. and B.R. Bhat (1996): Stochastic processes and statistical inference, New
Age International, New Delhi 1996
Rao, C.R. (1945): Generalisation of Markoff’s Theorem and tests of linear hypotheses,
Sankhya 7 (1945) 9-16
Rao, C.R. (1952a): Some theorems on Minimum Variance Estimation, Sankhya 12, 27-42
Rao, C.R. (1952b): Advanced statistical methods in biometric research, Wiley, New York
1952
Rao, C.R. (1965a): Linear Statistical Inference and its Applications, Wiley, New York
1965
Rao, C.R. (1965b): The theory of least squares when the parameters are stochastic and its
application to the analysis of growth curves, Biometrika 52 (1965) 447-458
Rao, C.R. (1970): Estimation of heteroscedastic variances in linear models, J. Am. Stat.
Assoc. 65 (1970) 161-172
Rao, C.R. (1971a): Estimation of variance and covariance components - MINQUE theory,
J. Multivar. Anal. 1 (1971) 257-275
Rao, C.R. (1971b): Unified theory of linear estimation, Sankhya Ser. A33 (1971) 371-394
Rao, C.R. (1971c): Minimum variance quadratic unbiased estimation of variance compo-
nents, J. Multivar. Anal. 1 (1971) 445-456
Rao, C.R. (1972a): Unified theory of least squares, Communications in Statistics 1 (1972)
1-8
Rao, C.R. (1972b): Estimation of variance and covariance components in linear models. J.
Am. Stat. Ass. 67 (1972) 112-115
Rao, C.R. (1973a): Linear statistical inference and its applications, 2nd ed., Wiley, New
York 1973
Rao, C.R. (1973b): Representation of best linear unbiased estimators in the Gauss-
Markoff model with a singular dispersion matrix, J. Multivariate Analysis 3 (1973)
276-292
Rao, C.R. (1976): Estimation of parameters in a linear model, Ann. Statist. 4 (1976) 1023-
1037
Rao, C.R. (1985): The inefficiency of least squares: extensions of the Kantorovich ine-
quality, Linear algebra and its applications 70 (1985) 249-255
Rao, C.R. and S.K. Mitra (1971): Generalized inverse of matrices and its applications,
Wiley, New York 1971
Rao, C.R. and J. Kleffe (1988): Estimation of variance components and applications,
North Holland, Amsterdam 1988
Rao, C.R. and R. Mukerjee (1997): Comparison of LR, score and Wald tests in a non-IID
setting, J. Multivariate Analysis 60 (1997) 99-110
Rao, C.R. and H. Toutenburg (1995a): Linear models, least-squares and alternatives,
Springer-Verlag, New York 1995
Rao, C.R. and H. Toutenburg (1995b): Linear models, Springer Verlag, New York 1995
Rao, C.R. and H. Toutenburg (1999): Linear models, Least squares and alternatives, 2nd
ed., Springer Verlag, New York 1999
Rao, C.R. and G. J. Szekely (2000): Statistics for the 21st century - Methodologies for
applications of the future, Marcel Dekker, Basel 2000
Rao, P.S.R.S. and Y.P. Chaubey (1978): Three modifications of the principle of the
MINQUE, Commun. Statist. Theor. Methods A7 (1978) 767-778
Ravishanker, N. and D.K. Dey (2002): A first course in linear model theory – Multivariate
normal and related distributions, Chapman & Hall/CRC 2002
Ravi, V. and H.-J. Zimmermann (2000): Fuzzy rule based classification with FeatureSe-
lector and modified threshold accepting, European J.Operational Research 123 (2000)
16-28
Ravi, V., Reddy, P.J. and H.-J. Zimmermann (2000): Pattern classification with principal
component analysis and fuzzy rule bases, European J.Operational Research 126
(2000) 526-533
Rayleigh, L. (1880): On the resultant of a large number of vibrations of the same pitch
and of arbitrary phase, Phil. Mag. 5 (1880) 73-78
Rayleigh, L. (1905): The problem of random walk, Nature 72 (1905) 318
Rayleigh, L. (1919): On the problem of random vibrations, and of random flights in one,
two or three dimensions, Phil Mag. 37 (1919) 321-347
Reeves, J. (1998): A bivariate regression model with serial correlation, The Statistician 47
(1998) 607-615
Reich, K. (2000): Gauss' Schüler. Studierten bei Gauss und machten Karriere. Gauss'
Erfolg als Hochschullehrer (Gauss's students: studied with him and were successful.
Gauss's success as a university professor), Gauss Gesellschaft E.V.Göttingen, Mittei-
lungen Nr. 37, pages 33-62, Göttingen 2000
Relles, D.A. (1968): Robust regression by modified least squares, PhD. Thesis, Yale
University, Yale 1968
Remondi, B.W. (1984): Using the Global Positioning System (GPS) phase observable for
relative geodesy: modelling, processing and results. PhD.Thesis, Center for Space Re-
search, The University of Texas, Austin 1984
Ren, H. (1996): On the error analysis and implementation of some eigenvalue decomposi-
tion and singular value decomposition algorithms, UT-CS-96-336, LAPACK working
note 115 (1996)
Rencher, A.C. (2000): Linear models in statistics, J. Wiley, New York 2000
Renfer, J.D. (1997): Contour lines of L1 -norm regression, Student 2 (1997) 27-36
Resnikoff, G.J. and G.J. Lieberman (1957): Tables of the noncentral t-distribution, Stan-
ford University Press, Stanford 1957
Riccomagno, E., Schwabe, R. and H.P. Wynn (1997): Lattice-based optimum design for
Fourier regression, Ann. Statist. 25 (1997) 2313-2327
Rice, J.R. (1969): The approximation of functions, vol. II - Nonlinear and multivariate
theory, Addison-Wesley, Reading 1969
Richards, F.S.G. (1961): A method of maximum likelihood estimation, J. Royal Stat. Soc.
B 23 (1961) 469-475
Richter, H. and V. Mammitzsch (1973): Methode der kleinsten Quadrate, Stuttgart 1973
Riedel, K.S. (1992): A Sherman-Morrison-Woodbury identity for rank augmenting matri-
ces with application to centering, SIAM J. Matrix Anal. Appl. 13 (1992) 659-662
Riedwyl, H. (1997): Lineare Regression, Birkhäuser Verlag, Basel 1997
Richter, W.D. (1985): Laplace-Gauß integrals, Gaussian measure asymptotic behaviour
and probabilities of moderate deviations, Z. Analysis und ihre Anwendungen 4 (1985)
257-267
Rilstone, P., Srivastava, V.K. and A. Ullah (1996): The second order bias and mean
squared error of nonlinear estimators, J. Econometrics 75 (1996) 369-395
Rivest, L.P. (1982): Some statistical methods for bivariate circular data, J. Royal Statisti-
cal Society, Series B: 44 (1982) 81-90
Rivest, L.P. (1988): A distribution for dependent unit vectors, Comm. Statistics A: Theory
Methods 17 (1988) 461-483
Rivest, L.P. (1989): Spherical regression for concentrated Fisher-von Mises distributions,
Annals of Statistics 17 (1989) 307-317
Roberts, P.H. and H.D. Ursell (1960): Random walk on the sphere and on a Riemannian
manifold, Phil. Trans. Roy. Soc. A252 (1960) 317-356
Robinson, G.K. (1982): Behrens-Fisher problem, Encyclopedia of the Statistical Sciences,
Vol. 1, Wiley, New York 1982
Robinson, P.M. and C. Velasco (1997): Autocorrelation-Robust Inference, Handbook
of Statistics 15 (1997) 267-298
Rodgers, J.L. and W.A. Nicewander (1988): Thirteen ways to look at the correlation
coefficient, The American Statistician 42 (1988) 59-66
Rohatgi, V.K. (1987): Statistical inference, J. Wiley & Sons 1987
Rohde, C.A. (1966): Some results on generalized inverses, SIAM Rev. 8 (1966) 201-205
Romano, J.P. and A.F. Siegel (1986): Counterexamples in probability and statistics,
Chapman and Hall, Boca Raton 1986
Romanowski, M. (1979): Random errors in observations and the influence of modulation
on their distribution, K. Wittwer Verlag, Stuttgart 1979
Rosen, J.B., Park, H. and J. Glick (1996): Total least norm formulation and solution for
structured problems, SIAM J. Matrix Anal. Appl. 17 (1996) 110-126
Rosén, K.D.P. (1948): Gauss’s mean error and other formulae for the precision of direct
observations of equal weight, Tätigkeitsbereiche Balt. Geod. Komm. 1944-1947, pp.
38-62, Helsinki 1948
Rosenblatt, M. (1971): Curve estimates, Ann. Math. Statistics 42 (1971) 1815-1842
Rosenblatt, M. (1997): Some simple remarks on an autoregressive scheme and an implied
problem, J. theor. Prob. 10 (1997) 295-305
Ross, G.J.S. (1982): Non-linear models, Math. Operationsforschung Statistik 13 (1982)
445-453
Ross, S.M. (1983): Stochastic processes, Wiley, New York 1983
Rousseeuw, P.J. and A.M. Leroy (1987): Robust regression and outlier detection, J.
Wiley, New York 1987
Roy, T. (1995): Robust non-linear regression analysis, J. Chemometrics 9 (1995) 451-457
Rozanski, I.P. and R. Velez (1998): On the estimation of the mean and covariance pa-
rameter for Gaussian random fields, Statistics 31 (1998) 1-20
Rueda, C., Salvador, B. and M.A. Fernández (1997): Simultaneous estimation in a re-
stricted linear model, J. Multivariate Analysis 61 (1997) 61-66
Rueschendorf, L. (1988): Asymptotische Statistik, Teubner, Stuttgart 1988
Rummel, R. (1975): Zur Behandlung von Zufallsfunktionen und –folgen in der physikali-
schen Geodäsie, Deutsche Geodätische Kommission bei der Bayerischen Akademie
der Wissenschaften, Report No. C 208, München 1975
Rummel, R. and K.P. Schwarz (1977): On the nonhomogeneity of the global covariance
function, Bull. Géodésique 51 (1977) 93-103
Ruppert, D. and R.J. Carroll (1980): Trimmed least squares estimation in the linear model,
J. American Statistical Association 75 (1980) 828-838
Rutherford, A. (2001): Introducing Anova and Ancova – a GLM approach, Sage, London
2001
Rutherford, D.E. (1933): On the condition that two Zehfuss matrices be equal, Bull.
American Math. Soc. 39 (1933) 801-808
Saalfeld, A. (1999): Generating basis sets of double differences, J. Geodesy 73 (1999)
291-297
Sacks, J. and D. Ylvisaker (1966): Design for regression problems with correlated errors,
Annals of Mathematical Statistics 37 (1966) 66-89
Sahai, H.(2000): The analysis of variance: fixed, random and mixed models, 778 pages,
Birkhaeuser Verlag, Basel 2000
Saichev, A.I. and W.A. Woyczynski (1996): Distributions in the physical and engineering
sciences, Vol. 1, Birkhäuser Verlag, Basel 1996
Sakallioglu, S., Kaciranlar, S. and F. Akdeniz (2001): Mean squared error comparisons of
some biased regression estimators, Commun. Statist.-Theory Meth. 30 (2001) 347-
361
Samorodnitsky, G. and M.S. Taqqu (1994): Stable non-Gaussian random processes,
Chapman and Hall, Boca Raton 1994
Sampson, P.D. and P. Guttorp (1992): Nonparametric estimation of nonstationary spatial
covariance structure, J. American Statistical Association 87 (1992) 108-119
Sander, B. (1930): Gefügekunde der Gesteine, J. Springer, Vienna 1930
Sansò, F. (1990): On the aliasing problem in the spherical harmonic analysis, Bull. Géod.
64 (1990) 313-330
Sansò, F. and G. Sona (1995): The theory of optimal linear estimation for continuous
fields of measurements, Manuscripta Geodetica 20 (1995) 204-230
Sastry, S. (1999): Nonlinear systems: Analysis, stability and control, Springer-Verlag,
New York 1999
Sathe, S.T. and H.D. Vinod (1974): Bound on the variance of regression coefficients due
to heteroscedastic or autoregressive errors, Econometrica 42 (1974) 333-340
Saw, J.G. (1978): A family of distributions on the m-sphere and some hypothesis tests,
Biometrika 65 (1978) 69-73
Saw, J.G. (1981): On solving the likelihood equations which derive from the Von Mises
distribution, Technical Report, University of Florida, 1981
Saxe, K. (2002): Beginning functional analysis, Springer 2002
Sayed, A.H., Hassibi, B. and T. Kailath (1996): Fundamental inertia conditions for the
minimization of quadratic forms in indefinite metric spaces, Oper. Theory: Adv.
Appl., Birkhäuser Verlag, Cambridge/ Mass 1996
Schach, S. and T. Schäfer (1978): Regressions- und Varianzanalyse, Springer, Berlin
1978
Schafer, J.L. (1997): Analysis of incomplete multivariate data, Chapman and Hall, Lon-
don 1997
Schaffrin, B. (1979): Einige ergänzende Bemerkungen zum empirischen mittleren Fehler
bei kleinen Freiheitsgraden, Z. Vermessungswesen 104 (1979) 236-247
Schaffrin, B. (1983a): Varianz-Kovarianz Komponentenschätzung bei der Ausgleichung
heterogener Wiederholungsmessungen, Deutsche Geodätische Kommission, Report C
282, München, 1983
Schaffrin, B. (1983b): Model choice and adjustment techniques in the presence of prior
information, Ohio State University Department of Geodetic Science and Surveying,
Report 351, Columbus 1983
Schaffrin, B. (1985): The geodetic datum with stochastic prior information, Publ. C313,
German Geodetic Commission, München 1985
Schaffrin, B. (1991): Generalized robustified Kalman filters for the integration of GPS
and INS, Tech. Rep. 15, Geodetic Institute, Stuttgart University 1991
Schaffrin, B. (1995): A generalized Lagrange function approach to include fiducial con-
straints, Z. Vermessungswesen 7 (1995) 325-350
Schaffrin, B. (1997): Reliability measures for correlated observations, J. Surveying Engin-
eering 123 (1997) 126-137
Schaffrin, B. (1999): Softly unbiased estimation part 1: The Gauss-Markov model, Linear
Algebra and its Applications 289 (1999) 285-296
Schaffrin, B. (2001a): Equivalent systems for various forms of kriging, including least-
squares collocation, Z. Vermessungswesen 126 (2001) 87-94
Schaffrin, B. (2001b): Minimum mean square error adjustment, part. I: the empirical BLE
and the repro-BLE for direct observation, J. Geodetic Society of Japan 46 (2000)
21-30
Schaffrin, B. and E.W. Grafarend (1982a): Kriterion-Matrizen II: Zweidimensionale ho-
mogene und isotrope geodätische Netze, Teil II a: Relative cartesische Koordinaten,
Z. Vermessungswesen 107 (1982) 183-194
Schaffrin, B. and E.W. Grafarend (1982b): Kriterion-Matrizen II: Zweidimensionale ho-
mogene und isotrope geodätische Netze. Teil II b: Absolute cartesische Koordinaten,
Z. Vermessungswesen 107 (1982) 485-493
Schaffrin, B., Grafarend, E.W. and G. Schmitt (1977): Kanonisches Design Geodätischer
Netze I, Manuscripta Geodaetica 2 (1977) 263-306
Schaffrin, B., Grafarend, E.W. and G. Schmitt (1978): Kanonisches Design Geodätischer
Netze II, Manuscripta Geodaetica 3 (1978) 1-22
Schaffrin, B. and E.W. Grafarend (1986): Generating classes of equivalent linear models
by nuisance parameter elimination, Manuscripta Geodaetica 11 (1986) 262-271
Schaffrin, B. and E.W. Grafarend (1991): A unified computational scheme for traditional
and robust prediction of random effects with some applications in geodesy, The Fron-
tiers of Statistical Scientific Theory & Industrial Applications 2 (1991) 405-427
Schaffrin, B. and J.H. Kwon (2002): A Bayes filter in Friedland form for INS/GPS vector
gravimetry, Geoph. J. Int. 149 (2002) 64-75
Shanbhag, D.N. and C.R. Rao (eds.) (2001): Stochastic Processes: Theory and methods,
Elsevier 2001
Scheidegger, A.E. (1965): On the statistics of the orientation of bedding planes, grain axes
and similar sedimentological data, U.S. Geol. Survey Prof. Paper 525-C (1965) 164-
167
Schetzen, M. (1980): The Volterra and Wiener theories of nonlinear systems, Wiley, New
York 1980
Schick, A. (1999): Improving weighted least-squares estimates in heteroscedastic linear
regression when the variance is a function of the mean response, J. Statistical Planning
and Inference 76 (1999) 127-144
Schiebler, R. (1988): Giorgio de Chirico and the theory of relativity, Lecture given at
Stanford University, Wuppertal 1988
Schmetterer, L. (1956): Einführung in die mathematische Statistik, Wien 1956
Schmidt, E. (1907): Entwicklung willkürlicher Funktionen, Math. Annalen 63 (1907) 433-
476
Schmidt, K. (1996): A comparison of minimax and least squares estimators in linear
regression with polyhedral prior information, Acta Applicandae Mathematicae 43
(1996) 127-138
Schmidt, K.D. (1996): Lectures on risk theory, Teubner Skripten zur Mathematischen
Stochastik, Stuttgart 1996
Schmidt, P. (1976): Econometrics, Marcel Dekker, New York 1976
Schmidt-Koenig, K. (1972): New experiments on the effect of clock shifts on homing
pigeons in animal orientation and navigation, Eds.: S.R. Galler, K. Schmidt-Koenig,
G.J. Jacobs and R.E. Belleville, NASA SP-262, Washington D.C. 1972
Schmitt, G. (1975): Optimaler Schnittwinkel der Bestimmungsstrecken beim einfachen
Bogenschnitt, Allg. Vermessungsnachrichten 6 (1975) 226-230
Schmitt, G. (1977a): Experiences with the second-order design problem in theoretical and
practical geodetic networks, Proceedings International Symposium on Optimization of
Design and Computation of Control Networks, Sopron 1977
Schmitt, G. (1977b): Experiences with the second-order design problem in theoretical and
practical geodetic networks, Optimization of design and computation of control net-
works. F. Halmos and J. Somogyi eds, Akadémiai Kiadó, Budapest (1979), 179-206
Schmitt, G. (1978): Gewichtsoptimierung bei Mehrpunkteinschaltung mit Streckenmes-
sung, Allg. Vermessungsnachrichten 85 (1978) 1-15
Schmitt, G. (1979): Zur Numerik der Gewichtsoptimierung in geodätischen Netzen, Deut-
sche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Report
256, München 1979
Schmitt, G. (1980): Second order design of a free distance network considering different
types of criterion matrices, Bulletin Geodetique 54 (1980) 531-543
Schmitt, G. (1985): Second Order Design, Third Order Design, Optimization and design
of geodetic networks, Springer Verlag, Berlin (1985), 74-121
Schmitt, G., Grafarend, E.W. and B. Schaffrin (1977): Kanonisches Design geodätischer
Netze I, Manuscripta Geodaetica 2 (1977) 263-306
Schmitt, G., Grafarend, E.W. and B. Schaffrin (1978): Kanonisches Design geodätischer
Netze II, Manuscripta Geodaetica 3 (1978) 1-22
Schneeweiß, H. and H.J. Mittag (1986): Lineare Modelle mit fehlerbehafteten Daten,
Physica-Verlag, Heidelberg 1986
Schock, E. (1987): Implicite iterative methods for the approximate solution of ill-posed
problems, Bolletino U.M.I., Series 1-B, 7 (1987) 1171-1184
Schoenberg, I.J. (1938): Metric spaces and completely monotone functions, Ann. Math.
39 (1938) 811-841
Schott, J.R. (1997): Matrix analysis for statistics, J. Wiley, New York 1997
Schott, J.R. (1998): Estimating correlation matrices that have common eigenvectors,
Comput. Stat. & Data Anal. 27 (1998) 445-459
Schouten, J.A. and J. Haantjes (1936): Über die konforminvariante Gestalt der relativisti-
schen Bewegungsgleichungen, in: Koningl. Ned. Akademie van Wetenschappen,
Proc. Section of Sciences, vol. 39, Noord-Hollandsche Uitgeversmaatschappij, Am-
sterdam 1936
Schultz, C. and G. Malay (1998): Orthogonal projections and the geometry of estimating
functions, J. Statistical Planning and Inference 67 (1998) 227-245
Schultze, J. and J. Steinebach (1996): On least squares estimates of an exponential tail
coefficient, Statistics & Decisions 14 (1996) 353-372
Schur, J. (1911): Bemerkungen zur Theorie der verschränkten Bilinearformen mit unend-
lich vielen Veränderlichen, J. Reine und angew. Math. 140 (1911) 1-28
Schur, J. (1917): Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind, J.
Reine angew. Math. 147 (1917) 205-232
Schwarz, G. (1978): Estimating the dimension of a model, The Annals of Statistics 6
(1978) 461-464
Schwarz, H. (1960): Stichprobentheorie, Oldenbourg, München 1960
Schwarz, K. P. (1976): Least-squares collocation for large systems, Boll. Geodesia e
Scienze Affini 35 (1976) 309-324
Scitovski, R. and D. Jukić (1996): Total least squares problem for exponential function,
Inverse Problems 12 (1996) 341-349
Seal, H.L. (1967): The historical development of the Gauss linear model, Biometrika 54
(1967) 1-24
Searle, S.R. and C.R. Henderson (1961): Computing procedures for estimating compo-
nents of variance in the two-way classification, mixed model, Biometrics 17 (1961)
607-616
Searle, S.R. (1971a). Linear Models, Wiley, New York 1971
Searle, S.R. (1971b): Topics in variance components estimation, Biometrics 27 (1971) 1-
76
Searle, S.R. (1974): Prediction, mixed models, and variance components, Reliability and
Biometry (1974) 229-266
Searle, S.R., Casella, G. and C.E. McCulloch (1992): Variance components, Wiley, New
York 1992
Seber, G.A.F. (1977): Linear regression analysis, Wiley, New York 1977
Seely, J. (1970): Linear spaces and unbiased estimation, Ann. Math. Statist. 41 (1970)
1725-1734
Seely, J. (1970): Linear spaces and unbiased estimation – Application to the mixed linear
model, Ann. Math. Statist. 41 (1970) 1735-1748
Seely, J. (1971): Quadratic subspaces and completeness, Ann. Math. Statist. 42 (1971)
710-721
Seely, J. (1975): An example of an inadmissible analysis of variance estimator for a
variance component, Biometrika 62 (1975) 689
Seely, J. (1977): Minimal sufficient statistics and completeness, Sankhya, Series A, Part
2, 39 (1977) 170-185
Seely, J. (1980): Some remarks on exact confidence intervals for positive linear combina-
tions of variance components, J. American Statistical Association 75 (1980) 372-
374
Seely, J. and R.V. Hogg (1982): Unbiased estimation in linear models, Communication in
Statistics 11 (1982) 721-729
Seely, J. and Y. El-Bassiouni (1983): Applying Wald’s variance components test, Ann.
Statist. 11 (1983) 197-201
Seely, J. and E.-H. Rady (1988): When can random effects be treated as fixed effects for
computing a test statistics for a linear hypothesis?, Communications in Statistics 17
(1988) 1089-1109
Seely, J. and Y. Lee (1994): A note on the Satterthwaite confidence interval for a vari-
ance, Communications in Statistics 23 (1994) 859-869
Seely, J., Birkes, D. and Y. Lee (1997): Characterizing sums of squares by their distribu-
tion, American Statistician 51 (1997) 55-58
Seemkooei, A.A. (2001): Comparison of reliability and geometrical strength criteria in
geodetic networks, J. Geodesy 75 (2001) 227-233
Segura, J. and A. Gil (1999): Evaluation of associated Legendre functions off the cut and
parabolic cylinder functions, Electronic Transactions on Numerical Analysis 9 (1999)
137-146
Selby, B. (1964): Girdle distributions on the sphere, Biometrika 51 (1964) 381-392
Sen Gupta, A. and R. Maitra (1998): On best equivariance and admissibility of simultane-
ous MLE for mean direction vectors of several Langevin distributions, Ann. Inst. Sta-
tist. Math. 50 (1998) 715-727
Sengupta, D. and S.R. Jammalamadaka (2003): Linear models, an integrated approach,
Series on Multivariate Analysis 6 (2003)
Serfling, R.J. (1980): Approximation theorems of mathematical statistics, J. Wiley, New
York 1980
Shaban, A.M.M. (1994): Anova, minque, PSD-minqmbe, canova and cminque in estima-
ting variance components, Statistica 54 (1994) 481-489
Shah, B.V. (1959): On a generalisation of the Kronecker product designs, Ann. Math.
Statistics 30 (1959) 48-54
Shalabh (1998): Improved estimation in measurement error models through Stein rule
procedure, J. Multivar. Analysis 67 (1998) 35-48
Shao, Q.-M. (1996): p-variation of Gaussian processes with stationary increments, Studia
Scientiarum Mathematicarum Hungarica 31 (1996) 237-247
Shapiro, S.S. and M.B. Wilk (1965): An analysis of variance test for normality (complete
samples), Biometrika 52 (1965), 591-611
Shapiro, S.S., Wilk, M.B. and M.J. Chen (1968): A comparative study of various tests for
normality, J. American Statistical Ass. 63 (1968) 1343-1372
Sheppard, W.F. (1912): Reduction of errors by means of negligible differences, Proc. 5th
Int. Congress Mathematicians (Cambridge) 2 (1912) 348-384
Shevlyakov, G.L. and T.Y. Khcatova (1998): On robust estimation of a correlation coeffi-
cient and correlation matrix, Contributions to statistics, pp. 153-162, 1998
Sheynin, O.B. (1966): Origin of the theory of errors, Nature 211 (1966) 1003-1004
Sheynin, O.B. (1979): Gauß and the theory of errors, Archive for History of Exact Sci-
ences 20 (1979)
Sheynin, O. (1995): Helmert’s work in the theory of errors, Arch. Hist. Exact Sci. 49
(1995) 73-104
Shin, D.W. and S.H. Song (2000): Asymptotic efficiency of the OLSE for polynomial
regression models with spatially correlated errors, Statistics Probability Letters 47
(2000) 1-10
Shiryayev, A.N. (1973): Statistical sequential analysis, Transl. Mathematical Monographs
8, American Mathematical Society, Providence/R.I. 1973
Shkarofsky, I.P. (1968): Generalized turbulence space-correlation and wave-number
spectrum-function pairs, Canadian J. Physics 46 (1968) 2133-2153
Shorack, G.R. (1969): Testing and estimating ratios of scale parameters, J. Am. Statist.
Assn 64, 999-1013, 1969
Shumway, R.H. and D.S. Stoffer (2000): Time series analysis and its applications,
Springer Verlag, New York 2000
Shwartz, A. and A. Weiss (1995): Large deviations for performance analysis, Chapman
and Hall, Boca Raton 1995
Sibuya, M. (1960): Bivariate extreme statistics, Ann. Inst. Statist. Math. 11 (1960) 195-
210
Sibuya, M. (1962): A method of generating uniformly distributed points on n-dimensional
spheres, Ann. Inst. Statist. Math. 14 (1962) 81-85
Sillard, P., Altamimi, Z. and C. Boucher (1998): The ITRF96 realization and its associ-
ated velocity field, Geophysical Research Letters 25 (1998) 3223-3226
Silvey, S.D. (1975): Statistical inference, Chapman and Hall, Boca Raton 1975
Silvey, S.D. (1980): Optimal design, Chapman and Hall, 1980
Sima, V. (1996): Algorithms for linear-quadratic optimization, Dekker, New York 1996
Simmonet, M. (1996): Measures and probabilities, Springer Verlag, New York 1996
Simon, H.D. and H. Zha (2000): Low-rank matrix approximation using the Lanczos bidi-
agonalization process with applications, SIAM J. Sci. Comput. 21 (2000) 2257-2274
Simoncini, V. and F. Perotti (2002): On the numerical solution of (λ²A + λB + C)x = b
and application to structural dynamics, SIAM J. Sci. Comput. 23 (2002) 1875-1897
Simonoff, J.S. (1996): Smoothing methods in statistics, Springer Verlag, New York 1996
Singh, R. (1963): Existence of bounded length confidence intervals, Ann. Math. Statist.
34 (1963) 1474-1485
Singh, S. and D.S. Tracy (1999): Ridge regression using scrambled responses, Metron 57
(1999) 147-157
Singh, S., Horn, S., Chowdhury, S. and F. Yu (1999): Calibration of the estimators of
variance, Austral. & New Zealand J. Statist. 41 (1999) 199-212
Sjoeberg, L.E. (2003): The BLUE of the GPS double difference satellite-to-receiver
Range for precise positioning, Z. Vermessungswesen 1 (2003) 26-30
Slakter, M.J. (1965): A comparison of the Pearson chi-square and Kolmogorov goodness-
of-fit-tests with respect to validity, J. American Statist. Assoc. 60 (1965) 854-858
Small, C.G. (1996): The statistical theory of shape, Springer Verlag, New York 1996
Smith, A.F.M. (1973): A general Bayesian linear model, J. Royal Statistical Society B 35
(1973) 67-75
Smith, P.J. (1995): A recursive formulation of the old problem of obtaining moments
from cumulants and vice versa, The American Statistician 49 (1995) 217-218
Smith, T. and S.D. Peddada (1998): Analysis of fixed effects linear models under hetero-
scedastic errors, Statistics & Probability Letters 37 (1998) 399-408
Smyth, G.K. (1989): Generalized linear models with varying dispersion, J. Royal Statisti-
cal Society B51 (1989) 47-60
Sneeuw, N. and R. Bun (1996): Global spherical harmonic computation by two-
dimensional Fourier methods, J. Geodesy 70 (1996) 224-232
Solari, H.G., Natiello, M.A. and G.B. Mindlin (1996): Nonlinear dynamics, IOP, Bristol
1996
Solomon, P.J. (1985): Transformations for components of variance and covariance, Bio-
metrika 72 (1985) 233-239
Somogyi, J. (1998): The robust estimation of the 2D-projective transformation, Acta
Geod. Geoph. Hung. 33 (1998) 279-288
Song, S.H. (1996): Consistency and asymptotic unbiasedness of S² in the serially correlated error components regression model for panel data, Statistical Papers 37 (1996) 267-275
Song, S.H. (1999): A note on S² in a linear regression model based on two-stage sampling data, Statistics & Probability Letters 43 (1999) 131-135
Soper, H.E. (1916): On the distributions of the correlation coefficient in small samples,
Biometrika 11 (1916) 328-413
Soroush, M. and K.R. Muske (2000): Analytical model predictive control, Progress in
Systems and Control Theory 26 (2000) 166-179
Spanos, A. (1999): Probability theory and statistical inference, Cambridge University
Press, Cambridge 1999
Spaeth, H. (1997): Zum Ausgleich von sphärischen Messdaten mit Kleinkreisen, Allg.
Vermessungsnachrichten 11 (1997) 408-410
Spaeth, H. and G.A. Watson (1987): On orthogonal linear l1 approximation, Numerische
Mathematik 51 (1987) 531-543
Sposito, V.A. (1982): On unbiased Lp regression. J. Amer. Statist. Assoc. 77 (1982) 652-
653
Sprent, P. and N.C. Smeeton (1989): Applied nonparametric statistical methods, Chapman
and Hall, Boca Raton, Florida 1989
Sprott, D.A. (1978): Gauss’s contributions to statistics, Historia Mathematica 5 (1978)
183-203
Strecok, A.J. (1968): On the calculation of the inverse of the error function, Math. Compu-
tation 22 (1968) 144-158
Srivastava, A.K., Dube, M. and V. Singh (1996): Ordinary least squares and Stein-rule
predictions in regression models under inclusion of some superfluous variables, Sta-
tistical Papers 37 (1996) 253-265
Srivastava, A.K. and Shalabh, S. (1996): Efficiency properties of least squares and Stein-
Rule predictions in linear regression models, J. Appl. Stat. Science 4 (1996) 141-145
Srivastava, A.K. and Shalabh, S. (1997): A new property of Stein procedure in measure-
ment error model, Statistics & Probability Letters 32 (1997) 231-234
Srivastava, M.S. and D. von Rosen (1998): Outliers in multivariate regression models, J.
Multivariate Analysis 65 (1998) 195-208
Stahlecker, P. and K. Schmidt (1996): Biased estimation and hypothesis testing in linear
regression, Acta Applicandae Mathematicae 43 (1996) 145-151
Stahlecker, P., Knautz, H. and G. Trenkler (1996): Minimax adjustment technique in a
parameter restricted linear model, Acta Applicandae Mathematicae 43 (1996) 139-144
Stam, A.J. (1982): Limit theorems for uniform distributions on spheres in high dimen-
sional euclidean spaces, J. Appl. Prob. 19 (1982) 221-229
Steele, B.M. (1996): A modified EM algorithm for estimation in generalized mixed mod-
els, Biometrics 52 (1996) 1295-1310
Stefanski, L.A. (1989): Unbiased estimation of a nonlinear function of a normal mean
with application to measurement error models, Communications Statist. Theory
Method. 18 (1989) 4335-4358
Stefansky, W. (1971): Rejecting outliers by maximum normal residual, Ann. Math. Statis-
tics 42 (1971) 35-45
Stein, C. (1945): A two-sample test for a linear hypothesis whose power is independent of
the variance, Ann. Math. Statistics 16 (1945) 243-258
Stein, C. (1950): Unbiased estimates with minimum variance, Ann. Math. Statist. 21
(1950) 406-415
Stein, C. (1959): An example of wide discrepancy between fiducial and confidence inter-
vals, Ann. Math. Statist. 30 (1959) 877-880
Stein, C. (1964): Inadmissibility of the usual estimator for the variance of a normal distri-
bution with unknown mean, Ann. Inst. Statist. Math. 16 (1964) 155-160
Stein, C. and A. Wald (1947): Sequential confidence intervals for the mean of a normal
distribution with known variance, Ann. Math. Statist. 18 (1947) 427-433
Steiner, F. and B. Hajagos (1999a): A more sophisticated definition of the sample median,
Acta Geod. Geoph. Hung. 34 (1999) 59-64
Steiner, F. and B. Hajagos (1999b): Insufficiency of asymptotic results demonstrated on
statistical efficiencies of the L1 Norm calculated for some types of the supermodel
fp(x), Acta Geod. Geoph. Hung. 34 (1999) 65-69
Steiner, F. and B. Hajagos (1999c): Error characteristics of MAD-S (of sample medians)
in case of small samples for some parent distribution types chosen from the super-
models fp(x) and fa(x), Acta Geod. Geoph. Hung. 34 (1999) 87-100
Steinmetz, V. (1973): Regressionsmodelle mit stochastischen Koeffizienten, Proc. Oper.
Res. 2, DGOR Ann. Meet., Hamburg 1973, pp. 95-104
Stenger, H. (1971): Stichprobentheorie, Physica-Verlag, Würzburg 1971
Stephens, M.A. (1963): Random walk on a circle, Biometrika 50 (1963) 385-390
Stephens, M.A. (1964): The testing of unit vectors for randomness, J. Amer. Statist. Assoc.
59 (1964) 160-167
Stephens, M.A. (1969a): Tests for randomness of directions against two circular alterna-
tives, J. Amer. Statist. Ass. 64 (1969) 280-289
Stephens, M.A. (1969b): Test for the von Mises distribution, Biometrika 56 (1969) 149-
160
Stephens, M.A. (1979): Vector correlations, Biometrika 66 (1979) 41-88
Stepniak, C. (1985): Ordering of nonnegative definite matrices with application to com-
parison of linear models, Linear Algebra and Its Applications 70 (1985) 67-71
Stewart, C.W. (1997): Bias in robust estimation caused by discontinuities and multiple
structures, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997)
818-833
Stewart, G.W. (1977): On the perturbation of pseudo-inverses, projections and linear least
squares, SIAM Review 19 (1977) 634-663
Stewart, G.W. (1995a): Gauss, statistics and Gaussian elimination, J. Computational and
Graphical Statistics 4 (1995) 1-11
Stewart, G.W. (1995b): Afterword, in Translation: Theoria Combinationis Observationum Erroribus Minimis Obnoxiae, pars prior-pars posterior-supplementum, by Carl Friedrich Gauss (Theory of the Combination of Observations Least Subject to Errors), Classics in Applied Mathematics, SIAM edition, pages 205-236, Philadelphia 1995
Stewart, G.W. (1992): An updating algorithm for subspace tracking, IEEE Trans. Signal
Proc. 40 (1992) 1535-1541
Stewart, G.W. (1998): Matrix algorithms, vol. 1: Basic decompositions, SIAM, Philadel-
phia 1998
Stewart, G.W. (1999): The QLP approximation to the singular value decomposition,
SIAM J. Sci. Comput. 20 (1999) 1336-1348
Stewart, G.W. (2001): Matrix algorithms, vol. 2: Eigensystems, SIAM, Philadelphia 2001
Stewart, G.W. and Sun Ji-Guang (1990): Matrix perturbation theory, Academic Press,
New York 1990
Stewart, K.G. (1997): Exact testing in multivariate regression, Econometric reviews 16
(1997) 321-352
Stigler, S.M. (1973a): Laplace, Fisher and the discovery of the concept of sufficiency,
Biometrika 60 (1973) 439-445
Stigler, S.M. (1973b): Simon Newcomb, Percy Daniell, and the history of robust estima-
tion 1885-1920, J. American Statistical Association 68 (1973) 872-879
Stigler, S.M. (1977): An attack on Gauss, published by Legendre in 1820, Historia
Mathematica 4 (1977) 31-35
Stigler, S.M. (1986): The history of statistics, the measurement of uncertainty before
1900, Belknap Press-Harvard University Press, Cambridge/Mass. 1986
Stigler, S.M. (1999): Statistics on the table, the history of statistical concepts and meth-
ods, Harvard University Press, Cambridge-London 1999
Stigler, S.M. (2000): International statistics at the millennium: progressing or regressing,
International Statistical Review 68 (2000) 111-121
Stoica, P. and T. Soederstroem (1998): Partial least squares: A first-order analysis, Scandinavian J. Statistics 25 (1998) 17-24
Stopar, B. (1999): Design of horizontal GPS net regarding non-uniform precision of GPS
baseline vector components, Bollettino di Geodesia e Scienze Affini 58 (1999) 255-
272
Stopar, B. (2001): Second order design of horizontal GPS net, Survey Review 36 (2001)
44-53
Storm, R. (1967): Wahrscheinlichkeitsrechnung, mathematische Statistik und statistische
Qualitätskontrolle, Leipzig 1967
Stoyanov, J. (1997): Regularly perturbed stochastic differential systems with an internal
random noise, Nonlinear Analysis, Theory, Methods & Applications 30 (1997) 4105-
4111
Stoyanov, J. (1998): Global dependency measure for sets of random elements: "The Ital-
ian problem" and some consequences, in: Ioannis Karatzas et al. (Eds.), Stochastic
process and related topics in memory of Stamatis Cambanis 1943-1995, Birkhäuser,
Boston/Basel/Berlin 1998
Stoyanov, J. (1999): Inverse Gaussian distribution and the moment problem, J. Appl.
Statist. Science 9 (1999) 61-71
Stoyanov, J. (2000): Krein condition in probabilistic moment problems, Bernoulli 6 (2000)
939-949
Strang, G. and K. Borre (1997): Linear algebra, geodesy and GPS, Wellesley-Cambridge Press 1997
Stroebel, D. (1997): Die Anwendung der Ausgleichungsrechnung auf elastomechanische
Systeme, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaf-
ten, Report C478, München 1997
Strohmer, T. (2000): A Levinson-Galerkin algorithm for regularized trigonometric ap-
proximation, SIAM J. Sci. Comput. 22 (2000) 1160-1183
Stroud, A.H. (1966): Gaussian quadrature formulas, Prentice Hall, Englewood Cliffs, N.J.
1966
Stroud, A.H. and D. Secrest (1963): Approximate integration formulas for certain spheri-
cally symmetric regions, Mathematics of Computation 17 (1963) 105-135
Stuart, A. and J.K. Ord (1994): Kendall’s advanced theory of statistics: volume I, distribu-
tion theory, Arnold Publ., 6th edition, London 1997
Student (Gosset, W.S.) (1908): The probable error of a mean, Biometrika 6 (1908) 1-25
Stulajter, F. (1978): Nonlinear estimators of polynomials in mean values of a Gaussian
stochastic process, Kybernetika 14 (1978) 206-220
Sturmfels, B. (1996): Gröbner bases and convex polytopes, American Mathematical
Society, Providence 1996
Subrahamanyan, M. (1972): A property of simple least squares estimates, Sankhya B34
(1972) 355-356
Suenkel, H. (1985): Fourier analysis of geodetic networks, in: Optimization and design
of geodetic networks, pp. 257-302, Grafarend E.W. and F. Sansò (eds), Springer Ver-
lag 1985
Sugiura, N. and Y. Fujikoshi (1969): Asymptotic expansions of the non-null distributions
of the likelihood ratio criteria for multivariate linear hypothesis and independence,
Ann. Math. Stat. 40 (1969) 942-952
Sun, J.-G. (2000): Condition number and backward error for the generalized singular
value decomposition, SIAM J. Matrix Anal. Appl. 22 (2000) 323-341
Sundaram, R.K. (1996): A first course in optimisation theory, Cambridge University Press
1996
Swallow, W.H. and F. Kianifard (1996): Using robust scale estimates in detecting multi-
ple outliers in linear regression, Biometrics 52 (1996) 545-556
Swallow, W.H. and S.R. Searle (1978): Minimum variance quadratic unbiased estimation
(MIVQUE) of variance components, Technometrics 20 (1978) 265-272
Swamy, P.A.V.B. (1971): Statistical inference in random coefficient regression models,
Springer-Verlag, Berlin 1971
Sylvester, J.J. (1850): Additions to the articles, "On a new class of theorems", and "On
Pascal's theorem", Phil. Mag. 37 (1850) 363-370
Sylvester, J.J. (1851): On the relation between the minor determinants of linearly equiva-
lent quadratic functions, Phil. Mag. 14 (1851) 295-305
Szabados, T. (1996): An elementary introduction to the Wiener process and stochastic
integrals, Studia Scientiarum Mathematicarum Hungarica 31 (1996) 249-297
Szasz, D. (1996): Boltzmann’s ergodic hypothesis, a conjecture for centuries?, Studia
Scientiarum Mathematicarum Hungarica 31 (1996) 299-322
Takos, I. (1999): Adjustment of observation equations without full rank, Bollettino di
Geodesia e Scienze Affini 58 (1999) 195-208
Tanana, V.P. (1997): Methods for solving operator equations, VSP, Utrecht, Netherlands
1997
Tanizaki, H. (1993): Nonlinear filters – estimation and applications, Springer, Berlin
Heidelberg New York 1993
Tanizaki, H. (2000): Bias correction of OLSE in the regression model with lagged de-
pendent variables, Computational Statistics & Data Analysis 34 (2000) 495-511
Tanner, A. (1996): Tools for statistical inference, 3rd ed., Springer Verlag, New York
1996
Tarpey, T. (2000): A note on the prediction sum of squares statistic for restricted least
squares, The American Statistician 54 (2000) 116-118
Tarpey, T. and B. Flury (1996): Self-consistency, a fundamental concept in statistics,
Statistical Science 11 (1996) 229-243
Tasche, D. (2003): Unbiasedness in least quantile regression, in: R. Dutter, P. Filzmoser,
U. Gather, P.J. Rousseeuw (Eds.), Developments in Robust Statistics, pp. 377-386,
Physica Verlag, Heidelberg 2003
Tashiro, Y. (1977): On methods for generating uniform random points on the surface of a
sphere, Ann. Inst. Statist. Math. 29 (1977) 295-300
Tate, R.F. (1959): Unbiased estimation: functions of location and scale parameters, Ann.
Math. Statist. 30 (1959) 341-366
Tate, R.F. and G.W. Klett (1959): Optimal confidence intervals for the variance of a
normal distribution, J. American Statistical Assoc. 16 (1959) 243-258
Taylor, J.R. (1982): An introduction to error analysis, University Science Books, Sausalito 1982
Teicher, H. (1961): Maximum likelihood characterization of distribution, Ann. Math. Statist. 32 (1961) 1214-1222
Tenorio, L. (2001): Statistical Regularization of inverse problems, SIAM Review 43
(2001) 347-366
Teunissen, P.J.G. (1985a): The geometry of geodetic inverse linear mapping and non-
linear adjustment, Netherlands Geodetic Commission, Publications on Geodesy, New
Series, Vol. 8/1, Delft 1985
Teunissen, P.J.G. (1985b): Zero order design: generalized inverses, adjustment, the datum
problem and S-transformations, In: Optimization and design of geodetic networks,
Grafarend, E.W. and F. Sanso eds., Springer Verlag, Berlin-Heidelberg-New York-
Tokyo 1985
Teunissen, P.J.G. (1989a): Nonlinear inversion of geodetic and geophysical data: diagnos-
ing nonlinearity, In: Brunner, F.K. and C. Rizos (eds): Developments in four-
dimensional geodesy, Lecture Notes in Earth Sciences 29 (1989), 241-264
Teunissen, P.J.G. (1989b): First and second moments of non-linear least-squares estima-
tors, Bull. Geod. 63 (1989) 253-262
Teunissen, P.J.G. (1990): Non-linear least-squares estimators, Manuscripta Geodaetica 15
(1990) 137-150
Teunissen, P.J.G. (1993): Least-squares estimation of the integer GPS ambiguities, LGR
series, No. 6, Delft Geodetic Computing Centre, Delft 1993
Teunissen, P.J.G. (1995a): The invertible GPS ambiguity transformation, Manuscripta
Geodaetica 20 (1995) 489-497
Teunissen, P.J.G. (1995b): The least-squares ambiguity decorrelation adjustment: a
method for fast GPS integer ambiguity estimation, J. Geodesy 70 (1995) 65-82
Teunissen, P.J.G. (1997a): A canonical theory for short GPS baselines. Part I: The base-
line precision, J. Geodesy 71 (1997) 320-336
Teunissen, P.J.G. (1997b): On the sensitivity of the location, size and shape of the GPS ambiguity search space to certain changes in the stochastic model, J. Geodesy 71 (1997) 541-551
Teunissen, P.J.G. (1997c): On the GPS widelane and its decorrelating property, J. Geodesy 71 (1997) 577-587
Teunissen, P.J.G. (1997d): The least-squares ambiguity decorrelation adjustment: its performance on short GPS baselines and short observation spans, J. Geodesy 71 (1997) 589-602
Teunissen, P.J.G. and A. Kleusberg (1998): GPS observation and positioning concepts, in:
GPS for Geodesy, pp. 187-229, Teunissen, P.J.G. and A. Kleusberg (eds), Berlin 1998
Teunissen, P.J.G., de Jonge, P.J. and C.C.J.M. Tiberius (1997): The least-squares ambigu-
ity decorrelation adjustment: its performance on short GPS baselines and short obser-
vation spans, J. Geodesy 71 (1997) 589-602
Théberge, A. (2000): Calibration and restricted weights, Survey Methodology 26 (2000)
99-107
Theil, H. (1965): The analysis of disturbances in regression analysis, J. American
Statistical Association 60 (1965) 1067-1079
Thompson, R. (1969): Iterative estimation of variance components for non-orthogonal
data, Biometrics 25 (1969) 767-773
Thompson, W.A. (1955): The ratio of variances in variance components model, Ann.
Math. Statist. 26 (1955) 325-329
Thomson, D.J. (1982): Spectrum estimation and harmonic analysis, Proceedings of the
IEEE 70 (1982) 1055-1096
Tian, G.-L. (1998): The comparison between polynomial regression and orthogonal poly-
nomial regression, Statistics & Probability Letters 38 (1998) 289-294
Tiberius, C.C.J.M. and F. Kenselaar (2000): Estimation of the stochastic model for GPS
code and phase observables, Survey Review 35 (2000) 441-455
Tikhonov, A.N. and V.Y. Arsenin (1977): Solutions of ill-posed problems, J. Wiley, New
York 1977
Tikhonov, A.N., A.S. Leonov and A.G. Yagola (1998a): Nonlinear ill-posed problems,
vol.1, Appl. Math. and Math. Comput. 14, Chapman & Hall, London 1998
Tikhonov, A.N., A.S. Leonov and A.G. Yagola (1998b): Nonlinear ill-posed problems,
vol.2, Appl. Math. and Math. Comput. 14, Chapman & Hall, London 1998
Tjoestheim, D. (1990): Non-linear time series and Markov chains, Adv. Appl. Prob. 22
(1990) 587-611
Tjur, T. (1998): Nonlinear regression, quasi likelihood, and overdispersion in generalized
linear models, American Statistician 52 (1998) 222-227
Tobias, P.A. and D.C. Trindade (1995): Applied reliability, Chapman and Hall, Boca
Raton 1995
Tolimieri, R., An, A. and C. Lu (1989): Algorithms for discrete Fourier transform and
convolution, Springer Verlag 1989
Tominaga, Y. and I. Fujiwara (1997): Prediction-weighted partial least-squares regression
(PWPLS), Chemometrics and Intelligent Lab Systems 38 (1997) 139-144
Tong, H. (1990): Non-linear time series, Oxford University Press, Oxford 1990
Toranzos, F.I. (1952): An asymmetric bell-shaped frequency curve, Ann. Math. Statist. 23
(1952) 467-469
Tornatore, V. and F. Migliaccio (2001): Stochastic modelling of non-stationary smooth
phenomena, International Association of Geodesy Symposia 122 “IV Hotine – Ma-
russi Symposium on Mathematical Geodesy”, Springer Verlag, Berlin – Heidelberg
2001
Toutenburg, H. (1970): Vorhersage im allgemeinen linearen Regressionsmodell mit sto-
chastischen Regressoren, Math. Operationsforschg. Statistik 2 (1970) 105-116
Toutenburg, H. (1975): Vorhersage in linearen Modellen, Akademie Verlag, Berlin 1975
Toutenburg, H. (1996): Estimation of regression coefficients subject to interval con-
straints, Sankhya: The Indian J. Statistics A, 58 (1996) 273-282
Toutenburg, H. (2000): Improved predictions in linear regression models with stochastic
linear constraints, Biometrical Journal 42 (2000) 71-86
Townsend, E.C. and S.R. Searle (1971): Best quadratic unbiased estimation of variance
components from unbalanced data in the 1-way classification, Biometrics 27 (1971)
643-657
Trefethen, L.N. and D. Bau (1997): Numerical linear algebra, Society for Industrial and
Applied Mathematics (SIAM), Philadelphia 1997
Troskie, C.G. and D.O. Chalton (1996): Detection of outliers in the presence of multicol-
linearity, in: Multidimensional statistical analysis and theory of random matrices, Pro-
ceedings of the Sixth Lukacs Symposium, eds. Gupta, A.K. and V.L.Girko, pages
273-292, VSP, Utrecht 1996
Troskie, C.G., Chalton, D.O. and M. Jacobs (1999): Testing for outliers and influential
observations in multiple regression using restricted least squares, South African Sta-
tist. J. 33 (1999) 1-40
Tsai, H. and K.S. Chan (2000): A note on the covariance structure of a continuous-time
ARMA process, Statistica Sinica 10 (2000) 989-998
Tsimikas, J.V. and J. Ledolter (1997): Mixed model representation of state space models:
new smoothing results and their application to REML estimation, Statistica Sinica 7
(1997) 973-991
Tufts, D.W. and R. Kumaresan (1982): Estimation of frequencies of multiple sinusoids:
making linear prediction perform like maximum likelihood, Proc. of IEEE Special is-
sue on Spectral estimation 70 (1982) 975-989
Turkington, D. (2000): Generalised vec operators and the seemingly unrelated regression
equations model with vector correlated disturbances, J. Econometrics 99 (2000) 225-
253
Ulrych, T.J. and R.W. Clayton (1976): Time series modelling and maximum entropy,
Phys. Earth and Planetary Interiors 12 (1976) 188-200
Vainikko, G.M. (1982): The discrepancy principle for a class of regularization methods,
USSR. Comp. Math. Math. Phys. 22 (1982) 1-19
Vainikko, G.M. (1983): The critical level of discrepancy in regularization methods,
USSR. Comp. Math. Math. Phys. 23 (1983) 1-9
Van der Veen, A.-J. (1996): A Schur method for low-rank matrix approximation, SIAM J.
Matrix Anal. Appl. 17 (1996) 139-160
Van Garderen, K.J. (1999): Exact geometry of autoregressive models, J. Time Series
Analysis 20 (1999) 1-21
Van Gelderen, M. and R. Rummel (2001): The solution of the general geodetic boundary
value problem by least squares, J. Geodesy 75 (2001) 1-11
Van Huffel, S. (1990): Solution and properties of the restricted total least squares problem, Proceedings of the International Mathematical Theory of Networks and Systems Symposium (MTNS '89) 521-528
Van Huffel, S. (1997): Recent advances in total least squares techniques and errors-in-
variables modelling, SIAM, Philadelphia 1997
Van Huffel, S. and H. Zha (1991a): The restricted total least squares problem: Formula-
tion, algorithm, and properties, SIAM J. Matrix Anal. Appl. 12 (1991) 292-309
Van Huffel, S. and H. Zha (1991b): The total least squares problem, SIAM J. Matrix
Anal. Appl. 12 (1991) 377-407
Van Mierlo, J. (1980): Free network adjustment and S-transformations, Deutsche Geod.
Kommission B 252, München 1980, 41-54
Van Montfort, K. (1988): Estimating in structural models with non-normal distributed
variables: some alternative approaches, Leiden 1988
Van Montfort, K. (1989): Estimating in structural models with non-normal distributed
variables: some alternative approaches, 'Reprodienst, Subfaculteit Psychologie', Lei-
den 1989
Van Rosen, D. (1988): Moments for matrix normal variables, Statistics 19 (1988) 575-583
Vanicek, P. and E.W. Grafarend (1980): On the weight estimation in leveling, National
Oceanic and Atmospheric Administration, Report NOS 86, NGS 17, Rockville 1980
Varadhan, S.R.S. (2001): Diffusion Processes, D. N. Shanbhag and C.R. Rao, eds., Hand-
book of Statistics 19 (2001) 853-871
Vasconcellos, K.L.P. and M.C. Gauss (1997): Approximate bias for multivariate nonlin-
ear heteroscedastic regressions, Brazilian J. Probability and Statistics 11 (1997) 141-
159
Vedel-Jensen, E.B. and L. Stougaard-Nielsen (2000): Inhomogeneous Markov point
processes by transformation, Bernoulli 6 (2000) 761-782
Ventsel, A.D. and M.I. Freidlin (1969): On small random perturbations of dynamical
systems, Report delivered at the meeting of the Moscow Mathematical Society on
March 25, 1969, Moscow 1969
Ventzell, A.D. and M.I. Freidlin (1970): On small random perturbations of dynamical
systems, Russian Math. Surveys 25 (1970) 1-55
Verbeke, G. and G. Molenberghs (2000): Linear mixed models for longitudinal data,
Springer-Verlag, New York 2000
Vernizzi, A., Goller, R. and P. Sais (1995): On the use of shrinkage estimators in filtering
extraneous information, Giorn. Econ. Annal. Economia 54 (1995) 453-480
Verbeke, G. and G. Molenberghs (1997): Linear Mixed Models in Practice, Springer,
New York 1997
Verdooren, L.R. (1980): On estimation of variance components, Statistica Neerlandica 34
(1980) 83-106
Vichi, M. (1997): Fitting L2 norm classification models to complex data sets, Student 2
(1997) 203-213
Vigneau, E., M.F. Devaux, E.M. Qannari and P. Robert (1997): Principal component
regression, ridge regression and ridge principal component regression in spectroscopy
calibration, J. Chemometrics 11 (1997) 239-249
Vinod, H.D. and L.R. Shenton (1996): Exact moments for autoregressive and random
walk models for a zero or stationary initial value, Econometric Theory 12 (1996) 481-
499
Vinograde, B. (1950): Canonical positive definite matrices under internal linear transfor-
mations, Proc. Amer. Math. Soc.1 (1950) 159-161
Voinov, V.G. and M.S. Nikulin (1993a): Unbiased estimators and their applications,
volume 1: univariate case, Kluwer-Academic Publishers, Dordrecht 1993
Voinov, V.G. and M.S. Nikulin (1993b): Unbiased estimators and their applications,
volume 2: multivariate case, Kluwer-Academic Publishers, Dordrecht 1993
Volterra, V. (1930): Theory of functionals, Blackie, London 1930
Vonesh, E.F. and V.M. Chinchilli (1997): Linear and nonlinear models for the analysis of
repeated measurements, Marcel Dekker Inc., New York – Basel – Hong Kong 1997
Von Mises, R. (1918): Über die „Ganzzahligkeit“ der Atomgewichte und verwandte
Fragen, Phys. Z. 19 (1918) 490-500
Wagner, H. (1959): Linear programming techniques for regression analysis, J. American
Statistical Association 56 (1959) 206-212
Wahba, G. (1975): Smoothing noisy data with spline functions, Numer. Math. 24 (1975)
282-292
Wald, A. (1939): Contributions to the theory of statistical estimation and testing hypothe-
ses, Ann. Math. Statistics 10 (1939) 299-326
Wald, A. (1945): Sequential tests for statistical hypothesis, Ann. Math. Statistics 16
(1945) 117-186
Walker, J.S. (1996): Fast Fourier transforms, 2nd edition, CRC Press, Boca Raton 1996
Walker, P.L. (1996): Elliptic functions, J. Wiley, Chichester U.K. 1996
Walker, S. (1996): An EM algorithm for nonlinear random effect models, Biometrics 52
(1996) 934-944
Wallace, D.L. (1980): The Behrens-Fisher and Fieller-Creasy problems, in: R.A. Fisher:
an appreciation, Fienberg and Hinkley, eds, Springer, pp 117-147, New York 1980
Wan, A.T.K. (1994a): The sampling performance of inequality restricted and pre-test
estimators in a misspecified linear model, Austral. J. Statist. 36 (1994) 313-325
Wan, A.T.K. (1994b): Risk comparison of the inequality constrained least squares and
other related estimators under balanced loss, Econom. Lett. 46 (1994) 203-210
Wan, A.T.K. (1994c): The non-optimality of interval restricted and pre-test estimators
under squared error loss, Comm. Statist. A – Theory Methods 23 (1994) 2231-2252
Wan, A.T.K. (1999): A note on almost unbiased generalized ridge regression estimator
under asymmetric loss, J. Statist. Comput. Simul. 62 (1999) 411-421
Wan, A.T.K. and K. Ohtani (2000): Minimum mean-squared error estimation in linear
regression with an inequality constraint, J.Statistical Planning and Inference 86 (2000)
157-173
Wang, J. (1996): Asymptotics of least-squares estimators for constrained nonlinear re-
gression, Annals of Statistics 24 (1996) 1316-1326
Wang, J. (2000): An approach to GLONASS ambiguity resolution, J. Geodesy 74 (2000)
421-430
Wang, M. C. and G.E. Uhlenbeck (1945): On the theory of the Brownian motion II, Re-
view of Modern Physics 17 (1945) 323-342
Wang, N., Lin, X. and R.G. Guttierrez (1999): A bias correction regression calibration
approach in generalized linear mixed measurement error models, Commun. Statist.
Theory Meth. 28 (1999) 217-232
Wang, Q-H. and B-Y. Jing (1999): Empirical likelihood for partial linear models with
fixed designs, Statistics & Probability Letters 41 (1999) 425-433
Wang, T. (1996): Cochran Theorems for multivariate components of variance models,
Sankhya: The Indian J. Statistics A, 58 (1996) 238-342
Wassel, S.R. (2002): Rediscovering a family of means, Mathematical Intelligencer 24
(2002) 58-65
Waterhouse, W.C. (1990): Gauss’s first argument for least squares, Archive for the His-
tory of Exact Sciences 41 (1990) 41-52
Watson, G.S. (1983): Statistics on spheres, Wiley, New York 1983
Watson, G.S. (1956a): Analysis of dispersion on a sphere, Monthly Notices of the Royal
Astronomical Society, Geophysical Supplement 7 (1956) 153-159
Watson, G.S. (1956b): A test for randomness of directions, Monthly Notices Roy. Astro.
Soc. Geoph. Suppl. 7 (1956) 160-161
Watson, G.S. (1960): More significance tests on the sphere, Biometrika 47 (1960) 87-91
Watson, G.S. (1961): Goodness-of-fit tests on a circle, Biometrika 48 (1961) 109-114
Watson, G.S. (1962): Goodness-of-fit tests on a circle-II, Biometrika 49 (1962) 57-63
Watson, G.S. (1964): Smooth regression analysis, Sankhya: The Indian J. Statistics: Se-
ries A (1964), 359-372
Watson, G.S. (1965): Equatorial distributions on a sphere, Biometrika 52 (1965) 193-201
Watson, G.S. (1966): Statistics of orientation data, J. Geology 74 (1966) 786-797
Watson, G.S. (1967a): Another test for the uniformity of a circular distribution, Bio-
metrika 54 (1967) 675-677
Watson, G.S. (1967b): Some problems in the statistics of directions, Bull. of I.S.I. 42
(1967) 374-385
Watson, G.S. (1968): Orientation statistics in the earth sciences, Bull. of the Geological
Institutions of the Univ. of Uppsala 2 (1968) 73-89
Watson, G.S. (1969): Density estimation by orthogonal series, Ann. Math. Stat. 40 (1969) 1469-1498
Watson, G.S. (1970): The statistical treatment of orientation data, Geostatistics – a collo-
quium (Ed. D.F. Merriam), Plenum Press, New York 1970, 1-10
Watson, G.S. (1974): Optimal invariant tests for uniformity, Studies in Probability and
Statistics, Jerusalem Academic Press (1974) 121-128
Watson, G.S. (1981): Large sample theory of the Langevin distributions, J. Stat. Planning
Inference 8 (1983) 245-256
Watson, G.S. (1982): The estimation of palaeomagnetic pole positions, in: Statistics and Probability: Essays in honor of C.R. Rao, North-Holland, Amsterdam and New York 1982
Watson, G.S. (1982): Distributions on the circle and sphere, Essays in Statistical Science, J. Appl. Prob. Special Volume 19A (1982) 265-280
Watson, G.S. (1984): The theory of concentrated Langevin distributions, J. Mult. Anal. 14
(1984) 74-82
Watson, G.S. (1986): Some estimation theory on the sphere, Ann. Inst. Statist. Math. 38
(1986) 263-275
Watson, G.S. (1987): The total approximation problem, in: Approximation theory IV, eds.
Chui, C.K. et al, pages 723-728, Academic Press 1987
Watson, G.S. (1988): The Langevin distribution on high dimensional spheres, J. Applied
Statistics 15 (1988) 123-130
Watson, G.S. (1998): On the role of statistics in palaeomagnetic proof of continental drift,
Canadian J. Statistics 26 (1998) 383-392
Watson, G.S. and E.J. Williams (1956): On the construction of significance tests on the
circle and the sphere, Biometrika 43 (1956) 344-352
Watson, G.S. and E. Irving (1957): Statistical methods in rock magnetism, Monthly No-
tices Roy. Astro. Soc. 7 (1957) 290-300
Watson, G.S. and M.R. Leadbetter (1963): On the estimation of the probability density-I,
Ann. Math. Stat 34 (1963) 480-491
Watson, G.S. and S. Wheeler (1964): A distribution-free two-sample test on a circle,
Biometrika 51 (1964) 256
Watson, G.S. and R.J. Beran (1967): Testing a sequence of unit vectors for serial correla-
tion, J. Geophysical Research 72 (1967) 5655-5659
Watson, G.S., Epp, R. and J.W. Tukey (1971): Testing unit vectors for correlation, J.
Geophysical Research 76 (1971) 8480-8483
Wedderburn, R. (1974): Quasi-likelihood functions, generalized linear models and the
Gauß-Newton method, Biometrika 61 (1974) 439-447
Wei, B.-C. (1998): Exponential family: nonlinear models, Springer Verlag, Singapore
1998
Wei, B.-C. (1998): Testing for varying dispersion in exponential family nonlinear models,
Ann. Inst. Statist. Math. 50 (1998) 277-294
Wei, B.-C. and Y.-Q. Hu (1998): Generalized Leverage and its applications, Board of the
Foundations of the Scandinavian J. Statistics 25 (1998) 25-37
Wei, M. (1997): Equivalent formulae for the supremum and stability of weighted pseudo-
inverses, Mathematics of Computation 66 (1997) 1487-1508
Wei, M. (2001): Supremum and stability of weighted pseudoinverses and weighted least
squares problems analysis and computations, Nova Science Publishers, New York
2001
Wei, M. and A.R. de Pierro (2000): Upper perturbation bounds of weighted projections,
weighted and constrained least squares problems, SIAM J. Matrix Anal. Appl. 21
(2000) 931-951
Wei, M. and A.R. de Pierro (2000): Some new properties of the equality constrained and
weighted least squares problem, Linear Algebra and its Applications 320 (2000) 145-
165
Weiss, A. (2002): Determination of thermal stratification and turbulence of the atmos-
pheric surface layer over various types of terrain by optical scintillometry, Disserta-
tion Swiss Federal Institute of Technology Zurich 2002
Weiss, G. and R. Rebarber (2000): Optimizability and estimatability for infinite-
dimensional linear systems, SIAM J. Control Optim. 39 (2000) 1204-1232
Weisstein, E.W. (1999): Legendre Polynomial, CRC Press LLC, Wolfram Research Inc.
1999
Wellisch, S. (1910): Theorie und Praxis der Ausgleichungsrechnung Band 2: Probleme
der Ausgleichungsrechnung, pp. 46-49, Kaiserliche und königliche Hof-
Buchdruckerei und Hof-Verlags-Buchhandlung, Carl Fromme, Wien und Leipzig
1910
Wellner, J. (1979): Permutation tests for directional data, Ann. Statist. 7 (1979) 924-943
Wells, D.E., Lindlohr, W., Schaffrin, B. and E.W. Grafarend (1987): GPS design: Undif-
ferenced carrier beat phase observations and the fundamental difference theorem,
University of New Brunswick, Surveying Engineering, Technical Report Nr. 116, 141 pages, Fredericton/Canada 1987
Welsh, A.H. (1996): Aspects of statistical inference, J. Wiley, New York 1996
Wenzel, H.G. (1977): Zur Optimierung von Schwerenetzen, Z. Vermessungswesen 102
(1977) 452-457
Werkmeister, P. (1916): Graphische Ausgleichung bei trigonometrischer Punktbestim-
mung durch Einschneiden, Z. Vermessungswesen 45 (1916) 113-126
Wernstedt, J. (1989): Experimentelle Prozeßanalyse, Oldenbourg Verlag, München 1989
Wertz, J.R. (1978): Spacecraft attitude determination and control, Kluwer Academic
Publishers, Dordrecht – Boston – London 1978
Wess, J. (1960): The conformal invariance in quantum field theory, in: Il Nuovo Cimento,
Nicola Zanichelli (Hrsg.), vol 18, Bologna 1960
Wetzel, W., Jöhnk, M.D. and P. Naeve (1967): Statistische Tabellen, de Gruyter, Berlin
1977
Whittaker, E.T. and G. Robinson (1924): The calculus of observations, Blackie, London
1924
Whittle, P. (1963a): Prediction and regulation, D. van Nostrand Co., Inc., Princeton 1963
Whittle, P. (1963b): Stochastic processes in several dimensions, Bull. Inst. Int. Statist. 40
(1963) 974-994
Whittle, P. (1973): Some general points in the theory of optimal experimental design, J.
Royal Statist. B35 (1973) 123-130
Wickerhauser, M.V. (1996): Adaptive Wavelet-Analysis, Theorie und Software, Vieweg
& Sohn Verlag, Braunschweig/Wiesbaden 1996
Wieser, A. (2000): Equivalent weight matrix, Graz University of Technology, Graz 2000
Wigner, E.P. (1958): On the distribution of the roots of certain symmetric matrices, Ann.
Math. 67 (1958)
Wilcox, R.R. (1997): Introduction to robust estimation and hypothesis testing, Academic
Press, San Diego 1997
Wilcox, R.R. (2001): Fundamentals of modern statistical methods, Springer Verlag, New
York 2001
Wilders, P. and E. Brakkee (1999): Schwarz and Schur: an algebraical note on equiva-
lence properties, SIAM J. Sci. Comput. 20 (1999) 2297-2303
Wilkinson, J. (1965): The algebraic eigenvalue problem, Clarendon Press, Oxford 1965
Wilks, S.S. (1962): Mathematical statistics, J. Wiley, New York 1962
Wilks, S.S. (1963): Multivariate statistical outliers, Sankhya A25 (1963) 407-426
Williams, E.J. (1963): A comparison of the direct and fiducial arguments in the estimation
of a parameter, J. Royal Statistical Society, Series B, 25 (1963) 95-99
Wimmer, G. (1995): Properly recorded estimate and confidence regions obtained by an
approximate covariance operator in a special nonlinear model, Applications of
Mathematics 40 (1995) 411-431
Wimmer, H. (1981a): Ein Beitrag zur Gewichtsoptimierung geodätischer Netze, Deutsche
Geodätische Kommission, München, Reihe C (1981), 269
Wimmer, H. (1981b): Second-order design of geodetic networks by an iterative approxi-
mation of a given criterion matrix, in: Proc. of the IAG Symposium on geodetic net-
works and computations, R. Sigle ed., Deutsche Geodätische Kommission, München,
Reihe B, Nr. 258 (1981), Heft Nr. III, 112-127
Wishart, J. (1928): The generalized product moment distribution in samples from a nor-
mal multivariate population, Biometrika 20 (1928) 32-52
Wishner, R., Tabaczynski, J. and M. Athans (1969): A comparison of three non-linear
filters, Automatica 5 (1969) 487-496
Witkovsky, Viktor (1998): Modified minimax quadratic estimation of variance compo-
nents, Kybernetika 34 (1998) 535-543
Witting, H. and G. Nölle (1970): Angewandte Mathematische Statistik, Teubner Verlag,
Stuttgart 1970
Wolf, H. (1968): Ausgleichungsrechnung nach der Methode der kleinsten Quadrate,
Ferdinand Dümmlers Verlag, Bonn 1968
Wolf, H. (1973): Die Helmert-Inverse bei freien geodätischen Netzen, Z. Vermessungs-
wesen 98 (1973) 396-398
Wolf, H. (1975): Ausgleichungsrechnung I, Formeln zur praktischen Anwendung,
Duemmler, Bonn 1975
Wolf, H. (1976): The Helmert block method, its origin and development, Proc. 2nd Int.
Symp. on problems related to the Redefinition of North American Geodetic Networks,
pp. 319-326, Arlington 1976
Wolf, H. (1980a): Ausgleichungsrechnung II, Aufgaben und Beispiele zur praktischen
Anwendung, Duemmler, Bonn 1980
Wolf, H. (1980b): Hypothesentests im Gauß-Helmert-Modell, Allg. Vermessungsnach-
richten 87 (1980) 277-284
Wolf, H. (1997): Ausgleichungsrechnung I und II, 3. Auflage, Ferdinand Dümmler Ver-
lag, Bonn 1997
Wolfowitz, J. and J. Kiefer (1959): Optimum design in regression problems, Ann. Math.
Statist. 30 (1959) 271-294
Wolkowicz, H. and G.P.H. Styan (1980): More bounds for eigenvalues using traces,
Linear Algebra Appl. 31 (1980) 1-17
Wolter, K.H. and W.A. Fuller (1982): Estimation of the quadratic errors-in-variables
model, Biometrika 69 (1982) 175-182
Wong, C.S. (1993): Linear models in a general parametric form, Sankhya 55 (1993) 130-
149
Wong, W.K. (1992): A unified approach to the construction of minimax designs, Bio-
metrika 79 (1992) 611-620
Wood, A. (1982): A bimodal distribution for the sphere, Applied Statistics 31 (1982) 52-
58
Woolson, R.F. and W.R. Clarke (1984): Analysis of categorical incomplete data, J. Royal
Statist. Soc. Series A 147 (1984) 87-99
Worbs, E. (1955): Carl Friedrich Gauß, ein Lebensbild, Leipzig 1955
Wu, C.F.J. (1981): Asymptotic theory of nonlinear least squares estimation, Ann. Stat. 9
(1981) 501-513
Wu, Q. and Z. Jiang (1997): The existence of the uniformly minimum risk equivariant
estimator in Sure model, Commun. Statist. - Theory Meth. 26 (1997) 113-128
Wunsch, G. (1986): Handbuch der Systemtheorie, Oldenbourg Verlag, München 1986
Xi, Z. (1993): Iterated Tikhonov regularization for linear ill-posed problems, PhD. Thesis,
Universität Kaiserslautern, Kaiserslautern 1993
Xu, P. (1989): On robust estimation with correlated observations, Bulletin Géodesique 63
(1989) 237-252
Xu, P. (1991): Least squares collocation with incorrect prior information, Z. Verm. 116
(1991) 266-273
Xu, P. (1992): The value of minimum norm estimation of geopotential fields, Geoph. J.
Int. 111 (1992) 170-178
Xu, P. (1995): Testing the hypotheses of non-estimable functions in free net adjustment
models, Manuscripta Geodaetica 20 (1995) 73-81
Xu, P. (1999a): Biases and accuracy of, and an alternative to, discrete nonlinear filters, J.
Geodesy 73 (1999) 35-46
Xu, P. (1999b): Spectral theory of constrained second-rank symmetric random tensors,
Geoph. J. Int. 138 (1999) 1-24
Xu, P. (2001): Random simulation and GPS decorrelation, J. Geodesy 75 (2001) 408-423
Xu, P. (2002): Isotropic probabilistic models for directions, planes and referential sys-
tems, Proc. Royal Soc. London A 458 (2002) 2017-2038
Xu, P. and E.W. Grafarend (1996): Statistics and geometry of the eigenspectra of three-
dimensional second-rank symmetric random tensors, Geophysical Journal Interna-
tional 127 (1996) 744-756
Xu, P. and E.W. Grafarend (1996): Probability distribution of eigenspectra and eigendi-
rections of a twodimensional, symmetric rank two random tensor, J. Geodesy 70
(1996) 419-430
Yaglom, A.M. (1961): Second-order homogeneous random fields in: Proceedings of the
Fourth Berkeley Symposium on Mathematical Statistics and Probability, pages 593-
622, University of California Press, Berkeley 1961
Yang, H. (1996): Efficiency matrix and the partial ordering of estimate, Commun. Statist.
- Theory Meth. 25(2) (1996) 457-468
Yang, Y. (1994): Robust estimation for dependent observations, Manuscripta Geodaetica
19 (1994) 10-17
Yang, Y. (1999a): Robust estimation of geodetic datum transformation, J. Geodesy 73
(1999) 268-274
Yang, Y. (1999b): Robust estimation of systematic errors of satellite laser range, J. Geod-
esy 73 (1999) 345-349
Yazji, S. (1998): The effect of the characteristic distance of the correlation function on the
optimal design of geodetic networks, Acta Geod. Geoph. Hung., Vol. 33 (2-4) (1998)
215-234
Ye, Y. (1997): Interior point algorithms: Theory and analysis, Wiley, New York 1997
Yeh, A.B. (1998): A bootstrap procedure in linear regression with nonstationary errors,
The Canadian J. Stat. 26 (1998) 149-160
Yeung, M.-C. and T.F. Chan (1997): Probabilistic analysis of Gaussian elimination with-
out pivoting, SIAM J. Matrix Anal. Appl. 18 (1997) 499-517
Ylvisaker, D. (1977): Test resistance, J. American Statistical Association 72 (1977) 551-557
Yohai, V.J. and R.H. Zamar (1997): Optimal locally robust M-estimates of regression, J.
Statistical Planning and Inference 64 (1997) 309-323
Yor, M. (1992): Some aspects of Brownian motion, Part I: Some special functionals,
Birkhäuser Verlag, Basel 1992
Yor, M. (1997): Some aspects of Brownian motion, Part II: Some recent martingale prob-
lems, Birkhäuser Verlag, Basel 1997
Youcai, H. and S.P. Mertikas (1995): On the designs of robust regression estimators,
Manuscripta Geodaetica 20 (1995) 145-160
Youssef, A.H.A. (1998): Coefficient of determination for random regression model,
Egypt. Statist. J. 42 (1998) 188-196
Yu, Z.C. (1996): A universal formula of maximum likelihood estimation of variance-
covariance components, J. Geodesy 70 (1996) 233-240
Yuan, K.H. and P.M. Bentler (1997): Mean and covariance structure analysis: theoretical
and practical improvements, J. American Statist. Assoc. 92 (1997) 767-774
Yuan, K.H. and P.M. Bentler (1998a): Robust mean and covariance structure analysis,
British J. Mathematical and Statistical Psychology (1998) 63-88
Yuan, K.H. and P.M. Bentler (1998b): Robust mean and covariance structure analysis
through iteratively reweighted least squares, Psychometrika 65 (2000) 43-58
Yuan, Y. (2000): On the truncated conjugate gradient method, Math. Prog. 87 (2000) 561-
571
Yuan, Y. (1999): On the truncated conjugate gradient method, Springer-Verlag
Yusuf, S., Peto, R., Lewis, J., Collins, R. and P. Sleight (1985): Beta blockade during and
after myocardial infarction: An overview of the randomized trials, Progress in Car-
diovascular Diseases 27 (1985) 335-371
Zabell, S. (1992): R.A. Fisher and the fiducial argument, Statistical Science 7 (1992) 369-
387
Zackin, R., de Gruttola, V. and N. Laird (1996): Nonparametric mixed-effects for re-
peated binary data arising in serial dilution assays: Application to estimating viral
burden in AIDS, J. American Statist. Assoc. 91 (1996) 52-61
Zacks, S. (1971): The theory of statistical inference, J. Wiley, New York 1971
Zacks, S. (1996): Adaptive designs for parametric models, 151-180
Zadeh, L. (1965): Fuzzy sets, Information and Control 8 (1965) 338-353
Závoti, J. (1999): Modified versions of estimates based on least squares and minimum
norm, Acta Geod. Geoph. Hung. 34 (1999) 79-86
Závoti, J. (2001): Filtering of earth’s polar motion using trigonometric interpolation, Acta
Geod. Geoph. Hung. 36 (2001) 345-352
Zehfuss, G. (1858): Über eine gewisse Determinante, Zeitschrift für Mathematik und
Physik 3 (1858) 298-301
Zellner, A. (1971): An introduction to Bayesian inference in econometrics, J. Wiley, New
York 1971
Zha, H. (1995): Comments on large least squares problems involving Kronecker products,
SIAM J. Matrix Anal. Appl. 16 (1995) 1172
Zhan, X. (2000): Singular values of differences of positive semidefinite matrices, SIAM J.
Matrix Anal. Appl. 22 (2000) 819-823
Zhang, Y. (1985): The exact distribution of the Moore-Penrose inverse of X with a den-
sity, in: Multivariate Analysis VI, Krishnaiah, P.R. (ed), pages 633-635, Elsevier,
New York 1985
Zhang, J.Z., Chen, L.H. and N.Y. Deng (2000): A family of scaled factorized Broyden-
like methods for nonlinear least squares problems, SIAM J. Optim. 10 (2000) 1163-
1179
Zhang, Z. and Y. Huang (2003): A projection method for least squares problems with a
quadratic equality constraint, SIAM J. Matrix Anal. Appl. 25 (2003) 188-212
Zhao, L. (2000): Some contributions to M-estimation in linear models, J. Statistical Plan-
ning and Inference 88 (2000) 189-203
Zhao, Y. and S. Konishi (1997): Limit distributions of multivariate kurtosis and moments
under Watson rotational symmetric distributions, Statistics & Probability Letters 32
(1997) 291-299
Zhdanov, M.S. (2002): Geophysical inverse theory and regularization problems, Methods
in Geochemistry and Geophysics 36, Elsevier, Amsterdam-Boston-London 2002
Zhenhua, X. (1993): Iterated Tikhonov regularization for linear ill-posed problem, PhD.
Thesis University of Kaiserslautern, Kaiserslautern 1993
Zhen-Su She, Jackson, E. and Orszag, S. (1990): Intermittency of turbulence, in: The
Legacy of John von Neumann (J. Glimm, J. Impagliazzo, I. Singer eds.) Proc. Symp.
Pure Mathematics, vol. 50, pages 197-211, American Mathematical Society, Provi-
dence, Rhode Island 1990
Zhou, J. (2001): Two robust design approaches for linear models with correlated errors,
Statistica Sinica 11 (2001) 261-272
Zhou, K.Q. and S.L. Portnoy (1998): Statistical inference on heteroskedastic models
based on regression quantiles, J. Nonparametric Statistics 9 (1998) 239-260
Zhong, D. (1997): Robust estimation and optimal selection of polynomial parameters for
the interpolation of GPS geoid heights, J. Geodesy 71 (1997) 552-561
Zhu, Jianjun (1996): Robustness and the robust estimate, J. Geodesy 70 (1996) 586-590
Ziegler, A., Kastner, C. and M. Blettner (1998): The generalised estimating equations: an
annotated bibliography, Biom. J. 40 (1998) 115-139
Zimmermann, H.-J. (1991): Fuzzy set theory and its applications, 2nd ed., Kluwer Aca-
demic Publishers, Dordrecht 1991
Zioutas, G., Camarinopoulos, L. and E.B. Senta (1997): Theory and Methodology: Robust
autoregressive estimates using quadratic programming, European J. Operational Re-
search 101 (1997) 486-498
Zippel, R. (1993): Effective polynomial computations, Kluwer Academic Publishers,
Boston 1993
Zolotarev, V.M. (1997): Modern theory of summation of random variables, VSP, Utrecht
1997
Zucker, D.M., Lieberman, O. and O. Manor (2000): Improved small sample inference in
the mixed linear model: Bartlett correction and adjusted likelihood, J. R. Statist. Soc.
B. 62 (2000) 827-838
Zurmuehl, R. and S. Falk (1984): Matrizen und ihre Anwendungen, Teil 1: Grundlagen,
5.ed., Springer-Verlag, Berlin 1984
Zurmuehl, R. and S. Falk (1986): Matrizen und ihre Anwendungen, Teil 2: Numerische
Methoden, 5.ed., Springer-Verlag, Berlin 1986
Zwet, W.R. van and J. Oosterhoff (1967): On the combination of independent test statistics,
Ann. Math. Statist. 38 (1967) 659-680
Zyskind, G. (1969): On best linear estimation and a general Gauss-Markov theorem in
linear models with arbitrary nonnegative covariance structure, SIAM J. Appl. Math.
17 (1969) 1190-1202
Index
1-way classification, 460, 461, 463, 464 622, 623, 631, 632, 633, 634, 640,
2-way classification , 464, 467, 469, 642
470, 473, 475 bivariate Gauss-Laplace pdf, 627
3-way classification, n-way classifica- Bjerhammar formula, 485, 516
tion, 476 BLE, 285, 287
3d datum transformation, 431, 433, 441 BLIMBE, 311, 642
BLIP, 347
algebraic regression, 40 BLUMBE, 285, 291, 293, 287, 298, 299,
A-optimal design, 323, 359 300
ARIMA, 455, 477, 478 BLUUE, 188, 187, 194, 195, 201, 208,
arithmetic mean, 191, 195, 184 210, 379, 380, 387, 430, 467, 567,
ARMA, 455, 478 569, 571, 572, 574, 579, 581, 582,
ARMA process, 477 585, 586, 588, 592, 596, 597, 599,
associativity, 486, 497 603, 606, 621, 622, 629, 630, 632,
augmented Helmert matrix, 583 633, 639, 640, 643
autoregressive integrated moving- BLUUP, 380, 387
average process, see ARIMA bordering, 302, 485
break points, 176, 177, 181, 182, 143
best homogeneous linear prediction,
Brown process, 476
400
best homogeneous linear unbiased canonical LESS, 135, 137, 139, 131
prediction, 400 canonical MINOLESS, 281
best inhomogeneous linear prediction, canonical MINOS, 1, 13, 26, 36, 37, 41,
400 212, 485
Best Invariant Quadratic Uniformly Cayley inverse, 322, 323, 324, 359, 360,
Unbiased Estimation, see BIQUUE 497, 498, 499, 501, 502, 513, 517,
Best Linear Uniformly Unbiased 576
Estimation, see BLUUE Cayley multiplication, 513
Best Linear V-Norm Uniformly Mini- Cayley-product, 486, 497
mum Bias S-Norm Estimation, 462 characteristic equation, 509
bias matrix, 87, 88, 91, 93, 94, 288, characteristic polynomial, 509
312, 313, 349 Choleski decomposition, 503
bias vector, 87, 91, 93, 90, 288, 293, collocation, 386, 399
292, 300, 304, 309, 312, 313, column space, 6, 496, 497
314, 320, 321, 322, 349, 350, commutativity, 486
355, 357, 358 condition equations with unknowns, 429
bias weight matrix, 359 conditional equations, 413
BIQE, 285, 294, 299, 300, 301, 303, confidence coefficient, 554
304, 305, 311 confidence interval, 543, 553, 557, 564,
BIQUUE, 187, 189-196, 198-201, 217, 567, 592, 596, 605, 606, 611, 612,
236-241, 285, 294, 298-301, 613, 614, 617, 619, 620
303, 304, 305, 311, 380, 379, confidence region, 543, 621
385, 386, 387, 569, 571, 572, consistent linear equation, 2, 6
574, 579, 581, 582, 588, 596, cumulative pdf, 569, 571, 581
597, 599, 603, 606, 613, 621,
cumulative probability, 554, 615, 627, Fisher sphere, 327
628, 630 fixed effects, 304, 306, 309, 312, 313,
curtosis, 614, 645, 649, 647, 651 314, 347, 348, 397
curved manifolds, 327 Fourier analysis, 45, 46, 47, 48, 49, 50, 51
Fourier coefficient, 45
datum defect, 74 Fourier series, 40, 41, 42, 44, 45
datum transformation, 2, 78, 83 Fourier synthesis, 45
determinant, 485, 504, 512 Fourier-Legendre analysis, 59, 63, 64, 65,
determinantal Form, 526 57
diagonal, 488 Fourier-Legendre series, 52, 67
distributed observation, 91 Fourier-Legendre synthesis, 57, 68
distribution Frobenius error matrix, 433
- circular normal, 328, 329, 336, Frobenius matrix norm, 209, 290, 291,
339, 340 314, 313, 350
- circular normal Fisher, 335 Frobenius matrix W - seminorm, 438, 439
- Fisher- von Mises or Langevin, Frobenius norm, 87, 88, 91, 92, 94, 93,
329 122
- Langevin, 328
- oblique normal, 339, 340, 335 gamma function, 550, 553, 604, 605
- von Mises, 335, 336, Gauss-Laplace normal pdf, 573
- Gauss normal, 328 Gauss elimination, 23
distributivity, 486 Gauss matrix, 552
Duncan-Guttman matrix identity, 323, Gauss process, 476
324, 359, 360, 502 Gauss-Helmert model, 411, 412, 413,
dynamical Systems, 476 419, 428, 429
Gauss-Jacobi Combinatorial Algotithm,
eigenspace, 31, 509 176, 177, 179, 180, 182
eigenspace analysis, 1, 27, 30, 31, 33, Gauss-Laplace inequality, 644, 647
34, 40, 118, 151, 131, 133, 135, Gauss-Laplace normal distribution, 567,
243, 281, 621, 624, 625 580, 644, 646, 647, 652
eigenspace synthesis, 1, 28, 30, 33, 34, Gauss-Laplace normally distributed
36, 118, 131, 133, 135, 138, observations, 553, 557, 564
144, 243, 281, 621, 625, 628, Gauss-Laplace pdf, 554, 555, 581, 585
635, 638 Gauss-Laplace probability distribution,
eigenvalue analysis, 632 556, 559, 561, 562
eigenvalue decomposition, 26 Gauss-Markov model
eigenvalue-eigenvector analysis, 510 - consistent linear, 86, 87
eigenvalue-eigenvector decomposition, - general with mixed effects, 379, 385
485 - general linear with fixed effects, 380
eigenvalue-eigenvector synthesis, 510 - inhomogeneous general linear with
eigenvalues, 509, 512 fixed effects, 379
eigenvectors, 509, 512 - linear, 621
Equivalence Theorem, 216 - mixed, 382, 383
error propagation, 365, 366, 648 - mixed with fixed effects, 389, 390,
error-in-variables, 347, 348, 401-405 391, 392
estimability, 653, 654 - mixed with random effects, 389,
extended Helmert matrix, 583 390, 391, 392
- multivariate with constraints, 455,
fibering, 3, 7, 12, 96, 100, 103, 244, 459, 460
248, 256 - multivariate model, 455, 456, 457
Fisher - von Mises, 328 - multivariate with linear homogene-
Fisher distribution, 646 ous constraints, 459
Fisher pdf, 330
- special consistent linear, 88 Gy-LESS, 95, 112-116, 118, 129, 130,
- special consistent linear model of 131, 138, 139, 188, 202, 216, 273,
fixed effects, 89, 90 373, 374, 376, 377
- special error propagation law, 648 Gy-MINOS, 264
- special model, 85, 192, 208, 209, Gy-norm, 375
210, 211, 213, 223, 224, 226, Gy-seminorm, 112, 374, 375, 376, 377
228, 236, 237, 240, 241, 285,
297, 302, 306 Hadamard-product, 487
- special with fixed effects, 313, Hankel matrix, 492, 493, 485, 494
315, 316, 319, 320 HAPS, 243, 244, 282, 283, 284, 287
- special with random effects, 347, Hellenic mean, 185
349, 348, 355 Helmert’s Chi Square F 2 pdf, 585, 591,
- special without datum defect, 187 614, 621, 631, 633,
- special linear, 216, 287, 288, 623, Helmert equation, 234, 235, 236
631 Helmert matrix, 233, 234, 235, 485, 491,
- special linear of fixed effects, 631 492, 578, 579
- special linear with datum defect, Helmert random variable, 598, 601, 602,
642 603, 617, 629, 636
- special linear with datum defect of Helmert transformation, 555, 572, 575,
fixed effects, 642 576, 577, 578, 579, 588, 597, 600,
- special linear with fixed effects, 623, 634
321 Helmert’s inverse random variable, 615
- special linear with random effects, Helmert’s pdf, 614, 615, 616
351, 352, 353, 355, 356, 357 Helmert’s polar coordinates, 637
general Gauß-Helmert model, 445, 453 Helmert’s random variable, 613, 614,
general linear model of type conditional 615, 621, 627, 631
equations with unknowns, 453 Helmert’s random variable x, 637
generalized Helmert transformation, Helmert's ansatz, 230-234
635 Hesse matrix, 366, 367, 368, 370, 442,
geometric mean, 184 531, 542, 644, 650
g-inverse, 16, 254, 298, 302, 376, Higher classifications with interaction,
377, 415, 425, 447, 448, 485, 474
513, 514, 516, 517, 519, 521 Hilbert space, 41, 44, 54, 45, 56, 114,
- reflexive and symmetric, 520 HIQUUE, 221, 232, 236, 230, 235, 217,
- reflexive, 16, 110, 485, 513, 515, 234
517, 520 Hodge star operator, 152, 153, 164, 545,
- reflexive symmetric, 485, 513 hom BLE, 312, 313, 315, 316, 317, 319,
- symmetric reflexive, 518, 519 320, 321, 323, 324, 325, 326
goodness of fit, 375 hom BLIP, 347, 349, 351, 353, 355, 359,
Grand Linear Model, 445, 446, 449, 360, 361, 362
450 hom BLUUP, 349
Grassmann coordinates, 143, 152, 156, hom LUMBE, 90, 91,
157, 158, 161, 162, 165, 169- hom S-BLE, 312, 313, 315, 316, 317,
175 318, 319, 320, 321, 323, 324, 325,
Grassmann product, 545 326
Gx-LESS, 180, 181 hom S-BLIP, 347, 349, 351, 353, 354,
(Gx, Gy)-MINOLESS, 261, 264, 263, 355, 356, 357, 359, 360, 361, 362
265, 271, 273, 274, 276, 277, hom S-LUMBE, 88, 90
282 hom Į-BLE, 312, 313, 315, 316, 317,
Gx-MINOS, 1, 18-23, 25, 26, 36, 37, 318, 319, 321, 322, 324, 325, 326
68, 86, 90, 91, 273 hom Į-VIP, 347, 349, 351, 352, 353, 355,
357, 359, 360, 361, 362
homogeneous, 124, 125 infinite Fourier series, 48
homogeneous conditional equations inhomogeneous conditions with un-
with unknowns, 421 knowns, 425, 428,
homogeneous conditions with un- invariant quadratic estimation, 191, 223
knowns , 415, 417, 419 invariant quadratic estimation: IQE, 188
homogeneous inconsistent condition invariant quadratic unformly unbiased
equations with unknowns, 422 estimation: IQUUE, 188
homogeneous linear prediction, 349, Invariant Quadratic Uniformly Unbiased
397, 398 Estimation, 193, 226, 230
homogeneous Linear Uniformly Mini- inverse, 497
mum Bias Estimation, 88 - Cayley, see Cayley inverse
homogeneous Į-weighted hybrid mini- - generalized, 2, 16, 18, 23, 25, 26,
mum variance- minimum bias 109, 110, 111, 129, 130, 197, 254,
estimation, 316 259, 265, 273, 276, 291, 293, 294,
Hybrid APproximate Solution, 419 297, 413, 416, 421, 427, 511, 513
- left, 114, 127, 130, 197, 226, 270,
I, I-BLUMBE, 467, 472 271, 312
I, I-HAPS, 421, 423, 412 - left and right, 485
I, I-LESS, 422 - left generalized, 430
I, I-MINOLESS, 412, 421, 422, 423 - Moore-Penrose, see Moore-Penrose
I,I-BLE, 311 inverse
I,I-BLUMBE, 311 - pseudo, 254, 265, 268, 293, 485,
I3 ,I3-BLE, 306 511, 516, 517, 518, 520, 521
I3, I3-BLUMBE, 301 - reflexive, 36, 37, 138, 254, 259,
I3, I3-MINOLESS, 301 265, 268, 273, 293
I-BIQUUE, 203, 202 - reflexive, symmetric generalized ,
I-BLUUE, 203 455
idempotence, 216, 488, 494 - right, 2, 20, 68, 270, 271, 572
idempotent, 24, 25, 231, 514 - symmetric reflexive generalized,
identifiability, 644, 652, 653, 654 462, 467, 472
identity, 497 - weighted right, 21
I-LESS, 113, 144, 127, 148, 149, 159, inverse addition, 486
202, 412, 421, 422, 439, 441 inverse Helmert transformation, 583
I-MINOS, 68, 72, 75 inverse partitioned matrices, 19, 22, 115,
implicit function theorem, 2, 3, 96, 74, 211, 212, 375
533, 537, 538 IQE, 198, 199, 223, 224, 225, 228, 231,
inconsis-tent homogeneous condition 235, 239
equations, 373, 375, 376 IQUUE, 193, 197, 198, 217, 226, 228,
inconsistent inhomogeneous condition 230, 231, 232, 237, 238
equations, 374, 377 IQUUE of Helmert type: HIQUUE, 189
inconsistent linear equation, 466 isotropic, 124, 125
inconsistent linear system of equations,
433 Jacobi matrix, 366, 367, 368, 370
inconsistent, inhomogeneous condition Jacobian determinant, 552
equations with unknowns, 426,
427 Kalman - Bucy Filter, 455
inconsistent, inhomogeneous system of Kalman filter, 478, 479, 480
linear equations, 445 Khatri-Rao-product, 487, 488
inconsistent, inhomogeneous system of Kolmogorov-Wiener prediction, 379,
linear equations, namely, 446 398, 399
independent identically, 91 Kronecker - Zehfuss, 528, 531
independent identically distributed, 93 Kronecker products, 651
Kronecker-Rao product, 122 - criterion, 124, 365
Kronecker-Zehfuss Product, 89, 122, - diagonal, 485
211, 318, 354, 456, 486, 521 - dispersion, 90, 93,
- extended Helmert, see extended
Lagrange multipliers, 19, 374, 377, 376, Helmert matrix
382, 415, 417, 419, 425, 426, - hat, 128, 143, 144, 146, 148, 151
428, 447, 449, 450, 533, 536, - Helmert, see Helmert matrix
538, 541 - Hesse, see Hesse matrix
Langevin sphere, 327 - idempotent, 485, 509
Laplace transform, 484 - inverse partitional, 485, 497
Laplace-Beltrami operator, 41 - inverse partitioned, 499, 501
latent conditions, 143 - Jacobi, see Jacobi matrix
latent restrictions, 152, 158, 160, 163, - lower triangular, 488
168, 169, 172 - normal, 485
least squares, 100, 101, 103, 109, 107, - null, 485
111-115, 185, 187 - orthogonal, 485, 489, 490, 492
least squares solution, 96, 248, 252, 375 - orthonormal, 485, 509, 510
left eigenspace, 437 - permutation, 201, 494, 485
left eigenvectors, 509 - projection, 16, 24, 25, 216
LESS, 104, 107-110, 117, 119, 121, - quadratic, 497, 501, 509, 517
122, 126, 135, 189, 190, 480 - quadratic Helmert, see quadratic
LESS model, 458 Helmert matrix
leverage point, 165, 166, 143, 144, 148, - rectangular, 493, 496, 503, 506, 516
149, 151, 176 - rectangular Helmert, see rectangular
likehood function, 652 Helmert matrix
linear equations - underdetermined, 40 - reflexive, 25
linear independence, 495 - rotation, 509
linear uniformly minimum bias estima- - simple dispersion, 91
tor, 85, 86 - simple variance-covariance matrix,
Linear Uniformly Unbiased Estimation, 91, 92, 93
193, 209 - singular, 496
linear Voltera integral equation of the - substitute, 90, 301, 306, 308
first kind, 594 - symmetric, 485, 497, 516, 523, 488
linear Volterra integral equation of the - Taylor Karman criterion, see Taylor
first kind, 556, 557, 619 Karman criterion matrix
linear Volterra integral equation the - Taylor-Karman, Taylor-Karman ma-
first kind, 555 trix
linear Volterra integral equations of the - unity, 485, 488
first kind, 615, 616 - upper triangular, 488
logarithmic mean, 184 - Vandermonde, see Vandermonde
longitudinal and lateral correlation matrix
functions, 366 - zero, 488
Löwner partial ordering, 508 matrix multiplication
LUMBE, 85, 189 - Cayley, 485
LUUE, 193 - Khati-Rao, 485
- Kronecker-Zehfuss, 485
MALE, 205, 206, 208, 217 matrix of courtosis, 644
matrix matrix of obliquity, 644
- adjoint, 503 Maximum Likelihood Estimation, 191,
- antisymmetric, 485, 488, 523 205, 206
- canonically simple dispersion, 93 maximum likelihood estimator MLE, 336
- commutation, 494, 495, 485, 527 Mean Square Error, see MSE
Mean Square Estimation Error, see norm, 485
MSE normal, 488
Mean Square Estimation Error, see normal equation, 18, 20, 22, 23, 89, 114,
MSE 116, 117, 114, 194, 212, 240, 262,
Mean Square Prediction Error, see 293, 405
MSPE null space, 4, 5, 6, 9, 10, 11, 12, 14, 24,
median, 184, 191, 201 97, 98, 99, 104, 245, 247, 509,
minimum norm, 7, 252, 248 517, 518, 520,
minimum norm solution, 2, 3, 4, 7, 17, n-way classification model, 455, 460
72, 79, 254, 250
minimum variance, 190 obliquity, 645, 647, 649, 651
MINQUUE, 217 observability, 483
MINOLESS, 235, 243, 244, 245, 256, observation space, 6
258, 263, 269, 270, 272, 281, orthogonal, 488
286, 464, 469 orthogonal complement, 9
MINOS, 12, 14-18, 24, 32, 33, 34, 37, orthogonal functions, 40
38, 52, 64, 65, 75, 77, 85, 189, outlier, 176, 191, 202, 312
243, 255
mixed model, 348, 402 p.d.f., 543, 544, 547
modified Mean Square Estimation parameter space, 11, 12,
Error, 314 partial redundancies, 143, 148
modified Mean Square Estimation Error partitioning
MSE, 319 - algebraic, 3, 7, 12, 15, 96, 100, 103,
modified Mean Square Prediction Error, 244, 248, 256
350, 355, 357 - geometric, 3, 7, 12, 15, 96, 100,
modified method of least squares, 407, 103, 244, 248, 256
408 - rank, 4, 7, 10, 12, 96, 100
moment estimator, 401 - set-theoretical, 3, 7, 12, 96, 100,
Moore-Penrose inverse, 499, 516, 520 103, 244, 248, 256
MSE, 288, 300, 304, 307, 309, 313, - rank, 100, 101, 103, 104, 106, 127,
315, , 314, 319, 320, 321, 322, 128, 133, 137, 139, 160, 166, 244,
323, 325, 326 245, 248, 249, 252, 256, 261
MSPE, 349, 350, 351, 355, 356, 358, Penrose equations, 516
352, 359, 362 Plücker coordinates, 143, 152, 156, 157,
multidimensional Gauss-Laplace 158, 161, 162, 165, 169-175
normal distribution, 621 PLUUP, 479
multivariate BLUUE, 458 polar decomposition, 28, 31
multivariate Gauss-Laplace normal positive-definite, 494
distribution, 640 positive-semidefinite, 494
multivariate Gauss-Laplace probability Principal Component Analysis, 632
distribution, 642 prior information, 2, 23, 119
multivariate Gauss-Markov, 458 probability density function, 543
multivariate least squares solution, 458 Procrustes algorithm, 2, 75, 78, 79, 81,
multivariate LESS, 457 431, 433, 437, 440, 441
Procrustes transformation, 442, 443
necessary conditions, 19, 89, 101, 112, projections, 485
115, 194, 250, 264, 284, 291, pseudo-observations, 179, 181
207, 210, 332, 334, 353, 354,
375, 448, 451, 317, 318, 404, quadratic Helmert matrix, 583, 588, 589,
415, 420 605
nonlinear error propagation, 644, 648, quadratic Helmert transformation, 582,
649, 650 583
quasi - Gauss-Laplace normal distribu- singular value representation, 485
tion, 647 singular values, 512
quasi - normal distribution, 644 skew product, 545
R, W-HAPS, 412, 413, 414, 420, 421, skewness, 614
424, 427, 428, 445, 446, 451, slicing, 3, 7, 12, 96, 100, 103, 244, 248,
453 256
R, W-MINOLEES, 411, 412, 413, 414, S-LUMBE, 86, 89, 90
416, 417, 419, 421, 423, 424, special linear error propagation, 644
426, 427, 428, 445, 446, 449, state differential equation, 484
450, 453 state equation, 479, 482
random effect, 301, 305, 307, 309, 347, state vector, 483
348, 351, 379, 380, 397, 398, state-space form, 480
402, 460, 385 statistical homogeneity and isotropy, 365
random effect model, 401, 404 S-transformation, 515, 518, 519, 653
random variable, 544, 543, 545, 547 Student random variable, 598, 600-609
range space, 6, 9 Student’s pdf, 606, 609, 611
rank, 485, 512 Student’s random variable,
rank Student’s t distribution, 596, 603, 605-
- column, 495 606
- row, 495 Sturm-Liouville boundary condition, 41,
rank factorization, 235, 270, 485, 496, 45, 57
503, 517, 510 Sturm-Liouville equations, 46
rank partitioning sufficiency condition, 19, 89, 102, 113,
- additive, 271, 272, 263 115, 195, 207, 211, 251, 264, 284,
- multiplicative, 263, 270 291, 317, 318, 375, 448, 451, 332,
- vertical, 107 334, 353, 354, 355
Rao’s Pandora Box, 375 6y-BLUUE, 187, 188, 210, 211, 216, 379,
rectangular Helmert matrix, 576, 582 380, 381, 382, 383
rectangular Helmert transformation, system of conditional equations, 373
582 system of conditional equations with
reflexiv, 516 unknowns, 411
regression, 40, 85 system of consistent linear equations, 17
ridge estimator, 283, 322, 359 system of directional observational equa-
right eigenspace, 437 tions - inconsistent, 327
right eigenvector, 509 system of homogeneous condition equa-
6, In-BLUUE, 458, 460 tions with unknowns, 414
system of inconsistent linear equations,
Sampling distributions, 543 99, 111
S-BLE, 316 system of inhomogeneous condition
6-BLUUE, 404 equations with unknowns, 424
Schur complements, 382, 383, 384, system of homogenous equations - con-
385, 388, 391, 448, 498, 500, sistent, 521
501, 502 system of inhomogeneous linear equa-
second derivatives, 415 tions
second order design, 120-123 - consistent, 13
Sherman-Morrison-Woodbury matrix - inconsistent, 101
identity, 502 - overdetermined, 100
S-hom BLIP, 352 system of linear equations
similarity transformation, 83, 481, 482 - consistent, 1, 5, 15, 16, 18, 20, 25,
simple bias matrix, 92 520
singular value decomposition, 41, 437,
438, 439, 485, 511, 632
- inconsistent, 100, 104, 105, 109, variance component estimation, 634
111, 112, 113, 130, 208, 243, variance-covariance component estima-
245, 246, 256, 258, 280 tion, 217, 218, 460
- overdetermined, 95, 96, 104, 103, variance-covariance components, 218,
106, 134, 135, 137, 189 219, 223, 225, 229, 232, 236
- underdetermined, 1, 3, 10, 32, 33, Variational Problem, 541
34, 36, 49, 52, 87, 90 vec, 506, 507
system of linear observational vech, 506
equations, 84 veck, 506
system of linear observational equations vector valued matrix Forms, 506
- inconsistent, 95-99 Venn diagrams, 4
system of nonlinear equations - overde- VIP, 347
termined, 327 Volterra integral equation of the first
system of observational equation - kind, 565, 607
consistent, 74 von Mises, 328
von Mises circle, 327
Taylor Karman criterion matrix, 124 von Mises pdf, 330
Taylor-Karman matrix, 367 von Mises distribution, 646
Taylor-Karman structure, 125, 365, 366 Vysochainskii - Potunin inequality, 644,
system of conditional equations with 647, 556
unknowns, 411
Three Sigma Rule, 553, 562, Wassel's family of means, 185
total least squares, 114, 348, 402, 403, wedge product, 545
404 W-LESS, 411, 412, 414, 415, 416, 424,
total least squares estimator, 401 425, 431, 432, 433, 434, 435, 437,
trace, 485, 512, 523, 525 438, 441, 443, 445, 446, 447
trace of a matrix, 503, 506
trance, 507 Zlobec formula, 485, 516
two-sided confidence interval, 595, 609,
613, 616
Tykhonov-Phillips regularization, 282,
283, 287, 322, 359
unbiased estimability, 644
unbiased estimation, 191
unbiasedness, 652, 653
Uncertainly Principle, 612, 613, 618,
620
uncertainty number, 619
uncertainty number D, 620
underdetermined, 5, 6, 7, 15,
underdetermined regression problem,
41
uniform unbiasedness, 190
V, S-BLE, 308, 309, 311
V, S-BLUMBE, 301, 302, 303, 304,
305, 311, 462
Vandermonde determinant, 494
Vandermonde matrix, 485, 493
variance component, 218, 221, 222,
224, 226, 227, 228, 230, 231,
232, 236, 237, 238