György Terdik
Multivariate Statistical Methods
Going Beyond the Linear
Frontiers in Probability and the Statistical Sciences
Editor-in-Chief
Somnath Datta, Sch of Public Health & Info Sci, University of Louisville,
Louisville, KY, USA
Series Editors
Frederi G. Viens, 1399 Math, Science Building, Purdue University, WEST
LAFAYETTE, IN, USA
Dimitris N. Politis, Dept Math, APM 5701, University of California, San Diego,
La Jolla, CA, USA
Konstantinos Fokianos, Mathematics & Statistics, University of Cyprus, Nicosia, Cyprus
Michael Daniels, J.T. Patterson Labs Bldg PAT 141-MC, University of Texas,
Austin, USA
The “Frontiers” is a new series of books (edited volumes and monographs) in probability and statistics designed to capture exciting trends in current research as they develop. Some emphasis will be given to novel methodologies that may have interdisciplinary applications in scientific fields such as biology, ecology, economics, environmental sciences, finance, genetics, material sciences, medicine, omics studies, and public health.
Multivariate Statistical Methods
Going Beyond the Linear
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To
Judit,
and
Bori and Bálint and Ábel
Foreword
As the author rightly points out, much of what we call linear modeling and inference
in multivariate statistics is closely tied to the multivariate Gaussian distribution and
allied topics. Indeed, the classic text by Professor C.R. Rao is aptly titled Linear
Statistical Inference (1973, Wiley) as is one of the books recently co-authored by me
which is titled Linear Models and Regression with R (2020, World Scientific Press).
Non-Gaussian multivariate distributions are characterized by nonlinearity, where the
cumulants play a prominent role. This monograph is a very deep and substantial
piece of work analyzing the cumulants and related statistical measures like the
skewness and kurtosis for the most commonly discussed non-Gaussian multivariate
distributions. This is also a great place to learn about and polish mathematical
prerequisites like multilinear algebra and tensor products for higher-order partial
derivatives of vector-valued functions. A large number of exercises at the end of
each chapter is an added bonus for those who want to use this as a text for an
advanced course. The extensive set of nearly two hundred references, which range
from a manuscript that goes back to 1931 to the most recent papers appearing in
2020 and 2021, demonstrates the breadth and depth of coverage.
This monograph will be a standard reference for those who are interested in
multivariate distribution theory that goes beyond the multidimensional normal, and
will remain so for many years to come!
Preface
Linear theory in statistics is closely connected to the normal distribution. One of the
main reasons for this is that the best predictor is linear when the random variables are
jointly normal, and an important characterization of the normal distribution is that
all cumulants of order higher than 2 are zero. The linear theory is pretty much well
defined, unlike the nonlinear theory, which can proceed in many different ways. A
careful study of the cumulants is a necessary and typical part of nonlinear statistics.
Such a study of cumulants for multivariate distributions is made complicated by
the index notations. One solution to this problem is the usage of tensor analysis.
In this book we offer an alternate method, which we believe is simpler to follow.
The higher-order cumulants with the same degree for a multivariate vector can
be collected together and kept as a vector. To be able to do so, we introduce a
particular differential operator on a multivariate function, called the T -derivative,
and use it to obtain cumulants and provide results which are somewhat analogous to
well-known results in the univariate case. We demonstrate this technique through
the characterization of several multivariate distributions via their cumulants and
by extending the discussion to statistical inference for multivariate skewness and
kurtosis.
The book is organized as follows:
Chapter 1 introduces some basic notions and methods which are used in
permutations, matrix theory, multilinear algebra, and set partitions.
Chapter 2 deals with the method of tensor products for the higher-order partial
derivatives of vector-valued functions. Faà di Bruno’s formula is also discussed here.
Chapter 3 concerns the basic theory of T -moments and T -cumulants. Besides
connections between cumulants and moments, there are results which relate the
cumulants of products to products of cumulants, conditional cumulants, etc.
Chapter 4 covers the elementary theory of the nonlinear Hilbert space of
Gaussian variates. Multivariate vector-valued Hermite polynomials are introduced
with their basic properties and their moments and cumulants are derived.
Chapters 5 and 6 deal with various applications of the material of the previous
chapters. In particular, Chap. 5 deals with the cumulants of multivariate skew-distributions, including the skew-normal, skew-spherical, skew-t, scale mixtures of skew-normal, skew-normal-Cauchy, and multivariate Laplace distributions, while Chap. 6 treats multivariate skewness and kurtosis of random vectors.
2.2 T-derivative
   2.2.1 Differentials and Derivatives
   2.2.2 The Operator of T-derivative
   2.2.3 Basic Rules
   2.2.4 T-derivative of T-products
   2.2.5 Taylor Series Expansion
2.3 Multi-Variable Faà di Bruno's Formula
2.4 Appendix
   2.4.1 Proof of Faà di Bruno's Lemma
   2.4.2 Proof of Faà di Bruno's T-formula
   2.4.3 Moment Commutators
2.5 Exercises
2.6 Bibliographic Notes
3 T-Moments and T-Cumulants
3.1 Multiple Moments
3.2 Tensor Moments
3.3 Cumulants for Multiple Variables
   3.3.1 Definition of Cumulants
   3.3.2 Definition of T-cumulants
   3.3.3 Basic Properties
3.4 Expressions between Moments and Cumulants
   3.4.1 Expression for Cumulants via Moments
   3.4.2 Expressions for Moments via Cumulants
   3.4.3 Expression of the Cumulant of Products via Products of Cumulants
3.5 Additional Matters
   3.5.1 Expressions of Moments and Cumulants via Preceding Moments and Cumulants
   3.5.2 Cumulants and Fourier Transform
   3.5.3 Conditional Cumulants
   3.5.4 Cumulants of the Log-likelihood Function
3.6 Appendix
   3.6.1 Proof of Lemma 3.6 and Theorem 3.7
   3.6.2 A Hint for Proof of Lemma 3.8
   3.6.3 Proof of Lemma 3.2
   3.6.4 Proof of Lemma 3.5
3.7 Exercises
3.8 Bibliographic Notes
4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
4.1 Hermite Polynomials in One Variable
4.2 Hermite Polynomials of Several Variables
4.3 Moments and Cumulants for Gaussian Systems
   4.3.1 Moments of Gaussian Systems and Hermite Polynomials
   4.3.2 Cumulants for Product of Gaussian Variates and Hermite Polynomials
4.4 Products of Hermite Polynomials, Linearization
4.5 T-Hermite Polynomials
4.6 Moments, Cumulants, and Linearization
   4.6.1 Cumulants for T-Hermite Polynomials
   4.6.2 Products for T-Hermite Polynomials
4.7 Gram–Charlier Expansion
4.8 Appendix
   4.8.1 Proof of Theorem 4.2
   4.8.2 Proof of (4.79)
4.9 Exercises
4.10 Bibliographic Notes
5 Multivariate Skew Distributions
5.1 The Multivariate Skew-Normal Distribution
   5.1.1 The Inverse Mill's Ratio and the Central Folded Normal Distribution
   5.1.2 Skew-Normal Random Variates
   5.1.3 Canonical Fundamental Skew-Normal (CFUSN) Distribution
5.2 Elliptically Symmetric and Skew-Spherical Distributions
   5.2.1 Elliptically Contoured Distributions
   5.2.2 Multivariate Moments and Cumulants
   5.2.3 Canonical Fundamental Skew-Spherical Distribution
5.3 Multivariate Skew-t Distribution
   5.3.1 Multivariate t-Distribution
   5.3.2 Skew-t Distribution
   5.3.3 Higher-Order Cumulants of Skew-t Distributions
5.4 Scale Mixtures of Skew-Normal Distribution
5.5 Multivariate Skew-Normal-Cauchy Distribution
   5.5.1 Moments of h(|Z|)
5.6 Multivariate Laplace
5.7 Appendix
   5.7.1 Spherically Symmetric Distribution
   5.7.2 T-Derivative of an Inner Product
   5.7.3 Proof of (5.44)
   5.7.4 Proof of Lemma 5.6
5.8 Exercises
5.9 Bibliographic Notes
6 Multivariate Skewness and Kurtosis
6.1 Multivariate Skewness of Random Vectors
6.2 Multivariate Kurtosis of Random Vectors
A Formulae
A.1 Bell Polynomials
   A.1.1 Incomplete (Partial) Bell Polynomials
   A.1.2 Bell Polynomials
A.2 Commutators
   A.2.1 Moment Commutators
   A.2.2 Commutators Connected to T-Hermite Polynomials
A.3 Derivatives of Composite Functions
A.4 Moments, Cumulants
   A.4.1 T-Moments, T-Cumulants
A.5 Hermite Polynomials
   A.5.1 Product of Hermite Polynomials
   A.5.2 T-Hermite Polynomials
A.6 Function G
   A.6.1 Moments, Cumulants for Skew-t Generator R
   A.6.2 Moments of Beta Powers
A.7 Complementary Error Function
A.8 Derivatives of i-Mill's Ratio
Notations
Solutions
References
Index
Chapter 1
Some Introductory Algebra
1.1 Permutations
The product operation of permutations has the important property that it is not commutative, as the following example illustrates.

Example 1.1 Let
$$p = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}, \qquad q = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix};$$
then
$$pq = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \qquad qp = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}.$$
Thus we see that $pq \ne qp$, so the product operation does not commute.
We will denote the inverse permutation of $p$ by $p^{-1}$, for which $pp^{-1} = p^{-1}p = e$, where $e$ is the identity mapping. $\mathcal{P}_n$ constitutes a group under the product operation $pq$. This group $\mathcal{P}_n$ is not commutative. Its generators are the interchanges of adjacent elements $(i, i+1)$, called (standard) “transpositions.” Each permutation is equivalent to a series (product) of transpositions, where the number of transpositions is kept to a minimum.
If the numbers in a permutation each have one digit, we will not necessarily separate them by commas. For instance, $p = (312)$ is an abbreviation for the permutation $p = (3, 1, 2)$ of the numbers $1:3$.
Remark 1.1 A simple way to generate the inverse of a permutation $p$ is to take its elements from 1 to $n$ and write down the original indices (places) in that order. For instance, let $p = (312)$; here “1” has index 2, “2” has index 3, and “3” has index 1; therefore, the result is $p^{-1} = (231)$. This simple algorithm for inverting permutations can easily be implemented as computer code since usually, when sorting the entries of a vector, one can also record the original indices of the entries.
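The remark translates into a few lines of code. A minimal sketch in Python with NumPy (the helper name invert_permutation is ours, not the book's):

```python
import numpy as np

def invert_permutation(p):
    """Invert a permutation given as a sequence of the numbers 1..n."""
    p = np.asarray(p)
    inv = np.empty_like(p)
    # place the original (1-based) index of each value at that value's position
    inv[p - 1] = np.arange(1, len(p) + 1)
    return inv

print(invert_permutation([3, 1, 2]))  # [2 3 1], i.e. p^{-1} = (231)
```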
We will use commutator matrices to change the order of a tensor product of vectors. In this way, we reorder the terms in a product and obtain a new permutation of the vectors. We can apply commutator matrices successively and obtain new permutations. The consecutive application of matrices corresponds to the consecutive application of permutations; e.g., applying first $p$ and then $q$ will be denoted by $q \times p$.
The following remark is worth noting.
i.e. the $(j, p(j))$ entries of the matrix $\mathbf{P}_p$ are ones. Any permutation matrix is orthogonal, that is,
$$\mathbf{P}_p^{\top} = \mathbf{P}_p^{-1} = \mathbf{P}_{p^{-1}}.$$
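These identities are easy to check numerically; a minimal sketch (the helper perm_matrix is ours), using the convention that $\mathbf{P}_p$ has ones at the $(j, p(j))$ positions:

```python
import numpy as np

def perm_matrix(p):
    """P_p has ones at the (j, p(j)) entries, so (P_p x)_j = x_{p(j)}."""
    n = len(p)
    P = np.zeros((n, n), dtype=int)
    P[np.arange(n), np.asarray(p) - 1] = 1
    return P

P = perm_matrix([3, 1, 2])
# orthogonality: P_p' = P_p^{-1} = P_{p^{-1}}
assert (P.T @ P == np.eye(3, dtype=int)).all()
assert (P.T == perm_matrix([2, 3, 1])).all()  # p^{-1} = (231)
```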
and the consecutive application of these permutations also gives the same result: the odd places are taken by the values of $k_{1:n}$ and the even places by $j_{p(1:n)}$. An instance is the following: let $k_{1:3} = (2, 3, 4)$ and $j_{1:3} = (5, 7, 8)$; then all possible permutations $m_3\big((2, 3, 4), (5, 7, 8)_{p(1:3)}\big)$ are obtained by interleaving the two triples accordingly.
Let $\mathbf{A}$ and $\mathbf{B}$ be $m \times n$ and $p \times q$ matrices, respectively. The tensor product $\mathbf{A} \otimes \mathbf{B}$ of $\mathbf{A}$ and $\mathbf{B}$, which we shall refer to as T-product for short, is defined by $\mathbf{A} \otimes \mathbf{B} = \big[a_{ij}\mathbf{B}\big]$, i.e.
$$\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & \cdots & a_{1n}\mathbf{B} \\ \vdots & \ddots & \vdots \\ a_{m1}\mathbf{B} & \cdots & a_{mn}\mathbf{B} \end{bmatrix}.$$
The tensor product $\mathbf{A} \otimes \mathbf{B}$ is an $mp \times nq$ matrix. This product is also called the direct product or the Kronecker product. Throughout this book we shall apply the following precedence: a matrix product comes before a T-product,
$$\mathbf{AB} \otimes \mathbf{C} = (\mathbf{AB}) \otimes \mathbf{C},$$
$$a\mathbf{A} \otimes b\mathbf{B} = ab\,(\mathbf{A} \otimes \mathbf{B}),$$
$$\mathbf{A} \otimes \mathbf{B} \otimes \mathbf{C} = (\mathbf{A} \otimes \mathbf{B}) \otimes \mathbf{C} = \mathbf{A} \otimes (\mathbf{B} \otimes \mathbf{C}),$$
$$(\mathbf{A} + \mathbf{B}) \otimes (\mathbf{C} + \mathbf{D}) = \mathbf{A} \otimes \mathbf{C} + \mathbf{A} \otimes \mathbf{D} + \mathbf{B} \otimes \mathbf{C} + \mathbf{B} \otimes \mathbf{D}.$$
3. A connection between T-products and the ordinary matrix products is the mixed product rule
$$(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = \mathbf{AC} \otimes \mathbf{BD}, \tag{1.3}$$
$$(\mathbf{A} \otimes \mathbf{B})^{\top} = \mathbf{A}^{\top} \otimes \mathbf{B}^{\top},$$
$$\operatorname{tr}(\mathbf{A} \otimes \mathbf{B}) = (\operatorname{tr}\mathbf{A})(\operatorname{tr}\mathbf{B}).$$
The most frequently used properties of the vec operator are the following:
1. If $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are matrices such that the product $\mathbf{ABC}$ is defined, then
$$\operatorname{vec}(\mathbf{ABC}) = \big(\mathbf{C}^{\top} \otimes \mathbf{A}\big)\operatorname{vec}\mathbf{B}, \tag{1.4}$$
where $\mathbf{A}$ and $\mathbf{B}$ are matrices of the same order. Notice the notation $\operatorname{vec}^{\top}\mathbf{A} = (\operatorname{vec}\mathbf{A})^{\top}$, at (1.9), which will be applied later on as well. Suppose that the product $\mathbf{ABCD}$ is defined; then we have
$$\operatorname{tr}(\mathbf{ABCD}) = \operatorname{vec}^{\top}\big(\mathbf{D}^{\top}\big)\big(\mathbf{C}^{\top} \otimes \mathbf{A}\big)\operatorname{vec}\mathbf{B} \tag{1.10}$$
$$= \operatorname{vec}^{\top}(\mathbf{D})\big(\mathbf{A} \otimes \mathbf{C}^{\top}\big)\operatorname{vec}\mathbf{B}^{\top}.$$
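The mixed product, vec, and trace rules can all be verified with NumPy's kron; a minimal sketch (the helper vec is ours; NumPy stores column-stacking via order='F'):

```python
import numpy as np

rng = np.random.default_rng(0)

def vec(M):  # stack the columns of M into one vector
    return M.reshape(-1, order='F')

# mixed product rule (1.3): (A ⊗ B)(C ⊗ D) = AC ⊗ BD
A, B = rng.random((2, 3)), rng.random((4, 5))
C, D = rng.random((3, 2)), rng.random((5, 7))
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# vec rule (1.4): vec(ABC) = (C' ⊗ A) vec B
A, B, C = rng.random((2, 3)), rng.random((3, 5)), rng.random((5, 4))
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# trace rule (1.10): tr(ABCD) = vec'(D')(C' ⊗ A) vec B
A, B = rng.random((2, 3)), rng.random((3, 4))
C, D = rng.random((4, 5)), rng.random((5, 2))
assert np.allclose(np.trace(A @ B @ C @ D),
                   vec(D.T) @ np.kron(C.T, A) @ vec(B))
```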
T-products have the advantage that their factors can be commuted with the help of a linear operator called the commutation matrix. We will use this commutation property mostly when we deal with vectors.
where the elementary matrix $\mathbf{E}_{i,j}$ is $m \times n$, and all entries of $\mathbf{E}_{i,j}$ are zero except the $(i, j)$th entry, which is 1. Notice that $\mathbf{E}_{i,j}^{\top} = \mathbf{E}_{j,i}$; therefore, the transpose of $\mathbf{A}$ is
$$\mathbf{A}^{\top} = \sum_{i,j} a_{i,j}\,\mathbf{E}_{i,j}^{\top} = \sum_{i,j} a_{i,j}\,\mathbf{E}_{j,i}.$$
The T-product is not commutative. Nevertheless, $\mathbf{A} \otimes \mathbf{B}$ and $\mathbf{B} \otimes \mathbf{A}$ are permutation equivalent, by the following basic properties of the commutator $\mathbf{K}_{m \bullet n}$:
1. We have
$$\mathbf{K}_{m \bullet n}^{\top} = \mathbf{K}_{m \bullet n}^{-1} = \mathbf{K}_{n \bullet m}$$
(see (1.5)). Observe that $\mathbf{K}_{m \bullet p}$ reorders a T-product of vectors with dimensions $p \times 1$ and $m \times 1$, respectively, into the product where the order of the vectors is: $m \times 1$ is the first and $p \times 1$ is the second.
4. More generally, if $\mathbf{A}$ is $m \times n$ and $\mathbf{B}$ is $p \times q$, then
$$\mathbf{K}_{p \bullet m}(\mathbf{A} \otimes \mathbf{B}) = (\mathbf{B} \otimes \mathbf{A})\,\mathbf{K}_{q \bullet n};$$
in particular we have
$$\mathbf{K}_{m \bullet m}(\mathbf{A} \otimes \mathbf{A}) = (\mathbf{A} \otimes \mathbf{A})\,\mathbf{K}_{n \bullet n}.$$
Equation (1.13) shows that matrix $\mathbf{K}_{m \bullet p}$ interchanges $\mathbf{b}$ and $\mathbf{a}$ in a T-product. One can identify the dimensions of $\mathbf{a}$ and $\mathbf{b}$ from the notation $\mathbf{K}_{m \bullet p}$, and vice versa, the dimensions of $\mathbf{a}$ and $\mathbf{b}$ define the index of $\mathbf{K}_{m \bullet p}$. In this notation, the order in the subscript shows the order of the result, namely $\mathbf{K}_{m \bullet p}$ interchanges vectors with dimensions $p$ and $m$ into vectors with dimensions $m$ and $p$ in a T-product of two vectors:
$$\mathbf{K}_{m \bullet p}(\mathbf{b} \otimes \mathbf{a}) = \mathbf{a} \otimes \mathbf{b}.$$
The usage of notation $\mathbf{K}_{(21)}(p, m)$ for $\mathbf{K}_{m \bullet p}$ will be useful for our purposes, where the subscript denotes a permutation $(21)$ (in cycle notation $(2, 1)_S$). The usage of cycle notation will be more convenient for permutations of several elements. Subscripts $m$ and $p$ show the dimensions of the starting vectors that are the subject of T-products. Although the version $\mathbf{K}_{(21)}(p, m)$ seems an unnecessary complication, its application will become clear later on when multiple T-products are considered. In the case of interchanging only two vectors in a T-product of two vectors the notation $\mathbf{K}_{m \bullet p}$ is more compact than $\mathbf{K}_{(21)}(p, m)$, and we will use both of them in the sequel.
More generally, the subscript of K will show the permutation that is applied to a
T-product, while the order of the dimensions is the same as the original order of the
vectors.
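A commutation matrix can be constructed directly from its defining property; the following sketch (the function K is our own helper) builds $\mathbf{K}_{m \bullet p}$ and checks it against the identities above:

```python
import numpy as np

def K(m, p):
    """Commutation matrix K_{m•p}: K_{m•p}(b ⊗ a) = a ⊗ b, a ∈ R^m, b ∈ R^p."""
    Kmat = np.zeros((m * p, m * p), dtype=int)
    for i in range(p):
        for j in range(m):
            # entry of b ⊗ a at (i*m + j) moves to position (j*p + i) in a ⊗ b
            Kmat[j * p + i, i * m + j] = 1
    return Kmat

m, p = 3, 4
a, b = np.arange(1, m + 1), np.arange(10, 10 + p)
assert (K(m, p) @ np.kron(b, a) == np.kron(a, b)).all()
# orthogonality: K'_{m•p} = K^{-1}_{m•p} = K_{p•m}
assert (K(m, p).T == K(p, m)).all()
```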
Let $\mathbf{a}_1$ and $\mathbf{a}_2$ be vectors with dimensions $d_1$ and $d_2$, respectively; then in this new notation we have
$$\mathbf{K}_{(21)}(d_1, d_2)\,(\mathbf{a}_1 \otimes \mathbf{a}_2) = \mathbf{a}_2 \otimes \mathbf{a}_1,$$
and for the transposition of adjacent factors,
$$\mathbf{K}_{(j+1,j)_S}(d_{1:n}) = \Big(\bigotimes_{i=1:j-1} \mathbf{I}_{d_i}\Big) \otimes \mathbf{K}_{(21)}\big(d_j, d_{j+1}\big) \otimes \Big(\bigotimes_{i=j+2:n} \mathbf{I}_{d_i}\Big) = \mathbf{I}_{d_{1:(j-1)}} \otimes \mathbf{K}_{(21)}\big(d_j, d_{j+1}\big) \otimes \mathbf{I}_{d_{(j+2):n}}.$$
Note that the entries of $d_{1:n}$ are not necessarily equal; they are the dimensions of the vectors $\mathbf{a}_i$, $i = 1, 2, \ldots, n$, and they are given in the order of the vectors $\mathbf{a}_i$.
In practice, when we apply $\mathbf{K}_p(d_{1:n})$ to a T-product of vectors $\bigotimes_{i=1:n} \mathbf{a}_i$, we shall simply write $\mathbf{K}_p \bigotimes_{i=1:n} \mathbf{a}_i$, and the dimensions $d_{1:n}$ will be defined by the T-product, i.e. dimensions $d_{1:n}$ will be omitted from the notation.
The following example shows that $\mathbf{K}_p(d_{1:n})$ can be constructed by joining terms in a T-product for a permutation $p$ and by changing the set $d_{1:n}$ of the dimensions.

Example 1.5 Let $p$ be the permutation $(432)_S$ of 4 elements. It has only one cycle $(432)_S$, that is, $p = (1423)$. Now we apply $p$ to the product $\mathbf{a}_1 \otimes \mathbf{a}_2 \otimes \mathbf{a}_3 \otimes \mathbf{a}_4$, and notice $\mathbf{a}_1 \otimes \mathbf{a}_2 \otimes \mathbf{a}_3 \otimes \mathbf{a}_4 = \mathbf{a}_1 \otimes (\mathbf{a}_2 \otimes \mathbf{a}_3) \otimes \mathbf{a}_4 = \mathbf{a}_1 \otimes \mathbf{b}_3 \otimes \mathbf{a}_4$, where $\mathbf{b}_3 = \mathbf{a}_2 \otimes \mathbf{a}_3$, with dimension $d_2 d_3$. The permutation $(23)_S$ of 3 elements with dimensions $(d_1, d_2 d_3, d_4)$ will provide the same result as $p$; hence, the commutator follows:
$$\mathbf{K}_{(432)_S}(d_{1:4}) = \mathbf{K}_{(23)_S}(d_1, d_2 d_3, d_4) = \mathbf{I}_{d_1} \otimes \mathbf{K}_{(21)_S}(d_2 d_3, d_4) = \mathbf{I}_{d_1} \otimes \mathbf{K}_{d_4 \bullet d_2 d_3}.$$
Notice that the permutation $(432)_S$ interchanges the product $(\mathbf{a}_2 \otimes \mathbf{a}_3)$ and $\mathbf{a}_4$. This device can be applied to some permutations $p$ and may simplify the computation.
Remark 1.3 In Example 1.5 the commutator $\mathbf{K}_{(23)_S}(d_1, d_2 d_3, d_4)$ will not change if we interchange $d_2$ and $d_3$ in the product $d_2 d_3$. This does not mean that applying $\mathbf{K}_{(23)_S}(d_1, d_2 d_3, d_4)$ to $\mathbf{a}_1 \otimes (\mathbf{a}_2 \otimes \mathbf{a}_3) \otimes \mathbf{a}_4$ would be the same as applying it to $\mathbf{a}_1 \otimes (\mathbf{a}_3 \otimes \mathbf{a}_2) \otimes \mathbf{a}_4$. Therefore, one has to keep the order of a T-product correct. Usually, we pay attention to the form of $\mathbf{K}_p$ in terms of $\mathbf{K}_{m \bullet p}$, since $\mathbf{K}_{m \bullet p}$ is easy to calculate, and based on this we can calculate $\mathbf{K}_p$, in some situations, as a direct by-product of interchanges.
The result of the following lemma gives a simple method for transposing the last element of a T-product of vectors to the $k$th place, leaving the rest unchanged. It is useful for calculating T-derivatives.

Lemma 1.2 Consider the permutation $p$ of transposing the last element to the $k$th place and leaving the rest of the elements unchanged; it is given by an $\ell$-cycle $(n : k)_S = (n, n-1, \ldots, k)_S$ with length $\ell = n - k + 1$. The commutator for permutation $(n : k)_S$ on a T-product $\bigotimes_{i=1:n} \mathbf{a}_i$ with dimensions $d_{1:n}$ is
$$\mathbf{K}_{(n:k)_S}(d_{1:n}) = \mathbf{I}_{d_{1:(k-1)}} \otimes \mathbf{K}_{(2,1)_S}\big(d_{k:(n-1)}, d_n\big) \tag{1.24}$$
$$= \mathbf{I}_{d_{1:(k-1)}} \otimes \mathbf{K}_{d_n \bullet d_{k:(n-1)}},$$
where $d_{1:(k-1)}$ and $d_{k:(n-1)}$ denote the product of the corresponding dimensions.
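In code one rarely forms $\mathbf{K}_{(n:k)_S}$ explicitly; reshaping the T-product into an $n$-way array and moving the last axis to the $k$th place gives the same reordering. A minimal sketch (helper names ours):

```python
import numpy as np
from functools import reduce

def tprod(vectors):
    """T-product (Kronecker product) of a list of vectors."""
    return reduce(np.kron, vectors)

def move_last_to(x, dims, k):
    """Apply K_{(n:k)} to x = a_1 ⊗ ... ⊗ a_n: move the last factor to place k."""
    T = x.reshape(dims)                  # n-way array, axis j <-> factor a_{j+1}
    return np.moveaxis(T, -1, k - 1).reshape(-1)

rng = np.random.default_rng(1)
dims = (2, 3, 4, 2)
a = [rng.random(d) for d in dims]
lhs = move_last_to(tprod(a), dims, 2)    # K_{(4:2)}: a1 ⊗ a4 ⊗ a2 ⊗ a3
assert np.allclose(lhs, tprod([a[0], a[3], a[1], a[2]]))
```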
1.3.1 Symmetrization
The tensor product $\bigotimes_{1:q} \mathbf{a}_j$, $\mathbf{a}_j \in \mathbb{R}^d$, is linear in each of its components; such a product is a tensor of rank 1.
The Euclidean space $\mathbb{R}^{d^q}$ is spanned by $d^q$ unit vectors, each of them a q-fold tensor product
$$\mathbf{e}^{\otimes}_{j_{1:q}} = \bigotimes_{1:q} \mathbf{e}_{j_i} \tag{1.28}$$
of unit vectors $\mathbf{e}_j$ spanning the space $\mathbb{R}^d$, where $\mathbf{e}_j$ is the unit vector in the $j$th coordinate direction. We consider the simplest case, when the entries of the unit basis vectors $\mathbf{e}_j \in \mathbb{R}^d$ are zero except the $j$th one, which is 1; we call this basis canonical or coordinate basis.
Recall that for each permutation $p \in \mathcal{P}_q$ we have a commutator matrix $\mathbf{K}_p$ such that $\mathbf{K}_p \bigotimes_{1:q} \mathbf{a}_j = \bigotimes_{1:q} \mathbf{a}_{p(j)}$, see Sect. 1.2.3, p. 7.
Definition 1.1 A tensor w ∈ Md,q is called q-symmetric if w = Kp w, for any
permutation p ∈ Pq . q-symmetric tensors in Md,q constitute a linear subspace of
Md,q , which will be denoted by Sd,q .
The elements of Sd,q will also be called Sd,q -symmetric, when dimension d is
also important.
Example 1.6 Let
$$\mathbf{w}(\mathbf{a}) = \mathbf{a}^{\otimes q};$$
then $\mathbf{w}(\mathbf{a})$ is q-symmetric, since $\mathbf{K}_p\,\mathbf{a}^{\otimes q} = \mathbf{a}^{\otimes q}$ for every $p \in \mathcal{P}_q$.
Let us define the symmetrizer matrix $\mathbf{S}_{d1_q}$ for the symmetrization of a T-product of $q$ vectors with the same dimension $d$ by
$$\mathbf{S}_{d1_q} = \frac{1}{q!}\sum_{p \in \mathcal{P}_q} \mathbf{K}_p, \tag{1.29}$$
where $\mathcal{P}_q$ denotes the set of all permutations of the numbers $1:q$, and the summation extends over all $q!$ permutations $p \in \mathcal{P}_q$. For instance, the result of $\mathbf{S}_{d1_4}(\mathbf{a}_1 \otimes \mathbf{a}_2 \otimes \mathbf{a}_3 \otimes \mathbf{a}_4)$ is a vector with dimension $d^4$, and it is symmetric. The symmetrizer $\mathbf{S}_{d1_q}$ is also called the q-way symmetrization matrix, or the q-symmetrizer for short.
The symmetrizer $\mathbf{S}_{d1_q}$ can be generated by an algorithm using the commutator matrices $\mathbf{K}_p$ of all permutations. This needs quite some computer capacity when $d$ and $q$ become large, since the number of permutations and the sizes of the matrices grow fast. It seems reasonable to construct a library with matrices $\mathbf{S}_{d1_q}$ for different $d$ and $q$.
We shall use the symmetrizer for simplifying some expressions in T-products;
applying Sd1q to expressions yields symmetric expressions.
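A brute-force construction of $\mathbf{S}_{d1_q}$ from the commutators of all $q!$ permutations might look as follows; this is a sketch (helper names ours), feasible only for small $d$ and $q$:

```python
import numpy as np
from itertools import permutations

def K_p(perm, d):
    """Commutator K_p on R^{d^q}: K_p (⊗ a_j) = ⊗ a_{perm(j)}."""
    q, n = len(perm), d ** len(perm)
    # act on all basis vectors at once: reshape, permute row axes, flatten
    I = np.eye(n).reshape((d,) * q + (n,))
    return np.transpose(I, axes=[j - 1 for j in perm] + [q]).reshape(n, n)

def symmetrizer(d, q):
    """S_{d1_q} = (1/q!) Σ_p K_p, the q-way symmetrization matrix (1.29)."""
    perms = list(permutations(range(1, q + 1)))
    return sum(K_p(p, d) for p in perms) / len(perms)

S = symmetrizer(2, 3)
assert np.allclose(S, S.T) and np.allclose(S, S @ S)  # symmetric idempotent
w = np.arange(1.0, 9.0)      # w = 1:8, cf. Example 1.16 below
print(S @ w)                 # [1, 10/3, 10/3, 17/3, 10/3, 17/3, 17/3, 8]
```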
The image of $\mathcal{M}_{d,q}$ under the symmetrizer $\mathbf{S}_{d1_q}$ is a subspace that has been denoted by $\mathcal{S}_{d,q}$; it is invariant under the operator $\mathbf{S}_{d1_q}$. The space $\mathcal{M}_{d,q}$ has dimension $d^q$, and the linear subspace $\mathcal{S}_{d,q}$ has dimension
$$\eta_{d,q} = \binom{d + q - 1}{q}. \tag{1.30}$$
The first term $\mathbf{a}^{\otimes 3}$ is 3-symmetric; the term $\mathbf{a}^{\otimes 2} \otimes \mathbf{b}$ is invariant under those permutations that interchange the first two components; hence, we have
$$\mathbf{K}_{(213)}\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big) = \mathbf{I}_{d^3}\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big).$$
Those permutations that interchange entries 2 and 3 also give the same results:
$$\mathbf{K}_{(231)}\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big) = \mathbf{K}_{(132)}\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big), \qquad \mathbf{K}_{(312)}\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big) = \mathbf{K}_{(321)}\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big).$$
Now, we see
$$\big(\mathbf{I}_{d^3} + \mathbf{K}_{(132)} + \mathbf{K}_{(312)}\big)\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big) = \frac{1}{2}\big(2\mathbf{I}_{d^3} + 2\mathbf{K}_{(132)} + 2\mathbf{K}_{(312)}\big)\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big)$$
$$= \frac{1}{2}\big(\mathbf{I}_{d^3} + \mathbf{K}_{(132)} + \mathbf{K}_{(213)} + \mathbf{K}_{(321)} + \mathbf{K}_{(312)} + \mathbf{K}_{(231)}\big)\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big).$$
Write
$$\mathbf{S}_{d1_3} = \frac{1}{3!}\big(\mathbf{I}_{d^3} + \mathbf{K}_{(132)} + \mathbf{K}_{(213)} + \mathbf{K}_{(321)} + \mathbf{K}_{(312)} + \mathbf{K}_{(231)}\big)$$
and obtain
$$\big(\mathbf{I}_{d^3} + \mathbf{K}_{(132)} + \mathbf{K}_{(312)}\big)\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big) = \frac{3!}{2}\,\mathbf{S}_{d1_3}\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big).$$
Therefore,
$$\mathbf{w}(\mathbf{a}, \mathbf{b}) = \mathbf{a}^{\otimes 3} - \big(\mathbf{I}_{d^3} + \mathbf{K}_{(132)} + \mathbf{K}_{(312)}\big)\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big) = \mathbf{a}^{\otimes 3} - 3\,\mathbf{S}_{d1_3}\big(\mathbf{a}^{\otimes 2} \otimes \mathbf{b}\big),$$
the symmetric counterpart of the expression $\mathbf{a}^{\otimes 3} - 3\,\mathbf{a}^{\otimes 2} \otimes \mathbf{b}$.
For any $\mathbf{w} \in \mathbb{R}^{d^q}$ we introduce q-plet (triplet, quadruplet, etc.) indices, called the multi-index of the entries $w_{k_1, \ldots, k_q}$ of $\mathbf{w}$. The q-plet index $k_{1:q}$ is an ordered vector of $k_j \in 1:d$, i.e. $\mathbf{w} = \big[w_{(k_{1:q})}\big]$ corresponds to the indices of the entries of $\bigotimes_{1:q} \mathbf{a}_j \in \mathbb{R}^{d^q}$, so that $k_j$ denotes the index of the $j$th entry in the product; namely, if $\mathbf{a}_j = \big[a_{1:d,j}\big]$, then the index of the term $\prod_{j=1}^{q} a_{k_j,j}$ is $k_{1:q}$.
Example 1.9 Consider $\mathbf{a}_j = [a_1, a_2]_j^{\top}$, and let $q = 3$. Then
$$\mathbf{w} = \mathbf{a}_1 \otimes \mathbf{a}_2 \otimes \mathbf{a}_3 = \big[a_{1,1}a_{1,2}a_{1,3},\ a_{1,1}a_{1,2}a_{2,3},\ a_{1,1}a_{2,2}a_{1,3},\ a_{1,1}a_{2,2}a_{2,3},\ a_{2,1}a_{1,2}a_{1,3},\ a_{2,1}a_{1,2}a_{2,3},\ a_{2,1}a_{2,2}a_{1,3},\ a_{2,1}a_{2,2}a_{2,3}\big]^{\top},$$
where $a_{k,j}$ denotes the $k$th entry of $\mathbf{a}_j$. The product $\mathbf{a}_1 \otimes \mathbf{a}_2 \otimes \mathbf{a}_3$ has eight entries:
$$\mathbf{a}_1 \otimes \mathbf{a}_2 \otimes \mathbf{a}_3 = \big[w_{(1,1,1)}, w_{(1,1,2)}, w_{(1,2,1)}, w_{(1,2,2)}, w_{(2,1,1)}, w_{(2,1,2)}, w_{(2,2,1)}, w_{(2,2,2)}\big]^{\top}.$$
If all the $\mathbf{a}_j$ are equal to $\mathbf{a} = [a_1, a_2]^{\top}$, then, for instance, $a_{(1,1,2)} = a_{(1,2,1)} = a_{(2,1,1)} = a_1^2 a_2$. The indices of distinct values are $\mathcal{I}^{Ð}_{2,3} = \{(1,1,1), (1,1,2), (1,2,2), (2,2,2)\}$.
Expanding $\mathbf{a} \in \mathbb{R}^d$ in the canonical basis,
$$\mathbf{a} = \sum_{k=1}^{d} a_k \mathbf{e}_k,$$
we have
$$\mathbf{a}^{\otimes q} = \Big(\sum_{m=1}^{d} a_m \mathbf{e}_m\Big)^{\otimes q} = \sum_{(j_{1:q})} a_{(j_{1:q})}\,\mathbf{e}^{\otimes}_{j_{1:q}}.$$
Now we collect the equal values of the coefficients $a_{(j_{1:q})}$. They correspond to the indices of distinct values:
$$\mathbf{a}^{\otimes q} = \sum_{(j_{1:q}) \in \mathcal{I}^{Ð}_{d,q}} a_{(j_{1:q})} \sum_{(k_{1:q})\,|\,p(k_{1:q}) = (j_{1:q})} \mathbf{e}^{\otimes}_{k_{1:q}},$$
where the second sum is taken over all multi-indices $k_{1:q}$ for which there exists a permutation $p \in \mathcal{P}_q$ such that $p(k_{1:q}) = j_{1:q} \in \mathcal{I}^{Ð}_{d,q}$. The set $\{k_{1:q} \mid p(k_{1:q}) = j_{1:q}\}$ includes only the distinct permutations $p(j_{1:q})$ of the multi-index $j_{1:q}$. For instance, if $j_{1:q} = (1, \ldots, 1)$, then the only possible value of $k_{1:q}$ is $(1, \ldots, 1)$. Actually, $\mathbf{a}^{\otimes q} \in \mathcal{S}_{d,q}$; hence,
$$\mathbf{a}^{\otimes q} = \mathbf{S}_{d1_q}\mathbf{a}^{\otimes q} = \sum_{(j_{1:q}) \in \mathcal{I}^{Ð}_{d,q}} a_{(j_{1:q})} \sum_{(k_{1:q})\,|\,p(k_{1:q}) = (j_{1:q})} \mathbf{S}_{d1_q}\mathbf{e}^{\otimes}_{k_{1:q}},$$
and the symmetrized basis vectors
$$\bar{\mathbf{e}}^{\otimes}_{j_{1:q}} = \sum_{(k_{1:q})\,|\,p(k_{1:q}) = (j_{1:q})} \mathbf{S}_{d1_q}\mathbf{e}^{\otimes}_{k_{1:q}} = \sum_{(k_{1:q})\,|\,p(k_{1:q}) = (j_{1:q})} \mathbf{e}^{\otimes}_{k_{1:q}}$$
arise.
For two distinct multi-indices $j_{1:q} \ne m_{1:q}$ in $\mathcal{I}^{Ð}_{d,q}$,
$$\bar{\mathbf{e}}^{\otimes\top}_{j_{1:q}}\,\bar{\mathbf{e}}^{\otimes}_{m_{1:q}} = \Big(\sum_{(k_{1:q})\,|\,p(k_{1:q}) = (j_{1:q})} \mathbf{e}^{\otimes}_{k_{1:q}}\Big)^{\!\top}\Big(\sum_{(k_{1:q})\,|\,p(k_{1:q}) = (m_{1:q})} \mathbf{e}^{\otimes}_{k_{1:q}}\Big) = 0,$$
since there are no equal vectors in these sums and all are orthogonal.
Let us introduce the type $\ell(j_{1:q})$ of an index $j_{1:q}$ such that $\ell(j_{1:q}) = [\ell_1, \ldots, \ell_q]$, where $\ell_k$ denotes the number of occurrences of the value $k$ in the index $j_{1:q}$. For instance, the type of $(j_{1:3}) = (1, 1, 2)$ is $\ell(j_{1:3}) = [2, 1, 0]$. It is clear that a permutation of a multi-index $j_{1:q}$ will not change its type.
Now, for any $j_{1:q} \in \mathcal{I}^{Ð}_{d,q}$, the set of multi-indices $\{k_{1:q} \mid p(k_{1:q}) = j_{1:q},\ p \in \mathcal{P}_q\}$ includes all possible permutations of $j_{1:q}$; hence, the sum runs over $q!/\ell(j_{1:q})!$ terms, where $\ell(j_{1:q})! = \prod_k \ell_k!$. Therefore, we conclude that
$$\big\|\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}\big\|^2 = \frac{q!}{\ell(j_{1:q})!}.$$
Lemma 1.4 The system $\big\{\bar{\mathbf{e}}^{\otimes}_{j_{1:q}} \in \mathcal{S}_{d,q} \mid j_{1:q} \in \mathcal{I}^{Ð}_{d,q}\big\}$ is a complete orthogonal system in $\mathcal{S}_{d,q}$.

Proof We have seen the orthogonality above. The completeness follows from the expansion by the distinct entries of $\mathbf{w} = \big[w_{(k_{1:q})}\big] \in \mathcal{S}_{d,q}$, which defines and is defined by a vector of $\mathbb{R}^{\eta_{d,q}}$.
Now the dimension of $\mathcal{S}_{d,q}$ and the number of distinct values $\eta_{d,q}$ coincide (see (1.30)). The dimension grows rapidly with $q$; for instance, for $d = 3$ we have $d^q = 9, 27, 81$ while $\eta_{3,q} = 6, 10, 15$ for $q = 2, 3, 4$. The particular orthogonal system for $d = 3$ and $q = 2$ is the following.
Example 1.11 We let $d = 3$ and $q = 2$; then $\mathbf{e}_j \in \mathbb{R}^3$, and the indices of distinct entries are $\mathcal{I}^{Ð}_{3,2} = \{(1,1), (1,2), (1,3), (2,2), (2,3), (3,3)\}$; hence we have $\bar{\mathbf{e}}^{\otimes}_{j,j} = \mathbf{e}_j^{\otimes 2}$, $j = 1:3$, $\bar{\mathbf{e}}^{\otimes}_{j,j+1} = \mathbf{e}_j \otimes \mathbf{e}_{j+1} + \mathbf{e}_{j+1} \otimes \mathbf{e}_j$, $j = 1:2$, and $\bar{\mathbf{e}}^{\otimes}_{j,j+2} = \mathbf{e}_j \otimes \mathbf{e}_{j+2} + \mathbf{e}_{j+2} \otimes \mathbf{e}_j$, $j = 1$. If $\mathbf{w} \in \mathcal{S}_{3,2}$, then we can write
$$\mathbf{w} = \sum_{j=1}^{3} w_{j,j}\,\bar{\mathbf{e}}^{\otimes}_{j,j} + \sum_{j=1}^{2} w_{j,j+1}\,\bar{\mathbf{e}}^{\otimes}_{j,j+1} + w_{1,3}\,\bar{\mathbf{e}}^{\otimes}_{1,3} = \sum_{k=0}^{2}\sum_{j=1}^{3-k} w_{j,j+k}\,\bar{\mathbf{e}}^{\otimes}_{j,j+k}.$$
Example 1.12 Let us consider the case $\mathcal{I}^{Ð}_{2,3}$, see Example 1.10 above. The orthogonal system includes
$$\bar{\mathbf{e}}^{\otimes}_{(1,1,1)} = \mathbf{e}_1^{\otimes 3}, \qquad \bar{\mathbf{e}}^{\otimes}_{1,1,2} = \mathbf{e}_1^{\otimes 2} \otimes \mathbf{e}_2 + \mathbf{e}_2 \otimes \mathbf{e}_1^{\otimes 2} + \mathbf{e}_1 \otimes \mathbf{e}_2 \otimes \mathbf{e}_1,$$
$$\bar{\mathbf{e}}^{\otimes}_{1,2,2} = \mathbf{e}_1 \otimes \mathbf{e}_2^{\otimes 2} + \mathbf{e}_2^{\otimes 2} \otimes \mathbf{e}_1 + \mathbf{e}_2 \otimes \mathbf{e}_1 \otimes \mathbf{e}_2, \qquad \bar{\mathbf{e}}^{\otimes}_{2,2,2} = \mathbf{e}_2^{\otimes 3}.$$
The norm squares of these vectors are $\big\|\bar{\mathbf{e}}^{\otimes}_{1,1,1}\big\|^2 = \big\|\bar{\mathbf{e}}^{\otimes}_{2,2,2}\big\|^2 = 1$ and $\big\|\bar{\mathbf{e}}^{\otimes}_{1,1,2}\big\|^2 = \big\|\bar{\mathbf{e}}^{\otimes}_{1,2,2}\big\|^2 = 3$.
Any $\mathbf{w} \in \mathcal{M}_{d,q}$ can be projected onto $\mathcal{S}_{d,q}$, and the coefficient of $\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}$ is calculated by the following:

Lemma 1.5 Let $\mathbf{w} \in \mathcal{M}_{d,q}$; then the orthogonal projection $\widetilde{\mathbf{w}}$ of $\mathbf{w}$ into the space $\mathcal{S}_{d,q}$ is given by
$$\widetilde{\mathbf{w}} = \sum_{(j_{1:q}) \in \mathcal{I}^{Ð}_{d,q}} \widetilde{w}_{(j_{1:q})}\,\bar{\mathbf{e}}^{\otimes}_{j_{1:q}},$$
where
$$\widetilde{w}_{(j_{1:q})} = \frac{1}{\big\|\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}\big\|^2}\sum_{(k_{1:q})\,|\,p(k_{1:q}) = (j_{1:q})} w_{(k_{1:q})}.$$
We also have $\bar{\mathbf{e}}^{\otimes\top}_{j_{1:q}}\mathbf{w} = \big\|\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}\big\|^2\,\widetilde{w}_{(j_{1:q})}$. If $\mathbf{w} \in \mathcal{S}_{d,q}$, then $\widetilde{w}_{(j_{1:q})} = w_{(j_{1:q})}$.
Collect the vectors $\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}$, in the same order as the entries of $\mathbf{w}^{Ð}$, into the matrix
$$\mathbf{Q}_{d,q} = \Big[\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}\Big]_{(j_{1:q}) \in \mathcal{I}^{Ð}_{d,q}}.$$
The vector $\mathbf{w}^{Ð}$ has dimension $\eta_{d,q}$, see (1.30), and the matrix $\mathbf{Q}_{d,q}$ is $d^q \times \eta_{d,q}$, so we can take the product
$$\mathbf{Q}_{d,q}\mathbf{w}^{Ð} = \sum_{(j_{1:q}) \in \mathcal{I}^{Ð}_{d,q}} w_{(j_{1:q})}\,\bar{\mathbf{e}}^{\otimes}_{j_{1:q}} = \mathbf{w}. \tag{1.31}$$
An elimination matrix $\mathbf{Q}^{+}_{d,q}$ exists, and it has the property that the distinct elements of a q-symmetric vector $\mathbf{w}$ can be obtained by
$$\mathbf{w}^{Ð} = \mathbf{Q}^{+}_{d,q}\mathbf{w}. \tag{1.32}$$
The matrix $\mathbf{Q}^{+}_{d,q}$ will be called a q-way elimination matrix (elimination matrix for short). The elimination matrix $\mathbf{Q}^{+}_{d,q}$ is not unique, in general, since there are several methods of collecting the distinct elements of a $\mathbf{w} \in \mathcal{S}_{d,q}$ into a vector like $\mathbf{w}^{Ð}$. We have synchronized the triplet, the vector $\mathbf{w}^{Ð}$ of distinct entries of $\mathbf{w} \in \mathcal{S}_{d,q}$, the q-plication matrix $\mathbf{Q}_{d,q}$, and the elimination matrix $\mathbf{Q}^{+}_{d,q}$, by fixing the alphabetical order of the indices. The vector $\mathbf{w}^{Ð}$ is defined by the order of the orthogonal system $\big\{\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}\big\}$. We use the lexicographic ordering of the coordinate vectors $\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}$ by their multi-indices for the selection of distinct entries. This results in removing each second and later occurrence of the same element from $\mathbf{w}$. The following example shows our treatment.
Example 1.13 The vector form of a $3 \times 3$ symmetric matrix
$$\mathbf{A} = \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{1,2} & a_{2,2} & a_{2,3} \\ a_{1,3} & a_{2,3} & a_{3,3} \end{bmatrix}$$
is treated accordingly, where, as we have seen, $\big\|\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}\big\|^2$ is the number of vectors $\mathbf{e}^{\otimes}_{k_{1:q}}$ in the construction of $\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}$. The product
$$\mathbf{Q}^{\top}_{d,q}\mathbf{1}_{d^q} = \boldsymbol{\omega}_{d,q} \tag{1.33}$$
collects these norm-weights; e.g.
$$\boldsymbol{\omega}_{3,3} = \bigg[\frac{3!}{\ell_k!}\bigg]_{k=1:\eta_{d,q}} = [1, 3, 3, 3, 6, 1, 3, 3, 3, 1]^{\top}.$$
We see that for $q = 3$, the multi-index $(k, k, k)$ occurs once and $(i, j, k)$, with distinct $i$, $j$, $k$, occurs 6 times.
It is more convenient to renumber the vectors of $\mathbf{Q}_{3,3}$ in the order they appear, i.e.
$$\mathbf{Q}_{3,3} = \big[\bar{\mathbf{e}}^{\otimes}_1, \bar{\mathbf{e}}^{\otimes}_2, \bar{\mathbf{e}}^{\otimes}_3, \bar{\mathbf{e}}^{\otimes}_4, \bar{\mathbf{e}}^{\otimes}_5, \bar{\mathbf{e}}^{\otimes}_6, \bar{\mathbf{e}}^{\otimes}_7, \bar{\mathbf{e}}^{\otimes}_8, \bar{\mathbf{e}}^{\otimes}_9, \bar{\mathbf{e}}^{\otimes}_{10}\big].$$
Write an $\mathbf{x} \in \mathcal{S}_{3,3}$ in this system as
$$\mathbf{x} = \sum_{k=1}^{10} x_k\,\bar{\mathbf{e}}^{\otimes}_k.$$
Then
$$\mathbf{y} = \mathbf{Q}^{\top}_{3,3}\mathbf{x} = \sum_{k=1}^{10} x_k\,\big\|\bar{\mathbf{e}}^{\otimes}_k\big\|^2\,\mathbf{e}_k = \boldsymbol{\omega}_{3,3} \odot \mathbf{x}^{Ð},$$
where $\mathbf{e}_k$ denotes the standard unit vector in $\mathbb{R}^{10}$, $\mathbf{x}^{Ð} = [x_k]_{k=1:10}$, and $\odot$ denotes the Hadamard (element-wise) product. In some cases, depending on the software at hand, it is more convenient to replace the Hadamard product $\boldsymbol{\omega}_{3,3} \odot \mathbf{x}^{Ð}$ by $\mathrm{Diag}(\boldsymbol{\omega}_{3,3})\,\mathbf{x}^{Ð}$, where $\mathrm{Diag}(\boldsymbol{\omega}_{3,3})$ is the diagonal matrix of the vector $\boldsymbol{\omega}_{3,3}$.
Consider the transformation $\mathbf{y} = \mathbf{Q}^{\top}_{d,q}\mathbf{x}$ of an $\mathbf{x} \in \mathcal{S}_{d,q}$; the dimension of $\mathbf{y}$ is $\eta_{d,q}$, see (1.30), and
$$\mathbf{y} = \mathbf{Q}^{\top}_{d,q}\mathbf{x} = \mathbf{Q}^{\top}_{d,q}\sum_{(j_{1:q}) \in \mathcal{I}^{Ð}_{d,q}} x_{(j_{1:q})}\,\bar{\mathbf{e}}^{\otimes}_{j_{1:q}} = \sum_{(j_{1:q}) \in \mathcal{I}^{Ð}_{d,q}} \big\|\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}\big\|^2\,x_{(j_{1:q})}\,\mathbf{e}_{k(j_{1:q})} = \boldsymbol{\omega}_{d,q} \odot \mathbf{x}^{Ð} = \mathrm{Diag}\big(\boldsymbol{\omega}_{d,q}\big)\,\mathbf{x}^{Ð},$$
where $\mathbf{e}_{k(j_{1:q})}$ is a coordinate vector in $\mathbb{R}^{\eta_{d,q}}$, with the same ordering $k(j_{1:q})$ as $\mathbf{x}^{Ð}$.
The operator $\mathbf{Q}^{\top}_{d,q}$ is valid for each vector in the Euclidean space $\mathbb{R}^{d^q}$ as well. If $\mathbf{x} \in \mathbb{R}^{d^q}$, then
$$\mathbf{x} = \sum_{i_{1:q}} x_{(i_{1:q})}\,\mathbf{e}^{\otimes}_{i_{1:q}},$$
and for calculating $\mathbf{Q}^{\top}_{d,q}\mathbf{x}$ we need the products
$$\bar{\mathbf{e}}^{\otimes\top}_{j_{1:q}}\,\mathbf{e}^{\otimes}_{i_{1:q}} = \sum_{(k_{1:q})\,|\,p(k_{1:q}) = (j_{1:q})} \mathbf{e}^{\otimes\top}_{k_{1:q}}\mathbf{e}^{\otimes}_{i_{1:q}} = \sum_{(k_{1:q})\,|\,p(k_{1:q}) = (j_{1:q})} \delta_{k_{1:q},\,i_{1:q}}.$$
It follows that the coefficient of a coordinate vector $\mathbf{e}_{k(j_{1:q})}$ in $\mathbb{R}^{\eta_{d,q}}$ is the sum of those $x_{(k_{1:q})}$ for which $p(k_{1:q}) = j_{1:q}$:
$$\mathbf{y} = \mathbf{Q}^{\top}_{d,q}\mathbf{x} = \sum_{(j_{1:q}) \in \mathcal{I}^{Ð}_{d,q}} \Big(\sum_{(k_{1:q})\,|\,p(k_{1:q}) = (j_{1:q})} x_{(k_{1:q})}\Big)\mathbf{e}_{k(j_{1:q})}.$$
The inner sum equals $\big\|\bar{\mathbf{e}}^{\otimes}_{j_{1:q}}\big\|^2\,\widetilde{x}_{(j_{1:q})}$, where $\widetilde{x}_{(j_{1:q})}$ denotes the distinct values of the symmetrized $\mathbf{x}$; hence, we obtain
$$\mathbf{y} = \boldsymbol{\omega}_{d,q} \odot \mathbf{Q}^{+}_{d,q}\big(\mathbf{S}_{d1_q}\mathbf{x}\big).$$
Remark 1.6 As far as $\mathbf{x} \in \mathcal{S}_{d,q}$, the transpose $\mathbf{Q}^{\top}_{d,q}$ of the q-plication matrix $\mathbf{Q}_{d,q}$ acts as
$$\mathbf{Q}^{\top}_{d,q}\mathbf{x} = \boldsymbol{\omega}_{d,q} \odot \big(\mathbf{Q}^{+}_{d,q}\mathbf{x}\big),$$
and one can replace $\mathbf{Q}^{+}_{d,q}\mathbf{x}$ by the distinct values $\mathbf{x}^{Ð}$ of $\mathbf{x}$; hence, $\mathbf{Q}^{\top}_{d,q}\mathbf{x} = \boldsymbol{\omega}_{d,q} \odot \mathbf{x}^{Ð}$.
For any permutation $p \in \mathcal{P}_q$, the elimination matrix satisfies
$$\mathbf{Q}^{+}_{d,q}\mathbf{K}_p\mathbf{w} = \mathbf{Q}^{+}_{d,q}\mathbf{w},$$
that is,
$$\mathbf{Q}^{+}_{d,q}\mathbf{K}_p = \mathbf{Q}^{+}_{d,q}.$$
Consequently,
$$\mathbf{Q}^{+}_{d,q}\mathbf{S}_{d1_q}\mathbf{w} = \mathbf{Q}^{+}_{d,q}\mathbf{w},$$
and
$$\mathbf{I}_{\eta_{d,q}} = \mathbf{Q}^{+}_{d,q}\mathbf{Q}_{d,q} = \mathbf{Q}^{+}_{d,q}\mathbf{S}_{d1_q}\mathbf{Q}_{d,q}.$$
Now consider the connection between the symmetrizer and the q-plication; for any $\mathbf{w}^{Ð} \in \mathbb{R}^{\eta_{d,q}}$ and for any commutator $\mathbf{K}_p$, we have
$$\mathbf{K}_p\mathbf{Q}_{d,q}\mathbf{w}^{Ð} = \mathbf{K}_p\mathbf{w} = \mathbf{w} = \mathbf{Q}_{d,q}\mathbf{w}^{Ð};$$
hence,
$$\mathbf{K}_p\mathbf{Q}_{d,q} = \mathbf{Q}_{d,q}$$
and
$$\mathbf{S}_{d1_q}\mathbf{Q}_{d,q}\mathbf{Q}^{+}_{d,q} = \mathbf{Q}_{d,q}\mathbf{Q}^{+}_{d,q},$$
and $\mathbf{S}_{d1_q}^{\top} = \mathbf{S}_{d1_q} = \mathbf{S}_{d1_q}^{2}$ is an idempotent matrix with rank $\eta_{d,q}$. Hence, $\mathbf{S}_{d1_q} = \mathbf{Q}_{d,q}\mathbf{Q}^{+}_{d,q}$.
Consider the product $\mathbf{Q}_{d,q}\mathbf{Q}^{+}_{d,q}$: it picks up the distinct elements of $\mathbf{w}$, assuming $\mathbf{w} \in \mathcal{S}_{d,q}$, first and then q-plicates them. The result is $\mathbf{Q}_{d,q}\mathbf{Q}^{+}_{d,q}\mathbf{w} \in \mathcal{S}_{d,q}$. Again, if $\mathbf{w} \notin \mathcal{S}_{d,q}$, then $\mathbf{Q}_{d,q}\mathbf{Q}^{+}_{d,q}\mathbf{w} \ne \mathbf{S}_{d1_q}\mathbf{w}$, although both belong to $\mathcal{S}_{d,q}$, i.e. both are q-symmetric.

Example 1.16 Let $\mathbf{w} = 1:8$; then $\mathbf{Q}_{d,q}\mathbf{Q}^{+}_{d,q}\mathbf{w} = [1, 2, 2, 4, 2, 4, 4, 8]^{\top}$, while $\mathbf{S}_{d1_q}\mathbf{w} = [1, 10/3, 10/3, 17/3, 10/3, 17/3, 17/3, 8]^{\top}$ (see Example 1.10).

One possible check of whether $\mathbf{w} \in \mathcal{S}_{d,q}$ is the equation
$$\mathbf{S}_{d1_q}\mathbf{w} = \mathbf{w},$$
which is fulfilled if and only if $\mathbf{w} \in \mathcal{S}_{d,q}$.
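The triplet $(\mathbf{w}^{Ð}, \mathbf{Q}_{d,q}, \mathbf{Q}^{+}_{d,q})$ can be realized numerically. The sketch below (helper names ours) builds $\mathbf{Q}_{d,q}$ column by column and uses the Moore–Penrose inverse as one possible choice of elimination matrix:

```python
import numpy as np
from itertools import product

def qplication(d, q):
    """Columns are the symmetrized basis vectors over distinct multi-indices."""
    cols, seen = [], []
    for idx in product(range(d), repeat=q):
        key = tuple(sorted(idx))
        if key in seen:
            continue
        seen.append(key)
        col = np.zeros(d ** q)
        # sum of e⊗_{k} over all k that are permutations of this multi-index
        for k in product(range(d), repeat=q):
            if tuple(sorted(k)) == key:
                col[np.ravel_multi_index(k, (d,) * q)] = 1
        cols.append(col)
    return np.column_stack(cols)

d, q = 2, 3
Q = qplication(d, q)            # 8 x 4, since η_{2,3} = 4
Qplus = np.linalg.pinv(Q)       # a Moore–Penrose elimination matrix
w = np.arange(1.0, 9.0)
print(Qplus @ w)                # averaged distinct values: [1, 10/3, 17/3, 8]
S = Q @ Qplus                   # with this Q+, S_{d1_q} = Q Q+
assert np.allclose(S, S.T) and np.allclose(S, S @ S)
```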
There are at least two different concepts of partitions, and both refer to the process of dividing an object into smaller sub-objects.
1. Integer partitions: A partition of an integer $n$ is a way to write it as a sum of positive integers, such as $4 = 4$, $4 = 3 + 1$, $4 = 2 + 2$, $4 = 2 + 1 + 1$, $4 = 1 + 1 + 1 + 1$.
2. Set partitions: A partition of a set divides it into disjoint subsets. Let $\mathcal{K}$ be a set of subsets obtained from a set $\Omega$. $\mathcal{K}$ is a partition of $\Omega$ if the included subsets of $\mathcal{K}$ are disjoint, non-empty, and their union is the whole set $\Omega$.
There are some connections between these concepts, as we will see later. For instance, the number of all partitions of the number 4, as we have seen, is 5, and at the same time the number of all partitions of the set $\Omega = \{1, 2, 3, 4\}$, see Partition Table 1.3 below, is 15. The integer partitions of 4 show the possible block cardinalities of a set partition of 4 elements.
We consider set partitions.
A set of $n$ elements can be split into a set of disjoint subsets, i.e. it can be partitioned. The set of $n$ elements will correspond to the set $1:n = \{1, 2, \ldots, n\}$. If $\mathcal{K} = \{b_1, b_2, \ldots, b_r\}$, where each $b_j \subset 1:n$, then $\mathcal{K}$ is a partition provided $\cup b_j = 1:n$, each $b_j$ is non-empty, and $b_j \cap b_i = \emptyset$ (the blocks are disjoint) whenever $j \ne i$. The subsets $b_j$, $j = 1, 2, \ldots, r$, are called the blocks of $\mathcal{K}$. We will call $r$ (the number of the blocks in partition $\mathcal{K}$) the size of $\mathcal{K}$ and denote it by $|\mathcal{K}| = r$; a partition with size $r$ will be denoted by $\mathcal{K}_{\{r\}}$. Let us denote the set of all partitions of the numbers $1:n$ by $\mathcal{P}_n$.
Partition matrices are very productive for handling set partitions. By using matrices, ideas and methods become transparent, and algorithms arise naturally. Below we give an algorithm for generating all partitions of the set $1:n$. The reason for doing so is to be able to generate partitions by a simple computer program.
Consider a set of $n$-dimensional row vectors $\mathbf{u}_j = \big[u_{j,1}, u_{j,2}, \ldots, u_{j,n}\big]$, $(j = 1, 2, \ldots, r)$, where $u_{j,k}$ is either one or zero and at least one entry is 1, i.e.
the matrix
$$\mathbf{U} = \begin{bmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \vdots \\ \mathbf{u}_r \end{bmatrix} \tag{1.34}$$
is $r \times n$, and its entries are only zeros and ones. Let the sum of the entries of each column be 1, i.e. let the sum of the rows of matrix $\mathbf{U}$ (denoted by $\sum \mathbf{U}$) be the row vector $[1, 1, 1, \ldots, 1] = \mathbf{1}_n$:
$$\sum \mathbf{U} = \sum_{j=1}^{r} \mathbf{u}_j = \mathbf{1}_n. \tag{1.35}$$
Such a vector system $\mathbf{u}_j$, $j = 1, 2, \ldots, r$, i.e. the $r \times n$ matrix $\mathbf{U}$, see (1.34), is called a partition matrix. It corresponds to the size-$r$ partition $\mathcal{K} = \{b_1, b_2, \ldots, b_r\}$ of the set $1:n$ in the following way: each row vector $\mathbf{u}_j$ corresponds to a block $b_j$ of $1:n$ with respect to the indices of the 1s in $\mathbf{u}_j$, i.e. a particular value $k$ from $1:n$ is in block $b_j$ if $u_{j,k} = 1$; see Example 1.21 below.
To illustrate, let $n = 1$; then we have one element and one partition,
$$\mathbf{U} = \mathbf{u}_1 = [1]. \tag{1.36}$$
Let $n = 2$; we have a partition $\mathbf{v}_1 = [1, 1]$, with both elements in it, and another partition with blocks $\mathbf{u}_1 = [1, 0]$, $\mathbf{u}_2 = [0, 1]$, corresponding to the second term of (1.37). (To avoid any possibility of confusion we used $\mathbf{v}_1$ instead of $\mathbf{u}_1$ for the first term, as the elements are different.) So we have two matrices,
$$\mathbf{U}_1 = [\mathbf{v}_1] = [1, 1], \qquad \mathbf{U}_2 = \begin{bmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \tag{1.37}$$
Definition 1.3 If the new partition matrix has the same number of rows as the
previous matrix (with an extra column), we call the extension inclusive. If the
number of rows increases by one (with an extra column), we will call the extension
exclusive.
At each new stage we will have one exclusive extension to every partition of the
previous stage and the number of inclusive extensions is equal to the number of
rows of the previous matrix. We note that each matrix contributes to one partition
and each row of the matrix contributes to one block of the partition.
To obtain the above partitions we proceed by obtaining partitions for a set of
lower cardinality (n − 1) and by inclusion and exclusion arguments we can obtain
partitions for a set of cardinality n.
This procedure works in general as well. We add all the possible inclusive and
exclusive extensions to each partition K ∈ Pn−1 to obtain all the partitions of Pn
based on the partitions of Pn−1 . In this way we will obtain all the partitions of Pn
because if L ∈ Pn then one of the blocks, say b0 ∈ L, contains the element n.
If $|b_0| = 1$, then $\mathcal{L}$ is generated by exclusive extension from the partition $\mathcal{K} \in \mathcal{P}_{n-1}$ containing the rest of the blocks of $\mathcal{L}$. If $|b_0| > 1$, then $\mathcal{L}$ is generated by inclusive extension from the partition $\mathcal{K} \in \mathcal{P}_{n-1}$ containing all the blocks of $\mathcal{L}$, but with the block $b_1 = b_0 \setminus \{n\}$ in place of $b_0$, i.e. the element $n$ is discarded from $b_0$; in other words, $b_0 = b_1 \cup \{n\}$.
Now we can extend the results to the case n = 4 as we do in Table 1.3.
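The inclusive/exclusive extension translates directly into a recursive generator of partition matrices; a minimal sketch (function name ours):

```python
import numpy as np

def partition_matrices(n):
    """All partition matrices of 1:n, grown by inclusive/exclusive extension."""
    if n == 1:
        return [np.array([[1]])]
    out = []
    for U in partition_matrices(n - 1):
        r = U.shape[0]
        zero = np.zeros((r, 1), dtype=int)
        for j in range(r):                 # inclusive: put n into block j
            V = np.hstack([U, zero])
            V[j, -1] = 1
            out.append(V)
        new_row = np.zeros((1, n), dtype=int)  # exclusive: new block {n}
        new_row[0, -1] = 1
        out.append(np.vstack([np.hstack([U, zero]), new_row]))
    return out

print(len(partition_matrices(4)))  # 15 = B_4, cf. Table 1.3
```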
When $n = 4$, see Table 1.3. In this case we have 15 matrices, thus leading to 15 partitions.
We can give a general recursive formula for computing the number of partitions. Let $S_n(r)$ denote the number of partitions of $1:n$ with $r$ blocks. These are called the Stirling numbers of the second kind, or the Stirling partition numbers. $S_{n+1}(r)$ satisfies the recursive equation
$$S_{n+1}(r) = r\,S_n(r) + S_n(r - 1); \tag{1.38}$$
observe the difference to (1.50), where we consider the partitions of pairs. The explanation for the recursion (1.38) lies in the fact that one can generate the partitions of $\{1, 2, \ldots, n\}$ with the help of the partitions of $\{1, 2, \ldots, n-1\}$. As we have seen for $n = 2, 3, 4$, a partition with size $r$ of $\{1, 2, \ldots, n\}$ can be derived by the inclusive extension of a partition with size $r$ of $\{1, 2, \ldots, n-1\}$, the number of these being $r$, and by the exclusive extension of a partition with size $r - 1$ of $\{1, 2, \ldots, n-1\}$, the number of these in each case being 1; hence the recursive formula. We note that the initial values of this recursion are $S_n(n) = 1$, $S_n(1) = 1$, $S_n(0) = 0$, and $S_n(r) = 0$ if $r > n$.
From (1.38) we obtain the following cases:
Case 1.1 Let $n = 1$: $r = 1$, $S_1(1) = 1$.
For $n = 4$:
$r = 1$: $S_4(1) = 1$;
$r = 2$: $S_4(2) = S_3(1) + 2S_3(2) = 7$;
$r = 3$: $S_4(3) = S_3(2) + 3S_3(3) = 6$;
$r = 4$: $S_4(4) = 1$.
The number of all partitions of $1:n$ is the Bell number
$$B_n = \sum_{r=1}^{n} S_n(r). \tag{1.39}$$
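The recursion (1.38) and the sum (1.39) take only a few lines of code; a sketch with memoization:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def S(n, r):
    """Stirling numbers of the second kind via S_{n+1}(r) = r S_n(r) + S_n(r-1)."""
    if r == n:
        return 1
    if r < 1 or r > n:
        return 0
    return r * S(n - 1, r) + S(n - 1, r - 1)

def bell(n):
    """Bell number B_n = sum of S_n(r) over r, see (1.39)."""
    return sum(S(n, r) for r in range(1, n + 1))

print([S(4, r) for r in range(1, 5)])  # [1, 7, 6, 1]
print(bell(5))                         # 52
```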
The number of partitions with a given type is
$$N_{\ell} = \frac{n!}{\prod_{j=1}^{r} k_j!\ \prod_{j} \ell_j!} = \frac{n!}{\prod_{j} \ell_j!\,(j!)^{\ell_j}}, \tag{1.41}$$
where $k_1, \ldots, k_r$ are the block cardinalities and $\ell_j$ denotes the number of blocks with exactly $j$ elements; note $0! = 1$. The latter formula will be used throughout below. Observe that if all $\ell_j$ equal either 0 or 1, in other words $n$ is divided into $r$ distinct parts, $n = j_1 + \cdots + j_r$, $j_k \ne j_m$ if $k \ne m$, then $N_{\ell}$ coincides with the multinomial coefficient
$$N_{\ell} = \binom{n}{j_1, \ldots, j_r}. \tag{1.42}$$
Summing $N_{\ell}\,x^{\ell}_{1:n}$ over all types of partitions with size $r$ gives
$$B_{n,r}(x_1, \ldots, x_{n-r+1}) = \sum_{\substack{\sum \ell_j = r \\ \sum j\ell_j = n}} N_{\ell}\,x^{\ell}_{1:n} = \sum_{\substack{\sum \ell_j = r \\ \sum j\ell_j = n}} \frac{n!}{\prod_{j=1}^{n-r+1} \ell_j!\,(j!)^{\ell_j}} \prod_{j=1}^{n-r+1} x_j^{\ell_j} = n! \sum_{\substack{\sum \ell_j = r \\ \sum j\ell_j = n}} \prod_{j=1}^{n-r+1} \frac{1}{\ell_j!}\Big(\frac{x_j}{j!}\Big)^{\ell_j},$$
where the sums are taken over all types with the constraints $\ell_j \ge 0$, $\sum_{j=1}^{n-r+1} \ell_j = r$, and $\sum_{j=1}^{n-r+1} j\,\ell_j = n$.
The polynomials $B_{n,r}$ are called incomplete (exponential) Bell polynomials. These polynomials provide a compact form of partitions with fixed size.
The following examples will clarify these ideas.
Example 1.19 If $n = 4$, the number of all partitions with type $\ell = [1, 0, 1, 0]$ is $4!/3! = 4$. That is, $\ell_1 = 1$ (one block with one element, $k_1 = 1$), $\ell_2 = 0$ (no blocks with 2 elements), and $\ell_3 = 1$ (one block with $k_2 = 3$ elements). Compare with Table 1.3.
In the following example we consider the complete picture for $n = 5$.

Example 1.20 Let $n = 5$; the possible sizes $r$ of partitions $\mathcal{K}$ are $r = 1, \ldots, 5$.
1. If $r = 1$, then there is one partition with one block and type $\ell^1 = [0, 0, 0, 0, 1]$, $N_1 = 1$, $S_5(1) = 1$,
$$B_{5,1}(x_1, \ldots, x_5) = x_5.$$
2. If $r = 2$, the possible types and counts are $\ell^1 = [1, 0, 0, 1, 0]$ with $N_1 = 5!/4! = 5$ and $\ell^2 = [0, 1, 1, 0, 0]$ with $N_2 = 5!/(2!\,3!) = 10$. The number of all partitions of size 2 is $S_5(2) = S_4(1) + 2S_4(2) = 15$ (see (1.38) and Table 1.4).
3. If $r = 3$, the possible types and counts are $\ell^1 = [1, 2, 0, 0, 0]$ with $N_1 = 5!/(2!\,2!\,2!) = 15$ and $\ell^2 = [2, 0, 1, 0, 0]$ with $N_2 = 5!/(2!\,3!) = 10$.
4. If $r = 4$, we have partitions with 4 blocks, and the only type is $\ell^1 = [3, 1, 0, 0, 0]$, $N_1 = 5!/(2!\,3!) = 10$, $S_5(4) = 10$.
5. If $r = 5$, then we have one partition with 5 blocks, and the only type is $\ell^1 = [5, 0, 0, 0, 0]$, $N_1 = 5!/5! = 1$, $S_5(5) = 1$.
Finally, the number of all partitions is $B_5 = 52$, and the complete Bell polynomial is
$$B_5(x_1, \ldots, x_5) = x_1^5 + 10x_1^3x_2 + 10x_1^2x_3 + 15x_1x_2^2 + 5x_1x_4 + 10x_2x_3 + x_5. \tag{1.43}$$
In general, the complete Bell polynomial is
$$B_n(x_1, \ldots, x_n) = \sum_{r=1}^{n} B_{n,r}\big(x_1, \ldots, x_{n-r+1}\big) = \sum_{r=1}^{n} \sum_{\substack{\sum \ell_j = r \\ \sum j\ell_j = n}} \frac{n!}{\prod_j \ell_j!} \prod_{j=1}^{n-r+1} \Big(\frac{x_j}{j!}\Big)^{\ell_j}. \tag{1.44}$$
It is worth noting that Bell polynomials can be put in a direct form without preliminary calculation of incomplete Bell polynomials, see (A.1), p. 352.
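The incomplete Bell polynomials also satisfy the standard recursion $B_{n,r} = \sum_i \binom{n-1}{i-1} x_i B_{n-i,r-1}$, which gives a compact implementation. A sketch assuming SymPy is available; it reproduces (1.43):

```python
from math import comb
import sympy as sp

x = sp.symbols('x1:6')  # x1, ..., x5

def bell_incomplete(n, r):
    """Incomplete Bell polynomial B_{n,r}(x_1, ..., x_{n-r+1}) by recursion."""
    if n == 0 and r == 0:
        return sp.Integer(1)
    if n == 0 or r == 0:
        return sp.Integer(0)
    return sp.expand(sum(comb(n - 1, i - 1) * x[i - 1] * bell_incomplete(n - i, r - 1)
                         for i in range(1, n - r + 2)))

B5 = sp.expand(sum(bell_incomplete(5, r) for r in range(1, 6)))
print(B5)  # the complete Bell polynomial B_5, matching (1.43)
```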
The correspondence between partitions $\mathcal{K}$ and matrices $\mathbf{U}$ is not unique, because one can permute the rows of $\mathbf{U}$ without changing the partition $\mathcal{K}$. Therefore we introduce the ordering
$$\mathbf{u}_j \le \mathbf{u}_k \quad \text{if} \quad \sum_{m=1}^{n} u_{j,m}\,2^{-m} \le \sum_{m=1}^{n} u_{k,m}\,2^{-m}, \tag{1.45}$$
and then, for instance, the corresponding partition is $\mathcal{K} = \{b_1 = (1, 3), b_2 = (2, 4)\}$. In this way we can order both the blocks of a partition and the elements in a block.

Definition 1.7 A partition $\mathcal{K}$ is in canonical form if the blocks of $\mathcal{K}$ with larger cardinality come before those with smaller cardinality, similarly to the partition matrix. The order of the blocks of the same cardinality is alphabetical, and the elements in each block are increasing.
2. For $r = 2$: $\mathcal{L}_1 = \{(2, 3), (1)\}$, $\mathcal{L}_2 = \{(1, 3), (2)\}$, $\mathcal{L}_3 = \{(1, 2), (3)\}$,
$$\mathcal{L}_1 \leftrightarrow \mathbf{U}_{2,1} = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \quad p(\mathbf{U}_{2,1}) = p(\mathcal{L}_1) = (231),$$
$$\mathcal{L}_2 \leftrightarrow \mathbf{U}_{2,2} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad p(\mathbf{U}_{2,2}) = p(\mathcal{L}_2) = (132),$$
$$\mathcal{L}_3 \leftrightarrow \mathbf{U}_{2,3} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad p(\mathbf{U}_{2,3}) = p(\mathcal{L}_3) = (123),$$
where $p(\mathcal{L}_j)$ lists the elements of the blocks without separation.
3. For $r = 3$: $\mathcal{L} = \{(1), (2), (3)\}$,
$$\mathcal{L} \leftrightarrow \mathbf{U}_3 = \mathbf{I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad p(\mathbf{U}_3) = p(\mathcal{L}) = (1:3).$$
Note that the permutations depend on the partitions; therefore, we shall also use the notation $p(\mathcal{L})$. It can be seen that to a given permutation, say $p = (1:3)$, there correspond several, actually 3, partitions. If the size of a partition is fixed at, say, $r = 2$, then distinct partitions correspond to distinct permutations.
If the size of the partitions is given, then we have a unique correspondence between partitions and permutations. This will be satisfactory for the later relationship between partitions and permutations.
It is quite clear that summing up some rows of a partition matrix $\mathbf{U}$ will result in a partition again. Actually, that operation implies the union of the corresponding blocks in a partition $\mathcal{L}$. Moreover, we can keep the canonical form by placing the sum in the place of the first row of the summands. One can see that each partition can be obtained by summing up some rows of $\mathbf{U} = \mathbf{I}_n$, which corresponds to the finest partition.
To be more precise, consider the set $\mathcal{P}_n$ of all the partitions of $1:n$ and define a partial order on $\mathcal{P}_n$. One says that partition $\mathcal{K}$ comes before, or is finer than, partition $\mathcal{L}$ ($\mathcal{K}$ is contained in partition $\mathcal{L}$), and $\mathcal{K} \preceq \mathcal{L}$, whenever the blocks of $\mathcal{L}$ are unions of the blocks of $\mathcal{K}$; clearly $\mathcal{L}$ is coarser. The coarsest, i.e. the largest partition in that sense, is the partition with one block, $\mathcal{O}_n = \{1:n\}$, and the finest, i.e. the smallest one, is the partition with $n$ blocks, $\mathcal{I}_n = \{(1), (2), \ldots, (n)\}$. Each set of elements has a least upper bound and a greatest lower bound, so that $\mathcal{P}_n$ forms a lattice.
Example 1.22 The partition matrix
$$\mathbf{U} = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix},$$
i.e. $\mathbf{U} \leftrightarrow \mathcal{K} = \{(1, 4), (2), (3)\}$, is finer than $\mathcal{L} = \{(1, 2, 4), (3)\}$, and the only partition that is finer than $\mathcal{K}$ is $\mathcal{I}$. It can be seen that summing the first and second rows results in $\mathcal{L}$. Similarly, to get a finer partition than $\mathcal{K}$ one has to split the first row of $\mathcal{K}$.
Splitting up a row generates a finer partition. If we consider all possible splits of a row, then the result is like generating all the possible partitions of the elements of that row. The following general result provides an algorithm for generating all finer partitions of a partition $\mathcal{L} \in \mathcal{P}_n$.
The union $\mathcal{K} \cup \mathcal{L}$ of partitions $\mathcal{K}$ and $\mathcal{L}$ is defined as the smallest partition that contains both of them. Actually, the blocks $b$ of the partition $\mathcal{K} \cup \mathcal{L}$ are defined by the condition that the elements $j$ and $k$ are in $b$ if and only if there exists a chain $j = \ell_1, \ell_2, \ldots, \ell_p = k$ such that for $1 \le m < p$, $\ell_m$ and $\ell_{m+1}$ are in the same block of $\mathcal{K}$ or $\mathcal{L}$ (see Example 1.23). Suppose that the partition matrices $\mathbf{U}$ and $\mathbf{V}$ correspond to the partitions $\mathcal{L}$ and $\mathcal{K}$, respectively. Then $\mathbf{V}$ is finer than $\mathbf{U}$, $\mathbf{V} \preceq \mathbf{U}$, if each row of $\mathbf{U}$ can be obtained by summing up some rows of $\mathbf{V}$. The union $\mathcal{K} \cup \mathcal{L} \leftrightarrow \mathbf{V} \cup \mathbf{U}$ is given by summing up the minimal number of rows of both partition matrices until $\mathbf{V} \cup \mathbf{U}$ contains both $\mathbf{U}$ and $\mathbf{V}$. Similarly, $\mathcal{O}_n \leftrightarrow \mathbf{1}_n$ and $\mathcal{I} \leftrightarrow \mathbf{I}_n$, where $\mathbf{I}_n$ is the unit matrix with dimension $n$. The conclusion of the above idea is that the set $\mathcal{P}_n$ can be endowed with a lattice structure. The lattice structure needs a partial ordering and two operations between the elements. In our case the set $\mathcal{P}_n$ already has the partial ordering ($\preceq$) and the operation $\cup$. Now we define the intersection $\mathcal{K} \cap \mathcal{L}$, the coarsest common refinement, as another operation. The elements $j$ and $k$ of $1:n$ are in the same block of $\mathcal{K} \cap \mathcal{L}$ if and only if $j$ and $k$ are in the same block of both $\mathcal{K}$ and $\mathcal{L}$. The intersection $\mathcal{K} \cap \mathcal{L}$ includes all possible non-empty intersections of the blocks of $\mathcal{K}$ with the blocks of $\mathcal{L}$.
Example 1.23 Let $n = 10$, $\mathcal{K} = \{(1, 2), (3, 4), (5, 6, 7, 8), (9), (10)\}$ and $\mathcal{L} = \{(1, 2, 4), (3, 6, 7), (5, 8), (9, 10)\}$. Then $\mathcal{K} \cup \mathcal{L} = \{(1, 2, 3, 4, 5, 6, 7, 8), (9, 10)\}$, and $\mathcal{K} \cap \mathcal{L} = \{(1, 2), (3), (4), (5, 8), (6, 7), (9), (10)\}$. An instance: 2 and 3 are in a block of $\mathcal{K} \cup \mathcal{L}$ since there is a chain $j = \ell_1 = 2$, $\ell_2 = 4$, $\ell_3 = k = 3$.
Example 1.24 Consider a partition $\mathcal{L} \in \mathcal{P}_4$ with two blocks, $a_1 = (1, 4)$ and $a_2 = (2, 3)$. The partitions of the set $\{1, 4\}$ consisting of elements of $a_1$ are $\mathcal{K}^1_1 = \{(1, 4)\}$, $\mathcal{K}^1_2 = \{(1), (4)\}$. The partitions with respect to $a_2$ are $\mathcal{K}^2_1 = \{(2, 3)\}$, $\mathcal{K}^2_2 = \{(2), (3)\}$. Now the union of $\mathcal{K}^1_1$ and $\mathcal{K}^2_2$, say, is $\mathcal{K}^1_1 \cup \mathcal{K}^2_2 = \{(1, 4), (2), (3)\}$. The union $\mathcal{K}^1_1 \cup \mathcal{K}^2_2 \in \mathcal{P}_4$ is a partition of the set $\{1, 2, 3, 4\}$ and $\mathcal{K}^1_1 \cup \mathcal{K}^2_2 \preceq \mathcal{L}$. We see that any union $\mathcal{K}^1_i \cup \mathcal{K}^2_j$, $i, j = 1, 2$, is finer than $\mathcal{L}$; moreover, if a partition $\mathcal{K}$ is finer than $\mathcal{L}$, then it is of the form $\mathcal{K}^1_i \cup \mathcal{K}^2_j$.
We say that two blocks $b_1$ and $b_2$ of $\mathcal{K}$ hook with respect to $\mathcal{L}$ if there exists a block of $\mathcal{L}$ containing at least one element from each of the blocks $b_1$ and $b_2$. Two blocks $b_1$ and $b_2$ of $\mathcal{K}$ communicate with respect to partition $\mathcal{L}$ if there exists a series of rows (blocks) $a_{m_1} = b_{i_1}, a_{m_2}, \ldots, a_{m_s} = b_{i_2}$ of $\mathcal{K}$, such that $a_{m_j}$ and $a_{m_{j+1}}$ hook with respect to $\mathcal{L}$ for $j = 1, 2, \ldots, s - 1$. Now one can define indecomposability: a partition $\mathcal{K}$ is indecomposable if all blocks communicate. Partition $\mathcal{K}$ is indecomposable, in that sense, if and only if there are no blocks $b_{i_1}, b_{i_2}, \ldots, b_{i_s}$ of $\mathcal{K}$, $s < |\mathcal{K}|$, and rows (blocks) $a_{u_1}, a_{u_2}, \ldots, a_{u_t}$ of $\mathcal{L}$, where $|\mathcal{K}|$ denotes the number of blocks in $\mathcal{K}$, such that
$$b_{i_1} \cup b_{i_2} \cup \cdots \cup b_{i_s} = a_{u_1} \cup a_{u_2} \cup \cdots \cup a_{u_t}. \tag{1.46}$$
Notice that $s < |\mathcal{K}|$ guarantees that there are no two non-empty blocks of $(1:n)$ such that each of them can be composed as the union of blocks of both partitions $\mathcal{K}$ and $\mathcal{L}$, since the whole set $1:n$ can always be considered a block that is the union of the blocks of both $\mathcal{K}$ and $\mathcal{L}$.
We conclude that a partition K is indecomposable with respect to partition L
if all blocks of K communicate. Observe again that the roles of K and L are
interchangeable considering formula (1.46). A partition K is also referred to as
connected with respect to partition L if K and L are indecomposable.
Example 1.25 Suppose that the partition L of 1 : n contains n − 1 blocks (|L| =
n − 1), then necessarily there are n − 2 blocks of individual elements, say a1 = (1),
a2 = (2), . . ., an−2 = (n − 2) and one with a pair an−1 = (n − 1, n). It can be
seen that there are no blocks of any partition K that hook with respect to blocks of
L with individual elements. Therefore, the only possibility for blocks b1 and b2 of
K to communicate with respect to L is that one of them should contain the element
n − 1 and the other should contain the element n.
The only way for partition $\mathcal{K}$ to be indecomposable with respect to partition $\mathcal{L}$ is that $\mathcal{K}$ contains either one block ($\mathcal{K} = \mathcal{O}$) or two blocks ($|\mathcal{K}| = 2$) only, $b_1$ and $b_2$, say, and the elements $n - 1$ and $n$ belong to different blocks, i.e. $b_1$ and $b_2$ hook with respect to the block $a_{n-1} = (n - 1, n)$.
Example 1.26 All the indecomposable partitions of $(1, 2, 3, 4)$ with respect to the partition $\mathcal{L} = \{(1, 2), (3, 4)\}$ are $\mathcal{K}_1 = \{(1, 2, 3, 4)\}$, $\mathcal{K}_2 = \{(1, 2, 3), (4)\}$, $\mathcal{K}_3 = \{(1, 2, 4), (3)\}$, $\mathcal{K}_4 = \{(1, 4, 3), (2)\}$, $\mathcal{K}_5 = \{(4, 2, 3), (1)\}$, $\mathcal{K}_6 = \{(1, 3), (2, 4)\}$, $\mathcal{K}_7 = \{(1, 4), (2, 3)\}$, $\mathcal{K}_8 = \{(1, 3), (2), (4)\}$, $\mathcal{K}_9 = \{(1), (4), (2, 3)\}$, $\mathcal{K}_{10} = \{(1, 4), (2), (3)\}$, $\mathcal{K}_{11} = \{(2, 4), (1), (3)\}$.
For the mathematically minded reader, we give a graph-theoretic interpretation
of the analysis of indecomposable partitions.
Graph A graph G = (V , E) consists of two sets, a finite set V of elements called
vertices and a finite set E of elements called edges. Each edge is identified with a
pair of vertices; if e ∈ E is an edge, then e = (v1 , v2 ), where v1 and v2 are (not
necessarily different) vertices in V .
When the rows are summed up we obtain the row vector $\mathbf{1}^{\top} = [1, 1, 1, 1]$. The partition corresponding to the matrix $\mathcal{O} = (1, 1, 1, 1)$ is always indecomposable, as this is the partition with a single block, which, by definition, is indecomposable. This is the smallest partition one can achieve. So the partition $\mathcal{K}_1 = \{(1, 2, 3, 4)\}$ is indecomposable. Now consider the partition $\mathcal{K}_2 = \{(1, 3), (2, 4)\}$. We associate with it the matrix
$$\mathbf{V}_2 = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix},$$
where the first row corresponds to the block $(1, 3)$ and the second row corresponds to the block $(2, 4)$. Consider the product of the partition matrices (here $\mathbf{U}$ corresponds to $\mathcal{L} = \{(1), (2), (3, 4)\}$),
$$\mathbf{V}_2\mathbf{U}^{\top} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},$$
i.e. whether the product is a block diagonal matrix. If the condition (1.47) is not satisfied, then partition $\mathcal{K}$ is indecomposable with respect to $\mathcal{L}$. In the case of the above example we see that $\mathbf{V}_2\mathbf{U}^{\top}$ is not in a block diagonal form; hence, partition $\mathcal{K}_2$ is indecomposable with respect to $\mathcal{L}$. In fact, the condition (1.47) is a necessary and sufficient condition for partition $\mathcal{K}$ to be decomposable with respect to $\mathcal{L}$.
There is another way to demonstrate the meaning of indecomposable partitions. Consider a graph $G(\mathcal{K}, \mathcal{L})$ with vertices the blocks of $\mathcal{K}$ and edges defined with respect to $\mathcal{L}$, i.e. two vertices (blocks) $b_1$ and $b_2$ of $\mathcal{K}$ are connected if there exists a block of $\mathcal{L}$ containing at least one element from each of the blocks $b_1$ and $b_2$. Now the partitions $\mathcal{K}$ and $\mathcal{L}$ are indecomposable if and only if any vertex can be reached from any other vertex, in other words if the graph is closed. One can check whether the partitions $\mathcal{K}$ and $\mathcal{L}$ are indecomposable in the following way. Take the partition matrices $\mathbf{U}$ and $\mathbf{V}$ corresponding to the partitions $\mathcal{L}$ and $\mathcal{K}$, respectively, and consider the product $\mathbf{V}\mathbf{U}^{\top}$. Notice that an entry $c_{i,j}$ of $\mathbf{V}\mathbf{U}^{\top}$ is not zero if and only if the $i$th row of $\mathbf{V}$ and the $j$th row of $\mathbf{U}$ have at least one nonzero entry at the same place, i.e. the $i$th block of $\mathcal{K}$ and the $j$th block of $\mathcal{L}$ have at least one common element. The partitions $\mathcal{K}$ and $\mathcal{L}$ are indecomposable if and only if one cannot find blocks $\mathbf{A}$ and $\mathbf{B}$ such that $\mathbf{V}\mathbf{U}^{\top}$ has the block diagonal form
$$\begin{bmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{bmatrix} \tag{1.48}$$
after possible reordering of the rows and columns of $\mathbf{V}\mathbf{U}^{\top}$, i.e. there exist permutation matrices $\mathbf{P}_1$ and $\mathbf{P}_2$ of appropriate orders such that
$$\mathbf{V}\mathbf{U}^{\top} = \mathbf{P}_1 \begin{bmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{bmatrix} \mathbf{P}_2. \tag{1.49}$$
Reordering is admissible; hence, for instance, permuting the rows of VU corre-
sponds to permuting the rows of V, which will not change partition K. Remember
that the partitions K and L are indecomposable if and only if L and K are
indecomposable. Note that K and K are never indecomposable unless K = O.
Suppose that partition L is given. The problem of finding all partitions K that
are indecomposable with respect to L is equivalent to the problem of finding all
partitions whose blocks are connected by L.
42 1 Some Introductory Algebra
Example 1.27 All the indecomposable partitions of (1, 2, 3) with respect to par-
tition L = {(1), (2, 3)} are K1 = {(1, 2, 3)}, K2 = {(1, 3), (2)}, and K3 =
{(1, 2), (3)}. In other words the block (2, 3) of L is able to ”connect” the blocks
of Kj , j = 1, 2, 3.
It is easy to see that the partitions K and L are indecomposable if and only if
the rows and columns of the matrix W = VU are connected, i.e. the question
is whether the matrix W1 = WW has the form (1.49) and from now on at each
new stage we take the matrix product Wk = W2k−1 (W1 and therefore all Wk are
symmetric). If the rows and columns of the matrix W = VU are connected, then
in log2 (|W|) steps, where |W| denotes the size of W, all the entries of Wk will
be positive. Otherwise if the rows and columns of the matrix W = VU are not
connected, i.e. it has the form
! "
A0
W = P1 P2 ,
0 B
then L will ”connect” the blocks of K = {(1), (2, 3) , (4) , (5)} with partition matrix
⎡ ⎤
10 00 0
⎢0 1 10 0⎥
V=⎢
⎣0 0
⎥,
01 0⎦
00 00 1
which cannot be put into the form (1.48) because of the connection of rows. Now
let us take the next step
! "
21
W1 = V U V U = ,
13
1.4.8 Diagrams
n!
SnII = (n − 1)!! = , (1.51)
2n/2 (n/2)!
and we let SnII = 0, for odd n. The set PII n is considered empty if n is odd and it has
(n − 1)!! elements if n is even.
The partition matrix VII corresponding to K II has exactly n/2 rows with two
1s in each row. The number of all such matrices (in canonical form) is (n − 1)!!
because the canonical form implies that the place of the first 1 in each row is fixed.
The second 1 can be put into the remaining places of the first row, which is n − 1
possible choices; then n − 3 places remain in the second row and so on.
Now if L is an arbitrary initial partition of 1 : n and K II ∈ PII n, then L, K
II
b2 . The diagram is also given by two partition matrices U, VII corresponding to
L, K II .
A graph is called closed if any vertex can be reached from any other vertex, i.e. the
partitions L and K II are indecomposable. A diagram L, K II has a loop if there is
a block a ∈ K II and b ∈ L such that b contains a. See Example below.
a block
The diagram L, K II will be called closed and without a loop if the graph
defined by it is closed(indecomposable) and has no loop,
respectively. We shall use
the notations L, K II cl , “cl” for closed and L, K II cnl , “cnl” for closed no loop,
accordingly. It can be seen that the rows of the matrix product VII U contain only
either 1s or 2s besides 0s. The diagram is without a loop if VII U has no entry with
2 at all. The diagram is closed if the matrix product VII U cannot be partitioned,
with a possible reordering of its rows and columns, in the following way:
! "
reordering VII
1 U1 0
II
V U → ,
0 VII
2 U2
One can obtain all such types of partitions K II by fixing the components of b1
and permuting the components of b2 . In this way each permutation provides us a
partition of pairs K II . Hence,
S II (r, r) = r! (1.52)
L
(see Exercise 1.52). Equivalently, ↔ U is 2 × n = 2r with n/2 1s in each row
and VII is n/2 × n. So that U VII is 2 × n/2 with each entry being 1. Suppose
that U is fixed; then the number of all the matrices VII is (n/2)!. Indeed in each row
of VII there are two 1s and n − 2 0s. The place of one of the 1s is fixed (to obtain a
canonical partition) and the second element 1 can be put into n/2 places in the first
row n/2 − 1 places in the second row, and so on.
Example 1.31 Let L = {b1 , b2 }, where b1 = (1 : 3), and b2 = (4 : 6), then
!
"
111000
L↔U= ,
000111
We see that there are 3! partitions K II . Indeed, let us fix the places of b1 (in different
rows naturally), then the second “1” in the first row can be put into three places,
the second “1” in the second row can have two places, and finally there is only one
place for the remaining third “1” in the third row.
Case 1.6 The next problem concerns finding all the closed diagrams L, K II
without loops, where L = {b1 , b2 , b3 }, with |bj | = rj , suppose that r1 + r2 + r3 =
2r, even then
r1 r2
S II (r1:3 ) = r3 ! (r − r3 )!, (1.53)
r − r2 r − r1
the result is
r1 r2 r3 r1 r2
k1 ! (r1 − k1 )! (r3 − r1 + k1 )! = r3 ! (r − r3 )!.
k1 k1 r1 − k1 r − r2 r − r1
In each step we chose a block and an element from that block. We could do it 2×2×2
ways, S II (2, 2, 2) = 8 (see (1.53)).
Case 1.8 The case of four blocks is more complicated than the previous ones. Con-
sider the closed diagrams L, K II cnl without loops, where L = {b1 , b2 , b3 , b4 },
with |bj | = rj ; suppose that r1 + r2 + r3 + r4 = 2r is even. Then split up each bj
1.4 Partitions and Diagrams 47
into three parts kj,1:3 , such that k1,2 = k2,1 , k1,3 = k3,1, k1,4 = k4,1 , and proceed
k2,3 = k3,2 , k2,4 = k4,2 , finally k3,4 = k4,3. See the following table.
If the number of parts k1:4,1:4 is given, then we use the reasoning of the previous
cases and finally we sum all these possible parts
∗ r1 !r2 !r3 !r4 !
S II (r1:4 ) = , (1.55)
k1,2!k1,3 !k1,4 !k2,3!k2,4 !k3,4 !
where summation ∗ is for all non-negative integers k1,2 , k1,3 , k1,4 , k2,3 , k2,4 , k3,4 ,
such that k1,2 + k1,3 + k1,4 + k2,3 + k2,4 + k3,4 = r, and r1 = k1,2 + k1,3 + k1,4 ,
r2 = k1,2 + k2,3 + k2,4 , r3 = k1,3 + k2,3 + k3,4 , r4 = k1,4 + k2,4 + k3,4 (see
Exercise 1.55).
Let us consider
a partition K I,II with blocks having two elements at most. Let a
diagram L, K I,II correspond to a graph that can have free edges with respect to
the blocks of K I,II with single elements. These free edges are called the arms of the
diagram L, K I,II . Let PI,II n denote the set of all partitions having blocks with one
I,II
or two elements, i.e. Pn = K I,II . The arms of a diagram L, K I,II correspond
to the blocks of partition K I,II with one element.
Example
1.33 Take the partitions L = {b1 = (1), b2 = (2, 3, 4)}. The diagram
L, K I,II is closed if blocks b1 and b2 are connected by K I,II . If K I,II =
{(1, 2) , (3) , (4)} of (1 : 4), then L, K I,II is closed and has two arms. The graph
(L, K I,II ), K I,II ∈ PI,II
n has only one edge (k1 , k2 ) ∈ K
I,II but has arms (j ) ∈ K I,II
as well.
Put dK for the number of arms in partition K I,II and DK = j | (j ) ∈ K I,II for
the set of arms, |DK | = dK . A pair of matrices U, VI,II corresponds to the diagram
(L, K I,II ), where VI,II is a partition matrix with rows having entry 1 in either one or
two places. dK is the number of rows at VI,II having only one entry with 1.
First we consider set PI,IIn , and separate its partitions by the number of arms dK .
Example 1.34 Consider PI,II n , to be the case when n = 4. All the possible
choices for dK are 4, 2, and 0. If dK = 4, K I,II contains only arms, K I,II =
{(1) , (2) , (3) , (4)}, i.e. VI,II = I4 . If dK = 2, K I,II contains 2 arms and 1 pair, for
48 1 Some Introductory Algebra
Now if dK = 0, K I,II contains only pairs, for example K I,II = {(1, 2) , (3, 4)}, i.e.
! "
1100
V I,II
= .
0011
n!
SnI,II (n − 2k) = .
(n − 2k)!k!2k
I,II
Sn+1 (m) = SnI,II (m − 1) + (m + 1) SnI,II (m + 1) ,
n
SnI,II = (n − m − 1)!!,
m
m=n:2:0
min(n1 ,n2 )
n1 !n2 !
SnI,II
1 ,n2
= .
(n1 − r)! (n2 − r)!r!
k=0
Case 1.10 Now, we consider a closed diagram L, K I,II cnl without loops with
arms, where L = (b1 , b2 , b3 ), with |bj | = nj . Let n1:3 = n, and the number
of arms dK be n − 2r, which implies that there are r edges. Suppose furthermore
that these edges connect r1 vertices from b1 , r2 vertices from b2 , and r3 vertices
from b3 , so r1:3 = 2r. The number SnI,II 1:3 (r1:3 ) of all such partitions is
n1:3 II n1:3 !
SnI,II (r1:3 ) = S (r1:3 ) = , (1.58)
1:3
r1:3 (n1:3 − r1:3 )! (r − r1:3 )!
where r = (r, r, r), since one has chosen sub-blocks with cardinalities r1:3 in all
possible ways, and then apply (1.53) to obtain all closed diagrams without a loop.
Case 1.11 All closed diagrams L, K I,II without loops, with arms, where L =
(b1 , b2 , b3 , b4 ), with |bj | = nj ; let n1:4 = n, the number of arms dK be n − 2r,
which implies that there are r edges. Suppose furthermore that these edges connect
r1 vertices from b1 , r2 vertices from b2 , r3 vertices from b3 , and r4 vertices from b4
so r1:4 = 2r. The number SnI,II 1:4 (r1:4 ) of all such partitions is
n1:4 II
Sn1:4 (r1:4 ) =
I,II
S (r1:4 ) . (1.59)
r1:4
1.5 Appendix
Proof
vec (BA) = A ⊗ Ip vecB
= (vecB) ⊗ Inp vec A ⊗ Ip
= (vecB) ⊗ Inp Im ⊗ Kp•n ⊗ Ip vecA ⊗ vecIp .
Proof It is enough to prove that one can interchange two neighboring matrices in
a T-product. This latter one, i.e. interchanging two neighboring matrices, follows
from the formula (1.15) as follows. We use (1.15) for interchanging the neighboring
elements of the T-product Aj +1 ⊗ Aj +2 ; by our notation we have
K(21) dj +1 , dj +2 Aj +1 ⊗ Aj +2 K−1
(21) pj +1 , pj +2 = Aj +2 ⊗ Aj +1 .
Hence we can complete both commutator matrices with the T-product of unit
matrices and obtain
⊗
Id1:j ⊗ K(21) dj +1 , dj +2 ⊗ Idj+3:n Ak
k=1:n
−1
Id1:j ⊗ K(21) pj +1 , pj +2 ⊗ Idj+3:n
⊗
= Ak ⊗ K(21) dj +1 , dj +2 Aj +1 ⊗ Aj +2
k=1:j
⊗
K−1
(21) pj +1 , pj +2 ⊗ Ak
k=j +3:n
⊗ ⊗
= Ak ⊗ Aj +2 ⊗ Aj +1 ⊗ Ak .
k=1:j k=j +3:n
2
e⊗
since among the q! terms e⊗
j1:q includes j1:q = q!/(j1:q ) ! terms. Now
w = Sd1q w = w(k1:q ) Sd1q e⊗
k1:q .
(k1:q )
Summing up Sd1q e⊗
k1:q by those k1:q for which p k1:q = j1:q , we shall have the
e⊗
same basis vectors (j1:q ) !/q!j1:q and if we sum up the coefficients in each case,
2
e⊗
then we can replace (j1:q ) !/q! by 1/ j1:q ; hence
w= e⊗
(j1:q )
w j1:q ,
(j1:q )∈IÐd,q
where
1
(j1:q ) =
w 2
w(k1:q ) .
e⊗
j1:q (k1:q )|p(k1:q )=(j1:q )
by direct computation.
52 1 Some Introductory Algebra
1.6 Exercises
Section 1.1
(1423)−1 = (1342) ,
(341625)−1 = (351264) ,
(561324)−1 = (354612) .
Section 1.2
ab = a ⊗ b .
a1 ⊗ Bb = (a1 ⊗ B) b,
and
a1 ⊗ Bb ⊗ a2 = (a1 ⊗ B ⊗ a2 ) b,
more general
⊗ ⊗ ⊗ ⊗
ak ⊗ Bb ⊗ ak = ak ⊗ B ⊗ ak b.
1:j −1 j +1:M 1:j −1 j +1:M
and
vec Id ⊗ Id 2 a⊗4 = Id 2 ⊗ vec Id a⊗4 .
54 1 Some Introductory Algebra
L−1 −1 −1
22 = Id 4 + Kp1 + Kp2 , (1.61)
and
Id 4 + Id 4 + K−1 −1
(12)S (d, d, d) ⊗ Id K(12)S d, d , d = Id 4 + K(3124) + K(1342) .
2
and if a = b
vec a ⊗ a ⊗ a ⊗ a = a⊗4 ,
d
vecId = e⊗2
k . (1.62)
k=1
More general, denote matrix e⊗k−1
by E (d, k) and show
=1:d
d
vecE (d, k) = e⊗k
k .
k=1
vec⊗2 Id vec⊗2 Id = d 2 ,
vec Id 2 vec⊗2 Id = d,
and
vec Id 2 K(1324)vec⊗2 Id = d 2 .
1.30 Show
vecId 3 ⊗ vecId 4 = Id 3 ⊗ Kd 3 •d 4 ⊗ Id 4 vecId 7 .
L−1 −1 −1 −1
13 = Id 4 +K(1243) + K(1342) +K(2341) ,
without using inverse of matrices. Hint: see Remark 1.1 for inverse permutations.
1.33 Show
K(135246)vec⊗3 Id = vecId 3 ,
and
K−1 ⊗3
(142536)vec Id = vecId 3 .
1.34 Show
1.37 Denote the columns of matrix B by bk and ek be the unit coordinate vector,
show
vec B⊗2 = bj ⊗ bk ⊗ ej ⊗ ek ,
vec⊗2 B = bj ⊗ ej ⊗ bk ⊗ ek ,
1.6 Exercises 57
and
vec Kd•d B⊗2 = bj ⊗ bk ⊗ ek ⊗ ej .
Section 1.3
L−1 −1 −1 −1
13 ,11 = Id 4 +K(1243) + K(1342) +K(2341) .
Show that
w (a) = L−1
13 ,11 a ⊗3
⊗ b
is symmetric. Hint: rewrite the inverse in terms of inverse permutations first, see
Exercise 1.32, p. 56.
#
1.43 Show that vecId ∈ Sd,2 with rank d and vecId = dj=1 e(j,j ) .
1.44 Let w ∈ Sd,2 , show the particular form
d−1 d−k
w= e⊗
wj,j +k(j,j +k)
k=0 j =1
of w for general d.
1.45 Show
Qd,2vecA = Q+
d,2 vec A + A − diagA .
58 1 Some Introductory Algebra
1
Sd12 (b ⊗ A) = (b ⊗ A + A ⊗ b) .
2
Section 1.4
The theory of permutation is well known. Here we refer to the classical works
[And76] and [Aig12].
The tensor product has a wide range of references, we used [MN99, Gra18], see
also [KvR06]. One reason we use the name T -product instead of Kronecker product
is the following quote by Muirhead [Mui09] “Actually the connection between
this product and the German mathematician Kronecker (1823-1891) seems rather
obscure.” Earlier a distinction was made between the right, half-right and left, half-
left tensor products, [Hol85]; we use the left–right one as a general case. The vec
operator, also called the pack operator, was considered by [Mac74] in addition to
the previous references for tensors.
The theory of commutation matrices is described by Magnus and Neudecker
[MN99] in detail. [Hol85] uses the notion of direct product permuting matrix
(DPPM) of k degrees, and a particular case, the transposition matrix is also
considered there. The structure of a commutator in terms of elementary matrices
is given by [Gra18].
[HS81] reviewed the results of a vec-permutation matrix for a cyclically permut-
ing order in Kronecker products of three or more matrices.
An algorithm for generating the commutator matrix Kp for any p ∈ Pn has been
published by Holmquist [Hol96a]. The star product with basic properties has been
applied for matrix derivatives by [Mac74].
Introductory works on multilinear algebra are [KM97] and [Nor84].
The symmetrization matrix has been considered by [Mei05] and [Hol96a] with
the latter using the notation Sd1q as well. Further results on symmetric tensors are
treated in [CGLM08] and [DL08]. For orthogonal decomposability, ODECO for
short, see [Rob16], and for super-symmetric tensors, see [DGP18].
Duplication and elimination matrices can be found in [CGLM08, Mei05, MN80]
and [MN99]. Our approach is traced back to Magnus and Neudecker (1980) [MN80]
who introduce two transformation matrices, L and D, which consist of zeros and
ones. L eliminates from vecA the supra-diagonal elements of A for any (n, n)
arbitrary matrix A, while D performs the reverse transformation for a symmetric
A.
The inclusive and exclusive extensions methods are treated in [Cam94]. Aigner
[Aig12] studies partitions in detail as a lattice structure, as well as the number of
all partitions of type etc., see also DLMF [DLM], 26.8.22. Bell numbers are
considered by [And76]; Bell polynomials with statistical applications have been
used by [VN94] and [VN97].
Indecomposable partitions in relation to stochastic processes are considered by
[LS59, Bri01] and [MM85] among others.
The number of closed diagrams without loops has been considered by [Car62],
[Fel40] (p.22).
Chapter 2
The Tensor Derivative of Vector
Functions
Example 2.2 Let f (x) = exp (x), g (x) = ln EeixX . Then h (x) = f (g (x)) =
EeixX is the characteristic function.
Our object is to obtain an expression for the partial derivatives of h (·) in terms of
the partial derivatives of g (·) since such expressions will be used to obtain results
for cumulants and moments.
Let us introduce the following notations for the derivative:
d r f (x)
f (r) (x) = , r = 1, 2, . . . .,
dx r
and for the partial derivatives
∂g (x)
gj = , j = 1, 2, . . . , d,
∂xj
∂ k g (x)
g(1,2,...,k) = , k = 2, . . . , d.
∂x1 ∂x2 · · · ∂xk
One can easily verify the following equations in terms of the partial derivatives
of (2.1):
∂h (x)
= f (1) (g) g1 , (2.2)
∂x1
∂ 2 h (x)
= f (1) (g) g(1,2) + f (2) (g) g1 g2 . (2.3)
∂x1 ∂x2 + ,- . +,-.
(inclusive) (exclusive)
Observe that taking the derivative of (2.2), we obtain an inclusive and an exclusive
extension of index (1) of g1 and this happens when we take the next derivative
∂ 3 h (x)
= f (1) (g) g(1,2,3) + f (2) (g) g(1,2) g3 (2.4)
∂x1 ∂x2 ∂x3 + ,- . + ,- .
(inclusive) (exclusive)
+f (2)
(g) g(1,3) g2 + g1 g(2,3) + f (3) (g) g1 g2 g3
+ ,- . + ,- .
(inclusive) (exclusive)
= f (1) (g) g(1,2,3) + f (2) (g) g(1,3) g2 + g1 g(2,3) + g(1,2) g3 (2.5)
+f (3) (g) g1 g2 g3
as well. From the expressions (2.2), (2.3), and (2.5), we see that higher-order
partial derivatives can be obtained from lower-order derivatives using inclusive
and exclusive principle that we have introduced earlier; (see Sect. 1.4, p. 26).
2.1 Derivatives of Composite Functions 63
Tables 1.1, 1.2, p. 28 and 1.3, p.29 are useful when we derive expressions (2.2),
(2.3), and (2.5).
We notice that the evaluation of the partial derivatives of h follow an elegant
pattern, since the derivatives of h depend on the evaluation of the partial derivatives
of g in a special way. Let us look at the expressions (2.3) and (2.5). If we associate a
vector of one element [1] with g1 , then the partial derivative of g in (2.5) is obtained
from the vectors [1, 1], [1, 0], and [0, 1]. We conclude
(See Table 1.1). We can consider the above partial derivatives as partitioning the
set {1, 2} by two partitions K1 = {(1, 2)} and K2 = {(1) , (2)}, where partition K1
contains a single block with respect to the second-order partial derivative g(1,2) , and
K2 contains two blocks with respect to g1 and g2 . When we calculate the third-order
derivative, which is given by (2.5) via (2.4), we note that from every partition in (2.4)
we obtain both inclusive and exclusive partitions. The number of inclusive partitions
which are obtained at any stage n is equal to the number of blocks of partitions at
stage (n − 1) plus one exclusive partition. At stage 2 we have two partitions: K1 =
{(1, 2)} and K2 = {(1) , (2)}, the second partition has two blockstherefore at step 3,
(i.e. third-order partial derivatives), we will have two inclusive g(1,3) g2 , g1 g(2,3)
and one exclusive g1 g2 g3 partition (see Table 1.2, p. 28) according to K2 . We will
have one inclusive g(1,2,3) and one exclusive partition g(1,2)g3 which correspond to
K1 . In other words, the partial derivatives obtained in (2.5) can be considered as
partitions of the set {1, 2, 3} into partitions K1 = {(1, 2, 3)}, K2 = {(2, 3) , (1)},
K3 = {(1, 3) , (2)}, K4 = {(1, 2) , (3)}, K5 = {(1) , (2) , (3)}. Partition K1 has one
block, the partitions K2 , K3 , K4 have two blocks each, while partition K5 has three
blocks, etc.
Remark 2.1 It is worth mentioning that the each partition Kj above is in a canonical
form.
Now we state and prove a general result for the derivatives of a compound function.
Lemma 2.1 (Faà di Bruno’s Formula) Let f and g be two scalar-valued func-
tions, f is a function of one variable and g is a function of variable x ∈ Rd . Let f
be continuously differentiable d times and g (·) be continuously differentiable by all
entries of its argument x. Let h (x) be a scalar-valued implicit function of x so that
h (x) = f (g (x)) .
64 2 The Tensor Derivative of Vector Functions
Then for n ≤ d,
∂ n h (x)
n ∂ |b| g (x)
= f (r) (g) , (2.6)
∂x1 ∂x2 · · · ∂xn j ∈b ∂xj
r=1 K{r} ∈Pn b∈K{r}
where K{r} denotes partitions with size |K| = r, and ∂ |b| g (x) / j ∈b ∂xj is the
|b|t h -order derivative of g (x) with respect to a block xb = xj , j ∈ b of x.
Before one proves the above result, the product and the summation symbols
in Eq. (2.6) need some explanation. The first product symbol b∈K{r} stands for
the product over all blocks# b which belong to a particular partition K{r} (see for
instance (2.5)). Summation K{r} ∈Pn is over all such possible partitions K{r} which
are generated from the set (1, 2, . . . , n) and in which the number of blocks is r. For
example in the case when n = 3, (2.5) shows that we have one partition K{1} for
r = 1; three partitions K{2},1 , K{2},2 , K{2},3 each with two blocks for r = 2 , and
one partition K{3} = {(1) , (2) , (3)} with three blocks when r = 3.
The proof of Lemma 2.1 is given in the Appendix on p. 98.
We have stated Faà di Bruno’s Lemma for the first-order derivatives of the
function h by each of its variable xj . We will show below that this is not really
a restriction since all higher-order derivatives can be derived from this one. If the
function g is of one variable and we are interested, say, in the third-order derivative
g (3) , we may proceed as follows. Let us define a function g of three variables by
g (x1 , x2 , x3 ) = g (x1 + x2 + x3 ) ,
This idea leads us to the distinct values principle, and it will be used for mixed
higher-order derivatives as well. We recognize that the partial derivatives allow us
to equate some variables after the derivative has been taken. The general formula
(2.6) for a compound function of one variable will have the following simple form.
Corollary 2.1 (Faà di Bruno’s Formula of One Variable) If h (x) = f (g (x)),
then the nth-order derivative of h can be obtained from (2.6) and is given by
∂n
n d |b|
f (g (x)) = f (r) (g (x)) g (x) .
∂x n b∈K{r} dx |b|
r=1 K{r} ∈Pn
We see that the derivative d |b| /dx |b| g (x) depends only on the cardinality of block
b. Hence we can collect blocks with the same cardinality j . This allows us to use
2.1 Derivatives of Composite Functions 65
the idea of type = (1 , . . . , n ) of a partition K{r} , (see Definition 1.4, p. 32). We
recall that j is the number of blocks having cardinality j , j = 1, 2, . . . , n, in a
partition K ∈ Pn . We have that j ≥ 0, and
n−r+1
n−r+1
n= j j , j = r.
j =1 j =1
Now, since we are given the number of all such partitions, (see (1.41), p. 32),
therefore
n
n−r+1 j
∂n 1 g (j ) (x)
f (g (x)) = f (r) (g (x)) n! .
∂x n j ! j!
r=1 j =r;j j =n j =1
(2.7)
Later we shall prove Theorems 3.1 and 3.3 as an application of the formula (2.6).
Besides many other uses of this formula (2.6), we will now consider two particular
cases which will help us to prove some properties of cumulants.
Corollary 2.2 Let f (x) = ln x, and recognize that the rth-order derivative of
f (x) = ln x is
so for the compound function ln (g (x)) we have the derivative ∂x1 ∂x2 . . . ∂xn given
by
∂n
n ∂ |b|
ln (g (x)) = (−1)r−1 (r − 1)!g −r (x) g (x) ,
∂x1:n j ∈b ∂xj
r=1 K{r} ∈Pn b∈K{r}
(2.8)
where ∂x1:n = ∂x1 ∂x2 . . . ∂xn , |b| denotes the number of the elements in block b
and the second summation extends over all the partitions K{r} of Pn with size r (see
(2.5) for n = 3). Again, if g is of one variable then we get
j
∂ n ln (g (x))
n
n−r+1
1 g (j ) (x)
= (−1)r−1 (r − 1)!g −r (x) n! ,
∂x n j ! j!
r=1 j =r, j j =n j =1
(2.9)
by (2.7), since the derivative ∂ |b| / j ∈b ∂xj depends only on the cardinality |b| of
the block b, and the number of all partitions with type = (1 , . . . , n ), are given
by (1.41).
66 2 The Tensor Derivative of Vector Functions
n
n−r+1 j
∂ n exp (g (x)) 1 g (j ) (x)
= exp (g (x)) n! ,
∂x n j ! j!
r=1 j =r, j j =n j =1
(2.11)
n
n−r+1 j
1 1
0 = n! (−1)r−1 (r − 1)! , (2.12)
j ! j!
r=1 j =r, j j =n j =1
where h0 = 1, and hn is the expression (2.11) at zero; at the same time hn is the
Bell polynomial of g (1) , g (2) , . . . , g (n) , (see (1.44), p. 34), i.e.
n n!
hn = hn g (1) (0) , g (2) (0) , . . . , g (n) (0) = n−r+1 .
r=1 j =r, j j =n j =1 j ! (j !)j
n n!
Bn = n−r+1 .
r=1 j =r, j j =n j =1 j ! (j !)j
(see (1.39), p. 31), those are the numbers of the elements of set Pn . It is seen
therefore that H (x) is the generating function of Bell numbers Bn , i.e.
∞
xn
H (x) = exp (exp (x) − 1) = Bn ,
n!
n=0
where B0 = 1.
Faà di Bruno’s formula (2.6) is general, since we can apply the distinct values princi-
ple and can obtain any higher-order derivatives. Nevertheless, some explicit formula
might be useful for mixed higher-order derivatives, like e.g. ∂ j +k /∂x1 ∂ k x2 f (g).
j
∂ |j|
Dj g (x) = g (x) ,
∂xj
we observe that the dimension of j is determined by the dimension of x. Moreover an
index j may contain zeros at places which are not included in the partial derivatives.
Now, we generalize the way by which partitions are used to derive Faà di Bruno’s
formula of one variable (2.7). In the case of one variable we identified blocks of a
partition K{r} ∈ Pn only by the cardinality. Thus partitions K{r} with the same type
provide the same partial derivative
∂ |b| g (x)
n−r+1 j
= g (j ) (x) .
b∈K{r} j ∈b ∂xj j =1
68 2 The Tensor Derivative of Vector Functions
where n = (n1 , n2 ), and 0 ≤ nk are integers. We use the distinct values principle
similar to the one variable case. Consider partial derivatives ∂ n /∂y1 ∂y2 . . . ∂yn , with
n = |n| = n1 + n2 , and identify the first n1 members among the variables y1:n by x1
and the rest of the n2 members by x2 . Thus each block bk of a partition K{r} ∈ Pn
will be characterized by a pair jk,1 , jk,2 = jk , where jk,1 denotes the number of
the elements from set 1 : n1 and jk,2 from the set (n1 + 1) : n2 , respectively. Each
of them is included in the block bk . The possible values of jk,m are 0, 1, . . . , nm ,
m = 1, 2, and the maximum number of such pairs is K = (n1 + 1) (n2 + 1) − 1,
since each block contains at least one element and there is no block at all with a pair
(0, 0). We use the lexicographic order of the pairs j1:K , containing all possible pairs
which can occur in any partition K{r} ∈ Pn ; an element jk of the complete list of all
pairs j1:K will be called d−dimensional cardinality of a block b, (d = 2 right now).
In this situation the type = 1:K of a partition K{r} with cardinality r means
that pair jk shows up k times in K{r} . Since we list k for all possible jk , the type
= 1:K of a partition K{r} contains quite a number of zeros. Each element of the
sets 1 : n1 and (n1 + 1) : n2 occurs exactly once in a partition K{r} , so that the
following equations:
K
K
jk,1 k = n1 , jk,2 k = n2 ,
k=1 k=1
K
k = r.
j =1
K
K
jk k = n, k = r,
k=1 j =1
∂ |n|
n K
1 Djk g k
f (g) = (r)
f (g) n! , (2.13)
∂x1n1 ∂x2n2 r=1
j ! jk !
p(r,,j)
k=1
where set p (r, , j) = p (r, (1 , . . . , K ) , (j1 , . . . , jK )), is defined by the constraint
K
K
K
jk,1 k = n1 , jk,2 k = n2 , k = r.
k=1 k=1 j =1
The following example shows the usage of the formula (2.13) for a higher-order
mixed derivative.
Example 2.3 Suppose we are interested in the mixed derivative ∂ 3 /∂x12 ∂x2 of
h (x) = f (g (x1 , x2 )). We have n1 = 2, n2 = 1, and n = 2 + 1 = 3. We
consider all partitions of P3 ; see Table 1.2, and change 3 by 1 everywhere in
the Table. For instance, partition K = {b1 = (1, 3) , b2 = (2)} is transformed to
K = {b1 = (1, 1) , b2 = (2)}. Next, we observe that a block can be characterized
by two numbers (j1 , j2 ), showing that the block contains j1 times the element 1 and
j2 times the element 2, such that j1 = 0, 1, 2, and j2 = 0, 1. Then (j1 , j2 ) = (2, 0)
for block b1 = (1, 1) and (0, 1) for block b2 = (2), respectively. The number of
all possible pairs (j1 , j2 ) of all blocks is K = 3 · 2 − 1. We enumerate the pairs
jk , k = 1 : 5, by j1 = (0, 1), j2 = (1, 0), j3 = (1, 1), j4 = (2, 0), j5 = (2, 1).
If r = 1 then the only partition is K = {b = (1, 2, 1)} and j = j5 = (2, 1), with
type 1:4 = 0, 5 = 1. If r = 2 then we have two partitions; one is the type
1 = 1, 4 = 1, and the pairs are j1 = (0, 1), j4 = (2, 0), the other one is the
type 2 = 1, 3 = 1 and j2 = (1, 0), and j3 = (1, 1). Finally, if r = 3, then
K = {b1 = (1) , b2 = (2) , b3 = (1)}, so that j1 = (0, 1), and j2 = (1, 0), which
corresponds to 1 = 1, 2 = 2. We use (2.13) and obtain
∂3
D2,1 f (g) = f (g)
∂x12∂x2
= f (1) (g) D2,1 g + f (2) (g) D2,0 gD0,1 g + 2D1,0 gD1,1 g
2
+ f (3) (g) D1,0 g D0,1 g.
The same result follows from (2.6), or more directly from (2.5), using the distinct
values principle, in fact we consider
∂ 3 h (y)
= f (1) (g) g(1,2,3) + f (2) (g) g(1,3) g2 + g1 g(2,3) + g(1,2) g3 + f (3) (g) g1 g2 g3 ,
∂y1 ∂y2 ∂y3
70 2 The Tensor Derivative of Vector Functions
∂3
h (x1 , x2 ) = f (1) (g) D2,1 g + f (2) (g) D2,0 gD0,1 g + 2D1,0 gD1,1 g
∂x12 ∂x2
2
+f (3) (g) D1,0 g D0,1 g.
Faà di Bruno’s formula for general mixed derivatives also follows in a similar
manner.
Theorem
2.1 Let n = (n1 , . . . , nd ), |n| = nk , K = (nk + 1) − 1, 0 < jk =
jk,1 , . . . , jk,d ≤ n, k = 1 : K, 0 ≤ k , and k = 1 : K then we have
|n|
∂ |n|
K
1 Djk g k
f (g) = f (r)
(g) n! ,
∂x1n1 · · · ∂xdnd r=1 p(r,,j) k=1
k ! jk !
K
K
jk,s k = ns , s = 1 : d, k = r.
k=1 j =1
2.2 T-derivative
We will now discuss the basic differentiation rules when taking partial derivatives
as a tensor product for vector-valued functions. This simple technique will be handy
for dealing with several multivariate nonlinear statistical problems.
For x ∈ Rd , let the vector-valued function f (x) = [f1 (x) , f2 (x) , . . . , fm (x)] be
differentiable as many times as is necessary. Let us introduce the notation for partial
derivatives
! "
∂ ∂ ∂ ∂
= , ,..., ,
∂x ∂x1 ∂x2 ∂xd
2.2 T-derivative 71
This notation is in accordance with the matrix product of a column vector f and a
row vector operator ∂/∂x , i.e.
⎡ ∂f1 ∂f1 ∂f1 ⎤
∂x1 ∂x2 ··· ∂xd
! " ⎢ .. ⎥
⎢ ∂f2 . . . ⎥
∂ ∂ ∂ ⎢ . ⎥
f , ,..., = ⎢ ∂x1 ⎥ .
∂x1 ∂x2 ∂xd ⎢ .. .. .. ⎥
⎣ . . . ⎦
∂fm ∂fm
∂x1 · · · · · · ∂xd m×d
The Jacobian matrix with respect to the partial derivative Dx f =∂f/∂x is well
known. We will use the Magnus–Neudecker approach where the justification of the
differentiation rules, which we are going to derive below, is based on the theory of
partial derivatives and differentials. The First Identification Theorem provides the
connection between the differential df, and the derivative Dx f of a vector function,
namely
If the differential df of a function f is given, then the value of the partials can be
immediately determined. The rules for obtaining differentials are commonly used
for ordinary derivatives, so it is convenient to obtain the derivative (for instance the
Jacobian matrix) from the differential.
We provide three examples to illustrate this idea.
Cauchy’s rule An instance is the derivative of a composite function using
Cauchy’s rule of invariance. The formula of Cauchy’s rule of invariance states
that if f ∈ Rm1 , g ∈ Rm2 , x ∈ Rd , and h (x) = f (g (x)), then
This clearly implies the chain rule using the First Identification Theorem
using (1.8), p. 7. Therefore the partial derivative, i.e. the Jacobian matrix follows
The difference of a three-tensor product follows directly from the previous case
Dx (f1 ⊗ f2 ⊗ f3 ) = Dx f1 ⊗f2 ⊗f3 +f1 ⊗(Dx f2 )⊗f3 +f1 ⊗f2 ⊗(Dx f3 ) . (2.18)
More generally the Jacobian matrix of the tensor product for vector-valued
functions is given by the following:
Lemma 2.3
⊗
M
⊗
⊗
Dx fk (x) = fk ⊗ Dx fj (x) ⊗ fk . (2.19)
1:M 1:j −1 j +1:M
j =1
We note that for a scalar-valued function f , the T-derivative coincides with the
gradient of f at a point x. Recall the difference vecA = vec (A ) and vec A =
(vecA) .
The main difference between the vectorized Dx and Dx⊗ is that Dx⊗ is strongly
connected to the T-product and results in a vector which is formally a T-product of
function f and the partial derivative ∂/∂x. In practice Dx⊗ is the following T-product
∂ ∂ ∂
Dx⊗ f = vec f = vec f = f ⊗ . (2.20)
∂x ∂x ∂x
or using (2.20)
∂
Dx⊗ Ax = (Ax) ⊗ = (A ⊗ Id ) vecId = vecA , (2.21)
∂x
(see (1.8), p. 7 for the last equation).
We provide some properties of Dx⊗ , which will frequently be used for different
applications.
Property 2.1 Let g (x) = Af (x), where x ∈ Rd , f (x) ∈ Rm2 , and A is m1 × m2 ,
then we obtain
In fact, we have
∂ ∂
Dx⊗ g = Af ⊗ = (A ⊗ Id ) f ⊗ = (A ⊗ Id ) Dx⊗ f.
∂x ∂x
We use the chain rule (2.14) for the Jacobian to prove this
*
∂ **
Dx f (Ax) = f * A,
∂y y=Ax
Some simple rules for T-derivatives follow directly from the definition and the above
properties of partial derivatives.
Sum rule Let f ∈ Rm , and g ∈ Rm be functions of variable x ∈ Rd , then
Dx⊗ (f + g) = Dx⊗ f + D⊗
x g.
Here we need some more explanation of, say, Dx⊗ h|g=const . We shall see in
Lemma 2.4 that the derivative of compound functions cannot be decomposed.
It is not a simple product of the derivative of h multiplied by the derivative of f
as is usual in the case of compound functions.
We proceed with the T-derivative of compound functions, which is more general
than Property 2.2 and is usually referred to as the chain rule.
Let f ∈ Rm1 , g ∈ Rm2 , x ∈ Rd , and y ∈ Rm2 then for the function h (x) =
f (g (x)) we have the chain rule (2.14) for partial derivatives as follows:
Dx h (x) = Dy f (y) m (Dx g (x))m2 ×d ,
1 ×m2
where y = g (x). We apply Lemma 1.1, p. 10, where we have shown for an m2 × m1
matrix A and a p × m2 matrix B that
vec (BA) = vec A ⊗ Im1 p Im1 ⊗ Km2 •m1 ⊗ Ip vecIm1 ⊗ vecB
= vec B ⊗ Im1 p Im2 ⊗ Kp•m1 ⊗ Ip vecA ⊗ vecIp .
Now we replace Am2 ×m1 by Dy f , and Bd×m2 by (Dx g) and obtain
Dx⊗ h = vec (Dx h) = vec (Dx g) Dy f = Dy f ⊗ Id Dx⊗ g = Im1 ⊗ (Dx g) Dy⊗ f,
(see (1.19), p. 9). The second line of expression vec (BA) implies a similar result as
well.
Thus we have obtained the following Lemma.
Lemma 2.4 (Chain Rule) Let f ∈ Rm1 , g ∈ Rm2 , x ∈ Rd , and y ∈ Rm2 , then the
chain rule for the compound function h (x) = f (g (x)) is
Dx⊗ h = Dy f ⊗ Id Dx⊗ g = Dy⊗ f ⊗ Im1 d K(1324) (m1 , m1 , m2 , d) vecIm1 ⊗ Dx⊗ g ,
(2.24)
where
where
The general form of the chain rule is not very convenient. When we express the
T-derivative of a composite function in terms of T-derivatives of the compositions
(second rows of (2.24) and (2.25)). There are cases when using Jacobian matrices,
either Dy f or Dx g (first rows of (2.24) and (2.25)) simplifies the result. One can
recognize the ordinary chain rule using the product in (2.24) and obtaining Dx⊗ h =
Dy f Dx⊗ g, (see (1.60), p. 51 for the star product).
The following particular cases have important applications:
1. If h is scalar-valued h = h, i.e. f = f , m1 = 1, then
Dx⊗ h = Dy⊗ f ⊗ Id Dx⊗ g, (2.26)
or in another form
where denotes the star product, see (1.60), p. 51. Using the star product, the
ordinary chain rule appears here.
2. If g is scalar-valued, i.e. m2 = 1, g = g, then Dy⊗ = ∂/∂y and
Dx⊗ h = Im1 ⊗ Dx⊗ g Dy⊗ f,
hence
(see (1.22)).
4. If both f and g are scalar-valued, i.e. m1 = 1, m2 = 1, then Dx⊗ = ∂/∂x and
Dy⊗ = ∂/∂y and we obtain
Let us start with an example which shows the internal aspects of the usage of T-
derivative.
Example 2.5 The differential of x⊗2 is
where K(132) = Id ⊗ K(21) is the commutator for ∂/∂x and x in the product x⊗2 ⊗
∂/∂x. Note that the difference between x ⊗ (x ⊗ ∂/∂x) = x ⊗ Dx⊗ x, and x⊗2 ⊗
∂/∂x = Dx⊗ x⊗2 . The first term in (2.29) corresponds to (x ⊗ ∂/∂x) ⊗ x where we
differentiate the “first” term of x⊗x; doing so we need to interchange the second and
third term in a T-product a ⊗ a ⊗ b, this can be done with the help of a commutator
matrix as
a ⊗ a ⊗ b = K−1 −1
(132) K(132) (a ⊗ a ⊗ b) = K(132) (a ⊗ b ⊗ a) , (2.30)
We note the use of the inverse of commutator. Here and later on it is pointed out—
as in (2.30), that the result is reached by interchanging vectors back and forth. It is
worth noting that although the derivative is unique, the form of the right-hand side
(2.29) is not, since
x ⊗ vecId = K−1
(231) (vecId ⊗ x) .
We shall keep using both notations Dx⊗ and ⊗∂/∂x for the T-derivative. It is
important to note that although the tensor product is associative in general and
Dx⊗ f = f ⊗ ∂/∂x, nevertheless expressions in tensor products including ∂/∂x are
not associative any more. For instance (f1 ⊗ f2 ) ⊗ ∂/∂x = f1 ⊗ (f2 ⊗ ∂/∂x), since
(f1 ⊗ f2 ) ⊗ ∂/∂x = Dx⊗ (f1 ⊗ f2 ) and f1 ⊗ (f2 ⊗ ∂/∂x) = f1 ⊗ Dx⊗ f2 ; therefore,
bracketing is necessary for the correct calculus.
We proceed with some properties of the operator Dx⊗ which are frequently used
below.
Property 2.3 If x ∈ Rd , f1 ∈ Rm1 , and f2 ∈ Rm2 then
⊗
Dx⊗ (f1 ⊗ f2 ) = K−1 ⊗
(132) (m1 , m2 , d) Dx f1 ⊗ f2 + f1 ⊗ Dx f2 , (2.32)
where K−1
(132) (m1 , m2 , d) = K(132) (m1 , d, m2 ).
Proof To prove (2.32) first we consider the derivative
Dx (f1 ⊗ f2 ) = Dx f1 ⊗ f2 + f1 ⊗ Dx f2 ,
by (2.17) hence
Dx⊗ (f1 ⊗ f2 ) = vec (Dx (f1 ⊗ f2 )) = vec (Dx f1 ⊗ f2 ) + (f1 ⊗ Dx f2 ) .
and we have
vec f1 ⊗ (Dx f2 ) = f1 ⊗ Dx⊗ f2 ,
by (1.22) hence
∂ ∂
vec f1 ⊗ f + f ⊗ f 2 = Im1 ⊗ Km2 •d Dx⊗ f1 ⊗ f2 + f1 ⊗ Dx⊗ f2 .
∂x 2 1 ∂x
The commutator matrix Im1 ⊗Km2 •d changes the order of a triple T-product, namely
let a1 ∈ Rm1 , a2 ∈ Rm2 , and b ∈ Rd then
Im1 ⊗ Km2 •d (a1 ⊗ b ⊗ a2 ) = a1 ⊗ a2 ⊗ b,
the difference is that the tensor products do not commute, therefore we need to apply
a commutator before acts ∂/∂x.
Applying Dx⊗ on f1 ⊗ f2 yields two terms; one is
∂
f1 ⊗ ⊗ f2 = K(132) (m1 , m2 , d) (f1 ⊗ f2 ⊗ a)|a=∂/∂x ,
∂x
i.e. reversing the last two vectors in product f1 ⊗f2 ⊗a before applying the derivative
∂/∂x. Now we see the pattern
∂ ∂
(f1 ⊗ f2 ) ⊗ = f1 ⊗ f2 ⊗ + K−1
(132) (m1 , m2 , d)
∂x ∂x
K(132) (m1 , m2 , d) (f1 ⊗ f2 ⊗ a)|a=∂/∂x
⊗ −1 ∂
= f1 ⊗ Dx f2 + K(132) (m1 , m2 , d) f1 ⊗ ⊗ f2
∂x
⊗
= f1 ⊗ Dx⊗ f2 + K−1
(132) (m1 , m2 , d) Dx f1 ⊗ f2 .
Here we must remember that the inverse K−1 (132) (m1 , m2 , d) = K(132) (m1 , d, m2 )
of the commutator K(132) (m1 , m2 , d), and is equal to Im1 ⊗ Km2 •d , where the
dimensions are counted. Notice K−1 (132) (m1 , m2 , d) = K(132) (m1 , d, m2 ).
We derive the second term in (2.32); simply f1 ⊗(f2 ⊗ ∂/∂x) = f1 ⊗Dx⊗ f2 , hence
there is no need for a commutator to reach the derivative of f2 .
This implies the following rule of thumb: Applying the operator Dx⊗ for the first
function f1 in the product f1 ⊗ f2 , one needs to transform vector ∂/∂x just behind
vector f1 by the commutator K(132) (m1 , m2 , d), and it is necessary to use the inverse
K−1
(132) for maintaining equality and pretending to transform it back to the original
form.
There is another way of changing the orders of a tensor product (f1 ⊗ f2 ) ⊗ ∂/∂x
so that ∂/∂x be able to act on f1 , namely we can change the order of f1 and f2 . The
result is
∂
(f1 ⊗ f2 ) ⊗ = f1 ⊗ Dx⊗ f2 + K−1
(213) (m1 , m2 , d)
∂x
K(213) (m1 , m2 , d) (f1 ⊗ f2 ⊗ a)|a=∂/∂x
∂
= f1 ⊗ Dx⊗ f2 + K−1
(213) (m 1 , m 2 , d) f 2 ⊗ f 1 ⊗
∂x
= f1 ⊗ Dx⊗ f2 + K−1
(213) (m1 , m2 , d) f2 ⊗ Dx f1 .
⊗
Naturally
⊗
K−1 −1 ⊗
(132) (m1 , m2 , d) Dx f1 ⊗ f2 = K(213) (m1 , m2 , d) f2 ⊗ Dx f1 ,
80 2 The Tensor Derivative of Vector Functions
and either one is correct. Again the general pattern of (2.32) for T-products is similar
to the common differential of products.
Dx (f1 ⊗ f2 ⊗ f3 ) = Dx f1 ⊗ f2 ⊗ f3 + f1 ⊗ Dx f2 ⊗ f3 + f1 ⊗ f2 ⊗ Dx f3 , (2.33)
see Lemma 2.3, and then follow the definition of Dx⊗ . That is, we need to vectorize
the transposition of the terms one by one. Doing so, we have
vec (Dx f1 ) ⊗ f2 ⊗ f3 = Im1 ⊗ Km2 m3 •d vec (Dx f1 ) ⊗ f2 ⊗ f3
= K−1
(1423) (m1:3 , d) vec (Dx f1 ) ⊗ f2 ⊗ f3
for the first term in (2.33), where we used (1.21), p. 9. Again, we applied ⊗∂/∂x
on the first term f1 in the product, and to be able to do so, we have to use the
commutator K(1423) (m1:3 , d) and its inverse K−1 (1423) (m1:3 , d), which restores the
product to the original form, to obtain ∂/∂x in the second place from the last one in
(f1 ⊗ f2 ⊗ f3 ) ⊗ ∂/∂x.
Similarly, we interchange ∂/∂x and f3 , for the second term in (2.33), by the
commutator K(1243) (m1:3 , d), and after we have Dx f2 we restore it by the inverse
commutator K−1 (1243) (m1:3 , d). The last term in (2.33) is the simplest one
∂
f1 ⊗ f2 ⊗ f3 ⊗ = f1 ⊗ f2 ⊗ Dx⊗ f3 ,
∂x
Method 2: We can apply the result 2.3 to the 3−product in two steps:
+ K−1 ⊗
(1243) (m1:3 , d) f1 ⊗ Dx f2 ⊗ f3 .
f1 ⊗ f2 ⊗ f3 ⊗ a = K−1
(2314) (m1:3 , d) K(2314) (m1:3 , d) (f1 ⊗ f2 ⊗ f3 ⊗ a)
= K−1
(2314) (m1:3 , d) (f2 ⊗ f3 ⊗ (f1 ⊗ a)) .
⊗ ⊗
1:(k−1) f j ⊗ f k (x) ⊗ (k+1):M f j ⊗ ∂/∂x, so the corresponding permutation
is (1 : k − 1, k + 1 : M, k, M + 1) = (k : M)S .
Lemma 2.5 The general result, with respect to (2.32), is the following: if fk (x) ∈
Rmk , k = 1, 2, . . . , M, then
⊗
M
Dx⊗ fj (x) = K−1
(M+1:k)S (m1:M , d)
1:M
k=1
⊗
⊗
fj ⊗ Dx⊗ fk (x) ⊗ fj , (2.35)
1:(k−1)∗ (k+1:M)∗
where we put the empty set for both (1 : 0)∗ = ∅, and (M + 1 : M)∗ = ∅; these
are the first and last terms in the sum. Another form of the derivative is
⊗
M ⊗
Dx⊗ fj (x) = K−1
(k:M)S (m1:M , d) fj ⊗ Dx⊗ fj (x) .
1:M 1:M,j =k
k=1
(2.36)
Proof We use Property 2.3 with induction and the result of Lemma 1.2.
Example 2.6 We consider a particular case
f (x) = x⊗M ,
of (2.35), where all functions are equal fj (x) = x, and all dimensions mj are the
same mj = d. We apply (2.36) and realize that moving the kth term to the Mth
place is equivalent to the commutator Lk = Id k−1 ⊗ Kd M−k •d ⊗ Id and L−1 k =
Id k−1 ⊗ Kd•d M−k ⊗ Id . This implies that the corresponding partition to commutator
Id k−1 ⊗ Kd M−k •d has type 1 = 1, M−1 = 1, j = 0 otherwise. Therefore we can
write
M
Id k−1 ⊗ Kd•d M−k ⊗ Id = L−1
1M−1 ,11 ⊗ Id ,
k=1
In particular
Dx⊗ x⊗3 = K−1
(2314) x
⊗2 ⊗ vecI −1
d + K(1324) x
⊗2 ⊗ vecI
d +x
⊗2 ⊗ vecI
d
= Id 4 + K−1(1324) + K−1
(2314) x⊗2 ⊗ vecI
d = L −1
1 ,1 ⊗ Id x⊗2 ⊗ vecI .
d
2 1
⊗k
∂
Dx⊗k f = Dx⊗ ⊗k−1
Dx f =f⊗
∂x
! "⊗k
∂ ∂ ∂
= [f1 (x) , f2 (x) , . . . , fm (x)] ⊗ , ,..., ,
∂x1 ∂x2 ∂xd
the result is a column vector of order md k , containing all possible partial deriva-
tives of entries of f with order d with respect to the T-product of partials
[∂/∂x1 , ∂/∂x2 , . . . , ∂/∂xd ]⊗k .
A simpler version of the T-derivative of higher orders is when functions are
scalar-valued, and it will be applied several times.
Let f and g be scalar-valued functions of x ∈ Rd then the first four derivatives
are as follows:
First order The T-derivative of the product fg is
Second order Let us consider the second-order T-derivative, (the scalar com-
mutes with vectors), hence the first term is
⊗
Dx⊗ f Dx⊗ g = f Dx⊗2 g + Dx⊗ g ⊗ Dx⊗ f = f Dx⊗2 g + K−1 ⊗
(21) Dx f ⊗ Dx g ,
84 2 The Tensor Derivative of Vector Functions
Third order Now, for the third-order T-derivative we apply Dx⊗ to the second
derivative (2.37) term by term; an example is the derivative of the second term of
(2.37), which has two terms, the first one is
Dx⊗ Dx⊗ f ⊗ Dx⊗ g = Dx⊗ f ⊗ Dx⊗2 g + K−1 ⊗2
(132) Dx f ⊗ Dx g
⊗
= K−1
(231) Dx
⊗2
g ⊗ Dx
⊗
f + K −1
(132) Dx
⊗2
f ⊗ Dx
⊗
g ,
−1
since K−1 −1
(213) K(132) = K(132) K(213) = K−1(231) .
The rest of the terms of (2.37) are simple, so we obtain
Dx⊗3 (fg) = f Dx⊗3 g + Id 3 + K−1
(231) + K −1
(132)
Dx⊗2 f ⊗ Dx⊗ g + Dx⊗2 g ⊗ Dx⊗ f + gDx⊗3 f. (2.38)
Notice that the left-hand sides in all cases above are q-symmetric, see Sect. 1.3.1,
p. 13. Let us consider the third-order case, the left-hand side of (2.38) is
and now apply the symmetrizer Sd13 to both sides and obtain
since changing the order of operator Dx⊗ results in no change. The first two terms
on right-hand side of (2.38) are 3-symmetric too, the coefficients of the last term are
2.2 T-derivative 85
permutation matrices; if we apply the symmetrizer the result is simply changing the
order in the sum so that
Sd13 Id 3 + K−1
(231) + K −1
(132) = 3Sd13 .
This looks much closer formally to the ordinary derivatives of a product of two
functions.
Fourth order Similarly to the third-order the fourth-order derivative is written
Dx⊗4 (fg) = Sd14 f Dx⊗4 g + g Dx⊗4 f
+4 Dx⊗3 f ⊗ Dx⊗ g + Dx⊗3 g ⊗ Dx⊗ f
+3 Dx⊗2 f ⊗ Dx⊗2 g + Dx⊗2 g ⊗ Dx⊗2 f
4
4
= Sd14 Dx⊗k f ⊗ Dx⊗4−k g ,
k
k=0
m ⊗ ⊗kj
Dx⊗m fj = Sd1m Dx fj . (2.40)
k1:n j
j =1:n k1:n =m
We generalize the above result (2.40) for higher-order derivatives of the tensor
product of vector-valued functions when n = 2. Naturally the distinct values
principle then can be applied obtaining formulae like (2.40). One can easily show
that if n = 2; then the result of Lemma 2.6 is a particular case of Lemma 2.7.
86 2 The Tensor Derivative of Vector Functions
⊗
+ f1 ⊗ D1,2 f2 + K−1 ⊗ ⊗
(1423) D2 f1 ⊗ D1 f2
⊗
= f1 ⊗ D1,2 f2 + K−1 ⊗ ⊗ −1 ⊗ ⊗
(1324) D1 f1 ⊗ D2 f2 + K(1423)D2 f1 ⊗ D1 f2
+ K−1 ⊗
(1342) D1,2 f1 ⊗ f2 ,
where K−1 −1
(1342) = K(1342) (m1 , m2 , d1 , d2 ) by default notation. We argue as follows:
⊗
we used D2 on the first term of (2.41) as
⊗ −1
D2⊗ K−1
(132) (m 1 , m 2 , d 1 ) D1 f 1 ⊗ f 2 = K(132) (m 1 , m 2 , d 1 ) ⊗ I d2 D2⊗ D1⊗ f1 ⊗ f2
= K−1 −1 ⊗ ⊗
(1324) K(1342) (m1 , d1 , m2 , d2 ) D1,2 f1 ⊗ f2 + D1 f1 ⊗ D2 f2
⊗
= K−1 −1 ⊗ −1
(1324) K(1243) (m1 , d1 , m2 , d2 ) D1,2 f1 ⊗ f2 + K(1324) D1 f1 ⊗ D2 f2
⊗ ⊗
since the consecutive application of commutators acts on vectors with orders defined
by the second commutator, therefore we obtain the derivative of the first term
⊗ ⊗ ⊗
D2⊗ K−1 −1 ⊗2 −1
(132) (m1 , m2 , d1 ) D1 f1 ⊗ f2 = K(1342) D1,2 f1 ⊗ f2 + K(1324) D1 f1 ⊗ D2 f2 .
⊗
We combine the two terms above and arrive at the derivative D1:2 (f1 ⊗ f2 ) above.
A similar argument leads to the third-order derivative
⊗ ⊗
D1:3 (f1 ⊗ f2 ) = f1 ⊗ D1:3 f2 + K−1 ⊗ ⊗
(15234) D3 f1 ⊗ D1,2 f2
+ K−1 ⊗ ⊗ −1 ⊗ ⊗
(13245) D1 f1 ⊗ D2,3 f2 + K(13524)D1,3 f1 ⊗ D2 f2
+ K−1 ⊗ ⊗ −1 ⊗ ⊗
(14235) D2 f1 ⊗ D1,3 f2 + K(14523)D2,3 f1 ⊗ D1 f2
⊗
+ K−1 ⊗ ⊗ −1
(13425) D1,2 f1 ⊗ D3 f2 + K(1342) D1:3 f1 ⊗ f2
= K−1 ⊗ ⊗
p(b) Db f1 ⊗ Dbc f2 ,
b
where the summation is taken over all possible blocks b of the set (1 : n) and its
complement bc , including the empty set as well. The summation contains 2n terms,
each block is written in increasing order of its elements; the permutation p (b) of
(1 : n + 2) is defined by (1, b + 2, 2, bc + 2).
88 2 The Tensor Derivative of Vector Functions
We proceed with the second-order derivative Dx⊗2 (f1 ⊗ f2 ), when both functions
depend on the single variable x; we use (2.42) and the distinct values principle, then
substitute x in each xj . It is seen then
Dx⊗2 (f1 ⊗ f2 ) = K−1
(1342) Dx
⊗2
f 1 ⊗ f 2 + K −1
(1324) + K −1
(1423)
⊗
Dx f1 ⊗ Dx⊗ f2 + f1 ⊗ Dx⊗2 f2 . (2.43)
We have seen the derivative of scalar-valued functions, like the product rule 2.2.3
which provides the T-derivative of the inner product f g, as
Dx⊗ f g = f ⊗ Id Dx⊗ g + g ⊗ Id Dx⊗ f. (2.45)
A simple case of (2.45) is a x where a is constant. Since the Taylor series expansion
of a characteristic function contains terms like (a x)k , obtaining the higher-order T-
derivatives of (a x)k is of some interest.
Lemma 2.8 Let a be a constant vector of dimension d, then
k
Dx⊗k a x = k!a⊗k . (2.46)
Proof The formula (2.46) can be proved directly. We observe that (a x)k =
a⊗k x⊗k , then for k = 1, we have
Dx⊗ a x = a ⊗ Id vecId = a. (2.47)
hence
k k−1 ⊗ k−1
Dx⊗ a x = k a x Dx a x = k a x a (2.48)
We replace the power on the left-hand side by the tensor product, resulting in
We consider the T-differential operator Dx⊗ , then we split the vector x into sub-
vectors of possibly different dimensions [d1 , d2 , . . . , dn ], in the same way we take
a list of vectors [x1 , x2 , . . . , xn ] and the derivatives with respect to the sub-vector
⊗
entries. We denote this operator by Dx⊗1:n , or D1:n , for short. These sub-vectors will
have the role of scalars in Faà di Bruno’s formula (2.6), and the T-derivative Dx⊗1:n
will be employed instead of partial derivatives.
We can permute the vectors in a T-product with the help of commutators, which
leads to the following.
Proposition 2.1 Let p be a permutation of (1 : n), xp = xp(1:n) and the function
f (x1:n ) ∈ Rm be continuously differentiable n times by any of its variables, then
Im ⊗ K−1
p Dx⊗p f = Dx⊗1:n f,
equivalently
Dx⊗p f = Im ⊗ Kp Dx⊗1:n f,
note Im ⊗ K−1 −1
p d[1:n] = Kpn+1 (m, d1:n ) ,where pn+1 = (1, p + 1).
Let us repeat the basic notions which will be used in the next theorem; Pn is
the set of all partitions K of the numbers (1 : n). A partition K with cardinality
|K| = k and type is in canonical form if the blocks {b1 , b2 , . . . , bk } of K are in
order defined by its type . Ordering bj ≤ bk of the blocks bj ∈ K with same type
is defined by
2−m ≤ 2−m , (2.50)
m∈bj m∈bi
2.3 Multi-Variable Faà di Bruno’s Formula 91
and equality in (2.50) is possible if and only if j = i, (see Definition 1.7 and (1.45),
p. 35). In addition elements in each block are increasing.
Theorem 2.2 (Faà di Bruno’s Formula) Let us take a list of vectors x1:n =
(x1 , x2 , . . . , xn ) with dimensions d1:n = (d1 , d2 , . . . , dn ). Let the implicit function
f (g (x)), x ∈ Rdk , be differentiable dk times where f and g are scalar-valued
functions. Then
n ⊗
Dx⊗1:n f (g (x1:n )) = f (r) (g (x1:n )) K−1 (d )
p(K{r} ) 1:n
Dx⊗b g (x1:n ) ,
b∈K{r}
r=1 K{r} ∈Pn
(2.51)
where
{r} denotes partitions with size |K| = r, K{r} is in the canonical form, and
K
p K{r} is a permutation of 1 : n defined by K{r} , (see Sect. 1.4.4), and Dx⊗b is a
|b|t h -order T-derivative.
See the Appendix 2.4.2, p. 99, for the proof.
The summation in (2.51), which is over all partitions K{r} with size r, can
be divided such that collecting partitions K{r} with the same type when size r is
given. Then we sum over with respect to types first. Let us recall that the type of
partition K is = (1 , . . . , n ), when K contains exactly j blocs with cardinality
j , (see Sect. 1.4.4). The number and structure of partitions with the same type
follow incomplete Bell polynomials; therefore, collecting all partitions with fixed
cardinality, say r, can be controlled. Namely, one can obtain both the number of
partitions, which is N and the structure of a partition K with type , from the
incomplete Bell polynomial
n−r+1
Bn,r (x1 , . . . , xn−r+1 ) = N xj j , (2.52)
j =r,j j =n j =1
where
n!
N = n , (2.53)
j =1 j ! (j !)
j
and where xj j tells us that the partition has j blocs with cardinality j . We identify
xj in (2.52) with j th-order derivative and j is the number of the T-products of these
derivatives. An instance is if n = 4 then x22 corresponds to Dx⊗i ,xj ⊗ Dx⊗k ,xm , where
we have second-order derivatives with respect to blocs b1 = (i, j ) and b2 = (k, m)
of a partition K{2} and the number of all partitions with such type = (0, 2, 0, 0)
is N = 3. Let us consider partitions with cardinality r, then we obtain the possible
types , fulfilling j = r, and j j = n, the exponents in n−r+1 j =1 xj j show the
orders of the derivatives, and the number of those partitions defined by type is N ,
see Example 1.20. A list of Bell polynomials can be found in Sect. A.1, p. 351.
92 2 The Tensor Derivative of Vector Functions
type = 1:2 = (2, 0), and the corresponding derivative is g1⊗ ⊗ g2⊗ .
Differentiating (2.54) with respect to x2 , we obtain
⊗ ⊗
D1,2 f (g) = f (1) (g) g(1,2) + f (2) (g) g1⊗ ⊗ g2⊗ , (2.55)
giving a good agreement with the incomplete Bell polynomials. The formula (2.51)
is true for n = 2.
Case 2.3 (n = 3) Here the possible r = 1 : 3,
1. If r = 1, B3,1 (x1:3 ) = x3 , 1:3 = (0, 0, 1), the only partition K{1} = {(1, 2, 3)}
⊗
and the derivative is g(1,2,3) .
2. If r = 2, B3,2 (x1:3 ) = 3x1 x2 , = 1:3 = (1, 1, 0), the only type with coefficients
4!/2! = 3, partitions K{2},1 = {(1, 2) , (3)}, K{2},2 = {(1, 3) , (2)}, K{2},3 =
⊗
{(2, 3) , (1)}, the corresponding derivatives are g(1,2) ⊗ g3⊗ , g(1,3)
⊗
⊗ g2⊗ , g(2,3)
⊗
⊗
⊗
g1 . Notice that blocks are in canonical form. Now the order of indices of the term
⊗
g(1,3) ⊗ g2⊗ are different from the starting one which is 1, 2, 3; the corresponding
permutation is (13|2), hence we apply the commutator K−1 (132) .
3. If r = 3, B3,3 (x1:3 ) = x13 , = 1:3 = (3, 0, 0), one partition K{3} =
{(1) , (2) , (3)}, with the derivative g1⊗ ⊗ g2⊗ ⊗ g3⊗ .
2.3 Multi-Variable Faà di Bruno’s Formula 93
Now to check the above steps, we differentiate (2.55) by x3 , we will see the
general pattern appears alone,
⊗ ⊗ ⊗
D1,2,3 f (g) = f (1) (g) g(1,2,3) + f (2) (g) g(1,2) ⊗ g3⊗ + K−1 ⊗
(132) g(1,3) ⊗ g2
⊗
+g1⊗ ⊗ g(2,3)
⊗
(2.56)
+K−1 ⊗ ⊗
(231) g(2,3) ⊗ g1
3. If r = 3; B4,3 (x1:2 ) = 6x12x2 ; = (2, 1, 0, 0); K3,1 = {(1) , (2) , (3, 4)}, K3,2 =
{(1, 2) , (3) , (4)}, K3,3 = {(1, 3) , (2) , (4)}, K3,4 = {(1, 4) , (2) , (3)}, K3,5 =
{(1) , (2, 3) , (4)}, K3,6 = {(1) , (2, 4) , (3)}; g1⊗ ⊗ g2⊗ ⊗ g(3,4) ⊗ ⊗
, g(1,2) ⊗ g3⊗ ⊗
g4⊗ , K−1 g ⊗
⊗ g2
⊗
⊗ g ⊗
, g ⊗
⊗ g ⊗
⊗ g ⊗
, K −1
g ⊗
⊗ g ⊗
⊗ g ⊗
3 ,
(1324) (1,3)
4 1 (2,3)
4 (1423) (1,4) 2
⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
g1 ⊗ g(2,3) ⊗ g4 , K(1243) g1 ⊗ g(2,4) ⊗ g3 .
4. If r = 4 again we have four blocks
B4,4 (x1 ) = x14 ; = (4, 0, 0, 0); K1 = {(1) , (2) , (3) , (4)}; g1⊗ ⊗ g2⊗ ⊗ g3⊗ ⊗ g4⊗ .
94 2 The Tensor Derivative of Vector Functions
⊗ ⊗
+ f (2) (g) g(1,2) ⊗ g(3,4) + K−1 (1324) g ⊗
(1,3) ⊗ g ⊗
(2,4)
+ K−1 ⊗ ⊗
(1423) g(1,4) ⊗ g(2,3)
+ f (3) (g) g1⊗ ⊗ g2⊗ ⊗ g(3,4) ⊗ ⊗
+ g(1,2) ⊗ g3⊗ ⊗ g4⊗
+ K−1(1324) g ⊗
(1,3) ⊗ g2
⊗
⊗ g4
⊗
+ g1⊗ ⊗ g(2,3)
⊗
⊗ g4⊗ + K−1 (1423) g ⊗
(1,4) ⊗ g2
⊗
⊗ g3
⊗
+ K−1 ⊗ ⊗
(1243) g1 ⊗ g(2,4) ⊗ g3
⊗
+ f (4) (g) g1⊗ ⊗ g2⊗ ⊗ g3⊗ ⊗ g4⊗ .
All the terms are directly derived from (2.56), except one: K−1
(1342) g ⊗
(1,3,4) ⊗ g ⊗
2 ,
which comes from K−1 ⊗ ⊗
(132) g(1,3) ⊗ g2 :
∂ ∂
K−1 g ⊗
(132) (1,3) ⊗ g ⊗
2 ⊗ = K−1
(1324) g ⊗
(1,3) ⊗ g ⊗
2 ⊗
∂x4 ∂x4
= K−1 ⊗ ⊗ −1 ⊗ ⊗
(1324) g(1,3) ⊗ g(2,4) + K(1243) g(1,3,4) ⊗ g2
= K−1 g ⊗
(1324) (1,3) ⊗ g ⊗
(2,4)
+ K−1 K −1
g ⊗
(1324) (1243) (1,3,4) ⊗ g ⊗
2 ,
−1
evaluating K−1 −1
(1324) K(1243) = K(1243) K(1324) = K−1
(1342) , which is expected.
One can find the general case in the proof Appendix 2.4.2, p. 99.
We consider some particular cases of the formula (2.51) which help to prove
some properties of cumulants; there can be several other uses of that formula as
well.
2.3 Multi-Variable Faà di Bruno’s Formula 95
If all variables among x1:n are the same x, then those tensor products which
correspond to the same type are equal, for example
⊗
Dx⊗3 f (g) = f (1) (g) gx,3 + f (2) (g) Id + K−1 −1
(132) + K(231)
⊗
gx,2 ⊗ gx⊗ + f (3) (g) gx⊗3 .
⊗
We see that gx⊗ ⊗ gx,2 corresponds to the type 1:3 = (1, 1, 0), and gx⊗3 to the
⊗ ⊗k
type 1:3 = (3, 0, 0). Now we assign Dx⊗k g k = ⊗ k =0 gx,k to the type 1:n ,
similarly to the incomplete Bell polynomials.
We restrict partitions K{r} ∈ Pn , with size r and type , let us denote it by K{r|}
then the general formula follows as:
Corollary 2.6 Let x1 , x2 , . . . , xn , be equal to xj = x, then, we can collect
partitions K{r} of the same type
n ⊗ ⊗
Dx⊗n f (g (x)) = f (r) (g) L−1
lr:1 gx,j j , (2.58)
r=1 j =r,j j =n
where
L−1
lr:1 = K−1
p(K{r|} )
,
K{r|} ∈Pn
and the summation is over all possible partitions K{r|} ∈ Pn , with size r and type
, see Sect. 2.4.3, p. 100 for the general definition of commutator L−1
lr:1 .
The left-hand side of expression (2.58) is n-symmetric, just because the operator
Dx⊗n is invariant under the changes of the order of Dx⊗ applied. The right-hand side
should be symmetric as well. This implies that applying the commutators
L−1
lr:1 = K−1
p(K{r|} )
(2.59)
K{r|} ∈Pn
⊗
on ⊗ gx,j j provides an n-symmetric tensor. Recall Example 1.8, p. 15, to show
that the symmetrization of a tensor, a⊗2 ⊗ b, say, which is a T-product of symmetric
tensors, a⊗2 and b, needs a smaller number of commutators than the general case,
⊗
3 instead of 6 there. This happens here as well, since ⊗ gx,j j is a tensor of
⊗ ⊗2
symmetric (lower-order) tensors gx,j j . An instance is gx,3 , say, which is a tensor
⊗
product of a 3-symmetric tensor gx,3 with itself. The sum of commutators (2.59)
can be characterized by type only, since r is the sum of the components of .
The following example shows the benefit of this Corollary.
Example 2.9 Let f and g be like above and n = 3, then the type = 1:3 =
(1, 1, 0); see (2.56), identifies the sum of commutators (2.59). The type = 1:3 =
96 2 The Tensor Derivative of Vector Functions
⊗
(1, 1, 0) implies the tensor product of a 2-symmetric tensor (2 = 1) gx,2 and a 1-
⊗
symmetric tensor (1 = 1) gx . We shall neglect zeros of 1:3 from the notation and
introduce L−1
12 ,11 for the commutator with respect to (2.59), so
⊗
Dx⊗3 f (g) = f (1) (g) gx,3 + f (2) (g) L−1 ⊗ ⊗
12 ,11 gx,2 ⊗ gx + f
(3)
(g) gx⊗3 ,
where
L−1 −1 −1
12 ,11 = Id 3 + K(132) + K(231) ; (2.60)
L−1
12 ,11 = Id 3 + K(132) + K(312) ;
now compare it to the Example 1.8, p. 15, and conclude that they are the same.
Example 2.10 If n = 4, then from (2.57) it follows
⊗
Dx⊗4 f (g) = f (1) (g) gx,4 + f (2) (g) L−1 ⊗ ⊗
13 ,11 gx,3 ⊗ gx + f
(2)
(g) L−1 ⊗2
22 gx,2
(2.61)
+ f (3) (g) L−1 ⊗ ⊗2
12 ,21 gx,2 ⊗ gx + f
(4)
(g) gx⊗4 ,
where commutator L−1 13 ,11 , corresponds to the type 1 = (1, 0, 1, 0), when r = 2.
Here the subject of symmetrization is a product of a 3-symmetric and a 1-symmetric
⊗
tensors, i.e. gx,3 ⊗ gx⊗ ,
L−1 −1 −1 −1
13 ,11 = Id 4 + K(2431) + K(1243) + K(1342) . (2.62)
Now, L−1 ⊗2
22 , symmetrizes a product of a 2-symmetric tensor with itself, i.e. gx,2 , the
generating type is 2 = (0, 2, 0, 0), (see above the Case n = 4, r = 2), the result is
L−1 −1 −1
22 = Id 4 + K(1324) + K(1423). (2.63)
Finally L−1
12 ,21 is defined by the type = (2, 1, 0, 0), where for j = 2, 2 = 1, and
⊗ ⊗
for j = 1, 1 = 2, and the tensor product gx,2 ⊗ gx⊗2 includes a 2-symmetric gx,2
⊗2
and a product of two 1-symmetric tensor gx tensor,
L−1 −1 −1 −1 −1 −1
12 ,21 = Id 4 + K(3412) + K(1324) + K(2314) + K(1423) + K(2413); (2.64)
we apply the symmetrizer Sd15 for the result and obtain the symmetry equivalent
version
⊗ ⊗ ⊗ ⊗
Dx⊗5 f (g) = f (1) (g) gx,5 + f (2) (g) 5gx,4 ⊗ gx⊗ + 10gx,3 ⊗ gx,2
⊗2 ⊗
+f (3) (g) 15gx,2 ⊗ gx⊗ + 10gx,3 ⊗ gx⊗2
⊗
+10f (4) (g) gx,2 ⊗ gx⊗3 + f (4) (g) gx⊗5 .
It is worth noting that the computational usage of symmetrizer Sd15 needs more
power and room than the usage of the L commutator.
2.4 Appendix
Proof of Lemma 2.1. We prove the result (2.6) by induction. We have seen the
result is true for n = 2 and 3. We assume the result is true for n, and show the result
is valid for n + 1.
We differentiate both sides of (2.6) with respect to xn+1 . Let us consider the
derivative of a typical term on the right-hand side of (2.6). By applying the product
rule, we obtain
∂ (1)
f (r) (g) gb = f (r+1) (g) gn+1 gb (2.65)
∂xn+1 K ∈P n b∈K K{r} ∈Pn b∈K
|K |=r
∂
+f (r) (g) gb , r = 1, 2, . . . , n,
∂xn+1
K{r} ∈Pn b∈K{r}
where K{r} denotes partitions with size |K| = r. Let us compare the result (2.65)
(1) (2)
with (2.3) and (2.5). We see (2.5) is obtained by differentiating f (g) and f (g)
of (2.3) with respect to g and the second term (2.4) and the last term of (2.5) are
(1) (2)
the derivative of f (g) and f (g). As we have noted earlier these two terms are
exclusive expansions of the two terms from the order n = 2. From this it is obvious
that the term
gn+1 gb ,
b∈K{r}
is the exclusive extension of the product term b∈K{r} gb , and therefore corresponds
* *
to the partition K{r+1} ∈ Pn+1 , so *K{r+1} * = r +1, where a typical block of K{r+1}
2.4 Appendix 99
∂
gb ,
∂xn+1
b∈K{r}
contains all the inclusive extensions of the partition of K{r} ∈ Pn (again let us
compare this with the first term g1,2,3 , here K1 = {(1, 2, 3)}, the third term g1 g2,3 ,
here K2 = {(1) , (2, 3)}, and the fourth term g1,3 g2 , here K3 = {(2) , (1, 3)}). Now
sum both sides (2.65) and collect the coefficients of f (r) (g) (r = 1, 2, . . . , n + 1).
The coefficients of each f (r) (g) contain both inclusive and exclusive extensions
of partitions with blocks oforder r − 1, when 1 ≤ r ≤ n. When r = n + 1 the
coefficient of f (n+1) (g) is n+1
j =1 gj . Hence the formula (2.6) for n + 1, i.e.
is true.
Proof As we have seen in 2.1, p. 92, the formula (2.51) is true for n = 1 : 4.
Actually differentiating the function f (r) (g) in the sequel contributes to exclusive
extensions while differentiating the product of gb⊗ ’s gives inclusive extensions.
Hence the general formula can be proved by induction on n i.e. assuming the
formula (2.51) is true for n, prove it for n + 1. Differentiate
n ⊗ ⊗
Dx⊗n
1:n
f (g) = f (r) (g) K−1
p(K{r} )
Dxb g (x) ,
b∈K{r}
r=1 K{r} ∈Pn
* *
by Dx⊗n+1 , note that partition K{r} is of size *K{r} * = r. Consider
⊗
Dx⊗n+1 f (r) (g) gb⊗
b∈K{r}
K{r} ∈Pn
⊗
= f (r+1) (g) K−1
p(K{r} )
⊗ Id+1 g ⊗ ⊗ gn+1
⊗
b∈K{r} b
K{r} ∈Pn
⊗
+ f (r) (g) K−1
p(Ke ) Dx⊗n+1 gb⊗ ,
b∈K{r}
K{r} ∈Pn
100 2 The Tensor Derivative of Vector Functions
⊗
r ⊗ ⊗
Dx⊗n+1 gb⊗ = K−1 K−1
p(K{r} ) ((j +1):n)S
g⊗ ⊗ Dx⊗n+1 gb⊗j ⊗ gb⊗k
b∈K{r} 1:j −1 bk j +1:r
j =1
r ⊗ ⊗
= K−1
p(Kincl ) g⊗ ⊗ g ⊗b ,n+1 ⊗ g⊗ ,
1:j −1 bk ( j ) j +1:r bk
j =1
gives all the inclusive extensions of the partition of K{r} ∈ Pn . Now if we collect all
partitions K ∈ Pn+1 corresponding to the term with f (r) (g) in Dx⊗n+1 Dx⊗n 1:n)
f (g),
they are the union of partitions which are the exclusive extensions of K{r} ∈ Pn , on
the one hand and the inclusive extensions of the partitions of K{r} ∈ Pn , on the other
hand. In both cases the result has r blocs (the inclusive extension does not increase
the number of blocks in a partition). The union contains all partitions K ∈ Pn+1
with |K| = r. Hence the formula (2.51) is true for n + 1.
where index lr:1 is defined by the following way: if j = 0 then set j by lj , where
if j = 3, then
l denotes the actual value of j , for instance lj = 3j . The type
is either in decreasing (canonical) lr:1 = ljr , ljr−1 , . . . lj1 or l1:r increasing order
of jk ; jk ≥ jk−1 . The summation is taken over all partitions K{r|} of the set 1 :
n, having type and size r. L−1l1:r denotes moment commutator when the l1:r is in
increasing order.
The number of all partitions K{r|} of n−element set with type is
n!
N = n ,
j =1 j ! (j !)
j
(see (1.41), (1.42), p. 32) which applies for the number of commutators in the sum
of L−1
lr:1 . Several particular cases are listed in Sect. A.2.1, p. 353.
2.5 Exercises 101
2.5 Exercises
2.1 Let h (x) = f (g (x)) and f (x) = ln x. Use (2.6) and show that
∂3
(3) (2) (1) (1) (2) (2) (1)
exp (g) = exp (g) g1,2,3 + exp (g) g1,3 g2 + g1 g2,3 + g1,2 g3
∂x1 ∂x2 ∂x3
(1) (1) (1)
+ exp (g) g1 g2 g3 .
2.4 Let h (x) = f (g (x)), use formula (2.13) and derive ∂ 4 h (x) /∂x1∂x23 . Compare
the result with Case A.1.
2.5 Use formula (2.13) for derivative ∂ 4 f (g (x1 , x2 )) /∂x12∂x22 , set n = (2, 2),
|n| = 4, K = 8, 0 < jk ≤ n, jk , k = 1 : K, and use the reading Table.
j1 \j2 0 1 2
0 (0, 1) (0, 2)
1 (1, 0) (1, 1) (1, 2)
2 (2, 0) (2, 1) (2, 2)
∂4
f (g) = f (1) (g) D2,2 g
∂x12 ∂x22
2
+f (2) (g) 2D2,1 gD0,1 g + 2D1,0 gD1,2 g + D2,0 gD0,2 g + 2 D1,1 g
2 2
+f (3) (g) D2,0 g D0,1 g + D0,2 g D1,0 g + 2D0,1 gD1,0 gD1,1 g
2 2
+f (4) (g) D1,0 g D0,1 g .
Dx⊗ (a ⊗ x) = a ⊗ vecId ,
and
If G (x) is a matrix-valued function, then define Dx⊗ G = vec (Dx vecG). Let
G (x) = A ⊗ Bx and show that
2.14 Let f1 (y) = exp (y1 − 1/2y 2), and f2 (x) = [y1 , y2 ] = [μ1 x1
+μ2 x2 , σ11 x12 + 2σ12 x1 x2 + σ22 x22 , then show that
! "
μ1 − x1 − σ12 x2
Dx⊗ f = exp (y1 − 1/2y2) .
μ2 − x2 − σ12 x1
2.15 Sow that if f1 (y) = y, exp y , and f2 (x) = μ1 x 1 + μ2 x 2 −
1/2 σ11 x12 + 2σ12 x1 x2 + σ22 x22 , then
⎡ ⎤
μ1 − x1 − σ12 x2
⎢ μ2 − x2 − σ12 x1 ⎥
Dx⊗ f = ⎢ ⎥
⎣(μ1 − x1 − σ12 x2 ) exp y ⎦ .
(μ2 − x2 − σ12 x1 ) exp y
or
Dx⊗ (Vx ⊗ vecV) = K−1 2
(132) d, d , d vecV ⊗ vecV .
and
⊗2
Dx⊗2 (y − Vx)⊗2 = K−1 −1
(1324) (d) + K(2314) (d) vecV .
k−1
k − 1 −1
Dx⊗(k−1) ⊗ ⊗k
f Dx g = f Dx g + K(23) d, d k−j , d, d j
j S
j =1
⊗(k−j ) ⊗j
× Dx g ⊗ Dx f . (2.66)
d kj
2.24 Let x ∈ Rd , k = (k1 , k2 , . . . , kd ), k! = k1 !k2 ! . . . kd !, xk = j =1 xj , ∂xk =
∂x1k1 ∂x2k2 · · · ∂xdkd .
Show that the following two forms of Taylor series expansion of
a scalar valued function g (x) are equivalent:
1
g (x) = ak xk + o |x|2n0 ,
k!
0≤k1:d ,k1:d ≤n0
where
*
∂ k g (x) **
ak = *
∂xk *
x=0
and
1 ⊗m ⊗m
n0
g (x) = x Dy g (y) |y=0 + o |x|2n0 . (2.67)
m!
m=0
2.5 Exercises 105
2.27 Let h (x) = f (g (x)), and x = x1:4 = (x1 , x2 , x3 , x4 ) a list of vectors with
dimensions d1:4 = (d1 , d2 , d3 , d4 ), respectively. Derive the following derivatives:
1. If n = 1, then
2. If n = 2, then
Dx⊗1:2 f (g) = f (1) (g) Dx⊗1:2 g + f (2) (g) Dx⊗1 g ⊗ Dx⊗2 g . (2.68)
3. If n = 3, then
Dx⊗3
1:3
f (g) = f (1) (g) Dx⊗31:3
g + f (2) (g)
Dx⊗1,2 g ⊗ Dx⊗3 g + K−1 ⊗ ⊗ ⊗ ⊗
(2,3)S Dx1,3 g ⊗ Dx2 g + Dx1 g ⊗ Dx2:3 g
+ f (3) (g) Dx⊗1 g ⊗ Dx⊗2 g ⊗ Dx⊗3 g .
4. If n = 4, then
Dx⊗4
1:4
f (g) = f (1) (g) Dx⊗4
1:3
g + f (2) (g) Dx⊗1,2 g ⊗ Dx⊗3,4 g
+K−1 ⊗ ⊗
(2,3)S Dx1,3 g ⊗ Dx2,4 g +
+f (2) K−1
(1,3,4,2) D ⊗3
x1,3,4 g ⊗ D ⊗
x2 g
The use of Faà di Bruno’s lemma in statistics has been initiated by Lukacs [Luk55],
see also [Luk70]. See [Sav06] and further references there for general Faà di Bruno’s
formula when function f is a multivariate function, as also [Har06] for application
to cumulants. Faà di Bruno’s formula for multivariate functions in connection
with recursive multivariate derivatives is considered by Miatto [Mia19], and with
multivariate Bell polynomials by Schumann [Sch19].
MacRae [Mac74] appears to be the first to define a matrix derivative using
the tensor product and a derivative operator. Other relevant references for matrix
differentiation are Neudecker [Neu69], Henderson and Searle [HS81] and [KvR06].
The book [MN99] is basic in our treatment. We use notations and apply several
results from this book, and in particular from Chapter 9 of [MN99]. Neudecker
[Neu69] and MacRae [Mac74] deal with matrix differentiation using tensor product.
Henderson and Searle [HS79] considered the multivariate derivative of symmetric
matrices. Ma [Ma09] expresses the higher partial derivatives of composite functions
in terms of factor functions, see also by Noschese and Ricci [NR03].
Chapter 3
T-Moments and T-Cumulants
Moments of a vector random variable X ∈ Rn are usually defined via integrals with
respect to the joint distribution of X. We shall use an equivalent definition for the
moments of X based on its characteristic function φ (λ), which is also a common
tool in probability theory. The characteristic function of X is defined as
φ (λ) = E exp iλ X , (3.1)
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 107
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5_3
108 3 T-Moments and T-Cumulants
where λ ∈ Rn are real, and it is always known to exist. Moreover, E |Xj |2n0 <
∞, j ∈ {1, 2, . . . n}, for some n0 if and only if all the partial derivatives
∂ kj ∂ kj
φ (λ) = n φ (λ)
∂λk kj
j =1 ∂λj
up to order kj ≤ 2n0 exist, are continuous at zero, and the Taylor expansion
i kj
φ (λ) = μX,k λk + o λ2n0 , (3.2)
k!
kj ≤2n0
*
It is clear that instead of the derivatives (−i∂)r /∂λk φX (λ)*λ=0 = EXk , we have
only taken the first-order derivatives of φY with respect to each variable λm,j .
Let 1n denote a vector with all 1s in its coordinates, i.e. 1n = [1, 1, ..., 1]
with dimension n. The following definition gives all mixed higher-order moments
as well.
Definition 3.1 The nth-order moment of a random variable X ∈ Rn is
*
(−i∂)n *
μX,1n = EX1n = φ (λ) * , (3.3)
∂λ1n
X *
λ=0
Remark 3.1 Once again definition (3.3) covers all higher-order moments μX,k =
EXk with order kj = r. The reason is that we can expand X into a vector of
dimension r including all the entries with their multiplicities, and by the distinct
values principle we plug the entries of X back into the expression.
It is worth recalling here some properties of the moments of the products of
random variables
n
EX1n = E Xk = μ(1:n) .
k=1
n
n
n
E aXi + bXj Xk = aE Xk + bE Xk .
k=1,k=i,j k=1,k=i k=1,k=j
n
μp(1:n) = E Xp(k) = EX1n = μ(1:n) .
k=1
r
k
r
k
μX,k = E Xj j = EXj j . (3.4)
j =1 j =1
Note, the above properties are not true for mixed higher-order moments unless
we rewrite them using the distinct values principle.
We can apply the mixed product rule (see (1.3), p. 6), to tensor products, which
yields
i k+m ⊗k
∞
∞
i k+m ⊗k ⊗m ⊗m ⊗m
EX1 λ⊗k
1 X 2 λ2 = E X1 ⊗ X2 λ⊗k ⊗m
1 ⊗ λ2 .
k!m! k!m!
k,m=0 k,m=0
Now it is straightforward to use the chain rule for the terms in the series
expansion and formula (2.48), (see also Exercise 2.26) to obtain
⊗k ⊗m
Dλ⊗1 ,λ2 X1 ⊗ X2 λ⊗k
1 ⊗ λ2
⊗m
⊗k ⊗m
= Dλ⊗2 Dλ⊗1 X1 ⊗ X2 λ⊗k
1 ⊗ λ ⊗m
2
hence
Dλ⊗1 ,λ2 φ (λ1 , λ2 ) = Dλ⊗2 Dλ⊗1 φ (λ1 , λ2 )
∞
i k+m ⊗(k−1) ⊗(k−1)
= i2 EX λ1
(k − 1)! (m − 1)! 1
k,m=1
⊗(m−1) ⊗(m−1)
X2 λ2 (X1 ⊗ X2 ) .
Setting λ1 , λ2 = 0, we have
*
*
Dλ⊗1 ,λ2 φ (λ1 , λ2 )* = i 2 EX1 ⊗ X2 .
λ1 ,λ2 =0
112 3 T-Moments and T-Cumulants
n
i k+m ⊗ *
*
φ (λ1 , λ2 ) = Dλ ,λ φ (λ1 , λ2 ) * λ⊗k
1 ⊗ λ⊗m
2 + o λ1 ⊗ λ2 n .
k!m! 1 k 21 m λ1 ,λ2 =0
k,m=0
We recall the notation λ1k , and λ21m , which denotes repetitions of vectors λ1 and
λ2 , for instance λ213 = λ2,2,2 = [λ2 , λ2 , λ2 ].
If we now carry on with the characteristic function φX of variables X = X1:d =
[X1 , X2 , . . . , Xd ] , i.e.
φX (λ) = E exp iλ X , λ ∈ Rd ,
∞
ik k−1
Dλ⊗ φX (λ) = iEX + E λ X X,
(k − 1)!
k=2
3.2 Tensor Moments 113
which gives iEX at λ = 0, and in general by Lemma 2.8, the kth-order T-derivative
(−i)k Dλ⊗k φX at zero results the T-moment μ⊗
X,k = EX
⊗k as follows:
*
*
(−i)k Dλ⊗k φX (λ)* = μ⊗
X,k . (3.5)
λ=0
In this sense μX = μ⊗
X,1 .
*
*
Definition 3.2 The kth-order (k ≥ 1) T-derivative (−i)k Dλ⊗k φX (λ)* of the
λ=0
characteristic function φX at zero will be called the kth-order T-moment μ⊗
X,k =
⊗k ⊗
EX of a multiple random variable X. Actually μX,k is the expected value of the
T-product X⊗k of X, where the expectation is taken element-wise.
Observe that T-moment μ⊗ d
X,k with dimension k involves all possible kth-order
moments of the entries of X, some of which are with multiplicities. The T-moment
μ⊗ ⊗
X,k is k-symmetric, μX,k ∈ Sd,k , since EX
⊗k is invariant under permutation of
terms in the T-product X , see Sect. 1.3.2, p. 16. The distinct entries of μ⊗
⊗k
X,k can
+
be obtained using the elimination matrix Qd,k , see (1.32), p. 21. The distinct entries
are denoted by μ⊗Ð + ⊗ ⊗Ð
X,k = Qd,k μX,k , the dimension of μX,k is the number of distinct
values ŋd,q , (see (1.30). p. 15).
Example 3.3 (Multivariate Normal Distribution) Take X = [X1 , X2 , . . . , Xd ].
Let us assume that X has a multivariate
normal
distribution with mean EX = μ
and variance–covariance matrix V = σj,k . It is well known that the characteristic
function φX (λ) is
⎧ ⎫
⎨ d
1
d d ⎬
φX (λ1:d ) = exp i λj μj − λj λk σj,k
⎩ 2 ⎭
j =1 j =1 k=1
⎧ ⎫r
∞
1 ⎨
d
1
d
d ⎬
=1+ i λj μj − λj λk σj,k .
r! ⎩ 2 ⎭
r=1 j =1 j =1 k=1
and
*
1 ∂ 2 φX (λ1 , λ2 , . . . , λd ) **
* = EXj Xk = σj,k + μj μk .
i2 ∂λj ∂λk λ=0
114 3 T-Moments and T-Cumulants
Now let us use T-moments for deriving the same quantities. Rewrite the character-
istic function in terms of vectors and the variance–covariance matrix
1
φX (λ) = exp iλ μ − λ Vλ .
2
Now applying the chain rule (2.24) p. 75 for the T-derivative, we obtain the first-
order derivative
1 1
Dλ⊗ φX (λ) = exp iλ μ − λ Vλ Dλ⊗ iλ μ − λ Vλ = φX (λ) (iμ − Vλ) ,
2 2
accordingly. We see the usual first and second moments in a vector form of the
multivariate normal distribution.
Next, we calculate the fourth-order central T-moment.
Example 3.4 (Multivariate Normal Distribution, Cont. 1) Let Y = X − μ be the
centralized version of X. Now the characteristic function has the form
1
φY (λ) = exp − λ Vλ .
2
Consider the third-order central moment, namely differentiate (3.6), from where we
ignore μ. First we take the T-derivative of (Vλ)⊗2 first
*
*
Now Dλ⊗3 φY (λ)* = 0, and hence μ⊗ ⊗3
Y,3 = 0. Differentiate Dλ φY (λ) once
λ=0
more term by term. We can ignore the first term since it will not contribute to the
3.2 Tensor Moments 115
fourth-order moment as its derivative at zero is zero. The derivative of the next term
is
∂
vecV ⊗ (Vλ) ⊗ = vec⊗2 V,
∂λ
multiplied by a constant, where we have used that V is symmetric, see Exercise 1.19.
Hence the fourth-order central moment is,
*
*
Dλ⊗4 φY (λ)* = L−1 ⊗2
22 vec V,
λ=0
where L−1
22 is a commutator matrix (see (A.3), p. 353), and Exercise 1.19. Therefore
μ⊗ −1 ⊗2
Y,4 = L22 vec V.
Compare this with result of Exercise 2.27. Like earlier, the expected value μ⊗
Y,4 =
⊗4
EY , is 4-symmetric; therefore, we can apply the symmetrizer
μ⊗ −1
Y,4 = Sd14 L22 (vecV)
⊗2
= 3vec⊗2 V.
Now suppose that the vector X is partitioned into two parts X = X1 , X2 with
dimensions (d1 , d2 ) which are not necessarily equal. The characteristic function of
X can be written in terms of partitions, see Example 3.2,
φX (λ) = E exp i λ X
i k+ ⊗k
∞
⊗
= E exp i λ1 X1 + λ2 X2 = E X1 ⊗ X2 λ⊗k
1 ⊗ λ⊗
2
k!!
k,=0
where On = (1 : n) is the coarsest partition of the set 1 : n, cf. Sect. 1.4.5, p. 37, in
this way μ⊗ ⊗
On = μ1:n .
The following definition is a generalization of the scalar-valued case:
Definition 3.3 The T-moment of a list of random vector variates X1 , X2 , . . . , Xn is
defined by their joint characteristic function as
⊗ *
*
E Xj = (−i)n Dλ⊗1 ,λ2 ,...,λn φX1 ,X2 ,...,Xn (λ1 , λ2 , . . . , λn )* .
j =1:n λ1:n =0
For instance if b = (3, 5), then μ⊗ 3,5 = EX3 ⊗X5 . At the same time μ3,5 =
[EX3 , EX5 ]. If X3 and X5 are independent, then μ⊗
3,5 = EX3 ⊗ EX5 naturally.
Consider the following example to see the usefulness of the distinct values
principle.
Example 3.5 (Multivariate Normal Distribution, Cont. 2) We derive the fourth-
order central moment μ⊗
Y,4 using the distinct values principle. For the fourth-order
moment, we pretend to have four variables and change the characteristic function
1
φY (λ) = exp − λ Vλ ,
2
for
φ (λ1:4 ) = φY (λ1 + λ2 + λ3 + λ4 ) ,
3.2 Tensor Moments 117
+ (vecV)⊗2 = L−1
22 (vecV)
⊗2
,
where
L−1 −1 −1
22 = Id + K(1324) + K(1423) .
Note the commutator matrix L−1 22 and some more commutator matrices are collected
in the Appendix (see (A.3), p. 353). Hence we have
μ⊗ −1
Y,4 = L22 (vecV)
⊗2
.
Like earlier, the expected value EX⊗4 is symmetric; therefore, we can apply the
symmetrizer
μ⊗ −1
Y,4 = Sd14 L22 (vecV)
⊗2
= 3Sd14 (vecV)⊗2 .
The properties of the T-moments of a list are similar to those of the scalar-valued
case.
Multilinear Let X = a ⊗ X1 + b ⊗ X2 , where a and b are arbitrary constant
vectors with dimensions such that the summation is valid. Then the expected
value of X = a ⊗ X1 + b ⊗ X2 is
μX = a ⊗ μ1 + b ⊗ μ2
as expected.
118 3 T-Moments and T-Cumulants
and indeed
Symmetry The order of the terms in the T-product and in the T-derivative is
important; the T-moment of variables X1 , X2 , . . . , Xn is not symmetric.
⊗ Let
p ∈ Pn , be a permutation of 1 : n, then E ⊗ X
j =1:n p(j ) = K p(1:n) E j =1:n j ,
X
⊗ ⊗
or μp(1:n) = Kp μ1:n for short. If the components of the list X1:n are the same
X, then μ⊗X,n = EX
⊗n is n-symmetric; therefore, μ⊗ = S
X,n
⊗
d1n μX,n , where Sd1n
denotes the symmetrizer matrix.
The way moments are defined (see Definition 3.3) will help us understand cumulants
more easily. Take a random vector X = X1:n , denote the logarithm of the charac-
teristic function ln φX (λ) by ψX (λ), and call it cumulant generator function, or
cumulant function, for short. Let us retain the assumptions of the previous section
and now consider the Taylor series expansion around zero of the cumulant function
ψX (λ) of X, i.e.
i kj
ψX (λ) = c(X, k)λk + o(λn ), (3.7)
k!
kj ≤n
/#
where λ = λ2i . The cumulant Cum X1k1 , . . . , Xn1kn = Cum(X1 , . . . , X1 ,
+ ,- .
k1
. . . , Xr , . . . , Xn ) with order |k| = kj is defined as the coefficient c(X, k) of
+ ,- .
kn
3.3 Cumulants for Multiple Variables 119
λk via Eq. (3.7). In general for the nth-order cumulant of a single variable X, the
convention
is adopted. To obtain the mixed higher-order cumulants Cum X1k1 , . . . , Xn1kn =
κX,k (where k = k1:n ) by higher-order derivatives of ψX (λ) in (3.7), it is enough to
take the first-order derivatives of a cumulant function with respect to a variable Y =
[X1 , . . . , X1 , . . . , Xr , . . . , Xr ] acting as if all the components of Y were different.
+ ,- . + ,- .
k1 kr
The Taylor series expansion of the logarithm ofthe characteristicfunction with kj
distinct variables provides the cumulant Cum X1k1 , . . . , Xn1kn since the entries
of Y are not assumed to be different, see the similar argument for moments on page
109, in other words the distinct values principle can be applied again.
Example 3.6 The Gamma distribution (β, α) is given either by the density
x β−1
f(β,α) (x) = exp (−x/α) , if x > 0 and zero otherwise,
α β (β)
φX (λ) = (1 − αiλ)−β ,
κX,n = α n (n − 1)!β,
where 0! = 1 as usual.
120 3 T-Moments and T-Cumulants
Again, the entries of X are not assumed to be different; therefore, the following
definition is correct, and it also covers the case when some of the entries of X are
equal as well. We define the cumulant for different variables only.
Definition 3.5 The nth-order cumulant of variates X1:n = X, is
*
(−i∂)n *
Cumn (X) = ψ (λ) * . (3.8)
∂λ1n
X *
λ=0
We shall also use the following short notation Cumn (X) = κX = κ(1:n) =
Cum (X1:n ) as well.
Notice the index (1 : n) is considered as a partition, actually On = (1 : n), cf.
Sect. 1.4.5, p. 37. Similarly to the multiple moments multi-indexing will be used
for higher-order cumulants. We also note that the second-order cumulant is called
variance as well as the third order is skewness and the fourth order is kurtosis
when the variate is standardized, (see Sect. 6.1, p. 313 for more details). One of
the simplest examples is the cumulants of the Gaussian distribution.
Example 3.7 The cumulant function
of a Gaussian random vector X[1,2] =
[X1 , X2 ] with EXj = μj , Cov Xj , Xk = σj k is
1
ψX1:2 (u1 , u2 ) = i(μ1 u1 + μ2 u2 ) − (σ11 u21 + 2σ1,2 u1 u2 + σ22 u22 ).
2
The first-order cumulants are the first-order derivatives at zero κXj = μj ; similarly,
the second-order ones are κ(j,k) = σj k , and all higher-order cumulants are zero.
where λ1:n = (λ1 , λ2 , . . . , λn ) is the corresponding list of vectors with the same
dimensions d1:n = (d1 , d2 , . . . , dn ) as X1:n . The logarithm of the characteristic
function φX1:n (λ1:n ) the cumulant function is denoted by ψX1:n (λ1:n ).
The first-order derivative of the cumulant function ψX1:n (λ1:n ), at zero, with
respect to each component of λ1:n will be defined as the cumulant of X1:n . More
precisely we recall that
Dλj φ = φ∂/∂λj = φ ∂/∂λ1,j , ∂/∂λ2,j , . . . , ∂/∂λdj ,j
3.3 Cumulants for Multiple Variables 121
is the Jacobian with respect to the vector λj of φ, the column vector of the transpose
of the Jacobian is the T-derivative
⊗ ∂ ∂
Dλj φ = vec φ =φ⊗ ,
∂λj ∂λj
and Dλ⊗1:n is the column vector of the partial differential operator of order n with
respect to each variable λj only once, so that
Dλ⊗1:n φ = Dλ⊗n Dλ⊗1:(n−1) φ ,
(see Sect. 2.2, p. 70). The dimension of Dλ⊗n 1:n
1n
is d1:n = nj=1 dj , where 1n denotes a
vector with all 1s in its coordinates, i.e. 1n = [1, 1, ..., 1] with dimension n. In this
way, we keep all the derivatives with the same order in a vector. Now the definition
of the cumulant of a list of vectors X1:n is the following.
Definition 3.6 The T-cumulant of a collection of random vectors X1:n is
*
*
Cumn (X1:n ) = (−i)n Dλ⊗1:n ψX1:n (λ1:n )* ,
λ1:n =0
or Cumn (X1:n ) = κ ⊗ ⊗ ⊗
X1:n (d) = κ 1:n (d) = κ On (d) for short, where On denotes the
“partition” (1 : n).
We shall use notations Cumn (X1:n ), κ ⊗ ⊗ ⊗
X1:n (d), κ 1:n (d), and κ On (d) in the sequel.
(Notice the multi-index notation κ ⊗ ⊗
1:n here as well.) The cumulant κ 1:n (d) is a vector
of dimension d n = dj having all possible cumulants of the entries of vectors
1
κ⊗ ⊗
X,n (d). The distinct values of κ X,n can be obtained with the linear transformation
κ ⊗Ð + ⊗ +
X,n = Qd,n κ X,n , where Qd,n is the elimination matrix. The dimension of the vector
κ ⊗Ð
X,n of distinct values is explicitly given, and denoted by ŋd,n , (see Sect. 1.3.2,
p.16).
First we apply the operator ∂/∂u1 , then take the vector of the transpose, the result
is
Du⊗1 u2 C2,1 u1 = vec u2 C2,1 = C1,2 u2 ,
Notice that u2 C2,1 u1 is a scalar so u1 C1,2 u2 = u2 C2,1 u1 , hence the result
*
κ⊗ 2 ⊗ *
1,2 = (−i) Du1 ,u2 ψX1,2 (u1 , u2 ) (u 1 ,u2 )=0
= vecC2,1 .
3.3 Cumulants for Multiple Variables 123
Hence we have obtained the cumulant κ ⊗ 1,1 = Cum2 (X1 , X1 ) = vecC1,1 . Observe
that both the matrix C1,1 and the vector κ ⊗ ⊗
1,1 are 2-symmetric, κ 1,1 ∈ Sd,2 .
There are other ways for defining the moments and cumulants for multiple
random variables that will keep the quantities that belong together either as a vector,
a matrix, or a tensor.
Remark 3.3 Kollo provides an alternative definition of higher-order moments of
vector-valued variates as follows: the first-order moment m1 (X) = EX is a vector;
the second-order one is m2 (X) = EX X, which is a matrix with a natural
connection to the variance–covariance matrix. The third-order moment is the tensor
product of the previous patterns m3 (X) = EX ⊗ X ⊗ X, and the higher-order ones
follow this pattern. The fourth-order one is
We can express the vector form of the second term of c4 (X) with the help of
commutators
vec Id 2 + Kd•d (V ⊗ V) Id 2 = Id 4 + Id 2 ⊗ Kd•d vec (V ⊗ V)
= Id 4 + Id 2 ⊗ Kd•d (Id ⊗ Kd•d ⊗ Id ) (vecV)⊗2
= K(1324) (d) + K(1342) (d) vec⊗2 V
124 3 T-Moments and T-Cumulants
since K(1243)K(1324) = K(1342). The third term of c4 (X) has the following form:
vec vecVvec V = vec⊗2 V.
where L−1 −1 −1
22 = Id 4 + K(1324) (d) + K(1423) (d). As we shall see later, in Sect. 3.4.1,
formula (3.36), vecc4 (X) equals Cum4 (X), hence both the vectorized moments and
Kollo’s cumulants coincide with T-moments and T-cumulants.
The results obtained here are generalizations of well known results given for scalar
random variables, we shall consider the scalar case first and then the general vector-
valued case.
Suppose that dimensions of X1 , X2 , . . . , Xn are d = [d1 , d2 , . . . , dn ].
Property 3.1 (Symmetry) Let X ∈ Rn , then the cumulant (with scalar
value) Cum(X) = Cum(Xp ) is symmetric, i.e. κ(1:n) = κp , where p =
(p (1) , p (2) , . . . , p (n)) ∈ Pn is a permutation of the integers 1 : n.
If some dj > 1 then the T-cumulants are not symmetric any more, they fulfill the
equation
i.e. κ ⊗ −1 ⊗ −1
1:n = Kp (d) κ p , where Kp (d) is a commutator matrix defined by the
permutation p, see (1.23), p. 11.
If all the entries of the list X1:n are the same then κ ⊗1:n (d) is n-symmetric, i.e.
κ 1:n ∈ Sd,n , or equivalently κ ⊗
⊗
1:n (d) = S κ ⊗
d1n 1:n (d), see Remark 3.2.
This and the following result are obvious consequences of the T-derivative
applied to the cumulant function.
Property 3.2 (Multilinearity, Scaling) Let c1:2 = [c1 , c2 ] be constants, then
in the scalar case. Let A and B be constant matrices with appropriate dimensions,
then
and
Cumn+1 (AY1 + BY2 , X1:n ) = A ⊗ Id(n) Cumn+1 (Y1 , X1:n )
+ B ⊗ Id(n) Cumn+1 (Y2 , X1:n ). (3.11)
Cum(X, Y) = 0. (3.13)
This formula is the additive version of formula (3.4) of the expectation of the product
of independent variables. If X1:n is independent of Y1:m and the corresponding
dimensions are equal, then
κ⊗ ⊗ ⊗
X1:n +Y1:n = Cumn (X1:n + Y1:n ) = Cumn (X1:n ) + Cumn (Y1:n ) = κ X1:n + κ Y1:n .
(3.14)
Again, formula (3.14) is the additive version of formula (3.4), which is the
expectation of the T-product of independent variables. Let ψX1:n +Y1:n (λ1:n ) be the
cumulant generator function of X1:n + Y1:n = (X1 + Y1 , . . . , Xn + Yn ), by an
obvious notation. Then we have the additivity
for independent variates and then calculating the nth-order T-derivative with respect
to λ1:n = [λ1 , . . . , λn ] of both sides of (3.15), and setting λ1:n = 0 (i.e. λ1 =
0, λ2 = 0, . . . , λn = 0) we obtain (3.14). It also follows that for any constant c1:n
with appropriate dimension, if n > 1, then
This property is applied for a list Y1:n of random vectors setting X = vecY1:n .
Very often cumulants are estimated using estimated moments. We shall see that
every nth-order joint cumulant can be written in terms of nth and lower-order joint
moments. Therefore, it is important to obtain an explicit relation between cumulants
and moments and vice versa.
m
i kj
φ(λ) = μk λk + o(|λ|m ), (3.16)
k!
k≥0,kj ≤m
k
with the moment coefficients μk = E nj=1 Xj j and the series expansion of the
corresponding cumulant generator function
m
i kj
ψ (λ) = Cum X1k1 , . . . , Xn1kn λk + o(|λ|m ). (3.17)
k!
k≥0,kj ≤m
Cum X1k1 , . . . , Xn1kn = Cum(X1 , . . . , X1 , . . . , Xn , . . . , Xn ).
+ ,- . + ,- .
k1 times kn times
Taking the logarithm on the right side of (3.16), then the Taylor series expansion
for it, and collecting the appropriate coefficients of λk one can obtain the required
relations between moments and cumulants. We recall the following notations:
1 1
φ(λ) = 1 + iμ1 λ1 + iμ2 λ2 − μ(1,1)λ21 − μ(1,2)λ1 λ2 − μ(2,2)λ22 + o |λ|2 .
2 2
We consider the Taylor series expansion of the function ln (1 + y), and see that
1 1
ψ(λ) = ln 1 + iμ1 λ1 + iμ2 λ2 − μ(1,1) λ21 − μ(1,2) λ1 λ2 − μ(2,2) λ22 + o |λ|2
2 2
= iμ1 λ1 + iμ2 λ2 − (μ(1,2) − μ1 μ2 )λ1 λ2
1 1
− (μ(1,1) − μ21 )λ21 − (μ(2,2) − μ22 )λ22 + o |λ|2 .
2 2
respectively.
We will now consider what appears to be a special case of expressing cumulants
via moments when k1 = k2 = . . . = kn = 1, but it turns out that it covers the
general case as well, by the distinct values principle.
Theorem 3.1 (Cumulants via Moments) Takerandom variables X1:n , and let
μ(bj ) denote the expected value of the product E k∈bj Xk , then
n
r
κ(1:n) = (−1)r−1 (r − 1)! μ(bj ) , (3.20)
r=1 K{r} ∈Pn j =1
* *
where K{r} = {b1 , b2 , . . . , br } ∈ Pn is a partition with size *K{r} * = r and Pn is
the set of all partitions of (1 : n), see (3.23) for an other form of (3.20).
Proof This theorem is a reformulation of Corollary 2.2, p. 65.
128 3 T-Moments and T-Cumulants
r
uj = [1, 1, . . . , 1] ,
j =1
r
uj = [1, 1, . . . , 1] .
j =1
(Note here that sum of vectors is given by the sum of the corresponding entries.)
This condition is important for obtaining the partition K = {b1 , b2 , . . . , br } of
(1, 2, . . . , n).
Let uj = uj,1 , uj,2 , . . . , uj,n , (j = 1, 2, . . . , r) where uj,k is either 1 or zero.
Let
u u u u
EX1 j,1 X2 j,2 · · · Xn j,n = EX1:n
j
.
3.4 Expressions between Moments and Cumulants 129
u
j
EX1:n = μ(1,2),
or if uj = [1, 1, 1, . . . , 1] = 1n , then
u
j
EX1:n = μX1n = μ(1:n) ,
1:n
and so on. Suppose we have the partition K = {b1 = (1, 2) , b2 = (3, 4, . . . , n)}
and want to consider the corresponding expectation μ(b1 ) = μ(1,2), and μ(b2 ) =
μ(3:n)
⎛ ⎞ ⎛ ⎞
E⎝ Xk ⎠ E ⎝ Xk ⎠ = μ(b1 ) μ(b2 )
k∈b1 k∈b2
u1 u2
= EX1:n EX1:n = μ(u1 ) μ(u2 ) ,
u1 + u2 = [1, 1, 1, . . . , 1] = 1n .
3
κ(1:3) = μ(1:3) − μ(1,2)μ3 − μ(1,3)μ2 − μ1 μ(2,3) + 2μ1 μ2 μ3 = E (Xi − μi ) .
i=1
(3.21)
The first three cumulants, see also (3.19), are equal to the central moments
but this is not true for higher-order cumulants. One might easily check this for
cumulants of order four (see Example 3.13 below).
With the above notation we* can *rewrite formula (3.20) where the summation is
over all partitions K{r} of size *K{r} * = r. The general formula is
n
r
κ(1:n) = (−1)r−1 (r − 1)! μ (u j ) , X ∈ R n , (3.23)
r=1 U ∈Pn,|U |=r j =1
u
where μ(uj ) = EX1:n j
the second summation is taken over all possible partition
matrices U with row vectors uj , uj ∈ Rn , U = uj j =1:r , see Sect. 1.4 for details.
* *
Such a vector system uj j =1:r corresponds to a partition K{r} with size *K{r} * = r,
of the set 1 : n = (1, 2, . . . , n); therefore, the double sum in (3.23) is over all
partitions of K{r} ∈ Pn , where Pn is the set of all partitions of the numbers 1 : n.
We have seen that generating the partitions K{r} = {b1 , b2 , . . . , br } is equivalent to
generating the row vectors [u1 , u2 , . . . , ur ] , where the elements of each vector are
either 1 or a zero depending on whether a particular value j is in the block or not.
Example 3.13 We can now write the fourth-order cumulant using the matrices U in
Table 1.3, p. 29, as
We see that κO4 = Cum (X1 , X2 , X3 , X4 ) given by (3.24) has 15 terms, but it gets
considerably simpler if we assume EX1 = EX2 = EX3 = EX4 = 0. Under this
assumption (3.24) reduces to
This latter expression shows that fourth-order cumulants differ from fourth-order
central moments:
4
κ(1:4) = E (Xi − EXi ) , (3.26)
i=1
unless all centralized μ(j,k) = κ(j,k) , j = k, are zero, i.e. variates are uncorrelated.
Formula (3.20) gets significantly simpler when all components of X1:n coincide
with X ∈ R, say, because in this case μ(b) = E k∈b Xk = μX,|b| , only depends on
the cardinality |b| of the block b. The number of all partitions K{r} with size r and
type = [1 , . . . , n ] is the number of terms in the sum (3.20) which is
n! n!
n−r+1 = n−r+1 ,
j =1 j ! (j !) j
j =1 j ! rj =1 kj !
where kj denotes the cardinality of the blocks (see (1.41), p. 32). Now we change
the summation by all partitions in (3.20) so that we collect partitions with the same
type for a given r,
n
n−r+1 j
1 μX,j
κX,n = n! (−1)r−1 (r − 1)! (3.27)
j ! j!
r=1 j =r,j j =n j =1
n n!
r
κX,n = (−1) r−1
(r − 1)! n−r+1 r μX,kj ,
r=1 j =r,j j =n j =1 j ! j =1 kj ! j =1
where kj denotes the cardinality of blocks of the partition K{r} , and the sum is taken
# = (1 , . . . , #
over all sequences n ), j ≥ 0 such that the following two conditions
are satisfied: nj=1 j = r, and nj=1 j j = n.
Expression (3.20) is considered to be general by the distinct value principle, and
we can get any higher-order cumulant. However Theorem 2.1, p. 70 that gives the
formula for higher-order derivatives of compound functions can be used to obtain
the formula for higher-order cumulants directly:
|n|
K
1 μ(jk ) k
κ(n) = (−1)r−1 (r − 1)! n! , (3.28)
k ! jk !
r=1 p(r,,j) k=1
K
jk,s k = ns , s = 1 : d,
k=1
K
k = r.
j =1
These formulae for the vector-valued case follow from the Example 3.9 similar
to the scalar-valued case. Observe that Cov(Xk , Xj ) denotes the usual covariance
⊗u
matrix. Now using the obvious notations either μ⊗ ⊗ ⊗
bj = E Xbj or μuj = EX1:n ,
j
n ⊗
κ⊗
1:n = (−1)r−1 (r − 1)! K−1
p(K{r} )
μ⊗ , (3.29)
j =1:r bj
r=1 K{r} ∈Pn ,
where K{r} denotes partitions of the set 1 : n, with size r, and the second sum is
over all such partitions K{r} ∈ Pn , or equivalently by
n ⊗
κ⊗
1:n = (−1)r−1 (r − 1)! K−1
p(U ) (d1:n ) μ⊗
uj , (3.30)
j =1:r
r=1 U ∈Pn ,|U |=r
We shall see that the first three T-cumulants, like in the scalar case, equal the
central T-moments but this is not true for higher-order cumulants. It is easy to check
this for cumulants of order four or higher (see (3.32), also (3.19), Examples 3.14
and 3.15).
Example 3.14 Let n = 3. First we discuss the partitions with size r and the
corresponding permutations of (1 : 3) (see also Example 1.21, p. 36 as well). The
possible sizes are |K| = r = 1, 2, 3. When r = 1, K{1} = {b} and the single
partition in this case is only one block with elements (1, 2, 3). For r = 2 the
partitions have two blocks K{2} = {b1 , b2 } , where b1 has one element, and b2
has two elements and the possible partitions are K{2},1 = {(1) , (2, 3)}, K{2},2 =
{(1, 3) , (2)}, K{2},3 = {(1, 2) , (3)}. When r = 3, K{3} = {(1) , (2) , (3)}. Since
the permutations depend on the partitions;
therefore, we usethe notation
p (K). The
corresponding
permutations
are p K {1} = (123), p K {2},1 = p K {2},3 = (123),
p K{2},2 = (132), and p K{3} = (123), see Example 1.21, p. 36. Now we apply
(3.29) to get
κ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ ⊗
1:3 = μ1:3 − K(132) μ1,3 ⊗ μ2 − μ1,2 ⊗ μ3 − K(231) μ2,3 ⊗ μ1 + 2μ1 ⊗ μ2 ⊗ μ3 .
(3.31)
The third term in the right-hand side may be replaced by E (X1 ⊗ EX2 ⊗ X3 ), since
K−1 ⊗ ⊗ −1
(132) μ1,3 ⊗ μ2 = EK(132) (d1:3 ) (X1 ⊗ X3 ⊗ EX2 ) = E (X1 ⊗ EX2 ⊗ X3 ) ,
−1
and K−1
(132) = Id1 ⊗ Kd3 •d2 = Id1 ⊗ Kd3 •d2 = Id1 ⊗ Kd2 •d3 , i.e. the T-
product E (X1 ⊗ X3 ) ⊗ EX2 will be rearranged to the original order (1, 2, 3). Now
one can easily show that
⊗
κ⊗
1:3 = E (Xi − EXi ) . (3.32)
i=1:3
In the case of Xi = X
κ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = μX,3 − L12 ,11 μX,2 ⊗ μX + 2μX ; (3.33)
where
L−1 −1 −1
12 ,11 = Id 3 + K(132) + K(231) ,
κ⊗
X,3 = E (X − EX)
⊗3
.
of formula (3.30) sums the product of expectations with respect to the partition
matrices L ↔ U in a given order |U | = r. This means that U has r rows, actually
the corresponding partitions L have exactly r blocks, (none of which is empty).
* * that for a partition L = {b1 , b2 , . . . , br } ∈ Pn , with box-
We saw in Sect. 1.4.4
cardinality kj = *bj *, we can consider its type = (1 , . . . , n ). We recall that
the type of the partition K is (1 , . . . , n ), so that K contains exactly j blocs with
cardinality j . The number of all partitions with type = (1 , . . . , n ) is the number
of terms in the sum (3.34) which is
n!
n−r+1 ,
j =1 j ! (j !)j
⊗uj
see (1.41), p. 32. The orders of the expected values in the product ⊗ j =1:r EX1:n
correspond to the type of the partition; therefore, it is reasonable to split the sum
(3.34) further up with respect to the different types of partitions.
Therefore, equating the variables of the list X1:n to X, say, and using the
symmetrizer Sd1n and symmetry equivalent equivalence, we obtain an expression
similar to (3.27)
⊗
n
n!
κ⊗
X,n = (−1)r−1 (r − 1)! n r μ⊗ (3.35)
j =1 j ! k
j =1 j ! j =1:r X,kj
r=1 j =r,j j =n
n ⊗
1 1 ⊗ j
= n! (−1)r−1 (r − 1)! μX,j ,
j =1:(n−r+1) j ! j!
r=1 j =r,j j =n
where kj denote the cardinalities of the blocks, as usual (see Definition 1.2, p. 16
of symmetry equivalence). Here one can apply the incomplete Bell polynomials to
show the structure of partitions with size r and type .
Remark 3.4 It is worth noting that in expression (3.29) we can use the canonical
form for partitions which defines the commutator matrix K−1 p(K{r} )
and the order of
⊗
the tensor product j =1:r μ⊗ ⊗
bj , as well as the order of vectors inside μbj . If we set
the components of X1:n to be equal, then the order of vectors inside μ⊗
bj will have no
⊗ ⊗
importance. The T-products j =1:r μbj will still be different, even if we fix a type
3.4 Expressions between Moments and Cumulants 135
In case all the variables are the same, this becomes L−1
13 ,11 μ ⊗
X,3 ⊗ μ⊗
X , where
the matrix is
L−1 −1 −1 −1
13 ,11 = Id 4 +K(1243) + K(1342) + K(2341) .
case, we have
μ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
1,2 ⊗ μ3,4 + K(1324) μ1,3 ⊗ μ2,4 + K(1423) μ1,4 ⊗ μ2,3 .
Observe that commutators replace the indices into the original order. In case all
the variables are the same this becomes L−1 ⊗2
22 μX,2 , where
L−1 −1 −1
22 = Id 4 + K(1324) + K(1423) ;
μ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
1 ⊗ μ2 ⊗ μ3,4 + μ1 ⊗ μ2,3 ⊗ μ4 + K(1243) μ1 ⊗ μ2,4 ⊗ μ3
+ μ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
1,2 ⊗ μ3 ⊗ μ4 + K(1324) μ1,3 ⊗ μ2 ⊗ μ4 + K(1423) μ1,4 ⊗ μ2 ⊗ μ3 .
In case all variables are the same this becomes L−1 ⊗ ⊗2
12 ,21 μX,2 ⊗ μX , with
L−1 −1 −1 −1 −1 −1
12 ,21 = Id 4 + K(1324) + K(1423) + K(2314) + K(2413) + K(3412) .
Now Cum4 (X1:4 ) is the sum of the above expressions multiplied by (−1)r−1 (r −
1)! accordingly
κ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗
1:4 = μ1:4 − μ1,2,3 ⊗ μ4 − K(1243) μ1,2,4 ⊗ μ3
− K−1
(1342) μ⊗
1,3,4 ⊗ μ ⊗ ⊗ ⊗
2 − μ1 ⊗ μ2,3,4
− μ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
1,2 ⊗ μ3,4 − K(1324) μ1,3 ⊗ μ2,4 − K(1423) μ1,4 ⊗ μ2,3
+ 2μ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
1 ⊗ μ2 ⊗ μ3,4 + 2μ1 ⊗ μ2,3 ⊗ μ4 + 2K(1243) μ1 ⊗ μ2,4 ⊗ μ3
+ 2μ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
1,2 ⊗ μ3 ⊗ μ4 + 2K(1324) μ1,3 ⊗ μ2 ⊗ μ4
+ 2K−1 ⊗ ⊗ ⊗
(1423)μ1,4 ⊗ μ2 ⊗ μ3
− 6μ⊗ ⊗ ⊗ ⊗
1 ⊗ μ2 ⊗ μ3 ⊗ μ4 .
It should be noted that each term depends on the dimensions d1:4 of X1:4 as well. In
case all the variables are the same X, we have
κ⊗X,4 = μ ⊗
X,4 − L −1
13 ,11 μ⊗
X,3 ⊗ μ⊗
X − L−1 ⊗2
μ
22 X,2 + 2L−1
12 ,21 μ⊗
X,2 ⊗ μ⊗2
X − 6μ⊗4
X .
κ⊗ ⊗ ⊗ ⊗ ⊗2 ⊗ ⊗2 ⊗4
X,4 = μX,4 − 4μX,3 ⊗ μX − 3μX,2 + 12μX,2 ⊗ μX − 6μX .
Further, assuming μ⊗
X = 0, we get
κ⊗ ⊗ −1 ⊗2 ⊗ ⊗2
X,4 = μX,4 − L22 μX,2 = μX,4 − 3μX,2 . (3.36)
3.4 Expressions between Moments and Cumulants 137
shows that besides the incomplete Bell polynomial B5,1 (x1:5 ) = x5 the only term
which does not contain x1 is 10x2x3 . This latter one is a term of the incomplete Bell
polynomial B5,2 (x1 , . . . , x4 ) = 5x1 x4 + 10x2x3 . Therefore, the cumulant κ ⊗
X,5 has
the form
κ⊗ X,5 = μ ⊗
X,5 − L−1
13 ,12 μ⊗
X,3 ⊗ μ⊗
X,2 ,
L−1 −1 −1 −1 −1 −1
13 ,12 = Id 5 + K(12435) + K(12534) + K(13425) + K(13524) + K(14523) (3.37)
+ K−1 −1 −1 −1
(23415) + K(23514) + K(24513) + K(34512).
Using the 5-symmetry and Sd15 , we obtain the symmetry equivalent form
κ⊗ ⊗ ⊗ ⊗
X,5 = μX,5 − 10μX,2 ⊗ μX,3 .
We can get this formula directly from the Bell polynomial B5 (x1:5 ) as well. Using
Sd16 on the cumulant κ ⊗
X,6 of a centered X we see, it is symmetry equivalent to
κ⊗ ⊗ ⊗ ⊗ ⊗2 ⊗3
X,6 = μX,6 − 15μX,2 ⊗ μX,4 − 10μX,3 + 30μX,2 .
The distinct values principle is applied and we consider the moment EX1n as the
k
general case because higher-order mixed moments EY1:r1:r can be put into the form
EX1n , where n = k1:r and
X = [ Y1 , . . . , Y1 , . . . , Yr , . . . , Yr ] = Y1k1 , . . . , Yr1kr .
+ ,- . + ,- .
k1 kr
Now EX1n = μ(1:n) , κXbj = Cum Xbj
μ(1:n) = κ Xb j , (3.38)
K∈Pn bj ∈K
Now in particular
Recall here the notation κ3 = Cum (X3 ), whilst κX,3 = Cum3 (X).
In order to obtain cumulants of the products of random variables in the next
section, we need the following result which relates moments to cumulants. Here we
express joint moments in terms of cumulants. The result is similar to the result in
Theorem 3.1.
Theorem 3.3 (Moments in Terms of Cumulants) Take random variables X1:n ,
then
n r
μ(1:n) = κX,bj , (3.40)
j =1
r=1 K{r} ∈Pn
where K{r} = {b1 , b2 , . . . , br } is a partition of the set (1, 2, . . . , n), Xbj denotes
the subset of random variables [X1 , X2 , . . . , Xn ] such that Xbj = Xi , i ∈ bj ,
and the summation is over all such partitions.
3.4 Expressions between Moments and Cumulants 139
r
n−r+1
κX,bj = j
κX,j .
j =1
K∈Pn K∈Pn j =1
n
n−r+1 j
1 xj
Bn (x1:n ) = n!
j ! j!
r=1 j =r,j j =n j =1
(see ( 1.44)), where the summation over all partitions K ∈ Pn is split into two parts.
First we sum up for partitions with a given order r, then for all r. Hence we can
rewrite (3.40) as
n
n−r+1 j
1 κX,j
μX,n = n! .
j ! j!
r=1 j =r,j j =n j =1
We conclude that μX,n = EXn can be expressed by the help of Bell polynomials
replacing the unknowns xj by cumulants κX,j = Cumj (X); therefore, we can use
the Bell polynomials to directly express the moments in terms of cumulants μX,n =
Bn κX,1 , . . . , κX,n .
Example 3.18 The Bell polynomial of degree 5 is
+ 10κX,2κX,1
3
+ κX,1
5
= B5 κX,1 , . . . , κX,n .
140 3 T-Moments and T-Cumulants
Observe the convention for the order of the terms and variables inside a term: in
the polynomial expression the alphabetical order of the indices is used, while the
Kendall–Stuart notation when expressing moments in terms of cumulants, puts the
higher-order cumulants first, etc., which results in almost the opposite order as in
the polynomials.
From Faà di Bruno’s formula for general mixed derivatives, we get the Theo-
rem 2.1, p. 70 which gives the formula for higher-order derivatives of compound
functions and we directly obtain the formula for higher-order cumulants
|n|
K
1 κjk k
μ(n) = n! (3.41)
k ! j k !
r=1 p(r,,j) k=1
n = (n1 , . . . , nd ), |n| = nk , K = (nk + 1) − 1, 0 < jk = jk,1 , . . . , jk,d ≤
n, k = 1 : K, 0 ≤ k , k = 1 : K, where the set p (r, ,j) =
p (r, (1 , . . . , K ) , (j1 , . . . , jK )) of (1 , . . . , K ), (j1 , . . . , jK ) is defined by the
constraint
K
jk,s k = ns , s = 1 : d,
k=1
K
k = r.
j =1
Observe that the left-hand side of (3.40) is the product of variables contained in
On which is the largest partition with one block b = {(1 : n)}, and the right-hand
side is the sum with respect to all partitions K ∈ Pn . Actually, they are all finer than
O, i.e. O K.
E (X1 X2 · · · Xn ) = E X1:n .
⊗k
We consider the moment EX⊗1 n
1:n as the general case because the moment EY1:r
1:r
⊗1n
can be put into the form EX1:n , where
X1:n = Y1k1 , . . . , Yr1kr = [Y1 , . . . , Y1 , . . . , Yr , . . . , Yr ],
+ ,- . + ,- .
k1 kr
μ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1:3 = κ 1:3 + K(321) κ 2,3 ⊗ κ 1 + K(132) κ 1,3 ⊗ κ 2 + κ 1,2 ⊗ κ 3 + κ 1 ⊗ κ 2 ⊗ κ 3 .
(3.44)
μ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = κ X,3 + K(132) κ X,2 ⊗ κ X,1 + κ X,1 ⊗ κ X,2 + K(231) κ X,1 ⊗ κ X,2 + κ X,1 .
μ⊗ ⊗ ⊗ ⊗ ⊗3
X,3 = κ X,3 + 3κ X,2 ⊗ κ X,1 + κ X,1 .
Note that κ ⊗ ⊗
X,1 = μX .
μ⊗ ⊗ −1 ⊗2
X,4 = κ X,4 + L22 κ X,2 ,
where
L−1
22 =
−1
Kp(K) (d) ,
K
and where partition K = {b1 , b2 }, and both blocks bi contain two elements.
These are K1 = {(1,
2) , (3, 4)}, K2 = {(1, 3) , (2, 4)}, K3 = {(1, 4) , (2, 3)}.
Permutations p Kj are the unique permutation with respect to partition Kj , (see
Sect. 1.4.4 for the connection between partitions and permutations). Therefore,
L−1 −1 −1
22 = Id 4 + K(1324) (d) + K(1423) (d) ,
and
−1 ⊗2
μ⊗ ⊗ ⊗ ⊗2
X,4 = κ X,4 + L22 κ X,2 = κ X,4 + 3κ X,2 ,
by Sd14 .
In particular when Xj = X
⊗
μ⊗
X,n = K−1
p(K) κ⊗
X,|bj |
, (3.45)
bj ∈K
K∈Pn
where the summation is over all partitions K = {b1 , b2 , . . . , bk*} of* 1 : n. We repeat
the notation κ ⊗
X,|bj |
= Cum|bj | (X) for the T-cumulant of order *bj * having identical
vector variates X in it, and for a list of vector variates Xbj with possibly different
* *
dimensions the T-cumulant κ ⊗ * *
X,bj = Cum(Xbj ) is of order bj .
The nth-order moment μ⊗ X,n is symmetric; therefore, we can apply the sym-
metrizer in a general case as well. Collecting the partitions with the same size and
type and using Bell polynomials and symmetrizer Sd1n , we arrive at the formula
n n! ⊗ ⊗
μ⊗
X,n = n κ X,jj , (3.46)
j =1 j ! (j !)
j j =1:n−r+1
r=1 j =r,j j =n
3.4 Expressions between Moments and Cumulants 143
n ⊗ ⊗l
μ⊗
X,n = L−1
lr:1 κ X,jj ,
j =1:r
r=1 j =r,j j =n
see Sect. 2.4.3, p. 2.4.3 for correspondence between 1:n and lr:1 .
Example 3.21 If κ ⊗ ⊗
X,1 = μX = 0, then by formula (3.46) we have
μ⊗ ⊗ ⊗ ⊗ ⊗2 ⊗3
X,6 = κ X,6 + 15κ X,4 ⊗ κ X,2 + 10κ X,3 + 15κ X,2 ,
μ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗2 −1 ⊗3
X,6 = κ X,6 + L14 ,12 κ X,4 ⊗ κ X,2 + L23 , κ X,3 + L32 , κ X,2 .
Let us start with solving the problem of finding Cum (X1 , X2 X3 ), in terms of
products of joint cumulants of X1:3 ∈ R3 . We can express Cum (X1 , X2 X3 ) in
terms of moments and then the moments in terms of cumulants,
The point actually is that the moments are taken as the products of the variables
given in the cumulant, then we can turn the moments into individual cumulants.
Therefore, the cumulant of the products can be expressed with the products of
cumulants of the variables themselves. Moreover the cumulant of X1 and X2 X3
can be considered as the cumulant of products with respect to a partition of all the
variables X1:3 considered. The random variables XL = (X1 , X2 X3 ) correspond to
the partition L = {(1), (2, 3)}. The result we got above is a sum of the products
of cumulants each term of which also corresponds to a partition of the set (1, 2, 3).
They are K1 = {(1, 2, 3)} , K2 = {(1, 3), (2)} and K3 = {(1, 2), (3)}.
K2 K3
hooked through (2,3)∈L hooked through (2,3)∈L
- .+ , - .+ ,
(2, 3) , (2) (1, 2) , (3)
+ ,- . + ,- .
communicate communicate
The main feature of these partitions is that there is no other partition but the largest
one (1, 2, 3) which can be combined by the help of the elements, i.e. blocks, of
both L and Kj at the same time. All the indecomposable partitions of (1, 2, 3)
with respect to the partition L = {(1), (2, 3)} are K1 = {(1, 2, 3)} , K2 =
{(1, 3), (2)} and K3 = {(1, 2), (3)}, and all of them show up in the right-hand side.
Let XL denote the vector of entries taken by the partition L, i.e. if L =
{b1 , b2 , . . . , bk } then
⎡ ⎤
XL = Xb1 , Xb2 , . . . , Xbk = ⎣ Xj , Xj , . . . , Xj ⎦ .
j ∈b1 j ∈b2 j ∈bk
Let us call L the initial partition. Similarly let K denote a new partition. We say the
new partition K is indecomposable with respect to the partition L if all blocs in K
communicate with respect to L (see Sect. 1.4.6, p. 38 for details). Then we have
Theorem 3.6 (Malyshev’s Formula) Let the initial
partition L be {b1 , b2 ,
. . . , bk }. The cumulant of the products Xb1 , Xb2 , . . . , Xbk can be
expressed as the cumulants of the subsets of the individual variables Xb , b ∈ K,
⎛ ⎞
Cum ⎝ Xj , Xj , . . . , Xj ⎠ = Cum(Xb ), (3.48)
j ∈b1 j ∈b2 j ∈bk K∪L=O b∈K
where Xb denotes the vector containing the items Xs , s ∈ b, and the summation is
over all partitions K such that K and L are indecomposable (K ∪ L = O).
Proof See Appendix 3.6.1, p. 169.
3.4 Expressions between Moments and Cumulants 145
There are several instances where we need to calculate the cumulants of products
of random variables, such as Cum(X1 X2 , X3 X4 X5 ), etc. This will become apparent
when we study higher-order moments and cumulants for distributions.
Let us consider some examples to illustrate the above ideas.
Example 3.23 Let (X1 , X2 , X3 ) be three random variables. Let us compute
Cum (X1 X2 , X3 ) . We have from (3.47)
Here the initial partition L consists of the set L = {(1, 2) , (3)} , and the partitioning
sets given in the right-hand sides expressions of (3.49) are K1 = {(1, 2, 3)} ,
K2 = {(1, 3) , (2)} , K3 = {(1) , (2, 3)} . The first partition given by the set K1
contains only one block b1 = (1, 2, 3) , the second partition given by the set K2
has two blocks b1 = (1, 3) ; b2 = (2), and the third partition given by the K3
contains two blocks b1 = (1) , b2 = (2, 3) . The partition K1 is indecomposable
as it has only one set with all the elements. In fact any single set with all the
elements (1, 2, . . . , n) is always indecomposable. Now consider K2 , with the blocks
b1 = (1, 3) , b2 = (2) . We note the block (1, 2) of L contains elements 1, 2 which
are in the blocks b1 and b2 . Hence the sets b1 and b2 hook, and they communicate
as well. (If a partition has only two blocks and if they hook with respect to L, they
obviously communicate.) Therefore, K2 is an indecomposable partition with respect
to L. Now consider K3 , where b1 = (1) and b2 = (2, 3). We can see that b1 and b2
hook and also communicate. Hence K3 is an indecomposable partition with respect
to L.
Example 3.24 Let (X1 , X2 , X3 , X4 ) be four random variables, and let us calculate
Cum(X1 , X2 , X3 X4 ). For convenience, let us assume EXi = 0, (i = 1, 2, 3, 4) .
We have from (3.93), p. 174,
Here the initial partition is L = {(1) , (2) , (3, 4)}, and the partitions K1 , K2 , and
K3 are
K1 K2 K3
hooked through (3,4)∈L hooked through (3,4)∈L
- .+ , - .+ ,
{(1, 2, 3, 4)} (1, 3) , (2, 4) (1, 4) , (2, 3)
+ ,- . + ,- .
communicate communicate
Partition K1 is indecomposable, as it contains only one set with all the elements
(1, 2, 3, 4). Partition K2 has two blocks b1 = (1, 3) , b2 = (2, 4) .The block
(3, 4) of L has the elements 3, 4 which are in b1 and b2 . Hence b1 and b2 hook,
and communicate. Therefore, K2 is an indecomposable partition with respect to
146 3 T-Moments and T-Cumulants
the initial partition L. Similarly we can see the partition K3 = {(1, 4) , (2, 3)} is
indecomposable partition with respect to L.
Now consider another example
Example 3.25 Consider now the partition L = {(1, 2) , (3, 4)} for random variables
X1:4 . We have listed all indecomposable partitions with respect to L in Exam-
ple 1.26, p. 39. Using those we have
If EXj = 0, then
The scalar version of Cum2 (X1 , X2 ⊗ X3 ) has been treated before at the beginning
of the previous section. To obtain this result, we need the expressions of cumulants
in terms of moments (3.29), and we obtain
κ⊗ ⊗ ⊗ ⊗
X1 ,X2 ⊗X3 = μ1,2,3 − μ1 ⊗ μ2,3 (3.50)
μ⊗ ⊗ ⊗ ⊗
1,2 = κ 1,2 + κ 1 ⊗ κ 2 (3.51)
3.4 Expressions between Moments and Cumulants 147
and
μ⊗
1,2,3 = κ ⊗
X1 ,X2 ⊗X3 + κ ⊗
X1 ⊗ κ ⊗
X2 ⊗X3 ,1 = κ ⊗
X1 ,X2 ⊗X3 + κ ⊗
X1 ⊗ κ ⊗
2,3 + κ ⊗
X2 ⊗ κ ⊗
X3 ,
(3.52)
since κ ⊗ ⊗ ⊗ ⊗
X2 ⊗X3 ,1 = μ2,3 . From now on, we apply our usual notation as κ Xj = κ j
etc. Substituting the moments by cumulants in (3.50) we obtain
κ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗
X1 ,X2 ⊗X3 = κ 1:3 + κ 1 ⊗ κ 2,3 + K(132) κ 1,3 ⊗ κ 2 + κ 1,2 ⊗ κ 3
+ κ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1 ⊗ κ 2 ⊗ κ 3 − κ 1 ⊗ κ 2,3 − κ 1 ⊗ κ 2 ⊗ κ 3
= κ⊗ −1 ⊗ ⊗ ⊗ ⊗
1:3 + K(132) κ 1,3 ⊗ κ 2 + κ 1 ⊗ κ 2,3 . (3.53)
EX1 ⊗ X2 ⊗ Y = κ ⊗ ⊗ ⊗ −1 ⊗ ⊗
X1 ,X2 ,Y + κ X1 ⊗ κ X2 ,Y + K(132) κ X1 ,Y ⊗ κ 2
+ κ⊗ ⊗ ⊗ ⊗ ⊗
1,2 ⊗ κ X1 ,Y + κ 1 ⊗ κ 2 ⊗ κ X1 ,Y . (3.54)
But using the result in (3.53) (after changing the variables), we get
κ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗
X2 ,Y = κ X2 ,X3 ⊗X4 = κ 2,3,4 + K(132) κ 2,4 ⊗ κ 3 + κ 2,3 ⊗ κ 4 .
κ⊗ ⊗ ⊗ ⊗ ⊗
Y,1 = μ3,4 = κ 3,4 + κ 3 ⊗ κ 4
+κ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1,2 ⊗ κ 3,4 + κ 3 ⊗ κ 4 + κ 1 ⊗ κ 2 ⊗ κ 3,4 + κ 3 ⊗ κ 4 . (3.55)
148 3 T-Moments and T-Cumulants
Except for the first term in the right-hand side of (3.55), all the partitions are
decomposable with respect to the initial partition L = {(1) , (2) , (3, 4)}. Let us
carefully look at all the terms of the right-hand side of expression (3.55). Each one
of the last four indecomposable partitions,contains two blocks, and each block of
each of these partitions contains only one element from the block (3, 4) of L. If
there is a block (within a partition with more than one block) that contains both
elements 3 and 4, that partition will be decomposable. This observation, which is
similar to the one we made when calculating Cum2 (X1 , X2 ⊗ X3 ), is crucial for our
evaluation of the general result which will be stated later in Lemma 3.6.
The blocks of any partition (except for the first partition) do not hook (and do not
communicate) with the initial partition L. Now we can express EX1 ⊗X2 ⊗Y =μ⊗ 1:4
by (3.43) as
⊗
μ⊗
1:4 = K−1
p(K) κ⊗
X,bj , (3.56)
bj ∈K
K∈P4
with all the partitions on right-hand side. Equating (3.55) and (3.55), and canceling
the common terms, we obtain
κ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
X1 ,X2 ,X3 ⊗X4 = κ 1:4 + κ 1,2,3 ⊗ κ 4 + K(1243) κ 1,2,4 ⊗ κ 3 + K(1423) κ 1,4 ⊗ κ 2,3
+ K−1 ⊗ ⊗
(1323) κ 1,3 ⊗ κ 2,4 . (3.57)
All the above partitions are indecomposable with respect to the initial partition L =
{(1) , (2) , (3, 4)} .
Now we state the analogous result to the scalar case, using commutator matrices.
It has been shown, see Sect. 1.4.4, p. 35, that for each partition K, in canonical form,
there is a unique matrix representation and there corresponds a unique permutation
p (K). This means that the elements of the blocks b ∈ K and the blocks in K appear
alphabetical order and it is fixed.
Theorem 3.7 (Malyshev’s Formula) Let X1:n denote a list of random vectors.
Now consider the list of products taken by an initial partition L = (b1 , b2 , . . . , br ):
⊗ ⊗ ⊗
XL = Xb1 , Xb2 , . . . , Xbr .
The cumulant of the products XL with respect to the partition L can be expressed
with the products of cumulants of the block of individual variables such that L and
K are indecomposable partitions, K ∪ L = O, i.e.
⊗ ⊗ ⊗ ⊗
−1
Cumk Xb1 , Xb2 , . . . , Xbr = Kp(K) db1:p Cum|b| (Xb ),
b∈K
K∪L=O
(3.58)
3.5 Additional Matters 149
⊗
⊗ Xb denotes the set of vectors containing the
items Xs , s ∈ b,
where Xb =
X
j ∈b j , and the order of the dimensions d b1:p = d b1 , ..., d bp follows the order
of variables in XL . With a shorter notation
⊗
κ⊗
XL =
−1
Kp(K) κ⊗
X,b , (3.59)
b∈K
K∪L=O
where again the blocks in all partitions and the entries inside the blocks must be in
the same order as in the product, say canonical order (see Definition 1.7, p. 35).
Proof See Proof 3.6.1.
Example 3.26 To compute Cum (X1 ⊗ X2 , X3 ⊗ X4 ) we need to use Cum (X1 , X2 ,
X3 ⊗ X4 ), and lower-order cumulants such as Cum (X1 , X3 ⊗ X4 ), Cum (X2 , X3
⊗X4 ), given by (3.53) and (3.57). After substitution, we have
κ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗
X1 ⊗X2 ,X3 ⊗X4 = κ 1:4 + κ 1,2,3 ⊗ κ 4 + K(1243) κ 1,2,4 ⊗ κ 3
+K−1 ⊗ ⊗ −1 ⊗ ⊗
(1342)κ 1,3,4 ⊗ κ 2 + K(1243) κ 2,3,4 ⊗ κ 1
+K−1 ⊗ ⊗ −1 ⊗ ⊗
(1324)κ 1,3 ⊗ κ 2,4 + K(1423) κ 1,4 ⊗ κ 2,3
+K−1 ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
(1324)κ 1,3 ⊗ κ 2 ⊗ κ 4 + K(1423) κ 1,4 ⊗ κ 2 ⊗ κ 3
+K−1 ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
(2314)κ 2,3 ⊗ κ 1 ⊗ κ 4 + K(2413) κ 2,4 ⊗ κ 1 ⊗ κ 3 .
Here the initial partition is L = {(1, 2) , (3, 4)}, and all the above partitions are
indecomposable. If we assume that the means are zero, then
κ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
X1 ⊗X2 ,X3 ⊗X4 = κ 1:4 + K(1423) κ 1,4 ⊗ κ 2,3 + K(1324) κ 1,3 ⊗ κ 2,4 . (3.60)
In the previous sections, we have seen expressions for cumulants via moments and
vice versa. The use of the formula for expressing joint cumulants and moments in
terms of mixed products of preceding moments and cumulants will be apparent in
later sections.
Let X ∈ Rd , and put λ1:d = λ. The characteristic function φX (λ) of X can be
expressed with the cumulant function ψX (λ) by the equation φ = exp (ψ), hence
150 3 T-Moments and T-Cumulants
μ⊗ ⊗
X = κX.
and note that on the right-hand side we have the product of derivatives of φ and ψ,
hence the second-order moment involves the second cumulant and the product of
first moment and cumulant
μ⊗ ⊗ ⊗ ⊗
X,2 = κ X,2 + κ X ⊗ μX .
Again μ⊗X,3 is expressed with the help of preceding moments and cumulants. The
fourth-order case is obtained similarly
= φDλ⊗4 ψ + Id 4 + K−1(1243) + K −1
(1342) D ⊗3
λ ψ ⊗ D ⊗
λ φ
+ Id 4 + K−1 −1 ⊗2 ⊗2 ⊗
(1423) + K(1324) Dλ ψ ⊗ Dλ φ + Dλ ψ ⊗ Dλ φ.
⊗3
We note that the fourth-order derivatives of both the characteristic and cumulant
functions are 4-symmetric and if we rearrange the above equation so that the
difference of the fourth-order moment and cumulant is on the left-hand side then
3.5 Additional Matters 151
+ Id 4 + K−1(1423) + K −1
(1324) κ ⊗
X,2 ⊗ μ ⊗ ⊗ ⊗
X,2 + κ X ⊗ μX,3
3
3 ⊗
= κ ⊗ μ⊗
X,j ,
j X,4−j
j =1
(see Definition 1.2, p. 16 of symmetry equivalence). Lemma 2.7, p. 87, and the
Exercise 2.23, (2.66) provide us the general expression for higher-order derivatives
n−1
n − 1 ⊗(n−j ) ⊗j
Dλ⊗n φX − φX Dλ⊗n ψX = Dλ ψX ⊗ Dλ φX ,
j
j =1
n−1
n−1
n−1 n−1 ⊗
μ⊗ ⊗
X,n − κ X,n = κ⊗ ⊗
X,n−j ⊗ μX,j = κ ⊗ μ⊗
X,n−j ,
j j − 1 X,j
j =1 j =1
(3.61)
by symmetrizer Sd1n .
If X is centralized so that μ⊗ ⊗
X = κ X = 0, then the right-hand side of (3.61) is
zero at both j = 1 and j = n − 1, which implies that μ⊗ ⊗
X,n = κ X,n , n = 2, 3, and
the limits of the summation are 2 and n − 2, respectively.
Cumulants can also be expressed in terms of moments using the Fourier transform.
Consider a random variable X, and let X = [X1 , X2 , . . . , Xn ] be independent copies
of X. Denote the Fourier frequencies by
ωk = 2πk/n, k = 0, 1, 2, . . . , n − 1,
0 ≤ ωk < 2π, define the vector Fn = 1, eiω1 , . . . , eiωn−1 , and then we show
that
1 n
Cumn (X) = E Fn X . (3.62)
n
This formula looks beneficial for estimating the cumulants when the sample size is
large, and it works for cumulants with higher order. The Fast Fourier transform is an
152 3 T-Moments and T-Cumulants
efficient tool. In practice one can slice a sample into parts with length of the order
of cumulants to be estimated and then average the estimations based on these slices.
However, sliced-estimations are so biased that several problems can occur such as
complex values, etc.
Let us take an example when n = 3. Then ωk = 2πk/3, and the expected value
3 3
E F3 X = E X1 + X2 eiω1 + X3 eiω2
= 3EX3 + 9EXEX2 eiω1 + eiω2 + 6 (EX)3
= 3μ3 + 9μ1 μ2 eiω1 + eiω2 + 6μ31 ,
n
vecX1:n = e k ⊗ Xk ,
k=1
where ek are the coordinate unit vectors of Rn . The independence of the components
implies that the cumulants of the sum is the sum of the cumulants, in particular we
have
n
Cum1 (vecX1:n ) = ek ⊗ κ ⊗ ⊗
X,1 = 1n ⊗ κ X,1 ,
k=1
and
n
Cum2 (vecX1:n ) = e⊗2
k ⊗ κ⊗ ⊗
X,2 = vec (I2 ) ⊗ κ X,2 .
k=1
n
3
X= Xk eiωk−1 = X1:n Fn = Fn ⊗ Id vecX1:n , (3.63)
k=1
⊗ ⊗3 ⊗3
μ3
X,3
= F3 ⊗ Id Cum3 (vecX1:3 ) + L−1
12 ,11 F3 ⊗ Id
⊗j
⊗j
n
j n−1
Fn ek = F n ek = ei2πj k/n = nδj n . (3.64)
k=1 k=0
then we shall prove the general vector-valued version of 3.62, see Appendix,
Sect. 3.6.3.
1 3⊗n
κ⊗
X,n = EX .
n
154 3 T-Moments and T-Cumulants
Noting the linearity of the conditional moments, we find the coefficient of λk is
the higher-order conditional moment μX,k|Y = E Xk |Y . Now the conditional
cumulants of X under variate Y is defined by the log of the conditional characteristic
function, namely by the conditional cumulant function ψX (λ|Y ) = log φX (λ|Y ).
Conditional cumulants are the coefficients of ψX (λ|Y ) in the series expansion
∞ k
i
ψX (λ|Y ) = Cumk (X|Y ) λk .
k!
k=0
If X and Y are independent, then neither conditional moments nor cumulants depend
on the condition, since the conditional expectation of exp (iλX) does not depend on
Y.
Property 3.5 Assume, the necessary moments exist
1. If X and Y are independent, then E (X|Y ) = E (X) .
2. E (g (Y ) X|Y ) = g (Y ) E (X|Y ).
From these properties of conditional moments, the next property of conditional
cumulant follows directly.
Property 3.6 Assume, the necessary moments exist
1. If X and Y are independent, then ECumk (X|Y ) = Cumk (X).
2. Cumk (g (Y ) X|Y ) = g k (Y ) Cumk (X|Y ).
The definitions above allow us to use all formulas of the preceding sections for
moments and cumulants. For instance
Cum2 (X|Y ) = E X2 |Y − E (X|Y ) E (X|Y ) ,
3.5 Additional Matters 155
n ⊗
κ⊗
1:n|Y = (−1)r−1 (r − 1)! Kp−1K μ⊗ ,
( {r} ) j =1:r (bj )|Y
r=1 K{r} ∈Pn
Setting X1 = X2 = X, which also applies for the variance and Brillinger’s formula
follows, namely
where K = {b1 , b2 , . . . , bk } is a partition of the set {1, 2, . . . , n}, Xbj denotes the
subset of random variables {X1 , X2 , . . . , Xn } such that Xbj = Xi , i ∈ bj , and
the summation is taken over all such partitions. For n = 3, Brillinger’s formula
asserts that
κ(1:3) = Cum κ(1:3)|Y + Cum κ(1,2)|Y , κ3|Y + Cum κ(1,3)|Y , κ2|Y
+ Cum κ(2,3)|Y , κ1|Y + Cum κ1|Y , κ2|Y , κ3|Y .
We might notice the similarity with the conditional expectation, except now
cumulants are summed up with respect to the partitions. In our case the T-cumulant
version of Brillinger’s formula is as follows.
Theorem 3.8 (Brillinger’s Theorem) Let X1:n be a list of random vectors, then
Cum (X1:n ) = K−1
p(K) Cumk Cum Xb1 |Y , Cum Xb2 |Y , . . . , Cum Xbk |Y ,
K∈Pn
(3.67)
where the summation is for all partitions K ∈ Pn , and permutations p (K) assume
the canonical form of K.
The method of proof follows that of the derivation of formula (3.65) above,
namely first expressing the cumulants in terms of moments, then the moments
in terms of conditional ones, replacing the conditional moments with conditional
cumulants, and finally expressing moments via cumulants—see Sect. 3.6.2 for the
proof in a particular case.
The distinct value principle is valid so Brillinger’s formula is simplified for a
list of variables X1:n when all of them are the same X, say. Recall that the type
= [1 , . . . , n ] of a partition K{r} = {bk } with size r, means that there are exactly
j blocks among bk with cardinality j . If the elements of the list X1:n are the same,
3.5 Additional Matters 157
* *
then the cumulants Cum Xbj |Y depend on the cardinality *bj * of a block bj only,
and Cumn (X) is symmetric, hence applying the symmetrizer Sd1n , we have
Cumn (X) = Cumk Cum|b1 | (X|Y) , Cum|b2 | (X|Y) , . . . , Cum|bk | (X|Y)
K∈Pn
n n−r+1
1
= n! Cumr
r=1 j =r, j =1
j ! (j !)j
jj =n
× Cum1 (X|Y)1:1 , . . . , Cumn−r+1 (X|Y)1:n−r+1 , (3.68)
+K−1 ⊗ ⊗ ⊗ ⊗ ⊗
(231) Cum2 κ 2,3|Y , κ 1|Y + Cum3 κ 1|Y , κ 2|Y , κ 3|Y ,
therefore,
κ⊗ ⊗ −1 ⊗ ⊗ ⊗
X,3 = Cum1 κ X,3|Y + L12 ,11 Cum2 κ X,2|Y , κ X,1|Y + Cum3 κ X,1|Y
where the second term does not depend on λk , hence all derivatives that include
∂ n /∂λnk , n > 1, is zero. This implies the following general result.
Lemma 3.3 Let X1:n , n > 1, be a list of random variables, then Cum (X1:n |Xk ) =
0, 1 ≤ k ≤ n.
An application of Brillinger’s formula simplifies the cumulant of product of
independent variates expressed in terms of cumulants of individual variates.
Let (X1 , X2 ), and (Y1 , Y2 ) be independent and consider
κX1 Y1 ,X2 Y2 = Cum κX1 Y1 ,X2 Y2 |Y1,2 + Cum κX1 Y1 |Y1,2 , κX2 Y2 |Y1,2
= Cum Y1 Y2 κX1 ,X2 |Y1,2 + Cum Y1 κX1 |Y1,2 , Y2 κX2 |Y1,2
= κX1 ,X2 κY1 Y2 ,1 + κX1 κX2 κY1 ,Y2 . (3.69)
The left-hand side is symmetric in X and Y what we can achieve in the right-hand
side as well
κX1 Y1 ,X2 Y2 = κX1 ,X2 κY1 ,Y2 + κX1 ,X2 κY1 κY2 + κX1 κX2 κY1 ,Y2 .
Take independent random variables X and Y , then the Brillinger’s formula takes
the form as a special case of the above
More
generally assume X1:n and Y1:n are independent random variates, denote
Yb = j ∈b Yj , then we have
Cumn (X1 Y1 , . . . , Xn Yn ) = Cum (Yb , b ∈ K) Cum|b| (Xb ) . (3.70)
K∈Pn b∈K
n
n
2 1 κX,j j
Cumn (XY ) = n! Cumr Y{1 } , Y{ 2}
, . . . , Y n
{n } ,
j ! j!
r=1 j =r, j =1
jj =n
(3.71)
3.5 Additional Matters 159
j
where Y{ } corresponds to the block with cardinality j , which includes the power
j
Y j only (it implies listing Y j consecutively j times).
Example 3.29 Let X and Y be independent then we apply (3.71) for fourth-order
cumulant of the product κXY,4 , (see either Example 3.15 for the types of partitions
according to this case, or Bell polynomials, Sect. A.1, p. 351 ).
1. If r = 1, we have one block with 4 elements; therefore, the contribution of the
corresponding term to κXY,4 is κY 4 ,1 κX,4 .
2. If r = 2, we have two types: (a) = (1, 0, 1, 0) and the corresponding term is
4κX,3 κX,1 κY 3 ,Y . (b) = (0, 2, 0, 0), which implies 3κX,2
2 κ
Y 2 ,2 .
3. If r = 3, then = (2, 1, 0, 0), we have 6κX,1 κX,2 κY,Y,Y 2 .
2
Closing this section we use the Brillinger’s formula (3.48) and Malyshev’s
formula for independent variates.
Lemma 3.4 Consider X and Y independent random variables then
1. The 2nd-order cumulant (variance) is
Let us consider the conditional cumulants for jointly Gaussian random variables.
If X and Y are jointly Gaussian, then we know that the conditional expectation
E (X|Y) is linear in Y, namely
where CX,Y and CY,Y denote covariance and variance–covariance matrix, respec-
tively (CY,Y > 0); moreover X−E (X|Y) and Y are uncorrelated hence independent.
The conditional variance
The same argument leads to the general case: If X1:n and Y are jointlyGaussian,
n > 1, then the conditional cumulants do not depend on Y , more precisely
If a list X1:n and Y are jointly Gaussian, n > 1, then we can simply argue that
E (X1:n |Y) is a constant with respect to conditional Y; therefore, we have
Cumn (X1:n |Y) = Cumn (X1:n −E (X1:n |Y) |Y) = Cumn (X−E (X1:n |Y)) = 0,
and X−E (X1:n |Y) is a Gaussian variate that is the result is zero.
The above results can be used to obtain the cumulants of the partial derivatives of
the log-likelihood function. Such expressions are useful in studying the asymptotic
theory in statistics.
Consider a random sample [X1 , X2 , . . . , XN ] = X ∈ RN , with the likelihood
function L (ϑ, X) and let l (ϑ) denote the log-likelihood function, i.e.
l (ϑ) = ln L (ϑ, X) , ϑ ∈ Rd .
The expected value of the left-hand side of the above expression is zero, as we may
change the order of the derivative and the integral, which gives the result (3.73).
The same argument will lead, more generally to several partial derivatives
4 4
Dϑ1:d L (ϑ, X)
Eϑ = Dϑ1:d L (ϑ, x) dx = Dϑ1:d el(ϑ) dx
L (ϑ, X)
d
= Eϑ Dϑb l (ϑ) ,
r=1 K ∈Pd b∈K
|K |=r
where Dϑb = ∂ |b| / j ∈b ∂ϑj ; therefore,
d
Eϑ Dϑb l (ϑ) = 0. (3.74)
r=1 K ∈Pd b∈K
|K |=r
Observe the difference: μ∂ϑ(1,2) and μ∂ϑ1:2 correspond to μ∂ϑb = Eϑ Dϑb l (ϑ) ,with
b = (1, 2), and μ∂ϑ1:2 = μ∂ϑ1 ,ϑ2 = Eϑ Dϑ1 l (ϑ) Dϑ2 l (ϑ), respectively.
Let b be a block of a partition K ∈ Pd−1 then
4
Dϑd Eϑ Dϑb l (ϑ) = Dϑ(b,d) l (ϑ) + Dϑb l (ϑ) Dϑd l (ϑ) el(ϑ) dx,
k
Dϑd μ∂ϑb ,j =1:k = μ∂ϑ b ,d + μ∂ϑd ,ϑb ,j =1:k . (3.76)
j ( i ) ,ϑbj ,j =i j
i=1
For instance if k = 2,
Dϑd μ∂ϑb ,ϑb2 = μ∂ϑ b ,ϑb2 + μ∂ϑb ,ϑ(b ,d ) + μ∂ϑb ,ϑb2 ,ϑd . (3.77)
1 ( 1 ,d ) 1 2 1
Hence the rule for higher-order derivatives of the expected values of the log-
likelihood function follows that for the derivatives of the compound function el(ϑ) .
So we can obtain formula (3.74) from the consecutive derivatives of μ∂ϑ1 , as well.
The first-order cumulant of Dϑ1 l (ϑ) coincides with the moment κϑ∂1 = μ∂ϑ1 .
Now we take the derivative of the cumulant
Dϑ2 κϑ∂1 = Dϑ2 μ∂ϑ1 = μ∂ϑ(1,2) + μ∂ϑ1 ,ϑ2 = κϑ∂(1,2) + κϑ∂1 ,ϑ2 ,
where κϑ∂(1,2) = μ∂ϑ(1,2) , as they are the expected values of Dϑ1 ,ϑ2 l (ϑ), and κϑ∂1 ,ϑ2 =
μ∂ϑ1 ,ϑ2 , since the covariance of Dϑ1 l (ϑ) and Dϑ2 l (ϑ) is equal to the expected value
of their product when μ∂ϑ1 = μ∂ϑ2 = 0. A similar argument leads us to the formula
Dϑ2,3 κϑ∂1 = Dϑ2,3 μ∂ϑ1 = μ∂ϑ(1,2,3) + μ∂ϑ(i,j),ϑ [3] + μ∂ϑ1:3 = κϑ∂(1,2,3) + κϑ∂(i,j),ϑ + κϑ∂1:3 ,
k k
164 3 T-Moments and T-Cumulants
since the third-order central moment is equal to the third-order cumulant, where
μ∂ϑ(i,j),ϑ [3] denotes the sum of three terms with respect to the partitions with type
k
= [1, 1, 0], see (2.5), p 62. Both derivatives Dϑ2 κϑ∂1 and Dϑ2,3 κϑ∂1 suggest that the
derivatives of the cumulants of derivatives of the log-likelihood function follows the
same rule as the moments do. We will show
d
Cum Dϑb l (ϑ) , b ∈ K = 0, (3.78)
r=1 K ∈Pd
|K |=r
d
κϑ∂b ,b∈K = 0.
r=1 K ∈Pd
|K |=r
The idea is to prove the same derivative rule to (3.76) for the cumulants, and then
(3.78) follows.
Lemma 3.5 (Skovgaard) Let L ∈ Pd−1 and bj , j = 1 : k be blocks of L, then
k
Dϑd κϑ∂b ,j =1:k = κϑ∂ b ,d + κϑ∂d ,ϑb ,j =1:k . (3.79)
j ( i ) ,ϑbj ,j =i j
i=1
We have seen that each term on the right-hand side in (3.75) is equal to the
corresponding cumulant term, hence (3.79) holds for k = 1
= κϑ∂ + κϑ∂b ,ϑ + μ∂ϑb ,ϑb ,ϑd − μ∂ϑb ,ϑd μ∂ϑb − μ∂ϑb μ∂ϑb ,ϑd
(b1 ,d ) ,ϑb2 1 (b2 ,d ) 1 2 1 2 1 2
where we have applied the formula of cumulants via moments three times, note that
we have added a zero term μ∂ϑd μ∂ϑb ,ϑb = 0 to the expression
1 2
in order to get κϑ∂b ,ϑb ,ϑd . One can compare this result with (3.77). (See Exer-
1 2
cise 3.32 and the proof of Skovgaard’s Lemma in the Appendix, Sect. 3.6.4.)
Equation (3.74) is in terms of the expected values of the derivatives of the log–
likelihood function, where as (3.78) is in terms of the cumulants.
For example,
suppose we have a single parameter ϑ, then the moment Eϑ b∈K ϑb depends on
j j
the type of the partition K ∈ Pd , i.e. Eϑ b∈K ϑb = Eϑ j Dϑ l (ϑ) . Let us
denote
d
j
j
μ∂d () = Eϑ Dϑ l (ϑ) .
j =1
The distinct values principle applies here again and we use the formula (1.41) p. 32
for the number of partitions of the same type and obtain the single parameter case
of the formula (3.74)
d
d−r+1
d!μ∂d ()
= 0, (3.80)
r=1 j =r, j j =d j =1
j ! (j !)j
where the summation is taken over all possible type = 1:d , fulfilling the
assumptions j ≥ 0, j = r, j j = d. For instance
2 3 4
μ∂4 (1 , 2 , 3 , 4 ) = Eϑ [Dϑ l (ϑ)]1 Dϑ2 l (ϑ) Dϑ3 l (ϑ) Dϑ4 l (ϑ) ,
d
d−r+1
d!κd∂ ()
= 0. (3.82)
r=1 j =r, j j =d j =1
j ! (j !)j
166 3 T-Moments and T-Cumulants
The multivariate extension (when the elements of the parameter vector are also
vectors) of the formula (3.74) can easily be obtained using Faà di Bruno’s
formula
d ⊗
⊗|b|
K−1
p(K) (d1:d ) Eϑ Dϑb l (ϑ) = 0, (3.84)
b∈K
r=1 K ∈Pd
|K |=r
where ϑb denotes a subset of vectors ϑ j , j ∈ b . Now, in particular if d = 2 and
ϑ 1 = ϑ 2 = ϑ then (3.84) gives the well known result
Cov (Dϑ l (ϑ) , Dϑ l (ϑ)) = −Eϑ Dϑ Dϑ l (ϑ) ,
where
⊗ ⊗2
μ∂4 (1 , 2 , 3 , 4 ) = Eϑ Dϑ⊗ l (ϑ) 1 ⊗ Dϑ⊗2 l (ϑ)
⊗3 ⊗4
⊗ Dϑ⊗2 l (ϑ) ⊗ Dϑ⊗2 l (ϑ) .
3.6 Appendix 167
d
⊗|b|
K−1
p(K) (d1:d ) Cumr Dϑb l (ϑ) , b ∈ K = 0.
r=1 K ∈Pd
|K |=r
3.6 Appendix
Here κ ⊗ b1 = Cum|b1 | Xb1 is the cumulant of a subset of the elements
[X1 , X2 , . . . , Xn−2 ] and Xn−1 , and κ ⊗
b2 = Cum|b2 | Xb2 is the cumulant of the
complementary subset and Xn .
Proof We prove the result (3.85) by induction over n, and for the proof we follow
a similar procedure to the one we followed for obtaining Cum (X1 , X2 X3 ) and
Cum (X1 , X2 , X3 X4 ) .
Let us suppose the result (3.85) is true for some n and we will prove the result is
true for n+1. Let us calculate the expectation of the product of the random variables
(X1 , X2 , . . . , Xn−2 , Xn−1 , Xn , Xn+1 ) , and this is given by Theorem 3.3
⊗
μ⊗
X1:n+1 = K−1
p(K) κ⊗
Xb .
bj ∈K j
K∈Pn+1
Now let us evaluate (3.86) which is the expectation of n term. We have seen earlier
that if we know all partitions for order n − 1 we can obtain all the partitions for
the order n, using inclusive and exclusive matrices given in Tables 1, 2, 3, 4 for
168 3 T-Moments and T-Cumulants
(compare the above with (3.52) and (3.54)).We note that in the terms
Cum|bj | (Ybj ) ⊗ Cum|bk | (Ybk , Yn ), the first term Cum|bj | (Ybj ) must contain at
least one element of the set X1:n−1 , and therefore, the order of Cum|bk |+1 (Ybk , Yn )
is strictly less than n + 1. We now evaluate the terms Cum|bj | (Ybj ) ⊗
Cum|bk |+1 (Ybk , Yn ) and Cum1 (Yn ) in the expression (3.87). We have
κ⊗
Yb ,Yn = κ⊗
Xb ,Xn ⊗Xn+1
k k
= κ⊗
Xb ,Xn ,Xn+1 + K−1 ⊗
p(a) κ Xa ⊗ κ⊗
Xa , (3.88)
k 1 ,Xn 2 ,Xn+1
a1 ∪a2 =bk
where a1 and a2 are the partitions of the set bk . Here we have used the result (3.85)
which we have assumed to be true for n and the observation we have made earlier
that Cum|bk |+1 (Ybk , Yn ) is of order less than n + 1. We also have
κ⊗ ⊗ ⊗ ⊗ ⊗
Yn = κ Xn ⊗Xn+1 = κ Xn ,Xn+1 + κ Xn ⊗ κ Xn+1 . (3.89)
3.6 Appendix 169
μ⊗ ⊗
X1:n+1 = κ X1:n−1 ,Xn ⊗Xn+1 + K−1
p(K) κ⊗
Xb , (3.90)
j
K∈Pn+1 bj ∈K
where
K−1
p(K) Cum(Xbj ) = κ ⊗ ⊗
X1:n−1 ⊗ κ Xn ⊗Xn+1
K∈Pn+1 bj ∈K
+ K−1
(p(K),n,n+1) (3.91)
K∈Pn−1 ,|K|>1
⎛
×⎝ κ⊗ ⊗
Xb ⊗ κ Yb ,Yn ,Yn+1 + K−1
p(K,a)
j k
bk ∈K bj ∈K, j =k bj ∈K
⎞
× κ⊗ ⊗
Xb ⊗ κ Xa ,Xn ⊗ κ⊗
Xa ,Xn+1
⎠
j 1 2
a1 ∪a2 =bk
+ κ⊗ ⊗
Xb ⊗ κ Xn ,Xn+1 + κ⊗ ⊗ ⊗
Xb ⊗ κ Xn ⊗ κ Xn+1 .
j j
bj ∈K bj ∈K
The summation above is over all decomposable partitions K ∈ Pn+1 with respect
to the initial partition L = {(1) , (2) , . . . , (n − 1) , (n, n + 1)}. (Compare the above
expression (3.91) with (3.57).) By equating (3.87) and (3.90) and rearranging the
terms, we get
κ⊗
X1:n−2 ,Xn−1 ⊗Xn = K−1
p(K) κ⊗
Xb − κ⊗
Xb
j j
K∈Pn+1 bj ∈K K∈Pn+1 bj ∈K
= K−1
p(K) κ⊗
Xb
K ∈Pn+1 b∈K
K ∪L=O
= κ⊗
X1:n + K−1 ⊗ ⊗
p(K) κ b1 ⊗ κ b1 ,
K ∈Pn ,K ={b1 ,b2 }
n−1∈b1 ,n∈b2
where the initial partition L is {(1) , (2) , . . . , (n − 1) , (n, n + 1)} and the sum-
mation is over all indecomposable partitions with respect to the initial partition L
defined above (compare the result with (3.57)). Hence the result.
Proof of Theorem 3.7 We proceed with induction on both on the number of
variables included and on the partitions.
170 3 T-Moments and T-Cumulants
(Y1 , . . . , Yr ⊗ Yr+1 ) = XL ,
We see that each cumulants contained in right side of this expression is subject of
the induction. Hence the result (3.59).
κ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1:3 = μ1:3 − μ1 ⊗ μ2,3 − K(132) μ1,3 ⊗ μ2 − μ1,2 ⊗ μ3 + 2μ1 ⊗ μ2 ⊗ μ3
= Eμ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗
1:3|Y − Eμ1|Y ⊗ Eμ2,3|Y − K(132) Eμ1,3|Y ⊗ Eμ2|Y − Eμ1,2|Y ⊗ Eμ3|Y
+2Eμ⊗ ⊗ ⊗
1|Y ⊗ Eμ2|Y ⊗ Eμ3|Y ,
3.6 Appendix 171
μ⊗ ⊗
j |Y = κ j |Y ,
μ⊗ ⊗ ⊗ ⊗
j,k|Y = κ j,k|Y + κ j |Y ⊗ κ k|Y
μ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
1:3|Y = κ 1:3|Y + κ 1|Y ⊗ κ 2,3|Y + K(132) κ 1,3|Y ⊗ κ 2|Y + κ 1,2|Y
⊗κ ⊗ ⊗ ⊗ ⊗
3|Y + κ 1|Y ⊗ κ 2|Y ⊗ κ 3|Y ,
and obtain
κ⊗
1:3 = E κ⊗ ⊗ ⊗ −1 ⊗ ⊗
1:3|Y + κ 1|Y ⊗ κ 2,3|Y + K(132) κ 1,3|Y ⊗ κ 2|Y
+κ ⊗1,2|Y ⊗ κ ⊗
3|Y + κ ⊗
1|Y ⊗ κ ⊗
2|Y ⊗ κ ⊗
3|Y
−Eκ ⊗1|Y ⊗ Eκ ⊗
2,3|Y + Eκ ⊗
2|Y ⊗ κ ⊗
3|Y
−K−1 ⊗ ⊗ ⊗
(132) Eκ 1,3|Y + Eκ 1|Y ⊗ κ 3|Y ⊗ Eκ 2|Y
⊗
− Eκ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1,2|Y + Eκ 1|Y ⊗ κ 2|Y ⊗ Eκ 3|Y + 2Eκ 1|Y ⊗ Eκ 2|Y ⊗ Eκ 3|Y ,
+Eκ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1,2|Y ⊗ κ 3|Y − Eκ 1,2|Y ⊗ Eκ 3|Y + Eκ 1|Y ⊗ κ 2|Y ⊗ κ 3|Y
−Eκ ⊗ ⊗ ⊗
1|Y ⊗ Eκ 2|Y ⊗ κ 3|Y
−K−1 ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
(132) Eκ 1|Y ⊗ κ 3|Y ⊗ Eκ 2|Y − Eκ 1|Y ⊗ κ 2|Y Eκ 3|Y
+2Eκ ⊗ ⊗
1|Y ⊗ Eκ 2|Y ⊗ Eκ 3|Y
⊗
= Cum1 κ ⊗1:3|Y + Cum 2 κ ⊗
, κ
1,2|Y 3|Y
⊗
+ K −1
(132) Cum2 κ ⊗
, κ
1,3|Y 2|Y
⊗
+Cum2 κ ⊗ ⊗ ⊗ ⊗ ⊗
1|Y , κ 2,3|Y + Cum3 κ 1|Y , κ 2|Y , κ 3|Y
κ⊗ ⊗
1:3 = Cum1 κ 1:3|Y
+Cum2 κ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗
1,2|Y , κ 3|Y + K(132) Cum2 κ 1,3|Y , κ 2|Y + Cum2 κ 1|Y , κ 2,3|Y
+Cum3 κ ⊗ ⊗ ⊗
1|Y , κ 2|Y , κ 3|Y .
172 3 T-Moments and T-Cumulants
n
Y = vecX1:n = e k ⊗ Xk ,
k=1
where κ ⊗
X,n denotes the nth-order T-cumulants of X. The Fourier transform of X1:n
writes as
n
3
X= Xk eiωk−1 = Fn ⊗ Id vecX1:n .
k=1
The commutator K−1 rearranges the T-product but whatever the permutation of
⊗ p(K)
⊗n
κ Yb we can apply Fn ⊗ Id one by one on the terms
n
⊗|b| ⊗|b|
Fn ⊗ Id κ⊗
Yb = ek ⊗ κ⊗
X,|b|
k=1
n
⊗|b|
= F⊗|b|
n ek ⊗ κ⊗ ⊗
X,|b| = nδ|b|n κ X,|b| ,
k=1
3.6 Appendix 173
k
κϑ∂b ,j =1:k = (−1)r (r − 1)! μ∂ϑb ,j ∈a
j j
r=1 K{r} ∈Pk a∈K{r}
k
Dϑd κϑ∂b ,j =1:k = (−1)r (r − 1)! Dϑd μ∂ϑb ,j ∈a
j j
r=1 K{r} ∈Pk a∈K{r}
k
= (−1)r (r − 1)!
r=1 K{r} ∈Pk
⎛ ⎞
k
⎝ μ∂ + μ∂ϑd ,ϑb ,j ∈ai μ∂ϑb ,j ∈a\i ⎠ ,
ϑ (bi ,d ) ,ϑbj ,j ∈ai \i j j
i=1 a∈K{r} ,a=ai
where ai ∈ K{r} denotes the block which contains index i. If r = 1, then K{1} is
with one block a = (1 : k) and the product is empty.
Now, for each index i we have
⎛ ⎞
k
(−1)r (r − 1)! ⎝μ∂ϑ μ∂ϑb ,j ∈a\i ⎠
b ,d ,ϑb ,j ∈ai \i ( i ) j j
r=1 K{r} ∈Pk a∈K{r} ,a=ai
= κϑ∂ b ,d .
( i ) ,ϑbj ,j =1:k\i
The second sum corresponds to the inclusive extension of the partitions of the set
1:k
k
k
(−1)r (r − 1)! μ∂ϑd ,ϑb ,j ∈ai μ∂ϑb ,j ∈a\i ,
j j
r=1 K{r} ∈Pk i=1 a∈K{r} ,a=ai
174 3 T-Moments and T-Cumulants
it includes all the terms from k + 1 order cumulant κϑ∂d ,ϑb ,j =1:k . The missing
j
exclusive extensions like μ∂ϑd a∈K{r} μ∂ϑb ,j ∈a can be included since μ∂ϑd is zero.
j
We summarize the above and obtain the derivative
k
Dϑd κϑ∂b ,j =1:k = κϑ∂ b ,d + κϑ∂d ,ϑb ,j =1:k ,
j ( i ) ,ϑbj ,j =1:k\i j
i=1
3.7 Exercises
Justify that all the partitions in the right-hand side expression of 3.92, viz,
{(1, 2, 3)} , {(2) , (1, 3)} , {(3) , (1, 2)} are indecomposable with respect to the initial
partition L = {(1) , (2, 3)}.
3.2 Equating (A.19) and the expression EX1 X2 Y by (3.39), where Y = X3 X4 , and
canceling terms which are common, show
Justify that all the above partitions are indecomposable with respect to the initial
partition L = {(1) , (2) , (3, 4)}.
3.3 Let the initial partition be L = {(1, 2) , (3, 4)}, show
κ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗
X1 ⊗X2 ,X3 ⊗X4 = κ (1:4) + κ (1,2,3) ⊗ κ 4 + K(1243) κ (1,2,4) ⊗ κ 3
+K−1 ⊗ ⊗ −1 ⊗ ⊗
(1342) κ (1,3,4) ⊗ κ 2 + K(2431) κ 2,3,4 ⊗ κ 1
+K−1 ⊗ ⊗ −1 ⊗ ⊗
(1423) κ (1,4) ⊗ κ (2,3) + K(1324) κ (1,3) ⊗ κ (2,4)
+K−1 ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
(1423) κ (1,4) ⊗ κ 2 ⊗ κ 3 + K(1324) κ (1,3) ⊗ κ 2 ⊗ κ 4
+κ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1 ⊗ κ (2,3) ⊗ κ 4 + κ 1 ⊗ κ 2 ⊗ κ 3,4 .
Conclude
Cum2 X⊗2 , X⊗2 = κ ⊗ −1 −1
X,4 + Id + K(1243) + K(1342) + K(2431)
−1
κ⊗ ⊗ −1 −1
X,3 ⊗ κ X,1 + K(1324) + K(1423) κ X,2
⊗2
+ K−1 (1324) + K −1
(1423) κ ⊗
X,2 ⊗ κ ⊗2
X,1
+ Id + K−1 (1243) κ ⊗
X,1 ⊗ κ ⊗
X,2 ⊗ κ ⊗
X,1 .
3.7 Suppose the random variables (X1 , X2 , X3 , X4 ) have mean zero, then show
κ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
X1 ,X2 ,X3 ⊗X4 = κ (1:4) + K(1423) κ (1,4) ⊗ κ (2,3) + K(1324) κ (1,3) ⊗ κ (2,4) . (3.95)
3.8 Suppose that random variate (X1 , X2 , X3 , X4 ) is multivariate normal with mean
zero, then
κ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
X1 ⊗X2 ,X3 ⊗X4 = K(1423) κ (1,4) ⊗ κ (2,3) + K(1324) κ (1,3) ⊗ κ (2,4) (3.96)
and
κ⊗
X⊗2 ,X⊗2
= Id 4 + K(1243) vec ⊗2 .
3.9 Take a random variable X ∈ Rd , find the first three cumulants for the
polynomial
where Qj are constant matrices with appropriate dimensions (see Exercise 2.22).
3.10 Let X1 , X2 , X3 ∈ Rd be independent, and identically distributed with the
random vector X, and let Y = vec [X1 , X2 , X3 ]. Then show
⊗2
μ⊗
Y,2 = 1 3 ⊗ κ ⊗
Y,1 + vec (I3 ) ⊗ κ ⊗
Y,2 ,
If n = 4 then
κ⊗
X,4|Y = μ ⊗
X,4|Y − L−1
3,1 μ⊗
X,3|Y ⊗ μ⊗
X|Y
−L−1 ⊗2 −1 ⊗ ⊗2 ⊗4
2,2 μX,2|Y + 2L2,1,1 μX,2|Y ⊗ μX|Y − 6μX|Y ,
and
3.13 Show that 4th order cumulant of the product of two independent variates is
2 + 6κ
κXY ,4 = κX,4 κY,4 + 4κX,4 κY,3 κY,1 + 3κX,4 κY,2 2 4
X,4 κY,2 κY,1 + κX,4 κY,1
2 + 12κ
+ 4κX,3 κX,1 κY,4 + 12κX,3 κX,1 κY,3 κY,1 + 12κX,3 κX,1 κY,2 2
X,3 κX,1 κY,2 κY,1
3.7 Exercises 177
2 κ
+ 3κX,2 2 2 2 2 2
Y,4 + 12κX,2 κY,3 κY,1 + 6κX,2 κY,2 + 12κX,2 κY,2 κY,1
2 κ
+ 6κX,2 κX,1 2 2 2 4
Y,4 + 12κX,2 κX,1 κY,3 κY,1 + 12κX,2 κX,1 κY,2 + κX,1 κY,4 .
3.14 Take a random variable Y , and express the cumulant of X1:3 in terms of
conditional covariances and cumulants
κ(1:3) = Eκ(1:3)|Y + Cov κ(1:2)|Y , κ3|Y + Cov κ(1,3)|Y , κ2|Y
+ Cov κ(2,3)|Y , κ1|Y + Cum κ1|Y , κ2|Y , κ3|Y .
+κX,2 κY,1
2 2
κZ,1 + κX,1
2 2
κY,2 κZ,1 + κX,1
2 2
κY,1 κZ,2 .
κX1 Y1 ,X2 Y2 ,X3 Y3 |Y1:3 = κX1:3 κY1 Y2 Y3 ,1 + κX(1,2) κX3 κY1 Y2 ,Y3
+κX[1,3] κX2 κY1 Y3 ,Y2 + κX2,3 κX1 κY2 Y3 ,Y1
+κX1 κX2 κX3 κY1:3 .
3.22 Let X and Y be independent. Use formula for cumulants of product to show
that
2 2 4
κXY ,4 = κX,4 κY,4 + 4κY,3 κY,1 + 3κY,2 + 6κY,2 κY,1 + κY,1
+ κX,3 κX,1 4κY,4 + 12κY,3κY,1 + 12κY,2
2
+ 12κY,2 κY,1
2
2 2 2
+ κX,2 3κY,4 + 12κY,3κY,1 + 6κY,2 + 12κY,2κY,1
+ κX,2 κX,1
2
6κY,4 + 12κY,3κY,1 + 12κY,2
2
+ κX,1
4
κY,4 .
3.23 Let X = [X1 , X2 , . . . , Xn ], where the entries of X are i.i.d with X. Use the
T −derivative of the characteristic function to show
μ⊗ ⊗2
X,2 = κX,1 1d + κX,2 vec (Id ) .
2
where L−1 ⊗3 −1 −1
2,1 = Id + K(132) + K(231) , compare this result with Example 3.19.
3.25 (See [Goo75].) Let X = [X1 , X2 , . . . , Xn ] be independent copies of a random
variable X. Let ωk = 2πk/n, k = 0, 1, 2, . . . , n − 1, 0 ≤ ωk < 2π, be Fourier
frequencies,
consider exp (iω k ) on the unit circle, and notice the periodicity. Define
Fn = 1, eiω1 , · · · ei(n−1)ω1 and use Exercise 3.27 to show
1 n
E X Fn = Cumn (X) .
n
Cum1 (vecY) = 1n ⊗ κ ⊗
X,1 ,
and
Cumn (vecY) = e⊗n ⊗
k ⊗ κ X,n .
3.32 Use the rule of moments for expressing Dϑd μ∂ϑb ,ϑb2 ,ϑb3 , then show
1
Dϑd μ∂ϑb ,ϑb2 ,ϑb3 = μ∂ϑ b ,ϑb2 + μ∂ϑb ,ϑ(b ,d ) + μ∂ϑb ,ϑb2 ,d .
1 ( 1 ,d ) 1 2 1
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 183
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5_4
184 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
These
polynomials form a complete
orthogonal system in the Hilbert space
L2 R, B, √1 exp(−x 2 /2)dx , which implies that every (Borel measurable)
2π
function g by assumption
4 +∞ 1
g 2 (x) √ exp(−x 2 /2) dx < ∞,
−∞ 2π
4 2
+∞
n
gk 1
lim g (x) − Hk (x) √ exp(−x 2 /2) dx = 0.
n→∞ −∞ k! 2π
k=0
d
H0 = 1, Hn (x) = nHn−1 (x) .
dx
Property 4.3 The generating function of Hn is
∞
Hn (x)
a n = exp(xa − a 2 /2).
n!
n=0
Note that Property 4.3 is understood as Taylor series in a for fixed x and an L2
Hermite series, see (4.1), when a is fixed.
A translation of these facts into the terminology of probability leads to the
following idea. Let X be a Gaussian random variable with mean 0 and variance
4.2 Hermite Polynomials of Several Variables 185
1. The set of all random variables Y which are measurable with respect to X with
finite variance, called variables depending on X, forms a Hilbert space. Let us
denote it by L2 (X). The inner product in L2 (X) is the covariance as usual, i.e. if
Y1 , Y2 ∈ L2 (X) then Y1 , Y2 = Cov(Y1 , Y2 ). Now it is easy to see that the random
variables Hn = Hn (X), n = 0, 1, 2, . . . form a closed orthogonal system in L2 (X).
Therefore if Y ∈ L2 (X), then there exists a Borel measurable function g so that
Y = g(X) and also
∞
gk
Y = Hk ,
k!
k=0
Remark 4.1 In physicists’ literature the polynomials we are studying here are
referred to as “probabilists’ Hermite polynomials” and are denoted by H e. Their
Hermite polynomials have similar formulae, which may be √ obtained from ours by
replacing the power of x with the corresponding power of 2x and multiplying the
entire sum by 2n/2 .
n
ak Xk , ak ∈ R,
k=0
of Xk is Gaussian. Let L2 (X) be the Hilbert space of all random variables depending
on X, i.e. measurable with respect to the system X and having finite variance. We
note that Hilbert space L2 (X) is sometimes called nonlinear to distinguish from the
linear Hilbert space L2 (X), which is generated by the system X, including only finite
linear combinations and their limits in the mean square sense. So every element
in the linear space L2 (X) is Gaussian but this is not the case in L2 (X). Now the
question is whether a closed orthogonal system of polynomials in Xk exists in L2 (X)
and what it looks like.
186 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
X = {X
Let us consider a Gaussian system
k , k = 1, 2,...} with EXk = 0, k =
1, 2, ... and covariance matrix C = σj k = Cov Xj , Xk . We define the Hermite
polynomials of several variables through the generator function
⎛ ⎞
n
1
n
(X1:n ,a1:n ) = exp ⎝ ak Xk − aj σj k ak ⎠ , (4.2)
2
k=1 j,k=1
as follows.
We consider the Taylor series expansion of (X1:n ,a1:n ) by a1:n ,
a m1:n
1:n
(X1:n ,a1:n ) = Hm1:n .
m1:n !
m1:n ≥0
m
Then the coefficients Hm1:n of a1:n1:n , are polynomials of X1:n and depends on the
covariances
σj k of X1:n as well. The polynomials Hm1:n =
Hm1:n X1m1 , X21m2 , .., Xn1mn will be called Hermite polynomials; the
dependence of covariances σj k is understood through the Gaussian variables
X1:n . We repeat the notation for vectors of which some entries coincide
[X1 , X1 , . . . , X1 , X2 , X2 , . . . , X2 , .., Xn , Xn , . . . , Xn ] = X1m1 , X21m2 , .., Xn1mn .
+ ,- . + ,- . + ,- .
m1 m2 mn
It is also worth using some more compact form for the exponent
n
1 1
(X1:n ,a1:n ) = exp ak Xk − aj σj k ak = exp a X− a Ca ,
2 2
k=1
#
where a = a1:n , and a Ca = σj k aj ak .
We emphasize that here and everywhere else in this book the zero expected values
of the Gaussian variables are assumed unless otherwise stated. Since there is no
restriction on the distinct values of variables of polynomials, we can use the distinct
values principle and we can simplify the general definition of Hm1:n by setting
m = |m1:n | = m1:n , then starting with only first-order derivatives of by each
variable,
∂m
Hm (X1:m ) = (X1:m ,a1:m ) . (4.3)
∂a1:m
Then equating
each X1:m1 by Y1 , etc. Xmn−1 +1:mn , by Yn , we arrive at polynomial
Hm1:n Y1m1 , Y21m2 , .., Yn1mn . Let 1:n be the type of multi-index m1:n , then we
could use the notation
Hm1:n Y1m1 , Y21m2 , .., Yn1mn = H1:n (Y1:n ) (4.4)
4.2 Hermite Polynomials of Several Variables 187
as well. An instance of getting H3 (Y1 , Y1 , Y2 ) = H3 Y12 , Y2 , is deriving H3 of
distinct variables by the first derivatives of :
then change X1 and X2 into Y1 , X3 into Y2 , including the indices of covariances σ12
into σ11 and σ23 into σ12 , now we obtain
H3 Y12 , Y2 = Y12 Y2 − 2σ12 Y1 − σ11 Y2 .
n−1
Hn (X1:n ) = Xn Hn−1 X1:(n−1) − σj n Hn−2 X[1:(n−1)]\j , (4.5)
j =1
Then take an independent copy Y = Y1:n of X = X1:n , and notice the usage of
conditional expectation
(X1:n ,a1:n ) = exp a X E exp ia X
= E exp a X |X1:n E exp ia Y |X1:n
= E exp a (X + iY) |X1:n ,
now the series expansion of the exponent exp (a (X + iY)) provides directly the
Hermite polynomials
n
Hn (X1:n ) = E (X1:n + iY1:n ) |X1:n = E
1n
(Xk + iYk ) |X1:n .
k=1
Example 4.1 Let us take n = 4 and consider (4.6). Here we shall use two facts,
which will be shown later, one is that the expected values of Gaussian products of
odd orders are zero and the other one is
4
H4 (X1:4 ) = Xk − σ12 X3 X4 − σ13 X2 X4 − σ14 X2 X3 − σ23 X1 X4
k=1
−σ24 X1 X3 − σ34 X1 X2 + σ12 σ34 + σ13 σ24 + σ14 σ23 ,
Hn (X1:n ) = Hn (Xp(1:n) ),
Hn+1 (aY + bZ, X1:n ) = aHn+1 (Y, X1:n ) + bHn+1 (Z, X1:n ),
i.e. Hn is multilinear, cf. Exercise 4.1, p. 234. We can apply the multilinear property
on each variable one after another
H2 (a1 X1 + a2 X2 , a1 X1 + a2 X2 ) = a1 H2 (X1 , a1 X1 + a2 X2 )
+a2 H2 (X2 , a1 X1 + a2 X2 )
= a12 H2 (X1 , X1 ) + 2a1a2 H2 (X1 , X2 )
+a22 H2 (X2 , X2 ) .
One can
transform
Hn (X) to depend on the standard variable, since Hn (X) =
σ n Hn σ −1 X .
190 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
Next, we will show some general results on moments and cumulants for Gaussian
systems and Hermite polynomials.
Let us consider the higher-order moments for a Gaussian system X = {Xk } with
EXk = 0 and covariance σj k = Cov(Xj , Xk ). We have expressed the joint moments
in terms of cumulants, (see (3.40), p. 138), since cumulants of order larger than 2
are zero, the covariances remain only in the expression
1n
n r
EX1:n = μ(1:n) = κX,bj .
j =1
r=1 K{r} ∈Pn
4.3 Moments and Cumulants for Gaussian Systems 191
This implies that the summation is taken over all partitions of pairs K II ∈ PII
n of the
set 1 : n = {1, 2, ...n}; therefore,
μ(1:n) = σj k .
K II ∈PII
n (j,k)∈K
II
We have seen that the number of partitions K II is SnII = (n − 1)!! if n is even ( cf.
(1.51), p. 43); therefore, we obtain
n/2
μ(1:n) = σjm km , (4.7)
(n−1)!! m=1
1n
where (jm , km ) ∈ K II . Note EX1:n = 0, for odd n, and EX2n = (n − 1)!!σ 2n . In
practice we use the inclusive–exclusive method for generating partitions of pairs,
which serve as indices of σj k , (see Sect. 1.4, p. 26). The following algorithm
2n−1
μ(1:2n) = σk,2n μ(1:(2n−1))\k , (4.8)
k=1
can be applied, starting by n = 1, the index (1 : (2n − 1)) \k means that k is missing
from the set 1 : (2n − 1).
Example 4.2 For instance let n = 4, then
and if Xj = X, j = 1 : 4, then
μX,4 = 3σ 4 .
If n = 6, then we obtain
μ1:6 = σ16 (σ23 σ45 + σ24 σ35 + σ25 σ34 ) + σ26 (σ13 σ45 + σ14 σ35 + σ15 σ34 ) (4.10)
+σ36 (σ12 σ45 + σ14 σ25 + σ15 σ24 ) + σ14 (σ12 σ35 + σ13 σ25 + σ15 σ23 )
+σ15 (σ12 σ34 + σ13 σ24 + σ14 σ23 ) ,
192 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
μX,6 = 15σ 6 .
The first-order moments of the Hermite polynomials are zero by the construction,
(see Recursion 1), the second-order ones are
n
EHn Xk1:n Hm Xj1:m = δnm σks ,jp(s) , (4.11)
p s=1
and
EHn2(X) = n!σ 2n .
m
E Hkj Xbj = σj k , (4.12)
j =1 {K II |( L,K II )nl } (j,k)∈K II
where the summation is taken over all diagrams L, K II without loops. If the set
II *
K * L, K II nl is empty, then the expectation is zero.
In particular, if p = 2, i.e. L = (b1 , b2 ), then the diagram L, K II without loop
corresponds to a partition K II containing pairs (k1 , k2 ) so that k1 ∈ b1 and k2 ∈ b2 .
One can reach all such types of partitions by fixing the first entries and permuting
the second ones, see (4.11).
Example 4.4 Take m = 3 and kj = 2, j = 1, 2, 3 then (4.12) gives that
EH2 (X1 , X2 ) H2 (X3 , X4 ) H2 (X5 , X6 ) = σ23 σ45 σ16 + σ23 σ46 σ15 + σ24 σ35 σ16
+σ24 σ36 σ15 + σ25 σ36 σ41 + σ25 σ46 σ31
+σ26 σ35 σ41 + σ26 σ45 σ31 . (4.13)
4.3 Moments and Cumulants for Gaussian Systems 193
Cum (Xb ) = 0,
where XL denotes the vector of products of variables Xb1 , Xb2 , . . . , Xbp .
Now if the system X1:n is Gaussian (with zero mean), then all cumulants are
zero on the right-hand side of (4.14) except the second-order ones. Therefore the
summation is taken only for K ∈ PII n . The assumption that L ∪ K = O, i.e. the
II
graph (L, K II ) is closed, (see Sect. 1.4.8, p. 43). Thus we have the formula
Proposition 4.2
Cum(XL ) = σij . (4.15)
{K II |(L,K II )cl } (i,j )∈K II
If the set of partitions K II of the closed diagram L, K II cl is empty, then the
cumulant is zero. This implies that n must be an even number.
Example 4.5 We let L = {b1 = (1) , b2 = (2 : 4)}, then
cf. (4.9).
Second-order cumulant of two Hermite polynomials coincides with the expected
value of their product
n
Cum Hn Xk1:n , Hm Xj1:m = EHn Xk1:n Hm Xj1:m = δnm σks ,jp(s) ,
n! s=1
cf. (4.11), since the expected values of Hermite polynomials are zero.
194 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
Proposition 4.3 Take a partition L = b1 , b2 , . . . , bp of 1 : n such that |bj | = nj
then
Cum Hn1 (Xb1 ), Hn2 (Xb2 ), . . . , Hnp (Xbp ) = σj k ,
{K |(L,K )cnl } (j,k)∈K
II II II
(4.16)
where the summation is over all closed diagrams L, K II cnl without loops.
Again, if the set of partitions K II of the closed diagram L, K II without a loop
is empty, then the cumulant is zero, for instance, if n is odd the set PII
n is empty,
therefore the cumulant is zero.
Remark 4.3 Using partition matrices U, see Sect. 1.4, one can construct algorithms
for generating partitions for all diagrams above serving to obtain moments and
cumulants.
Suppose that all variables below in the examples of this subsection are jointly
Gaussian with mean zero. The following formulae will be useful for calculating
higher-order spectra of particular random processes.
Example 4.6 Let X1:2k be jointly Gaussian with mean zero then the formula (4.16)
has the following form:
where summation is all over closed diagrams L, K II cnl without loops and
L = (b1 , . . . bk ), bj = (2j − 1, 2j ), the number of these diagrams is 2k−1 (k − 1)!
(see Case 1.7, p. 46). This example is a particular case of cumulants of products
expressed by the products of cumulants for Gaussian variables, since the cumulant
of products X2j −1 X2j equals to the cumulant of H2 X2j −1 , X2j . The particular
cases of the Formula (4.17) for k = 2, 3, 4 gives us the following:
k = 2. Let k = 2, then formula (4.17) implies
Compare this result with Cum (X1 X2 , X3 X4 ), when variables are centered
Gaussian ones. Two particular cases follow
2
Cum (H2 (X1 ) , H2 (X2 )) = 2σ12 ,
4.3 Moments and Cumulants for Gaussian Systems 195
and
k = 3. Let k = 3, then one can show that Cum (H2 (X1 , X2 ) , H2 (X3 , X4 ) ,
H2 (X5 , X6 )) coincides with (4.13), recall the connection between third-order
cumulants and third-order central moments, cf. (3.21), p. 129. Compare this
result with cumulant of products Cum (X1 X2 , X3 X4 , X5 X6 ), when all variables
are Gaussian, see Example 1.32. Assume further X2j −1 = X2j , then we have
In particular
since 23 3! = 48.
Remark 4.4 We have seen in Example 3.6, p. 119, that the cumulants of the Gamma
distribution (μ, α) are
(n − 1)!μ
κX,n = ,
αn
Consider an example Cum (H1 (X1 ), H1 (X2 ), H20 (X3 )), here r = 11 < 20,
and it is clear that there is no closed diagrams without loops; therefore,
Cum (H1 (X1 ), H1 (X2 ), H20 (X3 )) = 0. As far as ri ≤ r, i.e.
r1 + r2 + r3
ri ≤ ,
2
for instance r1 ≤ r2 + r3 , in other words, if the triangle inequality is valid then the
formula (4.18) is also valid, otherwise the cumulant is zero. In particular, if Xj ’s
coincide then
Cum Hr1 (X), Hr2 (X), Hr3 (X) = S II (r1:3 ) σ 2r . (4.19)
Case 4.3 (Cumulant of 4 Hermite Polynomials) Consider the case Cum Hr1 (X1 ),
Hr2 (X2 ), Hr3 (X3 ) , Hr4 (X4 ) . We are interested in closed diagrams L, K II
4.4 Products of Hermite Polynomials, Linearization 197
without loops, where L = (b1 , b2 , b3 , b4 ), with |bj | = rj , let r1 +r2 +r3 +r4 = 2r,
i.e. the sum is an even number, see Case 1.8, p. 46, for details. We have
Cum Hr1 (X1 ), Hr2 (X2 ), Hr3 (X3 ), Hr4 (X4 )
∗ r1 !r2 !r3 !r4 !
= σ k12 σ k13 σ k14 σ k23 σ k13 σ k34 , (4.20)
k12 !k13 !k14 !k23!k24 !k34 ! 12 13 14 23 24 34
where summation ∗ runs for all nonnegative integers k12 , k13 , k14 , k23, k24 , k34 , so
that k12 +k13 +k14 +k23 +k24 +k34 = r, and r1 = k12 +k13 +k14, r2 = k12 +k23 +k24,
r3 = kII13
+ k23 + k34 , r4 = k14 + k24 + k34 . The number of all closed diagrams
L, K without loops is S II (r1:4 ); see (1.55), p. 47 for details, hence
Cum Hr1 (X), Hr2 (X), Hr3 (X), Hr4 (X) = S II (r1:4 ) σ 2r . (4.21)
Note, the cumulant is zero if either r1 + r2 + r3 + r4 is odd or if r1 = max rj , say,
and r1 > r2 + r3 + r4 .
The formula for the general case is more complicated since splitting blocs for
building loops are not unique.
There have been attempts to express the product of several Hermite poly-
nomials in terms of linear combination of Hermite polynomials. The coeffi-
cients of that linear form could serve
for a general formula of the cumulant
Cum Hr1 (X), Hr2 (X), . . . , Hrp (X) , p ≥ 4, similar to (4.19).
1n
Proposition 4.4 The product X1:n is expressed in terms of Hermite polynomials by
⎛ ⎞
1n
⎜ ⎟
X1:n = ⎝ σij ⎠ Hd XDK , (4.22)
d≡nmod(2) KdI,II (i,j )∈KdI,II
where the sum is taken over all d ∈ 0 : n with d ≡ n mod (2), actually d = n :
2 : 0 = n, n − 2, . . ., the second sum is taken over all partitions KdI,II ∈ PI,II
n and
where XDK = {Xm | (m) ∈ DK }. We note that it is necessary that the evenness of
the number of arms dK be the same as the evenness of n, i.e. dK ≡ n mod (2).
This follows from the recurrence formula, (4.5), since all the powers in a Hermite
polynomial have the same evenness as their order, and the number of n − d must be
even.
Example 4.7 Let us consider the formula (4.22) when n = 4. All possible choices
for arms d are 4, 2, and 0. If d = 4, K4I,II contains only arms, K I,II =
{(1) , (2) , (3) , (4)}. If d = 2, K2I,II contains 2 arms and 1 pair, an example
K I,II = {(1) , (2) , (3, 4)} with σ34 H2 (X1 , X2 ). Finally, if d = 0, K0I,II contains
only pairs, for example K I,II = {(1, 2) , (3, 4)}, etc. We have
14
X1:4 = H4 (X1:4 ) + σ12 H2 (X3 , X4 ) + σ13 H2 (X2 , X4 ) + σ14 H2 (X2 , X3 )
+σ23 H2 (X1 , X4 ) + σ24 H2 (X1 , X3 ) + σ34 H2 (X1 , X2 ) + σ12 σ34
+σ13 σ24 + σ14 σ23 ,
and in particular
X4 = H4 (X) + 6σ 2 H2 (X) + 3σ 4 .
It is worth mentioning a particular case of (4.22) when all variables coincide, i.e.
expressing Xn in terms of Hermite polynomials. We collect thecoefficients
by dK .
If dK = k, then there are (n − k − 1)!! partitions available and nk possible choices
for arms. If the number of arms dK = k then n − k must be even, the case of odd
n − k is handled by (n − k − 1)!!, since it is zero if n − k is odd, cf. (1.56), p. 48.
Therefore we have
n
[n/2]
n n−k n!
X = n
(n − k − 1)!! σ Hk (X) = σ 2k Hn−2k (X) .
k (n − 2k)!k!2k
k=0 k=0
(4.23)
This latter formula is usually referred to as the inversion formula for Hermite
polynomials given by (4.24). Once again here (n − k − 1)!! = 0 unless n ≡ k,
4.4 Products of Hermite Polynomials, Linearization 199
mod (2), so there are [n/2] terms to add in (4.23). These terms correspond to the
orders n, n − 2, . . . , n − 2k, . . . , n − 2 [n/2], with coefficient
n n!
(2k − 1)!! = .
n − 2k (n − 2k)!k!2k
The Hermite
polynomials Hn (X1:n ) can also be expressed in terms of the products
XDK = (m)∈DK Xm , i.e.
⎛ ⎞
⎜ ⎟
Hn (X1:n ) = (−1)(n−d)/2 ⎝ σij ⎠ XDK ,
d≡nmod(2) KdI,II (i,j )∈KdI,II
where the sum is taken over all partitions KdI,II for which the evenness of the number
of arms d ∈ (0 : n) is the same as n, i.e. d ≡ n mod (2).
The case when all variables coincide, i.e. Xj = X, j = 1, . . . , n, implies
[n/2]
(−1)k n!
Hn (X) = Xn−2k σ 2k . (4.24)
(n − 2k)!k!2k
k=0
where d ∈ (0 : n), and the summation is over all closed diagrams L, KdI,II
cnl
without loops.
n−1
Xn Hn−1 (X1:n−1 ) = Hn (X1:n ) + σj n Hn−2 X[1:(j −1),(j +1):(n−1)] ,
j =1
since Xn = H1 (Xn ) and (4.25) is applied. This formula has been considered as the
recursion formula for the Hermite polynomials (see recurrence formula p. 187).
200 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
Case 4.4 (Product of 2 Hermite Polynomials) The product of two Hermite poly-
nomials Hm (X1 ) and Hn (X2 ) can be simplified as a special case of (4.25). One
needs to collect the number of blocks in the partition K I,II with respect to the
partition L = {b1 = (1 : m), b2 = (m + 1 : m + n)}, when the number of arms
fixed m + n − 2r, say, it has been given in (1.57). p. 49. Since in that case, the
r , we have
product of covariances is the same, i.e. σ12
min(m,n)
m n r
Hm (X1 ) Hn (X2 ) = r! σ Hm+n−2r X1m−r , X21n−r . (4.26)
r r 12
r=0
When the variates X1 and X2 are identical, a particular case of (4.26) writes as
min(m,n)
m!n!
Hm (X) Hn (X) = σ 2r Hm+n−2r (X) . (4.27)
(m − r)! (n − r)!r!
r=0
H42 (X) = H8 (X) + 16σ 2 H6 (X) + 72σ 4 H4 (X) + 96σ 6 H2 (X) + 24σ 8 . (4.28)
Case 4.5 (Product of 3 Hermite Polynomials) Now let us consider the product of
three Hermite polynomials Hn1 (X1 ) Hn2 (X2 ) Hn3 (X3 ). Let n = n1 + n2 + n3 ,
ri ≤ ni , and introduce the following notation for the cumulant:
Cr1:3 (X1:3 ) = Cum Hr1 (X1 ), Hr2 (X2 ), Hr3 (X3 ) ,
n
n1:3
Hn1 (X1 ) Hn2 (X2 ) Hn3 (X3 ) = Cr1:3 (X1:3 ) Hn−2r
r1:3
r=0 r1 +r2 +r3 =2r
× X1n1 −r1 , X21n2 −r2 , X31n3 −r3 , (4.29)
4.4 Products of Hermite Polynomials, Linearization 201
from (4.29).
Now we present an example to show the use of these formulae.
Example 4.10 We consider the product H2 (X1 ) H2 (X2 ) H2 (X3 ) using (4.25) with
ni = 2,
H2 (X1 ) H2 (X2 ) H2 (X3 ) = H6 X12 , X212 , X312 + 4σ23 H4 X12 , X2 , X3
+ 4σ12 H4 (X1, X2 , X312 ) + 4σ13 H4 (X1, X212 , X3 ) + 8σ12 σ13 H2 (X2 , X3 )
+ 8σ13 σ23 H2 (X1 , X2 ) + 8σ12 σ23 H2 (X1 , X3 )
+ 2σ13
2
H2 (X2 ) + 2σ23
2
H2 (X1 ) + 2σ12
2
H2 (X3 ) + 8σ12 σ13 σ23 .
One can check this expression using (4.29) and building the following table:
n1:3
r ri rj r1:3 S II (r1:3 ) Coefficient # of terms
0 0 0 1 1 1 1
1 0 1, j = i 4 1 4 3
2 2 1, j = i 4 2 8 3
2 0 2, j = i 1 2 2 3
3 2 2, j = i 1 8 8 1
where ri denotes one of the triple r1:3 , and rj the rest two of them.
Particularly, if Xk = X, k = 1 : 3, then
2H
observe that the terms like σ13 σ23 H2 (X1 , X2 ) and σ13 2 (X2 ) provide terms with
the same type σ H2 (X), hence the number of them is 3 · 8 + 3 · 2 = 30.
4
Case 4.6 (Product of 4 Hermite Polynomials) The case of product of four Her-
mite polynomials Hn1 (X1 ) Hn2 (X2 ) Hn3 (X3 ) Hn4 (X4 ) follows from the result of
Case 1.11 and (4.20). Let n = n1:4 , ri ≤ ni and
Cr1:4 (X1:4 ) = Cum Hr1 (X1 ), Hr2 (X2 ), Hr3 (X3 ), Hr4 (X4 ) ,
then we have
In particular, for one variable case, we can use formula (1.59), p. 49, and obtain
⎛ ⎞
n
Hn1 (X) Hn2 (X) Hn3 (X) Hn4 (X) = ⎝ SnI,II (r1:4 )⎠ σ 2r Hn−2r (X) ,
1:4
r=0 r1:4 =2r
(4.31)
where SnI,II
1:4 (r1:4 ) is analogue to (4.30).
Expressions similar to (4.26), (4.29), and (4.31) for the product of several
Hermite polynomials, are much more complicated, and therefore we omit them and
suggest using (4.25) instead.
where the list of real vectors a1:n follows the structure of X1:n . Similarly to the
multivariate case, the right-hand side depends on covariance matrices Ck,j , which
correspond to the Gaussian system X1:n (recall EXk = 0) on left-hand side, the
relationship between them is one to one. We recall the connection κ ⊗j,k = vec Ck,j
4.5 T-Hermite Polynomials 203
between cumulants and covariance matrices, given in (3.9), p. 122. Observe that κ ⊗
j,k
is a second-order tensor with covariance entries. We can change covariance matrices
to cumulants in (4.32) and obtain
⎛ ⎞
n
1
n
⊗
(X1:n , a1:n ) = exp ⎝ Xk a k − κ j,k aj ⊗ ak ⎠ .
2
k=1 j,k=1
The variables of Hermite polynomials are taken from a Gaussian random system,
we emphasize that Hn (X1:n ) depends on the distribution of X1:n , more precisely on
covariance structure of X1:n , (we keep EX = 0).
The first derivative of the generator function is
∂
n
Da⊗j =⊗ = (X1:n , a1:n ) Xj − Cj,u au ,
∂aj
u=1
− (X1:n , a1:n ) κ ⊗
j,k ,
∂
Cj,k ak ⊗ = Cj,k ⊗ Ik vec Ik = vec Ck,j = κ ⊗
j,k ,
∂ak
204 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
(see (1.7), p. 7) and we see that all entries of the matrix Xk X j − Ck,j are the
corresponding Hermite polynomials.
The T-Hermite polynomial Hn is with dimension d1:n and all entries are
Hermite polynomials of order n. Hn is not n-symmetric, since one cannot change
the order of the T-derivatives; (see Sect. 1.3.1, p. 13 for the notion of n-symmetric
vectors). The entries of Hn are Hermite polynomials; therefore, the entries are
symmetric as the function of scalar variables. For instance let X1 = [X1 , X2 ]1 ,
X2 = [X1 , X2 ]2 then
H2 (X1:2 )
= H2 X1,1 , X1,2 , H2 X1,1 , X2,2 , H2 X2,1 , X1,2 , H2 X2,1 , X2,2 ,
where Xk,j is the kth entry of Xj . Each
entry H2 Xi,j , Xm,n of H2 (X1:2 ) is
symmetric, i.e. H2 Xi,j , Xm,n = H2 Xm,n , Xi,j .
Now, it is satisfactory to index the variables of up to the order of the Hermite
polynomial, let n = 3, and consider the third derivative,
⊗
3
Da⊗1:3 = Da⊗3 ⊗
Da1:2 = Da3 (X1:3 , a1:3 ) X1 − C1,u au
u=1
3
⊗ X2 − C2,u au − vec C2,1
u=1
3
3
= (X1:3 , a1:3 ) X1 − C1,u au ⊗ X2 − C2,u au
u=1 u=1
3
⊗ X3 − C3,u au
u=1
4.5 T-Hermite Polynomials 205
3
− (X1:3 , a1:3 ) vec C2,1 ⊗ X3 − C3,u au
u=1
3
− (X1:3 , a1:3 ) X1 − C1,u au ⊗ vec C3,2
u=1
3
+ (X1:3 , a1:3 ) K−1
(132) (d1:3 ) vec C3,1 ⊗ X2 − C2,u au ,
u=1
where the matrix K−1(132) (d1:3 ) changes the order of the T-product of vectors. Let us
⊗
take Da1:3 at zero and obtain the third-order T-Hermite polynomial
*
H3 (X1:3 ) = Da⊗1:3 *a =0 = X⊗1 ⊗ ⊗
1:3 − κ 1,2 ⊗ X3 − X1 ⊗ κ 2,3
3
1:3
− K−1 ⊗
(132) κ 1,3 ⊗ X2 , (4.34)
where we interpret the product K−1 ⊗
(132) κ 1,3 ⊗ X2 as follows. If we have the
product X1 ⊗ X2 ⊗ X3 and we need the covariance κ ⊗ 1,3 of X1 and X3 in the product,
first, we interchange X2 and X3 by K(132), then take the covariance vector κ ⊗ 1,3 of
−1
X1 , X3 and then we reorder by K(132) back to the original order. Now, we rewrite
(4.34), so that each T-product term start with a constant
⊗1
H3 (X1:3 ) = X1:33 − κ ⊗ −1 ⊗ −1 ⊗
1,2 ⊗ X3 − K(231) κ 2,3 ⊗ X1 − K(132) κ 1,3 ⊗ X2 . (4.35)
At this point, we note that (4.34) can be rewritten into another form, namely
H3 (X1:3 ) = X⊗1 ⊗ −1 ⊗ −1 ⊗
1:3 − X1 ⊗ κ 2,3 − K(312) X3 ⊗ κ 1,2 − K(213) X2 ⊗ κ 1,3 ,
3
where we set the constants after the variables. One can get some even more
equivalent forms, changing the orders inside the covariances, κ ⊗
3,1 for instance.
Our purpose is to simplify general formulae; therefore, we will prefer setting
constants either before or after the variables. If an expression includes more T-
Hermite polynomials, then we suppose that all of them are in the same form.
If Xj = X, then we have a simple form
H3 (X) = X⊗3 − L−1
12 ,11 κ ⊗
X,2 ⊗ X = X ⊗3
− L−1
11 ,12 X ⊗ κ ⊗
X,2 , (4.36)
L−1 −1 −1
12 ,11 = Id 3 + K(312) + K(213) .
206 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
Hn (X) = Hn (X, X, . . . , X) .
+ ,- .
n
H5 (X) = X⊗5 − 10X⊗3 ⊗ κ X,2 + 15κ ⊗2
X,2 ⊗ X.
Generating function (X1:n , a1:n ) can be written using the characteristic func-
tion, see Property 4 for more details,
⎛ ⎞
n
1
n
(X1:n , a1:n ) = exp ⎝ a k Xk − ak Ck,j aj ⎠
2
k=1 k,j =1
n
n
= exp ak Xk E exp i a k Xk
k=1 k=1
( n * )
*
*
= E exp ak (Xk + iYk ) * X1:n ,
*
k=1
where Y1:n is an independent copy of X1:n . Now the Taylor series expansion of the
exponent by a1:n inside the conditional expectation coincides with the Taylor series
expansion of (X1:n , a1:n ) and equating the appropriate coefficients of the powers
of ak we obtain the following.
Lemma 4.1 (Construction by Condition) The Hermite polynomial Hn (X1:n ) can
be given by the following conditional expectation:
⊗ * ⊗ *
* *
Hn (X1:n ) = E (Xk + iYk )* X1:n = E (xk + iXk )* .
k=1:n k=1:n x1:n =X1:n
(4.37)
4.5 T-Hermite Polynomials 207
Proof We have seen that the generator (X1:n , a1:n ) has the following form:
*
(X1:n , a1:n ) = E exp a1:n X1:n + ia1:n Y1:n * X1:n
*
= E exp a1:n (x1:n + iX1:n ) *x =X .
1:n 1:n
Now let us take the T-derivative by a1:n of the second equality before plugging
x1:n = X1:n in, and obtain
* ⊗
Da⊗1:n E exp a1:n (x1:n + iX1:n ) *a =E (xk + iXk ) ,
1:n =0 k=1:n
It follows directly from the definition that we can permute the variables of a
T-Hermite polynomial using commutator matrices so that it satisfies the following
equation:
Property 4.8 (Permutation of Variables) For each permutation p ∈ Pn , we have
Hn Xp(1:n) = Kp (d1:n ) Hn (X1:n ) . (4.38)
Similarly, the nth-order T-Hermite polynomial can be expressed by the previous two
T-Hermite polynomials.
Theorem 4.2 (Recurrence Relation) We have H0 = 1, H1 (X1 ) = X1 , and
if n > 1
Hn (X1:n ) = Hn−1 X1:(n−1) ⊗ Xn
n−1
− K−1 ⊗
(n,j,[1:(n−1)]\j ) (d1:n ) κ n,j ⊗ Hn−2 X[1:(n−1)]\j .
j =1
(4.40)
Remember that κ ⊗
k,j = Cum2 Xk , Xj = vec Cj,k .
See Sect. 4.8.1, Appendix for the proof.
One can have another equivalent form to (4.40).
n−1
Hn (X1:n ) = Hn−1 X1:(n−1) ⊗ Xn − K−1
((j :n−1)S )n
j =1
× (d1:n ) Hn−2 X[1:(n−1)]\j ⊗ κ ⊗
j,n , (4.41)
where permutations (j : n − 1)S n take the j th element to the place n − 1 and
leaves the rest elements of 1 : n unchanged (see Sect. 1.1, p. 1 for permutations
in cycle notation). The notation [1 : (n − 1)] \j denotes the index set when the
component j is missing from 1 : (n − 1).
The commutators in Eq. (4.40) define the commutator
n−1
Jn = K−1
(n,j,[1:(n−1)]\j ) (d1:n ) . (4.42)
j =1
The recurrence formula (4.40) implies the following form for equal variates:
Hn (X) = Hn−1 (X) ⊗ X − Jn Hn−2 (X) ⊗ κ ⊗
X,2 . (4.43)
Hn (X) = Hn−1 (X) ⊗ X − (n − 1) Hn−2 (X) ⊗ κ ⊗
X,2 .
Note that symmetry equivalence = is useful only when one side of the equivalence,
like Hn here, is n-symmetric; because in that case one gets equality using sym-
metrizer Sd1n on one side only.
4.5 T-Hermite Polynomials 209
Although the formula of H4 with symmetrizer Sd14 is much simpler than (4.43),
still for large d, say, d ≥ 8 it is useless computationally since the symmetrizer is a
d 4 × d 4 matrix and assumes evaluating 4! permutational matrices.
Property 4.9 (Independent Variables) It is easy to see, using the definition (4.33),
if X1:k and X(k+1):n are independent Gaussian random variates, then
Hn (X1:n ) = Hk (X1:k ) ⊗ Hn−k X(k+1):n .
Here we used our usual notation for the repeated variables Yj 1kj , i.e. Yj 1kj =
Yj , Yj , . . . , Yj is a list with kj vector components.
+ ,- .
kj
Remark 4.5 Although T-Hermite polynomials are unique, the terms involved can
be put in different forms, an instance is the following:
X1 ⊗ κ ⊗ −1 ⊗ −1 ⊗
2,3 ⊗ X4 = K(1324) X1 ⊗ κ 3,2 ⊗ X4 = K(1423) X1 ⊗ X4 ⊗ κ 2,3 ,
etc.
Multi-linearity has different forms for T-Hermite polynomials, we will consider
three of them. For instance
H2 (a ⊗ Xk , Xj ) = (a ⊗ Xk ) ⊗ Xj − a ⊗ κ ⊗
k,j = a ⊗ H2 Xk , Xj ,
Proof First, we show (4.44). In fact, we use Theorem 4.2 and obtain
= Hn (X1:n ) ⊗ (Y + Z)
n
− K−1 κ ⊗
+ κ ⊗
⊗ H n−1 X (1:n)\j
(n+1,(j :1)S ) Y,Xj Z,Xj
j =1
showing (4.46).
Consecutive application of formula (4.45) provides the following:
⊗
Hn (Ak Xk )1:n = Ak Hn (X1:n ) , (4.47)
1:n
If we take the derivative by Xk different from Xn , then (4.48) is not valid. Let
n = 3, for example, and take the derivative by X2 , then we have
⊗ ⊗ ⊗ ⊗
DX 2
H 3 (X 1:3 ) = DX2 (H 2 (X 1:2 ) ⊗ X 3 ) − DX2 H 1 (X 1 ) ⊗ κ 2,3
⊗ −1
−DX K
2 (132)
κ⊗1,3 ⊗ H1 (X2 )
= K−1 −1 ⊗
(1324) X1 ⊗ X3 ⊗ vec Id2 − K(1324) κ 1,3 ⊗ vec Id2
= K−1
(1324) H2 X1,3 ⊗ vec Id2 .
Proof We shall use the recurrence relation, Theorem 4.2. First, we reorder the
variables
Hn (X1:n ) = K−1
(k:n)S Hn X1:n\k , Xk ,
n
⊗
DX Hn (X) = K−1
((k:n) (Hn−1 (X) ⊗ vec Id ) , (4.49)
S )n+1
k=1
where ((k : n)S )n+1 ∈ Pn+1 , (see Sect. 1.1, p. 1 for the cycle notation of
⊗
permutations). The derivative DX Hn can be obtained using symmetrizer as well
n
⊗ −1
DX Hn (X) = Sd1n ⊗ Id K(k:n) ⊗ Id (Hn−1 (X) ⊗ vec Id )
S
k=1
= n Sd1n ⊗ Id (Hn−1 (X) ⊗ vec Id ) .
⊗
for proving (4.49). The assertion follows from T-derivative DX Hn (X) =
⊗
EDx (x + iX); see (2.36) for the T-derivative of T-product of functions.
Since the Hermite polynomial Hn (X) depends on the variance–covariance
matrix of X, (recall EX = 0), we introduce the notation Hn (x|) for Hn (X),
where x is a real vector.
Property 4.13 (Rodrigues’s Formula) We have the following Rodrigues’s formula
for Hermite polynomial Hn (x|) namely
by μ
*
* 1 −1 **
Dμ⊗n ϕ (x−μ|0, )*μ=0 = ϕ (x|0, )Dμ⊗n −1
exp x μ − μ μ *
2 μ=0
*
*
= ϕ (x|0, ) Dμ⊗n −1 x, μ| −1 * , (4.51)
μ=0
* ⊗n
Dμ⊗n ϕ (x−μ|0, )*μ=0 = ϕ (x|0, ) −1 Hn (x|) .
We also have
*
Dμ⊗n ϕ (x−μ|0, )*μ=0 = (−1)n Dx⊗n ϕ (x|0, ) .
Now we replace these results into Eq. (4.51), and rearrange them to get Rodrigues’s
formula.
⊗n
Note Hn −1 x| −1 = −1 Hn (x|) is referred to as covariant Hermite
polynomials.
Remark 4.7 There is another definition of Hn (X) which differs from ours
in
changing X into −1 X. The corresponding polynomials are Hn −1 X =
−1⊗n Hn (X), cf. Rodrigues’s formula (4.50). Since Hn (X) is n-symmetric, it
is also possible to list only the distinct values. One can use eliminating matrix Q+d,n
for this purpose: Q+ d,n Hn (X), cf. (1.32), p. 21.
The Multilinear Property 4.10 of T-Hermite polynomials Hn provides us the
multinomial formula.
214 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
We see that in each case above we have n-symmetry equivalence =; to get equality
we should apply the symmetrizer Sd1n .
Gaussian systems are usually called linear, the main reason is that for jointly
Gaussian vectors X, and Y the conditional expectation
that is the best prediction of X by Y is linear in Y, where CX,Y logically denotes the
covariance matrix of X and Y, and the variance–covariance matrix CY,Y assumed to
be positive definite. Moreover, X−E (X|Y) and Y are uncorrelated; therefore, they
are independent. An immediate consequence of this is that the conditional variance
Var (X|Y) = Var (X−E (X|Y)) is constant
and obtain
E (X − E (X|Y) + E (X|Y))⊗2 |Y = E (X − E (X|Y))⊗2 |Y
+ E (E(X|Y))⊗2 |Y .
Proof We use the formula (4.52) for Hn (X) = Hn (X−E (X|Y) + E (X|Y)) and
obtain
since X−E (X|Y) and E (X|Y) are independent, indeed they are uncorrelated
E (X−E (X|Y)) (E (X|Y) − μX ) = E X − μX − CX,Y C−1
Y,Y (Y−μY )
× (Y−μY ) C−1
Y,Y CY,X
= CX,Y C−1 −1
Y,Y CY,X − CX,Y CY,Y CY,X = 0,
Q.E.D.
Please, note that the conditional expectation is not element-wise, each entry of
E (X|Y) depends on the whole Y.
where the summation is taken over all partitions K II of PII n . The number of all
partitions K II ∈ PII
n is (n − 1)!! (see (1.51), p. 43); therefore, there are (n − 1)!!
terms in the above sum. If n is odd, then μ⊗ 1:n is zero.
In particular if , say, n = 2k is even and the variables of X1:n are the same
Xk = X, then
EX⊗2k = μ⊗ −1 ⊗k
X,2k = Lk2 κ X,2 ,
4.6 Moments, Cumulants, and Linearization 217
⊗1
15
⊗ ⊗ ⊗
EX1:66 = K−1
pj κ pj (1) ⊗ κ pj (2) ⊗ κ pj (3) ., (4.55)
j =1
where the permutations pj of the numbers 1:6 are originated by the indices of the
products in (4.10). Partitions
yield permutations as follows: a permutation pj ∈
P6 is split into three pairs pj (1) |pj (2) |pj (3) , so that pj (k) corresponds to the
cumulant κ ⊗pj (k) . Moreover if Xj = X, j = 1 : 6, then
EX⊗6 = L−1 ⊗3 ⊗3
32 κ X,2 = 5!!κ X,2 , (4.56)
15
L−1
32 = K−1
pj ,
j =1
n−1
μ⊗
1:2n = K−1 ⊗ ⊗
(k:2(n−1))S (d1:2n ) μ(1:2(n−1))\k ⊗ κ k,n ,
k=1
where the permutation (k : 2 (n − 1))S ∈ P2n is in cycle notation (see Sect. 1.1,
p. 1); it is simply moving k into the place 2 (n − 1).
First-order moments of Hermite polynomials are zero by the construction, the
second-order ones are,
⊗
E Hn Xk1:n ⊗ Hm Xj1:n = δnm K−1
mn (p) d k1:n , d j1:n κ⊗
ks ,jp(s) ,
s=1:n
p∈Pn
(4.57)
218 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
with n! terms.
In particular
⊗n
EHn (X1 ) ⊗ Hn (X2 ) = M−1
mn d1n , d21n κ 1,2 ,
Example 4.15 We provide the detailed formula for EH3 (X1:3 ) ⊗ H3 (X4:6 ) at (A.21)
which implies
⊗3
E [H3 (X) ⊗ H3 (X)] = M−1
m3 κ X,2 ,
−1 −1 −1 −1 −1 −1
M−1
m3 = K(142536) + K(142635) + K(152436) + K(152634) + K(162435) + K(162534).
Vector EHn (X) ⊗ Hn (X) is not 2n-symmetric, nevertheless we can simplify it.
Let us consider the case n = 3 and replace the inverse commutator matrices by
commutator matrices; then we obtain
M−1
m3 = K(135246) + K(135264) + K(135426) + K(135624) + K(135462) + K(135642),
therefore
⊗3 ⊗3
EH3 (X)⊗2 = M−1
m3 κ X,2 = 3! Id 3 ⊗ Sd13 K(135246)κ X,2 .
In general we have
−1 ⊗n
EH⊗2
n (X) = Mmn κ X,2 . (4.60)
since
p = 2, i.e. L = (b1 , b2 ), then the diagram L, K II without a loop corresponds
to a partition K II containing pairs (k1 , k2 ) so that k1 ∈ b1 and k2 ∈ b2 . One can
reach all such types of partitions by fixing the first entries and permuting the second
ones (see 4.57)).
We assume that all the variables below in this subsection are jointly Gaussian.
Let us take a partition L = (b1 , b2 , . . . , bp ) of 1 : n with size |L| = p,
and
⊗a list of Gaussian system of vectors X1:n , we recall the notation XL =
⊗
Xb1 , Xb2 , . . . , ⊗ Xbp . We already have the formula
⊗
Cump (XL ) = K−1
p(K) Cum|b| (Xb ), (4.63)
b∈K
L∪K=O
where the summation is over all closed diagrams L, K II cnl without loop.
4.6 Moments, Cumulants, and Linearization 221
see (4.59), since the covariance coincides with the cumulant if the mean is zero. In
particular
⊗n
Cum2 (Hn (X) , Hn (X)) = M−1
mn κ X,2 , (4.67)
see (4.61).
The following equations are particular cases of the above theorem:
Lemma 4.3 Let the partition L be {(2j − 1, 2j ) |j = 1 : k}, then
Cum H2 X1,2 , H2 X3,4 , . . . , H2 X(2k−1),2k (4.69)
= K−1 (d ) Cum2 Xb1 ⊗ Cum2 Xb2 ... ⊗ Cum2 Xbk ,
p(K II ) 1:2k
2k−1 (k−1)!
where the blocks bj of pairs and the permutation p K II of the numbers (1 : 2k)
correspond to the closed diagrams L, K II without loops, and the summation is
taken over all such diagrams. If all random variables included in (4.69) are the
same X, then the usage of commutator matrices is still necessary
⎛ ⎞
Cumk (H2 (X)) = ⎝ K−1
p(K II )
⎠ κ ⊗k .
X,2 (4.70)
2k−1 (k−1)!
2k
Cumulant Cumk (H2 (X)) ∈ Rd is not 2k-symmetric, but only is k-symmetric (see
Case 1.7, p. 46).
Both Cumk (H2 (X)) and κ ⊗k
X,2 are k-symmetric in the space of multilinear
algebra Md 2 ,k , therefore
⎛ ⎞
Sd 2 1k Cumk (H2 (X)) = Cumk (H2 (X)) = Sd 2 1k ⎝ K−1
p(K II )
⎠ κ ⊗k .
X,2
2k−1 (k−1)!
222 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
4
where the permutation matrices act on Rd , the permutation (1324) corresponds to
the partition of pairs K II = {(1, 3) , (2, 4)}, and the other permutation (1423) stands
for the partition of pairs K II = {(1, 4) , (2, 3)}. The commutator K−1 (1423) = K(1342)
permutes vectors in the T-products of four vectors with dimensions d, and Sd 2 12
symmetrize vectors in the T-products of two vectors with dimensions d 2 , hence we
have
1 1
Sd 2 12 K(1342) = K(1342) + K(4213) = Sd 2 12 = Id 4 + K(21) (d 2 , d 2 ) .
2 2
Example 4.17 Let k = 3, we obtain the cumulant and also the moment EH2 (X)⊗3
by (4.70):
⎛ ⎞
8
Cum3 (H2 (X) , H2 (X) , H2 (X)) = ⎝ K−1 ⎠ ⊗3
pj (d) κ X,2 . (4.71)
j =1
Following the algorithm which is given in Case 1.7, p. 46, we derive the per-
mutations involved in (4.71): p1 = (23|45|16) = (1 : 5)S , p2 = (23|46|15),
p3 = (24|35|16), p4 = (24|36|15), p5 = (25|36|14), p6 = (13|45|26), p7 =
(13|46|25), p8 = (143625). We denoted the correspondence of the permutations
to the partitions here by separating the blocks inside the permutations. Note that the
summation runs by partitions and both the permutations and the cumulants should be
assimilated to it. For instance the permutation (134526) corresponds to the partition
K II = {(2, 6) , (4, 5) , (3, 1)}. Making it unique, one has to use the canonical form
K II = {(1, 3) , (2, 6) , (3, 5)}, see Definition 1.7.
4.6 Moments, Cumulants, and Linearization 223
where the summation is taken over all d ∈ 0 : n with d ≡ n mod (2), and over
all partitions KdI,II ∈ PI,II , and where X = X | (m) ∈ K I,II is arranged in
n DK m
alphabetical order by indices. The permutation p KrI,II lists the pairs first then the
arms afterwards, both are in alphabetical order, the T-product of vector covariances
κ⊗
j,k and X DK follows the ordering of p K I,II
r .
If the variates of the list X1:n are the same X, then using the symmetrizer Sd1n
we have
[n/2]
n!
X⊗n = κ ⊗k ⊗ Hn−2k (X)
2k (n − 2k)!k! X,2
k=0
[n/2]
n!
= Hn−2k (X) ⊗ κ ⊗k
X,2 . (4.72)
2k (n − 2k)!k!
k=0
where the summation runs for all partitions KdI,II for which the evenness of the
number of arms d ∈ (0 : n) is the same as n, i.e. d ≡ n mod (2)
Using symmetrizer Sd1n , for one vector variate we obtain
[n/2]
(−1)k n! (−1)k n!
[n/2]
Hn (X) = X⊗(n−2k) ⊗ κ ⊗k = κ ⊗k ⊗ X⊗(n−2k) .
(n − 2k)!k!2 k X,2
(n − 2k)!k!2k X,2
k=0 k=0
(4.73)
224 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
Remark 4.8 The definition of Hk gives the same result as (4.73). For instance, H3
is given by (4.36) as follows:
H3 (X) = X⊗3 − L−1 ⊗
12 ,11 κ X,2 ⊗ X = X
⊗3
− 3κ ⊗
X,2 ⊗ X,
1
Sd13 = Id 3 + K(132) + K(213) + K(321) + K(312) + K(231) .
3!
Now X⊗3 is symmetric under permuting its terms, hence Sd13 X⊗3 = X⊗3 , and
applying Sd13 on κ ⊗
X,2 ⊗ X we obtain a simpler form
⊗3
H3 (X) = X⊗3 − 3Sd13 κ ⊗
X,2 ⊗ X = X − 3κ ⊗
X,2 ⊗ X.
If r = 0, then K0I,II is either {(1, 3) , (2, 4)} or {(1, 4) , (2, 3)}, and DK0 = ∅, in both
cases.
The result is the following:
H2 Xb1 ⊗ H2 Xb2 = H2 (X1 , X2 ) ⊗ H2 (X3 , X4 )
= H4 (X1:4 ) + K−1 ⊗
(1324) κ 1,3 ⊗ H2 (X2 , X4 )
+K−1(1423) κ ⊗
1,4 ⊗ H 2 (X 2 , X 3 )
+K−1(2314) κ ⊗
2,3 ⊗ H 2 (X 1 , X 4 )
+K−1(2413) κ ⊗
2,4 ⊗ H 2 (X 1 , X 3 )
+K−1 ⊗ ⊗ −1 ⊗ ⊗
(1324) κ 1,3 ⊗ κ 2,4 + K(1423) κ 1,4 ⊗ κ 2,3 .
In particular
where
L−1 −1 −1 −1 −1
2,H 2 = K(1324) + K(1423) + K(2314) + K(2413) .
Example 4.19 We give the detailed formula for H3 (X1:3 ) ⊗ H3 (X4:6 ) in the
Appendix, (see (A.20), p. 366), which yields us the case of equal variables
H3 (X)⊗2 = H6 (X) + L−1 ⊗
2,H 4 κ X,2 ⊗ H4 (X)
+ L−1 ⊗2 −1 ⊗3
2,2,H 2 κ X,2 ⊗ H2 (X) + Mm3 κ X,2 , (4.75)
Sd16 H3 (X)⊗2 = H6 (X) + 9H4 (X) ⊗ κ ⊗ ⊗2 ⊗3
X,2 + 18H2 (X) ⊗ κ X,2 + 6κ X,2 .
Example 4.20 We also obtain H4 (X)⊗2 , first using the formula (4.74) for different
variables then we apply the distinct values principle and obtain
H4 (X)⊗2 = H8 (X) + L−1
2,H 6 κ ⊗
X,2 ⊗ H 6 (X) + L−1 ⊗2
2,2,H 4 κ X,2 ⊗ H4 (X)
+ L−1 ⊗3 −1 ⊗4
2,2,2,H 2 κ X,2 ⊗ H2 (X) + Mm4 κ X,2 , (4.76)
Sd18 H4 (X)⊗2 = H8 (X) + 16κ ⊗ ⊗2
X,2 ⊗ H6 (X) + 72κ X,2 ⊗ H4 (X)
+ 96κ ⊗3 ⊗4
X,2 ⊗ H2 (X) + 24κ X,2 ,
cf. (4.28).
× φZ (λ|μ, ) ,
and we obtain
∞ k
i
φX (λ) = 1 + Bk 0, 0, κ ⊗ ⊗
X,3 , . . . κ X,k λ⊗k φZ (λ|μ, )
k!
k=3
Here Hk (z|) corresponds to the variate Z ∈ N (0, ). To obtain (−1)k Dz⊗k ϕ
(z|μ, ) we consider Hk (z − μ|) and obtain
⊗k
(−1)k Dz⊗k ϕ (z|μ, ) = ϕ (z|μ, ) −1 Hk (z − μ|) .
Now the inverse-Fourier transform of φX (λ), i.e. the density fX (x) can be
written into the so-called Gram–Charlier series
1 ⊗k
∞
fX (x) = 1 + Bk 0, 0, κ ⊗ ⊗
X,3 , . . . κ X,k −1 Hk ((x − μ) |)
k!
k=3
×ϕ (x|μ, )
∞
1 −1/2 ⊗k
⊗ ⊗
= 1+ Bk 0, 0, κ X,3 , . . . κ X,k
k!
k=3
×Hk −1/2 (x − μ) |I ϕ (x|μ, ) ,
where the Hermite polynomial Hk (z|I) corresponds to Z ∈ N (0, I), with standard
Gaussian densityϕ (z|0, I), with respect
to our earlier notations Hk (z|I) = Hk (Z).
The value of Hk −1/2 (z − μ) |I in the above expansion is simply plugging the
−1/2 (z − μ) into Hk (Z). An instance is
⊗2
H2 −1/2 (z − μ) |I = −1/2 (z − μ) − vec I
= −1/2⊗2 (z − μ)⊗2 − κ ⊗
2 ,
which implies that H2 −1/2 (z − μ) |I corresponds to −1/2⊗2 H2 (Z−μ), when
Z ∈ N (0, ).
The product
⊗k ⊗k ⊗k
−1/2 Bk 0, κ ⊗ ⊗
X,2 , . . . κ X,k =
−1/2
EX⊗k = E −1/2 X ,
equals to Bk κ ⊗ , κ ⊗
,
Y,1 Y,2 Y,3κ ⊗
, . . . , κ ⊗
Y,k , where Y =
−1/2
(X − μ), i.e. Y is
standardized EY = 0, VarY = I.
In general we have that the third-order central moment and cumulant coincide
EY⊗3 = κ ⊗Y,3 , and also
EY⊗3 = B3 0, 0, κ ⊗ ⊗
Y,3 = κ Y,3 .
4.7 Gram–Charlier Expansion 229
(see Sect. A.1, p. 351) the pattern of sixth and higher-order Bell polynomials are
changing,
B6 0, 0, κ ⊗ ⊗ ⊗ ⊗ ⊗ −1 ⊗2
Y,3 , κ Y,4 , κ Y,5 , κ Y,6 = κ Y,6 + L23 κ Y,3 , (4.77)
hence B6 0, 0, κ ⊗ , κ ⊗
, κ ⊗
, κ ⊗ ⊗
Y,3 Y,4 Y,5 Y,6 = κ Y,6 . The symmetry of EY
⊗6 and B are
6
the same; therefore, B6 is 6-symmetric, hence we can symmetrize both sides by
Sd16 , actually taking Sd16 L−1 ⊗2
23 κ Y,3 , we obtain
⊗
B6 0, 0, κ ⊗ ⊗ ⊗ ⊗ ⊗2
Y,3 , κ Y,4 , κ Y,5 , κ Y,6 = κ Y,6 + 10κ Y,3 .
= κ⊗ −1 ⊗ ⊗ ⊗ ⊗ ⊗
Y,7 + L13 ,14 κ Y,3 ⊗ κ Y,4 = κ Y,7 + 35κ Y,3 ⊗ κ Y,4 ,
= κ⊗ −1 ⊗ ⊗ −1 ⊗2
Y,8 + L13 ,15 κ Y,3 ⊗ κ Y,5 + L24 κ Y,4
= κ⊗ ⊗ ⊗ ⊗2
Y,8 + 56κ Y,3 ⊗ κ Y,5 + 35κ Y,4 .
5
1 ⊗
fX (x) = 1 + κ Y,k Hk −1/2 (x − μ) |I ϕ (x|μ, )
k!
k=3
8
1
+ Bk 0, 0, κ ⊗
Y,3 , . . . , κ ⊗
Y,k
k!
k=6
×Hk −1/2 (x − μ) |I ϕ (x|μ, ) + O,
230 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants
where Y = −1/2 (x − μ); therefore, one can conclude that fX (x) depends on μ
and through the Gaussian parts of this expression.
From now on let Y = −1/2 (X − μ); therefore, κ ⊗ Y,1 = μ = 0, VarY = Id , and
ϕ (y) = ϕ (y|0, I). The density of Y, i.e. the density of the standardized X writes as
5
1 ⊗
8
1
⊗ ⊗
fY (y) = 1 + κ Hk (y) + Bk 0, 0, κ Y,3 , . . . , κ Y,k Hk (y)
k! Y,k k!
k=3 k=6
× ϕ (y) + O. (4.78)
The orthogonality of Hermite polynomials implies that we only have to consider the
expected value of
H3 (Y) H3 (Y) κ ⊗ ⊗
Y,3 = vec H3 (Y) H3 (Y) κ Y,3
⊗
= κ Y,3 ⊗ Id 3 vec H3 (Y) H3 (Y)
⊗
= κ Y,3 ⊗ Id 3 H3 (Y)⊗2 ,
where the Cum2 (H3 (Z)) has been given by (4.57) (see (A.8) p. 356 for commutator
M−1
m3 ).
Once again, we have the orthogonality of Hermite polynomials; EH3 (Z) ⊗
Hk (Z) = 0, k = 3; therefore, we obtain
1
⊗
EH3 (Y) = EH3 (Z) H3 (Z) κ ⊗
Y,3 = κ Y,3 ⊗ Id 3 EH3 (Z)
⊗2
3!
1 ⊗
= κ Y,3 ⊗ Id 3 M−1
m3 (vec Id )
⊗3
= κ⊗
Y,3 ,
3!
4.7 Gram–Charlier Expansion 231
where κ ⊗
Y = vec Id . Then we apply (4.79) and obtain
EH3 (Y)⊗2 = κ ⊗
Y,6 +L−1 ⊗2
κ
23 Y,3 +L−1
2,H 4 vec Id ⊗ κ ⊗ −1 ⊗3
Y,4 +Mm3 vec Id , (4.80)
and obtain
EH4 (Y)⊗2 = κ ⊗
Y,8 + L−1
13 ,15 κ ⊗
Y,3 ⊗ κ ⊗
Y,5 + L−1
2,H 6 vec Id ⊗ κ ⊗
Y,6 + L−1 ⊗2
κ
23 Y,3
+L−1 ⊗2
24 κ Y,4 (4.81)
+L−1 ⊗2 ⊗ −1 ⊗4
2,2,H 4 vec Id ⊗ κ Y,4 + Mm4 vec Id .
use the formulae of this section for getting ideas to estimate and test skewness and
kurtosis.
The following family of distributions
⎛ ⎞
k
g (y) = c (θ ) exp ⎝ θ j Hj (y)⎠ ϕ (y)
j =1
leads to a distribution of the form (4.78), since after the series expansion of the
exponent we can use the formula for linearization ((4.74), p. 224) of T-Hermite
polynomials. As follows from Lemma 4.4 the coefficients are necessarily those of
formula 4.79.
4.8 Appendix
−1
= K(1,n)((1, n)S d1:n ) Da⊗(n−1)
S 1:(n−1)
⎡ ⎛ ⎞⎤*
*
*
× ⎣ψ X1:(n−1) , a1:(n−1) ⎝− Cn,j aj ⎠⎦**
j =1:(n−1) *
a1:n =0
−1
= −K(1,n)S
((1, n)S d1:n ) K−1
(2:j ) (2 : j )S dn , d1:(n−1)
S
j =1:(n−1)
× vec Cj,n ⊗ Hn−2 (X(1:(j −1),(j +1):(n−1)) ) .
= Hn−1 (X1:(n−1) ) ⊗ Xn
−K−1
(1,n)S ((1, n)S d1:n ) K−1
(2:j ) dn , d1:(n−1)
S
j =1:(n−1)
× vec Cj,n ⊗ Hn−2 X(1:(j −1),(j +1):(n−1)) .
1 ⊗
EH3 (Y) = κ Y,3 ⊗ Id 3 M−1
m3 (vec Id )
⊗3
.
3!
M−1
m3 = K(135246) + K(135264) + K(135426) + K(135624) + K(135462) + K(135642),
K(135246)vec⊗3 Id = vec Id 3 .
= 3!κ ⊗
Y,3
since κ ⊗
Y,3 is 3-symmetric.
The same formulae are valid in general
1
EHk (Y) = EHk (Z) Hk (Z) Bk 0, 0, κ ⊗ Y,3 , . . . κ ⊗
Y,k
k!
1
= Bk 0, 0, κ ⊗ Y,3 , . . . κ⊗
Y,k ⊗ Id k M−1 mk (vec Id )
⊗k
k!
= Bk 0, 0, κ ⊗ ⊗
Y,3 , . . . κ Y,k .
Q.E.D.
4.9 Exercises
All random variables below are jointly normal, centralized (EXj = 0), with
covariances σj k , and cumulants κ ⊗
j,k .
4.1 Let {Y, Z, X1:n } be a Gaussian system and a, b real numbers, then show
Hn+1 (aY + bZ, X1:n ) = aHn+1 (Y, X1:n ) + bHn+1 (Z, X1:n ).
4.9 Exercises 235
where the summation is taken over all permutations (k1 , k2 , k3 ) of numbers (4 : 6).
4.4 Assume that random variables X1:k are jointly normal, show that
Cum (H2 (X1 ) , H2 (X2 ) , . . . , H2 (Xk )) = 2k−1 σ1,m1 σm1 ,m2 · · · σmk−1 ,1 ,
(k−1)!
and in particular
Hint: Consider the structure of all possible closed diagrams without loops.
4.12 Use expression (4.29) and show that
= X⊗4 − 6κ X,2 ⊗ X⊗2 + 3κ ⊗2
2 ,
Hn+1 (AY + BZ, X1:n ) = (A ⊗ Id n ) Hn+1 (Y, X1:n ) + (B ⊗ Id n ) Hn+1 (Z, X1:n ) .
4.20 Consider the quadratic form XAX , where X is Gaussian with EX = 0 and
covariance matrix C ([Hol96a, MP92]), where A can be considered as a symmetric
matrix (since the asymmetric part of a quadratic form is zero). Show that
EXAX = trCA,
k ⊗k
E XAX = (2k − 1)!!κ 2 Sd12k vec⊗k A.
and
Cumk XAX = 2k−1 (k − 1)! (trCA)k .
−1/2 1/2 ⊗3
3
3 −1/2 3 1 3
H3 X − X = Sd13 Hk,3−k
k=0
(n − 1) (3−k)/2 k
−1/2
× −1/2 (X − μ) , μ−X .
X
The Inverse Mill’s (i-Mill’s) ratio is used to derive the moments and cumulants of
skew-normal distributions in the following section.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 241
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5_5
242 5 Multivariate Skew Distributions
5.1.1 The Inverse Mill’s Ratio and the Central Folded Normal
Distribution
Let Z denote the standard normal random variable, and |Z| the central folded normal
(or half-normal) distribution with the characteristic function
1 2
φ|Z| (λ) = 2 exp − λ (iλ) ,
2
where denotes the univariate standard normal distribution. The cumulant gener-
ating function of |Z| is
1
ψ|Z| (λ) = log 2 − λ2 + log (iλ) .
2
The ratio of the standard normal density ϕ (λ) and the corresponding distribution
function (λ), viz. the following function
ϕ (λ)
ρM (λ) = ,
(λ)
is called the Inverse Mill’s ratio of the standard normal distribution, and we shall
refer to it as i-Mill’s ratio.We will see the connection between this i-Mill’s ratio ρM
and the derivatives of the cumulant generating function ψ|Z| if we take the derivative
of the cumulant generating function
d
ψ|Z| (λ) = −λ + iρM (iλ) .
dλ
Let us denote the function ρM (iλ) by ρiM (λ) for convenience. Then the derivatives
of ψ|Z| and ρiM of order higher than 2 will coincide, i.e.
(k) (k−1)
ψ|Z| (λ) = iρiM (λ) , k > 2. (5.1)
(n)
We can easily obtain a clear formula for ρiM (0), i.e. the cumulant of |Z|, in terms
of moments since the moments of the central folded normal distribution are well
known and the derivatives of i-Mill’s ratio ρiM are equal to the derivatives of the
cumulant generating function.
Proposition 5.1 The moments E |Z|n and the cumulants Cumn (|Z|) of |Z| are
given by
7
2
μ|Z|,2k+1 = E |Z| 2k+1
= (2k)!! , (5.2)
π
μ|Z|,2k = E |Z|2k = (2k − 1)!!,
5.1 The Multivariate Skew-Normal Distribution 243
and
7
2 2
κ|Z|,1 = , κ|Z|,2 = 1 − ,
π π
(n−1)
κ|Z|,n = (−i)n−1 ρiM (0) , n > 2,
(n−1)
respectively. If n > 2, then the derivatives of i-Mill’s ratio at zero (−i)n−1 ρiM (0)
can be expressed in terms of moments μ|Z|,j :
n j
(n−1) 1 μ|Z|,j
(−i)n−1 ρiM (0) = n! (−1)r−1 (r − 1)! .
j =1:(n−r+1) j ! j!
r=1 j =r,j j =n
Proof It is well known that the moments of the modulus of the standard univariate
normal distribution are given by
7
2
E |Z| = 2
n (n−1)/2
((n + 1) /2) ,
π
so that (5.2) follows. Cumulants κ|Z|,k can be directly calculated via these moments
(see (3.27), p. 131). At the same time if n > 2 then the (n − 1)th derivative of i-
Mill’s ratio ρiM (λ) coincides with the nth derivative of the cumulant generating
function ψ|Z| , and hence with the nth cumulant κ|Z|,n . So the assertion follows.
We show some particular cases in the following example.
Example
√ 5.1 We have the first- and second-order cumulants κ|Z|,1 = μ|Z|,1 =
2/π, and κ|Z|,2 = μ|Z|,2 − μ2|Z|,1 = 1 − 2/π, as usual. Now we shall use i-
Mill’s ratio for deriving the cumulants and compare it to the formula (3.27), p. 131,
using moments.
First let us take the derivative of the normal density ϕ (1) (iλ) = λϕ (iλ), therefore
(1)
ρM (iλ) = ρM (iλ) (λ − iρM (iλ)) = ρiM (λ) (λ − iρiM (λ)). Now we consider
the third- and fourth-order derivatives of the cumulant function. The third-order
derivative is
d3 (2) (1)
ψ|Z| (λ) = iρiM = (λ − 2iρiM ) ρiM + ρiM ,
dλ3
= (λ − 2iρiM ) (λ − iρiM ) ρiM + ρiM
= ρiM (1 + (λ − 2iρiM ) (λ − iρiM )) ,
where we keep using the notation ρiM (λ) = ρM (iλ). If we set λ = 0, then we
obtain
(2)
ρiM (0) = ρiM (0) 1 − 2ρiM
2
(0) ,
244 5 Multivariate Skew Distributions
follows. Conversely, first we can use the formula (3.22), p. 130, then plug the
moments into (5.2), and get
7 3 7
2 2
κ|Z|,3 = μ|Z|,3 − 3μ|Z|,2 μ|Z|,1 + 2μ3|Z|,1 =2 − .
π π
which is
(3)
ρiM (0) = 2iρiM
2 2
(0) 3ρiM (0) − 2
Using formula (3.25), p. 130, and (5.2) yields the same result
2
2 2
κ|Z|,4 = μ|Z|,4 − 4μ|Z|,3 μ|Z|,1 − 3μ2|Z|,2 + 12μ|Z|,2 μ2|Z|,1 − 6μ4|Z|,1 = 4 −6 .
π π
See Sect. A.8, p. 378 for further derivatives of i-Mill’s ratio. Cumulants of |Z| are
also listed there in (A.38), p. 379.
where is a correlation matrix (in the sequel we shall assume that > 0), ϕ is the
normal density function, and denotes the univariate standard normal distribution.
5.1 The Multivariate Skew-Normal Distribution 245
1
δ= α. (5.3)
(1 + α α)1/2
1 −1
α= δ.
−1 1/2
1−δ δ
The characteristic function of SNd (0, , α) is given in terms of the skew parameter
by
φX (λ) = 2 exp −1/2λ λ iδ λ ,
1
ψX (λ) = log 2 − λ λ + log iδ λ .
2
Cumulants can be derived by taking the T-derivative of ψX . Let us observe that we
can use i-Mill’s ratio since
⊗ ϕ iδ λ
Dλ log iδ λ = i δ = iρiM δ λ δ.
iδ λ
We consider the first two cumulants. The first one follows from
Dλ⊗ ψX = −λ + iρiM δ λ δ
√
at zero, where we plug ρiM (0) = 2/ 2π in and obtain
7
⊗ 2
κX,1 = μ⊗
X,1 = δ.
π
(1)
Dλ⊗2 ψX = −vec + iρiM δ λ δ ⊗2 ;
246 5 Multivariate Skew Distributions
⊗
hence, κX,2 follows
*
⊗ * 2 ⊗2
κX,2 = (−i)2 Dλ⊗2 ψX * = vec − δ ,
λ=0 π
(1)
where again we use ρiM (0) = −i2/π (see Sect. A.8, p. 378 for some more
derivatives of i-Mill’s ratio).
Furthermore we note that derivatives Dλ⊗k ψ result in δ ⊗k , at zero, multiplied by
a constant, say, ck , k = 3, 4 . . .. For instance the third-order one is
do not depend on but on δ only. We can apply the results of Proposition 5.1, to
derive the coefficients ck of δ ⊗k , which are the derivatives of i-Mill’s ratio at zero,
therefore are the cumulants κ|Z|,k , i.e. ck = κ|Z|,k .
Lemma 5.1 The cumulants of the multivariate skew-normal distribution
SNd (μ, , α) are
7
2
κ⊗
X,1 = EX = δ,
π
2
κ⊗
X,2 = vec − δ ⊗2 ,
π
and for k = 3, 4 . . .
κ⊗ ⊗k
X,k = κ|Z|,k δ ,
where is a d × p skewness matrix satisfying a < 1 for all a = 1, while
Z1 ∈ N 0, Ip and Z2 ∈ N (0, Id )* are independent.
* * The modulus of Z1 is taken
*
component-wise, i.e. |Z1 | = *Z1,1* , . . . , *Z1,p * . A simple construction of is
−1/2
= Ip + where is some d × p real matrix.
The expected value EX and variance VarX can be obtained directly from
Eq. (5.5). The expected value is
7
2
EX = E |Z1 | = 1,
π
where variance Var |Z1 | follows from the variance of entries of |Z1 |: Var |Z1 | =
Ip − 2/πIp (see Proposition 5.1). Hence we have
2
VarX = Id − .
π
A location-scale extension of X is given by
X = μ + 1/2 X,
Let Z1 ∈ N 0, Ip and Z2 ∈ N (0, ); then the characteristic function φX of
CF U SNd,p (, ) is
φX (λ) = 2p exp −1/2λ λ p i λ , (5.6)
1
ψX (λ) = p log 2 − λ λ + log p i λ , (5.7)
2
and we shall derive the cumulants by the T-derivative
*
Cumn (X) = (−i)n Dλ⊗n ψX (λ)*λ=0 ,
where, as usual,
! "
∂
Dλ⊗ ψX = vec ψX (λ) .
∂λ
It is seen from (5.12) that cumulants of order higher than 2 do not depend on but
only.
1 p
ψ|Z| (λ) = p log 2 − λ2 + log (iλk ) .
2
k=1
5.1 The Multivariate Skew-Normal Distribution 249
p
p
Dλ⊗ ψ|Z| (λ) = −λ + Dλ⊗ log (iλk ) = −λ + i ρiM (λk ) ek ,
k=1 k=1
p
Dλ⊗2 ψ|Z| (λ) = −vecIp + i ρiM (λk ) e⊗2
(1)
k .
k=1
(k)
Observe that the derivatives ρiM λj of i-Mill’s ratio at zero do not depend
on index j . The first two moments, the expected value μ|Z| and variance
matrix |Z| of |Z|, are* obtained√by substituting zero into these derivatives;
*
μ|Z| = −i Dλ⊗ ψ|Z| (λ)*λ=0 = 2/π1p , and |Z| = − Dλ⊗ ψ|Z| (λ)*λ=0 =
(1 − 2/π) vecIp .
Higher-order T-derivatives have the following simple form
p
(n−1)
Dλ⊗n ψ|Z| (λ) = i ρiM λj e⊗n
j , n > 2.
j =1
p
i⊗
p,n = e⊗n
k , (5.8)
k=1
for short notation of higher-order cumulants below. So the cumulants of |Z| are the
following:
√
Lemma 5.2 We have κ ⊗ |Z|,1 = μ|Z| = 2/π1, κ ⊗ |Z|,2 = vec |Z| =
(1 − 2/π) vecIp , and for n > 2
*
κ⊗ ⊗n * ρiM (0) i⊗
n−1 (n−1)
|Z|,n = (−i) Dλ ψ|Z| (λ) λ=0 = (−i)
n
p,n ,
(n−1)
where ρiM (λ) = ρM (iλ) is i-Mill’s ratio as before and ρiM (0) has clear form
in terms of μ|Z|,k (see Proposition 5.1).
250 5 Multivariate Skew Distributions
Now let us turn to the cumulants of X with characteristic function (5.6). Let us
consider the first derivative
i *
Dλ⊗ log p i λ = ⊗ *
Dλ p (iy) y= λ
p (i λ)
i *
= p−1 iy[1:p]\k ϕ (iyk )*y= λ ek
p (i λ)
k
*
p−1 iy[1:p]\k ϕ (iyk )*y= λ
=i ek ,
p (i λ)
k
#
ρiM (yk ) ek is a vector with coordinates ρiM (yk ), denoted by [ρiM (yk )]k=1:p .
k
The values of all ρiM (yk ) are equal at y = 0; hence we obtain the first cumulant
7
⊗ 2
κX,1 = μ⊗
X,1 = 1.
π
The first derivative of i-Mill’s ratio is necessary for the second derivative of ψX
* * *
Dλ⊗ ρiM (yk )*y= λ ek = i ρiM (yk )*y= λ ek ⊗ek = i ρiM (yk )*y= λ ⊗2 e⊗2
k .
We proceed with
*
*
Dλ⊗2 ψX = −vec + i ⊗2 e⊗2
(1)
ρiM (yk )* k ,
y= λ
k
2
VarX = − .
π
The third-order derivative of ψX does not depend on any more. We repeat the
calculation of second-order case and get
*
*
Dλ⊗3 ψX (λ) = i ⊗3 e⊗3
(2)
ρiM (yk )* k . (5.9)
y= λ
k
(3)
We proceed using Proposition 5.1 for ρiM (0) and obtain
7 7
2 4 2 4
⊗ ⊗3
κX,3 =− −1 ⊗3
ek = − − 1 ⊗3 i⊗
p,3 ,
π π π π
k
where
p
i⊗
p,3 = e⊗3
k ; (5.10)
k=1
in other words i⊗ ⊗2
p,3 = vec ek , with the matrix of vectors of e⊗2
k , which is
k=1:p
e⊗2
k with dimension p2 × p. Let us introduce the notation ek = •,k for
k=1:p
the kth column vector of matrix ; then we have
7 7
⊗ 2 4 ⊗3 2 4
κX,3 =− −1 •,k = − − 1 ⊗3 i⊗
p,3 .
π π π π
k
κ⊗ ⊗r ⊗
X,r = κ|Z|,r ip,r , (5.13)
252 5 Multivariate Skew Distributions
where i⊗
p,3 is defined in (5.10). Let us use the j th column vector •,j = ej of
matrix ; then
⊗
2 r
κX,r = κ|Z|,r
2
1p 1p , r > 2, (5.14)
(r) (r+1)
Dλ⊗ (ek )⊗r ρiM ek λ = iρiM ek λ (ek )⊗r ⊗ Ip ek
(r+1)
= i (ek )⊗(r+1) ρiM ek λ .
2 ⊗r
r
⊗
κX,r = κ|Z|,r
2
•,j ⊗r
•,k = κ|Z|,r
2
•,j •,k .
j,k j,k
The product •,j •,k is the (j, k)th entry of the matrix ; therefore
r
•,j •,k is the (j, k)th entry of the rth Hadamard power ( )r of .
Hence
⊗
2 r
κX,r = 1p 1p .
W = RU, (5.15)
X = μ + 1/2 W,
Cum2m+1 (X) = 0.
of the cumulants generator function ψW (λ) = log φW (λ) = log g (λ λ) of W and
conclude that in this case cumulants are calculated by the T-derivative of the log
characteristic generator function
j ⊗j
**
κ⊗
W,j = (−i) Dλ log λ λ * .
λ=0
254 5 Multivariate Skew Distributions
with coefficients gj = (−1)j g (j ) (0). This will play an important role later on.
Definition 5.1 Let us introduce the log-generator function f = log (g) and define
the “generator moment” by νk = (−1)k g (k) (0), together with the “generator
cumulant” by ζk = (−1)k f (k) (0).
Although neither the generator moments nor the generator cumulants correspond
to moments and cumulants of a random variate, respectively, Faà di Bruno’s formula
(see (3.27), p. 131) is valid and we can express generator cumulants in terms of
generator moments (and vice versa):
k k
k! νj j
ζk = (−1)r−1 (r − 1)! k , (5.16)
j =1 j ! j =1
j!
r=1 j =r,j j =k
where the second sum is taken over all sequences (types) = (1 , . . . , k ), j ≥ 0,
such the following two conditions are satisfied:
#
• j = r.
#j =1
• j =1 j j = k.
f (2) (0) = −2f (1) (0). Observe ν1 = ζ1 , since EWj = 0. Connections between
both moments and generator moments and cumulants and generator cumulants are
stated in the following theorem.
Theorem 5.1 The odd moments μW,2m+1 = EWj2m+1 of Wj are zero and the even
ones are
(2m)!
μW,2m = 2m (2m − 1)!!νm = νm , (5.17)
m!
where νm denotes the generator moment.
Odd cumulants κW,2m+1 of Wj are zero as well, and the even ones are
(2m)!
κW,2m = 2m (2m − 1)!!ζm = ζm , (5.18)
m!
where ζm denotes generator cumulant. Even cumulants can be expressed in terms of
generator moments
m
m! νj j
κW,2m = 2 (2m − 1)!!
m
(−1)r−1 (r − 1)! ,
j =1 j ! j =1
j!
r=1 j =r,j j =m
(5.19)
where the second sum is taken over all sequences = (1 , . . . , ), j ≥ 0, such
that the following two conditions are satisfied:
#
• j = r,
#j =1
• j =1 j j = m.
Proof The marginal characteristic function of entry Wj is g λ2j when λj =
0, . . . λj , . . . 0 . We differentiate g λ2j to obtain the moments of Wj in terms
of generator moments,
∂ 2
g λj = g (1) λ2j 2λj ,
∂λj
∂2 2 2
g λj = g (2)
λ2
j 2λj + 2g (1)
λ2j .
∂λ2j
We proceed with noting that the coefficients of g (n) λ2j = ∂ n /∂λnj g λ2j are
j
scalar-valued series, say, bn , when we take the derivatives of a compound function
256 5 Multivariate Skew Distributions
g λ2j . Therefore, we have
n (n−2)
g (n) λ2j = bnn g (n) λ2j 2λj + bnn−1 g (n−1) λ2j 2λj
(n−2m)
+ bnn−m g (n−m) λj 2λj , (5.20)
n
where m = 2 , and coefficients bnk fulfill the following recursion:
bnn = 1
n
n−k n−k−1
bnn−k = 2bn−1 (n − 2k + 1) + bn−1 , k = 1, . . . , m = .
2
If n is odd, then the power of the last term in (5.20) is n − 2 [n/2] = 1; therefore,
n/2
g (n) (0) = 0; otherwise g (n) (0) = bn g (m) (0); hence
8
0 if n odd,
EWjn =
(−1)m b2m
m g (m)
(0) if n = 2m even.
noticed that bn does not depend on g; hence, we may choose, say, gm (t) =
It can k
been
exp −t (being a valid characteristic function) and derive coefficients b2m ; the
2
result is
m
b2m = 2m (2m − 1)!!;
hence, for n = 2m
Now (5.17) follows changing (−1)m g (m) (0) to the generator moment
(2m)!
EWj2m = νm . (5.21)
m!
Observe that the right-hand side does not depend on the index j ; therefore, all
marginals are distributed equally. Plugging the cumulant generator function f into
(5.20), it is readily seen that the odd generator cumulants ζm are also zero. The even
order cumulants for each Wj are
(2m)!
Cum2m Wj = 2m (2m − 1)!!ζm = ζm .
m!
The general formula (5.27) is based on formula “cumulants in terms of moments”
for ζm in terms of νm (see (3.27), p. 131).
5.2 Elliptically Symmetric and Skew-Spherical Distributions 257
for each entry Wj , where we observe two quantities: one is kurtosis (standardized
generator cumulant since ν1 = ζ1 )
ζ2
κ2 =
, (5.23)
ν12
ν2 − ν12 ν2
μ2 = = − 1,
ν12 ν12
ζm
κm = , (5.25)
ν1m
(see Exercise 5.4). We shall see later that the usage of moment and cumulant
parameters reduces the number of parameters for a multiple spherically distributed
random variate W significantly. The number of characteristics is halved for an
elliptically contoured distribution.
The cumulant parameters κm can be expressed by moment parameters μm in
higher orders as well.
Corollary 5.1 The moments of standardized Wj are zero for odd orders and
2m
Wj μW,2m
E √ = = (2m − 1)!! (
μm + 1) ,
2ν1 (2ν1 )m
m
m! μj + 1 j
κm = (−1)r−1(r − 1)! . (5.27)
j =1 j ! j =1
j!
r=1 j =r,j j =m
The formula (5.27) is valid for all m ≥ 1, since we have seen earlier that
μ1 = 0,
κ1 = 1.
5.2 Elliptically Symmetric and Skew-Spherical Distributions 259
j
we change the ratio νj /ν1 =
μ2 + 1, and hence the assertion (5.27) follows.
Now let us turn to the stochastic representation (5.15) and consider the one
variable case Wj = RUj . We are interested in the even order moments
where the even order moments EUj2m are given by (5.81); hence, we obtain
μR,2m
μW,2m = m
(2m − 1)!!. (5.28)
2 (d/2) m
On the other hand we can use the expression (5.17) and equating it to (5.28) we get
(2m)! μR,2m
νm = m (2m − 1)!!,
m! 2 (d/2)m
where (d/2)m = d/2 (d/2 + 1) · · · (d/2 + m − 1), see (A.24), p. 368 for Pochham-
mer’s symbol (d/2)m . Now we express the generator moment in terms of the
moment of the generating variate R
m! (2m)! μR,2m
νm = μR,2m = 2m .
(2m)! 22m m! (d/2)m 2 (d/2)m
νm (d/2)m μR,2m
μm = − 1 = −1
ν1m (d/2)m μm
R,2
1 μR,2m
= − 1, (5.29)
α (d, m) μm
R,2
Now we use the particular values for moments of Uj , see (5.81) and get
3 3 2 6 3
κW,4 = 2
κR 2 ,2 + κR 2
2 ,1 − 2 κR 2 ,1 = − 2 κR 2 ,1 + κ 2 .
d (d + 2) d d (d + 2) d (d + 2) R ,2
(5.30)
2n 1 κU ,j j
n−r+1
κW,2n = (2n)! 2 , . . . , Rn
Cumr R{1 } , R{ 1
,
2} {n } j ! j!
r=1 j =r,j j =n j =1
j is even
(5.31)
where the summation is taken over all even order cumulants of U1 , since the odd
j
orders are zero and where R{ } corresponds to the block with cardinality j , which
j
includes power R j only (it implies listing R j consecutively j times).
Here the cumulants κU1 ,j of U1 are involved and in particular cases, they can be
evaluated explicitly to get the clear formula for κW1 ,2n .
Example 5.3 Let us take the fourth-order cumulant κW,4 = κRU1 ,4 of Wj , say. We
can apply the formula to cumulants (5.31) and obtain
Now we turn to the cumulants of U1 into moments and use formula (5.81) for the
moments
3
μU1 ,4 = ,
d (d + 2)
κU2 1 ,2 = μU1 ,2 = 1/d,
3 3 6
κU1 ,4 = μU1 ,4 − 3μ2U1 ,2 = − 2 =− 2 .
d (d + 2) d d (d + 2)
6 1
κRU1 ,4 = − κ 4 + 3 2 κR 2 ,2 . (5.33)
d 2 (d + 2) R ,1 d
*
includes the T-moments of W through the coefficients μ⊗ = j D ⊗j φ (λ)*
j (−i) λ W *
λ=0
of λ⊗j .
Now, the characteristic generator g is a function of one variable with series
expansion
∞
(−1)j gj j
g (u) = u ,
j!
j =1
such that gj = (−1)j g (j ) (0) = (−1)j νj . Let us rewrite the characteristic function
using the series expansion of g (u) and obtain
∞
(−1)j gj
φW (λ) = g λ λ = λ λ .
j!
j =1
262 5 Multivariate Skew Distributions
* ∞
(−1)j gj j **
*
(−i)k Dλ⊗k φW (λ)* = (−i)k Dλ⊗k λ λ * .
λ=0 j! λ=0
j =1
We observe that
*
* ∞
*
* (−1)j gj ⊗k
j*
*
μ⊗ = (−i) k ⊗k
D φ W (λ) * = (−i)k Dλ λ λ *
k λ λ=0 j! *
j =1 λ=0
⎧
⎨ 0 if 2j = k,
= 1
⎩ gj cj if 2j = k,
j!
and vector cj does not depend on g. Let us use gj = (−1)j g (j ) (0) = νj , and
⊗2j
cj = Dλ (λ λ)j , hence
νm ⊗2m m
μ⊗
W,2m = D λ λ .
m! λ
We have derived the connection between the generator moments and the marginal
moments by (5.17); now we apply it and obtain
μW1 ,2m m μW1 ,2m ⊗2m m
μ⊗
W,2m = Dλ⊗2m λ λ = D λ λ .
m!2 (2m − 1)!!
m (2m)! λ
*
with coefficients κ ⊗ = n D ⊗j ψ (λ)*
j (−i) λ W * , and the cumulant generator function
λ=0
f = log (g) with series expansion
∞
(−1)j j
f (u) = fj u ,
j!
j =1
Recall that K{r|} denotes particular partitions with size r and type (see page
95). Commutator L−1 r2 is defined by
L−1
r2 = K−1
p(K{r|} )
,
(2r−1)!!
where the summation is over all partitions K{r|} ∈ P2r , with type =
(0, r, 0, . . . , 0), i.e. partitions K{r|} , and includes r blocks of numbers 1 : 2r, each
with cardinality 2. Index r2 corresponds to the entry 2 = r of , see Sect. A.2.1,
p. 353.
Summarizing the above calculations it follows:
Theorem 5.2 The moments and cumulants of odd orders of spherically distributed
W are zero; the moments of even orders are
μW,2m
μ⊗
W,2m = L−1 vec⊗m Id = μW,2m vec⊗m Id ;
(2m − 1)!! m2
κW,2m
κ⊗
W,2m = L−1 vec⊗m Id = κW,2m vec⊗m Id .
(2m − 1)!! m2
√
where −1/2 = 1/ 2ν1 Id . The formula (5.27) shows that κm is a polynomial of
μk ,
k = 2 : m. We denote this polynomial by am (
μ2 . . . ,
μm ) =
κm ; hence,
Cum2m −1/2 W = am ( μm ) L−1
μ2 . . . , m2 vec
⊗m
Id ;
in particular a1 = 1,
⎧
⎨μ2 , m = 2,
μ2 , . . . ,
am ( μm ) = μ3 − 3
μ2 , m = 3,
⎩
μ4 − 4
μ3 − 3
μ22 + 6
μ2 , m = 4.
Proof The crucial point of the proof is understanding the derivative Dλ⊗2m (λ λ)m .
First we notice that
m m
Dλ⊗2m λ λ = Dλ⊗2m fj (λ) ,
j =1
264 5 Multivariate Skew Distributions
where fj (λ) = λ λ, that is we can use the general Leibnitz rule (2.40) and
symmetrize by Sd12m ,
2m ⊗ ⊗kj
Dλ⊗2m j fj (λ) = Dλ fj (λ) , (5.35)
k1:m j
k1:m =2m
= (2m)!vec⊗m Id ,
since the only nonzero term is when kj = 2, j = 1 : m, then Dλ⊗2 fj (λ) = 2vecId .
Therefore, we have
2m m (2m)! m
2 = 2 = (2m)!,
21m (2!)m
terms in the sum (5.35), each equals (vecId )⊗m ; hence, the assertion follows.
Remark 5.2 Both higher-order moments μ⊗ ⊗
W,2m and cumulants κ W,2m assume
calculating the symmetrizer Sd12m first in practice. The actual calculation of the
symmetrizer for large dimension d and order 2m is really time-, memory-, and
space- consuming. One can get an efficient calculation evaluating the Dλ⊗2m (λ λ)m
by using the T-derivative step by step. An instance is
2
Dλ⊗4 λ λ = 23 Id 4 + K−1
(3214) + K −1 ⊗2 3 −1 ⊗2
(1324) vec Id = 2 L22 vec Id ,
where 3 commutator matrices are included instead of 24, which are necessary for
obtaining Sd14 . Further examples can be found in Sect. A.2, p. 353, Appendix.
Example 5.4 Deriving Dλ⊗6 (λ λ)3 means finding 15 partitions K{3|} , with size 3
and type = (0, 3, 0, 0, 0, 0), i.e. splitting up the set 1 : 6 into three blocks, and
each block contains two elements. The partitions are in canonical form, and the
corresponding commutator L−1 32 is listed in Sect. A.2.1, p. 354. The result is
3
Dλ⊗6 λ λ = 48L−1 ⊗3
32 vec Id .
We can calculate EW⊗2m directly from the expected values of the entries. The
nonzero entries of EW⊗2m are those where all terms in the product of entries of W
have even degrees, see (5.28).
Remark 5.3 We compare the expected values of the entries of W⊗2m to the expected
values of those in Z⊗2m , where Z is a standard normally distributed variate. The
nonzero entries of the expected value EZ⊗2m are products with even powers such
5.2 Elliptically Symmetric and Skew-Spherical Distributions 265
that
d
d
E Zi2ki = (2ki − 1)!!,
i=1 i=1
in turn.
An interesting side result is that the vector a = Sd12m vec⊗m Id is symmetric (a ∈
Sd,2m ⊂ Md,2m ) and has the following structure. Let us use the multilinear indexing
j1:2m , for the entries of a (see Sect. 1.3.2), denote i the number of repetitions, i.e.
the multiplicity of i, i = 1, . . . , d, in the index j1:2m (cf. Remark 1.5, p. 20, type of a
multi- index). If there exists an odd i , then aj1:2m = 0; otherwise if all multiplicities
i of multi-index j1:2m are even, then we write i = 2ki and then
1 d
aj1:2m = (2ki − 1)!!.
(2m − 1)!!
i=1
Now replacing the semifactorials in terms of the factorial and power of 2, similarly
to (2m − 1)!! = (2m)!/2m m!, we obtain
−1
m 2m
aj1:2m = ,
k1:d 2k1:d
since k1:d = m.
Example 5.5 Let the generating variate R be Gamma distributed with parameters
ϑ > 0, α > 0; then we have
ϑ r (α + r)
ER r = ;
(α)
266 5 Multivariate Skew Distributions
μ⊗ ⊗
W2 ,2k+1 = μβ2 ,2k+1 μR,2k+1 μU2 ,2k+1 = 0, k = 0, 1, . . . ,
κ⊗ ⊗
X,1 = E |W1 | = μβ1 μR μ|U1 |,1 .
We refer (A.32), p. 374 for the moments of βk including μβ1 = G1,0 , and
Lemma 5.12 for
7
1 1
μ⊗|U1 |,1 = 1p ,
π G1
Remark 5.4 We have seen that the first-order cumulant (and moment) of X is
expressed clearly in terms of dimensions (through G functions) and the expected
value of generating variate R besides the skewness matrix . Hereinafter we shall
express neither the moments, cumulants of βk , the T-moments, nor the T-cumulants
of |U1 | in detail. We collected the corresponding formulae in Sect. A.6, p. 368,
Appendix, instead. The calculation can be carried out by plugging these formulae
into the actual expressions.
The second- and higher-order moments and cumulants of X depend on the
higher-order moments and cumulants of vectors |W1 | and W2 including mixed
ones, for instance the variance of X depends on κ ⊗ ⊗ ⊗
|W1 |,2 , κ W2 ,2 and κ |W1 |,W2 . Let
268 5 Multivariate Skew Distributions
κ⊗
|W1 |,W2 = Cum2 (β1 R |U1 | , β2 RU2 ) (5.37)
Now β1 R and β2 R can be pulled from the conditional cumulants first; then
we observe that |U1 | and U2 are independent from Q = [β1 , R]; therefore,
Cum2 (|U1 | , U2 |Q) = Cum2 (|U1 | , U2 ), which is zero since U1 and U2 are
independent. The second term is zero as well, since Cum1 (U2 |Q) = μ⊗U2 ,1 = 0;
hence, the result
κ⊗
(|W1 |,W2 ) = Cum1 β1 β2 R Cum2 (|U1 | , U2 |Q)
2
+ Cum2 β1 RCum1 (|U1 | |Q) , β2 RCum1 (U2 |Q) = 0.
1/2⊗2
κ⊗
X,2 = ⊗2 ⊗
κ |W1 |,2 + Id −
κ⊗
W2 ,2 .
5.2 Elliptically Symmetric and Skew-Spherical Distributions 269
We use μ⊗ ⊗
|U1 |,2 given by (5.85) and the independence of the components of κ |W1 |,2
for the first term. The second term contains vec = ⊗2 vecIp , as follows:
1/2
1 1/2⊗2 1
Cum2 Id − U2 = Id − vecId = vec Id −
d d
1
= vecId − ⊗2 vecIp , (5.38)
d
where (1.4), p. 7 has been applied, namely
1/2⊗2 1/2 1/2
Id − vecId = vec Id − Id Id − (5.39)
= vec Id − = vecId − ⊗2 vecIp .
Gk = Gk (p), where constants c1 , c2 , and c3 are defined by the previous line of the
formula. Hence, the variance–covariance matrix follows:
is also similar
1/2⊗2k 1/2⊗2 ⊗k
I − (vecId )⊗k = Id − (vecId ) (5.41)
⊗k
= vec⊗k Id − = vecId − ⊗2 vecIp ,
where we used the mixed product rule for T-product (see (1.3), p. 6 and (5.39)).
Case 5.3 (Third-Order Cumulant) We use Lemma 5.5 for neglecting zero terms
and obtain
1/2⊗2
⊗ ⊗3 ⊗ −1
μX,3 = μ|W1 |,3 + L12 ,11 ⊗ Id −
μ⊗ ⊗2
|W1 |,W2
μ⊗ ⊗
|W1 |,3 = μβ1 ,3 μR,3 μ|U1 |,3 ;
hence,
μR,3 μβ ,β 2 7 1
μ⊗ ⊗3 ⊗ L−1 1p ⊗ vecId − ⊗2 vecIp
1 2
X,3 = μR,3 μβ1 ,3 μ|U1 |,3 + dG1 (p) 1
π 2 1,1
κ⊗
+ κ⊗ ⊗
(W2 ,|W1 |,W2 ) + κ W21 ,|W1 |
|W1 |,W212 2
= Id 3 + K−1
(312) + K −1 ⊗ −1 ⊗
(231) κ |W1 |,W21 = L12 ,11 κ |W1 |,W21 .
2 2
5.2 Elliptically Symmetric and Skew-Spherical Distributions 271
We use conditional cumulant (see Example 3.27, p. 157) for both terms. Let us apply
Brillinger’s theorem ((3.67), p. 156) with condition Q = [β1 , R] to
κ⊗ ⊗ −1 ⊗ ⊗ ⊗
|W1 |,3 = Cum1 κ |W1 |,3|Q + L12 ,11 Cum2 κ |W1 |,2|Q , κ |W1 |,1|Q + Cum3 κ |W1 |,1|Q
= μβ1 ,3 μR,3 κ ⊗
|U |,3 + κ 2 2 L −1
β R ,β R 1 ,1 κ ⊗
|U |,2 ⊗ κ ⊗ ⊗3
|U |,1 + κβ1 R,3 κ |U |,1
1 1 1 2 1 1 1 1
κβ R,β 2 R 2 G1
L−1
1
= √2 1 ,1 1p ⊗ vecId . (5.42)
d π 1 2
One can also use the formula for expressing cumulants in terms of moments (see
(3.33)) and find the cumulant as
κ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = μX,3 − L12 ,11 μX,2 ⊗ μX,1 + μX,1 ,
μ⊗ ⊗
|W1 |,4 = μR,4 μβ1 ,4 μ|U1 |,4 ;
μ⊗ = μR,4 μβ 2 ,β 2 μ⊗ ⊗
|U1 |,2 ⊗ μU2 ,2
|W1 |⊗2 ,W⊗2
2 1 2
μR,4 μβ 2 ,β 2 1 1 1
= 1 2
vecIp + 1 2 − vecIp ⊗ vecId ,
d d π G2 (p) p
and finally
1
μ⊗ ⊗4 = μR,4 μβ2 ,4 μ⊗⊗4 = L−1 vec⊗2 vecId .
W2 U2 d (d + 2) 22
μ⊗ ⊗4 ⊗
X,4 = μR,4 μβ1 ,4 μ|U1 |,4
1 −1
+μR,4 μβ 2 ,β 2 L1 ,2 vec
1 2 d2 2 1
d
+ vec 1p 1p − ⊗ vec Id −
πG2 (p)
1
+ L−1
22 vec
⊗2
Id − .
d (d + 2)
(2k − 1)!! ⊗m ⊗
μ⊗
|W1 | = μR,2k+m μβ m ,β 2k μ|U1 |,m ⊗ L−1 ⊗k
k2 vec 2 ,
1m ,W212k 1 2 k
2 (d/2)k
(5.45)
where the moment μβ m ,β 2k is given by G-function (μβ m ,β 2k = Gm,2k (p, d), cf.
1 2 1 2
(A.34), p. 375), T-moment μ⊗
|U1 |,m depends on p and m only (see Lemma 5.12)
and where L−1k2 denotes the moment commutator (see Sect. A.2, p. 353 for moment
commutators L−1lr:1 ).
The joint T-cumulant of |W1 | and W2 can be calculated by the following formula:
κ⊗
|W1 | = ϒs,t (Q) K−1
1m ,2 W212k p K{s} ,L{t}
t =1:k, K{s} ∈Pm ,
s=1:m L ∈P2k
{t}
⊗
× ⊗|aj | κ ⊗
|U1 |,|aj |
j =1:s
⊗
⊗ μU,|bj | L−1 ⊗k
|bj |2 vec 2 , (5.46)
j =1:t
where the summation goes over all K{s} ∈ Pm and L{t } ∈ P2k and where ϒs,t (Q)
is given by the cumulants
ϒs,t (Q) = κβ m β 2k ,k Cum|h| R |ai | , R |bj | |i,j ∈h .
1 2 h∈K
K∈Ps+t
[n/2]
(2k − 1)!!
μ⊗
X,n = L−1
1n−2p ,212p μR,2k+m μβ m ,β 2k
1 2 2p (d/2)p
p=0
× ⊗(n−2p) μ⊗
|U1 |,m ⊗ L−1
p2 vec⊗p
Id −
,
and
[n/2] 1/2⊗2p
κ⊗
X,n = L−1
1n−2p ,212p ⊗(n−2p)
⊗ Id 2 − ⊗2
κ⊗
|W1 | ,
1n−2p ,W212p
p=0
(5.47)
n
n 1/2
= Cumn ( |W1 |)1n−m , I − W2
m 1m
m=0
n
1/2⊗m
n
= ⊗(n−m) ⊗ I − κ⊗
|W1 |1n−2m ,W212m .
m
m=0
Cumulants κ ⊗
|W1 |1m ,W21n−m , where the number of W2 is odd, are zero by Lemma 5.5;
hence (5.47) follows, except the coefficients, which we shall prove separately. Let
us consider κ ⊗
X,3 first, put n = 3 in (5.47), and obtain
κ⊗ ⊗3 ⊗
X,3 = κ |W1 |,3 + 3Sd13 ⊗ Id 2 −
⊗2
κ⊗|W1 |,W21 . 2
The cumulant κ ⊗
|W1 |,W21 has been considered in Example 5.42; now let us consider
2
it together with its coefficient. In a general case the cumulant κ ⊗|W1 |1n−m ,W21m
⊗
1/2⊗2k
contains the cumulant κ U2 ,2k with coefficient I − , namely
1/2⊗2k 1/2⊗2k
I − κ⊗
U2 ,2k = c Id −
vec⊗k Id
⊗k
= c Id 2 − ⊗2 vec⊗k Id ,
√ Z
W= p Z/ Z = RU, (5.48)
S
√
where R = p Z /S is the generating variate, so that R 2 /d ∈ F (d, p) has F -
distribution with degrees of freedom d and p.
Let μ ∈ Rd ; A is a d × d matrix; then the linear transform
X = μ + A W
ER 2m p m (d/2 + m) (p/2 − m) p m
= = G2m (d) G−2m (d) ;
dm d (d/2) (p/2) d
We also have the even order moments of the components of uniform distribution (on
sphere Sd−1 )
(2m − 1)!!
μU1 ,2m = ,
2m (d/2)m
see (5.81). These two moments provide us the moments for components of W
We recall that all entries of W have the same distribution; therefore, we can use
notation W as a general entry.
The cumulants with even order of W can be calculated with the help of cumulant
parameters
κm . We consider the moment parameters μm first, which is
1 μR,2m (d/2)m pm (d/2)m p/2 − 1 m (p/2 − 1)m
μm = m −1 = −1= −1
α (d, m) μR,2 (d/2)m (p/2 − m)m pd/2 (p/2 − m)m
(5.50)
(see (5.29) for moments of R, and (A.24) for Pochhammer’s Symbol). Then
cumulant parameter
κm is calculated by the general expression (5.27).
Now we use Theorem 5.2 to state the following:
Lemma 5.7 Let p > 2m and W be t-multivariate, W ∈ Mtd (p, 0, Id ), with
dimension d and degrees of freedom p, then EW = 0, and both the moments and the
cumulants with odd higher order are zero. The moments with even order are given
by
pm
μ⊗
W,2m = L−1 (vecId )⊗m .
2m (p/2 − m)m m2
X = μ + Rp V. (5.54)
EX = μ + ERp EV,
κ⊗ ⊗j
V,j = κ|Z|,j δ , j = 2.
278 5 Multivariate Skew Distributions
where Gk (d) is defined by (5.79), for particular values of Gk (d) and μRp ,k (see
Sect. A.6.1, p. 370).
Now turning back to expected value
7
p
μ⊗
X = μ + G−1 (p) δ,
π
p p p p
= G−2 (p) vec − G2−1 (p) δ ⊗2 = vec − G2−1 δ ⊗2 ;
2 π p−2 π
We are interested in obtaining skewness and kurtosis of X later on; therefore, our
subject will be the first four cumulants of X.
We have seen (Lemma 3.4 and Exercise 5.14) the following form of third-order
cumulant
κ⊗
X,3 = κ Rp ,3 κ ⊗
V,3 + 3κ ⊗
V,1 ⊗ κ ⊗
V,2 + κ ⊗3
V,1
+ κRp ,1 κRp ,2 3κ ⊗ V,3 + 6κ ⊗
V,1 ⊗ κ ⊗ ⊗
V,2 + κRp ,1 κ V,3 ,
3
where we used symmetrizer Sd13 to get the right-hand sides above. Now we show
another equivalent form by the following Lemma.
5.3 Multivariate Skew-t Distribution 279
κ⊗
X,3 = c1 (p) δ
⊗3
+ c2 (p) L−1
12 ,11 (vec ⊗ δ) ,
where
7
p 2 2 1
c1 (p) = p G−1 (p) G (p) − ,
π π −1 p−3
p 3/2 7 7
2G−1 (p) 2 p pG−1 (p)
c2 (p) = = .
2 (p − 2) (p − 3) π π (p − 2) (p − 3)
Proof We apply symmetrizer Sd13 and use symmetry equivalence form =. Direct
calculation shows
κ⊗ ⊗ ⊗ ⊗ ⊗3
X,3 = μRp ,3 μV,3 − 3μRp ,1 μRp ,2 μV,1 ⊗ μV,2 + 2κRp ,1 μV,1 ;
3
hence, we have
κ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = μRp ,3 μV,3 − 3μRp μRp ,2 L12 ,11 μV,2 ⊗ μV,1 + 2κRp ,1 μV .
3
Now the quantities included in the coefficient of δ ⊗3 are given by (5.55), with
particular values in Sect. A.6.1, p. 370 and those of cumulants κ|Z|,1 are at (A.38).
Therefore,
7 7
p p p p
κ⊗
X,3 = G−1 (p) μ⊗ − G−1 (p) L−1 ⊗ ⊗
12 ,11 μV,1 ⊗ μV,2
p−3 2 V,3 p−2 2
7
p p 3
+2 G (p) μ⊗3
2 2 −1 V
7
p p p p 2
= G−1 (p) μ⊗ − 3 L −1
μ ⊗
⊗ μ ⊗
+ 2 G (p) μ⊗3
2 p − 3 V,3 p − 2 12 ,11 V V,2 2 −1 V
and
7
2
μ⊗
V = δ, μ⊗V,2 = vec,
π
7 7
2 ⊗3 2
μ⊗
V,3 =− δ +3 vec ⊗ δ,
π π
280 5 Multivariate Skew Distributions
since
⎛ 7 7 ⎞ 7 7 3
3
⊗ 2 2 ⎠ ⊗3 2 −1 2 ⊗2 2
⎝
μV,3 = 2 − δ +3 L vec − δ ⊗δ+ δ ⊗3
π π π 12 ,11 π π
⎛ 7 7 7 3 7 3 ⎞ 7
3
2 2 2 2 ⎠ ⊗3 2 −1
= ⎝2 − −3 + δ +3 L vec ⊗ δ
π π π π π 12 ,11
7 7
2 ⊗3 2 −1
=− δ +3 L vec ⊗ δ.
π π 12 ,11
and obtain κ ⊗
X,3 the assertion of the Lemma.
Let us recall Lemma 3.4, where we have seen the following form of fourth-order
cumulant:
κ⊗ ⊗ ⊗ ⊗ ⊗2 ⊗ ⊗2
X,4 = κRp ,4 κ V,4 + 4κ V,3 ⊗ κ V,1 + 3κ V,2 + 6κ V,2 ⊗ κ V,1 + κ V,1
⊗4
(5.57)
+ 4κRp ,3 κRp ,1 κ ⊗ ⊗ ⊗ ⊗2 ⊗
V,4 + 3κ V,3 ⊗ κ V,1 + 3κ V,2 + 3κ V,2 ⊗ κ V,1
⊗2
+ 3κR2 p ,2 κ ⊗V,4 + 4κ ⊗
V,3 ⊗ κ ⊗
V,1 + 2κ ⊗2
V,2 + 4κ ⊗
V,2 ⊗ κ ⊗2
V,1
+ 6κRp ,2 κR2 p ,1 κ ⊗ ⊗ ⊗ ⊗2 ⊗
V,4 + 2κ V,3 ⊗ κ V,1 + 2κ V,2 + κRp ,1 κ V,4 ,
4
5.3 Multivariate Skew-t Distribution 281
where
2
2 ⊗2 ⊗2 2 2
κ ⊗2
V,2 = vec − δ = (vec) ⊗2
− 2 vec ⊗ δ ⊗2
+ δ ⊗4 .
π π π
where
2p2 2 2 3
c1 (p) = G − G2−1 ,
π −1 p−3 π
2p2
c2 (p) = ,
(p − 4) (p − 2)2
2 p2
c3 (p) = G2 .
π (p − 3) (p − 2) −1
We have also
2
κX,4 = μRp |Z| ,4 δ ⊗4 + 3 (vec)⊗2 + 6 κRp ,Rp ,Rp2 − 1 vec ⊗ δ ⊗2 .
π
Proof Let G−1 = G−1 (p). One can derive the formula
κX,4 = κRp4 ,1 κ ⊗
V,4 + 4κ 3
Rp ,Rp κ ⊗
V,1 ⊗ κ ⊗ ⊗2
V,3 + 3κRp2 ,Rp2 κ V,2
by (5.58) for κX,4 directly. We rather pay attention to the particular value of κ ⊗
V,2
κX,4 = κRp4 ,1 κ|Z|,4 + 4κRp ,Rp3 κ|Z|,1 κ|Z|,3 + κRp ,4 κ|Z|,1
4
δ ⊗4
⊗2
2 2
+3κRp2 ,2 vec − δ ⊗2 + 6κRp ,Rp ,Rp2 κ|Z|,1
2
vec − δ ⊗2 ⊗ δ ⊗2 ;
π π
282 5 Multivariate Skew Distributions
κX,4 = κRp4 ,1 κ|Z|,4 + 4κRp ,Rp3 κ|Z|,1 κ|Z|,3 + κRp ,4 κ|Z|,1
4
2 2
2 2
+3 κRp2 ,2 − 6 κRp ,Rp ,Rp2 δ ⊗4
π π
2
+3κRp2 ,2 (vec)⊗2 + 6 κRp ,Rp ,Rp2 − κRp2 ,2 vec ⊗ δ ⊗2 .
π
We proceed with formulae of Sect. A.6.1
2p2
κRp2 ,2 =
(p − 4) (p − 2)2
and
p 2 4
κRp ,Rp ,Rp2 − κRp2 ,2 = − G2−1 .
2 (p − 2) (p − 3)
p2
−6μRp ,4 + 8κRp ,Rp3 + 3κRp2 ,2 − 6κRp ,Rp ,Rp2 + κRp ,4 = −3 G−1 (p)4
2
5.3 Multivariate Skew-t Distribution 283
and where
1 2 p2
μRp ,4 − κRp ,Rp3 = G .
2 −1 p − 3
where
Cr Rp , 1:n = Cum Rp { } , Rp2 , . . . , Rpn ,
1 {2 } {n }
j
where Rp corresponds to the block with cardinality j , which includes the
{ j }
j j
power Rp only (it implies listing Rp consecutively j times).
284 5 Multivariate Skew Distributions
n 1
κ⊗
X,n = Cr Rp , 1:n (n − 22 )!
j !
r=1 j =r, j =1:(n−r+1)\2
j j =n
j
κ|Z|,j ⊗2
× L−1 δ ⊗(n−22 )
⊗ κ V,2 ,
j! [n−22 ]1 ,2
which might be useful from computational point of view (see Sect. A.2 for moment
commutator L−1
[n−22 ]1 ,2 , it has n!/2 ! (n − 22 )! (2!) terms).
2
n
n!
κ⊗
X,n = Sd1n n j
Cumr κ⊗
X,1|R , . . . , κ⊗
X,n|Rp 1:n ,
j =1 j ! (j !)
p 1:1
r=1 j =r,
j j =n
where κ ⊗ ⊗
X,j |Rp = Cumj X|Rp 1:j denotes j copies of κ X,j |Rp , as usual, including
the case j = 0, when Cumj X|Rp is missing from Cumr . Therefore, Cumr
contains exactly r variables. The conditional cumulant
κ⊗ ⊗ ⊗ j
X,j |Rp = κ Rp V,j |Rp = Rp κ V,j ,
n Cr Rp , 1:n ⊗ ⊗
κ⊗
X,n = n! n κ V,jj
=1:n
j =1 j ! (j !)
j j
r=1 j =r,
j j =n
n
n j
1 κ|Z|,j 1
= n!Sd1n Cr Rp , 1:n
j ! j! 2 !
r=1 j =r, j =1,j =2
j j =n
2
1
× κ ⊗
V,2 ⊗ δ
2 ⊗(n−22 )
,
2!
X = μ + K 1/2 (η) V,
where we assume that V and η are independent variates. Let us denote the random
weight function by
Recall that the cumulants of V are given in Lemma 5.1 and suppose we are given
the cumulants of ξ .
In this section we derive the first four cumulants for X ∈ SMSNd (μ, K (η)
, α).
The first cumulant, i.e. the moment of X, is
κ⊗
X,1 = μ.
The variance of X is calculated by using the formula of cumulant for the product of
independent variates (see (3.69), p. 158). We obtain
2 ⊗2 2
κ⊗
X,2 = κξ 2 ,1 κ ⊗
V,2 + κξ,2 κ ⊗2
V,1 = κξ 2 ,1 vec − δ + κξ,2 δ ⊗2
π π
2 ⊗2
= κξ 2 ,1 vec + κξ,2 − κξ 2 ,1 δ
π
2 2 ⊗2 2
= κξ 2 ,1 vec − κξ,1 δ = μξ,2 vec − μ2ξ δ ⊗2 . (5.60)
π π
The third-order cumulant of X follows in the same way (see Exercise 3.21, p.
178)
κ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = κξ 3 ,1 κ V,3 + κξ,ξ 2 L12 ,11 κ V,2 ⊗ κ V,1 + κξ,3 κ V,1 ,
286 5 Multivariate Skew Distributions
where the expression for the cumulants of V has been found; therefore
κ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = κξ 3 ,1 κ V,3 + κξ,ξ 2 L12 ,11 κ V,2 ⊗ κ V,1 + κξ,3 κ V,1
7 7
2 3/2 2 ⊗3 −1 2 2
= κξ 3 ,1 2 − δ + κξ,ξ 2 L12 ,11 vec − δ ⊗2
π π π π
7
3/2 2 ⊗3
⊗δ + κξ,3 δ
π
7
⊗3 −1 2
= c1 δ + κξ,ξ 2 L12 ,11 vec ⊗ δ, (5.61)
π
where L−112 ,11 denotes the moment commutator (see (A.2), p. 353). Collecting
coefficients of δ ⊗3 we obtain that
2 3/2 7
2
c1 = 3μξ,3 − κξ,1
3
− μξ,3 .
π π
We apply the formula (3.72), p. 159 and moment commutators to get the fourth-
order cumulant
κ⊗ ⊗ −1 ⊗ ⊗
X,4 = κξ 4 ,1 κ V,4 + L13 ,11 κξ 3 ,ξ κ V,3 ⊗ κ V,1
+ κξ 2 ,2 L−1 ⊗2 −1 ⊗ ⊗2 ⊗4
22 κ V,2 + κξ,ξ,ξ 2 L12 ,21 κ V,2 ⊗ κ V,1 + κξ,4 κ V,1 . (5.62)
n ⊗ ⊗
κ⊗
X,n = Cr (ξ, 1:n ) L−1
l1:r κ V,jj (5.63)
j =1:(n−r+1)
r=1 j =r,
jj =n
n ⊗ 1 ⊗
= n! Cr (ξ, 1:n ) κ V,jj ,
j =1:(n−r+1) j ! (j !)j
r=1 j =r,
jj =n
where
2
Cr (ξ, 1:n ) = Cumr ξ{1 } , ξ{ 2}
, . . . , ξ n
{n }
5.5 Multivariate Skew-Normal-Cauchy Distribution 287
n j
1 κ|Z|,j
κ⊗
X,n = Cr (ξ, 1:n ) (n − 22 )!
j ! j!
r=1 j =r, j =1:(n−r+1)\2
j j =n
L−1
[n−22 ]1 ,2 δ
⊗(n−22 )
⊗ κ ⊗ 2
V,2 , (5.64)
j
1 1 (n − 22 )! (2!)2 2 !
n!
j ! j! n!
j =1:(n−r+1)
j
1 1
= (n − 22 )! .
j ! j!
j =1:(n−r+1)\2
1
ψX (λ) = log 2 + λ λ + log 1 − F 0|a λ, a a ,
2
where F 0|a λ, a a is the cdf of skew-generalized normal distribution.
We introduce a = δ , a a = α 2 > 0, and
4 ∞
f (λ) = 1 − F 0|δ λ, α = 2ϕ (x) h (x) δ λ dx, (5.65)
0
Faà di Bruno’s formula provides the derivatives of log f (λ) in terms of f (λ)
and its derivatives (see (2.58), p. 95), namely
n
Dλ⊗n log f (λ) = (−1)r−1 (r − 1)!f −r (λ)
r=1 j =r,j j =n
⎛ ⎞
⊗ ⊗
×⎝ Kp−1K ⎠ fλ,j j , (5.67)
( {r|} )
K{r|} ∈Pn
⊗j
where fλ,j = Dλ f (λ), and the third summation is over all possible par-
titions K{r|} ∈ Pn , with size r and type . The number of such K{r|} is
n! n−r+1
j =1 1/j ! (1/j !)j .
Now, let Z denote the standard normal variate.
The derivatives of f (λ) are as follows. The first-order one is
4 ∞
1
Dλ⊗ log f (λ) = 2ϕ (x) Dλ⊗ h (x) δ λ dx (5.68)
f (λ) 0
4 ∞
1
= 2ϕ (x) ϕ h (x) δ λ h (x) dxδ,
f (λ) 0
which corresponds to n = 1, in (5.67). The expected value also follows from this;
putting λ = 0 in (5.68) we obtain
4 7
* 2 ∞ 2
Dλ⊗ log f (λ)*λ=0 = √ 2ϕ (x) h (x) dxδ = μh(|Z|) ;
2π 0 π
hence,
7
* 2
Dλ⊗ ψX (λ)*λ=0 = λ|λ=0 + μh(|Z|) δ,
π
therefore,
*
*
Dλ⊗2k f (λ)* = 0.
λ=0
The T-derivatives of ϕ with even orders at zero are given by the following formula:
* (2k − 1)!!
*
Dλ⊗2k ϕ h (x) δ λ * = √ h (x)2k δ ⊗2k , k ≥ 1.
λ=0 2π
We apply these results to derive the nth, n > 0 order derivatives of f (λ) at 0. Let
k > 1; then
* 4 ∞ *
* *
Dλ⊗2k+1 f (λ)* = 2ϕ (x) Dλ⊗2k ϕ h (x) δ λ h (x)* dxδ
λ=0 0 λ=0
4 ∞
(2k − 1)!!
= √ 2ϕ (x) h (x)2k+1 dxδ⊗(2k+1)
2π 0
(2k − 1)!!
= √ μh(|Z|),2k+1 δ ⊗(2k+1) .
2π
We can rewrite the semi factorial (2k − 1)!! by (A.22), p. 368, and obtain (2n)! =
2n n! (2n − 1)!!
*
* (2k)!
Dλ⊗2k+1 f (λ)* = √ μh(|Z|),2k+1 δ ⊗(2k+1) ,
λ=0 2k k! 2π
which is valid for k ≥ 0 as well.* We apply these T-derivatives to the terms of (5.67)
when λ = 0, i.e. Dλ⊗n log f (λ)*λ=0 . First of all we notice that the summation is over
all partitions K{r|} ∈ Pn with size r and type , when each block of K{r|} has odd
⊗j
cardinality, namely j = 0 for even j . Second, the T-derivatives fλ,j = Dλ f (λ)
⊗
and their T-products ⊗ fλ,j j are equal to δ ⊗n multiplied by a constant for each
size r and type . Therefore, there is no need touse commutator matrices. For a
given r and the number of blocks in K{r|} is n! n−r+1
j =1 1/j ! (1/j !)j .
Hence, we conclude that the cumulants of X depend on those types = 1:n
where components with even indices are zero, i.e. j = 0 for each even j . Let us
290 5 Multivariate Skew Distributions
denote types with j = 0 for all even j by ∗ and introduce the notation
⎛ ⎞
⊗ ⊗
Cr (h, 1:n ) = f −r (λ) ⎝ Kp−1K ⎠ fλ,j j
( {r|} )
K{r|} ∈Pn
n−r+1 j
1 (j − 1)!μh(|Z|),j
= 2r n! √ δ ⊗n
j ! 2(j −1)/2 ((j − 1) /2)!j ! 2π
j =1
n−r+1 j
−(n−r)/2 −r/2 1 μh(|Z|),j
= 2 n!2
r
(2π) δ ⊗n
j ! ((j − 1) /2)!j
j =1
r/2 n−r+1
1 μh(|Z|),j j
−(n−r)/2 2
= n!2 δ ⊗n (5.69)
π j ! ((j − 1) /2)!j
j =1
for the terms of sum in (5.67). Observe that in formula (5.67) if n is odd then r
cannot be even and if n is even then r cannot be odd; therefore (−1)r−1 = (−1)n−1 .
The cumulant generating function ψX (λ) contains a quadratic term; therefore,
κ⊗
X,2 requires spatial attention.
Let n = 2 in (5.67). If r = 1, then the only type is 1 = 0, 2 = 1, and j = 2,
which is even; hence, it does not contribute to the second-order derivative. In the
case if r = 2, then 1 = 2 and j = 1 is odd. So we obtain
*
* 2 1 2 2
Dλ⊗2 log f (λ)* = −2! μ δ ⊗2 = − μ2h(|Z|) δ ⊗2 .
λ=0 π 2! h(|Z|),1 π
n
κ⊗
X,n = (−1)
n−1
(r − 1)! Cr (h, 1:n ) δ ⊗n , (5.70)
r=1 ∗ :j =r,j j =n
5.5 Multivariate Skew-Normal-Cauchy Distribution 291
where Cr (h, 1:n ) is defined by (5.69) and where the second summation runs over
all types ∗ , which fulfils the following assumptions: j = 0 for all even j , j = r,
and j j = n.
Our purpose is to provide the first four cumulants in detail, which will be
necessary for the study of multivariate skewness and kurtosis. First, let us consider
these cumulants with respect to the Theorem above; then we calculate μh(|Z|),k by a
clear formula.
n=3 Take n = 3; if r = 1, then 1,2 = 0, 3 = 1, and j = 3; therefore
7 7
−1 2 μh(|Z|),3 ⊗3 2
C1 (h, ∗ = (0, 0, 1)) = 3!2 δ = μh(|Z|),3 δ ⊗3 ;
π 3 π
We obtained
7 3/2
⊗ 2 2
κX,3 = μh(|Z|),3 + 2! μ3h(|Z|) δ ⊗3 . (5.71)
π π
2 μh(|Z|),3 ⊗4 2
C2 (h, ∗ = (1, 0, 1, 0)) = 4!2−1 μh(|Z|) δ = 4 μh(|Z|) μh(|Z|),3 δ ⊗4 .
π 3 π
where polynomials Hm and Fm fulfill the following differential equations:
d
Hm (z) = zHm−1
(z) + H (z)
dz m−1
m−1
dk
Fm (z) = H (z)
dzk m−1−k
k=0
5.5 Multivariate Skew-Normal-Cauchy Distribution 293
with initial values H0 (z) = 1, F0 (z) = 0 (see Sect. A.7, p. 376 for more details).
We apply the formula (5.74) and get
7
(−1)n−1 1 2 n/2
μh(|Z|),n = (n/2 + 1/2)
(n − 1)! π α 2
π
×
Hn−1 (1/α) M (1/α) − Fn−1
(1/α)
2
7
(−1)n−1 2(n−1)/2 2
= (n/2 + 1/2)
(n − 1)!α n π
π
×
Hn−1 (1/α) M (1/α) − Fn−1
(1/α) .
2
We are interested in moments with odd orders n = 2k + 1, which have a special
form
7
2k k! 2 π
μh(|Z|),2k+1 = 2k+1
H2k (1/α) M (1/α) − F2k
(1/α)
(2k)!α π 2
7
1 2 π
= H 2k (1/α) M (1/α) − F
2k (1/α)
(2k − 1)!!α 2k+1 π 2
7
1 2
= g2k+1 (α) ,
(2k − 1)!! π
1 π
g2k+1 (α) =
H2k (1/α) M (1/α) − F2k
(1/α) ,
α 2k+1 2
in particular
1π
g1 (α) = M (1/α) .
α2
We obtained the moments μh(|Z|),2k+1 in the following simple form:
Lemma 5.10 Let function h be defined by (5.66) and Z denote the standard normal
variate; then
7
1 2
μh(|Z|),2k+1 = g2k+1 (α) .
(2k − 1)!! π
⊗ 1 1 2
κX,1 = M (1/α) δ, κ⊗
X,2 = vec − M (1/α) δ ⊗2 , (5.75)
α α2
and
n r
r
⊗ 2 1 gj (α) j ⊗n
κX,n = n! (−1) r−1
(r − 1)! δ ,
π j ! j!
r=1 ∗ (r) j =1
1
φX (λ) = 2 .
1 − iλ θ − 1/2 λ θ + 1/2λ λ
5.6 Multivariate Laplace 295
We will use Faà di Bruno’s formula (5.67) when the first- and second-order
derivatives of P (λ) count only; therefore, the second summation in formula (2.51)
is over all those partitions K I,II that have blocks with only one or two elements,
i.e. K I,II ∈ PI,II
n (see Sect. 1.4.8.2, p. 47). This implies that the corresponding
I,II
types ∗ (r) of partitions K{r} with size r should fulfill the following assumptions:
j ≥ 0, j = 1, 2, . . . , r, j = r, j j = n, and for j > 2, j = 0.
n
Dλ⊗n log φX (λ) = −Dλ⊗n log P (λ) = (−1)r (r − 1)!P −r (λ)
r=1
⎛ ⎞
⎜ ⎟ ⊗ ⊗
⎝ K−1
I,II
⎠ Pλ,j j ,
p K{r}
∗ (r) I,II
K{r}
⊗j I,II
where Pλ,j = Dλ P (λ). The number of partitions K{r} is the number of all
possible blocks containing 1 element each and all possible blocks of the rest subset
of 1 : n containing 2 elements. If we divide 1 : n into r blocks with one or two
elements, then the number of blocks with one element k, say, and the number of
blocks with two elements m fulfill k + m = r and k + 2m = n, i.e. m = n − r.
Therefore, n/2 ≤ r ≤ n, where · denotes ceiling. The corresponding type to a
given r is ∗ (r) = (1 , 2 , 0, . . . , 0) = (2r − n, n − r, 0, . . . , 0).
n
Dλ⊗n log φX (λ) = −Dλ⊗n log P (λ) = (−1)r (r − 1)!P −r (λ)
r=n/2
⊗2 ⊗1
× L−1
∗ (r) Pλ,2 ⊗ Pλ,1 ,
sense L−1 −1
∗ (r) = L21 ,11 ,
I,II
which corresponds to all possible partitions K{r}
2
n ⊗(n−r)
κ⊗
X,n = (r − 1)!L−1
∗ (r) vec − θ
⊗2
⊗ θ ⊗(2r−n) . (5.76)
r=n/2
= −θ ⊗2 + θ ⊗2 − vec;
therefore, κ ⊗
X,2 = vec.
n = 3 Let us apply (5.76) to n = 3. In this case r is either 3/2 = 2 or 3. Let
r = 2; then ∗ (2) = (1, 1, 0) and L−1 −1 −1
∗ (2) = L12 ,11 (see (A.2), p. 353, for L12 ,11 ).
If r = 3, then ∗ (3) = (3, 0, 0) and L−1
∗ (2) = I. We obtain
κ⊗ −1
X,3 = L12 ,11 vec − θ ⊗2 ⊗ θ + 2θ ⊗3 = L−1 ⊗3
12 ,11 (vec ⊗ θ ) − θ .
(5.77)
⊗2
κ⊗
X,4 = L22 vec − θ
⊗2
+ 2L12 ,21 vec − θ ⊗2 ⊗ θ ⊗2 + 6θ ⊗4
= L22 vec⊗2 − vec ⊗ θ ⊗2 − θ ⊗2 ⊗ vec
5.7 Appendix 297
+2L12 ,21 vec ⊗ θ ⊗2 + (6 − 12 + 3) θ ⊗4
= L22 vec⊗2 + L12 ,21 vec ⊗ θ ⊗2 − 3θ ⊗4 . (5.78)
5.7 Appendix
which is connected to the expected value of the products of the modulus of uniform
random variables |Ui |ki .
Proposition 5.3 Let U ∈ Rd be uniform random variate on sphere Sd−1 ; then
d
1
d
E Ui2ki = (2ki − 1)!!, (5.80)
2k (d/2)k
i=1 i=1
(2k − 1)!!
EUi2k = (5.81)
2k (d/2)k
#
for the even order moments of U, where k = ki , and (d/2)k =
d/2 (d/2 + 1) · · · (d/2 + k − 1), and 2k (d/2)k = d (d + 2) · · · (d + 2 (k − 1)).
d
2ki 1 (2ki )!
d
E Ui = . (5.82)
(d/2)k 22ki ki !
i=1 i=1
#
k = ki , and (d/2)k = d/2 (d/2 + 1) · · · (d/2 + k − 1). We have moments for
the T-products of U; the odd moments are zero and the even ones are
EU12m
EU⊗2m = L−1 vec⊗m Id = EU12m Sd12m vec⊗m Id .
(2m − 1)!! m2
298 5 Multivariate Skew Distributions
The odd cumulants of U are zero and the even ones are
Cum2m (U1 ) −1 ⊗m
Cum2m (U) = L vec Id = Cum2m (U1 ) Sd12m vec⊗m Id .
(2m − 1)!! m2
(5.83)
The higher-order T-moments and T-cumulants of U follow from Theorem 5.2, p.
263. The moment μU1 ,2m is clearly given by the formula 5.81; hence, the moment
μ⊗U,2m as well. The cumulant κU1 ,2m is given through the formula “cumulants via
moments,” see (3.27), p. 131; hence, the cumulant κ ⊗
U,2m needs further calculations
in practice.
Example 5.8 We have fourth-order moment
1 3
EU⊗4 = L−1 ⊗2
22 vec vecId = Sd1 vec⊗2 Id ;
d (d + 2) d (d + 2) 4
3 3 −6
κU1 ,4 = μU1 ,4 − 3μ2U1 ,2 = − 2 = 2 .
d (d + 2) d d (d + 2)
Lemma 5.12 Let U be uniform on sphere Sd−1 . The higher-order moments of the
modulus
7
d
1 1 1 d
E |Ui | =
ki
((ki + 1) /2) , (5.84)
π d1 Gk (d)
i=1 i=1
#
where k = ki , d1 is the number of nonzeros ki , in particular
7
1 k!
E |Ui | 2k+1
= .
π G2k+1 (d)
1 1 1
E |U|⊗2 = vecId + 1d 2 − vecId , (5.85)
d π G2 (d)
5.7 Appendix 299
and
7 7
1 1
d
⊗3 1 1
E |U| =
e(k,k,k) +
e(k,k,j ) (5.86)
π G3 (d) π 2G3 (d)
k=1 j =k
7
1 1
+ 3
e(j,k,m) ,
π G3 (d)
j =k=m
3 d
1 1
E |U|⊗4 =
e(j,j,j,j ) +
e(j,j,j,k)
d (d + 2) π G4 (d)
j =1 j =k
1
+
e(j,j,k,k) (5.87)
4G4 (d)
j =k
7
1 1 1 1
+ 3
e(j,j,k,m) + 2
e(j,k,m,n) ,
π G4 (d) π G4 (d)
j =k=m j =k=m=n
e(j,k,m,n) ∈ Sd,4 .
where
Remark 5.5 Observe that in spite of E |Ui |2 = 2 |U|⊗2 = EU⊗2 , since vector
* EU* i,E
|U| contains mixed products entries |Ui | *Uj * (i = j ) and these do not equal
⊗2
Ui Uj . Another observation is that there are only three distinct elements of the third-
order and four distinct elements of the fourth-order moments!
Proof Since U is strongly connected to the standard normal random variable on Rd ,
it is quite straightforward to derive the moments of U
EU⊗(2k+1) = 0, k = 1, 2, . . . ;
Cum2k+1 (U) = 0,
which follows from the expression of the cumulant via moments. (Namely to
express cumulant via moments one uses the moments of products according to the
partitions of the set {1, 2, . . . 2k + 1}. Now, any partition of this set contains at least
one block that has odd number elements, and it makes the expectation zero.) We
have, see (5.80),
(2k − 1)!!
EUi2k =
2k (d/2)k
300 5 Multivariate Skew Distributions
#
for the even order moments of U, where k = ki , (d/2)k = d/2 (d/2 + 1) · · ·
(d/2 + k − 1), and 2k (d/2)k = d (d + 2) · · · (d + 2 (k − 1)). First we calculate
EU⊗4 . The entries of U⊗4 contain all possible fourth-order products from U. Now
we are interested in those entries of U⊗4 that have even order products. These are
either Uk4 or Uj2 Uk2 , j = k. We shall separate them since EU14 and EU12 U22 , say, are
different. We have
3 1
EU14 = , EU12 U22 =
d (d + 2) d (d + 2)
by (5.80). We notice that EZ⊗4 = 3Sd14 (vecId )⊗2 for a standardized Gaussian
vector with i.i.d. entries. The structure of EZ⊗4 is exactly the same as that of EU⊗4 ,
having nonzero entries EZ14 and EZ12 Z22 , being 3 and 1, respectively. The difference
is 1/d (d + 2) ; hence,
3
EU⊗4 = Sd14 (vecId )⊗2 = EU14 Sd14 (vecId )⊗2 .
d (d + 2)
The sum of the vector (vecId )⊗2 is d 2 ; the symmetrizer Sd14 is rearranging the
entries only; therefore,
3 3d
EU⊗4 = d2 = .
j d (d + 2) d +2
j
Now let us turn to the modulus. Let Z be standard normal, then Z and Z are
independent, and U = Z/ Z and Z are independent as well. Let
It is known that
((k + d) /2)
E Zk = 2k/2 .
(d/2 )
Hence, we have
7
1 ((n + 1) /2) (d/2)
E |U| = E |Z| /E Z 1d =
n n n
1d . (5.88)
π ((n + d) /2)
d
1
d
E |Ui |ki = E |Zi |ki
i=1
E Z k
i=1
7
d1
(d/2) (ki −1)/2 2
= k/2 2 ((ki + 1) /2)
2 ((k + d) /2) π
i=1
7
1 1 d
1
= ((ki + 1) /2) ,
π d1 Gk (d)
i=1
#
since the entries of Z are independent where k = ki , and d1 is the number of
nonzeros ki .
Nevertheless, we have a general formula (5.88), it looks simpler to use separate
formulae for even powers |U|2k = U2k (the power is in entry-wise), see (5.82), and
for odd powers we write
7
1 (k + 1)
E |U|2k+1 = 1 2k+1 . (5.89)
π G2k+1 (d) d
We shall evaluate the expected values of |U|⊗3 and |U|⊗4 ; clearly, each of them has
entries of products of modulus with different entries; therefore, it will be necessary
to collect the entries with the same powers.
Let us start by putting U = [Uk ]k=1:d into the form,
d
|U| = |Uk | ek ,
k=1
302 5 Multivariate Skew Distributions
* *
where ek ∈ Rd are unit vectors. The entries of |U|⊗2 are Uk2 , and *Uj * |Uk |, j = k.
Observe that
d
vecId =
e(j,j ) ,
j =1
where
e(j,j ) are orthogonal vectors in Sd,2 , and the sum of all such vectors is 1d 2 .
Now we use the formulae (5.82) and (5.89), respectively, and obtain
1 1 1
E |U|⊗2 = vecId + 1d 2 − vecId .
d π G2 (d)
⊗3 ⊗3
* * we2 are interested* in* E |U| , where the entries of |U|
Next, include |Uk |3 ,
*Uj * |Uk | , j = k, and *Uj * |Uk | |Um |, j = k = m. These types of products have
different expected values. We write |U|⊗3 in terms of e(j,k,m) ∈ Sd,3 first
d ⊗3
d * *
|U| ⊗3
= |Uk | ek = |Uk |3
e(k,k,k) + *Uj * |Uk |2
e(k,k,j )
k=1 k=1 j =k
* *
+ *Uj * |Uk | |Um |
e(j,k,m) ,
j =k=m
d * * * *3 * *2
|U|⊗4 = *Uj *4
e(j,j,j,j ) + *Uj * |Uk |
e(j,j,j,k) + *Uj * |Uk |2
e(j,j,k,k)
j =1 j =k j =k
* *2 * *
+ *Uj * |Uk | |Um |
e(j,j,k,m) + *Uj * |Uk | |Um | |Un |
e(j,k,m,n) ,
j =k=m j =k=m=n
√
where e(j,k,m,n) ∈ Sd,4 , and the formulae G4 (d) = d (d + 2) /4, and (3/2) =
3 π/4
3
EU14 = ,
d (d + 2)
1 1
E |U1 | |U2 |3 = ,
π G4 (d)
1
E |U1 |2 |U2 |2 = ,
4G4 (d)
7
1 1
E |U1 | |U2 | |U3 | =
2
3
,
π G4 (d)
1 1
E |U1 | |U2 | |U3 | |U4 | = ,
π 2 G4 (d)
3
d
1 1 1
E |U|⊗4 =
e(j,j,j,j ) +
e(j,j,j,k) +
e(j,j,k,k)
d (d + 2) π G4 (d) 4G4 (d)
j =1 j =k j =k
7
1 1 1 1
+
e(j,j,k,m) +
e(j,k,m,n) .
π 3 G4 (d) π 2 G4 (d)
j =k=m j =k=m=n
L−1
n2 = Kp−1K ;
( n, )
(2n−1)!!
304 5 Multivariate Skew Distributions
the summation is over all partitions Kn, ∈ P2n , with type = (0, n, 0, . . . , 0) (see
Sect. A.2, p. 353).
Proof We use the formula (2.58),
⎛ ⎞
2n ⊗ ⊗
Dx⊗2n f (g (x)) = f (r) (g) ⎝ Kp−1K ⎠ gx,j j ,
( r, )
r=1 j =r,j j =2n Kr, ∈P2n
(5.91)
where partitions Kr, ∈ P2n , with size r and type . The type = 1:n =
(1 , . . . , n ) means that the j th-order derivative happens j times. Let f (g (λ)) =
(λ λ)n , so that f (x) = x n , and g (λ) = λ λ. In that case if j is larger than 2, then
⊗
gλ,jj = 0.
Let r = n−1; then the set 1 : 2n is split up into n−1 blocks; therefore, there must
⊗
be at least one block with cardinality at least 3; hence, ⊗ gλ,jj = 0. Similarly, if
⊗
r < n, then ⊗ gx,j j = 0. If r > n, then f (r) (x) = 0.
The only nonzero value occurs in the sum (5.91), p. 304, when r = n, and j = 2,
⊗n
2 = n, j = 0, j = 2. We obtain f (n) (x) = n!, gλ,2 = 2 (vecId )⊗n ; in addition
we need all the partitions Kn, ∈ P2n , with = (0, n, 0, . . . , 0). The number of
these partitions N = (2n)!/2n n! = (2n − 1)!!, see (2.53), p. 91, and the assertion
follows.
= μβ1 ,4 μR,4 κ ⊗ −1 ⊗ ⊗ −1 ⊗2
|U1 |,4 + κβ13 R 3 ,β1 R L13 ,11 κ |U1 |,3 ⊗ κ |U1 |,1 + L22 κβ12 R 2 ,2 κ |U1 |,2
+κβ 2 R 2 ,β1 R,β1 R L−1
1 ,2
2 1
κ ⊗
|U 1 |,2 ⊗ κ ⊗2
|U 1 |,1 + κβ1 R,4 κ ⊗4|U1 |,1
1
= Sd14 μβ1 ,4 μR,4 κ ⊗ ⊗ ⊗ ⊗2
|U1 |,4 + 4κβ13 R 3 ,β1 R κ |U1 |,3 ⊗ κ |U1 |,1 + 3κβ12 R 2 ,2 κ |U1 |,2
+6κβ 2 R 2 ,β1 R,β1 R κ ⊗ |U1 |,2 ⊗ κ ⊗2
|U1 |,1 + κ β 1 R,4 κ ⊗4
|U1 |,1
1
5.7 Appendix 305
(see (A.5), p. 354 for commutator L−1 12 ,21 , (A.36), p. 375 for κβ13 R 3 ,β1 R , say). Only
those conditional cumulants are nonzero, which separate |W1 | and W2 and do not
include the first-order cumulant of W2 . Let us denote 2 = Id − , for short;
then
⊗2 ⊗ 2 κ⊗
1/2⊗2 1/2 1/2
|W1 |12 ,W212 = Cum 4 |W 1 | , |W1 | , 2 W 2 , 2 W2
= L−1 ⊗ ⊗
22 Cum2 κ |W1 |,2|Q , κ 1/2 + L−1
21 ,12 Cum3
2 W2 ,2|Q
× κ⊗ ⊗ ⊗
|W1 |,1|Q , κ |W1 |,1|Q , κ 1/2 2 W2 ,2|Q
= κβ 2 R 2 ,β 2 R 2 L−1
22 κ⊗ ⊗
|U1 |,2 ⊗ μ 1/2 + κβ1 R,β1 R,β 2 R 2 L−1
21 ,12
1 2 2 U2 ,2 2
× κ ⊗2 ⊗
|U1 |,1 ⊗ μ 1/2 2 U2 ,2
κβ 2 R 2 ,β 2 R 2
L−1 ⊗2 ⊗ 1/2⊗2
= 1 2
22 κ |U1 |,2 ⊗ 2 vecId
d
κβ1 R,β1 R,β 2 R 2
L−1 ⊗2 ⊗2 1/2⊗2
+ 2
21 ,12 κ |U1 |,1 ⊗ 2 vecId
d
κβ 2 R 2 ,β 2 R 2
= 1 2
L−1
22 ⊗2 ⊗
κ |U1 |,2 ⊗ vec 2
d
κβ1 R,β1 R,β 2 R 2
+ 2
L−1
21 ,12 ⊗2 ⊗2
κ |U1 |,1 ⊗ vec 2 .
d
We use conditional cumulants and neglect all terms that include W2 with odd order;
we get
−6μβ2 ,4 μR,4 κβ 2 R 2 ,2
= vec⊗2 2 + 2 2 L−1 ⊗2
22 vec 2 .
d (d + 2)
2 d
One can get the final expression of κ ⊗X,4 by plugging these terms into (5.44) and
⊗
conclude the dependence of κ X,4 on skewness matrix and generating variate R;
the rest of the quantities depend on dimensions d and p only.
306 5 Multivariate Skew Distributions
Proof The assertion (5.45) simply follows from the assumptions of the model
CFUSS, cf. (5.36).
Using Brillinger’s theorem ((3.67), p. 156) for T-cumulant κ ⊗
|W1 |1m ,W212k under
condition Q = [β1 , R], we obtain
κ⊗
|W1 | = K−1
Cum
s+t
1m ,W212k p K{s} ,L{t}
t =1:k, K{s} ∈Pm ,
s=1:m L ∈P2k
{t}
× κ⊗ ⊗ ⊗ ⊗
|W1 |,|a1 ||Q , . . . , κ |W1 |,|as ||Q , κ W2 ,|b1 ||Q , . . . κ W2 ,|bt ||Q ,
since by Lemma 5.5 cumulants of W2 with odd orders are zero. Now conditional
cumulants
κ⊗
|W1 |,|aj ||Q
= Cum|aj | (|W1 | |Q) = (β1 R)|aj | κ ⊗
|U1 |,|aj |
,
and
κ⊗
W2 ,|bj ||Q
= Cum|bj | (W2 |Q) = (β2 R)|bj | κ ⊗
U2 ,|b1 | .
Therefore
Cums+t κ ⊗ |W1 |,|a1 ||Q , . . . , κ ⊗
, κ ⊗
|W1 |,|as ||Q W2 ,|b1 ||Q , . . . , κ ⊗
W2 ,|bt ||Q
= Cums+t (β1 R)|a1 | κ ⊗ |U1 |,|a1 | , . . . , (β1 R)
|as | ⊗
κ |U1 |,|as | ,
(β2 R)|b1 | κ ⊗
U2 ,|b1 | , . . . , (β2 R)
|bt | ⊗
κ U2 ,|bt |
= Cums+t (β1 R)|a1 | , . . . , (β1 R)|as | , (β2 R)|b1 | , . . . , (β2 R)|bt |
⊗ ⊗
κ⊗
|U1 |,|aj |
⊗ κ⊗U2 ,|bj |
⊗ ⊗
= ϒs,t (β1 , R) κ⊗
|U1 |,|am | ⊗ μU,|bj | L−1 ⊗k
|b | vec Id , j 2
5.8 Exercises
5.1 Show
(4) (3) (2) (1)
ρiM = (x − 2iρiM ) ρiM + 3ρiM 1 − 2iρiM ,
and conclude
(4) 5 3
ρiM (0) = 24ρiM (0) − 20ρiM (0) + 3ρiM (0) ;
hence,
7 5 7 3 7
(4) 2 2 2
κ|Z|,5 = i(−i)5 ρiM (0) = 24 − 20 +3 .
π π π
Next use the formula (3.27), p. 131, and show the similar result.
5.2 Take an X with multivariate skew-normal distribution and use central moments
to show
7 7
⊗ 2 2 ⊗3
μX,3 = 3 vec ⊗ δ − δ
π π
μ⊗
X,4 = 3 (vec)
⊗2
.
308 5 Multivariate Skew Distributions
κ4 =
μ4 − 4 μ22 + 6
μ3 − 3 μ2 ,
κ5 =
μ5 − 5
μ4 − 10
μ3 μ22 + 10
μ2 + 30 μ3 − 10
μ2 .
5.10 Let random variable β12 be Beta(p/2, d/2) distributed, and β22 = 1 − β12 .
Assume that the random variables R and β1 are independent. Use conditional
cumulants to show
and
5.12 Take random vectors W1 and W2 defined by model (5.36), and use conditional
cumulants to show
5.13 Assume that X ∈ Std (μ, , α, p), and p > 2, use formula (5.58) to derive
the second-order cumulant
p p
κ⊗
X,2 = vec − G2−1 (p) δ ⊗2 .
p−2 π
= κRp ,3 κ ⊗ ⊗ ⊗
V,3 + 3κ V,2 ⊗ κ V,1 + κ V,1
⊗3
+ κRp ,1 κRp ,2 3κ ⊗ ⊗ ⊗
V,3 + 6κ V,2 ⊗ κ V,1 + κRp ,1 κ V,3 .
3 ⊗
κ⊗
W,3 = κRp3 ,1 κ|Z|,3 δ
⊗3
+ 3κRp ,Rp2 κ|Z|,1 δ ⊗ κ ⊗
V,2 + κRp ,3 κ|Z|,1 δ
2 3 ⊗3
2
= κRp3 ,1 κ|Z|,3 + κRp ,3 κ|Z|,1 − 3κRp ,Rp2 κ|Z|,1
3
δ ⊗3 + 3κRp ,Rp2 κ|Z|,1 δ ⊗ vec.
π
5.16 Let W be given by the previous exercises. Use Sect. A.6.1, p. 370 and (A.38),
p. 379 and show
7 7
2 2 2 1 3p3/2 1
κ⊗
W,3 = p
3/2 G − ⊗3
δ + G−1 Sd13 (vec ⊗ δ) .
π π −1 p − 3 (p − 2) (p − 3) π
310 5 Multivariate Skew Distributions
5.17 Assume that X ∈ Std (μ, , α, p), and p > 3. Show that
7
2 p 3/2 2 1
κX,3 =2 G−1 (p) 2
G−1 (p) − δ ⊗3
π 2 π p−3
2
+ (3δ ⊗ vec) .
(p − 3) (p − 2)
and
p 2
−6μRp ,4 + 8κRp ,Rp3 + 3κRp2 ,2 − 6κRp ,Rp ,Rp2 + κRp ,4 = −6 G−1 (p)4 .
2
5.19 Let Rp be given by (5.53). Show that
5.21 Assume h (x) is defined by (5.66) and Z is a standard normal variate; show
7
1 2 1 6 π 1 1
μh(|Z|),5 = + + 3 M (1/α) − + 5 .
3α 5 π α4 α2 2 α α2
Moments of folded normal distribution are obtained in Elandt [Ela61]. The mul-
tivariate skew-normal distribution was introduced by Azzalini and Dalla Valle
[ADV96]. Further properties and applications of skew-normal distribution including
canonical fundamental skew normal can be found in papers [AC99, GHL01, PP01,
AC03, AVGQ04, Azz05, AVG05, Cap12, CC13, Shu16] and [AVFG18] among
others.
We refer the reader to the book by Fang, Kotz, and Ng, [FKN17] for the theory
of spherically symmetric or rotationally symmetric distributions. The distributions
of elliptical or ellipsoidal symmetry are considered in [Ste93, Sut86]; higher-
order moments are given in [BB86]. Further properties of such distributions
5.9 Bibliographic Notes 311
are given in [And03, KvR06, Mui09], and asymmetric versions like canonical
fundamental skew-spherical distribution [AB02, Gen04, GL05, DL05]. We follow
[AC03, SDB03] and [KM03] also deal with skew t-distribution. We refer to a
review article [Ser04] on multivariate symmetry. Some more skew distributions
include scale mixtures of skew normal [Kim08], multivariate skew-normal-Cauchy
[KRYAV16], and multivariate Laplace distribution [KS05, Kv95]. A recent review
of skew-symmetric (also called symmetry-modulated) distributions can be found in
[AA20] and [JTT21b].
Chapter 6
Multivariate Skewness and Kurtosis
Using the standard normal distribution as the yardstick, statisticians have defined
notions of skewness (asymmetry) and kurtosis (peakedness) in the univariate case.
Since all the odd central moments (when they exist) are zero for a symmetric
distribution on the real line, a first attempt at measuring asymmetry is to ask how
different the third central moment is from zero, although in principle one could use
any other odd central moment, or even a combination of them.
The analysis presented here is based on the cumulant vectors of the third and
fourth order, defined below. In our derivations, we utilize an elegant and powerful
tool: the T-derivative and the T-cumulants.
In this section and the next one, it will be shown that all cumulant-based measures
of skewness and kurtosis that have appeared in the literature can be expressed in
terms of the third and fourth cumulant vectors, respectively. Also, several hitherto
unnoticed relationships between different indices will be explored. We define what
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 313
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5_6
314 6 Multivariate Skewness and Kurtosis
Y = −1/2 (X − μ)
with zero means and identity matrix for its variance–covariance. A complete
picture of skewness is contained in the third-order T-cumulant (skewness vector)
of standardized X
⊗3
κ⊗
Y,3 = Cum3
−1/2
(X − μ) = −1/2 κ⊗
Y,3 .
We shall use the Kendall–Stuart’s notation for both skewness and kurtosis.
Definition 6.1 The third-order T-cumulant γ ⊗ ⊗
X,1 = κ Y,3 of the standardized X will
be called the skewness vector of X, and the total skewness of X is defined by the
square norm of the skewness vector
2
γ1,d = γ ⊗
X,1 .
This definition guarantees that skewness is invariant under the shifting and
orthogonal transformations, in other words it is affine invariant. Let Q be an
orthogonal transformation. We show that the total skewness of X is invariant under
the orthogonal transformations Q of X, indeed
γ⊗ ⊗ ⊗
QX,1 = κ Y,3 = γ X,1 .
In view of the fact that EY = 0 and Y has unit variance, the third-order cumulant
equals the third-order central moment, so that
γ⊗ ⊗
X,1 = μY,3 . (6.1)
In general cumulants are invariant under shifting by a constant hence the assumption
that EY = 0 does not influence the treatment while using the moments simplifies
some formulae.
The skewness vector γ ⊗ ⊗
X,1 is 3-symmetric, i.e. γ X,1 ∈ Sd,3 , and therefore not all
entries are distinct. For instance: if d = 2, then
2
γ1,2 = γ ⊗
X,1 = κY21 ,3 + 3κ(Y
2
1 ,Y1 ,Y2 )
+ 3κ(Y
2
1 ,Y2 ,Y2 )
+ κY22 ,3 .
Now we list skewness measures for some distributions which have been consid-
ered in the previous chapters.
Case 6.1 (Normal Distribution) We see that the multivariate Gaussian distribution
is not skewed since cumulants of order higher than two are all zero.
Case 6.2 (Skew-Normal Distribution) If X is a skew-normal variate, X ∈
SNd (μ, , α), then the variance is given by
⊗ 2 ⊗2 2
κX,2 = vec − δ , VarX = − δδ ,
π π
therefore the skewness vector is
2 −1/2⊗3 ⊗3
γ⊗
1 = κ |Z|,3 − δδ δ ,
π
√
where κ|Z|,3 = 2/π (4/π − 1) and δ is the skew parameter, (see Lemma 5.1).
Hence the total skewness can be written as
3
2 −1⊗3 ⊗3 2 −1
γ⊗ δ ⊗3
2
1 = 2
κ|Z|,3 − δδ δ = κ|Z|,3 δ − δδ
2
δ .
π π
√
One can observe that the skew parameter δ = π/2μX , (see (5.4), p. 246);
therefore, the total skewness can be expressed by the mean and the variance as
2 3
4−π −1
γ⊗
2
1 = μX − μX μX μX .
2
Case 6.3 (CFUSN) The skewness for canonical fundamental skew-normal distri-
bution, X ∈ CF U SNd,p (0, , ), is based on the results of Lemma 5.3, namely
the variance and the third-order cumulant which are given by
7
2 ⊗ 2 4
VarX = − , κX,3 = − − 1 ⊗3 i⊗
p,3 .
π π π
⊗
Now we combine the variance VarX and the cumulant κX,3 to obtain the skewness
vector of X as
7 −1/2⊗3
⊗ 2 4 2
γ1 = − −1 −
⊗3 i⊗
p,3
π π π
−1/2 ⊗3
2
= −
i⊗
p,3 ,
π
316 6 Multivariate Skewness and Kurtosis
2 −1 3
2 4 2
γ⊗ 1p −
2
1 = −1 1p ,
π π π
see (5.60), p. 285, and (5.61), p. 286. Let us recall that c1 is given explicitly
2 3/2 7
2
c1 = 3μξ,3 − κξ,1
3
− μξ,3 .
π π
(see (5.71), p. 291). Clear expressions of μh(|Z|) and μh(|Z|),3 are given in
Examples 5.6 and 5.7, p. 294. Hence the skewness vector is
7 3/2 −1/2 ⊗3
2 2 1 2
γ⊗
1 = μh(|Z|),3 + 2! 3
μh(|Z|) − 2 M (1/α) δδ
δ .
π π α
6.1 Multivariate Skewness of Random Vectors 317
Case 6.9 (ML) If X has Multivariate Laplace distribution, X ∈ MLd (θ , ), then
VarX = and κ ⊗ −1 ⊗3
X,3 = L12 ,11 (vec ⊗ θ ) − θ ,
The following examples reveal several relationships among the well known
indices of skewness which appeared in the literature, and their connection to the
skewness vector γ ⊗
1 , which can actually be seen as the common denominator.
Example 6.1 (Mardia’s Skewness) Mardia suggested the square norm β1,d =
2 ⊗3
EY⊗3 of the vector EY⊗3 = E −1/2 (X − μ) as a measure of departure
from symmetry for X. Mardia’s measure coincides with our total skewness
2
β1,d = γ ⊗
X,1 = γ1,d ,
since third-order central moments and third-order cumulants are equal. Recall that
γ⊗ ⊗
X,1 = κ Y,3 by Definition 6.1. Let Y1 and Y2 be two independent copies of Y, then
3 ⊗3 ⊗3 2
E Y1 Y2 = EY1 Y⊗3
2 = EY1 EY⊗3
2 = EY
⊗3
= β1,d .
n
3
31,d = 1
β Yi Yj (6.2)
n2
i,j =1
based on a sample Y1 , . . . , Yn .
Example 6.2 (Móri–Székely–Rohatgi) The skewness vector b (Y) of Y can be
defined by the quantity
b (Y) = E Y Y Y = vec Id ⊗ Id κ ⊗
Y,3 , (6.3)
(see Exercise 6.1) and we will refer to it as MSzR skewness vector. Note that
vec Id ⊗ Id is a matrix of dimension d × d 3 , which contains d unit values per-
row, whereas all the others are 0; as a consequence, this measure
does not
take into
account the contribution of cumulants of the type Cum3 Xj , Xk , X , where all
three indices j , k, are different from each other. The corresponding scalar measure
of multivariate skewness is b (Y) = b (Y)2 .
Example 6.3 (Malkovich–Afifi Measure) They consider the following approach to
measure skewness: Let Sd−1 be the d − 1 dimension unit sphere in Rd and let
318 6 Multivariate Skewness and Kurtosis
where cos (a, b) indicates the cosine of the angle between the vectors a and b. Thus
we have the following inequality:
2
2 2
b (Y) = sup u⊗3 κ ⊗
Y,3 = κ⊗
Y,3 sup cos2 u⊗3 , κ ⊗ ⊗
Y,3 ≤ κ Y,3 .
u u
⊗3
In case there exists a u0 such that cos u0 , κ ⊗
Y,3 = 1; in other words κ ⊗ Y,3
is a rank 1 element of the symmetric subspace Sd,3 , (see Sect. 1.3, p. 13) then
2
Malkovich–Afifi measure b (Y) = κ ⊗ Y,3 = γ1,d . For instance this occurs when
X is skew-normal. Another instance is when Y has independent components. In
this case all the entries of κ ⊗ Y,3 are zero but not necessarily κYi ,3 . Assume that all
the components of Y are skewed, κYi ,3 = 0, otherwise we can restrict ourselves
to a smaller dimensional problem. A particular example of this is when d = 2,
κ⊗
Y,3 = [κY1 ,3 , 0, 0, 0, 0, 0, 0, κY2 ,3 ] . In the case of independence we have
d
κ⊗
Y,3 = κYi ,3 e⊗3
i .
i=1
Remark 6.1 Take an orthogonal transformation Q, and let X = QY. Then the
Malkovich–Afifi measure of skewness of X and that of Y are equal, i.e. b (X) =
b (Y).
2 2 2
Indeed u⊗3 , κ ⊗ = u⊗3 , Q⊗3 κ ⊗ = Q −1 u ⊗3 , κ ⊗ ≤ b (Y)
Y,3 Y,3 Y,3
6.1 Multivariate Skewness of Random Vectors 319
2
for any u, and for any Q, hence b (X) ≤ b (Y). Similarly u⊗3 , κ ⊗ Y,3 =
2 2
u⊗3 , Q−1⊗3 κ ⊗
X,3 = (Qu)⊗3 , κ ⊗
X,3 ≤ b (X), hence b (Y) ≤ b (X).
3 d
Tk = κ(Yk ,Yj ,Yj ) .
d (d + 2)
j =1
Notice that vector T does not depend on Cum3 Yj , Yk , Y when j, k, are all
different from each other; furthermore, the vector T coincides with the skewness
vector (6.3) up to a constant. One can show that
4
3
u ⊗ u⊗3 (du) = vec Id ⊗ Id .
Sd−1 d (d + 2)
1 3 2
d
b12 (X) = EYi .
d
j =1
320 6 Multivariate Skewness and Kurtosis
−1/2
Y = Dλ−1 1/2 Y = Dλ Y.
The expected value of Y is E
Y = 0, and Var
Y = Dλ−1 , (
Y is uncorrelated) hence
the statistic b12 (X) can be written as
i = Cum3 e D −1/2 Y
3 = Cum3 Y
EY i i λ
−1/2 ⊗3 ⊗3 −1/2 ⊗3 ⊗ ⊗3 ⊗
= ei Dλ Cum3 −1/2 X = ei Dλ κ Y,3 = ei κ .
Y,3
d
1 ⊗3 −1/2 ⊗3 ⊗ 2
b12 (Y) = ei Dλ γ1 .
d
j =1
⊗3
It may be remarked that ei is a unit axis vector while ⊗3 is a rotation in the
3
Euclidean space Rd . The measure b12 (Y) is the norm square of the projection
of κ ⊗
Y,3 to the subspace of R
d 3 and hence it does not contain all the information
contained in γ ⊗
1.
Example 6.6 (Kollo’s Measure) An alternate skewness vector has been defined by
Kollo as
b (Y) = E 1d 2 (Y ⊗ Y) ⊗ Y,
Each entry of κ ⊗ Y,3 contributes to value b (Y). Nevertheless, we note the fact
that not all third-order mixed moments appear in b (Y) individually, so that some
information can be lost. If one compares skewness vector b (Y) to the one in (6.3),
then one can see the difference between the vectors 1d 2 and vec Id . The result is
that here the 0s of vec Id are changed to 1s and we get the linear combinations of
the corresponding values of γ ⊗1 . When d = 2 and
κ⊗
Y,3 = κY1 ,3 , κ(Y1 ,Y1 ,Y2 ) , κ(Y1 ,Y1 ,Y2 ) , κ(Y1 ,Y2 ,Y2 ) , κ(Y1 ,Y1 ,Y2 ) , κ(Y1 ,Y2 ,Y2 ) , κ(Y1 ,Y2 ,Y2 ) , κY2 ,3
then b (Y) = 0 even for an asymmetric distribution, making it not a valid measure
of symmetry.
2
γ2,d = γ ⊗
X,2
γ⊗ ⊗ −1 ⊗2 ⊗ −1 ⊗2
X,2 = μY,4 − L22 μY,2 = μY,4 − L22 vec Id , (6.6)
γ⊗ ⊗ ⊗2
X,2 = μY,4 − 3vec Id .
The kurtosis of vector variate X, similarly to the skewness, is invariant under shift
and orthogonal transformations; therefore, it is invariant under affine transforma-
tions.
As we have seen above elliptically symmetric distributions are not skewed and
their fourth-order cumulants are not necessarily zero. Therefore, we can get even
closer to Gaussianity if besides skewness the kurtosis is also 0.
Now we list kurtosis measures for some distributions which have been considered
in the previous chapter.
Case 6.10 (Normal) The kurtosis of normal distribution is 0.
Case 6.11 (Skew Normal) Let X be with skew-normal distribution, X ∈
SNd (μ, , α). We will express the kurtosis in terms of the mean μX . Since
322 6 Multivariate Skewness and Kurtosis
√
δ= π/2μX , and we recall
2
VarX = − δδ , and κ ⊗ ⊗4
Y,4 = κ|Z|,4 δ ,
π
therefore, the kurtosis vector is
⊗4
−1/2⊗4 ⊗4 −1/2
γ⊗
X,2 = (2π − 6) − μ μ
X X μX = (2π − 6) − μ μ
X X μX
since
2
4−π
δ ⊗4 = δ4 = μX 4 ,
2
and
π2 π2 8 3 3
κ|Z|,4 = 1− = 2π 1 − = 2π − 6.
4 4 π π π
Since we have
2 ⊗
VarX = − , and κX,4 = κ|Z|,4 ⊗r i⊗
p,4 ,
π
and
2
2 2
κ|Z|,4 = 4 −6 ,
π π
6.2 Multivariate Kurtosis of Random Vectors 323
γ2,d = d (d + 2) κY21 ,4 ,
Case 6.15 (Skew-t) The kurtosis of the distribution Std (μ, , α, p) is based on the
variance (5.56), p. 278 and the fourth-order cumulant in Lemma 5.9.
Case 6.16 (SMSN) Let X have scale mixtures of skew-normal distribution, X ∈
SMSNd (μ, K (η) , α). Then the kurtosis of X is
−1/2⊗4
2
γ⊗
X,2 = μξ,2 − μ2ξ δδ κ⊗
X,4 ,
π
where κ ⊗
X,4 is given by (5.62), p. 286, and the cumulants of V can be found in
Lemma 5.1, and ξ is defined by (5.59).
Case 6.17 (SNC) The kurtosis of multivariate skew-normal-Cauchy distribution
SNC(,a) is given by
2 −1/2 ⊗4
8 2 2 2
γ⊗
X,2 = − μh(|Z|),3 μh(|Z|) + 6 4
μh(|Z|) − μh(|Z|) δδ δ ,
π π π
⊗
see (5.75) for variance and (5.72) for κX,4 .
324 6 Multivariate Skewness and Kurtosis
Case 6.18 (ML) Let us consider random variate X with multivariate Laplace
distribution, X ∈ MLd (θ , ), then
γ⊗
X,2 =
−1/2⊗4
L−1 ⊗2 −1
22 vec + L12 ,21 vec ⊗ θ
⊗2
− 3θ ⊗4 ,
cf. (5.78).
The kurtosis vector γ ⊗
X,2 forms the basis for all multivariate measures of kurtosis
proposed in the literature, as the examples that follow, demonstrate.
Example 6.7 (Mardia) Mardia defined an index of kurtosis as
2
β2,d = E Y Y
⊗
and we express μY,4 by γ ⊗
X,2 , hence
−1 ⊗2
β2,d = vec Id 2 vec Id 2 γ ⊗
X,2 + vec Id 2 L22 vec Id .
Finally, we observe that the constant term does not depend on the distribution. If
Y is standard Gaussian vector for which the fourth cumulant is zero and β2,d =
E (Y Y)2 = d (d + 2), so that as a side result we have
−1 ⊗2
vec Id 2 L22 vec Id = d (d + 2)
and we obtain
β2,d = vec Id 2 γ ⊗
X,2 + d (d + 2) . (6.7)
A consequence of this is that Mardia’s measure does not depend on all the entries
of γ ⊗
X,2 , which has ŋd,4 = d(d + 1)(d + 2)(d + 3)/24 distinct elements, (see (1.30)
15), while β2,d includes only d 2 elements among them. We note if X is Gaussian,
then γ ⊗
X,2 = 0.
Example 6.8 (Koziol) Koziol considered the following index of kurtosis. Let
Y be
an independent copy of Y, then
4 2
Y Y = E
E Y⊗4 Y⊗4 = EY⊗4 .
6.2 Multivariate Kurtosis of Random Vectors 325
2
= μ⊗
2
Therefore EY⊗4 Y,4 can be considered as the next higher degree
analogue of Mardia’s skewness index β1,d . Specifically
2
μ⊗
Y,4 = γ2,d + 6β2,d − 3d (d + 2) (6.8)
⊗
γ ⊗ ⊗2
X,2 vec Id = vec Id 2 γ X,2 , where β2,d is Mardia’s index of kurtosis. Indeed, if
we express the moment μ⊗Y,4 in terms of cumulants, then we obtain
2 2
μ⊗
Y,4 = γ⊗ −1 ⊗2
X,2 + L22 vec Id = γ2,d + 2γ ⊗ −1 ⊗2
X,2 L22 vec Id + 3d (d + 2) ,
where we have observed L−1 ⊗2
22 vec Id L−1 ⊗2
22 vec Id = 3d (d + 2), (see Exer-
cise 6.2, p. 348). Now γ ⊗ ⊗ −1 ⊗
2 is 4-symmetric; therefore, γ X,2 L22 = 3γ 2X, , (see
Exercise 6.3, p. 348) hence
2
μ⊗
Y,4 = γ2,d + 6γ ⊗ ⊗2
X,2 vec Id + 3d (d + 2) .
Then
vec B (Y) = Id 2 ⊗ vec Id μ⊗
Y,4 − (d + 2) vec Id ,
since expressing μ⊗
Y,4 from (6.6) we have
vec B (Y) = Id 2 ⊗ vec Id γ ⊗X,2 + L−1
22 vec⊗2
Id − (d + 2) vec Id
−1 ⊗2
= Id 2 ⊗ vec Id γ ⊗
X,2 + Id 2 ⊗ vec Id L22 vec Id − (d + 2) vec Id
= Id 2 ⊗ vec Id γ ⊗
X,2 ,
(see Exercise 6.5, p. 348). As in the case of MSzR skewness measure, this measure
does not take into account the contribution of cumulants of the type E (Yr Ys Yt Yu )
326 6 Multivariate Skewness and Kurtosis
2 2 2
since we have u⊗4 κ ⊗ Y,4 = u⊗4 2 κ ⊗
Y,4 cos u⊗4 , κ ⊗
Y,4 =
2
κ⊗
Y,4 cos2 u⊗4 , κ ⊗ ⊗
Y,4 . If κ Y,4 is rank 1 tensor in symmetric space Sd,4 then
equality occurs in (6.10), similarly to the case of skewness.
Remark
6.2 It must be noted that the idea used in Eq. (6.4), namely the integral of
u u⊗4 κ ⊗ Y,4 over the unit sphere, will not work, since it is easy to see that this
results in a zero vector. So the extension to vector valued case is not possible.
Example 6.10 (Kollo) Kollo introduces the kurtosis matrix
⎛ ⎞2
d
d d
B (Y) = EYi Yj YY = E Yi Yj YY = E ⎝ Yi ⎠ YY
i,j =1 i,j =1 j =1
= E 1d 2 (Y ⊗ Y) YY .
The vector corresponding to this B (Y) can be expressed in terms of the kurtosis
vector γ ⊗
X,2 as
⎛ ⎞2
d
vec B (Y) = E ⎝ Yi ⎠ (Y ⊗ Y) = E 1d 2 (Y ⊗ Y) vec YY
j =1
= EY⊗2 1d 2 (Y ⊗ Y) = Id 2 ⊗ 1d 2 EY⊗4
= Id 2 ⊗ 1d 2 γ ⊗ −1 ⊗2
X,2 + L22 vec Id .
Example 6.11 (Srivastava) Using the notations of Example 6.5, the average of the
fourth moments of the centered and scaled variable
Y,
1 4
d
b2 (X) = EYi ,
d
j =1
−1/2
Y = Dλ Y.
Again we see the average of the fourth moments of the standardized variable
Y. We
have
−1/2 ⊗4 ⊗
i = e D
Cum4 Y κ Y,4 ,
i λ
and
2
i4 = Cum4 Y
EY i + 3Var Y
i = Cum4 Y
i + 3
hence
1 ⊗4 ⊗4 ⊗
d
1 ⊗4
−1/2
b2 (Y) = ei κ Y,4 + 3 = i⊗
d,4 Dλ γ⊗
X,2 + 3;
d d
j =1
The skewness γ ⊗ ⊗ 3 4
1 and kurtosis γ X,2 vectors contain d and d elements, respec-
tively, not all of which are distinct. Just as the covariance matrix of a d-dimensional
vector contains only ŋd,2 = d(d + 1)/2 distinct elements, we also have that
γ⊗ ⊗
X,1 contains ŋd,3 = d(d + 1)(d + 2)/6 distinct entries, while γ X,2 contains
ŋd,4 = d(d + 1)(d + 2)(d + 3)/24 distinct entries at most.
Similar to the fact that there are many applications, and measures, which
consider only the distinct elements of the covariance matrix, it is quite sensible
and reasonable to follow this approach and define skewness and kurtosis measures
based on just the distinct elements of the corresponding cumulant vectors. One
can use the elimination matrix Q+ d,q (see Sect. 1.3.2, p. 16) for separating distinct
328 6 Multivariate Skewness and Kurtosis
γ1Ð = ||γ ⊗ + ⊗ 2
X,1,Ð || = ||Qd,3 γ 1 ||
2
and
γ2Ð = ||γ ⊗ 2 + ⊗ 2
X,2,Ð || = ||Qd,4 γ X,2 || .
where
1
n
3
μ = X, 3=
(Xi − 3 μ) ,
μ) (Xi − 3
n−1
i=1
EH3 (Y) = κ ⊗
Y,3 , EH4 (Y) = κ ⊗
Y,4 ,
First let us start by assuming that the parameters μ and are known.
Case 6.19 (Skewness: μ and are Known) The skewness vector has been defined
by γ ⊗ ⊗ ⊗
X,1 = κ Y,3 , i.e. γ X,1 is the third-order cumulant of Y, which is the third-order
central moment of Y. If the parameters μ and are known, then we obtain the
following estimator for skewness:
γ⊗ ⊗3
X,1 = Yi ,
The calculation of variance (6.12) is either based on formula (3.45), p. 142, or the
linearization of powers by Hermite polynomials, (see Proposition 4.6, p. 216) and
formula (4.79), p. 231. Now we use the method of Hermite polynomials and obtain
the estimator
γ⊗ ⊗3 −1
X,1,H = H3 (Yi ) = Yi − L11 ,12 Yi ⊗ vec Id .
This estimator is also unbiased and asymptotically normal with variance vector
vec Cγ 1 , which is
We have seen EH3 (Y)⊗2 by (4.80), p. 231. Now we apply it to the case when
κ⊗ ⊗
Y,1 = 0 and κ Y,2 = vec Id , hence we have
vec Cγ 1 = κ ⊗ −1 ⊗2 −1 ⊗ −1 ⊗3 ⊗2
Y,6 + L23 κ Y,3 + L2,H 4 vec Id ⊗ κ Y,4 + Mm3 vec Id − κ Y,3 ,
(6.13)
(see Sect. A.2, p. 353 for commutators). The left-hand side of Eq. (6.13)
is symmetric with respect to the symmetrizer Sd 3 12 S⊗2
d13 . Nevertheless we cannot
simplify the formula using the symmetrizer Sd 3 12 S⊗2 d13 for both sides, since for
instance, Sd 3 12 Sd13 L2,H 4 vec Id ⊗ κ Y,4 = 9Sd 3 12 Sd13 vec Id ⊗ κ ⊗
⊗2 −1 ⊗ ⊗2
Y,4 . If we
apply the symmetrizer Sd16 to both sides we obtain
Sd16 vec Cγ 1 = κ ⊗
Y,6 + 9κ ⊗2
Y,3 + 9 vec Id ⊗ κ ⊗ ⊗3
Y,4 + 6vec Id ,
which can be used directly if d = 1, otherwise it gives information about the number
of terms included in the equation.
In the case when μ and are estimated, we use standardized Y 3k variates with
estimated parameters; see (6.11). In the sequel we shall use Hermite polynomials
at the observed standardized sample 3 Yk ; therefore, we are interested in their
computational aspect.
We consider the Hermite polynomials Hj of a normal standard vector variate Z
and replace Z with 3
Yk , then we take the mean, denoting the result by Hj 3Yk . Now
H1 (Z) = Z; therefore, we have
H1 3 3−1/2 Xj − X =
Yj = 3−1/2 Xj − X = 0.
6.4 Testing Multivariate Skewness 331
H2 (Z ) = Z⊗2 − vecId ,
therefore,
−1/2 ⊗2
H2 3 3
Yj = Xk − X − vec Id = 0
since
⊗2 ⊗2 ⊗2 ⊗2
3−1/2 Xk − X = 3−1/2 Xk − X = 3−1/2 3
vec
= vec 3−1/2
33−1/2 = vec Id . (6.14)
Here we list the results for the Hermite polynomials of order 3 : 6. The arguments
are given in Sect. 6.7.1, Appendix. If we apply the Gram–Charlier approximation
to the distribution of Y, and use the formula (4.79), p. 231, then we obtain the
following estimators of cumulants, i.e. each estimated Hermite polynomial provides
an estimate of the corresponding cumulants as follows:
κ⊗
3 3 3⊗3
Y,3 = H3 Yj = Y , (6.15)
κ⊗
3 3 3⊗4 − 3vec⊗2 Id ,
Y,4 = H4 Yj = Y (6.16)
κ⊗
3 3 3⊗5 − 10vec Id ⊗ 3
Y⊗3 ,
Y,5 = H5 Yj = Y (6.17)
κ⊗
3 −1 ⊗2
Y,6 + L23 3 Yj = 3
κ Y,3 = H6 3 Y⊗6 − 15vec Id ⊗ 3
Y⊗4 + 30vec⊗3 Id . (6.18)
We can obtain the left side of the above symmetrized equations using symmetrizer.
We note that Sd16 L−1 κ ⊗2
23 3 Y,3 = 103κ ⊗2
Y,3 in the last equation; therefore,
⊗6
⊗2
κ⊗
3 = 3
Y − 15vec Id ⊗ 3
Y ⊗4 − 10 3
Y ⊗3 + 30vec⊗3 Id . (6.19)
Y,6
γ⊗
3 3⊗3 3
X,1 = Yi = H3 Yi . (6.20)
332 6 Multivariate Skewness and Kurtosis
Now we have
⊗3 ⊗3
3
Y⊗3 3−1/2 1/2 −1/2 Xi − μ − Xi − μ
i = ,
√ ⊗3 √ √ √ ⊗2 √ ⊗3
n Yi − Y = nY⊗3 ⊗2
i − 3Yi ⊗ n Y + 3 n Yi ⊗ Y − nY .
√ ⊗3 D √
√
n Yi − Y ∼ nY⊗3
i − 3vec Id ⊗ Yi = n H3 (Yi ),
√ ⊗ √ ⊗3 D √
γ X,1 = n Yi − Y
n3 ∼ n H3 (Yi ). (6.21)
√
Moreover n H3 (Yi ) is asymptotically normal with expected value γ ⊗
X,1 and
variance (6.13).
Remark 6.3 We estimate skewness 3 γ⊗ 3⊗3
X,1 by Y , nevertheless
we estimate the
variance Cγ 1 of skewness by the sample variance of H3 3 Yj . The reason is that
variance Cγ 1 corresponds to the variance of H3 (Y). Observe that the sample
variance of H3 3 Yj is different from that of 3 Y⊗3
j , although the sample means of
them are the same. In practice we calculate both 3
Y⊗3 and H3 3 Yj , then we use 3 Y⊗3
j j
to estimate skewness and H3 Y 3j to estimate the variance of 3 γ⊗ , respectively.
X,1
We note that variance of 3 Y⊗3
j follows formula (6.13). Results in Sect. 6.6 provide
numerical verification of this by Monte Carlo methods.
6.4 Testing Multivariate Skewness 333
Now the construction of 3 γ⊗X,1 shows thatit is 3-symmetric, i.e. the number of
⊗
the distinct entries of 3
γ is ŋd,3 = d+2 3 , (see Sect. 1.3.2, p. 16). Therefore,
√X,1 ⊗
although the statistics n3 γ X,1 satisfy asymptotic normality, it will not have full
rank but a rank of ŋd,3 at most. There are several methods for finding a linear
transformation which provides a standardized 3 γ⊗
X,1 . For instance one can use
either eigen-decomposition, singular-value decomposition, or Moore–Penrose (MP)
pseudoinverse of Cγ 1 . We will use the MP inverse of Cγ 1 and denote C+ γ 1 for
+1/2 −1/2
simplicity. In this way Cγ 1 corresponds Cγ 1 if Cγ 1 would have full rank.
The test statistics will be naturally based on the sum of squares of entries of
√ +1/2 ⊗
nCγ 1 3 γ X,1 ,
√ +1/2 ⊗ 2 ⊗
nCγ 1 3
γ X,1 γ X,1 C+
= n3 γ⊗
γ 13X,1 ,
γ⊗ +
γ⊗
X,1,Ð = Qd,33X,1 .
γ⊗
Then variance of 3 γ⊗
X,1,Ð , i.e. the variance of distinct entries of estimator 3X,1 is
calculated by
+
Cγ 1 ,Ð = Q+
d,3 Cγ 1 Qd,3 ,
334 6 Multivariate Skewness and Kurtosis
√ −1/2 ⊗ 2
nCγ ,Ð γ X,1,Ð = nγ ⊗ −1 ⊗
X,1,Ð Cγ ,Ð γ X,1,Ð
1 1
vec Cγ 1 = M−1 ⊗3
m3 vec Id = 6vec Sd13 ,
d+2
i.e. Cγ 1 = 6Sd13 . The MP inverse of Cγ 1 is 1/6Sd13 with rank ŋd,3 = 3 ; see
(4.68), p. 221. Hence we obtain Mardia’s statistics
⊗ n ⊗ 2
γ X,1 C+
n3 γ⊗
γ 13X,1 = 3
γ ,
6 X,1
for checking normality based on skewness. Let us recall that 3γ⊗X,1 is calculated by
the formula (6.20).
In fact, the covariance vector Cγ 1 is the third-order cumulant of H3 (Y) where
Y ∈ N (0, Id ). The entries of vec Cγ 1 can be organized in three groups, the
6.4 Testing Multivariate Skewness 335
3 (Yk ), k = 1 : d, then VarH2 Yj H1 (Yk ), j = k, and finally
first VarH
VarH1 Yj Hk (Y2 ) H1 (Ym ), where j, k, and m are distinct. The following table
shows the values and the numbers of the entries in Cγ 1 :
We show that
C3b = 2 (d + 2) Id .
If j = i, then e⊗2 ⊗3
j ⊗ ei = ej which is invariant under Sd13 ; therefore, the result of
# ⊗2
the product with ek ⊗ Id is 1. If j = i then there are two permutations (among
6) which do not change e⊗2
j ⊗ ei . This latter one changes the order of the first two
components hence the result is 2/6. This latter one occurs d −1 times. Summarizing
this we obtain
1 1
vec Id ⊗ Id Sd13 ((vec Id ) ⊗ Id ) = (2 (d − 1) + 6) Id = 2 (d + 2) Id ,
6 6
336 6 Multivariate Skewness and Kurtosis
freedom d.
The same result can be reached if we consider the form (6.3) and we use Hermite
polynomials with Gaussian entries, such that
⎡ ⎤
vec Id ⊗ Id H3 (Y) = ⎣ H3 Yj , Yj , Yk ⎦
j k
Let us consider the hypothesis H0 : X ∈ Ed (μ, , g), with our earlier notation
X = μ + 1/2 W,
1
Cum2m √ κm L−1
W = m2 vec
⊗m
Id
2ν1
= κ3 L−1
32 +
κ 2 L−1
2,H 4 Id 2 ⊗ L−1
22 vec⊗3 Id + 6vec Sd13
κ3 Sd16 vec⊗3 Id + 3!!
= 5!! κ2 L−1 ⊗3
2,H 4 Id 2 ⊗ Sd14 vec Id + 3!vec Sd13 .
γ⊗
results in the asymptotic variance of the sum of entries of 3X,1 . Hence we can use
#d 3 ⊗
the asymptotic normality of 1 3 γ X,1 for testing.
In case of MRSz index let us denote the variance matrix by Cγ 1 ,M , which is
⊗2
vec Cγ 1 ,M = vec Id ⊗ Id vec Cγ 1 ,
therefore
Cγ 1 ,M = d (d + 2) ((d + 4)
κ6 + (d + 8)
κ4 + 2) Id ,
n 3
2
b /d (d + 2) ((d + 4)
κ6 + (d + 8)
κ4 + 2) (6.23)
κ⊗ ⊗ −1 ⊗2
Y,4 = μY,4 − L22 vec Id ,
(see (3.36)), hence the method of moments gives the following estimator:
γ⊗ ⊗4 −1 ⊗2
X,2 = Yi − L22 vec Id .
+204vec⊗2 Id ⊗ κ ⊗
Y,4
+280vec Id ⊗ κ ⊗2 ⊗4
Y,3 + 96vec Id .
Cum2 (H4 (Y)) = κ ⊗
Y,8 + L −1
13 ,15 κ ⊗
Y,3 ⊗ κ ⊗ −1 ⊗2
Y,5 + L24 κ Y,4
+L−12,H 6 vec Id ⊗ κ ⊗
Y,6 + L −1 ⊗2
κ
23 Y,3 (6.25)
+L−1 ⊗2 ⊗ −1 ⊗4
2,2,H 4 vec Id ⊗ κ Y,4 + Mm4 vec Id − κ Y,4 ,
⊗2
(see 4.81, p. 231 for cumulants, and Sect. A.2, p. A.2 for commutators). The 8-
symmetric version of Cum2 (H4 (Y)) is
Sd18 Cum2 (H4 (Y)) = κ ⊗ ⊗ ⊗2 ⊗ ⊗
Y,8 + 16κ Y,6 ⊗ vec Id + 56κ Y,5 ⊗ κ Y,3
+ 34κ ⊗2 ⊗ ⊗2
Y,4 + 36κ Y,4 ⊗ vec Id
+ 160κ ⊗2 ⊗4
Y,3 ⊗ vecId + 24vec Id .
Note again that both sides of (6.25) is symmetric with respect to Sd 4 12 S⊗2
d14 , but Sd18 ;
therefore, symmetric version is only useful for scalar-valued cases.
In the case when parameters μ and are not known then we use estimated
parameters and sample 3 γ⊗
Yj . We have seen that estimator 3X,2 is simplified
Yj = 3
H4 3 Y⊗4 − 3vec⊗2 Id .
√ ⊗4 D √ √
n Yi − Y ∼ n H4 (Yi ) − L−1
13 ,11 κ ⊗
Y,3 ⊗ n H 1 (Y i ) ,
+L−1⊗2
13 ,11 κ ⊗2
Y,3 ⊗ vec Id .
340 6 Multivariate Skewness and Kurtosis
√ √ √ ⊗2
= n H4 (Yi ) − 4H3 (Yi ) ⊗ n Y + 6 nY⊗2
i ⊗Y
√ ⊗3 √ ⊗4 ⊗2
−4 n Yi ⊗ Y + n Y − 6vec Id ⊗ Y
D √ √
∼ n H4 (Yi ) − 4κ ⊗
Y,3 ⊗ n Y.
⊗k
We have stochastic convergences Y = op (1), k ≥ 1 and H3 (Yi ) → EH3 (Y) =
√
κ⊗
D
(see Lemma 4.4, p. 231), and n Yi ∼ Z. We proceed with using Slutsky’s
Y,3 ,
argument, when n → ∞, and obtain
√ ⊗4 D√ √
n 3
Y − 3vec⊗2 Id ∼ n H4 (Yi ) − L−1 ⊗
13 ,11 κ Y,3 ⊗ n Y,
with variance
vec Cγ 2 = Cum2 H4 (Y) − 4κ ⊗ ⊗2
Y,3 ⊗ H1 (Y) = Cum2 (H4 (Y))+16κ Y,3 ⊗vec Id ,
where Cum2 (H4 (Y)) has been given by (6.25). The 8-symmetric version is
vec Cγ 2 = κ ⊗ ⊗ ⊗2 ⊗ ⊗
Y,8 + 16κ Y,6 ⊗ κ Y,2 + 56κ Y,5 ⊗ κ Y,3
+33κ ⊗2
Y,4 + 36 κ ⊗
Y,4 ⊗ κ ⊗2
Y,2 (6.27)
+160κ ⊗2 ⊗ ⊗4 ⊗2
Y,3 ⊗ κ Y,2 + 24κ Y,2 + 16κ Y,3 ⊗ vec Id .
Case 6.22 (Kurtosis: μ and are unknown) Let us assume the necessary moments
exist, then the estimator
⊗4
γ⊗
3 κ⊗ 3 3⊗4 − L−1 vec⊗2 Id = 3
Y − 3vec⊗2 Id ,
X,2 = 3Y,4 = H4 Yj = Y 22 (6.28)
Yj = 3
H4 3 Y⊗4 −1 3⊗2 + L−1 vec⊗2 Id ,
j − L12 ,21 vec Id ⊗ Yj 22
+ L−1⊗2 ⊗2 −1 ⊗4
13 ,11 κ Y,3 ⊗ vec Id + Mm4 vec Id .
√ +1/2 ⊗ 2 ⊗
nCγ 2 3
γ X,2 γ X,1 C+
= n3 γ⊗
γ 23X,2 ,
⊗
which follows χ 2 distribution asymptotically. One cannot use n3 γ X,1 C+ γ⊗
γ 23
⊗ d+3
X,2
directly since 3
γ X,4 is 4-symmetric, i.e. the number of distinct entries is ŋd,4 = 4 ,
(see Sect. 1.30, p. 15). Therefore the degree of the distribution χ 2 is ŋd,4 at most.
vec Cγ 2 = M−1 ⊗4
m4 vec Id = 4!vecSd14 ,
i.e. variance matrix Cγ 2 = 4!Sd14 and the MP inverse of Cγ 2 is Sd14 /4! (see (4.68),
p. 221), and the rank of Sd14 is ŋd,4 . Hence we obtain statistics
⊗ n 2
γ X,2 C+
n3 γ⊗
γ 23X,1 = γ⊗
3
24 X,2
342 6 Multivariate Skewness and Kurtosis
γ⊗
for checking normality based on total kurtosis. Let us recall that 3X,2 is calculated
by the formula (6.28).
Mardia’s measure of kurtosis is equivalent to
2,d = E [Y]⊗4 vec Id 2 − d (d + 2) = vec Id 2 γ ⊗ ,
β X,2
2,d = 0, and
therefore we can use the estimator of kurtosis vector for checking H0 : β
obtain estimator
⊗
32,d = vec Id 2 3
β γ X,2 ,
σβ32 = 8d (d + 2) .
2,d
Indeed
⊗2 ⊗4 ⊗2
Sd14 vec Id 2 = Sd14 ej ⊗ ek = ej + Sd14 ej ⊗ ek
j,k, j =k j =k
8 ⊗2 ⊗2
= e⊗4
j + e j ⊗ e ⊗2
k + e j ⊗ e ⊗2
k ⊗ e j + e j ⊗ e k ,
4!
j =k j =k
32,d follows
hence the variance of β
⊗
Var vec Id 2 3γ X,2 = 4!vec Id 2 Sd14 vec Id 2 = 4!d + 8 d 2 − d = 8d (d + 2) ,
√ √
We conclude that nβ 2,d / 8d (d + 2) is asymptotically N (0, 1).
Let us consider CMSzR kurtosis matrix B (Y) of Y, cf. (6.9). By our treatment
B (Y) is expressed in terms of κ ⊗
4
vec B (Y) = Id 2 ⊗ vec Id κ ⊗
4
with ŋd,2 = d (d + 1) /2 distinct entry. It follows directly that we can use the
estimator
⊗
vec B 3
Y = Id 2 ⊗ vec Id 3
γ X,2 ,
for kurtosis matrix B (Y). Under the hypothesis of Gaussianity, we have the variance
of this estimator
VarvecB 3
Y = 4! Id 2 ⊗ vec Id Sd14 Id 2 ⊗ (vec Id )
6.6 A Simulation Study 343
which follows from the general result with rank ŋd,2 . One can use this result for
testing Gaussianity as well.
The trace of B (Y), which is used as a measure for kurtosis, corresponds to
Mardia’s index, hence we neglect it.
Distributions with zero skewness are not necessarily normal, for instance they can
be symmetric as well. We have that all odd cumulants are zero for an elliptically
symmetric variate X ∈ Ed (μ, , g); therefore, the variance vector becomes the
following:
Cγ 2 = κ ⊗ −1 ⊗2 −1 ⊗
Y,8 + L24 κ Y,4 + L2,H 6 vec Id ⊗ κ Y,6
+ L−1 ⊗2 ⊗ −1 ⊗4 ⊗2
2,2,H 4 vec Id ⊗ κ Y,4 + Mm4 vec Id − κ Y,4 .
Let us recall that the cumulants of Y are given in Sect. 6.4.2.2. In addition, if we
assume that H0 : κ ⊗
Y,4 = 0 is true; then we obtain
Cγ 2 = κ ⊗ −1 ⊗
Y,8 + L2,H 6 vec Id ⊗ κ Y,6 + Mm4 vec Id
−1 ⊗4
= κ4 L−1
42 + κ3 L−1
2,H 6 Id 2 ⊗ L32
−1
vec⊗4 Id + 4!vec Sd14
= 7!! κ4 Sd18 + 5!!κ3 L−1
2,H 6 dI 2 ⊗ Sd1 6 vec⊗4 Id + 4!vec Sd14 ,
where κj denotes the cumulant parameter of Y, and the commutators are given in
Sect. A.2, p. 353.
γ⊗
In this case the sum of the entries of estimator 3X,1 has asymptotic variance
1d 8 Cγ 2 = 7!!
κ4 + 5!!42
κ3 + 4! d 4 ,
In this section some numerical results will be provided in support of the theoretical
results. Monte Carlo experiments will be performed in order to compare estimations
γ X,1 and 3γ X,1 of skewness. We draw N = 1000 samples and each sample has
n = 1000 observations. We estimate parameters (below) for each sample and then
we take their average.
344 6 Multivariate Skewness and Kurtosis
Let us start with Mardia’s statistics, and use the 1D case for simplicity.
Case 6.23 (Normal Distribution) Let us suppose that Y has normal distribution,
Y ∈ N (0, 1). Mardia suggests the third-order moment Y 33 for the estimation of
skewness, and he uses 6 for the variance of the estimator (see [Mar70], (2.25)).
The proof of this result might be the following. We know that VarH3 (Y ) = 6, and
H3 Y 3 =Y 33 , (see (6.15); therefore, we conclude the variance nVarY33 is close to 6.
Our general formula (6.13) of variances (corresponding to the estimation of
skewness by the Hermite polynomial H3 , and the usage of true parameters) reduces
to 6 for the normal distribution since the cumulants of order greater than 2 are zero.
Formula (3.25) shows the variance for kurtosis, which is 24. Now Y ∈ N (0, 1),
H3 (Y ) = Y 3 − 3Y , and H4 (Y ) = Y 4 − 6Y 2 + 3. We estimate γX,1 and γX,2 by
H3 (Y ) and H4 (Y ), respectively. We can also estimate γX,1 and γX,2 using Y 3 by
3 3 3 3 34 3
H3 Y = Y andH4 Y = Y − 3, respectively, where Y denotes the standardized
variate using estimated parameters.
Estimator Hk (Y ) 3
Estimator Hk Y
Skewness 7 ∗ 10−5 7.7 ∗ 10−5
Variance for skewness 5.9165 5.9207
Kurtosis −0.0002 −0.0013
Variance for Kurtosis 24.3228 24.23
Case 6.24 (Skew Normal, 1D) The scalar-valued (1D) skew-normal variable X ∈
SNd (μ, , α) has been chosen with parameters μ = 0.7136, α = 2 (δ = 0.8944),
and = 1, (we put a correlation “matrix” for this latter one ). From a sample of X
we calculated a sample of standardized X denoted by Y , assuming the parameter μ
and the variance = 0.4907 are known. A “sample” of standardized X denoted by
3 is also calculated by estimated mean and variance.
Y
The variance C3
γ 1 of estimated skewness is given by formula (6.13):
True value Cγ 1 C3
γ1,
Variance 9.9376 9.9317
Case 6.25 (Skew Normal, 3D) We consider X ∈ SN3 (μ, , α), where α =
[2, 10, 1] , and
⎡ ⎤
1 0.6325 0.7071
= ⎣ 0.6325 1 0.8944 ⎦ ,
0.7071 0.8944 1
μ = EX X Y 3
Y
0.58447 0.58338 −0.0010811 0
0.78688 0.78639 −0.00011134 0
0.73505 0.73407 −0.0012536 0
Skewness is estimated by γ ⊗ ⊗3 −1
X,1 = H3 (Yi ) = Yi − L11 ,12 Yi ⊗ vec Id , and
H3 3 Y = 3 Y⊗3 according to true and estimated parameters, respectively, see
Table 6.1 for the result.
The variance Cγ 1 which were calculated by the parameters, which calculated by
formula (6.13), and the variance C3 γ 1 of the estimators 3
Y⊗3 can be found in the
first two columns of the
second Table 6.1. The third column contains the mean of
the variances of H3 3 Y , which is calculated for each sample during the simulation.
This latter looks much better estimate of Cγ 1 than the previous one. We conclude
that in practice when a sample is given then one can estimate the skewness with
3
Y⊗3 and the variance of the estimate with the variance of H3 3 Y . The number of
the entries of variance vector vecCγ 1 is 729, among them there are 55 distinct entries
and we listed 28 only.
346 6 Multivariate Skewness and Kurtosis
Table 6.1 Estimates: skewness and its variance for skew-normal distribution
vec Cγ 1 vec C3 vecCH3 3
γ⊗ γ⊗ γ⊗
γ1 Y
X,1 X,1 3X,1
6.1398 6.1662 6.0466
0.045497 0.044867 0.044106
0.2158 0.0984 0.1947
0.10098 0.10194 0.10104
0.269 0.0281 0.2406
0.060291 0.059591 0.059111
0.1288 −0.2574 0.1223
0.10098 0.10194 0.10104
0.131 0.1926 0.09
0.22411 0.22597 0.22493
0.1606 0.0189 0.1514
0.13381 0.1339 0.13339
0.5006 0.1283 0.4275
0.060291 0.059591 0.059111
0.0782 0.0494 0.0572
0.13381 0.1339 0.13339
0.0959 0.0198 0.0762
0.079895 0.080431 0.08022
1.5771 0.107 1.4524
0.10098 0.10194 0.10104
0.2571 0.0695 0.2389
0.22411 0.22597 0.22493
0.0467 0.0695 0.0371
0.13381 0.1339 0.13339
10.5342 6.1905 10.2052
0.22411 0.22597 0.22493
0.7562 0.1954 0.711
0.49738 0.49651 0.49254
0.1286 0.0912 0.1147
0.29698 0.29765 0.2963
0.0279 −0.2011 0.023
0.13381 0.1339 0.13339
2.0899 0.4283 2.0503
0.29698 0.29765 0.2963
0.3408 0.0942 0.3236
0.17732 0.17804 0.17729
0.0619 0.1312 0.0569
0.060291 0.059591 0.059111
0.8792 0.1684 0.8489
0.13381 0.1339 0.13339
0.1373 0.1535 0.136
0.079895 0.080431 0.08022
0.1118 0.0771 0.1102
0.13381 0.1339 0.13339
0.3048 0.3055 0.2965
0.29698 0.29765 0.2963
0.2481 0.0455 0.2392
0.17732 0.17804 0.17729
0.5507 0.2483 0.5213
0.079895 0.080431 0.08022
0.2473 0.2609 0.2389
0.17732 0.17804 0.17729
0.5489 0.0987 0.5185
0.10587 0.10911 0.10764
6.4592 5.8826 6.3931
6.7 Appendix
Fourth order
H4 (Z) = Z⊗4 − L−1
12 ,21 vec Id ⊗ Z⊗2
+ L−1 ⊗2
22 vec Id ,
Yj = 3
H4 3 Y⊗4 − L−1
12 ,21 vec I d ⊗ 3
Y j
⊗2 + L−1 vec⊗2 I
22 d
=3Y⊗4 − L−1 −1 ⊗2
12 ,21 − L22 vec Id = Y
3⊗4 − L−1 vec⊗2 Id
22
=3
Y⊗4 − 3vec⊗2 Id ,
L−1 −1 −1 −1 −1 −1 −1
12 ,21 − L22 = Id 4 + K(1324) + K(1423) + K(2314) + K(2413) + K(3412)
−Id 4 − K−1 −1
(1324) − +K(1423)
= K−1 −1 −1
(2314) + K(2413) + K(3412) = K(3124) + K(3142) + K(3412) ,
Sixth order
H6 (Z) = Z⊗6 − 15vec Id ⊗ Z⊗4 + 45Z⊗2 ⊗ vec⊗2 Id − 15vec⊗3 Id ,
H6 3Yj = 3Y⊗6 − 15vecId ⊗ 3Y⊗4 + 30vec⊗3 Id .
6.8 Exercises
6.2 Show
2
L−1 ⊗2
22 vec Id = 3d (d + 2) ,
then
⊗
vecB (Y) = Id 2 ⊗ (vec Id ) κ ⊗
4 = (vec Id ) ⊗ Id 2 κ 4 .
then
Moments of folded normal distribution are obtained in Elandt [Ela61]. The mul-
tivariate skew-normal distribution was introduced by Azzalini and Dalla Valle
[ADV96]. Further properties and applications of skew-normal distribution including
canonical fundamental skew-normal can be found in papers [AC99, GHL01, PP01,
AC03, AVGQ04, Azz05, AVG05, Cap12, CC13, Shu16], and [AVFG18] among
others.
We refer the reader to the book by Fang, Kotz, and Ng, [FKN17] for the theory
of spherically symmetric or rotationally symmetric distributions. The distributions
6.9 Bibliographic Notes 349
n−r+1 j
1 xj
Bn,r (x1:n−r+1 ) = n! .
j ! j!
j =r, j =1
j j =n
Particular cases:
n=1
n=5
B1,1 (x1 ) = x1
B5,1 (x1:5 ) = x5
n=2
B5,2 (x1:4 ) = 5x1 x4 + 10x2 x3
B2,1 (x1:2 ) = x2
B5,3 (x1:3 ) = 15x1 x22 + 10x12 x3
B2,2 (x1 ) = x12
B5,4 (x1:2 ) = 10x13 x2
n=3
B5,5 (x1 ) = x15
B3,1 (x1:3 ) = x3
n=6
B3,2 (x1:2 ) = 3x1 x2
B6,1 (x1:6 ) = x6
B3,3 (x1 ) = x13
B6,2 (x1:5 ) = 6x1 x5 + 15x2 x4 + 10x32
n=4
B6,3 (x1:4 ) = 15x12 x4 + 60x1 x2 x3 + 15x23
B4,1 (x1:4 ) = x4
B6,4 (x1:3 ) = 45x12 x22 + 20x13 x3
B4,2 (x1:3 ) = 4x1 x3 + 3x22
B6,5 (x1:2 ) = 15x14 x2
B4,3 (x1:2 ) = 6x12 x2
B6,6 (x1 ) = x16
B4,4 (x1 ) = x14
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 351
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5
352 A Formulae
n
Bn (x1:n ) = Bn,r (x1:n−r+1 ) .
r=1
Bn (x1 , . . . , xn ) (A.1)
(k1 −1)+ (kn−2 −1
n−1 )+ 1 x n−2k1 +k2
1
= n! ...
(n − 2k1 + k2 )! 1!
k1 =0 k2 =(2k1 −n)+ kn−1 =(2kn−2 −kn−3 )+
n kj−1 −2kj +kj+1
1 xj
,
kj −1 − 2kj + kj +1 ! j!
j =1
where k0 = n, and kn = kn+1 = 0, and the limits of summations are valid for
non-negative numbers. Particular cases:
n Bn (x1:n )
1 x1
2 x12 + x2
3 x13 + 3x1 x2 + x3
4 x14 + 6x12 x2 + 4x1 x3 + 3x22 + x4
5 x15 + 10x13 x2 + 15x1 x22 + 10x12 x3 + 5x1 x4 + 10x2 x3 + x5
6 x16 + 15x14 x2 + 20x13 x3 + 45x12 x22 + 15x23 + 60x1 x2 x3 + 15x4 x12 + 10x32 +
15x2 x4 + 6x1 x5 + x6
7 x17 + 21x15 x2 + 35x14 x3 + 105x13 x22 + 35x13 x4 + 210x12 x2 x3 + 105x1 x23
+21x12 x5 + 105x1 x2 x4 + 70x1 x32 + 105x22 x3 + 7x1 x6 + 21x2 x5 + 35x3 x4 + x7
8 x18 + 28x16 x2 + 56x15 x3 + 210x14 x22 + 70x14 x4 + 420x12 x23 + 560x13 x2 x3
+105x24 + 56x13 x5 + 840x1 x22 x3 + 280x12 x32 + 420x12 x2 x4
+280x2 x32 + 210x22 x4 + 280x1 x3 x4 + 168x1 x2 x5 + 28x12 x6
+35x42 + 56x3 x5 + 28x2 x6 + 8x1 x7 + x8
A Formulae 353
One can use the R package “kStatistics” for getting Bell polynomials and Bell
numbers.
n 1 2 3 4 5 6 7 8 9 10
Bn 1 2 5 15 52 203 877 4141 21,147 115,975
A.2 Commutators
L−1 −1 −1
12 ,11 = Id 3 + K(132) + K(231) = Id 3 + K(132) + K(312) , (A.2)
L−1 −1 −1
11 ,12 = Id 3 + K(213) + K(312) = Id 3 + K(213) + K(231) .
Commutator: L−1
22 (2.63), p. 96, (1.61), p. 54,
L−1 −1 −1
22 = Id 4 + K(1324) + K(1423) = Id 4 + K(1324) + K(1342) . (A.3)
Commutator: L−1
13 ,11 (2.62), p. 96,
L−1 −1 −1 −1
13 ,11 = Id 4 + K(1243) + K(1342) + K(2341) = Id 4 + K(1243) + K(1423) + K(4123)
(A.4)
L−1 −1 −1 −1
11 ,13 = Id 4 + K(2134) + K(3124) + K(4123) = Id 4 + K(2134) + K(2314) + K(2341) .
354 A Formulae
4
Commutator: L−1
12 ,21 (2.64), p. 96, number of terms 2,1,1 /2 = 12/2,
L−1 −1 −1 −1 −1 −1
12 ,21 = Id 4 + K(1324) + K(1423) + K(2314) + K(2413) + K(3412) . (A.5)
Commutator: L−1
13 ,12 (3.37), p. 137,
L−1 −1 −1 −1 −1 −1
13 ,12 = Id 5 + K(12435) + K(12534) + K(13425) + K(13524) + K(14523)
+ K−1 −1 −1 −1
(23415) + K(23514) + K(24513) + K(34512) .
p 123|45 124|35 125|34 134|25 135|24 145|23 234|15 235|14 245|13 345|12
K−1
p Id 5 K−1 −1 −1 −1 −1 −1 −1 −1 −1
(12435) K(12534) K(13425) K(13524) K(14523) K(23415) K(23514) K(24513) K(34512)
Commutator: L−1
32
15
L−1
32 = K−1
pj . (A.6)
j =1
where permutation pj is split into three pairs pj (1) |pj (2) |pj (3) , and .N =
6!/3!23 = 15 = 5!!, since n = 6, 2 = 3.
Commutator: L−1
23
10
L−1
23 = K−1
pj .
j =1
pj 123|456 124|356 125|346 126|345 134|256 135|246 136|245 145|236 146|235 156|234
K−1
pj Id 6 K−1 −1 −1 −1 −1 −1 −1 −1 −1
(124356) K(125346) K(126345) K(134256) K(135246) K(136245) K(145236) K(146235) K(156234)
One can get these permutations choosing all possible 3 subsets of the set 1 : 6,
order each subset alphabetically. Then pair and unify them
forming blocks and
the permutations. The number of permutations N = 63 /2 = 10.
Commutator: L−1
24
35
L−1
24 = K−1
pj .
j =1
B1 1234 1235 1236 1237 1238 1245 1246 1247 1248 1256 1257
1258 1267 1268 1278 1345 1346 1347 1348 1356 1357 1358 1367
1368 1378 1456 1457 1458 1467 1468 1478 1567 1568 1578 1678
8
Here 4 = 2, n = 8, r = 2, N = 8!/2! (4!)2 = 4 /2 = 35.
Commutator: L−1
15 ,13
56
L−1
15 ,13 = K−1
pj .
j =1
Now, we have n = 8, 3 = 1, 5 = 1, r = 2, N = 83 = 8!/3!5! = 56. We
choose subsets with three elements by all possible ways and complete them by
the complementary (5 digit) according to 1 : 8. We do not divide 85 because
each pair is different. We construct permutations taking 5 digits, for instance if
we have B2 = (1, 3, 6), then the complementary subset is (2, 4, 5, 7, 8) and then
permutation p = (24,578,136). We list 56 blocks with 3 entries:
B2 123 124 125 126 127 128 134 135 136 137 138 145 146 147 148 156 157 158
167 168 178 234 235 236 237 238 245 246 247 248 256 257 258 267 268 278 345
346 347 348 356 357 358 367 368 378 456 457 458 467 468 478 567 568 578 678
356 A Formulae
Permutations (see (1.1), p. 4) for mixing two T -products of vectors with an equal
number of terms, say n implies commutator
M−1
mn = K−1
mn (p) d k1:n , d j1:n , (A.7)
p∈Pn
−1 −1 −1 −1
M−1
m3 = K(142536) + K(142635) + K(152436) + K(152634)
+K−1 −1
(162435) + K(162534) (A.8)
= K(135246) + K(135264) + K(135426) + K(135624) + K(135462) + K(135642)
vecId 3 = K−1 −1 ⊗3 ⊗3
(142536)K(142536)vecId 3 = K(142536)vec Id = K(135246)vec Id .
A.2.2.2 H-Commutators
Examples
Commutator: L−1
2,H 2 Coefficient of κ ⊗
X,2 ⊗ H2 (X) :
L−1 −1 −1 −1 −1
2,H 2 = K(1324) + K(1423) + K(1324) + K(2413). (A.10)
Commutator: L−1
2,2,H 2 Coefficient of κ ⊗2
X,2 ⊗ H2 (X) :
L−1 −1
2,2,H 2 = L6,m2 = L−1
(m2 ((1:3)\j,(4:6)\k),j,k) (A.11)
j =1:3,k=4:6
= K−1 −1 −1 −1
(253614) + K(263514) + K(243615) + K(263415)
+K−1 −1 −1 −1
(243516) + K(253416) + K(153624) + K(163524)
+K−1 −1 −1 −1
(143625) + K(163425) + K(143526) + K(162534)
+K−1 −1 −1 −1
(152634) + K(162534) + K(142635) + K(162435)
+K−1 −1
(142536) + K(152436),
where for instance m2 ((1 : 3) \j, (4 : 6) \k) |j =1,k=5 = {(2436) , (2634)} and
L−1 −1 −1
(m2 ((1:3)\j,(4:6)\k),j,k) |j =1,k=5 = K(243615) + K(263415).
Commutator: L−1
2,H 4 Coefficient of κ ⊗
X,2 ⊗ H4 (X) :
L−1
2,H 4 = K−1
(j,k,(1:6)\(j,k)) (A.12)
j =1:3,k=4:6
= K−1 −1 −1 −1 −1
(142356) + K(152346) + K(162345) + K(241356) + K(251346)
+ K−1 −1 −1 −1
(261345) + K(341256) + K(351246) + K(361245)
We also have
L−1 ⊗2
2,H 4 = L11 ,12 ⊗ Id 3 Id 3 ⊗ L11 ,12 K(134256) = L11 ,12 K(134256).
Commutators: L−1 −1 −1
2,H 6 , L2,2,H 4 , L2,2,2,H 2 . We have K−1
(15234678) = K(13452678) and
358 A Formulae
L−1
2,H 6 = K−1 ⊗2
(j,k,(1:8)\(j,k)) = L11 ,13 K(13452678), (A.13)
j =1:4,k=5:8
L−1
2,2,H 4 = L−1
(m2 ((1:4)\(j,k),(5:8)\(s,t )),j,k,s,t ) , (A.14)
j,k=1:4,j =k
s,t =5:8,s=t
L−1
2,2,2,H 2 = L−1
(m3 ((1:4)\j,(5:8)\k),j,k) , (A.15)
j =1:4,k=5:8
44
and L−1 −1
2,2,H 4 has 2 2 2 = 72 terms and L2,2,2,H 2 has 4 ∗ 4 ∗ 3! = 96 terms.
An instance m3 ((1 : 4) \j, (5 : 8) \k) |j =1,k=6 = {(253748) , (253847) ,
(273548) , (273845) , (283547) , (283745)} and the corresponding commutator
is
L−1 −1 −1 −1
(m3 ((1:4)\j,(5:8)\k),j,k) |j =1,k=6 = K(25374816) + K(25384716) + K(27354816)
+K−1 −1 −1
(27384516) + K(28354716) + K(28374516).
L−1
2,2,H 4 includes terms like:
K−1 −1
(37481256) + K(37481256) = K(56137824) + K(56137842),
K−1 −1
(27481356) + K(27481356) = K(51637825) + K(56127842).
J-Commutators
are connected to T -Hermite polynomials that are included in the recurrence
relation 4.2, p. 208. Define the J-commutator by
n−1
Jn = K−1
(n,j,[1:(n−1)]\j ) (d1:n )
j =1
J4 = K−1 −1 −1
(4123) + K(4213) + K(4312) .
n−1
Jn = K−1
((j :n−1)S )n
j =1
A Formulae 359
and we have
J4 = K−1 −1
(2314) + K(1324) + Id 4 ,
J5 = K−1 −1 −1
(23415) + K(13425) + K(12435) + Id 5 .
∂ 4 h (x)
= f (1) (g) g(1,2,3,4)
∂x1 ∂x2 ∂x3 ∂x4
+f (2) (g) g(1,2,3)g4 + g1 g(2,3,4) + g(1,2,4) g3 + g2 g(1,3,4)
+f (2) (g) g(1,2)g(3,4) + g(1,3) g(2,4) + g(1,4)g(2,3)
+f (3) (g) g1 g2 g(3,4) + g1 g3 g(2,4) + g1 g4 g(2,3)
+f (3) (g) g(1,2)g3 g4 + g(1,3)g2 g4 + g(1,4) g2 g3
+f (4) (g) g1 g2 g3 g4 .
In particular
∂ 4 h (x) (1,3)
= f (1) (g) g(1,2)
∂x1 ∂x23
(1,2) (1) (1) (3) (1,2) (1) (1) (1,2)
+f (2) (g) g(1,2) g2 + g1 g2 + g(1,2) g2 + g2 g(1,2)
(1,1) (2) (1,1) (2) (1,1) (2)
+f (2) (g) g(1,2) g2 + g(1,2) g2 + g(1,2) g2
(1) (1) (2) (1) (1) (2) (1) (1) (2)
+f (3) (g) g1 g2 g2 + g1 g2 g2 + g1 g2 g2
! "
(1,1) (1) 2 (1,1) (1) 2 (1) (1) 2
+f (3) (g) g(1,2) g2 + g(1,2) g2 + g(1,2) g2
(1,0) (0,1) 3
+f (4) (g) g1 g2 ,
(1,3) (1,2) (1) (1) (3) (1,1) (2)
= f (1) (g) g(1,2) + f (2) (g) 3g(1,2) g2 + g1 g2 + 3g(1,2) g2
! "
(1) (1) (2) (1,1) (1) 2 (1) (1) 3
+f (g) 3g1 g2 g2 + 3g(1,2) g2
(3)
+ f (4) (g) g1 g2 ,
360 A Formulae
and
∂ 4 h (x) (2,2)
= f (1) (g) g(1,2)
∂x12 ∂x22
(2,1) (1) (1) (1,2) (2,1) (1) (1) (1,2)
+f (2) (g) g(1,2) g2 + g1 g(1,2) + g(1,2) g2 + g1 g(1,2)
! "
(2) (2) (1,1) 2 (1,1) 2
+f (2) (g) g1 g2 + g(1,2) + g(1,2)
! "
(1) 2 (2) (1) (1) (1,1) (1) (1) (1,1)
+f (3) (g) g1 g2 + g1 g2 g(1,2) + +g1 g2 g(1,2)
! "
(2) (1) 2 (1,1) (1) (1) (1,1) (1) (1)
+f (3)
(g) g1 g2 + g(1,2) g1 g2 + g(1,2) g1 g2
(1) 2 (1) 2
+f (4) (g) g1 g2
(2,2) (2,1) (1) (1) (1,2)
= f (1) (g) g(1,2) + f (2) (g) 2g(1,2) g2 + 2g1 g(1,2)
"
(2) (2) (1,1) 2 (1,1) 2
+g1 g2 + g(1,2) + g(1,2)
! "
(2) (1) 2 (1,1) (1) (1) (1) 2 (1) 2
+f (3) (g) g1 g2 + 2g(1,2) g1 g2 + f (4) (g) g1 g2 .
Case A.2
Dx⊗5 f (g (x)) = f (1) (g (x)) Dx⊗5 g (x)
A Formulae 361
+f (2) (g (x)) Sd15 5Dx⊗4 g (x) ⊗ Dx⊗ g (x) + 10Dx⊗3 g (x) ⊗ Dx⊗2 g (x)
⊗2
⊗2 ⊗ ⊗3
⊗ ⊗2
+f (g (x)) Sd15 15 Dx g (x)
(3)
⊗ Dx g (x) + Dx g (x) ⊗ Dx g (x)
⊗3 ⊗5
+f (4) (g (x)) Sd15 Dx⊗2 g (x) ⊗ Dx⊗ g (x) + f (5) (g (x)) Dx⊗ g (x) .
μ1 = κ 1 ,
In particular
2
μX,4 = κX,4 + 4κX,3 κX + 3κX,3 + 6κX,2 κX2 + κX4 .
μ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1:3 = κ 1:3 + κ 1 ⊗ κ 2,3 + K(3,2)S (d1:3 ) κ 1,3 ⊗ κ 2 + κ 1,2 ⊗ κ 3 + κ 1 ⊗ κ 2 ⊗ κ 3 .
362 A Formulae
In particular, if Xj = X then
μ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = κ X,3 + K(3,2)S κ X,2 ⊗ κ X,1 + κ X,1 ⊗ κ X,2 + K(231) κ X,1 ⊗ κ X,2 + κ X,1
= κ⊗ −1 ⊗ ⊗ ⊗3 ⊗ ⊗ ⊗ ⊗3
X,3 + L12 ,11 κ X,2 ⊗ κ X,1 + κ X,1 = κ X,3 + 3κ X,2 ⊗ κ X,1 + κ X,1 .
+K−1 (1342) κ ⊗
1,3,4 ⊗ κ ⊗ −1 ⊗
2 + K(2341) κ 2:4 ⊗ κ 1
⊗
+κ ⊗ 1,2 ⊗ κ ⊗
3,4 + K −1
(1324) κ ⊗
1,3 ⊗ κ ⊗
2,4
+K−1 ⊗ ⊗ ⊗
(1423) κ 1,4 ⊗ κ 2,3 + κ 1,2 ⊗ κ 3 ⊗ κ 4
⊗ ⊗
+K−1 ⊗ ⊗ ⊗ −1 ⊗
(1324) κ 1,3 ⊗ κ 2 ⊗ κ 4 + K(1423) κ 1,4 ⊗ κ 2 ⊗ κ 3
⊗ ⊗
+K−1 (2314) κ ⊗
2,3 ⊗ κ ⊗
1 ⊗ κ ⊗
4
+K−1 (2413) κ ⊗
2,4 ⊗ κ ⊗
1 ⊗ κ ⊗
3 + K −1
(3412) κ ⊗
3,4 ⊗ κ ⊗
1 ⊗ κ ⊗
2
+κ ⊗ ⊗ ⊗ ⊗
1 ⊗ κ2 ⊗ κ3 ⊗ κ4 .
Assume Cum1 Xj = EXj = 0,
μ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
1:4 = κ 1:4 + κ 1,2 ⊗ κ 3,4 + K(3,2)S (d1:4 ) κ 1,3 ⊗ κ 2,4 + K(4:2)S (d1:4 ) κ 1,4 ⊗ κ 2,3 .
In particular, if Xj = X, then
μ⊗
X,4 = κ ⊗
X,4 + L−1
13 ,11 κ ⊗
X,3 ⊗ κ ⊗
X,1 + L−1 ⊗2
κ
22 X,2 + L −1
12 ,12 κ ⊗
X,2 ⊗ κ ⊗2 ⊗4
X,1 + κ X,1
= κ⊗ X,4 + 4 κ ⊗
X,3 ⊗ κ ⊗ ⊗2 ⊗ ⊗2 ⊗4
X,1 + 3κ X,2 + 6κ X,2 ⊗ κ X,1 + κ X,1 .
μ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗2 ⊗
X,6 = κ X,6 + 21κ X,2 κ 12 ⊗ κ X,5 + 35κ X,3 ⊗ κ X,4 + 105κ X,2 ⊗ κ X,3
μ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗2
X,8 = κ X,8 + 28κ X,2 ⊗ κ X,6 + 56κ X,3 ⊗ κ X,5 + 35κ X,4
+280κ ⊗ ⊗2 ⊗2 ⊗ ⊗4
X,2 ⊗ κ X,3 + 210κ X,2 ⊗ κ X,4 + 105κ X,2 ,
H0 =
1,
H1 (X1 ) =
X1 ,
H2 (X1:2 ) =
X1 X2 − σ12 ,
H3 (X1:3 ) X1 X2 X3 − σ12 X3 − σ13 X2 − σ23 X1 ,
=
H4 (X1:4 ) X1 X2 X3 X4 − σ12 X3 X4 − σ13 X2 X4 − σ14 X2 X3
=
−σ23 X1 X4 − σ24 X1 X3 − σ34 X1 X2 + σ12 σ34 + σ13 σ24 + σ14 σ23 ,
#
H5 (X1:5 ) = X5 H4 (X1:4 ) − 4j =1 σj,5 H3 X(1:5)\j .
These formulae are valid for the case when all Xj are identical to X, say
H0 = 1,
H1 (X) = X,
H2 (X) = X2 − σ 2 ,
H3 (X) = X 3 − 3Xσ 2 ,
H4 (X) = X 4 − 6σ 2 X 2 + 3σ 4 ,
H5 (X) = X 5 − 10σ 2 X 3 + 15σ 4 X,
H6 (X) = X 6 − 15σ 2 X 4 + 45σ 4 X 2 − 15σ 6 .
2
H2 (X1 )H2 (X2 ) = H4 (X12 , X212 ) + 4σ12 H2 (X1:2 ) + 2σ12 .
Case A.7
H2 (X1 ) H2 (X2 ) H2 (X3 ) = H6 X12 , X212 , X312
+ 4σ23 H4 X12 , X2 , X3 + 4σ12 H4 (X1, X2 , X312 )
+ 4σ13 H4 (X1, X212 , X3 ) + 8σ12 σ13 H2 (X2 , X3 ) + 8σ13 σ23 H2 (X1 , X2 )
+ 8σ12 σ23 H2 (X1 , X3 )
+ 2σ13
2
H2 (X2 ) + 2σ23
2
H2 (X1 ) + 2σ12
2
H2 (X3 ) + 8σ12 σ13 σ23 .
Case A.8
H4 X12 , X212 H2 (X3 ) = H6 X12 , X212 , X312 + 4σ13 H4 (X1 , X212 , X3 )
+ 4σ23 H4 X12 , X2 , X3 + 2σ13
2
H2 (X2 ) + 2σ23
2
H2 (X1 ) + 8σ13 σ23 H2 (X1 , X2 ).
364 A Formulae
2
H4 (X1 ) H2 (X3 ) = H6 X14 , X312 + 8σ13 H4 (X13 , X3 ) + 12σ13 H2 (X1 )
H4 (X) H2 (X) = H6 (X) + 8σ 2 H4 (X) + 12σ 4 H2 (X) ,
cf. (4.26).
Case A.9
4
8
H4 (X1:4 ) H4 (X5:8 ) = H8 (X1:8 ) + σj k H6 X(1:4)\j,(5:8)\k
i=1 j =5
+ σj1 k1 σj2 k2 H4 X(1:4)\(j1 ,j2 ),(5:8)\(k1 ,k2 )
72
+ σj1 k1 σj2 k2 σj3 k3 H2 X(1:4)\(j1 ,j2 ,j3 ),(5:8)\(k1 ,k2 ,k3 )
96
+ σ1k1 σ2k2 σ3k3 σ4k3 ,
4!
H4 (X) H4 (X) = H8 (X) + 16σ 2 H6 (X) + 72σ 4 H4 (X) + 96σ 6 H2 (X) + 24σ 8 ,
In particular
H6 (X1 ) H2 (X4 ) = H8 X16 , X412 + 12σ14 H6 X312 , X4 + 30σ14
2
H4 X414 .
Case A.11
EH2 (X1 ) H2 (X2 )H2 (X3 )H2 (X4 ) = 16 (σ12 σ13 σ24 σ34 + σ12 σ23 σ14 σ24
+σ23 σ13 σ24 σ14 + 4 σ13
2 2
σ24 + σ23
2 2
σ14 + σ12
2 2
σ34 .
A Formulae 365
In particular
2 H (X , X ) + 6σ 3
H3 (X1 ) H3 (X4 ) = H6 X13 , X413 + 9σ14 H4 X12 , X412 + 18σ14 2 1 4 14
H0 =1
H1 (X1 ) = X1
H2 (X1:2 ) = X1 ⊗ X2 − κ ⊗
1,2
⊗1
H3 (X1:3 ) = X1:33 − κ ⊗ ⊗ −1
1,2 ⊗ X3 − X1 ⊗ κ 2,3 − K(3,2)S(d1:3 ) κ 1,3 ⊗ X
⊗
2
H4 (X1:4 ) = X⊗1 ⊗
1:4 − κ 1,2 ⊗ X
4 −1 ⊗
3 ⊗ X4 − K(3,2)S (d1:4 ) κ 1,3 ⊗ X2 ⊗ X4
−K−1 ⊗ −1
(4:2)S (d1:4 ) κ 1,4 ⊗ X2 ⊗ X3 − K(3,2)S X1 ⊗ κ 2,3 ⊗ X4
⊗
−K−1 ⊗
(4,3)S (d1:4 ) X1 ⊗ κ 2,4 ⊗ X3 − X1 ⊗ X2 ⊗ κ ⊗
3,4
+κ ⊗
1,2 ⊗ κ ⊗
3,4 + K −1
(3,2) (d 1:4 ) κ ⊗
1,3 ⊗ κ ⊗
2,4 + K −1 ⊗ ⊗
(4:2) (d1:4 ) κ 1,4 ⊗ κ 2,3 .
S S
Suppose that all Xj =X, and use notation vecVar (X) = κ 2 , then we have:
H0 =1
H1 (X) = X
H2 (X) = X⊗2 − κ 2 ,
H3 (X) = X⊗3 − L−1 ⊗2
11 ,12 X ⊗ κ 2 = X⊗3 − 3X ⊗ κ ⊗2
2
⊗4 −1 ⊗ ⊗2
−1 ⊗2
H4 (X) = X − L2,H 2 κ 2 ⊗ X + L22 κ 2
= X⊗4 − 6κ 2 ⊗ X⊗2 + 3κ ⊗2
2
H5 (X) = X⊗5 − 10κ 2 ⊗ X⊗3 + 15κ ⊗2
2 ⊗X
H6 (X) = X⊗6 − 15κ 2 ⊗ X⊗4 + 45κ ⊗2
2 ⊗X
⊗2 − 15κ ⊗3
2
H7 (X) = X⊗7 − 21κ 2 ⊗ X⊗5 + 105κ ⊗2
2 ⊗X
⊗2 − 105κ ⊗3 ⊗ X
2
H8 (X) = X⊗8 − 28κ 2 ⊗ X⊗6 + 210κ ⊗2
2 ⊗X
⊗4 − 420κ ⊗3 ⊗ X⊗2 + 105κ ⊗4 ,
2 2
where the symmetry equivalence (=) is obtained by using symmetrizer Sd1k , noting
that left-hand sides are symmetric.
Products of H2 , cf. (4.74), p. 224.
Let all nj = 2, any number of arms dK be even, Ln,0 = {bi = (2i + 1, 2 (i + 1))
|i = 0 : (n − 1)}, and all dimensions be d.
Case A.13 L2,0 = {b1 = (1, 2) , (3, 4)}, dK = (0, 2, 4)
H2 X12 ⊗ H2 X212 = H2 (X1 , X1 ) ⊗ H2 (X2 , X2 )
= H4 X12 , X212 + K−1 −1 −1 −1 ⊗
(1324) + K(1423) + K(2314) + K(2413) κ 1,2
⊗H2 (X1:2 ) + K−1(1324) + K −1 ⊗2
(1423) κ 1,2 .
Case A.15 H3 ⊗ H3
+K−1
(152346) κ ⊗
1,5 ⊗ H 4 X 1:6\1,5
+K−1
(162345) κ ⊗
1,6 ⊗ H 4 X 1:6\1,6 + K −1
(241356) κ ⊗
2,4 ⊗ H 4 X 1:6\2,4
+K−1 ⊗
(251346) κ 2,5 ⊗ H4 X1:6\2,5 + K−1 ⊗
(261345) κ 2,6 ⊗ H4 X1:6\2,6
+K−1 ⊗
(341256) κ 3,4 ⊗ H4 X1:6\3,4 + K−1 ⊗
(351246) κ 3,5 ⊗ H4 X1:6\3,5
+K−1
(361245) κ ⊗
3,6 ⊗ H 4 X 1:6\3,6
+ K−1(253614) κ ⊗
2,5 ⊗ κ ⊗
3,6 + K −1
(263514) κ ⊗
2,6 ⊗ κ ⊗
3,5 ⊗ H2 X1,4
+ K−1 ⊗ ⊗ −1 ⊗
(243615) κ 2,4 ⊗ κ 3,6 + K(263415) κ 2,6 ⊗ κ 3,4
⊗
⊗ H2 X1,5
+ K−1 ⊗ ⊗ −1 ⊗
(243516) κ 2,4 ⊗ κ 3,5 + K(253416) κ 2,5 ⊗ κ 3,4
⊗
⊗ H2 X1,6
+ K−1(153624) κ ⊗
1,5 ⊗ κ ⊗
3,6 + K −1
(163524) κ ⊗
1,6 ⊗ κ ⊗
3,5 ⊗ H2 X2,4
+ K−1(143625) κ ⊗
1,4 ⊗ κ ⊗
3,6 + K −1
(163425) κ ⊗
1,6 ⊗ κ ⊗
3,4 ⊗ H2 X2,5
+ K−1(143526) κ ⊗
1,4 ⊗ κ ⊗
3,5 + K −1
(153426) κ ⊗
1,5 ⊗ κ ⊗
3,4 ⊗ H2 X2,6
+ K−1 ⊗ ⊗ −1 ⊗
(152634) κ 1,5 ⊗ κ 2,6 + K(162534) κ 1,6 ⊗ κ 2,5
⊗
⊗ H2 X3,4
+ K−1 ⊗ ⊗ −1 ⊗
(142635) κ 1,4 ⊗ κ 2,6 + K(162435) κ 1,6 ⊗ κ 2,4
⊗
⊗ H2 X3,5
+ K−1(142536) κ ⊗
1,4 ⊗ κ ⊗
2,5 + K −1
(152436) κ ⊗
1,5 ⊗ κ ⊗
2,4 ⊗ H2 X3,6
+K−1
(142536) κ ⊗
1,4 ⊗ κ ⊗
2,5 ⊗ κ ⊗
3,6 + K −1
(142635) κ ⊗
1,4 ⊗ κ ⊗
2,6 ⊗ κ ⊗
3,5
+K−1 ⊗ ⊗ ⊗ −1 ⊗
(152436) κ 1,5 ⊗ κ 2,4 ⊗ κ 3,6 + K(152634) κ 1,5 ⊗ κ 2,6 ⊗ κ 3,5
⊗ ⊗
+K−1 ⊗ ⊗ ⊗ −1 ⊗ ⊗
(162435) κ 1,6 ⊗ κ 2,4 ⊗ κ 3,5 + K(162534) κ 1,6 ⊗ κ 2,5 ⊗ κ 3,6 .
⊗
+K−1
(152634) κ ⊗
1,5 ⊗ κ ⊗
2,6 ⊗ κ ⊗
3,5
+K−1
(162435) κ ⊗
1,6 ⊗ κ ⊗
2,4 ⊗ κ ⊗
3,5
+K−1 ⊗ ⊗ ⊗
(162534) κ 1,6 ⊗ κ 2,5 ⊗ κ 3,6
A.6 Function G
n!! = n ∗ (n − 2) · · · 3 ∗ 1,
and
√ (2k − 1)!!
(k + 1/2) = π . (A.23)
2k
Pochhammer’s Symbol
p (p + 2) · · · (p + 2 (k − 1)) = (p)k,2 .
A Formulae 369
In particular:
:√ √
(d/2 + 1/2) π 2(2k−1)!!
k (k−1)! =
(2k−1)!!
π 2(2k−2)!! d = 2k
G1 (d) = = k
(d/2) k
= √ k!2 = √ (2k)!! d = 2k + 1
G1 (2k) π(2k−1)!! π(2k−1)!!
(A.25)
((p + 1) /2 + k)
G2k+1 (p) =
(p/2)
= (p + 1) /2 ((p + 1) /2 + 1) · · · ((p + 1) /2 + k − 1) G1 (p)
= 2−k (p + 1) (p + 3) · · · (p + 2k − 1) G1 (p)
= 2−k (p + 1) (p + 1 + 2) · · · (p + 1 + 2 (k − 1))
= 2−k (p + 1)k,2 G1 (p) .
(A.27)
Let p > k, and then G−k (p) is well defined and follows:
2
G−1 (p) = G1 (p) ,
p−1
1 2
G−2 (p) = = ,
p/2 − 1 p−2
2
G−3 (p) = G−1 (p) ,
p−3
1 4
G−4 (p) = = ,
(p/2 − 2) (p/2 − 1) (p − 4) (p − 2)
22
G−5 (p) = G−1 (p) .
(p − 3) (p − 5)
370 A Formulae
In general
2k 2k
G−2k (p) = = ,
(p − 2k) · · · (p − 4) (p − 2) (p − 2k)k,2
1 ≤ k < p/2, (A.28)
2k G 1 (p) 2k G 1 (p)
G−(2k−1) (p) = = ,
(p − (2k − 1)) · · · (p − 3) (p − 1) (p − (2k − 1))k,2
1 ≤ k < p/2.
(A.29)
Moments
⎧
⎪ 1
p n/2 ⎪
⎨ n = 2k
(p − 2k)k,2
μRp ,n = G−n (p) = pk . (A.30)
2 ⎪
⎪
G−1 (p)
n = 2k + 1
⎩
(p − 2k − 1)k,2
In particular:
pk
μRp ,2k = ,
(p − 2k) · · · (p − 4) (p − 2)
and
7 7 √
p p 2 2p
μRp = G−1 (p) = G1 (p) = G1 (p) ,
2 2 p−1 p−1
p 2 p
μRp ,2 = = ,
2 p−2 p−2
p 3/2 2 7
p p p
μRp ,3 = G−1 (p) = G−1 (p) = μR ,
2 p−3 p−3 2 p−3 p
p 3/2 √
22 p 2p
= G1 (p) = G1 (p) ,
2 (p − 1) (p − 3) (p − 3) (p − 1)
p 2 4 p
μRp ,4 = = μR ,2 ,
2 (p − 4) (p − 2) p−4 p
A Formulae 371
p 5/2 22 p2
μRp ,5 = G−1 (p) = μR ,
2 (p − 3) (p − 5) (p − 3) (p − 5) p
p 3 4
μRp ,6 = .
2 (p − 4) (p − 2)
Cumulants
k
2k − 1
κRp ,2k = μRp ,2k − κRp ,2k−2j +1 μRp ,2j −1
2j − 1
j =1
k−1
2k − 1
− κRp ,2k−2j μRp ,2j
2j
j =1
k
pk 2k − 1 pj
= − κRp ,1 κRp ,2k−2j +1
(p − 2k)k,2 2j − 1 (p − (2j − 1))j,2
j =1
k−1
2k − 1 pj
− κRp ,2k−2j .
2j (p − 2j )j,2
j =1
In particular
p p 2
κRp ,2 = G−2 (p) − (G−1 (p))2 = − (G−1 (p))2 ,
2 2 p−2
p2 1 1
κRp2 ,2 = μRp ,4 − μ2Rp ,2 = −
p−2 p−4 p−2
p 2 8 2p2
= = , (A.31)
2 (p − 4) (p − 2)2 (p − 4) (p − 2)2
p 3/2 2 p 1/2 p 2
κRp ,Rp2 = G−1 (p) − G−1 (p)
2 p−3 2 2 p−2
p 3/2 2
= G−1 (p) ,
2 (p − 2) (p − 3)
p 2 4 2
2
κRp ,Rp3 = μRp ,4 − μRp μRp ,3 = − G (p) ,
2 (p − 4) (p − 2) p − 3 −1
1 2 p2
μRp ,4 − κRp ,Rp3 = μRp μRp ,3 = G−1 (p) ,
2 p−3
372 A Formulae
and
In particular
√
μχp ,1 = κχp ,1 = 2G1 (p)
μχp ,2 = p,
κχp ,2 = p − κχ2p ,1 ,
μχp ,3 = (p + 1) κχp ,1 ,
μχp ,4 = p (p + 2) ,
μχp ,5 = (p + 1) (p + 2) (p + 3) κχp ,1 ,
κχp ,5 = (p + 1) (p + 2) (p + 3) κχp ,1 − 4κχp ,4 − 6pκχp ,3
−4 (p + 1) κχp ,2 κχp ,1 − p (p + 2) κχp ,1
= κχp ,1 (p + 1) (p + 2) (p + 3) − 4κχp ,4 − 6p 1 − 2κχp ,2
−4 (p + 1) κχp ,2 − p (p + 2) ,
2k−1
2k − 1
κχp ,2k = μχp ,2k − κχp ,2k−j μχp ,j
j
j =1
k
2k − 1
= μχp ,2k − κχp ,2k−2j +1 μχp ,2j −1
2j − 1
j =1
k−1
2k − 1
− κχp ,2k−2j μχp ,2j
2j
j =1
374 A Formulae
k
2k − 1
= (p)2k,2 − κχp ,1 κχp ,2k−2j +1 (p − 1)j,2
2j − 1
j =1
k−1
2k − 1
− κχp ,2k−2j (p)j,2 ,
2j
j =1
2k
2k
κχp ,2k+1 = μχp ,2k+1 − κχp ,2k−j +1 μχp ,j
j
j =1
k
2k
= μχp ,2k+1 − κχp ,2k−2j +1 μχp ,2j
2j
j =1
k
2k
− κχp ,2(k−j +1) μχp ,2j −1
2j − 1
j =1
k
2k
= κχp ,1 (p)2k+1,2 − κχp ,2k−2j +1 (p)2j,2
2j
j =1
k
2k
−κχp ,1 κχp ,2(k−j +1) (p)2j −1,2 ,
2j − 1
j =1
2k
2k
κχp ,2k+1 = μχp ,2k+1 − κχp ,j μχp ,2k−j +1 .
j −1
j =1
where again
moreover,
κβ1 R,3 = μβ1 ,3 κR,3 + 3κβ 2 ,β1 κR,2 κR,1 + μ3R,1 κβ1 3 ,
1
2
κβ 2 R 2 ,β1 R,β1 R = μβ 2 β 2 ,3 κR 2 ,R,R + 2κβ 2 ,β 2 κR,2
2 2 1 2 1
such that
√
Erfc z/ 2 = Erfc (z) .
We have
7
d z2 /2
z2 /2 2
e Erfc (z) = ze Erfc (z) − ,
dz π
d 2 z2 /2 2 2
2
e Erfc (z) = ez /2 Erfc (z) + z2 ez /2 Erfc (z)
dz
A Formulae 377
7 2 7
2 2
−z = z + 1 e Erfc (z) − z
2 z /2
,
π π
d 3 z2 /2 2 7 2
e Erfc (z) = z + 3z e
3 z /2
Erfc (z) − z + 2
2
,
dz3 π
and
d 4 z2 /2 2 7 2
e Erfc (z) = z z + 3z + 3z + 3 e
3 2 z /2
Erfc (z) − z + 3z
3
dz4 π
2 7
2
= z4 + 6z2 + 3 ez /2 Erfc (z) − z3 + 5z .
π
d
Hm (z) = zHm−1
(z) + H (z)
dz m−1
n−1 k
d
Fm (z) = H (z) .
dzk m−1−k
k=0
√
hence, M z/ 2 = M (z). Indeed
7 √
z2 /2 2
M (z) = e Erfc z/ 2
π
1 √ 2
= 2Erfc
z/ 2 = Erfc (z) .
√1 e−x /2 √1 e −x /2
2 2
2π 2π
ϕ (ix)
ρiM (x) = .
(ix)
In particular
Cumulants of |Z| in terms of i-Mill’s ratio are given by
A Formulae 379
k 1 2 3 4 5 6 7 8
/ / / /
2
μ|Z|,k π 1 2 π2 3 8 π2 15 48 π2 105
(k−1)
κ|Z|,k = (−i)k−1 ρiM (0) , (A.38)
(k−1)
ρiM (0) = (i)k−1 κ|Z|,k .
In particular we have
k 1 2 3 4 5 6
/ / 3 / 2 / 5 3
κ|Z|,k 2
π 1− 2
π 2 2
π − 2
π −6 2
π + 4 π2 24 2
π − −120 π2 +
/ 3 2
120 π2 −
20 2
π +
/ 28 π2
3 π2
k 7 8
7/2 5/2 3/2 4 3 2
κ|Z|,k 720 π2 −840 2
π +266 2
π − −5040 2
π +6720 2
π −2688 2
π +
/
63 π2 288 π2
Notations
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 381
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5
382 Notations
μ(a) μ(a) = E nj=1 Xaj : expected value of the product of ran-
dom variables according to indexset a = (a1 , a2 , . . . , an ),
in particular μ(1:n) = EX1n = E nk=1 Xk .
μX,n μX,n = EXn .
μ⊗
a (d) μ⊗ (d) = E ⊗n j =1 Xaj : expected value (with dimension
a
d = dj ) of the T-product of random variables Xaj ,
⊗
μ⊗ ⊗
a (d) = μa forshort, in particular μ1:n (d) = EX1 ⊗
⊗n
X2 · · · ⊗ Xn = E j =1 Xj .
μ⊗ μ⊗X,n = EX .
⊗n
X,n
κa κa = Cum Xa1 , . . . , Xan , cumulant of random variables
according to index set a = (a1 , a2 , . . . , an ), in particular
κ(1:n) = Cum (X1:n ).
κ⊗
a (d) κ⊗a (d) = Cum (Xa ), nth order T-cumulant of the list
of vectors Xak , k = 1 : n, for example κ ⊗ 1:n (d) =
Cum (X1:n ), κ ⊗ 1:n (d) = κ
1:n
⊗
for short, κ ⊗
1:n (d) is a vector
with dimension d = dj .
κ⊗
X,n κ⊗X,n = Cumn (X).
κ⊗
1,2 Covariance vector (second-order cumulant) κ ⊗ 1,2 =
vecCov (X2 , X1 ), where Cov (X2 , X1 ) denotes covariance
matrix.
Hn Hermite polynomial with degree n.
Hn T-Hermite polynomial
with degree n.
Bracketing Parentheses:
a set of ordered elements. Curly brackets:
a set
ofelements without any particular order. Square
brackets: elements constitute either a vector or a matrix.
Partitions, L ∈ Pn Pn is the set of all partitions of the numbers 1 : n =
(1, 2, . . . , n), a partition L ∈ Pn contains blocks L =
{b1 , b2 , . . . , bk } so that blocks b1 , b2 , . . . , bk are disjoint
and ∪bj = {1, 2, . . . , n}. L{k} ∈ Pn is a partition with size
k, K{r|} ∈ Pn , is a partition with size r and type .
Partitions PII
n PII
n is the set of all partitions K of the pairs of the set
II
(1, 2, . . . n).
Partitions PI,II
n PI,II
n is the set of all partitions K
I,II ∈ PI,II . Blocks of K I,II
n
include either 1 or 2 elements.
Permutations, p ∈ Pn Pn denotes the set of all permutations of the numbers
1 : n = (1, 2, . . . , n) , if p ∈ Pn then p (1 : n) =
(p (1) , p (2) , . . . , p (n)), pq: product of two permutations,
p × q: consecutive application of permutations.
Repetition Vector having same components: [d, d, d, d, d] = d1k ,
+ ,- .
k
replicating value d k-times;
384 Notations
Chapter 1
1.11
a1 ⊗ Bb = vec Bba1 = (a1 ⊗ B) b,
Bb ⊗ a2 = vec a2 b B = (B ⊗ a2 ) b.
1.15
2
vec⊗2 Id a⊗4 = vec Id a⊗2 vec Id a⊗2 = a a =a⊗2 a⊗2 = a⊗4 vecId 2 ,
vec Id ⊗ Id 2 a⊗4 = vec Id a⊗2 ⊗ a⊗2 = Id 2 a⊗2 ⊗ vec Id a⊗2 = Id 2 ⊗ vec Id a⊗4 .
1.16
vec AA AA = AA ⊗ AA (vecId ) = A⊗2 A⊗2 (vecId )
= Id 2 A⊗2 A⊗2 (vecId ) = (vecId ) ⊗ Id 2 A⊗4 .
1.17
K(234)S (d1:4 ) = K(23)S (d1 , d2 , d3 d4 ) = Id1 ⊗ K(2,1)S (d2 , d3 d4 ) = Id1 ⊗ Kd3 d4 •d2 .
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 385
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5
386 Solutions
1.19
K−1
(132) d, d 2
, d = K−1
(1423) = K(1342)
K−1 −1
(12) ⊗ Id K(12) d, d 2 , d = K(2134)K(3124) = K(1324).
S S
K−1 −1 −1
(2134) K(1423) = K(1324) .
1.23
vec b ⊗ a ⊗ b ⊗ a = vec b ⊗ a ⊗ b ⊗ a
= Im ⊗ Km•p ⊗ Ip vec b ⊗ a ⊗ vec b ⊗ a
= Im ⊗ Km•p ⊗ Ip (b ⊗ a)⊗2 .
1.29
vec A ⊗ A = K(1324) vecA ⊗ vecA = K(1324)K(1243)K(1324)vec A ⊗ A
= K(1324)K(1342)vec A ⊗ A
= K(1432)vec A ⊗ A
vec A ⊗ A = K(1324) vecA ⊗ vecA = K(1324)K(2134)K(1324)vec A ⊗ A
= K(1324)K(3124)vec A ⊗ A
= K(3214)vec A ⊗ A .
1.33
d ⊗2
vecId = e⊗2
j , vec⊗3 Id = e⊗2 ⊗2 ⊗2
j ⊗ ek ⊗ em , vecId 3 = ej ⊗ ek ⊗ em .
j =1 j,k,m j,k,m
⊗2
K−1 ⊗3
(142536)vec Id = ej ⊗ ek ⊗ em = vecId 3 .
j,k,m
vecId 3 = Id 2 ⊗ Kd•d 2 ⊗ Id vecId 2 ⊗ vecId
= Id 2 ⊗ Kd•d 2 ⊗ Id (Id ⊗ Kd•d ⊗ Id ) vec⊗2 Id ⊗ vecId
Solutions 387
= Id 2 ⊗ Kd•d 2 ⊗ Id Id ⊗ Kd•d ⊗ Id 3 vec⊗3 Id
= Id ⊗ Id ⊗ Kd•d 2 Kd•d ⊗ Id 2 ⊗ Id vec⊗3 Id
= Id ⊗ K(1423)K(2134) ⊗ Id vec⊗3 Id = K(135246)vec⊗3 Id
vec⊗3 Id = Id ⊗ Id ⊗ Kd 2•d Id 2 ⊗ Kd•d ⊗ Id vecId 3
= Id ⊗ K(1342)K(2134) ⊗ Id vecId 3
= K(13452)vecId 3 .
1.36
vec A ⊗ Id 2 Ip ⊗ Kp·d ⊗ Id ⊗ vec B ⊗ Id 2 Ip ⊗ Kp·d ⊗ Id vec ( ⊗ )⊗2
= vec vec B ⊗ Id 2 Ip ⊗ Kp·d ⊗ Id (( ⊗ ) ⊗ ( ⊗ )) Ip ⊗ Kd·p ⊗ Id vecA ⊗ Id 2
= vec vec B ⊗ Id 2 Ip ⊗ Kp·d ⊗ Id ⊗ ( ⊗ ) Kd·p ⊗ vecA ⊗ Id 2
= vec vec B ⊗ Id 2 ⊗ Kp·d ( ⊗ ) Kd·p ⊗ vecA ⊗ Id 2
= vec vec B ⊗ Id 2 ( ⊗ ⊗ ⊗ ) vecA ⊗ Id 2
= vec vec B ⊗ Id 2 (( ⊗ ) vecA ⊗ ⊗ )
= vec vec B ( ⊗ ) vecA ⊗ ⊗
= vec B ( ⊗ ) vecA vec ( ⊗ )
= trAB vec⊗2 .
1.45
+
Qd,2 x = ωd,2 Q+
d,2 Sd1 2 x = ω d,2 Q d,2 x + K (21) x /2 .
r1 − k1,2 ! r3 − k3,4 !
× k1,4 !k2,3!
r1 − k1,2 − k1,3 ! r3 − k3,4 − k1,3 !k1,3 !
r1 !r2 !r3 !r4 !
= .
k1,2 !k1,3 !k1,4!k2,3 !k2,4 !k3,4!
Chapter 2
2.5
1. r = 1,
8 = 1, j8 = (2, 2).
2. r = 2,
1 = 1, 7 = 1, j1 = (0, 1), j7 = (2, 1);
3 = 1, 5 = 1, j3 = (1, 0), j7 = (1, 2);
4 = 2, j4 = (1, 1);
2 = 1, 6 = 1, j2 = (0, 2), j6 = (2, 0).
3. r = 3,
1 = 2, 6 = 1, j1 = (0, 1), j6 = (2, 0);
3 = 2, 2 = 1, j3 = (1, 0), j2 = (0, 2);
1 = 1, 3 = 1, 4 = 1, j1 = (0, 1), j3 = (1, 0), j4 = (1, 1).
4. r = 4,
1 = 2, 3 = 2, j1 = (0, 1), j3 = (1, 0) .
Replace the above cases into (2.13).
2.10
∂ ∂
Dx⊗ (A ⊗ Bx) = (A ⊗ B) (Id ⊗ x) ⊗ = (A ⊗ B ⊗ Id ) Id ⊗ x ⊗
∂x ∂x
= (A ⊗ B ⊗ Id ) (Id ⊗ vecId ) = A ⊗ ((B ⊗ Id ) vecId ) = A ⊗ vecB .
2.11
Dx⊗ x Ax = vec x A + A = A + A x.
∂
(Ax)⊗2 ⊗ = Ax ⊗ vecA + K−1
(132) vecA ⊗ Ax
∂x
= K−1(132) + K −1
(231) vecA ⊗ Ax ,
∂
Dx⊗2 (Ax)⊗2 = K−1
(1324) + K −1
(2314) vecA
⊗ Ax ⊗
∂x
= K−1 −1 ⊗2
(1324) + K(2314) vec A = K(3124) + K(1324) vec A
⊗2
= K(3124) + K(1324) K(1324)vecA⊗2 = K(3214) + Id 4 vecA⊗2 .
2.13 x ∈ R, f1 ∈ Rm1
∂ ⊗
vec Dy f1 (y) f2 = vec Dx⊗ f2 Dy f1 (y) = Im1 ⊗ Dx⊗ f2 Dy f1
∂x
∂
Dy f1 (y) f2 = Dy f1 (y) Dx⊗ f2 = Dx⊗ f2 ⊗ Im1 Km2 •m1 Dy⊗ f1 .
∂x
2.17 Dy f = Dy⊗ f ,
∂
∂
Dx⊗ h = Dy⊗ f ⊗ Id Dx⊗ g = Dy⊗ f ⊗ Id g⊗ = Dy⊗ f g ⊗
∂x ∂x
∂
= Dy⊗ f gj ⊗ = Dy⊗ f ∗ Dx⊗ g.
j ∂x
2.18
∂
∂
−1 2
(Vx ⊗ vecV) ⊗ = K(12)S d, d , d vecV ⊗ Vx ⊗
∂x ∂x
= K−1
(12) d, d 2 , d vecV ⊗ vecV .
S
∂ ∂
(Vx ⊗ vecV) ⊗ = K−1
(132) d, d 2
, d Vx ⊗ ⊗ vecV
∂x ∂x
= K−1 2
(132) d, d , d vecV ⊗ vecV .
hence,
Dx⊗ f ⊗ Id 2 g = g ⊗ Id (Im ⊗ vecId ) Dx⊗ + f ⊗ Id 2 Dx⊗ g
= Dx⊗ g ⊗ Id 2 (Im ⊗ vecId ⊗ Id ) Dx⊗ f + f ⊗ Id 2 Dx⊗2 g,
since
f ⊗ Id Dx⊗ g = Dx⊗ g ⊗ Id vec f ⊗ Id = Dx⊗ g ⊗ Id (f ⊗ vecId )
= Dx⊗ g ⊗ Id (Im ⊗ vecId ) f,
Dx⊗ g ⊗ Id (Im ⊗ vecId ) ⊗ Id = Dx⊗ g ⊗ Id 2 (Im ⊗ vecId ⊗ Id ) ;
therefore,
Dx⊗ f ⊗ Id Dx⊗ g = f ⊗ Id 2 Dx⊗2 g+ Dx⊗ g ⊗ Id 2 (Im ⊗ vecId ⊗ Id ) Dx⊗ f.
2.27 f (1) (g) is scalar, commutes with Dx⊗1 g; hence, Dx⊗1 f (g) = Dx⊗1 g ⊗ f (1) (g)
⊗ ⊗ (1)
Dx1 g ⊗ Dx2 f (g) = f (2) (g) Dx⊗1 g ⊗ Dx⊗2 g .
Dx⊗4 f (2) (g) Dx⊗1:2 g ⊗ Dx⊗3 g = f (3) (g) Dx⊗1:2 g ⊗ Dx⊗3 g ⊗ Dx⊗4 g
The next term contains a commutator matrix; therefore, the first step
Dx⊗4 f (2) (g) K(2,3)S d(2,3)S Dx⊗1,3 g ⊗ Dx⊗2 g
= K(2,3)S d(2,3)S ⊗ Id4 Dx⊗4 f (2) (g) Dx⊗1,3 g ⊗ Dx⊗2 g
= K(2,3)S (d1 , d3 , d2 , d4 ) Dx⊗4 f (2) (g) Dx⊗1,3 g ⊗ Dx⊗2 g ,
then
Dx⊗4 f (2) (g) Dx⊗1,3 g ⊗ Dx⊗2 g = f (3) (g) Dx⊗1,3 g ⊗ Dx⊗2 g ⊗ Dx⊗4 g
+f (2) (g) K(2,3)S (d1 d3 , d4 , d2 ) Dx⊗3
1,3,4
g ⊗ Dx⊗2 g
Dx⊗4 f (2) (g) K(2,3)S d(2,3)S Dx⊗1,3 g ⊗ Dx⊗2 g
We see that (1423) is the inverse of the permutation (1342) corresponding to the
term Dx⊗3
1,3,4
g ⊗ Dx⊗2 g.
−1
K(1423) (d1 , d3 , d4 , d2 ) = K(1342) (d1:4 ) .
Dx⊗4 Dx⊗1 g ⊗ Dx⊗2:3 g = K−1
(1,4,2,3) D ⊗
x1,4 g ⊗ D ⊗
x2:3 g + Dx⊗1 g ⊗ Dx⊗2:4 g.
Dx⊗4
1:4
f (g) = f (1) (g) Dx⊗41:3
g + f (2) (g) Dx⊗1,2 g ⊗ Dx⊗3,4 g + K−1 ⊗ ⊗
(2,3)S Dx1,3 g ⊗ Dx2,4 g +
+f (2) K−1 ⊗3 ⊗
(1,3,4,2) Dx1,3,4 g ⊗ Dx2 g + f
(3) (g) D ⊗ g ⊗ D ⊗ g ⊗ D ⊗ g
x1:2 x3 x4
Chapter 3
3.8 Gaussian κ ⊗ −1
X,k = 0, for k > 2, and K(1423) = K(1342) = K(1243) K(1324) ,
K−1 ⊗2 ⊗2 ⊗2
(1324) κ X,2 = (Id ⊗ Kd·d ⊗ Id ) κ X,2 = vec ,
Cum2 X⊗2 , X⊗2 = K−1(1324) + K −1 ⊗2
(1423) κ X,2 = vec
⊗2
+ K(1342)κ ⊗2
X,2
⊗2
= Id 4 + K(1243) vec .
3.12
κXY ,3 = Cum1 κXY,3|Y + 3Cum2 κXY |Y , κXY,2|Y + Cum3 κXY |Y ,
κXY ,3 = Cum1 Y 3 κX,3|Y + 3Cum2 Y κX|Y , Y 2 κX,2|Y + Cum3 Y κX|Y ,
κXY ,3 = κX,3 κY 3 ,1 + 3κX,1κX,2 κY 2 ,Y +κX,1
3
κY,3 .
3
κXY ,3 = κX,3 κY,3 +3κX,3κY,2 κY,1 + 3κX,2κX,1 κY,3 + 6κX,2κX,1 κY,2 κY,1 + κX,3 κY,1
+ κX,1
3
κY,3
3 3
= κX,3 κY ,3 + 3κY,1 κY ,2 + κY,1 + κY ,3 κX,3 + 3κX,1 κX,2 + κX,1
3
κXY ,3 = κX,3 κY 3 ,1 + 3κX,1κX,2 κY 2 ,Y + κX,1 κY,3 = μY,3 μX,3 −3μX,1 μX,2 μY,2 μY,1
+ 2μ3X,1 μ3Y,1 .
3.16
3.21
Cum κX1 Y1 |Y1,2 , κX2 Y2 |Y1,2 , κX3 Y3 |Y1:3 = Cum Y1 κX1 ,1 , Y2 κX2 ,1 , Y3 κX3 ,1
= κX1 ,1 κX2 ,1 κX3 ,1 Cum (Y1 , Y2 , Y3 )
Cum κX1 Y1 ,X2 Y2 |Y1:3 , κX1 Y1 |Y1:3 = κX1 ,X2 κX3 ,1 Cum (Y1 Y2 , Y3 )
κX1 Y1 ,X2 Y2 ,X3 Y3 = Cum κX1 Y1 ,X2 Y2 ,X3 Y3 |Y1:3 + Cum κX1 Y1 ,X2 Y2 |Y1:3 , κX3 Y3 |Y1:3
+Cum κX1 Y1 ,X2 Y2 |Y1:3 , κX1 Y1 |Y1:3 + Cum κX2 Y2 ,X3 Y3 |Y1:3 , κX1 Y1 |Y1:3
+Cum κX1 Y1 |Y1,2 , κX2 Y2 |Y1,2 , κX3 Y3 |Y1:3
= Cum Y1 Y2 Y3 κX1:3 |Y1:3 + Cum Y1 Y2 κX1,2 |Y1:3 , Y3 κX3 |Y1:3
+Cum Y1 Y3 κX1,3 |Y1:3 , Y2 κX2 |Y1,3 + Cum Y2 Y3 κX2,3 |Y1:3 , Y1 κX1 |Y1:3
+Cum Y1 κX1 |Y1,2 , Y2 κX2 |Y1,2 Y3 , κX3 |Y1:3
= κX1:3 κY1 Y2 Y3 ,1 + κX1,2 κX3 κY1 Y2 ,Y3 + κX1,3 κX2 κY1 Y3 ,Y2
3.23
φ0 (λ1:n ) = φ (λk ) ,
ψ0 (λ1:n ) = ψ (λk )
394 Solutions
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
∂ 1 ∂ ∂
∂λ1 φ0 (λ) φ(λ1 ) ∂λ1 φ (λ1 ) ∂λ1 ψ (λ1 )
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
Dλ⊗ φ0 (λ) = ⎢
⎣
..
.
⎥ = φ0 (λ) ⎢
⎦ ⎣
..
.
⎥ = φ0 (λ) ⎢
⎦ ⎣
..
.
⎥
⎦
∂ 1 ∂ ∂
φ
∂λn 0 (λ) φ(λn ) ∂λn φ (λn ) ∂λn ψ (λn )
n
= φ0 (λ) ψ (1) (λk ) ek ,
k=1
⊗3
Dλ⊗3 φ0 (λ) = φ0 (λ) ψ (1) (λk ) ek + φ0 (λ) ψ (1) (λk ) ek ⊗ ψ (2) (λk ) e⊗2
k
+φ0 (λ) K−1
(1,3,2) ψ (2) (λk ) e⊗2
k ⊗ ψ (1) (λk ) ek
+φ0 (λ) ψ (2) λj ψ (1) (λk ) e⊗2
j ⊗ ek + φ0 (λ) ψ (3) (λk ) e⊗3
k .
j,k
3.25 F3 1d = 0,
⊗3 ⊗3
μ⊗ 3
X,3 = κX,1 F3 + 3κX,2κX,1 I⊗3
1d d + K −1
(132) + K −1
(231) F3 1d
⊗2
⊗3
× ⊗ F3 vec (Id ) + κX,3 F3 e⊗3
k
3
= κX,3 F3 ek = 3κX,3.
3.26
φ0 (λ1:n ) = φ (λk ) ,
ψ0 (λ1:n ) = ψ (λk ) .
n
Dλ⊗ φ0 (λ) = φ0 (λ) ek ⊗ Dλ⊗k ψ (λk ) ,
k=1
⊗2
Dλ⊗2 φ0 (λ) = φ0 (λ) ek ⊗ Dλ⊗k ψ (λk ) + φ0 (λ) e⊗2 ⊗2
k ⊗ Dλk ψ (λk ) .
Solutions 395
3.27
φ0 (λ1:n ) = φ (λk ) ,
ψ0 (λ1:n ) = ψ (λk ) ,
⊗3
Dλ⊗3 φ0 (λ) = φ0 (λ) ek ⊗ Dλ⊗ ψ (λk )
+φ0 (λ) ek ⊗ Dλ⊗ ψ (λk ) ⊗ e⊗2
k ⊗ Dλ
⊗2
ψ (λk )
+φ0 (λ) K−1
(132) e ⊗2
k ⊗ D ⊗2
λ ψ (λk ) ⊗ e k ⊗ D ⊗
λ ψ (λk )
+φ0 (λ) e⊗2 ⊗2
k ⊗ Dλ ψ (λk ) ek ⊗ Dλ⊗ ψ (λk )
+φ0 (λ) e⊗3 ⊗3
k ⊗ Dλ ψ (λk ) ,
take n = 3
Cum3 (Y) = Cum3 e k ⊗ Xk = Cum3 (ek ⊗ Xk ) = e⊗3 ⊗
k ⊗ κ X,3 ,
see (3.12).
3.29 Use (3.58):
Cum X AX, X BX = Cum2 (vecA) X ⊗ X, (vecB) X ⊗ X
= (vecA) ⊗ (vecB) Cum2 (X ⊗ X, X ⊗ X)
= (vecA) ⊗ (vecB) K−1
(2,1)S +K −1
(4,2)S [vec ⊗ vec]
see (1.10).
3.31
Cum2 (X1:2 |Y ) = μX1 X2 |Y − μX1 |Y μX2 |Y = E X1 ± μX1 |Y X2 ± μX2 |Y |Y − μX1 |Y μX2 |Y
= E X1 − μX1 |Y X2 − μX2 |Y − μX2 |Y E X1 − μX1 |Y |Y
−μX1 |Y E X2 − μX2 |Y |Y
= E X1 − μX1 |Y X2 − μX2 |Y = Cum2 (X1:2 − E (X1:2 |Y )) .
396 Solutions
3.32
Dϑd κϑb1 ,ϑb2 ,ϑb3 = Dϑd μϑb1 ,ϑb2 ,ϑb3 − Dϑd μϑb3 μϑb1 ,ϑb2 + μϑb2 μϑb1 ,ϑb3
+μϑb1 μϑb2 ,ϑb3 + 2Dϑd μϑb1 μϑb2 μϑb3
= μϑb1 ,ϑb2 ,ϑb3 ,ϑd + μϑb1 ,d ,ϑb2 ,ϑb3 + μϑb1 ,ϑb2 ,d ,ϑb3 + μϑb1 ,ϑb2 ,ϑb3 ,d
− μϑb3 ,ϑd μϑb1 ,ϑb2 + μϑb3 μϑb1 ,ϑb2 ,ϑd + μϑb2 ,ϑd μϑb1 ,ϑb3
+μϑb2 μϑb1 ,ϑb3 ,ϑd + μϑb1 ,ϑd μϑb2 ,ϑb3 + μϑb1 μϑb2 ,ϑb3 ,ϑd
− μϑb3 ,d μϑb1 ,ϑb2 + μϑb3 μϑb1 ,d ,ϑb2 + μϑb1 ,ϑb2 ,d + μϑb2 ,d μϑb1 ,ϑb3
+μϑb2 μϑb1 ,ϑb3 ,d + μϑb1 ,d ,ϑb3
−μϑb1 ,d μϑb2 ,ϑb3 − μϑb1 μϑb2 ,d ,ϑb3 ,d + μϑb2 ,ϑb3 ,d
= μϑb1 ,ϑb2 ,ϑb3 ,ϑd − μϑb3 ,ϑd μϑb1 ,ϑb2 + μϑb3 μϑb1 ,ϑb2 ,ϑd
+μϑb2 ,ϑd μϑb1 ,ϑb3 + μϑb2 μϑb1 ,ϑb3 ,ϑd + μϑb1 ,ϑd μϑb2 ,ϑb3 + μϑb1 μϑb2 ,ϑb3 ,ϑd
+μϑb1 ,d ,ϑb2 ,ϑb3 − μϑb1 ,d μϑb2 ,ϑb3 − μϑb3 μϑb1 ,d ,ϑb2 − μϑb2 μϑb1 ,d ,ϑb3 . . .
= κϑb1 ,ϑb2 ,ϑb3 ,ϑd + κϑb1 ,d ,ϑb2 ,ϑb3 + κϑb1 ,ϑb2 ,d ,ϑb3 + κϑb1 ,ϑb2 ,ϑb3 ,d
3.34
Dϑd κϑb = Dϑd μϑb = μϑ {b,d } + μϑb ,ϑd = κϑ {b,d } + κϑb ,ϑd
4
Dϑd μϑb1 ,ϑb2 = Dϑd lϑb1 lϑb2 el(ϑ) dx = μϑ{b ,d } ,ϑb2 + μϑb1 ,ϑ{b ,d } + μϑb1 ,ϑb2 ,d
1 2
4
Dϑd μϑb1 ,ϑb2 ,ϑb3 = Dϑd lϑb1 lϑb2 lϑb3 el(ϑ) dx
= μϑ{b ,ϑb2 ,ϑb3 + μϑb1 ,ϑ{b ,ϑb3 + μϑb1 ,ϑb2 ,ϑ{b + μϑb1 ,ϑb2 ,ϑb3 ,d .
1 ,d } 2 ,d } 3 ,d }
Solutions 397
3.35
4 4
Dϑd+1 Eϑ lϑ1:d = Dϑd+1 lϑ1:d L (ϑ, x) dx = Dϑd+1 lϑ1:d el(ϑ) dx
4
= lϑ1:(d+1) + lϑd+1 lϑ1:d el(ϑ) dx
Chapter 4
All random variables below are jointly normal, centralized (EXj = 0), with
covariances σj k , and cumulants κ ⊗
j,k .
4.1
n−1
Hn+1 (aY + bZ, X1:n ) = (aY + bZ) Hn (X1:n ) − σj,aY +bZ Hn−1 X(1:n)\j
j =1
n−1
= aY Hn (X1:n ) + bZHn (X1:n ) − aσj,Y Hn−1 X(1:n)\j
j =1
n−1
− bσj,Z Hn−1 X(1:n)\j
j =1
−K−1 ⊗ ⊗ ⊗ ⊗
(1423) κ 1,4 ⊗ X2 ⊗ X3 − κ 1,2 ⊗ X3 ⊗ X4 + κ 1,2 ⊗ κ 3,4
398 Solutions
−X1 ⊗ κ ⊗
2,3 ⊗ X 4 + K −1
κ ⊗
(1423) 1,4 ⊗ κ ⊗
2,3 − K −1
(1324) κ ⊗
1,3 ⊗ X 2 ⊗ X 4
+K−1 ⊗ ⊗
(1324) κ 1,3 ⊗ κ 2,4 ,
K−1 −1 −1
(1243) K(1423) = K(1243) K(1342) = K(1324) = K(1324),
L−1 −1 −1
22 =Id 4 + K(1324) + K(1423) ,
L−1 −1 −1 −1 −1 −1
12 ,21 = Id 4 + K(1324) + K(1423) + K(2314) + K(2413) + K(3412) ,
H4 (X) = X⊗4 − L−1
22 X ⊗2
⊗ κ ⊗
X,2 + L−1
12 ,21 Id 4 + K −1
(1324) + K −1 ⊗2
(1423) κ X,2
= Sd14 X⊗4 − 6κ X,2 ⊗ X⊗2 + 3κ ⊗2
X,2 .
4.14
−K−1 ⊗
(1423) H2 (X1 , X4 ) ⊗ κ 2,3
−K−1 ⊗ ⊗ ⊗ ⊗
(1423) κ 1,4 ⊗ H2 (X2:3 ) − κ 1,2 ⊗ H2 (X3:4 ) + κ 1,2 ⊗ κ 3,4
+K−1 ⊗ ⊗ −1 ⊗ ⊗
(132) κ 1,3 ⊗ κ 2,4 + K(1423) κ 1,4 ⊗ κ 2,3 .
H4 (X) = X⊗1 −1 ⊗ −1 ⊗2
1:4 − L2,H 2 κ X,2 ⊗ H2 (X) + L2,2 κ X,2 ,
4
and
⊗ ⊗
E (xk + iXk ) = E (xk + iXk ) ⊗ x3 − Ex1 ⊗ X2 ⊗ X3 − EX1 ⊗ x2 ⊗ X3
k=1:3 k=1:2
⊗
=E (xk + iXk ) ⊗ x3 − x1 ⊗ κ ⊗ −1 ⊗
2,3 − K(213) x2 ⊗ κ 1,3 .
k=1:2
cf. (4.34).
4.16 Let Y = X1 + X2 ,
H2 (Y, X1 + X2 ) = H2 (Y, X1 ) + H2 (Y, X2 ) = K−1
(21) H2 (X1 , X1 + X2 )
+H2 (X2 , X1 + X2 )
= K−1
(21) (H2 (X1 ) + H2 (X2 ) + H2 (X1 , X2 ) + H2 (X2 , X1 ))
= H2 (X1 ) + H2 (X2 ) + Id 2 + K−1
(21) H2 (X1 , X2 ) ,
4.17
+ (A ⊗ Id ) K−1
(21) (B ⊗ Id ) H2 (X2 , X1 )
+ (B ⊗ Id ) K−1
(21) (A ⊗ Id ) H2 (X1 , X2 )
+ (B ⊗ Id ) K−1
(21) (B ⊗ Id ) H2 (X2 , X2 )
400 Solutions
+B⊗2 H2 (X2 ) .
(A ⊗ Id ) K−1
(21) (B ⊗ Id ) H2 (X2 , X1 ) = (A ⊗ Id ) (Id ⊗ B) K(21) H2 (X2 , X1 )
= (A ⊗ B) H2 (X1 , X2 ) .
4.19
H3 X − X = H2 X − X ⊗ X − X − Id 3 + K−1
(2,1,3) X − X ⊗ κ⊗
X−X,2
n−1
= H2 X − X ⊗ X − X − Id 3 + K−1(2,1,3) X − X ⊗ κ⊗ X,2
n
1
κ⊗ = κ⊗ ,
X−X,2 n − 1 X,2
⊗2
H3 X−X = (X − μ)⊗3 + 3 (X − μ)⊗2 ⊗ X − μ − +3 (X − μ) ⊗ X − μ
⊗3
+ X−μ − 2 (X − μ) ⊗ κ ⊗ + μ − X ⊗ κ⊗
X−X,2 X−X,2
= H3 (X−μ) + 2 (X − μ) ⊗ κ X,2 + H3 X − μ + 2 X − μ ⊗ κ ⊗
⊗
X,2
+3H2 (X−μ) ⊗ H1 X − μ + 3κ ⊗ X,2 ⊗ X − μ
+3H1 (X−μ) ⊗ H2 X − μ + 3 (X−μ) ⊗ κ ⊗
X,2
⊗ ⊗
−2 (X − μ) ⊗ κ −2 μ−X ⊗κ
X−X,2 X−X,2
= H3 (X−μ) + H3 μ − X + 3H2,1 X−μ, μ − X + 3H1,2 X−μ, μ − X .
4.24
n
n
E (Hn (X − E (X|Y ) + E (X|Y )) |Y ) = E Hk (X − E (X|Y )) Hn−k (E (X|Y )) |Y
k
k=0
n
n
= Hn−k (E (X|Y )) E (Hk (X − E (X|Y )))
k
k=0
= Hn (E (X|Y )) ,
Solutions 401
E (H2 (X) |Y) = E (H2 (X−E (X|Y) + E (X|Y)) |Y) = E (H2 (W + V) |Y)
= E (H2 (W) |Y) + E (H2 (V) |Y) + E (H2 (W1 , V2 ) |Y) + E (H2 (W2 , V1 ) |Y)
= E (H2 (W)) + E (H2 (V)) + E (H1 (W1 ) H1 (V2 ) |Y) + E (H1 (W2 ) H1 (V1 ) |Y)
= E (H2 (V)) .
4.25
E X3 |Y = E (X − E (X|Y ) + E (X|Y ))3 |Y = E (X − E (X|Y ))3 |Y
+3E (X − E (X|Y ))2 E (X|Y ) |Y
+3E (X − E (X|Y )) E2 (X|Y ) |Y + E3 (X|Y )
= E (X − E (X|Y ))3 + 3E (X|Y ) E (X − E (X|Y ))2
4.26
E (Y1 Y2 |Y3 ) = E (Y1 − E (Y1 |Y3 )) ((Y2 − E (Y2 |Y3 ))) + E (Y1 |Y3 ) E (Y2 |Y3 )
= σ12 − EE (Y1 |Y3 ) E (Y2 |Y3 ) + E (Y1 |Y3 ) E (Y2 |Y3 )
E (Y1 Y3 ) = EY3 E (Y1 |Y3 ) .
E (H3 (Y1:3 )|Y3 ) = Y3 E (Y1 Y2 |Y3 ) − σ23 E (Y1 |Y3 ) − σ13 E (Y2 |Y3 ) − σ12 Y3
= Y3 E (Y1 |Y3 ) E (Y2 |Y3 ) − Y3 EE (Y1 |Y3 ) E (Y2 |Y3 )
−σ23 E (Y1 |Y3 ) − σ13 E (Y2 |Y3 )
= Y3 E (Y1 |Y3 ) E (Y2 |Y3 ) − Y3 EE (Y1 |Y3 ) E (Y2 |Y3 )
−E (E (Y1 |Y3 ) Y3 ) E (Y1 |Y3 ) − E (E (Y2 |Y3 ) Y3 ) E (Y2 |Y3 )
= H3 (E (Y1 |Y3 ) , E (Y2 |Y3 ) , Y3 ).
402 Solutions
Chapter 5
5.1
(4) (3) (2) (1)
ρiM = (x − 2iρiM ) ρiM + 3ρiM 1 − 2iρiM ,
2
(4)
ρiM (0) = 4ρiM0
3 2
3ρiM0 − 2 + 3ρiM0 1 − 2ρiM0
2
= 12ρiM0
5
− 8ρiM0
3
+ 3ρiM0 − 12ρiM0
3
+ 12ρiM0
5
= 24ρiM0
5
− 20ρiM0
3
+ 3ρiM0 ,
7 5 7 3 7
4 (4) 2 2 2
κ|Z|,5 = (−i) ρiM (0) = 24 − 20 +3 ,
π π π
μ μ|Z|,4 μ|Z|,2 μ|Z|,3 1 μ|Z|,3
|Z|,5
κ|Z|,5 = 5! − μ|Z|,1 − + 2! μ2|Z|,1
5! 4! 2! 3! 2! 3!
1
μ|Z|,2 3! 4 μ|Z|,2
2 4!
+ μ|Z|,1 − μ|Z|,1 + μ5|Z|,1
2! 2! 3! 2! 5!
= μ|Z|,5 − 5μ|Z|,1 μ|Z|,4 − 10μ|Z|,2 μ|Z|,3 + 20μ2|Z|,1 μ|Z|,3
μ4 + 1 − 4 (
= μ3 + 1) − 3 (
μ2 + 1)2 + 12 (
μ2 + 1) − 6
=
μ4 − 4
μ3 − 3
μ22 + 6
μ2 ,
Solutions 403
similarly
2
ζ5 ν5ν4 ν3 ν2 ν2 ν3 ν2
κ5 = 5 = 5 − 5 4 − 10 3 2 + 30 2 + 20 3 − 60 2 + 24
ν1 ν1 ν1 ν1 ν1 ν1 ν1 ν1
μ5 + 1 − 5 (
= μ4 + 1) − 10 (
μ3 + 1) (
μ2 + 1) + 30 (
μ2 + 1)2
+20 (
μ3 + 1) − 60 (
μ2 + 1) + 24
=
μ5 − 5
μ4 − 10 (
μ3 +
μ3
μ2 + 22 + 2
μ2 ) + 30 μ μ2 + 20
μ3 − 60
μ2
+1 − 5 − 10 + 30 + 20 − 60 + 24
μ5 − 5
= μ4 − 10
μ3
μ2 + 30
μ22 + 10
μ3 − 10
μ2 .
5.5
(2)
λ2 = 2, = 1
(4) (3)
2
λ2 = 4 λ3 = 22 · 2! · 3!!, = 2
(2)
λ2 = 2 · ! · (2 − 1)!!
induction
(2+2) (2+1)
+1
λ2 = (2 + 2) λ2+1
(2)
= (2 + 2) (2 + 1) λ2
5.6
κXY ,4 = κX,4 κY,4 + 3κY,2 2 + κX,3 κX,1 4κY,4 + 12κY,22 2
+ κX,2 2 +
3κY,4 + +6κY,2
2
+ κX,2 κX,1 2
6κY,4 + 12κY,2 4 κ
+ κX,1 Y,4
= κY,4 κX,4 + 4κX,3 κX,1 + 3κX,2 2 + 6κ 2 4
X,2 κX,1 + κX,1
2
+ 3κY,2 2 + 4κ
κX,4 + 4κX,3 κX,1 + 2κX,2 2
X,2 κX,1
= κY,4 μX,4 + 3κY,22 μX,4 − μ2X,2
404 Solutions
and
2
μX,4 − μ2X,2 = κX,4 + 4κX,3κX,1 + 3κX,2
2
+ 6κX,2 κX,1
2
+ κX,1
4
− κX,2 + κX,1
2
κβ1 R,β2 R = Cum1 (β1 β2 Cum2 (R, R)) + Cum1 (R)2 Cum2 (β1 , β2 )
= μβ1 ,β2 κR,2 + κβ1 ,β2 μ2R = μR,2 κβ1 ,β2 + μβ1 μβ2 κR,2
= μR,2 μβ1 ,β2 − μβ1 μβ2 + μβ1 μβ2 μR,2 − μ2R = μR,2 μβ1 ,β2 − μβ1 μβ2 μ2R
= μβ1 ,β2 μR,2 − μ2R + μβ1 ,β2 − μβ1 μβ2 μ2R = μβ1 ,β2 μR,2 − μβ1 μβ2 μ2R .
5.12
= 0.
5.13
κ⊗ ⊗ ⊗ ⊗2 ⊗
W,2 = κ Rp V,2 = κRp ,2 κ V,2 + κRp ,2 κ V,1 + κRp ,1 κ V,2
2
2 ⊗2
= μRp ,2 κ ⊗
V,2 + κ κ ⊗2
Rp ,2 V,1 = μ Rp ,2 vec − δ 2
+ κ|Z|,1 κRp ,2 δ ⊗2
π
2
2
= μRp ,2 vec + μRp ,2 − μRp ,1 − μRp ,2 δ ⊗2
2
π π
Solutions 405
p 2
= G−2 (p) vec − μ2Rp ,1 δ ⊗2
2 π
p p
= vec − G2−1 (p) δ ⊗2 ,
p−2 π
κ⊗ ⊗ ⊗2 ⊗2 ⊗ ⊗2
W,2 = μRp ,2 κ V,2 + κ V,1 − μRp ,1 κ V,1 = μRp ,2 κ V,2 + μRp ,2 − μRp ,1 κ V,1
2 2
p 2 p 2 2 ⊗2
= μRp ,2 κ ⊗ ⊗2
V,2 + κRp ,2 κ V,1 = vec − δ ⊗2 + − (G−1 (p))2 δ
p−2 π 2 p−2 π
p p
= vec − G2−1 (p) δ ⊗2 ,
p−2 π
(Lemma 3.4).
5.14 r = 1, then we shall have the term κRp3 ,1 δ ⊗3 , since 1:3 = (0, 0, 1),
C1 Rp , 1:3 = Cum1 Rp3 .
r = 2, then we shall have the term κRp ,Rp2 δ ⊗ κ ⊗ V,2 , 1:3 = (1, 1, 0),
2
C1 Rp , 1:3 = Cum2 Rp , Rp2 .
r = 3, then we shall have the term κRp ,3 δ ⊗3 , 1:3 = (3, 0, 0), C1 Rp , 1:3 =
Cum3 Rp
κ⊗
W,3 = κRp3 ,1 κ|Z|,3 δ
⊗3
+ 3κRp ,Rp2 κ|Z|,1 δ ⊗ κ ⊗ 3
V,2 + κRp ,3 κ|Z|,1 δ
2 ⊗3
2
= κRp3 ,1 κ|Z|,3 + κRp ,3 κ|Z|,1
3
− 3κRp ,Rp2 κ|Z|,1 δ ⊗3 + 3κRp ,Rp2 κ|Z|,1 δ ⊗ vec
π
= κRp |Z| ,3 δ ⊗3 + 3κRp ,Rp2 κ|Z|,1 δ ⊗ vec.
2 3 2p − 7
− − = 0,
p − 3 (p − 2) (p − 3) (p − 2) (p − 3)
2 3 2p − 7 2p − 4 − 3 − 2p + 7
− − = .
p − 3 (p − 2) (p − 3) (p − 2) (p − 3) (p − 2) (p − 3)
406 Solutions
√
Coefficient of δ ⊗3 is 2 (p/π)3/2 G3−1 (p) − 1/ (p − 3) 2/π.
Chapter 6
6.1
d
Yi2 = vecY Y = Y⊗2 (vecId ) = (vecId ) Y⊗2 ,
i=1
Y Y Y = (vecId ) Y⊗2 ⊗ Y = (vecId ) ⊗ Id Y⊗3 ,
Y Y Y = Y ⊗ (vecId ) Y⊗2 = Id ⊗ (vecId ) Y⊗3 ,
d
d
d
d
⊗2
⊗2
Y Y= Yi2 = ei Y ⊗ ei Y = ei Y⊗2 = ei Y⊗2
i=1 i=1 i=1 i=1
= (vecId ) Y⊗2 .
6.2
d
d
d
L−1 ⊗2
22 vec Id = 3 e⊗4
i + e⊗2 ⊗2
i ⊗ ej + ei ⊗ e⊗2
j ⊗ ei
i=1 i,j =1,i=j i,j =1,i=j
d
+ ei ⊗ ej ⊗ ei ⊗ ej ,
i,j =1,i=j
L−1 −1 −1
22 = Id 4 + K(1324) + K(1423) = Id 4 + K(1324) + K(1342) = Id 4 + Kp1 + Kp2 ,
d
d
vec⊗2 Id = e⊗4
i + e⊗2 ⊗2
i ⊗ ej ,
i=1 i,j =1,i=j
d d
Kp1 vec ⊗2
Id = e⊗4
i + ei ⊗ ej ⊗ ei ⊗ ej
i=1 i,j =1,i=j
d d
Kp2 vec⊗2 Id = Id ⊗ Kd 2 •d (vecId )⊗2 = e⊗4
i + ei ⊗ e⊗2
j ⊗ ei .
i=1 i,j =1,i=j
6.3 κ ⊗
4 is 4-symmetrical
2
(vecId )⊗2 a⊗4 = (vecId ) a⊗2 (vecId ) a⊗2 = a a = a⊗2 a⊗2 = a⊗4 vecId 2 .
Solutions 407
6.4
d
d
d
L−1 ⊗2
22 vec Id = 3 e⊗4
i + e⊗2 ⊗2
i ⊗ ej + ei ⊗ e⊗2
j ⊗ ei
i=1 i,j =1,i=j i,j =1,i=j
d
+ ei ⊗ ej ⊗ ei ⊗ ej ,
i,j =1,i=j
d
Id 2 ⊗ (vecId ) L−1 ⊗2
22 vec Id = Id 2 ⊗ e⊗2
i L−1 ⊗2
22 vec Id
i=1
= 3vecId + (d − 1) vecId = (d + 2) vecId .
6.5
vec Y Y YY = Y Y Y⊗2 = Y Y ⊗ Y⊗2 = Y⊗2 vecId ⊗ Y⊗2
= (vecId ) Y⊗2 Y⊗2 = (vecId ) ⊗ Id 2 Y⊗4
= Y⊗2 ⊗ Y⊗2 vecId
= Y⊗2 ⊗ (vecId ) Y⊗2 = Id 2 ⊗ (vecId ) Y⊗4 ,
vecB (Y) = Id 2 ⊗ (vecId ) κ ⊗4 + L−1
22 vec⊗2
Id − (d + 2) vecId
−1 ⊗2
= Id 2 ⊗ (vecId ) κ ⊗
4 + Id 2 ⊗ (vecId ) L22 vec Id − (d + 2) vecId
= Id 2 ⊗ (vecId ) κ ⊗
4.
6.6
( # )
2 !Y " E
2
Yi Y1
1
b (Y) = E Yi = # 2 .
Y2 E Yi Y2
References
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 409
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5
410 References
[CGLM08] Comon P, Golub G, Lim LH, Mourrain B (2008) Symmetric tensors and symmetric
tensor rank. SIAM J Matrix Anal Appl 30(3):1254–1279
[Cha67] Chambers JM (1967) On methods of asymptotic approximation for multivariate
distributions. Biometrika 54(3–4):367–383
[Cra99] Cramér H (1999) Mathematical methods of statistics, vol 43. Princeton University
Press, Princeton
[DGP18] Domino K, Gawron P, Pawela Ł (2018) Efficient computation of higher-order
cumulant tensors. SIAM J Sci Comput 40(3):A1590–A1610
[Dha18] Dharmani BC (2018) Multivariate generalized Gram–Charlier series in vector nota-
tions. J Math Chem 56(6):1631–1655
[DL05] Dey DK, Liu J (2005) A new construction for skew multivariate distributions. J
Multivar Anal 95(2):323–344
[DL08] De Silva V, Lim LH (2008) Tensor rank and the ill-posedness of the best low-rank
approximation problem. SIAM J Matrix Anal Appl 30(3):1084–1127
[DLM] NIST Digital Library of Mathematical Functions. http://dlmf.nist.gov/, Release 1.0.17
of 2017-12-22. Olver FWJ, Olde Daalhuis AB, Lozier DW, Schneider BI, Boisvert
RF, Clark CW, Miller BR, Saunders BV (eds)
[DM77] Dobrushin RL, Minlos RA (1977) Polynomials in linear random functions. Russ Math
Surv 32(2):71–127
[DM79a] Dobrushin RL, Major P (1979) Non-central limit theorems for non-linear functionals
of Gaussian fields. Z Wahrsch Verw Gebiete 50:27–52
[DM79b] Dobrushin RL, Minlos RA (1979) The moments and polynomials of a generalized
random field. Theory Probab Appl 23(4):686–699
[DX14] Dunkl CF, Xu Y (2014) Orthogonal polynomials of several variables, vol 155.
Cambridge University Press, Cambridge
[Ela61] Elandt RC (1961) The folded normal distribution: Two methods of estimating
parameters from moments. Technometrics 3(4):551–562
[EMOT81] Erdélyi A, Magnus W, Oberhettinger F, Tricomi FG (1981) Higher transcendental
functions. Vol. II. Robert E. Krieger Publishing Co. Inc., Melbourne, FL. Based on
notes left by Harry Bateman, Reprint of the 1953 original
[Fel40] Feldheim E (1940) Expansions and integral-transforms for products of Laguerre and
Hermite polynomials. Q J Math 1(1):18–29
[Fel66] Feller W (1966) An Introduction of Probability Theory and its Application, vol II.
John Wiley, New York
[Fis30] Fisher RA (1930) Moments and product moments of sampling distributions. Proc
Lond Math Soc 2(1):199–238
[FKN17] Fang KW, Kotz S, Ng KW (2017) Symmetric multivariate and related distributions.
Chapman and Hall/CRC, Boca Raton
[Gen04] Genton MG (2004) Skew-elliptical distributions and their applications: a journey
beyond normality. CRC Press, Boca Raton
[GHL01] Genton MG, He L, Liu X (2001) Moments of skew-normal random vectors and their
quadratic forms. Stat Prob Lett 51(4):319–325
[GJ12] Glimm J, Jaffe A (2012) Quantum physics: a functional integral point of view.
Springer Science & Business Media, New York
[GL05] Genton MG, Loperfido NMR (2005) Generalized skew-elliptical distributions and
their quadratic forms. Ann Inst Stat Math 57(2):389–401
[Goo75] Good IJ (1975) A new formula for cumulants. Math Proc Camb Philos Soc 78(2):333–
337
[GR00] Gradshteyn IS, Ryzhik IM (2000) Table of integrals, series, and products, 6th edn.
Academic Press Inc., San Diego, CA. Translated from the Russian, Translation edited
and with a preface by Alan Jeffrey and Daniel Zwillinger
[Gra49] Grad H (1949) Note on N-dimensional Hermite polynomials. Commun Pure Appl
Math 2(4):325–330
412 References
[Gra18] Graham A (2018) Kronecker products and matrix calculus with applications. Courier
Dover Publications, New York
[Gup63] Gupta SS (1963) Bibliography on the multivariate normal integrals and related topics.
Ann Math Stat 34(3):829–838
[Hal00] Hald A (2000) The early history of the cumulants and the Gram–Charlier series. Int
Stat Rev 68(2):137–153
[Har06] Hardy M (2006) Combinatorics of partial derivatives. Electron J Combin 13(1):1
[Hen02] Henze N (2002) Invariant tests for multivariate normality: a critical review. Stat Pap
43(4):467–506
[Hid80] Hida T (1980) Brownian motion. Springer, New York
[Hol85] Holmquist B (1985) The direct product permuting matrices. Linear Multilinear
Algebra 17(2):117–141
[Hol88] Holmquist B (1988) Moments and cumulants of the multivariate normal distribution.
Stoch Anal Appl 6(3):273–278
[Hol96a] Holmquist B (1996) The d-variate vector Hermite polynomial of order. Linear Algebra
Appl 237/238:155–190
[Hol96b] Holmquist B (1996) Expectations of products of quadratic forms in normal variables.
Stoch Anal Appl 14(2):149–164
[HS79] Henderson HV, Searle SR (1979) Vec and vech operators for matrices, with some uses
in jacobians and multivariate statistics. Canad J Stat 7(1):65–81
[HS81] Henderson HV, Searle SR (1981) The vec-permutation matrix, the vec operator and
Kronecker products: a review. Linear Multilinear Algebra 9(4):271–288
[Iso82] Isogai T (1982) On a measure of multivariate skewness and a test for multivariate
normality. Ann Inst Stat Math 34(1):531–541
[Iss18] Isserlis L (1918) On a formula for the product-moment coefficient of any order of a
normal frequency distribution in any number of variables. Biometrika 12(1/2):134–
139
[Jam58] James GS (1958) On moments and cumulants of systems of statistics. Sankhyā Indian
J Stat 20:1–30
[JM62] James GS, Mayne AJ (1962) Cumulants of functions of random variables. Sankhyā
Indian J Stat A 24:47–54
[JST06] Jammalamadaka S, Subba Rao T, Terdik Gy (2006) Higher order cumulants of random
vectors and applications to statistical inference and time series. Sankhya (A Methodol)
68:326–356
[JTT20] Jammalamadaka SR, Taufer E, Terdik GyH (2020) On multivariate skewness and
kurtosis. Sankhya A 83(2):1–38
[JTT21a] Jammalamadaka SR, Taufer E, Terdik GyH (2021) Asymptotic theory for statistics
based on cumulant vectors with applications. Scand J Stat 48(2):708–728
[JTT21b] Jammalamadaka SR, Taufer E, Terdik GyH (2021) Cumulants of multivariate sym-
metric and skew symmetric distributions. Symmetry 13(8):1383
[KBF+ 15] Khatib ME, Brea O, Fertitta E, Bendazzoli GL, Evangelisti S, Leininger T (2015) The
total position-spread tensor: Spin partition. J Chem Phys 142(9):094113
[Ken44] Kendall MG (1944) The advanced theory of statistics. Vol. I, J. B. Lippincott Co.,
Philadelphia
[Kim08] Kim HM (2008) A note on scale mixtures of skew normal distribution. Stat Probab
Lett 78(13):1694–1701
[Kla02] Klar B (2002) A treatment of multivariate skewness, kurtosis, and related statistics. J
Multivar Anal 83(1):141–165
[KM97] Kostrikin A, Manin Y (1997) Linear algebra and geometry. Breach Science Publishers
[KM03] Kim HJ, Mallick BK (2003) Moments of random vectors with skew t distribution and
their quadratic forms. Stat Probab Lett 63(4):417–423
[Kol08] Kollo T (2008) Multivariate skewness and kurtosis measures with an application in
ICA. J Multivar Anal 99(10):2328–2338
References 413
[Koz87] Koziol JA (1987) An alternative formulation of Neyman’s smooth goodness of fit tests
under composite alternatives. Metrika 34(1):17–24
[Koz89] Koziol JA (1989) A note on measures of multivariate kurtosis. Biom J 31(5):619–624
[KRYAV16] Kahrari F, Rezaei M, Yousefzadeh F, Arellano-Valle RB (2016) On the multivariate
skew-normal-Cauchy distribution. Stat Probab Lett 117:80–88
[KS77] Kendall M, Stuart A (1977) The advanced theory of statistics, Vol. 1: Distribution
theory, 4th edn. Griffin, London
[KS05] Kollo T, Srivastava MS (2005) Estimation and testing of parameters in multivariate
Laplace distribution. Commun Stat Theory Methods 33(10):2363–2387
[Kv95] Kollo T, von Rosen D (1995) Minimal moments and cumulants of symmetric matrices:
An application to the wishart distribution. J Multivar Anal 55(2):149–164
[KvR06] Kollo T, von Rosen D (2006) Advanced multivariate statistics with matrices, vol 579.
Springer Science & Business Media, New York
[Leo64] Leonov VP (1964) Nekotorye primeneniya starshikh semiinvariantov k teorii statsion-
arnykh sluchainykh protsessov. Izdat. “Nauka”, Moscow
[LHL18] Luan X, Huang B, Liu F (2018) Higher order moment stability region for Markov
jump systems based on cumulant generating function. Automatica 93:389–396
[Lop14] Loperfido N (2014) A note on the fourth cumulant of a finite mixture distribution. J
Multivar Anal 123:386–394
[LS59] Leonov VP, Shiryaev AN (1959) On a method of calculation of semi-invariants.
Theory Probab Appl 4(3):319–329
[Luk55] Lukacs E (1955) Applications of Faa di Bruno’s formula in statistics. Am Math
Monthly 62:340–348
[Luk70] Lukacs E (1970) Characteristic functions. Griffin
[MA73] Malkovich JF, Afifi AA (1973) On tests for multivariate normality. J Am Stat Assoc
68(341):176–179
[Ma09] Ma TW (2009) Higher chain formula proved by combinatorics. Electron J Combin
16(1):N21
[Mac74] MacRae EC (1974) Matrix derivatives with an application to an adaptive linear
decision problem. Ann Stat 2:337–346
[Maj81] Major P (1981) Multiple Wiener–Itô integrals, Lecture Notes in Mathematics, vol 849,
2nd, 2014 edn. Springer, New York
[Mal78] Malahov AN (1978) Kumulyantnyiy analiz sluchaynyih negaussovyih protsessov i
ih preobrazovaniy [Cumulant analysis of random non-Gaussian processes and their
transformations]. Sov. Radio Publ, Moskow, P 376
[Mal80] Malyshev VA (1980) Cluster expansions in lattice models of statistical physics and
the quantum theory of fields. Uspekhi Mat Nauk 35(2(212)):2–53
[Mar70] Mardia KV (1970) Measures of multivariate skewness and kurtosis with applications.
Biometrika 57:519–530
[MC86] McCullagh P, Cox DR (1986) Invariants and likelihood ratio statistics. Ann Stat
14(4):1419–1430
[McC87] McCullagh P (1987) Tensor methods in statistics. Monographs on statistics and
applied probability. Chapman & Hall, London
[McC18] McCullagh P (2018) Tensor methods in statistics. Courier Dover Publications, New
York
[McK52] McKean HPJ (1952) A new proof of the completeness of the hermite functions. Tech.
rep., Mathematical Notes
[Mei05] Meijer E (2005) Matrix algebra for higher order moments. Linear Algebra Appl
410:112–134
[Mia19] Miatto FM (2019) Recursive multivariate derivatives of arbitrary order. Preprint.
arXiv:191111722
[MM85] Malyshev VA, Minlos RA (1985) Gibbs random fields. Nauka, Moscow
414 References
[MN80] Magnus JR, Neudecker H (1980) The elimination matrix: Some lemmas and applica-
tions. SIAM J Algebraic Discrete Methods 1(4):422–449
[MN99] Magnus JR, Neudecker H (1999) Matrix differential calculus with applications in
statistics and econometrics. John Wiley & Sons Ltd., Chichester. Revised reprint of
the 1988 original
[MNBO09] Michalowicz JV, Nichols JM, Bucholtz F, Olson CC (2009) An Isserlis’ theorem
for mixed Gaussian variables: Application to the auto-bispectral density. J Stat Phys
136(1):89–102
[Mor82] Morris CN (1982) Natural exponential families with quadratic variance functions. Ann
Stat 10(1):65–80
[MP92] Mathai AM, Provost SB (1992) Quadratic forms in random variables: theory and
applications. Dekker
[MRS94] Móri TF, Rohatgi VK, Székely GJ (1994) On multivariate skewness and kurtosis.
Theory Probab Appl 38(3):547–551
[Mui09] Muirhead RJ (2009) Aspects of multivariate statistical theory, vol 197. John Wiley &
Sons, New York
[Neu69] Neudecker H (1969) Some theorems on matrix differentiation with special reference
to Kronecker matrix products. J Am Stat Assoc 64(327):953–963
[NGS08] Di Nardo E, Guarino G, Senato D (2008) Symbolic computation of moments of
sampling distributions. Comput Stat Data Anal 52(11):4909–4922
[Nor84] Northcott DG (1984) Multilinear algebra. Cambridge University Press, Cambridge
[NR03] Noschese S, Ricci PE (2003) Differentiation of multivariable composite functions and
bell polynomials. J Comput Anal Appl 5(3):333–340
[NS09] Di Nardo E, Senato D (2009) The eleventh and twelveth problems of rota’s fubini
lectures: from cumulants to free probability theory. In: From combinatorics to
philosophy, Springer US, New York, pp 91–130
[OST18] Orlowski P, Sali A, Trojani F (2018) Arbitrage free dispersion. SSRN Electron J, 78.
https://doi.org/10.2139/ssrn.3314269
[PP01] Psarakis S, Panaretos J (2001) On some bivariate extensions of the folded normal and
the folded-t distributions. J Appl Stat Sci 10(2):119–136
[PS72] Pólya Gy, Szegő G (1972) Problems and theorems in analysis, vol I. Springer, Berlin,
Heidelberg
[PT11] Peccati G, Taqqu MS (2011) Wiener chaos: moments, cumulants and diagrams: A
survey with computer implementation, vol 1. Springer Science & Business Media,
New York
[Rah17] Rahman S (2017) Wiener–Hermite polynomial expansion for multivariate gaussian
probability measures. J Math Anal Appl 454(1):303–334
[RB89] Rayner JCW, Best DJ (1989) Smooth tests of goodness of fit. Oxford statistical science
series. Oxford University Press, New York
[Rob16] Robeva E (2016) Orthogonal decomposition of symmetric tensors. SIAM J Matrix
Anal Appl 37(1):86–102
[RS00] Rota G, Shen J (2000) On the combinatorics of cumulants. J Combin Theory A
91:283–304. http://www.idealibrary.com
[Sav06] Savits TH (2006) Some statistical applications of Faa di Bruno. J Multivar Anal
97(10):2131–2140
[Sch19] Schumann A (2019) Multivariate bell polynomials and derivatives of composed
functions. Preprint. arXiv:190303899
[SDB03] Sahu SK, Dey DK, Branco MD (2003) A new class of multivariate skew distributions
with applications to Bayesian regression models. Can J Stat 31(2):129–150
[Ser04] Serfling RJ (2004) Multivariate symmetry and asymmetry. Encyclopedia Stat Sci
8:5338–5345
[Shi60] Shiryaev AN (1960) Some problems in spectral theory of higher order- moments I.
Theor Prob Appl 5:293–313
[Shi96] Shiryaev AN (1996) Probability, vol 95. Springer, New York
References 415
[Sho10] Shopin SA (2010) Cubic mapping of the normal random vector. Izvestiya Tula State
Univ 2010(2):211 – 221
[Shu16] Shushi T (2016) A proof for the conjecture of characteristic function of the generalized
skew-elliptical distributions. Stat Probab Lett 119:301–304
[Sko86] Skovgaard I (1986) A note on the differentiation of cumulants of log likelihood
derivatives. Int Stat Rev 54(1):29–32
[Son01] Song KS (2001) Rényi information, loglikelihood and an intrinsic distribution mea-
sure. J Stat Plan Inference 93(1–2):51–69
[Spe83] Speed TP (1983) Cumulants and partition lattices. Austral J Stat 25(2):378–388
[Spe86a] Speed TP (1986) Cumulants and partition lattices. II. Generalised k-statistics. J
Austral Math Soc Ser A 40(1):34–53
Speed TP (1986) Cumulants and partition lattices. III. Multiply-indexed arrays. J
Austral Math Soc Ser A 40(2):161–182
[Spe86c] Speed TP (1986) Cumulants and partition lattices. IV. A.s. convergence of generalised
k-statistics. J Austral Math Soc Ser A 41(1):79–94
[Spe90] Speed TP (1990) Invariant moments and cumulants. In: Coding theory and design
theory, Part II, IMA Vol. Math. Appl., vol 21, Springer, New York, pp 319–335
[Sri84] Srivastava MS (1984) A measure of skewness and kurtosis and a graphical method for
assessing multivariate normality. Stat Probab Lett 2(5):263–267
[SS88a] Speed TP, Silcock HL (1988) Cumulants and partition lattices. V. Calculating
generalized k-statistics. J Austral Math Soc Ser A 44(2):171–196
[SS88b] Speed TP, Silcock HL (1988) Cumulants and partition lattices. VI. Variances and
covariances of mean squares. J Austral Math Soc Ser A 44(3):362–388
[ST08] Swe M, Taniguchi M (2008) Higher-order asymptotic properties of a weighted
estimator for Gaussian ARMA processes. J Time Series Anal 12(1):83–93
[Ste93] Steyn HS (1993) On the problem of more than one kurtosis parameter in multivariate
analysis. J Multivar Anal 44(1):1–22
[Sut86] Sutradhar BC (1986) On the characteristic function of multivariate Student t-
distribution. Can J Stat/La Revue Canadienne de Statistique 14(4):329–337
[Sze36] Szegő G (1936) Orthogonal polynomials. American Mathematical Society, Collo-
quium Publ., vol XXIII. American Mathematical Society, New York
[Taq75] Taqqu MS (1975) Weak convergence to fractional Brownian motion and to the
Rosenblatt process. Z Wahrsch verwandte Geb 31:287–302
[Ter02] Terdik Gy (2002) Higher order statistics and multivariate vector hermite polynomials
for nonlinear analysis of multidimensional time series. Teor Ver Matem Stat (Teor
Imovirnost ta Matem Statyst) 66:147–168
[Thi97] Thiélé TN (1897) Elementaer Iagttagelseslaere. København: Gyldendalske. Reprinted
in English as ‘The Theory of Observations’ Ann. Math. Statist. (1931) 2:165–308
[VN94] Voinov VG, Nikulin M (1994) On power series, bell polynomials, Hardy-Ramanujan-
Rademacher problem and its statistical applications. Kybernetika 30(3):343–358
[VN97] Voinov VG, Nikulin MS (1997) On a subset sum algorithm and its probabilistic and
other applications. Birkhäuser Boston, Boston, MA, pp 153–163
[VPO19] Vicuña MI, Palma W, Olea R (2019) Minimum distance estimation of locally
stationary moving average processes. Comput Stat Data Anal 140:1–20
[Wat38] Watson GN (1938) A note on the polynomials of Hermite And Laguerre. J Lond Math
Soc 1(1):29–32
[Wei15] Weis M (2015) Multi-dimensional signal decomposition techniques for the analysis
of eeg data. PhD thesis, Universitätsbibliothek Ilmenau
[Whi53] Whittle P (1953) The analysis of multiple stationary time series. J R Stat Soc Ser B
15:125–139
[Wit84] Withers CS (1984) A chain rule for differentiation with applications to multivariate
hermite polynomials. Bull Austral Math Soc 30(2):247–250
416 References
[Wit00] Withers CS (2000) A simple expression for the multivariate Hermite polynomials. Stat
Probab Lett 47(2):165–169
[Wün00a] Wünsche A (2000) Corrigenda: “General Hermite and Laguerre two-dimensional
polynomials”. J Phys A 33(17):3531
[Wün00b] Wünsche A (2000) General Hermite and Laguerre two-dimensional polynomials. J
Phys A 33(8):1603–1629
Index
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 417
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5
418 Index
S
I Semifactorial, 381
Inner product, 382 Solutions, 385
Star product, 51
Subscription, 384
M Symmetrizer, 14
Mill’s ratio, 241
Moment
of Gaussian system, 190, 216 T
of Hermite polynomial, 192, 217 Tensor product, 5
of order m, 108 Type
via cumulants, 138 multi-index, 19, 186
Multi-index, 16 partition, 32, 33, 45, 65, 66, 90, 91, 131,
Multilinear Algebra, 13 135, 156
P V
Partitions, 26, 383 Vec operator, 6
indecomposable, 38, 144, 148