
Frontiers in Probability and the Statistical Sciences

György Terdik

Multivariate
Statistical
Methods
Going Beyond the Linear
Frontiers in Probability and the Statistical
Sciences

Editor-in-Chief
Somnath Datta, Sch of Public Health & Info Sci, University of Louisville,
Louisville, KY, USA

Series Editors
Frederi G. Viens, 1399 Math, Science Building, Purdue University, West Lafayette, IN, USA
Dimitris N. Politis, Dept Math, APM 5701, University of California, San Diego,
La Jolla, CA, USA
Konstantinos Fokianos, Mathematics & Statistics, University of Cyprus, Nicosia, Cyprus
Michael Daniels, J.T. Patterson Labs Bldg PAT 141-MC, University of Texas,
Austin, USA
The “Frontiers” is a new series of books (edited volumes and monographs) in
probability and statistics designed to capture exciting trends in current research
as they develop. Some emphasis will be given to novel methodologies that may
have interdisciplinary applications in scientific fields such as biology, ecology,
economics, environmental sciences, finance, genetics, material sciences, medicine,
omics studies and public health.

More information about this series at http://www.springer.com/series/11957


György Terdik

Multivariate Statistical
Methods
Going Beyond the Linear

– Vector-Moments and Vector-Cumulants


– Nonlinear Statistics of Normal Multivariates
– Testing Skewness and Kurtosis
György Terdik
Faculty of Informatics, Dept. of IT
University of Debrecen
Debrecen, Hungary

ISSN 2624-9987 ISSN 2624-9995 (electronic)


Frontiers in Probability and the Statistical Sciences
ISBN 978-3-030-81391-8 ISBN 978-3-030-81392-5 (eBook)
https://doi.org/10.1007/978-3-030-81392-5

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To
Judit,
and
Bori and Bálint and Ábel
Foreword

As the author rightly points out, much of what we call linear modeling and inference
in multivariate statistics is closely tied to the multivariate Gaussian distribution and
allied topics. Indeed, the classic text by Professor C.R. Rao is aptly titled Linear
Statistical Inference (1973, Wiley) as is one of the books recently co-authored by me
which is titled Linear Models and Regression with R (2020, World Scientific Press).
Non-Gaussian multivariate distributions are characterized by nonlinearity, where the
cumulants play a prominent role. This monograph is a very deep and substantial
piece of work analyzing the cumulants and related statistical measures like the
skewness and kurtosis for the most commonly discussed non-Gaussian multivariate
distributions. This is also a great place to learn about and polish mathematical
prerequisites like multilinear algebra and tensor products for higher-order partial
derivatives of vector-valued functions. A large number of exercises at the end of
each chapter is an added bonus for those who want to use this as a text for an
advanced course. The extensive set of nearly two hundred references, which range
from a manuscript that goes back to 1931 to the most recent papers appearing in
2020 and 2021, demonstrates the breadth and depth of coverage.
This monograph will be a standard reference for those who are interested in
multivariate distribution theory that goes beyond the multidimensional normal, and
will remain so for many years to come!

S. Rao Jammalamadaka
Distinguished Professor, Dept. of Statistics and Applied Probability
University of California, Santa Barbara
Santa Barbara, CA, USA
February 25, 2021

Preface

Linear theory in statistics is closely connected to the normal distribution. One of the
main reasons for this is that the best predictor is linear when the random variables are
jointly normal, and an important characterization of the normal distribution is that
all cumulants of order higher than 2 are zero. The linear theory is well
defined, unlike the nonlinear theory, which can proceed in many different ways. A
careful study of the cumulants is a necessary and typical part of nonlinear statistics.
Such a study of cumulants for multivariate distributions is made complicated by
the index notations. One solution to this problem is the usage of tensor analysis.
In this book we offer an alternate method, which we believe is simpler to follow.
The higher-order cumulants with the same degree for a multivariate vector can
be collected together and kept as a vector. To be able to do so, we introduce a
particular differential operator on a multivariate function, called the T -derivative,
and use it to obtain cumulants and provide results which are somewhat analogous to
well-known results in the univariate case. We demonstrate this technique through
the characterization of several multivariate distributions via their cumulants and
by extending the discussion to statistical inference for multivariate skewness and
kurtosis.
The book is organized as follows:
Chapter 1 introduces some basic notions and methods which are used in
permutations, matrix theory, multilinear algebra, and set partitions.
Chapter 2 deals with the method of tensor products for the higher-order partial
derivatives of vector-valued functions. Faà di Bruno’s formula is also discussed here.
Chapter 3 concerns the basic theory of T -moments and T -cumulants. Besides
connections between cumulants and moments, there are results which relate the
cumulants of products to products of cumulants, conditional cumulants, etc.
Chapter 4 covers the elementary theory of the nonlinear Hilbert space of
Gaussian variates. Multivariate vector-valued Hermite polynomials are introduced
with their basic properties and their moments and cumulants are derived.
Chapters 5 and 6 deal with various applications of the material of the previous
chapters. In particular, Chap. 5 deals with the cumulants of multivariate skew-
distributions, including skew-normal, skew-spherical, skew-t, scale mixtures of
skew-normal, skew-normal-Cauchy, and Laplace distributions. The final Chap. 6 is


devoted to statistical inference for multivariate skewness and kurtosis. We study the
estimators of higher-order cumulant vectors and their asymptotic normality. Explicit
formulae for the asymptotic covariances of estimated skewness and kurtosis are
given.
The results presented in the book can be followed rather easily because specific
references to the formulae that have been used previously are given step by step.
Nevertheless, a close reading of the core Chaps. 3 and 4 is required for a proper
understanding.
The exercises serve to give practice with the methods; some of them are alluded
to as simple facts in the text but left unproved. Selected solutions, some useful
formulae, as well as basic notations are listed in the Appendix of the book under
the titles Solutions, Formulae, and Notations, respectively.
The book is designed both as text for advanced graduate-level courses in
multivariate statistical analysis and as a reference book for research workers
interested in this area. The R package MultiStatM is available for applications of
the theory considered in this book.
During the summer of 2001, when Professor Tata Subba Rao visited the
University of Debrecen, we discussed the idea and agreed that statisticians need
to pay more attention to cumulants in general. We proposed to write a book on
the elementary treatment of cumulants, but unfortunately that work was never
continued. In the present book, we use some of the ideas and even some material on
cumulants for scalar-valued variates that we discussed at that time.
It is a great pleasure to thank Professor Sreenivasa Rao Jammalamadaka at
the University of California, Santa Barbara, and Professor Emanuele Taufer at
the University of Trento for numerous discussions and joint work concerning
multivariate skewness and kurtosis.
Special thanks go to Professor Sreenivasa Rao Jammalamadaka for a careful
reading of the manuscript and useful suggestions which improved the presentation
of the book.

Debrecen, Hungary
January 2021

György Terdik
Contents

1 Some Introductory Algebra .. . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 1


1.1 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 1
1.2 Tensor Product, vec Operator, and Commutation . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Tensor Product . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5
1.2.2 The vec Operator.. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6
1.2.3 Commutation Matrices . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7
1.2.4 Commuting T-Products of Vectors .. . . . . .. . . . . . . . . . . . . . . . . . . . 10
1.3 Symmetrization and Multilinear Algebra.. . . . . . . .. . . . . . . . . . . . . . . . . . . . 13
1.3.1 Symmetrization . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 13
1.3.2 Multi-Indexing, Elimination, and Duplication . . . . . . . . . . . . . . 16
1.4 Partitions and Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 26
1.4.1 Generating all Partitions . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 29
1.4.2 The Number of All Partitions . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 30
1.4.3 Canonical Partitions . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 34
1.4.4 Partitions and Permutations . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 35
1.4.5 Partitions with Lattice Structure . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 37
1.4.6 Indecomposable Partitions . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 38
1.4.7 Alternative Ways of Checking Indecomposability . . . . . . . . . . 40
1.4.8 Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 43
1.5 Appendix .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 49
1.5.1 Proof of Lemma 1.1 . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 49
1.5.2 Proof of Lemma 1.3 . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 50
1.5.3 Proof of Lemma 1.5 . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 50
1.5.4 Star Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 51
1.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 52
1.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 59
2 The Tensor Derivative of Vector Functions . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 61
2.1 Derivatives of Composite Functions .. . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 61
2.1.1 Faà di Bruno’s Formula.. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 63
2.1.2 Mixed Higher-Order Derivatives .. . . . . . . .. . . . . . . . . . . . . . . . . . . . 67


2.2 T-derivative.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 70
2.2.1 Differentials and Derivatives . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 70
2.2.2 The Operator of T-derivative . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 72
2.2.3 Basic Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 74
2.2.4 T-derivative of T-products . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 77
2.2.5 Taylor Series Expansion .. . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 88
2.3 Multi-Variable Faà di Bruno’s Formula . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 90
2.4 Appendix .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 98
2.4.1 Proof of Faà di Bruno’s Lemma.. . . . . . . . .. . . . . . . . . . . . . . . . . . . . 98
2.4.2 Proof of Faà di Bruno’s T-formula .. . . . . .. . . . . . . . . . . . . . . . . . . . 99
2.4.3 Moment Commutators .. . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 100
2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 101
2.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 106
3 T-Moments and T-Cumulants . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 107
3.1 Multiple Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 107
3.2 Tensor Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 110
3.3 Cumulants for Multiple Variables.. . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 118
3.3.1 Definition of Cumulants .. . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 118
3.3.2 Definition of T-cumulants . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 120
3.3.3 Basic Properties .. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 124
3.4 Expressions between Moments and Cumulants ... . . . . . . . . . . . . . . . . . . . 126
3.4.1 Expression for Cumulants via Moments .. . . . . . . . . . . . . . . . . . . . 126
3.4.2 Expressions for Moments via Cumulants.. . . . . . . . . . . . . . . . . . . 138
3.4.3 Expression of the Cumulant of Products via
Products of Cumulants . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 143
3.5 Additional Matters .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 149
3.5.1 Expressions of Moments and Cumulants via
Preceding Moments and Cumulants . . . . .. . . . . . . . . . . . . . . . . . . . 149
3.5.2 Cumulants and Fourier Transform . . . . . . .. . . . . . . . . . . . . . . . . . . . 151
3.5.3 Conditional Cumulants . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 154
3.5.4 Cumulants of the Log-likelihood Function .. . . . . . . . . . . . . . . . . 161
3.6 Appendix .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 167
3.6.1 Proof of Lemma 3.6 and Theorem 3.7 . . .. . . . . . . . . . . . . . . . . . . . 167
3.6.2 A Hint for Proof of Lemma 3.8 . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 170
3.6.3 Proof of Lemma 3.2 . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 172
3.6.4 Proof of Lemma 3.5 . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 173
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 174
3.8 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 180
4 Gaussian Systems, T-Hermite Polynomials, Moments,
and Cumulants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 183
4.1 Hermite Polynomials in One Variable . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 183
4.2 Hermite Polynomials of Several Variables . . . . . . .. . . . . . . . . . . . . . . . . . . . 185
4.3 Moments and Cumulants for Gaussian Systems .. . . . . . . . . . . . . . . . . . . . 190

4.3.1 Moments of Gaussian Systems and Hermite Polynomials . . . . . . . . . . . . 190
4.3.2 Cumulants for Product of Gaussian Variates and
Hermite Polynomials . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 193
4.4 Products of Hermite Polynomials, Linearization.. . . . . . . . . . . . . . . . . . . . 197
4.5 T-Hermite Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 202
4.6 Moments, Cumulants, and Linearization . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 216
4.6.1 Cumulants for T-Hermite Polynomials . .. . . . . . . . . . . . . . . . . . . . 220
4.6.2 Products for T-Hermite Polynomials.. . . .. . . . . . . . . . . . . . . . . . . . 223
4.7 Gram–Charlier Expansion.. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 226
4.8 Appendix .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 232
4.8.1 Proof of Theorem 4.2 .. . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 232
4.8.2 Proof of (4.79) . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 233
4.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 234
4.10 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 238
5 Multivariate Skew Distributions . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 241
5.1 The Multivariate Skew-Normal Distribution . . . . .. . . . . . . . . . . . . . . . . . . . 241
5.1.1 The Inverse Mill’s Ratio and the Central Folded
Normal Distribution . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 242
5.1.2 Skew-Normal Random Variates . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 244
5.1.3 Canonical Fundamental Skew-Normal (CFUSN)
Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 247
5.2 Elliptically Symmetric and Skew-Spherical Distributions .. . . . . . . . . . 252
5.2.1 Elliptically Contoured Distributions . . . . .. . . . . . . . . . . . . . . . . . . . 253
5.2.2 Multivariate Moments and Cumulants . . .. . . . . . . . . . . . . . . . . . . . 261
5.2.3 Canonical Fundamental Skew-Spherical Distribution . . . . . . 266
5.3 Multivariate Skew-t Distribution .. . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 275
5.3.1 Multivariate t-Distribution.. . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 275
5.3.2 Skew-t Distribution . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 277
5.3.3 Higher-Order Cumulants of Skew-t Distributions.. . . . . . . . . . 278
5.4 Scale Mixtures of Skew-Normal Distribution .. . .. . . . . . . . . . . . . . . . . . . . 285
5.5 Multivariate Skew-Normal-Cauchy Distribution .. . . . . . . . . . . . . . . . . . . . 287
5.5.1 Moments of h(|Z|) . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 292
5.6 Multivariate Laplace .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 294
5.7 Appendix .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 297
5.7.1 Spherically Symmetric Distribution . . . . .. . . . . . . . . . . . . . . . . . . . 297
5.7.2 T-Derivative of an Inner Product .. . . . . . . .. . . . . . . . . . . . . . . . . . . . 303
5.7.3 Proof of (5.44).. . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 304
5.7.4 Proof of Lemma 5.6 .. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 306
5.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 307
5.9 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 310
6 Multivariate Skewness and Kurtosis .. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 313
6.1 Multivariate Skewness of Random Vectors.. . . . . .. . . . . . . . . . . . . . . . . . . . 313
6.2 Multivariate Kurtosis of Random Vectors . . . . . . . .. . . . . . . . . . . . . . . . . . . . 321

6.3 Indices Based on Distinct Elements of Cumulant Vectors .. . . . . . . . . . 327


6.4 Testing Multivariate Skewness .. . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 328
6.4.1 Estimation of Skewness . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 329
6.4.2 Testing Zero Skewness . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 333
6.5 Testing Multivariate Kurtosis . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 337
6.5.1 Estimation of Kurtosis .. . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 338
6.5.2 Testing Zero Kurtosis .. . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 341
6.6 A Simulation Study .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 343
6.7 Appendix .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 346
6.7.1 Estimated Hermite Polynomials . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 346
6.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 347
6.9 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 348

A Formulae .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 351
A.1 Bell Polynomials.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 351
A.1.1 Incomplete (Partial) Bell Polynomials .. .. . . . . . . . . . . . . . . . . . . . 351
A.1.2 Bell Polynomials .. . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 352
A.2 Commutators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 353
A.2.1 Moment Commutators .. . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 353
A.2.2 Commutators Connected to T-Hermite Polynomials .. . . . . . . 356
A.3 Derivatives of Composite Functions .. . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 359
A.4 Moments, Cumulants .. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 361
A.4.1 T-Moments, T-Cumulants . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 361
A.5 Hermite Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 363
A.5.1 Product of Hermite Polynomials .. . . . . . . .. . . . . . . . . . . . . . . . . . . . 363
A.5.2 T-Hermite Polynomials .. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 365
A.6 Function G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 368
A.6.1 Moments, Cumulants for Skew-t Generator R . . . . . . . . . . . . . . 370
A.6.2 Moments of Beta Powers . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 374
A.7 Complementary Error Function .. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 376
A.8 Derivatives of i-Mill’s Ratio. . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 378
Notations . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 381
Solutions. . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 385
References .. .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 409
Index . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 417
Chapter 1
Some Introductory Algebra

Abstract In this chapter, we summarize some basic theory concerning permutations, multilinear algebra, set partitions, and diagrams for usage throughout
the later part of the book. We start with the notion of permutations and then
discuss the tensor product (T -product), vec operator, and commutation matrices.
A transformation of multiple T -products of vectors into a given order is also
considered. In the section on symmetrization and multilinear algebra, we introduce
the symmetrizer in the framework of multilinear algebra of Cartesian tensors. In
connection with the symmetric subspace of tensors we construct linear operators
in terms of matrices for the elimination of identical entries from a tensor as well
as the duplication, triplication, quadruplication, etc. of tensors of distinct entries.
The Partitions and Diagrams section includes the inclusive and exclusive method
of extending partitions to derive all partitions of a finite set. The use of Bell
polynomials and Bell numbers for obtaining the partitions is outlined. Partitions
with lattice structure are considered mainly to discuss indecomposable partitions
and diagrams. A discussion of the particular cases of diagrams, such as closed
diagrams without loops and closed diagrams with arms and no loops, concludes
this chapter.

1.1 Permutations

A set of n distinct objects can be rearranged in several ways. Any of these arrangements can be considered as a function of the set mapped to itself. In this
sense, a permutation is a bijection. We identify the distinct elements with integers
1 : n = (1, 2, . . . , n).
Let Pn denote the set of all permutations of numbers 1 : n; if p ∈ Pn, then p can be represented by the following mapping:

p = \begin{pmatrix} 1 & 2 & \cdots & n \\ p(1) & p(2) & \cdots & p(n) \end{pmatrix},

p(1 : n) = (p(1), p(2), . . . , p(n)) for short. Now let us consider a permutation q(1 : n); then the product (composition) of p and q is defined by

pq = \begin{pmatrix} 1 & 2 & \cdots & n \\ p(q(1)) & p(q(2)) & \cdots & p(q(n)) \end{pmatrix}.

The product operation of permutations has the important property that it is not
commutative, as the following example illustrates.
Example 1.1 Let

p = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}, \qquad q = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix};

then

pq = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \qquad qp = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}.

Thus we see that pq ≠ qp, so the product operation does not commute.
We will denote the inverse permutation of p by p−1 , for which pp−1 = p−1 p = e,
where e is the identity mapping. Pn constitutes a group with product operation
pq. This group Pn is not commutative. Its generators are the interchanges of
adjacent elements (i, i + 1), called (standard) “transpositions.” Each permutation is
equivalent to a series (product) of transpositions, where the number of transpositions
is kept to a minimum.
We will not necessarily separate the numbers by commas, if they each have one
digit in a permutation. For instance, p = (312) is an abbreviation for the permutation
p = (3, 1, 2) of numbers 1:3.
Remark 1.1 A simple way to generate the inverse of a permutation p is to take its elements from 1 to n and write down the original indices (places) in that order. For instance, let p = (312); here "1" has index 2, "2" has index 3, and "3" has index 1; therefore, the result is p^{-1} = (231). This simple algorithm for inverting permutations can easily be implemented as computer code since usually, when sorting the entries of a vector, one can also record the original indices of the entries.
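For readers who wish to experiment, the sorting-based inversion of Remark 1.1 can be sketched in a few lines of Python/NumPy; the function name invperm is our own choice and is not taken from any package used in this book.

```python
import numpy as np

def invperm(p):
    """Invert a permutation given as a 0-based NumPy array.

    np.argsort sorts the entries of p and records their original
    positions, which is exactly the algorithm of Remark 1.1.
    """
    return np.argsort(p)

# p = (3, 1, 2) in the book's 1-based notation:
p = np.array([3, 1, 2]) - 1      # 0-based: [2, 0, 1]
print(invperm(p) + 1)            # [2 3 1], i.e. p^{-1} = (231)
```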
We will use commutator matrices to change the order of a tensor product
of vectors. In this way, we reorder the terms in a product and obtain a new
permutation of the vectors. We can apply commutator matrices successively and
obtain new permutations. The consecutive application of matrices corresponds to
the consecutive application of permutations; e.g. first p then q, will be denoted by
q × p.
The following remark is worth noting.

Remark 1.2 The consecutive application of permutations is equal to the product of permutations but in the opposite order, i.e. p × q = p(q(1 : n)) = (qp)(1 : n).
Example 1.2 Let p = (1243), i.e. p interchanges the last two elements, and let q
interchange the two elements in the middle, q = (1324), and then the consecutive
application of p followed by q is

q × p = q (p (1 : 4)) = q ((1243)) = (1423) .

The product pq = (1423) gives the same result as well.


We shall use the cycle notation for permutations, particularly when it simplifies expressions. Let us apply the permutation p to itself repeatedly on an element, say, k. Then we have p(k), p^2(k), . . . , and so on until we return to the original element, p^ℓ(k) = k. The sequence α_ℓ(k) = (k, p(k), p^2(k), . . . , p^{ℓ-1}(k))_S is called a cycle, and ℓ is the length of the cycle. We also refer to α_ℓ as the ℓ-cycle. A cycle α_ℓ(k) is defined for permutations of at least max(α_ℓ(k)) elements; in a natural way it is defined for higher-order permutations as well. For instance, the cycle (3, 2)_S makes sense for all permutations of order n larger than 2 and simply denotes the interchange of elements 3 and 2.
A permutation can be decomposed into one or more disjoint cycles. The cycles of each permutation determine a partition of 1 : n. The cycle notation is not unique, and each element of a cycle α_ℓ(k) can be chosen as a starting element since α_ℓ(k) = α_ℓ(p(k)). One can use canonical cycle notation, where the largest element is listed first in each cycle, and the cycles are sorted in increasing order of their first element.
Example 1.3 Consider

p = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 2 & 4 \end{pmatrix};

then p(1) = 3, p^2(1) = 2, and p^3(1) = 1, and therefore, α_3 = (132)_S is a cycle of p with length ℓ = 3. We have p(4) = 4, so α_1 = (4)_S is a 1-cycle. The corresponding partition of 1 : 4 is {(1, 2, 3), (4)}. The cycle notation of p is p = (132)_S since the 1-cycle is usually not listed. The permutation p = (132)_S means that all elements that are not listed in the cycle remain unchanged. The canonical cycle notation of p = (3124) is p = (321)_S.
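The cycle decomposition used in Example 1.3 is easy to compute. The following short Python sketch (the helper name cycles is our own choice) follows k, p(k), p^2(k), . . . until the starting element recurs:

```python
def cycles(p):
    """Decompose a 1-based permutation (tuple/list) into its cycles."""
    n = len(p)
    seen, out = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        cyc, k = [], start
        while k not in seen:        # follow k -> p(k) until we return to start
            seen.add(k)
            cyc.append(k)
            k = p[k - 1]
        out.append(tuple(cyc))
    return out

print(cycles((3, 1, 2, 4)))   # [(1, 3, 2), (4,)], i.e. the cycle (132)_S and the 1-cycle (4)_S
```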
Permutations that transpose only two elements, say, j and k (j ≠ k), have one cycle only, p = (j, k)_S, which will be denoted by p_{(j,k)_S} as well. Another simple case, which will be used frequently, is when we move the kth element to the last place such that we shift back the series of elements (k + 1, . . . , n) by one lag and leave the rest of the (1, . . . , k − 1) elements unchanged. This permutation in cycle notation is p = (k, k + 1, . . . , n)_S = (k : n)_S, for short, with length ℓ = n − k + 1, in canonical form p = (n, k : (n − 1))_S.

Permutations can be considered as column vectors in a natural way. A square matrix P is said to be a permutation matrix if each row and each column of P contain a single element 1, and all remaining elements are zero. A unique permutation matrix P_p corresponds to each p ∈ Pn so that

P_p [1, 2, . . . , n] = [p(1), p(2), . . . , p(n)],

i.e. the (j, p(j)) elements of matrix P_p are ones. Any permutation matrix is orthogonal, that is,

P'_p = P_p^{-1} = P_{p^{-1}}.

Example 1.4 Let p = (312) = (3 : 1)_S; then

P_p \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix},

and the inverse is

P_p^{-1} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix};

now we conclude p^{-1} = (231); indeed, pp^{-1} = p × p^{-1} = (123).
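A permutation matrix with ones at the (j, p(j)) positions can be generated and checked numerically. The following Python/NumPy sketch (the helper name perm_matrix is ours) reproduces the computation of Example 1.4 and the orthogonality property:

```python
import numpy as np

def perm_matrix(p):
    """Permutation matrix P_p with ones at the (j, p(j)) positions (1-based p)."""
    n = len(p)
    P = np.zeros((n, n), dtype=int)
    for j, pj in enumerate(p):
        P[j, pj - 1] = 1
    return P

P = perm_matrix((3, 1, 2))                     # p = (312)
x = np.array([1, 2, 3])
print(P @ x)                                   # [3 1 2]
print(np.allclose(P.T, np.linalg.inv(P)))      # True: P' = P^{-1}
```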


A particular permutation of 1 : 2n will be of some interest when we consider the
expected value of the product of two Hermite polynomials (see (4.57), p. 217).
If p ∈ Pn , then we define mn (p) = (1, p (1) + n, 2, p (2) + n, ..., n, p (n) + n) ∈
P2n . Permutation mn (p) is a special mixture of 1 : n and p (1 : n) + n.
For example if n = 3 and p = (321), then m3 (p) = (162534).
Permutation mn (p) can be reached in two steps, first, take the permutation
p1 = (1 : n, p (1 : n) + n) ∈ P2n and then apply the permutation q1 =
(1, 1 + n, 2, 2 + n, ..., n, 2n). The product of these two permutations gives mn (p),
i.e.

mn (p) = p1 q1 = (1 : n, p (1 : n) + n) (1, 1 + n, 2, 2 + n, ..., n, 2n) , (1.1)

and the consecutive application of these permutations also gives the same result

mn (p) = q1 × p1 = (1, 1 + n, 2, 2 + n, ..., n, 2n) × (1 : n, p (1 : n) + n) .

We define the mapping m_n on a pair of vectors k_{1:n} = (k_1, k_2, ..., k_n), j_{1:n} = (j_1, j_2, ..., j_n) as well, so that we use m_n(p) on the indices of these vectors. The image m_n(k_{1:n}, j_{p(1:n)}) is a particular mixing of these two vectors, namely the odd places are taken by the values of k_{1:n} and the even places by j_{p(1:n)}. An instance is the following: let k_{1:3} = (2, 3, 4) and j_{1:3} = (5, 7, 8); then all possible permutations m_3((2, 3, 4), (5, 7, 8)_{p(1:3)}) are

{(253748), (253847), (273548), (273845), (283547), (283745)},     (1.2)

and this will be denoted by m_3((2, 3, 4), (5, 7, 8)) for short.
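The interleaving m_n(p) and the mixing of two index vectors are straightforward to compute. A short Python sketch (the function names m_n and mix are ours) reproduces m_3(p) = (162534) for p = (321) and the first entry of (1.2):

```python
def m_n(p):
    """Interleave 1:n with p(1:n)+n, as in the definition of m_n(p); p is 1-based."""
    n = len(p)
    out = []
    for i in range(1, n + 1):
        out += [i, p[i - 1] + n]
    return tuple(out)

def mix(k, j, p):
    """Put the entries of k at the odd positions and those of j, permuted by p, at the even ones."""
    return tuple(x for i in range(len(k)) for x in (k[i], j[p[i] - 1]))

print(m_n((3, 2, 1)))                          # (1, 6, 2, 5, 3, 4), i.e. m_3(p) = (162534)
print(mix((2, 3, 4), (5, 7, 8), (1, 2, 3)))    # (2, 5, 3, 7, 4, 8), the first entry of (1.2)
```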

1.2 Tensor Product, vec Operator, and Commutation

1.2.1 Tensor Product

Let A and B be m × n and p × q matrices, respectively. The tensor product A ⊗ B of A and B, which we shall refer to as T-product for short, is defined by A ⊗ B = [a_{ij} B], i.e.

A ⊗ B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}.

The tensor product A ⊗ B is a matrix of mp × nq. This product is also called the
direct product or the Kronecker product. Throughout this book we shall apply the
following precedence: a matrix product comes before a T-product

AB ⊗ C = (AB) ⊗ C,

assuming that the matrix product AB is valid.


Here we list some of the most important and well known properties of tensor
products:
1. T-products commute with scalars

aA ⊗ bB = ab (A ⊗ B) ,

for any scalars a and b.


2. T-products are associative and distributive

A ⊗ B ⊗ C = (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C) ,
(A + B) ⊗ (C + D) = A ⊗ C + A ⊗ D + B ⊗ C + B ⊗ D.

3. A connection between T-products and the ordinary matrix products is the mixed
product rule

(A ⊗ B) (C ⊗ D) = AC ⊗ BD, (1.3)

assuming that the appropriate matrix products are valid.


4. The transpose of T-products is

(A ⊗ B)' = A' ⊗ B'.

5. The determinant and inverse of T-products for square matrices: let A be m × m and B be p × p; then

det(A ⊗ B) = (det A)^p (det B)^m,
(A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1},

assuming that both inverses exist.


6. The trace of T-products for square matrices A and B is

tr (A ⊗ B) = (trA) (trB) ,

provided that both are n × n.


7. If both A and B are positive definite A > 0 and B > 0, then so is A ⊗ B and
A ⊗ B > 0.
8. The rank of A ⊗ B is equal to the rank of AA' ⊗ BB'.
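The properties listed above, in particular the mixed product rule (1.3) and the transpose rule, can be checked numerically with np.kron. The following is a minimal Python/NumPy sketch using randomly generated matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 3)), rng.normal(size=(4, 5))
C, D = rng.normal(size=(3, 2)), rng.normal(size=(5, 4))

# mixed product rule (1.3): (A (x) B)(C (x) D) = AC (x) BD
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))  # True

# transpose of a T-product: (A (x) B)' = A' (x) B'
print(np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)))                    # True
```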

1.2.2 The vec Operator




The vec operator acts on a matrix A = [a_{ij}]_{i=1:m, j=1:n} of m × n by stacking the columns of the matrix A one underneath the other, i.e.

vec A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix},

where a_j denotes the jth column of A; vec A is an mn-dimensional vector. Sometimes the vec operator is also called the pack operator.

The most frequently used properties of the vec operator are the following:
1. If A, B, and C are matrices such that the product ABC is defined, then

vec(ABC) = (C' ⊗ A) vec B.     (1.4)

In particular, if a and b are vectors of dimensions m and p, respectively, then

vec(ab') = (I_p ⊗ a) b     (1.5)
         = (b ⊗ I_m) a     (1.6)
         = b ⊗ a,          (1.7)

where I_p is the identity matrix of dimension p. Moreover, let A be m × n and B be n × q; then

vec(AB) = (B' ⊗ I_m) vec A     (1.8)
        = (B' ⊗ A) vec I_n
        = (I_q ⊗ A) vec B.

2. Trace (tr) and vec operators are connected by

tr(A'B) = vec'A vec B,     (1.9)

where A and B are matrices of the same order. Notice the notation vec'A = (vec A)' at (1.9), which will be applied later on as well. Suppose that the product ABCD is defined; then we have

tr(ABCD) = vec'D' (C' ⊗ A) vec B     (1.10)
         = vec'D (A ⊗ C') vec B'.
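Identities (1.4) and (1.9) can also be verified numerically. The sketch below (the helper name vec is our own choice) uses column-major flattening, which is exactly the column-stacking vec operator:

```python
import numpy as np

def vec(A):
    """Column-wise vectorization (stack the columns of A)."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(1)
A, B, C = rng.normal(size=(2, 3)), rng.normal(size=(3, 4)), rng.normal(size=(4, 5))

# (1.4): vec(ABC) = (C' (x) A) vec B
print(np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B)))     # True

# (1.9): tr(A'B) = vec'(A) vec(B), with A and B of the same order
B2 = rng.normal(size=A.shape)
print(np.allclose(np.trace(A.T @ B2), vec(A) @ vec(B2)))         # True
```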

1.2.3 Commutation Matrices

T-products have the advantage that their factors can be commuted with the help of a linear operator called a commutation matrix. We will use this commutation property mostly when we deal with vectors.

Let us start with a matrix A of m × n. It can be written in terms of elementary matrices,

A = \sum_{i,j} a_{i,j} E_{i,j},

where the elementary matrix E_{i,j} is m × n, and all entries of E_{i,j} are zero except the (i, j)th entry, which is 1. Notice that E'_{i,j} = E_{j,i}; therefore, the transpose of A is

A' = \sum_{i,j} a_{i,j} E'_{i,j}.

Now let us vectorize both sides:

vec A' = \sum_{i,j} a_{i,j} vec E'_{i,j} = [vec E'_{i,j}]_{i,j} vec A,     (1.11)

where [vec E'_{i,j}]_{i,j} denotes the matrix that is obtained by collecting the column vectors vec E'_{i,j} into a matrix with respect to the order defined by vec A, i.e. column-continuously of the entries of A. In this way we obtain a permutation matrix [vec E'_{i,j}]_{i,j} with columns vec E'_{i,j} in the same order as the elements of vec A.
i,j
The other way round, the vector vec A' is a permutation of the vector vec A; therefore, there exists a particular permutation matrix, which we denote by K_{m•n}, of order mn × mn, called a commutation matrix, or simply a commutator, which is defined by the equation

K_{m•n} vec A = vec A'.     (1.12)

We see from (1.11) that K_{m•n} = [vec E'_{i,j}]_{i,j}. One can obtain matrix K_{m•n} by permuting the columns of the unit matrix I_{mn}, since the vector vec E'_{i,j} = vec E_{j,i} is the ((i − 1)n + j)th (i = 1 : m, j = 1 : n) unit vector of I_{mn}.
We observe that for a column vector a we have vec a = vec a' = a; therefore, K_{1•m} = K_{m•1} = I_m.
It can be seen that K_{m•n} is given as a particular permutation p of the columns of the unit matrix I_{mn} with order mn × mn. Permutation p is defined in the following way: p(1) = 1, p(2) = n + 1, and in general p((j − 1)m + k) = (k − 1)n + j. In other words, matrix K_{m•n} consists of m × n blocks of dimension n × m, and all the entries of block (i, j) are zero except the entry (j, i), which is 1 (i = 1 : m, j = 1 : n). Once again, K_{m•n} is a particular permutation matrix of order mn.
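The commutation matrix can be generated directly from the defining property (1.12). The following Python/NumPy sketch (the function name commutation is ours) places a single 1 in each row according to the index map described above and checks (1.12), as well as the fact that its transpose is the commutation matrix with m and n interchanged:

```python
import numpy as np

def vec(A):
    return A.reshape(-1, order="F")

def commutation(m, n):
    """Commutation matrix K_{m.n} with K vec(A) = vec(A') for A of size m x n."""
    K = np.zeros((m * n, m * n), dtype=int)
    for k in range(m):            # 0-based row index of A
        for j in range(n):        # 0-based column index of A
            K[k * n + j, j * m + k] = 1
    return K

m, n = 2, 3
A = np.arange(m * n).reshape(m, n)
K = commutation(m, n)
print(np.array_equal(K @ vec(A), vec(A.T)))       # True, property (1.12)
print(np.array_equal(K.T, commutation(n, m)))     # True: K_{m.n}' = K_{n.m}
```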

The T-product is not commutative. Nevertheless, A ⊗ B and B ⊗ A are permutation equivalent, as follows from the basic properties of the commutator K_{m•n}:
1. We have

K'_{m•n} = K_{m•n}^{-1} = K_{n•m}.

2. K1•n and Km•1 are the unit matrices.


3. Let a be m × 1 and b be p × 1; then

K_{m•p}(b ⊗ a) = K_{m•p} vec(ab') = vec(ba') = a ⊗ b     (1.13)

(see (1.5)). Observe that K_{m•p} reorders a T-product of vectors with dimensions p × 1 and m × 1, respectively, into the product in which the m × 1 vector is the first and the p × 1 vector is the second.
4. More generally, if A is m × n and B is p × q and b is p × 1, then

Kp•m (A ⊗ B) = (B ⊗ A) Kq•n , (1.14)


Kp•m (A ⊗ B) Kn•q = B ⊗ A, (1.15)
Kp•m (A ⊗ b) = b ⊗ A, (1.16)
Km•p (b ⊗ A) = A ⊗ b; (1.17)

in particular we have

Km•m (A ⊗ A) = (A ⊗ A) Kn•n .

If A is an m × m square matrix, then Km•m and A ⊗ A commute

Km•m (A ⊗ A) = (A ⊗ A) Km•m . (1.18)

5. The vec of a T-product and the T-product of vectors are connected by

vec(A ⊗ B) = (I_n ⊗ K_{q•m} ⊗ I_p)(vec A ⊗ vec B).     (1.19)

If either A or B is a vector, then K_{q•m} is a unit matrix; therefore, for instance, if A is m × n and b is p × 1, then we have

vec(A ⊗ b) = vec A ⊗ b,     (1.20)
vec(A ⊗ b') = (I_n ⊗ K_{p•m})(vec A ⊗ b),     (1.21)
vec(b' ⊗ A) = b ⊗ vec A,     (1.22)
vec(b ⊗ A) = (K_{n•p} ⊗ I_m)(b ⊗ vec A).

The following Lemma will be found useful later on.
Lemma 1.1 Let A be m × n and B be p × m matrices; then

vec(BA) = (vec'A ⊗ I_{np})(I_n ⊗ K_{m•n} ⊗ I_p)(vec I_n ⊗ vec B)
        = (vec'B ⊗ I_{np})(I_m ⊗ K_{p•n} ⊗ I_p)(vec A' ⊗ vec I_p).

See Sect. 1.5.1, p. 49, Appendix for proof.

1.2.4 Commuting T-Products of Vectors

Equation (1.13) shows that matrix Km•p interchanges b and a in a T-product. One
can identify the dimensions of a and b from the notation Km•p , and vice versa,
the dimensions of a and b define the index of K_{m•p}. In this notation, the order in the subscript shows the order of the result, namely K_{m•p} turns a T-product of two vectors with dimensions p and m into the T-product in which the vector of dimension m comes first and the vector of dimension p second:

Km•p (b ⊗ a) = a ⊗ b.

The usage of notation K(21) (p, m) for Km•p will be useful for our purposes, where
the subscript denotes a permutation (21) (in cycle notation (2, 1)S ). The usage of
cycle notation will be more convenient for the permutations of several elements.
Subscripts m and p show the dimensions of starting vectors that are the subject of
T-products. Although the version K(21) (p, m) seems an unnecessary complication,
its application will be clear later on when multiple T-products are considered. In the
case of interchanging only two vectors in a T-product of two vectors the notation
Km•p is more compact than K(21) (p, m) and we will use both of them in the sequel.
More generally, the subscript of K will show the permutation that is applied to a
T-product, while the order of the dimensions is the same as the original order of the
vectors.
Let a1 and a2 be vectors with dimensions d1 and d2 , respectively; then by this
new notation we have

K(21) (d1 , d2 ) (a1 ⊗ a2 ) = a2 ⊗ a1 .

Although the dimensions of the vectors are important, especially from a computational perspective, we will frequently omit them when it does not cause any misunderstanding. The dimensions of a multiple T-product that is the subject of a permutation are necessary for finding the dimensions of the commutator matrix.
cause any misunderstanding. Dimensions of a multiple T-product that is the subject
of permutation are necessary for finding the dimensions of the commutator matrix.
Note that cycle (21)S and cycle (12)S are the same; we use the canonical one (21)S
as usual. The inverse matrix of Km•p = K(21)S (p, m) is Kp•m = K(21)S (m, p),
since the inverse of permutation (21)S is (21)S ; therefore, the order of dimensions

reflects the change of the order. In this notation

(K_{(21)_S}(d_1, d_2))^{-1}(a_2 ⊗ a_1) = K_{(21)_S}(d_2, d_1)(a_2 ⊗ a_1) = a_1 ⊗ a_2.

Now consider a list of vectors (a_1, a_2, . . . , a_n) with dimensions d_{1:n} = (d_1, d_2, . . . , d_n), respectively. Let us observe that ⊗_{i=1:j} I_{d_i} = I_{d_{1:j}}, where ⊗_{i=1:j} means the T-product of the matrices, and define the matrix

K_{(j+1,j)_S}(d_{1:n}) = \left( \bigotimes_{i=1:j-1} I_{d_i} \right) \otimes K_{(21)}(d_j, d_{j+1}) \otimes \left( \bigotimes_{i=j+2:n} I_{d_i} \right)
                       = I_{d_{1:(j-1)}} \otimes K_{(21)}(d_j, d_{j+1}) \otimes I_{d_{(j+2):n}},

where (j + 1, j)_S is the cycle notation of a permutation p transposing two adjacent elements j and j + 1 (see Sect. 1.1). Clearly

K_{(j+1,j)_S}(d_{1:n}) \bigotimes_{i=1:n} a_i = \left( \bigotimes_{i=1:j-1} I_{d_i} a_i \right) \otimes K_{(21)}(d_j, d_{j+1})(a_j \otimes a_{j+1}) \otimes \left( \bigotimes_{i=j+2:n} I_{d_i} a_i \right)
                                              = \bigotimes_{i=1:j-1} a_i \otimes a_{j+1} \otimes a_j \otimes \bigotimes_{i=j+2:n} a_i.

Therefore, one can transpose (interchange) the elements a_j and a_{j+1} in the T-product of vectors with the help of the matrix K_{(j+1,j)_S}(d_{1:n}). In case all vectors a_j have the same dimension d, we write K_{(j+1,j)_S}(d), or K_{(j+1,j)_S} for short.
The inverse operator of K_{(j+1,j)_S}(d_{1:n}) equals the commutator of the inverse permutation,

\left( K_{(j+1,j)_S}(d_{1:n}) \right)^{-1} = K_{(j+1,j)_S}(d_{1:(j-1)}, d_{j+1}, d_j, d_{(j+2):n}).

We recall that Pn denotes the set of all permutations of the numbers 1 : n = (1, 2, . . . , n); if p ∈ Pn, then p = (p(1), p(2), . . . , p(n)). From this it follows that for any permutation p = (p(1), p(2), . . . , p(n)), p ∈ Pn, there exists a matrix K_p(d_{1:n}) such that

K_p(d_{1:n}) \bigotimes_{i=1:n} a_i = \bigotimes_{i=1:n} a_{p(i)},     (1.23)

just because any permutation p(1 : n) can be decomposed into a product of transpositions of neighboring elements. For any p ∈ Pn the commutator K_p can be generated directly, so we do not need a decomposition of a permutation in order to get the commutator matrix of a permutation p.
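Indeed, a commutator K_p(d_{1:n}) for an arbitrary permutation p can be generated directly from (1.23), without decomposing p into transpositions. The following Python/NumPy sketch (the function name K_perm is ours) builds the matrix from the index map of the permuted T-product and checks (1.23) on random vectors:

```python
import numpy as np
from functools import reduce

def K_perm(p, dims):
    """Commutator K_p(d_{1:n}) with K_p (a_1 (x) ... (x) a_n) = a_{p(1)} (x) ... (x) a_{p(n)};
    p is 1-based, dims = (d_1, ..., d_n)."""
    p0 = [pi - 1 for pi in p]
    D = int(np.prod(dims))
    # positions of the entries of the input T-product, arranged on the grid (d_1, ..., d_n)
    idx = np.arange(D).reshape(dims)
    # entry (j_1, ..., j_n) of the output equals the input entry whose index at slot p(k) is j_k
    cols = idx.transpose(p0).reshape(-1)
    K = np.zeros((D, D), dtype=int)
    K[np.arange(D), cols] = 1
    return K

dims = (2, 3, 2)
rng = np.random.default_rng(2)
a = [rng.normal(size=d) for d in dims]
p = (3, 1, 2)
lhs = K_perm(p, dims) @ reduce(np.kron, a)
rhs = reduce(np.kron, [a[i - 1] for i in p])
print(np.allclose(lhs, rhs))   # True, property (1.23)
```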

In this general case the inverse operator of K_p(d_{1:n}) is

\left( K_p(d_{1:n}) \right)^{-1} = K_p^{-1}(d_{1:n}) = K_{p^{-1}}(p d_{1:n}) = K_{p^{-1}}(d_p).

Note that the entries of d_{1:n} are not necessarily equal; they are the dimensions of the vectors a_i, i = 1, 2, . . . , n, and they are given in the order of the vectors a_i. In practice, when we apply K_p(d_{1:n}) to a T-product of vectors ⊗_{i=1:n} a_i, we shall simply write K_p ⊗_{i=1:n} a_i and the dimensions d_{1:n} will be defined by the T-product, i.e. dimensions d_{1:n} will be omitted from the notation.
The following example shows that K_p(d_{1:n}) can be constructed by joining terms in a T-product for a permutation p and by changing the set d_{1:n} of the dimensions.
Example 1.5 Let p = (432)_S be a permutation of 4 elements. It has only one cycle, (432)_S, that is, p = (1423). Now we apply p to the product a_1 ⊗ a_2 ⊗ a_3 ⊗ a_4 and notice a_1 ⊗ a_2 ⊗ a_3 ⊗ a_4 = a_1 ⊗ (a_2 ⊗ a_3) ⊗ a_4 = a_1 ⊗ b_3 ⊗ a_4, where b_3 = a_2 ⊗ a_3, with dimension d_2 d_3. The permutation (23)_S of 3 elements with dimensions (d_1, d_2 d_3, d_4) will provide the same result as p; hence, the commutator follows:

K_{(432)_S}(d_{1:4}) = K_{(23)_S}(d_1, d_2 d_3, d_4) = I_{d_1} ⊗ K_{(21)_S}(d_2 d_3, d_4) = I_{d_1} ⊗ K_{d_4 • d_2 d_3}.

Notice that the permutation (432)_S interchanges the product (a_2 ⊗ a_3) and a_4. A similar grouping can be applied to other permutations p and may simplify the computation.
Remark 1.3 In Example 1.5 the commutator K(23)S (d1 , d2 d3 , d4 ) will not change
if we interchange d2 and d3 in the product d2 d3 . This does not mean that applying
K(23)S (d1 , d2 d3 , d4 ) to a1 ⊗ (a2 ⊗ a3 ) ⊗ a4 would be the same as applying it to
a1 ⊗ (a3 ⊗ a2 ) ⊗ a4 . Therefore, one has to keep the order of a T-product correct.
Usually, we pay attention to the form of K_p in terms of K_{m•p}, since K_{m•p} is easy to calculate, and based on this we can obtain K_p, in some situations, as a direct by-product of interchanges.
The result of the following Lemma gives a simple method of transposing the kth
element of the T-product of vectors to the last place and leaves the rest unchanged.
It is useful for calculating T-derivatives.
Lemma 1.2 Consider the permutation p transposing the last element to the kth place and leaving the rest of the elements unchanged; it is given by the ℓ-cycle (n : k)_S = (n, n − 1, . . . , k)_S with length ℓ = n − k + 1. The commutator for permutation (n : k)_S on a T-product ⊗_{i=1:n} a_i with dimensions d_{1:n} is

K_{(n:k)_S}(d_{1:n}) = I_{d_{1:(k-1)}} ⊗ K_{(2,1)_S}(d_{k:(n-1)}, d_n)     (1.24)
                     = I_{d_{1:(k-1)}} ⊗ K_{d_n • d_{k:(n-1)}},

where d_{1:(k-1)} and d_{k:(n-1)} denote the product of the corresponding dimensions.

Proof See the previous Example 1.5.


Remark 1.4 If n > k, then we use the notation n : k = (n, n − 1, . . . k). The inverse
permutation of p = (n : k)S transposes k and n back; p−1 = (k : n)S ; hence,
 
K(k:n)S dp = Id1:(k−1) ⊗ Kdk:(n−1) •dn , (1.25)

where dp denotes the dimensions of the transformed series, namely, dp = pd1:n .


Closing this section we apply the commutator to matrix T-products as well.
Lemma 1.3 Let A_k, k = 1 : n, be square matrices with dimensions d_k × d_k; then

K_p(d_{1:n}) \left( \bigotimes_{k=1:n} A_k \right) K_p^{-1}(d_{1:n}) = \bigotimes_{k=1:n} A_{p(k)}.     (1.26)

If the matrices A_k have dimensions d_k × p_k, then

K_p(d_{1:n}) \left( \bigotimes_{k=1:n} A_k \right) K_p^{-1}(p_{1:n}) = \bigotimes_{k=1:n} A_{p(k)}.     (1.27)
See Sect. 1.5.2, p. 50 for proof.

1.3 Symmetrization and Multilinear Algebra

1.3.1 Symmetrization

The tensor product ⊗_{1:q} a_j, a_j ∈ R^d, is linear in each of its components. This and some more elementary properties of the T-product enable us to construct a multilinear algebra. The space generated by ⊗_{1:q} a_j ∈ R^{d^q}, a_j ∈ R^d, constitutes a multilinear algebra; let us denote it by M_{d,q}. The elements of M_{d,q} will be called tensors (more precisely, Cartesian tensors); in particular, the tensors of the form ⊗_{1:q} a_j are called factorizable tensors. The case in which the dimensions of the components a_j are different, a_j ∈ R^{d_j}, follows immediately from our treatment in many respects. Here we concentrate on symmetric tensors, so we restrict ourselves to the case of equal dimensions.
The major difference between the Euclidean space R^{d^q} and the multilinear algebra M_{d,q} is indexing, which allows the usage of further structures. If w ∈ R^{d^q}, then we use the ordinary indices from 1 to d^q. If the same vector w ∈ M_{d,q}, then it has multi-indices for its entries instead of single ones.
The rank of b ∈ M_{d,q} is the minimal number of factorizable tensors b_k (of the form ⊗_{1:q} a_j) such that b equals a linear combination of the b_k; for instance, a^{⊗q} has rank 1.

The Euclidean space R^{d^q} is spanned by d^q unit vectors, each of them a q-fold tensor product

e^⊗_{j_{1:q}} = \bigotimes_{1:q} e_{j_i},     (1.28)

of unit vectors e_j spanning the space R^d, and e_j is the unit vector in the jth coordinate direction. We consider the simplest case, when the entries of the unit basis vectors e_j ∈ R^d are zero except the jth one, which is 1, and we call this basis canonical or coordinate basis.
Recall that for each permutation p ∈ P_q we have a commutator matrix K_p such that K_p ⊗_{1:q} a_j = ⊗_{1:q} a_{p(j)}, see Sect. 1.2.3, p. 7.
Definition 1.1 A tensor w ∈ Md,q is called q-symmetric if w = Kp w, for any
permutation p ∈ Pq . q-symmetric tensors in Md,q constitute a linear subspace of
Md,q , which will be denoted by Sd,q .
The elements of Sd,q will also be called Sd,q -symmetric, when dimension d is
also important.
Example 1.6 Let

w (a) = a⊗q ;

then w (a) is q-symmetric of rank 1, since for any permutation p ∈ Pq , we have


Kp a⊗q = a⊗q .
We define the rank of b ∈ S_{d,q} as the minimal number of vectors b_k^{⊗q} such that b equals a linear combination of the b_k^{⊗q}.
Example 1.7 a = [1, 0, 0, 0, 0, 0, 0, 1]' ∈ M_{2,3} is 3-symmetric, with rank 2 (larger than 1), namely a = e_1^{⊗3} + e_2^{⊗3}, e_j ∈ R^2.

Let us define the symmetrizer matrix S_{d1_q} for the symmetrization of a T-product of q vectors with the same dimension d by

S_{d1_q} = \frac{1}{q!} \sum_{p \in P_q} K_p,     (1.29)

where P_q denotes the set of all permutations of the numbers 1 : q, and the summation extends over all q! permutations p ∈ P_q. For instance, the result of S_{d1_4}(a_1 ⊗ a_2 ⊗ a_3 ⊗ a_4) is a vector with dimension d^4, and it is symmetric. Symmetrizer S_{d1_q} is called a q-way symmetrization matrix as well, a q-symmetrizer and q-symmetrization for short.
Symmetrizer Sd1q can be generated by an algorithm using commutator matrices
Kp of all permutations. This needs quite some computer capacity when d and q
become large, since the number of permutations and the sizes of matrices grow fast.
It seems reasonable to construct a library with matrices Sd1q of different d and q.
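For small d and q the symmetrizer can be generated exactly as in (1.29), by summing the commutators K_p over all q! permutations. The following Python/NumPy sketch (the function name symmetrizer is ours) does this and checks idempotency, symmetry, the invariance of a^{⊗q}, and the rank:

```python
import numpy as np
from itertools import permutations
from math import factorial

def symmetrizer(d, q):
    """Symmetrizer S_{d 1_q} = (1/q!) sum_p K_p for T-products of q vectors in R^d."""
    D = d ** q
    grid = np.arange(D).reshape((d,) * q)
    S = np.zeros((D, D))
    for p in permutations(range(q)):
        cols = grid.transpose(p).reshape(-1)
        S[np.arange(D), cols] += 1.0      # add the commutator K_p
    return S / factorial(q)

d, q = 2, 3
S = symmetrizer(d, q)
a = np.array([1.0, 2.0])
a3 = np.kron(np.kron(a, a), a)
print(np.allclose(S @ S, S), np.allclose(S.T, S))   # True True: idempotent and symmetric
print(np.allclose(S @ a3, a3))                      # True: a^{(x)3} is 3-symmetric
print(np.linalg.matrix_rank(S))                     # 4, i.e. binom(d+q-1, q)
```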
We shall use the symmetrizer for simplifying some expressions in T-products;
applying Sd1q to expressions yields symmetric expressions.
The image of M_{d,q} via the symmetrizer S_{d1_q} is a subspace that has been denoted by S_{d,q}; it is invariant under the operator S_{d1_q}. The space M_{d,q} has dimension d^q, and the linear subspace S_{d,q} has dimension

ŋ_{d,q} = \binom{d + q - 1}{q}.     (1.30)

Example 1.8 We consider the expression

w(a, b) = a^{⊗3} − (I_{d^3} + K_{(132)} + K_{(312)})(a^{⊗2} ⊗ b).

The first term a^{⊗3} is 3-symmetric; the term a^{⊗2} ⊗ b is invariant under those permutations that interchange the first two components; hence, we have

K_{(213)}(a^{⊗2} ⊗ b) = I_{d^3}(a^{⊗2} ⊗ b).

Those permutations where we interchange entries 2 and 3 also give the same results:

K_{(231)}(a^{⊗2} ⊗ b) = K_{(132)}(a^{⊗2} ⊗ b),
K_{(312)}(a^{⊗2} ⊗ b) = K_{(321)}(a^{⊗2} ⊗ b).

Now, we see

(I_{d^3} + K_{(132)} + K_{(312)})(a^{⊗2} ⊗ b) = \frac{1}{2}(2 I_{d^3} + 2 K_{(312)} + 2 K_{(132)})(a^{⊗2} ⊗ b)
  = \frac{1}{2}(I_{d^3} + K_{(132)} + K_{(213)} + K_{(321)} + K_{(312)} + K_{(231)})(a^{⊗2} ⊗ b).

Here we recognize the symmetrizer

S_{d1_3} = \frac{1}{3!}(I_{d^3} + K_{(132)} + K_{(213)} + K_{(321)} + K_{(312)} + K_{(231)})

and obtain

(I_{d^3} + K_{(132)} + K_{(312)})(a^{⊗2} ⊗ b) = \frac{3!}{2} S_{d1_3}(a^{⊗2} ⊗ b).

Therefore,

w(a, b) = a^{⊗3} − (I_{d^3} + K_{(132)} + K_{(312)})(a^{⊗2} ⊗ b)
        = a^{⊗3} − 3 S_{d1_3}(a^{⊗2} ⊗ b).

If we use S_{d1_3} directly on both sides of w(a, b), then we obtain

S_{d1_3} w(a, b) = w(a, b);

hence, w(a, b) is 3-symmetric.


Later on it will be convenient to introduce a notation for the equations when both
sides are q-symmetric.
Definition 1.2 We will call tensors a and b q-symmetry equivalent if Sd1q a =

Sd1q b, and we denote it by a = b.
An instance is the above Example 1.8, where instead of writing
 
w (a, b) = a⊗3 − 3Sd13 a⊗2 ⊗ b ,

we can simply write


w (a, b) = a⊗3 − 3a⊗2 ⊗ b.

Obviously Sd1q Kp = Kp Sd1q = Sd1q , since either operation Sd1q Kp or Kp Sd1q


changes only the order of terms in the sum (1.29).
It follows that symmetrizer Sd1q has the following:

Property 1.1 S_{d1_q} = S_{d1_q}^2, S'_{d1_q} = S_{d1_q}, the image of M_{d,q} via S_{d1_q} is S_{d,q}, and the rank of the matrix S_{d1_q} is ŋ_{d,q}, see (1.30); i.e., S_{d1_q} is an orthogonal projection of M_{d,q} onto the subspace S_{d,q}.
Please notice the difference between the notations w_{(k_{1:q})} and w_{k_{1:q}}: the first one is an entry of a tensor w with index (k_{1:q}), but the second one is a vector w_{k_{1:q}} = (w_{k_1}, . . . , w_{k_q}).
wk1 , . . . , wkq .

1.3.2 Multi-Indexing, Elimination, and Duplication


The multi-indexing of vectors w ∈ R^{d^q} makes it possible to use further connections between the entries and endows the space M_{d,q} with new structure. The multi-indexing of a w ∈ R^{d^q} is inherited from the tensor product ⊗_{1:q} a_j in the following way. For any w ∈ R^{d^q} we introduce q-plet (triplet, quadruplet, etc.) indices, called the multi-index of the entries of w. The q-plet index (k_{1:q}) is an ordered vector of k_j ∈ 1 : d, i.e. w = [w_{(k_{1:q})}] corresponds to the indices of the entries of ⊗_{1:q} a_j ∈ R^{d^q}, so that k_j denotes the index of the jth entry in the product; namely, if a_j = [a_{1:d,j}], then the index of the term \prod_{j=1}^{q} a_{k_j,j} is (k_{1:q}).

Example 1.9 Consider a_j = [a_1, a_2]_j, and let q = 3. Then

w = a_1 ⊗ a_2 ⊗ a_3
  = [a_{1,1}a_{1,2}a_{1,3}, a_{1,1}a_{1,2}a_{2,3}, a_{1,1}a_{2,2}a_{1,3}, a_{1,1}a_{2,2}a_{2,3}, a_{2,1}a_{1,2}a_{1,3}, a_{2,1}a_{1,2}a_{2,3}, a_{2,1}a_{2,2}a_{1,3}, a_{2,1}a_{2,2}a_{2,3}]',

where a_{k,j} denotes the kth entry of a_j. The product a_1 ⊗ a_2 ⊗ a_3 has eight entries

a_1 ⊗ a_2 ⊗ a_3 = [w_{(1,1,1)}, w_{(1,1,2)}, w_{(1,2,1)}, w_{(1,2,2)}, w_{(2,1,1)}, w_{(2,1,2)}, w_{(2,2,1)}, w_{(2,2,2)}]',

and the corresponding multi-indices are

{(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (2, 1, 1), (2, 1, 2), (2, 2, 1), (2, 2, 2)}.

Notice the lexicographical order of the indices.
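The lexicographic correspondence between the linear position of an entry of a_1 ⊗ a_2 ⊗ a_3 and its multi-index can be checked directly; a minimal Python/NumPy sketch (the variable names are ours):

```python
import numpy as np
from itertools import product

d, q = 2, 3
rng = np.random.default_rng(3)
a = [rng.normal(size=d) for _ in range(q)]

w = np.kron(np.kron(a[0], a[1]), a[2])             # a_1 (x) a_2 (x) a_3
multi = list(product(range(1, d + 1), repeat=q))   # (1,1,1), (1,1,2), ..., (2,2,2)

for pos, k in enumerate(multi):
    # the entry with multi-index (k_1, k_2, k_3) is a_{k_1,1} a_{k_2,2} a_{k_3,3}
    assert np.isclose(w[pos], a[0][k[0] - 1] * a[1][k[1] - 1] * a[2][k[2] - 1])
print("lexicographic multi-indexing confirmed for", len(multi), "entries")
```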


Multi-Index IÐ_{d,q} Now if w ∈ S_{d,q}, then the indices are compatible with the indices of a^{⊗q} ∈ S_{d,q}, a ∈ R^d; that is, entries are invariant under the permutation of their indices, since the (j_{1:q})-th entry of a^{⊗q} is a^{⊗q}_{(j_{1:q})} = a_{j_1} a_{j_2} · · · a_{j_q} = a_{p(j_1)} a_{p(j_2)} · · · a_{p(j_q)}, 1 ≤ j_k ≤ d, for any permutation p ∈ P_q. The distinct entries of any w ∈ S_{d,q} correspond to the indices of distinct entries of a^{⊗q}. At this point we can use a canonical form for the indices of the entries of a^{⊗q}, namely (j_{1:q}) is canonical if j_1 ≤ j_2 ≤ · · · ≤ j_q.
Let the index set of distinct entries of a^{⊗q} be IÐ_{d,q}. We note that IÐ_{d,q} does not depend on a particular a. Now the indices (j_{1:q}) ∈ IÐ_{d,q} of distinct values define a subset of the coordinate basis e^⊗_{j_{1:q}} through the q-plet (triplet, quadruplet, etc.) indices.

Example 1.10 A particular case of Example 1.9 is when a1 = a2 = a3 = a; then

a⊗3 = [a1³ , a1²a2 , a1²a2 , a1 a2² , a1²a2 , a1 a2² , a1 a2² , a2³ ]′,

for instance a(1,1,2) = a(1,2,1) = a(2,1,1) = a1²a2 . The indices of distinct values are IÐ2,3 = {(1, 1, 1) , (1, 1, 2) , (1, 2, 2) , (2, 2, 2)}.
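The canonical indices are simply the non-decreasing q-tuples over 1 : d, so IÐd,q can be generated directly; a small sketch (ours, plain Python) is the following.

    from itertools import combinations_with_replacement

    def distinct_multi_indices(d, q):
        # canonical multi-indices j1 <= ... <= jq with values in 1:d,
        # i.e. the index set of distinct entries of a^{(x)q}
        return list(combinations_with_replacement(range(1, d + 1), q))

    print(distinct_multi_indices(2, 3))
    # [(1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2)], as in Example 1.10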

To this end we consider an a ∈ Rd in terms of the unit coordinate vectors ek ,

a = Σ_{k=1}^{d} ak ek ,

and take the qth T-power of a in terms of the coordinate vectors e⊗j1:q (see (1.28)),

a⊗q = (Σ_{m=1}^{d} am em)⊗q = Σ_{(j1:q)} a(j1:q) e⊗j1:q .

Now we collect the equal values of the coefficients a(j1:q) . They correspond to the indices of distinct values,

a⊗q = Σ_{(j1:q)∈IÐd,q} Σ_{(k1:q)|p(k1:q)=(j1:q)} a(j1:q) e⊗k1:q ,

where the second sum is taken over all multi-indices (k1:q) for which there exists a permutation p ∈ Pq such that p(k1:q) = (j1:q) ∈ IÐd,q .
The set {(k1:q) | p(k1:q) = (j1:q)} includes only the distinct permutations p(j1:q) of the multi-index (j1:q). For instance, if (j1:q) = (1, . . . , 1), then the only possible value of (k1:q) is (1, . . . , 1). Actually, a⊗q ∈ Sd,q ; hence,

a⊗q = Sd1q a⊗q = Σ_{(j1:q)∈IÐd,q} a(j1:q) Σ_{(k1:q)|p(k1:q)=(j1:q)} Sd1q e⊗k1:q ,

which allows us to introduce a vector system in Sd,q defined by the coordinate vectors ek . Let (j1:q) ∈ IÐd,q , and define the vector system {ẽ⊗j1:q} by

ẽ⊗j1:q = Σ_{(k1:q)|p(k1:q)=(j1:q)} Sd1q e⊗k1:q = Σ_{(k1:q)|p(k1:q)=(j1:q)} e⊗k1:q ,

since the sum is q-symmetric. Therefore, each vector ẽ⊗j1:q is q-symmetric, ẽ⊗j1:q ∈ Sd,q .
If (j1:q) ≠ (m1:q), then ẽ⊗j1:q and ẽ⊗m1:q are orthogonal,

ẽ⊗′j1:q ẽ⊗m1:q = (Σ_{(k1:q)|p(k1:q)=(j1:q)} e⊗k1:q)′ (Σ_{(k1:q)|p(k1:q)=(m1:q)} e⊗k1:q) = 0,

since there are no equal vectors in these sums and all are orthogonal.

 
Let us introduce the type ℓ(j1:q) of an index (j1:q) such that ℓ(j1:q) = [ℓ1 , . . . , ℓq ], where ℓk denotes the number of occurrences of the value k, k ∈ 1 : q, in the index (j1:q). For instance, the type of (j1:3) = (1, 1, 2) is ℓ(j1:3) = [2, 1, 0]. It is clear that a permutation of a multi-index (j1:q) will not change its type.
Now for any (j1:q) ∈ IÐd,q the set of multi-indices (k1:q), such that p(k1:q) = (j1:q), p ∈ Pq , includes all distinct permutations of (j1:q); hence, the sum runs over q!/ℓ(j1:q)! terms, where ℓ(j1:q) is the type of (j1:q). Therefore, we conclude that

‖ẽ⊗j1:q‖² = q!/ℓ(j1:q)! .

Lemma 1.4 The system {ẽ⊗j1:q ∈ Sd,q | (j1:q) ∈ IÐd,q} is a complete orthogonal system in Sd,q .
Proof We have seen the orthogonality above.
 Thecompleteness follows from the
expansion by the distinct entries of w = w(k1:q ) ∈ Sd,q , which defines and is
defined by a vector of Rŋd,q .
Now the dimension of Sd,q and the number of distinct values ŋd,q coincide (see
(1.30)). The following table shows the dimensions of Sd,q when q changes.

q    ŋd,q : number of distinct entries
2    ŋd,2 = C(d + 1, 2) = d(d + 1)/2!
3    ŋd,3 = C(d + 2, 3) = d(d + 1)(d + 2)/3!
4    ŋd,4 = C(d + 3, 4) = d(d + 1)(d + 2)(d + 3)/4!
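In other words ŋd,q is the binomial coefficient C(d + q − 1, q) of (1.30); a one-line check (ours, plain Python) reproduces the table for d = 3.

    from math import comb

    def eta(d, q):
        # number of distinct entries of a q-symmetric tensor over R^d, see (1.30)
        return comb(d + q - 1, q)

    print([eta(3, q) for q in (2, 3, 4)])   # [6, 10, 15]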

It is seen that the dimension changes rapidly. The particular orthogonal system
for d = 3 and q = 2 is the following.
Example 1.11 We let d = 3 and q = 2; then ej ∈ R3 , the indices of distinct entries are IÐ3,2 = {(1, 1) , (1, 2) , (1, 3) , (2, 2) , (2, 3) , (3, 3)}, hence we have ẽ⊗j,j = ej⊗2 , j = 1 : 3, ẽ⊗j,j+1 = ej ⊗ ej+1 + ej+1 ⊗ ej , j = 1 : 2, and ẽ⊗j,j+2 = ej ⊗ ej+2 + ej+2 ⊗ ej , j = 1. If w ∈ S3,2 , then we can write

w = Σ_{j=1}^{3} wj,j ẽ⊗j,j + Σ_{j=1}^{2} wj,j+1 ẽ⊗j,j+1 + w1,3 ẽ⊗1,3
  = Σ_{k=0}^{2} Σ_{j=1}^{3−k} wj,j+k ẽ⊗j,j+k .

1.3.2.1 q-Symmetrizing Vectors


   
Remark 1.5 The orthogonal system {ẽ⊗j1:q ∈ Sd,q , (j1:q) ∈ IÐd,q} is not normed; the norm square of ẽ⊗j1:q equals the number of all distinct permutations of the index (j1:q), i.e. q!/ℓ(j1:q)! .

Example 1.12 Let us consider the case IÐ2,3 (see Example 1.10 above). The orthogonal system includes ẽ⊗(1,1,1) = e1⊗3 , ẽ⊗(1,1,2) = e1⊗2 ⊗ e2 + e2 ⊗ e1⊗2 + e1 ⊗ e2 ⊗ e1 , ẽ⊗(1,2,2) = e1 ⊗ e2⊗2 + e2⊗2 ⊗ e1 + e2 ⊗ e1 ⊗ e2 , and ẽ⊗(2,2,2) = e2⊗3 . The norm squares of these vectors are ‖ẽ⊗(1,1,1)‖² = ‖ẽ⊗(2,2,2)‖² = 1, and ‖ẽ⊗(1,1,2)‖² = ‖ẽ⊗(1,2,2)‖² = 3.

In general it does not follow that if w ∈ Sd,q then there exists an a ∈ Rd such that w = a⊗q , i.e. w need not have rank 1. The constraint for a tensor w = [w(j1:q)] to have rank one can be read off from the entries of w. A possible algorithm for checking the q-symmetry of w is to verify whether w(j1:q) = w(p(j1:q)) for all (j1:q) ∈ IÐd,q and all p ∈ Pq .
In general, q-symmetrizing a vector w ∈ R^{d^q} in terms of multi-indices means averaging those coefficients whose indices are permutation equivalent; formally, if

w = Σ_{(k1:q)} w(k1:q) e⊗k1:q ,

then the symmetrized version of w is

w̃ = Sd1q w = Σ_{(j1:q)∈IÐd,q} w̃(j1:q) ẽ⊗j1:q ,

and the coefficient of ẽ⊗j1:q is calculated by the following:

Lemma 1.5 Let w ∈ Md,q ; then the orthogonal projection w̃ of w onto the space Sd,q is given by

w̃ = Σ_{(j1:q)∈IÐd,q} w̃(j1:q) ẽ⊗j1:q ,

where

w̃(j1:q) = (1/‖ẽ⊗j1:q‖²) Σ_{(k1:q)|p(k1:q)=(j1:q)} w(k1:q) .

We also have ẽ⊗′j1:q w = ‖ẽ⊗j1:q‖² w̃(j1:q) . If w ∈ Sd,q , then w̃(j1:q) = w(j1:q) .

See Appendix 1.5.3 for the proof.
If w is q-symmetric, then the summation in the above Lemma gives the original coefficient multiplied by the norm square of ẽ⊗j1:q , since the coefficients in the summation of w̃(j1:q) do not change when the indices are permuted.
Let us introduce the vector wÐ ∈ Rŋd,q of the distinct entries of a q-symmetric vector w ∈ Sd,q , by wÐ = [w(j1:q)]_{(j1:q)∈IÐd,q} , where the indices are in lexicographic order (otherwise it is not unique). Now we define the matrix Qd,q for the system {ẽ⊗j1:q ∈ Sd,q , (j1:q) ∈ IÐd,q}, so that Qd,q is built up from the column vectors ẽ⊗j1:q in the same order as the entries of wÐ ,

Qd,q = [ẽ⊗j1:q]_{(j1:q)∈IÐd,q} .

The vector wÐ has dimension ŋd,q , see (1.30), and the matrix Qd,q is d^q × ŋd,q , so we can take the product

Qd,q wÐ = Σ_{(j1:q)∈IÐd,q} w(j1:q) ẽ⊗j1:q = w,    (1.31)

which results in the vector w. If q = 2, then Qd,2 is called duplication, if q = 3, triplication, and if q = 4, quadruplication. In general, we will refer to the matrix Qd,q as the q-plication matrix. The linear operator Qd,q transforms the Euclidean space Rŋd,q into the space Sd,q . The image Sd,q is of dimension ŋd,q , see (1.30), Sd,q ⊂ R^{d^q} , and the operator Qd,q is a one-to-one mapping.


The q-plication matrix Qd,q is of full column rank, by its construction, and hence its Moore-Penrose inverse

Q+d,q = (Q′d,q Qd,q)⁻¹ Q′d,q

exists, and it has the property that the distinct elements of a q-symmetric vector w can be obtained by

wÐ = Q+d,q w.    (1.32)

The matrix Q+d,q will be called a q-way elimination matrix (elimination matrix for short). The elimination matrix Q+d,q is not unique, in general, since there are several methods of collecting the distinct elements of a w ∈ Sd,q into a vector like wÐ . We have synchronized the triplet (the vector wÐ of distinct entries of w ∈ Sd,q , the q-plication matrix Qd,q , and the elimination matrix Q+d,q) by fixing the alphabetical (lexicographic) order of the indices. The vector wÐ is defined by the order of the orthogonal system {ẽ⊗j1:q}. We use the lexicographic ordering of the coordinate vectors ẽ⊗j1:q by their multi-indices for the selection of distinct entries. This results in removing each second and later occurrence of the same element from w. The following example shows our treatment.
Example 1.13 The vector form of the 3 × 3 symmetric matrix

A = [ a1,1 a1,2 a1,3
      a1,2 a2,2 a2,3
      a1,3 a2,3 a3,3 ]

is S3,2 -symmetric. The vector of distinct values is (vecA)Ð = vecÐ A = [a1,1 , a1,2 , a1,3 , a2,2 , a2,3 , a3,3 ]′. Now Q+d,q vecA = [a1,1 , a2,1 , a3,1 , a2,2 , a3,2 , a3,3 ]′.
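A small numerical sketch (ours, in Python with NumPy; only standard calls are used) constructs the duplication matrix Q3,2 from the vectors ẽ⊗ and recovers the distinct entries of vec A for a symmetric A, as in Example 1.13.

    import numpy as np
    from itertools import combinations_with_replacement

    d = 3
    pairs = list(combinations_with_replacement(range(d), 2))   # canonical pairs (j1 <= j2)

    def e(j):
        v = np.zeros(d)
        v[j] = 1.0
        return v

    cols = []                                                  # columns of Q_{3,2} are the vectors e~
    for j1, j2 in pairs:
        c = np.kron(e(j1), e(j2))
        if j1 != j2:
            c = c + np.kron(e(j2), e(j1))
        cols.append(c)
    Q = np.column_stack(cols)                                  # 9 x 6 duplication matrix

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 5.0],
                  [3.0, 5.0, 6.0]])
    vecA = A.reshape(-1, order="F")                            # column-wise vec
    Qplus = np.linalg.pinv(Q)                                  # Moore-Penrose elimination matrix
    print(Qplus @ vecA)                                        # distinct entries [1. 2. 3. 4. 5. 6.]
    print(np.allclose(Q @ (Qplus @ vecA), vecA))               # True: q-plication restores vec A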
We can use Q+d,q on a general vector from R^{d^q} , eliminating entries in the same way as for a symmetric vector. The vector 1_{d^q} ∈ Sd,q is q-symmetric; if we expand 1_{d^q} we obtain

1_{d^q} = Σ_{(k1:q)} e⊗k1:q = Σ_{(j1:q)∈IÐd,q} ẽ⊗j1:q ,

where, as we have seen, ‖ẽ⊗j1:q‖² is the number of vectors e⊗k1:q in the construction of ẽ⊗j1:q . The product

Q′d,q 1_{d^q} = ωd,q    (1.33)

is a vector in Rŋd,q with entries ‖ẽ⊗j1:q‖² = q!/ℓ(j1:q)! , since the matrix Qd,q is built up from the vectors ẽ⊗j1:q (see Remark 1.5). We will refer to ωd,q as norm-weights. Equation (1.33) follows from Lemma 1.5 above as well.

Consider the transformation y = Q′d,q x of an x ∈ Sd,q ; the dimension of y is ŋd,q .
Example 1.14 In particular, let d = 3 and q = 3; then ŋ3,3 = 10, and the triplication matrix is

Q3,3 = [ẽ⊗(1,1,1) , ẽ⊗(1,1,2) , ẽ⊗(1,1,3) , ẽ⊗(1,2,2) , ẽ⊗(1,2,3) , ẽ⊗(2,2,2) , ẽ⊗(1,3,3) , ẽ⊗(2,2,3) , ẽ⊗(2,3,3) , ẽ⊗(3,3,3)],

with corresponding types ℓ(j1:q)

((3, 0, 0) , (2, 1, 0) , (2, 0, 1) , (1, 2, 0) , (1, 1, 1) , (0, 3, 0) , (1, 0, 2) , (0, 2, 1) , (0, 1, 2) , (0, 0, 3)),

and norm-weights

ω3,3 = [3!/ℓ(k)!]_{k=1:ŋ3,3} = [1, 3, 3, 3, 6, 1, 3, 3, 3, 1] .

We see that for q = 3, the multi-index (k, k, k) occurs once and (i, j, k) with distinct
i, j , k, occurs 6 times.
It is more convenient to renumber the vectors of Q3,3 in the order they appear, i.e.

Q3,3 = [ẽ⊗1 , ẽ⊗2 , ẽ⊗3 , ẽ⊗4 , ẽ⊗5 , ẽ⊗6 , ẽ⊗7 , ẽ⊗8 , ẽ⊗9 , ẽ⊗10 ].

Now if x ∈ S3,3 , then

x = Σ_{k=1}^{10} xÐk ẽ⊗k .

We recall that the coordinates xÐk correspond to the coordinates of x in the Euclidean space R27 , and we have ẽ⊗′j ẽ⊗k = δjk ‖ẽ⊗k‖² ; hence,

y = Q′3,3 x = Σ_{k=1}^{10} ‖ẽ⊗k‖² xÐk ek = ω3,3 ⊙ xÐ ,

where ek denotes the standard unit vector in R10 , and ⊙ denotes the Hadamard (element-wise) product. In some cases, depending on the software at hand, it is more convenient to replace the Hadamard product ω3,3 ⊙ xÐ by Diag(ω3,3) xÐ , where Diag(ω3,3) is the diagonal matrix of the vector ω3,3 .

Consider the transformation y = Q′d,q x of an x ∈ Sd,q ; the dimension of y is ŋd,q , see (1.30), and

y = Q′d,q x = Q′d,q Σ_{(j1:q)∈IÐd,q} x(j1:q) ẽ⊗j1:q
  = Σ_{(j1:q)∈IÐd,q} ‖ẽ⊗j1:q‖² x(j1:q) ek(j1:q) = ωd,q ⊙ xÐ = Diag(‖ẽ⊗j1:q‖²) xÐ ,

where ek(j1:q) is a coordinate vector in Rŋd,q , with the same ordering k(j1:q) as xÐ .

The operator Q′d,q is valid for each vector in the Euclidean space R^{d^q} as well. If x ∈ R^{d^q} , then

x = Σ_{(i1:q)} x(i1:q) e⊗i1:q ,

and in calculating Q′d,q x we need the products

ẽ⊗′j1:q e⊗i1:q = Σ_{(k1:q)|p(k1:q)=(j1:q)} e⊗′k1:q e⊗i1:q ,

which equal 1 if (i1:q) is a permutation of (j1:q) and 0 otherwise. It follows that the coefficient of a coordinate vector ek(j1:q) in Rŋd,q is the sum of those x(k1:q) for which p(k1:q) = (j1:q),

y = Q′d,q x = Σ_{(j1:q)∈IÐd,q} ( Σ_{(k1:q)|p(k1:q)=(j1:q)} x(k1:q) ) ek(j1:q) .

Lemma 1.5 shows the connection with the symmetrization of x:

y = Σ_{(j1:q)∈IÐd,q} ‖ẽ⊗j1:q‖² x̃(j1:q) ek(j1:q) ,

where x̃(j1:q) denotes the distinct values of the symmetrized x; hence, we obtain

y = ωd,q ⊙ (Q+d,q Sd1q x) .

We summarize this result in the following Lemma.

Lemma 1.6 For an arbitrary x ∈ R^{d^q} we have

Q′d,q x = ωd,q ⊙ (Q+d,q Sd1q x) .


Remark 1.6 As far as x ∈ Sd,q , the transpose Q′d,2 of the q-plication matrix Qd,2 acts as

Q′d,2 x = ωd,2 ⊙ (Q+d,2 x) ,

and one can replace Q+d,2 x by the distinct values xÐ of x; hence, Q′d,2 x = ωd,2 ⊙ xÐ .

For any permutation p ∈ Pq , the commutator Kp does not change a q-symmetric vector w ∈ Sd,q ; therefore, we have

Q+d,q Kp w = Q+d,q w.

Hence on the space Sd,q we have the operator equality

Q+d,q Kp = Q+d,q .

If we restrict ourselves to Sd,q , it readily follows for the symmetrizer that

Q+d,q Sd1q w = Q+d,q w,

as well. At this point, it is important to understand the following. Although the operator Q+d,q acts on the q-symmetric space Sd,q and Q+d,q Sd1q = Q+d,q holds there, this does not mean that the two matrices Q+d,q Sd1q and Q+d,q are equal.

Example 1.15 Let w = [1 : 8]′; then Q+2,3 w = [1, 2, 4, 8]′, and Q+2,3 Sd1q w = [1, 10/3, 17/3, 8]′.
Moreover, on the space Rŋd,q we have

Iŋd,q = Q+d,q Qd,q = Q+d,q Sd1q Qd,q .

Now consider the connection between the symmetrizer and the q-plication; for any wÐ ∈ Rŋd,q and for any commutator Kp , we have

Kp Qd,q wÐ = Kp w = w = Qd,q wÐ ;

therefore on the space Rŋd,q we have

Kp Qd,q = Qd,q ;

hence, the symmetrizer fulfills the equality

Sd1q Qd,q = Qd,q

on Rŋd,q as well. Now

Sd1q Qd,q Q+d,q = Qd,q Q+d,q ,

and Sd1q = S′d1q = S²d1q is an idempotent matrix with rank ŋd,q . Hence, Sd1q = Qd,q Q+d,q .
d,q .

Nevertheless, we define S̃d1q = Qd,q Q+d,q , which acts on any w ∈ R^{d^q} so that it picks up the distinct elements of w, assuming w ∈ Sd,q first, and then q-plicates them. The result is S̃d1q w ∈ Sd,q . Again, if w ∉ Sd,q , then S̃d1q w ≠ Sd1q w, although both belong to Sd,q , i.e. both are q-symmetric.
Example 1.16 Let w = [1 : 8]′; then S̃d1q w = [1, 2, 2, 4, 2, 4, 4, 8], and Sd1q w = [1, 10/3, 10/3, 17/3, 10/3, 17/3, 17/3, 8] (see Example 1.10).
One possible check of whether w ∈ Sd,q is to test the equation S̃d1q w = w, since it is fulfilled only if w ∈ Sd,q .
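The difference between S̃d1q and Sd1q , and the q-symmetry check, can be reproduced numerically; the sketch below (ours, Python with NumPy, d = 2, q = 3) uses a selection-type elimination matrix Qplus together with the averaging symmetrizer, matching Examples 1.15 and 1.16.

    import itertools, math
    import numpy as np

    d, q = 2, 3
    idx = list(itertools.product(range(d), repeat=q))          # lexicographic multi-indices
    canon = [tuple(sorted(i)) for i in idx]                    # canonical representative of each index
    distinct = sorted(set(canon))                              # distinct (canonical) indices

    Q = np.array([[1.0 if canon[r] == c else 0.0 for c in distinct] for r in range(d ** q)])
    Qplus = np.array([[1.0 if idx[r] == c else 0.0 for r in range(d ** q)] for c in distinct])

    def S(w):                                                  # symmetrizer: average over axis permutations
        T = w.reshape((d,) * q)
        out = sum(np.transpose(T, p) for p in itertools.permutations(range(q)))
        return out.reshape(-1) / math.factorial(q)

    w = np.arange(1.0, 9.0)                                    # w = [1:8]'
    print(Qplus @ w)                                           # [1. 2. 4. 8.]                (Example 1.15)
    print(Qplus @ S(w))                                        # [1. 3.333... 5.666... 8.]
    print(Q @ (Qplus @ w))                                     # S~ w = [1 2 2 4 2 4 4 8]     (Example 1.16)
    print(np.allclose(Q @ (Qplus @ w), w))                     # False: w is not 3-symmetric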

1.4 Partitions and Diagrams

There are at least two different concepts of partitions and both refer to the process
of dividing an object into smaller sub-objects.
1. Integer Partitions: A partition of an integer n is a way to write it as a sum of
positive integers, such as 4 = 4, 4 = 3 + 1, 4 = 2 + 2, 4 = 2 + 1 + 1,
4 = 1 + 1 + 1 + 1.
2. Set Partitions: A partition of a set divides it into disjoint subsets. Let K be a collection of subsets of a given set; K is a partition of that set if the subsets in K are disjoint, non-empty, and their union is the whole set.
There are some connections between these concepts as we will see later. For
instance the number of all partitions of number 4 as we have seen is 5, and at the
same time the number of all partitions of set  = {1, 2, 3, 4}, see Partition Table 1.3
below, is 15. The integer partitions of 4 show the possible block cardinalities of a
set partitions of 4 elements.
We consider set partitions.
A set of n elements can be split into a set of disjoint subsets, i.e. it can be
partitioned. The set of n elements will correspond to set 1 : n = {1, 2, . . . , n}.
If K = {b1 , b2 , . . . , br }, where each bj ⊂ 1 : n, then K is a partition provided ∪bj = 1 : n, each bj is non-empty, and bj ∩ bi = ∅ (the blocks are pairwise disjoint) whenever j ≠ i. The subsets bj , j = 1, 2, . . . , r, are called the blocks of K. We
will call r (the number of the blocks in partition K), the size of K, and denote it by
|K| = r, and a partition with size r will be denoted by K{r} . Let us denote the set
of all partitions of the numbers 1 : n by Pn .
Partition matrices are very productive for handling set partitions. By using matrices, ideas and methods become transparent and lead naturally to algorithms. Below we give an algorithm for generating all partitions of the set 1 : n. The reason for doing so is to be able to generate partitions by a simple computer program.
Consider a set of n-dimensional row vectors uj = [uj,1 , uj,2 , . . . , uj,n ], (j = 1, 2, . . . , r), where uj,k is either one or zero and at least one entry is 1, i.e. the matrix

U = [ u1
      u2
      ⋮
      ur ]    (1.34)

is r × n, and its entries are only zeros and ones. Let the sum of the entries of each column be 1, i.e. the sum of the rows of matrix U (denoted by ΣU) be the row vector [1, 1, 1, . . . , 1] = 1_n :

ΣU = Σ_{j=1}^{r} uj = 1_n .    (1.35)

Such a vector system uj , j = 1, 2, . . . , r, i.e. the r × n matrix U, see (1.34), is called a partition matrix. It corresponds to the rth size partition K = {b1 , b2 , . . . , br } of the set 1 : n in the following way. Each row vector uj corresponds to a block bj of 1 : n with respect to the indices of the 1s in uj , i.e. a particular value k from 1 : n is in block bj if uj,k = 1, see Example 1.21 below.
To illustrate, let n = 1, then we have one element and one partition

U = u1 = [1] . (1.36)

Let n = 2, and we have a partition v 1 = [1, 1], with both elements in it, and
another partition with blocks u1 = [1, 0] , u2 = [0, 1] corresponding to the second
term of (1.37). (To avoid any possibility of confusion we used v 1 instead of u1 for
the first term as the elements are different.) So we have two matrices

U1 = [v1] = [1, 1] ,
U2 = [ u1 ; u2 ] = [ 1 0 ; 0 1 ] ,    (1.37)

associated with the partitions. The first matrix U1 is obtained by adding 1 to U =


[1] of (1.36) and the second matrix U2 is obtained by adding zero to the first row
and creating a new row vector with zero and one, to obtain u1 + u2 = [1, 1]. The
sequence of steps is shown in Tables 1.1 and 1.2.

Table 1.1 Partitions for n = 1 and n = 2

n = 1:  U1 = [1].
n = 2:  U1 = [1, 1]: add 1 in the extra column.
        U2 = [ 1 0 ; 0 1 ]: if we add 0, then we have to create a new row, so that the condition Σ uj = [1, 1] is satisfied.

Table 1.2 Partitions for n = 2 and n = 3

From U1 = [1, 1]:
  Inclusive (1): [1, 1, 1]. Add 1 in the extra column; the condition Σ uj = 1_3 is satisfied.
  Exclusive (2): [ 1 1 0 ; 0 0 1 ]. Add 0 and create a new row vector, keeping the condition Σ uj = 1_3.

From U2 = [ 1 0 ; 0 1 ]:
  Inclusive (3): [ 1 0 1 ; 0 1 0 ]. Add 1 and 0 in the extra column; the condition Σ uj = 1_3 is satisfied.
  Inclusive (4): [ 1 0 0 ; 0 1 1 ]. Add 0 and 1 in the extra column; the condition Σ uj = 1_3 is satisfied.
  Exclusive (5): [ 1 0 0 ; 0 1 0 ; 0 0 1 ]. Add [0, 0] in the extra column and create a new row, keeping Σ uj = 1_3.

Definition 1.3 If the new partition matrix has the same number of rows as the
previous matrix (with an extra column), we call the extension inclusive. If the
number of rows increases by one (with an extra column), we will call the extension
exclusive.
At each new stage we will have one exclusive extension to every partition of the
previous stage and the number of inclusive extensions is equal to the number of
rows of the previous matrix. We note that each matrix contributes to one partition
and each row of the matrix contributes to one block of the partition.
To obtain the above partitions we proceed by obtaining partitions for a set of
lower cardinality (n − 1) and by inclusion and exclusion arguments we can obtain
partitions for a set of cardinality n.
This procedure works in general as well. We add all the possible inclusive and
exclusive extensions to each partition K ∈ Pn−1 to obtain all the partitions of Pn
based on the partitions of Pn−1 . In this way we will obtain all the partitions of Pn
because if L ∈ Pn then one of the blocks, say b0 ∈ L, contains the element n.
If |b0 | = 1, then L is generated by exclusive extension from partition K ∈ Pn−1
containing the rest of the blocks of L. If |b0 | > 1, then L is generated by inclusive
extension from partition K ∈ Pn−1 containing all the blocks of L, but instead
of b0 block b1 = b0 \n, i.e. the element n is discarded from b0 ; in other words
b0 = b1 ∪ (n).
Now we can extend the results to the case n = 4 as we do in Table 1.3.

Table 1.3 Partitions for n = 4

From [1, 1, 1]:
  Inclusive: [1, 1, 1, 1].
  Exclusive: [ 1 1 1 0 ; 0 0 0 1 ].

From [ 1 1 0 ; 0 0 1 ]:
  Inclusive: [ 1 1 0 1 ; 0 0 1 0 ].
  Inclusive: [ 1 1 0 0 ; 0 0 1 1 ].
  Exclusive: [ 1 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ].

From [ 1 0 1 ; 0 1 0 ]:
  Inclusive: [ 1 0 1 1 ; 0 1 0 0 ].
  Inclusive: [ 1 0 1 0 ; 0 1 0 1 ].
  Exclusive: [ 1 0 1 0 ; 0 1 0 0 ; 0 0 0 1 ].

From [ 1 0 0 ; 0 1 1 ]:
  Inclusive: [ 1 0 0 1 ; 0 1 1 0 ].
  Inclusive: [ 1 0 0 0 ; 0 1 1 1 ].
  Exclusive: [ 1 0 0 0 ; 0 1 1 0 ; 0 0 0 1 ].

From [ 1 0 0 ; 0 1 0 ; 0 0 1 ]:
  Inclusive: [ 1 0 0 1 ; 0 1 0 0 ; 0 0 1 0 ].
  Inclusive: [ 1 0 0 0 ; 0 1 0 1 ; 0 0 1 0 ].
  Inclusive: [ 1 0 0 0 ; 0 1 0 0 ; 0 0 1 1 ].
  Exclusive: [ 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ].

1.4.1 Generating all Partitions

We have seen that generating partitions K = {b1 , b2 , . . . , br } is equivalent to generating row vectors u1 , u2 , . . . , ur , where each element of a vector is either one or zero depending on whether a particular value j is in the block or not. While generating the vectors u1 , u2 , . . . , ur we must note that the sum of the vectors must satisfy the linear constraint

Σ_{j=1}^{r} uj = 1_n ,

which expresses the fact that ∪bk = 1 : n. We note that the vectors u1 , u2 , . . . , ur are n-dimensional. To generate these vectors we proceed sequentially starting from n = 1. Suppose we have all the partitions Pn of the set 1 : n and we want to generate the partitions of Pn+1 . Each partition K = {b1 , b2 , . . . , br } ∈ Pn can be expanded into partitions of Pn+1 in two ways. In the first way, the new element n + 1 is added to a block bj , b̃j = bj ∪ (n + 1), and the new partition is Kj = {b1 , b2 , . . . , b̃j , . . . , br } ∈ Pn+1 , j = 1, 2, . . . , r. In this way we obtain r partitions of Pn+1 . The other way is to expand K by a new block containing the single element n + 1, K̃ = {b1 , b2 , . . . , br , (n + 1)} ∈ Pn+1 . A short sketch implementing this recursion is given below.
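A minimal sketch of this inclusive/exclusive construction (ours, plain Python; partitions are represented as lists of blocks rather than as matrices) is the following.

    def all_partitions(n):
        # all set partitions of {1, ..., n}, built by inclusive/exclusive extension
        parts = [[[1]]]                                   # the single partition of {1}
        for m in range(2, n + 1):
            new = []
            for K in parts:
                for j in range(len(K)):                   # inclusive: put m into block j
                    new.append([b + [m] if i == j else list(b) for i, b in enumerate(K)])
                new.append([list(b) for b in K] + [[m]])  # exclusive: open a new block {m}
            parts = new
        return parts

    print(len(all_partitions(4)))                         # 15, cf. Table 1.3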

1.4.2 The Number of All Partitions

When n = 4, see Table 1.3, we have 15 matrices, thus leading to 15 partitions. The numbers of partitions with a given number of blocks are as follows:

Number of blocks Number of partitions


1 1
2 7
3 6
4 1
Total: 15

We can give a general recursive formula for computing the number of partitions. Let Sn (r) denote the number of partitions of 1 : n with r blocks. These are called the Stirling numbers of the second kind, or Stirling partition numbers. Sn+1 (r) satisfies the recursive equation

Sn+1 (r) = Sn (r − 1) + rSn (r); (1.38)

observe the difference to (1.50), where we consider the partitions of pairs. The
explanation for the recursion (1.38) lies in the fact that one can generate the
partitions of {1, 2, . . . , n} with the help of the partitions of {1, 2, . . . , n − 1}. As we
have seen for n = 2, 3, 4 a partition with size r of {1, 2, . . . , n} can be derived by
the inclusive extension of a partition with size r of (1, 2, . . . , n − 1), the number
of these is r, and by the exclusive extension of a partition with size (r − 1) of
{1, 2, . . . , n − 1} the number of these in each case is 1, hence the recursive formula.
We note that the initial values of this recursion are Sn (n) = 1, Sn (1) = 1,
Sn (0) = 0, and Sn (r) = 0 if r > n.
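The recursion (1.38) translates directly into a short program (ours, plain Python).

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def stirling2(n, r):
        # Stirling numbers of the second kind via (1.38)
        if r == 0:
            return 1 if n == 0 else 0
        if r > n:
            return 0
        if r == 1 or r == n:
            return 1
        return stirling2(n - 1, r - 1) + r * stirling2(n - 1, r)

    print([stirling2(4, r) for r in range(1, 5)])   # [1, 7, 6, 1], summing to 15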
From (1.38) we obtain the following cases:
Case 1.1 Let n = 1,

r = 1, S1 (1) = 1.

Case 1.2 Let n = 2,

r = 1, S2 (1) = S1 (0) + S1 (1) = 1,


r = 2, S2 (2) = S1 (1) + 2S1 (2) = 1.

Case 1.3 Let n = 3,

r = 1, S3 (1) = S2 (0) + S2 (1) = 1,


r = 2, S3 (2) = S2 (1) + 2S2 (2) = 3,
r = 3, S3 (3) = 1.

If n = 3, the total number of partitions is 5.


Case 1.4 Now consider n = 4, using the following table—see above as well

r = 1, S4 (1) = 1,
r = 2, S4 (2) = S3 (1) + 2S3 (2) = 7,
r = 3, S4 (3) = S3 (2) + 3S3 (3) = 6,
r = 4, S4 (4) = 1.

The total number of partitions is 15.


The above recursive formula (1.38) helps us to check that the terms in the
partitions we are recursively collecting are correct. Summing up the numbers Sn (r)
we obtain the number of all possible partitions of 1 : n,

Bn = Σ_{r=1}^{n} Sn (r).    (1.39)

Bn is called the nth Bell number.
The Bell numbers fulfill the following recursion:

Bn+1 = Σ_{k=0}^{n} C(n, k) Bn−k = Σ_{k=0}^{n} C(n, k) Bk ,   B0 = 1.    (1.40)

The proof of this is as follows.


Bn+1 is the number of all possible partitions of (1, 2, . . . , n + 1). Now the element n + 1 lies in a block of cardinality k + 1, 0 ≤ k ≤ n; such a block contains k elements from 1 : n; therefore, the total number of choices is C(n, k). Once we have chosen a block with the element n + 1 and k elements from 1 : n, the remaining n − k elements can be partitioned in Bn−k ways in total. Hence the formula (1.40).

Example 1.17 If n = 4, B4 = 1 + 3 · 1 + 3 · 2 + 1 · 5 = 15, if n = 5, B5 =


1 + 4 + 6 · 2 + 4 · 5 + 15 = 52.
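The recursion (1.40) is equally easy to program; the following sketch (ours, plain Python) reproduces the values of Example 1.17.

    from math import comb

    def bell(n):
        # Bell numbers via (1.40): B_{m+1} = sum_k C(m, k) B_k, B_0 = 1
        B = [1]
        for m in range(n):
            B.append(sum(comb(m, k) * B[k] for k in range(m + 1)))
        return B[n]

    print(bell(4), bell(5))   # 15 52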
The generating function (see Corollary 2.5, p. 66) of the Bell numbers is

Σ_{k≥0} Bk x^k / k! = exp(e^x − 1).

Definition 1.4 (Type of a Partition) Consider a partition K{r} = {b1 , b2 , . . . , br } ∈ Pn with size r. Let us denote by kj the cardinality of a block in partition K, i.e. kj = |bj |. The type of a partition K{r} is ℓ = [ℓ1 , . . . , ℓn ] if K{r} contains exactly ℓj blocks with cardinality j. A partition with size r and type ℓ will be denoted by K{r|ℓ} .
It is clear that ℓj ≥ 0, Σj j ℓj = n, and Σj ℓj = r. Naturally, some ℓj 's are zero.
Example 1.18 Let n = 4; the possible sizes r of partitions K are r = 1, 2, 3, 4. If r = 1, then the only block is b = (1 : 4), and ℓ = [0, 0, 0, 1]; if r = 2, then we have two possible types of partitions K = {b1 , b2 }, namely ℓ1 = [1, 0, 1, 0] and ℓ2 = [0, 2, 0, 0]; if r = 3, then ℓ = [2, 1, 0, 0] is the only type of K = {b1 , b2 , b3 }. Finally, if r = 4, then we have K = {b1 , b2 , b3 , b4 } where each block has one element, so the type is ℓ = [4, 0, 0, 0].
The number of all partitions K{r} of an n-element set with block cardinalities kj and with type ℓ is

Nℓ = n! / (∏_{j=1}^{n} ℓj ! ∏_{j=1}^{r} kj !) = n! / (∏_{j=1}^{n} ℓj ! (j!)^{ℓj}) ,    (1.41)

note 0! = 1, and the latter formula will be used throughout below. Observe that if all ℓj equal either 0 or 1, in other words n is divided into r distinct parts, n = j1 + . . . + jr , jk ≠ jm if k ≠ m, then Nℓ coincides with the multinomial coefficient

Nℓ = C(n; j1 , . . . , jr ).    (1.42)
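Formula (1.41) can be evaluated directly from a type vector; a small sketch (ours, plain Python) checks Example 1.19 below.

    from math import factorial

    def n_of_type(ell):
        # number N_ell of partitions with type ell = [ell_1, ..., ell_n], by (1.41)
        n = sum(j * lj for j, lj in enumerate(ell, start=1))
        den = 1
        for j, lj in enumerate(ell, start=1):
            den *= factorial(lj) * factorial(j) ** lj
        return factorial(n) // den

    print(n_of_type([1, 0, 1, 0]))   # 4
    print(n_of_type([0, 2, 0, 0]))   # 3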

Incomplete Bell Polynomials Consider a partition K{r} = {b1 , b2 , . . . , br } ∈ Pn , again with size r. The minimum cardinality of the blocks is 1 in any partition. The maximum cardinality of a block contained in a partition K{r} with size r is n − r + 1. This happens in the case in which r − 1 blocks contain 1 element and one block has n − r + 1 elements. We can characterize K{r|ℓ} by the product x1:n^ℓ = ∏_{j=1}^{n} xj^{ℓj} . This shows that there are ℓj blocks with cardinality j. If K{r} is of size r, then the power ℓj of xj is necessarily 0 for j > n − r + 1; hence, we can construct a polynomial in the unknowns x1 , . . . , xn−r+1 with coefficients Nℓ summarizing all partitions K{r} with size r:

Bn,r (x1 , . . . , xn−r+1 ) = Σ_ℓ Nℓ x1:n^ℓ = Σ_ℓ n! / (∏_{j=1}^{n−r+1} ℓj ! (j!)^{ℓj}) ∏_{j=1}^{n−r+1} xj^{ℓj}
  = n! Σ_ℓ ∏_{j=1}^{n−r+1} (1/ℓj !) (xj /j!)^{ℓj} ,

where the sum Σ_ℓ is taken over all types ℓ with ℓj ≥ 0, Σ_{j=1}^{n−r+1} ℓj = r, and Σ_{j=1}^{n−r+1} j ℓj = n.
Polynomials Bn,r are called incomplete (exponential) Bell polynomials. These
polynomials provide a compact form of partitions with fixed size.
The following examples will clarify these ideas.
Example 1.19 If n = 4, the number of all partitions with type ℓ = [1, 0, 1, 0] is 4!/3! = 4. That is, ℓ1 = 1 (one block with one element, k1 = 1), ℓ2 = 0 (no blocks with 2 elements), and ℓ3 = 1 (one block with k2 = 3 elements). Compare with Table 1.3.
In the following example we consider the complete picture for n = 5.
Example 1.20 Let n = 5; the possible sizes r of partitions K are r = 1, . . . , 5.
1. If r = 1, then there is one partition with one block and type ℓ1 = [0, 0, 0, 0, 1], Nℓ1 = 1, S5 (1) = 1,

B5,1 (x1 , . . . , x5 ) = x5 .

2. If r = 2, then we have pairs of blocks constituting a partition with j and k elements, j + k = 5; both ℓj and ℓk are necessarily 1, since j ℓj + k ℓk = 5. The possible types are ℓ1 = [1, 0, 0, 1, 0], with Nℓ1 = 5!/4! = 5, and ℓ2 = [0, 1, 1, 0, 0], with Nℓ2 = 5!/(2!3!) = 10.

   ℓ                  Nℓ
   [1, 0, 0, 1, 0]    5!/4! = 5
   [0, 1, 1, 0, 0]    5!/(2!3!) = 10

The number of all partitions of size 2 is S5 (2) = S4 (1) + 2S4 (2) = 15 (see (1.38) and Table 1.4).

B5,2 (x1 , . . . , x4 ) = 5x1 x4 + 10x2 x3 .

3. If r = 3, then we have partitions with 3 blocks and types:

   ℓ                  Nℓ
   [1, 2, 0, 0, 0]    5!/(2!2!2!) = 15
   [2, 0, 1, 0, 0]    5!/(2!3!) = 10

S5 (3) = S4 (2) + 3S4 (3) = 25,

B5,3 (x1 , x2 , x3 ) = 15x1 x2² + 10x1²x3 .

4. If r = 4, we have partitions with 4 blocks and the only type is ℓ1 = [3, 1, 0, 0, 0], Nℓ1 = 5!/(2!3!) = 10, S5 (4) = 10,

B5,4 (x1 , x2 ) = 10x1³x2 .

5. If r = 5, then we have one partition with 5 blocks and the only type is ℓ1 = [5, 0, 0, 0, 0], Nℓ1 = 5!/5! = 1, S5 (5) = 1,

B5,5 (x1 ) = x1⁵ .

Finally, the number of all partitions is B5 = 52, and the complete Bell polynomial is

B5 (x1 , . . . , x5 ) = x1⁵ + 10x1³x2 + 10x1²x3 + 15x1 x2² + 5x1 x4 + 10x2 x3 + x5 .    (1.43)

Notice: B5 = B5 (1, . . . , 1).


Definition 1.5 (Complete Bell Polynomials) The (complete) Bell polynomials Bn are defined as the sum of the incomplete Bell polynomials,

Bn (x1 , . . . , xn ) = Σ_{r=1}^{n} Bn,r (x1 , . . . , xn−r+1 ) = Σ_{r=1}^{n} Σ_ℓ n! / (∏_{j=1}^{n−r+1} ℓj !) ∏_{j=1}^{n−r+1} (xj /j!)^{ℓj} ,    (1.44)

where, for each r, the inner sum Σ_ℓ runs over the types ℓ with Σ ℓj = r and Σ j ℓj = n.
It is worth noting that Bell polynomials can be put in a direct form without
preliminary calculation of incomplete Bell polynomials, see (A.1), p. 352.

1.4.3 Canonical Partitions

The correspondence between partitions K and matrices U is not unique because one
can permute the rows of U without changing partition K. Therefore we introduce

an ordering for the rows of U :

uj ≤ uk

if

Σ_{m=1}^{n} uj,m 2^{−m} ≤ Σ_{m=1}^{n} uk,m 2^{−m} .    (1.45)

Assume that a matrix U has type ℓ and denote it by Uℓ . Matrix Uℓ is in canonical form if the rows with larger cardinality come before those with smaller cardinality. The rows with the same cardinality are in decreasing order with respect to the ordering (1.45). Now it is easy to see that the correspondence between partitions and matrices Uℓ in canonical form is unique. A permutation matrix corresponds to the type ℓ = [n, 0, . . . , 0]. If a permutation matrix is in canonical form, then it is In , where In is the unit matrix with dimension n. Actually, any permutation matrix corresponds to the same partition with a single element in each block.
Definition 1.6 The matrix U of size r × n (r ≤ n) with properties
• each column of U contains a single nonzero element which is 1,
• it is in canonical form
will be called a partition matrix.
If U is a partition matrix, then u1,1 = 1.
The ordering of the rows of a partition matrix implies an ordering of the blocks of
that partition. If U is a partition matrix, then not only the rows of U are in decreasing
order but the indices of 1s inside a row are in increasing order. For instance, let
! "
1010
U= ,
0101

and then the corresponding partition K = {b1 = (1, 3) , b2 = (2, 4)}. In this way
we can order both the blocks of a partition and the elements in a block.
Definition 1.7 A partition K is in canonical form if the blocks of K with larger
cardinality come before the smaller cardinality, similarly to the partition matrix.
The order of the blocks of the same cardinality is alphabetical and elements in each
block are increasing.

1.4.4 Partitions and Permutations

There is a unique permutation p (U) with respect to the rows uj , j = 1, . . . , r, of a


canonical partition U, defined in the following way.

Let us consider the column vector u = [u1 , u2 , . . . , ur ] = vecU of dimension


rn and construct the corresponding permutation in the following way. Let p(1) = 1
(equals the index of the first nonzero entry in u1 ); then p(2) equals the index of
the second 1 in u by mod (n) unless the modulus is 0 in which case p(2) = n,
and so on. In general suppose that the j th nonzero entry of vector u is at place
h = (k − 1) n + h1 , then p(j ) = h1 , notice h1 = n if h = 0 mod (n), and therefore
h1 ∈ 1 : n. The permutation matrix with respect to the permutation p (U) will have
1 at place (j, k) if the column k of U contains 1 that is the j th 1 in u.
If K is in canonical form, then p (U) = p (K) = (b1 . . . br ), i.e. we list
the numbers of 1 : n in the order of their occurrence in the blocks. If we
separate the permutation with respect to the blocks p (K) = (b1 | . . . |br ), then the
correspondence is unique.
Example 1.21 If n = 3, then the rth size partitions L ∈ P3 with the corresponding
canonical matrices and the permutations of 1 : 3 are:
1. For r = 1; L = {(1 : 3)}

L ↔ U1 = [1, 1, 1] , p (U1 ) = p (L) = (1 : 3) .

2. For r = 2; L1 = {(2, 3), (1)}, L2 = {(1, 3), (2)}, L3 = {(1, 2), (3)},
! "
011  
L1 ↔ U2,1 = , p U2,1 = p (L1 ) = (231) ,
100
! "
101  
L2 ↔ U2,2 = , p U2,2 = p (L2 ) = (132) ,
010
! "
110  
L3 ↔ U2,3 = , p U2,3 = p (L3 ) = (123) ,
001
 
where p Lj lists the element of the blocks without separation.
3. For r = 3; L = {(1), (2) , (3)}
⎡ ⎤
100
L ↔ U3 = I = I3 = ⎣ 0 1 0 ⎦ , p (U3 ) = p (L) = (1 : 3) .
001

Note that permutations depend on partitions; therefore, we shall also use the
notation p (L). It can be seen that for a given permutation, say p = (1 : 3) one can
place several, actually 3, partitions. If the size of a partition is fixed at, say r = 2,
then partitions correspond to distinct permutations.
If the size of partitions is given, then we have a unique correspondence between
partitions and permutations. This will be satisfactory for later relationship between
partitions and permutations.

In that way we have a unique correspondence between partitions and permuta-


tions.

1.4.5 Partitions with Lattice Structure

It is quite clear that summing up some rows of a partition matrix U will result in
a partition again. Actually that operation implies the union of the corresponding
blocks in a partition L. Moreover, we can keep the canonical form replacing the
sum in the place of the first row in the summation. One can see that each partition
can be obtained by summing up some rows of U = In . This is the finest partition.
To be more precise, consider the set Pn of all the partitions of 1 : n and define a
partial order on Pn . In this case, one says that partition K comes before, or is finer
than, partition L (K is contained in partition L), and K  L whenever the blocks
of L are the unions of the blocks of K; clearly L is coarser. The coarsest, i.e. the
largest partition in that sense, is the partition with one block On = {1 : n} and the
finest, i.e. the smallest one, is the partition with n blocks In = {(1), (2), . . . , (n)}.
Each set of elements has a least upper bound and a greatest lower bound, so that it
forms a lattice.
Example 1.22 Partition matrix
⎡ ⎤
1001
U = ⎣0 1 0 0⎦,
0010

i.e. U = K = {(1, 4) , (2) , (3)}, is finer than L = {(1, 2, 4) , (3)}, and the only
partition that is finer than K is I. It can be seen that summing the first and second
rows results in L. Similarly, to get a finer partition than K one has to split the first
row of K.
Splitting up a row generates a finer partition. If we consider all possible splits of
a row, then the result is like generating all the possible partitions of the elements of
that row. The following general result provides an algorithm for generating all finer
partition of a partition L ∈ Pn .
The union K ∪ L of partitions K and L is defined as the smallest partition that
contains both of them. Actually, the blocks b of the partition K ∪ L are defined by the condition that the elements j and k are in the same block b if and only if there exists a chain j = ℓ1 , ℓ2 , . . . , ℓp = k such that for 1 ≤ m < p, ℓm and ℓm+1 are in the same block of K or L (see Example 1.23).
to the partitions L and K, respectively. Then V is finer than U, V  U, if each row
of U can be obtained by summing up some rows of V. The union K ∪ L ↔ V ∪ U is
given by summing up the minimal number of rows of both partition matrices until
V ∪ U contains both U and V. Similarly On ↔ 1 and I ↔ In , where In is the

unit matrix with dimension n. The conclusion of the above idea is that set Pn can
be dressed up to the lattice structure. The lattice structure needs a partial ordering
and two operations between the elements. In our case set Pn already has the partial
ordering () and the operation ∪. Now we define the intersection K ∩ L as another
operation, the coarsest common refinement. The elements j and k of 1 : n are in the
same block of K ∩ L if and only if j and k are in the same block as both K and L.
The intersection K ∩ L includes all possible non-empty intersections of the blocks
of K with the blocks of L.
Example 1.23 Let n = 10, K = {(1, 2) , (3, 4) , (5, 6, 7, 8) , (9) , (10)} and
L = {(1, 2, 4) , (3, 6, 7) , (5, 8) , (9, 10)}. Then K ∪ L = {(1, 2, 3, 4, 5, 6, 7, 8) ,
(9, 10)}, and
K ∩ L = {(1, 2) , (3) , (4) , (5, 8) , (6, 7) , (9) , (10)}. An instance: 2 and 3 are in the same block of K ∪ L since there is the chain j = ℓ1 = 2, ℓ2 = 4, ℓ3 = 3 = k.
Example 1.24 Consider a partition L ∈ P4 , with two blocks a1 = (1, 4),
and a2 = (2, 3). The partitions of set {1, 4} consisting of elements of a1 are
K11 = (1, 4) , K21 = {(1) , (4)}. Partitions with respect to a2 are K12 = (2, 3),
K22 = {(2) , (3)}. Now the union of K11 and K22 , say, is K11 ∪K22 = {(1, 4) , (2) , (3)}.
The union K11 ∪ K22 ∈ P4 is a partition of set {1, 2, 3, 4} and K11 ∪ K22  L. We see
that any union Ki1 ∪ Kj2 , i, j = 1, 2, is finer than L; moreover, if a partition K is
finer than L, then it is of the form Ki1 ∪ Kj2 .

1.4.6 Indecomposable Partitions

Definition 1.8 We shall call a partition K indecomposable with respect to the


partition L if K ∪ L = O.
Notice here, at once, that this definition is symmetric, namely: K is indecompos-
able with respect to partition L if and only if L is indecomposable with respect to
partition K.
It is easy to see that K and L are indecomposable if and only if there are no two
non-empty blocks of (1 : n) such that each of them can be composed as the union
of the blocks of both partitions K and L, because in that case K ∪ L = O (see
(1.46)). Notice that one non-empty block always exists, and it is O. One can see that
any partition indecomposable with respect to O forms the set of all partitions, and
the only partition that is indecomposable with respect to I is O.
An equivalent definition for indecomposable partitions can be given as follows.
Let K and L be two partitions. Apply some two way table for the definition of
indecomposable partitions. Consider a partition L = {a1 , a2 , . . . , ak } and construct
a table from the blocks of L. Each row of the table contains elements of the block
of L. In this way we correspond a table with k rows to partition L. The length of the
rows is the cardinality of blocks, and we denote the rows by aj as well. This  table
is not necessarily a matrix. Now consider a partition K = b1 , b2 , . . . , bj . We say

that two blocks b1 and b2 of K hook with respect to L if there exists a block of
L containing at least one element from each of the blocks b1 and b2 . Two blocks
b1 and b2 of K communicate with respect to partition L if there exists a series of
rows (blocks) am1 = bi1 , am2 , . . ., ams = bi2 of K, such that amj and amj+1 hook
with respect to L for j = 1, 2, . . . , s − 1. Now one can define a partition K is
indecomposable if all blocks communicate. Partition K is indecomposable, in that
sense, if and only if there are no blocks bi1 , bi2 , . . ., bis of K, s < |K| and rows
(blocks) au1 , au2 , . . . , aut of L, where |K| denotes the number of blocks in K, such
that

bi1 ∪ bi2 ∪ . . . ∪ bis = au1 ∪ au2 ∪ . . . ∪ aut . (1.46)

Notice that s < |K| guarantees that there are no two non-empty blocks of (1 : n),
such that each of them can be composed as the union of the blocks of both partitions
K and L, since O can always be considered a block that is the union of the blocks
of both K and L.
We conclude that a partition K is indecomposable with respect to partition L
if all blocks of K communicate. Observe again that the roles of K and L are
interchangeable considering formula (1.46). A partition K is also referred to as
connected with respect to partition L if K and L are indecomposable.
Example 1.25 Suppose that the partition L of 1 : n contains n − 1 blocks (|L| =
n − 1), then necessarily there are n − 2 blocks of individual elements, say a1 = (1),
a2 = (2), . . ., an−2 = (n − 2) and one with a pair an−1 = (n − 1, n). It can be
seen that there are no blocks of any partition K that hook with respect to blocks of
L with individual elements. Therefore, the only possibility for blocks b1 and b2 of
K to communicate with respect to L is that one of them should contain the element
n − 1 and the other should contain the element n.
The only way that partition K be indecomposable with respect to partition L is
that K contains either one (K = O ) or two blocks (|K| = 2) only, b1 and b2 , say,
and the elements n − 1 and n belong to different blocks, i.e. b1 and b2 hook with
respect to the block an−1 = (n − 1, n).
Example 1.26 All the indecomposable partitions of (1, 2, 3, 4) with respect to par-
tition L = {(1, 2), (3, 4)} are K1 = {(1, 2, 3, 4)}, K2 = {(1, 2, 3), (4)} and K3 =
{(1, 2, 4), (3)}, K4 = {(1, 4, 3), (2)}, K5 = {(4, 2, 3), (1)}, K6 = {(1, 3), (2, 4)},
K7 = {(1, 4), (2, 3)}, K8 = {(1, 3), (2) , (4)}, K9 = {(1) , (4) , (2, 3)}, K10 =
{(1, 4), (2) , (3)}, K11 = {(2, 4), (1) , (3)}.
For the mathematically minded reader, we give a graph-theoretic interpretation
of the analysis of indecomposable partitions.
Graph A graph G = (V , E) consists of two sets, a finite set V of elements called
vertices and a finite set E of elements called edges. Each edge is identified with a
pair of vertices; if e ∈ E is an edge, then e = (v1 , v2 ), where v1 and v2 are (not
necessarily different) vertices in V .

We can assign a graph G to the partitions K and L. We consider a partition K (a


set V ) as a set of vertices. The edges (with respect to a partition L) are as follows.
Let b1 and b2 be two blocks of K that define an edge if they hook with respect to
L. This means that there exists a block a ∈ L, such that it contains an element from
b1 and an element from b2 as well. Note here, it is possible that a block in K may
hook with itself. See Examples 3.23, p. 145 and 3.24, p. 145 for applications. If set
K1 contains only one block, say b, which will hook with itself necessarily, it gives
a vertex with a self-loop. Two vertices b1 and b2 (of K, say) communicate if they
are connected in the graph, i.e. if there exists a path between them. Partition K is
indecomposable with respect to L, if and only if any vertex of K can be reached
from any other vertex through the edges defined by L; in other words the graph is
closed.

1.4.7 Alternative Ways of Checking Indecomposability

Given an initial partition L, it is important to check whether a given partition K


is an indecomposable partition with respect to L. We have seen earlier that we can
associate a matrix (with each row corresponding to a block) with every particular
partition, and each row containing either 1 or a 0 depending on whether a specific
element is present or absent in the block. Let us consider the initial partition L =
{(1) , (2) , (3, 4)} (see also Example 3.24, p. 145). The number of variables is four,
and the matrix we define will hence have four columns, with which we associate the
matrix
⎡ ⎤
1000
U = ⎣0 1 0 0⎦.
0011

When rows are summed up we obtain the row vector 1 = [1, 1, 1, 1] . The partition
corresponding to the matrix O = (1, 1, 1, 1) is always indecomposable, as this is
the partition with a single element (block), which, by definition, is indecomposable.
This is the smallest partition one can achieve. So partition K1 = {(1, 2, 3, 4)} is
indecomposable. Now consider partition K2 = {(1, 3) , (2, 4)}. We associate the
matrix
! "
1010
V2 = ,
0101

where the first row corresponds to the block (1, 3) , and the second row corresponds
to the block (2, 4) . Consider the product of partition matrices,
!"
 101
V2 U = .
011

Lemma 1.7 Partition K is decomposable with respect to the initial partition L if


the matrices V and U associated with K and L, respectively, satisfy the condition
V U′ = [ A 0 ; 0 B ] ,    (1.47)

i.e. the product is a block diagonal matrix. If the condition (1.47) is not satisfied,
then partition K is indecomposable with respect to L.
In the case of the above example we see V2 U is not in a block diagonal form;
hence, partition K2 is indecomposable with respect to L. In fact the condition (1.47)
is a necessary and sufficient condition for partition K to be decomposable with
respect to L.
There is another way to demonstrate the meaning of indecomposable partitions.
Consider a graph (K, L) with vertices of blocks of K and edges with respect to
L, i.e. two vertices (blocks) b1 and b2 of K are connected if there exists a block
of L containing at least one to one element from each of the blocks b1 and b2 .
Now the partitions K and L are indecomposable if and only if any vertex can be
reached from any other vertex, in other words if the graph is closed. One can check
whether the partitions K and L are indecomposable in the following way. Take
partition matrices U and V corresponding to the partitions L and K, respectively,
and consider the product VU . Notice that an entry ci,j of VU is not zero if and
only if the ith row of V and the j th row of U have at least one nonzero entry at the
same place, i.e. the ith block of K and the j th block of L have at least one common
element. The partitions K and L are indecomposable if and only if one cannot find
blocks A and B such that VU′ has the block diagonal form

[ A 0 ; 0 B ]    (1.48)

with a possible reordering of the rows and columns of VU′ , i.e. there exist permutation matrices P1 and P2 of appropriate orders such that

VU′ = P1 [ A 0 ; 0 B ] P2 .    (1.49)

Reordering is admissible; hence, for instance, permuting the rows of VU corre-
sponds to permuting the rows of V, which will not change partition K. Remember
that the partitions K and L are indecomposable if and only if L and K are
indecomposable. Note that K and K are never indecomposable unless K = O.
Suppose that partition L is given. The problem of finding all partitions K that
are indecomposable with respect to L is equivalent to the problem of finding all
partitions whose blocks are connected by L.

Example 1.27 All the indecomposable partitions of (1, 2, 3) with respect to par-
tition L = {(1), (2, 3)} are K1 = {(1, 2, 3)}, K2 = {(1, 3), (2)}, and K3 =
{(1, 2), (3)}. In other words the block (2, 3) of L is able to ”connect” the blocks
of Kj , j = 1, 2, 3.
It is easy to see that the partitions K and L are indecomposable if and only if the rows and columns of the matrix W = VU′ are connected, i.e. the question is whether the matrix W1 = WW′ has the form (1.49); from then on, at each new stage we take the matrix product Wk = W²_{k−1} (W1 , and therefore all Wk , are symmetric). If the rows and columns of the matrix W = VU′ are connected, then in log2 (|W|) steps, where |W| denotes the size of W, all the entries of Wk will be positive. Otherwise, if the rows and columns of the matrix W = VU′ are not connected, i.e. it has the form

W = P1 [ A 0 ; 0 B ] P2 ,

then this property carries over to all Wk . Indeed, we obtain

W1 = WW′ = P1 [ A 0 ; 0 B ] P2 P′2 [ A′ 0 ; 0 B′ ] P′1 = P1 [ A1 0 ; 0 B1 ] P′1 ,

where A1 = AA′ , B1 = BB′ , and observe that a permutation matrix is orthogonal, P2 P′2 = I. The consequence of this note is a simple procedure for checking the indecomposability of the partitions K and L.
indecomposability of the partitions K and L.
Example 1.28 Let partition L = {(1, 2), (3, 4, 5)} with partition matrix

U = [ 1 1 0 0 0 ; 0 0 1 1 1 ] ;

then L will "connect" the blocks of K = {(1), (2, 3) , (4) , (5)} with partition matrix

V = [ 1 0 0 0 0 ; 0 1 1 0 0 ; 0 0 0 1 0 ; 0 0 0 0 1 ] ,

and therefore they are indecomposable. Also, it can be seen that

U V′ = [ 1 1 0 0 ; 0 1 1 1 ] ,

which cannot be put into the form (1.48) because of the connection of the rows. Now let us take the next step,

W1 = (U V′)(U V′)′ = [ 2 1 ; 1 3 ] ,

and we obtain that all the entries are positive.
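The whole procedure can be automated; the sketch below (ours, Python with NumPy; the function name indecomposable is ad hoc) forms W from the partition matrices and squares it until either all entries are positive or the block structure persists, reproducing the conclusion of Example 1.28.

    import numpy as np

    def indecomposable(V, U):
        # K (matrix V) is indecomposable w.r.t. L (matrix U) iff the repeated products
        # W_k = W_{k-1} W_{k-1}', starting from W = V U', become entrywise positive
        W = V @ U.T
        W = W @ W.T
        for _ in range(W.shape[0]):
            if np.all(W > 0):
                return True
            W = W @ W.T
        return False

    U = np.array([[1, 1, 0, 0, 0],
                  [0, 0, 1, 1, 1]], dtype=float)       # L = {(1,2), (3,4,5)}
    V = np.array([[1, 0, 0, 0, 0],
                  [0, 1, 1, 0, 0],
                  [0, 0, 0, 1, 0],
                  [0, 0, 0, 0, 1]], dtype=float)       # K = {(1), (2,3), (4), (5)}
    print(indecomposable(V, U))                        # True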


Example 1.29 The indecomposable partitions with respect to L = {(1), (2, 3) ,
(4) , (5)} are those K that contain maximum two blocks. If the size |K| = 2, then
one block should contain the element 2 and the other one the element 3; hence the
only block of L that ”connects” the blocks of K is the block (2, 3).

1.4.8 Diagrams

The characteristic function of a Gaussian random variable depends only on the mean and covariances. Therefore all the higher-order moments and cumulants of Gaussian random variables can be expressed by the first- and second-order ones. So the partitions into pairs of the set 1 : n have particular importance. The set of all such partitions will be denoted by PIIn and the elements of PIIn are KII . Let us denote the number of all partitions in PII2n by SII2n , which can be calculated in the following way. The recursion from 2n to 2(n + 1) gives SII2n exclusive extensions and 2nSII2n inclusive extensions; hence, the formula

SII2(n+1) = SII2n + 2nSII2n = (2(n + 1) − 1) SII2n = . . . = (2(n + 1) − 1)!!    (1.50)

is valid. The number of all possible partitions into pairs of the set 1 : n for even n is

SIIn = (n − 1)!! = n! / (2^{n/2} (n/2)!) ,    (1.51)

and we let SIIn = 0 for odd n. The set PIIn is considered empty if n is odd, and it has (n − 1)!! elements if n is even.
The partition matrix VII corresponding to K II has exactly n/2 rows with two
1s in each row. The number of all such matrices (in canonical form) is (n − 1)!!
because the canonical form implies that the place of the first 1 in each row is fixed.
The second 1 can be put into the remaining places of the first row, which is n − 1
possible choices; then n − 3 places remain in the second row and so on.  
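The count (n − 1)!! is easy to confirm by brute force; the sketch below (ours, plain Python) enumerates all pair partitions recursively and compares with (1.51).

    from math import factorial

    def pair_partitions(elems):
        # all partitions of elems into pairs (none if the number of elements is odd)
        if not elems:
            yield []
            return
        if len(elems) % 2:
            return
        first, rest = elems[0], elems[1:]
        for j in range(len(rest)):
            for tail in pair_partitions(rest[:j] + rest[j + 1:]):
                yield [(first, rest[j])] + tail

    n = 6
    count = sum(1 for _ in pair_partitions(tuple(range(1, n + 1))))
    print(count, factorial(n) // (2 ** (n // 2) * factorial(n // 2)))   # 15 15, i.e. (n - 1)!!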
Now if L is an arbitrary initial partition of 1 : n and K II ∈ PII n, then L, K
II

will be called a diagram (Gaussian diagram as well). Actually L, K II can be


regarded as a graph with vertices b ∈ L, where b is a block of 1 : n, and edges
(k1 , k2 ) ∈ K II . The edge (k1 , k2 ) connects vertices b1 and b2 if k1 ∈ b1 and k2 ∈

 
b2 . The diagram is also given by two partition matrices U, VII corresponding to
 
L, K II .

1.4.8.1 Closed Diagrams Without Loops

A graph is called closed if any vertex can be reached from any other vertex, i.e. the partitions L and KII are indecomposable. A diagram (L, KII) has a loop if there is a block a ∈ KII and a block b ∈ L such that b contains a. See the Example below.
The diagram (L, KII) will be called closed and without a loop if the graph defined by it is closed (indecomposable) and has no loop, respectively. We shall use the notations (L, KII)cl , "cl" for closed, and (L, KII)cnl , "cnl" for closed no loop, accordingly. It can be seen that the rows of the matrix product VII U′ contain only either 1s or 2s besides 0s. The diagram is without a loop if VII U′ has no entry 2 at all. The diagram is closed if the matrix product VII U′ cannot be partitioned, with a possible reordering of its rows and columns, in the following way:

VII U′  →(reordering)  [ VII1 U′1 0 ; 0 VII2 U′2 ] ,

where Uj and VIIj , j = 1, 2, are appropriate partition matrices.
Example 1.30 The diagram with respect to the partition L = {(1), (2, 3, 4)},

L ↔ U = [ 1 0 0 0 ; 0 1 1 1 ] ,

and the partition of pairs KII = {(1, 2) , (3, 4)},

KII ↔ VII = [ 1 1 0 0 ; 0 0 1 1 ] ,

is closed and has a loop, since we have an entry 2 in

VII U′ = [ 1 1 ; 0 2 ] .
 
Let L = {bj | j = 1 : p} be given; then the number S of all possible diagrams (L, KII) (actually the number of partitions KII ) depends on the cardinalities |bj | = rj of the blocks bj ∈ L, and we denote S by SII(r1:p).
Case 1.5 We start with a case in which
 the initial partition L contains two blocks,
i.e. L = {b1 , b2 }, then the diagram L, K II without loop exists only if |b1 | = |b2 |,
and then partition K II contains only pairs (k1 , k2 ) such that k1 ∈ b1 and k2 ∈ b2 .

One can obtain all such types of partitions K II by fixing the components of b1
and permuting the components of b2 . In this way each permutation provides us a
partition of pairs K II . Hence,

S II (r, r) = r! (1.52)

(see Exercise 1.52). Equivalently, L ↔ U is 2 × n, n = 2r, with n/2 1s in each row, and VII is n/2 × n, so that U VII′ is 2 × n/2 with each entry being 1. Suppose that U is fixed; then the number of all the matrices VII is (n/2)!. Indeed, in each row of VII there are two 1s and n − 2 0s. The place of one of the 1s is fixed (to obtain a canonical partition), and the second 1 can be put into n/2 places in the first row, n/2 − 1 places in the second row, and so on.
Example 1.31 Let L = {b1 , b2 }, where b1 = (1 : 3) and b2 = (4 : 6); then

L ↔ U = [ 1 1 1 0 0 0 ; 0 0 0 1 1 1 ] ,

and an instance of KII connecting b1 and b2 is

KII ↔ VII = [ 1 0 0 1 0 0 ; 0 1 0 0 1 0 ; 0 0 1 0 0 1 ] .

We see that there are 3! partitions K II . Indeed, let us fix the places of b1 (in different
rows naturally), then the second “1” in the first row can be put into three places,
the second “1” in the second row can have two places, and finally there is only one
place for the remaining third “1” in the third row.
 
Case 1.6 The next problem concerns finding all the closed diagrams (L, KII) without loops, where L = {b1 , b2 , b3 }, with |bj | = rj ; suppose that r1 + r2 + r3 = 2r is even. Then

SII(r1:3) = C(r1 , r − r2 ) C(r2 , r − r1 ) r3 ! (r − r3 )! ,    (1.53)

ri ≤ r, ri + rj ≥ rk . The proof of this belongs to calculus; one may obtain a combinatorial proof as follows.
Start by choosing k1 vertices from b1 and connecting them to k1 vertices of b2 . This can be done in C(r1 , k1 ) C(r2 , k1 ) k1 ! ways. Now r1 − k1 vertices are chosen from b3 , and the rest of the r1 − k1 elements of b1 are connected to them. This can be done in C(r3 , r1 − k1 ) (r1 − k1 )! ways. The remaining vertices are connected in (r3 − r1 + k1 )! ways; here r3 − r1 + k1 = r2 − k1 , hence r3 + k1 = r, and the result is

C(r1 , k1 ) C(r2 , k1 ) C(r3 , r1 − k1 ) k1 ! (r1 − k1 )! (r3 − r1 + k1 )! = C(r1 , r − r2 ) C(r2 , r − r1 ) r3 ! (r − r3 )! .

The expression (1.53) has a symmetric form as well,

SII(r1:3) = C(r1 , r − r2 ) C(r2 , r − r1 ) r3 ! (r − r3 )! = r1 ! r2 ! r3 ! / ((r − r1 )! (r − r2 )! (r − r3 )!) .    (1.54)
(1.54)

We will use the following diagrams of pairs in Sect. 4.3.2, p. 193.


Case 1.7 (Case 1.5 Contd.) Let the initial partition L contain k blocks of pairs, i.e. L = (b1 , . . . , bk ), so that bj = (2j − 1, 2j), say. Then a partition KII of the diagram (L, KII) without a loop contains only pairs (km , kn ) such that km ∈ bp and kn ∈ bq , p ≠ q. Let us start from a pair, bm = (2m − 1, 2m) say, visit all the blocks bp , and arrive at bm again. The number of such diagrams is 2^{k−1} (k − 1)!, because all possible orders of the blocks are (k − 1)! and the connection between two consecutive blocks can be made in 2 different ways in accordance with the permutations of their elements, while the final step is determined; therefore, the number of these choices is 2^{k−1} (see Example 1.32).
Example 1.32 Let L = {b1 , b2 , b3 }, where b1 = (1, 2), b2 = (3, 4), and b3 =
(5, 6)
⎡ ⎤
11000 0
L ↔ U = ⎣0 0 1 1 0 0⎦.
00001 1
 
Then a closed diagram without loops L, K II can be constructed in the following
way. Let us start a path from element {1} of b1 , connect it to the element {3} of b2 ,
then the next element is necessarily {4} of b2 , and then it can be connected to one
element, say {5} of b3 . There is no choice at the last step and we connect {6} to {2}.
The corresponding K II is {(1, 3) , (4, 5) , (6, 2)}
⎡ ⎤
101000
K II ↔ VII = ⎣ 0 0 0 1 1 0 ⎦ .
010001

In each step we chose a block and an element from that block. We could do it 2×2×2
ways, S II (2, 2, 2) = 8 (see (1.53)).
Case 1.8 The case of four blocks is more complicated than the previous ones. Con-
sider the closed diagrams L, K II cnl without loops, where L = {b1 , b2 , b3 , b4 },
with |bj | = rj ; suppose that r1 + r2 + r3 + r4 = 2r is even. Then split up each bj

into three parts kj,1:3 , such that k1,2 = k2,1 , k1,3 = k3,1, k1,4 = k4,1 , and proceed
k2,3 = k3,2 , k2,4 = k4,2 , finally k3,4 = k4,3. See the following table.

|b1 | = r1 k1,2 k1,3 k1,4


|b2 | = r2 k1,2 k2,3 k2,4
|b3 | = r3 k1,3 k2,3 k3,4
|b4 | = r4 k1,4 k2,4 k3,4

If the number of parts k1:4,1:4 is given, then we use the reasoning of the previous
cases and finally we sum all these possible parts
SII(r1:4) = Σ* r1 ! r2 ! r3 ! r4 ! / (k1,2 ! k1,3 ! k1,4 ! k2,3 ! k2,4 ! k3,4 !) ,    (1.55)

where the summation Σ* is over all non-negative integers k1,2 , k1,3 , k1,4 , k2,3 , k2,4 , k3,4 ,
such that k1,2 + k1,3 + k1,4 + k2,3 + k2,4 + k3,4 = r, and r1 = k1,2 + k1,3 + k1,4 ,
r2 = k1,2 + k2,3 + k2,4 , r3 = k1,3 + k2,3 + k3,4 , r4 = k1,4 + k2,4 + k3,4 (see
Exercise 1.55).

1.4.8.2 Closed Diagrams with Arms and No Loops

Let us consider
 a partition K I,II with blocks having two elements at most. Let a
diagram L, K I,II correspond to a graph that can have free edges with respect to
the blocks of K I,II with single elements. These free edges are called the arms of the

diagram L, K I,II . Let PI,II n denote the set of all partitions having blocks with one
I,II    
or two elements, i.e. Pn = K I,II . The arms of a diagram L, K I,II correspond
to the blocks of partition K I,II with one element.
Example
 1.33 Take the partitions L = {b1 = (1), b2 = (2, 3, 4)}. The diagram
L, K I,II is closed if blocks b1 and b2 are connected by K I,II . If K I,II =
 
{(1, 2) , (3) , (4)} of (1 : 4), then L, K I,II is closed and has two arms. The graph
(L, K I,II ), K I,II ∈ PI,II
n has only one edge (k1 , k2 ) ∈ K
I,II but has arms (j ) ∈ K I,II

as well.
 
Put dK for the number of arms in partition K I,II and DK = j | (j ) ∈ K I,II for
 
the set of arms, |DK | = dK . A pair of matrices U, VI,II corresponds to the diagram
(L, K I,II ), where VI,II is a partition matrix with rows having entry 1 in either one or
two places. dK is the number of rows at VI,II having only one entry with 1.
First we consider set PI,IIn , and separate its partitions by the number of arms dK .

Example 1.34 Consider PI,II n , to be the case when n = 4. All the possible
choices for dK are 4, 2, and 0. If dK = 4, K I,II contains only arms, K I,II =
{(1) , (2) , (3) , (4)}, i.e. VI,II = I4 . If dK = 2, K I,II contains 2 arms and 1 pair, for
48 1 Some Introductory Algebra

example K I,II = {(1) , (2) , (3, 4)}, i.e.


⎡ ⎤
1000
V I,II
= ⎣0 1 0 0⎦.
0011

Now if dK = 0, K I,II contains only pairs, for example K I,II = {(1, 2) , (3, 4)}, i.e.
! "
1100
V I,II
= .
0011

If dK = m, then the number of all partitions SnI,II (dK ) is given as


 
n
SnI,II (m) = (n − m − 1)!!. (1.56)
m

The reason is as follows. If dK = m, then n = m + 2k, i.e. m = n − 2k; we have k


pairs in addition to m arms. SnI,II (dK ) writes also

n!
SnI,II (n − 2k) = .
(n − 2k)!k!2k

Indeed we have the following recursion by n for SnI,II (m)

I,II
Sn+1 (m) = SnI,II (m − 1) + (m + 1) SnI,II (m + 1) ,

where again the number of inclusive extensions with dK = m − 1 is SnI,II (m − 1)


and the number of exclusive extensions is SnI,II (m + 1), hence the result (see (1.51),
(1.38)).
The cardinality of PI,II
n is

 n
SnI,II = (n − m − 1)!!,
m
m=n:2:0

where the summation is taken by set {n : 2 : 0} = {n, n − 2, n − 4, . . . , n − 2


[n/2]}, and [n/2] denotes the entire part of n/2, see (4.26), p. 200, for the
application of this.
Case 1.9 Next, let us consider partition L = {b1 = (1 : n1 ), b2 = (n1 + 1 : n1
 describe the partitions of K
+n2 )},I,IIand
I,II for the closed diagram without loops

L, K . The possible number of arms are n1 + n2 − 2r, r ≤ min (n1 , n2 ), where


r is the number of edges. Let the number of arms dK = n1 + n2 − 2r in partition
1.5 Appendix 49

K I,II ; then the number of such partitions K I,II is


  
n1 n2 n1 !n2 !
Sn1 ,n2 (r, r) = r!
I,II
= . (1.57)
r r (n1 − r)! (n2 − r)!r!
 
A closed diagram without loops L, K III cnl includes r element from each b1
  
and b2 , and they are connected. The possible number of these is nr1 nr2 . Once
the r-tuples are chosen  they can be connected in k! ways, hence (1.57).
The cardinality of L, K I,II cnl sums up all partitions by all possible arms


min(n1 ,n2 )
n1 !n2 !
SnI,II
1 ,n2
= .
(n1 − r)! (n2 − r)!r!
k=0

 
Case 1.10 Now, we consider a closed diagram L, K I,II cnl without loops with
arms, where L = (b1 , b2 , b3 ), with |bj | = nj . Let n1:3 = n, and the number
of arms dK be n − 2r, which implies that there are r edges. Suppose furthermore
that these edges connect r1 vertices from b1 , r2 vertices from b2 , and r3 vertices
from b3 , so r1:3 = 2r. The number SnI,II 1:3 (r1:3 ) of all such partitions is
 
n1:3 II n1:3 !
SnI,II (r1:3 ) = S (r1:3 ) = , (1.58)
1:3
r1:3 (n1:3 − r1:3 )! (r − r1:3 )!
where r = (r, r, r), since one has chosen sub-blocks with cardinalities r1:3 in all
possible ways, and then apply (1.53) to obtain all closed diagrams without a loop.
 
Case 1.11 All closed diagrams L, K I,II without loops, with arms, where L =
(b1 , b2 , b3 , b4 ), with |bj | = nj ; let n1:4 = n, the number of arms dK be n − 2r,
which implies that there are r edges. Suppose furthermore that these edges connect
r1 vertices from b1 , r2 vertices from b2 , r3 vertices from b3 , and r4 vertices from b4
so r1:4 = 2r. The number SnI,II 1:4 (r1:4 ) of all such partitions is
 
n1:4 II
Sn1:4 (r1:4 ) =
I,II
S (r1:4 ) . (1.59)
r1:4

1.5 Appendix

1.5.1 Proof of Lemma 1.1

Proof

vec (BA) = (In ⊗ B) vecA


 
= (vecA) ⊗ Inp vec (In ⊗ B)
  
= (vecA) ⊗ Inp In ⊗ Km•n ⊗ Ip (vecIn ⊗ vecB) .
50 1 Some Introductory Algebra

 
vec (BA) = A ⊗ Ip vecB
   
= (vecB) ⊗ Inp vec A ⊗ Ip
   
= (vecB) ⊗ Inp Im ⊗ Kp•n ⊗ Ip vecA ⊗ vecIp .

1.5.2 Proof of Lemma 1.3

Proof It is enough to prove that one can interchange two neighboring matrices in
a T-product. This latter one, i.e. interchanging two neighboring matrices, follows
from the formula (1.15) as follows. We use (1.15) for interchanging the neighboring
elements of the T-product Aj +1 ⊗ Aj +2 ; by our notation we have
    
K(21) dj +1 , dj +2 Aj +1 ⊗ Aj +2 K−1
(21) pj +1 , pj +2 = Aj +2 ⊗ Aj +1 .

Hence we can complete both commutator matrices with the T-product of unit
matrices and obtain
    ⊗ 
Id1:j ⊗ K(21) dj +1 , dj +2 ⊗ Idj+3:n Ak
k=1:n
 
−1  
Id1:j ⊗ K(21) pj +1 , pj +2 ⊗ Idj+3:n
⊗   
= Ak ⊗ K(21) dj +1 , dj +2 Aj +1 ⊗ Aj +2
k=1:j
  ⊗
K−1
(21) pj +1 , pj +2 ⊗ Ak
k=j +3:n
⊗ ⊗
= Ak ⊗ Aj +2 ⊗ Aj +1 ⊗ Ak .
k=1:j k=j +3:n

Therefore the assertion (1.27) follows.

1.5.3 Proof of Lemma 1.5


   
Proof Let (j1:q ) be the type of k1:q , and let j1:q the canonical index among
       
the permutations p k1:q of k1:q . Note that the type of k1:q and p k1:q are the
same. Consider first
(j1:q ) ! ⊗
Sd1q e⊗
k1:q = 
ej1:q
q!
1.5 Appendix 51

2
e⊗
since among the q! terms e⊗
j1:q includes j1:q = q!/(j1:q ) ! terms. Now



w = Sd1q w = w(k1:q ) Sd1q e⊗
k1:q .
(k1:q )
     
Summing up Sd1q e⊗
k1:q by those k1:q for which p k1:q = j1:q , we shall have the
e⊗
same basis vectors (j1:q ) !/q!j1:q and if we sum up the coefficients in each case,
2
e⊗
then we can replace (j1:q ) !/q! by 1/ j1:q ; hence



w= e⊗
(j1:q )
w j1:q ,
(j1:q )∈IÐd,q

where
1 
(j1:q ) =
w 2
w(k1:q ) .
e⊗
j1:q (k1:q )|p(k1:q )=(j1:q )

1.5.4 Star Product

Closing this Appendix we define the star product



of matrices, which is of some
use in
matrix
derivatives. Let a matrix A = ai,j be n × m, and a blocked matrix
B = Bi,j be n × m. Assuming that all blocks have the same size, the star product
A  B is defined by

AB= ai,j Bi,j . (1.60)
i,j

Let a be n × 1, and b = [b1 , . . . , bn ] where each bk is d × 1. One can show


  
a ⊗ Id b = a  b,

by direct computation.
52 1 Some Introductory Algebra

1.6 Exercises

Section 1.1

1.1 Prove Remark 1.2.


1.2 Multiply (14)S and (25)S , calculate the consecutive application of them, and
then use Remark 1.1 for obtaining both inverses. Notice here that cycle (14)S
assumes 4 elements at least; otherwise it is valid for 5th-order permutations as well.
1.3 Recall notation 4 : 2 = (432) and 2 : 4 = (234), show that (4 : 2)S (2 : 4)S =
(1234), and conclude (4 : 2)−1
S = (2 : 4)S .

1.4 Show that the inverse permutation of p = (n : k)S is p−1 = (k : n)S .


1.5 Show the following equations:

(1432) × (2134) = (2431)


(1432) × (1243) = (1342) ,
(1324) × (2134) = (2314) ,
(1324) × (1243) = (1423) .

1.6 Show the following equations:

(1423)−1 = (1342) ,
(341625)−1 = (351264) ,
(561324)−1 = (354612) .

1.7 Let p = (312)S , and calculate m4 (p) (see (1.1)).

Section 1.2

1.8 Let a and b be the vectors of dimensions m and p, respectively, show

ab = a ⊗ b .

1.9 Show that


 
vec b ⊗ a = a ⊗ b.
1.6 Exercises 53

1.10 Show that


        
vec vec ab vec ab = vec ab ⊗ vec ab
= (b ⊗ a)⊗2 .

1.11 If Bb is a valid product, then show

a1 ⊗ Bb = (a1 ⊗ B) b,

and

a1 ⊗ Bb ⊗ a2 = (a1 ⊗ B ⊗ a2 ) b,

more general
⊗ ⊗ ⊗ ⊗ 
ak ⊗ Bb ⊗ ak = ak ⊗ B ⊗ ak b.
1:j −1 j +1:M 1:j −1 j +1:M

1.12 Show that


 
trBCD = vec B (I ⊗ C) vecD,
  
trBX CXD = vec X B D ⊗ C vecX
  
= vec X DB ⊗ C vecX,

as particular cases of (1.10) (see [Mui09], p. 76).


1.13 Show that
   
Km2 •m2 vec A ⊗ A = vec A ⊗ A .

1.14 Show that


 2  2
a⊗4 vec⊗2 Id = a⊗2 vecId = ai2 = a4 .

1.15 Show that

a⊗4 vec⊗2 Id = a⊗4 vecId 2

and
    
vec Id ⊗ Id 2 a⊗4 = Id 2 ⊗ vec Id a⊗4 .
54 1 Some Introductory Algebra

1.16 Let A be a d × d matrix, show that


   
vec AA AA = vec Id ⊗ Id 2 A⊗4 .

1.17 Find the commutator for permutation p = (234)S , in terms of interchange


matrix Km•n .
1.18 Let commutator L−1
22 of vectors with dimension d be given by

L−1 −1 −1
22 = Id 4 + Kp1 + Kp2 , (1.61)

where p1 = (1324), p2 = (1423), then show that


 
Id 2 ⊗ vec Id L−1
22 (vecId )
⊗2
= (d + 2) vecId .

1.19 Let V be a symmetric d × d matrix, and λ and a be d-dimensional vectors.


Show that
   
K−1
(3,2)S d, d 2
, d ((Vλ) ⊗ a⊗vecV) = K −1
(1,2)S d, d 2
, d (vecV ⊗ (Vλ) ⊗a) ,

and
   
Id 4 + Id 4 + K−1 −1
(12)S (d, d, d) ⊗ Id K(12)S d, d , d = Id 4 + K(3124) + K(1342) .
2

1.20 Let  be a d × p matrix, show that


 1/2 ⊗2
Id −  vecId = vecId − ⊗2 vecIm .

1.21 If  symmetric then show that


 ⊗r
 ⊗2r (vecId )⊗r = vec 2 ,

Chacón and Duong [CD10].


1.22 Let  be a d × p matrix, show that
 1/2 ⊗4
Id −  (vecId )⊗2 = (vecId )⊗2 − vecId ⊗ ⊗2 vecIp
 ⊗2
− ⊗2 vecIp ⊗ vecId + ⊗4 vecIp .

1.23 Let a be m × 1 and b be p × 1, then show that


   
vec b ⊗ a ⊗ b ⊗ a = Im ⊗ Km•p ⊗ Ip (b ⊗ a)⊗2 ,
1.6 Exercises 55

and if a = b
 
vec a ⊗ a ⊗ a ⊗ a = a⊗4 ,

Loperfido [Lop14, p. 1839].


1.24 Show that

K(23)S (vecId ⊗ x) = K(12)S (x ⊗ vecId ) .

1.25 Rewrite the commutator matrices In ⊗ Km•n ⊗ Ip and Im ⊗ Kp•n ⊗ Ip in terms


of permutations for matrices.
1.26 Let ek denote the kth coordinate unit vector in Rd , show


d
vecId = e⊗2
k . (1.62)
k=1
 
More general, denote matrix e⊗k−1
 by E (d, k) and show
=1:d


d
vecE (d, k) = e⊗k
k .
k=1

 Id be the unit matrix with dimension


1.27 Let   d. Express vec (Id ⊗ Id ) =
vec Id 2 in terms of vec⊗2 Id and show that vec Id 2 = vec⊗2 Id .
1.28 Show that

vec⊗2 Id vec⊗2 Id = d 2 ,

vec Id 2 vec⊗2 Id = d,

and
  
vec Id 2 K(1324)vec⊗2 Id = d 2 .

1.29 Take a square matrix A, show that


 
K(1432)vec A ⊗ A = vec⊗2 A,
 
K(3214)vec A ⊗ A = vec⊗2 A.
56 1 Some Introductory Algebra

1.30 Show
 
vecId 3 ⊗ vecId 4 = Id 3 ⊗ Kd 3 •d 4 ⊗ Id 4 vecId 7 .

1.31 Let A be m × n, a be q × 1, and b be p × 1, then


   
vec a ⊗ A ⊗ b = Iqn ⊗ Kp•m (a ⊗ vecA ⊗ b) .

1.32 Rewrite the commutator matrix

L−1 −1 −1 −1
13 = Id 4 +K(1243) + K(1342) +K(2341) ,

without using inverse of matrices. Hint: see Remark 1.1 for inverse permutations.

1.33 Show

K(135246)vec⊗3 Id = vecId 3 ,

and

K−1 ⊗3
(142536)vec Id = vecId 3 .

1.34 Show

K(135246)K(135642) = K(154362) = K−1


(164325).

1.35 Let  and  be p × p and d × d symmetric matrices, respectively. Show that

K(1324)vec ( ⊗ )⊗2 = vec⊗2 ( ⊗ ) .

1.36 Let  be d × d,  be p × p, and A be p × p, show


      
vec A ⊗ Id 2 Ip ⊗ Kp·d ⊗ Id ⊗ vec B ⊗ Id 2 Ip ⊗ Kp·d ⊗ Id
vec ( ⊗ )⊗2 = trAB vec⊗2 .

1.37 Denote the columns of matrix B by bk and ek be the unit coordinate vector,
show
  
vec B⊗2 = bj ⊗ bk ⊗ ej ⊗ ek ,


vec⊗2 B = bj ⊗ ej ⊗ bk ⊗ ek ,
1.6 Exercises 57

and
  
vec Kd•d B⊗2 = bj ⊗ bk ⊗ ek ⊗ ej .

1.38 Show that


 
a ⊗ Id b = a  b.

Section 1.3

1.39 Let  be a d × d symmetric matrix, show vec is Sd,2 symmetric.


1.40 If A is d × d square matrix, then show

Sd12 A⊗2 = A⊗2 Sd12 .


 
Hint: Sd12 = Id 2 + K(21) /2, and see (1.18), p. 9.
1.41 Let A be a d × d matrix, then show that A⊗q and Sd1q commute. Hint: show
Kp A⊗q = A⊗q Kp , first.
1.42 Let a, b ∈ Rd and define matrix

L−1 −1 −1 −1
13 ,11 = Id 4 +K(1243) + K(1342) +K(2341) .

Show that
 
w (a) = L−1
13 ,11 a ⊗3
⊗ b

is symmetric. Hint: rewrite the inverse in terms of inverse permutations first, see
Exercise 1.32, p. 56.
#
1.43 Show that vecId ∈ Sd,2 with rank d and vecId = dj=1  e(j,j ) .
1.44 Let w ∈ Sd,2 , show the particular form


d−1 d−k
w= e⊗
wj,j +k(j,j +k)
k=0 j =1

of w for general d.
1.45 Show
  
Qd,2vecA = Q+ 
d,2 vec A + A − diagA .
58 1 Some Introductory Algebra

1.46 Let b be with dimension d and A be d × d, show

1
Sd12 (b ⊗ A) = (b ⊗ A + A ⊗ b) .
2

Section 1.4

1.47 Consider a partition K with blocks a1 , a2 , . . ., ak . Denote L1 , L2 , . . ., Lk ,


partitions of subsets according to each block a1 , a2 , . . ., ak , respectively. In this way
we have a sub- lattice of Pn being equivalent to the lattice generated by the blocks of
K. Show that any partition L is finer, then K belongs to this sub-lattice, see (3.42),
p. 140 for an application.
1.48 List all the indecomposable partitions of (1, 2, 3) according to the partition
L = {(1), (2, 3)}.
1.49 List all partitions indecomposable with respect to L = {(1) , (2) , (3, 4)}.
1.50 The number of all the indecomposable partitions of (1, 2, 3, 4, 5, 6) according
to the partition L = {(1, 2, 3), (4, 5, 6)} is 178. Show that the following partitions
are indecomposable to L:
K1 = {(1, 5, 6) , (2, 3, 4)} ,
K2 = {(1, 3) , (2, 4, 6) , (5)} ,
K3 = {(1, 6) , (2) , (3) , (4) , (5)}.
1.51 The number of all the indecomposable partitions of (1, 2, 3, 4, 5, 6) according
to the partition L = {(1, 2), (3, 4) , (5, 6)} is 129. Show that the following partitions
are indecomposable to L:
K1 = {(1, 5, 6) , (2, 3, 4)} ,
K2 = {(1, 3, 5) , (2, 6) , (4)} ,
K3 = {(1) , (2, 6) , (3) , (4, 5)}.
 
1.52 Find the number of all closed diagrams without loops L, K II cnl when L =
{b1 = (1 : r), b2 = (r + 1 : 2r)}.
1.53 If (1.46) fulfills, then the block bis +1 , say, does not communicate with any of
the blocks bi1 , bi2 , . . . , bis .
* *
1.54 Show that for a partition L = (b1 , b2 , . . . , bk ), where for each j , *bj * = 2,
 
the number of all closed diagrams without loops L, K II is 2k−1 (k − 1)!.
1.55 Show that formula (1.55) is valid.
1.56 Prove formula (1.56) using combinatorics.
1.57 Show that formula (1.57) is valid.
1.7 Bibliographic Notes 59

1.7 Bibliographic Notes

The theory of permutation is well known. Here we refer to the classical works
[And76] and [Aig12].
The tensor product has a wide range of references, we used [MN99, Gra18], see
also [KvR06]. One reason we use the name T -product instead of Kronecker product
is the following quote by Muirhead [Mui09] “Actually the connection between
this product and the German mathematician Kronecker (1823-1891) seems rather
obscure.” Earlier a distinction was made between the right, half-right and left, half-
left tensor products, [Hol85]; we use the left–right one as a general case. The vec
operator, also called the pack operator, was considered by [Mac74] in addition to
the previous references for tensors.
The theory of commutation matrices is described by Magnus and Neudecker
[MN99] in detail. [Hol85] uses the notion of direct product permuting matrix
(DPPM) of k degrees, and a particular case, the transposition matrix is also
considered there. The structure of a commutator in terms of elementary matrices
is given by [Gra18].
[HS81] reviewed the results of a vec-permutation matrix for a cyclically permut-
ing order in Kronecker products of three or more matrices.
An algorithm for generating the commutator matrix Kp for any p ∈ Pn has been
published by Holmquist [Hol96a]. The star product with basic properties has been
applied for matrix derivatives by [Mac74].
Introductory works on multilinear algebra are [KM97] and [Nor84].
The symmetrization matrix has been considered by [Mei05] and [Hol96a] with
the latter using the notation Sd1q as well. Further results on symmetric tensors are
treated in [CGLM08] and [DL08]. For orthogonal decomposability, ODECO for
short, see [Rob16], and for super-symmetric tensors, see [DGP18].
Duplication and elimination matrices can be found in [CGLM08, Mei05, MN80]
and [MN99]. Our approach is traced back to Magnus and Neudecker (1980) [MN80]
who introduce two transformation matrices, L and D, which consist of zeros and
ones. L eliminates from vecA the supra-diagonal elements of A for any (n, n)
arbitrary matrix A, while D performs the reverse transformation for a symmetric
A.
The inclusive and exclusive extensions methods are treated in [Cam94]. Aigner
[Aig12] studies partitions in detail as a lattice structure, as well as the number of
all partitions of type  etc., see also DLMF [DLM], 26.8.22. Bell numbers are
considered by [And76]; Bell polynomials with statistical applications have been
used by [VN94] and [VN97].
Indecomposable partitions in relation to stochastic processes are considered by
[LS59, Bri01] and [MM85] among others.
The number of closed diagrams without loops has been considered by [Car62],
[Fel40] (p.22).
Chapter 2
The Tensor Derivative of Vector
Functions

Abstract Taking the first-order partial derivative of a vector-valued function results


in the Jacobian matrix, which contains all partial derivatives of each entry. The
matrix which contains all possible second-order partials of each entry, called the
Hessian, is also well-studied. The first-order partial derivatives of a vector is a
matrix, the next and higher-order partials constitute matrices with complicated
structures. Among the different ways of handling this problem, there are some
methods which use the tensor product of possible matrix valued functions and
partials. Here we follow a very simple version in that line, namely we put partials
into a column vector and apply it as consecutive tensor products on a vector-valued
function. In this way we keep the results as vectors, and although the tensor product
is not commutative, we can use linear operators to reach all permutations of the
terms involved in the process. The main objective of this chapter is to show how
simple and clear formulae can be derived if we use the method of tensor products for
higher-order partial derivatives of vector-valued functions. Faà di Bruno’s formula
will play an important role later on when we will be interested in the connections
between moments and cumulants.

2.1 Derivatives of Composite Functions

Let f and g be two scalar-valued functions, where f is a function of one variable


and g is a function of multiple variables x = (x1 , x2 , . . . , xd ). We assume that f is
continuously differentiable d times and g (·) is continuously differentiable in all its
arguments. Let h (x) be a scalar-valued function of x defined by

h (x) = f (g (x)) . (2.1)

The function h is known as a composite or compound function. To see the


motivation for defining these functions, let us consider some special cases.
Example 2.1 Let f (x) = ln x, and g (x) = EeixX , where X is a random variable.
The function h (x) = f (g (x)) is nothing but the cumulant generating function.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 61


G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5_2
62 2 The Tensor Derivative of Vector Functions

Example 2.2 Let f (x) = exp (x), g (x) = ln EeixX . Then h (x) = f (g (x)) =
EeixX is the characteristic function.
Our object is to obtain an expression for the partial derivatives of h (·) in terms of
the partial derivatives of g (·) since such expressions will be used to obtain results
for cumulants and moments.
Let us introduce the following notations for the derivative:

d r f (x)
f (r) (x) = , r = 1, 2, . . . .,
dx r
and for the partial derivatives

∂g (x)
gj = , j = 1, 2, . . . , d,
∂xj
∂ k g (x)
g(1,2,...,k) = , k = 2, . . . , d.
∂x1 ∂x2 · · · ∂xk

One can easily verify the following equations in terms of the partial derivatives
of (2.1):

∂h (x)
= f (1) (g) g1 , (2.2)
∂x1
∂ 2 h (x)
= f (1) (g) g(1,2) + f (2) (g) g1 g2 . (2.3)
∂x1 ∂x2 + ,- . +,-.
(inclusive) (exclusive)

Observe that taking the derivative of (2.2), we obtain an inclusive and an exclusive
extension of index (1) of g1 and this happens when we take the next derivative

∂ 3 h (x)
= f (1) (g) g(1,2,3) + f (2) (g) g(1,2) g3 (2.4)
∂x1 ∂x2 ∂x3 + ,- . + ,- .
(inclusive) (exclusive)


+f (2)
(g) g(1,3) g2 + g1 g(2,3) + f (3) (g) g1 g2 g3
+ ,- . + ,- .
(inclusive) (exclusive)


= f (1) (g) g(1,2,3) + f (2) (g) g(1,3) g2 + g1 g(2,3) + g(1,2) g3 (2.5)
+f (3) (g) g1 g2 g3

as well. From the expressions (2.2), (2.3), and (2.5), we see that higher-order
partial derivatives can be obtained from lower-order derivatives using inclusive
and exclusive principle that we have introduced earlier; (see Sect. 1.4, p. 26).
2.1 Derivatives of Composite Functions 63

Tables 1.1, 1.2, p. 28 and 1.3, p.29 are useful when we derive expressions (2.2),
(2.3), and (2.5).
We notice that the evaluation of the partial derivatives of h follow an elegant
pattern, since the derivatives of h depend on the evaluation of the partial derivatives
of g in a special way. Let us look at the expressions (2.3) and (2.5). If we associate a
vector of one element [1] with g1 , then the partial derivative of g in (2.5) is obtained
from the vectors [1, 1], [1, 0], and [0, 1]. We conclude

[1, 1] ⇐⇒ g(1,2) (inclusive)


! "
10
⇐⇒ g1 g2 (exclusive)
01

(See Table 1.1). We can consider the above partial derivatives as partitioning the
set {1, 2} by two partitions K1 = {(1, 2)} and K2 = {(1) , (2)}, where partition K1
contains a single block with respect to the second-order partial derivative g(1,2) , and
K2 contains two blocks with respect to g1 and g2 . When we calculate the third-order
derivative, which is given by (2.5) via (2.4), we note that from every partition in (2.4)
we obtain both inclusive and exclusive partitions. The number of inclusive partitions
which are obtained at any stage n is equal to the number of blocks of partitions at
stage (n − 1) plus one exclusive partition. At stage 2 we have two partitions: K1 =
{(1, 2)} and K2 = {(1) , (2)}, the second partition has two blockstherefore at step 3,
(i.e. third-order partial derivatives), we will have two inclusive g(1,3) g2 , g1 g(2,3)
and one exclusive g1 g2 g3 partition (see Table 1.2, p. 28) according to K2 . We will
have one inclusive g(1,2,3) and one exclusive partition g(1,2)g3 which correspond to
K1 . In other words, the partial derivatives obtained in (2.5) can be considered as
partitions of the set {1, 2, 3} into partitions K1 = {(1, 2, 3)}, K2 = {(2, 3) , (1)},
K3 = {(1, 3) , (2)}, K4 = {(1, 2) , (3)}, K5 = {(1) , (2) , (3)}. Partition K1 has one
block, the partitions K2 , K3 , K4 have two blocks each, while partition K5 has three
blocks, etc.
Remark 2.1 It is worth mentioning that the each partition Kj above is in a canonical
form.

2.1.1 Faà di Bruno’s Formula

Now we state and prove a general result for the derivatives of a compound function.
Lemma 2.1 (Faà di Bruno’s Formula) Let f and g be two scalar-valued func-
tions, f is a function of one variable and g is a function of variable x ∈ Rd . Let f
be continuously differentiable d times and g (·) be continuously differentiable by all
entries of its argument x. Let h (x) be a scalar-valued implicit function of x so that

h (x) = f (g (x)) .
64 2 The Tensor Derivative of Vector Functions

Then for n ≤ d,

∂ n h (x) 
n   ∂ |b| g (x)
= f (r) (g)  , (2.6)
∂x1 ∂x2 · · · ∂xn j ∈b ∂xj
r=1 K{r} ∈Pn b∈K{r}


where K{r} denotes partitions with size |K| = r, and ∂ |b| g (x) / j ∈b ∂xj is the
 
|b|t h -order derivative of g (x) with respect to a block xb = xj , j ∈ b of x.
Before one proves the above result, the product and the summation  symbols
in Eq. (2.6) need some explanation. The first product symbol b∈K{r} stands for
the product over all blocks# b which belong to a particular partition K{r} (see for
instance (2.5)). Summation K{r} ∈Pn is over all such possible partitions K{r} which
are generated from the set (1, 2, . . . , n) and in which the number of blocks is r. For
example in the case when n = 3, (2.5) shows that we have one partition K{1} for
r = 1; three partitions K{2},1 , K{2},2 , K{2},3 each with two blocks for r = 2 , and
one partition K{3} = {(1) , (2) , (3)} with three blocks when r = 3.
The proof of Lemma 2.1 is given in the Appendix on p. 98.
We have stated Faà di Bruno’s Lemma for the first-order derivatives of the
function h by each of its variable xj . We will show below that this is not really
a restriction since all higher-order derivatives can be derived from this one. If the
function g is of one variable and we are interested, say, in the third-order derivative
g (3) , we may proceed as follows. Let us define a function  g of three variables by


g (x1 , x2 , x3 ) = g (x1 + x2 + x3 ) ,

g and put x1 = x2 = x3 = x then we obtain


and use the formula (2.6) for 
*
g (x1 , x2 , x3 ) **
∂ 3 d3
* = g (x) = g (3) (x) .
∂x1 ∂x2 ∂x3 xj =x dx 3

This idea leads us to the distinct values principle, and it will be used for mixed
higher-order derivatives as well. We recognize that the partial derivatives allow us
to equate some variables after the derivative has been taken. The general formula
(2.6) for a compound function of one variable will have the following simple form.
Corollary 2.1 (Faà di Bruno’s Formula of One Variable) If h (x) = f (g (x)),
then the nth-order derivative of h can be obtained from (2.6) and is given by
 
∂n 
n   d |b|
f (g (x)) = f (r) (g (x)) g (x) .
∂x n b∈K{r} dx |b|
r=1 K{r} ∈Pn

We see that the derivative d |b| /dx |b| g (x) depends only on the cardinality of block
b. Hence we can collect blocks with the same cardinality j . This allows us to use
2.1 Derivatives of Composite Functions 65

the idea of type  = (1 , . . . , n ) of a partition K{r} , (see Definition 1.4, p. 32). We
recall that j is the number of blocks having cardinality j , j = 1, 2, . . . , n, in a
partition K ∈ Pn . We have that j ≥ 0, and


n−r+1 
n−r+1
n= j j , j = r.
j =1 j =1

Now, since we are given the number of all such partitions, (see (1.41), p. 32),
therefore

 n  
n−r+1  j
∂n 1 g (j ) (x)
f (g (x)) = f (r) (g (x)) n! .
∂x n j ! j!
r=1 j =r;j j =n j =1
(2.7)

Later we shall prove Theorems 3.1 and 3.3 as an application of the formula (2.6).
Besides many other uses of this formula (2.6), we will now consider two particular
cases which will help us to prove some properties of cumulants.
Corollary 2.2 Let f (x) = ln x, and recognize that the rth-order derivative of
f (x) = ln x is

f (r) (x) = (−1)r−1 (r − 1)!x −r ,

so for the compound function ln (g (x)) we have the derivative ∂x1 ∂x2 . . . ∂xn given
by
 
∂n 
n   ∂ |b|
ln (g (x)) = (−1)r−1 (r − 1)!g −r (x)  g (x) ,
∂x1:n j ∈b ∂xj
r=1 K{r} ∈Pn b∈K{r}
(2.8)

where ∂x1:n = ∂x1 ∂x2 . . . ∂xn , |b| denotes the number of the elements in block b
and the second summation extends over all the partitions K{r} of Pn with size r (see
(2.5) for n = 3). Again, if g is of one variable then we get
  j
∂ n ln (g (x)) 
n  
n−r+1
1 g (j ) (x)
= (−1)r−1 (r − 1)!g −r (x) n! ,
∂x n j ! j!
r=1 j =r, j j =n j =1
(2.9)

by (2.7), since the derivative ∂ |b| / j ∈b ∂xj depends only on the cardinality |b| of
the block b, and the number of all partitions with type  = (1 , . . . , n ), are given
by (1.41).
66 2 The Tensor Derivative of Vector Functions

Another simple consequence of formula (2.7) is the following.


Corollary 2.3 Since all derivatives of f (x) = exp x, are exp x, therefore the
summation in (2.6) is over all partitions and results in the following:
 
∂ n exp (g (x))   ∂ |b| g (x)
= exp (g (x))  . (2.10)
∂x1 ∂x2 . . . ∂xn j ∈b ∂xj
K∈Pn b∈K

In the case of one variable, (2.7) applies, so we have

 n  
n−r+1  j
∂ n exp (g (x)) 1 g (j ) (x)
= exp (g (x)) n! ,
∂x n j ! j!
r=1 j =r, j j =n j =1
(2.11)

where  = (1 , . . . , n ) is the type of a partition K{r} with size r.


A particular case of Corollary 2.2 shows that the sum of the coefficients in (2.9)
is zero.
Corollary 2.4 We now consider a particular case when g (x) = exp (x) in formula
(2.9). Then the left-hand side of (2.9) is the derivative of ln (exp (x)) = x, which is
zero for n > 1. Each derivative of exp (x) is itself on the right-hand side of (2.9) so
that if we put x = 0 for n > 1, then we obtain


n  
n−r+1  j
1 1
0 = n! (−1)r−1 (r − 1)! , (2.12)
j ! j!
r=1 j =r, j j =n j =1

where the term n! can be ignored.


We now show how an application of formula (2.7) produces the generating
function of Bell numbers.
Corollary 2.5 (Number of All Partitions) Let g (x) = exp (x) − 1 in (2.11) and
notice that the derivatives of g coincide with the derivatives of exp (x), and g (k) (x)
at x = 0, k = 1, 2, . . ., are the coefficients of the Taylor series of g (x). On the other
hand side, the Taylor series of H (x) = exp (g (x)) is

 xn
H (x) = hn ,
n!
n=0
2.1 Derivatives of Composite Functions 67

where h0 = 1, and hn is the expression (2.11) at zero; at the same time hn is the
Bell polynomial of g (1) , g (2) , . . . , g (n) , (see (1.44), p. 34), i.e.

  n  n!
hn = hn g (1) (0) , g (2) (0) , . . . , g (n) (0) = n−r+1 .
r=1 j =r, j j =n j =1 j ! (j !)j

We proceed with the formula for Bell number Bn as follows, B0 = 1 and


n  n!
Bn = n−r+1 .
r=1 j =r, j j =n j =1 j ! (j !)j

(see (1.39), p. 31), those are the numbers of the elements of set Pn . It is seen
therefore that H (x) is the generating function of Bell numbers Bn , i.e.

 xn
H (x) = exp (exp (x) − 1) = Bn ,
n!
n=0

where B0 = 1.

2.1.2 Mixed Higher-Order Derivatives

Faà di Bruno’s formula (2.6) is general, since we can apply the distinct values princi-
ple and can obtain any higher-order derivatives. Nevertheless, some explicit formula
might be useful for mixed higher-order derivatives, like e.g. ∂ j +k /∂x1 ∂ k x2 f (g).
j

Let x ∈ R and j = j1:d , 0 ≤ jk integer, and |j| = j1 + · · · + jd , then let us


d

introduce the following compact form for higher-order mixed derivatives:

∂ |j|
Dj g (x) = g (x) ,
∂xj
we observe that the dimension of j is determined by the dimension of x. Moreover an
index j may contain zeros at places which are not included in the partial derivatives.
Now, we generalize the way by which partitions are used to derive Faà di Bruno’s
formula of one variable (2.7). In the case of one variable we identified blocks of a
partition K{r} ∈ Pn only by the cardinality. Thus partitions K{r} with the same type
 provide the same partial derivative

 ∂ |b| g (x)  
n−r+1 j
 = g (j ) (x) .
b∈K{r} j ∈b ∂xj j =1
68 2 The Tensor Derivative of Vector Functions

First let us consider the case in which d = 2; then we have

Dn g (x1 , x2 ) = ∂ n1 +n2 g (x1 , x2 ) /∂x1n1 ∂x2n2 ,

where n = (n1 , n2 ), and 0 ≤ nk are integers. We use the distinct values principle
similar to the one variable case. Consider partial derivatives ∂ n /∂y1 ∂y2 . . . ∂yn , with
n = |n| = n1 + n2 , and identify the first n1 members among the variables y1:n by x1
and the rest of the n2 members by x2 . Thus each block bk of a partition K{r} ∈ Pn
will be characterized by a pair jk,1 , jk,2 = jk , where jk,1 denotes the number of
the elements from set 1 : n1 and jk,2 from the set (n1 + 1) : n2 , respectively. Each
of them is included in the block bk . The possible values of jk,m are 0, 1, . . . , nm ,
m = 1, 2, and the maximum number of such pairs is K = (n1 + 1) (n2 + 1) − 1,
since each block contains at least one element and there is no block at all with a pair
(0, 0). We use the lexicographic order of the pairs j1:K , containing all possible pairs
which can occur in any partition K{r} ∈ Pn ; an element jk of the complete list of all
pairs j1:K will be called d−dimensional cardinality of a block b, (d = 2 right now).
In this situation the type  = 1:K of a partition K{r} with cardinality r means
that pair jk shows up k times in K{r} . Since we list k for all possible jk , the type
 = 1:K of a partition K{r} contains quite a number of zeros. Each element of the
sets 1 : n1 and (n1 + 1) : n2 occurs exactly once in a partition K{r} , so that the
following equations:


K 
K
jk,1 k = n1 , jk,2 k = n2 ,
k=1 k=1

must be fulfilled. Also since the cardinality of K{r} is r, we have


K
k = r.
j =1

Hence the constrains for type  are


K 
K
jk k = n, k = r,
k=1 j =1

where again n = (n1 , n2 ).


We proceed to collect all partitions K{r} ∈ 
Pn with some givencardinality r, and
with the same type . The partial derivative b∈K{r} ∂ |b| g (x) / j ∈b ∂xj for each
  
such partition K{r} is equal to Kk=1 Djk g ; the number of all these partitions is the
product of the corresponding one variable case.
The general formula (2.6) leads us to the following:
2.1 Derivatives of Composite Functions 69

Lemma 2.2 Let n = (n1 , n2 ), |n| = n1 + n2 , K = (n1 + 1) (n2 + 1) − 1, and j1:K


be the complete list of 2−dimensional cardinalities of all blocks b with respect to n,
then

    
∂ |n|
n K
1 Djk g k
f (g) = (r)
f (g) n! , (2.13)
∂x1n1 ∂x2n2 r=1
j ! jk !
p(r,,j)
k=1

where set p (r, , j) = p (r, (1 , . . . , K ) , (j1 , . . . , jK )), is defined by the constraint


K 
K 
K
jk,1 k = n1 , jk,2 k = n2 , k = r.
k=1 k=1 j =1

The following example shows the usage of the formula (2.13) for a higher-order
mixed derivative.
Example 2.3 Suppose we are interested in the mixed derivative ∂ 3 /∂x12 ∂x2 of
h (x) = f (g (x1 , x2 )). We have n1 = 2, n2 = 1, and n = 2 + 1 = 3. We
consider all partitions of P3 ; see Table 1.2, and change 3 by 1 everywhere in
the Table. For instance, partition K = {b1 = (1, 3) , b2 = (2)} is transformed to
K = {b1 = (1, 1) , b2 = (2)}. Next, we observe that a block can be characterized
by two numbers (j1 , j2 ), showing that the block contains j1 times the element 1 and
j2 times the element 2, such that j1 = 0, 1, 2, and j2 = 0, 1. Then (j1 , j2 ) = (2, 0)
for block b1 = (1, 1) and (0, 1) for block b2 = (2), respectively. The number of
all possible pairs (j1 , j2 ) of all blocks is K = 3 · 2 − 1. We enumerate the pairs
jk , k = 1 : 5, by j1 = (0, 1), j2 = (1, 0), j3 = (1, 1), j4 = (2, 0), j5 = (2, 1).
If r = 1 then the only partition is K = {b = (1, 2, 1)} and j = j5 = (2, 1), with
type 1:4 = 0, 5 = 1. If r = 2 then we have two partitions; one is the type
1 = 1, 4 = 1, and the pairs are j1 = (0, 1), j4 = (2, 0), the other one is the
type 2 = 1, 3 = 1 and j2 = (1, 0), and j3 = (1, 1). Finally, if r = 3, then
K = {b1 = (1) , b2 = (2) , b3 = (1)}, so that j1 = (0, 1), and j2 = (1, 0), which
corresponds to 1 = 1, 2 = 2. We use (2.13) and obtain

∂3
D2,1 f (g) = f (g)
∂x12∂x2


= f (1) (g) D2,1 g + f (2) (g) D2,0 gD0,1 g + 2D1,0 gD1,1 g
 2
+ f (3) (g) D1,0 g D0,1 g.

The same result follows from (2.6), or more directly from (2.5), using the distinct
values principle, in fact we consider

∂ 3 h (y)

= f (1) (g) g(1,2,3) + f (2) (g) g(1,3) g2 + g1 g(2,3) + g(1,2) g3 + f (3) (g) g1 g2 g3 ,
∂y1 ∂y2 ∂y3
70 2 The Tensor Derivative of Vector Functions

and change variables y1 and y2 to x1 , and y3 to x2 to obtain

∂3

h (x1 , x2 ) = f (1) (g) D2,1 g + f (2) (g) D2,0 gD0,1 g + 2D1,0 gD1,1 g
∂x12 ∂x2
 2
+f (3) (g) D1,0 g D0,1 g.

Faà di Bruno’s formula for general mixed derivatives also follows in a similar
manner.
Theorem
 2.1 Let n = (n1 , . . . , nd ), |n| = nk , K =  (nk + 1) − 1, 0 < jk =
jk,1 , . . . , jk,d ≤ n, k = 1 : K, 0 ≤ k , and k = 1 : K then we have

|n|    
∂ |n|
K
1 Djk g k
f (g) = f (r)
(g) n! ,
∂x1n1 · · · ∂xdnd r=1 p(r,,j) k=1
k ! jk !

where set p (r, , j) = p (r, (1 , . . . , K ) , (j1 , . . . , jK )), which depends on r,


(1 , . . . , K ), and (j1 , . . . , jK ), is defined by the constraints


K 
K
jk,s k = ns , s = 1 : d, k = r.
k=1 j =1

Although we have a clear expression for higher-order mixed derivatives, the


distinct values principle together with the formula (2.6) might lead us to a more
direct result.

2.2 T-derivative

We will now discuss the basic differentiation rules when taking partial derivatives
as a tensor product for vector-valued functions. This simple technique will be handy
for dealing with several multivariate nonlinear statistical problems.

2.2.1 Differentials and Derivatives

For x ∈ Rd , let the vector-valued function f (x) = [f1 (x) , f2 (x) , . . . , fm (x)] be
differentiable as many times as is necessary. Let us introduce the notation for partial
derivatives
! "
∂ ∂ ∂ ∂
= , ,..., ,
∂x ∂x1 ∂x2 ∂xd
2.2 T-derivative 71

and denote the Jacobian matrix of f by


! "
∂f ∂ ∂ ∂
Dx f = = f , , . . . , .
∂x ∂x1 ∂x2 ∂xd

This notation is in accordance with the matrix product of a column vector f and a
row vector operator ∂/∂x , i.e.
⎡ ∂f1 ∂f1 ∂f1 ⎤
∂x1 ∂x2 ··· ∂xd
! " ⎢ .. ⎥
⎢ ∂f2 . . . ⎥
∂ ∂ ∂ ⎢ . ⎥
f , ,..., = ⎢ ∂x1 ⎥ .
∂x1 ∂x2 ∂xd ⎢ .. .. .. ⎥
⎣ . . . ⎦
∂fm ∂fm
∂x1 · · · · · · ∂xd m×d

The Jacobian matrix with respect to the partial derivative Dx f =∂f/∂x is well
known. We will use the Magnus–Neudecker approach where the justification of the
differentiation rules, which we are going to derive below, is based on the theory of
partial derivatives and differentials. The First Identification Theorem provides the
connection between the differential df, and the derivative Dx f of a vector function,
namely

df (x; dx) = Dx f (x) dx.

If the differential df of a function f is given, then the value of the partials can be
immediately determined. The rules for obtaining differentials are commonly used
for ordinary derivatives, so it is convenient to obtain the derivative (for instance the
Jacobian matrix) from the differential.
We provide three examples to illustrate this idea.
Cauchy’s rule An instance is the derivative of a composite function using
Cauchy’s rule of invariance. The formula of Cauchy’s rule of invariance states
that if f ∈ Rm1 , g ∈ Rm2 , x ∈ Rd , and h (x) = f (g (x)), then

dh (x; dx) = df (g (x) ; dg (x; dx)) .

This clearly implies the chain rule using the First Identification Theorem

Dx h (x) dx = Dy f (y) Dx g (x) dx, (2.14)

so that y = g (x). Here the Jacobian matrix Dy f (y) is m1 × m2 , and Dx g (x) is


m2 × d, hence Dx h (x) is m1 × d.
Product rule Another rule is to calculate the partial derivative of the inner
product from the differential, i.e.
 
d f g = g df + f dg.
72 2 The Tensor Derivative of Vector Functions

The derivative is obtained as follows:


 
Dx f g = g Dx f + f Dx g, (2.15)

which is a row vector.


Tensor product rule The differential of a tensor product is given by

d (f1 ⊗ f2 ) = df1 ⊗ f2 + f1 ⊗ df2


= Dx f1 dx ⊗ f2 + f1 ⊗ Dx f2 dx.

Both terms here are vectors so that


   
d (f1 ⊗ f2 ) = vec f2 dx (Dx f1 ) + vec Dx f2 dxf1
= [(Dx f1 ) ⊗ f2 ] dx + [f1 ⊗ (Dx f2 )] dx, (2.16)

using (1.8), p. 7. Therefore the partial derivative, i.e. the Jacobian matrix follows

Dx (f1 ⊗ f2 ) = (Dx f1 ) ⊗ f2 + f1 ⊗ (Dx f2 ) . (2.17)

The difference of a three-tensor product follows directly from the previous case

d (f1 ⊗ f2 ⊗ f3 ) = df1 ⊗ f2 ⊗ f3 + f1 ⊗ df2 ⊗ f3 + f1 ⊗ f2 ⊗ df3 ,

and this provides the Jacobian matrix in a similar manner

Dx (f1 ⊗ f2 ⊗ f3 ) = Dx f1 ⊗f2 ⊗f3 +f1 ⊗(Dx f2 )⊗f3 +f1 ⊗f2 ⊗(Dx f3 ) . (2.18)

More generally the Jacobian matrix of the tensor product for vector-valued
functions is given by the following:
Lemma 2.3

⊗ 
M 

⊗
Dx fk (x) = fk ⊗ Dx fj (x) ⊗ fk . (2.19)
1:M 1:j −1 j +1:M
j =1

2.2.2 The Operator of T-derivative

We introduce a partial differential operator in terms of the tensor product. We recall


that the matrix product precedes (comes before) the T-product.
2.2 T-derivative 73

Definition 2.1 Let x ∈ Rd and f (x) ∈ Rm . Operator Dx⊗ (called T-derivative) is


defined as follows:
⎡ ∂f1 ∂f1 ∂f1 ⎤
∂x1 ∂x2 · · · ∂xd
  ⎢ .. ⎥
⎢ ∂f2 . . . . ⎥
∂f ⎢ ⎥
Dx⊗ f = vec (Dx f) = vec = vec ⎢ ∂x1 ⎥ .
∂x ⎢ .. . . .. ⎥
⎣ . . . ⎦
∂fm ∂fm
∂x1 · · · · · · ∂xd

We note that for a scalar-valued function f , the T-derivative coincides with the
gradient of f at a point x. Recall the difference vecA = vec (A ) and vec A =
(vecA) .
The main difference between the vectorized Dx and Dx⊗ is that Dx⊗ is strongly
connected to the T-product and results in a vector which is formally a T-product of
function f and the partial derivative ∂/∂x. In practice Dx⊗ is the following T-product
 
∂  ∂ ∂
Dx⊗ f = vec f  = vec f = f ⊗ . (2.20)
∂x ∂x ∂x

The following is an example of this.


Example 2.4 First we consider a simple case
 
∂ 
Dx⊗ x = vec (Dx x) = vec x  = vecId .
∂x

Then we apply Dx⊗ on a linear function; either using the definition


 
∂ 
Dx⊗ Ax = vec (Dx Ax) = vec Ax  = vecA ,
∂x

or using (2.20)


Dx⊗ Ax = (Ax) ⊗ = (A ⊗ Id ) vecId = vecA , (2.21)
∂x
(see (1.8), p. 7 for the last equation).
We provide some properties of Dx⊗ , which will frequently be used for different
applications.
Property 2.1 Let g (x) = Af (x), where x ∈ Rd , f (x) ∈ Rm2 , and A is m1 × m2 ,
then we obtain

Dx⊗ g = (A ⊗ Id ) Dx⊗ f. (2.22)


74 2 The Tensor Derivative of Vector Functions

In fact, we have
 
∂ ∂
Dx⊗ g = Af ⊗ = (A ⊗ Id ) f ⊗ = (A ⊗ Id ) Dx⊗ f.
∂x ∂x

The next property is


Property 2.2 Let A be an m1 × d matrix, and f ∈ Rm2 , then
  *
*
Dx⊗ f (Ax) = Im2 ⊗ A Dy⊗ f (y)* . (2.23)
y=Ax

We use the chain rule (2.14) for the Jacobian to prove this
 *
∂ **
Dx f (Ax) = f  * A,
∂y y=Ax

then by the definition of Dx⊗ we have


!   " *
∂    *
Dx⊗ f (Ax) = vec A f  = Im2 ⊗ A Dy⊗ f (y)* .
∂y y=Ax

In particular, if m2 = 1, i.e. f is scalar-valued then


*
*
Dx⊗ f (Ax) = A Dy⊗ f (y)* .
y=Ax

2.2.3 Basic Rules

Some simple rules for T-derivatives follow directly from the definition and the above
properties of partial derivatives.
Sum rule Let f ∈ Rm , and g ∈ Rm be functions of variable x ∈ Rd , then

Dx⊗ (f + g) = Dx⊗ f + D⊗
x g.

Product rule Let f ∈ Rm , and g ∈ Rm be functions of variable x ∈ Rd , then


     
Dx⊗ f g = g ⊗ Im Dx⊗ f + f ⊗ Im D⊗
x g.

We consider the Jacobian and the definition of Dx⊗


           
vec Dx f g = vec g Dx f + vec f Dx g = g ⊗ Im Dx⊗ f + f ⊗ Im D⊗
x g,

to obtain the assertion.


2.2 T-derivative 75

Decomposition rule Let u (x) = h (f, g) be a vector-valued function of functions


f and g and both f and g be vector-valued functions of x ∈ Rd . We also assume
that there is no connection between f and g, then

Dx⊗ u = Dx⊗ h|g=const + Dx⊗ h|f=const ,

Here we need some more explanation of, say, Dx⊗ h|g=const . We shall see in
Lemma 2.4 that the derivative of compound functions cannot be decomposed.
It is not a simple product of the derivative of h multiplied by the derivative of f
as is usual in the case of compound functions.
We proceed with the T-derivative of compound functions, which is more general
than Property 2.2 and is usually referred to as the chain rule.
Let f ∈ Rm1 , g ∈ Rm2 , x ∈ Rd , and y ∈ Rm2 then for the function h (x) =
f (g (x)) we have the chain rule (2.14) for partial derivatives as follows:
 
Dx h (x) = Dy f (y) m (Dx g (x))m2 ×d ,
1 ×m2

where y = g (x). We apply Lemma 1.1, p. 10, where we have shown for an m2 × m1
matrix A and a p × m2 matrix B that
   
vec (BA) = vec A ⊗ Im1 p Im1 ⊗ Km2 •m1 ⊗ Ip vecIm1 ⊗ vecB
   
= vec B ⊗ Im1 p Im2 ⊗ Kp•m1 ⊗ Ip vecA ⊗ vecIp .
 
Now we replace Am2 ×m1 by Dy f , and Bd×m2 by (Dx g) and obtain

     
Dx⊗ h = vec (Dx h) = vec (Dx g) Dy f = Dy f ⊗ Id Dx⊗ g = Im1 ⊗ (Dx g) Dy⊗ f,

(see (1.19), p. 9). The second line of expression vec (BA) implies a similar result as
well.
Thus we have obtained the following Lemma.
Lemma 2.4 (Chain Rule) Let f ∈ Rm1 , g ∈ Rm2 , x ∈ Rd , and y ∈ Rm2 , then the
chain rule for the compound function h (x) = f (g (x)) is
      
Dx⊗ h = Dy f ⊗ Id Dx⊗ g = Dy⊗ f ⊗ Im1 d K(1324) (m1 , m1 , m2 , d) vecIm1 ⊗ Dx⊗ g ,
(2.24)

where

K(1324) (m1 , m1 , m2 , d) = Im1 ⊗ Km2 •m1 ⊗ Id .


76 2 The Tensor Derivative of Vector Functions

We also have another form


  
 
Dx⊗ h = Im2 ⊗ (Dx g) Dy⊗ f = Dx⊗ g ⊗ Im1 d K(1324) (m2 , m1 , d, d)
 
Km2 •m1 Dy⊗ f ⊗ vecId , (2.25)

where

K(1324) (m2 , m1 , d, d) = Im2 ⊗ Kd•m1 ⊗ Id .

The general form of the chain rule is not very convenient. When we express the
T-derivative of a composite function in terms of T-derivatives of the compositions
(second rows of (2.24) and (2.25)). There are cases when using Jacobian matrices,
either Dy f or Dx g (first rows of (2.24) and (2.25)) simplifies the result. One can
recognize the ordinary chain rule using the  product in (2.24) and obtaining Dx⊗ h =
Dy f  Dx⊗ g, (see (1.60), p. 51 for the star product).
The following particular cases have important applications:
1. If h is scalar-valued h = h, i.e. f = f , m1 = 1, then
  
Dx⊗ h = Dy⊗ f ⊗ Id Dx⊗ g, (2.26)

or in another form

Dx⊗ h = Dy⊗ f  Dx⊗ g, (2.27)

where  denotes the star product, see (1.60), p. 51. Using the star product, the
ordinary chain rule appears here.
2. If g is scalar-valued, i.e. m2 = 1, g = g, then Dy⊗ = ∂/∂y and
 
Dx⊗ h = Im1 ⊗ Dx⊗ g Dy⊗ f,

hence

Dx⊗ h = Dy⊗ f ⊗ Dx⊗ g. (2.28)

3. If d = 1, then Dx⊗ = ∂/∂x and



 
Dx⊗ h = Dx⊗ g ⊗ Im1 Km2 •m1 Dy⊗ f;

(see (1.22)).
4. If both f and g are scalar-valued, i.e. m1 = 1, m2 = 1, then Dx⊗ = ∂/∂x and
Dy⊗ = ∂/∂y and we obtain

Dx⊗ h = Dy⊗ f Dx⊗ g.


2.2 T-derivative 77

2.2.4 T-derivative of T-products

Let us start with an example which shows the internal aspects of the usage of T-
derivative.
Example 2.5 The differential of x⊗2 is

dx⊗2 = dx ⊗ x + x ⊗ dx = (Id ⊗ x) dx + (x ⊗ Id ) dx,

(see, (2.16)), hence the T-derivative is


  ∂  
Dx⊗ x⊗2 = x⊗2 ⊗ = vec Id ⊗ x + x ⊗ Id
∂x
 
= Id ⊗ K−1
(21) (vecId ⊗ x) + x ⊗ vecId ,

(see (1.21), p. 9), which can be simplified as


  ∂
x⊗2 ⊗ = K−1
(132) (vecId ⊗ x) + x ⊗ vecId , (2.29)
∂x

where K(132) = Id ⊗ K(21) is the commutator for ∂/∂x and x in the product x⊗2 ⊗
∂/∂x. Note that the difference between x ⊗ (x ⊗ ∂/∂x) = x ⊗ Dx⊗ x, and x⊗2 ⊗
∂/∂x = Dx⊗ x⊗2 . The first term in (2.29) corresponds to (x ⊗ ∂/∂x) ⊗ x where we
differentiate the “first” term of x⊗x; doing so we need to interchange the second and
third term in a T-product a ⊗ a ⊗ b, this can be done with the help of a commutator
matrix as

a ⊗ a ⊗ b = K−1 −1
(132) K(132) (a ⊗ a ⊗ b) = K(132) (a ⊗ b ⊗ a) , (2.30)

hence the first term in (2.29) is


 

K−1
(132) x ⊗ ⊗ x = K−1
(132) (vecId ⊗ x) .
∂x

We note the use of the inverse of commutator. Here and later on it is pointed out—
as in (2.30), that the result is reached by interchanging vectors back and forth. It is
worth noting that although the derivative is unique, the form of the right-hand side
(2.29) is not, since

x ⊗ vecId = K−1
(231) (vecId ⊗ x) .

So we have another form of (2.29)


 
Dx⊗ x⊗2 = K−1
(132) + K −1
(231) (vecId ⊗ x) . (2.31)
78 2 The Tensor Derivative of Vector Functions

We shall keep using both notations Dx⊗ and ⊗∂/∂x for the T-derivative. It is
important to note that although the tensor product is associative in general and
Dx⊗ f = f ⊗ ∂/∂x, nevertheless expressions in tensor products including ∂/∂x are
not associative any more. For instance (f1 ⊗ f2 ) ⊗ ∂/∂x = f1 ⊗ (f2 ⊗ ∂/∂x), since
(f1 ⊗ f2 ) ⊗ ∂/∂x = Dx⊗ (f1 ⊗ f2 ) and f1 ⊗ (f2 ⊗ ∂/∂x) = f1 ⊗ Dx⊗ f2 ; therefore,
bracketing is necessary for the correct calculus.
We proceed with some properties of the operator Dx⊗ which are frequently used
below.
Property 2.3 If x ∈ Rd , f1 ∈ Rm1 , and f2 ∈ Rm2 then
 ⊗ 
Dx⊗ (f1 ⊗ f2 ) = K−1 ⊗
(132) (m1 , m2 , d) Dx f1 ⊗ f2 + f1 ⊗ Dx f2 , (2.32)

where K−1
(132) (m1 , m2 , d) = K(132) (m1 , d, m2 ).
Proof To prove (2.32) first we consider the derivative

Dx (f1 ⊗ f2 ) = Dx f1 ⊗ f2 + f1 ⊗ Dx f2 ,

by (2.17) hence
 
Dx⊗ (f1 ⊗ f2 ) = vec (Dx (f1 ⊗ f2 )) = vec (Dx f1 ⊗ f2 ) + (f1 ⊗ Dx f2 ) .

Now (1.21) yields


    
vec (Dx f1 ) ⊗ f2 = Im1 ⊗ Km2 •d Dx⊗ f1 ⊗ f2 ,

and we have
   
vec f1 ⊗ (Dx f2 ) = f1 ⊗ Dx⊗ f2 ,

by (1.22) hence
    
∂    ∂    
vec f1 ⊗ f + f ⊗ f 2  = Im1 ⊗ Km2 •d Dx⊗ f1 ⊗ f2 + f1 ⊗ Dx⊗ f2 .
∂x 2 1 ∂x

The commutator matrix Im1 ⊗Km2 •d changes the order of a triple T-product, namely
let a1 ∈ Rm1 , a2 ∈ Rm2 , and b ∈ Rd then
 
Im1 ⊗ Km2 •d (a1 ⊗ b ⊗ a2 ) = a1 ⊗ a2 ⊗ b,

so that Im1 ⊗ Km2 •d = K−1


(132) (m1 , m2 , d).

The operator Dx⊗ acts on a tensor product f1 ⊗ f2 similar to differentiating the


product of functions, which is
   
∂ ∂ ∂
(f1 f2 ) = f1 f2 + f1 f2 ,
∂x ∂x ∂x
2.2 T-derivative 79

the difference is that the tensor products do not commute, therefore we need to apply
a commutator before acts ∂/∂x.
Applying Dx⊗ on f1 ⊗ f2 yields two terms; one is
 

f1 ⊗ ⊗ f2 = K(132) (m1 , m2 , d) (f1 ⊗ f2 ⊗ a)|a=∂/∂x ,
∂x

i.e. reversing the last two vectors in product f1 ⊗f2 ⊗a before applying the derivative
∂/∂x. Now we see the pattern
 
∂ ∂
(f1 ⊗ f2 ) ⊗ = f1 ⊗ f2 ⊗ + K−1
(132) (m1 , m2 , d)
∂x ∂x
K(132) (m1 , m2 , d) (f1 ⊗ f2 ⊗ a)|a=∂/∂x
  
⊗ −1 ∂
= f1 ⊗ Dx f2 + K(132) (m1 , m2 , d) f1 ⊗ ⊗ f2
∂x
 ⊗ 
= f1 ⊗ Dx⊗ f2 + K−1
(132) (m1 , m2 , d) Dx f1 ⊗ f2 .

Here we must remember that the inverse K−1 (132) (m1 , m2 , d) = K(132) (m1 , d, m2 )
of the commutator K(132) (m1 , m2 , d), and is equal to Im1 ⊗ Km2 •d , where the
dimensions are counted. Notice K−1 (132) (m1 , m2 , d) = K(132) (m1 , d, m2 ).
We derive the second term in (2.32); simply f1 ⊗(f2 ⊗ ∂/∂x) = f1 ⊗Dx⊗ f2 , hence
there is no need for a commutator to reach the derivative of f2 .
This implies the following rule of thumb: Applying the operator Dx⊗ for the first
function f1 in the product f1 ⊗ f2 , one needs to transform vector ∂/∂x just behind
vector f1 by the commutator K(132) (m1 , m2 , d), and it is necessary to use the inverse
K−1
(132) for maintaining equality and pretending to transform it back to the original
form.
There is another way of changing the orders of a tensor product (f1 ⊗ f2 ) ⊗ ∂/∂x
so that ∂/∂x be able to act on f1 , namely we can change the order of f1 and f2 . The
result is

(f1 ⊗ f2 ) ⊗ = f1 ⊗ Dx⊗ f2 + K−1
(213) (m1 , m2 , d)
∂x
K(213) (m1 , m2 , d) (f1 ⊗ f2 ⊗ a)|a=∂/∂x
  

= f1 ⊗ Dx⊗ f2 + K−1
(213) (m 1 , m 2 , d) f 2 ⊗ f 1 ⊗
∂x
 
= f1 ⊗ Dx⊗ f2 + K−1
(213) (m1 , m2 , d) f2 ⊗ Dx f1 .

Naturally
 ⊗   
K−1 −1 ⊗
(132) (m1 , m2 , d) Dx f1 ⊗ f2 = K(213) (m1 , m2 , d) f2 ⊗ Dx f1 ,
80 2 The Tensor Derivative of Vector Functions

hence we have another form of the T-derivative


 
Dx⊗ (f1 ⊗ f2 ) = K−1 ⊗ ⊗
(213) (m1 , m2 , d) f2 ⊗ Dx f1 + f1 ⊗ Dx f2
 ⊗ 
= K−1 ⊗
(132) (m1 , m2 , d) Dx f1 ⊗ f2 + f1 ⊗ Dx f2 ,

and either one is correct. Again the general pattern of (2.32) for T-products is similar
to the common differential of products.

2.2.4.1 T-derivative of More Tensor Products

The T-derivative of tensor products is similar to the derivative of the product of


several functions. We consider the T-derivative of a triple tensor product in detail
then state its general version.
Method 1: First, let us consider the Jacobian (derivative) of a triple tensor
product

Dx (f1 ⊗ f2 ⊗ f3 ) = Dx f1 ⊗ f2 ⊗ f3 + f1 ⊗ Dx f2 ⊗ f3 + f1 ⊗ f2 ⊗ Dx f3 , (2.33)

see Lemma 2.3, and then follow the definition of Dx⊗ . That is, we need to vectorize
the transposition of the terms one by one. Doing so, we have
     
vec (Dx f1 ) ⊗ f2 ⊗ f3 = Im1 ⊗ Km2 m3 •d vec (Dx f1 ) ⊗ f2 ⊗ f3
 
= K−1 
(1423) (m1:3 , d) vec (Dx f1 ) ⊗ f2 ⊗ f3

for the first term in (2.33), where we used (1.21), p. 9. Again, we applied ⊗∂/∂x
on the first term f1 in the product, and to be able to do so, we have to use the
commutator K(1423) (m1:3 , d) and its inverse K−1 (1423) (m1:3 , d), which restores the
product to the original form, to obtain ∂/∂x in the second place from the last one in
(f1 ⊗ f2 ⊗ f3 ) ⊗ ∂/∂x.
Similarly, we interchange ∂/∂x and f3 , for the second term in (2.33), by the
commutator K(1243) (m1:3 , d), and after we have Dx f2 we restore it by the inverse
commutator K−1 (1243) (m1:3 , d). The last term in (2.33) is the simplest one
 

f1 ⊗ f2 ⊗ f3 ⊗ = f1 ⊗ f2 ⊗ Dx⊗ f3 ,
∂x

where there is no need for a commutator.


Finally we obtain
 ⊗ 
Dx⊗ (f1 ⊗ f2 ⊗ f3 ) = K−1
(1423) (m1:3 , d) Dx f1 ⊗ f2 ⊗ f3
 
+K−1 ⊗ ⊗
(1243) (m1:3 , d) f1 ⊗ Dx f2 ⊗ f3 + f1 ⊗ f2 ⊗ Dx f3 . (2.34)
2.2 T-derivative 81

Method 2: We can apply the result 2.3 to the 3−product in two steps:

Dx⊗ (f1 ⊗ f2 ⊗ f3 ) = Dx⊗ (f1 ⊗ (f2 ⊗ f3 )) = f1 ⊗ Dx⊗ (f2 ⊗ f3 )


 ⊗ 
+ K−1(132) (m1 , m2 m3 , d) Dx f1 ⊗ f2 ⊗ f3
 ⊗ 
= f1 ⊗ Dx⊗ (f2 ⊗ f3 ) + K−1 (1423) (m1:3 , d) Dx f1 ⊗ f2 ⊗ f3
 ⊗ 
= K−1
(1423) (m1:3 , d) Dx f1 ⊗ f2 ⊗ f3 + f1 ⊗ f2 ⊗ Dx f3

 
+ K−1 ⊗
(1243) (m1:3 , d) f1 ⊗ Dx f2 ⊗ f3 .

An alternative way of reaching the T-derivative Dx⊗ fk in a 3−product f1 ⊗ f2 ⊗ f3


is to interchange vector fk and the last one f3 by setting it just before ∂/∂x , instead
of moving ∂/∂x to the (k + 1)t h place. This yields for, say, k = 1 the following:

f1 ⊗ f2 ⊗ f3 ⊗ a = K−1
(2314) (m1:3 , d) K(2314) (m1:3 , d) (f1 ⊗ f2 ⊗ f3 ⊗ a)

= K−1
(2314) (m1:3 , d) (f2 ⊗ f3 ⊗ (f1 ⊗ a)) .

If we change a by ∂/∂x we obtain the term


 
K−1 ⊗
(2314) (m1:3 , d) f2 ⊗ f3 ⊗ Dx f1 .

In this way we have an alternate form of the T-derivative.


The generalization of the T-derivative of a triple tensor product, keeping
Lemma 2.3 in mind, is straightforward for an M−tensor product. Two types of
commutators will be necessary:
One which sets the last term in a tensor product with M + 1 elements into the
kth place, k = 2, . . . M + 1. This corresponds to a permutation p of 2 : (M + 1),
namely p is an −cycle ((M + 1) : k)S = ((M + 1) , M, . . . , k) whose length  =
M −k +2; see Lemma 1.2 for details. We can simplify this commutator by grouping
the product
⊗ ⊗ ⊗
fk = fj ⊗ fj ,
1:M 1:k k+1:M

then the permutation (M + 1 : k)S of ⊗ 1:M fk ⊗ ∂/∂x is equivalent
 to the permu-
⊗  ⊗
tation (1 : k, (M + 1) , k + 1 : M) of fj ⊗ fj ⊗ ∂/∂x. Actually
1:k  (k+1):M

we interchange the last two elements (k+1):M fj and ∂/∂x, paying attention to
the dimensions.
The other type of commutators corresponds to setting the  kth element
⊗
to the Mth place; now we group the product by 1:(k−1) fj ⊗ fk (x) ⊗
   
⊗ ⊗
(k+1):M fj and interchange fk (x) and (k+1):M fj in the product
82 2 The Tensor Derivative of Vector Functions

   
⊗ ⊗
1:(k−1) f j ⊗ f k (x) ⊗ (k+1):M f j ⊗ ∂/∂x, so the corresponding permutation
is (1 : k − 1, k + 1 : M, k, M + 1) = (k : M)S .
Lemma 2.5 The general result, with respect to (2.32), is the following: if fk (x) ∈
Rmk , k = 1, 2, . . . , M, then

⊗ 
M
Dx⊗ fj (x) = K−1
(M+1:k)S (m1:M , d)
1:M
k=1
⊗
⊗ 
fj ⊗ Dx⊗ fk (x) ⊗ fj , (2.35)
1:(k−1)∗ (k+1:M)∗

where we put the empty set for both (1 : 0)∗ = ∅, and (M + 1 : M)∗ = ∅; these
are the first and last terms in the sum. Another form of the derivative is

⊗ 
M ⊗ 
Dx⊗ fj (x) = K−1
(k:M)S (m1:M , d) fj ⊗ Dx⊗ fj (x) .
1:M 1:M,j =k
k=1
(2.36)

Proof We use Property 2.3 with induction and the result of Lemma 1.2.
Example 2.6 We consider a particular case

f (x) = x⊗M ,

of (2.35), where all functions are equal fj (x) = x, and all dimensions mj are the
same mj = d. We apply (2.36) and realize that moving the kth term to the Mth
place is equivalent to the commutator Lk = Id k−1 ⊗ Kd M−k •d ⊗ Id and L−1 k =
Id k−1 ⊗ Kd•d M−k ⊗ Id . This implies that the corresponding partition to commutator
Id k−1 ⊗ Kd M−k •d has type 1 = 1, M−1 = 1, j = 0 otherwise. Therefore we can
write


M
Id k−1 ⊗ Kd•d M−k ⊗ Id = L−1
1M−1 ,11 ⊗ Id ,
k=1

(see Sect. 2.4.3, p. 100). Hence


  
Dx⊗ x⊗M = L−1
1M−1 ,11 ⊗ Id x ⊗(M−1)
⊗ vecId .
2.2 T-derivative 83

In particular
   
Dx⊗ x⊗3 = K−1
(2314) x
⊗2 ⊗ vecI −1
d + K(1324) x
⊗2 ⊗ vecI
d +x
⊗2 ⊗ vecI
d
     
= Id 4 + K−1(1324) + K−1
(2314) x⊗2 ⊗ vecI
d = L −1
1 ,1 ⊗ Id x⊗2 ⊗ vecI .
d
2 1

2.2.4.2 T-derivative with Higher Orders

Now, if we repeat Dx⊗ then we obtain the second-order derivative


!  "  ⊗2
 
   ∂ ∂  ∂
Dx⊗2 f = Dx⊗ Dx⊗ f = vec Dx Dx⊗ f = vec f⊗ = f⊗ ,
∂x ∂x ∂x

either by the definition or using (2.20).


If we apply the operator Dx⊗ consecutively k times i.e. Dx⊗k on f ∈ Rm , then it
yields

   ⊗k

Dx⊗k f = Dx⊗ ⊗k−1
Dx f =f⊗
∂x
! "⊗k
 ∂ ∂ ∂
= [f1 (x) , f2 (x) , . . . , fm (x)] ⊗ , ,..., ,
∂x1 ∂x2 ∂xd

the result is a column vector of order md k , containing all possible partial deriva-
tives of entries of f with order d with respect to the T-product of partials
[∂/∂x1 , ∂/∂x2 , . . . , ∂/∂xd ]⊗k .
A simpler version of the T-derivative of higher orders is when functions are
scalar-valued, and it will be applied several times.
Let f and g be scalar-valued functions of x ∈ Rd then the first four derivatives
are as follows:
First order The T-derivative of the product fg is

Dx⊗ (fg) = f Dx⊗ g + gDx⊗ f.

Second order Let us consider the second-order T-derivative, (the scalar com-
mutes with vectors), hence the first term is
   ⊗ 
Dx⊗ f Dx⊗ g = f Dx⊗2 g + Dx⊗ g ⊗ Dx⊗ f = f Dx⊗2 g + K−1 ⊗
(21) Dx f ⊗ Dx g ,
84 2 The Tensor Derivative of Vector Functions

the second term is similar, hence


  
Dx⊗2 (fg) = f Dx⊗2 g + Id 2 + K−1
(21) Dx⊗ f ⊗ Dx⊗ g + gDx⊗2 f
 
= f Dx⊗2 g + 2Sd12 Dx⊗ f ⊗ Dx⊗ g + gDx⊗2 f. (2.37)

Third order Now, for the third-order T-derivative we apply Dx⊗ to the second
derivative (2.37) term by term; an example is the derivative of the second term of
(2.37), which has two terms, the first one is
   
Dx⊗ Dx⊗ f ⊗ Dx⊗ g = Dx⊗ f ⊗ Dx⊗2 g + K−1 ⊗2
(132) Dx f ⊗ Dx g

   
= K−1
(231) Dx
⊗2
g ⊗ Dx

f + K −1
(132) Dx
⊗2
f ⊗ Dx

g ,

and the second one is


 ⊗   −1   
Dx⊗ K−1 ⊗
(21) Dx f ⊗ Dx g = K(21) ⊗ Id Dx⊗ f ⊗ Dx⊗2 g + K−1 ⊗2 ⊗
(132) Dx f ⊗ Dx g
   
= K−1 ⊗ ⊗2 −1 ⊗2
(213) Dx f ⊗ Dx g + K(231) Dx f ⊗ Dx g ,

 −1
since K−1 −1
(213) K(132) = K(132) K(213) = K−1(231) .
The rest of the terms of (2.37) are simple, so we obtain
 
Dx⊗3 (fg) = f Dx⊗3 g + Id 3 + K−1
(231) + K −1
(132)
 
Dx⊗2 f ⊗ Dx⊗ g + Dx⊗2 g ⊗ Dx⊗ f + gDx⊗3 f. (2.38)

The same argument as in Example 1.8, p. 15 leads to a compact form


 
Dx⊗3 (fg) = f Dx⊗3 g + 3Sd13 Dx⊗2 f ⊗ Dx⊗ g + Dx⊗2 g ⊗ Dx⊗ f + gDx⊗3 f.

Notice that the left-hand sides in all cases above are q-symmetric, see Sect. 1.3.1,
p. 13. Let us consider the third-order case, the left-hand side of (2.38) is

Dx⊗3 (fg) = (fg) ⊗ Dx⊗ ⊗ Dx⊗ ⊗ Dx⊗ ,

and now apply the symmetrizer Sd13 to both sides and obtain

Sd13 Dx⊗3 (fg) = Dx⊗3 (fg) ,

since changing the order of operator Dx⊗ results in no change. The first two terms
on right-hand side of (2.38) are 3-symmetric too, the coefficients of the last term are

permutation matrices; if we apply the symmetrizer the result is simply changing the
order in the sum so that
 
Sd13 Id 3 + K−1
(231) + K −1
(132) = 3Sd13 .

We rewrite (2.38) as follows:

  D_x^{⊗3}(fg) = S_{d1_3}( \binom{3}{0} f D_x^{⊗3} g + \binom{3}{1} D_x^{⊗2} f ⊗ D_x^⊗ g
                + \binom{3}{2} D_x^⊗ f ⊗ D_x^{⊗2} g + \binom{3}{3} g D_x^{⊗3} f ).

This looks much closer formally to the ordinary derivatives of a product of two
functions.
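
In matrix form, (2.37) is the familiar Hessian product rule: the Hessian of fg equals f·Hess(g) + g·Hess(f) + ∇f ∇g' + ∇g ∇f', the outer-product terms being the coordinate version of 2S_{d1_2}(D_x^⊗ f ⊗ D_x^⊗ g). The following sympy sketch (our own illustration; the functions f and g are arbitrary choices) verifies this identity:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])
f = sp.exp(x1) + x1*x2**2          # arbitrary smooth scalar functions (our choice)
g = sp.sin(x1*x2) + x2

def grad(h):                        # D_x h as a column vector
    return sp.Matrix([sp.diff(h, v) for v in x])

lhs = sp.hessian(f*g, list(x))      # matrix form of D_x^{⊗2}(fg)
rhs = f*sp.hessian(g, list(x)) + g*sp.hessian(f, list(x)) \
      + grad(f)*grad(g).T + grad(g)*grad(f).T
assert all(sp.simplify(e) == 0 for e in (lhs - rhs))
```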
Fourth order Similarly to the third order, the fourth-order derivative is written as

  D_x^{⊗4}(fg) = S_{d1_4}( f D_x^{⊗4} g + g D_x^{⊗4} f
                + 4 (D_x^{⊗3} f ⊗ D_x^⊗ g + D_x^{⊗3} g ⊗ D_x^⊗ f)
                + 3 (D_x^{⊗2} f ⊗ D_x^{⊗2} g + D_x^{⊗2} g ⊗ D_x^{⊗2} f) )
              = S_{d1_4} ∑_{k=0}^{4} \binom{4}{k} (D_x^{⊗k} f ⊗ D_x^{⊗(4−k)} g),

and the general Leibniz rule follows by induction:

  D_x^{⊗n}(fg) = S_{d1_n} ∑_{k=0}^{n} \binom{n}{k} (D_x^{⊗k} f ⊗ D_x^{⊗(n−k)} g); (2.39)

the further generalization of this expression is the following.


Lemma 2.6 Let f_1, …, f_n be scalar-valued functions of x ∈ R^d; then

  D_x^{⊗m} ∏_{j=1:n} f_j = S_{d1_m} ∑_{∑ k_{1:n} = m} \binom{m}{k_{1:n}} ⊗_j D_x^{⊗ k_j} f_j. (2.40)

We now generalize the result (2.40) to higher-order derivatives of the tensor product of vector-valued functions in the case n = 2. Naturally, the distinct values principle can then be applied to obtain formulae like (2.40). One can easily show that for n = 2 the result of Lemma 2.6 is a particular case of Lemma 2.7.

Assume f_j ∈ R^{m_j}, j = 1, 2, are functions of a list of variables x_{1:n} with dimensions d_{1:n}; in other words, the variable x = x_{1:n} is sliced into sub-vectors x_j with dimension d_j. Let us use the notation D_{1:n}^⊗ (f_1 ⊗ f_2) for the T-derivative D_{x_{1:n}}^⊗ (f_1 ⊗ f_2), in particular D_{1,2}^⊗ (f_1 ⊗ f_2) = D_{x_1,x_2}^⊗ (f_1 ⊗ f_2). The dimensions (m_{1:2}, d_{1:2}) for the commutator matrices of order ∏ m_{1:2} ∏ d_{1:2} will be considered as default and will be omitted for shorter notation, for instance K^{−1}_{(1324)}(m_{1:2}, d_{1:2}) = K^{−1}_{(1324)}.
The first-order derivative of f_1 ⊗ f_2 (see (2.32)) is

  D_1^⊗ (f_1 ⊗ f_2) = K^{−1}_{(132)}(D_1^⊗ f_1 ⊗ f_2) + f_1 ⊗ D_1^⊗ f_2, (2.41)

then the consecutive application of (2.32) yields the second-order one

  D_{1:2}^⊗ (f_1 ⊗ f_2) = K^{−1}_{(1324)}(D_1^⊗ f_1 ⊗ D_2^⊗ f_2) + K^{−1}_{(1324)} K^{−1}_{(1243)}(D_{1,2}^⊗ f_1 ⊗ f_2)
                        + f_1 ⊗ D_{1,2}^⊗ f_2 + K^{−1}_{(1423)}(D_2^⊗ f_1 ⊗ D_1^⊗ f_2)
                      = f_1 ⊗ D_{1,2}^⊗ f_2 + K^{−1}_{(1324)}(D_1^⊗ f_1 ⊗ D_2^⊗ f_2) + K^{−1}_{(1423)}(D_2^⊗ f_1 ⊗ D_1^⊗ f_2)
                        + K^{−1}_{(1342)}(D_{1,2}^⊗ f_1 ⊗ f_2),

where K^{−1}_{(1342)} = K^{−1}_{(1342)}(m_1, m_2, d_1, d_2) by the default notation. We argue as follows:

we used D_2^⊗ on the first term of (2.41) as

  D_2^⊗ ( K^{−1}_{(132)}(m_1, m_2, d_1)(D_1^⊗ f_1 ⊗ f_2) ) = (K^{−1}_{(132)}(m_1, m_2, d_1) ⊗ I_{d_2}) D_2^⊗ (D_1^⊗ f_1 ⊗ f_2)
    = K^{−1}_{(1324)}( K^{−1}_{(1243)}(m_1, d_1, m_2, d_2)(D_{1,2}^⊗ f_1 ⊗ f_2) + D_1^⊗ f_1 ⊗ D_2^⊗ f_2 )
    = K^{−1}_{(1324)} K^{−1}_{(1243)}(m_1, d_1, m_2, d_2)(D_{1,2}^⊗ f_1 ⊗ f_2) + K^{−1}_{(1324)}(D_1^⊗ f_1 ⊗ D_2^⊗ f_2),

and the product of commutators involved here is

  K^{−1}_{(1324)} K^{−1}_{(1243)}(m_1, d_1, m_2, d_2) = ( K_{(1243)}(m_1, d_1, m_2, d_2) K_{(1324)} )^{−1} = K^{−1}_{(1342)},

since the consecutive application of commutators acts on vectors with orders defined by the second commutator; therefore we obtain the derivative of the first term

  D_2^⊗ ( K^{−1}_{(132)}(m_1, m_2, d_1)(D_1^⊗ f_1 ⊗ f_2) ) = K^{−1}_{(1342)}(D_{1,2}^⊗ f_1 ⊗ f_2) + K^{−1}_{(1324)}(D_1^⊗ f_1 ⊗ D_2^⊗ f_2).

We recall that the consecutive application of permutations corresponds to the


product of permutations in opposite order, (see Remark 1.2, p. 2).
Now let us consider the T-derivative of the second term of (2.41):

  D_2^⊗ (f_1 ⊗ D_1^⊗ f_2) = K^{−1}_{(1423)}(D_2^⊗ f_1 ⊗ D_1^⊗ f_2) + f_1 ⊗ D_{1,2}^⊗ f_2.


We combine the two terms above and arrive at the derivative D1:2 (f1 ⊗ f2 ) above.
A similar argument leads to the third-order derivative
⊗ ⊗
D1:3 (f1 ⊗ f2 ) = f1 ⊗ D1:3 f2 + K−1 ⊗ ⊗
(15234) D3 f1 ⊗ D1,2 f2

+ K−1 ⊗ ⊗ −1 ⊗ ⊗
(13245) D1 f1 ⊗ D2,3 f2 + K(13524)D1,3 f1 ⊗ D2 f2

+ K−1 ⊗ ⊗ −1 ⊗ ⊗
(14235) D2 f1 ⊗ D1,3 f2 + K(14523)D2,3 f1 ⊗ D1 f2
   ⊗ 
+ K−1 ⊗ ⊗ −1
(13425) D1,2 f1 ⊗ D3 f2 + K(1342) D1:3 f1 ⊗ f2

= K−1 ⊗ ⊗
p(b) Db f1 ⊗ Dbc f2 ,
b

where b is a block of 1 : 3 including the empty set, and permutation p (b)


denotes the permutation of the indices of dimensions (m1:2 , d1:3 ), such that it is
a permutation of (1, b + 2, 2, bc + 2); for instance let b = (1), bc = (2, 3), then
p (b) = (13245) corresponds to commutator K−1 (13245) , and it permutes a T-product
of vectors with dimensions (m1:2 , d1:3 ) into a T-product of vectors with dimensions
(m1 , d1 , m2 , d2 , d3 ). We show blocks b for each term of the third-order derivative:
  D_{1:3}^⊗ (f_1 ⊗ f_2) = f_1 ⊗ D_{1:3}^⊗ f_2 |_{b=∅}
    + K^{−1}_{(13245)}(D_1^⊗ f_1 ⊗ D_{2,3}^⊗ f_2)|_{b=(1)} + K^{−1}_{(14235)}(D_2^⊗ f_1 ⊗ D_{1,3}^⊗ f_2)|_{b=(2)}
    + K^{−1}_{(15234)}(D_3^⊗ f_1 ⊗ D_{1,2}^⊗ f_2)|_{b=(3)}
    + K^{−1}_{(13425)}(D_{1,2}^⊗ f_1 ⊗ D_3^⊗ f_2)|_{b=(1,2)}
    + K^{−1}_{(13524)}(D_{1,3}^⊗ f_1 ⊗ D_2^⊗ f_2)|_{b=(1,3)}
    + K^{−1}_{(14523)}(D_{2,3}^⊗ f_1 ⊗ D_1^⊗ f_2)|_{b=(2,3)} + K^{−1}_{(13452)}(D_{1:3}^⊗ f_1 ⊗ f_2)|_{b=(1:3)}.

One can use induction for proving the following Lemma.


Lemma 2.7 The general Leibniz rule for the T-derivative of the tensor product f_1 ⊗ f_2 is

  D_{1:n}^⊗ (f_1 ⊗ f_2) = ∑_b K^{−1}_{p(b)}(m_{1:2}, d_{1:n})(D_b^⊗ f_1 ⊗ D_{b^c}^⊗ f_2), (2.42)

where the summation is taken over all possible blocks b of the set (1 : n) and its
complement bc , including the empty set as well. The summation contains 2n terms,
each block is written in increasing order of its elements; the permutation p (b) of
(1 : n + 2) is defined by (1, b + 2, 2, bc + 2).

We proceed with the second-order derivative Dx⊗2 (f1 ⊗ f2 ), when both functions
depend on the single variable x; we use (2.42) and the distinct values principle, then
substitute x in each xj . It is seen then
   
  D_x^{⊗2}(f_1 ⊗ f_2) = K^{−1}_{(1342)}(D_x^{⊗2} f_1 ⊗ f_2) + (K^{−1}_{(1324)} + K^{−1}_{(1423)})(D_x^⊗ f_1 ⊗ D_x^⊗ f_2)
                       + f_1 ⊗ D_x^{⊗2} f_2. (2.43)

Moreover, the formula (2.42) yields for general n:

  D_x^{⊗n}(f_1 ⊗ f_2) = ∑_{j=0}^{n} ( ∑_{|b|=j} K^{−1}_{p(b)}(m_{1:2}, d_{1:n}) )(D_x^{⊗j} f_1 ⊗ D_x^{⊗(n−j)} f_2), (2.44)

where the second summation contains \binom{n}{j} terms, choosing j elements among n in all possible ways.
In closing this section, we emphasize that when we employ Dx⊗1:n , where x1:n has
dimensions d1:n , the order of the vectors in x1:n is important.
Example 2.7 Let us change the order of the variables in D_{x_1,x_2}^⊗ and apply it to f(x_{1:2}) ∈ R^m; we obtain

  D_{x_2,x_1}^⊗ f(x_{1:2}) = f ⊗ ∂/∂x_2 ⊗ ∂/∂x_1 = K^{−1}_{(132)}( f ⊗ ∂/∂x_1 ⊗ ∂/∂x_2 ) = K^{−1}_{(132)} D_{x_1,x_2}^⊗ f(x_{1:2}),

where K^{−1}_{(132)} = K^{−1}_{(132)}(m, d_1, d_2) = I_m ⊗ K_{d_2•d_1}; i.e. K^{−1}_{(132)}(m, d_2, d_1) rearranges the T-product f ⊗ ∂/∂x_1 ⊗ ∂/∂x_2 to the original order f ⊗ ∂/∂x_2 ⊗ ∂/∂x_1.
Example 2.8 A particular case of the previous example is when the function f = φ is a characteristic function, which is scalar-valued (m = 1); hence

  D_{λ_2,λ_1}^⊗ φ(λ_1, λ_2)|_{λ_1,λ_2=0} = K^{−1}_{(21)} D_{λ_1,λ_2}^⊗ φ(λ_1, λ_2)|_{λ_1,λ_2=0},

which implies that EX_2 ⊗ X_1 = K^{−1}_{(21)} EX_1 ⊗ X_2; this latter relation also follows from the properties of the T-product.
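
The relation EX_2 ⊗ X_1 = K^{−1}_{(21)} EX_1 ⊗ X_2 is simply a reordering of a Kronecker (T-) product by a vec-permutation matrix. A minimal numpy sketch of such a matrix follows; the function name is ours, and the exact correspondence with the book's K_{d_2•d_1} indexing convention is an assumption to be checked against Sect. 1.2.

```python
import numpy as np

def commutation_matrix(p, q):
    """Permutation matrix K with K @ np.kron(a, b) = np.kron(b, a)
    for a of length p and b of length q (one common convention)."""
    K = np.zeros((p*q, p*q))
    for i in range(p):
        for j in range(q):
            K[j*p + i, i*q + j] = 1.0
    return K

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=2)
K = commutation_matrix(3, 2)
assert np.allclose(K @ np.kron(a, b), np.kron(b, a))
```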

2.2.5 Taylor Series Expansion

The series expansion of a scalar-valued function has primary importance in the


theory of cumulants.

We have seen the derivative of scalar-valued functions, like the product rule 2.2.3, which provides the T-derivative of the inner product f'g as

  D_x^⊗ (f'g) = (f' ⊗ I_d) D_x^⊗ g + (g' ⊗ I_d) D_x^⊗ f. (2.45)

A simple case of (2.45) is a'x, where a is constant. Since the Taylor series expansion of a characteristic function contains terms like (a'x)^k, obtaining the higher-order T-derivatives of (a'x)^k is of some interest.
Lemma 2.8 Let a be a constant vector of dimension d; then

  D_x^{⊗k}(a'x)^k = k! a^{⊗k}. (2.46)

Proof The formula (2.46) can be proved directly. We observe that (a'x)^k = a'^{⊗k} x^{⊗k}; then for k = 1 we have

  D_x^⊗ (a'x) = (a' ⊗ I_d) vec I_d = a. (2.47)

The kth power of a'x is a compound function with f_1(y) = y^k, f_2(x) = a'x, and the chain rule (2.26) provides

  D_x^⊗ f = (D_y^⊗ f_1 ⊗ I_d) D_x^⊗ f_2,

hence

  D_x^⊗ (a'x)^k = k (a'x)^{k−1} D_x^⊗ (a'x) = k (a'x)^{k−1} a, (2.48)

and the second-order T-derivative of (a'x)^k is

  D_x^⊗ ( k (a'x)^{k−1} a ) = (a ⊗ I_d) k(k−1)(a'x)^{k−2} a = k(k−1)(a'x)^{k−2} a^{⊗2},

and so on, until we finally arrive at

  D_x^{⊗k}(a'x)^k = k! a^{⊗k}.

We replace the power on the left-hand side by the tensor product, resulting in

  D_x^{⊗k}( a'^{⊗k} x^{⊗k} ) = k! a^{⊗k}.
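
For k = 2, Lemma 2.8 says that the matrix form of D_x^{⊗2}(a'x)^2, the Hessian, equals 2aa'. A small sympy check, with an arbitrarily chosen vector a (our choice, purely for illustration):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
x = sp.Matrix([x1, x2, x3])
a = sp.Matrix([2, -1, 5])                  # an arbitrary constant vector
p = (a.dot(x))**2                          # (a'x)^k with k = 2

H = sp.hessian(p, list(x))                 # matrix form of D_x^{⊗2}(a'x)^2
assert all(sp.simplify(e) == 0 for e in (H - 2*a*a.T))   # equals k! a a' for k = 2
```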



An application of this lemma is the Taylor series expansion of the characteristic function

  ϕ(λ) = E exp(iX'λ) = ∑_{m=0}^{∞} (i^m/m!) E(λ'X)^m = ∑_{m=0}^{∞} (i^m/m!) λ'^{⊗m} EX^{⊗m}, (2.49)

where we assign the higher-order moments EX^{⊗m} to the coefficients of λ^{⊗m}, which are obtained by the higher-order derivative

  EX^{⊗m} = i^{−m} D_λ^{⊗m} ϕ(λ)|_{λ=0}.

2.3 Multi-Variable Faà di Bruno’s Formula

We consider the T-differential operator Dx⊗ , then we split the vector x into sub-
vectors of possibly different dimensions [d1 , d2 , . . . , dn ], in the same way we take
a list of vectors [x1 , x2 , . . . , xn ] and the derivatives with respect to the sub-vector

entries. We denote this operator by Dx⊗1:n , or D1:n , for short. These sub-vectors will
have the role of scalars in Faà di Bruno’s formula (2.6), and the T-derivative Dx⊗1:n
will be employed instead of partial derivatives.
We can permute the vectors in a T-product with the help of commutators, which
leads to the following.
Proposition 2.1 Let p be a permutation of (1:n), x_p = x_{p(1:n)}, and let the function f(x_{1:n}) ∈ R^m be continuously differentiable n times by any of its variables; then

  (I_m ⊗ K_p^{−1}) D_{x_p}^⊗ f = D_{x_{1:n}}^⊗ f,

equivalently

  D_{x_p}^⊗ f = (I_m ⊗ K_p) D_{x_{1:n}}^⊗ f;

note (I_m ⊗ K_p^{−1}(d_{1:n})) = K_{p_{n+1}}^{−1}(m, d_{1:n}), where p_{n+1} = (1, p+1).

Let us repeat the basic notions which will be used in the next theorem: P_n is the set of all partitions K of the numbers (1:n). A partition K with cardinality |K| = k and type ℓ is in canonical form if the blocks {b_1, b_2, …, b_k} of K are in the order defined by its type ℓ. The ordering b_j ≤ b_i of blocks b_j ∈ K with the same type is defined by

  ∑_{m∈b_j} 2^{−m} ≤ ∑_{m∈b_i} 2^{−m}, (2.50)

and equality in (2.50) is possible if and only if j = i (see Definition 1.7 and (1.45), p. 35). In addition, the elements in each block are increasing.
Theorem 2.2 (Faà di Bruno's Formula) Let us take a list of vectors x_{1:n} = (x_1, x_2, …, x_n) with dimensions d_{1:n} = (d_1, d_2, …, d_n). Let the compound function f(g(x)), x ∈ R^{d_k}, be differentiable d_k times, where f and g are scalar-valued functions. Then

  D_{x_{1:n}}^⊗ f(g(x_{1:n})) = ∑_{r=1}^{n} f^{(r)}(g(x_{1:n})) ∑_{K_{\{r\}}∈P_n} K^{−1}_{p(K_{\{r\}})}(d_{1:n}) ⊗_{b∈K_{\{r\}}} D_{x_b}^⊗ g(x_{1:n}), (2.51)

where K_{\{r\}} denotes partitions with size |K| = r, K_{\{r\}} is in the canonical form, p(K_{\{r\}}) is a permutation of 1:n defined by K_{\{r\}} (see Sect. 1.4.4), and D_{x_b}^⊗ is a |b|th-order T-derivative.
See the Appendix 2.4.2, p. 99, for the proof.
The summation in (2.51), which is over all partitions K_{\{r\}} with size r, can be divided by collecting partitions K_{\{r\}} of the same type when the size r is given. Then we sum with respect to types first. Let us recall that the type of a partition K is ℓ = (ℓ_1, …, ℓ_n) when K contains exactly ℓ_j blocks with cardinality j (see Sect. 1.4.4). The number and structure of partitions with the same type follow incomplete Bell polynomials; therefore, collecting all partitions with fixed cardinality, say r, can be controlled. Namely, one can obtain both the number of partitions, which is N_ℓ, and the structure of a partition K with type ℓ, from the incomplete Bell polynomial

  B_{n,r}(x_1, …, x_{n−r+1}) = ∑_{∑ℓ_j=r, ∑ jℓ_j=n} N_ℓ ∏_{j=1}^{n−r+1} x_j^{ℓ_j}, (2.52)

where

  N_ℓ = n! / ∏_{j=1}^{n} ℓ_j! (j!)^{ℓ_j}, (2.53)

and where x_j^{ℓ_j} tells us that the partition has ℓ_j blocks with cardinality j. We identify x_j in (2.52) with the jth-order derivative, and ℓ_j is the number of T-products of these derivatives. An instance: if n = 4 then x_2^2 corresponds to D_{x_i,x_j}^⊗ ⊗ D_{x_k,x_m}^⊗, where we have second-order derivatives with respect to the blocks b_1 = (i,j) and b_2 = (k,m) of a partition K_{\{2\}}, and the number of all partitions with such type ℓ = (0,2,0,0) is N_ℓ = 3. Let us consider partitions with cardinality r; then we obtain the possible types ℓ fulfilling ∑ℓ_j = r and ∑ jℓ_j = n; the exponents in ∏_{j=1}^{n−r+1} x_j^{ℓ_j} show the orders of the derivatives, and the number of those partitions defined by the type ℓ is N_ℓ, see Example 1.20. A list of Bell polynomials can be found in Sect. A.1, p. 351.
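
The count N_ℓ in (2.53) is easy to tabulate. Below is a small Python sketch (our own illustration, not from the book) that enumerates all types ℓ for n = 4 and prints N_ℓ; the resulting counts 1, 4, 3, 6, 1 are exactly the coefficients of the incomplete Bell polynomials B_{4,r} that reappear in Case 2.4 below.

```python
from math import factorial
from itertools import product

def n_partitions_of_type(ell):
    """N_ell = n! / prod_j ell_j! (j!)^ell_j for a type ell = (ell_1,...,ell_n)."""
    n = sum(j*lj for j, lj in enumerate(ell, start=1))
    denom = 1
    for j, lj in enumerate(ell, start=1):
        denom *= factorial(lj) * factorial(j)**lj
    return factorial(n) // denom

n = 4
for ell in product(range(n + 1), repeat=n):
    if sum(j*lj for j, lj in enumerate(ell, start=1)) == n:
        print(sum(ell), ell, n_partitions_of_type(ell))
# r=1: (0,0,0,1) -> 1;  r=2: (1,0,1,0) -> 4 and (0,2,0,0) -> 3;
# r=3: (2,1,0,0) -> 6;  r=4: (4,0,0,0) -> 1; the total 15 is the Bell number B_4
```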

The usual notations we have introduced earlier are f (g (x)) = f (g),


Dx⊗j g (x) = Dj⊗ g (x) = gj⊗ , and Dx⊗j ,xk g (x) = Dj,k ⊗
g (x) = g(j,k) ⊗
, and
so on. In accordance with this if we have a block b of {1, 2, . . . , n} then
Dx⊗b g (x) = Db⊗ g (x) = gb⊗ , and we will associate a product ⊗ ⊗
b∈K gb with
⊗ ⊗
a partition K, for instance the product g(1,2) ⊗ g3 corresponds to the partition
K = {(1, 2) , (3)}.
For a better understanding of the sum by partitions in (2.51), the usage of
incomplete Bell polynomials, and the commutator matrices, we will show Faà di
Bruno’s formula for n = 4 in some detail. Actually, we derive the derivatives first by
the formula and then we differentiate one by one using the rules of the T-derivative.
A list of corresponding incomplete Bell polynomials is given in Sect. A.1, p. 351.
Case 2.1 (n = 1) Then r = 1, and B1,1 (x1 ) = x1 , the derivative accordingly is

Dx⊗ f (g (x)) = f (1) (g) gx⊗ . (2.54)

Case 2.2 (n = 2) Here r = 1:2.
1. If r = 1, B_{2,1}(x_{1:2}) = x_2, then the only partition is K = {(1,2)} with one block; it has type ℓ = ℓ_{1:2} = (0,1), and the corresponding derivative is g_{(1,2)}^⊗.
2. If r = 2, B_{2,2}(x_1) = x_1^2, then we have one partition K_2 = {(1),(2)} again, with type ℓ = ℓ_{1:2} = (2,0), and the corresponding derivative is g_1^⊗ ⊗ g_2^⊗.
Differentiating (2.54) with respect to x_2, we obtain

  D_{1,2}^⊗ f(g) = f^{(1)}(g) g_{(1,2)}^⊗ + f^{(2)}(g) g_1^⊗ ⊗ g_2^⊗, (2.55)

in good agreement with the incomplete Bell polynomials. The formula (2.51) is true for n = 2.
Case 2.3 (n = 3) Here the possible r = 1:3.
1. If r = 1, B_{3,1}(x_{1:3}) = x_3, ℓ_{1:3} = (0,0,1), the only partition is K_{\{1\}} = {(1,2,3)} and the derivative is g_{(1,2,3)}^⊗.
2. If r = 2, B_{3,2}(x_{1:3}) = 3x_1x_2, ℓ = ℓ_{1:3} = (1,1,0), the only type, with coefficient 3!/2! = 3; the partitions are K_{\{2\},1} = {(1,2),(3)}, K_{\{2\},2} = {(1,3),(2)}, K_{\{2\},3} = {(2,3),(1)}, and the corresponding derivatives are g_{(1,2)}^⊗ ⊗ g_3^⊗, g_{(1,3)}^⊗ ⊗ g_2^⊗, g_{(2,3)}^⊗ ⊗ g_1^⊗. Notice that the blocks are in canonical form. Now the order of the indices of the term g_{(1,3)}^⊗ ⊗ g_2^⊗ differs from the starting one, which is 1, 2, 3; the corresponding permutation is (13|2), hence we apply the commutator K^{−1}_{(132)}.
3. If r = 3, B_{3,3}(x_{1:3}) = x_1^3, ℓ = ℓ_{1:3} = (3,0,0), one partition K_{\{3\}} = {(1),(2),(3)}, with the derivative g_1^⊗ ⊗ g_2^⊗ ⊗ g_3^⊗.

Now, to check the above steps, we differentiate (2.55) by x_3; the general pattern appears:

  D_{1,2,3}^⊗ f(g) = f^{(1)}(g) g_{(1,2,3)}^⊗ + f^{(2)}(g)( g_{(1,2)}^⊗ ⊗ g_3^⊗ + K^{−1}_{(132)}(g_{(1,3)}^⊗ ⊗ g_2^⊗)
    + g_1^⊗ ⊗ g_{(2,3)}^⊗ ) + f^{(3)}(g) g_1^⊗ ⊗ g_2^⊗ ⊗ g_3^⊗ (2.56)
  = f^{(1)}(g) g_{(1,2,3)}^⊗ + f^{(2)}(g)( g_{(1,2)}^⊗ ⊗ g_3^⊗ + K^{−1}_{(132)}(g_{(1,3)}^⊗ ⊗ g_2^⊗)
    + K^{−1}_{(231)}(g_{(2,3)}^⊗ ⊗ g_1^⊗) ) + f^{(3)}(g) g_1^⊗ ⊗ g_2^⊗ ⊗ g_3^⊗.

Case 2.4 (n = 4) Here r = 1:4.
1. If r = 1, B_{4,1}(x_{1:4}) = x_4, ℓ_{1:4} = (0,0,0,1), the only partition is K_{\{1\}} = {(1,2,3,4)} and the derivative is g_{(1,2,3,4)}^⊗.
2. If r = 2, B_{4,2}(x_{1:3}) = 4x_1x_3 + 3x_2^2, the types are either ℓ^1 = (1,0,1,0) or ℓ^2 = (0,2,0,0), with coefficients 4!/3! = 4 and 4!/(2!(2!)^2) = 3, respectively.
(a) The partitions with respect to ℓ^1 are K_{\{2\},1} = {(1,2,3),(4)}, K_{\{2\},2} = {(1,2,4),(3)}, K_{\{2\},3} = {(1,3,4),(2)}, and K_{\{2\},4} = {(1),(2,3,4)}; the corresponding derivatives are g_{(1,2,3)}^⊗ ⊗ g_4^⊗, g_{(1,2,4)}^⊗ ⊗ g_3^⊗, g_{(1,3,4)}^⊗ ⊗ g_2^⊗, g_1^⊗ ⊗ g_{(2,3,4)}^⊗. Two of the products need commutators: K^{−1}_{(1243)}(g_{(1,2,4)}^⊗ ⊗ g_3^⊗) and K^{−1}_{(1342)}(g_{(1,3,4)}^⊗ ⊗ g_2^⊗).
(b) The partitions with respect to ℓ^2 are K_{\{2\},1} = {(1,2),(3,4)}, K_{\{2\},2} = {(1,3),(2,4)}, K_{\{2\},3} = {(1,4),(2,3)}. The derivatives with commutators are g_{(1,2)}^⊗ ⊗ g_{(3,4)}^⊗, K^{−1}_{(1324)}(g_{(1,3)}^⊗ ⊗ g_{(2,4)}^⊗), and K^{−1}_{(1423)}(g_{(1,4)}^⊗ ⊗ g_{(2,3)}^⊗).
3. If r = 3, B_{4,3}(x_{1:2}) = 6x_1^2x_2, ℓ = (2,1,0,0); the partitions are K_{\{3\},1} = {(1),(2),(3,4)}, K_{\{3\},2} = {(1,2),(3),(4)}, K_{\{3\},3} = {(1,3),(2),(4)}, K_{\{3\},4} = {(1,4),(2),(3)}, K_{\{3\},5} = {(1),(2,3),(4)}, K_{\{3\},6} = {(1),(2,4),(3)}; the corresponding derivatives are g_1^⊗ ⊗ g_2^⊗ ⊗ g_{(3,4)}^⊗, g_{(1,2)}^⊗ ⊗ g_3^⊗ ⊗ g_4^⊗, K^{−1}_{(1324)}(g_{(1,3)}^⊗ ⊗ g_2^⊗ ⊗ g_4^⊗), K^{−1}_{(1423)}(g_{(1,4)}^⊗ ⊗ g_2^⊗ ⊗ g_3^⊗), g_1^⊗ ⊗ g_{(2,3)}^⊗ ⊗ g_4^⊗, K^{−1}_{(1243)}(g_1^⊗ ⊗ g_{(2,4)}^⊗ ⊗ g_3^⊗).
4. If r = 4 we again have four blocks: B_{4,4}(x_1) = x_1^4, ℓ = (4,0,0,0), K_{\{4\}} = {(1),(2),(3),(4)}, with the derivative g_1^⊗ ⊗ g_2^⊗ ⊗ g_3^⊗ ⊗ g_4^⊗.

Now, differentiating (2.56) with respect to x_4 and comparing it with the above specifications, we see the general pattern:

  D_{1:4}^⊗ f(g) = f^{(1)}(g) g_{(1,2,3,4)}^⊗ (2.57)
    + f^{(2)}(g)( g_{(1,2,3)}^⊗ ⊗ g_4^⊗ + g_1^⊗ ⊗ g_{(2,3,4)}^⊗ + K^{−1}_{(1243)}(g_{(1,2,4)}^⊗ ⊗ g_3^⊗)
      + K^{−1}_{(1342)}(g_{(1,3,4)}^⊗ ⊗ g_2^⊗) )
    + f^{(2)}(g)( g_{(1,2)}^⊗ ⊗ g_{(3,4)}^⊗ + K^{−1}_{(1324)}(g_{(1,3)}^⊗ ⊗ g_{(2,4)}^⊗)
      + K^{−1}_{(1423)}(g_{(1,4)}^⊗ ⊗ g_{(2,3)}^⊗) )
    + f^{(3)}(g)( g_1^⊗ ⊗ g_2^⊗ ⊗ g_{(3,4)}^⊗ + g_{(1,2)}^⊗ ⊗ g_3^⊗ ⊗ g_4^⊗
      + K^{−1}_{(1324)}(g_{(1,3)}^⊗ ⊗ g_2^⊗ ⊗ g_4^⊗) + g_1^⊗ ⊗ g_{(2,3)}^⊗ ⊗ g_4^⊗
      + K^{−1}_{(1423)}(g_{(1,4)}^⊗ ⊗ g_2^⊗ ⊗ g_3^⊗) + K^{−1}_{(1243)}(g_1^⊗ ⊗ g_{(2,4)}^⊗ ⊗ g_3^⊗) )
    + f^{(4)}(g)( g_1^⊗ ⊗ g_2^⊗ ⊗ g_3^⊗ ⊗ g_4^⊗ ).
 
All the terms are directly derived from (2.56), except one: K^{−1}_{(1342)}(g_{(1,3,4)}^⊗ ⊗ g_2^⊗), which comes from K^{−1}_{(132)}(g_{(1,3)}^⊗ ⊗ g_2^⊗):

  ( K^{−1}_{(132)}(g_{(1,3)}^⊗ ⊗ g_2^⊗) ) ⊗ ∂/∂x_4 = K^{−1}_{(1324)}( (g_{(1,3)}^⊗ ⊗ g_2^⊗) ⊗ ∂/∂x_4 )
    = K^{−1}_{(1324)}( g_{(1,3)}^⊗ ⊗ g_{(2,4)}^⊗ + K^{−1}_{(1243)}(g_{(1,3,4)}^⊗ ⊗ g_2^⊗) )
    = K^{−1}_{(1324)}(g_{(1,3)}^⊗ ⊗ g_{(2,4)}^⊗) + K^{−1}_{(1324)} K^{−1}_{(1243)}(g_{(1,3,4)}^⊗ ⊗ g_2^⊗);

evaluating K^{−1}_{(1324)} K^{−1}_{(1243)} = (K_{(1243)} K_{(1324)})^{−1} = K^{−1}_{(1342)}, which is as expected.
One can find the general case in the proof, Appendix 2.4.2, p. 99.
We consider some particular cases of the formula (2.51) which help to prove
some properties of cumulants; there can be several other uses of that formula as
well.

If all variables among x_{1:n} are the same x, then those tensor products which correspond to the same type are equal; for example

  D_x^{⊗3} f(g) = f^{(1)}(g) g_{x,3}^⊗ + f^{(2)}(g)( I_{d^3} + K^{−1}_{(132)} + K^{−1}_{(231)} )(g_{x,2}^⊗ ⊗ g_x^⊗) + f^{(3)}(g) g_x^{⊗3}.

We see that g_{x,2}^⊗ ⊗ g_x^⊗ corresponds to the type ℓ_{1:3} = (1,1,0), and g_x^{⊗3} to the type ℓ_{1:3} = (3,0,0). Now we assign ⊗_{ℓ_k≠0} (D_x^{⊗k} g)^{⊗ℓ_k} = ⊗_{ℓ_k≠0} g_{x,k}^{⊗ℓ_k} to the type ℓ_{1:n}, similarly to the incomplete Bell polynomials.
We restrict partitions K_{\{r\}} ∈ P_n to those with size r and type ℓ, and denote them by K_{\{r|ℓ\}}; then the general formula follows.
Corollary 2.6 Let x_1, x_2, …, x_n be equal, x_j = x; then we can collect the partitions K_{\{r\}} of the same type ℓ:

  D_x^{⊗n} f(g(x)) = ∑_{r=1}^{n} f^{(r)}(g) ∑_{∑ℓ_j=r, ∑jℓ_j=n} L^{−1}_{l_{r:1}} ⊗_j g_{x,j}^{⊗ℓ_j}, (2.58)

where

  L^{−1}_{l_{r:1}} = ∑_{K_{\{r|ℓ\}}∈P_n} K^{−1}_{p(K_{\{r|ℓ\}})},

and the summation is over all possible partitions K_{\{r|ℓ\}} ∈ P_n with size r and type ℓ; see Sect. 2.4.3, p. 100, for the general definition of the commutator L^{−1}_{l_{r:1}}.
lr:1 .
The left-hand side of expression (2.58) is n-symmetric, just because the operator D_x^{⊗n} is invariant under changes of the order in which D_x^⊗ is applied. The right-hand side should be symmetric as well. This implies that applying the commutators

  L^{−1}_{l_{r:1}} = ∑_{K_{\{r|ℓ\}}∈P_n} K^{−1}_{p(K_{\{r|ℓ\}})} (2.59)

to ⊗_j g_{x,j}^{⊗ℓ_j} provides an n-symmetric tensor. Recall Example 1.8, p. 15, which shows that the symmetrization of a tensor, say a^{⊗2} ⊗ b, which is a T-product of symmetric tensors a^{⊗2} and b, needs a smaller number of commutators than the general case: 3 instead of 6 there. This happens here as well, since ⊗_j g_{x,j}^{⊗ℓ_j} is a tensor product of symmetric (lower-order) tensors g_{x,j}^{⊗ℓ_j}. An instance is g_{x,3}^{⊗2}, say, which is a tensor product of the 3-symmetric tensor g_{x,3}^⊗ with itself. The sum of commutators (2.59) can be characterized by the type ℓ only, since r is the sum of the components of ℓ.
The following example shows the benefit of this corollary.
Example 2.9 Let f and g be as above and n = 3; then the type ℓ = ℓ_{1:3} = (1,1,0) (see (2.56)) identifies the sum of commutators (2.59). The type ℓ = ℓ_{1:3} = (1,1,0) implies the tensor product of a 2-symmetric tensor (ℓ_2 = 1) g_{x,2}^⊗ and a 1-symmetric tensor (ℓ_1 = 1) g_x^⊗. We shall drop the zeros of ℓ_{1:3} from the notation and introduce L^{−1}_{1_2,1_1} for the commutator with respect to (2.59), so

  D_x^{⊗3} f(g) = f^{(1)}(g) g_{x,3}^⊗ + f^{(2)}(g) L^{−1}_{1_2,1_1}(g_{x,2}^⊗ ⊗ g_x^⊗) + f^{(3)}(g) g_x^{⊗3},

where

  L^{−1}_{1_2,1_1} = I_{d^3} + K^{−1}_{(132)} + K^{−1}_{(231)}; (2.60)

see (2.56). Notice that the index 1_2 in L^{−1}_{1_2,1_1} corresponds to ℓ_2 = 1 and 1_1 to ℓ_1 = 1.
If we replace the inverse commutators by commutators we obtain

  L^{−1}_{1_2,1_1} = I_{d^3} + K_{(132)} + K_{(312)};

comparing this with Example 1.8, p. 15, we conclude that they are the same.
Example 2.10 If n = 4, then from (2.57) it follows that

  D_x^{⊗4} f(g) = f^{(1)}(g) g_{x,4}^⊗ + f^{(2)}(g) L^{−1}_{1_3,1_1}(g_{x,3}^⊗ ⊗ g_x^⊗) + f^{(2)}(g) L^{−1}_{2_2} g_{x,2}^{⊗2} (2.61)
    + f^{(3)}(g) L^{−1}_{1_2,2_1}(g_{x,2}^⊗ ⊗ g_x^{⊗2}) + f^{(4)}(g) g_x^{⊗4},

where the commutator L^{−1}_{1_3,1_1} corresponds to the type ℓ^1 = (1,0,1,0), when r = 2. Here the subject of symmetrization is a product of a 3-symmetric and a 1-symmetric tensor, i.e. g_{x,3}^⊗ ⊗ g_x^⊗:

  L^{−1}_{1_3,1_1} = I_{d^4} + K^{−1}_{(2431)} + K^{−1}_{(1243)} + K^{−1}_{(1342)}. (2.62)

Now, L^{−1}_{2_2} symmetrizes a product of a 2-symmetric tensor with itself, i.e. g_{x,2}^{⊗2}; the generating type is ℓ^2 = (0,2,0,0) (see above the case n = 4, r = 2), and the result is

  L^{−1}_{2_2} = I_{d^4} + K^{−1}_{(1324)} + K^{−1}_{(1423)}. (2.63)

Finally, L^{−1}_{1_2,2_1} is defined by the type ℓ = (2,1,0,0), where for j = 2, ℓ_2 = 1, and for j = 1, ℓ_1 = 2; the tensor product g_{x,2}^⊗ ⊗ g_x^{⊗2} includes a 2-symmetric tensor g_{x,2}^⊗ and a product of two 1-symmetric tensors g_x^{⊗2}:

  L^{−1}_{1_2,2_1} = I_{d^4} + K^{−1}_{(3412)} + K^{−1}_{(1324)} + K^{−1}_{(2314)} + K^{−1}_{(1423)} + K^{−1}_{(2413)}; (2.64)

see (2.57) above.



The general notation for an L^{−1} commutator which corresponds to a type ℓ = ℓ_{1:n} is the following: the index of L^{−1} includes the nonzero entries of ℓ, starting with the largest j (which is the index of the entry, the value being the value of ℓ_j), and continues with decreasing j (or, read from the smallest value, with increasing j). An instance is ℓ = (3, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0), n = 12, r = 6; then L^{−1}_{1_5,2_2,3_1} symmetrizes g_{x,5}^⊗ ⊗ g_{x,2}^{⊗2} ⊗ g_x^{⊗3} so that it becomes a 12-symmetric tensor. The number of commutators in the sum (2.59) for L^{−1}_{1_5,2_2,3_1} is given by the coefficient of the term x_1^3 x_2^2 x_5 in the incomplete Bell polynomial B_{12,6} (see (1.44), p. 34). The definition of moment commutators L^{−1} is given in Sect. 2.4.3 in more detail; the name of L^{−1} will become clear in Chap. 3.
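
Numerically, each commutator K_p^{−1} acting on a T-product of vectors is just a permutation of the axes of the corresponding multi-way array, and an L^{−1} commutator is a sum of a few such permutations. The numpy sketch below (our own illustration, not from the book) builds the T-product g_{x,2}^⊗ ⊗ g_x^⊗ from a symmetric matrix and a vector and checks that summing the three factor orderings combined by L^{−1}_{1_2,1_1} in (2.60) already yields a 3-symmetric tensor; matching each transpose to a particular K index is an assumption to be checked against Sect. 1.2.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
d = 3
A = rng.normal(size=(d, d)); A = A + A.T       # a 2-symmetric tensor (plays g_{x,2})
v = rng.normal(size=d)                         # a vector (plays g_x)

T = np.einsum('ij,k->ijk', A, v)               # the T-product g_{x,2} ⊗ g_x as a 3-way array

# the three factor orderings summed by L^{-1}_{1_2,1_1}
S = T + T.transpose(0, 2, 1) + T.transpose(1, 2, 0)

# S is 3-symmetric: invariant under every axis permutation
assert all(np.allclose(S, S.transpose(p)) for p in permutations(range(3)))
```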
The simplification of D_x^{⊗n} f(g(x)) is still possible using the symmetrizer S_{d1_n}. The nth-order T-derivative D_x^{⊗n} is symmetric; therefore,

  D_x^{⊗n} f(g(x)) = n! ∑_{r=1}^{n} f^{(r)}(g) ∑_{∑ℓ_j=r, ∑jℓ_j=n} ⊗_{j=1}^{n−r+1} (1/ℓ_j!)( g_{x,j}^⊗ / j! )^{⊗ℓ_j}

(in the sense of symmetry equivalence; see Definition 1.2, p. 16).


We have seen Faà di Bruno's formula for the scalar case, which of course follows from the general case.
Corollary 2.7 For simplicity suppose that all d_j = 1, i.e. x = [x_1, x_2, …, x_n]'; then the formula (2.51) simplifies to

  ∂^n f(g(x)) / (∂x_1 ∂x_2 ⋯ ∂x_n) = ∑_{r=1}^{n} f^{(r)}(g(x)) ∑_{K_{\{r\}}∈P_n} ∏_{b∈K_{\{r\}}} ∂^{|b|} g(x) / ∏_{j∈b} ∂x_j,

which we have already proved; see (2.6).
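
As a quick sanity check of the scalar formula, the n = 3 case can be verified symbolically. The sketch below (our own, with f = exp and an arbitrarily chosen inner function g) confirms that the partition sum reproduces the mixed third derivative:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
g = sp.sin(x1*x2) + x3*sp.exp(x1)       # an arbitrary smooth inner function (our choice)
h = sp.exp(g)                           # f = exp, so f^(r)(g) = exp(g) for every r

lhs = sp.diff(h, x1, x2, x3)

gd = lambda *v: sp.diff(g, *v)          # partial derivatives of g for the blocks
rhs = sp.exp(g)*(gd(x1, x2, x3)                                 # r = 1
                 + gd(x1, x2)*gd(x3) + gd(x1, x3)*gd(x2)        # r = 2
                 + gd(x2, x3)*gd(x1)
                 + gd(x1)*gd(x2)*gd(x3))                        # r = 3

assert sp.simplify((lhs - rhs).expand()) == 0
```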


Example 2.11 We can use the Bell polynomials B_5 (see (1.44), p. 34, for the definition and Sect. A.1, p. 351, for the list of the incomplete Bell polynomials). Starting from D_x^{⊗4} f(g) in (2.61) and differentiating once more, we obtain

  D_x^{⊗5} f(g) = f^{(1)}(g) g_{x,5}^⊗ + f^{(2)}(g)( L^{−1}_{1_4,1_1}(g_{x,4}^⊗ ⊗ g_x^⊗) + L^{−1}_{1_3,1_2}(g_{x,3}^⊗ ⊗ g_{x,2}^⊗) )
    + f^{(3)}(g)( L^{−1}_{2_2,1_1}(g_{x,2}^{⊗2} ⊗ g_x^⊗) + L^{−1}_{1_3,2_1}(g_{x,3}^⊗ ⊗ g_x^{⊗2}) )
    + f^{(4)}(g) L^{−1}_{1_2,3_1}(g_{x,2}^⊗ ⊗ g_x^{⊗3}) + f^{(5)}(g) g_x^{⊗5};

we apply the symmetrizer S_{d1_5} to the result and obtain the symmetry equivalent version

  D_x^{⊗5} f(g) = f^{(1)}(g) g_{x,5}^⊗ + f^{(2)}(g)( 5 g_{x,4}^⊗ ⊗ g_x^⊗ + 10 g_{x,3}^⊗ ⊗ g_{x,2}^⊗ )
    + f^{(3)}(g)( 15 g_{x,2}^{⊗2} ⊗ g_x^⊗ + 10 g_{x,3}^⊗ ⊗ g_x^{⊗2} )
    + 10 f^{(4)}(g) g_{x,2}^⊗ ⊗ g_x^{⊗3} + f^{(5)}(g) g_x^{⊗5}.

It is worth noting that the computational usage of the symmetrizer S_{d1_5} needs more power and memory than the usage of the L commutator.

2.4 Appendix

2.4.1 Proof of Faà di Bruno’s Lemma

Proof of Lemma 2.1. We prove the result (2.6) by induction. We have seen the
result is true for n = 2 and 3. We assume the result is true for n, and show the result
is valid for n + 1.
We differentiate both sides of (2.6) with respect to xn+1 . Let us consider the
derivative of a typical term on the right-hand side of (2.6). By applying the product
rule, we obtain

  ∂/∂x_{n+1} [ f^{(r)}(g) ∑_{K∈P_n, |K|=r} ∏_{b∈K} g_b ] = f^{(r+1)}(g) g_{n+1}^{(1)} ∑_{K_{\{r\}}∈P_n} ∏_{b∈K} g_b (2.65)
    + f^{(r)}(g) ∑_{K_{\{r\}}∈P_n} ∂/∂x_{n+1} ∏_{b∈K_{\{r\}}} g_b,   r = 1, 2, …, n,

where K{r} denotes partitions with size |K| = r. Let us compare the result (2.65)
(1) (2)
with (2.3) and (2.5). We see (2.5) is obtained by differentiating f (g) and f (g)
of (2.3) with respect to g and the second term (2.4) and the last term of (2.5) are
(1) (2)
the derivative of f (g) and f (g). As we have noted earlier these two terms are
exclusive expansions of the two terms from the order n = 2. From this it is obvious
that the term

gn+1 gb ,
b∈K{r}


is the exclusive extension of the product term ∏_{b∈K_{\{r\}}} g_b, and therefore corresponds to the partition K_{\{r+1\}} ∈ P_{n+1}, so |K_{\{r+1\}}| = r + 1, where the blocks of K_{\{r+1\}} are the blocks b ∈ K_{\{r\}} together with the singleton (n + 1). Similarly,

∂ 
gb ,
∂xn+1
b∈K{r}

contains all the inclusive extensions of the partition K_{\{r\}} ∈ P_n (again, let us compare this with the first term g_{1,2,3}, here K_1 = {(1,2,3)}, the third term g_1 g_{2,3}, here K_2 = {(1),(2,3)}, and the fourth term g_{1,3} g_2, here K_3 = {(2),(1,3)}). Now sum both sides of (2.65) and collect the coefficients of f^{(r)}(g) (r = 1, 2, …, n+1). The coefficients of each f^{(r)}(g) contain both inclusive and exclusive extensions of partitions with blocks of order r − 1, when 1 ≤ r ≤ n. When r = n + 1 the coefficient of f^{(n+1)}(g) is ∏_{j=1}^{n+1} g_j. Hence the formula (2.6) for n + 1, i.e.

  ∂^{n+1} h(x) / (∂x_1 ∂x_2 ⋯ ∂x_{n+1}) = ∑_{r=1}^{n+1} f^{(r)}(g) ∑_{K_{\{r\}}∈P_{n+1}} ∏_{b∈K} ∂^{|b|} g(x) / ∏_{j∈b} ∂x_j,

is true.

2.4.2 Proof of Faà di Bruno’s T-formula

Proof As we have seen in 2.1, p. 92, the formula (2.51) is true for n = 1 : 4.
Actually differentiating the function f (r) (g) in the sequel contributes to exclusive
extensions while differentiating the product of gb⊗ ’s gives inclusive extensions.
Hence the general formula can be proved by induction on n i.e. assuming the
formula (2.51) is true for n, prove it for n + 1. Differentiate


  D_{x_{1:n}}^{⊗n} f(g) = ∑_{r=1}^{n} f^{(r)}(g) ∑_{K_{\{r\}}∈P_n} K^{−1}_{p(K_{\{r\}})} ⊗_{b∈K_{\{r\}}} D_{x_b}^⊗ g(x),

by D_{x_{n+1}}^⊗; note that the partition K_{\{r\}} is of size |K_{\{r\}}| = r. Consider
  D_{x_{n+1}}^⊗ [ f^{(r)}(g) ∑_{K_{\{r\}}∈P_n} ⊗_{b∈K_{\{r\}}} g_b^⊗ ]
    = f^{(r+1)}(g) ∑_{K_{\{r\}}∈P_n} ( K^{−1}_{p(K_{\{r\}})} ⊗ I_{d_{n+1}} )( ⊗_{b∈K_{\{r\}}} g_b^⊗ ⊗ g_{n+1}^⊗ )
    + f^{(r)}(g) ∑_{K_{\{r\}}∈P_n} K^{−1}_{p(K_e)} D_{x_{n+1}}^⊗ ⊗_{b∈K_{\{r\}}} g_b^⊗;

the term K^{−1}_{p_n} ⊗ I_{d_{n+1}} = K^{−1}_{(p_n,n+1)} fits the formula; it corresponds to the partition K_e = {b ∈ K, (n+1)} ∈ P_{n+1}, i.e. K_e is the exclusive extension of K_{\{r\}} ∈ P_n, and so |K_e| = r + 1. Similarly, denoting the inclusive extension of the partition K_{\{r\}} by K_{incl},

  D_{x_{n+1}}^⊗ ⊗_{b∈K_{\{r\}}} g_b^⊗ = ∑_{j=1}^{r} K^{−1}_{p(K_{\{r\}})} K^{−1}_{((j+1):n)_S} ( ⊗_{1:j−1} g_{b_k}^⊗ ⊗ D_{x_{n+1}}^⊗ g_{b_j}^⊗ ⊗ ⊗_{j+1:r} g_{b_k}^⊗ )
    = ∑_{j=1}^{r} K^{−1}_{p(K_{incl})} ( ⊗_{1:j−1} g_{b_k}^⊗ ⊗ g_{(b_j,n+1)}^⊗ ⊗ ⊗_{j+1:r} g_{b_k}^⊗ ),

which gives all the inclusive extensions of the partition K_{\{r\}} ∈ P_n. Now if we collect all partitions K ∈ P_{n+1} corresponding to the term with f^{(r)}(g) in D_{x_{n+1}}^⊗ D_{x_{1:n}}^{⊗n} f(g), they are the union of the partitions which are the exclusive extensions of K_{\{r\}} ∈ P_n on the one hand, and the inclusive extensions of the partitions K_{\{r\}} ∈ P_n on the other hand. In both cases the result has r blocks (the inclusive extension does not increase the number of blocks in a partition). The union contains all partitions K ∈ P_{n+1} with |K| = r. Hence the formula (2.51) is true for n + 1.

2.4.3 Moment Commutators

The commutator matrix L_ℓ corresponds to a type ℓ = ℓ_{1:n}, such that the nonzero ℓ_j's of ℓ define the sum of commutator matrices as follows:

  L^{−1}_{l_{r:1}} = ∑_{K_{\{r|ℓ\}}∈P_n} K^{−1}_{p(K_{\{r|ℓ\}})},

where the index l_{r:1} is defined in the following way: if ℓ_j ≠ 0 then ℓ_j is written as l_j, where l denotes the actual value of ℓ_j; for instance, if ℓ_j = 3 then l_j = 3_j. The index is written either in decreasing (canonical) order l_{r:1} = (l_{j_r}, l_{j_{r−1}}, …, l_{j_1}) or in increasing order l_{1:r} of the j_k, j_k ≥ j_{k−1}. The summation is taken over all partitions K_{\{r|ℓ\}} of the set 1:n having type ℓ and size r. L^{−1}_{l_{1:r}} denotes the moment commutator when l_{1:r} is in increasing order.
The number of all partitions K_{\{r|ℓ\}} of an n-element set with type ℓ is

  N_ℓ = n! / ∏_{j=1}^{n} ℓ_j! (j!)^{ℓ_j}

(see (1.41), (1.42), p. 32), which also gives the number of commutators in the sum defining L^{−1}_{l_{r:1}}. Several particular cases are listed in Sect. A.2.1, p. 353.

2.5 Exercises

2.1 Let h (x) = f (g (x)) and f (x) = ln x. Use (2.6) and show that

  ∂³/∂x³ ln(g) = (1/g) g^{(3)} − (3/g²) g^{(1)} g^{(2)} + (2/g³) (g^{(1)})³.

2.2 Let h (x) = f (g (x)) and f (x) = exp x, show that


  ∂³/∂x³ exp(g) = exp(g)( g^{(3)} + 3 g^{(1)} g^{(2)} + (g^{(1)})³ ).

2.3 Let h(x) = f(g(x)), x ∈ R³, and f(x) = exp x; show that

  ∂³/∂x_1∂x_2∂x_3 exp(g) = exp(g) g^{(3)}_{1,2,3} + exp(g)( g^{(2)}_{1,3} g^{(1)}_2 + g^{(1)}_1 g^{(2)}_{2,3} + g^{(2)}_{1,2} g^{(1)}_3 )
    + exp(g) g^{(1)}_1 g^{(1)}_2 g^{(1)}_3.

2.4 Let h (x) = f (g (x)), use formula (2.13) and derive ∂ 4 h (x) /∂x1∂x23 . Compare
the result with Case A.1.
2.5 Use formula (2.13) for derivative ∂ 4 f (g (x1 , x2 )) /∂x12∂x22 , set n = (2, 2),
|n| = 4, K = 8, 0 < jk ≤ n, jk , k = 1 : K, and use the reading Table.

j1 \j2 0 1 2
0 (0, 1) (0, 2)
1 (1, 0) (1, 1) (1, 2)
2 (2, 0) (2, 1) (2, 2)

Derive ’s for r = 1 : 4, then the result

∂4
f (g) = f (1) (g) D2,2 g
∂x12 ∂x22
  2 
+f (2) (g) 2D2,1 gD0,1 g + 2D1,0 gD1,2 g + D2,0 gD0,2 g + 2 D1,1 g
  2  2 
+f (3) (g) D2,0 g D0,1 g + D0,2 g D1,0 g + 2D0,1 gD1,0 gD1,1 g
 2  2
+f (4) (g) D1,0 g D0,1 g .

2.6 Derive Jacobian matrix of f (x) = μ1 x1 + μ2 x2 , −1/2(σ11x12 + 2σ12 x1 x2



+σ22 x22 ) .

2.7 Show that

f1 ⊗ df2 ⊗ f3 = (f1 ⊗ (Dx f2 ) ⊗ f3 ) dx.

Hint: see Exercise 1.11.


2.8 Show that

vec (Dx f) = Km•d Dx⊗ f.


2.9 Calculate T −derivative of f (x) = μ1 x1 + μ2 x2 , −1/2(σ11x12 + 2σ12 x1 x2



+σ22 x22 ) .
2.10 Let a be a constant vector, then show that

Dx⊗ (a ⊗ x) = a ⊗ vecId ,

and

Dx⊗ (a ⊗ Bx) = a ⊗ vecB .

If G (x) is a matrix-valued function, then define Dx⊗ G = vec (Dx vecG). Let
G (x) = A ⊗ Bx and show that

Dx⊗ G (x) = vecA ⊗ vecB .

2.11 Show that


 
Dx⊗ x Ax = A + A x.

2.12 Let A be a square matrix, and show that


 
Dx⊗2 (Ax)⊗2 = K(3214) + Id 4 vecA⊗2 .

2.13 Let x ∈ R, f1 ∈ Rm1 , show that


 
∂   
vec Dy f1 (y) f2 = Dx⊗ f2 ⊗ Im1 Km2 •m1 Dy⊗ f1
∂x
    ⊗
= Im1 ⊗ Dx⊗ f2 Dy f1 .

Hint: apply (1.22).



2.14 Let f_1(y) = exp(y_1 − (1/2)y_2), and f_2(x) = [y_1, y_2]' = [μ_1x_1 + μ_2x_2, σ_{11}x_1² + 2σ_{12}x_1x_2 + σ_{22}x_2²]'; then show that
! "
μ1 − x1 − σ12 x2
Dx⊗ f = exp (y1 − 1/2y2) .
μ2 − x2 − σ12 x1


2.15 Show that if f_1(y) = [y, exp y]' and f_2(x) = μ_1x_1 + μ_2x_2 − (1/2)(σ_{11}x_1² + 2σ_{12}x_1x_2 + σ_{22}x_2²), then
⎡ ⎤
μ1 − x1 − σ12 x2
⎢ μ2 − x2 − σ12 x1 ⎥
Dx⊗ f = ⎢ ⎥
⎣(μ1 − x1 − σ12 x2 ) exp y ⎦ .
(μ2 − x2 − σ12 x1 ) exp y

2.16 Use (2.25) and show if m2 = 1, that

Dx⊗ h = Dy⊗ f ⊗ Dx⊗ g.

2.17 Show that (2.26) and (2.27) are equivalent.


2.18 Let V a d × d matrix, apply (2.32) to show either
  
Dx⊗ (Vx ⊗ vecV) = K−1
(12) d, d 2 , d vecV ⊗ vecV ,
S

or
  
Dx⊗ (Vx ⊗ vecV) = K−1 2
(132) d, d , d vecV ⊗ vecV .

2.19 Show that


  
Dx⊗ (y − Vx)⊗2 = − K−1
(132) (d) + K −1
(231) (d) vecV ⊗ (y − Vx) ,

and
  
 ⊗2
Dx⊗2 (y − Vx)⊗2 = K−1 −1
(1324) (d) + K(2314) (d) vecV .

2.20 Show that

Dx⊗ (f1 ⊗ f2 ⊗ f3 ) = Dx⊗ (f1 ⊗ (f2 ⊗ f3 ))


 
= K−1 ⊗
(2314) (m1:3 , d) f2 ⊗ f3 ⊗ Dx f1
 
+ K−1 ⊗ ⊗
(1324) (m1:3 , d) f1 ⊗ f3 ⊗ Dx f2 + f1 ⊗ f2 ⊗ Dx f3 .

2.21 Use Property 2.3 for deriving (2.34).


2.22 Let

f (x) = Q0 + Q1 x + Q2 x⊗2 + Q3 x⊗3 ,

a cubic polynomial, show


 

Dx⊗ f (x) = vecQ1 + (Q2 ⊗ Id ) K−1(213) + I d 3 (x ⊗ vecId )
  
+ (Q3 ⊗ Id ) K−1
(2314) + K −1
(1324) + Id4 x ⊗2
⊗ vecId ,

see Shopin [Sho10].


2.23 (Leibniz’s rule) Let f and g be scalar valued functions of x ∈ Rd , use formula
(2.42) and prove

k−1  
   k − 1 −1  
Dx⊗(k−1) ⊗ ⊗k
f Dx g = f Dx g + K(23) d, d k−j , d, d j
j S
j =1
 
⊗(k−j ) ⊗j
× Dx g ⊗ Dx f . (2.66)

d kj
2.24 Let x ∈ Rd , k = (k1 , k2 , . . . , kd ), k! = k1 !k2 ! . . . kd !, xk = j =1 xj , ∂xk =
∂x1k1 ∂x2k2 · · · ∂xdkd .
Show that the following two forms of Taylor series expansion of
a scalar valued function g (x) are equivalent:
 1  
g (x) = ak xk + o |x|2n0 ,
k!
0≤k1:d ,k1:d ≤n0

where
*
∂ k g (x) **
ak = *
∂xk *
x=0

and

 1 ⊗m  ⊗m   
n0
g (x) = x Dy g (y) |y=0 + o |x|2n0 . (2.67)
m!
m=0

2.25 Prove that


     
Dx⊗2 f g = f ⊗ Id 2 Dx⊗2 g + g ⊗ Id 2 Dx⊗2 f
  
+ Dx⊗ g ⊗ Id 2 (Im ⊗ vecId ⊗ Id ) Dx⊗ f
  
+ Dx⊗ f ⊗ Id 2 (Im ⊗ vecId ⊗ Id ) Dx⊗ g.

2.26 Show that


    
⊗(k−1) ⊗(−1)
Dx⊗2
1 ,x 2
x1⊗k ⊗ x2⊗ a ⊗k1 ⊗ a2
⊗
= k x1 ⊗ x2
 
⊗(k−1) ⊗(−1)
a1 ⊗ a2 (a 1 ⊗ a 2 ) .

2.27 Let h (x) = f (g (x)), and x = x1:4 = (x1 , x2 , x3 , x4 ) a list of vectors with
dimensions d1:4 = (d1 , d2 , d3 , d4 ), respectively. Derive the following derivatives:
1. If n = 1, then

Dx⊗1 f (g) = f (1) (g) Dx⊗1 g.

2. If n = 2, then
 
Dx⊗1:2 f (g) = f (1) (g) Dx⊗1:2 g + f (2) (g) Dx⊗1 g ⊗ Dx⊗2 g . (2.68)

3. If n = 3, then

Dx⊗3
1:3
f (g) = f (1) (g) Dx⊗31:3
g + f (2) (g)
   
Dx⊗1,2 g ⊗ Dx⊗3 g + K−1 ⊗ ⊗ ⊗ ⊗
(2,3)S Dx1,3 g ⊗ Dx2 g + Dx1 g ⊗ Dx2:3 g
 
+ f (3) (g) Dx⊗1 g ⊗ Dx⊗2 g ⊗ Dx⊗3 g .

4. If n = 4, then

Dx⊗4
1:4
f (g) = f (1) (g) Dx⊗4
1:3
g + f (2) (g) Dx⊗1,2 g ⊗ Dx⊗3,4 g

+K−1 ⊗ ⊗
(2,3)S Dx1,3 g ⊗ Dx2,4 g +
 
+f (2) K−1
(1,3,4,2) D ⊗3
x1,3,4 g ⊗ D ⊗
x2 g

+f (3) (g) Dx⊗1:2 g ⊗ Dx⊗3 g ⊗ Dx⊗4 g

+f (4) (g) Dx⊗1 g ⊗ Dx⊗2 g ⊗ Dx⊗3 g ⊗ Dx⊗4 g.



2.28 Let λ ∈ R3 , show that


 3
Dλ⊗6 λ λ = 48L−1
32 (vecId )
⊗3
,

see (A.2.1), p. 354 for commutator L−1


32 .

2.6 Bibliographic Notes

The use of Faà di Bruno’s lemma in statistics has been initiated by Lukacs [Luk55],
see also [Luk70]. See [Sav06] and further references there for general Faà di Bruno’s
formula when function f is a multivariate function, as also [Har06] for application
to cumulants. Faà di Bruno’s formula for multivariate functions in connection
with recursive multivariate derivatives is considered by Miatto [Mia19], and with
multivariate Bell polynomials by Schumann [Sch19].
MacRae [Mac74] appears to be the first to define a matrix derivative using
the tensor product and a derivative operator. Other relevant references for matrix
differentiation are Neudecker [Neu69], Henderson and Searle [HS81] and [KvR06].
The book [MN99] is basic in our treatment. We use notations and apply several
results from this book, and in particular from Chapter 9 of [MN99]. Neudecker
[Neu69] and MacRae [Mac74] deal with matrix differentiation using tensor product.
Henderson and Searle [HS79] considered the multivariate derivative of symmetric
matrices. Ma [Ma09] expresses the higher partial derivatives of composite functions
in terms of factor functions, see also by Noschese and Ricci [NR03].
Chapter 3
T-Moments and T-Cumulants

Abstract Probability distributions are characterized by their moments and cumu-


lants under some very broad assumptions. We argue that using the tensor products
of vectors leads to an intuitive and natural way to deal with higher-order moments
and cumulants for multivariate distributions, as will be shown in this chapter. We
use characteristic and cumulant generating functions to derive the basic theory of
T-moments and T -cumulants. We consider the joint moments and cumulants for
a collection of multiple random variables, keeping those having the same order
together in a vector. These vector-valued quantities have a tensor-product structure.
After listing the basic properties of both moments and cumulants, the multivariate
Faà di Bruno’s formula is applied to derive relationships between them, namely
cumulants are expressed in terms of moments and vice versa. We also provide
results concerning the cumulants of products in terms of products of cumulants,
conditional cumulants, and cumulants of the log-likelihood function among others.
The importance of Bell polynomials in practical applications of the formulae is
pointed out in several cases. In our treatment, both moments and cumulants are
strongly connected to the T -differential operator Dx⊗ , so that we follow the notations
inherited from calculus as well as the traditional Kendall–Stuart notations.

3.1 Multiple Moments

Moments of a vector random variable X ∈ Rn are usually defined via integrals with
respect to the joint distribution of X. We shall use an equivalent definition for the
moments of X based on its characteristic function φ (λ), which is also a common
tool in probability theory. The characteristic function of X is defined as
 
φ (λ) = E exp iλ X , (3.1)


where λ ∈ R^n is real, and it is always known to exist. Moreover, E|X_j|^{2n_0} < ∞, j ∈ {1, 2, …, n}, for some n_0 if and only if all the partial derivatives

  ∂^{|k|} φ(λ)/∂λ^k = ∂^{|k|} φ(λ) / ∏_{j=1}^{n} ∂λ_j^{k_j}

up to order k_j ≤ 2n_0 exist, are continuous at zero, and the Taylor expansion

  φ(λ) = ∑_{k_j ≤ 2n_0} (i^{∑k_j}/k!) μ_{X,k} λ^k + o(‖λ‖^{2n_0}), (3.2)

around zero is valid. Here k denotes the vector of non-negative integers (k_1, k_2, …, k_n) ∈ N^n, k! = ∏_{j=1}^{n} k_j! is the multiple factorial, X^k = ∏_{j=1}^{n} X_j^{k_j}, and ‖λ‖ = (∑ λ_i²)^{1/2}.
From now on we shall assume the existence of all necessary moments used in the
sequel. One can consider Eq. (3.2) as defining the moments, namely the coefficients
μX,k = EXk of λk in the series expansion of φ (λ) is the kth moment of X.
Since there is no assumption that all the components of X should be mutually distinct, the distinct values principle can be applied. A definition of the nth-order (n ≤ 2n_0) moment μ_{X,n} = EX^n of, say, a variable X can be given as follows. Put X = (X, X, …, X)' (n times), use (3.1), and obtain the characteristic function of X:

  φ(λ) = E exp( iX ∑_{j=1}^{n} λ_j ).

Now, the nth-order derivative of φ_1(λ) = E exp(iλX) exists and is continuous at zero if and only if the derivative ∂^n/∂λ^{1_n} φ(λ) = ∂^n φ(λ)/∏_{j=1}^{n} ∂λ_j exists and is continuous at zero. Hence EX^n = (−i)^n ∂^n/∂λ^{1_n} φ(λ)|_{λ=0}.
The distinct values principle implies that only the first-order partial derivatives of the characteristic function with respect to each variable are needed for obtaining higher-order moments. To show this, let us consider X^k with |k| = ∑k_j = r. Define a random variable Y ∈ R^r such that the first k_1 entries Y_{1,j}, j = 1:k_1, coincide with X_1 (denote them by Y_1), the next k_2 entries Y_{2,j} coincide with X_2 (denote them by Y_2), and so on, up to Y_n. The characteristic function of Y is

  φ_Y(λ) = E exp(iλ'Y) = E exp( i ∑_{m=1}^{n} λ_m' Y_m ),

where λ_m = λ_{m,1:k_m} is the appropriate slice of λ. Then replace Y_{m,j} by X_m, the corresponding entry of X:

  φ_Y(λ) = E exp( i ∑_{m=1}^{n} X_m ∑_{j=1}^{k_m} λ_{m,j} ).

It is clear that instead of the derivatives (−i∂)^r/∂λ^k φ_X(λ)|_{λ=0} = EX^k, we have only taken the first-order derivatives of φ_Y with respect to each variable λ_{m,j}.
Let 1_n denote a vector with all 1s in its coordinates, i.e. 1_n = [1, 1, …, 1]' with dimension n. The following definition gives all mixed higher-order moments as well.
Definition 3.1 The nth-order moment of a random variable X ∈ R^n is

  μ_{X,1_n} = EX^{1_n} = (−i∂)^n/∂λ^{1_n} φ_X(λ)|_{λ=0}, (3.3)

μ_{X,1_n} = μ_{(1:n)} for short.


The (n1 , n2 )th-order moment of the random variables X and Y , i.e.
μ(X,Y ),(n1 ,n2 ) = EXn1 Y n2 is a particular case of (3.3) when the first n1 components
of X = X1:(n1 +n2 ) are equal to X and the other n2 components are equal to Y .
The following example shows the generality of formula (3.3).
Example 3.1 Let [X, Y]' be a Gaussian random vector with EX = 0, EY = 0, Var X = σ_X², Var Y = σ_Y², and Cov(X, Y) = σ_{XY}. The characteristic function of [X, Y]' is

  φ_{X,Y}(u_1, u_2) = exp( −(1/2)(σ_X² u_1² + 2σ_{XY} u_1 u_2 + σ_Y² u_2²) ).

If we want to calculate EX²Y², say, then consider the characteristic function

  φ_{X,Y}(u_1, u_2) = φ_{X,Y}(λ_1 + λ_2, λ_3 + λ_4)

and take the derivative

  (−i∂)^4/∂λ_{1:4} φ_{X,Y}(λ_1 + λ_2, λ_3 + λ_4)|_{λ_{1:4}=0} = σ_X² σ_Y² + 2σ_{XY}²,

from which we obtain EX²Y² = σ_X² σ_Y² + 2σ_{XY}².
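
A quick Monte Carlo sanity check of this value (the covariance parameters and sample size below are our own choices for the illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
sx2, sy2, sxy = 2.0, 1.5, 0.7                        # assumed variances and covariance
cov = np.array([[sx2, sxy], [sxy, sy2]])
Z = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000)
X, Y = Z[:, 0], Z[:, 1]

print(np.mean(X**2 * Y**2))                          # Monte Carlo estimate of EX^2 Y^2
print(sx2*sy2 + 2*sxy**2)                            # formula from Example 3.1: 3.98
```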

Remark 3.1 Once again, definition (3.3) covers all higher-order moments μ_{X,k} = EX^k with order ∑k_j = r. The reason is that we can expand X into a vector of dimension r including all the entries with their multiplicities, and by the distinct values principle we plug the entries of X back into the expression.
It is worth recalling here some properties of the moments of products of random variables

  EX^{1_n} = E ∏_{k=1}^{n} X_k = μ_{(1:n)}.

Multilinear μ_{(1:n)} = μ_{(1,2,…,n)} is multilinear in the entries of X, i.e. we have

  E (aX_i + bX_j) ∏_{k=1, k≠i,j}^{n} X_k = a E ∏_{k=1, k≠j}^{n} X_k + b E ∏_{k=1, k≠i}^{n} X_k.

Symmetric μ_{(1:n)} is symmetric in the entries of X, i.e. for each permutation p ∈ P_n we have

  μ_{p(1:n)} = E ∏_{k=1}^{n} X_{p(k)} = EX^{1_n} = μ_{(1:n)}.

Independence Let the entries of X be independent; then EX^{1_n} = (EX)^{1_n}. Moreover, if X = (X_1, …, X_1, …, X_r, …, X_r)', where X_j is repeated k_j times, and X_1, …, X_r are independent, then

  μ_{X,k} = E ∏_{j=1}^{r} X_j^{k_j} = ∏_{j=1}^{r} EX_j^{k_j}. (3.4)
Note, the above properties are not true for mixed higher-order moments unless
we rewrite them using the distinct values principle.

3.2 Tensor Moments

In the previous section, we considered higher-order moments of various degrees for


random variables. Now we introduce the vector of moments of a given degree for
a multivariate X as a column vector. The moment with degree one is simple; it is
the vector of expected values of the entries μ = EX. The vector of expected values
with second degree will include all possible second-order products of the entries of
X, and so on for higher degrees.

Consider first an example that shall lead us to the higher-order moments of T-


products of vector variates.

  
Example 3.2 Suppose that the vector λ is partitioned into two parts λ = [λ_1', λ_2']' with not necessarily the same dimensions (d_1, d_2). The characteristic function of X = [X_1', X_2']' is written as a series expansion

  φ(λ_1, λ_2) = E exp(iX'λ) = E exp( i(X_1'λ_1 + X_2'λ_2) )
    = ∑_{k,m=0}^{∞} (i^{k+m}/(k!m!)) E (X_1'λ_1)^k (X_2'λ_2)^m
    = ∑_{k,m=0}^{∞} (i^{k+m}/(k!m!)) E (X_1'^{⊗k} λ_1^{⊗k})(X_2'^{⊗m} λ_2^{⊗m}).

We can apply the mixed product rule (see (1.3), p. 6) to tensor products, which yields

  ∑_{k,m=0}^{∞} (i^{k+m}/(k!m!)) E X_1'^{⊗k} λ_1^{⊗k} X_2'^{⊗m} λ_2^{⊗m} = ∑_{k,m=0}^{∞} (i^{k+m}/(k!m!)) E (X_1^{⊗k} ⊗ X_2^{⊗m})'(λ_1^{⊗k} ⊗ λ_2^{⊗m}).

Now it is straightforward to use the chain rule for the terms in the series expansion and formula (2.48) (see also Exercise 2.26) to obtain

  D_{λ_1,λ_2}^⊗ (X_1^{⊗k} ⊗ X_2^{⊗m})'(λ_1^{⊗k} ⊗ λ_2^{⊗m}) = D_{λ_2}^⊗ ( D_{λ_1}^⊗ (X_1^{⊗k} ⊗ X_2^{⊗m})'(λ_1^{⊗k} ⊗ λ_2^{⊗m}) )
    = k m (X_1'λ_1)^{k−1} (X_2'λ_2)^{m−1} (X_1 ⊗ X_2),

hence

  D_{λ_1,λ_2}^⊗ φ(λ_1, λ_2) = D_{λ_2}^⊗ ( D_{λ_1}^⊗ φ(λ_1, λ_2) )
    = i² ∑_{k,m=1}^{∞} (i^{k+m−2}/((k−1)!(m−1)!)) E (X_1'λ_1)^{k−1} (X_2'λ_2)^{m−1} (X_1 ⊗ X_2).

Setting λ_1, λ_2 = 0, we have

  D_{λ_1,λ_2}^⊗ φ(λ_1, λ_2)|_{λ_1,λ_2=0} = i² EX_1 ⊗ X_2.

Therefore the series expansion of φ up to second order is as follows:

  φ(λ_1, λ_2) = φ(0, 0) + ( D_{λ_1}^⊗ φ(λ_1, λ_2)|_{λ_1,λ_2=0} )'λ_1 + ( D_{λ_2}^⊗ φ(λ_1, λ_2)|_{λ_1,λ_2=0} )'λ_2
    + ( D_{λ_1,λ_2}^⊗ φ(λ_1, λ_2)|_{λ_1,λ_2=0} )'(λ_1 ⊗ λ_2)
    + (1/2)( D_{λ_1}^{⊗2} φ(λ_1, λ_2)|_{λ_1,λ_2=0} )'λ_1^{⊗2} + (1/2)( D_{λ_2}^{⊗2} φ(λ_1, λ_2)|_{λ_1,λ_2=0} )'λ_2^{⊗2} + o(|λ_1 ⊗ λ_2|)
  = 1 + iEX_1'λ_1 + iEX_2'λ_2 + i²(EX_1 ⊗ X_2)'(λ_1 ⊗ λ_2)
    + (i²/2) EX_1'^{⊗2} λ_1^{⊗2} + (i²/2) EX_2'^{⊗2} λ_2^{⊗2} + o(|λ_1 ⊗ λ_2|).

More generally, we have the Taylor series expansion of φ(λ_1, λ_2)

  φ(λ_1, λ_2) = ∑_{k,m=0}^{n} (i^{k+m}/(k!m!)) ( D_{λ_{1_k},λ_{2_{1_m}}}^⊗ φ(λ_1, λ_2)|_{λ_1,λ_2=0} )'(λ_1^{⊗k} ⊗ λ_2^{⊗m}) + o(‖λ_1 ⊗ λ_2‖^n).

We recall the notation λ_{1_k} and λ_{2_{1_m}}, which denotes repetitions of the vectors λ_1 and λ_2; for instance λ_{2_{1_3}} = λ_{2,2,2} = [λ_2', λ_2', λ_2']'.
If we now carry on with the characteristic function φ_X of the variables X = X_{1:d} = [X_1, X_2, …, X_d]', i.e.

  φ_X(λ) = E exp(iλ'X), λ ∈ R^d,

and use the series expansion of the exponential, we obtain

  φ_X(λ) = ∑_{k=0}^{∞} (i^k/k!) μ_{X,k}^{⊗'} λ^{⊗k}

(see (2.49), p. 90). Applying the operator D_λ^⊗ to φ_X(λ) we have

  D_λ^⊗ φ_X(λ) = iEX + ∑_{k=2}^{∞} (i^k/(k−1)!) E (λ'X)^{k−1} X,

which gives iEX at λ = 0, and in general, by Lemma 2.8, the kth-order T-derivative (−i)^k D_λ^{⊗k} φ_X at zero results in the T-moment μ_{X,k}^⊗ = EX^{⊗k} as follows:

  (−i)^k D_λ^{⊗k} φ_X(λ)|_{λ=0} = μ_{X,k}^⊗. (3.5)

In this sense μ_X = μ_{X,1}^⊗.
Definition 3.2 The kth-order (k ≥ 1) T-derivative (−i)^k D_λ^{⊗k} φ_X(λ)|_{λ=0} of the characteristic function φ_X at zero will be called the kth-order T-moment μ_{X,k}^⊗ = EX^{⊗k} of a multiple random variable X. Actually, μ_{X,k}^⊗ is the expected value of the T-product X^{⊗k} of X, where the expectation is taken element-wise.
Observe that the T-moment μ_{X,k}^⊗, with dimension d^k, involves all possible kth-order moments of the entries of X, some of them with multiplicities. The T-moment μ_{X,k}^⊗ is k-symmetric, μ_{X,k}^⊗ ∈ S_{d,k}, since EX^{⊗k} is invariant under permutation of the terms in the T-product X^{⊗k}, see Sect. 1.3.2, p. 16. The distinct entries of μ_{X,k}^⊗ can be obtained using the elimination matrix Q_{d,k}^+, see (1.32), p. 21. The distinct entries are denoted by μ_{X,k}^{⊗Ð} = Q_{d,k}^+ μ_{X,k}^⊗; the dimension of μ_{X,k}^{⊗Ð} is the number of distinct values ŋ_{d,q} (see (1.30), p. 15).
Example 3.3 (Multivariate Normal Distribution) Take X = [X_1, X_2, …, X_d]'. Let us assume that X has a multivariate normal distribution with mean EX = μ and variance–covariance matrix V = [σ_{j,k}]. It is well known that the characteristic function φ_X(λ) is

  φ_X(λ_{1:d}) = exp( i ∑_{j=1}^{d} λ_j μ_j − (1/2) ∑_{j=1}^{d} ∑_{k=1}^{d} λ_j λ_k σ_{j,k} )
    = 1 + ∑_{r=1}^{∞} (1/r!)( i ∑_{j=1}^{d} λ_j μ_j − (1/2) ∑_{j=1}^{d} ∑_{k=1}^{d} λ_j λ_k σ_{j,k} )^r.

We can easily show that

  (1/i) ∂φ_X(λ_1, λ_2, …, λ_d)/∂λ_j |_{λ=0} = EX_j = μ_j

and

  (1/i²) ∂²φ_X(λ_1, λ_2, …, λ_d)/∂λ_j∂λ_k |_{λ=0} = EX_j X_k = σ_{j,k} + μ_j μ_k.

Now let us use T-moments for deriving the same quantities. Rewrite the characteristic function in terms of vectors and the variance–covariance matrix:

  φ_X(λ) = exp( iλ'μ − (1/2)λ'Vλ ).

Now, applying the chain rule (2.24), p. 75, for the T-derivative, we obtain the first-order derivative

  D_λ^⊗ φ_X(λ) = exp( iλ'μ − (1/2)λ'Vλ ) D_λ^⊗ ( iλ'μ − (1/2)λ'Vλ ) = φ_X(λ)(iμ − Vλ),

and then we get the second-order derivative as follows:

  D_λ^{⊗2} φ_X(λ) = φ_X(λ)(iμ − Vλ)^{⊗2} − φ_X(λ) vecV, (3.6)

hence we have the expected value of X

  −i D_λ^⊗ φ_X(λ)|_{λ=0} = EX = μ,

and the second-order moment

  −D_λ^{⊗2} φ_X(λ)|_{λ=0} = μ_{X,2}^⊗ = μ^{⊗2} + vecV,

accordingly. We see the usual first and second moments in a vector form of the multivariate normal distribution.
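
A short numpy simulation illustrating μ_{X,2}^⊗ = μ^{⊗2} + vecV; the mean, covariance matrix, and sample size are arbitrary choices made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0, 0.5])                          # assumed mean
A = rng.normal(size=(3, 3)); V = A @ A.T                 # assumed covariance matrix
X = rng.multivariate_normal(mu, V, size=500_000)

emp = (np.einsum('ni,nj->ij', X, X) / len(X)).ravel()    # Monte Carlo E[X ⊗ X]
thy = np.kron(mu, mu) + V.ravel()                        # μ^{⊗2} + vecV (V is symmetric,
                                                         # so the vec ordering is immaterial)
print(np.max(np.abs(emp - thy)))                         # small for large samples
```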
Next, we calculate the fourth-order central T-moment.
Example 3.4 (Multivariate Normal Distribution, Cont. 1) Let Y = X − μ be the centralized version of X. Now the characteristic function has the form

  φ_Y(λ) = exp( −(1/2)λ'Vλ ).

Consider the third-order central moment; namely, differentiate (3.6), ignoring μ. First we take the T-derivative of (Vλ)^{⊗2}:

  D_λ^⊗ (Vλ)^{⊗2} = Vλ ⊗ vecV + K^{−1}_{(132)}(d)(vecV ⊗ Vλ);

then we obtain the third-order derivative

  D_λ^{⊗3} φ_Y(λ) = −φ_Y(λ) V^{⊗3}λ^{⊗3} + φ_Y(λ)( K^{−1}_{(231)}(d) + K^{−1}_{(132)}(d) + I_{d^3} )(vecV ⊗ Vλ).

Now D_λ^{⊗3} φ_Y(λ)|_{λ=0} = 0, and hence μ_{Y,3}^⊗ = 0. Differentiate D_λ^{⊗3} φ_Y(λ) once more, term by term. We can ignore the first term, since it will not contribute to the fourth-order moment, as its derivative at zero is zero. The derivative of the next term is

  ( vecV ⊗ (Vλ) ) ⊗ ∂/∂λ = vec^{⊗2}V,

multiplied by a constant, where we have used that V is symmetric, see Exercise 1.19. Hence the fourth-order central moment is

  D_λ^{⊗4} φ_Y(λ)|_{λ=0} = L^{−1}_{2_2} vec^{⊗2}V,

where L^{−1}_{2_2} is a commutator matrix (see (A.3), p. 353, and Exercise 1.19). Therefore

  μ_{Y,4}^⊗ = L^{−1}_{2_2} vec^{⊗2}V.

Compare this with the result of Exercise 2.27. As before, the expected value μ_{Y,4}^⊗ = EY^{⊗4} is 4-symmetric; therefore we can apply the symmetrizer:

  μ_{Y,4}^⊗ = S_{d1_4} L^{−1}_{2_2}(vecV)^{⊗2} = 3 S_{d1_4} vec^{⊗2}V.

 
Now suppose that the vector X is partitioned into two parts X = [X_1', X_2']' with dimensions (d_1, d_2), which are not necessarily equal. The characteristic function of X can be written in terms of the partition, see Example 3.2,

  φ_X(λ) = E exp(iλ'X) = E exp( i(λ_1'X_1 + λ_2'X_2) )
    = ∑_{k,ℓ=0}^{∞} (i^{k+ℓ}/(k!ℓ!)) E (X_1^{⊗k} ⊗ X_2^{⊗ℓ})'(λ_1^{⊗k} ⊗ λ_2^{⊗ℓ}) = φ_{X_1,X_2}(λ_1, λ_2),

where λ is partitioned with respect to X into two parts λ = [λ_1', λ_2']'; see Example 3.2 for details. Now the operator D_{λ_1,λ_2}^⊗ = D_{λ_2}^⊗ D_{λ_1}^⊗ results in

  (−i)² D_{λ_1,λ_2}^⊗ φ_{X_1,X_2}(λ_1, λ_2)|_{λ_j=0} = EX_1 ⊗ X_2 = μ_{X_1,X_2}^⊗.

It is clear that if we are given a list of variates X_1, X_2, …, X_n, possibly with different dimensions d_{1:n} = [d_1, d_2, …, d_n], then we can build a vector X_{1:n} = [X_1', X_2', …, X_n']' and consider the joint characteristic function φ_X and the expectation

  EX_1 ⊗ X_2 ⊗ ⋯ ⊗ X_n = E ⊗_{j=1:n} X_j = μ_{X_{1:n}}^⊗ = μ_{O_n}^⊗,

where O_n = (1:n) is the coarsest partition of the set 1:n, cf. Sect. 1.4.5, p. 37; in this way μ_{O_n}^⊗ = μ_{1:n}^⊗.
The following definition is a generalization of the scalar-valued case:
Definition 3.3 The T-moment of a list of random vector variates X_1, X_2, …, X_n is defined by their joint characteristic function as

  E ⊗_{j=1:n} X_j = (−i)^n D_{λ_1,λ_2,…,λ_n}^⊗ φ_{X_1,X_2,…,X_n}(λ_1, λ_2, …, λ_n)|_{λ_{1:n}=0}.

We shall also use the following short notation:

  E ⊗_{j=1:n} X_j = μ_{X_{1:n}}^⊗ = (−i)^n D_{λ_{1:n}}^⊗ φ_{X_{1:n}}(λ_{1:n})|_{λ_{1:n}=0}.

The previous definition of the higher-order T-moment of a vector variate X again


can be considered as a particular case of the T-moment of a list since there is
no restriction that members of the list should be different, so the distinct values
principle can be applied here as well.
Observe the difference: μ⊗ ⊗
X1:n = μOn is a vector with dimension d1:n =
1n
n
j =1 dj which includes the expected value of products of all the possible entries
of vector variates X1:n while μ1:n is a list of expected values of Xj . We recall
that d1:n = [d1 , d2 , . . . , dn ] are the dimensions of the list of multiple variates
X1:n = [X1 , X2 , . . . , Xn ].
⊗
Definition 3.4 Let b be a block of the set 1:n; then μ_b^⊗ = μ_{X,b}^⊗ = E ⊗_{j∈b} X_j.
For instance, if b = (3,5), then μ_{3,5}^⊗ = EX_3 ⊗ X_5. At the same time μ_{3,5} is the list [EX_3, EX_5]. If X_3 and X_5 are independent, then μ_{3,5}^⊗ = EX_3 ⊗ EX_5, naturally.
Consider the following example to see the usefulness of the distinct values
principle.
Example 3.5 (Multivariate Normal Distribution, Cont. 2) We derive the fourth-order central moment μ_{Y,4}^⊗ using the distinct values principle. For the fourth-order moment, we pretend to have four variables and change the characteristic function

  φ_Y(λ) = exp( −(1/2)λ'Vλ )

for

  φ(λ_{1:4}) = φ_Y(λ_1 + λ_2 + λ_3 + λ_4),

and consider the derivatives

  D_{λ_1}^⊗ φ(λ_{1:4}) = −φ(λ_{1:4}) V ∑_j λ_j,
  D_{λ_{1:2}}^⊗ φ(λ_{1:4}) = φ(λ_{1:4})( V ∑_j λ_j ) ⊗ ( V ∑_j λ_j ) − φ(λ_{1:4}) vecV,
  D_{λ_{1:3}}^⊗ φ(λ_{1:4}) = −φ(λ_{1:4})( V ∑_j λ_j )^{⊗3} + φ(λ_{1:4})( V ∑_j λ_j ) ⊗ vecV
    + φ(λ_{1:4}) K^{−1}_{(123)}( vecV ⊗ V ∑_j λ_j ) + φ(λ_{1:4}) vecV ⊗ ( V ∑_j λ_j ).

Now we ignore the zero terms and obtain

  D_{λ_{1:4}}^⊗ φ(λ_{1:4})|_{λ_{1:4}=0} = K^{−1}_{(1423)}(vecV)^{⊗2} + K^{−1}_{(1324)}(vecV)^{⊗2} + (vecV)^{⊗2} = L^{−1}_{2_2}(vecV)^{⊗2},

where

  L^{−1}_{2_2} = I_{d^4} + K^{−1}_{(1324)} + K^{−1}_{(1423)}.

Note that the commutator matrix L^{−1}_{2_2} and some more commutator matrices are collected in the Appendix (see (A.3), p. 353). Hence we have

  μ_{Y,4}^⊗ = L^{−1}_{2_2}(vecV)^{⊗2}.

As before, the expected value EY^{⊗4} is symmetric; therefore we can apply the symmetrizer:

  μ_{Y,4}^⊗ = S_{d1_4} L^{−1}_{2_2}(vecV)^{⊗2} = 3 S_{d1_4}(vecV)^{⊗2}.
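
In coordinates, the entries of μ_{Y,4}^⊗ are E Y_i Y_j Y_k Y_l = V_{ij}V_{kl} + V_{ik}V_{jl} + V_{il}V_{jk}, the three pairings being exactly what L^{−1}_{2_2}(vecV)^{⊗2} collects. A Monte Carlo sanity check (covariance matrix and sample size are our own choices):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(2, 2)); V = A @ A.T                   # assumed covariance, d = 2
Y = rng.multivariate_normal(np.zeros(2), V, size=2_000_000)

emp = np.einsum('ni,nj,nk,nl->ijkl', Y, Y, Y, Y) / len(Y)  # Monte Carlo E[Y^{⊗4}]
thy = (np.einsum('ij,kl->ijkl', V, V)                      # the three pairings behind
       + np.einsum('ik,jl->ijkl', V, V)                    # L^{-1}_{2_2}(vecV)^{⊗2}
       + np.einsum('il,jk->ijkl', V, V))
print(np.max(np.abs(emp - thy)))                           # shrinks as the sample grows
```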

The properties of the T-moments of a list are similar to those of the scalar-valued
case.
Multilinear Let X = a ⊗ X_1 + b ⊗ X_2, where a and b are arbitrary constant vectors with dimensions such that the summation is valid. Then the expected value of X = a ⊗ X_1 + b ⊗ X_2 is

  μ_X = a ⊗ μ_1 + b ⊗ μ_2,

as expected.
The case X = X_1 ⊗ (a ⊗ X_2 + b ⊗ X_3) ⊗ X_4, for example, follows from the multilinear property of T-products:

  μ_X = K^{−1}_{(2134)}(a ⊗ μ_{1,2,4}^⊗) + K^{−1}_{(2134)}(b ⊗ μ_{1,3,4}^⊗),

and indeed

  EX = EX_1 ⊗ (a ⊗ X_2 + b ⊗ X_3) ⊗ X_4 = EX_1 ⊗ a ⊗ X_2 ⊗ X_4 + EX_1 ⊗ b ⊗ X_3 ⊗ X_4
     = K^{−1}_{(2134)}(a ⊗ μ_{1,2,4}^⊗) + K^{−1}_{(2134)}(b ⊗ μ_{1,3,4}^⊗).

Symmetry The order of the terms in the T-product and in the T-derivative is important; the T-moment of the variables X_1, X_2, …, X_n is not symmetric. Let p ∈ P_n be a permutation of 1:n; then E ⊗_{j=1:n} X_{p(j)} = K_{p(1:n)} E ⊗_{j=1:n} X_j, or μ_{p(1:n)}^⊗ = K_p μ_{1:n}^⊗ for short. If the components of the list X_{1:n} are all the same X, then μ_{X,n}^⊗ = EX^{⊗n} is n-symmetric; therefore μ_{X,n}^⊗ = S_{d1_n} μ_{X,n}^⊗, where S_{d1_n} denotes the symmetrizer matrix.

3.3 Cumulants for Multiple Variables

Cumulants, also called the semi-invariants, are very important characteristics of


distributions. We assume the existence of the necessary moments.

3.3.1 Definition of Cumulants

The way moments are defined (see Definition 3.3) will help us understand cumulants more easily. Take a random vector X = X_{1:n}, denote the logarithm of the characteristic function, ln φ_X(λ), by ψ_X(λ), and call it the cumulant generating function, or cumulant function for short. Let us retain the assumptions of the previous section and now consider the Taylor series expansion around zero of the cumulant function ψ_X(λ) of X, i.e.

  ψ_X(λ) = ∑_{k_j ≤ n} (i^{∑k_j}/k!) c(X, k) λ^k + o(‖λ‖^n), (3.7)

where ‖λ‖ = (∑ λ_i²)^{1/2}. The cumulant Cum(X_1 1_{k_1}, …, X_n 1_{k_n}) = Cum(X_1, …, X_1, …, X_n, …, X_n), where X_j is repeated k_j times, with order |k| = ∑k_j, is defined as the coefficient c(X, k) of λ^k via Eq. (3.7). In general, for the nth-order cumulant of a single variable X, the convention

  Cum_n(X) = Cum(X, X, …, X) = κ_{X,…,X} = κ_{X,n}

is adopted. To obtain the mixed higher-order cumulants Cum(X_1 1_{k_1}, …, X_n 1_{k_n}) = κ_{X,k} (where k = k_{1:n}) by higher-order derivatives of ψ_X(λ) in (3.7), it is enough to take the first-order derivatives of a cumulant function with respect to a variable Y = [X_1, …, X_1, …, X_r, …, X_r]', with X_j repeated k_j times, acting as if all the components of Y were different. The Taylor series expansion of the logarithm of the characteristic function with ∑k_j distinct variables provides the cumulant Cum(X_1 1_{k_1}, …, X_n 1_{k_n}), since the entries of Y are not assumed to be different; see the similar argument for moments on page 109. In other words, the distinct values principle can be applied again.
Example 3.6 The Gamma distribution Γ(β, α) is given either by the density

  f_{(β,α)}(x) = x^{β−1} / (α^β Γ(β)) exp(−x/α), if x > 0, and zero otherwise,

or by the characteristic function

  φ_X(λ) = (1 − αiλ)^{−β},

and the nth-order moment is

  μ_{X,n} = α^n Γ(β+n)/Γ(β).

The nth-order cumulant Cum(X, X, …, X) = κ_{X,n} of X can be obtained either as the nth derivative of ψ_X(λ) at zero or, by the distinct values principle, as the partial derivative ∂^n/∂λ^{1_n}_{1:n} of the cumulant function

  ψ_{X_{1:n}}(λ_{1:n}) = −β ln( 1 − iα ∑λ_{1:n} ),

at zero, both giving the same nth-order cumulant

  κ_{X,n} = α^n (n−1)! β,

where 0! = 1 as usual.
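
One can cross-check κ_{X,n} = α^n (n−1)! β against the standardized moments reported by scipy; the relations κ_3 = (skewness)·κ_2^{3/2} and κ_4 = (excess kurtosis)·κ_2^2 used below are standard, and the parameters α, β are arbitrary choices for the illustration.

```python
from math import factorial
from scipy import stats

alpha, beta = 1.7, 3.2                                    # assumed scale and shape
mean, var, skew, exkurt = stats.gamma.stats(a=beta, scale=alpha, moments='mvsk')

kappa = [alpha**n * factorial(n - 1) * beta for n in (1, 2, 3, 4)]
print(kappa)
print([float(mean), float(var),
       float(skew) * float(var)**1.5,                     # κ_3 from skewness
       float(exkurt) * float(var)**2])                    # κ_4 from excess kurtosis
```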

Again, the entries of X are not assumed to be different; therefore the following definition is correct, and it also covers the case when some of the entries of X are equal. We define the cumulant for different variables only.
Definition 3.5 The nth-order cumulant of the variates X_{1:n} = X is

  Cum_n(X) = (−i∂)^n/∂λ^{1_n} ψ_X(λ)|_{λ=0}. (3.8)

We shall also use the short notations Cum_n(X) = κ_X = κ_{(1:n)} = Cum(X_{1:n}).
Notice that the index (1:n) is considered as a partition, actually O_n = (1:n), cf. Sect. 1.4.5, p. 37. Similarly to the multiple moments, multi-indexing will be used for higher-order cumulants. We also note that the second-order cumulant is called the variance, while the third order is the skewness and the fourth order is the kurtosis when the variate is standardized (see Sect. 6.1, p. 313, for more details). One of the simplest examples is the cumulants of the Gaussian distribution.
Example 3.7 The cumulant function of a Gaussian random vector X_{[1,2]} = [X_1, X_2]' with EX_j = μ_j, Cov(X_j, X_k) = σ_{jk} is

  ψ_{X_{1:2}}(u_1, u_2) = i(μ_1 u_1 + μ_2 u_2) − (1/2)(σ_{11} u_1² + 2σ_{12} u_1 u_2 + σ_{22} u_2²).

The first-order cumulants are the first-order derivatives at zero, κ_{X_j} = μ_j; similarly, the second-order ones are κ_{(j,k)} = σ_{jk}, and all higher-order cumulants are zero.
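
Empirically, the vanishing of the higher-order Gaussian cumulants can be seen through unbiased k-statistics, which estimate κ_1, …, κ_4 from a sample; the sample parameters below are arbitrary choices for the illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=1.0, scale=2.0, size=200_000)     # a Gaussian sample (assumed parameters)

# unbiased k-statistics estimate the cumulants κ_1, ..., κ_4
print([stats.kstat(x, n) for n in (1, 2, 3, 4)])
# ≈ [1.0, 4.0, 0.0, 0.0]: only the first two cumulants are nonzero for a Gaussian
```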

3.3.2 Definition of T-cumulants

Now consider a list of multiple random variables X_{1:n} = (X_1, X_2, ..., X_n), as a series of vectors with dimensions d_{1:n} = (d_1, d_2, ..., d_n). The corresponding characteristic function is the characteristic function of all variables involved,

    φ_{X_{1:n}}(λ_{1:n}) = φ_{vec X_{1:n}}(vec λ_{1:n}) = E exp(i (vec λ_{1:n})^⊤ vec X_{1:n}),

where λ_{1:n} = (λ_1, λ_2, ..., λ_n) is the corresponding list of vectors with the same dimensions d_{1:n} = (d_1, d_2, ..., d_n) as X_{1:n}. The logarithm of the characteristic function φ_{X_{1:n}}(λ_{1:n}), the cumulant function, is denoted by ψ_{X_{1:n}}(λ_{1:n}).
The first-order derivative of the cumulant function ψX1:n (λ1:n ), at zero, with
respect to each component of λ1:n will be defined as the cumulant of X1:n . More
precisely we recall that


    D_{λ_j} φ = φ ∂/∂λ_j^⊤ = φ [∂/∂λ_{1,j}, ∂/∂λ_{2,j}, ..., ∂/∂λ_{d_j,j}]

is the Jacobian of φ with respect to the vector λ_j; the column vector of the transpose of the Jacobian is the T-derivative

    D^⊗_{λ_j} φ = vec(φ ∂/∂λ_j^⊤)^⊤ = φ ⊗ ∂/∂λ_j,

and D^⊗_{λ_{1:n}} is the column vector of the partial differential operator of order n with respect to each variable λ_j only once, so that

    D^⊗_{λ_{1:n}} φ = D^⊗_{λ_n} (D^⊗_{λ_{1:(n−1)}} φ),

(see Sect. 2.2, p. 70). The dimension of D^⊗_{λ_{1:n}} is d_{1:n}^{1_n} = ∏_{j=1}^{n} d_j, where 1_n denotes a vector with all 1s in its coordinates, i.e. 1_n = [1, 1, ..., 1] with dimension n. In this way, we keep all the derivatives of the same order in a vector. Now the definition
of the cumulant of a list of vectors X1:n is the following.
Definition 3.6 The T-cumulant of a collection of random vectors X1:n is
    Cum_n(X_{1:n}) = (−i)^n D^⊗_{λ_{1:n}} ψ_{X_{1:n}}(λ_{1:n}) |_{λ_{1:n} = 0},

or Cum_n(X_{1:n}) = κ^⊗_{X_{1:n}}(d) = κ^⊗_{1:n}(d) = κ^⊗_{O_n}(d) for short, where O_n denotes the
“partition” (1 : n).
We shall use the notations Cum_n(X_{1:n}), κ^⊗_{X_{1:n}}(d), κ^⊗_{1:n}(d), and κ^⊗_{O_n}(d) in the sequel. (Notice the multi-index notation κ^⊗_{1:n} here as well.) The cumulant κ^⊗_{1:n}(d) is a vector of dimension d_{1:n}^{1_n} = ∏_j d_j having all possible cumulants of the entries of the vectors X_1, X_2, ..., X_n in the order defined by the T-product. In some cases, when it is not misleading, we shall omit the dimensions: κ^⊗_{1:n}(d) = κ^⊗_{1:n}. Generally, notations for cumulants will be similar to those of moments, cf. Definition 3.4.
Definition 3.7 Let b be a block of the set 1 : n; then κ^⊗_b = κ^⊗_{X,b} = Cum(X_j, j ∈ b).
We emphasize that blocks b are treated as multi-index when they are the index
either of moments or cumulants. The case when some components of multivariates
X1:n are equal is considered as a special case.
Remark 3.2 If all components of X_{1:n} are equal to X, say, then we shall use the notation Cum_n(X) = κ^⊗_{X,n}(d), where d is the common dimension. In that case the vector κ^⊗_{X,n}(d) contains d^n entries, similar to the T-moments, because the T-derivative D^{⊗n}_λ behaves this way. By the same argument the T-cumulant is n-symmetric, κ^⊗_{X,n}(d) ∈ S_{d,n}, since D^{⊗n}_λ is invariant under permutations of its terms in the T-product D^{⊗n}_λ, which implies that κ^⊗_{X,n}(d) = S_{d1_n} κ^⊗_{X,n}(d), where S_{d1_n} is the symmetrizer. Since we are allowed to change the order of the derivatives among the d^n entries of D^{⊗n}_λ, there are several equalities between the entries of κ^⊗_{X,n}(d). The distinct values of κ^⊗_{X,n} can be obtained with the linear transformation κ^{⊗Ð}_{X,n} = Q^+_{d,n} κ^⊗_{X,n}, where Q^+_{d,n} is the elimination matrix. The dimension of the vector κ^{⊗Ð}_{X,n} of distinct values is given explicitly and denoted by ŋ_{d,n} (see Sect. 1.3.2, p. 16).

Example 3.8 The cumulant function of a Gaussian random vector X_{1,2} = [X_1, X_2] is

    ψ_{X_{1,2}}(u_1, u_2) = i(μ_1^⊤ u_1 + μ_2^⊤ u_2) − (1/2) [u_1; u_2]^⊤ C_{X_1,X_2} [u_1; u_2],

where the covariance matrix C_{X_1,X_2} is partitioned as

    C_{X_1,X_2} = [ C_{1,1}  C_{1,2}
                    C_{2,1}  C_{2,2} ],

and we can express the quadratic form as

    [u_1; u_2]^⊤ C_{X_1,X_2} [u_1; u_2] = u_1^⊤ C_{1,1} u_1 + u_1^⊤ C_{1,2} u_2 + u_2^⊤ C_{2,1} u_1 + u_2^⊤ C_{2,2} u_2
                                        = u_1^⊤ C_{1,1} u_1 + 2 u_1^⊤ C_{1,2} u_2 + u_2^⊤ C_{2,2} u_2.
 
Now the first-order cumulants of the components are Cum_1(X_j) = μ_j, and it is clear that any T-cumulant of order higher than 2 is zero. We show that the second-order T-cumulants are the vectors of the transposes of the covariance matrices, i.e.

    κ^⊗_{j,k} = Cum_2(X_j, X_k) = vec C_{k,j},   j, k = 1, 2.        (3.9)

Indeed, if for instance, if j = 2 and k = 1, we show that


   
Du⊗1 ,u2 u2 C2,1 u1 = Du⊗2 Du⊗1 u2 C2,1 u1 = vecC2,1 .


First we apply the operator ∂/∂u1 , then take the vector of the transpose, the result
is
    
Du⊗1 u2 C2,1 u1 = vec u2 C2,1 = C1,2 u2 ,

see (2.21), p. 73. Now we have

Du⊗2 C1,2 u2 = vecC2,1 .

  
Notice that u2 C2,1 u1 is a scalar so u1 C1,2 u2 = u2 C2,1 u1 , hence the result
*
κ⊗ 2 ⊗ *
1,2 = (−i) Du1 ,u2 ψX1,2 (u1 , u2 ) (u 1 ,u2 )=0
= vecC2,1 .

If j = k = 1, we obtain the first-order derivative



Du⊗1 u1 C1,1 u1 = 2C1,1u1 ,

and apply Du⊗1 once again to get


  
Du1 Du1 u1 C1,1 u1 = 2vecC1,1 .

Hence we have obtained the cumulant κ ⊗ 1,1 = Cum2 (X1 , X1 ) = vecC1,1 . Observe
that both the matrix C1,1 and the vector κ ⊗ ⊗
1,1 are 2-symmetric, κ 1,1 ∈ Sd,2 .
There are other ways for defining the moments and cumulants for multiple
random variables that will keep the quantities that belong together either as a vector,
a matrix, or a tensor.
Remark 3.3 Kollo provides an alternative definition of higher-order moments of vector-valued variates as follows: the first-order moment m_1(X) = EX is a vector; the second-order one is m_2(X) = E X X^⊤, which is a matrix with a natural connection to the variance–covariance matrix. The third-order moment is the tensor product of the previous patterns, m_3(X) = E X ⊗ X^⊤ ⊗ X, and the higher-order ones follow this pattern. The fourth-order one is

    m_4(X) = E X ⊗ X^⊤ ⊗ X ⊗ X^⊤ = E XX^⊤ ⊗ XX^⊤ = E X^{⊗2} (X^{⊗2})^⊤.

We can vectorize these moments,

    vec m_2(X) = E X^{⊗2},  vec m_3(X) = E X^{⊗3},  vec m_4(X) = E X^{⊗4},

and conclude that the connection to T-moments is simply vec m_k(X) = μ^⊗_{X,k}.
The first cumulant is c_1(X) = EX; the second one c_2(X) = Var(X) is the variance–covariance matrix V. From now on let X be centralized. The third-order cumulant is just the central moment as usual, c_3(X) = m_3(X). The fourth-order cumulant is defined as

    c_4(X) = m_4(X) − (I_{d^2} + K_{d•d}) (V ⊗ V) − vec V vec^⊤ V.

We can express the vector form of the second term of c_4(X) with the help of commutators:

    vec((I_{d^2} + K_{d•d})(V ⊗ V) I_{d^2}) = (I_{d^4} + I_{d^2} ⊗ K_{d•d}) vec(V ⊗ V)
                                            = (I_{d^4} + I_{d^2} ⊗ K_{d•d}) (I_d ⊗ K_{d•d} ⊗ I_d) (vec V)^{⊗2}
                                            = (K_{(1324)}(d) + K_{(1342)}(d)) vec^{⊗2} V,

since K_{(1243)} K_{(1324)} = K_{(1342)}. The third term of c_4(X) has the following form:

    vec(vec V vec^⊤ V) = vec^{⊗2} V.

We summarize the above expressions and get

    vec c_4(X) = E X^{⊗4} − L^{−1}_{2_2} vec^{⊗2} V,

where L^{−1}_{2_2} = I_{d^4} + K^{−1}_{(1324)}(d) + K^{−1}_{(1423)}(d). As we shall see later in Sect. 3.4.1, formula (3.36), vec c_4(X) equals Cum_4(X); hence both the vectorized moments and Kollo's cumulants coincide with T-moments and T-cumulants.

3.3.3 Basic Properties

The results obtained here are generalizations of well known results given for scalar
random variables, we shall consider the scalar case first and then the general vector-
valued case.
Suppose that dimensions of X1 , X2 , . . . , Xn are d = [d1 , d2 , . . . , dn ].
Property 3.1 (Symmetry) Let X ∈ Rn , then the cumulant (with scalar
value) Cum(X) = Cum(Xp ) is symmetric, i.e. κ(1:n) = κp , where p =
(p (1) , p (2) , . . . , p (n)) ∈ Pn is a permutation of the integers 1 : n.
If some dj > 1 then the T-cumulants are not symmetric any more, they fulfill the
equation

Cumn (X1:n ) = K−1


p (d) Cumn (Xp ),

i.e. κ ⊗ −1 ⊗ −1
1:n = Kp (d) κ p , where Kp (d) is a commutator matrix defined by the
permutation p, see (1.23), p. 11.
If all the entries of the list X1:n are the same then κ ⊗1:n (d) is n-symmetric, i.e.
κ 1:n ∈ Sd,n , or equivalently κ ⊗

1:n (d) = S κ ⊗
d1n 1:n (d), see Remark 3.2.
This and the following result are obvious consequences of the T-derivative
applied to the cumulant function.
Property 3.2 (Multilinearity, Scaling) Let c1:2 = [c1 , c2 ] be constants, then

Cum(c1 Y1 + c2 Y2 , X) = c1 Cum(Y1 , X) + c2 Cum(Y2 , X),

in the scalar case. Let A and B be constant matrices with appropriate dimensions,
then

Cum2 (AY, BX) = (A ⊗ B) Cum2 (Y, X) , (3.10)



and
 
Cumn+1 (AY1 + BY2 , X1:n ) = A ⊗ Id(n) Cumn+1 (Y1 , X1:n )
 
+ B ⊗ Id(n) Cumn+1 (Y2 , X1:n ). (3.11)

If a and b are constant vectors, then

Cumn+1 (a ⊗ Y1 + b ⊗ Y2 , X1:n ) = a ⊗ Cumn+1 (Y1 , X1:n )


+ b ⊗ Cumn+1 (Y2 , X1:n ), (3.12)

assuming that the vector addition is valid.


To establish (3.10), we consider the simple connection AY λ = Y A λ. Then
take the T-derivative of the cumulant generator, use the chain rule setting u1 =
A λ1 , u2 = B λ2 , and get Dλ⊗1 ,λ2 ψ (A λ1 ,B λ2 ) = ADu⊗1 ,λ2 ψ (A λ1 ,B λ2 ) =
(A ⊗ B) Du⊗1 ,u2 ψ (A λ1 ,B λ2 ).
Similarly, for the case (3.12) we first show that λ (a ⊗ Y) = λ (a ⊗ I) Y =
((a ⊗ I) λ) Y, then the T-derivative Dλ⊗ ψ ((a ⊗ I) λ) = (a ⊗ I) Du⊗ ψ ((a ⊗ I)
λ) = a ⊗ Du⊗ ψ ((a ⊗ I) λ) gives (3.12).
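A small numerical sketch (not from the book; NumPy assumed, names illustrative) of the identity behind (3.10): identifying Cum_2(Y, X) with vec Cov(X, Y), the scaling property amounts to the Kronecker–vec rule (A ⊗ B) vec M = vec(B M A^⊤):

    import numpy as np

    rng = np.random.default_rng(0)
    dx, dy = 3, 4                      # dimensions of X and Y (illustrative)
    A = rng.standard_normal((2, dy))   # arbitrary constant matrices
    B = rng.standard_normal((5, dx))
    M = rng.standard_normal((dx, dy))  # plays the role of Cov(X, Y)

    lhs = np.kron(A, B) @ M.flatten(order='F')   # (A ⊗ B) Cum2(Y, X)
    rhs = (B @ M @ A.T).flatten(order='F')       # vec Cov(BX, AY) = Cum2(AY, BX)
    print(np.allclose(lhs, rhs))                 # True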
Property 3.3 (Independence) If there is at least one entry Xj of X ∈ Rn , which is
independent from an entry Yk of Y ∈ Rm , then

Cum(X, Y) = 0. (3.13)

If X ∈ Rn is independent of Y ∈ Rm , and n = m then

Cum(X + Y) = Cum(X) + Cum(Y).

This formula is the additive version of formula (3.4) of the expectation of the product
of independent variables. If X1:n is independent of Y1:m and the corresponding
dimensions are equal, then

κ⊗ ⊗ ⊗
X1:n +Y1:n = Cumn (X1:n + Y1:n ) = Cumn (X1:n ) + Cumn (Y1:n ) = κ X1:n + κ Y1:n .
(3.14)

Again, formula (3.14) is the additive version of formula (3.4), which is the
expectation of the T-product of independent variables. Let ψX1:n +Y1:n (λ1:n ) be the
cumulant generator function of X1:n + Y1:n = (X1 + Y1 , . . . , Xn + Yn ), by an
obvious notation. Then we have the additivity

ψX1:n +Y1:n (λ1:n ) = ψX1:n (λ1:n ) + ψY1:n (λ1:n ) , (3.15)



for independent variates and then calculating the nth-order T-derivative with respect
to λ1:n = [λ1 , . . . , λn ] of both sides of (3.15), and setting λ1:n = 0 (i.e. λ1 =
0, λ2 = 0, . . . , λn = 0) we obtain (3.14). It also follows that for any constant c1:n
with appropriate dimension, if n > 1, then

Cumn (X1:n + c1:n ) = Cumn (X1:n ) .

The next Property is well known, see also Example 3.8.


Property 3.4 (Gaussianity) The random vector X ∈ Rn is Gaussian if and only if
for all subset b of (1 : n)

κX,b = Cum(Xb ) = 0, |b| > 2.

This property is applied for a list Y1:n of random vectors setting X = vecY1:n .

3.4 Expressions between Moments and Cumulants

Very often cumulants are estimated using estimated moments. We shall see that
every nth-order joint cumulant can be written in terms of nth and lower-order joint
moments. Therefore, it is important to obtain an explicit relation between cumulants
and moments and vice versa.

3.4.1 Expression for Cumulants via Moments


3.4.1.1 Expressions for scalar variates

We have seen in (3.2) that the characteristic function of X = [X_1, X_2, ..., X_n] fulfils the series expansion

    φ(λ) = Σ_{k ≥ 0, Σ_j k_j ≤ m} (i^{Σ_j k_j} / k!) μ_k λ^k + o(|λ|^m),        (3.16)

with the moment coefficients μ_k = E ∏_{j=1}^{n} X_j^{k_j}, and the series expansion of the corresponding cumulant generator function is

    ψ(λ) = Σ_{k ≥ 0, Σ_j k_j ≤ m} (i^{Σ_j k_j} / k!) Cum(X_1^{k_1}, ..., X_n^{k_n}) λ^k + o(|λ|^m),        (3.17)

see (3.7), with the cumulant coefficients


    Cum(X_1^{k_1}, ..., X_n^{k_n}) = Cum(X_1, ..., X_1, ..., X_n, ..., X_n),

where X_1 is repeated k_1 times, ..., X_n is repeated k_n times.

Taking the logarithm on the right side of (3.16), then the Taylor series expansion
for it, and collecting the appropriate coefficients of λk one can obtain the required
relations between moments and cumulants. We recall the following notations:

μXj = μj , μ(Xj ,Xk ) = μ(j,k) , μ(Xj ,Xk ,Xm ) = μ(j,k,m) , (3.18)

similarly to the notations of T-moments.


Example 3.9 Let n = 2 and m = 2, and apply formula (3.16), so that

    φ(λ) = 1 + iμ_1 λ_1 + iμ_2 λ_2 − (1/2) μ_{(1,1)} λ_1^2 − μ_{(1,2)} λ_1 λ_2 − (1/2) μ_{(2,2)} λ_2^2 + o(|λ|^2).

We consider the Taylor series expansion of the function ln(1 + y), and see that

    ψ(λ) = ln(1 + iμ_1 λ_1 + iμ_2 λ_2 − (1/2) μ_{(1,1)} λ_1^2 − μ_{(1,2)} λ_1 λ_2 − (1/2) μ_{(2,2)} λ_2^2 + o(|λ|^2))
         = iμ_1 λ_1 + iμ_2 λ_2 − (μ_{(1,2)} − μ_1 μ_2) λ_1 λ_2 − (1/2)(μ_{(1,1)} − μ_1^2) λ_1^2 − (1/2)(μ_{(2,2)} − μ_2^2) λ_2^2 + o(|λ|^2).

We equate the coefficients of λ_j and λ_j λ_k by formula (3.17), and we obtain κ_j = μ_j, j = 1, 2; and

    κ_{(j,k)} = μ_{(j,k)} − μ_j μ_k = Cov(X_j, X_k),   j, k = 1, 2,        (3.19)

respectively.
We will now consider what appears to be a special case of expressing cumulants
via moments when k1 = k2 = . . . = kn = 1, but it turns out that it covers the
general case as well, by the distinct values principle.
Theorem 3.1 (Cumulants via Moments) Take random variables X_{1:n}, and let μ_{(b_j)} denote the expected value of the product, E ∏_{k ∈ b_j} X_k; then

    κ_{(1:n)} = Σ_{r=1}^{n} (−1)^{r−1} (r − 1)! Σ_{K{r} ∈ P_n} ∏_{j=1}^{r} μ_{(b_j)},        (3.20)

where K{r} = {b_1, b_2, ..., b_r} ∈ P_n is a partition with size |K{r}| = r and P_n is the set of all partitions of (1 : n); see (3.23) for another form of (3.20).
Proof This theorem is a reformulation of Corollary 2.2, p. 65.

For example when r = 1, we have the single partition (1, 2, . . . , n) with


one block, when r = 2, the set (1 : n) is partitioned into two blocks (b1 , b2 ),
where either b1 contains one element, b2 contains n − 1 elements, or b1 contains
two elements, b2 contains n − 2 elements, and so on. We discussed in Sect. 1.4,
p. 26, how to generate these partitions for any n, given the partitions for n − 1.
Instead of this recursive method, one can consider the incomplete Bell polynomial
Bn,r (x1 , . . . , xn−r+1 ) which shows the structure and number of all partitions with
size r, (see Sect. 1.4.4, p. 35). We will illustrate these partitions in the following
different examples.
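For concreteness, here is a minimal Python sketch (not from the book; the helper names are illustrative) that enumerates set partitions and evaluates formula (3.20) from a user-supplied joint-moment function:

    from itertools import count
    from math import factorial

    def set_partitions(items):
        """Generate all partitions of a list into non-empty blocks."""
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for smaller in set_partitions(rest):
            # put `first` into an existing block, or into a new singleton block
            for i in range(len(smaller)):
                yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
            yield [[first]] + smaller

    def cumulant_from_moments(n, moment):
        """Formula (3.20); `moment(block)` returns E prod_{k in block} X_k."""
        total = 0.0
        for partition in set_partitions(list(range(1, n + 1))):
            r = len(partition)
            prod = 1.0
            for block in partition:
                prod *= moment(tuple(block))
            total += (-1) ** (r - 1) * factorial(r - 1) * prod
        return total

    # e.g. third-order cumulant of a single X by the distinct values principle:
    # moment(block) depends only on |block|; with mu_k = E X^k of an Exp(1) variable
    mu = {1: 1.0, 2: 2.0, 3: 6.0}
    print(cumulant_from_moments(3, lambda b: mu[len(b)]))   # 2.0, the third cumulant of Exp(1)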
Example 3.10 Let n = 1. Obviously K = {b} and b = (1), and κ1 = μ1 .
Example 3.11 Let n = 2. Then r = 1, 2. When r = 1, then K{1} = {b} and b =
(1, 2), when r = 2, then K{2} = {b1 , b2 }, i.e. the partition contains two blocks b1
and b2 . They are b1 = (1) , b2 = (2), i.e. each block contains one element. So by
(3.23) we have

κ(1,2) = μ(1,2) − μ1 μ2 = Cov (X1 , X2 ) .

We have seen that generating the partitions K = {b1 , b2 , . . . , br } is equivalent


to generating the row vectors u1 , u2 , . . . , ur , where the elements of each vector are
either one or a zero depending on whether a particular value j is in the block or
not. While generating the vectors (u1 , u2 , . . . , ur ), we must note that the sum of the
vectors satisfies the linear constraint


r
uj = [1, 1, . . . , 1] ,
j =1

which expresses the fact that ∪bk = (1 : n).


In order to illustrate these ideas, we introduce some notations and ideas which
may look a bit difficult in the beginning, but will become clear later.
Let (u1 , u2 , . . . , ur ) be r vectors. Let us assume that each of them are row vectors
with n elements in each. Each element in each vector is either 1 or a zero, and
assume further that


r
uj = [1, 1, . . . , 1] .
j =1

(Note here that sum of vectors is given by the sum of the corresponding entries.)
This condition is important for obtaining the partition K = {b1 , b2 , . . . , br } of
(1, 2, . . . , n).

Let uj = uj,1 , uj,2 , . . . , uj,n , (j = 1, 2, . . . , r) where uj,k is either 1 or zero.
Let
u u u u
EX1 j,1 X2 j,2 · · · Xn j,n = EX1:n
j
.

Suppose u_j = [1, 1, 0, ..., 0]; then

    E X_{1:n}^{u_j} = μ_{(1,2)},

or if u_j = [1, 1, 1, ..., 1] = 1_n, then

    E X_{1:n}^{u_j} = μ_{X_{1:n}^{1_n}} = μ_{(1:n)},

and so on. Suppose we have the partition K = {b_1 = (1, 2), b_2 = (3, 4, ..., n)} and want to consider the corresponding expectations μ_{(b_1)} = μ_{(1,2)} and μ_{(b_2)} = μ_{(3:n)}:

    E(∏_{k ∈ b_1} X_k) E(∏_{k ∈ b_2} X_k) = μ_{(b_1)} μ_{(b_2)} = E X_{1:n}^{u_1} E X_{1:n}^{u_2} = μ_{(u_1)} μ_{(u_2)},

where u_1 = (1, 1, 0, ..., 0), u_2 = (0, 0, 1, 1, ..., 1), and we note that

    u_1 + u_2 = [1, 1, 1, ..., 1] = 1_n.

The following example considers the case n = 3.


Example 3.12 First note that if n = 3, then the partitions with size k, k = 1, 2, 3 of
(1 : 3) are given in Example 1.21, p. 36. Consider the partitions given in Table 1.2,
p. 28. Each of the matrices, which we denoted by the numbers (1) to (5) leads to a
partition corresponding to the result (3.20) for the case n = 3. They are collected in
Table 3.1. We can conclude that the third-order cumulant κ(1:3) = Cum(X1:3 ) has 5
terms with the following coefficients:


    κ_{(1:3)} = μ_{(1:3)} − μ_{(1,2)} μ_3 − μ_{(1,3)} μ_2 − μ_1 μ_{(2,3)} + 2 μ_1 μ_2 μ_3 = E ∏_{i=1}^{3} (X_i − μ_i).        (3.21)

Table 3.1 Expectations corresponding to Table 1.2, p. 28

    Marking    Expectation
    (1)        μ_{(1,2,3)}
    (2)        μ_{(1,2)} μ_3
    (3)        μ_{(1,3)} μ_2
    (4)        μ_1 μ_{(2,3)}
    (5)        μ_1 μ_2 μ_3

In particular, for equal entries we have

    κ_{X,3} = μ_{X,3} − 3 μ_{X,1} μ_{X,2} + 2 μ_{X,1}^3 = E(X − EX)^3.        (3.22)

The first three cumulants, see also (3.19), are equal to the central moments
but this is not true for higher-order cumulants. One might easily check this for
cumulants of order four (see Example 3.13 below).
With the above notation we can rewrite formula (3.20), where the summation is over all partitions K{r} of size |K{r}| = r. The general formula is

    κ_{(1:n)} = Σ_{r=1}^{n} (−1)^{r−1} (r − 1)! Σ_{U ∈ P_n, |U| = r} ∏_{j=1}^{r} μ_{(u_j)},   X ∈ R^n,        (3.23)

where μ_{(u_j)} = E X_{1:n}^{u_j}, and the second summation is taken over all possible partition matrices U with row vectors u_j, u_j ∈ R^n, U = [u_j]_{j=1:r}; see Sect. 1.4 for details. Such a vector system [u_j]_{j=1:r} corresponds to a partition K{r} with size |K{r}| = r of the set 1 : n = (1, 2, ..., n); therefore, the double sum in (3.23) is over all partitions K{r} ∈ P_n, where P_n is the set of all partitions of the numbers 1 : n. We have seen that generating the partitions K{r} = {b_1, b_2, ..., b_r} is equivalent to generating the row vectors [u_1, u_2, ..., u_r], where the elements of each vector are either 1 or zero depending on whether a particular value j is in the block or not.
Example 3.13 We can now write the fourth-order cumulant using the matrices U in
Table 1.3, p. 29, as

κ(1:4) = μ(1:4) − μ(1,2,3)μ4 − μ(1,2,4)μ3 − μ(1,3,4)μ2


− μ(2,3,4)μ1 − μ(1,2) μ(3,4) − μ(1,3) μ(2,4) − μ(1,4) μ(2,3)

+ 2 μ(1,2)μ3 μ4 + μ(1,3) μ2 μ4 + μ(1,4)μ2 μ3 + μ1 μ(2,3)μ4

+μ1 μ(2,4)μ3 + μ1 μ2 μ(3,4)
− 6μ1 μ2 μ3 μ4 . (3.24)

Assuming Xi = X, for all i = 1, 2, 3, 4, we have

    κ_{X,4} = μ_{X,4} − 4 μ_{X,3} μ_X − 3 μ_{X,2}^2 + 12 μ_{X,2} μ_X^2 − 6 μ_X^4.        (3.25)

We see that κO4 = Cum (X1 , X2 , X3 , X4 ) given by (3.24) has 15 terms, but it gets
considerably simpler if we assume EX1 = EX2 = EX3 = EX4 = 0. Under this
assumption (3.24) reduces to

κ(1:4) = μ(1:4) − μ(1,2)μ(3,4) − μ(1,3)μ(2,4) − μ(1,4)μ(2,3).



This latter expression shows that fourth-order cumulants differ from fourth-order
central moments:


4
κ(1:4) = E (Xi − EXi ) , (3.26)
i=1

unless all centralized μ(j,k) = κ(j,k) , j = k, are zero, i.e. variates are uncorrelated.
Formula (3.20) gets significantly simpler when  all components of X1:n coincide
with X ∈ R, say, because in this case μ(b) = E k∈b Xk = μX,|b| , only depends on
the cardinality |b| of the block b. The number of all partitions K{r} with size r and
type  = [1 , . . . , n ] is the number of terms in the sum (3.20) which is

n! n!
n−r+1 = n−r+1  ,
j =1 j ! (j !) j
j =1 j ! rj =1 kj !

where kj denotes the cardinality of the blocks (see (1.41), p. 32). Now we change
the summation by all partitions in (3.20) so that we collect partitions with the same
type for a given r,


n  
n−r+1  j
1 μX,j
κX,n = n! (−1)r−1 (r − 1)! (3.27)
j ! j!
r=1 j =r,j j =n j =1

or equivalently, using the blocks’ cardinality, we have


n  n! 
r
κX,n = (−1) r−1
(r − 1)! n−r+1 r μX,kj ,
r=1 j =r,j j =n j =1 j ! j =1 kj ! j =1

where kj denotes the cardinality of blocks of the partition K{r} , and the sum is taken
#  = (1 , . . . , #
over all sequences n ), j ≥ 0 such that the following two conditions
are satisfied: nj=1 j = r, and nj=1 j j = n.
Expression (3.20) is considered to be general by the distinct value principle, and
we can get any higher-order cumulant. However Theorem 2.1, p. 70 that gives the
formula for higher-order derivatives of compound functions can be used to obtain
the formula for higher-order cumulants directly:

|n|
  
K  
1 μ(jk ) k
κ(n) = (−1)r−1 (r − 1)! n! , (3.28)
k ! jk !
r=1 p(r,,j) k=1

where n = (n1 , . . . , nd ), |n| = nk , K = (nk + 1) − 1, 0 < jk =


jk,1 , . . . , jk,d ≤ n, k = 1 : K, 0 ≤ k , k = 1 : K, and the set p (r, , j) =
p (r, (1 , . . . , K ) , (j1 , . . . , jK )) depends on (1 , . . . , K ), (j1 , . . . , jK ), and is

defined by the constraints


K
jk,s k = ns , s = 1 : d,
k=1


K
k = r.
j =1

We now turn to the case of T-cumulants.

3.4.1.2 Expressions for Vector Variates

The expectation operator E defined for vectors or matrices acts element-wise,


for example E [X1 , X2 ] = [EX1 , EX2 ] . Therefore, the first-order cumulants are
Cum1 (Xj ) = EXj , j = 1, 2; and the second-order ones are

 
Cum2 (Xj , Xk ) = E Xj − EXj ⊗ (Xk − EXk ) = EXj ⊗ Xk − EXj ⊗ EXk
= vecCov(Xk , Xj ), j, k = 1, 2.

These formulae for the vector-valued case follow from the Example 3.9 similar
to the scalar-valued case. Observe that Cov(Xk , Xj ) denotes the usual covariance
⊗u
matrix. Now using the obvious notations either μ⊗ ⊗ ⊗
bj = E Xbj or μuj = EX1:n ,
j

we have the expression for T-cumulants in terms of T-moments.


Theorem 3.2 Take a list of random vectors X_{1:n}, and assume regularity conditions. The T-cumulants κ^⊗_{1:n} can be expressed via T-moments either by

    κ^⊗_{1:n} = Σ_{r=1}^{n} (−1)^{r−1} (r − 1)! Σ_{K{r} ∈ P_n} K^{−1}_{p(K{r})} ⊗_{j=1:r} μ^⊗_{b_j},        (3.29)

where K{r} denotes partitions of the set 1 : n with size r, and the second sum is over all such partitions K{r} ∈ P_n, or equivalently by

    κ^⊗_{1:n} = Σ_{r=1}^{n} (−1)^{r−1} (r − 1)! Σ_{U ∈ P_n, |U| = r} K^{−1}_{p(U)}(d_{1:n}) ⊗_{j=1:r} μ^⊗_{u_j},        (3.30)

where the second summation is taken over all possible partition matrices U with row vectors [u_j]_{j=1:r}. Such a vector system [u_j]_{j=1:r} corresponds to a partition K{r} (see Sect. 1.4 for details).

We shall see that the first three T-cumulants, like in the scalar case, equal the
central T-moments but this is not true for higher-order cumulants. It is easy to check
this for cumulants of order four or higher (see (3.32), also (3.19), Examples 3.14
and 3.15).
Example 3.14 Let n = 3. First we discuss the partitions with size r and the
corresponding permutations of (1 : 3) (see also Example 1.21, p. 36 as well). The
possible sizes are |K| = r = 1, 2, 3. When r = 1, K{1} = {b} and the single
partition in this case is only one block with elements (1, 2, 3). For r = 2 the
partitions have two blocks K{2} = {b1 , b2 } , where b1 has one element, and b2
has two elements and the possible partitions are K{2},1 = {(1) , (2, 3)}, K{2},2 =
{(1, 3) , (2)}, K{2},3 = {(1, 2) , (3)}. When r = 3, K{3} = {(1) , (2) , (3)}. Since
the permutations depend on the partitions;
  therefore, we usethe notation
 p (K). The
corresponding
  permutations
 are p K {1} = (123), p K {2},1 = p K {2},3 = (123),
p K{2},2 = (132), and p K{3} = (123), see Example 1.21, p. 36. Now we apply
(3.29) to get

κ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ ⊗
1:3 = μ1:3 − K(132) μ1,3 ⊗ μ2 − μ1,2 ⊗ μ3 − K(231) μ2,3 ⊗ μ1 + 2μ1 ⊗ μ2 ⊗ μ3 .
(3.31)

The third term in the right-hand side may be replaced by E (X1 ⊗ EX2 ⊗ X3 ), since

K−1 ⊗ ⊗ −1
(132) μ1,3 ⊗ μ2 = EK(132) (d1:3 ) (X1 ⊗ X3 ⊗ EX2 ) = E (X1 ⊗ EX2 ⊗ X3 ) ,

 −1  
and K−1
(132) = Id1 ⊗ Kd3 •d2 = Id1 ⊗ Kd3 •d2 = Id1 ⊗ Kd2 •d3 , i.e. the T-
product E (X1 ⊗ X3 ) ⊗ EX2 will be rearranged to the original order (1, 2, 3). Now
one can easily show that
⊗
κ⊗
1:3 = E (Xi − EXi ) . (3.32)
i=1:3

In the case of Xi = X

κ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = μX,3 − L12 ,11 μX,2 ⊗ μX + 2μX ; (3.33)

where

L−1 −1 −1
12 ,11 = Id 3 + K(132) + K(231) ,

follows from (3.31) (see (A.2), p. 353, for L−1 ⊗


12 ,11 ). Again κ X,3 is symmetric, hence
we can use the symmetrizer
 
⊗3  ⊗
κ⊗ ⊗ ⊗ ⊗
X,3 = Sd13 μX,3 − 3μX,2 ⊗ μX + 2μX = μX,3 − 3μ⊗ ⊗ ⊗3
X,2 ⊗ μX + 2μX .
134 3 T-Moments and T-Cumulants

We can use (3.32) to replace Xi = X, and obtain

κ⊗
X,3 = E (X − EX)
⊗3
.

The second sum


 ⊗
K−1
p(U ) (d1:n ) μ⊗
uj (3.34)
j =1:r
U ∈Pn ,|U |=r

of formula (3.30) sums the product of expectations with respect to the partition
matrices L ↔ U in a given order |U | = r. This means that U has r rows, actually
the corresponding partitions L have exactly r blocks, (none of which is empty).
* * that for a partition L = {b1 , b2 , . . . , br } ∈ Pn , with box-
We saw in Sect. 1.4.4
cardinality kj = *bj *, we can consider its type  = (1 , . . . , n ). We recall that
the type of the partition K is (1 , . . . , n ), so that K contains exactly j blocs with
cardinality j . The number of all partitions with type  = (1 , . . . , n ) is the number
of terms in the sum (3.34) which is

n!
n−r+1 ,
j =1 j ! (j !)j

 ⊗uj
see (1.41), p. 32. The orders of the expected values in the product ⊗ j =1:r EX1:n
correspond to the type of the partition; therefore, it is reasonable to split the sum
(3.34) further up with respect to the different types of partitions.
Therefore, equating the variables of the list X1:n to X, say, and using the
symmetrizer Sd1n and symmetry equivalent equivalence, we obtain an expression
similar to (3.27)

  ⊗
n
n!
κ⊗
X,n = (−1)r−1 (r − 1)! n r μ⊗ (3.35)

j =1 j ! k
j =1 j ! j =1:r X,kj
r=1 j =r,j j =n


n  ⊗  
 1 1 ⊗ j
= n! (−1)r−1 (r − 1)! μX,j ,
j =1:(n−r+1) j ! j!
r=1 j =r,j j =n

where kj denote the cardinalities of the blocks, as usual (see Definition 1.2, p. 16
of symmetry equivalence). Here one can apply the incomplete Bell polynomials to
show the structure of partitions with size r and type .
Remark 3.4 It is worth noting that in expression (3.29) we can use the canonical
form for partitions which defines the commutator matrix K−1 p(K{r} )
and the order of
⊗
the tensor product j =1:r μ⊗ ⊗
bj , as well as the order of vectors inside μbj . If we set
the components of X1:n to be equal, then the order of vectors inside μ⊗
bj will have no
⊗ ⊗
importance. The T-products j =1:r μbj will still be different, even if we fix a type

, but for a given type we can permute all the products


* * into the one characterizing
type , and arrange them in decreasing order of *bj * involved, if the cardinalities are
equal, then we choose first the one having the smaller entries. This way the canonical
form may be broken, but each type  will indicate a well defined sum of commutator
matrices. We denote the sum of commutator matrices by L−1 l1:r , see Sect. 2.4.3, p. 100
for more details. Finally using the symmetrizer, no order will have importance. The
following Example illustrates this idea.
Example 3.15 For a proper understanding of the role of the commutator matrices
we shall consider the case Cum4 (X1:4 ) when the variables are, at least apparently,
mutually distinct (see also (2.54)–(2.57), p. 94 for the corresponding T-derivatives).
Case 3.1 If r = 1, there is only one block (1 : 4) in the first-order partition, |U | =
1, |b1 | = 4, 4 = 1, j = 0, j = 4, and the corresponding expected value is μ⊗ 1:4 .
Case 3.2 If r = |U | = 2, then we have two types:
1. 1 = 1, 2 = 0, 3 = 1, 4 = 0, k1 = 1, k2 = 3. the corresponding product of
expected values has the form μ⊗ ⊗
j,k,m ⊗ μi , the number of which is 4!/3! = 4. We
take the second-order partitions of type 1.3 for  = [1, 0, 1, 0] from Table and
the corresponding permutations are (1, 2, 3|4), (1, 2, 4|3), (1, 3, 4|2), (2, 3, 4|1),
in canonical order see Definition 1.7. The contribution of these to the cumulant
Cum4 (X1:4 ) is
   
μ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
1,2,3 ⊗ μ4 + K(1243) μ1,2,4 ⊗ μ3 + K(1342) μ1,3,4 ⊗ μ2 + K(2341) μ2,3,4 ⊗ μ1 .

 
In case all the variables are the same, this becomes L−1
13 ,11 μ ⊗
X,3 ⊗ μ⊗
X , where
the matrix is

L−1 −1 −1 −1
13 ,11 = Id 4 +K(1243) + K(1342) + K(2341) .

2. 1 = 0, 2 = 2, 3 = 0, 4 = 0. The corresponding product of expected values


is μ⊗ ⊗
i,j ⊗ μk,m ; the number of these is 4!/2! (2!) = 3. Again like in the previous
2

case, we have

μ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
1,2 ⊗ μ3,4 + K(1324) μ1,3 ⊗ μ2,4 + K(1423) μ1,4 ⊗ μ2,3 .

Observe that commutators replace the indices into the original order. In case all
the variables are the same this becomes L−1 ⊗2
22 μX,2 , where

L−1 −1 −1
22 = Id 4 + K(1324) + K(1423) ;

see (A.3), p. 353.



Case 3.3 If r = |U | = 3, then we have 1 = 2, 2 = 1, 3 = 0, 4 = 0, and


μ⊗ ⊗ ⊗
i ⊗ μj ⊗ μk,m , and the number of these is 4!/2!2! = 6, with

μ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
1 ⊗ μ2 ⊗ μ3,4 + μ1 ⊗ μ2,3 ⊗ μ4 + K(1243) μ1 ⊗ μ2,4 ⊗ μ3

+ μ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
1,2 ⊗ μ3 ⊗ μ4 + K(1324) μ1,3 ⊗ μ2 ⊗ μ4 + K(1423) μ1,4 ⊗ μ2 ⊗ μ3 .
 
In case all variables are the same this becomes L−1 ⊗ ⊗2
12 ,21 μX,2 ⊗ μX , with

L−1 −1 −1 −1 −1 −1
12 ,21 = Id 4 + K(1324) + K(1423) + K(2314) + K(2413) + K(3412) .

Case 3.4 Finally if r = |U | = 4, we have 1 = 4, j = 0, j = 1, with number 1,


that is μ⊗ ⊗ ⊗
1 ⊗ μ2 ⊗ μ3 ⊗ μ4 .

Now Cum_4(X_{1:4}) is the sum of the above expressions multiplied by (−1)^{r−1}(r − 1)! accordingly:

    κ^⊗_{1:4} = μ^⊗_{1:4} − μ^⊗_{1,2,3} ⊗ μ^⊗_4 − K^{−1}_{(1243)} μ^⊗_{1,2,4} ⊗ μ^⊗_3 − K^{−1}_{(1342)} μ^⊗_{1,3,4} ⊗ μ^⊗_2 − μ^⊗_1 ⊗ μ^⊗_{2,3,4}
              − μ^⊗_{1,2} ⊗ μ^⊗_{3,4} − K^{−1}_{(1324)} μ^⊗_{1,3} ⊗ μ^⊗_{2,4} − K^{−1}_{(1423)} μ^⊗_{1,4} ⊗ μ^⊗_{2,3}
              + 2 μ^⊗_1 ⊗ μ^⊗_2 ⊗ μ^⊗_{3,4} + 2 μ^⊗_1 ⊗ μ^⊗_{2,3} ⊗ μ^⊗_4 + 2 K^{−1}_{(1243)} μ^⊗_1 ⊗ μ^⊗_{2,4} ⊗ μ^⊗_3
              + 2 μ^⊗_{1,2} ⊗ μ^⊗_3 ⊗ μ^⊗_4 + 2 K^{−1}_{(1324)} μ^⊗_{1,3} ⊗ μ^⊗_2 ⊗ μ^⊗_4 + 2 K^{−1}_{(1423)} μ^⊗_{1,4} ⊗ μ^⊗_2 ⊗ μ^⊗_3
              − 6 μ^⊗_1 ⊗ μ^⊗_2 ⊗ μ^⊗_3 ⊗ μ^⊗_4.

It should be noted that each term depends on the dimensions d_{1:4} of X_{1:4} as well. In case all the variables are the same X, we have

    κ^⊗_{X,4} = μ^⊗_{X,4} − L^{−1}_{1_3,1_1} (μ^⊗_{X,3} ⊗ μ^⊗_X) − L^{−1}_{2_2} μ^{⊗2}_{X,2} + 2 L^{−1}_{1_2,2_1} (μ^⊗_{X,2} ⊗ μ^{⊗2}_X) − 6 μ^{⊗4}_X.

Using the symmetrizer matrix S_{d1_4}, we obtain

    κ^⊗_{X,4} = μ^⊗_{X,4} − 4 μ^⊗_{X,3} ⊗ μ^⊗_X − 3 μ^{⊗2}_{X,2} + 12 μ^⊗_{X,2} ⊗ μ^{⊗2}_X − 6 μ^{⊗4}_X.

Further, assuming μ^⊗_X = 0, we get

    κ^⊗_{X,4} = μ^⊗_{X,4} − L^{−1}_{2_2} μ^{⊗2}_{X,2} = μ^⊗_{X,4} − 3 μ^{⊗2}_{X,2}.        (3.36)

Example 3.16 Consider Cum_5(X) = κ^⊗_{X,5} when EX = 0. The Bell polynomial of degree 5,

    B_5(x_{1:5}) = x_1^5 + 10 x_1^3 x_2 + 15 x_1 x_2^2 + 10 x_1^2 x_3 + 5 x_1 x_4 + 10 x_2 x_3 + x_5,

shows that, besides the incomplete Bell polynomial B_{5,1}(x_{1:5}) = x_5, the only term which does not contain x_1 is 10 x_2 x_3. This latter one is a term of the incomplete Bell polynomial B_{5,2}(x_1, ..., x_4) = 5 x_1 x_4 + 10 x_2 x_3. Therefore, the cumulant κ^⊗_{X,5} has the form

    κ^⊗_{X,5} = μ^⊗_{X,5} − L^{−1}_{1_3,1_2} (μ^⊗_{X,3} ⊗ μ^⊗_{X,2}),

where L−12,3 is a sum of 10 commutator matrices with respect to partitions with


size r = 2 and type  = [0, 1, 1, 0, 0], hence the coefficient is −1. The possible
partitions are (123|45), (124|35), (125|34), (134|25), (135|24), (145|23), (234|15),
(235|14), (245|13), (345|12) (blocks are separated by |). We obtain the commutator
matrices with respect to these partitions:

L−1 −1 −1 −1 −1 −1
13 ,12 = Id 5 + K(12435) + K(12534) + K(13425) + K(13524) + K(14523) (3.37)

+ K−1 −1 −1 −1
(23415) + K(23514) + K(24513) + K(34512).

Using the 5-symmetry and Sd15 , we obtain the symmetry equivalent form


κ⊗ ⊗ ⊗ ⊗
X,5 = μX,5 − 10μX,2 ⊗ μX,3 .

We can get this formula directly from the Bell polynomial B5 (x1:5 ) as well. Using
Sd16 on the cumulant κ ⊗
X,6 of a centered X we see, it is symmetry equivalent to


κ⊗ ⊗ ⊗ ⊗ ⊗2 ⊗3
X,6 = μX,6 − 15μX,2 ⊗ μX,4 − 10μX,3 + 30μX,2 .

This formula is based on the Bell polynomial of degree 6

B6 (x1:6 ) = x16 + 15x14x2 + 20x13x3 + 45x12x22 + 15x23 + 60x1x2 x3


+15x4x12 + 10x32 + 15x2 x4 + 6x1x5 + x6 ,

and the coefficients correspond to the coefficients of B6,r , if r = 2, multiplied by


−1, and if r = 3 multiplied by 2!.
As n increases, writing all the partitions becomes more and more difficult.
However, we have seen a clear pattern evolving, (see Sect. 1.4.1, p. 29), which can
be used to generate the partitions iteratively for any n, given the partitions for (n−1).

3.4.2 Expressions for Moments via Cumulants


3.4.2.1 Expressions for Scalar Variates

The distinct values principle is applied, and we consider the moment E X^{1_n} as the general case, because higher-order mixed moments E Y_{1:r}^{k_{1:r}} can be put into the form E X^{1_n}, where n = Σ_j k_j and

    X = [Y_1, ..., Y_1, ..., Y_r, ..., Y_r] = [Y_1^{k_1}, ..., Y_r^{k_r}],

with Y_1 repeated k_1 times, ..., Y_r repeated k_r times. Now E X^{1_n} = μ_{(1:n)}, κ_{X_{b_j}} = Cum(X_{b_j}), and

    μ_{(1:n)} = Σ_{K ∈ P_n} ∏_{b_j ∈ K} κ_{X_{b_j}},        (3.38)

where the summation is over all partitions K = {b_1, b_2, ..., b_k} of (1 : n).


We give an example for the use of formula (3.38) when n = 3.
Example 3.17 Each partition of (1 : 3) is listed in Example 1.21; therefore, if
X1:3 ∈ R3 , then

μ(1:3) = κ(1:3) + κ(1,2)κ3 + κ(1,3)κ2 + κ(2,3)κ1 + κ1 κ2 κ3 . (3.39)

Now in particular

μX,3 = κX,3 + 3κX,2 κX,1 + κX,1


3
.

Recall here the notation κ3 = Cum (X3 ), whilst κX,3 = Cum3 (X).
In order to obtain cumulants of the products of random variables in the next
section, we need the following result which relates moments to cumulants. Here we
express joint moments in terms of cumulants. The result is similar to the result in
Theorem 3.1.
Theorem 3.3 (Moments in Terms of Cumulants) Take random variables X1:n ,
then


    μ_{(1:n)} = Σ_{r=1}^{n} Σ_{K{r} ∈ P_n} ∏_{j=1}^{r} κ_{X,b_j},        (3.40)

where K{r} = {b1 , b2 , . . . , br } is a partition of the set (1, 2, . . . , n), Xbj denotes
 
the subset of random variables [X1 , X2 , . . . , Xn ] such that Xbj = Xi , i ∈ bj ,
and the summation is over all such partitions.

Proof See Corollary 2.3, p. 66.


Expression (3.40) is similar to the expression of cumulant in terms of moments
(3.20); it is even simpler in the sense that it is connected directly to Bell polynomials,
see (1.44), p. 34. There is no need to use the finer incomplete Bell polynomials. The
reason is that all the derivatives of exp (x) at 0 are equal to 1.
Let us consider the case of EXn . Then product κXb1 · · · κXbr depends only on
type of a given partition K with order r, and type  = [1 , . . . , n ], where j means
that there are exactly j blocks among bk with cardinality j , the cumulant κXbk is
then simply κX,j ; therefore,

 r  
n−r+1

κX,bj = j
κX,j .
j =1
K∈Pn K∈Pn j =1

Recall that Bell polynomials have a similar form


n  
n−r+1  j
1 xj
Bn (x1:n ) = n!
j ! j!
r=1 j =r,j j =n j =1

(see ( 1.44)), where the summation over all partitions K ∈ Pn is split into two parts.
First we sum up for partitions with a given order r, then for all r. Hence we can
rewrite (3.40) as


n  
n−r+1  j
1 κX,j
μX,n = n! .
j ! j!
r=1 j =r,j j =n j =1

We conclude that μX,n = EXn can be expressed by the help of Bell polynomials
replacing the unknowns xj by cumulants κX,j = Cumj (X); therefore, we can use
the Bell polynomials  to directly express the moments in terms of cumulants μX,n =
Bn κX,1 , . . . , κX,n .
Example 3.18 The Bell polynomial of degree 5 is

    B_5(x_{1:5}) = x_1^5 + 10 x_1^3 x_2 + 10 x_1^2 x_3 + 15 x_1 x_2^2 + 5 x_1 x_4 + 10 x_2 x_3 + x_5,

see (1.43); hence

    E X^5 = μ_{X,5} = κ_{X,5} + 5 κ_{X,4} κ_{X,1} + 10 κ_{X,3} κ_{X,2} + 10 κ_{X,3} κ_{X,1}^2 + 15 κ_{X,2}^2 κ_{X,1} + 10 κ_{X,2} κ_{X,1}^3 + κ_{X,1}^5
          = B_5(κ_{X,1}, ..., κ_{X,5}).
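The same coefficients can be generated mechanically; a minimal sketch assuming SymPy, whose bell(n, k, symbols) gives the incomplete Bell polynomial B_{n,k} (the symbol names are illustrative):

    import sympy as sp

    k = sp.symbols('kappa1:6')                          # kappa1, ..., kappa5
    B5 = sum(sp.bell(5, r, k) for r in range(1, 6))     # complete Bell polynomial of degree 5
    print(sp.expand(B5))                                # coefficients 1, 5, 10, 10, 15, 10, 1 as above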

Observe the convention for the order of the terms and variables inside a term: in
the polynomial expression the alphabetical order of the indices is used, while the
Kendall–Stuart notation when expressing moments in terms of cumulants, puts the
higher-order cumulants first, etc., which results in almost the opposite order as in
the polynomials.
From Faà di Bruno’s formula for general mixed derivatives, we get the Theo-
rem 2.1, p. 70 which gives the formula for higher-order derivatives of compound
functions and we directly obtain the formula for higher-order cumulants

|n| 
 
K  
1 κjk k
μ(n) = n! (3.41)
k ! j k !
r=1 p(r,,j) k=1

 
n = (n1 , . . . , nd ), |n| = nk , K = (nk + 1) − 1, 0 < jk = jk,1 , . . . , jk,d ≤
n, k = 1 : K, 0 ≤ k , k = 1 : K, where the set p (r, ,j) =
p (r, (1 , . . . , K ) , (j1 , . . . , jK )) of (1 , . . . , K ), (j1 , . . . , jK ) is defined by the
constraint


K
jk,s k = ns , s = 1 : d,
k=1


K
k = r.
j =1

Observe that the left-hand side of (3.40),

    E(X_1 X_2 ⋯ X_n) = E ∏ X_{1:n},

is the expectation of the product of the variables contained in O_n, which is the largest partition with the single block b = (1 : n), while the right-hand side is the sum with respect to all partitions K ∈ P_n. Actually, they are all finer than O, i.e. K ≼ O.

A more general and useful version of this theorem is the following.


Theorem 3.4 (Speed's Theorem) Take random variables X_{1:n} and a partition L of the set (1 : n); then

    ∏_{b ∈ L} μ_{X_b} = Σ_{K ≼ L} ∏_{b_j ∈ K} κ_{X,b_j},        (3.42)

where μ_{X_b} = E ∏_{j ∈ b} X_j and the summation is taken over all partitions K finer than L.

Proof Use formula (3.40) for E X_b and the result of Exercise 1.47.

3.4.2.2 Expressions for Vector Variates

We consider the moment E X^{⊗1_n}_{1:n} as the general case, because the moment E Y^{⊗k_{1:r}}_{1:r} can be put into the form E X^{⊗1_n}_{1:n}, where

    X_{1:n} = [Y_1^{k_1}, ..., Y_r^{k_r}] = [Y_1, ..., Y_1, ..., Y_r, ..., Y_r],

with Y_1 repeated k_1 times, ..., Y_r repeated k_r times, i.e. the same elements in the product Y^{⊗k_{1:r}}_{1:r} are treated as if they were different. We introduce the shorter notations μ^⊗_{1:n} = E X^{⊗1_n}_{1:n} and κ^⊗_{X,b} = Cum(X_b) for higher-order T-moments and cumulants, respectively.


Theorem 3.5 Let X_{1:n} be a list of random variates with dimensions d_{1:n}. Suppose that all partitions below are in canonical form; then

    μ^⊗_{1:n} = Σ_{K ∈ P_n} K^{−1}_{p(K)} ⊗_{b ∈ K} κ^⊗_{X,b},        (3.43)

where the summation is over all partitions K = {b_1, b_2, ..., b_k} of (1 : n); see Sect. 1.4.4 for the connection between partitions and permutations. A more general version of (3.43) is

    ⊗_{b ∈ L} E X^⊗_b = Σ_{K ≼ L} K^{−1}_{p(K)} ⊗_{b ∈ K} κ^⊗_{X,b},

where the summation is taken over all partitions K of 1 : n finer than L.


Here is an example of the use of formula (3.43) when n = 3.
Example 3.19 Let n = 3; then

    μ^⊗_{1:3} = κ^⊗_{1:3} + K^{−1}_{(321)} κ^⊗_{2,3} ⊗ κ^⊗_1 + K^{−1}_{(132)} κ^⊗_{1,3} ⊗ κ^⊗_2 + κ^⊗_{1,2} ⊗ κ^⊗_3 + κ^⊗_1 ⊗ κ^⊗_2 ⊗ κ^⊗_3.        (3.44)

Now in particular, if X_j = X, then

    μ^⊗_{X,3} = κ^⊗_{X,3} + K^{−1}_{(132)} κ^⊗_{X,2} ⊗ κ^⊗_{X,1} + κ^⊗_{X,1} ⊗ κ^⊗_{X,2} + K^{−1}_{(231)} κ^⊗_{X,1} ⊗ κ^⊗_{X,2} + κ^{⊗3}_{X,1}.

Using the commutator L^{−1}_{1_2,1_1} ((A.2), p. 353), we obtain

    μ^⊗_{X,3} = κ^⊗_{X,3} + L^{−1}_{1_2,1_1} (κ^⊗_{X,2} ⊗ κ^⊗_{X,1}) + κ^{⊗3}_{X,1}.

With the help of the symmetrizer S_{d1_3} we have

    μ^⊗_{X,3} = κ^⊗_{X,3} + 3 κ^⊗_{X,2} ⊗ κ^⊗_{X,1} + κ^{⊗3}_{X,1}.

Note that κ^⊗_{X,1} = μ^⊗_X.

Example 3.20 Let us assume μ⊗


X = 0, then

μ⊗ ⊗ −1 ⊗2
X,4 = κ X,4 + L22 κ X,2 ,

where

L−1
22 =
−1
Kp(K) (d) ,
K

and where partition K = {b1 , b2 }, and both blocks bi contain two elements.
These are K1 =  {(1,
 2) , (3, 4)}, K2 = {(1, 3) , (2, 4)}, K3 = {(1, 4) , (2, 3)}.
Permutations p Kj are the unique permutation with respect to partition Kj , (see
Sect. 1.4.4 for the connection between partitions and permutations). Therefore,

L−1 −1 −1
22 = Id 4 + K(1324) (d) + K(1423) (d) ,

and
 
−1 ⊗2 
μ⊗ ⊗ ⊗ ⊗2
X,4 = κ X,4 + L22 κ X,2 = κ X,4 + 3κ X,2 ,

by Sd14 .
In particular, when X_j = X,

    μ^⊗_{X,n} = Σ_{K ∈ P_n} K^{−1}_{p(K)} ⊗_{b_j ∈ K} κ^⊗_{X,|b_j|},        (3.45)

where the summation is over all partitions K = {b_1, b_2, ..., b_k} of 1 : n. We repeat the notation κ^⊗_{X,|b_j|} = Cum_{|b_j|}(X) for the T-cumulant of order |b_j| having identical vector variates X in it; for a list of vector variates X_{b_j} with possibly different dimensions the T-cumulant κ^⊗_{X,b_j} = Cum(X_{b_j}) is of order |b_j|.
The nth-order moment μ^⊗_{X,n} is symmetric; therefore, we can apply the symmetrizer in the general case as well. Collecting the partitions with the same size and type and using Bell polynomials and the symmetrizer S_{d1_n}, we arrive at the formula

    μ^⊗_{X,n} = Σ_{r=1}^{n} Σ_{Σℓ_j = r, Σ jℓ_j = n} ( n! / ∏_{j=1}^{n} ℓ_j! (j!)^{ℓ_j} ) ⊗_{j=1:n−r+1} κ^{⊗ℓ_j}_{X,j},        (3.46)

from (3.45). Compare this to the expression of cumulants in terms of moments



(3.35). We use the Bell polynomials here by an obvious notation we have μ⊗ X,n =
 

Bn κ X,j , j = 1, 2, . . . , n . We shall define T-Bell polynomials in Sect. 4.7, later.
We can also use commutators L−1
lr:1 and obtain


n  ⊗ ⊗l
μ⊗
X,n = L−1
lr:1 κ X,jj ,
j =1:r
r=1 j =r,j j =n

see Sect. 2.4.3, p. 2.4.3 for correspondence between 1:n and lr:1 .
Example 3.21 If κ^⊗_{X,1} = μ^⊗_X = 0, then by formula (3.46) we have

    μ^⊗_{X,6} = κ^⊗_{X,6} + 15 κ^⊗_{X,4} ⊗ κ^⊗_{X,2} + 10 κ^{⊗2}_{X,3} + 15 κ^{⊗3}_{X,2},

and the more detailed formula

    μ^⊗_{X,6} = κ^⊗_{X,6} + L^{−1}_{1_4,1_2} κ^⊗_{X,4} ⊗ κ^⊗_{X,2} + L^{−1}_{2_3} κ^{⊗2}_{X,3} + L^{−1}_{3_2} κ^{⊗3}_{X,2}.

See Appendix A.4.1, p. 361 for further formulae.
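A quick scalar sanity check of the zero-mean sixth-moment formula (a Monte Carlo sketch assuming NumPy): for a centered Exponential(1) variable, κ_n = (n − 1)! for n ≥ 2, so μ_6 should equal κ_6 + 15 κ_4 κ_2 + 10 κ_3^2 + 15 κ_2^3 = 265:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.exponential(1.0, size=10_000_000)
    x = x - x.mean()                                    # centre the sample
    mu6_mc = np.mean(x ** 6)                            # Monte Carlo sixth moment
    mu6_formula = 120 + 15 * 6 * 1 + 10 * 2 ** 2 + 15   # = 265
    # within a few per cent; the heavy tail makes the Monte Carlo estimate noisy
    print(mu6_mc, mu6_formula)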


The moments of Gaussian variates follow directly.
Example 3.22 Let X be Gaussian with EX = 0 and covariance vector κ^⊗_{X,2}. Then from the previous example we have only the 15 terms left:

    μ^⊗_{X,6} = 15 κ^{⊗3}_{X,2},

since all cumulants of order higher than 2 are 0.

3.4.3 Expression of the Cumulant of Products via Products of Cumulants

3.4.3.1 Expressions for Scalar Variates

Let us start with solving the problem of finding Cum (X1 , X2 X3 ), in terms of
products of joint cumulants of X1:3 ∈ R3 . We can express Cum (X1 , X2 X3 ) in
terms of moments and then the moments in terms of cumulants,

κX1 ,X2 X3 = μ(1,2,3) − μ(2,3)μ1 = κ(1,2,3) + κ(1,2)κ3 + κ(1,3)κ2 + κ(2,3)κ1


+ κ1 κ2 κ3 − κ(2,3)κ1 − κ1 κ2 κ3
= κ(1,2,3) + κ(1,3)κ2 + κ(1,2)κ3 , (3.47)

where we have applied expressions (3.23) and (3.38).



The point actually is that the moments are taken as the products of the variables
given in the cumulant, then we can turn the moments into individual cumulants.
Therefore, the cumulant of the products can be expressed with the products of
cumulants of the variables themselves. Moreover the cumulant of X1 and X2 X3
can be considered as the cumulant of products with respect to a partition of all the
variables X1:3 considered. The random variables XL = (X1 , X2 X3 ) correspond to
the partition L = {(1), (2, 3)}. The result we got above is a sum of the products
of cumulants each term of which also corresponds to a partition of the set (1, 2, 3).
They are K1 = {(1, 2, 3)} , K2 = {(1, 3), (2)} and K3 = {(1, 2), (3)}.

For K_2 = {(1, 3), (2)} and K_3 = {(1, 2), (3)}, the two blocks of each partition hook through the block (2, 3) ∈ L, and hence they communicate.

The main feature of these partitions is that there is no other partition but the largest
one (1, 2, 3) which can be combined by the help of the elements, i.e. blocks, of
both L and Kj at the same time. All the indecomposable partitions of (1, 2, 3)
with respect to the partition L = {(1), (2, 3)} are K1 = {(1, 2, 3)} , K2 =
{(1, 3), (2)} and K3 = {(1, 2), (3)}, and all of them show up in the right-hand side.
Let X_L denote the vector of entries taken by the partition L, i.e. if L = {b_1, b_2, ..., b_k}, then

    X_L = [X_{b_1}, X_{b_2}, ..., X_{b_k}] = [∏_{j ∈ b_1} X_j, ∏_{j ∈ b_2} X_j, ..., ∏_{j ∈ b_k} X_j].

Let us call L the initial partition. Similarly, let K denote a new partition. We say the new partition K is indecomposable with respect to the partition L if all blocks in K communicate with respect to L (see Sect. 1.4.6, p. 38 for details). Then we have

Theorem 3.6 (Malyshev's Formula) Let the initial partition L be {b_1, b_2, ..., b_k}. The cumulant of the products (∏_{j ∈ b_1} X_j, ∏_{j ∈ b_2} X_j, ..., ∏_{j ∈ b_k} X_j) can be expressed via the cumulants of the subsets of the individual variables X_b, b ∈ K,

    Cum(∏_{j ∈ b_1} X_j, ∏_{j ∈ b_2} X_j, ..., ∏_{j ∈ b_k} X_j) = Σ_{K ∪ L = O} ∏_{b ∈ K} Cum(X_b),        (3.48)

where X_b denotes the vector containing the items X_s, s ∈ b, and the summation is over all partitions K such that K and L are indecomposable (K ∪ L = O).
Proof See Appendix 3.6.1, p. 169.

There are several instances where we need to calculate the cumulants of products
of random variables, such as Cum(X1 X2 , X3 X4 X5 ), etc. This will become apparent
when we study higher-order moments and cumulants for distributions.
Let us consider some examples to illustrate the above ideas.
Example 3.23 Let (X1 , X2 , X3 ) be three random variables. Let us compute
Cum (X1 X2 , X3 ) . We have from (3.47)

Cum (X1 X2 , X3 ) = κ(1,2,3) + κ(1,3)κ2 + κ(2,3)κ1 . (3.49)

Here the initial partition L consists of the set L = {(1, 2) , (3)} , and the partitioning
sets given in the right-hand sides expressions of (3.49) are K1 = {(1, 2, 3)} ,
K2 = {(1, 3) , (2)} , K3 = {(1) , (2, 3)} . The first partition given by the set K1
contains only one block b1 = (1, 2, 3) , the second partition given by the set K2
has two blocks b1 = (1, 3) ; b2 = (2), and the third partition given by the K3
contains two blocks b1 = (1) , b2 = (2, 3) . The partition K1 is indecomposable
as it has only one set with all the elements. In fact any single set with all the
elements (1, 2, . . . , n) is always indecomposable. Now consider K2 , with the blocks
b1 = (1, 3) , b2 = (2) . We note the block (1, 2) of L contains elements 1, 2 which
are in the blocks b1 and b2 . Hence the sets b1 and b2 hook, and they communicate
as well. (If a partition has only two blocks and if they hook with respect to L, they
obviously communicate.) Therefore, K2 is an indecomposable partition with respect
to L. Now consider K3 , where b1 = (1) and b2 = (2, 3). We can see that b1 and b2
hook and also communicate. Hence K3 is an indecomposable partition with respect
to L.
Example 3.24 Let (X1 , X2 , X3 , X4 ) be four random variables, and let us calculate
Cum(X1 , X2 , X3 X4 ). For convenience, let us assume EXi = 0, (i = 1, 2, 3, 4) .
We have from (3.93), p. 174,

Cum(X1 , X2 , X3 X4 ) = κ(1,2,3,4) + κ(1,3)κ(2,4) + κ(1,4) κ(2,3).

Here the initial partition is L = {(1) , (2) , (3, 4)}, and the partitions K1 , K2 , and
K3 are

K_1 = {(1, 2, 3, 4)}, K_2 = {(1, 3), (2, 4)}, and K_3 = {(1, 4), (2, 3)}; the two blocks of K_2, and likewise those of K_3, hook through the block (3, 4) ∈ L, and hence they communicate.

Partition K1 is indecomposable, as it contains only one set with all the elements
(1, 2, 3, 4). Partition K2 has two blocks b1 = (1, 3) , b2 = (2, 4) .The block
(3, 4) of L has the elements 3, 4 which are in b1 and b2 . Hence b1 and b2 hook,
and communicate. Therefore, K2 is an indecomposable partition with respect to

the initial partition L. Similarly we can see the partition K3 = {(1, 4) , (2, 3)} is
indecomposable partition with respect to L.
Now consider another example
Example 3.25 Consider now the partition L = {(1, 2) , (3, 4)} for random variables
X1:4 . We have listed all indecomposable partitions with respect to L in Exam-
ple 1.26, p. 39. Using those we have

Cum (X1 X2 , X3 X4 ) = κ(1,2,3,4) + κ(1,2,3)κ4 + κ(1,2,4)κ3 + κ(1,3,4)κ2 + κ(2,3,4)κ1


+ κ(1,3)κ(2,4) + κ(1,4)κ(2,3) + κ(1,3)κ2 κ4 + κ(1,4)κ2 κ3
+ κ(2,3)κ1 κ4 + κ(2,4)κ1 κ3 .

If EXj = 0, then

Cum (X1 X2 , X3 X4 ) = κ(1,2,3,4) + κ(1,3)κ(2,4) + κ(1,4)κ(2,3).

Moreover if in addition X1:4 is multi-normal, then

Cum (X1 X2 , X3 X4 ) = κ(1,3) κ(2,4) + κ(1,4) κ(2,3)

= Cov (X1 , X3 ) Cov (X2 , X4 ) + Cov (X1 , X4 ) Cov (X2 , X3 ) .

Set X_1 = X_2 = X and X_3 = X_4 = Y. Then, as a consequence, for a centered bivariate normal we have

    Cov(X^2, Y^2) = 2 κ_{X,Y}^2.
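A minimal Monte Carlo sketch (assuming NumPy; the covariance and sample size are arbitrary) of this identity:

    import numpy as np

    rng = np.random.default_rng(3)
    cov = np.array([[2.0, 0.7],
                    [0.7, 1.0]])
    XY = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000)
    X, Y = XY[:, 0], XY[:, 1]

    lhs = np.cov(X ** 2, Y ** 2)[0, 1]   # sample Cov(X^2, Y^2)
    rhs = 2 * cov[0, 1] ** 2             # 2 * kappa_{X,Y}^2 = 0.98
    print(lhs, rhs)                      # close for large samples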

Now we turn to the multiple vector-valued cases.

3.4.3.2 Expressions for Vector Variates

The scalar version of Cum2 (X1 , X2 ⊗ X3 ) has been treated before at the beginning
of the previous section. To obtain this result, we need the expressions of cumulants
in terms of moments (3.29), and we obtain

    κ^⊗_{X_1, X_2⊗X_3} = μ^⊗_{1,2,3} − μ^⊗_1 ⊗ μ^⊗_{2,3},        (3.50)

and by (3.43) for vector variates we have

    μ^⊗_{1,2} = κ^⊗_{1,2} + κ^⊗_1 ⊗ κ^⊗_2        (3.51)

and

    μ^⊗_{1,2,3} = κ^⊗_{X_1, X_2⊗X_3} + κ^⊗_{X_1} ⊗ κ^⊗_{X_2⊗X_3,1} = κ^⊗_{X_1, X_2⊗X_3} + κ^⊗_{X_1} ⊗ (κ^⊗_{2,3} + κ^⊗_{X_2} ⊗ κ^⊗_{X_3}),        (3.52)

since κ^⊗_{X_2⊗X_3,1} = μ^⊗_{2,3}. From now on we apply our usual notation, κ^⊗_{X_j} = κ^⊗_j, etc. Substituting the moments by cumulants in (3.50) we obtain

    κ^⊗_{X_1, X_2⊗X_3} = κ^⊗_{1:3} + κ^⊗_1 ⊗ κ^⊗_{2,3} + K^{−1}_{(132)} κ^⊗_{1,3} ⊗ κ^⊗_2 + κ^⊗_{1,2} ⊗ κ^⊗_3
                         + κ^⊗_1 ⊗ κ^⊗_2 ⊗ κ^⊗_3 − κ^⊗_1 ⊗ κ^⊗_{2,3} − κ^⊗_1 ⊗ κ^⊗_2 ⊗ κ^⊗_3
                       = κ^⊗_{1:3} + K^{−1}_{(132)} κ^⊗_{1,3} ⊗ κ^⊗_2 + κ^⊗_{1,2} ⊗ κ^⊗_3.        (3.53)

Observe the difference of notations: κ^⊗_3 = Cum_1(X_3) and κ^⊗_{X,3} = Cum_3(X); similarly κ^⊗_{(j,k)} = Cum_2(X_j, X_k). Here we have to pay attention to the orders of the T-products and the orders of the variables included in the cumulants. As far as partitions are concerned, the initial partition on the left-hand side is L = {(1), (2, 3)}, and all indecomposable partitions with respect to L appear on the right-hand side.
Now we carefully derive the case κ^⊗_{X_1,X_2,X_3⊗X_4}; the steps made here will follow the proof of the general result in Lemma 3.6. Let Y = X_3 ⊗ X_4. Then from (3.52) and (3.53) we have

    E X_1 ⊗ X_2 ⊗ Y = κ^⊗_{X_1,X_2,Y} + κ^⊗_{X_1} ⊗ κ^⊗_{X_2,Y} + K^{−1}_{(132)} κ^⊗_{X_1,Y} ⊗ κ^⊗_2 + κ^⊗_{1,2} ⊗ κ^⊗_{Y,1} + κ^⊗_1 ⊗ κ^⊗_2 ⊗ κ^⊗_{Y,1}.        (3.54)

But using the result in (3.53) (after changing the variables), we get

κ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗
X2 ,Y = κ X2 ,X3 ⊗X4 = κ 2,3,4 + K(132) κ 2,4 ⊗ κ 3 + κ 2,3 ⊗ κ 4 .

Now same expressions can be found for κ ⊗


X1 ,Y . And we have

κ⊗ ⊗ ⊗ ⊗ ⊗
Y,1 = μ3,4 = κ 3,4 + κ 3 ⊗ κ 4

Substituting the above expressions in (3.54), we have


 
EX1 ⊗ X2 ⊗ Y = κ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗
X1 ,X2 ,X3 ⊗X4 + κ 1 ⊗ κ 2,3,4 + K(132) κ 2,4 ⊗ κ 3 + κ 2,3 ⊗ κ 4
 
+K−1 (132) κ ⊗
1,3,4 + K −1
κ ⊗
(132) 1,4 ⊗ κ ⊗
3 + κ ⊗
1,3 ⊗ κ ⊗
4 ⊗ κ2

   
+κ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1,2 ⊗ κ 3,4 + κ 3 ⊗ κ 4 + κ 1 ⊗ κ 2 ⊗ κ 3,4 + κ 3 ⊗ κ 4 . (3.55)

Except for the first term on the right-hand side of (3.55), all the partitions are decomposable with respect to the initial partition L = {(1), (2), (3, 4)}. Let us carefully look at all the terms of the right-hand side of expression (3.55). Each one of the last four indecomposable partitions contains two blocks, and each block of each of these partitions contains only one element from the block (3, 4) of L. If there is a block (within a partition with more than one block) that contains both elements 3 and 4, that partition will be decomposable. This observation, which is similar to the one we made when calculating Cum_2(X_1, X_2 ⊗ X_3), is crucial for our evaluation of the general result which will be stated later in Lemma 3.6.
The blocks of any partition (except for the first partition) do not hook (and do not communicate) with the initial partition L. Now we can express E X_1 ⊗ X_2 ⊗ Y = μ^⊗_{1:4} by (3.43) as

    μ^⊗_{1:4} = Σ_{K ∈ P_4} K^{−1}_{p(K)} ⊗_{b_j ∈ K} κ^⊗_{X,b_j},        (3.56)

with all the partitions on the right-hand side. Equating (3.55) and (3.56), and canceling the common terms, we obtain

    κ^⊗_{X_1,X_2,X_3⊗X_4} = κ^⊗_{1:4} + κ^⊗_{1,2,3} ⊗ κ^⊗_4 + K^{−1}_{(1243)} κ^⊗_{1,2,4} ⊗ κ^⊗_3 + K^{−1}_{(1423)} κ^⊗_{1,4} ⊗ κ^⊗_{2,3} + K^{−1}_{(1324)} κ^⊗_{1,3} ⊗ κ^⊗_{2,4}.        (3.57)

All the above partitions are indecomposable with respect to the initial partition L =
{(1) , (2) , (3, 4)} .
Now we state the result analogous to the scalar case, using commutator matrices. It has been shown (see Sect. 1.4.4, p. 35) that for each partition K in canonical form there is a unique matrix representation and there corresponds a unique permutation p(K). This means that the elements of the blocks b ∈ K and the blocks in K appear in alphabetical order, and this order is fixed.

Theorem 3.7 (Malyshev's Formula) Let X_{1:n} denote a list of random vectors. Now consider the list of products taken by an initial partition L = (b_1, b_2, ..., b_r):

    X_L = [X^⊗_{b_1}, X^⊗_{b_2}, ..., X^⊗_{b_r}].

The cumulant of the products X_L with respect to the partition L can be expressed with the products of cumulants of the blocks of individual variables such that L and K are indecomposable partitions, K ∪ L = O, i.e.

    Cum_k(X^⊗_{b_1}, X^⊗_{b_2}, ..., X^⊗_{b_r}) = Σ_{K ∪ L = O} K^{−1}_{p(K)}(d_{b_{1:p}}) ⊗_{b ∈ K} Cum_{|b|}(X_b),        (3.58)

where X^⊗_b = ⊗_{j ∈ b} X_j denotes the T-product of the vectors containing the items X_s, s ∈ b, and the order of the dimensions d_{b_{1:p}} = (d_{b_1}, ..., d_{b_p}) follows the order of the variables in X_L. With a shorter notation,

    κ^⊗_{X_L} = Σ_{K ∪ L = O} K^{−1}_{p(K)} ⊗_{b ∈ K} κ^⊗_{X,b},        (3.59)

where again the blocks in all partitions and the entries inside the blocks must be in the same order as in the product, say canonical order (see Definition 1.7, p. 35).
Proof See Proof 3.6.1.
Example 3.26 To compute Cum (X1 ⊗ X2 , X3 ⊗ X4 ) we need to use Cum (X1 , X2 ,
X3 ⊗ X4 ), and lower-order cumulants such as Cum (X1 , X3 ⊗ X4 ), Cum (X2 , X3
⊗X4 ), given by (3.53) and (3.57). After substitution, we have

κ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗
X1 ⊗X2 ,X3 ⊗X4 = κ 1:4 + κ 1,2,3 ⊗ κ 4 + K(1243) κ 1,2,4 ⊗ κ 3

+K−1 ⊗ ⊗ −1 ⊗ ⊗
(1342)κ 1,3,4 ⊗ κ 2 + K(1243) κ 2,3,4 ⊗ κ 1

+K−1 ⊗ ⊗ −1 ⊗ ⊗
(1324)κ 1,3 ⊗ κ 2,4 + K(1423) κ 1,4 ⊗ κ 2,3

+K−1 ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
(1324)κ 1,3 ⊗ κ 2 ⊗ κ 4 + K(1423) κ 1,4 ⊗ κ 2 ⊗ κ 3

+K−1 ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
(2314)κ 2,3 ⊗ κ 1 ⊗ κ 4 + K(2413) κ 2,4 ⊗ κ 1 ⊗ κ 3 .

Here the initial partition is L = {(1, 2) , (3, 4)}, and all the above partitions are
indecomposable. If we assume that the means are zero, then

κ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
X1 ⊗X2 ,X3 ⊗X4 = κ 1:4 + K(1423) κ 1,4 ⊗ κ 2,3 + K(1324) κ 1,3 ⊗ κ 2,4 . (3.60)

3.5 Additional Matters

3.5.1 Expressions of Moments and Cumulants via Preceding Moments and Cumulants

In the previous sections, we have seen expressions for cumulants via moments and
vice versa. The use of the formula for expressing joint cumulants and moments in
terms of mixed products of preceding moments and cumulants will be apparent in
later sections.
Let X ∈ Rd , and put λ1:d = λ. The characteristic function φX (λ) of X can be
expressed with the cumulant function ψX (λ) by the equation φ = exp (ψ), hence

the derivative of φ is expressed by the derivative of ψ,

    D^⊗_λ φ = D^⊗_λ exp(ψ) = φ D^⊗_λ ψ,

with the well known consequence

    μ^⊗_X = κ^⊗_X.

Next consider the second T-derivative of φ,

    D^{⊗2}_λ φ = φ D^{⊗2}_λ ψ + D^⊗_λ ψ ⊗ D^⊗_λ φ,

and note that on the right-hand side we have the product of derivatives of φ and ψ; hence the second-order moment involves the second cumulant and the product of the first moment and cumulant:

    μ^⊗_{X,2} = κ^⊗_{X,2} + κ^⊗_X ⊗ μ^⊗_X.

Next we use (2.39), p. 85, and take the derivatives

    D^{⊗3}_λ φ = φ D^{⊗3}_λ ψ + D^⊗_λ ψ ⊗ D^{⊗2}_λ φ + (I_{d^3} + K^{−1}_{(132)}) (D^{⊗2}_λ ψ ⊗ D^⊗_λ φ),

to get the corresponding moment-cumulant expression as follows:

    μ^⊗_{X,3} = κ^⊗_{X,3} + (I_{d^3} + K^{−1}_{(132)}) (κ^⊗_{X,2} ⊗ μ^⊗_X) + κ^⊗_X ⊗ μ^⊗_{X,2}.

Again μ⊗X,3 is expressed with the help of preceding moments and cumulants. The
fourth-order case is obtained similarly

Dλ⊗4 φ = φDλ⊗4 ψ + Dλ⊗3 ψ ⊗ Dλ⊗ φ + Dλ⊗ ψ ⊗ Dλ⊗3 φ + K−1 ⊗2 ⊗2


(1423)Dλ ψ ⊗ Dλ φ
  
+ Id 4 + K−1
(1324) Dλ⊗2 ψ ⊗ Dλ⊗2 φ
   
+ Id 4 + K−1 −1 ⊗3
(1324) K(1243) Dλ ψ ⊗ Dλ φ

  
= φDλ⊗4 ψ + Id 4 + K−1(1243) + K −1
(1342) D ⊗3
λ ψ ⊗ D ⊗
λ φ
 
+ Id 4 + K−1 −1 ⊗2 ⊗2 ⊗
(1423) + K(1324) Dλ ψ ⊗ Dλ φ + Dλ ψ ⊗ Dλ φ.
⊗3

We note that the fourth-order derivatives of both the characteristic and cumulant
functions are 4-symmetric and if we rearrange the above equation so that the
difference of the fourth-order moment and cumulant is on the left-hand side then

we can apply the symmetrizer Sd14 and obtain


  
μ⊗ ⊗ −1 −1
X,4 − κ X,4 = Id 4 + K(1243) + K(1342) κ⊗
X,3 ⊗ μX

  
+ Id 4 + K−1(1423) + K −1
(1324) κ ⊗
X,2 ⊗ μ ⊗ ⊗ ⊗
X,2 + κ X ⊗ μX,3

3  

 3 ⊗
= κ ⊗ μ⊗
X,j ,
j X,4−j
j =1

(see Definition 1.2, p. 16 of symmetry equivalence). Lemma 2.7, p. 87, and the
Exercise 2.23, (2.66) provide us the general expression for higher-order derivatives

n−1  
  n − 1  ⊗(n−j ) ⊗j

Dλ⊗n φX − φX Dλ⊗n ψX = Dλ ψX ⊗ Dλ φX ,
j
j =1

which readily gives the following.


Lemma 3.1 Let X ∈ R^d; then we have

    μ^⊗_{X,n} − κ^⊗_{X,n} = Σ_{j=1}^{n−1} C(n−1, j) κ^⊗_{X,n−j} ⊗ μ^⊗_{X,j} = Σ_{j=1}^{n−1} C(n−1, j−1) κ^⊗_{X,j} ⊗ μ^⊗_{X,n−j},        (3.61)

by the symmetrizer S_{d1_n}.
If X is centralized so that μ^⊗_X = κ^⊗_X = 0, then the right-hand side of (3.61) is zero at both j = 1 and j = n − 1, which implies that μ^⊗_{X,n} = κ^⊗_{X,n} for n = 2, 3, and the limits of the summation become 2 and n − 2, respectively.
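In the scalar case d = 1, (3.61) is the familiar recursion μ_n = κ_n + Σ_{j=1}^{n−1} C(n−1, j−1) κ_j μ_{n−j}; a minimal Python sketch (names illustrative, not from the book):

    from math import comb

    def moments_from_cumulants(kappa):
        """kappa[1..n] given as a dict; returns the moments mu[0..n]."""
        n = max(kappa)
        mu = {0: 1.0}
        for m in range(1, n + 1):
            mu[m] = kappa[m] + sum(comb(m - 1, j - 1) * kappa[j] * mu[m - j]
                                   for j in range(1, m))
        return mu

    # Example: standard normal (kappa_2 = 1, all other cumulants 0);
    # the even moments 1, 3, 15 are recovered.
    print(moments_from_cumulants({1: 0.0, 2: 1.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0}))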

3.5.2 Cumulants and Fourier Transform

Cumulants can also be expressed in terms of moments using the Fourier transform. Consider a random variable X, and let X = [X_1, X_2, ..., X_n] be independent copies of X. Denote the Fourier frequencies by ω_k = 2πk/n, k = 0, 1, 2, ..., n − 1, 0 ≤ ω_k < 2π, define the vector F_n = [1, e^{iω_1}, ..., e^{iω_{n−1}}], and then we show that

    Cum_n(X) = (1/n) E(F_n^⊤ X)^n.        (3.62)

This formula looks beneficial for estimating the cumulants when the sample size is large, and it works for cumulants of higher order. The Fast Fourier transform is an

efficient tool. In practice one can slice a sample into parts with length of the order
of cumulants to be estimated and then average the estimations based on these slices.
However, sliced-estimations are so biased that several problems can occur such as
complex values, etc.
Let us take an example when n = 3. Then ω_k = 2πk/3, and the expected value is

    E(F_3^⊤ X)^3 = E(X_1 + X_2 e^{iω_1} + X_3 e^{iω_2})^3
                 = 3 E X^3 + 9 EX EX^2 (e^{iω_1} + e^{iω_2}) + 6 (EX)^3
                 = 3 μ_3 + 9 μ_1 μ_2 (e^{iω_1} + e^{iω_2}) + 6 μ_1^3,

where e^{iω_1} + e^{iω_2} = −1, since 1 + e^{iω_1} + e^{iω_2} = 0, and we obtain

    E(F_3^⊤ X)^3 = 3 μ_3 − 9 μ_1 μ_2 + 6 μ_1^3 = 3 κ_3,

by the expression of the cumulant κ_3 via moments.


We proceed with vector-valued case. Let X1:n = [X1 , X2 , . . . , Xn ] be n
independent copies of X ∈ Rd . We can write


n
vecX1:n = e k ⊗ Xk ,
k=1

where ek are the coordinate unit vectors of Rn . The independence of the components
implies that the cumulants of the sum is the sum of the cumulants, in particular we
have
 n 

Cum1 (vecX1:n ) = ek ⊗ κ ⊗ ⊗
X,1 = 1n ⊗ κ X,1 ,
k=1

and
 n 

Cum2 (vecX1:n ) = e⊗2
k ⊗ κ⊗ ⊗
X,2 = vec (I2 ) ⊗ κ X,2 .
k=1

In general by the same argument we obtain


 

n
Cumn (vecX1:n ) = e⊗n
k ⊗ κ⊗
X,n ,
k=1
3.5 Additional Matters 153

see (3.12). Let


n
 
3
X= Xk eiωk−1 = X1:n Fn = Fn ⊗ Id vecX1:n , (3.63)
k=1

and take n = 3. Then express the T −moment μ⊗


3
X,3
with cumulants
 
⊗ ⊗ −1 ⊗ ⊗
μ3
X,3
= κ 3
X,3
+ L12 ,11 κ 3
X
⊗ κ 3
X,2
+ κ ⊗3
3
X
,

(see (A.2), p. 353, for L−1 3 ⊗


= 3κ ⊗
12 ,11 ), use (3.63) for X, and obtain μ3
X,3 X,3 . Indeed

⊗   ⊗3   ⊗3
μ3
X,3
= F3 ⊗ Id Cum3 (vecX1:3 ) + L−1
12 ,11 F3 ⊗ Id

Cum2 (vecX1:3 ) ⊗ Cum1 (vecX1:3 )


  ⊗3  ⊗3
+ F3 ⊗ Id Cum1 (vecX1:3 )
  ⊗3  ⊗3   ⊗3
= F3 ⊗ Id 13 ⊗ κ ⊗
X,1 + L−1
12 ,11 Id ⊗ F3
   ⊗3  ⊗3

vec (I3 ) ⊗ κ ⊗
X,2 ⊗ 1 3 ⊗ κ ⊗
X,1 + F3 ⊗ Id ek ⊗ κ ⊗
X,3
 
⊗3
= F3 e⊗3 k κ⊗ ⊗
X,3 = 3κ X,3 ,

since the orthogonality of the Fourier vector: if j ≤ n.

⊗j
 ⊗j

n
  j  n−1
Fn ek = F n ek = ei2πj k/n = nδj n . (3.64)
k=1 k=0

Using the form (3.63), one can write


⎡ ⎤⊗n
 ⊗n X1

n
 ⊗n ⎢ . ⎥
3
X⊗n = Xk eikω1 = Fn ⊗ Id ⎣ .. ⎦ ,
k=1
Xn

then we shall prove the general vector-valued version of 3.62, see Appendix,
Sect. 3.6.3.

#n3.2 LetiωX1:n = [X1 , X2 , . . . , Xn ] be n independent copies of X ∈ R , and


Lemma d
3
X = k=1 Xk e k . Then

1 3⊗n
κ⊗
X,n = EX .
n
154 3 T-Moments and T-Cumulants

3.5.3 Conditional Cumulants

Conditional moments are usually calculated using the conditional distribution of


the random variables. We treat conditional moments with the help of conditional
characteristic function like we did for moments in Sect. 3.1. Let us start with the
series expansion of the conditional characteristic function of a random variate X
when the random variate Y is given

φX (λ|Y ) = E (exp (iλX) |Y )


∞ k
 i
= μX,k|Y λk .
k!
k=0

Noting the linearity of the conditional moments, we find the coefficient of λk is
the higher-order conditional moment μX,k|Y = E Xk |Y . Now the conditional
cumulants of X under variate Y is defined by the log of the conditional characteristic
function, namely by the conditional cumulant function ψX (λ|Y ) = log φX (λ|Y ).
Conditional cumulants are the coefficients of ψX (λ|Y ) in the series expansion
∞ k
 i
ψX (λ|Y ) = Cumk (X|Y ) λk .
k!
k=0

If X and Y are independent, then neither conditional moments nor cumulants depend
on the condition, since the conditional expectation of exp (iλX) does not depend on
Y.
Property 3.5 Assume, the necessary moments exist
1. If X and Y are independent, then E (X|Y ) = E (X) .
2. E (g (Y ) X|Y ) = g (Y ) E (X|Y ).
From these properties of conditional moments, the next property of conditional
cumulant follows directly.
Property 3.6 Assume, the necessary moments exist
1. If X and Y are independent, then ECumk (X|Y ) = Cumk (X).
2. Cumk (g (Y ) X|Y ) = g k (Y ) Cumk (X|Y ).
The definitions above allow us to use all formulas of the preceding sections for
moments and cumulants. For instance
 
Cum2 (X|Y ) = E X2 |Y − E (X|Y ) E (X|Y ) ,
3.5 Additional Matters 155

which is the conditional variance of X given Y . Notice the difference from


Cum2 (E (X|Y )) = E (E (X|Y ))2 − (EX)2 , which is the variance of the conditional
expectation.
The T-conditional cumulant follows this line of definition using the conditional
version of the characteristic and cumulant functions, φX1:n and ψX1:n .
The connections between conditional moments and conditional cumulants are
analogue to the relationship between the ordinary moments and cumulants since
they are based on the relation between the series expansions of a function and those
of its logarithm.  

We introduce the notations μ⊗ 1:n|Y = E j =1:n Xj |Y for conditional T-
moments and κ ⊗
1:n|Y = Cumn (X1:n |Y) for conditional T-cumulants and as well as
 ⊗n
μ⊗
X,n|Y = E X |Y and κ ⊗
X,n|Y = Cumn (X|Y), respectively.
We express the conditional cumulants in terms of conditional moments


n  ⊗
κ⊗
1:n|Y = (−1)r−1 (r − 1)! Kp−1K μ⊗ ,
( {r} ) j =1:r (bj )|Y
r=1 K{r} ∈Pn

and the conditional moments in terms of conditional cumulants


 ⊗
μ⊗
1:n|Y = K−1
p(K) κ⊗
(X,b)|Y ,
b∈K
K∈Pn

for further applications.


Among several properties of conditional moments it is well known that the
expectation of the conditional expectation is the unconditional expectation, namely
E (E (X|Y )) = E (X). The connection between the cumulants and conditional
cumulants is a bit more complicated. Conditional theory can be used to express
unconditional cumulants in terms of conditional ones. For instance, if we express the
cumulant in terms of moments, then replace the expected values by the expectation
of conditional ones we have

κ(1,2) = Eμ(1,2)|Y − Eμ1|Y Eμ2|Y .

Now we can substitute the conditional moments by conditional cumulants

κ(1,2) = Eκ(1,2)|Y + Eκ1|Y κ2|Y − Eκ1|Y Eκ2|Y

noting that the first-order cumulant is just


 the  expectation. Finally rewrite
moments
 via cumulants
 Eκ (1,2)|Y = Cum κ (1,2)|Y , Eκ1|Y κ2|Y − Eκ1|Y Eκ2|Y =
Cum κ1|Y , κ2|Y and obtain
   
κ(1,2) = Cum κ(1,2)|Y + Cum κ1|Y , κ2|Y . (3.65)
156 3 T-Moments and T-Cumulants

Using covariance instead of vector cumulants, we have

Cov (X1 , X2 ) = E (Cov (X1 , X2 |Y )) + Cov (E (X1 |Y ) , E (X2 |Y )) .

Setting X1 = X2 = X, which also applies for the variance and Brillinger’s formula
follows, namely

Var (X) = E (Var (X|Y )) + Var (E (X|Y )) .

The general version of this expression is the formula


  
Cum(X1:n ) = Cum Cum(Xb1 |Y ), Cum(Xb2 |Y ), . . . , Cum(Xbk |Y ) ,
K∈Pn
(3.66)

where K = {b1 , b2 , . . . , bk } is a partition of the set {1, 2, . . . , n}, Xbj denotes the
 
subset of random variables {X1 , X2 , . . . , Xn } such that Xbj = Xi , i ∈ bj , and
the summation is taken over all such partitions. For n = 3, Brillinger’s formula
asserts that
     
κ(1:3) = Cum κ(1:3)|Y + Cum κ(1,2)|Y , κ3|Y + Cum κ(1,3)|Y , κ2|Y
   
+ Cum κ(2,3)|Y , κ1|Y + Cum κ1|Y , κ2|Y , κ3|Y .

We might notice the similarity with the conditional expectation, except now
cumulants are summed up with respect to the partitions. In our case the T-cumulant
version of Brillinger’s formula is as follows.
Theorem 3.8 (Brillinger’s Theorem) Let X1:n be a list of random vectors, then
       
Cum (X1:n ) = K−1
p(K) Cumk Cum Xb1 |Y , Cum Xb2 |Y , . . . , Cum Xbk |Y ,
K∈Pn
(3.67)

where the summation is for all partitions K ∈ Pn , and permutations p (K) assume
the canonical form of K.
The method of proof follows that of the derivation of formula (3.65) above,
namely first expressing the cumulants in terms of moments, then the moments
in terms of conditional ones, replacing the conditional moments with conditional
cumulants, and finally expressing moments via cumulants—see Sect. 3.6.2 for the
proof in a particular case.
The distinct value principle is valid so Brillinger’s formula is simplified for a
list of variables X1:n when all of them are the same X, say. Recall that the type
 = [1 , . . . , n ] of a partition K{r} = {bk } with size r, means that there are exactly
j blocks among bk with cardinality j . If the elements of the list X1:n are the same,
3.5 Additional Matters 157

  * *
then the cumulants Cum Xbj |Y depend on the cardinality *bj * of a block bj only,
and Cumn (X) is symmetric, hence applying the symmetrizer Sd1n , we have

 
 
Cumn (X) = Cumk Cum|b1 | (X|Y) , Cum|b2 | (X|Y) , . . . , Cum|bk | (X|Y)
K∈Pn


n  n−r+1
 1
= n! Cumr
r=1 j =r, j =1
j ! (j !)j
jj =n
 
× Cum1 (X|Y)1:1 , . . . , Cumn−r+1 (X|Y)1:n−r+1 , (3.68)

where in case k = 0, the Cumk (X|Y)1:k is missing from the enumeration of


variables in the cumulant Cumr .
One can also discover here the connection to Bell polynomials.
Example 3.27 Let n = 3, then
     
κ⊗
1:3 = Cum 1 κ ⊗
1:3|Y + Cum 2 κ ⊗
, κ
1,2|Y 3|Y

+ K −1
(132) Cum2 κ ⊗
, κ
1,3|Y 2|Y

   
+K−1 ⊗ ⊗ ⊗ ⊗ ⊗
(231) Cum2 κ 2,3|Y , κ 1|Y + Cum3 κ 1|Y , κ 2|Y , κ 3|Y ,

therefore,
     
κ⊗ ⊗ −1 ⊗ ⊗ ⊗
X,3 = Cum1 κ X,3|Y + L12 ,11 Cum2 κ X,2|Y , κ X,1|Y + Cum3 κ X,1|Y

(see (A.2), p. 353, for ).


Example 3.28 Let n = 4, then by symmetrizer Sd14 , we have
     

κ⊗ ⊗ ⊗ ⊗ ⊗
X,4 = Cum1 κ X,4|Y + 4Cum2 κ X,3|Y , κ X,1|Y + 3Cum2 κ X,2|Y
   
+6Cum3 κ ⊗ ⊗ ⊗ ⊗
X,2|Y , κ X,1|Y , κ X,1|Y + Cum4 κ X,1|Y .

A constant as a variable is independent from any other random variable so


that the cumulant containing a constant is zero. Now the conditional cumulant
Cum (X1 , X2 |X2 ) treats X2 as a constant; therefore, it is expected to be zero. Indeed

Cum (X1 , X2 |X2 ) = X2 E (X1 |X2 ) + E (X1 |X2 ) E (X2 |X2 )


= X2 E (X1 |X2 ) + E (X1 |X2 ) X2
= 0.
158 3 T-Moments and T-Cumulants

One can also argue with the conditional cumulant function


   
ln E (exp (iλX) |Xk ) = iλk Xk + ln E exp iλ[1.n]\k X[1.n]\k |Xk ,

where the second term does not depend on λk , hence all derivatives that include
∂ n /∂λnk , n > 1, is zero. This implies the following general result.
Lemma 3.3 Let X1:n , n > 1, be a list of random variables, then Cum (X1:n |Xk ) =
0, 1 ≤ k ≤ n.
An application of Brillinger’s formula simplifies the cumulant of product of
independent variates expressed in terms of cumulants of individual variates.
Let (X1 , X2 ), and (Y1 , Y2 ) be independent and consider
   
κX1 Y1 ,X2 Y2 = Cum κX1 Y1 ,X2 Y2 |Y1,2 + Cum κX1 Y1 |Y1,2 , κX2 Y2 |Y1,2
   
= Cum Y1 Y2 κX1 ,X2 |Y1,2 + Cum Y1 κX1 |Y1,2 , Y2 κX2 |Y1,2
= κX1 ,X2 κY1 Y2 ,1 + κX1 κX2 κY1 ,Y2 . (3.69)

The left-hand side is symmetric in X and Y what we can achieve in the right-hand
side as well

κX1 Y1 ,X2 Y2 = κX1 ,X2 κY1 ,Y2 + κX1 ,X2 κY1 κY2 + κX1 κX2 κY1 ,Y2 .

Take independent random variables X and Y , then the Brillinger’s formula takes
the form as a special case of the above

Var (XY ) = EY 2 Var (X) + (EX)2 Var (Y )


= Var (X) Var (Y ) + (EY )2 Var (X) + (EX)2 Var (Y ) .

More
 generally assume X1:n and Y1:n are independent random variates, denote
Yb = j ∈b Yj , then we have
 
Cumn (X1 Y1 , . . . , Xn Yn ) = Cum (Yb , b ∈ K) Cum|b| (Xb ) . (3.70)
K∈Pn b∈K

In particular if X and Y are independent, then we obtain from the previous


expression


n   
n  
2 1 κX,j j
Cumn (XY ) = n! Cumr Y{1 } , Y{ 2}
, . . . , Y n
{n } ,
j ! j!
r=1 j =r, j =1
jj =n
(3.71)
3.5 Additional Matters 159

j
where Y{ } corresponds to the block with cardinality j , which includes the power
j
Y j only (it implies listing Y j consecutively j times).
Example 3.29 Let X and Y be independent then we apply (3.71) for fourth-order
cumulant of the product κXY,4 , (see either Example 3.15 for the types of partitions
according to this case, or Bell polynomials, Sect. A.1, p. 351 ).
1. If r = 1, we have one block with 4 elements; therefore, the contribution of the
corresponding term to κXY,4 is κY 4 ,1 κX,4 .
2. If r = 2, we have two types: (a)  = (1, 0, 1, 0) and the corresponding term is
4κX,3 κX,1 κY 3 ,Y . (b)  = (0, 2, 0, 0), which implies 3κX,2
2 κ
Y 2 ,2 .
3. If r = 3, then  = (2, 1, 0, 0), we have 6κX,1 κX,2 κY,Y,Y 2 .
2

4. Finally, if r = 4, then  = (4, 0, 0, 0), with κX,14 κ .


Y,4

Now, summarizing these cases we obtain

κXY,4 = κY 4 ,1 κX,4 + 4κX,3κX,1 κY 3 ,Y + 3κX,2


2
κY 2 ,2 + 6κX,1
2
κX,2 κY,Y,Y 2 + κX,1
4
κY,4 .
(3.72)

Closing this section we use the Brillinger’s formula (3.48) and Malyshev’s
formula for independent variates.
Lemma 3.4 Consider X and Y independent random variables then
1. The 2nd-order cumulant (variance) is

κXY ,2 = κX1 ,X2 κY1 Y2 ,1 + κX1 κX2 κY1 ,Y2 ,

with symmetric version

κXY ,2 = κX,2 κY,2 + κX,2 κY,1


2
+ κX,1
2
κY,2 ,

2. The 3rd-order cumulant (skewness) is


3
κXY ,3 = κX,3 κY 3 ,1 + 3κX,1 κX,2 κY 2 ,Y + κX,1 κY,3

with symmetric version

κXY ,3 = κX,3 κY,3 + 3κX,3κY,2 κY,1 + 3κX,2κX,1 κY,3


3 3
+ 6κX,2 κX,1 κY,2 κY,1 + κX,3 κY,1 + κX,1 κY,3 ,

3. Finally the 4th-order cumulant (kurtosis) is

κXY,4 = κY 4 ,1 κX,4 + 4κX,3κX,1 κY 3 ,Y + 3κX,2


2
κY 2 ,2 + 6κX,1
2
κX,2 κY,Y,Y 2 + κX,1
4
κY,4
160 3 T-Moments and T-Cumulants

with symmetric version


   
κXY ,4 = κX,4 κY,4 + 4 κX,4 κY,3 κY,1 + κX,3 κX,1 κY,4 + 3 κX,4 κY,2
2
+ κX,2
2
κY,4
   
+ 6 κX,4 κY,2 κY,1
2
+ κX,2 κX,1
2
κY,4 + κX,4 κY,14
+ κX,1
4
κY,4

+ 12κX,3 κX,1 κY,3 κY,1


 
2 2
+ 12 κX,3 κX,1 κY,2 κY,1 + κX,2 κX,1 κY,3 κY,1
 
+ 12 κX,2
2
κY,3 κY,1 + κX,3 κX,1 κY,2
2
+ 6κX,2
2 2
κY,2
 
2 2 2 2
+ 12 κX,2 κY,2 κY,1 + κX,2 κX,1 κY,2 .

The usage of conditional cumulants for products of independent variables results


in more compact forms than usage of cumulants for products, see Exercises 3.21
and 3.22, for instances. The usage of conditional cumulants for products of
independent variables results in more compact forms, than usage of cumulants for
products (see Exercises 3.21 and 3.22, for examples).
Remark 3.5 In the most general case when one considers cumulants of products
Cum4 (X1 Y1 , X2 Y2 , X3 Y3 , X4 Y4 ) in terms of products of cumulants there are 2465
terms (among all 4140 subsets) with respect to the indecomposable partitions
with respect to partition L = {(1, 2) , (3, 4) , (5, 6) , (7, 8)} of 8 elements. In our
independent case it simplifies to either 5, (3.72) or 17 (Exercise 3.22) terms. One
can also check these formulae by first expressing 4th-order cumulant in terms of
expected values then using independence writing them in terms of cumulants. When
doing so one should note that first-order cumulants are not zero.

3.5.3.1 Conditional Gaussian Cumulants

Let us consider the conditional cumulants for jointly Gaussian random variables.
If X and Y are jointly Gaussian, then we know that the conditional expectation
E (X|Y) is linear in Y, namely

E (X|Y) = μX + CX,Y C−1


Y,Y (Y−μY ) ,

where CX,Y and CY,Y denote covariance and variance–covariance matrix, respec-
tively (CY,Y > 0); moreover X−E (X|Y) and Y are uncorrelated hence independent.
The conditional variance

Var (X|Y) = Var (X−E (X|Y) |Y) = Var (X−E (X|Y))

of X is constant; therefore, it does not depend on Y.


3.5 Additional Matters 161

Example 3.30 Let X1:3 and Y be jointly Gaussian then

Cum3 (X1:3 |Y ) = Cum3 (X1:3 − E (X1:3 |Y ) + E (X1:3 |Y ) |Y )


= Cum3 (X1:3 − E (X1:3 |Y )) + Cum3 (E (X1:3 |Y ) |Y ) .

If Y is given, then E (X1:3 |Y ) is considered to be constant, according to Y ; therefore,


Cum3 (E (X1:3 |Y ) |Y ) = 0. We obtain

Cum3 (X1:3 |Y ) = Cum3 (X1:3 − E (X1:3 |Y )) ,

and X1:3 − E (X1:3 |Y ) is Gaussian; therefore, we conclude that Cum3 (X1:3 |Y ) is


zero. As a special case, we also have

Cum3 (X|Y ) = E (X − E (X|Y ))3 = 0.

The same argument leads to the general case: If X1:n and Y are jointlyGaussian,
n > 1, then the conditional cumulants do not depend on Y , more precisely

Cumn (X1:n |Y ) = Cumn (X1:n − E (X1:n |Y )) .

If a list X1:n and Y are jointly Gaussian, n > 1, then we can simply argue that
E (X1:n |Y) is a constant with respect to conditional Y; therefore, we have

Cumn (X1:n |Y) = Cumn (X1:n −E (X1:n |Y) |Y) = Cumn (X−E (X1:n |Y)) = 0,

and X−E (X1:n |Y) is a Gaussian variate that is the result is zero.

3.5.4 Cumulants of the Log-likelihood Function

The above results can be used to obtain the cumulants of the partial derivatives of
the log-likelihood function. Such expressions are useful in studying the asymptotic
theory in statistics.
Consider a random sample [X1 , X2 , . . . , XN ] = X ∈ RN , with the likelihood
function L (ϑ, X) and let l (ϑ) denote the log-likelihood function, i.e.

l (ϑ) = ln L (ϑ, X) , ϑ ∈ Rd .

It is well known that under the regularity conditions

∂l (ϑ) ∂l (ϑ) ∂ 2 l (ϑ)


Eϑ = −Eϑ . (3.73)
∂ϑ1 ∂ϑ2 ∂ϑ1 ∂ϑ2
162 3 T-Moments and T-Cumulants

Observe the notation of expected value Eϑ , since the parameter ϑ is included in


the likelihood function L (ϑ, X). We introduce the notation ∂/∂ϑ = Dϑ , and
∂ 2 /∂ϑj ∂ϑk = Dϑj ,ϑk , and so on for partial derivatives. The result (3.73) can
be extended to products of several partial derivatives for d = 4, who use these
expressions in the evaluation of Bartlett’s correction. We can arrive at the result
(3.73) by observing that L (ϑ, X) = el(ϑ) . Suppose d = 2, then we have

Dϑ1 ,ϑ2 el(ϑ) = Dϑ1 ,ϑ2 L (ϑ, X) ,

and from (2.10), p. 66 we obtain




Dϑ1 ,ϑ2 el(ϑ) = el(ϑ) Dϑ1 l (ϑ) Dϑ2 l (ϑ) + Dϑ1 ,ϑ2 l (ϑ) .

Equating the two expressions above we get

Dϑ1 ,ϑ2 L (ϑ, X)


= Dϑ1 l (ϑ) Dϑ2 l (ϑ) + Dϑ1 ,ϑ2 l (ϑ) ,
L (ϑ, X)

The expected value of the left-hand side of the above expression is zero, as we may
change the order of the derivative and the integral, which gives the result (3.73).
The same argument will lead, more generally to several partial derivatives
4 4
Dϑ1:d L (ϑ, X)
Eϑ = Dϑ1:d L (ϑ, x) dx = Dϑ1:d el(ϑ) dx
L (ϑ, X)

d  
= Eϑ Dϑb l (ϑ) ,
r=1 K ∈Pd b∈K
|K |=r


where Dϑb = ∂ |b| / j ∈b ∂ϑj ; therefore,


d  
Eϑ Dϑb l (ϑ) = 0. (3.74)
r=1 K ∈Pd b∈K
|K |=r

Consider a partition K ∈ Pd and blocks b ∈ K, introduce the notations


μ∂ϑb = Eϑ Dϑb l (ϑ), μ∂ϑb ,ϑb = Eϑ Dϑb1 l (ϑ) Dϑb2 l (ϑ), and μ∂ϑb ,b∈K =
 1 2  
Eϑ b∈K Dϑb l (ϑ), and similarly for the cumulants κϑ∂b = Cum Dϑb l (ϑ) ,
   
κϑ∂b ,ϑb = Cum Dϑb1 l (ϑ) , Dϑb2 l (ϑ) , and κϑ∂b ,b∈K = Cum Dϑb l (ϑ) , b ∈ K .
1 2
The simplest case of (3.74) is Eϑ Dϑ1 l (ϑ) = μ∂ϑ1 = 0, and as we know κϑ∂1 = μ∂ϑ1 .
The derivative of the expected value of the first derivative of the log-likelihood
3.5 Additional Matters 163

function μ∂ϑ1 has the following form


4 4
 
Dϑ2 μ∂ϑ1 = Dϑ1 ,ϑ2 el(ϑ) dx = Dϑ1 ,ϑ2 l (ϑ) + Dϑ1 l (ϑ) Dϑ2 l (ϑ) el(ϑ) dx

Dϑ2 μ∂ϑ1 = μ∂ϑ(1,2) + μ∂ϑ1:2 .

Observe the difference: μ∂ϑ(1,2) and μ∂ϑ1:2 correspond to μ∂ϑb = Eϑ Dϑb l (ϑ) ,with
b = (1, 2), and μ∂ϑ1:2 = μ∂ϑ1 ,ϑ2 = Eϑ Dϑ1 l (ϑ) Dϑ2 l (ϑ), respectively.
Let b be a block of a partition K ∈ Pd−1 then
4
 
Dϑd Eϑ Dϑb l (ϑ) = Dϑ(b,d) l (ϑ) + Dϑb l (ϑ) Dϑd l (ϑ) el(ϑ) dx,

Using our notation we have

Dϑd μ∂ϑb = μ∂ϑ(b,d) + μ∂ϑb ,ϑd (3.75)

which corresponds to the inclusive–exclusive principle, see Corollary 2.3, p. 66.


Then the derivative rule (3.75) for moments implies


k
Dϑd μ∂ϑb ,j =1:k = μ∂ϑ b ,d + μ∂ϑd ,ϑb ,j =1:k . (3.76)
j ( i ) ,ϑbj ,j =i j
i=1

For instance if k = 2,

Dϑd μ∂ϑb ,ϑb2 = μ∂ϑ b ,ϑb2 + μ∂ϑb ,ϑ(b ,d ) + μ∂ϑb ,ϑb2 ,ϑd . (3.77)
1 ( 1 ,d ) 1 2 1

Hence the rule for higher-order derivatives of the expected values of the log-
likelihood function follows that for the derivatives of the compound function el(ϑ) .
So we can obtain formula (3.74) from the consecutive derivatives of μ∂ϑ1 , as well.
The first-order cumulant of Dϑ1 l (ϑ) coincides with the moment κϑ∂1 = μ∂ϑ1 .
Now we take the derivative of the cumulant

Dϑ2 κϑ∂1 = Dϑ2 μ∂ϑ1 = μ∂ϑ(1,2) + μ∂ϑ1 ,ϑ2 = κϑ∂(1,2) + κϑ∂1 ,ϑ2 ,

where κϑ∂(1,2) = μ∂ϑ(1,2) , as they are the expected values of Dϑ1 ,ϑ2 l (ϑ), and κϑ∂1 ,ϑ2 =
μ∂ϑ1 ,ϑ2 , since the covariance of Dϑ1 l (ϑ) and Dϑ2 l (ϑ) is equal to the expected value
of their product when μ∂ϑ1 = μ∂ϑ2 = 0. A similar argument leads us to the formula

Dϑ2,3 κϑ∂1 = Dϑ2,3 μ∂ϑ1 = μ∂ϑ(1,2,3) + μ∂ϑ(i,j),ϑ [3] + μ∂ϑ1:3 = κϑ∂(1,2,3) + κϑ∂(i,j),ϑ + κϑ∂1:3 ,
k k
164 3 T-Moments and T-Cumulants

since the third-order central moment is equal to the third-order cumulant, where
μ∂ϑ(i,j),ϑ [3] denotes the sum of three terms with respect to the partitions with type
k
 = [1, 1, 0], see (2.5), p 62. Both derivatives Dϑ2 κϑ∂1 and Dϑ2,3 κϑ∂1 suggest that the
derivatives of the cumulants of derivatives of the log-likelihood function follows the
same rule as the moments do. We will show


d 
 
Cum Dϑb l (ϑ) , b ∈ K = 0, (3.78)
r=1 K ∈Pd
|K |=r

the analogue of formula (3.74). Let K ∈ Pd−1 andbj , j = 1 : k be blocks of K.


We can use the shorter notation for cumulants Cum Dϑb l (ϑ) , b ∈ K = κϑ∂b ,b∈K ,
and rewrite (3.78) as


d 
κϑ∂b ,b∈K = 0.
r=1 K ∈Pd
|K |=r

The idea is to prove the same derivative rule to (3.76) for the cumulants, and then
(3.78) follows.
Lemma 3.5 (Skovgaard) Let L ∈ Pd−1 and bj , j = 1 : k be blocks of L, then


k
Dϑd κϑ∂b ,j =1:k = κϑ∂ b ,d + κϑ∂d ,ϑb ,j =1:k . (3.79)
j ( i ) ,ϑbj ,j =i j
i=1

We have seen that each term on the right-hand side in (3.75) is equal to the
corresponding cumulant term, hence (3.79) holds for k = 1

Dϑd κϑ∂b = κϑ∂(b,d) + κϑ∂b ,ϑd ,

We proceed with the derivative


 
Dϑd κϑ∂b ,ϑb = Dϑd μ∂ϑb ,ϑb − μ∂ϑb μ∂ϑb
1 2 1 2 1 2

= μ∂ϑ + μ∂ϑb ,ϑ + μ∂ϑb ,ϑb ,ϑd


(b1 ,d ) ,ϑb2 1 (b2 ,d ) 1 2
   
− μ∂ϑ + μ∂ϑb ,ϑd μ∂ϑb − μ∂ϑb μ∂ϑ + μ∂ϑb ,ϑd
(b1 ,d ) 1 2 1 (b2 ,d ) 2

= κϑ∂ + κϑ∂b ,ϑ + μ∂ϑb ,ϑb ,ϑd − μ∂ϑb ,ϑd μ∂ϑb − μ∂ϑb μ∂ϑb ,ϑd
(b1 ,d ) ,ϑb2 1 (b2 ,d ) 1 2 1 2 1 2

= κϑ∂ + κϑ∂b ,ϑ + κϑ∂b ,ϑb ,ϑd ,


(b1 ,d ) ,ϑb2 1 (b2 ,d ) 1 2
3.5 Additional Matters 165

where we have applied the formula of cumulants via moments three times, note that
we have added a zero term μ∂ϑd μ∂ϑb ,ϑb = 0 to the expression
1 2

μ∂ϑb ,ϑb2 ,ϑd − μ∂ϑb ∂


,ϑd μϑb2 − μ∂ϑb μ∂ϑb ,ϑd
1 1 1 2

in order to get κϑ∂b ,ϑb ,ϑd . One can compare this result with (3.77). (See Exer-
1 2
cise 3.32 and the proof of Skovgaard’s Lemma in the Appendix, Sect. 3.6.4.)
Equation (3.74) is in terms of the expected values of the derivatives of the log–
likelihood function, where as (3.78) is in terms of the cumulants.
 For example,
suppose we have a single parameter ϑ, then the moment Eϑ b∈K ϑb depends on
   j  j
the type  of the partition K ∈ Pd , i.e. Eϑ b∈K ϑb = Eϑ j Dϑ l (ϑ) . Let us
denote
d 
  j
j
μ∂d () = Eϑ Dϑ l (ϑ) .
j =1

The distinct values principle applies here again and we use the formula (1.41) p. 32
for the number of partitions of the same type and obtain the single parameter case
of the formula (3.74)


d  
d−r+1
d!μ∂d ()
= 0, (3.80)
r=1 j =r, j j =d j =1
j ! (j !)j

where the summation is taken over all possible type  = 1:d , fulfilling the
assumptions j ≥ 0, j = r, j j = d. For instance
  2   3   4
μ∂4 (1 , 2 , 3 , 4 ) = Eϑ [Dϑ l (ϑ)]1 Dϑ2 l (ϑ) Dϑ3 l (ϑ) Dϑ4 l (ϑ) ,

where j = r, r = 1 : 4, j j = 4; therefore,

μ∂4 (0, 0, 0, 1) + 4μ∂4 (1, 0, 1, 0) + 6μ∂4 (2, 1, 0, 0) + 3μ∂4 (0, 2, 0, 0)


+ μ∂4 (4, 0, 0, 0) = 0. (3.81)

One can use the Bell polynomials similarly as well. The cumulants Cum Dϑb l (ϑ) ,
b ∈ K) depend on the type  of the partition K for a one parameter case,
i.e.κϑ∂b ,b∈K = κd∂ (). Now the general formula (3.78) has the following form:


d  
d−r+1
d!κd∂ ()
= 0. (3.82)
r=1 j =r, j j =d j =1
j ! (j !)j
166 3 T-Moments and T-Cumulants

Hence the result for cumulants when d = 4

κ4∂ (0, 0, 0, 1) + 4κ4∂ (1, 0, 1, 0) + 6κ4∂ (2, 1, 0, 0) + 3κ4∂ (0, 2, 0, 0)


+ κ4∂ (4, 0, 0, 0) = 0, (3.83)

which is a special case of (3.82).

3.5.4.1 Cumulants of the log-likelihood Function, the Vector Parameter


Case

The multivariate extension (when the elements of the parameter vector are also
vectors) of the formula (3.74) can easily be obtained using Faà di Bruno’s
formula

  2.51, p.91.  If we partition the vector parameters into d subsets, ϑ = ϑ 1:d =


ϑ 1 , ϑ 2 , . . . , ϑ d with dimensions [d1 , d2 , . . . , dd ] respectively, then it follows
that


d  ⊗  
⊗|b|
K−1
p(K) (d1:d ) Eϑ Dϑb l (ϑ) = 0, (3.84)
b∈K
r=1 K ∈Pd
|K |=r



where ϑb denotes a subset of vectors ϑ j , j ∈ b . Now, in particular if d = 2 and
ϑ 1 = ϑ 2 = ϑ then (3.84) gives the well known result
  
Cov (Dϑ l (ϑ) , Dϑ l (ϑ)) = −Eϑ Dϑ Dϑ l (ϑ) ,

or in vectorized form, the same can be written as


   
Eϑ Dϑ⊗ l (ϑ) ⊗ Dϑ⊗ l (ϑ) = −Eϑ Dϑ⊗2 l (ϑ) .

If, for instance d = 4, say and ϑ 1 = ϑ 2 = ϑ 3 = ϑ 4 = ϑ ∈ Rm , then we can apply


(3.84) and have

Sm14 μ∂4 (0, 0, 0, 1) + 4μ∂4 (1, 0, 1, 0) + 6μ∂4 (2, 1, 0, 0)

+3μ∂4 (0, 2, 0, 0) + μ∂4 (4, 0, 0, 0) = 0,

where

⊗  ⊗2
μ∂4 (1 , 2 , 3 , 4 ) = Eϑ Dϑ⊗ l (ϑ) 1 ⊗ Dϑ⊗2 l (ϑ)
 ⊗3  ⊗4
⊗ Dϑ⊗2 l (ϑ) ⊗ Dϑ⊗2 l (ϑ) .
3.6 Appendix 167

We can obtain a similar expression for the cumulants and it is given by


d   
⊗|b|
K−1
p(K) (d1:d ) Cumr Dϑb l (ϑ) , b ∈ K = 0.
r=1 K ∈Pd
|K |=r

3.6 Appendix

3.6.1 Proof of Lemma 3.6 and Theorem 3.7

We obtain a general expression for Cumn−1 (X1 , X2 , . . . , Xn−2 , Xn−1 ⊗ Xn ),


which will be used at proving Theorem 3.7, below.
Lemma 3.6

κ⊗ ⊗
X1:n−2 ,Xn−1 ⊗Xn = κ X1:n + K−1 ⊗ ⊗
p(K) κ b1 ⊗ κ b2 , (3.85)
K ∈Pn ,K ={b1 ,b2 }
n−1∈b1 ,n∈b2

 
Here κ ⊗ b1 = Cum|b1 | Xb1 is the cumulant of a subset of the elements
 
[X1 , X2 , . . . , Xn−2 ] and Xn−1 , and κ ⊗
b2 = Cum|b2 | Xb2 is the cumulant of the
complementary subset and Xn .
Proof We prove the result (3.85) by induction over n, and for the proof we follow
a similar procedure to the one we followed for obtaining Cum (X1 , X2 X3 ) and
Cum (X1 , X2 , X3 X4 ) .
Let us suppose the result (3.85) is true for some n and we will prove the result is
true for n+1. Let us calculate the expectation of the product of the random variables
(X1 , X2 , . . . , Xn−2 , Xn−1 , Xn , Xn+1 ) , and this is given by Theorem 3.3
 ⊗
μ⊗
X1:n+1 = K−1
p(K) κ⊗
Xb .
bj ∈K j
K∈Pn+1

(see special cases when n = 2, n = 3, and n = 4 given by (A.16), (A.17),


and (A.19), respectively). Now as before put Xn ⊗ Xn+1 = Yn , Xj = Yj ,
(j = 1, 2, . . . , n − 1) . Therefore,
 ⊗
μ⊗ ⊗
X1:n+1 = μY1:n = K−1
p(K) κ⊗
Yb . (3.86)
bj ∈K j
K∈Pn

Now let us evaluate (3.86) which is the expectation of n term. We have seen earlier
that if we know all partitions for order n − 1 we can obtain all the partitions for
the order n, using inclusive and exclusive matrices given in Tables 1, 2, 3, 4 for
168 3 T-Moments and T-Cumulants

values n = 1, 2, 3, 4. Each partition of order n − 1 gives rise to an inclusive matrix


of similar order (i.e. same number of rows = number of blocks) and one exclusive
matrix of higher order (one more block in the partition). We notice contrary that the
partitions K ∈ Pn can be generated from the partitions of Pn−1 taking inclusive
and exclusive expansions by the element n which is actually the index of Yn =
Xn ⊗ Xn+1 . Using these arguments, we can write
 ⊗
μ⊗
Y1:n = K−1
p(K) κ⊗
Yb (3.87)
bj ∈K j
K∈Pn

= Cumn (Y1:n−1 , Yn ) + Cumn−1 (X1:n−1 ) ⊗ Cum1 (Yn )


+ ,- . + ,- .
|K|=1,inclusive expansion |K|=1,exclusive expansion



 ⎜ 
+ K−1 ⎜ Cum|bj | (Ybj ) ⊗ Cum|bk |+1 (Ybk , Yn )
(p(K),n) ⎜
K∈Pn−1 ,|K|>1 ⎜bk ∈K bj ∈K, j =k
⎝ + ,- .
inclusive expansions of K



 ⎟
+ Cumn−1 (Xbj ) ⊗ Cum1 (Yn )⎟
⎟,
bj ∈K ⎟
+ ,- .⎠
exclusive expansion of K

(compare the above with (3.52) and (3.54)).We note that in the terms
Cum|bj | (Ybj ) ⊗ Cum|bk | (Ybk , Yn ), the first term Cum|bj | (Ybj ) must contain at
least one element of the set X1:n−1 , and therefore, the order of Cum|bk |+1 (Ybk , Yn )
is strictly less than n + 1. We now evaluate the terms Cum|bj | (Ybj ) ⊗
Cum|bk |+1 (Ybk , Yn ) and Cum1 (Yn ) in the expression (3.87). We have

κ⊗
Yb ,Yn = κ⊗
Xb ,Xn ⊗Xn+1
k k
  
= κ⊗
Xb ,Xn ,Xn+1 + K−1 ⊗
p(a) κ Xa ⊗ κ⊗
Xa , (3.88)
k 1 ,Xn 2 ,Xn+1
a1 ∪a2 =bk

where a1 and a2 are the partitions of the set bk . Here we have used the result (3.85)
which we have assumed to be true for n and the observation we have made earlier
that Cum|bk |+1 (Ybk , Yn ) is of order less than n + 1. We also have

κ⊗ ⊗ ⊗ ⊗ ⊗
Yn = κ Xn ⊗Xn+1 = κ Xn ,Xn+1 + κ Xn ⊗ κ Xn+1 . (3.89)
3.6 Appendix 169

Substituting (3.88) and (3.89) in (3.87), we get


 
μ⊗ ⊗
X1:n+1 = κ X1:n−1 ,Xn ⊗Xn+1 + K−1
p(K) κ⊗
Xb , (3.90)
j
K∈Pn+1 bj ∈K

where
 
K−1
p(K) Cum(Xbj ) = κ ⊗ ⊗
X1:n−1 ⊗ κ Xn ⊗Xn+1
K∈Pn+1 bj ∈K

+ K−1
(p(K),n,n+1) (3.91)
K∈Pn−1 ,|K|>1

  
×⎝ κ⊗ ⊗
Xb ⊗ κ Yb ,Yn ,Yn+1 + K−1
p(K,a)
j k
bk ∈K bj ∈K, j =k bj ∈K


× κ⊗ ⊗
Xb ⊗ κ Xa ,Xn ⊗ κ⊗
Xa ,Xn+1

j 1 2
a1 ∪a2 =bk
 
+ κ⊗ ⊗
Xb ⊗ κ Xn ,Xn+1 + κ⊗ ⊗ ⊗
Xb ⊗ κ Xn ⊗ κ Xn+1 .
j j
bj ∈K bj ∈K

The summation above is over all decomposable partitions K ∈ Pn+1 with respect
to the initial partition L = {(1) , (2) , . . . , (n − 1) , (n, n + 1)}. (Compare the above
expression (3.91) with (3.57).) By equating (3.87) and (3.90) and rearranging the
terms, we get

   

κ⊗
X1:n−2 ,Xn−1 ⊗Xn = K−1
p(K) κ⊗
Xb − κ⊗
Xb
j j
K∈Pn+1 bj ∈K K∈Pn+1 bj ∈K
 
= K−1
p(K) κ⊗
Xb
K ∈Pn+1 b∈K
K ∪L=O

= κ⊗
X1:n + K−1 ⊗ ⊗
p(K) κ b1 ⊗ κ b1 ,
K ∈Pn ,K ={b1 ,b2 }
n−1∈b1 ,n∈b2

where the initial partition L is {(1) , (2) , . . . , (n − 1) , (n, n + 1)} and the sum-
mation is over all indecomposable partitions with respect to the initial partition L
defined above (compare the result with (3.57)). Hence the result.
Proof of Theorem 3.7 We proceed with induction on both on the number of
variables included and on the partitions.
170 3 T-Moments and T-Cumulants

If L belongs to either P2 or P3 , then the Theorem follows from the formula


(3.85). Suppose that the formula (3.59) is valid for all partitions of P2 , P3 . . . , , Pn
and those partitions L0 of Pn+1 which comes before L, i.e. L0 ⊆ L. Now if L =
{b1 , b2 , . . . , bk } and
⊗ ⊗ ⊗ 
XL = Xj , Xj , . . . , Xj .
b1 b2 br

The cumulant of the products XL according to the partition L in canonical form,


can be expressed by
 the products of cumulants of the individual set of variables
Xb = Xj , j ∈ b , b ∈ K, such that L and K are indecomposable partitions,
K ∪ L = O, i.e., then suppose that n + 1 ∈ bk , we can assume that k = r, and
denote
⊗ ⊗
Ym = Xj , m = 1 : r − 1, Yr = Xj , and Yr+1 = Xn+1 .
bm j ∈bk ,j =n+1

The cumulant of the products XL according to the partition L in canonical form,


can be expressed by
 the products of cumulants of the individual set of variables
Xb = Xj , j ∈ b , b ∈ K, such that L and K are indecomposable partitions,
K ∪ L = O, i.e. We apply the formula (3.85) for X⊗
br

(Y1 , . . . , Yr ⊗ Yr+1 ) = XL ,

we can use Lemma even in the case when r = n, and get



κ⊗ ⊗
XL = κ Y1:r ,Yr+1 + K−1 ⊗ ⊗
p(K) κ b1 ⊗ κ b1 ,
K ∈Pr+1 ,K ={b1 ,b2 }
r∈b1 ,r+1∈b2

We see that each cumulants contained in right side of this expression is subject of
the induction. Hence the result (3.59).

3.6.2 A Hint for Proof of Lemma 3.8

Let n = 3, we express the cumulant κ ⊗


1:3 in terms of moments

κ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1:3 = μ1:3 − μ1 ⊗ μ2,3 − K(132) μ1,3 ⊗ μ2 − μ1,2 ⊗ μ3 + 2μ1 ⊗ μ2 ⊗ μ3

= Eμ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗
1:3|Y − Eμ1|Y ⊗ Eμ2,3|Y − K(132) Eμ1,3|Y ⊗ Eμ2|Y − Eμ1,2|Y ⊗ Eμ3|Y

+2Eμ⊗ ⊗ ⊗
1|Y ⊗ Eμ2|Y ⊗ Eμ3|Y ,
3.6 Appendix 171

now replace the conditional moments by conditional cumulants as follows:

μ⊗ ⊗
j |Y = κ j |Y ,

μ⊗ ⊗ ⊗ ⊗
j,k|Y = κ j,k|Y + κ j |Y ⊗ κ k|Y

μ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
1:3|Y = κ 1:3|Y + κ 1|Y ⊗ κ 2,3|Y + K(132) κ 1,3|Y ⊗ κ 2|Y + κ 1,2|Y

⊗κ ⊗ ⊗ ⊗ ⊗
3|Y + κ 1|Y ⊗ κ 2|Y ⊗ κ 3|Y ,

and obtain

κ⊗
1:3 = E κ⊗ ⊗ ⊗ −1 ⊗ ⊗
1:3|Y + κ 1|Y ⊗ κ 2,3|Y + K(132) κ 1,3|Y ⊗ κ 2|Y

+κ ⊗1,2|Y ⊗ κ ⊗
3|Y + κ ⊗
1|Y ⊗ κ ⊗
2|Y ⊗ κ ⊗
3|Y
 
−Eκ ⊗1|Y ⊗ Eκ ⊗
2,3|Y + Eκ ⊗
2|Y ⊗ κ ⊗
3|Y
 
−K−1 ⊗ ⊗ ⊗
(132) Eκ 1,3|Y + Eκ 1|Y ⊗ κ 3|Y ⊗ Eκ 2|Y

 
− Eκ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1,2|Y + Eκ 1|Y ⊗ κ 2|Y ⊗ Eκ 3|Y + 2Eκ 1|Y ⊗ Eκ 2|Y ⊗ Eκ 3|Y ,

Collecting the expected values according to formula of expression cumulants in


terms of moments, we get
   
κ⊗
1:3 = Cum 1 κ ⊗
1:3|Y + E κ ⊗
1|Y ⊗ κ ⊗ ⊗ ⊗
2,3|Y − Eκ 1|Y ⊗ Eκ 2,3|Y
   
+K−1(132) E κ ⊗
1,3|Y ⊗ κ ⊗
2|Y − Eκ ⊗
1,3|Y ⊗ Eκ ⊗
2|Y

+Eκ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1,2|Y ⊗ κ 3|Y − Eκ 1,2|Y ⊗ Eκ 3|Y + Eκ 1|Y ⊗ κ 2|Y ⊗ κ 3|Y

−Eκ ⊗ ⊗ ⊗
1|Y ⊗ Eκ 2|Y ⊗ κ 3|Y

−K−1 ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
(132) Eκ 1|Y ⊗ κ 3|Y ⊗ Eκ 2|Y − Eκ 1|Y ⊗ κ 2|Y Eκ 3|Y

+2Eκ ⊗ ⊗
1|Y ⊗ Eκ 2|Y ⊗ Eκ 3|Y

     
= Cum1 κ ⊗1:3|Y + Cum 2 κ ⊗
, κ
1,2|Y 3|Y

+ K −1
(132) Cum2 κ ⊗
, κ
1,3|Y 2|Y

   
+Cum2 κ ⊗ ⊗ ⊗ ⊗ ⊗
1|Y , κ 2,3|Y + Cum3 κ 1|Y , κ 2|Y , κ 3|Y

 
κ⊗ ⊗
1:3 = Cum1 κ 1:3|Y
     
+Cum2 κ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗
1,2|Y , κ 3|Y + K(132) Cum2 κ 1,3|Y , κ 2|Y + Cum2 κ 1|Y , κ 2,3|Y
 
+Cum3 κ ⊗ ⊗ ⊗
1|Y , κ 2|Y , κ 3|Y .
172 3 T-Moments and T-Cumulants

3.6.3 Proof of Lemma 3.2

Let X1:n = [X1 , X2 , . . . , Xn ] be n independent copies of X ∈ Rd , and ek be the


coordinate unit vectors of Rn . First we write


n
Y = vecX1:n = e k ⊗ Xk ,
k=1

then observe that the independence of the components of X1:n implies


 n   n 
 
Cumr (vecX1:n ) = Cumr e k ⊗ Xk = e⊗r
k ⊗ κ⊗
X,r ,
k=1 k=1

where κ ⊗
X,n denotes the nth-order T-cumulants of X. The Fourier transform of X1:n
writes as


n
 
3
X= Xk eiωk−1 = Fn ⊗ Id vecX1:n .
k=1

Now take the expected value of 3


X⊗n and obtain
 ⊗n
E3
X⊗n = Fn ⊗ Id Evec⊗n X1:n .

The expected value of Evec⊗n X1:n = EY⊗n can be expressed by κ ⊗


X,n as follows
from (3.43):
 ⊗
EY⊗n = K−1
p(K) κ⊗
Yb .
b∈K
K∈Pn

The commutator K−1 rearranges the T-product but whatever the permutation of
⊗  p(K)
 ⊗n
κ Yb we can apply Fn ⊗ Id one by one on the terms
 n 
 ⊗|b|  ⊗|b|
Fn ⊗ Id κ⊗
Yb = ek ⊗ κ⊗
X,|b|
k=1
 n 
 ⊗|b|
= F⊗|b|
n ek ⊗ κ⊗ ⊗
X,|b| = nδ|b|n κ X,|b| ,
k=1
3.6 Appendix 173

(see (3.64)). Thus


  ⊗n ⊗
Fn ⊗ Id K−1
p(K) κ⊗ ⊗
Yb = nκ X,n .
b∈K
K∈Pn

3.6.4 Proof of Lemma 3.5

Proof Let L ∈ Pd−1 and bj , j = 1 : k be blocks of L. We


#nuse the r−1
formula

for cumulants via moments (see (3.20), p. 127) κ1:n = (−1) (r −
#  r=1
1)! K{r} ∈Pn rj =1 μ∂bj ,


k  
κϑ∂b ,j =1:k = (−1)r (r − 1)! μ∂ϑb ,j ∈a
j j
r=1 K{r} ∈Pk a∈K{r}

before taking the derivative


k  
Dϑd κϑ∂b ,j =1:k = (−1)r (r − 1)! Dϑd μ∂ϑb ,j ∈a
j j
r=1 K{r} ∈Pk a∈K{r}


k 
= (−1)r (r − 1)!
r=1 K{r} ∈Pk
⎛ ⎞

k   
⎝ μ∂ + μ∂ϑd ,ϑb ,j ∈ai μ∂ϑb ,j ∈a\i ⎠ ,
ϑ (bi ,d ) ,ϑbj ,j ∈ai \i j j
i=1 a∈K{r} ,a=ai

where ai ∈ K{r} denotes the block which contains index i. If r = 1, then K{1} is
with one block a = (1 : k) and the product is empty.
Now, for each index i we have
⎛ ⎞
k  
(−1)r (r − 1)! ⎝μ∂ϑ μ∂ϑb ,j ∈a\i ⎠
b ,d ,ϑb ,j ∈ai \i ( i ) j j
r=1 K{r} ∈Pk a∈K{r} ,a=ai

= κϑ∂ b ,d .
( i ) ,ϑbj ,j =1:k\i

The second sum corresponds to the inclusive extension of the partitions of the set
1:k


k  
k 
(−1)r (r − 1)! μ∂ϑd ,ϑb ,j ∈ai μ∂ϑb ,j ∈a\i ,
j j
r=1 K{r} ∈Pk i=1 a∈K{r} ,a=ai
174 3 T-Moments and T-Cumulants

it includes all the terms from k + 1 order cumulant κϑ∂d ,ϑb ,j =1:k . The missing
 j
exclusive extensions like μ∂ϑd a∈K{r} μ∂ϑb ,j ∈a can be included since μ∂ϑd is zero.
j
We summarize the above and obtain the derivative


k
Dϑd κϑ∂b ,j =1:k = κϑ∂ b ,d + κϑ∂d ,ϑb ,j =1:k ,
j ( i ) ,ϑbj ,j =1:k\i j
i=1

showing exactly the same rule as derivative of moments.

3.7 Exercises

3.1 Let (X1 , X2 , X3 ) be 3 random variables. Show that

Cum (X1 , X2 X3 ) = κ(1,2,3) + κ(1,3)κ2 + κ(1,2)κ3 . (3.92)

Justify that all the partitions in the right-hand side expression of 3.92, viz,
{(1, 2, 3)} , {(2) , (1, 3)} , {(3) , (1, 2)} are indecomposable with respect to the initial
partition L = {(1) , (2, 3)}.
3.2 Equating (A.19) and the expression EX1 X2 Y by (3.39), where Y = X3 X4 , and
canceling terms which are common, show

Cum (X1 , X2 , X3 X4 ) = κ(1:4) + κ(1,2,3)κ4 + κ(1,2,4)κ3 + κ(1,4)κ(2,3) + κ(1,3)κ(2,4).


(3.93)

Justify that all the above partitions are indecomposable with respect to the initial
partition L = {(1) , (2) , (3, 4)}.
3.3 Let the initial partition be L = {(1, 2) , (3, 4)}, show

Cum (X1 X2 , X3 X4 ) = κ(1:4) + κ(1,2,3)κ4 + κ(1,2,4)κ3 + κ(1,3,4)κ2 + κ2,3,4κ1


+κ(1,3)κ(2,4) + κ(1,4)κ(2,3)
+κ(1,3)κ2 κ4 + κ(1,4)κ2 κ3 + κ(2,3)κ1 κ4 + κ(2,4)κ1 κ3 .

3.4 Let us suppose that the random variable (X1 , X2 , X3 , X4 ) is multivariate


normal with mean zero, then show

Cum (X1 X2 , X3 X4 ) = κ(1,3)κ(2,4) + κ(1,4) κ(2,3). (3.94)


3.7 Exercises 175

Suppose X1 = X2 = X3 = X4 = X, where X is a Gaussian random variable with


mean zero and variance σ 2 , and conclude (3.94)
     
Cum X2 , X2 = Cov X2 , X2 = 2Var X2 = 2σ 4 .

3.5 Let X be matrix valued random variate, show that


   ⊗2
Evec⊗4 X = Cum2 vec⊗2 X + Evec⊗2 X .

3.6 Let X1:4 be a list of random vectors, use (3.58) to show

κ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗
X1 ⊗X2 ,X3 ⊗X4 = κ (1:4) + κ (1,2,3) ⊗ κ 4 + K(1243) κ (1,2,4) ⊗ κ 3

+K−1 ⊗ ⊗ −1 ⊗ ⊗
(1342) κ (1,3,4) ⊗ κ 2 + K(2431) κ 2,3,4 ⊗ κ 1

+K−1 ⊗ ⊗ −1 ⊗ ⊗
(1423) κ (1,4) ⊗ κ (2,3) + K(1324) κ (1,3) ⊗ κ (2,4)

+K−1 ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗
(1423) κ (1,4) ⊗ κ 2 ⊗ κ 3 + K(1324) κ (1,3) ⊗ κ 2 ⊗ κ 4

+κ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1 ⊗ κ (2,3) ⊗ κ 4 + κ 1 ⊗ κ 2 ⊗ κ 3,4 .

Conclude
   
Cum2 X⊗2 , X⊗2 = κ ⊗ −1 −1
X,4 + Id + K(1243) + K(1342) + K(2431)
−1

   
κ⊗ ⊗ −1 −1
X,3 ⊗ κ X,1 + K(1324) + K(1423) κ X,2
⊗2

  
+ K−1 (1324) + K −1
(1423) κ ⊗
X,2 ⊗ κ ⊗2
X,1
  
+ Id + K−1 (1243) κ ⊗
X,1 ⊗ κ ⊗
X,2 ⊗ κ ⊗
X,1 .

3.7 Suppose the random variables (X1 , X2 , X3 , X4 ) have mean zero, then show

κ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
X1 ,X2 ,X3 ⊗X4 = κ (1:4) + K(1423) κ (1,4) ⊗ κ (2,3) + K(1324) κ (1,3) ⊗ κ (2,4) . (3.95)

3.8 Suppose that random variate (X1 , X2 , X3 , X4 ) is multivariate normal with mean
zero, then

κ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
X1 ⊗X2 ,X3 ⊗X4 = K(1423) κ (1,4) ⊗ κ (2,3) + K(1324) κ (1,3) ⊗ κ (2,4) (3.96)

Suppose X1 = X2 = X3 = X4 = X, where X is a Gaussian random variable with


mean zero and variance , then conclude (3.96)
 
κ⊗
X⊗2 ,X⊗2
= K −1
(1423) + K −1 ⊗2
(1324) κ X,2 ,
176 3 T-Moments and T-Cumulants

and
 
κ⊗
X⊗2 ,X⊗2
= Id 4 + K(1243) vec ⊗2 .

3.9 Take a random variable X ∈ Rd , find the first three cumulants for the
polynomial

f (X) = Q0 + Q1 X + Q2 X⊗2 + Q3 X⊗3 ,

where Qj are constant matrices with appropriate dimensions (see Exercise 2.22).
3.10 Let X1 , X2 , X3 ∈ Rd be independent, and identically distributed with the
random vector X, and let Y = vec [X1 , X2 , X3 ]. Then show
 ⊗2
μ⊗
Y,2 = 1 3 ⊗ κ ⊗
Y,1 + vec (I3 ) ⊗ κ ⊗
Y,2 ,

where 13 = [1, 1, 1] .


3.11 Show that, if n = 3 then
 ⊗3 
κ⊗
X,3|Y = E X − μ⊗
X|Y |Y .

If n = 4 then
 
κ⊗
X,4|Y = μ ⊗
X,4|Y − L−1
3,1 μ⊗
X,3|Y ⊗ μ⊗
X|Y

−L−1 ⊗2 −1 ⊗ ⊗2 ⊗4
2,2 μX,2|Y + 2L2,1,1 μX,2|Y ⊗ μX|Y − 6μX|Y ,

where the commutator matrices L−1 −1 −1


2,1,1 , L2,2 , and L3,1 , as usual in the other cases,
are collected in Sect. A.2, p. 353.
3.12 If X and Y are independent, then show

κXY ,3 = κX,3 κY 3 ,1 + 3κX,1κX,2 κY 2 ,Y + κX,1


3
κY,3

and

κXY ,3 = μY,3 μX,3 − 3μX,2 μY,2 μX,1 μY,1 + 2μ3X,1μ3Y,1 .

3.13 Show that 4th order cumulant of the product of two independent variates is

2 + 6κ
κXY ,4 = κX,4 κY,4 + 4κX,4 κY,3 κY,1 + 3κX,4 κY,2 2 4
X,4 κY,2 κY,1 + κX,4 κY,1
2 + 12κ
+ 4κX,3 κX,1 κY,4 + 12κX,3 κX,1 κY,3 κY,1 + 12κX,3 κX,1 κY,2 2
X,3 κX,1 κY,2 κY,1
3.7 Exercises 177

2 κ
+ 3κX,2 2 2 2 2 2
Y,4 + 12κX,2 κY,3 κY,1 + 6κX,2 κY,2 + 12κX,2 κY,2 κY,1
2 κ
+ 6κX,2 κX,1 2 2 2 4
Y,4 + 12κX,2 κX,1 κY,3 κY,1 + 12κX,2 κX,1 κY,2 + κX,1 κY,4 .

If κY,1 = κX,1 = 0, then conclude


2 2 2 2
κXY ,4 = κX,4 κY,4 + 3κX,4κY,2 + 3κX,2 κY,4 + 6κX,2 κY,2 .

3.14 Take a random variable Y , and express the cumulant of X1:3 in terms of
conditional covariances and cumulants
   
κ(1:3) = Eκ(1:3)|Y + Cov κ(1:2)|Y , κ3|Y + Cov κ(1,3)|Y , κ2|Y
   
+ Cov κ(2,3)|Y , κ1|Y + Cum κ1|Y , κ2|Y , κ3|Y .

In particular if all Xj = X, then conclude (in an obvious notation)


   
2
κX,3 = EκX,3|Y + 3Cov μX|Y , σX|Y + Cum3 μX|Y .

3.15 Let (X, Y ) be normal distributed, show that ECum3 (X|Y ) = 0.


3.16 Show that Cum (X1 , X2 , X3 , X4 |X4 ) = 0.
3.17 Let X and Y be independent random variables, then show that

κXY,2 = κX,2 κY,2 + κX,2 κY,1


2
+ κY,2 κX,1
2
.

3.18 Use conditional cumulants to show

κX,3 = μX3 − 3μX,2 μX + 2μ3X .

3.19 Show that

κXY,3 = μY,3 μX,3 − 3μX,2 μX μY,2 μY + 2μ3Y μ3X .

3.20 Let X, Y , Z, and W be independent random variables, show that


2 2 2
κXY Z,2 = κX,2 κY,2 κZ,2 + κX,2 κY,1 κZ,2 + κX,1 κY,2 κZ,2 + κX,2 κY,2 κZ,1

+κX,2 κY,1
2 2
κZ,1 + κX,1
2 2
κY,2 κZ,1 + κX,1
2 2
κY,1 κZ,2 .

If the means are 0, then

κXY,2 = κX,2 κY,2 ,


κXY Z,2 = κX,2 κY,2 κZ,2,
κXY ZW,2 = κX,2 κY,2 κZ,2κW,2 .
178 3 T-Moments and T-Cumulants

3.21 Let X1:3 and Y1:3 be independent, show that

κX1 Y1 ,X2 Y2 ,X3 Y3 |Y1:3 = κX1:3 κY1 Y2 Y3 ,1 + κX(1,2) κX3 κY1 Y2 ,Y3
+κX[1,3] κX2 κY1 Y3 ,Y2 + κX2,3 κX1 κY2 Y3 ,Y1
+κX1 κX2 κX3 κY1:3 .

3.22 Let X and Y be independent. Use formula for cumulants of product to show
that
 
2 2 4
κXY ,4 = κX,4 κY,4 + 4κY,3 κY,1 + 3κY,2 + 6κY,2 κY,1 + κY,1
 
+ κX,3 κX,1 4κY,4 + 12κY,3κY,1 + 12κY,2
2
+ 12κY,2 κY,1
2

 
2 2 2
+ κX,2 3κY,4 + 12κY,3κY,1 + 6κY,2 + 12κY,2κY,1
 
+ κX,2 κX,1
2
6κY,4 + 12κY,3κY,1 + 12κY,2
2
+ κX,1
4
κY,4 .

3.23 Let X = [X1 , X2 , . . . , Xn ], where the entries of X are i.i.d with X. Use the
T −derivative of the characteristic function to show

μ⊗ ⊗2
X,2 = κX,1 1d + κX,2 vec (Id ) .
2

3.24 (Example 3.19 Continued) Let X = [X1 , X2 , . . . , Xn ], where the entries of


X are i.i.d. with X, show that

μ⊗ ⊗3 −1
X,3 = κX,1 1d + 3κX,2 κX,1 L2,1 (vec (Id ) ⊗ 1d ) + κX,3
3
e⊗3
k ,

where L−1 ⊗3 −1 −1
2,1 = Id + K(132) + K(231) , compare this result with Example 3.19.
3.25 (See [Goo75].) Let X = [X1 , X2 , . . . , Xn ] be independent copies of a random
variable X. Let ωk = 2πk/n, k = 0, 1, 2, . . . , n − 1, 0 ≤ ωk < 2π, be Fourier
frequencies,

consider exp (iω k ) on the unit circle, and notice the periodicity. Define
Fn = 1, eiω1 , · · · ei(n−1)ω1 and use Exercise 3.27 to show

1   n
E X Fn = Cumn (X) .
n

3.26 Let X1 , X2 , X3 ∈ Rd be independent and identically distributed with random


vector X, and Y = vec [X1 , X2 , X3 ]. Then show that
 ⊗2
μ⊗ ⊗
Y,2 = 13 ⊗ κ X,1 + vec (I3 ) ⊗ κ ⊗
X,2 ,

where 13 = [1, 1, 1] .


3.7 Exercises 179

3.27 Let Y = vecX1:n , where Xk are independent, and identically distributed


random variates with random vector X ∈ Rd . Then show
 ⊗3   
μ⊗ ⊗
Y,3 = 13 ⊗ κ X,1 + L(1,2) 13 ⊗ κ ⊗ ⊗
X,1 ⊗ vec (I3 ) ⊗ κ X,2 + e⊗3 ⊗
k ⊗ κ X,3 .

3.28 Let Y = vecX1:n , where Xk are independent and identically distributed


random variates with random vector X ∈ Rd . Show that

Cum1 (vecY) = 1n ⊗ κ ⊗
X,1 ,

Cum2 (vecY) = vec (In ) ⊗ κ ⊗


X,2 ,

and

Cumn (vecY) = e⊗n ⊗
k ⊗ κ X,n .

3.29 Let X be a Gaussian random vector with EX = 0, A and B matrices with


appropriate dimensions, and Cov(X, X) = . Then
 
Cum X AX, X BX = 2TrAB , (3.97)

(see Taniguchi, 1991[ST08]).


3.30 Let X and Y be independent, show that

μY,3 μX,3 − 3μX,1 μX,2 μY,2 μY,1 + 2μ3X,1 μ3Y,1


= κX,3 κY,3 + 3κX,3 κY,2 κY,1 + 3κX,2 κX,1 κY,3
+6κX,2κX,1 κY,2 κY,1 + κX,3 κY,1
3
+ κX,1
3
κY,3

and conclude that both are κXY ,3 .


3.31 let X1:2 and Y be jointly normal variates show that

Cum2 (X1:2 |Y ) = Cum2 (X1:2 − E (X1:2 |Y )) ,

3.32 Use the rule of moments for expressing Dϑd μ∂ϑb ,ϑb2 ,ϑb3 , then show
1

Dϑd κϑ∂b ,ϑb2 ,ϑb3 = κϑ∂b ,ϑb2 ,ϑb3 ,ϑd +κϑ(b



,ϑb2 ,ϑb3 +κϑb1 ,ϑ(b

,ϑb3 +κϑb1 ,ϑb2 ,ϑ(b

.
1 1 1 ,d ) 2 ,d ) 3 ,d )

3.33 Show κϑ∂b = μ∂ϑb = 0, for any non-empty block b.


180 3 T-Moments and T-Cumulants

3.34 Take three blocks b1 , b2 , b3 of the set 1 : (d − 1). Show that

Dϑd μ∂ϑb ,ϑb2 ,ϑb3 = μ∂ϑ b ,ϑb2 + μ∂ϑb ,ϑ(b ,d ) + μ∂ϑb ,ϑb2 ,d .
1 ( 1 ,d ) 1 2 1

3.35 Show that Dϑd+1 Eϑ lϑ1:d = Eϑ lϑ(1:(d+1)) + Eϑ lϑ(1:d) lϑd+1 .

3.8 Bibliographic Notes

The connection between moments and characteristic functions is well known in


probability theory see [Fel66, Luk70] for instance.
We mention here some milestones in the theory of cumulants without claiming
completeness.
Cumulants and their applications in statistics were initiated by Thiélé, T.N.
(1889) [Thi97], see [Hal00], in more detail, and [Spe83, McC87]. The basic theory
of cumulants has been developed by Fisher [Fis30], Kendall [Ken44], and Kendall
and Stuart [KS77].
Whittle [Whi53] provided some results on higher-order cumulants of linear
functions of sample covariances of multiple stationary time series.
James [Jam58, JM62], investigated cumulants of functions of random variables
which includes the problem of cumulant of products via products of cumulants.
We have adopted Lukacs’s idea [Luk55], who used the Faà di Bruno’s Formula
to establish connections between moments and cumulants.
An important application of cumulants to higher-order spectra of non-Gaussian
stationary processes was started in Kolmogorov’s school. Leonov and Shiryaev
[LS59, Shi60, Leo64, Shi96] proved the expression of the cumulant of products via
products of cumulants using the notion of indecomposable partitions, and Malyshev
[Mal80].
The calculation of cumulants via conditioning is shown in a paper by Brillinger
[Bri69], see [McC18] also.
Cumulants proved to be very efficient tools for Limit Theorems and several
statistical methods. In the frequency domain approach to multivariate stationary
processes (see pioneering works of Brillinger [Bri65, BR67, Bri81, Bri82, Bri91],
among others). They are used in Central Limit Theorems, approximations of
densities, nonlinear statistics, etc. ([BNC79, BNC89])
A series of papers have been published by T. Speed [Spe83, Spe86a? , Spe86c,
SS88a, SS88b, Spe90, Spe90], containing several important results on cumulants
using deep mathematical tools such as the Mobius inversion formula.
The excellent works of McCullagh [McC87, McC18, MC86], use tensor analysis
for the cumulants of vectors. The advantage of tensors is that one can express
general formulae in terms of the indices.
One can consult with the books by Malyshev and Minlos [MM85], by Malahov
[Mal78], and by Peccati and Taqqu [PT11] for the theory of cumulants.
3.8 Bibliographic Notes 181

The method of using T −derivative for establishing results from a unified


perspective on cumulants of a list of random vectors was started by Jammalamadaka
et al (2006) [JST06] (see also [Ter02], and recently [JTT20]). A different approach
is considered in Kollo and von Rosen (2006) ([KvR06]) and Kollo (2008) [Kol08],
where they propose notions of minimal sets of all possible higher-order moments
and cumulants.
Expressions of cumulants in terms of moments using Fourier transform can be
found in [McC87], Exercise 2.25 p.55, [MM85, Goo75] and [RS00].
For moments in terms of moments and cumulants, we refer to [McC18]
Exercise 2.3 pp. 51 and [Mor82]. Cumulants of the partial derivatives of the log-
likelihood function were considered by Skovgaard, (1986) [Sko86], McCullagh and
Cox, (1986)[MC86]. The applications of multivariate higher-order cumulants and
moments can be found in [KBF+ 15, BKB+ 16, NGS08, NS09, LHL18, OST18,
BP07, BMP12, Wei15, BKB+ 16, VPO19], and [CECTBBG16]
Chapter 4
Gaussian Systems, T-Hermite
Polynomials, Moments, and Cumulants

Abstract Hermite polynomials have several applications in many fields of science.


We start with the classical Hermite polynomials of one variable, which constitute a
complete set of orthonormal polynomials in the nonlinear Hilbert space of Gaussian
variates. We use the method of generating functions for deriving multivariate
Hermite polynomials. Well known properties are listed and higher-order moments
and cumulants are considered in detail not only for Hermite polynomials but also
for Gaussian systems. These general formulae are given in connection with set
partitions. Clear, computationally simple expressions are given for the product
of two, three, and four Hermite polynomials in terms of linear combinations of
Hermite polynomials. We use our T-calculus to study multivariate vector-valued
Hermite polynomials. Most of the results for multivariate Hermite polynomials
(scalar-valued) are generalized to vector-valued cases with the help of commutators,
T-moments, and T-cumulants. We also establish relations between multivariate
Hermite polynomials and multiple vector-valued Hermite polynomials. The Gram–
Charlier expansion of multivariate distributions in terms of T-Hermite polynomials
closes the chapter.
We consider the so-called probabilists’ Hermite polynomials, which physicists
denote differently as H e.

4.1 Hermite Polynomials in One Variable

The sequence of Hermite polynomials Hn (x), n = 0, 1, 2, . . . , is defined by the


following conditions, where Hn (x) is a polynomial of degree n with positive leading
coefficient,
4 +∞ 1
Hn (x) Hm (x) √ exp(−x 2 /2) dx = n!δnm , n, m = 0, 1, 2, . . . .
−∞ 2π

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 183
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5_4
184 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants

These
 polynomials form a complete
 orthogonal system in the Hilbert space
L2 R, B, √1 exp(−x 2 /2)dx , which implies that every (Borel measurable)

function g by assumption
4 +∞ 1
g 2 (x) √ exp(−x 2 /2) dx < ∞,
−∞ 2π

can be expanded into the Hermite series



 gk
g (x) = Hk (x) , (4.1)
k!
k=0

where Hk are Hermite polynomials and the coefficients gk are calculated as


4 +∞ 1
gk = g (x) Hk (x) √ exp(−x 2 /2) dx.
−∞ 2π

The convergence of (4.1) is taken in the L2 sense, i.e.

4  2
+∞ 
n
gk 1
lim g (x) − Hk (x) √ exp(−x 2 /2) dx = 0.
n→∞ −∞ k! 2π
k=0

The following properties of Hermite polynomials are well known:


Property 4.1 Rodrigues’ formula:
 n
−x 2 /2 d
e−x
2 /2
e Hn (x) = (−1) n
.
dx

Property 4.2 Hn polynomials constitute an Appell sequence viz.:

d
H0 = 1, Hn (x) = nHn−1 (x) .
dx
Property 4.3 The generating function of Hn is

 Hn (x)
a n = exp(xa − a 2 /2).
n!
n=0

Note that Property 4.3 is understood as Taylor series in a for fixed x and an L2
Hermite series, see (4.1), when a is fixed.
A translation of these facts into the terminology of probability leads to the
following idea. Let X be a Gaussian random variable with mean 0 and variance
4.2 Hermite Polynomials of Several Variables 185

1. The set of all random variables Y which are measurable with respect to X with
finite variance, called variables depending on X, forms a Hilbert space. Let us
denote it by L2 (X). The inner product in L2 (X) is the covariance as usual, i.e. if
Y1 , Y2 ∈ L2 (X) then Y1 , Y2  = Cov(Y1 , Y2 ). Now it is easy to see that the random
variables Hn = Hn (X), n = 0, 1, 2, . . . form a closed orthogonal system in L2 (X).
Therefore if Y ∈ L2 (X), then there exists a Borel measurable function g so that
Y = g(X) and also

 gk
Y = Hk ,
k!
k=0

where gk = Cov(Y, Hk ) and


 

n
gk
lim Var Y − Hk = 0.
n→∞ k!
k=0

Remark 4.1 In physicists’ literature the polynomials we are studying here are
referred to as “probabilists’ Hermite polynomials” and are denoted by H e. Their
Hermite polynomials have similar formulae, which may be √ obtained from ours by
replacing the power of x with the corresponding power of 2x and multiplying the
entire sum by 2n/2 .

4.2 Hermite Polynomials of Several Variables

An interesting question to address is the following. Take a Gaussian system X =


{Xk , 0 ≤ k ≤ n}, which means that every finite linear combination


n
ak Xk , ak ∈ R,
k=0

of Xk is Gaussian. Let L2 (X) be the Hilbert space of all random variables depending
on X, i.e. measurable with respect to the system X and having finite variance. We
note that Hilbert space L2 (X) is sometimes called nonlinear to distinguish from the
linear Hilbert space L2 (X), which is generated by the system X, including only finite
linear combinations and their limits in the mean square sense. So every element
in the linear space L2 (X) is Gaussian but this is not the case in L2 (X). Now the
question is whether a closed orthogonal system of polynomials in Xk exists in L2 (X)
and what it looks like.
186 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants


X = {X
Let us consider a Gaussian system
k , k = 1, 2, ...} with EXk = 0, k =
1, 2, ... and covariance matrix C = σj k = Cov Xj , Xk . We define the Hermite
polynomials of several variables through the generator function
⎛ ⎞

n
1 
n
 (X1:n ,a1:n ) = exp ⎝ ak Xk − aj σj k ak ⎠ , (4.2)
2
k=1 j,k=1

as follows.
We consider the Taylor series expansion of  (X1:n ,a1:n ) by a1:n ,

 a m1:n
1:n
 (X1:n ,a1:n ) = Hm1:n .
m1:n !
m1:n ≥0

m
Then the coefficients Hm1:n of a1:n1:n , are polynomials of X1:n and depends on the
covariances
 σj k of X1:n  as well. The polynomials Hm1:n =
Hm1:n X1m1 , X21m2 , .., Xn1mn will be called Hermite polynomials; the
dependence of covariances σj k is understood through the Gaussian variables
X1:n . We repeat the notation for vectors of which some entries coincide 
[X1 , X1 , . . . , X1 , X2 , X2 , . . . , X2 , .., Xn , Xn , . . . , Xn ] = X1m1 , X21m2 , .., Xn1mn .
+ ,- . + ,- . + ,- .
m1 m2 mn
It is also worth using some more compact form for the exponent
   

n
1 1 

 (X1:n ,a1:n ) = exp ak Xk − aj σj k ak = exp a X− a Ca ,
2 2
k=1

#
where a = a1:n , and a Ca = σj k aj ak .
We emphasize that here and everywhere else in this book the zero expected values
of the Gaussian variables are assumed unless otherwise stated. Since there is no
restriction on the distinct values of variables of polynomials, we can use the distinct
values principle and we can simplify the general definition of Hm1:n by setting
m = |m1:n | = m1:n , then starting with only first-order derivatives of  by each
variable,

∂m
Hm (X1:m ) =  (X1:m ,a1:m ) . (4.3)
∂a1:m

Then equating
 each X1:m1 by Y1 , etc. Xmn−1 +1:mn , by Yn , we arrive at polynomial
Hm1:n Y1m1 , Y21m2 , .., Yn1mn . Let 1:n be the type of multi-index m1:n , then we
could use the notation
 
Hm1:n Y1m1 , Y21m2 , .., Yn1mn = H1:n (Y1:n ) (4.4)
4.2 Hermite Polynomials of Several Variables 187

 
as well. An instance of getting H3 (Y1 , Y1 , Y2 ) = H3 Y12 , Y2 , is deriving H3 of
distinct variables by the first derivatives of :

H3 (X1 , X2 , X3 ) = X1 X2 X3 − σ23 X1 − σ13 X2 − σ12 X3 ,

then change X1 and X2 into Y1 , X3 into Y2 , including the indices of covariances σ12
into σ11 and σ23 into σ12 , now we obtain
 
H3 Y12 , Y2 = Y12 Y2 − 2σ12 Y1 − σ11 Y2 .

Therefore we consider the Hermite polynomials of the form Hn (X1:n ).


In this way, as we shall see there will be nice, clear formulae for Hermite
polynomials.
 The results concerning
 Hn (X1:n ) will cover all cases of the form
Hm1:n X1m1 , X21m2 , .., Xn1mn as well, since there is no assumption of variables
of Hn (X1:n ) to be distinct.
We shall also follow another convention, setting random variates X1:n as
variables of Hn (instead of real numbers x1:n ), and as a consequence of this the
coefficients of the polynomials will depend on the covariances σj k of the variates
Xj and Xk . The Hermite polynomial with respect to a Gaussian

 system 
X1:n will be
understood together with its covariance structure C = Cov Xj , Xk , and with
zero expected values EXk = 0. There are direct methods for deriving Hermite
polynomials as follows:
1. Appell sequence
• H0 = 1,  
• ∂/∂Xk Hn (X1:n ) = Hn−1 X[1:n]\k , where [1 : n] \k denotes the index set
when k is missing from the set 1 : n,
• Hn (X1:n ) is a symmetric function and EHn (X1:n ) = 0.
2. Recurrence formula
• H0 = 1, H1 (X) = X,
• If n > 1,

  n−1
 
Hn (X1:n ) = Xn Hn−1 X1:(n−1) − σj n Hn−2 X[1:(n−1)]\j , (4.5)
j =1

where, for instance


 if j = 1, then the variable X1 is missing from Hn−2 , i.e.
Hn−2 X2:(n−1) .
3. Exponent. We repeat here the method using the generating function (4.2)
 *
∂n 1 *
Hn (X1:n ) = exp a X − a Ca ** .
∂a1:n 2 a1:n =0
188 4 Gaussian Systems, T-Hermite Polynomials, Moments, and Cumulants

4. Conditional expectation. The method of conditional expectation is based on


rewriting the exponent in (4.2), using the characteristic function of X:
 
1    
exp a X − a Ca = exp a X E exp ia X .
2

Then take an independent copy Y = Y1:n of X = X1:n , and notice the usage of
conditional expectation
   
 (X1:n ,a1:n ) = exp a X E exp ia X
       
= E exp a X |X1:n E exp ia Y |X1:n
   
= E exp a (X + iY) |X1:n ,

now the series expansion of the exponent exp (a (X + iY)) provides directly the
Hermite polynomials
 n 
  
Hn (X1:n ) = E (X1:n + iY1:n ) |X1:n = E
1n
(Xk + iYk ) |X1:n .
k=1

Therefore we can obtain Hn (X1:n ) from the conditional expectation


   *

n 
n *
*
E (Xk + iYk ) |X1:n = E (xk + iXk ) * . (4.6)
*
k=1 k=1 x1:n =X1:n

Example 4.1 Let us take n = 4 and consider (4.6). Here we shall use two facts,
which will be shown later, one is that the expected values of Gaussian products of
odd orders are zero and the other one is

EX1 X2 X3 X4 = σ12 σ34 + σ13 σ24 + σ14 σ23 ,

(see (4.9)). We neglect zero terms hence


 

4 
4
E (xk + iXk ) = xk − σ12 x3 x4 − σ13 x2 x4 − σ14 x2 x3 − σ23 x1 x4
k=1 k=1
−σ24 x1 x3 − σ34 x1 x2 + σ12 σ34 + σ13 σ24 + σ14 σ23 .
4.2 Hermite Polynomials of Several Variables 189

Now we plug x1:n = X1:n and obtain


4
H4 (X1:4 ) = Xk − σ12 X3 X4 − σ13 X2 X4 − σ14 X2 X3 − σ23 X1 X4
k=1
−σ24 X1 X3 − σ34 X1 X2 + σ12 σ34 + σ13 σ24 + σ14 σ23 ,

cf. Sect. A.5, p. 363.


Note again that the Hermite polynomials depend on the covariance structure of
the underlying Gaussian system.
The variables of Hermite polynomials are taken from a Gaussian random system
and their covariance structure also belongs to the definition of Hermite polynomials.
The first 5 polynomials are listed in Sect. A.5, p. 363. Those formulae are valid for
the case when either some or all Xj are identical to, say, X.
Some properties of Hermite polynomials of several variables:
Property 4.4 (Independent Variables) It is easy to see that if X1:k and X(k+1):n are
independent random vector variates, then

Hn (X1:n ) = Hk (X1:k )Hn−k (X(k+1):n ).

Property 4.5 (Symmetry) It is symmetric

Hn (X1:n ) = Hn (Xp(1:n) ),

for any permutation p ∈ Pn .


Property 4.6 (Multilinear) If {Y, Z, X1:n } is a Gaussian system and a, b are real
numbers, then

Hn+1 (aY + bZ, X1:n ) = aHn+1 (Y, X1:n ) + bHn+1 (Z, X1:n ),

i.e. Hn is multilinear, cf. Exercise 4.1, p. 234. We can apply the multilinear property
on each variable one after another

$$
H_2\left(a_1 X_1 + a_2 X_2,\, a_1 X_1 + a_2 X_2\right) = a_1 H_2\left(X_1, a_1 X_1 + a_2 X_2\right) + a_2 H_2\left(X_2, a_1 X_1 + a_2 X_2\right)
= a_1^2 H_2\left(X_1, X_1\right) + 2 a_1 a_2 H_2\left(X_1, X_2\right) + a_2^2 H_2\left(X_2, X_2\right).
$$
One can transform $H_n(X)$ to depend on the standard variable, since $H_n(X) = \sigma^n H_n\left(\sigma^{-1} X\right)$.

Property 4.7 (Multinomial Expansion) A consequence of the previous property is the Multinomial Expansion Theorem:
$$
H_n\left( \sum_{k=1}^{n} a_k X_k \right) = \sum_{0 \le k_{1:n};\ \sum k_{1:n} = n} \binom{n}{k_{1:n}} a_{1:n}^{k_{1:n}} H_n\left( X_{k_{1:n}} \right).
$$
In particular, we have the binomial expansion
$$
H_n\left( a X_1 + b X_2 \right) = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} H_n\left( X_1 1_k, X_2 1_{n-k} \right).
$$
Moreover, in the case of the independence of $X_1$ and $X_2$ we have
$$
H_n\left( a X_1 + b X_2 \right) = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} H_k\left( X_1 \right) H_{n-k}\left( X_2 \right).
$$
Remark 4.2 Since $X_k = X$, $k = 1, 2, \ldots, n$, is a Gaussian system, the above definitions are generalizations of those given in the previous Sect. 4.1, i.e. let $\operatorname{Var}(X) = 1$; then $H_n(X) = H_n(\underbrace{X, X, \ldots, X}_{n})$, for example $H_2(X, X) = H_2(X) = X^2 - 1$.

4.3 Moments and Cumulants for Gaussian Systems

Next, we will show some general results on moments and cumulants for Gaussian
systems and Hermite polynomials.

4.3.1 Moments of Gaussian Systems and Hermite Polynomials

Let us consider the higher-order moments for a Gaussian system X = {Xk } with
EXk = 0 and covariance σj k = Cov(Xj , Xk ). We have expressed the joint moments
in terms of cumulants, (see (3.40), p. 138), since cumulants of order larger than 2
are zero, the covariances remain only in the expression

$$
\mathsf{E} \prod_{k=1}^{n} X_k = \mu_{(1:n)} = \sum_{r=1}^{n} \sum_{K_{\{r\}} \in \mathcal{P}_n} \prod_{j=1}^{r} \kappa_{X, b_j}.
$$

This implies that the summation is taken over all partitions of pairs $K^{II} \in \mathcal{P}^{II}_n$ of the set $1:n = \{1, 2, \ldots, n\}$; therefore,
$$
\mu_{(1:n)} = \sum_{K^{II} \in \mathcal{P}^{II}_n} \ \prod_{(j,k) \in K^{II}} \sigma_{jk}.
$$
We have seen that the number of partitions $K^{II}$ is $S^{II}_n = (n-1)!!$ if $n$ is even (cf. (1.51), p. 43); therefore, we obtain
$$
\mu_{(1:n)} = \sum_{(n-1)!!} \ \prod_{m=1}^{n/2} \sigma_{j_m k_m}, \tag{4.7}
$$
where $(j_m, k_m) \in K^{II}$. Note $\mathsf{E} \prod_{k=1}^{n} X_k = 0$ for odd $n$, and $\mathsf{E} X^{n} = (n-1)!!\,\sigma^{n}$ for even $n$. In practice we use the inclusive–exclusive method for generating partitions of pairs, which serve as indices of $\sigma_{jk}$ (see Sect. 1.4, p. 26). The following algorithm
$$
\mu_{(1:2n)} = \sum_{k=1}^{2n-1} \sigma_{k,2n}\, \mu_{(1:(2n-1))\setminus k}, \tag{4.8}
$$
can be applied, starting from $n = 1$; the index $(1:(2n-1))\setminus k$ means that $k$ is missing from the set $1:(2n-1)$.
Example 4.2 For instance let n = 4, then

μ(1:4) = EX1 X2 X3 X4 = σ12 σ34 + σ13 σ24 + σ14 σ23 . (4.9)

Observe that there is no restriction preventing the $X_j$ from coinciding; hence if $X_1 = X_2$ and $X_3 = X_4$, then
$$
\mathsf{E} X_1^2 X_3^2 = \sigma_1^2 \sigma_3^2 + 2\sigma_{13}^2,
$$
and if $X_j = X$, $j = 1:4$, then
$$
\mu_{X,4} = 3\sigma^4.
$$
If $n = 6$, then we obtain
$$
\mu_{1:6} = \sigma_{16}\left( \sigma_{23}\sigma_{45} + \sigma_{24}\sigma_{35} + \sigma_{25}\sigma_{34} \right) + \sigma_{26}\left( \sigma_{13}\sigma_{45} + \sigma_{14}\sigma_{35} + \sigma_{15}\sigma_{34} \right) + \sigma_{36}\left( \sigma_{12}\sigma_{45} + \sigma_{14}\sigma_{25} + \sigma_{15}\sigma_{24} \right) + \sigma_{46}\left( \sigma_{12}\sigma_{35} + \sigma_{13}\sigma_{25} + \sigma_{15}\sigma_{23} \right) + \sigma_{56}\left( \sigma_{12}\sigma_{34} + \sigma_{13}\sigma_{24} + \sigma_{14}\sigma_{23} \right), \tag{4.10}
$$

by recursion (4.8) from n = 4, and if Xj = X, j = 1 : 6, then

μX,6 = 15σ 6 .
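As a minimal numerical sketch (assuming plain NumPy; the function name gaussian_moment is ours), the recursion (4.8) gives the joint moment from the covariances alone, and it reproduces (4.9):

import numpy as np

def gaussian_moment(C, idx):
    """E prod_{k in idx} X_k for a zero-mean Gaussian vector with covariance C, via (4.8)."""
    if len(idx) == 0:
        return 1.0
    if len(idx) % 2 == 1:
        return 0.0
    *rest, last = idx
    return sum(C[rest[k], last] * gaussian_moment(C, tuple(rest[:k] + rest[k + 1:]))
               for k in range(len(rest)))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
C = A @ A.T                                     # a valid covariance matrix
lhs = gaussian_moment(C, (0, 1, 2, 3))          # E X1 X2 X3 X4
rhs = C[0, 1] * C[2, 3] + C[0, 2] * C[1, 3] + C[0, 3] * C[1, 2]   # (4.9)
print(np.isclose(lhs, rhs))                     # True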

The first-order moments of the Hermite polynomials are zero by the construction (see Recursion 1); the second-order ones are
$$
\mathsf{E} H_n\left(X_{k_{1:n}}\right) H_m\left(X_{j_{1:m}}\right) = \delta_{nm} \sum_{p} \prod_{s=1}^{n} \sigma_{k_s, j_{p(s)}}, \tag{4.11}
$$
where $k_{1:n} = (k_1, k_2, \ldots, k_n)$, $j_{1:m} = (j_1, j_2, \ldots, j_m)$, and the summation is taken over all permutations $j_{p(1:m)}$ of the indices $(j_1, j_2, \ldots, j_m)$; it contains $n!$ terms, while the order of $(k_1, k_2, \ldots, k_n)$ is fixed.
Example 4.3 In particular
$$
\mathsf{E} H_n(X_j) H_m(X_k) = \delta_{nm}\, n!\, \sigma_{jk}^n,
$$
and
$$
\mathsf{E} H_n^2(X) = n!\, \sigma^{2n}.
$$
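A quick Monte Carlo sketch of (4.11) and Example 4.3 (assuming NumPy; H below rescales the probabilists' Hermite polynomials to variance $\sigma^2$, i.e. $H_n(x) = \sigma^n \operatorname{He}_n(x/\sigma)$):

import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def H(x, n, sigma):
    """Hermite polynomial H_n for an N(0, sigma^2) variable."""
    c = np.zeros(n + 1); c[n] = 1.0
    return sigma**n * hermeval(x / sigma, c)

rng = np.random.default_rng(1)
C = np.array([[1.0, 0.6], [0.6, 2.0]])          # sigma_jk = 0.6
Z = rng.multivariate_normal([0.0, 0.0], C, size=1_000_000)
s1, s2 = np.sqrt(C[0, 0]), np.sqrt(C[1, 1])
# E H_3(X_j) H_3(X_k) = 3! sigma_jk^3, while different orders are uncorrelated
print(np.mean(H(Z[:, 0], 3, s1) * H(Z[:, 1], 3, s2)), math.factorial(3) * C[0, 1]**3)
print(np.mean(H(Z[:, 0], 2, s1) * H(Z[:, 1], 3, s2)))   # approx. 0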

Proposition 4.1 Let us take a partition $L = (b_1, b_2, \ldots, b_m)$ of $1:n$ with $|b_j| = k_j$, i.e. the number of indices in the set $b_j$ is $k_j$. Then
$$
\mathsf{E} \prod_{j=1}^{m} H_{k_j}\left( X_{b_j} \right) = \sum_{\left\{ K^{II} \,\mid\, \left( L, K^{II} \right)_{nl} \right\}} \ \prod_{(j,k) \in K^{II}} \sigma_{jk}, \tag{4.12}
$$
where the summation is taken over all diagrams $\left( L, K^{II} \right)$ without loops. If the set $\left\{ K^{II} \,\mid\, \left( L, K^{II} \right)_{nl} \right\}$ is empty, then the expectation is zero.
In particular, if $p = 2$, i.e. $L = (b_1, b_2)$, then the diagram $\left( L, K^{II} \right)$ without loop corresponds to a partition $K^{II}$ containing pairs $(k_1, k_2)$ so that $k_1 \in b_1$ and $k_2 \in b_2$. One can reach all such types of partitions by fixing the first entries and permuting the second ones, see (4.11).
Example 4.4 Take $m = 3$ and $k_j = 2$, $j = 1, 2, 3$; then (4.12) gives
$$
\mathsf{E} H_2(X_1, X_2) H_2(X_3, X_4) H_2(X_5, X_6) = \sigma_{23}\sigma_{45}\sigma_{16} + \sigma_{23}\sigma_{46}\sigma_{15} + \sigma_{24}\sigma_{35}\sigma_{16} + \sigma_{24}\sigma_{36}\sigma_{15} + \sigma_{25}\sigma_{36}\sigma_{41} + \sigma_{25}\sigma_{46}\sigma_{31} + \sigma_{26}\sigma_{35}\sigma_{41} + \sigma_{26}\sigma_{45}\sigma_{31}. \tag{4.13}
$$

4.3.2 Cumulants for Product of Gaussian Variates and


Hermite Polynomials

Now we consider a well known characterization of Gaussian families. A system


X1:n of random variables is Gaussian if and only if

Cum (Xb ) = 0,

for any block b ⊆ 1 : n, with cardinality |b| > 2.


We have considered an expression for cumulants of products in terms of products of cumulants; see (3.48), p. 144. In general, when a partition $L = \left(b_1, b_2, \ldots, b_p\right)$ of $(1:n)$ is given, then
$$
\operatorname{Cum}\left( X_L \right) = \sum_{L \cup K = O} \ \prod_{b \in K} \operatorname{Cum}\left( X_b \right), \tag{4.14}
$$
where $X_L$ denotes the vector of products of variables $\left( X_{b_1}, X_{b_2}, \ldots, X_{b_p} \right)$.
Now if the system $X_{1:n}$ is Gaussian (with zero mean), then all cumulants are zero on the right-hand side of (4.14) except the second-order ones. Therefore the summation is taken only for $K^{II} \in \mathcal{P}^{II}_n$. The assumption that $L \cup K^{II} = O$, i.e. the partitions $L$ and $K^{II}$ are indecomposable, is equivalent to the assumption that the graph $\left( L, K^{II} \right)$ is closed (see Sect. 1.4.8, p. 43). Thus we have the formula
Proposition 4.2
$$
\operatorname{Cum}\left( X_L \right) = \sum_{\left\{ K^{II} \mid \left( L, K^{II} \right)_{cl} \right\}} \ \prod_{(i,j) \in K^{II}} \sigma_{ij}. \tag{4.15}
$$
If the set of partitions $K^{II}$ of the closed diagram $\left( L, K^{II} \right)_{cl}$ is empty, then the cumulant is zero. This implies that $n$ must be an even number.
Example 4.5 We let L = {b1 = (1) , b2 = (2 : 4)}, then

Cum(XL ) = Cum (X1 , X2 X3 X4 ) = σ12 σ34 + σ13 σ24 + σ14 σ23 ,

cf. (4.9).
The second-order cumulant of two Hermite polynomials coincides with the expected value of their product,
$$
\operatorname{Cum}\left( H_n\left(X_{k_{1:n}}\right), H_m\left(X_{j_{1:m}}\right) \right) = \mathsf{E} H_n\left(X_{k_{1:n}}\right) H_m\left(X_{j_{1:m}}\right) = \delta_{nm} \sum_{n!} \prod_{s=1}^{n} \sigma_{k_s, j_{p(s)}},
$$
cf. (4.11), since the expected values of Hermite polynomials are zero.

 
Proposition 4.3 Take a partition $L = \left(b_1, b_2, \ldots, b_p\right)$ of $1:n$ such that $|b_j| = n_j$; then
$$
\operatorname{Cum}\left( H_{n_1}(X_{b_1}), H_{n_2}(X_{b_2}), \ldots, H_{n_p}(X_{b_p}) \right) = \sum_{\left\{ K^{II} \mid \left( L, K^{II} \right)_{cnl} \right\}} \ \prod_{(j,k) \in K^{II}} \sigma_{jk}, \tag{4.16}
$$
where the summation is over all closed diagrams $\left( L, K^{II} \right)_{cnl}$ without loops.
Again, if the set of partitions $K^{II}$ of the closed diagram $\left( L, K^{II} \right)$ without a loop is empty, then the cumulant is zero; for instance, if $n$ is odd the set $\mathcal{P}^{II}_n$ is empty, therefore the cumulant is zero.
Remark 4.3 Using partition matrices U, see Sect. 1.4, one can construct algorithms
for generating partitions for all diagrams above serving to obtain moments and
cumulants.
Suppose that all variables below in the examples of this subsection are jointly
Gaussian with mean zero. The following formulae will be useful for calculating
higher-order spectra of particular random processes.
Example 4.6 Let $X_{1:2k}$ be jointly Gaussian with mean zero; then the formula (4.16) takes the following form:
$$
\operatorname{Cum}\left( H_2(X_1, X_2), H_2(X_3, X_4), \ldots, H_2(X_{2k-1}, X_{2k}) \right) = \sum_{2^{k-1}(k-1)!} \ \prod_{(k_m, k_n) \in K^{II}} \operatorname{Cov}\left( X_{k_m}, X_{k_n} \right), \tag{4.17}
$$
where the summation is over all closed diagrams $\left( L, K^{II} \right)_{cnl}$ without loops and $L = (b_1, \ldots, b_k)$, $b_j = (2j-1, 2j)$; the number of these diagrams is $2^{k-1}(k-1)!$ (see Case 1.7, p. 46). This example is a particular case of cumulants of products expressed by the products of cumulants for Gaussian variables, since the cumulant of the products $X_{2j-1} X_{2j}$ equals the cumulant of $H_2\left(X_{2j-1}, X_{2j}\right)$. The particular cases of Formula (4.17) for $k = 2, 3, 4$ give us the following:
k = 2. Let k = 2, then formula (4.17) implies

Cum (H2 (X1 , X2 ) , H2 (X3 , X4 )) = σ23 σ41 + σ24 σ31 .

Compare this result with Cum (X1 X2 , X3 X4 ), when variables are centered
Gaussian ones. Two particular cases follow
2
Cum (H2 (X1 ) , H2 (X2 )) = 2σ12 ,

and

Cum (H2 (X) , H2 (X)) = 2σ 4 .

k = 3. Let k = 3, then one can show that Cum (H2 (X1 , X2 ) , H2 (X3 , X4 ) ,
H2 (X5 , X6 )) coincides with (4.13), recall the connection between third-order
cumulants and third-order central moments, cf. (3.21), p. 129. Compare this
result with cumulant of products Cum (X1 X2 , X3 X4 , X5 X6 ), when all variables
are Gaussian, see Example 1.32. Assume further X2j −1 = X2j , then we have

Cum (H2 (X1 ) , H2 (X2 ) , H2 (X3 )) = 8σ12 σ23 σ31 ,

see (4.18) below. In particular

Cum3 (H2 (X)) = Cum (H2 (X) , H2 (X) , H2 (X)) = 8σ 6 .

k = 4. If k = 4, and X2j −1 = X2j = Xj , then by (4.17) we have

Cum (H2 (X1 ) , H2 (X2 ) , H2 (X3 ) , H2 (X4 ))


= 8 (σ12 σ23 σ34 σ41 + σ12 σ24 σ43 σ31 + σ13 σ32 σ24 σ41 + σ13 σ34 σ42 σ21
+σ14 σ42 σ23 σ31 + σ14 σ43 σ32 σ21 ) .

In particular
$$
\operatorname{Cum}_4\left( H_2(X) \right) = 48\sigma^8,
$$
since $2^3\, 3! = 48$.
Remark 4.4 We have seen in Example 3.6, p. 119, that the cumulants of the Gamma distribution $\Gamma(\mu, \alpha)$ are
$$
\kappa_{X,n} = \frac{(n-1)!\,\mu}{\alpha^n};
$$
setting $\mu = 1/2$ and $\alpha = 1/\left(2\sigma^2\right)$ we obtain
$$
\kappa_{X,n} = 2^{n-1} (n-1)!\, \sigma^{2n},
$$
which is exactly the cumulant $\operatorname{Cum}_n\left(H_2(X)\right)$, $X \in \mathcal{N}\left(0, \sigma^2\right)$, except for $n = 1$ (see (4.17)). It follows that the distribution of $H_2(X) + \sigma^2$ is $\Gamma\left(1/2, 1/\left(2\sigma^2\right)\right)$.
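A minimal Monte Carlo sketch of this connection (assuming NumPy and SciPy; scipy.stats.kstat returns unbiased estimates of the first four cumulants):

import math
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sigma = 1.3
X = rng.normal(0.0, sigma, size=1_000_000)
H2 = X**2 - sigma**2                            # H_2(X)

for n in (2, 3):                                # Cum_n(H_2(X)) = 2^{n-1}(n-1)! sigma^{2n}
    theory = 2**(n - 1) * math.factorial(n - 1) * sigma**(2 * n)
    print(n, round(stats.kstat(H2, n), 3), round(theory, 3))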
We consider some more special cases of the general formula (4.16) when the
Hermite polynomials only depend on one Gaussian variates, and we have clear
formulae for cumulants.

Case 4.1 (Cumulant of 2 Hermite Polynomials) General formula for cumulant of


two Hermite polynomials coincide with the expected value of the product, cf. (4.11),
   
Cum Hn Xj , Hm (Xk ) = δnm n!σjnk .
 
Observe the connection to the closed diagram L, K II , where L = (b1 , b2 ), |b1 | =
|b2 | = n.
Case 4.2 (Cumulant of 3 Hermite Polynomials) Let us consider now the case $\operatorname{Cum}\left( H_{r_1}(X_1), H_{r_2}(X_2), H_{r_3}(X_3) \right)$. Applying (4.16), two problems arise. One is the possible covariances, which is simple since the product $\sigma_{12}\sigma_{23}\sigma_{31}$ is the only possible choice. The next problem is finding all possible closed diagrams $\left( L, K^{II} \right)$ without loops, where $L = (b_1, b_2, b_3)$, with $|b_j| = r_j$. Actually, only the number of blocks of the same type is necessary since we do not differentiate between the elements of a block $b_j$. We are interested in closed diagrams without loops when $r_1 + r_2 + r_3 = 2r$, i.e. the sum should be an even number. The number of partitions $K^{II}$ of pairs is
$$
S^{II}\left(r_{1:3}\right) = \binom{r_1}{r - r_2} \binom{r_2}{r - r_1} r_3!\, (r - r_3)!,
$$
$r_i \le r$ (see (1.53), p. 45), so we have
$$
\operatorname{Cum}\left( H_{r_1}(X_1), H_{r_2}(X_2), H_{r_3}(X_3) \right) = S^{II}\left(r_{1:3}\right)\, \sigma_{12}^{r - r_3} \sigma_{23}^{r - r_1} \sigma_{31}^{r - r_2}. \tag{4.18}
$$

Consider the example $\operatorname{Cum}\left( H_1(X_1), H_1(X_2), H_{20}(X_3) \right)$; here $r = 11 < 20$, and it is clear that there are no closed diagrams without loops; therefore, $\operatorname{Cum}\left( H_1(X_1), H_1(X_2), H_{20}(X_3) \right) = 0$. As far as $r_i \le r$, i.e.
$$
r_i \le \frac{r_1 + r_2 + r_3}{2},
$$
for instance $r_1 \le r_2 + r_3$; in other words, if the triangle inequality is valid then the formula (4.18) is also valid, otherwise the cumulant is zero. In particular, if the $X_j$ coincide then
$$
\operatorname{Cum}\left( H_{r_1}(X), H_{r_2}(X), H_{r_3}(X) \right) = S^{II}\left(r_{1:3}\right)\, \sigma^{2r}. \tag{4.19}
$$
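The count $S^{II}(r_{1:3})$ and the identity (4.19) are easy to check numerically. In the sketch below (assuming NumPy; the helper names S2 and H are ours) the third-order joint cumulant equals the plain expectation of the product, because each Hermite polynomial has zero mean:

import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def S2(r1, r2, r3):
    """Number of closed loop-free diagrams, the count S^II(r_{1:3}) of Case 4.2."""
    if (r1 + r2 + r3) % 2:
        return 0
    r = (r1 + r2 + r3) // 2
    if max(r1, r2, r3) > r:
        return 0
    return (math.comb(r1, r - r2) * math.comb(r2, r - r1)
            * math.factorial(r3) * math.factorial(r - r3))

def H(x, n, sigma):
    c = np.zeros(n + 1); c[n] = 1.0
    return sigma**n * hermeval(x / sigma, c)    # H_n for variance sigma^2

rng = np.random.default_rng(3)
sigma, (r1, r2, r3) = 0.8, (2, 3, 3)
X = rng.normal(0.0, sigma, size=2_000_000)
r = (r1 + r2 + r3) // 2
mc = np.mean(H(X, r1, sigma) * H(X, r2, sigma) * H(X, r3, sigma))
print(round(mc, 2), S2(r1, r2, r3) * sigma**(2 * r))      # both close to 36 * 0.8^8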

Case 4.3 (Cumulant of 4 Hermite Polynomials) Consider the case $\operatorname{Cum}\left( H_{r_1}(X_1), H_{r_2}(X_2), H_{r_3}(X_3), H_{r_4}(X_4) \right)$. We are interested in closed diagrams $\left( L, K^{II} \right)$ without loops, where $L = (b_1, b_2, b_3, b_4)$, with $|b_j| = r_j$; let $r_1 + r_2 + r_3 + r_4 = 2r$, i.e. the sum is an even number; see Case 1.8, p. 46, for details. We have
$$
\operatorname{Cum}\left( H_{r_1}(X_1), H_{r_2}(X_2), H_{r_3}(X_3), H_{r_4}(X_4) \right) = \sum\nolimits^{*} \frac{r_1! r_2! r_3! r_4!}{k_{12}! k_{13}! k_{14}! k_{23}! k_{24}! k_{34}!}\, \sigma_{12}^{k_{12}} \sigma_{13}^{k_{13}} \sigma_{14}^{k_{14}} \sigma_{23}^{k_{23}} \sigma_{24}^{k_{24}} \sigma_{34}^{k_{34}}, \tag{4.20}
$$
where the summation $\sum^{*}$ runs over all nonnegative integers $k_{12}, k_{13}, k_{14}, k_{23}, k_{24}, k_{34}$ such that $k_{12} + k_{13} + k_{14} + k_{23} + k_{24} + k_{34} = r$, and $r_1 = k_{12} + k_{13} + k_{14}$, $r_2 = k_{12} + k_{23} + k_{24}$, $r_3 = k_{13} + k_{23} + k_{34}$, $r_4 = k_{14} + k_{24} + k_{34}$. The number of all closed diagrams $\left( L, K^{II} \right)$ without loops is $S^{II}\left(r_{1:4}\right)$; see (1.55), p. 47 for details; hence
$$
\operatorname{Cum}\left( H_{r_1}(X), H_{r_2}(X), H_{r_3}(X), H_{r_4}(X) \right) = S^{II}\left(r_{1:4}\right)\, \sigma^{2r}. \tag{4.21}
$$
Note that the cumulant is zero if either $r_1 + r_2 + r_3 + r_4$ is odd or if $r_1 = \max\left(r_j\right)$, say, and $r_1 > r_2 + r_3 + r_4$.
The formula for the general case is more complicated, since the splitting of blocks for building loops is not unique.
There have been attempts to express the product of several Hermite polynomials in terms of a linear combination of Hermite polynomials. The coefficients of that linear form could serve for a general formula of the cumulant $\operatorname{Cum}\left( H_{r_1}(X), H_{r_2}(X), \ldots, H_{r_p}(X) \right)$, $p \ge 4$, similar to (4.19).

4.4 Products of Hermite Polynomials, Linearization

Let us recall that if $L = \left(b_1, b_2, \ldots, b_p\right)$ is a partition of $(1:n)$, then $X_L$ denotes a vector of dimension $p$, namely $X_L = \left( X_{b_1}, X_{b_2}, \ldots, X_{b_p} \right)$. Let us consider a partition $K^{I,II}$ with blocks having two elements at most. The diagram $\left( L, K^{I,II} \right)$ corresponds to a graph that might have free edges with respect to the blocks of $K^{I,II}$ with single elements. These free edges are called the arms of the diagram $\left( L, K^{I,II} \right)$. Let $\mathcal{P}^{I,II}_n$ denote the set of all partitions having blocks with one or two elements, i.e. $\mathcal{P}^{I,II}_n = \left\{ K^{I,II} \right\}$ (see Sect. 1.4.8, p. 43).
The graph $\left( L, K^{I,II} \right)$, $K^{I,II} \in \mathcal{P}^{I,II}_n$, has not only vertices $b \in L$ and edges $(k_1, k_2) \in K^{I,II}$ but arms $(m) \in K^{I,II}$ as well (see Example 1.33, p. 47, for a diagram with arms). We recall that arms are blocks with one entry in them. Denote by $d_K$ the number of arms in the partition $K^{I,II}$ and by $D_K = \left\{ m \mid (m) \in K^{I,II} \right\}$ the set of arms. Let $K^{I,II}_d$ denote a partition where the number of arms is $d$.

Proposition 4.4 The product $\prod_{k=1}^{n} X_k$ is expressed in terms of Hermite polynomials by
$$
\prod_{k=1}^{n} X_k = \sum_{d \equiv n \operatorname{mod}(2)} \ \sum_{K^{I,II}_d} \left( \prod_{(i,j) \in K^{I,II}_d} \sigma_{ij} \right) H_d\left( X_{D_K} \right), \tag{4.22}
$$
where the sum is taken over all $d \in 0:n$ with $d \equiv n \operatorname{mod}(2)$, actually $d = n, n-2, \ldots$; the second sum is taken over all partitions $K^{I,II}_d \in \mathcal{P}^{I,II}_n$; and where $X_{D_K} = \left\{ X_m \mid (m) \in D_K \right\}$. We note that it is necessary that the evenness of the number of arms $d_K$ be the same as the evenness of $n$, i.e. $d_K \equiv n \operatorname{mod}(2)$. This follows from the recurrence formula (4.5), since all the powers in a Hermite polynomial have the same evenness as their order, and the number $n - d$ must be even.
Example 4.7 Let us consider the formula (4.22) when $n = 4$. All possible choices for the number of arms $d$ are 4, 2, and 0. If $d = 4$, $K^{I,II}_4$ contains only arms, $K^{I,II} = \{(1), (2), (3), (4)\}$. If $d = 2$, $K^{I,II}_2$ contains 2 arms and 1 pair, an example being $K^{I,II} = \{(1), (2), (3,4)\}$ with $\sigma_{34} H_2(X_1, X_2)$. Finally, if $d = 0$, $K^{I,II}_0$ contains only pairs, for example $K^{I,II} = \{(1,2), (3,4)\}$, etc. We have
$$
\prod_{k=1}^{4} X_k = H_4\left(X_{1:4}\right) + \sigma_{12} H_2(X_3, X_4) + \sigma_{13} H_2(X_2, X_4) + \sigma_{14} H_2(X_2, X_3) + \sigma_{23} H_2(X_1, X_4) + \sigma_{24} H_2(X_1, X_3) + \sigma_{34} H_2(X_1, X_2) + \sigma_{12}\sigma_{34} + \sigma_{13}\sigma_{24} + \sigma_{14}\sigma_{23},
$$
and in particular
$$
X^4 = H_4(X) + 6\sigma^2 H_2(X) + 3\sigma^4.
$$

It is worth mentioning a particular case of (4.22) when all variables coincide, i.e. expressing $X^n$ in terms of Hermite polynomials. We collect the coefficients by $d_K$. If $d_K = k$, then there are $(n-k-1)!!$ partitions available and $\binom{n}{k}$ possible choices for the arms. If the number of arms is $d_K = k$, then $n - k$ must be even; the case of odd $n - k$ is handled by $(n-k-1)!!$, since it is zero if $n - k$ is odd, cf. (1.56), p. 48. Therefore we have
$$
X^n = \sum_{k=0}^{n} \binom{n}{k} (n-k-1)!!\, \sigma^{n-k} H_k(X) = \sum_{k=0}^{[n/2]} \frac{n!}{(n-2k)!\, k!\, 2^k}\, \sigma^{2k} H_{n-2k}(X). \tag{4.23}
$$

This latter formula is usually referred to as the inversion formula for the Hermite polynomials given by (4.24). Once again, here $(n-k-1)!! = 0$ unless $n \equiv k \operatorname{mod}(2)$, so there are $[n/2]$ terms to add in (4.23). These terms correspond to the orders $n, n-2, \ldots, n-2k, \ldots, n-2[n/2]$, with coefficient
$$
\binom{n}{n-2k} (2k-1)!! = \frac{n!}{(n-2k)!\, k!\, 2^k}.
$$
The Hermite polynomials $H_n\left(X_{1:n}\right)$ can also be expressed in terms of the products $X_{D_K} = \prod_{(m) \in D_K} X_m$, i.e.
$$
H_n\left(X_{1:n}\right) = \sum_{d \equiv n \operatorname{mod}(2)} \ \sum_{K^{I,II}_d} (-1)^{(n-d)/2} \left( \prod_{(i,j) \in K^{I,II}_d} \sigma_{ij} \right) X_{D_K},
$$
where the sum is taken over all partitions $K^{I,II}_d$ for which the evenness of the number of arms $d \in (0:n)$ is the same as that of $n$, i.e. $d \equiv n \operatorname{mod}(2)$.
The case when all variables coincide, i.e. $X_j = X$, $j = 1, \ldots, n$, implies
$$
H_n(X) = \sum_{k=0}^{[n/2]} \frac{(-1)^k n!}{(n-2k)!\, k!\, 2^k}\, X^{n-2k} \sigma^{2k}. \tag{4.24}
$$
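The pair (4.23)–(4.24) can be verified symbolically; a short sketch (assuming SymPy; the helper names Hn and x_power are ours):

import sympy as sp

X, sigma = sp.symbols('X sigma')

def Hn(n):
    """H_n(X) for variance sigma^2, formula (4.24)."""
    return sum((-1)**k * sp.factorial(n) / (sp.factorial(n - 2*k) * sp.factorial(k) * 2**k)
               * X**(n - 2*k) * sigma**(2*k) for k in range(n // 2 + 1))

def x_power(n):
    """X^n expanded over Hermite polynomials, formula (4.23)."""
    return sum(sp.factorial(n) / (sp.factorial(n - 2*k) * sp.factorial(k) * 2**k)
               * sigma**(2*k) * Hn(n - 2*k) for k in range(n // 2 + 1))

for n in range(1, 9):
    assert sp.expand(x_power(n) - X**n) == 0    # the two formulas are mutually inverse
print("(4.23) and (4.24) agree")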

Theorem 4.1 (Linearization of Products of Hermite Polynomials) Take a partition $L = \left(b_1, b_2, \ldots, b_p\right)$ of $(1:n)$ with cardinality $|b_j| = n_j$. Then
$$
\prod_{j=1}^{p} H_{n_j}\left( X_{b_j} \right) = \sum_{d \equiv n \operatorname{mod}(2)} \ \sum_{\left( L, K^{I,II}_d \right)_{cnl}} \left( \prod_{(j,k) \in K^{I,II}_d} \sigma_{jk} \right) H_{d_K}\left( X_{D_K} \right), \tag{4.25}
$$
where $d \in (0:n)$, and the summation is over all closed diagrams $\left( L, K^{I,II}_d \right)_{cnl}$ without loops.
In particular, if $p = 2$, $L = \{b_1, b_2\}$, and $b_1 = (1:n-1)$, $b_2 = (n)$, then all the partitions of $\mathcal{P}^{I,II}_n$ for which the diagram $\left( L, K^{I,II}_d \right)_{cnl}$ is without loop are $K^{I,II} = I = \{(1), (2), \ldots, (n)\}$ and $K^{I,II} = \{(1), (2), \ldots, (j-1), (j,n), (j+1), \ldots, (n-1)\}$, $j = 1, 2, \ldots, n-1$; therefore
$$
X_n H_{n-1}\left( X_{1:n-1} \right) = H_n\left( X_{1:n} \right) + \sum_{j=1}^{n-1} \sigma_{jn} H_{n-2}\left( X_{[1:(j-1),(j+1):(n-1)]} \right),
$$
since $X_n = H_1(X_n)$ and (4.25) is applied. This formula has been considered as the recursion formula for the Hermite polynomials (see the recurrence formula, p. 187).

Case 4.4 (Product of 2 Hermite Polynomials) The product of two Hermite polynomials $H_m(X_1)$ and $H_n(X_2)$ can be simplified as a special case of (4.25). One needs to collect the number of blocks in the partition $K^{I,II}$ with respect to the partition $L = \{b_1 = (1:m), b_2 = (m+1:m+n)\}$ when the number of arms is fixed, say $m + n - 2r$; it has been given in (1.57), p. 49. Since in that case the product of covariances is the same, i.e. $\sigma_{12}^r$, we have
$$
H_m(X_1) H_n(X_2) = \sum_{r=0}^{\min(m,n)} r! \binom{m}{r} \binom{n}{r} \sigma_{12}^r\, H_{m+n-2r}\left( X_1 1_{m-r}, X_2 1_{n-r} \right). \tag{4.26}
$$
When the variates $X_1$ and $X_2$ are identical, a particular case of (4.26) reads
$$
H_m(X) H_n(X) = \sum_{r=0}^{\min(m,n)} \frac{m!\, n!}{(m-r)!\, (n-r)!\, r!}\, \sigma^{2r} H_{m+n-2r}(X). \tag{4.27}
$$

(see formula 1.57, p. 49, for the number of terms in (4.27)).


Example 4.8 We use formula (4.26) and obtain
   
H2 (X1 ) H3 (X2 ) = H5 X12 , X213 + 6σ12 H3 X1 , X212 + 6σ1,2
2
H1 (X2 ) ,

in particular setting X1 = X2 = X, we have

H2 (X) H3 (X) = H5 (X) + 6σ 2 H3 (X) + 6σ 4 H1 (X) .

Example 4.9 The formula (4.27) implies

H42 (X) = H8 (X) + 16σ 2 H6 (X) + 72σ 4 H4 (X) + 96σ 6 H2 (X) + 24σ 8 . (4.28)
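Formula (4.27), and in particular (4.28), can be checked with the same symbolic machinery (assuming SymPy; the helper names are ours):

import sympy as sp

X, sigma = sp.symbols('X sigma')

def Hn(n):                                       # formula (4.24)
    return sum((-1)**k * sp.factorial(n) / (sp.factorial(n - 2*k) * sp.factorial(k) * 2**k)
               * X**(n - 2*k) * sigma**(2*k) for k in range(n // 2 + 1))

def product_lin(m, n):                           # right-hand side of (4.27)
    return sum(sp.factorial(m) * sp.factorial(n)
               / (sp.factorial(m - r) * sp.factorial(n - r) * sp.factorial(r))
               * sigma**(2*r) * Hn(m + n - 2*r) for r in range(min(m, n) + 1))

assert sp.expand(Hn(4)**2 - product_lin(4, 4)) == 0      # reproduces (4.28)
assert sp.expand(Hn(2) * Hn(3) - product_lin(2, 3)) == 0 # Example 4.8 with X1 = X2
print("linearization formula (4.27) verified")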

Case 4.5 (Product of 3 Hermite Polynomials) Now let us consider the product of
three Hermite polynomials Hn1 (X1 ) Hn2 (X2 ) Hn3 (X3 ). Let n = n1 + n2 + n3 ,
ri ≤ ni , and introduce the following notation for the cumulant:
 
Cr1:3 (X1:3 ) = Cum Hr1 (X1 ), Hr2 (X2 ), Hr3 (X3 ) ,

given in (4.18). Set 2r = r1 + r2 + r3 , we have


n   
n1:3
Hn1 (X1 ) Hn2 (X2 ) Hn3 (X3 ) = Cr1:3 (X1:3 ) Hn−2r
r1:3
r=0 r1 +r2 +r3 =2r
 
× X1n1 −r1 , X21n2 −r2 , X31n3 −r3 , (4.29)

with the assumption ri ≤ min(r, ni ). If X1 = X2 = X3 , then by (4.19)

Cr1:3 (X1:3 ) = S II (r1:3 ) σ 2r ,

where S II (r1:3 ) is given by (1.53), p. 45. Now we recall the quantity


 
n1:3 II n1:3 !
SnI,II (r1:3 ) = S (r1:3 ) = , (4.30)
1:3
r1:3 (n1:3 − r1:3 )! (r − r1:3 )!

where r = (r, r, r), (see (1.58), p. 49), and we obtain


⎛ ⎞

n 
Hn1 (X) Hn2 (X) Hn3 (X) = ⎝ SnI,II (r1:3 )⎠ σ 2r Hn−2r (X)
1:3
r=0 r1 +r2 +r3 =2r

from (4.29).
Now we present an example to show the use of these formulae.
Example 4.10 We consider the product $H_2(X_1) H_2(X_2) H_2(X_3)$ using (4.25) with $n_i = 2$:
$$
H_2(X_1) H_2(X_2) H_2(X_3) = H_6\left( X_1 1_2, X_2 1_2, X_3 1_2 \right) + 4\sigma_{23} H_4\left( X_1 1_2, X_2, X_3 \right) + 4\sigma_{12} H_4\left( X_1, X_2, X_3 1_2 \right) + 4\sigma_{13} H_4\left( X_1, X_2 1_2, X_3 \right) + 8\sigma_{12}\sigma_{13} H_2(X_2, X_3) + 8\sigma_{13}\sigma_{23} H_2(X_1, X_2) + 8\sigma_{12}\sigma_{23} H_2(X_1, X_3) + 2\sigma_{13}^2 H_2(X_2) + 2\sigma_{23}^2 H_2(X_1) + 2\sigma_{12}^2 H_2(X_3) + 8\sigma_{12}\sigma_{13}\sigma_{23}.
$$
One can check this expression using (4.29) and building the following table:

  r   r_i   r_j (j != i)   binom(n_{1:3}, r_{1:3})   S^II(r_{1:3})   Coefficient   # of terms
  0    0         0                    1                     1              1             1
  1    0         1                    4                     1              4             3
  2    2         1                    4                     2              8             3
  2    0         2                    1                     2              2             3
  3    2         2                    1                     8              8             1

where $r_i$ denotes one of the triple $r_{1:3}$, and $r_j$ the remaining two of them.
Particularly, if $X_k = X$, $k = 1:3$, then
$$
H_2(X) H_2(X) H_2(X) = H_6(X) + 12\sigma^2 H_4(X) + 30\sigma^4 H_2(X) + 8\sigma^6,
$$



observe that the terms like $\sigma_{13}\sigma_{23} H_2(X_1, X_2)$ and $\sigma_{13}^2 H_2(X_2)$ provide terms of the same type $\sigma^4 H_2(X)$, hence the number of them is $3 \cdot 8 + 3 \cdot 2 = 30$.

Case 4.6 (Product of 4 Hermite Polynomials) The case of product of four Her-
mite polynomials Hn1 (X1 ) Hn2 (X2 ) Hn3 (X3 ) Hn4 (X4 ) follows from the result of
Case 1.11 and (4.20). Let n = n1:4 , ri ≤ ni and
 
Cr1:4 (X1:4 ) = Cum Hr1 (X1 ), Hr2 (X2 ), Hr3 (X3 ), Hr4 (X4 ) ,

then we have

Hn1 (X1 ) Hn2 (X2 ) Hn3 (X3 ) Hn4 (X4 )


n  n1:4   
= Cr1:4 (X1:4 ) Hn−2r X1n1 −r1 , X21n2 −r2 , X31n3 −r3 , X41n4 −r4 .
r1:4
r=0 r1:4 =2r

In particular, for one variable case, we can use formula (1.59), p. 49, and obtain
⎛ ⎞

n 
Hn1 (X) Hn2 (X) Hn3 (X) Hn4 (X) = ⎝ SnI,II (r1:4 )⎠ σ 2r Hn−2r (X) ,
1:4
r=0 r1:4 =2r
(4.31)

where SnI,II
1:4 (r1:4 ) is analogue to (4.30).

Expressions similar to (4.26), (4.29), and (4.31) for the product of several
Hermite polynomials, are much more complicated, and therefore we omit them and
suggest using (4.25) instead.

4.5 T-Hermite Polynomials

We take a Gaussian system X = {Xk ∈ Rdk , k = 1, 2, . . .} where the dimensions


 dk
of components Xk are not necessarily equal. Let EXk = 0 and Cj,k = Cov Xj , Xk
denote the covariance matrix. Let us define the generator function by
⎛ ⎞
n
 1 
n

 (X1:n , a1:n ) = exp ⎝ a k Xk − ak Ck,j aj ⎠ , (4.32)
2
k=1 k,j =1

where the list of real vectors a1:n follows the structure of X1:n . Similarly to the
multivariate case, the right-hand side depends on covariance matrices Ck,j , which
correspond to the Gaussian system X1:n (recall EXk = 0) on left-hand side, the
relationship between them is one to one. We recall the connection κ ⊗j,k = vec Ck,j

between cumulants and covariance matrices, given in (3.9), p. 122. Observe that κ ⊗
j,k
is a second-order tensor with covariance entries. We can change covariance matrices
to cumulants in (4.32) and obtain
⎛ ⎞

n
 1 
n
⊗  
 (X1:n , a1:n ) = exp ⎝ Xk a k − κ j,k aj ⊗ ak ⎠ .
2
k=1 j,k=1

We shall consider the T-derivatives Da⊗1:n of  for defining T-Hermite polynomials.


Definition 4.1 Let X = {Xk , k = 1, 2, . . .} be a Gaussian system with EXk = 0
and covariance matrices Cj,k = Cov(Xj , Xk ). We define
 H0 = 1, and  for n ≥ 1,
the nth-order T-Hermite polynomial of the list Xk1:n = Xk1 . . . , Xkn as
   **
Hn Xk1:n = Da⊗k  Xk1:n , ak1:n * . (4.33)
1:n ak1:n =0

The variables of Hermite polynomials are taken from a Gaussian random system,
we emphasize that Hn (X1:n ) depends on the distribution of X1:n , more precisely on
covariance structure of X1:n , (we keep EX = 0).
The first derivative of the generator function  is
 
∂ 
n
Da⊗j  =⊗ =  (X1:n , a1:n ) Xj − Cj,u au ,
∂aj
u=1

which yields the first-order T-Hermite polynomial at a1:n = 0:


  *
*
H1 Xj = Da⊗j  * = Xj .
a1:n =0

Consider the second T-derivative of :


 

n
Da⊗j,k  = Da⊗k  (X1:n , a1:n ) Xj − Cj,u au
u=1
   

n 
n
=  (X1:n , a1:n ) Xj − Cj,u au ⊗ Xk − Ck,u au
u=1 u=1

− (X1:n , a1:n ) κ ⊗
j,k ,

where we obtained the second term using the equality

  ∂  
Cj,k ak ⊗ = Cj,k ⊗ Ik vec Ik = vec Ck,j = κ ⊗
j,k ,
∂ak

(see (3.9), p. 122). Hence the second-order T-Hermite polynomial follows:


  *
*
H2 Xj , Xk = Da⊗2 * = Xj ⊗ Xk − κ ⊗
j,k .
j,k
a1:n =0
 
The entries of
 H2 Xj , Xk are the second-order Hermite polynomials
H2 Xj,m , Xk,n . The only assumption to be checked  is whether the product
Xj,m Xk,n meets the corresponding covariance Cov Xj,m , Xk,n . To show this
we consider
     
H2 Xj , Xk = vec Xk X j − vec C k,j = vec X k X 
j − C k,j ,

(see (1.7), p. 7) and we see that all entries of the matrix Xk X j − Ck,j are the
corresponding Hermite polynomials. 
The T-Hermite polynomial Hn is with dimension d1:n and all entries are
Hermite polynomials of order n. Hn is not n-symmetric, since one cannot change
the order of the T-derivatives; (see Sect. 1.3.1, p. 13 for the notion of n-symmetric
vectors). The entries of Hn are Hermite polynomials; therefore, the entries are

symmetric as the function of scalar variables. For instance let X1 = [X1 , X2 ]1 ,

X2 = [X1 , X2 ]2 then

H2 (X1:2 )

        
= H2 X1,1 , X1,2 , H2 X1,1 , X2,2 , H2 X2,1 , X1,2 , H2 X2,1 , X2,2 ,
 
where Xk,j is the kth entry of  Xj . Each
 entry H2 Xi,j , Xm,n of H2 (X1:2 ) is
symmetric, i.e. H2 Xi,j , Xm,n = H2 Xm,n , Xi,j .
Now, it is satisfactory to index the variables of  up to the order of the Hermite
polynomial, let n = 3, and consider the third derivative,
  
 ⊗  
3
Da⊗1:3  = Da⊗3 ⊗
Da1:2  = Da3  (X1:3 , a1:3 ) X1 − C1,u au
u=1
  

3
⊗ X2 − C2,u au − vec C2,1
u=1
   

3 
3
=  (X1:3 , a1:3 ) X1 − C1,u au ⊗ X2 − C2,u au
u=1 u=1
 

3
⊗ X3 − C3,u au
u=1

 

3
− (X1:3 , a1:3 ) vec C2,1 ⊗ X3 − C3,u au
u=1
 

3
− (X1:3 , a1:3 ) X1 − C1,u au ⊗ vec C3,2
u=1
 

3
+ (X1:3 , a1:3 ) K−1
(132) (d1:3 ) vec C3,1 ⊗ X2 − C2,u au ,
u=1

where the matrix K−1(132) (d1:3 ) changes the order of the T-product of vectors. Let us

take Da1:3  at zero and obtain the third-order T-Hermite polynomial
*
H3 (X1:3 ) = Da⊗1:3  *a =0 = X⊗1 ⊗ ⊗
1:3 − κ 1,2 ⊗ X3 − X1 ⊗ κ 2,3
3
1:3
 
− K−1 ⊗
(132) κ 1,3 ⊗ X2 , (4.34)
 
where we interpret the product K−1 ⊗
(132) κ 1,3 ⊗ X2 as follows. If we have the
product X1 ⊗ X2 ⊗ X3 and we need the covariance κ ⊗ 1,3 of X1 and X3 in the product,
first, we interchange X2 and X3 by K(132), then take the covariance vector κ ⊗ 1,3 of
−1
X1 , X3 and then we reorder by K(132) back to the original order. Now, we rewrite
(4.34), so that each T-product term start with a constant
 
⊗1
H3 (X1:3 ) = X1:33 − κ ⊗ −1 ⊗ −1 ⊗
1,2 ⊗ X3 − K(231) κ 2,3 ⊗ X1 − K(132) κ 1,3 ⊗ X2 . (4.35)

At this point, we note that (4.34) can be rewritten into another form, namely
   
H3 (X1:3 ) = X⊗1 ⊗ −1 ⊗ −1 ⊗
1:3 − X1 ⊗ κ 2,3 − K(312) X3 ⊗ κ 1,2 − K(213) X2 ⊗ κ 1,3 ,
3

where we set the constants after the variables. One can get some even more
equivalent forms, changing the orders inside the covariances, κ ⊗
3,1 for instance.
Our purpose is to simplify general formulae; therefore, we will prefer setting
constants either before or after the variables. If an expression includes more T-
Hermite polynomials, then we suppose that all of them are in the same form.
If Xj = X, then we have a simple form
   
H3 (X) = X⊗3 − L−1
12 ,11 κ ⊗
X,2 ⊗ X = X ⊗3
− L−1
11 ,12 X ⊗ κ ⊗
X,2 , (4.36)

where the commutator L−1


12 ,11 , say, follows from (4.35):

L−1 −1 −1
12 ,11 = Id 3 + K(312) + K(213) .

The derivation of n = 4 can be found in Exercise 4.13, p. 236.


A list of T-Hermite polynomials is in Sect. A.5.2, p. 365.
We can apply the distinct values principle again for Hn (X1:n ) and set all Xj = X
into Hn

Hn (X) = Hn (X, X, . . . , X) .
+ ,- .
n

Although a T-Hermite polynomial is not symmetric in general, they became


n-symmetric if all vector variables coincide; Hn (X1:n ) = Hn (X). We have
Sd1n Hn (X) = Hn (X), hence we can use the n-symmetric equivalence version of it.
Namely, if we apply the symmetrizer Sd1n on Hn , then the order of the T-products
of the terms involved in Hn is arbitrary. An instance is H5 (X), say, which can be
written in the following form:


H5 (X) = X⊗5 − 10X⊗3 ⊗ κ X,2 + 15κ ⊗2
X,2 ⊗ X.

Generating function  (X1:n , a1:n ) can be written using the characteristic func-
tion, see Property 4 for more details,
⎛ ⎞
n
 1 
n

 (X1:n , a1:n ) = exp ⎝ a k Xk − ak Ck,j aj ⎠
2
k=1 k,j =1
 n 
 

n

= exp ak Xk E exp i a k Xk
k=1 k=1
(  n * )
 *
 *
= E exp ak (Xk + iYk ) * X1:n ,
*
k=1

where Y1:n is an independent copy of X1:n . Now the Taylor series expansion of the
exponent by a1:n inside the conditional expectation coincides with the Taylor series
expansion of  (X1:n , a1:n ) and equating the appropriate coefficients of the powers
of ak we obtain the following.
Lemma 4.1 (Construction by Condition) The Hermite polynomial Hn (X1:n ) can
be given by the following conditional expectation:
 ⊗ *  ⊗ *
* *
Hn (X1:n ) = E (Xk + iYk )* X1:n = E (xk + iXk )* .
k=1:n k=1:n x1:n =X1:n
(4.37)

Proof We have seen that the generator  (X1:n , a1:n ) has the following form:
    * 
 (X1:n , a1:n ) = E exp a1:n X1:n + ia1:n Y1:n * X1:n
  *
= E exp a1:n (x1:n + iX1:n ) *x =X .
1:n 1:n

Now let us take the T-derivative by a1:n of the second equality before plugging
x1:n = X1:n in, and obtain
  * ⊗
Da⊗1:n E exp a1:n (x1:n + iX1:n ) *a =E (xk + iXk ) ,
1:n =0 k=1:n

which proves (4.37).


⊗
In practice, we have to find the expected value E k=1:n (xk + iXk ) first then we
plug x1:n = X1:n into the result.
Example 4.11 Let us consider the case $n = 2$. We start with the product
$$
\bigotimes_{k=1:2} \left( x_k + iX_k \right) = x_1 \otimes x_2 + x_1 \otimes iX_2 + iX_1 \otimes x_2 - X_1 \otimes X_2,
$$
then we take the expected value of both sides,
$$
\mathsf{E} \bigotimes_{k=1:2} \left( x_k + iX_k \right) = x_1 \otimes x_2 - \mathsf{E}\, X_1 \otimes X_2,
$$
finally we replace $x_{1:2} = X_{1:2}$ and obtain
$$
\left. \mathsf{E} \bigotimes_{k=1:2} \left( x_k + iX_k \right) \right|_{x_{1:2} = X_{1:2}} = X_1 \otimes X_2 - \kappa^{\otimes}_{1,2} = H_2\left( X_{1:2} \right).
$$
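Numerically the construction of Example 4.11 is a one-liner. The sketch below (assuming NumPy; the variable names are ours) builds $H_2(X_1, X_2) = X_1 \otimes X_2 - \operatorname{vec} C_{2,1}$ for two jointly Gaussian vectors of different dimensions, checks that its mean is approximately zero, and illustrates $X_1 \otimes X_2 = \operatorname{vec}\left(X_2 X_1^{\top}\right)$:

import numpy as np

rng = np.random.default_rng(4)
d1, d2 = 2, 3
B = rng.standard_normal((d1 + d2, d1 + d2))
C = B @ B.T                                     # joint covariance of (X1, X2)
Z = rng.multivariate_normal(np.zeros(d1 + d2), C, size=300_000)
X1, X2 = Z[:, :d1], Z[:, d1:]

C21 = C[d1:, :d1]                               # C_{2,1} = Cov(X2, X1)
kappa_12 = C21.flatten(order='F')               # kappa^{(x)}_{1,2} = vec C_{2,1}

kron12 = np.einsum('ni,nj->nij', X1, X2).reshape(len(Z), -1)   # X1 (x) X2, row-wise
H2 = kron12 - kappa_12
print(np.abs(H2.mean(axis=0)).max())            # approx. 0 up to Monte Carlo error
x1, x2 = X1[0], X2[0]
print(np.allclose(np.kron(x1, x2), np.outer(x2, x1).flatten(order='F')))   # True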

It follows directly from the definition that we can permute the variables of a
T-Hermite polynomial using commutator matrices so that it satisfies the following
equation:
Property 4.8 (Permutation of Variables) For each permutation p ∈ Pn , we have
 
Hn Xp(1:n) = Kp (d1:n ) Hn (X1:n ) . (4.38)

We can generate T-Hermite polynomials by the linear combination of the


previous two polynomials. For instance, let us consider H3 , (4.34),
   
⊗1
H3 (X1:3 ) = X1:33 − κ ⊗1,2 ⊗ X 3 − K −1
(231) κ ⊗
2,3 ⊗ X 1 − K −1
(132) κ ⊗
1,3 ⊗ X 2
 
= H2 (X1:2 ) ⊗ X3 − K−1 ⊗
(231) κ 2,3 ⊗ H1 (X1 )
 
−K−1(132) κ ⊗
1,3 ⊗ H 1 (X 2 .
) (4.39)

Similarly, the nth-order T-Hermite polynomial can be expressed by the previous two
T-Hermite polynomials.
Theorem 4.2 (Recurrence Relation) We have $H_0 = 1$, $H_1(X_1) = X_1$, and if $n > 1$
$$
H_n\left( X_{1:n} \right) = H_{n-1}\left( X_{1:(n-1)} \right) \otimes X_n - \sum_{j=1}^{n-1} K^{-1}_{(n,j,[1:(n-1)]\setminus j)}\left( d_{1:n} \right) \left( \kappa^{\otimes}_{n,j} \otimes H_{n-2}\left( X_{[1:(n-1)]\setminus j} \right) \right). \tag{4.40}
$$
Remember that $\kappa^{\otimes}_{k,j} = \operatorname{Cum}_2\left( X_k, X_j \right) = \operatorname{vec} C_{j,k}$.
See Sect. 4.8.1, Appendix for the proof.
One can have another equivalent form to (4.40).

  
n−1
Hn (X1:n ) = Hn−1 X1:(n−1) ⊗ Xn − K−1
((j :n−1)S )n
j =1
   
× (d1:n ) Hn−2 X[1:(n−1)]\j ⊗ κ ⊗
j,n , (4.41)
 
where permutations (j : n − 1)S n take the j th element to the place n − 1 and
leaves the rest elements of 1 : n unchanged (see Sect. 1.1, p. 1 for permutations
in cycle notation). The notation [1 : (n − 1)] \j denotes the index set when the
component j is missing from 1 : (n − 1).
The commutators in Eq. (4.40) define the commutator
$$
J_n = \sum_{j=1}^{n-1} K^{-1}_{(n,j,[1:(n-1)]\setminus j)}\left( d_{1:n} \right). \tag{4.42}
$$
The recurrence formula (4.40) implies the following form for equal variates:
$$
H_n(X) = H_{n-1}(X) \otimes X - J_n \left( H_{n-2}(X) \otimes \kappa^{\otimes}_{X,2} \right). \tag{4.43}
$$
Although T-Hermite polynomials are not symmetric in general, they become $n$-symmetric if all variables are equal; $H_n\left( X_{1:n} \right) = H_n(X)$. Hence we can apply the symmetrizer $S_{d 1_n}$ to both sides of (4.43) and get
$$
H_n(X) = H_{n-1}(X) \otimes X - (n-1)\, H_{n-2}(X) \otimes \kappa^{\otimes}_{X,2}.
$$


Note that symmetry equivalence = is useful only when one side of the equivalence,
like Hn here, is n-symmetric; because in that case one gets equality using sym-
metrizer Sd1n on one side only.
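For a single vector variate the symmetrized form of the recurrence is an exact identity once the right-hand side is symmetrized, because $H_n(X)$ is $n$-symmetric. A minimal numerical sketch (assuming NumPy; sym_rows and t_hermite are our names) implements it by permuting the $n$ Kronecker factors and checks the output against the closed form $H_3(X) = X^{\otimes 3} - 3\, \operatorname{Sym}\left( \kappa^{\otimes}_{X,2} \otimes X \right)$, cf. (4.36) and Remark 4.8:

import itertools
import math
import numpy as np

def sym_rows(V, d, n):
    """Symmetrize each row of V (shape N x d**n) over its n Kronecker factors."""
    T = V.reshape((-1,) + (d,) * n)
    out = np.zeros_like(T)
    for p in itertools.permutations(range(1, n + 1)):
        out += np.transpose(T, (0,) + p)
    return (out / math.factorial(n)).reshape(V.shape)

def t_hermite(Xs, vec_sigma, d, n):
    """Symmetrized H_n(X) for a sample Xs (N x d), built by the recurrence above."""
    N = Xs.shape[0]
    if n == 0:
        return np.ones((N, 1))
    if n == 1:
        return Xs
    prev, prev2 = t_hermite(Xs, vec_sigma, d, n - 1), t_hermite(Xs, vec_sigma, d, n - 2)
    term1 = np.einsum('na,nb->nab', prev, Xs).reshape(N, -1)         # H_{n-1}(X) (x) X
    term2 = np.einsum('na,b->nab', prev2, vec_sigma).reshape(N, -1)  # H_{n-2}(X) (x) kappa
    return sym_rows(term1 - (n - 1) * term2, d, n)

rng = np.random.default_rng(5)
d = 2
A = rng.standard_normal((d, d)); Sigma = A @ A.T
Xs = rng.multivariate_normal(np.zeros(d), Sigma, size=1_000)
vec_sigma = Sigma.flatten()                      # Sigma symmetric, so the vec order is immaterial
H3 = t_hermite(Xs, vec_sigma, d, 3)
kron3 = np.einsum('na,nb,nc->nabc', Xs, Xs, Xs).reshape(len(Xs), -1)
closed = kron3 - 3 * sym_rows(np.einsum('a,nb->nab', vec_sigma, Xs).reshape(len(Xs), -1), d, 3)
print(np.allclose(H3, closed))                   # True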

Example 4.12 H4 (X) is 4-symmetric, therefore


 

H4 (X) = Sd14 H3 (X) ⊗ X − J4 κ ⊗ ⊗
X,2 ⊗ H2 (X) = H3 (X) ⊗ X − 3H2 (X) ⊗ κ X,2 .

Although the formula of H4 with symmetrizer Sd14 is much simpler than (4.43),
still for large d, say, d ≥ 8 it is useless computationally since the symmetrizer is a
d 4 × d 4 matrix and assumes evaluating 4! permutational matrices.
Property 4.9 (Independent Variables) It is easy to see, using the definition (4.33),
if X1:k and X(k+1):n are independent Gaussian random variates, then
 
Hn (X1:n ) = Hk (X1:k ) ⊗ Hn−k X(k+1):n .

If the random variables Yj , j = 1, 2, . . . , n are independent Gaussian, then


  ⊗  
Hkj Y1k1 , Y21k2 , . . . , Yn1kn = Hkj Yj .
j =1:n

Here we used our usual notation for the repeated variables Yj 1kj , i.e. Yj 1kj =
 
Yj , Yj , . . . , Yj is a list with kj vector components.
+ ,- .
kj

Remark 4.5 Although T-Hermite polynomials are unique, the terms involved can
be put in different forms, an instance is the following:
   
X1 ⊗ κ ⊗ −1 ⊗ −1 ⊗
2,3 ⊗ X4 = K(1324) X1 ⊗ κ 3,2 ⊗ X4 = K(1423) X1 ⊗ X4 ⊗ κ 2,3 ,

etc.
Multi-linearity has different forms for T-Hermite polynomials, we will consider
three of them. For instance
 
H2 (a ⊗ Xk , Xj ) = (a ⊗ Xk ) ⊗ Xj − a ⊗ κ ⊗
k,j = a ⊗ H2 Xk , Xj ,

where we used the identity


 
vec Cov Xj , a ⊗ Xk = a ⊗ κ ⊗
k,j ,

see (3.12), p. 125.


Property 4.10 (Multilinear) T-Hermite polynomial Hn is multilinear. Let
{Y, Z, X1:n } be a centralized Gaussian system, and A, and a a real matrix and
a vector, respectively. Then Hn is additive

Hn+1 (X1:n , Y + Z) = Hn+1 (X1:n , Y) + Hn+1 (X1:n , Z) , (4.44)



and homogenous in terms of matrix product


 
Hn+1 (X1:n , AY) = Idk ⊗ A Hn+1 (X1:n , Y) , (4.45)

and homogenous in terms of T-product

Hn+1 (a ⊗ Y, X1:n ) = a ⊗ Hn+1 (Y, X1:n ). (4.46)

Proof First, we show (4.44). In fact, we use Theorem 4.2 and obtain

Hn+1 (X1:n , Y + Z) = Hn (X1:n ) ⊗ (Y + Z)



n   
− K−1 :1)
κ⊗
Y+Z,X ⊗ Hn−1 X(1:n)\j
(n+1,(j S) j
j =1

= Hn (X1:n ) ⊗ (Y + Z)
n    
− K−1 κ ⊗
+ κ ⊗
⊗ H n−1 X (1:n)\j
(n+1,(j :1)S ) Y,Xj Z,Xj
j =1

= Hn+1 (X1:n , Y) + Hn+1 (X1:n , Z) .

Now let A, be a p × q real matrix, use (4.37) and obtain


⊗ *
*
Hn+1 (X1:n , AY) = E (xk + iXk ) ⊗ (y + iAY)*
k=1:n x1:n =X1:n ,y=AY
⊗ *
*
= E (xk + iXk ) ⊗ A (y + iY)*
k=1:n x1:n =X1:n ,y=Y
  ⊗ *
*
= Idk ⊗ A E (xk + iXk ) ⊗ (y + iY)*
k=1:n x1:n =X1:n ,y=Y

and the assertion (4.45) follows.


Next, we use (4.37) again
⊗ *
*
Hn+1 (a ⊗ Y, X1:n ) = E (y+ia ⊗ Y) ⊗ (xk + iXk )*
k=1:n y=a⊗Y,x1:n=X1:n
⊗ *
*
= a ⊗ E (y+iY) ⊗ (xk + iXk )* ,
k=1:n y=Y,x1:n=X1:n

showing (4.46).
Consecutive application of formula (4.45) provides the following:
  ⊗ 
Hn (Ak Xk )1:n = Ak Hn (X1:n ) , (4.47)
1:n

where Ak , k = 1 : n are square matrices.



One can get several cases of multi-linearity of T-Hermite polynomials combining


formulae (4.44), (4.45), and
 (4.46)using  commutator
  matrices if it is necessary.
For instance since a Idk ⊗ b = Idk ⊗ b a = a ⊗ b , we obtain
       
H2 a Xk , b Xj = a H2 Xk , b Xj = a Idk ⊗ b H2 Xk , b Xj
   
= a ⊗ b H2 Xk , Xj ,

therefore, the definition of vector H2 is compatible with the definition (4.3) of H2 ,


since the left side is a multivariate but scalar-valued Hermite polynomial.
Remark 4.6 The entries of Hn (X) are Hermite polynomials of degree n, with mixed
orders. Let the multi-index of an entry of Hn (X) be k1:n = (k1 , . . . , kn ), so that
ki ≥ 0, and let the type of k1:n be 1:n . Then we can identify the entries [Hn (X)]k1:n
of Hn (X) by Hermite polynomial H1:n (X1:n ) (see (4.4)), which corresponds to the
Hn (X1:n ) using distinct values principle, cf. (4.3).
The recurrence formula (4.40) readily implies the following derivative:

DX n
Hn (X1:n ) = Hn−1 (X1:n ) ⊗ vec Idn . (4.48)

If we take the derivative by Xk different from Xn , then (4.48) is not valid. Let
n = 3, for example, and take the derivative by X2 , then we have
 
⊗ ⊗ ⊗ ⊗
DX 2
H 3 (X 1:3 ) = DX2 (H 2 (X 1:2 ) ⊗ X 3 ) − DX2 H 1 (X 1 ) ⊗ κ 2,3
 
⊗ −1
−DX K
2 (132)
κ⊗1,3 ⊗ H1 (X2 )
   
= K−1 −1 ⊗
(1324) X1 ⊗ X3 ⊗ vec Id2 − K(1324) κ 1,3 ⊗ vec Id2
   
= K−1
(1324) H2 X1,3 ⊗ vec Id2 .

The explanation of getting commutator matrix K−1


(1324) is the following:
K−1
(132) (d1:3 ) = K(132) (d1 , d3 , d2 ), and K(132) (d1 , d3 , d2 ) ⊗ Id2 = K(1324) (d1 , d3 ,
d2 , d2 ), then K(1324) (d1 , d3 , d2 , d2 ) = K−1
(1324) (d1:3 , d2 ). In practice we use
K−1
(1324) = Id1 ⊗ Kd2 •d3 ⊗ Id2 for calculation.
The derivative of the Hermite polynomial Hn by a variate, say, Xk follows from
the permutation property (4.38) and the result (4.48).
Property 4.11 (Derivative by Variables) The T-derivative by a variable Xk of a T-
Hermite polynomial Hn (X1:n ) is
⊗    
DX k
Hn (X1:n ) = K−1
(k:n)S Hn−1 X(1:n)\k ⊗ vec Idk .

Proof We shall use the recurrence relation, Theorem 4.2. First, we reorder the
variables
 
Hn (X1:n ) = K−1
(k:n)S Hn X1:n\k , Xk ,

then by the recurrence relation, we have


   
Hn (X1:n ) = K−1
(k:n) Hn−1 X(1:n)\k ⊗ Xk + P X(1:n)\k ,
S

where the second term P does not depend on Xk . Therefore we obtain


    
⊗ −1
DX H n (X 1:n ) = K (k:n) ⊗ Id k H n−1 X (1:n)\k ⊗ vec Id k .
k S

Property 4.12 (Derivative) The derivative of the Hermite polynomial Hn of one


vector variable is


n

DX Hn (X) = K−1
((k:n) (Hn−1 (X) ⊗ vec Id ) , (4.49)
S )n+1
k=1

where ((k : n)S )n+1 ∈ Pn+1 , (see Sect. 1.1, p. 1 for the cycle notation of

permutations). The derivative DX Hn can be obtained using symmetrizer as well
 n 
    
⊗ −1
DX Hn (X) = Sd1n ⊗ Id K(k:n) ⊗ Id (Hn−1 (X) ⊗ vec Id )
S
k=1
 
= n Sd1n ⊗ Id (Hn−1 (X) ⊗ vec Id ) .

Proof We use the formula of construction by condition (4.37), i.e.


*
Hn (X) = E (x + iX)⊗n *x=X ,


for proving (4.49). The assertion follows from T-derivative DX Hn (X) =

EDx (x + iX); see (2.36) for the T-derivative of T-product of functions.
Since the Hermite polynomial Hn (X) depends on the variance–covariance
matrix  of X, (recall EX = 0), we introduce the notation Hn (x|) for Hn (X),
where x is a real vector.
Property 4.13 (Rodrigues’s Formula) We have the following Rodrigues’s formula
for Hermite polynomial Hn (x|) namely

Hn (x|) = (−1)n ϕ (x|0, )−1 ()⊗n Dx⊗n ϕ (x|0, ) . (4.50)



Proof Let us take the T-derivative of the density function


 
1 1 −1
ϕ (x|μ, ) = exp − (x−μ)  (x−μ) = ϕ (x−μ|0, )
(2π)d/2 ||1/ 2 2

by μ
 *
* 1  −1 **
Dμ⊗n ϕ (x−μ|0, )*μ=0 = ϕ (x|0, )Dμ⊗n  −1
exp x  μ − μ  μ *
2 μ=0
 *
*
= ϕ (x|0, ) Dμ⊗n   −1 x, μ| −1 * , (4.51)
μ=0

where  is defined in (4.32). By the definition (4.33) we have


 *  
*
Dμ⊗n   −1 x, μ| −1 * = Hn  −1 x| −1 ,
μ=0
 
the polynomial Hn  −1 x| −1 corresponds to the variable  −1 X with variance
 
 −1 , that is Var  −1 X =  −1 ; therefore, Var (X) = . Now we use (4.47) and
 ⊗n  
obtain  −1 Hn (x|) = Hn  −1 x| −1 . Therefore

*  ⊗n
Dμ⊗n ϕ (x−μ|0, )*μ=0 = ϕ (x|0, )  −1 Hn (x|) .

We also have
*
Dμ⊗n ϕ (x−μ|0, )*μ=0 = (−1)n Dx⊗n ϕ (x|0, ) .

Now we replace these results into Eq. (4.51), and rearrange them to get Rodrigues’s
formula.
   ⊗n
Note Hn  −1 x| −1 =  −1 Hn (x|) is referred to as covariant Hermite
polynomials.
Remark 4.7 There is another definition of Hn (X) which differs from  ours
 in
changing X into  −1 X. The corresponding polynomials are Hn  −1 X =
 −1⊗n Hn (X), cf. Rodrigues’s formula (4.50). Since Hn (X) is n-symmetric, it
is also possible to list only the distinct values. One can use eliminating matrix Q+d,n
for this purpose: Q+ d,n Hn (X), cf. (1.32), p. 21.
The Multilinear Property 4.10 of T-Hermite polynomials Hn provides us the
multinomial formula.

Proposition 4.5 (Multinomial Formula) The multinomial expansion of the T-


Hermite polynomial of a sum is the following:
 n    
   n 
Hn Xk = Hn X1k1 , . . . , Xn1kn .
k1:n
k=0 0≤k1:n ;k1:n =n

In particular, we have the Binomial expansion


n  
 
 n  
Hn (AX1 + BX2 ) = A⊗k ⊗ B⊗n−k Hn X1k , X21n−k ,
k
k=0

and in the case of independence of X1 and X2 , we have


n  
 
n
Hn (AX1 + BX2 ) = A⊗k ⊗ B⊗n−k (Hk (X1 ) ⊗ Hn−k (X2 )) .
k
k=0
(4.52)


We see that in each case above we have n-symmetry equivalence =; to get equality
we should apply the symmetrizer Sd1n .
Gaussian systems are usually called linear, the main reason is that for jointly
Gaussian vectors X, and Y the conditional expectation

E (X|Y) = μX + CX,Y C−1


Y,Y (Y−μY ) ,

that is the best prediction of X by Y is linear in Y, where CX,Y logically denotes the
covariance matrix of X and Y, and the variance–covariance matrix CY,Y assumed to
be positive definite. Moreover, X−E (X|Y) and Y are uncorrelated; therefore, they
are independent. An immediate consequence of this is that the conditional variance
Var (X|Y) = Var (X−E (X|Y)) is constant

Var (X−E (X|Y)) = CX,X − CX,Y C−1


Y,Y CY,X ,

i.e. it does not depend on Y, cf. Sect. 3.5.3.1, p. 160.


Now we address an issue of the conditional expectation of Hermite polynomials
assuming jointly Gaussianity of the variates.
Example 4.13 Let vector variates X and Y be jointly Gaussian and we consider the
conditional expectation
 
E (H2 (X) |Y) = E (X−E(X|Y) + E (X|Y))⊗2 |Y − κ ⊗
X,2 . (4.53)

We use the orthogonality of X − E (X|Y) and E (X|Y), i.e.

E ((X − E (X|Y)) ⊗ E (X|Y) |Y)


= E ((X − E (X|Y)) |Y) ⊗ E (X|Y) = E (X − E (X|Y)) ⊗ E (X|Y) = 0

and obtain
   
E (X − E (X|Y) + E (X|Y))⊗2 |Y = E (X − E (X|Y))⊗2 |Y
 
+ E (E(X|Y))⊗2 |Y .

here the second term is simply (E(X|Y))⊗2 , the first term is


 
E (X − E (X|Y))⊗2 |Y = E (X − E (X|Y))⊗2 = EX⊗2 − E (E(X|Y)) ⊗2 .

Now we replace these results to (4.53) and obtain

E (H2 (X) |Y) = (E(X|Y))⊗2 + μ⊗


X,2 − E (E(X|Y))
⊗2
− κ⊗
X,2

= (E(X|Y))⊗2 − E (E(X|Y))⊗2 + (E(X|Y))⊗2


= H2 (E (X|Y)) .

We prove the generalization of this example as an application of the Binomial


formula above.
Lemma 4.2 Let us assume that vector variates X and Y are jointly Gaussian, then
the conditional expectation of T-Hermite polynomials is T-Hermite polynomials of
the conditional expectation of variates, i.e.

E (Hn (X) |Y) = Hn (E (X|Y)) .

Proof We use the formula (4.52) for Hn (X) = Hn (X−E (X|Y) + E (X|Y)) and
obtain

Hn (X) = Hn (X−E (X|Y) + E (X|Y))


n  
  n
= Hk (X−E (X|Y)) ⊗ Hn−k (E (X|Y)) ,
k
k=0

since X−E (X|Y) and E (X|Y) are independent, indeed they are uncorrelated
 
E (X−E (X|Y)) (E (X|Y) − μX ) = E X − μX − CX,Y C−1
Y,Y (Y−μY )

× (Y−μY ) C−1
Y,Y CY,X

= CX,Y C−1 −1
Y,Y CY,X − CX,Y CY,Y CY,X = 0,

therefore independent. Now the conditional expectation


n  

 n
E (Hn (X) |Y) = E (Hk (X−E (X|Y)) ⊗ Hn−k (E (X|Y)) |Y)
k
k=0
n  
 n
= E (Hk (X−E (X|Y)) ⊗ Hn−k (E (X|Y)))
k
k=0
= Hn (E (X|Y)) .

Q.E.D.
Please, note that the conditional expectation is not element-wise, each entry of
E (X|Y) depends on the whole Y.

4.6 Moments, Cumulants, and Linearization

Let us consider higher-order moments for a Gaussian system X = {Xk , k = 1, 2, . . .}


with EXk = 0and covariance
 matrix Cj,k = Cov(Xj , Xk ). We repeat the notation

κ⊗
j,k = Cum 2 X j , X k = vec Cj,k , and PII
n denotes the set of all the partitions of
pairs of 1 : n.
Proposition 4.6 If $n$ is even, then
$$
\mathsf{E} X^{\otimes 1_n}_{1:n} = \mu^{\otimes}_{1:n} = \sum_{K^{II} \in \mathcal{P}^{II}_n} K^{-1}_{p(K^{II})}\left( d_{1:n} \right) \bigotimes_{(j,k) \in K^{II}} \kappa^{\otimes}_{j,k},
$$
where the summation is taken over all partitions $K^{II}$ of $\mathcal{P}^{II}_n$. The number of all partitions $K^{II} \in \mathcal{P}^{II}_n$ is $(n-1)!!$ (see (1.51), p. 43); therefore, there are $(n-1)!!$ terms in the above sum. If $n$ is odd, then $\mu^{\otimes}_{1:n}$ is zero.
In particular, if, say, $n = 2k$ is even and the variables of $X_{1:n}$ are the same $X_k = X$, then
$$
\mathsf{E} X^{\otimes 2k} = \mu^{\otimes}_{X,2k} = L^{-1}_{k_2} \kappa^{\otimes k}_{X,2},
$$
see Sect. A.2, p. 353 for the definition of the commutator $L^{-1}_{k_2}$. We apply the symmetrizer $S_{d 1_n}$ and obtain
$$
\mu^{\otimes}_{X,2k} = (2k-1)!!\, \kappa^{\otimes k}_{X,2}, \tag{4.54}
$$
where another form of the coefficient is $(n-1)!! = n! / \left( 2^{n/2} (n/2)! \right)$.
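A numerical sanity check of (4.54) for $k = 2$ (assuming NumPy; the symmetrizer below acts by permuting tensor factors): after symmetrization, $\mathsf{E} X^{\otimes 4}$ equals $3!! = 3$ copies of $\left(\operatorname{vec} \Sigma\right)^{\otimes 2}$.

import itertools
import math
import numpy as np

def sym(v, d, n):
    """Symmetrize a vector of length d**n over its n Kronecker factors."""
    T = v.reshape((d,) * n)
    return sum(np.transpose(T, p) for p in itertools.permutations(range(n))).reshape(-1) / math.factorial(n)

rng = np.random.default_rng(6)
d, N = 2, 400_000
A = rng.standard_normal((d, d)); Sigma = A @ A.T / d
X = rng.multivariate_normal(np.zeros(d), Sigma, size=N)
m4 = np.einsum('ni,nj,nk,nl->ijkl', X, X, X, X).reshape(-1) / N   # Monte Carlo E X^{(x)4}
vs = Sigma.flatten()
print(np.abs(sym(m4, d, 4) - 3 * sym(np.kron(vs, vs), d, 4)).max())   # small (Monte Carlo error)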


Example 4.14 If n = 6, then we obtain

⊗1

15
⊗ ⊗ ⊗
EX1:66 = K−1
pj κ pj (1) ⊗ κ pj (2) ⊗ κ pj (3) ., (4.55)
j =1

where the permutations pj of the numbers 1:6 are originated by the indices of the
products in (4.10). Partitions
 yield permutations  as follows: a permutation pj ∈
P6 is split into three pairs pj (1) |pj (2) |pj (3) , so that pj (k) corresponds to the
cumulant κ ⊗pj (k) . Moreover if Xj = X, j = 1 : 6, then


EX⊗6 = L−1 ⊗3 ⊗3
32 κ X,2 = 5!!κ X,2 , (4.56)

where commutator L−1


32


15
L−1
32 = K−1
pj ,
j =1

also see (A.6), p. 354 in detail for commutator L−1


32 . Observe that 5!! = 6!/3!2 = 15
3
n
and compare to N = n!/ j =1 j ! (j !) , where type  corresponds to blocks with
j

cardinality 2, namely if n = 6, j = 3δ2j , cf. (1.41), p. 32.


A recursive algorithm for the expected value μ⊗
1:2n can be applied starting from
n = 1, as follows:


n−1
μ⊗
1:2n = K−1 ⊗ ⊗
(k:2(n−1))S (d1:2n ) μ(1:2(n−1))\k ⊗ κ k,n ,
k=1

where the permutation (k : 2 (n − 1))S ∈ P2n is in cycle notation (see Sect. 1.1,
p. 1); it is simply moving k into the place 2 (n − 1).
First-order moments of Hermite polynomials are zero by the construction, the
second-order ones are,

       ⊗
E Hn Xk1:n ⊗ Hm Xj1:n = δnm K−1
mn (p) d k1:n , d j1:n κ⊗
ks ,jp(s) ,
s=1:n
p∈Pn
(4.57)

where k1:n = (k1 , k2 , . . . , kn ), j1:n = (j1 , j1 , . . . , jn ) and the summation (4.57) is


taken for all permutations p ∈ Pn when the order of (k1 , k2 , . . . , kn ) is fixed and the
indices j1:n are permuted jp(1) , jp(2) . . . , jp(n) , this way the summation is taken
over for n! terms.
We recall that the mixture operator mn defined on Pn such that mn (p) =
(1, p (1) + n, 2, p (2) + n, . . . , n, p (n) + n) ∈ P2n , (see  Sect. 1.1, p. 1). We iden-
tify the indices k1:n by 1 : n and the permuted indices jp(1) , jp(2) . . . , jp(n) by p.
Now the mixing commutator matrix is defined by
    
Mmn d k1:n , d j1:n = K−1
mn (p) d k1:n , d j1:n , (4.58)
p(1:n)

with n! terms.
In particular
  ⊗n
EHn (X1 ) ⊗ Hn (X2 ) = M−1
mn d1n , d21n κ 1,2 ,

where Hn (Xj ) = Hn (Xj , . . . , Xj ), and Xj ∈ Rdj . In particular if all dimensions


+ ,- .
n
equal to d, then we use notation Mmn (d), or Mmn shortly and obtain
⊗n
EHn (X1 ) ⊗ Hn (X2 ) = M−1
mn κ 1,2 . (4.59)

Example 4.15 We provide the detailed formula for EH3 (X1:3 ) ⊗ H3 (X4:6 ) at (A.21)
which implies
⊗3
E [H3 (X) ⊗ H3 (X)] = M−1
m3 κ X,2 ,

where the mixing commutator M−1


m3 , is given by (A.8), p. 356, i.e.

−1 −1 −1 −1 −1 −1
M−1
m3 = K(142536) + K(142635) + K(152436) + K(152634) + K(162435) + K(162534).

Take a standardized Gaussian vector variate Y with EY = 0, Var (Y) = Id ,


then EH⊗2 ⊗3 ⊗2
3 (Y) = Mm3 (vec Id ) , moreover EH3 (Y) = Cum2 (H3 (Y)). Both
the second-order moments and cumulants are 2-symmetric, namely EH⊗2 3 (Y) ∈
⊗2 ⊗2 ⊗3
Sd 3 ,2 , nevertheless Sd 3 12 H3 (Y) = EH3 (Y) = 3!Sd 3 12 (vec Id ) , as one
may expect. The reason is that, for instance, Sd 3 12 K−1
(152634) = Sd 3 12 , since
  3  −1
Sd 3 12 = 1/2 Id 3 + K(21) d . At the same time K(152634) = K(135624); therefore,
 
Sd 3 12 K−1
(152634) = 1/2 K(135624) + K(624135) .
Let us note that EH⊗2
3 (Y) = 3!vec Sd1n Sd13 .

Vector EHn (X) ⊗ Hn (X) is not 2n-symmetric, nevertheless we can simplify it.
Let us consider the case n = 3 and replace the inverse commutator matrices by
commutator matrices; then we obtain

M−1
m3 = K(135246) + K(135264) + K(135426) + K(135624) + K(135462) + K(135642),

(see (A.8) p. 356)). Now we realize that


 
M−1
m3 = 3! Id 3 ⊗ Sd13 K(135246),

therefore
⊗3   ⊗3
EH3 (X)⊗2 = M−1
m3 κ X,2 = 3! Id 3 ⊗ Sd13 K(135246)κ X,2 .

In general we have
−1 ⊗n
EH⊗2
n (X) = Mmn κ X,2 . (4.60)

We introduce the permutation p2n of 1 : 2n, so that p2n = (1, 3, . . . 2n − 1, 2, 4, . . .


2n), i.e. the first n places are taken by ordered odd numbers and then by even ones.
Now we can write
 
EHn (X)⊗2 = n! Id n ⊗ Sd1n Kp2n κ ⊗n
X,2 .

In particular, let Y be a Gaussian vector variate, i.e. EY = 0, Var (Y) = Id , then


 
EH⊗2
n (Y) = n! Id n ⊗ Sd1n vec Id n = n!vec Sd1n (4.61)

since

Kp2k vec⊗k Id = vec Id k ,

cf. Exercise 1.33, p. 56.


Proposition 4.7 Let us take a partition L = (b1 , b2 , . . . , bp ) of (1 : n). Then
⊗    ⊗
E Hnj (Xbj ) = K−1 d
p(K II ) b1:p
κ⊗ , (4.62)
j =1:p (j,k)∈K II j,k
{ K II |( L,K II )nl }
 
where* the summation is taken over all diagrams L, K II without loops. If the set
 II   
K * L, K II , is empty, the expectation is zero.
nl

Let partition matrices VII and U correspond to partitions K II and L respectively,


then the matrix product VII U contains either ones or twos besides zeros only.
The diagram is without loop if there is no entry with 2 at all. In particular if

 
p = 2, i.e. L = (b1 , b2 ), then the diagram L, K II without a loop corresponds
to a partition K II containing pairs (k1 , k2 ) so that k1 ∈ b1 and k2 ∈ b2 . One can
reach all such types of partitions by fixing the first entries and permuting the second
ones (see 4.57)).

4.6.1 Cumulants for T-Hermite Polynomials

We assume that all the variables below in this subsection are jointly Gaussian.
Let us take a partition L = (b1 , b2 , . . . , bp ) of 1 : n with size |L| = p,
and
⊗a list of Gaussian  system of vectors X1:n , we recall the notation XL =

Xb1 , Xb2 , . . . , ⊗ Xbp . We already have the formula
 ⊗
Cump (XL ) = K−1
p(K) Cum|b| (Xb ), (4.63)
b∈K
L∪K=O

expressing the cumulants of products in terms of cumulants, see (3.58), p. 148,


where the summation is all over indecomposable partitions K to L and L, K are in
canonical form.
Now because of the Gaussianity of the system, all the cumulants are zero on the
right-hand side of (4.63) except for the second-order ones. Therefore the summation
is taken only for partitions K II ∈ PII n . The assumption L ∪ K
II = O, i.e. the

partitions L and K are indecomposable, is equivalent to the assumption that the


II

graph (L, K II ) is closed.


Theorem 4.3 Let L = (b1 , b2 , . . . , bp ) be a partition of 1 : n. Then we have the
formula
   ⊗
Cump (XL ) = K−1
p(K )
II d b κ⊗ , (4.64)
1:p
(j,k)∈K II j,k
{ K II |( L,K II )c }
 
where the summation is over all closed diagrams L, K II c and where the order of
 
the dimensions db1:p = db1 , . . . , dbp of the commutator K−1
p(K II )
follows the order
 
of the variables in XL , and the dimensions are dbk = dj , j ∈ bk .
The cumulants of T-Hermite polynomials of variables corresponding to L with
block sizes |bj | = nj are given by
 
Cump Hn1 (Xb1 ), Hn2 (Xb2 ), . . . , Hnp (Xbp )
   ⊗
= K−1
p(K )
II d b κ⊗ , (4.65)
1:p
(j,k)∈K II j,k
{K |(L,K )cnl }
II II

 
where the summation is over all closed diagrams L, K II cnl without loop.

If $p = 2$ in (4.64), i.e. we consider the covariance vector of two T-Hermite polynomials, then
$$
\operatorname{Cum}_2\left( H_n(X_1), H_n(X_2) \right) = M^{-1}_{m_n} \kappa^{\otimes n}_{1,2}, \tag{4.66}
$$
see (4.59), since the covariance coincides with the cumulant if the mean is zero. In particular,
$$
\operatorname{Cum}_2\left( H_n(X), H_n(X) \right) = M^{-1}_{m_n} \kappa^{\otimes n}_{X,2}, \tag{4.67}
$$
and if $X$ is standardized then
$$
\operatorname{Var} H_n(X) = n!\, S_{d 1_n}, \tag{4.68}
$$
see (4.61).
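For $n = 2$, formula (4.68) says that for a standardized Gaussian vector $\operatorname{Var} H_2(X) = 2\, S_{d 1_2} = I_{d^2} + K$, with $K$ the commutation matrix that swaps the two Kronecker factors. A short Monte Carlo sketch (assuming NumPy):

import numpy as np

d = 3
K = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        K[i * d + j, j * d + i] = 1.0            # commutation (vec-permutation) matrix

rng = np.random.default_rng(7)
X = rng.standard_normal((500_000, d))
H2 = np.einsum('ni,nj->nij', X, X).reshape(-1, d * d) - np.eye(d).reshape(-1)
print(np.abs(np.cov(H2, rowvar=False) - (np.eye(d * d) + K)).max())   # approx. 0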
The following equations are particular cases of the above theorem:
Lemma 4.3 Let the partition L be {(2j − 1, 2j ) |j = 1 : k}, then
      
Cum H2 X1,2 , H2 X3,4 , . . . , H2 X(2k−1),2k (4.69)
      
= K−1 (d ) Cum2 Xb1 ⊗ Cum2 Xb2 ... ⊗ Cum2 Xbk ,
p(K II ) 1:2k
2k−1 (k−1)!

 
where the blocks bj of pairs and the permutation p K II of the numbers (1 : 2k)
 
correspond to the closed diagrams L, K II without loops, and the summation is
taken over all such diagrams. If all random variables included in (4.69) are the
same X, then the usage of commutator matrices is still necessary
⎛ ⎞

Cumk (H2 (X)) = ⎝ K−1
p(K II )
⎠ κ ⊗k .
X,2 (4.70)
2k−1 (k−1)!

2k
Cumulant Cumk (H2 (X)) ∈ Rd is not 2k-symmetric, but only is k-symmetric (see
Case 1.7, p. 46).
Both Cumk (H2 (X)) and κ ⊗k
X,2 are k-symmetric in the space of multilinear
algebra Md 2 ,k , therefore
⎛ ⎞

Sd 2 1k Cumk (H2 (X)) = Cumk (H2 (X)) = Sd 2 1k ⎝ K−1
p(K II )
⎠ κ ⊗k .
X,2
2k−1 (k−1)!

The commutators K−1


p(K II )
are defined on Md,2k ; therefore, Sd 2 1k K−1
p(K II )
=
Sd 2 1k and we cannot simplify the right-hand side, (see Sect. 1.3.1, p. 13 for
symmetrization).
Example 4.16 If k = 2, we have

Cum2 (H2 (X1 , X2 ) , H2 (X3 , X4 ))


 
= K−1
(1324) (d 1:4 ) κ ⊗
1,3 ⊗ κ ⊗ −1 ⊗ ⊗
2,4 + K(1423) (d1:4 ) κ 1,4 ⊗ κ 2,3 ,

and for a one vector variable


 
Cum2 (H2 (X) , H2 (X)) = EH2 (X)⊗2 = K−1 −1 ⊗2
(1324) + K(1423) κ X,2 ,

4
where the permutation matrices act on Rd , the permutation (1324) corresponds to
the partition of pairs K II = {(1, 3) , (2, 4)}, and the other permutation (1423) stands
for the partition of pairs K II = {(1, 4) , (2, 3)}. The commutator K−1 (1423) = K(1342)
permutes vectors in the T-products of four vectors with dimensions d, and Sd 2 12
symmetrize vectors in the T-products of two vectors with dimensions d 2 , hence we
have
1  1 
Sd 2 12 K(1342) = K(1342) + K(4213) = Sd 2 12 = Id 4 + K(21) (d 2 , d 2 ) .
2 2

Example 4.17 Let k = 3, we obtain the cumulant and also the moment EH2 (X)⊗3
by (4.70):
⎛ ⎞
8
Cum3 (H2 (X) , H2 (X) , H2 (X)) = ⎝ K−1 ⎠ ⊗3
pj (d) κ X,2 . (4.71)
j =1

Following the algorithm which is given in Case 1.7, p. 46, we derive the per-
mutations involved in (4.71): p1 = (23|45|16) = (1 : 5)S , p2 = (23|46|15),
p3 = (24|35|16), p4 = (24|36|15), p5 = (25|36|14), p6 = (13|45|26), p7 =
(13|46|25), p8 = (143625). We denoted the correspondence of the permutations
to the partitions here by separating the blocks inside the permutations. Note that the
summation runs by partitions and both the permutations and the cumulants should be
assimilated to it. For instance the permutation (134526) corresponds to the partition
K II = {(2, 6) , (4, 5) , (3, 1)}. Making it unique, one has to use the canonical form
K II = {(1, 3) , (2, 6) , (3, 5)}, see Definition 1.7.

4.6.2 Products for T-Hermite Polynomials

Taking the T-product X⊗1 n


1:n of a list of Gaussian system X1:n , we can express it in
terms of T-Hermite polynomials.
Proposition 4.8 Let us denote the set of partitions of pairs and arms by PI,II
n , and
the set of arms by DK . The T-product
   
⊗  
X⊗1
1:n
n
= K−1
  κ⊗ ⊗ Hd XDK ,
p KdI,II (j,k)∈KdI,II j,k
d≡nmod(2) K I,II
d

where the summation is taken over all d ∈ 0 : n with d ≡ n mod  (2), and over
all partitions KdI,II ∈ PI,II , and where X = X | (m) ∈ K I,II is arranged in
n DK  m 
alphabetical order by indices. The permutation p KrI,II lists the pairs first then the
arms afterwards, both are in alphabetical  order, the T-product of vector covariances
κ⊗
j,k and X DK follows the ordering of p K I,II
r .
If the variates of the list X1:n are the same X, then using the symmetrizer Sd1n
we have

 
[n/2]
n!
X⊗n = κ ⊗k ⊗ Hn−2k (X)
2k (n − 2k)!k! X,2
k=0

 
[n/2]
n!
= Hn−2k (X) ⊗ κ ⊗k
X,2 . (4.72)
2k (n − 2k)!k!
k=0

The T-Hermite polynomial Hn can also be expressed in terms of the T-products


of the variables
   

Hn (X1:n ) = (−1)(n−d)/2 K−1
I,II
 κ ⊗
I,II j,k ⊗ XDK ,
p Kd ,DK (j,k)∈Kd
d≡nmod(2) KdI,II

where the summation runs for all partitions KdI,II for which the evenness of the
number of arms d ∈ (0 : n) is the same as n, i.e. d ≡ n mod (2)
Using symmetrizer Sd1n , for one vector variate we obtain

 
[n/2]
(−1)k n!  (−1)k n!
[n/2]
Hn (X) = X⊗(n−2k) ⊗ κ ⊗k = κ ⊗k ⊗ X⊗(n−2k) .
(n − 2k)!k!2 k X,2
(n − 2k)!k!2k X,2
k=0 k=0
(4.73)

Remark 4.8 The definition of Hk gives the same result as (4.73). For instance, H3
is given by (4.36) as follows:


H3 (X) = X⊗3 − L−1 ⊗
12 ,11 κ X,2 ⊗ X = X
⊗3
− 3κ ⊗
X,2 ⊗ X,

where the permuting matrix, by our notation,

1  
Sd13 = Id 3 + K(132) + K(213) + K(321) + K(312) + K(231) .
3!

Now X⊗3 is symmetric under permuting its terms, hence Sd13 X⊗3 = X⊗3 , and
applying Sd13 on κ ⊗
X,2 ⊗ X we obtain a simpler form
 
 ⊗3
H3 (X) = X⊗3 − 3Sd13 κ ⊗
X,2 ⊗ X = X − 3κ ⊗
X,2 ⊗ X.

Now we consider the products of T-Hermite polynomials in terms of the linear


combination of T-Hermite polynomials.
Let us take a partition L = (b1 , b2 , . . . , bp ) of (1 : n) with |bj | = nj . The set
 I,II 
of all partitions having one or two elements is denoted by PI,II I,II
n , i.e. Pn = K ,
I,II
and a partition K with a arms will be denoted by Ka .
I,II

Proposition 4.9 (Linearization Formula) The products of T-Hermite polynomials


⊗    
Hnj Xbj = K−1
 
j =1:p   p KaI,II
a≡nmod(2) L,K I,II
a
cnl
⊗   
× κ⊗
j,k ⊗ Ha XDKa , (4.74)
(j,k)∈KaI,II
 
where permutation p KaI,II lists the pairs first then the arms, both are in
alphabetical orders, and where
 the T-product of vector covariances and XDK
follows the ordering of p KaI,II . The summation is taken over all closed diagrams
 
L, KaI,II , KaI,II ∈ PI,II
n , without loops and the evenness of the number of arms
cnl
a equals to the evenness of n, i.e. a ≡ n mod (2).
Example 4.18 Let n = m = 2 and L = {b1 = (1, 2) , b2 = (3, 4)} then the
possible values of r in (4.74) are 0, 2, 4.
If r = 4, then K4I,II = {(1) , (2) , (3) , (4)}, and DK4 = {1, 2, 3, 4}, we have only
arms.
If r = 2, then K2I,II can be {(1, 3) , (2) , (4)}, {(1, 4) , (2) , (3)} and the correspond-
ing DK2 = {2, 4}, and DK2 = {1, 3},

If r = 0, then K0I,II is either {(1, 3) , (2, 4)} or {(1, 4) , (2, 3)}, and DK0 = ∅, in both
cases.
The result is the following:
   
H2 Xb1 ⊗ H2 Xb2 = H2 (X1 , X2 ) ⊗ H2 (X3 , X4 )
 
= H4 (X1:4 ) + K−1 ⊗
(1324) κ 1,3 ⊗ H2 (X2 , X4 )
 
+K−1(1423) κ ⊗
1,4 ⊗ H 2 (X 2 , X 3 )
 
+K−1(2314) κ ⊗
2,3 ⊗ H 2 (X 1 , X 4 )
 
+K−1(2413) κ ⊗
2,4 ⊗ H 2 (X 1 , X 3 )
   
+K−1 ⊗ ⊗ −1 ⊗ ⊗
(1324) κ 1,3 ⊗ κ 2,4 + K(1423) κ 1,4 ⊗ κ 2,3 .

In particular

H2 (X)⊗2 = H4 (X1:4 ) + L−1 ⊗ −1 ⊗2


2,H 2 κ X,2 ⊗ H2 (X) + Mm2 κ X,2 ,

where

L−1 −1 −1 −1 −1
2,H 2 = K(1324) + K(1423) + K(2314) + K(2413) .

Example 4.19 We give the detailed formula for $H_3(X_{1:3}) \otimes H_3(X_{4:6})$ in the Appendix (see (A.20), p. 366), which yields the case of equal variables

$$H_3(X)^{\otimes 2} = H_6(X) + \mathbf{L}^{-1}_{2,H_4}\left(\kappa^{\otimes}_{X,2} \otimes H_4(X)\right) + \mathbf{L}^{-1}_{2,2,H_2}\left(\kappa^{\otimes 2}_{X,2} \otimes H_2(X)\right) + \mathbf{M}^{-1}_{m_3}\,\kappa^{\otimes 3}_{X,2}, \tag{4.75}$$

which includes the cumulant

$$\mathrm{Cum}_2\left(H_3(X)\right) = \mathbf{M}^{-1}_{m_3}\,\kappa^{\otimes 3}_{X,2} = 3!\,S_{d1_3}\,\kappa^{\otimes 3}_{X,2}$$

as well. The commutator matrices $\mathbf{L}^{-1}_{2,H_4}$, $\mathbf{L}^{-1}_{2,2,H_2}$, and $\mathbf{M}^{-1}_{m_3}$ are given in the Appendix by formulae (A.12), (A.11), and (A.8), p. 357, respectively. The 6-symmetric version of (4.75) is

$$S_{d1_6}\, H_3(X)^{\otimes 2} = H_6(X) + 9\, H_4(X) \otimes \kappa^{\otimes}_{X,2} + 18\, H_2(X) \otimes \kappa^{\otimes 2}_{X,2} + 6\, \kappa^{\otimes 3}_{X,2},$$

cf. Exercise 4.12.



Example 4.20 We also obtain $H_4(X)^{\otimes 2}$, first using formula (4.74) for different variables and then applying the distinct values principle, which gives

$$H_4(X)^{\otimes 2} = H_8(X) + \mathbf{L}^{-1}_{2,H_6}\left(\kappa^{\otimes}_{X,2} \otimes H_6(X)\right) + \mathbf{L}^{-1}_{2,2,H_4}\left(\kappa^{\otimes 2}_{X,2} \otimes H_4(X)\right) + \mathbf{L}^{-1}_{2,2,2,H_2}\left(\kappa^{\otimes 3}_{X,2} \otimes H_2(X)\right) + \mathbf{M}^{-1}_{m_4}\,\kappa^{\otimes 4}_{X,2}, \tag{4.76}$$

where the commutator matrices $\mathbf{L}^{-1}_{2,H_6}$, $\mathbf{L}^{-1}_{2,2,H_4}$, $\mathbf{L}^{-1}_{2,2,2,H_2}$, and $\mathbf{M}^{-1}_{m_4}$ are given in the Appendix by formulae (A.13), (A.14), (A.15), and (A.9), p. 357, respectively. The 8-symmetric version of (4.76) is

$$S_{d1_8}\, H_4(X)^{\otimes 2} = H_8(X) + 16\,\kappa^{\otimes}_{X,2} \otimes H_6(X) + 72\,\kappa^{\otimes 2}_{X,2} \otimes H_4(X) + 96\,\kappa^{\otimes 3}_{X,2} \otimes H_2(X) + 24\,\kappa^{\otimes 4}_{X,2},$$

cf. (4.28).
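In the scalar case (d = 1, variance $\sigma^2$, so $H_n(x) = \sigma^n He_n(x/\sigma)$) the symmetric versions above reduce to the classical Hermite linearizations $H_3^2 = H_6 + 9\sigma^2 H_4 + 18\sigma^4 H_2 + 6\sigma^6$ and $H_4^2 = H_8 + 16\sigma^2 H_6 + 72\sigma^4 H_4 + 96\sigma^6 H_2 + 24\sigma^8$, cf. Exercise 4.12. The constants 9, 18, 6 and 16, 72, 96, 24 can be checked in a few lines with NumPy's probabilists' Hermite (He) basis; this is only a side sanity check, not part of the book's derivation.

```python
from numpy.polynomial import hermite_e as He   # probabilists' Hermite polynomials He_n

he3 = [0, 0, 0, 1]        # He_3 as a coefficient series in the He basis
he4 = [0, 0, 0, 0, 1]     # He_4

# products expanded back into the He basis: coefficients of He_0, He_1, ...
print(He.hermemul(he3, he3))   # [ 6.  0. 18.  0.  9.  0.  1.]
print(He.hermemul(he4, he4))   # [24.  0. 96.  0. 72.  0. 16.  0.  1.]
```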

In the particular cases of (4.74) with $p = 2, 3, 4$, i.e. with sizes $|\mathcal{L}| = 2, 3, 4$, there are explicit constructions for deriving all partitions for which the diagram $\left(\mathcal{L}, K^{I,II}_{d}\right)_{cnl}$ is closed and without loops.

4.7 Gram–Charlier Expansion

The characteristic function of a multivariate X can be written in terms of moments and also in terms of cumulants as follows:

$$\varphi_X(\lambda) = \sum_{k=0}^{\infty} \frac{i^k}{k!}\, \mu^{\otimes\prime}_{X,k}\, \lambda^{\otimes k} = \exp\left( \sum_{k=0}^{\infty} \frac{i^k}{k!}\, \kappa^{\otimes\prime}_{X,k}\, \lambda^{\otimes k} \right).$$

We have an expression of the moments in terms of the cumulants,

$$\mu^{\otimes}_{X,n} = \sum_{K \in \mathcal{P}_n} \mathbf{K}^{-1}_{p(K)} \bigotimes_{b_j \in K} \kappa^{\otimes}_{X,|b_j|},$$

where the summation is over all partitions $K = \{b_1, b_2, \ldots, b_k\}$ of $1:n$ (see (3.46), p. 142 for a connection to the Bell polynomial). As we see, the expected value $\mu^{\otimes}_{X,n}$ is a polynomial of the cumulants $\kappa^{\otimes}_{X,j}$ up to order $n$. This polynomial defines the T-Bell polynomial $B_n$, in a way analogous to the scalar case:

$$B_n\left( \kappa^{\otimes}_{X,1}, \kappa^{\otimes}_{X,2}, \ldots, \kappa^{\otimes}_{X,n} \right) = \mu^{\otimes}_{X,n}.$$

We put $\varphi_X(\lambda)$ in terms of T-Bell polynomials as

$$\varphi_X(\lambda) = \sum_{k=0}^{\infty} \frac{i^k}{k!}\, B_k\left( \kappa^{\otimes}_{X,1}, \kappa^{\otimes}_{X,2}, \ldots, \kappa^{\otimes}_{X,k} \right)' \lambda^{\otimes k}.$$

Let Z be a Gaussian random variate with expected value $EZ = EX = \kappa^{\otimes}_{X,1} = \mu^{\otimes}_{X,1}$ and variance vector $\kappa^{\otimes}_{Z,2} = \mathrm{Cum}_2(Z) = \mathrm{Cum}_2(X) = \mathrm{vec}\,\Sigma = \kappa^{\otimes}_{X,2}$. The characteristic function of Z is

$$\varphi_Z(\lambda|\mu,\Sigma) = \exp\left( i\mu'\lambda - \tfrac{1}{2}\lambda'\Sigma\lambda \right) = \exp\left( i\kappa^{\otimes\prime}_{X,1}\lambda - \tfrac{1}{2}\kappa^{\otimes\prime}_{X,2}\lambda^{\otimes 2} \right).$$

We express the characteristic function $\varphi_X$ as an exponent of the cumulant differences times $\varphi_Z$,

$$\varphi_X(\lambda) = \exp\left( \sum_{k=0}^{\infty} \frac{i^k}{k!}\left( \kappa^{\otimes}_{X,k} - \kappa^{\otimes}_{Z,k} \right)' \lambda^{\otimes k} \right) \varphi_Z(\lambda|\mu,\Sigma);$$

then, rewriting the exponential through T-Bell polynomials of the cumulant differences,

$$\varphi_X(\lambda) = \sum_{k=0}^{\infty} \frac{i^k}{k!}\, B_k\left( \kappa^{\otimes}_{X,1} - \kappa^{\otimes}_{Z,1},\, \kappa^{\otimes}_{X,2} - \kappa^{\otimes}_{Z,2},\, \ldots,\, \kappa^{\otimes}_{X,k} - \kappa^{\otimes}_{Z,k} \right)' \lambda^{\otimes k} \times \varphi_Z(\lambda|\mu,\Sigma),$$

and we obtain

$$\varphi_X(\lambda) = \left( 1 + \sum_{k=3}^{\infty} \frac{i^k}{k!}\, B_k\left( 0, 0, \kappa^{\otimes}_{X,3}, \ldots, \kappa^{\otimes}_{X,k} \right)' \lambda^{\otimes k} \right) \varphi_Z(\lambda|\mu,\Sigma),$$

since the first two differences of cumulants are zero and $\kappa^{\otimes}_{Z,k} = 0$ for $k \geq 3$.
Now the density $f_X(x)$ can be obtained by the inverse Fourier transform of $\varphi_X(\lambda)$. The inverse Fourier transform of $i^k\lambda^{\otimes k}\varphi_Z(\lambda|\mu,\Sigma)$ is the derivative of the normal density, $(-1)^k D_z^{\otimes k}\varphi(z|\mu,\Sigma)$, and this latter will be expressed in terms of a Hermite polynomial. We have Rodrigues's formula, cf. (4.50),

$$H_k(z|\Sigma) = (-1)^k\,\varphi(z|0,\Sigma)^{-1}\,\Sigma^{\otimes k} D_z^{\otimes k}\varphi(z|0,\Sigma),$$

and we can express the derivative by

$$(-1)^k D_z^{\otimes k}\varphi(z|0,\Sigma) = \varphi(z|0,\Sigma)\left( \Sigma^{-1} \right)^{\otimes k} H_k(z|\Sigma).$$

Here $H_k(z|\Sigma)$ corresponds to the variate $Z \in N(0,\Sigma)$. To obtain $(-1)^k D_z^{\otimes k}\varphi(z|\mu,\Sigma)$ we consider $H_k(z-\mu|\Sigma)$ and obtain

$$(-1)^k D_z^{\otimes k}\varphi(z|\mu,\Sigma) = \varphi(z|\mu,\Sigma)\left( \Sigma^{-1} \right)^{\otimes k} H_k(z-\mu|\Sigma).$$

Now the inverse Fourier transform of $\varphi_X(\lambda)$, i.e. the density $f_X(x)$, can be written as the so-called Gram–Charlier series

$$f_X(x) = \left( 1 + \sum_{k=3}^{\infty} \frac{1}{k!}\, B_k\left( 0, 0, \kappa^{\otimes}_{X,3}, \ldots, \kappa^{\otimes}_{X,k} \right)' \left( \Sigma^{-1} \right)^{\otimes k} H_k\left( (x-\mu)\,|\,\Sigma \right) \right) \varphi(x|\mu,\Sigma)$$
$$= \left( 1 + \sum_{k=3}^{\infty} \frac{1}{k!}\left( \left( \Sigma^{-1/2} \right)^{\otimes k} B_k\left( 0, 0, \kappa^{\otimes}_{X,3}, \ldots, \kappa^{\otimes}_{X,k} \right) \right)' H_k\left( \Sigma^{-1/2}(x-\mu)\,|\,\mathbf{I} \right) \right) \varphi(x|\mu,\Sigma),$$

where the Hermite polynomial $H_k(z|\mathbf{I})$ corresponds to $Z \in N(0,\mathbf{I})$ with standard Gaussian density $\varphi(z|0,\mathbf{I})$; with respect to our earlier notation, $H_k(z|\mathbf{I}) = H_k(z)$. The value of $H_k\left( \Sigma^{-1/2}(z-\mu)|\mathbf{I} \right)$ in the above expansion is obtained simply by plugging $\Sigma^{-1/2}(z-\mu)$ into $H_k$. An instance is

$$H_2\left( \Sigma^{-1/2}(z-\mu)\,|\,\mathbf{I} \right) = \left( \Sigma^{-1/2}(z-\mu) \right)^{\otimes 2} - \mathrm{vec}\,\mathbf{I} = \Sigma^{-1/2\,\otimes 2}\left( (z-\mu)^{\otimes 2} - \kappa^{\otimes}_{2} \right),$$

which implies that $H_2\left( \Sigma^{-1/2}(z-\mu)|\mathbf{I} \right)$ corresponds to $\Sigma^{-1/2\,\otimes 2} H_2(Z-\mu)$ when $Z \in N(0,\Sigma)$.
The product

$$\left( \Sigma^{-1/2} \right)^{\otimes k} B_k\left( 0, \kappa^{\otimes}_{X,2}, \ldots, \kappa^{\otimes}_{X,k} \right) = \left( \Sigma^{-1/2} \right)^{\otimes k} EX^{\otimes k} = E\left( \Sigma^{-1/2}X \right)^{\otimes k}$$

equals $B_k\left( \kappa^{\otimes}_{Y,1}, \kappa^{\otimes}_{Y,2}, \kappa^{\otimes}_{Y,3}, \ldots, \kappa^{\otimes}_{Y,k} \right)$, where $Y = \Sigma^{-1/2}(X-\mu)$, i.e. Y is standardized: $EY = 0$, $\mathrm{Var}\,Y = \mathbf{I}$.
In general we have that the third-order central moment and cumulant coincide, $EY^{\otimes 3} = \kappa^{\otimes}_{Y,3}$, and also

$$EY^{\otimes 3} = B_3\left( 0, 0, \kappa^{\otimes}_{Y,3} \right) = \kappa^{\otimes}_{Y,3}.$$

Furthermore, it is worth noting that the polynomials $B_3$, $B_4$, and $B_5$ have the following simple forms:

$$B_3\left( 0, 0, \kappa^{\otimes}_{Y,3} \right) = \kappa^{\otimes}_{Y,3}, \qquad B_4\left( 0, 0, \kappa^{\otimes}_{Y,3}, \kappa^{\otimes}_{Y,4} \right) = \kappa^{\otimes}_{Y,4}, \qquad B_5\left( 0, 0, \kappa^{\otimes}_{Y,3}, \kappa^{\otimes}_{Y,4}, \kappa^{\otimes}_{Y,5} \right) = \kappa^{\otimes}_{Y,5}$$

(see Sect. A.1, p. 351); the pattern of the sixth and higher-order Bell polynomials changes,

$$B_6\left( 0, 0, \kappa^{\otimes}_{Y,3}, \kappa^{\otimes}_{Y,4}, \kappa^{\otimes}_{Y,5}, \kappa^{\otimes}_{Y,6} \right) = \kappa^{\otimes}_{Y,6} + \mathbf{L}^{-1}_{2_3}\,\kappa^{\otimes 2}_{Y,3}, \tag{4.77}$$

hence $B_6\left( 0, 0, \kappa^{\otimes}_{Y,3}, \kappa^{\otimes}_{Y,4}, \kappa^{\otimes}_{Y,5}, \kappa^{\otimes}_{Y,6} \right) \neq \kappa^{\otimes}_{Y,6}$. The symmetry of $EY^{\otimes 6}$ and of $B_6$ is the same; therefore $B_6$ is 6-symmetric, hence we can symmetrize both sides by $S_{d1_6}$; taking $S_{d1_6}\,\mathbf{L}^{-1}_{2_3}\,\kappa^{\otimes 2}_{Y,3}$, we obtain

$$B_6\left( 0, 0, \kappa^{\otimes}_{Y,3}, \kappa^{\otimes}_{Y,4}, \kappa^{\otimes}_{Y,5}, \kappa^{\otimes}_{Y,6} \right) = \kappa^{\otimes}_{Y,6} + 10\,\kappa^{\otimes 2}_{Y,3}.$$

Similarly, by applying $S_{d1_7}$ we obtain

$$B_7\left( 0, 0, \kappa^{\otimes}_{Y,3}, \kappa^{\otimes}_{Y,4}, \kappa^{\otimes}_{Y,5}, \kappa^{\otimes}_{Y,6}, \kappa^{\otimes}_{Y,7} \right) = \kappa^{\otimes}_{Y,7} + \mathbf{L}^{-1}_{1_3,1_4}\,\kappa^{\otimes}_{Y,3} \otimes \kappa^{\otimes}_{Y,4} = \kappa^{\otimes}_{Y,7} + 35\,\kappa^{\otimes}_{Y,3} \otimes \kappa^{\otimes}_{Y,4},$$

and also by $S_{d1_8}$ we have

$$B_8\left( 0, 0, \kappa^{\otimes}_{Y,3}, \kappa^{\otimes}_{Y,4}, \kappa^{\otimes}_{Y,5}, \kappa^{\otimes}_{Y,6}, \kappa^{\otimes}_{Y,7}, \kappa^{\otimes}_{Y,8} \right) = \kappa^{\otimes}_{Y,8} + \mathbf{L}^{-1}_{1_3,1_5}\,\kappa^{\otimes}_{Y,3} \otimes \kappa^{\otimes}_{Y,5} + \mathbf{L}^{-1}_{2_4}\,\kappa^{\otimes 2}_{Y,4} = \kappa^{\otimes}_{Y,8} + 56\,\kappa^{\otimes}_{Y,3} \otimes \kappa^{\otimes}_{Y,5} + 35\,\kappa^{\otimes 2}_{Y,4}.$$

We rewrite the density as

$$f_X(x) = \left( 1 + \sum_{k=3}^{5} \frac{1}{k!}\,\kappa^{\otimes\prime}_{Y,k}\, H_k\left( \Sigma^{-1/2}(x-\mu)\,|\,\mathbf{I} \right) + \sum_{k=6}^{8} \frac{1}{k!}\, B_k\left( 0, 0, \kappa^{\otimes}_{Y,3}, \ldots, \kappa^{\otimes}_{Y,k} \right)' H_k\left( \Sigma^{-1/2}(x-\mu)\,|\,\mathbf{I} \right) \right) \varphi(x|\mu,\Sigma) + O,$$

where $Y = \Sigma^{-1/2}(x-\mu)$; therefore, one can conclude that $f_X(x)$ depends on $\mu$ and $\Sigma$ only through the Gaussian parts of this expression.
From now on let $Y = \Sigma^{-1/2}(X-\mu)$; therefore $\kappa^{\otimes}_{Y,1} = 0$, $\mathrm{Var}\,Y = \mathbf{I}_d$, and $\varphi(y) = \varphi(y|0,\mathbf{I})$. The density of Y, i.e. the density of the standardized X, writes as

$$f_Y(y) = \left( 1 + \sum_{k=3}^{5} \frac{1}{k!}\,\kappa^{\otimes\prime}_{Y,k}\, H_k(y) + \sum_{k=6}^{8} \frac{1}{k!}\, B_k\left( 0, 0, \kappa^{\otimes}_{Y,3}, \ldots, \kappa^{\otimes}_{Y,k} \right)' H_k(y) \right) \varphi(y) + O. \tag{4.78}$$
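In one dimension, (4.78) is the familiar Gram–Charlier A series built on the probabilists' Hermite polynomials. The sketch below (Python with NumPy/SciPy; it assumes standardized data and truncates after the kurtosis term, so it is a rough illustration of (4.78), not the book's eighth-order expansion) evaluates the truncated density $f_Y(y) \approx \left(1 + \frac{\kappa_3}{3!}He_3(y) + \frac{\kappa_4}{4!}He_4(y)\right)\varphi(y)$.

```python
import numpy as np
from scipy.stats import norm

def gram_charlier_pdf(y, kappa3, kappa4):
    """Univariate Gram-Charlier density truncated at order 4, for standardized y."""
    he3 = y**3 - 3*y                 # He_3(y)
    he4 = y**4 - 6*y**2 + 3          # He_4(y)
    return (1 + kappa3 / 6 * he3 + kappa4 / 24 * he4) * norm.pdf(y)

# example: sample skewness and excess kurtosis plugged in as kappa_3, kappa_4
rng = np.random.default_rng(0)
x = rng.gamma(shape=4.0, size=100_000)
y = (x - x.mean()) / x.std()
k3, k4 = np.mean(y**3), np.mean(y**4) - 3.0
grid = np.linspace(-3, 5, 9)
print(np.round(gram_charlier_pdf(grid, k3, k4), 4))
```

As Remark 4.9 below warns, the series need not converge and the truncated expression can go negative in the tails; it is used here only to motivate the skewness and kurtosis statistics of later chapters.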

If Y has the density (4.78), then we calculate $EH_3(Y)$ as follows:

$$EH_3(Y) = \int_{\mathbb{R}^d} H_3(y)\, f_Y(y)\, dy = \int_{\mathbb{R}^d} H_3(y)\left( 1 + \sum_{k=3}^{5} \frac{1}{k!}\,\kappa^{\otimes\prime}_{Y,k}\, H_k(y) + \frac{1}{6!}\, B_6\left( 0, 0, \kappa^{\otimes}_{Y,3}, \ldots, \kappa^{\otimes}_{Y,6} \right)' H_6(y) \right)\varphi(y)\, dy + O.$$

The orthogonality of Hermite polynomials implies that we only have to consider the expected value of

$$H_3(Y)\, H_3(Y)'\,\kappa^{\otimes}_{Y,3} = \mathrm{vec}\left( H_3(Y)\, H_3(Y)'\,\kappa^{\otimes}_{Y,3} \right) = \left( \kappa^{\otimes\prime}_{Y,3} \otimes \mathbf{I}_{d^3} \right)\mathrm{vec}\left( H_3(Y)\, H_3(Y)' \right) = \left( \kappa^{\otimes\prime}_{Y,3} \otimes \mathbf{I}_{d^3} \right) H_3(Y)^{\otimes 2}$$

with respect to the density $\varphi(y)$, which is equivalent to assuming Gaussianity of Y; hence we use $Z \in N(0,\mathbf{I})$ instead of Y. Consider

$$EH_3(Z)^{\otimes 2} = \mathrm{Cum}_2\left( H_3(Z) \right) = \mathbf{M}^{-1}_{m_3}\,\kappa^{\otimes 3}_{Z,2} = \mathbf{M}^{-1}_{m_3}\,(\mathrm{vec}\,\mathbf{I}_d)^{\otimes 3},$$

where $\mathrm{Cum}_2\left( H_3(Z) \right)$ has been given by (4.57) (see (A.8), p. 356 for the commutator $\mathbf{M}^{-1}_{m_3}$).
Once again we use the orthogonality of Hermite polynomials, $EH_3(Z) \otimes H_k(Z) = 0$ for $k \neq 3$; therefore we obtain

$$EH_3(Y) = \frac{1}{3!}\, EH_3(Z)\, H_3(Z)'\,\kappa^{\otimes}_{Y,3} = \frac{1}{3!}\left( \kappa^{\otimes\prime}_{Y,3} \otimes \mathbf{I}_{d^3} \right) EH_3(Z)^{\otimes 2} = \frac{1}{3!}\left( \kappa^{\otimes\prime}_{Y,3} \otimes \mathbf{I}_{d^3} \right)\mathbf{M}^{-1}_{m_3}\,(\mathrm{vec}\,\mathbf{I}_d)^{\otimes 3} = \kappa^{\otimes}_{Y,3},$$

cf. the proof in Sect. 4.8.2.
We also have $EH_3(Y) = B_3\left( 0, 0, \kappa^{\otimes}_{Y,3} \right) = \kappa^{\otimes}_{Y,3}$, and we show that this is true in general as well.

Lemma 4.4 Let the Gram–Charlier series expansion (4.78) of the density $f_Y$ be valid; then for $k \geq 3$ we have

$$EH_k(Y) = B_k\left( 0, 0, \kappa^{\otimes}_{Y,3}, \ldots, \kappa^{\otimes}_{Y,k} \right), \tag{4.79}$$

where $H_k$ denotes the vector Hermite polynomial of a standard Gaussian vector variate $Z \in N(0,\mathbf{I})$.
See Sect. 4.8.2 for the proof.
If we consider the variance of $H_3(Y)$, which is $EH_3(Y)^{\otimes 2} - \left( EH_3(Y) \right)^{\otimes 2}$, then we need only derive $EH_3(Y)^{\otimes 2}$, since $EH_3(Y)$ has already been given.
First, we linearize $H_3(Z)^{\otimes 2}$ by (4.75):

$$H_3(Z)^{\otimes 2} = H_6(Z) + \mathbf{L}^{-1}_{2,H_4}\left( \kappa^{\otimes}_{Z,2} \otimes H_4(Z) \right) + \mathbf{L}^{-1}_{2,2,H_2}\left( \kappa^{\otimes 2}_{Z,2} \otimes H_2(Z) \right) + \mathbf{M}^{-1}_{m_3}\,\kappa^{\otimes 3}_{Z,2},$$

where $\kappa^{\otimes}_{Z,2} = \mathrm{vec}\,\mathbf{I}_d$. Then we apply (4.79) and obtain

$$EH_3(Y)^{\otimes 2} = \kappa^{\otimes}_{Y,6} + \mathbf{L}^{-1}_{2_3}\,\kappa^{\otimes 2}_{Y,3} + \mathbf{L}^{-1}_{2,H_4}\left( \mathrm{vec}\,\mathbf{I}_d \otimes \kappa^{\otimes}_{Y,4} \right) + \mathbf{M}^{-1}_{m_3}\,(\mathrm{vec}\,\mathbf{I}_d)^{\otimes 3}; \tag{4.80}$$

note that $EH_2(Y) = 0$.


Similarly, we consider $H_4(Z)^{\otimes 2}$ by (4.76):

$$H_4(Z)^{\otimes 2} = H_8(Z) + \mathbf{L}^{-1}_{2,H_6}\left( \kappa^{\otimes}_{2} \otimes H_6(Z) \right) + \mathbf{L}^{-1}_{2,2,H_4}\left( \kappa^{\otimes 2}_{2} \otimes H_4(Z) \right) + \mathbf{L}^{-1}_{2,2,2,H_2}\left( \kappa^{\otimes 3}_{2} \otimes H_2(Z) \right) + \mathbf{M}^{-1}_{m_4}\,\kappa^{\otimes 4}_{2},$$

and obtain

$$EH_4(Y)^{\otimes 2} = \kappa^{\otimes}_{Y,8} + \mathbf{L}^{-1}_{1_3,1_5}\left( \kappa^{\otimes}_{Y,3} \otimes \kappa^{\otimes}_{Y,5} \right) + \mathbf{L}^{-1}_{2,H_6}\left( \mathrm{vec}\,\mathbf{I}_d \otimes \kappa^{\otimes}_{Y,6} \right) + \mathbf{L}^{-1}_{2_3}\,\kappa^{\otimes 2}_{Y,3} + \mathbf{L}^{-1}_{2_4}\,\kappa^{\otimes 2}_{Y,4} \tag{4.81}$$
$$\qquad + \mathbf{L}^{-1}_{2,2,H_4}\left( \mathrm{vec}\,\mathbf{I}_d \otimes \kappa^{\otimes}_{Y,4} \right) + \mathbf{M}^{-1}_{m_4}\,(\mathrm{vec}\,\mathbf{I}_d)^{\otimes 4}.$$

In closing this section we remark:

Remark 4.9 Above we have derived the Gram–Charlier series of a density $f_X(x)$. Although this series diverges in many cases of interest (by a theorem of Cramér it converges only if $f_X(x)$ falls off faster than $e^{-x^2/4}$ at infinity), we will nevertheless use the formulae of this section to obtain ideas for estimating and testing skewness and kurtosis.
The following family of distributions,

$$g(y) = c(\theta)\exp\left( \sum_{j=1}^{k}\theta_j' H_j(y) \right)\varphi(y),$$

leads to a distribution of the form (4.78), since after a series expansion of the exponent we can use the linearization formula ((4.74), p. 224) of T-Hermite polynomials. As follows from Lemma 4.4, the coefficients are necessarily those of formula (4.79).

4.8 Appendix

4.8.1 Proof of Theorem 4.2

Proof Recall the fact that


   
Da⊗(n−1)
1:(n−1)
ψ X1:n , a1:n] = ψ (X1:n , a1:n ) Pn−1 X1:(n−1) , a1:n ,
  
where the vector polynomial Pn−1 X1:(n−1) , a1:n is of dimension d1:(n−1) . By
using the formula (4.33)
 *  
Pn−1 X1:(n−1) , a1:n *a = Hn−1 X1:(n−1)
1:n =0

and after simplification, one can express it as


 
Pn−1 X1:(n−1) , a1:n
= ψ −1 (X1:n , a1:n ) Da⊗n−1
1:(n−1)
ψ (X1:n , a1:n )
 
1  
n
−1
   

=ψ X1:(n−1) , a1:(n−1) exp −an Xn + ak Ck,n an + an Cn,k ak
2
k=1
(  )
1  
n
⊗n−1
   

×Da1:(n−1) ψ X1:(n−1) , a1:(n−1) exp an Xn − ak Ck,n an + an Cn,k ak
2
k=1
 
  1 
n−1
  
= ψ −1 X1:(n−1) , a1:(n−1) exp ak Ck,n an + an Cn,k ak
2
k=1
(  )
1  
n−1
  
×Da⊗n−1
1:(n−1)
ψ X1:(n−1) , a1:(n−1) exp − ak Ck,n an + an Cn,k ak ,
2
k=1

where the summation is taken j |k ≤ n − 1, i.e. either j or k is smaller than n.


We reorder the T-differentials, see Proposition 2.1, and differentiate it by Da⊗n first
and get
 *
Da⊗n Pn−1 X1:(n−1) , a1:n] *a
1:n =0

−1
= K(1,n)((1, n)S d1:n ) Da⊗(n−1)
S 1:(n−1)
⎡ ⎛ ⎞⎤*
 *
  *
× ⎣ψ X1:(n−1) , a1:(n−1) ⎝− Cn,j aj ⎠⎦**
j =1:(n−1) *
a1:n =0
  
−1
= −K(1,n)S
((1, n)S d1:n ) K−1
(2:j ) (2 : j )S dn , d1:(n−1)
S
j =1:(n−1)
   
× vec Cj,n ⊗ Hn−2 (X(1:(j −1),(j +1):(n−1)) ) .

Now by the definition


*   *
*
Hn (X1:n ) = Da⊗n ψ (X1:n , a1:n )*a = Da⊗n Da⊗n−1 ψ (X 1:n 1:n *
, a )
1:n 1:n =0 1:(n−1) a1:n =0
   *
= Da⊗n ψ (X1:n , a1:n ) Pn−1 X1:(n−1) , a1:n *a
1:n =0

= Hn−1 (X1:(n−1) ) ⊗ Xn
  
−K−1
(1,n)S ((1, n)S d1:n ) K−1
(2:j ) dn , d1:(n−1)
S
j =1:(n−1)

   
× vec Cj,n ⊗ Hn−2 X(1:(j −1),(j +1):(n−1)) .

4.8.2 Proof of (4.79)

Proof First we prove (4.79) for $n = 3$:

$$EH_3(Y) = \frac{1}{3!}\left( \kappa^{\otimes\prime}_{Y,3} \otimes \mathbf{I}_{d^3} \right)\mathbf{M}^{-1}_{m_3}\,(\mathrm{vec}\,\mathbf{I}_d)^{\otimes 3}.$$

Let us consider $\mathbf{M}^{-1}_{m_3}$:

$$\mathbf{M}^{-1}_{m_3} = \mathbf{K}_{(135246)} + \mathbf{K}_{(135264)} + \mathbf{K}_{(135426)} + \mathbf{K}_{(135624)} + \mathbf{K}_{(135462)} + \mathbf{K}_{(135642)}$$

(see (A.8), p. 356), and we realize that

$$\mathbf{M}^{-1}_{m_3} = 3!\left( \mathbf{I}_{d^3} \otimes S_{d1_3} \right)\mathbf{K}_{(135246)};$$

moreover, by Exercise 1.33, p. 56, we have

$$\mathbf{K}_{(135246)}\,\mathrm{vec}^{\otimes 3}\,\mathbf{I}_d = \mathrm{vec}\,\mathbf{I}_{d^3}.$$

Now we apply these formulae and obtain

$$\left( \kappa^{\otimes\prime}_{Y,3} \otimes \mathbf{I}_{d^3} \right)\mathbf{M}^{-1}_{m_3}\,(\mathrm{vec}\,\mathbf{I}_d)^{\otimes 3} = 3!\left( \kappa^{\otimes\prime}_{Y,3} \otimes \mathbf{I}_{d^3} \right)\left( \mathbf{I}_{d^3} \otimes S_{d1_3} \right)\mathbf{K}_{(135246)}\,(\mathrm{vec}\,\mathbf{I}_d)^{\otimes 3} = 3!\left( \kappa^{\otimes\prime}_{Y,3} \otimes S_{d1_3} \right)\mathrm{vec}\,\mathbf{I}_{d^3} = 3!\,\mathrm{vec}\left( S_{d1_3}\,\kappa^{\otimes}_{Y,3} \right) = 3!\,\kappa^{\otimes}_{Y,3},$$

since $\kappa^{\otimes}_{Y,3}$ is 3-symmetric.
The same formulae are valid in general:

$$\mathbf{K}_{p_{2k}}\,\mathrm{vec}^{\otimes k}\,\mathbf{I}_d = \mathrm{vec}\,\mathbf{I}_{d^k},$$

where $p_{2k}$ is the permutation of $1:2k$ given by $p_{2k} = (1, 3, \ldots, 2k-1, 2, 4, \ldots, 2k)$, i.e. the first $k$ places are taken by the ordered odd numbers and then by the even ones. Moreover,

$$\mathbf{M}^{-1}_{m_k} = k!\left( \mathbf{I}_{d^k} \otimes S_{d1_k} \right)\mathbf{K}_{p_{2k}};$$

therefore we can repeat the steps above and get

$$EH_k(Y) = \frac{1}{k!}\, EH_k(Z)\, H_k(Z)'\, B_k\left( 0, 0, \kappa^{\otimes}_{Y,3}, \ldots, \kappa^{\otimes}_{Y,k} \right) = \frac{1}{k!}\left( B_k\left( 0, 0, \kappa^{\otimes}_{Y,3}, \ldots, \kappa^{\otimes}_{Y,k} \right)' \otimes \mathbf{I}_{d^k} \right)\mathbf{M}^{-1}_{m_k}\,(\mathrm{vec}\,\mathbf{I}_d)^{\otimes k} = B_k\left( 0, 0, \kappa^{\otimes}_{Y,3}, \ldots, \kappa^{\otimes}_{Y,k} \right).$$

Q.E.D.
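The permutation identity $\mathbf{K}_{p_{2k}}\mathrm{vec}^{\otimes k}\,\mathbf{I}_d = \mathrm{vec}\,\mathbf{I}_{d^k}$ is easy to verify numerically. The sketch below (Python/NumPy) uses the convention, assumed here, that $\mathbf{K}_p$ reorders the Kronecker factors of a $d^{2k}$-vector so that the $j$th new factor is the old factor $p(j)$; with $p_{2k} = (1, 3, \ldots, 2k-1, 2, 4, \ldots, 2k)$ the check passes for small $d$ and $k$.

```python
import numpy as np

def kron_pow(v, k):
    out = np.array([1.0])
    for _ in range(k):
        out = np.kron(out, v)
    return out

def permute_factors(v, d, perm):
    """Apply the commutator K_perm to a Kronecker product of len(perm) factors of dim d."""
    return v.reshape((d,) * len(perm)).transpose(perm).reshape(-1)

d, k = 3, 2
vec_I = np.eye(d).reshape(-1)                                   # vec I_d
lhs = kron_pow(vec_I, k)                                        # vec^{⊗k} I_d
p2k = list(range(0, 2 * k, 2)) + list(range(1, 2 * k, 2))       # (1,3,...,2k-1,2,4,...,2k), 0-based
rhs = np.eye(d ** k).reshape(-1)                                # vec I_{d^k}
assert np.allclose(permute_factors(lhs, d, p2k), rhs)
```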

4.9 Exercises

All random variables below are jointly normal, centralized (EXj = 0), with
covariances σj k , and cumulants κ ⊗
j,k .
4.1 Let {Y, Z, X1:n } be a Gaussian system and a, b real numbers, then show

Hn+1 (aY + bZ, X1:n ) = aHn+1 (Y, X1:n ) + bHn+1 (Z, X1:n ).

4.2 Let Y ∈ Rd be normal random variable with EY = 0, VarY = I, show that


E (Y Y)2 = d (d + 2).
4.3 Take random variables X1:6 and a partition L = (b1 , b2 ), where b1 = (1 : 3),
and b2 = (4 : 6). Use Example 1.31. p. 45 and show
    
EH3 Xb1 H3 Xb2 = σ1k1 σ2k2 σ3k3 ,
3!

where the summation is taken over all permutations (k1 , k2 , k3 ) of numbers (4 : 6).
4.4 Assume that the random variables $X_{1:k}$ are jointly normal; show that

$$\mathrm{Cum}\left( H_2(X_1), H_2(X_2), \ldots, H_2(X_k) \right) = 2^{k-1}\sum_{(k-1)!}\sigma_{1,m_1}\sigma_{m_1,m_2}\cdots\sigma_{m_{k-1},1},$$

and in particular

$$\mathrm{Cum}_k\left( H_2(X) \right) = 2^{k-1}(k-1)!\,\sigma^{2k}.$$

4.5 Show that for any k ≥ 1,

Cum (H2 (X1 , X2 ) , H2 (X3 , X4 ) , . . . , H2 (X2k−1 , X2k ) , Y1 ) = 0.

4.6 Show that

Cum (H2 (X1 , X2 ) , H2 (X3 , X4 ) , . . . , H2 (X2k−1 , X2k ) , Y1:2 )


   
= Cov Xjm , Xjn .
2k k! (jm ,jn )∈K II

Hint: See Example 4.16.


4.7 If m ≥ 3 then show that
 
Cum H2 (X1,2 ), H2 (X3,4 ), . . . , H2 (X2k−1,2k ), Y1:m = 0.

4.8 Show that


  
Cum H3 (Y1:3 ) , X4:(3+k) = δk,3 Cov(Y1 , Xi1 )Cov(Y2 , Xi2 )Cov(Y3 , Xi3 ).
3!

4.9 Show that

Cum (H3 (Y1:3 ) , H2 (X1:2 ) , H1 (X3 )) = Cum (H3 (Y1:3 ) , X1:3 ) .



4.10 Show that



Cum (H3 (Y1:3 ) , H3 (Z1:3 )) = Cov(Y1 , Zi1 )Cov(Y2 , Zi2 )Cov(Y3 , Xk1 ).
3!

4.11 Show that

Cum (H3 (Y1:3 ) , H3 (Z1:3 ) , X1:2 )



= Cov(Y1 , Zi1 )Cov(Y2 , Zi2 )Cov(Y3 , Xk1 )Cov(Xk2 , Zi3 ).
3!·2·3

Hint: Consider the structure of all possible closed diagrams without loops.
4.12 Use expression (4.29) and show that

H32 (X) = H6 (X) + 9σ 2 H4 (X) + 18σ 4H2 (X) + 6σ 6 .

4.13 Take the fourth-order derivative Da⊗4


1:4
of generator function  (X1:n , a1:n )
(see (4.32), p. 202) and show formula H4 (X1:4 ) given in Sect. A.5.2, p. 365, then
conclude
 
H4 (X) = X⊗4 − L12 ,21 κ ⊗X,2 ⊗ X ⊗2
+ L22 κ ⊗2
X,2


= X⊗4 − 6κ X,2 ⊗ X⊗2 + 3κ ⊗2
2 ,

where commutators are given in Sect. A.2, p. 353.


4.14 Express H4 (X1:4 ) and H4 (X) in terms of Hermite polynomials with lower
degree.
4.15 Find the expected value of T-product
⊗
h (x1:3 ) = E (xk + iXk ) .
k=1:3

and show h (X1:3 ) = H3 (X1:3 ) .


4.16 Use either the additivity of H2 or the recurrence relation and show that
 
H2 (X1 + X2 ) = H2 (X1 ) + H2 (X2 ) + Id + K−1
(2,1) H2 (X1 , X2 ) .

4.17 Show that

H2 (AX1 + BX2 ) = A⊗2 H2 (X1 ) + (A ⊗ B) H2 (X1 , X2 ) + (B ⊗ A) H2 (X2 , X1 )


+ B⊗2 H2 (X2 ) .

4.18 Show that

Hn+1 (AY + BZ, X1:n ) = (A ⊗ Id n ) Hn+1 (Y, X1:n ) + (B ⊗ Id n ) Hn+1 (Z, X1:n ) .

4.19 Show that


     
H3 X−X = H3 (X−μ) + H3 μ − X + 3H2,1 X−μ, μ − X
 
+ 3H1,2 X−μ, μ − X .

4.20 Consider the quadratic form $X'\mathbf{A}X$, where X is Gaussian with $EX = 0$ and covariance matrix $\mathbf{C}$ ([Hol96a, MP92]), and where $\mathbf{A}$ can be taken symmetric (since the asymmetric part of a quadratic form contributes zero). Show that

$$EX'\mathbf{A}X = \mathrm{tr}\,\mathbf{CA}, \qquad E\left( X'\mathbf{A}X \right)^{k} = (2k-1)!!\,\kappa_{2}^{\otimes k\,\prime}\, S_{d1_{2k}}\,\mathrm{vec}^{\otimes k}\,\mathbf{A},$$

and

$$\mathrm{Cum}_k\left( X'\mathbf{A}X \right) = 2^{k-1}(k-1)!\,\mathrm{tr}\,(\mathbf{CA})^{k}.$$

4.21 Show that


 
H3 (Y)⊗2 = H6 (Y)+L−1
H 4,2 (H 4 (Y) ⊗ κ 2 )+L−1
H 2,2,2 H 2 (Y) ⊗ κ ⊗2 −1 ⊗3
Y,2 +Lm3 κ Y,2 ,

where commutators L−1 −1 −1


H 2,2,2 , LH 4,2 , and Lm3 are given in Sect. A.2.2.2.
4.22 Assume we have a sample X1:n for a Gaussian vector X ∈ N (μ, ). Denote
X and  3 the mean and the sample variance matrices, this latter one be positive
definite. Show that

      
−1/2 1/2 ⊗3 
3
3 −1/2 3 1 3
H3  X − X = Sd13   Hk,3−k
k=0
(n − 1) (3−k)/2 k

  
−1/2
×  −1/2 (X − μ) ,  μ−X .
X

4.23 Let variates X and Y be jointly Gaussian, show that

E (X|Y) = μX + CX,Y C−1


Y,Y (Y−μY ) ,

VX|Y = Var (E (X|Y)) = CX,Y C−1


Y,Y CY,X .

4.24 Let variates X and Y be jointly Gaussian, show that

E (Hn (X) |Y ) = Hn (E (X|Y )) .

4.25 Let variates X and Y be jointly Gaussian, show that


   
E X3 |Y = 3E (X|Y ) E (X − E (X|Y ))2 + E3 (X|Y ) = H3 (E (X|Y )) .

4.26 Let variate Y1:3 be Gaussian, show that

E (H3 (Y1:3 )|Y3 ) = H3 (E (Y1 |Y3 ) , E (Y2 |Y3 ) , Y3 ).

4.10 Bibliographic Notes

Isserlis’ theorem (nowadays also called Wick’s theorem) on moments of Gaus-


sian systems and of Hermite polynomials was started by Isserlis, [Iss18] (see
[MNBO09]).
Szegő in his classical book [Sze36] (see also [PS72]) considers Hermite polyno-
mials as a part of general orthogonal polynomials. For orthogonal polynomials of
several variables see Dunkl and Xu [DX14].
The books by Erdelyi et al. [EMOT81] and by Abramowitz and Stegun [AS92]
contain general formulae for Hermite polynomials.
Expressing the product of several Hermite polynomials as a linear combination of Hermite polynomials, i.e. linearization, was started as early as Watson [Wat38] and Feldheim [Fel40] and continued by Carlitz [Car62]; the resulting formulae are complicated, although from a computational point of view they become simple in some cases.
The Hermite polynomials of several variables are closely connected to Wick
calculus, which is important in the study of stochastic (ordinary and partial)
differential equations. The general constructions of Wick polynomials in Gaussian
random functions can be found in the review by Dobrushin and Minlos [DM77], J.
Glimm and A. Jaffe [GJ12], where one can find the diagram technique as well.
The formulae that allow one to compute higher-order moments of the multi-
variate normal distributions and Hermite polynomials in terms of its covariance
structure using diagrams are given by Malyshev [Mal80], Malyshev and Minlos
[MM85], Peccati and Taqqu [PT11].
Deep results based on Hermite polynomials in non-central limit theorems and
multiple Wiener-Itô integrals have been published by Taqqu [Taq75], Dobrushin
[DM79b], Dobrushin and Major [DM79a] and Major [Maj81]. Concerning Wiener
measure see a paper by McKean [McK52]. A bibliography on multivariate normal
integrals is given by Gupta [Gup63].

Recently polynomial chaos expansions comprising multivariate Hermite orthog-


onal polynomials in dependent Gaussian random variables have been studied by
Peccati and Taqqu [PT11], and Rahman [Rah17].
Hermite polynomials play fundamental roles in asymptotic expansions for
distributions including Gram–Charlier and Edgeworth expansions, see [Wit84],
and for the third-order Edgeworth expansions [Wit00] gives the bi- and trivariate
polynomials up to order six.
The Hermite polynomials of complex-valued Gaussian systems have been defined in [Hid80].
N-Dimensional Hermite Polynomials have been studied by Grad [Gra49] and
Holmquist [Hol96a], for further generalization see [Wit00]. The expectations and
cumulants of products of multiple normal variables are derived by Holmquist
[Hol88] and [Hol96b]. An explicit expression for the polynomials in the bivariate
case and for orders less than or equal to six are presented by Barndorff-Nielsen
and Pedersen [BNP79] and by Amari and Kumon [AK83] for general dimensions.
General Hermite and Laguerre two-dimensional (2D) polynomials have been
considered by Wunsche [Wün00a, Wün00b].
Some results on general and multivariate Hermite polynomials have been
published by Terdik [Ter02] and Jammalamadaka S. Rao, Subba Rao, and Terdik
[JST06].
Multivariate Hermite polynomials are of interest in connection with Gram–
Charlier, Edgeworth, and saddle-point expansions, for this see Chamber [Cha67],
Blinnikov [BM98, BA17] and [Dha18].
Cramér’s theorem on the divergence of Gram–Charlier series can be found in
Cramér [Cra99].
Chapter 5
Multivariate Skew Distributions

Abstract In this chapter we provide a systematic treatment of several multivariate


skew distributions. General formulae for cumulant vectors at least up to the fourth
order are given, which are necessary for deriving the corresponding skewness and
kurtosis measures discussed later in Sects. 6.1, 6.4, and 6.5.
In the first section all cumulants of multivariate skew-normal distributions and
canonical fundamental skew-normal distributions are given. The Inverse Mill’s ratio
and the central folded normal distribution are also discussed.
Section 5.1 is devoted to skew-spherical distributions. We start with two impor-
tant classes of symmetric distributions, namely the spherically symmetric distribu-
tions and elliptically symmetric distributions, deriving in turn all their cumulants.
Moment and cumulant parameters for spherical distributions are introduced, as
they play an important role in the study of cumulants of multivariate elliptically
contoured distributions. A canonical fundamental skew-spherical distribution is
defined in a similar way to the canonical fundamental skew-normal distribution,
and the first four cumulants are provided.
Cumulants for multivariate skew-t, scale mixtures of skew normal, multivariate
skew-normal-Cauchy, and multivariate Laplace distributions are given in Sects. 5.4–
5.6. The method in each case uses T-derivatives and T-cumulants.

5.1 The Multivariate Skew-Normal Distribution

The Inverse Mill’s (i-Mill’s) ratio is used to derive the moments and cumulants of
skew-normal distributions in the following section.


5.1.1 The Inverse Mill’s Ratio and the Central Folded Normal
Distribution

Let Z denote the standard normal random variable, and |Z| the central folded normal (or half-normal) variate with characteristic function

$$\varphi_{|Z|}(\lambda) = 2\exp\left( -\tfrac{1}{2}\lambda^2 \right)\Phi(i\lambda),$$

where $\Phi$ denotes the univariate standard normal distribution function. The cumulant generating function of |Z| is

$$\psi_{|Z|}(\lambda) = \log 2 - \tfrac{1}{2}\lambda^2 + \log\Phi(i\lambda).$$
The ratio of the standard normal density $\varphi(\lambda)$ to the corresponding distribution function $\Phi(\lambda)$, i.e. the function

$$\rho_M(\lambda) = \frac{\varphi(\lambda)}{\Phi(\lambda)},$$

is called the Inverse Mill's ratio of the standard normal distribution, and we shall refer to it as the i-Mill's ratio. We will see the connection between this i-Mill's ratio $\rho_M$ and the derivatives of the cumulant generating function $\psi_{|Z|}$ if we take the derivative of the cumulant generating function:

$$\frac{d}{d\lambda}\psi_{|Z|}(\lambda) = -\lambda + i\rho_M(i\lambda).$$

Let us denote the function $\rho_M(i\lambda)$ by $\rho_{iM}(\lambda)$ for convenience. Then the derivatives of $\psi_{|Z|}$ and $\rho_{iM}$ of order higher than 2 coincide up to a factor of $i$, i.e.

$$\psi^{(k)}_{|Z|}(\lambda) = i\rho^{(k-1)}_{iM}(\lambda), \qquad k > 2. \tag{5.1}$$

We can easily obtain a clear formula for $\rho^{(n)}_{iM}(0)$, i.e. for the cumulants of |Z|, in terms of moments, since the moments of the central folded normal distribution are well known and the derivatives of the i-Mill's ratio $\rho_{iM}$ are equal to the derivatives of the cumulant generating function.
Proposition 5.1 The moments $E|Z|^n$ and the cumulants $\mathrm{Cum}_n(|Z|)$ of |Z| are given by

$$\mu_{|Z|,2k+1} = E|Z|^{2k+1} = \sqrt{\tfrac{2}{\pi}}\,(2k)!!, \qquad \mu_{|Z|,2k} = E|Z|^{2k} = (2k-1)!!, \tag{5.2}$$

and

$$\kappa_{|Z|,1} = \sqrt{\tfrac{2}{\pi}}, \qquad \kappa_{|Z|,2} = 1 - \tfrac{2}{\pi}, \qquad \kappa_{|Z|,n} = (-i)^{n-1}\rho^{(n-1)}_{iM}(0), \quad n > 2,$$

respectively. If $n > 2$, then the derivatives of the i-Mill's ratio at zero, $(-i)^{n-1}\rho^{(n-1)}_{iM}(0)$, can be expressed in terms of the moments $\mu_{|Z|,j}$:

$$(-i)^{n-1}\rho^{(n-1)}_{iM}(0) = n!\sum_{r=1}^{n}(-1)^{r-1}(r-1)!\sum_{\substack{\sum\ell_j = r,\ \sum j\ell_j = n}}\ \prod_{j=1}^{n-r+1}\frac{1}{\ell_j!}\left( \frac{\mu_{|Z|,j}}{j!} \right)^{\ell_j}.$$

Proof It is well known that the moments of the modulus of the standard univariate normal distribution are given by

$$E|Z|^n = \sqrt{\tfrac{2}{\pi}}\,2^{(n-1)/2}\,\Gamma\left( \tfrac{n+1}{2} \right),$$

so that (5.2) follows. The cumulants $\kappa_{|Z|,k}$ can be directly calculated via these moments (see (3.27), p. 131). At the same time, if $n > 2$ then the $(n-1)$th derivative of the i-Mill's ratio $\rho_{iM}(\lambda)$ coincides, up to the factor $i$, with the $n$th derivative of the cumulant generating function $\psi_{|Z|}$, and hence with the $n$th cumulant $\kappa_{|Z|,n}$. So the assertion follows.
We show some particular cases in the following example.

Example 5.1 We have the first- and second-order cumulants $\kappa_{|Z|,1} = \mu_{|Z|,1} = \sqrt{2/\pi}$ and $\kappa_{|Z|,2} = \mu_{|Z|,2} - \mu^2_{|Z|,1} = 1 - 2/\pi$, as usual. Now we shall use the i-Mill's ratio for deriving the cumulants and compare the result to the formula (3.27), p. 131, using moments.
First let us take the derivative of the normal density, $\varphi^{(1)}(i\lambda) = \lambda\varphi(i\lambda)$; therefore $\rho^{(1)}_{iM}(\lambda) = \rho_M(i\lambda)\left( \lambda - i\rho_M(i\lambda) \right) = \rho_{iM}(\lambda)\left( \lambda - i\rho_{iM}(\lambda) \right)$. Now we consider the third- and fourth-order derivatives of the cumulant function. The third-order derivative is

$$\frac{d^3}{d\lambda^3}\psi_{|Z|}(\lambda) = i\rho^{(2)}_{iM} = i\left[ (\lambda - 2i\rho_{iM})\rho^{(1)}_{iM} + \rho_{iM} \right] = i\left[ (\lambda - 2i\rho_{iM})(\lambda - i\rho_{iM})\rho_{iM} + \rho_{iM} \right] = i\rho_{iM}\left[ 1 + (\lambda - 2i\rho_{iM})(\lambda - i\rho_{iM}) \right],$$

where we keep using the notation $\rho_{iM}(\lambda) = \rho_M(i\lambda)$. If we set $\lambda = 0$, then we obtain

$$\rho^{(2)}_{iM}(0) = \rho_{iM}(0)\left( 1 - 2\rho^2_{iM}(0) \right),$$

and the cumulant

$$\kappa_{|Z|,3} = 2\left( \sqrt{\tfrac{2}{\pi}} \right)^{3} - \sqrt{\tfrac{2}{\pi}}$$

follows. Conversely, we can first use the formula (3.22), p. 130, then plug the moments (5.2) in, and get

$$\kappa_{|Z|,3} = \mu_{|Z|,3} - 3\mu_{|Z|,2}\mu_{|Z|,1} + 2\mu^{3}_{|Z|,1} = 2\left( \sqrt{\tfrac{2}{\pi}} \right)^{3} - \sqrt{\tfrac{2}{\pi}}.$$

Similarly, the fourth-order derivative of the cumulant function follows from

$$\rho^{(3)}_{iM} = (\lambda - 2i\rho_{iM})\rho^{(2)}_{iM} + 2\rho^{(1)}_{iM}\left( 1 - i\rho^{(1)}_{iM} \right),$$

which at zero is

$$\rho^{(3)}_{iM}(0) = 2i\rho^{2}_{iM}(0)\left( 3\rho^{2}_{iM}(0) - 2 \right).$$

Hence we obtain the fourth-order cumulant

$$\kappa_{|Z|,4} = 4\,\frac{2}{\pi} - 6\left( \frac{2}{\pi} \right)^{2}.$$

Using formula (3.25), p. 130, and (5.2) yields the same result:

$$\kappa_{|Z|,4} = \mu_{|Z|,4} - 4\mu_{|Z|,3}\mu_{|Z|,1} - 3\mu^{2}_{|Z|,2} + 12\mu_{|Z|,2}\mu^{2}_{|Z|,1} - 6\mu^{4}_{|Z|,1} = 4\,\frac{2}{\pi} - 6\left( \frac{2}{\pi} \right)^{2}.$$

See Sect. A.8, p. 378 for further derivatives of the i-Mill's ratio. The cumulants of |Z| are also listed there, in (A.38), p. 379.
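These low-order values are easy to confirm numerically from the moment sequence (5.2). The snippet below (Python; a side check, not the book's derivation) builds $\mu_{|Z|,1},\ldots,\mu_{|Z|,4}$ and converts them to cumulants with the standard moment-to-cumulant relations, reproducing $\kappa_{|Z|,3} = 2(2/\pi)^{3/2} - (2/\pi)^{1/2}$ and $\kappa_{|Z|,4} = 8/\pi - 24/\pi^2$.

```python
import numpy as np

sq = np.sqrt(2 / np.pi)
m1, m2, m3, m4 = sq, 1.0, 2 * sq, 3.0     # E|Z|^n from (5.2): sqrt(2/pi)(2k)!! and (2k-1)!!

k3 = m3 - 3 * m2 * m1 + 2 * m1**3
k4 = m4 - 4 * m3 * m1 - 3 * m2**2 + 12 * m2 * m1**2 - 6 * m1**4

print(np.isclose(k3, 2 * sq**3 - sq))                        # True
print(np.isclose(k4, 4 * (2/np.pi) - 6 * (2/np.pi)**2))      # True
```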

5.1.2 Skew-Normal Random Variates

Now we turn to the cumulants of a skew-normal distribution.
A d-dimensional random vector X has the multivariate skew-normal distribution $SN_d(\mu, \Omega, \alpha)$ with shape parameter $\alpha$ if its density function is

$$2\,\varphi(x; \mu, \Omega)\,\Phi\left( \alpha'(x-\mu) \right),$$

where $\Omega$ is a correlation matrix (in the sequel we shall assume that $\Omega > 0$), $\varphi$ is the normal density function, and $\Phi$ denotes the univariate standard normal distribution function.
Now we transform the shape parameter $\alpha$ into the skew parameter $\delta$ by

$$\delta = \frac{1}{\left( 1 + \alpha'\Omega\alpha \right)^{1/2}}\,\Omega\alpha. \tag{5.3}$$

This mapping is one to one, with inverse transformation

$$\alpha = \frac{1}{\left( 1 - \delta'\Omega^{-1}\delta \right)^{1/2}}\,\Omega^{-1}\delta.$$
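The reparametrization (5.3) and its inverse are convenient to have as helper functions. The sketch below (Python/NumPy, with the correlation matrix written as `Omega` following the notation above) converts between the shape parameter α and the skew parameter δ and checks that the two maps are mutual inverses.

```python
import numpy as np

def alpha_to_delta(alpha, Omega):
    """Skew parameter delta from shape parameter alpha, cf. (5.3)."""
    Oa = Omega @ alpha
    return Oa / np.sqrt(1.0 + alpha @ Oa)

def delta_to_alpha(delta, Omega):
    """Inverse transformation: shape parameter alpha from delta."""
    Oinv_d = np.linalg.solve(Omega, delta)
    return Oinv_d / np.sqrt(1.0 - delta @ Oinv_d)

Omega = np.array([[1.0, 0.4], [0.4, 1.0]])   # a correlation matrix
alpha = np.array([2.0, -1.0])
delta = alpha_to_delta(alpha, Omega)
assert np.allclose(delta_to_alpha(delta, Omega), alpha)
```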

The characteristic function of $SN_d(0, \Omega, \alpha)$ is given in terms of the skew parameter by

$$\varphi_X(\lambda) = 2\exp\left( -\tfrac{1}{2}\lambda'\Omega\lambda \right)\Phi\left( i\delta'\lambda \right),$$

and the corresponding cumulant generating function is

$$\psi_X(\lambda) = \log 2 - \tfrac{1}{2}\lambda'\Omega\lambda + \log\Phi\left( i\delta'\lambda \right).$$

Cumulants can be derived by taking the T-derivative of $\psi_X$. Let us observe that we can use the i-Mill's ratio, since

$$D_\lambda^{\otimes}\log\Phi\left( i\delta'\lambda \right) = i\,\frac{\varphi\left( i\delta'\lambda \right)}{\Phi\left( i\delta'\lambda \right)}\,\delta = i\rho_{iM}\left( \delta'\lambda \right)\delta.$$

We consider the first two cumulants. The first one follows from

$$D_\lambda^{\otimes}\psi_X = -\Omega\lambda + i\rho_{iM}\left( \delta'\lambda \right)\delta$$

at zero, where we plug in $\rho_{iM}(0) = 2/\sqrt{2\pi}$ and obtain

$$\kappa^{\otimes}_{X,1} = \mu^{\otimes}_{X,1} = \sqrt{\tfrac{2}{\pi}}\,\delta.$$

Next we take the derivative of $\rho_{iM}$,

$$D_\lambda^{\otimes}\rho_{iM}\left( \delta'\lambda \right) = \rho^{(1)}_{iM}\left( \delta'\lambda \right)\delta,$$

and obtain the second derivative of $\psi_X$ as

$$D_\lambda^{\otimes 2}\psi_X = -\mathrm{vec}\,\Omega + i\rho^{(1)}_{iM}\left( \delta'\lambda \right)\delta^{\otimes 2};$$


hence $\kappa^{\otimes}_{X,2}$ follows:

$$\kappa^{\otimes}_{X,2} = (-i)^{2}\, D_\lambda^{\otimes 2}\psi_X\Big|_{\lambda=0} = \mathrm{vec}\,\Omega - \frac{2}{\pi}\,\delta^{\otimes 2},$$

where again we use $\rho^{(1)}_{iM}(0) = -i\,2/\pi$ (see Sect. A.8, p. 378 for some more derivatives of the i-Mill's ratio).
Furthermore, we note that the derivatives $D_\lambda^{\otimes k}\psi_X$ at zero result in $\delta^{\otimes k}$ multiplied by a constant, say $c_k$, $k = 3, 4, \ldots$. For instance, the third-order one is

$$D_\lambda^{\otimes 3}\psi_X(0) = i\rho^{(2)}_{iM}(0)\,\delta^{\otimes 3} = c_3\,\delta^{\otimes 3}.$$

We conclude that the cumulants of order higher than 2,

$$\kappa^{\otimes}_{X,k} = (-i)^{k}\, D_\lambda^{\otimes k}\psi_X(\lambda)\Big|_{\lambda=0} = c_k\,\delta^{\otimes k},$$

do not depend on $\Omega$ but on $\delta$ only. We can apply the results of Proposition 5.1 to derive the coefficients $c_k$ of $\delta^{\otimes k}$; they are the derivatives of the i-Mill's ratio at zero, and therefore the cumulants $\kappa_{|Z|,k}$, i.e. $c_k = \kappa_{|Z|,k}$.
Lemma 5.1 The cumulants of the multivariate skew-normal distribution $SN_d(\mu, \Omega, \alpha)$ are

$$\kappa^{\otimes}_{X,1} = EX = \sqrt{\tfrac{2}{\pi}}\,\delta, \qquad \kappa^{\otimes}_{X,2} = \mathrm{vec}\,\Omega - \frac{2}{\pi}\,\delta^{\otimes 2},$$

and for $k = 3, 4, \ldots$

$$\kappa^{\otimes}_{X,k} = \kappa_{|Z|,k}\,\delta^{\otimes k},$$

where the $\kappa_{|Z|,k}$ are expressed explicitly by the $\mu_{|Z|,k}$ (see (5.2)).
Some special cases of $\kappa_{|Z|,k}$ are listed in (A.38), p. 379.
One can observe that the skew parameter $\delta$ can be expressed by the mean $\mu_X$, i.e. $\delta = \sqrt{\pi/2}\,\mu_X$; therefore all cumulants are determined by $\mu_X$, for instance

$$\kappa^{\otimes}_{X,3} = \left( 2 - \frac{\pi}{2} \right)\mu_X^{\otimes 3}. \tag{5.4}$$
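Lemma 5.1 makes the higher-order cumulant vectors trivial to assemble: only the scalar half-normal cumulants κ_{|Z|,k} and Kronecker powers of δ are needed. A small sketch (Python/NumPy; `Omega` and the helper names are ours) for the first four cumulant vectors of SN_d(0, Ω, α):

```python
import numpy as np

def kron_pow(v, k):
    out = np.array([1.0])
    for _ in range(k):
        out = np.kron(out, v)
    return out

def sn_cumulants(alpha, Omega):
    """kappa_1, ..., kappa_4 of SN_d(0, Omega, alpha), following Lemma 5.1."""
    delta = Omega @ alpha / np.sqrt(1.0 + alpha @ Omega @ alpha)   # (5.3)
    c = 2.0 / np.pi
    kz3 = 2 * c**1.5 - c**0.5          # kappa_{|Z|,3}
    kz4 = 4 * c - 6 * c**2             # kappa_{|Z|,4}
    k1 = np.sqrt(c) * delta
    k2 = Omega.reshape(-1, order="F") - c * kron_pow(delta, 2)
    k3 = kz3 * kron_pow(delta, 3)
    k4 = kz4 * kron_pow(delta, 4)
    return k1, k2, k3, k4

k1, k2, k3, k4 = sn_cumulants(np.array([2.0, -1.0]),
                              np.array([[1.0, 0.4], [0.4, 1.0]]))
```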

Remark 5.1 It is interesting to note that the multivariate skew-normal distribution is determined by the mean and the variance only, since all cumulants are given in terms of $\mu_X$. More precisely,

$$\kappa^{\otimes}_{X,k} = \kappa_{|Z|,k}\left( \frac{\pi}{2} \right)^{k/2}\mu_X^{\otimes k}, \qquad k \geq 3.$$

5.1.3 Canonical Fundamental Skew-Normal (CFUSN)


Distribution
 
The marginal stochastic representation of a random vector $X \in \mathbb{R}^d$ with distribution $CFUSN_{d,p}(\Delta)$ is given by

$$X = \Delta|Z_1| + \left( \mathbf{I}_d - \Delta\Delta' \right)^{1/2} Z_2, \tag{5.5}$$

where $\Delta$ is a $d \times p$ skewness matrix satisfying $\|\Delta a\| < 1$ for every unit vector $a$, while $Z_1 \in N(0, \mathbf{I}_p)$ and $Z_2 \in N(0, \mathbf{I}_d)$ are independent. The modulus of $Z_1$ is taken component-wise, i.e. $|Z_1| = \left( |Z_{1,1}|, \ldots, |Z_{1,p}| \right)'$. A simple construction of $\Delta$ is $\Delta = \Lambda\left( \mathbf{I}_p + \Lambda'\Lambda \right)^{-1/2}$, where $\Lambda$ is some $d \times p$ real matrix.
The expected value $EX$ and variance $\mathrm{Var}\,X$ can be obtained directly from Eq. (5.5). The expected value is

$$EX = \Delta\, E|Z_1| = \sqrt{\tfrac{2}{\pi}}\,\Delta\mathbf{1},$$

and using the independence of $Z_1$ and $Z_2$ the variance is

$$\mathrm{Var}\,X = \Delta\,\mathrm{Var}|Z_1|\,\Delta' + \mathbf{I}_d - \Delta\Delta',$$

where $\mathrm{Var}|Z_1|$ follows from the variance of the entries of $|Z_1|$: $\mathrm{Var}|Z_1| = \left( 1 - 2/\pi \right)\mathbf{I}_p$ (see Proposition 5.1). Hence we have

$$\mathrm{Var}\,X = \mathbf{I}_d - \frac{2}{\pi}\,\Delta\Delta'.$$

A location-scale extension of X is given by $\mu + \Sigma^{1/2}X$, where $\mu \in \mathbb{R}^d$ and $\Sigma$ is a $d \times d$ positive definite matrix. The first and second moments, as well as the higher-order moments and cumulants of this extension, can be obtained from those of X.
Another generalization of $CFUSN_{d,p}(\Delta)$ is obtained if we replace the standard $Z_2$ by a scaled $Z_2 \in N(0, \Sigma)$ in the model (5.5); we denote it by $CFUSN_{d,p}(\Delta, \Sigma)$.
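The stochastic representation (5.5) also gives a direct sampler, which is handy for checking the moment formulas by simulation. A minimal sketch (Python/NumPy and SciPy; the skewness matrix is written `Delta`, matching the Δ above, and the check uses the mean and variance formulas just derived):

```python
import numpy as np
from scipy.linalg import sqrtm

def sample_cfusn(Delta, n, rng):
    """Draw n samples of X = Delta|Z1| + (I - Delta Delta')^{1/2} Z2, cf. (5.5)."""
    d, p = Delta.shape
    S = np.real(sqrtm(np.eye(d) - Delta @ Delta.T))
    Z1 = rng.standard_normal((n, p))
    Z2 = rng.standard_normal((n, d))
    return np.abs(Z1) @ Delta.T + Z2 @ S.T

rng = np.random.default_rng(1)
Lam = np.array([[0.8, 0.1], [-0.3, 0.5], [0.2, 0.2]])                    # some d x p matrix
Delta = Lam @ np.real(np.linalg.inv(sqrtm(np.eye(2) + Lam.T @ Lam)))     # Delta = Lam (I + Lam'Lam)^{-1/2}
X = sample_cfusn(Delta, 200_000, rng)

print(np.allclose(X.mean(0), np.sqrt(2/np.pi) * Delta.sum(1), atol=0.01))              # EX
print(np.allclose(np.cov(X.T), np.eye(3) - (2/np.pi) * Delta @ Delta.T, atol=0.01))    # Var X
```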

 
Let Z1 ∈ N 0, Ip and Z2 ∈ N (0, ); then the characteristic function φX of
CF U SNd,p ( , ) is
   
φX (λ) = 2p exp −1/2λ λ p i λ , (5.6)

where p denotes the standard normal distribution function of dimension p.


Distribution function p is given by the product of
 standard normal distribution
function  of one variable, namely p (iω) = k=1:p  (iωk ). The cumulant
generating function

1  
ψX (λ) = p log 2 − λ λ + log p i λ , (5.7)
2
and we shall derive the cumulants by the T-derivative
*
Cumn (X) = (−i)n Dλ⊗n ψX (λ)*λ=0 ,

where, as usual,
!  "
∂ 
Dλ⊗ ψX = vec ψX (λ)  .
∂λ

It is seen from (5.12) that cumulants of order higher than 2 do not depend on but
 only.

5.1.3.1 Cumulants of CFUSN Distribution



* * 
Let us consider the higher-order cumulants of |Z| = |Z1 | , . . . , *Zp * . The variate
|Z| (modulus by entries) is central folded normal distributed with characteristic
function
 
1
φ|Z| (λ) = 2p exp − λ2 p (iλ) ,
2
 
where p (iλ) denotes the standard μ = 0,  = Ip multivariate cumulative
distribution function. The components
p of a standard distribution are independent;
therefore, we have p (iλ) = k=1  (iλk ), and the cumulant function of |Z| is
written in the form

1 p
ψ|Z| (λ) = p log 2 − λ2 + log  (iλk ) .
2
k=1

The T-derivatives of ψ|Z| (λ) can be expressed by i-Mill’s ratio, similarly to


the scalar-valued central folded normal distribution (see (5.1)), namely the first
derivative is


p 
p
Dλ⊗ ψ|Z| (λ) = −λ + Dλ⊗ log  (iλk ) = −λ + i ρiM (λk ) ek ,
k=1 k=1

where ek ∈ Rp is the kth unit coordinate vector. The second derivative is


p
Dλ⊗2 ψ|Z| (λ) = −vecIp + i ρiM (λk ) e⊗2
(1)
k .
k=1

(k)  
Observe that the derivatives ρiM λj of i-Mill’s ratio at zero do not depend
on index j . The first two moments, the expected value μ|Z| and variance
matrix  |Z| of |Z|, are* obtained√by substituting zero into these derivatives;
*
μ|Z| = −i Dλ⊗ ψ|Z| (λ)*λ=0 = 2/π1p , and  |Z| = − Dλ⊗ ψ|Z| (λ)*λ=0 =
(1 − 2/π) vecIp .
Higher-order T-derivatives have the following simple form


p
(n−1)  
Dλ⊗n ψ|Z| (λ) = i ρiM λj e⊗n
j , n > 2.
j =1

We introduce the vector


p
i⊗
p,n = e⊗n
k , (5.8)
k=1

for short notation of higher-order cumulants below. So the cumulants of |Z| are the
following:

Lemma 5.2 We have κ ⊗ |Z|,1 = μ|Z| = 2/π1, κ ⊗ |Z|,2 = vec |Z| =
(1 − 2/π) vecIp , and for n > 2
*
κ⊗ ⊗n * ρiM (0) i⊗
n−1 (n−1)
|Z|,n = (−i) Dλ ψ|Z| (λ) λ=0 = (−i)
n
p,n ,

(n−1)
where ρiM (λ) = ρM (iλ) is i-Mill’s ratio as before and ρiM (0) has clear form
in terms of μ|Z|,k (see Proposition 5.1).

Now let us turn to the cumulants of X with characteristic function (5.6). Let us
consider the first derivative
  i *
Dλ⊗ log p i λ = ⊗ *
  Dλ p (iy) y= λ
p (i λ)
i    *
=  p−1 iy[1:p]\k ϕ (iyk )*y= λ ek
p (i λ)
k
  *
 p−1 iy[1:p]\k ϕ (iyk )*y= λ
=i ek ,
p (i λ)
k

where ek ∈ Rp is the kth unit coordinate vector, since p is a product of standard


one variable distributions. In the above sum the nominator and denominator of the
ratio differ only by one term, namely by  (iyk ); hence, we have
  * *
p−1 iy[1:p]\k ϕ (iyk )*y= λ ϕ (iyk ) **
= = ρiM (yk )|y= λ .
p (i λ)  (iyk ) *y= λ

Therefore the derivative of ψX is expressed in terms of i-Mill’s ratio


*
    *
*
Dλ⊗ ψX = − λ + Dλ⊗ log p i λ = − λ + i ρiM (yk ) ek * ,
*
k y= λ

#  
ρiM (yk ) ek is a vector with coordinates ρiM (yk ), denoted by [ρiM (yk )]k=1:p .
k
The values of all ρiM (yk ) are equal at y = 0; hence we obtain the first cumulant
7
⊗ 2
κX,1 = μ⊗
X,1 = 1.
π

The first derivative of i-Mill’s ratio is necessary for the second derivative of ψX
* * *
Dλ⊗ ρiM (yk )*y= λ ek = i ρiM (yk )*y= λ ek ⊗ek = i ρiM (yk )*y= λ ⊗2 e⊗2
k .

We proceed with
 *
*
Dλ⊗2 ψX = −vec + i ⊗2 e⊗2
(1)
ρiM (yk )* k ,
y= λ
k

and the second cumulants follows from this


* 2 ⊗2  ⊗2
⊗ *
κX,2 = Dλ⊗2 ψX * = −vec +  ek
λ=0 π
k
2 ⊗2 2  
= −vec +  vecIp = vec − vec  .
π π

This latter form also implies the variance matrix

2
VarX = −  .
π
The third-order derivative of ψX does not depend on any more. We repeat the
calculation of second-order case and get
 *
*
Dλ⊗3 ψX (λ) = i ⊗3 e⊗3
(2)
ρiM (yk )* k . (5.9)
y= λ
k

(3)
We proceed using Proposition 5.1 for ρiM (0) and obtain
7   7  
2 4  2 4
⊗ ⊗3
κX,3 =− −1  ⊗3
ek = − − 1 ⊗3 i⊗
p,3 ,
π π π π
k

where


p
i⊗
p,3 = e⊗3
k ; (5.10)
k=1
 
in other words i⊗ ⊗2
p,3 = vec ek , with the matrix of vectors of e⊗2
k , which is
  k=1:p
e⊗2
k with dimension p2 × p. Let us introduce the notation ek = •,k for
k=1:p
the kth column vector of matrix ; then we have
7   7  
⊗ 2 4 ⊗3 2 4
κX,3 =− −1 •,k = − − 1 ⊗3 i⊗
p,3 .
π π π π
k

We have the following Lemma for general orders.


Lemma 5.3 The cumulants of canonical fundamental skew-normal distribution,
CF U SNd,p ( , ), are the following
7
2
κ⊗
X,1 = EX = 1p , (5.11)
π
2  
κ⊗
X,2 = vec − vec  , (5.12)
π
and for k = 3, 4 . . .

κ⊗ ⊗r ⊗
X,r = κ|Z|,r  ip,r , (5.13)

where i⊗
p,3 is defined in (5.10). Let us use the j th column vector •,j = ej of
matrix ; then


2  r
κX,r = κ|Z|,r
2
1p   1p , r > 2, (5.14)

where ( )r denotes the rth Hadamard power of  .


Some special values of κ|Z|,r are listed in (5.2) and (A.38), p. 379.
Proof We continue by induction based on the third-order derivative (5.9). We
consider the rth (r > 2) derivative of i-Mill’s ratio
*
* (r)   
= ρiM ek  λ ,
(r)
ρiM (yk )*  y= λ

and apply the T-derivative

(r)    (r+1)    
Dλ⊗ (ek )⊗r ρiM ek  λ = iρiM ek  λ (ek )⊗r ⊗ Ip ek
(r+1)   
= i (ek )⊗(r+1) ρiM ek  λ .

Then we obtain the (r + 1)th cumulant


 

ρiM (0) (ek )⊗r+1 = κ|Z|,r (ek )⊗r .
(r)
κX,r+1 = (−i)r
k k

Now, ek is the kth column of ; it is denoted by •,k . We obtain

2  ⊗r
  r

κX,r = κ|Z|,r
2
•,j ⊗r
•,k = κ|Z|,r
2
•,j •,k .
j,k j,k


The product •,j •,k is the (j, k)th entry of the matrix  ; therefore
 r

•,j •,k is the (j, k)th entry of the rth Hadamard power ( )r of  .
Hence


2  r
κX,r = 1p   1p .

5.2 Elliptically Symmetric and Skew-Spherical Distributions

Two important classes of symmetric distributions: spherically symmetric distribu-


tions and elliptically symmetric distributions are considered in the following section.

5.2.1 Elliptically Contoured Distributions

First we consider spherically symmetric distributions. A random variate W is spherically distributed if its distribution is invariant under rotations of $\mathbb{R}^d$, which is equivalent to the stochastic representation

$$W = R\,U, \tag{5.15}$$

where R is a non-negative random variable, U is uniform on the sphere $S_{d-1}$, and R and U are independent. The random variable R is called the generating variate, with generating distribution F, and the vector random variable U is the uniform base of the spherical distribution.
The characteristic function can be written in the form

$$\varphi_W(\lambda) = g\left( \lambda'\lambda \right),$$

where the function g is called the characteristic generator.
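Representation (5.15) also gives the standard simulation recipe: draw U by normalizing a standard Gaussian vector and multiply by an independent draw of R. A short sketch (Python/NumPy; the choice of R here, a chi variate with d degrees of freedom, is only an illustration and reproduces the multivariate standard normal case):

```python
import numpy as np

def sample_spherical(d, n, draw_R, rng):
    """Sample W = R U with U uniform on the sphere S_{d-1}, R independent of U."""
    G = rng.standard_normal((n, d))
    U = G / np.linalg.norm(G, axis=1, keepdims=True)   # uniform base
    R = draw_R(n, rng)                                  # generating variate
    return R[:, None] * U

rng = np.random.default_rng(2)
d = 4
# with R ~ chi_d (i.e. R^2 ~ chi^2_d) the result is a standard normal vector
W = sample_spherical(d, 100_000, lambda n, r: np.sqrt(r.chisquare(d, n)), rng)
print(np.allclose(np.cov(W.T), np.eye(d), atol=0.02))   # Var W = I_d in this special case
```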


An elliptically symmetric random variable $X \in E_d(\mu, \Sigma, g)$ is defined by the location-scale extension of W, so that

$$X = \mu + \Sigma^{1/2}\,W,$$

where $\mu \in \mathbb{R}^d$, $\Sigma$ is a variance-covariance matrix and W is spherically distributed. The cumulants of X are just the cumulants of W multiplied by a constant matrix, except the mean, which is $EX = \mathrm{Cum}_1(X) = \mu$. For $m \geq 1$ we will see below that

$$\mathrm{Cum}_{2m}(X) = \left( \Sigma^{1/2} \right)^{\otimes 2m}\mathrm{Cum}_{2m}(W), \qquad \mathrm{Cum}_{2m+1}(X) = 0.$$
Let us consider the series expansion


∞ j
 i
ψW (λ) = κ ⊗
W,j λ
⊗j
j!
j =1

of the cumulants generator function ψW (λ) = log φW (λ) = log g (λ λ) of W and
conclude that in this case cumulants are calculated by the T-derivative of the log
characteristic generator function

j ⊗j
  **
κ⊗
W,j = (−i) Dλ log λ λ * .
λ=0

According to the stochastic representation of W (see (5.15)), cumulants can be


calculated if either the generator function g or the distribution of generating variate
R is known.
Let us start with using the generator function g for deriving the cumulants of
W. Now, the characteristic generator g is a function of one variable, it corresponds
to the characteristic function at λ λ, therefore the series expansion of g includes
(−1)j instead of (i)j , see above, and we have

 (−1)j
g (u) = gj uj ,
j!
j =1

with coefficients gj = (−1)j g (j ) (0). This will play an important role later on.
Definition 5.1 Let us introduce the log-generator function f = log (g) and define
the “generator moment” by νk = (−1)k g (k) (0), together with the “generator
cumulant” by ζk = (−1)k f (k) (0).
Although neither the generator moments nor the generator cumulants correspond
to moments and cumulants of a random variate, respectively, Faà di Bruno’s formula
(see (3.27), p. 131) is valid and we can express generator cumulants in terms of
generator moments (and vice versa):


k  k 
 
k! νj j
ζk = (−1)r−1 (r − 1)! k , (5.16)
j =1 j ! j =1
j!
r=1 j =r,j j =k

where the second sum is taken over all sequences (types)  = (1 , . . . , k ), j ≥ 0,
such the following two conditions are satisfied:
#
• j = r.
#j =1
• j =1 j j = k.

5.2.1.1 Marginal Moments and Cumulants

The marginal characteristic function of the j th entry of W is given by the


 variable

   
λj = 0, . . . , λj , . . . , 0 . If we take the derivatives of φWj λj = g λ2j , it is
readily seen that the derivatives do not depend on j . Moreover, the odd moments
are zero. Let us introduce the notation κW,m , for the mth-order cumulant of Wj , and
apply the formula cumulants in terms of moments (see (3.27), p. 131) as follows.
In particular, we get the second-order moment in terms of the generator
moment EWj2 = μW,2 = 2ν1 , and the variance in terms of the generator
     
cumulant Cum2 Wj = κW,2 = 2ζ1 , since d/dλj g λ2j = −2λj g (1) λ2j , and
     
d 2 /dλ2j g λ2j = −2g (1) λ2j + λ2j g (2) λ2j ; hence, μW,2 = −2g (1) (0), also

f (2) (0) = −2f (1) (0). Observe ν1 = ζ1 , since EWj = 0. Connections between
both moments and generator moments and cumulants and generator cumulants are
stated in the following theorem.
Theorem 5.1 The odd moments μW,2m+1 = EWj2m+1 of Wj are zero and the even
ones are

(2m)!
μW,2m = 2m (2m − 1)!!νm = νm , (5.17)
m!
where νm denotes the generator moment.
Odd cumulants κW,2m+1 of Wj are zero as well, and the even ones are

(2m)!
κW,2m = 2m (2m − 1)!!ζm = ζm , (5.18)
m!
where ζm denotes generator cumulant. Even cumulants can be expressed in terms of
generator moments


m   
 
m! νj j
κW,2m = 2 (2m − 1)!!
m
(−1)r−1 (r − 1)!  ,
j =1 j ! j =1
j!
r=1 j =r,j j =m
(5.19)

where the second sum is taken over all sequences  = (1 , . . . ,  ), j ≥ 0, such
that the following two conditions are satisfied:
#
• j = r,
#j =1
• j =1 j j = m.
 
Proof The marginal characteristic function of entry Wj is g λ2j when λj =
   
0, . . . λj , . . . 0 . We differentiate g λ2j to obtain the moments of Wj in terms
of generator moments,

∂  2   
g λj = g (1) λ2j 2λj ,
∂λj
∂2  2   2  
g λj = g (2)
λ2
j 2λj + 2g (1)
λ2j .
∂λ2j
   
We proceed with noting that the coefficients of g (n) λ2j = ∂ n /∂λnj g λ2j are
j
scalar-valued series, say, bn , when we take the derivatives of a compound function

 
g λ2j . Therefore, we have
    n   (n−2)
g (n) λ2j = bnn g (n) λ2j 2λj + bnn−1 g (n−1) λ2j 2λj
  (n−2m)
+ bnn−m g (n−m) λj 2λj , (5.20)

n
where m = 2 , and coefficients bnk fulfill the following recursion:

bnn = 1
n
n−k n−k−1
bnn−k = 2bn−1 (n − 2k + 1) + bn−1 , k = 1, . . . , m = .
2
If n is odd, then the power of the last term in (5.20) is n − 2 [n/2] = 1; therefore,
n/2
g (n) (0) = 0; otherwise g (n) (0) = bn g (m) (0); hence
8
0 if n odd,
EWjn =
(−1)m b2m
m g (m)
(0) if n = 2m even.

 noticed that bn does not depend on g; hence, we may choose, say, gm (t) =
It can k
 been
exp −t (being a valid characteristic function) and derive coefficients b2m ; the
2

result is
m
b2m = 2m (2m − 1)!!;

hence, for n = 2m

EWj2m = (−1)m 2m (2m − 1)!!g (m) (0) .

Now (5.17) follows changing (−1)m g (m) (0) to the generator moment

(2m)!
EWj2m = νm . (5.21)
m!
Observe that the right-hand side does not depend on the index j ; therefore, all
marginals are distributed equally. Plugging the cumulant generator function f into
(5.20), it is readily seen that the odd generator cumulants ζm are also zero. The even
order cumulants for each Wj are

  (2m)!
Cum2m Wj = 2m (2m − 1)!!ζm = ζm .
m!
The general formula (5.27) is based on formula “cumulants in terms of moments”
for ζm in terms of νm (see (3.27), p. 131).

We use further derivatives of generator functions, and for m = 4 : 8 we obtain


 
κW,4 = μW,4 − 3μ2W,2 = 12ν2 − 12ν12 = 12 ν2 − ν12 ,
 
κW,6 = 23 5!! ν3 − 3ν2 ν1 + 2ν13 ,
 
κW,8 = 24 7!! ν4 − 4ν3 ν1 − 3ν22 + 12ν2 ν12 − 6ν14

(cf. Exercises and (5.19)). Now, the fourth-order cumulant of standardized Wj ,


usually called kurtosis, has the form
 
κW,4 ν2 − ν12 ζ2 Wj
2
=3 = 3 2 = Cum4 √ , (5.22)
κW,2 ν12 ν1 2ν1

for each entry Wj , where we observe two quantities: one is kurtosis (standardized
generator cumulant since ν1 = ζ1 )

ζ2
κ2 =
 , (5.23)
ν12

and the other one is

ν2 − ν12 ν2

μ2 = = − 1,
ν12 ν12

which contains the standardized generator moment ν2 /ν12 .


Both quantities 
μ2 and 
κ2 have the same value (see (5.22)). Kurtosis 
κ2 originally
used to be called kurtosis parameter (Muirhead kurtosis). Observe that parameter

μ2 depends on the generator moment of the standardized variate only, so it can be
called moment parameter. For m ≥ 1 we define the moment parameter by
νm

μm = − 1. (5.24)
ν1m

The ratio ζ2 /ν12 included in (5.23) is generalized for higher order by

ζm

κm = , (5.25)
ν1m

and it will be referred to as cumulant parameter. We have seen by (5.22) that



κ2 = μ2 .

Let us consider an example; take m = 3,


   
ζ3 ν3 ν2

κ3 = = −1 −3 −1 =
μ3 − 3
μ2 .
ν13 ν13 ν12

The following normalized cumulants of Wj , where cumulant parameters 


κm are
expressed in terms of moment parameters 
μm , explain our notations,
 
Wj
Cum2 √ = 1, (5.26)
2ν1
 
Wj
Cum4 √ = 3!!
κ2 = 3!!
μ2 ,
2ν1
 
Wj
Cum6 √ = 5!!
κ3 = 5!! (
μ3 − 3
μ2 ) ,
2ν1
   
Wj
Cum8 √ = 7!!
κ4 = 7!! 
μ4 − 4
μ3 − 3
μ22 + 6
μ2
2ν1

(see Exercise 5.4). We shall see later that the usage of moment and cumulant
parameters reduces the number of parameters for a multiple spherically distributed
random variate W significantly. The number of characteristics is halved for an
elliptically contoured distribution.
The cumulant parameters  κm can be expressed by moment parameters  μm in
higher orders as well.
Corollary 5.1 The moments of standardized Wj are zero for odd orders and
 2m
Wj μW,2m
E √ = = (2m − 1)!! (
μm + 1) ,
2ν1 (2ν1 )m

for even orders, where 


μm is the moment parameter (5.24).
The cumulants of standardized Wj are zero for odd orders and
 
Wj
Cum2m √ = (2m − 1)!!
κm ,
2ν1

for even orders, where 


κm is the cumulant parameter (5.25), such that


m     
m! μj + 1  j


κm = (−1)r−1(r − 1)!  . (5.27)
j =1 j ! j =1
j!
r=1 j =r,j j =m

The formula (5.27) is valid for all m ≥ 1, since we have seen earlier that 
μ1 = 0,

κ1 = 1.

It is also worth noticing that if all 


μj = 0 then (5.27) is zero, i.e. all 
κm = 0, cf.
(2.12), p. 66.
Proof The general formula (5.27) is based on formula “cumulants” ζm in terms of
“moments” νj , see (3.27), p. 131, hence
 j
ζm  m  m! 

νj

κm = m = (−1)r−1 (r − 1)!  j
,
j =1 j ! j =1 j !ν1
ν1
r=1 j =r,j j =m

j
we change the ratio νj /ν1 = 
μ2 + 1, and hence the assertion (5.27) follows.
Now let us turn to the stochastic representation (5.15) and consider the one
variable case Wj = RUj . We are interested in the even order moments

μW,2m = μR,2m μUj ,2m ,

where the even order moments EUj2m are given by (5.81); hence, we obtain

μR,2m
μW,2m = m
(2m − 1)!!. (5.28)
2 (d/2) m

On the other hand we can use the expression (5.17) and equating it to (5.28) we get

(2m)! μR,2m
νm = m (2m − 1)!!,
m! 2 (d/2)m

where (d/2)m = d/2 (d/2 + 1) · · · (d/2 + m − 1), see (A.24), p. 368 for Pochham-
mer’s symbol (d/2)m . Now we express the generator moment in terms of the
moment of the generating variate R

m! (2m)! μR,2m
νm = μR,2m = 2m .
(2m)! 22m m! (d/2)m 2 (d/2)m

The dependence of moment parameter  μm on the moment of the generating variate


R follows directly from the definition of 
μm and the above expression

νm (d/2)m μR,2m

μm = − 1 = −1
ν1m (d/2)m μm
R,2
1 μR,2m
= − 1, (5.29)
α (d, m) μm
R,2

where α (d, m) = (d/2)m / (d/2)m = (1 + 2/d) · · · (1 + 2 (m − 1) /d) =



k=2:m (1 + 2 (k − 1) /d) = (1)m,2/d .

Let Uj be a component of U; then the stochastic representation of W (5.15)


implies Wj = RUj ; therefore, the cumulants of Wj can be expressed either by the
moments of R or by the cumulants of R.
Example 5.2 Let us take the fourth-order cumulant κW,4 = κRU1 ,4 of W1 , say
(all Wj have the same distribution). We can apply the formula for cumulants via
moments (see (3.27), p. 131); then using the independency of R and Uj , we obtain
 2
κW,4 = μR,4 μU1 ,4 − 3 μR,2 μU1 ,2 .

Now we use the particular values for moments of Uj , see (5.81) and get

3   3 2 6 3
κW,4 = 2
κR 2 ,2 + κR 2
2 ,1 − 2 κR 2 ,1 = − 2 κR 2 ,1 + κ 2 .
d (d + 2) d d (d + 2) d (d + 2) R ,2
(5.30)

The cumulants of Wj are connected to the cumulants of R in general as well.


The cumulant Cumn (W ) is an nth order cumulant of the product RU1 of two
independent variates. A direct method of its calculation has been considered in
accordance with the conditional cumulants (see (3.71) p. 158). We use that formula
for our case and obtain:
Lemma 5.4 The even order cumulants of a component Wj of a spherically
distributed random variate W are given in terms of generating variate R as follows:


2n    1  κU ,j j
 n−r+1
κW,2n = (2n)! 2 , . . . , Rn
Cumr R{1 } , R{ 1
,
2} {n } j ! j!
r=1 j =r,j j =n j =1
j is even
(5.31)

where the summation is taken over all even order cumulants of U1 , since the odd
j
orders are zero and where R{ } corresponds to the block with cardinality j , which
j
includes power R j only (it implies listing R j consecutively j times).
Here the cumulants κU1 ,j of U1 are involved and in particular cases, they can be
evaluated explicitly to get the clear formula for κW1 ,2n .
Example 5.3 Let us take the fourth-order cumulant κW,4 = κRU1 ,4 of Wj , say. We
can apply the formula to cumulants (5.31) and obtain

κRU1 ,4 = κR 4 ,1 κU1 ,4 + 3κR 2 ,2 κU2 1 ,2 . (5.32)



Now we turn to the cumulants of U1 into moments and use formula (5.81) for the
moments
3
μU1 ,4 = ,
d (d + 2)
κU2 1 ,2 = μU1 ,2 = 1/d,
3 3 6
κU1 ,4 = μU1 ,4 − 3μ2U1 ,2 = − 2 =− 2 .
d (d + 2) d d (d + 2)

Finally, let us plug these into (5.32) and obtain

6 1
κRU1 ,4 = − κ 4 + 3 2 κR 2 ,2 . (5.33)
d 2 (d + 2) R ,1 d

One can easily show that (5.30) equals (5.33).

5.2.2 Multivariate Moments and Cumulants

The series expansion of the characteristic function



   i j ⊗j
φW (λ) = g λ λ = μ⊗ λ
j
j!
j =1

*
includes the T-moments of W through the coefficients μ⊗ = j D ⊗j φ (λ)*
j (−i) λ W *
λ=0
of λ⊗j .
Now, the characteristic generator g is a function of one variable with series
expansion

 (−1)j gj j
g (u) = u ,
j!
j =1

such that gj = (−1)j g (j ) (0) = (−1)j νj . Let us rewrite the characteristic function
using the series expansion of g (u) and obtain

   (−1)j gj   
φW (λ) = g λ λ = λ λ .
j!
j =1

Hence we can calculate the moments by

* ∞
 (−1)j gj  j **
*
(−i)k Dλ⊗k φW (λ)* = (−i)k Dλ⊗k λ λ * .
λ=0 j! λ=0
j =1

We observe that
*
* ∞
 *
* (−1)j gj ⊗k  
 j*
*
μ⊗ = (−i) k ⊗k
D φ W (λ) * = (−i)k Dλ λ λ *
k λ λ=0 j! *
j =1 λ=0

⎨ 0 if 2j = k,
= 1
⎩ gj cj if 2j = k,
j!

and vector cj does not depend on g. Let us use gj = (−1)j g (j ) (0) = νj , and
⊗2j
cj = Dλ (λ λ)j , hence

νm ⊗2m   m
μ⊗
W,2m = D λ λ .
m! λ
We have derived the connection between the generator moments and the marginal
moments by (5.17); now we apply it and obtain
μW1 ,2m  m μW1 ,2m ⊗2m   m
μ⊗
W,2m = Dλ⊗2m λ λ = D λ λ .
m!2 (2m − 1)!!
m (2m)! λ

The same argument applies to the cumulants generator function ψW (λ) =


log φW (λ) with series expansion

 i j ⊗j
ψW (λ) = κ ⊗ λ ,
j
j!
j =1

*
with coefficients κ ⊗ = n D ⊗j ψ (λ)*
j (−i) λ W * , and the cumulant generator function
λ=0
f = log (g) with series expansion

 (−1)j j
f (u) = fj u ,
j!
j =1

where fj = (−1)j f (j ) (0) = ζm . Now we obtain κ ⊗ ⊗2m


2m = ζm /m!Dλ (λ λ)m , and
since ζm is connected to the mth cumulant of a component of W by (5.18), we get
κW,2m  m κW,2m ⊗2m   m
κ⊗
W,2m = D ⊗2m λ λ = D λ λ .
2m m! (2m − 1)!! λ (2m)! λ

Recall that K{r|} denotes particular partitions with size r and type  (see page
95). Commutator L−1 r2 is defined by

L−1
r2 = K−1
p(K{r|} )
,
(2r−1)!!

where the summation is over all partitions K{r|} ∈ P2r , with type  =
(0, r, 0, . . . , 0), i.e. partitions K{r|} , and includes r blocks of numbers 1 : 2r, each
with cardinality 2. Index r2 corresponds to the entry 2 = r of , see Sect. A.2.1,
p. 353.
Summarizing the above calculations it follows:
Theorem 5.2 The moments and cumulants of odd orders of spherically distributed
W are zero; the moments of even orders are

μW,2m 
μ⊗
W,2m = L−1 vec⊗m Id = μW,2m vec⊗m Id ;
(2m − 1)!! m2

furthermore, the cumulants of even orders are

κW,2m 
κ⊗
W,2m = L−1 vec⊗m Id = κW,2m vec⊗m Id .
(2m − 1)!! m2

In terms of cumulant parameters the standardized cumulants


 
−1/2
Cum2m  W W = κm L−1
m2 vec
⊗m
Id (5.34)


where  −1/2 = 1/ 2ν1 Id . The formula (5.27) shows that  κm is a polynomial of 
μk ,
k = 2 : m. We denote this polynomial by am (
μ2 . . . , 
μm ) = 
κm ; hence,
 
Cum2m  −1/2 W = am ( μm ) L−1
μ2 . . . ,  m2 vec
⊗m
Id ;

in particular a1 = 1,

⎨μ2 , m = 2,
μ2 , . . . , 
am ( μm ) = μ3 − 3
μ2 , m = 3,

μ4 − 4
 μ3 − 3
μ22 + 6
μ2 , m = 4.

Proof The crucial point of the proof is understanding the derivative Dλ⊗2m (λ λ)m .
First we notice that
 m m
Dλ⊗2m λ λ = Dλ⊗2m fj (λ) ,
j =1

where fj (λ) = λ λ, that is we can use the general Leibnitz rule (2.40) and
symmetrize by Sd12m ,

  
  2m ⊗ ⊗kj
Dλ⊗2m j fj (λ) = Dλ fj (λ) , (5.35)
k1:m j
k1:m =2m


= (2m)!vec⊗m Id ,

since the only nonzero term is when kj = 2, j = 1 : m, then Dλ⊗2 fj (λ) = 2vecId .
Therefore, we have
 
2m m (2m)! m
2 = 2 = (2m)!,
21m (2!)m

terms in the sum (5.35), each equals (vecId )⊗m ; hence, the assertion follows.
Remark 5.2 Both higher-order moments μ⊗ ⊗
W,2m and cumulants κ W,2m assume
calculating the symmetrizer Sd12m first in practice. The actual calculation of the
symmetrizer for large dimension d and order 2m is really time-, memory-, and
space- consuming. One can get an efficient calculation evaluating the Dλ⊗2m (λ λ)m
by using the T-derivative step by step. An instance is
 2  
Dλ⊗4 λ λ = 23 Id 4 + K−1
(3214) + K −1 ⊗2 3 −1 ⊗2
(1324) vec Id = 2 L22 vec Id ,

where 3 commutator matrices are included instead of 24, which are necessary for
obtaining Sd14 . Further examples can be found in Sect. A.2, p. 353, Appendix.
Example 5.4 Deriving Dλ⊗6 (λ λ)3 means finding 15 partitions K{3|} , with size 3
and type  = (0, 3, 0, 0, 0, 0), i.e. splitting up the set 1 : 6 into three blocks, and
each block contains two elements. The partitions are in canonical form, and the
corresponding commutator L−1 32 is listed in Sect. A.2.1, p. 354. The result is

 3
Dλ⊗6 λ λ = 48L−1 ⊗3
32 vec Id .

We can calculate EW⊗2m directly from the expected values of the entries. The
nonzero entries of EW⊗2m are those where all terms in the product of entries of W
have even degrees, see (5.28).
Remark 5.3 We compare the expected values of the entries of W⊗2m to the expected
values of those in Z⊗2m , where Z is a standard normally distributed variate. The
nonzero entries of the expected value EZ⊗2m are products with even powers such

that


d 
d
E Zi2ki = (2ki − 1)!!,
i=1 i=1

where k1:d = m. At the same time we have EZ⊗2m in a vector form

EZ⊗2m = (2m − 1)!!Sd12m vec⊗m Id ,

see (4.54). We compare this to

EW⊗2m = EW12m Sd12m vec⊗m Id .

Hence, we conclude that the higher-order moments of an elliptic random vector W


differ from the moments of a normal one Z only in a constant EW12m .
The major difference shows up by comparing the cumulants, since higher than 2
order, cumulants of Z are zero but the cumulants of W are

Cum2m (W) = Cum2m (W1 ) Sd12m vec⊗m Id ,

in turn.
An interesting side result is that the vector $a = S_{d1_{2m}}\,\mathrm{vec}^{\otimes m}\,\mathbf{I}_d$ is symmetric ($a \in S_{d,2m} \subset M_{d,2m}$) and has the following structure. Let us use the multilinear indexing $j_{1:2m}$ for the entries of a (see Sect. 1.3.2), and denote by $\ell_i$ the number of repetitions, i.e. the multiplicity, of $i$, $i = 1, \ldots, d$, in the index $j_{1:2m}$ (cf. Remark 1.5, p. 20, type of a multi-index). If there exists an odd $\ell_i$, then $a_{j_{1:2m}} = 0$; otherwise, if all multiplicities $\ell_i$ of the multi-index $j_{1:2m}$ are even, then we write $\ell_i = 2k_i$ and

$$a_{j_{1:2m}} = \frac{1}{(2m-1)!!}\prod_{i=1}^{d}(2k_i - 1)!!.$$

Replacing the semifactorials by factorials and powers of 2, as in $(2m-1)!! = (2m)!/(2^m m!)$, we obtain

$$a_{j_{1:2m}} = \binom{m}{k_{1:d}}\binom{2m}{2k_{1:d}}^{-1},$$

since $\sum k_{1:d} = m$.
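This closed form avoids building the symmetrizer explicitly, which Remark 5.2 notes is expensive. A quick cross-check for small d and m (Python/NumPy; brute force over all (2m)! axis permutations versus the multiplicity formula, purely as a sanity check):

```python
import numpy as np
from itertools import permutations
from math import factorial
from collections import Counter

def double_fact(n):                    # n!! with the convention (-1)!! = 1
    return 1 if n <= 0 else n * double_fact(n - 2)

d, m = 2, 2
T = np.einsum("ij,kl->ijkl", np.eye(d), np.eye(d))           # vec^{⊗m} I_d as a 2m-way tensor
a_sym = sum(T.transpose(p) for p in permutations(range(2*m))) / factorial(2*m)

def a_formula(idx):
    mult = Counter(idx).values()
    if any(v % 2 for v in mult):
        return 0.0
    return np.prod([double_fact(v - 1) for v in mult]) / double_fact(2*m - 1)

check = all(np.isclose(a_sym[idx], a_formula(idx))
            for idx in np.ndindex(*(d,) * (2*m)))
print(check)   # True
```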
Example 5.5 Let the generating variate R be Gamma distributed with parameters
ϑ > 0, α > 0; then we have

ϑ r  (α + r)
ER r = ;
 (α)
266 5 Multivariate Skew Distributions

and the kurtosis parameter $\tilde\kappa_2$ is

$$\tilde\kappa_2 = \frac{d}{d+2}\,\frac{ER^4}{\left(ER^2\right)^2} - 1 = \frac{d}{d+2}\,\frac{\Gamma(\alpha+4)\,\Gamma(\alpha)}{\Gamma(\alpha+2)^2} - 1 = \frac{d}{d+2}\,\frac{\Gamma(\alpha)}{\Gamma(\alpha+2)}\,(\alpha+3)(\alpha+2) - 1 = \frac{d}{d+2}\,\frac{(\alpha+3)(\alpha+2)}{\alpha(\alpha+1)} - 1.$$
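The closed form of $\tilde\kappa_2$ above is easy to check by simulation. The following Python sketch (the parameter values and sample size are arbitrary choices) compares it with a Monte Carlo estimate of (d/(d+2))·ER⁴/(ER²)² − 1 for a Gamma generating variate.

import numpy as np

def kappa2_gamma(d, alpha):
    # closed form of Example 5.5: d/(d+2) * (alpha+3)(alpha+2)/(alpha(alpha+1)) - 1
    return d/(d + 2)*(alpha + 3)*(alpha + 2)/(alpha*(alpha + 1)) - 1

def kappa2_mc(d, alpha, theta=1.0, n=10**6, seed=1):
    rng = np.random.default_rng(seed)
    R = rng.gamma(shape=alpha, scale=theta, size=n)
    return d/(d + 2)*np.mean(R**4)/np.mean(R**2)**2 - 1

print(kappa2_gamma(5, 2.0), kappa2_mc(5, 2.0))   # the two values should be close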

5.2.3 Canonical Fundamental Skew-Spherical Distribution

We consider Canonical Fundamental Skew-Spherical distribution CFUSS as an


extension of Canonical Fundamental Skew-Normal (CFUSN) distribution (see
(5.5)).
Let W be spherically symmetrically distributed with dimension p + d, and with
generating variate R, i.e. W = RU, where U is a uniform random variate. Now


let us split W into two sub-vectors W = (W1', W2')', such that W1 ∈ Rp and W2 ∈ Rd. In this case we have W1 = β1RU1 and W2 = β2RU2, where β1² has
distribution Beta(p/2, d/2), and β22 = 1 − β12 is also distributed as Beta(d/2, p/2).
Moreover, variates R, β12 , U1 and U2 are independent.
Note that spherically symmetric distributions can be characterized not only by
generating variates R but by characteristic generators as well (see Sect. 5.2.1). In this
section we use generating variates R and express moments and cumulants in terms
of the moments and cumulants of R. We consider the moments and cumulants of β1
and β2 , which are constants in the model since they are Beta distributed depending
on dimensions only. Some of them are listed explicitly in terms of dimensions
in Sect. A.6.2, p. 374. Similarly, moments and cumulants of modulus of uniform
random multivariates |U| can be found in Lemma 5.12 and are included in formulae
below. In this section the dimensions p and d are fixed; therefore, we use the short notation Eβ1^m1 β2^m2 = Gm1,m2 (p, d) = Gm1,m2 for the joint moments, cf. (A.34), and similarly the function Gk (p) = Γ((p + k)/2)/Γ(p/2) is abbreviated as Gk, cf. (5.79).
Let us define a vector variate X with Canonical Fundamental Skew-Spherical distribution, X ∈ CFUSS_{d,p}(0, R, Δ), by the equation

$$X = \Delta|W_1| + \left(I_d - \Delta\Delta'\right)^{1/2} W_2, \qquad(5.36)$$

where the modulus is taken element-wise and Δ is a d × p skewness matrix


(assumed to be such that ‖Δ'a‖ < 1 for all ‖a‖ = 1, i.e. I_d − ΔΔ' is positive definite). An instance for Δ is Δ = Λ'(I_p + ΛΛ')^{−1/2} with some p × d real matrix Λ. We can rewrite the CFUSS variate

X in terms of uniform distribution as follows:


$$X = \beta_1 R\,\Delta|U_1| + \beta_2 R\left(I_d - \Delta\Delta'\right)^{1/2} U_2.$$
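The stochastic representation above also gives a direct way of simulating a CFUSS variate. The following Python sketch assumes a user-supplied sampler gen_R for the generating variate R (this function name and interface are illustrative only, not from the text).

import numpy as np
from scipy.linalg import sqrtm

def sample_cfuss(n, Delta, gen_R, rng=None):
    # X = beta1*R*Delta|U1| + beta2*R*(I_d - Delta Delta')^{1/2} U2, cf. (5.36);
    # beta1^2 ~ Beta(p/2, d/2), U1 and U2 uniform on the spheres S^{p-1} and S^{d-1}.
    rng = np.random.default_rng() if rng is None else rng
    d, p = Delta.shape
    R = gen_R(n, rng)
    beta1 = np.sqrt(rng.beta(p/2, d/2, size=n))
    beta2 = np.sqrt(1.0 - beta1**2)
    U1 = rng.standard_normal((n, p)); U1 /= np.linalg.norm(U1, axis=1, keepdims=True)
    U2 = rng.standard_normal((n, d)); U2 /= np.linalg.norm(U2, axis=1, keepdims=True)
    root = np.real(sqrtm(np.eye(d) - Delta @ Delta.T))
    return (beta1*R)[:, None]*(np.abs(U1) @ Delta.T) + (beta2*R)[:, None]*(U2 @ root.T)

For instance, gen_R = lambda n, rng: rng.gamma(2.0, size=n) reproduces the Gamma generating variate of Example 5.5 with ϑ = 1 and α = 2.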

Variable W2 is the product of independent variates W2 = Rβ2 U2 ; therefore, not


only the first-order moment is zero, μ⊗ ⊗
W2 ,1 = κ W2 ,1 = 0, but the odd higher-order
moments (and cumulants)

μ⊗ ⊗
W2 ,2k+1 = μβ2 ,2k+1 μR,2k+1 μU2 ,2k+1 = 0, k = 0, 1, . . . ,

as well, since the component U2 of W2 is uniform random variate on sphere Sp−1


(see Proposition 5.3).
The conditional independence of W1 and W2 under condition Q = [β1 , R] is
implied by their form. We will see that the cumulants κ ⊗
X,n of X will be simplified
by using conditional cumulants.
Let us start with some moments and cumulants of random vector X with CFUSS
distribution.
Case 5.1 (Expected Value) The first-order moment (expected value) and cumulant
of X are equal, i.e. κ ⊗ ⊗
X,1 = μX,1 ,

κ⊗ ⊗
X,1 = E |W1 | = μβ1 μR μ|U1 |,1 .

We refer (A.32), p. 374 for the moments of βk including μβ1 = G1,0 , and
Lemma 5.12 for
7
1 1
μ⊗|U1 |,1 = 1p ,
π G1

respectively, which results in


7
1 G1,0
κ⊗
X,1 = μR 1p .
π G1

Remark 5.4 We have seen that the first-order cumulant (and moment) of X is
expressed clearly in terms of dimensions (through G functions) and the expected
value of generating variate R besides the skewness matrix . Hereinafter we shall
express neither the moments, cumulants of βk , the T-moments, nor the T-cumulants
of |U1 | in detail. We collected the corresponding formulae in Sect. A.6, p. 368,
Appendix, instead. The calculation can be carried out by plugging these formulae
into the actual expressions.
The second- and higher-order moments and cumulants of X depend on the
higher-order moments and cumulants of vectors |W1 | and W2 including mixed
ones, for instance the variance of X depends on κ ⊗ ⊗ ⊗
|W1 |,2 , κ W2 ,2 and κ |W1 |,W2 . Let

us consider the cumulant κ ⊗


(|W1 |,W2 )

κ⊗
|W1 |,W2 = Cum2 (β1 R |U1 | , β2 RU2 ) (5.37)

and use the conditional cumulant according to Q = [β1 , R] (see Brillinger’s


theorem (3.67), p. 156)
 
κ⊗
|W1 |,W2 = Cum1 Cum2 (β1 R |U1 | , β2 RU2 |Q)
 
+ Cum2 Cum1 (β1 R |U1 | |Q) , Cum1 (β2 RU2 |Q) .

Now β1 R and β2 R can be pulled from the conditional cumulants first; then
we observe that |U1 | and U2 are independent from Q = [β1 , R]; therefore,
Cum2 (|U1 | , U2 |Q) = Cum2 (|U1 | , U2 ), which is zero since U1 and U2 are
independent. The second term is zero as well, since Cum1 (U2 |Q) = μ⊗U2 ,1 = 0;
hence, the result
 
κ⊗
(|W1 |,W2 ) = Cum1 β1 β2 R Cum2 (|U1 | , U2 |Q)
2

 
+ Cum2 β1 RCum1 (|U1 | |Q) , β2 RCum1 (U2 |Q) = 0.

A similar argument leads to the following:


Lemma 5.5 All mixed T-cumulants of |W1| and W2 conditionally on β1 and R are zero. All joint T-moments and joint T-cumulants of |W1| and W2 that contain an odd number of W2 factors are also zero.
Proof Random vectors |W1 | and W2 are conditional independent when Q =
[β1 , R] is given; therefore, their conditional cumulants are zero. The joint moments
of |W1 | and W2 that include odd number of W2 are zero because conditionally |W1 |
and W2 are independent; hence, we can separate the moments of |W1 | and W2 and
the odd order moments of W2 are zero. We remark that separation is possible using
commutator matrices. Cumulants are expressed by the product of moments and each
product has at least one term having odd number of W2 ; therefore, the result is zero.
Case 5.2 (Variance) We have seen that random vectors |W1 | and W2 are uncorre-
lated (see (5.37)); therefore,
μβ2 ,2 μR,2  
μ⊗ ⊗2 ⊗
X,2 = μβ1 ,2 μR,2  μ|U1 |,2 + Id 2 − ⊗2 vecId ,
d

see (5.85) for μ⊗


|U1 |,2 . For similar reasons the second-order cumulant is

 1/2⊗2
κ⊗
X,2 =  ⊗2 ⊗
κ |W1 |,2 + Id − 
κ⊗
W2 ,2 .

We use μ⊗ ⊗
|U1 |,2 given by (5.85) and the independence of the components of κ |W1 |,2
for the first term. The second term contains vec = ⊗2 vecIp , as follows:
 1/2 
 1 1/2⊗2 1  
Cum2 Id −  U2 = Id −  vecId = vec Id − 
d d
1  
= vecId − ⊗2 vecIp , (5.38)
d
where (1.4), p. 7 has been applied, namely
 1/2⊗2  1/2  1/2
Id −  vecId = vec Id −  Id Id −  (5.39)
 
= vec Id −  = vecId − ⊗2 vecIp .

We combine both terms above and obtain


 
κ⊗
X,2 = μ β 1 ,2 μR,2  ⊗2 ⊗
μ|U1 |,2 − μ 2
β 1 ,1 μ2
R,1 ⊗2 μ⊗2
|U1 |
μβ2 ,2  
+ μR,2 vecId − ⊗2 vecIp
d
  
1 1 1 1 1
= μβ1 ,2 μR,2 ⊗2 − vecIp + 1p2
p π G2 π G2
G2 μR,2 μβ2 ,2  
− μ2β1 μ2R 1 ⊗2 1p2 + vec Id − 
π d
   
1 1 1 μR,2 μβ2 ,2
= μβ1 ,2 μR,2 − − vec
p π G2 d
 
μβ1 ,2 μR,2 2  ⊗2 μR,2 μβ2 ,2
2 2 G1
+ − μβ1 μR 1p + vecId
πG2 π d
 ⊗2
= c1 vecId + c2 1p + c3 vec ,

Gk = Gk (p), where constants c1 , c2 , and c3 are defined by the previous line of the
formula. Hence, the variance–covariance matrix follows:

VarX = c1 Id + c2 1p 1p  + c3  . (5.40)

One can derive the variance in terms of moments as well.


We shall consider higher even order T-moments of W2 that will contain a term
 1/2⊗2k
similar to (5.39), namely Id −  vec⊗k Id . The method of simplifying it

is also similar
 1/2⊗2k  1/2⊗2 ⊗k
I −  (vecId )⊗k = Id −  (vecId ) (5.41)
   ⊗k
= vec⊗k Id −  = vecId − ⊗2 vecIp ,

where we used the mixed product rule for T-product (see (1.3), p. 6 and (5.39)).
Case 5.3 (Third-Order Cumulant) We use Lemma 5.5 for neglecting zero terms
and obtain
  1/2⊗2 
⊗ ⊗3 ⊗ −1
μX,3 =  μ|W1 |,3 + L12 ,11  ⊗ Id −  
μ⊗ ⊗2
|W1 |,W2

(see below for commutator L−1


12 ,11 and (A.2), p. 353). Direct computation shows that

μ⊗ ⊗
|W1 |,3 = μβ1 ,3 μR,3 μ|U1 |,3 ;

using the formulae of Lemma 5.12 we obtain


7
μR,3 μβ1 ,β 2 1
μ⊗ = μR,3 μβ1 ,β 2 μ⊗ ⊗ μ⊗ = 2
1p ⊗ vecId ;
|W1 |,W⊗2
2 2 |U1 |,1 U2 ,2 dG1 (p) π

hence,
μR,3 μβ ,β 2 7 1   
μ⊗ ⊗3 ⊗ L−1 1p ⊗ vecId − ⊗2 vecIp
1 2
X,3 = μR,3 μβ1 ,3  μ|U1 |,3 + dG1 (p) 1
π 2 1,1

see (5.86) for μ⊗


|U1 |,3 in detail.
We neglect zero terms and obtain the third-order cumulant as
  1/2⊗2 
κ⊗
X,3 =  ⊗3 ⊗
κ |W1 |,3 + L−1
12 ,11  ⊗ Id −  
κ⊗
|W1 |,W21 . 2

An explanation of the multilinearity commutator L−1


12 ,11 is the multilinearity property
of cumulants, since the crossproduct term of a sum in a third-order cumulant is the
following:

κ⊗
  + κ⊗ ⊗
(W2 ,|W1 |,W2 ) + κ W21 ,|W1 |
|W1 |,W212 2

 
= Id 3 + K−1
(312) + K −1 ⊗ −1 ⊗
(231) κ |W1 |,W21 = L12 ,11 κ |W1 |,W21 .
2 2

We use conditional cumulant (see Example 3.27, p. 157) for both terms. Let us apply
Brillinger’s theorem ((3.67), p. 156) with condition Q = [β1 , R] to
     
κ⊗ ⊗ −1 ⊗ ⊗ ⊗
|W1 |,3 = Cum1 κ |W1 |,3|Q + L12 ,11 Cum2 κ |W1 |,2|Q , κ |W1 |,1|Q + Cum3 κ |W1 |,1|Q
 
= μβ1 ,3 μR,3 κ ⊗
|U |,3 + κ 2 2 L −1
β R ,β R 1 ,1 κ ⊗
|U |,2 ⊗ κ ⊗ ⊗3
|U |,1 + κβ1 R,3 κ |U |,1
1 1 1 2 1 1 1 1

(see (A.2), p. 353, for commutator L−1


12 ,11 ). The crossproduct term simplifies as well:
neglecting zero terms,
 
κ⊗
|W1 |,W212 = Cum1 Cum3 (|W1 | , W2 , W2 |Q)
 
+L−111 ,12 Cum2 Cum1 (|W1 | |Q) , Cum2 (W2 |Q)
 
+Cum3 Cum1 (|W1 | |Q) , Cum1 (W2 |Q) , Cum1 (W2 |Q)
 
= L−1 ⊗ ⊗
11 ,12 Cum2 κ |W1 |,1|Q , κ |W1 |,2|Q ,

since there is conditional independence and zero expected values. We obtain


 
κ⊗ −1
|W1 |,W212 = L11 ,12 Cum2 Cum1 (β1 R |U1 | |Q) , Cum2 (β2 RU2 |Q)
 
= κβ R,β 2 R 2 L−1
1 ,1 κ ⊗
|U |,1 ⊗ κ ⊗
U ,2
1 2 1 2 1 2

κβ R,β 2 R 2 G1  
L−1
1
= √2 1 ,1 1p ⊗ vecId . (5.42)
d π 1 2

We can express the cumulants of the products of β1 and R in terms of individual


moments and cumulants (see (A.35), p. 375, Sect. A.6.2 for more details). Now we
have
  
κ⊗X,3 = ⊗3
μ β 1 ,3 μ R,3 κ ⊗
|U1 |,3 + κ 2 2 L−1
L−1
β1 R ,β1 R 12 ,11 12 ,11 κ ⊗
|U1 |,2 ⊗ κ ⊗
|U1 |,1

+κβ1 R,3 κ ⊗3
|U1 |,1 (5.43)
κβ1 R,β 2 R 2  
+√ 2
1p ⊗ vec Id −  .
2πd (d − 1)

One can also use the formula for expressing cumulants in terms of moments (see
(3.33)) and find the cumulant as

κ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = μX,3 − L12 ,11 μX,2 ⊗ μX,1 + μX,1 ,

where the moments are given above.



Case 5.4 (Fourth-Order Cumulant) Let us consider the moment first


  1/2⊗2 
μ⊗ = ⊗4 ⊗
μ|W1 |,4 + L−1
 ⊗2
⊗ Id − 
μ⊗
X,4 12 ,21 |W1 |⊗2 ,W⊗2
2
 1/2⊗4 
+ Id −  μ⊗ ⊗4 ,
W2

where commutator matrix L−1


12 ,21 is as before, cf. (A.5), p. 354.
The calculation of moments included in μ⊗ X,4 is straightforward by using
Lemma 5.12:

μ⊗ ⊗
|W1 |,4 = μR,4 μβ1 ,4 μ|U1 |,4 ;

we refer to (5.87) for details. Take the next term

μ⊗ = μR,4 μβ 2 ,β 2 μ⊗ ⊗
|U1 |,2 ⊗ μU2 ,2
|W1 |⊗2 ,W⊗2
2 1 2

μR,4 μβ 2 ,β 2  1 1 1  

= 1 2
vecIp + 1 2 − vecIp ⊗ vecId ,
d d π G2 (p) p

and finally

1
μ⊗ ⊗4 = μR,4 μβ2 ,4 μ⊗⊗4 = L−1 vec⊗2 vecId .
W2 U2 d (d + 2) 22

Now we summarize these terms and get

μ⊗ ⊗4 ⊗
X,4 = μR,4 μβ1 ,4  μ|U1 |,4
1 −1  


+μR,4 μβ 2 ,β 2 L1 ,2 vec 
1 2 d2 2 1

d    
  
+ vec 1p 1p  −  ⊗ vec Id − 
πG2 (p)
1  
+ L−1
22 vec
⊗2
Id −  .
d (d + 2)

Now the fourth-order cumulant has the form


 1/2⊗4
κ⊗
X,4 =  ⊗4 ⊗
κ |W1 |,4 + I d −  
κ⊗W2 ,4
  1/2⊗2 
+ L−1
12 ,21 
⊗2
⊗ Id −  κ⊗
|W1 | . (5.44)
12 ,W212

See Sect. 5.7.3 for the proof.



Let us define partitions K ∈ Pm , and L ∈ P2k , i.e. K is a partition of 1 : m


with blocks a, and L is a partition of 1 : 2k, so that each block b ∈ L has even
cardinality. Naturally K ∪ L ∈ Pm+2k . We recall notation 2 = Id −  .
Lemma 5.6 Let W1 and W2 be defined by the model CFUSS, cf. (5.36); then for
m, k ≥ 0 we have 2 = Id − 

(2k − 1)!!  ⊗m ⊗   
μ⊗
|W1 | = μR,2k+m μβ m ,β 2k  μ|U1 |,m ⊗ L−1 ⊗k
k2 vec 2 ,
1m ,W212k 1 2 k
2 (d/2)k
(5.45)

where the moment μβ m ,β 2k is given by G-function (μβ m ,β 2k = Gm,2k (p, d), cf.
1 2 1 2
(A.34), p. 375), T-moment μ⊗
|U1 |,m depends on p and m only (see Lemma 5.12)
and where L−1k2 denotes the moment commutator (see Sect. A.2, p. 353 for moment
commutators L−1lr:1 ).
The joint T-cumulant of |W1 | and W2 can be calculated by the following formula:
 
κ⊗
|W1 | = ϒs,t (Q) K−1
 
1m ,2 W212k p K{s} ,L{t}
t =1:k, K{s} ∈Pm ,
s=1:m L ∈P2k
{t}
⊗
× ⊗|aj | κ ⊗
|U1 |,|aj |
j =1:s
⊗ 
⊗ μU,|bj | L−1 ⊗k
|bj |2 vec 2 , (5.46)
j =1:t

where the summation goes over all K{s} ∈ Pm and L{t } ∈ P2k and where ϒs,t (Q)
is given by the cumulants
   
ϒs,t (Q) = κβ m β 2k ,k Cum|h| R |ai | , R |bj | |i,j ∈h .
1 2 h∈K
K∈Ps+t

The formula (5.46) is valid for m, k ≥ 0; if either m = 0 or k = 0 then the other


variable is missing from the summation.
See Sect. 5.7.4 for the proof.
There are some more ways to find cumulants κ ⊗|W1 |1m ,W212k ; instances are either
via moments (3.28), Malyshev’s formula (see (3.7), p. 148) or formula (3.71), p.
158, etc.
We state the following theorem on general formulae for T-moments and T-
cumulants of CF U SSd,p (0, R, ) distributions.

Theorem 5.3 Assume X fulfills Eq. (5.36); then


[n/2]
(2k − 1)!!
μ⊗
X,n = L−1
1n−2p ,212p μR,2k+m μβ m ,β 2k
1 2 2p (d/2)p
p=0
  
× ⊗(n−2p) μ⊗
|U1 |,m ⊗ L−1
p2 vec⊗p
Id −  
,

and


[n/2]   1/2⊗2p 
κ⊗
X,n = L−1
1n−2p ,212p  ⊗(n−2p)
⊗ Id 2 − ⊗2
κ⊗
|W1 | ,
1n−2p ,W212p
p=0
(5.47)

where [n/2] denotes the entire part of n/2.


Proof The cumulant κ ⊗
X,n is Sd,n -symmetric (see Property 3.1, p. 124); therefore
Sd1n
  1/2 
⊗ 
κ X,n = Cumn  |W1 | + I −  W2

n  
   
 n 1/2
 
= Cumn ( |W1 |)1n−m , I −  W2
m 1m
m=0
n  
  1/2⊗m
 n
= ⊗(n−m) ⊗ I −  κ⊗
|W1 |1n−2m ,W212m .
m
m=0

Cumulants κ ⊗
|W1 |1m ,W21n−m , where the number of W2 is odd, are zero by Lemma 5.5;
hence (5.47) follows, except the coefficients, which we shall prove separately. Let
us consider κ ⊗
X,3 first, put n = 3 in (5.47), and obtain
  
κ⊗ ⊗3 ⊗
X,3 =  κ |W1 |,3 + 3Sd13  ⊗ Id 2 − 
⊗2
κ⊗|W1 |,W21 . 2

The cumulant κ ⊗
|W1 |,W21 has been considered in Example 5.42; now let us consider
2
it together with its coefficient. In a general case the cumulant κ ⊗|W1 |1n−m ,W21m
⊗  
 1/2⊗2k
contains the cumulant κ U2 ,2k with coefficient I −  , namely
 1/2⊗2k  1/2⊗2k
I −  κ⊗
U2 ,2k = c Id − 
vec⊗k Id
 ⊗k
= c Id 2 − ⊗2 vec⊗k Id ,

where the constant c = Cum2k (U1 ), W1 ∈ Rp , and W2 ∈ Rd . We see that the


above calculation strongly depends on the particular cumulants of U2 (see (5.83)).

5.3 Multivariate Skew-t Distribution

We start with the multivariate t-distribution.

5.3.1 Multivariate t-Distribution

Multivariate W is t-distributed, W ∈ Mtd (p, 0, Id ), if


$$W = \sqrt{\frac{p}{S^2}}\,Z,$$

where Z ∈ Nd (0, Id ) is standard normal, and S 2 is χ 2 distributed with degrees of


freedom p. Random variable W ∈ Mtd (p, 0, Id ) is spherically distributed, since
we have

$$W = \frac{\sqrt{p}\,\|Z\|}{S}\,\frac{Z}{\|Z\|} = RU, \qquad(5.48)$$

where R = √p ‖Z‖/S is the generating variate, so that R²/d ∈ F(d, p) has F-distribution with degrees of freedom d and p.
distribution with degrees of freedom d and p.
Let μ ∈ Rd ; A is a d × d matrix; then the linear transform

X = μ + A W

will be considered as X ∈ Mtd (p, μ, ), where = A A; hence, X is an


elliptically symmetric random variable. The characteristic function of X is quite
complicated; therefore, we use formula (5.48) that is the stochastic representation
of W for deriving higher-order cumulants including skewness and kurtosis.
We shall use the even order moments of generating variate R, i.e. the even order
moments of the F -distribution with degrees of freedom d and p. As far as p > 2m,
we have

ER 2m  p m  (d/2 + m)  (p/2 − m)  p m
= = G2m (d) G−2m (d) ;
dm d  (d/2)  (p/2) d

therefore, by (A.26) and (A.28) p. 370, we have

pm (d/2)m pm (d/2)m pm (d/2)m


μR,2m = = = .
(p/2 − m + 1)m (1 − p/2)m (p/2 − m)m

We also have the even order moments of the components of uniform distribution (on
sphere Sd−1 )

(2m − 1)!!
μU1 ,2m = ,
2m (d/2)m

see (5.81). These two moments provide us the moments for components of W

pm (d/2)m (2m − 1)!! pm (2m − 1)!!


μW,2m = μR,2m μU1 ,2m = = . (5.49)
(p/2 − m)m 2m (d/2)m 2m (p/2 − m)m

We recall that all entries of W have the same distribution; therefore, we can use
notation W as a general entry.
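Formula (5.49) is convenient to verify numerically, since a single entry of W is an ordinary Student-t variable with p degrees of freedom. The sketch below (arbitrary p and sample size) compares the closed form for 2m = 4 with a Monte Carlo estimate.

import numpy as np
from scipy.special import poch          # Pochhammer symbol (x)_m

def mu_W_2m(p, m):
    # entry moment (5.49): p^m (2m-1)!! / (2^m (p/2 - m)_m), valid for p > 2m
    semifact = np.prod(np.arange(1, 2*m, 2))
    return p**m*semifact/(2**m*poch(p/2 - m, m))

rng = np.random.default_rng(0)
p, n = 9, 10**6
W = rng.standard_normal(n)*np.sqrt(p/rng.chisquare(p, size=n))   # one entry of W
print(mu_W_2m(p, 2), np.mean(W**4))     # both approximate 3p^2/((p-2)(p-4))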
The cumulants with even order of W can be calculated with the help of cumulant
parameters 
κm . We consider the moment parameters μm first, which is
 
1 μR,2m (d/2)m pm (d/2)m p/2 − 1 m (p/2 − 1)m

μm = m −1 = −1= −1
α (d, m) μR,2 (d/2)m (p/2 − m)m pd/2 (p/2 − m)m
(5.50)

(see (5.29) for moments of R, and (A.24) for Pochhammer’s Symbol). Then
cumulant parameter 
κm is calculated by the general expression (5.27).
Now we use Theorem 5.2 to state the following:
Lemma 5.7 Let p > 2m and W be t-multivariate, W ∈ Mtd (p, 0, Id ), with
dimension d and degrees of freedom p, then EW = 0, and both the moments and the
cumulants with odd higher order are zero. The moments with even order are given
by

pm
μ⊗
W,2m = L−1 (vecId )⊗m .
2m (p/2 − m)m m2

The variance–covariance matrix of W has the form


p
VarW =  = Id ;
p−2

it is diagonal and the even order standardized cumulants are as follows:


 
Cum2m  −1/2 W = 
κm L−1
m2 (vecId )
⊗m
,

where the cumulant parameters κ̃m are given by the expression (5.27) of moment parameters (5.50); in particular cases we have Cum₂(Wj/√(2ν₁)) = 1, and
 
Wj 2
Cum4 √ = 3!! , (5.51)
2ν1 p−4
 
Wj 16
Cum6 √ = 5!! .
2ν1 (p − 4) (p − 6)

5.3.2 Skew-t Distribution

Let V be a d-dimensional multivariate skew-normal distribution, V ∈


SNd (0, , α), and S 2 be χ 2 distributed with degrees of freedom p random variable;
moreover, let V and S 2 be independent. Define a Skew−t distributed random vector
X by

p
X=μ+ V (5.52)
S
and denote it by Std (μ, , α, p).
We introduce the notation

p
Rp = , (5.53)
S
where S is χ distributed variable with shape parameter p; hence,

X = μ + Rp V. (5.54)

The skew-normal distribution is characterized by the skewness vector δ, which is


given by α (see (5.3)).
Let us start with the mean

EX = μ + ERp EV,

where EV = κ ⊗ V,1 and recall that the higher-order cumulants of skew-normal


variates have been given in Lemma 5.1 as follows:
7
2 2
κ⊗
V,1 = δ, κ ⊗
V,2 = vec − δ ⊗2 ,
π π

and in general if j > 2, we have

κ⊗ ⊗j
V,j = κ|Z|,j δ , j = 2.

We provide moments of Rp (which are moments of χ distributions), in the following


Proposition.
9 −k
Proposition 5.2 Let p > k, and μRp ,k = E S 2 /p ; then

 p k/2  ((p − k) /2)  p k/2


μRp ,k = = G−k (p) , (5.55)
2  (p/2) 2

where Gk (d) is defined by (5.79), for particular values of Gk (d) and μRp ,k (see
Sect. A.6.1, p. 370).
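A quick numerical check of (5.55), with arbitrary p, k, and sample size:

import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(0)
p, k, n = 8, 3, 10**6
S = np.sqrt(rng.chisquare(p, size=n))            # S is chi-distributed with p degrees of freedom
print(np.mean((np.sqrt(p)/S)**k))                # Monte Carlo estimate of E R_p^k
print((p/2)**(k/2)*gamma((p - k)/2)/gamma(p/2))  # closed form (5.55)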
Now turning back to expected value
$$\mu_X^{\otimes} = \mu + \sqrt{\frac{p}{\pi}}\,G_{-1}(p)\,\delta,$$

again we recall that δ is given by α, see (5.3).


The variance vector has the following form:
 2
κ⊗ ⊗ 2 ⊗2
X,2 = κ Rp V,2 = ERp V − ERp (EV)⊗2 = μRp ,2 μ⊗ ⊗2
V,2 − μRp ,1 μV,1
2

p p p p
= G−2 (p) vec − G2−1 (p) δ ⊗2 = vec − G2−1 δ ⊗2 ;
2 π p−2 π

therefore, the variance matrix of X is


$$\operatorname{Var}X = \frac{p}{p-2}\,\Omega - \frac{p}{\pi}\,G_{-1}^2(p)\,\delta\delta'. \qquad(5.56)$$
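Both the expected value and the variance matrix (5.56) can be checked through the stochastic representation (5.54). In the Python sketch below the skew-normal V is generated via its δ-representation V = δ|Z₀| + (Ω − δδ')^{1/2}Z, one standard construction; the choice Ω = I_d, the value of δ, and the sample size are arbitrary.

import numpy as np
from scipy.special import gamma
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
d, p, n = 3, 7, 10**6
delta = np.array([0.6, -0.3, 0.2])
Omega = np.eye(d)                                  # scale matrix of V, taken as I_d in this sketch

Z0 = np.abs(rng.standard_normal(n))
V = Z0[:, None]*delta + rng.standard_normal((n, d)) @ np.real(sqrtm(Omega - np.outer(delta, delta))).T
X = np.sqrt(p/rng.chisquare(p, size=n))[:, None]*V            # X = mu + R_p V with mu = 0

G_m1 = gamma((p - 1)/2)/gamma(p/2)                            # G_{-1}(p)
print(np.sqrt(p/np.pi)*G_m1*delta, X.mean(axis=0))            # expected value vs. Monte Carlo
print(p/(p - 2)*Omega - p/np.pi*G_m1**2*np.outer(delta, delta))
print(np.cov(X, rowvar=False))                                # compare with (5.56)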

We are interested in obtaining skewness and kurtosis of X later on; therefore, our
subject will be the first four cumulants of X.

5.3.3 Higher-Order Cumulants of Skew-t Distributions

We have seen (Lemma 3.4 and Exercise 5.14) the following form of third-order
cumulant

 
κ⊗
X,3 = κ Rp ,3 κ ⊗
V,3 + 3κ ⊗
V,1 ⊗ κ ⊗
V,2 + κ ⊗3
V,1
 
+ κRp ,1 κRp ,2 3κ ⊗ V,3 + 6κ ⊗
V,1 ⊗ κ ⊗ ⊗
V,2 + κRp ,1 κ V,3 ,
3

where we used symmetrizer Sd13 to get the right-hand sides above. Now we show
another equivalent form by the following Lemma.

Lemma 5.8 Let p > 2; then

κ⊗
X,3 = c1 (p) δ
⊗3
+ c2 (p) L−1
12 ,11 (vec ⊗ δ) ,

where
7  
p 2 2 1
c1 (p) = p G−1 (p) G (p) − ,
π π −1 p−3
 p 3/2 7 7
2G−1 (p) 2 p pG−1 (p)
c2 (p) = = .
2 (p − 2) (p − 3) π π (p − 2) (p − 3)


Proof We apply symmetrizer Sd13 and use symmetry equivalence form =. Direct
calculation shows

κ⊗ ⊗ ⊗ ⊗ ⊗3
X,3 = μRp ,3 μV,3 − 3μRp ,1 μRp ,2 μV,1 ⊗ μV,2 + 2κRp ,1 μV,1 ;
3

hence, we have

κ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = μRp ,3 μV,3 − 3μRp μRp ,2 L12 ,11 μV,2 ⊗ μV,1 + 2κRp ,1 μV .
3

Now the quantities included in the coefficient of δ ⊗3 are given by (5.55), with
particular values in Sect. A.6.1, p. 370 and those of cumulants κ|Z|,1 are at (A.38).
Therefore,
7 7
p p p p
κ⊗
X,3 = G−1 (p) μ⊗ − G−1 (p) L−1 ⊗ ⊗
12 ,11 μV,1 ⊗ μV,2
p−3 2 V,3 p−2 2
7
p p 3
+2 G (p) μ⊗3
2 2 −1 V
7  
p p p p 2
= G−1 (p) μ⊗ − 3 L −1
μ ⊗
⊗ μ ⊗
+ 2 G (p) μ⊗3
2 p − 3 V,3 p − 2 12 ,11 V V,2 2 −1 V

and
7
2
μ⊗
V = δ, μ⊗V,2 = vec ,
π
7 7
2 ⊗3 2
μ⊗
V,3 =− δ +3 vec ⊗ δ,
π π

since
⎛ 7  7 ⎞ 7 7 3
3  
⊗ 2 2 ⎠ ⊗3 2 −1 2 ⊗2 2

μV,3 = 2 − δ +3 L vec − δ ⊗δ+ δ ⊗3
π π π 12 ,11 π π
⎛ 7  7 7 3 7 3 ⎞ 7
3
2 2 2 2 ⎠ ⊗3 2 −1
= ⎝2 − −3 + δ +3 L vec ⊗ δ
π π π π π 12 ,11
7 7
2 ⊗3 2 −1
=− δ +3 L vec ⊗ δ.
π π 12 ,11

We proceed with collecting coefficients


7   7 7 
p p 2 ⊗3 2
κ⊗
X,3 = G−1 (p) − δ +3 vec ⊗ δ
2 p−3 π π
7 7 3 ⎞
p 2 p 2 2
−3 vec ⊗ δ + 2 G−1 (p) δ ⊗3 ⎠
p−2 π 2 π
⎛⎛ 7 3 7 ⎞
7
p 2 p 2 ⎠ ⊗3
= G−1 (p) ⎝⎝pG2−1 (p) − δ
2 π p−3 π
7  
2 p p
+3 − vec ⊗ δ
π p−3 p−2
7   
p 2 1 1
=p G−1 (p) G2−1 (p) − δ ⊗3 + 3 vec ⊗ δ
π π p−3 (p − 3) (p − 2)

and obtain κ ⊗
X,3 the assertion of the Lemma.
Let us recall Lemma 3.4, where we have seen the following form of fourth-order
cumulant:

 
κ⊗ ⊗ ⊗ ⊗ ⊗2 ⊗ ⊗2
X,4 = κRp ,4 κ V,4 + 4κ V,3 ⊗ κ V,1 + 3κ V,2 + 6κ V,2 ⊗ κ V,1 + κ V,1
⊗4
(5.57)
 
+ 4κRp ,3 κRp ,1 κ ⊗ ⊗ ⊗ ⊗2 ⊗
V,4 + 3κ V,3 ⊗ κ V,1 + 3κ V,2 + 3κ V,2 ⊗ κ V,1
⊗2

 
+ 3κR2 p ,2 κ ⊗V,4 + 4κ ⊗
V,3 ⊗ κ ⊗
V,1 + 2κ ⊗2
V,2 + 4κ ⊗
V,2 ⊗ κ ⊗2
V,1
 
+ 6κRp ,2 κR2 p ,1 κ ⊗ ⊗ ⊗ ⊗2 ⊗
V,4 + 2κ V,3 ⊗ κ V,1 + 2κ V,2 + κRp ,1 κ V,4 ,
4

where
   2
2 ⊗2 ⊗2  2 2
κ ⊗2
V,2 = vec − δ = (vec ) ⊗2
− 2 vec ⊗ δ ⊗2
+ δ ⊗4 .
π π π

A more compact form will be provided by the following Lemma.


Lemma 5.9 Let p > 3; then the fourth-order cumulant of X ∈ Std (μ, , α, p) is
given by
   
κ⊗
X,4 = c1 (p) δ
⊗4
+ c2 (p) L−1
22 (vec )
⊗2
− c3 (p) L12 ,21 vec ⊗ δ ⊗2 ,

where
 
2p2 2 2 3
c1 (p) = G − G2−1 ,
π −1 p−3 π
2p2
c2 (p) = ,
(p − 4) (p − 2)2
2 p2
c3 (p) = G2 .
π (p − 3) (p − 2) −1

We have also

 2 
κX,4 = μRp |Z| ,4 δ ⊗4 + 3 (vec )⊗2 + 6 κRp ,Rp ,Rp2 − 1 vec ⊗ δ ⊗2 .
π
Proof Let G−1 = G−1 (p). One can derive the formula


 
κX,4 = κRp4 ,1 κ ⊗
V,4 + 4κ 3
Rp ,Rp κ ⊗
V,1 ⊗ κ ⊗ ⊗2
V,3 + 3κRp2 ,Rp2 κ V,2

+ 6κRp ,Rp ,Rp2 κ ⊗2 ⊗


V,1 ⊗ κ V,2 + κRp ,4 δ
⊗4

by (5.58) for κX,4 directly. We rather pay attention to the particular value of κ ⊗
V,2
 

κX,4 = κRp4 ,1 κ|Z|,4 + 4κRp ,Rp3 κ|Z|,1 κ|Z|,3 + κRp ,4 κ|Z|,1
4
δ ⊗4
 ⊗2  
2 2
+3κRp2 ,2 vec − δ ⊗2 + 6κRp ,Rp ,Rp2 κ|Z|,1
2
vec − δ ⊗2 ⊗ δ ⊗2 ;
π π

we have κRp4 ,1 = μRp ,4 ; therefore,



κX,4 = κRp4 ,1 κ|Z|,4 + 4κRp ,Rp3 κ|Z|,1 κ|Z|,3 + κRp ,4 κ|Z|,1
4

 2  2 
2 2
+3 κRp2 ,2 − 6 κRp ,Rp ,Rp2 δ ⊗4
π π
2 
+3κRp2 ,2 (vec )⊗2 + 6 κRp ,Rp ,Rp2 − κRp2 ,2 vec ⊗ δ ⊗2 .
π
We proceed with formulae of Sect. A.6.1

2p2
κRp2 ,2 =
(p − 4) (p − 2)2

and
 p 2 4
κRp ,Rp ,Rp2 − κRp2 ,2 = − G2−1 .
2 (p − 2) (p − 3)

The coefficient of δ ⊗4 follows:


 2
2
μRp ,4 κ|Z|,4 + 4κRp ,Rp3 κ|Z|,1 κ|Z|,3 + 3 κRp2 ,2
π
 2
2 4
−6 κRp ,Rp ,Rp2 + κRp ,4 κ|Z|,1
π
         2
2 2 2 2 2 2 2
= −6 +4 μRp ,4 + 8 −4 κRp ,Rp3 + 3 κRp2 ,2
π π π π π
 2  2
2 2
−6 κRp ,Rp ,Rp2 + κRp ,4
π π
 2  
2
= −6μRp ,4 + 8κRp ,Rp3 + 3κRp2 ,2 − 6κRp ,Rp ,Rp2 + κRp ,4
π
2 
+4 μRp ,4 − κRp ,Rp3 ,
π
where

p2
−6μRp ,4 + 8κRp ,Rp3 + 3κRp2 ,2 − 6κRp ,Rp ,Rp2 + κRp ,4 = −3 G−1 (p)4
2

and where

1 2 p2
μRp ,4 − κRp ,Rp3 = G .
2 −1 p − 3

The cumulants contained in this expression are considered in Lemma 5.1.


The cumulants of V, except the second-order one, depend on Kronecker powers
δ ⊗k multiplied by a constant; therefore, the usage of commutator matrices is not
necessary. The products including κ ⊗2
V,2 = vec − 2/πδ
⊗2
need to be considered
   
2 ⊗2 2 ⊗2
L−1 κ ⊗2
22 V,2 = L−1
22 vec − δ ⊗ vec − δ
π π
 2
2 −1   2
= L−1
22 (vec )
⊗2
− L22 vec ⊗ δ ⊗2 + δ ⊗2 ⊗ vec + 3 δ ⊗4 .
π π

Closing this section we provide a formula for the cumulant κ ⊗


X,n with general
order n.
Theorem 5.4 Let X ∈ Std (μ, , α, p); then
7
p
κ⊗ ⊗
X,1 = μX = μ + G−1 (p) δ,
π
p p
κ⊗
X,2 = vec − (G−1 (p))2 δ ⊗2 ;
p−2 π

if n > 2 and p > n − 1, then n-symmetrized version by symmetrizer Sd1n of κ ⊗


X,n is
the following:
⎛ ⎞

n  
n  j
   1 κ|Z|,j 1
κ⊗
X,n = n! Cr Rp , 1:n ⎝ ⎠
j ! j! 2 !
r=1 j =r, j =1,j =2
j j =n
 2
1
× δ ⊗(n−22 ) ⊗ κ ⊗ 2
V,2 , (5.58)
2!

where
     
   
Cr Rp , 1:n = Cum Rp { } , Rp2 , . . . , Rpn ,
1 {2 } {n }

 
j
where Rp corresponds to the block with cardinality j , which includes the
{ j }
j j
power Rp only (it implies listing Rp consecutively j times).

We can also express κ ⊗


X,n ignoring symmetrization


n     1
κ⊗
X,n = Cr Rp , 1:n (n − 22 )!
j !
r=1 j =r, j =1:(n−r+1)\2
j j =n
 j  
κ|Z|,j ⊗2
× L−1 δ ⊗(n−22 )
⊗ κ V,2 ,
j! [n−22 ]1 ,2

which might be useful from computational point of view (see Sect. A.2 for moment
commutator L−1
[n−22 ]1 ,2 , it has n!/2 ! (n − 22 )! (2!) terms).
2

Proof We combine (3.68), p. 157, and (3.71), p. 158


n      
n!
κ⊗
X,n = Sd1n n j
Cumr κ⊗
X,1|R , . . . , κ⊗
X,n|Rp 1:n ,
j =1 j ! (j !)
p 1:1
r=1 j =r,
j j =n

 
where κ ⊗ ⊗
X,j |Rp = Cumj X|Rp 1:j denotes j copies of κ X,j |Rp , as usual, including
 
the case j = 0, when Cumj X|Rp is missing from Cumr . Therefore, Cumr
contains exactly r variables. The conditional cumulant

κ⊗ ⊗ ⊗ j
X,j |Rp = κ Rp V,j |Rp = Rp κ V,j ,

since Rp and X are independent. We apply Lemma 5.1 to obtain the κ ⊗


V,j and get

 

n  Cr Rp , 1:n ⊗ ⊗
κ⊗
X,n = n! n κ V,jj
=1:n
j =1 j ! (j !)
j j
r=1 j =r,
j j =n


n  
n  j
  1 κ|Z|,j 1
= n!Sd1n Cr Rp , 1:n
j ! j! 2 !
r=1 j =r, j =1,j =2
j j =n
 2
1
× κ ⊗
V,2 ⊗ δ
2 ⊗(n−22 )
,
2!

where we have separated j = 2, since the second-order cumulant κ ⊗


V,2 is different
in the product.

5.4 Scale Mixtures of Skew-Normal Distribution

A possible generalization of multivariate skew-t distributions is the following.


Let V be a d-dimensional multivariate skew-normal distribution, V ∈
SNd (0, , α), and define K (η) as a weight function depending on mixing variable
η with some cumulative distribution function H (x). We define multivariate X with
the Scale Mixtures of Skew-Normal Distribution, X ∈ SMSNd (μ, K (η) , α), by
the stochastic equation

X = μ + K 1/2 (η) V,

where we assume that V and η are independent variates. Let us denote the random
weight function by

ξ = K 1/2 (η) . (5.59)

Recall that the cumulants of V are given in Lemma 5.1 and suppose we are given
the cumulants of ξ .
In this section we derive the first four cumulants for X ∈ SMSNd (μ, K (η)
, α).
The first cumulant, i.e. the moment of X, is

κ⊗
X,1 = μ.

The variance of X is calculated by using the formula of cumulant for the product of
independent variates (see (3.69), p. 158). We obtain
 
2 ⊗2 2
κ⊗
X,2 = κξ 2 ,1 κ ⊗
V,2 + κξ,2 κ ⊗2
V,1 = κξ 2 ,1 vec − δ + κξ,2 δ ⊗2
π π
  2 ⊗2
= κξ 2 ,1 vec + κξ,2 − κξ 2 ,1 δ
π
2 2 ⊗2 2
= κξ 2 ,1 vec − κξ,1 δ = μξ,2 vec − μ2ξ δ ⊗2 . (5.60)
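Formula (5.60) is easy to check for a concrete mixing distribution. The sketch below uses the illustrative choice K(η) = η with η Gamma distributed (shape a, scale s); this particular weight function, the parameter values, and the sample size are not taken from the text.

import numpy as np
from scipy.special import gamma
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
d, n = 3, 10**6
a, s = 4.0, 0.5                               # illustrative mixing: K(eta) = eta, eta ~ Gamma(a, scale=s)
delta = np.array([0.5, 0.2, -0.4])
Omega = np.eye(d)

Z0 = np.abs(rng.standard_normal(n))           # delta-representation of the skew-normal V
V = Z0[:, None]*delta + rng.standard_normal((n, d)) @ np.real(sqrtm(Omega - np.outer(delta, delta))).T
xi = np.sqrt(rng.gamma(a, s, size=n))         # xi = K^{1/2}(eta)
X = xi[:, None]*V

mu_xi, mu_xi2 = np.sqrt(s)*gamma(a + 0.5)/gamma(a), a*s                 # E xi and E xi^2
Var_theory = mu_xi2*Omega - (2/np.pi)*mu_xi**2*np.outer(delta, delta)   # matrix form of (5.60)
print(np.abs(np.cov(X, rowvar=False) - Var_theory).max())               # should be small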
π π

The third-order cumulant of X follows in the same way (see Exercise 3.21, p.
178)

κ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = κξ 3 ,1 κ V,3 + κξ,ξ 2 L12 ,11 κ V,2 ⊗ κ V,1 + κξ,3 κ V,1 ,

where the expression for the cumulants of V has been found; therefore

κ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = κξ 3 ,1 κ V,3 + κξ,ξ 2 L12 ,11 κ V,2 ⊗ κ V,1 + κξ,3 κ V,1
   7  7  
2 3/2 2 ⊗3 −1 2 2
= κξ 3 ,1 2 − δ + κξ,ξ 2 L12 ,11 vec − δ ⊗2
π π π π
7
3/2 2 ⊗3
⊗δ + κξ,3 δ
π
7
⊗3 −1 2
= c1 δ + κξ,ξ 2 L12 ,11 vec ⊗ δ, (5.61)
π

where L−112 ,11 denotes the moment commutator (see (A.2), p. 353). Collecting
coefficients of δ ⊗3 we obtain that
   2 3/2 7
2
c1 = 3μξ,3 − κξ,1
3
− μξ,3 .
π π

We apply the formula (3.72), p. 159 and moment commutators to get the fourth-
order cumulant

κ⊗ ⊗ −1 ⊗ ⊗
X,4 = κξ 4 ,1 κ V,4 + L13 ,11 κξ 3 ,ξ κ V,3 ⊗ κ V,1

+ κξ 2 ,2 L−1 ⊗2 −1 ⊗ ⊗2 ⊗4
22 κ V,2 + κξ,ξ,ξ 2 L12 ,21 κ V,2 ⊗ κ V,1 + κξ,4 κ V,1 . (5.62)

One can insert cumulants κ ⊗ ⊗


V,k by Lemma 5.1 and express κ X,4 in terms of T-
⊗k ⊗j
products of δ and vec with coefficients given by the cumulants of ξ .
Now we state the general formula for cumulants κ ⊗
X,n based on the expression
(3.71), p. 158 as follows:


n  ⊗ ⊗
κ⊗
X,n = Cr (ξ, 1:n ) L−1
l1:r κ V,jj (5.63)
j =1:(n−r+1)
r=1 j =r,
jj =n

 
n  ⊗ 1 ⊗
= n! Cr (ξ, 1:n ) κ V,jj ,
j =1:(n−r+1) j ! (j !)j
r=1 j =r,
jj =n

where
 
2
Cr (ξ, 1:n ) = Cumr ξ{1 } , ξ{ 2}
, . . . , ξ n
{n }

(see Sect. A.2 for moment commutator L−1 ⊗


l1:r ). We notice that each cumulant κ V,j has
the same form, i.e. κ ⊗
V,j = κ|Z|,j δ
⊗j
but j = 2, which is κ ⊗ ⊗2
V,2 = vec − 2/πδ ,
cf. Lemma 5.1; therefore, we separate the term j = 2 and obtain


n    j
1 κ|Z|,j
κ⊗
X,n = Cr (ξ, 1:n ) (n − 22 )!
j ! j!
r=1 j =r, j =1:(n−r+1)\2
j j =n

L−1
[n−22 ]1 ,2 δ
⊗(n−22 )
⊗ κ ⊗ 2
V,2 , (5.64)

where the number of terms in moment commutator L−1


1n−2 ,2 follows from those of
2
L−1
l1:r used in (5.63), i.e.

  j
1 1 (n − 22 )! (2!)2 2 !
n!
j ! j! n!
j =1:(n−r+1)

  j
1 1
= (n − 22 )! .
j ! j!
j =1:(n−r+1)\2

5.5 Multivariate Skew-Normal-Cauchy Distribution

The cumulant generating function of multivariate Skew-Normal-Cauchy distribu-


tion SNC(,a) with  positive definite d × d matrix is

1   
ψX (λ) = log 2 + λ λ + log 1 − F 0|a λ, a a ,
2
 
where F 0|a λ, a a is the cdf of skew-generalized normal distribution.
We introduce a  = δ  , a a = α 2 > 0, and
  4 ∞  
f (λ) = 1 − F 0|δ λ, α = 2ϕ (x)  h (x) δ  λ dx, (5.65)
0

where  denotes the univariate standard normal distribution and


x
h (x) = √ . (5.66)
1 + α2 x 2

We have f (0) = 1/2.
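The function f can be evaluated by one-dimensional numerical integration, which also confirms f(0) = 1/2. A minimal Python sketch (the argument t stands for the scalar δ'λ; the parameter values are arbitrary):

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def f(t, alpha):
    # numerical evaluation of (5.65): 2 * int_0^inf phi(x) * Phi(h(x)*t) dx, with t = delta' lambda
    h = lambda x: x/np.sqrt(1 + alpha**2*x**2)
    return quad(lambda x: 2*norm.pdf(x)*norm.cdf(h(x)*t), 0, np.inf)[0]

print(f(0.0, 2.0))    # equals 1/2
print(f(1.3, 2.0))    # an arbitrary value of delta' lambda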



Faà di Bruno’s formula provides the derivatives of log f (λ) in terms of f (λ)
and its derivatives (see (2.58), p. 95), namely


n 
Dλ⊗n log f (λ) = (−1)r−1 (r − 1)!f −r (λ)
r=1 j =r,j j =n
⎛ ⎞
 ⊗ ⊗
×⎝ Kp−1K ⎠ fλ,j j , (5.67)
( {r|} )
K{r|} ∈Pn

⊗j
where fλ,j = Dλ f (λ), and the third summation is over all possible par-
titions K{r|} ∈ Pn , with size r and type . The number of such K{r|} is

n! n−r+1
j =1 1/j ! (1/j !)j .
Now, let Z denote the standard normal variate.
The derivatives of f (λ) are as follows. The first-order one is
4 ∞  
1
Dλ⊗ log f (λ) = 2ϕ (x) Dλ⊗  h (x) δ  λ dx (5.68)
f (λ) 0
4 ∞  
1
= 2ϕ (x) ϕ h (x) δ  λ h (x) dxδ,
f (λ) 0

which corresponds to n = 1, in (5.67). The expected value also follows from this;
putting λ = 0 in (5.68) we obtain
4 7
* 2 ∞ 2
Dλ⊗ log f (λ)*λ=0 = √ 2ϕ (x) h (x) dxδ = μh(|Z|) ;
2π 0 π

hence,
7
* 2
Dλ⊗ ψX (λ)*λ=0 = λ|λ=0 + μh(|Z|) δ,
π

where μh(|Z|) denotes the expected value of h (|Z|). Therefore, we have


7
⊗ 2
κX,1 = μ⊗
X = μh(|Z|) δ.
π

The higher-order derivatives of f (λ) are necessary to derive cumulants, i.e. to


derive
 (5.67). The derivatives of f (λ) (see (5.65)) depend on the derivatives of
ϕ h (x) δ  λ ; moreover, we shall consider them at λ = 0. The first derivative as we
have seen above
4 ∞   * 7
* * 1 2
Dλ f (λ)*λ=0 =

2ϕ (x) ϕ h (x) δ λ h (x) dxδ**

= μh(|Z|) δ.
0 λ=0 2 π

It is known that, for k ≥ 1


 *
⊗(2k−1) *
Dλ ϕ h (x) δ  λ * = 0;
λ=0

therefore,
*
*
Dλ⊗2k f (λ)* = 0.
λ=0

The T-derivatives of ϕ with even orders at zero are given by the following formula:
 * (2k − 1)!!
*
Dλ⊗2k ϕ h (x) δ  λ * = √ h (x)2k δ ⊗2k , k ≥ 1.
λ=0 2π

We apply these results to derive the nth, n > 0 order derivatives of f (λ) at 0. Let
k > 1; then
* 4 ∞   *
* *
Dλ⊗2k+1 f (λ)* = 2ϕ (x) Dλ⊗2k ϕ h (x) δ  λ h (x)* dxδ
λ=0 0 λ=0
4 ∞
(2k − 1)!!
= √ 2ϕ (x) h (x)2k+1 dxδ⊗(2k+1)
2π 0
(2k − 1)!!
= √ μh(|Z|),2k+1 δ ⊗(2k+1) .

We can rewrite the semifactorial (2k − 1)!! using (A.22), p. 368, namely (2n)! = 2^n n! (2n − 1)!!, and obtain
*
* (2k)!
Dλ⊗2k+1 f (λ)* = √ μh(|Z|),2k+1 δ ⊗(2k+1) ,
λ=0 2k k! 2π

which is valid for k ≥ 0 as well.* We apply these T-derivatives to the terms of (5.67)
when λ = 0, i.e. Dλ⊗n log f (λ)*λ=0 . First of all we notice that the summation is over
all partitions K{r|} ∈ Pn with size r and type , when each block of K{r|} has odd
⊗j
cardinality, namely j = 0 for even j . Second, the T-derivatives fλ,j = Dλ f (λ)
 ⊗
and their T-products ⊗ fλ,j j are equal to δ ⊗n multiplied by a constant for each
size r and type . Therefore, there is no need touse commutator matrices. For a
given r and  the number of blocks in K{r|} is n! n−r+1
j =1 1/j ! (1/j !)j .
Hence, we conclude that the cumulants of X depend on those types  = 1:n
where components with even indices are zero, i.e. j = 0 for each even j . Let us

denote types  with j = 0 for all even j by ∗ and introduce the notation
⎛ ⎞
 ⊗ ⊗
Cr (h, 1:n ) = f −r (λ) ⎝ Kp−1K ⎠ fλ,j j
( {r|} )
K{r|} ∈Pn


n−r+1  j
1 (j − 1)!μh(|Z|),j
= 2r n! √ δ ⊗n
j ! 2(j −1)/2 ((j − 1) /2)!j ! 2π
j =1


n−r+1  j
−(n−r)/2 −r/2 1 μh(|Z|),j
= 2 n!2
r
(2π) δ ⊗n
j ! ((j − 1) /2)!j
j =1
 r/2 n−r+1
 1  μh(|Z|),j j
−(n−r)/2 2
= n!2 δ ⊗n (5.69)
π j ! ((j − 1) /2)!j
j =1

for the terms of sum in (5.67). Observe that in formula (5.67) if n is odd then r
cannot be even and if n is even then r cannot be odd; therefore (−1)r−1 = (−1)n−1 .
The cumulant generating function ψX (λ) contains a quadratic term; therefore, κ⊗X,2 requires special attention.
Let n = 2 in (5.67). If r = 1, then the only type is 1 = 0, 2 = 1, and j = 2,
which is even; hence, it does not contribute to the second-order derivative. In the
case if r = 2, then 1 = 2 and j = 1 is odd. So we obtain
*
* 2 1 2 2
Dλ⊗2 log f (λ)* = −2! μ δ ⊗2 = − μ2h(|Z|) δ ⊗2 .
λ=0 π 2! h(|Z|),1 π

Now the second T-derivative of 1/2λ λ is vec, so


*
* 2 2
Dλ⊗2 ψX (λ)* = vec − μ δ ⊗2 = κ ⊗
X,2 .
λ=0 π h(|Z|)
Using (5.67) we have the higher-order cumulants of X by the following theorem.
Theorem 5.5 If X ∈ SNC (,a) , then
7
⊗ 2 2 2
κX,1 = μh(|Z|) δ, κ⊗
X,2 = vec − μ δ ⊗2 ,
π π h(|Z|)

and for n > 2


n 
κ⊗
X,n = (−1)
n−1
(r − 1)! Cr (h, 1:n ) δ ⊗n , (5.70)
r=1 ∗ :j =r,j j =n

where Cr (h, 1:n ) is defined by (5.69) and where the second summation runs over
all types ∗ , which fulfils the following assumptions: j = 0 for all even j , j = r,
and j j = n.
Our purpose is to provide the first four cumulants in detail, which will be
necessary for the study of multivariate skewness and kurtosis. First, let us consider
these cumulants with respect to the Theorem above; then we calculate μh(|Z|),k by a
clear formula.
n=3 Take n = 3; if r = 1, then 1,2 = 0, 3 = 1, and j = 3; therefore
7 7
−1 2 μh(|Z|),3 ⊗3 2
C1 (h, ∗ = (0, 0, 1)) = 3!2 δ = μh(|Z|),3 δ ⊗3 ;
π 3 π

if r = 3, then 1 = 3, 2,3 = 0, and j = 1; therefore,


 3/2  3/2
2 1 3 ⊗3 2
C3 (h, ∗ = (3, 0, 0)) = 3! μh(|Z|) δ = μ3h(|Z|) δ ⊗3 .
π 3! π

We obtained
7  3/2 
⊗ 2 2
κX,3 = μh(|Z|),3 + 2! μ3h(|Z|) δ ⊗3 . (5.71)
π π

n = 4 Let n = 4; then we have to consider the cases defined by ∗ . If r = 2, then


1 = 1, 3 = 1, 2,4 = 0; therefore,

2 μh(|Z|),3 ⊗4 2
C2 (h, ∗ = (1, 0, 1, 0)) = 4!2−1 μh(|Z|) δ = 4 μh(|Z|) μh(|Z|),3 δ ⊗4 .
π 3 π

Note that if r = 2, then  = (0, 2, 0, 0) does not fulfill the requirement of ∗ .


If r = 4, then 1 = 4, 2,3,4 = 0, and j = 1; therefore,
 2  2
2 1 4 2
C2 (h, ∗ = (4, 0, 0, 0)) = 4! μh(|Z|) δ ⊗4 = μ4h(|Z|) .δ ⊗4
π 4! π

Let us insert these into formula (5.70) and obtain


  2 
⊗ 8 2
κX,4 =− μh(|Z|),3 μh(|Z|) + 6 μ4h(|Z|) δ ⊗4 . (5.72)
π π

5.5.1 Moments of h(|Z|)

Cumulants κ ⊗ X,n depend on moments μh(|Z|),k besides skewness parameter δ, cf.


(5.70). Function h (x) is explicitly given by (5.66) and Z is the standard normal
variate; therefore the problem of finding μh(|Z|),n is calculated by the following
integral:
$$\mu_{h(|Z|),n} = 2\int_0^{\infty}\frac{x^n}{\left(1+\alpha^2 x^2\right)^{n/2}}\,\varphi(x)\,dx.$$
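The integral can be evaluated numerically and compared with a direct Monte Carlo estimate of E h(|Z|)^n; the short Python sketch below does this for a few odd orders (the value of α and the sample size are arbitrary).

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def mu_h_num(n, alpha):
    # numerical value of 2 * int_0^inf x^n / (1 + alpha^2 x^2)^(n/2) * phi(x) dx
    g = lambda x: 2*x**n/(1 + alpha**2*x**2)**(n/2)*norm.pdf(x)
    return quad(g, 0, np.inf)[0]

rng = np.random.default_rng(0)
Z, alpha = rng.standard_normal(10**6), 1.5
for n in (1, 3, 5):
    mc = np.mean((np.abs(Z)/np.sqrt(1 + alpha**2*Z**2))**n)
    print(n, mu_h_num(n, alpha), mc)    # the two columns should agree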

By changing variable x 2 = y, we obtain


4 ∞
1 y (n−1)/2 −y/2
μh(|Z|),n = √  n/2 e dy
2π 0 αn 1/α 2 +y
 n/2
1 2 √ 2
= √ 2
 (n/2 + 1/2) 2e1/4α D1−2(n/2+1/2) (α)
2π α
7  n/2
1 2 2
=  (n/2 + 1/2) e1/4α D−n (α) , (5.73)
π α2

where we used formulae for parabolic cylinder function (Weber–Hermite function)


D−n/2−3/2 (see [GR00, 3.383.6, p. 348]).
Let Erfc and M denote the complementary error function and Mill’s ratio,
respectively. Note that here we use the “probabilist’s” versions of functions Erfc
and M (see Sect. A.7, p. 376).
The parabolic cylinder function D−m−1 has the following special form (see
[DLM, 12.7.6]):
7
π (−1)m −z2 /4 d m  z2 /2 
D−m−1 (z) = e e Erfc (z) (5.74)
2 m! dzm
7  7 
π (−1)m −z2 /4 z2 /2 2
= e 
Hm (z) e Erfc (z) − Fm (z)

2 m! π
(−1)m −z2 /4  π  
= e Hm (z) M (z) − Fm (z) ,
m! 2

where polynomials Hm and Fm fulfill the following differential equations:

d 
Hm (z) = zHm−1

(z) + H (z)
dz m−1

m−1
dk 
Fm (z) = H (z)
dzk m−1−k
k=0

with initial values H0 (z) = 1, F0 (z) = 0 (see Sect. A.7, p. 376 for more details).
We apply the formula (5.74) and get
7  
(−1)n−1 1 2 n/2
μh(|Z|),n =  (n/2 + 1/2)
(n − 1)! π α 2
π 
× 
Hn−1 (1/α) M (1/α) − Fn−1

(1/α)
2
7
(−1)n−1 2(n−1)/2 2
=  (n/2 + 1/2)
(n − 1)!α n π
π 
× 
Hn−1 (1/α) M (1/α) − Fn−1

(1/α) .
2
We are interested in moments with odd orders n = 2k + 1, which have a special
form
7  
2k k! 2 π 
μh(|Z|),2k+1 = 2k+1
H2k (1/α) M (1/α) − F2k 
(1/α)
(2k)!α π 2
7  
1 2 π 
= H 2k (1/α) M (1/α) − F 
2k (1/α)
(2k − 1)!!α 2k+1 π 2
7
1 2
= g2k+1 (α) ,
(2k − 1)!! π

where the function g2k+1 (α) is defined by the equation

1 π 
g2k+1 (α) = 
H2k (1/α) M (1/α) − F2k

(1/α) ,
α 2k+1 2
in particular


g1 (α) = M (1/α) .
α2
We obtained the moments μh(|Z|),2k+1 in the following simple form:
Lemma 5.10 Let function h be defined by (5.66) and Z denote the standard normal
variate; then
7
1 2
μh(|Z|),2k+1 = g2k+1 (α) .
(2k − 1)!! π

Note 1/ (2k − 1)!! = 2k k!/ (2k)!; hence if k = 0, then we have 1 for


1/ (2k − 1)!!.

Example 5.6 An instance is the mean μh(|Z|) ,


7 7
2 1 π
μh(|Z|) = g1 (α) = M (1/α) .
π α 2

Example 5.7 We consider the third-order moment μh(|Z|),3 ,


7 7
2 1 2 π  
μh(|Z|),3 = g3 (α) = 3 H2 (1/α) M (1/α) − F2 (1/α)
π α π 2
7    
1 2 π 1
= 3 + 1 M (1/α) − 1/α
α π 2 α2
7   7 
1 π 1 2
= 4 + α M (1/α) −
α 2 α π

(see (A.37), p. 377, for functions H2 and F2 ).


Theorem 5.6 Let X ∈SNC(,a), and a  = δ  , a a = α 2 (> 0); then

⊗ 1 1 2
κX,1 = M (1/α) δ, κ⊗
X,2 = vec − M (1/α) δ ⊗2 , (5.75)
α α2
and

n  r  
r  
⊗ 2 1 gj (α) j ⊗n
κX,n = n! (−1) r−1
(r − 1)! δ ,
π j ! j!
r=1 ∗ (r) j =1

where the type ∗ (r) = 1:r fulfills the following assumptions: j ≥ 0, j =


1, 2, . . . , r, j = r, j j = n, and j = 0 for all even j .
Proof We plug the result of Lemma 5.10 into (5.70) and obtain
 r n−r+1
 1  gj (α) j
2
Cr (h, ∗ ) = n! δ ⊗n ,
π j ! j!
j =1

which proves the assertion.

5.6 Multivariate Laplace

The characteristic function of multivariate Laplace distribution X ∈ MLd (θ , ) is


defined by

$$\phi_X(\lambda) = \frac{1}{1 - i\lambda'\theta - \frac{1}{2}\left(\lambda'\theta\right)^2 + \frac{1}{2}\lambda'\Sigma\lambda}.$$

Let us denote the denominator by


$$P(\lambda) = 1 - i\lambda'\theta - \tfrac{1}{2}\left(\lambda'\theta\right)^2 + \tfrac{1}{2}\lambda'\Sigma\lambda = 1 - i\lambda'\theta + \tfrac{1}{2}\lambda'\left(\Sigma - \theta\theta'\right)\lambda,$$

so we have the T-derivatives as


 
$$D_\lambda^{\otimes} P(\lambda) = -i\theta - (\lambda'\theta)\,\theta + \Sigma\lambda,\qquad
D_\lambda^{\otimes 2} P(\lambda) = \operatorname{vec}\Sigma - \theta^{\otimes 2},\qquad
D_\lambda^{\otimes k} P(\lambda) = 0,\quad k \geq 3.$$

We will use Faà di Bruno’s formula (5.67) when the first- and second-order
derivatives of P (λ) count only; therefore, the second summation in formula (2.51)
is over all those partitions K I,II that have blocks with only one or two elements,
i.e. K I,II ∈ PI,II
n (see Sect. 1.4.8.2, p. 47). This implies that the corresponding
I,II
types ∗ (r) of partitions K{r} with size r should fulfill the following assumptions:
j ≥ 0, j = 1, 2, . . . , r, j = r, j j = n, and for j > 2, j = 0.


n
Dλ⊗n log φX (λ) = −Dλ⊗n log P (λ) = (−1)r (r − 1)!P −r (λ)
r=1
⎛ ⎞
 ⎜ ⎟ ⊗ ⊗
⎝ K−1

I,II
⎠ Pλ,j j ,
p K{r}
∗ (r) I,II
K{r}

⊗j I,II
where Pλ,j = Dλ P (λ). The number of partitions K{r} is the number of all
possible blocks containing 1 element each and all possible blocks of the rest subset
of 1 : n containing 2 elements. If we divide 1 : n into r blocks with one or two
elements, then the number of blocks with one element k, say, and the number of
blocks with two elements m fulfill k + m = r and k + 2m = n, i.e. m = n − r.
Therefore, n/2 ≤ r ≤ n, where · denotes ceiling. The corresponding type to a
given r is ∗ (r) = (1 , 2 , 0, . . . , 0) = (2r − n, n − r, 0, . . . , 0).


n
Dλ⊗n log φX (λ) = −Dλ⊗n log P (λ) = (−1)r (r − 1)!P −r (λ)
r=n/2
 
⊗2 ⊗1
× L−1
∗ (r) Pλ,2 ⊗ Pλ,1 ,

where the commutator matrix L−1


∗ (r) is a superposition of n!/1 !2 !2 commutator
2
⊗1 ⊗2
matrices to reach Pλ,1 ⊗ Pλ,2 from various orders of vectors Pλ,1 and Pλ,2 . In this

sense L−1 −1
∗ (r) = L21 ,11 ,
I,II
which corresponds to all possible partitions K{r}
2

Dλ⊗ P (0) = −iθ,

Dλ⊗2 P (0) = −θ ⊗2 + vecΣ.

Lemma 5.11 Let random vector X be multivariate Laplace distribution, X ∈


MLd (θ , ); then


n  ⊗(n−r)
κ⊗
X,n = (r − 1)!L−1
∗ (r) vec − θ
⊗2
⊗ θ ⊗(2r−n) . (5.76)
r=n/2

Now we consider some examples when n = 1 : 4.


n=1 If n = 1, then
*
* 1 *
Dλ⊗ log φX (λ)*λ=0 =− Dλ P (λ)**

= iθ,
P (λ) λ=0

and conclude that μ⊗


X = θ.
n = 2 Take n = 2; then
* *
* 1  ⊗ ⊗2 1 *
Dλ⊗2 log φX (λ)* = Dλ P (λ) − Dλ P (λ)**
⊗2
λ=0 P (λ)2 P (λ) λ=0

= −θ ⊗2 + θ ⊗2 − vec;

therefore, κ ⊗
X,2 = vec.
n = 3 Let us apply (5.76) to n = 3. In this case r is either 3/2 = 2 or 3. Let
r = 2; then ∗ (2) = (1, 1, 0) and L−1 −1 −1
∗ (2) = L12 ,11 (see (A.2), p. 353, for L12 ,11 ).
If r = 3, then ∗ (3) = (3, 0, 0) and L−1
∗ (2) = I. We obtain
  
κ⊗ −1
X,3 = L12 ,11 vec − θ ⊗2 ⊗ θ + 2θ ⊗3 = L−1 ⊗3
12 ,11 (vec ⊗ θ ) − θ .
(5.77)

n = 4 Finally, we consider n = 4, now r = 2 : 4. If r = 2, then ∗ (2) =


(0, 2, 0, 0), with L−1 −1
∗ (2) = L22 ; if r = 3, then ∗ (3) = (2, 1, 0, 0), with L∗ (3) =
L12 ,21 ; if r = 4, then ∗ (4) = (4, 0, 0, 0), with L−1
∗ (4) = I; therefore we obtain

 ⊗2   
κ⊗
X,4 = L22 vec − θ
⊗2
+ 2L12 ,21 vec − θ ⊗2 ⊗ θ ⊗2 + 6θ ⊗4
 
= L22 vec⊗2 − vec ⊗ θ ⊗2 − θ ⊗2 ⊗ vec

 
+2L12 ,21 vec ⊗ θ ⊗2 + (6 − 12 + 3) θ ⊗4
 
= L22 vec⊗2  + L12 ,21 vec ⊗ θ ⊗2 − 3θ ⊗4 . (5.78)
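For d = 1 the formulae (5.77) and (5.78) can be verified symbolically from the characteristic function; a minimal sympy sketch (the symbol names are, of course, only illustrative):

import sympy as sp

lam, theta, sig2 = sp.symbols('lambda theta sigma2', real=True)
P = 1 - sp.I*lam*theta + sp.Rational(1, 2)*lam**2*(sig2 - theta**2)   # P(lambda) for d = 1
logphi = -sp.log(P)                                                   # cumulant generating function

def kappa(n):
    return sp.simplify((-sp.I)**n*sp.diff(logphi, lam, n).subs(lam, 0))

print(kappa(1))   # theta
print(kappa(2))   # sigma2
print(kappa(3))   # 3*sigma2*theta - theta**3, the univariate case of (5.77)
print(kappa(4))   # 3*sigma2**2 + 6*sigma2*theta**2 - 3*theta**4, the univariate case of (5.78)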

5.7 Appendix

5.7.1 Spherically Symmetric Distribution

In this section, we discuss spherically symmetric distributions and elliptically


symmetric distributions and derive their cumulants.
For convenience let us introduce
 ((d + k) /2)
Gk (d) = , (5.79)
 (d/2)

which is connected to the expected value of the products of the modulus of uniform
random variables |Ui |ki .
Proposition 5.3 Let U ∈ Rd be uniform random variate on sphere Sd−1 ; then


d
1 
d
E Ui2ki = (2ki − 1)!!, (5.80)
2k (d/2)k
i=1 i=1

(2k − 1)!!
EUi2k = (5.81)
2k (d/2)k
#
for the even order moments of U, where k = ki , and (d/2)k =
d/2 (d/2 + 1) · · · (d/2 + k − 1), and 2k (d/2)k = d (d + 2) · · · (d + 2 (k − 1)).


d
2ki 1  (2ki )!
d
E Ui = . (5.82)
(d/2)k 22ki ki !
i=1 i=1
#
k = ki , and (d/2)k = d/2 (d/2 + 1) · · · (d/2 + k − 1). We have moments for
the T-products of U; the odd moments are zero and the even ones are

EU12m
EU⊗2m = L−1 vec⊗m Id = EU12m Sd12m vec⊗m Id .
(2m − 1)!! m2

The odd cumulants of U are zero and the even ones are

Cum2m (U1 ) −1 ⊗m
Cum2m (U) = L vec Id = Cum2m (U1 ) Sd12m vec⊗m Id .
(2m − 1)!! m2
(5.83)
The higher-order T-moments and T-cumulants of U follow from Theorem 5.2, p.
263. The moment μU1 ,2m is clearly given by the formula 5.81; hence, the moment
μ⊗U,2m as well. The cumulant κU1 ,2m is given through the formula “cumulants via
moments,” see (3.27), p. 131; hence, the cumulant κ ⊗
U,2m needs further calculations
in practice.
Example 5.8 We have fourth-order moment

$$EU^{\otimes 4} = \frac{1}{d(d+2)}\,L^{-1}_{2_2}\operatorname{vec}^{\otimes 2} I_d = \frac{3}{d(d+2)}\,S_{d1_4}\operatorname{vec}^{\otimes 2} I_d;$$

moreover, the sum of the entries is


  3d
EU⊗4 = .
j d +2
j

To get the fourth-order cumulant of U we need the fourth-order cumulant of entries

$$\kappa_{U_1,4} = \mu_{U_1,4} - 3\mu_{U_1,2}^2 = \frac{3}{d(d+2)} - \frac{3}{d^2} = \frac{-6}{d^2(d+2)}.$$
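These moments and the fourth-order cumulant of an entry are easily checked by simulating U = Z/‖Z‖; a short Python sketch (arbitrary dimension and sample size):

import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 10**6
Z = rng.standard_normal((n, d))
U = Z/np.linalg.norm(Z, axis=1, keepdims=True)      # uniform on the sphere S^{d-1}

print(np.mean(U[:, 0]**4), 3/(d*(d + 2)))                                  # (5.80) with k1 = 2
print(np.mean(U[:, 0]**2*U[:, 1]**2), 1/(d*(d + 2)))                       # (5.80) with k1 = k2 = 1
print(np.mean(U[:, 0]**4) - 3*np.mean(U[:, 0]**2)**2, -6/(d**2*(d + 2)))   # kappa_{U1,4}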

Lemma 5.12 Let U be uniform on sphere Sd−1 . The higher-order moments of the
modulus
7

d
1 1  1 d
E |Ui | =
ki
 ((ki + 1) /2) , (5.84)
π d1 Gk (d)
i=1 i=1
#
where k = ki , d1 is the number of nonzeros ki , in particular
7
1 k!
E |Ui | 2k+1
= .
π G2k+1 (d)

√ modulus of a vector is taken by element-wise; we have E |U|


Next, =
1/ πG1 (d) 1d (see (A.25)) and for T-products

1 1 1  
E |U|⊗2 = vecId + 1d 2 − vecId , (5.85)
d π G2 (d)

and
7 7
1 1  
d
⊗3 1 1
E |U| = 
e(k,k,k) + 
e(k,k,j ) (5.86)
π G3 (d) π 2G3 (d)
k=1 j =k
7
1 1 
+ 3

e(j,k,m) ,
π G3 (d)
j =k=m

e(j,k,m) ∈ Sd,3 , and


where

3  d
1 1 
E |U|⊗4 = 
e(j,j,j,j ) + 
e(j,j,j,k)
d (d + 2) π G4 (d)
j =1 j =k

1 
+ 
e(j,j,k,k) (5.87)
4G4 (d)
j =k
7
1 1  1 1 
+ 3

e(j,j,k,m) + 2 
e(j,k,m,n) ,
π G4 (d) π G4 (d)
j =k=m j =k=m=n

e(j,k,m,n) ∈ Sd,4 .
where
Remark 5.5 Observe that in spite of E|Ui|² = EUi², E|U|⊗2 ≠ EU⊗2, since the vector |U|⊗2 contains mixed-product entries |Ui||Uj| (i ≠ j) and these do not equal UiUj. Another observation is that there are only three distinct elements of the third-order and four distinct elements of the fourth-order moments!
Proof Since U is strongly connected to the standard normal random variable on Rd ,
it is quite straightforward to derive the moments of U

EU⊗(2k+1) = 0, k = 1, 2, . . . ;

hence, the odd order cumulants are zero too

Cum2k+1 (U) = 0,

which follows from the expression of the cumulant via moments. (Namely to
express cumulant via moments one uses the moments of products according to the
partitions of the set {1, 2, . . . 2k + 1}. Now, any partition of this set contains at least
one block that has odd number elements, and it makes the expectation zero.) We
have, see (5.80),

(2k − 1)!!
EUi2k =
2k (d/2)k

#
for the even order moments of U, where k = ki , (d/2)k = d/2 (d/2 + 1) · · ·
(d/2 + k − 1), and 2k (d/2)k = d (d + 2) · · · (d + 2 (k − 1)). First we calculate
EU⊗4 . The entries of U⊗4 contain all possible fourth-order products from U. Now
we are interested in those entries of U⊗4 that have even order products. These are
either Uk4 or Uj2 Uk2 , j = k. We shall separate them since EU14 and EU12 U22 , say, are
different. We have
3 1
EU14 = , EU12 U22 =
d (d + 2) d (d + 2)

by (5.80). We notice that EZ⊗4 = 3Sd14 (vecId )⊗2 for a standardized Gaussian
vector with i.i.d. entries. The structure of EZ⊗4 is exactly the same as that of EU⊗4 ,
having nonzero entries EZ14 and EZ12 Z22 , being 3 and 1, respectively. The difference
is 1/d (d + 2) ; hence,

3
EU⊗4 = Sd14 (vecId )⊗2 = EU14 Sd14 (vecId )⊗2 .
d (d + 2)

The sum of the vector (vecId )⊗2 is d 2 ; the symmetrizer Sd14 is rearranging the
entries only; therefore,
  3 3d
EU⊗4 = d2 = .
j d (d + 2) d +2
j

Now let us turn to the modulus. Let Z be standard normal, then Z and Z are
independent, and U = Z/ Z and Z are independent as well. Let

|U|k = (|Z| / Z)k ;

the power and modulus are by element-wise. Hence,

E |Z|k = E |U|k E Zk ;

we have that each entry of Z is identically distributed; therefore,


7 7
2 2
E |Z| 2k+1
= 2 k! k
= (2k)!! ;
π π

each entry of E |Z|2k is (2k − 1)!!, if n = 2k,

2k  ((2k + 1) /2) 2k  (k + 1/2)


E |Z|2k = √ = √ ;
π π

hence, for all n we have the formula


7
2 2n/2  ((n + 1) /2)
E |Z| = 2
n (n−1)/2
 ((n + 1) /2) = √ .
π π

It is known that
 ((k + d) /2)
E Zk = 2k/2 .
 (d/2 )

Hence, we have
7
1  ((n + 1) /2)  (d/2)
E |U| = E |Z| /E Z 1d =
n n n
1d . (5.88)
π  ((n + d) /2)

The modulus of the entries of a T-product of a d dimension vector includes products


of various powers


d
1 
d
E |Ui |ki = E |Zi |ki
i=1
E Z k
i=1
7

d1
 (d/2) (ki −1)/2 2
= k/2 2  ((ki + 1) /2)
2  ((k + d) /2) π
i=1
7
1  1 d
1
=  ((ki + 1) /2) ,
π d1 Gk (d)
i=1
#
since the entries of Z are independent where k = ki , and d1 is the number of
nonzeros ki .
Although we have the general formula (5.88), it is simpler to use separate formulae: for even powers |U|^{2k} = U^{2k} (powers taken entry-wise), see (5.82), and for odd powers we write
7
1  (k + 1)
E |U|2k+1 = 1 2k+1 . (5.89)
π G2k+1 (d) d

We shall evaluate the expected values of |U|⊗3 and |U|⊗4 ; clearly, each of them has
entries of products of modulus with different entries; therefore, it will be necessary
to collect the entries with the same powers.
Let us start by putting U = [Uk ]k=1:d into the form,


d
|U| = |Uk | ek ,
k=1

* *
where ek ∈ Rd are unit vectors. The entries of |U|⊗2 are Uk2 , and *Uj * |Uk |, j = k.
Observe that


d
vecId = 
e(j,j ) ,
j =1

where 
e(j,j ) are orthogonal vectors in Sd,2 , and the sum of all such vectors is 1d 2 .
Now we use the formulae (5.82) and (5.89), respectively, and obtain

1 1 1  
E |U|⊗2 = vecId + 1d 2 − vecId .
d π G2 (d)
⊗3 ⊗3
* * we2 are interested* in* E |U| , where the entries of |U|
Next, include |Uk |3 ,
*Uj * |Uk | , j = k, and *Uj * |Uk | |Um |, j = k = m. These types of products have
different expected values. We write |U|⊗3 in terms of  e(j,k,m) ∈ Sd,3 first
 d ⊗3
 
d * *
|U| ⊗3
= |Uk | ek = |Uk |3 
e(k,k,k) + *Uj * |Uk |2 
e(k,k,j )
k=1 k=1 j =k
 * *
+ *Uj * |Uk | |Um |
e(j,k,m) ,
j =k=m

then use the formulae (5.82) and (5.89), get


7
1 1
E |U1 | = 3
,
π G3 (d)
7
1 1
E |U1 | |U2 | = 2
,
π 2G3 (d)
7
1 1
E |U1 | |U2 | |U3 | = ,
π 3 G3 (d)

note  (3/2) = π/2. Finally, we arrive at
7 7
1 1  
d
1 1
E |U|⊗3 = 
e(k,k,k) + 
e(k,k,j )
π G3 (d) π 2G3 (d)
k=1 j =k
7
1 1 
+ 3

e(j,k,m)
π G3 (d)
j =k=m
7 ⎛ ⎞
1 1 ⎝ 1 1 
d
= 
e(k,k,k) + 
e(k,k,j ) + 2 e(j,k,m) ⎠ .

π G3 (d) 2 π
k=1 j =k j =k=m

We calculate E |U|⊗4 in a similar manner to the previous calculation, consider the


expansion


d * *  * *3  * *2
|U|⊗4 = *Uj *4 
e(j,j,j,j ) + *Uj * |Uk |
e(j,j,j,k) + *Uj * |Uk |2 
e(j,j,k,k)
j =1 j =k j =k
 * *2  * *
+ *Uj * |Uk | |Um |
e(j,j,k,m) + *Uj * |Uk | |Um | |Un |
e(j,k,m,n) ,
j =k=m j =k=m=n

√ 
where e(j,k,m,n) ∈ Sd,4 , and the formulae G4 (d) = d (d + 2) /4, and  (3/2) =
3 π/4

3
EU14 = ,
d (d + 2)
1 1
E |U1 | |U2 |3 = ,
π G4 (d)
1
E |U1 |2 |U2 |2 = ,
4G4 (d)
7
1 1
E |U1 | |U2 | |U3 | =
2
3
,
π G4 (d)
1 1
E |U1 | |U2 | |U3 | |U4 | = ,
π 2 G4 (d)

then we summarize these, and obtain

3 
d
1 1  1 
E |U|⊗4 = 
e(j,j,j,j ) + 
e(j,j,j,k) + 
e(j,j,k,k)
d (d + 2) π G4 (d) 4G4 (d)
j =1 j =k j =k
7
1 1  1 1 
+ 
e(j,j,k,m) + 
e(j,k,m,n) .
π 3 G4 (d) π 2 G4 (d)
j =k=m j =k=m=n

5.7.2 T-Derivative of an Inner Product

Lemma 5.13 Let λ ∈ Rd ; then


 n
Dλ⊗2n λ λ = 2n n!L−1
n2 (vecId )
⊗n
, (5.90)

where the moment commutator matrix L−1


n2 is defined by


L−1
n2 = Kp−1K ;
( n, )
(2n−1)!!

the summation is over all partitions Kn, ∈ P2n , with type  = (0, n, 0, . . . , 0) (see
Sect. A.2, p. 353).
Proof We use the formula (2.58),
⎛ ⎞

2n   ⊗ ⊗
Dx⊗2n f (g (x)) = f (r) (g) ⎝ Kp−1K ⎠ gx,j j ,
( r, )
r=1 j =r,j j =2n Kr, ∈P2n
(5.91)

where partitions Kr, ∈ P2n , with size r and type . The type  = 1:n =
(1 , . . . , n ) means that the j th-order derivative happens j times. Let f (g (λ)) =
(λ λ)n , so that f (x) = x n , and g (λ) = λ λ. In that case if j is larger than 2, then
⊗
gλ,jj = 0.
Let r = n−1; then the set 1 : 2n is split up into n−1 blocks; therefore, there must
 ⊗
be at least one block with cardinality at least 3; hence, ⊗ gλ,jj = 0. Similarly, if
 ⊗
r < n, then ⊗ gx,j j = 0. If r > n, then f (r) (x) = 0.
The only nonzero value occurs in the sum (5.91), p. 304, when r = n, and j = 2,
⊗n
2 = n, j = 0, j = 2. We obtain f (n) (x) = n!, gλ,2 = 2 (vecId )⊗n ; in addition
we need all the partitions Kn, ∈ P2n , with  = (0, n, 0, . . . , 0). The number of
these partitions N = (2n)!/2n n! = (2n − 1)!!, see (2.53), p. 91, and the assertion
follows.

5.7.3 Proof of (5.44)

Proof We use Brillinger’s theorem ((3.67), p. 156) and condition Q = [β1 , R]


again:
   
κ⊗ ⊗ −1 ⊗ ⊗
|W1 |,4 = Cum1 κ |W1 |,4|Q + L13 ,11 Cum2 κ |W1 |,3|Q , κ |W1 |,1|Q
 
+L−1 ⊗
22 Cum2 κ |W1 |,2|Q
   
+L−1
12 ,21 Cum 3 κ ⊗
, κ ⊗
, κ ⊗
|W1 |,2|Q |W1 |,1|Q |W1 |,1|Q + Cum 4 κ ⊗
|W1 |,1|Q

= μβ1 ,4 μR,4 κ ⊗ −1 ⊗ ⊗ −1 ⊗2
|U1 |,4 + κβ13 R 3 ,β1 R L13 ,11 κ |U1 |,3 ⊗ κ |U1 |,1 + L22 κβ12 R 2 ,2 κ |U1 |,2
 
+κβ 2 R 2 ,β1 R,β1 R L−1
1 ,2
2 1
κ ⊗
|U 1 |,2 ⊗ κ ⊗2
|U 1 |,1 + κβ1 R,4 κ ⊗4|U1 |,1
1

= Sd14 μβ1 ,4 μR,4 κ ⊗ ⊗ ⊗ ⊗2
|U1 |,4 + 4κβ13 R 3 ,β1 R κ |U1 |,3 ⊗ κ |U1 |,1 + 3κβ12 R 2 ,2 κ |U1 |,2

+6κβ 2 R 2 ,β1 R,β1 R κ ⊗ |U1 |,2 ⊗ κ ⊗2
|U1 |,1 + κ β 1 R,4 κ ⊗4
|U1 |,1
1

(see (A.5), p. 354 for commutator L−1 12 ,21 , (A.36), p. 375 for κβ13 R 3 ,β1 R , say). Only
those conditional cumulants are nonzero, which separate |W1 | and W2 and do not
include the first-order cumulant of W2 . Let us denote 2 = Id −  , for short;
then
   
⊗2 ⊗ 2 κ⊗
1/2⊗2 1/2 1/2
|W1 |12 ,W212 = Cum 4  |W 1 | ,  |W1 | ,  2 W 2 ,  2 W2
 
= L−1 ⊗ ⊗
22 Cum2 κ |W1 |,2|Q , κ 1/2 + L−1
21 ,12 Cum3
2 W2 ,2|Q
 
× κ⊗ ⊗ ⊗
|W1 |,1|Q , κ |W1 |,1|Q , κ 1/2 2 W2 ,2|Q
 
= κβ 2 R 2 ,β 2 R 2 L−1
22 κ⊗ ⊗
|U1 |,2 ⊗ μ 1/2 + κβ1 R,β1 R,β 2 R 2 L−1
21 ,12
1 2 2 U2 ,2 2
 
× κ ⊗2 ⊗
|U1 |,1 ⊗ μ 1/2 2 U2 ,2
κβ 2 R 2 ,β 2 R 2  
L−1 ⊗2 ⊗ 1/2⊗2
= 1 2
22  κ |U1 |,2 ⊗ 2 vecId
d
κβ1 R,β1 R,β 2 R 2  
L−1 ⊗2 ⊗2 1/2⊗2
+ 2
21 ,12  κ |U1 |,1 ⊗  2 vecId
d
κβ 2 R 2 ,β 2 R 2  
= 1 2
L−1
22 ⊗2 ⊗
κ |U1 |,2 ⊗ vec 2
d
κβ1 R,β1 R,β 2 R 2  
+ 2
L−1
21 ,12  ⊗2 ⊗2
κ |U1 |,1 ⊗ vec 2 .
d
We use conditional cumulants and neglect all terms that include W2 with odd order;
we get

κ ⊗1/2 = μβ2 ,4 μR,4 κ ⊗1/2 + κβ 2 R 2 ,2 L−1 ⊗2


22 κ 1/2
2 W2 ,4 2 U2 ,4 2 2 U2 ,2

−6μβ2 ,4 μR,4 κβ 2 R 2 ,2
= vec⊗2 2 + 2 2 L−1 ⊗2
22 vec 2 .
d (d + 2)
2 d

One can get the final expression of κ ⊗X,4 by plugging these terms into (5.44) and

conclude the dependence of κ X,4 on skewness matrix  and generating variate R;
the rest of the quantities depend on dimensions d and p only.

5.7.4 Proof of Lemma 5.6

Proof The assertion (5.45) simply follows from the assumptions of the model
CFUSS, cf. (5.36).
Using Brillinger’s theorem ((3.67), p. 156) for T-cumulant κ ⊗
|W1 |1m ,W212k under
condition Q = [β1 , R], we obtain
 
κ⊗
|W1 | = K−1
  Cum
s+t
1m ,W212k p K{s} ,L{t}
t =1:k, K{s} ∈Pm ,
s=1:m L ∈P2k
{t}
 
× κ⊗ ⊗ ⊗ ⊗
|W1 |,|a1 ||Q , . . . , κ |W1 |,|as ||Q , κ W2 ,|b1 ||Q , . . . κ W2 ,|bt ||Q ,

since by Lemma 5.5 cumulants of W2 with odd orders are zero. Now conditional
cumulants

κ⊗
|W1 |,|aj ||Q
= Cum|aj | (|W1 | |Q) = (β1 R)|aj | κ ⊗
|U1 |,|aj |
,

and

κ⊗
W2 ,|bj ||Q
= Cum|bj | (W2 |Q) = (β2 R)|bj | κ ⊗
U2 ,|b1 | .

Therefore
 
Cums+t κ ⊗ |W1 |,|a1 ||Q , . . . , κ ⊗
, κ ⊗
|W1 |,|as ||Q W2 ,|b1 ||Q , . . . , κ ⊗
W2 ,|bt ||Q

= Cums+t (β1 R)|a1 | κ ⊗ |U1 |,|a1 | , . . . , (β1 R)
|as | ⊗
κ |U1 |,|as | ,

(β2 R)|b1 | κ ⊗
U2 ,|b1 | , . . . , (β2 R)
|bt | ⊗
κ U2 ,|bt |
 
= Cums+t (β1 R)|a1 | , . . . , (β1 R)|as | , (β2 R)|b1 | , . . . , (β2 R)|bt |
⊗ ⊗
κ⊗
|U1 |,|aj |
⊗ κ⊗U2 ,|bj |
⊗ ⊗ 
= ϒs,t (β1 , R) κ⊗
|U1 |,|am | ⊗ μU,|bj | L−1 ⊗k
|b | vec Id , j 2

where we introduced the notation


 
ϒs,t (β1 , R) = Cums+t (β1 R)|a1 | , . . . , (β1 R)|as | , (β2 R)|b1 | , . . . , (β2 R)|bt | .

We use Brillinger’s theorem again and obtain


 
ϒs,t (β1 , R) = Cums+t (β1 R)|a1 | , . . . , (β1 R)|as | , (β2 R)|b1 | , . . . , (β2 R)|bt |
    
= Cumk Cum (β1 R)|ai | , (β2 R)|bj | |β1
h∈K
K∈Ps+t
    
= Cumk β1m β22k Cum|h| R |ai | , R |bj | |i,j ∈h
h∈K
K∈Ps+t
   
= κβ m β 2k ,k Cum|h| R |ai | , R |bj | |i,j ∈h .
1 2 h∈K
K∈Ps+t

5.8 Exercises

5.1 Show
 
(4) (3) (2) (1)
ρiM = (x − 2iρiM ) ρiM + 3ρiM 1 − 2iρiM ,

and conclude
(4) 5 3
ρiM (0) = 24ρiM (0) − 20ρiM (0) + 3ρiM (0) ;

hence,
7 5 7 3 7
(4) 2 2 2
κ|Z|,5 = i(−i)5 ρiM (0) = 24 − 20 +3 .
π π π

Next use the formula (3.27), p. 131, and show the similar result.
5.2 Take an X with multivariate skew-normal distribution and use central moments
to show
7 7
⊗  2 2 ⊗3
μX,3 = 3 vec ⊗ δ − δ
π π

(see Genton [GHL01]).


5.3 Let X be multivariate skew-normal variate; show that


μ⊗
X,4 = 3 (vec )
⊗2
.

5.4 Let Wj be a standardized entry of a spherically distributed random variate W.


Show that


κ4 = 
μ4 − 4 μ22 + 6
μ3 − 3 μ2 ,

κ5 = 
μ5 − 5
μ4 − 10
μ3  μ22 + 10
μ2 + 30 μ3 − 10
μ2 .

5.5 Show by induction that 2 + 2 order derivative fulfills


  (2+2)
+1
λ2 = 2+1 · ( + 1)! · (2 + 1)!!.

5.6 Let W1 = RU1 , R, and U1 be independent. Use the independency of generating


variate R and Uj , and Lemma 3.4, p. 159 to show

κRU1 ,4 = κR 4 ,1 κU1 ,4 + 3κR 2 ,2 κU2 1 ,2 .

5.7 Let W1 = RU1 , R, and U1 be independent; show that


 3
κW1 ,6 = μR,6 μU1 ,6 − 15μR,4 μR,2 μU1 ,4 μU1 ,2 + 30 μ2R,2 μ2U1 ,2 .

5.8 Let generating variate R be Gamma distributed with parameters ϑ > 0,


α > 0, and then we have μR,r = ϑ r  (α + r) /  (α). Let m = 2, 4, and 6.
Derive generator moment νm , moment parameter 
μm , generator cumulants ζm , and
cumulant parameter 
κm of W = RU, see (5.15).
5.9 Let X be distributed as CFUSS distribution, which is given by Eq. (5.36). Show
μβ2 ,2 μR,2  
μ⊗ ⊗2 ⊗
X,2 = μβ1 ,2 μR,2  μ|U1 |,2 + Id 2 − ⊗2 vecId ,
d
where
 
1 1 1 1 1
μ⊗
|U1 |,2 = − vecIp + 1 2.
p π G2 π G2 p

5.10 Let random variable β12 be Beta(p/2, d/2) distributed, and β22 = 1 − β12 .
Assume that the random variables R and β1 are independent. Use conditional
cumulants to show

κβ1 R,β2 R = κR,2 μβ1 ,β2 + μ2R κβ1 ,β2


= κβ1 ,β2 μR,2 + μβ1 μβ2 κR,2
= μβ1 ,β2 μR,2 − μβ1 μβ2 μ2R ,

and

Cov (β1 R) = κβ1 R,2 = μβ1 ,2 κR,2 + μ2R κβ1 ,2 .

5.11 Exercise 5.10 continued, show that

$$\kappa_{\beta_1R,\;\beta_2^2R^2}=\mu_{\beta_1,\beta_2^2}\,\kappa_{R,R^2}+\kappa_{\beta_1,\beta_2^2}\,\mu_{R,2}\,\mu_{R,1}=\mu_{R,3}\,\kappa_{\beta_1,\beta_2^2}+\mu_{\beta_1}\,\mu_{\beta_2^2}\,\kappa_{R,R^2}.$$

5.12 Take random vectors W1 and W2 defined by model (5.36), and use conditional
cumulants to show

Cum3 (|W1 | , |W1 | , W2 ) = 0.

5.13 Assume that $X\in St_d(\mu,\Sigma,\alpha,p)$ and $p>2$; use formula (5.58) to derive the second-order cumulant
$$\kappa^{\otimes}_{X,2}=\frac{p}{p-2}\operatorname{vec}\Sigma-\frac{p}{\pi}G_{-1}^{2}(p)\,\delta^{\otimes 2}.$$

5.14 Let W = Rp V be given by (5.54); use Lemma 3.4 to show


   
$$\begin{aligned}\kappa^{\otimes}_{W,3} &= \kappa_{R_p,3}\left(\kappa^{\otimes}_{V,3}+\mathbf{L}^{-1}_{1_2,1_1}\left(\kappa^{\otimes}_{V,2}\otimes\kappa^{\otimes}_{V,1}\right)+\kappa^{\otimes 3}_{V,1}\right)\qquad (5.92)\\ &\quad+\kappa_{R_p,1}\kappa_{R_p,2}\left(3\kappa^{\otimes}_{V,3}+2\mathbf{L}^{-1}_{1_2,1_1}\left(\kappa^{\otimes}_{V,2}\otimes\kappa^{\otimes}_{V,1}\right)\right)+\kappa^{3}_{R_p,1}\kappa^{\otimes}_{V,3}\\ &=\kappa_{R_p,3}\left(\kappa^{\otimes}_{V,3}+3\,\kappa^{\otimes}_{V,2}\otimes\kappa^{\otimes}_{V,1}+\kappa^{\otimes 3}_{V,1}\right)+\kappa_{R_p,1}\kappa_{R_p,2}\left(3\kappa^{\otimes}_{V,3}+6\,\kappa^{\otimes}_{V,2}\otimes\kappa^{\otimes}_{V,1}\right)+\kappa^{3}_{R_p,1}\kappa^{\otimes}_{V,3}.\end{aligned}$$
3 ⊗

5.15 Let $W=R_p V$; recall $V\in SN_d(0,\Sigma,\alpha)$. Use (5.58) to show
$$\kappa^{\otimes}_{W,3}=\kappa_{R_p^3,1}\,\kappa_{|Z|,3}\,\delta^{\otimes 3}+3\,\kappa_{R_p,R_p^2}\,\kappa_{|Z|,1}\,\delta\otimes\kappa^{\otimes}_{V,2}+\kappa_{R_p,3}\,\kappa_{|Z|,1}^{3}\,\delta^{\otimes 3}$$
$$=\left(\kappa_{R_p^3,1}\,\kappa_{|Z|,3}+\kappa_{R_p,3}\,\kappa_{|Z|,1}^{3}-\frac{2}{\pi}\,3\,\kappa_{R_p,R_p^2}\,\kappa_{|Z|,1}\right)\delta^{\otimes 3}+3\,\kappa_{R_p,R_p^2}\,\kappa_{|Z|,1}\,\delta\otimes\operatorname{vec}\Sigma.$$

5.16 Let W be given by the previous exercises. Use Sect. A.6.1, p. 370 and (A.38),
p. 379 and show
$$\kappa^{\otimes}_{W,3}=p^{3/2}\sqrt{\frac{2}{\pi}}\,G_{-1}\left(\frac{2}{\pi}G_{-1}^{2}-\frac{1}{p-3}\right)\delta^{\otimes 3}+\frac{3\,p^{3/2}}{(p-2)(p-3)}\sqrt{\frac{2}{\pi}}\,G_{-1}\,S_{d1_3}\!\left(\operatorname{vec}\Sigma\otimes\delta\right).$$

5.17 Assume that $X\in St_d(\mu,\Sigma,\alpha,p)$ and $p>3$. Show that
$$\kappa_{X,3}=2\sqrt{\frac{2}{\pi}}\left(\frac{p}{2}\right)^{3/2}G_{-1}(p)\left(\frac{2}{\pi}G_{-1}^{2}(p)-\frac{1}{p-3}\right)\delta^{\otimes 3}+\frac{2}{(p-3)(p-2)}\left(3\,\delta\otimes\operatorname{vec}\Sigma\right).$$

5.18 Let Rp be given by (5.53). Show that


$$-\kappa_{R_p^2,2}+\kappa_{R_p,R_p,R_p^2}=-\left(\frac{p}{2}\right)^{2}\frac{4}{(p-2)(p-3)}\,G_{-1}^{2}(p),$$
and
$$-6\,\mu_{R_p,4}+8\,\kappa_{R_p,R_p^3}+3\,\kappa_{R_p^2,2}-6\,\kappa_{R_p,R_p,R_p^2}+\kappa_{R_p,4}=-6\left(\frac{p}{2}\right)^{2}G_{-1}(p)^{4}.$$
5.19 Let Rp be given by (5.53). Show that

κRp ,Rp3 = μRp ,4 − μRp μRp ,3 .

5.20 Let $R_p$ be given by (5.53). Show that
$$\kappa_{R_p,R_p,R_p^2}=\mu_{R_p^4}-\mu_{R_p^2}^{2}-2\,\mu_{R_p}\,\mu_{R_p^3}+2\,\mu_{R_p}^{2}\,\mu_{R_p^2}.$$

5.21 Assume h (x) is defined by (5.66) and Z is a standard normal variate; show
7    
1 2 1 6 π 1 1
μh(|Z|),5 = + + 3 M (1/α) − + 5 .
3α 5 π α4 α2 2 α α2

5.9 Bibliographic Notes

Moments of folded normal distribution are obtained in Elandt [Ela61]. The mul-
tivariate skew-normal distribution was introduced by Azzalini and Dalla Valle
[ADV96]. Further properties and applications of skew-normal distribution including
canonical fundamental skew normal can be found in papers [AC99, GHL01, PP01,
AC03, AVGQ04, Azz05, AVG05, Cap12, CC13, Shu16] and [AVFG18] among
others.
We refer the reader to the book by Fang, Kotz, and Ng, [FKN17] for the theory
of spherically symmetric or rotationally symmetric distributions. The distributions
of elliptical or ellipsoidal symmetry are considered in [Ste93, Sut86]; higher-
order moments are given in [BB86]. Further properties of such distributions

are given in [And03, KvR06, Mui09], and asymmetric versions such as the canonical
fundamental skew-spherical distribution are treated in [AB02, Gen04, GL05, DL05]. For the
skew t-distribution we follow [AC03, SDB03]; [KM03] also deals with it. We refer to a
review article [Ser04] on multivariate symmetry. Some more skew distributions
include scale mixtures of skew normal [Kim08], multivariate skew-normal-Cauchy
[KRYAV16], and multivariate Laplace distribution [KS05, Kv95]. A recent review
of skew-symmetric (also called symmetry-modulated) distributions can be found in
[AA20] and [JTT21b].
Chapter 6
Multivariate Skewness and Kurtosis

Abstract A unified treatment of all currently available cumulant-based indices of


multivariate skewness and kurtosis is provided in this chapter. They are expressed in
terms of the third and fourth-order cumulant vectors, respectively. Such a treatment
helps to reveal many subtle features and inter-connections among the existing
indices. Computational formulae for obtaining these measures are provided for
general families of multivariate distributions, yielding several new results and a
systematic exposition of many known results. Based on this analysis, new measures
of skewness and kurtosis are proposed.
For a given multivariate distribution, explicit formulae are provided for the
asymptotic covariances of estimated cumulant vectors of the third and the fourth
order, which are needed for showing the asymptotic normality of test statistics based
on them. This allows us to extend several known results and provides ready-to-use
expressions in terms of population cumulants and commutator matrices, for any
symmetric and asymmetric distribution as long as the required moments exist.
Statistical inference for both multivariate skewness and kurtosis is studied.

6.1 Multivariate Skewness of Random Vectors

Using the standard normal distribution as the yardstick, statisticians have defined
notions of skewness (asymmetry) and kurtosis (peakedness) in the univariate case.
Since all the odd central moments (when they exist) are zero for a symmetric
distribution on the real line, a first attempt at measuring asymmetry is to ask how
different the third central moment is from zero, although in principle one could use
any other odd central moment, or even a combination of them.
The analysis presented here is based on the cumulant vectors of the third and
fourth order, defined below. In our derivations, we utilize an elegant and powerful
tool: the T-derivative and the T-cumulants.
In this section and the next one, it will be shown that all cumulant-based measures
of skewness and kurtosis that have appeared in the literature can be expressed in
terms of the third and fourth cumulant vectors, respectively. Also, several hitherto
unnoticed relationships between different indices will be explored. We define what


we consider to be natural measures of multivariate skewness and kurtosis and show


their relation to the measures defined by others.
Let X be a d-dimensional random vector whose first four moments exist. We will
denote EX by μ and the positive definite variance–covariance matrix VarX by .
We consider the standardized vector variate

Y =  −1/2 (X − μ)

with zero means and identity matrix for its variance–covariance. A complete
picture of skewness is contained in the third-order T-cumulant (skewness vector)
of standardized X
$$\kappa^{\otimes}_{Y,3}=\operatorname{Cum}_3\!\left(\Sigma^{-1/2}(X-\mu)\right)=\left(\Sigma^{-1/2}\right)^{\otimes 3}\kappa^{\otimes}_{X,3}.$$

We shall use the Kendall–Stuart notation for both skewness and kurtosis.
Definition 6.1 The third-order T-cumulant γ ⊗ ⊗
X,1 = κ Y,3 of the standardized X will
be called the skewness vector of X, and the total skewness of X is defined by the
square norm of the skewness vector

$$\gamma_{1,d}=\left\|\gamma^{\otimes}_{X,1}\right\|^{2}.$$

This definition guarantees that skewness is invariant under the shifting and
orthogonal transformations, in other words it is affine invariant. Let Q be an
orthogonal transformation. We show that the total skewness of X is invariant under
the orthogonal transformations Q of X, indeed

γ⊗ ⊗ ⊗
QX,1 = κ Y,3 = γ X,1 .

In view of the fact that EY = 0 and Y has unit variance, the third-order cumulant
equals the third-order central moment, so that

γ⊗ ⊗
X,1 = μY,3 . (6.1)

In general, cumulants are invariant under shifts by a constant; hence the assumption that EY = 0 does not affect the treatment, while using moments simplifies some formulae.
The skewness vector γ ⊗ ⊗
X,1 is 3-symmetric, i.e. γ X,1 ∈ Sd,3 , and therefore not all
entries are distinct. For instance: if d = 2, then
$$\gamma_{1,2}=\left\|\gamma^{\otimes}_{X,1}\right\|^{2}=\kappa_{Y_1,3}^{2}+3\,\kappa_{(Y_1,Y_1,Y_2)}^{2}+3\,\kappa_{(Y_1,Y_2,Y_2)}^{2}+\kappa_{Y_2,3}^{2}.$$

The skewness for a multivariate Gaussian vector Y is zero, as it is for any


distribution which is symmetric, so that it can be considered as a departure from
symmetry.

Now we list skewness measures for some distributions which have been consid-
ered in the previous chapters.
Case 6.1 (Normal Distribution) We see that the multivariate Gaussian distribution
is not skewed since cumulants of order higher than two are all zero.
Case 6.2 (Skew-Normal Distribution) If X is a skew-normal variate, X ∈
$SN_d(\mu,\Sigma,\alpha)$, then the variance is given by
$$\kappa^{\otimes}_{X,2}=\operatorname{vec}\Sigma-\frac{2}{\pi}\delta^{\otimes 2},\qquad \operatorname{Var}X=\Sigma-\frac{2}{\pi}\delta\delta^{\top},$$
therefore the skewness vector is
 
$$\gamma^{\otimes}_1=\kappa_{|Z|,3}\left(\Sigma-\frac{2}{\pi}\delta\delta^{\top}\right)^{-1/2\,\otimes 3}\delta^{\otimes 3},$$
where $\kappa_{|Z|,3}=\sqrt{2/\pi}\,(4/\pi-1)$ and $\delta$ is the skew parameter (see Lemma 5.1).
Hence the total skewness can be written as
$$\left\|\gamma^{\otimes}_1\right\|^{2}=\kappa_{|Z|,3}^{2}\left\|\left(\Sigma-\frac{2}{\pi}\delta\delta^{\top}\right)^{-1/2\,\otimes 3}\delta^{\otimes 3}\right\|^{2}=\kappa_{|Z|,3}^{2}\left(\delta^{\top}\left(\Sigma-\frac{2}{\pi}\delta\delta^{\top}\right)^{-1}\delta\right)^{3}.$$

One can observe that the skew parameter is $\delta=\sqrt{\pi/2}\,\mu_X$ (see (5.4), p. 246); therefore, the total skewness can be expressed through the mean and the variance as
$$\left\|\gamma^{\otimes}_1\right\|^{2}=\left(\frac{4-\pi}{2}\right)^{2}\left(\mu_X^{\top}\left(\Sigma-\mu_X\mu_X^{\top}\right)^{-1}\mu_X\right)^{3}.$$

Case 6.3 (CFUSN) The skewness for canonical fundamental skew-normal distri-
bution, $X\in CFUSN_{d,p}(0,\Sigma,\Delta)$, is based on the results of Lemma 5.3, namely the variance and the third-order cumulant, which are given by
$$\operatorname{Var}X=\Sigma-\frac{2}{\pi}\Delta\Delta^{\top},\qquad \kappa^{\otimes}_{X,3}=\sqrt{\frac{2}{\pi}}\left(\frac{4}{\pi}-1\right)\Delta^{\otimes 3}\, i^{\otimes}_{p,3}.$$

Now we combine the variance VarX and the cumulant κX,3 to obtain the skewness
vector of X as
$$\gamma^{\otimes}_1=\sqrt{\frac{2}{\pi}}\left(\frac{4}{\pi}-1\right)\left(\Sigma-\frac{2}{\pi}\Delta\Delta^{\top}\right)^{-1/2\,\otimes 3}\Delta^{\otimes 3}\, i^{\otimes}_{p,3}=\kappa_{|Z|,3}\left(\left(\Sigma-\frac{2}{\pi}\Delta\Delta^{\top}\right)^{-1/2}\Delta\right)^{\otimes 3} i^{\otimes}_{p,3},$$

and the total skewness γ1,d of X is

$$\gamma_{1,d}=\left\|\gamma^{\otimes}_1\right\|^{2}=\frac{2}{\pi}\left(\frac{4}{\pi}-1\right)^{2}\,\mathbf{1}_p^{\top}\left(\Delta^{\top}\left(\Sigma-\frac{2}{\pi}\Delta\Delta^{\top}\right)^{-1}\Delta\right)^{\circ 3}\mathbf{1}_p,$$
where $\circ$ denotes the Hadamard power.


Case 6.4 (Elliptically Symmetric Distribution) We have γ ⊗1 = 0, (see Theorem 5.2,
p. 263) for elliptically symmetric random variate X ∈ Ed (μ, , g).
Case 6.5 (CFUSS) We refer to the previous chapter for skewness of Canonical
Fundamental Skew-Spherical distribution X ∈ CF U SSd,p (0, , ); the variance
is at (5.40), and the third-order cumulant is at (5.43) p. 271.
Case 6.6 (Skew-t Distribution) We refer to (5.56), p. 278 for the variance and
Lemma 5.8, p. 279 for the third-order cumulant which are necessary to obtain the
skewness of distribution Std (μ, , α, p). We will skip the details.
Case 6.7 (SMSN) Let us consider the Scale Mixtures of Skew-Normal Distribution,
X ∈ SMSNd (μ, K (η) , α), where
7
2  2
VarX =μξ,2 − μ2ξ δδ and κ ⊗
X,3 = c1 δ
⊗3
+ κξ,ξ 2 L−1
12 ,11 vec ⊗ δ,
π π

see (5.60), p. 285, and (5.61), p. 286. Let us recall that c1 is given explicitly

   2 3/2 7
2
c1 = 3μξ,3 − κξ,1
3
− μξ,3 .
π π

Then the skewness vector is


 −1/2⊗3  7 
22 2
γ⊗
1 = μξ,2 − μξ δδ  ⊗3 −1
c1 δ + κξ,ξ 2 L12 ,11 vec ⊗ δ .
π π

Case 6.8 (SNC) Let X be multivariate skew-normal-Cauchy distributed, X ∈


SNC (,a), (5.75) then
7  3/2 
1 ⊗ 2 2
VarX =  − 2 M2 (1/α) δδ  and κX,3 = μh(|Z|),3 + 2! μh(|Z|) δ ⊗3 ,
3
α π π

(see (5.71), p. 291). Clear expressions of μh(|Z|) and μh(|Z|),3 are given in
Examples 5.6 and 5.7, p. 294. Hence the skewness vector is
7  3/2   −1/2 ⊗3
2 2 1 2
γ⊗
1 = μh(|Z|),3 + 2! 3
μh(|Z|)  − 2 M (1/α) δδ 
δ .
π π α

Case 6.9 (ML) If X has Multivariate Laplace distribution, X ∈ MLd (θ , ), then

VarX =  and κ ⊗ −1 ⊗3
X,3 = L12 ,11 (vec  ⊗ θ ) − θ ,

(see (5.77)); therefore, the skewness vector is


 
γ⊗
1 =
−1/2⊗3
L−1
12 ,11 (vec  ⊗ θ ) − θ
⊗3
.

The following examples reveal several relationships among the well known
indices of skewness which appeared in the literature, and their connection to the
skewness vector γ ⊗
1 , which can actually be seen as the common denominator.
Example 6.1 (Mardia's Skewness) Mardia suggested the square norm $\beta_{1,d}=\left\|\mathrm{E}\,Y^{\otimes 3}\right\|^{2}$ of the vector $\mathrm{E}\,Y^{\otimes 3}=\mathrm{E}\left(\Sigma^{-1/2}(X-\mu)\right)^{\otimes 3}$ as a measure of departure from symmetry for X. Mardia's measure coincides with our total skewness
$$\beta_{1,d}=\left\|\gamma^{\otimes}_{X,1}\right\|^{2}=\gamma_{1,d},$$

since third-order central moments and third-order cumulants are equal. Recall that
γ⊗ ⊗
X,1 = κ Y,3 by Definition 6.1. Let Y1 and Y2 be two independent copies of Y, then

$$\mathrm{E}\left(Y_1^{\top}Y_2\right)^{3}=\mathrm{E}\,Y_1^{\otimes 3\,\top}Y_2^{\otimes 3}=\mathrm{E}\,Y_1^{\otimes 3\,\top}\,\mathrm{E}\,Y_2^{\otimes 3}=\left\|\mathrm{E}\,Y^{\otimes 3}\right\|^{2}=\beta_{1,d}.$$

This suggests an estimator


$$\widehat{\beta}_{1,d}=\frac{1}{n^{2}}\sum_{i,j=1}^{n}\left(Y_i^{\top}Y_j\right)^{3}\qquad (6.2)$$

based on a sample Y1 , . . . , Yn .
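For numerical work, (6.2) can be evaluated directly. The following minimal NumPy sketch assumes that `y` is an (n, d) array holding an already standardized sample (zero mean, identity covariance); the function name is illustrative only.

```python
import numpy as np

def mardia_skewness(y):
    """Estimate (6.2): (1/n^2) * sum over i, j of (y_i' y_j)^3.

    Assumes `y` is an (n, d) array of standardized observations."""
    g = y @ y.T              # Gram matrix of inner products y_i' y_j
    return float(np.mean(g ** 3))
```

Since the double sum equals the squared norm of the sample mean of $Y_i^{\otimes 3}$, for very large samples one may instead accumulate that third-moment vector and take its squared norm.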
Example 6.2 (Móri–Székely–Rohatgi) The skewness vector b (Y) of Y can be
defined by the quantity

 
$$b(Y)=\mathrm{E}\left[\left(Y^{\top}Y\right)Y\right]=\left(\operatorname{vec}^{\top} I_d\otimes I_d\right)\kappa^{\otimes}_{Y,3},\qquad (6.3)$$

(see Exercise 6.1) and we will refer to it as MSzR skewness vector. Note that
vec Id ⊗ Id is a matrix of dimension d × d 3 , which contains d unit values per-
row, whereas all the others are 0; as a consequence, this measure
 does not
 take into
account the contribution of cumulants of the type Cum3 Xj , Xk , X , where all
three indices j , k,  are different from each other. The corresponding scalar measure
of multivariate skewness is b (Y) = b (Y)2 .
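A small sketch of the corresponding sample quantities, again assuming an (n, d) array `y` of standardized data (function names are illustrative):

```python
import numpy as np

def mszr_skewness_vector(y):
    """Sample version of b(Y) = E[(Y'Y) Y] for standardized y of shape (n, d)."""
    r2 = np.sum(y * y, axis=1)           # squared norms Y'Y
    return (r2[:, None] * y).mean(axis=0)

def mszr_skewness(y):
    """Scalar MSzR measure ||b(Y)||^2."""
    b = mszr_skewness_vector(y)
    return float(b @ b)
```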
Example 6.3 (Malkovich–Afifi Measure) They consider the following approach to
measure skewness: Let Sd−1 be the d − 1 dimension unit sphere in Rd and let

u ∈ Sd−1 . Note that


 
$$\operatorname{Cum}_3\!\left(u^{\top}Y\right)=u^{\otimes 3\,\top}\kappa^{\otimes}_{Y,3},$$
where $u\in S_{d-1}$. Malkovich–Afifi define their measure of skewness by
$$b(Y)=\sup_{u}\left(u^{\otimes 3\,\top}\kappa^{\otimes}_{Y,3}\right)^{2}.$$

= u⊗3 u⊗3 = (u u)⊗3 = 1; therefore,


2
We have u⊗3
 2 2  
u⊗3 κ ⊗
Y,3 = κ⊗
Y,3 cos2 u⊗3 , κ ⊗
Y,3 ,

where cos (a, b) indicates the cosine of the angle between the vectors a and b. Thus
we have the following inequality:
 2   
2 2
b (Y) = sup u⊗3 κ ⊗
Y,3 = κ⊗
Y,3 sup cos2 u⊗3 , κ ⊗ ⊗
Y,3 ≤ κ Y,3 .
u u
 
⊗3
In case there exists a u0 such that cos u0 , κ ⊗
Y,3 = 1; in other words κ ⊗ Y,3
is a rank 1 element of the symmetric subspace Sd,3 , (see Sect. 1.3, p. 13) then
2
Malkovich–Afifi measure b (Y) = κ ⊗ Y,3 = γ1,d . For instance this occurs when
X is skew-normal. Another instance is when Y has independent components. In
this case all the entries of κ ⊗ Y,3 are zero but not necessarily κYi ,3 . Assume that all
the components of Y are skewed, κYi ,3 = 0, otherwise we can restrict ourselves
to a smaller dimensional problem. A particular example of this is when d = 2,
κ⊗ 
Y,3 = [κY1 ,3 , 0, 0, 0, 0, 0, 0, κY2 ,3 ] . In the case of independence we have


d
κ⊗
Y,3 = κYi ,3 e⊗3
i .
i=1

Hence the cumulants κYi ,3 of entries of Y constitute a vector in Rd , let us denote it



κ 3 ∈ Rd . It looks reasonable to choose the unit vector either by the equation u1 =
   
√ 1/3 √
 κ 3 , or by u2 = 3 κYj ,3 / κ ⊗
κ 3 /  Y,3 2/3 , (normalizing the vector 3 κY ,3
j ).
j j
 2 2
⊗3
One can see that in both cases uj κ ⊗ ⊗
Y,3 , j = 1, 2, are less than κ Y,3 .

Remark 6.1 Take an orthogonal transformation Q, and let X = QY. Then the
Malkovich–Afifi measure of skewness of X and that of Y are equal, i.e. b (X) =
b (Y).
 2  2   2
Indeed u⊗3 , κ ⊗ = u⊗3 , Q⊗3 κ ⊗ = Q −1 u ⊗3 , κ ⊗ ≤ b (Y)
Y,3 Y,3 Y,3

 2
for any u, and for any Q, hence b (X) ≤ b (Y). Similarly u⊗3 , κ ⊗ Y,3 =
 2  2
u⊗3 , Q−1⊗3 κ ⊗
X,3 = (Qu)⊗3 , κ ⊗
X,3 ≤ b (X), hence b (Y) ≤ b (X).

Example 6.4 (Balakrishnan’s Extension of Malkovich–Afifi Measure) A possible


multivariate
  extension of Malkovich–Afifi measure is the following. Let us denote
 d ω as the normalized Lebesgue element of surface area on Sd−1 and define
4   4
T= u u⊗3 κ ⊗
Y,3  (du) = u ⊗ u⊗3  (du) κ ⊗
Y,3 . (6.4)
Sd−1 Sd−1

We see that the extension is a transformation of skewness vector γ ⊗


1 multiplied by
a constant. If d = 3 , the result is
⎡ ⎤ ⎡#3 ⎤
κY1 ,3 + κ(Y1 ,Y2 ,Y2 ) + κ(Y1 ,Y3 ,Y3 ) j =1 κ(Y1 ,Yj ,Yj )
3 3 ⎢#3 ⎥
T= ⎣κ(Y ,Y ,Y ) + κY ,3 + κ(Y ,Y ,Y ) ⎦ = ⎣ κ ⎦,
15 1 1 2 2 2 3 3
15 #j3=1 (Y2 ,Yj ,Yj )
κ(Y1 ,Y1 ,Y3 ) + κ(Y2 ,Y2 ,Y1 ) + κY3 ,3 j =1 κ(Y3 ,Yj ,Yj )

in general, the kth entry Tk of T is

3  d
Tk = κ(Yk ,Yj ,Yj ) .
d (d + 2)
j =1

 
Notice that vector T does not depend on Cum3 Yj , Yk , Y when j, k,  are all
different from each other; furthermore, the vector T coincides with the skewness
vector (6.3) up to a constant. One can show that
4
3    
u ⊗ u⊗3  (du) = vec Id ⊗ Id .
Sd−1 d (d + 2)

Therefore (6.4) is 3/d (d + 2) times b (Y), which is given in (6.3).

Example 6.5 (Srivastava’s Measure) Let X be a d-dimensional random vector


with variance matrix . We consider the spectral values of  by    =
Diag (λ1 , . . . , λd ) = Dλ , where  is an orthogonal matrix. A shifted and scaled
vector of X can be defined by

Y = Dλ−1   (X − EX) .

Let the ith coordinate of  i = e 


Y be Y i Y where ei is the ith coordinate axis. Then
the scalar measure of skewness can be

1   3 2
d
b12 (X) = EYi .
d
j =1

Using our notations we have

 −1/2
Y = Dλ−1    1/2 Y = Dλ   Y.
 
The expected value of  Y is E
Y = 0, and Var 
Y = Dλ−1 , (
Y is uncorrelated) hence
the statistic b12 (X) can be written as
   
i = Cum3 e D −1/2   Y
3 = Cum3 Y
EY i i λ
     
 −1/2  ⊗3 ⊗3 −1/2  ⊗3 ⊗ ⊗3 ⊗
= ei Dλ  Cum3  −1/2 X = ei Dλ  κ Y,3 = ei κ  .
Y,3

Hence b12 (X) can be expressed in terms of our skewness vector as

d  
1  ⊗3  −1/2  ⊗3 ⊗ 2
b12 (Y) = ei Dλ  γ1 .
d
j =1

⊗3
It may be remarked that ei is a unit axis vector while  ⊗3 is a rotation in the
3
Euclidean space Rd . The measure b12 (Y) is the norm square of the projection
of κ ⊗
Y,3 to the subspace of R
d 3 and hence it does not contain all the information

contained in γ ⊗
1.
Example 6.6 (Kollo’s Measure) An alternate skewness vector has been defined by
Kollo as
 

b (Y) = E 1d 2 (Y ⊗ Y) ⊗ Y,

which can again be expressed in terms of γ ⊗


1 as follows:
 

b (Y) = 1d 2 ⊗ Id γ ⊗
1. (6.5)

Each entry of κ ⊗ Y,3 contributes to value b (Y). Nevertheless, we note the fact
that not all third-order mixed moments appear in b (Y) individually, so that some
information can be lost. If one compares skewness vector b (Y) to the one in (6.3),

then one can see the difference between the vectors 1d 2 and vec Id . The result is
that here the 0s of vec Id are changed to 1s and we get the linear combinations of
the corresponding values of γ ⊗1 . When d = 2 and


κ⊗
Y,3 = κY1 ,3 , κ(Y1 ,Y1 ,Y2 ) , κ(Y1 ,Y1 ,Y2 ) , κ(Y1 ,Y2 ,Y2 ) , κ(Y1 ,Y1 ,Y2 ) , κ(Y1 ,Y2 ,Y2 ) , κ(Y1 ,Y2 ,Y2 ) , κY2 ,3

= [1, −1]⊗3 = [1, −1, −1, 1, −1, 1, 1, −1] ,



then b (Y) = 0 even for an asymmetric distribution, making it not a valid measure
of symmetry.

6.2 Multivariate Kurtosis of Random Vectors

The kurtosis vector of X will be defined by the fourth-order cumulant vector κ ⊗


Y,4 ,
which is zero for Gaussian random vectors and can be used as the standard. More
precisely the kurtosis of X is measured by the cumulant vector of its standardized
−1/2
version Y =  X (X − μX ).
Definition 6.2 The fourth-order T-cumulant of the standardized X will be called
kurtosis vector of X. It is denoted by $\gamma^{\otimes}_{X,2}$, i.e. $\gamma^{\otimes}_{X,2}=\kappa^{\otimes}_{Y,4}$. The square norm of $\gamma^{\otimes}_{X,2}$,
$$\gamma_{2,d}=\left\|\gamma^{\otimes}_{X,2}\right\|^{2},$$

is the scalar measure of kurtosis that will be called total kurtosis of X.


Now the central moment μ⊗ ⊗
Y,4 and cumulant κ Y,4 are different; hence we need
to consider the expression of the cumulant via moments. Noting Y has unit variance
matrix, i.e. $\kappa^{\otimes}_{Y,2}=\mu^{\otimes}_{Y,2}=\operatorname{vec}I_d$, we have
$$\gamma^{\otimes}_{X,2}=\mu^{\otimes}_{Y,4}-\mathbf{L}^{-1}_{2_2}\mu^{\otimes 2}_{Y,2}=\mu^{\otimes}_{Y,4}-\mathbf{L}^{-1}_{2_2}\operatorname{vec}^{\otimes 2}I_d,\qquad (6.6)$$
where the commutator $\mathbf{L}^{-1}_{2_2}$ is given by (A.3), p. 353. The kurtosis vector $\gamma^{\otimes}_{X,2}$ is 4-symmetric, i.e. $\gamma^{\otimes}_{X,2}\in S_{d,4}$; hence we can use the symmetrizer $S_{d1_q}$ and obtain
$$\gamma^{\otimes}_{X,2}=\mu^{\otimes}_{Y,4}-3\operatorname{vec}^{\otimes 2}I_d.$$

The kurtosis of vector variate X, similarly to the skewness, is invariant under shift
and orthogonal transformations; therefore, it is invariant under affine transforma-
tions.
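Definition 6.2 and formula (6.6) translate directly into a small numerical sketch. The version below assumes a standardized sample `y` of shape (n, d); the flattening of the $d^4$ tensor uses NumPy's row-major order, which may differ from the book's vec convention, and the function names are illustrative.

```python
import numpy as np

def kurtosis_vector(y):
    """Moment-based estimate of the kurtosis vector: the sample fourth moment
    tensor of standardized y minus the sum of the three Kronecker-delta
    pairings (the unsymmetrized form of 3 vec(I)^{ox2})."""
    n, d = y.shape
    m4 = np.einsum('ni,nj,nk,nl->ijkl', y, y, y, y) / n
    eye = np.eye(d)
    pairings = (np.einsum('ij,kl->ijkl', eye, eye)
                + np.einsum('ik,jl->ijkl', eye, eye)
                + np.einsum('il,jk->ijkl', eye, eye))
    return (m4 - pairings).reshape(-1)

def total_kurtosis(y):
    """Total kurtosis gamma_{2,d} = ||gamma_{X,2}||^2."""
    g2 = kurtosis_vector(y)
    return float(g2 @ g2)
```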
As we have seen above, elliptically symmetric distributions are not skewed, but their fourth-order cumulants are not necessarily zero. Therefore, we get even closer to Gaussianity if, besides the skewness, the kurtosis is also 0.
Now we list kurtosis measures for some distributions which have been considered
in the previous chapter.
Case 6.10 (Normal) The kurtosis of normal distribution is 0.
Case 6.11 (Skew Normal) Let X be with skew-normal distribution, X ∈
SNd (μ, , α). We will express the kurtosis in terms of the mean μX . Since


δ= π/2μX , and we recall

2 
VarX = − δδ , and κ ⊗ ⊗4
Y,4 = κ|Z|,4 δ ,
π
therefore, the kurtosis vector is

  ⊗4
 −1/2⊗4 ⊗4  −1/2
γ⊗
X,2 = (2π − 6) − μ μ
X X μX = (2π − 6) − μ μ
X X μX

since
 2
4−π
δ ⊗4 = δ4 = μX 4 ,
2

and
   
π2 π2 8 3 3
κ|Z|,4 = 1− = 2π 1 − = 2π − 6.
4 4 π π π

The total kurtosis has the form


  4
2   −1
γ⊗
X,2 = (2π − 6)2 μX − μX μX μX .

Case 6.12 (CFUSN) The kurtosis of canonical fundamental skew-normal variate,


X ∈ CF U SNd,p (0, , ), is
  2   −1/2⊗4
2 2 2
γ⊗
X,2 = 4 −6 −  ⊗4 i⊗
p,4
π π π
   −1/2 ⊗4
8 3 2
= 1− − 
 i⊗
p,4 .
π π π

Since we have
2 ⊗
VarX = −  , and κX,4 = κ|Z|,4 ⊗r i⊗
p,4 ,
π
and
 2
2 2
κ|Z|,4 = 4 −6 ,
π π

by Lemma 5.3. The total kurtosis


  2   −1 4
2 8 3 2
γ2,d = γ ⊗
X,2 = 1− 1p  −   1p
π π π

follows from above, where  denotes the Hadamard power.


Case 6.13 (Elliptically Symmetric) The kurtosis of an elliptically symmetric ran-
dom variable X ∈ Ed (μ, , g) is
κY1 ,4 −1 ⊗2
γ⊗ ⊗
X,2 = κ Y,4 = L vec Id ,
3 22
(see Theorem 5.2), where κY1 ,4 is given in terms of the cumulants of generating
variate R as follows:
6 1
κY1 ,4 = − κ 4 + 3 2 κR 2 ,2 ,
d 2 (d + 2) R ,1 d
cf. (5.33). The total kurtosis is

γ2,d = d (d + 2) κY21 ,4 ,

see Exercise 6.2, p. 348 for the norm square of L−1 ⊗2


22 vec Id .
Case 6.14 (CFUSS) We allude to the results of the previous chapter for kurtosis
of Canonical Fundamental Skew-Spherical distribution X ∈ CF U SSd,p (0, , );
see (5.40), p. 269 for the variance and (5.44), p. 272 for the fourth-order cumulant.

Case 6.15 (Skew-t) The kurtosis of the distribution Std (μ, , α, p) is based on the
variance (5.56), p. 278 and the fourth-order cumulant in Lemma 5.9.
Case 6.16 (SMSN) Let X have scale mixtures of skew-normal distribution, X ∈
SMSNd (μ, K (η) , α). Then the kurtosis of X is
 −1/2⊗4
2 
γ⊗
X,2 = μξ,2 − μ2ξ δδ κ⊗
X,4 ,
π

where κ ⊗
X,4 is given by (5.62), p. 286, and the cumulants of V can be found in
Lemma 5.1, and ξ is defined by (5.59).
Case 6.17 (SNC) The kurtosis of multivariate skew-normal-Cauchy distribution
SNC(,a) is given by
  2   −1/2 ⊗4
8 2 2 2
γ⊗
X,2 = − μh(|Z|),3 μh(|Z|) + 6 4
μh(|Z|)  − μh(|Z|) δδ  δ ,
π π π


see (5.75) for variance and (5.72) for κX,4 .

Case 6.18 (ML) Let us consider random variate X with multivariate Laplace
distribution, X ∈ MLd (θ , ), then
   
γ⊗
X,2 = 
−1/2⊗4
L−1 ⊗2 −1
22 vec  + L12 ,21 vec  ⊗ θ
⊗2
− 3θ ⊗4 ,

cf. (5.78).
The kurtosis vector γ ⊗
X,2 forms the basis for all multivariate measures of kurtosis
proposed in the literature, as the examples that follow, demonstrate.
Example 6.7 (Mardia) Mardia defined an index of kurtosis as
$$\beta_{2,d}=\mathrm{E}\left(Y^{\top}Y\right)^{2}$$
and this is related to our total kurtosis measure $\gamma_{2,d}$ as follows:
$$\mathrm{E}\left(Y^{\top}Y\right)^{2}=\mathrm{E}\left[Y^{\otimes 2}\right]^{\top}\left[Y^{\otimes 2}\right]=\mu^{\otimes\,\top}_{Y,4}\operatorname{vec}I_{d^{2}},$$
and we express $\mu^{\otimes}_{Y,4}$ by $\gamma^{\otimes}_{X,2}$, hence
$$\beta_{2,d}=\left(\operatorname{vec}I_{d^{2}}\right)^{\top}\gamma^{\otimes}_{X,2}+\left(\operatorname{vec}I_{d^{2}}\right)^{\top}\mathbf{L}^{-1}_{2_2}\operatorname{vec}^{\otimes 2}I_d.$$
Finally, we observe that the constant term does not depend on the distribution. If Y is a standard Gaussian vector, for which the fourth cumulant is zero, then $\beta_{2,d}=\mathrm{E}(Y^{\top}Y)^{2}=d(d+2)$, so that as a side result we have
$$\left(\operatorname{vec}I_{d^{2}}\right)^{\top}\mathbf{L}^{-1}_{2_2}\operatorname{vec}^{\otimes 2}I_d=d(d+2),$$
and we obtain
$$\beta_{2,d}=\left(\operatorname{vec}I_{d^{2}}\right)^{\top}\gamma^{\otimes}_{X,2}+d(d+2).\qquad (6.7)$$

A consequence of this is that Mardia’s measure does not depend on all the entries
of γ ⊗
X,2 , which has ŋd,4 = d(d + 1)(d + 2)(d + 3)/24 distinct elements (see (1.30), p. 15), while β2,d includes only d² elements among them. We note that if X is Gaussian,
then γ ⊗
X,2 = 0.
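Mardia's index is cheap to estimate from a standardized sample; the following sketch (illustrative names, (n, d) standardized array `y` assumed) evaluates it and, via (6.7), its Gaussian benchmark.

```python
import numpy as np

def mardia_kurtosis(y):
    """Mardia's index beta_{2,d} = E (Y'Y)^2, estimated from standardized y (n, d)."""
    r2 = np.sum(y * y, axis=1)
    return float(np.mean(r2 ** 2))

# By (6.7), beta_{2,d} - d*(d+2) equals vec'(I_{d^2}) gamma_{X,2};
# for Gaussian data it should be close to zero.
```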

Example 6.8 (Koziol) Koziol considered the following index of kurtosis. Let 
Y be
an independent copy of Y, then
 4 2
Y Y = E
E  Y⊗4 Y⊗4 = EY⊗4 .

2
= μ⊗
2
Therefore EY⊗4 Y,4 can be considered as the next higher degree
analogue of Mardia’s skewness index β1,d . Specifically

2
μ⊗
Y,4 = γ2,d + 6β2,d − 3d (d + 2) (6.8)

   ⊗
γ ⊗ ⊗2
X,2 vec Id = vec Id 2 γ X,2 , where β2,d is Mardia’s index of kurtosis. Indeed, if
we express the moment μ⊗Y,4 in terms of cumulants, then we obtain

2 2
μ⊗
Y,4 = γ⊗ −1 ⊗2
X,2 + L22 vec Id = γ2,d + 2γ ⊗ −1 ⊗2
X,2 L22 vec Id + 3d (d + 2) ,

 
where we have observed L−1 ⊗2
22 vec Id L−1 ⊗2
22 vec Id = 3d (d + 2), (see Exer-
cise 6.2, p. 348). Now γ ⊗ ⊗ −1 ⊗
2 is 4-symmetric; therefore, γ X,2 L22 = 3γ 2X, , (see
Exercise 6.3, p. 348) hence

2
μ⊗
Y,4 = γ2,d + 6γ ⊗ ⊗2
X,2 vec Id + 3d (d + 2) .

Finally we use (6.7) and obtain (6.8).


Example 6.9 (Cardoso, Móri–Rohatgi–Székely) Cardoso, Móri–Rothagi–Székely
define what we will call the CMSzR kurtosis matrix B (Y) of Y by
 
B (Y) = E YY YY − (d + 2) Id . (6.9)

Then
 
vec B (Y) = Id 2 ⊗ vec Id μ⊗
Y,4 − (d + 2) vec Id ,

which can be expressed in terms of γ ⊗


X,2 as
 
vec B (Y) = Id 2 ⊗ vec Id γ ⊗
X,2 ,

since expressing μ⊗
Y,4 from (6.6) we have

  
vec B (Y) = Id 2 ⊗ vec Id γ ⊗X,2 + L−1
22 vec⊗2
Id − (d + 2) vec Id
    −1 ⊗2
= Id 2 ⊗ vec Id γ ⊗ 
X,2 + Id 2 ⊗ vec Id L22 vec Id − (d + 2) vec Id
 
= Id 2 ⊗ vec Id γ ⊗
X,2 ,

(see Exercise 6.5, p. 348). As in the case of MSzR skewness measure, this measure
does not take into account the contribution of cumulants of the type E (Yr Ys Yt Yu )

where r, s, t, u are different. We note that

trB (Y) = β2,d .

Similar to the discussion regarding skewness, a measure proposed by Malkovich


and Afifi simply provides a different derivation of the total kurtosis in the form
 2  2
B (Y) = sup u⊗4 κ ⊗
Y,4 ≤ κ⊗
Y,4 , (6.10)
u

 2 2  2
since we have u⊗4 κ ⊗ Y,4 = u⊗4 2 κ ⊗
Y,4 cos u⊗4 , κ ⊗
Y,4 =
2  
κ⊗
Y,4 cos2 u⊗4 , κ ⊗ ⊗
Y,4 . If κ Y,4 is rank 1 tensor in symmetric space Sd,4 then
equality occurs in (6.10), similarly to the case of skewness.
Remark
 6.2 It must be noted that the idea used in Eq. (6.4), namely the integral of
u u⊗4 κ ⊗ Y,4 over the unit sphere, will not work, since it is easy to see that this
results in a zero vector. So the extension to vector valued case is not possible.
Example 6.10 (Kollo) Kollo introduces the kurtosis matrix
⎛ ⎞2

d 
d d
B (Y) = EYi Yj YY = E Yi Yj YY = E ⎝ Yi ⎠ YY
i,j =1 i,j =1 j =1
 

= E 1d 2 (Y ⊗ Y) YY .

The vector corresponding to this B (Y) can be expressed in terms of the kurtosis
vector γ ⊗
X,2 as

⎛ ⎞2
d  

vec B (Y) = E ⎝ Yi ⎠ (Y ⊗ Y) = E 1d 2 (Y ⊗ Y) vec YY
j =1
 
 
= EY⊗2 1d 2 (Y ⊗ Y) = Id 2 ⊗ 1d 2 EY⊗4
  

= Id 2 ⊗ 1d 2 γ ⊗ −1 ⊗2
X,2 + L22 vec Id .

We note that if Y = [Y, −Y ], where Y is a non-Gaussian random variate, then


B (Y) = 0, although Y is non-Gaussian.

Example 6.11 (Srivastava) Using the notations of Example 6.5, the average of the
fourth moments of the centered and scaled variable 
Y,

1   4 
d
b2 (X) = EYi ,
d
j =1

provides a kurtosis measure, where

 −1/2
Y = Dλ   Y.

Again we see the average of the fourth moments of the standardized variable 
Y. We
have
    −1/2  ⊗4 ⊗
i = e D
Cum4 Y  κ Y,4 ,
i λ

and
   2  
i4 = Cum4 Y
EY i + 3Var Y
i = Cum4 Y
i + 3

hence

1  ⊗4 ⊗4 ⊗
d
1  ⊗4
−1/2
b2 (Y) = ei  κ Y,4 + 3 = i⊗
d,4 Dλ   γ⊗
X,2 + 3;
d d
j =1

see (5.8) for i⊗


d,4 . Note that this index is not affine invariant (nor is the corresponding
skewness index that has been already discussed).

6.3 Indices Based on Distinct Elements of Cumulant Vectors

The skewness γ ⊗ ⊗ 3 4
1 and kurtosis γ X,2 vectors contain d and d elements, respec-
tively, not all of which are distinct. Just as the covariance matrix of a d-dimensional
vector contains only ŋd,2 = d(d + 1)/2 distinct elements, we also have that
γ⊗ ⊗
X,1 contains ŋd,3 = d(d + 1)(d + 2)/6 distinct entries, while γ X,2 contains
ŋd,4 = d(d + 1)(d + 2)(d + 3)/24 distinct entries at most.
Similar to the fact that there are many applications, and measures, which
consider only the distinct elements of the covariance matrix, it is quite sensible
and reasonable to follow this approach and define skewness and kurtosis measures
based on just the distinct elements of the corresponding cumulant vectors. One
can use the elimination matrix Q+ d,q (see Sect. 1.3.2, p. 16) for separating distinct

entries. The selection of the distinct elements from the vectors γ ⊗ ⊗


X,1 and γ X,2 can
be accomplished via linear transformations.
In the case of a covariance matrix  of a vector X the elimination matrix Q+ d,2
above is a matrix acting on vec  and in the approach defined in this book it holds
that vec  = κ ⊗ X,2 , i.e. the distinct elements of vec  correspond to the distinct
elements of a tensor product X ⊗ X.
We shall use elimination matrices Q+ +
d,3 and Qd,4 for shortening the skewness
γ⊗ ⊗
X,1 and kurtosis γ X,2 vectors, respectively, keeping just the distinct entries in
them. The matrices Q+ +
d,3 and Qd,4 are actually the Moore–Penrose inverses of the
triplication and quadruplication matrices. This gives the cumulant vectors of distinct
elements as γ ⊗ + ⊗ ⊗ + ⊗
X,1,Ð = Qd,3 γ X,1 and γ X,2,Ð = Qd,4 γ X,2 .  
In such a case the distinct element vector of γ ⊗ ⊗
X,1 has dimension dim γ X,1,Ð =
 
d (d + 1) (d + 2) /6. For instance, when d = 2, dim γ ⊗ X,1,Ð is 50% of
 
dim γ ⊗ X,1 , and in general, the percentage of distinct elements decreases in the
proportion (1 + 1/d) (1 + 2/d) /6, getting close to 1/6 for large d. Similarly, the
fraction of distinct elements in γ ⊗ ⊗
X,2,Ð relative to γ X,2 approaches 1/24 for large d,
a significant reduction.
Following the discussion of this section, one can define the indices of total
skewness and total kurtosis by exploiting the square norms of the skewness and
kurtosis vectors containing only the distinct elements, namely

$$\gamma_{1Ð}=\left\|\gamma^{\otimes}_{X,1,Ð}\right\|^{2}=\left\|Q^{+}_{d,3}\gamma^{\otimes}_{1}\right\|^{2}$$
and
$$\gamma_{2Ð}=\left\|\gamma^{\otimes}_{X,2,Ð}\right\|^{2}=\left\|Q^{+}_{d,4}\gamma^{\otimes}_{X,2}\right\|^{2}.$$
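The elimination matrices $Q^{+}_{d,3}$ and $Q^{+}_{d,4}$ are not reproduced here, but for a symmetric tensor the same information is obtained by simply reading off each entry with non-decreasing indices once. A minimal sketch (assuming a symmetric q-way NumPy tensor; the function name is illustrative):

```python
import numpy as np
from itertools import combinations_with_replacement

def distinct_entries(t):
    """Distinct entries of a symmetric q-way tensor t with shape (d,)*q,
    i.e. the entries indexed by i1 <= i2 <= ... <= iq.

    For a symmetric tensor this selection carries the same information as
    applying the elimination matrix Q^+_{d,q} to vec(t)."""
    d, q = t.shape[0], t.ndim
    index_list = combinations_with_replacement(range(d), q)
    return np.array([t[ix] for ix in index_list])
```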

6.4 Testing Multivariate Skewness

In this section we assume that a sample X1 , . . . , Xn is available for a vector variate


X ∈ ℝ^d, which has higher-order moments at least up to order 6, and denote EX and VarX by $\mu$ and $\Sigma$, respectively. We let $Y=\Sigma^{-1/2}(X-\mu)$ and $Y_j=\Sigma^{-1/2}(X_j-\mu)$, such that Y is the standardized variate and $Y_j$ the standardized sample.
In the case when the mean $\mu$ and variance $\Sigma$ are not known, we estimate the standardized variates
$$\widehat{Y}_j=\widehat{\Sigma}^{-1/2}\left(X_j-\widehat{\mu}\right),\qquad (6.11)$$
where
$$\widehat{\mu}=\overline{X},\qquad \widehat{\Sigma}=\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\widehat{\mu}\right)\left(X_i-\widehat{\mu}\right)^{\top},$$

as usual. We observe that


    
3
Yj =  3−1/2  1/2  −1/2 Xj − μ − Xj − μ ,

hence asymptotically Yj and 3 Yj are equivalent in distribution. (The bar over an


expression is used for the sample mean of that expression.) In Sect. 4.7 we have
seen the connection between the expected value of Hermite polynomials at the
standardized value Y of X and cumulants (see Lemma 4.4), i.e.

EH3 (Y) = κ ⊗
Y,3 , EH4 (Y) = κ ⊗
Y,4 ,

cf. (3.36), p. 136.


These equations provide estimators for skewness and kurtosis. We shall take the
sample means of H3 and H4 , respectively. This result allows us to extend Neyman’s
idea of smooth testing for multivariate normality.
The asymptotic normality of the estimators, which we will consider below,
has been well studied, and so we will restrict ourselves to just considering the
asymptotic variances of estimators. Using appropriate variances is very important
in practice.

6.4.1 Estimation of Skewness

First let us start by assuming that the parameters μ and  are known.
Case 6.19 (Skewness: μ and  are Known) The skewness vector has been defined
by γ ⊗ ⊗ ⊗
X,1 = κ Y,3 , i.e. γ X,1 is the third-order cumulant of Y, which is the third-order
central moment of Y. If the parameters μ and  are known, then we obtain the
following estimator for skewness:

γ⊗ ⊗3
X,1 = Yi ,

by the method of moments. It is well known that γ ⊗X,1 is an unbiased estimator of


skewness, and $\overline{\gamma^{\otimes}_{X,1}}-\gamma^{\otimes}_{X,1}$ is asymptotically normal with variance vector
$$\operatorname{Cum}_2\!\left(Y^{\otimes 3}\right)=\kappa^{\otimes}_{Y,6}+\mathbf{L}^{-1}_{1_4,1_2}\left(\kappa^{\otimes}_{Y,4}\otimes\operatorname{vec}I_d\right)+\mathbf{L}^{-1}_{2_3}\kappa^{\otimes 2}_{Y,3}+\mathbf{L}^{-1}_{3_2}\operatorname{vec}^{\otimes 3}I_d-\kappa^{\otimes 2}_{Y,3},\qquad (6.12)$$
and its 6-symmetric version is
$$S_{d1_6}\operatorname{Cum}_2\!\left(Y^{\otimes 3}\right)=\kappa^{\otimes}_{Y,6}+15\left(\kappa^{\otimes}_{Y,4}\otimes\operatorname{vec}I_d\right)+9\,\kappa^{\otimes 2}_{Y,3}+15\operatorname{vec}^{\otimes 3}I_d.$$

The calculation of variance (6.12) is either based on formula (3.45), p. 142, or the
linearization of powers by Hermite polynomials, (see Proposition 4.6, p. 216) and
formula (4.79), p. 231. Now we use the method of Hermite polynomials and obtain
the estimator
 
$$\gamma^{\otimes}_{X,1,H}=\overline{H_3(Y_i)}=\overline{Y_i^{\otimes 3}}-\mathbf{L}^{-1}_{1_1,1_2}\left(\overline{Y_i}\otimes\operatorname{vec}I_d\right).$$

This estimator is also unbiased and asymptotically normal with variance vector
vec Cγ 1 , which is

$$\operatorname{vec}C_{\gamma_1}=\operatorname{Cum}_2\!\left(H_3(Y)\right)=\mathrm{E}\,H_3(Y)^{\otimes 2}-\kappa^{\otimes 2}_{Y,3}.$$
We have seen $\mathrm{E}\,H_3(Y)^{\otimes 2}$ in (4.80), p. 231. Now we apply it to the case when $\kappa^{\otimes}_{Y,1}=0$ and $\kappa^{\otimes}_{Y,2}=\operatorname{vec}I_d$; hence we have
$$\operatorname{vec}C_{\gamma_1}=\kappa^{\otimes}_{Y,6}+\mathbf{L}^{-1}_{2_3}\kappa^{\otimes 2}_{Y,3}+\mathbf{L}^{-1}_{2,H_4}\left(\operatorname{vec}I_d\otimes\kappa^{\otimes}_{Y,4}\right)+\mathbf{M}^{-1}_{m_3}\operatorname{vec}^{\otimes 3}I_d-\kappa^{\otimes 2}_{Y,3},\qquad (6.13)$$

(see Sect. A.2, p. 353 for commutators). The left-hand side of Eq. (6.13)
is symmetric with respect to the symmetrizer Sd 3 12 S⊗2
d13 . Nevertheless we cannot
simplify the formula using the symmetrizer Sd 3 12 S⊗2 d13 for both sides, since for
   
instance, Sd 3 12 Sd13 L2,H 4 vec Id ⊗ κ Y,4 = 9Sd 3 12 Sd13 vec Id ⊗ κ ⊗
⊗2 −1 ⊗ ⊗2
Y,4 . If we
apply the symmetrizer Sd16 to both sides we obtain
 

Sd16 vec Cγ 1 = κ ⊗
Y,6 + 9κ ⊗2
Y,3 + 9 vec Id ⊗ κ ⊗ ⊗3
Y,4 + 6vec Id ,

which can be used directly if d = 1, otherwise it gives information about the number
of terms included in the equation.
In the case when μ and  are estimated, we use standardized Y 3k variates with
estimated parameters; see (6.11). In the sequel we shall use Hermite polynomials
at the observed standardized sample 3 Yk ; therefore, we are interested in their
computational aspect.
We consider the Hermite polynomials Hj of a normal standard vector variate Z
 
and replace Z with 3
Yk , then we take the mean, denoting the result by Hj 3Yk . Now
H1 (Z) = Z; therefore, we have

     
H1 3 3−1/2 Xj − X = 
Yj =  3−1/2 Xj − X = 0.

The second-order Hermite polynomial of Z is

H2 (Z ) = Z⊗2 − vecId ,

therefore,

   −1/2  ⊗2
H2 3 3
Yj =  Xk − X − vec Id = 0

since
  ⊗2  ⊗2  ⊗2  ⊗2
3−1/2 Xk − X = 3−1/2 Xk − X =  3−1/2 3
vec 
 
= vec 3−1/2 
33−1/2 = vec Id . (6.14)

Here we list the results for the Hermite polynomials of order 3 : 6. The arguments
are given in Sect. 6.7.1, Appendix. If we apply the Gram–Charlier approximation
to the distribution of Y, and use the formula (4.79), p. 231, then we obtain the
following estimators of cumulants, i.e. each estimated Hermite polynomial provides
an estimate of the corresponding cumulants as follows:
 
$$\widehat{\kappa}^{\otimes}_{Y,3}=\overline{H_3\!\left(\widehat{Y}_j\right)}=\overline{\widehat{Y}^{\otimes 3}},\qquad (6.15)$$
$$\widehat{\kappa}^{\otimes}_{Y,4}=\overline{H_4\!\left(\widehat{Y}_j\right)}=\overline{\widehat{Y}^{\otimes 4}}-3\operatorname{vec}^{\otimes 2}I_d,\qquad (6.16)$$
$$\widehat{\kappa}^{\otimes}_{Y,5}=\overline{H_5\!\left(\widehat{Y}_j\right)}=\overline{\widehat{Y}^{\otimes 5}}-10\operatorname{vec}I_d\otimes\overline{\widehat{Y}^{\otimes 3}},\qquad (6.17)$$
$$\widehat{\kappa}^{\otimes}_{Y,6}+\mathbf{L}^{-1}_{2_3}\widehat{\kappa}^{\otimes 2}_{Y,3}=\overline{H_6\!\left(\widehat{Y}_j\right)}=\overline{\widehat{Y}^{\otimes 6}}-15\operatorname{vec}I_d\otimes\overline{\widehat{Y}^{\otimes 4}}+30\operatorname{vec}^{\otimes 3}I_d.\qquad (6.18)$$
We can obtain the left-hand sides of these equations in symmetrized form using the symmetrizer. We note that $S_{d1_6}\mathbf{L}^{-1}_{2_3}\widehat{\kappa}^{\otimes 2}_{Y,3}=10\,\widehat{\kappa}^{\otimes 2}_{Y,3}$ in the last equation; therefore,
$$\widehat{\kappa}^{\otimes}_{Y,6}=\overline{\widehat{Y}^{\otimes 6}}-15\operatorname{vec}I_d\otimes\overline{\widehat{Y}^{\otimes 4}}-10\left(\overline{\widehat{Y}^{\otimes 3}}\right)^{\otimes 2}+30\operatorname{vec}^{\otimes 3}I_d.\qquad (6.19)$$
Case 6.20 (Skewness: μ and  are unknown) The skew vector γ ⊗ ⊗


X,1 = κ Y,3 , i.e.
the third-order cumulant of Y, which is actually the third-order central moment, can
be estimated by the method of moments as 3 γ⊗ κ⊗ 3⊗3
X,1 = 3 Y,3 = Y . By the previous
 
calculations, we also have that H3 3
Yj gives the same estimate, i.e.

 
γ⊗
3 3⊗3 3
X,1 = Yi = H3 Yi . (6.20)

Now we have
 ⊗3   ⊗3
3
Y⊗3 3−1/2  1/2  −1/2 Xi − μ − Xi − μ
i =  ,

hence we can assume that Xi has been standardized; therefore, we consider


 ⊗3
3
Y⊗3 = Y i − Y .
i

We are interested in the asymptotic distribution of

√  ⊗3  √ √ √ ⊗2 √ ⊗3
n Yi − Y = nY⊗3 ⊗2
i − 3Yi ⊗ n Y + 3 n Yi ⊗ Y − nY .

We can apply the following stochastic limits: Y⊗2


i = op (1) + vec Id , Y = op (1),
√ D
and the weak limit nYi ∼ Z. Then we use Slutsky’s argument and obtain

√  ⊗3 D √
 √
n Yi − Y ∼ nY⊗3
i − 3vec Id ⊗ Yi = n H3 (Yi ),

when n → ∞. Hence we have proved the following.


Lemma 6.1 Let Yi , i = 1 : n be a sample and assume that the sixth-order moment
of Y exists then

√ ⊗ √  ⊗3 D √
γ X,1 = n Yi − Y
n3 ∼ n H3 (Yi ). (6.21)


Moreover n H3 (Yi ) is asymptotically normal with expected value γ ⊗
X,1 and
variance (6.13).
Remark 6.3 We estimate skewness 3 γ⊗ 3⊗3
X,1 by Y , nevertheless
 
we estimate the
variance Cγ 1 of skewness by the sample variance of H3 3 Yj . The reason is that
variance Cγ 1 corresponds to the variance of H3 (Y). Observe that the sample
 
variance of H3 3 Yj is different from that of 3 Y⊗3
j , although the sample means of
 
them are the same. In practice we calculate both 3
Y⊗3 and H3 3 Yj , then we use 3 Y⊗3
  j j
to estimate skewness and H3 Y 3j to estimate the variance of 3 γ⊗ , respectively.
X,1
We note that variance of 3 Y⊗3
j follows formula (6.13). Results in Sect. 6.6 provide
numerical verification of this by Monte Carlo methods.
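A minimal NumPy sketch of the recipe in Remark 6.3 is given below; it assumes a raw sample `x` of shape (n, d), and all function names are illustrative. The skewness estimate is the mean of $\widehat{Y}^{\otimes 3}$, while the variance estimate is the sample covariance of $H_3(\widehat{Y}_j)$.

```python
import numpy as np

def standardize(x):
    """Sample-standardize x (n, d): subtract the mean and whiten with a
    symmetric inverse square root of the sample covariance (cf. (6.11))."""
    xc = x - x.mean(axis=0)
    sigma = np.cov(x, rowvar=False)
    w, v = np.linalg.eigh(sigma)
    return xc @ (v @ np.diag(w ** -0.5) @ v.T)

def skewness_and_variance(y):
    """Skewness estimate mean(Y^{ox3}) and the sample covariance of H3(Y_j),
    used as the estimate of its asymptotic covariance (Remark 6.3)."""
    n, d = y.shape
    y3 = np.einsum('ni,nj,nk->nijk', y, y, y).reshape(n, -1)      # Y_j^{ox3}
    eye = np.eye(d)
    lin = (np.einsum('ni,jk->nijk', y, eye)
           + np.einsum('nj,ik->nijk', y, eye)
           + np.einsum('nk,ij->nijk', y, eye)).reshape(n, -1)
    h3 = y3 - lin                                                  # H3(Y_j)
    gamma1_hat = y3.mean(axis=0)
    c_gamma1_hat = np.cov(h3, rowvar=False)
    return gamma1_hat, c_gamma1_hat
```

For sample-standardized data the means of $\widehat{Y}^{\otimes 3}$ and of $H_3(\widehat{Y}_j)$ coincide, in line with (6.15), but their sample variances differ, which is the point of Remark 6.3.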

6.4.2 Testing Zero Skewness

Since, in practice neither μ nor  may be known, we consider standardized variables


 
3
Yj , j = 1 : n, and statistics 3 γ⊗ 3 3⊗3
X,1 = H3 Yj = Yj with asymptotic variance
(6.13).
Under hypothesis H0 : γ ⊗ X,1 = 0, we have the asymptotic variance matrix Cγ 1 of

3
γ X,1 , by formula (6.13) as
 
vec Cγ 1 = κ ⊗ −1 ⊗ −1 ⊗3
Y,6 + L2,H 4 vec Id ⊗ κ Y,4 + Mm3 vec Id . (6.22)

We can estimate Cγ 1 using estimated cumulants. It seems natural to standardize


γ⊗
3 ⊗
X,1 and consider the corresponding χ statistics for testing H0 : γ X,1 = 0.
2

Now the construction of 3 γ⊗X,1 shows thatit is 3-symmetric, i.e. the number of

the distinct entries of 3
γ is ŋd,3 = d+2 3 , (see Sect. 1.3.2, p. 16). Therefore,
√X,1 ⊗
although the statistics n3 γ X,1 satisfy asymptotic normality, it will not have full
rank but a rank of ŋd,3 at most. There are several methods for finding a linear
transformation which provides a standardized 3 γ⊗
X,1 . For instance one can use
either eigen-decomposition, singular-value decomposition, or Moore–Penrose (MP)
pseudoinverse of Cγ 1 . We will use the MP inverse of Cγ 1 and denote C+ γ 1 for
+1/2 −1/2
simplicity. In this way Cγ 1 corresponds Cγ 1 if Cγ 1 would have full rank.
The test statistics will be naturally based on the sum of squares of entries of
√ +1/2 ⊗
nCγ 1 3 γ X,1 ,

√ +1/2 ⊗ 2 ⊗
nCγ 1 3
γ X,1 γ X,1 C+
= n3 γ⊗
γ 13X,1 ,

which follows an asymptotic χ 2 distribution with degrees of freedom defined by the


rank of Cγ 1 .
An alternate solution to this problem can be obtained as follows: We apply the
linear transformation Q+
d,3 (see (1.32), p. 21), which eliminates the repeated entries

of 3
γ X,1 , namely

γ⊗ +
γ⊗
X,1,Ð = Qd,33X,1 .

γ⊗
Then variance of 3 γ⊗
X,1,Ð , i.e. the variance of distinct entries of estimator 3X,1 is
calculated by
+
Cγ 1 ,Ð = Q+
d,3 Cγ 1 Qd,3 ,

hence in the case of full rank of Cγ 1 ,Ð the norm square

√ −1/2 ⊗ 2
nCγ ,Ð γ X,1,Ð = nγ ⊗ −1 ⊗
X,1,Ð Cγ ,Ð γ X,1,Ð
1 1

is appropriate for test statistics being asymptotically χ 2 distributed with degrees of


freedom ŋd,3 . In practice one can check this easily.
In Example 6.2, p. 317 we have considered MSzR skewness vector which has
been denoted by b (Y). Actually it is connected to our skewness vector by the
equation
 
b (Y) = vec Id ⊗ Id γ ⊗
X,1 .

Therefore, we can estimate it using the standardized 3


Yj
  ⊗
3
b = vec Id ⊗ Id 3
γ X,1 ,

and with asymptotic variance


 
Cb = vec Id ⊗ Id Cγ 1 ((vec Id ) ⊗ Id ) .

Since all the entries of 3


b are distinct, we can assume that Cb is positive definite and
$\widetilde{b}=n\,\widehat{b}^{\top}C_{\widehat{b}}^{-1}\,\widehat{b}$ is asymptotically $\chi^2$ distributed with d degrees of freedom.
In particular cases, one may be able to get more efficient methods for the
calculation of the test statistics.
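A generic numerical sketch of this construction, assuming the estimates from the previous subsection are available (illustrative names; the Moore–Penrose inverse and numerical rank are taken from NumPy):

```python
import numpy as np
from scipy.stats import chi2

def skewness_chi2_test(gamma1_hat, c_gamma1_hat, n, tol=1e-8):
    """Test H0: gamma_{X,1} = 0 via  n * gamma1_hat' C^+ gamma1_hat,
    with C^+ the Moore-Penrose inverse of the estimated covariance and the
    chi-square degrees of freedom taken as the numerical rank of C."""
    c_pinv = np.linalg.pinv(c_gamma1_hat, rcond=tol)
    stat = float(n * gamma1_hat @ c_pinv @ gamma1_hat)
    df = int(np.linalg.matrix_rank(c_gamma1_hat, tol=tol))
    return stat, df, chi2.sf(stat, df)
```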

6.4.2.1 Testing Gaussianity by Skewness

If Gaussianity is assumed, then Y ∈ N (0, Id ) hence the cumulants κ ⊗ ⊗


Y,6 and κ Y,4
are zero in (6.22) and the variance matrix Cγ 1 is reduced to

vec Cγ 1 = M−1 ⊗3
m3 vec Id = 6vec Sd13 ,

d+2
i.e. Cγ 1 = 6Sd13 . The MP inverse of Cγ 1 is 1/6Sd13 with rank ŋd,3 = 3 ; see
(4.68), p. 221. Hence we obtain Mardia’s statistics

⊗ n ⊗ 2
γ X,1 C+
n3 γ⊗
γ 13X,1 = 3
γ ,
6 X,1

for checking normality based on skewness. Let us recall that 3γ⊗X,1 is calculated by
the formula (6.20).
In fact, the covariance vector $\operatorname{vec}C_{\gamma_1}$ is the second-order cumulant of $H_3(Y)$ where $Y\in N(0,I_d)$. The entries of $\operatorname{vec}C_{\gamma_1}$ can be organized in three groups: first $\operatorname{Var}H_3(Y_k)$, $k=1:d$, then $\operatorname{Var}H_2(Y_j)H_1(Y_k)$, $j\neq k$, and finally $\operatorname{Var}H_1(Y_j)H_1(Y_k)H_1(Y_m)$, where $j$, $k$, and $m$ are distinct. The following table shows the values and the numbers of the entries in $C_{\gamma_1}$:

Var                                        # of terms        # of distinct
VarH₃(Y_k) = 6                             d                 d
VarH₂(Y_j)H₁(Y_k) = 2                      3d(d − 1)         d(d − 1)
VarH₁(Y_j)H₁(Y_k)H₁(Y_m) = 1               d(d − 1)(d − 2)   d(d − 1)(d − 2)/6

Therefore, we obtain ŋ_{d,3} distinct values. In the sum of squares of the test statistics we divide each entry $H_3(Y_k)$ by 6 and each entry $H_2(Y_j)H_1(Y_k)$ by 2; among the $3d(d-1)$ entries of the latter type there are $d(d-1)$ distinct ones, so we have to divide by 3 in addition to 2. Finally, there are $d(d-1)(d-2)/6$ distinct entries among the $d(d-1)(d-2)$ entries of type $H_1(Y_j)H_1(Y_k)H_1(Y_m)$, with variance 1. Therefore the test statistic $n\|\widehat{\gamma}^{\otimes}_{X,1}\|^{2}/6$ is asymptotically $\chi^2$ distributed with ŋ_{d,3} degrees of freedom.
Under Gaussianity we have seen that Cγ 1 = 6Sd13 so that the asymptotic
variance for MRSz statistics 3
b is
   
C3b = vec Id ⊗ Id Cγ 1 ((vec Id ) ⊗ Id ) = 6 vec Id ⊗ Id Sd13 ((vec Id ) ⊗ Id ) .

We show that

C3b = 2 (d + 2) Id .

Indeed, let ei be the ith column of Id , we consider the product


⎛ ⎞ ⎛ ⎞ ⎛ ⎞

d 
d 
d d 
   
⎝ ⊗2 ⊗2 ⊗2 ⊗2
ek ⊗ Id ⎠ Sd13 ⎝ ej ⊗ ei ⎠ = ⎝ ek ⊗ Id Sd13 ej ⊗ ei ⎠ .
k=1 j =1 j =1 k=1

If j = i, then e⊗2 ⊗3
j ⊗ ei = ej which is invariant under Sd13 ; therefore, the result of
# ⊗2
the product with ek ⊗ Id is 1. If j = i then there are two permutations (among
6) which do not change e⊗2
j ⊗ ei . This latter one changes the order of the first two
components hence the result is 2/6. This latter one occurs d −1 times. Summarizing
this we obtain
  1 1
vec Id ⊗ Id Sd13 ((vec Id ) ⊗ Id ) = (2 (d − 1) + 6) Id = 2 (d + 2) Id ,
6 6

and conclude that under the hypothesis of Gaussianity the statistics 3 b =


3  −13 3 2
nb C3b b = n/2 (d + 2) b is asymptotically χ distributed with degrees of
2

freedom d.
The same result can be reached if we consider the form (6.3) and we use Hermite
polynomials with Gaussian entries, such that
⎡ ⎤
     
vec Id ⊗ Id H3 (Y) = ⎣ H3 Yj , Yj , Yk ⎦
j k

is a d-dimensional vector with orthogonal entries. Recall that Y ∈ N (0, Id ). The


covariance matrix Cb is diagonal
⎡ ⎤
     
vec Id ⊗ Id CH3 (vec Id ⊗ Id ) = ⎣Cov H3 Yj , Yj , Yk , H3 (Ym , Ym , Y )⎦
j m
k,
⎡ ⎤
  
= Diag ⎣ VarH3 Yj , Yj , Yk ⎦
j k
 
= Diag [2 (d − 1) + 6]k = Diag [2 (d + 2)]k

and TrCb = 2d (d + 2), since


! "
  2j =
 k
VarH3 Yj , Yj , Yk = .
6j =k

The asymptotic distribution of b (Y)2 /2 (d + 2) is χ 2 with degrees of freedom d.


Now we summarize these results in the following:
Lemma 6.2 Under the hypothesis of Gaussianity, Mardia's statistic $n\|\widehat{\gamma}^{\otimes}_{X,1}\|^{2}/6$ has asymptotically a $\chi^2$ distribution with ŋ_{d,3} degrees of freedom. The MRSz statistic $n\|\widehat{b}\|^{2}/(2(d+2))$ has asymptotically a $\chi^2$ distribution with d degrees of freedom.
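Both tests of Lemma 6.2 are straightforward to code. The sketch below assumes a standardized (n, d) array `y` and uses SciPy only for the chi-square tail probability; the function name is illustrative.

```python
import numpy as np
from math import comb
from scipy.stats import chi2

def gaussianity_skewness_tests(y):
    """Mardia and MRSz skewness tests of Lemma 6.2 for standardized y (n, d)."""
    n, d = y.shape
    # Mardia: n * ||gamma1_hat||^2 / 6  ~  chi2 with comb(d+2, 3) df
    g = y @ y.T
    stat_mardia = n * np.mean(g ** 3) / 6.0
    df_mardia = comb(d + 2, 3)
    # MRSz: n * ||b_hat||^2 / (2 (d+2))  ~  chi2 with d df
    r2 = np.sum(y * y, axis=1)
    b_hat = (r2[:, None] * y).mean(axis=0)
    stat_mrsz = n * float(b_hat @ b_hat) / (2.0 * (d + 2))
    return (stat_mardia, chi2.sf(stat_mardia, df_mardia),
            stat_mrsz, chi2.sf(stat_mrsz, d))
```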

6.4.2.2 Testing Elliptical Symmetry by Skewness

Let us consider the hypothesis H0 : X ∈ Ed (μ, , g), with our earlier notation

X = μ +  1/2 W,

where W is spherically distributed. In this framework Y = (2ν1 )−1/2 W (ν1 is the


generator moment of W, see (5.17), p. 255), since we require that Y be standardized.
We will use the following result

 
1
Cum2m √ κm L−1
W = m2 vec
⊗m
Id
2ν1

on the cumulants of symmetric variates W in terms of cumulant parameters  κm , see


(5.34), p. 263. So under H0 we have that all odd cumulants are zero; therefore,
 
κ3 L−1
vec Cγ 1 =  ⊗3
32 vec Id +  κ2 L−1 −1 ⊗2 −1
2,H 4 vec Id ⊗ L22 vec Id + Mm3 vec Id
⊗3

  
=  κ3 L−1
32 + 
κ 2 L−1
2,H 4 Id 2 ⊗ L−1
22 vec⊗3 Id + 6vec Sd13
 
κ3 Sd16 vec⊗3 Id + 3!!
= 5!! κ2 L−1 ⊗3
2,H 4 Id 2 ⊗ Sd14 vec Id + 3!vec Sd13 .

The sum of the entries of vec Cγ 1


 

1d 6 vec Cγ 1 = 15
κ6 + 3!!32
κ4 + 3! d 3

γ⊗
results in the asymptotic variance of the sum of entries of 3X,1 . Hence we can use
#d 3 ⊗
the asymptotic normality of 1 3 γ X,1 for testing.
In case of MRSz index let us denote the variance matrix by Cγ 1 ,M , which is
 ⊗2
vec Cγ 1 ,M = vec Id ⊗ Id vec Cγ 1 ,

therefore

Cγ 1 ,M = d (d + 2) ((d + 4)
κ6 + (d + 8)
κ4 + 2) Id ,

hence obtain that

n 3
2
b /d (d + 2) ((d + 4)
κ6 + (d + 8)
κ4 + 2) (6.23)

is χ 2 distributed with degrees of freedom d asymptotically.

6.5 Testing Multivariate Kurtosis

We follow the steps of the methods we applied to skewness.



6.5.1 Estimation of Kurtosis


 
We estimate kurtosis γ ⊗ X,2 by H4 Yj , see Sect. 6.4.1, which actually is estimation
by moments.
Asymptotic variances will be constructed using the results of Sect. 4.7, p. 226
similarly to the case of skewness.
Case 6.21 (Kurtosis: μ and  are known) If parameters μ and  are known, then
we estimate the kurtosis either by the method of moments or by the method of
Hermite polynomials. We can standardize X and use Y for the estimation. Let us
express κ ⊗
Y,4 in terms of moments

κ⊗ ⊗ −1 ⊗2
Y,4 = μY,4 − L22 vec Id ,

(see (3.36)), hence the method of moments gives the following estimator:

γ⊗ ⊗4 −1 ⊗2
X,2 = Yi − L22 vec Id .

The variance vector of estimator γ ⊗


X,2 is given by
   
Cum2 Y⊗4 = κ ⊗ −1 ⊗
Y,8 + L12 ,16 vec Id ⊗ κ Y,6
 
+L−1 ⊗ ⊗ −1 ⊗2
13 ,15 κ Y,3 ⊗ κ Y,5 + L24 κ Y,4 (6.24)
   
+L−1 ⊗2 −1 ⊗2 ⊗
12 ,32 vec Id ⊗ κ Y,3 + L22 ,14 vec Id ⊗ κ Y,4
 ⊗2
+L−1 ⊗4 ⊗ −1 ⊗2
42 vec Id − κ Y,4 + L22 vec Id ,

and its symmetric version is


 
Sd18 Cum2 Y⊗4 = κ ⊗ ⊗ ⊗ ⊗ ⊗2
Y,8 + 28vec Id ⊗ κ Y,6 + 56κ Y,3 ⊗ κ Y,5 + 34κ Y,4

+204vec⊗2 Id ⊗ κ ⊗
Y,4

+280vec Id ⊗ κ ⊗2 ⊗4
Y,3 + 96vec Id .

Now we use the Hermite polynomials’ approach and obtain estimator


 
γ⊗
 ⊗4 −1 ⊗2
X,2 = H4 (Yi ) = Yi − L2,H 2 vec Id ⊗ Yi + L−1 ⊗2
22 vec Id ,

with variance vector



 
Cum2 (H4 (Y)) = κ ⊗
Y,8 + L −1
13 ,15 κ ⊗
Y,3 ⊗ κ ⊗ −1 ⊗2
Y,5 + L24 κ Y,4
  
+L−12,H 6 vec Id ⊗ κ ⊗
Y,6 + L −1 ⊗2
κ
23 Y,3 (6.25)
 
+L−1 ⊗2 ⊗ −1 ⊗4
2,2,H 4 vec Id ⊗ κ Y,4 + Mm4 vec Id − κ Y,4 ,
⊗2

(see 4.81, p. 231 for cumulants, and Sect. A.2, p. A.2 for commutators). The 8-
symmetric version of Cum2 (H4 (Y)) is


Sd18 Cum2 (H4 (Y)) = κ ⊗ ⊗ ⊗2 ⊗ ⊗
Y,8 + 16κ Y,6 ⊗ vec Id + 56κ Y,5 ⊗ κ Y,3

+ 34κ ⊗2 ⊗ ⊗2
Y,4 + 36κ Y,4 ⊗ vec Id

+ 160κ ⊗2 ⊗4
Y,3 ⊗ vecId + 24vec Id .

Note again that both sides of (6.25) is symmetric with respect to Sd 4 12 S⊗2
d14 , but Sd18 ;
therefore, symmetric version is only useful for scalar-valued cases.
In the case when parameters μ and  are not known then we use estimated
parameters and sample 3 γ⊗
Yj . We have seen that estimator 3X,2 is simplified

 
Yj = 3
H4 3 Y⊗4 − 3vec⊗2 Id .

Now we can repeat Lemma 6.1 for this case.


Lemma 6.3 Let Yi , i = 1 : n be a sample and assume that the eighth-order
moment of Y exists then

√  ⊗4 D √  √ 
n Yi − Y ∼ n H4 (Yi ) − L−1
13 ,11 κ ⊗
Y,3 ⊗ n H 1 (Y i ) ,

where the commutator L−1 γ⊗


13 ,11 is given by (A.4), p. 353. Moreover 3X,2 is asymptoti-

cally normal with expected value γ X,2 and variance
 
vec Cγ 2 = κ ⊗
Y,8 + L −1
13 ,15 κ ⊗
Y,3 ⊗ κ ⊗ −1 ⊗2
Y,5 + L24 κ Y,4
  
+L−1 ⊗
2,H 6 vec Id ⊗ κ Y,6 + L23 κ Y,3
−1 ⊗2
(6.26)
 
+L−1 ⊗2 ⊗ −1 ⊗4
2,2,H 4 vec Id ⊗ κ Y,4 + Mm4 vec Id − κ Y,4
⊗2

 
+L−1⊗2
13 ,11 κ ⊗2
Y,3 ⊗ vec Id .

Proof The kurtosis is affine invariant, hence we can assume μ = 0 and  = Id ,


hence 3
Yi = Yi − Y. We consider
√  ⊗4  √  
n 3Y − 3vec⊗2 Id = nH4 3 Yj
 
 √
 ⊗4  ⊗2
⊗2
= n Yi − Y − 6vec Id ⊗ Yi − Y + 3vec Id

 √ √ √ ⊗2
= n H4 (Yi ) − 4H3 (Yi ) ⊗ n Y + 6 nY⊗2
i ⊗Y
√ ⊗3 √ ⊗4 ⊗2
−4 n Yi ⊗ Y + n Y − 6vec Id ⊗ Y
D √ √
∼ n H4 (Yi ) − 4κ ⊗
Y,3 ⊗ n Y.

⊗k
We have stochastic convergences Y = op (1), k ≥ 1 and H3 (Yi ) → EH3 (Y) =

κ⊗
D
(see Lemma 4.4, p. 231), and n Yi ∼ Z. We proceed with using Slutsky’s
Y,3 ,
argument, when n → ∞, and obtain
√  ⊗4 D√ √
n 3
Y − 3vec⊗2 Id ∼ n H4 (Yi ) − L−1 ⊗
13 ,11 κ Y,3 ⊗ n Y,

with variance

 
vec Cγ 2 = Cum2 H4 (Y) − 4κ ⊗ ⊗2
Y,3 ⊗ H1 (Y) = Cum2 (H4 (Y))+16κ Y,3 ⊗vec Id ,

where Cum2 (H4 (Y)) has been given by (6.25). The 8-symmetric version is


vec Cγ 2 = κ ⊗ ⊗ ⊗2 ⊗ ⊗
Y,8 + 16κ Y,6 ⊗ κ Y,2 + 56κ Y,5 ⊗ κ Y,3
 
+33κ ⊗2
Y,4 + 36 κ ⊗
Y,4 ⊗ κ ⊗2
Y,2 (6.27)

+160κ ⊗2 ⊗ ⊗4 ⊗2
Y,3 ⊗ κ Y,2 + 24κ Y,2 + 16κ Y,3 ⊗ vec Id .

Case 6.22 (Kurtosis: μ and  are unknown) Let us assume the necessary moments
exist, then the estimator
   ⊗4
γ⊗
3 κ⊗ 3 3⊗4 − L−1 vec⊗2 Id = 3
Y − 3vec⊗2 Id ,
X,2 = 3Y,4 = H4 Yj = Y 22 (6.28)

(see A.3) for the commutator L−1 γ⊗


22 ). Estimator 3X,2 is asymptotically unbiased and
is asymptotically normal with variance Cγ 2 (6.26) and see (6.27) for the symmetric
version of (6.26).
Remark 6.4 In practice neither μ nor  is known; therefore, we consider sample-
standardized variables 3 γ⊗
Yj , j = 1 : n, and statistics 3X,2 with asymptotic variance

(6.26) for the estimator of kurtosis γ ⊗


X,2 . At the same time the sample variance of

   
Yj = 3
H4 3 Y⊗4 −1 3⊗2 + L−1 vec⊗2 Id ,
j − L12 ,21 vec Id ⊗ Yj 22

is applied for the estimator of variance Cγ 2 .
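A sketch analogous to the skewness case, assuming a standardized (n, d) array `y` with d small (the matrices involved are of order $d^4$); all names are illustrative.

```python
import numpy as np

def kurtosis_and_variance(y):
    """Kurtosis estimate (6.28) and the sample covariance of H4(Y_j), which is
    used to estimate the asymptotic covariance C_{gamma_2} (Remark 6.4)."""
    n, d = y.shape
    eye = np.eye(d)
    y4 = np.einsum('ni,nj,nk,nl->nijkl', y, y, y, y).reshape(n, -1)
    pair = (np.einsum('ij,kl->ijkl', eye, eye)
            + np.einsum('ik,jl->ijkl', eye, eye)
            + np.einsum('il,jk->ijkl', eye, eye)).reshape(-1)
    y2 = np.einsum('ni,nj->nij', y, y)
    mix = (np.einsum('nij,kl->nijkl', y2, eye) + np.einsum('nik,jl->nijkl', y2, eye)
           + np.einsum('nil,jk->nijkl', y2, eye) + np.einsum('njk,il->nijkl', y2, eye)
           + np.einsum('njl,ik->nijkl', y2, eye) + np.einsum('nkl,ij->nijkl', y2, eye)
           ).reshape(n, -1)
    gamma2_hat = y4.mean(axis=0) - pair          # estimator (6.28)
    h4 = y4 - mix + pair                         # H4(Y_j)
    c_gamma2_hat = np.cov(h4, rowvar=False)      # (d^4 x d^4); keep d small
    return gamma2_hat, c_gamma2_hat
```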

6.5.2 Testing Zero Kurtosis

Under hypothesis H0 : γ ⊗ X,2 = 0, we have the asymptotic variance matrix Cγ 2 of



3
γ X,2 , from formula (6.26) as
    
vec Cγ 2 = κ ⊗
Y,8 + L−1
13 ,15 κ ⊗
Y,3 ⊗ κ ⊗
Y,5 + L−1
2,H 6 vec Id ⊗ κ ⊗
Y,6 + L−1 ⊗2
κ
23 Y,3

+ L−1⊗2 ⊗2 −1 ⊗4
13 ,11 κ Y,3 ⊗ vec Id + Mm4 vec Id .

Therefore using the MP inverse of covariance matrix C+ γ 2 , we conclude that


√ +1/2 ⊗
statistics nCγ 2 3 γ X,2 is asymptotically standard normal. The test statistics will
√ +1/2 ⊗
be naturally based on the sum of squares of some entries of nCγ 2 3 γ X,1 ,

√ +1/2 ⊗ 2 ⊗
nCγ 2 3
γ X,2 γ X,1 C+
= n3 γ⊗
γ 23X,2 ,

⊗
which follows χ 2 distribution asymptotically. One cannot use n3 γ X,1 C+ γ⊗
γ 23
⊗ d+3
X,2

directly since 3
γ X,4 is 4-symmetric, i.e. the number of distinct entries is ŋd,4 = 4 ,
(see Sect. 1.30, p. 15). Therefore the degree of the distribution χ 2 is ŋd,4 at most.

6.5.2.1 Testing Gaussianity

If Y ∈ N (0, Id ) then cumulants κ ⊗


Y,r , r > 2 are zero in (6.26); therefore, variance
matrix Cγ 2 is reduced to

vec Cγ 2 = M−1 ⊗4
m4 vec Id = 4!vecSd14 ,

i.e. variance matrix Cγ 2 = 4!Sd14 and the MP inverse of Cγ 2 is Sd14 /4! (see (4.68),
p. 221), and the rank of Sd14 is ŋd,4 . Hence we obtain statistics

⊗ n 2
γ X,2 C+
n3 γ⊗
γ 23X,1 = γ⊗
3
24 X,2

γ⊗
for checking normality based on total kurtosis. Let us recall that 3X,2 is calculated
by the formula (6.28).
Mardia’s measure of kurtosis is equivalent to
 
2,d = E [Y]⊗4 vec Id 2 − d (d + 2) = vec Id 2  γ ⊗ ,
β X,2

2,d = 0, and
therefore we can use the estimator of kurtosis vector for checking H0 : β
obtain estimator
  ⊗
32,d = vec Id 2  3
β γ X,2 ,

for Mardia’s measure.


 Under  the assumption of the Gaussianity of Y, the expected
value of estimator vec Id 2 3γ⊗X,2 will be zero with asymptotic variance

σβ32 = 8d (d + 2) .
2,d

Indeed
 ⊗2  ⊗4  ⊗2
Sd14 vec Id 2 = Sd14 ej ⊗ ek = ej + Sd14 ej ⊗ ek
j,k, j =k j =k
 8   ⊗2  ⊗2 
= e⊗4
j + e j ⊗ e ⊗2
k + e j ⊗ e ⊗2
k ⊗ e j + e j ⊗ e k ,
4!
j =k j =k

32,d follows
hence the variance of β
  ⊗   
Var vec Id 2 3γ X,2 = 4!vec Id 2 Sd14 vec Id 2 = 4!d + 8 d 2 − d = 8d (d + 2) ,

√ √
We conclude that nβ 2,d / 8d (d + 2) is asymptotically N (0, 1).
Let us consider CMSzR kurtosis matrix B (Y) of Y, cf. (6.9). By our treatment
B (Y) is expressed in terms of κ ⊗
4
 
vec B (Y) = Id 2 ⊗ vec Id κ ⊗
4

with ŋd,2 = d (d + 1) /2 distinct entry. It follows directly that we can use the
estimator
    ⊗
vec B 3
Y = Id 2 ⊗ vec Id 3
γ X,2 ,

for kurtosis matrix B (Y). Under the hypothesis of Gaussianity, we have the variance
of this estimator
     
VarvecB 3
Y = 4! Id 2 ⊗ vec Id Sd14 Id 2 ⊗ (vec Id )

which follows from the general result with rank ŋd,2 . One can use this result for
testing Gaussianity as well.
The trace of B (Y), which is used as a measure for kurtosis, corresponds to
Mardia’s index, hence we neglect it.
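The normal-theory version of Mardia's kurtosis test derived above is a one-line computation; a sketch (assuming a standardized (n, d) array `y`; names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def mardia_kurtosis_gauss_test(y):
    """Under Gaussianity, sqrt(n)*(beta2_hat - d(d+2)) / sqrt(8 d (d+2)) ~ N(0,1)."""
    n, d = y.shape
    r2 = np.sum(y * y, axis=1)
    beta2_hat = np.mean(r2 ** 2)
    z = np.sqrt(n) * (beta2_hat - d * (d + 2)) / np.sqrt(8.0 * d * (d + 2))
    return float(z), float(2 * norm.sf(abs(z)))
```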

6.5.2.2 Testing Alternate Symmetry

Distributions with zero skewness are not necessarily normal, for instance they can
be symmetric as well. We have that all odd cumulants are zero for an elliptically
symmetric variate X ∈ Ed (μ, , g); therefore, the variance vector becomes the
following:
 
Cγ 2 = κ ⊗ −1 ⊗2 −1 ⊗
Y,8 + L24 κ Y,4 + L2,H 6 vec Id ⊗ κ Y,6
 
+ L−1 ⊗2 ⊗ −1 ⊗4 ⊗2
2,2,H 4 vec Id ⊗ κ Y,4 + Mm4 vec Id − κ Y,4 .

Let us recall that the cumulants of Y are given in Sect. 6.4.2.2. In addition, if we
assume that H0 : κ ⊗
Y,4 = 0 is true; then we obtain
 
Cγ 2 = κ ⊗ −1 ⊗
Y,8 + L2,H 6 vec Id ⊗ κ Y,6 + Mm4 vec Id
−1 ⊗4

  
=  κ4 L−1
42 +  κ3 L−1
2,H 6 Id 2 ⊗ L32
−1
vec⊗4 Id + 4!vec Sd14
  
= 7!! κ4 Sd18 + 5!!κ3 L−1
2,H 6 dI 2 ⊗ Sd1 6 vec⊗4 Id + 4!vec Sd14 ,

where κj denotes the cumulant parameter of Y, and the commutators are given in
Sect. A.2, p. 353.
γ⊗
In this case the sum of the entries of estimator 3X,1 has asymptotic variance
 

1d 8 Cγ 2 = 7!!
κ4 + 5!!42
κ3 + 4! d 4 ,

and asymptotic normality correspondingly.

6.6 A Simulation Study

In this section some numerical results will be provided in support of the theoretical
results. Monte Carlo experiments will be performed in order to compare estimations

γ X,1 and 3γ X,1 of skewness. We draw N = 1000 samples and each sample has
n = 1000 observations. We estimate parameters (below) for each sample and then
we take their average.
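A minimal sketch of such an experiment for the univariate normal case (the seed, variable names, and the use of NumPy's default generator are illustrative choices, not taken from the text); with N = n = 1000 the four returned averages should be close to 0, 6, 0, and 24.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 1000          # number of replications and sample size, as in the text

def simulate_normal_1d():
    """Average the H3/H4-based estimates of skewness, kurtosis and of their
    variances over N standard-normal samples of size n."""
    skew, kurt, v_skew, v_kurt = [], [], [], []
    for _ in range(N):
        y = rng.standard_normal(n)
        y = (y - y.mean()) / y.std(ddof=1)       # sample-standardized
        h3 = y ** 3 - 3 * y                      # H3
        h4 = y ** 4 - 6 * y ** 2 + 3             # H4
        skew.append(np.mean(y ** 3))
        kurt.append(np.mean(y ** 4) - 3)
        v_skew.append(np.var(h3, ddof=1))
        v_kurt.append(np.var(h4, ddof=1))
    return np.mean(skew), np.mean(v_skew), np.mean(kurt), np.mean(v_kurt)
```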

Let us start with Mardia’s statistics, and use the 1D case for simplicity.
Case 6.23 (Normal Distribution) Let us suppose that Y has normal distribution,
$Y\in N(0,1)$. Mardia suggests the third-order moment $\overline{Y^{3}}$ for the estimation of skewness, and he uses 6 for the variance of the estimator (see [Mar70], (2.25)). The proof of this result might be the following: we know that $\operatorname{Var}H_3(Y)=6$ and $\overline{H_3(Y)}=\overline{Y^{3}}$ (see (6.15)); therefore, we conclude that the variance $n\operatorname{Var}\overline{Y^{3}}$ is close to 6. Our general formula (6.13) for the variance (corresponding to the estimation of skewness by the Hermite polynomial $H_3$ and the usage of the true parameters) reduces to 6 for the normal distribution, since the cumulants of order greater than 2 are zero. Formula (3.25) shows the variance for kurtosis, which is 24. Now $Y\in N(0,1)$, $H_3(Y)=Y^{3}-3Y$, and $H_4(Y)=Y^{4}-6Y^{2}+3$. We estimate $\gamma_{X,1}$ and $\gamma_{X,2}$ by $\overline{H_3(Y)}$ and $\overline{H_4(Y)}$, respectively. We can also estimate $\gamma_{X,1}$ and $\gamma_{X,2}$ using $\widehat{Y}$ by $\overline{H_3(\widehat{Y})}=\overline{\widehat{Y}^{3}}$ and $\overline{H_4(\widehat{Y})}=\overline{\widehat{Y}^{4}}-3$, respectively, where $\widehat{Y}$ denotes the standardized variate using estimated parameters.

 
                           Estimator $H_k(Y)$    Estimator $H_k(\widehat{Y})$
Skewness                   7 · 10⁻⁵              7.7 · 10⁻⁵
Variance for skewness      5.9165                5.9207
Kurtosis                   −0.0002               −0.0013
Variance for kurtosis      24.3228               24.23

Case 6.24 (Skew Normal, 1D) The scalar-valued (1D) skew-normal variable $X\in SN_d(\mu,\Sigma,\alpha)$ has been chosen with parameters $\mu=0.7136$, $\alpha=2$ ($\delta=0.8944$), and $\Sigma=1$ (we put a correlation "matrix" for this latter one). From a sample of X we calculated a sample of standardized X, denoted by Y, assuming the parameter $\mu$ and the variance 0.4907 are known. A "sample" of standardized X denoted by $\widehat{Y}$ is also calculated using the estimated mean and variance.

             True parameters    Estimate by X              Estimate by Y            Estimate by Ŷ
Mean         EX = 0.7136        $\overline{X}$ = 0.7140    $\overline{Y}$ = 0.0005  $\overline{\widehat{Y}}$ = 0.0000
Variance     Σ = 0.4907         $\widehat{\Sigma}$ = 0.4916   1.0019                1.0000


We estimated skewness, kurtosis, and the sixth-order cumulant (using formula


(6.19) for this latter one), respectively. The sixth-order cumulant is included in the
formulae of variance of skewness.

                  True parameter              Estimate by Y    Estimate by Ŷ
Skewness          γ_{X,1} = 0.4538            0.4547           0.4498
Kurtosis          γ_{X,2} = 0.3051            0.2982           0.2785
Sixth cumulant    κ_{Y,6} = −0.6615           −0.6480          −0.6557

The variance $\widehat{C}_{\gamma_1}$ of the estimated skewness is given by formula (6.13):

            True value $C_{\gamma_1}$    $\widehat{C}_{\gamma_1}$
Variance    9.9376                       9.9317

Case 6.25 (Skew Normal, 3D) We consider $X\in SN_3(\mu,\Sigma,\alpha)$, where $\alpha=[2,10,1]^{\top}$, and
$$\Sigma=\begin{bmatrix}1 & 0.6325 & 0.7071\\ 0.6325 & 1 & 0.8944\\ 0.7071 & 0.8944 & 1\end{bmatrix};$$
as a consequence of these parameters, $\delta=[0.7325,0.9862,0.9212]^{\top}$. We calculated Y and $\widehat{Y}$ with respect to the true and estimated parameters, respectively. The means of these samples are

μ = EX       $\overline{X}$    $\overline{Y}$    $\overline{\widehat{Y}}$
0.58447      0.58338           −0.0010811        0
0.78688      0.78639           −0.00011134       0
0.73505      0.73407           −0.0012536        0

 
Skewness is estimated by γ ⊗ ⊗3 −1
X,1 = H3 (Yi ) = Yi − L11 ,12 Yi ⊗ vec Id , and
 
H3 3 Y = 3 Y⊗3 according to true and estimated parameters, respectively, see
Table 6.1 for the result.
The variance C_{γ₁}, calculated from the true parameters by formula (6.13), and the variance C̃_{γ₁} of the estimators Ỹ^{⊗3} can be found in the first two columns of Table 6.1. The third column contains the mean of the variances of H₃(Ỹ), calculated for each sample during the simulation. This latter is a much better estimate of C_{γ₁} than the previous one. We conclude that in practice, when a sample is given, one can estimate the skewness by Ỹ^{⊗3} and the variance of this estimate by the variance of H₃(Ỹ). The variance vector vec C_{γ₁} has 729 entries, among them 55 distinct ones, of which we list only 28.

Table 6.1 Estimates: skewness and its variance for the skew-normal distribution. Columns: vec C_{γ₁}, vec C̃_{γ₁}, vec C_{H₃(Ỹ)}; γ⊗_{X,1}, γ̂⊗_{X,1}, γ̃⊗_{X,1}
6.1398 6.1662 6.0466
0.045497 0.044867 0.044106
0.2158 0.0984 0.1947
0.10098 0.10194 0.10104
0.269 0.0281 0.2406
0.060291 0.059591 0.059111
0.1288 −0.2574 0.1223
0.10098 0.10194 0.10104
0.131 0.1926 0.09
0.22411 0.22597 0.22493
0.1606 0.0189 0.1514
0.13381 0.1339 0.13339
0.5006 0.1283 0.4275
0.060291 0.059591 0.059111
0.0782 0.0494 0.0572
0.13381 0.1339 0.13339
0.0959 0.0198 0.0762
0.079895 0.080431 0.08022
1.5771 0.107 1.4524
0.10098 0.10194 0.10104
0.2571 0.0695 0.2389
0.22411 0.22597 0.22493
0.0467 0.0695 0.0371
0.13381 0.1339 0.13339
10.5342 6.1905 10.2052
0.22411 0.22597 0.22493
0.7562 0.1954 0.711
0.49738 0.49651 0.49254
0.1286 0.0912 0.1147
0.29698 0.29765 0.2963
0.0279 −0.2011 0.023
0.13381 0.1339 0.13339
2.0899 0.4283 2.0503
0.29698 0.29765 0.2963
0.3408 0.0942 0.3236
0.17732 0.17804 0.17729
0.0619 0.1312 0.0569
0.060291 0.059591 0.059111
0.8792 0.1684 0.8489
0.13381 0.1339 0.13339
0.1373 0.1535 0.136
0.079895 0.080431 0.08022
0.1118 0.0771 0.1102
0.13381 0.1339 0.13339
0.3048 0.3055 0.2965
0.29698 0.29765 0.2963
0.2481 0.0455 0.2392
0.17732 0.17804 0.17729
0.5507 0.2483 0.5213
0.079895 0.080431 0.08022
0.2473 0.2609 0.2389
0.17732 0.17804 0.17729
0.5489 0.0987 0.5185
0.10587 0.10911 0.10764
6.4592 5.8826 6.3931

6.7 Appendix

6.7.1 Estimated Hermite Polynomials

We use the formulae given in Sect. A.5.2, p. 365.


Third order

H₃(Z) = Z^{⊗3} − L⁻¹_{1¹,1²} (Z ⊗ vec I_d),

\overline{H₃(Ỹ_j)} = \overline{Ỹ^{⊗3}} − L⁻¹_{1¹,1²} (\overline{Ỹ} ⊗ vec I_d) = \overline{Ỹ^{⊗3}} = κ̃_{Y,3}.

Fourth order
 
H₄(Z) = Z^{⊗4} − L⁻¹_{1²,2¹} (vec I_d ⊗ Z^{⊗2}) + L⁻¹_{2²} vec^{⊗2} I_d,

\overline{H₄(Ỹ_j)} = \overline{Ỹ^{⊗4}} − L⁻¹_{1²,2¹} (vec I_d ⊗ \overline{Ỹ_j^{⊗2}}) + L⁻¹_{2²} vec^{⊗2} I_d
  = \overline{Ỹ^{⊗4}} − (L⁻¹_{1²,2¹} − L⁻¹_{2²}) vec^{⊗2} I_d = \overline{Ỹ^{⊗4}} − L̃⁻¹_{2²} vec^{⊗2} I_d
  = \overline{Ỹ^{⊗4}} − 3 vec^{⊗2} I_d,

since (6.14), where L⁻¹_{2²} is a commutator which transforms vec^{⊗2} I_d into a 4-symmetric tensor; we have

L⁻¹_{1²,2¹} − L⁻¹_{2²} = I_{d⁴} + K⁻¹_{(1324)} + K⁻¹_{(1423)} + K⁻¹_{(2314)} + K⁻¹_{(2413)} + K⁻¹_{(3412)} − I_{d⁴} − K⁻¹_{(1324)} − K⁻¹_{(1423)}
  = K⁻¹_{(2314)} + K⁻¹_{(2413)} + K⁻¹_{(3412)} = K_{(3124)} + K_{(3142)} + K_{(3412)};

therefore one can choose an equivalent commutator, say L̃⁻¹_{2²}, for this job; observe K_{(3412)} vec^{⊗2} I_d = vec^{⊗2} I_d.
Fifth order

H₅(Z) = Z^{⊗5} − 10 vec I_d ⊗ Z^{⊗3} + 15 vec^{⊗2} I_d ⊗ Z,

\overline{H₅(Ỹ)} = \overline{Ỹ^{⊗5}} − 10 vec I_d ⊗ \overline{Ỹ^{⊗3}}.

Sixth order

H₆(Z) = Z^{⊗6} − 15 vec I_d ⊗ Z^{⊗4} + 45 Z^{⊗2} ⊗ vec^{⊗2} I_d − 15 vec^{⊗3} I_d,

\overline{H₆(Ỹ_j)} = \overline{Ỹ^{⊗6}} − 15 vec I_d ⊗ \overline{Ỹ^{⊗4}} + 30 vec^{⊗3} I_d.
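A small sketch of this estimation (my own standardization helper, not the book's toolbox): Ỹ is obtained with the estimated mean and covariance, and the average of Ỹ^{⊗3} gives κ̃_{Y,3}; the term with the mean of Ỹ vanishes because Ỹ has exactly zero sample mean.

import numpy as np

# Sketch (assumptions: a skewed toy sample; my own standardization, not the
# book's code).  The average of Ytilde^{kron 3} estimates kappa_tilde_{Y,3}.
rng = np.random.default_rng(0)
d, n = 3, 1000
X = rng.standard_normal((n, d)) ** 2                 # skewed toy data
mu_hat = X.mean(axis=0)
L = np.linalg.cholesky(np.cov(X, rowvar=False, bias=True))
Ytil = (X - mu_hat) @ np.linalg.inv(L).T             # zero mean, unit covariance
kappa3_tilde = np.mean([np.kron(np.kron(y, y), y) for y in Ytil], axis=0)
print(kappa3_tilde.shape)                            # (27,) = (d**3,)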

6.8 Exercises

6.1 Let Y ∈ R^d; observe that Y′Y is a scalar, and show that

(Y′Y) Y = ((vec I_d)′ ⊗ I_d) Y^{⊗3} = (I_d ⊗ (vec I_d)′) Y^{⊗3}.

6.2 Show that

‖L⁻¹_{2²} vec^{⊗2} I_d‖² = 3d(d + 2);

see Sect. A.2 for the commutator L⁻¹_{2²}.
6.3 Show that

(vec′ I_d)^{⊗2} L⁻¹_{2²} κ⊗₄ = 3 (vec′ I_d)^{⊗2} κ⊗₄.

6.4 Show that

(I_{d²} ⊗ (vec I_d)′) L⁻¹_{2²} [vec I_d]^{⊗2} = (d + 2) vec I_d.

6.5 Assume that EY = 0 and Var Y = I_d. Show that if

B(Y) = E[(Y′Y) YY′] − (d + 2) I_d,

then

vec B(Y) = (I_{d²} ⊗ (vec I_d)′) κ⊗₄ = ((vec I_d)′ ⊗ I_{d²}) κ⊗₄.

6.6 Let Y = [Y₁, Y₂]′. Show that if

b(Y) = [ E((Y₁² + Y₂²) Y₁), E((Y₁² + Y₂²) Y₂) ]′,

then

‖b(Y)‖² = μ²₁₁₁ + 5μ²₁₁₂ + 5μ²₁₂₂ + μ²₂₂₂ + 4μ₁₁₁μ₁₁₂ + 2μ₁₁₁μ₁₂₂ + 8μ₁₁₂μ₁₂₂ + 2μ₁₁₂μ₂₂₂ + 4μ₁₂₂μ₂₂₂.

6.9 Bibliographic Notes

Moments of the folded normal distribution are obtained in Elandt [Ela61]. The multivariate skew-normal distribution was introduced by Azzalini and Dalla Valle [ADV96]. Further properties and applications of the skew-normal distribution, including the canonical fundamental skew-normal, can be found in the papers [AC99, GHL01, PP01, AC03, AVGQ04, Azz05, AVG05, Cap12, CC13, Shu16], and [AVFG18], among others.
We refer the reader to the book by Fang, Kotz, and Ng, [FKN17] for the theory
of spherically symmetric or rotationally symmetric distributions. The distributions

of elliptical or ellipsoidal symmetry are considered in [Ste93, Sut86]; higher-order moments are given in [BB86]. Further properties of such distributions are given in [And03, KvR06, Mui09], and asymmetric versions, like the canonical fundamental skew-spherical distribution, in [AB02, Gen04, GL05, DL05]. Sahu et al. [SDB03] and Kim and Mallick [KM03] deal with the skew t-distribution. We refer to a review article [Ser04] on multivariate symmetry. Some more skew distributions include scale mixtures of skew normal [Kim08], the multivariate skew-normal-Cauchy [KRYAV16], and the multivariate Laplace distribution [KS05, Kv95].
There are several indices of asymmetry; see [KS77] for the scalar case and numerous works in connection with testing normality. A pioneering work by Mardia [Mar70] started the multivariate case. Various notions of skewness and kurtosis have been proposed by different authors; the list includes Malkovich and Afifi [MA73], Isogai [Iso82], Srivastava [Sri84], Koziol [Koz89], Cardoso [Car89], Song [Son01], Móri et al. [MRS94], Balakrishnan et al. [BBQ07], and Kollo [Kol08]. Jammalamadaka et al. [JST06] proposed using multivariate vector cumulants for skewness and kurtosis; see also [JTT20] and [JTT21a].
A systematic treatment of asymptotic distributions of skewness and kurtosis
indices is given in Baringhaus [Bar91], Baringhaus and Henze [BH91, BH92],
Henze [Hen02] and Klar [Kla02]; see also [Koz87, RB89].
Appendix A
Formulae

A.1 Bell Polynomials

A.1.1 Incomplete (Partial) Bell Polynomials

 
B_{n,r}(x_{1:n−r+1}) = n! Σ ∏_{j=1}^{n−r+1} (1/ℓ_j!) (x_j / j!)^{ℓ_j},

where the sum is taken over all non-negative integers ℓ_j with Σ_j ℓ_j = r and Σ_j j ℓ_j = n.

Particular cases:

n = 1:  B₁,₁(x₁) = x₁
n = 2:  B₂,₁(x₁:₂) = x₂,  B₂,₂(x₁) = x₁²
n = 3:  B₃,₁(x₁:₃) = x₃,  B₃,₂(x₁:₂) = 3x₁x₂,  B₃,₃(x₁) = x₁³
n = 4:  B₄,₁(x₁:₄) = x₄,  B₄,₂(x₁:₃) = 4x₁x₃ + 3x₂²,  B₄,₃(x₁:₂) = 6x₁²x₂,  B₄,₄(x₁) = x₁⁴
n = 5:  B₅,₁(x₁:₅) = x₅,  B₅,₂(x₁:₄) = 5x₁x₄ + 10x₂x₃,  B₅,₃(x₁:₃) = 15x₁x₂² + 10x₁²x₃,  B₅,₄(x₁:₂) = 10x₁³x₂,  B₅,₅(x₁) = x₁⁵
n = 6:  B₆,₁(x₁:₆) = x₆,  B₆,₂(x₁:₅) = 6x₁x₅ + 15x₂x₄ + 10x₃²,  B₆,₃(x₁:₄) = 15x₁²x₄ + 60x₁x₂x₃ + 15x₂³,  B₆,₄(x₁:₃) = 45x₁²x₂² + 20x₁³x₃,  B₆,₅(x₁:₂) = 15x₁⁴x₂,  B₆,₆(x₁) = x₁⁶


A.1.2 Bell Polynomials


B_n(x_{1:n}) = Σ_{r=1}^{n} B_{n,r}(x_{1:n−r+1}).

Direct calculation of Bell polynomials:

B_n(x₁, …, x_n) = n! Σ_{k₁=0}^{n−1} Σ_{k₂=(2k₁−n)⁺}^{(k₁−1)⁺} ⋯ Σ_{k_{n−1}=(2k_{n−2}−k_{n−3})⁺}^{(k_{n−2}−1)⁺} ∏_{j=1}^{n} 1/(k_{j−1} − 2k_j + k_{j+1})! · (x_j/j!)^{k_{j−1}−2k_j+k_{j+1}},    (A.1)

where k₀ = n, k_n = k_{n+1} = 0, and the limits of the summations are valid for non-negative numbers. Particular cases:

n   B_n(x_{1:n})
1   x₁
2   x₁² + x₂
3   x₁³ + 3x₁x₂ + x₃
4   x₁⁴ + 6x₁²x₂ + 4x₁x₃ + 3x₂² + x₄
5   x₁⁵ + 10x₁³x₂ + 15x₁x₂² + 10x₁²x₃ + 5x₁x₄ + 10x₂x₃ + x₅
6   x₁⁶ + 15x₁⁴x₂ + 20x₁³x₃ + 45x₁²x₂² + 15x₂³ + 60x₁x₂x₃ + 15x₁²x₄ + 10x₃² + 15x₂x₄ + 6x₁x₅ + x₆
7   x₁⁷ + 21x₁⁵x₂ + 35x₁⁴x₃ + 105x₁³x₂² + 35x₁³x₄ + 210x₁²x₂x₃ + 105x₁x₂³ + 21x₁²x₅ + 105x₁x₂x₄ + 70x₁x₃² + 105x₂²x₃ + 7x₁x₆ + 21x₂x₅ + 35x₃x₄ + x₇
8   x₁⁸ + 28x₁⁶x₂ + 56x₁⁵x₃ + 210x₁⁴x₂² + 70x₁⁴x₄ + 420x₁²x₂³ + 560x₁³x₂x₃ + 105x₂⁴ + 56x₁³x₅ + 840x₁x₂²x₃ + 280x₁²x₃² + 420x₁²x₂x₄ + 280x₂x₃² + 210x₂²x₄ + 280x₁x₃x₄ + 168x₁x₂x₅ + 28x₁²x₆ + 35x₄² + 56x₃x₅ + 28x₂x₆ + 8x₁x₇ + x₈

One can use the R package “kStatistics” for getting Bell polynomials and Bell
numbers.
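As an alternative, a small Python sketch (my own code, not the book's) reproduces the complete Bell polynomials tabulated above from the standard recurrence B_{m+1} = Σ_k C(m,k) B_{m−k} x_{k+1}; SymPy's bell function can also produce Bell numbers and incomplete Bell polynomials.

from math import comb
import sympy as sp

# Sketch: complete Bell polynomials from the recurrence
# B_0 = 1,  B_{m+1} = sum_{k=0}^{m} C(m,k) B_{m-k} x_{k+1}.
x = sp.symbols('x1:9')                       # x1, ..., x8

def complete_bell(n):
    B = [sp.Integer(1)]
    for m in range(n):
        B.append(sp.expand(sum(comb(m, k) * B[m - k] * x[k] for k in range(m + 1))))
    return B[n]

print(complete_bell(4))    # x1**4 + 6*x1**2*x2 + 3*x2**2 + 4*x1*x3 + x4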

A.1.2.1 Bell Numbers

The number of all subsets: Bn = Bn (1n )

n      1   2   3   4    5    6     7     8      9        10
B_n    1   2   5   15   52   203   877   4140   21,147   115,975

(see [DLM, Table 26.7.1]).

A.2 Commutators

A.2.1 Moment Commutators

See Sect. 2.4.3, p. 100 for definition.


Commutator: L−1
12 ,11 (2.60), p. 96,

L−1 −1 −1
12 ,11 = Id 3 + K(132) + K(231) = Id 3 + K(132) + K(312) , (A.2)

L−1 −1 −1
11 ,12 = Id 3 + K(213) + K(312) = Id 3 + K(213) + K(231) .

Commutator: L−1
22 (2.63), p. 96, (1.61), p. 54,

L−1 −1 −1
22 = Id 4 + K(1324) + K(1423) = Id 4 + K(1324) + K(1342) . (A.3)
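A small numerical sketch of how such commutators act (my own helper; the convention here is that K_p permutes the Kronecker factors as x₁ ⊗ ⋯ ⊗ xₙ ↦ x_{p(1)} ⊗ ⋯ ⊗ x_{p(n)}, and the book's K_p versus K_p⁻¹ indexing is not reproduced exactly):

import numpy as np

# Sketch: apply a commutator K_p (all dimensions equal d) to a vector by
# reshaping it into an n-way array and permuting the axes.
def K_apply(p, y, d):
    return np.transpose(y.reshape((d,) * len(p)), [q - 1 for q in p]).reshape(-1)

d = 3
rng = np.random.default_rng(6)
x1, x2, x3, x4 = rng.standard_normal((4, d))
y = np.kron(np.kron(np.kron(x1, x2), x3), x4)
print(np.allclose(K_apply((1, 3, 2, 4), y, d),
                  np.kron(np.kron(np.kron(x1, x3), x2), x4)))   # True
# e.g. the permutation (3412) leaves vec I_d kron vec I_d unchanged:
v = np.kron(np.eye(d).reshape(-1), np.eye(d).reshape(-1))
print(np.allclose(K_apply((3, 4, 1, 2), v, d), v))              # True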

Commutator: L−1
13 ,11 (2.62), p. 96,

L−1 −1 −1 −1
13 ,11 = Id 4 + K(1243) + K(1342) + K(2341) = Id 4 + K(1243) + K(1423) + K(4123)
(A.4)

L−1 −1 −1 −1
11 ,13 = Id 4 + K(2134) + K(3124) + K(4123) = Id 4 + K(2134) + K(2314) + K(2341) .

 4 
Commutator: L−1
12 ,21 (2.64), p. 96, number of terms 2,1,1 /2 = 12/2,

L−1 −1 −1 −1 −1 −1
12 ,21 = Id 4 + K(1324) + K(1423) + K(2314) + K(2413) + K(3412) . (A.5)

Commutator: L−1
13 ,12 (3.37), p. 137,

L−1 −1 −1 −1 −1 −1
13 ,12 = Id 5 + K(12435) + K(12534) + K(13425) + K(13524) + K(14523)

+ K−1 −1 −1 −1
(23415) + K(23514) + K(24513) + K(34512) .

Partitions with respect to type: 2 = 1, 3 = 1, i.e. L−1


13 ,12

p 123|45 124|35 125|34 134|25 135|24 145|23 234|15 235|14 245|13 345|12
K−1
p Id 5 K−1 −1 −1 −1 −1 −1 −1 −1 −1
(12435) K(12534) K(13425) K(13524) K(14523) K(23415) K(23514) K(24513) K(34512)

Commutator: L−1
32


15
L−1
32 = K−1
pj . (A.6)
j =1

Permutations p_j of the numbers 1:6, originating from the partitions with respect to type ℓ = (0, 3, 0, 0, 0, 0), i.e. ℓ₂ = 3 (see (4.10), (4.55), and (2.28))

pj 12|34|56 12|35|46 12|36|45 13|24|56 13|25|46 13|26|45 14|23|56 14|26|35


14|25|36 15|23|46 15|24|36 15|26|34 16|23|45 16|24|35 16|25|34

 
where each permutation p_j is split into three pairs (p_j(1) | p_j(2) | p_j(3)), and N = 6!/(3! 2³) = 15 = 5!!, since n = 6, ℓ₂ = 3.

Observe that the number of terms in L⁻¹_{2^k} is (2k)!/(k! 2^k) = (2k − 1)!!, see (A.22); instances are #L⁻¹_{2⁴} = 8!/(4! 2⁴) = 7!!, #L⁻¹_{2⁵} = 9!!.

Commutator: L−1
23


10
L−1
23 = K−1
pj .
j =1

Permutations p_j of the numbers 1:6, of type ℓ = (0, 0, 2, 0, 0, 0), i.e. ℓ₃ = 2,



pj 123|456 124|356 125|346 126|345 134|256 135|246 136|245 145|236 146|235 156|234
K−1
pj Id 6 K−1 −1 −1 −1 −1 −1 −1 −1 −1
(124356) K(125346) K(126345) K(134256) K(135246) K(136245) K(145236) K(146235) K(156234)

One can get these permutations by choosing all possible 3-element subsets of the set 1:6 and ordering each subset increasingly; then pair and unify them, forming the blocks and the permutations. The number of permutations is N = \binom{6}{3}/2 = 10.
Commutator: L−1
24


35
L−1
24 = K−1
pj .
j =1

Any partition in P₈ with type ℓ₄ = 2 (ℓ_j = 0, j ≠ 4) contains two blocks of four elements each. Listing one block from each partition is sufficient for getting all corresponding permutations p in P₈, since the second block is the complementary set of the first one. An instance is the following: if the first block B₁ is (1, 3, 5, 7), then the second one is (2, 4, 6, 8); therefore, we list only the first blocks:

B1 1234 1235 1236 1237 1238 1245 1246 1247 1248 1256 1257
1258 1267 1268 1278 1345 1346 1347 1348 1356 1357 1358 1367
1368 1378 1456 1457 1458 1467 1468 1478 1567 1568 1578 1678

Here ℓ₄ = 2, n = 8, r = 2, N = 8!/(2!(4!)²) = \binom{8}{4}/2 = 35.
Commutator: L−1
15 ,13


56
L−1
15 ,13 = K−1
pj .
j =1


Now, we have n = 8, ℓ₃ = 1, ℓ₅ = 1, r = 2, N = \binom{8}{3} = 8!/(3! 5!) = 56. We choose the subsets with three elements in all possible ways and complete them by the complementary (5-element) subset of 1:8. We do not divide by 2 because the two blocks of a pair have different sizes. We construct the permutations by taking the 5-element block first; for instance, if B₂ = (1, 3, 6), then the complementary subset is (2, 4, 5, 7, 8) and the permutation is p = (24578136). We list the 56 blocks with 3 entries:

B2 123 124 125 126 127 128 134 135 136 137 138 145 146 147 148 156 157 158
167 168 178 234 235 236 237 238 245 246 247 248 256 257 258 267 268 278 345
346 347 348 356 357 358 367 368 378 456 457 458 467 468 478 567 568 578 678

A.2.2 Commutators Connected to T-Hermite Polynomials


A.2.2.1 Mixing Commutator

Permutations (see (1.1), p. 4) for mixing two T-products of vectors with an equal number of terms, say n, imply the commutator
  
M−1
mn = K−1
mn (p) d k1:n , d j1:n , (A.7)
p∈Pn

and it has n! terms. An instance:


Commutator: M−1
m3 If n = 3, then

−1 −1 −1 −1
M−1
m3 = K(142536) + K(142635) + K(152436) + K(152634)

+K−1 −1
(162435) + K(162534) (A.8)
= K(135246) + K(135264) + K(135426) + K(135624) + K(135462) + K(135642)

(see (1.2), p. 5).


Assuming that all dimensions coincide, by the construction of m_n we have

M−1
mn = K(o,peven ) ,
peven ∈Pn

where o = (1, 3, . . . , 2n − 1) and peven denote permutations of even numbers of


(1 : 2n). Therefore,
 
M−1
mn = n! Id n ⊗ Sd1n Kp2n , (A.9)

p2n = (1, 3, . . . , 2n − 1, 2, 4, . . . , 2n),

vecId 3 = K−1 −1 ⊗3 ⊗3
(142536)K(142536)vecId 3 = K(142536)vec Id = K(135246)vec Id .

Let q2n = (1, n + 1, 2, n + 2, . . . , n, 2n), then q−1


2n = p2n , and we have

Kq2n vecId n = K−1 ⊗n


q2n vec Id .

A.2.2.2 H-Commutators

Examples
Commutator: L−1
2,H 2 Coefficient of κ ⊗
X,2 ⊗ H2 (X) :

L−1 −1 −1 −1 −1
2,H 2 = K(1324) + K(1423) + K(1324) + K(2413). (A.10)

Commutator: L−1
2,2,H 2 Coefficient of κ ⊗2
X,2 ⊗ H2 (X) :

L−1 −1
2,2,H 2 = L6,m2 = L−1
(m2 ((1:3)\j,(4:6)\k),j,k) (A.11)
j =1:3,k=4:6

= K−1 −1 −1 −1
(253614) + K(263514) + K(243615) + K(263415)

+K−1 −1 −1 −1
(243516) + K(253416) + K(153624) + K(163524)

+K−1 −1 −1 −1
(143625) + K(163425) + K(143526) + K(162534)

+K−1 −1 −1 −1
(152634) + K(162534) + K(142635) + K(162435)

+K−1 −1
(142536) + K(152436),

where for instance m2 ((1 : 3) \j, (4 : 6) \k) |j =1,k=5 = {(2436) , (2634)} and

L−1 −1 −1
(m2 ((1:3)\j,(4:6)\k),j,k) |j =1,k=5 = K(243615) + K(263415).

Commutator: L−1
2,H 4 Coefficient of κ ⊗
X,2 ⊗ H4 (X) :

L−1
2,H 4 = K−1
(j,k,(1:6)\(j,k)) (A.12)
j =1:3,k=4:6

= K−1 −1 −1 −1 −1
(142356) + K(152346) + K(162345) + K(241356) + K(251346)

+ K−1 −1 −1 −1
(261345) + K(341256) + K(351246) + K(361245)

= K(134256) + K(134526) + K(134562) + K(314256) + K(314526) + K(314562)


+ K(341256) + K(341526) + K(341562).

We also have
  
L−1 ⊗2
2,H 4 = L11 ,12 ⊗ Id 3 Id 3 ⊗ L11 ,12 K(134256) = L11 ,12 K(134256).

Commutators: L−1 −1 −1
2,H 6 , L2,2,H 4 , L2,2,2,H 2 . We have K−1
(15234678) = K(13452678) and


L−1
2,H 6 = K−1 ⊗2
(j,k,(1:8)\(j,k)) = L11 ,13 K(13452678), (A.13)
j =1:4,k=5:8

L−1
2,2,H 4 = L−1
(m2 ((1:4)\(j,k),(5:8)\(s,t )),j,k,s,t ) , (A.14)
j,k=1:4,j =k
s,t =5:8,s=t

L−1
2,2,2,H 2 = L−1
(m3 ((1:4)\j,(5:8)\k),j,k) , (A.15)
j =1:4,k=5:8

44
and L−1 −1
2,2,H 4 has 2 2 2 = 72 terms and L2,2,2,H 2 has 4 ∗ 4 ∗ 3! = 96 terms.
An instance m3 ((1 : 4) \j, (5 : 8) \k) |j =1,k=6 = {(253748) , (253847) ,
(273548) , (273845) , (283547) , (283745)} and the corresponding commutator
is

L−1 −1 −1 −1
(m3 ((1:4)\j,(5:8)\k),j,k) |j =1,k=6 = K(25374816) + K(25384716) + K(27354816)

+K−1 −1 −1
(27384516) + K(28354716) + K(28374516).

L−1
2,2,H 4 includes terms like:

K−1 −1
(37481256) + K(37481256) = K(56137824) + K(56137842),

K−1 −1
(27481356) + K(27481356) = K(51637825) + K(56127842).

J-Commutators
are connected to T -Hermite polynomials that are included in the recurrence
relation 4.2, p. 208. Define the J-commutator by


n−1
Jn = K−1
(n,j,[1:(n−1)]\j ) (d1:n )
j =1

(see (4.42), p. 208). Observe

J4 = K−1 −1 −1
(4123) + K(4213) + K(4312) .

Alternatively, we have for (4.41), p. 208 another form:


n−1
Jn = K−1
((j :n−1)S )n
j =1

and we have

J4 = K−1 −1
(2314) + K(1324) + Id 4 ,

J5 = K−1 −1 −1
(23415) + K(13425) + K(12435) + Id 5 .

A.3 Derivatives of Composite Functions

Case A.1 Let h (x) = f (g (x)),

∂ 4 h (x)
= f (1) (g) g(1,2,3,4)
∂x1 ∂x2 ∂x3 ∂x4


+f (2) (g) g(1,2,3)g4 + g1 g(2,3,4) + g(1,2,4) g3 + g2 g(1,3,4)


+f (2) (g) g(1,2)g(3,4) + g(1,3) g(2,4) + g(1,4)g(2,3)


+f (3) (g) g1 g2 g(3,4) + g1 g3 g(2,4) + g1 g4 g(2,3)


+f (3) (g) g(1,2)g3 g4 + g(1,3)g2 g4 + g(1,4) g2 g3
+f (4) (g) g1 g2 g3 g4 .

In particular

∂ 4 h (x) (1,3)
= f (1) (g) g(1,2)
∂x1 ∂x23
 
(1,2) (1) (1) (3) (1,2) (1) (1) (1,2)
+f (2) (g) g(1,2) g2 + g1 g2 + g(1,2) g2 + g2 g(1,2)
 
(1,1) (2) (1,1) (2) (1,1) (2)
+f (2) (g) g(1,2) g2 + g(1,2) g2 + g(1,2) g2
 
(1) (1) (2) (1) (1) (2) (1) (1) (2)
+f (3) (g) g1 g2 g2 + g1 g2 g2 + g1 g2 g2
!       "
(1,1) (1) 2 (1,1) (1) 2 (1) (1) 2
+f (3) (g) g(1,2) g2 + g(1,2) g2 + g(1,2) g2
 
(1,0) (0,1) 3
+f (4) (g) g1 g2 ,
 
(1,3) (1,2) (1) (1) (3) (1,1) (2)
= f (1) (g) g(1,2) + f (2) (g) 3g(1,2) g2 + g1 g2 + 3g(1,2) g2
!   "  
(1) (1) (2) (1,1) (1) 2 (1) (1) 3
+f (g) 3g1 g2 g2 + 3g(1,2) g2
(3)
+ f (4) (g) g1 g2 ,

and

∂ 4 h (x) (2,2)
= f (1) (g) g(1,2)
∂x12 ∂x22
 
(2,1) (1) (1) (1,2) (2,1) (1) (1) (1,2)
+f (2) (g) g(1,2) g2 + g1 g(1,2) + g(1,2) g2 + g1 g(1,2)
!     "
(2) (2) (1,1) 2 (1,1) 2
+f (2) (g) g1 g2 + g(1,2) + g(1,2)
!  "
(1) 2 (2) (1) (1) (1,1) (1) (1) (1,1)
+f (3) (g) g1 g2 + g1 g2 g(1,2) + +g1 g2 g(1,2)
!   "
(2) (1) 2 (1,1) (1) (1) (1,1) (1) (1)
+f (3)
(g) g1 g2 + g(1,2) g1 g2 + g(1,2) g1 g2
   
(1) 2 (1) 2
+f (4) (g) g1 g2

(2,2) (2,1) (1) (1) (1,2)
= f (1) (g) g(1,2) + f (2) (g) 2g(1,2) g2 + 2g1 g(1,2)
    "
(2) (2) (1,1) 2 (1,1) 2
+g1 g2 + g(1,2) + g(1,2)
!   "    
(2) (1) 2 (1,1) (1) (1) (1) 2 (1) 2
+f (3) (g) g1 g2 + 2g(1,2) g1 g2 + f (4) (g) g1 g2 .

Case A.2

Dx⊗5 f (g (x)) = f (1) (g (x)) Dx⊗5 g (x)


 
+f (2) (g (x)) L−1
14 ,12 D ⊗4
x g (x) ⊗ D ⊗
x g (x)
 
+f (2) (g (x)) L−1
13 ,12 D ⊗3
x g (x) ⊗ D ⊗2
x g (x)
 ⊗2 
−1 ⊗2 ⊗
+f (g (x)) L22 ,11 Dx g (x)
(3)
⊗ Dx g (x)
  ⊗ ⊗2 
+f (3) (g (x)) L−1
13 ,21 Dx
⊗3
g (x) ⊗ Dx g (x)
  ⊗3 
+f (4) (g (x)) L−1
12 ,31 Dx
⊗2
g (x) ⊗ Dx

g (x)
 ⊗5
+f (5) (g (x)) Dx⊗ g (x) ,

and since Dx⊗5 f (g (x)) = Sd15 Dx⊗5 f (g (x)), therefore


Dx⊗5 f (g (x)) = f (1) (g (x)) Dx⊗5 g (x)

 
+f (2) (g (x)) Sd15 5Dx⊗4 g (x) ⊗ Dx⊗ g (x) + 10Dx⊗3 g (x) ⊗ Dx⊗2 g (x)
  ⊗2 
⊗2 ⊗ ⊗3
 ⊗ ⊗2
+f (g (x)) Sd15 15 Dx g (x)
(3)
⊗ Dx g (x) + Dx g (x) ⊗ Dx g (x)
 ⊗3  ⊗5
+f (4) (g (x)) Sd15 Dx⊗2 g (x) ⊗ Dx⊗ g (x) + f (5) (g (x)) Dx⊗ g (x) .

A.4 Moments, Cumulants

μ1 = κ 1 ,

μ(1,2) = κ(1,2) + κ1 κ2 , (A.16)


 
μ1,2,3 = κ1,2,3 + κ1 κ(2,3) + κ(1,3) κ2 + κ(1,2) κ3 + κ1 κ2 κ3 , (A.17)

κ1:4 = μ1:4 − μ(1,2,3)μ4 − μ(1,2,4)μ3 − μ(1,3,4)μ2 − μ(2,3,4)μ1


−μ(1,2) μ(3,4) − μ(1,3) μ(2,4) − μ(1,4)μ(2,3) (A.18)

+2 μ(1,2)μ3 μ4 + μ(1,3)μ2 μ4 + μ(1,4) μ2 μ3 + μ1 μ(2,3)μ4

+μ1 μ(2,4) μ3 + μ1 μ2 μ(3,4) − 6μ1 μ2 μ3 μ4 ,

μ1:4 = κ1:4 + κ1 κ(2,3,4) + κ(1,3,4)κ2 + κ(1,2,4)κ3 + κ1,2,3 κ4 + κ(1,2) κ(3,4)


+κ(1,3) κ(2,4) + κ(1,4) κ(2,3)
+κ1 κ(2,3) κ4 + κ(1,3) κ2 κ4 + κ(1,2) κ3 κ4
+κ(1,4) κ2 κ3 + κ1 κ(2,4)κ3 + κ1 κ2 κ(3,4) + κ1 κ2 κ3 κ4 . (A.19)

In particular

μ_{X,4} = κ_{X,4} + 4κ_{X,3}κ_X + 3κ²_{X,2} + 6κ_{X,2}κ²_X + κ⁴_X.
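A quick numerical check of this univariate special case (my own sketch, using an Exp(1) sample, whose cumulants are κ_k = (k − 1)!):

import numpy as np

# Check of mu_4 = kappa_4 + 4 kappa_3 kappa_1 + 3 kappa_2^2 + 6 kappa_2 kappa_1^2 + kappa_1^4
# for X ~ Exp(1): the right-hand side is 6 + 8 + 3 + 6 + 1 = 24 = E X^4.
rng = np.random.default_rng(2)
X = rng.exponential(size=10**6)
k1, k2, k3, k4 = 1.0, 1.0, 2.0, 6.0
print(k4 + 4*k3*k1 + 3*k2**2 + 6*k2*k1**2 + k1**4, np.mean(X**4).round(2))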

A.4.1 T-Moments, T-Cumulants

Case A.3 T -moment, n = 3

μ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗
1:3 = κ 1:3 + κ 1 ⊗ κ 2,3 + K(3,2)S (d1:3 ) κ 1,3 ⊗ κ 2 + κ 1,2 ⊗ κ 3 + κ 1 ⊗ κ 2 ⊗ κ 3 .

In particular, if Xj = X then

μ⊗ ⊗ −1 ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ ⊗3
X,3 = κ X,3 + K(3,2)S κ X,2 ⊗ κ X,1 + κ X,1 ⊗ κ X,2 + K(231) κ X,1 ⊗ κ X,2 + κ X,1

= κ⊗ −1 ⊗ ⊗ ⊗3 ⊗ ⊗ ⊗ ⊗3
X,3 + L12 ,11 κ X,2 ⊗ κ X,1 + κ X,1 = κ X,3 + 3κ X,2 ⊗ κ X,1 + κ X,1 .

Case A.4 T -moment, n = 4


 
μ⊗ ⊗ ⊗ ⊗ −1 ⊗
1:4 = κ 1:4 + κ 1,2,3 ⊗ κ 4 + K(1243) κ 1,2,4 ⊗ κ 3

 
+K−1 (1342) κ ⊗
1,3,4 ⊗ κ ⊗ −1 ⊗
2 + K(2341) κ 2:4 ⊗ κ 1

 
+κ ⊗ 1,2 ⊗ κ ⊗
3,4 + K −1
(1324) κ ⊗
1,3 ⊗ κ ⊗
2,4
 
+K−1 ⊗ ⊗ ⊗
(1423) κ 1,4 ⊗ κ 2,3 + κ 1,2 ⊗ κ 3 ⊗ κ 4
⊗ ⊗

   
+K−1 ⊗ ⊗ ⊗ −1 ⊗
(1324) κ 1,3 ⊗ κ 2 ⊗ κ 4 + K(1423) κ 1,4 ⊗ κ 2 ⊗ κ 3
⊗ ⊗

 
+K−1 (2314) κ ⊗
2,3 ⊗ κ ⊗
1 ⊗ κ ⊗
4
   
+K−1 (2413) κ ⊗
2,4 ⊗ κ ⊗
1 ⊗ κ ⊗
3 + K −1
(3412) κ ⊗
3,4 ⊗ κ ⊗
1 ⊗ κ ⊗
2

+κ ⊗ ⊗ ⊗ ⊗
1 ⊗ κ2 ⊗ κ3 ⊗ κ4 .
 
Assume Cum1 Xj = EXj = 0,

μ⊗ ⊗ ⊗ ⊗ −1 ⊗ ⊗ −1 ⊗ ⊗
1:4 = κ 1:4 + κ 1,2 ⊗ κ 3,4 + K(3,2)S (d1:4 ) κ 1,3 ⊗ κ 2,4 + K(4:2)S (d1:4 ) κ 1,4 ⊗ κ 2,3 .

In particular, if Xj = X, then
   
μ⊗
X,4 = κ ⊗
X,4 + L−1
13 ,11 κ ⊗
X,3 ⊗ κ ⊗
X,1 + L−1 ⊗2
κ
22 X,2 + L −1
12 ,12 κ ⊗
X,2 ⊗ κ ⊗2 ⊗4
X,1 + κ X,1
 

= κ⊗ X,4 + 4 κ ⊗
X,3 ⊗ κ ⊗ ⊗2 ⊗ ⊗2 ⊗4
X,1 + 3κ X,2 + 6κ X,2 ⊗ κ X,1 + κ X,1 .

Case A.5 T -moment, n = 6, 8, Xj = X:


μ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗2 ⊗
X,6 = κ X,6 + 21κ X,2 κ 12 ⊗ κ X,5 + 35κ X,3 ⊗ κ X,4 + 105κ X,2 ⊗ κ X,3

μ⊗ ⊗ ⊗ ⊗ ⊗ ⊗ ⊗2
X,8 = κ X,8 + 28κ X,2 ⊗ κ X,6 + 56κ X,3 ⊗ κ X,5 + 35κ X,4

+280κ ⊗ ⊗2 ⊗2 ⊗ ⊗4
X,2 ⊗ κ X,3 + 210κ X,2 ⊗ κ X,4 + 105κ X,2 ,

where Sd16 and Sd18 have been applied.



A.5 Hermite Polynomials

H₀ = 1,
H₁(X₁) = X₁,
H₂(X₁:₂) = X₁X₂ − σ₁₂,
H₃(X₁:₃) = X₁X₂X₃ − σ₁₂X₃ − σ₁₃X₂ − σ₂₃X₁,
H₄(X₁:₄) = X₁X₂X₃X₄ − σ₁₂X₃X₄ − σ₁₃X₂X₄ − σ₁₄X₂X₃ − σ₂₃X₁X₄ − σ₂₄X₁X₃ − σ₃₄X₁X₂ + σ₁₂σ₃₄ + σ₁₃σ₂₄ + σ₁₄σ₂₃,
H₅(X₁:₅) = X₅ H₄(X₁:₄) − Σ_{j=1}^{4} σ_{j,5} H₃(X_{(1:5)\j}).
For the case when all X_j are identical to X, say, these formulae become

H₀ = 1,
H₁(X) = X,
H₂(X) = X² − σ²,
H₃(X) = X³ − 3Xσ²,
H₄(X) = X⁴ − 6σ²X² + 3σ⁴,
H₅(X) = X⁵ − 10σ²X³ + 15σ⁴X,
H₆(X) = X⁶ − 15σ²X⁴ + 45σ⁴X² − 15σ⁶.
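These polynomials also follow from the three-term recurrence H_{n+1}(X) = X H_n(X) − n σ² H_{n−1}(X); a small symbolic sketch (my own code, not the book's) regenerates the list:

import sympy as sp

# Sketch: probabilist's Hermite polynomials with variance sigma^2 from
# H_{n+1} = X H_n - n sigma^2 H_{n-1},  H_0 = 1,  H_1 = X.
X, sigma = sp.symbols('X sigma')

def hermite(n):
    H = [sp.Integer(1), X]
    for m in range(1, n):
        H.append(sp.expand(X * H[m] - m * sigma**2 * H[m - 1]))
    return H[n]

print(hermite(4))    # X**4 - 6*X**2*sigma**2 + 3*sigma**4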

A.5.1 Product of Hermite Polynomials


 
Case A.6 Recall notation Xj 12 = Xj , Xj . Let p = 2

2
H2 (X1 )H2 (X2 ) = H4 (X12 , X212 ) + 4σ12 H2 (X1:2 ) + 2σ12 .

Case A.7
 
H2 (X1 ) H2 (X2 ) H2 (X3 ) = H6 X12 , X212 , X312
 
+ 4σ23 H4 X12 , X2 , X3 + 4σ12 H4 (X1, X2 , X312 )
+ 4σ13 H4 (X1, X212 , X3 ) + 8σ12 σ13 H2 (X2 , X3 ) + 8σ13 σ23 H2 (X1 , X2 )
+ 8σ12 σ23 H2 (X1 , X3 )
+ 2σ13
2
H2 (X2 ) + 2σ23
2
H2 (X1 ) + 2σ12
2
H2 (X3 ) + 8σ12 σ13 σ23 .

Case A.8

   
H4 X12 , X212 H2 (X3 ) = H6 X12 , X212 , X312 + 4σ13 H4 (X1 , X212 , X3 )
 
+ 4σ23 H4 X12 , X2 , X3 + 2σ13
2
H2 (X2 ) + 2σ23
2
H2 (X1 ) + 8σ13 σ23 H2 (X1 , X2 ).

  2
H4 (X1 ) H2 (X3 ) = H6 X14 , X312 + 8σ13 H4 (X13 , X3 ) + 12σ13 H2 (X1 )
H4 (X) H2 (X) = H6 (X) + 8σ 2 H4 (X) + 12σ 4 H2 (X) ,

cf. (4.26).
Case A.9


4 
8
 
H4 (X1:4 ) H4 (X5:8 ) = H8 (X1:8 ) + σj k H6 X(1:4)\j,(5:8)\k
i=1 j =5
  
+ σj1 k1 σj2 k2 H4 X(1:4)\(j1 ,j2 ),(5:8)\(k1 ,k2 )
72
  
+ σj1 k1 σj2 k2 σj3 k3 H2 X(1:4)\(j1 ,j2 ,j3 ),(5:8)\(k1 ,k2 ,k3 )
96

+ σ1k1 σ2k2 σ3k3 σ4k3 ,
4!

where 1 ≤ ji ≤ 4, and 5 ≤ ki ≤ 8 are distinct values.

H4 (X) H4 (X) = H8 (X) + 16σ 2 H6 (X) + 72σ 4 H4 (X) + 96σ 6 H2 (X) + 24σ 8 ,

where the coefficients are calculated by (4.26).


Case A.10
   
H6 X12 , X212 , X312 H2 (X4 ) = H8 X12 , X212 , X312 , X412
 
+ 4σ14 H6 X1 , X212 , X312 , X4 + 4σ24 H6 (X12 , X2 , X312 , X4 )
+ 4σ34 H6 (X12 , X212 , X3 , X4 )
   
+ 2σ14
2
H4 X212 , X312 + 2σ24 2
H4 (X12 , X312 ) + 2σ34
2
H4 X12 , X212
 
+ 8σ14 σ24 H4 X1, X2 , X312 + 8σ14 σ34 H4 (X1, X212 , X3 )
+ 8σ24 σ34 H4 (X12 , X2 , X3 ).

In particular
     
H6 (X1 ) H2 (X4 ) = H8 X16 , X412 + 12σ14 H6 X312 , X4 + 30σ14
2
H4 X414 .

Case A.11

EH2 (X1 ) H2 (X2 )H2 (X3 )H2 (X4 ) = 16 (σ12 σ13 σ24 σ34 + σ12 σ23 σ14 σ24

+σ23 σ13 σ24 σ14 + 4 σ13
2 2
σ24 + σ23
2 2
σ14 + σ12
2 2
σ34 .

This expectation corresponds to the term in (4.25).


Case A.12

H3 (X1:3 ) H3 (X4:6 ) = H6 (X1:6 )


     
+σ14 H4 X1:6\(1,4) + σ15 H4 X1:6\(1,5) + σ16 H4 X1:6\(1,6)
     
+σ24 H4 X1:6\(2,4) + σ25 H4 X1:6\(2,5) + σ26 H4 X1:6\(2,6)
     
+σ34 H4 X1:6\(3,4) + σ35 H4 X1:6\(3,5) + σ36 H4 X1:6\(3,6)
   
+H2 X1,4 (σ25 σ36 + σ26 σ35 ) + H2 X1,5 (σ24 σ36 + σ26 σ34 )
   
+H2 X1,6 (σ25 σ34 + σ24 σ35 ) + H2 X2,4 (σ15 σ36 + σ16 σ35 )
   
+H2 X2,5 (σ14 σ36 + σ16 σ34 ) + H2 X2,6 (σ15 σ34 + σ14 σ35 )
   
+H2 X3,4 (σ15 σ26 + σ16 σ25 ) + H2 X3,5 (σ14 σ26 + σ16 σ24 )
 
+H2 X3,6 (σ14 σ25 + σ15 σ24 )

+ σ1k1 σ2k2 σ3k3 .
3!

In particular
    2 H (X , X ) + 6σ 3
H3 (X1 ) H3 (X4 ) = H6 X13 , X413 + 9σ14 H4 X12 , X412 + 18σ14 2 1 4 14

(see also (4.26)), and

H3 (X)2 = H6 (X) + 9σ 2 H4 (X) + 18H2 (X) σ 4 + 6σ 6 .

A.5.2 T-Hermite Polynomials

H0 =1
H1 (X1 ) = X1
H2 (X1:2 ) = X1 ⊗ X2 − κ ⊗
1,2  
⊗1
H3 (X1:3 ) = X1:33 − κ ⊗ ⊗ −1
1,2 ⊗ X3 − X1 ⊗ κ 2,3 − K(3,2)S(d1:3 ) κ 1,3 ⊗ X

2

H4 (X1:4 ) = X⊗1 ⊗
1:4 − κ 1,2 ⊗ X
4 −1 ⊗
3 ⊗ X4 − K(3,2)S (d1:4 ) κ 1,3 ⊗ X2 ⊗ X4
   
−K−1 ⊗ −1
(4:2)S (d1:4 ) κ 1,4 ⊗ X2 ⊗ X3  − K(3,2)S X1 ⊗ κ 2,3 ⊗ X4

−K−1 ⊗
(4,3)S (d1:4 ) X1 ⊗ κ 2,4 ⊗  X3 − X1 ⊗ X2 ⊗ κ ⊗
 3,4  
+κ ⊗
1,2 ⊗ κ ⊗
3,4 + K −1
(3,2) (d 1:4 ) κ ⊗
1,3 ⊗ κ ⊗
2,4 + K −1 ⊗ ⊗
(4:2) (d1:4 ) κ 1,4 ⊗ κ 2,3 .
S S

Cycle notations are used here for permutations: K−1


(3,2) = K−1 −1
(1324) , K(4,3) =
S S
K−1
(1243) , K−1
(4:2)S = K−1
(1423) = K(1342) (see Sect. 1.1, p. 1).

Suppose that all X_j = X, and use the notation vec Var(X) = κ₂; then we have

H₀ = 1,
H₁(X) = X,
H₂(X) = X^{⊗2} − κ₂,
H₃(X) = X^{⊗3} − L⁻¹_{1¹,1²}(X ⊗ κ₂) = X^{⊗3} − 3X ⊗ κ₂,
H₄(X) = X^{⊗4} − L⁻¹_{2,H²}(κ₂ ⊗ X^{⊗2}) + L⁻¹_{2²} κ₂^{⊗2} = X^{⊗4} − 6κ₂ ⊗ X^{⊗2} + 3κ₂^{⊗2},
H₅(X) = X^{⊗5} − 10κ₂ ⊗ X^{⊗3} + 15κ₂^{⊗2} ⊗ X,
H₆(X) = X^{⊗6} − 15κ₂ ⊗ X^{⊗4} + 45κ₂^{⊗2} ⊗ X^{⊗2} − 15κ₂^{⊗3},
H₇(X) = X^{⊗7} − 21κ₂ ⊗ X^{⊗5} + 105κ₂^{⊗2} ⊗ X^{⊗3} − 105κ₂^{⊗3} ⊗ X,
H₈(X) = X^{⊗8} − 28κ₂ ⊗ X^{⊗6} + 210κ₂^{⊗2} ⊗ X^{⊗4} − 420κ₂^{⊗3} ⊗ X^{⊗2} + 105κ₂^{⊗4},

where the symmetry equivalence (=) is obtained by using symmetrizer Sd1k , noting
that left-hand sides are symmetric.
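A minimal numerical sketch of the third-order formula (my own helper functions, not the book's toolbox): for a standardized sample the average of H₃(X) = X^{⊗3} − 3X ⊗ κ₂ estimates the third T-cumulant vector κ_{X,3}.

import numpy as np

# Sketch: H3(X) = X^{kron 3} - 3 X kron kappa2 with kappa2 = vec Var(X) = vec I_d
# for a standardized sample; its sample mean estimates kappa_{X,3}.
rng = np.random.default_rng(3)
d, n = 2, 200_000
X = rng.exponential(size=(n, d)) - 1.0        # independent, mean 0, variance 1
kappa2 = np.eye(d).reshape(-1)
H3 = [np.kron(np.kron(x, x), x) - 3 * np.kron(x, kappa2) for x in X]
print(np.mean(H3, axis=0).round(2))           # approx 2 at the (i,i,i) entries, 0 elsewhere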
Products of H2 , cf. (4.74), p. 224.
Let all nj = 2, any number of arms dK be even, Ln,0 = {bi = (2i + 1, 2 (i + 1))
|i = 0 : (n − 1)}, and all dimensions be d.
Case A.13 L2,0 = {b1 = (1, 2) , (3, 4)}, dK = (0, 2, 4)
   
H2 X12 ⊗ H2 X212 = H2 (X1 , X1 ) ⊗ H2 (X2 , X2 )
   
= H4 X12 , X212 + K−1 −1 −1 −1 ⊗
(1324) + K(1423) + K(2314) + K(2413) κ 1,2
 
⊗H2 (X1:2 ) + K−1(1324) + K −1 ⊗2
(1423) κ 1,2 .

Case A.14 L3,0 = {b1 = (1, 2) , (3, 4) , (5, 6)}, dK = (0, 2, 4, 6)


     
H2 X12 ⊗ H2 X212 ⊗ H2 X312 = H2 (X1 , X1 )
⊗H2 (X2 , X2 ) ⊗ H2 (X3 , X3 )
 
= H6 X12 , X212 , X312
 
+ K−1
(1324) + K −1
(2314) + K −1
(1423) + K −1
(2314) + K −1 ⊗
(2413) κ 1,2 ⊗ H2 (X1:2 )
 
+ K−1
(1324) + K −1 ⊗2
(1423) κ 1,2 .

Case A.15 H3 ⊗ H3

H3 (X1:3 ) ⊗ H3 (X4:6 )) (A.20)


  
= H6 (X1:6 ) + K−1
(142356) κ ⊗
1,4 ⊗ H 4 X 1:6\1,4

  
+K−1
(152346) κ ⊗
1,5 ⊗ H 4 X 1:6\1,5
     
+K−1
(162345) κ ⊗
1,6 ⊗ H 4 X 1:6\1,6 + K −1
(241356) κ ⊗
2,4 ⊗ H 4 X 1:6\2,4
     
+K−1 ⊗
(251346) κ 2,5 ⊗ H4 X1:6\2,5 + K−1 ⊗
(261345) κ 2,6 ⊗ H4 X1:6\2,6
     
+K−1 ⊗
(341256) κ 3,4 ⊗ H4 X1:6\3,4 + K−1 ⊗
(351246) κ 3,5 ⊗ H4 X1:6\3,5
  
+K−1
(361245) κ ⊗
3,6 ⊗ H 4 X 1:6\3,6
      
+ K−1(253614) κ ⊗
2,5 ⊗ κ ⊗
3,6 + K −1
(263514) κ ⊗
2,6 ⊗ κ ⊗
3,5 ⊗ H2 X1,4
      
+ K−1 ⊗ ⊗ −1 ⊗
(243615) κ 2,4 ⊗ κ 3,6 + K(263415) κ 2,6 ⊗ κ 3,4

⊗ H2 X1,5
      
+ K−1 ⊗ ⊗ −1 ⊗
(243516) κ 2,4 ⊗ κ 3,5 + K(253416) κ 2,5 ⊗ κ 3,4

⊗ H2 X1,6
      
+ K−1(153624) κ ⊗
1,5 ⊗ κ ⊗
3,6 + K −1
(163524) κ ⊗
1,6 ⊗ κ ⊗
3,5 ⊗ H2 X2,4
      
+ K−1(143625) κ ⊗
1,4 ⊗ κ ⊗
3,6 + K −1
(163425) κ ⊗
1,6 ⊗ κ ⊗
3,4 ⊗ H2 X2,5
      
+ K−1(143526) κ ⊗
1,4 ⊗ κ ⊗
3,5 + K −1
(153426) κ ⊗
1,5 ⊗ κ ⊗
3,4 ⊗ H2 X2,6
      
+ K−1 ⊗ ⊗ −1 ⊗
(152634) κ 1,5 ⊗ κ 2,6 + K(162534) κ 1,6 ⊗ κ 2,5

⊗ H2 X3,4
      
+ K−1 ⊗ ⊗ −1 ⊗
(142635) κ 1,4 ⊗ κ 2,6 + K(162435) κ 1,6 ⊗ κ 2,4

⊗ H2 X3,5
      
+ K−1(142536) κ ⊗
1,4 ⊗ κ ⊗
2,5 + K −1
(152436) κ ⊗
1,5 ⊗ κ ⊗
2,4 ⊗ H2 X3,6
   
+K−1
(142536) κ ⊗
1,4 ⊗ κ ⊗
2,5 ⊗ κ ⊗
3,6 + K −1
(142635) κ ⊗
1,4 ⊗ κ ⊗
2,6 ⊗ κ ⊗
3,5
   
+K−1 ⊗ ⊗ ⊗ −1 ⊗
(152436) κ 1,5 ⊗ κ 2,4 ⊗ κ 3,6 + K(152634) κ 1,5 ⊗ κ 2,6 ⊗ κ 3,5
⊗ ⊗

   
+K−1 ⊗ ⊗ ⊗ −1 ⊗ ⊗
(162435) κ 1,6 ⊗ κ 2,4 ⊗ κ 3,5 + K(162534) κ 1,6 ⊗ κ 2,5 ⊗ κ 3,6 .

We have Cum2 (H3 (X1:3 ) ⊗ H3 (X4:6 )) = EH3 (X1:3 ) ⊗ H3 (X4:6 ), and


 
EH3 (X1:3 ) ⊗ H3 (X4:6 ) = K−1
(142536) κ ⊗
1,4 ⊗ κ ⊗
2,5 ⊗ κ ⊗
3,6
 
+K−1
(142635) κ ⊗
1,4 ⊗ κ ⊗
2,6 ⊗ κ ⊗
3,5 (A.21)
 
+K−1 ⊗ ⊗ ⊗
(152436) κ 1,5 ⊗ κ 2,4 ⊗ κ 3,6

 
+K−1
(152634) κ ⊗
1,5 ⊗ κ ⊗
2,6 ⊗ κ ⊗
3,5
 
+K−1
(162435) κ ⊗
1,6 ⊗ κ ⊗
2,4 ⊗ κ ⊗
3,5
 
+K−1 ⊗ ⊗ ⊗
(162534) κ 1,6 ⊗ κ 2,5 ⊗ κ 3,6

(see (4.15)); hence,


⊗3
E [H3 (X) ⊗ H3 (X)] = M−1
m3 κ X,2 ,

see (A.7) for M−1


m3 .

A.6 Function G

Double Factorial (Semifactorial)


For odd n,

n!! = n (n − 2) ⋯ 3 · 1,

(2n)! = 2ⁿ n! (2n − 1)!!,    (A.22)

and

Γ(k + 1/2) = √π (2k − 1)!! / 2^k.    (A.23)
Pochhammer’s Symbol

(a)_k = Γ(a + k) / Γ(a)    (A.24)

(see [DLM, 5.2.5 Pochhammer’s Symbol]); Γ(d/2 + 2m) / Γ(d/2) = (d/2)_{2m}.


Pochhammer’s k-Symbol

p (p + 2) · · · (p + 2 (k − 1)) = (p)k,2 .

Definition: G₀(p) = 1, and for n > 0,

G_n(p) = Γ((p + n)/2) / Γ(p/2) =  2^{−k} (p)_{k,2}             if n = 2k,
                                  2^{−k} (p + 1)_{k,2} G₁(p)    if n = 2k + 1.

In particular:
:√ √
 (d/2 + 1/2) π 2(2k−1)!!
k (k−1)! =
(2k−1)!!
π 2(2k−2)!! d = 2k
G1 (d) = = k
 (d/2) k
= √ k!2 = √ (2k)!! d = 2k + 1
G1 (2k) π(2k−1)!! π(2k−1)!!
(A.25)

G2k (p) = (p/2)k = 2−k p (p + 2) · · · (p + 2 (k − 1)) = 2−k (p)k,2 , (A.26)

 ((p + 1) /2 + k)
G2k+1 (p) =
 (p/2)
= (p + 1) /2 ((p + 1) /2 + 1) · · · ((p + 1) /2 + k − 1) G1 (p)
= 2−k (p + 1) (p + 3) · · · (p + 2k − 1) G1 (p)
= 2−k (p + 1) (p + 1 + 2) · · · (p + 1 + 2 (k − 1))
= 2−k (p + 1)k,2 G1 (p) .
(A.27)

Let p > k, and then G−k (p) is well defined and follows:

2
G−1 (p) = G1 (p) ,
p−1
1 2
G−2 (p) = = ,
p/2 − 1 p−2
2
G−3 (p) = G−1 (p) ,
p−3
1 4
G−4 (p) = = ,
(p/2 − 2) (p/2 − 1) (p − 4) (p − 2)
22
G−5 (p) = G−1 (p) .
(p − 3) (p − 5)

In general

2k 2k
G−2k (p) = = ,
(p − 2k) · · · (p − 4) (p − 2) (p − 2k)k,2
1 ≤ k < p/2, (A.28)
2k G 1 (p) 2k G 1 (p)
G−(2k−1) (p) = = ,
(p − (2k − 1)) · · · (p − 3) (p − 1) (p − (2k − 1))k,2
1 ≤ k < p/2.
(A.29)
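A quick numerical check of these identities (my own sketch):

from math import gamma, prod

# Check of G_{2k}(p) = Gamma((p+2k)/2)/Gamma(p/2) = 2^{-k} (p)_{k,2}   (A.26)
# and of G_{-2k}(p) = 2^k / (p-2k)_{k,2}                               (A.28).
G = lambda k, p: gamma((p + k) / 2) / gamma(p / 2)
poch2 = lambda a, k: prod(a + 2 * j for j in range(k))     # (a)_{k,2}

p, k = 5, 2
print(G(2 * k, p), 2.0**(-k) * poch2(p, k))                # 8.75  8.75
print(G(-2 * k, p), 2.0**k / poch2(p - 2 * k, k))          # 1.333...  1.333...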

A.6.1 Moments, Cumulants for Skew-t Generator R

Moments


⎪ 1
 p n/2 ⎪
⎨ n = 2k
(p − 2k)k,2
μRp ,n = G−n (p) = pk . (A.30)
2 ⎪

G−1 (p)
n = 2k + 1

(p − 2k − 1)k,2

In particular:

pk
μRp ,2k = ,
(p − 2k) · · · (p − 4) (p − 2)

and
7 7 √
p p 2 2p
μRp = G−1 (p) = G1 (p) = G1 (p) ,
2 2 p−1 p−1
p 2 p
μRp ,2 = = ,
2 p−2 p−2
 p 3/2 2 7
p p p
μRp ,3 = G−1 (p) = G−1 (p) = μR ,
2 p−3 p−3 2 p−3 p
 p 3/2 √
22 p 2p
= G1 (p) = G1 (p) ,
2 (p − 1) (p − 3) (p − 3) (p − 1)
 p 2 4 p
μRp ,4 = = μR ,2 ,
2 (p − 4) (p − 2) p−4 p

 p 5/2 22 p2
μRp ,5 = G−1 (p) = μR ,
2 (p − 3) (p − 5) (p − 3) (p − 5) p
 p 3 4
μRp ,6 = .
2 (p − 4) (p − 2)
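A simulation check of the even moments (my own sketch; the assumption that the generator satisfies p/R_p² ~ χ²_p, i.e. R_p = √p / χ_p, is consistent with μ_{R_p,2} = p/(p − 2) above):

import numpy as np

# Check of mu_{R_p,2k} = p^k / ((p-2k)(p-2k+2)...(p-2)) with R_p = sqrt(p)/chi_p.
rng = np.random.default_rng(4)
p = 30
R = np.sqrt(p / rng.chisquare(p, size=10**6))
for k in (1, 2, 3):
    exact = p**k / np.prod([p - 2 * j for j in range(1, k + 1)])
    print(k, round((R**(2 * k)).mean(), 3), round(exact, 3))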

Cumulants
k 
 
2k − 1
κRp ,2k = μRp ,2k − κRp ,2k−2j +1 μRp ,2j −1
2j − 1
j =1

k−1 
 
2k − 1
− κRp ,2k−2j μRp ,2j
2j
j =1

k 
 
pk 2k − 1 pj
= − κRp ,1 κRp ,2k−2j +1
(p − 2k)k,2 2j − 1 (p − (2j − 1))j,2
j =1

k−1 
 
2k − 1 pj
− κRp ,2k−2j .
2j (p − 2j )j,2
j =1

In particular

p  p 2 
κRp ,2 = G−2 (p) − (G−1 (p))2 = − (G−1 (p))2 ,
2 2 p−2

 
p2 1 1
κRp2 ,2 = μRp ,4 − μ2Rp ,2 = −
p−2 p−4 p−2
 p 2 8 2p2
= = , (A.31)
2 (p − 4) (p − 2)2 (p − 4) (p − 2)2

 p 3/2 2  p 1/2 p 2
κRp ,Rp2 = G−1 (p) − G−1 (p)
2 p−3 2 2 p−2
 p 3/2 2
= G−1 (p) ,
2 (p − 2) (p − 3)
 p 2  4 2

2
κRp ,Rp3 = μRp ,4 − μRp μRp ,3 = − G (p) ,
2 (p − 4) (p − 2) p − 3 −1

1 2 p2
μRp ,4 − κRp ,Rp3 = μRp μRp ,3 = G−1 (p) ,
2 p−3

κRp ,3 = μRp ,3 − 3μRp ,1 μRp ,2 + 2μ3Rp ,1


 p 3/2  
2 2
=− G−1 (p) −3 + 2G2−1 (p)
2 p−3 p−2
 p 3/2  
2p − 7
=2 G−1 (p) G−1 (p) −
2
,
2 (p − 2) (p − 3)

κRp ,4 = μRp ,4 − 4μRp ,3 μRp − 3μ2Rp ,2 + 12μRp ,2 μ2Rp − 6μ4Rp


  2
 p 2 4 8 2
2
= − G (p) − 3
2 (p − 4) (p − 2) p − 3 −1 p−2

2
+12 G2−1 (p) − 6G4−1 (p)
p−2
 p 2  −2p + 10 2p − 7

= −8 G (p) − 6G−1 (p) ,
2 4
2 (p − 2)2 (p − 4) (p − 2) (p − 3) −1

κRp ,Rp ,Rp2 = μRp ,4 − μ2Rp ,2 − 2μRp μRp ,3 + 2μ2Rp μRp ,2


  2
 p 2 4 2
= −
2 (p − 4) (p − 2) p−2

2 2
−2 G2−1 (p) + 2 G2−1 (p)
p−3 p−2
 p 2  8 4

2
= − G −1 (p)
2 (p − 2)2 (p − 4) (p − 2) (p − 3)
 p k/2  ((p − k) /2)  p k/2
μRp ,k = = G−k (p) .
2  (p/2) 2

Moments and Cumulants of χ Distribution with Degrees of Freedom p

μχp ,k = 2k/2 Gk (p)

and

μχp ,2k = p (p + 2) · · · (p + 2 (k − 1)) = (p)k,2 ,


μχp ,2k+1 = (p + 1) (p + 3) · · · (p + 2k − 1) μχp ,1 = (p + 1)k,2 μχp ,1 .

In particular

μχp ,1 = κχp ,1 = 2G1 (p)

μχp ,2 = p,

κχp ,2 = p − κχ2p ,1 ,

μχp ,3 = (p + 1) κχp ,1 ,

κχp ,3 = (p + 1) κχp ,1 − 3pκχp ,1 + 2κχ3p ,1 = (−2p + 1) κχp ,1 + 2κχ3p ,1


   
= −κχp ,1 p − κχ2p ,1 + κχp ,1 = κχp ,1 1 − 2κχp ,2
 
= (p + 1) κχp ,1 − 2κχp ,2 κχp ,1 − pκχp ,1 = κχp ,1 1 − 2κχp ,2 ,

μχp ,4 = p (p + 2) ,

κχp ,4 = p (p + 2) − 4 (p + 1) κχ2p ,1 − 3p2 + 12pκχ2p ,1 − 6κχ4p ,1

= −2p2 + 2p + 4 (2p − 1) κχ2p ,1 − 6κχ4p ,1

= p (p + 2) − 3κχp ,3 κχp ,1 − 3pκχp ,2 − (p + 1) κχ2p ,1 ,

μχp ,5 = (p + 1) (p + 2) (p + 3) κχp ,1 ,
κχp ,5 = (p + 1) (p + 2) (p + 3) κχp ,1 − 4κχp ,4 − 6pκχp ,3
−4 (p + 1) κχp ,2 κχp ,1 − p (p + 2) κχp ,1
  
= κχp ,1 (p + 1) (p + 2) (p + 3) − 4κχp ,4 − 6p 1 − 2κχp ,2

−4 (p + 1) κχp ,2 − p (p + 2) ,


2k−1
2k − 1

κχp ,2k = μχp ,2k − κχp ,2k−j μχp ,j
j
j =1

k 
 
2k − 1
= μχp ,2k − κχp ,2k−2j +1 μχp ,2j −1
2j − 1
j =1

k−1 
 
2k − 1
− κχp ,2k−2j μχp ,2j
2j
j =1

k 
 
2k − 1
= (p)2k,2 − κχp ,1 κχp ,2k−2j +1 (p − 1)j,2
2j − 1
j =1

k−1 
 
2k − 1
− κχp ,2k−2j (p)j,2 ,
2j
j =1

2k  
 2k
κχp ,2k+1 = μχp ,2k+1 − κχp ,2k−j +1 μχp ,j
j
j =1

k  
 2k
= μχp ,2k+1 − κχp ,2k−2j +1 μχp ,2j
2j
j =1

k 
 
2k
− κχp ,2(k−j +1) μχp ,2j −1
2j − 1
j =1

k  
 2k
= κχp ,1 (p)2k+1,2 − κχp ,2k−2j +1 (p)2j,2
2j
j =1

 k  
2k
−κχp ,1 κχp ,2(k−j +1) (p)2j −1,2 ,
2j − 1
j =1
2k 
 
2k
κχp ,2k+1 = μχp ,2k+1 − κχp ,j μχp ,2k−j +1 .
j −1
j =1

A.6.2 Moments of Beta Powers


 m
βj j The random variable β12 has Beta(p/2, d/2) distribution and the random
 m /2  m /2
variable β2 is so that β22 = 1 − β12 . We have Eβ1m1 β2m2 = E β12 1 β22 2 =
 m /2  m /2
E β12 1 1 − β12 2 . If the numbers m1 and m2 are such that both p + m1 and
d + m2 are positive, then
4 1
1
Eβ1m1 β2m2 = x p/2+m1 /2−1 (1 − x)d/2+m2 /2−1 dx (A.32)
B (p/2, d/2) 0
B ((p + m1 ) /2, (d + m2 ) /2)
= ,
B (p/2, d/2)

where B (·) denotes the β-function. We have

B ((p + m1 ) /2, (d + m2 ) /2)  ((p + m1 ) /2)  ((d + m2 ) /2)  (p/2 + d/2)


=
B (p/2, d/2)  ((p + m1 ) /2 + (d + m2 ) /2)  (p/2)  (d/2)
 ((p + m1 ) /2)  ((d + m2 ) /2)
=
 (p/2)  (d/2)
 ((p + d) /2)
 ((p + d + m1 + m2 ) /2)
= Gm1 (p) Gm2 (d) Gm1 +m2 (d + p) ,

where G_m(p) = Γ((p + m)/2)/Γ(p/2) (see (5.79)). Let us denote G_{m₁}(p) G_{m₂}(d) G_{m₁+m₂}(d + p) by G_{m₁,m₂}(p, d) for short; then the expected value of β₁^{m₁} β₂^{m₂} is

Eβ1m1 β2m2 = Gm1 ,m2 (p, d) , (A.33)

where again

B ((p + m1 ) /2, (d + m2 ) /2)


Gm1 ,m2 (p, d) = = Gm1 (p) Gm2 (d) Gm1 +m2 (d + p) .
B (p/2, d/2)
(A.34)
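A simulation check of (A.32)–(A.33) (my own sketch; β₁² is drawn directly from the Beta(p/2, d/2) distribution):

import numpy as np
from math import gamma

# Check of E beta1^m1 beta2^m2 = B((p+m1)/2, (d+m2)/2) / B(p/2, d/2),
# where beta1^2 ~ Beta(p/2, d/2) and beta2^2 = 1 - beta1^2.
B = lambda a, b: gamma(a) * gamma(b) / gamma(a + b)
p, d, m1, m2 = 5, 3, 2, 3
rng = np.random.default_rng(5)
b1 = np.sqrt(rng.beta(p / 2, d / 2, size=10**6))
b2 = np.sqrt(1.0 - b1**2)
print(round((b1**m1 * b2**m2).mean(), 4),
      round(B((p + m1) / 2, (d + m2) / 2) / B(p / 2, d / 2), 4))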

Using conditional cumulants of products of independent variates, one can express mixed moments and cumulants of β_j^{m_j} and R in terms of the moments and cumulants of β_j^{m_j} and R; see (3.70), (3.71), and Exercise 3.21. For instance,

κβ 2 ,β1 = μβ1 ,3 − μβ1 μβ1 ,2 ;


1

moreover,

κβ 2 R 2 ,β1 R = μβ1 ,3 μR,3 − μβ1 ,1 μβ1 ,2 μR,1 μR,2 , (A.35)


1

κβ1 R,β 2 R 2 = μβ1 ,β 2 μR,3 − μ2β1 μβ2 ,2 μR,1 μR,2 ,


2 2

κβ1 R,3 = μβ1 ,3 κR,3 + 3κβ 2 ,β1 κR,2 κR,1 + μ3R,1 κβ1 3 ,
1

= μβ1 ,3 μR,3 − 3μβ1 μβ1 ,2 μR,1 μR,2 + μ3R,1 μ3β1 ,

κβ 3 R 3 ,β1 R = μβ1 ,4 μR,4 − μβ1 ,3 μR,3 μβ1 μR


1

= μR,4 κβ 3 ,β1 + μβ1 ,3 μR,3 κR 3 ,R , (A.36)


1

κβ 2 R 2 ,2 = μβ1 ,4 μR,4 − μ2β 2 ,2 μ2R 2 ,2 ,


1 1

κβ 2 R 2 ,β1 R,β1 R = μβ1 ,3 κR 2 ,R,R + 2κβ 2 ,β 2 κR 2 ,1 κR,2 + 2κβ 3 ,β1 κR 2 ,R κR,1


1 1 1 1

+κβ 2 ,β1 ,β1 μR 2 μ2R ,


1

κβ 3 R 3 ,β1 R = μβ1 ,4 μR,4 − μβ1 ,3 μR,3 μβ1 μR ,


1

κβ 2 R 2 ,2 = μβ1 ,4 μR,4 − μ2β 2 ,2 μ2R 2 ,2 ,


1 1

κβ 2 R 2 ,β1 R,β1 R = κβ 2 ,β1 ,β1 μR 2 μ2R + 2κβ 2 ,β 2 κR,2


2
+ 2κβ 3 ,β1 κR 2 ,R κR,1 + μ3β1 κR 2 ,R,R ,
1 1 1 1 1

κβ 2 R 2 ,2 = μβ2 ,4 μR,4 − μ2β 2 ,2 μ2R 2 ,2 ,


2 2

κβ 2 R 2 ,β 2 R 2 = μβ 2 ,β 2 μR,4 − μβ1 ,2 μβ2 ,2 μ2R,2 ,


1 2 1 2

2
κβ 2 R 2 ,β1 R,β1 R = μβ 2 β 2 ,3 κR 2 ,R,R + 2κβ 2 ,β 2 κR,2
2 2 1 2 1

+2κβ1 β 2 ,β1 κR 2 ,R κR,1 + μR 2 μ2R κβ 2 ,β1 ,β1 .


2 2

A.7 Complementary Error Function

Complementary error function (probabilist’s complementary error function) is


defined by
7 4 7 4
2 z 2 ∞
−x 2 /2
e−x
2 /2
Erfc (z) = 1 − e dx = , z > 0.
π 0 π z

Note that the complementary error function erfc is defined by [DLM, 7.2.1] as

erfc(z) = 1 − (2/√π) ∫₀^z e^{−x²} dx,

such that erfc(z/√2) = Erfc(z).

We have
7
d  z2 /2 
z2 /2 2
e Erfc (z) = ze Erfc (z) − ,
dz π
d 2  z2 /2  2 2
2
e Erfc (z) = ez /2 Erfc (z) + z2 ez /2 Erfc (z)
dz

7   2 7
2 2
−z = z + 1 e Erfc (z) − z
2 z /2
,
π π
d 3  z2 /2    2  7 2
e Erfc (z) = z + 3z e
3 z /2
Erfc (z) − z + 2
2
,
dz3 π

and

d 4  z2 /2      2  7 2
e Erfc (z) = z z + 3z + 3z + 3 e
3 2 z /2
Erfc (z) − z + 3z
3
dz4 π
  2  7
 2
= z4 + 6z2 + 3 ez /2 Erfc (z) − z3 + 5z .
π
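A numerical check of the first of these derivative identities (my own sketch; the probabilist's Erfc is obtained from the standard-library math.erfc as Erfc(z) = erfc(z/√2)):

from math import erfc, exp, sqrt, pi

# Check of d/dz [ e^{z^2/2} Erfc(z) ] = z e^{z^2/2} Erfc(z) - sqrt(2/pi),
# with Erfc(z) = erfc(z/sqrt(2)) the probabilist's complementary error function.
Erfc = lambda z: erfc(z / sqrt(2))
f = lambda t: exp(t**2 / 2) * Erfc(t)
z, h = 0.7, 1e-6
print(round((f(z + h) - f(z - h)) / (2 * h), 6), round(z * f(z) - sqrt(2 / pi), 6))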

We define functions Hm and Fm by


7
d m  z2 /2 
z2 /2 2
e Erfc (z) = Hm (z) e Erfc (z) − Fm (z)
 
,
dzm π

and we have the following recursions H0 (z) = 1, F0 (z) = 0,

H1 (z) = z, F1 (z) = 1, (A.37)


H₂(z) = z² + 1,  F₂(z) = z,
H₃(z) = z³ + 3z,  F₃(z) = z² + 2,
H₄(z) = z⁴ + 6z² + 3,  F₄(z) = z³ + 5z.

Therefore, the general solution follows:

d 
Hm (z) = zHm−1

(z) + H (z)
dz m−1

n−1 k
d
Fm (z) = H (z) .
dzk m−1−k
k=0

Mill’s ratio (probabilist’s Mill’s ratio) is defined by


7
z2 /2 2
M (z) = e Erfc (z) .
π

We can obtain it from the definition in [DLM, 7.8.1], which is


4 ∞
z2
e−z dz;
2
M (z) = e /

z

 √ 
hence, M z/ 2 = M (z). Indeed

7  √ 
z2 /2 2
M (z) = e Erfc z/ 2
π
1  √  2
= 2Erfc
z/ 2 = Erfc (z) .
√1 e−x /2 √1 e −x /2
2 2
2π 2π

A.8 Derivatives of i-Mill’s Ratio

i-Mill’s ratio is defined by

ϕ (ix)
ρiM (x) = .
 (ix)

We have derivatives as follows:


(1)
ρiM = (x − iρiM ) ρiM ,
(2) (1)
ρiM = (x − 2iρiM ) ρiM + ρiM ,
 
(3) (2) (1) (1)
ρiM = (x − 2iρiM ) ρiM + 2ρiM 1 − iρiM ,
 
(4) (3) (2) (1)
ρiM = (x − 2iρiM ) ρiM + 3ρiM 1 − 2iρiM ,
 
(5) (4) (3) (1) (2)
ρiM = (x − 2iρiM ) ρiM + 4ρiM 1 − 2iρiM − 6iρiM ,
 
(6) (5) (4) (1) (2) (3) (2) (3)
ρiM = (x − 2iρiM ) ρiM + 5ρiM 1 − 2iρiM − 12iρiM ρiM − 8iρiM ρiM
 
(5) (4) (1) (2) (3)
= (x − 2iρiM ) ρiM + 5ρiM 1 − 2iρiM − 20iρiM ρiM ,
   
(7) (6) (5) 1 (4) (2) (3)2 (2) (4)
ρiM = (x − 2iρiM ) ρiM + 6ρiM 1 − 2iρiM − 10iρiM ρiM − 20i ρiM + ρiM ρiM
   
(6) (5) (1) (4) (2) (3)2
= (x − 2iρiM ) ρiM + 6ρiM 1 − 2iρiM − 10i 3ρiM ρiM + 2ρiM .

Moments of |Z|, Z ∈ N (0, 1) are


7
2
μ|Z|,2k+1 = E |Z|2k+1 = (2k)!! ,
π
μ|Z|,2k = E |Z|2k = (2k − 1)!!.

In particular

k          1        2    3         4    5         6     7          8
μ_|Z|,k    √(2/π)   1    2√(2/π)   3    8√(2/π)   15    48√(2/π)   105

Cumulants of |Z| in terms of i-Mill’s ratio are given by

(k−1)
κ|Z|,k = (−i)k−1 ρiM (0) , (A.38)
(k−1)
ρiM (0) = (i)k−1 κ|Z|,k .

In particular we have

k 1 2 3 4 5 6
/ / 3 /  2 / 5  3
κ|Z|,k 2
π 1− 2
π 2 2
π − 2
π −6 2
π + 4 π2 24 2
π − −120 π2 +
/ 3  2
120 π2 −
20 2
π +
/ 28 π2
3 π2

k 7 8
 7/2  5/2  3/2  4  3  2
κ|Z|,k 720 π2 −840 2
π +266 2
π − −5040 2
π +6720 2
π −2688 2
π +
/
63 π2 288 π2
Notations

The following notations are commonly used.


Rd Euclidean space of dimension d consisting of d × 1 real
vectors.
k:n Colon operator. k : n denotes consecutive integers from
k to n (just like in MATLAB, R, Python etc., operator
“:” creates subscript arrays typically), if k < n, then
k : n = [k, k + 1, . . . , n], and if k > n, then integers
are in decreasing order k : n = [k, k − 1, . . . , n],
k : n is an ordered set in general, and (k : n) is a multi-
index.
!!           Semifactorial (double factorial): (2k − 1)!! = 1 · 3 · 5 ⋯ (2k − 1) = (2k)!/(2^k k!) for an odd number, and (2k)!! = 2 · 4 · 6 ⋯ (2k) for an even number (if exceptionally (2k)!! = 0, then it will be stated clearly); instances are (1.51), (1.56), and (4.23).
(·)S Cycle notation for a permutation p: (j1 , j2 , . . . , jk )S ,
for instance, cycle (j + 1, j )S transposes two adjacent
elements j and j + 1, cf. Sect. 1.1.
1d Vector with coordinates 1s only, i.e. 1d = [1, 1, . . . , 1].
X, Y , Z Random variables, capitals.
a, X Bold face letters: vectors (column), a is a real vector, X ∈
Rd is a vector variate.
e k ∈ Rd Unit axis vector. #d
i⊗
d,m Vector i⊗ d,m =
⊗m
k=1 ek .
A, B Bold face capital letters: matrices with dimensions m × n
and p × q, respectively, say.
Id Unit matrix with dimension d.
[X1 , X1 , X2 , X2 , X2 , X3 , X3 ] = X[1,1,2,2,2,3,3] = X12 ,213 ,312 .


Xa (d) List/array of vectors Xak , k = 1 : n, with possible different


dimensions, for example X1:n (d) = (X1 , X2 , . . . , Xn ),
where d = d1:n = (d1 , d2 , . . . , dn ) , and di is the
dimension of vector/ Xi .
‖X‖          Norm of X, ‖X‖ = (Σᵢ Xᵢ²)^{1/2}.
 Transpose, upperscript.
⊗ Tensor (direct, Kronecker) product, T-product for short.
⊗ upperscript: T-exponent, for instance A⊗2 = A ⊗ A.
 Hadamard product.
Md,q Multilinear algebra.
Sd,q Set of symmetric tensors, linear subspace of Md,q .
Sd1q Symmetrizer matrix

= Equivalence in symmetry.  
ŋd,q         Dimension of Sd,q : ŋd,q = \binom{d+q−1}{q}.
j1:q j1:q = 1 , . . . , q : type of either multi-index j1:q
(Remark 1.5, p. 20) or type of a partition (Definition 1.4,
p. 32).
wÐ Distinct entries of w ∈ Sd,q .
Qd,q q-plication matrix; Qd,2 duplication, Qd,3 triplication,
Qd,4 quadruplication (see (1.31), p. 21).
Q+
d,q Elimination matrix (q-way elimination matrix) (see (1.32),
p. 21).
ωd,q norm-weights (1.33), p. 22.
Dx f Jacobian matrix.
Dx⊗ Operator: T-derivative (Dx⊗ precedes both the matrix and
T-products)
vec Vector operator, applies for matrices column wise, sim-
ilarly

 for a list of vectors is defined as vecX1:n (d) =
  
X1 , X2 , . . . , Xn ; it is a column vector with dimension
d1:n .
Short notations: vec⊗2 A = (vecA)⊗2 , vec A = (vecA) .
tr Trace of a matrix
det Determinant of a matrix #
Xa1:n Sum, associated with an index set: Xa1:n = nj=1 Xaj
Xa1:n Product
n associated with an index set: Xa1:n =
j =1 Xa j.
#
a x Inner product: a x = dk=1 ak xk .
 k
Xk Exponent of a vector by a vector: Xk = dj=1 Xj j , where
k = [k1 , k2 , . . . , kd )], and 
X = [X1 , X2 , . . . , Xd ], for example Xr1d = dj=1 Xjr .
 
N μ, σ 2 Normal distribution with mean μ and variance σ 2 ,
N (μ, ) multivariate normal distribution with mean
vector μ and variance matrix .


μ(a) μ(a) = E nj=1 Xaj : expected value of the product of ran-
dom variables according to indexset a = (a1 , a2 , . . . , an ),
in particular μ(1:n) = EX1n = E nk=1 Xk .
μX,n μX,n = EXn . 
μ⊗
a (d) μ⊗ (d) = E ⊗n j =1 Xaj : expected value (with dimension
a 
d = dj ) of the T-product of random variables Xaj ,

μ⊗ ⊗
a (d) = μa forshort, in particular μ1:n (d) = EX1 ⊗
⊗n
X2 · · · ⊗ Xn = E j =1 Xj .
μ⊗ μ⊗X,n = EX .
⊗n
X,n 
κa κa = Cum Xa1 , . . . , Xan , cumulant of random variables
according to index set a = (a1 , a2 , . . . , an ), in particular
κ(1:n) = Cum (X1:n ).
κ⊗
a (d) κ⊗a (d) = Cum (Xa ), nth order T-cumulant of the list
of vectors Xak , k = 1 : n, for example κ ⊗ 1:n (d) =
Cum (X1:n ), κ ⊗ 1:n (d) = κ
1:n

for short, κ ⊗
1:n (d) is a vector
with dimension d = dj .
κ⊗
X,n κ⊗X,n = Cumn (X).
κ⊗
1,2 Covariance vector (second-order cumulant) κ ⊗ 1,2 =
vecCov (X2 , X1 ), where Cov (X2 , X1 ) denotes covariance
matrix.
Hn Hermite polynomial with degree n.
Hn T-Hermite polynomial
  with degree n.
Bracketing Parentheses:
  a set of ordered elements. Curly brackets:
a set
of elements without any particular order. Square
brackets: elements constitute either a vector or a matrix.
Partitions, L ∈ Pn Pn is the set of all partitions of the numbers 1 : n =
(1, 2, . . . , n), a partition L ∈ Pn contains blocks L =
{b1 , b2 , . . . , bk } so that blocks b1 , b2 , . . . , bk are disjoint
and ∪bj = {1, 2, . . . , n}. L{k} ∈ Pn is a partition with size
k, K{r|} ∈ Pn , is a partition with size r and type .
Partitions PII
n PII
n is the set of all partitions K of the pairs of the set
II

(1, 2, . . . n).
Partitions PI,II
n PI,II
n is the set of all partitions K
I,II ∈ PI,II . Blocks of K I,II
n
include either 1 or 2 elements.
Permutations, p ∈ Pn Pn denotes the set of all permutations of the numbers
1 : n = (1, 2, . . . , n) , if p ∈ Pn then p (1 : n) =
(p (1) , p (2) , . . . , p (n)), pq: product of two permutations,
p × q: consecutive application of permutations.
Repetition Vector having same components: [d, d, d, d, d] = d1k ,
+ ,- .
k
replicating value d k-times;

Subscriptions If subscript is a vector (ordered) of integers, say a1:n =


a[1:n] = [a1 , a2 , . . . , an ], ak is the kth element of a1:n ,
then Xa1:n denotes a vector with
components indexed
by
the elements of set a[1:n] , i.e. Xa1 , Xa2 , . . . , Xan .
Solutions

Chapter 1

1.11
 
a1 ⊗ Bb = vec Bba1 = (a1 ⊗ B) b,
 
Bb ⊗ a2 = vec a2 b B = (B ⊗ a2 ) b.

1.15
       2
vec⊗2 Id a⊗4 = vec Id a⊗2 vec Id a⊗2 = a a =a⊗2 a⊗2 = a⊗4 vecId 2 ,

       
vec Id ⊗ Id 2 a⊗4 = vec Id a⊗2 ⊗ a⊗2 = Id 2 a⊗2 ⊗ vec Id a⊗2 = Id 2 ⊗ vec Id a⊗4 .

1.16
   
vec AA AA = AA ⊗ AA (vecId ) = A⊗2 A⊗2 (vecId )
 
= Id 2 A⊗2 A⊗2 (vecId ) = (vecId ) ⊗ Id 2 A⊗4 .

1.17

K(234)S (d1:4 ) = K(23)S (d1 , d2 , d3 d4 ) = Id1 ⊗ K(2,1)S (d2 , d3 d4 ) = Id1 ⊗ Kd3 d4 •d2 .


1.19
 
K−1
(132) d, d 2
, d = K−1
(1423) = K(1342)

   
K−1 −1
(12) ⊗ Id K(12) d, d 2 , d = K(2134)K(3124) = K(1324).
S S

K−1 −1 −1
(2134) K(1423) = K(1324) .

1.23
      
vec b ⊗ a ⊗ b ⊗ a = vec b ⊗ a ⊗ b ⊗ a
     
= Im ⊗ Km•p ⊗ Ip vec b ⊗ a ⊗ vec b ⊗ a
 
= Im ⊗ Km•p ⊗ Ip (b ⊗ a)⊗2 .

1.29
     
vec A ⊗ A = K(1324) vecA ⊗ vecA = K(1324)K(1243)K(1324)vec A ⊗ A
 
= K(1324)K(1342)vec A ⊗ A
 
= K(1432)vec A ⊗ A

     
vec A ⊗ A = K(1324) vecA ⊗ vecA = K(1324)K(2134)K(1324)vec A ⊗ A
 
= K(1324)K(3124)vec A ⊗ A
 
= K(3214)vec A ⊗ A .

1.33


d   ⊗2
vecId = e⊗2
j , vec⊗3 Id = e⊗2 ⊗2 ⊗2
j ⊗ ek ⊗ em , vecId 3 = ej ⊗ ek ⊗ em .
j =1 j,k,m j,k,m

Rearanging vec⊗3 Id by K(135246) we get K(135246)vec⊗3 Id = vecId 3 , and


K−1
(142536) = K(135246).

 ⊗2
K−1 ⊗3
(142536)vec Id = ej ⊗ ek ⊗ em = vecId 3 .
j,k,m
  
vecId 3 = Id 2 ⊗ Kd•d 2 ⊗ Id vecId 2 ⊗ vecId
  
= Id 2 ⊗ Kd•d 2 ⊗ Id (Id ⊗ Kd•d ⊗ Id ) vec⊗2 Id ⊗ vecId

  
= Id 2 ⊗ Kd•d 2 ⊗ Id Id ⊗ Kd•d ⊗ Id 3 vec⊗3 Id
    
= Id ⊗ Id ⊗ Kd•d 2 Kd•d ⊗ Id 2 ⊗ Id vec⊗3 Id
 
= Id ⊗ K(1423)K(2134) ⊗ Id vec⊗3 Id = K(135246)vec⊗3 Id
    
vec⊗3 Id = Id ⊗ Id ⊗ Kd 2•d Id 2 ⊗ Kd•d ⊗ Id vecId 3
 
= Id ⊗ K(1342)K(2134) ⊗ Id vecId 3
= K(13452)vecId 3 .

1.36
     
vec A ⊗ Id 2 Ip ⊗ Kp·d ⊗ Id ⊗ vec B ⊗ Id 2 Ip ⊗ Kp·d ⊗ Id vec ( ⊗ )⊗2
     
= vec vec B ⊗ Id 2 Ip ⊗ Kp·d ⊗ Id (( ⊗ ) ⊗ ( ⊗ )) Ip ⊗ Kd·p ⊗ Id vecA ⊗ Id 2
    
= vec vec B ⊗ Id 2 Ip ⊗ Kp·d ⊗ Id  ⊗ ( ⊗ ) Kd·p ⊗  vecA ⊗ Id 2
   
= vec vec B ⊗ Id 2  ⊗ Kp·d ( ⊗ ) Kd·p ⊗  vecA ⊗ Id 2
   
= vec vec B ⊗ Id 2 ( ⊗  ⊗  ⊗ ) vecA ⊗ Id 2
  
= vec vec B ⊗ Id 2 (( ⊗ ) vecA ⊗  ⊗ )
 
= vec vec B ( ⊗ ) vecA ⊗  ⊗ 
 
= vec B ( ⊗ ) vecA vec ( ⊗ )

= trAB vec⊗2 .

1.45
    
 + 
Qd,2 x = ωd,2  Q+
d,2 Sd1 2 x = ω d,2  Q d,2 x + K (21) x /2 .

Let x = vecA, then K(21)vecA = vecA .


1.55
          
r3 r4 r1 r2 r4 − k3,4 r2 − k1,2 r1 − k1,2
k3,4 ! k1,2 ! k2,4 !
k3,4 k3,4 k1,2 k1,2 k2,4 k2,4 k1,3
 
r3 − k3,4
k1,3 !k1,4!k2,3 !
k1,3
r4 !r3 ! r1 !r2 !
=        
r4 − k3,4 ! r3 − k3,4 !k3,4! r1 − k1,2 ! r2 − k1,2 !k1,2!
   
r4 − k3,4 ! r2 − k1,2 !
×   
r4 − k2,4 − k3,4 ! r2 − k1,2 − k2,4 !k2,4 !

   
r1 − k1,2 ! r3 − k3,4 !
×    k1,4 !k2,3!
r1 − k1,2 − k1,3 ! r3 − k3,4 − k1,3 !k1,3 !
r1 !r2 !r3 !r4 !
= .
k1,2 !k1,3 !k1,4!k2,3 !k2,4 !k3,4!

Chapter 2

2.5
1. r = 1,
8 = 1, j8 = (2, 2).
2. r = 2,
1 = 1, 7 = 1, j1 = (0, 1), j7 = (2, 1);
3 = 1, 5 = 1, j3 = (1, 0), j7 = (1, 2);
4 = 2, j4 = (1, 1);
2 = 1, 6 = 1, j2 = (0, 2), j6 = (2, 0).
3. r = 3,
1 = 2, 6 = 1, j1 = (0, 1), j6 = (2, 0);
3 = 2, 2 = 1, j3 = (1, 0), j2 = (0, 2);
1 = 1, 3 = 1, 4 = 1, j1 = (0, 1), j3 = (1, 0), j4 = (1, 1).
4. r = 4,
1 = 2, 3 = 2, j1 = (0, 1), j3 = (1, 0) .
Substitute the above cases into (2.13).
2.10
 
∂ ∂
Dx⊗ (A ⊗ Bx) = (A ⊗ B) (Id ⊗ x) ⊗ = (A ⊗ B ⊗ Id ) Id ⊗ x ⊗
∂x ∂x
= (A ⊗ B ⊗ Id ) (Id ⊗ vecId ) = A ⊗ ((B ⊗ Id ) vecId ) = A ⊗ vecB .

2.11
       
Dx⊗ x Ax = vec x A + A = A + A x.

2.12 Let A be a square matrix, and show that


 
Dx⊗2 (Ax)⊗2 = K(3214) + Id 4 vecA⊗2 ,
 

Dx⊗ (Ax)⊗2 = (Ax) ⊗2
⊗ ,
∂x

∂  
(Ax)⊗2 ⊗ = Ax ⊗ vecA + K−1 
(132) vecA ⊗ Ax
∂x
  
= K−1(132) + K −1
(231) vecA ⊗ Ax ,

  ∂

Dx⊗2 (Ax)⊗2 = K−1
(1324) + K −1
(2314) vecA 
⊗ Ax ⊗
∂x
   
= K−1 −1 ⊗2 
(1324) + K(2314) vec A = K(3124) + K(1324) vec A
⊗2 

   
= K(3124) + K(1324) K(1324)vecA⊗2 = K(3214) + Id 4 vecA⊗2 .

2.13 x ∈ R, f1 ∈ Rm1
  
∂          ⊗
vec Dy f1 (y) f2 = vec Dx⊗ f2 Dy f1 (y) = Im1 ⊗ Dx⊗ f2 Dy f1
∂x
∂   
Dy f1 (y) f2 = Dy f1 (y) Dx⊗ f2 = Dx⊗ f2 ⊗ Im1 Km2 •m1 Dy⊗ f1 .
∂x
 
2.17 Dy f = Dy⊗ f ,

      ∂
    ∂
Dx⊗ h = Dy⊗ f ⊗ Id Dx⊗ g = Dy⊗ f ⊗ Id g⊗ = Dy⊗ f g ⊗
∂x ∂x
   ∂

= Dy⊗ f gj ⊗ = Dy⊗ f ∗ Dx⊗ g.
j ∂x

2.18

∂   


−1 2
(Vx ⊗ vecV) ⊗ = K(12)S d, d , d vecV ⊗ Vx ⊗
∂x ∂x
  
= K−1
(12) d, d 2 , d vecV ⊗ vecV .
S

∂    ∂
 
(Vx ⊗ vecV) ⊗ = K−1
(132) d, d 2
, d Vx ⊗ ⊗ vecV
∂x ∂x
  
= K−1 2
(132) d, d , d vecV ⊗ vecV .

2.25 Let us denote g = Dx⊗ g, and consider


 
h = f ⊗ Id 2 g ,
   
dh (x; dx) = df (x; dx) ⊗ Id 2 g + f ⊗ Id 2 dg (x; dx)
   
= g  ⊗ Id (Im ⊗ vecId ) df (x; dx) + f ⊗ Id 2 dg (x; dx) ;

hence,
      
Dx⊗ f ⊗ Id 2 g = g  ⊗ Id (Im ⊗ vecId ) Dx⊗ + f ⊗ Id 2 Dx⊗ g
    
= Dx⊗ g ⊗ Id 2 (Im ⊗ vecId ⊗ Id ) Dx⊗ f + f ⊗ Id 2 Dx⊗2 g,

since
         
f ⊗ Id Dx⊗ g = Dx⊗ g ⊗ Id vec f ⊗ Id = Dx⊗ g ⊗ Id (f ⊗ vecId )
  
= Dx⊗ g ⊗ Id (Im ⊗ vecId ) f,

and since by (1.5) and (1.7)

f ⊗ vecId = (Im ⊗ vecId ) f,


       
Dx⊗ g ⊗ Id (Im ⊗ vecId ) ⊗ Id Dx⊗ f = Dx⊗ g ⊗ Id 2 (Im ⊗ vecId ⊗ Id ) Dx⊗ f

     
Dx⊗ g ⊗ Id (Im ⊗ vecId ) ⊗ Id = Dx⊗ g ⊗ Id 2 (Im ⊗ vecId ⊗ Id ) ;

therefore,
        
Dx⊗ f ⊗ Id Dx⊗ g = f ⊗ Id 2 Dx⊗2 g+ Dx⊗ g ⊗ Id 2 (Im ⊗ vecId ⊗ Id ) Dx⊗ f.

 
2.27 f (1) (g) is scalar, commutes with Dx⊗1 g; hence, Dx⊗1 f (g) = Dx⊗1 g ⊗ f (1) (g)

 ⊗   ⊗ (1)   
Dx1 g ⊗ Dx2 f (g) = f (2) (g) Dx⊗1 g ⊗ Dx⊗2 g .

The second term


 
f (1) (g) Dx⊗2 Dx⊗1 g = f (1) (g) Dx⊗1:2 g.

Combining these results, we obtain


    
Dx⊗3 f (2) (g) Dx⊗1 g ⊗ Dx⊗2 g = f (3) (g) Dx⊗1 g ⊗ Dx⊗2 g ⊗ Dx⊗3 g
 
+ f (2) (g) K−1 ⊗ ⊗
(2,3) (d1:3 ) Dx1,3 g ⊗ Dx2 g + f
(2)
(g) Dx⊗1 g ⊗ Dx⊗2:3 g.
S

 
Dx⊗4 f (2) (g) Dx⊗1:2 g ⊗ Dx⊗3 g = f (3) (g) Dx⊗1:2 g ⊗ Dx⊗3 g ⊗ Dx⊗4 g

+f (2) (g) K−1 ⊗3 ⊗


(2,3)S (d1 d2 , d3 , d4 ) Dx1,2,4 g ⊗ Dx3 g

+f (2) (g) Dx⊗1,2 g ⊗ Dx⊗3,4 g.



The next term contains a commutator matrix; therefore, the first step
   
Dx⊗4 f (2) (g) K(2,3)S d(2,3)S Dx⊗1,3 g ⊗ Dx⊗2 g
     
= K(2,3)S d(2,3)S ⊗ Id4 Dx⊗4 f (2) (g) Dx⊗1,3 g ⊗ Dx⊗2 g
 
= K(2,3)S (d1 , d3 , d2 , d4 ) Dx⊗4 f (2) (g) Dx⊗1,3 g ⊗ Dx⊗2 g ,

then
 
Dx⊗4 f (2) (g) Dx⊗1,3 g ⊗ Dx⊗2 g = f (3) (g) Dx⊗1,3 g ⊗ Dx⊗2 g ⊗ Dx⊗4 g
 
+f (2) (g) K(2,3)S (d1 d3 , d4 , d2 ) Dx⊗3
1,3,4
g ⊗ Dx⊗2 g

+f (2) (g) Dx⊗1,3 g ⊗ Dx⊗2,4 g.

It is necessary to consider the product


 
K(2,3)S d(2,3)S , d4 K(2,3)S (d1 d3 , d4 , d2 ) = K(1324) (d1 , d3 , d2 , d4 ) K(1,2,4,3) (d1 , d3 , d4 , d2 )

of two commutator matrices, which corresponds to consecutive application of


permutations p1 = (1243) and p2 = (1324), the result is p1 p2 = (1423), and
therefore,

   
Dx⊗4 f (2) (g) K(2,3)S d(2,3)S Dx⊗1,3 g ⊗ Dx⊗2 g

= f (3) (g) K(2,3)S (d1 , d3 , d2 , d4 ) Dx⊗1,3 g ⊗ Dx⊗2 g ⊗ Dx⊗4 g


 
+ f (2) (g) K(1,4,3,2) (d1 , d3 , d4 , d2 ) Dx⊗3 1,3,4
g ⊗ D ⊗
x2 g

+ f (2) (g) K(2,3)S (d1 , d3 , d2 , d4 ) Dx⊗1,3 g ⊗ Dx⊗2,4 g.

We see that (1423) is the inverse of the permutation (1342) corresponding to the
term Dx⊗3
1,3,4
g ⊗ Dx⊗2 g.

 −1
K(1423) (d1 , d3 , d4 , d2 ) = K(1342) (d1:4 ) .

Notice that Dx⊗3


1,3,4
g ⊗ Dx⊗2 g corresponds to the partition K of (1 : 4) with blocks
K = {(1, 3, 4) , (2)} and p (K) = (1342). Now
 
Dx⊗4 f (2) (g) Dx⊗1 g ⊗ Dx⊗2:3 g = Dx⊗4 f (2) (g) Dx⊗1 g ⊗ Dx⊗2:3 g
 
= f (3) (g) Dx⊗1 g ⊗ Dx⊗2:3 g ⊗ Dx⊗4 g

   
Dx⊗4 Dx⊗1 g ⊗ Dx⊗2:3 g = K−1
(1,4,2,3) D ⊗
x1,4 g ⊗ D ⊗
x2:3 g + Dx⊗1 g ⊗ Dx⊗2:4 g.

Now we collect all terms

Dx⊗4
1:4
f (g) = f (1) (g) Dx⊗41:3
g + f (2) (g) Dx⊗1,2 g ⊗ Dx⊗3,4 g + K−1 ⊗ ⊗
(2,3)S Dx1,3 g ⊗ Dx2,4 g +
 
+f (2) K−1 ⊗3 ⊗
(1,3,4,2) Dx1,3,4 g ⊗ Dx2 g + f
(3) (g) D ⊗ g ⊗ D ⊗ g ⊗ D ⊗ g
x1:2 x3 x4

+f (4) (g) Dx⊗1 g ⊗ Dx⊗2 g ⊗ Dx⊗3 g ⊗ Dx⊗4 g.

Chapter 3

3.8 Gaussian κ ⊗ −1
X,k = 0, for k > 2, and K(1423) = K(1342) = K(1243) K(1324) ,

K−1 ⊗2 ⊗2 ⊗2
(1324) κ X,2 = (Id ⊗ Kd·d ⊗ Id ) κ X,2 = vec ,

   
Cum2 X⊗2 , X⊗2 = K−1(1324) + K −1 ⊗2
(1423) κ X,2 = vec
⊗2
+ K(1342)κ ⊗2
X,2
  ⊗2
= Id 4 + K(1243) vec .

3.12
     
κXY ,3 = Cum1 κXY,3|Y + 3Cum2 κXY |Y , κXY,2|Y + Cum3 κXY |Y ,
     
κXY ,3 = Cum1 Y 3 κX,3|Y + 3Cum2 Y κX|Y , Y 2 κX,2|Y + Cum3 Y κX|Y ,
κXY ,3 = κX,3 κY 3 ,1 + 3κX,1κX,2 κY 2 ,Y +κX,1
3
κY,3 .

3
κXY ,3 = κX,3 κY,3 +3κX,3κY,2 κY,1 + 3κX,2κX,1 κY,3 + 6κX,2κX,1 κY,2 κY,1 + κX,3 κY,1

+ κX,1
3
κY,3
   
3 3
= κX,3 κY ,3 + 3κY,1 κY ,2 + κY,1 + κY ,3 κX,3 + 3κX,1 κX,2 + κX,1

− κX,3 κY ,3 + 6κX,1 κX,2 κY,2 κY,1


= κX,3 μY,3 + μX,3 κY,3 − κX,3 κY,3 + 6κX,2 κX,1 κY,2 κY,1 ,

κX,3 = μX,3 − 3μX,1 μX,2 + 2μ3X,1,

μX,3 = κX,3 + 3κX,1 κX,2 + κX,1


3
,
κY 2 ,Y = κY,3 + κY,2 κY,1 + κY,2 κY,1 ,

3
κXY ,3 = κX,3 κY 3 ,1 + 3κX,1κX,2 κY 2 ,Y + κX,1 κY,3 = μY,3 μX,3 −3μX,1 μX,2 μY,2 μY,1

+ 2μ3X,1 μ3Y,1 .

3.16

Cum (X1 , X2 , X3 , X4 |X4 ) = X4 μ1:3|4 −μ1,2,3|4X4 −X4 μ1,2|4 μ3|4 − X4 μ1,3|4μ2|4



−X4 μ2,3|4μ1|4 + 2 μ1,2|4 μ3|4 X4 + μ1,3|4μ2|4 X4
+X4 μ1|4 μ2|4 μ3|4 + μ1|4 μ2,3|4 X4 + μ1|4 μ2|4 μ3|4 X4

+μ1|4 μ2|4 μ3|4 X4 − μ1,2|4 μ3|4 X4 − μ1,3|4μ2|4 X4
−μ1|4 μ2,3|4 X4 − 6μ1|4 μ2|4 μ3|4 X4 = 0.

3.21
   
Cum κX1 Y1 |Y1,2 , κX2 Y2 |Y1,2 , κX3 Y3 |Y1:3 = Cum Y1 κX1 ,1 , Y2 κX2 ,1 , Y3 κX3 ,1
= κX1 ,1 κX2 ,1 κX3 ,1 Cum (Y1 , Y2 , Y3 )
 
Cum κX1 Y1 ,X2 Y2 |Y1:3 , κX1 Y1 |Y1:3 = κX1 ,X2 κX3 ,1 Cum (Y1 Y2 , Y3 )

   
κX1 Y1 ,X2 Y2 ,X3 Y3 = Cum κX1 Y1 ,X2 Y2 ,X3 Y3 |Y1:3 + Cum κX1 Y1 ,X2 Y2 |Y1:3 , κX3 Y3 |Y1:3
   
+Cum κX1 Y1 ,X2 Y2 |Y1:3 , κX1 Y1 |Y1:3 + Cum κX2 Y2 ,X3 Y3 |Y1:3 , κX1 Y1 |Y1:3
 
+Cum κX1 Y1 |Y1,2 , κX2 Y2 |Y1,2 , κX3 Y3 |Y1:3
   
= Cum Y1 Y2 Y3 κX1:3 |Y1:3 + Cum Y1 Y2 κX1,2 |Y1:3 , Y3 κX3 |Y1:3
   
+Cum Y1 Y3 κX1,3 |Y1:3 , Y2 κX2 |Y1,3 + Cum Y2 Y3 κX2,3 |Y1:3 , Y1 κX1 |Y1:3
 
+Cum Y1 κX1 |Y1,2 , Y2 κX2 |Y1,2 Y3 , κX3 |Y1:3

= κX1:3 κY1 Y2 Y3 ,1 + κX1,2 κX3 κY1 Y2 ,Y3 + κX1,3 κX2 κY1 Y3 ,Y2

+κX2,3 κX1 κY2 Y3 ,Y1 + κX1 κX2 κX3 κY1:3 .

3.23

φ0 (λ1:n ) = φ (λk ) ,

ψ0 (λ1:n ) = ψ (λk )

(see Exercise 3.27).


 ⊗2 
Dλ⊗2 φ0 (λ) = φ0 (λ) ψ (1) (λk ) ek + φ0 (λ) ψ (2) (λk ) e⊗2
k

⎡ ⎤ ⎡ ⎤ ⎡ ⎤
∂ 1 ∂ ∂
∂λ1 φ0 (λ) φ(λ1 ) ∂λ1 φ (λ1 ) ∂λ1 ψ (λ1 )
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
Dλ⊗ φ0 (λ) = ⎢

..
.
⎥ = φ0 (λ) ⎢
⎦ ⎣
..
.
⎥ = φ0 (λ) ⎢
⎦ ⎣
..
.


∂ 1 ∂ ∂
φ
∂λn 0 (λ) φ(λn ) ∂λn φ (λn ) ∂λn ψ (λn )

n
= φ0 (λ) ψ (1) (λk ) ek ,
k=1
 ⊗3    
Dλ⊗3 φ0 (λ) = φ0 (λ) ψ (1) (λk ) ek + φ0 (λ) ψ (1) (λk ) ek ⊗ ψ (2) (λk ) e⊗2
k
   
+φ0 (λ) K−1
(1,3,2) ψ (2) (λk ) e⊗2
k ⊗ ψ (1) (λk ) ek
   
+φ0 (λ) ψ (2) λj ψ (1) (λk ) e⊗2
j ⊗ ek + φ0 (λ) ψ (3) (λk ) e⊗3
k .
j,k


3.25 F3 1d = 0,
 
⊗3 ⊗3 
μ⊗ 3
X,3 = κX,1 F3 + 3κX,2κX,1 I⊗3
1d d + K −1
(132) + K −1
(231) F3 1d

⊗2
 ⊗3
× ⊗ F3 vec (Id ) + κX,3 F3 e⊗3
k
   3
= κX,3 F3 ek = 3κX,3.

3.26

φ0 (λ1:n ) = φ (λk ) ,

ψ0 (λ1:n ) = ψ (λk ) .

Let ek be the unit coordinate vector in Rn ; then


n
Dλ⊗ φ0 (λ) = φ0 (λ) ek ⊗ Dλ⊗k ψ (λk ) ,
k=1

 ⊗2 
Dλ⊗2 φ0 (λ) = φ0 (λ) ek ⊗ Dλ⊗k ψ (λk ) + φ0 (λ) e⊗2 ⊗2
k ⊗ Dλk ψ (λk ) .

3.27

φ0 (λ1:n ) = φ (λk ) ,

ψ0 (λ1:n ) = ψ (λk ) ,

 ⊗3
Dλ⊗3 φ0 (λ) = φ0 (λ) ek ⊗ Dλ⊗ ψ (λk )
   
+φ0 (λ) ek ⊗ Dλ⊗ ψ (λk ) ⊗ e⊗2
k ⊗ Dλ
⊗2
ψ (λk )
   
+φ0 (λ) K−1
(132) e ⊗2
k ⊗ D ⊗2
λ ψ (λk ) ⊗ e k ⊗ D ⊗
λ ψ (λk )
  
+φ0 (λ) e⊗2 ⊗2
k ⊗ Dλ ψ (λk ) ek ⊗ Dλ⊗ ψ (λk )

+φ0 (λ) e⊗3 ⊗3
k ⊗ Dλ ψ (λk ) ,

take n = 3
   
Cum3 (Y) = Cum3 e k ⊗ Xk = Cum3 (ek ⊗ Xk ) = e⊗3 ⊗
k ⊗ κ X,3 ,

see (3.12).
3.29 Use (3.58):
   
Cum X AX, X BX = Cum2 (vecA) X ⊗ X, (vecB) X ⊗ X


= (vecA) ⊗ (vecB) Cum2 (X ⊗ X, X ⊗ X)

 
= (vecA) ⊗ (vecB) K−1
(2,1)S +K −1
(4,2)S [vec ⊗ vec]

= 2 (vec) (A ⊗ B) vec = 2TrAB ,

see (1.10).
3.31
   
Cum2 (X1:2 |Y ) = μX1 X2 |Y − μX1 |Y μX2 |Y = E X1 ± μX1 |Y X2 ± μX2 |Y |Y − μX1 |Y μX2 |Y
    
= E X1 − μX1 |Y X2 − μX2 |Y − μX2 |Y E X1 − μX1 |Y |Y
 
−μX1 |Y E X2 − μX2 |Y |Y
  
= E X1 − μX1 |Y X2 − μX2 |Y = Cum2 (X1:2 − E (X1:2 |Y )) .

3.32

Dϑd κϑb1 ,ϑb2 ,ϑb3 = Dϑd μϑb1 ,ϑb2 ,ϑb3 − Dϑd μϑb3 μϑb1 ,ϑb2 + μϑb2 μϑb1 ,ϑb3

+μϑb1 μϑb2 ,ϑb3 + 2Dϑd μϑb1 μϑb2 μϑb3

= μϑb1 ,ϑb2 ,ϑb3 ,ϑd + μϑb1 ,d ,ϑb2 ,ϑb3 + μϑb1 ,ϑb2 ,d ,ϑb3 + μϑb1 ,ϑb2 ,ϑb3 ,d

− μϑb3 ,ϑd μϑb1 ,ϑb2 + μϑb3 μϑb1 ,ϑb2 ,ϑd + μϑb2 ,ϑd μϑb1 ,ϑb3

+μϑb2 μϑb1 ,ϑb3 ,ϑd + μϑb1 ,ϑd μϑb2 ,ϑb3 + μϑb1 μϑb2 ,ϑb3 ,ϑd
  
− μϑb3 ,d μϑb1 ,ϑb2 + μϑb3 μϑb1 ,d ,ϑb2 + μϑb1 ,ϑb2 ,d + μϑb2 ,d μϑb1 ,ϑb3
 
+μϑb2 μϑb1 ,ϑb3 ,d + μϑb1 ,d ,ϑb3
 
−μϑb1 ,d μϑb2 ,ϑb3 − μϑb1 μϑb2 ,d ,ϑb3 ,d + μϑb2 ,ϑb3 ,d

= μϑb1 ,ϑb2 ,ϑb3 ,ϑd − μϑb3 ,ϑd μϑb1 ,ϑb2 + μϑb3 μϑb1 ,ϑb2 ,ϑd

+μϑb2 ,ϑd μϑb1 ,ϑb3 + μϑb2 μϑb1 ,ϑb3 ,ϑd + μϑb1 ,ϑd μϑb2 ,ϑb3 + μϑb1 μϑb2 ,ϑb3 ,ϑd

+μϑb1 ,d ,ϑb2 ,ϑb3 − μϑb1 ,d μϑb2 ,ϑb3 − μϑb3 μϑb1 ,d ,ϑb2 − μϑb2 μϑb1 ,d ,ϑb3 . . .

= κϑb1 ,ϑb2 ,ϑb3 ,ϑd + κϑb1 ,d ,ϑb2 ,ϑb3 + κϑb1 ,ϑb2 ,d ,ϑb3 + κϑb1 ,ϑb2 ,ϑb3 ,d

κ1:4 = μ1:4 − μ1,2 μ3,4 − μ1,3 μ2,4 − μ1,4 μ2,3 .

3.34

κϑ1 ,ϑ2 = μϑ1 ,ϑ2 ,


κϑ1 ,ϑ2 ,ϑ3 = μϑ1 ,ϑ2 ,ϑ3 ,

since κϑj = μϑj = 0.

Dϑd κϑb = Dϑd μϑb = μϑ {b,d } + μϑb ,ϑd = κϑ {b,d } + κϑb ,ϑd
4  
Dϑd μϑb1 ,ϑb2 = Dϑd lϑb1 lϑb2 el(ϑ) dx = μϑ{b ,d } ,ϑb2 + μϑb1 ,ϑ{b ,d } + μϑb1 ,ϑb2 ,d
1 2

4  
Dϑd μϑb1 ,ϑb2 ,ϑb3 = Dϑd lϑb1 lϑb2 lϑb3 el(ϑ) dx

= μϑ{b ,ϑb2 ,ϑb3 + μϑb1 ,ϑ{b ,ϑb3 + μϑb1 ,ϑb2 ,ϑ{b + μϑb1 ,ϑb2 ,ϑb3 ,d .
1 ,d } 2 ,d } 3 ,d }

3.35
4 4
Dϑd+1 Eϑ lϑ1:d = Dϑd+1 lϑ1:d L (ϑ, x) dx = Dϑd+1 lϑ1:d el(ϑ) dx
4
 
= lϑ1:(d+1) + lϑd+1 lϑ1:d el(ϑ) dx

= Eϑ lϑ1:(d+1) + Eϑ lϑ1:d lϑd+1 ,



where Dϑb = ∂ |b| / j ∈b ∂ϑj .

Chapter 4

All random variables below are jointly normal, centralized (EXj = 0), with
covariances σj k , and cumulants κ ⊗
j,k .
4.1


n−1
 
Hn+1 (aY + bZ, X1:n ) = (aY + bZ) Hn (X1:n ) − σj,aY +bZ Hn−1 X(1:n)\j
j =1


n−1
 
= aY Hn (X1:n ) + bZHn (X1:n ) − aσj,Y Hn−1 X(1:n)\j
j =1


n−1
 
− bσj,Z Hn−1 X(1:n)\j
j =1

= aHn+1 (Y, X1:n ) + bHn+1 (Z, X1:n ).


 
4.6 Set X2k+1 = Y1 , X2k+2 = Y2 ; then diagrams L, K II c,nl correspond to L =
(b1 , . . . bk ), bj = (2j − 1, 2j ), j = 1 : k, bk+1 = 2k + 1, bk+2 = 2k + 2.
4.13 Consider only the nonzero terms of the T -derivative of third order (see (4.34),
p. 205):
*  *
* *
H4 (X1:4 ) = Da⊗4 ψ * = Da⊗4 Da⊗3 ψ *
1:4
a1:4 =0 1:3
a1:4 =0
 
= X⊗1
1:4
4
− X1 ⊗ X2 ⊗ κ ⊗
3,4 − K−1
(1243) X1 ⊗ κ ⊗
2,4 ⊗ X3

−K−1 ⊗ ⊗ ⊗ ⊗
(1423) κ 1,4 ⊗ X2 ⊗ X3 − κ 1,2 ⊗ X3 ⊗ X4 + κ 1,2 ⊗ κ 3,4
398 Solutions

 
−X1 ⊗ κ ⊗
2,3 ⊗ X 4 + K −1
κ ⊗
(1423) 1,4 ⊗ κ ⊗
2,3 − K −1
(1324) κ ⊗
1,3 ⊗ X 2 ⊗ X 4

+K−1 ⊗ ⊗
(1324) κ 1,3 ⊗ κ 2,4 ,

K−1 −1 −1
(1243) K(1423) = K(1243) K(1342) = K(1324) = K(1324),

L−1 −1 −1
22 =Id 4 + K(1324) + K(1423) ,

L−1 −1 −1 −1 −1 −1
12 ,21 = Id 4 + K(1324) + K(1423) + K(2314) + K(2413) + K(3412) ,

   
H4 (X) = X⊗4 − L−1
22 X ⊗2
⊗ κ ⊗
X,2 + L−1
12 ,21 Id 4 + K −1
(1324) + K −1 ⊗2
(1423) κ X,2
 
= Sd14 X⊗4 − 6κ X,2 ⊗ X⊗2 + 3κ ⊗2
X,2 .

4.14

H4 (X1:4 ) = H3 (X1:3 ) ⊗ X4 − H2 (X1:2 ) ⊗ κ ⊗ −1 ⊗


3,4 − K(132) H2 (X1 , X3 ) ⊗ κ 2,4

−K−1 ⊗
(1423) H2 (X1 , X4 ) ⊗ κ 2,3

−K−1 ⊗ ⊗ ⊗ ⊗
(1423) κ 1,4 ⊗ H2 (X2:3 ) − κ 1,2 ⊗ H2 (X3:4 ) + κ 1,2 ⊗ κ 3,4

+K−1 ⊗ ⊗ −1 ⊗ ⊗
(132) κ 1,3 ⊗ κ 2,4 + K(1423) κ 1,4 ⊗ κ 2,3 .

H4 (X) = X⊗1 −1 ⊗ −1 ⊗2
1:4 − L2,H 2 κ X,2 ⊗ H2 (X) + L2,2 κ X,2 ,
4

see Sect. A.2.2.2, p. 357 for: L−1


2,H 2 .
4.15
⊗ ⊗ ⊗
(xk + iXk ) = (xk + iXk ) ⊗ (x3 + iX3 ) = (xk + iXk ) ⊗ x3
k=1:3 k=1:2 k=1:2
⊗
+ (xk + iXk ) ⊗ iX3 ,
k=1:2

the second term


⊗ 
(xk + iXk ) ⊗ iX3 = x1 ⊗ x2 ⊗ iX3 +x1 ⊗ iX2 ⊗ iX3 + iX1 ⊗ x2 ⊗ iX3
k=1:2

+iX1 ⊗ iX2 ⊗ iX3 ,
Solutions 399

and
⊗ ⊗
E (xk + iXk ) = E (xk + iXk ) ⊗ x3 − Ex1 ⊗ X2 ⊗ X3 − EX1 ⊗ x2 ⊗ X3
k=1:3 k=1:2
⊗
=E (xk + iXk ) ⊗ x3 − x1 ⊗ κ ⊗ −1 ⊗
2,3 − K(213) x2 ⊗ κ 1,3 .
k=1:2

Plug x1:3 = X1:3 and obtain


⊗ *    
*
E (xk + iXk )* = H2 X[1:2] ⊗ X3 − K−1
(231) κ ⊗
2,3 ⊗ H 1 (X 1 )
k=1:3 x1:3 =X1:3
 
−K−1 ⊗
(132) κ 1,3 ⊗ H1 (X2 ) ,

cf. (4.34).
4.16 Let Y = X1 + X2 ,

H2 (Y, X1 + X2 ) = H2 (Y, X1 ) + H2 (Y, X2 ) = K−1
(21) H2 (X1 , X1 + X2 )

+H2 (X2 , X1 + X2 )

= K−1
(21) (H2 (X1 ) + H2 (X2 ) + H2 (X1 , X2 ) + H2 (X2 , X1 ))
 
= H2 (X1 ) + H2 (X2 ) + Id 2 + K−1
(21) H2 (X1 , X2 ) ,

H2 (X1 + X2 ) = (X1 + X2 ) ⊗ (X1 + X2 ) − κ ⊗ X1 +X2 ,2 = (X1 + X2 ) ⊗ (X1 + X2 )


 
−κ ⊗
X1 ,2 − κ ⊗
X2 ,2 − Id + K −1 ⊗
(21) κ X1 ,X2
 
= H2 (X1 ) + H2 (X2 ) + Id + K−1 (21) H2 (X1 , X2 ) .

4.17

H2 (AX1 + BX2 ) = H2 (AX1 + BX2 , AX1 + BX2 ) = (A ⊗ Id ) H2 (X1 , AX1 + BX2 )

+ (B ⊗ Id ) H2 (X2 , AX1 + BX2 )


= (A ⊗ Id ) K−1
(21) (A ⊗ Id ) H2 (X1 , X1 )

+ (A ⊗ Id ) K−1
(21) (B ⊗ Id ) H2 (X2 , X1 )

+ (B ⊗ Id ) K−1
(21) (A ⊗ Id ) H2 (X1 , X2 )

+ (B ⊗ Id ) K−1
(21) (B ⊗ Id ) H2 (X2 , X2 )
400 Solutions

= A⊗2 H2 (X1 ) + (A ⊗ B) H2 (X1 , X2 ) + (B ⊗ A) H2 (X2 , X1 )

+B⊗2 H2 (X2 ) .

Use (1.14), p. 9 for K−1


(21) (B ⊗ Id ) = (Id ⊗ B) K(21) ,

(A ⊗ Id ) K−1
(21) (B ⊗ Id ) H2 (X2 , X1 ) = (A ⊗ Id ) (Id ⊗ B) K(21) H2 (X2 , X1 )
= (A ⊗ B) H2 (X1 , X2 ) .

4.19
        
H3 X − X = H2 X − X ⊗ X − X − Id 3 + K−1
(2,1,3) X − X ⊗ κ⊗
X−X,2
    n−1   
= H2 X − X ⊗ X − X − Id 3 + K−1(2,1,3) X − X ⊗ κ⊗ X,2
n
1
κ⊗ = κ⊗ ,
X−X,2 n − 1 X,2
     ⊗2
H3 X−X = (X − μ)⊗3 + 3 (X − μ)⊗2 ⊗ X − μ − +3 (X − μ) ⊗ X − μ
 ⊗3  
+ X−μ − 2 (X − μ) ⊗ κ ⊗ + μ − X ⊗ κ⊗
X−X,2 X−X,2
   
= H3 (X−μ) + 2 (X − μ) ⊗ κ X,2 + H3 X − μ + 2 X − μ ⊗ κ ⊗

X,2
   
+3H2 (X−μ) ⊗ H1 X − μ + 3κ ⊗ X,2 ⊗ X − μ
 
+3H1 (X−μ) ⊗ H2 X − μ + 3 (X−μ) ⊗ κ ⊗
X,2
 
⊗ ⊗
−2 (X − μ) ⊗ κ −2 μ−X ⊗κ
X−X,2 X−X,2
     
= H3 (X−μ) + H3 μ − X + 3H2,1 X−μ, μ − X + 3H1,2 X−μ, μ − X .

4.24
n  
 n  
E (Hn (X − E (X|Y ) + E (X|Y )) |Y ) = E Hk (X − E (X|Y )) Hn−k (E (X|Y )) |Y
k
k=0
n  
 n
= Hn−k (E (X|Y )) E (Hk (X − E (X|Y )))
k
k=0
= Hn (E (X|Y )) ,
Solutions 401

X = [X1 , X2 ] , [W1 , W2 ] = X−E (X|Y), [V1 , V2 ] = E (X|Y)

E (H2 (X) |Y) = E (H2 (X−E (X|Y) + E (X|Y)) |Y) = E (H2 (W + V) |Y)

= E (H2 (W) |Y) + E (H2 (V) |Y) + E (H2 (W1 , V2 ) |Y) + E (H2 (W2 , V1 ) |Y)

= E (H2 (W)) + E (H2 (V)) + E (H1 (W1 ) H1 (V2 ) |Y) + E (H1 (W2 ) H1 (V1 ) |Y)

= E (H2 (V)) .

4.25
     
E X3 |Y = E (X − E (X|Y ) + E (X|Y ))3 |Y = E (X − E (X|Y ))3 |Y
 
+3E (X − E (X|Y ))2 E (X|Y ) |Y
 
+3E (X − E (X|Y )) E2 (X|Y ) |Y + E3 (X|Y )
   
= E (X − E (X|Y ))3 + 3E (X|Y ) E (X − E (X|Y ))2

+3E2 (X|Y ) E (X − E (X|Y )) + E3 (X|Y )


 
= 3E (X|Y ) E (X − E (X|Y ))2 + E3 (X|Y ) = H3 (E (X|Y )) .

4.26

E (Y1 Y2 |Y3 ) = E (Y1 − E (Y1 |Y3 )) ((Y2 − E (Y2 |Y3 ))) + E (Y1 |Y3 ) E (Y2 |Y3 )
= σ12 − EE (Y1 |Y3 ) E (Y2 |Y3 ) + E (Y1 |Y3 ) E (Y2 |Y3 )
E (Y1 Y3 ) = EY3 E (Y1 |Y3 ) .

E (H3 (Y1:3 )|Y3 ) = Y3 E (Y1 Y2 |Y3 ) − σ23 E (Y1 |Y3 ) − σ13 E (Y2 |Y3 ) − σ12 Y3
= Y3 E (Y1 |Y3 ) E (Y2 |Y3 ) − Y3 EE (Y1 |Y3 ) E (Y2 |Y3 )
−σ23 E (Y1 |Y3 ) − σ13 E (Y2 |Y3 )
= Y3 E (Y1 |Y3 ) E (Y2 |Y3 ) − Y3 EE (Y1 |Y3 ) E (Y2 |Y3 )
−E (E (Y1 |Y3 ) Y3 ) E (Y1 |Y3 ) − E (E (Y2 |Y3 ) Y3 ) E (Y2 |Y3 )
= H3 (E (Y1 |Y3 ) , E (Y2 |Y3 ) , Y3 ).
402 Solutions

Chapter 5

5.1
 
(4) (3) (2) (1)
ρiM = (x − 2iρiM ) ρiM + 3ρiM 1 − 2iρiM ,
   2
(4)
ρiM (0) = 4ρiM0
3 2
3ρiM0 − 2 + 3ρiM0 1 − 2ρiM0
2

= 12ρiM0
5
− 8ρiM0
3
+ 3ρiM0 − 12ρiM0
3
+ 12ρiM0
5

= 24ρiM0
5
− 20ρiM0
3
+ 3ρiM0 ,
7 5 7 3 7
4 (4) 2 2 2
κ|Z|,5 = (−i) ρiM (0) = 24 − 20 +3 ,
π π π
μ μ|Z|,4 μ|Z|,2 μ|Z|,3 1 μ|Z|,3
|Z|,5
κ|Z|,5 = 5! − μ|Z|,1 − + 2! μ2|Z|,1
5! 4! 2! 3! 2! 3!
1  
μ|Z|,2  3! 4 μ|Z|,2
2 4! 
+ μ|Z|,1 − μ|Z|,1 + μ5|Z|,1
2! 2! 3! 2! 5!
= μ|Z|,5 − 5μ|Z|,1 μ|Z|,4 − 10μ|Z|,2 μ|Z|,3 + 20μ2|Z|,1 μ|Z|,3

+30μ|Z|,1 μ2|Z|,2 − 60μ3|Z|,1 μ|Z|,2 + 24μ5|Z|,1 ,


7 7 7 7 7 7  2 7
2 2 2 2 2 2 2 2 2 2
κ|Z|,5 = 4!! −5·3 − 10 · 2 + 20 · 2 + 30 − 60 + 4!
π π π π π π π π π π
 2 7 7 7
2 2 2 2 2
=8 − (60 − 40) + (8 − 15 − 20 + 30)
π π π π π
 2 7 7 7
2 2 2 2 2
= 4! − (20) + 3 .
π π π π π

5.4  = 4, replace ν /ν1 by 


μ + 1, see (5.24) and get
 2
ζ4 ν4 ν3 ν2 ν2
κ4 = 4 = 4 − 4 3 − 3
 2
+ 12 2 − 6
ν1 ν1 ν1 ν1 ν1

μ4 + 1 − 4 (
= μ3 + 1) − 3 (
μ2 + 1)2 + 12 (
μ2 + 1) − 6
=
μ4 − 4
μ3 − 3
μ22 + 6
μ2 ,
Solutions 403

similarly
 2
ζ5 ν5ν4 ν3 ν2 ν2 ν3 ν2

κ5 = 5 = 5 − 5 4 − 10 3 2 + 30 2 + 20 3 − 60 2 + 24
ν1 ν1 ν1 ν1 ν1 ν1 ν1 ν1

μ5 + 1 − 5 (
= μ4 + 1) − 10 (
μ3 + 1) (
μ2 + 1) + 30 (
μ2 + 1)2
+20 (
μ3 + 1) − 60 (
μ2 + 1) + 24
 
=
μ5 − 5
μ4 − 10 (
μ3 + 
μ3 
μ2 +  22 + 2
μ2 ) + 30 μ μ2 + 20
μ3 − 60
μ2

+1 − 5 − 10 + 30 + 20 − 60 + 24
μ5 − 5
= μ4 − 10
μ3 
μ2 + 30
μ22 + 10
μ3 − 10
μ2 .

5.5
 (2)
λ2 = 2,  = 1
  (4)  (3)
2
λ2 = 4 λ3 = 22 · 2! · 3!!,  = 2
  (2)

λ2 = 2 · ! · (2 − 1)!!

induction
  (2+2)  (2+1)
+1
λ2 = (2 + 2) λ2+1
 (2)
= (2 + 2) (2 + 1) λ2

= 2+1 · ( + 1)! · (2 + 1)!!.

5.6
     
κXY ,4 = κX,4 κY,4 + 3κY,2 2 + κX,3 κX,1 4κY,4 + 12κY,22 2
+ κX,2 2 +
3κY,4 + +6κY,2
 
2
+ κX,2 κX,1 2
6κY,4 + 12κY,2 4 κ
+ κX,1 Y,4
 
= κY,4 κX,4 + 4κX,3 κX,1 + 3κX,2 2 + 6κ 2 4
X,2 κX,1 + κX,1
 
2
+ 3κY,2 2 + 4κ
κX,4 + 4κX,3 κX,1 + 2κX,2 2
X,2 κX,1
 
= κY,4 μX,4 + 3κY,22 μX,4 − μ2X,2
404 Solutions

and
 2
μX,4 − μ2X,2 = κX,4 + 4κX,3κX,1 + 3κX,2
2
+ 6κX,2 κX,1
2
+ κX,1
4
− κX,2 + κX,1
2

= κX,4 + 4κX,3κX,1 + 2κX,2


2
+ 4κX,2 κX,1
2
,

(see (A.18), p. 361).


5.8
 
d ER 4 d  (α + 4)  (α)

κ4 =   −1 = −1
d + 2 ER 2 2 d + 2  (α + 2)2
d  (α) d (α + 3) (α + 2)
= (α + 3) (α + 2) −1= − 1.
d+2  (α + 2) d + 2 α (α + 1)
   
5.10 κ1,2 = Cum1 κ1,2|Y + Cum2 κ1|Y , κ2|Y

κβ1 R,β2 R = Cum1 (β1 β2 Cum2 (R, R)) + Cum1 (R)2 Cum2 (β1 , β2 )

= μβ1 ,β2 κR,2 + κβ1 ,β2 μ2R = μR,2 κβ1 ,β2 + μβ1 μβ2 κR,2
   
= μR,2 μβ1 ,β2 − μβ1 μβ2 + μβ1 μβ2 μR,2 − μ2R = μR,2 μβ1 ,β2 − μβ1 μβ2 μ2R
   
= μβ1 ,β2 μR,2 − μ2R + μβ1 ,β2 − μβ1 μβ2 μ2R = μβ1 ,β2 μR,2 − μβ1 μβ2 μ2R .

5.12

Cum3 (|W1 | , |W1 | , W2 )


 
= Cum1 Cum3 (β1 R |U1 | , β1 R |U1 | , β2 RU2 |β1 , R)
 
+K∗2 Cum2 Cum1 (β1 R |U1 | |β1 , R) , Cum2 (β1 R |U1 | , β2 RU2 |β1 , R)
  
+Cum2 Cum2 (β1 R |U1 | |β1 , R) , Cum1 (β2 RU2 |β1 , R)
 
+K∗3 Cum3 Cum1 (β1 R |U1 | |β1 , R) , Cum1 (β1 R |U1 | |β1 , R) , Cum1 (β2 RU2 |β1 , R)

= 0.

5.13

κ⊗ ⊗ ⊗ ⊗2 ⊗
W,2 = κ Rp V,2 = κRp ,2 κ V,2 + κRp ,2 κ V,1 + κRp ,1 κ V,2
2

 
2 ⊗2
= μRp ,2 κ ⊗
V,2 + κ κ ⊗2
Rp ,2 V,1 = μ Rp ,2 vec − δ 2
+ κ|Z|,1 κRp ,2 δ ⊗2
π
   2 
2
= μRp ,2 vec + μRp ,2 − μRp ,1 − μRp ,2 δ ⊗2
2
π π
Solutions 405

p  2
= G−2 (p) vec − μ2Rp ,1 δ ⊗2
2 π
p p
= vec − G2−1 (p) δ ⊗2 ,
p−2 π
   
κ⊗ ⊗ ⊗2 ⊗2 ⊗ ⊗2
W,2 = μRp ,2 κ V,2 + κ V,1 − μRp ,1 κ V,1 = μRp ,2 κ V,2 + μRp ,2 − μRp ,1 κ V,1
2 2

   
p 2 p 2 2 ⊗2
= μRp ,2 κ ⊗ ⊗2
V,2 + κRp ,2 κ V,1 = vec − δ ⊗2 + − (G−1 (p))2 δ
p−2 π 2 p−2 π
p p
= vec − G2−1 (p) δ ⊗2 ,
p−2 π

(Lemma 3.4).
5.14 r = 1, then we shall have the term κRp3 ,1 δ ⊗3 , since 1:3 = (0, 0, 1),
   
C1 Rp , 1:3 = Cum1 Rp3 .
r = 2, then we shall have the term κRp ,Rp2 δ ⊗ κ ⊗ V,2 , 1:3 = (1, 1, 0),
2

   
C1 Rp , 1:3 = Cum2 Rp , Rp2 .
 
r = 3, then we shall have the term κRp ,3 δ ⊗3 , 1:3 = (3, 0, 0), C1 Rp , 1:3 =
 
Cum3 Rp


κ⊗
W,3 = κRp3 ,1 κ|Z|,3 δ
⊗3
+ 3κRp ,Rp2 κ|Z|,1 δ ⊗ κ ⊗ 3
V,2 + κRp ,3 κ|Z|,1 δ
2 ⊗3

 
2
= κRp3 ,1 κ|Z|,3 + κRp ,3 κ|Z|,1
3
− 3κRp ,Rp2 κ|Z|,1 δ ⊗3 + 3κRp ,Rp2 κ|Z|,1 δ ⊗ vec
π
= κRp |Z| ,3 δ ⊗3 + 3κRp ,Rp2 κ|Z|,1 δ ⊗ vec .

5.16 Collect coefficients of δ ⊗3


   7   1/2
 p 3/2 2 2 3/2 2  p 3/2 2G−1 2 2
G−1 2 − −3
2 p−3 π π 2 (p − 2) (p − 3) π π
 p 3/2    3/2
2p − 7 2
2 G−1 (p) G2−1 − ,
2 (p − 2) (p − 3) π

and 2 (p/π)3/2 G−1 (p) with coefficient

2 3 2p − 7
− − = 0,
p − 3 (p − 2) (p − 3) (p − 2) (p − 3)
2 3 2p − 7 2p − 4 − 3 − 2p + 7
− − = .
p − 3 (p − 2) (p − 3) (p − 2) (p − 3) (p − 2) (p − 3)
406 Solutions


Coefficient of δ ⊗3 is 2 (p/π)3/2 G3−1 (p) − 1/ (p − 3) 2/π.

Chapter 6

6.1

d
Yi2 = vecY Y = Y⊗2 (vecId ) = (vecId ) Y⊗2 ,
i=1
      
Y Y Y = (vecId ) Y⊗2 ⊗ Y = (vecId ) ⊗ Id Y⊗3 ,
     
Y Y Y = Y ⊗ (vecId ) Y⊗2 = Id ⊗ (vecId ) Y⊗3 ,

 d 

d 
d
 

d
⊗2
 ⊗2

Y Y= Yi2 = ei Y ⊗ ei Y = ei Y⊗2 = ei Y⊗2
i=1 i=1 i=1 i=1

= (vecId ) Y⊗2 .

6.2

d 
d 
d
L−1 ⊗2
22 vec Id = 3 e⊗4
i + e⊗2 ⊗2
i ⊗ ej + ei ⊗ e⊗2
j ⊗ ei
i=1 i,j =1,i=j i,j =1,i=j


d
+ ei ⊗ ej ⊗ ei ⊗ ej ,
i,j =1,i=j

L−1 −1 −1
22 = Id 4 + K(1324) + K(1423) = Id 4 + K(1324) + K(1342) = Id 4 + Kp1 + Kp2 ,


d 
d
vec⊗2 Id = e⊗4
i + e⊗2 ⊗2
i ⊗ ej ,
i=1 i,j =1,i=j

d  d
Kp1 vec ⊗2
Id = e⊗4
i + ei ⊗ ej ⊗ ei ⊗ ej
i=1 i,j =1,i=j

   d d
Kp2 vec⊗2 Id = Id ⊗ Kd 2 •d (vecId )⊗2 = e⊗4
i + ei ⊗ e⊗2
j ⊗ ei .
i=1 i,j =1,i=j

6.3 κ ⊗
4 is 4-symmetrical
 2
(vecId )⊗2 a⊗4 = (vecId ) a⊗2 (vecId ) a⊗2 = a a = a⊗2 a⊗2 = a⊗4 vecId 2 .
Solutions 407

6.4


d 
d 
d
L−1 ⊗2
22 vec Id = 3 e⊗4
i + e⊗2 ⊗2
i ⊗ ej + ei ⊗ e⊗2
j ⊗ ei
i=1 i,j =1,i=j i,j =1,i=j


d
+ ei ⊗ ej ⊗ ei ⊗ ej ,
i,j =1,i=j
  d  
  
Id 2 ⊗ (vecId ) L−1 ⊗2
22 vec Id = Id 2 ⊗ e⊗2
i L−1 ⊗2
22 vec Id
i=1
= 3vecId + (d − 1) vecId = (d + 2) vecId .

6.5
        
vec Y Y YY = Y Y Y⊗2 = Y Y ⊗ Y⊗2 = Y⊗2 vecId ⊗ Y⊗2
   
= (vecId ) Y⊗2 Y⊗2 = (vecId ) ⊗ Id 2 Y⊗4
 
= Y⊗2 ⊗ Y⊗2 vecId
   
= Y⊗2 ⊗ (vecId ) Y⊗2 = Id 2 ⊗ (vecId ) Y⊗4 ,

(vecId ) a⊗2 is scalar


   
(vecId ) ⊗ Id 2 a⊗4 = (vecId ) a⊗2 ⊗ a⊗2 = Id 2 a⊗2 ⊗ (vecId ) a⊗2 ,
 
= Id 2 ⊗ (vecId ) a⊗4 ,

  
vecB (Y) = Id 2 ⊗ (vecId ) κ ⊗4 + L−1
22 vec⊗2
Id − (d + 2) vecId
    −1 ⊗2
= Id 2 ⊗ (vecId ) κ ⊗ 
4 + Id 2 ⊗ (vecId ) L22 vec Id − (d + 2) vecId
 
= Id 2 ⊗ (vecId ) κ ⊗
4.

6.6
( #  )
 2 !Y " E
2
Yi Y1
1
b (Y) = E Yi = # 2 .
Y2 E Yi Y2
References

[AA20] Adcock C, Azzalini A (2020) A selective overview of skew-elliptical and related


distributions and of their applications. Symmetry 12(1):118
[AB02] Arnold BC, Beaver RJ (2002) Skewed multivariate models related to hidden
truncation and/or selective reporting. Test 11(1):7–54
[AC99] Azzalini A., Capitanio A (1999) Statistical applications of the multivariate skew
normal distribution. J R Stat Soc B (Stat Methodol) 61(3):579–602
[AC03] Azzalini A., Capitanio A (2003) Distributions generated by perturbation of symmetry
with emphasis on a multivariate skew t-distribution. J R Stat Soc B (Stat Methodol)
65(2):367–389
[ADV96] Azzalini A, Dalla Valle A (1996) The multivariate skew-normal distribution.
Biometrika 83(4):715–726
[Aig12] Aigner M (2012) Combinatorial theory. Springer Science & Business Media, New
York
[AK83] Amari S-i, Kumon M (1983) Differential geometry of Edgeworth expansions in
curved exponential family. Ann Inst Stat Math 35(1):1–24
[And76] Andrews GE (1976) The theory of partitions. Addison-Wesley Publishing Co.,
Reading, 1976. Encyclopedia of Mathematics and its Applications, vol. 2
[And03] Anderson (2003) An introduction to multivariate statistical analysis. John Wiley &
Sons, New York
[AS92] Abramowitz M, Stegun IA (1992) Handbook of mathematical functions with
formulas, graphs, and mathematical tables. Dover Publications Inc., New York.
Reprint of the 1972 edition.
[AVFG18] Arellano-Valle RB, Ferreira CS, Genton MG (2018) Scale and shape mixtures of
multivariate skew-normal distributions. J Multivar Anal 166:98–110
[AVG05] Arellano-Valle RB, Genton MG (2005) On fundamental skew distributions. J Multivar
Anal 96(1):93–116
[AVGQ04] Arellano-Valle RB, Gómez HW, Quintana FA (2004) A new class of skew-normal
distributions. Commun Stat Theory Methods 33(7):1465–1480
[Azz05] Azzalini A (2005) The skew-normal distribution and related multivariate families.
Scand J Stat 32(2):159–188
[BA17] Brenn T, Anfinsen SN (2017) A revisit of the Gram-Charlier and Edgeworth series
expansions. Artikler, rapporter og annet (fysikk og teknologi), Department of Physics
and Technology, University of Tromsø, The Arctic University of Norway
[Bar91] Baringhaus L (1991) Testing for spherical symmetry of a multivariate distribution.
Annals Stat 19(2):899–917

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 409
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5
410 References

[BB86] Berkane M, Bentler PM (1986) Moments of elliptically distributed random variates.


Stat Probab Lett 4(6):333–335
[BBQ07] Balakrishnan N, Brito MR, Quiroz AJ (2007) A vectorial notion of skewness and its
use in testing for multivariate symmetry. Commun Stat Theory Methods 36(9):1757–
1767
[BH91] Baringhaus L, Henze N (1991) Limit distributions for measures of multivariate
skewness and kurtosis based on projections. J Multivar Anal 38(1):51–69
[BH92] Baringhaus L, Henze N (1992) Limit distributions for Mardia’s measure of multivari-
ate skewness. Ann Stat 20(4):1889–1902
[BKB+ 16] Brea O, El Khatib M, Bendazzoli GL, Evangelisti S, Leininger T, Angeli C (2016) The
spin-partitioned total-position spread tensor: An application to diatomic molecules. J
Phys Chem A 120(27):5230–5238
[BM98] Blinnikov S, Moessner R (1998) Expansions for nearly Gaussian distributions. Astron
Astrophys Suppl Ser 130(1):193–205
[BMP12] Berg A, McMurry T, Politis DN (2012) Testing time series linearity. In: Time series
analysis: methods and applications. Elsevier, New York, pp 27–42
[BNC79] Barndorff-Nielsen O, Cox DR (1979) Edgeworth and saddle-point approximations
with statistical applications. J R Stat Soc B 41(3):279–312 (1979). With discussion
[BNC89] Barndorff-Nielsen, OE, Cox DR (1989) Asymptotic techniques for use in statistics.
Monographs on statistics and applied probability. Chapman & Hall, London
[BNP79] Barndorff-Nielsen O, Pedersen BV (1979) The bivariate Hermite polynomials up to
order six. Scand J Stat 6(3):127–128
[BP07] Berg A, Politis DN (2007) Higher-order accurate polyspectral estimation with flat-top
lag-windows. Ann Inst Stat Math 61(2):477–498
[BR67] Brillinger DR, Rosenblatt M (1967) Asymptotic theory of k-th order spectra. In:
Harris B (ed) Spectral analysis of time series. Wiley, New York, pp 153–188
[Bri65] Brillinger DR (1965) An introduction to polyspectra. Ann Math Stat 36:1351–1374
[Bri69] Brillinger DR (1969) The calculation of cumulants via conditioning. Ann Inst Stat
Math 21(1):215–218
[Bri81] Brillinger DR (1981) Time series: data analysis and theory. Mc Graw Hill, New York,
expanded edition
[Bri82] Brillinger DR (1982) Asymptotic normality of finite Fourier transforms of stationary
generalized processes. J Multivar Anal 12(1):64–71
[Bri91] Brillinger DR (1991) Some history of the study of higher-order moments and spectra.
Stat Sin 1:465–476
[Bri01] Brillinger DR (2001) Time series; data analysis and theory. Society for Industrial and
Applied Mathematics (SIAM), Philadelphia, PA. Reprint of the 1981 edition
[Cam94] Cameron PJ (1994) Combinatorics: topics, techniques, algorithms. Cambridge
University Press, Cambridge
[Cap12] Capitanio A (2012) On the canonical form of scale mixtures of skew-normal
distributions. Preprint. arXiv:1207.0797
[Car62] Carlitz L (1962) The product of several Hermite or Laguerre polynomials. Monat-
shefte Math 66(5):393–396
[Car89] Cardoso JF (1989) Source separation using higher order moments. In: International
conference on acoustics, speech, and signal processing. IEEE, Piscataway, pp 2109–
2112
[CC13] Chakraborty AK, Chatterjee M (2013) On multivariate folded normal distribution.
Sankhya B 75(1):1–15
[CD10] Chacón JE, Duong T (2010) Multivariate plug-in bandwidth selection with uncon-
strained pilot bandwidth matrices. Test 19(2):375–398
[CECTBBG16] Costa AH, Enríquez-Caldera R, Tello-Bello M, Bermúdez-Gómez CR (2016)
High resolution time-frequency representation for chirp signals using an adaptive
system based on duffing oscillators. Digital Signal Process 55:32–43
References 411

[CGLM08] Comon P, Golub G, Lim LH, Mourrain B (2008) Symmetric tensors and symmetric
tensor rank. SIAM J Matrix Anal Appl 30(3):1254–1279
[Cha67] Chambers JM (1967) On methods of asymptotic approximation for multivariate
distributions. Biometrika 54(3–4):367–383
[Cra99] Cramér H (1999) Mathematical methods of statistics, vol 43. Princeton University
Press, Princeton
[DGP18] Domino K, Gawron P, Pawela Ł (2018) Efficient computation of higher-order
cumulant tensors. SIAM J Sci Comput 40(3):A1590–A1610
[Dha18] Dharmani BC (2018) Multivariate generalized Gram–Charlier series in vector nota-
tions. J Math Chem 56(6):1631–1655
[DL05] Dey DK, Liu J (2005) A new construction for skew multivariate distributions. J
Multivar Anal 95(2):323–344
[DL08] De Silva V, Lim LH (2008) Tensor rank and the ill-posedness of the best low-rank
approximation problem. SIAM J Matrix Anal Appl 30(3):1084–1127
[DLM] NIST Digital Library of Mathematical Functions. http://dlmf.nist.gov/, Release 1.0.17
of 2017-12-22. Olver FWJ, Olde Daalhuis AB, Lozier DW, Schneider BI, Boisvert
RF, Clark CW, Miller BR, Saunders BV (eds)
[DM77] Dobrushin RL, Minlos RA (1977) Polynomials in linear random functions. Russ Math
Surv 32(2):71–127
[DM79a] Dobrushin RL, Major P (1979) Non-central limit theorems for non-linear functionals
of Gaussian fields. Z Wahrsch Verw Gebiete 50:27–52
[DM79b] Dobrushin RL, Minlos RA (1979) The moments and polynomials of a generalized
random field. Theory Probab Appl 23(4):686–699
[DX14] Dunkl CF, Xu Y (2014) Orthogonal polynomials of several variables, vol 155.
Cambridge University Press, Cambridge
[Ela61] Elandt RC (1961) The folded normal distribution: Two methods of estimating
parameters from moments. Technometrics 3(4):551–562
[EMOT81] Erdélyi A, Magnus W, Oberhettinger F, Tricomi FG (1981) Higher transcendental
functions. Vol. II. Robert E. Krieger Publishing Co. Inc., Melbourne, FL. Based on
notes left by Harry Bateman, Reprint of the 1953 original
[Fel40] Feldheim E (1940) Expansions and integral-transforms for products of Laguerre and
Hermite polynomials. Q J Math 1(1):18–29
[Fel66] Feller W (1966) An Introduction of Probability Theory and its Application, vol II.
John Wiley, New York
[Fis30] Fisher RA (1930) Moments and product moments of sampling distributions. Proc
Lond Math Soc 2(1):199–238
[FKN17] Fang KW, Kotz S, Ng KW (2017) Symmetric multivariate and related distributions.
Chapman and Hall/CRC, Boca Raton
[Gen04] Genton MG (2004) Skew-elliptical distributions and their applications: a journey
beyond normality. CRC Press, Boca Raton
[GHL01] Genton MG, He L, Liu X (2001) Moments of skew-normal random vectors and their
quadratic forms. Stat Prob Lett 51(4):319–325
[GJ12] Glimm J, Jaffe A (2012) Quantum physics: a functional integral point of view.
Springer Science & Business Media, New York
[GL05] Genton MG, Loperfido NMR (2005) Generalized skew-elliptical distributions and
their quadratic forms. Ann Inst Stat Math 57(2):389–401
[Goo75] Good IJ (1975) A new formula for cumulants. Math Proc Camb Philos Soc 78(2):333–
337
[GR00] Gradshteyn IS, Ryzhik IM (2000) Table of integrals, series, and products, 6th edn.
Academic Press Inc., San Diego, CA. Translated from the Russian, Translation edited
and with a preface by Alan Jeffrey and Daniel Zwillinger
[Gra49] Grad H (1949) Note on N-dimensional Hermite polynomials. Commun Pure Appl
Math 2(4):325–330
412 References

[Gra18] Graham A (2018) Kronecker products and matrix calculus with applications. Courier
Dover Publications, New York
[Gup63] Gupta SS (1963) Bibliography on the multivariate normal integrals and related topics.
Ann Math Stat 34(3):829–838
[Hal00] Hald A (2000) The early history of the cumulants and the Gram–Charlier series. Int
Stat Rev 68(2):137–153
[Har06] Hardy M (2006) Combinatorics of partial derivatives. Electron J Combin 13(1):1
[Hen02] Henze N (2002) Invariant tests for multivariate normality: a critical review. Stat Pap
43(4):467–506
[Hid80] Hida T (1980) Brownian motion. Springer, New York
[Hol85] Holmquist B (1985) The direct product permuting matrices. Linear Multilinear
Algebra 17(2):117–141
[Hol88] Holmquist B (1988) Moments and cumulants of the multivariate normal distribution.
Stoch Anal Appl 6(3):273–278
[Hol96a] Holmquist B (1996) The d-variate vector Hermite polynomial of order. Linear Algebra
Appl 237/238:155–190
[Hol96b] Holmquist B (1996) Expectations of products of quadratic forms in normal variables.
Stoch Anal Appl 14(2):149–164
[HS79] Henderson HV, Searle SR (1979) Vec and vech operators for matrices, with some uses
in jacobians and multivariate statistics. Canad J Stat 7(1):65–81
[HS81] Henderson HV, Searle SR (1981) The vec-permutation matrix, the vec operator and
Kronecker products: a review. Linear Multilinear Algebra 9(4):271–288
[Iso82] Isogai T (1982) On a measure of multivariate skewness and a test for multivariate
normality. Ann Inst Stat Math 34(1):531–541
[Iss18] Isserlis L (1918) On a formula for the product-moment coefficient of any order of a
normal frequency distribution in any number of variables. Biometrika 12(1/2):134–
139
[Jam58] James GS (1958) On moments and cumulants of systems of statistics. Sankhyā Indian
J Stat 20:1–30
[JM62] James GS, Mayne AJ (1962) Cumulants of functions of random variables. Sankhyā
Indian J Stat A 24:47–54
[JST06] Jammalamadaka S, Subba Rao T, Terdik Gy (2006) Higher order cumulants of random
vectors and applications to statistical inference and time series. Sankhya (A Methodol)
68:326–356
[JTT20] Jammalamadaka SR, Taufer E, Terdik GyH (2020) On multivariate skewness and
kurtosis. Sankhya A 83(2):1–38
[JTT21a] Jammalamadaka SR, Taufer E, Terdik GyH (2021) Asymptotic theory for statistics
based on cumulant vectors with applications. Scand J Stat 48(2):708–728
[JTT21b] Jammalamadaka SR, Taufer E, Terdik GyH (2021) Cumulants of multivariate sym-
metric and skew symmetric distributions. Symmetry 13(8):1383
[KBF+ 15] Khatib ME, Brea O, Fertitta E, Bendazzoli GL, Evangelisti S, Leininger T (2015) The
total position-spread tensor: Spin partition. J Chem Phys 142(9):094113
[Ken44] Kendall MG (1944) The advanced theory of statistics. Vol. I, J. B. Lippincott Co.,
Philadelphia
[Kim08] Kim HM (2008) A note on scale mixtures of skew normal distribution. Stat Probab
Lett 78(13):1694–1701
[Kla02] Klar B (2002) A treatment of multivariate skewness, kurtosis, and related statistics. J
Multivar Anal 83(1):141–165
[KM97] Kostrikin A, Manin Y (1997) Linear algebra and geometry. Breach Science Publishers
[KM03] Kim HJ, Mallick BK (2003) Moments of random vectors with skew t distribution and
their quadratic forms. Stat Probab Lett 63(4):417–423
[Kol08] Kollo T (2008) Multivariate skewness and kurtosis measures with an application in
ICA. J Multivar Anal 99(10):2328–2338
References 413

[Koz87] Koziol JA (1987) An alternative formulation of Neyman’s smooth goodness of fit tests
under composite alternatives. Metrika 34(1):17–24
[Koz89] Koziol JA (1989) A note on measures of multivariate kurtosis. Biom J 31(5):619–624
[KRYAV16] Kahrari F, Rezaei M, Yousefzadeh F, Arellano-Valle RB (2016) On the multivariate
skew-normal-Cauchy distribution. Stat Probab Lett 117:80–88
[KS77] Kendall M, Stuart A (1977) The advanced theory of statistics, Vol. 1: Distribution
theory, 4th edn. Griffin, London
[KS05] Kollo T, Srivastava MS (2005) Estimation and testing of parameters in multivariate
Laplace distribution. Commun Stat Theory Methods 33(10):2363–2387
[Kv95] Kollo T, von Rosen D (1995) Minimal moments and cumulants of symmetric matrices:
An application to the wishart distribution. J Multivar Anal 55(2):149–164
[KvR06] Kollo T, von Rosen D (2006) Advanced multivariate statistics with matrices, vol 579.
Springer Science & Business Media, New York
[Leo64] Leonov VP (1964) Nekotorye primeneniya starshikh semiinvariantov k teorii statsion-
arnykh sluchainykh protsessov. Izdat. “Nauka”, Moscow
[LHL18] Luan X, Huang B, Liu F (2018) Higher order moment stability region for Markov
jump systems based on cumulant generating function. Automatica 93:389–396
[Lop14] Loperfido N (2014) A note on the fourth cumulant of a finite mixture distribution. J
Multivar Anal 123:386–394
[LS59] Leonov VP, Shiryaev AN (1959) On a method of calculation of semi-invariants.
Theory Probab Appl 4(3):319–329
[Luk55] Lukacs E (1955) Applications of Faa di Bruno’s formula in statistics. Am Math
Monthly 62:340–348
[Luk70] Lukacs E (1970) Characteristic functions. Griffin
[MA73] Malkovich JF, Afifi AA (1973) On tests for multivariate normality. J Am Stat Assoc
68(341):176–179
[Ma09] Ma TW (2009) Higher chain formula proved by combinatorics. Electron J Combin
16(1):N21
[Mac74] MacRae EC (1974) Matrix derivatives with an application to an adaptive linear
decision problem. Ann Stat 2:337–346
[Maj81] Major P (1981) Multiple Wiener–Itô integrals, Lecture Notes in Mathematics, vol 849,
2nd, 2014 edn. Springer, New York
[Mal78] Malahov AN (1978) Kumulyantnyiy analiz sluchaynyih negaussovyih protsessov i
ih preobrazovaniy [Cumulant analysis of random non-Gaussian processes and their
transformations]. Sov. Radio Publ, Moskow, P 376
[Mal80] Malyshev VA (1980) Cluster expansions in lattice models of statistical physics and
the quantum theory of fields. Uspekhi Mat Nauk 35(2(212)):2–53
[Mar70] Mardia KV (1970) Measures of multivariate skewness and kurtosis with applications.
Biometrika 57:519–530
[MC86] McCullagh P, Cox DR (1986) Invariants and likelihood ratio statistics. Ann Stat
14(4):1419–1430
[McC87] McCullagh P (1987) Tensor methods in statistics. Monographs on statistics and
applied probability. Chapman & Hall, London
[McC18] McCullagh P (2018) Tensor methods in statistics. Courier Dover Publications, New
York
[McK52] McKean HPJ (1952) A new proof of the completeness of the hermite functions. Tech.
rep., Mathematical Notes
[Mei05] Meijer E (2005) Matrix algebra for higher order moments. Linear Algebra Appl
410:112–134
[Mia19] Miatto FM (2019) Recursive multivariate derivatives of arbitrary order. Preprint.
arXiv:191111722
[MM85] Malyshev VA, Minlos RA (1985) Gibbs random fields. Nauka, Moscow
414 References

[MN80] Magnus JR, Neudecker H (1980) The elimination matrix: Some lemmas and applica-
tions. SIAM J Algebraic Discrete Methods 1(4):422–449
[MN99] Magnus JR, Neudecker H (1999) Matrix differential calculus with applications in
statistics and econometrics. John Wiley & Sons Ltd., Chichester. Revised reprint of
the 1988 original
[MNBO09] Michalowicz JV, Nichols JM, Bucholtz F, Olson CC (2009) An Isserlis’ theorem
for mixed Gaussian variables: Application to the auto-bispectral density. J Stat Phys
136(1):89–102
[Mor82] Morris CN (1982) Natural exponential families with quadratic variance functions. Ann
Stat 10(1):65–80
[MP92] Mathai AM, Provost SB (1992) Quadratic forms in random variables: theory and
applications. Dekker
[MRS94] Móri TF, Rohatgi VK, Székely GJ (1994) On multivariate skewness and kurtosis.
Theory Probab Appl 38(3):547–551
[Mui09] Muirhead RJ (2009) Aspects of multivariate statistical theory, vol 197. John Wiley &
Sons, New York
[Neu69] Neudecker H (1969) Some theorems on matrix differentiation with special reference
to Kronecker matrix products. J Am Stat Assoc 64(327):953–963
[NGS08] Di Nardo E, Guarino G, Senato D (2008) Symbolic computation of moments of
sampling distributions. Comput Stat Data Anal 52(11):4909–4922
[Nor84] Northcott DG (1984) Multilinear algebra. Cambridge University Press, Cambridge
[NR03] Noschese S, Ricci PE (2003) Differentiation of multivariable composite functions and
bell polynomials. J Comput Anal Appl 5(3):333–340
[NS09] Di Nardo E, Senato D (2009) The eleventh and twelveth problems of rota’s fubini
lectures: from cumulants to free probability theory. In: From combinatorics to
philosophy, Springer US, New York, pp 91–130
[OST18] Orlowski P, Sali A, Trojani F (2018) Arbitrage free dispersion. SSRN Electron J, 78.
https://doi.org/10.2139/ssrn.3314269
[PP01] Psarakis S, Panaretos J (2001) On some bivariate extensions of the folded normal and
the folded-t distributions. J Appl Stat Sci 10(2):119–136
[PS72] Pólya Gy, Szegő G (1972) Problems and theorems in analysis, vol I. Springer, Berlin,
Heidelberg
[PT11] Peccati G, Taqqu MS (2011) Wiener chaos: moments, cumulants and diagrams: A
survey with computer implementation, vol 1. Springer Science & Business Media,
New York
[Rah17] Rahman S (2017) Wiener–Hermite polynomial expansion for multivariate gaussian
probability measures. J Math Anal Appl 454(1):303–334
[RB89] Rayner JCW, Best DJ (1989) Smooth tests of goodness of fit. Oxford statistical science
series. Oxford University Press, New York
[Rob16] Robeva E (2016) Orthogonal decomposition of symmetric tensors. SIAM J Matrix
Anal Appl 37(1):86–102
[RS00] Rota G, Shen J (2000) On the combinatorics of cumulants. J Combin Theory A
91:283–304. http://www.idealibrary.com
[Sav06] Savits TH (2006) Some statistical applications of Faa di Bruno. J Multivar Anal
97(10):2131–2140
[Sch19] Schumann A (2019) Multivariate bell polynomials and derivatives of composed
functions. Preprint. arXiv:190303899
[SDB03] Sahu SK, Dey DK, Branco MD (2003) A new class of multivariate skew distributions
with applications to Bayesian regression models. Can J Stat 31(2):129–150
[Ser04] Serfling RJ (2004) Multivariate symmetry and asymmetry. Encyclopedia Stat Sci
8:5338–5345
[Shi60] Shiryaev AN (1960) Some problems in spectral theory of higher order- moments I.
Theor Prob Appl 5:293–313
[Shi96] Shiryaev AN (1996) Probability, vol 95. Springer, New York
References 415

[Sho10] Shopin SA (2010) Cubic mapping of the normal random vector. Izvestiya Tula State
Univ 2010(2):211 – 221
[Shu16] Shushi T (2016) A proof for the conjecture of characteristic function of the generalized
skew-elliptical distributions. Stat Probab Lett 119:301–304
[Sko86] Skovgaard I (1986) A note on the differentiation of cumulants of log likelihood
derivatives. Int Stat Rev 54(1):29–32
[Son01] Song KS (2001) Rényi information, loglikelihood and an intrinsic distribution mea-
sure. J Stat Plan Inference 93(1–2):51–69
[Spe83] Speed TP (1983) Cumulants and partition lattices. Austral J Stat 25(2):378–388
[Spe86a] Speed TP (1986) Cumulants and partition lattices. II. Generalised k-statistics. J
Austral Math Soc Ser A 40(1):34–53
Speed TP (1986) Cumulants and partition lattices. III. Multiply-indexed arrays. J
Austral Math Soc Ser A 40(2):161–182
[Spe86c] Speed TP (1986) Cumulants and partition lattices. IV. A.s. convergence of generalised
k-statistics. J Austral Math Soc Ser A 41(1):79–94
[Spe90] Speed TP (1990) Invariant moments and cumulants. In: Coding theory and design
theory, Part II, IMA Vol. Math. Appl., vol 21, Springer, New York, pp 319–335
[Sri84] Srivastava MS (1984) A measure of skewness and kurtosis and a graphical method for
assessing multivariate normality. Stat Probab Lett 2(5):263–267
[SS88a] Speed TP, Silcock HL (1988) Cumulants and partition lattices. V. Calculating
generalized k-statistics. J Austral Math Soc Ser A 44(2):171–196
[SS88b] Speed TP, Silcock HL (1988) Cumulants and partition lattices. VI. Variances and
covariances of mean squares. J Austral Math Soc Ser A 44(3):362–388
[ST08] Swe M, Taniguchi M (2008) Higher-order asymptotic properties of a weighted
estimator for Gaussian ARMA processes. J Time Series Anal 12(1):83–93
[Ste93] Steyn HS (1993) On the problem of more than one kurtosis parameter in multivariate
analysis. J Multivar Anal 44(1):1–22
[Sut86] Sutradhar BC (1986) On the characteristic function of multivariate Student t-
distribution. Can J Stat/La Revue Canadienne de Statistique 14(4):329–337
[Sze36] Szegő G (1936) Orthogonal polynomials. American Mathematical Society, Collo-
quium Publ., vol XXIII. American Mathematical Society, New York
[Taq75] Taqqu MS (1975) Weak convergence to fractional Brownian motion and to the
Rosenblatt process. Z Wahrsch verwandte Geb 31:287–302
[Ter02] Terdik Gy (2002) Higher order statistics and multivariate vector hermite polynomials
for nonlinear analysis of multidimensional time series. Teor Ver Matem Stat (Teor
Imovirnost ta Matem Statyst) 66:147–168
[Thi97] Thiélé TN (1897) Elementaer Iagttagelseslaere. København: Gyldendalske. Reprinted
in English as ‘The Theory of Observations’ Ann. Math. Statist. (1931) 2:165–308
[VN94] Voinov VG, Nikulin M (1994) On power series, bell polynomials, Hardy-Ramanujan-
Rademacher problem and its statistical applications. Kybernetika 30(3):343–358
[VN97] Voinov VG, Nikulin MS (1997) On a subset sum algorithm and its probabilistic and
other applications. Birkhäuser Boston, Boston, MA, pp 153–163
[VPO19] Vicuña MI, Palma W, Olea R (2019) Minimum distance estimation of locally
stationary moving average processes. Comput Stat Data Anal 140:1–20
[Wat38] Watson GN (1938) A note on the polynomials of Hermite And Laguerre. J Lond Math
Soc 1(1):29–32
[Wei15] Weis M (2015) Multi-dimensional signal decomposition techniques for the analysis
of eeg data. PhD thesis, Universitätsbibliothek Ilmenau
[Whi53] Whittle P (1953) The analysis of multiple stationary time series. J R Stat Soc Ser B
15:125–139
[Wit84] Withers CS (1984) A chain rule for differentiation with applications to multivariate
hermite polynomials. Bull Austral Math Soc 30(2):247–250
416 References

[Wit00] Withers CS (2000) A simple expression for the multivariate Hermite polynomials. Stat
Probab Lett 47(2):165–169
[Wün00a] Wünsche A (2000) Corrigenda: “General Hermite and Laguerre two-dimensional
polynomials”. J Phys A 33(17):3531
[Wün00b] Wünsche A (2000) General Hermite and Laguerre two-dimensional polynomials. J
Phys A 33(8):1603–1629
Index

B via moments and cumulants, 149


Bell numbers, 31, 67, 353
Bell polynomial, 34, 67, 137, 226, 352
incomplete, 32, 91, 137, 351 D
Bracketing, 383 Diagram, 44
closed, 194, 220
without loops, 192, 219
C Distinct values principle, 64, 67, 68, 70, 85,
Characteristic function, 108, 112, 226 88, 108, 110, 116, 119, 186, 206
elliptically symmetric, 253 Distribution
gamma, 119 CFUSN, 247
multivariate Laplace, 294 CFUSS, 266
normal, 109, 113 elliptically symmetric, 252
skew-normal, 245 gamma, 119, 195
Commutator matrix, 7, 11, 353 Laplace, 294
commutator for T–products, 10 normal, 122
H-commutator, 205, 225, 226, 357 skew-normal, 241
J-commutator, 208, 358 skew-normal-Cauchy, 287
mixing commutator, 218, 356 skew-t, 277
moment commutator, 95, 96, 137, 353 SMSN, 285
Cumulant, 118 spherically symmetric, 297
CFUSN, 248 Double factorial, 368
conditional, 154 Duplication, 18, 21
gamma, 119
Hermite polynomials, 193
multi-Laplace, 296 E
normal, 120, 122 Elimination, 21
parameter, 257 Equivalence in symmetry, 208, 366, 382
of products, 143 Exponent, 382
properties, 124
skew-normal, 244
skew-t, 278 G
SNC, 290 Gaussian
T-Hermite polynomials, 220 system, 185
via moments, 126 Graph

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 417
G. Terdik, Multivariate Statistical Methods, Frontiers in Probability
and the Statistical Sciences, https://doi.org/10.1007/978-3-030-81392-5
418 Index

closed, 43 lattice structure, 37


without loop, 44 matrix, 27
number of, 30
set Partitions, 26
H type, 32
Hadamard power, 252 Permutations, 1, 383
Hermite polynomial canonical cycle, 3
inversion formula, 198 cycle notation, 3, 381
one variable, 183 matrix, 4
product, 197
several variable, 186, 206
Hilbert space R
linear, 185 Repetition, 383
nonlinear, 185

S
I Semifactorial, 381
Inner product, 382 Solutions, 385
Star product, 51
Subscription, 384
M Symmetrizer, 14
Mill’s ratio, 241
Moment
of Gaussian system, 190, 216 T
of Hermite polynomial, 192, 217 Tensor product, 5
of order m, 108 Type
via cumulants, 138 multi-index, 19, 186
Multi-index, 16 partition, 32, 33, 45, 65, 66, 90, 91, 131,
Multilinear Algebra, 13 135, 156

P V
Partitions, 26, 383 Vec operator, 6
indecomposable, 38, 144, 148

You might also like