Forecasting Missing Values with Time-Varying High Dimensional Hierarchical Density Models: Applications to Credit Curves∗

Yonas Khanna 1,3, Andre Lucas 2,3

1 Financial Risk Model Development, ING Bank, The Netherlands
2 Tinbergen Institute, The Netherlands
3 School of Business and Economics, Vrije Universiteit Amsterdam, The Netherlands

This version: October 29, 2021

Abstract

We propose a dynamic multivariate Gaussian model for the conditional imputation of missing values in high dimensional panel data. The key novelty of our proposed imputation model is the utilization of serial correlation and the exploitation of hierarchical factor structures in the panel for estimating its missing values in terms of in-sample density forecasts. The model adopts generalized autoregressive score dynamics to exploit the conditionally latent factor structure in the cross-sectional mean and variance matrices of p-dimensional observation vectors through time. We identify multiple time-varying common and idiosyncratic location and scale factors to predict the in-sample conditional density of missing values. Our hierarchical density model permits the estimation of factors when there is at least one observation present in the panel's cross-sections and accommodates very general missing data patterns. A simulation-based algorithm is developed to quantify the forecast uncertainty in the estimated factors at sparse time points. The reliability of our imputation methodology is investigated by means of Monte Carlo experiments on both simulated and empirical data. We provide an empirical illustration of the dynamic hierarchical density model for imputing large gaps in a high-dimensional panel of daily global credit curves.

Keywords: Missing value imputation, hierarchical dynamic factor models, forecasting conditional mean and variance, common and idiosyncratic factors, high-dimensional panel data
JEL classifications: ...


∗ Khanna thanks ING Bank for the financial support and Markit™ for the credit curves data.
1 Introduction
Quantitative applications require complete data sets, but real-world data are often only partially observed. The missing data problem is frequently encountered in many areas of empirical research. Large longitudinal surveys in the social sciences or cohort studies in biology may include missing entries due to attrition, whereas the dependent variables in economic panel data can also have missing observations because the data are measured at different frequencies or are not available across all spatial indicators (e.g., region and sector). In finance, cross-sections of returns and credit curves can be sparse as a result of bankruptcies or a shortage of liquid quotes. In policy making and risk management practices, incomplete historical time-series introduce complications. The missingness problem is particularly burdensome for the computation of risk metrics because such metrics require long time-series histories, often as long as 10 years (e.g., value-at-risk). Missing entries in data sets are typically imputed instead of being removed by dropping rows or columns, in order to prevent loss of information. The selection of a proper estimation method is critical for the accuracy and reliability of imputations, and estimating missing values is therefore far from trivial (Cahan et al., 2021).
This paper introduces a framework to impute the entire density of missing entries
in panel data of large dimensions. We propose a novel approach to predict the location
and scale of missing observations by a dynamic multivariate Gaussian time-series model
with multiple unobserved common and idiosyncratic location and scale components. The
cross-sections means and variances of the multivariate model follow a latent hierarchical
multi-factor structure through time. For the estimation of dynamic unobserved loca-
tion/scale factors in presence of missing values, we develop a new observation-driven fil-
ter. The proposed filter permits the signal extraction of common components when there
is at least one observation present in the panel’s cross-section. This gives researchers the
opportunity to predict the densities of missing values with very few observations. We fur-
ther develop a simulation algorithm to measure the uncertainty of the estimated location
and scale factors for the missing observations. We show that our framework performs
well for a variety of missing data scenarios and at very high missing data rates.
Regression-based imputation methods are often used in practice to complete large
time-series panels. In particular, factor models are used to exploit the comovement in
cross-sections of the panel and utilize a set of common components to estimate missing
values. The factors are assumed to be known during the imputation process and are

pre-estimated by principal component analysis (PCA). However, the estimation of PCA factors in the presence of missing data is a challenging task. Recent work in econometrics by Jin et al. (2021), Xiong & Pelger (2019), and Bai & Ng (2021) has proposed
different PCA factor-based methods to consistently estimate the common component
when the factor structure is strong within the time-series system. In Cahan et al. (2021)
additional algorithms are introduced to efficiently estimate the PCA factors on sparse
panels and provide the distribution theory needed to conduct inference.
While regression models with known factors are efficient and easily scalable for the
imputation of large data panels, such models ignore the presence of time-dependence
among (cross-sectional) time-series observations. That is, these factor models do not
exploit the predictive power present in the sparse panel data. This is especially important
in the context of time-series imputation because the factors can be predictable at sparse
time points, given past observed values. In addition, the aforementioned methods only
predict the mean of the missing observations, and not their potentially time-varying
variance matrix. The so-called factor-based 'residual overlay' algorithm in Cahan et al. (2021) does permit the estimation of the covariance matrix of panels in the presence of missing entries, but the respective matrix is assumed to be constant. It is well known that the variances of economic and financial time-series are subject to time variation; constant variances are thus not sufficient for risk management practices that also require knowledge of the time-varying volatility of missing observations.
Another shortcoming of the current factor-based imputation methods is that they do not apply to all missing data mechanisms, with the exception of the imputation frameworks of Xiong & Pelger (2019) and Bai & Ng (2021). Their PCA estimators are applicable to both missing at random (MAR) and not missing at random (NMAR) mechanisms, such as block-missing and staggered-missing patterns. However, the performance of their estimators is only tested for missing data patterns in which an entry in the panel data is much likelier to be observed than missing. In practice, time-series systems exist in which entries are much likelier to be missing than observed. For example, credit default swap (CDS) curves are highly illiquid and, for the majority of firms, the credit curves do not exist at all (EBA, 2015).
Our econometric framework overcomes the above shortcomings and provides three
improvements to handle missing values. First, we relax the assumption of imposing a
constant variance matrix for the panel series and specify a dynamic multivariate Gaussian
density model whose cross-sections of time-varying mean vectors and diagonal variance

matrices both follow a factor structure. The factor structure, in mean and variance, is in-
duced by a pre-specified design matrix that relates the factors to the time-varying means
and variances of the multivariate Normal distribution. To be more precise, the design
matrix includes dummy variables and acts as a factor-loading matrix to decompose the
cross-sectional means and variances into independent location and scale factors, respec-
tively. The matrix can include both common and idiosyncratic dummies, but may also impose a complex hierarchical structure through a spatial decomposition, e.g., via region, sector, and rating dummies. Additionally, the proposed model allows the design matrix
to be time-varying as well. As such, the dynamic factor-based density model provides the
econometrician a high degree of flexibility to decompose the cross-sections of the panel
data.
Next, in our modeling framework, the factors are assumed to be time-varying un-
observed components. We adopt the generalized autoregressive score (GAS) framework
of Creal et al. (2013) to filter the time-varying factors. Thereby, the dynamic factors
evolve as conditionally latent variables, i.e., the model is observation-driven. The con-
ditional observation density of our GAS location-scale model is multivariate Gaussian
with a time-varying mean and time-varying diagonal variance matrix. Naturally, the
time-dependence feature of time-series is modeled by the GAS filter when all series are
completely observed, but in the presence of missing values, the GAS filter needs to be
modified. To permit the estimation of the dynamic factors in the presence of missing
values, we extend the multi-state GAS filter of Creal et al. (2013) with the same rules
that apply to the Kalman filter when dealing with missing observations, see Durbin &
Koopman (2012) for a detailed overview. As a result, our extension efficiently utilizes only the available values in the cross-section to estimate the factors through time. Since our approach uses signal extraction techniques for the treatment of missing values, it is closely related to the parameter-driven dynamic factor models of, e.g., Stock & Watson (2016) and Jungbacker et al. (2011). However, our imputation model differs significantly from their state space factor models because we also filter the time-varying volatility and not just the dynamic mean.
Last, the forecasting framework is independent of the missing data mechanism because
our modification of the GAS filter allows for the estimation of common components when
there is at least one time-series observation present in the panel’s cross-section. As a
result, our imputation mechanism accommodates very general missing data patterns and
in contrast to other frameworks, our factor estimator, i.e., the GAS filter, requires far

fewer observations. In addition, the remaining location and scale factors that cannot
be estimated due to lack of information are extrapolated, just like for out-of-sample
forecasting. However, the score-driven updating process immediately continues when
new information arrives. Since the location-scale GAS factors are completely revealed by
past observations, i.e., the factors are perfectly predictable one step-ahead, our filtered
means and variances are effectively in-sample density forecasts at missing entries. Along
this line of reasoning, we quantify the uncertainty of predicted factors at sparse time
points via simulation-based in-sample forecast bands. Such a prediction mechanism for
treating missing data justifies the claim of forecasting missing values given in the title of
the paper.
The remainder of the paper is organized as follows. Section 2 introduces the dynamic
location-scale model and provides an overview of empirically relevant designs for its fac-
tor loading matrix, and further introduces the extended GAS filter and our simulation
algorithm to construct in-sample forecast bands. In Section 3, an extensive Monte Carlo
experiment is performed to study the performance of our model across a range of missing
data mechanisms. A similar experiment is conducted in Section 4 to verify the strong
performance of our framework on empirical data. Section 4 also includes our main case
study to highlight that the dynamic density model scales well to high dimensions. In
particular, we forecast missing observations present within the time-series of more than
140 credit curves (daily observations over 10 years), whilst using a hierarchical design
matrix that includes nearly 100 common and idiosyncratic location and scale factors.

2 Econometric Framework

2.1 Classification of missing data


Let $y_t = (y_{1,t}, \ldots, y_{p,t})' \in \mathbb{R}^p$ be the $p$-dimensional time-series observation vector at time $t$. Suppose that $y_t$ includes missing entries. For this, a $p$-dimensional indicator function $I_t = (I_{1,t}, \ldots, I_{p,t})'$ is defined whose $i$-th element takes the value $I_{i,t} = 1$ if $y_{i,t}$ is observed and $I_{i,t} = 0$ when $y_{i,t}$ is missing, for $i = 1, \ldots, p$ and $t = 1, \ldots, n$. Missing data are typically assumed to be missing at random (MAR). Under the MAR assumption, the missing data mechanism $I_{i,t}$ is a Bernoulli distributed random variable with probability $\pi_i$ for $I_{i,t} = 1$ and probability $1 - \pi_i$ for $I_{i,t} = 0$. In addition, $\{y_t\}_{t=1}^{n}$ is assumed to be independent of $\{I_t\}_{t=1}^{n}$ and the individual MAR probabilities $\pi_1, \ldots, \pi_p$ are also independent from each other, i.e., the missingness probability is only the same within each time-series.
However, missing values cannot always be classified as MAR because time-series may
include systematic missing patterns. For example, time-series variables may include gaps
and open ends because the prices of financial instruments are illiquid, or simply because
the variable did not exist yet and/or its evolution stopped. Such missing data patterns
imply that the missing data mechanism is non-random, i.e., the missing observations
are not driven by a stochastic It . Since missing entries are pre-determined when data
is not MAR, we classify such time-series as being conditionally missing. In particular,
the missing data mechanism is then driven by a deterministic sequence It , whose i-th
element takes the value Ii,t = 0 at pre-determined time points t = τ, . . . , τ ∗ and Ii,t = 1
otherwise. The conditionally missing situation also includes holidays on which data are
not observed because holidays do not take place randomly.
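For concreteness, the two mechanisms can be mimicked as follows; the code below is an illustrative Python sketch (function names and parameter choices are ours, not part of the model).

```python
# A minimal sketch of the two missingness mechanisms of Subsection 2.1, using NumPy only.
import numpy as np

def mar_indicator(n, p, pi, seed=0):
    """Random (MAR) pattern: I[t, i] = 1 (observed) with probability pi[i]."""
    rng = np.random.default_rng(seed)
    pi = np.broadcast_to(np.asarray(pi, dtype=float), (p,))
    return (rng.random((n, p)) < pi).astype(int)          # n x p indicator matrix

def conditional_indicator(n, p, tau, tau_star):
    """Deterministic pattern: I[t, i] = 0 on the pre-determined window t = tau, ..., tau_star."""
    I = np.ones((n, p), dtype=int)
    I[tau:tau_star + 1, :] = 0                             # gap / open end set to missing
    return I

# Example: p = 4 series over n = 2500 periods, 25% observation probability,
# versus a deterministic gap in the middle of the sample.
I_mar = mar_indicator(n=2500, p=4, pi=0.25)
I_cond = conditional_indicator(n=2500, p=4, tau=1000, tau_star=1499)
print(I_mar.mean(), I_cond.mean())                         # fractions observed
```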

2.2 Dynamic multivariate conditional location-scale factor model


Let $y_t = (y_{1,t}, \ldots, y_{p,t})' \in \mathbb{R}^p$ be the $p$-dimensional time-series observation vector at time $t$ and define $f_t$ as the vector of time-varying parameters. The information set up to time $t$ consists of $\mathcal{F}_t = \{y_t, y_{t-1}, \ldots, y_1\}$. We assume that the time-series data $y_t$ follow a Gaussian distribution with a time-varying mean vector $\mu_t = (\mu_{1,t}, \ldots, \mu_{p,t})'$ and diagonal time-varying variance matrix $\Sigma_t = \mathrm{diag}(\sigma_t^2)$, with variance vector $\sigma_t^2 = (\sigma_{1,t}^2, \ldots, \sigma_{p,t}^2)'$. The conditional distribution of $y_t$ takes the form
$$
p(y_t \mid f_t, \mathcal{F}_{t-1}; \psi) = \frac{\exp\left(-\tfrac{1}{2}(y_t - \mu_t)' \Sigma_t^{-1} (y_t - \mu_t)\right)}{(2\pi)^{p/2}\,|\Sigma_t|^{1/2}}, \qquad (1)
$$
where $\psi$ is the static parameter vector and, together with $\mathcal{F}_{t-1}$, the vector of dynamic parameters $f_t$ fully specifies the observation density $p(\cdot)$. In particular, we assume that each cross-section $y_t$ follows a strong factor structure that decomposes the data into $m$ unobserved mean and log-variance components: $f_t^{\mu} = (f_{1,t}^{\mu}, \ldots, f_{m,t}^{\mu})'$ and $f_t^{\log\sigma^2} = (f_{1,t}^{\log\sigma^2}, \ldots, f_{m,t}^{\log\sigma^2})'$. The unobserved mean and log-variance components may
consist of both common and idiosyncratic factors and are independent from each other.
The design of each cross-sectional decomposition for the mean and variance vectors is
governed by a pre-specified matrix, say the p × m factor loading matrix Ft , which is
known and fixed at time $t$. The time-series vector $y_t$ then evolves according to the following location-scale observation equation
$$
y_t = \mu_t + \Sigma_t^{1/2}\varepsilon_t, \qquad \varepsilon_t \sim N(0, I_p), \qquad
\mu_t = F_t f_t^{\mu}, \qquad \sigma_t^2 = \exp\left(F_t f_t^{\log\sigma^2}\right), \qquad (2)
$$
for $t = 1, \ldots, n$, and with $\varepsilon_t$ being a $p$-dimensional vector of standard normal innovations, independent across the cross-section and over time. The design matrix $F_t$ acts as a selection matrix that aggregates (a selection of) the unobserved time-varying parameter vectors $f_t^{\mu}$ and $f_t^{\log\sigma^2}$ into the means $\mu_t$ and variances $\sigma_t^2$ of $y_t$ through time. These dynamic parameters are assumed to be independent from each other and are all collected in the $2m \times 1$ vector $f_t = (f_t^{\mu\prime}, f_t^{\log\sigma^2\prime})'$.
We further assume that the driving mechanism of the time-varying parameter vector $f_{t+1}$ is observation-driven, i.e., it only depends on its previous value $f_t$, past observations $y_t$, and potentially on the matrix $F_t$, and is parameterized by the static parameter vector $\psi$, i.e., $f_{t+1} = g(f_t, y_t, F_t; \psi)$. In the generalized autoregressive score (GAS) framework of Creal et al. (2013) and Harvey (2009), the transition function $g(f_t, y_t, F_t; \psi)$ is specified by the autoregressive predictive filter recursion
$$
f_{t+1} = \kappa + B f_t + A s_t, \qquad (3)
$$
for $t = 1, \ldots, n$, where $\kappa = (\kappa^{\mu\prime}, \kappa^{\log\sigma^2\prime})'$ is a vector, $A = \mathrm{diag}(A^{\mu}, A^{\log\sigma^2})$ and $B = \mathrm{diag}(B^{\mu}, B^{\log\sigma^2})$ are coefficient matrices of proper dimensions, and all coefficients are collected in $\psi$. The innovations in the GAS transition equation in (3) are driven by the scaled-score process $s_t$. This process is defined as the scaled derivative of the log-observation density of $y_t$ with respect to the vector $f_t$, at time $t$, that is
$$
s_t = S_t \nabla_t, \qquad \nabla_t = \frac{\partial \log p(y_t \mid f_t, \mathcal{F}_{t-1}; \psi)}{\partial f_t}, \qquad (4)
$$
where $\nabla_t = (\nabla_t^{\mu\prime}, \nabla_t^{\log\sigma^2\prime})'$ is the time $t$ score vector of the predictive likelihood, evaluated at $f_t$, and $S_t = \mathrm{diag}(S_t^{\mu}, S_t^{\log\sigma^2})$ is a positive definite scaling matrix of appropriate size. Creal et al. (2013) suggest to base the scaling on some power of the inverse information matrix of $f_t$ to account for the local (at time $t$) curvature of the likelihood function. More specifically,
$$
S_t = \mathcal{I}_t^{-\lambda}, \qquad \mathcal{I}_t = -\mathrm{E}_{t-1}\!\left[\frac{\partial^2 \log p(y_t \mid f_t, \mathcal{F}_{t-1}; \psi)}{\partial f_t\, \partial f_t'}\right], \qquad (5)
$$
where $\mathcal{I}_t = \mathrm{diag}(\mathcal{I}_t^{\mu}, \mathcal{I}_t^{\log\sigma^2})$. Possible choices for the scaling are $\lambda \in \{0, \tfrac{1}{2}, 1\}$ and each choice leads to GAS factor recursions with different properties. When $\lambda = 0$, no scaling is applied because $S_t$ reduces to the identity matrix $I$. With square-root scaling, $\lambda = \tfrac{1}{2}$, one imposes $\mathrm{Var}[s_t] = I$, whereas with unit scaling $S_t$ becomes equal to the inverse Fisher information matrix $\mathcal{I}_t^{-1}$, such that $\mathrm{Var}[s_t] = \mathcal{I}_t^{-1}$. With the latter choice, the GAS recursion in (3) resembles the Gauss-Newton algorithm for updating the time-varying parameter vector $f_t$ through time and typically brings GAS models closest to well-known observation-driven models, such as ARMA and GARCH (Creal et al., 2008, 2013; Harvey, 2009).
The independence between the unobserved component vectors $f_t^{\mu}$ and $f_t^{\log\sigma^2}$ results in block-stacked scaled scores and parameter matrices. This makes it possible to let these dynamic sets of parameters evolve through separate GAS recursions. Filtering time-varying parameters with multiple sets of transition equations comes with substantial gains in numerical efficiency, as it avoids computing inverses of the potentially large matrices $\mathcal{I}_t$ through time. For example, with a GAS recursion for each $m \times 1$ vector $f_t^{\mu}$ and $f_t^{\log\sigma^2}$, each filter has its own $m \times 1$ scaled-score process: $s_t^{\mu}$ and $s_t^{\log\sigma^2}$. These scaled-score processes are half the size of the one in (4) and thus require the computation of smaller matrix elements, i.e., two $m \times 1$ score vectors $\nabla_t^{\mu}$ and $\nabla_t^{\log\sigma^2}$, and two $m \times m$ scaling matrices $S_t^{\mu}$ and $S_t^{\log\sigma^2}$. This leads to the following result.

Proposition 1. For the Gaussian observation density in (1), together with the observation equation in (2), the $m \times 1$ score vectors $\nabla_t^{\mu}$ and $\nabla_t^{\log\sigma^2}$ and the $m \times m$ information matrices $\mathcal{I}_t^{\mu}$ and $\mathcal{I}_t^{\log\sigma^2}$ are
$$
\nabla_t^{\mu} = F_t' \Sigma_t^{-1} (y_t - \mu_t), \qquad \nabla_t^{\log\sigma^2} = \tfrac{1}{2} F_t' \left(\varepsilon_t^2 - \iota_p\right), \qquad
\mathcal{I}_t^{\mu} = F_t' \Sigma_t^{-1} F_t, \qquad \mathcal{I}_t^{\log\sigma^2} = \tfrac{1}{2} F_t' F_t, \qquad (6)
$$
where $\varepsilon_t = \Sigma_t^{-1/2}(y_t - \mu_t)$, $S_t^{\mu} = (\mathcal{I}_t^{\mu})^{-\lambda}$ and $S_t^{\log\sigma^2} = (\mathcal{I}_t^{\log\sigma^2})^{-\lambda}$ are $m \times m$ scaling matrices, $\iota_p$ is a $p \times 1$ unit vector, $F_t' \Sigma_t^{-1} F_t$ and $F_t' F_t$ are both $m \times m$ invertible matrices, and $t = 1, \ldots, n$.

See appendix for all proofs and derivations.


With inverse scaling, $\lambda = 1$, the scaled scores become
$$
s_t^{\mu} = \left(F_t' \Sigma_t^{-1} F_t\right)^{-1} F_t' \Sigma_t^{-1}(y_t - \mu_t), \qquad
s_t^{\log\sigma^2} = \left(F_t' F_t\right)^{-1} F_t' \left(\varepsilon_t^2 - \iota_p\right). \qquad (7)
$$
Both scaled scores take the form of familiar regression estimators: $s_t^{\mu}$ is a GLS estimator with dependent variable equal to the prediction error $y_t - \mu_t$ and regression matrix $F_t$. Similarly, $s_t^{\log\sigma^2}$ is an OLS estimator with dependent variable equal to the excess squared innovations $\varepsilon_t^2 - \iota_p$ and regression matrix $F_t$. It is worth mentioning that the inverses of the scaling matrices only exist when the normal matrices are invertible, for $t = 1, \ldots, n$. Singularity is avoided by imposing the trivial rank requirement of having no more independent factors than time-series in the system, i.e., $m \leq p$. Similarly, we require that the $m$ location and $m$ log-variance GAS factors are linearly independent of each other, i.e., there is no perfect multicollinearity.
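As an illustration of how the GLS/OLS form of (7) translates into computations, the following Python sketch evaluates both scaled scores for a fully observed cross-section; the helper name and example values are ours, and the loading matrix is assumed fixed.

```python
# A minimal sketch of the inverse-scaled scores in (7) for one time point.
import numpy as np

def scaled_scores(y, mu, sigma2, F):
    """GLS/OLS-type scaled scores s_t^mu and s_t^logsigma2 of equation (7)."""
    Sigma_inv = np.diag(1.0 / sigma2)                  # Sigma_t^{-1}, diagonal
    eps = (y - mu) / np.sqrt(sigma2)                   # standardized innovations
    s_mu = np.linalg.solve(F.T @ Sigma_inv @ F, F.T @ Sigma_inv @ (y - mu))
    s_ls = np.linalg.solve(F.T @ F, F.T @ (eps**2 - 1.0))
    return s_mu, s_ls

# Example with p = 4 series and an effect-coded loading matrix as in Section 2.3.
F = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [1, -1, -1, -1]], float)
y = np.array([0.3, -0.1, 0.2, 0.5])
mu = np.zeros(4); sigma2 = np.full(4, 0.5)
print(scaled_scores(y, mu, sigma2, F))
```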

2.3 Specifying factor loading matrix Ft for missing observations


The factor loading matrix $F_t$ specifies the design for the cross-sectional decomposition of the observation density through time. In many cases of interest, however, the matrix $F_t$ is not time-varying and the design is fixed, i.e., $F_t = F$, for $t = 1, \ldots, n$. Its simplest specification is the $p$-dimensional identity matrix $F = I_p$. Through this design, each location and log-scale of the distribution is modeled separately across the $p$ series in the system. That is, each time-series follows a univariate specification of the model in (2), which boils down to an equation-by-equation application of this model. In fact, with inverse scaling $\lambda = 1$, the GAS model reduces to the VARMA(1,1)-VEGARCH(1,1) model
$$
y_{t+1} = \kappa^{\mu} + B^{\mu} y_t + (A^{\mu} - B^{\mu})\, e_t + e_{t+1}, \qquad e_t := \Sigma_t^{1/2}\varepsilon_t, \qquad \varepsilon_t \sim N(0, I_p),
$$
$$
\log\sigma_{t+1}^2 = \kappa^{\log\sigma^2} + B^{\log\sigma^2}\log\sigma_t^2 + A^{\log\sigma^2}\left(\varepsilon_t^2 - \iota_p\right), \qquad (8)
$$
for $t = 1, \ldots, n$, where $c := \kappa^{\mu}$, $\Phi := B^{\mu}$, $\Theta := A^{\mu} - B^{\mu}$, $\omega := \kappa^{\log\sigma^2}$, $\alpha := A^{\log\sigma^2}$, and $\beta := B^{\log\sigma^2}$ denote the corresponding static parameter matrices of appropriate size in standard VARMA-GARCH notation, and $\Sigma_t = \mathrm{diag}(\sigma_t^2)$.
However, for any other fixed and (pseudo-)invertible $p \times m$ matrix $F$, the VARMA-VEGARCH model in (8) takes the following expression
$$
y_{t+1} = F\kappa^{\mu} + F B^{\mu} F^{-1} y_t + \left(F A^{\mu}\left(F'\Sigma_t^{-1}F\right)^{-1}F'\Sigma_t^{-1} - F B^{\mu} F^{-1}\right) e_t + e_{t+1},
$$
$$
\log\sigma_{t+1}^2 = F\kappa^{\log\sigma^2} + F B^{\log\sigma^2} F^{-1}\log\sigma_t^2 + F A^{\log\sigma^2}\left(F'F\right)^{-1}F'\left(\varepsilon_t^2 - \iota_p\right), \qquad (9)
$$
where, again, $e_t := \Sigma_t^{1/2}\varepsilon_t$, $\varepsilon_t \sim N(0, I_p)$, and $\Sigma_t = \mathrm{diag}(\sigma_t^2)$. The matrix $F$ now augments
the VARMA-VEGARCH model and its exact model formulation is dependent on F ’s
design. The factor loading matrix F can provide new specifications for the VARMA-
VEGARCH. For example, one specification of particular interest to us is the common factor specification, i.e., $F$ is the $p \times 1$ unit vector $F := \iota_p$. This design lets all $p$ locations and scales evolve through a single (GAS) factor each, i.e., $f_t^{\mu} := f_{c,t}^{\mu}$ and $f_t^{\log\sigma^2} := f_{c,t}^{\log\sigma^2}$,
respectively. Assuming that common factors are estimable when there is at least one observation not missing in the cross-section of $y_t$ (Subsection 2.4 discusses this matter in detail), the factors can be used to impute the density of the missing observations in $y_t$. The common factor specification of $F$ simplifies the model in (9) in an intuitive manner. First, the single common factor specification
implies m = 1 and as a consequence, all coefficient vectors/matrices in (9) reduce to
scalars. Second, it is easy to verify that the autoregressive term in the mean and log-
variance equation will simply evolve as cross-sectional averages of yt and log σt2 over time,
for all series, e.g.,
$$
F A^{\mu} F^{-1} y_t = A^{\mu}\iota_p\,\iota_p^{-1} y_t = A^{\mu} p^{-1}\iota_p\,\iota_p' y_t = A^{\mu}\left(p^{-1}\sum_{i=1}^{p} y_{i,t}\right)\iota_p = A^{\mu}\,\bar{y}_t\,\iota_p,
$$
since $A^{\mu}$ is a scalar and the pseudo-inverse of $\iota_p$ equals the $1 \times p$ vector $\iota_p^{-1} = (1/p, \ldots, 1/p)$.

The corresponding MA and ARCH term will evolve similarly through their GLS/OLS-
based estimates for intercepts only. Clearly, the cross-sectional average estimates, i.e.,
the common factors, exist when there is at least one observation available, thus making
it possible to use them for the imputation of location and scale.
Another design matrix specification of interest is one that includes both
common and idiosyncratic factors. With such a matrix F it is possible to decompose the
cross-section of means and (log-)variances through multiple factors. For example, consider
the full rank factor loading matrix that decomposes the mean vector $\mu_t$ as follows:
$$
\underbrace{\begin{pmatrix} \mu_{1,t} \\ \mu_{2,t} \\ \vdots \\ \mu_{p,t} \end{pmatrix}}_{\mu_t\ (p\times 1)}
=
\underbrace{\begin{pmatrix}
1 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \\
1 & 0 & 0 & \cdots & 1 \\
1 & -1 & -1 & \cdots & -1
\end{pmatrix}}_{F\ (p\times p)}
\underbrace{\begin{pmatrix} f_{c,t}^{\mu} \\ f_{1,t}^{\mu} \\ \vdots \\ f_{p-1,t}^{\mu} \end{pmatrix}}_{f_t^{\mu}\ (p\times 1)}, \qquad (10)
$$
for $t = 1, \ldots, n$, where $F$ is a $p \times p$ matrix, $f_{c,t}^{\mu}$ is the common GAS location factor, and $f_{1,t}^{\mu}, \ldots, f_{p-1,t}^{\mu}$ are the remaining $p-1$ idiosyncratic location factors. This matrix imposes
that each individual mean follows one common and one idiosyncratic factor through a
spatial decomposition. The spatial decomposition in (10) uses effect coding instead of
plain vanilla dummy coding. In effect-coded matrices, the unit vector does not represent the mean of the group whose dummy is discarded to avoid perfect multicollinearity, as in dummy matrices, but rather the overall grand mean of all groups, i.e., it is
the common factor. This is a key advantage over dummy matrices when dealing with
sparse series, because the common factor can be estimated when there is at least one
observation not missing in the cross-section. Hence, with effect coding it does not matter
which observation is missing, whereas with dummy matrices the intercept will simply
not be estimable if the time-series corresponding to the intercept is missing. Another
advantage of effect-coded matrices is that they allow one to estimate an idiosyncratic factor
for all series in the system, without making the normal matrix singular. Namely, the
decomposition of the means in (10) is equivalent to
$$
\mu_{i,t} :=
\begin{cases}
f_{c,t}^{\mu} + f_{i,t}^{\mu}, & \text{if } i = 1, \ldots, p-1, \\[4pt]
f_{c,t}^{\mu} - \sum_{i=1}^{p-1} f_{i,t}^{\mu}, & \text{if } i = p,
\end{cases}
$$
where $-\sum_{i=1}^{p-1} f_{i,t}^{\mu}$ is the (effect-coded) idiosyncratic factor of the $p$-th time-series, for $t = 1, \ldots, n$. Note that the same spatial decomposition naturally applies to the time-varying variances as well, since $F$ is also used in (2) to model the log-variance factors.
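The construction of the effect-coded matrix in (10) is mechanical; the sketch below (our illustration, not the authors' code) builds it for a generic p and verifies the implied decomposition of the p-th mean.

```python
# A small sketch that builds the effect-coded loading matrix of equation (10)
# and checks the implied decomposition of mu_t.
import numpy as np

def effect_coded_loading(p):
    """p x p matrix: first column = common factor, remaining p-1 columns effect coded."""
    F = np.ones((p, p))
    F[:, 1:] = np.vstack([np.eye(p - 1), -np.ones(p - 1)])
    return F

p = 5
F = effect_coded_loading(p)
f_mu = np.random.default_rng(1).normal(size=p)   # (f_c, f_1, ..., f_{p-1})
mu = F @ f_mu
# The p-th mean equals f_c minus the sum of the p-1 idiosyncratic factors:
assert np.isclose(mu[-1], f_mu[0] - f_mu[1:].sum())
assert np.linalg.matrix_rank(F) == p             # F remains invertible
```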

Figure 1: A hierarchical design for factor loading matrix F

$$
F_{9\times 9} =
\begin{pmatrix}
1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & -1 & -1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & -1 & -1 & 0 & 0 \\
1 & -1 & -1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & -1 & -1 & 0 & 0 & 0 & 0 & -1 & -1
\end{pmatrix}
$$

Here the first column is the level 1 (common) dummy, columns 2-3 are the level 2 (cluster) dummies, and the remaining columns are the level 3 (within-cluster idiosyncratic) dummies; rows 1-3, 4-6, and 7-9 correspond to Clusters 1, 2, and 3, respectively.

Notes: An illustration of an effect-coded multi-level loading matrix for 3 clusters among 9 time-series in the cross-section. The level 1 dummy indicates the common level among all series, level 2 dummies account for fixed effects across the clusters, and level 3 dummies model idiosyncratic dynamics within each cluster.

The design for F can be further extended such that the factor loading matrix imposes
a hierarchical structure with multiple common factors, together with accompanying id-
iosyncratic factors. The p time-series in the system may belong to certain predefined (i.e.,
known) clusters or groups. The cross-sectional cluster dynamics can be modeled as time-
varying fixed effects by simply including more (effect-coded) dummies in F. Consider the case with 3 unique clusters among p = 9 time-series, and assume that the size of each cluster is 3. Then, we can induce (at least) 3 common factors in F, as depicted in the figure above. The loading matrix in Figure 1 decomposes the cross-sections of the p = 9 time-series through m = 9 free factors and 4 effect-coded factors. This yields a total of 13 factors, while the matrix remains invertible. It consists of 4 common factors: one shared among all series, two for the first two clusters, and a third cluster factor that is effect coded and thus computed from the other two cluster factors. Similarly, the matrix includes 9 idiosyncratic factors, one for each time-series in the system. It is worth noting that, locally in time, the level 1 common factor can be estimated when there is at least one observation not missing in the cross-section, whereas the level 2 factors require one available observation in each cluster. Also, when the time-series are known to change clusters over time, the level 2 dummies can be designed accordingly to allow for a time-varying Ft, for t = 1, . . . , n.
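One way to assemble a loading matrix with the multi-level structure of Figure 1 is sketched below; the construction and column ordering are our reading of the figure and may differ in detail from the authors' implementation.

```python
# A sketch of a multi-level, effect-coded loading matrix in the spirit of Figure 1.
import numpy as np

def effect_code(k):
    """k x (k-1) effect-coded dummies: rows e_1, ..., e_{k-1} and a final row of -1s."""
    return np.vstack([np.eye(k - 1), -np.ones(k - 1)])

def hierarchical_loading(cluster_sizes):
    p = sum(cluster_sizes); c = len(cluster_sizes)
    level1 = np.ones((p, 1))                                   # common level
    level2 = np.repeat(effect_code(c), cluster_sizes, axis=0)  # cluster fixed effects
    blocks = [effect_code(k) for k in cluster_sizes]           # within-cluster idiosyncratics
    level3 = np.zeros((p, sum(k - 1 for k in cluster_sizes)))
    r = col = 0
    for k, b in zip(cluster_sizes, blocks):
        level3[r:r + k, col:col + k - 1] = b
        r += k; col += k - 1
    return np.hstack([level1, level2, level3])

F = hierarchical_loading([3, 3, 3])                            # 9 x 9 matrix
print(F.shape, np.linalg.matrix_rank(F))                       # (9, 9) 9
```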
Moreover, the general notation of the model in (2), together with the GAS recursion in (3), comes with additional flexibility that also allows one to formulate well-known structural state space models. For example, for a univariate time-series, a time-varying drift specification for its log-variance is obtained by setting $F := 1$, $\kappa^{\log\sigma^2} = B^{\log\sigma^2} := 0$, and scalar coefficient $A^{\log\sigma^2} := a$. As a second example, the so-called local linear trend model for the log-variance is obtained by taking $F := (1, 0)$ and setting $\kappa^{\log\sigma^2} := (0, 0)'$, $B^{\log\sigma^2} := \left[\begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix}\right]$, and $A^{\log\sigma^2} := \mathrm{diag}(a_1, a_2)$, with scalar coefficients $a_1$ and $a_2$. The main difference between linear Gaussian state space models and our GAS model is the flexibility to simultaneously and structurally decompose the location and scale of (multivariate) time-series.
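Spelled out under the parameterization just given (a restatement of the text rather than an additional result), the local linear trend case for the log-variance reads
$$
f_{t+1}^{\log\sigma^2} =
\begin{pmatrix} f_{1,t+1} \\ f_{2,t+1} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} f_{1,t} \\ f_{2,t} \end{pmatrix}
+
\begin{pmatrix} a_1 & 0 \\ 0 & a_2 \end{pmatrix} s_t,
\qquad
\log\sigma_t^2 = (1, 0)\, f_t^{\log\sigma^2} = f_{1,t},
$$
so that $f_{1,t}$ plays the role of the level and $f_{2,t}$ of the slope of the log-variance.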

2.4 Forecasting missing observations


Out-of-sample forecasting with GAS models is typically performed by replacing the
scaled-score innovation st in the predictive GAS filter by its conditional expectation,
i.e., by continuing to run the GAS recursion in (3) with st = 0, for t = n + 2, n + 3, . . . .
The score is put to zero due to no observations being available to update the score. In this
line of thought, Creal et al. (2014) also recommend setting the score to zero when miss-
ing values are encountered during the filtering process. This is the most frequently used
method to deal with missing observations in GAS models and is more formally referred to as the ‘setting-to-zero’ method (Blasques et al., 2021). In the context of a multivariate Gaussian setting, Lucas et al. (2016) relate this method to the Expectation-Maximization algorithm, and justify why putting the score to zero is a reasonable way to treat missing values in score-driven models. The ‘setting-to-zero’ method is therefore a convenient way to handle missing data in GAS models. As a result, it has been successfully applied in a wide range of empirical applications, including Creal et al. (2014); Delle Monache et al. (2016); Koopman et al. (2018); Buccheri et al. (2020).
Putting scores to zero in multivariate score-driven models with multiple factors, however, is not straightforward. In univariate GAS models, the conditionally latent moment of the respective density is one-to-one related with its GAS factor, e.g., $\mu_t := f_t$ or $\log\sigma_t^2 := f_t$. In our case, the factor loading matrix $F_t$ relates the GAS factors to the time-varying mean and log-variance vectors of the multivariate Gaussian density. That is, $p$ time-series are decomposed into $m$ mean and $m$ log-variance GAS factors. The idea is that, when the cross-sectional dependence (in mean and variance) is modeled through multiple common and idiosyncratic factors over time, some factors can, theoretically, still be estimated if $y_t$ is only partially missing. More precisely, we assume that idiosyncratic factors, such as $f_{i,t}^{\mu}$ and $f_{i,t}^{\log\sigma^2}$ for any $y_{i,t}$, $i = 1, \ldots, p$, cannot be estimated whenever $y_{i,t}$ is missing. Contrarily, common factors, say $f_{c,t}^{\mu}$ and $f_{c,t}^{\log\sigma^2}$, that model cross-sectional dependencies among all time-series in $y_t$ are assumed to be estimable if there is at least one observation not missing. Thus, setting $s_t = 0$ whenever $y_t$ contains missing observations is not a viable option. This is an important matter when filtering multiple time-varying parameters, especially in the context of data imputation. Hence, common factors $f_{c,t}^{\mu}$ and $f_{c,t}^{\log\sigma^2}$ estimated with partially missing observation vectors can be used to impute the density of missing observations, local in time.
There are a few applications that discuss how to use the ‘setting-to-zero’ method in a
multi-factor environment. For example, Creal et al. (2014) propose to expand the compu-
tation of scores (and scaling matrices) into a summation on observation level (element-by-
element treatment). This allows ignoring missing values during intermediate calculations
because, at the observation level, the score is zero for missing observations. This is a
feasible solution, but it is not easy to implement as each score element is dependent
on the design of matrix Ft, i.e., the factor structure local in time. On the other hand, Delle Monache et al. (2016) show how this method can be used through selection matrices when the GAS recursion augments the state of the Kalman filter for linear Gaussian models, and thereby the authors do not cover the scope of pure score-driven models. Below, we
propose an easy-to-implement ‘setting-to-zero’ method for filtering with missing values in
multi-factor score-driven models. Our solution closely follows the rules for dealing with
missing values in the Kalman filter and state-space models.

Suppose that the observation vector $y_t = (y_{1,t}, \ldots, y_{p,t})'$ is partially observed. Let $p_t$ denote the number of vector values actually observed in $y_t$, with $1 \leq p_t \leq p$ at time $t$. The observation vector without missing values is denoted as $y_t^* = W_t y_t$, where $W_t$ is a $p_t \times p$ known selection matrix, i.e., its rows are a subset of the rows of the identity matrix $I_p$. Indeed, the matrix $W_t$ is obtained by discarding the $i$-th row from $I_p$ for each $i$-th missing value in $y_t$. Just like for state space models, we replace the original (complete) observation equation in (2) by one for the observed elements only, i.e.,
$$
y_t^* = \mu_t^* + (\Sigma_t^*)^{1/2}\varepsilon_t^*, \qquad \varepsilon_t^* \sim N(0, I_{p_t}), \qquad
\mu_t^* = F_t^* f_t^{\mu}, \qquad \sigma_t^{2,*} = \exp\left(F_t^* f_t^{\log\sigma^2}\right), \qquad (11)
$$
where $F_t^* = W_t F_t$ and $\Sigma_t^* = \mathrm{diag}(\sigma_t^{2,*})$, for $t = 1, \ldots, n$. A (potentially) lower dimensional observation vector yields lower dimensional elements in the measurement equation, e.g., the $p_t \times 1$ mean vector $\mu_t^*$ and the $p_t \times p_t$ variance matrix $\Sigma_t^*$. The update in the
GAS filter will proceed similarly to the one in the standard case, but the update equation in (3) needs to be slightly altered. Namely, we continue the score-updating process by replacing the elements in Proposition 1 with $F_t^*$, $\mu_t^*$, $\Sigma_t^*$ and $\varepsilon_t^*$. With this replacement, however, $F_t^{*\prime}F_t^*$ may not be invertible anymore. The size of $F_t^*$ is $p_t \times m$ and during the selection process it is not ruled out that the number of location/scale factors to estimate, $m$, becomes larger than the number of observed values in the cross-section, $p_t$. In this case, we proceed by computing the scaled scores through Moore-Penrose inverses. The resulting scaled scores $s_t^* = (s_t^{\mu,*\prime}, s_t^{\log\sigma^2,*\prime})'$ remain of size $2m \times 1$. Time points that require pseudo-inverses are a clear indication that certain factors cannot be estimated due to sparsity, i.e., $\partial \log p(y_t \mid f_t, \mathcal{F}_{t-1}; \psi)/\partial f_{j,t} = 0$ for those factors. Thus, the corresponding values in $s_t^{\mu,*}$ and $s_t^{\log\sigma^2,*}$ should be set to zero. The ‘setting-to-zero’ method is here employed through an indicator function. Consider an $m$-dimensional indicator vector $\mathbb{1}_t$ whose $j$-th element takes the value 0 if the $j$-th column of $F_t^*$ contains zeros only, and 1 otherwise. That is, $\mathbb{1}_t$ consists of ones only for GAS factors that can still be estimated through $y_t^*$. The indicator can be used to set the score to zero for factors that cannot be estimated through an element-by-element multiplication, i.e., $\mathbb{1}_t \odot s_t^{\mu,*}$ and $\mathbb{1}_t \odot s_t^{\log\sigma^2,*}$, where $\odot$ denotes the Hadamard element-by-element multiplication operator. Filtering will then proceed by simply replacing $s_t$ with $\mathbb{1}_t^* \odot s_t^*$, with $\mathbb{1}_t^* = (\mathbb{1}_t', \mathbb{1}_t')'$. The full procedure of filtering with (and without) missing observations is summarized in Algorithm 1.

Algorithm 1: GAS filtering w/ and w/o missing observations

Initialize the GAS filters with some initial state vectors $f_1^{\mu} \in \mathbb{R}^m$ and $f_1^{\log\sigma^2} \in \mathbb{R}^m$.
for t = 1, . . . , n do
    if $y_t$ is either completely or partially observed then
        Determine $W_t$ and compute
        $$
        y_t^* = W_t y_t, \quad F_t^* = W_t F_t, \quad \mu_t^* = F_t^* f_t^{\mu}, \quad
        \sigma_t^{2,*} = \exp\left(F_t^* f_t^{\log\sigma^2}\right), \quad \Sigma_t^* = \mathrm{diag}(\sigma_t^{2,*}), \quad
        \varepsilon_t^* = (\Sigma_t^*)^{-1/2}(y_t^* - \mu_t^*); \qquad (12)
        $$
        Use $F_t^*$, $\mu_t^*$, $\Sigma_t^*$, $\varepsilon_t^*$ to obtain the scaled scores $s_t^{\mu,*}$ and $s_t^{\log\sigma^2,*}$ through Proposition 1, and use Moore-Penrose inverses if $F_t^{*\prime}F_t^*$ is singular;
        Construct $\mathbb{1}_t$ by inspecting $F_t^*$ for columns that contain zeros only, and set $\mathbb{1}_t^* = (\mathbb{1}_t', \mathbb{1}_t')'$;
        Proceed with the update of $f_t$ as follows
        $$
        f_{t+1} = \kappa + B f_t + A\left(\mathbb{1}_t^* \odot s_t^*\right); \qquad (13)
        $$
    else
        Obtain the update of $f_t$ with $s_t = 0$:
        $$
        f_{t+1} = \kappa + B f_t. \qquad (14)
        $$
    end
end
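A compact Python sketch of one iteration of Algorithm 1 is given below; it follows the steps above under inverse scaling ($\lambda = 1$), with missing entries coded as NaN, and all names are illustrative rather than the authors' own code.

```python
# One predictive-filter step: selection of observed entries, scaled scores via
# (pseudo-)inverses, zeroing of inestimable factors, and the GAS update.
import numpy as np

def gas_step(y, f_mu, f_ls, F, kappa, A, B):
    """y may contain NaNs for missing entries; returns (f_mu, f_ls) for t+1."""
    obs = ~np.isnan(y)
    m = F.shape[1]
    if not obs.any():                                    # fully missing: s_t = 0
        s = np.zeros(2 * m)
    else:
        Fs = F[obs, :]                                   # F_t^* = W_t F_t
        mu = Fs @ f_mu
        sig2 = np.exp(Fs @ f_ls)
        eps = (y[obs] - mu) / np.sqrt(sig2)
        s_mu = np.linalg.pinv(Fs.T @ np.diag(1.0 / sig2) @ Fs) @ (Fs.T @ ((y[obs] - mu) / sig2))
        s_ls = np.linalg.pinv(Fs.T @ Fs) @ (Fs.T @ (eps**2 - 1.0))
        keep = (np.abs(Fs).sum(axis=0) > 0).astype(float)   # indicator 1_t
        s = np.concatenate([keep * s_mu, keep * s_ls])
    f = np.concatenate([f_mu, f_ls])
    f_next = kappa + B @ f + A @ s                       # recursion (3)/(13)/(14)
    return f_next[:m], f_next[m:]

# Example: p = 4, effect-coded F, two missing entries in the cross-section.
F = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [1, -1, -1, -1]], float)
y = np.array([0.2, np.nan, np.nan, -0.1])
k = np.zeros(8); A = 0.1 * np.eye(8); B = 0.95 * np.eye(8)
print(gas_step(y, np.zeros(4), np.zeros(4), F, k, A, B))
```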
It is worth mentioning that Blasques et al. (2021) derive the asymptotic properties of the ML estimator under this method and argue that the ‘setting-to-zero’ method may not lead to consistent estimation of the model's static parameter vector ψ. However, the results of our extensive Monte Carlo study yield no signs of inconsistent inference on the MLE for the factor loading matrix used in this study.

2.5 In-sample forecast bands


The one-step-ahead predictions of time-varying parameters ft+1 rely on past information
only. The key ingredient of ft ’s driving mechanism is the scaled-score st , which mainly
depends on the innovations εt . This is especially evident for the (scaled-)score of the
log-variance equation in (6), wherein εt is the only element unknown at time t. Hence,
in the GAS model, by construction, the Gaussian moments µt and Σt , and the loadings
matrix $F_t$ are known and fixed at time $t$. This also holds for the score vector of the location factors. Recall that $\varepsilon_t = \Sigma_t^{-1/2}(y_t - \mu_t)$. It follows that
$$
\nabla_t^{\mu} = F_t'\Sigma_t^{-1}(y_t - \mu_t) = F_t'\Sigma_t^{-1/2}\Sigma_t^{-1/2}(y_t - \mu_t) = F_t'\Sigma_t^{-1/2}\varepsilon_t, \qquad (15)
$$

for t = 1, . . . , n. Therefore, the only element required to determine the future value of ft+1
is the innovations vector εt . For the production of out-of-sample forecasts fn+h , for h ≥ 2,
the innovation st cannot be updated and is set to zero. Since the Gaussian innovations
play a crucial role in determining the future value of ft+h , the uncertainty of future
innovations is taken into account during the construction of forecast bounds for ft+h , see
e.g., Blasques et al. (2016) for a detailed discussion. Out-of-sample forecast bands are
obtained by simulating future innovations to update the score innovation sn+h and build
simulated paths of one-step-ahead predictions. However, when values in yt are missing, values are also missing in εt, exactly as in the out-of-sample case. The filtered estimates ft at such time points can therefore be treated as in-sample forecasts. This makes it possible to describe the prediction uncertainty at
sparse time points by means of in-sample forecast uncertainty.
The simulation-based method of Blasques et al. (2016) computes forecast bands whilst
taking both innovation uncertainty and parameter uncertainty into account. Parameter
uncertainty refers to the fact that the true parameter vector $\psi$ is unknown and is replaced by its MLE $\hat{\psi}$ when estimating the dynamic parameters, i.e., $\hat{f}_{t+1} = f(\hat{f}_t, y_t, F_t; \hat{\psi})$. The simulation method first samples multiple sets of static parameters from the (approximate) asymptotic distribution of the MLE, namely
$$
\hat{\psi}^{(j)} \sim N\left(\hat{\psi},\, n^{-1}\widehat{W}\right), \qquad (16)
$$
for $j = 1, \ldots, S$, with $S$ a predefined number of samples, and where $\mathrm{Var}[\hat{\psi}] = n^{-1}\widehat{W}$ is the asymptotic covariance matrix of the MLE. We use the robust estimator of White (1980) to compute the variance matrix, just like in Blasques et al. (2016). Thereafter, each sample $\hat{\psi}^{(j)}$ is used to build $S$ new paths of $\hat{f}_{t+1}^{(j)} = f(\hat{f}_t^{(j)}, y_t, F_t; \hat{\psi}^{(j)})$. The collection $\{\hat{f}_t^{(j)}\}_{j=1}^{S}$ characterizes the empirical distribution function of $\hat{f}_t$ over the $S$ values, i.e.,
$$
H_{S,t}(x) = \mathrm{Pr}_{S,t}\left(\hat{f}_t \leq x\right) = S^{-1}\sum_{j=1}^{S}\mathbb{1}\left(\hat{f}_t^{(j)} \leq x\right), \qquad (17)
$$

where $H_{S,t}(x)$ denotes the empirical cdf of $\hat{f}_t$, for $t = 1, \ldots, n$, and $\mathbb{1}(\cdot)$ is an indicator function that returns 1 when the respective condition is satisfied and 0 otherwise. The above cdf allows us to compute in-sample confidence bands that inherit the uncertainty
present within the MLE. For out-of-sample periods, innovations $\varepsilon_{t+h}^{\dagger}$ are sampled from the standard normal distribution and the filtering process is continued with the pre-sampled static parameter set $\hat{\psi}^{(j)}$, i.e., in a non-nested simulation. Thereby, the forecast bands inherit both innovation and parameter uncertainty. We adopt this method to describe the uncertainty of the in-sample forecasts for $f_t$ at sparse time points. The algorithm to construct in-sample forecast bands that incorporate both innovation and parameter uncertainty is given below.
Algorithm 2: In-sample forecast bands for GAS factors

Initialize the GAS filters with some initial state vectors $f_1^{\mu} \in \mathbb{R}^m$ and $f_1^{\log\sigma^2} \in \mathbb{R}^m$.
for j = 1, . . . , S do
    Simulate $\hat{\psi}^{(j)}$ using (16);
    for t = 1, . . . , n do
        if $y_t$ is complete then
            Compute $s_t^{\mu}$ and $s_t^{\log\sigma^2}$ using (6);
        else if $y_t$ is partially observed then
            Define the number of missing observations in $y_t$ as $k_t = p - p_t$;
            Compute the $p_t \times 1$ vector $\varepsilon_t^*$ using (12);
            Simulate the $k_t \times 1$ vector $\varepsilon_t^{\dagger} \sim N(0, I_{k_t})$;
            Set the $p \times 1$ vector $\tilde{\varepsilon}_t = \mathrm{indexsort}\left((\varepsilon_t^{*\prime}, \varepsilon_t^{\dagger\prime})'\right)$, with indices $i = 1, \ldots, p$;
            Use $F_t$, $\Sigma_t$ and $\tilde{\varepsilon}_t$ to compute $s_t^{\mu}$ and $s_t^{\log\sigma^2}$ through (6) and (15);
        else
            Simulate the $p \times 1$ vector $\varepsilon_t^{\dagger} \sim N(0, I_p)$;
            Use $F_t$, $\Sigma_t$ and $\varepsilon_t^{\dagger}$ to compute $s_t^{\mu}$ and $s_t^{\log\sigma^2}$ through (6) and (15);
        end
        Obtain the update of $\hat{f}_t^{(j)}$ through
        $$
        \hat{f}_{t+1}^{(j)} = f\left(\hat{f}_t^{(j)}, y_t, F_t; \hat{\psi}^{(j)}\right), \qquad (18)
        $$
        whilst using (3).
    end
end
Specify $F_{S,t}(x)$ as the collection $\{\hat{f}_t^{(j)}\}_{j=1}^{S}$, for $t = 1, \ldots, n$.
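The following sketch outlines the outer structure of Algorithm 2 in Python; it assumes a user-supplied `filter_path` routine that runs the predictive filter of Algorithm 1 while substituting the simulated innovations at missing entries, and the shapes of `psi_hat` and `W_hat` are assumptions of ours.

```python
# S parameter draws, simulated innovations at missing entries, and pointwise
# quantiles of the resulting factor paths.
import numpy as np

def insample_bands(y, F, psi_hat, W_hat, filter_path, S=500, level=0.95, seed=0):
    rng = np.random.default_rng(seed)
    n, p = y.shape
    draws = rng.multivariate_normal(psi_hat, W_hat / n, size=S)   # eq. (16)
    paths = []
    for j in range(S):
        eps_sim = rng.standard_normal((n, p))     # used only where y is missing
        paths.append(filter_path(y, F, draws[j], eps_sim))        # shape (n, 2m)
    paths = np.stack(paths)                                       # (S, n, 2m)
    lo, hi = np.quantile(paths, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return lo, hi
```

Pointwise quantiles of the simulated factor paths then serve as the in-sample forecast bands.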

2.6 An illustrative example


The density forecasting framework for the treatment of missing values is not dependent
on the missing data mechanisms classified in Subsection 2.1. It is, however, interesting to
study how the dynamic location-scale factor model behaves when observations are either
conditionally or randomly missing. For this, we graphically exploit the GAS model’s

architecture under the presence of missing data and illustrate the methodology discussed
so far. In our illustrations, a sample realization from the true DGP of the GAS model
is used. The time-series dimension is set to p = 4 and each series is of length n = 2500.
The DGP includes 1 common factor, 3 free idiosyncratic factors, and 1 effect-coded
idiosyncratic factor. The factor loading matrix is kept constant, i.e.,
$$
F = \begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & -1 & -1 & -1
\end{pmatrix}.
$$
The static parameters are specified by the following parameter matrices:
$$
\kappa^{\mu} = 0\cdot\iota_4, \qquad
A^{\mu} = \begin{pmatrix} 0.4 & 0 & 0 & 0 \\ 0 & 0.1 & 0 & 0 \\ 0 & 0 & 0.1 & 0 \\ 0 & 0 & 0 & 0.1 \end{pmatrix}, \qquad
B^{\mu} = \begin{pmatrix} 0.98 & 0 & 0 & 0 \\ 0 & 0.95 & 0 & 0 \\ 0 & 0 & 0.95 & 0 \\ 0 & 0 & 0 & 0.95 \end{pmatrix},
$$
$$
\kappa^{\log\sigma^2} = \begin{pmatrix} 0.04 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad
A^{\log\sigma^2} = \begin{pmatrix} 0.2 & 0 & 0 & 0 \\ 0 & 0.1 & 0 & 0 \\ 0 & 0 & 0.1 & 0 \\ 0 & 0 & 0 & 0.1 \end{pmatrix}, \qquad
B^{\log\sigma^2} := B^{\mu}.
$$
With this specification, the cross-sections of yt include a strong location/scale common
factor structure together with persistent idiosyncratic dynamics, while the strength of
the conditional mean and variance signals remains more or less equal. All series are
initially completely observed, but missing entries are gradually introduced in the system.
Entries in the simulated yt are discarded by either a deterministic or stochastic missing
data pattern It . The conditionally missing data pattern is introduced in the true data
by putting certain intervals in the series to missing. Namely, we set a gap in the middle,
together with open ends on both sides of the series to missing. The missing window size
is set to 20% of the length of the series n, such that 60% of each series' observations are unobserved. Stochastic sparsity is introduced in the data by setting a certain percentage of each time-series randomly to missing. This is done by using a Bernoulli random generator with πi = 25%, for i = 1, . . . , p = 4.
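The illustrative DGP and the two missingness patterns can be reproduced along the following lines; the code is our sketch of the setup described above (seed, slicing, and variable names are arbitrary choices of ours).

```python
# Simulate the p = 4 GAS DGP with the F and parameter matrices above, then
# introduce deterministic windows or random missingness.
import numpy as np

rng = np.random.default_rng(42)
n, p = 2500, 4
F = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [1, -1, -1, -1]], float)
A_mu = np.diag([0.4, 0.1, 0.1, 0.1]);  B_mu = np.diag([0.98, 0.95, 0.95, 0.95])
k_ls = np.array([0.04, 0, 0, 0]);      A_ls = np.diag([0.2, 0.1, 0.1, 0.1]);  B_ls = B_mu

f_mu = np.zeros(p)
f_ls = np.linalg.solve(np.eye(p) - B_ls, k_ls)          # start at the unconditional mean
Y = np.zeros((n, p))
for t in range(n):
    mu, sig2 = F @ f_mu, np.exp(F @ f_ls)
    eps = rng.standard_normal(p)
    Y[t] = mu + np.sqrt(sig2) * eps
    s_mu = np.linalg.solve(F.T @ np.diag(1 / sig2) @ F, F.T @ ((Y[t] - mu) / sig2))
    s_ls = np.linalg.solve(F.T @ F, F.T @ (eps**2 - 1))
    f_mu = A_mu @ s_mu + B_mu @ f_mu                    # kappa^mu = 0
    f_ls = k_ls + A_ls @ s_ls + B_ls @ f_ls

# Conditionally missing: open ends and a middle gap, each 20% of the sample (60% in total).
Y_cond = Y.copy(); w = n // 5
Y_cond[:w] = np.nan
Y_cond[n // 2 - w // 2 : n // 2 + w // 2] = np.nan
Y_cond[-w:] = np.nan
# Randomly missing: observed with probability pi = 25%.
Y_mar = np.where(rng.random((n, p)) < 0.25, Y, np.nan)
```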
Each missing data pattern is induced iteratively in the simulated time-series data.
Observations in each series are put to missing in a fixed order, i.e., first observations
in y1 are discarded and afterwards in y2 , etc. We consider each iteration as a separate

simulation, which results in having 5 different missingness scenarios for the original sample
realization {yt }nt=1 . The first scenario is the base scenario in which all data points are
observed. In the second scenario, only the first time-series vector (y1 ) will contain missing
observations, whereas the third scenario corresponds to a time-series system in which the
first two time-series (y1 and y2 ) have missing entries, etc. The GAS filter is rerun on
the sparse time-series system in each scenario to obtain so-called re-filtered estimates for
the conditionally time-varying parameter vector ft . The re-filtering is performed whilst
using the true static parameter vector because it allows us to study how a correctly specified
GAS filter behaves under the presence of missing values among multiple time-series. In
addition, the forecast bands are simulated at each scenario with S = 500 paths without
parameter uncertainty.
Figure 2 displays the re-filtering results with conditionally missing observations. Each
panel of Subplot 2a displays the time-series system for one of the five scenarios. Subplot 2b
displays re-filtered estimates for the common dynamic location and log-variance factors
µ 2
log σ
(fc,t and fc,t ) against the true ones, which are obtained by rerunning the predictive
filter on the corresponding sparse systems in Subplot 2a. In the first scenario, the re-
filtered and true estimates of the common factors perfectly agree because there are no
missing observations and the GAS filter is exact. Since the filter is exact, there is no
innovation uncertainty and, as a result, there are no in-sample forecast bands. In the
second scenario, multiple observation windows in y1 are discarded. Now the re-filtered
estimates become slightly inaccurate and the forecast bands start to grow because innovation uncertainty comes into play. However, the re-filtered estimates converge to their true path
after fully observed yt ’s are encountered. This is a result of the GAS filter being invertible.
In the subsequent scenarios, more sparse series with the exact same missing entries enter
the GAS recursion and the re-filtered estimates of the common levels become more and
more imprecise and innovation uncertainty increases. The latter is evident by the size of
the forecast bands. In the last panel, all time-series are subject to conditionally missing
observations, and the common factor estimates and their forecast bands converge to the unconditional mean and variance of the predictive filter, respectively.
The re-filtered estimates in the fourth scenario highlight that the common factors can be tracked well when there is at least one data point observed in the cross-section. This also holds for the re-filtered estimates in Subplot 2c. The panels in this graph depict the re-filtered estimates of the fourth mean $\mu_{4,t}$ and the third idiosyncratic log-variance factor $f_{3,t}^{\log\sigma^2}$ (corresponding to the third time-series, $y_3$). It is noticeable that when the first three time-series have conditionally missing observations (fourth panel in Subplot 2c), the re-filtered path of $f_{3,t}^{\log\sigma^2}$ also reverts to its unconditional mean. Since $y_3$ is also missing in the fifth scenario, the re-filtered estimates of $f_{3,t}^{\log\sigma^2}$ are the same in the last two panels of Subplot 2c. It is noteworthy that when only $y_4$ is completely observed, the means and variances of all time-series are equal to the common mean and variance factors at the conditionally sparse locations. This is evident from the plotted re-filtered estimates of $f_{c,t}^{\mu}$ and $\mu_{4,t}$ in the second-to-last panels of Subplots 2b and 2c. However, this is only the case when the unconditional means of all factors are equal to each other; otherwise, the mean-reverting idiosyncratic filters would shift the location and scale estimates away from each other.
Figure 3 displays the re-filtered estimates under the presence of a stochastic miss-
ing data mechanism. For comparison purposes, the exact same realization of time-series
is used as in Figure 2. In addition, the re-filtered estimates are plotted for the same
factors/moments as in the previous subplots. It is again evident that the re-filtered estimates become more imprecise as fewer data enter the GAS filter at the same time points. However, across all missingness scenarios, the re-filtered time-varying parameters on randomly missing data appear to be more accurate than those on conditionally missing observations. The reason for this accuracy is that, even when 75% of the observations among all series in the system are missing, the probability of the entire cross-section being empty is very low. Therefore, it is unlikely that, locally in time, any of the location/scale factors has to revert to its unconditional mean. Hence, at every sparse time point, potentially more than one observed data point enters the GAS filter to re-filter the time-varying factors. In our simulation study, we numerically investigate the accuracy of the
GAS filter across all missingness scenarios and the two missing data mechanisms, while
treating the true static parameter vector as unknown.

Figure 2: Filtering with conditionally missing observations

[Figure: Subplots (a), (b), and (c), each with Panels A-E corresponding to 0, 1, 2, 3, and 4 sparse time-series.]

Notes: An illustration of filtered estimates of time-varying factors with conditionally missing observations, under correct specification and known static parameter vector ψ. Subplot 2a displays in its Panel A the system of 4 simulated time-series, with the true paths of their conditional mean (pink) and volatility (blue). Each panel in this subplot indicates a scenario of the number of subsequent sparse series in the system, containing missing observations at the exact same locations. Subplot 2b displays in each panel the filtered estimates of the common mean and log-variance factors, $f_{c,t}^{\mu}$ and $f_{c,t}^{\log\sigma^2}$, given the time-series in Subplot 2a. Their true paths are plotted in dashed black lines and the re-filtered estimates in pink and blue for the mean and variance factors, respectively. Similarly, Subplot 2c depicts the filtered estimates of the fourth mean estimate $\mu_{4,t}$ and the third idiosyncratic log-variance factor corresponding to the fourth time-series, $y_{4,t}$. The gray shaded areas depict forecast bands with 95% confidence, obtained through simulation of S = 500 paths without parameter uncertainty (ψ is known to us).
Figure 3: Filtering with randomly missing observations

[Figure: Subplots (a), (b), and (c), each with Panels A-E corresponding to 0, 1, 2, 3, and 4 sparse time-series.]

Notes: An illustration of filtered estimates of time-varying factors with randomly missing observations, under correct specification and known static parameter vector ψ. Subplot 3a displays in its Panel A the system of 4 simulated time-series, with the true paths of their conditional mean (pink) and volatility (blue). Each panel in this subplot indicates a scenario of the number of subsequent sparse series in the system, containing missing observations at random locations.
2.7 Filtering equations for score-driven updates and smoothing
The GAS recursion in (3) lets the parameter vector ft evolve based on past observations only. The model specification in (2) is thereby observation-driven. This implies that all parameters are perfectly predictable one step ahead and that, under correct specification, these parameters are completely revealed by past observations. However,
under misspecification, the predictive filter is (approximately) an expectation of the true
underlying state αt , namely ft ≈ E[αt |Ft−1 ]. As such, filters using past, contemporane-
ous, and future observations, i.e., smoothing, can provide more accurate estimates than
the predictive filter, especially with missing data.
Buccheri et al. (2021) recently proposed a methodology for GAS-based transition
equations that use more than just past data to estimate time-varying parameters. Their
score-driven update (uses past and current observations) and smoothing (uses all ob-
servations) technique is developed through an analogy between the GAS model and the linear Gaussian state-space model. In particular, a set of GAS update and smoothing recursions is devised through a general representation of the Kalman filter and Kalman smoother, respectively. The proposed recursions are approximate but perform comparably to exact filtering techniques; see Buccheri et al. (2021) for more details.
Filtering with the GAS update and smoothing equations is straightforward, as these recursions do not require additional elements beyond the ones used for running the predictive filter. The score-driven update recursion for $f_t$ is given by
$$
f_{t|t} = f_t + B^{-1}A s_t, \qquad (19)
$$
for $t = 1, \ldots, n$, and the score-driven backward smoother is
$$
r_{t-1} = s_t + (B - A S_t \mathcal{I}_t)' r_t, \qquad
f_{t|n} = f_t + B^{-1}A r_{t-1}, \qquad (20)
$$
where $r_n = 0$ and $t = n, \ldots, 1$. Just like for the predictive filter in (3), updated and smoothed estimates of the location and log-variance factors, i.e., $f_{t|t}^{\mu}$, $f_{t|n}^{\mu}$ and $f_{t|t}^{\log\sigma^2}$, $f_{t|n}^{\log\sigma^2}$, are obtained by running the above recursions whilst using the score and scale elements in (6).
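Given a stored run of the predictive filter, the update and smoothing recursions can be implemented with a single backward pass, as in the sketch below (our code, following our reading of (19)-(20)).

```python
# Score-driven update and backward smoother from stored predictive quantities.
import numpy as np

def update_and_smooth(f, s, S, I, A, B):
    """f: (n, k) predictions; s: (n, k) scaled scores; S, I: (n, k, k) scaling/information."""
    n, k = f.shape
    Binv_A = np.linalg.solve(B, A)                     # B^{-1} A
    f_update = f + s @ Binv_A.T                        # eq. (19): f_{t|t}
    f_smooth = np.empty_like(f)
    r = np.zeros(k)                                    # r_n = 0
    for t in range(n - 1, -1, -1):
        r = s[t] + (B - A @ S[t] @ I[t]).T @ r         # eq. (20), backward pass
        f_smooth[t] = f[t] + Binv_A @ r                # f_{t|n}
    return f_update, f_smooth
```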

2.8 Estimation
The filtered parameters ft are a function of the data. This allows to estimate the GAS
model’s static parameter vector ψ by maximum likelihood. That is, by numerically
maximizing the conditional log-likelihood function `t (ψ) w.r.t. ψ. In the presence of
missing data, however, likelihood evaluations do require careful attention. When no
observations are available the likelihood degenerates to zero, see Blasques et al. (2021).
On the other hand, having at least one observation not missing in yt (i.e., pt ≥ 1), then
`t is computed via the marginal likelihood function p(yt∗ |ft , Ft ; ψ). More formally, the
ML criterion function for multivariate score-driven models with missing observations is
defined as

ψ̂ := arg max_ψ Σ_{t=1}^{n} ℓ_t(ψ),    with    ℓ_t(ψ) = log p(y_t∗ | f_t, F_t; ψ) if p_t ≥ 1, and ℓ_t(ψ) = 0 otherwise,        (21)

where y_t∗ = W_t y_t, for t = 1, . . . , n. Blasques et al. (2021) refer to the log-likelihood in (21) as the pseudo log-likelihood function.
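As an illustration, a single contribution ℓ_t(ψ) in (21) can be evaluated as in the sketch below, assuming a Gaussian observation density with conditional mean vector µ_t and covariance matrix Σ_t implied by f_t; the selection matrix W_t is implemented implicitly by keeping the observed rows, and the function and variable names are illustrative.

```python
import numpy as np

def loglik_contribution(y_t, mu_t, Sigma_t):
    """Pseudo log-likelihood contribution ell_t as in (21)."""
    obs = ~np.isnan(y_t)                      # entries kept by W_t
    p_t = int(obs.sum())
    if p_t == 0:
        return 0.0                            # no observations: ell_t = 0
    y_star = y_t[obs]                         # y_t^* = W_t y_t
    mu_star = mu_t[obs]
    Sigma_star = Sigma_t[np.ix_(obs, obs)]
    _, logdet = np.linalg.slogdet(Sigma_star)
    resid = y_star - mu_star
    quad = resid @ np.linalg.solve(Sigma_star, resid)
    return -0.5 * (p_t * np.log(2.0 * np.pi) + logdet + quad)
```

Summing these contributions over t = 1, . . . , n and numerically maximizing the sum with a standard optimizer then yields ψ̂ as in (21).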
To carry out the ML estimation, one first needs to initialize the GAS filter recursion in (3) at a fixed point f_1 ∈ R^{2×m}. A typical choice for the initialization of observation-driven models is the unconditional mean of the filter itself, namely f_1 = (I − B)^{-1} κ. Using the unconditional mean of the GAS recursion as initial values does, however, make the start of the filter vulnerable to outliers. To mitigate this issue one may fix the initial values at estimates based on sample moments of the data. For
example, consider the p × 1 vector of sample averages over τ consecutive observations ȳ_{1:τ} = (ȳ_{1,1:τ}, . . . , ȳ_{p,1:τ})', where ȳ_{i,1:τ} = τ^{-1} Σ_{t=1}^{τ} y_{i,t}, for i = 1, . . . , p. This vector of averages can be used to analytically estimate the first values of the location factors, f̂_1^µ, by solving

f̂_1^µ = F_1^{-1} ȳ_{1:τ},

for any (pseudo-)invertible factor loading matrix F_1. By setting τ to a small number, say τ = 10, one can ensure that the initial values yield a prediction µ̂_1 = F_1 f̂_1^µ that is relatively close to y_1. Similarly, we can define a p × 1 vector of prediction error variances over the first τ observations, s²_{1:τ} = (s²_{1,1:τ}, . . . , s²_{p,1:τ})', where s²_{i,1:τ} = τ^{-1} Σ_{t=1}^{τ} (y_{i,t} − µ̂_{i,1})², which we can use to solve for the log-variance factors' initial value f̂_1^{log σ²} = F_1^{-1} log(s²_{1:τ}).
This way of finding initial values can also be used when the first τ observations in the time-series include missing values. In that case, the normalization τ^{-1} in both summations for ȳ_{i,1:τ} and s²_{i,1:τ} is adjusted accordingly. When the first τ observations are completely missing, one may proceed by estimating initial values using only the available observations, i.e., y_t∗. Factors that cannot be estimated due to (a sequence of) missing observations at the beginning of the time-series in the system can simply be initialized with their unconditional mean, as this is the value to which they converge over a long window of missing observations (assuming stationarity). Thus, f̂_1 may contain both initial values computed through moments of the GAS filter and sample moments of the time-series data.
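A hedged sketch of this initialization step is given below. It uses the pseudo-inverse of F_1 so that the same code applies to non-square loading matrices, handles missing entries by averaging over available observations only, and leaves fully missing series to be initialized at the filter's unconditional mean (not shown); all names are illustrative.

```python
import numpy as np

def initial_factor_values(y, F1, tau=10):
    """Initial location and log-variance factors from the first tau observations."""
    y_head = y[:tau]                                   # (tau, p), np.nan = missing
    y_bar = np.nanmean(y_head, axis=0)                 # per-series sample averages
    s2 = np.nanmean((y_head - y_bar) ** 2, axis=0)     # prediction error variances
    F1_pinv = np.linalg.pinv(F1)                       # (pseudo-)inverse of F_1
    f1_mu = F1_pinv @ y_bar                            # location factors
    f1_logsig2 = F1_pinv @ np.log(s2)                  # log-variance factors
    return f1_mu, f1_logsig2
```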
Furthermore, sample moments may also be used to reduce the number of static
parameters by employing a mean and/or variance targeting approach. A targeting ap-
proach for GAS models is established by setting the unconditional mean of the filter
equal to its corresponding sample moment, which avoids numerical estimation of the
conditional intercept κ. This approach is employed fairly easily when the GAS factors
are one-to-one related with the time-varying moment, but our multi-factor location-
scale model establishes this relationship through a (potentially) time-varying matrix
Ft . Therefore, we also take an average of the factor loading matrices. For the location
factors, the mean-targeting approach takes the following form

κ^µ := (I_m − B^µ) F̄^{-1} ȳ,

where F̄ = n^{-1} Σ_{t=1}^{n} F_t is a (pseudo-)invertible matrix and ȳ = ȳ_{1:n}. A similar expression for κ^{log σ²} is obtained by replacing ȳ with log(s²_{1:n}). However, variance-targeting is not recommended when the signal of the mean is stronger than the volatility in the observed time-series (e.g., when modeling levels), because s²_{1:n} will then also include the magnitude of the variance of the mean process µ_t. When modeling (stationary) returns, for example, variance-targeting will not introduce any inconsistencies.
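A minimal sketch of the mean-targeting computation is given below, assuming the loading matrices are stacked in an (n, p, m) array; the helper name and this layout are our own choices.

```python
import numpy as np

def mean_targeting_kappa(B_mu, F_all, y):
    """kappa_mu = (I_m - B_mu) Fbar^{-1} ybar, with Fbar the time-averaged loadings."""
    F_bar = F_all.mean(axis=0)              # (p, m) average of F_t over t
    y_bar = np.nanmean(y, axis=0)           # (p,) full-sample means
    m = B_mu.shape[0]
    return (np.eye(m) - B_mu) @ (np.linalg.pinv(F_bar) @ y_bar)
```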

3 Monte Carlo Evidence

3.1 Goals of MC experiments


The aim of our Monte Carlo experiment is to show that our dynamic multivariate
location-scale factor model performs well in the presence of missing values. To this
end, the Monte Carlo experiment follows the design of the illustrative examples in Sub-
section 2.6 in which the missing data mechanism is introduced in a sequential manner
among the series in the simulated system. The same DGP and scenarios as in Subsection 2.6 are used for the experiment (p = 4 time series of length n = 2500 with a strong factor structure and an effect coded factor loading matrix). For the deterministic pattern, we again consider the case in which 60% of each time-series is set to missing by discarding a gap in the middle and open ends on both sides of the series. Similarly, for the stochastic missing data mechanism a Bernoulli generator is used with equal missing rates across all time-series, but we extend the range of missing probabilities as follows: π_{i,t} ∈ {25%, 50%, 75%}, for i = 1, . . . , p = 4. In addition, the static
parameter set ψ is treated as unknown. Thus, the GAS model is re-estimated by ML on
the time-series system of each missingness scenario. This allows us to study the finite
sample accuracy of the MLE and the model’s in-sample forecast performance, across
an increasing number of sparse time-series in the system, given either a deterministic or
stochastic missing data mechanism. For each missing data mechanism and scenario com-
bination, the bias of the MLE and the in-sample mean absolute forecast error (MAFE)
at each missing entry is computed for the time-varying factors. The MAFE measure is
also computed for functions of the dynamic parameters, such as means, volatilities, and
the tail measures value-at-risk (VaR) and expected shortfall (ES) (and their hit vari-
ables). Furthermore, for each scenario, the asymptotic robust covariance of the MLE is
computed to construct forecast bands that inherit parameter uncertainty. The bands are
constructed by simulating S = 500 paths and their coverage rates (at sparse locations
only) are also gathered to verify their accuracy at the nominal level.
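The two missing data mechanisms can be generated along the following lines. The exact split of the 60% between the middle gap and the two open ends is not pinned down in the text, so the proportions below are an illustrative choice; the Bernoulli mechanism follows π_{i,t} directly.

```python
import numpy as np

def conditional_mask(n=2500, frac=0.60):
    """Deterministic pattern: open ends on both sides plus a gap in the middle."""
    mask = np.zeros(n, dtype=bool)            # True marks a missing observation
    n_miss = int(frac * n)
    end = n_miss // 4                         # illustrative split of the 60%
    gap = n_miss - 2 * end
    mask[:end] = True
    mask[n - end:] = True
    mid = n // 2
    mask[mid - gap // 2: mid - gap // 2 + gap] = True
    return mask

def random_mask(n=2500, pi=0.25, seed=None):
    """Stochastic pattern: each observation missing with probability pi."""
    rng = np.random.default_rng(seed)
    return rng.random(n) < pi
```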

3.2 Main simulation results
This subsection discusses the main findings of the Monte Carlo experiment. We present
bias estimates for the MLE and the in-sample forecast performance results for the dy-
namic factors. The remaining performance statistics of the experiment are provided
Appendix C. The bias estimates are reported in Table 1 and the MAFE and coverage
rates of the forecast bands for the time-varying factors are reported in Table 2. All
metrics are computed as the average across M = 500 replications and are reported for
both the conditionally and randomly missing data mechanisms, given the sparse series
in the system. The percentages indicate the missing rate of each time-series, given their
missing data mechanism. The sequential sparse series indicate (the order and) which
time-series include missing values, i.e., the missingness scenarios. For example, y1:3,t
implies that the first three time-series variables: y1,t , y2,t and y3,t , are subject to missing
observations.
The first column with bias estimates in Table 1 corresponds to the scenario with zero time-series subject to missing observations, i.e., the GAS model in this scenario is estimated on a completely observed time-series system. The results with no missing observations suggest that the MLE possesses little to no bias for the static parameters of the common and idiosyncratic mean and log-variance factors. When the time-series are subject to conditionally missing observations, the biases do slightly increase across the missingness scenarios but remain of the same magnitude as for the base scenario (zero missing observations in the system). The last scenario in which all four series contain a gap in the
middle and open ends on both sides can be compared to the same DGP but then with
an effective sample size n = 1000 and with no missing observations. As the time-series
system’s length is increased from 1000 to 2500, the biases show a reduction, which is in
line with standard asymptotic theory of the MLE. This is evident for the parameters of
all factors.
The Monte Carlo results further indicate that the biases are also small when the
data is randomly missing, but for the scaled-score parameters (A’s) the biases do tend
to slightly increase with the number of sparse time-series in the system and the miss-
ingness rate. It is further noticeable that for these parameters the biases are the largest
when the first three time-series are sparse and decrease when all time-series have missing

entries. Nevertheless, among all stochastic missing data rates and corresponding missingness scenarios, the biases remain small and have no visible impact on the forecast performance. Hence, as evident in Table 2, the MAFE for randomly missing observations with πi = 25% is even lower than for scenarios with 60% conditionally missing data.
Panels A and B in Table 2 report the MAFE for all time-varying mean and log-variance factors, respectively. The errors are reported for the exact same missing data mechanism and scenario combinations as in Table 1. The first column in Table 2 corresponds to the scenario in which all values are present in the system and therefore no forecasting was needed. The MAFE in this column is simply the MAE between the filtered estimates under the MLE and the true paths of ft. It is noticeable that at the base scenario the MLE produces filtered estimates with virtually the same precision as the true parameter vector θ. The remain-
ing columns report the MAFE (measured at missing entries only) across an increasing
number of sparse series.
We have embedded a coloring scheme in Panels A and B in which interesting patterns emerge because the time-series are made sparse in a fixed order. In particular, the shades of gray highlight the order in which the MAFE of the idiosyncratic factors converges because their corresponding time-series (already) include missing values. For example, when y1,t is sparse, its idiosyncratic factor f1,t cannot be estimated local-in-time, and therefore its scaled score is set to zero. If y1,t is conditionally missing, the factor f1,t converges to its unconditional mean over time. By making the subsequent time-series (y2,t) sparse as well, the MAFE of f1,t remains unchanged because y1,t was already subject to missing observations in an earlier scenario. As a result, the MAFE also converges, i.e., it settles at a level determined by the factor's unconditional variance. The darker the gray, the later the scenario at which a particular idiosyncratic factor converges. Only
for the common factors, the shades of gray indicate an increase in MAFE across an in-
creasing number of sparse series. The MAFE under stochastic missing data mechanisms
does slightly increase after the corresponding time-series includes missing entries but
at a very slow rate. Therefore, the forecast error also converges for randomly missing
data patterns. The MAFE for the common factor increases with the number of sparse
time-series, among all missing data mechanisms and scenario combinations. However,
when data is subject to randomly missing observations, even when 75% of the values in all time-series are missing, the MAFE under stochastic sparsity is much smaller than the
MAFE for a (long) sequence of missing observations. In fact, for MAR observations,

the common mean and log-variance factors are estimated twice as accurately as for the
deterministic missing data pattern with large missing windows. As a result, functions
of factors under MAR at a rate of 75%, such as time-varying means, volatilities, VaR
and ES, are also estimated twice as accurately as under conditionally missing observations; see Appendix C.
Finally, the last two panels in Table 2 report the coverage rates of the simulation-
based in-sample forecast bands for the factors at a confidence level of 95%. Since the
zero column in Table 2 implies that the GAS model is estimated on completely observed data, innovation uncertainty is not part of the simulation. That is, the in-sample bands are
only constructed by taking parameter uncertainty into account. These bands are at the
nominal level for the mean factors, but the rates are smaller for the log-variance factors.
Nonetheless, when time-series do include missing entries, the coverage rates are very
close to nominal level across all missing data mechanisms and scenario combinations.
The forecast band uncertainty implied under MAR is slightly conservative at high missing rates (i.e., πi = 75%). This is likely due to the fact that under stochastic
missing data patterns the factors can be well tracked over time, even with (potentially)
multiple values in the cross-section being missing. Therefore, the uncertainty implied by
the missing observations is slightly too large. Moreover, the accuracy of the confidence
bands naturally propagates into functions of the factors, as reported for the moments
and tail risk measures in Appendix C.

Table 1: MLE biases

Conditionally Missing Randomly Missing


60% 25% 50% 75%
Sequential Sparse Time-Series in System
Parameters True 0 y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t

Panel A: Mean factors

κc 0 0.000 -0.000 -0.000 -0.000 0.001 -0.000 -0.000 -0.000 -0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000 -0.000
κ1 0 0.000 -0.000 -0.000 -0.000 -0.000 0.000 0.000 0.000 0.000 -0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000 -0.000
κ2 0 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000 -0.000 0.000 -0.000 -0.000 -0.000 -0.000 0.000 0.000 0.000
κ3 0 0.000 -0.000 -0.000 -0.000 -0.000 -0.000 0.000 -0.000 -0.000 0.000 0.000 -0.000 0.000 0.000 -0.000 -0.000 -0.000
Ac 0.4 0.002 -0.012 -0.015 0.092 0.004 -0.008 -0.016 -0.020 -0.012 -0.016 -0.027 -0.014 -0.005 -0.018 -0.029 0.057 0.058
A1 0.1 -0.002 -0.001 0.000 -0.003 -0.004 0.001 0.004 0.008 0.009 0.006 0.015 0.03 0.022 0.007 0.023 0.075 0.024
A2 0.1 -0.003 0.003 -0.002 -0.005 -0.005 -0.000 0.005 0.009 0.010 0.003 0.017 0.033 0.025 0.007 0.023 0.074 0.038
A3 0.1 -0.002 0.003 0.014 -0.005 -0.006 -0.001 0.003 0.009 0.009 0.001 0.010 0.029 0.021 0.005 0.022 0.075 0.035
Bc 0.98 -0.002 -0.003 -0.004 -0.001 -0.003 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.003 -0.002
B1 0.95 -0.002 -0.003 -0.003 -0.003 -0.004 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.003 -0.002 -0.001 -0.002 -0.004 -0.001
B2 0.95 -0.002 -0.002 -0.002 -0.002 -0.003 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.003 -0.002 -0.003 -0.001 -0.003 -0.001
B3 0.95 -0.003 -0.003 -0.003 -0.003 -0.004 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.002 -0.003 -0.001

Panel B: Log-variance factors

κc 0.04 0.003 0.005 0.009 0.005 0.003 0.003 0.004 0.004 0.004 0.004 0.006 0.007 0.005 0.005 0.01 0.015 0.008
κ1 0 0.000 -0.000 -0.000 0.001 0.000 0.000 0.000 0.001 0.001 0.001 0.001 0.002 0.001 0.004 0.003 0.004 0.001
κ2 0 0.000 0.001 -0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001 0.001 0.002 0.001 -0.000 0.003 0.004 0.001
κ3 0 -0.001 0.001 0.004 0.001 -0.000 0.000 0.001 0.001 0.001 0.000 0.002 0.002 0.001 -0.000 0.002 0.003 0.001
Ac 0.2 0.000 -0.003 0.002 0.07 0.003 -0.006 -0.011 -0.016 -0.014 -0.008 -0.016 -0.016 -0.017 -0.009 -0.013 0.020 -0.006
A1 0.1 0.000 -0.000 0.001 -0.003 -0.001 0.003 0.006 0.009 0.010 0.008 0.014 0.024 0.019 0.010 0.021 0.049 0.021
A2 0.1 -0.000 0.004 0.002 -0.002 -0.001 0.002 0.007 0.010 0.010 0.005 0.014 0.024 0.019 0.006 0.024 0.052 0.021
A3 0.1 0.000 0.004 0.014 -0.002 -0.001 0.002 0.005 0.009 0.009 0.005 0.011 0.024 0.019 0.006 0.018 0.050 0.026
Bc 0.98 -0.001 -0.003 -0.004 -0.003 -0.001 -0.001 -0.002 -0.002 -0.002 -0.002 -0.003 -0.003 -0.002 -0.002 -0.004 -0.005 -0.002
B1 0.95 -0.004 -0.004 -0.004 -0.004 -0.006 -0.003 -0.003 -0.003 -0.003 -0.002 -0.002 -0.003 -0.001 -0.001 -0.002 -0.001 -0.002
B2 0.95 -0.002 -0.002 -0.003 -0.003 -0.004 -0.003 -0.003 -0.003 -0.003 -0.002 -0.003 -0.003 -0.002 -0.004 -0.004 -0.005 -0.003
B3 0.95 -0.002 -0.002 -0.001 -0.004 -0.005 -0.003 -0.003 -0.003 -0.003 -0.002 -0.002 -0.003 -0.002 -0.002 -0.001 -0.002 -0.001

Notes: This table reports biases of the common and idiosyncratic mean and log-variance parameter estimators. The biases are obtained through M = 500 replications and are reported for the
conditionally and randomly missing data mechanisms, across the number of sparse series in the system. The percentages indicate the missing rate of each time-series, given their missing data
mechanism. Sequentially missing time-series indicate the missingness scenarios, e.g., y1:3,t implies that the first three time-series variables: y1, y2 and y3 are subject to missing observations.

Table 2: In-sample forecast performance results

Conditionally Missing Randomly Missing

60% 25% 50% 75%

Sequential Sparse Time-Series in System

Factors 0 y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t

Panel A: In-sample MAFE for mean factors

fc,t 0.030 0.197 0.326 0.453 0.793 0.085 0.122 0.150 0.173 0.126 0.187 0.232 0.278 0.162 0.251 0.322 0.405
f1,t 0.037 0.235 0.235 0.235 0.235 0.096 0.100 0.105 0.108 0.143 0.148 0.153 0.160 0.191 0.195 0.201 0.213
f2,t 0.035 0.087 0.236 0.236 0.235 0.048 0.101 0.106 0.109 0.060 0.148 0.154 0.160 0.072 0.195 0.201 0.211
f3,t 0.036 0.086 0.135 0.236 0.236 0.048 0.057 0.106 0.109 0.058 0.076 0.153 0.159 0.072 0.101 0.200 0.208
f4,t 0.056 0.100 0.160 0.231 0.232 0.068 0.080 0.092 0.123 0.081 0.107 0.133 0.183 0.095 0.138 0.181 0.242

Panel B: In-sample MAFE for log-variance factors

fc,t 0.023 0.156 0.261 0.376 0.594 0.053 0.078 0.098 0.114 0.083 0.127 0.164 0.194 0.117 0.185 0.252 0.301
f1,t 0.032 0.320 0.319 0.322 0.320 0.098 0.105 0.111 0.115 0.160 0.166 0.175 0.182 0.235 0.237 0.244 0.251
f2,t 0.030 0.112 0.32 0.322 0.319 0.045 0.105 0.111 0.115 0.061 0.168 0.176 0.182 0.083 0.235 0.244 0.251
f3,t 0.030 0.111 0.183 0.321 0.319 0.045 0.057 0.111 0.116 0.061 0.084 0.175 0.182 0.082 0.119 0.244 0.251
f4,t 0.044 0.124 0.211 0.320 0.320 0.062 0.079 0.095 0.128 0.080 0.116 0.151 0.205 0.104 0.165 0.229 0.284

Panel C: Coverage of in-sample forecast bands for mean factors at 95% confidence level (%)

fc,t 88.074 94.522 94.359 98.198 93.655 93.949 94.065 94.099 94.653 94.243 93.925 94.967 95.319 94.870 94.804 98.222 97.591
f1,t 94.261 95.581 95.981 96.351 95.457 96.601 96.843 97.440 97.321 97.029 97.810 98.595 98.611 96.977 98.416 99.299 98.284
f2,t 95.021 96.244 96.284 96.264 95.895 96.172 96.899 97.471 97.526 96.512 97.897 98.771 98.788 97.218 98.360 99.471 98.476
f3,t 94.594 96.317 96.739 96.274 95.651 96.616 97.048 97.381 97.398 97.026 97.348 98.668 98.574 97.319 97.775 99.554 98.385
f4,t 96.747 98.501 98.628 99.218 98.624 97.669 98.321 98.577 98.524 98.670 99.110 99.256 99.62 99.365 99.672 99.807 99.931

Panel D: Coverage of in-sample forecast bands for log-variance factors at 95% confidence level (%)

fc,t 98.588 96.133 96.309 98.705 98.687 97.975 97.052 95.719 94.789 97.294 94.278 91.352 89.871 95.856 91.867 90.06 88.079
f1,t 81.659 94.347 94.841 94.061 93.218 91.891 94.925 95.905 96.183 93.532 96.305 97.535 97.335 93.289 96.754 98.892 96.735
f2,t 82.989 94.985 94.867 94.285 93.736 90.406 94.895 96.082 96.508 92.941 96.183 97.404 97.334 93.636 96.742 98.822 96.461
f3,t 82.664 94.858 96.333 94.140 93.605 90.170 92.709 95.772 96.096 92.985 94.355 97.556 97.564 93.782 94.157 98.874 96.596
f4,t 77.718 91.665 94.782 98.093 95.360 85.108 87.288 87.870 95.801 89.012 89.658 89.105 97.796 90.705 91.793 93.264 98.679

Notes: The first two panels of this table report the MAFE of the common and idiosyncratic mean and log-variance factors. The last two panels report the coverage rates of the forecast bands
corresponding to a confidence level of 95%. For more details, see caption of Table 1.

4 Case Study: Forecasting Sparse Credit Curves

4.1 Description of empirical application and data


This section presents two applications of our forecasting methodology for the treatment
of missing values in credit curves. Daily CDS curves are gathered from MarkitTM for 24
financial institutions. The number of firms is equally distributed over the regions: Asia,
Europe, and North America. For each institution, the daily term structure of the credit
curves consists of 6 maturity points and its tenors range from 6 months to 10 years.
The CDS time-series span a 10-year period from January 2, 2011 to December 31,
2020. This leads to a total of p = 144 time-series with n = 2609 observations. Figure 4
displays the observed CDS term structures.
Figure 4: CDS term structures of financial institutions
[Figure: time-series plots of the observed CDS spreads (in bps, 2011–2020, tenors 6M, 1Y, 3Y, 5Y, 7Y, 10Y) for the 24 institutions. Panel A: Asia — Bk of India, CTBC Finl Hldg Co Ltd, INDL COML BNK OF CHINA LTD, Kookmin Bk, MIZUHO Finl Gp INC, Mitsubishi UFJ Morgan Stanley Secs Co Ltd, Nomura Hldgs Inc, Temasek Hldgs. Panel B: Europe — Allianz SE, BNP Paribas, Barclays PLC, Deutsche Bk AG, ING Groep NV, Lloyds Bkg Group plc, Skandinaviska Enskilda Banken AB, Swiss Life Ltd. Panel C: North America — Amern Express Cr Corp, Boston Pptys Inc, Cap One Finl Corp, Goldman Sachs Gp Inc, JPMorgan Chase & Co (with CDX-NAIG-5Y), Navient Corp, Royal Bk Cda, Wells Fargo & Co.]

Notes: each panel of this figure displays 8 historical CDS term structures belonging to a specific region (i.e., Asia, Europe or North America). The CDS maturities of the curve series run from 6M to 10Y. The scale of the spreads is measured in basis points (bps). The subplot for JPMorgan Chase & Co also includes the time-series of the North American CDS index of investment grade with a 5Y tenor (CDX-NAIG-5Y), plotted in black dots.

The credit curves among all financial institutions show a similar pattern over time,
especially during crises. For example, all credit spread curves are at their highest during
either the credit crunch around 2012 or during the more recent Covid-19 crisis. Note
that the subplot of JPMorgan Chase & Co (JPM) curves also includes the time-series
of the North American CDS index of investment-grade with a 5Y tenor (CDXNAIG-

5Y). This CDS index is used in our first application, in which we consider
JPM’s curve and rerun our Monte Carlo experiment of Section 3 to verify the forecasting
performance of the dynamic location-scale factor model on empirical data. However, in
this application, the simulation experiment is extended with an additional missing data
scenario in which the CDX is used as a benchmark by including it as an additional tenor
to the time-series system. The simulation exercise is further extended by also considering
a factor loading matrix without common components and cross-validation is performed
to construct additional missing data scenarios.
Through Figure 4 it is evident that not all CDS time-series are completely observed.
For most of the firms, the observations seem to be conditionally missing instead of
randomly missing. That is, large gaps are present at the beginning and in the middle of the credit curves. The main objective of our second empirical application is to construct
synthetic CDS curves at sparse time points. Namely, we take the entire panel of p = 144
time-series and fit several high-dimensional hierarchical specifications of our dynamic
density model. For the design of the hierarchical factor loading matrices, we consider
a variety of combinations of tenor and region factors, but also of credit rating and
idiosyncratic factors. The credit ratings for each firm are also obtained from MarkitTM
and are taken as the average rating from the credit quality grades given by the companies:
Standard & Poor's, Moody's, and Fitch Group. As the credit quality of firms is subject to change over time, the ratings are retrieved on a daily basis as well. Thereby, time-varying
specifications of the factor loading matrix Ft are also included in the second application.
In addition, we construct hierarchical factor loading matrices that also include interaction terms between tenor and rating combinations.
Moreover, Table 3 reports for each institution the percentage of missing observations
per maturity. This table also provides an overview of the rating frequencies over the 10
year CDS history of the firms. It is noticeable that the lower end of the curves (up to
3 years) consists of more missing observations than CDS series with higher tenors. The
reason for this is that higher maturity quotes are more liquid. Also, most institutions belong to the single-A credit quality group, while the triple-A grade (the highest credit grade) is the least observed among the firms. Only the institutions Temasek Holdings and Navient Corp retained the same rating over our period.

Table 3: Missing observations and rating decomposition per CDS name

Missing Values (%) Rating Decomposition (%)


Institution 6M 1Y 3Y 5Y 7Y 10Y AAA AA A BBB B

Panel A: Asia

Bk of India 0.038 0.077 0.077 95.439 4.561


CTBC Finl Hldg Co Ltd 42.545 36.374 31.430 31.430 31.430 31.430 48.103 51.897
INDL COML BNK OF CHINA LTD 54.274 54.274 54.274 54.274 54.274 54.274 0.115 99.885
Kookmin Bk 0.23 0.115 99.885
MIZUHO Finl Gp INC 81.219 81.219 76.121 67.190 67.190 71.867 0.115 99.885
Mitsubishi UFJ Morgan Stanley Secs Co Ltd 24.607 12.265 8.816 8.049 14.795 10.310 0.805 99.195
Nomura Hldgs Inc 38.137 61.863
Temasek Hldgs 0.652 100

Panel B: Europe

Allianz SE 0.115 99.885


BNP Paribas 11.000 89.000
Barclays PLC 58.068 49.828 48.524 45.343 45.381 45.381 6.401 44.845 48.754
Deutsche Bk AG 1.418 72.787 25.795
ING Groep NV 15.102 3.795 0.345 0.077 0.077 0.153 100
Lloyds Bkg Group plc 60.636 55.424 44.845 37.792 38.865 40.82 22.346 77.654
Skandinaviska Enskilda Banken AB 0.077 0.077 0.498 0.498 0.115 38.137 61.748
Swiss Life Ltd 5.788 9.467 0.038 0.038 0.038 0.038 0.115 90.878 9.007

Panel C: North America

Amern Express Cr Corp 16.098 3.680 0.613 0.115 99.885


Boston Pptys Inc 1.035 0.115 15.293 84.707
Cap One Finl Corp 0.038 0.038 0.038 0.038 0.038 0.153 0.115 99.885
Goldman Sachs Gp Inc 0.077 0.038 99.923 0.077
JPMorgan Chase & Co 0.038 0.038 8.126 91.874
Navient Corp 39.095 39.057 38.904 38.904 39.785 39.785 100
Royal Bk Cda 38.367 21.349 18.666 17.631 22.231 22.422 0.115 99.387 0.498
Wells Fargo & Co 0.077 0.038 8.739 91.261

Notes: this table reports the percentage of missing values present within the CDS curves (over the 10 year period: January
2, 2011 - December 31, 2020) of the 24 financial institutions in our panel data. The decomposition of the historical rating
frequencies for each firm is also reported. Empty table cells indicate zero percentage values.

4.2 Imputing single curves


The first application assesses the performance of the dynamic density model on real data.
That is, a single credit curve is imputed. As in the previous Monte Carlo experiment,
artificial missing entries are introduced in the observed data, and again, the objective
is to gather the forecast performance statistics (accuracy of forecasts and confidence
intervals) for a quad-variate time-series system. For this, the upper-end of the CDS
curve from 3Y-10Y of JPM is taken as input because it is completely observed and the
constant effect coded factor loading matrix (i.e., one common plus three free idiosyncratic
factors) is used to specify the GAS model. However, the simulation experiment of
Section 3 is extended with two more design matrices. The results are also gathered for

the basic specification of the density model in which the factor loading is the identity
matrix (i.e. F = I4 ). Such a design imposes no factor structure, i.e., it implies a
univariate specification of the model. This design is our benchmark specification. The
second design addition is the specification that includes an extra time-series in the
system to aid the imputation accuracy in a single-curve setting. We add the North
American CDX of investment grade of 5 years maturity to JPM’s time-series system,
but do not set its observation to missing. Thereby, there will always be at least one value
observed in the cross-section. This makes it possible to always estimate the time-varying
common location and scale factors and is an empirically relevant solution to treat missing observations.
The simulation experiment is further extended by employing cross-validation to gen-
erate more missing data scenarios. This time, the order of the series that include missing
observations is permuted and will not follow a sequential pattern. The permutation of
p = 4 series provides a total of 4! = 24 different missingness orders. We use the same
missing data mechanisms as discussed in Section 3, i.e., time-series that are either 60%
conditionally missing or up to 75% randomly missing. The MAFE metric is computed by measuring the error between the true observed value yt and its mean forecast µ̂t and is, again, only measured at missing entries. To ensure the positivity of the predictions, the time-series are modeled in natural logarithms. Therefore, the final mean forecasts are computed under the assumption of the log-normal distribution, i.e., the mean forecast is computed as exp(µ̂t + σ̂t²/2), where µ̂t and σ̂t² are the mean and variance estimates obtained by fitting the dynamic density model in (2) on log yt. The accuracy of the confidence intervals is also measured based on their coverage of the missing observations in yt. For the conditionally missing scenarios, the intervals are computed by taking the exponent of the simulation-based forecast bands for the mean and afterward trimming the bands according to the 95% confidence level. The 95% confidence intervals under the randomly missing data mechanism are computed as prediction intervals for yt, i.e., exp(µ̂t ± 2σ̂t). The reason for this is that the estimates of µ̂t and σ̂t² in the presence of randomly missing data were accurate enough to provide good coverage and therefore no simulation was required.
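The final spread forecasts and the simple prediction intervals used under random missingness can be computed as in the sketch below, given the filtered log-spread mean µ̂t and variance σ̂t² (illustrative names).

```python
import numpy as np

def spread_forecast_and_interval(mu_hat, sigma2_hat):
    """Log-normal mean forecast and the exp(mu +/- 2*sigma) prediction interval."""
    sigma_hat = np.sqrt(sigma2_hat)
    point = np.exp(mu_hat + 0.5 * sigma2_hat)     # mean of the implied log-normal
    lower = np.exp(mu_hat - 2.0 * sigma_hat)
    upper = np.exp(mu_hat + 2.0 * sigma_hat)
    return point, lower, upper
```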
Table 4 reports the in-sample forecast accuracy and coverage rates obtained on JPM’s
term-structure. All results are computed whilst using the filtered location and scale

estimates from the predictive GAS recursion in (3) (one step-ahead) and are reported
as the average over 24 cross-validation samples. The first column with zero sparse series
belongs to the scenario in which all time-series values are observed. It is noticeable that,
in terms of in-sample accuracy, the dynamic multivariate Gaussian model fits the data
well, regardless of the design of the factor loading matrix. That is, the univariate and
effect-coded specifications of the time-varying location-scale model fit the data equally
well. This is also confirmed by the in-sample coverage rates of the prediction intervals for JPM's observations (note that simulation of confidence bands is not required for this scenario because all values are observed and used to fit the dynamic density models). Hence, these coverage rates are accurate to the nominal level.
The remaining columns indicate missing data scenarios in which artificial missing
values are introduced in the empirical CDS time-series. The MAFE forecast accuracy
of the model under a univariate specification and in the presence of conditionally missing observations is the same across all numbers of sparse series in the system. This is an
expected result because there is no factor structure imposed in the mean or variance
of the density model. To be more precise, the remaining observed entries of the cross-
section do not contribute to the overall fit. When the 4 × 4 effect coded matrix is used,
the MAFE increases for all tenors as the time-series system includes more sparse series.
The MAFE eventually converges to around 16 bps for all tenors when there are no more observations left in the conditionally missing cross-section. The MAFE in this situation is equal to the one under a univariate specification. However, the third time-series
system also includes the CDS index. Thereby, when all four tenors of JPM’s CDS
are conditionally missing there is still one observation present in the cross-section to
estimate the common mean and log-variance factors. Hence, the MAFE accuracy under
this specification with four missing tenors is comparable to the MAFE in which three
tenors are missing with the basic effect coded matrix (i.e., without CDX). In this case,
the MAFE is approximately 50% higher when cross-sections are conditionally missing,
compared to cross-sections with just one observed value.
Similar results are obtained for the remaining missing data mechanisms. In line with
Monte Carlo simulation in Section 3, the MAFE for randomly missing data patterns
is much smaller than for the conditionally missing pattern. Even with 75% randomly
missing data points, for all tenors the MAFE is 4-5 times smaller than in the conditionally missing scenarios. This also holds for the univariate specification and implies that JPM's
tenors include a high degree of forecastability. Nevertheless, effect-coded specifications
of the factor loading matrix deliver more accurate results, and the inclusion of the CDX aids the forecast performance of the model when all tenors are subject to randomly occurring missing entries. In this case, the accuracy for all tenors is at most
1 bps larger than the scenario in which all values are observed.
Since the predictive filtered estimates are learned very accurately in the presence of randomly missing observations, the computation of correct confidence intervals for the CDS
observations does not require any simulation. That is, taking the forecast uncertainty
into account to construct the bands is not necessary. For conditionally missing scenar-
ios, however, simulation is required because the idiosyncratic factors mean revert due
to the large sequences of missing entries. The forecast bands inherit both innovation
and parameter uncertainty and S = 500 paths were simulated, just like for the Monte
Carlo experiment in Section 3. Among all factor loading specifications and randomly
missing scenarios, the coverage rates are very close to the nominal level. This also holds for CDS tenors subject to conditionally missing observations and with an effect coded matrix. For the identity specification of the factor loading matrix, the coverage rates are smaller than 95%. Therefore, common factor designs for F and additional sources of
information not only significantly improve the forecast accuracy, but also the accuracy
of confidence bands.
Moreover, Appendix D provides similar performance results, but computed whilst using the filtered estimates from the score-driven update filter and smoother recursions. The application of these nowcasting algorithms provides an additional improvement in the MAFE for imputing randomly missing observations. The GAS smoother, however, performs poorly in the presence of randomly missing values in JPM's tenors. This is probably due to misspecification of the Gaussian distribution. For conditionally missing scenarios, the performance of the GAS filter, update, and smoother recursions is equivalent.

Table 4: In-sample forecast accuracy of predictive filter on CDS JPM Term Structure

Conditionally Missing Randomly Missing

60% 25% 50% 75%

Sparse Series in Time-Series System

Factor Specification Tenor 0 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

Panel A: MAFE of in-sample forecasts (bps)

3Y 1.873 15.126 15.108 15.102 15.105 2.105 2.126 2.178 2.093 2.564 2.571 2.456 2.516 3.344 3.286 3.308 3.348
5Y 2.217 16.426 16.402 16.417 16.400 2.587 2.575 2.610 2.543 2.921 3.042 3.001 3.031 3.970 3.960 3.946 3.948
Identity
7Y 2.557 16.547 16.533 16.535 16.533 2.883 2.898 2.952 2.874 3.239 3.298 3.293 3.328 4.239 4.134 4.236 4.196
10Y 2.810 17.091 17.081 17.054 17.010 3.288 3.192 3.200 3.146 3.562 3.499 3.559 3.571 4.442 4.348 4.357 4.398

3Y 1.956 6.658 8.302 9.616 15.103 2.108 2.081 2.034 2.119 2.203 2.172 2.256 2.228 2.501 2.526 2.614 2.785
5Y 2.259 4.143 5.119 7.800 16.354 2.372 2.386 2.372 2.455 2.477 2.646 2.515 2.591 2.540 2.608 2.717 3.100
Effect Coding
7Y 2.522 7.131 5.135 6.816 16.589 2.693 2.701 2.677 2.735 2.738 2.774 2.846 2.992 2.806 2.891 3.093 3.659
10Y 2.757 10.309 12.716 14.579 17.225 2.94 2.972 2.95 3.014 2.981 3.621 3.122 3.197 3.236 3.25 3.317 3.710

3Y 1.919 6.790 7.060 8.284 9.530 2.242 2.006 2.053 2.077 2.214 2.196 2.203 2.136 2.595 2.512 2.598 2.527
5Y 2.224 4.068 5.286 5.823 8.746 2.331 2.274 2.299 2.386 2.422 2.394 2.409 2.432 2.510 2.507 2.594 2.832
Effect Coding + CDXNAIG-5Y
7Y 2.472 5.070 5.796 6.423 10.091 2.634 2.544 2.727 2.657 2.688 2.634 2.689 2.732 2.760 2.766 2.882 3.139
10Y 2.694 7.741 8.699 8.858 11.753 2.895 2.845 2.981 2.908 2.937 2.917 2.986 3.032 3.197 3.18 3.283 3.588

Panel B: Coverage of in-sample bands at 95% confidence level (%)

3Y 95.017 91.922 91.733 91.693 91.506 95.173 94.787 94.996 95.007 94.787 94.740 95.076 94.930 94.676 95.338 95.241 95.075
5Y 95.362 96.678 96.417 96.396 96.556 94.907 95.040 94.889 95.073 95.480 94.733 94.920 94.800 94.622 94.791 94.761 94.892
Identity
7Y 95.132 92.478 92.228 92.407 92.253 95.333 95.293 94.907 95.200 95.733 95.293 95.102 95.050 94.604 94.623 94.789 94.718
10Y 95.017 90.956 90.978 91.100 91.092 94.827 94.933 94.836 95.007 95.253 95.38 95.107 95.103 94.955 95.292 95.216 95.170

3Y 94.366 90.233 89.133 87.844 81.281 93.973 94.04 94.178 94.507 93.707 94.020 94.138 94.51 92.364 92.787 94.012 94.351
5Y 95.017 98.044 94.989 97.078 84.894 95.600 95.373 95.324 94.913 94.84 94.820 95.173 94.913 94.169 94.867 95.396 95.191
Effect Coding
7Y 95.631 99.322 96.728 97.419 91.947 95.307 95.853 95.742 95.660 95.493 95.813 95.840 95.577 96.293 96.084 96.311 95.778
10Y 96.014 99.689 98.044 95.219 92.047 95.867 95.840 95.627 95.460 95.947 94.587 95.298 95.080 95.502 95.173 95.224 94.431

3Y 94.366 92.189 94.644 92.726 93.950 93.573 93.893 93.982 94.087 93.547 93.92 93.929 94.357 93.164 93.244 93.813 94.896
5Y 94.787 98.778 98.783 98.452 98.108 94.533 95.000 95.111 94.560 94.080 94.773 94.907 95.340 94.320 94.524 94.936 95.513
Effect Coding + CDXNAIG-5Y
7Y 95.745 99.044 98.606 97.778 97.256 96.027 95.800 95.680 95.693 95.187 95.627 95.933 96.240 95.227 95.636 95.757 96.507
10Y 96.282 99.189 98.478 97.293 96.314 95.707 96.093 95.893 96.200 95.987 96.167 96.160 96.323 95.218 95.031 95.387 96.578

Notes: this table reports the empirical Monte Carlo imputation performance results for the dynamic multi-factor density model on JPM’s CDS time-series (tenors run from 3Y till 10Y).
The performance results are reported for different missing data patterns (i.e., combinations of missing data mechanisms and number of sparse time-series in the system), given three different
specifications of the factor loading matrix. Panel A of this table reports the MAFE in basis points (bps) and Panel B reports coverage rates (%) for the true observed values, computed at sparse
entries only. All results are computed whilst using the filtered location and scale estimates from the predictive GAS recursion in (3) (one step-ahead, i.e., ft|t−1 ). The estimation period is January
2, 2011 - December 31, 2020.

4.3 Synthetic curve construction via spatial decompositions
The results of the previous application suggest that additional sources of information
can significantly aid the imputation accuracy of our dynamic density model, especially
when yt is subject to conditionally missing observations in a single curve set-up. With
this second application, the aim is to show that utilizing multiple sources of information
is also fruitful for imputing large gaps in high-dimensional panels. We exploit the factor
structure in the panel of p = 144 CDS time-series to identify multiple common features
and quantify the potential gain in imputation accuracy attached to the shared factors.
For this, multiple hierarchical specifications of the dynamic model on the panel of p = 144
CDS time-series are fitted. The multi-level structure in Ft is iteratively expanded with
spatial factors, i.e., we keep on adding more categorical (effect coded) dummy variables
in Ft and fit all designs to filter the unobserved location and scale factors.
The basic specification of Ft is again equal to the identity matrix that imposes no
factor structure. The second design only includes a single common factor, i.e., Ft is equal to the unit vector ι144. This is the highest level in our hierarchical structure,
i.e., the level 1 factor. The third design includes both the unit vector and effect coded
dummies for the 6 tenors. Similarly, region and rating factors are added. The tenor,
region and rating dummies induce the second layer in the cross-section hierarchy (level
2). Note that as the credit quality of the firms is time-varying, the factor loading matrix also becomes dynamic when the rating category is included. The matrix is further
extended with level 3 dummies to account for time-varying firm-based fixed-effects, i.e.,
the idiosyncratic firm factors. Note that to include these factors, one CDS firm factor
in each region is effect coded to follow the hierarchy, see Figure 1 for an illustration.
The final addition is a set of interaction terms between tenor and rating factors to further improve the fit.
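To fix ideas, the sketch below builds a small effect-coded, hierarchical loading matrix for one cross-section with a unit (level 1) column, tenor dummies, and region dummies; the reference categories (10Y and EU) follow Table 5, while the column ordering, function names, and the omission of rating, firm and interaction columns are our simplifications.

```python
import numpy as np
import pandas as pd

def effect_code(labels, reference):
    """Effect coding: the reference level gets -1 in every retained dummy column."""
    dummies = pd.get_dummies(labels).astype(float)
    dummies.loc[labels == reference, :] = -1.0
    return dummies.drop(columns=[reference])

def build_loading_matrix(series_info):
    """series_info: DataFrame with one row per CDS series and columns 'tenor', 'region'."""
    blocks = [
        pd.Series(1.0, index=series_info.index, name="unit"),   # level 1 factor
        effect_code(series_info["tenor"], reference="10Y"),     # level 2: tenors
        effect_code(series_info["region"], reference="EU"),     # level 2: regions
    ]
    return pd.concat(blocks, axis=1).to_numpy()
```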
The inclusion of all common, idiosyncratic, and interaction dummies in Ft , makes the
column dimension larger than 50, leading to more than 100 free location and log-variance
factors. Even when specifying A and B in (3) as diagonal parameter matrices, the total size of the static parameter vector ψ will exceed 300 unknowns. This makes it difficult to numerically estimate the largest specifications of the dynamic model. Therefore,
we propose to limit the number of unknown parameters in ψ by common parameter

reduction techniques. First, the employment of the mean-targeting approach discussed
in Subsection 2.8 reduces the number of unknowns by the size of κ^µ. The size of ψ is further reduced by pooling the persistence parameters B^µ and B^{log σ²}. Namely, a single persistence parameter is used per factor category, e.g., all tenor factors in location and scale share the same scalar parameters B^µ_tenor and B^{log σ²}_tenor, respectively. We also reduce the dimension of Ft by only adding interaction terms between all rating categories and
up to the 3Y tenor. Furthermore, the credit rating qualities AAA and AA are pooled
together, to construct a single rating class: AAA-AA. The reason for this is that the
frequency of these two ratings is very low, see Table 3. Such choices lead to significant
reductions in the static parameter vector ψ and allow us to fit more parsimonious versions of the GAS
model. As reported in Table 5, the largest hierarchical design required the estimation of
just over 130 static parameters, while being able to fit almost 100 unique location and
log-variance components.
Table 5 provides an overview of the number of dynamic factors and static parameters
for each hierarchical specification fitted on the high-dimensional panel of p = 144 CDS
time-series. Note that to ensure the invertibility of Ft that includes idiosyncratic factors,
the rating AAA-AA was removed from the matrix. It is noticeable that the univariate
specification of the model, i.e., Ft = I144 , fits the CDS time-series very well. With the
score-driven update filter, the in-sample MAE is below 1 bps, while the smoother, again, yields the poorest fit. The results of the common factor specification suggest
that the cross-section of means and variances of the CDS panel are in favor of a time-
varying hierarchical factor loading matrix specification. Hence, as more dummy variables
are added, all in-sample statistics improve. The largest in-sample improvements occur
when the rating and idiosyncratic factors are fitted. The smallest contributions in MAE
were made by the inclusion of regional factors and the interaction terms. Compared
to smaller hierarchical specifications, the largest fits the data the best. However, the
univariate specification indicates fits the data the best compared to all designs. This
indicates that there are still some cross-sectional dependencies unmodeled, even after
fitting nearly 100, both common and idiosyncratic, mean and variance factors.

Table 5: Estimation results for dynamic high-dimensional hierarchical density models

Panel A: Dynamic Factors & Static Parameters Panel B: In-Sample Performance


Factor Loading Effect Coded # Free Factors # Total # Param. Log Lik. Predictive Update Smoother
Matrix Specification Factors (2 × m; location + scale) Factors (MAE) (MAE) (MAE)

Identity 2 × 144 = 288 288 720 554550 2.540 0.982 8.021

Unit 2×1=2 2 5 -409616 46.396 46.352 46.347

Unit, Tenor 10Y 2(1 + 5) = 12 14 22 -265710 32.936 32.899 32.904

Unit, Tenor, Region 10Y, EU 2(1 + 5 + 2) = 16 20 30 -240612 31.899 31.842 31.874

Unit, Tenor, Region, 10Y, EU, A 2(1 + 5 + 2 + 3) = 22 28 41 -125445 22.436 22.329 23.114
Rating

Unit, Tenor, Region, 10Y, EU, A, 2(1 + 5 + 2 + 2 + 21) = 62 74 103 247741 8.698 8.183 10.528
Rating, Firm Kookmin,
JPMorgen,
Scandinavian

Unit, Tenor, Region, 10Y, EU, A, 2(1+5+2+2+21+9) = 80 98 132 267348 7.984 7.402 10.040
Rating, Firm, Kookmin,
Tenor-Rating JPMorgan,
Scandinavian,
6M-A, 1Y-A,
3Y-A

Notes: the table provides an overview of the estimation results of the fitted hierarchical dynamic density models on the
high-dimensional panel of p = 144 CDS time-series displayed in Figure 4. The first column of the table reports the
specification of the factor loading matrices. The first column in Panel A lists the effect-coded factors of the included
categorical variables in the corresponding Ft (p × m). The remaining columns in this panel report the total number of free time-varying location and log-variance factors, followed by the total number of factors. The numbers in the latter column are the sum of the number of free factors plus two times the number of effect coded factors in the first column of Panel A. The number of parameters indicates the size of the unknown parameter set ψ. Panel B reports the in-sample performance statistics, i.e., the log-likelihood of the estimated model and the MAE of the fitted values in bps. The MAE
accuracy is taken as the average of in-sample prediction errors across all observed values in the CDS panel data. The
MAE is reported based on the conditional estimates of three different GAS filters, i.e., for ft|t−1 (predictive filter), ft|t
(update filter) and ft|n (smoother). The estimation period is January 2, 2011 - December 31, 2020.

Figures 5 and 6 display the filtered conditional one step-ahead (ft) common and idiosyncratic mean and log-variance components of the largest specification of the dynamic density model in Table 5. The graphs visually illustrate the differences between all sources of fluctuations and highlight the presence of strong common and idiosyncratic factors. Note that the level 1 common factor is the intercept, which also captures the fluctuations of the AAA-AA rating because that rating class was discarded from Ft to
avoid perfect multicollinearity. As such the remaining rating factors are modeled as
a difference from the intercept, i.e., a top-up, just like in plain vanilla dummy variable
regressions. The remaining dynamic factors are subject to effect coded dummy variables.

Figure 5: Filtered common location-scale factors

(a) Mean factors

[Figure: time-series plots (2011–2020) of the filtered common mean factors: the level 1 factor (Unit + AAA-AA); tenor factors 6M, 1Y, 3Y, 5Y, 7Y, 10Y; region factors Asia, Europe, North America; rating factors A, BBB, BB; and tenor-rating interaction factors for 6M, 1Y and 3Y combined with AAA-AA, A, BBB and BB.]

(b) Log-variance factors

[Figure: corresponding time-series plots of the filtered common log-variance factors for the same factor categories.]

Notes: the first panel of this figure displays the filtered conditional common CDS mean factors, given the parameter
estimates of the largest specification of the dynamic density model in Table 5. The filtered estimates are obtained from
the predictive GAS filter (ft) and are plotted together with their 95% forecast bands that inherit both innovation and parameter uncertainty, for which S = 500 paths were simulated. The median of the simulation bands is
depicted in solid gray. All filtered estimates are also plotted in solid lines, but each factor category is highlighted with a
different color, e.g., the regional factors are plotted in yellow and the interaction terms in purple. Similarly, the second
panel depicts the time-varying common log-variance factors.

Figure 6: Filtered idiosyncratic location-scale factors

(a) Mean factors

[Figure: time-series plots (2011–2020) of the filtered idiosyncratic mean factors, one per financial firm, grouped into Panel A: Asia, Panel B: Europe and Panel C: North America.]

(b) Log-variance factors

[Figure: corresponding time-series plots of the filtered idiosyncratic log-variance factors per firm and region.]

Notes: the first plot of this figure displays the filtered conditional idiosyncratic (fixed-effect) mean factors (solid pink). Each panel of the first plot shows 8 factors, namely one for each of the financial firms within the respective region (i.e., Asia, North America and Europe). The subplots also show the simulation-based 95% confidence bands. The implied median of the bands is depicted in solid gray. Similarly, the second plot of this figure displays the time-varying idiosyncratic log-variance factors. For more details, see the caption of Figure 5.

In Figure 5 it is evident that in the first half of our time span, the uncertainty in the
mean factors is larger than in the second half. The bands are larger due to the existence

of innovation uncertainty. Indeed, multiple CDS term structures in the panel are subject to conditionally missing observations over the first half. In the second half, mainly static parameter uncertainty remains, which is negligible for the mean equation. This also holds for the idiosyncratic factors in Figure 6. However, for the log-variance factors,
the parameter uncertainty is larger and yields larger in-sample confidence bands.
Interestingly, the forecast bands show highly dynamic patterns for factors that cannot
be estimated due to missing observations. For example, the single B-rated location and
scale factors were initialized with their unconditional mean because these factors could
not be estimated, i.e., there were no CDS firms with this credit rating at the beginning
of our panel data. Here, the filtered estimates for this rating class are simply equal to the
unconditional mean until information arrives that allows for their conditional estimation.
At such time points, the implied median of the forecast bands is very different from the
filtered estimate. This is likely due to the spillover dynamics present in the time-series
system and the correlation structure present in the (robust) asymptotic variance of the
MLE during the simulation.
To better understand the adequacy of our largest imputation model, we use its
conditional filtered factors to construct synthetic credit spreads. For this we visually
compare the synthetic spreads with and without the idiosyncratic information. A pos-
sible realization of a synthetic spread at time point t, say ŷi,t, is obtained as follows: ŷi,t = exp(µ̂i,t + σ̂i,t ε̂i,t), for i = 1, . . . , p = 144 and t = 1, . . . , n = 2609. To obtain an approximate density for the credit spread values, we also propagate the remaining residuals of the system through this formula, i.e.,

Ht(ŷi,t) = Pr(ŷi,t ≤ x) = p^{-1} Σ_{j=1}^{p} 1( exp(µ̂i,t + σ̂i,t ε̂j,t) ≤ x ),

where Ht(ŷi,t) is an approximate cumulative distribution function for predicting yi,t on the observation level, and µ̂i,t and σ̂i,t are pre-computed and fixed moment estimates for yi,t. Imputing observations on the levels provides a more realistic estimate of the missing observations because empirical innovations are also taken into account. When we do not make use of any idiosyncratic information for the prediction of the 6 tenors yt of the k-th firm, we do not include the firm fixed-effect factors f^µ_{k,t} and f^{log σ²}_{k,t} in the computation of the ŷt's and leave out all 6 local-in-time residuals that belong to this firm. That is, only the common factors are used in this case and the sum runs over at most p = 144 − 6 = 138 residuals to compute Ht(ŷi,t). Note that for certain time points fewer than p = 144 residuals are available because the CDS panel data is subject to missing observations.
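A minimal sketch of this construction for one series and time point is given below: given the filtered moment estimates and the cross-section of standardized residuals available at time t, the empirical quantiles of the propagated draws approximate Ht, from which the plotted median and the 95% bands follow. Variable names are illustrative.

```python
import numpy as np

def synthetic_spread(mu_hat_it, sigma_hat_it, residuals_t, drop=None):
    """Median and 95% band of the approximate spread density H_t for series i at time t.

    residuals_t : cross-section of standardized residuals at time t (np.nan if missing)
    drop        : optional indices to exclude, e.g., the firm's own 6 residuals when
                  no idiosyncratic information is to be used
    """
    eps = np.asarray(residuals_t, dtype=float)
    if drop is not None:
        eps = np.delete(eps, drop)
    eps = eps[~np.isnan(eps)]                           # keep available residuals only
    draws = np.exp(mu_hat_it + sigma_hat_it * eps)      # support points of H_t
    median = np.quantile(draws, 0.5)
    lower, upper = np.quantile(draws, [0.025, 0.975])
    return median, (lower, upper)
```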

Figure 7: Examples of synthetic credit curves


[Figure: Panel A: Synthetic CDS curves without idiosyncratic information; Panel B: Synthetic CDS curves with idiosyncratic information. Each panel shows the synthetic curves (tenors 6M–10Y, in bps, 2011–2020) for Amern Express Cr Corp, Lloyds Bkg Group plc, Nomura Hldgs Inc and Royal Bk Cda, together with the true observed spreads.]

Notes: This figure displays the synthetic CDS term structures for four different companies. Panel A plots the curves that are computed without the use of idiosyncratic information, i.e., with common factors only. The conditional filtered location and scale
factors are obtained from the largest hierarchical density model specified in Table 5. The solid colored lines correspond
to the median of the approximate spread density and the shaded areas represent the 95% empirical confidence intervals,
for which the model’s residuals were used. The true observed curves for the corresponding firms are plotted in black dots.
Panel B displays the synthetic curves for the same firms in Panel A but is computed with the use of idiosyncratic factors.

Figure 7 provides four examples of synthetic CDS curves computed with and without
the use of idiosyncratic information. The synthetic curves are computed for the firms:
American Express Credit Corporation (AXP), Lloyds Banking Group Plc (LYG), No-
mura Holdings Inc (NMR), and Royal Bank of Canada (RYT). The two panels of the
figure highlight that without the use of idiosyncratic information much larger errors are
made. At many time points, the synthetic curves are either over-predicting or under-predicting the true data points. The idiosyncratic firm factors are highly informative,
even when the respective curve consists of large conditionally missing gaps. For example,
the CDS curves of LYG and RYT include many missing entries, and the overall level
of these curves is significantly shifted by their idiosyncratic factors and the predictions

44
are much closer to observed values. Most of the time, the true spreads data points are
within the empirical bands with a 95% confidence level.

Table 6: Coverage rates of synthetic curve bands at 95% confidence level

                       Tenor
Region           6M       1Y       3Y       5Y       7Y       10Y

Panel A: Coverage without idiosyncratic information (%)

Asia             50.853   44.37    31.559   24.396   26.227   39.867
Europe           62.639   59.491   34.276   26.250   25.053   29.542
North America    62.874   58.552   36.020   27.645   26.787   35.512

Panel B: Coverage with idiosyncratic information (%)

Asia             94.461   93.518   87.941   92.866   96.493   94.126
Europe           94.160   93.954   95.022   93.685   95.999   95.281
North America    92.823   91.836   88.223   93.110   95.228   92.933

Notes: the table reports the in-sample coverage rates of the empirical 95% confidence bands for the true CDS data points present in the panel of p = 144 time-series. Panel A reports the coverage rates of the empirical bands for which no idiosyncratic information was used. The coverage accuracy is further divided per region-tenor combination. Similarly, Panel B reports the coverage accuracy of the empirical bands for which idiosyncratic information was used.

The coverage rates based on $H_t(\hat{y}_{i,t})$ with and without the use of idiosyncratic information are reported in Table 6. The empirical bands for which no idiosyncratic information was utilized perform very poorly across all regions and tenors. In contrast, the confidence intervals implied by the $H_t(\hat{y}_{i,t})$ that do include idiosyncratic dynamics are highly accurate and very close to the nominal level. As a result, it is also much more likely that the imputed densities $H_t(\hat{y}_{i,t})$ provide a better description of the unobserved values $y_{i,t}$. Hence, our largest time-varying hierarchical density model serves as an adequate choice for imputing the missing values present in the high-dimensional panel of CDS time-series.
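As an illustration of how the coverage rates in Table 6 could be computed, the sketch below counts the fraction of observed data points that fall inside their empirical band. The arrays y, lo, and hi are hypothetical (n, p) arrays holding the observed spreads and the band limits, with NaN at missing entries; the sketch is not the authors' code.

import numpy as np

def coverage_rate(y, lo, hi):
    # share (in %) of observed entries that lie inside their empirical band
    observed = ~np.isnan(y)
    inside = (y >= lo) & (y <= hi) & observed
    return 100.0 * inside.sum() / observed.sum()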

5 Conclusion
We have introduced a score-driven hierarchical Gaussian location-scale model for forecasting the conditional density of missing values in multivariate time-series systems. The model stands out for utilizing both the predictive power and the factor structure present in the system, while permitting the estimation of time-varying common location and scale factors with just a single value observed in the cross-sections through time. The model's imputation mechanism accommodates the treatment of very general missing data patterns. Additionally, the dummy regression-based design matrix of the model offers researchers a high degree of flexibility to exploit the common and idiosyncratic factor structure in the panel's cross-sections of means and variances. We have developed a simulation algorithm to quantify the forecast uncertainty of the dynamic factors at sparse time points, which is appealing for risk management. Future work could extend the model's observation density to a heavy-tailed one, e.g., the multivariate Student's t distribution, which would lead to robust GAS dynamics for the dynamic means and volatilities. Another extension is to introduce dynamic correlations and asymmetry effects to account for the potential asymmetric relationship between shocks and conditional covariances (i.e., the leverage effect).

References
Abadir, K. (2005). Matrix algebra. Cambridge; New York: Cambridge University Press.

Bai, J., & Ng, S. (2021, September). Matrix completion, counterfactuals, and factor
analysis of missing data. Journal of the American Statistical Association, 1–18.

Blasques, F., Gorgi, P., & Koopman, S. (2021, April). Missing observations in
observation-driven time series models. Journal of Econometrics, 221 (2), 542–568.

Blasques, F., Koopman, S. J., Lasak, K., & Lucas, A. (2016, July). In-sample confidence
bands and out-of-sample forecast bands for time-varying parameters in observation-
driven models. International Journal of Forecasting, 32 (3), 875–887.

Buccheri, G., Bormetti, G., Corsi, F., & Lillo, F. (2020). A score-driven conditional
correlation model for noisy and asynchronous data: An application to high-frequency
covariance dynamics. Journal of Business & Economic Statistics, 1–17.

Buccheri, G., Bormetti, G., Corsi, F., & Lillo, F. (2021). Filtering and smoothing with
score-driven models.

Cahan, E., Bai, J., & Ng, S. (2021). Factor-based imputation of missing values and
covariances in panel data of large dimensions.

Creal, D., Koopman, S. J., & Lucas, A. (2008). A general framework for observa-
tion driven time-varying parameter models. Tinbergen Institute Discussion Papers,
108 (4).

Creal, D., Koopman, S. J., & Lucas, A. (2013). Generalized autoregressive score models
with applications. Journal of Applied Econometrics, 28 (5), 777–795.

Creal, D., Schwaab, B., Koopman, S. J., & Lucas, A. (2014). Observation-driven
mixed-measurement dynamic factor models with an application to credit risk. Review
of Economics and Statistics, 96 (5), 898–915.

Delle Monache, D., Petrella, I., & Venditti, F. (2016). Adaptive state space models with
applications to the business cycle and financial stress (CEPR Discussion Papers No.
11599).

Durbin, J., & Koopman, S. J. (2012). Time series analysis by state space methods.
Oxford University Press.

EBA. (2015). EBA Report on CVA under Article 456 (2) of Regulation (EU) No
575/2013. European Banking Authority.

Harvey, A. C. (2009). Dynamic models for volatility and heavy tails. Cambridge Uni-
versity Press.

Jin, S., Miao, K., & Su, L. (2021). On factor models with random missing: Em
estimation, inference, and cross validation. Journal of Econometrics, 222 (1), 745–
777.

Jungbacker, B., Koopman, S. J., & van der Wel, M. (2011). Maximum likelihood estimation for dynamic factor models with missing data. Journal of Economic Dynamics and Control, 35 (8), 1358–1368.

Koopman, S. J., Lit, R., Lucas, A., & Opschoor, A. (2018). Dynamic discrete copula
models for high-frequency stock price changes. Journal of Applied Econometrics,
33 (7), 966–985.

Lucas, A., Opschoor, A., & Schaumburg, J. (2016). Accounting for missing values in
score-driven time-varying parameter models. Economics Letters, 148 , 96–98.

Stock, J., & Watson, M. (2016). Dynamic Factor Models, Factor-Augmented Vec-
tor Autoregressions, and Structural Vector Autoregressions in Macroeconomics. In
J. B. Taylor & H. Uhlig (Eds.), Handbook of Macroeconomics (p. 415-525). Elsevier.

White, H. (1980, May). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48 (4), 817–838.

Xiong, R., & Pelger, M. (2019). Large dimensional latent factor modeling with missing observations and applications to causal inference. SSRN Electronic Journal.

Appendices

A Derivation of Scaled-Scores
The time-$t$ log of the conditional density in (1) is

$$\log p\left(y_t \mid f_t, \mathcal{F}_{t-1}; \psi\right) = -\frac{1}{2}\left[\, p \log 2\pi + \log|\Sigma_t| + (y_t - \mu_t)'\,\Sigma_t^{-1}(y_t - \mu_t)\,\right],$$

with $p$-dimensional mean and variance vectors $\mu_t = F_t f_t^{\mu}$ and $\sigma_t^2 = \exp\!\big(F_t f_t^{\log\sigma^2}\big)$, respectively, and $p \times p$ diagonal variance matrix $\Sigma_t = \operatorname{diag}(\sigma_t^2)$, for $t = 1, \dots, n$.

Proof of Proposition 1: The score and information matrix of the location factors are straightforward to derive:

$$\nabla_t^{\mu} = \frac{\partial \log p\left(y_t \mid f_t, \mathcal{F}_{t-1}; \psi\right)}{\partial f_t^{\mu}} = F_t'\,\Sigma_t^{-1}(y_t - \mu_t),$$

$$\mathcal{I}_t^{\mu} = -\mathrm{E}_{t-1}\!\left[\frac{\partial^2 \log p\left(y_t \mid f_t, \mathcal{F}_{t-1}; \psi\right)}{\partial f_t^{\mu}\,\partial f_t^{\mu\,\prime}}\right] = F_t'\,\Sigma_t^{-1} F_t.$$

From basic matrix calculus it follows that for any positive definite matrix $\Sigma_t$ with triangular factor $J_t$ such that $\Sigma_t = J_t J_t'$, the log-determinant can be written as

$$\log|\Sigma_t| = \log|J_t J_t'| = 2\log|J_t| = 2\log \prod_{i=1}^{p} J_{ii,t} = 2\sum_{i=1}^{p} \log J_{ii,t},$$

see for instance Abadir (2005) for this and other useful results. Since $\Sigma_t$ is diagonal, we can simply define $J_t := \Sigma_t^{1/2} = \operatorname{diag}(\sigma_t)$. Hence, we have

$$\log|\Sigma_t| = 2\sum_{i=1}^{p} \log \sigma_{i,t} = 2\sum_{i=1}^{p} \tfrac{1}{2}\big(F_t f_t^{\log\sigma^2}\big)_i = \sum_{i=1}^{p} \big(F_t f_t^{\log\sigma^2}\big)_i = \operatorname{tr}\!\big(\operatorname{diag}(\log\sigma_t^2)\big),$$

where $\operatorname{tr}(\cdot)$ denotes the trace operator. Using this matrix result, the score and information matrix of the log-variance factors take the following expressions:

$$\nabla_t^{\log\sigma^2} = \frac{1}{2}\,F_t'\left[\Sigma_t^{-1}(y_t - \mu_t) \odot (y_t - \mu_t) - \iota_p\right] = \frac{1}{2}\,F_t'\big(\varepsilon_t^2 - \iota_p\big),$$

$$\mathcal{I}_t^{\log\sigma^2} = \frac{1}{2}\,F_t' F_t,$$

where $\odot$ denotes the elementwise (Hadamard) product and $\varepsilon_t^2$ the elementwise square of $\varepsilon_t = \Sigma_t^{-1/2}(y_t - \mu_t)$.
This completes the proof.
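To illustrate how these scaled scores enter the filter in practice, the sketch below performs one score-driven update of the location and log-variance factors. It is a simplified illustration under stated assumptions: a single set of (kappa, B, A) parameters is shared by both factor blocks, missing entries are handled by simply dropping the corresponding rows from the score (a common device in the score-driven missing-data literature), and enough series are assumed observed for the information matrices to be invertible. It is not the paper's exact recursion.

import numpy as np

def gas_step(y, F, f_mu, f_logsig2, kappa, B, A):
    # y : (p,) observation vector with NaN at missing entries
    # F : (p, k) factor loading (design) matrix; f_mu, f_logsig2 : (k,) current factors
    obs = ~np.isnan(y)                        # rows entering the score at time t
    Fo, yo = F[obs], y[obs]
    mu = Fo @ f_mu
    sig2 = np.exp(Fo @ f_logsig2)
    eps = (yo - mu) / np.sqrt(sig2)

    # scaled scores s_t = S_t * nabla_t, with S_t the inverse conditional information
    s_mu = np.linalg.solve(Fo.T @ (Fo / sig2[:, None]), Fo.T @ ((yo - mu) / sig2))
    s_sig = np.linalg.solve(0.5 * Fo.T @ Fo, 0.5 * Fo.T @ (eps**2 - 1.0))

    # predictive GAS recursions for the next period
    f_mu_next = kappa + B @ f_mu + A @ s_mu
    f_logsig2_next = kappa + B @ f_logsig2 + A @ s_sig
    return f_mu_next, f_logsig2_next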

B Submodels of the multi-factor GAS model


B.1 The VARMA model

The multivariate GAS location model, with constant volatility and with time-varying factor loading matrices, takes the form

$$y_t = \mu_t + e_t, \qquad e_t \sim N(0, \Sigma_e),$$
$$\mu_t = F_t f_t, \qquad t = 1, \dots, n,$$
$$f_{t+1} = \kappa + B f_t + A s_t, \qquad s_t = S_t \nabla_t,$$
$$\nabla_t = F_t'\,\Sigma_e^{-1}(y_t - \mu_t), \qquad S_t = \left(F_t'\,\Sigma_e^{-1} F_t\right)^{-1}.$$

Define $K_t := \left(F_t'\Sigma_e^{-1}F_t\right)^{-1} F_t'\Sigma_e^{-1}$, such that $s_t = K_t(y_t - \mu_t) = K_t(y_t - F_t f_t)$. By substituting $s_t$ into the predictive filter $f_{t+1}$, we get

$$f_{t+1} = \kappa + B f_t + A K_t (y_t - F_t f_t).$$

From the observation equation it follows that for (pseudo-)invertible $F_t$ we have

$$y_t = \mu_t + e_t = F_t f_t + e_t \iff f_t = F_t^{-1}(y_t - e_t).$$

Similarly, by substituting this expression for $f_t$ into the GAS factor recursion one obtains

$$F_{t+1}^{-1}(y_{t+1} - e_{t+1}) = \kappa + B F_t^{-1}(y_t - e_t) + A K_t\big(y_t - F_t F_t^{-1}(y_t - e_t)\big) = \kappa + B F_t^{-1}(y_t - e_t) + A K_t e_t,$$
$$F_{t+1}^{-1} y_{t+1} - F_{t+1}^{-1} e_{t+1} = \kappa + B F_t^{-1} y_t - B F_t^{-1} e_t + A K_t e_t,$$
$$F_{t+1}^{-1} y_{t+1} = \kappa + B F_t^{-1} y_t + \big(A K_t - B F_t^{-1}\big) e_t + F_{t+1}^{-1} e_{t+1}.$$

Multiplying each side of the last equation by $F_{t+1}$ gives

$$y_{t+1} = F_{t+1}\kappa + F_{t+1} B F_t^{-1} y_t + F_{t+1}\big(A K_t - B F_t^{-1}\big) e_t + F_{t+1} F_{t+1}^{-1} e_{t+1} = F_{t+1}\kappa + F_{t+1} B F_t^{-1} y_t + F_{t+1}\big(A K_t - B F_t^{-1}\big) e_t + e_{t+1}.$$

Further assume that $F_t := I_p = F$ for $t = 1, \dots, n$, and thereby $K_t = K = I_p$. Hence,

$$y_{t+1} = F\kappa + F B F^{-1} y_t + F\big(A K - B F^{-1}\big) e_t + e_{t+1} = \kappa + B y_t + (A - B)\,e_t + e_{t+1},$$

which is a VARMA(1,1) model with $\Phi := B$ and $\Theta := A - B$.
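As a quick numerical check of this reduced-form equivalence, the short simulation below (with purely illustrative parameter values, not taken from the paper) runs the GAS location recursion with F_t = I_p and Sigma_e = I_p and verifies that it reproduces the implied VARMA(1,1) path exactly.

import numpy as np

rng = np.random.default_rng(0)
p, n = 2, 200
kappa = np.array([0.1, -0.2])
B = np.diag([0.9, 0.8])                     # persistence matrix
A = np.diag([0.3, 0.2])                     # score loading matrix
e = rng.normal(size=(n, p))                 # Sigma_e = I_p for simplicity

# GAS recursion with F_t = I_p: the scaled score is s_t = y_t - f_t = e_t
f = np.zeros(p)
y_gas = np.zeros((n, p))
for t in range(n):
    y_gas[t] = f + e[t]
    f = kappa + B @ f + A @ (y_gas[t] - f)

# Implied VARMA(1,1): y_{t+1} = kappa + B y_t + (A - B) e_t + e_{t+1}
y_varma = np.zeros((n, p))
y_varma[0] = e[0]
for t in range(n - 1):
    y_varma[t + 1] = kappa + B @ y_varma[t] + (A - B) @ e[t] + e[t + 1]

print(np.max(np.abs(y_gas - y_varma)))      # zero up to floating point error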

B.2 The VEGARCH model

The multivariate GAS exponential scale model, with zero mean and with time-varying factor loading matrices, takes the form

$$y_t = \Sigma_t^{1/2}\varepsilon_t, \qquad \varepsilon_t \sim N(0, I_p),$$
$$\Sigma_t = \operatorname{diag}(\sigma_t^2), \qquad \sigma_t^2 = \exp(F_t f_t),$$
$$f_{t+1} = \kappa + B f_t + A s_t, \qquad s_t = S_t \nabla_t,$$
$$\nabla_t = \frac{1}{2}\,F_t'\big(\varepsilon_t^2 - \iota_p\big), \qquad S_t = 2\left(F_t' F_t\right)^{-1},$$

for $t = 1, \dots, n$. By substituting $s_t = \left(F_t'F_t\right)^{-1}F_t'\big(\varepsilon_t^2 - \iota_p\big)$ into the predictive filter $f_{t+1}$, we get

$$f_{t+1} = \kappa + B f_t + A\left(F_t'F_t\right)^{-1}F_t'\big(\varepsilon_t^2 - \iota_p\big).$$
From the exponential link function it follows that for (pseudo-)invertible $F_t$ we have

$$\sigma_t^2 = \exp(F_t f_t) \iff f_t = F_t^{-1}\log\sigma_t^2.$$

Similarly, by substituting this expression for $f_t$ into the GAS factor recursion one obtains

$$F_{t+1}^{-1}\log\sigma_{t+1}^2 = \kappa + B F_t^{-1}\log\sigma_t^2 + A\left(F_t'F_t\right)^{-1}F_t'\big(\varepsilon_t^2 - \iota_p\big).$$

Multiplying each side of the last equation by $F_{t+1}$ gives

$$\log\sigma_{t+1}^2 = F_{t+1}\kappa + F_{t+1} B F_t^{-1}\log\sigma_t^2 + F_{t+1} A\left(F_t'F_t\right)^{-1}F_t'\big(\varepsilon_t^2 - \iota_p\big).$$

Further assume that $F_t := I_p = F$ for $t = 1, \dots, n$. Hence,

$$\log\sigma_{t+1}^2 = F\kappa + F B F^{-1}\log\sigma_t^2 + F A\left(F'F\right)^{-1}F'\big(\varepsilon_t^2 - \iota_p\big) = \kappa + B\log\sigma_t^2 + A\big(\varepsilon_t^2 - \iota_p\big),$$

which is a (symmetric) VEGARCH(1,1) model with $\omega := \kappa$, $\alpha := A$, and $\beta := B$.
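A filter built directly on this recursion makes the connection tangible: with F_t = I_p the scaled score reduces to the squared standardized observation minus one, exactly as in a symmetric EGARCH-type update. The sketch below (illustrative parameter values only; diagonal A and B treated elementwise) filters the log-variances from observed zero-mean data; it is an illustration, not the paper's estimation code.

import numpy as np

def vegarch_filter(y, kappa, B, A):
    # y : (n, p) zero-mean observations; kappa, B, A : (p,) elementwise (diagonal) parameters
    n, p = y.shape
    f = np.zeros(p)                           # f_t = log sigma_t^2, initialized at zero
    path = np.zeros((n, p))
    for t in range(n):
        path[t] = f
        eps2 = y[t]**2 / np.exp(f)            # squared standardized observations
        f = kappa + B * f + A * (eps2 - 1.0)  # symmetric VEGARCH(1,1) update
    return path

# illustrative use on simulated data with a slowly varying volatility level
rng = np.random.default_rng(1)
scale = np.exp(0.5 * np.sin(np.linspace(0.0, 6.0, 500)))
y = rng.normal(size=(500, 2)) * scale[:, None]
log_sig2 = vegarch_filter(y, kappa=np.full(2, -0.02), B=np.full(2, 0.95), A=np.full(2, 0.10))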

C Additional Monte Carlo simulation results


This appendix reports additional results of our Monte Carlo study in Section 3. Additional experiments were performed to highlight that the forecast performance results obtained at the factor level also hold for functions of the location and log-variance factors. For example, Panels A-D in Table 7 report these results for all time-varying mean and volatility predictions at sparse locations. The MAFE errors and percentage coverage rates are reported for the exact same missing data mechanism and scenario combinations as in Table 1. The last panel in Table 7 reports the Pearson correlation between the forecast errors of mean and variance pairs (in percentages), i.e., $\mathrm{Corr}\big(\mu_i - \hat{\mu}_i,\ \sigma_i^2 - \hat{\sigma}_i^2\big)$, for $i = 1, \dots, p = 4$.
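For reference, the two summary statistics used in this panel can be computed as in the short sketch below, assuming hypothetical arrays holding the true and predicted mean and variance paths at the evaluated (sparse) time points.

import numpy as np

def mafe(true_path, pred_path):
    # mean absolute forecast error over the evaluated (sparse) time points
    return np.mean(np.abs(true_path - pred_path))

def mean_variance_error_corr(mu_true, mu_pred, sig2_true, sig2_pred):
    # Pearson correlation (in %) between the mean and variance forecast errors
    e_mu, e_sig2 = mu_true - mu_pred, sig2_true - sig2_pred
    return 100.0 * np.corrcoef(e_mu, e_sig2)[0, 1]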
Additionally, we measure the forecast error and the accuracy of the bands for the model-implied risk measures, given the true and predicted paths of the time-varying means and variances of the four series in our system. We consider value-at-risk (VaR) and expected shortfall (ES) at several confidence levels. The MAFE for the model-implied VaR and ES at conventional $1-\alpha$ confidence levels is reported in Table 8. The coverage rates of the forecast bands for these time-varying risk measures are reported in Table 9. Furthermore, we also gauge the precision of these risk measures through the accuracy of the predicted hit variables. A hit variable takes value one if the time-series variable exceeds its VaR/ES threshold at a given time point, and is zero otherwise. The forecast error of the hit variables is then measured as the MAFE between the true and predicted hit variables. Since the hit variable is binary, the MAFE quantifies the precision and is naturally bounded between 0 and 1. Table 10 reports the MAFE precision rates for each time-series across all $1-\alpha$ confidence levels.
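For concreteness, a minimal sketch of the Gaussian model-implied lower-tail VaR and ES and of the hit-variable MAFE is given below. The inputs are hypothetical arrays of conditional means and standard deviations; the sign and tail conventions used in the tables may differ, so the sketch should be read as an illustration of the definitions rather than the exact computation.

import numpy as np
from scipy.stats import norm

def gaussian_var_es(mu, sigma, alpha=0.05):
    # lower-tail VaR and ES implied by N(mu, sigma^2) at level alpha
    z = norm.ppf(alpha)
    var = mu + sigma * z
    es = mu - sigma * norm.pdf(z) / alpha     # E[Y | Y <= VaR]
    return var, es

def hit_mafe(y, var_true, var_pred):
    # MAFE between true and predicted VaR hit variables; bounded between 0 and 1
    hit_true = (y <= var_true).astype(float)
    hit_pred = (y <= var_pred).astype(float)
    return np.mean(np.abs(hit_true - hit_pred))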

Table 7: In-sample forecast performance results on mean-variance level

Conditionally Missing Randomly Missing

60% 25% 50% 75%

Sparse Series in Time-Series System

Moments 0 y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t

Panel A: MAFE of in-sample forecasts for mean equation

µ1 0.050 0.419 0.486 0.557 0.818 0.169 0.182 0.193 0.202 0.256 0.279 0.297 0.317 0.339 0.377 0.412 0.456
µ2 0.048 0.136 0.486 0.558 0.821 0.075 0.183 0.193 0.203 0.098 0.280 0.298 0.318 0.118 0.377 0.410 0.455
µ3 0.048 0.136 0.218 0.557 0.818 0.074 0.096 0.193 0.203 0.096 0.136 0.297 0.317 0.118 0.179 0.410 0.453
µ4 0.065 0.151 0.246 0.330 0.816 0.091 0.114 0.134 0.21 0.114 0.158 0.193 0.335 0.134 0.199 0.246 0.477

Panel B: MAFE of in-sample forecasts for variance equation

σ1 0.020 0.237 0.258 0.297 0.360 0.071 0.074 0.077 0.079 0.119 0.125 0.133 0.135 0.182 0.193 0.214 0.212
σ2 0.019 0.037 0.259 0.299 0.359 0.023 0.074 0.077 0.079 0.028 0.126 0.134 0.136 0.032 0.191 0.213 0.211
σ3 0.019 0.037 0.061 0.298 0.361 0.023 0.028 0.077 0.079 0.028 0.039 0.134 0.136 0.032 0.051 0.212 0.210
σ4 0.025 0.046 0.078 0.106 0.360 0.032 0.038 0.046 0.081 0.037 0.051 0.066 0.142 0.042 0.064 0.084 0.222

Panel C: Coverage of mean equation at 95% confidence level (%)

µ1 92.804 95.031 95.258 98.080 94.551 95.716 95.593 95.752 95.995 96.032 96.127 96.849 97.113 96.12 97.164 99.038 98.809
µ2 94.041 95.111 95.175 98.048 94.447 94.894 95.650 95.794 95.972 94.816 96.228 97.053 97.141 95.086 97.095 99.075 98.869
µ3 93.492 95.144 93.537 98.036 94.518 95.239 95.071 95.742 95.942 95.225 93.663 96.960 97.187 95.158 93.204 99.121 98.864
µ4 96.086 96.862 96.657 99.069 95.015 96.620 96.550 96.259 96.923 96.318 96.299 96.496 98.159 97.682 97.884 98.734 99.668

Panel D: Coverage of variance equation at 95% confidence level (%)

σ1 94.665 94.983 95.438 97.671 98.335 94.976 94.458 93.680 93.610 94.645 93.405 91.241 91.444 93.454 92.186 91.931 90.107
σ2 94.946 95.278 95.318 97.751 98.373 94.857 94.508 93.760 93.405 95.025 93.300 91.108 90.985 94.529 92.518 92.107 89.993
σ3 94.756 95.567 91.359 97.686 98.402 95.210 93.895 93.713 93.004 95.020 90.726 91.065 91.168 95.161 86.535 92.057 90.368
σ4 91.896 95.823 96.602 98.710 98.353 92.307 92.214 91.764 93.569 92.884 91.437 88.614 92.571 93.433 92.241 89.058 92.733

Panel E: Forecast error correlation between mean and variance pairs (%)

ρ_{µ1,σ²1} 0.191 0.298 -0.222 -0.729 -0.632 -0.563 -0.273 -0.764 -0.263 -0.885 -0.751 -0.535 -0.844 0.215 0.123 0.166 -0.150
ρ_{µ2,σ²2} 1.249 0.615 -0.653 -0.723 -0.738 -0.497 -0.360 0.095 0.687 -0.552 0.138 0.289 -0.346 -0.183 0.376 0.615 0.588
ρ_{µ3,σ²3} -0.273 0.336 0.0210 -1.068 -0.518 0.329 0.234 -0.156 -0.098 -0.248 -0.286 0.058 -0.120 -0.126 0.088 -0.004 -0.292
ρ_{µ4,σ²4} -0.402 -0.110 -0.702 -0.55 -1.385 -0.181 0.237 0.010 0.499 -0.216 -0.229 -0.116 -0.140 0.242 0.025 -0.151 -0.439

Notes: The first two panels of this table report the MAFE of the mean and volatility for each time-series. The next two panels report the coverage rates of the forecast bands corresponding to a
confidence level of 95% for the mean and volatility. The last panel reports the Pearson’s correlation coefficient (in %) between the forecast errors of mean and variance pairs. For more details on
the scenarios and missing data mechanisms, see caption of Table 1.

Table 8: In-sample forecast performance results for tail risk measures

Conditionally Missing Randomly Missing

60% 25% 50% 75%

Sparse Series in Time-Series System

Series 0 y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t

Panel A: MAFE of in-sample forecasts for VaR with α = 5%

y1 0.108 1.265 1.379 1.551 1.899 0.388 0.407 0.426 0.438 0.645 0.677 0.718 0.735 0.949 1.013 1.120 1.104
y2 0.103 0.225 1.381 1.543 1.893 0.136 0.409 0.428 0.439 0.167 0.686 0.728 0.737 0.199 0.997 1.108 1.096
y3 0.103 0.225 0.382 1.532 1.876 0.135 0.166 0.427 0.438 0.166 0.236 0.723 0.737 0.198 0.312 1.153 1.099
y4 0.136 0.274 0.462 0.604 1.887 0.179 0.221 0.263 0.450 0.215 0.303 0.384 0.752 0.248 0.376 0.482 1.116

Panel B: MAFE of in-sample forecasts for VaR with α = 1%

y1 0.143 1.734 1.884 2.112 2.53 0.523 0.546 0.570 0.585 0.874 0.915 0.969 0.987 1.295 1.378 1.523 1.489
y2 0.137 0.287 1.883 2.101 2.520 0.177 0.549 0.573 0.586 0.214 0.927 0.982 0.990 0.255 1.356 1.509 1.478
y3 0.138 0.287 0.491 2.087 2.499 0.175 0.214 0.571 0.585 0.214 0.303 0.975 0.990 0.252 0.400 1.571 1.483
y4 0.180 0.356 0.602 0.784 2.511 0.236 0.290 0.346 0.601 0.282 0.397 0.505 1.009 0.324 0.491 0.633 1.502

Panel C: MAFE of in-sample forecasts for ES with α = 2.5%

y1 0.144 1.742 1.893 2.121 2.541 0.525 0.549 0.572 0.587 0.878 0.919 0.973 0.991 1.301 1.384 1.53 1.495
y2 0.137 0.288 1.892 2.110 2.531 0.178 0.551 0.575 0.588 0.215 0.932 0.986 0.994 0.256 1.362 1.516 1.485
y3 0.138 0.288 0.492 2.096 2.51 0.176 0.215 0.573 0.587 0.214 0.304 0.98 0.994 0.253 0.402 1.578 1.489
y4 0.181 0.357 0.604 0.788 2.522 0.237 0.292 0.348 0.603 0.283 0.398 0.507 1.013 0.326 0.493 0.636 1.509

Panel D: MAFE of in-sample forecasts for ES with α = 1%

y1 0.162 1.972 2.140 2.396 2.854 0.592 0.618 0.644 0.660 0.991 1.037 1.097 1.116 1.47 1.563 1.728 1.685
y2 0.154 0.320 2.138 2.383 2.841 0.198 0.621 0.647 0.662 0.239 1.051 1.112 1.119 0.284 1.538 1.712 1.673
y3 0.155 0.320 0.547 2.368 2.820 0.196 0.240 0.645 0.660 0.239 0.338 1.104 1.12 0.281 0.447 1.782 1.678
y4 0.203 0.399 0.674 0.879 2.831 0.265 0.326 0.389 0.678 0.316 0.445 0.568 1.141 0.364 0.551 0.711 1.700

Notes: The panels of this table report the MAFE accuracy of the time-varying VaR and ES at several confidence levels for each time-series in the system. For more details on the scenarios and
missing data mechanisms, see caption of Table 1.

Table 9: In-sample accuracy of forecast bands for tail risk measures

Conditionally Missing Randomly Missing

60% 25% 50% 75%

Sparse Series in Time-Series System

Series 0 y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t

Panel A: Coverage of VaR with α = 5% at 95% confidence level (%)

y1 95.079 95.276 95.720 97.895 97.768 95.473 95.158 94.511 94.462 95.389 94.638 93.698 94.142 95.413 95.150 95.436 96.108
y2 95.427 95.551 95.981 97.960 97.825 95.310 94.948 94.172 94.311 95.355 94.587 93.402 93.960 94.868 95.289 95.479 96.087
y3 95.331 95.664 92.372 97.965 97.918 95.516 94.532 94.463 94.105 95.575 92.080 93.521 94.230 95.619 90.048 95.487 96.071
y4 93.683 96.804 97.275 98.913 97.957 94.013 94.232 93.868 95.258 94.686 93.817 92.801 96.235 96.050 95.832 93.860 98.505

Panel B: Coverage of VaR with α = 1% at 95% confidence level (%)

y1 94.966 95.210 95.695 97.879 97.996 95.384 95.064 94.265 94.109 95.143 94.199 92.959 93.282 94.926 94.519 94.510 94.93
y2 95.247 95.528 95.988 97.941 98.061 95.211 94.916 93.951 93.939 95.335 94.118 92.615 93.016 94.791 94.656 94.576 94.913
y3 95.217 95.719 92.042 97.958 98.146 95.395 94.422 94.244 93.670 95.480 91.577 92.784 93.341 95.560 89.007 94.600 94.926
y4 92.977 96.554 97.229 98.897 98.151 93.436 93.616 93.106 94.631 94.036 92.925 91.443 95.224 95.264 94.835 92.233 97.633

Panel C: Coverage of ES with α = 2.5% at 95% confidence level (%)

y1 94.964 95.206 95.696 97.880 97.999 95.382 95.063 94.260 94.110 95.141 94.195 92.951 93.273 94.917 94.512 94.500 94.915
y2 95.244 95.529 95.989 97.940 98.061 95.208 94.921 93.953 93.937 95.334 94.113 92.606 93.000 94.791 94.647 94.567 94.896
y3 95.214 95.719 92.039 97.958 98.148 95.392 94.418 94.240 93.668 95.477 91.571 92.776 93.330 95.558 88.998 94.592 94.910
y4 92.970 96.550 97.226 98.899 98.153 93.429 93.612 93.098 94.622 94.032 92.916 91.424 95.216 95.256 94.824 92.218 97.620

Panel D: Coverage of ES with α = 1% at 95% confidence level (%)

y1 94.927 95.175 95.693 97.875 98.062 95.355 95.053 94.215 94.013 95.073 94.088 92.778 93.054 94.751 94.316 94.280 94.512
y2 95.181 95.523 95.976 97.944 98.124 95.192 94.898 93.908 93.841 95.344 93.980 92.423 92.739 94.805 94.411 94.359 94.466
y3 95.169 95.746 91.974 97.942 98.212 95.356 94.403 94.154 93.554 95.464 91.433 92.623 93.059 95.544 88.759 94.390 94.518
y4 92.779 96.457 97.233 98.934 98.209 93.277 93.435 92.939 94.433 93.850 92.682 91.048 94.914 95.014 94.538 91.849 97.219

Notes: The panels of this table report the coverage rates of the forecast bands for the time-varying VaR and ES at several confidence levels for each time-series in the system. For more details
on the scenarios and missing data mechanisms, see caption of Table 1.

Table 10: In-sample forecast precision of tail risk measures

Conditionally Missing Randomly Missing

60% 25% 50% 75%

Sparse Series in Time-Series System

Series 0 y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t y1,t y1:2,t y1:3,t y1:4,t

Panel A: MAFE of in-sample forecasts for VaR hit-variable with α = 5% (%)

y1 0.381 5.044 5.044 5.044 5.044 5.046 2.338 2.355 2.422 4.968 3.622 3.611 3.65 5.019 4.542 4.562 4.594
y2 0.364 0.815 5.006 5.006 5.006 0.474 4.989 2.367 2.409 0.605 5.03 3.682 3.695 0.719 5.011 4.553 4.556
y3 0.366 0.824 1.319 4.992 4.992 0.486 0.606 5.002 2.453 0.587 0.818 5.016 3.716 0.688 1.053 4.95 4.511
y4 0.499 0.987 1.633 2.089 5.009 0.62 0.779 0.93 4.912 0.785 1.108 1.37 4.934 0.893 1.329 1.687 4.968

Panel B: MAFE of in-sample forecasts for VaR hit-variable with α = 1% (%)

y1 0.143 1.013 1.013 1.013 1.013 1.049 0.655 0.659 0.668 0.999 0.914 0.919 0.911 1.017 1.056 1.045 1.068
y2 0.127 0.271 1.01 1.01 1.01 0.176 1.036 0.65 0.661 0.194 1.006 0.92 0.941 0.234 0.994 1.021 1.041
y3 0.132 0.27 0.422 0.982 0.982 0.16 0.192 0.982 0.636 0.199 0.261 1.012 0.93 0.231 0.341 0.987 1.032
y4 0.173 0.343 0.569 0.747 1.035 0.228 0.276 0.327 0.995 0.259 0.371 0.477 1.005 0.305 0.465 0.594 1.008

Panel C: MAFE of in-sample forecasts for ES hit-variable with α = 2.5% (%)

y1 0.139 0.978 0.978 0.978 0.978 1.021 0.638 0.644 0.653 0.969 0.891 0.894 0.891 0.986 1.026 1.016 1.036
y2 0.122 0.264 0.976 0.976 0.976 0.18 1.01 0.643 0.645 0.193 0.978 0.9 0.916 0.224 0.964 0.992 1.013
y3 0.128 0.264 0.413 0.951 0.951 0.151 0.187 0.957 0.622 0.193 0.249 0.984 0.907 0.225 0.331 0.955 1.004
y4 0.165 0.333 0.559 0.734 1.001 0.223 0.276 0.322 0.959 0.249 0.361 0.470 0.975 0.300 0.452 0.581 0.977

Panel D: MAFE of in-sample forecasts for ES hit-variable with α = 1% (%)

y1 0.065 0.394 0.394 0.394 0.394 0.405 0.305 0.311 0.324 0.381 0.408 0.415 0.406 0.392 0.453 0.439 0.450
y2 0.058 0.126 0.384 0.384 0.384 0.075 0.396 0.307 0.286 0.09 0.373 0.387 0.409 0.106 0.38 0.434 0.445
y3 0.058 0.126 0.198 0.373 0.373 0.071 0.093 0.38 0.293 0.105 0.131 0.402 0.424 0.108 0.159 0.381 0.446
y4 0.086 0.175 0.299 0.383 0.391 0.104 0.136 0.172 0.375 0.127 0.185 0.240 0.378 0.154 0.229 0.304 0.391

Notes: The panels of this table report the precision of hit-variables for the time-varying VaR and ES at several confidence levels for each time-series in the system. For more details on the
scenarios and missing data mechanisms, see caption of Table 1.

D Additional case study results
D.1 Empirical performance of GAS update filter and smoother

Table 11: In-sample forecast accuracy of update filter on CDS JPM Term Structure
Conditionally Missing Randomly Missing

60% 25% 50% 75%

Sparse Series in Time-Series System

Factor Specification Tenor 0 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

Panel A: MAE of in-sample forecasts (bps)

1Y 0.602 15.126 15.108 15.102 15.105 2.105 1.569 1.439 1.266 2.564 2.115 1.877 1.844 3.344 2.992 2.880 2.906
3Y 0.684 16.426 16.402 16.417 16.400 2.587 1.918 1.748 1.551 2.921 2.487 2.263 2.197 3.970 3.592 3.451 3.383
Identity
5Y 0.848 16.547 16.533 16.535 16.533 2.883 2.143 1.957 1.752 3.239 2.688 2.521 2.449 4.239 3.742 3.760 3.651
10Y 1.047 17.091 17.081 17.054 17.010 3.288 2.465 2.196 2.006 3.562 2.909 2.776 2.692 4.442 4.013 3.880 3.840

1Y 0.891 6.467 8.184 9.535 15.103 1.440 1.261 1.168 1.213 1.578 1.443 1.481 1.485 1.981 1.926 2.020 2.246
3Y 0.813 3.144 4.554 7.600 16.354 1.190 1.151 1.128 1.180 1.328 1.450 1.356 1.542 1.384 1.487 1.722 2.356
Effect Coding
5Y 0.839 6.650 4.259 6.453 16.589 1.362 1.277 1.251 1.302 1.423 1.426 1.555 1.841 1.579 1.627 1.971 2.871
10Y 0.895 10.200 12.658 14.596 17.225 2.104 1.800 1.713 1.796 2.089 2.699 2.007 2.216 2.351 2.290 2.373 3.035

1Y 0.844 6.563 6.901 8.168 9.464 1.670 1.251 1.210 1.175 1.708 1.511 1.448 1.377 2.167 1.990 2.055 1.968
3Y 0.762 3.479 4.823 5.477 8.620 1.284 1.109 1.039 1.079 1.401 1.277 1.238 1.344 1.547 1.517 1.621 2.034
Effect Coding + NAIG-5Y
5Y 0.890 4.301 5.216 5.959 9.938 1.402 1.199 1.367 1.202 1.490 1.365 1.390 1.521 1.659 1.632 1.796 2.314
10Y 0.955 7.218 8.207 8.526 11.63 1.686 1.432 1.567 1.361 1.776 1.615 1.620 1.749 2.153 2.055 2.187 2.765

Panel B: Coverage of in-sample confidence bands at 95% confidence level (%)

1Y 100 92.089 91.856 91.881 91.694 95.173 96.853 97.484 97.753 94.787 96.093 96.831 96.867 94.676 95.916 96.066 95.942
3Y 100 96.844 96.556 96.559 96.739 94.907 96.880 97.342 97.807 95.480 96.080 96.813 96.870 94.622 95.449 95.630 95.907
Identity
5Y 100 92.856 92.461 92.685 92.519 95.333 97.067 97.333 97.913 95.733 96.507 96.747 96.913 94.604 95.312 95.645 95.666
10Y 100 91.111 91.150 91.300 91.244 94.827 96.720 97.538 97.833 95.253 96.533 96.827 96.933 94.955 95.859 96.013 96.109

1Y 99.847 90.944 89.511 88.207 81.708 98.853 99.240 99.431 99.413 98.600 98.840 98.902 98.550 96.969 97.484 97.775 96.831
3Y 100 98.489 95.439 97.400 85.164 99.973 99.867 99.822 99.773 99.760 99.573 99.684 99.047 99.680 99.604 99.339 97.691
Effect Coding
5Y 100 99.611 97.078 97.707 92.122 99.867 99.907 99.884 99.860 99.813 99.780 99.756 99.213 99.662 99.658 99.532 97.884
10Y 100 99.900 98.167 95.333 92.197 99.813 99.693 99.707 99.453 99.653 97.787 99.400 98.890 99.191 98.924 98.972 97.038

1Y 99.847 92.367 95.072 93.119 94.406 98.507 99.080 99.111 99.393 98.240 98.780 98.773 98.563 96.924 97.284 97.203 97.358
3Y 100 99.233 99.011 98.711 98.411 99.867 99.960 99.929 99.847 99.800 99.753 99.769 99.393 99.422 99.293 99.087 98.342
Effect Coding + NAIG-5Y
5Y 100 99.522 98.756 97.989 97.481 99.973 99.947 99.564 99.902 99.853 99.833 99.791 99.523 99.458 99.413 99.316 98.593
10Y 100 99.367 98.711 97.519 96.550 99.413 99.733 99.511 99.727 99.200 99.480 99.573 99.320 98.916 98.893 98.684 98.398

Notes: this table reports the empirical Monte Carlo imputation performance results for the dynamic multi-factor density model on JPM's CDS time-series (the reported tenors are 1Y, 3Y, 5Y, and 10Y). The performance results are reported for different missing data patterns (i.e., combinations of missing data mechanisms and number of sparse time-series in the system), given three different specifications of the factor loading matrix. Panel A of this table reports the MAFE in basis points (bps) and Panel B reports the coverage rates (%) for the true observed values, computed at sparse entries only. All results are computed using the filtered location and scale estimates from the GAS update recursion in (19) (conditioned on information up to and including time t, i.e., f_{t|t}). The estimation period is January 2, 2011 - December 31, 2020.

Table 12: In-sample forecast accuracy of smoother on CDS JPM Term Structure

Conditionally Missing Randomly Missing

60% 25% 50% 75%

Sparse Series in Time-Series System

Factor Specification Tenor 0 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

Panel A: MAE of in-sample forecasts (bps)

1Y 5.282 15.186 15.183 15.177 15.180 4.926 4.938 5.065 5.017 4.287 4.364 4.290 4.351 3.421 3.377 3.377 3.422
3Y 6.692 16.588 16.527 16.554 16.515 6.218 6.101 6.361 6.212 5.366 5.494 5.411 5.438 4.298 4.289 4.358 4.396
Identity
5Y 7.368 16.631 16.713 16.655 16.722 6.780 6.691 7.229 6.879 6.090 6.157 6.108 6.128 5.112 5.054 5.122 5.110
10Y 7.649 17.239 17.257 17.227 17.235 7.203 7.252 7.525 7.294 6.439 6.493 6.446 6.452 5.393 5.438 5.482 5.543

1Y 5.956 10.315 10.704 11.241 15.365 6.137 6.036 5.937 6.033 6.067 5.897 5.763 5.419 5.804 5.419 5.013 4.354
3Y 6.952 9.010 9.184 9.582 16.601 6.997 6.818 6.781 6.901 6.915 7.583 6.351 5.955 6.719 6.317 5.787 4.995
Effect Coding
5Y 6.748 10.847 8.506 8.829 16.597 6.832 6.638 6.566 6.429 6.795 6.535 6.200 5.684 6.796 6.409 5.792 4.991
10Y 6.539 10.410 12.543 14.446 17.005 6.183 6.293 6.270 6.177 5.921 6.303 5.855 5.568 5.536 5.451 5.304 4.529

1Y 5.891 10.292 10.188 10.537 11.016 6.096 5.861 7.237 5.964 5.865 5.911 5.680 5.466 5.666 5.405 4.976 4.533
3Y 6.983 8.691 9.409 9.466 10.658 6.964 6.764 6.704 6.913 6.713 6.620 6.516 6.269 6.683 6.445 6.191 5.465
Effect Coding + NAIG-5Y
5Y 6.959 8.113 9.132 9.959 11.963 6.905 6.849 8.567 6.835 6.853 6.807 6.586 6.371 6.801 6.606 6.465 5.813
10Y 6.292 9.663 11.213 12.461 13.495 6.808 6.678 8.740 6.718 6.562 6.491 6.498 6.246 6.802 6.687 6.383 5.917

Panel B: Coverage of in-sample confidence bands at 95% confidence level (%)

1Y 52.472 93.844 93.572 93.863 93.625 63.067 62.880 61.360 60.607 78.120 78.147 78.751 78.930 96.836 96.813 96.797 96.621
3Y 48.793 97.700 97.533 97.437 97.453 59.147 58.880 57.964 58.013 77.240 75.467 75.929 76.170 95.227 95.267 94.904 95.047
Identity
5Y 50.709 95.511 94.772 94.837 94.633 62.213 61.827 60.089 60.227 76.507 75.140 74.653 74.617 92.702 92.722 92.515 92.420
10Y 53.545 91.300 91.322 91.130 91.169 63.333 62.600 60.987 61.520 74.253 74.280 75.018 74.863 91.584 91.564 91.476 91.671

1Y 47.949 89.689 89.039 87.611 81.061 48.160 47.600 49.422 49.607 48.613 49.500 53.218 55.857 52.702 56.618 63.579 76.591
3Y 47.566 97.122 94.889 96.496 87.539 48.240 49.653 50.569 50.553 49.107 52.507 54.049 58.910 50.747 55.724 63.538 79.138
Effect Coding
5Y 54.159 99.122 97.011 97.341 93.081 57.333 57.747 59.182 60.893 56.707 61.060 64.520 72.060 60.036 64.813 74.501 87.858
10Y 60.636 99.700 97.772 94.563 91.983 65.733 66.147 66.267 67.420 70.613 73.140 71.227 74.503 76.498 77.809 79.816 89.251

1Y 49.406 93.389 95.650 93.130 94.844 47.307 48.440 46.293 48.827 49.827 50.627 52.102 54.800 54.604 57.302 62.750 73.149
3Y 46.493 97.967 98.367 98.193 98.742 47.227 47.120 48.560 48.167 48.347 49.573 51.240 54.600 50.204 53.111 58.356 73.084
Effect Coding + NAIG-5Y
5Y 50.326 98.311 98.317 97.437 97.256 55.867 54.680 52.924 55.853 55.600 55.453 58.142 62.073 57.004 59.627 64.145 77.527
10Y 61.250 99.133 98.189 96.867 97.144 60.000 61.787 59.324 62.033 63.933 64.747 66.418 70.143 66.542 67.071 71.799 83.698

Notes: this table reports the empirical Monte Carlo imputation performance results for the dynamic multi-factor density model on JPM's CDS time-series (the reported tenors are 1Y, 3Y, 5Y, and 10Y). The performance results are reported for different missing data patterns (i.e., combinations of missing data mechanisms and number of sparse time-series in the system), given three different specifications of the factor loading matrix. Panel A of this table reports the MAFE in basis points (bps) and Panel B reports the coverage rates (%) for the true observed values, computed at sparse entries only. All results are computed using the smoothed location and scale estimates from the GAS smoothing recursion in (19) (conditioned on all time points, i.e., f_{t|n}). The estimation period is January 2, 2011 - December 31, 2020.

D.2 Rolling window OLS and smoothed factors

Figure 8: Smoothed estimates of common factors

(a) Mean factors


[Panel (a): Mean factors. Time-series plots over 2012-2020 of the common mean factors per tenor (6M, 1Y, 3Y, 5Y, 7Y, 10Y), region (Asia, Europe, North America), rating (A, BBB, BB), and tenor-rating interaction (6M, 1Y, 3Y crossed with AAA-AA, A, BBB, BB), comparing the predictive GAS filter, the GAS smoother, and rolling window OLS estimates.]

(b) Log-variance factors


[Panel (b): Log-variance factors. The same layout for the common log-variance factors, with the GAS smoother overlaid on the filtered estimates.]

Notes: the first panel of this figure displays the filtered conditional common CDS mean factors. All filtered estimates obtained through the predictive GAS filter (f_t) are plotted in solid lines, and each factor category is highlighted with a different color, e.g., the regional factors are plotted in yellow and the interaction terms in purple. The factors according to the GAS smoother are plotted in solid bright blue lines. The daily rolling window OLS estimates of the mean factors are plotted in black dots. Similarly, the second panel depicts the filtered and smoothed estimates of the time-varying common log-variance factors. For more details, see the caption of Figure 5.

Figure 9: Smoothed estimates of idiosyncratic location-scale factors

(a) Mean factors


[Panel (a): Mean factors. Panel A (Asia), Panel B (Europe), and Panel C (North America) each display the idiosyncratic (firm fixed-effects) mean factors of eight financial firms over 2012-2020: Bk of India, CTBC Finl Hldg Co Ltd, INDL COML BNK OF CHINA LTD, Kookmin Bk, MIZUHO Finl Gp INC, Mitsubishi UFJ Morgan Stanley Secs Co Ltd, Nomura Hldgs Inc, and Temasek Hldgs (Asia); Allianz SE, BNP Paribas, Barclays PLC, Deutsche Bk AG, ING Groep NV, Lloyds Bkg Group plc, Skandinaviska Enskilda Banken AB, and Swiss Life Ltd (Europe); Amern Express Cr Corp, Boston Pptys Inc, Cap One Finl Corp, Goldman Sachs Gp Inc, JPMorgan Chase & Co, Navient Corp, Royal Bk Cda, and Wells Fargo & Co (North America). The predictive GAS filter, the GAS smoother, and rolling window OLS estimates are overlaid.]

(b) Log-variance factors


[Panel (b): Log-variance factors. The same layout for the idiosyncratic log-variance factors of the same firms, comparing the predictive GAS filter and the GAS smoother.]

Notes: the first panel of this figure displays the filtered conditional idiosyncratic (fixed-effects) mean factors (solid pink). Each regional panel plots 8 factors, namely one for each of the financial firms within the respective region. The factors according to the GAS smoother are plotted in solid bright blue lines. The daily rolling window OLS estimates of the mean factors are plotted in black dots. Similarly, the second panel depicts the filtered and smoothed estimates of the time-varying idiosyncratic log-variance factors. For more details, see the caption of Figure 5.
