by
Aijun Zhang
A dissertation submitted in partial fulﬁllment
of the requirements for the degree of
Doctor of Philosophy
(Statistics)
in The University of Michigan
2009
Doctoral Committee:
Professor Vijayan N. Nair, Co-Chair
Agus Sudjianto, Co-Chair, Bank of America
Professor Tailen Hsing
Associate Professor Jionghua Jin
Associate Professor Ji Zhu
© Aijun Zhang 2009
All Rights Reserved
To my elementary school, high school and university teachers
ACKNOWLEDGEMENTS
First of all, I would like to express my gratitude to my advisor, Prof. Vijay Nair, for guiding
me through the entire PhD research. I appreciate his inspiration, encouragement and
protection during these valuable years at the University of Michigan. I am thankful
to Julian Faraway for his encouragement during the first years of my PhD journey. I
would also like to thank Ji Zhu, Judy Jin and Tailen Hsing for serving on my doctoral
committee and for helpful discussions on this thesis and other research work.
I am grateful to Dr. Agus Sudjianto, my co-advisor from Bank of America, for
giving me the opportunity to work with him during the summers of 2006 and 2007
and for offering me a full-time position. I appreciate his guidance, active support
and his many illuminating ideas. I would also like to thank Tony Nobili, Mike Bonn,
Ruilong He, Shelly Ennis, Xuejun Zhou, Arun Pinto, and others I first met in 2006
at the Bank. They all persuaded me to jump into the area of credit risk research; I
did so a year later and finally came up with this thesis within two more years.
I would also like to extend my acknowledgement to Prof. Kai-Tai Fang for his
consistent encouragement ever since I graduated from his class in Hong Kong five years ago.
This thesis would not have been possible without the love and remote support of my
parents in China. To them, I am most indebted.
TABLE OF CONTENTS
DEDICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . iii
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
CHAPTER
I. An Introduction to Credit Risk Modeling . . . . . . . . . . . . 1
1.1 Two Worlds of Credit Risk . . . . . . . . . . . . . . . . . . . 2
1.1.1 Credit Spread Puzzle . . . . . . . . . . . . . . . . . 2
1.1.2 Actual Defaults . . . . . . . . . . . . . . . . . . . . 5
1.2 Credit Risk Models . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Structural Approach . . . . . . . . . . . . . . . . . 6
1.2.2 Intensity-based Approach . . . . . . . . . . . . . . . 9
1.3 Survival Models . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.1 Parametrizing Default Intensity . . . . . . . . . . . 12
1.3.2 Incorporating Covariates . . . . . . . . . . . . . . . 13
1.3.3 Correlating Credit Defaults . . . . . . . . . . . . . . 15
1.4 Scope of the Thesis . . . . . . . . . . . . . . . . . . . . . . . 18
II. Adaptive Smoothing Spline . . . . . . . . . . . . . . . . . . . . . 24
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 AdaSS: Adaptive Smoothing Spline . . . . . . . . . . . . . . . 28
2.2.1 Reproducing Kernels . . . . . . . . . . . . . . . . . 30
2.2.2 Local Penalty Adaptation . . . . . . . . . . . . . . . 32
2.3 AdaSS Properties . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 38
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.6 Technical Proofs . . . . . . . . . . . . . . . . . . . . . . . . . 46
III. Vintage Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . 48
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.2 MEV Decomposition Framework . . . . . . . . . . . . . . . . 51
3.3 Gaussian Process Models . . . . . . . . . . . . . . . . . . . . 58
3.3.1 Covariance Kernels . . . . . . . . . . . . . . . . . . 58
3.3.2 Kriging, Spline and Kernel Methods . . . . . . . . . 60
3.3.3 MEV Backﬁtting Algorithm . . . . . . . . . . . . . 65
3.4 Semiparametric Regression . . . . . . . . . . . . . . . . . . . 67
3.4.1 Single Segment . . . . . . . . . . . . . . . . . . . . 68
3.4.2 Multiple Segments . . . . . . . . . . . . . . . . . . . 69
3.5 Applications in Credit Risk Modeling . . . . . . . . . . . . . 73
3.5.1 Simulation Study . . . . . . . . . . . . . . . . . . . 74
3.5.2 Corporate Default Rates . . . . . . . . . . . . . . . 78
3.5.3 Retail Loan Loss Rates . . . . . . . . . . . . . . . . 80
3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.7 Technical Proofs . . . . . . . . . . . . . . . . . . . . . . . . . 86
IV. Dual-time Survival Analysis . . . . . . . . . . . . . . . . . . . . . 88
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.2 Nonparametric Methods . . . . . . . . . . . . . . . . . . . . . 93
4.2.1 Empirical Hazards . . . . . . . . . . . . . . . . . . . 94
4.2.2 DtBreslow Estimator . . . . . . . . . . . . . . . . . 95
4.2.3 MEV Decomposition . . . . . . . . . . . . . . . . . 97
4.3 Structural Models . . . . . . . . . . . . . . . . . . . . . . . . 104
4.3.1 First-passage-time Parameterization . . . . . . . . . 104
4.3.2 Incorporation of Covariate Eﬀects . . . . . . . . . . 109
4.4 Dual-time Cox Regression . . . . . . . . . . . . . . . . . . . . 111
4.4.1 Dual-time Cox Models . . . . . . . . . . . . . . . . 112
4.4.2 Partial Likelihood Estimation . . . . . . . . . . . . 114
4.4.3 Frailty-type Vintage Effects . . . . . . . . . . . . . . 116
4.5 Applications in Retail Credit Risk Modeling . . . . . . . . . . 119
4.5.1 Credit Card Portfolios . . . . . . . . . . . . . . . . 119
4.5.2 Mortgage Competing Risks . . . . . . . . . . . . . . 124
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.7 Supplementary Materials . . . . . . . . . . . . . . . . . . . . 130
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
LIST OF FIGURES
Figure
1.1 Moody's-rated corporate default rates, bond spreads and NBER-dated
recessions. Data sources: a) Moody's Baa & Aaa corporate bond yields
(http://research.stlouisfed.org/fred2/categories/119); b) Moody's Special
Comment on Corporate Default and Recovery Rates, 1920-2008
(http://www.moodys.com/); c) NBER-dated recessions
(http://www.nber.org/cycles/). . . . . . . . . . . . . . . . . . . . . . 4
1.2 Simulated drifted Wiener process, first-passage-time and hazard rate. 8
1.3 Illustration of retail credit portfolios and vintage diagram. . . . . . 19
1.4 Moody's speculative-grade default rates for annual cohorts 1970-2008:
projection views in lifetime, calendar and vintage origination time. . 21
1.5 A road map of thesis developments of statistical methods in credit
risk modeling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.1 Heterogeneous smooth functions: (1) Doppler function simulated with
noise, (2) HeaviSine function simulated with noise, (3) Motorcycle
Accident experimental data, and (4) CreditRisk tweaked sample. . . 26
2.2 OrdSS function estimate and its 2nd-order derivative (upon standardization):
scaled signal from f(t) = sin(ωt) (upper panel) and f(t) = sin(ωt⁴) (lower
panel), ω = 10π, n = 100 and snr = 7. The sin(ωt⁴) signal resembles the
Doppler function in Figure 2.1; both have time-varying frequency. . . . . 34
2.3 OrdSS curve fitting with m = 2 (shown by solid lines). The dashed
lines represent 95% confidence intervals. In the credit-risk case, the
log loss rates are considered as the responses, and the time-dependent
weights are specified proportional to the number of replicates. . . . 39
2.4 Simulation study of Doppler and HeaviSine functions: OrdSS (blue),
AdaSS (red) and the heterogeneous truth (light background). . . . . 40
2.5 Nonstationary OrdSS and AdaSS for MotorcycleAccident Data. . . 42
2.6 OrdSS, nonstationary OrdSS and AdaSS: performance in comparison. 42
2.7 AdaSS estimate of maturation curve for CreditRisk sample: piecewise-constant
ρ⁻¹(t) (upper panel) and piecewise-linear ρ⁻¹(t) (lower panel). 44
3.1 Vintage diagram upon truncation and exempliﬁed prototypes. . . . 50
3.2 Synthetic vintage data analysis: (top) underlying true marginal effects;
(2nd) simulation with noise; (3rd) MEV Backfitting algorithm upon
convergence; (bottom) estimation compared to the underlying truth. . . 76
3.3 Synthetic vintage data analysis: (top) projection views of fitted values;
(bottom) data smoothing on vintage diagram. . . . . . . . . . . . . . 77
3.4 Results of MEV Backﬁtting algorithm upon convergence (right panel)
based on the squared exponential kernels. Shown in the left panel is
the GCV selection of smoothing and structural parameters. . . . . . 79
3.5 MEV ﬁtted values: Moody’srated Corporate Default Rates . . . . . 79
3.6 Vintage data analysis of retail loan loss rates: (top) projection views
of emprical loss rates in lifetime m, calendar t and vintage origination
time v; (bottom) MEV decomposition eﬀects
ˆ
f(m), ˆ g(t) and
ˆ
h(v) (at
log scale). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.1 Dual-time-to-default: (left) Lexis diagram of sub-sampled simulations;
(right) empirical hazard rates in either lifetime or calendar time. 90
4.2 DtBreslow estimator vs one-way nonparametric estimator using the
data of (top) both pre-2005 and post-2005 vintages; (middle) pre-2005
vintages only; (bottom) post-2005 vintages only. . . . . . . . . . . . 98
4.3 MEV modeling of empirical hazard rates, based on the dual-time-to-default
data of (top) both pre-2005 and post-2005 vintages; (middle)
pre-2005 vintages only; (bottom) post-2005 vintages only. . . . . . . 101
4.4 Nonparametric analysis of credit card risk: (top) one-way empirical
hazards; (middle) DtBreslow estimation; (bottom) MEV decomposition. . 121
4.5 MEV decomposition of credit card risk for low, medium and high
FICO buckets: Segment 1 (top panel); Segment 2 (bottom panel). . 123
4.6 Home prices and unemployment rate of the state of California: 2000-2008. 125
4.7 MEV decomposition of mortgage hazards: default vs. prepayment . 125
4.8 Splitting time-varying covariates into small time segments, illustrated. . 133
LIST OF TABLES
Table
1.1 Moody's speculative-grade default rates. Data source: Moody's special
comment (release: February 2009) on corporate default and recovery
rates, 1920-2008 (http://www.moodys.com/) and author's calculations. . . 23
2.1 AdaSS parameters in Doppler and HeaviSine simulation study. . . . . 41
3.1 Synthetic vintage data analysis: MEV modeling exercise with GCV-selected
structural and smoothing parameters. . . . . . . . . . . . . . 77
4.1 Illustrative data format of pooled credit card loans . . . . . . . . . . 120
4.2 Loan-level covariates considered in mortgage credit risk modeling,
where NoteRate could be dynamic and others are static upon
origination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.3 Maximum partial likelihood estimation of mortgage covariate effects
in dual-time Cox regression models. . . . . . . . . . . . . . . . . . . 128
ABSTRACT
This research deals with some statistical modeling problems that are motivated
by credit risk analysis. Credit risk modeling has been the subject of considerable
research interest in finance and has recently drawn the attention of statistical
researchers. In the first chapter, we provide an up-to-date review of credit risk models
and demonstrate their close connection to survival analysis.
The first statistical problem considered is the development of the adaptive smoothing
spline (AdaSS) for estimating heterogeneously smooth functions. Two challenging
issues that arise in this context are the evaluation of the reproducing kernel and the
determination of the local penalty, for which we derive an explicit solution based on a
piecewise type of local adaptation. Our nonparametric AdaSS technique is capable of fitting a
diverse set of 'smooth' functions including possible jumps, and it plays a key role in
subsequent work in the thesis.
The second topic is the development of dual-time analytics for observations
involving both lifetime and calendar time scales. It includes "vintage data analysis"
(VDA) for continuous-type responses in the third chapter, and "dual-time survival
analysis" (DtSA) in the fourth chapter. We propose a maturation-exogenous-vintage
(MEV) decomposition strategy in order to understand the risk determinants in terms
of self-maturation in lifetime, exogenous influence by macroeconomic conditions, and
heterogeneity induced by vintage originations. The intrinsic identification problem
is discussed for both VDA and DtSA. Specifically, we consider VDA under Gaussian
process models, provide an efficient MEV backfitting algorithm and assess its
performance with both simulated and real examples.
DtSA on the Lexis diagram is of particular importance in credit risk modeling, where
default events can be triggered by both endogenous and exogenous hazards.
We consider nonparametric estimators, first-passage-time parameterization and
semiparametric Cox regression. These developments extend the family of models for both
credit risk modeling and survival analysis. We demonstrate the application of DtSA
to credit card and mortgage risk analysis in retail banking, and shed some light on
understanding the ongoing credit crisis.
CHAPTER I
An Introduction to Credit Risk Modeling
Credit risk is a critical area in banking and is of concern to a variety of stakeholders:
institutions, consumers and regulators. It has been the subject of considerable
research interest in the banking and finance communities, and has recently drawn the
attention of statistical researchers. By Wikipedia's definition,
“Credit risk is the risk of loss due to a debtor’s nonpayment of a loan
or other line of credit.” (Wikipedia.org, as of March 2009)
Central to credit risk is the default event, which occurs if the debtor is unable to
meet his or her legal obligation according to the debt contract. Examples of default
events include bond default, corporate bankruptcy, credit card charge-off, and
mortgage foreclosure. Other forms of credit risk include repayment delinquency in
retail loans, loss severity upon the default event, and unexpected changes of
credit rating.
An enormous literature on credit risk has been fostered by both academics in
finance and practitioners in industry. There are two parallel worlds based upon a
simple dichotomous rule of data availability: (a) the direct measurements of credit
performance, and (b) the prices observed in the credit market. The data availability
leads to two streams of credit risk modeling with key distinctions.
1.1 Two Worlds of Credit Risk
The two worlds of credit risk can be simply characterized by the type of default
probability, one being actual and the other being implied. The former corresponds
to direct observations of defaults, also known as the physical default probability
in finance. The latter refers to the risk-neutral default probability implied from
credit market data, e.g. corporate bond yields.
The academic literature on corporate credit risk has leaned toward the study of
implied defaults, which remains a puzzling world.
1.1.1 Credit Spread Puzzle
The credit spread of a defaultable corporate bond is its excess yield over the
default-free Treasury bond of the same time to maturity. Consider a zero-coupon
corporate bond with unit face value and maturity date T. The yield-to-maturity at
t < T is defined by
\[
Y(t, T) = -\frac{\log B(t, T)}{T - t}, \tag{1.1}
\]
where B(t, T) is the bond price of the form
\[
B(t, T) = \mathbb{E}\bigg[\exp\bigg(-\int_t^T \big(r(s) + \lambda(s)\big)\,ds\bigg)\bigg], \tag{1.2}
\]
given the independent term structures of the interest rate r(·) and the default rate
λ(·). Setting λ(t) ≡ 0 gives the benchmark price B_0(t, T) and the yield Y_0(t, T) of the
Treasury bond. Then, the credit spread can be calculated as
\[
\mathrm{Spread}(t, T) = Y(t, T) - Y_0(t, T) = -\frac{\log\big(B(t, T)/B_0(t, T)\big)}{T - t} = -\frac{\log\big(1 - q(t, T)\big)}{T - t}, \tag{1.3}
\]
where q(t, T) = 1 − E[exp(−∫_t^T λ(s)ds)] is the conditional default probability P[τ ≤ T | τ > t], and
τ denotes the time-to-default, to be detailed in the next section.
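To make (1.2)-(1.3) concrete: with a constant interest rate r and a constant default intensity λ, the bond price reduces to B(t, T) = e^{−(r+λ)(T−t)}, so the spread recovers λ exactly. A minimal numerical sketch (the parameter values below are illustrative only):

```python
import math

def bond_price(r, lam, ttm):
    """Zero-coupon bond price (1.2) with constant short rate r and intensity lam."""
    return math.exp(-(r + lam) * ttm)

def credit_spread(r, lam, ttm):
    """Credit spread (1.3): excess yield over the lam = 0 Treasury benchmark."""
    b, b0 = bond_price(r, lam, ttm), bond_price(r, 0.0, ttm)
    return -math.log(b / b0) / ttm

# with constant intensity the spread equals lam
print(credit_spread(0.05, 0.02, 10.0))  # ≈ 0.02
```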
The credit spread is supposed to co-move with the default rate. For illustration,
Figure 1.1 (top panel) plots the Moody's-rated corporate default rates and Baa-Aaa
bond spreads over 1920-2008. The shaded backgrounds correspond to
NBER's latest announcement of recession dates. Most spikes of the default rates and
the credit spreads coincide with the recessions, but it is clear that the movements of
the two time series differ in both level and change. Such a lack of match is the so-called
credit spread puzzle in the recent literature of corporate finance: none of
the existing structural credit risk models can successfully imply the actual default
rates from the market data of credit spreads.
The default rates implied from credit spreads mostly overestimate the expected
default rates. One factor that helps explain the gap is liquidity risk: a security
cannot be traded quickly enough in the market to prevent a loss. Figure 1.1
(bottom panel) plots, in finer resolution, the Moody's speculative-grade default rates
versus the high-yield credit spreads over 1994-2008. It illustrates how
the spread changed in response to liquidity-risk events, while the default rates did not
react significantly until quarters later; see the liquidity crises triggered in 1998
(Russian/LTCM shock) and 2007 (subprime mortgage meltdown). Besides liquidity
risk, there exist other known factors (e.g. the tax treatment of corporate bonds vs. government
bonds) and unknown factors that leave the implied world of credit risk incomplete.
As of today, we lack a thorough understanding of the credit spread puzzle; see e.g.
Chen, Lesmond and Wei (2007) and references therein.
The shaky foundation of default risk implied from market credit spreads, without
looking at historical defaults, leads to further questions about credit
derivatives, e.g. the Credit Default Swap (CDS) and the Collateralized Debt Obligation
(CDO). The 2007-08 collapse of the credit market on Wall Street is partly due to
over-complication in "innovating" such complex financial instruments on the one hand,
and over-simplification in quantifying their embedded risks on the other.
Figure 1.1: Moody's-rated corporate default rates, bond spreads and NBER-dated
recessions. Data sources: a) Moody's Baa & Aaa corporate
bond yields (http://research.stlouisfed.org/fred2/categories/119);
b) Moody's Special Comment on Corporate Default and Recovery
Rates, 1920-2008 (http://www.moodys.com/); c) NBER-dated recessions
(http://www.nber.org/cycles/).
1.1.2 Actual Defaults
The other world of credit risk is the study of default probability built bottom-up from
actual credit performance. It includes the popular industry practices of
a) credit rating in corporate finance, by e.g. the three major U.S. rating agencies:
Moody's, Standard & Poor's, and Fitch;
b) credit scoring in consumer lending, by e.g. the three major U.S. credit bureaus:
Equifax, Experian and TransUnion.
Both credit ratings and scores represent the creditworthiness of individual corporations
and consumers. The final evaluations are based on statistical models of the
expected default probability, as well as judgment by rating/scoring specialists. Let
us briefly describe some rating and scoring basics related to the thesis.
The letters Aaa and Baa in Figure 1.1 are examples of Moody's rating system,
which uses Aaa, Aa, A, Baa, Ba, B, Caa, Ca, C to represent the likelihood of default
from lowest to highest. The speculative grade in Figure 1.1 refers to Ba and
worse ratings. Speculative-grade corporate bonds are sometimes said to be
high-yield or junk.
FICO, developed by Fair Isaac Corporation, is the best-known consumer credit
score and the one most widely used by U.S. banks and other credit card or mortgage
lenders. It ranges from 300 (very poor) to 850 (best), and is intended to represent the
creditworthiness of a borrower, i.e. the likelihood that he or she will repay the debt. For the same
borrower, the three major U.S. credit bureaus often report inconsistent FICO scores
based on their own proprietary models.
Compared to either the industry practices mentioned above or the academic study of
implied default probability, the academic literature based on actual defaults
is much smaller, which we believe is largely due to academic researchers' limited
access to proprietary internal data of historical defaults. A few academic
works will be reviewed later in Section 1.3. In this thesis, we make an attempt
to develop statistical methods based on actual credit performance data. For
demonstration, we shall use synthetic or tweaked samples of retail credit portfolios,
as well as the public release of corporate default rates by Moody's.
1.2 Credit Risk Models
This section reviews the finance literature on credit risk models, including both
structural and intensity-based approaches. Our focus is on the probability of
default and the hazard rate of time-to-default.
1.2.1 Structural Approach
In credit risk modeling, the structural approach is also known as the firm-value
approach, since a firm's inability to meet its contractual debt is assumed to be
determined by its asset value. It was inspired by the 1970s Black-Scholes-Merton
methodology for financial option pricing. Two classic structural models are the Merton model
(Merton, 1974) and the first-passage-time model (Black and Cox, 1976).
The Merton model assumes that the default event occurs at the maturity date
of debt if the asset value is less than the debt level. Let D be the debt level with
maturity date T, and let V(t) be the latent asset value following a geometric Brownian
motion
\[
dV(t) = \mu V(t)\,dt + \sigma V(t)\,dW(t), \tag{1.4}
\]
with drift µ, volatility σ and the standard Wiener process W(t). Recall that E[W(t)] =
0 and E[W(t)W(s)] = min(t, s). Given the initial asset value V(0) > D, by Itô's lemma,
\[
\frac{V(t)}{V(0)} = \exp\Big\{\Big(\mu - \frac{1}{2}\sigma^2\Big)t + \sigma W(t)\Big\}
\sim \mathrm{Lognormal}\Big(\Big(\mu - \frac{1}{2}\sigma^2\Big)t,\ \sigma^2 t\Big), \tag{1.5}
\]
from which one may evaluate the default probability P(V(T) ≤ D).
The notion of distance-to-default facilitates the computation of the conditional default
probability. Given the sample path of asset values up to t, one may first estimate the
unknown parameters in (1.4) by the maximum likelihood method. Following Duffie
and Singleton (2003), let us define the distance-to-default X(t) as the number of
standard deviations by which log V(t) exceeds log D, i.e.
\[
X(t) = \big(\log V(t) - \log D\big)/\sigma. \tag{1.6}
\]
Clearly, X(t) is a drifted Wiener process of the form
\[
X(t) = c + bt + W(t), \qquad t \ge 0, \tag{1.7}
\]
with b = (µ − σ²/2)/σ and c = (log V(0) − log D)/σ. Then it is easy to verify that the conditional
probability of default at the maturity date T is
\[
P\big(V(T) \le D \,\big|\, V(t) > D\big) = P\big(X(T) \le 0 \,\big|\, X(t) > 0\big)
= \Phi\bigg(-\frac{X(t) + b(T - t)}{\sqrt{T - t}}\bigg), \tag{1.8}
\]
where Φ(·) is the cumulative normal distribution function.
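The conditional default probability (1.8) is easy to check numerically: simulate the terminal value X(T) = X(t) + b(T−t) + N(0, T−t) and compare the empirical default frequency with the closed form. A small sketch (the parameter values are illustrative):

```python
import math, random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cond_default_prob(x_t, b, horizon):
    """P(X(T) <= 0 | X(t) = x_t) for the drifted Wiener process (1.7), horizon = T - t."""
    return norm_cdf(-(x_t + b * horizon) / math.sqrt(horizon))

random.seed(1)
x_t, b, h, n = 2.0, -0.02, 4.0, 200_000
p_exact = cond_default_prob(x_t, b, h)
# Monte Carlo: X(T) = X(t) + b*h + N(0, h)
p_mc = sum(x_t + b * h + random.gauss(0.0, math.sqrt(h)) <= 0.0 for _ in range(n)) / n
print(p_exact, p_mc)  # the two agree to Monte Carlo accuracy
```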
The first-passage-time model by Black and Cox (1976) extends the Merton model
so that the default event can occur as soon as the asset value reaches a pre-specified
debt barrier. Figure 1.2 (left panel) illustrates the first-passage-time of a drifted
Wiener process by simulation, where we set the parameters b = −0.02 and c = 8 (s.t.
a constant debt barrier). The key difference between the Merton model and the Black-Cox model
lies in the green path (the middle realization viewed at T), which is treated as a default
in one model but not the other.
By (1.7) and (1.8), V(t) hits the debt barrier once the distance-to-default process
X(t) hits zero. Given the initial distance-to-default c ≡ X(0) > 0, consider the
Figure 1.2: Simulated drifted Wiener process, first-passage-time and hazard rate.
first-passage-time
\[
\tau = \inf\{t \ge 0 : X(t) \le 0\}, \tag{1.9}
\]
where inf ∅ = ∞ as usual. It is well known that τ follows the inverse Gaussian
distribution (Schrödinger, 1915; Tweedie, 1957; Chhikara and Folks, 1989) with the
density
\[
f(t) = \frac{c}{\sqrt{2\pi}}\, t^{-3/2} \exp\Big\{-\frac{(c + bt)^2}{2t}\Big\}, \qquad t \ge 0. \tag{1.10}
\]
See also other types of parametrization in Marshall and Olkin (2007; §13). The
survival function S(t) is defined by P(τ > t) for any t ≥ 0 and is given by
\[
S(t) = \Phi\Big(\frac{c + bt}{\sqrt{t}}\Big) - e^{-2bc}\, \Phi\Big(\frac{-c + bt}{\sqrt{t}}\Big). \tag{1.11}
\]
The hazard rate, or the conditional default rate, is defined as the instantaneous
rate of default conditional on survivorship,
\[
\lambda(t) = \lim_{\Delta t \downarrow 0} \frac{1}{\Delta t}\,
P\big(t \le \tau < t + \Delta t \,\big|\, \tau \ge t\big) = \frac{f(t)}{S(t)}. \tag{1.12}
\]
Using the inverse Gaussian density and survival functions, we obtain the form of the
first-passage-time hazard rate:
\[
\lambda(t; c, b) =
\frac{\dfrac{c}{\sqrt{2\pi t^{3}}} \exp\Big\{-\dfrac{(c + bt)^{2}}{2t}\Big\}}
{\Phi\Big(\dfrac{c + bt}{\sqrt{t}}\Big) - e^{-2bc}\, \Phi\Big(\dfrac{-c + bt}{\sqrt{t}}\Big)},
\qquad c > 0. \tag{1.13}
\]
This is one of the most important forms of hazard function in the structural approach
to credit risk modeling. Figure 1.2 (right panel) plots λ(t; c, b) for b = −0.02 and
c = 4, 6, 8, 10, 12, which resemble the default rates from low to high credit qualities
in terms of credit ratings or FICO scores. Both the trend parameter b and the initial
distance-to-default parameter c provide insights into understanding the shape of the
hazard rate; see the details in Chapter IV or Aalen, Borgan and Gjessing (2008; §10).
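The density (1.10), survival function (1.11) and hazard (1.13) can be coded directly; a useful sanity check is that the density agrees with the negative derivative of the survival function. A sketch using only the standard library (the values c = 8, b = −0.02 follow the illustration above):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ig_density(t, c, b):
    """First-passage-time density (1.10)."""
    return c / math.sqrt(2.0 * math.pi) * t ** -1.5 * math.exp(-(c + b * t) ** 2 / (2.0 * t))

def ig_survival(t, c, b):
    """Survival function (1.11)."""
    return norm_cdf((c + b * t) / math.sqrt(t)) \
         - math.exp(-2.0 * b * c) * norm_cdf((-c + b * t) / math.sqrt(t))

def fpt_hazard(t, c, b):
    """First-passage-time hazard rate (1.13) = f(t) / S(t)."""
    return ig_density(t, c, b) / ig_survival(t, c, b)

# sanity check: f(t) should equal -dS(t)/dt
t, c, b, eps = 5.0, 8.0, -0.02, 1e-5
num_deriv = -(ig_survival(t + eps, c, b) - ig_survival(t - eps, c, b)) / (2.0 * eps)
print(num_deriv, ig_density(t, c, b))  # the two agree
```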
Modern developments of structural models based on the Merton and Black-Cox
models are reviewed in Bielecki and Rutkowski (2004; §3). Later in Chapter IV, we
will discuss the dual-time extension of the first-passage-time parameterization with both
endogenous and exogenous hazards, as well as a non-constant default barrier and
incomplete information about structural parameters.
1.2.2 Intensity-based Approach
The intensity-based approach is also called the reduced-form approach, proposed
independently by Jarrow and Turnbull (1995) and Madan and Unal (1998). Many
follow-up papers can be found in Lando (2004), Bielecki and Rutkowski (2004) and
references therein. Unlike the structural approach, which assumes the default to be
completely determined by the asset value subject to a barrier, the default event in the
reduced-form approach is governed by an externally specified intensity process that
may or may not be related to the asset value. The default is treated as an unexpected
event that arrives "by surprise". This is a practically appealing feature, since in the
real world a default (e.g. the 2001 bankruptcy of Enron Corporation) often
happens all of a sudden, without announcement.
The default intensity corresponds to the hazard rate λ(t) = f(t)/S(t) defined in
(1.12) and has roots in statistical reliability and survival analysis of time-to-failure data.
When S(t) is absolutely continuous with f(t) = d(1 − S(t))/dt, we have that
\[
\lambda(t) = -\frac{dS(t)}{S(t)\,dt} = -\frac{d\,[\log S(t)]}{dt},
\qquad S(t) = \exp\Big\{-\int_0^t \lambda(s)\,ds\Big\}, \quad t \ge 0.
\]
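The identity above can be checked numerically, e.g. for the Weibull hazard λ(t) = αt^{β−1}, whose cumulative hazard is αt^β/β; a small sketch with arbitrary illustrative parameters:

```python
import math

def weibull_hazard(t, alpha, beta):
    return alpha * t ** (beta - 1.0)

def weibull_survival(t, alpha, beta):
    # S(t) = exp(-int_0^t lambda(s) ds), with cumulative hazard alpha * t**beta / beta
    return math.exp(-alpha * t ** beta / beta)

# check lambda(t) = -d log S(t)/dt via a central finite difference
t, alpha, beta, eps = 2.0, 0.3, 1.5, 1e-6
num = -(math.log(weibull_survival(t + eps, alpha, beta))
        - math.log(weibull_survival(t - eps, alpha, beta))) / (2.0 * eps)
print(num, weibull_hazard(t, alpha, beta))  # the two agree
```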
In survival analysis, λ(t) is usually assumed to be a deterministic function of time.
In credit risk modeling, λ(t) is often treated as stochastic; the default time
τ is then doubly stochastic. Note that Lando (1998) adopted the term "doubly stochastic
Poisson process" (or Cox process), which refers to a counting process with possibly
recurrent events. What matters in modeling defaults is only the first jump of the
counting process, in which case the default intensity is equivalent to the hazard rate.
In finance, the intensity-based models are mostly term-structure models borrowed
from the literature on interest-rate modeling. Below is an incomplete list:
\[
\begin{aligned}
\text{Vasicek:}\quad & d\lambda(t) = \kappa(\theta - \lambda(t))\,dt + \sigma\,dW(t)\\
\text{Cox-Ingersoll-Ross:}\quad & d\lambda(t) = \kappa(\theta - \lambda(t))\,dt + \sigma\sqrt{\lambda(t)}\,dW(t)\\
\text{Affine jump:}\quad & d\lambda(t) = \mu(\lambda(t))\,dt + \sigma(\lambda(t))\,dW(t) + dJ(t)
\end{aligned}
\tag{1.14}
\]
with reference to Vasicek (1977), Cox, Ingersoll and Ross (1985) and Duffie, Pan and
Singleton (2000). The last model involves a pure-jump process J(t), and it covers both
the mean-reverting Vasicek and CIR models by setting µ(λ(t)) = κ(θ − λ(t)) and
σ(λ(t)) = \(\sqrt{\sigma_0^2 + \sigma_1^2 \lambda(t)}\).
The term-structure models provide straightforward ways to simulate the future
default intensity for the purpose of predicting the conditional default probability.
However, they are ad hoc models lacking a fundamental interpretation of the default
event. The choices in (1.14) are popular because they yield closed-form pricing
formulas for (1.2), while real-world default intensities deserve more flexible and
meaningful forms. For instance, the intensity models in (1.14) cannot be used to
model the endogenous shapes of first-passage-time hazards illustrated in Figure 1.2.
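As a sketch of the simulation point just made, the following estimates the survival probability E[exp(−∫₀ᵀ λ(s)ds)] under the CIR intensity in (1.14) with a plain Euler scheme; the parameter values (κ, θ, σ, λ(0)) are illustrative, not calibrated to any market:

```python
import math, random

def cir_survival_mc(lam0, kappa, theta, sigma, horizon,
                    n_steps=100, n_paths=4000, seed=7):
    """Monte Carlo estimate of E[exp(-int_0^T lambda(s) ds)] under the CIR model in (1.14)."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    total = 0.0
    for _ in range(n_paths):
        lam, integral = lam0, 0.0
        for _ in range(n_steps):
            integral += lam * dt
            # Euler step; reflect at zero to keep the intensity nonnegative
            lam += kappa * (theta - lam) * dt \
                 + sigma * math.sqrt(max(lam, 0.0) * dt) * rng.gauss(0.0, 1.0)
            lam = abs(lam)
        total += math.exp(-integral)
    return total / n_paths

p = cir_survival_mc(lam0=0.02, kappa=0.5, theta=0.02, sigma=0.05, horizon=5.0)
print(p)  # close to exp(-0.02 * 5) ≈ 0.905
```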
The dependence of the default intensity on state variables z(t) (e.g. macroeconomic
covariates) is usually treated through a multivariate term-structure model for the
joint process (λ(t), z(t)). This approach essentially presumes a linear dependence in the
diffusion components, e.g. via correlated Wiener processes. In practice, the effect of
a state variable on the default intensity can be nonlinear in many other ways.
The intensity-based approach also includes duration models for econometric
analysis of actual historical defaults. These correspond to classical survival analysis
in statistics, which opens another door for approaching credit risk.
1.3 Survival Models
Credit risk modeling in finance is closely related to survival analysis in statistics,
including both the first-passage-time structural models and the duration type
of intensity-based models. In a rough sense, default risk models based on actual
credit performance data belong exclusively to survival analysis, since the latter by
definition is the analysis of time-to-failure data. Here, failure refers to the default
event in either corporate or retail risk exposures; see e.g. Duffie, Saita and Wang
(2007) and Deng, Quigley and Van Order (2000).
We find that survival models have at least four advantages in approaching
credit risk modeling:
1. flexibility in parametrizing the default intensity,
2. flexibility in incorporating various types of covariates,
3. effectiveness in modeling credit portfolios, and
4. being straightforward to enrich and extend the family of credit risk models.
The following material is organized to address these points one by one.
1.3.1 Parametrizing Default Intensity
In survival analysis, there is a rich set of lifetime distributions for parametrizing
the default intensity (i.e. the hazard rate) λ(t), including
\[
\begin{aligned}
\text{Exponential:}\quad & \lambda(t) = \alpha\\
\text{Weibull:}\quad & \lambda(t) = \alpha t^{\beta - 1}\\
\text{Lognormal:}\quad & \lambda(t) = \frac{1}{t\alpha}\cdot\frac{\phi(\log t/\alpha)}{\Phi(-\log t/\alpha)}\\
\text{Log-logistic:}\quad & \lambda(t) = \frac{(\beta/\alpha)(t/\alpha)^{\beta - 1}}{1 + (t/\alpha)^{\beta}},
\qquad \alpha, \beta > 0
\end{aligned}
\tag{1.15}
\]
where φ(x) = Φ′(x) is the density function of the normal distribution. Another example
is the inverse Gaussian type of hazard rate function in (1.13). More examples of
parametric models can be found in Lawless (2003) and Marshall and Olkin (2007),
among many other texts. They all correspond to distributional assumptions on the
default time τ, while the inverse Gaussian distribution has a beautiful first-passage-time
interpretation.
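The hazards in (1.15) are straightforward to code; note, for instance, that the Weibull hazard with β = 1 collapses to the constant exponential hazard. A minimal sketch:

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def hazard_exponential(t, alpha):
    return alpha

def hazard_weibull(t, alpha, beta):
    return alpha * t ** (beta - 1.0)

def hazard_lognormal(t, alpha):
    return norm_pdf(math.log(t) / alpha) / (t * alpha * norm_cdf(-math.log(t) / alpha))

def hazard_loglogistic(t, alpha, beta):
    return (beta / alpha) * (t / alpha) ** (beta - 1.0) / (1.0 + (t / alpha) ** beta)

# beta = 1 collapses the Weibull hazard to the constant exponential hazard
print(hazard_weibull(3.7, 0.1, 1.0), hazard_exponential(3.7, 0.1))
```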
The log-location-scale transform can be used to model the hazard rates. Given a
latent default time τ_0 with baseline hazard rate λ_0(t) and survival function
S_0(t) = exp{−∫_0^t λ_0(s)ds}, one may model the firm-specific default time τ_i by
\[
\log \tau_i = \mu_i + \sigma_i \log \tau_0, \qquad -\infty < \mu_i < \infty,\ \sigma_i > 0, \tag{1.16}
\]
with individual risk parameters (µ_i, σ_i). Then, the firm-specific survival function
takes the form
\[
S_i(t) = S_0\bigg(\Big(\frac{t}{e^{\mu_i}}\Big)^{1/\sigma_i}\bigg) \tag{1.17}
\]
and the firm-specific hazard rate is
\[
\lambda_i(t) = -\frac{d \log S_i(t)}{dt}
= \frac{1}{\sigma_i t}\Big(\frac{t}{e^{\mu_i}}\Big)^{1/\sigma_i}
\lambda_0\bigg(\Big(\frac{t}{e^{\mu_i}}\Big)^{1/\sigma_i}\bigg). \tag{1.18}
\]
For example, given the Weibull baseline hazard rate λ_0(t) = αt^{β−1}, the log-location-scale
transformed hazard rate is given by
\[
\lambda_i(t) = \frac{\alpha}{\sigma_i}\, t^{\beta/\sigma_i - 1}
\exp\Big\{-\frac{\beta \mu_i}{\sigma_i}\Big\}. \tag{1.19}
\]
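The general formula (1.18) and the Weibull closed form (1.19) can be cross-checked numerically; a sketch with arbitrary illustrative parameters:

```python
import math

def lls_hazard(t, mu, sig, baseline):
    """Firm-specific hazard (1.18) from a baseline hazard via log-location-scale transform."""
    u = (t / math.exp(mu)) ** (1.0 / sig)
    return u / (sig * t) * baseline(u)

def weibull_lls_hazard(t, mu, sig, alpha, beta):
    """Closed form (1.19) for a Weibull baseline lambda_0(t) = alpha * t**(beta - 1)."""
    return (alpha / sig) * t ** (beta / sig - 1.0) * math.exp(-beta * mu / sig)

t, mu, sig, alpha, beta = 2.5, 0.3, 1.4, 0.2, 1.8
h_general = lls_hazard(t, mu, sig, lambda u: alpha * u ** (beta - 1.0))
h_closed = weibull_lls_hazard(t, mu, sig, alpha, beta)
print(h_general, h_closed)  # the two agree
```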
1.3.2 Incorporating Covariates
Consider firm-specific (or consumer-specific) covariates z_i ∈ R^p that are either
static or time-varying. The dependence of the default intensity on z_i can be studied by
regression models. In the context of survival analysis, there are two popular classes
of regression models, namely
1. the multiplicative hazards regression models, and
2. the accelerated failure time (AFT) regression models.
The hazard rate is taken as the modeling basis in the first class of regression models:
\[
\lambda_i(t) = \lambda_0(t)\, r(z_i(t)), \tag{1.20}
\]
where λ
0
(t) is a baseline hazard function and r(z
i
(t)) is a ﬁrmspeciﬁc relativerisk
multiplier (both must be positive). For example, one may use the inverse Gaussian
hazard rate (1.13) or pick one from (1.15) as the baseline λ
0
(t). The relative risk term
is often speciﬁed by r(z
i
(t)) = exp¦θ
T
z
i
(t)¦ with parameter θ ∈ R
p
. Then, the hazard
ratio between any two ﬁrms, λ
i
(t)/λ
j
(t) = exp¦θ
T
([z
i
(t) − z
j
(t)]¦, is constant if the
covariates diﬀerence z
i
(t)−z
j
(t) is constant in time, thus ending up with proportional
hazard rates. Note that the covariates z
i
(t) are sometimes transformed before entering
13
the model. For example, an econometric modeler usually takes diﬀerence of Gross
Domestic Product (GDP) index and uses the GDP growth as the entering covariate.
For a survival analyst, the introduction of the multiplicative hazards model (1.20) is incomplete without mentioning Cox's proportional hazards (CoxPH) model, which is among the most popular models in modern survival analysis. Rather than using a parametric form of the baseline hazard, Cox (1972, 1975) considered λ_i(t) = λ_0(t) exp{θ^T z_i(t)} with the baseline hazard λ_0(t) left unspecified. Such semiparametric modeling can reduce the estimation bias of the covariate effects due to baseline misspecification. One may use the partial likelihood to estimate θ, and the nonparametric likelihood to estimate λ_0(t). The estimated baseline λ̂_0(t) can be further smoothed using parametric, term-structure or data-driven techniques.
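To make the partial likelihood concrete, the following is a minimal pure-Python sketch (an illustration with made-up toy data, not the thesis's estimation code) for a single covariate without tied event times; a real fit would use Newton's method rather than a grid search:

```python
import math

def cox_log_partial_likelihood(theta, times, events, z):
    """Log partial likelihood for Cox's model with one covariate and no ties:
    sum over events i of [theta*z_i - log(sum over risk set {j: t_j >= t_i} of exp(theta*z_j))]."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:  # contributions arise only at observed event times
            risk_set = [math.exp(theta * z_j)
                        for t_j, z_j in zip(times, z) if t_j >= t_i]
            ll += theta * z[i] - math.log(sum(risk_set))
    return ll

# Toy data: obligors with larger z default earlier.
times  = [2.0, 3.0, 5.0, 7.0, 9.0]    # observed times
events = [1, 1, 0, 1, 1]              # 1 = default observed, 0 = censored
z      = [1.2, 0.8, 0.1, -0.4, -1.0]  # covariate values

# Crude grid search for the maximizing theta.
grid = [0.1 * k for k in range(-30, 31)]
theta_hat = max(grid, key=lambda th: cox_log_partial_likelihood(th, times, events, z))
```

With this toy ordering (larger covariate, earlier default) the maximizing θ is positive, consistent with a risk-increasing covariate.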
The second class of AFT regression models is based on the aforementioned log-location-scale transform of the default time. Suppose that z_i is not time-varying. Given τ_0 with baseline hazard rate λ_0(t), consider (1.16) with the location μ_i = θ^T z_i and the scale σ_i = 1 (for simplicity), i.e., log τ_i = θ^T z_i + log τ_0. Then, it is straightforward to get the survival function and hazard rate by (1.17) and (1.18):

  S_i(t) = S_0( t exp{−θ^T z_i} ),   λ_i(t) = λ_0( t exp{−θ^T z_i} ) exp{−θ^T z_i}        (1.21)

An interesting phenomenon arises when using the Weibull baseline: by (1.19), we have λ_i(t) = λ_0(t) exp{−βθ^T z_i}. This is equivalent to using the multiplicative hazards model (1.20), a unique property of the Weibull lifetime distribution. For more details about AFT regression, see e.g. Lawless (2003; §6).
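As a numerical illustration (a standalone sketch with arbitrary parameter values) of the Weibull equivalence between the AFT form (1.21) and the multiplicative hazards form (1.20):

```python
import math

def weibull_hazard(t, alpha, beta):
    """Weibull baseline lambda_0(t) = alpha * t^(beta-1)."""
    return alpha * t ** (beta - 1.0)

def aft_hazard(t, eta, alpha, beta):
    """AFT form in (1.21) with linear predictor eta = theta^T z_i and sigma_i = 1."""
    u = t * math.exp(-eta)
    return weibull_hazard(u, alpha, beta) * math.exp(-eta)

def ph_hazard(t, eta, alpha, beta):
    """Equivalent multiplicative-hazards form lambda_0(t) * exp(-beta * eta)."""
    return weibull_hazard(t, alpha, beta) * math.exp(-beta * eta)

# The two forms agree exactly for the Weibull baseline.
for t in (0.5, 1.0, 4.0):
    assert abs(aft_hazard(t, 0.7, 2.0, 1.5) - ph_hazard(t, 0.7, 2.0, 1.5)) < 1e-12
```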
Thus, the covariates z_i can induce firm-specific heterogeneity via either a relative-risk multiplier λ_i(t)/λ_0(t) = e^{θ^T z_i(t)} or a default time acceleration τ_i/τ_0 = e^{θ^T z_i}. In situations where certain hidden covariates are not accessible, the unexplained variation also leads to heterogeneity. The frailty models introduced by Vaupel, et al. (1979) are designed to model such unobserved heterogeneity as a random quantity, say, Z ∼ Gamma(δ^{−1}, δ) with shape δ^{−1} and scale δ > 0, where a Gamma(k, δ) distribution has the density

  p(z; k, δ) = z^{k−1} exp{−z/δ} / (Γ(k) δ^k),   z > 0        (1.22)

with mean kδ and variance kδ^2. Then, the proportional frailty model extends (1.20) to

  λ_i(t) = Z λ_0(t) r(z_i(t)),        (1.23)

for each single obligor i subject to the random effect Z; see Aalen, et al. (2008; §6). On the other hand, frailty models can also be used to characterize the default correlation among multiple obligors, which is the next topic.
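The following sketch (our own illustration with toy parameters; an exponential baseline is assumed purely for simplicity) simulates the proportional frailty model (1.23) and shows by Monte Carlo that a shared Gamma frailty induces positive correlation between obligors' default times:

```python
import random

random.seed(42)
delta = 0.1                 # frailty variance parameter; Z ~ Gamma(1/delta, delta) has mean 1
lam0 = 0.05                 # constant baseline hazard (exponential baseline lifetimes)

def portfolio_default_times(n_obligors):
    """One portfolio draw under (1.23): a single shared frailty Z scales every hazard."""
    Z = random.gammavariate(1.0 / delta, delta)
    # Given Z, default times are exponential with rate Z * lam0.
    return [random.expovariate(Z * lam0) for _ in range(n_obligors)]

# Monte Carlo correlation between two obligors sharing the same frailty.
n_sims = 20000
pairs = [portfolio_default_times(2) for _ in range(n_sims)]
xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
mx, my = sum(xs) / n_sims, sum(ys) / n_sims
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n_sims
var_x = sum((x - mx) ** 2 for x in xs) / n_sims
var_y = sum((y - my) ** 2 for y in ys) / n_sims
corr = cov / (var_x * var_y) ** 0.5   # positive: the shared Z correlates defaults
```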
1.3.3 Correlating Credit Defaults
The issue of default correlation embedded in credit portfolios has drawn intense discussion in the recent credit risk literature, in particular along with today's credit crisis involving problematic basket CDS and CDO financial instruments; see e.g. Bluhm and Overbeck (2007). In the eyes of a fund manager, positive correlation of defaults increases the level of total volatility given the same level of total expectation (cf. Markowitz portfolio theory). In academia, among other works, Das, et al. (2007) performed an empirical analysis of default times for U.S. corporations and provided evidence for the importance of default correlation.
The default correlation can be effectively characterized by multivariate survival analysis. In a broad sense, there exist two different approaches:
1. correlating the default intensities through common covariates, and
2. correlating the default times through copulas.
The first approach is better known as the conditionally independent intensity-based approach, in the sense that the default times are independent conditional on the common covariates. Examples of common covariates include market-wide variables, e.g. the GDP growth, the short-term interest rate, and the house-price appreciation index. Here, the default correlation induced by the common covariates is in contrast to the aforementioned default heterogeneity induced by the firm-specific covariates.
As announced earlier, the frailty models (1.23) can also be used to capture the dependence of multi-obligor default intensities in the case of hidden common covariates. They are usually rephrased as shared frailty models, to differentiate them from the single-obligor modeling purpose. The frailty effect Z is usually assumed to be time-invariant. One may refer to Hougaard (2000) for a comprehensive treatment of multivariate survival analysis from a frailty point of view. See also Aalen et al. (2008; §11) for some intriguing discussion of dynamic frailty modeled by diffusion and Lévy processes. In finance, Duffie, et al. (2008) applied the shared frailty effect across firms, as well as a dynamic frailty effect to represent the unobservable macroeconomic covariates.
The copula approach to default correlation considers the default times as the modeling basis. A copula C : [0,1]^n → [0,1] is a function used to formulate the multivariate joint distribution based on the marginal distributions, e.g.,

  Gaussian:     C_Σ(u_1, …, u_n) = Φ_Σ(Φ^{−1}(u_1), …, Φ^{−1}(u_n))
  Student-t:    C_{Σ,ν}(u_1, …, u_n) = Θ_{Σ,ν}(Θ_ν^{−1}(u_1), …, Θ_ν^{−1}(u_n))        (1.24)
  Archimedean:  C_Ψ(u_1, …, u_n) = Ψ^{−1}( Σ_{i=1}^n Ψ(u_i) )

where Σ ∈ R^{n×n}, Φ_Σ denotes the multivariate normal distribution, Θ_{Σ,ν} denotes the multivariate Student-t distribution with ν degrees of freedom, and Ψ is the generator of the Archimedean copulas; see Nelsen (2006) for details. When Ψ(u) = −log u, the Archimedean copula reduces to Π_{i=1}^n u_i (the independence copula). By Sklar's theorem,
for a multivariate joint distribution, there always exists a copula that links the joint distribution to its univariate marginals. Therefore, the joint survival distribution of (τ_1, …, τ_n) can be characterized by S_joint(t_1, …, t_n) = C(S_1(t_1), …, S_n(t_n)), upon an appropriate selection of the copula C.
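The Archimedean family in (1.24) is easy to experiment with directly; the sketch below (standalone, with the Clayton generator chosen purely as an example) verifies that Ψ(u) = −log u gives back the independence copula:

```python
import math

def archimedean_copula(us, psi, psi_inv):
    """C_Psi(u_1, ..., u_n) = Psi^{-1}( sum_i Psi(u_i) ), as in (1.24)."""
    return psi_inv(sum(psi(u) for u in us))

def independence(us):
    # Generator Psi(u) = -log u, with inverse Psi^{-1}(x) = exp(-x).
    return archimedean_copula(us, lambda u: -math.log(u), lambda x: math.exp(-x))

def clayton(us, d=2.0):
    # Clayton generator Psi(u) = (u^{-d} - 1)/d, a positive-dependence example.
    return archimedean_copula(us, lambda u: (u ** -d - 1.0) / d,
                              lambda x: (1.0 + d * x) ** (-1.0 / d))

us = [0.3, 0.6, 0.9]
product = 0.3 * 0.6 * 0.9
assert abs(independence(us) - product) < 1e-12   # reduces to the product copula
assert clayton(us) >= product                    # Clayton lies above independence
```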
Interestingly enough, the shared frailty models correspond to the Archimedean copulas, as observed by Hougaard (2000; §13) and recast by Aalen, et al. (2008; §7). For each τ_i with underlying hazard rate Zλ_i(t), assume λ_i(t) to be deterministic and Z to be random. Then, the marginal survival distributions are found by integrating over the distribution of Z,

  S_i(t) = E_Z[ exp{ −Z ∫_0^t λ_i(s) ds } ] = L(Λ_i(t)),   i = 1, …, n        (1.25)

where L(x) = E_Z[exp{−xZ}] is the Laplace transform of Z and Λ_i(t) = ∫_0^t λ_i(s) ds is the cumulative hazard. Similarly, the joint survival distribution is given by

  S_joint(t_1, …, t_n) = L( Σ_{i=1}^n Λ_i(t_i) ) = L( Σ_{i=1}^n L^{−1}(S_i(t_i)) ),        (1.26)

where the second equality follows from (1.25) and leads to an Archimedean copula. For example, given the Gamma frailty Z ∼ Gamma(δ^{−1}, δ), we have L(x) = (1 + δx)^{−1/δ}, so

  S_joint(t_1, …, t_n) = [ 1 + δ Σ_{i=1}^n ∫_0^{t_i} λ_i(s) ds ]^{−1/δ}.
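The two expressions in (1.26) can be checked numerically; the sketch below (constant hazards assumed purely for illustration) evaluates both forms under the Gamma frailty Laplace transform L(x) = (1 + δx)^(−1/δ):

```python
delta = 0.5
L = lambda x: (1.0 + delta * x) ** (-1.0 / delta)     # Laplace transform of Gamma(1/delta, delta)
L_inv = lambda s: (s ** -delta - 1.0) / delta         # its inverse

lam = [0.02, 0.05, 0.08]        # constant obligor hazards lambda_i
t = [10.0, 5.0, 2.0]            # evaluation times t_i
Lambda = [l * ti for l, ti in zip(lam, t)]            # cumulative hazards Lambda_i(t_i)
S_marg = [L(Li) for Li in Lambda]                     # marginal survivals from (1.25)

joint_direct = L(sum(Lambda))                         # first form of (1.26)
joint_copula = L(sum(L_inv(s) for s in S_marg))       # Archimedean-copula form of (1.26)
assert abs(joint_direct - joint_copula) < 1e-12
```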
In the practice of modeling credit portfolios, one must first of all check the appropriateness of the assumptions behind either the frailty approach or the copula approach. An obvious counterexample is the recent collapse of the basket CDS and CDO markets, for which the rating agencies once abused the oversimplified Gaussian copula. Besides, the correlation risk in these complex financial instruments is highly contingent on macroeconomic conditions and black-swan events, which are rather dynamic, volatile and unpredictable.
Generalization
The above survey of survival models is too brief to cover the vast survival analysis literature that has potential applications in credit risk modeling. To be selective, we have cited only Hougaard (2000), Lawless (2003) and Aalen, et al. (2008), but there are numerous texts and monographs on survival analysis and reliability.
It is straightforward to enrich and extend the family of credit risk models by either the existing devices in survival analysis or new developments that would benefit both fields. The latter is one of our objectives in this thesis, and we will focus on the dual-time generalization.
1.4 Scope of the Thesis
Both old and new challenges arising in the context of credit risk modeling call for the development of statistical methodologies. Let us first preview the complex data structures in real-world credit risk problems before defining the scope of the thesis.
The vintage time series are among the most popular versions of economic and risk management data; see e.g. the ALFRED digital archive (http://alfred.stlouisfed.org/) hosted by the U.S. Federal Reserve Bank of St. Louis. For risk management in retail banking, consider for instance the revolving exposures of credit cards. Depending on the market share of the card-issuing bank, thousands to hundreds of thousands of new cards are issued monthly, varying by product type (e.g. specialty, reward, co-brand and secured cards), acquisition channel (e.g. banking center, direct mail, telephone and internet), and other distinguishable characteristics (e.g. the geographic region where the card holder resides). By convention, the accounts originated in the same month constitute a monthly vintage (quarterly and yearly vintages can be defined in the same fashion),
where "vintage" is a term borrowed from the winemaking industry to indicate the shared origination. Then, a vintage time series refers to the longitudinal observations and performance measurements for the cohort of accounts that share the same origination date and therefore the same age.

Figure 1.3: Illustration of retail credit portfolios and vintage diagram.
Figure 1.3 provides an illustration of retail credit portfolios and the scheme of vintage data collection, where we use only the geographic MSA (U.S. Metropolitan Statistical Area) as a representative of the segmentation variables. Each segment consists of multiple vintages that have different origination dates, and each vintage consists of varying numbers of accounts. The most important feature of the vintage data is its dual-time coordinates. Let the calendar time be denoted by t and each vintage by its origination time v; then the lifetime (i.e. age, also called the "month-on-book" for a monthly vintage) is calculated as m = t − v for t ≥ v. Such (m, t; v) tuples lie in the dual-time domain, called the vintage diagram, with the x-axis representing the calendar time t and the y-axis representing the lifetime m. Unlike a usual 2D space, the vintage diagram consists of unit-slope trajectories, each representing the evolution of a different vintage. On a vintage diagram, all kinds of data can be collected, including continuous, binary or count observations. In particular, when the dual-time observations are the individual lives together with birth and death time points, and when these observations are subject to truncation and censoring, one may draw a line segment for each individual to represent the vintage data graphically. Such a graph is known as the Lexis diagram in demography and survival analysis; see Figure 4.1 based on a dual-time-to-default simulation.
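In code, placing observations on the vintage diagram amounts to computing m = t − v per record; the layout of the toy records below is hypothetical, chosen only to illustrate the bookkeeping:

```python
# Hypothetical account-level records: calendar time t and origination (vintage) time v,
# both measured in months on a common clock.
records = [
    {"vintage": 0, "calendar": 3},
    {"vintage": 0, "calendar": 5},
    {"vintage": 2, "calendar": 5},
    {"vintage": 4, "calendar": 5},
]

def vintage_tuples(records):
    """Return (m, t; v) tuples with months-on-book m = t - v, defined for t >= v."""
    return [(r["calendar"] - r["vintage"], r["calendar"], r["vintage"])
            for r in records if r["calendar"] >= r["vintage"]]

tuples = vintage_tuples(records)
# Each vintage evolves along a unit-slope trajectory: t - m = v is constant.
assert all(t - m == v for m, t, v in tuples)
```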
For corporate bond and loan issuers, the observations of default events or default rates share the same vintage data structure. Table 1.1 shows the Moody's speculative-grade default rates for annual cohorts 1970-2008, which are calculated from the cumulative rates released recently by Moody's Global Credit Policy (February 2009). Each cohort is observed through year-end 2008 and truncated at a maximum lifetime of 20 years. Thus, the performance window for the cohorts originated in the latest years becomes shorter and shorter. The default rates in Table 1.1 are aligned in lifetime, but they can also be aligned in calendar time. To compare the cohort performances in dual-time coordinates, the marginal plots in both lifetime and calendar time are given in Figure 1.4, together with side-by-side boxplots for checking the vintage (i.e. cohort) heterogeneity. It is clear that the average default rates gradually decrease in lifetime, but are very volatile in calendar time. Note that the calendar dynamics follow closely the NBER-dated recessions shown in Figure 1.1. Besides, there seems to be heterogeneity among the vintages. These projection views are only an initial exploration. Further analysis of this vintage data will be performed in Chapter III.
The main purpose of the thesis is to develop a dual-time modeling framework based on observations on the vintage diagram, or "dual-time analytics". First of all, it is worth mentioning that the vintage data structure is by no means an exclusive feature of economic and financial data. It also appears in clinical trials with staggered entry, demography, epidemiology, dendrochronology, etc. As an interesting example in dendrochronology, Esper, Cook and Schweingruber (Science, 2002) analyzed the long historical tree-ring records in a dual-time manner in order to reconstruct the past temperature variability. We believe that the dual-time analytics developed in this thesis can also be applied to non-financial fields.

Figure 1.4: Moody's speculative-grade default rates for annual cohorts 1970-2008: projection views in lifetime, calendar and vintage origination time.
The road map in Figure 1.5 outlines the scope of the thesis. In Chapter II, we discuss the adaptive smoothing spline (AdaSS) for estimating functions with heterogeneous smoothness. Two challenging issues are the evaluation of the reproducing kernel and the determination of the local penalty, for which we derive an explicit solution based upon a piecewise type of local adaptation. Four different examples, each having a specific feature of heterogeneous smoothness, are used to demonstrate the new AdaSS technique. Note that the AdaSS may estimate 'smooth' functions including possible jumps, and it plays a key role in the subsequent work in the thesis.
In Chapter III, we develop the vintage data analysis (VDA) for continuous types of dual-time responses (e.g. loss rates). We propose an MEV decomposition framework based on Gaussian process models, where M stands for the maturation curve, E for the exogenous influence and V for the vintage heterogeneity. The intrinsic identification problem is discussed, as is the semiparametric extension of MEV models in the presence of dual-time covariates. Such models are motivated by practical needs in financial risk management. An efficient MEV Backfitting algorithm is provided for model estimation, and its performance is assessed by a simulation study. Then, we apply the MEV risk decomposition strategy to analyze both corporate default rates and retail loan loss rates.

Figure 1.5: A road map of thesis developments of statistical methods in credit risk modeling. [Diagram: Chapter I: Introduction (Credit Risk Modeling; Survival Analysis); Chapter II: AdaSS (Smoothing Spline; Local Adaptation); Chapter III: VDA (MEV Decomposition; Gaussian Process Models; Semiparametric Regression); Chapter IV: DtSA (Nonparametric Estimation; Structural Parametrization; Dual-time Cox Regression).]
In Chapter IV, we study the dual-time survival analysis (DtSA) of time-to-default data observed on the Lexis diagram. It is of particular importance in credit risk modeling, where default events can be triggered by both endogenous and exogenous hazards. We consider (a) nonparametric estimators under one-way, two-way and three-way underlying hazards models, (b) structural parametrization via a first-passage-time triggering system with an endogenous distance-to-default process associated with an exogenous time transformation, and (c) dual-time semiparametric Cox regression with both endogenous and exogenous baseline hazards and covariate effects, for which the method of partial likelihood estimation is discussed. Also discussed is the random-effect vintage heterogeneity modeled by a shared frailty. Finally, we demonstrate the application of DtSA to credit card and mortgage risk analysis in retail banking, and shed some light on understanding the ongoing credit crisis from a new dual-time analytic perspective.
Table 1.1: Moody's speculative-grade default rates. Data source: Moody's special comment (release: February 2009) on corporate default and recovery rates, 1920-2008 (http://www.moodys.com/) and author's calculations. [Table entries: default rates (%) for each annual cohort 1970-2008 over lifetime Years 1-20.]
CHAPTER II
Adaptive Smoothing Spline
2.1 Introduction
For observational data, statisticians are generally interested in smoothing rather than interpolation. Let the observations be generated by the signal-plus-noise model

  y_i = f(t_i) + ε(t_i),   i = 1, …, n        (2.1)

where f is an unknown smooth function with unit-interval support [0, 1], and ε(t_i) is the random error. Function estimation from noisy data is a central problem in modern nonparametric statistics, and includes the methods of kernel smoothing [Wand and Jones (1995)], local polynomial smoothing [Fan and Gijbels (1996)] and wavelet shrinkage [Vidakovic (1999)]. In this thesis, we concentrate on the method of smoothing splines, which has an elegant formulation through the mathematical device of the reproducing kernel Hilbert space (RKHS). There is a rich body of literature on smoothing splines since the pioneering work of Kimeldorf and Wahba (1970); see the monographs by Wahba (1990), Green and Silverman (1994) and Gu (2002). Ordinarily,
the smoothing spline is formulated by the following variational problem

  min_{f∈W_m} { (1/n) Σ_{i=1}^n w(t_i)[y_i − f(t_i)]^2 + λ ∫_0^1 [f^{(m)}(t)]^2 dt },   λ > 0        (2.2)

where the target function is an element of the m-order Sobolev space

  W_m = { f : f, f′, …, f^{(m−1)} absolutely continuous, f^{(m)} ∈ L_2[0, 1] }.        (2.3)
The objective in (2.2) is a sum of two functionals, the mean squared error MSE = (1/n) Σ_{i=1}^n w(t_i)[y_i − f(t_i)]^2 and the roughness penalty PEN = ∫_0^1 [f^{(m)}(t)]^2 dt; hence the tuning parameter λ controls the trade-off between fidelity to the data and the smoothness of the estimate. Let us call the ordinary smoothing spline (2.2) OrdSS for short.
The weights w(t_i) in the MSE come from the nonstationary error assumption of (2.1), where ε(t) ∼ N(0, σ^2/w(t)). They can also be used for aggregating data with replicates. Suppose the data are generated by (2.1) with stationary error but time-varying sampling frequency r_k for k = 1, …, K distinct points; then

  MSE = (1/n) Σ_{k=1}^K Σ_{j=1}^{r_k} [y_{kj} − f(t_k)]^2 = (1/K) Σ_{k=1}^K w(t_k)[ȳ_k − f(t_k)]^2 + Const.        (2.4)

where ȳ_k ∼ N(f(t_k), σ^2/r_k) for k = 1, …, K are the pointwise averages. To train the OrdSS, the raw data can be replaced by the ȳ_k's together with the weights w(t_k) ∝ r_k.
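The replicate aggregation in (2.4) can be sketched as follows (a standalone illustration with made-up numbers; the weight scaling is defined only up to a positive constant):

```python
from collections import defaultdict

def aggregate_replicates(ts, ys):
    """Collapse replicated observations (t_k appearing r_k times) to the pointwise
    averages ybar_k with weights w(t_k) proportional to r_k, as in (2.4)."""
    groups = defaultdict(list)
    for t, y in zip(ts, ys):
        groups[t].append(y)
    t_k = sorted(groups)
    ybar = [sum(groups[t]) / len(groups[t]) for t in t_k]
    w = [len(groups[t]) for t in t_k]    # r_k; any positive rescaling works
    return t_k, ybar, w

ts = [0.1, 0.1, 0.1, 0.5, 0.5, 0.9]
ys = [1.0, 2.0, 3.0, 4.0, 6.0, 5.0]
t_k, ybar, w = aggregate_replicates(ts, ys)
assert ybar == [2.0, 5.0, 5.0] and w == [3, 2, 1]
```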
The OrdSS assumes that f(t) is homogeneously smooth (or rough), and performs regularization based on the simple integral of squared derivatives [f^{(m)}(t)]^2. It excludes target functions with spatially varying degrees of smoothness, as noted by Wahba (1985) and Nychka (1988). In practice, there are various types of functions with non-homogeneous smoothness, and four such scenarios are illustrated in Figure 2.1. The first two are simulation examples following exactly the same setting as Donoho and Johnstone (1994), adding N(0, 1) noise to n = 2048 equally spaced signal responses that are rescaled to attain a signal-to-noise ratio of 7. The last two are sampled data from automotive engineering and financial engineering.

Figure 2.1: Heterogeneous smooth functions: (1) Doppler function simulated with noise, (2) HeaviSine function simulated with noise, (3) Motorcycle-Accident experimental data, and (4) Credit-Risk tweaked sample.

1. Doppler function: f(t) = √(t(1 − t)) sin(2π(1 + a)/(t + a)), a = 0.05, t ∈ [0, 1]. It has both time-varying magnitudes and time-varying frequencies.
2. HeaviSine function: f(t) = 3 sin(4πt) − sgn(t − 0.3) − sgn(0.72 − t), t ∈ [0, 1]. It has a downward jump at t = 0.3 and an upward jump at t = 0.72.
3. Motorcycle-Accident experiment: a classic data set from Silverman (1985). It consists of accelerometer readings taken through time from a simulated motorcycle crash experiment, in which the time points are irregularly spaced and the noise varies over time. In Figure 2.1, a hypothesized smooth shape is overlaid on the observations, and it illustrates the fact that the acceleration stays relatively constant until the crash impact.
4. Credit-Risk tweaked sample: retail loan loss rates for the revolving credit exposures in retail banking. For academic use, the business background is removed and the sample is tweaked. The tweaked sample includes monthly vintages booked in 7 years (2000-2006), where a vintage refers to a group of credit accounts originated in the same month. They are all observed up to the end of 2006, so a vintage booked earlier has a longer observation length. In Figure 2.1, the x-axis represents the months-on-book, and the circles are the vintage loss rates. The pointwise averages are plotted as solid dots, which are wiggly at large months-on-book due to decreasing sample sizes. Our purpose is to estimate an underlying shape (illustrated by the solid line) whose degree of smoothness, by assumption, increases with growth.
The first three examples have often been discussed in the nonparametric analysis literature, where function estimation with non-homogeneous smoothness has been of interest for decades. A variety of adaptive methods have been proposed, such as variable-bandwidth kernel smoothing (Müller and Stadtmüller, 1987), multivariate adaptive regression splines (Friedman, 1991), adaptive wavelet shrinkage (Donoho and Johnstone, 1994), local-penalty regression splines (Ruppert and Carroll, 2000), as well as treed Gaussian process modeling (Gramacy and Lee, 2008). In the context of smoothing splines, Luo and Wahba (1997) proposed a hybrid adaptive spline with knot selection, rather than using every sampling point as a knot. More naturally, Wahba (1995), in a discussion, suggested using the local penalty

  PEN = ∫_0^1 ρ(t) [f^{(m)}(t)]^2 dt,   s.t. ρ(t) > 0 and ∫_0^1 ρ(t) dt = 1        (2.5)

where the add-on function ρ(t) adapts to the local behavior of f(t) so that a heavier penalty is imposed on regions of lower curvature. This local penalty functional is the foundation of the whole chapter. Relevant works include Abramovich and Steinberg (1996) and Pintore, Speckman and Holmes (2006).
This chapter is organized as follows. In Section 2.2 we formulate the AdaSS (adaptive smoothing spline) through the local penalty (2.5), derive its solution via RKHS and generalized ridge regression, and then discuss how to determine both the smoothing parameter and the local penalty adaptation by the method of cross validation. Section 2.3 covers some basic properties of the AdaSS. The experimental results for the aforementioned examples are presented in Section 2.4. We summarize the chapter in Section 2.5 and give technical proofs in the last section.
2.2 AdaSS: Adaptive Smoothing Spline
The AdaSS extends the OrdSS (2.2) based on the local penalty (2.5). It is formulated as

  min_{f∈W_m} { (1/n) Σ_{i=1}^n w(t_i)[y_i − f(t_i)]^2 + λ ∫_0^1 ρ(t)[f^{(m)}(t)]^2 dt },   λ > 0.        (2.6)
The following proposition has its root in Kimeldorf and Wahba (1971); see also Wahba (1990; §1).

Proposition 2.1 (AdaSS). Given a bounded integrable positive function ρ(t), define the r.k.

  K(t, s) = ∫_0^1 ρ^{−1}(u) G_m(t, u) G_m(s, u) du,   t, s ∈ [0, 1]        (2.7)

where G_m(t, u) = (t − u)_+^{m−1} / (m − 1)!. Then, the solution of (2.6) can be expressed as

  f(t) = Σ_{j=0}^{m−1} α_j φ_j(t) + Σ_{i=1}^n c_i K(t, t_i) ≡ α^T φ(t) + c^T ξ_n(t)        (2.8)

where φ_j(t) = t^j/j! for j = 0, …, m − 1, α ∈ R^m, c ∈ R^n.
Since the AdaSS has a finite-dimensional linear expression (2.8), it can be solved by generalized linear regression. Write y = (y_1, …, y_n)^T, B = (φ(t_1), …, φ(t_n))^T, W = diag(w(t_1), …, w(t_n)) and Σ = [K(t_i, t_j)]_{n×n}. Substituting (2.8) into (2.6), we have the regularized least squares problem

  min_{α,c} { (1/n) ‖y − Bα − Σc‖^2_W + λ ‖c‖^2_Σ },        (2.9)

where ‖x‖^2_A = x^T A x for A ≥ 0. Setting the partial derivatives w.r.t. α and c to zero, we obtain the equations

  B^T W(Bα + Σc) = B^T W y,   ΣW(Bα + (Σ + nλW^{−1})c) = ΣW y,

or

  Bα + (Σ + nλW^{−1})c = y,   B^T c = 0.        (2.10)
Consider the QR decomposition B = (Q_1 ⋮ Q_2)(R^T ⋮ 0^T)^T = Q_1 R with orthogonal (Q_1 ⋮ Q_2)_{n×n} and upper-triangular R_{m×m}, where Q_1 is n × m and Q_2 is n × (n−m). Multiplying both sides of y = Bα + (Σ + nλW^{-1})c by Q_2^T and using the fact that c = Q_2 Q_2^T c (since B^T c = 0), we have that Q_2^T y = Q_2^T(Σ + nλW^{-1})c = Q_2^T(Σ + nλW^{-1})Q_2 Q_2^T c. Then, simple algebra yields
$$\hat{c} = Q_2\big[Q_2^T(\Sigma + n\lambda W^{-1})Q_2\big]^{-1}Q_2^T y, \qquad \hat{\alpha} = R^{-1}Q_1^T\big[y - (\Sigma + n\lambda W^{-1})\hat{c}\big]. \tag{2.11}$$
The fitted values at the design points are given by ŷ = Bα̂ + Σĉ. By (2.10), y = Bα̂ + (Σ + nλW^{-1})ĉ, so the residuals are given by

$$e = y - \hat{y} = n\lambda W^{-1}\hat{c} \equiv \big(I - S_{\lambda,\rho}\big)y \tag{2.12}$$

where S_{λ,ρ} = I − nλW^{-1}Q_2[Q_2^T(Σ + nλW^{-1})Q_2]^{-1}Q_2^T is often called the smoothing matrix, in the sense that

$$\hat{y} = S_{\lambda,\rho}\,y = \Big(I - n\lambda W^{-1}Q_2\big[Q_2^T(\Sigma + n\lambda W^{-1})Q_2\big]^{-1}Q_2^T\Big)y. \tag{2.13}$$
We use the notation S_{λ,ρ} to denote its explicit dependence on both the smoothing parameter λ and the local adaptation ρ(·), where ρ is needed for the evaluation of the matrix Σ = [K(t_i, t_j)] based on the r.k. (2.7).
The AdaSS is analogous to the OrdSS in constructing confidence intervals under the Bayesian interpretation of Wahba (1983). In particular, at the sampling time points we may construct the pointwise 95% interval by

$$\hat{f}(t_i) \pm z_{0.025}\sqrt{\hat{\sigma}^2\, w^{-1}(t_i)\, S_{\lambda,\rho}[i,i]}, \qquad i = 1,\ldots,n \tag{2.14}$$
where S_{λ,ρ}[i,i] denotes the ith diagonal element of the smoothing matrix. The estimate of the noise variance is given by σ̂² = ‖(I − S_{λ,ρ})y‖²_W / (n − p), where p = trace(S_{λ,ρ}) can be interpreted as the equivalent degrees of freedom.
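The QR route (2.11)–(2.13) is straightforward to implement with standard linear algebra. The following numpy sketch is our illustration (not code from the thesis; names such as `adass_fit` are assumptions) of solving for (α̂, ĉ) and forming the smoothing matrix:

```python
import numpy as np

def adass_fit(B, Sigma, W, y, lam):
    """Solve the AdaSS normal equations (2.10) via the QR route (2.11).

    B     : (n, m) polynomial basis matrix, rows phi(t_i)^T
    Sigma : (n, n) kernel matrix [K(t_i, t_j)]
    W     : (n,) weights w(t_i)
    lam   : smoothing parameter lambda > 0
    """
    n, m = B.shape
    M = Sigma + n * lam * np.diag(1.0 / W)       # Sigma + n*lam*W^{-1}
    Q, R = np.linalg.qr(B, mode="complete")      # B = (Q1 : Q2)(R^T : 0^T)^T
    Q1, Q2 = Q[:, :m], Q[:, m:]
    # c_hat = Q2 [Q2^T M Q2]^{-1} Q2^T y        -- first half of (2.11)
    c = Q2 @ np.linalg.solve(Q2.T @ M @ Q2, Q2.T @ y)
    # alpha_hat = R^{-1} Q1^T [y - M c_hat]     -- second half of (2.11)
    alpha = np.linalg.solve(R[:m, :], Q1.T @ (y - M @ c))
    yhat = B @ alpha + Sigma @ c                 # fitted values
    # smoothing matrix S_{lam,rho} of (2.13)
    S = np.eye(n) - n * lam * np.diag(1.0 / W) @ Q2 @ \
        np.linalg.solve(Q2.T @ M @ Q2, Q2.T)
    return alpha, c, yhat, S
```

On a toy problem one can verify B^T ĉ = 0, the residual identity e = nλW^{-1}ĉ of (2.12), and ŷ = S_{λ,ρ} y.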
2.2.1 Reproducing Kernels
The AdaSS depends on the evaluation of the r.k. of the integral form (2.7). For general ρ(t), we must resort to numerical integration techniques. In what follows we present a closed-form expression of K(t,s) when ρ^{-1}(t) is piecewise linear.
Proposition 2.2 (Reproducing Kernel). Let ρ^{-1}(t) = a_k + b_k t for t ∈ [τ_{k−1}, τ_k), for some fixed (a_k, b_k) and knots {0 ≡ τ_0 < τ_1 < ⋯ < τ_K ≡ 1}. Then, the AdaSS kernel (2.7) can be explicitly evaluated by
$$K(t,s) = \sum_{k=1}^{\kappa_{t\wedge s}} \sum_{j=0}^{m-1} (-1)^j \left[ (a_k + b_k u)\frac{(u-t)^{m-1-j}(u-s)^{m+j}}{(m-1-j)!(m+j)!} - b_k \sum_{l=0}^{m-1-j} (-1)^l \frac{(u-t)^{m-1-j-l}(u-s)^{m+1+j+l}}{(m-1-j-l)!(m+1+j+l)!} \right]\Bigg|_{\tau_{k-1}}^{t\wedge s\wedge \tau_k^-} \tag{2.15}$$

where κ_{t∧s} = min{k : τ_k > t ∧ s}, κ_{1∧1} = K, and F(u)|_τ^{τ′} = F(τ′) − F(τ).
Proposition 2.2 is general enough to cover many types of r.k.'s. For example, setting b_k ≡ 0 gives the piecewise-constant results for ρ^{-1}(t) obtained by Pintore, et al. (2006). For another example, set m = 2, K = 1 and ρ^{-1}(t) = a + bt for t ∈ [0, 1]. (By (2.5), a = b/(e^b − 1) and b ∈ R.) Then,
$$K(t,s) = \begin{cases} \dfrac{a}{6}(3t^2 s - t^3) + \dfrac{b}{12}(2t^3 s - t^4), & \text{if } t \le s, \\[6pt] \dfrac{a}{6}(3ts^2 - s^3) + \dfrac{b}{12}(2ts^3 - s^4), & \text{otherwise.} \end{cases} \tag{2.16}$$
When b = 0, (2.16) reduces to the OrdSS kernel for ﬁtting cubic smoothing splines.
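As a quick sanity check of these closed forms, the following sketch (ours, not from the thesis; function names are assumptions) evaluates (2.16) and the general OrdSS kernel (2.18), and compares (2.16) against direct numerical integration of the defining integral (2.7):

```python
import numpy as np
from math import factorial

def adass_kernel_m2(t, s, a, b):
    """AdaSS r.k. (2.16): m = 2, K = 1, rho^{-1}(u) = a + b*u on [0, 1]."""
    t, s = min(t, s), max(t, s)        # symmetric; reduce to the t <= s branch
    return a / 6 * (3 * t**2 * s - t**3) + b / 12 * (2 * t**3 * s - t**4)

def ordss_kernel(t, s, m=2):
    """OrdSS kernel (2.18), i.e. rho(t) ≡ 1, for general order m."""
    val = sum((-1)**j * t**(m - 1 - j) * s**(m + j)
              / (factorial(m - 1 - j) * factorial(m + j)) for j in range(m))
    if t <= s:
        val += (-1)**m * (s - t)**(2 * m - 1) / factorial(2 * m - 1)
    return val

# spot-check (2.16) against the integral (2.7) with rho^{-1}(u) = a + b*u
a, b, t, s = 0.5, 1.0, 0.3, 0.7
u = np.linspace(0.0, 1.0, 20001)
integrand = (a + b * u) * np.clip(t - u, 0, None) * np.clip(s - u, 0, None)
quad = np.sum(integrand[:-1] + integrand[1:]) / 2 * (u[1] - u[0])  # trapezoid rule
assert abs(quad - adass_kernel_m2(t, s, a, b)) < 1e-6
```

Setting b = 0 in `adass_kernel_m2` indeed reproduces `ordss_kernel` with m = 2, matching the remark above.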
The derivatives of the r.k. are also of interest. In solving boundary value differential equations, the Green's function G_m(t,u) has the property that f(t) = ∫_0^1 G_m(t,u) f^{(m)}(u) du for f ∈ W_m with zero f^{(j)}(0), j = 0, …, m−1. The equation (2.7) is such an example of applying G_m(t,·) to K(t,s) (s fixed), implying that
$$\frac{\partial^m K(t,s)}{\partial t^m} = \rho^{-1}(t)\,G_m(s,t) = \begin{cases} (a_{\kappa_t} + b_{\kappa_t} t)\,\dfrac{(s-t)^{m-1}}{(m-1)!}, & \text{if } t \le s \\[6pt] 0, & \text{otherwise.} \end{cases} \tag{2.17}$$
Well-defined higher-order derivatives of the kernel can be derived from (2.17), and they should be treated knot by knot if ρ^{-1}(t) is piecewise. Lower-order lth derivatives (l < m) of (2.15) can be obtained by straightforward but tedious calculations. Here, we present only the OrdSS kernels (by setting ρ(t) ≡ 1 in Proposition 2.2) and their derivatives, to be used in the next section:
$$K(t,s) = \sum_{j=0}^{m-1} (-1)^j \frac{t^{m-1-j}\, s^{m+j}}{(m-1-j)!(m+j)!} + (-1)^m \frac{(s-t)^{2m-1}}{(2m-1)!}\, I(t \le s), \tag{2.18}$$

$$\frac{\partial^l K(t,s)}{\partial t^l} = \sum_{j=0}^{m-1-l} (-1)^j \frac{t^{m-1-j-l}\, s^{m+j}}{(m-1-j-l)!(m+j)!} + (-1)^{m+l} \frac{(s-t)^{2m-1-l}}{(2m-1-l)!}\, I(t \le s). \tag{2.19}$$
Note that the kernel derivative coincides with (2.17) when l = m.
Remarks: Numerical methods are inevitable for approximating (2.7) when ρ^{-1}(t) takes general forms. Rather than directly approximating (2.7), one may first approximate ρ^{-1}(t) by a piecewise function with knot selection, then use the closed-form results derived in Proposition 2.2.
2.2.2 Local Penalty Adaptation
One of the most challenging problems for the AdaSS is the selection of the smoothing parameter λ and the local penalty adaptation function ρ(t). We use the method of cross-validation based on the leave-one-out scheme. Conditional on (λ, ρ(·)), let f̂(t) denote the AdaSS trained on the complete sample of size n, and f̂^{[k]}(t) denote the AdaSS trained on the leave-one-out sample {(t_i, y_i)}_{i≠k}, for k = 1, …, n. Define the score of cross-validation by
$$\mathrm{CV}(\lambda,\rho) = \frac{1}{n}\sum_{i=1}^{n} w(t_i)\big[y_i - \hat{f}^{[i]}(t_i)\big]^2. \tag{2.20}$$
For the linear predictor (2.13) with smoothing matrix S_{λ,ρ}, Craven and Wahba (1979) justified that f̂(t_i) − f̂^{[i]}(t_i) = S_{λ,ρ}[i,i](y_i − f̂^{[i]}(t_i)) for i = 1, …, n. It follows that
$$\mathrm{CV}(\lambda,\rho) = \frac{1}{n}\sum_{i=1}^{n} \frac{w(t_i)\big(y_i - \hat{f}(t_i)\big)^2}{\big(1 - S_{\lambda,\rho}[i,i]\big)^2} = \frac{1}{n}\, y^T W \big(I - \mathrm{diag}(S_{\lambda,\rho})\big)^{-2}\big(I - S_{\lambda,\rho}\big)^2 y, \tag{2.21}$$
where diag(S_{λ,ρ}) is the diagonal part of the smoothing matrix. Replacing the elements of diag(S_{λ,ρ}) by their average n^{-1} trace(S_{λ,ρ}), we obtain the so-called generalized cross-validation score

$$\mathrm{GCV}(\lambda,\rho) = \frac{\frac{1}{n}\sum_{i=1}^{n} w(t_i)\big(y_i - \hat{f}_\lambda(t_i)\big)^2}{\big(1 - n^{-1}\,\mathrm{trace}(S_{\lambda,\rho})\big)^2}. \tag{2.22}$$
For fixed ρ(·), the best tuning parameter λ is usually found by minimizing (2.21) or (2.22) on the log scale, i.e. by varying log λ over the real line.
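Given the smoothing matrix, both scores are cheap to compute. The sketch below is our code (not the thesis implementation); for illustration the test exercises it on an ordinary least-squares hat matrix, for which the same leave-one-out identity holds exactly:

```python
import numpy as np

def cv_scores(y, yhat, S, W):
    """Leave-one-out CV (2.21) and GCV (2.22) for a linear smoother S."""
    n = len(y)
    r = y - yhat                          # residuals (I - S) y
    d = np.diag(S)                        # diagonal of the smoothing matrix
    cv = np.mean(W * (r / (1.0 - d))**2)                   # (2.21)
    gcv = np.mean(W * r**2) / (1.0 - np.trace(S) / n)**2   # (2.22)
    return cv, gcv
```

In practice one would evaluate these scores over a grid of log λ (and, for the AdaSS, jointly with γ) and pick the minimizer.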
To select ρ(·), consider two rules: (1) choose a smaller value of ρ(t) where f(·) is less smooth; and (2) quantify the local roughness of f(·) by the squared mth derivative [f^{(m)}(t)]². If the true f^{(m)}(t) were known, it would be reasonable to determine the shape of ρ(t) by

$$\rho^{-1}(t) \propto \big[f^{(m)}(t)\big]^2. \tag{2.23}$$
Since the true function f(t) is unknown a priori, so are its derivatives. Let us consider the estimation of the derivatives. In the context of smoothing splines, nonparametric derivative estimation is usually obtained by taking derivatives of the spline estimate. Using that as an initial estimate of local roughness, we may adopt a two-step procedure for the selection of ρ(t), as follows.

Step 1: run the OrdSS with m̃ = m + a (a = 0, 1, 2) and cross-validated λ to obtain the initial function estimate:

$$\tilde{f}(t) = \sum_{j=0}^{m+a-1}\tilde{\alpha}_j\phi_j(t) + \sum_{i=1}^{n}\tilde{c}_i \bar{K}(t,t_i), \qquad \text{where } \bar{K}(t,s) = \int_0^1 G_{m+a}(t,u)\,G_{m+a}(s,u)\,du.$$

Step 2: calculate f̃^{(m)}(t) and use it to determine the shape of ρ^{-1}(t).
In Step 1, with no prior knowledge, we start with the OrdSS under the improper adaptation ρ̃(t) ≡ 1. The extra order a = 0, 1, 2 imposed on the penalization of derivatives ∫_0^1 [f^{(m+a)}(t)]² dt serves to control the smoothness of the mth derivative estimate. So, the choice of a depends on how smooth the estimated f̃^{(m)}(t) is desired to be.

In Step 2, the mth derivative of f̃(t) can be obtained by differentiating the bases,

$$\tilde{f}^{(m)}(t) = \sum_{j=0}^{a-1}\tilde{\alpha}_{m+j}\phi_j(t) + \sum_{i=1}^{n}\tilde{c}_i\,\frac{\partial^m}{\partial t^m}\bar{K}(t,t_i) \tag{2.24}$$

where ∂^m K̄(t,t_i)/∂t^m is calculated according to (2.19) after replacing m by m + a. The goodness of the spline derivative depends largely on the spline estimate itself. Rice and Rosenblatt (1983) studied the large-sample properties of the spline derivatives (2.24), which have slower convergence rates than the spline itself. For finite samples, we find by simulation that (2.24) tends to undersmooth, but it may capture the rough shape of the derivatives upon standardization.

Figure 2.2: OrdSS function estimate and its 2nd-order derivative (upon standardization): scaled signal from f(t) = sin(ωt) (upper panel) and f(t) = sin(ωt⁴) (lower panel), ω = 10π, n = 100 and snr = 7. The sin(ωt⁴) signal resembles the Doppler function in Figure 2.1; both have time-varying frequency.

Figure 2.2 demonstrates the estimation of the 2nd-order derivative for f(t) = sin(ωt) and f(t) = sin(ωt⁴) using the OrdSS with m = 2, 3. In both sine-wave examples, the order-2 OrdSS yields rough derivative estimates, while the order-3 OrdSS gives smooth estimates but large bias at the boundary. Despite these downsides, the shape of the derivative can be more or less captured, meeting our needs for estimating ρ(t) below.
In determining the shape of ρ^{-1}(t) from f̃^{(m)}(t), we restrict ourselves to the piecewise class of functions, as in Proposition 2.2. By (2.23), we suggest estimating ρ^{-1}(t) piecewise by the method of constrained least squares:

$$\big|\tilde{f}^{(m)}(t)\big|^{2\gamma} + \varepsilon = a_0\,\rho^{-1}(t), \qquad \rho^{-1}(t) = a_{\kappa_t} + b_{\kappa_t} t, \quad \kappa_t = \min\{k : \tau_k \ge t\} \tag{2.25}$$
subject to the constraints (1) positivity: ρ^{-1}(t) > 0 for t ∈ [0, 1]; (2) continuity: a_k + b_k τ_k = a_{k+1} + b_{k+1} τ_k for k = 1, …, K−1; and (3) normalization: a_0 = ∫_0^1 ρ(t) dt = Σ_{k=1}^{K} ∫_{τ_{k−1}}^{τ_k^−} dt/(a_k + b_k t) > 0, for given γ ≥ 1 and pre-specified knots {0 ≡ τ_0 < τ_1 < … < τ_{K−1} < 1} and τ_K^− = 1. Clearly, ρ^{-1}(t) is an example of a linear regression spline. If the continuity condition is relaxed and b_k ≡ 0, we obtain the piecewise-constant ρ^{-1}(t).
The power transform |f̃^{(m)}(t)|^{2γ} with γ ≥ 1 is used to stretch [f̃^{(m)}(t)]², in order to compensate for (a) overestimation of [f^{(m)}(t)]² in slightly oscillating regions and (b) underestimation of [f^{(m)}(t)]² in highly oscillating regions, since the initial OrdSS is trained under the improper adaptation ρ̃(t) ≡ 1. The lower-right panel of Figure 2.2 is such an example of overestimating the roughness by the OrdSS.
The knots {τ_k}_{k=1}^{K} ⊂ [0, 1] can be specified either regularly or irregularly. In most situations, we may choose equally spaced knots with K to be determined. For the third example in Figure 2.1, which shows flat-vs.-rugged heterogeneity, one may choose sparse knots for the flat region and dense knots for the rugged region, or choose regular knots such that ρ^{-1}(t) is nearly constant in the flat region. However, for the second example in Figure 2.1, which shows jump-type heterogeneity, one should shrink the size of the interval containing the jump. Given the knots, the parameters (λ, γ) can be jointly determined by the cross-validation criteria (2.21) or (2.22).
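For the special case b_k ≡ 0 with the continuity constraint dropped, the constrained least-squares fit of (2.25) reduces to interval-wise averaging of |f̃^{(m)}(t_i)|^{2γ}. Below is a hedged numpy sketch of this degenerate case (our reading of (2.25), not the thesis code; the unit-integral normalization of ρ is one convention among several):

```python
import numpy as np

def piecewise_const_rho_inv(t, fm, knots, gamma=1.0, floor=1e-6):
    """Piecewise-constant shape estimate of rho^{-1}(t): b_k ≡ 0 in (2.25).

    t     : sampling points in [0, 1]
    fm    : estimated m-th derivative f~^{(m)}(t_i), e.g. from (2.24)
    knots : array [0 = tau_0 < ... < tau_K = 1]
    """
    z = np.abs(fm)**(2 * gamma)
    # interval index of each t_i; the clip keeps t = 1 in the last interval
    k = np.clip(np.searchsorted(knots, t, side="right") - 1, 0, len(knots) - 2)
    levels = np.array([z[k == j].mean() if np.any(k == j) else floor
                       for j in range(len(knots) - 1)])
    levels = np.maximum(levels, floor)     # positivity constraint
    # rescale so that int_0^1 rho(t) dt = sum_k width_k / a_k = 1
    levels *= np.sum(np.diff(knots) / levels)
    return levels
```

The returned levels give the shape of ρ^{-1}(t); the overall scale is then absorbed into the smoothing parameter λ via cross-validation.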
Remarks: The selection of the local adaptation ρ^{-1}(t) is a nontrivial problem, for which we suggest using ρ^{-1}(t) ∝ |f̃^{(m)}(t)|^{2γ} together with piecewise approximation. This can be viewed as a joining of forces of Abramovich and Steinberg (1996) and Pintore, et al. (2006): the former directly executed (2.23) at each sampled t_i, while the latter directly imposed a piecewise-constant ρ^{-1}(t) without utilizing the fact (2.23). Note that the approach of Abramovich and Steinberg (1996) may suffer from the estimation bias of |f̃^{(m)}(t)|² discussed above, and the r.k. evaluation by approximation based on a discrete set of ρ^{-1}(t_i) remains another issue. As for Pintore, et al. (2006), the high-dimensional parametrization of ρ^{-1}(t) requires nonlinear optimization whose stability is usually of concern. All these issues can be resolved by our approach, which therefore has both analytical and computational advantages.
2.3 AdaSS Properties
Recall that the OrdSS (2.2) has many interesting statistical properties, e.g.

(a) it is a natural polynomial spline of degree (2m−1) with knots placed at t_1, …, t_n (hence, the OrdSS with m = 2 is usually referred to as the cubic smoothing spline);

(b) it has a Bayesian interpretation through the duality between RKHS and Gaussian stochastic processes (Wahba, 1990, §1.4–1.5);

(c) its large-sample properties of consistency and rate of convergence depend on the smoothing parameter (Wahba, 1990, §4 and references therein);

(d) it has an equivalent kernel smoother with variable bandwidth (Silverman, 1984).
Similar properties are desired for the AdaSS; we leave (c) and (d) to future investigation. For (a), the AdaSS properties depend on the r.k. (2.7) and the structure of the local adaptation ρ^{-1}(t). Pintore, et al. (2006) investigated the piecewise-constant case of ρ^{-1}(t) and argued that the AdaSS is also a piecewise polynomial spline of degree (2m−1) with knots on t_1, …, t_n, while it allows for more flexible lower-order derivatives than the OrdSS. The piecewise-linear case of ρ^{-1}(t) can be analyzed in a similar manner. For the AdaSS (2.8) with the r.k. (2.15),

$$f(t) = \sum_{j=0}^{m-1}\alpha_j\phi_j(t) + \sum_{i:\, t_i \le t} c_i K(t,t_i) + \sum_{i:\, t_i > t} c_i K(t,t_i)$$

with degree (m−1) in the first two terms, and degree 2m in the third term (after a one-degree increase due to the trend of ρ^{-1}(t)).
For (b), the AdaSS based on the r.k. (2.7) corresponds to the zero-mean Gaussian process

$$Z(t) = \int_0^1 \rho^{-1/2}(u)\,G_m(t,u)\,dW(u), \qquad t \in [0,1] \tag{2.26}$$

where W(t) denotes the standard Wiener process. The covariance kernel is given by E[Z(t)Z(s)] = K(t,s). When ρ(t) ≡ 1, the Gaussian process (2.26) is the (m−1)-fold integrated Wiener process studied by Shepp (1966). Following Wahba (1990, §1.5), consider a random-effect model F(t) = α^T φ(t) + b^{1/2} Z(t); then the best linear unbiased estimate of F(t) from noisy data Y(t) = F(t) + ε corresponds to the AdaSS solution (2.8), provided that the smoothing parameter λ = σ²/(nb). Moreover, one may set an improper prior on the coefficients α, derive the posterior of F(t), then obtain a Bayesian type of confidence interval estimate; see Wahba (1990, §5). Such confidence intervals at the sampling points are given in (2.14).
The inverse function ρ^{-1}(t) in (2.26) can be viewed as a measurement of the non-stationary volatility (or variance) of the mth derivative process, in the sense that

$$\frac{d^m Z(t)}{dt^m} = \rho^{-1/2}(t)\,\frac{dW(t)}{dt}. \tag{2.27}$$

By the duality between Gaussian processes and RKHS, d^m Z(t)/dt^m corresponds to f^{(m)}(t) in (2.6). This provides an alternative reasoning for (2.23), in the sense that ρ^{-1}(t) can be determined locally by the second-order moments of d^m Z(t)/dt^m.
To illustrate the connection between the AdaSS and a Gaussian process with non-stationary volatility, consider the case m = 1 and piecewise-constant ρ^{-1}(t) = a_k, t ∈ [τ_{k−1}, τ_k), for some a_k > 0 and pre-specified knots 0 ≡ τ_0 < τ_1 < ⋯ < τ_K ≡ 1. By (2.15), the r.k. is given by

$$K(t,s) = \sum_{k=1}^{\kappa_{t\wedge s}} a_k\big(t \wedge s \wedge \tau_k^- - \tau_{k-1}\big), \qquad t,s \in [0,1]. \tag{2.28}$$
By (2.26) and (2.27), the corresponding Gaussian process satisfies

$$dZ(t) = \sqrt{a_k}\,dW(t), \qquad t \in [\tau_{k-1}, \tau_k),\ k = 1,\ldots,K \tag{2.29}$$

where a_k is the local volatility from knot to knot. Clearly, Z(t) reduces to the standard Wiener process W(t) when a_k ≡ 1. As one of its most appealing features, the construction (2.29) accommodates jump diffusion, which occurs in [τ_{k−1}, τ_k) when the interval shrinks and the volatility a_k blows up.
2.4 Experimental Results
The implementation of the OrdSS and the AdaSS differs only in the formation of Σ through the r.k. K(t,s). It is most crucial for the AdaSS to select the local adaptation ρ^{-1}(t). Consider the four examples in Figure 2.1 with various scenarios of heterogeneous smoothness. The curve fitting results by the OrdSS with m = 2 (i.e. cubic smoothing spline) are shown in Figure 2.3, together with the 95% confidence intervals for the last two cases. In what follows, we discuss the need for local adaptation in each case, and compare the performance of the OrdSS and the AdaSS.
Simulation Study.
The Doppler and HeaviSine functions are often taken as benchmark examples for testing adaptive smoothing methods. The global smoothing by the OrdSS (2.2) may fail to capture the heterogeneous smoothness and, as a consequence, oversmooth the regions where the signal is relatively smooth and undersmooth where the signal is highly oscillating. For the Doppler case in Figure 2.3, the underlying signal increases its smoothness with time, but the OrdSS estimate shows lots of wiggles for t > 0.5. In the HeaviSine case, both jumps at t = 0.3 and t = 0.72 could be captured by the OrdSS, but the smoothness elsewhere is sacrificed as a compromise.

The AdaSS fitting results with piecewise-constant ρ^{-1}(t) are shown in Figure 2.4.
Figure 2.3: OrdSS curve fitting with m = 2 (shown by solid lines). The dashed lines represent 95% confidence intervals. In the credit-risk case, the log loss rates are considered as the responses, and the time-dependent weights are specified proportional to the number of replicates.
Initially, we fitted the OrdSS with cross-validated tuning parameter λ, then estimated the 2nd-order derivatives by (2.24). To estimate the piecewise-constant local adaptation ρ^{-1}(t), we pre-specified 6 equally spaced knots (including 0 and 1) for the Doppler function, and pre-specified the knots (0, 0.295, 0.305, 0.5, 0.715, 0.725, 1) for the HeaviSine function. (Such knots can be determined graphically from plots of the squared derivative estimates.) Given the knots, the function ρ^{-1}(t) is estimated by constrained least squares (2.25) conditional on the power transform parameter γ, where the best γ* is found jointly with λ by minimizing the cross-validation score (2.21). In our implementation, we assumed that both tuning parameters are bounded such that log λ ∈ [−40, 10] and γ ∈ [1, 5], then performed bounded nonlinear optimization. Table 2.1 gives the numerical results in both simulation cases, including the cross-validated (λ, γ), the estimated noise variance (cf. true σ² = 1), and the equivalent degrees of freedom.

Figure 2.4: Simulation study of the Doppler and HeaviSine functions: OrdSS (blue), AdaSS (red) and the heterogeneous truth (light background).

Table 2.1: AdaSS parameters in the Doppler and HeaviSine simulation study.

    Simulation    n     snr   λ          γ      σ̂²      dof     CV
    Doppler       2048  7     1.21e-12   2.13   0.9662   164.19  2180.6
    HeaviSine     2048  7     1.86e-10   4.95   1.0175   35.07   2123.9
Figure 2.4 (bottom panel) shows the improvement of the AdaSS over the OrdSS after zooming in. The middle panel shows the data-driven estimates of the adaptation function ρ(t), plotted on the log scale. For the Doppler function, the imposed penalty ρ(t) in the first interval [0, 0.2) is much less than that in the other four intervals, resulting in heterogeneous curve fitting. For the HeaviSine function, low penalty is assigned where the jumps occur, while relatively the same high level of penalty is assigned to the other regions. Both simulation examples demonstrate the advantage of using the AdaSS. A similar simulation study of the Doppler function can be found in Pintore, et al. (2006), who used a small sample size (n = 128) and high-dimensional optimization techniques.
Motorcycle-Accident Data.
Silverman (1985) originally used the motorcycle-accident experimental data to test nonstationary OrdSS curve fitting; see the confidence intervals in Figure 2.3. Besides error nonstationarity, smoothness nonhomogeneity is another important feature of this data set. To fit such data, a treatment from the heterogeneous-smoothness point of view is more appropriate than the nonstationary-error treatment alone. Recall that the data consist of acceleration measurements (subject to noise) of the head of a motorcycle rider during the course of a simulated crash experiment. Upon the crash occurrence, the head decelerates to about negative 120 gal (a gal is 1 centimeter per second squared). The data indicate clearly that the acceleration rebounds well above its original level before settling back. Let us focus on the period prior to the crash happening at t ≈ 0.24 (after time rescaling), for which it is reasonable to assume the acceleration stays constant. However, the OrdSS estimate in Figure 2.3 shows a slight bump around t = 0.2, a false discovery.

Figure 2.5: Nonstationary OrdSS and AdaSS for the motorcycle-accident data.

Figure 2.6: OrdSS, nonstationary OrdSS and AdaSS: performance comparison.
In the presence of nonstationary errors N(0, σ²/w(t_i)) with unknown weights w(t_i), a common approach is to begin with an unweighted OrdSS and estimate w^{-1}(t_i) from the local residual sums of squares; see Silverman (1985). Local moving averaging (e.g. loess) is usually performed; see the upper-left panel in Figure 2.5. Then, the smoothed weight estimates are plugged into (2.2) to run the nonstationary OrdSS. To perform the AdaSS (2.6) with the same weights, we estimated the local adaptation ρ^{-1}(t) from the derivatives of the nonstationary OrdSS. Both results are shown in Figure 2.5. Compared to Figure 2.3, both nonstationary spline models result in tighter 95% confidence interval estimates than the unweighted OrdSS.
Figure 2.6 compares the performance of the stationary OrdSS, the nonstationary OrdSS and the nonstationary AdaSS. After zooming in around t ∈ [0.1, 0.28], it is clear that the AdaSS performs better than both versions of the OrdSS in modeling the constant acceleration before the crash. For the settling-back process after the maximum acceleration at t ≈ 0.55, the AdaSS gives a smoother estimate than the OrdSS. Besides, they differ in estimating the minimum of the acceleration curve at t ≈ 0.36, where the nonstationary OrdSS seems to have underestimated the magnitude of deceleration, while the estimate by the nonstationary AdaSS looks reasonable.
Credit-Risk Data.
The last plot in Figure 2.1 represents a tweaked sample from retail credit risk management. It is of interest to model the growth, or maturation, curve of the log loss rates over lifetime (month-on-book). Consider the sequence of vintages booked in order from 2000 to 2006. Their monthly loss rates are observed up to December 2006 (right truncation). Aligning them to the same origin, we have observations accumulated more at small month-on-book and less at large month-on-book, i.e., the sample size decreases in lifetime. As a consequence, the simple pointwise or moving averages have increasing variability. Nevertheless, our experience suggests that the maturation curve should have increasing degrees of smoothness once the lifetime exceeds a certain month-on-book. This is also a desired property of the maturation curve for the purpose of extrapolation.

Figure 2.7: AdaSS estimate of the maturation curve for the credit-risk sample: piecewise-constant ρ^{-1}(t) (upper panel) and piecewise-linear ρ^{-1}(t) (lower panel).

By (2.4), spline smoothing may work on the pointwise averages subject to nonstationary errors N(0, σ²/w(t_i)), with weights determined by the number of replicates. The nonstationary OrdSS estimate is shown in Figure 2.3, which however does not smooth out the maturation curve for large month-on-book. The reasons for the instability in the rightmost region could be that (a) the OrdSS has an intrinsic boundary bias for especially small w(t_i), and (b) the raw inputs used for training are contaminated and behave abnormally. The exogenous cause of the contamination in (b) will be explained in Chapter III.
Alternatively, we applied the AdaSS with a judgmental prior on the heterogeneous smoothness of the target function. For demonstration, we purposely specified ρ^{-1}(t) to take the forms shown in Figure 2.7, including (a) piecewise-constant and (b) piecewise-linear. Both specifications are non-decreasing in lifetime and span about the same range. The curve fitting results for the two ρ^{-1}(t) are nearly identical upon visual inspection. Compared with the OrdSS performance in the large month-on-book region, the AdaSS yields a smoother estimate of the maturation curve, as well as tighter confidence intervals based on the judgmental prior. The AdaSS estimate of the maturation curve, transformed back to the original scale, is plotted by the red solid line in Figure 2.1.

This credit-risk sample will appear again in Chapter III on vintage data analysis. The really interesting story is yet to unfold.
2.5 Summary
In this chapter we have studied the adaptive smoothing spline for modeling heterogeneously 'smooth' functions, including those with possible jumps. In particular, we studied the AdaSS with piecewise adaptation of the local penalty and derived the closed-form reproducing kernels. To determine the local adaptation function, we proposed a shape estimation procedure based on the function derivatives, together with the method of cross-validation for parameter tuning. Four different examples, each having a specific feature of heterogeneous smoothness, were used to demonstrate the improvement of the AdaSS over the OrdSS.

Future work includes the AdaSS properties w.r.t. (c) and (d) listed in Section 2.3. Its connection with a time-warping smoothing spline is also under our investigation. However, it is not clear how the AdaSS can be extended to multivariate smoothing in a tensor-product space, as an analogue of the smoothing spline ANOVA models in Gu (2002). In the next chapter, we will consider the application of the AdaSS in a dual-time domain, where the notion of smoothing spline will be generalized via other Gaussian processes.
2.6 Technical Proofs
Proof of Proposition 2.1: by minor modification of Wahba (1990, §1.2–1.3). By Taylor's theorem with remainder, any function in W_m (2.3) can be expressed as

$$f(t) = \sum_{j=0}^{m-1}\frac{t^j}{j!}f^{(j)}(0) + \int_0^1 G_m(t,u)\,f^{(m)}(u)\,du \tag{2.30}$$
where G_m(t,u) = (t−u)_+^{m−1}/(m−1)! is the Green's function for solving the boundary value problem: if f_1^{(m)} = g with f_1 ∈ B_m = {f : f^{(k)}(0) = 0, for k = 0, 1, …, m−1}, then f_1(t) = ∫_0^1 G_m(t,u)g(u)du. The remainder functions f_1(t) = ∫_0^1 G_m(t,u)f^{(m)}(u)du lie in the subspace H = W_m ∩ B_m. Given the bounded integrable function ρ(t) > 0 on t ∈ [0, 1], associate H with the inner product
$$\langle f_1, g_1\rangle_{\mathcal{H}} = \int_0^1 f_1^{(m)}(u)\,g_1^{(m)}(u)\,\rho(u)\,du, \tag{2.31}$$

and the induced squared norm ‖f_1‖²_H = ∫_0^1 ρ(u)[f_1^{(m)}(u)]² du. By arguments similar to Wahba (1990, §1.2), one may verify that H is an RKHS with r.k. (2.7), where the reproducing property
$$\langle f_1(\cdot), K(t,\cdot)\rangle_{\mathcal{H}} = f_1(t), \qquad \forall f_1 \in \mathcal{H} \tag{2.32}$$

follows from ∂^m K(t,s)/∂t^m = ρ^{-1}(t)G_m(s,t). Then, the arguments of Wahba (1990, §1.3) prove the finite-dimensional expression (2.8) that minimizes the AdaSS objective functional (2.6).
Proof of Proposition 2.2: The r.k. with piecewise ρ^{-1}(t) equals

$$K(t,s) = \sum_{k=1}^{\kappa_{t\wedge s}} \int_{\tau_{k-1}}^{t\wedge s\wedge\tau_k^-} (a_k + b_k u)\,\frac{(u-t)^{m-1}}{(m-1)!}\,\frac{(u-s)^{m-1}}{(m-1)!}\,du.$$
For any τ_{k−1} ≤ t_1 ≤ t_2 < τ_k, using successive integration by parts,

$$\int_{t_1}^{t_2} (a_k + b_k u)\frac{(u-t)^{m-1}}{(m-1)!}\frac{(u-s)^{m-1}}{(m-1)!}\,du = \int_{t_1}^{t_2} (a_k + b_k u)\frac{(u-t)^{m-1}}{(m-1)!}\,d\,\frac{(u-s)^m}{m!}$$

$$= (a_k + b_k u)\frac{(u-t)^{m-1}}{(m-1)!}\frac{(u-s)^m}{m!}\bigg|_{t_1}^{t_2} - \int_{t_1}^{t_2} (a_k + b_k u)\frac{(u-t)^{m-2}}{(m-2)!}\frac{(u-s)^m}{m!}\,du - b_k\int_{t_1}^{t_2} \frac{(u-t)^{m-1}}{(m-1)!}\frac{(u-s)^m}{m!}\,du = \cdots,$$
it is straightforward (but tedious) to derive that

$$\int_{t_1}^{t_2} (a_k + b_k u)\frac{(u-t)^{m-1}}{(m-1)!}\frac{(u-s)^{m-1}}{(m-1)!}\,du = \sum_{j=0}^{m-1}(-1)^j (a_k + b_k u)\frac{(u-t)^{m-1-j}}{(m-1-j)!}\frac{(u-s)^{m+j}}{(m+j)!}\bigg|_{t_1}^{t_2} - b_k \sum_{j=0}^{m-1}(-1)^j T_j$$
in which T_j denotes the integral evaluations for j = 0, 1, …, m−1:

$$T_j \equiv \int_{t_1}^{t_2} \frac{(u-t)^{m-1-j}}{(m-1-j)!}\frac{(u-s)^{m+j}}{(m+j)!}\,du = \sum_{l=0}^{m-1-j}(-1)^l \frac{(u-t)^{m-1-j-l}(u-s)^{m+1+j+l}}{(m-1-j-l)!(m+1+j+l)!}\bigg|_{t_1}^{t_2}.$$
Choosing t_1 = τ_{k−1}, t_2 = t ∧ s ∧ τ_k^− for k = 1, …, κ_{t∧s} and summing over k gives the final r.k. evaluation

$$K(t,s) = \sum_{k=1}^{\kappa_{t\wedge s}} \sum_{j=0}^{m-1} (-1)^j \left[ (a_k + b_k u)\frac{(u-t)^{m-1-j}(u-s)^{m+j}}{(m-1-j)!(m+j)!} - b_k \sum_{l=0}^{m-1-j} (-1)^l \frac{(u-t)^{m-1-j-l}(u-s)^{m+1+j+l}}{(m-1-j-l)!(m+1+j+l)!} \right]\Bigg|_{\tau_{k-1}}^{t\wedge s\wedge\tau_k^-}.$$
CHAPTER III
Vintage Data Analysis
3.1 Introduction
The vintage data represents a special class of functional, longitudinal or panel data with dual-time characteristics. It shares the cross-sectional time series feature that there are J subjects, each observed at calendar time points {t_{jl}, l = 1, …, L_j}, where L_j is the total number of observations on the jth subject. Unlike the purely longitudinal setup, each subject j we consider corresponds to a vintage originated at time v_j, such that v_1 < v_2 < … < v_J, and the elapsed time m_{jl} = t_{jl} − v_j measures the lifetime or age of the subject. To this end, we denote the vintage data by

$$\big\{ y(m_{jl}, t_{jl}; v_j),\ l = 1,\ldots,L_j,\ j = 1,\ldots,J \big\}, \tag{3.1}$$
where the dual-time points are usually sampled from regular grids such that for each vintage j, m_{jl} = 0, 1, 2, … and t_{jl} = v_j, v_j + 1, v_j + 2, …. We call the grid of such (m_{jl}, t_{jl}; v_j) a vintage diagram. Also observed on the vintage diagram could be the covariate vectors x(m_{jl}, t_{jl}; v_j) ∈ R^p. In situations where more than one segment of dual-time observations is available, we may denote the (static) segmentation variables by z_k ∈ R^q, and denote the vintage data by {y(m_{kjl}, t_{kjl}; v_{kj}), x(m_{kjl}, t_{kjl}; v_{kj})} for segments k = 1, …, K. See Figure 1.3 for an illustration of the vintage diagram and the dual-time collection of observations, as well as the hierarchical structure of multi-segment credit portfolios.
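For concreteness, the triangular prototype described next is easy to enumerate programmatically; a small sketch (ours, not from the thesis; integer ages and vintages are an assumption for illustration):

```python
import numpy as np

def triangular_vintage_diagram(vintages, t_max):
    """Enumerate the triangular vintage diagram: for each vintage v_j,
    ages m = 0, 1, 2, ... while the calendar time t = v_j + m <= t_max
    (systematic right truncation in calendar time)."""
    grid = [(m, v + m, v) for v in vintages for m in range(int(t_max - v) + 1)]
    return np.array(grid)          # columns: lifetime m, calendar t, vintage v

g = triangular_vintage_diagram([0, 1, 2], t_max=2)
# every point lies on the hyperplane v = t - m
assert np.all(g[:, 1] - g[:, 0] == g[:, 2])
```

Other prototypes (rectangular, trapezoidal, parallelogram) differ only in how the (m, t) range is truncated per vintage.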
The vintage diagram can be truncated in either age or time, and several interesting prototypes are illustrated in Figure 3.1. The triangular prototype in the top-left panel is the most typical, as time series data are often subject to systematic truncation from the right (say, observations up to today). Each vintage series evolves in the 45° direction such that, for the same vintage v_j, the calendar time t_j and the lifetime m_j move at the same speed along the diagonal; therefore, the vintage diagram corresponds to the hyperplane {(t, m, v) : v = t − m} in 3D space. In practice, there exist other prototypes of vintage data truncation. The rectangular diagram is obtained after truncation in both lifetime and calendar time. The corporate default rates released by Moody's, shown in Table 1.1, are truncated in lifetime to be 20 years maximum. The trapezoidal prototype corresponds to the time-snapshot selection common in credit risk modeling exercises. Besides, the U.S. Treasury rates released for multiple maturity dates serve as an example of the parallelogram prototype.

The age, time and vintage dimensions are all of potential interest in financial risk management. Empirical evidence suggests that the age effects, of a self-maturation nature, are often smooth in variation. The time effects are often dynamic and volatile due to the macroeconomic environment. Meanwhile, the vintage effects are conceived as a forward-looking measure of performance since origination, and they can be treated as random effects under appropriate structural assumptions. Bearing in mind these practical considerations, we aim to develop a formal statistical framework for vintage data analysis (VDA), by integrating ideas from functional data analysis (FDA), longitudinal data analysis (LDA)¹ and time series analysis (TSA). To achieve this, we will take a Gaussian process modeling approach, which is flexible enough to cover a diversity of scenarios.

¹ Despite the overlap between FDA and LDA, we try to differentiate them methodologically by tagging FDA with nonparametric smoothing and tagging LDA with random-effect modeling.

Figure 3.1: Vintage diagram upon truncation and exemplified prototypes.
This chapter develops the VDA framework for continuous types of responses (usually, rates). It is organized as follows. In Section 3.2, we discuss the maturation-exogenous-vintage (MEV) decomposition in general. The MEV models based on Gaussian processes are presented in Section 3.3, where we discuss Kriging, smoothing spline and kernel methods. An efficient backfitting algorithm is provided for model estimation. Section 3.4 covers semiparametric regression given the dual-time covariates and segmentation variables. Computational results are presented in Section 3.5, including a simulation study for assessing the MEV decomposition technique, and real applications in both corporate and retail credit risk cases. We conclude the chapter in Section 3.6 and give technical proofs in the last section.
3.2 MEV Decomposition Framework
Let $\mathcal{V} = \{v_j : 0 \equiv v_1 < v_2 < \cdots < v_J < \infty\}$ be a discrete set of origination times
of $J$ vintages, and define the dual-time domain

$$\Omega = \big\{(m, t; v) : m \geq 0,\ t = v + m,\ v \in \mathcal{V}\big\},$$

where $m$ denotes the lifetime and $t$ denotes the calendar time. Unlike a usual 2D
$(m, t)$ space, the dual-time domain $\Omega$ consists of isolated $45^\circ$ lines
$\{(m, t) : t - m = v_j,\ m \geq 0\}$ for each fixed vintage $v_j \in \mathcal{V}$. The triangular
vintage diagram in Figure 3.1 is the discrete counterpart of $\Omega$ upon right truncation
in calendar time.
Let $\Omega$ be the support of the vintage data (3.1), for which we propose the MEV
(maturation-exogenous-vintage) decomposition modeling framework

$$\eta(y(m, t; v)) = f(m) + g(t) + h(v) + \varepsilon, \qquad (3.2)$$
where $\eta$ is a prespecified transform function, $\varepsilon$ is the random error, and the three
MEV components have the following practical interpretations:
a) $f(m)$ represents the maturation curve characterizing the endogenous growth;
b) $g(t)$ represents the exogenous influence by macroeconomic conditions;
c) $h(v)$ represents the vintage heterogeneity measuring the origination quality.
For the time being, let $f, g, h$ be open to flexible choices of parametric, nonparametric
or stochastic process models. We begin with some general remarks about model
identification.
Suppose the maturation curve $f(m)$ in (3.2) absorbs the intercept effect, and the
constraints $\mathrm{E}g = \mathrm{E}h = 0$ are set as usual:

$$\mathrm{E}\int_{t_{\min}}^{t_{\max}} g(t)\,dt = \mathrm{E}\int_{\mathcal{V}} h(v)\,dv = 0. \qquad (3.3)$$
The question is, do these constraints suffice to ensure model identifiability? It
turns out that on the dual-time domain $\Omega$, all MEV decomposition models are subject
to a linear dependency, as stated formally in the following lemma.
Lemma 3.1 (Identification). For the MEV models (3.2) defined on the dual-time
domain $\Omega$, write $f = f_0 + f_1$ with the linear part $f_0$ and the nonlinear part $f_1$, and
similarly write $g = g_0 + g_1$, $h = h_0 + h_1$. Then, one of $f_0, g_0, h_0$ is not identifiable.
The identification (or collinearity) problem in Lemma 3.1 is incurred by the hyperplane
nature of the dual-time domain $\Omega$ in the 3D space: $\{(t, m, v) : v = t - m\}$
(see Figure 3.1, top right panel). We need to break up this linear dependency in order
to make the MEV models estimable. A practical way is to perform binning in the age,
time or vintage direction. For example, given vintage data of the triangular prototype,
one may group the large-$m$ region, the small-$t$ region and the large-$v$ region, as
these regions have few observations. A commonly used binning procedure for $h(v)$ is
to assume piecewise-constant vintage effects by grouping monthly vintages every
quarter or every year.
An alternative strategy for breaking up the linear dependency is to directly remove
the trend of one of the $f, g, h$ functions. Indeed, this strategy is simple to implement,
provided that $f, g, h$ can all be expanded by certain basis functions. Then, one only
needs to fix the linear bases for any two of $f, g, h$. By default, we remove the trend
of $h$ and let it measure only the nonlinear heterogeneity. A word of caution is in
order for situations where there does exist an underlying trend $h_0(v)$: it would then be
absorbed by $f(m)$ and $g(t)$. One may retrieve $h_0(v) = \gamma v$ by simultaneously adjusting
$f(m) \to f(m) + \gamma m$ and $g(t) \to g(t) - \gamma t$. Determining the slope $\gamma$ requires extra
knowledge about the trend of at least one of $f, g, h$; the trend removal for $h(v)$ is
needed only when there is no such extra knowledge. The MEV decomposition with
zero-trend vintage effects may be referred to as the ad hoc approach.
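The linear dependency of Lemma 3.1 is easy to see numerically. The following minimal sketch (our illustration with a made-up grid, not code from the thesis) builds the intercept and the three linear trends over a small synthetic vintage diagram; since $t = m + v$ holds at every dual-time observation, the four columns have rank 3, so one linear trend is unidentifiable:

```python
import numpy as np

# Hypothetical vintage diagram: vintages v_j, lifetimes m, calendar time t = v + m.
vintages = np.arange(4)
records = [(m, v + m, v) for v in vintages for m in range(5)]
M, T, V = (np.array(c, dtype=float) for c in zip(*records))

# Design with an intercept and the three linear parts f0(m), g0(t), h0(v).
X = np.column_stack([np.ones_like(M), M, T, V])

# t - m - v = 0 on every observation, so the columns are linearly dependent:
# the rank is 3, not 4, and one of the linear trends cannot be identified.
print(np.linalg.matrix_rank(X))   # 3
```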
In what follows we review several existing approaches related to MEV
decomposition. The first example is the conventional age-period-cohort (APC) model
used in social sciences such as demography and epidemiology. The second example is
the generalized additive model (GAM) in nonparametric statistics, where the smoothing
spline is used as the scatterplot smoother. The third example is an industry GAMEED
approach that may reduce to a sequential type of MEV decomposition. The last example
is another industry technique, DtD, involving interactions of the MEV components.
All of them can be viewed as one or another type of vintage data analysis.
The APC Model.
Suppose there are $I$ distinct $m$-values and $L$ distinct $t$-values in the vintage data.
By (3.2), setting $f(m) = \mu + \sum_{i=1}^{I} \alpha_i I(m = m_i)$, $g(t) = \sum_{l=1}^{L} \beta_l I(t = t_l)$ and
$h(v) = \sum_{j=1}^{J} \gamma_j I(v = v_j)$ gives us the APC (age-period-cohort) model

$$\eta(y(m_i, t_l; v_j)) = \mu + \alpha_i + \beta_l + \gamma_j + \varepsilon, \qquad (3.4)$$
where the vintage effects $\{\gamma_j\}$ used to be called the cohort effects; see Mason, et al.
(1973) and Mason and Wolfinger (2002). Assume $\sum_i \alpha_i = \sum_l \beta_l = \sum_j \gamma_j = 0$, as
usual; then one may reformulate the APC model as $y = X\theta + \varepsilon$, where $y$ is the
vector consisting of $\eta(y(m_i, t_l; v_j))$ and

$$X = \big[1,\ x^{(\alpha)}_1, \ldots, x^{(\alpha)}_{I-1},\ x^{(\beta)}_1, \ldots, x^{(\beta)}_{L-1},\ x^{(\gamma)}_1, \ldots, x^{(\gamma)}_{J-1}\big]$$

with dummy variables. For example, $x^{(\alpha)}_i$ is defined by $x^{(\alpha)}_i(m) = I(m = m_i) - I(m = m_I)$
for $i = 1, \ldots, I - 1$. Without loss of generality, let us select the factorial levels
$(m_I, t_L, v_J)$ to satisfy $m_I - t_L + v_J = 0$ (after reshuffling the indices of $v$).
The identification problem of the APC model (3.4) follows from Lemma 3.1, since the
regression matrix is linearly dependent through

$$\sum_{i=1}^{I-1} (m_i - \bar{m})\, x^{(\alpha)}_i - \sum_{l=1}^{L-1} (t_l - \bar{t}\,)\, x^{(\beta)}_l + \sum_{j=1}^{J-1} (v_j - \bar{v})\, x^{(\gamma)}_j = \bar{m} - \bar{t} + \bar{v}, \qquad (3.5)$$

where $\bar{m} = \sum_{i=1}^{I} m_i / I$, $\bar{t} = \sum_{l=1}^{L} t_l / L$ and $\bar{v} = \sum_{j=1}^{J} v_j / J$. So, ordinary least
squares (OLS) cannot be used for fitting $y = X\theta + \varepsilon$. This is a key observation by
Kupper (1985) that generated long-term discussions and debates on the estimability
of the APC model. Among others, Fu (2000) suggested estimating the APC model
by ridge regression, whose limiting case leads to a so-called intrinsic estimator
$\hat{\theta} = (X^T X)^{+} X^T y$, where $A^{+}$ is the Moore–Penrose generalized inverse of $A$ satisfying
$AA^{+}A = A$.
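The intrinsic estimator coincides with the minimum-norm least squares solution, which makes it easy to compute. A small sketch on simulated data (the design, response and induced collinearity below are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rank-deficient APC-style design: the last column is a linear
# combination of the others, mimicking the dependency (3.5).
X = rng.standard_normal((50, 4))
X[:, 3] = X[:, 1] - X[:, 2]
y = X @ np.array([1.0, 0.5, -0.5, 0.0]) + 0.1 * rng.standard_normal(50)

# OLS fails because X^T X is singular; the intrinsic estimator replaces the
# inverse with the Moore-Penrose generalized inverse.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y

# Equivalently, it is the minimum-norm least squares solution:
theta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(theta, theta_lstsq))   # True
```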
The discrete APC model tends to massively identify the effects at the grid level and
often ends up with unstable estimates. Since the vintage data are always unbalanced
in the $m$, $t$ or $v$ projection, there would be very limited observations for estimating
certain factorial effects in (3.4). The APC model is formulated over-flexibly in the
sense that it does not regularize a grid effect by its neighbors. Fu (2008) suggested
using a spline model for $h(v)$; however, such spline regularization would not bypass
the identification problem unless it involves no linear basis.
The Additive Splines.
If $f, g, h$ are all assumed to be unspecified smooth functions, the MEV models
become generalized additive models (GAM); see Hastie and Tibshirani (1990). For
example, using the cubic smoothing spline as the scatterplot smoother, we have the
additive splines model formulated by

$$\arg\min_{f,g,h} \Big\{ \sum_{j=1}^{J} \sum_{l=1}^{L_j} \big[\eta(y_{jl}) - f(m_{jl}) - g(t_{jl}) - h(v_j)\big]^2 + \lambda_f \int \big(f^{(2)}(m)\big)^2 dm + \lambda_g \int \big(g^{(2)}(t)\big)^2 dt + \lambda_h \int \big(h^{(2)}(v)\big)^2 dv \Big\}, \qquad (3.6)$$

where $\lambda_f, \lambda_g, \lambda_h > 0$ are tuning parameters separately controlling the smoothness
of each target function; see (2.2).
Usually, the optimization (3.6) is solved by the backfitting algorithm that iterates

$$\hat{f} \leftarrow S_f\Big[\big\{\eta(y_{jl}) - \hat{g}(t_{jl}) - \hat{h}(v_j)\big\}_{j,l}\Big]$$
$$\hat{g} \leftarrow S_g\Big[\big\{\eta(y_{jl}) - \hat{f}(m_{jl}) - \hat{h}(v_j)\big\}_{j,l}\Big] \quad \text{(centered)} \qquad (3.7)$$
$$\hat{h} \leftarrow S_h\Big[\big\{\eta(y_{jl}) - \hat{f}(m_{jl}) - \hat{g}(t_{jl})\big\}_{j,l}\Big] \quad \text{(centered)}$$

after initializing $\hat{g} = \hat{h} = 0$. See Chapter II about the formation of the smoothing
matrices $S_f, S_g, S_h$, for which the corresponding smoothing parameters can be selected
by the method of generalized cross-validation. However, the linear parts of $f, g, h$ are
not identifiable, by Lemma 3.1. To get around the identification problem, we may take
the ad hoc approach to remove the trend of $h(v)$. The trend removal can be achieved
by appending to every iteration of (3.7):

$$\hat{h}(v) \leftarrow \hat{h}(v) - v \Big(\sum_{j=1}^{J} v_j^2\Big)^{-1} \sum_{j=1}^{J} v_j \hat{h}(v_j).$$
We provide more details on (3.6) to shed light on its further development in the
next section. By Proposition 2.1 in Chapter II, each of $f, g, h$ can be expressed by
a basis expansion of the form (2.8). Taking into account the identifiability of both
the intercept and the trend, we may express $f, g, h$ as

$$f(m) = \mu_0 + \mu_1 m + \sum_{i=1}^{I} \alpha_i K(m, m_i)$$
$$g(t) = \mu_2 t + \sum_{l=1}^{L} \beta_l K(t, t_l) \qquad (3.8)$$
$$h(v) = \sum_{j=1}^{J} \gamma_j K(v, v_j).$$
Denote $\mu = (\mu_0, \mu_1, \mu_2)^T$, $\alpha = (\alpha_1, \ldots, \alpha_I)^T$, $\beta = (\beta_1, \ldots, \beta_L)^T$, and $\gamma = (\gamma_1, \ldots, \gamma_J)^T$.
We may equivalently write (3.6) as the regularized least squares problem

$$\min \Big\{ \frac{1}{n} \big\| y - B\mu - \bar{\Sigma}_f \alpha - \bar{\Sigma}_g \beta - \bar{\Sigma}_h \gamma \big\|^2 + \lambda_f \|\alpha\|^2_{\Sigma_f} + \lambda_g \|\beta\|^2_{\Sigma_g} + \lambda_h \|\gamma\|^2_{\Sigma_h} \Big\}, \qquad (3.9)$$
where $y$ is the vector of $n = \sum_{j=1}^{J} L_j$ observations $\{\eta(y_{jl})\}$, and $B, \bar{\Sigma}_f, \Sigma_f$ are formed by

$$B = \big[(1, m_{jl}, t_{jl})\big]_{n \times 3}, \quad \bar{\Sigma}_f = \big[K(m_{jl}, m_i)\big]_{n \times I}, \quad \Sigma_f = \big[K(m_i, m_{i'})\big]_{I \times I},$$

and $\bar{\Sigma}_g, \Sigma_g, \bar{\Sigma}_h, \Sigma_h$ are formed similarly.
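For fixed tuning parameters, the criterion (3.9) is a quadratic in the stacked coefficients and can be solved from its normal equations. Below is a minimal sketch on simulated data; the Gaussian kernel, knot placement and single shared tuning parameter are illustrative choices, not the thesis's exact specification:

```python
import numpy as np

rng = np.random.default_rng(2)
ker = lambda a, b: np.exp(-0.5 * np.subtract.outer(a, b) ** 2)

# Hypothetical dual-time sample (t = v + m) and a smooth noisy response.
v = np.repeat(np.arange(5.0), 6)
m = np.tile(np.arange(6.0), 5)
t = v + m
y = np.sqrt(1 + m) + 0.2 * np.cos(t) + 0.05 * rng.standard_normal(m.size)

mi, tl, vj = np.unique(m), np.unique(t), np.unique(v)
B = np.column_stack([np.ones_like(m), m, t])          # mean basis (1, m, t)
D = np.column_stack([B, ker(m, mi), ker(t, tl), ker(v, vj)])

# Block-diagonal penalty: no penalty on mu; lambda-weighted kernel grams
# penalize (alpha, beta, gamma) as in (3.9).
lam = 1e-2
P = np.zeros((D.shape[1], D.shape[1]))
i0 = 3
for knots in (mi, tl, vj):
    P[i0:i0 + knots.size, i0:i0 + knots.size] = lam * ker(knots, knots)
    i0 += knots.size

# Normal equations of (1/n)||y - Dc||^2 + c^T P c, with a tiny jitter for
# numerical stability given the dual-time collinearity.
n = y.size
coef = np.linalg.solve(D.T @ D / n + P + 1e-8 * np.eye(P.shape[0]), D.T @ y / n)
fitted = D @ coef
```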
The GAMEED Approach.
GAMEED refers to an industry approach, namely the generalized additive
maturation and exogenous effects decomposition, adopted by Enterprise Quantitative
Risk Management, Bank of America, N.A. (Sudjianto, et al., 2006). It has also been
documented as part of the patented work “Risk and Reward Assessment Mechanism”
(USPTO: 20090063361). Essentially, it has two practical steps:
1. Estimate the maturation curve $\hat{f}(m)$ and the exogenous effect $\hat{g}(t)$ from

$$\eta(y(m, t; v)) = f(m; \alpha) + g(t; \beta) + \varepsilon, \quad \mathrm{E}g = 0,$$

where $f(m; \alpha)$ can either follow a spline model or take a parametric form supported
by prior knowledge, and $g(t; \beta)$ can be specified as in (3.4) upon necessary
time binning, while retaining the flexibility in modeling the macroeconomic
influence. If necessary, one may further model $\hat{g}(t; \beta)$ by a time-series model in
order to capture autocorrelation and possible jumps.
2. Estimate the vintage-specific sensitivities to $\hat{f}(m)$ and $\hat{g}(t)$ by performing the
regression fitting

$$\eta(y(m_{jl}, t_{jl}; v_j)) = \gamma_j + \gamma^{(f)}_j \hat{f}(m_{jl}) + \gamma^{(g)}_j \hat{g}(t_{jl}) + \varepsilon$$

for each vintage $j$ (upon necessary bucketing).
The first step of GAMEED is a reduced type of MEV decomposition that does not
suffer the identification problem. The estimates of $f(m)$ and $g(t)$ can be viewed
as common factors to all vintages. In the second step, the vintage effects are
measured not only through the main (intercept) effects, but also through the interactions
with $\hat{f}(m)$ and $\hat{g}(t)$. Clearly, when the interaction sensitivities $\gamma^{(f)}_j = \gamma^{(g)}_j = 1$, the
industry GAMEED approach becomes a sequential type of MEV decomposition.
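The two steps can be sketched in a few lines. This is our own illustration on simulated data; the parametric form chosen for $f$ and the per-$t$ factor effects for $g$ are made-up stand-ins, not the bank's production specification:

```python
import numpy as np

rng = np.random.default_rng(3)
v = np.repeat(np.arange(6.0), 8)
m = np.tile(np.arange(8.0), 6)
t = v + m
y = np.log1p(m) + 0.3 * np.sin(t) + 0.1 * rng.standard_normal(m.size)

# Step 1: common maturation curve and exogenous effect.  Here f takes a
# parametric form a*log(1+m) and g is a centered factor effect per t.
A = np.column_stack([np.log1p(m)] + [(t == u).astype(float) for u in np.unique(t)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
f_hat = coef[0] * np.log1p(m)
g_hat = A[:, 1:] @ coef[1:]
g_hat -= g_hat.mean()

# Step 2: vintage-specific intercept and sensitivities to f_hat and g_hat.
sens = {}
for vv in np.unique(v):
    idx = v == vv
    Zv = np.column_stack([np.ones(idx.sum()), f_hat[idx], g_hat[idx]])
    sens[vv] = np.linalg.lstsq(Zv, y[idx], rcond=None)[0]
```

Sensitivities near one in the second step indicate a vintage that behaves like the portfolio-wide common factors; departures flag vintages of unusual origination quality.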
The DtD Technique.
DtD refers to dual-time dynamics, adopted by Strategic Analytics Inc.
for consumer behavior modeling and delinquency forecasting. It was brought to the
public domain by Breeden (2007). Using our notation, the DtD model takes the form

$$\eta(y(m_{jl}, t_{jl}; v_j)) = \gamma^{(f)}_j f(m_{jl}) + \gamma^{(g)}_j g(t_{jl}) + \varepsilon \qquad (3.10)$$

and $\eta^{-1}(\cdot)$ corresponds to the so-called superposition function in Breeden (2007).
Clearly, the DtD model involves interactions between the vintage effects and
$(f(m), g(t))$. Unlike the sequential GAMEED above, Breeden (2007) suggested
iteratively fitting $f(m)$, $g(t)$ and $\gamma^{(f)}_j, \gamma^{(g)}_j$ in a simultaneous manner.
However, the identification problem could as well affect the DtD technique,
unless it imposes structural assumptions that break up the trend dependency.
To see this, one may use a Taylor expansion for each nonparametric function in (3.10),
then apply Lemma 3.1 to the linear terms. Unfortunately, this identification problem
was not addressed by Breeden (2007), and the stability of the iterative estimation
algorithm might be an issue.
3.3 Gaussian Process Models
In the MEV decomposition framework (3.2), the maturation curve $f(m)$, the exogenous
influence $g(t)$ and the vintage heterogeneity $h(v)$ are postulated to capture the
endogenous, exogenous and origination effects on the dual-time responses. In this
section, we propose a general approach to MEV decomposition modeling based on
Gaussian processes, and provide an efficient algorithm for model estimation.
3.3.1 Covariance Kernels
Gaussian processes are rather flexible in modeling a function $Z(x)$ through a mean
function $\mu(\cdot)$ and a covariance kernel $K(\cdot, \cdot)$,

$$\mathrm{E}Z(x) = \mu(x), \quad \mathrm{E}\big[Z(x) - \mu(x)\big]\big[Z(x') - \mu(x')\big] = \sigma^2 K(x, x'), \qquad (3.11)$$
where $\sigma^2$ is the variance scale. For ease of discussion, we will write $\mu(x)$ separately
and only consider the zero-mean Gaussian process $Z(x)$. Then, any finite-dimensional
collection of random variables $Z = (Z(x_1), \ldots, Z(x_n))$ follows a multivariate normal
distribution $Z \sim N_n(0, \sigma^2 \Sigma)$ with $\Sigma = [K(x_i, x_j)]_{n \times n}$.
The covariance kernel $K(\cdot, \cdot)$ plays a key role in constructing Gaussian processes. It
is symmetric, $K(x, x') = K(x', x)$, and positive definite:
$\int\!\!\int K(x, x') f(x) f(x')\, dx\, dx' > 0$ for any nonzero $f \in L^2$. The kernel is said to
be stationary if $K(x, x')$ depends only on $|x_i - x'_i|$, $i = 1, \ldots, d$ for $x \in \mathbb{R}^d$. It is
further said to be isotropic or radial if $K(x, x')$ depends only on the distance $\|x - x'\|$.
For $x \in \mathbb{R}$, a stationary kernel is also isotropic.
There is a parsimonious approach to using stationary kernels $K(\cdot, \cdot)$ to model certain
nonstationary Gaussian processes such that $\mathrm{Cov}[Z(x), Z(x')] = \sigma(x)\sigma(x')K(x, x')$,
where the variance scale $\sigma^2(x)$ requires a separate model of either parametric or
nonparametric type. For example, Fan, Huang and Li (2007) used the local polynomial
approach to smoothing $\sigma^2(x)$.
Several popular classes of covariance kernels are listed below, including the adaptive
smoothing spline (AdaSS) kernels discussed in Chapter II:

$$\text{Exponential family:} \quad K(x, x') = \exp\Big\{-\Big(\frac{|x - x'|}{\phi}\Big)^{\kappa}\Big\}, \quad 0 < \kappa \leq 2,\ \phi > 0$$
$$\text{Matérn family:} \quad K(x, x') = \frac{2^{1-\kappa}}{\Gamma(\kappa)} \Big(\frac{|x - x'|}{\phi}\Big)^{\kappa} K_\kappa\Big(\frac{|x - x'|}{\phi}\Big), \quad \kappa, \phi > 0 \qquad (3.12)$$
$$\text{AdaSS r.k. family:} \quad K(x, x') = \int_{\mathcal{X}} \phi^{-1}(u)\, G_\kappa(x, u)\, G_\kappa(x', u)\, du, \quad \phi(x) > 0.$$

More examples of covariance kernels can be found in Stein (1999) and Rasmussen
and Williams (2006).
Both the exponential and Matérn families consist of stationary kernels, for which
$\phi$ is the scale parameter and $\kappa$ can be viewed as the shape parameter. In the Matérn
family, $K_\kappa(\cdot)$ denotes the modified Bessel function of order $\kappa$; see Abramowitz and
Stegun (1972). Often used are the half-integer orders $\kappa = 1/2, 3/2, 5/2$, and the
corresponding Matérn covariance kernels take the following forms:

$$K(x, x') = e^{-|x-x'|/\phi}, \quad \kappa = 1/2$$
$$K(x, x') = e^{-|x-x'|/\phi}\Big(1 + \frac{|x - x'|}{\phi}\Big), \quad \kappa = 3/2 \qquad (3.13)$$
$$K(x, x') = e^{-|x-x'|/\phi}\Big(1 + \frac{|x - x'|}{\phi} + \frac{|x - x'|^2}{3\phi^2}\Big), \quad \kappa = 5/2.$$
It is interesting that these half-integer Matérn kernels correspond to autocorrelated
Gaussian processes of order $\kappa + 1/2$; see Stein (1999, p. 31). In particular, the
$\kappa = 1/2$ Matérn kernel, which is equivalent to the exponential kernel with $\kappa = 1$, is
the covariance function of the Ornstein–Uhlenbeck (OU) process.
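The three half-integer forms of (3.13) are simple to evaluate directly, without Bessel functions. A minimal sketch (our own helper, with an illustrative grid) that also checks the basic covariance-kernel properties:

```python
import numpy as np

def matern(x, xp, phi=1.0, kappa=0.5):
    """Half-integer Matérn kernels of (3.13), kappa in {1/2, 3/2, 5/2}."""
    d = np.abs(np.subtract.outer(x, xp)) / phi
    if kappa == 0.5:
        return np.exp(-d)                        # Ornstein-Uhlenbeck covariance
    if kappa == 1.5:
        return np.exp(-d) * (1 + d)
    if kappa == 2.5:
        return np.exp(-d) * (1 + d + d ** 2 / 3)
    raise ValueError("only kappa = 1/2, 3/2, 5/2 are implemented here")

x = np.linspace(0.0, 5.0, 50)
for kappa in (0.5, 1.5, 2.5):
    K = matern(x, x, phi=1.0, kappa=kappa)
    # A valid covariance kernel: symmetric, unit diagonal, PSD up to round-off.
    assert np.allclose(K, K.T) and np.allclose(np.diag(K), 1.0)
    assert np.linalg.eigvalsh(K).min() > -1e-8
```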
The AdaSS family of reproducing kernels (r.k.'s) is based on the Green's function
$G_\kappa(x, u) = (x - u)^{\kappa-1}_{+}/(\kappa - 1)!$ with order $\kappa$ and local adaptation function
$\phi(x) > 0$. In Chapter 2.2.1, we have derived the explicit r.k. expressions with
piecewise $\phi^{-1}(x)$. These are nonstationary covariance kernels, with the corresponding
Gaussian processes discussed in Chapter 2.3. In this chapter, we concentrate on a
particular class of AdaSS kernels

$$K(x, x') = \sum_{k=1}^{\min\{k:\, \tau_k > x \wedge x'\}} \phi_k \big( x \wedge x' - \tau_{k-1} \big) \qquad (3.14)$$

with piecewise-constant adaptation $\phi^{-1}(x) = \phi_k$ for $x \in [\tau_{k-1}, \tau_k)$ and knots
$x^{-}_{\min} \equiv \tau_0 < \tau_1 < \cdots < \tau_K \equiv x^{+}_{\max}$. By (2.29), the Gaussian process with
kernel (3.14) generalizes the standard Wiener process with nonstationary volatility,
which is useful for modeling the heterogeneous behavior of an MEV component, e.g. $g(t)$.
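One way to evaluate this kernel, consistent with its Wiener-process interpretation, is as the integral of the piecewise-constant $\phi^{-1}$ up to $x \wedge x'$; with $\phi_k \equiv 1$ this reduces to $\min(x, x')$, the standard Wiener covariance. A sketch under that reading (the knots and adaptation values below are made up):

```python
import numpy as np

def adass_kernel(x, xp, knots, phis):
    """K(x, x') = integral of phi^{-1}(u) over [0, min(x, x')], where
    phi^{-1}(u) = phis[k] on [knots[k], knots[k+1]).  With phis all equal
    to 1 this reduces to min(x, x'), the Wiener-process covariance."""
    s = np.minimum.outer(x, xp)
    K = np.zeros_like(s, dtype=float)
    for k, phi_k in enumerate(phis):
        lo, hi = knots[k], knots[k + 1]
        K += phi_k * np.clip(s - lo, 0.0, hi - lo)   # overlap of [lo,hi) with [0,s]
    return K

x = np.linspace(0.0, 3.0, 7)
knots = [0.0, 1.0, 2.0, 3.0]

# Uniform adaptation recovers the Wiener kernel min(x, x'):
K_wiener = adass_kernel(x, x, knots, phis=[1.0, 1.0, 1.0])
print(np.allclose(K_wiener, np.minimum.outer(x, x)))   # True

# A larger phi_k on [1, 2) makes the process locally more volatile there.
K_adapt = adass_kernel(x, x, knots, phis=[1.0, 5.0, 1.0])
```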
3.3.2 Kriging, Spline and Kernel Methods
Using Gaussian processes to model the component functions in (3.2) gives
us a class of Gaussian process MEV models. Among others, we consider

$$\eta(y(m, t; v)) = \mu(m, t; v) + Z_f(m) + Z_g(t) + Z_h(v) + \varepsilon, \qquad (3.15)$$
where $\varepsilon \sim N(0, \sigma^2)$ is the random error and $Z_f(m), Z_g(t), Z_h(v)$ are three zero-mean
Gaussian processes with covariance kernels $\sigma_f^2 K_f$, $\sigma_g^2 K_g$, $\sigma_h^2 K_h$, respectively. To
bypass the identification problem in Lemma 3.1, the overall mean function is postulated
to satisfy $\mu(m, t; v) \in \mathcal{H}_0$ on the hyperplane of the vintage diagram, where

$$\mathcal{H}_0 \subseteq \mathrm{span}\{1, m, t\} \cap \mathrm{null}\{Z_f(m), Z_g(t), Z_h(v)\}, \qquad (3.16)$$

among other choices. For example, choosing cubic spline kernels for all of $f, g, h$ allows
for $\mathcal{H}_0 = \mathrm{span}\{1, m, t\}$, so that $\mu(m, t; v) = \mu_0 + \mu_1 m + \mu_2 t$. In other situations,
the functional space $\mathcal{H}_0$ might shrink to $\mathrm{span}\{1\}$ in order to satisfy (3.16), so that
$\mu(m, t; v) = \mu_0$. In general, let us denote $\mu(m, t; v) = \mu^T b(m, t; v)$, for $\mu \in \mathbb{R}^d$.
For simplicity, let us assume that $Z_f(m), Z_g(t), Z_h(v)$ are mutually independent
for $x \equiv (m, t, v) \in \mathbb{R}^3$. Consider the sum of univariate Gaussian processes

$$Z(x) = Z_f(m) + Z_g(t) + Z_h(v), \qquad (3.17)$$

with mean zero and covariance $\mathrm{Cov}[Z(x), Z(x')] = \sigma^2 K(x, x')$. By the independence
assumption,

$$K(x, x') = \lambda_f^{-1} K_f(m, m') + \lambda_g^{-1} K_g(t, t') + \lambda_h^{-1} K_h(v, v'), \qquad (3.18)$$
with $\lambda_f = \sigma^2/\sigma_f^2$, $\lambda_g = \sigma^2/\sigma_g^2$ and $\lambda_h = \sigma^2/\sigma_h^2$. Then, given the vintage data (3.1)
with $n = \sum_{j=1}^{J} L_j$ observations, we may write (3.15) in the vector–matrix form

$$y = B\mu + \bar{z}, \quad \mathrm{Cov}[\bar{z}] = \sigma^2 \big[\Sigma + I\big], \qquad (3.19)$$

where $y = [\eta(y_{jl})]_{n \times 1}$, $B = [b^T_{jl}]_{n \times d}$, $\bar{z} = [Z(x_{jl}) + \varepsilon]_{n \times 1}$ and
$\Sigma = [K(x_{jl}, x_{j'l'})]_{n \times n}$.
By either GLS (generalized least squares) or MLE (maximum likelihood estimation),
$\mu$ is estimated to be

$$\hat{\mu} = \Big( B^T \big[\Sigma + I\big]^{-1} B \Big)^{-1} B^T \big[\Sigma + I\big]^{-1} y, \qquad (3.20)$$

where the invertibility of $\Sigma + I$ is guaranteed for large values of $\lambda_f, \lambda_g, \lambda_h$, as in the
context of ridge regression.
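The GLS estimate (3.20) and the unbiased variance estimate of (3.21) below are a few lines of linear algebra. A minimal sketch on simulated data, using a stand-in exponential kernel for $\Sigma$ (all inputs here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40

# Stand-in mean basis B and correlation matrix Sigma from an exponential kernel.
B = np.column_stack([np.ones(n), rng.uniform(size=(n, 2))])
x = np.sort(rng.uniform(0, 5, n))
Sigma = np.exp(-np.abs(np.subtract.outer(x, x)))

# Simulate y = B mu + zbar with Cov[zbar] = sigma^2 (Sigma + I), sigma^2 = 0.3.
mu_true = np.array([1.0, -2.0, 0.5])
y = B @ mu_true + rng.multivariate_normal(np.zeros(n), 0.3 * (Sigma + np.eye(n)))

W = np.linalg.inv(Sigma + np.eye(n))
mu_hat = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)      # GLS estimate (3.20)

r = y - B @ mu_hat
p = np.linalg.matrix_rank(B)
sigma2_hat = (r @ W @ r) / (n - p)                      # unbiased sigma^2, cf. (3.21)
```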
In the Gaussian process models there are additional unknown parameters defining
the covariance kernel (3.18), namely $(\lambda, \theta) = (\lambda_f, \lambda_g, \lambda_h, \theta_f, \theta_g, \theta_h)$, as well as the
unknown variance parameter $\sigma^2$ (a.k.a. the nugget effect). The $\lambda_f, \lambda_g, \lambda_h$, as the
ratios of $\sigma^2$ to the variance scales of $Z_f, Z_g, Z_h$, are usually referred to as the smoothing
parameters. The $\theta_f, \theta_g, \theta_h$ denote the structural parameters, e.g. the scale parameter
$\phi$ and shape parameter $\kappa$ in the exponential or Matérn family of kernels, or the local
adaptation $\phi(\cdot)$ in the AdaSS kernel. By (3.19), and after dropping the constant term,
the log-likelihood function is given by
$$\ell(\mu, \lambda, \theta, \sigma^2) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2}\log\big|\Sigma + I\big| - \frac{1}{2\sigma^2}(y - B\mu)^T(\Sigma + I)^{-1}(y - B\mu),$$

in which $\Sigma$ depends on $(\lambda, \theta)$. Whenever $(\lambda, \theta)$ are fixed, the $\mu$ estimate is given by
(3.20); then one may estimate the variance parameter to be

$$\hat{\sigma}^2 = \frac{1}{n - p}\,(y - B\hat{\mu})^T \big(\Sigma + I\big)^{-1} (y - B\hat{\mu}), \qquad (3.21)$$

where $p = \mathrm{rank}(B)$ is plugged in to make $\hat{\sigma}^2$ an unbiased estimator.
Usually, numerical procedures like the Newton–Raphson algorithm are needed to
iteratively estimate $(\lambda, \theta)$, which admit no closed-form solutions; see Fang, Li and
Sudjianto (2006, §5). Such standard maximum likelihood estimation is computationally
intensive. In the next subsection, we discuss how to iteratively estimate the
parameters by an efficient backfitting algorithm.
MEV Kriging means prediction based on the Gaussian process MEV model
(3.15), where the name ‘Kriging’ follows the geostatistical convention; see Cressie
(1993) in spatial statistics. See also Fang, Li and Sudjianto (2006) for the use of Kriging
in the context of computer experiments. Kriging can be treated equally well by either
multivariate normal distribution theory or a Bayesian approach, since the conditional
expectation of a multivariate normal distribution equals the Bayesian posterior mean.
The former approach is taken here for making predictions from noisy observations,
for which the Kriging behaves as a smoother rather than an interpolator.
Let us denote the prediction of $\eta(\hat{y}(x))$ at $x = (m, t, v)$ by
$\zeta(x) = \hat{\mu}^T b(x) + \mathrm{E}\big[Z_f(m) + Z_g(t) + Z_h(v) + \varepsilon \,\big|\, \bar{z}\big]$, where
$\mathrm{E}[Z \,|\, \bar{z}]$ denotes the conditional expectation of $Z$ given $\bar{z} = y - B\hat{\mu}$. By the
property of the multivariate normal distribution,

$$\hat{\zeta}(x) = \hat{\mu}^T b(x) + \xi^T_n(x)\big[\Sigma + I\big]^{-1}\big(y - B\hat{\mu}\big), \qquad (3.22)$$

where $\xi_n(x)$ denotes the vector $\big[K(x, x_{jl})\big]_{n \times 1}$ evaluated between $x$ and each
sampling point.
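A one-dimensional analogue makes the structure of (3.22) concrete. This sketch (our own, with a single exponential kernel standing in for the three-kernel sum (3.18) and an intercept-only mean) shows the predictor acting as a smoother on noisy data:

```python
import numpy as np

rng = np.random.default_rng(5)
ker = lambda a, b, phi=1.0: np.exp(-np.abs(np.subtract.outer(a, b)) / phi)

# Noisy observations of a smooth curve.
xs = np.linspace(0.0, 6.0, 25)
y = np.sin(xs) + 0.1 * rng.standard_normal(xs.size)

lam = 0.1                              # lambda = sigma^2 / sigma_Z^2
Sigma = ker(xs, xs) / lam              # K / lambda, as in (3.18)
B = np.ones((xs.size, 1))              # b(x) = 1: intercept-only mean
W = np.linalg.inv(Sigma + np.eye(xs.size))
mu_hat = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)       # GLS (3.20)

def zeta_hat(xnew):
    """Kriging predictor (3.22): mean plus xi_n(x)^T (Sigma+I)^{-1} residual."""
    xi = ker(xnew, xs) / lam
    return mu_hat[0] + xi @ W @ (y - B[:, 0] * mu_hat[0])

fit = zeta_hat(np.linspace(0.0, 6.0, 121))
```

Because of the nugget term $I$, the predictor does not interpolate the noisy observations; it smooths them, as noted above.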
The Kriging (3.22) is known to be a best linear unbiased predictor. It also has
a smoothing spline reformulation that covers the additive cubic smoothing splines
model (3.6) as a special case. Formally, we have the following proposition.
Proposition 3.2 (MEV Kriging as Spline Estimator). The MEV Kriging (3.22) can
be formulated as a smoothing spline estimator

$$\arg\min_{\zeta \in \mathcal{H}} \Big\{ \sum_{j=1}^{J} \sum_{l=1}^{L_j} \big[\eta(y_{jl}) - \zeta(x_{jl})\big]^2 + \big\|\zeta\big\|^2_{\mathcal{H}_1} \Big\}, \qquad (3.23)$$

where $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$ with $\mathcal{H}_0 \subseteq \mathrm{span}\{1, m, t\}$, and $\mathcal{H}_1$ is the reproducing
kernel Hilbert space induced by (3.18) such that $\mathcal{H}_0 \perp \mathcal{H}_1$.
The spline solution to (3.23) can be written in the finite-dimensional form

$$\zeta(x) = \mu^T b(x) + \sum_{j=1}^{J} \sum_{l=1}^{L_j} c_{jl} K(x, x_{jl}) \qquad (3.24)$$

with coefficients determined by the regularized least squares problem

$$\min \Big\{ \big\| y - B\mu - \Sigma c \big\|^2 + \big\|c\big\|^2_{\Sigma} \Big\}. \qquad (3.25)$$

Furthermore, we have the following simplification.
Proposition 3.3 (Additive Separability). The MEV Kriging (3.22) can be expressed
additively as

$$\zeta(x) = \mu^T b(x) + \sum_{i=1}^{I} \alpha_i K_f(m, m_i) + \sum_{l=1}^{L} \beta_l K_g(t, t_l) + \sum_{j=1}^{J} \gamma_j K_h(v, v_j) \qquad (3.26)$$

based on the $I$ distinct $m$-values, $L$ distinct $t$-values and $J$ distinct $v$-values in (3.1).
The coefficients can be estimated by

$$\min \Big\{ \big\| y - B\mu - \bar{\Sigma}_f \alpha - \bar{\Sigma}_g \beta - \bar{\Sigma}_h \gamma \big\|^2 + \lambda_f \|\alpha\|^2_{\Sigma_f} + \lambda_g \|\beta\|^2_{\Sigma_g} + \lambda_h \|\gamma\|^2_{\Sigma_h} \Big\}, \qquad (3.27)$$

where the matrix notations are the same as in (3.9) up to mild modifications.
Proposition 3.3 converts the MEV Kriging (3.22) to a kernelized additive model
of the form (3.26). It connects to the kernel methods in machine learning that advocate
the use of covariance kernels for mapping the original inputs to a high-dimensional
feature space. Based on our previous discussions, such a kernelized feature space
corresponds to the family of Gaussian processes, or the RKHS; see e.g. Rasmussen and
Williams (2006) for more details.
Proposition 3.3 also breaks down the high-dimensional parameter estimation, making
possible the development of a backfitting algorithm that deals with three univariate
Gaussian processes separately. Such a backfitting procedure is more efficient than
the standard method of maximum likelihood estimation. Besides, it can easily deal
with the collinearity among $Z_f(m), Z_g(t), Z_h(v)$ caused by the hyperplane dependency
of the vintage diagram.
3.3.3 MEV Backfitting Algorithm
For given $(\lambda, \theta)$, the covariance kernels in (3.18) are defined explicitly, so one
may find the optimal solution to (3.27) by setting the partial derivatives to zero and
solving the resulting linear system of algebraic equations. There are, however,
complications in determining $(\lambda, \theta)$. In this section we propose a backfitting
algorithm to minimize the regularized least squares criterion (3.27), together with a
generalized cross-validation (GCV) procedure for selecting the smoothing parameters
$\lambda_f, \lambda_g, \lambda_h$, as well as the structural parameters $\theta_f, \theta_g, \theta_h$ used for defining the
covariance kernels. Here, the GCV criterion follows our discussion in Chapter 2.2.2;
see also Golub, Heath and Wahba (1979) in the context of ridge regression.
Denote the complete set of parameters by $\Xi = \{\mu;\ \alpha, \lambda_f, \theta_f;\ \beta, \lambda_g, \theta_g;\ \gamma, \lambda_h, \theta_h;\ \sigma^2\}$,
where the $\lambda$'s are smoothing parameters and the $\theta$'s are structural parameters
defining the covariance kernels. Given the vintage data (3.1) with notations $y, B$ in (3.6):
Step 1: Set the initial value of $\mu$ to the ordinary least squares (OLS) estimate
$(B^T B)^{-1} B^T y$; set the initial values of $\beta, \gamma$ to zero.
Step 2a: Given $\Xi$ with $(\alpha, \lambda_f, \theta_f)$ unknown, form the pseudo data
$\bar{y} = y - B\mu - \bar{\Sigma}_g \beta - \bar{\Sigma}_h \gamma$. For each input $(\lambda_f, \theta_f)$, estimate $\alpha$ by
ridge regression

$$\hat{\alpha} = \big( \bar{\Sigma}^T_f \bar{\Sigma}_f + \lambda_f \Sigma_f \big)^{-1} \bar{\Sigma}^T_f\, \bar{y}, \qquad (3.28)$$
where $\bar{\Sigma}_f = \big[K_f(m_{jl}, m_i; \theta_f)\big]_{n \times I}$ and
$\Sigma_f = \big[K_f(m_i, m_{i'}; \theta_f)\big]_{I \times I}$. Determine the best choice of
$(\lambda_f, \theta_f)$ by minimizing

$$\mathrm{GCV}(\lambda_f, \theta_f) = \frac{\frac{1}{n}\big\|\big(I - \bar{S}(\lambda_f, \theta_f)\big)\bar{y}\big\|^2}{\Big[1 - \frac{1}{n}\mathrm{Trace}\big(\bar{S}(\lambda_f, \theta_f)\big)\Big]^2}, \qquad (3.29)$$

where $\bar{S}(\lambda_f, \theta_f) = \bar{\Sigma}_f \big( \bar{\Sigma}^T_f \bar{\Sigma}_f + \lambda_f \Sigma_f \big)^{-1} \bar{\Sigma}^T_f$.
Step 2b: Run Step 2a with the unknown parameters replaced by $(\beta, \lambda_g, \theta_g)$ and
$\bar{y} = y - B\mu - \bar{\Sigma}_f \alpha - \bar{\Sigma}_h \gamma$. Form $\bar{\Sigma}_g, \Sigma_g$ and
$\bar{S}(\lambda_g, \theta_g)$. Select $(\lambda_g, \theta_g)$ by GCV, then estimate $\beta$ by ridge regression.
Step 2c: Run Step 2a with the unknown parameters replaced by $(\gamma, \lambda_h, \theta_h)$ and
$\bar{y} = y - B\mu - \bar{\Sigma}_f \alpha - \bar{\Sigma}_g \beta$. (For the ad hoc approach, remove the
trend of $\bar{y}$ in the vintage direction.) Form $\bar{\Sigma}_h, \Sigma_h$ and $\bar{S}(\lambda_h, \theta_h)$.
Select $(\lambda_h, \theta_h)$ by GCV. Estimate $\gamma$ by ridge regression.
Step 2d: Re-estimate $\mu$ by GLS (3.20) for the given $(\lambda_f, \lambda_g, \lambda_h, \theta_f, \theta_g, \theta_h)$.
Step 3: Repeat Steps 2a–2d until convergence, say, when the estimates of
$\mu, \alpha, \beta, \gamma$ change less than a prespecified tolerance. Obtain $\hat{\sigma}^2$ by (3.21).
For future reference, the above iterative procedure is named the MEV Backfitting
algorithm. It is efficient and can automatically take into account the selection of
both smoothing and structural parameters. After obtaining the parameter estimates,
we may reconstruct the vintage data by
$\hat{y} = B\hat{\mu} + \bar{\Sigma}_f \hat{\alpha} + \bar{\Sigma}_g \hat{\beta} + \bar{\Sigma}_h \hat{\gamma}$. The following
are several remarks for practical implementation:
1. In order to capture a diversity of important features, a combination of different
covariance kernels can be included in the same MEV model. The exponential,
Matérn and AdaSS r.k. families listed in (3.12), (3.13), (3.14) are of primary
interest here, whereas other types of kernels can be readily employed. When
the underlying function is heterogeneously smooth, e.g. involving jumps, one
may choose the AdaSS kernel (3.14).
2. For the ridge regression (3.28), the matrix $\bar{\Sigma}_f$ contains row-wise replicates and
can be reduced by weighted least squares, in which case the response vector
$\bar{y}$ is reduced to the pointwise averages; see (2.4). The evaluation of the GCV
criterion (3.29) can be modified accordingly; see (2.22).
3. When the mean function $\mu(m, t; v)$ is specified to be the intercept effect, i.e.
$\mu(m, t; v) = \mu_0$, one may replace $\mu_0$ by the mean of $y$ in Step 1 and Step 2d. In
this case, we may write the MEV Kriging as $\zeta(x) = \hat{f}(m) + \hat{g}(t) + \hat{h}(v)$ with

$$\hat{f}(m) = \hat{\mu}_0 + \sum_{i=1}^{I} \hat{\alpha}_i K_f(m, m_i), \quad \hat{g}(t) = \sum_{l=1}^{L} \hat{\beta}_l K_g(t, t_l), \quad \hat{h}(v) = \sum_{j=1}^{J} \hat{\gamma}_j K_h(v, v_j).$$

4. In principle, both $\hat{g}(t)$ and $\hat{h}(v)$ in Step 2b and Step 2c have mean zero, but in
practice they may need a centering adjustment due to machine rounding.
3.4 Semiparametric Regression
Having discussed the MEV decomposition models based on Gaussian processes,
we consider in this section the inclusion of covariates in the following two scenarios:
1. $x(m_{jl}, t_{jl}; v_j) \in \mathbb{R}^p$ (or $x_{jl}$ in short) for a fixed segment, where $x_{jl}$ denotes the
covariates of the $j$th vintage at time $t_{jl}$ and age $m_{jl} = t_{jl} - v_j$;
2. $x(m_{kjl}, t_{kjl}; v_{kj}) \in \mathbb{R}^p$ and $z_k \in \mathbb{R}^q$ for multiple segments that share the same
vintage diagram, where $z_k$ represents the static segmentation variables.
Of our interest are the semiparametric MEV regression models with both nonparametric
$f(m), g(t), h(v)$ and parametric covariate effects.
After a brief discussion of the first situation with a single segment, we will spend
more time on the second situation with multiple segments. For simplicity, the
covariates $x_{jl}$ or $x_{kjl}$ are assumed to be deterministic, i.e. there is no error in the
variables.
3.4.1 Single Segment
Given a single segment with dual-time observations $\{y(m_{jl}, t_{jl}; v_j),\ x(m_{jl}, t_{jl}; v_j)\}$
for $j = 1, \ldots, J$ and $l = 1, \ldots, L_j$, we consider the partial linear model that adds
linear covariate effects to the MEV decomposition (3.2),

$$\eta(y_{jl}) = f(m_{jl}) + g(t_{jl}) + h(v_j) + \pi^T x_{jl} + \varepsilon \qquad (3.30)$$
$$\qquad\quad = \mu^T b_{jl} + \pi^T x_{jl} + Z_f(m_{jl}) + Z_g(t_{jl}) + Z_h(v_j) + \varepsilon,$$

where $x_{jl} \equiv x(m_{jl}, t_{jl}; v_j)$, $(\mu, \pi)$ are the parameters of the mean function, and
$f, g, h$ are nonparametric and modeled by Gaussian processes.
The semiparametric regression model (3.30) corresponds to linear mixed-effect
modeling, and the covariate effects $\pi$ can be estimated in the same way we
estimated $\mu$ in (3.20). That is, conditional on $\Sigma$,

$$(\hat{\mu}; \hat{\pi}) = \Big( \bar{B}^T \big[\Sigma + I\big]^{-1} \bar{B} \Big)^{-1} \bar{B}^T \big[\Sigma + I\big]^{-1} y, \qquad (3.31)$$

where $\bar{B} = [B \,\vdots\, X]$ with $X = [x_{jl}]_{n \times p}$. This GLS estimator can be incorporated
into the established MEV Backfitting algorithm discussed in Section 3.3.3. Only Step 1
and Step 2d need to be modified:
Step 1: Set $(\mu, \pi)$ initially to the OLS estimate, i.e. (3.31) with zero $\Sigma$.
Step 2d: Re-estimate $(\mu; \pi)$ by (3.31) for the given $(\lambda_f, \lambda_g, \lambda_h, \theta_f, \theta_g, \theta_h)$.
Then, the whole set of parameters in (3.30) can be iteratively estimated by the
modified backfitting algorithm. A word of caution is in order about multicollinearity,
since the orthogonality condition in Proposition 3.2 may not be satisfied between the
covariate space $\mathrm{span}\{x\}$ and the Gaussian processes. In this case, instead of
starting from the fully nonparametric approach, we may assume the $f, g, h$ components
in (3.30) can be approximated by the kernel basis expansion (3.26), then run
backfitting. Fortunately, the smoothing parameters $\lambda_f, \lambda_g, \lambda_h$ may guard against
possible multicollinearity and give a shrinkage estimate of the covariate effects $\pi$. As
a trade-off, part of the Gaussian process components might be over-smoothed.
3.4.2 Multiple Segments
Suppose we are given $K$ segments of data that share the same vintage diagram,

$$\big\{\, y(m_{kjl}, t_{kjl}; v_{kj}),\ x(m_{kjl}, t_{kjl}; v_{kj}),\ z_k \,\big\}, \qquad (3.32)$$

for $k = 1, \ldots, K$, $j = 1, \ldots, J$ and $l = 1, \ldots, L_j$, where $x_{kjl} \in \mathbb{R}^p$ and
$z_k \in \mathbb{R}^q$. Denote by $N$ the total number of observations across the $K$ segments.
There are various approaches to modeling cross-segment responses, e.g.
a. $\eta(y_{kjl}) = f(m_{kjl}) + g(t_{kjl}) + h(v_{kj}) + \pi^T x_{kjl} + \omega^T z_k + \varepsilon$
b. $\eta(y_{kjl}) = f_k(m_{kjl}) + g_k(t_{kjl}) + h_k(v_{kj}) + \pi^T_k x_{kjl} + \varepsilon \qquad (3.33)$
c. $\eta(y_{kjl}) = u^{(k)}_f f(m_{kjl}) + u^{(k)}_g g(t_{kjl}) + u^{(k)}_h h(v_{kj}) + \pi^T_k x_{kjl} + W(z_k) + \varepsilon$.

Here, the first approach assumes that the segments all share the same MEV components
and covariate effect $\pi$, and differ only through $\omega^T z_k$. The second approach
takes the other extreme assumption, namely that each segment performs marginally
differently in terms of $f, g, h$ and covariate effects; it is therefore equivalent to $K$
separate single-segment models. Clearly, both of these approaches can be treated by the
single-segment modeling technique discussed in the last section.
Of interest here is the third approach, which allows multiple segments to have
different sensitivities $u^{(k)}_f, u^{(k)}_g, u^{(k)}_h \in \mathbb{R}$ to the same underlying
$f(m), g(t), h(v)$. In other words, the functions $f(m), g(t), h(v)$ represent the common
factors across segments, while the coefficients $u^{(k)}_f, u^{(k)}_g, u^{(k)}_h$ represent the
idiosyncratic multipliers. Besides, the model allows for the segment-specific add-on
effects $W(z_k)$ and the segment-specific covariate effects $\pi_k$. Thus, model (c) is
postulated as a balance between the over-stringent model (a) and the over-flexible
model (b).
Different strategies can be applied to model the segment-specific effects $W(z_k)$.
Among others, we consider (c1) the linear modeling $W(z_k) = \omega^T z_k$ with parameter
$\omega \in \mathbb{R}^q$, and (c2) the Gaussian process modeling

$$\mathrm{E}W(z_k) = 0, \quad \mathrm{E}\big[W(z_k) W(z_{k'})\big] = \lambda_W K_W(z_k, z_{k'}), \qquad (3.34)$$

for the smoothing parameter $\lambda_W > 0$ and some prespecified family of covariance
kernels $K_W$. For example, we may use the squared exponential covariance kernel in
(3.12). The estimation of these two types of segment effects is detailed as follows.
Linear Segment Effects.
Given the multi-segment vintage data (3.32), consider

$$\eta(y_{kjl}) = u^{(k)}_f f(m_{kjl}) + u^{(k)}_g g(t_{kjl}) + u^{(k)}_h h(v_{kj}) + \pi^T_k x_{kjl} + \omega^T z_k + \varepsilon \qquad (3.35)$$

with the basis approximation

$$f(m) = \mu_0 + \sum_{i=1}^{I} \alpha_i K_f(m, m_i), \quad g(t) = \sum_{l=1}^{L} \beta_l K_g(t, t_l), \quad h(v) = \sum_{j=1}^{J} \gamma_j K_h(v, v_j)$$
through the exponential or Matérn kernels. (The smoothing spline kernels can also
be used after appropriate modification of the mean function.) In matrix notation,
the model takes the form

$$u_f \odot \big(B\mu + \bar{\Sigma}_f \alpha\big) + u_g \odot \big(\bar{\Sigma}_g \beta\big) + u_h \odot \big(\bar{\Sigma}_h \gamma\big) + X\pi + Z\omega,$$

where $\odot$ denotes the elementwise product, the stacked vectors and matrices are formed
from the multi-segment observations, $(u_f, u_g, u_h)$ are the extended vectors of
segment-wise multipliers, $X$ has $Kp$ columns corresponding to the vectorized
coefficients $\pi = (\pi_1; \ldots; \pi_K)$, and $Z$ is as usual. When both the kernel
parameters $(\lambda, \theta)$ for $K_f, K_g, K_h$ and the (condensed $K$-vector) multipliers
$u_f, u_g, u_h$ are fixed, regularized least squares can be used for parameter estimation,
using the same penalty terms as in (3.27).
Iterative procedures can be used to estimate all the parameters simultaneously, including $\{\mu, \alpha, \beta, \gamma, \lambda, \theta\}$ for the underlying $f, g, h$ functions and $\mathbf{u}_f, \mathbf{u}_g, \mathbf{u}_h, \pi, \omega$ across segments. The following algorithm is constructed by utilizing the MEV Backfitting algorithm as an intermediate routine.

Step 1: Set the initial values of $(\pi, \omega)$ to be the OLS estimates for fitting $\mathbf{y} = \mathbf{X}\pi + \mathbf{Z}\omega + \varepsilon$ based on all $N$ observations across $K$ segments.

Step 2: Run the MEV Backfitting algorithm on the pseudo data $\mathbf{y} - \mathbf{X}\hat{\pi} - \mathbf{Z}\hat{\omega}$ to obtain the estimates $\hat{f}(m), \hat{g}(t), \hat{h}(v)$.

Step 3: Update $(\mathbf{u}_f, \mathbf{u}_g, \mathbf{u}_h)$ and $(\pi, \omega)$ by the OLS estimates for fitting
\[
\mathbf{y} = \mathbf{F}\mathbf{u}_f + \mathbf{G}\mathbf{u}_g + \mathbf{H}\mathbf{u}_h + \mathbf{X}\pi + \mathbf{Z}\omega + \varepsilon
\]
where $\mathbf{F}, \mathbf{G}, \mathbf{H}$ are all $N \times K$ regression submatrices constructed from $\hat{f}(m_{kjl})$, $\hat{g}(t_{kjl})$, $\hat{h}(v_{kj})$, respectively.

Step 4: Repeat Steps 2 and 3 until convergence.
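As a toy illustration of Steps 1-4, the sketch below alternates a stand-in common-factor smoother (binned averaging, in place of the full MEV Backfitting routine) with the OLS update of the segment-wise multipliers and covariate effect. All names, the single common factor and the single covariate are hypothetical simplifications, not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: K segments share one common factor f(m), each with its own
# multiplier u_k, plus one covariate effect pi (all illustrative).
K, n = 3, 200
m = rng.uniform(0, 1, (K, n))
u_true = np.array([0.5, 1.0, 1.5])             # segment-wise multipliers
f = lambda mm: np.sin(2 * np.pi * mm)          # shared common factor
x = rng.normal(size=(K, n))
pi_true = 0.7
y = u_true[:, None] * f(m) + pi_true * x + 0.05 * rng.normal(size=(K, n))

def backfit_common(resid, m, bins=20):
    """Stand-in for the MEV Backfitting routine: estimate the common factor
    by binned averaging over all segments pooled together (u_k = 1)."""
    edges = np.linspace(0, 1, bins + 1)
    idx = np.clip(np.digitize(m.ravel(), edges) - 1, 0, bins - 1)
    level = np.array([resid.ravel()[idx == b].mean() for b in range(bins)])
    return lambda mm: level[np.clip(np.digitize(mm, edges) - 1, 0, bins - 1)]

pi_hat = 0.0                                    # Step 1: initial covariate fit
for _ in range(10):                             # Step 4: iterate to convergence
    f_hat = backfit_common(y - pi_hat * x, m)   # Step 2: backfit on pseudo data
    D = np.zeros((K * n, K + 1))                # Step 3: joint OLS update
    for k in range(K):
        D[k * n:(k + 1) * n, k] = f_hat(m[k])   # F-type submatrix columns
    D[:, K] = x.ravel()
    coef, *_ = np.linalg.lstsq(D, y.ravel(), rcond=None)
    u_hat, pi_hat = coef[:K], coef[K]

print(np.round(u_hat, 1), round(float(pi_hat), 1))
```

With pooled multipliers averaging to one, the binned common factor in Step 2 is roughly unbiased, so the Step 3 regression recovers both the multipliers and the covariate effect.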
Note that in Step 2, the common factors $\hat{f}(m)$, $\hat{g}(t)$ and $\hat{h}(v)$ are estimated first by assuming $u_k = \mathbb{E}u_k = 1$ for each of $f, g, h$. In Step 3, the segment-wise multipliers $u_k$ are updated by regressing on $\hat{f}$, $\hat{g}$ and $\hat{h}$. One reason for this two-stage treatment is estimation stability. Disregarding the segment variability in Step 2 is also reasonable, since $f(m), g(t), h(v)$ are by definition the common factors in the overall sense. Including the segment variability in Step 3 then improves both the estimation of $(\pi, \omega)$ and the estimation of $f, g, h$ in the next iteration of Step 2. Such a fitting procedure is practically interesting and computationally tractable.
Spatial Segment Heterogeneity.
We move on to consider modeling the segment effect by a Gaussian process
\[
\eta(y_{kjl}) = u_f^{(k)} f(m_{kjl}) + u_g^{(k)} g(t_{kjl}) + u_h^{(k)} h(v_{kj}) + W(\mathbf{z}_k) + \pi_k^T \mathbf{x}_{kjl} + \varepsilon \tag{3.36}
\]
where $f, g, h$ are approximated in the same way as in (3.35) and $W(\mathbf{z}_k)$ is assumed to follow (3.34). The orthogonality constraint between $W(\mathbf{z})$ and $\{f(m), g(t), h(v)\}$ can be justified from a tensor-product space point of view, but orthogonality is not guaranteed between $W(\mathbf{z}_k)$ and the span of $\mathbf{x}_{kjl}$. Despite the orthogonality constraint, let us directly approximate $W(\mathbf{z})$ by the basis expansion $W(\mathbf{z}) = \sum_{k=1}^{K} \omega_k K_W(\mathbf{z}, \mathbf{z}_k)$ with the unknown scale parameter (denoted by $\theta_W$). Then, we have the following two iterative stages of model estimation.
Stage 1: for given $\pi_k$ and $u_k \equiv 1$, $k = 1, \ldots, K$, estimate the common-factor functions $f(m), g(t), h(v)$ and the spatial segment effect $W(\mathbf{z}_k)$ in
\[
\eta(y_{kjl}) - \pi_k^T \mathbf{x}_{kjl} = \mu_0 + \sum_{i=1}^{I} \alpha_i K_f(m_{kjl}, m_i) + \sum_{i=1}^{L} \beta_i K_g(t_{kjl}, t_i) + \sum_{i=1}^{J} \gamma_i K_h(v_{kj}, v_i) + \sum_{i=1}^{K} \omega_i K_W(\mathbf{z}_k, \mathbf{z}_i) + \varepsilon
\]
by the regularized least squares
\[
\min \Big\{ \big\| \bar{\mathbf{y}} - \mathbf{B}\mu - \bar{\boldsymbol{\Sigma}}_f \alpha - \bar{\boldsymbol{\Sigma}}_g \beta - \bar{\boldsymbol{\Sigma}}_h \gamma - \bar{\boldsymbol{\Sigma}}_W \omega \big\|^2 + \lambda_f \|\alpha\|^2_{\boldsymbol{\Sigma}_f} + \lambda_g \|\beta\|^2_{\boldsymbol{\Sigma}_g} + \lambda_h \|\gamma\|^2_{\boldsymbol{\Sigma}_h} + \lambda_W \|\omega\|^2_{\boldsymbol{\Sigma}_W} \Big\}, \tag{3.37}
\]
where $\bar{\mathbf{y}}$ is the pseudo response vector of $\eta(y_{kjl}) - \pi_k^T \mathbf{x}_{kjl}$ for all $(k, j, l)$. The regression submatrices $\mathbf{B}, \bar{\boldsymbol{\Sigma}}_f, \bar{\boldsymbol{\Sigma}}_g, \bar{\boldsymbol{\Sigma}}_h, \bar{\boldsymbol{\Sigma}}_W$ are constructed accordingly, all having $N$ rows. The matrices used for regularization are of size $I \times I$, $L \times L$, $J \times J$ and $K \times K$, respectively. Both the smoothing parameters $\lambda_f, \lambda_g, \lambda_h, \lambda_W > 0$ and the structural parameters $\theta_f, \theta_g, \theta_h, \theta_W$ need to be determined.
Stage 2: for given $f, g, h$, estimate the parameters $(\mathbf{u}_f, \mathbf{u}_g, \mathbf{u}_h)$ and $\pi$ based on
\[
\mathbf{y} = \mathbf{F}\mathbf{u}_f + \mathbf{G}\mathbf{u}_g + \mathbf{H}\mathbf{u}_h + \mathbf{X}\pi + \bar{\mathbf{w}}, \qquad \mathrm{Cov}[\bar{\mathbf{w}}] = \sigma^2 \big[ \lambda_W \boldsymbol{\Sigma}_W + \mathbf{I} \big]
\]
where $\boldsymbol{\Sigma}_W = \big[ K_W(\mathbf{z}_k, \mathbf{z}_{k'}) \big]_{N \times N}$ is obtained by evaluating every pair of observations either within or between segments. The parameter estimates can be obtained by generalized least squares.
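A minimal sketch of the Stage-2 generalized least squares is given below. The design matrix stands in for the stacked $[\mathbf{F}\; \mathbf{G}\; \mathbf{H}\; \mathbf{X}]$, and a toy kernel matrix mimics $\boldsymbol{\Sigma}_W$; the whitening-by-Cholesky route is one standard way to solve GLS, not the thesis's code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy GLS: D stands in for the stacked design [F G H X]; the noise covariance
# mimics sigma^2 (lambda_W * Sigma_W + I) with an illustrative kernel matrix.
n, p = 120, 3
D = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 2.0])
t = np.linspace(0, 1, n)
Sigma_W = np.exp(-((t[:, None] - t[None, :]) / 0.2) ** 2)
V = 0.5 * Sigma_W + np.eye(n)                 # lambda_W = 0.5 here
L = np.linalg.cholesky(V)
y = D @ beta_true + 0.1 * (L @ rng.normal(size=n))

# GLS by whitening: OLS on L^{-1} D and L^{-1} y is equivalent to minimizing
# (y - D beta)^T V^{-1} (y - D beta)
Dw = np.linalg.solve(L, D)
yw = np.linalg.solve(L, y)
beta_gls, *_ = np.linalg.lstsq(Dw, yw, rcond=None)
print(np.round(beta_gls, 1))
```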
Obviously, the regularized least squares in Stage 1 extends the MEV Backfitting algorithm to accommodate a fourth kernel component $K_W$. Let us call the extended algorithm MEVS Backfitting, where the added letter "S" stands for the segment effect. Iterating Stage 1 and Stage 2 until convergence, we obtain the following list of effect estimates (as a summary):

1. Common-factor maturation curve: $\hat{f}(m) = \hat{\mu}_0 + \sum_{i=1}^{I} \hat{\alpha}_i K_f(m, m_i)$

2. Common-factor exogenous influence: $\hat{g}(t) = \sum_{l=1}^{L} \hat{\beta}_l K_g(t, t_l)$

3. Common-factor vintage heterogeneity: $\hat{h}(v) = \sum_{j=1}^{J} \hat{\gamma}_j K_h(v, v_j)$

4. Idiosyncratic multipliers to $(\hat{f}, \hat{g}, \hat{h})$: $u_f^{(k)}, u_g^{(k)}, u_h^{(k)}$, for $k = 1, \ldots, K$

5. Segment-specific covariate effects: $\pi_k$, for $k = 1, \ldots, K$

6. Spatial segment heterogeneity: $\hat{W}(\mathbf{z}) = \sum_{k=1}^{K} \hat{\omega}_k K_W(\mathbf{z}, \mathbf{z}_k)$.
3.5 Applications in Credit Risk Modeling
This section presents applications of the MEV decomposition methodology to the real data of (a) Moody's speculative-grade corporate default rates shown in Table 1.1, and (b) the tweaked sample of retail loan loss rates discussed in Chapter II. Both examples demonstrate the rising challenges in analyzing credit risk data with dual-time coordinates, to which we make only an initial attempt based on the MEV Backfitting algorithm. Our focus is first on understanding (or reconstructing) the marginal effects, and then on performing data smoothing on the dual-time domain. To test the MEV decomposition modeling, we begin with a simulation study based on synthetic vintage data.
3.5.1 Simulation Study
Let us synthesize the year-2000 to year-2008 monthly vintages according to the diagram shown in Figure 1.3. Assume each vintage is originated at the very beginning of the month, and its follow-up performance is observed at each month end. Let the horizontal observation window be left-truncated at the beginning of 2005 (time 0) and right-truncated at the end of 2008 (time 48). Vertically, all the vintages are observed up to 60 months-on-book. This yields the rectangular diagram shown in Figure 3.1, consisting of vintages $j = -60, \ldots, -1$ originated before 2005 and $j = 0, 1, \ldots, 47$ originated after 2005. For each vintage $j$ with origination time $v_j$, the dual-time responses are simulated by
\[
\log(y_{jl}) = f_0(m_{jl}) + g_0(t_{jl}) + h_0(v_j) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2) \tag{3.38}
\]
where $t_{jl}$ runs from $\max\{v_j, 0\}$ to 48 and $m_{jl} = t_{jl} - v_j$. Noise aside, the response $y_{jl}$ is the product of three marginal effects, $\exp(f_0(m))\exp(g_0(t))\exp(h_0(v))$. In our simulation setting, the underlying truth $\exp(f_0(m))$ takes the smooth form of the inverse Gaussian hazard rate function (1.13) with parameters $c = 6$ and $b = -0.02$; the underlying $\exp(g_0(t))$ is volatile, with exponential growth and a jump at $t = 23$; and the underlying $h_0(v)$ is assumed to be a dominating sine wave within $[0, 40]$ and relatively flat elsewhere. See their plots in Figure 3.2 (top panel). Adding noise with $\sigma = 0.1$, we obtain the synthetic vintage data, whose projection views in lifetime, calendar and vintage origination time are shown in the second panel of Figure 3.2. One may check the pointwise averages (or the medians in the vintage boxplots) in each projection view and compare them with the underlying truths to see the contamination effects from the other marginals.
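The generation scheme (3.38) can be sketched as follows. The three marginal functions below are simplified hypothetical stand-ins; the thesis's exact inverse Gaussian hazard, jump location and sine-wave shapes are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative stand-ins for the marginals in (3.38)
f0 = lambda m: np.log(0.02 + 0.001 * np.sqrt(m))       # smooth maturation
g0 = lambda t: 0.01 * t + 0.3 * (t >= 23)              # growth with a jump
h0 = lambda v: 0.2 * np.sin(v * np.pi / 20) if 0 <= v <= 40 else 0.0

sigma = 0.1
rows = []
for v in range(-60, 48):                               # monthly vintages
    for t in range(max(v, 0), 49):                     # truncated window [0, 48]
        m = t - v
        if 0 < m <= 60:                                # up to 60 months-on-book
            y = np.exp(f0(m) + g0(t) + h0(v) + sigma * rng.normal())
            rows.append((v, m, t, y))

data = np.array(rows)                                  # columns: v, m, t, y
print(data.shape)
```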
These synthetic data can be used to demonstrate the flexibility of the MEV decomposition technique, in that a combination of different kernels can be employed. We manually choose the order-3/2 Matérn kernel (3.13) for $f(m)$, the AdaSS r.k. (3.14) for $g(t)$, and the squared exponential kernel (3.12) for $h(v)$, then run the MEV Backfitting algorithm on the log-transformed data. Since our data has very few observations at both ends of the vintage direction, we performed binning for $v \geq 40$, as well as binning for $v \leq 0$ based on the prior knowledge that the pre-2005 vintages can be treated as a single bucket. The MEV Backfitting algorithm then outputs the estimated marginal effects plotted in Figure 3.2 (bottom panel). They match the underlying truth except for slight boundary bias.

The GCV-selected structural and smoothing parameters are tabulated in Table 3.1. In fitting $g(t)$ with AdaSS, we specified the knots $\tau_k$ to handle the jump, diagnosed from the raw marginal plots; for simplicity we enforced $\phi_k$ to be the same for the regions without a jump. It is interesting to check the ordering of the cross-validated smoothing parameters. By (3.18), the smoothing parameters correspond to the reciprocal ratios of their variance scales to the nugget effect $\sigma^2$; therefore, the smaller the smoothing parameter, the larger the variation of the corresponding Gaussian process. Thus, the ordering of the cross-validated $\lambda_f > \lambda_h > \lambda_g$ confirms the smoothness assumptions on $f_0, g_0, h_0$ in our simulation setting.
Figure 3.2: Synthetic vintage data analysis: (top) underlying true marginal effects; (2nd) simulation with noise; (3rd) MEV Backfitting algorithm upon convergence; (bottom) estimation compared to the underlying truth.

Table 3.1: Synthetic vintage data analysis: MEV modeling exercise with GCV-selected structural and smoothing parameters.

MEV    Kernel choice          Structural parameter          Smoothing parameter
f(m)   Matérn (κ = 3/2)       φ = 2.872                     20.086
g(t)   AdaSS r.k. Eq.(3.14)   τ_k = 21, 25, 47;             1.182
                              φ_k = 0.288, 8.650, 0.288
h(v)   Squared exponential    φ = 3.000                     7.389

Figure 3.3: Synthetic vintage data analysis: (top) projection views of fitted values; (bottom) data smoothing on vintage diagram.

Combining the three marginal estimates and taking the inverse transform gives the fitted rates; see Figure 3.3 for the projection views. The noise-removed reconstruction can reveal the prominent marginal features. It is clear that the MEV decomposition modeling performs like a smoother on the dual-time domain. One may also compare the level plots of the raw data versus the reconstructed data in Figure 3.3 to see the dual-time smoothing effect on the vintage diagram.
3.5.2 Corporate Default Rates
In Figure 1.4 of Chapter I we previewed the annual corporate default rates (in percentage) of Moody's speculative-grade cohorts, 1970-2008. Each yearly cohort is regarded as a vintage. For vintages originated earlier than 1988, the default rates are observed up to a maximum of 20 years in lifetime; for vintages originated after 1988, the default rates are observed up to the end of year 2008. Such truncation in both lifetime and calendar time corresponds to the trapezoidal prototype of the vintage diagram illustrated in Figure 3.1.

Figure 1.4 gives the projection views of the empirical default rates in lifetime, calendar and vintage origination time. Checking the marginal averages or medians plotted on top of each view, it is evident that the exogenous influence of the macroeconomic environment fluctuates more than both the maturation and vintage effects. However, these marginal plots are contaminated by each other; for example, the many spikes shown in the lifetime projection actually correspond to exogenous effects. It is our purpose to separate these marginal effects.

We performed slight data preprocessing. First, the transform function is pre-specified to be $\eta(y) = \log(y/100)$, where $y$ is the raw percentage measurement and zero responses are treated as missing values (one may also match them to a small positive value). Second, since there are very limited observations at small calendar time, the exogenous effects $g(t)$ for $t = 2, 3$ are merged into $t = 4$, and the leftmost observation at $t = 1$ is removed because of its jumpy behavior. Besides, the vintage effects are merged for large $v = 30, \ldots, 39$.
Figure 3.4: Results of MEV Backﬁtting algorithm upon convergence (right panel)
based on the squared exponential kernels. Shown in the left panel is the
GCV selection of smoothing and structural parameters.
Figure 3.5: MEV fitted values: Moody's-rated corporate default rates
To run the MEV Backfitting algorithm, $K_f, K_g, K_h$ are all chosen to be squared exponential covariance kernels, for which the mean function is fixed to be the intercept effect. For each univariate Gaussian process fitting, say $Z_f(m)$, the GCV criterion selects the smoothing parameter $\lambda_f$ as well as the structural parameter $\theta_f$ (i.e., the scale parameter $\phi$ in (3.12)). For demonstration, we used a grid search on the log scale of $\lambda$ with only 3 choices of $\phi = 1, 2, 3$ ($\phi = 3$ is set as the largest scale for stability; otherwise the reciprocal condition number of the ridged regression matrix would vanish). The backfitting results upon convergence are shown in Figure 3.4, where GCV selects the largest scale parameter in each iteration. An interesting result is the GCV-selected smoothing parameter $\lambda$, which turns out to be 9.025, 0.368 and 6.050 for $Z_f, Z_g, Z_h$, respectively. By (3.18), this implies that the exogenous influence has the largest variation, followed by the vintage heterogeneity, while the maturation curve has the least variation. Furthermore, the changes in the estimated exogenous effects in Figure 3.4 follow the NBER recession dates shown in Figure 1.1.
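The grid search over $(\phi, \lambda)$ can be sketched generically as follows; the kernel form, the GCV-score variant and the grids are illustrative stand-ins rather than the exact (3.12) and the thesis's selection code.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80
m = np.sort(rng.uniform(0, 10, n))
y = np.log(0.05 + 0.02 * m) + 0.1 * rng.normal(size=n)
y = y - y.mean()                                   # center in lieu of intercept

def gcv(phi, lam):
    """GCV score for a kernel ridge smoother with a squared exponential
    kernel; a generic stand-in for the thesis's (3.12)."""
    K = np.exp(-((m[:, None] - m[None, :]) / phi) ** 2)
    S = K @ np.linalg.inv(K + lam * np.eye(n))     # smoother (hat) matrix
    r = y - S @ y
    return n * (r @ r) / (n - np.trace(S)) ** 2

grid = [(phi, lam) for phi in (1.0, 2.0, 3.0)
        for lam in np.exp(np.arange(-2.0, 4.0))]   # log-scale grid for lambda
phi_best, lam_best = min(grid, key=lambda p: gcv(*p))
print(phi_best, round(float(lam_best), 3))
```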
The inverse transform $\eta^{-1}(\cdot)$ maps the fitted values back to the original percentage scale. Figure 3.5 plots the fitted values in both lifetime and calendar views. Compared to the raw plots in Figure 1.4, the MEV modeling retains some of the most interesting features while smoothing out others.
3.5.3 Retail Loan Loss Rates
Recall the tweaked credit-risk sample discussed in Chapter II. It was used as a motivating example for the development of the adaptive smoothing spline in fitting the pointwise averages that are supposed to be smooth in lifetime (month-on-book, in this case). However, the raw responses behave rather abnormally at large month-on-book, partly because of the small number of replicates, and partly because of the contamination by the exogenous macroeconomic influence. In this section, the same tweaked sample is analyzed on the dual-time domain, for which we model not only the maturation curve, but also the exogenous influence and the vintage heterogeneity effects.

The tweaked sample of dual-time observations is supported on the triangular vintage diagram illustrated in Figure 3.1, which is most typical in retail risk management. The observations measure the loss rates in retail revolving exposures from January 2000 to December 2006. Figure 3.6 (top panel) shows the projection views in lifetime, calendar and vintage origination time, where the new vintages originated in 2006 are excluded since they have too short a window of performance measurements. Some of the interesting features of such vintage data are listed below:

1. The rates are fixed to be zero for small month-on-book $m \leq 4$.

2. There are fewer and fewer observations as the month-on-book $m$ increases.

3. There are fewer and fewer observations as the calendar month $t$ decreases.

4. The rates are relatively smooth in $m$, from the lifetime view.

5. The rates are rather dynamic and volatile in $t$, from the calendar view.

6. The rates show heterogeneity across vintages, from the vintage view.
To model the loss rates, time-scale binning was performed first for large month-on-book $m \geq 72$ and small calendar time $t \leq 12$. We pre-specified the log transform for the positive responses (upon removal of the zero responses). The MEV decomposition model takes the form $\log(y_{jl}) = f(m_{jl}) + g(t_{jl}) + h(v_j) + \varepsilon$, subject to $\mathbb{E}g = \mathbb{E}h = 0$. In this way, the original responses are decomposed multiplicatively into $e^{f(m)}$, $e^{g(t)}$ and $e^{h(v)}$, where the exponentiated maturation curve can be regarded as the baseline, together with the exponentiated exogenous and vintage heterogeneity multipliers.
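The constraint $\mathbb{E}g = \mathbb{E}h = 0$ amounts to a simple recentering, sketched below with toy fitted values: the means of $\hat{g}$ and $\hat{h}$ are absorbed into the baseline so that the multipliers average to one on the log scale. The numbers are hypothetical.

```python
import numpy as np

# Toy fitted values of f, g, h on their respective grids (illustrative only)
f_hat = np.array([-4.0, -3.5, -3.2, -3.1])
g_hat = np.array([0.3, -0.1, 0.4, 0.2, -0.3])
h_hat = np.array([0.15, -0.05, 0.1])

# Recenter g and h; absorb the removed means into the baseline f
g_c = g_hat - g_hat.mean()
h_c = h_hat - h_hat.mean()
f_c = f_hat + g_hat.mean() + h_hat.mean()

# Exponentiate: baseline rate and multiplicative adjustments around 1
base, mult_g, mult_h = np.exp(f_c), np.exp(g_c), np.exp(h_c)
print(np.isclose(np.log(mult_g).mean(), 0.0), np.isclose(np.log(mult_h).mean(), 0.0))
```

The fitted rate at any cell is unchanged: the sum $f_c + g_c + h_c$ over any $(m, t, v)$ combination equals the original $f_{\hat{}} + g_{\hat{}} + h_{\hat{}}$.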
Figure 3.6: Vintage data analysis of retail loan loss rates: (top) projection views of empirical loss rates in lifetime m, calendar t and vintage origination time v; (bottom) MEV decomposition effects $\hat{f}(m)$, $\hat{g}(t)$ and $\hat{h}(v)$ (on the log scale).

In this MEV modeling exercise, the Matérn kernels were chosen for running the MEV Backfitting algorithm, and the GCV criterion was used to select the smoothing and structural parameters. The fitting results are presented in Figure 3.6 (bottom panel), which shows the estimated $\hat{f}(m)$, $\hat{g}(t)$, $\hat{h}(v)$ on the log response scale. Each of these estimated functions follows the general dynamic pattern of the corresponding marginal averages in the empirical plots above. The differences in local regions reflect the improvements in each marginal direction after removing the contamination of the other marginals. Compared to both the exogenous and vintage heterogeneity effects, the maturation curve is rather smooth. It is also worth pointing out that the exogenous spike is captured relatively well by the Matérn kernel.
3.6 Discussion
For the vintage data that emerge in credit risk management, we propose an MEV decomposition framework to estimate the maturation curve, exogenous influence and vintage heterogeneity on the dual-time domain. One difficulty associated with vintage data is the intrinsic identification problem due to their hyperplane nature, which also appears in conventional cohort analysis under the age-period-cohort model. To regularize the three-way marginal effects, we have studied nonparametric smoothing based on Gaussian processes, and demonstrated its flexibility through the choice of covariance kernels, structural and smoothing parameters. An efficient MEV Backfitting algorithm is provided for model estimation. The new technique is then tested through a simulation study and applied to analyze both examples of corporate default rates and retail loss rates, for which some preliminary results are presented.
Beyond the initial attempt made in this chapter, there are other open problems associated with vintage data analysis (VDA) worthy of future investigation.

1. Dual-time model assessment and validation remains an issue. In the MEV Backfitting algorithm we have used leave-one-out cross-validation for fitting each marginal component function. This does not, however, directly perform cross-validation on the dual-time domain, which would be an interesting subject of future study.
2. Model forecasting is often a practical need in financial risk management, as it concerns future uncertainty. To make predictions within the MEV modeling framework, one needs to extrapolate $f(m), g(t), h(v)$ in each marginal direction in order to learn their behavior beyond the training bounds of $m, t, v$. By Kriging theory, for any $\mathbf{x} = (m, t; v)$ on the vintage diagram, the formula (3.22) provides the conditional expectation given the historical performance. Furthermore, it is straightforward to simulate random paths for a sequence of inputs $(\mathbf{x}_1, \ldots, \mathbf{x}_p)$ based on the multivariate conditional normal distribution $N_p(\boldsymbol{\mu}^{(c)}, \boldsymbol{\Sigma}^{(c)})$, where the conditional mean and covariance are given by
\[
\begin{cases}
\boldsymbol{\mu}^{(c)} = \mathbf{B}\hat{\boldsymbol{\mu}} + \boldsymbol{\Upsilon}\big[\boldsymbol{\Sigma} + \mathbf{I}\big]^{-1}\big(\mathbf{y} - \mathbf{B}\hat{\boldsymbol{\mu}}\big) \\
\boldsymbol{\Sigma}^{(c)} = \boldsymbol{\Sigma} - \boldsymbol{\Upsilon}\big[\boldsymbol{\Sigma} + \mathbf{I}\big]^{-1}\boldsymbol{\Upsilon}^T
\end{cases}
\quad \text{with} \quad
\begin{cases}
\boldsymbol{\Upsilon} = \big[K(\mathbf{x}_i, \mathbf{x}_{jl})\big]_{p \times n} \\
\boldsymbol{\Sigma} = \big[K(\mathbf{x}_i, \mathbf{x}_j)\big]_{p \times p}
\end{cases} \tag{3.39}
\]
using the aggregated kernel given by (3.18) and the notation in (3.19). However, a word of caution is warranted about predicting the future from historical training data. In the MEV modeling framework, extrapolating the maturation curve $f(m)$ in lifetime is relatively safer than extrapolating the exogenous and vintage effects $g(t)$ and $h(v)$, since $f(m)$ is of a self-maturation nature, while $g(t)$ can be interfered with by future macroeconomic changes and $h(v)$ is also subject to future changes in vintage origination policy. Before making forecasts in terms of calendar time, an out-of-sample test is recommended.
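A one-dimensional sketch of the conditional simulation in (3.39) is given below, with an illustrative squared exponential kernel and nugget; the toy variables map onto $\boldsymbol{\Upsilon}$, the training covariance, and the conditional moments.

```python
import numpy as np

rng = np.random.default_rng(4)

# 1-D sketch of the conditional (Kriging) simulation in (3.39): condition a
# Gaussian process on noisy training data, then draw random paths at new
# inputs. The kernel, scales and nugget are illustrative choices.
k = lambda a, b: np.exp(-((a[:, None] - b[None, :]) / 1.5) ** 2)

x_tr = np.linspace(0, 10, 30)
y_tr = np.sin(x_tr) + 0.1 * rng.normal(size=30)
x_new = np.linspace(10, 12, 10)                 # extrapolation inputs

Ups = k(x_new, x_tr)                            # Upsilon, p x n cross-covariance
A = np.linalg.inv(k(x_tr, x_tr) + 0.01 * np.eye(30))  # (Sigma + I)^{-1} analogue
mu_c = Ups @ A @ y_tr                           # conditional mean
Sig_c = k(x_new, x_new) - Ups @ A @ Ups.T       # conditional covariance

# Draw 5 conditional random paths N_p(mu_c, Sig_c) via Cholesky (with jitter)
L = np.linalg.cholesky(Sig_c + 1e-6 * np.eye(10))
paths = mu_c + (L @ rng.normal(size=(10, 5))).T
print(paths.shape)
```

As the new inputs move away from the training range, the conditional variance grows back toward the prior, which is the quantitative face of the extrapolation caution above.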
3. For the extrapolation of MEV component functions, a parametric modeling approach is sometimes more tractable than using Gaussian processes. For example, given the prior knowledge that the maturation curve $f(m)$ follows the shape of the inverse Gaussian hazard rate (1.13), as in our synthetic case in Section 3.5, one may estimate the parametric $f(m)$ with nonlinear regression techniques. As for the exogenous effect $g(t)$, which is usually dynamic and volatile, one may use parametric time series models; see e.g. Fan and Yao (2003). Such parametric approaches would benefit loss forecasting as they become more and more validated through experience.
4. The MEV decomposition considered so far is simplified to exclude interaction effects involving two or all of the age $m$, time $t$ and vintage $v$ arguments. In formulating the models by Gaussian processes, mutual independence among $Z_f(m), Z_g(t), Z_h(v)$ is also postulated. One idea for taking the interaction effects into account is to cross-correlate $Z_f(m), Z_g(t), Z_h(v)$. Consider the combined Gaussian process (3.17) with covariance
\[
\begin{aligned}
\mathrm{Cov}&\big[Z_f(m) + Z_g(t) + Z_h(v),\; Z_f(m') + Z_g(t') + Z_h(v')\big] \\
&= \mathbb{E}Z_f(m)Z_f(m') + \mathbb{E}Z_f(m)Z_g(t') + \mathbb{E}Z_f(m)Z_h(v') \\
&\quad + \mathbb{E}Z_g(t)Z_f(m') + \mathbb{E}Z_g(t)Z_g(t') + \mathbb{E}Z_g(t)Z_h(v') \\
&\quad + \mathbb{E}Z_h(v)Z_f(m') + \mathbb{E}Z_h(v)Z_g(t') + \mathbb{E}Z_h(v)Z_h(v'),
\end{aligned}
\]
which reduces to (3.18) when the cross-correlations between $Z_f(m)$, $Z_g(t)$ and $Z_h(v)$ are all zero. In general, it is straightforward to incorporate the cross-correlation structure and follow immediately the MEV Kriging procedure. For example, one of our works in progress is to assume that
\[
\mathbb{E}Z_g(t)Z_h(v) = \frac{\sigma^2}{\sqrt{\lambda_g \lambda_h}} K_{gh}(t, v), \quad \text{where } K_{gh}(t, v) = \rho I(t \geq v),\; \rho \geq 0,
\]
in order to study the interaction effect between the vintage heterogeneity and the exogenous influence. Results will be presented in a future report.
Besides the open problems listed above, the scope of VDA can also be generalized to study non-continuous types of observations. For the examples considered in Section 3.5, the Moody's default rates were actually calculated from raw binary indicators (of default or not), while the loss rates in the synthetic retail data can be viewed as the ratio of loss units over total units. Thus, given the raw binary observations or unit counts, one may consider generalized MEV models upon the specification of a link function, in analogy to generalized linear models (McCullagh and Nelder, 1989). The transform $\eta$ in (3.2) plays a similar role to such a link function, except that it operates on rates that are assumed to be directly observed.
One more extension of VDA is to analyze the time-to-default data on the vintage diagram, which is deferred to the next chapter, where we develop the dual-time survival analysis.
3.7 Technical Proofs
Proof of Lemma 3.1: Since $m - t + v = 0$ on the vintage diagram $\Omega$, at least one of the linear parts $f_0, g_0, h_0$ is not identifiable, since otherwise $f_0(m) - m$, $g_0(t) + t$, $h_0(v) - v$ would always satisfy (3.2).
Proof of Proposition 3.2: By applying the general spline theorem to (3.23), the target function has a finite-dimensional representation $\zeta(\mathbf{x}) = \mu_0 + \mu_1 m + \mu_2 t + \mathbf{c}^T \boldsymbol{\xi}_n$, where $\mathbf{c}^T \boldsymbol{\xi}_n = \sum_{j=1}^{J}\sum_{l=1}^{L_j} c_{jl} K(\mathbf{x}, \mathbf{x}_{jl})$. Substituting it back into (3.23) and using the reproducing property $\langle K(\cdot, \mathbf{x}_{jl}), K(\cdot, \mathbf{x}_{j'l'})\rangle_{\mathcal{H}_1} = K(\mathbf{x}_{jl}, \mathbf{x}_{j'l'})$, we get
\[
\min \Big\{ \big\|\mathbf{y} - \mathbf{B}\boldsymbol{\mu} - \boldsymbol{\Sigma}\mathbf{c}\big\|^2 + \|\mathbf{c}\|^2_{\boldsymbol{\Sigma}} \Big\}
\]
using the notations in (3.19). Taking the partial derivatives w.r.t. $\boldsymbol{\mu}$ and $\mathbf{c}$ and setting them to zero gives the solution
\[
\hat{\boldsymbol{\mu}} = \big[\mathbf{B}^T(\boldsymbol{\Sigma} + \mathbf{I})^{-1}\mathbf{B}\big]^{-1}\mathbf{B}^T(\boldsymbol{\Sigma} + \mathbf{I})^{-1}\mathbf{y}, \qquad \hat{\mathbf{c}} = \big[\boldsymbol{\Sigma} + \mathbf{I}\big]^{-1}(\mathbf{y} - \mathbf{B}\hat{\boldsymbol{\mu}}),
\]
which leads $\zeta(\mathbf{x})$ to have the same form as the Kriging predictor (3.22).
Proof of Proposition 3.3: By (3.18) and comparing (3.24) and (3.26), it is not hard to derive the relationships between the two sets of kernel coefficients,
\[
\alpha_i = \lambda_f^{-1}\sum_{j,l} c_{jl} I(m_{jl} = m_i), \quad \beta_l = \lambda_g^{-1}\sum_{j,l} c_{jl} I(t_{jl} = t_l), \quad \gamma_j = \lambda_h^{-1}\sum_{l=1}^{L_j} c_{jl} \tag{3.40}
\]
which aggregate $c_{jl}$ over the replicated kernel bases for every $m_i$, $t_l$, $v_j$, respectively. Then, using the new coefficient vectors $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_I)^T$, $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_L)^T$ and $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_J)^T$, we have
\[
\boldsymbol{\Sigma}\mathbf{c} = \bar{\boldsymbol{\Sigma}}_f\boldsymbol{\alpha} + \bar{\boldsymbol{\Sigma}}_g\boldsymbol{\beta} + \bar{\boldsymbol{\Sigma}}_h\boldsymbol{\gamma}, \qquad \mathbf{c}^T\boldsymbol{\Sigma}\mathbf{c} = \lambda_f\boldsymbol{\alpha}^T\boldsymbol{\Sigma}_f\boldsymbol{\alpha} + \lambda_g\boldsymbol{\beta}^T\boldsymbol{\Sigma}_g\boldsymbol{\beta} + \lambda_h\boldsymbol{\gamma}^T\boldsymbol{\Sigma}_h\boldsymbol{\gamma}.
\]
Plugging these into (3.25) leads to (3.27), as asserted.
CHAPTER IV
Dual-time Survival Analysis
4.1 Introduction
Consider in this chapter the default events naturally observed on the dual-time domain: lifetime (or age) in one dimension and calendar time in the other. Of interest is the time-to-default from multiple origins, such that events happening at the same calendar time may differ in age. Age could be the lifetime of a person in the common sense, and it could also refer to the elapsed time since the initiation of a treatment. In credit risk modeling, age refers to the duration of a credit account since its origination (either a corporate bond or a retail loan). Accounts having the same origin share the same age at any follow-up calendar time, and group together as a single vintage.
Refer to Chapter I for the basics of survival analysis on the lifetime scale, where the notion of hazard rate is also referred to as the default intensity in the credit risk context. As introduced in Chapter 1.4, dual-time observations of survival data subject to truncation and censoring can be graphically represented by the Lexis diagram (Keiding, 1990). To begin with, let us simulate a Lexis diagram of dual-time-to-default observations:
\[
(V_j, U_{ji}, M_{ji}, \Delta_{ji}), \quad i = 1, \ldots, n_j, \; j = 1, \ldots, J \tag{4.1}
\]
where $V_j$ denotes the origination time of the $j$th vintage, which consists of $n_j$ accounts. For each account, $U_{ji}$ denotes the time of left truncation, $M_{ji}$ denotes the lifetime of event termination, and $\Delta_{ji}$ indicates the status of termination, such that $\Delta_{ji} = 1$ if the default event occurs prior to getting censored and 0 otherwise. Let us fix the rectangular vintage diagram with age $m \in [0, 60]$, calendar time $t \in [0, 48]$ and vintage origination $V_j = -60, \ldots, -1, 0, 1, \ldots, 47$, the same as the synthetic vintage data in Chapter 3.5. Thus, the observations are left-truncated at time 0. For the censoring mechanism, besides the systematic censoring with $m_{\max} = 60$ and $t_{\max} = 48$, an independent random censoring with constant hazard rate $\lambda_{\mathrm{att}}$ is used to represent account attrition (i.e., a memoryless exponential lifetime distribution).
Denote by $\lambda_v(m, t)$ or $\lambda_j(m, t)$ the underlying dual-time hazard rate of vintage $v = V_j$ at lifetime $m$ and calendar time $t$. Then, one may simulate the dual-time-to-default data (4.1) according to:
\[
\begin{aligned}
\tau_{ji} &= \inf\Big\{ m : \int_0^m \lambda_j(s, V_j + s)\,ds \geq -\log X \Big\}, \; X \sim \mathrm{Unif}(0, 1] && \text{(default)} \\
C_{ji} &= \inf\big\{ m : \lambda_{\mathrm{att}}\, m \geq -\log X \big\}, \; X \sim \mathrm{Unif}(0, 1] && \text{(attrition)} \\
U_{ji} &= \max\{\bar{U}_{ji}, V_j\} - V_j, \; \text{with } \bar{U}_{ji} \equiv 0 && \text{(left truncation)} \\
\text{Book } & (V_j, U_{ji}, M_{ji}, \Delta_{ji}) \text{ if } U_{ji} < \min\{\tau_{ji}, C_{ji}\}: && (4.2) \\
M_{ji} &= \min\{\tau_{ji}, C_{ji}, m_{\max}, t_{\max} - V_j\} && \text{(termination time)} \\
\Delta_{ji} &= I\big(\tau_{ji} \leq \min\{C_{ji}, m_{\max}, t_{\max} - V_j\}\big) && \text{(default indicator)}
\end{aligned}
\]
for each given $V_j = -60, \ldots, -1, 0, 1, \ldots, 47$. For each vintage $j$, one may book $n_j$ originated accounts by repeatedly running (4.2) $n_j$ times independently. Let us begin with $n_j \equiv 1000$ and specify the attrition rate $\lambda_{\mathrm{att}} = 50\,\mathrm{bp}$ (a basis point is 0.01%). Among other types of dual-time default intensities, assume the multiplicative form
\[
\lambda_v(m, t) = \lambda_f(m)\lambda_g(t)\lambda_h(v) = \exp\big\{f(m) + g(t) + h(v)\big\} \tag{4.3}
\]
where the maturation curve $f(m)$, the exogenous influence $g(t)$ and the unobserved heterogeneity $h(v)$ are interpreted in the sense of Chapter III, and are assumed to follow Figure 3.2 (top panel). For simulating the left-truncated survival accounts, we assume zero exogenous influence on their historical hazards.

Figure 4.1: Dual-time-to-default: (left) Lexis diagram of subsampled simulations; (right) empirical hazard rates in either lifetime or calendar time.
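The booking scheme (4.2) can be sketched per account as follows. The toy hazard below is a stand-in for (4.3), with default times drawn by inverting the cumulative hazard over unit (monthly) steps; the extra window check for old vintages is an assumption added for this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)

m_max, t_max, lam_att = 60, 48, 0.005            # 50bp monthly attrition hazard

def hazard(m, t, v):
    """Illustrative multiplicative intensity standing in for (4.3)."""
    return 0.01 * np.exp(0.02 * np.sqrt(m) + 0.1 * np.sin(t / 8))

def book_account(v):
    # Default time: invert the cumulative hazard against an Exp(1) draw
    e = -np.log(rng.uniform())
    cum, tau = 0.0, np.inf
    for m in range(1, m_max + 1):
        cum += hazard(m, v + m, v)
        if cum >= e:
            tau = m
            break
    c = -np.log(rng.uniform()) / lam_att         # exponential attrition time
    u = max(0, v) - v                            # left truncation at time 0
    m_term = min(tau, c, m_max, t_max - v)
    if u >= min(tau, c) or u >= m_term:          # never observed in the window
        return None
    delta = int(tau <= min(c, m_max, t_max - v))
    return (v, u, m_term, delta)

books = [b for v in range(-60, 48) for b in [book_account(v)] if b is not None]
print(len(books) > 0)
```

Repeating `book_account` $n_j$ times per vintage reproduces the independent bookings described above.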
Figure 4.1 (left) illustrates the Lexis diagram for one random simulation per vintage, where each asterisk at the end of a line segment denotes a default event and each circle denotes censorship. To see the default behavior in dual-time coordinates, we calculated the empirical hazard rates by (4.14) in lifetime and (4.15) in calendar time, and the results are plotted in Figure 4.1 (right). Each marginal view of the hazard rates clearly shows a pattern matching our underlying assumptions, namely:

1. The endogenous (or maturation) performance in lifetime is relatively smooth;

2. The exogenous performance in calendar time is relatively dynamic and volatile.

This is typically the case in credit risk modeling, where default risk is greatly affected by macroeconomic conditions. It is our purpose to develop dual-time survival analysis not only in lifetime, but simultaneously in calendar time.
Although the Lexis diagram has been around since Lexis (1875), the statistical literature on survival analysis has focused predominantly on lifetime alone rather than the dual time scales; see e.g. Andersen et al. (1993), with a nine-to-one chapter coverage. In other words, lifetime has been taken for granted as the basic time scale of survival, with very few exceptions that take either calendar time as the primary scale (e.g. Arjas (1986)) or both time scales (see the survey by Keiding (1990)). There is also discussion of selecting the appropriate time scale in the context of multiple-time-scale reduction; see e.g. Farewell and Cox (1979) on a rotation technique based on a naive Cox proportional hazards (CoxPH) model involving only a linearized time-covariate effect. Such time-scale dimension-reduction approaches can also be found in Oakes (1995) and Duchesne and Lawless (2000), given multiple dimensions that may include usage scales such as the mileage of a vehicle.
The literature on regression models based on dual-time-to-default data (a.k.a. survival data with staggered entry) has also concentrated on using lifetime as the basic time scale together with time-dependent covariates. Most commonly used is the CoxPH regression model with an arbitrary baseline hazard function in lifetime. Some weak convergence results can be found in Sellke and Siegmund (1983) by martingale approximation and in Billias, Gu and Ying (1997) by empirical process theory; see also the related works of Slud (1984) and Gu and Lai (1991) on two-sample tests. However, such a one-way baseline model specification is unable to capture the baseline hazard in calendar time. Then comes Efron (2002), with a symmetric treatment of the dual time scales under the two-way proportional hazards model
\[
\lambda_{ji}(m, t) = \lambda_f(m)\lambda_g(t)\exp\{\theta^T \mathbf{z}_{ji}(m)\}, \quad i = 1, \ldots, n_j, \; j = 1, \ldots, J. \tag{4.4}
\]
Note that Efron (2002) considered only the special case of (4.4) with cubic polynomial expansions for $\log\lambda_f$ and $\log\lambda_g$, as well as time-independent covariates. In general, we may allow for arbitrary (nonnegative) $\lambda_f(m)$ and $\lambda_g(t)$; hence we name (4.4) a two-way Cox regression model.

In credit risk, there seems to be no formal literature on dual-time-to-default risk modeling. The objective of this chapter is to develop dual-time survival analysis (DtSA) for credit risk modeling, including the methods of nonparametric estimation, structural parameterization, intensity-based semiparametric regression, and frailty specifications accounting for vintage heterogeneity effects. The aforementioned dual-time Cox model (4.4) will be one such development, whereas Efron's approach with polynomial baselines fails to capture the rather different behaviors of the endogenous and exogenous hazards illustrated in Figure 4.1.
This chapter is organized as follows. In Section 4.2 we start with the generic form of the likelihood function based on the dual-time-to-default data (4.1), then consider the one-way, two-way and three-way nonparametric estimators on the Lexis diagram. The simulation data described above will be used to assess the performance of the nonparametric estimators. In Section 4.3 we discuss the dual-time feasibility of structural models based on the first-passage-time triggering system, which is defined by an endogenous distance-to-default process associated with an exogenous time transformation. Section 4.4 is devoted to the development of dual-time semiparametric Cox regression with both endogenous and exogenous baseline hazards and covariate effects, for which the method of partial likelihood estimation plays a key role. Also discussed is the frailty type of vintage heterogeneity via a random-effect formulation. In Section 4.5, we demonstrate applications of both nonparametric and semiparametric dual-time survival analysis to credit card and mortgage risk modeling, based on our real data-analytic experiences in retail banking. We conclude the chapter in Section 4.6 and provide some supplementary materials at the end.
4.2 Nonparametric Methods

Denote by $\lambda_{ji}(m) \equiv \lambda_{ji}(m,t)$ with $t \equiv V_j + m$ the hazard rates of lifetime-to-default $\tau_{ji}$ on the Lexis diagram. Given the dual-time-to-default data (4.1) with independent left truncation and right censoring, consider the joint likelihood under either the continuous or the discrete setting:
\[
\prod_{j=1}^{J}\prod_{i=1}^{n_j}\left[\frac{p_{ji}(M_{ji})}{S_{ji}(U_{ji}^{+})}\right]^{\Delta_{ji}}\left[\frac{S_{ji}(M_{ji}^{+})}{S_{ji}(U_{ji}^{+})}\right]^{1-\Delta_{ji}} \tag{4.5}
\]
where $p_{ji}(m)$ denotes the density and $S_{ji}(m)$ denotes the survival function of $\tau_{ji}$.
On the continuous domain, $m^{+} = m$ and $S_{ji}(m) = e^{-\Lambda_{ji}(m)}$ with cumulative hazard $\Lambda_{ji}(m) = \int_0^m \lambda_{ji}(s)\,ds$; then the log-likelihood function has the generic form
\[
\ell = \sum_{j=1}^{J}\sum_{i=1}^{n_j}\Delta_{ji}\log\lambda_{ji}(M_{ji}) + \log S_{ji}(M_{ji}) - \log S_{ji}(U_{ji})
= \sum_{j=1}^{J}\sum_{i=1}^{n_j}\left[\Delta_{ji}\log\lambda_{ji}(M_{ji}) - \int_{U_{ji}}^{M_{ji}}\lambda_{ji}(m)\,dm\right]. \tag{4.6}
\]
On the discrete domain of $m$, $S_{ji}(m) = \prod_{s<m}[1-\lambda_{ji}(s)]$ and the likelihood function (4.5) can be rewritten as
\[
\prod_{j=1}^{J}\prod_{i=1}^{n_j}\big[\lambda_{ji}(M_{ji})\big]^{\Delta_{ji}}\big[1-\lambda_{ji}(M_{ji})\big]^{1-\Delta_{ji}}\prod_{m\in(U_{ji},M_{ji})}[1-\lambda_{ji}(m)] \tag{4.7}
\]
and the log-likelihood function is given by
\[
\ell = \sum_{j=1}^{J}\sum_{i=1}^{n_j}\left\{\Delta_{ji}\log\left[\frac{\lambda_{ji}(M_{ji})}{1-\lambda_{ji}(M_{ji})}\right] + \sum_{m\in(U_{ji},M_{ji}]}\log[1-\lambda_{ji}(m)]\right\}. \tag{4.8}
\]
In what follows we discuss the one-way, two-way and three-way nonparametric approaches to the maximum likelihood estimation (MLE) of the underlying hazard rates.
4.2.1 Empirical Hazards

Prior to discussing the one-way nonparametric estimation, let us review the classical approach to empirical hazards vintage by vintage. For each fixed $V_j$, let $\lambda_j(m)$ be the underlying hazard rate in lifetime. Given the data (4.1), define
\[
\begin{cases}
\text{number of defaults:} & \mathrm{nevent}_j(m) = \sum_{i=1}^{n_j} I(M_{ji} = m,\ \Delta_{ji} = 1)\\[4pt]
\text{number at risk:} & \mathrm{nrisk}_j(m) = \sum_{i=1}^{n_j} I(U_{ji} < m \le M_{ji})
\end{cases} \tag{4.9}
\]
for a finite number of lifetime points $m$ with $\mathrm{nevent}_j(m) \ge 1$. Then the $j$th-vintage contribution to the log-likelihood (4.8) can be rewritten as
\[
\ell_j = \sum_{m}\mathrm{nevent}_j(m)\log\lambda_j(m) + \big[\mathrm{nrisk}_j(m) - \mathrm{nevent}_j(m)\big]\log[1-\lambda_j(m)] \tag{4.10}
\]
on the discrete domain of $m$. It is easy to show that the MLE of $\lambda_j(m)$ is given by the following empirical hazard rates:
\[
\hat\lambda_j(m) = \frac{\mathrm{nevent}_j(m)}{\mathrm{nrisk}_j(m)} \quad \text{if } \mathrm{nevent}_j(m)\ge 1, \text{ and NaN otherwise.} \tag{4.11}
\]
They can be used to express the well-known nonparametric estimators of the cumulative hazard and the survival function:
\[
\text{(Nelson-Aalen estimator)}\quad \hat\Lambda_j(m) = \sum_{m_i \le m}\hat\lambda_j(m_i); \tag{4.12}
\]
\[
\text{(Kaplan-Meier estimator)}\quad \hat S_j(m) = \prod_{m_i \le m}\big[1 - \hat\lambda_j(m_i)\big] \tag{4.13}
\]
for any $m \ge 0$ under either the continuous or the discrete setting. Clearly, the Kaplan-Meier estimator and the Nelson-Aalen estimator are related empirically by $\hat S_j(m) = \prod_{m_i\le m}\big[1 - \Delta\hat\Lambda_j(m_i)\big]$, where the empirical hazard rates $\hat\lambda_j(m_i) \equiv \Delta\hat\Lambda_j(m_i)$ are also called the Nelson-Aalen increments.
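The quantities (4.9), (4.11), (4.12) and (4.13) are straightforward to compute from raw account records. Below is a minimal Python sketch for a single vintage; the data layout and the toy numbers are our own illustrative assumptions, not taken from the text:

```python
from collections import Counter

def empirical_hazards(U, M, D):
    """Empirical hazard rates (4.11) from left-truncated, right-censored
    lifetimes: U[i] < M[i] are the entry/exit ages, D[i] = 1 if default."""
    event_times = sorted({m for m, d in zip(M, D) if d == 1})
    nevent = Counter(m for m, d in zip(M, D) if d == 1)
    haz = {}
    for m in event_times:
        # number at risk (4.9): entered before m and not yet exited
        nrisk = sum(1 for u, mm in zip(U, M) if u < m <= mm)
        haz[m] = nevent[m] / nrisk
    return haz

def nelson_aalen(haz, m):
    # (4.12): cumulative sum of the empirical hazard increments
    return sum(h for mi, h in haz.items() if mi <= m)

def kaplan_meier(haz, m):
    # (4.13): product-limit estimator
    s = 1.0
    for mi, h in haz.items():
        if mi <= m:
            s *= 1.0 - h
    return s

# toy data: 6 accounts with entry age, exit age and default flag
U = [0, 0, 0, 1, 1, 2]
M = [2, 3, 4, 3, 5, 6]
D = [1, 0, 1, 1, 0, 0]
haz = empirical_hazards(U, M, D)
```

For the toy data the risk sets at ages 2, 3, 4 contain 5, 5, 3 accounts, giving empirical hazards 0.2, 0.2 and 1/3.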
Now we are ready to present the one-way nonparametric estimation based on the complete dual-time-to-default data (4.1). In the lifetime dimension (vertical axis of the Lexis diagram), assume $\lambda_{ji}(m,t) = \lambda(m)$ for all $(j,i)$; then we may immediately extend (4.11) by simple aggregation to obtain the one-way estimator in lifetime:
\[
\hat\lambda(m) = \frac{\sum_{j=1}^{J}\mathrm{nevent}_j(m)}{\sum_{j=1}^{J}\mathrm{nrisk}_j(m)}, \tag{4.14}
\]
whenever the numerator is positive. Alternatively, in the calendar-time dimension (horizontal axis of the Lexis diagram) assume $\lambda_{ji}(m,t) = \lambda(t)$ for all $(j,i)$; then we obtain the one-way nonparametric estimator in calendar time:
\[
\hat\lambda(t) = \frac{\sum_{j=1}^{J}\mathrm{nevent}_j(t - V_j)}{\sum_{j=1}^{J}\mathrm{nrisk}_j(t - V_j)}, \tag{4.15}
\]
whenever the numerator is positive. Accordingly, one may obtain the empirical cumulative hazard and survival function via the Nelson-Aalen estimator (4.12) and the Kaplan-Meier estimator (4.13) in each marginal dimension.
For example, consider the simulation data introduced in Section 4.1. Both one-way nonparametric estimates are plotted in the right panel of Figure 4.1 (upon linear interpolation), and they roughly capture the endogenous and exogenous hazards. However, they may be biased in other cases, as we shall discuss next.
4.2.2 DtBreslow Estimator

The two-way nonparametric approach estimates the lifetime hazards $\lambda_f(m)$ and the calendar hazards $\lambda_g(t)$ simultaneously under the multiplicative model
\[
\lambda_{ji}(m,t) = \lambda_f(m)\lambda_g(t) = \exp\{f(m) + g(t)\}, \quad \text{for all } (j,i). \tag{4.16}
\]
For identifiability, assume that $\mathbb{E}\log\lambda_g = \mathbb{E}g = 0$.
Following the one-way empirical hazards (4.11) above, we may restrict attention to the pointwise parametrization
\[
\hat f(m) = \sum_{l=1}^{L_m}\alpha_l I(m = m_{[l]}), \quad \hat g(t) = \sum_{l=1}^{L_t}\beta_l I(t = t_{[l]}) \tag{4.17}
\]
based on $L_m$ distinct lifetimes $m_{[1]} < \cdots < m_{[L_m]}$ and $L_t$ distinct calendar times $t_{[1]} < \cdots < t_{[L_t]}$ such that there is at least one default event observed at the corresponding time. One may find the MLE of the coefficients $\alpha, \beta$ in (4.17) by maximizing (4.6) or (4.8). Rather than this crude optimization, we give an iterative DtBreslow estimator based on Breslow (1972), by treating (4.16) as the product of a baseline hazard function and a relative-risk multiplier.
Given the dual-time-to-default data (4.1) and any fixed $\lambda_g = \hat\lambda_g$, the Breslow estimator of $\lambda_f(m)$ is given by
\[
\hat\lambda_f(m) \leftarrow \frac{\sum_{j=1}^{J}\mathrm{nevent}_j(m)}{\sum_{j=1}^{J}\hat\lambda_g(V_j + m)\,\mathrm{nrisk}_j(m)}, \quad m = m_{[1]},\ldots,m_{[L_m]} \tag{4.18}
\]
and similarly, given any fixed $\lambda_f = \hat\lambda_f$,
\[
\hat\lambda_g(t) \leftarrow \frac{\sum_{j=1}^{J}\mathrm{nevent}_j(t - V_j)}{\sum_{j=1}^{J}\hat\lambda_f(t - V_j)\,\mathrm{nrisk}_j(t - V_j)}, \quad t = t_{[1]},\ldots,t_{[L_t]}. \tag{4.19}
\]
Both marginal Breslow estimators are known to be the nonparametric MLE of the baseline hazards conditional on the relative-risk terms. To take into account the identifiability constraint $\mathbb{E}\log\lambda_g(t) = 0$, we may update $\lambda_g(t)$ by
\[
\hat\lambda_g(t) \leftarrow \exp\Big\{\log\hat\lambda_g(t) - \mathrm{mean}\big[\log\hat\lambda_g\big]\Big\}. \tag{4.20}
\]
By iterating (4.18) to (4.20) until convergence, we obtain the DtBreslow estimator. It can be viewed as a natural extension of the one-way nonparametric estimators (4.14) and (4.15). In (4.18), it is usually appropriate to set the initial exogenous multipliers $\hat\lambda_g \equiv 1$; the algorithm then converges in a few iterations.
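The iteration (4.18)-(4.20) can be sketched in Python as follows; the dictionary-based data layout and the synthetic counts in the example are our own illustrative assumptions (the counts are constructed to satisfy the multiplicative model exactly), not the author's implementation:

```python
import math

def dt_breslow(V, nevent, nrisk, m_pts, t_pts, n_iter=200):
    """Iterate (4.18)-(4.20). nevent[j][m], nrisk[j][m] are per-vintage
    counts on the Lexis grid; V[j] is the origination time of vintage j.
    Assumes every grid point in m_pts/t_pts has positive exposure."""
    lam_g = {t: 1.0 for t in t_pts}           # initial exogenous multiplier
    lam_f = {}
    for _ in range(n_iter):
        # (4.18): endogenous baseline given lam_g
        for m in m_pts:
            num = sum(nevent[j].get(m, 0) for j in V)
            den = sum(lam_g.get(V[j] + m, 1.0) * nrisk[j].get(m, 0) for j in V)
            lam_f[m] = num / den
        # (4.19): exogenous multiplier given lam_f
        for t in t_pts:
            num = sum(nevent[j].get(t - V[j], 0) for j in V)
            den = sum(lam_f.get(t - V[j], 0.0) * nrisk[j].get(t - V[j], 0) for j in V)
            lam_g[t] = num / den
        # (4.20): recenter so that mean(log lam_g) = 0
        mu = sum(math.log(v) for v in lam_g.values()) / len(lam_g)
        lam_g = {t: v / math.exp(mu) for t, v in lam_g.items()}
    return lam_f, lam_g

# two vintages; counts built from the truth lam_f = {1: 0.1, 2: 0.2},
# lam_g = {1: 0.8, 2: 1.0, 3: 1.25} (mean log zero), nrisk = 100 everywhere
V = {0: 0, 1: 1}
nrisk = {0: {1: 100, 2: 100}, 1: {1: 100, 2: 100}}
nevent = {0: {1: 8, 2: 20}, 1: {1: 10, 2: 25}}
lam_f, lam_g = dt_breslow(V, nevent, nrisk, [1, 2], [1, 2, 3])
```

On these exactly multiplicative counts the alternating updates converge to the underlying truth, up to the normalization constraint.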
We carry out a simulation study to assess the performance of the DtBreslow estimator. Using the simulation data from (4.2), the two-way estimates of $\lambda_f(m)$ and $\lambda_g(t)$ are shown in Figure 4.2 (top), together with the one-way nonparametric estimates and the underlying truth. For this rectangular diagram of survival data, the one-way and DtBreslow estimations have comparable performance. However, if we use a non-rectangular Lexis diagram (e.g. either pre-2005 vintages or post-2005 vintages only), the DtBreslow estimator demonstrates a significant improvement over the one-way estimation; see Figure 4.2 (middle and bottom). The one-way nonparametric methods suffer potential bias in both cases. Comparing (4.16) to (4.14) and (4.15), it is clear that the lifting power of the DtBreslow estimator (e.g. in calibrating $\lambda_f(m)$) depends not only on the exogenous scaling factor $\hat\lambda_g(t)$, but also on the vintage-specific weights $\mathrm{nrisk}_j(m)$.
4.2.3 MEV Decomposition

Lastly, we consider a three-way nonparametric procedure that is consistent with the MEV (maturation-exogenous-vintage) decomposition framework we have developed for vintage data analysis in Chapter III. For each vintage with accounts $(V_j, U_{ji}, M_{ji}, \Delta_{ji})$ for $i = 1,\ldots,n_j$, assume the underlying hazards to be
\[
\lambda_{ji}(m,t) = \lambda_f(m)\lambda_g(t)\lambda_h(V_j) = \exp\big\{f(m) + g(t) + h(V_j)\big\}. \tag{4.21}
\]
For identifiability, let $\mathbb{E}g = \mathbb{E}h = 0$. Such an MEV hazards model can be viewed as an extension of the dual-time nonparametric model (4.16) with vintage heterogeneity.
Due to the hyperplane nature of the vintage diagram, with $t \equiv V_j + m$ in (4.21), the linear trends of $f, g, h$ are not identifiable; see Lemma 3.1.

Figure 4.2: DtBreslow estimator vs one-way nonparametric estimator using the data of (top) both pre-2005 and post-2005 vintages; (middle) pre-2005 vintages only; (bottom) post-2005 vintages only.

Among other approaches to break up the linear dependency, we consider
1. Binning in vintage, lifetime or calendar time, especially for the marginal regions with sparse observations;

2. Removing the marginal trend in the vintage dimension $h(\cdot)$.

Note that the second strategy assumes zero trend in the vintage heterogeneity, which is an ad hoc approach when we have no prior knowledge about the trends of $f, g, h$. Later, in Section 4.4, we also discuss a frailty type of vintage effect.
The nonparametric MLE for the MEV hazard model (4.21) follows the DtBreslow estimator discussed above. Specifically, we may estimate the three conditional marginal effects iteratively by:

a) Maturation effect, for fixed $\hat\lambda_g(t)$ and $\hat\lambda_h(v)$:
\[
\hat\lambda_f(m) \leftarrow \frac{\sum_{j=1}^{J}\mathrm{nevent}_j(m)}{\sum_{j=1}^{J}\hat\lambda_h(V_j)\hat\lambda_g(V_j + m)\,\mathrm{nrisk}_j(m)} \tag{4.22}
\]
for $m = m_{[1]},\ldots,m_{[L_m]}$.
b) Exogenous effect, for fixed $\hat\lambda_f(m)$ and $\hat\lambda_h(v)$:
\[
\begin{cases}
\hat\lambda_g(t) \leftarrow \dfrac{\sum_{j=1}^{J}\mathrm{nevent}_j(t - V_j)}{\sum_{j=1}^{J}\hat\lambda_h(V_j)\hat\lambda_f(t - V_j)\,\mathrm{nrisk}_j(t - V_j)}\\[10pt]
\hat\lambda_g(\cdot) \leftarrow \exp\big\{\hat g - \mathrm{mean}\big[\hat g\big]\big\}, \quad \text{with } \hat g = \log\hat\lambda_g
\end{cases} \tag{4.23}
\]
for $t = t_{[1]},\ldots,t_{[L_t]}$.
c) Vintage effect, for fixed $\hat\lambda_f(m)$ and $\hat\lambda_g(t)$:
\[
\begin{cases}
\hat\lambda_h(V_j) \leftarrow \dfrac{\sum_{i=1}^{n_j} I(\Delta_{ji} = 1)}{\sum_{l=1}^{L_m}\hat\lambda_f(m_{[l]})\hat\lambda_g(V_j + m_{[l]})\,\mathrm{nrisk}_j(m_{[l]})}\\[10pt]
\hat\lambda_h(\cdot) \leftarrow \exp\big\{\hat h - \mathrm{mean}\big[\hat h\big]\big\}, \quad \text{with } \hat h = \log\hat\lambda_h
\end{cases} \tag{4.24}
\]
for $j = 1,\ldots,J$ (upon necessary binning). In (4.24), one may also perform the trend removal $\hat\lambda_h(\cdot) \leftarrow \exp\big\{\hat h - \mathrm{mean}\oplus\mathrm{trend}\big[\hat h\big]\big\}$ if zero trend can be assumed for the vintage heterogeneity.
Setting the initial values $\hat\lambda_g \equiv 1$ and $\hat\lambda_h \equiv 1$, then iterating the above three steps until convergence gives us the nonparametric estimation of the three-way marginal effects. Consider again the simulation data with both pre-2005 and post-2005 vintages observed on the rectangular Lexis diagram of Figure 4.1. Since the sample sizes are limited for very old and very new vintages, the vintages with $V_j \le -36$ are merged and the vintages with $V_j \ge 40$ are cut off. Besides, given the prior knowledge that pre-2005 vintages have relatively flat heterogeneity, binning is performed every half year. Then, running the above MEV iterative procedure, we obtain the estimates of the maturation hazard $\lambda_f(m)$, the exogenous hazard $\lambda_g(t)$ and the vintage heterogeneity $\lambda_h(v)$, which are plotted in Figure 4.3 (top). They are consistent with the underlying true hazard functions we prespecified in (4.3). Furthermore, we have also tested the MEV nonparametric estimation on the dual-time data with either pre-2005 vintages only (trapezoidal diagram) or post-2005 vintages only (triangular diagram). The fitting results are also presented in Figure 4.3, which implies that our MEV decomposition is quite robust to irregularity of the underlying Lexis diagram.
Remarks: before ending this section, we make several remarks about the nonparametric methods for dual-time survival analysis:

1. First of all, the dual-time-to-default data differ essentially from the usual bivariate lifetime data, which are associated with either two separate failure modes per subject or a single mode for a pair of subjects. For this reason, it is questionable whether the nonparametric methods developed for bivariate survival are applicable to the Lexis diagram; see Keiding (1990).

Figure 4.3: MEV modeling of empirical hazard rates, based on the dual-time-to-default data of (top) both pre-2005 and post-2005 vintages; (middle) pre-2005 vintages only; (bottom) post-2005 vintages only.

Our developments of the DtBreslow and MEV nonparametric estimators for default intensities correspond to the age-period or age-period-cohort models in cohort or vintage data analysis of loss rates; see (3.4) in the previous chapter. For either default intensities or loss rates on the Lexis diagram, one needs to be careful about the intrinsic identification problem, for which we have discussed practical solutions in both of these chapters.
2. The nonparametric models (4.16) and (4.21) underlying the DtBreslow and MEV estimators are special cases of the dual-time Cox models of Section 4.4. In the finite-sample (or discrete) case, we have assumed in (4.17) that the endogenous and exogenous hazard rates are pointwise, or equivalently that the cumulative hazards are step functions. In this case, statistical properties of the DtBreslow estimates $\hat\lambda_f(m)$ and $\hat\lambda_g(t)$ can be studied by the standard likelihood theory for the finite-dimensional parameters $\alpha$ or $\beta$. On the continuous domain, however, the large-sample properties of the DtBreslow estimators are worthy of future study, as the dimensions of both $\alpha$ and $\beta$ increase to infinity at essentially the same speed.
3. Smoothing techniques can be applied to the one-way, two-way or three-way nonparametric estimates, where the smoothness assumptions follow our discussion in the previous chapter; see (3.2). Consider e.g. the DtBreslow estimator with output of two-way empirical hazard rates $\hat\lambda_f(m)$ and $\hat\lambda_g(t)$, i.e. Nelson-Aalen increments on discrete time points; see (4.12). One may post-process them by kernel smoothing
\[
\tilde\lambda_f(m) = \frac{1}{b_f}\sum_{l=1}^{L_m} K_f\Big(\frac{m - m_{[l]}}{b_f}\Big)\hat\lambda_f(m_{[l]}), \qquad \tilde\lambda_g(t) = \frac{1}{b_g}\sum_{l=1}^{L_t} K_g\Big(\frac{t - t_{[l]}}{b_g}\Big)\hat\lambda_g(t_{[l]})
\]
for $m, t$ on the continuous domains, where $K_f(\cdot)$ and $K_g(\cdot)$, with bandwidths $b_f$ and $b_g$, are kernel functions that are bounded and vanish outside $[-1,1]$; see Ramlau-Hansen (1983). Alternatively, one may use the penalized likelihood formulation via smoothing splines; see Gu (2002; §7). The challenging issues in these smoothing methods are known to be the selection of bandwidths or smoothing parameters, as well as the correction of boundary effects. For more details, see Wang (2005) and references therein.
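As a hedged illustration of the kernel smoother above, here is a sketch using the Epanechnikov kernel (one common choice that is bounded and vanishes outside $[-1,1]$); the grid of increments and the bandwidth are arbitrary toy values of ours:

```python
def epanechnikov(u):
    # a bounded kernel supported on [-1, 1]
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_smooth_hazard(increments, m, bandwidth):
    """Smooth Nelson-Aalen increments {m_[l]: hazard increment} at point m,
    following the kernel formula of Ramlau-Hansen (1983)."""
    return sum(epanechnikov((m - ml) / bandwidth) * lam
               for ml, lam in increments.items()) / bandwidth

# toy Nelson-Aalen increments at unit-spaced event times
inc = {1.0: 0.10, 2.0: 0.12, 3.0: 0.11, 4.0: 0.13}
sm = kernel_smooth_hazard(inc, 2.5, bandwidth=2.0)
```

The smoothed value is a weighted average of the nearby increments, so it falls between their minimum and maximum; near the data boundary this ceases to hold without a boundary correction, which is the effect alluded to above.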
4. An alternative way of dual-time hazard smoothing can be based on the vintage data analysis (VDA) developed in the previous chapter. For large-sample observations, one may first calculate the empirical hazard rates $\hat\lambda_j(m)$ per vintage by (4.11), use them as the pseudo-responses of the vintage data structure as in (3.1), then perform the Gaussian process modeling discussed in Section 3.3. In this case, the variance of $\hat\lambda_j(m)$ can be estimated by the tie-corrected formula
\[
\hat\sigma_j^2(m) = \frac{\big[\mathrm{nrisk}_j(m) - \mathrm{nevent}_j(m)\big]\,\mathrm{nevent}_j(m)}{\mathrm{nrisk}_j^3(m)}
\]
and the $\hat\lambda_j(m)$ are uncorrelated in $m$; see e.g. Aalen, et al. (2008; p.85). Then we may write $\hat\lambda_j(m) = \lambda_j(m)(1 + \varepsilon_j(m))$ with $\mathbb{E}\varepsilon_j(m) = 0$ and
\[
\mathbb{E}\varepsilon_j^2(m) \approx \frac{\hat\sigma_j^2(m)}{\hat\lambda_j^2(m)} = \frac{1}{\mathrm{nevent}_j(m)} - \frac{1}{\mathrm{nrisk}_j(m)} \to 0, \quad \text{as } n_j \to \infty
\]
on the discrete time domain; by Taylor expansion, $\log\hat\lambda_j(m) \approx \log\lambda_j(m) + \varepsilon_j(m)$, corresponding to the MEV model (3.15) with non-stationary noise.
5. An important message delivered by Figure 4.2 (middle and bottom) is that the nonparametric estimation of $\lambda_f(m)$ in conventional lifetime-only survival analysis may be misleading when there exists an exogenous hazard $\lambda_g(t)$ in the calendar dimension. Both the DtBreslow and MEV nonparametric methods may correct the bias of lifetime-only estimation, as demonstrated in Figure 4.2 and Figure 4.3 with either pre-2005-only or post-2005-only vintages.
4.3 Structural Models

We introduced in Chapter 1.2.1 the structural approach to credit risk modeling, whose appeal is that default events are triggered by a latent stochastic process hitting the default boundary. For a quick review, one may refer to (1.6) for the latent distance-to-default process w.r.t. the zero boundary, (1.9) for the notion of first-passage-time, and (1.13) for the concrete expression of the hazard rate based on the inverse Gaussian lifetime distribution. In this section, we consider the extension of the structural approach to dual-time-to-default analysis.
4.3.1 First-passage-time Parameterization

Consider the Lexis diagram of dual-time default events for vintages $j = 1,\ldots,J$. Let $X_j(m)$ be the endogenous distance-to-default (DD) process in lifetime and assume it follows a drifted Wiener process
\[
X_j(m) = c_j + b_j m + W_j(m), \quad m \ge 0 \tag{4.25}
\]
where $X_j(0) \equiv c_j \in \mathbb{R}_+$ denotes the initial distance-to-default parameter, $b_j \in \mathbb{R}$ denotes the trend of credit deterioration, and $W_j(m)$ is a standard Wiener process. To keep things simple, let us assume that $W_j$ and $W_{j'}$ are uncorrelated whenever $j \ne j'$. Recall the hazard rate (or default intensity) of the first passage time of $X_j(m)$ crossing zero:
\[
\lambda_j(m; c_j, b_j) = \frac{\dfrac{c_j}{\sqrt{2\pi m^3}}\exp\Big\{-\dfrac{(c_j + b_j m)^2}{2m}\Big\}}{\Phi\Big(\dfrac{c_j + b_j m}{\sqrt{m}}\Big) - e^{-2b_j c_j}\,\Phi\Big(\dfrac{-c_j + b_j m}{\sqrt{m}}\Big)}, \tag{4.26}
\]
whose numerator and denominator correspond to the density and survival functions; see (1.10)-(1.13). By (4.6), the MLE of $(c_j, b_j)$ can be obtained by maximizing the log-likelihood of the $j$th vintage observations
\[
\ell_j = \sum_{i=1}^{n_j}\Delta_{ji}\log\lambda_j(M_{ji}; c_j, b_j) + \log S_j(M_{ji}; c_j, b_j) - \log S_j(U_{ji}; c_j, b_j), \tag{4.27}
\]
where the maximization can be carried out by nonlinear optimization. We supplement the gradients of the log-likelihood function at the end of the chapter.
As their names indicate, both the initial distance-to-default $c_j$ and the trend of credit deterioration $b_j$ have interesting structural interpretations. With reference to Aalen and Gjessing (2001) or Aalen, et al. (2008; §10):

1. The $c$-parameter represents the initial value of the DD process. When $c$ becomes smaller, a default event tends to happen sooner, moving the concentration of defaults towards the left. In theory, the hazard function (4.26) for $m \in (0,\infty)$ is always first-increasing-then-decreasing, with maximum at $m^*$ such that the derivative $\lambda'(m^*) = 0$. In practice, let us focus on the bounded interval $m \in [\epsilon, m_{\max}]$ for some $\epsilon > 0$. Then, as $c$ gradually approaches zero, the hazard rate changes shape from (a) monotonically increasing, to (b) first-increasing-then-decreasing, and finally to (c) monotonically decreasing. See Figure 1.2 for the illustration with fixed $b$ and varying $c$ values.
2. The $b$-parameter represents the trend of the DD process, and it determines the level of the stabilized hazard rate as $m \to \infty$. In Aalen, et al. (2008; §10), let $\phi_m(x)$ be the conditional density of $X(m)$ given that $\min_{0<s\le m} X(s) > 0$; it is said to be quasi-stationary if $\phi_m(x)$ is $m$-invariant. In our case, the quasi-stationary distribution is $\phi(x) = b^2 x e^{-bx}$ for $x \ge 0$ and the limiting hazard rate is given by
\[
\lim_{m\to\infty}\lambda(m) = \frac{1}{2}\,\frac{\partial\phi(x)}{\partial x}\bigg|_{x=0} = \frac{b^2}{2},
\]
which does not depend on the $c$-parameter. In Figure 1.2 with $b = -0.02$, all the hazard rate curves converge to the constant 0.0002 in the long run.
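The hazard (4.26) is straightforward to code directly. The following sketch also verifies it numerically against the identity $\lambda(m) = -d\log S(m)/dm$ via a central difference; the parameter values are illustrative only:

```python
import math

def Phi(x):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ig_survival(m, c, b):
    # denominator of (4.26): first-passage survival of c + b*m + W(m)
    return (Phi((c + b * m) / math.sqrt(m))
            - math.exp(-2.0 * b * c) * Phi((-c + b * m) / math.sqrt(m)))

def ig_hazard(m, c, b):
    # (4.26): inverse Gaussian first-passage density over survival
    dens = (c / math.sqrt(2.0 * math.pi * m ** 3)
            * math.exp(-(c + b * m) ** 2 / (2.0 * m)))
    return dens / ig_survival(m, c, b)

# finite-difference check of hazard == -d log S / dm
c, b, m, h = 4.0, -0.02, 10.0, 1e-5
fd = (math.log(ig_survival(m - h, c, b))
      - math.log(ig_survival(m + h, c, b))) / (2.0 * h)
haz = ig_hazard(m, c, b)
```

The central-difference value `fd` agrees with the closed-form hazard to high accuracy, which is a quick sanity check when implementing (4.26) for likelihood evaluation.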
The above first-passage-time parameterization involves only the endogenous or maturation hazard in lifetime. Next, we extend the first-passage-time structure to take into account the exogenous effect in calendar time. To achieve this, we model the dual-time DD process for each vintage $V_j$ by a subordinated process
\[
X_j^*(m) = X_j(\Psi_j(m)), \quad \text{with } \Psi_j(m) = \int_0^m \psi(V_j + s)\,ds \tag{4.28}
\]
where $X_j(m)$ is the endogenous DD process (4.25) and $\Psi_j(m)$ is an increasing time-transformation function such that all vintages share the same integrand $\psi(t) > 0$ in calendar time. We set the constraint $\mathbb{E}\log\psi = 0$ on the exogenous effect, in the sense of letting the 'average' DD process be captured endogenously. Such a time-transformation approach can be viewed as a natural generalization of the log-location-scale transformation (1.16) with correspondence $\sigma_i = 1$ and $\psi(\cdot) = e^{-\mu_i}$. It has been used to extend the accelerated failure time (AFT) models with time-varying covariates; see e.g. Lawless (2003; §6.4.3).
The exogenous function $\psi(t)$ rescales the original lifetime increment $\Delta m$ to $\psi(V_j + m)\Delta m$, and therefore transforms the diffusion $dX_j(m) = b_j\,dm + dW_j(m)$ to
\[
dX_j^*(m) = b_j\,\psi(V_j + m)\,dm + \sqrt{\psi(V_j + m)}\,dW_j(m) \tag{4.29}
\]
w.r.t. the zero default boundary. Alternatively, one may rewrite it as
\[
dX_j^*(m) = \sqrt{\psi(V_j + m)}\,dX_j(m) - b_j\Big[\sqrt{\psi(V_j + m)} - \psi(V_j + m)\Big]dm. \tag{4.30}
\]
Let us introduce a new process $\bar X_j(m)$ such that $d\bar X_j(m) = \sqrt{\psi(V_j + m)}\,dX_j(m)$, and introduce a time-varying default boundary
\[
\bar D_j(m) = b_j\int_0^m\Big[\sqrt{\psi(V_j + s)} - \psi(V_j + s)\Big]ds.
\]
Then, by (4.30), it is clear that the time-to-default $\tau_j^* = \inf\{m \ge 0 : X_j^*(m) \le 0\} = \inf\{m \ge 0 : \bar X_j(m) \le \bar D_j(m)\}$.
By the time transformation (4.28), it is straightforward to obtain the survival function of the first passage time $\tau_j^*$:
\[
S_j^*(m) = P\big(X_j^*(s) > 0 : s \in [0,m]\big) = P\big(X_j(s) > 0 : s \in [0, \Psi_j(m)]\big) = S_j(\Psi_j(m)) = S_j\Big(\int_0^m \psi(V_j + s)\,ds\Big) \tag{4.31}
\]
where the benchmark survival function $S_j(\cdot)$ is given by the denominator of (4.26). The hazard rate of $\tau_j^*$ is then obtained as
\[
\lambda_j^*(m) = -\frac{d\log S_j^*(m)}{dm} = -\frac{d\log S_j(\Psi_j(m))}{d\Psi_j(m)}\,\frac{d\Psi_j(m)}{dm} = \lambda_j(\Psi_j(m))\,\Psi_j'(m) = \lambda_j\Big(\int_0^m \psi(V_j + s)\,ds\Big)\,\psi(V_j + m) \tag{4.32}
\]
with $\lambda_j(\cdot)$ given in (4.26). From (4.32), the exogenous function $\psi(t)$ not only maps the benchmark hazard rate to the rescaled time $\Psi_j(m)$, but also scales it by $\psi(t)$. This parallels the AFT regression model (1.21).
Now consider the estimation of both the vintage-specific structural parameters $(c_j, b_j)$ and the exogenous time-scaling function $\psi(t)$ from the dual-time-to-default data (4.1). By (4.6), the joint log-likelihood is given by $\ell = \sum_{j=1}^{J}\ell_j$, with $\ell_j$ given by (4.27) based on $\lambda_j^*(m; c_j, b_j)$ and $S_j^*(m; c_j, b_j)$. The complication lies in the arbitrary form of $\psi(t)$ involved in the time-transformation function. Nonparametric techniques can be used if extra properties like smoothness can be imposed on $\psi(t)$. In our case the exogenous time-scaling effects are often dynamic and volatile, for which we recommend a piecewise estimation procedure based on the discretized Lexis diagram.
1. Discretize the Lexis diagram with $\Delta m = \Delta t$ such that $t = t_{\min} + l\Delta t$ for $l = 0, 1, \ldots, L$, where $t_{\min}$ is the left boundary and $t_{\min} + L\Delta t$ corresponds to the right boundary. Specify $\psi(t)$ to be piecewise constant,
\[
\psi(t) = \psi_l, \quad \text{for } t - t_{\min} \in \big((l-1)\Delta t,\; l\Delta t\big],\ l = 1, 2, \ldots, L
\]
and $\psi(t) \equiv 1$ for $t \le t_{\min}$. Then the time transformation for the vintage with origination time $V_j$ can be expressed as
\[
\bar\Psi_j(m) = \sum_{l=(V_j - t_{\min})/\Delta t}^{(V_j + m - t_{\min})/\Delta t}\psi(t_{\min} + l\Delta t)\,\Delta t, \quad \text{for } V_j + m \ge t_{\min},
\]
which reduces to $t_{\min} - V_j + \sum_{l=1}^{(V_j + m - t_{\min})/\Delta t}\psi_l\,\Delta t$ in the case of left truncation.
2. For each $l = 1, 2, \ldots$, compute the corresponding likelihood contribution
\[
\ell_{[l]} = \sum_{(j,i)\in\mathcal{G}_{[l]}}\Delta_{ji}\big[\log\lambda_j(\bar\Psi_j(M_{ji})) + \log\psi_l\big] + \log S_j(\bar\Psi_j(M_{ji})) - \log S_j(\bar\Psi_j(U_{ji}))
\]
where $\mathcal{G}_{[l]} = \{(j,i) : V_j + M_{ji} = t_{\min} + l\Delta t\}$, and $\bar\Psi_j(U_{ji}) = \max\{0,\, t_{\min} - V_j\}$ if $t_{\min}$ is the systematic left truncation in calendar time.
3. Run nonlinear optimization to estimate $(c_j, b_j)$ and $\psi_l$ iteratively:

(a) For fixed $\psi$ and each fixed vintage $j = 1,\ldots,J$, obtain the MLE $(\hat c_j, \hat b_j)$ by maximizing (4.27) with $M_{ji}$ replaced by $\bar\Psi_j(M_{ji})$.

(b) For all fixed $(c_j, b_j)$, obtain the MLE of $(\psi_1, \ldots, \psi_L)$ by maximizing $\sum_{l=1}^{L}\ell_{[l]}$ subject to the constraint $\sum_{l=1}^{L}\log\psi_l = 0$.

In Step (3.b), one may perform further binning of $(\psi_1, \ldots, \psi_L)$ in order to reduce the dimension of the parameter space in the constrained optimization. One may also apply multiresolution wavelet bases to represent $\{\psi_1, \ldots, \psi_L\}$ from a time-frequency perspective. Such ideas will be investigated in our future work.
4.3.2 Incorporation of Covariate Effects

It is straightforward to extend the above structural parameterization to incorporate subject-specific covariates. For each account $i$ with origination $V_j$, let $\tilde{\mathbf z}_{ji}$ be the static covariates observed upon origination (including the intercept and otherwise centered predictors) and $\mathbf z_{ji}(m)$ for $m \in (U_{ji}, M_{ji}]$ be the dynamic covariates observed since left truncation. Let us specify
\[
c_{ji} = \exp\{\boldsymbol\eta^T\tilde{\mathbf z}_{ji}\} \tag{4.33}
\]
\[
\Psi_{ji}(m) = U_{ji} - V_j + \int_{U_{ji}}^{m}\exp\{\boldsymbol\theta^T \mathbf z_{ji}(s)\}\,ds, \quad m \ge U_{ji} \tag{4.34}
\]
subject to the unknown parameters $(\boldsymbol\eta, \boldsymbol\theta)$, and suppose the subject-specific DD process is
\[
\begin{cases}
X_{ji}^*(m) = X_{ji}(\Psi_{ji}(m)),\\
X_{ji}(m) = c_{ji} + b_j m + W_{ji}(m)
\end{cases} \tag{4.35}
\]
with an independent Wiener process $W_{ji}(m)$, the initial value $c_{ji}$ and the vintage-specific trend of credit deterioration $b_j$. Then the default occurs at $\tau_{ji}^* = \inf\{m > 0 : X_{ji}^*(m) \le 0\}$. Note that when the covariates in (4.34) are not time-varying, we may specify $\Psi_{ji}(m) = m\exp\{\boldsymbol\theta^T \mathbf z_{ji}\}$ and obtain the AFT regression model $\log\tau_{ji}^* = \boldsymbol\theta^T \mathbf z_{ji} + \log\tau_{ji}$ with inverse Gaussian distributed $\tau_{ji}$. In the presence of calendar-time state variables $x(t)$ such that $\mathbf z_{ji}(m) = x(V_j + m)$ for all $(j,i)$, the time transformation (4.34) becomes a parametric version of (4.28).
By the same arguments as in (4.31) and (4.32), we obtain the hazard rate $\lambda_{ji}^*(m) = \lambda_{ji}(\Psi_{ji}(m))\,\Psi_{ji}'(m)$, or
\[
\lambda_{ji}^*(m) = \frac{\dfrac{c_{ji}}{\sqrt{2\pi\Psi_{ji}^3(m)}}\exp\Bigg\{-\dfrac{\big(c_{ji} + b_j\Psi_{ji}(m)\big)^2}{2\Psi_{ji}(m)}\Bigg\}\,\Psi_{ji}'(m)}{\Phi\Bigg(\dfrac{c_{ji} + b_j\Psi_{ji}(m)}{\sqrt{\Psi_{ji}(m)}}\Bigg) - e^{-2b_j c_{ji}}\,\Phi\Bigg(\dfrac{-c_{ji} + b_j\Psi_{ji}(m)}{\sqrt{\Psi_{ji}(m)}}\Bigg)} \tag{4.36}
\]
and the survival function $S_{ji}^*(m) = S_{ji}(\Psi_{ji}(m))$ given by the denominator in (4.36).
The parameters of interest include $(\boldsymbol\eta, \boldsymbol\theta)$ and $b_j$ for $j = 1,\ldots,J$, and they can be estimated by MLE. Given the dual-time-to-default data (4.1), the log-likelihood function follows the generic form (4.6). Standard nonlinear optimization programs can be employed, e.g. the "optim" algorithm of R (stats library) and the "fminunc" algorithm of MATLAB (Optimization Toolbox).
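In the same spirit, here is a hedged Python/scipy sketch of this MLE step for a single vintage (our own illustration, not the author's implementation). For $b < 0$, the first passage time of $c + bm + W(m)$ through zero is inverse Gaussian with mean $c/|b|$ and shape $c^2$, which numpy can sample directly; the censored log-likelihood of the form (4.27) is then maximized numerically:

```python
import numpy as np
from scipy.optimize import minimize
from math import erf, sqrt, pi, log, exp

def log_dens(m, c, b):
    # log of the first-passage density (numerator of (4.26))
    return log(c) - 0.5 * log(2.0 * pi * m**3) - (c + b * m) ** 2 / (2.0 * m)

def log_surv(m, c, b):
    # log of the first-passage survival (denominator of (4.26))
    s = (0.5 * (1.0 + erf((c + b * m) / sqrt(2.0 * m)))
         - exp(-2.0 * b * c) * 0.5 * (1.0 + erf((-c + b * m) / sqrt(2.0 * m))))
    return log(s)

def negloglik(par, M, D):
    c, b = par
    if c <= 0:
        return 1e10
    try:
        ll = sum(log_dens(m, c, b) if d else log_surv(m, c, b)
                 for m, d in zip(M, D))
    except ValueError:          # survival underflow at pathological trial points
        return 1e10
    return -ll

# simulate first-passage times of X(m) = c + b*m + W(m) with b < 0,
# right-censored at a fixed horizon
rng = np.random.default_rng(0)
c_true, b_true, n, censor = 4.0, -0.05, 2000, 200.0
tau = rng.wald(c_true / -b_true, c_true**2, size=n)   # mean c/|b|, shape c^2
M = np.minimum(tau, censor)
D = (tau <= censor).astype(int)
fit = minimize(negloglik, x0=[3.0, -0.03], args=(M, D), method="Nelder-Mead")
c_hat, b_hat = fit.x
```

With a few thousand accounts per vintage, the numerical MLE recovers $(c_j, b_j)$ closely; supplying analytic gradients (as discussed in Remark 3 below) would only speed this up.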
Remarks:

1. There is no unique way to incorporate static covariates in the first-passage-time parameterization. For example, we considered including them through the initial distance-to-default parameter (4.33), while they may also be incorporated into other structural parameters; see Lee and Whitmore (2006). However, there has been limited discussion on the incorporation of dynamic covariates in the first-passage-time approach, for which we make an attempt to use the time-transformation technique (4.28) under the nonparametric setting and (4.34) under the parametric setting.
2. The endogenous hazard rate (4.26) is based on fixed effects for the initial distance-to-default and the trend of deterioration, while these structural parameters can be random effects due to the incomplete information available (Giesecke, 2006). One may refer to Aalen, et al. (2008; §10) for the random-effect extension. For example, when the trend parameter $b \sim N(\mu, \sigma^2)$ and the initial value $c$ is fixed, the endogenous hazard rate is given by $p(m)/S(m)$ with
\[
p(m) = \frac{c}{\sqrt{2\pi(m^3 + \sigma^2 m^4)}}\exp\Big\{-\frac{(c + \mu m)^2}{2(m + \sigma^2 m^2)}\Big\}
\]
\[
S(m) = \Phi\Big(\frac{c + \mu m}{\sqrt{m + \sigma^2 m^2}}\Big) - e^{-2\mu c + 2\sigma^2 c^2}\,\Phi\Big(\frac{-c + \mu m - 2\sigma^2 c m}{\sqrt{m + \sigma^2 m^2}}\Big).
\]
3. In the first-passage-time parameterization with inverse Gaussian lifetime distribution, the log-likelihood functions (4.27) and (4.6) with the plugged-in (4.26) or (4.36) are rather messy, especially when we attempt to evaluate the derivatives of the log-likelihood function in order to run a gradient search; see the supplementary material. Without gradients provided, one may rely on crude nonlinear optimization algorithms, and the parameter estimation follows the standard MLE procedure. Our discussion in this section mainly illustrates the feasibility of the dual-time structural approach.
4.4 Dual-time Cox Regression

The semiparametric Cox models are popular for their effectiveness in estimating covariate effects. Suppose that we are given the dual-time-to-default data (4.1) together with the covariates $\mathbf z_{ji}(m)$ observed for $m \in (U_{ji}, M_{ji}]$, which may include the static covariates $\tilde{\mathbf z}_{ji}(m) \equiv \tilde{\mathbf z}_{ji}$. Using the traditional CoxPH modeling approach, one may either adopt an arbitrary baseline in lifetime,
\[
\lambda_{ji}(m) = \lambda_0(m)\exp\{\boldsymbol\theta^T \mathbf z_{ji}(m)\}, \tag{4.37}
\]
or adopt the stratified Cox model with arbitrary vintage-specific baselines,
\[
\lambda_{ji}(m) = \lambda_{j0}(m)\exp\{\boldsymbol\theta^T \mathbf z_{ji}(m)\} \tag{4.38}
\]
for $i = 1,\ldots,n_j$ and $j = 1,\ldots,J$. However, these two Cox models may not be suitable for (4.1) observed on the Lexis diagram, in the following sense:

1. (4.37) is too stringent to capture the variation of vintage-specific baselines;

2. (4.38) is too relaxed to capture the interrelation of vintage-specific baselines.

For example, we have simulated a Lexis diagram in Figure 4.1 whose vintage-specific hazards are interrelated not only in lifetime $m$ but also in calendar time $t$.
4.4.1 Dual-time Cox Models

The dual-time Cox models considered in this section take the following general multiplicative-hazards form,
\[
\lambda_{ji}(m,t) = \lambda_0(m,t)\exp\{\boldsymbol\theta^T \mathbf z_{ji}(m)\} \tag{4.39}
\]
where $\lambda_0(m,t)$ is a bivariate base hazard function on the Lexis diagram. Upon specification of $\lambda_0(m,t)$, the traditional models (4.37) and (4.38) are included as special cases. In order to balance between (4.37) and (4.38), we assume the MEV decomposition framework for the base hazard, $\lambda_0(m,t) = \lambda_f(m)\lambda_g(t)\lambda_h(t-m)$, i.e. the nonparametric model (4.21). Thus, we restrict ourselves to the dual-time Cox models with MEV baselines and relative-risk multipliers
\[
\lambda_{ji}(m,t) = \lambda_f(m)\lambda_g(t)\lambda_h(V_j)\exp\{\boldsymbol\theta^T \mathbf z_{ji}(m)\} = \exp\{f(m) + g(t) + h(V_j) + \boldsymbol\theta^T \mathbf z_{ji}(m)\} \tag{4.40}
\]
where $\lambda_f(m) = \exp\{f(m)\}$ represents the arbitrary maturation/endogenous baseline, while $\lambda_g(t) = \exp\{g(t)\}$ and $\lambda_h(v) = \exp\{h(v)\}$ represent the exogenous multiplier and the vintage heterogeneity, subject to $\mathbb{E}g = \mathbb{E}h = 0$. Some necessary binning or trend removal needs to be imposed on $g, h$ to ensure model estimability; see Lemma 3.1.
Given the finite-sample observations (4.1) on the Lexis diagram, one may always find $L_m$ distinct lifetime points $m_{[1]} < \cdots < m_{[L_m]}$ and $L_t$ distinct calendar time points $t_{[1]} < \cdots < t_{[L_t]}$ such that there exists at least one default event observed at the corresponding marginal time. Following the least-information assumption of Breslow (1972), let us treat the endogenous baseline hazard $\lambda_f(m)$ and the exogenous baseline hazard $\lambda_g(t)$ as pointwise functions of the form (4.17) subject to $\sum_{l=1}^{L_t}\beta_l = 0$. For the discrete set of vintage originations, the vintage heterogeneity effects can essentially be regarded as categorical covariate effects. We may either assume the pointwise function of the form $h(v) = \sum_{j=1}^{J}\gamma_j I(v = V_j)$ subject to constraints on both mean and trend (i.e. the ad hoc approach), or assume the piecewise-constant functional form upon appropriate binning of vintage originations,
\[
h(V_j) = \gamma_\kappa, \quad \text{if } j \in \mathcal{G}_\kappa,\ \kappa = 1,\ldots,K \tag{4.41}
\]
where the vintage buckets $\mathcal{G}_\kappa = \{j : V_j \in (\nu_{\kappa-1}, \nu_\kappa]\}$ are prespecified based on $V_1^{-} \equiv \nu_0 < \cdots < \nu_K \equiv V_J$ (where $K < J$). Then the semiparametric model (4.40) is parametrized with dummy variables as
\[
\lambda_{ji}(m,t) = \exp\Big\{\sum_{l=1}^{L_m}\alpha_l I(m = m_{[l]}) + \sum_{l=2}^{L_t}\beta_l I(t = t_{[l]}) + \sum_{\kappa=2}^{K}\gamma_\kappa I(j \in \mathcal{G}_\kappa) + \boldsymbol\theta^T \mathbf z_{ji}(m)\Big\} \tag{4.42}
\]
where $\beta_1 \equiv 0$ and $\gamma_1 \equiv 0$ are excluded from the model due to the identifiability constraints. Plugging this into the joint likelihood (4.6), one may then find the MLE of $(\alpha, \beta, \gamma, \boldsymbol\theta)$. (Note that it is rather straightforward to convert the resulting parameter estimates $\hat\alpha$, $\hat\beta$ and $\hat\gamma$ to satisfy the zero-sum constraints for $\beta$ and $\gamma$; the same holds for (4.45), to be studied below.)
As the sample size tends to infinity, both the endogenous and exogenous baseline hazards $\lambda_f(m)$ and $\lambda_g(t)$ defined on the continuous dual-time scales $(m,t)$ become infinite-dimensional, making the above pointwise parameterization intractable. The semiparametric estimation problem in this case is challenging, as the dual-time Cox model (4.40) involves two nonparametric components that need to be profiled out in order to estimate the parametric effects. In what follows, we take an "intermediate approach" to semiparametric estimation via exogenous time-scale discretization, to which the theory of partial likelihood applies.
4.4.2 Partial Likelihood Estimation

Consider an intermediate reformulation of the dual-time Cox models (4.40) via exogenous time-scale discretization,
\[
t_l = t_{\min} + l\Delta t, \quad \text{for } l = 0, 1, \ldots, L \tag{4.43}
\]
for some time increment $\Delta t$ bounded away from zero, where $t_{\min}$ corresponds to the left boundary and $t_{\min} + L\Delta t$ to the right boundary of the calendar time under consideration. (If the Lexis diagram is itself discrete, $\Delta t$ is automatically defined; irregular time spacing is feasible, too.) On each subinterval, assume the exogenous baseline $\lambda_g(t)$ is defined on the $L$ cutting points (and zero otherwise),
\[
g(t) = \sum_{l=1}^{L}\beta_l I(t = t_{\min} + l\Delta t), \tag{4.44}
\]
subject to $\sum_{l=1}^{L}\beta_l = 0$. Then consider the one-way CoxPH reformulation of (4.40):
$$\lambda_{ji}(m) = \lambda_f(m)\exp\bigg\{\sum_{l=2}^{L}\beta_l I\big(V_j + m \in (t_{l-1}, t_l]\big) + \sum_{\kappa=2}^{K}\gamma_\kappa I(j \in \mathcal{G}_\kappa) + \theta^T z_{ji}(m)\bigg\} \qquad (4.45)$$

where $\lambda_f(m)$ is an arbitrary baseline hazard, $g(t)$ for any $t \in (t_{l-1}, t_l]$ in (4.40) is shifted to $g(t_l)$ in (4.44), and the vintage effect is bucketed by (4.41). It is easy to see that if the underlying Lexis diagram is discrete, the reformulated CoxPH model (4.45) is equivalent to the fully parametrized version (4.42).
We consider the partial likelihood estimation of the parameters $(\beta, \gamma, \theta) \equiv \xi$. For convenience of discussion, write for each $(j, i, m)$ the $(L-1)$-vector $\{I(V_j + m \in (t_{l-1}, t_l])\}_{l=2}^{L}$, the $(K-1)$-vector $\{I(V_j \in (\nu_{\kappa-1}, \nu_\kappa])\}_{\kappa=2}^{K}$, and $z_{ji}(m)$ jointly as a long vector $x_{ji}(m)$. Then, by Cox (1972, 1975), the parameter $\xi$ can be estimated by maximizing the partial likelihood
$$PL(\xi) = \prod_{j=1}^{J}\prod_{i=1}^{n_j}\Bigg[\frac{\exp\{\xi^T x_{ji}(M_{ji})\}}{\sum_{(j',i') \in \mathcal{R}_{ji}} \exp\{\xi^T x_{j'i'}(M_{ji})\}}\Bigg]^{\Delta_{ji}} \qquad (4.46)$$

where $\mathcal{R}_{ji} = \{(j', i') : U_{j'i'} < M_{ji} \leq M_{j'i'}\}$ is the at-risk set at each observed $M_{ji}$.
Given the maximum partial likelihood estimator $\hat\xi$, the endogenous cumulative baseline hazard $\Lambda_f(m) = \int_0^m \lambda_f(s)\,ds$ can be estimated by the Breslow estimator

$$\hat\Lambda_f(m) = \sum_{j=1}^{J}\sum_{i=1}^{n_j}\frac{I(m \geq M_{ji})\,\Delta_{ji}}{\sum_{(j',i') \in \mathcal{R}_{ji}}\exp\{\hat\xi^T x_{j'i'}(M_{ji})\}} \qquad (4.47)$$
which is a step function with increments at $m_{[1]} < \cdots < m_{[L_m]}$. It is well known in the CoxPH context that both estimators $\hat\xi$ and $\hat\Lambda_f(\cdot)$ are consistent and asymptotically normal; see Andersen and Gill (1982), based on counting process martingale theory. Specifically, $\sqrt{N}(\hat\xi - \xi)$ converges to a zero-mean multivariate normal distribution with covariance matrix that can be consistently estimated by $\{\mathcal{I}(\hat\xi)/N\}^{-1}$, where $\mathcal{I}(\xi) = -\partial^2 \log PL(\xi)\big/\partial\xi\partial\xi^T$. For the nonparametric baseline, $\sqrt{N}\big(\hat\Lambda_f(m) - \Lambda_f(m)\big)$ converges to a zero-mean Gaussian process; see also Andersen, et al. (1993).
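As a concrete illustration of (4.46)-(4.47), the following minimal numpy sketch computes the Breslow estimator of the cumulative baseline hazard for a small artificial data set (all names and data here are hypothetical, and left truncation is ignored for brevity); with all covariates zero, it reduces to the Nelson-Aalen estimator.

```python
import numpy as np

def breslow(times, events, X, xi):
    """Breslow estimator of the cumulative baseline hazard, cf. (4.47).

    times  : observed event/censoring times M_i
    events : indicators Delta_i
    X      : covariate matrix (one row per subject)
    xi     : fitted coefficient vector
    Returns the unique event times and the estimator evaluated there.
    """
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    risk_scores = np.exp(np.asarray(X, float) @ np.asarray(xi, float))
    event_times = np.unique(times[events == 1])
    increments = []
    for t in event_times:
        d = events[times == t].sum()           # events at time t
        denom = risk_scores[times >= t].sum()  # sum over the at-risk set
        increments.append(d / denom)
    return event_times, np.cumsum(increments)

# toy data: 4 subjects, one (all-zero) covariate
times = [1.0, 2.0, 2.0, 3.0]
events = [1, 1, 0, 1]
X = [[0.0], [0.0], [0.0], [0.0]]
t, Lam = breslow(times, events, X, [0.5])
# with all covariates zero this is Nelson-Aalen: increments 1/4, 1/3, 1/1
```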
The DtBreslow and MEV estimators discussed in Section 4.2 can be studied by the theory of partial likelihood above, since the underlying nonparametric models are special cases of (4.45) with $\theta \equiv 0$. Without the account-level covariates, the hazard rates $\lambda_{ji}(m)$ roll up to $\lambda_j(m)$, under which the partial likelihood function (4.46) can be expressed through the counting notations introduced in (4.9),

$$PL(g, h) = \prod_{j=1}^{J}\prod_{m}\Bigg[\frac{\exp\{g(V_j + m) + h(V_j)\}}{\sum_{j'=1}^{J} n^{\mathrm{risk}}_{j'}(m)\exp\{g(V_{j'} + m) + h(V_{j'})\}}\Bigg]^{n^{\mathrm{event}}_j(m)} \qquad (4.48)$$
where $g(t)$ is parameterized by (4.44) and $h(v)$ by (4.41). Let $\hat\lambda_g(t) = \exp\{\hat g(t)\}$ and $\hat\lambda_h(v) = \exp\{\hat h(v)\}$ upon maximization of $PL(g, h)$ and centering in the sense of $\mathrm{E}g = \mathrm{E}h = 0$. Then, by (4.47), we have the Breslow estimator of the cumulative version of the endogenous baseline hazard

$$\hat\Lambda_f(m) = \int_0^m \frac{\sum_{j=1}^{J} n^{\mathrm{event}}_j(s)\,ds}{\sum_{j=1}^{J} n^{\mathrm{risk}}_j(s)\,\hat\lambda_g(V_j + s)\,\hat\lambda_h(V_j)}. \qquad (4.49)$$

It is clear that the increments of $\hat\Lambda_f(m)$ correspond to the maturation effect (4.22) of the MEV estimator. They also reduce to (4.18) of the DtBreslow estimator in the absence of vintage heterogeneity.
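In the pooled-count notation, the increments of (4.49) can be computed directly from the event and risk counts. A minimal sketch (toy numbers, hypothetical names); setting the exogenous and vintage multipliers to one recovers the pooled Nelson-Aalen increments:

```python
import numpy as np

def mev_baseline_increments(nevent, nrisk, lam_g, lam_h):
    """Increments of the Breslow-type estimator (4.49).

    nevent[j, s] : events of vintage j in lifetime bin s
    nrisk[j, s]  : at-risk counts of vintage j in lifetime bin s
    lam_g[j, s]  : exogenous multiplier lambda_g(V_j + s)
    lam_h[j]     : vintage multiplier lambda_h(V_j)
    """
    nevent = np.asarray(nevent, float)
    nrisk = np.asarray(nrisk, float)
    denom = (nrisk * np.asarray(lam_g, float)
             * np.asarray(lam_h, float)[:, None]).sum(axis=0)
    return nevent.sum(axis=0) / denom

nevent = [[1, 0], [0, 2]]
nrisk = [[10, 9], [20, 20]]
ones = np.ones((2, 2))
inc = mev_baseline_increments(nevent, nrisk, ones, [1.0, 1.0])
# with unit multipliers: pooled Nelson-Aalen increments 1/30 and 2/29
```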
The implementation of dual-time Cox modeling based on (4.45) can be carried out by maximizing the partial likelihood (4.46) to find $\hat\xi$ and using (4.47) to estimate the baseline hazard. Standard software with a survival package or toolbox can be used to perform the parameter estimation. In case the coxph procedure in these survival packages (e.g. R:survival) requires time-independent covariates, we may convert (4.1) to an enlarged counting-process format with piecewise-constant approximation of the time-varying covariates $x_{ji}(m)$; see the supplementary material in the last section of the chapter.

For example, we have tested the partial likelihood estimation using coxph of the R:survival package and the simulation data in Figure 4.1, where the raw data (4.1) are converted to the counting-process format upon the same binning preprocessing as in Section 4.2.3. The numerical results of $\hat\lambda_g, \hat\lambda_h$ by maximizing (4.48) and $\hat\lambda_f$ by (4.49) are nearly identical to the plots shown in Figure 4.3.
4.4.3 Frailtytype Vintage Eﬀects
Recall the frailty models introduced in Section 1.3 about their double roles in
charactering both the odd eﬀects of unobserved heterogeneity and the dependence
of correlated defaults. They provide an alternative approach to model the vintage
eﬀects as the random block eﬀects. Let Z
κ
for κ = 1, . . . , K be a random sample from
116
some frailty distribution, where Z
κ
represents the shared frailty level of the vintage
bucket (
κ
according to (4.41). Consider the dualtime Cox frailty models
λ
ji
(m, t[Z
κ
) = λ
f
(m)λ
g
(t)Z
κ
exp¦θ
T
z
ji
(m)¦, if j ∈ (
κ
(4.50)
for i = 1, . . . , n
j
and j ∈ (
κ
for κ = 1, . . . , K. We also write λ
ji
(m, t[Z
κ
) as λ
ji
(m[Z
κ
)
with t ≡ V
j
+ m. It is assumed that given the frailty Z
κ
, the observations for all
vintages in (
κ
are independent. In this frailty model setting, the betweenbucket
heterogeneity is captured by realizations of the random variate Z, while the within
bucket dependence is captured by the same realization Z
κ
.
Similar to the intermediate approach taken by the partial likelihood (4.46) under the exogenously discretized dual-time model (4.45), let us write $\tilde\xi = (\beta, \theta)$ and let $\tilde x_{ji}(m)$ join the vectors $\{I(V_j + m \in (t_{l-1}, t_l])\}_{l=2}^{L}$ and $z_{ji}(m)$. Then, we may write (4.50) as $\lambda_{ji}(m \mid Z_\kappa) = \lambda_f(m) Z_\kappa \exp\{\tilde\xi^T \tilde x_{ji}(m)\}$. The log-likelihood based on the dual-time-to-default data (4.1) is given by

$$\ell = \sum_{j=1}^{J}\sum_{i=1}^{n_j}\Delta_{ji}\Big[\log\lambda_f(M_{ji}) + \tilde\xi^T \tilde x_{ji}(M_{ji})\Big] + \sum_{\kappa=1}^{K}\log\Big[(-1)^{d_\kappa}\mathcal{L}^{(d_\kappa)}(A_\kappa)\Big], \qquad (4.51)$$

where $\mathcal{L}(x) = \mathrm{E}_Z[\exp\{-xZ\}]$ is the Laplace transform of $Z$, $\mathcal{L}^{(d)}(x)$ is its $d$-th derivative, $d_\kappa = \sum_{j \in \mathcal{G}_\kappa}\sum_{i=1}^{n_j}\Delta_{ji}$, and $A_\kappa = \sum_{j \in \mathcal{G}_\kappa}\sum_{i=1}^{n_j}\int_{U_{ji}}^{M_{ji}}\lambda_f(s)\exp\{\tilde\xi^T \tilde x_{ji}(s)\}\,ds$; see the supplementary material at the end of the chapter.
The gamma distribution $\mathrm{Gamma}(\delta^{-1}, \delta)$ with mean 1 and variance $\delta$ is the most frequently used frailty due to its simplicity. It has the Laplace transform $\mathcal{L}(x) = (1+\delta x)^{-1/\delta}$, whose $d$-th derivative is given by $\mathcal{L}^{(d)}(x) = (-\delta)^d (1+\delta x)^{-1/\delta-d}\prod_{i=1}^{d}(1/\delta + i - 1)$. For a fixed $\delta$, the EM (expectation-maximization) algorithm is often used for estimating $(\lambda_f(\cdot), \tilde\xi)$; see Nielsen, et al. (1992) or the supplementary material.
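The closed-form derivative of the gamma-frailty Laplace transform above can be checked numerically. The sketch below (hypothetical helper names) compares the closed form for $d = 1$ against a central finite-difference approximation:

```python
def gamma_laplace(x, delta):
    """Laplace transform L(x) = (1 + delta*x)^(-1/delta) of Gamma(1/delta, delta)."""
    return (1.0 + delta * x) ** (-1.0 / delta)

def gamma_laplace_deriv(x, delta, d):
    """Closed-form d-th derivative:
    (-delta)^d (1 + delta*x)^(-1/delta - d) prod_{i=1}^d (1/delta + i - 1)."""
    prod = 1.0
    for i in range(1, d + 1):
        prod *= 1.0 / delta + i - 1
    return (-delta) ** d * (1.0 + delta * x) ** (-1.0 / delta - d) * prod

# finite-difference check of the first derivative at x = 2.0, delta = 0.5
x, delta, h = 2.0, 0.5, 1e-6
numeric = (gamma_laplace(x + h, delta) - gamma_laplace(x - h, delta)) / (2 * h)
exact = gamma_laplace_deriv(x, delta, 1)
# here L(x) = (1 + x/2)^(-2), so L'(2) = -(1 + 1)^(-3) = -0.125
```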
Alternatively, the gamma frailty models can be estimated by the penalized likelihood approach, treating the second term of (4.51) as a kind of regularization. Therneau and Grambsch (2000; §9.6) justified that for each fixed $\delta$, the EM algorithmic solution coincides with the maximizer of the following penalized log-likelihood

$$\sum_{j=1}^{J}\sum_{i=1}^{n_j}\Delta_{ji}\Big[\log\lambda_f(M_{ji}) + \xi^T x_{ji}(M_{ji})\Big] - \frac{1}{\delta}\sum_{\kappa=1}^{K}\Big[\gamma_\kappa - \exp\{\gamma_\kappa\}\Big] \qquad (4.52)$$

where $\xi = (\beta, \gamma, \theta)$ and $x_{ji}(m)$ are the same as in (4.46). They also provide programs (in the R:survival package) with a Newton-Raphson iterative solution as an inner loop and selection of $\delta$ in an outer loop. Fortunately, these programs can be used to fit the dual-time Cox frailty models (4.50) up to certain modifications.
Remarks:
1. The dual-time Cox regression considered in this section is reduced to standard CoxPH problems upon exogenous time-scale discretization, such that the existing theory of partial likelihood and its asymptotic results can be applied. We have not studied the dual-time asymptotic behavior when both lifetime and calendar time are considered to be absolutely continuous. In practice with finite-sample observations, the time-discretization trick works well.
2. The dual-time Cox model (4.40) includes only the endogenous covariates $z_{ji}(m)$ that vary across individuals. In practice, it is also interesting to examine the effects of exogenous covariates $\bar z(t)$ that are invariant across individuals at the same calendar time. Such $\bar z(t)$ might be entertained by the traditional lifetime Cox models (4.37) upon the time shift; however, they are not estimable if included as a log-linear term in (4.40), since $\exp\{\theta^T \bar z(t)\}$ would be absorbed by the exogenous baseline $\lambda_g(t)$. Therefore, in order to investigate the exogenous covariate effects of $\bar z(t)$, we recommend a two-stage procedure, with the first stage fitting the dual-time Cox models (4.40) and the second stage modeling the correlation between $\bar z(t)$ and the estimated $\hat\lambda_g(t)$ based on time series techniques.
4.5 Applications in Retail Credit Risk Modeling

Retail credit risk modeling has become increasingly important in recent years; see, among others, the special issues edited by Berlin and Mester (Journal of Banking and Finance, 2004) and Wallace (Real Estate Economics, 2005). In the quantitative understanding of the risk determinants of retail credit portfolios, there have been large gaps among academic researchers, bank practitioners and government regulators. It turns out that many existing models for corporate credit risk (e.g. the Merton model) are not directly applicable to retail credit.

In this section we demonstrate the application of dual-time survival analysis to credit card and mortgage portfolios in retail banking. Based on our real data-analytic experience, some tweaked samples are considered here, solely for the purpose of methodological illustration. Specifically, we employ pooled credit card data to illustrate the dual-time nonparametric methods, and use mortgage loan-level data to illustrate dual-time Cox regression modeling. Before applying these DtSA techniques, we assessed them under the simulation mechanism (4.2); see Figure 4.2 and Figure 4.3.
4.5.1 Credit Card Portfolios
As we have introduced in Section 1.4, the credit card portfolios range from product
types, acquisition channels, geographical regions and the like. These attributes may
be either treated as the endogenous covariates, or used to deﬁne the multiple seg
ments. Here, we demonstrate the credit card risk modeling with both segmentation
and covariates. For simplicity, we consider two generic segments and a categorical
covariate with three levels, where the categorical variable is deﬁned by the credit score
buckets upon loan origination: Low if FICO < 640, Medium if 640 ≤ FICO < 740
and High if FICO ≥ 740. Other variables like the credit line and utilization rate are
not considered here.
Table 4.1: Illustrative data format of pooled credit card loans

SegID  FICO    V       U  M  D  W
1      Low     200501  0  3  0  988
1      Low     200501  0  3  1  2
1      Low     200501  3  4  0  993
1      Low     200501  3  4  1  5
1      Low     200501  4  5  0  979
1      Low     200501  4  5  1  11
1      Low     200501  ...
1      Low     200502  ...
1      Medium  ...
2      ...

V: vintage. [U,M]: lifetime interval. D: status. W: counts.
In this simple case, one may aggregate the observations at the vintage level for each FICO bucket in each segment. See Table 4.1 for an illustration of the counting format of the input data, where each single vintage at each lifetime interval [U, M] has two records of weight W, namely the number of surviving loans (D = 0) and the number of defaulted loans (D = 1). Note that the last column of Table 4.1 also illustrates that in the lifetime interval [3, 4], the 200501 vintage has 3 attrition accounts (i.e. 993 − 979 − 11), which are examples of random censoring.
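The weighted counting format lends itself to direct computation of empirical hazards and implied attrition. A minimal sketch over the first rows of Table 4.1 (helper names are hypothetical):

```python
# rows of Table 4.1 for the 200501 vintage: (U, M, D, W)
rows = [
    (0, 3, 0, 988), (0, 3, 1, 2),
    (3, 4, 0, 993), (3, 4, 1, 5),
    (4, 5, 0, 979), (4, 5, 1, 11),
]

def empirical_hazard(rows, u, m):
    """Empirical hazard on [u, m]: defaults / (survivors + defaults)."""
    survived = sum(w for (a, b, d, w) in rows if (a, b, d) == (u, m, 0))
    defaulted = sum(w for (a, b, d, w) in rows if (a, b, d) == (u, m, 1))
    return defaulted / (survived + defaulted)

h34 = empirical_hazard(rows, 3, 4)   # 5 defaults out of 998 at risk
# attrition between [3,4] and [4,5]: 993 survivors minus (979 + 11)
attrition = 993 - (979 + 11)         # 3 randomly censored accounts
```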
Let us consider the card vintages originated from 2001 to 2008, with observations (a) left-truncated at the beginning of 2005, (b) right-censored at the end of 2008, and (c) up-censored at a 48 months-on-book maximum. Refer to Figure 1.3 for the vintage diagram of dual-time data collection, which resembles the rectangular Lexis diagram of Figure 4.1. First of all, we explore the empirical hazards of the two hypothetical segments, regardless of the FICO variable.
We applied the one-way, two-way and three-way nonparametric methods developed in Section 4.2 to each segment of the dual-time-to-default data. Note that given the pooled data with weights W for D = 0, 1, it is rather straightforward to modify the iterative nonparametric estimators in (4.14)-(4.24), and one may also use the weighted partial likelihood estimation based on our discussion in Section 4.4. The numerical results for both segments are shown in Figure 4.4, where we may compare the performances of the two segments as follows.

Figure 4.4: Nonparametric analysis of credit card risk: (top) one-way empirical hazards, (middle) DtBreslow estimation, (bottom) MEV decomposition.
1. The top panel shows the empirical hazard rates in both lifetime and calendar time. By such one-way estimates, Segment 1 has shown better lifetime performance (i.e. lower hazards) than Segment 2, and it has also shown better calendar-time performance than Segment 2 except for the latest couple of months.
2. The middle panel shows the results of the DtBreslow estimator that simultaneously calibrates the endogenous seasoning $\lambda_f(m)$ and the exogenous cycle effects $\lambda_g(t)$ under the two-way hazard model (4.16). Compared to the one-way calibration above, the seasoning effects are similar, but the exogenous cycle effects show a dramatic difference. Conditional on the endogenous baseline, the credit cycle of Segment 2 grows relatively steadily, while Segment 1 has a sharp exogenous trend since t = 30 (mid-2007) and exceeds Segment 2 since about t = 40.
3. The bottom panel takes the vintage heterogeneity into account, where we consider only the yearly vintages for simplicity. The estimates of $\lambda_f(m)$ and $\lambda_g(t)$ are both affected after adding the vintage effects, since the more heterogeneity is explained by the vintage originations, the less variation is attributed to the endogenous and exogenous baselines. Specifically, the vintage performance in both segments deteriorated from pre-2005 to 2007, and made a turn when entering 2008. Note that Segment 2 has a dramatic improvement of vintage originations in 2008.
Next, consider the three FICO buckets for each segment, where we still take the nonparametric approach for each of the 2 × 3 segments. The MEV decompositions of the empirical hazards are shown in Figure 4.5. It is found that in lifetime the low, medium and high FICO buckets have decreasing levels of endogenous baseline hazard rates, where for Segment 2 the hazard rates are evidently non-proportional. The exogenous cycles of the low, medium and high FICO buckets co-move (more or less) within each segment. As for the vintage heterogeneity, it is interesting to note that for Segment 1 the high FICO vintages keep deteriorating from pre-2005 to 2008, which is a risky signal. The vintage heterogeneity in the other FICO buckets is mostly consistent with the overall pattern in Figure 4.4.

Figure 4.5: MEV decomposition of credit card risk for low, medium and high FICO buckets: Segment 1 (top panel); Segment 2 (bottom panel).

This example mainly illustrates the effectiveness of the dual-time nonparametric methods when there are limited or no covariates provided. With the next example of mortgage loans, we shall demonstrate the dual-time Cox regression modeling with various covariate effects on competing risks.
4.5.2 Mortgage Competing Risks

The U.S. residential mortgage market is huge. As of March 2008, the first-lien mortgage debt was estimated to be about $10 trillion (for 54.7 million loans), with a distribution of $1.2 trillion of subprime mortgages, $8.2 trillion of prime and near-prime combined, and the remainder being government-guaranteed; see the U.S. Federal Reserve report by Frame, Lehnert and Prescott (2008). In brief, prime mortgages generally target borrowers with good credit histories and normal down payments, while subprime mortgages are made to borrowers with poor credit and high leverage.

The rise in mortgage defaults is unprecedented. Mayer, Pence and Sherlund (2009) described the rising defaults in nonprime mortgages in terms of underwriting standards and macroeconomic factors. Quantitatively, Sherlund (2008) considered subprime mortgages via proportional hazards models (4.37) with presumed fixed lifetime baselines (hence, a parametric setting). One may refer to Gerardi, et al. (2008) and Goodman, et al. (2008) for further details of the subprime issue. In retrospect, the 2007-08 collapse of the mortgage market has rather complicated causes, including loose underwriting standards, increased leverage, originate-then-securitize incentives (a.k.a. moral hazard), the fall of home prices, and worsening unemployment rates.
We make an attempt via dual-time survival analysis to understand the risk determinants of mortgage default and prepayment. Considered here is a tweaked sample of mortgage loans originated from 2001 to 2007, with a Lexis diagram of observations (a) left-truncated at the beginning of 2005, (b) right-censored by October 2008, and (c) up-censored at a 60 months-on-book maximum. These loans resemble the non-conforming mortgages in California, where the regional macroeconomic conditions have become worse and worse since mid-2006; see Figure 4.6 for plots of the 2000-08 home price indices and unemployment rate, based on up-to-date data sources from S&P/Case-Shiller and the U.S. Bureau of Labor Statistics.

Figure 4.6: Home prices and unemployment rate of California state: 2000-2008.

Figure 4.7: MEV decomposition of mortgage hazards: default vs. prepayment.
For a mortgage loan with competing risks caused by default and prepayment, the observed event time corresponds to the time to the first event or the censoring time. One may refer to Lawless (2003; §9) for the modeling issues in survival analysis with competing risks. One practically simple approach is to reduce the multiple events to individual survival analyses by treating the alternative events as censored. Then, we may fit two marginal hazard rate models for default and prepayment, respectively. One may refer to Therneau and Grambsch (2000; §8) for robust variance estimation for the marginal Cox models of competing risks.
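This reduction amounts to re-coding the event indicator per cause. A minimal sketch with a hypothetical status code (0 = censored, 1 = default, 2 = prepay):

```python
def marginal_indicators(status, cause):
    """Treat every event other than `cause` as censored (0/1 indicators)."""
    return [1 if s == cause else 0 for s in status]

status = [1, 2, 0, 2, 1, 0]          # toy status codes per loan
delta_default = marginal_indicators(status, 1)
delta_prepay = marginal_indicators(status, 2)
# each loan contributes an event to at most one marginal analysis
```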
We begin with a nonparametric exploration of the endogenous and exogenous hazards, as well as the vintage heterogeneity effects. Similar to the credit card risk modeling above, the monthly vintages are grouped yearly into pre-2005, 2005, 2006 and 2007 originations. The fitting results of the MEV hazard decomposition are shown in Figure 4.7 for both default and prepayment. Conditional on the estimated endogenous baseline hazards of default or prepayment, we find that
1. The exogenous cycle of the default risk was low (with exogenous multiplier less than 1) prior to t = 20 (August 2006), then rose sharply, up to about 50 times riskier at about t = 43 (July 2008). In contrast, the exogenous cycle of the prepayment risk has shown a modest downturn trend during the same calendar time period.
2. The vintage heterogeneity plot implies that the 2005-06 originations have a much higher probability of default than the mortgage originations in other years. In contrast, the vintage heterogeneity of prepayment keeps decreasing from pre-2005 to 2007. These findings match the empirical evidence presented lately by Gerardi, et al. (2008).
Comparing the estimated exogenous effects on default hazards to the macroeconomic conditions in Figure 4.6, it is obvious that the rise in mortgage defaults follows closely the fall of home prices and the worsening unemployment rate (the default hazard being negatively correlated with home prices and positively correlated with unemployment). However, based on our demonstration data, it is not possible to quantify what percentage of the exogenous effects should be attributed to either house prices or unemployment, because (a) house prices and unemployment in California are themselves highly correlated from 2005 to 2008, and (b) the exogenous effects on default hazards could also be caused by other macroeconomic indicators. One may of course perform multivariate time series analysis in order to study the correlation between exogenous hazards and macroeconomic indicators, which is beyond the scope of the present thesis.
Next, we try to decode the vintage effect of unobserved heterogeneity by adding the loan-level covariates. For demonstration purposes, we consider only the few mortgage covariates listed in Table 4.2, where CLTV stands for the combined loan-to-value leverage ratio, FICO is the same credit score measurement as in the credit card risk analysis, and the other underwriting covariates upon loan origination can be referred to Goodman, et al. (2008; §1) or Wikipedia. We also include the mortgage note rate, either fixed or floating (i.e., adjustable), to demonstrate the capability of Cox modeling with time-varying covariates. Note that the categorical variable settings are simplified in Table 4.2, and they could be far more complicated in real situations. The mean, standard deviation and range statistics presented for the continuous covariates CLTV, FICO and NoteRate are calculated from our tweaked sample.
The dual-time Cox regression models considered here take the form of (4.4) with both endogenous and exogenous baselines. After incorporating the underwriting and dynamic covariates in Table 4.2, we have

$$\lambda^{(q)}_{ji}(m, t) = \lambda^{(q)}_f(m)\,\lambda^{(q)}_g(t)\exp\Big\{\theta^{(q)}_1\mathrm{CLTV}_{ji} + \theta^{(q)}_2\mathrm{FICO}_{ji} + \theta^{(q)}_3\mathrm{NoteRate}_{ji}(m) + \theta^{(q)}_4\mathrm{DocFull}_{ji} + \theta^{(q)}_5\mathrm{IO}_{ji} + \theta^{(q)}_{6.p}\mathrm{Purpose.purchase}_{ji} + \theta^{(q)}_{6.r}\mathrm{Purpose.refi}_{ji}\Big\}, \qquad (4.53)$$
Table 4.2: Loan-level covariates considered in mortgage credit risk modeling, where NoteRate could be dynamic and the others are static upon origination.

CLTV           µ = 78, σ = 13, range [10, 125]
FICO           µ = 706, σ = 35, range [530, 840]
NoteRate       µ = 6.5, σ = 1.1, range [1, 11]
Documentation  full or limited
IO Indicator   interest-only or not
Loan Purpose   purchase, refi or cash-out refi

Table 4.3: Maximum partial likelihood estimation of mortgage covariate effects in dual-time Cox regression models.

Covariate   Default   Prepayment
CLTV         2.841     −0.285
FICO        −0.500      0.385
NoteRate     1.944      0.2862
DocFull     −1.432     −0.091
IO           1.202      0.141
Purpose.p   −1.284      0.185
Purpose.r   −0.656     −0.307
where the superscript $(q)$ indicates the competing-risk hazards of default or prepayment, the continuous covariates FICO, CLTV and NoteRate are all scaled to have zero mean and unit variance, and the 3-level Purpose covariate is broken into 2 dummy variables (Purpose.p and Purpose.r, against the baseline level of cash-out refi). For simplicity, we have assumed log-linear effects of all 3 continuous covariates on both the default and prepayment hazards, while in practice one may need to strive to find appropriate functional forms or break them into multiple buckets.
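The covariate preparation just described (standardizing the continuous covariates and dummy-coding the 3-level Purpose against the cash-out refi baseline) can be sketched as follows (toy values, hypothetical names):

```python
import numpy as np

def standardize(x):
    """Scale to zero mean and unit variance, as for CLTV, FICO, NoteRate."""
    x = np.asarray(x, float)
    return (x - x.mean()) / x.std()

def dummy_code(purpose):
    """3-level Purpose -> two dummies (purchase, refi) vs cash-out baseline."""
    return np.array([[1.0 if p == "purchase" else 0.0,
                      1.0 if p == "refi" else 0.0] for p in purpose])

cltv = standardize([60, 80, 90, 82])
dummies = dummy_code(["purchase", "refi", "cashout", "purchase"])
# the cash-out refi row maps to [0, 0], i.e. the baseline level
```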
By the partial likelihood estimation discussed in Section 4.4, we obtain the numerical results tabulated in Table 4.3, where the coefficient estimates are all statistically significant based on our demonstration data (some other non-significant covariates were actually screened out and are not considered here). From Table 4.3, we find that, conditional on the nonparametric dual-time baselines,
1. The default risk is driven up by high CLTV, low FICO, high NoteRate, limited documentation, interest-only, and cash-out refi purpose.
2. The prepayment risk is driven up by low CLTV, high FICO, high NoteRate, limited documentation, interest-only, and purchase purpose.
3. The covariates CLTV, FICO and Purpose reveal the competing nature of the default and prepayment risks. It is actually reasonable to claim that a homeowner with good credit and low leverage would more likely prepay rather than choose to default.
4.6 Summary

This chapter is devoted to survival analysis on the Lexis diagram, with coverage of nonparametric estimators, structural parameterization and semiparametric Cox regression. For the structural approach based on the inverse Gaussian first-passage-time distribution, we have discussed the feasibility of its dual-time extension and the calibration procedure, while its implementation is open to future investigation. Computational results are provided for the other dual-time methods developed in this chapter, including an assessment of the methodology through the simulation study. Finally, we have demonstrated the application of dual-time survival analysis to retail credit risk modeling. Some interesting findings in credit card and mortgage risk analysis are presented, which may shed some light on understanding the ongoing credit crisis from a new dual-time perspective.

To end this thesis, we conclude that statistical methods play a key role in credit risk modeling. Developed in this thesis is mainly the dual-time analytics, including VDA (vintage data analysis) and DtSA (dual-time survival analysis), which can be applied to model both corporate and retail credit. We have also remarked throughout the chapters on possible extensions and other open problems of interest. So, the end is also a new beginning . . .
4.7 Supplementary Materials

(A) Calibration of Inverse Gaussian Hazard Function

Consider the calibration of the two-parameter inverse Gaussian hazard rate function $\lambda(m; c, b)$ of the form (4.26) from the $n$ observations $(U_i, M_i, \Delta_i)$, $i = 1, \ldots, n$, each subject to left truncation and right censoring. Knowing the parametric likelihood given by (4.27), the MLE requires nonlinear optimization that can be carried out by gradient search (e.g. the Newton-Raphson algorithm). Provided below are some calculations of the derivatives for the gradient search purpose.

Denote by $\eta = (c, b)^T$ the structural parameters of the inverse Gaussian hazard. The log-likelihood function is given by

$$\ell = \sum_{i=1}^{n}\bigg[\Delta_i \log\lambda(M_i; \eta) - \int_{U_i}^{M_i}\lambda(s; \eta)\,ds\bigg] \qquad (4.54)$$
Taking the first- and second-order derivatives, we have the score and Hessian functions

$$\frac{\partial\ell}{\partial\eta} = \sum_{i=1}^{n}\bigg[\frac{\Delta_i}{\lambda(M_i;\eta)}\lambda'(M_i;\eta) - \int_{U_i}^{M_i}\lambda'(s;\eta)\,ds\bigg]$$

$$\frac{\partial^2\ell}{\partial\eta\partial\eta^T} = \sum_{i=1}^{n}\Bigg\{\Delta_i\bigg[\frac{\lambda''(M_i;\eta)}{\lambda(M_i;\eta)} - \frac{(\lambda'(M_i;\eta))^{\otimes 2}}{\lambda^2(M_i;\eta)}\bigg] - \int_{U_i}^{M_i}\lambda''(s;\eta)\,ds\Bigg\}$$

where $\lambda'(m;\eta) = \partial\lambda(m;\eta)/\partial\eta$, $\lambda''(m;\eta) = \partial^2\lambda(m;\eta)/\partial\eta\partial\eta^T$, and $a^{\otimes 2} = aa^T$. By supplying the gradients, the optimization algorithms in most standard software are faster and more reliable. Note that the evaluation of the integral terms above can be carried out by numerical integration methods like the composite trapezoidal or Simpson's rule; see e.g. Press, et al. (2007). Alternatively, one may replace the second term of (4.54) by $\log S(M_i;\eta) - \log S(U_i;\eta)$ and take the derivatives based on the explicit expressions below.
The complications lie in the evaluation of $\partial\lambda(m;\eta)/\partial\eta$ and $\partial^2\lambda(m;\eta)/\partial\eta\partial\eta^T$ based on the inverse Gaussian baseline hazard (4.26), which, messy though they are, are provided below. Denoting

$$z_1 = \frac{\eta_1 + \eta_2 m}{\sqrt{m}}, \qquad z_2 = \frac{-\eta_1 + \eta_2 m}{\sqrt{m}}, \qquad \phi(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2},$$

evaluate first the derivatives of the survival function (i.e. the denominator of (4.26)):

$$\frac{\partial S(m;\eta)}{\partial\eta_1} = 2\eta_2 e^{-2\eta_1\eta_2}\Phi(z_2) + \frac{1}{\sqrt{m}}\Big[\phi(z_1) + e^{-2\eta_1\eta_2}\phi(z_2)\Big]$$

$$\frac{\partial S(m;\eta)}{\partial\eta_2} = 2\eta_1 e^{-2\eta_1\eta_2}\Phi(z_2) + \sqrt{m}\Big[\phi(z_1) - e^{-2\eta_1\eta_2}\phi(z_2)\Big]$$

$$\frac{\partial^2 S(m;\eta)}{\partial\eta_1^2} = -4\eta_2^2 e^{-2\eta_1\eta_2}\Phi(z_2) - \frac{4\eta_2}{\sqrt{m}}e^{-2\eta_1\eta_2}\phi(z_2) - \frac{1}{m}\Big[\phi(z_1)z_1 - e^{-2\eta_1\eta_2}\phi(z_2)z_2\Big]$$

$$\frac{\partial^2 S(m;\eta)}{\partial\eta_1\partial\eta_2} = (2 - 4\eta_1\eta_2)e^{-2\eta_1\eta_2}\Phi(z_2) - \Big[\phi(z_1)z_1 - e^{-2\eta_1\eta_2}\phi(z_2)z_2\Big]$$

$$\frac{\partial^2 S(m;\eta)}{\partial\eta_2^2} = -4\eta_1^2 e^{-2\eta_1\eta_2}\Phi(z_2) + 4\eta_1\sqrt{m}\,e^{-2\eta_1\eta_2}\phi(z_2) - m\Big[\phi(z_1)z_1 - e^{-2\eta_1\eta_2}\phi(z_2)z_2\Big].$$

Then, the partial derivatives of $\lambda(m;\eta)$ can be evaluated explicitly by

$$\frac{\partial\lambda(m;\eta)}{\partial\eta_1} = -\lambda_0(m;\eta)\Big[\frac{\eta_1}{m} + \eta_2 - \frac{1}{\eta_1}\Big] - \frac{\lambda_0(m;\eta)}{S(m;\eta)}\frac{\partial S(m;\eta)}{\partial\eta_1}$$

$$\frac{\partial\lambda(m;\eta)}{\partial\eta_2} = -\lambda(m;\eta)(\eta_1 + \eta_2 m) - \frac{\lambda(m;\eta)}{S(m;\eta)}\frac{\partial S(m;\eta)}{\partial\eta_2}$$

$$\frac{\partial^2\lambda(m;\eta)}{\partial\eta_1^2} = -\frac{\partial\lambda(m;\eta)}{\partial\eta_1}\Big[\frac{\eta_1}{m} + \eta_2 - \frac{1}{\eta_1}\Big] - \lambda(m;\eta)\Big[\frac{1}{\eta_1^2} + \frac{1}{m}\Big] - \frac{\lambda(m;\eta)}{S(m;\eta)}\frac{\partial^2 S(m;\eta)}{\partial\eta_1^2} - \frac{1}{S(m;\eta)}\frac{\partial\lambda(m;\eta)}{\partial\eta_1}\frac{\partial S(m;\eta)}{\partial\eta_1} + \frac{\lambda(m;\eta)}{S^2(m;\eta)}\Big[\frac{\partial S(m;\eta)}{\partial\eta_1}\Big]^2$$

$$\frac{\partial^2\lambda(m;\eta)}{\partial\eta_2^2} = -\frac{\partial\lambda(m;\eta)}{\partial\eta_2}(\eta_1 + \eta_2 m) - \lambda(m;\eta)m - \frac{\lambda(m;\eta)}{S(m;\eta)}\frac{\partial^2 S(m;\eta)}{\partial\eta_2^2} - \frac{1}{S(m;\eta)}\frac{\partial\lambda(m;\eta)}{\partial\eta_2}\frac{\partial S(m;\eta)}{\partial\eta_2} + \frac{\lambda(m;\eta)}{S^2(m;\eta)}\Big[\frac{\partial S(m;\eta)}{\partial\eta_2}\Big]^2$$

$$\frac{\partial^2\lambda(m;\eta)}{\partial\eta_1\partial\eta_2} = -\frac{\partial\lambda(m;\eta)}{\partial\eta_1}(\eta_1 + \eta_2 m) - \lambda(m;\eta) - \frac{\lambda(m;\eta)}{S(m;\eta)}\frac{\partial^2 S(m;\eta)}{\partial\eta_1\partial\eta_2} - \frac{1}{S(m;\eta)}\frac{\partial\lambda(m;\eta)}{\partial\eta_1}\frac{\partial S(m;\eta)}{\partial\eta_2} + \frac{\lambda(m;\eta)}{S^2(m;\eta)}\frac{\partial S(m;\eta)}{\partial\eta_1}\frac{\partial S(m;\eta)}{\partial\eta_2}.$$
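The first of the survival-function derivatives above can be verified numerically. The stdlib-only sketch below assumes, consistently with those derivative formulas, that the survival function is $S(m;\eta) = \Phi(z_1) - e^{-2\eta_1\eta_2}\Phi(z_2)$ (an assumption, since (4.26) itself is outside this section); all helper names are hypothetical:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def S(m, e1, e2):
    """Assumed inverse Gaussian first-passage survival:
    Phi(z1) - exp(-2*e1*e2) * Phi(z2)."""
    z1 = (e1 + e2 * m) / math.sqrt(m)
    z2 = (-e1 + e2 * m) / math.sqrt(m)
    return Phi(z1) - math.exp(-2.0 * e1 * e2) * Phi(z2)

def dS_de1(m, e1, e2):
    """Closed-form dS/d(eta1), as displayed above."""
    z1 = (e1 + e2 * m) / math.sqrt(m)
    z2 = (-e1 + e2 * m) / math.sqrt(m)
    return (2.0 * e2 * math.exp(-2.0 * e1 * e2) * Phi(z2)
            + (phi(z1) + math.exp(-2.0 * e1 * e2) * phi(z2)) / math.sqrt(m))

# central finite-difference check at m = 2, eta = (1.0, 0.5)
m, e1, e2, h = 2.0, 1.0, 0.5, 1e-6
numeric = (S(m, e1 + h, e2) - S(m, e1 - h, e2)) / (2 * h)
```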
(B) Implementation of Cox Modeling with Time-varying Covariates

Consider the Cox regression model with time-varying covariates $x(m) \in \mathbb{R}^p$:

$$\lambda(m; x(m)) = \lambda_0(m)\exp\{\theta^T x(m)\} \qquad (4.55)$$

where $\lambda_0(m)$ is an arbitrary baseline hazard function and $\theta$ is the $p$-vector of regression coefficients. Provided below is the implementation of the maximum likelihood estimation of $\theta$ and $\lambda_0(m)$ by reducing it to a standard CoxPH modeling problem with time-independent covariates.

Let $\tau_i$ be the event time for each account $i = 1, \ldots, n$, subject to left truncation $U_i$ and right censoring $C_i$. Given the observations

$$(U_i, M_i, \Delta_i, x_i(m)), \quad m \in [U_i, M_i], \quad i = 1, \ldots, n, \qquad (4.56)$$

in which $M_i = \min\{\tau_i, C_i\}$ and $\Delta_i = I(\tau_i < C_i)$, the complete likelihood is given by

$$L = \prod_{i=1}^{n}\bigg[\frac{f_i(M_i)}{S_i(U_i)}\bigg]^{\Delta_i}\bigg[\frac{S_i(M_i+)}{S_i(U_i)}\bigg]^{1-\Delta_i} = \prod_{i=1}^{n}[\lambda_i(M_i)]^{\Delta_i}\, S_i(M_i+)\,S_i^{-1}(U_i). \qquad (4.57)$$

Under the dynamic Cox model (4.55), the likelihood function is

$$L(\theta, \lambda_0) = \prod_{i=1}^{n}\Big[\lambda_0(M_i)\exp\{\theta^T x_i(M_i)\}\Big]^{\Delta_i}\exp\bigg\{-\int_{U_i}^{M_i}\lambda_0(s)\exp\{\theta^T x_i(s)\}\,ds\bigg\}. \qquad (4.58)$$

Taking $\lambda_0$ as the nuisance parameter, one may estimate $\theta$ by the partial likelihood; see Cox (1972, 1975) or Section 4.4.
The model estimation with time-varying covariates can be implemented by standard software packages that by default handle only time-independent covariates. The trick is to construct small time segments such that $x_i(m)$ can be viewed as constant within each segment. See Figure 4.8 for an illustration with the construction of monthly segments.

Figure 4.8: Split time-varying covariates into little time segments, illustrated.

For each account $i$, let us partition $m \in [U_i, M_i]$ as $U_i \equiv U_{i,0} < U_{i,1} < \cdots < U_{i,L_i} \equiv M_i$ such that

$$x_i(m) = x_{i,l}, \quad m \in (U_{i,l-1}, U_{i,l}]. \qquad (4.59)$$

Thus, we have converted the $n$-sample survival data (4.56) to the following $\sum_{i=1}^{n} L_i$ independent observations, each with constant covariates:

$$(U_{i,l-1}, U_{i,l}, \Delta_{i,l}, x_{i,l}), \quad l = 1, \ldots, L_i, \ i = 1, \ldots, n \qquad (4.60)$$

where $\Delta_{i,l} = \Delta_i$ for $l = L_i$ and 0 otherwise.
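The segment construction (4.59)-(4.60) can be sketched as a small routine that expands one account's covariate path into unit-time left-open segments, attaching the event indicator only to the last segment (hypothetical names):

```python
def expand_segments(U, M, delta, x_path):
    """Expand (U, M, delta, x(m)) into unit-time segments per (4.59)-(4.60).

    x_path[l] is the constant covariate value on (U + l, U + l + 1].
    Returns tuples (start, stop, delta_l, x_l), where delta_l = delta
    only on the last segment and 0 otherwise.
    """
    L = M - U                       # number of unit segments
    out = []
    for l in range(L):
        d = delta if l == L - 1 else 0
        out.append((U + l, U + l + 1, d, x_path[l]))
    return out

segs = expand_segments(U=2, M=5, delta=1, x_path=[0.1, 0.2, 0.3])
# three segments; only the last one carries the event indicator
```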
It is straightforward to show that the likelihood (4.57) can be reformulated as

$$L = \prod_{i=1}^{n}[\lambda_i(M_i)]^{\Delta_i}\prod_{l=1}^{L_i} S_i(U_{i,l})\,S_i^{-1}(U_{i,l-1}) = \prod_{i=1}^{n}\prod_{l=1}^{L_i}[\lambda_i(U_{i,l})]^{\Delta_{i,l}}\, S_i(U_{i,l})\,S_i^{-1}(U_{i,l-1}), \qquad (4.61)$$

which becomes the likelihood function based on the enlarged data (4.60). Therefore, under the assumption (4.59), the parameter estimation by maximizing (4.58) can be equivalently handled by maximizing

$$L(\theta, \lambda_0) = \prod_{i=1}^{n}\prod_{l=1}^{L_i}\Big[\lambda_0(U_{i,l})\exp\{\theta^T x_{i,l}\}\Big]^{\Delta_{i,l}}\exp\bigg\{-\exp\{\theta^T x_{i,l}\}\int_{U_{i,l-1}}^{U_{i,l}}\lambda_0(s)\,ds\bigg\}. \qquad (4.62)$$

In R, one may perform the MLE directly by applying the CoxPH procedure (R:survival package) to the transformed data (4.60); see Therneau and Grambsch (2000) for computational details.
Furthermore, one may implement the MLE via logistic regression, provided that the underlying hazard rate function (4.55) admits a linear approximation up to the logit link:

$$\mathrm{logit}(\lambda_i(m)) \approx \eta^T\phi(m) + \theta^T x_i(m), \quad i = 1, \ldots, n \qquad (4.63)$$

where $\mathrm{logit}(x) = \log(x/(1-x))$ and $\phi(m)$ is a vector of prespecified basis functions (e.g. splines). Given the survival observations (4.60) with piecewise-constant covariates, it is easy to show that the discrete version of the joint likelihood we have formulated in Section 4.2 can be rewritten as

$$L = \prod_{i=1}^{n}\big[\lambda_i(M_i)\big]^{\Delta_i}\big[1-\lambda_i(M_i)\big]^{1-\Delta_i}\prod_{m\in(U_i, M_i)}\big[1-\lambda_i(m)\big] = \prod_{i=1}^{n}\prod_{l=1}^{L_i}\big[\lambda_i(U_{i,l})\big]^{\Delta_{i,l}}\big[1-\lambda_i(U_{i,l})\big]^{1-\Delta_{i,l}} \qquad (4.64)$$

where $\Delta_{i,l}$ are the same as in (4.60). This is clearly the likelihood function of the independent binary data $(\Delta_{i,l}, x_{i,l})$ for $l = 1, \ldots, L_i$ and $i = 1, \ldots, n$, where $x_{i,l}$ are constructed by (4.59). Therefore, under the parameterized logistic model (4.63), one may perform the MLE for the parameters $(\eta, \theta)$ by a standard GLM procedure; see e.g. Faraway (2006).
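The GLM route can be sketched with a bare-bones Newton-Raphson logistic regression on the expanded binary data (pure numpy; intercept-only here, so the fitted probability must equal the empirical event rate):

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Newton-Raphson MLE for logistic regression."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                    # observation weights
        grad = X.T @ (y - p)                 # score
        hess = X.T @ (X * W[:, None])        # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# expanded binary data: 10 person-periods with 3 events, intercept-only
y = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
X = np.ones((10, 1))
beta = fit_logistic(X, y)
p_hat = 1.0 / (1.0 + np.exp(-beta[0]))
# p_hat equals the empirical rate 0.3
```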
(C) Frailtytype Vintage Eﬀects in Section 4.4.3
The likelihood function. By the independence of Z
1
, . . . , Z
k
, it is easy to write down
the likelihood as
\[
L = \prod_{\kappa=1}^{K} E_{Z_\kappa}\bigg[ \prod_{j\in G_\kappa} \prod_{i=1}^{n_j}
\big[\lambda_{ji}(M_{ji}\mid Z_\kappa)\big]^{\Delta_{ji}}
\exp\Big\{-\int_{U_{ji}}^{M_{ji}} \lambda_{ji}(s\mid Z_\kappa)\,ds\Big\} \bigg].
\tag{4.65}
\]
Substituting $\lambda_{ji}(m\mid Z_\kappa) = \lambda_f(m)\, Z_\kappa \exp\big\{\tilde{\xi}^{T}\tilde{x}_{ji}(m)\big\}$, we obtain for each bucket $G_\kappa$ the likelihood contribution
\[
L_\kappa = \prod_{j\in G_\kappa} \prod_{i=1}^{n_j}
\Big[ \lambda_f(M_{ji}) \exp\big\{\tilde{\xi}^{T} \tilde{x}_{ji}(M_{ji})\big\} \Big]^{\Delta_{ji}}
E_Z\big[ Z^{d_\kappa} \exp\{-Z A_\kappa\} \big]
\tag{4.66}
\]
in which
\[
d_\kappa = \sum_{j\in G_\kappa} \sum_{i=1}^{n_j} \Delta_{ji}
\quad\text{and}\quad
A_\kappa = \sum_{j\in G_\kappa} \sum_{i=1}^{n_j} \int_{U_{ji}}^{M_{ji}} \lambda_f(s) \exp\big\{\tilde{\xi}^{T} \tilde{x}_{ji}(s)\big\}\,ds.
\]
The expectation term in (4.66) can be derived from the $d_\kappa$-th derivative of the Laplace
transform of $Z$, which satisfies $E_Z\big[Z^{d_\kappa} \exp\{-Z A_\kappa\}\big] = (-1)^{d_\kappa} \mathcal{L}^{(d_\kappa)}(A_\kappa)$; see e.g.
Aalen, et al. (2008; §7). Thus, we obtain the joint likelihood
\[
L = \prod_{\kappa=1}^{K} \prod_{j\in G_\kappa} \prod_{i=1}^{n_j}
\Big[ \lambda_f(M_{ji}) \exp\big\{\tilde{\xi}^{T}\tilde{x}_{ji}(M_{ji})\big\} \Big]^{\Delta_{ji}}
(-1)^{d_\kappa}\, \mathcal{L}^{(d_\kappa)}(A_\kappa),
\tag{4.67}
\]
or the log-likelihood function of the form (4.51).
EM algorithm for frailty model estimation.
1. E-step: given the fixed values of $(\tilde{\xi}, \lambda_f)$, estimate the individual frailty levels
$\{Z_\kappa\}_{\kappa=1}^{K}$ by the conditional expectation
\[
\hat{Z}_\kappa = E[Z_\kappa \mid G_\kappa] = -\,\frac{\mathcal{L}^{(d_\kappa+1)}(A_\kappa)}{\mathcal{L}^{(d_\kappa)}(A_\kappa)},
\tag{4.68}
\]
which corresponds to the empirical Bayes estimate; see Aalen, et al. (2008;
§7.2.3). For the gamma frailty $\mathrm{Gamma}(\delta^{-1}, \delta)$ with $\mathcal{L}^{(d)}(x) = (-\delta)^d (1+\delta x)^{-1/\delta-d} \prod_{i=1}^{d}(1/\delta + i - 1)$, the frailty level estimates are given by
\[
\hat{Z}_\kappa = \frac{1 + \delta d_\kappa}{1 + \delta A_\kappa}, \qquad \kappa = 1, \ldots, K.
\tag{4.69}
\]
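The closed form (4.69) can be checked numerically against the Laplace-transform ratio in (4.68): for the gamma frailty the two agree exactly. A short sketch (Python, illustrative numbers):

```python
import math

def laplace_deriv_gamma(x, d, delta):
    """d-th derivative of the gamma-frailty Laplace transform
    L(x) = (1 + delta*x)**(-1/delta), as given in the text:
    L^(d)(x) = (-delta)**d (1+delta*x)**(-1/delta-d) prod_{i=1..d}(1/delta+i-1)."""
    prod = 1.0
    for i in range(1, d + 1):
        prod *= 1.0 / delta + i - 1
    return (-delta) ** d * (1.0 + delta * x) ** (-1.0 / delta - d) * prod

def frailty_estimate(d_k, A_k, delta):
    """E-step (4.69): empirical Bayes estimate of the bucket frailty."""
    return (1.0 + delta * d_k) / (1.0 + delta * A_k)

delta, d_k, A_k = 0.5, 3, 2.0
ratio = -laplace_deriv_gamma(A_k, d_k + 1, delta) / laplace_deriv_gamma(A_k, d_k, delta)
print(frailty_estimate(d_k, A_k, delta), ratio)   # both equal 1.25
```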
2. M-step: given the frailty levels $\{Z_\kappa\}_{\kappa=1}^{K}$, estimate $(\tilde{\xi}, \lambda_f)$ by partial likelihood
maximization and the Breslow estimator, with appropriate modification based on
the fixed frailty multipliers:
\[
\mathrm{PL}(\tilde{\xi}) = \prod_{j=1}^{J} \prod_{i=1}^{n_j}
\Bigg[ \frac{\exp\big\{\tilde{\xi}^{T}\tilde{x}_{ji}(M_{ji})\big\}}
{\sum_{(j',i')\in R_{ji}} Z_{\kappa(j')} \exp\big\{\tilde{\xi}^{T}\tilde{x}_{j'i'}(M_{ji})\big\}} \Bigg]^{\Delta_{ji}}
\tag{4.70}
\]
\[
\hat{\Lambda}_f(m) = \sum_{j=1}^{J} \sum_{i=1}^{n_j}
\frac{I(m \ge M_{ji})\,\Delta_{ji}}
{\sum_{(j',i')\in R_{ji}} Z_{\kappa(j')} \exp\big\{\hat{\tilde{\xi}}^{T}\tilde{x}_{j'i'}(M_{ji})\big\}},
\tag{4.71}
\]
where $\kappa(j)$ indicates the bucket $G_\kappa$ to which $V_j$ belongs. In R, the partial
likelihood maximization can be implemented by the standard CoxPH procedure
with the fixed log-frailty terms supplied as offset predictors; details omitted.
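For illustration, the modified Breslow estimator (4.71) can be evaluated directly on a tiny made-up dataset; the records, frailty multipliers and coefficient below are all hypothetical, and the risk set accounts for left truncation at U.

```python
import math

# Toy records (U, M, delta, x, Z): entry time, exit time, event flag,
# a scalar covariate, and the record's fixed bucket frailty multiplier
# Z_{kappa(j)}. All numbers are illustrative only.
data = [
    (0.0, 2.0, 1,  0.3, 1.2),
    (0.0, 3.5, 0, -0.1, 1.2),
    (0.5, 4.0, 1,  0.8, 0.7),
    (1.0, 5.0, 1,  0.0, 0.7),
]
xi_hat = 0.4  # fixed coefficient from the partial-likelihood step

def breslow_frailty(m, data, xi):
    """Modified Breslow estimator (4.71): cumulative baseline hazard at m,
    with frailty-weighted risk-set denominators."""
    total = 0.0
    for U_e, M_e, d_e, _, _ in data:
        if d_e == 1 and M_e <= m:
            denom = sum(Z * math.exp(xi * x)
                        for U, M, _, x, Z in data
                        if U < M_e <= M)      # at risk at event time M_e
            total += 1.0 / denom
    return total

print(breslow_frailty(5.0, data, xi_hat))  # cumulative hazard over all events
```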
To estimate the frailty parameter $\delta$, one may run the EM algorithm for each $\delta$ on a grid to
obtain $(\tilde{\xi}^{*}(\delta), \lambda_f^{*}(\cdot\,; \delta))$, and then find the maximizer of the profiled log-likelihood (4.51).
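The outer profiling loop can be sketched as follows, under a deliberate simplification of the model in the text: constant baseline hazard, no covariates, gamma frailty, and buckets summarized by event counts and exposures (so that A_κ = λ T_κ). In the full model the M-step would instead invoke the partial-likelihood and Breslow machinery of (4.70)-(4.71); all numbers below are illustrative.

```python
import math

buckets = [(5, 40.0), (2, 35.0), (9, 30.0)]   # hypothetical (d_k, T_k)

def log_laplace_d(A, d, delta):
    """log of (-1)^d L^(d)(A) for the gamma frailty Gamma(1/delta, delta)."""
    s = d * math.log(delta) - (1.0 / delta + d) * math.log1p(delta * A)
    for i in range(1, d + 1):
        s += math.log(1.0 / delta + i - 1)
    return s

def em_profile(delta, n_iter=100):
    """Run EM at a fixed delta; return (lam_hat, profiled log-likelihood)."""
    lam = sum(d for d, _ in buckets) / sum(T for _, T in buckets)
    for _ in range(n_iter):
        # E-step, eq. (4.69): empirical Bayes frailty estimates
        Z = [(1 + delta * d) / (1 + delta * lam * T) for d, T in buckets]
        # M-step (closed form in this toy model): MLE of lam given frailties
        lam = sum(d for d, _ in buckets) / sum(z * T for z, (_, T) in zip(Z, buckets))
    # profiled log-likelihood, specializing (4.67) to the constant-hazard case
    ll = sum(d * math.log(lam) + log_laplace_d(lam * T, d, delta)
             for d, T in buckets)
    return lam, ll

grid = [0.05, 0.1, 0.2, 0.4, 0.8]
delta_best = max(grid, key=lambda dl: em_profile(dl)[1])
```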
BIBLIOGRAPHY
[1] Aalen, O., Borgan, O. and Gjessing, H. K. (2008). Survival and Event History Analysis: A Process Point of View. Springer, New York.
[2] Aalen, O. and Gjessing, H. K. (2001). Understanding the shape of the hazard rate: a process point of view (with discussion). Statistical Science, 16, 1–22.
[3] Abramovich, F. and Steinberg, D. M. (1996). Improved inference in nonparametric regression using Lk-smoothing splines. Journal of Statistical Planning and Inference, 49, 327–341.
[4] Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical Functions (10th printing). New York: Dover.
[5] Andersen, P. K., Borgan, O., Gill, R. D. and Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer, New York.
[6] Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: a large sample study. Annals of Statistics, 10, 1100–1120.
[7] Arjas, E. (1986). Stanford heart transplantation data revisited: a real time approach. In: Modern Statistical Methods in Chronic Disease Epidemiology (Moolgavkar, S. H. and Prentice, R. L., eds.), pp. 65–81. Wiley, New York.
[8] Berlin, M. and Mester, L. J. (eds.) (2004). Special issue of "Retail credit risk management and measurement". Journal of Banking and Finance, 28, 721–899.
[9] Bielecki, T. R. and Rutkowski, M. (2004). Credit Risk: Modeling, Valuation and Hedging. Springer-Verlag, Berlin.
[10] Bilias, Y., Gu, M. and Ying, Z. (1997). Towards a general asymptotic theory for Cox model with staggered entry. Annals of Statistics, 25, 662–682.
[11] Black, F. and Cox, J. C. (1976). Valuing corporate securities: Some effects of bond indenture provisions. Journal of Finance, 31, 351–367.
[12] Bluhm, C. and Overbeck, L. (2007). Structured Credit Portfolio Analysis, Baskets and CDOs. Boca Raton: Chapman and Hall.
[13] Breeden, J. L. (2007). Modeling data with multiple time dimensions. Computational Statistics & Data Analysis, 51, 4761–4785.
[14] Breslow, N. E. (1972). Discussion of "Regression models and life-tables" by Cox, D. R. Journal of the Royal Statistical Society: Ser. B, 34, 216–217.
[15] Chen, L., Lesmond, D. A. and Wei, J. (2007). Corporate yield spreads and bond liquidity. Journal of Finance, 62, 119–149.
[16] Chhikara, R. S. and Folks, J. L. (1989). The Inverse Gaussian Distribution: Theory, Methodology, and Applications. Dekker, New York.
[17] Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Ser. B, 34, 187–220.
[18] Cox, D. R. (1975). Partial likelihood. Biometrika, 62, 269–276.
[19] Cox, J. C., Ingersoll, J. E. and Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica, 53, 385–407.
[20] Cressie, N. A. (1993). Statistics for Spatial Data. New York: Wiley.
[21] Das, S. R., Duffie, D., Kapadia, N. and Saita, L. (2007). Common failings: how corporate defaults are correlated. Journal of Finance, 62, 93–117.
[22] Deng, Y., Quigley, J. M. and Van Order, R. (2000). Mortgage terminations, heterogeneity and the exercise of mortgage options. Econometrica, 68, 275–307.
[23] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3), 425–455.
[24] Duchesne, T. and Lawless, J. F. (2000). Alternative time scales and failure time models. Lifetime Data Analysis, 6, 157–179.
[25] Duffie, D., Pan, J. and Singleton, K. (2000). Transform analysis and option pricing for affine jump-diffusions. Econometrica, 68, 1343–1376.
[26] Duffie, D. and Singleton, K. J. (2003). Credit Risk: Pricing, Measurement, and Management. Princeton University Press.
[27] Duffie, D., Saita, L. and Wang, K. (2007). Multi-period corporate default prediction with stochastic covariates. Journal of Financial Economics, 83, 635–665.
[28] Duffie, D., Eckner, A., Horel, G. and Saita, L. (2008). Frailty correlated default. Journal of Finance, to appear.
[29] Efron, B. (2002). The two-way proportional hazards model. Journal of the Royal Statistical Society: Ser. B, 64, 899–909.
[30] Esper, J., Cook, E. R. and Schweingruber, F. H. (2002). Low-frequency signals in long tree-ring chronologies for reconstructing past temperature variability. Science, 295, 2250–2253.
[31] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall, Boca Raton.
[32] Fan, J., Huang, T. and Li, R. (2007). Analysis of longitudinal data with semiparametric estimation of covariance function. Journal of the American Statistical Association, 102, 632–641.
[33] Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York.
[34] Fang, K. T., Li, R. and Sudjianto, A. (2006). Design and Modeling for Computer Experiments. Boca Raton: Chapman and Hall.
[35] Faraway, J. J. (2006). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Boca Raton: Chapman and Hall.
[36] Farewell, V. T. and Cox, D. R. (1979). A note on multiple time scales in life testing. Appl. Statist., 28, 73–75.
[37] Frame, S., Lehnert, A. and Prescott, N. (2008). A snapshot of mortgage conditions with an emphasis on subprime mortgage performance. U.S. Federal Reserve, August 2008.
[38] Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19, 1–141.
[39] Fu, W. J. (2000). Ridge estimator in singular design with applications to age-period-cohort analysis of disease rates. Communications in Statistics – Theory and Method, 29, 263–278.
[40] Fu, W. J. (2008). A smoothing cohort model in age-period-cohort analysis with applications to homicide arrest rates and lung cancer mortality rates. Sociological Methods and Research, 36, 327–361.
[41] Gerardi, K., Sherlund, S. M., Lehnert, A. and Willen, P. (2008). Making sense of the subprime crisis. Brookings Papers on Economic Activity, 2008(2), 69–159.
[42] Giesecke, K. (2006). Default and information. Journal of Economic Dynamics and Control, 30, 2281–2303.
[43] Golub, G. H., Heath, M. and Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21, 215–223.
[44] Goodman, L. S., Li, S., Lucas, D. J., Zimmerman, T. A. and Fabozzi, F. J. (2008). Subprime Mortgage Credit Derivatives. Wiley, Hoboken, NJ.
[45] Gramacy, R. B. and Lee, H. K. H. (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103, 1119–1130.
[46] Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman & Hall, London.
[47] Gu, C. (2002). Smoothing Spline ANOVA Models. Springer, New York.
[48] Gu, M. G. and Lai, T. L. (1991). Weak convergence of time-sequential censored rank statistics with applications to sequential testing in clinical trials. Ann. Statist., 19, 1403–1433.
[49] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. New York: Chapman and Hall.
[50] Hougaard, P. (2000). Analysis of Multivariate Survival Data. Springer, New York.
[51] Jarrow, R. and Turnbull, S. (1995). Pricing derivatives on financial securities subject to credit risk. Journal of Finance, 50, 53–85.
[52] Keiding, N. (1990). Statistical inference in the Lexis diagram. Phil. Trans. R. Soc. Lond. A, 332, 487–509.
[53] Kimeldorf, G. and Wahba, G. (1970). A correspondence between Bayesian estimation of stochastic processes and smoothing by splines. Ann. Math. Stat., 41, 495–502.
[54] Kimeldorf, G. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. J. Math. Anal. Appl., 33, 82–95.
[55] Kupper, L. L., Janis, J. M., Karmous, A. and Greenberg, B. G. (1985). Statistical age-period-cohort analysis: a review and critique. Journal of Chronic Disease, 38, 811–830.
[56] Lando, D. (1998). On Cox processes and credit-risky securities. Review of Derivatives Research, 2, 99–120.
[57] Lando, D. (2004). Credit Risk Modeling: Theory and Applications. Princeton University Press.
[58] Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data. Wiley, Hoboken, NJ.
[59] Lee, M.-L. T. and Whitmore, G. A. (2006). Threshold regression for survival analysis: modeling event times by a stochastic process reaching a boundary. Statistical Science, 21, 501–513.
[60] Lexis, W. (1875). Einleitung in die Theorie der Bevölkerungsstatistik. Strassburg: Trübner. Pages 5–7 translated to English by Keyfitz, N. and printed in Mathematical Demography (eds. Smith, D. and Keyfitz, N., 1977), Springer, Berlin.
[61] Luo, Z. and Wahba, G. (1997). Hybrid adaptive splines. Journal of the American Statistical Association, 92, 107–116.
[62] Madan, D. and Unal, H. (1998). Pricing the risks of default. Review of Derivatives Research, 2, 121–160.
[63] Mason, K. O., Mason, W. H., Winsborough, H. H. and Poole, K. (1973). Some methodological issues in cohort analysis of archival data. American Sociological Review, 38, 242–258.
[64] Mason, W. H. and Wolfinger, N. H. (2002). Cohort analysis. In: International Encyclopedia of the Social and Behavioral Sciences, 2189–2194. New York: Elsevier.
[65] Marshall, A. W. and Olkin, I. (2007). Life Distributions: Structure of Nonparametric, Semiparametric, and Parametric Families. Springer.
[66] Mayer, C., Pence, K. and Sherlund, S. M. (2009). The rise in mortgage defaults. Journal of Economic Perspectives, 23, 27–50.
[67] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). London: Chapman and Hall.
[68] Merton, R. C. (1974). On the pricing of corporate debt: The risk structure of interest rates. Journal of Finance, 29, 449–470.
[69] Moody's Special Comment: Corporate Default and Recovery Rates, 1920–2008. Data release: February 2009.
[70] Müller, H. G. and Stadtmüller, U. (1987). Variable bandwidth kernel estimators of regression curves. Annals of Statistics, 15, 282–301.
[71] Nelsen, R. B. (2006). An Introduction to Copulas (2nd ed.). Springer, New York.
[72] Nielsen, G. G., Gill, R. D., Andersen, P. K. and Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scand. J. Statist., 19, 25–43.
[73] Nychka, D. W. (1988). Bayesian confidence intervals for a smoothing spline. Journal of the American Statistical Association, 83, 1134–1143.
[74] Oakes, D. (1995). Multiple time scales in survival analysis. Lifetime Data Analysis, 1, 7–18.
[75] Pintore, A., Speckman, P. and Holmes, C. C. (2006). Spatially adaptive smoothing splines. Biometrika, 93, 113–125.
[76] Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (2007). Numerical Recipes: The Art of Scientific Computing (3rd ed.). Cambridge University Press.
[77] Ramlau-Hansen, H. (1983). Smoothing counting process intensities by means of kernel functions. Ann. Statist., 11, 453–466.
[78] Ramsay, J. O. and Li, X. (1998). Curve registration. Journal of the Royal Statistical Society: Ser. B, 60, 351–363.
[79] Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. The MIT Press.
[80] Rice, J. and Rosenblatt, M. (1983). Smoothing splines: regression, derivatives and deconvolution. Annals of Statistics, 11, 141–156.
[81] Ruppert, D. and Carroll, R. J. (2000). Spatially-adaptive penalties for spline fitting. Aust. N. Z. J. Statist., 42(2), 205–223.
[82] Schrödinger, E. (1915). Zur Theorie der Fall- und Steigversuche an Teilchen mit Brownscher Bewegung. Physikalische Zeitschrift, 16, 289–295.
[83] Sellke, T. and Siegmund, D. (1983). Sequential analysis of the proportional hazards model. Biometrika, 70, 315–326.
[84] Shepp, L. A. (1966). Radon-Nikodym derivatives of Gaussian measures. Annals of Mathematical Statistics, 37, 321–354.
[85] Sherlund, S. M. (2008). The past, present, and future of subprime mortgages. Finance and Economics Discussion Series, 2008-63, U.S. Federal Reserve Board.
[86] Silverman, B. W. (1985). Some aspects of the spline smoothing approach to nonparametric curve fitting (with discussion). Journal of the Royal Statistical Society: Ser. B, 47, 1–52.
[87] Slud, E. V. (1984). Sequential linear rank tests for two-sample censored survival data. Ann. Statist., 12, 551–571.
[88] Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. New York: Springer-Verlag.
[89] Sudjianto, A., Li, R. and Zhang, A. (2006). GAMEED documentation. Technical report. Enterprise Quantitative Risk Management, Bank of America, N.A.
[90] Therneau, T. M. and Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York.
[91] Tweedie, M. C. K. (1957). Statistical properties of inverse Gaussian distributions, I and II. Annals of Mathematical Statistics, 28, 362–377 and 696–705.
[92] Vaupel, J. W., Manton, K. G. and Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.
[93] Vasicek, O. (1977). An equilibrium characterization of the term structure. Journal of Financial Economics, 5, 177–188.
[94] Vidakovic, B. (1999). Statistical Modeling by Wavelets. Wiley, New York.
[95] Wahba, G. (1983). Bayesian confidence intervals for the cross-validated smoothing spline. Journal of the Royal Statistical Society: Ser. B, 45, 133–150.
[96] Wahba, G. (1990). Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia: SIAM.
[97] Wahba, G. (1995). Discussion of "Wavelet shrinkage: asymptopia?" by Donoho et al. Journal of the Royal Statistical Society: Ser. B, 57, 360–361.
[98] Wallace, N. (ed.) (2005). Special issue of "Innovations in mortgage modeling". Real Estate Economics, 33, 587–782.
[99] Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman & Hall, London.
[100] Wang, J.-L. (2005). Smoothing hazard rate. In: Armitage, P. and Colton, T. (eds.), Encyclopedia of Biostatistics (2nd ed.), Volume 7, pp. 4986–4997. Wiley.
[101] Wikipedia, The Free Encyclopedia. URL: http://en.wikipedia.org
ABSTRACT

This research deals with some statistical modeling problems that are motivated by credit risk analysis. Credit risk modeling has been the subject of considerable research interest in finance and has recently drawn the attention of statistical researchers. In the first chapter, we provide an up-to-date review of credit risk models and demonstrate their close connection to survival analysis.

The first statistical problem considered is the development of adaptive smoothing spline (AdaSS) for heterogeneously smooth function estimation. Two challenging issues that arise in this context are evaluation of reproducing kernel and determination of local penalty, for which we derive an explicit solution based on piecewise type of local adaptation. Our nonparametric AdaSS technique is capable of fitting a diverse set of 'smooth' functions including possible jumps, and it plays a key role in subsequent work in the thesis.

The second topic is the development of dual-time analytics for observations involving both lifetime and calendar timescale. It includes "vintage data analysis" (VDA) for continuous type of responses in the third chapter, and "dual-time survival analysis" (DtSA) in the fourth chapter. We propose a maturation-exogenous-vintage (MEV) decomposition strategy in order to understand the risk determinants in terms of self-maturation in lifetime, exogenous influence by macroeconomic conditions, and heterogeneity induced from vintage originations. The intrinsic identification problem is discussed for both VDA and DtSA. Specifically, we consider VDA under Gaussian process models, provide an efficient MEV backfitting algorithm and assess its performance with both simulation and real examples.

DtSA on Lexis diagram is of particular importance in credit risk modeling where the default events could be triggered by both endogenous and exogenous hazards. We consider nonparametric estimators, first-passage-time parameterization and semiparametric Cox regression. We demonstrate the application of DtSA to credit card and mortgage risk analysis in retail banking, and shed some light on understanding the ongoing credit crisis. These developments extend the family of models for both credit risk modeling and survival analysis.
CHAPTER I

An Introduction to Credit Risk Modeling

Credit risk is a critical area in banking and is of concern to a variety of stakeholders: institutions, consumers and regulators. It has been the subject of considerable research interest in banking and finance communities, and has recently drawn the attention of statistical researchers. By Wikipedia's definition, "Credit risk is the risk of loss due to a debtor's non-payment of a loan or other line of credit." (Wikipedia.org, as of March 2009)

Central to credit risk is the default event, which occurs if the debtor is unable to meet its legal obligation according to the debt contract. The examples of default event include the bond default, the corporate bankruptcy, the credit card charge-off, and the mortgage foreclosure. Other forms of credit risk include the repayment delinquency in retail loans, the loss severity upon the default event, as well as the unexpected change of credit rating.

An enormous literature in credit risk has been fostered by both academics in finance and practitioners in industry. There are two parallel worlds based upon a simple dichotomous rule of data availability: (a) the direct measurements of credit performance and (b) the prices observed from credit market. The data availability leads to two streams of credit risk modeling that have key distinctions.
1.1 Two Worlds of Credit Risk

The two worlds of credit risk can be simply characterized by the types of default probability, one being actual and the other being implied. The former corresponds to the direct observations of defaults, also known as the physical default probability in finance. The latter refers to the risk-neutral default probability implied from the credit market data, e.g. corporate bond yields. The academic literature of corporate credit risk has been inclined to the study of the implied defaults, which is yet a puzzling world.

1.1.1 Credit Spread Puzzle
The credit spread of a defaultable corporate bond is its excess yield over the default-free Treasury bond of the same time to maturity. Consider a zero-coupon corporate bond with unit face value and maturity date T. The yield-to-maturity at t < T is defined by
\[
Y(t,T) = -\,\frac{\log B(t,T)}{T-t},
\tag{1.1}
\]
where B(t,T) is the bond price of the form
\[
B(t,T) = E\left[\exp\left(-\int_t^T \big(r(s) + \lambda(s)\big)\,ds\right)\right],
\tag{1.2}
\]
given the independent term structures of the interest rate r(·) and the default rate λ(·). Setting λ(t) ≡ 0 gives the benchmark price B_0(t,T) and the yield Y_0(t,T) of the Treasury bond. Then, the credit spread can be calculated as
\[
\mathrm{Spread}(t,T) = Y(t,T) - Y_0(t,T)
= -\,\frac{\log\big(B(t,T)/B_0(t,T)\big)}{T-t}
= -\,\frac{\log\big(1 - q(t,T)\big)}{T-t},
\tag{1.3}
\]
where q(t, T ) = E e−
RT
t
λ(t)ds
is the conditional default probability P[τ ≤ T τ > t] and
τ denotes the timetodefault, to be detailed in the next section. 2
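To make (1.1)-(1.3) concrete, here is a minimal Python sketch under the simplest flat-curve assumptions: a constant interest rate r and a constant default intensity λ (the numbers are hypothetical, for illustration only). In this flat case the spread reduces exactly to λ:

```python
import math

def bond_price(t, T, r, lam):
    # Zero-coupon bond price (1.2) with constant interest rate r and
    # constant default intensity lam; unit face value.
    return math.exp(-(r + lam) * (T - t))

def yield_to_maturity(t, T, price):
    # Yield-to-maturity (1.1).
    return -math.log(price) / (T - t)

def credit_spread(t, T, r, lam):
    # Credit spread (1.3): risky yield minus the Treasury yield.
    y = yield_to_maturity(t, T, bond_price(t, T, r, lam))
    y0 = yield_to_maturity(t, T, bond_price(t, T, r, 0.0))
    return y - y0

t, T, r, lam = 0.0, 5.0, 0.03, 0.02      # hypothetical flat curves
spread = credit_spread(t, T, r, lam)
q = 1.0 - math.exp(-lam * (T - t))       # conditional default probability
```

Under a constant hazard the spread equals λ, and it also matches −log(1 − q(t, T))/(T − t), consistent with (1.3).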
The credit spread is supposed to co-move with the default rate. For illustration, Figure 1.1 (top panel) plots the Moody's-rated corporate default rates and Baa-Aaa bond spreads ranging from 1920 to 2008. The shaded backgrounds correspond to NBER's latest announcement of recession dates. Most spikes of the default rates and the credit spreads coincide with the recessions, but it is clear that the movements of the two time series differ in both level and change. Such a lack of match is the so-called credit spread puzzle in the latest literature of corporate finance; none of the existing structural credit risk models can successfully imply the actual default rates from the market data of credit spreads. The default rates implied from credit spreads mostly overestimate the expected default rates. One factor that helps explain the gap is the liquidity risk: a security cannot be traded quickly enough in the market to prevent the loss. Figure 1.1 (bottom panel) plots, in fine resolution, the Moody's speculative-grade default rates versus the high-yield credit spreads for 1994-2008. It illustrates the phenomenon where the spread changed in response to liquidity-risk events, while the default rates did not react significantly until quarters later; see the liquidity crises triggered in 1998 (Russian/LTCM shock) and 2007 (subprime mortgage meltdown). Besides the liquidity risk, there exist other known factors (e.g. tax treatments of corporate bonds vs. government bonds) and unknown factors that make the implied world of credit risk incomplete. As of today, we lack a thorough understanding of the credit spread puzzle; see e.g. Chen, Lesmond and Wei (2007) and references therein. The shaky foundation of the default risk implied from market credit spreads, without looking at the historical defaults, leads to further questions about the credit derivatives, e.g. the Credit Default Swap (CDS) and the Collateralized Debt Obligation (CDO).
The 2007-08 collapse of the credit market on Wall Street is partly due to over-complication in "innovating" such complex financial instruments on the one hand, and over-simplification in quantifying their embedded risks on the other.
Figure 1.1: Moody's-rated corporate default rates, bond spreads and NBER-dated recessions. Data sources: a) Moody's Baa & Aaa corporate bond yields (http://research.stlouisfed.org/fred2/categories/119); b) Moody's Special Comment on Corporate Default and Recovery Rates, 1920-2008 (http://www.moodys.com/); c) NBER-dated recessions (http://www.nber.org/cycles/).
1.1.2
Actual Defaults

The other world of credit risk is the study of default probability bottom up from the actual credit performance. It includes the popular industry practices of a) credit rating in corporate finance, by e.g. the three major U.S. rating agencies: Moody's, Standard & Poor's, and Fitch; and b) credit scoring in consumer lending, by e.g. the three major U.S. credit bureaus: Equifax, Experian and TransUnion, as well as banks and other credit card or mortgage lenders. Both credit ratings and scores represent the creditworthiness of individual corporations and consumers. Let us describe very briefly some rating and scoring basics related to the thesis.

The letters Aaa and Baa in Figure 1.1 are examples of Moody's rating system, which uses Aaa, Aa, A, Baa, Ba, B, Caa, Ca, C to represent the likelihood of default from the lowest to the highest. The speculative grade in Figure 1.1 refers to Ba and the worse ratings, and the speculative-grade corporate bonds are sometimes said to be high-yield or junk. FICO, developed by Fair Isaac Corporation, is the best-known consumer credit score, and it is the most widely used by U.S. banks and other credit card or mortgage lenders. It ranges from 300 (very poor) to 850 (best), and intends to represent the creditworthiness of a borrower such that he or she will repay the debt. For the same borrower, the three major U.S. credit bureaus often report inconsistent FICO scores based on their own proprietary models. In both rating and scoring, the final evaluations are based on statistical models of the expected default probability, as well as judgement by rating/scoring specialists.

Compared to either the industry practices mentioned above or the academics of the implied default probability, the academic literature based on the actual defaults is much smaller, which we believe is largely due to the limited access for an academic researcher to the proprietary internal data of historical defaults. A few academic works will be reviewed later in Section 1.3.

In this thesis, we make an attempt to develop statistical methods based on the actual credit performance data. For demonstration, we shall use the synthetic or tweaked samples of retail credit portfolios, as well as the public release of corporate default rates by Moody's. Our focus is placed on the probability of default and the hazard rate of time-to-default.

1.2
Credit Risk Models

This section reviews the finance literature of credit risk models, including both the structural and the intensity-based approaches.

1.2.1
Structural Approach

In credit risk modeling, the structural approach is also known as the firm-value approach, since a firm's inability to meet the contractual debt is assumed to be determined by its asset value. It was inspired by the 1970s Black-Scholes-Merton methodology for financial option pricing. Two classic structural models are the Merton model (Merton, 1974) and the first-passage-time model (Black and Cox, 1976).

The Merton model assumes that the default event occurs at the maturity date of debt if the asset value is less than the debt level. Let D be the debt level with maturity date T, and let V(t) be the latent asset value following a geometric Brownian motion

dV(t) = µV(t)dt + σV(t)dW(t),   (1.4)

with drift µ, volatility σ and the standard Wiener process W(t). Recall that EW(t) = 0 and EW(t)W(s) = min(t, s). By Itô's lemma,

V(t)/V(0) = exp{ (µ − σ²/2)t + σW(t) } ∼ Lognormal( (µ − σ²/2)t, σ²t ),   (1.5)

from which one may evaluate the default probability P(V(T) ≤ D), given the initial asset value V(0) > D.
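As a numerical sketch (with made-up parameter values), the lognormal law (1.5) gives the closed-form Merton default probability P(V(T) ≤ D) = Φ( (log(D/V(0)) − (µ − σ²/2)T) / (σ√T) ), which the snippet below cross-checks by simulating V(T) directly from (1.5):

```python
import math
import random

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def merton_default_prob(V0, D, mu, sigma, T):
    # P(V(T) <= D) under the lognormal law (1.5).
    z = (math.log(D / V0) - (mu - 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return norm_cdf(z)

V0, D, mu, sigma, T = 100.0, 70.0, 0.05, 0.25, 1.0   # hypothetical firm
pd_closed = merton_default_prob(V0, D, mu, sigma, T)

# Monte Carlo cross-check: simulate the terminal asset value from (1.5).
rng = random.Random(0)
n = 200_000
drift = (mu - 0.5 * sigma ** 2) * T
vol = sigma * math.sqrt(T)
hits = sum(V0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) <= D for _ in range(n))
pd_mc = hits / n
```

The two estimates agree up to Monte Carlo error, illustrating how the Merton default probability depends only on the initial leverage D/V(0), the drift and the volatility.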
The notion of distance-to-default facilitates the computation of conditional default probability. Let us define the distance-to-default X(t) by the number of standard deviations such that log V(t) exceeds log D, i.e.

X(t) = (log V(t) − log D)/σ.   (1.6)

Clearly, V(t) hits the debt barrier once the distance-to-default process X(t) hits zero. By (1.5) and (1.6), X(t) is a drifted Wiener process of the form

X(t) = c + bt + W(t),   t ≥ 0,   (1.7)

with b = (µ − σ²/2)/σ and c = (log V(0) − log D)/σ. By (1.7), it is easy to verify that the conditional probability of default at the maturity date T is

P(V(T) ≤ D | V(t) > D) = P(X(T) ≤ 0 | X(t) > 0) = Φ( −(X(t) + b(T − t))/√(T − t) ),   (1.8)

where Φ(·) is the cumulative normal distribution function. Given the sample path of asset values up to t, one may first estimate the unknown parameters in (1.4) by the maximum likelihood method; see e.g. Duffie and Singleton (2003).

The first-passage-time model by Black and Cox (1976) extends the Merton model so that the default event could occur as soon as the asset value reaches a pre-specified debt barrier. Figure 1.2 (left panel) illustrates the first-passage-time of a drifted Wiener process by simulation, where we set the parameters b = −0.02 and c = 8 (s.t. a constant debt barrier). The key difference between the Merton model and the Black-Cox model lies in the green path (the middle realization viewed at T), which is treated as default in one model but not the other.

Figure 1.2: Simulated drifted Wiener process, first-passage-time and hazard rate.

Given the initial distance-to-default c ≡ X(0) > 0, consider the first-passage-time

τ = inf{t ≥ 0 : X(t) ≤ 0},   (1.9)

where inf ∅ = ∞ as usual. It is well known that τ follows the inverse Gaussian distribution (Schrödinger, 1915; Tweedie, 1957; Chhikara and Folks, 1989) with the density

f(t) = (c/√(2π)) t^(−3/2) exp{ −(c + bt)²/(2t) },   t ≥ 0.   (1.10)

See also other types of parametrization in Marshall and Olkin (2007, §13). The survival function S(t) is defined by P(τ > t) for any t ≥ 0 and is given by

S(t) = Φ( (c + bt)/√t ) − e^(−2bc) Φ( (−c + bt)/√t ).   (1.11)

The hazard rate, or the conditional default rate, is defined by the instantaneous rate of default conditional on the survivorship,

λ(t) = lim_{∆t↓0} (1/∆t) P(t ≤ τ < t + ∆t | τ ≥ t) = f(t)/S(t).   (1.12)

Using the inverse Gaussian density and survival functions, we obtain the form of the first-passage-time hazard rate:

λ(t; c, b) = [ c/√(2πt³) · exp{ −(c + bt)²/(2t) } ] / [ Φ( (c + bt)/√t ) − e^(−2bc) Φ( (−c + bt)/√t ) ],   c > 0.   (1.13)

This is one of the most important forms of hazard function in the structural approach to credit risk modeling. Figure 1.2 (right panel) plots λ(t; c, b) for b = −0.02 and c = 4, 6, 8, 10, 12, which resemble the default rates from low to high credit qualities in terms of credit ratings or FICO scores. Both the trend parameter b and the initial distance-to-default parameter c provide insights to understanding the shape of the hazard rate.

Modern developments of structural models based on the Merton and Black-Cox models can be referred to Bielecki and Rutkowski (2004, §3). Later in Chapter IV, we will discuss the dual-time extension of the first-passage-time parameterization with both endogenous and exogenous hazards, as well as the non-constant default barrier and incomplete information about structural parameters; see the details in Chapter IV or Aalen, Borgan and Gjessing (2008, §10).

1.2.2
Intensity-based Approach

The intensity-based approach is also called the reduced-form approach, proposed independently by Jarrow and Turnbull (1995) and Madan and Unal (1998). Many follow-up papers can be found in Lando (2004), Bielecki and Rutkowski (2004) and references therein. Unlike the structural approach that assumes the default to be completely determined by the asset value subject to a barrier, the default event in the reduced-form approach is governed by an externally specified intensity process that may or may not be related to the asset value. The default is treated as an unexpected event that comes "by surprise". This is a practically appealing feature, since in the real world the default event (e.g. the year 2001 bankruptcy of Enron Corporation) often happens all of a sudden, without announcement.

What matters in modeling defaults is only the first jump of the counting process. Note that Lando (1998) adopted the term "doubly stochastic Poisson process" (or, Cox process), which refers to a counting process with possibly recurrent events. In credit risk modeling, the default time τ is doubly stochastic, in which case the default intensity is equivalent to the hazard rate. The default intensity corresponds to the hazard rate λ(t) = f(t)/S(t) defined in (1.12), and it has roots in statistical reliability and survival analysis of time-to-failure. In survival analysis, λ(t) is usually assumed to be a deterministic function in time. When S(t) is absolutely continuous with f(t) = d(1 − S(t))/dt, we have that

λ(t) = −dS(t)/(S(t)dt) = −d[log S(t)]/dt,   S(t) = exp( −∫_0^t λ(s)ds ).

In finance, λ(t) is often treated as stochastic, and the intensity-based models in credit risk modeling are mostly the term-structure models borrowed from the literature of interest-rate modeling. Below is an incomplete list:

Vasicek: dλ(t) = κ(θ − λ(t))dt + σdW_t
Cox-Ingersoll-Ross: dλ(t) = κ(θ − λ(t))dt + σ√λ(t) dW_t   (1.14)
Affine jump: dλ(t) = µ(λ(t))dt + σ(λ(t))dW_t + dJ_t

with reference to Vasicek (1977), Cox, Ingersoll and Ross (1985) and Duffie, Pan and Singleton (2000). The last model involves a pure-jump process J_t, and it covers both the mean-reverting Vasicek and CIR models by setting µ(λ(t)) = κ(θ − λ(t)) and σ(λ(t)) = √(σ0² + σ1²λ(t)). The choices (1.14) are popular because they could yield closed-form pricing formulas for the bond price (1.2). However, they are ad hoc models lacking fundamental interpretation of the default event, while the real-world default intensities deserve more flexible and meaningful forms. The term-structure models provide straightforward ways to simulate the future default intensity for the purpose of predicting the conditional default probability.
For instance, the intensity models in (1.14) cannot be used to model the endogenous shapes of first-passage-time hazards illustrated in Figure 1.2. Moreover, the dependence of default intensity on state variables z(t) (e.g. macroeconomic covariates) is usually treated through a multivariate term-structure model for the joint of (λ(t), z(t)), e.g. by correlated Wiener processes. This approach essentially presumes a linear dependence in the diffusion components, while the effect of a state variable on the default intensity can be nonlinear in many other ways.

The intensity-based approach also includes the duration models for econometric analysis of actual historical defaults; see e.g. Duffie, Saita and Wang (2007) and Deng, Quigley and Van Order (2000). They correspond to the classical survival analysis in statistics, which opens another door for approaching credit risk.

1.3
Survival Models

Credit risk modeling in finance is closely related to survival analysis in statistics, since the latter by definition is the analysis of time-to-failure data. Here, the failure refers to the default event in either corporate or retail risk exposures. In a rough sense, default risk models based on the actual credit performance data exclusively belong to survival analysis, including both the first-passage-time structural models and the duration type of intensity-based models. We find that the survival models have at least fourfold advantages in approach to credit risk modeling, namely

1. flexibility in parametrizing the default intensity,
2. flexibility in incorporating various types of covariates,
3. effectiveness in modeling the credit portfolios, and
4. being straightforward to enrich and extend the family of credit risk models.

The following materials are organized in a way to make these points one by one.

1.3.1
Parametrizing Default Intensity

In survival analysis, there is a rich set of lifetime distributions for parametrizing the default intensity (i.e. hazard rate) λ(t), including

Exponential: λ(t) = α
Weibull: λ(t) = αt^(β−1)
Lognormal: λ(t) = φ((log t)/α) / [ tα Φ(−(log t)/α) ]
Log-logistic: λ(t) = (β/α)(t/α)^(β−1) / [ 1 + (t/α)^β ],   α, β > 0,   (1.15)

where φ(x) = Φ'(x) is the density function of the normal distribution. They all correspond to the distributional assumptions of the default time τ. More examples of parametric models can be found in Lawless (2003) and Marshall and Olkin (2007) among many other texts. Another example is the inverse Gaussian type of hazard rate function in (1.13), which has a beautiful first-passage-time interpretation.

The log-location-scale transform can be used to model the hazard rates. Given a latent default time τ0 with baseline hazard rate λ0(t) and survival function S0(t) = exp( −∫_0^t λ0(s)ds ), one may model the firm-specific default time τi by

log τi = µi + σi log τ0,   −∞ < µi < ∞, σi > 0,   (1.16)

with individual risk parameters (µi, σi). Then, the firm-specific survival function takes the form

Si(t) = S0( (t/e^µi)^(1/σi) ),   (1.17)

and the firm-specific hazard rate is

λi(t) = −d log Si(t)/dt = (1/(σi t)) (t/e^µi)^(1/σi) λ0( (t/e^µi)^(1/σi) ).   (1.18)

For example, given the Weibull baseline hazard rate λ0(t) = αt^(β−1), the log-location-scale transformed hazard rate is given by

λi(t) = (α/σi) exp{ −βµi/σi } t^(β/σi − 1).   (1.19)

1.3.2
Incorporating Covariates

Consider firm-specific (or consumer-specific) covariates zi ∈ R^p that are either static or time-varying. The dependence of default intensity on zi can be studied by regression models. In the context of survival analysis, there are two popular classes of regression models, namely

1. the multiplicative hazards regression models, and
2. the accelerated failure time (AFT) regression models.

The hazard rate is taken as the modeling basis in the first class of regression models:

λi(t) = λ0(t) r(zi(t)),   (1.20)

where λ0(t) is a baseline hazard function and r(zi(t)) is a firm-specific relative-risk multiplier (both must be positive). The relative risk term is often specified by r(zi(t)) = exp{θᵀzi(t)} with parameter θ ∈ R^p. Then, the hazard ratio between any two firms, λi(t)/λj(t) = exp{θᵀ[zi(t) − zj(t)]}, is constant if the covariates difference zi(t) − zj(t) is constant in time, thus ending up with proportional hazard rates. Note that the covariates zi(t) are sometimes transformed before entering the model. For example, an econometric modeler usually takes the difference of the Gross Domestic Product (GDP) index and uses the GDP growth as the entering covariate.

For a survival analyst, the introduction of the multiplicative hazards model (1.20) is incomplete without introducing Cox's proportional hazards (CoxPH) model. Rather than using the parametric form of baseline hazard, Cox (1972, 1975) considered λi(t) = λ0(t) exp{θᵀzi(t)} by leaving the baseline hazard λ0(t) unspecified. The latter is among the most popular models in modern survival analysis. One may use the partial likelihood to estimate θ, and the nonparametric likelihood to estimate λ̂0(t). The estimated baseline λ̂0(t) can be further smoothed upon parametric, term-structure or data-driven techniques. Such semiparametric modeling could reduce the estimation bias of the covariate effects due to baseline misspecification.

The second class of AFT regression models is based on the aforementioned log-location-scale transform of default time. Suppose that zi is not time-varying. Consider (1.16) with the location µi = θᵀzi and the scale σi = 1 (for simplicity), i.e.

log τi = θᵀzi + log τ0.   (1.21)

Then, it is straightforward to get the survival function and hazard rate by (1.17) and (1.18):

Si(t) = S0( t exp{−θᵀzi} ),   λi(t) = exp{−θᵀzi} λ0( t exp{−θᵀzi} ).

An interesting phenomenon is when using the Weibull baseline. Given the Weibull baseline hazard rate λ0(t) = αt^(β−1), by (1.19), we have that λi(t) = λ0(t) exp{−βθᵀzi}. This is equivalent to using the multiplicative hazards model (1.20), as a unique property of the Weibull lifetime distribution. Thus, the covariates zi could induce the firm-specific heterogeneity via either a relative-risk multiplier λi(t)/λ0(t) = e^{θᵀzi(t)} or the default time acceleration τi/τ0 = e^{θᵀzi}. For more details about AFT regression, see e.g. Lawless (2003, §6).

In situations where certain hidden covariates are not accessible, the unexplained variation also leads to heterogeneity. The frailty models introduced by Vaupel, et al. (1979) are designed to model such unobserved heterogeneity as a random quantity.
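The Weibull equivalence just noted can be verified numerically. The sketch below (hypothetical parameter values, with θᵀzi collapsed to a single scalar) evaluates the AFT hazard (1.18) under a Weibull baseline and compares it with the multiplicative form λ0(t)exp{−βθᵀzi}:

```python
import math

def weibull_hazard(t, alpha, beta):
    # Weibull baseline hazard λ0(t) = α t^(β-1), as in (1.15).
    return alpha * t ** (beta - 1.0)

def aft_hazard(t, alpha, beta, mu, sigma=1.0):
    # Log-location-scale hazard (1.18): λi(t) = u λ0(u)/(σ t), u = (t/e^µ)^(1/σ).
    u = (t * math.exp(-mu)) ** (1.0 / sigma)
    return u * weibull_hazard(u, alpha, beta) / (sigma * t)

alpha, beta, theta_z, t = 0.5, 1.5, 0.7, 3.0   # hypothetical values; µ = θᵀz
h_aft = aft_hazard(t, alpha, beta, mu=theta_z)
h_mult = weibull_hazard(t, alpha, beta) * math.exp(-beta * theta_z)
```

The two forms agree, which is exactly the unique Weibull property that makes the AFT and proportional hazards specifications coincide.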
1.3.3
Correlating Credit Defaults

The issue of default correlation embedded in credit portfolios has drawn intense discussions in the recent credit risk literature, in particular along with today's credit crisis involving the problematic basket CDS or CDO financial instruments. In the eyes of a fund manager, the positive correlation of defaults would increase the level of total volatility given the same level of total expectation (cf. Markowitz portfolio theory). In academics, Das, et al. (2007) performed an empirical analysis of default times for U.S. corporations and provided evidence for the importance of default correlation. The default correlation could be effectively characterized by multivariate survival analysis; see e.g. Bluhm and Overbeck (2007). There exist two different approaches:

1. correlating the default intensities through the common covariates, and
2. correlating the default times through the copulas.

In a broad sense, the frailty models can be also used to characterize the default correlation among multiple obligors. Consider a random frailty effect Z, say, Z ∼ Gamma(δ⁻¹, δ) with shape δ⁻¹ and scale δ > 0, where a Gamma(k, δ) distribution has the density

p(z; k, δ) = z^(k−1) exp{−z/δ} / (Γ(k)δ^k),   z > 0,   (1.22)

with mean kδ and variance kδ². Then, the proportional frailty model extends (1.20) to be

λi(t) = Z · λ0(t) r(zi(t)),   (1.23)

for each single obligor i subject to the odd effect Z.

The first approach is better known as the conditionally independent intensity-based approach, in the sense that the default times are independent conditional on the common covariates. Examples of common covariates include the market-wide variables, e.g. the GDP growth, the short-term interest rate, and the house-price appreciation index. Here, the default correlation induced by the common covariates is in contrast to the aforementioned default heterogeneity induced by the firm-specific covariates. As announced earlier, the frailty models (1.23) can also be used to capture the dependence of multi-obligor default intensities in case of hidden common covariates. They are usually rephrased as shared frailty models, for differentiation from the single-obligor modeling purpose. The frailty effect Z is usually assumed to be time-invariant; see Aalen, et al. (2008, §11) for some intriguing discussion of dynamic frailty modeled by diffusion and Lévy processes. Duffie, et al. (2008) applied the shared frailty effect across firms, as well as a dynamic frailty effect to represent the unobservable macroeconomic covariates. One may refer to Hougaard (2000) for a comprehensive treatment of multivariate survival analysis from a frailty point of view.

The copula approach to default correlation considers the default times as the modeling basis. A copula C : [0,1]^n → [0,1] is a function used to formulate the multivariate joint distribution based on the marginal distributions; see Nelsen (2006) for details. Popular examples include

Gaussian: C_Σ(u1, ..., un) = Φ_Σ( Φ⁻¹(u1), ..., Φ⁻¹(un) )
Student-t: C_{Σ,ν}(u1, ..., un) = Θ_{Σ,ν}( Θ_ν⁻¹(u1), ..., Θ_ν⁻¹(un) )   (1.24)
Archimedean: C_Ψ(u1, ..., un) = Ψ⁻¹( Σ_{i=1}^n Ψ(ui) ),

where Σ ∈ R^{n×n}, Φ_Σ denotes the multivariate normal distribution, Θ_{Σ,ν} denotes the multivariate Student-t distribution with degrees of freedom ν, and Ψ is the generator of the Archimedean copulas. When Ψ(u) = −log u, the Archimedean copula reduces to the independence copula ∏_{i=1}^n ui. By Sklar's theorem, for a multivariate joint distribution, there always exists a copula that can link the joint distribution to its univariate marginals. Therefore, upon an appropriate selection of the copula C, the joint survival distribution of (τ1, ..., τn) can be characterized by

S_joint(t1, ..., tn) = C( S1(t1), ..., Sn(tn) ).

Interestingly enough, the shared frailty models correspond to the Archimedean copula, as observed by Hougaard (2000, §13) and recast by Aalen, et al. (2008, §7). For each τi with underlying hazard rate Z·λi(t), where λi(t) is assumed to be deterministic and Z to be random, the marginal survival distributions are found by integrating over the distribution of Z,

Si(t) = E_Z[ exp( −Z ∫_0^t λi(s)ds ) ] = L(Λi(t)),   (1.25)

where L(x) = E_Z[exp{−xZ}] is the Laplace transform of Z and Λi(t) = ∫_0^t λi(s)ds is the cumulative hazard. Similarly, the joint survival distribution is given by

S_joint(t1, ..., tn) = L( Σ_{i=1}^n Λi(ti) ) = L( Σ_{i=1}^n L⁻¹(Si(ti)) ),   (1.26)

where the second equality follows (1.25) and leads to an Archimedean copula. For example, given the Gamma frailty Z ∼ Gamma(δ⁻¹, δ), we have that L(x) = (1 + δx)^(−1/δ), so

S_joint(t1, ..., tn) = ( 1 + δ Σ_{i=1}^n ∫_0^{ti} λi(s)ds )^(−1/δ).

In practice of modeling credit portfolios, one must first of all check the appropriateness of the assumption behind either the frailty approach or the copula approach. An obvious counter-example is the recent collapse of the basket CDS and CDO market, for which the rating agencies have once abused the use of the oversimplified Gaussian copula. Besides, the correlation risk in these complex financial instruments is highly contingent upon the macroeconomic conditions and black-swan events, which are rather dynamic, volatile and unpredictable.
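To see the shared frailty at work, here is a small simulation sketch (entirely synthetic: a constant baseline hazard λ0 = 0.05, no covariates, and a Gamma frailty with δ = 1 shared within each pair of obligors, per (1.22)-(1.23)). It checks the empirical marginal survival against the Laplace-transform formula (1.25), which here equals (1 + δλ0t)^(−1/δ), and confirms that the common frailty makes joint survival exceed the product of the marginals:

```python
import math
import random

def simulate_pairs(n_pairs, lam0, delta, seed=2):
    # Each pair shares one frailty Z ~ Gamma(shape=1/δ, scale=δ), mean 1 and
    # variance δ; given Z, default times are exponential with rate Z·λ0.
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        z = rng.gammavariate(1.0 / delta, delta)
        pairs.append((rng.expovariate(z * lam0), rng.expovariate(z * lam0)))
    return pairs

def marginal_survival(t, lam0, delta):
    # Formula (1.25) with Gamma frailty: S(t) = (1 + δΛ(t))^(-1/δ), Λ(t) = λ0 t.
    return (1.0 + delta * lam0 * t) ** (-1.0 / delta)

lam0, delta, t0 = 0.05, 1.0, 10.0
pairs = simulate_pairs(50_000, lam0, delta)
emp_marg = sum(1.0 for a, _ in pairs if a > t0) / len(pairs)
emp_joint = sum(1.0 for a, b in pairs if a > t0 and b > t0) / len(pairs)
theo_marg = marginal_survival(t0, lam0, delta)
```

The gap between the empirical joint survival and the independence benchmark (the squared marginal) is precisely the default correlation induced by the shared frailty, i.e. the Clayton-type Archimedean copula of (1.26).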
see e. speciality.stlouisfed. Lawless (2003) and Aalen.g. banking center. and other distinguishable characteristics (e. we have only cited Hougaard (2000).g. Federal Reserve Bank of St. telephone and internet). but there are numerous texts and monographs on survival analysis and reliability. reward.g. et al. or the new developments that would beneﬁt both ﬁelds. direct mail. Generalization The above survey of survival models is too brief to cover the vast literature of survival analysis that deserves potential applications in credit risk modeling. It is straightforward to enrich and extend the family of credit risk models. To be selective. volatile and unpredictable.are rather dynamic. Louis. geographic region where the card holder resides). ALFRED digital archive1 hosted by U. acquisition channels (e. 1 http://alfred. ranging from product types (e. The vintage time series are among the most popular versions of economic and risk management data. (2008). consider for instance the revolving exposures of credit cards.4 Scope of the Thesis Both old and new challenges arising in the context of credit risk modeling call for developments of statistical methodologies. 1. By convention. and we will focus on the dualtime generalization. Let us ﬁst preview the complex data structures in realworld credit risk problems before deﬁning the scope of the thesis. Subject to the market share of the cardissuing bank. the accounts originated from the same month constitute a monthly vintage (quarterly and yearly vintages can be deﬁned in the same fashion). For risk management in retail banking. cobrand and secured cards).org/ 18 . The latter is one of our objectives in this thesis.g. thousands to hundred thousands of new cards are issued monthly.S. by either the existing devices in survival analysis.
Figure 1.3: Illustration of retail credit portfolios and vintage diagram.

Figure 1.3 provides an illustration of retail credit portfolios and the scheme of vintage data collection, where we use only the geographic MSA (U.S. Metropolitan Statistical Area) as a representative of segmentation variables. Each segment consists of multiple vintages that have different origination dates, and each vintage consists of varying numbers of accounts. In particular, all kinds of data can be collected, including the continuous, binary or count observations.

The most important feature of the vintage data is its dual-time coordinates. Let the calendar time be denoted by t, and each vintage be denoted by its origination time v. Then, the lifetime (i.e. age, or called the "month-on-book" for a monthly vintage) is calculated by m = t − v for t ≥ v. Such (m, t, v) tuples lie in the dual-time domain, called the vintage diagram, with the x-axis representing the calendar time t and the y-axis representing the lifetime m. Unlike a usual 2D space, the vintage diagram consists of unit-slope trajectories each representing the evolvement of a different vintage. When the dual-time observations are the individual lives together with birth and death time points, and when these observations are subject to truncation and censoring, one may draw a line segment for each individual to represent the vintage data graphically. Such a graph is known as the Lexis diagram in demography and survival analysis; see Figure 4.1 based on a dual-time-to-default simulation. It also appears in clinical trials with staggered entry.

For corporate bond and loan issuers, the observations of default events or default rates share the same vintage data structure. Table 1.1 shows the Moody's speculative-grade default rates for annual cohorts 1970-2008, which are calculated from the cumulative rates released recently by Moody's Global Credit Policy (February 2009). Each cohort is observed by year end of 2008, and truncated by 20 years maximum in lifetime. Thus, the performance window for the cohorts originated in the latest years becomes shorter and shorter. To compare the cohort performances in dual-time coordinates, the default rates in Table 1.1 are aligned in lifetime, while they can be also aligned in calendar time. The marginal plots in both lifetime and calendar time are given in Figure 1.4, together with the side-by-side boxplots for checking the vintage (i.e. cohort) heterogeneity. It is clear that the average default rates gradually decrease in lifetime, but are very volatile in calendar time. Note that the calendar dynamics follow closely the NBER-dated recessions shown in Figure 1.1. Besides, there seems to be heterogeneity among vintages. These projection views are only an initial exploration; further analysis of this vintage data will be performed in Chapter III.

Figure 1.4: Moody's speculative-grade default rates for annual cohorts 1970-2008: projection views in lifetime, calendar and vintage origination time.

First of all, it is worth mentioning that the vintage data structure is by no means an exclusive feature of economic and financial data. It also appears in demography, epidemiology, dendrochronology, etc. As an interesting example in dendrochronology, Esper, Cook and Schweingruber (Science, 2002) analyzed the long historical tree-ring records in a dual-time manner in order to reconstruct the past temperature variability. We believe that the dual-time analytics developed in the thesis can be also applied to such non-financial fields.

The main purpose of the thesis is to develop a dual-time modeling framework, or "dual-time analytics", based on observations on the vintage diagram. The road map in Figure 1.5 outlines the scope of the thesis.

In Chapter II, we discuss the adaptive smoothing spline (AdaSS) for heterogeneously smooth function estimation. Two challenging issues are the evaluation of the reproducing kernel and the determination of the local penalty, for which we derive an explicit solution based upon a piecewise type of local adaptation. Note that the AdaSS may estimate 'smooth' functions including possible jumps, and it plays a key role in the subsequent work in the thesis. Four different examples, each having a specific feature of heterogeneous smoothness, are used to demonstrate the new AdaSS technique.

In Chapter III, we develop the vintage data analysis (VDA) for the continuous type of dual-time responses (e.g. loss rates). We propose an MEV decomposition framework based on Gaussian process models, where M stands for the maturation curve, E for the exogenous influence and V for the vintage heterogeneity. The intrinsic identification problem is discussed. An efficient MEV Backfitting algorithm is provided for model estimation, and its performance is assessed by a simulation study. Also discussed is the semiparametric extension of MEV models in the presence of dual-time covariates. Such models are motivated from the practical needs in financial risk management. Then, we apply the MEV risk decomposition strategy to analyze both corporate default rates and retail loan loss rates.

In Chapter IV, we study the dual-time survival analysis (DtSA) for time-to-default data observed on the Lexis diagram. It is of particular importance in credit risk modeling, where the default events could be triggered by both endogenous and exogenous hazards. We consider (a) nonparametric estimators under one-way, two-way and three-way underlying hazards models; (b) structural parameterization via the first-passage-time triggering system with an endogenous distance-to-default process associated with exogenous time-transformation; and (c) dual-time semiparametric Cox regression with both endogenous and exogenous baseline hazards and covariate effects, for which the method of partial likelihood estimation is discussed. Also discussed is the random-effect vintage heterogeneity modeled by the shared frailty. Finally, we demonstrate the application of DtSA to credit card and mortgage risk analysis in retail banking, and shed some light on understanding the ongoing credit crisis from a new dual-time analytic perspective.

Figure 1.5: A road map of thesis developments of statistical methods in credit risk modeling (Chapter I: Introduction — credit risk modeling, survival analysis; Chapter II: AdaSS — smoothing spline, local adaptation; Chapter III: VDA — MEV decomposition, Gaussian process models, semiparametric regression; Chapter IV: DtSA — nonparametric estimation, structural parametrization, dual-time Cox regression).
Table 1.1: Moody's speculative-grade default rates (percent), 1970-2008, tabulated by annual cohort (rows Cohort 1970-2008) and by year since cohort formation (columns Year1-Year20). Data source: Moody's special comment (release: February 2009) on corporate default and recovery rates, 1920-2008 (http://www.moodys.com/) and author's calculations.
CHAPTER II

Adaptive Smoothing Spline

2.1 Introduction

Function estimation from noisy data is a central problem in modern nonparametric statistics, and includes the methods of kernel smoothing [Wand and Jones (1995)], local polynomial smoothing [Fan and Gijbels (1996)] and wavelet shrinkage [Vidakovic (1999)]. In this thesis, we concentrate on the method of smoothing spline, which has an elegant formulation through the mathematical device of the reproducing kernel Hilbert space (RKHS). There is a rich body of literature on smoothing splines since the pioneering work of Kimeldorf and Wahba (1970); see the monographs by Wahba (1990), Green and Silverman (1994) and Gu (2002).

For observational data, statisticians are generally interested in smoothing rather than interpolation. Let the observations be generated by the signal-plus-noise model

  y_i = f(t_i) + ε(t_i),  i = 1, ..., n,  (2.1)

where f is an unknown smooth function with unit-interval support [0, 1], and ε(t_i) is the random error. Ordinarily, the smoothing spline is formulated by the following variational problem

  min_{f ∈ W_m} (1/n) Σ_{i=1}^n w(t_i)[y_i − f(t_i)]² + λ ∫_0^1 [f^(m)(t)]² dt,  λ > 0,  (2.2)
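The fidelity-versus-roughness trade-off in (2.2) can be illustrated with a discrete analogue in which the integral penalty is replaced by squared m-th order differences of the fitted values (a Whittaker-type smoother, not the RKHS solution developed in this chapter); the following sketch and its function names are our own illustration, not the thesis code:

```python
import numpy as np

def whittaker_smooth(y, lam, m=2, w=None):
    """Discrete analogue of the variational problem (2.2):
    minimize (1/n)*sum_i w_i*(y_i - f_i)^2 + lam*||D_m f||^2,
    where D_m is the m-th order difference operator."""
    n = len(y)
    W = np.diag(np.ones(n) if w is None else np.asarray(w, float))
    D = np.diff(np.eye(n), n=m, axis=0)      # (n-m) x n difference matrix
    # Normal equations: (W/n + lam * D'D) f = W y / n
    A = W / n + lam * D.T @ D
    return np.linalg.solve(A, W @ y / n)

# Example: noisy sine signal; a larger lam gives a smoother, flatter fit
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(200)
f_rough = whittaker_smooth(y, lam=1e-9)      # near-interpolation
f_smooth = whittaker_smooth(y, lam=1e-2)     # heavy smoothing
```

As lam grows, the fit trades data fidelity (MSE) for a smaller roughness penalty, exactly the trade-off that λ controls in (2.2).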
where the target function is an element of the m-order Sobolev space

  W_m = {f : f, f′, ..., f^(m−1) absolutely continuous, f^(m) ∈ L₂[0, 1]}.

Let us call the ordinary smoothing spline (2.2) OrdSS in short. The objective in (2.2) is a sum of two functionals, the mean squared error and the roughness penalty,

  MSE = (1/n) Σ_{i=1}^n w(t_i)[y_i − f(t_i)]²,  PEN = ∫_0^1 [f^(m)(t)]² dt,  (2.3)

hence the tuning parameter λ controls the trade-off between fidelity to the data and the smoothness of the estimate, as noted by Wahba (1985) and Nychka (1988). The weights w(t_i) in the MSE come from the non-stationary error assumption of (2.1), where ε(t) ∼ N(0, σ²/w(t)). They can also be used for aggregating data with replicates. Suppose the data are generated by (2.1) with stationary error but time-varying sampling frequency r_k at k = 1, ..., K distinct points; then

  (1/n) Σ_{k=1}^K Σ_{j=1}^{r_k} [y_{kj} − f(t_k)]² = (1/K) Σ_{k=1}^K w(t_k)[ȳ_k − f(t_k)]² + Const,  (2.4)

where ȳ_k ∼ N(f(t_k), σ²/r_k), k = 1, ..., K, are the pointwise averages. To train the OrdSS, the raw data can be replaced by the ȳ_k's together with the weights w(t_k) ∝ r_k.

The OrdSS assumes that f(t) is homogeneously smooth (or rough), then performs regularization based on the simple integral of squared derivatives [f^(m)(t)]². It excludes target functions with spatially varying degrees of smoothness. In practice, there are various types of functions of non-homogeneous smoothness, and four such scenarios are illustrated in Figure 2.1. The first two are simulation examples following exactly the same setting as Donoho and Johnstone (1994), obtained by adding N(0, 1) noise to n = 2048 equally spaced signal responses that are rescaled to attain a signal-to-noise ratio of 7. The last two are sampled data from automotive engineering and financial engineering.

Figure 2.1: Heterogeneous smooth functions: (1) Doppler function simulated with noise; (2) HeaviSine function simulated with noise; (3) Motorcycle-Accident experimental data; (4) Credit-Risk tweaked sample.

1. Doppler function: f(t) = sqrt(t(1 − t)) sin(2π(1 + a)/(t + a)), a = 0.05, t ∈ [0, 1]. It has both time-varying magnitudes and time-varying frequencies.

2. HeaviSine function: f(t) = 3 sin 4πt − sgn(t − 0.3) − sgn(0.72 − t), t ∈ [0, 1]. It has a downward jump at t = 0.3 and an upward jump at t = 0.72.

3. Motorcycle-Accident experiment: a classic data set from Silverman (1985). It consists of accelerometer readings taken through time in a simulated motorcycle crash experiment, in which the time points are irregularly spaced and the noises vary over time. In Figure 2.1, a hypothesized smooth shape is overlaid on the observations, and it illustrates the fact that the acceleration stays relatively constant until the crash impact.

4. Credit-Risk tweaked sample: retail loan loss rates for the revolving credit exposures in retail banking. For academic use, the business background is removed and the sample is tweaked. The tweaked sample includes monthly vintages booked in 7 years (2000-2006), where a vintage refers to a group of credit accounts originated from the same month. They are all observed up to the end of 2006, so a vintage booked earlier has a longer observation length. In Figure 2.1, the x-axis represents the months-on-book; the pointwise averages are plotted as solid dots, and the circles are the vintage loss rates, which are wiggly at large months-on-book due to decreasing sample sizes. Our purpose is to estimate an underlying shape (illustrated by the solid line), whose degree of smoothness, by assumption, increases upon growth.

The first three examples have often been discussed in the nonparametric analysis literature, where function estimation with non-homogeneous smoothness has been of interest for decades. A variety of adaptive methods has been proposed, such as variable-bandwidth kernel smoothing (Müller and Stadtmüller, 1987), multivariate adaptive regression splines (Friedman, 1991), adaptive wavelet shrinkage (Donoho and Johnstone, 1994), local-penalty regression splines (Ruppert and Carroll, 2000), as well as treed Gaussian process modeling (Gramacy and Lee, 2008). In the context of smoothing spline, Luo and Wahba (1997) proposed a hybrid adaptive spline with knot selection, rather than using every sampling point as a knot. More naturally, Wahba (1995), in a discussion, suggested to use the local penalty

  PEN = ∫_0^1 ρ(t)[f^(m)(t)]² dt,  s.t. ρ(t) > 0 and ∫_0^1 ρ(t) dt = 1,  (2.5)

where the add-on function ρ(t) is adaptive in response to the local behavior of f(t), so that heavier penalty is imposed on the regions of lower curvature. Relevant works include Abramovich and Steinberg (1996) and Pintore, Speckman and Holmes (2006). This local penalty functional is the foundation of the whole chapter.

This chapter is organized as follows. In Section 2.2 we formulate the AdaSS (adaptive smoothing spline) through the local penalty (2.5), derive its solution via RKHS and generalized ridge regression, then discuss how to determine both the smoothing parameter and the local penalty adaptation by the method of cross-validation. Section 2.3 covers some basic properties of the AdaSS. The experimental results for the aforementioned examples are presented in Section 2.4. We summarize the chapter with Section 2.5 and give technical proofs in the last section.

2.2 AdaSS: Adaptive Smoothing Spline

The AdaSS extends the OrdSS (2.2) based on the local penalty (2.5). It is formulated as

  min_{f ∈ W_m} (1/n) Σ_{i=1}^n w(t_i)[y_i − f(t_i)]² + λ ∫_0^1 ρ(t)[f^(m)(t)]² dt,  λ > 0.  (2.6)

The following proposition has its root in Kimeldorf and Wahba (1971); or see Wahba (1990, §1).

Proposition 2.1 (AdaSS). Given a bounded integrable positive function ρ(t), define the r.k.

  K(t, s) = ∫_0^1 ρ⁻¹(u) G_m(t, u) G_m(s, u) du,  t, s ∈ [0, 1],  (2.7)

where G_m(t, u) = (t − u)₊^(m−1)/(m − 1)!. Then, the solution of (2.6) can be expressed as

  f(t) = Σ_{j=0}^{m−1} α_j φ_j(t) + Σ_{i=1}^n c_i K(t, t_i) ≡ αᵀφ(t) + cᵀξ_n(t),  (2.8)

where φ_j(t) = t^j/j! for j = 0, ..., m − 1, α ∈ R^m and c ∈ R^n.

Since the AdaSS has a finite-dimensional linear expression (2.8), it can be solved by generalized linear regression. Write y = (y₁, ..., y_n)ᵀ, B = (φ(t₁), ..., φ(t_n))ᵀ, W = diag(w(t₁), ..., w(t_n)) and Σ = [K(t_i, t_j)]_{n×n}. Substituting (2.8) into (2.6), we have the regularized least squares problem

  min_{α,c} (1/n) ‖y − Bα − Σc‖²_W + λ‖c‖²_Σ,  (2.9)

where ‖x‖²_A = xᵀAx for A ≥ 0. By setting the partial derivatives with respect to α and c to zero, we obtain the equations BᵀW[Bα + Σc] = BᵀWy and ΣW[Bα + (Σ + nλW⁻¹)c] = ΣWy, or

  Bᵀc = 0,  Bα + (Σ + nλW⁻¹)c = y.  (2.10)

Consider the QR-decomposition B = (Q₁, Q₂)(Rᵀ, 0ᵀ)ᵀ = Q₁R with orthogonal (Q₁, Q₂)_{n×n} and upper-triangular R_{m×m}, where Q₁ is n × m and Q₂ is n × (n − m). Multiplying both sides of y = Bα + (Σ + nλW⁻¹)c by Q₂ᵀ and using the fact that c = Q₂Q₂ᵀc (since Bᵀc = 0), we have that Q₂ᵀy = Q₂ᵀ(Σ + nλW⁻¹)c = Q₂ᵀ(Σ + nλW⁻¹)Q₂Q₂ᵀc. Then, simple algebra yields

  c = Q₂[Q₂ᵀ(Σ + nλW⁻¹)Q₂]⁻¹Q₂ᵀy,  α = R⁻¹Q₁ᵀ[y − (Σ + nλW⁻¹)c].  (2.11)

The fitted values at the design points are given by ŷ = Bα + Σc, so the residuals are given by

  e = y − ŷ = nλW⁻¹c ≡ (I − S_{λ,ρ})y,  (2.12)

where

  S_{λ,ρ} = I − nλW⁻¹Q₂[Q₂ᵀ(Σ + nλW⁻¹)Q₂]⁻¹Q₂ᵀ  (2.13)

is often called the smoothing matrix, in the sense that ŷ = S_{λ,ρ}y.
We use the notation S_{λ,ρ} to denote its explicit dependence on both the smoothing parameter λ and the local adaptation ρ(·). For the sampling time points in particular, p = trace(S_{λ,ρ}) can be interpreted as the equivalent degrees of freedom. The estimate of the noise variance is given by

  σ̂² = ‖(I − S_{λ,ρ})y‖²_W / (n − p).  (2.14)

The AdaSS is analogous to the OrdSS in constructing confidence intervals under the Bayesian interpretation of Wahba (1983). For example, we may construct 95% intervals pointwise by

  f̂(t_i) ± z_{0.025} sqrt(σ̂² w⁻¹(t_i) S_{λ,ρ}[i, i]),  i = 1, ..., n,

where S_{λ,ρ}[i, i] represents the i-th diagonal element of the smoothing matrix.

2.2.1 Reproducing Kernels

The AdaSS depends on the evaluation of the r.k. of the integral form (2.7), where ρ is needed for the evaluation of the matrix Σ = [K(t_i, t_j)]. In what follows we present a closed-form expression of K(t, s) when ρ⁻¹(t) is piecewise linear. For general ρ(t), we must resort to numerical integration techniques.

Proposition 2.2 (Reproducing Kernel). Let ρ⁻¹(t) = a_k + b_k t, t ∈ [τ_{k−1}, τ_k), for some fixed (a_k, b_k) and the knots {0 ≡ τ₀ < τ₁ < ... < τ_K ≡ 1}. Then, the AdaSS kernel (2.7) can be explicitly evaluated by

  K(t, s) = Σ_{k=1}^{κ_{t∧s}} Σ_{j=0}^{m−1} { (−1)^j [ (a_k + b_k u)(u − t)^(m−1−j)(u − s)^(m+j) / ((m−1−j)!(m+j)!)
            − b_k Σ_{l=0}^{m−1−j} (−1)^l (u − t)^(m−1−j−l)(u − s)^(m+1+j+l) / ((m−1−j−l)!(m+1+j+l)!) ] }_{u=τ_{k−1}}^{u=t∧s∧τ_k},  (2.15)

where κ_{t∧s} = min{k : τ_k > t ∧ s} (for example, κ_{1∧1} = K), and {F(u)}_{u=τ}^{u=τ′} = F(τ′) − F(τ). The closed form (2.15) can be obtained by straightforward but tedious calculations. Proposition 2.2 is general enough to cover many types of r.k.'s.
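For small n, the stationarity system (2.10) can also be assembled and solved directly as one block linear system, without the QR route of (2.11)-(2.13). The numpy sketch below is our own illustration, not the thesis implementation; it uses the OrdSS special case ρ(t) ≡ 1 and m = 2, whose kernel simplifies to K(t, s) = (t∧s)²(t∨s)/2 − (t∧s)³/6, and checks the constraint Bᵀc = 0:

```python
import numpy as np

def ordss_kernel(t, s):
    """OrdSS reproducing kernel for rho(t) == 1 and m = 2 (cubic smoothing spline):
    K(t, s) = min(t,s)^2 * max(t,s) / 2 - min(t,s)^3 / 6."""
    a, b = np.minimum(t, s), np.maximum(t, s)
    return a ** 2 * b / 2 - a ** 3 / 6

def fit_spline_system(t, y, lam, w=None, m=2):
    """Solve the stationarity system (2.10): B'c = 0, B a + (Sig + n*lam*W^-1) c = y."""
    n = len(y)
    Winv = np.diag(1.0 / (np.ones(n) if w is None else np.asarray(w, float)))
    B = np.vander(t, m, increasing=True)        # phi_0(t) = 1, phi_1(t) = t
    Sig = ordss_kernel(t[:, None], t[None, :])
    # Saddle-point block system for (c, alpha)
    A = np.block([[Sig + n * lam * Winv, B],
                  [B.T, np.zeros((m, m))]])
    rhs = np.concatenate([y, np.zeros(m)])
    sol = np.linalg.solve(A, rhs)
    return sol[n:], sol[:n], B, Sig             # alpha, c, B, Sig

t = np.linspace(0.01, 1, 50)
y = np.sin(2 * np.pi * t)                       # noiseless smooth signal
alpha, c, B, Sig = fit_spline_system(t, y, lam=1e-6)
fitted = B @ alpha + Sig @ c
```

With a tiny lam, the fit nearly interpolates the smooth signal; the bottom block row of the system enforces Bᵀc = 0 exactly, matching (2.10).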
Setting b_k ≡ 0 gives the piecewise-constant results for ρ⁻¹(t) obtained by Pintore, et al. (2006). For another example, set K = 1 and ρ⁻¹(t) = a + bt for t ∈ [0, 1], with a = b/(e^b − 1) and b ∈ R. (By (2.5), this choice of a normalizes ∫_0^1 ρ(t)dt = 1.) Then, for m = 2,

  K(t, s) = (a/6)(3t²s − t³) + (b/12)(2t³s − t⁴), if t ≤ s;
  K(t, s) = (a/6)(3ts² − s³) + (b/12)(2ts³ − s⁴), otherwise.  (2.16)

When b = 0, (2.16) reduces to the OrdSS kernel for fitting cubic smoothing splines.

The derivatives of the r.k. are also of interest. Here, we present the OrdSS kernels (by setting ρ(t) ≡ 1 in Proposition 2.2) and their derivatives to be used in the next section:

  K(t, s) = Σ_{j=0}^{m−1} (−1)^j t^(m−1−j) s^(m+j) / ((m−1−j)!(m+j)!) + (−1)^m (s − t)^(2m−1)/(2m−1)! · I(t ≤ s).  (2.17)

The well-defined higher-order derivatives of the kernel can be derived from (2.7); for the piecewise-linear ρ⁻¹(t), differentiating (2.7) m times implies that

  ∂^m K(t, s)/∂t^m = ρ⁻¹(t) G_m(s, t) = (a_{κ_t} + b_{κ_t} t)(s − t)^(m−1)/(m − 1)!, if t ≤ s; and 0, otherwise.  (2.18)

Lower-order l-th derivatives (l < m) of (2.15) are also of interest, and they should be treated from knot to knot if ρ⁻¹(t) is piecewise. For the OrdSS kernel (2.17),

  ∂^l K(t, s)/∂t^l = Σ_{j=0}^{m−1−l} (−1)^j t^(m−1−j−l) s^(m+j) / ((m−1−j−l)!(m+j)!) + (−1)^(m+l) (s − t)^(2m−1−l)/(2m−1−l)! · I(t ≤ s).  (2.19)

Note that the kernel derivative (2.19) coincides with (2.18) when l = m (upon setting ρ(t) ≡ 1). In solving boundary-value differential equations, the Green's function G_m(t, u) has the property that f(t) = ∫_0^1 G_m(t, u)f^(m)(u)du for f ∈ W_m with zero f^(j)(0), j = 0, ..., m − 1; the equation (2.7) is such an example of applying G_m(t, ·) to K(t, s) (s fixed).

Remarks: Numerical methods are inevitable for approximating (2.7) when ρ⁻¹(t) takes general forms.
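These closed forms are easy to sanity-check against direct quadrature of the defining integral (2.7) with ρ(t) ≡ 1; the small self-contained check below is our own, comparing the m = 2 case of (2.17) with a midpoint-rule approximation:

```python
import numpy as np

def ordss_kernel_closed(t, s):
    # Closed form (2.17) for m = 2; for t <= s it equals t*s^2/2 - s^3/6 + (s-t)^3/6,
    # which simplifies to min^2 * max / 2 - min^3 / 6.
    a, b = min(t, s), max(t, s)
    return a * a * b / 2 - a ** 3 / 6

def ordss_kernel_quad(t, s, n=200000):
    # Direct numerical integration of (2.7): K(t,s) = int_0^1 (t-u)_+ (s-u)_+ du
    u = (np.arange(n) + 0.5) / n              # midpoint rule on [0, 1]
    g = np.maximum(t - u, 0.0) * np.maximum(s - u, 0.0)
    return g.mean()

closed = ordss_kernel_closed(0.2, 0.7)
quad = ordss_kernel_quad(0.2, 0.7)
```

The two values agree to high accuracy, and symmetry K(t, s) = K(s, t) holds by construction.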
Rather than directly approximating (2.7), one may approximate ρ⁻¹(t) first by a piecewise function with knot selection, then use the closed-form results derived in Proposition 2.2.

2.2.2 Local Penalty Adaptation

One of the most challenging problems for the AdaSS is the selection of the smoothing parameter λ and the local penalty adaptation function ρ(t). We use the method of cross-validation based on the leave-one-out scheme. Let f̂(t) denote the AdaSS trained on the complete sample of size n, and f̂^[k](t) denote the AdaSS trained on the leave-one-out sample {(t_i, y_i)}_{i≠k}, for k = 1, ..., n. Define the score of cross-validation by

  CV(λ, ρ) = (1/n) Σ_{i=1}^n w(t_i)[y_i − f̂^[i](t_i)]².  (2.20)

For the linear predictor (2.13) with smoothing matrix S_{λ,ρ}, Craven and Wahba (1979) justified that f̂(t_i) − f̂^[i](t_i) = S_{λ,ρ}[i, i](y_i − f̂^[i](t_i)) for i = 1, ..., n. It follows that

  CV(λ, ρ) = (1/n) Σ_{i=1}^n w(t_i)(y_i − f̂(t_i))² / (1 − S_{λ,ρ}[i, i])² = (1/n) yᵀW(I − diag(S_{λ,ρ}))⁻²(I − S_{λ,ρ})²y,  (2.21)

where diag(S_{λ,ρ}) is the diagonal part of the smoothing matrix. Replacing the elements of diag(S_{λ,ρ}) by their average n⁻¹trace(S_{λ,ρ}), we obtain the so-called generalized cross-validation score

  GCV(λ, ρ) = (1/n) Σ_{i=1}^n w(t_i)(y_i − f̂_λ(t_i))² / (1 − n⁻¹trace(S_{λ,ρ}))².  (2.22)

For fixed ρ(·), the best tuning parameter λ is usually found by minimizing (2.21) or (2.22) on its log scale, i.e. by varying log λ on the real line. To select ρ(·), consider the rules: (1) choose a smaller value of ρ(t) if f(·) is less smooth at t; and (2) the local roughness of f(·) is quantified by the squared m-th derivative |f^(m)(t)|².
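For any linear smoother ŷ = Sy, the scores (2.21) and (2.22) can be computed from a single fit, without n refits. The snippet below is our own illustration, substituting a ridge-type smoother for the spline smoothing matrix:

```python
import numpy as np

def cv_scores(y, S, w=None):
    """Leave-one-out CV (2.21) and GCV (2.22) for a linear smoother y_hat = S @ y."""
    n = len(y)
    w = np.ones(n) if w is None else np.asarray(w, float)
    resid = y - S @ y
    d = np.diag(S)
    cv = np.mean(w * (resid / (1 - d)) ** 2)
    gcv = np.mean(w * resid ** 2) / (1 - np.trace(S) / n) ** 2
    return cv, gcv

# Ridge-type smoother on a random design: S = X (X'X + lam I)^-1 X'
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(40)
S = X @ np.linalg.solve(X.T @ X + 0.5 * np.eye(5), X.T)
cv, gcv = cv_scores(y, S)
```

For ridge-type smoothers the leave-one-out identity behind (2.21) is exact, so `cv` equals the score obtained by actually refitting n times.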
If the true f^(m)(t) were known, it would be reasonable to determine the shape of ρ(t) by

  ρ⁻¹(t) ∝ |f^(m)(t)|².  (2.23)

Since the true function f(t) is unknown a priori, with no prior knowledge, we may adopt a two-step procedure for the selection of ρ(t), as follows.

Step 1: run the OrdSS with order m + a (a = 0, 1, 2) and cross-validated λ to obtain the initial function estimate f̃(t) = Σ_{j=0}^{m+a−1} α̃_j φ_j(t) + Σ_{i=1}^n c̃_i K̃(t, t_i), where K̃(t, s) = ∫_0^1 G_{m+a}(t, u)G_{m+a}(s, u)du.

Step 2: calculate f̃^(m)(t) and use it to determine the shape of ρ⁻¹(t).

In Step 1, the extra order a imposed on the penalized derivatives ∫_0^1 [f^(m+a)(t)]²dt is to control the smoothness of the m-th derivative estimate. Let us consider the estimation of the derivatives. In the context of smoothing splines, nonparametric derivative estimation is usually obtained by taking derivatives of the spline estimate; since the spline estimate is a finite sum of basis functions, so are its derivatives. In Step 2, the m-th derivative of f̃(t) can be obtained by differentiating the bases,

  f̃^(m)(t) = Σ_{j=0}^{a−1} α̃_{m+j} φ_j(t) + Σ_{i=1}^n c̃_i ∂^m K̃(t, t_i)/∂t^m,  (2.24)

where ∂^m K̃(t, t_i)/∂t^m is calculated according to (2.19) after replacing m by m + a. Rice and Rosenblatt (1983) studied the large-sample properties of spline derivatives (2.24), which have slower convergence rates than the spline itself. The goodness of the spline derivative depends largely on the spline estimate itself, and the choice of a depends on how smooth the estimated f̃^(m)(t) is desired to be. Given the finite sample, we find by simulation study that (2.24) tends to undersmooth, but it may capture the rough shape of the derivatives upon standardization.
Figure 2.2 demonstrates the estimation of the 2nd-order derivative for f(t) = sin(ωt) and f(t) = sin(ωt⁴) using the OrdSS with m = 2 and 3, where ω = 10π, n = 100 and snr = 7, meeting our needs for estimating ρ(t) below.

Figure 2.2: OrdSS function estimate and its 2nd-order derivative (upon standardization): scaled signal from f(t) = sin(ωt) (upper panel) and f(t) = sin(ωt⁴) (lower panel).

Both sine-wave examples have time-varying frequency, and the sin(ωt⁴) signal resembles the Doppler function in Figure 2.1. The order-2 OrdSS could yield rough estimates of the derivative, and the order-3 OrdSS could give smooth estimates but large bias on the boundary. Despite these downside effects, the shape of the derivative can be more or less captured in both examples.

In determining the shape of ρ⁻¹(t) from f̃^(m)(t), we restrict ourselves to the piecewise class of functions, as in Proposition 2.2. By (2.23), we suggest to estimate ρ⁻¹(t) piecewise by the method of constrained least squares:

  |f̃^(m)(t)|^(2γ) + ε = a₀ρ⁻¹(t),  ρ⁻¹(t) = a_{κ_t} + b_{κ_t}t,  κ_t = min{k : τ_k ≥ t},  (2.25)

subject to the constraints (1) positivity: ρ⁻¹(t) > 0 for t ∈ [0, 1]; (2) continuity: a_k + b_kτ_k = a_{k+1} + b_{k+1}τ_k for k = 1, ..., K − 1; and (3) normalization: the scaling constant a₀ > 0 is chosen such that ∫_0^1 ρ(t)dt = Σ_{k=1}^K ∫_{τ_{k−1}}^{τ_k} dt/(a_k + b_kt) = 1, for given γ ≥ 1 and pre-specified knots {0 ≡ τ₀ < τ₁ < ... < τ_{K−1} < τ_K ≡ 1}. Clearly, ρ⁻¹(t) is an example of a linear regression spline. If the continuity condition is relaxed and b_k ≡ 0, we obtain the piecewise-constant ρ⁻¹(t).

The power transform |f̃^(m)(t)|^(2γ), γ ≥ 1, is used to stretch [f̃^(m)(t)]², in order to make up for (a) the overestimation of [f^(m)(t)]² in slightly oscillating regions and (b) the underestimation of [f^(m)(t)]² in highly oscillating regions, since the initial OrdSS is trained under the improper adaptation ρ(t) ≡ 1. The lower-right panel of Figure 2.2 is such an example of overestimating the roughness by the OrdSS.

The knots {τ_k}_{k=1}^K ∈ [0, 1] can be specified either regularly or irregularly. In most situations, we may choose equally spaced knots with K to be determined. For the third example in Figure 2.1, which shows flat-versus-rugged heterogeneity, one may choose sparse knots for the flat region and dense knots for the rugged region, and may also choose regular knots such that ρ⁻¹(t) is nearly constant in the flat region. However, for the second example in Figure 2.1, which shows jump-type heterogeneity, one should shrink the size of the interval containing the jump. Given the knots, the parameters (λ, γ) can be jointly determined by the cross-validation criteria (2.21) or (2.22).

Remarks: The selection of the local adaptation ρ⁻¹(t) is a non-trivial problem, for which we suggest to use ρ⁻¹(t) ∝ |f̃^(m)(t)|^(2γ) together with piecewise approximation. This can be viewed as a joint force of Abramovich and Steinberg (1996) and Pintore, et al. (2006): the former directly executed (2.23) on each sampled t_i, while the latter directly imposed the piecewise-constant ρ⁻¹(t) without utilizing the fact (2.23). Note that the approach of Abramovich and Steinberg (1996) may suffer the estimation bias of |f̃^(m)(t)|² as discussed above, and the r.k. evaluation by approximation based on a discrete set of ρ⁻¹(t_i) remains another issue. As for Pintore, et al. (2006), the high-dimensional parametrization of ρ⁻¹(t) requires nonlinear optimization whose stability is usually of concern. All these issues can be resolved by our approach, which therefore has both analytical and computational advantages.
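In the piecewise-constant case (b_k ≡ 0 with the continuity condition relaxed), the constrained least-squares step (2.25) essentially reduces to averaging |f̃^(m)(t)|^(2γ) over each knot interval and rescaling so that ∫_0^1 ρ(t)dt = 1. The following sketch is our own simplification of that special case, not the thesis implementation:

```python
import numpy as np

def piecewise_constant_rho_inv(t, deriv, knots, gamma=1.0):
    """Estimate rho^{-1}(t) = a_k on [tau_{k-1}, tau_k) from |deriv|^{2*gamma},
    normalized so that int_0^1 rho(t) dt = sum_k (tau_k - tau_{k-1}) / a_k = 1."""
    power = np.abs(deriv) ** (2 * gamma)
    # Bin each t into its knot interval
    idx = np.minimum(np.searchsorted(knots, t, side='right') - 1, len(knots) - 2)
    # Positivity (constraint 1): floor the interval averages away from zero
    a = np.array([max(power[idx == k].mean(), 1e-8)
                  for k in range(len(knots) - 1)])
    widths = np.diff(knots)
    a = a * np.sum(widths / a)      # rescale so the normalization (constraint 3) holds
    return a

t = np.linspace(0, 1, 400)
deriv = np.sin(10 * np.pi * t ** 4)  # a surrogate curvature estimate (rough at large t)
knots = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
a_hat = piecewise_constant_rho_inv(t, deriv, knots)
```

Because the surrogate curvature oscillates more near t = 1, the estimated ρ⁻¹ is larger there, i.e. less penalty is imposed where the function is rougher, as rule (1) prescribes.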
2.3 AdaSS Properties
Recall that the OrdSS (2.2) has many interesting statistical properties, e.g. (a) it is a natural polynomial spline of degree (2m − 1) with knots placed at t₁, ..., t_n (hence, the OrdSS with m = 2 is usually referred to as the cubic smoothing spline); (b) it has a Bayesian interpretation through the duality between RKHS and Gaussian stochastic processes (Wahba, 1990, §1.4-1.5); (c) its large-sample properties of consistency and rate of convergence depend on the smoothing parameter (Wahba, 1990, §4 and references therein); (d) it has an equivalent kernel smoother with variable bandwidth (Silverman, 1984). Similar properties are desired to be established for the AdaSS. We leave (c) and (d) to future investigation.

For (a), the AdaSS properties depend on the r.k. (2.7) and the structure of the local adaptation ρ⁻¹(t). Pintore, et al. (2006) investigated the piecewise-constant case of ρ⁻¹(t) and argued that the AdaSS is also a piecewise polynomial spline of degree (2m − 1) with knots on t₁, ..., t_n, while it allows for more flexible lower-order derivatives than the OrdSS. The piecewise-linear case of ρ⁻¹(t) can be analyzed in a similar manner. For the AdaSS (2.8) with the r.k. (2.15),
  f(t) = Σ_{j=0}^{m−1} α_j φ_j(t) + Σ_{i: t_i ≤ t} c_i K(t, t_i) + Σ_{i: t_i > t} c_i K(t, t_i),

with degree (m − 1) in the first two terms, and degree 2m in the third term (after one degree increase due to the trend of ρ⁻¹(t)).
For (b), the AdaSS based on the r.k. (2.7) corresponds to the following zero-mean Gaussian process

  Z(t) = ∫_0^1 ρ^(−1/2)(u) G_m(t, u) dW(u),  t ∈ [0, 1],  (2.26)

where W(t) denotes the standard Wiener process. The covariance kernel is given by E[Z(t)Z(s)] = K(t, s). When ρ(t) ≡ 1, the Gaussian process (2.26) is the (m − 1)-fold integrated Wiener process studied by Shepp (1966). Following Wahba (1990, §1.5), consider a random-effect model F(t) = αᵀφ(t) + b^(1/2)Z(t); then the best linear unbiased estimate of F(t) from noisy data Y(t) = F(t) + ε corresponds to the AdaSS solution (2.8), provided that the smoothing parameter λ = σ²/nb. Moreover, one may set an improper prior on the coefficients α, derive the posterior of F(t), then obtain a Bayesian type of confidence interval estimate; see Wahba (1990, §5). Such confidence intervals on the sampling points are given in (2.14).

The inverse function ρ⁻¹(t) in (2.26) can be viewed as the measurement of the non-stationary volatility (or variance) of the m-th derivative process, such that

  d^m Z(t)/dt^m = ρ^(−1/2)(t) dW(t)/dt.  (2.27)
By the duality between Gaussian processes and RKHS, d^m Z(t)/dt^m corresponds to f^(m)(t) in (2.6). This provides an alternative reasoning for (2.23), in the sense that ρ⁻¹(t) can be determined locally by the second-order moments of d^m Z(t)/dt^m. To illustrate the connection between the AdaSS and the Gaussian process with non-stationary volatility, consider the case m = 1 and piecewise-constant ρ⁻¹(t) = a_k, t ∈ [τ_{k−1}, τ_k), for some a_k > 0 and pre-specified knots 0 ≡ τ₀ < τ₁ < ... < τ_K ≡ 1. By (2.15), the r.k. is given by
  K(t, s) = Σ_{k=1}^{κ_{t∧s}} a_k (t ∧ s ∧ τ_k − τ_{k−1}),  t, s ∈ [0, 1].  (2.28)
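For m = 1, the kernel (2.28) is simply the running integral of ρ⁻¹ up to t ∧ s, i.e. K(t, s) = ∫_0^(t∧s) ρ⁻¹(u)du, which can be checked numerically; the sketch below is our own:

```python
import numpy as np

def k_m1(t, s, knots, a):
    """Piecewise-constant m = 1 kernel (2.28): sum_k a_k * (t^s^tau_k - tau_{k-1})."""
    ts = min(t, s)
    total = 0.0
    for k in range(len(a)):
        if knots[k] >= ts:          # interval starts at or beyond t^s: no contribution
            break
        total += a[k] * (min(ts, knots[k + 1]) - knots[k])
    return total

def k_m1_quad(t, s, knots, a, n=100000):
    # Direct midpoint-rule integral of rho^{-1}(u) = a_k over [0, t^s]
    ts = min(t, s)
    u = np.linspace(0, ts, n, endpoint=False) + ts / (2 * n)
    rho_inv = a[np.minimum(np.searchsorted(knots, u, side='right') - 1, len(a) - 1)]
    return rho_inv.mean() * ts

knots = np.array([0.0, 0.3, 0.6, 1.0])
a = np.array([1.0, 5.0, 0.5])
val_closed = k_m1(0.75, 0.9, knots, a)   # 1.0*0.3 + 5.0*0.3 + 0.5*0.15 = 1.875
val_quad = k_m1_quad(0.75, 0.9, knots, a)
```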
By (2.26) and (2.27), the corresponding Gaussian process satisfies

  dZ(t) = sqrt(a_k) dW(t),  for t ∈ [τ_{k−1}, τ_k), k = 1, ..., K,  (2.29)

where a_k is the local volatility from knot to knot. Clearly, Z(t) reduces to the standard Wiener process W(t) when a_k ≡ 1. As one of its most appealing features, the construction (2.29) takes into account the jump diffusion, which occurs in [τ_{k−1}, τ_k) when the interval shrinks and the volatility a_k blows up.
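The construction (2.29) is straightforward to simulate on a grid by scaling each Brownian increment by sqrt(a_k) for the interval containing it; an illustrative sketch (our own), in which the variance of the increments scales in proportion to a_k:

```python
import numpy as np

def simulate_z(n_paths, n_steps, knots, a, rng):
    """Euler simulation of dZ = sqrt(a_k) dW on [0, 1], per (2.29)."""
    dt = 1.0 / n_steps
    t_mid = (np.arange(n_steps) + 0.5) * dt
    k = np.minimum(np.searchsorted(knots, t_mid, side='right') - 1, len(a) - 1)
    vol = np.sqrt(a[k])                               # sqrt(a_k) for each step
    dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    return np.cumsum(vol * dW, axis=1)                # Z at the grid points

rng = np.random.default_rng(42)
knots = np.array([0.0, 0.5, 1.0])
a = np.array([1.0, 10.0])                             # volatility jumps up after t = 0.5
Z = simulate_z(20000, 200, knots, a, rng)
# increments on (0.5, 1] should be about 10 times as variable as on [0, 0.5]
var_lo = np.var(Z[:, 99] - Z[:, 0])
var_hi = np.var(Z[:, 199] - Z[:, 99])
```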
2.4 Experimental Results
The implementation of the OrdSS and the AdaSS differs only in the formation of Σ through the r.k. K(t, s); it is most crucial for the AdaSS to select the local adaptation ρ⁻¹(t). Consider the four examples in Figure 2.1 with various scenarios of heterogeneous smoothness. The curve fitting results by the OrdSS with m = 2 (i.e. the cubic smoothing spline) are shown in Figure 2.3, as well as the 95% confidence intervals for the last two cases. In what follows, we discuss the need for local adaptation in each case, and compare the performance of the OrdSS and the AdaSS.

Simulation Study. The Doppler and HeaviSine functions are often taken as benchmark examples for testing adaptive smoothing methods. The global smoothing by the OrdSS (2.2) may fail to capture the heterogeneous smoothness and, as a consequence, oversmooth the regions where the signal is relatively smooth and undersmooth where the signal is highly oscillating. For the Doppler case in Figure 2.3, the underlying signal increases its smoothness with time, but the OrdSS estimate shows lots of wiggles for t > 0.5. In the HeaviSine case, both jumps at t = 0.3 and t = 0.72 could be captured by the OrdSS, but the smoothness elsewhere is sacrificed as a compromise. The AdaSS fitting results with piecewise-constant ρ⁻¹(t) are shown in Figure 2.4.
To estimate the piecewise-constant local adaptation ρ⁻¹(t), we first fitted the OrdSS with cross-validated tuning parameter λ, then estimated the 2nd-order derivatives by (2.24). Initially, we pre-specified 6 equally spaced knots (including 0 and 1) for the Doppler function, and the knots (0, 0.295, 0.305, 0.715, 0.725, 1) for the HeaviSine function. (Note that such knots can be determined graphically based on the plots of the squared derivative estimates.) Given the knots, the function ρ⁻¹(t) is estimated by constrained least squares (2.25) conditional on the power transform parameter γ, where the best γ* is found jointly with λ by minimizing the score of cross-validation (2.21). In our implementation, we assumed that both tuning parameters are bounded such that log λ ∈ [−40, 10] and γ ∈ [1, 5], then performed the bounded nonlinear optimization. Table 2.1 gives the numerical results in both simulation cases, including the cross-validated (λ, γ), the estimated variance of noise (cf. true σ² = 1), and the equivalent degrees of freedom.

Figure 2.3: OrdSS curve fitting with m = 2 (shown by solid lines); the dashed lines represent 95% confidence intervals. In the credit-risk case, the log loss rates are considered as the responses, and the time-dependent weights are specified proportional to the number of replicates.

Figure 2.4: Simulation study of Doppler and HeaviSine functions: OrdSS (blue), AdaSS (red) and the heterogeneous truth (light background).
Table 2.1: AdaSS parameters in the Doppler and HeaviSine simulation study (n = 2048 and snr = 7 in both cases): the cross-validated (λ, γ), the estimated noise variance σ̂², the equivalent degrees of freedom, and the CV score.

Both simulation examples demonstrate the advantage of using the AdaSS. For the Doppler function, Figure 2.4 (bottom panel) shows the improvement of the AdaSS over the OrdSS after zooming in; the imposed penalty ρ(t) in the first interval [0, 0.2) is much less than that in the other four intervals. For the HeaviSine function, the middle panel shows the data-driven estimates of the adaptation function ρ(t) plotted on the log scale: low penalty is assigned to where the jumps occur, while relatively the same high level of penalty is assigned to the other regions. A similar simulation study of the Doppler function can be found in Pintore, et al. (2006), who used a small sample size (n = 128) and high-dimensional optimization techniques.

Motorcycle-Accident Data. Silverman (1985) originally used the motorcycle-accident experimental data to test the non-stationary OrdSS curve fitting. Recall that the data consists of acceleration measurements (subject to noise) of the head of a motorcycle rider during the course of a simulated crash experiment. Upon the crash occurrence, it decelerates to about negative 120 gal (a unit gal is 1 centimeter per second squared), and the data indicates clearly that the acceleration rebounds well above its original level before setting back. Besides error non-stationarity, smoothness non-homogeneity is another important feature of this data set; to fit such data, the treatment from a heterogeneous smoothness point of view is more appropriate than using only the non-stationary error treatment, resulting in heterogeneous curve fitting. Let us focus on the period prior to the crash happening at t ≈ 0.24 (after time rescaling), for which it is reasonable to assume that the acceleration stays constant.

Figure 2.5: Non-stationary OrdSS and AdaSS for Motorcycle-Accident Data.

Figure 2.6: OrdSS, non-stationary OrdSS and AdaSS: performance in comparison.
In the presence of non-stationary errors N(0, σ²/w(t_i)) with unknown weights w(t_i), a common approach is to begin with an unweighted OrdSS and estimate w⁻¹(t_i) from the local residual sums of squares, to which the simple pointwise or moving averaging (e.g. loess) is usually applied; see Silverman (1985). Then, the smoothed weight estimates are plugged into (2.2) to run the non-stationary OrdSS. To perform the AdaSS (2.6) with the same weights, we estimated the local adaptation ρ⁻¹(t) from the derivatives of the non-stationary OrdSS. Both results are shown in Figure 2.5, and both non-stationary spline models result in tighter estimates of the 95% confidence intervals than the unweighted OrdSS; see the confidence intervals in Figure 2.3 for comparison.

Figure 2.6 compares the performances of the stationary OrdSS, the non-stationary OrdSS and the non-stationary AdaSS. It is clear that the AdaSS performs better than both versions of the OrdSS in modeling the constant acceleration before the crash happening: the OrdSS estimate in Figure 2.3 shows a slight bump around t ≈ 0.2, i.e. a false discovery, while the estimate by the non-stationary AdaSS looks reasonable. After zooming in around t ∈ [0.2, 0.28], they differ in estimating the minimum of the acceleration curve at t ≈ 0.36,
where the nonstationary OrdSS seems to have underestimated the magnitude of deceleration.. In presence of nonstationary errors N (0.1.
This is also a desired property of the maturation curve for the purpose of extrapolation. which however does not smooth out the maturation curve for large monthonbook. averages have increasing variability. Alternatively.4). By (2. we applied the AdaSS with judgemental prior on heterogeneous smoothness of the target function. spline smoothing may work on the pointwise averages subject to nonstationary errors N (0. our experiences tell that the maturation curve is expected to have increasing degrees of smoothness once the lifetime exceeds certain monthonbook. and (b) the raw inputs for training purpose are contaminated and have abnormal behavior. For demonstration. σ 2 /w(ti )) with weights determined by the number of replicates.3. The reasons of the unstability in the rightmost region could be (a) OrdSS has the intrinsic boundary bias for especially small w(ti ). The exogenous cause of contamination in (b) will be explained in Chapter III. The nonstationary OrdSS estimate is shown in Figure 2. Nevertheless.7: AdaSS estimate of maturation curve for CreditRisk sample: piecewiseconstant ρ−1 (t) (upper panel) and piecewiselinear ρ−1 (t) (lower panel). we purposefully speciﬁed 44 .Figure 2.
45 . Four diﬀerent examples. it is not clear how the AdaSS can be extended to multivariate smoothing in a tensorproduct space. each having a speciﬁc feature of heterogeneous smoothness. 2. we will consider the application of AdaSS in a dualtime domain. we proposed a shape estimation procedure based the function derivatives. Compared with the OrdSS performance in the large monthonbook region. transformed to the original scale. as an analogy of the smoothing spline ANOVA models in Gu (2002). In particular.7. Both speciﬁcations are nondecreasing in lifetime and they span about the same range. (c) and (d) listed in Section 2. This creditrisk sample will appear again in Chapter III of vintage data analysis. as well as tighter conﬁdence intervals based on judgemental prior. we studied the AdaSS with piecewise adaptation of local penalty and derived the closedform reproducing kernels. the AdaSS yields smoother estimate of maturation curve. The curve ﬁtting results for both ρ−1 (t) are nearly identical upon visual inspection. Its connection with a timewarping smoothing spline is also under our investigation. To determine the local adaptation function.ρ−1 (t) to take the forms of Figure 2.5 Summary In this chapter we have studied the adaptive smoothing spline for modeling heterogeneously ‘smooth’ functions including possible jumps. where the notion of smoothing spline will be generalized via other Gaussian processes. The AdaSS estimate of maturation curve.t. In next chapter.3. However. The really interesting story is yet to unfold. were used to demonstrate the improvement of AdaSS over OrdSS.r. including (a) piecewiseconstant and (b) piecewiselinear. Future works include the AdaSS properties w.1. together with the method of crossvalidation for parameter tuning. is plotted by the red solid line in Figure 2.
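The two-stage weighting recipe used above (fit an unweighted smoother, estimate the local noise level from the residuals, then re-fit with weights) can be sketched numerically. The sketch below is a minimal illustration, not the thesis implementation: it uses a discrete second-difference penalty as a stand-in for the OrdSS smoother, and the simulated data, moving-average window and penalty value are all illustrative assumptions.

```python
import numpy as np

def smooth(y, lam, w=None):
    """Discrete analogue of OrdSS: minimize sum_i w_i (y_i - f_i)^2 + lam * ||D2 f||^2."""
    n = len(y)
    W = np.diag(np.ones(n) if w is None else w)
    D = np.diff(np.eye(n), n=2, axis=0)        # second-difference operator, (n-2) x n
    return np.linalg.solve(W + lam * D.T @ D, W @ y)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
truth = np.sin(2 * np.pi * t)
sigma = 0.1 + 0.4 * t                          # nonstationary noise level sigma(t)
y = truth + sigma * rng.standard_normal(t.size)

# Stage 1: unweighted fit, then estimate w^{-1}(t_i) from local residual sums of squares.
f0 = smooth(y, lam=10.0)
local_var = np.convolve((y - f0) ** 2, np.ones(21) / 21, mode="same")
w = 1.0 / np.maximum(local_var, 1e-8)

# Stage 2: plug the smoothed weight estimates into the weighted (nonstationary) fit.
f1 = smooth(y, lam=10.0, w=w)
```

The weighted fit smooths heavily where the estimated noise is large and tracks the data where it is small, mimicking the nonstationary OrdSS behavior described above.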
2.6 Technical Proofs

Proof of Proposition 2.1: by minor modification of Wahba (1990, §1.2). Given the bounded integrable function ρ(t) > 0 on t ∈ [0, 1], associate H with the inner product

⟨f₁, g₁⟩_H = ∫₀¹ f₁^(m)(u) g₁^(m)(u) λ(u) du

and the induced square norm ‖f₁‖²_H = ∫₀¹ λ(u)[f₁^(m)(u)]² du. By Taylor's theorem with remainder, any function in W_m (2.3) can be expressed by

f(t) = Σ_{j=0}^{m−1} (tʲ/j!) f^(j)(0) + ∫₀¹ G_m(t, u) f^(m)(u) du,   (2.30)

where G_m(t, u) = (t − u)₊^{m−1}/(m − 1)! is the Green's function for solving the boundary value problem: if f₁^(m) = g with f₁ ∈ B_m = {f : f^(k)(0) = 0, k = 0, 1, …, m − 1}, then f₁(t) = ∫₀¹ G_m(t, u) g(u) du. The remainder functions f₁(t) = ∫₀¹ G_m(t, u) f^(m)(u) du lie in the subspace H = W_m ∩ B_m. By similar arguments to Wahba (1990, §1.2), one may verify that H is an RKHS with r.k.

K(t, s) = ∫₀¹ ρ(u)⁻¹ G_m(t, u) G_m(s, u) du,   (2.31)

where the reproducing property ⟨f₁(·), K(t, ·)⟩_H = f₁(t), ∀f₁ ∈ H, follows from

∂^m K(t, s)/∂t^m = ρ(t)⁻¹ G_m(s, t).   (2.32)

Then, combining (2.30)–(2.32) with the arguments of Wahba (1990, §1.3) proves the finite-dimensional expression (2.8) that minimizes the AdaSS objective functional (2.6). ∎

Proof of Proposition 2.2: The r.k. with piecewise ρ⁻¹(t) equals

K(t, s) = Σ_{k=1}^{κ_{t∧s}} ∫_{τ_{k−1}}^{t∧s∧τ_k} (a_k + b_k u) [(u − t)^{m−1}/(m − 1)!] [(u − s)^{m−1}/(m − 1)!] du.

For any τ_{k−1} ≤ t₁ ≤ t₂ < τ_k, using the successive integration by parts,

∫_{t₁}^{t₂} (a_k + b_k u) [(u − t)^{m−1}/(m − 1)!] [(u − s)^{m−1}/(m − 1)!] du
= ∫_{t₁}^{t₂} (a_k + b_k u) [(u − t)^{m−1}/(m − 1)!] d[(u − s)^m/m!]
= (a_k + b_k u) [(u − t)^{m−1}/(m − 1)!] [(u − s)^m/m!] |_{t₁}^{t₂} − ∫_{t₁}^{t₂} (a_k + b_k u) [(u − t)^{m−2}/(m − 2)!] [(u − s)^m/m!] du − b_k ∫_{t₁}^{t₂} [(u − t)^{m−1}/(m − 1)!] [(u − s)^m/m!] du = ⋯,

it is straightforward (but tedious) to derive that

∫_{t₁}^{t₂} (a_k + b_k u) [(u − t)^{m−1}/(m − 1)!] [(u − s)^{m−1}/(m − 1)!] du
= Σ_{j=0}^{m−1} (−1)^j (a_k + b_k u) [(u − t)^{m−1−j} (u − s)^{m+j} / ((m − 1 − j)!(m + j)!)] |_{t₁}^{t₂} − b_k Σ_{j=0}^{m−1} (−1)^j T_j,

in which T_j denotes the integral evaluations for j = 0, 1, …, m − 1:

T_j ≡ ∫_{t₁}^{t₂} (u − t)^{m−1−j} (u − s)^{m+j} / ((m − 1 − j)!(m + j)!) du
= Σ_{l=0}^{m−1−j} (−1)^l [(u − t)^{m−1−j−l} (u − s)^{m+1+j+l} / ((m − 1 − j − l)!(m + 1 + j + l)!)] |_{t₁}^{t₂}.

Choosing t₁ = τ_{k−1}, t₂ = t∧s∧τ_k for k = 1, …, κ_{t∧s} and adding them all gives the final r.k. evaluation

K(t, s) = Σ_{k=1}^{κ_{t∧s}} [ Σ_{j=0}^{m−1} (−1)^j { (a_k + b_k u) (u − t)^{m−1−j} (u − s)^{m+j} / ((m − 1 − j)!(m + j)!) − b_k Σ_{l=0}^{m−1−j} (−1)^l (u − t)^{m−1−j−l} (u − s)^{m+1+j+l} / ((m − 1 − j − l)!(m + 1 + j + l)!) } ] |_{τ_{k−1}}^{t∧s∧τ_k}. ∎
CHAPTER III

Vintage Data Analysis

3.1 Introduction

The vintage data represents a special class of functional, longitudinal or panel data with dual-time characteristics. It shares the cross-sectional time series feature such that there are J subjects each observed at calendar time points {tjl, l = 1, …, Lj}, where Lj is the total number of observations on the jth subject. Unlike a purely longitudinal setup, each subject j we consider corresponds to a vintage originated at time vj such that v1 < v2 < … < vJ, and the elapsed time mjl = tjl − vj measures the lifetime or age of the subject. To this end, we denote the vintage data by

y(mjl, tjl, vj),  j = 1, …, J,  l = 1, …, Lj,   (3.1)

where the dual-time points are usually sampled from the regular grids such that for each vintage j, tjl = vj, vj + 1, vj + 2, … and mjl = 0, 1, 2, …. We call the grids of such (mjl, tjl, vj) a vintage diagram. Also observed on the vintage diagram could be the covariate vectors x(mjl, tjl, vj) ∈ Rp. In situations where more than one segment of dual-time observations are available, we may denote the (static) segmentation variables by zk ∈ Rq, k = 1, …, K, and denote the vintage data by {y(mkjl, tkjl, vkj), x(mkjl, tkjl, vkj)} for segment k = 1, …, K. See Figure 1.3 for an illustration of the vintage diagram and the dual-time collection of observations.
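The dual-time sampling scheme just described can be illustrated with a small script. The yearly origination and truncation values below are hypothetical, chosen only to draw a triangular vintage diagram and to exhibit the hyperplane relation t = v + m that underlies the identification issues discussed later in this chapter.

```python
import numpy as np

# Build a triangular vintage diagram: vintages v = 0..5 originate yearly and are
# observed at ages m = 0, 1, ... until right truncation at calendar time T = 5.
T, vintages = 5, range(6)
rows = [(m, v + m, v) for v in vintages for m in range(T - v + 1)]  # (m, t, v), t = v + m
M, Tt, V = np.array(rows, dtype=float).T

# Every point lies on the hyperplane v = t - m ...
assert np.allclose(V, Tt - M)

# ... so the three linear trends are collinear: rank([m, t, v]) = 2, not 3.
X = np.column_stack([M, Tt, V])
print(np.linalg.matrix_rank(X))   # 2
```

The rank deficiency shown here is exactly why the linear parts of the age, time and vintage effects cannot all be identified on a vintage diagram.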
On a vintage diagram, each vintage series evolves in the 45° direction such that, for the same vintage vj, the calendar time tjl and the lifetime mjl move at the same speed along the diagonal; therefore, the vintage diagram corresponds to the hyperplane {(m, t, v) : v = t − m} in the 3D space. The vintage diagram could be truncated in either age or time, and several interesting prototypes are illustrated in Figure 3.1. The triangular prototype on the top-left panel is the most typical, as the time series data are often subject to the systematic truncation from right (say, observations up to today). In practice, there exist other prototypes of vintage data truncation. The corporate default rates released by Moody's, shown in Table 1.1, are truncated in lifetime to be 20 years maximum. The U.S. Treasury rates released for multiple maturity dates serve as an example of the parallelogram prototype. The other trapezoidal prototype corresponds to the time-snapshot selection common to credit risk modeling exercises. The rectangular diagram is obtained after truncation in both lifetime and calendar time.

The age, time and vintage dimensions are all of potential interest in financial risk management. The empirical evidences tell that the age effects of self-maturation nature are often subject to smoothness in variation. The time effects are often dynamic and volatile due to the macroeconomic environment. Meanwhile, the vintage effects are conceived as the forward-looking measure of performance since origination, and they could be treated as random effects upon appropriate structural assumption. Bearing in mind these practical considerations, as well as the hierarchical structure of multi-segment credit portfolios, we aim to develop a formal statistical framework for vintage data analysis (VDA), by integrating ideas from functional data analysis (FDA), longitudinal data analysis (LDA)¹ and time series analysis (TSA). To achieve this, we will take a Gaussian process modeling approach, which is flexible

¹Despite the overlap between FDA and LDA, we try to differentiate them methodologically by tagging FDA with nonparametric smoothing and tagging LDA with random-effect modeling.
Figure 3.1: Vintage diagram upon truncation and exemplified prototypes.
enough to cover a diversity of scenarios. This chapter develops the VDA framework for continuous type of responses (usually, rates). It is organized as follows. In Section 3.2, we discuss the maturation-exogenous-vintage (MEV) decomposition in general. The MEV models based on Gaussian processes are presented in Section 3.3, where we discuss Kriging, smoothing spline and kernel methods; an efficient backfitting algorithm is provided for model estimation. Section 3.4 covers the semiparametric regression given the dual-time covariates and segmentation variables. Computational results are presented in Section 3.5, including a simulation study for assessing the MEV decomposition technique, and real applications in both corporate and retail credit risk cases. We conclude the chapter in Section 3.6 and give technical proofs in the last section.

3.2 MEV Decomposition Framework

Let V = {vj : 0 ≡ v1 < v2 < … < vJ < ∞} be a discrete set of origination times of J vintages, and define the dual-time domain

Ω = {(m, t; v) : m ≥ 0, t = v + m, v ∈ V},

where m denotes the lifetime and t denotes the calendar time. Unlike a usual 2D (m, t) space, the dual-time domain Ω consists of the isolated 45° lines {(m, t) : t − m = vj, m ≥ 0} for each fixed vintage vj ∈ V. The triangular vintage diagram in Figure 3.1 is the discrete counterpart of Ω upon right truncation in calendar time. Let Ω be the support of the vintage data (3.1), for which we propose the MEV (maturation-exogenous-vintage) decomposition modeling framework

η(y(m, t; v)) = f(m) + g(t) + h(v) + ε,   (3.2)
where η is a prespecified transform function, ε is the random error, and the three MEV components have the following practical interpretations:

a) f(m) represents the maturation curve characterizing the endogenous growth;
b) g(t) represents the exogenous influence by macroeconomic conditions;
c) h(v) represents the vintage heterogeneity measuring the origination quality.

We begin with some general remarks about model identification. Suppose the maturation curve f(m) in (3.2) absorbs the intercept effect, and the constraints Eg = Eh = 0 are set as usual:

E ∫_{tmin}^{tmax} g(t)dt = E ∫_V h(v)dv = 0.   (3.3)

The question is, do these constraints suffice to ensure the model identifiability? It turns out that on the dual-time domain Ω all the MEV decomposition models are subject to linear dependency, as stated formally in the following lemma. For the time being, let f, g, h be open to flexible choices of parametric, nonparametric or stochastic process models; write f = f0 + f1 with the linear part f0 and the nonlinear part f1, and similarly write g = g0 + g1, h = h0 + h1.

Lemma 3.1 (Identification). For the MEV models (3.2) defined on the dual-time domain Ω, one of f0, g0, h0 is not identifiable.

The identification (or collinearity) problem in Lemma 3.1 is incurred by the hyperplane nature of the dual-time domain Ω in the 3D space: {(m, t, v) : v = t − m} (see Figure 3.1, top-right panel). We need to break up such linear dependency in order to make the MEV models estimable. A practical way is to perform binning in the age, time or vintage direction. For example, given the vintage data of triangular prototype, one may group the large-m region, the small-t region and the large-v region, as these regions have few observations.
A commonly used binning procedure for h(v) is to assume piecewise-constant vintage effects by grouping monthly vintages every quarter or every year. An alternative strategy for breaking up the linear dependency is to directly remove the trend of one of the f, g, h functions. To determine the slope γ requires extra knowledge about the trend of at least one of f, g, h; one may then retrieve h0(v) = γv by simultaneously adjusting f(m) → f(m) + γm and g(t) → g(t) − γt. The trend removal for h(v) is needed only when there is no such extra knowledge. By default, we remove the trend of h and let it measure only the nonlinear heterogeneity. The MEV decomposition with zero-trend vintage effects may be referred to as the ad hoc approach. Indeed, this strategy is simple to implement: since f, g, h all could be expanded by certain basis functions, one only needs to fix the linear bases for any two of f, g, h. A word of caution is in order in situations where there does exist an underlying trend h0(v): it would then be absorbed by f(m) and g(t).

In what follows we review several existing approaches that are related to MEV decomposition. The first example is a conventional age-period-cohort (APC) model in social sciences of demography and epidemiology. The second example is the generalized additive model (GAM) in nonparametric statistics, where the smoothing spline is used as the scatterplot smoother. The third example is an industry GAMEED approach that may reduce to a sequential type of MEV decomposition. The last example is another industrial DtD technique involving the interactions of MEV components. They all can be viewed as one or another type of vintage data analysis.

The APC Model. Suppose there are I distinct m-values and L distinct t-values in the vintage data. Setting f(m) = µ + Σ_{i=1}^I αi I(m = mi), g(t) = Σ_{l=1}^L βl I(t = tl) and h(v) =
. the ordinary least squares (OLS) cannot be used for ﬁtting y = Xθ + ε.J j=1 γj I(v = vj ) gives us the the APC (ageperiodcohort) model η(y(mi . vJ ) to satisfy mI − tL + vJ = 0 (after reshuﬄing the indices of v). This is a key observation by Kupper (1985) that generated longterm discussions and debates on the estimability of the APC model. (1973) and Mason and Wolﬁnger (2002). The identiﬁcation problem of the APC model (3. . . x1 . Without loss of generality.1. et al. see Mason. whose limiting case leads to a socalled intrinsic estimator θ = (XT X)+ XT y. . let us select the factorial levels (mI . Among others. tl . t = L l=1 tl /L and v = ¯ J j=1 vj /J. where y is the the vector consisting of η(y(mi . then one may reformulate the APC model as y = Xθ + ε. where A+ is the MoorePenrose generalized inverse of A satisfying that AA+ A = A. tL . . . xi is deﬁned by xi (m) = I(m = mi )−I(m = mI ) for i = 1.4) follows Lemma 3. xI−1 . . since the regression matrix is linearly dependent through I−1 L−1 J−1 (mi − i=1 (α) m)xi ¯ − l=1 (tl − ¯ (β) t )xl + j=1 ¯ ¯ ¯ (vj − v )xj = m − t + v . as usual. . (3. So. xL−1 . xJ−1 (α) (α) with dummy variables. . Assume i αi = l βl = j vj = 0. ¯ (γ) (3. Since the vintage data are always unbalanced 54 . . . Fu (2000) suggested to estimate the APC model by ridge regression. vj )) = µ + αi + βl + γj + ε. I − 1. For example. .4) where the vintage eﬀects {γj s} used to be called the cohort eﬀects. . x1 . . .5) where m = ¯ I i=1 ¯ mi /I. vj )) and (α) (α) (β) (β) (γ) (γ) X = 1. tl . x1 . The discrete APC model tends to massively identify the eﬀects at the grid level and often ends up with unstable estimates. .
in the m, t or v projection, there would be very limited observations for estimating certain factorial effects in (3.4). The APC model is formulated over-flexibly in the sense that it does not regularize the grid effects by neighbors. Fu (2008) suggested to use a spline model for h(v); however, such spline regularization would not bypass the identification problem unless it does not involve a linear basis.

The Additive Splines. If f, g, h are all assumed to be unspecified smooth functions, the MEV models would become the generalized additive models (GAM); see Hastie and Tibshirani (1990). For example, using the cubic smoothing spline as the scatterplot smoother, we have the additive splines model formulated by
arg min_{f,g,h} Σ_{j=1}^J Σ_{l=1}^{Lj} [η(yjl) − f(mjl) − g(tjl) − h(vj)]² + λf ∫ [f^(2)(m)]² dm + λg ∫ [g^(2)(t)]² dt + λh ∫ [h^(2)(v)]² dv,   (3.6)
where λf , λg , λh > 0 are the tuning parameters separately controlling the smoothness of each target function; see (2.2). Usually, the optimization (3.6) is solved by the backﬁtting algorithm that iterates
f̂ ← Sf [η(yjl) − ĝ(tjl) − ĥ(vj)]_{j,l}
ĝ ← Sg [η(yjl) − f̂(mjl) − ĥ(vj)]_{j,l}  (centered)   (3.7)
ĥ ← Sh [η(yjl) − f̂(mjl) − ĝ(tjl)]_{j,l}  (centered)
after initializing ĝ = ĥ = 0. See Chapter II about the formation of the smoothing matrices Sf, Sg, Sh, for which the corresponding smoothing parameters can be selected by the method of generalized cross-validation. However, the linear parts of f, g, h are not identifiable, by Lemma 3.1. To get around the identification problem, we may take
the ad hoc approach to remove the trend of h(v). The trend-removal can be achieved by appending to every iteration of (3.7):

ĥ(v) ← ĥ(v) − v (Σ_{j=1}^J vj²)⁻¹ Σ_{j=1}^J vj ĥ(vj).
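As a rough illustration (not the thesis implementation), the backfitting iteration (3.7) with the appended trend-removal step can be mimicked in a few lines, with bin averages standing in for the spline smoothers Sf, Sg, Sh. The simulated dual-time data, the number of iterations, and the per-observation form of the de-trending step are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
# Dual-time data on a triangular vintage diagram, y = f(m) + g(t) + h(v) + noise.
pts = np.array([(m, v + m, v) for v in range(8) for m in range(8 - v)], dtype=float)
m, t, v = pts.T
y = np.log1p(m) + 0.3 * np.sin(t) + 0.2 * (v % 2) + 0.05 * rng.standard_normal(len(m))

def bin_smoother(x, r):
    """Average the residuals r over distinct values of x (a crude scatterplot smoother)."""
    out = np.zeros_like(r)
    for val in np.unique(x):
        idx = x == val
        out[idx] = r[idx].mean()
    return out

f = np.zeros_like(y); g = np.zeros_like(y); h = np.zeros_like(y)
for _ in range(50):
    f = bin_smoother(m, y - g - h)
    g = bin_smoother(t, y - f - h); g -= g.mean()    # centered
    h = bin_smoother(v, y - f - g); h -= h.mean()    # centered
    h -= v * (v @ h) / (v @ v)                       # ad hoc trend removal of h(v)

fit = f + g + h
print(np.sqrt(np.mean((y - fit) ** 2)))
```

Because of the de-trending step, any linear-in-v trend is pushed into the age and time components, leaving h to capture only the nonlinear vintage heterogeneity.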
We provide more details on (3.6) to shed light on its further development in the next section. By Proposition 2.1 in Chapter II, each of f, g, h can be expressed by the basis expansion of the form (2.8). Taking into account the identiﬁability of both the intercept and the trend, we may express f, g, h by
f(m) = µ0 + µ1 m + Σ_{i=1}^I αi K(m, mi)
g(t) = µ2 t + Σ_{l=1}^L βl K(t, tl)   (3.8)
h(v) = Σ_{j=1}^J γj K(v, vj).
Denote µ = (µ0, µ1, µ2)ᵀ, α = (α1, …, αI)ᵀ, β = (β1, …, βL)ᵀ, and γ = (γ1, …, γJ)ᵀ. We may equivalently write (3.6) as the regularized least squares problem

min (1/n) ‖y − Bµ − Σf α − Σg β − Σh γ‖² + λf ‖α‖²_{Σ̃f} + λg ‖β‖²_{Σ̃g} + λh ‖γ‖²_{Σ̃h},   (3.9)

where ‖α‖²_{Σ̃f} = αᵀ Σ̃f α, etc.,
y is the vector of n = Σ_{j=1}^J Lj observations {η(yjl)}, and B, Σf, Σ̃f are formed by

B = [(1, mjl, tjl)]_{j,l} (n×3),  Σf = [K(mjl, mi)] (n×I),  Σ̃f = [K(mi, mi′)] (I×I),

and Σg, Σ̃g, Σh, Σ̃h are formed similarly.

The GAMEED Approach. The GAMEED refers to an industry approach, namely the generalized additive maturation and exogenous effects decomposition, adopted by Enterprise Quantitative Risk Management, Bank of America, N.A. (Sudjianto, et al., 2006). It has also been
documented as part of the patented work "Risk and Reward Assessment Mechanism" (USPTO: 20090063361). Essentially, it has two practical steps:

1. Estimate the maturation curve f̂(m) and the exogenous effect ĝ(t) from

η(y(m, t; v)) = f(m; α) + g(t; β) + ε,  Eg = 0,
where f(m; α) can either follow a spline model or take a parametric form supported by prior knowledge, and g(t; β) can be specified like in (3.4) upon necessary time binning while retaining the flexibility in modeling the macroeconomic influence. If necessary, one may further model ĝ(t; β) by a time-series model in order to capture the autocorrelation and possible jumps.

2. Estimate the vintage-specific sensitivities to f̂(m) and ĝ(t) by performing the regression fitting

η(y(mjl, tjl; vj)) = γj + γj^(f) f̂(mjl) + γj^(g) ĝ(tjl) + ε
for each vintage j (upon necessary bucketing). The first step of GAMEED is a reduced type of MEV decomposition without suffering the identification problem. The estimates of f(m) and g(t) can be viewed as the common factors to all vintages. In the second step, the vintage effects are measured through not only the main (intercept) effects, but also the interactions with f̂(m) and ĝ(t). Clearly, when the interaction sensitivities γj^(f) = γj^(g) = 1, the industry GAMEED approach becomes a sequential type of MEV decomposition.

The DtD Technique. The DtD refers to the dual-time dynamics, adopted by Strategic Analytics Inc., for consumer behavior modeling and delinquency forecasting. It was brought to the
public domain by Breeden (2007). Using our notations, the DtD model takes the form

η(y(mjl, tjl, vj)) = γj^(f) f(mjl) + γj^(g) g(tjl) + ε,   (3.10)

where η⁻¹(·) corresponds to the so-called superposition function in Breeden (2007). Breeden (2007) suggested to iteratively fit f(m), g(t) and γj^(f), γj^(g) in a simultaneous manner. Unlike the sequential GAMEED above, the DtD model involves the interactions between the vintage effects and (f(m), g(t)). Clearly, the identification problem could as well happen to the DtD technique, unless it imposes structural assumptions that could break up the trend dependency; to see this, one may use the Taylor expansion for each nonparametric function in (3.10), then apply Lemma 3.1 to the linear terms. Unfortunately, such identification problem was not addressed by Breeden (2007), and the stability of the iterative estimation algorithm might be an issue.

3.3 Gaussian Process Models

In the MEV decomposition framework (3.2), the maturation curve f(m), the exogenous influence g(t) and the vintage heterogeneity h(v) are postulated to capture the endogenous, exogenous and origination effects on the dual-time responses. In this section, we propose a general approach to MEV decomposition modeling based on Gaussian processes and provide an efficient algorithm for model estimation.

3.3.1 Covariance Kernels

Gaussian processes are rather flexible in modeling a function Z(x) through a mean function µ(·) and a covariance kernel K(·,·):

EZ(x) = µ(x),  E[Z(x) − µ(x)][Z(x′) − µ(x′)] = σ²K(x, x′),   (3.11)
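The role of the covariance kernel in (3.11) can be illustrated by assembling Σ = [K(xi, xj)] for a one-dimensional kernel and drawing a zero-mean Gaussian-process path. The sketch below anticipates the exponential and Matérn families introduced next; all parameter values, grid sizes and the jitter term are illustrative assumptions.

```python
import numpy as np

def k_exp(x, xp, phi=1.0, kappa=1.0):
    """Exponential family: exp(-|x - x'|^kappa / phi), 0 < kappa <= 2."""
    return np.exp(-np.abs(x - xp) ** kappa / phi)

def k_matern(x, xp, phi=1.0, half_order=1):
    """Matern kernel at half-integer shapes, in the plain-distance form of (3.13)."""
    d = np.abs(x - xp) / phi
    if half_order == 1:                       # kappa = 1/2: Ornstein-Uhlenbeck covariance
        return np.exp(-d)
    if half_order == 2:                       # kappa = 3/2
        return np.exp(-d) * (1 + d)
    return np.exp(-d) * (1 + d + d ** 2 / 3)  # kappa = 5/2

# The kappa = 1/2 Matern kernel coincides with the exponential kernel (kappa = 1):
x = np.linspace(0, 5, 100)
assert np.allclose(k_exp(x[:, None], x[None, :]),
                   k_matern(x[:, None], x[None, :], half_order=1))

# Assemble Sigma = [K(x_i, x_j)] and draw one Gaussian-process sample path.
Sigma = k_matern(x[:, None], x[None, :], phi=1.0, half_order=3)
rng = np.random.default_rng(0)
L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(x.size))   # small jitter for stability
z = L @ rng.standard_normal(x.size)
```

Larger half-integer orders give smoother sample paths, which is the practical meaning of the shape parameter κ.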
where σ² is the variance scale. For ease of discussion, we will write µ(x) separately and only consider the zero-mean Gaussian process Z(x). Then, any finite-dimensional collection of random variables Z = (Z(x1), …, Z(xn)) follows a multivariate normal distribution Z ∼ Nn(0, σ²Σ) with Σ = [K(xi, xj)]n×n. The covariance kernel K(·,·) plays a key role in constructing Gaussian processes. It is symmetric, K(x, x′) = K(x′, x), and positive definite: ∫∫ K(x, x′) f(x) f(x′) dx dx′ > 0 for any nonzero f ∈ L2. The kernel is said to be stationary if K(x, x′) depends only on x − x′, and it is further said to be isotropic or radial if K(x, x′) depends only on the distance ‖x − x′‖. For x ∈ R, a stationary kernel is also isotropic. Several popular classes of covariance kernels, for x ∈ Rd, are listed below:

Exponential family: K(x, x′) = exp(−‖x − x′‖^κ / φ),  0 < κ ≤ 2, φ > 0
Matérn family: K(x, x′) = (2^{1−κ}/Γ(κ)) (‖x − x′‖/φ)^κ Kκ(‖x − x′‖/φ),  κ, φ > 0   (3.12)
AdaSS r.k. family: K(x, x′) = ∫ φ⁻¹(u) Gκ(x, u) Gκ(x′, u) du,  φ(x) > 0.

In the Matérn family, Kκ(·) denotes the modified Bessel function of order κ; see Abramowitz and Stegun (1972). Both the exponential and Matérn families consist of stationary kernels, for which φ is the scale parameter and κ can be viewed as the shape parameter. There is a parsimonious approach to using stationary kernels K(·,·) to model certain nonstationary Gaussian processes such that Cov[Z(x), Z(x′)] = σ(x)σ(x′)K(x, x′), where the variance scale σ²(x) requires a separate model of either parametric or nonparametric type; Fan, Huang and Li (2007) used the local polynomial approach to smoothing σ²(x). More examples of covariance kernels can be referred to Stein (1999) and Rasmussen and Williams (2006).

In the Matérn family, often used are the half-integer orders κ = 1/2, 3/2, 5/2, and the corresponding Matérn covariance kernels take the following forms:

K(x, x′) = e^{−‖x−x′‖/φ},  κ = 1/2
K(x, x′) = e^{−‖x−x′‖/φ} (1 + ‖x−x′‖/φ),  κ = 3/2   (3.13)
K(x, x′) = e^{−‖x−x′‖/φ} (1 + ‖x−x′‖/φ + ‖x−x′‖²/(3φ²)),  κ = 5/2.

It is interesting that these half-integer Matérn kernels correspond to autocorrelated Gaussian processes of order κ + 1/2; see Stein (1999, p.31). In particular, the κ = 1/2 Matérn kernel, which is equivalent to the exponential kernel with κ = 1, is the covariance function of the Ornstein-Uhlenbeck (OU) process.

The AdaSS family of reproducing kernels (r.k.'s) is based on the Green's function Gκ(x, u) = (x − u)₊^{κ−1}/(κ − 1)! with order κ and local adaptation function φ(x) > 0. They are nonstationary covariance kernels, with the corresponding Gaussian processes discussed in Chapter 2, where we have derived the explicit r.k. expressions with piecewise φ⁻¹(x). In this chapter, we concentrate on a particular class of AdaSS kernels

K(x, x′) = Σ_{k=1}^{κ_{x∧x′}} φk (x∧x′∧τk − τ_{k−1}),  κ_{x∧x′} = min{k : τk > x∧x′},   (3.14)

with piecewise-constant adaptation φ⁻¹(x) = φk for x ∈ [τ_{k−1}, τk) and knots x_min ≡ τ0 < τ1 < ⋯ < τK ≡ x_max. By (2.29), the Gaussian process with kernel (3.14) generalizes the standard Wiener process with nonstationary volatility, which is useful for modeling the heterogeneous behavior of an MEV component.
3.3.2 Kriging, Spline and Kernel Methods

Using Gaussian processes to model the component functions in (3.2) would give us a class of Gaussian process MEV models. We consider

η(y(m, t, v)) = µ(m, t, v) + Zf(m) + Zg(t) + Zh(v) + ε,   (3.15)

where ε ∼ N(0, σ²) is the random error and Zf(m), Zg(t), Zh(v) are three zero-mean Gaussian processes with covariance kernels σf²Kf, σg²Kg, σh²Kh, respectively. To bypass the identification problem in Lemma 3.1, the overall mean function is postulated to be

µ(m, t, v) = µᵀb(m, t, v) ∈ H0, for µ ∈ Rd, with H0 ⊆ span{1, m, t} ∩ null{Zf(m), Zg(t), Zh(v)},   (3.16)

on the hyperplane of the vintage diagram. For example, when H0 = span{1, m, t}, then µ(m, t, v) = µ0 + µ1 m + µ2 t, among other choices. In other situations, the functional space H0 might shrink to be span{1} in order to satisfy (3.16); then µ(m, t, v) = µ0.

For simplicity, let us assume that Zf(m), Zg(t), Zh(v) are mutually independent for x ≡ (m, t, v) ∈ R³. Consider the sum of univariate Gaussian processes

Z(x) = Zf(m) + Zg(t) + Zh(v),   (3.17)

with mean zero and covariance Cov[Z(x), Z(x′)] = σ²K(x, x′), where

K(x, x′) = λf⁻¹ Kf(m, m′) + λg⁻¹ Kg(t, t′) + λh⁻¹ Kh(v, v′),   (3.18)

with λf = σ²/σf², λg = σ²/σg² and λh = σ²/σh². Then, given the vintage data (3.1) with n = Σ_{j=1}^J Lj observations, by the independence assumption we may write (3.15) in the vector-matrix form

y = Bµ + z,  Cov[z] = σ²(Σ + I),   (3.19)

where y = [η(yjl)]n×1, B = [bᵀjl]n×d, z = [Z(xjl) + ε]n×1 and Σ = [K(xjl, xj′l′)]n×n.
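A toy sketch of assembling the additive covariance (3.18) on a vintage diagram and recovering µ by generalized least squares under (3.19) is given below. The kernels, scale values, sample size and true coefficients are all made-up assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
pts = np.array([(m, v + m, v) for v in range(10) for m in range(10 - v)], dtype=float)
m, t, v = pts.T
n = len(m)

def k_exp(a, b, phi):
    return np.exp(-np.abs(a[:, None] - b[None, :]) / phi)

# Additive kernel (3.18): K = Kf/lam_f + Kg/lam_g + Kh/lam_h on the vintage diagram.
lam_f, lam_g, lam_h = 1.0, 2.0, 2.0
Sigma = k_exp(m, m, 2.0) / lam_f + k_exp(t, t, 2.0) / lam_g + k_exp(v, v, 2.0) / lam_h

# Simulate y = B mu + z with Cov[z] = sigma^2 (Sigma + I), then recover mu by GLS.
B = np.column_stack([np.ones(n), m, t])       # basis b(m, t, v) = (1, m, t)
mu_true = np.array([0.5, 0.1, -0.05])
sigma = 0.3
L = np.linalg.cholesky(Sigma + np.eye(n))
y = B @ mu_true + sigma * L @ rng.standard_normal(n)

C = np.linalg.inv(Sigma + np.eye(n))
mu_hat = np.linalg.solve(B.T @ C @ B, B.T @ C @ y)
```

Note that the basis omits v: including all of (1, m, t, v) would reproduce the rank deficiency of the vintage hyperplane.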
By either GLS (generalized least squares) or MLE (maximum likelihood estimation), µ is estimated to be

µ̂ = (Bᵀ(Σ + I)⁻¹B)⁻¹ Bᵀ(Σ + I)⁻¹ y,   (3.20)

where the invertibility of Σ + I is always guaranteed since Σ is positive semi-definite. Whenever (λ, θ) are fixed, one may estimate the variance parameter to be

σ̂² = (1/(n − p)) (y − Bµ̂)ᵀ(Σ + I)⁻¹(y − Bµ̂),   (3.21)

where p = rank(B) is plugged in in order to make σ̂² an unbiased estimator. In the Gaussian process models there are additional unknown parameters for defining the covariance kernel (3.18), namely (λ, θ) = (λf, λg, λh, θf, θg, θh), as well as the unknown variance parameter σ² (a.k.a. the nugget effect). The λf, λg, λh, as the ratios of σ² to the variance scales of Zf, Zg, Zh, are usually referred to as the smoothing parameters, as in the context of ridge regression. The θf, θg, θh denote the structural parameters, e.g. the scale parameter φ and shape parameter κ in the exponential or Matérn family of kernels, or the local adaptation φ(·) in the AdaSS kernel. By (3.19) and after dropping the constant term, the log-likelihood function is given by

ℓ(µ, λ, θ, σ²) = −(n/2) log(σ²) − (1/2) log|Σ + I| − (1/(2σ²)) (y − Bµ)ᵀ(Σ + I)⁻¹(y − Bµ),

in which Σ depends on (λ, θ). Given (λ, θ), the µ estimate is given by (3.20); the MLE of (λ, θ) admits no closed-form solution, so numerical procedures like the Newton-Raphson algorithm are needed to iteratively estimate (λ, θ), see Fang, Li and Sudjianto (2006, §5). Such a standard maximum likelihood estimation method is computationally intensive. In the next subsection, we will discuss how to iteratively estimate the parameters by an efficient backfitting algorithm.
MEV Kriging means the prediction based on the Gaussian process MEV model (3.15), where the naming of 'Kriging' follows the geostatistical convention; see Cressie (1993) in spatial statistics, and see also Fang, Li and Sudjianto (2006) for the use of Kriging in the context of computer experiments. Kriging can be equally treated by either the multivariate normal distribution theory or the Bayesian approach, since the conditional expectation of the multivariate normal distribution equals the Bayesian posterior mean. The former approach is taken here for making prediction from the noisy observations. Let us denote the prediction of η(ŷ(x)) at x = (m, t, v) by

ζ(x) = µᵀb(x) + E[Zf(m) + Zg(t) + Zh(v) + ε | z],

where E[Z | z] denotes the conditional expectation of Z given z = y − Bµ. By the property of the multivariate normal distribution,

ζ̂(x) = µ̂ᵀb(x) + ξnᵀ(x) (Σ + I)⁻¹ (y − Bµ̂),   (3.22)

where ξn(x) denotes the n×1 vector of K(x, xjl) evaluated between x and each sampling point. The Kriging (3.22) is known to be a best linear unbiased predictor; here it behaves as a smoother rather than an interpolator. It has also a smoothing spline reformulation that covers the additive cubic smoothing splines model (3.6) as a special case. Formally, we have the following proposition.

Proposition 3.2 (MEV Kriging as Spline Estimator). The MEV Kriging (3.22) can be formulated as the smoothing spline estimator

ζ̂ = arg min_{ζ∈H} Σ_{j=1}^J Σ_{l=1}^{Lj} [η(yjl) − ζ(xjl)]² + ‖ζ‖²_{H1},   (3.23)

where H = H0 ⊕ H1 with H0 ⊆ span{1, m, t}, and H1 is the reproducing kernel Hilbert space induced by (3.18) such that H0 ⊥ H1.
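A one-dimensional toy version of the Kriging smoother (3.22), here with a single Matérn component, a constant mean basis and illustrative parameter values, may look like this (it is a sketch of the mechanics, not the thesis code):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 5, 40))
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)

def kern(a, b, phi=1.0):
    d = np.abs(a[:, None] - b[None, :]) / phi
    return np.exp(-d) * (1 + d)             # Matern kappa = 3/2

lam = 0.05                                   # lam = sigma^2 / sigma_f^2
Sigma = kern(x, x) / lam
B = np.ones((x.size, 1))                     # constant mean basis b(x) = 1

# GLS mean, then the Kriging smoother: zeta = mu + xi^T (Sigma + I)^{-1} (y - B mu).
A = np.linalg.inv(Sigma + np.eye(x.size))
mu = np.linalg.solve(B.T @ A @ B, B.T @ A @ y)
xg = np.linspace(0, 5, 101)
xi = kern(xg, x) / lam                       # K(x, x_jl) between new points and the sample
zeta = mu + xi @ A @ (y - B @ mu)
```

Because of the nugget term I, the predictor smooths the noisy observations instead of interpolating them exactly.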
The spline solution to (3.23) can be written as a finite-dimensional expression

ζ(x) = µᵀb(x) + Σ_{j=1}^J Σ_{l=1}^{Lj} cjl K(x, xjl),   (3.24)

with coefficients determined by the regularized least squares

min ‖y − Bµ − Σc‖² + ‖c‖²_Σ.   (3.25)

Proposition 3.2 connects the kernel methods in machine learning that advocate the use of covariance kernels for mapping the original inputs to some high-dimensional feature space, or the RKHS; see e.g. Rasmussen and Williams (2006) for more details. Based on our previous discussions, such a kernelized feature space corresponds to the family of Gaussian processes. Furthermore, we have the following simplification.

Proposition 3.3 (Additive Separability). The MEV Kriging (3.22) can be expressed additively as

ζ(x) = µᵀb(x) + Σ_{i=1}^I αi Kf(m, mi) + Σ_{l=1}^L βl Kg(t, tl) + Σ_{j=1}^J γj Kh(v, vj),   (3.26)

based on the I distinct m-values, L distinct t-values and J distinct v-values in (3.1). The coefficients can be estimated by

min ‖y − Bµ − Σf α − Σg β − Σh γ‖² + λf ‖α‖²_{Σ̃f} + λg ‖β‖²_{Σ̃g} + λh ‖γ‖²_{Σ̃h},   (3.27)

where the matrix notations are the same as in (3.9) up to mild modifications.

Proposition 3.3 converts the MEV Kriging (3.22) to a kernelized additive model of the form (3.26). It breaks down the high-dimensional parameter estimation and makes possible the development of a backfitting algorithm to deal with three univariate Gaussian processes separately. Besides, it can easily deal with the collinearity problem among Zf(m), Zg(t), Zh(v) caused by the hyperplane dependency of the vintage diagram.

3.3.3 MEV Backfitting Algorithm

For given (λ, θ), the covariance kernels in (3.18) are defined explicitly, so one may find the optimal solution to (3.27) by setting the partial derivatives to zero and solving the linear system of algebraic equations. There are however complications for determining (λ, θ). In this section we propose a backfitting algorithm to minimize the regularized least squares criterion (3.27), together with a generalized cross-validation (GCV) procedure for selecting the smoothing parameters λf, λg, λh as well as the structural parameters θf, θg, θh used for defining the covariance kernels. Here, the GCV criterion follows our discussion in Chapter 2; see also Golub, Heath and Wahba (1979) in the context of ridge regression. Such a backfitting procedure is more efficient than the standard method of maximum likelihood estimation.

Denote the complete set of parameters by Ξ = {µ, α, β, γ, λf, λg, λh, θf, θg, θh, σ²}, where the λ's are smoothing parameters and the θ's are structural parameters for defining the covariance kernels. Given the vintage data (3.1) with the notations y, B in (3.19):

Step 1: Set the initial value of µ to be the ordinary least squares (OLS) estimate (BᵀB)⁻¹Bᵀy. Set the initial values of α, β, γ to be zero.

Step 2a: Given Ξ with (α, λf, θf) unknown, get the pseudo data ỹ = y − Bµ − Σg β − Σh γ. For each input (λf, θf), estimate α by ridge regression

α̂ = (Σfᵀ Σf + λf Σ̃f)⁻¹ Σfᵀ ỹ,   (3.28)

where Σf = [Kf(mjl, mi; θf)]n×I and Σ̃f = [Kf(mi, mi′; θf)]I×I. Determine the
best choice of $(\lambda_f, \theta_f)$ by minimizing the GCV criterion
$$\mathrm{GCV}(\lambda_f, \theta_f) = \frac{\frac{1}{n}\left\|\left(I - S(\lambda_f, \theta_f)\right)\tilde{y}\right\|^2}{\left[1 - \frac{1}{n}\,\mathrm{Trace}\, S(\lambda_f, \theta_f)\right]^2}, \qquad (3.29)$$
where $S(\lambda_f, \theta_f) = \Sigma_f\left(\Sigma_f^T\Sigma_f + \lambda_f\tilde{\Sigma}_f\right)^{-1}\Sigma_f^T$.

Step 2b: Run Step 2a with the unknown parameters replaced by $(\beta, \lambda_g, \theta_g)$ and $\tilde{y} = y - B\mu - \Sigma_f\alpha - \Sigma_h\gamma$. Form $\Sigma_g$ and $S(\lambda_g, \theta_g)$. Select $(\lambda_g, \theta_g)$ by GCV, then estimate $\beta$ by ridge regression.

Step 2c: Run Step 2a with the unknown parameters replaced by $(\gamma, \lambda_h, \theta_h)$ and $\tilde{y} = y - B\mu - \Sigma_f\alpha - \Sigma_g\beta$. (For the ad hoc approach, say, remove the trend of $\tilde{y}$ in the vintage direction.) Form $\Sigma_h$ and $S(\lambda_h, \theta_h)$. Select $(\lambda_h, \theta_h)$ by GCV. Estimate $\gamma$ by ridge regression.

Step 2d: Re-estimate $\mu$ by the GLS (3.21).

Step 3: Repeat steps 2a-2d until convergence, i.e. when the estimates of $\mu, \alpha, \beta, \gamma$ change less than a pre-specified tolerance. Obtain $\sigma^2$ by (3.20).

For future reference, the above iterative procedure is named the MEV Backfitting algorithm. It is efficient and can automatically take into account selection of both smoothing and structural parameters. After obtaining the parameter estimates, we may reconstruct the vintage data by $\hat{y} = B\mu + \Sigma_f\alpha + \Sigma_g\beta + \Sigma_h\gamma$.

The following are several remarks for practical implementation:

1. The exponential, Matérn and AdaSS r.k. families listed in (3.12), (3.13), (3.14) are of our primary interest here, whereas other types of kernels can be readily employed. In order to capture a diversity of important features, a combination of different covariance kernels can be included in the same MEV model; e.g., when the underlying function is heterogeneously smooth, involving jumps, one may choose the AdaSS kernel (3.14).
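The looping structure of Steps 1-3 can be sketched compactly. In this toy implementation (assumptions: squared exponential kernels throughout, small $\lambda$ and $\phi$ grids, and synthetic data in place of (3.1)), each marginal component is refit by a GCV-tuned ridge smoother on the current partial residuals:

```python
import numpy as np

def sqexp(u, w, phi):
    return np.exp(-np.subtract.outer(u, w) ** 2 / phi ** 2)

def gcv(y_tilde, Sm):
    """GCV score (3.29): (1/n)||(I - S)y||^2 / (1 - tr(S)/n)^2."""
    n = len(y_tilde)
    r = y_tilde - Sm @ y_tilde
    return (r @ r / n) / (1.0 - np.trace(Sm) / n) ** 2

def fit_component(x, y_tilde, lams=(0.01, 0.1, 1.0), phis=(1.0, 2.0, 3.0)):
    """Grid-search (lambda, phi) by GCV; return the ridge fit at the best pair."""
    knots = np.unique(x)
    best = None
    for phi in phis:
        S, K = sqexp(x, knots, phi), sqexp(knots, knots, phi)
        for lam in lams:
            A = np.linalg.solve(S.T @ S + lam * K, S.T)   # ridge operator (3.28)
            Sm = S @ A                                    # smoother matrix
            score = gcv(y_tilde, Sm)
            if best is None or score < best[0]:
                best = (score, Sm @ y_tilde)
    return best[1]

# Backfitting over the three marginal directions (Steps 2a-2c)
rng = np.random.default_rng(1)
obs = [(t - v, t, v) for v in range(6) for t in range(v, 10)]
m, t, v = map(np.array, zip(*obs))
y = np.log1p(m) + 0.5 * np.sin(t) + 0.2 * (v % 3) + 0.05 * rng.standard_normal(len(m))

mu = y.mean()                                   # Step 1: intercept mean
fm = gt = hv = np.zeros_like(y)
for _ in range(20):                             # Step 3: iterate to convergence
    fm = fit_component(m, y - mu - gt - hv)     # Step 2a: maturation
    gt = fit_component(t, y - mu - fm - hv)     # Step 2b: exogenous
    hv = fit_component(v, y - mu - fm - gt)     # Step 2c: vintage
print("residual std:", np.std(y - mu - fm - gt - hv))
```

Because $m = t - v$ on the vintage diagram, linear trends are not separately identifiable across the three components; the toy loop is stabilized only by the ridge penalties, which is why the trend-removal remark in Step 2c matters in practice.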
2. The evaluation of the GCV criterion (3.29) can be modified accordingly when the mean function $\mu(m, t, v)$ is specified to be the intercept effect, i.e. $\mu(m, t, v) = \mu_0$. For simplicity, one may replace $\mu_0$ by the mean of $y$ in step 1 and step 2d. In principle, both $\hat{g}(t)$ and $\hat{h}(v)$ in step 2b and step 2c then have mean zero, but in practice they may need a centering adjustment due to machine rounding.

3. For the ridge regression (3.28), the matrix $\Sigma_f$ contains row-wise replicates and can be reduced by weighted least squares, in which case the response vector is reduced to the pointwise averages; see (2.4).

4. Upon convergence, we may write the MEV Kriging as $\zeta(x) = \hat{f}(m) + \hat{g}(t) + \hat{h}(v)$ with
$$\hat{f}(m) = \hat{\mu}_0 + \sum_{i=1}^{I}\hat{\alpha}_i K_f(m, m_i), \quad \hat{g}(t) = \sum_{l=1}^{L}\hat{\beta}_l K_g(t, t_l), \quad \hat{h}(v) = \sum_{j=1}^{J}\hat{\gamma}_j K_h(v, v_j).$$

3.4 Semiparametric Regression

Having discussed the MEV decomposition models based on Gaussian processes, we consider in this section the inclusion of covariates in the following two scenarios:

1. $x(m_{jl}, t_{jl}, v_j) \in \mathbb{R}^p$ (or $x_{jl}$ in short) for a fixed segment, where $x_{jl}$ denotes the covariates of the $j$th vintage at time $t_{jl}$ and age $m_{jl} = t_{jl} - v_j$;

2. $x(m_{kjl}, t_{kjl}, v_{kj}) \in \mathbb{R}^p$ and $z_k \in \mathbb{R}^q$ for multiple segments that share the same vintage diagram, where $z_k$ represents the static segmentation variables.

The covariates $x_{jl}$ or $x_{kjl}$ are assumed to be deterministic, i.e. no error in variables. Of our interest are the semiparametric MEV regression models with both nonparametric $f(m), g(t), h(v)$ and parametric covariate effects. After a brief discussion of the first situation with a single segment, we will spend more time on the second situation with multiple segments.
20).30) where xjl ≡ x(mjl .30) can be iteratively estimated by the modiﬁed backﬁtting algorithm. π) by (3.1 Single Segment Given a single segment with dualtime observations {y(mjl .3. Then. where B = [B. g. θh ). vj )} for j = 1. and f. θf . .30) can be approximated by the kernel basis expansion through (3. J and l = 1. That is. g.2). π) = BT Σ + I B BT Σ + I y. . .e.4. tjl . This GLS estimator can be incorporated with . . π) initially to be the OLS estimate.30) corresponds to the linear mixedeﬀect modeling. instead of starting from the fully nonparametric approach.X] the established MEV Backﬁtting algorithm discussed in Section 3.31) . vj ). we consider the partial linear model that adds the linear covariate eﬀects to the MEV decomposition (3. tjl . with X = [xjl ]n×p . . η(yjl ) = f (mjl ) + g(tjl ) + h(vj ) + π T xjl + ε = µT bjl + π T xjl + Zf (mjl ) + Zg (tjl ) + Zh (vj ) + ε (3.31) with zero Σ.31) for given (λf . tjl . −1 −1 −1 (µ. Only step 1 and step 2d need modiﬁed to be: Step 1: Set (µ. λh . i. . (3. Step 2d: reestimate (µ. we may assume the f.2 may be not satisﬁed between the covariate space span{x} and the Gaussian processes.26). λg . then run 68 . since the orthogonality condition in Proposition 3. x(mjl . (3. vj ). h are nonparametric and modeled by Gaussian processes. the whole set of parameters in (3. (µ. θg . In this case. There is a word of caution about multicollinearity. h components in (3. The semiparametric regression model (3.3. . . π) are the parameters for the mean function. conditional on Σ.3. Lj . and the covariate eﬀects π can be simply estimated in the same way we estimated µ in (3.
Fortunately, the smoothing parameters $\lambda_f, \lambda_g, \lambda_h$ may guard against possible multicollinearity and give a shrinkage estimate of the covariate effects $\pi$. As a trade-off, part of the Gaussian process components might be over-smoothed.

3.4.2 Multiple Segments

Suppose we are given $K$ segments of data that share the same vintage diagram,
$$y(m_{kjl}, t_{kjl}, v_{kj});\; x(m_{kjl}, t_{kjl}, v_{kj}),\; z_k, \qquad (3.32)$$
for $k = 1, \ldots, K$, $j = 1, \ldots, J$ and $l = 1, \ldots, L_j$, where $x_{kjl} \in \mathbb{R}^p$ and $z_k \in \mathbb{R}^q$. Denote by $N$ the total number of observations across the $K$ segments. There are various approaches to modeling cross-segment responses, e.g.
$$\eta(y_{kjl}) = f(m_{kjl}) + g(t_{kjl}) + h(v_{kj}) + \pi^T x_{kjl} + \omega^T z_k + \varepsilon,$$
$$\eta(y_{kjl}) = f_k(m_{kjl}) + g_k(t_{kjl}) + h_k(v_{kj}) + \pi_k^T x_{kjl} + \varepsilon. \qquad (3.33)$$
Clearly, the first approach assumes that the segments all share the same MEV components and covariate effect $\pi$, and differ only through $\omega^T z_k$. The second approach takes the other extreme assumption such that each segment performs marginally differently in terms of $f, g, h$ and covariate effects; it is therefore equivalent to $K$ separate single-segment models. Both of these approaches can be treated by the single-segment modeling technique discussed in the last section.

Of our interest of study is the third approach that allows multiple segments to have different sensitivities $u_f^{(k)}, u_g^{(k)}, u_h^{(k)} \in \mathbb{R}$ to the same underlying $f(m), g(t), h(v)$:
$$\eta(y_{kjl}) = u_f^{(k)} f(m_{kjl}) + u_g^{(k)} g(t_{kjl}) + u_h^{(k)} h(v_{kj}) + \pi_k^T x_{kjl} + W(z_k) + \varepsilon.$$
In other words, the functions $f(m), g(t), h(v)$ represent the common factors across segments, while the coefficients $u_f, u_g, u_h$ represent the idiosyncratic multipliers. Besides, it allows for the segment-specific add-on effects $W(z_k)$ and the segment-specific
covariate effects $\pi_k$. Among others, the model (c) is postulated as a balance between the over-stringent model (a) and the over-flexible model (b).

Different strategies can be applied to model the segment-specific effects $W(z_k)$. Here, we consider (c1) the linear modeling $W(z_k) = \omega^T z_k$ with parameter $\omega \in \mathbb{R}^q$, and (c2) the Gaussian process modeling with
$$\mathbb{E}W(z_k) = 0, \qquad \mathbb{E}W(z_k)W(z_{k'}) = \lambda_W^{-1}K_W(z_k, z_{k'}), \qquad (3.34)$$
for the smoothing parameter $\lambda_W > 0$ and some pre-specified family of covariance kernels $K_W$. For example, we may use the squared exponential covariance kernel in (3.12), or the exponential or Matérn kernels. (The smoothing spline kernels can also be used after appropriate modification of the mean function.) The estimation of these two types of segment effects is detailed as follows.

Linear Segment Effects. Given the multi-segment vintage data (3.32), consider
$$\eta(y_{kjl}) = u_f^{(k)} f(m_{kjl}) + u_g^{(k)} g(t_{kjl}) + u_h^{(k)} h(v_{kj}) + \pi_k^T x_{kjl} + \omega^T z_k + \varepsilon \qquad (3.35)$$
with the basis approximation
$$f(m) = \mu_0 + \sum_{i=1}^{I}\alpha_i K_f(m, m_i), \quad g(t) = \sum_{l=1}^{L}\beta_l K_g(t, t_l), \quad h(v) = \sum_{j=1}^{J}\gamma_j K_h(v, v_j)$$
through the exponential or Matérn kernels. Using the matrix notations, the model takes the form
$$y = \langle u_f, B\mu + \Sigma_f\alpha\rangle + \langle u_g, \Sigma_g\beta\rangle + \langle u_h, \Sigma_h\gamma\rangle + X\pi + Z\omega + \varepsilon,$$
where the underscored vectors and matrices are formed based on the multi-segment observations, and $(u_f, u_g, u_h)$ are the extended vectors of segment-wise multipliers. $X$ has
$K \times p$ columns corresponding to the vectorized coefficients $\pi = (\pi_1, \ldots, \pi_K)$, and $Z$ is as usual. When both the kernel parameters $(\lambda, \theta)$ for $K_f, K_g, K_h$ and the (condensed $K$-vector) multipliers $u_f, u_g, u_h$ are fixed, the regularized least squares can be used for parameter estimation, using the same penalty terms as in (3.27). Iterative procedures can be used to estimate all the parameters simultaneously, including $\{\mu, \alpha, \beta, \gamma, \lambda, \theta\}$ for the underlying $f, g, h$ functions and $u_f, u_g, u_h, \pi, \omega$ across segments. The following algorithm is constructed by utilizing the MEV Backfitting algorithm as an intermediate routine.

Step 1: Set the initial values of $(\pi, \omega)$ to be the OLS estimates for fitting $y = X\pi + Z\omega + \varepsilon$ based on all $N$ observations across $K$ segments.

Step 2: Run the MEV Backfitting algorithm for the pseudo data $y - X\pi - Z\omega$ to obtain the estimates of $f(m), g(t), h(v)$, assuming $u_k = \mathbb{E}u_k = 1$ for each of $f, g, h$.

Step 3: Update $(u_f, u_g, u_h)$ and $(\pi, \omega)$ by OLS estimates for fitting $y = Fu_f + Gu_g + Hu_h + X\pi + Z\omega + \varepsilon$, where $F, G, H$ are all $N \times K$ regression submatrices constructed based on $\hat{f}(m_{kjl})$, $\hat{g}(t_{kjl})$, $\hat{h}(v_{kj})$, respectively.

Step 4: Repeat step 2 and step 3 until convergence.

Note that in Step 2, the common factors $\hat{f}(m), \hat{g}(t), \hat{h}(v)$ are estimated first by assuming unit multipliers, while in Step 3 the segment-wise multipliers $\hat{u}_k$'s are updated by regressing over $\hat{f}, \hat{g}, \hat{h}$. Disregarding the segment variability in Step 2 is reasonable, since $f(m), g(t), h(v)$ are by definition the common factors in the overall sense; the concern of stability is another reason. Then, including the segment variability in Step 3 improves both the estimation of $(\pi, \omega)$ and the estimation of $f, g, h$ in the next iteration of Step 2.
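Step 3 is plain OLS on segment-wise copies of the fitted common factors. A toy sketch (the segment layout, effect values and the helper name `update_multipliers` are hypothetical, not from the dissertation):

```python
import numpy as np

def update_multipliers(f_hat, g_hat, h_hat, seg, X, Z, y):
    """Step 3 sketch: regress y on segment-wise copies of the common factors.
    F[i, k] = f_hat[i] if observation i belongs to segment k, else 0 (same
    for G, H), then solve y = F uf + G ug + H uh + X pi + Z omega by OLS."""
    n, K = len(y), seg.max() + 1
    F, G, H = (np.zeros((n, K)) for _ in range(3))
    rows = np.arange(n)
    F[rows, seg], G[rows, seg], H[rows, seg] = f_hat, g_hat, h_hat
    D = np.hstack([F, G, H, X, Z])
    theta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return theta[:K], theta[K:2 * K], theta[2 * K:3 * K], theta[3 * K:]

# Toy data: 3 segments with known multipliers on the f-component
rng = np.random.default_rng(3)
n, K = 300, 3
seg = rng.integers(0, K, n)
f_hat = np.sin(rng.uniform(0, 6, n))
g_hat = rng.standard_normal(n)
h_hat = rng.uniform(-1, 1, n)
X = rng.standard_normal((n, 2))
Z = np.eye(K)[seg]                       # segment dummies
uf_true = np.array([1.0, 1.5, 0.5])
y = (uf_true[seg] * f_hat + g_hat + h_hat
     + X @ np.array([0.3, -0.2]) + 0.01 * rng.standard_normal(n))

uf, ug, uh, rest = update_multipliers(f_hat, g_hat, h_hat, seg, X, Z, y)
print("uf:", uf)
```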
Such a fitting procedure is practically interesting and computationally tractable.

Spatial Segment Heterogeneity. We move on to consider the segment effect modeled by a Gaussian process,
$$\eta(y_{kjl}) = u_f^{(k)} f(m_{kjl}) + u_g^{(k)} g(t_{kjl}) + u_h^{(k)} h(v_{kj}) + W(z_k) + \pi_k^T x_{kjl} + \varepsilon, \qquad (3.36)$$
where $f, g, h$ are approximated in the same way as in (3.35) and $W(z_k)$ is assumed to follow (3.34). Let us directly approximate $W(z)$ by the basis expansion $W(z) = \sum_{k=1}^{K}\omega_k K_W(z, z_k)$, associated with the unknown scale parameter (denoted by $\theta_W$). Then, we have the following two iterative stages of model estimation.

Stage 1: For given $\pi_k$ and $u_k \equiv 1$, estimate the common-factor functions $f(m), g(t), h(v)$ and the spatial segment effect $W(z_k)$ in
$$\eta(y_{kjl}) - \pi_k^T x_{kjl} = \mu_0 + \sum_{i=1}^{I}\alpha_i K_f(m_{kjl}, m_i) + \sum_{i=1}^{L}\beta_i K_g(t_{kjl}, t_i) + \sum_{i=1}^{J}\gamma_i K_h(v_{kj}, v_i) + \sum_{i=1}^{K}\omega_i K_W(z_k, z_i) + \varepsilon$$
by the regularized least squares
$$\min\; \|\tilde{y} - B\mu - \Sigma_f\alpha - \Sigma_g\beta - \Sigma_h\gamma - \Sigma_W\omega\|^2 + \lambda_f\|\alpha\|_{\Sigma_f}^2 + \lambda_g\|\beta\|_{\Sigma_g}^2 + \lambda_h\|\gamma\|_{\Sigma_h}^2 + \lambda_W\|\omega\|_{\Sigma_W}^2, \qquad (3.37)$$
where $\tilde{y}$ is the pseudo response vector of $\eta(y_{kjl}) - \pi_k^T x_{kjl}$ for all $(k, j, l)$. The orthogonality constraint between $W(z)$ and $\{f(m), g(t), h(v)\}$ can be justified from a tensor-product space point of view, but the orthogonality is not guaranteed between $W(z_k)$ and the span of $x_{kjl}$. The regression submatrices $B, \Sigma_f, \Sigma_g, \Sigma_h, \Sigma_W$ are constructed accordingly, all having $N$ rows; $\Sigma_W = [K_W(z_k, z_{k'})]_{N\times N}$ is obtained by evaluating every pair of observations either within or between segments. The matrices used for regularization are of size $I\times I$, $L\times L$, $J\times J$ and $K\times K$, respectively. Both the smoothing parameters $\lambda_f, \lambda_g, \lambda_h, \lambda_W > 0$ and the structural parameters $\theta_f, \theta_g, \theta_h, \theta_W$ need to be determined.

Stage 2: For given $f, g, h$, estimate the parameters $(u_f, u_g, u_h)$ and $\pi$ by the generalized least squares based on
$$y = Fu_f + Gu_g + Hu_h + X\pi + w, \qquad \mathrm{Cov}[w] = \sigma^2\left(\lambda_W^{-1}\Sigma_W + I\right).$$

Iterate Stage 1 and Stage 2 until convergence. Let us call the extended algorithm MEVS Backfitting, where the added letter "S" means the segment effect. Obviously, the regularized least squares in Stage 1 extends the MEV Backfitting algorithm to entertain a fourth kernel component $K_W$. Upon convergence, we obtain the following list of effect estimates (as a summary):

1. Common-factor maturation curve: $\hat{f}(m) = \hat{\mu}_0 + \sum_{i=1}^{I}\hat{\alpha}_i K_f(m, m_i)$;
2. Common-factor exogenous influence: $\hat{g}(t) = \sum_{l=1}^{L}\hat{\beta}_l K_g(t, t_l)$;
3. Common-factor vintage heterogeneity: $\hat{h}(v) = \sum_{j=1}^{J}\hat{\gamma}_j K_h(v, v_j)$;
4. Idiosyncratic multipliers to $(\hat{f}, \hat{g}, \hat{h})$: $\hat{u}_f^{(k)}, \hat{u}_g^{(k)}, \hat{u}_h^{(k)}$ for $k = 1, \ldots, K$;
5. Spatial segment heterogeneity: $\hat{W}(z) = \sum_{k=1}^{K}\hat{\omega}_k K_W(z, z_k)$;
6. Segment-specific covariate effects: $\hat{\pi}_k$ for $k = 1, \ldots, K$.

3.5 Applications in Credit Risk Modeling

This section presents the applications of the MEV decomposition methodology to the real data of (a) Moody's speculative-grade corporate default rates shown in Table 1.1, and (b) the tweaked sample of retail loan loss rates discussed in Chapter II. Both examples demonstrate the rising challenges for the analysis of credit risk data with dual-time coordinates. Our focus is on the understanding (or reconstruction) of the marginal effects first, then on data smoothing on the dual-time domain, to which we make only an initial attempt based on the MEV Backfitting algorithm. To test the MEV decomposition modeling, we begin with a simulation study based on synthetic vintage data.

3.5.1 Simulation Study

Let us synthesize the year 2000 to 2008 monthly vintages according to the diagram shown in Figure 1.1, consisting of vintages $j = -60, \ldots, -1$ originated before 2005 and $j = 0, 1, \ldots, 47$ originated after 2005. Assume each vintage is originated at the very beginning of the month, and its follow-up performance is observed at each month end. Let the horizontal observation window be left truncated from the beginning of 2005 (time 0) and right truncated by the end of 2008 (time 48). Vertically, all the vintages are observed up to 60 months-on-book. It thus ends with the rectangular diagram shown in Figure 3.1.

The dual-time responses are simulated by
$$\log(y_{jl}) = f_0(m_{jl}) + g_0(t_{jl}) + h_0(v_j) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2), \qquad (3.38)$$
where $t_{jl}$ runs through $\max\{v_j, 0\}$ to 48 and $m_{jl} = t_{jl} - v_j$. Regardless of the noise, the response $y_{jl}$ is the product of three marginal effects, $\exp(f_0(m)) \cdot \exp(g_0(t)) \cdot \exp(h_0(v))$. In our simulation setting, the underlying truth $\exp(f_0(m))$ takes the smooth form of the inverse Gaussian hazard rate function (1.13) with parameters $c = 6$ and $b = -0.02$, and the underlying $\exp(g_0(t))$ is volatile with exponential growth and a jump at $t = 23$.
The underlying $h_0(v)$ is assumed to be a dominating sine wave within $[0, 40]$ and relatively flat elsewhere. Adding the noise $\varepsilon$, we obtain the synthetic vintage data, whose projection views in lifetime, calendar and vintage origination time are shown in the second panel of Figure 3.2. One may check the pointwise averages (or medians in the vintage boxplots) in each projection view, and compare them with the underlying truths in order to see the contamination effects by other marginals.

This synthetic data can be used to demonstrate the flexibility of the MEV decomposition technique in that a combination of different kernels can be employed. We manually choose the order-3/2 Matérn kernel (3.13) for $f(m)$, the AdaSS r.k. (3.14) for $g(t)$, and the squared exponential kernel (3.12) for $h(v)$. In fitting $g(t)$ with AdaSS, we have specified the knots $\tau_k$ to handle the jump by diagnostics from the raw marginal plots; for simplicity we have enforced $\phi_k$ to be the same for the regions without jump. Since our data has very few observations at both ends of the vintage direction, we performed binning for $v \geq 40$, as well as binning for $v \leq 0$ based on the prior knowledge that pre-2005 vintages can be treated as a single bucket. We then run the MEV Backfitting algorithm based on the log-transformed data.

The GCV-selected structural and smoothing parameters are tabulated in Table 3.1. Upon convergence, the MEV Backfitting algorithm outputs the estimated marginal effects plotted in Figure 3.2 (bottom panel); they match the underlying truth except for slight boundary bias. It is interesting to check the ordering of the cross-validated smoothing parameters. By (3.18), the smoothing parameters correspond to the reciprocal ratios of their variance scales to the nugget effect $\sigma^2$; the smaller the smoothing parameter is, the larger variation the corresponding Gaussian process holds. Therefore, the ordering of cross-validated $\lambda_f > \lambda_h > \lambda_g$ conforms with the smoothness assumption for $f_0, g_0, h_0$ in our simulation setting.

Combining the three marginal estimates and taking the inverse transform gives the fitted rates; see Figure 3.3 for the projection views.
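Looking back at (3.38), the synthetic data generation can be sketched in a few lines. The marginal functions `f0`, `g0`, `h0` below are simple hypothetical stand-ins with the stated qualitative features (smooth maturation, a jump at $t = 23$, a sine-type vintage effect), not the dissertation's exact specifications:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the true marginal effects in (3.38)
f0 = lambda m: -4.0 + 1.5 * np.log1p(m / 6.0)              # smooth maturation
g0 = lambda t: 0.02 * t + 0.1 * np.sin(t) + 0.5 * (t >= 23)  # growth + jump
h0 = lambda v: 0.3 * np.sin(np.clip(v, 0, 40) * np.pi / 20.0)  # sine in vintage

sigma, t_max, m_max = 0.2, 48, 60
records = []
for v in range(-60, 48):                    # monthly vintages, 2000-2008
    for t in range(max(v, 0), t_max + 1):   # left-truncated at time 0
        m = t - v                           # age on the vintage diagram
        if 0 < m <= m_max:                  # observed up to 60 months-on-book
            log_y = f0(m) + g0(t) + h0(v) + sigma * rng.standard_normal()
            records.append((v, t, m, np.exp(log_y)))

v, t, m, y = map(np.array, zip(*records))
print(len(records), "dual-time observations")
```

The rectangular observation window (left truncation at time 0, right truncation at time 48, and at most 60 months-on-book) is what produces the hyperplane constraint $m = t - v$ on every record.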
Figure 3.2: Synthetic vintage data analysis: (top) underlying true marginal effects; (2nd) simulation with noise; (3rd) MEV Backfitting algorithm upon convergence; (bottom) estimation compared to the underlying truth.
Table 3.1: Synthetic vintage data analysis: MEV modeling exercise with GCV-selected structural and smoothing parameters (kernel choices: $f(m)$, Matérn with $\kappa = 3/2$; $g(t)$, AdaSS r.k. (3.14) with knots $\tau_k$; $h(v)$, squared exponential).

Figure 3.3: Synthetic vintage data analysis: (top) projection views of fitted values; (bottom) data smoothing on vintage diagram.
The noise-removed reconstruction can reveal the prominent marginal features. It is clear that the MEV decomposition modeling performs like a smoother on the dual-time domain. One may also compare the level plots of the raw data versus the reconstructed data in Figure 3.3 to see the dual-time smoothing effect on the vintage diagram.

3.5.2 Corporate Default Rates

In Figure 1.4 of Chapter 1 we previewed the annual corporate default rates (in percentage) of Moody's speculative-grade cohorts, 1970-2008. Each yearly cohort is regarded as a vintage. For vintages originated earlier than 1988, the default rates are observed up to 20 years maximum in lifetime; for vintages originated after 1988, the default rates are observed up to the end of year 2008. Such truncation in both lifetime and calendar time corresponds to the trapezoidal prototype of vintage diagram illustrated in Figure 3.1.

We made slight data preprocessing. First, the transform function is pre-specified to be $\eta(y) = \log(y/100)$, where $y$ is the raw percentage measurement and zero responses are treated as missing values (one may also match them to a small positive value). Second, since there are a very limited number of observations for small calendar time, the exogenous effects $g(t)$ for $t = 2, 3$ are merged to $t = 4$, and the leftmost observation at $t = 1$ is removed because of its jumpy behavior. Besides, the vintage effects are merged for large $v = 30, \ldots, 39$.

Figure 3.4 gives the projection views of the empirical default rates in lifetime, calendar and vintage origination time. By checking the marginal averages or medians plotted on top of each view, it is evident that the exogenous influence by the macroeconomic environment is more fluctuating than both the maturation and vintage effects. However, such marginal plots are contaminated by each other; e.g., the many spikes shown in the lifetime projection actually correspond to the exogenous effects. It is our purpose to separate these marginal effects.
Figure 3.4: Results of the MEV Backfitting algorithm upon convergence (right panel) based on the squared exponential kernels. Shown in the left panel is the GCV selection of smoothing and structural parameters.

Figure 3.5: MEV fitted values: Moody's-rated corporate default rates.
To run the MEV Backfitting algorithm, $K_f, K_g, K_h$ are all chosen to be the squared exponential covariance kernels, for which the mean function is fixed to be the intercept effect. For each univariate Gaussian process fitting, say $Z_f(m)$, the GCV criterion selects the smoothing parameter $\lambda_f$ as well as the structural parameter $\theta_f$ (i.e. the scale parameter $\phi$ in (3.12)). For demonstration, we used the grid search on the log scale of $\lambda$ with only 3 different choices of $\phi = 1, 2, 3$ ($\phi = 3$ is set to be the largest scale for stability; otherwise the reciprocal condition number of the ridged regression matrix would vanish).

The backfitting results upon convergence are shown in Figure 3.4, where GCV selects the largest scale parameter in each iteration. An interesting result is the GCV-selected smoothing parameters $\lambda$ for $Z_f, Z_g, Z_h$: by (3.18), it is implied that the exogenous influence shares the largest variation, followed by the vintage heterogeneity, while the maturation curve has the least variation.

Figure 3.5 plots the fitted values in both lifetime and calendar views, where the inverse transform $\eta^{-1}(\cdot)$ maps the fitted values to the original percentage scale. Compared to the raw plots in Figure 1.4, the MEV modeling could retain some of the most interesting features while smoothing out others. Furthermore, the changes of the estimated exogenous effects in Figure 3.4 follow the NBER recession dates shown in Chapter 1. However, the raw responses behave rather abnormally at large months-on-book, partly because of the small size of replicates, and partly because of the contamination by the exogenous macroeconomic influence.

3.5.3 Retail Loan Loss Rates

Recall the tweaked credit-risk sample discussed in Chapter II. It was used as a motivating example for the development of the adaptive smoothing spline in fitting the pointwise averages that are supposed to smooth out in lifetime (month-on-book, in this case). In this section, the same tweaked sample
is analyzed on the dual-time domain. They measure the loss rates in retail revolving exposures from January 2000 to December 2006. The tweaked sample of dual-time observations is supported on the triangular vintage diagram illustrated in Figure 3.1, where the new vintages originated in 2006 are excluded since they have too short a window of performance measurements. Figure 3.6 (top panel) shows the projection views in lifetime, calendar and vintage origination time. Some of the interesting features of such vintage data are listed below:

1. from the lifetime view, the rates behave relatively smooth in $m$;
2. from the calendar view, the rates behave rather dynamic and volatile in $t$;
3. from the vintage view, the rates show heterogeneity across vintages;
4. the rates are fixed to be zero for small month-on-book $m \leq 4$;
5. there are less and less observations when the month-on-book $m$ increases;
6. there are less and less observations when the calendar month $t$ decreases.

To model the loss rates, we prespecified the log transform for the positive responses (upon removal of zero responses). The MEV decomposition model takes the form of
$$\log(y_{jl}) = f(m_{jl}) + g(t_{jl}) + h(v_j) + \varepsilon, \qquad \text{subject to } \mathbb{E}g = \mathbb{E}h = 0,$$
for which we model not only the maturation curve, but also the exogenous influence and the vintage heterogeneity effects. In this way, the original responses are decomposed multiplicatively to be $e^{f(m)}$, $e^{g(t)}$ and $e^{h(v)}$, where the exponentiated maturation curve could be regarded as the baseline, together with the exponentiated exogenous and vintage heterogeneity multipliers; this is mostly typical in retail risk management.

In this MEV modeling exercise, time-scale binding was performed first for large month-on-book $m \geq 72$ and small calendar time $t \leq 12$. The Matérn kernels were chosen to run the MEV Backfitting algorithm, and the GCV criterion was used to select the smoothing and structural parameters. The fitting results are presented in Figure 3.6 (bottom panel),
which shows the estimated $\hat{f}(m)$, $\hat{g}(t)$ and $\hat{h}(v)$ in the log response scale. Each of these estimated functions follows the general dynamic pattern of the corresponding marginal averages in the empirical plots above. The differences in local regions reflect the improvements in each marginal direction after removing the contamination of the other marginals. Compared to both the exogenous and vintage heterogeneity effects, the maturation curve is rather smooth. It is also worth pointing out that the exogenous spike could be captured relatively well by the Matérn kernel.

Figure 3.6: Vintage data analysis of retail loan loss rates: (top) projection views of empirical loss rates in lifetime $m$, calendar $t$ and vintage origination time $v$; (bottom) MEV decomposition effects $\hat{f}(m)$, $\hat{g}(t)$ and $\hat{h}(v)$ (at log scale).

3.6 Discussion

For the vintage data that emerge in credit risk management, we propose an MEV decomposition framework to estimate the maturation curve, exogenous influence and
vintage heterogeneity on the dual-time domain. To regularize the three-way marginal effects, we have studied nonparametric smoothing based on Gaussian processes, and demonstrated its flexibility through the choice of covariance kernels, structural and smoothing parameters. An efficient MEV Backfitting algorithm is provided for model estimation. The new technique is then tested through a simulation study and applied to analyze both examples of corporate default rates and retail loss rates. Beyond our initial attempt made in this chapter, there are other open problems associated with vintage data analysis (VDA) worthy of future investigation.

1. One difficulty associated with the vintage data is the intrinsic identification problem due to their hyperplane nature, which also appears in the conventional cohort analysis under the age-period-cohort model, for which some preliminary results are presented.

2. The dual-time model assessment and validation remains an issue. In the MEV Backfitting algorithm we have used the leave-one-out cross-validation for fitting each marginal component function. It is however not directly performing cross-validation on the dual-time domain, which would be an interesting subject of future study.

3. The model forecasting is often a practical need in financial risk management, as it is concerned with the future uncertainty. To make a prediction within the MEV modeling framework, one needs to extrapolate $f(m), g(t), h(v)$ in each marginal direction in order to know their behaviors out of the training bounds of $m, t, v$. By Kriging theory, for any $x = (m, t, v)$ on the vintage diagram, the formula (3.22) provides the conditional expectation given the historical performance. Furthermore, it is straightforward to simulate the random paths for a sequence of inputs $(x_1, \ldots, x_p)$ based on the multivariate conditional normal distribution $N_p(\mu^{(c)}, \Sigma^{(c)})$, where the conditional mean and covariance
are given by
$$\mu^{(c)} = B\mu + \Upsilon\left(\Sigma + I\right)^{-1}\left(y - B\mu\right), \qquad \Sigma^{(c)} = \tilde{\Sigma} - \Upsilon\left(\Sigma + I\right)^{-1}\Upsilon^T, \qquad (3.39)$$
with $\Upsilon = [K(x_i, x_{jl})]_{p\times n}$ and $\tilde{\Sigma} = [K(x_i, x_j)]_{p\times p}$, using the aggregated kernel given by (3.18) and the notations in (3.19).

For extrapolation of the MEV component functions, extrapolating the maturation curve $f(m)$ in lifetime is relatively safer than extrapolating the exogenous and vintage effects $g(t)$ and $h(v)$, since $f(m)$ is of self-maturation nature, while $g(t)$ can be interfered with by future macroeconomic changes and $h(v)$ is also subject to the future changes of vintage origination policy. However, there is a word of caution about making prediction of the future from the historical training data. Before making forecasts in terms of calendar time, the out-of-sample test is recommended.

In the MEV modeling framework, a parametric modeling approach is sometimes more tractable than using Gaussian processes. For example, if given the prior knowledge that the maturation curve $f(m)$ follows the shape of the inverse Gaussian hazard rate (1.13), like our synthetic case in Section 3.5, one may estimate the parametric $f(m)$ with a nonlinear regression technique. As for the exogenous effect $g(t)$ that is usually dynamic and volatile, one may use parametric time series models; see e.g. Fan and Yao (2003). Such a parametric approach would benefit the loss forecasting as the models become more and more validated through experience.

4. The MEV decomposition we have considered so far is simplified to exclude the interaction effects involving two or all of the age $m$, time $t$ and vintage $v$ arguments. In formulating the models by Gaussian processes, the mutual independence among $Z_f(m), Z_g(t), Z_h(v)$ is also postulated. One of the ideas for taking into account the interaction effects is to cross-correlate $Z_f(m), Z_g(t), Z_h(v)$.
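The conditional-normal machinery behind (3.39) is compact enough to sketch. The helper below is a generic Gaussian-process conditioning with unit nugget, shown on a hypothetical 1-d example rather than the vintage diagram; it computes the conditional mean and covariance and then simulates random paths:

```python
import numpy as np

def krige_predict(K_nn, K_pn, K_pp, B, Bp, mu_coef, y):
    """Conditional mean/covariance in the spirit of (3.39), unit nugget:
    mu_c = Bp mu + K_pn (K_nn + I)^{-1} (y - B mu),
    Sigma_c = K_pp - K_pn (K_nn + I)^{-1} K_pn'."""
    C = K_nn + np.eye(len(y))
    A = np.linalg.solve(C, y - B @ mu_coef)
    mu_c = Bp @ mu_coef + K_pn @ A
    Sigma_c = K_pp - K_pn @ np.linalg.solve(C, K_pn.T)
    return mu_c, Sigma_c

# Toy 1-d check with a squared exponential kernel
sqexp = lambda u, w: np.exp(-np.subtract.outer(u, w) ** 2)
x = np.linspace(0, 5, 30)
xp = np.array([2.5, 6.0])                 # one interior point, one extrapolation
rng = np.random.default_rng(4)
y = np.sin(x) + 0.1 * rng.standard_normal(30)
B, Bp = np.ones((30, 1)), np.ones((2, 1))

mu_c, Sig_c = krige_predict(sqexp(x, x), sqexp(xp, x), sqexp(xp, xp),
                            B, Bp, np.array([0.0]), y)
# Simulate random paths from the conditional normal N(mu_c, Sigma_c)
paths = rng.multivariate_normal(mu_c, Sig_c, size=5)
print(mu_c, np.diag(Sig_c))
```

Note how the conditional variance at the extrapolation point $x = 6$ exceeds that at the interior point, which is the quantitative version of the caution above about forecasting beyond the training bounds.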
Consider the combined Gaussian process (3.17) with covariance
$$\mathrm{Cov}\left[Z_f(m) + Z_g(t) + Z_h(v),\; Z_f(m') + Z_g(t') + Z_h(v')\right] = \mathbb{E}Z_f(m)Z_f(m') + \mathbb{E}Z_f(m)Z_g(t') + \mathbb{E}Z_f(m)Z_h(v') + \mathbb{E}Z_g(t)Z_f(m') + \mathbb{E}Z_g(t)Z_g(t') + \mathbb{E}Z_g(t)Z_h(v') + \mathbb{E}Z_h(v)Z_f(m') + \mathbb{E}Z_h(v)Z_g(t') + \mathbb{E}Z_h(v)Z_h(v'),$$
which reduces to (3.18) when the cross-correlations between $Z_f(m)$, $Z_g(t)$ and $Z_h(v)$ are all zero. Given the cross-correlation structure, it is straightforward to incorporate it and follow immediately the MEV Kriging procedure. For example, one of our works in process is to assume that
$$\mathbb{E}Z_g(t)Z_h(v) = \frac{\sigma^2}{\sqrt{\lambda_g\lambda_h}}K_{gh}(t, v), \qquad K_{gh}(t, v) = \rho\, I(t \geq v),\; \rho \geq 0,$$
in order to study the interaction effect between the vintage heterogeneity and the exogenous influence. Results will be presented in a future report.

5. Besides the open problems listed above, the scope of VDA can also be generalized to study non-continuous types of observations. For example, the Moody's default rates were actually calculated from some raw binary indicators (of default or not), while the loss rates in the synthetic retail data can be viewed as the ratio of loss units over the total units. In general, if we are given the raw binary observations or unit counts, one may consider the generalized MEV models upon the specification of a link function, as an analogy to the generalized linear models (McCullagh and Nelder, 1989). The transform $\eta$ in (3.2) plays a similar role as such a link function, except that it operates on the rates that are assumed to be directly observed.

One more extension of VDA is to analyze the time-to-default data on the vintage
diagram, which is deferred to the next chapter where we will develop the dual-time survival analysis.

3.7 Technical Proofs

Proof of Lemma 3.1: Since $m - t + v = 0$ on the vintage diagram $\Omega$, at least one of the linear parts of $f_0, g_0, h_0$ is not identifiable, since otherwise $f_0(m) - m$, $g_0(t) + t$, $h_0(v) - v$ would always satisfy (3.2).

Proof of Proposition 3.2: By applying the general spline theorem to (3.22), the target function has a finite-dimensional representation $\zeta(x) = \mu_0 + \mu_1 m + \mu_2 t + c^T\xi_n$, where $c^T\xi_n = \sum_{j=1}^{J}\sum_{l=1}^{L_j}c_{jl}K(x, x_{jl})$. Substituting it back into (3.22) and using the $\mathcal{H}_1$ reproducing property $\langle K(\cdot, x_{jl}), K(\cdot, x_{j'l'})\rangle = K(x_{jl}, x_{j'l'})$, we get
$$\min\; \|y - B\mu - \Sigma c\|^2 + \|c\|_{\Sigma}^2$$
using the notations in (3.19). Taking the partial derivatives w.r.t. $\mu$ and $c$ and setting them to zero gives the solution
$$\mu = \left(B^T(\Sigma + I)^{-1}B\right)^{-1}B^T(\Sigma + I)^{-1}y, \qquad c = \left(\Sigma + I\right)^{-1}(y - B\mu),$$
which leads $\zeta(x)$ to have the same form as the Kriging predictor.

Proof of Proposition 3.3: By (3.18) and comparing (3.24) and (3.26), it is not hard to derive the relationships between the two sets of kernel coefficients:
$$\alpha_i = \lambda_f^{-1}\sum_{j,l}c_{jl}I(m_{jl} = m_i), \qquad \beta_l = \lambda_g^{-1}\sum_{j,l'}c_{jl'}I(t_{jl'} = t_l), \qquad \gamma_j = \lambda_h^{-1}\sum_{l=1}^{L_j}c_{jl}, \qquad (3.40)$$
which aggregate $c_{jl}$ over the replicated kernel bases for every $m_i$, $t_l$ and $v_j$, respectively.
Then, using the new vectors of coefficients $\alpha = (\alpha_1, \ldots, \alpha_I)^T$, $\beta = (\beta_1, \ldots, \beta_L)^T$ and $\gamma = (\gamma_1, \ldots, \gamma_J)^T$, we have that
$$\Sigma c = \Sigma_f\alpha + \Sigma_g\beta + \Sigma_h\gamma, \qquad c^T\Sigma c = \lambda_f\alpha^T\tilde{\Sigma}_f\alpha + \lambda_g\beta^T\tilde{\Sigma}_g\beta + \lambda_h\gamma^T\tilde{\Sigma}_h\gamma.$$
Plugging them into (3.25) leads to (3.27), as asserted.
CHAPTER IV

Dual-time Survival Analysis

4.1 Introduction

Consider in this chapter the default events naturally observed on the dual-time domain: lifetime (or, age) in one dimension and calendar time in the other. Age could be the lifetime of a person in the common sense, and it could also refer to the elapsed time since initiation of a treatment. In credit risk modeling, age refers to the duration of a credit account since its origination (either a corporate bond or a retail loan), where the notion of hazard rate is also referred to as the default intensity in the credit risk context. Of our interest is the time-to-default from multiple origins, such that the events happening at the same calendar time may differ in age. Many accounts having the same origin share the same age at any follow-up calendar time, and group as a single vintage. As introduced in Chapter 1, dual-time observations of survival data subject to truncation and censoring can be graphically represented by the Lexis diagram (Keiding, 1990). Refer to Chapter I for the basics of survival analysis in the lifetime scale.

To begin with, let us simulate a Lexis diagram of dual-time-to-default observations:
$$(V_j, U_{ji}, M_{ji}, \Delta_{ji}), \qquad i = 1, \ldots, n_j,\; j = 1, \ldots, J, \qquad (4.1)$$
where Vj denotes the origination time of the jth vintage that consists of nj accounts, Uji denotes the time of left truncation, Mji denotes the lifetime of event termination, and ∆ji indicates the status of termination such that ∆ji = 1 if the default event occurs prior to getting censored and 0 otherwise.

Let us fix the rectangular vintage diagram with age m ∈ [0, 60], calendar time t ∈ [0, 48] and vintage origination Vj = −60, ..., −1, 0, 1, ..., 47, same as the synthetic vintage data in Chapter 3. So, the observations are left truncated at time 0. Denote by λv(m, t) or λj(m, t) the underlying dual-time hazard rate of vintage v = Vj at lifetime m and calendar time t. For the censoring mechanism, besides the systematic censoring with mmax = 60 and tmax = 48, an independent random censoring with the constant hazard rate λatt is used to represent the account attrition (i.e. a memoryless exponential lifetime distribution). For each vintage j, one may simulate the dual-time-to-default data (4.1) according to:

τji = inf{ m : ∫_0^m λj(s, Vj + s) ds ≥ −log X },  X ∼ Unif[0, 1]   (default)
Cji = inf{ m : λatt·m ≥ −log X },  X ∼ Unif[0, 1]   (attrition)
Mji = min{ τji, Cji, tmax − Vj }   (termination time)
∆ji = I( τji ≤ min{Cji, tmax − Vj} )   (default indicator)
Book (Vj, Uji, Mji, ∆ji) if Uji < min{τji, Cji}, with Uji ≡ max{0, −Vj}   (left truncation)
(4.2)

For each given Vj = −60, ..., −1, 0, 1, ..., 47, one may book nj number of origination accounts by repeatedly running (4.2) nj times independently. Let us begin with nj ≡ 1000 and specify the attrition rate λatt = 50bp (a basis point is 0.01%). Among other types of dual-time default intensities, assume the multiplicative form

λv(m, t) = λf(m)λg(t)λh(v) = exp{ f(m) + g(t) + h(v) }   (4.3)
where the maturation curve f(m), the exogenous influence g(t) and the unobserved heterogeneity h(v) are interpreted in the sense of Chapter III and they are assumed to follow Figure 3.2 (top panel). The endogenous (or maturation) performance in lifetime is relatively smooth. The exogenous performance in calendar time is relatively dynamic and volatile. For simulating the left-truncated survival accounts, we assume the zero exogenous influence on their historical hazards.

Figure 4.1 (left) is an illustration of the Lexis diagram for one random simulation per vintage, where each asterisk at the end of a line segment denotes a default event and the circle denotes the censorship. To see default behavior in dual-time coordinates, we calculated the empirical hazard rates by (4.14) in lifetime and (4.15) in calendar time, and the results are plotted in Figure 4.1 (right). Each marginal view of the hazard rates shows clearly the pattern matching our underlying assumption.

Figure 4.1: Dual-time-to-default: (left) Lexis diagram of sub-sampled simulations; (right) empirical hazard rates in either lifetime or calendar time.
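As a concrete illustration, the booking scheme (4.2) can be sketched in code. This is a minimal sketch, not the thesis implementation: the hazard integral is approximated by a crude left-endpoint Riemann sum on a grid of width `dm`, and all function and parameter names are illustrative.

```python
import math
import random

def simulate_vintage(Vj, nj, lam, lam_att=0.005, m_max=60.0, t_max=48.0,
                     dm=0.25, seed=None):
    """Book dual-time-to-default records (V, U, M, Delta) for one vintage,
    following the scheme (4.2); lam(m, t) is the dual-time hazard rate."""
    rng = random.Random(seed)
    records = []
    for _ in range(nj):
        # default time: smallest m with int_0^m lam(s, Vj+s) ds >= -log X
        e = -math.log(1.0 - rng.random())
        cum, m, tau = 0.0, 0.0, math.inf
        while m < m_max:
            cum += lam(m, Vj + m) * dm      # left-endpoint Riemann sum
            m += dm
            if cum >= e:
                tau = m
                break
        # attrition: exponential censoring with constant hazard lam_att
        c = -math.log(1.0 - rng.random()) / lam_att
        u = max(0.0, -Vj)                   # left truncation at calendar time 0
        m_term = min(tau, c, t_max - Vj)
        delta = 1 if tau <= min(c, t_max - Vj) else 0
        if u < min(tau, c):                 # book only accounts alive at truncation
            records.append((Vj, u, m_term, delta))
    return records
```

One call per origination Vj reproduces the vintage diagram; e.g. `simulate_vintage(-12, 1000, lambda m, t: 0.01)` books roughly the accounts of a pre-origin vintage that survive truncation at calendar time 0.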
Despite that the Lexis diagram has appeared since Lexis (1875), statistical literature of survival analysis has dominantly focused on the lifetime only rather than the dual-time scales, with very few exceptions that take either calendar time as the primary scale (e.g. Arjas (1986)) or both time scales (see the survey by Keiding (1990)). Mostly, lifetime has been taken for granted as the basic time scale of survival. There is also discussion on selection of the appropriate time scale, given the multiple dimensions that may include usage scales like mileage of a vehicle; see e.g. Farewell and Cox (1979) about a rotation technique, based on a naive Cox proportional hazards (CoxPH) model involving only the linearized time-covariate effect, in the context of multiple time-scale reduction. Such time-scale dimension-reduction approach can be also referred to Oakes (1995) and Duchesne and Lawless (2000).

The literature of regression models based on dual-time-to-default data (a.k.a. survival data with staggered entry) has also concentrated on the use of lifetime as the basic time scale together with time-dependent covariates; see e.g. Andersen, et al. (1993) with nine-to-one chapter coverage. Some weak convergence results can be referred to Sellke and Siegmund (1983) by martingale approximation and Billias, Gu and Ying (1997) by empirical process theory; see also the related works of Slud (1984) and Gu and Lai (1991) on two-sample tests. Mostly used is the CoxPH regression model with an arbitrary baseline hazard function in lifetime. However, such one-way baseline model specification is unable to capture the baseline hazard in calendar time. This is typically the case in credit risk modeling, where the default risk is greatly affected by macroeconomic conditions. It is our purpose to develop the dual-time survival analysis in not only lifetime, but also simultaneously in calendar time. Then, it comes to Efron (2002) with symmetric treatment of dual time scales under a two-way proportional hazards model

λji(m, t) = λf(m)λg(t) exp{ θ^T zji(m) },  i = 1, ..., nj,  j = 1, ..., J.   (4.4)
In credit risk, there seems to be no formal literature on dual-time-to-default risk modeling. It is our objective of this chapter to develop the dual-time survival analysis (DtSA) for credit risk modeling, based on our real data-analytic experiences in retail banking, including the methods of nonparametric estimation, structural parameterization, and intensity-based semiparametric regression. The aforementioned dual-time Cox model (4.4) will be one of such developments. Note that Efron (2002) considered only the special case of (4.4) with cubic polynomial expansion for log λf and log λg, whereas the Efron's approach with polynomial baselines will fail to capture the rather different behaviors of endogenous and exogenous hazards illustrated in Figure 4.1. In general, we may allow for arbitrary (non-negative) λf(m) and λg(t), hence name (4.4) to be a two-way Cox regression model, for which the method of partial likelihood estimation plays a key role.

This chapter is organized as follows. In Section 4.2 we start with the generic form of likelihood function based on the dual-time-to-default data (4.1), then consider the one-way, two-way and three-way nonparametric estimators on the Lexis diagram. The simulation data described above will be used to assess the performance of nonparametric estimators. In Section 4.3 we discuss the dual-time feasibility of structural models based on the first-passage-time triggering system, which is defined by an endogenous distance-to-default process associated with the exogenous time-transformation. Section 4.4 is devoted to the development of dual-time semiparametric Cox regression with both endogenous and exogenous baseline hazards and covariate effects, as well as the time-independent covariates, and frailty specification accounting for vintage heterogeneity effects. Also discussed is the frailty type of vintage heterogeneity by a random-effect formulation. In Section 4.5, we demonstrate the applications of both nonparametric and semiparametric dual-time survival analysis to credit card and mortgage risk modeling. We conclude the chapter in Section 4.6 and provide some supplementary materials at the end.
4.2 Nonparametric Methods

Denote by λji(m) ≡ λji(m, t), with t ≡ Vj + m, the hazard rates of lifetime-to-default τji on the Lexis diagram. Given the dual-time-to-default data (4.1) with independent left-truncation and right-censoring, consider the joint likelihood under either continuous or discrete setting:

∏_{j=1}^{J} ∏_{i=1}^{nj} [ pji(Mji) / Sji(Uji) ]^{∆ji} [ Sji(Mji) / Sji(Uji) ]^{1−∆ji}   (4.5)

where pji(m) denotes the density and Sji(m) denotes the survival function of τji. On the continuous domain, Sji(m) = e^{−Λji(m)} with the cumulative hazard Λji(m) = ∫_0^m λji(s) ds, then the log-likelihood function has the generic form

ℓ = Σ_{j=1}^{J} Σ_{i=1}^{nj} [ ∆ji log λji(Mji) + log Sji(Mji) − log Sji(Uji) ]
  = Σ_{j=1}^{J} Σ_{i=1}^{nj} [ ∆ji log λji(Mji) − ∫_{Uji}^{Mji} λji(m) dm ].   (4.6)

On the discrete domain of m, Sji(m) = ∏_{s<m} [1 − λji(s)], and the likelihood function (4.5) can be rewritten as

∏_{j=1}^{J} ∏_{i=1}^{nj} [λji(Mji)]^{∆ji} [1 − λji(Mji)]^{1−∆ji} ∏_{m∈(Uji, Mji)} [1 − λji(m)]   (4.7)

and the log-likelihood function is given by

ℓ = Σ_{j=1}^{J} Σ_{i=1}^{nj} { ∆ji log[ λji(Mji) / (1 − λji(Mji)) ] + Σ_{m∈(Uji, Mji]} log[1 − λji(m)] }.   (4.8)

In what follows we discuss one-way, two-way and three-way nonparametric approaches to the maximum likelihood estimation (MLE) of the underlying hazard rates.
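The discrete-domain log-likelihood (4.8) is simple to evaluate once a hazard function is supplied. A minimal sketch on an integer lifetime grid (the grid convention and names are illustrative assumptions):

```python
import math

def loglik_discrete(records, lam):
    """Discrete log-likelihood (4.8); `records` holds (U, M, Delta) triples on
    an integer lifetime grid, and lam(m) returns the hazard at m = 1, 2, ..."""
    ll = 0.0
    for (u, m, d) in records:
        if d == 1:
            ll += math.log(lam(m) / (1.0 - lam(m)))     # default contribution
        ll += sum(math.log(1.0 - lam(k))                # survival over (U, M]
                  for k in range(u + 1, m + 1))
    return ll
```

For instance, with one default of two accounts at risk by m = 2, a constant hazard p maximizes log p + 3 log(1 − p), i.e. the empirical rate p = 1/4.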
4.2.1 Empirical Hazards

Prior to discussion of one-way nonparametric estimation, let us review the classical approach to the empirical hazards vintage by vintage. For each fixed Vj, let λj(m) be the underlying hazard rate in lifetime. Given the data (4.1), define the

number of defaults:  neventj(m) = Σ_{i=1}^{nj} I(Mji = m, ∆ji = 1)
number of at-risk's:  nriskj(m) = Σ_{i=1}^{nj} I(Uji < m ≤ Mji)   (4.9)

for a finite number of lifetime points m with neventj(m) ≥ 1. Then, the jth-vintage contribution to the log-likelihood (4.8) can be rewritten as

ℓj = Σ_m { neventj(m) log λj(m) + [nriskj(m) − neventj(m)] log[1 − λj(m)] }   (4.10)

on the discrete domain of m. It is easy to show that the MLE of λj(m) is given by the following empirical hazard rates

λ̂j(m) = neventj(m) / nriskj(m), if neventj(m) ≥ 1, and NaN otherwise.   (4.11)

They can be used to express the well-known nonparametric estimators for cumulative hazard and survival function

(Nelson–Aalen estimator)   Λ̂j(m) = Σ_{mi≤m} λ̂j(mi)   (4.12)
(Kaplan–Meier estimator)   Ŝj(m) = ∏_{mi≤m−} [1 − λ̂j(mi)]   (4.13)

for any m ≥ 0 under either continuous or discrete setting, where the empirical hazard rates λ̂(mi) ≡ ∆Λ̂(mi) are also called the Nelson–Aalen increments. Clearly, the Kaplan–Meier estimator and the Nelson–Aalen estimator are related empirically by Ŝj(m) = ∏_{mi≤m−} [1 − ∆Λ̂(mi)].
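For a single vintage, the empirical hazards (4.11) and the estimators (4.12)–(4.13) can be computed directly from the (U, M, ∆) records; ties in M are allowed. A minimal sketch with illustrative names:

```python
from collections import Counter

def empirical_hazards(records):
    """Empirical hazards (4.11) with the Nelson-Aalen (4.12) and Kaplan-Meier
    (4.13) estimates for one vintage; `records` holds (U, M, Delta) triples."""
    nevent = Counter(m for (u, m, d) in records if d == 1)
    Lambda, S, out = 0.0, 1.0, []
    for m in sorted(nevent):                    # event times with nevent >= 1
        nrisk = sum(1 for (u, mm, d) in records if u < m <= mm)
        lam = nevent[m] / nrisk                 # empirical hazard rate (4.11)
        Lambda += lam                           # Nelson-Aalen increment
        out.append((m, lam, Lambda, S))         # S carries the left limit S(m-)
        S *= 1.0 - lam                          # Kaplan-Meier update
    return out
```

Each output row gives an event time with its hazard increment, the Nelson–Aalen cumulative hazard, and the Kaplan–Meier survival just before that time.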
However. we are ready to present the oneway nonparametric estimation based on the complete dualtimetodefault data (4. they may be biased in other cases we shall discuss next. then we may immediately extend (4. in the calendar time (horizontal of Lexis diagram) assume λji (m. Both oneway nonparametric estimates are plotted in the right panel of Figure 4. Alternatively.11) by simple aggregation to obtain the oneway estimator in lifetime: J j=1 neventj (m) . one may obtain the empirical cumulative hazard and survival function by the NelsonAalen estimator (4. (4.14) whenever the numerator is positive.12) and the KaplanMeier estimator (4.16) For identiﬁability.Now.13) in each marginal dimension. for all (j. i).1). i). t) = λ(m) for all (j. which may capture roughly the endogenous and exogenous hazards. t) = λf (m)λg (t) = exp{f (m) + g(t)}. then we may obtain the oneway nonparametric estimator in calendar time: J j=1 neventj (t − Vj ) .15) whenever the numerator is positive.1. assume λji (m.2 DtBreslow Estimator The twoway nonparametric approach is to estimate the lifetime hazards λf (m) and the calendar hazards λg (t) simultaneously under the multiplicative model: λji (m. 4. J j=1 nriskj (m) λ(m) = (4.1 (upon linear interpolation). i).2. For example. assume that E log λg = Eg = 0. J j=1 nriskj (t − Vj ) λ(t) = (4. In the lifetime dimension (vertical of Lexis diagram). 95 . consider the simulation data introduced in Section 4. t) = λ(t) for all (j. Accordingly.
Following the one-way empirical hazards (4.11) above, we give an iterative DtBreslow estimator based on Breslow (1972). Given the dual-time-to-default data (4.1), there exist Lm distinct lifetimes m[1] < ··· < m[Lm] and Lt distinct calendar times t[1] < ··· < t[Lt] such that there is at least one default event observed at the corresponding time. Rather than the crude optimization of (4.6) or (4.8), we may restrict to the pointwise parametrization

f̂(m) = Σ_{l=1}^{Lm} αl I(m = m[l]),  m = m[1], ..., m[Lm]
ĝ(t) = Σ_{l=1}^{Lt} βl I(t = t[l]),  t = t[1], ..., t[Lt].   (4.17)

One may find the MLE of the coefficients α, β in (4.17) by treating (4.16) as the product of a baseline hazard function and a relative-risk multiplier. Given any fixed λg = λ̂g, the Breslow estimator of λf(m) is given by

λ̂f(m) ← Σ_{j=1}^{J} neventj(m) / Σ_{j=1}^{J} λ̂g(Vj + m)·nriskj(m)   (4.18)

and similarly, given any fixed λf = λ̂f,

λ̂g(t) ← Σ_{j=1}^{J} neventj(t − Vj) / Σ_{j=1}^{J} λ̂f(t − Vj)·nriskj(t − Vj).   (4.19)

Both marginal Breslow estimators are known to be the nonparametric MLE of the baseline hazards conditional on the relative-risk terms. To take into the identifiability constraint E log λg = 0, we may update λ̂g(t) by

λ̂g(t) ← exp{ log λ̂g(t) − mean[log λ̂g] }.   (4.20)

By iterating (4.18) to (4.20) until convergence, we obtain the DtBreslow estimator. It could be viewed as a natural extension of the one-way nonparametric estimators
(4.14) and (4.15). For this rectangular diagram of survival data, it is usually appropriate to set the initial exogenous multipliers λ̂g ≡ 1; then the algorithm could converge in a few iterations.

We carry out a simulation study to assess the performance of the DtBreslow estimator. Using the simulation data by (4.2), the two-way estimations of λf(m) and λg(t) are shown in Figure 4.2 (top). Also plotted are the one-way nonparametric estimation and the underlying truth. In this case, the one-way and DtBreslow estimations have comparable performance. However, if we use the non-rectangular Lexis diagram (e.g. either pre-2005 vintages or post-2005 vintages), the DtBreslow estimator would demonstrate the significant improvement over the one-way estimation; see Figure 4.2 (middle and bottom). It is demonstrated that the one-way nonparametric methods suffer the potential bias in both cases. By comparing (4.18) to (4.14), it is clear that the lifting power of the DtBreslow estimator (e.g. in calibrating λf(m)) depends on not only the exogenous scaling factor λg(t), but also the vintage-specific weights nriskj(m).

4.2.3 MEV Decomposition

Lastly, we consider a three-way nonparametric procedure that is consistent with the MEV (maturation-exogenous-vintage) decomposition framework we have developed for vintage data analysis in Chapter III. For each vintage with accounts (Vj, Uji, Mji, ∆ji) for i = 1, ..., nj, assume the underlying hazards to be

λji(m, t) = λf(m)λg(t)λh(Vj) = exp{ f(m) + g(t) + h(Vj) }.   (4.21)

For identifiability, let Eg = Eh = 0. Such MEV hazards model can be viewed as the extension of the dual-time nonparametric model (4.16) with the vintage heterogeneity. Due to the hyperplane nature of the vintage diagram such that t ≡ Vj + m in (4.21), the linear trends of f, g, h are not identifiable; see Lemma 3.1. Among other
Figure 4.2: DtBreslow estimator vs one-way nonparametric estimator using the data of (top) both pre-2005 and post-2005 vintages; (middle) pre-2005 vintages only; (bottom) post-2005 vintages only.
approaches to break up the linear dependency, we consider

1. Binning in vintage, lifetime or calendar time for especially the marginal regions with sparse observations;

2. Removing the marginal trend in the vintage dimension h(·), which is an ad hoc approach when we have no prior knowledge about the trend of f, g, h.

Note that the second strategy assumes the zero trend in vintage heterogeneity. Later in Section 4.4 we discuss also a frailty type of vintage effect.

The nonparametric MLE for the MEV hazard model (4.21) follows the DtBreslow estimator discussed above. Specifically, we may estimate the three conditional marginal effects iteratively by:

a) Maturation effect for fixed λ̂g(t) and λ̂h(v):

λ̂f(m) ← Σ_{j=1}^{J} neventj(m) / Σ_{j=1}^{J} λ̂h(Vj)λ̂g(Vj + m)·nriskj(m)   (4.22)

for m = m[1], ..., m[Lm].

b) Exogenous effect for fixed λ̂f(m) and λ̂h(v):

λ̂g(t) ← Σ_{j=1}^{J} neventj(t − Vj) / Σ_{j=1}^{J} λ̂h(Vj)λ̂f(t − Vj)·nriskj(t − Vj)   (4.23)
λ̂g(·) ← exp{ ĝ − mean[ĝ] }, with ĝ = log λ̂g, for t = t[1], ..., t[Lt].

c) Vintage effect for fixed λ̂f(m) and λ̂g(t):

λ̂h(Vj) ← Σ_{i=1}^{nj} I(∆ji = 1) / Σ_{l=1}^{Lm} λ̂f(m[l])λ̂g(Vj + m[l])·nriskj(m[l])   (4.24)
λ̂h(·) ← exp{ ĥ − mean[ĥ] }, with ĥ = log λ̂h,
for j = 1, ..., J (upon necessary binning). In (4.24), one may also perform the trend-removal λ̂h(·) ← exp{ ĥ − mean⊕trend[ĥ] } if zero-trend can be assumed for the vintage heterogeneity. Setting the initial values λ̂g ≡ 1 and λ̂h ≡ 1, then iterating the above three steps until convergence would give us the nonparametric estimation of the three-way marginal effects.

Consider again the simulation data with both pre-2005 and post-2005 vintages observed on the rectangular Lexis diagram of Figure 4.1. Since there are limited sample sizes for very old and very new vintages, binning is performed in every half year; the vintages with Vj ≤ −36 are merged and the vintages with Vj ≥ 40 are cut off. Then, running the above MEV iterative procedure, we obtain the estimation of the maturation hazard λ̂f(m), the exogenous hazard λ̂g(t) and the vintage heterogeneity λ̂h(v), which are plotted in Figure 4.3 (top). They are consistent with the underlying true hazard functions we have pre-specified in (4.3). Besides, given the prior knowledge that pre-2005 vintages have relatively flat heterogeneity, we have also tested the MEV nonparametric estimation for the dual-time data with either pre-2005 vintages only (trapezoidal diagram) or the post-2005 vintages only (triangular diagram). The fitting results are also presented in Figure 4.3, which implies that our MEV decomposition is quite robust to the irregularity of the underlying Lexis diagram.

Remarks: before ending this section, we make several remarks about the nonparametric methods for dual-time survival analysis:

1. First of all, the dual-time-to-default data have essential difference from the usual bivariate lifetime data that are associated with either two separate failure modes per subject or a single mode for a pair of subjects; see Keiding (1990). For this reason, it is questionable whether the nonparametric methods developed for bivariate survival are applicable to the Lexis diagram. Our developments
Figure 4.3: MEV modeling of empirical hazard rates, based on the dual-time-to-default data of (top) both pre-2005 and post-2005 vintages; (middle) pre-2005 vintages only; (bottom) post-2005 vintages only.
of DtBreslow and MEV nonparametric estimators for default intensities correspond to the age-period or age-period-cohort model in cohort or vintage data analysis of loss rates; see (3.4) in the previous chapter. For either default intensities or loss rates on the Lexis diagram, one needs to be careful about the intrinsic identification problem, for which we have discussed the practical solutions in both of these chapters.

2. The nonparametric models (4.16) and (4.21) underlying the DtBreslow and MEV estimators are special cases of the dual-time Cox models in Section 4.4. In the finite-sample (or discrete) case, statistical properties of the DtBreslow estimates λ̂f(m) and λ̂g(t) can be studied by the standard likelihood theory for the finite-dimensional parameters α or β, see (4.17). However, on the continuous domain, the large-sample properties of DtBreslow estimators are worthy of future study when the dimensions of both α and β increase to infinity at the essentially same speed.

3. We have assumed in (4.17) the endogenous and exogenous hazard rates to be pointwise, i.e. Nelson–Aalen increments on discrete time points, see (3.12), or equivalently assumed the cumulative hazards to be step functions. Smoothing techniques can be applied to one-way, two-way or three-way nonparametric estimates, where the smoothness assumptions follow our discussion in the previous chapter. Consider e.g. the DtBreslow estimator with output of two-way empirical hazard rates λ̂f(m) and λ̂g(t). One may post-process them by kernel smoothing

λ̃f(m) = (1/bf) Σ_{l=1}^{Lm} Kf((m − m[l])/bf) λ̂f(m[l]),   λ̃g(t) = (1/bg) Σ_{l=1}^{Lt} Kg((t − t[l])/bg) λ̂g(t[l])

for m, t on the continuous domains, where Kf(·) and Kg(·) together with bandwidths bf and bg are the kernel functions that are bounded and vanish outside
[−1, 1]. The challenging issues in these smoothing methods are known to be the selection of bandwidths or smoothing parameters, as well as the correction of boundary effects; see Ramlau-Hansen (1983). Alternatively, one may use the penalized likelihood formulation via smoothing spline; see Gu (2002, §7). More details can be referred to Wang (2005) and references therein.

4. An alternative way of dual-time hazard smoothing can be based on the vintage data analysis (VDA) developed in the previous chapter. For large-sample observations, one may first calculate the empirical hazard rates λ̂j(m) per vintage by (4.11). In this case, we may write λ̂j(m) = λj(m)(1 + εj(m)) with Eεj(m) = 0 and

Eε²j(m) ≈ σ²j(m)/λ²j(m) = 1/neventj(m) − 1/nriskj(m) → 0, as nj → ∞,

on the discrete time domain. By Taylor expansion, log λ̂j(m) ≈ log λj(m) + εj(m), where the variance of λ̂j(m) can be estimated by a tie-corrected formula

σ̂²j(m) = [nriskj(m) − neventj(m)]·neventj(m) / nrisk³j(m)

and the λ̂j(m) are uncorrelated in m; see e.g. Aalen, et al. (2008, p.85). Then, use them as the pseudo-responses of the vintage data structure corresponding to the MEV model (3.15) with non-stationary noise, and perform the Gaussian process modeling discussed in Section 3.3.

5. Both the DtBreslow and MEV nonparametric methods may correct the bias of lifetime-only estimation, as demonstrated in Figure 4.2 and Figure 4.3 with either pre-2005 only or post-2005 only vintages. An important message delivered by Figure 4.2 (middle and bottom) is that the nonparametric estimation of λf(m) in conventional lifetime-only survival analysis might be misleading when there exists exogenous hazard λg(t) in the calendar dimension.
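Before moving on to structural models, the vintage-effect step (c) of the MEV procedure in Section 4.2.3 can be sketched as follows, including the mean-removal of log λh and, optionally, the ad hoc linear trend-removal (strategy 2). The data layout and names are illustrative; at least two distinct vintages are assumed when detrending.

```python
import math

def vintage_effects(vintages, lam_f, lam_g, detrend=False):
    """Vintage-effect step (4.24) for fixed maturation and exogenous hazards,
    with mean-removal of log lam_h and optional linear trend-removal."""
    lam_h = {}
    for V, rec in vintages.items():
        num = sum(d for (u, m, d) in rec)            # defaults in vintage j
        den = sum(lam_f[m] * lam_g.get(V + m, 1.0)
                  * sum(1 for (u, mm, d) in rec if u < m <= mm)
                  for m in lam_f)                    # sum over event lifetimes
        lam_h[V] = num / den
    h = {V: math.log(v) for V, v in lam_h.items()}
    mean_h = sum(h.values()) / len(h)
    h = {V: hv - mean_h for V, hv in h.items()}      # enforce E h = 0
    if detrend:                                      # remove linear trend in V
        vbar = sum(h) / len(h)
        slope = (sum((V - vbar) * hv for V, hv in h.items())
                 / sum((V - vbar) ** 2 for V in h))
        h = {V: hv - slope * (V - vbar) for V, hv in h.items()}
    return {V: math.exp(hv) for V, hv in h.items()}
```

Cycling this step with the two Breslow-type updates (4.22)–(4.23) until convergence gives the three-way MEV estimates.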
4.3 Structural Models

We have introduced in Chapter 1 the structural approach to credit risk modeling, which is popular in the sense that the default events can be triggered by a latent stochastic process when it hits the default boundary. For a quick review, one may refer to (1.6) for the latent distance-to-default process w.r.t. zero boundary, (1.9) for the notion of first-passage-time, and (1.10)–(1.13) for the concrete expression of hazard rate based on the inverse Gaussian lifetime distribution. In this section, we consider the extension of the structural approach to dual-time-to-default analysis.

4.3.1 First-passage-time Parameterization

Consider the Lexis diagram of dual-time default events for vintages j = 1, ..., J. Let Xj(m) be the endogenous distance-to-default (DD) process in lifetime and assume it follows a drifted Wiener process

Xj(m) = cj + bj m + Wj(m),  m ≥ 0   (4.25)

where Xj(0) ≡ cj ∈ R+ denotes the parameter of initial distance-to-default, bj ∈ R denotes the parameter of trend of credit deterioration and Wj(m) is the Wiener process. Without getting complicated, let us assume that Wj and Wj' are uncorrelated whenever j ≠ j'. Recall the hazard rate (or, default intensity) of the first-passage-time of Xj(m) crossing zero, see (1.13):

λj(m; cj, bj) = [ cj/√(2πm³) · exp{ −(cj + bj m)²/(2m) } ] / [ Φ((cj + bj m)/√m) − e^{−2bj cj} Φ((−cj + bj m)/√m) ]   (4.26)

whose numerator and denominator correspond to the density and survival functions, respectively. By (4.26), the MLE of (cj, bj) can be obtained by maximizing the
log-likelihood of the jth vintage observations

ℓj = Σ_{i=1}^{nj} [ ∆ji log λj(Mji; cj, bj) + log Sj(Mji; cj, bj) − log Sj(Uji; cj, bj) ]   (4.27)

where the maximization can be carried out by nonlinear optimization. We supplement the gradients of the log-likelihood function at the end of the chapter. In practice, let us focus on the bounded interval m ∈ [ε, mmax] for some ε > 0.

As their literal names indicate, both the initial distance-to-default cj and the trend of credit deterioration bj have interesting structural interpretations:

1. The c-parameter represents the initial value of the DD process. In theory, the hazard function (4.26) for m ∈ (0, ∞) is always first-increasing-then-decreasing, with maximum at m* such that the derivative λ'(m*) = 0. When c becomes smaller, a default event tends to happen sooner, hence moving the concentration of the defaults towards left. See Figure 1.2 for the illustration with fixed b and varying c values. When c gradually approaches zero, the hazard rate would be shaped from (a) monotonic increasing to (b) first-increasing-then-decreasing and finally to (c) monotonic decreasing.

2. The b-parameter represents the trend of the DD process, and it determines the level of stabilized hazard rate as m → ∞. With reference to Aalen and Gjessing (2001) or Aalen, et al. (2008, §10), let φm(x) be the conditional density of X(m) given that min_{0<s≤m} X(s) > 0, and it is said to be quasi-stationary if φm(x) is m-invariant. In Aalen, et al. (2008, §10), the quasi-stationary distribution is φ(x) = b²xe^{−bx} for x ≥ 0, and the limiting hazard rate is given by

lim_{m→∞} λ(m) = (1/2) ∂φ(x)/∂x |_{x=0} = b²/2,

which does not depend on the c-parameter. In Figure 1.2 with b = −0.02, all the hazard rate curves will converge to the constant 0.0002 after a long run.
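The inverse Gaussian hazard (4.26) and the vintage log-likelihood (4.27) are straightforward to code. A minimal sketch follows, with a crude grid search standing in for the nonlinear optimizer (the thesis supplements analytical gradients instead); for large m the survival denominator suffers numerical cancellation, so the log-survival is floored.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_dens(m, c, b):
    """Log inverse Gaussian first-passage density, the numerator of (4.26)."""
    return (math.log(c) - 0.5 * math.log(2.0 * math.pi * m ** 3)
            - (c + b * m) ** 2 / (2.0 * m))

def log_surv(m, c, b):
    """Log first-passage survival, the denominator of (4.26); S(0) = 1."""
    if m <= 0.0:
        return 0.0
    s = (Phi((c + b * m) / math.sqrt(m))
         - math.exp(-2.0 * b * c) * Phi((-c + b * m) / math.sqrt(m)))
    return math.log(max(s, 1e-300))         # floor against cancellation

def hazard(m, c, b):
    """First-passage-time hazard rate (4.26)."""
    return math.exp(log_dens(m, c, b) - log_surv(m, c, b))

def loglik(data, c, b):
    """Vintage log-likelihood (4.27); `data` holds (U, M, Delta) triples."""
    ll = 0.0
    for (u, m, d) in data:
        ll += (log_dens(m, c, b) if d else log_surv(m, c, b)) - log_surv(u, c, b)
    return ll

def fit_grid(data, c_grid, b_grid):
    """Crude grid search over (c, b), standing in for nonlinear optimization."""
    return max(((c, b) for c in c_grid for b in b_grid),
               key=lambda cb: loglik(data, *cb))
```

As a sanity check, at b = 0 the survival reduces to 2Φ(c/√m) − 1, and the hazard in m is first-increasing-then-decreasing as described in interpretation 1.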
The above first-passage-time parameterization is only on the endogenous or maturation hazard in lifetime. Next, we extend the first-passage-time structure taking into account the exogenous effect in calendar time. To achieve that, we model the dual-time DD process for each vintage Vj by a subordinated process

X*j(m) = Xj(Ψj(m)),  with Ψj(m) = ∫_0^m ψ(Vj + s) ds   (4.28)

where Xj(m) is the endogenous DD process (4.25) and Ψj(m) is an increasing time-transformation function such that all vintages share the same integrand ψ(t) > 0 in calendar time. We set the constraint E log ψ = 0 on the exogenous effect, in the sense of letting the 'average' DD process be captured endogenously. Such time-transformation approach can be viewed as a natural generalization of the log-location-scale transformation. It has been used to extend the accelerated failure time (AFT) models with time-varying covariates, see e.g. Lawless (2003, §6.4).

The exogenous function ψ(t) rescales the original lifetime increment ∆m to be ψ(Vj + m)∆m, and therefore transforms the diffusion dXj(m) = bj dm + dWj(m) to

dX*j(m) = bj ψ(Vj + m) dm + √ψ(Vj + m) dWj(m).   (4.29)

Alternatively, one may rewrite it as

dX*j(m) = √ψ(Vj + m) dX̃j(m) − bj [ √ψ(Vj + m) − ψ(Vj + m) ] dm,

where we introduce a new process X̃j(m) such that dX̃j(m) = bj dm + dWj(m), and introduce a time-varying default boundary

Dj(m) = bj ∫_0^m [ √ψ(Vj + s) − ψ(Vj + s) ] ds.   (4.30)
Then, by (4.28), it is straightforward to obtain the survival function of the first-passage-time τ*j by

S*j(m) = P( X*j(s) > 0 : s ∈ [0, m] ) = P( Xj(s) > 0 : s ∈ [0, Ψj(m)] ) = Sj(Ψj(m)) = Sj( ∫_0^m ψ(Vj + s) ds )   (4.31)

where the benchmark survival Sj(·) is given by the denominator of (4.26). From (4.30), it is clear that the time-to-default satisfies τ*j = inf{m ≥ 0 : X*j(m) ≤ 0} = inf{m ≥ 0 : X̃j(m) ≤ Dj(m)}. By (4.31), the hazard rate of τ*j is obtained by

λ*j(m) = −d log S*j(m)/dm = [−d log Sj(Ψj(m))/dΨj(m)]·[dΨj(m)/dm] = λj(Ψj(m))Ψ'j(m) = λj( ∫_0^m ψ(Vj + s) ds )·ψ(Vj + m)   (4.32)

with λj(·) given in (4.26). From (4.32), the exogenous function ψ(t) not only maps the benchmark hazard rate to the rescaled time Ψj(m), but also scales it by ψ(t). This is the same as the AFT regression model (1.16) with correspondence σi = 1 and ψ(·) = e^{−µi}.

Now consider the parameter estimation of both the vintage-specific structural parameters (cj, bj) and the exogenous time-scaling function ψ(t) from the dual-time-to-default data (4.1). By (4.6), the joint log-likelihood is given by ℓ = Σ_{j=1}^{J} ℓj, with ℓj given by (4.27) based on λ*j(m; cj, bj) and S*j(m; cj, bj). The complication lies in the arbitrary formation of ψ(t) involved in the time-transformation function. Nonparametric techniques can be used if extra properties like the smoothness can be imposed onto the function ψ(t). In our case the exogenous time-scaling effects are often dynamic and volatile, for which we recommend a piecewise estimation procedure based on the discretized Lexis diagram:

1. Discretize the Lexis diagram with ∆m = ∆t such that t = tmin + l∆t for l =
0, 1, 2, ..., L, where tmin is set to be the left boundary and tmin + L∆t corresponds to the right boundary. Specify ψ(t) to be piecewise constant

ψ(t) = ψl, for t − tmin ∈ ((l − 1)∆t, l∆t], l = 1, 2, ..., L, and ψ(t) ≡ 1 for t ≤ tmin.

Then, the time-transformation for the vintage with origination time Vj can be expressed as

Ψj(m) = Σ_{l=(Vj−tmin)/∆t}^{(Vj+m−tmin)/∆t} ψ(tmin + l∆t)∆t, for Vj + m ≥ tmin,

which reduces to Σ_{l=1}^{(Vj+m−tmin)/∆t} ψl∆t in the case of left truncation, and Ψj(Uji) = max{0, tmin − Vj} if tmin is the systematic left truncation in calendar time.

2. Run nonlinear optimization for estimating (cj, bj) and ψl iteratively:

(a) For fixed ψ and each fixed vintage j = 1, 2, ..., J, obtain the MLE (ĉj, b̂j) by maximizing (4.27) with Mji replaced by Ψj(Mji).

(b) For all fixed (ĉj, b̂j), obtain the MLE of (ψ1, ..., ψL) by maximizing Σ_{l=1}^{L} ℓ[l] subject to the constraint Σ_{l=1}^{L} log ψl = 0, where for each of l = 1, 2, ..., L, the corresponding likelihood contribution is

ℓ[l] = Σ_{(j,i)∈G[l]} { ∆ji [ log λj(Ψj(Mji)) + log ψl ] + log Sj(Ψj(Mji)) − log Sj(Ψj(Uji)) }

with G[l] = {(j, i) : Vj + Mji = tmin + l∆t}.

3. In Step 2(b), one may perform further binning for (ψ1, ..., ψL) for the purpose of reducing the dimension of parameter space in constrained optimization. One may also apply the multiresolution wavelet bases to represent (ψ1, ..., ψL) in a time-frequency perspective. Such ideas will be investigated in our future work.
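Step 1 above, the piecewise-constant time transformation, can be sketched as a direct computation of Ψj(m). The sketch assumes Vj − tmin is an integer multiple of the bin width ∆t, so that the calendar bins of ψ align with each vintage's lifetime grid; names are illustrative.

```python
def Psi(m, Vj, psi, t_min=0.0, dt=1.0):
    """Time-transformation Psi_j(m) = int_0^m psi(Vj + s) ds for piecewise-
    constant psi on calendar bins of width dt starting at t_min; psi(t) = 1
    for t <= t_min and beyond the last supplied bin."""
    total, s = 0.0, 0.0
    while s < m:
        step = min(dt, m - s)
        t = Vj + s                        # calendar time of this lifetime slice
        if t < t_min:
            total += step                 # psi = 1 before the left boundary
        else:
            l = int((t - t_min) // dt)    # bin index l = 0, 1, ..., L-1
            total += (psi[l] if l < len(psi) else 1.0) * step
        s += step
    return total
```

For instance, with bins ψ = (2.0, 0.5), a vintage originated at Vj = −1 accrues one unit of transformed time before tmin and then 2.0 per unit of lifetime in the first bin.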
4.3.2 Incorporation of Covariate Effects

It is straightforward to extend the above structural parameterization to incorporate the subject-specific covariates. For each account i with origination Vj, let z̃ji be the static covariates observed upon origination (including the intercept and otherwise centered predictors) and zji(m) for m ∈ (Uji, Mji] be the dynamic covariates observed since left truncation. Let us specify

cji = exp{ η^T z̃ji }   (4.33)
Ψji(m) = Uji − Vj + ∫_{Uji}^{m} exp{ θ^T zji(s) } ds,  m ≥ Uji   (4.34)

subject to the unknown parameters (η, θ). Then, the default occurs at τji = inf{m > 0 : X*ji(m) ≤ 0}, where the subject-specific DD process is supposed to be

X*ji(m) = Xji(Ψji(m)),  Xji(m) = cji + bj m + Wji(m)   (4.35)

with an independent Wiener process Wji(m), the initial value cji and the vintage-specific trend of credit deterioration bj. By the same arguments as in (4.31) and (4.32), we obtain the hazard rate λ*ji(m) = λji(Ψji(m))Ψ'ji(m), or

λ*ji(m) = [ cji/√(2πΨ³ji(m)) · exp{ −(cji + bjΨji(m))²/(2Ψji(m)) } ] / [ Φ((cji + bjΨji(m))/√Ψji(m)) − e^{−2bj cji} Φ((−cji + bjΨji(m))/√Ψji(m)) ] · Ψ'ji(m)   (4.36)

In the presence of calendar-time state variables x(t) such that zji(m) = x(Vj + m) for all (j, i), the time-transformation (4.34) becomes a parametric version of (4.28). Note that when the covariates in (4.34) are not time-varying, we may specify Ψji(m) = m exp{θ^T zji} and obtain the AFT regression model log τ*ji = θ^T zji + log τji with inverse Gaussian distributed τji.
Remarks:

1. The endogenous hazard rate (4.26) is based on the fixed effects of initial distance-to-default and trend of deterioration, while these structural parameters can be random effects due to the incomplete information available (Giesecke, 2006). For example, when the trend parameter b ∼ N(µ, σ²) and the initial value c is fixed, the endogenous hazard rate is given by p(m)/S(m) with

\[ p(m) = \frac{c}{\sqrt{2\pi(m^3+\sigma^2 m^4)}}\, \exp\Big\{ -\frac{(c+\mu m)^2}{2(m+\sigma^2 m^2)} \Big\}, \]

\[ S(m) = \Phi\Big( \frac{c+\mu m}{\sqrt{m+\sigma^2 m^2}} \Big) - e^{-2\mu c + 2\sigma^2 c^2}\, \Phi\Big( \frac{-c+\mu m - 2\sigma^2 c m}{\sqrt{m+\sigma^2 m^2}} \Big). \]

One may refer to Aalen, et al. (2008, §10) about the random-effect extension.

2. There is no unique way to incorporate the static covariates in the first-passage-time parameterization. We considered including them through the initial distance-to-default parameter (4.33), while they may also be incorporated into other structural parameters; see Lee and Whitmore (2006). However, there has been limited discussion on the incorporation of dynamic covariates in the first-passage-time approach, for which we make an attempt to use the time-transformation technique (4.34).

3. Given the dual-time-to-default data (4.1), the log-likelihood function follows the generic form (4.6). The parameters of interest include (η, θ) and bj for j = 1, 2, ..., J, and they can be estimated by MLE. Standard nonlinear optimization programs can be employed, e.g. the "optim" algorithm of R (stats library) and the "fminunc" algorithm of MATLAB (Optimization Toolbox).

In the first-passage-time parameterization with inverse Gaussian lifetime distribution, the log-likelihood functions (4.27) and (4.36) are rather messy, especially when we attempt to evaluate the derivatives of the log-likelihood function in order to run gradient search. Without gradients provided, one may rely on the crude nonlinear optimization algorithms; see the supplementary material. Our discussion in this section mainly illustrates the feasibility of the dual-time structural approach.

4.4 Dual-time Cox Regression

The semiparametric Cox models are popular for their effectiveness in estimating the covariate effects. Suppose that we are given the dual-time-to-default data (4.1) together with the covariates zji(m) observed for m ∈ (Uji, Mji], which may include the static covariates zji(m) ≡ z̃ji. Using the traditional CoxPH modeling approach, one may either adopt an arbitrary baseline in lifetime,

\[ \lambda_{ji}(m) = \lambda_0(m)\exp\{\theta^T z_{ji}(m)\}, \quad (4.37) \]

or adopt the stratified Cox model with arbitrary vintage-specific baselines,

\[ \lambda_{ji}(m) = \lambda_{j0}(m)\exp\{\theta^T z_{ji}(m)\} \quad (4.38) \]

for i = 1, ..., nj and j = 1, ..., J, where the parameter estimation follows the standard MLE procedure. However, these two Cox models may not be suitable for (4.1) observed on the Lexis diagram in the following sense:

1. (4.37) is too stringent to capture the variation of vintage-specific baselines;
2. (4.38) is too relaxed to capture the interrelation of vintage-specific baselines.

For example, we have simulated a Lexis diagram in Figure 4.1 whose vintage-specific hazards are interrelated not only in lifetime m but also in calendar time t.
4.4.1 Dual-time Cox Models

The dual-time Cox models considered in this section take the following multiplicative hazards form in general,

\[ \lambda_{ji}(m,t) = \lambda_0(m,t)\exp\{\theta^T z_{ji}(m)\}, \quad (4.39) \]

where λ0(m, t) is a bivariate base hazard function on the Lexis diagram. In order to balance between (4.37) and (4.38), we assume the MEV decomposition framework of the base hazard, λ0(m, t) = λf(m)λg(t)λh(t − v), and restrict ourselves to the dual-time Cox models with MEV baselines and relative-risk multipliers

\[ \lambda_{ji}(m,t) = \lambda_f(m)\lambda_g(t)\lambda_h(V_j)\exp\{\theta^T z_{ji}(m)\} = \exp\{f(m)+g(t)+h(V_j)+\theta^T z_{ji}(m)\}, \quad (4.40) \]

where λf(m) = exp{f(m)} represents the arbitrary maturation/endogenous baseline, while λg(t) = exp{g(t)} and λh(v) = exp{h(v)} represent the exogenous multiplier and the vintage heterogeneity subject to Eg = Eh = 0. Some necessary binning or trend-removal needs to be imposed on g, h to ensure the model estimability; see the corresponding identifiability lemma in Chapter 3. Upon the specification of λ0(m, t), the traditional models (4.37) and (4.38) can be included as special cases.

Given the finite-sample observations (4.1) on the Lexis diagram, one may always find Lm distinct lifetime points m[1] < · · · < m[Lm] and Lt distinct calendar time points t[1] < · · · < t[Lt] such that there exists at least one default event observed at the corresponding marginal time. Following the least-information assumption of Breslow (1972), let us treat the endogenous baseline hazard λf(m) and the exogenous baseline hazard λg(t) as pointwise functions of the form (4.17), subject to Σ_{l=1}^{Lt} βl = 0. For the discrete set of vintage originations, the vintage heterogeneity effects can be regarded
as the categorical covariate effects. We may either assume the pointwise function of the form h(v) = Σ_{j=1}^{J} γj I(v = Vj) subject to constraints on both mean and trend (i.e. the ad hoc approach), or assume the piecewise-constant functional form upon appropriate binning of the vintage originations,

\[ h(V_j) = \gamma_\kappa, \quad \text{if } j\in G_\kappa,\ \kappa = 1,\ldots,K, \quad (4.41) \]

where the vintage buckets Gκ = {j : Vj ∈ (νκ−1, νκ]} are prespecified based on Vj− ≡ ν0 < · · · < νK ≡ VJ (where K < J). Then, the semiparametric model (4.40) is parametrized with dummy variables to be

\[ \lambda_{ji}(m,t) = \exp\Big\{ \sum_{l=1}^{L_m}\alpha_l I(m=m_{[l]}) + \sum_{l=2}^{L_t}\beta_l I(t=t_{[l]}) + \sum_{\kappa=2}^{K}\gamma_\kappa I(j\in G_\kappa) + \theta^T z_{ji}(m) \Big\}, \quad (4.42) \]

where β1 ≡ 0 and γ1 ≡ 0 are excluded from the model due to the identifiability constraints. Plugging it into the joint likelihood (4.6), one may then find the MLE of (α, β, γ, θ). (Note that it is rather straightforward to convert the resulting parameter estimates α, β, γ to satisfy the zero-sum constraints for β and γ.)

As the sample size tends to infinity, both endogenous and exogenous baseline hazards λf(m) and λg(t) defined on the continuous dual-time scales (m, t) tend to be infinite-dimensional, hence making the above pointwise parameterization intractable; so is it for (4.45) to be studied. The semiparametric estimation problem in this case is challenging, essentially because the dual-time Cox models (4.40) involve two nonparametric components that need to be profiled out for the purpose of estimating the parametric effects. In what follows, we take an "intermediate approach" to semiparametric estimation via exogenous time-scale discretization, to which the theory of partial likelihood applies.
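The conversion noted parenthetically above, from corner-constrained estimates (β1 ≡ γ1 ≡ 0) to the zero-sum constraints, only shifts constants between the log-baselines; a minimal sketch (the helper name is our own), with the mean shifts absorbed into the endogenous levels α so that every fitted log-hazard is unchanged:

```python
def to_zero_sum(alpha, beta, gamma):
    # input uses corner constraints: beta[0] == gamma[0] == 0;
    # recenter beta and gamma to sum to zero and absorb their means
    # into alpha, leaving alpha_l + beta_l + gamma_k invariant per cell
    bbar = sum(beta) / len(beta)
    gbar = sum(gamma) / len(gamma)
    return ([a + bbar + gbar for a in alpha],
            [b - bbar for b in beta],
            [g - gbar for g in gamma])
```

This works because the hazard in (4.42) depends on (α, β, γ) only through the sum f(m) + g(t) + h(Vj), which is invariant under opposite constant shifts.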
4.4.2 Partial Likelihood Estimation

Consider an intermediate reformulation of the dual-time Cox models (4.40) by exogenous time-scale discretization:

\[ t_l = t_{\min} + l\Delta t, \quad l = 0,1,\ldots,L, \quad (4.43) \]

for some time increment ∆t bounded away from zero, where tmin corresponds to the left boundary and tmin + L∆t to the right boundary of the calendar time under consideration. The irregular time spacing is feasible, too. (If the Lexis diagram is discrete itself, ∆t is automatically defined.) Assume the exogenous baseline λg(t) is defined on the L cutting points (and zero otherwise),

\[ g(t) = \sum_{l=1}^{L} \beta_l I(t = t_{\min} + l\Delta t), \quad (4.44) \]

subject to Σ_{l=1}^{L} βl = 0; that is, on each subinterval, g(t) for any t ∈ (tl−1, tl] in (4.40) is shifted to g(tl) in (4.44). With the vintage effect bucketed by (4.41), consider the one-way CoxPH reformulation of (4.40):

\[ \lambda_{ji}(m) = \lambda_f(m)\exp\Big\{ \sum_{l=2}^{L}\beta_l I(V_j+m\in(t_{l-1},t_l]) + \sum_{\kappa=2}^{K}\gamma_\kappa I(j\in G_\kappa) + \theta^T z_{ji}(m) \Big\}, \quad (4.45) \]

where λf(m) is an arbitrary baseline hazard. It is easy to see that if the underlying Lexis diagram is discrete, the reformulated CoxPH model (4.45) is equivalent to the fully parametrized version (4.42).

We consider the partial likelihood estimation of the parameters (β, γ, θ) ≡ ξ. For convenience of discussion, write for each (j, i, m) the (L−1)-vector of {I(Vj + m ∈ (tl−1, tl])}_{l=2}^{L}, the (K−1)-vector of {I(Vj ∈ (νκ−1, νκ])}_{κ=2}^{K}, and zji(m) jointly as a long vector xji(m). Then, by Cox (1972, 1975), the parameter ξ can be estimated by maximizing
the partial likelihood

\[ \mathrm{PL}(\xi) = \prod_{j=1}^{J}\prod_{i=1}^{n_j} \left[ \frac{\exp\{\xi^T x_{ji}(M_{ji})\}}{\sum_{(j',i')\in R_{ji}} \exp\{\xi^T x_{j'i'}(M_{ji})\}} \right]^{\Delta_{ji}}, \quad (4.46) \]

where Rji = {(j′, i′) : Uj′i′ < Mji ≤ Mj′i′} is the at-risk set at each observed Mji. Given the maximum partial likelihood estimator ξ̂, the endogenous cumulative baseline hazard Λf(m) = ∫₀ᵐ λf(s) ds can be estimated by the Breslow estimator

\[ \widehat{\Lambda}_f(m) = \sum_{j=1}^{J}\sum_{i=1}^{n_j} \frac{I(m \ge M_{ji})\,\Delta_{ji}}{\sum_{(j',i')\in R_{ji}} \exp\{\widehat{\xi}^T x_{j'i'}(M_{ji})\}}, \quad (4.47) \]

which is a step function with increments at m[1] < · · · < m[Lm]. It is well known in the CoxPH context that both estimators ξ̂ and Λ̂f(·) are consistent and asymptotically normal. Specifically, √N(ξ̂ − ξ) converges to a zero-mean multivariate normal distribution with covariance matrix that can be consistently estimated by {I(ξ̂)/N}⁻¹, where I(ξ) = −∂²log PL(ξ)/∂ξ∂ξᵀ; and √N(Λ̂f(m) − Λf(m)) converges to a zero-mean Gaussian process. See Andersen and Gill (1982) based on counting process martingale theory; see also Andersen, et al. (1993).

The DtBreslow and MEV estimators we have discussed in Section 4.2 can be studied by the theory of partial likelihood above, since the underlying nonparametric models are special cases of (4.45) with θ ≡ 0. Without the account-level covariates, the hazard rates λji(m) roll up to λj(m), under which the partial likelihood function (4.46) can be expressed through the counting notations introduced in (4.9):

\[ \mathrm{PL}(g,h) = \prod_{j=1}^{J}\prod_{m} \left[ \frac{\exp\{g(V_j+m)+h(V_j)\}}{\sum_{j'=1}^{J} \mathrm{nrisk}_{j'}(m)\exp\{g(V_{j'}+m)+h(V_{j'})\}} \right]^{\mathrm{nevent}_j(m)}, \quad (4.48) \]

where g(t) is parameterized by (4.44) and h(v) is by (4.41). Let λ̂g(t) = exp{ĝ(t)} and λ̂h(v) = exp{ĥ(v)} upon maximization of PL(g, h) and centering in the sense of
Eg = Eh = 0. Then, we have the Breslow estimator of the cumulative version of the endogenous baseline hazard,

\[ \widehat{\Lambda}_f(m) = \int_0^m \frac{\sum_{j=1}^{J}\mathrm{nevent}_j(s)\, ds}{\sum_{j=1}^{J}\mathrm{nrisk}_j(s)\,\widehat{\lambda}_g(V_j+s)\,\widehat{\lambda}_h(V_j)}. \quad (4.49) \]

It is clear that the increments of Λ̂f(m) correspond to the maturation effect (4.22) of the MEV estimator. They also reduce to (4.18) of the DtBreslow estimator in the absence of vintage heterogeneity.

The implementation of dual-time Cox modeling based on (4.45) can be carried out by maximizing the partial likelihood (4.46) to find θ̂ and using (4.47) to estimate the baseline hazard. Standard software with a survival package or toolbox can be used to perform the parameter estimation, where the raw data (4.1) are converted to the counting process format upon the same binning preprocessing as in Section 4.2. For example, we have tested the partial likelihood estimation using the coxph of the R:survival package and the simulation data in Figure 4.1; the numerical results of λ̂g, λ̂h by maximizing (4.48) and λ̂f by (4.49) are nearly identical to the plots shown in Figure 4.2. In the case when the coxph procedure in these survival packages (e.g. R:survival) requires time-independent covariates, we may convert (4.1) to an enlarged counting process format with a piecewise-constant approximation of the time-varying covariates xji(m); see the supplementary material in the last section of the chapter.

4.4.3 Frailty-type Vintage Effects

Recall the frailty models introduced in Section 1.3 about their double roles in characterizing both the odd effects of unobserved heterogeneity and the dependence of correlated defaults. They provide an alternative approach to model the vintage effects as the random block effects. Let Zκ for κ = 1, ..., K be a random sample from
some frailty distribution, where Zκ represents the shared frailty level of the vintage bucket Gκ according to (4.41). It is assumed that given the frailty Zκ, the observations for all vintages in Gκ are independent. Then, the between-bucket heterogeneity is captured by realizations of the random variate Z, while the within-bucket dependence is captured by the same realization Zκ. The gamma distribution Gamma(δ⁻¹, δ) with mean 1 and variance δ is the most frequently used frailty due to its simplicity. It has the Laplace transform L(x) = E_Z[exp{−xZ}] = (1 + δx)^(−1/δ), whose dth derivative is given by L⁽ᵈ⁾(x) = (−δ)ᵈ (1 + δx)^(−1/δ−d) Π_{i=1}^{d} (1/δ + i − 1).

Consider the dual-time Cox frailty models

\[ \lambda_{ji}(m,t\,|\,Z_\kappa) = \lambda_f(m)\lambda_g(t)\, Z_\kappa \exp\{\theta^T z_{ji}(m)\}, \quad \text{if } j\in G_\kappa, \quad (4.50) \]

for i = 1, ..., nj and j ∈ Gκ, κ = 1, ..., K. We also write λji(m, t | Zκ) as λji(m | Zκ) with t ≡ Vj + m. Similar to the intermediate approach taken by the partial likelihood (4.46) under the exogenously discretized dual-time model (4.45), let us write ξ̃ = (β, θ) and let x̃ji(m) join the vectors of {I(Vj + m ∈ (tl−1, tl])}_{l=2}^{L} and zji(m), so that we may write (4.50) as λji(m | Zκ) = λf(m)Zκ exp{ξ̃ᵀx̃ji(m)}. The log-likelihood based on the dual-time-to-default data (4.1) is given by

\[ \ell = \sum_{j=1}^{J}\sum_{i=1}^{n_j} \Delta_{ji}\big[\log\lambda_f(M_{ji}) + \tilde{\xi}^T \tilde{x}_{ji}(M_{ji})\big] + \sum_{\kappa=1}^{K} \log\big[(-1)^{d_\kappa} L^{(d_\kappa)}(A_\kappa)\big], \quad (4.51) \]

where L(x) is the Laplace transform of Z and L⁽ᵈ⁾(x) is its dth derivative, dκ = Σ_{j∈Gκ} Σ_{i=1}^{nj} ∆ji, and Aκ = Σ_{j∈Gκ} Σ_{i=1}^{nj} ∫_{Uji}^{Mji} λf(s) exp{ξ̃ᵀx̃ji(s)} ds.

In this frailty model setting, often used is the EM (expectation-maximization) algorithm for estimating (λf(·), ξ̃); see Nielsen, et al. (1992) or the supplementary material. Alternatively, for fixed δ, the gamma frailty models can be estimated by the penalized likelihood approach, by treating the second term of (4.51) as a kind of regularization.
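For the gamma frailty, the term log[(−1)^{dκ} L^{(dκ)}(Aκ)] in (4.51) is available in closed form from the derivative formula above; a small numerical sketch (the function name is our own):

```python
from math import log

def log_frailty_term(d, x, delta):
    # log of (-1)^d L^(d)(x) for the gamma frailty Laplace transform
    # L(x) = (1 + delta*x)^(-1/delta), i.e. frailty mean 1 and variance delta
    out = d * log(delta) - (1.0 / delta + d) * log(1.0 + delta * x)
    out += sum(log(1.0 / delta + i - 1.0) for i in range(1, d + 1))
    return out
```

Setting d = 0 recovers log L(x) itself, which is a convenient sanity check of the sign convention (−1)^d.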
Therneau and Grambsch (2000, §9.6) justified that for each fixed δ, the EM algorithmic solution coincides with the maximizer of the following penalized log-likelihood,

\[ \sum_{j=1}^{J}\sum_{i=1}^{n_j} \Delta_{ji}\big[\log\lambda_f(M_{ji}) + \xi^T x_{ji}(M_{ji})\big] + \frac{1}{\delta}\sum_{\kappa=1}^{K}\big(\gamma_\kappa - \exp\{\gamma_\kappa\}\big), \quad (4.52) \]

where ξ = (β, γ, θ) and xji(m) are the same as in (4.46). They also provide the programs (in the R:survival package) with a Newton-Raphson iterative solution as an inner loop and selection of δ in an outer loop. In practice, these programs can be used to fit the dual-time Cox frailty models (4.50) up to certain modifications.

Remarks:

1. The dual-time Cox regression considered in this section is tricked to the standard CoxPH problems upon exogenous time-scale discretization, such that the existing theory of partial likelihood and the asymptotic results can be applied. We have not studied the dual-time asymptotic behavior when both lifetime and calendar time are considered to be absolutely continuous. In practice with finite-sample observations, however, the time-discretization trick works perfectly.

2. The dual-time Cox model (4.40) includes only the endogenous covariates zji(m) that vary across individuals. It is also interesting to see the effects of exogenous covariates z(t) that are invariant across the individuals at the same calendar time. Such z(t) might be entertained by the traditional lifetime Cox models (4.37) upon the time shift; they are not estimable, however, if included as a log-linear term in (4.40), since exp{θᵀz(t)} would be absorbed by the exogenous baseline λg(t). Therefore, in order to investigate the exogenous covariate effects of z(t), we recommend a two-stage procedure, with the first stage fitting the dual-time Cox models (4.40), and the second stage modeling the correlation between z(t) and the estimated λ̂g(t) based on the time series techniques.
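To make the estimation recipe of (4.46)–(4.47) concrete, here is a toy left-truncated sketch of the log partial likelihood and the Breslow estimator for a single scalar covariate; the records and function names are hypothetical, not part of the thesis code.

```python
from math import exp, log

# hypothetical records: (U, M, delta, x) = (truncation, observed time, default, covariate)
records = [(0, 3, 1, 0.5), (0, 5, 0, -0.2), (1, 4, 1, 1.0), (0, 6, 1, -0.5)]

def log_partial_lik(theta, recs):
    # sum over default events of theta*x_i minus log of the risk-set sum
    ll = 0.0
    for (U_i, M_i, d_i, x_i) in recs:
        if d_i:
            risk = [x for (U, M, d, x) in recs if U < M_i <= M]
            ll += theta * x_i - log(sum(exp(theta * x) for x in risk))
    return ll

def breslow_cumhaz(theta, recs, m):
    # step-function estimate of the cumulative baseline hazard at lifetime m
    H = 0.0
    for (U_i, M_i, d_i, _) in recs:
        if d_i and M_i <= m:
            H += 1.0 / sum(exp(theta * x) for (U, M, d, x) in recs if U < M_i <= M)
    return H
```

At θ = 0 the risk-set sums reduce to at-risk counts, so the Breslow estimator reduces to the familiar Nelson–Aalen-type increments 1/nrisk at each event time.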
4.5 Applications in Retail Credit Risk Modeling

Retail credit risk modeling has become of increasing importance in recent years; see, among others, the special issues edited by Berlin and Mester (Journal of Banking and Finance, 2004) and Wallace (Real Estate Economics, 2005). In the quantitative understanding of the risk determinants of retail credit portfolios, there have been large gaps among academic researchers, banks' practitioners and governmental regulators. It turns out that many existing models for corporate credit risk (e.g. the Merton model) are not directly applicable to retail credit. In this section we demonstrate the application of dual-time survival analysis to credit card and mortgage portfolios in retail banking. Based on our real data-analytic experiences, some tweaked samples are considered here, only for the purpose of methodological illustration. Specifically, we employ the pooled credit card data to illustrate the dual-time nonparametric methods, and use the mortgage loan-level data to illustrate the dual-time Cox regression modeling.

4.5.1 Credit Card Portfolios

As we have introduced in Section 1.4, the credit card portfolios range over product types, acquisition channels, geographical regions and the like. These attributes may be either treated as the endogenous covariates, or used to define the multiple segments. Here, we demonstrate the credit card risk modeling with both segmentation and covariates. For simplicity, we consider two generic segments and a categorical covariate with three levels, where the categorical variable is defined by the credit score buckets upon loan origination: Low if FICO < 640, Medium if 640 ≤ FICO < 740, and High if FICO ≥ 740. Other variables like the credit line and utilization rate are not considered here. Before applying these DtSA techniques, we have assessed them under the simulation mechanism (4.2); see Figure 4.2 and Figure 4.3.
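Pooled credit card observations of this kind are conveniently stored in a weighted counting format, with one row of weight W per (segment, FICO bucket, vintage, lifetime interval, status); a minimal aggregation sketch over hypothetical loan-level records:

```python
from collections import Counter

# hypothetical loan-level records: (SegID, FICO, V, U, M, D)
loans = [
    (1, "Low", "200501", 0, 3, 0), (1, "Low", "200501", 0, 3, 0),
    (1, "Low", "200501", 0, 3, 1),
]

# each distinct key becomes one pooled row, with W counting its multiplicity
pooled = sorted(key + (w,) for key, w in Counter(loans).items())
```

The weighted rows can then feed the nonparametric estimators (or a weighted partial likelihood) without carrying every individual account.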
Let us consider the card vintages originated from year 2001 to 2008 with observations (a) left truncated at the beginning of 2005, (b) right censored by the end of 2008, and (c) up censored by the 48 months-on-book maximum. Refer to Figure 1.3 about the vintage diagram of dual-time data collection, which resembles the rectangular Lexis diagram of Figure 4.1. One may aggregate the observations at the vintage level for each FICO bucket in each segment. See Table 4.1 for an illustration of the counting format of the input data, where each single vintage at each lifetime interval [U, M] has two records of weight W, namely the number of survived loans (D = 0) and the number of defaulted loans (D = 1).

Table 4.1: Illustrative data format of pooled credit card loans (V: vintage; [U,M]: lifetime interval; D: status; W: counts)

SegID | FICO   | V      | U | M | D | W
1     | Low    | 200501 | 0 | 3 | 0 | 988
1     | Low    | 200501 | 0 | 3 | 1 | 2
1     | Low    | 200501 | 3 | 4 | 0 | 993
1     | Low    | 200501 | 3 | 4 | 1 | 5
1     | Low    | 200501 | 4 | 5 | 0 | 979
1     | Low    | 200501 | 4 | 5 | 1 | 11
...   | ...    | ...    | . | . | . | ...
1     | Low    | 200502 | . | . | . | ...
1     | Medium | ...    | . | . | . | ...

Note that Table 4.1 also illustrates that in the lifetime interval [3, 4], the 200501 vintage has 3 attrition accounts (i.e. 993 − 979 − 11), which are examples of being randomly censored.

We applied the one-way, two-way and three-way nonparametric methods developed in Section 4.2 to each segment of dual-time-to-default data, regardless of the FICO variable. Note that given the pooled data with weights W for D = 0, 1, it is rather straightforward to modify
Figure 4.4: Nonparametric analysis of credit card risk: (top) one-way empirical hazards; (middle) DtBreslow estimation; (bottom) MEV decomposition.
the iterative nonparametric estimators in (4.14)–(4.16), and one may also use the weighted partial likelihood estimation based on our discussion in Section 4.4. The numeric results for both segments are shown in Figure 4.4, where we may compare the performances of the two segments as follows.

1. The top panel shows the empirical hazard rates in both lifetime and calendar time, where we consider only the yearly vintages for simplicity. By such one-way estimates, the seasoning effects are similar, but the exogenous cycle effects show a dramatic difference.

2. The middle panel shows the results of the DtBreslow estimator that simultaneously calibrates the endogenous seasoning λf(m) and the exogenous cycle effects λg(t) under the two-way hazard model (4.16). Segment 1 has shown better lifetime performance (i.e. lower hazards) than Segment 2, and it has also better calendar-time performance than Segment 2 except for the latest couple of months: the credit cycle of Segment 2 grows relatively steadily, while Segment 1 has a sharp exogenous trend since t = 30 (middle 2007) and exceeds Segment 2 since about t = 40.

3. The bottom panel takes into account the vintage heterogeneity. The estimates of λf(m) and λg(t) are both affected after adding the vintage effects: compared to the two-way calibration above, the more heterogeneity effects are explained by the vintage originations, the less variations are attributed to the endogenous and exogenous baselines. The vintage performance in both segments has deteriorated from pre-2005 to 2007, and made a turn when entering 2008. Note that Segment 2 has a dramatic improvement of vintage originations in 2008.

Next, consider the three FICO buckets for each segment, where we still take the nonparametric approach for each of the 2 × 3 segments. The MEV decomposition of empirical hazards is shown in Figure 4.5. It is found that in lifetime the low, medium
and high FICO buckets have decreasing levels of endogenous baseline hazard rates, and the exogenous cycles of the low, medium and high FICO buckets comove (more or less) within each segment. As for the vintage heterogeneity, it is interesting to note that for Segment 1 the high FICO vintages keep deteriorating from pre-2005 to 2008, which is a risky signal. The vintage heterogeneity in the other FICO buckets is mostly consistent with the overall pattern in Figure 4.4, where for Segment 2 the hazard rates are evidently non-proportional.

This example mainly illustrates the effectiveness of the dual-time nonparametric methods when there are limited or no covariates provided. By the next example of mortgage loans, we shall demonstrate the dual-time Cox regression modeling with various covariate effects on competing risks.

Figure 4.5: MEV decomposition of credit card risk for low, medium and high FICO buckets: Segment 1 (top panel), Segment 2 (bottom panel).
4.5.2 Mortgage Competing Risks

The U.S. residential mortgage market is huge. As of March 2008, the first-lien mortgage debt is estimated to be about $10 trillion (for 54.7 million loans), with a distribution of $1.2 trillion subprime mortgages, $8.2 trillion prime and near-prime combined, and the remainder being government guaranteed. In brief, the prime mortgages generally target the borrowers with good credit histories and normal down payments, while the subprime mortgages are made to borrowers with poor credit and high leverage. The rise in mortgage defaults is unprecedented. In retrospect, the 2007-08 collapse of the mortgage market has rather complicated causes, including the increased leverage, the originate-then-securitize incentives (a.k.a. moral hazard), the fall of home prices, and the worsening unemployment rates. One may refer to Gerardi, et al. (2008) and Goodman, et al. (2008) for some further details of the subprime issue. Quantitatively, Mayer, Pence and Sherlund (2009) described the rising defaults in nonprime mortgages in terms of underwriting standards and macroeconomic factors, and Sherlund (2008) considered subprime mortgages by proportional hazards models (4.37) with presumed fixed lifetime baselines (therefore, a parametric setting).

We make an attempt via dual-time survival analysis to understand the risk determinants of mortgage default and prepayment. Considered here is a tweaked sample of mortgage loans originated from year 2001 to 2007, with a Lexis diagram of observations (a) left truncated at the beginning of 2005, (b) right censored by October 2008, and (c) up censored by the 60 months-on-book maximum. These loans resemble the nonconforming mortgages in California, where the regional macroeconomic conditions have become worse and worse since middle 2006; see Figure 4.6 for the plots of 2000-08 home price indices and unemployment rate, where we use the up-to-date data sources from S&P/Case-Shiller and the U.S. Bureau of Labor Statistics. See also the U.S. Federal Reserve report by Frame,
Lehnert and Prescott (2008), including its discussion of the loose underwriting standards.
Figure 4.6: Home prices and unemployment rate of California state: 2000-2008.

Figure 4.7: MEV decomposition of mortgage hazards: default vs. prepayment.
For a mortgage loan with competing risks caused by default and prepayment, the observed event time corresponds to the time to the first event or the censoring time. One practically simple approach is to reduce the multiple events to individual survival analyses by treating the alternative events as censored; then, we may fit two marginal hazard rate models for default and prepayment, respectively. One may refer to Lawless (2003, §9) for the modeling issues on survival analysis of competing risks, and to Therneau and Grambsch (2000, §8) about robust variance estimation for the marginal Cox models of competing risks.

We begin with nonparametric exploration of the endogenous and exogenous hazards, as well as the vintage heterogeneity effects. Similar to the credit card risk modeling above, the monthly vintages are grouped yearly to be pre-2005, 2005, 2006, 2007 originations. The fitting results of the MEV hazard decomposition are shown in Figure 4.7 for both default and prepayment, and we find that

1. The exogenous cycle of the default risk had been low (with exogenous multiplier less than 1) prior to t = 20 (August 2006), then became rising sharply, up to about 50 times riskier at about t = 43 (July 2008). In contrast, the exogenous cycle of the prepayment risk has shown a modest downturn trend during the same calendar time period.

2. The vintage heterogeneity plot implies that the 2005-06 originations have a much higher probability of default than the mortgage originations in other years. In contrast, the vintage heterogeneity of prepayment keeps decreasing from pre-2005 to 2007. These findings match the empirical evidence presented lately by Gerardi, et al. (2008).
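The reduction of the default/prepayment competing risks to two marginal analyses, each censoring the alternative event, can be sketched as follows (the record layout and names are hypothetical):

```python
def cause_specific(records, cause):
    # (time, event) pairs -> (time, indicator) pairs,
    # treating every other cause (or no event) as censored
    return [(t, 1 if e == cause else 0) for (t, e) in records]

obs = [(12, "default"), (20, "prepay"), (36, None)]   # None = censored
default_data = cause_specific(obs, "default")
prepay_data = cause_specific(obs, "prepay")
```

Each reduced dataset can then be passed to the dual-time machinery of this chapter as if it were a single-event sample.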
Compare the estimated exogenous effects on the default hazards to the macroeconomic conditions in Figure 4.6: it is obvious that the rise in mortgage defaults follows closely the fall of home prices (positively correlated with the decline) and the worsening unemployment rate. One may of course perform multivariate time series analysis in order to study the correlation between the exogenous hazards and the macroeconomic indicators, which is beyond the scope of the present thesis. Based on our demonstration data, it is not possible to quantify what percentage of the exogenous effects should be attributed to either house prices or unemployment, because (a) the house prices and unemployment in California are themselves highly correlated from 2005 to 2008, and (b) the exogenous effects on default hazards could be also caused by other macroeconomic indicators, and they could be way more complicated in real situations.

Next, we try to decode the vintage effect of unobserved heterogeneity by adding the loan-level covariates. For demonstration purposes, we consider only the few mortgage covariates listed in Table 4.2, where CLTV stands for the combined loan-to-value leverage ratio and FICO is the same measurement of credit score as in the credit card risk. We also include the mortgage note rate, either fixed or floating (i.e. adjustable), to demonstrate the capability of Cox modeling with time-varying covariates. These and other underwriting covariates upon loan origination can be referred to Goodman, et al. (2008, §1) or Wikipedia.

After incorporating the underwriting and dynamic covariates in Table 4.2, the dual-time Cox regression models considered here take the form of (4.40) with both endogenous and exogenous baselines:

\[ \lambda^{(q)}_{ji}(m,t) = \lambda^{(q)}_f(m)\,\lambda^{(q)}_g(t)\exp\big\{ \theta^{(q)}_1 \mathrm{CLTV}_{ji} + \theta^{(q)}_2 \mathrm{FICO}_{ji} + \theta^{(q)}_3 \mathrm{NoteRate}_{ji}(m) + \theta^{(q)}_4 \mathrm{DocFull}_{ji} + \theta^{(q)}_5 \mathrm{IO}_{ji} + \theta^{(q)}_{6,p} \mathrm{Purpose.purchase}_{ji} + \theta^{(q)}_{6,r} \mathrm{Purpose.refi}_{ji} \big\}, \quad (4.53) \]
where the superscript (q) indicates the competing-risk hazards of default or prepayment, NoteRate could be dynamic while the other covariates are static upon origination, and the 3-level Purpose covariate is broken into the two dummy variables Purpose.p and Purpose.r (against the standard level of cash-out refi).

Table 4.2: Loan-level covariates considered in mortgage credit risk modeling.

Covariate | Description
CLTV | combined loan-to-value ratio: µ = 78, σ = 13, range [10, 125]
FICO | credit score: µ = 706, σ = 35, range [530, 840]
NoteRate | mortgage note rate: µ = 6, σ = 1.1, range [1, 11]
Documentation | full or limited
IO Indicator | interest-only or not
Loan Purpose | purchase, refi or cash-out refi

The mean, standard deviation and range statistics presented for the continuous covariates CLTV, FICO and NoteRate are calculated from our tweaked sample, and these three continuous covariates are all scaled to have zero mean and unit variance. For simplicity, we have assumed the log-linear effects of all 3 continuous covariates on both default and prepayment hazards, while in practice one may need to strive to find the appropriate functional forms or break them into multiple buckets. By the partial likelihood estimation discussed in Section 4.4, we obtain the numerical results tabulated in Table 4.3, where the coefficient estimates are all statistically significant based on our demonstration data (some other non-significant covariates were actually screened out and not considered here).

Table 4.3: Maximum partial likelihood estimation of mortgage covariate effects in the dual-time Cox regression models, with one column each for the default and prepayment hazards and rows for CLTV, FICO, NoteRate, DocFull, IO, Purpose.p and Purpose.r.

From Table 4.3, we find that, conditional on the nonparametric dual-time baselines,

1. The default risk is driven up by high CLTV, low FICO, high NoteRate, limited documentation, Interest-Only, and cash-out refi Purpose.
2. The prepayment risk is driven up by low CLTV, high FICO, high NoteRate, Interest-Only, and purchase Purpose.

3. The covariates CLTV, FICO and Purpose reveal the competing nature of the default and prepayment risks. It is actually reasonable to claim that a homeowner with good credit and low leverage would more likely prepay rather than choose to default.

4.6 Summary

This chapter is devoted completely to the survival analysis on the Lexis diagram, with coverage of nonparametric estimators, structural parameterization and semiparametric Cox regression. Developed in this thesis is mainly the dual-time analytics, including VDA (vintage data analysis) and DtSA (dual-time survival analysis), which can be applied to model both corporate and retail credit. For the structural approach based on the inverse Gaussian first-passage-time distribution, we have discussed the feasibility of its dual-time extension and the calibration procedure, while its implementation is open to future investigation. Computational results are provided for the other dual-time methods developed in this chapter, including assessment of the methodology through the simulation study. Finally, we have demonstrated the application of dual-time survival analysis to retail credit risk modeling; some interesting findings in the credit card and mortgage risk analysis are presented, which may shed some light on understanding the ongoing credit crisis from a new dual-time perspective. We have also remarked throughout the chapters on the possible extensions and other open problems of interest.

To end this thesis, we conclude that statistical methods play a key role in credit risk modeling. So, the end is also a new beginning . . .
4.7 Supplementary Materials

(A) Calibration of the Inverse Gaussian Hazard Function

Consider the calibration of the two-parameter inverse Gaussian hazard rate function λ(m; c, b) of the form (4.26) from the n observations (Ui, Mi, ∆i), i = 1, ..., n, each subject to left truncation and right censoring. Denote by η = (c, b)ᵀ the structural parameters of the inverse Gaussian hazard. The log-likelihood function is given by

\[ \ell = \sum_{i=1}^{n} \Big[ \Delta_i \log\lambda(M_i,\eta) - \int_{U_i}^{M_i} \lambda(s,\eta)\, ds \Big]. \quad (4.54) \]

Taking the first and second order derivatives, we have the score and Hessian functions

\[ \frac{\partial \ell}{\partial\eta} = \sum_{i=1}^{n} \Big[ \Delta_i \frac{\lambda'(M_i,\eta)}{\lambda(M_i,\eta)} - \int_{U_i}^{M_i} \lambda'(s,\eta)\, ds \Big], \]

\[ \frac{\partial^2 \ell}{\partial\eta\,\partial\eta^T} = \sum_{i=1}^{n} \Big[ \Delta_i \Big( \frac{\lambda''(M_i,\eta)}{\lambda(M_i,\eta)} - \frac{(\lambda'(M_i,\eta))^{\otimes 2}}{\lambda^2(M_i,\eta)} \Big) - \int_{U_i}^{M_i} \lambda''(s,\eta)\, ds \Big], \]

where λ′(m, η) = ∂λ(m, η)/∂η, λ″(m, η) = ∂²λ(m, η)/∂η∂ηᵀ, and a⊗² = aaᵀ. Alternatively, one may replace the second term of (4.54) by log S(Mi, η) − log S(Ui, η). Knowing the parametric likelihood, the MLE requires nonlinear optimization that can be carried out by gradient search (e.g. the Newton-Raphson algorithm); see e.g. Press, et al. (2007). By supplying the gradients, the optimization algorithms in most standard software are faster and more reliable. The complications lie in the evaluations of ∂λ(m, η)/∂η and ∂²λ(m, η)/∂η∂ηᵀ, for which some calculations are provided below. Note that the evaluation of the integral terms above can be carried out by numerical integration methods like the composite trapezoidal or Simpson's rule.
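For instance, the integral term of one subject's contribution to (4.54) can be evaluated with the composite trapezoidal rule; a minimal sketch with a generic hazard function, using names of our own choosing:

```python
from math import log

def trapezoid(f, a, b, n=200):
    # composite trapezoidal rule for the integral of f over [a, b]
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

def loglik_term(lam, U, M, d):
    # one subject, left-truncated at U, observed to M with default indicator d:
    # d * log(lam(M)) - integral of lam over (U, M]
    return (log(lam(M)) if d else 0.0) - trapezoid(lam, U, M)
```

With a constant hazard the trapezoidal rule is exact, which provides an easy correctness check before plugging in the inverse Gaussian hazard.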
Based on the inverse Gaussian baseline hazard (4.26), denote

\[ z_1 = \frac{\eta_1 + \eta_2 m}{\sqrt{m}}, \qquad z_2 = \frac{-\eta_1 + \eta_2 m}{\sqrt{m}}, \qquad \phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \]

and evaluate first the derivatives of the survival function S(m, η) (i.e. the denominator of (4.26)), which are, messy though, provided below:

\[ \frac{\partial S(m,\eta)}{\partial\eta_1} = 2\eta_2 e^{-2\eta_1\eta_2}\Phi(z_2) + \frac{1}{\sqrt{m}}\big[\phi(z_1) + e^{-2\eta_1\eta_2}\phi(z_2)\big], \]

\[ \frac{\partial S(m,\eta)}{\partial\eta_2} = 2\eta_1 e^{-2\eta_1\eta_2}\Phi(z_2) + \sqrt{m}\,\big[\phi(z_1) - e^{-2\eta_1\eta_2}\phi(z_2)\big], \]

\[ \frac{\partial^2 S(m,\eta)}{\partial\eta_1^2} = -4\eta_2^2 e^{-2\eta_1\eta_2}\Phi(z_2) - \frac{4\eta_2}{\sqrt{m}} e^{-2\eta_1\eta_2}\phi(z_2) - \frac{1}{m}\big[\phi(z_1)z_1 - e^{-2\eta_1\eta_2}\phi(z_2)z_2\big], \]

\[ \frac{\partial^2 S(m,\eta)}{\partial\eta_1\partial\eta_2} = (2 - 4\eta_1\eta_2) e^{-2\eta_1\eta_2}\Phi(z_2) - \big[\phi(z_1)z_1 - e^{-2\eta_1\eta_2}\phi(z_2)z_2\big], \]

\[ \frac{\partial^2 S(m,\eta)}{\partial\eta_2^2} = -4\eta_1^2 e^{-2\eta_1\eta_2}\Phi(z_2) + 4\eta_1\sqrt{m}\, e^{-2\eta_1\eta_2}\phi(z_2) - m\big[\phi(z_1)z_1 - e^{-2\eta_1\eta_2}\phi(z_2)z_2\big]. \]

Then, the partial derivatives of λ(m, η) can be evaluated explicitly by

\[ \frac{\partial\lambda(m,\eta)}{\partial\eta_1} = \lambda(m,\eta)\Big[ \frac{1}{\eta_1} - \frac{\eta_1+\eta_2 m}{m} - \frac{1}{S(m,\eta)}\frac{\partial S(m,\eta)}{\partial\eta_1} \Big], \]

\[ \frac{\partial\lambda(m,\eta)}{\partial\eta_2} = \lambda(m,\eta)\Big[ -(\eta_1+\eta_2 m) - \frac{1}{S(m,\eta)}\frac{\partial S(m,\eta)}{\partial\eta_2} \Big], \]

\[ \frac{\partial^2\lambda(m,\eta)}{\partial\eta_1^2} = \lambda(m,\eta)\Big[ \Big(\frac{1}{\lambda}\frac{\partial\lambda}{\partial\eta_1}\Big)^2 - \frac{1}{\eta_1^2} - \frac{1}{m} + \Big(\frac{1}{S}\frac{\partial S}{\partial\eta_1}\Big)^2 - \frac{1}{S}\frac{\partial^2 S}{\partial\eta_1^2} \Big], \]

\[ \frac{\partial^2\lambda(m,\eta)}{\partial\eta_1\partial\eta_2} = \lambda(m,\eta)\Big[ \Big(\frac{1}{\lambda}\frac{\partial\lambda}{\partial\eta_1}\Big)\Big(\frac{1}{\lambda}\frac{\partial\lambda}{\partial\eta_2}\Big) - 1 + \frac{1}{S^2}\frac{\partial S}{\partial\eta_1}\frac{\partial S}{\partial\eta_2} - \frac{1}{S}\frac{\partial^2 S}{\partial\eta_1\partial\eta_2} \Big], \]

\[ \frac{\partial^2\lambda(m,\eta)}{\partial\eta_2^2} = \lambda(m,\eta)\Big[ \Big(\frac{1}{\lambda}\frac{\partial\lambda}{\partial\eta_2}\Big)^2 - m + \Big(\frac{1}{S}\frac{\partial S}{\partial\eta_2}\Big)^2 - \frac{1}{S}\frac{\partial^2 S}{\partial\eta_2^2} \Big]. \]
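Closed-form derivatives like these are easy to mis-transcribe, so a finite-difference spot check is worthwhile before using them in a gradient search; a sketch for ∂S/∂η1 (all function names are ours):

```python
from math import sqrt, exp, pi, erf

def Phi(z):
    # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    # standard normal density
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def S(m, e1, e2):
    # inverse Gaussian survival function with eta = (eta1, eta2)
    z1 = (e1 + e2 * m) / sqrt(m)
    z2 = (-e1 + e2 * m) / sqrt(m)
    return Phi(z1) - exp(-2.0 * e1 * e2) * Phi(z2)

def dS_de1(m, e1, e2):
    # closed form for the partial derivative of S with respect to eta1
    z1 = (e1 + e2 * m) / sqrt(m)
    z2 = (-e1 + e2 * m) / sqrt(m)
    return (2.0 * e2 * exp(-2.0 * e1 * e2) * Phi(z2)
            + (phi(z1) + exp(-2.0 * e1 * e2) * phi(z2)) / sqrt(m))
```

Comparing `dS_de1` with a central difference of `S` at a few parameter values catches sign and factor errors immediately.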
(B) Implementation of Cox Modeling with Time-varying Covariates

Consider the Cox regression model with time-varying covariates x(m) ∈ R^p:

λ(m; x(m)) = λ0(m) exp{θ^T x(m)}   (4.55)

where λ0(m) is an arbitrary baseline hazard function and θ is the p-vector of regression coefficients. Let τi be the event time for each account i = 1, . . . , n that is subject to left truncation Ui and right censoring Ci. Given the observations

(Ui, Mi, ∆i, xi(m)), m ∈ [Ui, Mi], i = 1, . . . , n   (4.56)

in which Mi = min{τi, Ci} and ∆i = I(τi < Ci), the complete likelihood is given by

L = ∏_{i=1}^n [fi(Mi)/Si(Ui)]^{∆i} [Si(Mi+)/Si(Ui)]^{1−∆i} = ∏_{i=1}^n [λi(Mi)]^{∆i} Si(Mi+) Si^{−1}(Ui).   (4.57)

Under the dynamic Cox model (4.55), the likelihood function is

L(θ, λ0) = ∏_{i=1}^n [λ0(Mi) exp{θ^T xi(Mi)}]^{∆i} exp{ −∫_{Ui}^{Mi} λ0(s) exp{θ^T xi(s)} ds }.   (4.58)

Taking λ0 as the nuisance parameter, one may estimate θ by the partial likelihood; see Cox (1972, 1975). Provided below is the implementation of maximum likelihood estimation of θ and λ0(m) by tricking it to be a standard CoxPH modeling with time-independent covariates.

The model estimation with time-varying covariates can be implemented by the standard software packages that handle only the time-independent covariates by default. The trick is to construct little time segments such that xi(m) can be viewed as constant within each segment. See Figure 4.8 for an illustration with the construction of monthly segments.
Figure 4.8: Split time-varying covariates into little time segments.

For each account i, let us partition m ∈ [Ui, Mi] to be Ui ≡ Ui,0 < Ui,1 < · · · < Ui,Li ≡ Mi such that

xi(m) = xi,l,   m ∈ (Ui,l−1, Ui,l],   l = 1, . . . , Li.   (4.59)

Thus, we have converted the n-sample survival data (4.56) to the following independent observations, each with the constant covariates:

(Ui,l−1, Ui,l, ∆i,l, xi,l),   l = 1, . . . , Li,   i = 1, . . . , n   (4.60)

where ∆i,l = ∆i for l = Li and 0 otherwise. It is straightforward to show that the likelihood (4.57) can be reformulated by

L = ∏_{i=1}^n [λi(Mi)]^{∆i} ∏_{l=1}^{Li} Si(Ui,l) Si^{−1}(Ui,l−1) = ∏_{i=1}^n ∏_{l=1}^{Li} [λi(Ui,l)]^{∆i,l} Si(Ui,l) Si^{−1}(Ui,l−1)   (4.61)

which becomes the likelihood function based on the enlarged data (4.60).
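The episode-splitting step above can be written as a small helper (an illustrative sketch of ours, not the thesis code) that turns one account's record (Ui, Mi, ∆i) into (start, stop, event, covariate) rows; freezing the covariate at the left endpoint of each segment is an assumed convention here.

```python
import numpy as np

def split_episodes(U, M, delta, x_of_m, step=1.0):
    # Partition [U, M] into cut points U = U_0 < U_1 < ... < U_L = M and
    # emit one row (start, stop, event, x) per segment, as in (4.59)-(4.60):
    # only the last segment inherits the event indicator delta.
    cuts = [U + k * step for k in range(int(np.ceil((M - U) / step)))] + [M]
    rows = []
    for l in range(len(cuts) - 1):
        start, stop = cuts[l], cuts[l + 1]
        event = int(delta) if l == len(cuts) - 2 else 0
        # covariate frozen at the segment's left endpoint (a convention)
        rows.append((start, stop, event, x_of_m(start)))
    return rows
```

Each emitted row maps directly onto the counting-process format `Surv(start, stop, event)` consumed by `coxph` in R's survival package.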
Therefore, under the assumption (4.59), the parameter estimation by maximizing (4.58) can be equivalently handled by maximizing

L(θ, λ0) = ∏_{i=1}^n ∏_{l=1}^{Li} [λ0(Ui,l) exp{θ^T xi,l}]^{∆i,l} exp{ −exp{θ^T xi,l} ∫_{Ui,l−1}^{Ui,l} λ0(s) ds }.   (4.62)

In R, one may perform the MLE directly by applying the CoxPH procedure (R: survival package) to the transformed data (4.60), where xi,l are constructed by (4.59); see Therneau and Grambsch (2000) for computational details.

Furthermore, one may trick the implementation of MLE by logistic regression, provided that the underlying hazard rate function (4.55) admits the linear approximation up to the logit link:

logit(λi(m)) ≈ η^T φ(m) + θ^T xi(m),   (4.63)

where logit(x) = log(x/(1 − x)) and φ(m) is the vector of pre-specified basis functions (e.g. splines). Given the survival observations (4.60) with piecewise-constant covariates, it is easy to show that the discrete version of the joint likelihood we have formulated in Section 4.2 can be rewritten as

L = ∏_{i=1}^n [λi(Mi)]^{∆i} [1 − λi(Mi)]^{1−∆i} ∏_{m∈(Ui, Mi)} [1 − λi(m)] = ∏_{i=1}^n ∏_{l=1}^{Li} [λi(Ui,l)]^{∆i,l} [1 − λi(Ui,l)]^{1−∆i,l}.   (4.64)

This becomes clearly the likelihood function of the independent binary data (∆i,l, xi,l) for l = 1, . . . , Li and i = 1, . . . , n. Therefore, under the parametrized logistic model (4.63), one may perform the MLE for the parameters (η, θ) by a standard GLM procedure; see e.g. Faraway (2006).
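The logistic-regression trick can likewise be sketched in Python (our stand-in for an R `glm` call, with φ(m) = (1, m) as an assumed linear basis): expand each account into monthly binary records as in (4.64), then fit by Newton-Raphson.

```python
import numpy as np

def person_period(U, M, delta, x_row):
    # One binary record per month m = U+1, ..., M: the response is 0 for
    # every survived month and equals delta for the final one.
    # Design columns: intercept, m (linear basis phi(m)), covariates x.
    months = np.arange(U + 1, M + 1, dtype=float)
    y = np.zeros(len(months))
    y[-1] = delta
    X = np.column_stack([np.ones(len(months)), months,
                         np.tile(np.atleast_1d(x_row), (len(months), 1))])
    return X, y

def fit_logistic(X, y, n_iter=30):
    # Plain Newton-Raphson for the logistic GLM (stand-in for R's glm);
    # the tiny ridge term only guards against a singular Hessian.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hess = (X * w[:, None]).T @ X + 1e-10 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta
```

A useful sanity property: with an intercept-only design the fitted event probability must reproduce the empirical event rate of the expanded binary records exactly, which is a standard score-equation identity of the logistic MLE.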
(C) Frailty-type Vintage Effects in Section 4.4.3

The likelihood function. By the independence of Z1, . . . , ZK, it is easy to write down the likelihood as

L = ∏_{κ=1}^K E_{Zκ} [ ∏_{j∈Gκ} ∏_{i=1}^{nj} [λji(Mji | Zκ)]^{∆ji} exp{ −∫_{Uji}^{Mji} λji(s | Zκ) ds } ].   (4.65)

Substituting λji(m | Zκ) = λf(m) Zκ exp{ξ̃^T x̃ji(m)}, we obtain for each Gκ the likelihood contribution

Lκ = ∏_{j∈Gκ} ∏_{i=1}^{nj} [λf(Mji) exp{ξ̃^T x̃ji(Mji)}]^{∆ji} E_Z[ Z^{dκ} exp{−Z Aκ} ]   (4.66)

in which dκ = ∑_{j∈Gκ} ∑_{i=1}^{nj} ∆ji and Aκ = ∑_{j∈Gκ} ∑_{i=1}^{nj} ∫_{Uji}^{Mji} λf(s) exp{ξ̃^T x̃ji(s)} ds. The expectation term in (4.66) can be derived from the dκ-th derivative of the Laplace transform of Z, which satisfies E_Z[ Z^{dκ} exp{−Z Aκ} ] = (−1)^{dκ} L^{(dκ)}(Aκ); see e.g. Aalen et al. (2008, §7.2). Thus, we obtain the joint likelihood

L = ∏_{κ=1}^K ∏_{j∈Gκ} ∏_{i=1}^{nj} [λf(Mji) exp{ξ̃^T x̃ji(Mji)}]^{∆ji} (−1)^{dκ} L^{(dκ)}(Aκ)   (4.67)

or the log-likelihood function of the form (4.51).

EM algorithm for frailty model estimation; see Aalen et al. (2008, §7).

1. E-step: given the fixed values of (ξ̃, λf), estimate the individual frailty levels {Zκ}_{κ=1}^K by the conditional expectation

Ẑκ = E[Zκ | Gκ] = −L^{(dκ+1)}(Aκ) / L^{(dκ)}(Aκ)   (4.68)

which corresponds to the empirical Bayes estimate. For the gamma frailty Gamma(δ^{−1}, δ) with L^{(d)}(x) = (−δ)^d (1 +
δx)^{−1/δ−d} ∏_{i=1}^d (1/δ + i − 1), the frailty level estimates are given by

Ẑκ = (1 + δ dκ) / (1 + δ Aκ),   κ = 1, . . . , K.   (4.69)

2. M-step: given the frailty levels {Ẑκ}_{κ=1}^K, estimate (ξ̃, λf) by partial likelihood maximization and the Breslow estimator with appropriate modification based on the fixed frailty multipliers:

PL(ξ̃) = ∏_{j=1}^J ∏_{i=1}^{nj} [ exp{ξ̃^T x̃ji(Mji)} / ∑_{(j′,i′)∈Rji} Ẑκ(j′) exp{ξ̃^T x̃j′i′(Mji)} ]^{∆ji}   (4.70)

Λf(m) = ∑_{j=1}^J ∑_{i=1}^{nj} I(m ≥ Mji) ∆ji / ∑_{(j′,i′)∈Rji} Ẑκ(j′) exp{ξ̃^T x̃j′i′(Mji)}   (4.71)

where κ(j) indicates the bucket Gκ to which Vj belongs.

To estimate the frailty parameter δ, one may run the EM algorithm for δ on a grid to obtain (ξ̃*(δ), λf*(·; δ)), then find the maximizer of the profiled log-likelihood (4.51). In R, the partial likelihood maximization can be implemented by the standard CoxPH procedure based on the offset type of log-frailty predictors as provided; details omitted.
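The gamma-frailty E-step can be checked numerically. The sketch below (ours, not thesis code) encodes both the closed-form update Ẑκ = (1 + δdκ)/(1 + δAκ) and the Laplace-transform derivative L^{(d)}(x) = (−δ)^d (1 + δx)^{−1/δ−d} ∏_{i=1}^d (1/δ + i − 1), so that agreement with the generic formula Ẑκ = −L^{(dκ+1)}(Aκ)/L^{(dκ)}(Aκ) can be verified.

```python
import numpy as np

def gamma_laplace_deriv(x, delta, d):
    # d-th derivative of the gamma frailty Laplace transform
    # L(x) = (1 + delta*x)^(-1/delta):
    # L^(d)(x) = (-delta)^d * (1 + delta*x)^(-1/delta - d)
    #            * prod_{i=1}^{d} (1/delta + i - 1)
    coef = 1.0
    for i in range(1, d + 1):
        coef *= 1.0 / delta + i - 1
    return (-delta) ** d * (1.0 + delta * x) ** (-1.0 / delta - d) * coef

def gamma_frailty_estep(d_events, A, delta):
    # Empirical Bayes frailty estimate:
    # Z_hat = (1 + delta * d_kappa) / (1 + delta * A_kappa),
    # with d_kappa = event count in the bucket and A_kappa = accumulated
    # integrated hazard mass of the bucket.
    return (1.0 + delta * np.asarray(d_events, float)) / \
           (1.0 + delta * np.asarray(A, float))
```

Buckets with more events than their accumulated hazard predicts receive Ẑκ > 1 (higher frailty), and buckets with fewer receive Ẑκ < 1, which is the intuitive shrinkage behavior of the empirical Bayes estimate.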
BIBLIOGRAPHY
[1] Aalen, O. O. and Gjessing, H. K. (2001). Understanding the shape of the hazard rate: a process point of view (with discussion). Statistical Science, 16, 1–22.
[2] Aalen, O. O., Borgan, O. and Gjessing, H. K. (2008). Survival and Event History Analysis: A Process Point of View. Springer, New York.
[3] Abramovich, F. and Steinberg, D. M. (1996). Improved inference in nonparametric regression using Lk smoothing splines. Journal of Statistical Planning and Inference, 49, 327–341.
[4] Abramowitz, M. and Stegun, I. A. (eds.) (1972). Handbook of Mathematical Functions (10th printing). New York: Dover.
[5] Andersen, P. K., Borgan, O., Gill, R. D. and Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer-Verlag, New York.
[6] Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: a large sample study. Annals of Statistics, 10, 1100–1120.
[7] Arjas, E. (1986). Stanford heart transplantation data revisited: a real time approach. In: Modern Statistical Methods in Chronic Disease Epidemiology (Moolgavkar, S. H. and Prentice, R. L., eds.), pp. 65–81. Wiley, New York.
[8] Berlin, M. and Mester, L. (eds.) (2004). Special issue of "Retail credit risk management and measurement". Journal of Banking and Finance, 28, 721–899.
[9] Bielecki, T. R. and Rutkowski, M. (2004). Credit Risk: Modeling, Valuation and Hedging. Springer, Berlin.
[10] Bilias, Y., Gu, M. and Ying, Z. (1997). Towards a general asymptotic theory for Cox model with staggered entry. Annals of Statistics, 25, 662–682.
[11] Black, F. and Cox, J. C. (1976). Valuing corporate securities: Some effects of bond indenture provisions. Journal of Finance, 31, 351–367.
[12] Bluhm, C. and Overbeck, L. (2007). Structured Credit Portfolio Analysis, Baskets and CDOs. Boca Raton: Chapman and Hall.
[13] Breeden, J. L. (2007). Modeling data with multiple time dimensions. Computational Statistics & Data Analysis, 51, 4761–4785.
[14] Breslow, N. (1972). Discussion of "Regression models and life-tables" by D. R. Cox. Journal of the Royal Statistical Society: Ser. B, 34, 216–217.
[15] Chen, L., Lesmond, D. A. and Wei, J. (2007). Corporate yield spreads and bond liquidity. Journal of Finance, 62, 119–149.
[16] Chhikara, R. S. and Folks, J. L. (1989). The Inverse Gaussian Distribution: Theory, Methodology, and Applications. Dekker, New York.
[17] Cox, D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society: Ser. B, 34, 187–220.
[18] Cox, D. R. (1975). Partial likelihood. Biometrika, 62, 269–276.
[19] Cox, J. C., Ingersoll, J. E. and Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica, 53, 385–407.
[20] Cressie, N. (1993). Statistics for Spatial Data. New York: Wiley.
[21] Das, S., Duffie, D., Kapadia, N. and Saita, L. (2007). Common failings: how corporate defaults are correlated. Journal of Finance, 62, 93–117.
[22] Deng, Y., Quigley, J. M. and Van Order, R. (2000). Mortgage terminations, heterogeneity and the exercise of mortgage options. Econometrica, 68, 275–307.
[23] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3), 425–455.
[24] Duchesne, T. and Lawless, J. (2000). Alternative time scales and failure time models. Lifetime Data Analysis, 6, 157–179.
[25] Duffie, D., Pan, J. and Singleton, K. (2000). Transform analysis and option pricing for affine jump-diffusions. Econometrica, 68, 1343–1376.
[26] Duffie, D., Saita, L. and Wang, K. (2007). Multi-period corporate default prediction with stochastic covariates. Journal of Financial Economics, 83, 635–665.
[27] Duffie, D., Eckner, A., Horel, G. and Saita, L. (2008). Frailty correlated default. Journal of Finance, to appear.
[28] Duffie, D. and Singleton, K. (2003). Credit Risk: Pricing, Measurement, and Management. Princeton University Press.
[29] Efron, B. (2002). The two-way proportional hazards model. Journal of the Royal Statistical Society: Ser. B, 64, 899–909.
[30] Esper, J., Cook, E. R. and Schweingruber, F. H. (2002). Low-frequency signals in long tree-ring chronologies for reconstructing past temperature variability. Science, 295, 2250–2253.
J. 21. [80] Friedman. (2008). (2008). K. (2008). Statist. A. 73–75. 30. Hoboken NJ. A snapshot of mortgage conditions with an emphasis on subprime mortgage performance. T. [32] Fan. H. S. 632–641. and Wahba G. Analysis of longitudinal data with semiparametric estimation of covariance function. K. (2003). 29. L. Chapman & Hall. (2006). 2008 (2). T. S. A. J. Making sense of the subprime crisis. [40] Fu. Technometrics. J. G. A smoothing cohort model in ageperiodcohort analysis with applications to homicide arrest rates and lung cancer mortality rates. Q. Baca Raton: Chapman and Hall. [42] Giesecke. Federal Reserve. 28. (2000). [45] Gramacy.. W. Ridge estimator in singular design with applications to ageperiodcohort analysis of disease rates. New York. 1–141. and Sudjianto. Boca Raton. and Gijbels. P. (2008). H. and Willen. [39] Fu. R. F.. Baca Raton: Chapman and Hall.[31] Fan. Nonlinear Time Series: Nonparametric and Parametric Methods. J. Appl. Huang. (2008). Mixed Eﬀects and Nonparametric Regression Models. and Fabozzi. T. Lehnert. and Yao. U.. 103. August 2008. K. Journal of the American Statistical Association. J. J. [43] Golub. H. D. W. [41] Gerardi. and Lee. J. R. V. Heath.. 1119–1130. (1979). and Cox. Annals of Statistics. Lucas. Lehnert. 19. Li. 263–278. (2006). [34] Fang. T. Default and information. R. Li.. A. and Prescott. D. 36. 215–223. 2281–2303. Journal of Economic Dynamics and Control.. Subprime Mortgage Credit Derivatives.. (1996). S. [44] Goodman. (2007). (1979). Bayesian treed Gaussian process models with an application to computer modeling. Wiley. Sociological Methods and Research. M.. J. S.S. Design and Modeling for Compute Experiments. I. 69–159. 327–361. 140 . J. K. (1991). Journal of the American Statistical Association. and Li. [33] Fan. [36] Farewell. B. Multivariate adaptive regression splines (with discussion). Local Polynomial Modelling and Its Applications. [35] Faraway.. Extending the Linear Model with R: Generalized Linear. H. Springer. 
Brookings Papers on Economic Activity. 102. Sherlund. (2006). Generalized crossvalidation as a method for choosing a good ridge parameter.. Communications in Statistics – Theory and Method. A. Zimmerman. J. N. M. A note on multiple time scales in life testing. R. [37] Frame.
[46] Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. London: Chapman and Hall.
[47] Gu, C. (2002). Smoothing Spline ANOVA Models. Springer, New York.
[48] Gu, M. G. and Lai, T. L. (1991). Weak convergence of time-sequential censored rank statistics with applications to sequential testing in clinical trials. Annals of Statistics, 19, 1403–1433.
[49] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. New York: Chapman and Hall.
[50] Hougaard, P. (2000). Analysis of Multivariate Survival Data. Springer, New York.
[51] Jarrow, R. A. and Turnbull, S. M. (1995). Pricing derivatives on financial securities subject to credit risk. Journal of Finance, 50, 53–85.
[52] Keiding, N. (1990). Statistical inference in the Lexis diagram. Phil. Trans. R. Soc. Lond. A, 332, 487–509.
[53] Kimeldorf, G. and Wahba, G. (1970). A correspondence between Bayesian estimation of stochastic processes and smoothing by splines. Ann. Math. Statist., 41, 495–502.
[54] Kimeldorf, G. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. J. Math. Anal. Appl., 33, 82–95.
[55] Kupper, L. L., Janis, J. M., Karmous, A. and Greenberg, B. G. (1985). Statistical age-period-cohort analysis: a review and critique. Journal of Chronic Disease, 38, 811–830.
[56] Lando, D. (1998). On Cox processes and credit-risky securities. Review of Derivatives Research, 2, 99–120.
[57] Lando, D. (2004). Credit Risk Modeling: Theory and Applications. Princeton University Press.
[58] Lawless, J. F. (2003). Statistical Models and Methods for Lifetime Data (2nd ed.). Wiley, Hoboken NJ.
[59] Lee, M.-L. T. and Whitmore, G. A. (2006). Threshold regression for survival analysis: modeling event times by a stochastic process reaching a boundary. Statistical Science, 21, 501–513.
[60] Lexis, W. (1875). Einleitung in die Theorie der Bevölkerungsstatistik. Strassburg: Trübner. Pages 5–7 translated to English by N. Keyfitz and printed in Mathematical Demography (Smith, D. and Keyfitz, N., eds.), 1977.
142 . H. 113–125. T. Mason.. A. D. [71] Nelsen. Life Distributions: Structural of Nonparametric. (2009). Bayesian conﬁdence intervals for a smoothing spline. 282–01. Some methodological issues in cohort analysis of archival data. S. Pence. G. Springer. Journal of The American Statistical Association. C. (2007) Numerical Recipes: The Art of Scientiﬁc Computing (3rd. K. R. P. N. Semiparametric. (1998). (1995). K. Variable bandwidth kernel estimators u u of regression curves. W. [69] Moody’s Special Comment: Corporate Default and Recovery Rates. [62] Madan. London: Chapman and Hall. 242–258. K. H. Multiple Time Scales in Survival Analysis. E. (1989). and Holmes. Winsborough. 15. Statist. (1992).. Biometrika. 19. Review of Derivatives Research. M. Speckman. (1988). Annals of Statistics. and Parametric Families. 1134–1143. J. P. [67] McCulloch.). W. W. 2. C. An introduction to copulas (2nd ed.[61] Luo. Journal of Economic Perspectives. (2006). W.. D. [72] Nielsen. B. D. [66] Mayer. H. A. C. O. [70] M¨ller. ed. Gill.. T. H. Teukolsky. W. Pricing the risks of default. (1974). 49. [68] Merton. Springer. [64] Mason. and Stadtm¨ller. A counting process approach to maximum likelihood estimation in frailty models Scan.. and Olkin. Cohort analysis. I. [65] Marshall. and Unal.. P. 23. H. Generalized Linear Models (2nd ed. In: International Encyclopedia of the Social and Behavioral Sciences. G. (1973). and Flannery.. I. [63] Mason. 83.. R. On the pricing of corporate debt: The risk structure of interest rates. H. U. [73] Nychka. and Poole. Lifetime Data Analysis. C.).. 19202008. A. Data release: February 2009. K. 107–116. Journal of Finance. B. (2007). (2006). S. W. and Sherlund. J. Andersen. 25–43. Journal of the American Statistical Association. 2189–2194. Cambridge University Press. [74] Oakes. 38. A. (1997). R. D. The rise in mortgage defaults. Vetterling. 93. [75] Pintore. and Wahba. New York: Elsevier. and Wolﬁnger. [76] Press. American Sociological Review. Z. 1213–1252. 1. 
New York. 27–50. 121–160. G. (1987). G. 7–18. H. A. (2002). and Sørensen. 92. Hybrid adaptive splines. C. and Nelder. Spatially adaptive smoothing splines. H.).
[77] Ramlau-Hansen, H. (1983). Smoothing counting process intensities by means of kernel functions. Annals of Statistics, 11, 453–466.
[78] Ramsay, J. O. and Li, X. (1998). Curve registration. Journal of the Royal Statistical Society: Ser. B, 60, 351–363.
[79] Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. The MIT Press.
[80] Rice, J. and Rosenblatt, M. (1983). Smoothing splines: regression, derivatives and deconvolution. Annals of Statistics, 11, 141–156.
[81] Ruppert, D. and Carroll, R. J. (2000). Spatially-adaptive penalties for spline fitting. Aust. N. Z. J. Statist., 42(2), 205–223.
[82] Schrödinger, E. (1915). Zür Theorie der Fall- und Steigversuche an Teilchen mit Brownscher Bewegung. Physikalische Zeitschrift, 16, 289–295.
[83] Selke, T. and Siegmund, D. (1983). Sequential analysis of the proportional hazards model. Biometrika, 70, 315–326.
[84] Shepp, L. A. (1966). Radon-Nikodym derivatives of Gaussian measures. Annals of Mathematical Statistics, 37, 321–354.
[85] Sherlund, S. M. (2008). The past, present, and future of subprime mortgages. Finance and Economics Discussion Series 2008-63, Federal Reserve Board.
[86] Silverman, B. W. (1985). Some aspects of the spline smoothing approach to nonparametric curve fitting (with discussion). Journal of the Royal Statistical Society: Ser. B, 47, 1–52.
[87] Slud, E. (1984). Sequential linear rank tests for two-sample censored survival data. Annals of Statistics, 12, 551–571.
[88] Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. New York: Springer-Verlag.
[89] Sudjianto, A., Li and Zhang (2006). GAMEED documentation. Technical report, Enterprise Quantitative Risk Management, Bank of America.
[90] Therneau, T. M. and Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York.
[91] Tweedie, M. C. K. (1957). Statistical properties of inverse Gaussian distributions, I and II. Annals of Mathematical Statistics, 28, 362–377 and 696–705.
[92] Vaupel, J. W., Manton, K. G. and Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16, 439–454.
[93] Vasicek, O. (1977). An equilibrium characterization of the term structure. Journal of Financial Economics, 5, 177–188.
[94] Vidakovic, B. (1999). Statistical Modeling by Wavelets. Wiley, New York.
[95] Wahba, G. (1983). Bayesian confidence intervals for the cross-validated smoothing spline. Journal of the Royal Statistical Society: Ser. B, 45, 133–150.
[96] Wahba, G. (1990). Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia: SIAM.
[97] Wahba, G. (1995). Discussion of "Wavelet shrinkage: asymptopia" by Donoho et al. Journal of the Royal Statistical Society: Ser. B, 57, 360–361.
[98] Wallace, N. (ed.) (2005). Special issue of "Innovations in mortgage modeling". Real Estate Economics, 33, 587–782.
[99] Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. London: Chapman & Hall.
[100] Wang, J. L. (2005). Smoothing hazard rate. In: Armitage, P. and Colton, T. (eds.), Encyclopedia of Biostatistics (2nd ed.), Volume 7, pp. 4986–4997. Wiley.
[101] Wikipedia, The Free Encyclopedia. URL: http://en.wikipedia.org