
Copulas

Lucas Perin†

Abstract

In this work I explore copulas by building a model of four funds that track market indexes
for stocks, bonds, the dollar and commodities. I then use the model to generate simulated returns
and evaluate a model portfolio, using both the real returns and the simulated returns
to calculate the value at risk (VaR) and the expected shortfall (ES).

I. Introduction and Overview

Copulas model the dependence between variables in a multivariate distribution. They allow
multivariate dependence to be combined with univariate marginals, so we can use
any of the many univariate models for each of the variables that make up our multivariate data.
Copulas gained popularity in the 2000s. One application of copulas, proposed by Li
(2000), was even blamed for the financial crisis that started in 2008, according to Salmon (2009).
We are going to use copulas to model the behavior of four ETFs: IVV, which tracks
the S&P 500; TLT, which tracks long-term Treasury bonds; UUP, which tracks the US Dollar
foreign exchange index; and DBC, which tracks commodities.

II. Theoretical Background

A copula is a multivariate CDF whose marginal distributions are all Uniform(0,1). Suppose
that $Y = (Y_1, \ldots, Y_d)$ has d dimensions, a multivariate CDF $F_Y$ and continuous marginal CDFs $F_{Y_1}, \ldots, F_{Y_d}$. It is
easy to show (the proof is in Ruppert (2010), section A.9.2) that each of $F_{Y_1}(Y_1), \ldots, F_{Y_d}(Y_d)$
is Uniform(0,1). Therefore, the CDF of $\{F_{Y_1}(Y_1), \ldots, F_{Y_d}(Y_d)\}$ is, by definition, a copula. Using
a theorem from Sklar (1973), we can then decompose the distribution of $Y$ into a copula $C_Y$,
which contains the information about the interdependencies between the components of $Y$, and the
univariate marginal CDFs $F_{Y_1}, \ldots, F_{Y_d}$, which contain all the information about each of the univariate
marginal distributions. For d dimensions, we have:

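$$
\begin{aligned}
C_Y(u_1, \ldots, u_d) &= P\{F_{Y_1}(Y_1) \le u_1, \ldots, F_{Y_d}(Y_d) \le u_d\} \\
&= P\{Y_1 \le F_{Y_1}^{-1}(u_1), \ldots, Y_d \le F_{Y_d}^{-1}(u_d)\} \\
&= F_Y\{F_{Y_1}^{-1}(u_1), \ldots, F_{Y_d}^{-1}(u_d)\}
\end{aligned}
\tag{1}
$$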

And making each $u_i = F_{Y_i}(y_i)$, we have:

$$
F_Y(y_1, \ldots, y_d) = C_Y\{F_{Y_1}(y_1), \ldots, F_{Y_d}(y_d)\} \tag{2}
$$
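As a quick numerical illustration of the probability integral transform used above, the following R sketch draws from a t-distribution, applies its own CDF and checks that the result looks Uniform(0,1). The sample size, seed and degrees of freedom are arbitrary choices made for this illustration; they are not values from this paper.

```r
# Probability integral transform: if Y has a continuous CDF F, then F(Y) ~ Uniform(0,1).
set.seed(42)                 # arbitrary seed, for reproducibility only
nu <- 5                      # illustrative degrees of freedom (assumption)
y  <- rt(5000, df = nu)      # sample from a t-distribution
u  <- pt(y, df = nu)         # apply the true CDF to the sample

# u should now be spread evenly over (0,1)
u.mean <- mean(u)            # close to 1/2 for a uniform sample
u.var  <- var(u)             # close to 1/12 for a uniform sample
decile.counts <- table(cut(u, breaks = seq(0, 1, by = 0.1)))

print(round(c(mean = u.mean, var = u.var), 3))
print(decile.counts)         # roughly equal counts in each decile
```

The same transform, applied with the fitted marginal CDFs instead of the true one, is what turns the ETF returns into the uniform variables used for the copula fit later on.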



† University of Washington Department of Finance. Prepared for AMATH 500 Winter 2011. Using a self-constructed
LaTeX template to resemble the Journal of Finance. So far, this is the closest I got to publishing there.

Figure 1: ETF prices. Plot title "Prices of select ETFs for the last 500 days"; one panel each for IVV, TLT, UUP and DBC, over 2010 to 2011.

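If we differentiate equation (2), we find that the density of Y is:

$$
f_Y(y_1, \ldots, y_d) = c_Y\{F_{Y_1}(y_1), \ldots, F_{Y_d}(y_d)\} \prod_{i=1}^{d} f_{Y_i}(y_i) \tag{3}
$$

where $c_Y$ is the density of the copula $C_Y$ and $f_{Y_i}$ is the marginal density of $Y_i$.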

The result in equation (3) allows us to create multivariate models that take into consideration
both the interdependence of the variables (the copula factor) and each variable's own distribution
(the product of the marginal densities). We can use parametric versions of both the copula and the
marginals to create a model that can be used to run tests and perform forecasts.
In the next few sections, we are going to use R (The R Language for Statistical Computing) to
fit a Gaussian copula and a t-copula to the log returns of the ETFs described in the introduction.
With the copula and the marginals, we will then use the model to determine the Value at Risk
(VaR) and the expected shortfall (ES) of an investment.
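To make the factorization in equation (3) concrete, here is a small R sketch for d = 2. It uses a Gaussian copula with standard normal marginals, an example chosen only because the joint density is then the familiar bivariate normal density, so both sides of the equation can be computed independently. The function names and the values of rho, y1 and y2 are illustrative assumptions.

```r
# Equation (3) for d = 2: joint density = copula density x product of marginal densities.

# Density of the bivariate Gaussian copula with correlation rho
gauss.copula.density <- function(u1, u2, rho) {
  x1 <- qnorm(u1)
  x2 <- qnorm(u2)
  exp(-(rho^2 * (x1^2 + x2^2) - 2 * rho * x1 * x2) / (2 * (1 - rho^2))) /
    sqrt(1 - rho^2)
}

# Bivariate standard normal density with correlation rho, written out directly
bvn.density <- function(y1, y2, rho) {
  exp(-(y1^2 - 2 * rho * y1 * y2 + y2^2) / (2 * (1 - rho^2))) /
    (2 * pi * sqrt(1 - rho^2))
}

rho <- 0.6     # illustrative values
y1  <- 0.3
y2  <- -1.1

# Right-hand side of equation (3): copula density times the two marginal densities
via.copula <- gauss.copula.density(pnorm(y1), pnorm(y2), rho) * dnorm(y1) * dnorm(y2)
# Left-hand side, computed directly
direct <- bvn.density(y1, y2, rho)

print(c(via.copula = via.copula, direct = direct))
```

The two numbers agree up to floating-point error. Swapping dnorm and pnorm for another marginal family changes the joint distribution while leaving the dependence structure untouched, which is the flexibility the paper relies on.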

III. Algorithm Implementation and Development

As usual, we start by reading the data file. Figure 1 shows a graph of the prices: note the relationship
between IVV and DBC (stocks and commodities) and the relationship between TLT and UUP
(treasuries and the dollar).

Figure 2: ETF returns. Plot title "Returns of ETFs for the last 500 days"; one panel each for IVV, TLT, UUP and DBC, over 2010 to 2011.

# Read the ETFs into a zoo object
oetf <- read.zoo("ETF.csv", header = TRUE, sep = ",")

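# Get the last 501 days (which give 500 returns)
# n.obs is used rather than T, so that the T shorthand for TRUE is not masked
n.obs <- length(oetf[,1])
etf <- oetf[(n.obs-500):n.obs,]

# Plot the prices
if (plotting)
{
    pdf("prices.pdf")
    plot(etf, main="Prices of select ETFs for the last 500 days", col=4)
    dev.off()
}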

Next, we compute the log returns, in percent. Figure 2 shows a plot of the returns.

# Calculate the log returns (in percent)
letf <- lag(etf,-1)
retf <- merge( (log(etf) - log(letf) ) * 100 )
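To make the lag-and-subtract step concrete, here is a minimal base-R sketch on a made-up price vector (the prices are hypothetical, not ETF data). On a plain numeric vector, diff(log(.)) gives the same result as subtracting the previous log price from the current one.

```r
# Percent log returns: r_t = 100 * (log(P_t) - log(P_{t-1}))
prices <- c(100, 102, 101, 105)            # hypothetical prices

# Explicit lag-and-subtract version
n <- length(prices)
ret.manual <- (log(prices[2:n]) - log(prices[1:(n - 1)])) * 100

# Equivalent one-liner
ret.diff <- diff(log(prices)) * 100

print(round(ret.diff, 4))

# Log returns add up over time: their sum is the log of the total price change
total <- 100 * log(prices[n] / prices[1])
print(all.equal(sum(ret.diff), total))
```

Note that n prices yield n - 1 returns, which is why 501 days of prices are needed for 500 returns.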

Figure 3: Correlation between variables. Pairs plot titled "Pairwise relation: returns", with panels for IVV, TLT, UUP and DBC.

We then draw a pairs plot to check whether the returns are correlated and, as expected,
there is a very high correlation between IVV and DBC, for example. Figure 3 shows the pairs
plot.
We then obtain parameters for the margins, fitting a t-distribution to each variable using
fitdistr. Results are in Table I.

# Fit the distributions
params <- apply (retf, 2, fitdistr, "t")

## Get the resulting matrix, like a boss
# For each member of the list (each member is an ETF),
# apply the AIC function to the first item (value)
# of the fourth item (loglik) of the params list
aic <- AIC(sapply (sapply(params,"[", 4), "[", 1), 3)

# Now get members 1, 2 and 3 (m, s, and df)
# of the first item (estimate) of the params list
p <- sapply (sapply(params,"[", 1), "[", c(1,2,3))

a <- t(rbind(aic,p))
rownames(a) <- names(params)

Figure 4: Fitted t-distributions. One panel each for IVV, TLT, UUP and DBC, comparing the fitted density ("Fit") with a kernel density estimate ("KDE"); x-axis: Returns, y-axis: Density.
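For readers who want to see what a maximum likelihood fit of a location-scale t-distribution involves, the sketch below does one with base R's optim on simulated data. It is an illustrative stand-in and not a description of how fitdistr is implemented; the true parameter values, sample size, seed and starting values are all assumptions chosen for the example.

```r
# Simulate data from a location-scale t-distribution with known parameters
set.seed(123)
m.true  <- 0.1
s.true  <- 0.8
df.true <- 5
x <- m.true + s.true * rt(5000, df = df.true)

# Negative log-likelihood of the location-scale t density dt((x - m) / s, df) / s.
# s and df are optimized on the log scale so that they stay positive.
negloglik <- function(theta, x) {
  m  <- theta[1]
  s  <- exp(theta[2])
  df <- exp(theta[3])
  -sum(dt((x - m) / s, df, log = TRUE) - log(s))
}

start <- c(median(x), log(mad(x)), log(10))   # rough starting values
opt <- optim(start, negloglik, x = x, method = "BFGS")

est    <- c(m = opt$par[1], s = exp(opt$par[2]), df = exp(opt$par[3]))
loglik <- -opt$value
aic    <- -2 * loglik + 2 * length(est)       # AIC = -2 L + 2 p, as in Appendix I

print(round(est, 3))
print(round(aic, 2))
```

The estimates should land near the values used to simulate the data, and the three-parameter AIC is computed the same way as the AIC column of Table I.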

Figure 4 compares the fitted distributions with kernel density estimates of the actual returns.
Now that we have the marginal distributions, we need to find the copula of our model. We
start by applying the probability integral transform to obtain each of the $F_{Y_1}(Y_1), \ldots, F_{Y_d}(Y_d)$,
which we know are Uniform(0,1). This is done by the code below:

# Now we need the uniform distributions
# (pct is the location-scale t CDF defined in Appendix I)
uIVV <- pct(retf$IVV, a[1,2], a[1,3], a[1,4])
uTLT <- pct(retf$TLT, a[2,2], a[2,3], a[2,4])
uUUP <- pct(retf$UUP, a[3,2], a[3,3], a[3,4])
uDBC <- pct(retf$DBC, a[4,2], a[4,3], a[4,4])
uret <- cbind(uIVV, uTLT, uUUP, uDBC)

Figure 5 shows the correlation between the uniformized distributions.


With the uniformized distributions we can see which type of parametric copula fits best. We
will fit a Gaussian copula and a t-copula, record their AIC and see which one provided the best
fit.
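As a minimal illustration of fitting a copula by maximum likelihood and recording its AIC, the base-R sketch below handles the simplest case: a bivariate Gaussian copula with a single correlation parameter, fitted to simulated uniform data. It does not use the copula package; the true correlation, sample size and seed are assumptions made for the example.

```r
# Simulate dependent uniforms from a bivariate Gaussian copula
set.seed(7)
rho.true <- 0.5                                  # illustrative value
n  <- 2000
z1 <- rnorm(n)
z2 <- rho.true * z1 + sqrt(1 - rho.true^2) * rnorm(n)
u  <- cbind(pnorm(z1), pnorm(z2))                # Uniform(0,1) margins

# Log-likelihood of the bivariate Gaussian copula at correlation rho
gauss.copula.loglik <- function(rho, u) {
  x1 <- qnorm(u[, 1])
  x2 <- qnorm(u[, 2])
  sum(-0.5 * log(1 - rho^2) -
      (rho^2 * (x1^2 + x2^2) - 2 * rho * x1 * x2) / (2 * (1 - rho^2)))
}

# Maximize the log-likelihood over rho
fit <- optimize(function(rho) gauss.copula.loglik(rho, u),
                interval = c(-0.99, 0.99), maximum = TRUE)
rho.hat <- fit$maximum
loglik  <- fit$objective
aic     <- -2 * loglik + 2 * 1                   # one free parameter

print(round(c(rho.hat = rho.hat, loglik = loglik, aic = aic), 3))
```

Because a copula density can exceed 1, the maximized log-likelihood is often positive, which is why copula AICs such as those in Table II can be negative.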

Figure 5: Correlation between uniformized distributions. Pairs plot titled "Pairwise relation: uniform-transformed returns", with panels for IVV, TLT, UUP and DBC, all axes running from 0 to 1.

#############################################################
# Fit a gaussian copula
param.cor <- cor(retf)[upper.tri(cor(retf))]
param.cor.start <- rep(mean(param.cor), 6)

ncop <- normalCopula(param=param.cor, dim = 4, dispstr="un")

fit.gaussian <- fitCopula (ncop, uret, method="ml",
    start= param.cor.start, optim.control=list(maxit=2000))

# Record the AIC of the fit
fit.gaussian.aic = AIC(fit.gaussian@loglik,
    length(fit.gaussian@estimate))

#############################################################
# Now the t-copula
param.df <- mean(a[,4])
tcop <- tCopula(param=param.cor, dim = 4, dispstr="un")
fit.t <- fitCopula (tcop, uret, method="ml",
    start=c(fit.gaussian@estimate[1:6],param.df), optim.control=list(maxit=2000))

# Record the AIC of the fit
fit.t.aic = AIC(fit.t@loglik,
    length(fit.t@estimate))

Figure 6: Comparison between simulation and fit. One panel each for IVV, TLT, UUP and DBC, comparing the simulated density ("Sim") with the fitted density ("Fit"); x-axis: Returns, y-axis: Density.

Comparing both fits (Table II), we find that the t-copula fits best, so we are going to
build a model based on our t-copula's parameters. We then use the model to generate 10,000
simulated observations. Figure 6 compares the simulated marginal densities with the fitted
ones: the simulation is very close to the fitted model.
# The "ct" margins rely on the dct, pct and qct functions defined in Appendix I
cop.best <- tCopula(param=fit.t@estimate[1:6], df=fit.t@estimate[7],
    dim=4, dispstr="un")
cop.dist <- mvdc(copula=cop.best, margins = c("ct","ct","ct","ct"),
    paramMargins = list(
        list(m = a[1,2], s = a[1,3], df = a[1,4]),
        list(m = a[2,2], s = a[2,3], df = a[2,4]),
        list(m = a[3,2], s = a[3,3], df = a[3,4]),
        list(m = a[4,2], s = a[4,3], df = a[4,4])))

set.seed(1)
sim <- rmvdc(cop.dist, 10000)
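The idea behind simulating from a t-copula with t marginals can be sketched in base R for two variables. This is the standard textbook construction, shown here as an illustration and not as a description of how rmvdc works internally. The correlation matrix, degrees of freedom and marginal parameters below are made-up values, not the fitted ones.

```r
# Sampling from a bivariate t-copula with location-scale t marginals, base R only
set.seed(1)
n  <- 10000
nu <- 6                                   # copula degrees of freedom (assumption)
R  <- matrix(c(1.0, 0.6,
               0.6, 1.0), nrow = 2)       # copula correlation matrix (assumption)

# 1. Correlated standard normals via the Cholesky factor of R
z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(R)
# 2. Divide each row by sqrt(chi-square / nu) to obtain multivariate t draws
w <- sqrt(rchisq(n, df = nu) / nu)
x <- z / w
# 3. The t CDF maps each column to Uniform(0,1): these are draws from the t-copula
u <- pt(x, df = nu)
# 4. Apply each marginal's quantile function (qct as in Appendix I)
qct <- function(p, m, s, df) m + s * qt(p, df)
sim.toy <- cbind(qct(u[, 1], m = 0.1, s = 0.8, df = 4),
                 qct(u[, 2], m = 0.0, s = 1.2, df = 15))

print(round(cor(u), 2))                   # dependence carried by the copula
print(round(apply(sim.toy, 2, median), 2))  # medians close to the marginal m values
```

Steps 1 to 3 produce the dependence structure, and step 4 attaches the marginals, mirroring the split between copula and marginals in equation (3).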
Now that we have the simulated observations, we are going to calculate the Value at Risk
(VaR) and the Expected Shortfall (ES), using a parametric method. We are going to assume a
portfolio (chosen arbitrarily) that invests 30% in IVV, 15% in TLT, 35% in UUP and 20% in
DBC. To calculate the portfolio returns $R_p$ for the weight vector $w$, we simply use matrix algebra to multiply
our simulated returns $R_s$ by the weights, as in $R_p = R_s \times w$. We then fit a t-distribution to
$R_p$ and use it to estimate VaR and ES.
For a t-distribution, the formulas for VaR and ES are:

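$$
\widehat{VaR}_t(\alpha) = -S \times \left\{ \mu + \lambda F_\nu^{-1}(\alpha) \right\} \tag{4}
$$

$$
\widehat{ES}_t(\alpha) = S \times \left\{ -\mu + \lambda \, \frac{f_\nu\left(F_\nu^{-1}(\alpha)\right)}{\alpha} \, \frac{\nu + \left(F_\nu^{-1}(\alpha)\right)^2}{\nu - 1} \right\} \tag{5}
$$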

Where:
• $S$: position size
• $F_\nu^{-1}$: inverse CDF (quantile function) of the standard t-distribution with $\nu$ degrees of freedom
• $f_\nu$: density function of the same distribution
• $\mu$: mean
• $\lambda$: scale parameter
• $\nu$: degrees of freedom
• $\alpha$: tail probability (one minus the confidence level)
The R implementation of these formulas is below. Note that in the code,
VaR and ES are rounded to the nearest thousand. The results are in Table III.

# Calculate VaR and ES for the simulated values
alpha <- 0.05
S <- 1e6
w <- c(0.30,0.15,0.35,0.20)

# The simulated returns are in percent; convert to decimals (as in Appendix I)
sim <- sim / 100
ret.sim <- as.numeric(sim %*% w)

# Fit a t-distribution to the simulated portfolio returns
fitt <- fitdistr(ret.sim, "t")
param <- fitt$estimate
m <- param[1]
lambda <- param[2]
df <- param[3]

# Equation (4)
var.sim <- -S *( m + lambda * qt(alpha, df=df) )
var.sim.ans <- round (var.sim / 1000) * 1000

# Equation (5), built from its two ratio terms
es1 <- dt (qt(alpha,df=df),df=df)/(alpha)
es2 <- (df + qt(alpha,df=df)^2) / (df - 1)
es3 <- -m+lambda*es1*es2
ES.sim = S*es3
ES.sim.ans <- round (ES.sim / 1000) * 1000
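One way to sanity-check equations (4) and (5) is to wrap them in a function and confirm that, for a very large number of degrees of freedom, they reproduce the corresponding normal-distribution formulas. The sketch below does this; the function name var.es.t and all input values are illustrative assumptions, not the fitted values from this paper.

```r
# Parametric VaR and ES for a location-scale t-distribution (equations (4) and (5))
var.es.t <- function(S, mu, lambda, nu, alpha) {
  q   <- qt(alpha, df = nu)                       # F_nu^{-1}(alpha)
  VaR <- -S * (mu + lambda * q)
  ES  <- S * (-mu + lambda * (dt(q, df = nu) / alpha) * ((nu + q^2) / (nu - 1)))
  c(VaR = VaR, ES = ES)
}

# Illustrative inputs (assumptions)
S      <- 1e6
mu     <- 0.0005
lambda <- 0.01
alpha  <- 0.05

heavy <- var.es.t(S, mu, lambda, nu = 4, alpha)

# With a very large nu the t-distribution is essentially normal, so the result
# should match the normal-distribution formulas for VaR and ES
almost.normal <- var.es.t(S, mu, lambda, nu = 1e6, alpha)
normal <- c(VaR = -S * (mu + lambda * qnorm(alpha)),
            ES  =  S * (-mu + lambda * dnorm(qnorm(alpha)) / alpha))

print(round(rbind(heavy, almost.normal, normal)))
```

For the same scale parameter, the heavy-tailed case (nu = 4) gives a larger VaR and a noticeably larger ES than the near-normal case, which is the reason for fitting a t-distribution to the portfolio returns.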

Our final task is to calculate the non-parametric ES and VaR, given by the formulas:

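$$
\widehat{VaR}_{np}(\alpha) = -S \times \hat{q}(\alpha) \tag{6}
$$

$$
\widehat{ES}_{np}(\alpha) = -S \times \frac{\sum_{i=1}^{n} R_i \, I(R_i < \hat{q}(\alpha))}{\sum_{i=1}^{n} I(R_i < \hat{q}(\alpha))} \tag{7}
$$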

Where:
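• $S$: position size
• $\hat{q}(\alpha)$: $\alpha$-quantile of the sample returns
• $R_i$: ith sample return
• $n$: number of sample returns
• $I(\cdot)$: indicator function, equal to 1 when its argument is true and 0 otherwise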
The R implementation is below:

# Calculate VaR and ES for the real values
ret.t <- as.numeric(retf %*% w) / 100
q <- quantile(ret.t, alpha)
var.t <- -S*q

# Indicator of the returns below the quantile
ievar <- (ret.t < q)
ES.t <- -S * sum(ret.t * ievar) / sum (ievar)

The results are shown in Table III.
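Equations (6) and (7) can be checked by hand on a toy example. The sketch below wraps them in a function (var.es.np is a hypothetical name) and applies it to ten made-up returns with a position size of 1,000 and alpha = 0.2; none of these values come from the ETF data.

```r
# Non-parametric VaR and ES (equations (6) and (7))
var.es.np <- function(returns, S, alpha) {
  q       <- quantile(returns, alpha, names = FALSE)  # sample alpha-quantile
  in.tail <- returns < q                              # indicator I(R_i < q)
  c(VaR = -S * q,
    ES  = -S * sum(returns * in.tail) / sum(in.tail))
}

# Hypothetical returns, already sorted for readability
returns <- c(-0.05, -0.03, -0.01, 0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06)
res <- var.es.np(returns, S = 1000, alpha = 0.2)
print(res)
```

With R's default quantile interpolation, the 20% quantile of these returns is -0.014, so the VaR is 14. The two returns below that quantile are -0.05 and -0.03, whose average is -0.04, so the ES is 40.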

IV. Computational Results

Table I shows the results of the estimated parameters and AICs for the marginal t-distributions
of the ETFs:

ETF    AIC        µ        λ       ν
IVV    1570.59    0.16    0.78    2.70
TLT    1452.05   -0.01    0.99   25.79
UUP     947.10   -0.04    0.56   10.09
DBC    1733.16    0.10    1.28   17.57

Table I: Marginal Distributions

The AIC for both copula fits is in Table II.

Copula             AIC
Gaussian copula    -559.34
t-copula           -520.38

Table II: Copula AICs

The VaR and ES are in Table III.

Method            VaR        ES
Parametric        7113.14    11020.71
Non-parametric    7068.17    10056.79

Table III: VaR and ES

V. Summary and Conclusions

This work showed how to estimate marginals and copulas, and how to apply copulas to create
a model that will take into consideration codependence among variables. It also showed how to
calculate the Value at Risk (VaR) and the Expected Shortfall (ES).

Appendix I: R Code

#######################################################
# Lucas Perin
# 2011-03-04
# The Joy of Copulas and wrong scorelator answers
#######################################################

library(zoo)
library(MASS)
library(copula)
library(mnormt)

# Defense
rm(list=ls()) # Clear all variables
plotting = FALSE # set to false for scorelator

# SDAFE t-dist functions
dct <- function(x, m, s, df) dt((x-m)/s, df)/s
qct <- function(p,m,s,df) m + s * qt(p,df)
pct <- function(q,m,s,df) pt((q-m)/s,df)

AIC <- function (L, p) -2*L + 2*p

# Read the ETFs into a zoo object
oetf <- read.zoo("ETF.csv", header = TRUE, sep = ",")

# Get the last 501 days
# n.obs is used rather than T, so that the T shorthand for TRUE is not masked
n.obs <- length(oetf[,1])
etf <- oetf[(n.obs-500):n.obs,]

# Plot the prices
if (plotting)
{
    pdf("prices.pdf")
    plot(etf, main="Prices of select ETFs for the last 500 days", col=4)
    dev.off()
}

# Calculate the log returns
letf <- lag(etf,-1)
retf <- merge( (log(etf) - log(letf) ) * 100 )

# Plot the returns
if (plotting)
{
    pdf("returns.pdf")
    plot(retf, main="Returns of ETFs for the last 500 days", col=4)
    dev.off()
}

# Take a peek to see whether we have a problem
if (plotting)
{
    pdf("pairsRet.pdf")
    pairs(retf,pch=18,cex=0.75,col=rgb(0,0,100,50,maxColorValue=255),
        main="Pairwise relation: returns")
    dev.off()
}

# fit the distributions
params <- apply (retf, 2, fitdistr, "t")

## Get the resulting matrix, like a boss
# For each member of the list (each member is an ETF)
# Apply the AIC function to the first item (value)
# of the fourth item (loglik) of the params list
aic <- AIC(sapply (sapply(params,"[", 4), "[", 1), 3)

# Now get members 1, 2 and 3 (m, s, and df)
# of the first item (estimate) of the params list
p <- sapply (sapply(params,"[", 1), "[", c(1,2,3))
a <- t(rbind(aic,p))
rownames(a) <- names(params)

# Try to make sure I'm fitting things properly
# (add=TRUE is spelled out so the overlay does not depend on the value of T)
if (plotting)
{
    pdf("fits.pdf")
    par(mfrow = c(2,2))
    plot(density(retf$IVV), main="IVV", col=4, xlab="Returns")
    curve(dct(x, a[1,2], a[1,3], a[1,4]), add=TRUE, col=2)
    legend(x="topright",legend=c("Fit","KDE"),col=c(2,4),lty=1,bty="n")
    plot(density(retf$TLT), main="TLT", col=4, xlab="Returns")
    curve(dct(x, a[2,2], a[2,3], a[2,4]), add=TRUE, col=2)
    legend(x="topright",legend=c("Fit","KDE"),col=c(2,4),lty=1,bty="n")
    plot(density(retf$UUP), main="UUP", col=4, xlab="Returns", ylim=c(0,0.7))
    curve(dct(x, a[3,2], a[3,3], a[3,4]), add=TRUE, col=2)
    legend(x="topright",legend=c("Fit","KDE"),col=c(2,4),lty=1,bty="n")
    plot(density(retf$DBC), main="DBC", col=4, xlab="Returns")
    curve(dct(x, a[4,2], a[4,3], a[4,4]), add=TRUE, col=2)
    legend(x="topright",legend=c("Fit","KDE"),col=c(2,4),lty=1,bty="n")
    dev.off()
}

# Now we need the uniform distributions
uIVV <- pct(retf$IVV, a[1,2], a[1,3], a[1,4])
uTLT <- pct(retf$TLT, a[2,2], a[2,3], a[2,4])
uUUP <- pct(retf$UUP, a[3,2], a[3,3], a[3,4])
uDBC <- pct(retf$DBC, a[4,2], a[4,3], a[4,4])
uret <- cbind(uIVV, uTLT, uUUP, uDBC)
colnames(uret) <- names(params)

# Make sure it worked
if (plotting)
{
    pdf("pairsU.pdf")
    pairs(uret,pch=18,cex=0.75,col=rgb(0,0,100,50,maxColorValue=255),
        main="Pairwise relation: uniform-transformed returns")
    dev.off()
}

##
# Time to copula
#

#############################################################
# Fit a gaussian copula
param.cor <- cor(retf)[upper.tri(cor(retf))]
param.cor.start <- rep(mean(param.cor), 6)

ncop <- normalCopula(param=param.cor, dim = 4, dispstr="un")

fit.gaussian <- fitCopula (ncop, uret, method="ml",
    start= param.cor.start, optim.control=list(maxit=2000))

# Record the AIC of the fit
fit.gaussian.aic = AIC(fit.gaussian@loglik,
    length(fit.gaussian@estimate))

#############################################################
# Now the t-copula
param.df <- mean(a[,4])
tcop <- tCopula(param=param.cor, dim = 4, dispstr="un")
fit.t <- fitCopula (tcop, uret, method="ml",
    start=c(fit.gaussian@estimate[1:6],param.df), optim.control=list(maxit=2000))

# Record the AIC of the fit
fit.t.aic = AIC(fit.t@loglik,
    length(fit.t@estimate))

cop.best <- tCopula(param=fit.t@estimate[1:6], df=fit.t@estimate[7],
    dim=4, dispstr="un")
cop.dist <- mvdc(copula=cop.best, margins = c("ct","ct","ct","ct"),
    paramMargins = list(
        list(m = a[1,2], s = a[1,3], df = a[1,4]),
        list(m = a[2,2], s = a[2,3], df = a[2,4]),
        list(m = a[3,2], s = a[3,3], df = a[3,4]),
        list(m = a[4,2], s = a[4,3], df = a[4,4])))

set.seed(1)
sim <- rmvdc(cop.dist, 10000)

# For plotting
IVV.sim <- sim[,1]
TLT.sim <- sim[,2]
UUP.sim <- sim[,3]
DBC.sim <- sim[,4]

# Convert percent returns to decimals
sim <- sim / 100

# Calculate VaR and ES for the simulated values
alpha <- 0.05
S <- 1e6
w <- c(0.30,0.15,0.35,0.20)

ret.sim <- as.numeric(sim %*% w)
fitt <- fitdistr(ret.sim, "t")
param <- fitt$estimate
m <- param[1]
df <- param[3]
lambda <- param[2]
var.sim <- -S *( m + lambda * qt(alpha, df=df) )
var.sim.ans <- round (var.sim / 1000) * 1000

es1 <- dt (qt(alpha,df=df),df=df)/(alpha)
es2 <- (df + qt(alpha,df=df)^2) / (df - 1)
es3 <- -m+lambda*es1*es2
ES.sim = S*es3
ES.sim.ans <- round (ES.sim / 1000) * 1000

# Calculate VaR and ES for the real values
ret.t <- as.numeric(retf %*% w) / 100
q <- quantile(ret.t, alpha)
var.t <- -S*q

ievar <- (ret.t < q)
ES.t <- -S * sum(ret.t * ievar) / sum (ievar)

# Reality check
# The simulated density is drawn in color 4 and the fitted curve in color 2,
# so the legend colors are listed in that order
if (plotting)
{
    pdf("SimFit.pdf")
    par(mfrow = c(2,2))
    plot(density(IVV.sim), main="IVV", col=4, xlab="Returns", ylim=c(0, 0.48), xlim=c(-4,4))
    curve(dct(x, a[1,2], a[1,3], a[1,4]), add=TRUE, col=2)
    legend(x="topright",legend=c("Sim", "Fit"),col=c(4,2),lty=1,bty="n")
    plot(density(TLT.sim), main="TLT", col=4, xlab="Returns")
    curve(dct(x, a[2,2], a[2,3], a[2,4]), add=TRUE, col=2)
    legend(x="topright",legend=c("Sim", "Fit"),col=c(4,2),lty=1,bty="n")
    plot(density(UUP.sim), main="UUP", col=4, xlab="Returns", ylim=c(0,0.7))
    curve(dct(x, a[3,2], a[3,3], a[3,4]), add=TRUE, col=2)
    legend(x="topright",legend=c("Sim", "Fit"),col=c(4,2),lty=1,bty="n")
    plot(density(DBC.sim), main="DBC", col=4, xlab="Returns")
    curve(dct(x, a[4,2], a[4,3], a[4,4]), add=TRUE, col=2)
    legend(x="topright",legend=c("Sim", "Fit"),col=c(4,2),lty=1,bty="n")
    dev.off()
}

#################################################################
# Answers
#################################################################
write.table(a, "margins.dat", sep="\t",
    row.names=F, col.names=F)
write.table(fit.gaussian.aic, "aic-norm.dat", sep="\t",
    row.names=F, col.names=F)
write.table(fit.t.aic, "aic-t.dat", sep="\t",
    row.names=F, col.names=F)
write.table(cbind(var.sim.ans, ES.sim.ans), "par.dat", sep="\t",
    row.names=F, col.names=F)
write.table(cbind(var.t,ES.t), "nonp.dat", sep="\t",
    row.names=F, col.names=F)

References

Li, D.X. (2000). "On default correlation: a copula function approach". In: Journal of Fixed Income.
Ruppert, D. (2010). Statistics and Data Analysis for Financial Engineering. Springer Verlag. ISBN: 1441977864.
Salmon, F. (2009). Recipe for Disaster: The Formula that Killed Wall Street.
Sklar, A. (1973). Random variables, joint distribution functions, and copulas.
The R Language for Statistical Computing.
