
Implied Binomial Trees from the Historical Distribution

Nusret Cakici

and

Kevin R. Foster*

City College of New York


Convent Avenue at 138th Street
New York, NY 10031
Tel: (212) 650-6201
email: ncakici@yahoo.com, kfoster@ccny.cuny.edu

November 2001

Draft -- Not for Citation

*
We are grateful for helpful comments from Salih Neftci. The comments of an anonymous
referee were enormously beneficial. We are grateful for support from the Schweger Fund.
Remaining errors are our own.
Implied Binomial Trees from the Historical Distribution

This paper shows how to build a binomial tree that is consistent with the
historical distribution and that can be used to value American options.
Previous methods of using the historical distribution (Stutzer 1996) use data on
just the underlying security to imply options prices -- but only European
options. Since the binomial trees in this paper are constructed only from
historical data on the underlying asset, not option prices, market prices may be
compared to the estimates from these trees, in order to measure the relative
richness or cheapness of the quoted options.
Implied Binomial Trees from the Historical Distribution

This paper demonstrates a way to build binomial trees that have an option volatility "smile"

consistent with the historical distribution. This research links two recent strands: the Stutzer

(1996) method of using the historical distribution of asset prices to infer European option

prices and volatilities, and the methods of constructing "smiling trees" of Derman and Kani

(1994) and others. The constructed tree then has a volatility smile derived entirely from the

historical data on the underlying asset, not option prices. It can serve as a benchmark in

evaluating the current option prices to judge whether they reflect higher or lower volatility

than would be implied by their history. Just as the Stutzer methodology calculates European

option prices that are consistent with the historical distribution, this paper extends that work to

build trees that calculate American option prices consistent with the historical distribution.

Since the pioneering work of Dupire (1994), Rubinstein (1994), and Derman and Kani (1994),

both academics and practitioners have been building trees that can match the volatility smile,

in order to price American options and other path-dependent derivatives. These approaches

typically begin with a set of option prices and then generate a set of nodes and transition

probabilities that are consistent with these prices. However they are limited by the need to

accept as inputs the present structure of option prices (or else make strong functional form

assumptions) and so are not suitable for traders looking to measure the relative richness or

cheapness of the quoted options. We would like to be able to construct some sort of historical

baseline to be able to provide an estimate, based on the past behavior of the underlying asset,

of the option value for American options or other path-dependent derivatives.

A method of constructing an historical baseline for European options is provided in the papers

of Stutzer (1996) and Zou and Derman (1999), which demonstrate a straightforward

methodology for generating a risk neutralized historical distribution (RNHD). From this

RNHD, a historically appropriate set of European options prices is derived, using only the

data on the underlying asset, not data on current option prices. The RNHD is the distribution

that is "closest" to the historical empirical distribution but still satisfies certain constraints: the

mean is inferred from the risk neutrality constraint and the spread may be inferred from the at-

the-money option price. The RNHD is "closest" in that it minimizes the Kullback-Leibler

entropic distance between a nonparametric kernel estimate of the historical distribution and

the risk-neutral distribution (Cakici and Foster 2001).

The options prices implied by this risk neutralized historical distribution (RNHD) then serve

as inputs into the tree-building procedure in order to derive an appropriate benchmark for

valuing American options. In constructing these, we may wish to match a particular option

price so as to leverage this knowledge to evaluate other prices. For example, the thickly

traded at-the-money option price can imply out-of-the-money prices based on the empirical

distribution. Since this procedure delivers prices for both American and European options,

analysts can also make cross-comparisons.

This paper demonstrates the utility of this joint procedure by valuing S&P100 options, which

are American and can be exercised early. The existence of a substantial volatility smile,

resulting from the skewed distribution that fails to satisfy the Black-Scholes assumptions of

log-normality, makes the proper modeling of these options particularly important.

Section 2 of this paper describes the procedure for constructing the RNHD based on an estimate that

will accurately reflect the historical distribution. Section 3 outlines the procedure for

estimating a nonparametric kernel. Section 4 describes the procedure, based on Barle and

Cakici (1998), for creating the tree from the option volatility smile. Section 5 applies this

method to the S&P100 data. Section 6 concludes with some suggestions for further research.

2. Constructing the RNHD Estimate of the Volatility Smile

Equilibrium in a financial market is generally characterized with assumptions about a lack of

risk-free arbitrage trading opportunities. Forward prices are rationally derived from the

expected price of the underlying asset, so a particular asset with current value S0, paying a

dividend stream, d, where agents can borrow and lend at the riskless interest rate r, will have

expected price at some future date, T, given as ST, evaluated under a probability distribution,

P, such that

E_P[S_T] = S_0 e^{(r-d)T} .    (1)

Options are priced from the forward distribution, P, valued as their discounted expected value

if they expire "in the money." This forward distribution, P, is unknowable to empiricists

except as inferred from options prices. Some economists postulate a particular functional

form for the distribution (lognormal for Black-Scholes), but instead we use the entire
historical distribution as our guide. This historical distribution, \tilde{Q}, is transformed to match

the no-arbitrage condition and the option price, so we form Q from \tilde{Q} to get a plausible

estimate of P. Construction of Q, the risk-neutralized historical distribution, allows

calculation of the values of any standard European options.

Estimation of Q from the historical distribution, \tilde{Q}, is not trivial, however. There is no

necessary link between past events and expectations about the future. However, if we assume

that present market participants take account of the past in forming their expectations, we can

expect some relationship to exist (see Grandmont 1993). The tightness of this relationship

may be measured by the entropy change associated with it. Since a reduction in entropy is an

increase in orderliness (due to the restriction), the smallest decrease measures the least

prejudicial application of the no-arbitrage constraint.

The Shannon entropy of a distribution, Q, with observations q_j, is measured as

\sum_j q_j \log(q_j) ,    (2)

and the Kullback-Leibler entropic distance between two distributions P and Q is measured

analogously as

\sum_j q_j \log(q_j / p_j) .    (3)

This entropic distance measure deviates from a standard metric in that it is not symmetric: it

gives the directed distance from Q to P (not necessarily the same as the directed distance from

P to Q). The reason for this asymmetry is that the Kullback-Leibler distance weights the

deviations by their probability of occurrence, so it is more sensitive to deviations that are

more likely to be observed and less sensitive to improbable deviations. Other familiar

measures such as the squared deviations would give uniform weight to each squared

deviation.
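
To make the directedness of this measure concrete, the short sketch below (illustrative only; the toy distributions are not from the paper) computes the Kullback-Leibler distance of Equation 3 in both directions for two discrete distributions.

```python
import numpy as np

def kl_distance(q, p):
    """Directed Kullback-Leibler distance from Q to P (Equation 3): sum_j q_j log(q_j / p_j)."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    mask = q > 0                                  # terms with q_j = 0 contribute nothing
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

# Two toy distributions on the same support; the deviations are weighted by the
# probabilities of the first argument, so the two directed distances differ.
q = [0.5, 0.3, 0.2]
p = [0.2, 0.3, 0.5]
print(kl_distance(q, p), kl_distance(p, q))
```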

The Kullback-Leibler minimum entropy change used in this paper, in finding a distribution Q

that is near to \tilde{Q} but that satisfies the constraints, restricts us from inadvertently imposing

extra assumptions. The relation with the Shannon entropy measure allows us to interpret

these extra assumptions as adding information (reducing entropy). This follows Laplace's

principle of insufficient reason. For instance, the Black-Scholes assumption of a normal

distribution imposes precise assumptions upon the third and higher moments. These

assumptions can be measured as decreasing entropy (increasing information), and so finding a

distribution that makes the slightest decrease in information corresponds to making the

weakest set of assumptions that can fulfill the given constraints. This entropic distance is

commonly used in other fields such as physics and engineering, since for particular

distributional families the cross entropy minimization procedure produces analytical solutions

of Bose-Einstein or Maxwell-Boltzmann distributions (see Kapur and Kesavan 1992).

Further reasons for using this measure are summarized by Von Neumann's reply to Shannon,

on the reason to label his original measure as "entropy": First, it "is the same as the expression

for entropy in thermodynamics and as such you should not use two different names for the

same mathematical expression, and second, and more importantly, entropy, in spite of one

hundred years of history, is not very well understood yet and so as such you will win every

time you use entropy in an argument!" (reported in Kapur and Kesavan, p. 8).

This Kullback-Leibler distance criterion is minimized subject to the requirement that Q be a

proper probability measure, that the expected price satisfies risk-neutrality, and may be

minimized subject to the additional requirement that some European option prices (typically

at-the-money) implied by this modified distribution match. Writing this new Q distribution as

a function of the λi parameter vectors, the no-arbitrage constraint requires that the expected

value under the new probability distribution should equal the forward price, so:

\int Q(\lambda_1, \lambda_2)\, S_T\, dS = S_0 e^{(r-d)T}    (4)

where r is the risk-free rate and d is the continuous dividend yield. Additionally, we may

impose the constraint that the price of a European call or put (typically the at-the-money

option) should match, thus

C(K, T) = e^{-rT} E_{Q(\lambda_1,\lambda_2)}[\max(0, S_T - K)]   or    (5)

P(K, T) = e^{-rT} E_{Q(\lambda_1,\lambda_2)}[\max(0, K - S_T)] ,    (6)

where K is the strike price, set equal to the forward price for the at-the-money case.

By the entropy distance minimization (see Buchen and Kelly 1996), if the ATM call price is

matched, we can derive

Q(\lambda_1, \lambda_2) = \frac{\tilde{Q}(S_T)\, e^{-\lambda_1 S_T - \lambda_2 C(K,T)}}{\int \tilde{Q}(S)\, e^{-\lambda_1 S - \lambda_2 C(K,T)}\, dS} ,    (7)

where the \lambda_1 parameters can be interpreted as the shadow values of the no-arbitrage

constraints and the \lambda_2 parameters are the shadow values of the constraints that the at-the-money option

prices should match. The parameter vectors λ1 and λ2 are found numerically.
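
As a rough illustration of how the tilted distribution of Equation 7 can be computed, the sketch below discretizes a stand-in "historical" density on a grid and solves for \lambda_1 and \lambda_2 with a root finder. Everything here is an assumption for illustration: the toy lognormal prior, the hypothetical at-the-money call price target, and the choice of tilting by the terminal price and the discounted call payoff (one function per constraint). The paper's own estimation uses the kernel density described in Section 3.

```python
import numpy as np
from scipy.optimize import fsolve

# Illustrative parameters (not the paper's data).
S0, r, d, T = 100.0, 0.077, 0.0392, 1.0
K, C_target = 100.0, 9.64            # hypothetical ATM call price to match
grid = np.linspace(40.0, 220.0, 2000)

# Stand-in for the kernel-estimated historical density \tilde{Q}.
hist = np.exp(-0.5 * ((np.log(grid / S0) - 0.08) / 0.18) ** 2) / grid
hist /= np.trapz(hist, grid)

payoff = np.exp(-r * T) * np.maximum(grid - K, 0.0)
forward = S0 * np.exp((r - d) * T)

def tilted(lams):
    lam1, lam2 = lams
    w = hist * np.exp(-lam1 * grid - lam2 * payoff)
    return w / np.trapz(w, grid)

def residuals(lams):
    q = tilted(lams)
    return [np.trapz(q * grid, grid) - forward,          # Equation 4
            np.trapz(q * payoff, grid) - C_target]       # Equation 5

lam = fsolve(residuals, x0=[0.0, 0.0])
q_rnhd = tilted(lam)
print(lam, np.trapz(q_rnhd * grid, grid))                # mean equals the forward price
```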

A simple example may clarify this procedure. Suppose the "Dice" company has a history of

returns given as if from a roll of a die: one sixth are 1, one sixth are 2, ... , one sixth are 6. It
is a uniform discrete distribution from 1 to 6, which we refer to as \tilde{Q}. The mean of the series

is evidently 3.5 and the standard deviation is 1.71. We want to modify this series so that it satisfies risk-

neutrality: assume that we want to make the expected value equal to some other number, say

4.44. Note that the entropy distance from a uniform distribution (each p_i = 1/n) is just a constant plus the

Shannon entropy:

\sum_i q_i \ln \frac{q_i}{1/n} = \ln n + \sum_i q_i \ln q_i .    (8)

So minimize the distance (maximize the negative) subject to the constraints that the

probabilities sum to one and that the mean is 4.44, getting the Lagrangian:

- \ln 6 - \sum_{i=1}^{6} q_i \ln q_i - \lambda_0 \left( \sum_{i=1}^{6} q_i - 1 \right) - \lambda_1 \left( \sum_{i=1}^{6} i\, q_i - 4.44 \right)    (9)

Get the first order conditions with respect to the qi and rearrange so

q_i = a b^i ,    (10)

where a = e^{-1-\lambda_0} and b = e^{-\lambda_1}. Substitute this into the constraints to be left with two

equations in two unknowns:

a \sum_i b^i = 1   and    (11)

a \sum_i i\, b^i = 4.44 ,    (12)

where the second equation gives a polynomial in b while a is just a normalizing parameter.

Some calculation gives the following numbers for Q, the distribution that has the minimum
entropy distance from \tilde{Q} but satisfies the constraints required of P:

Prob{1} Prob{2} Prob{3} Prob{4} Prob{5} Prob{6}
0.0594 0.0839 0.1186 0.1675 0.2365 0.3341

This distribution now has a mean of 4.44. When applied to the problem of minimizing the

distance from a distribution of stock returns given the prior of the historical returns, the

answer may not necessarily be from a particular distributional family, since the cross entropy

minimization no longer has analytical solutions.
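
The dice example can be verified numerically. The sketch below solves Equations 11 and 12 for the tilting parameters a and b (using the fact that the ratio of the two constraints depends only on b) and reproduces the table above.

```python
import numpy as np
from scipy.optimize import brentq

# Tilted dice distribution q_i = a * b**i (Equation 10), constrained to sum to one
# and to have mean 4.44 (Equations 11 and 12).
faces = np.arange(1, 7)
target_mean = 4.44

def mean_given_b(b):
    w = b ** faces
    return np.sum(faces * w) / np.sum(w)          # Eq. 12 divided by Eq. 11

b = brentq(lambda x: mean_given_b(x) - target_mean, 0.5, 5.0)
a = 1.0 / np.sum(b ** faces)                      # normalization from Eq. 11
q = a * b ** faces
print(np.round(q, 4))                             # approx. [0.0594 0.0839 0.1186 0.1675 0.2365 0.3341]
print(np.sum(q), np.sum(faces * q))               # 1.0 and 4.44
```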

With some further calculations, the variance could be made consistent with the volatility

implied by an option price. For instance, an analyst might believe that the thickly-traded at-

the-money option gives an accurate volatility measure, but want to leverage this belief to

value thinly-traded options. Or the analyst might have a strong belief about another

probability (corresponding to a different option) and want to see the implied effect upon at-

the-money volatility.

3. Kernel Estimator of Historical Distribution

The simulations of Stutzer and Zou and Derman were made as simple draws from the daily

historical returns. While this has the advantage of simplicity, some of the tail events remain

discrete, chunky occurrences, not smooth probabilities. The 1987 Crash is one such notable

discrete event. In the historical simulations there is a simple chance of picking this calamity

with each draw, but no chance of ever picking a slightly larger or slightly smaller drop.

Market participants may have somewhat smoother priors that assign at least a tiny probability to

large negative movements in between those observed. Surely it seems plausible that market participants

may assign some nonzero probability to a substantial fall that is not exactly a clone of the

1987 Crash. This intuitive notion can be modeled with a non-parametric kernel estimate of

the historical distribution.

The kernel estimator will be appropriate, as long as the historical distribution is stationary,

ergodic, and the data generating process satisfies some relatively weak assumptions of

stochastic equicontinuity in order for a Uniform Law of Large Numbers result to obtain

(Andrews 1994). Stochastic equicontinuity of an empirical process, HT, is defined as

\forall \varepsilon > 0 and \forall \eta > 0, \exists \delta > 0 such that

\lim_{T \to \infty} P\left[ \sup_{\theta, \theta' \in \Theta,\ \rho(\theta,\theta') < \delta} \left| H_T(\theta') - H_T(\theta) \right| > \eta \right] < \varepsilon .    (13)

In generalizing this method to higher dimensions, we should keep in mind the tradeoff

between the smoothness required of the function and the dimension of the random variable.

The kernel estimator at each point is constructed as

H_T(x) = \frac{1}{hT} \sum_{i=1}^{T} K\left( \frac{x - X_i}{h} \right)    (14)

where K is the Epanechnikov kernel

K(z) = \begin{cases} \frac{3}{4\sqrt{5}} \left( 1 - \frac{1}{5} z^2 \right) & |z| \le \sqrt{5} \\ 0 & \text{else} \end{cases}    (15)

We use an Epanechnikov kernel because in the limit this has the highest efficiency in Mean

Squared Error in trading off between bias and variance (Silverman 1986). The bandwidth

parameter, h, is similarly selected based on limit efficiency as

h = 1.06\, \hat{\sigma}\, T^{-1/5}    (16)

where \hat{\sigma} is the standard deviation of the sample. We have experimented with other kernels

and other bandwidths, but found little variation (results available upon request). A Normal

(Gaussian) kernel implies basically the same volatility smile. The smile is also little changed

by variations in the bandwidth parameter or even the use of an adjustable bandwidth estimator

that allows the sensitivity to vary depending on the local density so that high-density areas get

a low bandwidth while low-density areas get a higher bandwidth.
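
A minimal sketch of the estimator in Equations 14-16 follows; the simulated returns are only a stand-in for the S&P100 series, and the grid and sample size are arbitrary choices.

```python
import numpy as np

def epanechnikov(z):
    """Epanechnikov kernel on |z| <= sqrt(5), as in Equation 15."""
    return np.where(np.abs(z) <= np.sqrt(5.0),
                    0.75 / np.sqrt(5.0) * (1.0 - z ** 2 / 5.0), 0.0)

def kernel_density(x_grid, sample):
    """Kernel estimate of the return density (Equations 14 and 16)."""
    T = len(sample)
    h = 1.06 * np.std(sample) * T ** (-0.2)           # Equation 16 bandwidth
    z = (x_grid[:, None] - sample[None, :]) / h       # (grid point, observation) pairs
    return epanechnikov(z).sum(axis=1) / (h * T)

# Hypothetical daily log returns standing in for the S&P100 series used in the paper.
rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=4000) * 0.01
grid = np.linspace(-0.08, 0.08, 400)
pdf = kernel_density(grid, returns)
print(np.trapz(pdf, grid))                            # integrates to roughly one
```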

Use of a kernel estimator for interest rates has been criticized by Chapman and Pearson

(2000) for making a poor estimation of areas of the distribution with very few data points.

However, this analysis is not so concerned with the behavior at extreme values, but rather with the

well-estimated behavior at points where the data are dense. The kernel estimator is also well

suited to addressing the issue of the length of memory. This procedure assumes that the entire

historical series enters the present judgements of market participants as they assess the

likelihood of future events, without any decay so that events from decades past enter in the

same manner as the most current events. Statistically, this is the assumption of stationarity.

While this assumption may not be universally valid, it still gives an interesting baseline by

which to judge current option prices. A modification to the kernel estimator could address

this issue, since it is straightforward to introduce another dimension of temporal distance

when creating a kernel estimate of the historical distribution.

This kernel estimate is used to construct a RNHD estimator of the forward distribution, given

particular values for r, d, and at-the-money option prices. This forward distribution generates

a smile of implied Black-Scholes volatilities for the European call prices. A smooth version

of the smile is fitted as a function of the strike price and its functionals (square, cubic, etc.).

This is a simplification to minimize the computing time, since the construction of the tree

requires evaluation of call prices (therefore volatilities) at quite numerous intervals while the

RNHD procedure gives discrete values. Derman and Kani used a simple linear smile, but we

wish to capture more detail by using the higher-order functionals. The smooth functional

preserves the necessary information while minimizing the computational burden.

4. Constructing a Recombining Binomial Tree from the Volatility Smile

Following the Barle and Cakici modification of the Derman and Kani algorithm, the

recombining binomial tree is constructed recursively forward, one level at a time. Assuming

that the tree has been constructed up to the nth level, letting r be the risk-free interest rate, d be

the dividend yield rate, and s i be the stock price at node i of the nth level, then the forward

price, F_i = s_i e^{(r-d)\Delta t}, must by risk-neutrality satisfy

F_i = p_i S_{i+1} + (1 - p_i) S_i    (17)

where the capital letters S_{i+1} and S_i represent the nodes at the (n+1)th level branching from the

price s_i.

The call price of a European option maturing at (n+1) and with strike price, K, is C (K , n + 1) .

Of course, the price must be interpolated, which we derive from the functional describing

volatility. Then the call and put prices are:

C(K, n+1) = \sum_{i=1}^{n+1} \Lambda_i \max(S_i - K, 0)    (18)

P(K, n+1) = \sum_{i=1}^{n+1} \Lambda_i \max(K - S_i, 0)    (19)

where Λ i is the Arrow-Debreu price at each node, calculated as

\Lambda_i e^{r\Delta t} = \begin{cases} p_n \lambda_n & i = n+1 \\ p_{i-1} \lambda_{i-1} + (1 - p_i) \lambda_i & 2 \le i \le n \\ (1 - p_1) \lambda_1 & i = 1 \end{cases}    (20)

where λi is the known Arrow-Debreu price at node i.
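
Equation 20 is simply a weighted roll-forward of the known Arrow-Debreu prices. A small sketch follows; the ordering convention (index running from the lowest new node upward) and the numerical inputs are ours, chosen to match the first step of the worked example in Section 5.

```python
import numpy as np

def arrow_debreu_next(lam, p, r, dt):
    """Propagate Arrow-Debreu prices one level forward (Equation 20).

    lam : Arrow-Debreu prices at the n known nodes of the current level (lowest first)
    p   : up-move probabilities at those same n nodes
    Returns the n+1 Arrow-Debreu prices Lambda_i at the next level.
    """
    n = len(lam)
    nxt = np.zeros(n + 1)
    nxt[0] = (1.0 - p[0]) * lam[0]                        # i = 1 (lowest new node)
    for i in range(1, n):
        nxt[i] = p[i - 1] * lam[i - 1] + (1.0 - p[i]) * lam[i]
    nxt[n] = p[n - 1] * lam[n - 1]                        # i = n + 1 (highest new node)
    return nxt * np.exp(-r * dt)

# Illustrative check: lambda_1 = 1, p_1 = 0.482, r = 0.077, dt = 0.2
print(arrow_debreu_next(np.array([1.0]), np.array([0.482]), 0.077, 0.2))
# -> approximately [0.51, 0.4747], the Lambda_B and Lambda_A values of the Section 5 example
```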

One remaining assumption concerns finding the value of the central node(s). The tree is

allowed to grow at the risk-free rate so that the central nodes do not continue to represent zero

growth (as in Derman and Kani). If the number of new nodes is odd, then the central node

price is set equal to the forward value. If the number of new nodes is even, then the two

central nodes must satisfy S_i S_{i+1} = F_i^2. So the upper central node is

S_{i+1} = F_i \frac{\lambda_i F_i + \Delta C_i}{\lambda_i F_i - \Delta C_i} ,    (21)

where

\Delta C_i = e^{r\Delta t} C(F_i, n+1) - \sum_{j=i+1}^{n} \lambda_j (F_j - F_i) ,    (22)

while the lower central node is

S_i = \frac{F_i^2}{S_{i+1}} .    (23)

From these central nodes, the upper and lower parts of the tree may be derived.

For the upper nodes, the previous equations imply that

S_{i+1} = \frac{\Delta C_i S_i - \lambda_i F_i (F_i - S_i)}{\Delta C_i - \lambda_i (F_i - S_i)}    (24)

Analogously, the lower nodes are filled with the relation that:

S_i = \frac{\lambda_i F_i (S_{i+1} - F_i) - \Delta P_i S_{i+1}}{\lambda_i (S_{i+1} - F_i) - \Delta P_i}    (25)

where now

\Delta P_i = e^{r\Delta t} P(F_i, n+1) - \sum_{j=1}^{i-1} \lambda_j (F_i - F_j) .    (26)

Negative probabilities are not allowed; they are largely avoided by setting the strike price,

K, equal to F when evaluating the option. Given these modifications, negative

probabilities are avoided even when building very large trees.
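
The node-price formulas above translate directly into small helper functions. The sketch below is illustrative: the variable names are ours, and the check at the end uses the level-2 quantities from the worked example in Section 5.

```python
# Sketch of the node-price formulas (Equations 21-25), written as standalone helpers.
# lam_i, F_i, dC_i, dP_i are the quantities defined in the text for the node being filled.

def upper_central_node(lam_i, F_i, dC_i):
    """Equation 21: upper of the two central nodes when the new level has an even count."""
    return F_i * (lam_i * F_i + dC_i) / (lam_i * F_i - dC_i)

def lower_central_node(F_i, S_upper):
    """Equation 23: lower central node from S_i * S_{i+1} = F_i**2."""
    return F_i ** 2 / S_upper

def upper_node(S_i, lam_i, F_i, dC_i):
    """Equation 24: next node above an already-known node S_i."""
    return (dC_i * S_i - lam_i * F_i * (F_i - S_i)) / (dC_i - lam_i * (F_i - S_i))

def lower_node(S_ip1, lam_i, F_i, dP_i):
    """Equation 25: next node below an already-known node S_{i+1}."""
    return (lam_i * F_i * (S_ip1 - F_i) - dP_i * S_ip1) / (lam_i * (S_ip1 - F_i) - dP_i)

# Illustration with the level-2 numbers from Section 5 (F_1 = 100.7577, dC_1 = 3.6179, lam_1 = 1):
S_A = upper_central_node(1.0, 100.7577, 3.6179)
S_B = lower_central_node(100.7577, S_A)
print(round(S_A, 2), round(S_B, 2))    # about 108.26 and 93.77 (the text reports 108.27 and 93.77)
```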

The complete method to construct a binomial tree with volatilities consistent with the

historical distribution may be summarized as follows:

1. A nonparametric kernel is constructed to estimate the empirical distribution of asset

returns.

2. The RNHD is constructed as the distribution that is closest to the empirical distribution

but also satisfies the conditions of risk-neutrality (Equation 4) and perhaps option prices

(Equations 5 and 6), typically at-the-money as in Zou and Derman.

3. The implied option prices and volatilities from the RNHD are evaluated at discrete

intervals and a polynomial function interpolates the "smile".

4. A recombining binomial tree is constructed that matches the smile implied by the

historical data, using the Barle and Cakici algorithm.

5. Application to S&P100 Options

A nonparametric kernel is estimated for the S&P100 index using data from March 5, 1984 to

November 2, 2000; daily returns are computed as the log price ratio. This empirical

probability density function is shown in Figure 1, where it is compared with a normal density

(the assumption that underlies the Black-Scholes model). It can be seen that the empirical

probability density function (p.d.f.) has fatter tails (it is leptokurtic) but is also more

concentrated at the middle of the distribution so that the “shoulders” are lower than the

reference normal.

This kernel is used to generate a risk-neutralized historical distribution (RNHD), assuming a

risk-free rate of 8% and a dividend yield rate of 4%. These are not meant to be historical

averages but rather to demonstrate that, whatever the present market-quoted levels of these

parameters, this RNHD procedure is quite capable of matching them. The at-the-money

volatility is unrestricted, however additional computations could be made to match that price.

The RNHD generates a volatility smile for one-year (252-day) European options, shown in

Table 1. The Black-Scholes volatilities show a steep drop as the strike price rises, which

arises from the fact that, since the tails of the historical distribution are much thicker than

those of the normal distribution, the option prices are higher.

A regression is estimated to relate the implied volatility to the strike/price ratio as well

as its square, its cube, and a constant. Each regression is of the form:

\hat{\sigma}_i = \beta_0 + \beta_1 \left( \frac{K_i}{S} \right) + \beta_2 \left( \frac{K_i}{S} \right)^2 + \beta_3 \left( \frac{K_i}{S} \right)^3    (27)

and the estimated betas for this case are:

\hat{\sigma}_i = 2.04 - 4.59 \left( \frac{K_i}{S} \right) + 3.84 \left( \frac{K_i}{S} \right)^2 - 1.09 \left( \frac{K_i}{S} \right)^3    (28)
        (0.04)      (0.12)                       (0.12)                       (0.04)

where the standard errors are in parentheses under each coefficient. The R2 for the regression

is 0.9998. With all of the functionals, such a high R2 would normally bring worries of over-

fitting. However in this case that is the point. We want to get a very tight fit since we are not

particularly interested in out-of-sample prediction (which would be the reason to worry about

over-fitting) but rather are interested in getting a very precise description of the present

volatility surface. For strike values outside the range presented in Table 1, of [80,120], we

use the endpoint volatilities rather than extrapolate. This description of the volatility surface

is then put into the next step where it is used to generate a tree that is consistent with it. The

more precise the estimated volatility surface, the more precise is the tree.
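
A sketch of the fitting step is below. The strike/volatility points are taken from Table 1 (a coarser grid than the RNHD output actually used), so the fitted coefficients need not match Equation 28 exactly; with regressors this collinear it is the fitted curve, not the individual betas, that matters.

```python
import numpy as np

# Cubic smile regression (Equation 27): implied volatility on K/S, (K/S)^2, and (K/S)^3.
strikes = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120], dtype=float)
vols    = np.array([0.270, 0.245, 0.227, 0.213, 0.203, 0.195, 0.189, 0.185, 0.182])
ks = strikes / 100.0                              # the spot price is 100 in the example

X = np.column_stack([np.ones_like(ks), ks, ks ** 2, ks ** 3])
beta, *_ = np.linalg.lstsq(X, vols, rcond=None)

def smile_vol(k_over_s):
    """Fitted implied volatility; outside [0.8, 1.2] the endpoint value is used, as in the text."""
    k = np.clip(np.asarray(k_over_s, dtype=float), 0.8, 1.2)
    return beta[0] + beta[1] * k + beta[2] * k ** 2 + beta[3] * k ** 3

print(np.round(smile_vol(np.array([0.90, 1.00, 1.10])), 3))   # compare with Table 1: 0.227, 0.203, 0.189
```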

The tree is generated with the Barle-Cakici algorithm. For illustration, a five-step binomial

tree is shown in Figure 2 for the one-year options, although a 50-step model is used for the

later results. Each node shows the stock price, the probability of a subsequent upward move,

and the Arrow-Debreu price, λi . The labels refer to the explanatory text below. The interest

rate is 8%, so the continuously compounded rate, r= ln(1 + .08) = 0.0770, while the dividend

payment stream is 4%, also modified to the continuously compounded rate, d= ln(1 + .04) =

0.0392. The time to maturity is one year so each step, ∆t , is 0.2 years. The initial stock

price, S1 , is 100 and λ1 =1.

To fill in the nodes of the tree, first we calculate the stock prices at the two nodes labeled A

and B, where the forward price is F_1 = 100 e^{(r-d)\Delta t} = 100.7577. Since we have an even

number of new nodes, we use Equation 21 to calculate

S_A = F_1 \frac{\lambda_1 F_1 + \Delta C_1}{\lambda_1 F_1 - \Delta C_1} ,    (29)

where the initial Arrow-Debreu price is set to be \lambda_1 = 1. From Equation 22 we have

\Delta C_1 = e^{r\Delta t} C(F_1, 2) - 0, and C(F_1, 2) = 3.5626 is the Black-Scholes price of a call with maturity

of 0.2 years, exercise price F_1, and volatility calculated from the regression coefficients

describing the smile. This volatility is 2.04 - 4.59(100/F_1) + 3.84(100/F_1)^2 - 1.09(100/F_1)^3 = 0.2013. Since

it is the first node, the summation terms in the formula for \Delta C_1 are zero. Therefore

\Delta C_1 = 3.6179 and S_A = 108.27.

Then from Equation 23, the lower node is easily derived so S_B = F_1^2 / S_A = 93.7727. The

probabilities are from Equation 17: the probability of moving up to node A from the first

node, p_1 = (F_1 - S_B)/(S_A - S_B) = 0.4820. (Note that this probability is recorded at the first level.)

Therefore the Arrow-Debreu price at node A is (from Equation 20) \Lambda_A = \lambda_1 p_1 e^{-r\Delta t} = 0.4747

and \Lambda_B = \lambda_1 (1 - p_1) e^{-r\Delta t} = 0.51.
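
The level-2 numbers above can be reproduced with a few lines. This is a sketch, not the paper's code: the Black-Scholes routine and the 0.2013 volatility are taken as given from the text, and small differences in the last digit reflect rounding.

```python
import math
from statistics import NormalDist

# Numerical check of the first tree level (nodes A and B): Black-Scholes call at the
# forward strike, then Equations 21-23 and 17.
r, d, dt, S0 = math.log(1.08), math.log(1.04), 0.2, 100.0

def bs_call(S, K, sigma, T):
    d1 = (math.log(S / K) + (r - d + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = NormalDist().cdf
    return S * math.exp(-d * T) * N(d1) - K * math.exp(-r * T) * N(d2)

F1 = S0 * math.exp((r - d) * dt)                      # 100.7577
dC1 = math.exp(r * dt) * bs_call(S0, F1, 0.2013, dt)  # about 3.618; summation terms are zero
S_A = F1 * (F1 + dC1) / (F1 - dC1)                    # lambda_1 = 1
S_B = F1 ** 2 / S_A
p1 = (F1 - S_B) / (S_A - S_B)
Lam_A = p1 * math.exp(-r * dt)
Lam_B = (1 - p1) * math.exp(-r * dt)
print(round(F1, 4), round(S_A, 2), round(S_B, 2), round(p1, 4), round(Lam_A, 4), round(Lam_B, 2))
# -> 100.7577, 108.26, 93.77, 0.4821, 0.4747, 0.51 (the text reports S_A = 108.27; the last
#    digit depends on the rounding of the 0.2013 volatility)
```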

The third column of nodes, n=3, is first calculated by setting the price of the middle node, D,

equal to the forward price, so S_D = 100 e^{(r-d)2\Delta t} = 101.5211. (Since there are an odd number of

nodes, calculating the central value is straightforward compared with the case of an even

number of nodes.) Additionally we calculate the forward price from node A,

F_A = 108.2629 e^{(r-d)\Delta t} = 109.0831. From these, we calculate the values for node C, where,

using Equation 24, the price of the stock is

S_C = \frac{\Delta C_C (101.5211) - \lambda_A (109.0831)(109.0831 - 101.5211)}{\Delta C_C - \lambda_A (109.0831 - 101.5211)} ,    (30)

where \Delta C_C = e^{r\Delta t} C(109.0831, 3) - 0, since the summation terms are zero, and C(109.0831, 3) =

2.0852 is the Black-Scholes price of a call option with the indicated strike, 0.4 year maturity,

and volatility from the regression on the smile of 0.1905. Thus \Delta C_C = 2.1175 and

S_C = 119.9606. The transition probability of moving from node A to node C is

p_A = (109.0831 - 101.52)/(119.9606 - 101.52) = 0.4101, and the new Arrow-Debreu price can be found as before.

To calculate the lower node price, which is labeled E, we use Equation 25, so that

S_E = \frac{\lambda_B (94.4832)(101.5211 - 94.4832) - \Delta P_E (101.5211)}{\lambda_B (101.5211 - 94.4832) - \Delta P_E}    (31)

where F_E = 94.4832 = e^{(r-d)0.2} S_B. We gather \lambda_B = 0.51 from the previous node and calculate

\Delta P_E from Equation 26. The Black-Scholes put price, 2.4243, is calculated with exercise price

of 94.4832 and volatility of 0.2142 (again, this volatility is from the regression in Equation

28). Since the summation terms are zero, \Delta P_E = e^{0.077(0.2)}(2.4243) = 2.4619. Therefore S_E is

79.12. From this price we can find the transition probability (recorded at node B) using

Equation 17 as (94.48 - 79.12)/(101.52 - 79.12) = 0.69.

The next column of prices, n=4, (the probabilities and Arrow-Debreu prices can be easily

found from the prices) is found by first using Equations 21 and 23 to find the two central

nodes, since n is even. The forward price is F_4 = 100 e^{(r-d)0.6} = 102.29, and the upper central

node (labeled G in Figure 2) is calculated from this as 102.29 \frac{\lambda_D (102.29) + \Delta C_G}{\lambda_D (102.29) - \Delta C_G}, where \lambda_D

was previously calculated as 0.62 and \Delta C_G is calculated from the call price and the summation

terms. The Black-Scholes call price of an option with an exercise price of 102.29 and

volatility (given from the regression) of 0.1989 is 5.9970. The summation term reads over the

nodes that are above and to the left of node G, which in this case is only node C. This term is

\lambda_C (F_C - F_4), where \lambda_C was previously calculated to be 0.1918, F_C = 120.87

= 119.96 e^{(r-d)\Delta t}, and F_4 is 102.29 (from above). The summation term is 3.5615, so

\Delta C_G = e^{0.2r}(5.9970) - 3.5615 = 2.5286. Therefore the upper central node, G, has price

S_G = 110.78. From Equation 23, the lower central node price (denoted H) is 94.45 = (102.29)^2 / 110.78.

After calculating the two central nodes we move centrifugally, first calculating the upper

nodes then the lower ones. The price at node F is calculated from Equation 24, where

F_F = 119.96 e^{0.2(r-d)} = 120.87 and S_i is, from the node below (the G node), equal to 110.78.

The \lambda_C is 0.19 and we only need to calculate \Delta C_F. The summation terms are zero since it is

at the edge of the tree. The Black-Scholes price of a call is calculated for an exercise of

120.87 and volatility of 0.1819 (since the node is beyond 120% of the initial spot price we do

not extrapolate but use the value truncated at 120). This call price is 0.8649 and so

\Delta C_F = 0.8783. From this we get

S_F = \frac{0.8783(110.78) - (0.19)(120.87)(120.87 - 110.78)}{0.8783 - (0.19)(120.87 - 110.78)} .    (32)

The lower node price, labeled I, uses the price at the node just above it (at H), which is S_{i+1} in

Equation 25, so S_H = 94.45. The price is

S_I = \frac{\lambda_E F_I (S_H - F_I) - \Delta P_I S_H}{\lambda_E (S_H - F_I) - \Delta P_I}    (33)

where F_I = 79.72 = 79.12 e^{(r-d)0.2}. The \Delta P_I term is calculated from the put price with exercise

79.72 and appropriate volatility (again truncated, set at 0.2696), which is 1.0194, so

\Delta P_I = 1.0352. Putting these terms into the formula above (\lambda_E was previously calculated) gives

S_I = 67.89.

Finally, in order to ensure that we have adequately explained our procedure, we show the

calculation for a node at the end of the tree, labeled J. Once the first four levels of nodes have

been calculated, the upper central node, just below J, is calculated to be 112.45 (since there

are an even number of nodes, the procedure from the steps with n=2 and n=4 is used again).

The stock price at J, S_J, is given from Equation 24 as

S_J = \frac{\Delta C_J (112.4492) - \lambda_{4,4} (121.1406)(121.1406 - 112.4492)}{\Delta C_J - \lambda_{4,4} (121.1406 - 112.4492)} ,    (34)

where \lambda_{4,4} = 0.2455 is the Arrow-Debreu price at the fourth level, fourth node up from the

bottom, and

\Delta C_J = e^{r\Delta t} C(121.1406, 5) - 1.0236 ,    (35)

where C(121.1406, 5) = 2.0841 is the Black-Scholes call price at the given strike, maturity of 1

year, and volatility from the smile of 0.1819 (again, this node is beyond 120% of the initial

spot price, so we use the level at 120). The summation term is \lambda_{4,5}(F_{4,5} - F_{4,4}), where

F_{4,i} = e^{(r-d)\Delta t} S_{4,i} and \lambda_{4,5} is the Arrow-Debreu price at the fourth level, fifth node up, so that

1.0236 = 0.0420(145.5104 - 121.1406), since we only sum over the nodes above and to the left

of J. Thus S_J = 130.2670.

From this simple five-step example, the general method should be clear. Table 2 shows the

option prices from a 50-step tree using the specified smile from the regression. These prices

reflect the early-exercise premium as well as the historically implied volatility skew. The

Cox-Ross-Rubinstein (CRR) implied volatilities are also shown (Cox, Ross, and Rubinstein

1979). These volatilities, of course, are different than the Black-Scholes volatilities from

Table 1, reflecting the early-exercise premium. The volatility skew, reflected in the tree by

the varying transition probabilities and prices, does not conform to the CRR assumptions.

The higher CRR volatility comes from the relatively thicker tails of the historical distribution:

there is a larger chance that the option will expire in the money and so the price (therefore

volatility) is higher.
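
For readers who want to reproduce the last column of Table 2, the sketch below prices an American call on a flat-volatility CRR tree and bisects on the volatility. The 50-step count, the continuous dividend yield treatment, and the bisection tolerances are our assumptions, so the result should land near, not necessarily exactly on, the reported values.

```python
import math

def crr_american_call(S0, K, r, d, sigma, T, steps=50):
    """American call on a flat-volatility Cox-Ross-Rubinstein tree with continuous yield d."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    dn = 1.0 / u
    disc = math.exp(-r * dt)
    p = (math.exp((r - d) * dt) - dn) / (u - dn)          # risk-neutral up probability
    # terminal payoffs
    values = [max(S0 * u ** j * dn ** (steps - j) - K, 0.0) for j in range(steps + 1)]
    # backward induction with an early-exercise check at each node
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            exercise = S0 * u ** j * dn ** (n - j) - K
            values[j] = max(cont, exercise)
    return values[0]

def crr_implied_vol(price, S0, K, r, d, T, lo=0.01, hi=2.0):
    """Back out the flat CRR volatility that reproduces a given American call price."""
    for _ in range(80):                                   # simple bisection
        mid = 0.5 * (lo + hi)
        if crr_american_call(S0, K, r, d, mid, T) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# e.g. back out the volatility consistent with the strike-100 price reported in Table 2
print(round(crr_implied_vol(11.73, 100.0, 100.0, math.log(1.08), math.log(1.04), 1.0), 3))
# should land near the 0.261 reported in Table 2
```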

These prices can provide a baseline for judging the relative prices of American options. This

baseline is constructed entirely using data on the underlying security--not the quoted options

prices. It gives an independent measure of the options prices. Alternatively, with some more

computation, we could match a quoted American option price (instead of a European option

price). From that we could find the rest of the (American) volatility skew that is consistent

with the historical distribution.

6. Conclusions

This paper shows how to build a binomial tree that is consistent with the historical

distribution and that can be used to value American options. The advantage of using the risk-

neutralized historical distribution (RNHD) is that it avoids making restrictive assumptions

about the distribution of the returns. Where we lack knowledge (about the higher moments

and the behavior of the tails) we will not impose strict assumptions but rather will use the past

as our guide. Only where we have specific knowledge or a strong theoretical rationale do we

make particular assumptions.

The RNHD method of Stutzer provides a way to use data on just the underlying security to

imply options valuations. As Zou and Derman discuss, this method of using only data on the

underlier has two virtues. First, options (or, for portfolio analyses, assets with option-like

characteristics) can be valued even if option markets are thin or absent. Second, in cases

where there is ample data on option prices, the historically based simulation method is still

useful because it provides an independent baseline. These papers and subsequent work,

however, were only able to value European options. This paper shows a method by which

early-exercise American options can be valued. Further research should test the accuracy

of prices generated by this method against those from other models.

Tree-generating methods that use option prices as inputs do not leave any degrees of freedom

to judge whether an option is dear or cheap, but this kernel-based historically consistent

simulation procedure can give stable benchmark option values. This paper's contribution

extends the Stutzer methodology to build binomial trees that can be used to value American

options.

Bibliography

Andrews, Donald W.K. (1994). "Empirical Process Methods in Econometrics," Cowles


Foundation Discussion Paper No. 1059.

Barle, Stanko and Nusret Cakici (1998). "How to Grow a Smiling Tree," Journal of
Financial Engineering, 7, 127-46.

Buchen, Peter W., and Michael Kelly (1996). "The Maximum Entropy Distribution of an
Asset Inferred from Option Prices," Journal of Financial and Quantitative Analysis,
31, 143-59.

Cakici, Nusret, and Kevin R. Foster (2001). "Risk-Neutralized At-the-Money Consistent


Historical Distributions in Currency Options Pricing," Journal of Computational
Finance, forthcoming.

Chapman, David A., and Neil D. Pearson (2000). "Is the Short Rate Drift Actually
Nonlinear?" Journal of Finance, 55(1), 355-88.

Cox, John C., Stephen Ross, and Mark Rubinstein (1979). "Option Pricing: A Simplified
Approach," Journal of Financial Economics, 7, 229-63.

Derman, Emanuel and Iraj Kani (1994). "Riding on the Smile," Risk, 7, 32-39.

Dupire, Bruno (1994). "Pricing with a Smile," Risk, 7, 18-20.

Grandmont, J.M. (1993). "Expectations Driven Nonlinear Business Cycles," Proceedings of


the German Academy of Sciences, Westdeutscher Verlag.

Kapur, J.N., and H.K. Kesavan (1992). Entropy Optimization Principles with Applications.
New York: Academic Press Harcourt Brace Jovanovich.

Rubinstein, Mark (1994). "Implied Binomial Trees," Journal of Finance 49 (3), 771-818.

Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. New York:
Chapman and Hall.

Stutzer, Michael (1996). "A Simple Nonparametric Approach to Derivative Security


Valuation," Journal of Finance, 51, 1633-52.

Zou, Joseph and Emanuel Derman (1999). "Strike-Adjusted Spread: A New Metric for
Estimating the Value of Equity Options," Goldman Sachs Quantitative Strategies
Research Notes.

Figure 1

Comparison of Historical PDF of S&P100 returns (solid line) to Normal (dashed line)

Historical pdf is estimated with an Epanechnikov kernel with optimal bandwidth, as described in the text.

Figure 2

Implied Binomial Tree for one-year options, illustrated with 5 steps

[Five-step binomial tree diagram: at each node the stock price, up-move probability, and Arrow-Debreu price are shown, with nodes A through J labeled as in the text.]

At each node, the top entry is the stock price (beginning at 100); the middle entry is the

probability of a subsequent upward move, pi ; the lowest entry at each node is the Arrow-

Debreu price, λi . The labels, A, B, C, ... J, refer to the nodes that are explicitly calculated in

the paper.

Table 1
Prices of 1-year European Call Options, with Implied Volatilities, from
nonparametric kernel estimation RNHD procedure
Strike/Price    Option Price    Black-Scholes Implied Volatility
80 24.20 0.270
85 20.05 0.245
90 16.18 0.227
91 15.45 0.224
92 14.73 0.221
93 14.03 0.218
94 13.35 0.215
95 12.68 0.213
96 12.04 0.211
97 11.41 0.209
98 10.80 0.207
99 10.21 0.205
100 9.64 0.203
101 9.09 0.201
102 8.56 0.200
103 8.05 0.198
104 7.56 0.197
105 7.09 0.195
106 6.63 0.194
107 6.20 0.193
108 5.79 0.191
109 5.40 0.190
110 5.03 0.189
115 3.47 0.185
120 2.32 0.182

Table 2
American Call Prices from 50-step binomial tree with historically-based volatility skew
Strike/Price    American Call Price (historically-based volatility skew)    CRR Implied Volatility
80 26.99 0.387
85 22.79 0.345
90 18.80 0.312
91 18.02 0.307
92 17.24 0.301
93 16.46 0.293
94 15.74 0.287
95 15.05 0.282
96 14.36 0.276
97 13.66 0.271
98 12.97 0.267
99 12.32 0.263
100 11.73 0.261
101 11.13 0.256
102 10.54 0.252
103 9.95 0.247
104 9.38 0.243
105 8.89 0.242
106 8.41 0.240
107 7.93 0.239
108 7.45 0.235
109 6.97 0.231
110 6.58 0.229
115 4.74 0.221
120 3.30 0.214

