# Notes on the Kalman Filter

There is an observed variable $y_t$ whose behavior is driven by an unobserved factor $\alpha_t$ and by an uncorrelated idiosyncratic component. We can set up an example based on a time series $y_1, \dots, y_n$ ordered in time. The basic way of representing such a series is the additive linear model

$$y_t = \alpha_t + \varepsilon_t, \qquad t = 1, \dots, n$$

To develop an appropriate model for $\alpha_t$ we need the concept of a random walk: a scalar series $\alpha_t$ determined by the relation $\alpha_{t+1} = \alpha_t + \eta_t$, where the $\eta_t$ are independent and identically distributed random variables with zero mean and variance $\sigma_\eta^2$. Let us consider the simple model

$$
\begin{aligned}
y_t &= \alpha_t + \varepsilon_t, & \varepsilon_t &\sim N(0, \sigma_\varepsilon^2) \\
\alpha_{t+1} &= \alpha_t + \eta_t, & \eta_t &\sim N(0, \sigma_\eta^2)
\end{aligned}
$$
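As a quick sketch, the local level model above can be simulated in a few lines of Python. The sample size, variance values, and seed below are illustrative assumptions, not values from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed (assumed)
n = 200                                   # illustrative sample size
sigma_eps, sigma_eta = 1.0, 0.5           # assumed standard deviations

eta = rng.normal(0.0, sigma_eta, size=n)
alpha = np.cumsum(eta)                    # random walk: alpha_{t+1} = alpha_t + eta_t
y = alpha + rng.normal(0.0, sigma_eps, size=n)  # observation: y_t = alpha_t + eps_t
```

Plotting `y` against `alpha` makes the structure visible: the observations scatter around a slowly wandering level.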

We can write this model in its state space form:

$$
\begin{aligned}
Y_t &= Z_t \Omega_t + W_t, & W_t &\sim N(0, H_t) \\
\Omega_{t+1} &= T_t \Omega_t + K_t \Pi_t, & \Pi_t &\sim N(0, Q_t)
\end{aligned}
$$

This is the local level model, where $Y_t$ is a vector of observations and $\Omega_t$ is unobserved. The idea underlying the model is that the development of the system over time is determined by $\alpha_t$ according to the second equation of the system. But because $\alpha_t$ cannot be observed directly, we must base the analysis on the observations $y_t$. The object of the methodology is therefore to infer the relevant properties of the $\alpha_t$ from knowledge of the observations $y_1, \dots, y_n$. The first equation of the system is called the observation equation and the second the state equation. The matrices $Z_t$, $H_t$, $T_t$, $Q_t$ and $K_t$ are initially assumed to be known; the unknown elements to be estimated are the $\alpha_t$. The dimensions of the state space model are given in the tables below:

| Vector | Dimension |
|---|---|
| $Y_t$ | nobs × 1 |
| $\Omega_t$ | ns × 1 |
| $W_t$ | nobs × 1 |
| $\Pi_t$ | dim × 1 |

| Matrix | Dimension |
|---|---|
| $Z_t$ | nobs × ns |
| $T_t$ | ns × ns |
| $K_t$ | ns × dim |
| $Q_t$ | dim × dim |
| $H_t$ | nobs × nobs |
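For the local level model these dimensions all collapse to scalars: with one observed series and one state, $Z_t = T_t = K_t = 1$, $H_t = \sigma_\varepsilon^2$ and $Q_t = \sigma_\eta^2$. A minimal sketch of this mapping (the variance values are assumptions):

```python
import numpy as np

sigma_eps2, sigma_eta2 = 1.0, 0.25   # assumed variances

# Local level model cast into the general state space form:
# Y_t = Z Omega_t + W_t,  Omega_{t+1} = T Omega_t + K Pi_t
Z = np.ones((1, 1))                  # nobs x ns
T = np.ones((1, 1))                  # ns x ns
K = np.ones((1, 1))                  # ns x dim
H = sigma_eps2 * np.ones((1, 1))     # nobs x nobs
Q = sigma_eta2 * np.ones((1, 1))     # dim x dim
```

Keeping even the scalar case in matrix shape makes it easy to swap in richer models later without changing the filter code.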


The Kalman filter is a recursive algorithm that produces an optimal forecast of $\alpha_t$ given all available information at time $t-1$. This optimal forecast is the conditional mean

$$\hat\alpha_{t|t-1} = E[\alpha_t \mid I_{t-1}]$$

We also make use of the conditional variance of the $\alpha$:

$$P_{t|t-1} = E[(\alpha_t - \hat\alpha_{t|t-1})(\alpha_t - \hat\alpha_{t|t-1})']$$

So the crucial question is: once we have the value of $\hat\alpha_{t|t-1}$ and the associated variance $P_{t|t-1}$, how do we calculate the forecast $\hat\alpha_{t+1|t}$ and the associated variance $P_{t+1|t}$ for the next observation? From the state equation we know that $\alpha_{t+1} = T_t\alpha_t + \eta_{t+1}$. For the following, two assumptions are made: $T_t$ and $Q_t$ are known, and $\eta_{t+1}$ is normally distributed. Then it follows that

$$\hat\alpha_{t+1|t} = E[\alpha_{t+1} \mid I_t] = E[T_t\alpha_t + \eta_{t+1} \mid I_t] = T_t\, E[\alpha_t \mid I_t]$$

So the first step is to obtain $\hat\alpha_{t|t} = E[\alpha_t \mid I_t]$.

The Kalman filter is based on the properties of the joint normal distribution. Suppose

$$\begin{pmatrix} y_t \\ \alpha_t \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{pmatrix} \right)$$

Then the distribution of $\alpha_t$ conditional on $y_t$ is $N(m, \Sigma)$, where

$$m = \mu_2 + \Omega_{21}\Omega_{11}^{-1}(y_t - \mu_1)$$
$$\Sigma = \Omega_{22} - \Omega_{21}\Omega_{11}^{-1}\Omega_{12}$$

The crucial result here is that the Kalman filter consists in applying this result sequentially to each observation $t = 1, 2, \dots$; for the present case, to the joint normal distribution of $y_t$ and $\alpha_t$.

Let us go back to the observation equation:

$$Y_t = Z_t\alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, H_t)$$

From this it follows that the optimal forecast of $Y_t$ given $I_{t-1}$ is

$$E[Y_t \mid I_{t-1}] = Z_t\hat\alpha_{t|t-1}$$

and the corresponding forecast error is

$$Y_t - E[Y_t \mid I_{t-1}] = (Z_t\alpha_t + \varepsilon_t) - Z_t\hat\alpha_{t|t-1} = Z_t(\alpha_t - \hat\alpha_{t|t-1}) + \varepsilon_t$$

Hence the variance of the forecast error is equal to

$$V[Y_t - E[Y_t \mid I_{t-1}]] = Z_t P_{t|t-1} Z_t' + H$$
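The conditioning result for the joint normal can be checked numerically. The sketch below (the function name and test values are mine, not from the notes) computes $m$ and $\Sigma$ for a partitioned joint normal:

```python
import numpy as np

def conditional_normal(mu1, mu2, O11, O12, O21, O22, y):
    """Mean m and variance Sigma of alpha_t | y_t when (y_t, alpha_t) is jointly normal."""
    O11_inv = np.linalg.inv(O11)
    m = mu2 + O21 @ O11_inv @ (y - mu1)     # m = mu2 + O21 O11^{-1} (y - mu1)
    Sigma = O22 - O21 @ O11_inv @ O12       # Sigma = O22 - O21 O11^{-1} O12
    return m, Sigma

# scalar example with assumed numbers: conditioning shrinks the variance
m, S = conditional_normal(np.zeros(1), np.zeros(1),
                          np.array([[2.0]]), np.array([[1.0]]),
                          np.array([[1.0]]), np.array([[1.0]]),
                          np.array([2.0]))
# m = 0 + 1/2 * 2 = 1.0, Sigma = 1 - 1/2 = 0.5
```

Note that $\Sigma$ does not depend on the realized $y_t$, only on the covariance blocks; this is what makes the variance recursion of the filter deterministic given the parameters.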

So now we have all the ingredients for the joint distribution of $Y_t$ and $\alpha_t$ conditional on $I_{t-1}$:

$$\begin{pmatrix} y_t \mid I_{t-1} \\ \alpha_t \mid I_{t-1} \end{pmatrix} \sim N\left( \begin{pmatrix} Z_t\hat\alpha_{t|t-1} \\ \hat\alpha_{t|t-1} \end{pmatrix}, \begin{pmatrix} Z_t P_{t|t-1} Z_t' + H & Z_t P_{t|t-1} \\ P_{t|t-1} Z_t' & P_{t|t-1} \end{pmatrix} \right)$$

Using the results on the normal distribution given before, it follows that $\alpha_t \mid Y_t, I_{t-1} \sim N(\hat\alpha_{t|t}, P_{t|t})$, where

$$\hat\alpha_{t|t} = \hat\alpha_{t|t-1} + P_{t|t-1}Z_t'(Z_t P_{t|t-1} Z_t' + H)^{-1}(Y_t - Z_t\hat\alpha_{t|t-1})$$
$$P_{t|t} = P_{t|t-1} - P_{t|t-1}Z_t'(Z_t P_{t|t-1} Z_t' + H)^{-1}Z_t P_{t|t-1}$$

Recalling the state equation, $\alpha_t = T_t\alpha_{t-1} + K_t\Pi_t$, it follows that

$$\hat\alpha_{t+1|t} = T\hat\alpha_{t|t}$$
$$P_{t+1|t} = T P_{t|t} T' + Q$$

Combining these with the previous two equations, we get the recursions

$$\hat\alpha_{t+1|t} = T\hat\alpha_{t|t} = T\left(\hat\alpha_{t|t-1} + P_{t|t-1}Z_t'(Z_t P_{t|t-1}Z_t' + H)^{-1}(Y_t - Z_t\hat\alpha_{t|t-1})\right)$$
$$P_{t+1|t} = T P_{t|t} T' + Q = T\left(P_{t|t-1} - P_{t|t-1}Z_t'(Z_t P_{t|t-1}Z_t' + H)^{-1}Z_t P_{t|t-1}\right)T' + Q$$

Initialization. We need some starting values to start the recursion: values for $\hat\alpha_{1|0}$ and $P_{1|0}$. It is often assumed that

$$\hat\alpha_{1|0} = E[\alpha_t] = 0, \qquad P_{1|0} = E[\alpha_t\alpha_t']$$

Parameter estimation. If the parameters $Z$, $T$, $H$, $Q$ are unknown, then we need to estimate them. We can use maximum likelihood for this purpose. Compared with the linear regression example, here we face the complication that consecutive observations $y_t$ and $y_{t+1}$ are not independent because of the dynamic specification of the state equation for $\alpha_{t+1}$. Therefore the joint pdf of all the observations $y_1, \dots, y_T$ cannot be written as the product of individual pdfs $f(y_t, \theta)$:

$$f(y_1, \dots, y_T \mid \theta) \neq \prod_{t=1}^T f(y_t \mid \theta)$$

To solve this problem we use $f(A \mid B) = f(A \cap B)/f(B)$: the joint pdf can be written as the product of conditional pdfs

$$f(y_1, \dots, y_T \mid \theta) = f_1(y_1 \mid \theta)\, f_2(y_2 \mid y_1, \theta) \cdots f_T(y_T \mid y_{T-1}, \dots, y_1, \theta)$$
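The prediction and updating recursions can be sketched compactly for the scalar local level case ($Z = T = 1$). The function name, the short test series, and the large diffuse-style value for $P_{1|0}$ are my assumptions:

```python
import numpy as np

def kalman_filter(y, H, Q, a1=0.0, P1=1e7):
    """One-step-ahead recursions for the local level model (Z = T = 1)."""
    n = len(y)
    a = np.empty(n + 1)
    P = np.empty(n + 1)
    a[0], P[0] = a1, P1                       # initialization: alpha_{1|0}, P_{1|0}
    for t in range(n):
        F = P[t] + H                          # Z P Z' + H: forecast-error variance
        v = y[t] - a[t]                       # forecast error Y_t - Z alpha_{t|t-1}
        a[t + 1] = a[t] + P[t] / F * v        # T (alpha_{t|t-1} + P Z' F^{-1} v)
        P[t + 1] = P[t] - P[t] ** 2 / F + Q   # T (P - P Z' F^{-1} Z P) T' + Q
    return a, P

# usage on a short assumed series
a_pred, P_pred = kalman_filter(np.array([1.0, 0.5, 1.5]), H=1.0, Q=0.25)
```

After the first observation the huge initial variance collapses to roughly $H + Q$: with $P_{1|0}$ very large the first update essentially sets $\hat\alpha_{2|1} \approx y_1$, which is one common way of handling an uninformative prior on the initial state.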

We can reconstruct the value of $\hat\alpha_{t|t-1}$ using the Kalman filter, but it depends on the unknown parameters $\theta$. Recalling the observation equation $Y_t = Z\alpha_t + \varepsilon_t$ with $\varepsilon_t \sim N(0, H_t)$, we derived

$$E[y_t \mid I_{t-1}] = \mu_t(\theta) = Z\hat\alpha_{t|t-1}$$
$$V[y_t \mid I_{t-1}] = \Sigma_t(\theta) = Z P_{t|t-1} Z' + H$$

Furthermore, the conditional distribution of $y_t$ is normal, so the log-likelihood becomes

$$\sum_{t=1}^T \log f(y_t \mid I_{t-1}, \theta) = -\frac{Tk}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^T \log|\Sigma_t(\theta)| - \frac{1}{2}\sum_{t=1}^T (y_t - \mu_t(\theta))'[\Sigma_t(\theta)]^{-1}(y_t - \mu_t(\theta)) \qquad (A)$$

Most important remark: note the loop. In order to implement the ML estimation of the unknown parameters, we need $\hat\alpha_{t|t-1}$, which enters the expression $\mu_t(\theta)$; but the Kalman filter that produces $\hat\alpha_{t|t-1}$ itself depends on $\theta$. We can break the loop as follows:

1. Make an initial guess $\theta_0$ for the parameters $Z$, $T$, $H$ and $Q$, and run the Kalman filter to get an estimate of $\hat\alpha_{t|t-1}(\theta_0)$.
2. Use this $\hat\alpha_{t|t-1}(\theta_0)$ in the log-likelihood (A) to find a new estimate $\theta_1$ for which the value of (A) is larger than with $\theta_0$.
3. Iterate between the two previous steps until we find the optimal parameters, which are those that maximize the log-likelihood (A).

One important issue relates to the last iteration, where we obtain the optimal $\theta$: the iteration stops there, and we do not recalculate $\hat\alpha_{t|t-1}$ for these optimal parameters. This has to be fixed. The estimates we obtain in this fashion are consistent and asymptotically normal.

This document is based on professors' lecture slides.