Economics
- Topics 2 and 3 -
A. Duplinskiy
What we have done so far
2 / 77 Week 2 Introduction
This week
Ergodicity
Assignment
Multivariate Regression: More than 1 feature
Clustering: K-Means, Hierarchical, and Affinity Propagation
Logistic Regression
Next Week
Homework: Lecture 1
Exercises 1.2 and 1.3
Exercise 1.6
Exercise 1.6: Show that ρX (t + h, t) = ρX (h) if the time series
is weakly stationary.
Answer: The ACF is given by

ρX(t + h, t) = Cov(Xt+h, Xt) / √(Var(Xt+h) Var(Xt))

Under weak stationarity, Cov(Xt+h, Xt) = γX(h) and Var(Xt+h) = Var(Xt) = σX², so ρX(t + h, t) = γX(h)/σX², which depends on h only. Hence ρX(t + h, t) = ρX(h).
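A quick numerical illustration of this fact (my own sketch, not part of the exercise): for a weakly stationary AR(1) process the ACF depends only on the lag h, and the sample ACF of one long simulated path matches the theoretical ρX(h) = φ^h.

```python
import numpy as np

# Illustration (assumption: a stationary AR(1), X_t = phi*X_{t-1} + eps_t,
# which is not part of the exercise). Its theoretical ACF is phi**h,
# independent of t, as weak stationarity requires.
rng = np.random.default_rng(0)
phi, n = 0.5, 200_000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

def sample_acf(x, h):
    """Sample autocorrelation at lag h >= 1."""
    xc = x - x.mean()
    return (xc[h:] * xc[:-h]).sum() / (xc * xc).sum()

print(sample_acf(x, 1))  # close to phi   = 0.5
print(sample_acf(x, 2))  # close to phi^2 = 0.25
```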
Exercise 1.11
Hence:

ρX(t, t − h) = γX(t, t − h) / √(Var(Xt) Var(Xt−h)) = (t − h)σε² / √((tσε²)((t − h)σε²)) = √(t − h) / √t
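This result can be checked by simulation (a sketch under my own choices of t, h, and σε = 1; not part of the exercise): the correlation between a random walk at times t and t − h should be √((t − h)/t).

```python
import numpy as np

# Monte Carlo check: for a random walk X_t = eps_1 + ... + eps_t,
# Corr(X_t, X_{t-h}) = sqrt((t - h) / t).  Chosen values: t=100, h=36.
rng = np.random.default_rng(1)
t, h, reps = 100, 36, 50_000
steps = rng.standard_normal((reps, t))
walks = steps.cumsum(axis=1)                 # walks[:, k] = X_{k+1}
x_t, x_tmh = walks[:, t - 1], walks[:, t - h - 1]
est = np.corrcoef(x_t, x_tmh)[0, 1]
theory = np.sqrt((t - h) / t)                # sqrt(64/100) = 0.8
print(round(est, 3), round(theory, 3))
```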
Exercise 1.17
(a) (continued)
Var(Wt) = Var(Xt + Yt) = Var(Xt) + Var(Yt) = σX² + σY²

Cov(Wt, Wt−h) = Cov(Xt + Yt, Xt−h + Yt−h)
= Cov(Xt, Xt−h) + Cov(Xt, Yt−h) + Cov(Yt, Xt−h) + Cov(Yt, Yt−h)
= γX(h) + 0 + 0 + γY(h) = γX(h) + γY(h).
Since expectation, variance and covariance are all finite and
constant over time we conclude that {Wt } is weakly stationary.
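A numerical sanity check of the additivity of the autocovariances (my own illustration, not in the slides): for two independent stationary series the sample autocovariance of the sum is close to the sum of the sample autocovariances.

```python
import numpy as np

# Illustration (assumption: two independent stationary AR(1) series;
# the AR coefficients 0.5 and -0.3 are arbitrary choices).
rng = np.random.default_rng(0)
n = 200_000

def ar1(phi, n, rng):
    """Simulate an AR(1) path; burn-in is negligible for large n."""
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def acvf(x, h):
    """Sample autocovariance at lag h >= 1."""
    xc = x - x.mean()
    return (xc[h:] * xc[:-h]).sum() / len(xc)

x, y = ar1(0.5, n, rng), ar1(-0.3, n, rng)
w = x + y
print(acvf(w, 1), acvf(x, 1) + acvf(y, 1))   # approximately equal
```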
Useful info
Doing extra: Go ahead, but make sure you know what you are doing.
In this assignment you have data about IDs and when and
where they were issued. Do not worry: there are no real ID
numbers, and I modified the data!
Assignment objectives:
1 Work with text data and learn how to cluster text data.
2 Get a feeling for how to calculate the value of your work.
3 Visualize data and use clustering for data cleaning.
Goal:
Assign a unique index to each observation.
To minimize the within-cluster variation given the number
of clusters.
WCV(Ck) = (1/|Ck|) Σ_{xi, xi′ ∈ Ck} Σ_{j=1}^{p} (xij − xi′j)²
Procedure:
1 Randomly assign a number, from 1 to K, to each of the
observations. These serve as initial cluster assignments for
the observations.
2 Iterate until the cluster assignments stop changing:
1 For each of the K clusters, compute the cluster centroid.
The kth cluster centroid is the vector of the p feature means
for the observations in the kth cluster.
2 Assign each observation to the cluster whose centroid is
closest (where closest is defined using Euclidean distance).
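The two-step procedure above can be sketched directly in NumPy (a minimal illustration on hypothetical toy data; running several random restarts and keeping the solution with the lowest within-cluster variation is my addition, since a single random initialization can get stuck):

```python
import numpy as np

# Sketch of the K-means procedure described above: random initial
# assignments, then alternate centroid updates and nearest-centroid
# re-assignment until the assignments stop changing.
def kmeans(X, K, n_init=10, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    best_labels, best_wcss = None, np.inf
    for _ in range(n_init):
        # Step 1: randomly assign a cluster 1..K (here 0..K-1) to each point.
        labels = rng.integers(0, K, size=len(X))
        for _ in range(max_iter):
            # Step 2.1: the k-th centroid is the vector of p feature means.
            centroids = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k)
                else X[rng.integers(len(X))]          # reseed an empty cluster
                for k in range(K)])
            # Step 2.2: assign each point to the closest centroid (Euclidean).
            dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            new_labels = dist.argmin(axis=1)
            if np.array_equal(new_labels, labels):    # stopped changing
                break
            labels = new_labels
        wcss = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
                   for k in range(K) if np.any(labels == k))
        if wcss < best_wcss:
            best_labels, best_wcss = labels, wcss
    return best_labels, best_wcss

# Toy data (assumption): two well-separated blobs of 50 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, wcss = kmeans(X, K=2)
```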
No simple answer:
Number given by domain knowledge.
Elbow method.
Silhouette Score.
https://scikit-learn.org/stable/modules/clustering.html
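A small scikit-learn sketch of the elbow method and silhouette score (the toy data with three well-separated blobs and the range of cluster counts are my own assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy data (assumption): three well-separated blobs of 40 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in (0.0, 4.0, 8.0)])

for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sil = silhouette_score(X, km.labels_)
    # inertia_ = total within-cluster sum of squares; for the elbow method,
    # look for the k after which it stops dropping sharply.
    print(k, round(km.inertia_, 1), round(sil, 3))
```

On data like this, both criteria should point to k = 3.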
Implementation:
Dependent variable is binary ∈ {0, 1} – the model gives the
probability of success.
Use statsmodels and sklearn.
Note:
F contains the empty set ∅
F contains each singleton event, {heads} and {tails}
F contains the event space E = {heads, tails}
Hence: Probability measure P must define a probability of
Nothing happening P (∅)
Drawing heads P (heads)
Drawing tails P (tails)
Drawing either heads or tails P ({heads, tails})
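The note above can be made concrete for the coin-toss example (a toy check; the fair-coin probabilities are an assumption):

```python
# Toy check: for E = {heads, tails}, the sigma-field is
# F = {∅, {heads}, {tails}, E}, and P assigns a probability to each event.
E = frozenset({"heads", "tails"})
F = {frozenset(), frozenset({"heads"}), frozenset({"tails"}), E}

# Assumption: a fair coin, so P(A) = |A| / |E|.
P = {A: len(A) / len(E) for A in F}

# F is a sigma-field: it contains ∅ and E, and is closed
# under complement and union.
assert frozenset() in F and E in F
assert all(E - A in F for A in F)
assert all(A | B in F for A in F for B in F)

# P is a probability measure: P(∅) = 0, P(E) = 1,
# and P is additive on disjoint events.
assert P[frozenset()] == 0 and P[E] == 1
assert P[frozenset({"heads"})] + P[frozenset({"tails"})] == P[E]
```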
78 / 77 Probability Models Introduction
σ-fields (σ-algebras)
F(a) = P(X ≤ a) ∀ a ∈ R.