
Preprocessing for Financial Data

Instructor: 翁詠祿
Date: 10/18/2022, 11/01/2022

Department of Electrical Engineering


National Tsing-Hua University, HsinChu, Taiwan
Outline
q Data Preprocessing (Ch. 4, [2])
q Dimensionality Reduction For Data (Ch. 5, [2])
q Financial Data Structure (Ch. 2, [1][3])
q Labeling (Ch. 3, [1][3])

[1] Marcos Lopez de Prado, Advances in Financial Machine Learning, Wiley, 2018
[2] Sebastian Raschka and Vahid Mirjalili, Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2, 3rd Edition, 2019
[3] Stefan Jansen, Machine Learning for Algorithmic Trading, Second Edition, 2020

Data Preprocessing

Department of Electrical Engineering


National Tsing-Hua University, HsinChu, Taiwan
Outline
q Dealing with missing data
q Handling categorical data
q Feature scaling
q Selecting meaningful features
q Assessing feature importance with random forests

Outline
q Dealing with missing data
Ø Identifying missing values in tabular data
Ø Eliminating training data with missing values
Ø Imputing missing values
q Handling categorical data
q Feature scaling
q Selecting meaningful features
q Assessing feature importance with random forests

Dealing with missing data
q It is not uncommon in real-world applications for training examples to be missing one or more values.

q There could have been an error in the data collection process, certain measurements may not be applicable, or particular fields could simply have been left blank ("NaN" or NULL) in a survey.

q There are two ways to deal with missing data:
1. Removing entries (rows or columns) from the dataset.
2. Imputing missing values from other training examples and features.

Identifying missing values in tabular data
q Example: consider the following data matrix:

q There are two "NaN" entries, which indicate missing data.
Ø Columns C and D each contain one "NaN".
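A minimal pandas sketch of this check (the small DataFrame below is a hypothetical stand-in for the matrix shown on the slide):

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 matrix with two NaN entries, mirroring the slide's example
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0],
                   [5.0, 6.0, np.nan, 8.0],
                   [10.0, 11.0, 12.0, np.nan]],
                  columns=['A', 'B', 'C', 'D'])

print(df.isnull().sum())   # number of missing values per column (C: 1, D: 1)
```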

Eliminating training data with missing values
q There are five ways to remove missing values:

Original data
1. Remove rows that contain missing values:

2. Remove columns that contain missing values:

Eliminating training data with missing values
3. Only drop rows where all columns are NaN:

4. Drop rows that have fewer than 3 non-NaN values:

5. Only drop rows where NaN appears in specific columns (a pandas sketch covering all five options follows below):
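Assuming the same hypothetical DataFrame df from the sketch above, the five removal options map onto pandas DataFrame.dropna as follows:

```python
# 1. Remove rows that contain missing values
df.dropna(axis=0)

# 2. Remove columns that contain missing values
df.dropna(axis=1)

# 3. Only drop rows where all columns are NaN
df.dropna(how='all')

# 4. Drop rows that have fewer than 3 non-NaN values
df.dropna(thresh=3)

# 5. Only drop rows where NaN appears in specific columns (e.g., 'C')
df.dropna(subset=['C'])
```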

Imputing missing values
q Often, the removal of training examples or the dropping of entire feature columns is simply not feasible, because we might lose too much valuable data.

q One of the most common techniques is mean imputation, where we simply replace the missing value with the mean value of the entire feature column.

Original data
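A minimal sketch of mean imputation with scikit-learn's SimpleImputer, assuming a small array with the same missing-value pattern:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with missing entries
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, np.nan, 8.0],
              [10.0, 11.0, 12.0, np.nan]])

# Replace each NaN with the mean of its feature column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```

The pandas one-liner `df.fillna(df.mean())` achieves the same column-wise mean imputation.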

Outline
q Dealing with missing data
q Handling categorical data
Ø Categorical data encoding
Ø Mapping ordinal features
Ø Encoding class labels
Ø One-hot encoding on nominal features
q Partitioning a dataset
q Feature scaling
q Selecting meaningful features
q Assessing feature importance with random forests

Categorical data encoding
q Categorical data can be further divided into ordinal and nominal features.
Ø Ordinal features: categorical values that can be sorted or ordered.
Ø Nominal features: categorical values that don't imply any order.

q For example, t-shirt size would be an ordinal feature, but t-shirt color would be a nominal feature.
Ø Size defines an order: XL > L > M.
Ø Color does not define an order (e.g., red versus blue).

Categorical data encoding
q Example dataset containing nominal and ordinal features:

q This dataset contains:
Ø A nominal feature (color).
Ø An ordinal feature (size).
Ø A numerical feature (price).

q The class labels are stored in the last column.

Mapping ordinal features
q To make sure that ordinal features are interpreted correctly, we need to convert the categorical string values into integers.
Ø XL → 3, L → 2, M → 1
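A minimal sketch of this mapping, assuming the three-row color/size/price example used in [2]:

```python
import pandas as pd

# Hypothetical dataset with a nominal (color), an ordinal (size) and a numerical (price) feature
df = pd.DataFrame([['green', 'M', 10.1, 'class2'],
                   ['red', 'L', 13.5, 'class1'],
                   ['blue', 'XL', 15.3, 'class2']],
                  columns=['color', 'size', 'price', 'classlabel'])

# Map the ordinal size feature to integers: XL > L > M
size_mapping = {'XL': 3, 'L': 2, 'M': 1}
df['size'] = df['size'].map(size_mapping)
print(df)
```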

Encoding class labels
q Many machine learning libraries require that class labels are encoded as integer values.

q To encode the class labels, we can use an approach similar to the mapping of ordinal features discussed previously.

q We need to remember that class labels are not ordinal, so it doesn't matter which integer we assign to a particular string label.

Encoding class labels
q Transform the class labels into integers:
Ø class1 → 0 ; class2 → 1
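A minimal sketch using scikit-learn's LabelEncoder, assuming the class-label column from the example above:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical class-label column from the previous example
y = np.array(['class2', 'class1', 'class2'])

le = LabelEncoder()
y_encoded = le.fit_transform(y)         # class1 -> 0, class2 -> 1
print(y_encoded)                        # [1 0 1]
print(le.inverse_transform(y_encoded))  # back to the string labels
```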

One-hot encoding on nominal features
q If we encode a nominal feature (e.g., color) as plain integers, a learning algorithm will assume an order that does not exist, because nominal features don't have any particular order.

q The idea behind one-hot encoding is to create a new dummy feature for each unique value in the nominal feature column.

q In this example, we need to convert color into a one-hot encoding.

One-hot encoding on nominal features
q Apply a one-hot encoder to the color column:
Ø blue = 0 ; green = 1 ; red = 2

q Transform columns in a multi-feature array:
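A minimal sketch of one-hot encoding the nominal color column; both the scikit-learn and the pandas routes are shown, assuming the same toy data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame([['green', 2, 10.1],
                   ['red', 3, 13.5],
                   ['blue', 1, 15.3]],
                  columns=['color', 'size', 'price'])

# Option 1: one-hot encode only the nominal 'color' column, pass the rest through
ct = ColumnTransformer([('onehot', OneHotEncoder(), ['color'])],
                       remainder='passthrough')
print(ct.fit_transform(df))

# Option 2: pandas shortcut that creates one dummy column per color value
print(pd.get_dummies(df, columns=['color']))
```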

Outline
q Dealing with missing data
q Handling categorical data
q Feature scaling
Ø Normalization and standardization
q Selecting meaningful features
q Assessing feature importance with random forests

Feature scaling
q Feature scaling is a crucial step that can easily be
forgotten.

q Decision trees and random forests are two of the very few machine learning algorithms where we don't need to worry about feature scaling (they are scale-invariant).

q The majority of machine learning and optimization algorithms behave much better if features are on the same scale (e.g., gradient descent optimization).

Normalization and Standardization
q There are two common approaches to bring different
features onto the same scale:
1. Normalization (min-max scaling):

$$x_{norm}^{(i)} = \frac{x^{(i)} - x_{min}}{x_{max} - x_{min}}, \quad \text{the range is } [0, 1]$$

2. Standardization:

$$x_{std}^{(i)} = \frac{x^{(i)} - \mu_x}{\sigma_x}$$
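A minimal sketch of both approaches with scikit-learn's MinMaxScaler and StandardScaler on a hypothetical feature matrix (in practice the scalers should be fit on the training data only and then applied to the test data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical feature matrix (rows = examples, columns = features)
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

X_norm = MinMaxScaler().fit_transform(X)   # min-max scaling into [0, 1]
X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature

print(X_norm)
print(X_std)
```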

Outline
q Dealing with missing data
q Handling categorical data
q Partitioning a dataset
q Feature scaling
q Selecting meaningful features
Ø L1 and L2 regularization as penalties
Ø A geometric interpretation of L2 regularization
Ø Sparse solutions with L1 regularization
Ø Sequential feature selection algorithms
q Assessing feature importance with random forests

Selecting meaningful features
q If a model performs much better on a training dataset
than on the test dataset, this observation is a strong
indicator of overfitting.

q Common solutions to reduce the generalization error are as follows:
Ø Collect more training data
Ø Introduce a penalty for complexity via regularization
Ø Choose a simpler model with fewer parameters
Ø Reduce the dimensionality of the data

L1 and L2 regularization as penalties
q L2 regularization is one approach to reduce the complexity of a model by penalizing large individual weights:

$$L2: \; \|\mathbf{w}\|_2^2 = \sum_{j=1}^{m} w_j^2$$

q L1 regularization usually yields sparse feature vectors, and most feature weights will be zero:

$$L1: \; \|\mathbf{w}\|_1 = \sum_{j=1}^{m} |w_j|$$

q Sparsity can be useful in practice if we have a high-dimensional dataset with many features that are irrelevant.

A geometric interpretation of L2 regularization
q Without penalty :

A geometric interpretation of L2 regularization
q With L2 regularization :

Sparse solutions with L1 regularization

Sparse solutions with L1 regularization
q Before applying regularization, we use the Wine dataset to train the model:
Ø 13 different features.
Ø 178 wine examples.
Ø 3 classes with (class 0, class 1, class 2) = (59, 71, 48) examples.

Sparse solutions with L1 regularization
q All feature weights will be zero if we penalize the model with a strong regularization parameter (C < 0.01); C is the inverse of the regularization parameter λ.
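A minimal sketch of this effect on the Wine dataset, assuming an L1-penalized logistic regression where C is varied (parameter names follow scikit-learn):

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Smaller C = stronger regularization = more weights pushed to exactly zero
for C in (0.01, 0.1, 1.0, 10.0):
    lr = LogisticRegression(penalty='l1', C=C, solver='liblinear')
    lr.fit(X_std, y)
    n_zero = (lr.coef_ == 0).sum()
    print(f"C={C}: {n_zero} of {lr.coef_.size} weights are zero")
```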

Sequential feature selection algorithms
q Sequential feature selection:
Ø Greedy search algorithms.
• Select a subset of the original features.
• An alternative way to reduce the complexity of the model and avoid overfitting.
• Greedy search algorithms make locally optimal choices.

(k < d; d = dimensionality of the initial feature space; k = dimensionality of the feature subspace)

q The motivation behind feature selection algorithms:
Ø Automatically select a subset of features that are most relevant to the problem.
Ø Improve computational efficiency.
Ø Reduce the generalization error of the model by removing irrelevant features or noise.
Ø This is especially useful for algorithms that don't support regularization.
Sequential feature selection algorithms
q A classic sequential feature selection algorithm is sequential backward selection (SBS).
Ø It aims to reduce the dimensionality of the initial feature subspace with a minimum decay in the performance of the classifier.

q The idea behind SBS:
Ø Remove features sequentially from the full feature subset until the new feature subspace contains the desired number of features.

q In order to determine which feature is to be removed at each stage, we need to define the criterion function, J, that we want to minimize.

Sequential feature selection algorithms
q The feature to be removed at each stage can simply be defined as the feature that maximizes this criterion; or, in simpler terms, at each stage we eliminate the feature that causes the least performance loss after removal.

q The preceding definition of SBS can be outlined in four simple steps (reduce the number of features to the desired number):
1. Initialize the algorithm with k = d, where d is the dimensionality of the full feature space.
2. For each feature in the current subset, calculate the criterion function J after removing that feature.
3. Remove the feature that maximizes the criterion; the dimensionality becomes k = k − 1.
4. Terminate if k equals the number of desired features; otherwise, go to Step 2.

Sequential feature selection algorithms
q SBS implementation using the KNN classifier:
Ø Dataset: Wine dataset.
Ø KNN with 5 neighbors; reduce the feature set to k = 3 features.

q The accuracy of KNN improved on the validation dataset as we reduced the number of features.
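scikit-learn's SequentialFeatureSelector is a greedy selector in the same spirit as SBS (it scores candidate subsets with cross-validation rather than a single validation split); a minimal sketch on the Wine dataset with a 5-neighbor KNN reduced to k = 3 features:

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train_std, X_test_std = scaler.transform(X_train), scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)
sbs = SequentialFeatureSelector(knn, n_features_to_select=3, direction='backward')
sbs.fit(X_train_std, y_train)

cols = sbs.get_support(indices=True)          # indices of the selected features
knn.fit(X_train_std[:, cols], y_train)
print("selected feature indices:", cols)
print("test accuracy:", knn.score(X_test_std[:, cols], y_test))
```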

Outline
q Dealing with missing data
q Handling categorical data
q Partitioning a dataset
q Feature scaling
q Selecting meaningful features
q Assessing feature importance with random forests

Assessing feature importance with random forests
q We can measure feature importance as the average impurity decrease computed from all decision trees in the forest, without making any assumptions about whether our data is linearly separable or not.

q Use the Wine dataset and rank its 13 features:
Ø 500 decision trees.
Ø Each number represents how important the corresponding feature is; the importances sum up to 1.
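A minimal sketch of this ranking on the Wine dataset with a 500-tree random forest:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

data = load_wine()
forest = RandomForestClassifier(n_estimators=500, random_state=1)
forest.fit(data.data, data.target)

importances = forest.feature_importances_        # impurity-based, sums to 1
for idx in np.argsort(importances)[::-1]:
    print(f"{data.feature_names[idx]:<30} {importances[idx]:.4f}")
```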

Assessing feature importance with random forests
q Based on the average impurity decrease in the 500 decision trees, we can conclude that these are the most discriminative features in the dataset:
1. Proline
2. Flavanoids
3. Color intensity
4. OD280/OD315 of diluted wines
5. Alcohol

q If we simplify the model by using only these five features, the prediction accuracy declines only slightly.

Dimensionality Reduction for Data

Department of Electrical Engineering


National Tsing-Hua University, HsinChu, Taiwan
Outline
q Eigen values and Eigen vectors
q Unsupervised dimensionality reduction via PCA
q Supervised data compression via LDA
q Using KPCA for nonlinear mappings

Outline
q Eigen values and Eigen vectors
Ø Properties of eigen values and eigen vectors
Ø Eigen decomposition
q Unsupervised dimensionality reduction via PCA
q Supervised data compression via LDA
q Using KPCA for nonlinear mappings

Eigen values and Eigen vectors
q When the number of features in a dataset is large, data processing can become complex.
Ø PCA and LDA extract features based on this idea of reducing the dimensionality.
• PCA: Principal Component Analysis
• LDA: Linear Discriminant Analysis
Ø Both PCA and LDA require knowledge of eigenvalues and eigenvectors.

Eigen values and Eigen vectors
q Let A be any square matrix. A scalar λ is called an eigenvalue of A if there exists a nonzero (column) vector v (an eigenvector) such that:

$$A\mathbf{v} = \lambda\mathbf{v} \;\Rightarrow\; \lambda\mathbf{v} - A\mathbf{v} = 0 \;\Rightarrow\; (\lambda I - A)\mathbf{v} = 0$$

q Characteristic equation:
Ø It is solved to find a matrix's eigenvalues (the condition for a non-zero solution v):

$$\det(\lambda I - A) = 0$$

q Note: each scalar multiple kv of an eigenvector belonging to λ is also an eigenvector:

$$A(k\mathbf{v}) = k(A\mathbf{v}) = k(\lambda\mathbf{v}) = \lambda(k\mathbf{v})$$

Eigen values and Eigen vectors
q Calculate the two eigenvalues and eigenvectors of

$$A = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}
\;\Rightarrow\;
\det\begin{bmatrix} 0.8-\lambda & 0.3 \\ 0.2 & 0.7-\lambda \end{bmatrix}
= \lambda^2 - \frac{3}{2}\lambda + \frac{1}{2} = (\lambda - 1)\left(\lambda - \frac{1}{2}\right)$$

$$(A - I)\mathbf{x_1} = 0 \;\Rightarrow\; A\mathbf{x_1} = \mathbf{x_1} \;\Rightarrow\; \mathbf{x_1} = (0.6, 0.4)^T$$
$$\left(A - \tfrac{1}{2}I\right)\mathbf{x_2} = 0 \;\Rightarrow\; A\mathbf{x_2} = \tfrac{1}{2}\mathbf{x_2} \;\Rightarrow\; \mathbf{x_2} = (1, -1)^T$$

q Check:
$$A\mathbf{x_1} = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}\begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix} = \mathbf{x_1} \;(\lambda = 1), \qquad A^{100}\mathbf{x_1} = A^{99}\mathbf{x_1} = \cdots = \mathbf{x_1}$$
$$A\mathbf{x_2} = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \tfrac{1}{2}\mathbf{x_2} \;\left(\lambda = \tfrac{1}{2}\right), \qquad A^{100}\mathbf{x_2} = \left(\tfrac{1}{2}\right)^{100}\mathbf{x_2} \;(\text{a small value})$$

Eigen values and Eigen vectors
q Other vectors do change direction, but all other vectors are combinations of the two eigenvectors.
Ø Decompose the first column of A into the eigenvectors:

$$\begin{bmatrix} 0.8 \\ 0.2 \end{bmatrix} = \mathbf{x_1} + 0.2\,\mathbf{x_2} = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix} + \begin{bmatrix} 0.2 \\ -0.2 \end{bmatrix}$$

$$A\begin{bmatrix} 0.8 \\ 0.2 \end{bmatrix} = A\mathbf{x_1} + 0.2\,A\mathbf{x_2} = \mathbf{x_1} + 0.2\cdot\tfrac{1}{2}\,\mathbf{x_2} = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix} + \begin{bmatrix} 0.1 \\ -0.1 \end{bmatrix} = \begin{bmatrix} 0.7 \\ 0.3 \end{bmatrix}$$

q First column of A^100:

$$A^{99}\begin{bmatrix} 0.8 \\ 0.2 \end{bmatrix} = \mathbf{x_1} + \left(\tfrac{1}{2}\right)^{99} 0.2\,\mathbf{x_2} = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix} + (\text{a very small vector})$$

Eigen values and Eigen vectors
q This shows that:
Ø The eigenvector x₁ is a steady state (since λ = 1).
Ø The eigenvector x₂ is a decaying mode (since λ = 0.5).

q The higher the power of A, the closer its columns approach the steady state.

Properties of eigen values and eigen vectors
q Let A be a square matrix. Then the following statements are equivalent:
1. A scalar λ is an eigenvalue of A.
2. The system (λI − A)v = 0 has nontrivial solutions (λI − A is a singular matrix).
3. There is a nonzero vector v in ℝⁿ such that Av = λv.
4. λ is a solution of the characteristic equation det(λI − A) = 0.

Eigen decomposition
q Some properties used in deriving the eigendecomposition:
Ø Similarity:
• Suppose A and B are square matrices for which there exists an invertible matrix P; then B is said to be obtained from A by a similarity transformation such that:

$$B = P^{-1}AP$$

Ø Linear dependence and independence:
• Let V be a vector space over a field ℱ.
• We say that the vectors v₁, v₂, …, vₘ in V are linearly dependent if there exist scalars a₁, a₂, …, aₘ in ℱ, not all of them 0, such that:

$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_m\mathbf{v}_m = 0$$

• Otherwise, we say that the vectors are linearly independent.

Eigen decomposition
q An n-square matrix A is similar to a diagonal matrix D if and only if A has n linearly independent eigenvectors.
Ø D = P⁻¹AP.
Ø The diagonal elements of D are the corresponding eigenvalues.
Ø P is the matrix whose columns are the eigenvectors (P is invertible).

Proof:
$$AP = A[\mathbf{x_1}, \ldots, \mathbf{x_n}] = [\lambda_1\mathbf{x_1}, \ldots, \lambda_n\mathbf{x_n}]
= [\mathbf{x_1}, \ldots, \mathbf{x_n}]\begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{bmatrix} = PD$$

q Therefore A = PDP⁻¹, and

$$A^k = (PDP^{-1})^k = PD^kP^{-1}$$

Example
q Let $A = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}$, and let $\mathbf{v_1} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$ and $\mathbf{v_2} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

$$A\mathbf{v_1} = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \mathbf{v_1}, \qquad
A\mathbf{v_2} = \begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix} = 4\mathbf{v_2}$$

$$P = \begin{bmatrix} 1 & 1 \\ -2 & 1 \end{bmatrix}, \qquad
P^{-1} = \begin{bmatrix} \tfrac{1}{3} & -\tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix}$$

$$D = P^{-1}AP = \begin{bmatrix} \tfrac{1}{3} & -\tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix}\begin{bmatrix} 3 & 1 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ -2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}$$

$$A^4 = PD^4P^{-1} = \begin{bmatrix} 1 & 1 \\ -2 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 256 \end{bmatrix}\begin{bmatrix} \tfrac{1}{3} & -\tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix} = \begin{bmatrix} 171 & 85 \\ 170 & 86 \end{bmatrix}$$
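The worked example can be checked numerically; a minimal NumPy sketch (np.linalg.eig returns the eigenvalues in no particular order and normalizes the eigenvectors, so they may differ from v₁ and v₂ by a scale factor):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)   # eigenvalues 1 and 4 (order not guaranteed)
print(eigvals)

P = eigvecs                           # columns are the eigenvectors
D = np.diag(eigvals)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))                  # A = P D P^-1 -> True
print(np.allclose(np.linalg.matrix_power(A, 4),
                  P @ np.linalg.matrix_power(D, 4) @ np.linalg.inv(P)))  # True
```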
Outline
q Eigen values and Eigen vectors
q Unsupervised dimensionality reduction via PCA
Ø The main steps behind PCA
Ø Extracting the PCA
Ø Total and explained variance
Ø Feature transformation
Ø PCA in scikit-learn
q Supervised data compression via LDA
q Using KPCA for nonlinear mappings

The main steps behind PCA
q PCA, which is principal component analysis, is an
unsupervised linear transformation technique that is
widely used across different fields, most prominently for
feature extraction and dimensionality reduction.

q PCA helps us to identify patterns in data based on the


correlation between features.

q In a nutshell, PCA aims to find the directions of maximum


variance in high-dimensional data and projects the data
onto a new subspace with equal or fewer dimensions
than the original one.

The main steps behind PCA
q The orthogonal axes (principal components) of the new
subspace can be interpreted as the directions of maximum
variance given the constraint that the new feature axes are
orthogonal to each other.
More variance à More information

The main steps behind PCA
q If we use PCA for dimensionality reduction, we construct:
Ø A d×k-dimensional transformation matrix W (with k ≪ d).
Ø A mapping of a vector x, the features of a training example, onto a new k-dimensional feature subspace (fewer dimensions than the original):

$$\mathbf{x} = [x_1, x_2, \ldots, x_d], \quad \mathbf{x} \in \mathbb{R}^d$$
$$\mathbf{xW} = \mathbf{z}, \quad \mathbf{W} \in \mathbb{R}^{d \times k}$$
$$\mathbf{z} = [z_1, z_2, \ldots, z_k], \quad \mathbf{z} \in \mathbb{R}^k$$

The main steps behind PCA
q Summarize the PCA algorithm: [1][2]
1. Standardization:
• Standardize the range of the continuous initial variables so that
each one of them contributes equally to the analysis.

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

2. Covariance matrix computation:


• Understand how the variables of the input dataset are varying from
the mean with respect to each other, or in other words, to see if
there is any relationship between them.

$$\begin{bmatrix} Cov(x,x) & Cov(x,y) & Cov(x,z) \\ Cov(y,x) & Cov(y,y) & Cov(y,z) \\ Cov(z,x) & Cov(z,y) & Cov(z,z) \end{bmatrix}$$
(3×3 covariance matrix)

The main steps behind PCA
2. Covariance matrix computation:
• 𝐶𝑜𝑣 𝑎, 𝑎 = 𝑉𝑎𝑟 𝑎 , which 𝑎 is x, y, z.
• The result of covariance:

Ø Positive sign: the two variables increase or decrease together (correlated).
Ø Negative sign: one increases when the other decreases (inversely correlated).
Ø Zero: the two variables are not related (uncorrelated).

q However, the covariance matrix is no more than a table that summarizes the correlations between all possible pairs of variables.

The main steps behind PCA
3. Compute the eigen vectors and eigen values of the covariance
matrix:
• Eigen vectors and eigen values are the linear algebra concepts that
we need to compute from the covariance matrix in order to
determine the principal components of the data.

• Principal components are new variables that are constructed as


linear combinations or mixtures of initial variables.

• These combinations are done in such a way that the new


variables(i.e., principal components) are uncorrelated and most of
the information within the initial variables is squeezed or
compressed into the first components.

The main steps behind PCA
3. Compute the eigen vectors and eigen values of the covariance
matrix:
• 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on, until something like the scree plot below is obtained.

The main steps behind PCA
3. Compute the eigen vectors and eigen values of the covariance
matrix:
• Example: Suppose the dataset is 2-dimensional with 2 variables (x, y) and that the eigenvectors and eigenvalues of the covariance matrix are as follows:

$$\mathbf{v_1} = \begin{bmatrix} 0.6778736 \\ 0.7351785 \end{bmatrix}, \qquad \mathbf{v_2} = \begin{bmatrix} -0.7351785 \\ 0.6778736 \end{bmatrix}, \qquad \lambda_1 = 1.284028, \; \lambda_2 = 0.04908323$$

Since λ₁ > λ₂, the eigenvector that corresponds to the first principal component (PC1) is v₁ and the one that corresponds to the second component (PC2) is v₂.

The main steps behind PCA
4. Feature vector:
• Choose whether to keep all these components or discard those of lesser significance (low eigenvalues), and form a matrix of vectors with the remaining ones.
• In other words, select the k eigenvectors that correspond to the k largest eigenvalues, where k is the dimensionality of the new feature subspace (k ≤ d).
• Example:

$$\mathbf{v_1} = \begin{bmatrix} 0.6778736 \\ 0.7351785 \end{bmatrix}, \qquad \mathbf{v_2} = \begin{bmatrix} -0.7351785 \\ 0.6778736 \end{bmatrix}, \qquad \lambda_1 = 1.284028, \; \lambda_2 = 0.04908323$$

We can either form a feature vector with both of the eigenvectors or discard the eigenvector v₂, which is the one of lesser significance.

The main steps behind PCA
5. Recast the data along the principal components axes:
• Use the feature vector formed using the eigen vectors of the
covariance matrix, to reorient the data from the original axes to the
ones represented by the principal components.

Example for PCA
q Let the matrix be the score of three students:[3]
Student Math English Art
1 90 60 90
2 90 90 30
3 60 60 60
4 60 60 90
5 30 30 30

1. Calculate the mean values:

$$A = \begin{bmatrix} 90 & 60 & 90 \\ 90 & 90 & 30 \\ 60 & 60 & 60 \\ 60 & 60 & 90 \\ 30 & 30 & 30 \end{bmatrix} \text{ (score matrix)}, \qquad \bar{A} = \begin{bmatrix} 66 & 60 & 60 \end{bmatrix}$$
Example for PCA
2. Compute the covariance matrix of the whole dataset:

$$Cov(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$

$$\text{Covariance matrix} = \begin{bmatrix} 504 & 360 & 180 \\ 360 & 360 & 0 \\ 180 & 0 & 720 \end{bmatrix}$$

3. Compute eigenvectors and eigenvalues:
• Let A be a square matrix, v a vector, and λ a scalar that satisfies Av = λv. The eigenvalues are the solutions of the characteristic equation:

$$\det(A - \lambda I) = 0$$

Example for PCA
3. Compute eigenvectors and eigenvalues:

$$\det(A - \lambda I) = \det\begin{bmatrix} 504-\lambda & 360 & 180 \\ 360 & 360-\lambda & 0 \\ 180 & 0 & 720-\lambda \end{bmatrix}
= -\lambda^3 + 1584\lambda^2 - 641520\lambda + 25660800 = 0$$

$$\lambda_1 \approx 44.81966\ldots, \qquad \lambda_2 \approx 629.11036\ldots, \qquad \lambda_3 \approx 910.06995\ldots$$

$$\mathbf{v_1} = \begin{bmatrix} -3.75100\ldots \\ 4.28441\ldots \\ 1 \end{bmatrix}, \qquad
\mathbf{v_2} = \begin{bmatrix} -0.50494\ldots \\ -0.67548\ldots \\ 1 \end{bmatrix}, \qquad
\mathbf{v_3} = \begin{bmatrix} 1.05594\ldots \\ 0.69108\ldots \\ 1 \end{bmatrix}$$

Example for PCA
4. Sort the eigenvectors by decreasing eigenvalues and choose the k eigenvectors with the largest eigenvalues to form a d×k-dimensional matrix W.
• If we want to choose a 2-dimensional feature subspace (d = 3): since λ₃ > λ₂ > λ₁,

$$W = [\mathbf{v_3} \;\; \mathbf{v_2}] = \begin{bmatrix} 1.05594 & -0.50494 \\ 0.69108 & -0.67548 \\ 1 & 1 \end{bmatrix}$$

5. Transform the samples onto the new subspace:
• Use the 3×2-dimensional matrix W; for each data point x = [x₁, x₂, …, x_d], x ∈ ℝᵈ:

$$\mathbf{y} = \mathbf{W}^T\mathbf{x}$$

So far, we have computed the two principal components and projected the data points onto the new subspace.
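A minimal NumPy sketch of steps 1-5 on the score matrix above (the data is centered before projecting, and np.linalg.eigh returns unit-length eigenvectors, so they differ from the vectors above only by scaling):

```python
import numpy as np

# Student scores: columns = (Math, English, Art)
A = np.array([[90, 60, 90],
              [90, 90, 30],
              [60, 60, 60],
              [60, 60, 90],
              [30, 30, 30]], dtype=float)

A_centered = A - A.mean(axis=0)              # subtract the column means (66, 60, 60)
cov = A_centered.T @ A_centered / len(A)     # covariance with a 1/n denominator
print(cov)                                   # [[504, 360, 180], [360, 360, 0], [180, 0, 720]]

eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: eigenvalues sorted ascending
W = eigvecs[:, [-1, -2]]                     # two eigenvectors with the largest eigenvalues
Y = A_centered @ W                           # samples projected onto the 2-D subspace
print(eigvals)                               # approx. [44.8, 629.1, 910.1]
print(Y)
```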

Total and explained variance
q Since we want to reduce the dimensionality of the dataset by compressing it onto a new feature subspace, we only select the subset of the eigenvectors that contains most of the information (variance).

q The eigenvalues define the magnitude of the eigenvectors, so we sort the eigenvalues by decreasing magnitude.

q Explained variance ratio: [4]
Ø The fraction of the variance explained by each of the principal components (eigenvectors):

$$\text{Explained variance ratio} = \frac{\lambda_j}{\sum_{j=1}^{d}\lambda_j}$$

Total and explained variance
q Calculate the cumulative sum of explained variances.
Ø Dataset: Wine dataset.

(The leading components, i.e., those with the largest eigenvalues, carry the most information.)
Feature transformation
q Use the projection matrix to map the data onto the PCA subspace:

$$\mathbf{X}' = \mathbf{XW}$$

Ø The original X is 124×13-dimensional (the Wine training set), which is hard to visualize; projecting each example x' = xW yields 2 dimensions.

q Plot the PCA projection:
Ø Dataset: Wine dataset.
Ø x' = xW (2-dimensional).
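A minimal scikit-learn sketch of this pipeline on the Wine dataset (standardize, project onto 2 principal components, then fit a logistic regression):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(scaler.transform(X_train))
X_test_pca = pca.transform(scaler.transform(X_test))

print(pca.explained_variance_ratio_)           # variance captured by PC1 and PC2

clf = LogisticRegression().fit(X_train_pca, y_train)
print("test accuracy:", clf.score(X_test_pca, y_test))
```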

Feature transformation
q We can plot the decision region with two-dimensional
data:
Ø Dataset: Wine dataset.
Ø Classifier: Logistic regression.

training dataset test dataset

Outline
q Eigen values and Eigen vectors
q Unsupervised dimensionality reduction via PCA
q Supervised data compression via LDA
Ø PCA versus LDA
Ø The inner workings of LDA
Ø Example for LDA
Ø Wine dataset with LDA
Ø Selecting linear discriminants
Ø Projecting examples onto new feature space
q Using KPCA for nonlinear mappings

Supervised data compression via LDA
q LDA, which is linear discriminant analysis, can be used as
a technique for feature extraction to increase the
computational efficiency and reduce the degree of
overfitting due to the curse of dimensionality in non-
regularized models.

q The goal in LDA is to find the feature subspace that


optimizes class separability.

PCA versus LDA
q PCA versus LDA :
Ø Similar
• Both of them are linear transformation techniques
(reduce the number of dimensionality)
Ø Different
• PCA: unsupervised algorithm
• LDA: supervised algorithm

q In general, LDA is a superior feature extraction technique for classification tasks compared to PCA. But in certain cases, for example when each class consists of only a small number of training examples, preprocessing via PCA tends to result in better classification. [5]

PCA versus LDA
q The concept of LDA for two-class problems:

q A linear discriminant (LD1) would separate the two normally distributed classes well.

q LD2 (the y-axis) captures a lot of the variance in the dataset, but it would fail as a good linear discriminant since it doesn't capture any of the class-discriminatory information.

PCA versus LDA
q LDA assumption:
1. The data is normally distributed.
2. The classes have identical covariance matrices.
3. The training examples are statistically independent of each
other.

q However, even if one, or more, of those assumptions is


(slightly) violated, LDA for dimensionality reduction can
still work reasonably well.[6]

The inner workings of LDA (Example)
q The main steps required to perform LDA:
1. Standardize the d-dimensional dataset (d is the number of features).
2. For each class, compute the d-dimensional mean vector.
• Take the label information into account (supervised learning):

$$\mathbf{m}_i = \frac{1}{n_i}\sum_{\mathbf{x} \in D_i} \mathbf{x}, \qquad i \in \text{class labels}$$

3. Construct the between-class scatter matrix, S_B, and the within-class scatter matrix, S_W.
• Compute the between-class scatter matrix, S_B:

$$\mathbf{S}_B = \sum_{i=1}^{c} n_i(\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T$$

where n_i is the number of examples in class i and m is the overall mean of the data.
The inner workings of LDA (Example)
3. Construct the between-class scatter matrix, S_B, and the within-class scatter matrix, S_W. [7]
• S_W is calculated by summing up the individual scatter matrices S_i:

$$\Sigma_i = \frac{1}{n_i} S_i = \frac{1}{n_i}\sum_{\mathbf{x} \in D_i} (\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^T, \qquad
\mathbf{S}_W = \sum_{i=1}^{c} S_i, \quad c \in \text{class labels}$$

Here we suppose the class labels in the training dataset are uniformly distributed.

LDA aims to maximize the between-class scatter matrix and minimize the within-class scatter matrix.

The inner workings of LDA (Example)
4. Compute the eigenvectors and corresponding eigenvalues of the matrix S_W⁻¹S_B.
• S_W⁻¹S_B is derived from Fisher's criterion.
5. Sort the eigenvalues in decreasing order to rank the corresponding eigenvectors.
6. Choose the k eigenvectors that correspond to the k largest eigenvalues to construct a d×k-dimensional transformation matrix W; the eigenvectors are the columns of this matrix.
7. Project the examples onto the new feature subspace using the transformation matrix W.

Steps 4 to 7 of LDA are similar to PCA.

Example for LDA
q Example for LDA: [8]
Ø Classify man or woman by height and weight (d = 2), using the multivariate normal distribution:

$$f(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-d/2}\,|\boldsymbol{\Sigma}|^{-0.5}\exp\!\left(-0.5\,(\mathbf{x} - \boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})\right)$$

$$\mathbf{x} = \begin{bmatrix} x_{\text{height}} \\ x_{\text{weight}} \end{bmatrix}$$

$$\boldsymbol{\mu}_{\text{man}} = \begin{bmatrix} \mu_{\text{height}} \\ \mu_{\text{weight}} \end{bmatrix}_{\text{man}}, \qquad
\boldsymbol{\Sigma}_{\text{man}} = \begin{bmatrix} Cov(h, h) & Cov(w, h) \\ Cov(h, w) & Cov(w, w) \end{bmatrix}_{\text{man}}$$

$$\boldsymbol{\mu}_{\text{woman}} = \begin{bmatrix} \mu_{\text{height}} \\ \mu_{\text{weight}} \end{bmatrix}_{\text{woman}}, \qquad
\boldsymbol{\Sigma}_{\text{woman}} = \begin{bmatrix} Cov(h, h) & Cov(w, h) \\ Cov(h, w) & Cov(w, w) \end{bmatrix}_{\text{woman}}$$

(h = height, w = weight)

Example for LDA
q The likelihood functions:
Ø Man:

$$p(\mathbf{x} \mid \boldsymbol{\mu}_{\text{man}}, \boldsymbol{\Sigma}_{\text{man}})
= \frac{1}{2\pi}\,|\boldsymbol{\Sigma}_{\text{man}}|^{-0.5}\exp\!\left(-0.5\,(\mathbf{x} - \boldsymbol{\mu}_{\text{man}})^T\Sigma_{\text{man}}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{\text{man}})\right)$$

Ø Woman:

$$p(\mathbf{x} \mid \boldsymbol{\mu}_{\text{woman}}, \boldsymbol{\Sigma}_{\text{woman}})
= \frac{1}{2\pi}\,|\boldsymbol{\Sigma}_{\text{woman}}|^{-0.5}\exp\!\left(-0.5\,(\mathbf{x} - \boldsymbol{\mu}_{\text{woman}})^T\Sigma_{\text{woman}}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{\text{woman}})\right)$$

Example for LDA
q Maximum a posteriori (MAP):

$$\mathbf{w}_{MAP} = \arg\max_{c \in \{\text{man}, \text{woman}\}} p(\mathbf{w}_c)\,p(\mathbf{x} \mid \mathbf{w}_c)
= \arg\max_{c \in \{\text{man}, \text{woman}\}} \ln p(\mathbf{x} \mid \mathbf{w}_c)$$

(take the natural logarithm and ignore the prior p(w_c), assumed equal for both classes)

q Result:

$$\ln p(\mathbf{x} \mid \boldsymbol{\mu}_{\text{man}}, \boldsymbol{\Sigma}_{\text{man}})
= -\ln(2\pi) - 0.5\ln|\boldsymbol{\Sigma}_{\text{man}}| - 0.5\,(\mathbf{x} - \boldsymbol{\mu}_{\text{man}})^T\Sigma_{\text{man}}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{\text{man}})$$

$$\ln p(\mathbf{x} \mid \boldsymbol{\mu}_{\text{woman}}, \boldsymbol{\Sigma}_{\text{woman}})
= -\ln(2\pi) - 0.5\ln|\boldsymbol{\Sigma}_{\text{woman}}| - 0.5\,(\mathbf{x} - \boldsymbol{\mu}_{\text{woman}})^T\Sigma_{\text{woman}}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{\text{woman}})$$

Example for LDA
q Classification:

$$\ln p(\mathbf{x} \mid \boldsymbol{\mu}_{\text{man}}, \boldsymbol{\Sigma}_{\text{man}}) - \ln p(\mathbf{x} \mid \boldsymbol{\mu}_{\text{woman}}, \boldsymbol{\Sigma}_{\text{woman}}) > 0 \;\rightarrow\; \text{man}$$
$$\ln p(\mathbf{x} \mid \boldsymbol{\mu}_{\text{man}}, \boldsymbol{\Sigma}_{\text{man}}) - \ln p(\mathbf{x} \mid \boldsymbol{\mu}_{\text{woman}}, \boldsymbol{\Sigma}_{\text{woman}}) < 0 \;\rightarrow\; \text{woman}$$

where

$$\ln p(\mathbf{x} \mid \boldsymbol{\mu}_{\text{man}}, \boldsymbol{\Sigma}_{\text{man}}) - \ln p(\mathbf{x} \mid \boldsymbol{\mu}_{\text{woman}}, \boldsymbol{\Sigma}_{\text{woman}})
= 0.5\left(\ln|\boldsymbol{\Sigma}_{\text{woman}}| - \ln|\boldsymbol{\Sigma}_{\text{man}}|\right)
+ 0.5\left[(\mathbf{x} - \boldsymbol{\mu}_{\text{woman}})^T\Sigma_{\text{woman}}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{\text{woman}}) - (\mathbf{x} - \boldsymbol{\mu}_{\text{man}})^T\Sigma_{\text{man}}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{\text{man}})\right]$$

Wine dataset with LDA
q Compute the d-dimensional mean vector for each class:
Ø Dataset: Wine dataset (3 classes).
Ø m_i is the mean vector of class i; d = 13.

$$\mathbf{m}_i = \frac{1}{n_i}\sum_{\mathbf{x} \in D_i}\mathbf{x}
= \begin{bmatrix} \mu_{i,\text{alcohol}} \\ \mu_{i,\text{malic acid}} \\ \vdots \\ \mu_{i,\text{proline}} \end{bmatrix}, \qquad i \in \{1, 2, 3\}$$

Wine dataset with LDA
q Construct the within-class and between-class scatter matrices.

q Within-class scatter matrix, S_W:

$$\mathbf{S}_W = \sum_{i=1}^{c} S_i, \qquad S_i = \frac{1}{n_i}\sum_{\mathbf{x} \in D_i}(\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^T$$

q Between-class scatter matrix, S_B:

$$\mathbf{S}_B = \sum_{i=1}^{c} n_i(\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T$$

Wine dataset with LDA
q Solve the generalized eigenvalue problem of the matrix S_W⁻¹S_B.
Ø This yields 13 eigenvalues (since the Wine dataset has 13 features).

Selecting linear discriminants
q Measure the class-discriminatory information (discriminability) captured by each eigenvalue:
Ø In the Wine example, the first two linear discriminants capture almost all of it; the others approach 0.

q Create the transformation matrix W from the two most discriminative eigenvector columns:

Projecting examples onto new feature space
q Use the following equation to project the examples onto the new feature space:
Ø Example: a two-dimensional feature subspace.

$$\mathbf{X}' = \mathbf{XW}$$
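A minimal scikit-learn sketch of the same projection using LinearDiscriminantAnalysis on the Wine dataset:

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)

# LDA is supervised: it uses y_train to find the discriminants
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(scaler.transform(X_train), y_train)
X_test_lda = lda.transform(scaler.transform(X_test))

clf = LogisticRegression().fit(X_train_lda, y_train)
print("test accuracy:", clf.score(X_test_lda, y_test))
```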

LDA result
q We can plot the decision region with two-dimensional
data:
Ø Dataset: Wine dataset.
Ø Classifier: Logistic regression.

training dataset test dataset

Outline
q Eigen values and Eigen vectors
q Unsupervised dimensionality reduction via PCA
q Supervised data compression via LDA
q Using KPCA for nonlinear mappings
Ø Kernel functions and the kernel trick
Ø KPCA example

Using KPCA for nonlinear mappings
q KPCA, which is kernel principal component analysis, is a better choice when the data requires a nonlinear transformation, i.e., when it is not linearly separable.

q KPCA is related to the concepts of kernel SVM.

Using KPCA for nonlinear mappings
q Intuition behind KPCA: [9]
Ø The idea of KPCA relies on the intuition that many datasets which are not linearly separable in their original space can be made linearly separable by projecting them onto a higher-dimensional space.
Ø The added dimensions are just simple arithmetic operations performed on the original data dimensions.

2-d dataset.

Using KPCA for nonlinear mappings
q Intuition behind KPCA: linearly separable after the mapping φ

$$\mathbf{x} = [x_1, x_2]^T \;\xrightarrow{\phi}\; \mathbf{z} = [x_1, x_2, x_1^2 + x_2^2]^T$$

Project the data from a lower-dimensional (2D) space to a higher-dimensional (3D) space.
Kernel functions and the kernel trick
q We can tackle nonlinear problems by projecting them onto a new feature space of higher dimensionality where the classes become linearly separable:

$$\phi : \mathbb{R}^d \rightarrow \mathbb{R}^k \quad (k \gg d)$$

q For example:

$$\mathbf{x} = [x_1, x_2]^T \;\xrightarrow{\phi}\; \mathbf{z} = [x_1^2, \sqrt{2}\,x_1x_2, x_2^2]^T$$

Kernel functions and the kernel trick
q One downside of KPCA is that it’s computationally very
expensive, and this is where we use the kernel trick.

q Using the kernel trick, we can compute the similarity


between two high-dimension feature vectors in the
original feature space.

Kernel functions and the kernel trick
q Review: the covariance matrix definition (Σ):

$$\Sigma = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}^{(i)}\mathbf{x}^{(i)T}$$

q Covariance between two features, j and k:

$$\sigma_{jk} = \frac{1}{n}\sum_{i=1}^{n}\left(x_j^{(i)} - \mu_j\right)\left(x_k^{(i)} - \mu_k\right)$$

$$\sigma_{jk} = \frac{1}{n}\sum_{i=1}^{n}x_j^{(i)}x_k^{(i)} \quad \text{after standardization } (\sim N(0, 1))$$

Kernel functions and the kernel trick
q Following Bernhard Schölkopf [10], we can replace the dot products between examples in the original feature space with nonlinear feature combinations via φ:
Ø x^(i) → φ(x^(i))

$$\Sigma = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}^{(i)}\mathbf{x}^{(i)T}
\;\;\approx\;\; \frac{1}{n}\sum_{i=1}^{n}\phi(\mathbf{x}^{(i)})\,\phi(\mathbf{x}^{(i)})^T$$

Kernel functions and the kernel trick
q We use the kernel trick to avoid calculating the pairwise dot products of the examples x under φ explicitly, by using a kernel function κ:

$$\kappa(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \phi(\mathbf{x}^{(i)})^T\phi(\mathbf{x}^{(j)})$$

q In other words, with the kernel trick we can omit the explicit high-dimensional mapping, and what KPCA returns are the examples already projected onto the respective components.

Kernel functions and the kernel trick
q The most commonly used kernels are as follows:
• The polynomial kernel:

$$\kappa(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \left(\mathbf{x}^{(i)T}\mathbf{x}^{(j)} + \theta\right)^p, \qquad \theta: \text{threshold}, \; p: \text{power}$$

• The hyperbolic tangent (sigmoid) kernel:

$$\kappa(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \tanh\!\left(\eta\,\mathbf{x}^{(i)T}\mathbf{x}^{(j)} + \theta\right)$$

• The radial basis function (RBF) or Gaussian kernel:

$$\kappa(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \exp\!\left(-\frac{\|\mathbf{x}^{(i)} - \mathbf{x}^{(j)}\|^2}{2\sigma^2}\right)
= \exp\!\left(-\gamma\,\|\mathbf{x}^{(i)} - \mathbf{x}^{(j)}\|^2\right), \qquad \gamma = \frac{1}{2\sigma^2}$$

Kernel functions and the kernel trick
q We define three steps to implement an RBF KPCA:
1. Compute the kernel (similarity) matrix K, where we need to calculate the following for every pair of examples:

$$\kappa(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = \exp\!\left(-\gamma\,\|\mathbf{x}^{(i)} - \mathbf{x}^{(j)}\|^2\right)$$

$$K = \begin{bmatrix}
\kappa(\mathbf{x}^{(1)}, \mathbf{x}^{(1)}) & \kappa(\mathbf{x}^{(1)}, \mathbf{x}^{(2)}) & \cdots & \kappa(\mathbf{x}^{(1)}, \mathbf{x}^{(n)}) \\
\kappa(\mathbf{x}^{(2)}, \mathbf{x}^{(1)}) & \kappa(\mathbf{x}^{(2)}, \mathbf{x}^{(2)}) & \cdots & \kappa(\mathbf{x}^{(2)}, \mathbf{x}^{(n)}) \\
\vdots & \vdots & \ddots & \vdots \\
\kappa(\mathbf{x}^{(n)}, \mathbf{x}^{(1)}) & \kappa(\mathbf{x}^{(n)}, \mathbf{x}^{(2)}) & \cdots & \kappa(\mathbf{x}^{(n)}, \mathbf{x}^{(n)})
\end{bmatrix}$$

q For example, if the dataset contains 100 training examples, the symmetric kernel matrix of the pairwise similarities would be 100×100-dimensional.

Kernel functions and the kernel trick
q We define three steps to implement an RBF KPCA:
2. Center the kernel matrix K using the following equation:

$$K' = K - \mathbf{1}_nK - K\mathbf{1}_n + \mathbf{1}_nK\mathbf{1}_n$$

where 1_n is an n×n-dimensional matrix in which all values are equal to 1/n.

3. Collect the top k eigenvectors of the centered kernel matrix based on their corresponding eigenvalues, which are ranked by decreasing magnitude.

q The centering of the kernel matrix in the second step is necessary because we don't compute the new feature space explicitly, so we cannot guarantee that the new feature space is centered at zero.
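A minimal NumPy sketch of the three steps above, following the structure of the rbf_kernel_pca implementation in [2]; scikit-learn's KernelPCA(kernel='rbf') offers the same functionality:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.datasets import make_moons

def rbf_kernel_pca(X, gamma, n_components):
    """Project X onto the top components of a centered RBF kernel matrix."""
    # Step 1: pairwise squared Euclidean distances -> RBF kernel matrix
    sq_dists = squareform(pdist(X, metric='sqeuclidean'))
    K = np.exp(-gamma * sq_dists)

    # Step 2: center the kernel matrix
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    K = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Step 3: eigendecomposition (eigh returns eigenvalues in ascending order)
    eigvals, eigvecs = np.linalg.eigh(K)
    # The projected examples are the leading eigenvectors of the centered kernel matrix
    return eigvecs[:, ::-1][:, :n_components]

# Usage on a toy nonlinear dataset
X, y = make_moons(n_samples=100, random_state=123)
X_kpca = rbf_kernel_pca(X, gamma=15.0, n_components=2)
print(X_kpca.shape)   # (100, 2)
```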

KPCA example
q Example 1 – separating half-moon shapes:

Using standard PCA → the projected classes are not linearly separable.

KPCA example
q Example 1 – separating half-moon shapes:
Ø Use the kernel PCA function (KPCA):

Using KPCA → the projected classes become linearly separable.

KPCA example
q Example 2 – separating concentric circles:

Using standard PCA → the projected classes are not linearly separable.


KPCA example
q Example 2 – separating concentric circles:
Ø Use the kernel PCA function (KPCA):

Using KPCA → the projected classes become linearly separable.

REFERENCE
[1] https://builtin.com/data-science/step-step-explanation-principal-component-analysis
[2] https://setosa.io/ev/principal-component-analysis/
[3] https://towardsdatascience.com/the-mathematics-behind-principal-component-analysis-fff2d7f4b643
[4] https://vitalflux.com/pca-explained-variance-concept-python-example/
[5] A. M. Martinez and A. C. Kak, "PCA versus LDA," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.
[6] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, New York, 2001.
[7] S. K. Bhattacharyya and K. Rahul, "Face Recognition by Linear Discriminant Analysis," International Journal of Communication Network Security, 2013.
[8] https://chih-sheng-huang821.medium.com/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-lda%E5%88%86%E9%A1%9E%E6%BC%94%E7%AE%97%E6%B3%95-14622f29e4dc
[9] https://iq.opengenus.org/kernal-principal-component-analysis/
[10] B. Schölkopf, A. Smola, and K.-R. Müller, "Kernel principal component analysis," pp. 583-588, 1997.

Financial Data Structure

Department of Electrical Engineering


National Tsing-Hua University, HsinChu, Taiwan
Outline
q Essential Types Of Financial Data
q Bars

Outline
q Essential Types Of Financial Data
Ø Fundamental Data
Ø Market Data
Ø Analytics
Ø Alternative Data
q Bars

Fundamental Data
q Fundamental data
Ø Mostly accounting data, reported quarterly or monthly.
Ø For example:
• Assets
• Liabilities
• Sales
• Costs/earnings

Market Data
q Market Data
Ø Market data includes all trading activity.
Ø For example:
• Price / yield / implied volatility
• Volume
• Dividends/coupons
• Aggressor side

Analytics
q Analytics
Ø Derivative data, based on an original source, which could be fundamental, market, alternative, or even a collection of other analytics.
Ø For example:
• Analyst recommendations
• Credit ratings
• Earnings expectations

Alternative Data
q Alternative Data
Ø The data used to obtain insight into the investment process.
Ø For example:
• Satellite/CCTV images
• Google searches
• Twitter/chats
• Changes in TSMC's water consumption
• Changes in how crowded Walmart parking lots are

Alternative Data

Outline
q Essential Types Of Financial Data
q Bars
Ø Standard Bars
Ø Information Driven Bars

Bars
q Candlestick charts:
Ø Based on four prices formed during a day (or another fixed period):
• Open price
• Highest price
• Lowest price
• Close price
(A candlestick is bearish when the close is below the open, and bullish when the close is above the open.)

Bars
q Volume-weighted Average Price (VWAP)

$$\text{VWAP} = \frac{\sum(\text{Volume} \times \text{Price})}{\sum \text{Volume}}$$
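A minimal pandas sketch, assuming hypothetical tick data with price and volume columns:

```python
import pandas as pd

# Hypothetical tick data: one row per trade
ticks = pd.DataFrame({'price':  [100.0, 101.0, 100.5, 102.0],
                      'volume': [200,   100,   300,   150]})

vwap = (ticks['price'] * ticks['volume']).sum() / ticks['volume'].sum()
print(round(vwap, 4))
```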

Standard Bars
q Standard bars:
Ø Transform a series of observations that arrive at an irregular frequency into a homogeneous series derived from regular sampling.
Ø For example:
• Time bars
• Tick bars
• Volume bars
• Dollar bars

Standard Bars
q Time bars:
Ø Sampling information at fixed time intervals.
Ø For example: sampling every 15 minutes

Standard Bars
q Example of time bar:

Standard Bars
q Number of ticks when grouped by time:

Standard Bars
q Order book:
Ø The list of orders that a trading venue uses to record the interest of buyers and sellers in a particular financial instrument.
q The primary source of market data is the order book.

• tickDirection: the price direction of each trade
• minusTick: the price moved down
• zeroMinusTick: the price is unchanged, but the previous tick moved down
• plusTick: the price moved up
• zeroPlusTick: the price is unchanged, but the previous tick moved up

Standard Bars
q Tick:
Ø A single transaction record in a financial product.

Standard Bars
q From tick to bar:

Standard Bars: Tick Bars
q Tick bars:
Ø The sampled variables listed earlier are extracted each time a pre-defined number of transactions takes place.
Ø For example:
• Ticks per bar = total ticks / number of bars
• 6 / 3 = 2

Standard Bars
q Example of tick bar:

Standard Bars
q Number of ticks when grouped by tick:

Standard Bars: Volume Bars
q Volume bars:
Ø Sample every time a pre-defined amount of the security's units has been traded.
Ø For example:
• Volume per bar = total volume / number of bars
• 1050 / 3 = 350

Standard Bars
q Example of volume bar:

Standard Bars
q Number of ticks when grouped by volume:

Standard Bars: Dollar Bars
q Dollar bars:
Ø Sample an observation every time a pre-defined market value is exchanged (a sketch covering tick, volume, and dollar bars follows below).
Ø For example:
• Dollars per bar = total amount / number of bars
• 13500 / 3 = 4500
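A minimal pandas sketch of grouping raw ticks into tick, volume, or dollar bars; the column names and threshold logic are illustrative assumptions, not a library API:

```python
import numpy as np
import pandas as pd

def standard_bars(ticks, kind='tick', threshold=100):
    """Group raw ticks into tick / volume / dollar bars (hypothetical column names)."""
    if kind == 'tick':
        metric = pd.Series(1, index=ticks.index)
    elif kind == 'volume':
        metric = ticks['volume']
    else:  # 'dollar'
        metric = ticks['price'] * ticks['volume']

    # Start a new bar every time the cumulative metric crosses a multiple of the threshold
    bar_id = (metric.cumsum() // threshold).astype(int)
    ohlcv = ticks.groupby(bar_id).agg(open=('price', 'first'),
                                      high=('price', 'max'),
                                      low=('price', 'min'),
                                      close=('price', 'last'),
                                      volume=('volume', 'sum'))
    return ohlcv

# Usage with random hypothetical tick data
rng = np.random.default_rng(0)
ticks = pd.DataFrame({'price': 100 + rng.normal(0, 0.1, 1000).cumsum(),
                      'volume': rng.integers(1, 50, 1000)})
print(standard_bars(ticks, kind='dollar', threshold=250_000).head())
```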

Standard Bars
q Example of dollar bar:

Standard Bars
q Number of ticks when grouped by dollar:

Information-Driven Bars
q Information-driven bars:
Ø The purpose of information-driven bars is to sample more frequently when new information arrives to the market.
Ø They detect the imbalance of trading in the market.
q Tick imbalance bars (TIB):
Ø Sample a bar whenever the tick imbalance exceeds our expectations.

$$b_t = \begin{cases} b_{t-1} & \text{if } \Delta p_t = 0 \\ \dfrac{\Delta p_t}{|\Delta p_t|} & \text{if } \Delta p_t \neq 0 \end{cases}
\qquad p_t: \text{the price associated with tick } t$$

Information-Driven Bars

Time t   Price   ΔPrice   b    θ
0        100     –        1    –
1        110     10       1    1
2        100     -10      -1   0
3        100     0        -1   -1
4        110     10       1    0
5        120     10       1    1

Ø Define the tick imbalance at time T as:
$$\theta_T = \sum_{t=1}^{T} b_t$$
Ø The expected value of θ_T at the beginning of the bar is:
$$E_0[\theta_T] = E_0[T]\,(P[b_t = 1] - P[b_t = -1]) = E_0[T]\,(2P[b_t = 1] - 1)$$
$$T^* = \arg\min_T \{\, |\theta_T| \geq E_0[T]\,|2P[b_t = 1] - 1| \,\}$$
Ø When θ_T is more imbalanced than expected, a low T will satisfy these conditions.

q Example of tick imbalance bar:

Information-Driven Bars

Information-Driven Bars
q Volume/Dollar Imbalance Bars
Ø Define the imbalance of volumes or dollars as:

$$\theta_T = \sum_{t=1}^{T} b_t v_t$$

where v_t represents either the number of securities traded (volume imbalance bars) or the dollar amount exchanged (dollar imbalance bars).

$$E_0[\theta_T] = E_0[T]\left(P[b_t = 1]\,E_0[v_t \mid b_t = 1] - P[b_t = -1]\,E_0[v_t \mid b_t = -1]\right)$$

$$T^* = \arg\min_T \{\, |\theta_T| \geq E_0[T]\left|2P[b_t = 1]\,E_0[v_t \mid b_t = 1] - E_0[v_t]\right| \,\}$$

Information-Driven Bars

Information-Driven Bars
q Example of volume imbalance bar:

Information-Driven Bars
q Example of dollar imbalance bar:

Information-Driven Bars
q Tick Runs Bars
Ø To monitor the sequence of buys in the overall volume, and take samples when that sequence diverges from our expectations.

Ø Define the length of the current run as:

$$\theta_T = \max\left\{\sum_{t \mid b_t = 1} b_t,\; -\sum_{t \mid b_t = -1} b_t\right\}$$

$$E_0[\theta_T] = E_0[T]\,\max\{P[b_t = 1],\, 1 - P[b_t = 1]\}$$
$$T^* = \arg\min_T \{\, \theta_T \geq E_0[T]\,\max\{P[b_t = 1],\, 1 - P[b_t = 1]\} \,\}$$

Ø P[b_t = 1] is estimated as an exponentially weighted moving average of the proportion of buy ticks from prior bars.

Information-Driven Bars
q Example of tick run bar:

Information-Driven Bars
q Volume/Dollar Runs Bars
Ø Define the length of the current run as:

$$\theta_T = \max\left\{\sum_{t \mid b_t = 1} b_t v_t,\; -\sum_{t \mid b_t = -1} b_t v_t\right\}$$

$$E_0[\theta_T] = E_0[T]\,\max\{P[b_t = 1]\,E_0[v_t \mid b_t = 1],\; (1 - P[b_t = 1])\,E_0[v_t \mid b_t = -1]\}$$

$$T^* = \arg\min_T \{\, \theta_T \geq E_0[T]\,\max\{P[b_t = 1]\,E_0[v_t \mid b_t = 1],\; (1 - P[b_t = 1])\,E_0[v_t \mid b_t = -1]\} \,\}$$

Information-Driven Bars
q Example of volume run bar:

Information-Driven Bars
q Example of dollar run bar:


Labeling

Department of Electrical Engineering


National Tsing-Hua University, HsinChu, Taiwan
Outline
q The Fixed-time Horizon Method
q The Triple-barrier Method
q Meta-labeling

Outline
q The Fixed-time Horizon Method
q The Triple-barrier Method
q Meta-labeling

The Fixed-time Horizon Method
q In order to train a supervised machine learning model, labeling is necessary.
q Consider a features matrix X with I rows, {X_i}_{i=1,…,I}, drawn from some bars with index t = 1, …, T, where I ≤ T.
q An observation X_i is assigned a label y_i ∈ {−1, 0, 1}:

$$y_i = \begin{cases} -1 & \text{if } r_{t_{i,0},\,t_{i,0}+h} < -\tau \\ 0 & \text{if } |r_{t_{i,0},\,t_{i,0}+h}| \leq \tau \\ 1 & \text{if } r_{t_{i,0},\,t_{i,0}+h} > \tau \end{cases}$$

where τ is a pre-defined constant threshold, t_{i,0} is the index of the bar immediately after X_i takes place, t_{i,0} + h is the index of the h-th bar after t_{i,0}, and r_{t_{i,0}, t_{i,0}+h} is the price return over a bar horizon h:

$$r_{t_{i,0},\,t_{i,0}+h} = \frac{p_{t_{i,0}+h}}{p_{t_{i,0}}} - 1$$
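A minimal pandas sketch of fixed-time horizon labeling, assuming a close-price series sampled at a fixed bar frequency; the function and parameter names are illustrative:

```python
import numpy as np
import pandas as pd

def fixed_horizon_labels(close, h=10, tau=0.02):
    """Label each bar by its forward return over h bars against a constant threshold tau."""
    ret = close.shift(-h) / close - 1           # r_{t, t+h}
    labels = pd.Series(0, index=close.index)
    labels[ret > tau] = 1
    labels[ret < -tau] = -1
    return labels.iloc[:-h]                     # the last h bars have no forward return

# Usage on a hypothetical price series
rng = np.random.default_rng(1)
close = pd.Series(100 * np.exp(rng.normal(0, 0.01, 500).cumsum()))
print(fixed_horizon_labels(close, h=10, tau=0.02).value_counts())
```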

q For example: a price path whose return falls below the −τ threshold over the horizon is labeled −1.

q Disadvantages of the fixed-time horizon method:
Ø Time bars usually do not exhibit good statistical properties.
Ø The threshold τ is constant, even though volatility changes over time.

Outline
q The Fixed-time Horizon Method
q The Triple-barrier Method
q Meta-labeling

The Triple-barrier Method
q There are two horizontal barriers and one vertical barrier in the triple-barrier method.
q The two horizontal barriers are defined by profit-taking and stop-loss limits.
q The vertical barrier is the time limit.
q The horizontal barriers are dynamic and are set from the volatility:
Ø Estimated with an exponentially weighted moving average (a labeling sketch follows below).
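A simplified sketch of triple-barrier labeling, assuming a close-price series; it omits the side, concurrency, and sample-weighting machinery of the full implementation in [1], and all names are illustrative:

```python
import numpy as np
import pandas as pd

def triple_barrier_labels(close, events, pt=1.0, sl=1.0, horizon=20, span=50):
    """Simplified sketch: label each event bar by which barrier its path touches first."""
    # Dynamic barrier width from an EWMA estimate of return volatility
    vol = close.pct_change().ewm(span=span).std()

    labels = {}
    for t in events:
        sigma = vol.loc[t]
        path = close.loc[t:].iloc[1:horizon + 1] / close.loc[t] - 1.0   # returns after t
        hit_up = path[path >= pt * sigma].index.min()                   # profit-taking barrier
        hit_dn = path[path <= -sl * sigma].index.min()                  # stop-loss barrier
        if pd.isna(hit_up) and pd.isna(hit_dn):
            labels[t] = 0                          # vertical (time) barrier touched first
        elif pd.isna(hit_dn) or (not pd.isna(hit_up) and hit_up < hit_dn):
            labels[t] = 1
        else:
            labels[t] = -1
    return pd.Series(labels)

# Usage on a hypothetical price series, labeling every 20th bar
rng = np.random.default_rng(2)
close = pd.Series(100 * np.exp(rng.normal(0, 0.01, 1000).cumsum()))
print(triple_barrier_labels(close, events=close.index[::20][:-2]).value_counts())
```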

The Triple-barrier Method
q For example: a path that hits the lower (stop-loss) barrier first is labeled y_i = −1, while a path that reaches the vertical (time) barrier without touching either horizontal barrier is labeled y_i = 0.

Outline
q The Fixed-time Horizon Method
q The Triple-barrier Method
q Meta-labeling

Meta-labeling
q Confusion matrix metrics:

$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$F_1\text{-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

Meta-labeling
q Meta-labeling: separate the problem into side and size.
Ø Side: the predicted direction of the bet.
Ø Size: how much money we should risk in the bet.

q Use two machine learning models to get better performance.
Ø First model, for the side:
• A binary classification model that decides the direction.
• Tune it so that recall is as high as possible.
Ø Second model, for the size:
• A multi-class classification model that decides the appropriate size we should invest in the bet.
• Tune it so that precision is as high as possible.

