
STOCK TREND PREDICTION WITH NEURAL

NETWORK TECHNIQUES

Seminar Presentation
Mohd Haris Lye Abdullah
haris.lye@mmu.edu.my

Supervisor
Professor Dr Y. P. Singh
y.p.singh@mmu.edu.my
Outline
1. Introduction / Research Objectives
2. Stock Trend Prediction
3. Neural Networks
4. Support Vector Machines
5. Feature Selection
6. Experiments and Results
7. Conclusion
Objectives
a) Evaluate the performance of neural network techniques on the task of stock trend prediction. The Multilayer Perceptron (MLP), Radial Basis Function (RBF) network and Support Vector Machine (SVM) are evaluated.

b) Formulate and evaluate stock prediction as a two-class classification problem and as a regression problem.

c) Study pattern rejection techniques to improve prediction performance.
Stock Prediction
• Stock prediction is a difficult task because stock data is very noisy and time varying.
• The efficient market hypothesis claims that the future price of a stock is not predictable from publicly available information.
• However, this theory has been challenged by many studies, and several researchers have successfully applied machine learning approaches such as neural networks to stock prediction.
Is the Market Predictable ?
• Efficient Market Hypothesis (EMH) (Fama, 1965)
The stock market is efficient in that current market prices reflect all information available to traders, so that future changes cannot be predicted relying on past prices or publicly available information.

• Fama et al. (1988) showed that 25% to 40% of the variance in stock returns over periods of three to five years is predictable from past returns.

• Pesaran and Timmermann (1999) concluded that the UK stock market was predictable over the past 25 years.

• Saad (1998) successfully employed different neural network models to predict the trend of various stocks over a short-term horizon.
Implementation
• In this paper we investigate SVM, MLP and RBF networks for the task of predicting the future trend of three major stock indices:
a) Kuala Lumpur Composite Index (KLCI)
b) Hong Kong Hang Seng Index
c) Nikkei 225 stock index
using inputs based on technical indicators.
• This paper approaches the problem as a two-class pattern classification task, formulated specifically to assist investors in making trading decisions.
• The classifier is asked to recognise investment opportunities that can give a return of r% or more within the next h days (r = 3%, h = 10 days).
System Block Diagram
• The classifier is to predict whether an increment of more than 3% in the stock index can be achieved within the next 10-day period.

[Block diagram: daily historical data is converted into technical analysis indicators and fed to the classifier, which outputs Yes / No: increment achievable?]
Classification Vs Forecasting
• Forecasting
  Predict the actual future value.

• Classification
  Assign a pattern to one of the class categories.
  The predicted class gives the direction of the future trend.
Data Used
• Kuala Lumpur Composite Index (KLCI) for the period 1992 - 1997.
Data Used
• Hang Seng Index (20/4/1992 - 1/9/1997)
Data Used
• Nikkei 225 stock index (20/4/1982 - 1/9/1987)
TABLE 1: DESCRIPTION OF INPUT TO CLASSIFIER
(inputs xi, i = 1, 2, 3, ..., 12; input dimension n = 15)

Input to Classifier

DLN(t) = sign[q(t) - q(t-N)] * ln(q(t)/q(t-N) + 1)   (1)

where q(t) is the index level at day t and DLN(t) is the actual input to the classifier.
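As an illustration, a minimal sketch of the transform in Eq. (1), taking the parenthesisation literally as printed; the function name and the NumPy-based implementation are assumptions of ours, not the paper's code:

```python
import numpy as np

def dln(q, N):
    """Signed, log-scaled N-day difference of the index level q(t),
    following Eq. (1) as printed. q is a 1-D array of daily index
    levels; the first N entries have no defined value."""
    out = np.full(len(q), np.nan)
    for t in range(N, len(q)):
        diff = q[t] - q[t - N]
        out[t] = np.sign(diff) * np.log(q[t] / q[t - N] + 1)
    return out
```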
Prediction Formulation

Consider ymax(t) as the maximum upward movement of the stock


index value within the period t and t + . y(t) represents the stock
index level at day t
Prediction Formulation

Classification
The prediction of stock trend is formulated as a two class
classification problem.

yr(t) > r% >> Class 2


yr(t)  r% >> Class 1
Prediction Formulation
Classification
• Let (xi, yi), 1 <= i <= N, be a set of N training examples, where each input example xi ∈ R^n (n = 15 being the dimension of the input space) belongs to a class labelled by yi ∈ {+1, -1}.

[Figure: training patterns labelled yi = +1 and yi = -1]
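A minimal sketch of how the ±1 targets could be generated from a price series under this formulation; the function name and the exact window convention are assumptions:

```python
import numpy as np

def make_labels(price, h=10, r=0.03):
    """Label +1 if the maximum upward move of the index within the
    next h days exceeds r, else -1. price is a 1-D array of daily
    index levels; the final h days cannot be labelled."""
    n = len(price)
    labels = np.empty(n - h, dtype=int)
    for t in range(n - h):
        y_max = price[t + 1 : t + h + 1].max()   # highest level in (t, t+h]
        ret = (y_max - price[t]) / price[t]      # maximum return over horizon
        labels[t] = 1 if ret > r else -1
    return labels
```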
Prediction Formulation
Regression
• In the regression approach, the target output is represented by a scalar value yr, the predicted maximum excess return within the period h days ahead.
Neural Network
• According to Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan, p. 2:
  A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use.
• Knowledge is acquired by the network through a learning process, either supervised or unsupervised. This paper uses supervised learning, where each training pattern and its target pattern are presented to the neural network during the learning process.
Neural Network
Advantages of Neural Networks
The advantages of neural networks stem from their adaptive and generalization abilities.

a) Neural networks are adaptive methods that can learn without any prior assumption about the underlying data.
b) Neural networks, namely the feed-forward multilayer perceptron and the radial basis function network, have been proven to be universal function approximators.
c) Neural networks are non-linear models with good generalization ability.
Neural Network
Taxonomy of Neural Network Architectures

The architecture of a neural network refers to the arrangement of the connections between neurons and processing elements, the number of layers, and the flow of signals in the network. There are two main categories of neural network architecture: feed-forward and feedback (recurrent) neural networks.
Neural Network
• Feed-forward network: Multilayer Perceptron
Neural Network
• Recurrent network
Multilayer Perceptron (MLP)
[Figure: MLP structure - an input vector (x1, ..., xn) feeds the input layer, a hidden layer and an output layer; each neuron processing element forms a weighted sum (w1, ..., wn) of its inputs and passes it through an activation function F(y).]
Multilayer Perceptron (MLP)
Training the MLP Network
• The multilayer perceptron (MLP) network uses the back-propagation learning algorithm to obtain the weights of the network.
• The simple back-propagation algorithm uses the steepest gradient descent method to make changes to the weights.
• The objective of training is to minimize the training mean square error Emse over all the training patterns.
• To speed up training, the faster Levenberg-Marquardt back-propagation algorithm is used.
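For illustration, a hedged scikit-learn sketch of such an MLP classifier. Scikit-learn does not provide the Levenberg-Marquardt trainer used here, so the quasi-Newton 'lbfgs' solver stands in for it, and the hidden-layer size is a placeholder rather than the paper's setting:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Sketch only: 'lbfgs' replaces the Levenberg-Marquardt algorithm of the
# paper; hidden_layer_sizes=(10,) is an assumed, not reported, setting.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10,),
                  activation="tanh",
                  solver="lbfgs",
                  max_iter=2000),
)
# mlp.fit(X_train, y_train); y_pred = mlp.predict(X_test)
```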
Multilayer Perceptron (MLP)
• MLP Network Setup
a) Number of hidden layers
b) Number of hidden neurons
c) Number of input neurons
d) Activation function
RBF Network
The RBF network is a three-layer feed-forward structure consisting of an input layer, a single hidden layer with locally tuned hidden units, and an output layer acting as a linear combiner.
RBF Network
RBF Network Training

• The orthogonal least squares (OLS) method proposed by Chen, S. et al. (1991) is a learning method that provides a systematic selection of the centre nodes in order to reduce the size of the RBF network. The learning task involves finding the appropriate centres and then the corresponding weights. This method is adopted here.
• RBF centres are selected from the set of training data.
• The OLS method is employed as a forward regression procedure to select the centres of the RBF nodes from the candidate set. At each step, the centre that maximizes the error reduction is selected.
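A simplified sketch of forward centre selection in the spirit of this procedure. The true OLS algorithm of Chen et al. orthogonalises the regressors for efficiency; this naive version refits a least-squares solution from scratch at each step, and the Gaussian width is a placeholder:

```python
import numpy as np

def rbf_design(X, centres, width=1.0):
    """Gaussian RBF design matrix: one column per centre."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def forward_select_centres(X, y, n_centres, width=1.0):
    """Greedily add the training point whose inclusion as a centre
    most reduces the least-squares training error."""
    selected, remaining = [], list(range(len(X)))
    for _ in range(n_centres):
        best_err, best_j = np.inf, None
        for j in remaining:
            Phi = rbf_design(X, X[selected + [j]], width)
            w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
            err = np.mean((Phi @ w - y) ** 2)
            if err < best_err:
                best_err, best_j = err, j
        selected.append(best_j)
        remaining.remove(best_j)
    return X[selected]
```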
Support Vector Machine
• The Support Vector Machine is a special neural network technique based on the structural risk minimisation (SRM) principle. In SRM, the capacity of the learning machine is minimized together with the training error.
• In empirical risk minimization (ERM), used in conventional neural networks such as the MLP and RBF network, only the training error is minimized.
• The SVM in its current form was introduced by Vapnik and co-workers in 1995, building on the earlier work of Vapnik and Chervonenkis.
Support Vector Machine
• SVM demonstrates good generalization performance.
• It has a sparse representation of the solution: the solution depends only on a subset of training data points called support vectors.
• Training an SVM is equivalent to solving a linearly constrained quadratic programming problem. The solution is always unique, globally optimal, and free from the local minima problem.
Support Vector Machine
• Many decision boundaries can separate these two classes.
• Which one should we choose?

[Figure: two classes of points (Class 1, Class 2) separable by many candidate boundaries]
Support Vector Machine

[Figure: the two classes separated by the optimal hyperplane, with separation margin m]

In SVM, the optimal separating hyperplane is chosen to maximize the separation margin m and minimize the error.
Optimization Problem in SVM
• Let {x1, ..., xn} be our data set and let yi ∈ {+1, -1} be the class label of xi.
• The decision boundary should classify all points correctly:
yi (w . xi + b) >= 1, for all i
• This gives a constrained optimization problem: minimize ||w||^2 subject to the constraints above.
Support Vector Machine
• For a non-linear boundary, SVM maps the training data into a higher-dimensional feature space using a kernel function K(x, xi).
• In this feature space SVM constructs a separating hyperplane that maximises the margin (the distance from the closest data points to the hyperplane) while at the same time minimizing the misclassification error.
• The Gaussian radial basis kernel is used, defined as follows:
K(x, xi) = exp(-γ ||x - xi||^2)

• The optimum separating hyperplane (OSH) is represented by
F(x) = sign( Σi αi yi K(x, xi) + b )
• The sign gives the class label.
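For illustration, a hedged scikit-learn sketch of such a classifier; `SVC` with an RBF kernel evaluates sign(Σi αi yi K(x, xi) + b) at prediction time, while the γ and C values below are placeholders rather than the paper's settings:

```python
from sklearn.svm import SVC

svm = SVC(kernel="rbf", gamma=0.1, C=10.0)   # placeholder parameters
# svm.fit(X_train, y_train)
# svm.support_ gives the indices of the support vectors, the sparse
# subset of training points on which the decision function depends.
```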
Tolerance to Noise
• To allow misclassification error, the constraints are relaxed to
yi (w . xi + b) >= 1 - ξi,   ξi >= 0
• The following objective is minimized in order to obtain the optimum hyperplane:
||w||^2 + C Σ_{i=1..n} ξi
• ξi is the slack variable introduced to allow a certain level of misclassified points. C is the regularisation parameter that trades off misclassification error against margin maximisation.
• For Uneven Class Distribution

||w||^2 + C+ Σ_{i: yi = +1} ξi + C- Σ_{i: yi = -1} ξi

• Different misclassification costs can be applied to data with different class labels.
• A receiver operating characteristic (ROC) curve can be obtained by varying C+ and C-.
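A hedged sketch of how such asymmetric costs might be applied in scikit-learn, where per-class weights scale C; sweeping the ratio traces out operating points for the ROC curve. The ±1 labels and the grid values are assumptions:

```python
from sklearn.svm import SVC

for pos_cost in (0.5, 1.0, 2.0, 4.0):        # assumed C+/C- ratio grid
    svm = SVC(kernel="rbf", C=10.0,
              class_weight={1: pos_cost, -1: 1.0})
    # svm.fit(X_train, y_train) -> evaluate to obtain one ROC point
```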
Support Vector Regression
• In the regression problem the desired output to be predicted is real-valued, whereas in classification problems the desired output is a discrete value representing the class/category.

• The output to be predicted is the strength of the trend.

• SVM approximates the regression function with the following form:
f(x) = Σi (αi - αi*) K(x, xi) + b
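A minimal sketch of the regression counterpart in scikit-learn, using the ε-insensitive loss and Gaussian kernel parameters listed on the next slide; the values shown are placeholders:

```python
from sklearn.svm import SVR

svr = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma=0.1)  # placeholders
# svr.fit(X_train, y_train_return)      # target: maximum excess return
# trend_strength = svr.predict(X_test)
```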
Parameter for SVM
a) Classifier
Regularisation constant C
Kernel parameter γ

b) Regressor
Parameter ε for the ε-insensitive loss function
Regularisation constant C
Kernel parameter γ
Feature Selection
• Feature selection is a process whereby a subset of the potential predictor variables is selected, based on a relevance criterion, in order to reduce the input dimension.

• Typical feature selection involves the following steps, sketched as a generic loop below:
Step 1. Search: generate a candidate feature subset.
Step 2. Evaluate the generated subset.
Step 3. Check the stopping criterion.

• Steps 1, 2 and 3 are repeated until the stopping criteria are met, for example when the minimum number of features has been included or the minimum accepted prediction accuracy has been achieved.
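A generic sketch of this loop, with `generate`, `evaluate` and `stop_criterion` as placeholders standing in for the paper's concrete search procedure, relevance criterion and stopping rule:

```python
def select_features(candidates, generate, evaluate, stop_criterion):
    """Generic feature-selection loop: generate a subset (Step 1),
    evaluate it (Step 2), test the stopping criterion (Step 3)."""
    best_subset, best_score = None, float("-inf")
    while True:
        subset = generate(candidates, best_subset)     # Step 1: search
        score = evaluate(subset)                       # Step 2: evaluation
        if score > best_score:
            best_subset, best_score = subset, score
        if stop_criterion(best_subset, best_score):    # Step 3: stop test
            return best_subset
```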
Feature Selection
General Approaches to Feature Selection

a) Wrapper approach
The wrapper approach makes use of the induction algorithm itself to evaluate the relevance of the features. The relevance measure is based on solving the related problem, usually the prediction accuracy of the induction algorithm when the features are used.

b) Filter approach
The filter method selects the feature subset independently of the induction algorithm. Feature correlation is usually used as the relevance measure.
Feature Selection
Feature Subset Selection

• Feature subset selection (FSS) search algorithms fall into three categories:
a) exponential
b) randomised
c) sequential, e.g. Forward Sequential Selection (FSS) and Backward Sequential Selection (BSS).
Feature Selection
Sequential selection techniques
a) Forward Sequential Selection (FSS)
b) Backward Sequential Selection (BSS)

• Both BSS and FSS are used (see the sketch below).
• Features are selected from the subset that gives the best predictor performance when BSS and FSS are applied.
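For illustration, a hedged sketch of wrapper-style forward and backward sequential selection with scikit-learn's SequentialFeatureSelector; the wrapped estimator, the number of features to keep and the cross-validation setting are assumptions, not the paper's configuration:

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

est = SVC(kernel="rbf")                                  # assumed wrapper model
fss = SequentialFeatureSelector(est, direction="forward",
                                n_features_to_select=8, cv=5)
bss = SequentialFeatureSelector(est, direction="backward",
                                n_features_to_select=8, cv=5)
# fss.fit(X, y); X_fss = fss.transform(X)   # keep the better-scoring subset
```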
Feature Subset Selection
Sequential selection result
Performance Measure
• True Positive (TP) is the number of positive-class patterns correctly predicted as positive.
• False Positive (FP) is the number of negative-class patterns wrongly predicted as positive.
• False Negative (FN) is the number of positive-class patterns wrongly predicted as negative.
• True Negative (TN) is the number of negative-class patterns correctly predicted as negative.
Performance Measure

• Accuracy = (TP + TN) / (TP + FP + TN + FN)

• Precision = TP / (TP + FP)

• Recall rate (sensitivity) = TP / (TP + FN)

• F1 = 2 * Precision * Recall / (Precision + Recall)
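These four measures follow directly from the confusion counts; a small helper, with names of our choosing:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```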
Testing Method
A Rolling Window Method is Used to Capture the Training and Test Data

[Diagram: consecutive Train / Test windows sliding forward through the series]

Training window = 600 data points; test window = 400 data points.
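A minimal sketch of the split, assuming the window advances by one test-window length each step (the actual step size is not stated on the slide):

```python
def rolling_windows(n, train=600, test=400):
    """Yield consecutive (train_indices, test_indices) pairs that
    slide forward through a series of length n."""
    start = 0
    while start + train + test <= n:
        yield (range(start, start + train),
               range(start + train, start + train + test))
        start += test        # assumed step size
```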


Experiment and Result
• Experiments are conducted to predict the stock trend of three major stock indices: KLCI, Hang Seng and Nikkei.
• SVM, MLP and RBF networks are used to make trend predictions based on the classification and regression approaches.
• A hypothetical trading system is simulated to find the annualized profit generated from the given predictions.
Experiment and Result
Trading Performance
• A hypothetical trading system is used.
• When a positive prediction is made, one unit of money is invested in a portfolio reflecting the stock index. If the stock index increases by more than r% (r = 3%) within the next h days (h = 10), say at day t, the investment is sold at the index price of day t. If not, the investment is sold on day t+1 regardless of the price. A transaction fee of 1% is charged for every transaction made.
• The annualised rate of return is used as the performance measure.
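A hedged sketch of this trading rule; annualisation is omitted, the early-exit day follows the slide's wording, and the round-trip fee treatment (one fee each for buy and sell) is an assumption:

```python
def simulate_trading(prices, signals, r=0.03, h=10, fee=0.01):
    """Invest one unit on each positive signal at day t; sell once the
    index is up more than r within the next h days, otherwise exit the
    next day as the slide describes. 1% fee per transaction."""
    profit = 0.0
    for t in range(len(signals)):
        if signals[t] != 1 or t + h >= len(prices):
            continue
        exit_price = prices[t + 1]                     # fallback exit
        for k in range(1, h + 1):
            if prices[t + k] > prices[t] * (1 + r):    # target reached
                exit_price = prices[t + k]
                break
        profit += exit_price / prices[t] - 1 - 2 * fee
    return profit
```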
Trading Performance
• Classifier Evaluation Using the Hypothetical Trading System
Trading Performance
Experiment and Result
• Classification Results
Experiment and Result

• The results show better performance of the neural network techniques when compared to the K-nearest-neighbour classifier. SVM shows better overall performance on average than the MLP and RBF networks in most of the performance metrics used.
Experiment and Result
Comparison of Receiver Operating Characteristic (ROC) Curves
Experiment and Result
• Area under the ROC Curve
Experiment and Result
• Error-Reject Trade-off
Experiment and Result
• The Accuracy-Reject (AR) curve can be plotted to see the accuracy improvement of the classifier at various rejection rates. The AR curve is a plot of the classifier's operating points, showing the possible trade-off between the accuracy of the classifier and the rejection rate implemented.
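A sketch of how such a curve could be traced, assuming the classifier supplies a confidence score (e.g. the probability of the predicted class) for each pattern, and that the inputs are NumPy arrays:

```python
import numpy as np

def accuracy_reject_curve(y_true, y_pred, confidence, thresholds):
    """Reject patterns whose confidence falls below each threshold and
    measure accuracy on the accepted remainder; returns a list of
    (reject rate, accuracy) operating points."""
    points = []
    for thr in thresholds:
        accept = confidence >= thr
        if not accept.any():
            break
        acc = (y_true[accept] == y_pred[accept]).mean()
        points.append((1.0 - accept.mean(), acc))
    return points
```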
Accuracy-Reject (AR) curve
Compare Regression
Performance
• The SVM, RBF and MLP networks are used as the predictors.
Compare Regression
Performance
Conclusion
• We have investigated the SVM, MLP and RBF networks as classifiers and regressors to assess their potential in the stock trend prediction task.

• The support vector machine (SVM) has shown better performance when compared to the MLP and RBF networks.

• The SVM classifier with probabilistic output outperforms the MLP and RBF networks in terms of the error-reject trade-off.

• Both the classification and regression models can be used for a profitable trend prediction system. The classification model has the advantage that a pattern rejection scheme can be incorporated.
THE END
