
Computing Conference 2017

18-20 July 2017 | London, UK

Surveying Various Genetic Programming (GP) Approaches to Forecast Real-Time Trends & Prices in the Stock Market
Balasaheb Gite, Department of Computer Engineering, Sinhgad Academy of Engineering (SAE), Pune, India, bbgite.sae@sinhgad.edu
Khalid Sayed, Department of Computer Engineering, Sinhgad Academy of Engineering (SAE), Pune, India, khalidahmedsayed@gmail.com
Navin Mutha, Department of Computer Engineering, Sinhgad Academy of Engineering (SAE), Pune, India, namutha0312@gmail.com
Saurabhkumar Marpadge, Department of Computer Engineering, Sinhgad Academy of Engineering (SAE), Pune, India, saurabhmarpadge29@gmail.com
Kshitij Patil, Department of Computer Engineering, Sinhgad Academy of Engineering (SAE), Pune, India, kshitijpatil137@gmail.com

Abstract—The share prices in the stock market are known for their extreme unpredictability, and attempts to identify familiar patterns in the prices pose a confounding problem for both fundamental and technical analysts. This article attempts to use the symbolic regression capabilities of GP together with a market trend indicator (RSI) to predict the price and trend of a particular stock as accurately as possible. The market indicator forecasts the trend independently of GP. It therefore serves as a verification mechanism for the next-day price predicted by GP, which further validates that prediction in the context of the real-time stock market. Extensive testing has been done on the various evolution parameters and functions of GP to customize the GP approach to the current application as much as possible and to optimize the results. Practising technical analysts can never fully rely on these results, but the results could serve as decision-making support.

Keywords—Genetic Programming (GP); Relative Strength Index (RSI); Stock price; Stock trend

I. INTRODUCTION

The Efficient Market Hypothesis states that it is not possible to ‘say’ anything about the future of a particular stock based on past data. The hypothesis is widely accepted, mostly by the fundamental analyst community. Its opponents, however, believe that past data does play some role in shaping the future trend or price. Technical analysts study a company's historical prices extensively to forecast future market conditions. To do so, they use a plethora of market indicators and large amounts of historical data, coupled with data mining techniques. Technical analysis isolates key factors that may not be quantified, and it has nonetheless been shown to give more than acceptable results [1].

Keeping in mind the arguments of both parties on this contentious issue, this project uses Genetic Programming (GP) concepts along with a market indicator to demonstrate a new decision-making support methodology. The regression properties of GP can provide a ‘mathematical’ prediction, free from any bias that a fundamental analyst may have. However, regression of this kind can never be relied on alone, however accurate or novel it may be. To mitigate this problem, a psychological market indicator such as RSI is used in tandem with GP. This shows the user whether the two methodologies give the same answer (in the form of price and trend).

Technical analysis combines various market indicators and carefully assigns them weights based on their contribution to the desired output. Model building is tedious, and usually more than one indicator is involved, even though the more effective features are given higher priority. As a result, the output may deviate from the desired result in ways that a single, much-improved technical indicator could have avoided. This paper attempts to showcase exactly this: using only one popular indicator (RSI) and further improving it using GP. Thus, the ‘opinion’ of the lone indicator is not compromised by other indicators, while its own accuracy is improved.

II. LITERATURE SURVEY

GP has a large set of configuration parameters (evaluation method, fitness weights, crossover and mutation probabilities). These can be set beforehand to tailor GP to the problem at hand (predicting the stock price for the next day) and to enhance its regression abilities [2]. In this paper, attempts have been made to find the right set of parameters.
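These configuration parameters can be gathered into a single object. The following is a minimal Python sketch, not the authors' code. The `GPConfig` class and its field names are hypothetical. The generation limit of 200 and the error threshold of 0.001 are values mentioned elsewhere in this paper. The other defaults (population size, tournament size, crossover and mutation probabilities, evaluation label) are illustrative assumptions. The negative fitness weight mirrors the DEAP convention, in which a weight of -1.0 means the objective (here, prediction error) is minimised.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPConfig:
    """Hypothetical bundle of GP settings; most defaults are assumptions."""
    evaluation_method: str = "mean_absolute_error"  # hypothetical label for the error measure
    fitness_weight: float = -1.0     # negative weight: the error is minimised
    population_size: int = 300       # assumption
    tournament_size: int = 7         # assumption; larger means stronger selection pressure
    crossover_prob: float = 0.8      # assumption: chance that two parents are crossed over
    mutation_prob: float = 0.1       # assumption: chance that a child is mutated
    max_generations: int = 200       # generation limit mentioned in Section V
    error_threshold: float = 0.001   # error target mentioned in Section III

    def __post_init__(self) -> None:
        # Reject settings that cannot work before any evolution starts.
        for name in ("crossover_prob", "mutation_prob"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1]")
        if not 1 <= self.tournament_size <= self.population_size:
            raise ValueError("tournament_size must be between 1 and population_size")

    def should_stop(self, generation: int, best_error: float) -> bool:
        """Terminate on the generation limit or once the best error is small enough."""
        return generation >= self.max_generations or best_error < self.error_threshold
```

For example, `GPConfig(tournament_size=10).should_stop(200, 0.5)` returns `True` because the generation limit has been reached, even though the error is still large.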

131 | P a g e
978-1-5090-5443-5/17/$31.00 ©2017 IEEE
Technical indicators are widely used by technical analysts because they take historical prices and trends into account. Here, the Relative Strength Index (RSI) has been chosen as the verification method because of its low calculation overhead. RSI is also an extremely popular momentum indicator. It has been featured in a number of articles, interviews and books over the years and has stood the test of time [3].

Most of the available literature on stock prediction considers many market indicators at the same time. The training data for stocks (open, close, low, high, volume) is usually transformed into market indicators by applying the respective formulae. Multivariate Adaptive Regression Splines (MARS), sometimes together with Stepwise Regression (SR), are then applied to these indicators to filter out the promising ones. The selected indicators are fed into a forecasting model, most often Support Vector Regression (SVR), to predict the price. GP enters the picture here as a proven methodology for optimizing certain parameters of the SVR model (such as C) to obtain better forecasting effectiveness [4]. In this method, it is the SVR that does the forecasting; GP only serves to polish the input for the forecasting model.

Market psychology indicators are ultimately quantified into numerical values, and this is where the mathematical prediction capabilities of GP come into the picture. In this paper, therefore, the market trend predicted by RSI (overbought or oversold) is corroborated with the increase or decrease in price predicted by GP. The GP approach used to verify the RSI trend is also customized as thoroughly as possible for this exact need.

III. WORKING OF GP

Based on the given problem statement, a set of initial programs is randomly created; these programs form the first generation. If certain features of the final solution are known or expected, they can be included in the initial population (referred to as seeding). This helps narrow down the search for a possible solution.

[Fig. 1 diagram: Generate initial population of random programs → Evaluate to identify the fitter programs and breed them → Obtain the best found solution after some stopping criteria]
Fig. 1. An overview of GP

Programs from the first generation are tested by a suitably selected fitness model to identify the ‘better’ ones. The fitter a program, the higher its probability of being included in the breeding operations that produce the next generation. The child programs inherit features from their parents or, less commonly, are mutated versions of a parent.

Repeated reproduction thus creates many generations, each program in a generation having unique features. To stop GP, a terminating condition needs to be specified. Either a fixed limit on the number of generations is reached, or the error of the best program falls below some small threshold, such as 0.001.

A. Representation of Programs

In GP, the programs/expressions are represented as trees, which makes the relationships between sub-expressions and sub-trees easier to see. The external nodes (leaves) of a tree are constants and variables (called terminals). The internal nodes are operators of varying arities (called functions).

B. Random Generation of Initial Population

Two simple approaches to this random generation are the full and grow methods. A better approach is the ramped half-and-half method proposed by Koza. It uses the full method to generate one half of the population and the grow method to generate the other half, over a range of depth limits [5]. Since this is considered the better initialization method, the code implements this approach.

In the full method, nodes are chosen only from the function set until the depth limit is reached; after that, only terminals are chosen. The grow method, in contrast, creates more varied trees. Its nodes are chosen from either the function set or the terminal set until the depth limit is reached, after which only terminals may be chosen.

C. Selection followed by Recombination and Mutation

A suitably selected fitness model decides the fitness value of every program. Here, fitness indicates how close an individual of a generation is to the pursued solution.

After a generation is created, a selection tournament is conducted in which individuals are pitted against each other. Those with higher fitness get a better chance to take part in the recombination/mutation process that gives birth to the next generation. However, care must be taken to keep the selection pressure constant. A tournament achieves this because it considers only which individual is fitter, not by how much. This helps maintain the genetic diversity of the population.

The most popular recombination method is subtree crossover. A crossover point (a randomly chosen node) is selected in each of two parent trees. The subtree at the first parent's crossover point is then replaced by the subtree at the second parent's crossover point, forming an altogether new tree. This is analogous to the birth of a child who inherits genes from both mother and father.

Another, more sparingly used, technique is mutation. A mutation point (a randomly chosen node) is selected, and the subtree rooted there is replaced by a randomly generated subtree. Here, the child may be radically different from the single parent that spawned it. Though mutation breaks the traditional
norm of passing down the ‘good’ genes of individuals to future generations, it has the benefit of increasing the diversity of the genetic pool.
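The program representation and the genetic operators described in this section can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the DEAP-based implementation used in this paper:

- Trees are encoded as nested tuples.
- The function set is limited to addition, subtraction and multiplication.
- The only variable is a hypothetical input `x`.
- The grow method's chance of stopping early at a node (`GROW_TERMINAL_PROB`) is an assumed value.

All function names are hypothetical.

```python
import operator
import random
from typing import Iterator, List, Sequence, Tuple, Union

# Hypothetical encoding: a tree is a terminal (the variable "x" or a float
# constant) or a tuple (function_name, child_1, ..., child_n).
Tree = Union[str, float, tuple]
Path = Tuple[int, ...]

FUNCTIONS = {"add": (2, operator.add), "sub": (2, operator.sub), "mul": (2, operator.mul)}
GROW_TERMINAL_PROB = 0.4  # assumed chance that grow places a terminal early


def random_terminal(rng: random.Random) -> Tree:
    # Leaves are either the input variable or a small random constant.
    return "x" if rng.random() < 0.5 else round(rng.uniform(-1.0, 1.0), 2)


def gen_tree(rng: random.Random, max_depth: int, method: str, depth: int = 0) -> Tree:
    """Full: functions until max_depth. Grow: functions or terminals until max_depth."""
    if depth == max_depth or (method == "grow" and rng.random() < GROW_TERMINAL_PROB):
        return random_terminal(rng)
    name = rng.choice(sorted(FUNCTIONS))
    arity = FUNCTIONS[name][0]
    return (name, *(gen_tree(rng, max_depth, method, depth + 1) for _ in range(arity)))


def ramped_half_and_half(rng: random.Random, size: int,
                         min_depth: int = 2, max_depth: int = 4) -> List[Tree]:
    """Alternate full and grow, cycling the depth limit over min_depth..max_depth."""
    depths = list(range(min_depth, max_depth + 1))
    return [gen_tree(rng, depths[(i // 2) % len(depths)], "full" if i % 2 == 0 else "grow")
            for i in range(size)]


def evaluate(tree: Tree, x: float) -> float:
    if isinstance(tree, tuple):
        name, *children = tree
        return FUNCTIONS[name][1](*(evaluate(c, x) for c in children))
    return x if tree == "x" else tree


def mean_abs_error(tree: Tree, samples: Sequence[Tuple[float, float]]) -> float:
    return sum(abs(evaluate(tree, x) - y) for x, y in samples) / len(samples)


def paths(tree: Tree, prefix: Path = ()) -> Iterator[Path]:
    """Yield the path (child indices from the root) of every node."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get_subtree(tree: Tree, path: Path) -> Tree:
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree: Tree, path: Path, new: Tree) -> Tree:
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def tournament(rng: random.Random, pop: List[Tree], errors: List[float], k: int) -> Tree:
    """Lowest error among k random contestants wins; only the ranking matters."""
    contestants = rng.sample(range(len(pop)), k)
    return pop[min(contestants, key=lambda i: errors[i])]


def subtree_crossover(rng: random.Random, mum: Tree, dad: Tree) -> Tree:
    """Replace a random subtree of mum with a random subtree of dad."""
    cut = rng.choice(list(paths(mum)))
    donor = get_subtree(dad, rng.choice(list(paths(dad))))
    return replace_subtree(mum, cut, donor)


def subtree_mutation(rng: random.Random, parent: Tree, max_depth: int = 2) -> Tree:
    """Replace a random subtree with a freshly grown random one."""
    point = rng.choice(list(paths(parent)))
    return replace_subtree(parent, point, gen_tree(rng, max_depth, "grow"))
```

A child for the next generation could then be bred in three steps. First, draw two parents with `tournament`, using errors from `mean_abs_error`. Then pass them through `subtree_crossover`. Occasionally, also apply `subtree_mutation` to the result. Because `tournament` compares contestants only by rank, the selection pressure does not depend on how much fitter one individual is than another.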
IV. WORKING

First and foremost, real-time stock data is needed. A Python module has been used to extract the past 40 days' worth of data for the stock, including the open, close, low, high and volume values. The module fetches this data from Yahoo.

Secondly, the RSI is calculated over the past 40 days using its standard formula, which in turn indicates whether to buy or sell the stock. The trend depends on the value of RSI: whether it crosses the overbought band, crosses the oversold band, or lies between the bands. GP is also applied to the previous 40 days' data to evolve a function that predicts the price on the next day. At each generation, the output of each individual function is checked against the stock's closing price on the next day. GP is implemented in Python using the Distributed Evolutionary Algorithms in Python (DEAP) package.

Fig. 2. Flowchart depicting the methodology applied

Thirdly, GP performs a verification check on RSI. This differs considerably from methods in which GP is applied directly to market indicators such as the Simple Moving Average [6]. Suppose RSI forecasts a rising trend for the upcoming 40 days and GP also predicts a rise in the close price on the next day. RSI is then said to have been verified by GP. The same applies when RSI depicts a falling trend and GP also predicts a drop in price for the next day.

Finally, the verified and unverified RSI signals are validated against the actual rise or fall in the stock market. If RSI and GP both indicate a rise (or both a fall) and a rise (or a fall) actually occurred, the case counts as a success. A case where RSI predicted a rise, GP predicted a fall, and the price did fall also counts as a success. Here, the GP verification correctly overruled the RSI prediction.
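The RSI step and the verification and validation rules of this section can be sketched as below. This is a hedged sketch, not the authors' code, and the function names are hypothetical. It makes the following assumptions:

- RSI follows Wilder's usual definition, RSI = 100 - 100 / (1 + RS), where RS is the ratio of the smoothed average gain to the smoothed average loss.
- A 14-day period is applied to the 40 fetched closing prices.
- The overbought and oversold bands sit at 70 and 30.
- An overbought reading signals a coming fall, and an oversold reading signals a coming rise.
- The symmetric disagreement case (RSI predicts a fall, GP a rise, and the price rises) is scored like the case described above.

The text does not say how readings between the bands are scored, so the sketch returns no signal for them.

```python
from typing import Optional, Sequence, Tuple


def wilder_rsi(closes: Sequence[float], period: int = 14) -> float:
    """RSI of the most recent close, using Wilder's smoothing (assumed variant)."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    # Seed with simple averages over the first `period` price changes ...
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    # ... then smooth in each later change, as in Wilder's method.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window, so RS is unbounded
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi_direction(rsi: float, overbought: float = 70.0, oversold: float = 30.0) -> Optional[str]:
    """Assumed reading: overbought -> expect a fall, oversold -> expect a rise."""
    if rsi >= overbought:
        return "fall"
    if rsi <= oversold:
        return "rise"
    return None  # between the bands: no directional signal in this sketch


def direction(new: float, old: float) -> str:
    # Assumption: an unchanged price is counted as "fall" (i.e. not a rise).
    return "rise" if new > old else "fall"


def validate_case(rsi_dir: Optional[str], gp_price: float, last_close: float,
                  actual_close: float) -> Optional[Tuple[bool, bool]]:
    """Return (verified, success) for one stock, or None if RSI gave no signal."""
    if rsi_dir is None:
        return None
    gp_dir = direction(gp_price, last_close)
    verified = gp_dir == rsi_dir
    # If GP agrees, the shared call is checked; if GP disagrees, GP overrules
    # RSI. Either way the case succeeds when GP's direction matches the market.
    success = gp_dir == direction(actual_close, last_close)
    return verified, success
```

For instance, `validate_case(rsi_direction(wilder_rsi(closes)), gp_price, closes[-1], actual_close)` scores one stock. Here `closes`, `gp_price` and `actual_close` are placeholder names for the fetched prices, the GP prediction and the realised next-day close.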
V. OBSERVATIONS

GP has been tailored as much as possible, and RSI has been verified against several GP approaches in order to find the best parameters for this task.

For testing purposes, 16 stocks of technology companies listed on NASDAQ have been considered, and the results have been plotted and documented in MS Excel. The input data is real market data, as recent as Sept. 16. Each approach corresponds to a different set of changes made to GP to make it a better verification check for RSI.

[Fig. 3 chart: Approach 1, outcome for each of the 16 test stocks (2 = success, 1 = failure)]
Fig. 3. Approach 1 shows 11/16 success cases

[Fig. 4 chart: Approach 2, outcome for each of the 16 test stocks (2 = success, 1 = failure)]
Fig. 4. Approach 2 shows 10/16 success cases
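The success counts in the captions of Figs. 3 to 6 can be turned into accuracies with a few lines of arithmetic. The short Python sketch below uses hypothetical function names. For each approach, it computes the accuracy over the 16 test stocks, the absolute gain over RSI alone in percentage points, and the corresponding relative gain. The improvement of 18.75 quoted in the Conclusion is the absolute difference between Approach 1 (68.75%) and RSI alone (50%). Relative to RSI alone, it amounts to 37.5%.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Success counts out of 16 test stocks, taken from the captions of Figs. 3 to 6.
SUCCESSES: Dict[str, int] = {"Approach 1": 11, "Approach 2": 10, "Approach 3": 9, "Only RSI": 8}
N_STOCKS = 16


def accuracy_pct(successes: int, total: int = N_STOCKS) -> Fraction:
    """Accuracy in percent, kept exact with Fraction."""
    return Fraction(100 * successes, total)


def compare_to_baseline(counts: Dict[str, int], baseline: str = "Only RSI"
                        ) -> Dict[str, Tuple[Fraction, Fraction, Fraction]]:
    """Map each approach to (accuracy %, gain in percentage points, relative gain %)."""
    base = accuracy_pct(counts[baseline])
    result = {}
    for name, k in counts.items():
        acc = accuracy_pct(k)
        points = acc - base                                 # absolute difference
        result[name] = (acc, points, 100 * points / base)  # relative to baseline
    return result


for name, (acc, pts, rel) in compare_to_baseline(SUCCESSES).items():
    print(f"{name}: {float(acc):.2f}% accuracy, {float(pts):+.2f} pp, {float(rel):+.1f}% relative")
# Approach 1: 68.75% accuracy, +18.75 pp, +37.5% relative
# Approach 2: 62.50% accuracy, +12.50 pp, +25.0% relative
# Approach 3: 56.25% accuracy, +6.25 pp, +12.5% relative
# Only RSI: 50.00% accuracy, +0.00 pp, +0.0% relative
```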

[Fig. 5 chart: Approach 3, outcome (success or failure) for each of the 16 test stocks]
Fig. 5. Approach 3 shows 9/16 success cases

[Fig. 6 chart: Only RSI, outcome for each of the 16 test stocks (2 = success, 1 = failure)]
Fig. 6. RSI alone does not perform as well as verified RSI (only 8/16 success cases)

As the results show, using RSI alone gives an accuracy of only 50% (8/16). The GP approaches used for validation give accuracies between 56.25% (9/16) and 68.75% (11/16). The approaches differ from each other in their parameter values. The GP approach that offers the best validation check is Approach 1.

Fig. 7. Parameter values for various GP approaches

While testing the different approaches, it was observed that changing certain parameter values, such as the fitness weights, had no effect at all on the performance. Raising the generation limit beyond 400 creates a bottleneck, with no further increase in accuracy.

The parameter values of the best approach could serve as guidelines for future work involving GP as a verification mechanism for market indicators like RSI. The best approach also shows two things for the application in this paper. First, the tournament size should be kept high. Second, keeping the number of generations at the standard limit of 200 gives optimum performance.

VI. LIMITATIONS OF VERIFIED RSI

It is important to bear in mind that only one technical indicator (RSI) has been verified by GP. RSI is popular among traders for its depiction of market psychology, but it is best suited only for trend prediction. Existing forecast models use many technical indicators as inputs, and these have been shown to cover many aspects of the market.

The customized GP approach predicts the next day's price with good mathematical accuracy. This accuracy, though, is purely mathematical and as such has no relation to other market indicators.

As regards the test cases, there is definitely scope for further improvement by considering various market indicators. These could give a clearer picture of the market, provided their results do not conflict.

VII. CONCLUSION

The approach used here bypasses the need to consider many technical indicators, which must be heavily discretized before being sent to forecasting models. Using many indicators not only increases the calculation overhead but also requires many discretization algorithms to be used simultaneously [7].

As the observations show, the GP-verified RSI with the best GP approach (Approach 1) improves accuracy by 18.75 percentage points over RSI applied without a GP check. This corresponds to 68.75% versus 50%, or 11/16 versus 8/16 success cases. It shows promising scope for further use of GP as a verification mechanism that makes RSI a much better decision-making support. The project proposes that GP validation could be applied to various other market indicators in a similar fashion to increase their accuracy. Different market indicators could be combined to cover many aspects of the stock market, and GP verification could then further improve the correctness of the combined result. Lastly and most importantly, the success of this approach opens new avenues for creating better-suited market indicators [8].

REFERENCES

[1] E. Zhao and Z. Han, “Analyze Long & Mid-term Trends of Stock with Genetic Programming on Moving Average and Turning Points,” IEEE, 2010.
[2] V. Hlavac, “A program searching for a functional dependence using genetic programming with coefficient adjustment,” Smart Cities Symposium Prague, 2016.
[3] stockcharts.com/school/doku.php?id=chart_school:technical_indicators:relative_strength_index_rsi
[4] C. H. Cheng and H. Y. Shiu, “A novel GA-SVR time series model based on selected indicators method for forecasting stock price,” IEEE, 2014.
[5] R. Poli, W. B. Langdon, and N. F. McPhee, “A Field Guide to Genetic Programming,” pp. 11-14.
[6] T. P. Nascimento, S. Labidi, P. B. Neto, N. Timbo, and A. Almeida, “A System Based on Genetic Algorithms as a Decision Making Support for the Purchase and Sale of Assets at Sao Paulo Stock Exchange,” IEEE, 2015.
[7] G. Wilson and W. Banzhaf, “Fast and Effective Predictability Filters for Stock Price Series using Linear Genetic Programming,” WCCI 2010 IEEE World Congress on Computational Intelligence.
[8] M. Khoza and T. Marwala, “A Rough Set Theory Based Predictive Model for Stock Prices,” 12th IEEE International Symposium on Computational Intelligence and Informatics, 2011.

