
Prediction of Future Stock Market Prices

Even Cheng

CS273A-Machine Learning
Jun 10, 2010

ABSTRACT
Everyone wants to make money from the stock market. In real life, stock prices are driven by
many factors: a financial crisis in Europe affects financial stocks in the United States, and the
oil spill in the Gulf cut the stock value of British Petroleum in half after the crisis had lasted
one month. From a machine learning perspective, such events are hard to predict from data.
Instead, we simplify the problem by assuming that no obvious outside force affects the stock
price, and ask what the price curve of a stock should then look like. This report is based on one
idea: what if we could predict the price in the near future? If we know that a stock price is about
to go up, we will buy more of it; if we know that it is about to drop sharply, we had better sell
quickly. In this report, we briefly review several classification methods used to predict stock
prices in Section 2 and introduce our dataset in Section 3. Section 4 describes our feature
selection and implementation, and Section 5 presents our experimental results and discussion.
The last two sections propose possible future work and conclude the report.

1. INTRODUCTION

Most people who invest in the stock market want to know whether a stock price will go up or
down tomorrow. If we know it is going to go up, we will buy more of the stock in order to earn
a profit; otherwise, we will sell our shares as soon as possible to prevent a significant loss. In
this paper we therefore use our knowledge of the past, namely how the stock fluctuates from
time t-k to time t, to build an empirical model and predict the highest price and the lowest price
in the time frame from t+1 to t+5.

2. RELATED WORKS

There is a body of research on predicting time series such as stock prices through empirical
investigation of the behavior of a stock price, implemented with supervised learning. The main
goal of most of these papers is to evaluate the direction of change (sign) of a stock. For
example, in Rodriguez's paper an AdaBoost classifier is constructed by integrating several weak
classifiers into a strong ensemble [3]. This method expects to combine the benefits of the
individual weak classifiers to obtain a better result. Tsaih proposed a hybrid AI approach that
incorporates a rule-based system to provide training samples for the neural network, and
implements a reasoning neural network instead of the commonly employed back-propagation
network [4]. Another investigation adopts a probabilistic neural network to forecast the
direction of price movement in the Taiwan stock index, and claims higher accuracy than other
investment strategies such as the generalized method of moments with a Kalman filter or a
random walk [5].


3. DATA ANALYSIS

Our data consists of 7 time series, called stock a, b, c, d, e, f, and g, each of which represents
the price fluctuation of one stock. There are 441,843 rows in each series, which means the
dataset covers 441,843 minutes of prices for stocks a to g. Every time series contains the
following 6 columns:
• Open: the opening price of the stock at time t.
• Max: the highest price of the stock at time t.
• Min: the lowest price of the stock at time t.
• Close: the closing price of the stock at time t.
• No_trade: the number of trades that occur at time t.
• Volume: the total volume of trades completed at time t.

Figure 1: The stock price of NASDAQ:GOOG in one day.

4. IMPLEMENTATION

4.1. CLASSIFICATION METHOD

In this report, we use the K-Nearest Neighbor (KNN) classification method to predict our
results. The idea of the KNN classifier is simple. It classifies a new data node (a test sample) by
gathering the labels of its k nearest neighbors, where k is a positive integer, and running a
voting algorithm over these labels. The nearest neighbors are found by computing the
Euclidean distance from the new node to every node in the training dataset. For two nodes p
and q with n dimensions, this distance is:

$$ d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} $$

Figure 2: The decision process of the KNN algorithm and its decision curve based on the dataset.

Figure 3: Computing the Euclidean distance from node p to node q, where p and q have n
dimensions.

4.2. FEATURE VECTOR SELECTION

Intuitively, we use past data as our feature vector to build the model. We collect data from time
t-k to t, where t indicates 'now'. We also incorporate miscellaneous data to assist our
prediction, such as the opening price, closing price, number of trades, and volume of the stock.

4.3. EVALUATION METHOD

We evaluate our predictions by computing the root mean square error (RMSE) between the
real maximum/minimum in the next 5 minutes and the predicted maximum/minimum returned
by our classification method. Our baseline predicts the maximum and minimum in the next 5
minutes by guessing that they equal the highest and lowest price of the current minute. The
root mean square error (Figure 4) is computed as follows, where N is the number of
predictions, \hat{y}_i is the i-th predicted value, and y_i is the corresponding real value:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2} $$

5. EXPERIMENT RESULTS

5.1. KNOWLEDGE TO THE PAST

The following results are trained on the time series from time 1 to time 400,000, and the
testing data consists of the time series from time 400,001 to time 441,838. For display
purposes, however, we only show a small portion of them in this paper. All results are
computed with Matlab 7 on a MacBook Pro laptop running Windows under Boot Camp.

First of all, Figure 5 shows that the error generally decreases gradually as we have more
knowledge of the past. The error drops from 10 at the beginning, when we only know the
current data, to almost 8.5 when we use the highest price of the past 8 minutes.

5.2. AMOUNT OF NEIGHBORS

From Figure 6 we learn that, for both the highest and the lowest price, the error drops
significantly before k=10. However, the error of predicting the maximum climbs after k is
greater than 10. This suggests that beyond a certain point the nearest data are no longer as
'near' as expected and become noise that hurts the correctness of our prediction.
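
As a rough illustration of Sections 4.1 to 4.3 and of the sweep over k discussed above, the
following Python sketch builds look-back feature vectors, predicts the maximum of the next 5
minutes with KNN, and compares the RMSE against the baseline. It is not the Matlab code
used for the experiments. The function names (make_dataset, knn_predict, rmse), the
synthetic random-walk series, the use of only the per-minute high as a feature, and resolving
the vote by averaging the neighbor labels are all assumptions made for the sake of the example.

```python
import numpy as np


def make_dataset(high, lookback, horizon=5):
    """Build features, labels and baseline guesses from per-minute highs.

    features[i]: highs from time t-lookback to t (hypothetical feature choice)
    labels[i]:   highest high in the window t+1 .. t+horizon
    baseline[i]: high at time t, i.e. "the next maximum equals the current one"
    """
    high = np.asarray(high, dtype=float)
    features, labels, baseline = [], [], []
    for t in range(lookback, len(high) - horizon):
        features.append(high[t - lookback:t + 1])
        labels.append(high[t + 1:t + 1 + horizon].max())
        baseline.append(high[t])
    return np.array(features), np.array(labels), np.array(baseline)


def knn_predict(x_train, y_train, x_test, k):
    """Predict each test label as the mean label of its k nearest training nodes."""
    predictions = np.empty(len(x_test))
    for i, node in enumerate(x_test):
        # Euclidean distance from the new node to every training node.
        distances = np.sqrt(((x_train - node) ** 2).sum(axis=1))
        nearest = np.argsort(distances)[:k]
        predictions[i] = y_train[nearest].mean()
    return predictions


def rmse(predicted, actual):
    """Root mean square error between predicted and real values."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic stand-in for one stock's Max column (assumption: not the real data).
rng = np.random.default_rng(0)
high = 100 + np.cumsum(rng.normal(0.0, 1.0, size=600))

features, labels, baseline = make_dataset(high, lookback=8)
split = 400  # earlier rows train the model, later rows test it

baseline_error = rmse(baseline[split:], labels[split:])
errors = {}
for k in (1, 3, 7, 10, 20):
    predicted = knn_predict(features[:split], labels[:split], features[split:], k)
    errors[k] = rmse(predicted, labels[split:])

print("baseline RMSE:", baseline_error)
for k, error in errors.items():
    print("k =", k, "RMSE:", error)
```

On real data, the synthetic series would be replaced by the Max (or Min) column of one stock,
and the remaining columns could be appended to each feature vector in the same way.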

There is an interesting phenomenon in the case of predicting the minimum. The error shows no
obvious variation once K is greater than 10, which may result from a special attribute of the
data itself.

Figure 4: The definition of root mean square error (RMSE).

Figure 5: Feature vector extraction: looking back from time t to time t-k (in minutes).

Figure 6: Labels gathered from the K nearest neighbors as K varies, and the resulting error rate.

5.3. PREDICTING MAXIMUM AND MINIMUM IN NEXT 5 MINUTES

After setting up appropriate feature vectors and selecting a good K (K=7), we run the classifier
to predict the label of new data, which is the maximum or minimum value we are looking for,
and plot the results in Figure 7. The dotted blue line indicates the real answer, the green line
represents the highest price at time t, and the black dotted lines indicate the predicted values
returned by our classifier. We adopt the two rules described below to get rid of unwanted
predictions.

Figure 7: Predicting the maximum of stock ‘A’ in the next 5 minutes, with auxiliary lines.


Figure 8: Prediction result after removing the highest and lowest.
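
The filtering behind Figure 8 can be sketched in Python as follows. This is one possible reading
of the two rules of Section 5.3, not the original Matlab code: it assumes that the rules act on
the labels of the K nearest neighbors, that a label more than 10 away from the current highest
(or lowest) price is reset to that current price, and that exactly one highest and one lowest
label are dropped before averaging. The function name filtered_prediction and its parameters
are hypothetical.

```python
import numpy as np


def filtered_prediction(neighbor_labels, current_value, max_distance=10.0):
    """Combine neighbor labels into one prediction using the two rules.

    neighbor_labels: labels gathered from the K nearest neighbors
    current_value:   highest (or lowest) price of the stock at time t
    max_distance:    threshold of rule 1 (10 in the report)
    """
    labels = np.asarray(neighbor_labels, dtype=float).copy()

    # Rule 1 (assumed reading): a label too far from the current value is
    # reset to the current value. For scalars the Euclidean distance is
    # simply the absolute difference.
    too_far = np.abs(labels - current_value) > max_distance
    labels[too_far] = current_value

    # Rule 2: drop the highest and the lowest label to guard against
    # outliers, then average what is left.
    if labels.size > 2:
        labels = np.sort(labels)[1:-1]
    return float(labels.mean())


# K=7 neighbor labels, one of which (130) is far from the current high of 101.
prediction = filtered_prediction([100, 101, 102, 103, 104, 130, 99], current_value=101)
print(prediction)
```

With fewer than three labels nothing is dropped, so the function still returns a plain average
in that case.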

First, we reset any predicted value whose Euclidean distance from the current highest or
lowest value of the stock exceeds 10 to that current value. After this, we remove the highest
and the lowest value to guard against outliers before averaging the rest. The red line in
Figure 8 indicates our final prediction results.

The error distribution is drawn in Figure 9. More than 90% of our prediction errors fall between
-5 and +5, both when predicting the highest and when predicting the lowest price in the next 5
minutes.

Figure 9: Distribution of error in predicting maximum and minimum data.

In Figure 10, the upper part of the graph indicates the minutes we predicted correctly, and the
lower part indicates those we misclassified. The first series indicates that we predicted the
stock price would go up and it actually went up in the next five minutes. The second indicates
that both our prediction and reality stayed unchanged. The third indicates that we correctly
predicted the stock price would go down. From the lower part of Figure 10 we can tell that
most classification errors come from predicting data nodes that stay unchanged. Owing to a
special attribute of the data, there is usually a section of 5 time nodes around a local maximum
or minimum value. If we can make good use of this property, there is a high possibility that we
can narrow down the error.

Figure 10: Signal of predicting whether the stock price goes up, goes down, or stays unchanged.

6. FUTURE WORK

Due to the limitations of team size and time, we could only deal with one classification method
and try to minimize the prediction error by tuning it. It would be worthwhile to compare these
results with those of other classification methods, to see which method fits this stock price
dataset better.

7. DISCUSSIONS AND CONCLUSION

From this report we learned that the essence of machine learning is to learn the behavior of the
data in the training set and to find an appropriate classifier or regression model to fit the data.
We spent most of our effort on model selection, because we think choosing a proper set of
features that fits the data well is a tougher problem than picking an adequate classification
method. Also, over the considerable time spent on this report, I learned a lot about how to
select the feature vector without overfitting the training data, how to test on data, and how to
evaluate the final results and interpret them by means of the root mean square error. In the
end, we were able to predict unseen data such that 90% of the predictions have less than 5
units of error.

REFERENCES

1. C. Bishop, Pattern Recognition and Machine Learning, 2006.
2. E. F. Fama, The behavior of stock market prices, Journal of Business 38, 34-105, 1965.
3. P. N. Rodriguez, S. S. Rivero, Using machine learning algorithms to find patterns in stock
prices, 2006.
4. R. Tsaih, Y. Hsu, C. C. Lai, Forecasting S&P 500 stock index futures with a hybrid AI system,
Decision Support Systems 23, 161-174, 1998.
5. A. S. Chen, M. Leung, H. Daouk, Application of neural networks to an emerging financial
market: Forecasting and trading the Taiwan stock index, Computers and Operations
Research 30, 901-923, 2003.
