
Bonfring International Journal of Data Mining, Vol. 3, No. 2, June 2013

ISSN 2277 - 5048 | © 2013 Bonfring

Abstract--- Exchange rate prediction has been a challenging topic in recent decades. Various studies have sought to improve prediction accuracy in terms of both level error and directional status error. This paper introduces a methodology that uses KNN (K-nearest neighbors) with DTW (dynamic time warping) to improve fluctuation prediction and to obtain better evaluation parameters than other studies in the financial market forecasting literature. The study uses the USD/JPY (United States Dollar/Japanese Yen) exchange rate time series, and the results show an improvement in predicting the direction of the series. USD/JPY exchange rates were gathered from 1971 to 2012 and partitioned into 30-element segments to reflect the monthly cyclic behavior of the time series. These segments were then divided into two sets with a 7:3 ratio, and KNN was used to find the 3 nearest neighbors, with DTW as the similarity function. Using a function also introduced in this research, the directional status of the last element is predicted, and the prediction result is then compared with other results in the exchange rate prediction literature.

Keywords--- Dynamic Time Warping, Time Series Classification, Exchange Rate Prediction, KNN, USD/JPY

I. INTRODUCTION

THE biggest financial market in the world is the foreign exchange market, with more than 4 trillion dollars of daily turnover [1]. This shows how important prediction in this market is for traders. Prediction in financial markets has been a controversial topic in both the economics and the forecasting literature [2]. Many believe in the random walk theory and the efficient market hypothesis, which says that in a market where all actors have the same access to information, greater profit comes from taking more risk and not from forecasting. Many other researchers, however, believe that forecasting and modeling financial time series are useful to traders in their decision making [3]. The USD/JPY exchange rate market shows random walk characteristics more than other exchange rates in the floating regime and is thus harder to predict [1]. In this study the researchers believe that patterns can be found in financial time series and that these patterns are useful to traders, and they try to forecast the fluctuations of the USD/JPY time series.

Arash Negahdari Kia, PhD Scholar, University of Tehran, Iran. E-mail: arash.nkia@gmail.com

Dr. Saman Haratizadeh, Assistant Professor, Department of Network Science and Technologies, University of Tehran, Iran.

Dr. Hadi Zare, Assistant Professor in the Department of Network Science and Technologies at University of Tehran, Iran.

DOI: 10.9756/BIJDM.4658

In the literature on financial time series forecasting, three points of view can be found. Some studies emphasize the economics literature and analyze charts and series with economic indices such as those used in the technical analysis of stock markets. The second point of view emphasizes statistical approaches such as the ARIMA (Auto-Regressive Integrated Moving Average) methodology for forecasting. The third point of view tries to forecast the series with newer artificial intelligence and data mining techniques [4]. In this paper we use the KNN methodology with DTW as the distance function between our time series patterns to classify the series and predict the fluctuation of a given exchange rate time series.

Efficient nearest neighbor search for multivariate time series has been studied in [5]. Efficient similarity search over future stream time series was studied in another work using KNN and DTW [6]. Another efficient similarity search over multivariate time series, which also used KNN and DTW, was carried out in the same year by Zhou et al. [7]. Online detection and prediction of special patterns over financial data streams was studied with the help of KNN and DTW by Jiang et al. [8]. In 2010, a framework for time-series analysis was presented by Kurbalija et al. [9]. In the same year, Xing and Keogh published a brief survey on sequence classification [10]. Bankó and Abonyi studied correlation based dynamic time warping of multivariate time series in 2012 and used KNN for indexing large time series databases [11]. Ando carried out a study in 2012 to optimize the performance of time-series classification with nearest neighbor density approximation [12]. Nguyen et al. studied time series classification with an ensemble-based positive unlabeled learning methodology in 2012 and used KNN in their study [13]. Many other studies, both in the field of financial time series and in other sciences, use KNN and/or DTW to classify, model, or predict sequences or time series.

In the next section of this paper, the methodology used for prediction of the exchange rate time series is proposed and the KNN and DTW methods are described in brief. Section 3 discusses the data preparation and compilation phase. Section 4 covers the evaluation methodologies in the financial time series literature, and section 5 presents the results. The last section concludes and suggests some topics for future research.

Prediction of USD/JPY Exchange Rate Time Series Directional Status by KNN with Dynamic Time Warping as Distance Function

Arash Negahdari Kia, Dr. Saman Haratizadeh and Dr. Hadi Zare

II. METHODOLOGY

In this section we describe our prediction methodology and give a brief introduction to DTW for finding the distance between two time series. There are two ways of predicting a time series. The first predicts the values of the next elements of the series. The second predicts the next fluctuation, that is, whether the series goes up or down relative to the previous value [1]. We discuss this further in the evaluation section. We use KNN with K = 3 (3NN) to find the three nearest neighbors of the time series that is going to be predicted, and a function of the distances from the time series to its 3 nearest neighbors then determines its class (up or down fluctuation in our case). The distance is calculated with the DTW technique that is discussed later in this section. After finding the 3 nearest neighbors of the time series, we use the function introduced in (1) to estimate the class label. Class label 1 means that the next value of the time series is going to be greater than the previous value, and label 0 means that the next value is going to be less than the previous value. The label thus shows the direction of the fluctuation of the time series.

(1)

Suppose $d_1$, $d_2$, and $d_3$ are the distances of the 3 nearest neighbor time series from the time series that is going to be predicted, and $c_1$, $c_2$, and $c_3$ are their corresponding class labels. Using the inverse of the distances gives the nearest neighbors a greater impact on the result. The whole methodology is presented as a diagram in figure 1, followed by a step-by-step description of the process:

Figure 1: Prediction Process Diagram with KNN and DTW

i. Splitter block: splits the exchange rate time series into equal n-element sequences, each with class label 1 or 0 according to whether the next value after the last element of the sequence is greater or less than that last element.

ii. Partitioning block: divides the sequences into a test set and a training set. In our case, the test set is 20 percent of all the samples, as used in the work of Yao and Tan [1]. The class label of the last sequence in the test set can be used as a prediction of the fluctuation of the exchange rate in our method and presents a buying or selling signal to traders.

iii. DTW block: finds the distance between every test set instance and the training set instances with the dynamic time warping algorithm.

iv. KNN block: finds the K nearest neighbors of a sample in the test set and passes them to the prediction block.

v. Prediction block: uses the function presented in (1) to predict the class label (the next direction of the series in our case).
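The KNN and prediction blocks can be illustrated with a short Python sketch. It shows one plausible inverse-distance-weighted vote and is not necessarily the function (1) used in this paper: the 0.5 threshold, the `eps` guard against zero distances, and the names `predict_direction` and `k_nearest` are assumptions made for the example.

```python
def predict_direction(distances, labels, eps=1e-12):
    """Inverse-distance-weighted vote over the nearest neighbours.

    distances: distances to the k nearest training sequences.
    labels:    their class labels (1 = up, 0 = down).
    Returns 1 if the weighted share of "up" neighbours exceeds 0.5, else 0.
    The threshold and eps are illustrative assumptions.
    """
    weights = [1.0 / (d + eps) for d in distances]
    up_share = sum(w * c for w, c in zip(weights, labels)) / sum(weights)
    return 1 if up_share > 0.5 else 0


def k_nearest(query, train_sequences, train_labels, distance, k=3):
    """Return (distances, labels) of the k training sequences closest to query."""
    scored = sorted(
        (distance(query, seq), label)
        for seq, label in zip(train_sequences, train_labels)
    )
    top = scored[:k]
    return [d for d, _ in top], [c for _, c in top]
```

With distances 1, 2, and 4 and labels 1, 0, and 0, the single closest neighbor carries a weight of 1 against a combined 0.75 for the other two, so this sketch predicts an upward move even though two of the three neighbors are labeled 0.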

Dynamic Time Warping

A brief description of the DTW used in our procedure is presented here. Dynamic time warping is a method for finding a more accurate distance measure between time series than the simple Euclidean distance, which compares the elements of the time series and sequences one by one. The improvement arises because different time series are often not aligned in time [14]. The algorithm is presented in figure 2.


Figure 2: DTW Algorithm in Partitioned Exchange Rate Time Series
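For readers who want something executable, the textbook dynamic-programming form of DTW can be sketched in Python as follows. This is a minimal sketch, not necessarily the exact variant shown in figure 2: the absolute difference as local cost, the absence of a warping-window constraint, and the name `dtw_distance` are assumptions.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Uses the absolute difference as the local cost and no warping-window
    constraint; both choices are assumptions made for this example.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because one element may be matched to several elements of the other sequence, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero, whereas an element-by-element comparison is not even defined for sequences of different lengths.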

III. DATA COMPILATION

The data set comes from the Federal Reserve economic data of the economic research division of the Federal Reserve Bank of St. Louis [15]. We used USD/JPY exchange rate data from Dec 4th of 1971 to Dec 24th of 2012, comprising 15331 date-exchange rate records. The data was divided into 30-element time series groups, which produced 500 time series, each with a class label derived from the next element (the 31st element, or the first element of the next sequence).

Figure 3 shows the exchange rate time series whose fluctuations are predicted in our research. If the 31st element was higher than the 30th element, the class label was 1, otherwise 0. This label shows the direction of fluctuation and is what we tried to predict in the test set. We used 30-element sequences because of the monthly cyclic behavior of the exchange rate time series.

Figure 3: USD/JPY Exchange Rate Time Series from Dec 4th of 1971 to Dec 24th of 2012
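The segmentation and labeling described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the windows are taken as consecutive and non-overlapping, a trailing window with no following value is dropped, and the name `make_labeled_sequences` is hypothetical.

```python
def make_labeled_sequences(rates, length=30):
    """Cut a rate series into consecutive non-overlapping windows.

    Each window gets label 1 if the value right after it (the first
    element of the next window) is higher than its last element, else 0.
    A trailing window with no following value is dropped.
    """
    sequences, labels = [], []
    for start in range(0, len(rates) - length, length):
        window = rates[start:start + length]
        next_value = rates[start + length]
        sequences.append(window)
        labels.append(1 if next_value > window[-1] else 0)
    return sequences, labels
```

For example, `make_labeled_sequences([1, 2, 3, 4, 5, 4, 3], length=3)` yields the windows `[1, 2, 3]` and `[4, 5, 4]` with labels 1 and 0, since 4 is above 3 and 3 is below 4.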

The data before 1971 was of no use to our research because of the Smithsonian agreement, which adjusted the fixed exchange rates set at the Bretton Woods conference in 1944. The USD/JPY became a floating exchange rate and the Japanese Yen was no longer pegged to the US Dollar, so prediction of this time series became reasonable after 1971 [16].

Some data was missing from the data set for holidays. MA3 (the moving average of the 3 preceding days), a well-known technical analysis index, was used to fill in the missing data for those days. In the end, the compiled data set of 500 sequences of 30 elements, each with one class label, was divided into a training set of 400 sequences and a test set of 100 sequences by 7:3 ratio. The 7:3 ratio came from the 7:2:1 ratio of training, validation, and test sets that Yao and Tan used in 2000 to train their multilayer perceptron neural network [1]. Another reason for the 7:3 ratio is that we compare our results with the study of Yao and Tan, which has been widely cited in the exchange rate prediction literature [1].
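The MA3 gap filling mentioned above might look like the following Python sketch. It assumes that missing days are represented as `None`, that already filled values may feed later averages, and that gaps among the first three entries are left untouched; the name `fill_missing_with_ma3` is hypothetical.

```python
def fill_missing_with_ma3(values):
    """Replace None entries with the mean of the three preceding values.

    Previously filled values are reused, so runs of consecutive gaps are
    handled. Gaps among the first three entries are left as None because
    three predecessors are not available.
    """
    filled = list(values)
    for i, v in enumerate(filled):
        if v is None and i >= 3 and None not in filled[i - 3:i]:
            filled[i] = sum(filled[i - 3:i]) / 3.0
    return filled
```

For instance, `fill_missing_with_ma3([1.0, 2.0, 3.0, None, 5.0])` replaces the gap with 2.0, the mean of the three preceding rates.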

IV. EVALUATION METHODS

The three evaluation methodologies most used in financial time series prediction are described in this section. The first is a class of mean square error evaluation parameters, which capture the error in terms of level. That is, the mean square error shows how close the prediction is to the real future value of the time series. The RMSE (Root Mean Square Error) is presented in (2), where $x_t$ is the real future value of the time series and $\hat{x}_t$ is the predicted value.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(x_t - \hat{x}_t\right)^2} \qquad (2)$$
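As a quick illustration of the level-error measure, the standard RMSE can be computed with a few lines of Python. The function name `rmse` and the input checks are choices made for this sketch.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between real and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(x - p) ** 2 for x, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

A prediction that is off by 3 on one sample and by 4 on another gives `rmse([0, 0], [3, 4])`, the square root of 12.5, regardless of whether the predicted moves point in the right direction.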

The second methodology looks at the problem in a more practical way. It assumes that traders come to trade with a prediction method and a fixed amount of money, so the evaluation should calculate how much profit or loss the traders will have after trading for a specific period of time. This method has the advantage of a practical point of view and the disadvantage of depending on two other parameters: the money the traders start with and the period of time over which the trade is simulated [4].

The third evaluation method measures what percentage of predicted values are in the correct direction. That is, if the real future value is greater than the last value, the predicted value should also be greater than the last value, and if the real future value is less than the last value, the predicted value should also be less than the last value. This is shown in (3). This evaluation method is the most important one for a trader. The evaluation parameter is called directional status or directional success. It is also called gradient% in Yao and Tan's study [1]. Assume the present value of the exchange rate is 5 and the real future value is 6. If one predictor predicts the value 4 for the future and another predicts 100, the first predictor is better in terms of the MSE evaluation parameter but the second is better in terms of directional success or gradient. The first predictor signals the trader to sell and hence results in a loss. This simple example shows why directional success is a more practical evaluation parameter than MSE.

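$$\text{Directional success} = \frac{100}{N}\sum_{t=1}^{N} a_t, \qquad a_t = \begin{cases} 1 & \text{if } (x_{t+1} - x_t)(\hat{x}_{t+1} - x_t) > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

Same as before, $x_t$ is the original value of the time series, $\hat{x}_t$ is the predicted value, $N$ is the number of samples, and $a_t$ shows whether the direction is predicted correctly or not.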
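The worked example above (present value 5, real future value 6, predictions 4 and 100) can be checked with a small Python sketch of the directional measure. The function name `directional_success` and the strict inequality used for a hit are assumptions of this sketch.

```python
def directional_success(last_values, actual_next, predicted_next):
    """Percentage of samples whose predicted move has the real move's sign."""
    hits = sum(
        1
        for last, real, pred in zip(last_values, actual_next, predicted_next)
        if (real - last) * (pred - last) > 0
    )
    return 100.0 * hits / len(last_values)


# Present value 5, real future value 6, two competing predictions.
score_first = directional_success([5], [6], [4])     # predicts a fall
score_second = directional_success([5], [6], [100])  # predicts a rise
squared_error_first = (4 - 6) ** 2
squared_error_second = (100 - 6) ** 2
```

The first predictor has the smaller squared error (4 against 8836) but a directional success of 0%, while the second has a directional success of 100%, which matches the argument in the text.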

V. RESULTS AND DISCUSSION

After implementing the process described in the methodology section, we obtained the results presented in table 1, where the result of our methodology is compared with the work of Yao and Tan [1]. Directional success rises from 46.59% to 54%, an increase of about 7.4 percentage points. This shows that the KNN-DTW method achieves more success in directional status prediction than multilayer perceptron modeling in the prediction of the USD/JPY exchange rate time series.

Table 1: Comparing the KNN-DTW Model with the Yao and Tan ANN-MLP (Artificial Neural Network – Multilayer Perceptron) Model in the Directional Status Evaluation Parameter

| Time Series | Model | Normal Mean Error in Test Set | Directional Success |
|---|---|---|---|
| USD/JPY [1] | MLP with 5-4-1 Structure (3 Layers) | 1.966195 | 46.59% |
| JPY/USD (Yao J., Tan C.L.; 2000) | MLP with 6-4-1 Structure (3 Layers) | 1.242099 | 46.59% |
| USD/JPY | 3NN-DTW Model | - | 54% |

The diagram in figure 4 shows, for each instance in the test set, whether the prediction of direction was correct or not. The upper band shows the errors in prediction, while the lower band shows the correct predictions.

Figure 4: Error and Correctness in Prediction of Direction in the Test Set

As mentioned before, the Yen-Dollar market (the USD/JPY exchange rate market) follows random walk properties more than other exchange rate markets because it is a more efficient market and information flows more efficiently in it [1]. This makes the Yen-Dollar market more challenging for prediction models, so an improvement of about 7.4 percentage points in directional status prediction could be useful in practice to traders.

VI. CONCLUSION AND FURTHER RESEARCH

In this study, we found that using KNN with dynamic time warping as the distance function in a specific procedure improves directional status prediction results compared with previous studies. Using a data set of 15331 USD/JPY exchange rate records, we built 500 sequences of 30 exchange rates each and partitioned this data into a 30% test set and a 70% training set. Specifically, the results showed an improvement of about 7.4 percentage points in directional prediction (54% against 46.59%) over the study of Yao and Tan [1], which is one of the most cited studies in the field of financial prediction using newer artificial intelligence and data mining techniques.

For further studies, researchers may use other customized DTW algorithms to improve the distance function. Other exchange rates such as GBP/USD or EUR/USD can also be tested to see how much improvement this method achieves in directional prediction. The function applied after the KNN process can also differ from the one we presented in (1), and the resulting changes in the evaluation results can be analyzed.


REFERENCES

[1] J. Yao, Ch. L. Tan, "A case study on using neural networks to perform technical forecasting of forex.", Neurocomputing, vol. 34, no. 1, pp. 79-98, 2000.

[2] Y. S. Abu-Mostafa, A. F. Atiya, "Introduction to financial forecasting.", Applied Intelligence, vol. 6, no. 3, pp. 205-213, 1996.

[3] A. Timmermann, C. W. J. Granger, "Efficient market hypothesis and forecasting.", International Journal of Forecasting, vol. 20, no. 1, pp. 15-27, 2004.

[4] M. Fathian, A. N. Kia, "Exchange rate prediction with multilayer perceptron neural network using gold price as external factor.", Management Science, vol. 2, 2012.

[5] K. Yang, C. Shahabi, "An efficient k-nearest neighbor search for multivariate time series.", Information and Computation, vol. 205, no. 1, pp. 65-98, 2007.

[6] X. Lian, L. Chen, "Efficient similarity search over future stream time series.", Knowledge and Data Engineering, IEEE Transactions on, vol. 20, no. 1, pp. 40-54, 2008.

[7] D. Zhou, M. Li, H. Yan, "An Efficient Similarity Search For Financial Multivariate Time Series.", Wireless Communications, Networking and Mobile Computing, WiCOM'08. 4th International Conference on. IEEE, 2008.

[8] T. Jiang, Y. Feng, B. Zhang, "Online detecting and predicting special patterns over financial data streams.", Journal of Universal Computer Science, vol. 15, no. 13, pp. 2566-2585, 2009.

[9] V. Kurbalija, et al., "A framework for time-series analysis.", Artificial Intelligence: Methodology, Systems, and Applications. Springer Berlin Heidelberg, pp. 42-51, 2010.

[10] Z. Xing, J. Pei, E. Keogh, "A brief survey on sequence classification.", ACM SIGKDD Explorations Newsletter, vol. 12, no. 1, pp. 40-48, 2010.

[11] Z. Bankó, J. Abonyi, "Correlation based dynamic time warping of multivariate time series.", Expert Systems with Applications, 2012.

[12] Sh. Ando, "Performance-Optimizing Classification of Time-series based on Nearest Neighbor Density Approximation.", Data Mining Workshops (ICDMW), 12th International Conference on, IEEE, 2012.

[13] M. N. Nguyen, X. L. Li, S. K. Ng, "Ensemble based positive unlabeled learning for time series classification.", Database Systems for Advanced Applications, Springer Berlin Heidelberg, 2012.

[14] E. Keogh, Ch. A. Ratanamahatana, "Exact indexing of dynamic time warping.", Knowledge and Information Systems, vol. 7, no. 3, pp. 358-386, 2005.

[15] [online], access date: 2013-01-15, URL: http://research.stlouisfed.org/fred2/categories/94

[16] T. Murano, "International Currency Realignment and the Yen.", The Developing Economies, vol. 10, no. 4, pp. 340-358, 1972.

Arash Negahdari Kia received his B.S. in Computer Science from the Department of Mathematics, Faculty of Science, University of Tehran, where he studied from 2001 to 2005. He received his M.Sc. in IT engineering in e-commerce from the School of Industrial Engineering, Iran University of Science and Technology, where he studied from 2006 to 2009. He is presently pursuing his Ph.D. in IT engineering in the area of Data Mining and financial prediction under the guidance of Dr. Saman Haratizadeh, Associate Professor, Department of Network Science and Technologies, University of Tehran. He was awarded first place among postgraduate students in his M.Sc. program. His Master's thesis was about prediction of foreign exchange rate time series with a hybrid model of artificial neural networks and ARIMA using gold price as an external factor.

Dr. Saman Haratizadeh received his doctorate degree in Artificial Intelligence, Software Engineering from Sharif University of Technology. He is currently working as Assistant Professor, Department of Network Science and Technologies, University of Tehran. His research interests, courses, and papers are in the fields of Data Mining, Machine Learning, Decision Support Systems, Bio-Informatics, and the development of intelligent tools for information processing and automatic decision making.

Dr. Hadi Zare received the Ph.D. degree in Applied Mathematics from the Amirkabir University of Technology, Tehran, Iran in 2012. He received an M.Sc. in Mathematical Statistics with high distinction from the Amirkabir University of Technology in 2008. He is currently an Assistant Professor in the Department of Network Science and Technologies at University of Tehran. His research interests are in Statistical Machine Learning and include feature selection, dimensionality reduction methods, and probabilistic models for complex networks.
