
ABSTRACT

The foundation for efficient financial management, sound monetary policies, and long-term strategic decision-making
globally is an accurate estimate of the exchange rate (ER). Economic diversification is made possible by a stable and
competitive ER. To forecast the trends and data that affect the ER's growth, economists, researchers, and investors have
carried out a number of studies. This study compares the effectiveness of Long Short-Term Memory (LSTM) and Support
Vector Machine (SVM) models in predicting future exchange rate values. By contrasting the two models' evaluation scores, it identifies the better-performing machine learning algorithm. According to the findings, the LSTM neural network model outperforms the SVM model in terms of mean squared error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). This indicates that the LSTM model predicts Ghana's exchange rate better.

KEYWORDS: Long short-term memory, support vector machine, recurrent neural network, prediction, exchange rate, mean squared error, mean absolute percentage error
1. INTRODUCTION

The price at which one country’s currency is exchanged for another country’s is known as the exchange rate. The most
significant relative price in the financial sector is the exchange rate (Rusydi and Islam, 2007). The importance of the
exchange rate stems from its role in ensuring international trade in goods and services, determining the volume of imports
and exports, setting domestic prices, and maintaining economic balance (Okonkwo et al., 2017). The determination of the
various exchange rate regimes and policies that Ghana has undergone to achieve currency stabilization remains uncertain,
owing to the volatile nature of the country’s exchange rate. There is still a lack of consensus among policymakers
regarding the key determinants of exchange rate fluctuations, and researchers have not yet reached a definitive
conclusion regarding its precise effect on Ghana’s global trade (Yussif et al., 2022). Contemporary literature pertaining to
the determination and prediction of exchange rates remains consistent with postulations established several decades ago. Several empirical findings suggest that the exchange rate adheres to a random walk model, and fluctuations
in exchange rate are indeterminate. Additionally, currencies of nations with high inflation rates tend to devalue over
extended periods, with the extent of depreciation being roughly equivalent to the inflation differential. The fluctuations
observed in the current exchange rate exhibit occasional instances of overshooting, subsequently leading to a gradual
realignment towards the equilibrium (Zou et al., 2017). In the financial markets, modeling and forecasting a financial
variable is an extremely important activity that can be of great benefit to a variety of different participants, including
practitioners, regulators, and policy makers. In addition, forecasting is an essential part of both the financial and
managerial decision-making processes (Majhi et al., 2009). Therefore, making a valid forecast of a variable in the financial
markets is of the utmost significance, particularly in a nation like Ghana. Technical analysis, often aided by machine learning, is the process of making informed predictions about the future (Chhajer et al., 2022).
Machine learning is a branch of artificial intelligence that deals with developing and validating algorithms using data
(Chhajer et al., 2022). Many business processes are being automated; using mathematical models, computers execute internet trading to make quick decisions, creating markets where short-term volatility and sell-offs replace long-term forecasts (Chhajer et al., 2022). The most popular algorithms for analyzing and forecasting stock market movements are artificial neural networks (ANN) and SVM. Using tick data, these systems offer up to 99.9% accuracy (Selvanuthu, 2019). The characteristics of financial
forecasting include being data-intensive, noisy, non-stationary, unstructured, and having hidden linkages (Kshirsagar,
2018). In this work, two algorithms are discussed, namely Long Short-Term Memory (LSTM) and the Support Vector Machine (SVM).

Accurate exchange rate forecasting is crucial for businesses, investors, and policymakers in the world of financial markets.
Finding accurate exchange rate prediction models is especially important in the context of Ghana’s economy, where
foreign exchange is crucial for trade and financial stability. Support Vector Machines (SVM) and Long Short-Term Memory
(LSTM) networks have become effective time series forecasting tools with the development of deep learning techniques.
In the end, the results of this comparative study will offer insightful information about how well SVM and LSTM algorithms
can forecast Ghana’s exchange rates. This research aims to assist in the selection of appropriate forecasting
methodologies in the context of Ghana’s dynamic economic landscape by highlighting the advantages and disadvantages
of each model. The results might have broader repercussions for other developing nations dealing with comparable
difficulties in managing exchange rate fluctuations.

The remainder of the paper is organized as follows: Section 2 provides an overview of relevant literature. Section 3 reviews the methods used in this research and describes the dataset and the experimental setting. Section 4 presents the findings of the analyses, and Section 5 discusses the results. Section 6, the final section, highlights the conclusion and the suggestions based on the findings.

2. LITERATURE REVIEW

In applications like image captioning, voice conversion, and natural language processing, recurrent neural networks (RNNs), a deep learning technique, have demonstrated a significant ability to capture the hidden correlations present in large datasets, and they perform well on sequence-related problems (Zhang et al., 2017). The original RNN, however, suffers from the vanishing gradient problem, since later nodes' perception of earlier nodes decreases. Long short-term memory (LSTM) networks (Hochreiter and Schmidhuber, 1997) were proposed as an enhanced network architecture to address this issue. LSTM networks outperform traditional RNNs in extracting time series features across a wider time range. As an illustration, Bruin et al. (2016) suggested a method based on an LSTM network that achieves good diagnosis and prediction performance in cases of complex procedures, hybrid faults, and significant noise. Based on widely used measurement signals, Yuan et al. (2016) advocated the use of the LSTM network to achieve fast fault detection and identification; the results demonstrated that the LSTM network was more effective than a convolutional network at identifying and locating faults in railway track circuits. Additionally, Zhao et al. (2017), who introduced a novel traffic forecast model built on the LSTM network, demonstrated its predictive power. According to Chen et al. (2021), the LSTM outperforms the autoregressive integrated
moving average, support vector regression, and adaptive network fuzzy inference system in terms of predictive
performance. Additionally, it has been claimed that LSTM algorithms surpass cutting-edge methods in terms of accuracy
and noise tolerance for time-series classification (Karim et al., 2019). In general, the LSTM network is an enhanced RNN
that handles longer time series better (Zhang et al., 2017). Machine learning relies heavily on LSTMs because most RNNs cannot overcome short-term memory limitations: it is difficult for them to carry information from distant past steps forward to future ones. Standard recurrent neural networks, for instance, retain only a limited amount of historical information when processing a long data sequence.

The Support Vector Machine is a powerful machine learning model that may be applied to classification or regression problems. The SVM algorithm, which is most frequently used for classification problems, performs best in high-dimensional settings or when the number of dimensions exceeds the number of samples (Chhajer et al., 2021). This algorithm acts as a tool
for risk management and offers investors abnormal profits on their investments (Henrique et al., 2018). For tasks involving
the classification of data into two groups, classification algorithms are employed. Support Vector Machines (SVM) are
capable of classifying new data points when provided with a set of labeled training data that belongs to distinct categories
(Vishwanathan and Murty, 2002). SVMs excel particularly when dealing with limited data, and their primary strength lies
in effectively classifying linearly separable data, a task that comes naturally to them. Additionally, when working with
substantial datasets, SVMs often outperform Artificial Neural Networks (ANN) both in terms of speed and efficiency
(Vishwanathan and Murty, 2002). SVM models are specifically designed to establish an optimal boundary line, effectively
dividing n-dimensional spaces into distinct groups. This strategic approach simplifies the process of categorizing new data
in the future (Vishwanathan and Murty, 2002). SVMs can be roughly categorized into linear SVMs, which use a single straight line (or hyperplane) to separate data that are linearly separable into two groups, and non-linear SVMs, which handle data that cannot be separated by a single line. SVM's benefits include the fact that it works best when there is a distinct margin of separation, that it is highly effective in high-dimensional settings, and that it performs well when the number of dimensions exceeds the number of samples. SVM uses a subset of training points known as support vectors and is memory-efficient. A drawback of SVM is that, due to longer training periods, processing huge amounts of data takes longer
(Chhajer et al., 2021). Like the general structure of Artificial Neural Networks, the Support Vector Machine (SVM) is a learning technique that may be applied to evaluate and map both linear and non-linear functions and to solve problems such as pattern recognition and prediction. A hyperplane, or a collection of hyperplanes, is constructed in a high-dimensional space and can then be used for categorization. Polynomial, Radial Basis Function (RBF), and MLP classifiers can all be learned using SVM. SVM is based on the risk minimization concept and complements regularization theory. SVM relies heavily on kernel functions and mathematical programming approaches to operate, and it is used for both classification and regression (Patil et al., 2016).

Numerous forecasting methods have been thoroughly studied in the financial sector. These methods include a variety of
machine learning techniques, such as support vector machines, neural networks, and cutting-edge methods like deep
learning. Unfortunately, there aren’t many thorough survey studies that address these approaches in depth. Notable
authors of thorough reviews in this area are Cavalcante et al. (2016), Bahrammirzaee (2010), and Saad and Wunsch
(1998). The most recent of these, by Cavalcante et al. (2016), focused mostly on stock market-specific methodologies, with some discussion of their applicability to foreign exchange markets.

3. DATA AND METHODS


3.1 DATA

The data is secondary data on the USD/GHS (US Dollar to Ghanaian Cedi) exchange rate obtained from investing.com. The dataset covers the dates, daily prices, open prices, high prices, low prices, volumes, and percentage changes of Ghana's exchange rate; however, only the dates and daily prices are used in this analysis, so the other variables are ignored. The data spans the period from 14th January 2010 to 15th June 2023. Because the series is volatile over this period, preprocessing before further analysis is advised.

The data was cleaned during the pre-processing stage; the date and price variables contain no missing values, indicating the completeness of the data. Feature scaling was then performed using a range (min-max) transformation so that all price values lie between 0 and 1.
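
As a minimal sketch of this preprocessing step, the range transformation can be performed with scikit-learn's MinMaxScaler; the file name usd_ghs.csv and the column names Date and Price are illustrative assumptions about the investing.com export, not the exact file used.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load the USD/GHS series; the file and column names are assumed here.
df = pd.read_csv("usd_ghs.csv", parse_dates=["Date"])
df = df[["Date", "Price"]].sort_values("Date").reset_index(drop=True)

# Range (min-max) transformation: rescale the daily prices into [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
df["Price_scaled"] = scaler.fit_transform(df[["Price"]]).ravel()
```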

In order to evaluate the model's performance on unseen data, it is essential to divide the data into train and test sets. Following normalization, the scaled data is divided into train and test sets using the ratios 70:30, 80:20, and 90:10; in the 80:20 split, for example, 80 percent of the data constitutes the train set and 20 percent the test set. It is important to split the data in an orderly manner, preserving the original chronological sequence of the observations.
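
A minimal sketch of this ordered split, assuming the scaled series produced in the preceding step, is shown below; the same helper covers the 70:30, 80:20, and 90:10 ratios by varying the training fraction.

```python
import numpy as np

def chronological_split(series: np.ndarray, train_fraction: float):
    """Split a series into train/test sets without shuffling,
    preserving the original temporal order of the observations."""
    split_point = int(len(series) * train_fraction)
    return series[:split_point], series[split_point:]

prices = df["Price_scaled"].to_numpy()   # scaled series from the previous sketch
train_70, test_30 = chronological_split(prices, 0.70)
train_80, test_20 = chronological_split(prices, 0.80)
train_90, test_10 = chronological_split(prices, 0.90)
```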

3.2 METHODS

This section covers the approach this study employed to accomplish its objectives. It presents the theoretical background of Long Short-Term Memory (LSTM) and the Support Vector Machine (SVM).

3.2.1 LONG-SHORT TERM MEMORY (LSTM)

Hochreiter and Schmidhuber first presented Long Short-Term Memory networks (LSTM) in 1997 as a remedy for the vanishing gradient problem of Recurrent Neural Networks (RNNs), and numerous improvements have since been made by other scholars. Their 1997 paper, "Long Short-Term Memory," was the first to describe these networks. An everyday illustration is reading a manuscript: in an effort to remember the crucial facts, we discard the unimportant ones. RNNs struggle with long-term dependencies, but LSTM units contain features that solve this issue. Long short-term memory (LSTM), one of the many types of recurrent neural network (RNN), is able to record information from earlier stages and use it to forecast the future (Patterson and Gibson, 2017). LSTM, built around a "memory line" (the cell state), has been found very helpful in forecasting with long time series, because a plain RNN cannot maintain long-term memory. LSTMs, with memory regulated by their gates, can retain information from earlier stages.

The fundamental unit is known as the memory cell, and the cell state $C_t$ is the activation of the memory cell. The information from earlier inputs is stored in this variable. The input gate unit controls the flow of input data into the memory cell and determines what information is necessary for accurate prediction. "The output gate unit $o_t$ learns to safeguard other units against disruption by presently irrelevant memory contents stored in the memory cell. The gates acquire the ability to grant and deny access to continuous error flow." The set of equations below describes the model. The input gate first selects the values that will be updated, closing the irrelevant ones (activation close to zero):

\[ i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \tag{3.4.1} \]

The square brackets in Equation 3.4.1 denote the concatenation of the vectors.

The activation of the memory cell input $\tilde{c}_t$ is then computed; these values are the potential additions to the memory and to further propagation to other time steps.

\[ g(x) = \frac{4}{1 + e^{-x}} - 2 \tag{3.4.2} \]

The activation function $g$ is the centered logistic function with range $[-2, 2]$.

\[ \tilde{c}_t = g\left(W_C\,[h_{t-1}, x_t] + b_C\right) \tag{3.4.3} \]

The candidate values are then fed into the cell state, regulated by the input gate:

\[ C_t = C_{t-1} + i_t \circ \tilde{C}_t \tag{3.4.4} \]

The hidden state is then computed as a filtered version of the cell state, using the output gate as a filter:

\[ o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \tag{3.4.5} \]

\[ f(x) = \frac{2}{1 + e^{-x}} - 1 \tag{3.4.6} \]

The activation function $f$ of the cell state is again a centered logistic function, but this time with range $[-1, 1]$.

\[ h_t = o_t \circ f(C_t) \tag{3.4.7} \]
When fed a continuous input stream, the memory cell values may grow and cause saturation of the activation function $f$ in Equation 3.4.7. This makes the derivative of the activation function vanish, so the hidden state $h_t$ outputs only the activation of the output gate. This amounts to a complete loss of the memory, and the model then reduces to a simple RNN. Because of this, the forget gate was introduced in 1999 (22). The gate learns to reset the cell state by assigning a number between 0 and 1 to each value in it:

\[ f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \tag{3.4.8} \]

Thus Equation 3.4.4 is adjusted to Equation 3.4.9:

\[ C_t = f_t \circ C_{t-1} + i_t \circ \tilde{C}_t \tag{3.4.9} \]

Instead of the functions $f$ and $g$ (the centered logistic functions), most up-to-date sources use the hyperbolic tangent function. This is also the case in TensorFlow, the software library used for implementation in this study; hence the model used from here on is summarized in Equation 3.4.10 and visualized in Figure 3.1.

\[
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tilde{C}_t \\
h_t &= o_t \circ \tanh(C_t)
\end{aligned}
\tag{3.4.10}
\]
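
Since TensorFlow is the library used for implementation, the recurrence in Equation 3.4.10 can be sketched as a small Keras network. This is a minimal sketch only: the 60-day window, the 50 units, and the training settings are illustrative assumptions rather than the exact configuration of this study.

```python
import numpy as np
import tensorflow as tf

def make_windows(series: np.ndarray, window: int = 60):
    """Turn a 1-D scaled price series into (samples, window, 1) inputs
    and next-day targets for supervised LSTM training."""
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i])
        y.append(series[i])
    return np.array(X).reshape(-1, window, 1), np.array(y)

X_train, y_train = make_windows(train_80)   # scaled training split from the data section

# One LSTM layer (tanh cell activation and sigmoid gates, as in Eq. 3.4.10, by default)
# followed by a dense layer that outputs the next scaled exchange-rate value.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1], 1)),
    tf.keras.layers.LSTM(50),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
```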

3.2.2 SUPPORT VECTOR MACHINE

Cortes and Vapnik (1995) introduced the support vector regressor as a minor modification that uses the same principles as the support vector classifier (24). The SVM's basic operating principle is to perform linear regression inside the feature space after nonlinearly transforming the input data into a high-dimensional feature space using a kernel function (25). SVM algorithms can be characterized by the use of kernels, the absence of local minima, and the number of support vectors, and SVM regression is regarded as a nonparametric technique because it relies on kernel functions (26). The linear epsilon-insensitive SVM (ε-SVM) regression is used in this investigation. The aim of ε-SVM is to find a function $\hat{f}_{SVM}(x_n)$ that deviates from $y_n$ by no more than ε and is also as smooth as possible. The regression function is represented by the following:

\[ \hat{f}_{SVM}(X) = \omega \cdot \varphi(X) + s \tag{3.5.1} \]

Here $s$ is a scalar threshold, $\omega$ is the weight vector, and $\varphi(X)$ denotes a nonlinear mapping function that transforms the input space $X$ into a high-dimensional feature space. The SVM model performs linear regression in the high-dimensional feature space using the ε-insensitive loss. The coefficients $\omega$ and $s$ can then be estimated by minimising the regularised risk function:
\[
J(\omega) = \frac{\lVert \omega \rVert^{2}}{2}
\qquad
\text{s.t. } \forall n:\;
\begin{cases}
y_n - \varphi(\omega, X_n) - s \le \varepsilon \\
\varphi(\omega, X_n) + s - y_n \le \varepsilon
\end{cases}
\tag{3.5.2}
\]

It is likely that a function $f(X)$ perfectly satisfying Eq. 3.5.2 for all points does not exist. Therefore, slack variables $\zeta_n$ and $\zeta_n^{*}$ are introduced for each point to deal with otherwise infeasible constraints. These slack variables allow regression errors up to the values of $\zeta_n$ and $\zeta_n^{*}$ while still meeting the required conditions. The objective function is then described as (27):
\[
J(\omega) = \frac{\lVert \omega \rVert^{2}}{2} + C \sum_{n=1}^{N} \left(\zeta_n + \zeta_n^{*}\right)
\qquad
\text{s.t. } \forall n:\;
\begin{cases}
y_n - \varphi(\omega, X_n) - s \le \varepsilon + \zeta_n \\
\varphi(\omega, X_n) + s - y_n \le \varepsilon + \zeta_n^{*} \\
\zeta_n \ge 0 \\
\zeta_n^{*} \ge 0
\end{cases}
\tag{3.5.3}
\]

where the box constraint, the positive constant $C$, denotes the degree of penalty applied to samples with an error greater than ε. The optimization problem is solved through its dual. The dual formula is constructed from the primal function by introducing, for each observation $X_n$, the non-negative Lagrangian multipliers $\alpha_n$ and $\alpha_n^{*}$, and the dual formula is then minimised:

\[
L(\alpha) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\,G(x_i, x_j)
+ \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^{*})
- \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^{*})
\tag{3.5.4}
\]

\[
\text{s.t. } \forall n:\;
\begin{cases}
\sum_{n=1}^{N} (\alpha_n - \alpha_n^{*}) = 0 \\
0 \le \alpha_n \le C \\
0 \le \alpha_n^{*} \le C
\end{cases}
\]

where the Gaussian kernel function

\[ G(x_i, x_j) = \exp\left(-\lVert x_i - x_j \rVert^{2}\right) \tag{3.5.5} \]

is used as the kernel function of SVM. The SVM model obtained by minimising Eq. 3.5.4 is then given as:
\[ \hat{f}(x) = \sum_{n=1}^{N} (\alpha_n - \alpha_n^{*})\,G(x_n, x) + s \tag{3.5.6} \]

The sequential minimal optimisation (SMO) approach (28) is then used to determine the appropriate parameters (e.g. C and ε) of the SVM, after which the SVM model can be finalised.
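
As a hedged sketch, an ε-SVR with the Gaussian (RBF) kernel of Equation 3.5.5 can be set up with scikit-learn, whose solver handles the dual optimization internally; the lag-feature construction and the values of C and ε below are illustrative assumptions, not the exact settings of this study.

```python
import numpy as np
from sklearn.svm import SVR

def make_lag_features(series: np.ndarray, window: int = 60):
    """Build lagged inputs so that the regressor predicts the next scaled
    price from the previous `window` observations."""
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i])
        y.append(series[i])
    return np.array(X), np.array(y)

X_svr, y_svr = make_lag_features(train_80)   # scaled training split from the data section

# epsilon-insensitive SVR with a Gaussian (RBF) kernel; C and epsilon are assumed values.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale")
svr.fit(X_svr, y_svr)
```
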
The SVM and LSTM models will be trained and assessed using the appropriate training and testing subsets for each train-
test ratio. We will compute and compare the evaluation metrics across various ratios, including Mean Absolute Error
(MAE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE). The train-test ratios will reveal how
different data sizes affect how well the models perform. The outcomes will shed light on the trade-off between having
more extensive training sets and being able to test the model on a wider variety of data points.
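
The evaluation protocol described above can be sketched as a loop over the three ratios (shown here for the SVR only; the LSTM model would be evaluated the same way). The helpers from the earlier sketches are assumed, and scikit-learn's mean_absolute_percentage_error (available in recent versions) returns a fraction, so it is multiplied by 100 to report a percentage.

```python
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error)
from sklearn.svm import SVR

results = {}
for train_fraction in (0.70, 0.80, 0.90):
    # Chronological split and lag features from the earlier sketches.
    train, test = chronological_split(prices, train_fraction)
    X_tr, y_tr = make_lag_features(train)
    X_te, y_te = make_lag_features(test)

    model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X_tr, y_tr)
    y_hat = model.predict(X_te)

    results[train_fraction] = {
        "MSE": mean_squared_error(y_te, y_hat),
        "MAE": mean_absolute_error(y_te, y_hat),
        "MAPE": 100 * mean_absolute_percentage_error(y_te, y_hat),
    }
```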

MEAN SQUARED ERROR: It calculates the mean of the squared differences between actual values and predicted values. The model's performance in terms of prediction errors can be measured using MSE in a straightforward and understandable manner.

\[ MSE = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^{2} \]

MEAN ABSOLUTE ERROR: A common metric for assessing the precision of predictions in regression tasks is the mean absolute error (MAE). It calculates the mean absolute difference between actual (ground truth) values and predicted values.

\[ MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \]

MEAN ABSOLUTE PERCENTAGE ERROR: The average percentage difference between predicted and actual values is measured by MAPE.

\[ MAPE = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% \]
Given a dataset of N data points:

 $y_i$ represents the actual (true) value of the target variable for the ith data point.

 $\hat{y}_i$ represents the predicted value of the target variable for the ith data point.
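
For completeness, the three metrics can also be written directly from the definitions above; this is a minimal NumPy sketch that assumes no actual value $y_i$ is zero when computing MAPE.

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Mean of the squared prediction errors.
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Mean of the absolute prediction errors.
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Mean absolute percentage error, expressed in percent.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)
```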

4. Results

The descriptive analysis for the data is presented in Table 4.1; according to the table, the total number of observations is 3493. Figure 4.1 shows the time series plot of the exchange rate over the years. The x-axis represents the years (2010 to 2023) and the y-axis represents the prices (in Ghana cedis). The plot suggests that the exchange rate series is not stationary: it visually depicts the changes in the exchange rate over time, indicating an overall increase, and it also exhibits extreme volatility from around 2019 to the beginning of 2023, which is probably due to the COVID-19 pandemic and the Russia-Ukraine war.

Count               3493
Minimum value       1.410000
Maximum value       14.750000
Mean                4.320948
25th percentile     1.991500
50th percentile     4.030000
75th percentile     5.630000
Standard deviation  2.568892
Table 4.1 Descriptive analysis for the exchange rates (GHS per USD)

Figure 4.1 Time Series Plot of Daily USD/CEDI Exchange Rate


MSE MAE MAPE
70- 0.247035741730747 0.142741035035032 1.95352461074261
30 97 47 65
SPLI
T
80- 0.303376072685102 0.236749751819184 2.88969636215434
20 64 15 68
SPLI
T
90- 0.409068576335846 0.437610206478520 4.11953243529922
10 3 4 4
SPIL
T
Table 4.2 LSTM model performance

Split         MSE                    MAE                    MAPE
70-30 SPLIT   19.02159209803364      3.394947962031437      0.40758086647665137
80-20 SPLIT   26.347663713722692     4.133356049670566      0.4425608643486667
90-10 SPLIT   48.62719163847371      6.583292877000633      0.5995700001616454
Table 4.3 SVM model performance

From Table 4.2, the LSTM model exhibits strong predictive accuracy when trained on a 70:30 split of the dataset, achieving a low Mean Squared Error (MSE) of 0.247, a Mean Absolute Error (MAE) of 0.143, and a low Mean Absolute Percentage Error (MAPE) of 1.954. When the dataset split increases to 80:20, the model's performance remains strong, with slightly higher error metrics but still a low MAPE, indicating consistent predictive capability. However, with a 90:10 split, the model's performance declines as it struggles to generalize when evaluated on the smaller test set, resulting in higher MSE, MAE, and MAPE values, which is expected given the limited exposure to diverse patterns in the test set.

From Table 4.3, the SVM model, although useful, shows a different pattern. Under the 70:30 split it produces an MSE of 19.022, an MAE of 3.395, and a MAPE of 0.409. Compared to the LSTM model, these measurements show a certain level of accuracy but larger MSE and MAE errors.

For the 80:20 split, the MSE, MAE, and MAPE values of the SVM model rise to 26.348, 4.133, and 0.443, continuing the same trend of growing errors. The SVM model performs worst in the 90:10 split, with an MSE of 48.627, an MAE of 6.583, and a MAPE of 0.600. Like the LSTM model, the SVM model's measured ability to generalize declines when the testing dataset is minimal.
