Intraday volume percentages forecasting using a dynamic SVM-based approach

Article  in  Journal of Systems Science and Complexity · December 2016


DOI: 10.1007/s11424-016-5020-9


J Syst Sci Complex (2017) 30: 421–433

Intraday Volume Percentages Forecasting Using a Dynamic SVM-Based Approach
LIU Xiaotao · LAI Kin Keung

DOI: 10.1007/s11424-016-5020-9
Received: 22 January 2015 / Revised: 2 August 2015
© The Editorial Office of JSSC & Springer-Verlag Berlin Heidelberg 2017

Abstract This paper proposes a dynamic model to forecast intraday volume percentages by
decomposing the trade volume into two parts: the average part, which represents the intraday
volume pattern, and the residual term, which represents the abnormal changes. An empirical test
on half a year of data for gold futures and S&P 500 futures reveals that a rolling average of the
previous days' volume percentages has strong predictive ability for the average part. An SVM
approach whose input pattern consists of two categories is employed to forecast the residual
term: one is the previous days' volume percentages in the same time interval and the other is
the most recent volume percentages. The study shows that this dynamic SVM-based forecasting
approach outperforms other commonly used statistical methods and substantially improves the
tracking performance of a VWAP strategy.
Keywords Intraday volume percentages, principal component decomposition, SVM, VWAP.

1 Introduction
The last few decades have witnessed the rapid development of electronic trade execution
systems, known as algorithmic trading or simply algo-trading. More and more investors are using
computer-based algorithms to submit orders, with little human judgment or intervention
involved. These algorithms aim to minimize market impact and improve order execution using
operational strategies such as pattern recognition. The advent of algorithmic trading has had a
huge influence on the financial industry; the total amount of automatically executed orders has
increased dramatically. Chordia, et al.[1] showed that the widespread use of algorithmic trading
is the key determinant of the rise in the number of orders and the decline in the average volume
of trades.
LIU Xiaotao
Department of Finance, Central China Normal University, Wuhan 430070, China.
Email : xtliu2-c@my.cityu.edu.hk.
LAI Kin Keung (Corresponding author)
International Business School, Shaanxi Normal University, Xi’an 716099, China.
Department of Management Sciences, City University of Hong Kong, Hong Kong, China.
Email : mskklai@cityu.edu.hk.
 This paper was recommended for publication by Editor ZHANG Xun.
Splitting large orders into smaller ones to obtain better execution prices and less market
impact has become a major concern for institutional traders. The volume weighted average price
(VWAP) strategy is a typical example of splitting orders to reduce market impact. The goal of
this strategy is to execute small pieces of an order during the trading period so that the
execution price matches, or comes as close as possible to, the VWAP. The accuracy of intraday
volume percentage forecasts is crucial to achieving this goal. Satish, et al.[2] listed other
examples to emphasize the importance of intraday volume forecasting. They classified the
forecasting into two categories: forecasting of intraday raw volume and forecasting of intraday
volume percentages. Besides improving the performance of algorithmic trading, raw volume
prediction is useful for a range of algorithms such as market participation models, in which the
price may be badly affected by the large orders placed. Forecasting of intraday volume
percentages, in turn, contributes directly to the VWAP strategy (see [3]). Here, we focus on
forecasting intraday volume percentages.
Due to the boom in electronic trading, ultra-high-frequency data have become accessible and
are widely used. A large number of publications over the last three decades have investigated
intraday volumes. The majority of them belong to one of two categories. The first group has
examined the relationship between trading volume and price or return volatility, bid-ask spread
and liquidity (see [4–8]). The second group has investigated how market activities or new
information affect trading volume, which is then used to explain a particular intraday shape
(e.g., U-shape, V-shape, J-shape) in trading volume (see [9–13]). Despite the importance of
intraday volume forecasting mentioned above, studies on this topic were few until 2008.
Bialkowski, et al.[14] decomposed the trading volume according to [15] and [16] into two parts:
An average term representing the changes coming from market evolutions and a deviation term
accounting for the opening and closure of arbitrage positions. ARMA(1,1) and SETAR are
used for estimating the latter part separately, based on forty stocks of the CAC40 index with
20-minute intervals. On average, SETAR reduces the mean absolute percentage error (MAPE)
for volume forecasting by 16.91% with the reduction of tracking error for VWAP strategy by
7%.
Alvim, et al.[17] forecast the intraday volume using the Bovespa data set with 15-min.
intraday volumes. The data set contains 3 top high liquidity stocks and 6 low liquidity stocks.
A dynamic volume model with support vector regression (SVR) and partial least squares (PLS)
is used. The predictors (SVR and PLS) based model outperforms the original average model
greatly by reducing the average error percentage by 17.44% at the beginning and 45.66% at the
end of the day.
Brownlees, et al.[18] built a dynamic model for intraday volume prediction using 5 years of data
at 15-minute intervals for three liquid Exchange Traded Funds (ETFs): SPY, DIA and QQQ.
They decomposed intraday trade volume into three parts: a daily component, an intraday
periodic component and an intraday non-periodic component. Based on the combination of the
Component Multiplicative Error Model (CMEM) and the Generalized Method of Moments (GMM),
they reduced the mean square error (MSE) on volume by 12.7% and the VWAP tracking error by
6.5% relative to the original historical rolling means.


Orchel[19] predicted the volume percentages with two SVMs, ε-SVR[20] and δ-SVR[21], using
half a year of data for the NASDAQ-100 index, and showed that SVR performs better if a priori
knowledge of prices can be obtained. Humphery-Jenner[22] also decomposed volumes into two
parts, a historical part and a deviation term, to predict volume percentages with a dynamic
VWAP approach. Using a specific linear regression model, the approach achieves a lower volume
percentage error on half a year of data for 197 stocks in the ASX 200 at 5-minute intervals.
Yan and Li[23] constructed an ARMA-EGARCH model to forecast the intraday volume using 59 days
of data for the Shanghai securities composite index at 1-minute intervals. They decomposed
the intraday volume into a periodic trend part and a deviation part. The mean absolute
percentage error (MAPE) for trade volume was reduced by 6.2% compared to the ARMA model.
Satish, et al.[2] predicted raw volume and volume percentages separately. The raw volume
prediction is a weighted combination of three components: the historical average volume, the
inter-day part (an ARMA estimate based on the volume in the same bin on previous days) and the
intraday component (an ARMA estimate based on the seasonally adjusted intraday bin volumes).
Since volume percentage forecasting does not require predictions for all the remaining intraday
intervals, they excluded the intraday component and combined the new approach with the one
proposed in [22]. The model was shown to reduce the percentage prediction error by 9.1%.
The literature mentioned above suggests that decomposing the trade volume into two
components for volume forecasting is reasonable. Decomposition methods for forecasting
intraday data can also be found in other areas. The CAPM is a typical example of a
decomposition used for forecasting in various areas. Moreover, Shen and Huang[24] applied
principal component analysis to intraday call arrival volumes. Andersen, et al.[25] used a
decomposition method to predict intraday volatility in the Japanese stock market. Sévi[26]
applied it to forecast the volatility of crude oil futures. Smith, et al.[27] forecast the
intraday electricity load by decomposing the distribution of a continuous process into two
parts. Chanda, et al.[28] forecast the intraday returns of US equities by decomposing the
volatility into multiplicative components. The decomposition method was also employed in [29]
to forecast the intraday VaR and intraday returns.
ARMA and other linear regression models are simple and popular approaches for modeling the
abnormal changes. However, these methods assume that the deviation terms are linearly related
to their own lags or to other chosen variables. Moreover, it is hard to specify the variables
on which the deviation depends. These two limitations make the predictions of such models
nonrobust. In this paper, we concentrate on forecasting the unexpected changes with a support
vector machine (SVM). As a machine learning approach, SVM has some advantages over traditional
neural networks. By implementing the structural risk minimization (SRM) principle[30], SVM
minimizes the training error while controlling model capacity, which helps avoid overfitting.
Neural networks, in contrast, are based on empirical risk minimization (ERM), which considers
only the training error, so overfitting is a critical issue for them. Further, SVM training
always reaches a global and unique minimum, while neural networks may end in a local optimum.
SVM is therefore used in this study as a more efficient and accurate nonlinear approach to
predict the abnormal part. To our knowledge, this has not been done in the previous literature.
It is well known that the kernel function and the parameters are the two main factors that
influence the performance of SVM. In this study, we choose the (Gaussian) radial basis function
(RBF) since, among the four basic kernels, RBF is relatively more effective than the
others[31]. To choose the best parameters for SVM, we use grid search[32] and a genetic
algorithm (GA). Grid search was shown to be more efficient than GA search for continuous
functions, while GA was more robust for high-dimensional problems with noise[33]. Both are
sensitive to the given initial values. We evaluate the performance of the constructed dynamic
SVM-based approach in an out-of-sample forecasting test on 6 months of gold futures data and
6 months of S&P 500 futures data. With the historical average as the benchmark, the approach
reduces the mean square error by almost 14% at best. Moreover, the tracking error of a VWAP
strategy is reduced by up to 40%.
The remainder of the paper is organized as follows. Section 2 provides a brief introduction to
SVM for regression estimation. Section 3 gives the details of the prediction process. Section 4
describes the data set, presents the empirical results of volume percentage prediction, and
evaluates the tracking performance of a VWAP strategy based on the volume percentage forecasts.
Section 5 concludes.

2 Support Vector Machine (SVM) for Regression


This section introduces the properties of Support Vector Machines (SVM) for estimating
regression functions. Based on statistical learning theory, Vapnik and his colleagues put
forward SVM in the 1990s[34–36] for data classification, regression estimation and signal
processing. As mentioned before, SVM implements the structural risk minimization principle,
which minimizes the sum of the training error and a confidence interval. The following
description of the ideas of SVM is based on [37].
Given a training set of $N$ data points $(x_i, y_i)$ ($i = 1, 2, \cdots, N$, $x_i \in X \subseteq
\mathbb{R}^I$, $y_i \in Y \subseteq \mathbb{R}$), $x = (x_1, x_2, \cdots, x_N)$ are the input
patterns corresponding to the output values $y = (y_1, y_2, \cdots, y_N)$. SVM aims to
approximate the underlying function by constructing the following form:

f (x) = ω · ϕ(x) + b, (1)

where ϕ(x) maps the input patterns x from its space to a new high-dimensional feature space.
A special and simple example is ϕ(x) = x, then function f (x) is linear.
Vapnik[30] introduced ε-SV regression, which tolerates errors up to $\varepsilon$ and keeps the
function $f(x)$ as flat as possible so as to control its capacity. The regularized risk function
can thus be described as:

$$\min R(\omega, b) = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{N} D_\varepsilon\big(y_i, f(x_i)\big), \tag{2}$$

where

$$D_\varepsilon\big(y_i, f(x_i)\big) = \begin{cases} 0, & |y_i - f(x_i)| < \varepsilon, \\ |y_i - f(x_i)| - \varepsilon, & \text{otherwise.} \end{cases}$$

$D_\varepsilon(y_i, f(x_i))$ is the ε-insensitive loss function, under which deviations smaller
than ε are ignored. The first term minimizes the norm of $\omega$, making $f(x)$ as flat as
possible. The second term measures the errors exceeding ε. The parameter $C$ is the penalty
factor used to balance the training error and the complexity of the model.
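As an illustration, the ε-insensitive loss and the regularized risk in (2) can be evaluated directly for a linear function $f(x) = \omega \cdot x + b$. The following Python sketch is not the authors' code; the function names and the toy data are assumptions chosen only to show that deviations inside the ε-tube contribute nothing to the risk.

```python
def eps_insensitive_loss(y, f, eps):
    """Return 0 inside the eps-tube, otherwise the excess |y - f| - eps."""
    return max(0.0, abs(y - f) - eps)


def regularized_risk(w, b, xs, ys, C, eps):
    """0.5 * ||w||^2 + C * sum of eps-insensitive losses for f(x) = w.x + b."""
    def f(x):
        return sum(wk * xk for wk, xk in zip(w, x)) + b

    flatness = 0.5 * sum(wk * wk for wk in w)
    penalty = C * sum(eps_insensitive_loss(y, f(x), eps) for x, y in zip(xs, ys))
    return flatness + penalty


# Toy data (made up for illustration): three one-dimensional points.
xs = [[0.0], [1.0], [2.0]]
ys = [0.0, 1.05, 2.5]
risk = regularized_risk(w=[1.0], b=0.0, xs=xs, ys=ys, C=10.0, eps=0.1)
print(risk)
```

In this toy case only the third point lies outside the tube (its error is 0.5), so the risk is 0.5 + 10 × 0.4 = 4.5 up to floating-point rounding.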
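To achieve a feasible solution of the problem, the dual problem is obtained as:

$$\begin{aligned}
\max_{\alpha, \alpha^*} \quad & \sum_{i=1}^{N} y_i(\alpha_i - \alpha_i^*) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) \\
\text{s.t.} \quad & \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0, \\
& \alpha_i, \alpha_i^* \in [0, C], \quad i = 1, 2, \cdots, N.
\end{aligned} \tag{3}$$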

$K(x_i, x_j)$ is the so-called kernel function, the dot product of two vectors in the feature
space, $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$. Vapnik[30] showed that any function
satisfying Mercer's condition[38] can be used as a kernel function. Schölkopf, et al.[39]
demonstrated some simple rules for constructing kernels. The basic kernel functions include:
Linear: $K(x_i, x_j) = x_i \cdot x_j$.
Polynomial: $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$, where $d > 0$.
Gaussian radial basis function (RBF): $K(x_i, x_j) = \exp(-\lambda \|x_i - x_j\|^2)$, $\lambda > 0$; usually $\lambda = \frac{1}{2\sigma^2}$.
Sigmoid: $K(x_i, x_j) = \tanh(\lambda\, x_i \cdot x_j + c)$.
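The four basic kernels listed above are short enough to write out. The sketch below uses plain Python lists as vectors; the function names and the default parameter values are assumptions made for illustration and are not taken from the paper.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, d=2):
    return (dot(u, v) + 1.0) ** d


def rbf_kernel(u, v, lam=0.5):
    """exp(-lam * ||u - v||^2); lam = 1 / (2 * sigma^2) gives the Gaussian form."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-lam * sq_dist)


def sigmoid_kernel(u, v, lam=1.0, c=0.0):
    return math.tanh(lam * dot(u, v) + c)


u, v = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(u, v), polynomial_kernel(u, v), rbf_kernel(u, v), sigmoid_kernel(u, v))
```

Note that the RBF kernel of a vector with itself is always 1 and decays towards 0 as the two inputs move apart, which is what makes it a similarity measure.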
From the Karush-Kuhn-Tucker (KKT) conditions it follows that when $|y_i - f(x_i)| < \varepsilon$,
both $\alpha_i$ and $\alpha_i^*$ must be zero to satisfy the first two KKT conditions. Only data
points with errors of at least ε can have nonzero $\alpha_i$ or $\alpha_i^*$, and these alone
determine the decision function $f(x)$. Such training data points are called support vectors and
typically account for a small proportion of the data. This is the sparsity property of SVM.
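To make the role of the support vectors explicit, it may help to recall the standard support vector expansion given in the tutorial cited as [37]. It is not written out in the text above, so it is added here only as a reminder; it expresses the weight vector and the decision function through the dual variables of problem (3):

```latex
\omega = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\,\varphi(x_i),
\qquad
f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\,K(x_i, x) + b.
```

Every training point with $\alpha_i = \alpha_i^* = 0$ drops out of both sums, so the prediction depends only on the support vectors and on kernel evaluations, never on $\varphi$ explicitly.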
3 Volume Percentage Forecasting


To explain the process of volume percentage forecasting, we introduce the following notation.
There are $M$ days in total, and $m \in \{1, 2, \cdots, M\}$ denotes a given trading day. The
daily volumes on these trading days are denoted by $V = (V_1, V_2, \cdots, V_M)$. The trading
period is the same on each day, has length $T$ and is divided into $n$ intervals, so the length
of every interval is $\tau = T/n$. Using $i \in \{1, 2, \cdots, n\}$ to index the intervals
within a day, we denote the trade volume in the $i$th interval of day $m$ by $v_{i,m}$. The
volume percentages on day $m$ are then $w_m = (w_{1,m}, w_{2,m}, \cdots, w_{n,m})$, calculated
as $w_{i,m} = v_{i,m} / V_m$ $(i = 1, 2, \cdots, n)$.
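As a small worked example of this notation, the volume percentages of one day are simply the interval volumes divided by the daily total. The function name and the numbers below are assumptions made for illustration.

```python
def volume_percentages(interval_volumes):
    """Divide each interval volume by the daily total volume."""
    total = sum(interval_volumes)
    if total == 0:
        raise ValueError("daily volume is zero; percentages are undefined")
    return [v / total for v in interval_volumes]


# Hypothetical day with n = 4 intervals and a daily volume of 400 contracts.
day_volumes = [120, 80, 50, 150]
w = volume_percentages(day_volumes)
print(w)
```

By construction the percentages of a day sum to one, which is why a percentage forecast, unlike a raw volume forecast, does not need an estimate of the daily total.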
Recall that volume can be decomposed into two parts: the changes coming from market evolution
and a deviation term accounting for the opening and closure of arbitrage positions.
Accordingly, we use an average term and a residual term to form the volume percentage:

$$w_{i,m} = \frac{1}{L} \sum_{j=m-L}^{m-1} w_{i,j} + r_{i,m}. \tag{4}$$

The rolling average of the previous $L$ days' trade volume percentages in the same interval
represents the changes coming from market activities. After the market activities are removed
from the actually executed volume, the residuals $r_{i,m}$ capture the abnormal changes.
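The decomposition can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: days and intervals are 0-based here, the window is taken to be exactly the $L$ days before day $m$, and the data are made up.

```python
def rolling_average(w, i, m, L):
    """Average of w[j][i] over the L days preceding day m (0-based indices)."""
    if m < L:
        raise ValueError("need at least L previous days")
    return sum(w[j][i] for j in range(m - L, m)) / L


def residual(w, i, m, L):
    """Abnormal part: actual percentage minus the rolling average."""
    return w[m][i] - rolling_average(w, i, m, L)


# Hypothetical percentages for 4 days with n = 3 intervals each (rows sum to 1).
w = [
    [0.50, 0.30, 0.20],
    [0.40, 0.35, 0.25],
    [0.45, 0.25, 0.30],
    [0.55, 0.30, 0.15],
]
avg = rolling_average(w, i=0, m=3, L=3)
r = residual(w, i=0, m=3, L=3)
print(avg, r)
```

For the first interval of the last day the average part is about 0.45 and the residual about 0.10; the residual is the quantity the SVM is later asked to predict.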
In this part we concentrate on predicting the abnormal changes (residuals) using SVM. Data
points with input patterns and the corresponding output labels need to be specified and
constructed before training. The previous days' volume percentages in the same time interval
are used as one part of the input pattern, since we believe that they affect the current day's
volume percentage in the same time interval to some extent. Figure 1 depicts the
autocorrelation and partial autocorrelation test for the “time series” constructed from the
residual terms of the same time interval on each day. Clearly, they are autocorrelated.
Figure 1 Autocorrelation and partial autocorrelation test for the time series
constructed by residual terms in each day of the same time interval (two panels: sample
autocorrelation function and sample partial autocorrelation function, each plotted against
lags 0 to 20)

Since new information usually has a great effect on price, it is natural to ask whether volumes
in the preceding time intervals affect volumes in the current interval. If so, these volumes
should be included in the input patterns. The abnormal changes usually come from unexpected
shocks to the market, which are reflected in the most recent data, and the effects of shocks
usually last for short periods of time. Therefore, it is reasonable to add the recent volume
percentages to the input elements. Figure 2 shows the autocorrelation and partial
autocorrelation test for the residual volume percentage series of a random day, which further
supports the view that recent volumes influence volumes in the following period.
Figure 2 The autocorrelation and partial autocorrelation test for the residual volume
percentage series for a random day (two panels: sample autocorrelation function and sample
partial autocorrelation function, each plotted against lags 0 to 20)

The data points $(x_{i,m}, y_{i,m})$ for SVM training can now be constructed. In particular,
$x_{i,m} = (r_{i,m-1}, \cdots, r_{i,m-l_1}, r_{i-1,m}, \cdots, r_{i-l_2,m})$, and the
corresponding output is the volume percentage residual of time interval $i$ of day $m$, that
is, $y_{i,m} = r_{i,m}$. The first part of the input means that the previous $l_1$ days' volume
percentage residuals of the same interval are taken into account for training; this is called
the “historical” part of the input. Similarly, the latter part means that the volume percentage
residuals of the previous $l_2$ time intervals are included in the input; this is called the
“recent” part of the input. Early in the day, when there are fewer than $l_2$ intervals before
interval $i$ of day $m$ (that is, $i \le l_2$), we go back to the previous day and use the
residuals of its last intervals to fill the inputs, so that the input variables remain
consistent. To sum up, the input patterns are

$$x_{i,m} = \begin{cases}
(r_{i,m-1}, \cdots, r_{i,m-l_1}, r_{i-1,m}, \cdots, r_{1,m}, r_{n,m-1}, \cdots, r_{n+i-l_2,m-1}), & i \le l_2, \\
(r_{i,m-1}, \cdots, r_{i,m-l_1}, r_{i-1,m}, \cdots, r_{i-l_2,m}), & \text{otherwise.}
\end{cases}$$

To simplify the expression of the input patterns, we use a single subscript
$t = n \times (m - 1) + i$ to represent the double subscript $(i, m)$. Accordingly, the input
patterns can be represented as

$$x_{i,m} = (\underbrace{r_{i,m-1}, \cdots, r_{i,m-l_1}}_{\text{historical part}}, \underbrace{r_{t-1}, \cdots, r_{t-l_2}}_{\text{recent part}}). \tag{5}$$
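The construction of one input pattern can be sketched as follows. The code is an illustration, not the authors' implementation, and rests on several assumptions: indices are 0-based, the historical part holds exactly $l_1$ values and the recent part exactly $l_2$ values, and the residuals are stored as a list of days. Flattening the days into one series is the code analogue of the single subscript $t$, and it handles the early-in-the-day case automatically because index $t - k$ runs back into the previous day.

```python
def input_pattern(r, i, m, l1, l2):
    """Build the historical + recent input for interval i of day m (0-based)."""
    n = len(r[0])                  # intervals per day
    t = n * m + i                  # 0-based analogue of t = n(m - 1) + i
    if m < l1 or t < l2:
        raise ValueError("not enough history for the requested lags")
    flat = [x for day in r for x in day]
    historical = [r[m - k][i] for k in range(1, l1 + 1)]   # same interval, earlier days
    recent = [flat[t - k] for k in range(1, l2 + 1)]       # preceding intervals
    return historical + recent


# Hypothetical residuals, coded as 10 * day + interval so positions are easy to read.
r = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
x = input_pattern(r, i=1, m=2, l1=2, l2=3)
y = r[2][1]                        # training target for this pattern
print(x, y)
```

With these toy values the pattern is `[11, 1, 20, 12, 11]`: two historical values from interval 1 of the two earlier days, then the three most recent values, two of which come from the end of the previous day.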

Given the preference for the RBF kernel noted earlier, we do not consider the other kernels.
Apart from the kernel, the parameters are the other important factor for SVM. Grid search[32]
and a genetic algorithm (GA) are used to select the parameters with the smallest generalization
errors. Mean square error (MSE) is used to measure the prediction performance and is
calculated as follows:

$$\mathrm{MSE}_m = \frac{1}{n} \sum_{i=1}^{n} (w_{i,m} - \hat{w}_{i,m})^2, \tag{6}$$

where $\hat{w}_{i,m}$ is the predicted volume percentage using SVM, $w_{i,m}$ is the real volume
percentage and $\mathrm{MSE}_m$ is the MSE on day $m$. The MSE of the traditional approach,
which uses the historical average as the prediction, serves as the benchmark.
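A minimal sketch of the daily MSE and of a relative improvement over the benchmark is given below. The paper does not spell out how its improvement percentages are computed, so the formula (benchmark MSE minus model MSE, divided by benchmark MSE) is an assumption, as are the function names and the toy numbers.

```python
def mse(actual, predicted):
    """Mean square error between actual and predicted volume percentages."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def improvement_pct(benchmark_mse, model_mse):
    """Assumed definition: relative MSE reduction against the benchmark, in percent."""
    return (benchmark_mse - model_mse) / benchmark_mse * 100.0


# Hypothetical one-day example with n = 3 intervals.
actual = [0.30, 0.20, 0.50]
benchmark = [0.25, 0.25, 0.50]     # e.g. rolling-average forecast
model = [0.28, 0.22, 0.50]         # e.g. rolling average plus predicted residual
gain = improvement_pct(mse(actual, benchmark), mse(actual, model))
print(gain)
```

A positive value means the model beats the benchmark; a negative value, as for some of the statistical models reported later, means it does worse.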

4 Empirical Results and Analysis


4.1 Data Description
We use 6 months of one-minute intraday data for gold futures and 6 months of one-minute
intraday data for S&P 500 futures (COMEX), downloaded from Bloomberg, to analyze the volume
percentage prediction performance. The gold futures sample runs from December 5th, 2012 to
June 19th, 2013. The trading period runs from 22:00:00 (GMT) on the previous day to 21:15:00
(GMT), 1395 minutes in total. Weekends and holidays have no trades and are excluded, so 130
days in total are covered. The S&P 500 sample runs from June 17th, 2013 to December 13th, 2013.
As with gold futures, weekends and holidays are excluded, which leaves 128 days with a trading
period from 22:00:00 (GMT) to 13:15:00 on the next day and from 20:30:00 to 21:30:00, which
makes 960 minutes. The trading period is set as a segment during matching. We divide the
trading period of each day into 5-minute slices, 279 for gold futures and 192 for S&P 500
futures (see Table 1).

Table 1 Data information for gold and S&P 500 futures

Futures    Sample period              Sample days    Time intervals
Gold       12-5-2012 to 6-19-2013     130            279
S&P 500    6-17-2013 to 12-13-2013    128            192

Dividing the 5-minute volumes by the daily trade volume, we get the corresponding volume
percentages. Figure 3 shows the volume percentages for 21 consecutive days (about one month).
It is clear that an intraday volume pattern exists and it repeats almost every day.
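The binning step described in this subsection, summing one-minute volumes into 5-minute slices, can be sketched as below. The function name and the constant dummy data are assumptions; only the slice length and the minute counts come from the text.

```python
def aggregate_bins(minute_volumes, bin_minutes=5):
    """Sum consecutive one-minute volumes into fixed-length bins."""
    if len(minute_volumes) % bin_minutes != 0:
        raise ValueError("trading period is not a whole number of bins")
    return [
        sum(minute_volumes[k:k + bin_minutes])
        for k in range(0, len(minute_volumes), bin_minutes)
    ]


# Dummy day with one contract traded every minute of a 1395-minute gold session.
gold_day = [1] * 1395
gold_bins = aggregate_bins(gold_day)
print(len(gold_bins))              # number of 5-minute slices
```

A 1395-minute session yields 279 slices and a 960-minute session yields 192, matching the interval counts in Table 1.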
Figure 3 5-Minute volume percentages in a consecutive month (about 21 days)

The rolling average over one month is used to represent this intraday volume pattern.
Comparing the intraday volume percentages with the intraday volume pattern (Figure 4), we
observe that the historical average smooths the original volume percentages well and captures
most of the changes, which shows that the one-month rolling average is a good representative
of the intraday volume pattern. This is examined further in the next section. The two series
(volume percentages and the intraday volume pattern) are not exactly the same, however, owing
to abnormal market shocks. To isolate this effect, we extract the difference between the two
series as the residual terms shown in Figure 5.
Figure 4 Real volume percentages and the historical average volume percentages of a random
day (series: average term for market activity, volume percentage; horizontal axis: interval i)

Figure 5 Residual terms extracted by deducting the average terms from the real volume
percentages of a random day (series: residual term for abnormal activity; horizontal axis:
5-minute intervals)

4.2 Analysis of the Volume Percentages Forecasting Results


The traditional method of using the rolling average to predict the volume percentage is
employed as the benchmark. Intuitively, the benchmark considers only the intraday pattern.
The lag of the historical part of the input ($l_1$) is set to 5 days (one week). Linking the
residual terms of the intraday volume percentages into a “time series”, we fit an ARMA model
and choose the optimal lag length by the Bayesian information criterion (BIC), which gives 3
for gold futures and 5 for S&P 500 futures. Thus the lag of the recent part of the input
($l_2$) is 3 for gold futures and 5 for S&P 500 futures. As a result, the input patterns
consist of five variables for the historical part and three or five variables for the recent
part ($l_1 = 5$, $l_2 = 3$ or $5$). We use almost 3 months (60 days) as the training set and 70
days as the testing set. The result is an SVM with a two-part input applied to decomposed
volumes.
We compare its performance with ARMA, ARMA-EGARCH and SETAR using the mean square error
(MSE); the results are shown in Table 2 for gold futures and in Table 3 for S&P 500 futures.
In the tables, SVM-GS and SVM-GA denote the SVMs whose parameters are selected by grid search
and by the genetic algorithm (GA), respectively. Without Principal Component Decomposition
(PCD) of the intraday volume, the rolling average and ARMA are each used to forecast volume
percentages directly. The rolling average clearly has the higher forecast accuracy, indicating
that the historical average is a good representative of the changes resulting from market
activities (the intraday volume percentage pattern), since it captures most of the intraday
volume changes. With decomposition, ARMA, ARMA-EGARCH, SETAR and SVM are employed to predict
the deviation terms, while the rolling average represents the intraday pattern. On average, the
models predict better after volume decomposition. This means that decomposing volume into two
parts can enhance prediction performance and that the two parts have different
characteristics. In addition, the SVM approach, with either GS or GA used to search the
parameters, outperforms the others by a wide margin. This suggests that the average part
changes linearly and can be predicted by the historical average, while the abnormal part
changes nonlinearly and can be forecast by the SVM.

Table 2 MSEs of the benchmark method and the SVM-based approaches for gold futures

               No PCD               PCD
               Average    ARMA      ARMA     EGARCH    SETAR    SVM-GS    SVM-GA
MSE*1E4        0.191      0.207     0.197    0.190     0.192    0.164     0.174
Improve (%)    0          −8.70     −3.39    3.73      −8.11    13.99     8.65

Table 3 MSEs of the benchmark method and the SVM-based approaches for S&P 500 futures

               No PCD               PCD
               Average    ARMA      ARMA     EGARCH    SETAR    SVM-GS    SVM-GA
MSE*1E4        1.625      1.747     1.624    1.628     1.606    1.459     1.439
Improve (%)    0          −7.55     0.06     −0.22     1.18     10.22     11.45

4.3 VWAP Strategy Evaluation


To evaluate the performance of the VWAP strategy in tracking the market price, we use the
MAPE (mean absolute percentage error) stated in [3]. MAPE is measured in basis points (bp) and
calculated as follows:

$$\mathrm{MAPE} = \frac{|\mathrm{VWAP} - \mathrm{EVWAP}|}{\mathrm{VWAP}} \times 10000. \tag{7}$$

VWAP in the equation is the daily volume weighted average price and EVWAP is the executed VWAP
obtained by following the VWAP strategy[3] with the forecast volume percentages.
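The tracking measure can be illustrated with a small Python sketch. It is not the strategy of [3]: it assumes, purely for illustration, that each interval has a single price, that the order is sliced in proportion to the forecast percentages, and that every slice is filled at its interval price. All names and numbers are hypothetical.

```python
def vwap(prices, volumes):
    """Market VWAP: interval prices weighted by realized interval volumes."""
    return sum(p * v for p, v in zip(prices, volumes)) / sum(volumes)


def executed_vwap(prices, forecast_weights):
    """VWAP of an order sliced in proportion to the forecast percentages."""
    total = sum(forecast_weights)
    return sum(p * w for p, w in zip(prices, forecast_weights)) / total


def tracking_error_bp(market_vwap, exec_vwap):
    """Absolute relative deviation from the market VWAP, in basis points."""
    return abs(market_vwap - exec_vwap) / market_vwap * 10000.0


# Hypothetical day with three intervals.
prices = [100.0, 101.0, 102.0]
volumes = [200, 100, 100]              # realized percentages: 0.50, 0.25, 0.25
forecast = [0.4, 0.3, 0.3]             # forecast percentages
market = vwap(prices, volumes)
executed = executed_vwap(prices, forecast)
error_bp = tracking_error_bp(market, executed)
print(market, executed, error_bp)
```

Here the market VWAP is 100.75 and the executed VWAP about 100.9, a tracking error of roughly 15 bp; if the forecast percentages equalled the realized ones, the error would be zero, which is why better percentage forecasts translate into better tracking.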
We then choose the volume percentage forecasting approaches with the smallest forecasting
errors to evaluate the VWAP strategy. Grid search is quite time-consuming, so GA is used for
S&P 500 futures, which have a higher input dimension. In particular, SVM-D2 with parameters
optimized by GS (14% reduction of MSE) is used on gold futures and SVM-D2 with parameters
optimized by GA (11.4% reduction of MSE) on S&P 500 futures. Table 4 shows that the tracking
performance of the VWAP strategy using the forecast volume percentages is improved by 40% on
gold futures and 23% on S&P 500 futures.

Table 4 MAPEs for the VWAP strategies

               Gold futures           S&P 500
               Average    SVM-GS      Average    SVM-GA
MAPE (bp)      8.504      5.104       3.808      2.918
Improve (%)    0          40.00       0          23.37

5 Conclusion
In this paper, we propose a dynamic model for forecasting intraday volume percentages by
decomposing the trade volume percentages into two parts: the average part, which represents
the intraday volume pattern, and the residual term, which represents the abnormal changes. A
rolling average of the previous days' volume percentages is used to predict the average part
and is shown to capture the main changes of the intraday volumes. To predict the residual
term, an SVM whose input pattern consists of two categories is employed: one is the previous
days' volume percentages in the same time interval and the other is the most recent volume
percentages. The study shows that this dynamic approach outperforms the benchmark method of
using the historical average as the prediction and improves the accuracy by up to 14%,
according to an out-of-sample forecasting test on gold and S&P 500 futures. This supports the
view that volume decomposition is reasonable and that the two decomposed parts have different
characteristics. Moreover, this dynamic SVM-based approach to forecasting volume percentages
substantially improves the tracking performance of a VWAP strategy.

References
[1] Chordia T, Roll R, and Subrahmanyam A, Recent trends in trading activity and market quality,
J. Finan. Econ., 2011, 101(2): 243–263.
[2] Satish V, Saxena A, and Palmer M, Predicting intraday trading volume and volume percentages,
Journal of Trading, 2014, 9(3): 15–25.
[3] Chen C J, Liu X, and Lai K K, Comparisons of strategies on gold algorithmic trading, Business
Intelligence and Financial Engineering (BIFE), 2013 Sixth International Conference, 2013.
[4] Smirlock M and Starks L, An empirical analysis of the stock price-volume relationship, J. Banking
Finance, 1988, 12(1): 31–41.
[5] Gwilym O A, McMillan D, and Speight A, The intraday relationship between volume and
volatility in LIFFE futures markets, Appl. Finan. Econ., 1999, 9(6): 593–604.
[6] Darrat A F, Rahman S, and Zhong M, Intraday trading volume and return volatility of the DJIA
stocks: A note, J. Banking Finance, 2003, 27(10): 2035–2043.
[7] Cai C X, Hudson R, and Keasey K, Intra day bid-ask spreads, trading volume and volatility:
Recent empirical evidence from the London Stock Exchange, J. Bus. Financ. Account., 2004,
31(5–6): 647–676.
[8] Chevallier J and Sévi B, On the volatility-volume relationship in energy futures markets using
intraday data, Energy Econ., 2012, 34(6): 1896–1909.
[9] Gerety M S and Mulherin J H, Trading halts and market activity: An analysis of volume at the
open and the close, J. Finance, 1992, 47(5): 1765–1784.
[10] Lee C, Ready M J, and Seguin P J, Volume, volatility, and New York Stock Exchange trading
halts, J. Finance, 1994, 49(1): 183–214.
[11] Atkins A B and Basu S, The effect of after-hours announcements on the intraday U-shaped volume
pattern, J. Bus. Financ. Account., 1995, 22(6): 789–809.
[12] Kluger B D and McBride M E, Intraday trading patterns in an intelligent autonomous agent-
based stock market, J. Econ. Behav. Organ., 2011, 79(3): 226–245.
[13] Malinova K and Park A, The impact of competition and information on intraday trading, J.
Banking Finance, 2014, 44: 55–71.
[14] Bialkowski J, Darolles S, and Le Fol G, Improving VWAP strategies: A dynamic volume approach,
J. Banking Finance, 2008, 32(9): 1709–1722.
[15] Lo A W and Wang J, Trading volume: Definitions, data analysis, and implications of portfolio
theory, Rev. Financ. Stud., 2000, 13(2): 257–300.
[16] Darolles S and Le Fol G, Trading Volume and Arbitrage, INSEE, 2003.
[17] Alvim L G, Duarte Dos Santos CN, and Milidiu R L, Daily volume forecasting using high fre-
quency predictors, Proceedings of the 10th IASTED International Conference, 2010.
[18] Brownlees C T, Cipollini F, and Gallo G M, Intra-daily volume modeling and prediction for
algorithmic trading, J. Finan. Econometrics, 2011, 9(3): 489–518.
[19] Orchel M, Support vector regression with a priori knowledge used in order execution strategies
based on VWAP, Advanced Data Mining and Applications, Springer, Berlin Heidelberg, 2011, 318–331.
[20] Vapnik V N, Statistical Learning Theory, Wiley, New York, 1998.
[21] Lin F and Guo J, A novel support vector machine algorithm for solving nonlinear regression
problems based on symmetrical points, Computer Engineering and Technology (ICCET), 2010
2nd International Conference, 2010.
[22] Humphery-Jenner M L, Optimal VWAP trading under noisy conditions, J. Banking Finance,
2011, 35(9): 2319–2329.
[23] Yan R and Li H, Modeling and forecasting the intraday volume of Shanghai security composite
index, Systems and Informatics (ICSAI), 2012 International Conference, 2012.
[24] Shen H and Huang J Z, Interday forecasting and intraday updating of call center arrivals, Manuf.
Serv. Oper. Manag., 2008, 10(3): 391–410.
[25] Andersen T G, Bollerslev T, and Cai J, Intraday and interday volatility in the Japanese stock
market, J. Int. Finan. Markets, Inst. Money, 2000, 10(2): 107–130.
[26] Sévi B, Forecasting the volatility of crude oil futures using intraday data, Eur. J. Oper. Res.,
2014, 235(3): 643–659.
[27] Smith M, Min A, Almeida C, et al., Modeling longitudinal data using a pair-copula
decomposition of serial dependence, J. Am. Statist. Assoc., 2010, 105(492): 1467–1479.
[28] Chanda A, Engle R F, and Sokalska M, High frequency multiplicative component GARCH,
Available at SSRN 686173, 2005.
[29] Coroneo L and Veredas D, A simple two-component model for the distribution of intraday returns,
Europ. J. Finance, 2012, 18(9): 775–797.
[30] Vapnik V, The Nature of Statistical Learning Theory, Springer, New York, 2000.
[31] Keerthi S S and Lin C J, Asymptotic behaviors of support vector machines with Gaussian kernel,
Neural Comput., 2003, 15(7): 1667–1689.
[32] O’connor M, Remus W, and Griggs K, Going up-going down: How good are people at forecasting
trends and changes in trends?, J. Forecasting, 1997, 16(3): 165–176.
[33] Sundhararajan S, Pahwa A, and Krishnaswami P, A comparative analysis of genetic algorithms
and directed grid search for parametric optimization, Eng. Comput., 1998, 14(3): 197–205.
[34] Boser B E, Guyon I M, and Vapnik V N, A training algorithm for optimal margin classifiers,
Proceedings of the Fifth Annual Workshop on Computational Learning Theory, ACM, 1992.
[35] Cortes C and Vapnik V, Support-vector networks, Mach. Learn., 1995, 20(3): 273–297.
[36] Vapnik V, Golowich S E, and Smola A, Support vector method for function approximation,
regression estimation, and signal processing, Adv. Neural Inf. Process. Syst., 1997, 281–287.
[37] Smola A J and Schölkopf B, A tutorial on support vector regression, Statist. Comput., 2004,
14(3): 199–222.
[38] Mercer J, Functions of positive and negative type, and their connection with the theory of inte-
gral equations, Philosophical Transactions of the Royal Society of London. Series A, Containing
Papers of a Mathematical or Physical Character, 1909, 415–446.
[39] Schölkopf B, Burges C J, and Smola A J, Advances in Kernel Methods: Support Vector Learning,
MIT Press, Cambridge, 1999.
[40] Calvori F, Cipollini F, and Gallo G M, Go with the flow: A GAS model for predicting intra-daily
volume shares, Available at SSRN 2363483, 2013.
