Research article
ARTICLE INFO
Keywords: Chinese airlines; LASSO; Ridge regression; Support vector machine regression; Forecasting
ABSTRACT
Forecasting stock market movements is a challenging task from the practitioners’ point of view. We explore how model selection via the least absolute shrinkage and selection operator (LASSO) approach can be better used to forecast stock closing prices using real-world datasets of daily stock closing prices of three major international airlines. Combining the LASSO method with multiple external data sources in our model leads to a robust and efficient method to predict stock behavior. We also compare our approach with ridge, tree, and support vector machine regressions, as well as neural network approaches to model the data. We include lags of each external variable and response variable in the model, resulting in a total of 870 predictor variables. The empirical results indicate that the LASSO-fitted model is the most effective when compared to other approaches we consider. The results show that the closing price of an airline stock is affected by its closing price for the previous days and those of other types of airlines and is significantly correlated with the Shanghai Composite Index for the previous day and 3 days prior. Other influencing factors include the positive impact of the Shanghai Composite Index daily share volume, the negative impact of loan interest rates, the amount of highway passenger and railway freight turnover, etc.
* Corresponding author.
E-mail address: ryan.wu@acu.edu.au (J. Wu).
https://doi.org/10.1016/j.dsm.2023.09.005
Received 13 July 2023; Received in revised form 26 September 2023; Accepted 30 September 2023
Available online 5 October 2023
2666-7649/© 2023 Xi'an Jiaotong University. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the
CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
X. Xu et al. Data Science and Management 6 (2023) 239–246
Section 3 describes the three airline stock datasets and their potential predictor variables and discusses the preprocessing done on the datasets. Section 4 reports the forecasting results, and Section 5 concludes the paper.

2. Problem formulation

The LASSO approach is a type of supervised statistical learning algorithm (Tibshirani, 1996) for performing regression. The name LASSO is an acronym for least absolute shrinkage and selection operator. LASSO algorithms can be applied to forecast time series data, such as stock closing prices; Panagiotidis et al. (2018) and Xu et al. (2021) expressed the time series problem as a regression. LASSO can be thought of as a more modern approach to estimating the parameters in a regression model than the traditional ordinary least squares approach. The method adds a penalty function to obtain a more parsimonious model: it shrinks the regression coefficients and sets some of them exactly to zero, which removes the corresponding predictors from the model. It therefore retains the advantages of subset selection and yields a biased estimator that is suited to data with complex collinearity. Intuitively, we would expect there to be collinearity among the variables in our model because they involve data from companies competing in the same markets under very similar conditions. In this way, LASSO overcomes many of the problems that the least squares approach would encounter in such a situation, as described by Li and Chen (2014) and Ludwig et al. (2015).

Our response dataset is a time series $y_1, \ldots, y_n$ of the daily closing stock prices for Air China observed over n points in time between 2006 and 2019. If we formulate this time series modelling problem as a regression problem, an appropriate model is a multiple linear regression for the response at a given time point t, which takes the form:

$$y_t = \beta_0 + \beta_1 x_{t1} + \cdots + \beta_p x_{tp} + \epsilon$$

where $x_{t1}, \ldots, x_{tp}$ are the observed values of the predictor variables at time point t, and $\beta = (\beta_1, \ldots, \beta_p)$ are the slope coefficients of the predictors. The value $\beta_0$ is a constant intercept term, and $\epsilon$ is the error term (residual) in the regression model.

It is natural to choose the predictor variables $x_{t1}, \ldots, x_{tp}$ so that they include lagged values of the response (daily stock closing prices for Air China) and lagged values of other relevant external variables. The resulting model can be described more specifically as an autoregressive distributed lag model because we include 30 lags of each of the predictor variables. Note also that, where appropriate, we adjust our predictor variables for seasonality and trend by removing the long-term factors before fitting the model.

As mentioned, the LASSO approach (Tibshirani, 1996) can be thought of as an alternative (Birkes and Dodge, 2011) to ordinary least squares estimation of the regression parameters $\hat{\beta}$. LASSO is a penalized least squares approach: it minimizes the residual sum of squares subject to a bound on the L1 norm of the coefficient vector. Given the response vector y (centered around its mean) and the matrix X of predictor values, for values t of the bound less than the L1 norm of the ordinary least squares estimate of $\beta$, the LASSO estimates of the coefficients are the solutions to:

$$\hat{\beta}_L = \arg\min_{\beta} \left\{ (y - X\beta)^T (y - X\beta) + C \|\beta\|_1 \right\}, \qquad (1)$$

where $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$.

The parameter $C \geq 0$ determines the amount of shrinkage in the model. The simultaneous model selection and parameter estimation that result are considered an advantageous feature of LASSO, particularly when the number of potential predictors is enormous. Xu et al. (2021) discussed this in more detail. In contrast, there is no comparable parameter in an ordinary least squares approach.

If there is a lot of shrinkage in the LASSO fit, there is a greater tendency for predictors to be excluded from the model and deemed unnecessary. Conversely, in the special case where C = 0, i.e., no shrinkage, the LASSO estimator simply equates to the ordinary least squares estimator, as no predictors are removed or shrunk in the estimation. In our approach, we employ cross-validation to select the value of C that optimizes the predictive performance of the model, in other words, the value that minimizes the mean squared prediction error.

3. The investigated data

3.1. Data description

Table 2
List of predictor variables used in our modelling.
Variable | Description | Data sources | Frequency
AC close | Air China daily stock closing price | Wind data Service | Daily
CS close | China Southern Airlines daily stock closing price | Wind data Service | Daily
CE close | China Eastern Airlines daily stock closing price | Wind data Service | Daily
S open | Shanghai composite index daily opening price | Wind data Service | Daily
S high | Shanghai composite index daily high price | Wind data Service | Daily
S low | Shanghai composite index daily low price | Wind data Service | Daily
S close | Shanghai composite index daily closing price | Wind data Service | Daily
S volume | Shanghai composite index daily share volume | Wind data Service | Daily
A open | Aviation sector index daily opening price | Wind data Service | Daily
A high | Aviation sector index daily high price | Wind data Service | Daily
A low | Aviation sector index daily low price | Wind data Service | Daily
A close | Aviation sector index daily closing price | Wind data Service | Daily
A volume | Aviation sector index daily share volume | Wind data Service | Daily
Crude | Daily crude oil spot price | EIA | Daily
Exchange rate | Daily exchange rate (spot rate: USD to CNY) | CFETS | Daily
Average wages | Average wages of employees | National Bureau of Statistics | Yearly
GDP | Gross domestic product | National Bureau of Statistics | Seasonal
PCDI | Per capita disposable income of national residents | National Bureau of Statistics | Seasonal
AC net profit | The net profit of Air China | Financial announcement | Seasonal
CS net profit | The net profit of China Southern Airlines | Financial announcement | Seasonal
CE net profit | The net profit of China Eastern Airlines | Financial announcement | Seasonal
Import export | Import and export amount | Customs Head Office | Monthly
Railway passenger | Railway passenger turnover | National Bureau of Statistics | Monthly
Railway freight | Railway freight turnover | National Bureau of Statistics | Monthly
Highway passenger | Highway passenger turnover | National Bureau of Statistics | Monthly
Highway freight | Highway freight turnover | National Bureau of Statistics | Monthly
Passenger volume | Civil aviation passenger volume | National Bureau of Statistics | Monthly
Interest rate | Short term loan interest rate per year | People’s Bank of China | Monthly
Brent oil | The price of brent oil | EIA | Daily
Note: Energy Information Administration (EIA) of the U.S.

All 15 predictor variables listed in Table 2 were explored in the model. However, note that we include 30 lags of each of these variables in the model, as well as lags of the response variable, which makes the model autoregressive. When predicting the stock price of a certain airline
company, predictors not only include information such as oil price, exchange rate, and railway carrying capacity but also relevant stock information of the other two airlines, for a total of 870 lagged variables. This means that we use a total of 870 predictor variables in our model, with coefficients labelled β1–β870. Fig. 1 displays correlations between the non-cumulative variables in the study. Data collected between the years 2006 and 2019 were available, and they were split into two sets for training and testing, respectively. The model is first fit to the training data: the training set comprises time series data observed from 2006 to part-way through 2014. We then use this trained model to make our predictions on the test data, which comprise the remaining available data from the latter half of 2014 to the end of 2019. The data that support the findings of this study are available from the corresponding author upon request.

3.2. Data preprocessing

When seasonality is present in a predictor, it is necessary to preprocess the corresponding data to adjust for this before performing an analysis.
The package “seastests”, which is available for download and use with the R language statistical programming suite (R Core Team, 2013), can be used to test for seasonality in the data corresponding to each of the predictors. The specific test applied is referred to as the Webel-Ollech overall seasonality test. This test, proposed by Webel and Ollech (2018), combines the results of several seasonality tests.
Table 3 indicates whether or not each of our predictors exhibits seasonality according to the Webel-Ollech overall seasonality test. Fig. 2 shows the predictors with seasonality before and after adjustment.

4. Results and discussion

This section discusses the forecasting performance of LASSO for the three airlines’ stock closing prices. Ridge regression (RR), tree regression (TR), support vector machine regression (SVR), and neural networks (NN) are selected as benchmark models to illustrate the effectiveness of LASSO. LASSO and RR are implemented within the package “glmnet” (Friedman et al., 2010) in the statistical programming language R, which we used to produce the main numerical results for this paper. The other benchmarks are also available in R (R Core Team, 2013): (1) TR is available as part of the package “rpart” (Therneau et al., 2015); (2) SVR can be implemented using the package “e1071” (Dimitriadou et al., 2008); and (3) NN can be applied using the package “neuralnet” (Günther and Fritsch, 2010).
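As a rough illustration of the kind of fit described in Section 2, the sketch below minimizes the objective in Eq. (1) by cyclic coordinate descent, the technique named in the title of Friedman et al. (2010). It is not the authors' R code and does not reproduce the scaling or standardization used by “glmnet”. The function names (`soft_threshold`, `lasso_coordinate_descent`, `select_C`) are hypothetical, the model has no intercept (the response and predictors are assumed to be centered), and a single hold-out split stands in for the cross-validation used in the paper.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, C, n_sweeps=200):
    """Minimize (y - X b)^T (y - X b) + C * sum(|b_j|) over b.

    X is a list of rows, y a list of responses. No intercept is fitted,
    so both are assumed to be centered beforehand (an assumption of this
    sketch). A fixed number of sweeps is used instead of a convergence test.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the residual that excludes j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * beta[j]
            # The one-dimensional minimizer of the penalized objective.
            new_beta = soft_threshold(rho, C / 2.0) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


def mean_squared_error(X, y, beta):
    """Mean squared prediction error of the linear fit X beta."""
    errors = [
        y[i] - sum(X[i][j] * beta[j] for j in range(len(beta)))
        for i in range(len(y))
    ]
    return sum(e * e for e in errors) / len(errors)


def select_C(X_train, y_train, X_val, y_val, candidates):
    """Return the candidate C with the lowest hold-out prediction error."""
    scored = [
        (mean_squared_error(X_val, y_val,
                            lasso_coordinate_descent(X_train, y_train, C)), C)
        for C in candidates
    ]
    return min(scored)[1]
```

With C = 0 the soft threshold is zero and the update reduces to an ordinary least squares coordinate step, which matches the special case noted in Section 2. Larger values of C widen the band in which a coefficient is set exactly to zero.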
Table 3
Indication of whether each predictor variable exhibits seasonality.
Variable | Seasonality present? | Variable | Seasonality present?
Fig. 2. Plots of predictors that exhibited seasonality before and after adjustment. The black lines correspond to the raw data and the red lines correspond to the data
after adjustment for seasonality.
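The text does not state which adjustment procedure produced the red lines in Fig. 2. Purely as an illustration of what a seasonal adjustment can look like, the sketch below removes a fixed periodic pattern by subtracting the average of each position in the cycle and adding back the overall mean. The function name `seasonally_adjust` and the choice of method are assumptions of this sketch, not the authors' procedure.

```python
def seasonally_adjust(values, period):
    """Remove a fixed seasonal pattern of the given period from a series.

    Each observation has the mean of its position in the cycle subtracted
    and the overall mean added back, so the level of the series is kept.
    Assumes len(values) >= period. This is one simple option only.
    """
    overall_mean = sum(values) / len(values)
    phase_means = []
    for phase in range(period):
        same_phase = values[phase::period]  # every observation at this position
        phase_means.append(sum(same_phase) / len(same_phase))
    return [
        value - phase_means[index % period] + overall_mean
        for index, value in enumerate(values)
    ]
```

For a monthly series such as railway freight turnover, `period` would be 12 under this scheme, while a purely alternating series with `period=2` is flattened to its mean.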
Table 5
Ranked LASSO estimated coefficients of the autoregressive distributed lag model predicting three airlines’ stock prices.
Rank Air China China Eastern Airlines China Southern Airlines
Note: Coefficients with the same name correspond to different lags of the same variable.
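The note above follows from how an autoregressive distributed lag design matrix is laid out: every variable contributes one column per lag, so the same variable name can appear several times among the fitted coefficients. The sketch below shows one way such a matrix could be assembled. The function name `build_lagged_design` is hypothetical, the toy numbers are made up, and all series are assumed to be already aligned on a common daily index, which the paper does not describe in detail.

```python
def build_lagged_design(series_by_name, response_name, n_lags):
    """Build (columns, X, y) for an autoregressive distributed lag model.

    series_by_name maps a variable name to a list of equally long,
    time-ordered observations. Each row of X holds lags 1..n_lags of
    every variable, including the response itself; y holds the response
    at the current time point. The first n_lags time points are dropped
    because their lags are not available.
    """
    names = sorted(series_by_name)
    length = len(series_by_name[response_name])
    columns = [(name, lag) for name in names for lag in range(1, n_lags + 1)]
    X, y = [], []
    for t in range(n_lags, length):
        X.append([series_by_name[name][t - lag] for name, lag in columns])
        y.append(series_by_name[response_name][t])
    return columns, X, y


# Toy usage with made-up values for two of the Table 2 variable names.
toy = {"AC close": [1, 2, 3, 4, 5], "Crude": [10, 20, 30, 40, 50]}
columns, X, y = build_lagged_design(toy, "AC close", n_lags=2)
```

Here `columns` lists ("AC close", 1), ("AC close", 2), ("Crude", 1), ("Crude", 2), so two coefficients share each variable name and differ only in their lag.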
Acknowledgments

This work was supported by the Australian Research Council project (Grant No.: DP160104292), the Zhejiang province soft science project, the Wenzhou Basic Soft Science Research Key Project (First Batch, NO.7), and “Chunhui Program” Collaborative Scientific Research Project (Grant No.: 202202004).

References

Abraham, R., Samad, M.E., Bakhach, A.M., et al., 2022. Forecasting a stock trend using genetic algorithm and random forest. J. Risk Financ. Manag. 15 (5), 188.
Alam, I.M.S., Sickles, R.C., 1998. The relationship between stock market returns and technical efficiency innovations: evidence from the US airline industry. J. Prod. Anal. 9 (1), 35–51.
Arekar, K., Jain, R., 2017. Influence of oil price volatility of developed countries on emerging countries stock market returns by using threshold based approach. Theor. Econ. Lett. 7 (6), 1834–1847.
Bao, Y., Lu, Y., Zhang, J., 2004. Forecasting stock price by SVMs regression. In: International Conference on Artificial Intelligence: Methodology, Systems, and Applications. Springer, pp. 295–303.
Birkes, D., Dodge, Y., 2011. Alternative Methods of Regression. John Wiley & Sons, New York.
Chandola, D., Mehta, A., Singh, S., et al., 2022. Forecasting directional movement of stock prices using deep learning. Ann. Data Sci. 10 (5), 1361–1378.
Deng, C., Huang, Y., Hasan, N., et al., 2022. Multi-step-ahead stock price index forecasting using long short-term memory model with multivariate empirical mode decomposition. Inf. Sci. 607 (Aug.), 297–321.
Dimitriadou, E., Hornik, K., Leisch, F., et al., 2008. Misc functions of the Department of Statistics (e1071), TU Wien. R Package 1 (Jan.), 5–24.
Forsyth, P., Dwyer, L., 2010. Exchange rate changes and the cost competitiveness of international airlines: the aviation trade weighted index. Res. Transport. Econ. 26 (1), 12–17.
Friedman, J., Hastie, T., Tibshirani, R., 2010. Regularization paths for generalized linear models via coordinate descent. J. Stat. Software 33 (1), 1.
Gandhmal, D.P., Kumar, K., 2019. Systematic analysis and review of stock market prediction techniques. Comput. Sci. Rev. 34 (Nov.), 100190.
Goh, C.F., Rasli, A., 2014. Stock investors’ confidence on low-cost and traditional airlines in Asia during financial crisis of 2007–2009. Procedia Soc. Behav. Sci. 129 (May), 31–38.
Gong, S.X., Firth, M., Cullinane, K., 2008. International oligopoly and stock market linkages: the case of global airlines. Transp. Res. E Logist. Transp. Rev. 44 (4), 621–636.
Günther, F., Fritsch, S., 2010. Neuralnet: training of neural networks. R J 2 (1), 30.
Hamrita, M.E., Trifi, A., 2011. The relationship between interest rate, exchange rate and stock price: a wavelet analysis. Int. J. Econ. Financ. Issues 1 (4), 220–228.
Kamara, A.F., Chen, E., Pan, Z., 2022. An ensemble of a boosted hybrid of deep learning models and technical analysis for forecasting stock prices. Inf. Sci. 594 (May), 1–19.
Kirisci, M., Cagcag Yolcu, O., 2022. A new CNN-based model for financial time series: TAIEX and FTSE stocks forecasting. Neural Process. Lett. 54 (4), 3357–3374.
Kurani, A., Doshi, P., Vakharia, A., et al., 2023. A comprehensive comparative study of artificial neural network (ANN) and support vector machines (SVM) on stock forecasting. Ann. Data Sci. 10 (1), 183–208.
Li, J., Chen, W., 2014. Forecasting macroeconomic time series: LASSO-based approaches and their forecast combinations with dynamic factor models. Int. J. Forecast. 30 (4), 996–1015.
Lin, Y., Lin, Z., Liao, Y., et al., 2022. Forecasting the realized volatility of stock price index: a hybrid model integrating CEEMDAN and LSTM. Expert Syst. Appl. 206 (Nov.), 117736.
Liu, J., Kemp, A., 2019. Forecasting the sign of US oil and gas industry stock index excess returns employing macroeconomic variables. Energy Econ. 81 (Jun.), 672–686.
Liu, M., Choo, W.C., Lee, C.C., et al., 2023. Trading volume and realized volatility forecasting: evidence from the China stock market. J. Forecast. 42 (1), 76–100.
Ludwig, N., Feuerriegel, S., Neumann, D., 2015. Putting big data analytics to work: feature selection for forecasting electricity prices using the lasso and random forests. J. Decis. Syst. 24 (1), 19–36.
Md, A.Q., Kapoor, S., AV, C.J., et al., 2023. Novel optimization approach for stock price forecasting using multi-layered sequential LSTM. Appl. Soft Comput. 134 (Feb.), 109830.
Mok, H.M., 1993. Causality of interest rate, exchange rate and stock prices at stock market open and close in Hong Kong. Asia Pac. J. Manag. 10 (Oct.), 123–143.
Naser, H., Alaali, F., 2018. Can oil prices help predict US stock market returns? Evidence using a dynamic model averaging (DMA) approach. Empir. Econ. 55 (4), 1757–1777.
Nowak, S., Anderson, H.M., 2014. How does public information affect the frequency of trading in airline stocks? J. Bank. Finance 44 (Jul.), 26–38.
Oztekin, A., Kizilaslan, R., Freund, S., et al., 2016. A data analytic approach to forecasting daily stock returns in an emerging market. Eur. J. Oper. Res. 253 (3), 697–710.
Panagiotidis, T., Stengos, T., Vravosinos, O., 2018. On the determinants of bitcoin returns: a lasso approach. Finance Res. Lett. 27 (Dec.), 235–240.
Peng, Y., Chen, W., Wei, P., et al., 2020. Spillover effect and Granger causality investigation between China’s stock market and international oil market: a dynamic multiscale approach. J. Comput. Appl. Math. 367 (C), 112460.
Roy, S.S., Mittal, D., Basu, A., et al., 2015. Stock market forecasting using lasso linear regression model. In: Afro-European Conference for Industrial Advancement. Springer, Cham, pp. 371–381.
Sadorsky, P., 2022. Forecasting solar stock prices using tree-based machine learning classification: how important are silver prices? N. Am. J. Econ. Finance 61 (Jul.), 101705.
Siami-Namini, S., 2017. Granger causality between exchange rate and stock price: a Toda Yamamoto approach. Int. J. Econ. Financ. Issues 7 (4), 603–607.
Sonenshine, R., Cauvel, M., 2017. Revisiting the effect of crude oil price movements on US stock market returns and volatility. Mod. Econ. 8 (5), 753.
Song, Y.I., Woo, W., Rao, H.R., 2007. Interorganizational information sharing in the airline industry: an analysis of stock market responses to code-sharing agreements. Inf. Syst. Front 9 (Apr.), 309–324.
Staffini, A., 2022. Stock price forecasting by a deep convolutional generative adversarial network. Front. Artif. Intell. 5 (Feb.), 837596.
Team, R.C., 2013. R: A Language and Environment for Statistical Computing. Available at: https://www.r-project.org/.
Therneau, T., Atkinson, B., Ripley, B., et al., 2015. Package ‘rpart’. Available at: https://cran.ma.ic.ac.uk/web/packages/rpart/rpart.pdf.
Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58 (1), 267–288.
Webel, K., Ollech, D., 2018. An overall seasonality test based on recursive feature elimination in conditional random forests. In: Proceedings of the 5th International Conference on Time Series and Forecasting, pp. 20–31.
Xu, X., McGrory, C.A., Wang, Y.G., et al., 2021. Influential factors on Chinese airlines’ profitability and forecasting methods. J. Air Transport. Manag. 91 (Mar.), 101969.
Zhou, Z., Gao, M., Liu, Q., et al., 2020. Forecasting stock price movements with multiple data sources: evidence from stock market in China. Physica A Stat. Mech. Appl. 542 (Mar.), 123389.