Ceylan Zeynep (Orcid ID: 0000-0002-3006-9768)

Estimation of healthcare expenditure per capita of Turkey using artificial intelligence
techniques with genetic algorithm-based feature selection

Zeynep CEYLAN1*, Abdulkadir Atalan2

1 Samsun University, Faculty of Engineering, Department of Industrial Engineering, Samsun,
55420, Turkey
2 Gaziantep Islam Science and Technology University, Faculty of Engineering and Natural
Sciences, Department of Industrial Engineering, Gaziantep, 27010, Turkey

Abstract

This study presents a comprehensive analysis of artificial intelligence (AI) techniques to
predict healthcare expenditure per capita (pcHCE) in Turkey. Well-known AI techniques such
as Random Forest (RF), Artificial Neural Network (ANN), Multiple Linear Regression (MLR),
Support Vector Regression (SVR), and Relevance Vector Machine (RVM) were used to
forecast pcHCE. Twenty-nine years of historical data from 1990 to 2018 were used in the
training and testing phases of the models considered. Gross domestic product per capita, life
expectancy at birth, unemployment rate, crude birth rate, the number of physicians and
hospitals were used as input variables in the estimation of pcHCE. A genetic algorithm-based
feature selection (GAFS) method was applied to all models to select the relevant and optimal
feature subset in the prediction of pcHCE. The comparative results showed that the GAFS
method improved the overall performance of all base AI models. The hybrid GAFS-RF model
performed best among all AI-based prediction methods, with a coefficient of determination
(R2) value of 99.78% at the testing stage.

This article has been accepted for publication and undergone full peer review but has not been
through the copyediting, typesetting, pagination and proofreading process which may lead to
differences between this version and the Version of Record. Please cite this article as doi:
10.1002/for.2747

This article is protected by copyright. All rights reserved.

Keywords: Healthcare expenditure per capita, genetic algorithm-based feature selection,
optimization, artificial intelligence, prediction

1. Introduction

It is necessary to have a healthy society for a country's economy to be strong and its economic
growth to continue. The existence of a healthy society is provided by qualified health services
that meet the needs of society. Healthcare expenditures (HCE) or healthcare expenditure per
capita (pcHCE) are important parameters that measure the health economy. In recent years,
HCE has increased globally and is expected to continue growing. Such a trend raises concerns
about the sustainability of health financing. Thus, the concerns about the rapid growth in HCE
and its long-term sustainability have led researchers to investigate the determinants of HCE.
In the literature, many studies have been conducted to identify the determinants affecting HCE
or pcHCE. In most studies, the relationship between HCE and gross domestic product (GDP)
has been extensively investigated. For example, Toor and Butt (2005) examined six factors that
affect the HCE of Pakistan, including GDP per capita (pcGDP), urbanization, literacy rate,
crude birth rate, and foreign aid (Toor & Butt, 2005). Khan et al. (2016) investigated the
short-run and long-run equilibrium dynamic causal relationship between HCE and pcGDP within
a time series framework from 1981 to 2014 in Malaysia. The analysis results showed that
pcGDP, population growth, population structure, and technology play important roles in
explaining changes in the HCE (Khan, Razali, & Shafie, 2016). Rodríguez and Nieves Valdés
(2019) investigated the relationship between GDP and HCE for a group of Latin American and
the Caribbean countries and OECD countries for the period 1995–2014. They showed that GDP
does not react to changes in the HCE level in the long-term (Rodríguez & Nieves Valdés,
2019).
The change in the HCE is also closely related to the demographic characteristics,
socio-economic factors, and health resources of a country, such as the number of hospitals and
physicians, life expectancy at birth, unemployment rate, and crude birth rate. The
unemployment rate is one of the most critical factors affecting the economic structure of
countries. A low unemployment rate is one of the main objectives of government
macroeconomic policy. Braendle and Colombier (2016) examined the impact of demand and
supply-side determinants and political economy aspects on public HCE growth. They
suggested that pcGDP and unemployment rate are positively related to public HCE growth
(Braendle & Colombier, 2016). Clemente et al. (2019) analyzed the development of HCE in
the USA. They reported the unemployment rate is a key variable to explain the formation of
HCE (Clemente, Lázaro-Alquézar, & Montañés, 2019).
The pcHCE is positively related to life expectancy at birth (years). Life expectancy at birth
is a comprehensive parameter that defines the age-specific mortality levels of a population.
Jaba et al. (2014) revealed that there is a significant relationship between HCE and life
expectancy (Jaba, Balan, & Robu, 2014). Pichon-Riviere et al. (2015) underlined that there
should be an increase of approximately 7-10% in the amount of pcHCE for a one-year increase
in life expectancy (Pichon-Riviere, Augustovski, Garcia Marti, & Caporale, 2015). Tunzi and
Simo-Kengne (2020) analyzed yearly data from 1983 to 2015 to investigate the relationship
between the HCE and population ageing in South Africa. Using economic and demographic
projection statistics, they found that public health spending could
roughly double in the next fifteen years. In the study by Ray and Linden (2020), the impact of
life expectancy at birth on the HCE was analyzed by using data covering 20 years for 195
countries (Ray & Linden, 2020).
The crude birth rate is one of the most important parameters to evaluate whether society has
good health conditions. Toor and Butt (2005) stated that the crude birth rate is an essential
factor affecting the health economy in the short term (Toor & Butt, 2005). Boachie et al. (2014)
used the crude birth rate and life expectancy parameters to analyze the impact on the HCE in
Ghana between 1970 and 2008 (Boachie et al., 2014). They reported that the crude birth rate
has a long-run relationship with public HCE and investment.
Rapid population growth, increased life expectancy and healthcare demand cause more
investment in healthcare systems (Bernal-Delgado, Comendeiro-Maaløe, Ridao-López, &
Sansó Rosselló, 2020). The need for investment leads to an increase in the amount allocated to
health budgets to provide healthcare resources such as hospital beds or exam rooms, physicians,
nurses, technicians, assistants, etc. In particular, the number of actively employed physicians and
the number of hospitals (whether private or public) in a country are important
factors affecting the HCE. In the study by Akca et al. (2017), major determinants such as the
numbers of hospitals and physicians were used in the estimation of HCE in OECD member
countries (Akca, Sönmez, & Yılmaz, 2017). Di Matteo and Cantarero-Prieto (2018) found that
the physician number is a significant determinant of HCE in Canada (Di Matteo & Cantarero-
Prieto, 2018).

2. The prediction of healthcare expenditures

Accurate estimation of HCE is essential for policy-makers to plan future resources. Besides,
estimating the amount of the budget allocated for healthcare has a significant impact on
government policy and planning (Astolfi, Lorenzoni, & Oderkirk, 2012). In the literature, many
studies have been performed on the forecasting of HCE of different countries using various
techniques from classical to advanced models.
For example, Getzen and Poullier (1992) studied the estimation of the HCE for 19 OECD
countries using data between 1965 and 1979. They estimated the HCE from 1980 to 1987 by
using exponential smoothing, moving average, and autoregressive integrated moving average
(ARIMA) methods. The results of the study showed that multivariate regression models
provide more accurate predictions than time series models (Getzen & Poullier, 1992). Di
Matteo (2010) used determinant regression and growth rate extrapolation techniques to forecast
the HCE of Canada. The results showed that the budget allocated for healthcare by the
Canadian state government would continue to increase in the future, and that its share in
provincial GDP would also increase (Di Matteo, 2010). Chaabouni and Abednnadher (2013) estimated the HCE of
Tunisia based on socio-economic and demographic variables using artificial neural network
(ANN) and Autoregressive Distributed Lag (ARDL) models (Chaabouni & Abednnadher,
2013). Zhao (2015) predicted annual HCE using the exponential smoothing, ARIMA, and
Vector Autoregression (VAR) models for the 34 member countries of the OECD. Unlike the
study by Getzen and Poullier (1992), it was observed that simple statistical and time series
models provide better predictions than complex micro panel data models (Zhao, 2015).
Klazoglou and Dritsakis (2018) used the ARIMA model to predict total health spending in the
USA from 1900 to 2017. In their study, they aimed to identify the appropriate model based on
the Box–Jenkins methodology. The results showed that the ARIMA (2,1,0) model was the best
model to forecast the HCE in the USA (Klazoglou & Dritsakis, 2018). Özcan and Tüysüz
(2018) used ARIMA and grey forecasting models for predicting the pcHCE of Turkey. They
applied a genetic algorithm to optimize the training data size and the parameters of the grey
forecasting models. The results demonstrated that optimizing the parameters and the training
data size together with a rolling mechanism greatly improves the forecasting performance of the
grey models (Özcan & Tüysüz, 2018). Zheng et al. (2020) used the ARIMA (3,3,0) model to
predict changes in total HCE in China from 2018 to 2022. The analysis results of the study
showed that China should take effective measures to control the rapid growth of total HCE
(Zheng et al., 2020).
As seen in the literature, classical time series models, especially the ARIMA model, are widely
used in the prediction of HCE or pcHCE. The main advantage of the ARIMA model is its
simplicity and systematic structure (Ceylan, Bulkan, & Elevli, 2020). However, ARIMA
models are linear, and thus can only capture linear patterns in a time series;
they cannot adequately capture nonlinear patterns hidden in a time series. To overcome this
limitation of the ARIMA model, advanced approaches are needed that can characterize
complex nonlinear patterns well. In recent years, Artificial Intelligence (AI) tools have been
widely used because of their success in explaining complex systems. AI models quickly adapt
to changes in the system and can successfully predict nonlinear problems. Thus, this study
presents a comprehensive analysis of pcHCE prediction of Turkey using socio-economic and
demographic data based on AI-based forecasting models. In this context, the most common AI
algorithms, namely Random Forest (RF), Artificial Neural Network (ANN), Multiple Linear
Regression (MLR), Support Vector Regression (SVR), and Relevance Vector Machine (RVM)
were used to forecast pcHCE. To the best of our knowledge, this study is the first to
apply and compare multiple AI techniques for estimating the pcHCE of Turkey.

3. Material and Methods


3.1. Data collection

As shown in Table 1, the dataset was created using different sources for the years between
1990 and 2018. The pcHCE data provided by the Organization for Economic Co-operation and
Development (OECD) was used as an output variable (OECD, 2020). According to the data of
the OECD, pcHCE recorded its highest increase since 2012 in 2018, at 15.9%. Although
the amount of pcHCE has generally increased, some declines were observed over the
twenty-eight years. As shown in Figure 1, the most dramatic decreases in the amount of pcHCE
in Turkey occurred in 1993-1994 and 2014-2015 due to economic crises. The
input dataset was constructed by collecting historical values of pcGDP, life
expectancy at birth (LE), unemployment rate (UR), crude birth rate (CBR), the number of
hospitals (HN), and the number of physicians (PN). Figure 2 shows how the
input parameters changed over the years.
Table 2 shows the Pearson correlation coefficients (R), which measure the degree of the linear
relationship among the input variables and between the input variables and the pcHCE over the
years 1990 to 2018. As shown in Table 2, there is a high degree of correlation between the
pcHCE and the pcGDP, LE, CBR, HN, and PN variables (0.9352 < 𝑅 < 0.9875), while there is a
weaker correlation between the pcHCE and the UR (𝑅 = 0.7078). To provide constant
variability and to minimize the effects of high-variance variables on the estimation results, the
dataset was normalized to the range [0.05, 0.95] using Equation (1). Then, training and test
datasets were generated randomly: 70% of the whole data (20 samples) were used in the
training phase, and 30% (9 samples) were used in the testing phase.
$$x_n = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \times 0.9 + 0.05 \tag{1}$$

where $x$, $x_n$, $x_{\min}$, and $x_{\max}$ are the actual, normalized, minimum, and maximum values,
respectively.
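
The normalization in Equation (1) and the random 70/30 split can be sketched as follows. This is a minimal illustration only: the function names, the random seed, and the example values are assumptions made here, not part of the study, whose analyses were run in RapidMiner Studio.

```python
import random

def normalize(values, low=0.05, high=0.95):
    """Rescale values linearly into [low, high], following Equation (1)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("all values are identical; Equation (1) is undefined")
    # (high - low) equals the factor 0.9 in Equation (1); low is the offset 0.05.
    return [(x - x_min) / span * (high - low) + low for x in values]

def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Randomly split sample indices into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)  # 29 * 0.7 = 20.3, truncated to 20
    return sorted(indices[:n_train]), sorted(indices[n_train:])

# Hypothetical values, not taken from the study's dataset.
example = [100.0, 250.0, 400.0, 1000.0]
scaled = normalize(example)

# 29 yearly observations (1990-2018) give 20 training and 9 testing samples.
train_idx, test_idx = split_indices(29)
print(scaled)
print(len(train_idx), len(test_idx))
```

Truncating 29 × 0.7 to 20 reproduces the 20/9 split reported in the text; which years fall into each set depends on the seed, which is an arbitrary choice in this sketch.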

3.2. AI-based forecasting models

In this study, five different AI-based estimation techniques, namely RF, ANN, MLR, SVR, and
RVM, were used for pcHCE prediction of Turkey. An evolutionary genetic algorithm-based
feature selection (GAFS) technique was also applied to all models to obtain a relevant feature
subset and thereby improve the prediction accuracy of the models. All analyses were carried out in
the RapidMiner Studio 9.6 software. The details of the five models are described in the
following subsections.

3.2.1. Random Forest

The Random Forest (RF) algorithm is an ensemble learning method that combines a large set of
classification and regression trees. RF starts with many bootstrap samples drawn randomly
from the original dataset. A regression tree is fitted to each of the bootstrap
samples. At each node of each tree, a small subset of the input variables is drawn at random
from the full set (Wang, Zhou, Zhu, Dong, & Guo, 2016). The predicted value of an
observation is calculated by averaging the predictions of all trees.
RF is applicable to many datasets and reports the importance of each variable as part of its
analysis (Archer & Kimes, 2008). Averaging over trees built from different random samples limits
the influence of any single sample on the estimates, which is useful in problems with many
independent variables. Whereas other AI techniques select variables that meet specific criteria,
RF includes randomly chosen variables in each tree. This makes the results of the random forest
more generalizable.
RF is expected to achieve better prediction accuracy than single base learners. In addition,
ensemble learning in RF alleviates the overfitting problem, which makes RF less prone to
overfitting than other machine learning algorithms such as ANN and SVM. Moreover, since the
data and variables are different in each tree, it can learn quickly and has no adaptation problems.
For this reason, it works well with large datasets that have many
input variables and with datasets that contain many missing values. Pruning is not needed, because
the researcher can grow as many trees as desired and each tree uses different variables
and data. This is another feature that distinguishes RF from single decision trees.

3.2.2. Artificial Neural Network

Artificial neural network (ANN) is a computational model that mimics the structure, processing
capability, and learning ability of a human brain (Haykin, 1994; Lippmann, 1987). It is one of
the most widely used AI techniques to solve complex problems in engineering, finance, health,
chemistry, etc. (Graupe, 2013; Su et al., 2019; Wang et al., 2019). ANN consists of
interconnected adaptive simple processing elements, also called artificial nodes or neurons
based on brain learning ability (Li, Wang, & Qiu, 2019). It has the ability to store knowledge
and learn and model the complex nonlinear relationship between output and input data (Ceylan
& Bulkan, 2018). This feature makes the ANN model superior to other AI methods. It has
generalization ability, so it can interpret unseen data. Another important feature of
the ANN model is fault tolerance: it can process noisy or
fuzzy information and tolerate incomplete or missing data. Like all approaches, ANN
has some limitations. Training can be time-consuming depending on the complexity of the
modelled data. As the number of hidden layers needed to capture the properties of the data
increases, so does the time required to complete the training. Overfitting is another problem of
the ANN model: the network memorizes the training cases and then performs poorly on
unseen data.

3.2.3. Multiple Linear Regression

Multiple Linear Regression (MLR) analysis is one of the most common techniques used for
estimation because it is easy to use. MLR is also a widely preferred approach for measuring the
effect of independent variables on a dependent variable. This technique usually refers to an equation
between a single dependent variable and multiple independent variables. The equation contains
a constant number and coefficients for independent variables. If an independent variable has
no effect on the dependent variable, its coefficient will be zero. The main limitation of this
method is its unsuitability in nonlinear problems (Atalan, 2019).
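
Written out, the relationship described above has the general form below. The symbols are the conventional ones and are an assumption here, since the text does not report the fitted equation: $y$ is the dependent variable, $x_1, \dots, x_K$ are the independent variables, $\beta_0$ is the constant, $\beta_k$ are the coefficients, and $\varepsilon$ is the error term.

$$y = \beta_0 + \sum_{k=1}^{K} \beta_k x_k + \varepsilon$$

With the six inputs used in this study, $K = 6$; a variable with no effect on $y$ corresponds to $\beta_k = 0$.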

3.2.4. Support Vector Regression

Support Vector Machine (SVM) was first introduced by Cortes and Vapnik in 1995 (Cortes &
Vapnik, 1995). SVM aims to find the hyperplane in the feature space that has the maximum margin
to the nearest data points. Hyperplanes in SVM act as decision boundaries that
classify the data points. Support vectors are the data points that lie closest to the hyperplane and
affect the position and orientation of the hyperplane (Jiang, Rusuli, Amuti, & He, 2019). SVM
is used for classification and regression problems. Support vector regression (SVR) is a
kernel-based AI prediction method based on the principle of structural risk minimization, which
aims to minimize an upper bound on the generalization error. A kernel function implicitly maps
nonlinear data into a higher-dimensional feature space in which a linear model,
that is, a hyperplane, can be fitted.
The computational complexity in the SVR model does not depend on the size of the input
space. This feature makes the SVR model superior to other AI-based predictive methods. It is
also a useful method even if there is no prior knowledge about the data. Additionally, it has
excellent generalization capability, with high prediction accuracy. On the other hand,
determining an appropriate kernel function for handling nonlinear
data is difficult and complex (Ceylan, 2020b, 2020a). With high-dimensional data, the model may
produce too many support vectors, which significantly reduces the training speed.
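
To make the role of the kernel function concrete, the sketch below evaluates a Gaussian (RBF) kernel on a few points. The text does not state which kernel was used in the study, so the RBF kernel, the `gamma` value, and the sample points are assumptions chosen purely for illustration.

```python
import math

def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def kernel_matrix(points, gamma=1.0):
    """Pairwise kernel values; a kernel method works with these instead of raw inputs."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]

# Hypothetical normalized inputs (two features per observation).
points = [[0.05, 0.10], [0.50, 0.40], [0.95, 0.90]]
K = kernel_matrix(points)
for row in K:
    print([round(value, 4) for value in row])
```

Each entry of `K` acts as a similarity between two observations: identical points give 1, and the value decays toward 0 as the points move apart, which is what lets a linear method in the induced feature space follow a nonlinear pattern in the original inputs.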

3.2.5. Relevance Vector Machine

Relevance Vector Machine (RVM) is a kernel-based AI algorithm used for both classification
and regression analysis (Zhou & Dhupia, 2020). RVM is based on Bayesian inference and
provides sparse solutions. RVM uses an entirely probabilistic framework and places
a prior probability distribution on the weights of the model. In this model, there is a one-to-one,
dominating-dominated relationship from the hyperparameters to the weights. The
hyperparameters are obtained by repeated iteration, during which the posterior distributions of most
weights concentrate around zero (Kong et al., 2019). RVM adopts the same functional form as SVM.
Compared to the SVM model, the RVM model can provide probabilistic predictions. Besides,
the most significant advantage of the RVM model is that the high sparseness of RVM can
reduce the number of kernel functions involved in computation, which makes it especially
suitable for online monitoring. The kernel functions of SVM must satisfy Mercer's condition.
However, the selection of the kernel function is no longer limited by Mercer's condition in the
RVM model. Although the RVM model has the above advantages, it has some limitations in
practical applications. The learning procedure of the RVM model is usually much slower than
that of the SVM, since its computational complexity is O(N^3).

3.3. Genetic algorithm-based Feature Selection (GAFS)

The feature vector plays an essential role in the performance of the models. Relevant features
are necessary for the training process because they carry information that improves the model.
Irrelevant features are less informative, so their inclusion can negatively affect
the performance of the models. It is the task of feature selection methods to determine
which features are relevant or irrelevant. Feature selection or input selection methods are used
to select a subset of variables that can efficiently describe the input data. In this way, irrelevant and
redundant features that do not contribute to, or that decrease, the accuracy of the predictive model are
identified and removed from the dataset (Srinivasa Murthy & Koolagudi, 2018).
In this study, the feature selection task was performed using a genetic algorithm (Gómez and
Quesada, 2017; Rachmani et al., 2019). The genetic algorithm (GA) is an important heuristic
algorithm inspired by the process of natural evolution. It generates solutions to optimization
problems based on the mechanics of natural genetics and biological evolution. The steps of the
GAFS method are depicted in Figure 3 (Huang & Wang, 2006; Welikala et al., 2015).
The first step is to create the individuals in the population. Each individual (chromosome) in
the population represents a candidate solution to the feature subset selection problem. The
number of genes is the total number of features in the data set. Let m be the total number of
features. Each individual is represented by a binary vector of dimension m. A gene
value of ‘1’ means that the corresponding feature is included in the model; otherwise (if ‘0’),
the feature is not selected. A random population is then generated that represents different
points in the search space. A fitness value, which measures the quality of a member
of the population, is assigned to each individual. After fitness assignment, the
selection operator chooses the individuals that will recombine for the next generation.
The selection mechanism mimics the survival-of-the-fittest principle in nature.
Thus, the selection operator selects the individuals according to their pre-assigned fitness
levels, and the selected ones enter the mating pool. After the selection process, crossover and
mutation operators are applied to the selected individuals. The crossover operator, also called
recombination, randomly selects two individuals and recombines their features to generate new
offspring for the new population. The crossover operator can produce offspring that are very
similar to their parents, which leads to a new generation with low diversity. The mutation
operator overcomes this problem by randomly changing the value of some features in the
offspring. Afterward, the fitness of each offspring is calculated. The fitness assignment,
selection, crossover, and mutation processes are repeated for a fixed number of generations or
until a termination condition is satisfied.
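
The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RapidMiner operator used in the study: the default parameter values follow those reported in Section 4.1 (population size 10, 100 generations, tournament selection, uniform crossover, crossover fraction 0.8, mutation rate 0.1), the crossover fraction is interpreted here as the probability of applying crossover, the tournament size of 2 is assumed, and `toy_fitness` is a hypothetical stand-in for the model accuracy that a real run would evaluate.

```python
import random

FEATURES = ["pcGDP", "LE", "UR", "CBR", "HN", "PN"]

def toy_fitness(individual):
    """Hypothetical stand-in for model accuracy; NOT the study's fitness function."""
    arbitrary_mask = [1, 1, 0, 1, 0, 1]  # arbitrary pattern, for illustration only
    return sum(1 for gene, m in zip(individual, arbitrary_mask) if gene == m)

def tournament(population, scores, rng, size=2):
    """Return the fittest of `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: scores[i])]

def uniform_crossover(a, b, rng):
    """Each gene of the child comes from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in individual]

def gafs(fitness, n_features, pop_size=10, generations=100,
         crossover_prob=0.8, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Random initial population of binary chromosomes (1 = feature selected).
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]      # fitness assignment
        for ind, score in zip(population, scores):
            if score > best_score:
                best, best_score = ind[:], score
        next_population = []
        while len(next_population) < pop_size:
            p1 = tournament(population, scores, rng)        # selection
            p2 = tournament(population, scores, rng)
            if rng.random() < crossover_prob:
                child = uniform_crossover(p1, p2, rng)      # crossover
            else:
                child = p1[:]
            next_population.append(mutate(child, mutation_rate, rng))  # mutation
        population = next_population
    return best, best_score

best, best_score = gafs(toy_fitness, n_features=len(FEATURES))
selected = [name for name, gene in zip(FEATURES, best) if gene == 1]
print(selected, best_score)
```

In an actual GAFS run, the fitness call would train one of the prediction models on the selected columns and return its validation accuracy, which is by far the most expensive step of each generation.
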

4. Results and Discussion

4.1. Comparing the performance of models

In the literature, various performance criteria have been proposed to assess the performance of
prediction models. In this study, mean absolute deviation (MAD), root mean square error
(RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2)
measures were used as the statistical criteria. Equations (2)-(5) are their respective formulas.
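$$MAD = \frac{1}{n} \sum_{j=1}^{n} |a_j - p_j| \tag{2}$$

$$RMSE = \sqrt{\frac{\sum_{j=1}^{n} (p_j - a_j)^2}{n}} \tag{3}$$

$$MAPE = \frac{1}{n} \sum_{j=1}^{n} \left| \frac{p_j - a_j}{a_j} \right| \times 100\% \tag{4}$$

$$R^2 = 1 - \frac{\sum_{j=1}^{n} (p_j - a_j)^2}{\sum_{j=1}^{n} (a_j - \bar{a})^2} \tag{5}$$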
where $a_j$ is the actual value and $p_j$ is the predicted value for the $j$-th observation, $\bar{a}$ is the
average of the actual values, and n is the total number of data
points. Higher values of R2 and lower values of MAD, RMSE, and MAPE indicate better
performance of the developed model. The dataset was divided with a percentage split strategy
(70% for training and 30% for testing), which is a widespread approach to estimating the performance
of a model. The testing stage was carried out to evaluate the prediction precision and the
generalization capability of the developed models. Table 3 depicts the performance metric
values for all predictive models. Here all the features (pcGDP, UR, CBR, PN, HN, and LE) in
the dataset were used as inputs for modeling at all stages, including training and testing. The
values of R2 were in the range of 0.9569-0.9969 in the training dataset and 0.9385-0.9874 in
the testing dataset. The highest R2 (0.9874) value and the lowest MAD (29.2356), RMSE
(42.2965), and MAPE (4.6082) values in the test data were obtained by the RF model, which
implies that the model was successful in predicting pcHCE. Although the ANN and RVM
models performed close to the RF model during the testing phase, the performance of the MLR
model was rather low compared to the other models. The nonlinear structure of the data makes it
difficult to achieve high accuracy with traditional regression methods such as MLR. In such
cases, powerful AI-based estimation methods are better able to capture the relationships in the
data.
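
The four criteria in Equations (2)-(5) can be computed directly from paired actual and predicted values, as in the sketch below. The function names and the sample values are hypothetical and are not taken from the study's tables.

```python
import math

def mad(actual, predicted):
    """Mean absolute deviation, Equation (2)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, Equation (3)."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error in percent, Equation (4)."""
    return sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual) * 100

def r_squared(actual, predicted):
    """Coefficient of determination, Equation (5)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical values for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print(mad(actual, predicted))        # 10.0
print(rmse(actual, predicted))       # 10.0
print(mape(actual, predicted))       # about 5.21
print(r_squared(actual, predicted))  # about 0.992
```

Because MAPE divides by the actual value, it is undefined when any actual value is zero; that does not arise for expenditure data but would need handling in general.
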
The success of the developed AI-based models to predict pcHCE is directly related to the
suitability of the input parameters. Therefore, the GAFS method was applied to reveal the most
relevant features in the dataset. This method explores the space of possible feature subsets to
maximize the predictive accuracy of the AI models mentioned above. The parameters of the
GAFS method were set as follows: population size = 10, maximum number of generations =
100, selection scheme = tournament, crossover type = uniform, crossover fraction = 0.8, and
mutation rate = 0.1. The initial population was generated randomly. Table 4 shows the subset
of features selected by the GAFS method for each model. Each model selects subsets of features
in different combinations.
Figure 4 shows that optimization with the GAFS method improves the prediction
performance of the models significantly in both the training and testing stages. In the training
stage, the values of R2 increased from 99.69%, 98.81%, 95.69%, 98.07%, and 98.68% to 99.87%,
99.12%, 98.18%, 98.74%, and 99.88% for the RF, ANN, MLR, SVR, and RVM models, respectively
(Figure 4a). In the testing stage, the values of R2 increased from 98.74%, 98.18%, 93.85%,
97.17%, and 98.11% to 99.86%, 99.00%, 97.90%, 98.44%, and 98.78% for the RF, ANN, MLR,
SVR, and RVM models, respectively (Figure 4b). Table 5 presents the performance measures
of the AI models using the GAFS method. Based on the comparison of the predictions given by
the models, the RF model optimized with the GAFS method performed best compared to the
other models (R2 = 0.9986, MAD = 11.253, RMSE = 12.7544, and MAPE = 2.0384).
Also, the LE, PN, HN, and CBR features were selected as the most relevant and important
determinants used to predict pcHCE for the GAFS-RF model.

4.2. Discussion

The objective of this study is to employ robust AI-based models to estimate the
pcHCE of Turkey accurately. For this purpose, the GAFS method, a useful technique for
dimension reduction that can increase the performance of the models, was used as the feature
selection technique. Compared to the base AI-based estimation models, the models optimized with the
GAFS method provided better predictive performance on the dataset. As seen in Table 6, the
performance measures were synthesized by using an overall performance measure, SI:

$$SI = \frac{1}{m} \sum_{i=1}^{m} \left( \frac{P_i - P_{\min,i}}{P_{\max,i} - P_{\min,i}} \right)$$

where $P_i$ is the value of the $i$-th performance measure, $P_{\min,i}$ is the minimum value of
the $i$-th performance measure, $P_{\max,i}$ is the maximum value of
the $i$-th performance measure, and $m$ is the number of performance measures. The SI value ranges
between 0 and 1. A value of SI close to zero indicates that the model is more accurate. The
GAFS-RF model was observed to achieve high accuracy with optimal combinations of
independent variables for the pcHCE estimation. This is because RF is very
robust and can handle complex nonlinear data. Another reason is that the RF
model is an ensemble AI algorithm. RF combines bagging (bootstrap
aggregating) with random sampling of the feature set at each node in a decision tree. This
allows the RF method to work well on a relatively small dataset and also helps prevent
overfitting.
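
The overall performance measure SI used in the discussion can be sketched as below for error measures where a smaller value is better (MAD, RMSE, MAPE). The model names and numbers are hypothetical, and R2 is left out because the text does not specify how a higher-is-better measure enters the formula.

```python
def overall_scores(table):
    """Compute SI per model from {model: [measure values]}, lower is better."""
    models = list(table)
    m = len(next(iter(table.values())))  # number of performance measures
    lows = [min(table[k][i] for k in models) for i in range(m)]
    highs = [max(table[k][i] for k in models) for i in range(m)]
    scores = {}
    for k in models:
        terms = []
        for i in range(m):
            spread = highs[i] - lows[i]
            # If every model ties on a measure, it contributes 0 for all of them.
            terms.append((table[k][i] - lows[i]) / spread if spread else 0.0)
        scores[k] = sum(terms) / m
    return scores

# Hypothetical [MAD, RMSE, MAPE] values for three unnamed models.
table = {
    "model_A": [10.0, 12.0, 2.0],
    "model_B": [20.0, 25.0, 4.0],
    "model_C": [15.0, 30.0, 3.0],
}
scores = overall_scores(table)
ranking = sorted(scores, key=scores.get)
print(scores)
print(ranking)
```

A model that is best on every measure receives SI = 0 and one that is worst on every measure receives SI = 1, which matches the stated range and the rule that values closer to zero indicate a more accurate model.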

5. Conclusion

There are many parameters used in evaluating the healthcare systems of countries. Just as the
economic developments of countries can be expressed with pcGDP, the development of health
systems can also be measured by the amount of pcHCE. In this study, six important factors
affecting pcHCE, namely pcGDP, life expectancy at birth, the unemployment rate, crude birth
rate, and the numbers of physicians and hospitals, were used as inputs. Different AI-based
forecasting models, namely RF, ANN, MLR, SVR, and RVM, were used for the prediction of
the pcHCE of Turkey. An evolutionary genetic algorithm was used as the feature selection method
to improve the prediction performance of the AI models. A comprehensive comparison
demonstrated that the combination of the GAFS method and AI-based prediction models has
stronger predictive power than the base models that do not use feature selection. Among
all approaches, the GAFS-RF algorithm performed best in predicting the pcHCE of Turkey. The
results of the study can be an important resource for the future planning of decision-makers in
health to take the necessary measures and determine sustainable policies. In addition, the
authors believe that comparing other evolutionary approaches such as particle swarm
optimization (PSO) and ant colony optimization (ACO) to select appropriate features would be
a good subject for future studies.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

ORCID
Zeynep Ceylan, https://orcid.org/0000-0002-3006-9768
Abdulkadir Atalan, https://orcid.org/0000-0003-0924-3685

Data Availability Statement

The data that support the findings of this study are available from the corresponding author
upon reasonable request.

References
Akca, N., Sönmez, S., & Yılmaz, A. (2017). Determinants of health expenditure in OECD countries:
A decision tree model. Pakistan Journal of Medical Sciences, 33(6), 2–6.
https://doi.org/10.12669/pjms.336.13300

Archer, K. J., & Kimes, R. V. (2008). Empirical characterization of random forest variable
importance measures. Computational Statistics & Data Analysis, 52(4), 2249–2260.
https://doi.org/10.1016/J.CSDA.2007.08.015

Astolfi, R., Lorenzoni, L., & Oderkirk, J. (2012). Informing policymakers about future health
spending: A comparative analysis of forecasting methods in OECD countries. Health Policy,
107(1), 1–10. https://doi.org/10.1016/j.healthpol.2012.05.001

Atalan, A. (2019). The Impacts of Healthcare Resources on Services of Emergency Department:
Discrete Event Simulation with Box-Behnken Design. PONTE International Scientific
Researchs Journal, 75(6), 12–23. https://doi.org/10.21506/j.ponte.2019.6.10

Bernal-Delgado, E., Comendeiro-Maaløe, M., Ridao-López, M., & Sansó Rosselló, A. (2020). Factors
underlying the growth of hospital expenditure in Spain in a period of unexpected economic
shocks: A dynamic analysis on administrative data. Health Policy.
https://doi.org/10.1016/J.HEALTHPOL.2020.02.001

Boachie, M. K., Mensah, I. O., Sobiesuo, P., Immurana, M., Iddrisu, A.-A., & Kyei-Brobbey, I.
(2014). Determinants of public health expenditure in Ghana: a cointegration analysis. Journal of
Behavioural Economics, Finance, Entrepreneurship, Accounting and Transport, 2(2), 35–40.
https://doi.org/10.12691/jbe-2-2-1

Braendle, T., & Colombier, C. (2016). What drives public health care expenditure growth? Evidence
from Swiss cantons, 1970–2012. Health Policy, 120(9), 1051–1060.
https://doi.org/10.1016/J.HEALTHPOL.2016.07.009

Ceylan, Z., & Bulkan, S. (2018). Forecasting PM10 levels using ANN and MLR: A case study for
Sakarya City. Global Nest Journal, 20(2), 281–290.

Ceylan, Z. (2020a). Assessment of agricultural energy consumption of Turkey by MLR and Bayesian
optimized SVR and GPR models. Journal of Forecasting.

Ceylan, Z. (2020b). Estimation of municipal waste generation of Turkey using socio-economic
indicators by Bayesian optimization tuned Gaussian process regression. Waste Management &
Research, 0734242X20906877.

Ceylan, Z., Bulkan, S., & Elevli, S. (2020). Prediction of medical waste generation using SVR,
GM(1, 1) and ARIMA models: a case study for megacity Istanbul.

Chaabouni, S., & Abednnadher, C. (2013). Modelling and forecasting of Tunisia's health expenditures
using artificial neural network and ARDL models.

Clemente, J., Lázaro-Alquézar, A., & Montañés, A. (2019). US state health expenditure convergence:
A revisited analysis. Economic Modelling, 83, 210–220.
https://doi.org/10.1016/j.econmod.2019.02.011

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

Di Matteo, L., & Cantarero-Prieto, D. (2018). The Determinants of Public Health Expenditures:
Comparing Canada and Spain, (87800). Retrieved from
https://mpra.ub.uni-muenchen.de/87800/1/MPRA_paper_87800.pdf

Getzen, T. E., & Poullier, J.-P. (1992). International health spending forecasts: Concepts and
evaluation. Social Science & Medicine, 34(9), 1057–1068.
https://doi.org/10.1016/0277-9536(92)90136-E

Graupe, D. (2013). Principles of Artificial Neural Networks (Vol. 7). World Scientific.
https://doi.org/10.1142/8868

Haykin, S. (1994). Neural networks: A Comprehensive Foundation. Neural Networks. Prentice Hall
PTR.

Jaba, E., Balan, C. B., & Robu, I.-B. (2014). The Relationship between Life Expectancy at Birth and
Health Expenditures Estimated by a Cross-country and Time-series Analysis. Procedia
Economics and Finance, 15(14), 108–114. https://doi.org/10.1016/s2212-5671(14)00454-7

Jiang, H., Rusuli, Y., Amuti, T., & He, Q. (2019). Quantitative assessment of soil salinity using multi-
source remote sensing data based on the support vector machine and artificial neural network.
International Journal of Remote Sensing. https://doi.org/10.1080/01431161.2018.1513180

Khan, H. N., Razali, R. B., & Shafie, A. B. (2016). Modeling determinants of health expenditures in
Malaysia: Evidence from time series analysis. Frontiers in Pharmacology, 7(MAR), 1–7.
https://doi.org/10.3389/fphar.2016.00069

Klazoglou, P., & Dritsakis, N. (2018). Modeling and Forecasting of US Health Expenditures Using
ARIMA Models. In N. Tsounis & A. Vlachvei (Eds.), Advances in Panel Data Analysis in
Applied Economic Research (pp. 457–472). Cham: Springer International Publishing.

Kong, D., Chen, Y., Li, N., Duan, C., Lu, L., & Chen, D. (2019). Relevance vector machine for tool
wear prediction. Mechanical Systems and Signal Processing, 127, 573–594.
https://doi.org/10.1016/j.ymssp.2019.03.023

Li, F., Wang, Z., & Qiu, J. (2019). Long‐term streamflow forecasting using artificial neural network
based on preprocessing technique. Journal of Forecasting, 38(3), 192–206.
https://doi.org/10.1002/for.2564

Lippmann, R. P. (1987). An Introduction to Computing with Neural Nets. IEEE ASSP Magazine.
https://doi.org/10.1109/MASSP.1987.1165576

Marinakis, Y., Dounias, G., & Jantzen, J. (2009). Pap smear diagnosis using a hybrid intelligent
scheme focusing on genetic algorithm based feature selection and nearest neighbor
classification. Computers in Biology and Medicine.
https://doi.org/10.1016/j.compbiomed.2008.11.006

Matteo, L. Di. (2010). The sustainability of public health expenditures: evidence from the Canadian
federation, 569–584. https://doi.org/10.1007/s10198-009-0214-x

OECD, Retrieved June 27, 2020, from https://www.oecd.org/

Özcan, T., & Tüysüz, F. (2018). Healthcare Expenditure Prediction in Turkey by Using Genetic
Algorithm Based Grey Forecasting Models. In C. Kahraman & Y. I. Topcu (Eds.), Operations
Research Applications in Health Care Management (pp. 159–190). Cham: Springer
International Publishing. https://doi.org/10.1007/978-3-319-65455-3_7

Pichon-Riviere, A., Augustovski, F., Garcia Marti, S., & Caporale, J. (2015). The Efficiency Path: An
Estimation of Cost-Effectiveness Thresholds for 185 Countries Based on Per Capita Health
Expenditures and Life Expectancy. Value in Health, 18(7), A695–A696.
https://doi.org/10.1016/J.JVAL.2015.09.2592

Rachmani, E., Hsu, C. Y., Nurjanah, N., Chang, P. W., Shidik, G. F., Noersasongko, E., … Lin, M. C.
(2019). Developing an Indonesia’s health literacy short-form survey questionnaire (HLS-EU-
SQ10-IDN) using the feature selection and genetic algorithm. Computer Methods and Programs
in Biomedicine. https://doi.org/10.1016/j.cmpb.2019.105047

Ray, D., & Linden, M. (2020). Health expenditure, longevity, and child mortality: dynamic panel data
approach with global data. International Journal of Health Economics and Management, 20(1),
99–119. https://doi.org/10.1007/s10754-019-09272-z

Rodríguez, A. F., & Nieves Valdés, M. (2019). Health care expenditures and GDP in Latin American
and OECD countries: a comparison using a panel cointegration approach. International Journal
of Health Economics and Management, 19(2), 115–153.
https://doi.org/10.1007/s10754-018-9250-3

Srinivasa Murthy, Y. V., & Koolagudi, S. G. (2018). Classification of vocal and non-vocal segments
in audio clips using genetic algorithm-based feature selection (GAFS). Expert Systems with
Applications. https://doi.org/10.1016/j.eswa.2018.04.005

Su, Y., Wang, Z., Jin, S., Shen, W., Ren, J., & Eden, M. R. (2019). An architecture of deep learning in
QSPR modeling for the prediction of critical properties using molecular signatures. AIChE
Journal, 65(9), 1–11. https://doi.org/10.1002/aic.16678

Toor, I. A., & Butt, M. S. (2005). Determinants of health care expenditure in Pakistan. Pakistan
Economic and Social Review, (1), 133–150.

Wang, L., Zhou, X., Zhu, X., Dong, Z., & Guo, W. (2016). Estimation of biomass in wheat using
random forest regression algorithm and remote sensing data. The Crop Journal, 4(3), 212–219.
https://doi.org/10.1016/j.cj.2016.01.008

Wang, Z., Su, Y., Shen, W., Jin, S., Clark, J. H., Ren, J., & Zhang, X. (2019). Predictive deep learning
models for environmental properties: the direct calculation of octanol–water partition
coefficients from molecular graphs. Green Chemistry, 21(16), 4555–4565.
https://doi.org/10.1039/C9GC01968E

Zhao, J. (2015). Forecasting Health Expenditure: Methods and Applications to International
Databases. CHEPA Working Papers Series. Retrieved from
http://www.chepa.org/docs/default-source/default-document-library/zhao-2015-forecasting-health-expenditure_chepa-working-paper.pdf?sfvrsn=0

Zheng, A., Fang, Q., Zhu, Y., Jiang, C., Jin, F., & Wang, X. (2020). An application of ARIMA model
for predicting total health expenditure in China from 1978-2022. Journal of Global Health,
10(1), 1–8. https://doi.org/10.7189/jogh.10.010803

Zhou, S., & Dhupia, J. S. (2020). Online adaptive water management fault diagnosis of PEMFC based
on orthogonal linear discriminant analysis and relevance vector machine. International Journal
of Hydrogen Energy. https://doi.org/10.1016/j.ijhydene.2019.12.193

Table 1. Dataset description

| Notation | Description | Data source |
|---|---|---|
| pcHCE | Healthcare expenditure per capita (current US $) | OECD* |
| pcGDP | GDP per capita (based on current US $) | World Bank |
| LE | Life expectancy at birth (years) | World Bank |
| UR | Unemployment rate (for persons aged 15-64) | Turkstat** |
| CBR | Crude birth rate (per 1,000 people) | World Bank |
| PN | The number of physicians | OECD |
| HN | The number of hospitals | OECD |

* OECD: Organisation for Economic Co-operation and Development
** Turkstat: Turkish Statistical Institute

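Table 2. The matrix of correlation coefficients for the variables considered

| Variable | pcHCE | UR | pcGDP | LE | PN | HN | CBR |
|---|---|---|---|---|---|---|---|
| pcHCE | 1.0000 | | | | | | |
| UR | 0.7078 | 1.0000 | | | | | |
| pcGDP | 0.9352 | 0.6599 | 1.0000 | | | | |
| LE | 0.9656 | 0.6901 | 0.9136 | 1.0000 | | | |
| PN | 0.9875 | 0.6916 | 0.9182 | 0.9896 | 1.0000 | | |
| HN | 0.9623 | 0.6298 | 0.9288 | 0.9823 | 0.9825 | 1.0000 | |
| CBR | -0.9718 | -0.7102 | -0.9169 | -0.9980 | -0.9909 | -0.9787 | 1.0000 |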
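
As an illustration of how a matrix like Table 2 can be produced, the sketch below computes pairwise correlation coefficients in plain Python. It assumes the table reports Pearson coefficients, which the caption does not state explicitly. The helper names `pearson_r` and `correlation_matrix` and the toy series are hypothetical and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Toy series for illustration only (NOT the study's data).
toy = {
    "rising": [1.0, 2.0, 3.0, 4.0],
    "falling": [8.0, 6.0, 4.0, 2.0],
    "noisy": [1.0, 3.0, 2.0, 5.0],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(value, 4) for other, value in row.items()})
```

Coefficients close to 1 in absolute value between inputs, such as LE and CBR in Table 2, indicate that those inputs carry largely overlapping information, which is one motivation for selecting a feature subset.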

Table 3. The performance of AI models

| Dataset | Measure | RF | ANN | MLR | SVR | RVM |
|---|---|---|---|---|---|---|
| Training | MAD | 13.5405 | 28.0721 | 51.7192 | 32.4688 | 28.7808 |
| Training | RMSE | 18.0620 | 34.9361 | 66.4252 | 45.3277 | 36.6990 |
| Training | MAPE (%) | 2.58020 | 5.54380 | 11.4017 | 6.73610 | 5.82480 |
| Training | R2 | 0.99690 | 0.98810 | 0.95690 | 0.98070 | 0.98680 |
| Testing | MAD | 29.2356 | 39.6744 | 51.7120 | 40.4495 | 39.9719 |
| Testing | RMSE | 42.2965 | 51.2992 | 89.6839 | 68.2814 | 54.5336 |
| Testing | MAPE (%) | 4.60820 | 5.74930 | 9.11650 | 6.59810 | 6.48720 |
| Testing | R2 | 0.98740 | 0.98180 | 0.93850 | 0.97170 | 0.98110 |
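
For readers who want to reproduce the four measures reported in Table 3, the following minimal sketch uses their common textbook definitions. It assumes that R2 is computed as one minus the ratio of residual to total sum of squares; the study may use a different variant, such as the squared correlation. The function name `regression_measures` and the sample values are hypothetical.

```python
import math


def regression_measures(actual, predicted):
    """Return MAD, RMSE, MAPE (%) and R2 for paired actual/predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mad = sum(abs(e) for e in errors) / n                 # mean absolute deviation
    rmse = math.sqrt(sum(e * e for e in errors) / n)      # root mean squared error
    # MAPE assumes no actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot                            # assumed R2 definition
    return {"MAD": mad, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Made-up values for illustration only (NOT the study's data).
scores = regression_measures([100.0, 200.0, 300.0], [110.0, 190.0, 310.0])
print(scores)
```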

Table 4. The selected features using GAFS-AI models

| Model | Number of features | Selected features |
|---|---|---|
| RF | 4 | LE, PN, HN, CBR |
| ANN | 2 | UR, CBR |
| MLR | 3 | UR, pcGDP, PN |
| SVR | 3 | pcGDP, LE, PN |
| RVM | 3 | pcGDP, PN, CBR |
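
The subsets in Table 4 come from a wrapper-style search in which a genetic algorithm proposes feature subsets and a prediction model scores them. The sketch below shows the general idea only and is not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic data, the k-nearest-neighbour regressor standing in for the AI models, the leave-one-out fitness, and the GA settings (population size, tournament selection, one-point crossover, bit-flip mutation, elitism).

```python
import random


def make_toy_data(rng, n_rows=40, n_features=6):
    """Synthetic data (NOT the study's): the target depends only on features 0 and 2."""
    X = [[rng.uniform(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
    y = [3.0 * row[0] - 2.0 * row[2] + rng.gauss(0.0, 0.05) for row in X]
    return X, y


def loo_knn_mse(X, y, mask, k=3):
    """Leave-one-out MSE of a k-NN regressor restricted to the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("inf")  # an empty subset cannot predict anything
    total = 0.0
    for i in range(len(X)):
        dists = sorted(
            (sum((X[i][j] - X[m][j]) ** 2 for j in cols), m)
            for m in range(len(X)) if m != i
        )
        pred = sum(y[m] for _, m in dists[:k]) / k
        total += (y[i] - pred) ** 2
    return total / len(X)


def ga_feature_selection(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search binary feature masks with a small genetic algorithm; lower cost is better."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def cost(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = loo_knn_mse(X, y, mask)
        return cache[key]

    # Initial population: the full feature set plus random subsets.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=cost)
        next_gen = [ranked[0][:]]  # elitism: the best mask always survives
        while len(next_gen) < pop_size:
            parent1 = min(rng.sample(ranked, 3), key=cost)  # tournament selection
            parent2 = min(rng.sample(ranked, 3), key=cost)
            cut = rng.randint(1, n - 1)                      # one-point crossover
            child = parent1[:cut] + parent2[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=cost)
    return best, cost(best)


X, y = make_toy_data(random.Random(42))
best_mask, best_cost = ga_feature_selection(X, y)
selected = [f"x{j}" for j, bit in enumerate(best_mask) if bit]
print("selected features:", selected, "LOO MSE:", round(best_cost, 4))
```

Because the fitness depends on the wrapped model, running the same search with a different predictor can return a different subset, which is consistent with each model in Table 4 retaining its own set of features.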

Table 5. The performance of the AI models using GAFS method

| Dataset | Measure | GAFS-RF | GAFS-ANN | GAFS-MLR | GAFS-SVR | GAFS-RVM |
|---|---|---|---|---|---|---|
| Training | MAD | 8.7600 | 23.4809 | 38.5525 | 28.5129 | 28.3412 |
| Training | RMSE | 12.0925 | 30.1676 | 43.4794 | 37.1637 | 34.0721 |
| Training | MAPE (%) | 2.0794 | 5.3199 | 9.6971 | 6.1883 | 5.5759 |
| Training | R2 | 0.9987 | 0.9912 | 0.9818 | 0.9874 | 0.9888 |
| Testing | MAD | 11.2533 | 29.1447 | 41.1499 | 38.4293 | 33.5624 |
| Testing | RMSE | 12.7544 | 34.0450 | 50.1039 | 50.4419 | 38.3166 |
| Testing | MAPE (%) | 2.0384 | 5.7054 | 7.7596 | 6.5930 | 6.0412 |
| Testing | R2 | 0.9986 | 0.9900 | 0.9790 | 0.9844 | 0.9878 |

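Table 6. Performance comparison of the single and optimized models

| Dataset | Model | MAD | RMSE | MAPE (%) | 1-R2 | SI | Rank |
|---|---|---|---|---|---|---|---|
| Training | RF | 0.111 | 0.110 | 0.054 | 0.043 | 0.079 | 2 |
| Training | ANN | 0.450 | 0.420 | 0.372 | 0.254 | 0.374 | 4 |
| Training | MLR | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 10 |
| Training | SVR | 0.552 | 0.612 | 0.500 | 0.431 | 0.523 | 8 |
| Training | RVM | 0.466 | 0.453 | 0.402 | 0.285 | 0.401 | 6 |
| Training | GAFS-RF | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1 |
| Training | GAFS-ANN | 0.343 | 0.333 | 0.348 | 0.179 | 0.301 | 3 |
| Training | GAFS-MLR | 0.694 | 0.578 | 0.817 | 0.404 | 0.623 | 9 |
| Training | GAFS-SVR | 0.460 | 0.461 | 0.441 | 0.270 | 0.408 | 7 |
| Training | GAFS-RVM | 0.456 | 0.405 | 0.375 | 0.237 | 0.368 | 5 |
| Testing | RF | 0.444 | 0.384 | 0.363 | 0.186 | 0.344 | 2 |
| Testing | ANN | 0.702 | 0.501 | 0.524 | 0.280 | 0.502 | 5 |
| Testing | MLR | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 10 |
| Testing | SVR | 0.722 | 0.722 | 0.644 | 0.448 | 0.634 | 9 |
| Testing | RVM | 0.710 | 0.543 | 0.629 | 0.291 | 0.543 | 7 |
| Testing | GAFS-RF | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1 |
| Testing | GAFS-ANN | 0.442 | 0.277 | 0.518 | 0.143 | 0.345 | 3 |
| Testing | GAFS-MLR | 0.739 | 0.486 | 0.808 | 0.326 | 0.590 | 8 |
| Testing | GAFS-SVR | 0.672 | 0.490 | 0.643 | 0.236 | 0.510 | 6 |
| Testing | GAFS-RVM | 0.551 | 0.332 | 0.566 | 0.180 | 0.407 | 4 |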
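
The entries of Table 6 appear to follow from min-max scaling of each measure across the ten models, with SI taken as the mean of the four scaled measures. This reading is inferred from the tabulated numbers and is an assumption about the procedure, not a quotation of it. The sketch below applies it to the training-set values copied from Tables 3 and 5; the function name `si_scores` is hypothetical.

```python
# Training-set measures copied from Tables 3 and 5: (MAD, RMSE, MAPE %, R2).
training = {
    "RF": (13.5405, 18.0620, 2.5802, 0.9969),
    "ANN": (28.0721, 34.9361, 5.5438, 0.9881),
    "MLR": (51.7192, 66.4252, 11.4017, 0.9569),
    "SVR": (32.4688, 45.3277, 6.7361, 0.9807),
    "RVM": (28.7808, 36.6990, 5.8248, 0.9868),
    "GAFS-RF": (8.7600, 12.0925, 2.0794, 0.9987),
    "GAFS-ANN": (23.4809, 30.1676, 5.3199, 0.9912),
    "GAFS-MLR": (38.5525, 43.4794, 9.6971, 0.9818),
    "GAFS-SVR": (28.5129, 37.1637, 6.1883, 0.9874),
    "GAFS-RVM": (28.3412, 34.0721, 5.5759, 0.9888),
}


def si_scores(results):
    """Min-max scale MAD, RMSE, MAPE and 1-R2 across models, then average them."""
    # Use 1 - R2 so that lower is better for every measure.
    rows = {
        model: (mad, rmse, mape, 1.0 - r2)
        for model, (mad, rmse, mape, r2) in results.items()
    }
    n_measures = 4
    lows = [min(row[i] for row in rows.values()) for i in range(n_measures)]
    highs = [max(row[i] for row in rows.values()) for i in range(n_measures)]
    scores = {}
    for model, row in rows.items():
        scaled = [(row[i] - lows[i]) / (highs[i] - lows[i]) for i in range(n_measures)]
        scores[model] = sum(scaled) / n_measures
    return scores


si = si_scores(training)
for model, value in sorted(si.items(), key=lambda item: item[1]):
    print(f"{model:9s} SI = {value:.3f}")
```

Under this scaling, the best model on every measure scores 0 and the worst scores 1, so SI is a relative comparison within the set of models considered rather than an absolute measure of accuracy.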

Figure 1. The trends of HCE (% GDP) and pcHCE in Turkey

Figure 2. Change in the input variables over the years: (a) the number of physicians, (b)
GDP per capita, (c) unemployment rate, (d) life expectancy at birth, (e) the number of hospitals,
and (f) crude birth rate.

Figure 3. Flowchart of GAFS-AI forecasting models

Figure 4. The performance of single and optimized models (a) in the training set and (b) in the
testing set. [Bar charts; y-axis: Performance Measure (R2), from 0.93 to 1; x-axis: RF, ANN,
MLR, SVR, RVM; series: Single Model, Optimized Model.]
