Article history: Received 25 March 2015; received in revised form 13 June 2016; accepted 14 June 2016.

Keywords: Classification; Data mining; Ensemble; Prepaid health plans; Machine learning.

Abstract

Background: Preauthorisation is a control mechanism used by Health Insurance Providers (HIPs) to minimise wastage of resources through the denial of unduly requested procedures. However, an efficient preauthorisation process requires the round-the-clock presence of a preauthorisation reviewer, which increases the operating expenses of the HIP. In this context, the aim of this study was to learn the preauthorisation process using the dental set from an existing database of a non-profit HIP.

Methods: Data pre-processing techniques such as filtering algorithms, random under-sampling and imputation were used to mitigate problems that arise in the selection of relevant attributes, class balancing and the filling of unknown data. The performance of the Random Tree, Naive Bayes, Support Vector Machine and Nearest Neighbour classifiers was evaluated according to the Kappa index, and the best classifiers were combined using ensembles.

Results: The number of attributes was reduced from 164 to 15, and 12 new attributes were created from existing discrete data associated with the beneficiary's history. The final result was the development of a decision support mechanism that yielded hit rates above 96%.

Conclusions: It is possible to create a tool based on computational intelligence techniques to evaluate test/procedure requests with high accuracy. Such a tool can support the activities of the professionals and automatically evaluate less complex cases, such as requests not involving risk to the life of patients.

© 2016 Elsevier Ireland Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.ijmedinf.2016.06.007
2 F.H.D. Araújo et al. / International Journal of Medical Informatics 94 (2016) 1–7
clock preauthorisation reviewer to analyse requests. However, the absence of a preauthorisation reviewer can result in unnecessary procedures being requested, which places strong financial pressure on the HIP. An alternative that many companies often use is to forego active control while maintaining a strong audit process to block the payment of procedures/tests/treatments that have been performed, on the basis of unsuitability or so-called disallowance. This procedure, however, results in significant burnout of the involved parties, creates disharmonious relationships and makes the provision of health services a highly antagonistic process. That is, this process is not conducive to a strong and lasting relationship among the parties.

The preauthorisation process critically affects patients' health because the denial of a procedure could potentially result in a patient's death. Thus, the preauthorisation mechanism must be carefully designed to prevent problems.

The preauthorisation process in most HIPs is facilitated by computer technology, i.e., there is a data record of orders and the results of requests (authorised/unauthorised). Therefore, supervised learning techniques can be used to learn the preauthorisation process by analysing the data stored in HIP databases (DBs). However, most HIP DBs have problems such as redundant and missing values that reduce the quality of the available information. Thus, pre-processing techniques are required to improve data quality before a learning process can be performed. In this study, we used as references earlier studies in which pre-processing techniques were used to improve data, followed by an automated learning process. These studies are compared in Table 1.

A literature review showed that numerous studies in the medical field have used pre-processing techniques combined with machine learning algorithms to facilitate the decision-making process. However, data from HIPs were used to facilitate decision-making in only three studies. The objectives of the studies of Barros et al. [4] and Martins et al. [15] were different from those of the present study: the aforementioned authors used information about the HIP to find production rules to associate patients with certain groups of diseases, whereas our objective was to automate the learning of the preauthorisation process. Araújo et al. [3] had a similar objective to that of the present study, i.e., the automated learning of the preauthorisation process. However, Araújo et al. [3] used medical data from a HIP to learn medical preauthorisation, whereas we used dental data to learn dental preauthorisation.

In the present study, a mechanism for supporting healthcare preauthorisation was developed to assist healthcare professionals in decision-making, whereby a potential decision was generated by learning the HIP database. Thus, this process is an important tool for assisting healthcare professionals.

This paper is structured as follows: all of the steps for learning the preauthorisation process are described in Section 2; the results obtained using the classifiers are presented in Section 3; a discussion about how to use the proposed methodology and the limitations of this work are presented in Section 4; conclusions and suggestions for future studies are described in Section 5.

2. Methods

Pre-processing techniques were used in this study to improve the quality of the data in a HIP database. These data were then combined with machine learning algorithms to learn the healthcare preauthorisation process. In this study, we used dental data, but the original version of this technique was applied to medical data [3]. Fig. 2 shows all of the steps that were performed in this study. Details on the execution of each step are given below.

First, the HIP's analysts generated a DB (DB-Original) that contained all of the data related to the dental procedures. This DB had 164 attributes; however, for ethical and legal reasons, all of the data that identified individuals, such as IDs, SSNs, dates of birth, addresses and phone numbers, were removed from the DB. In addition, an artificial key was generated for the DB to identify all of the anonymous individuals. The resulting database, from which the attributes that identified individuals were eliminated, was called DB-EL.

The DB-EL comprised 133 attributes, many of which were irrelevant or in a format that was incompatible with the algorithms used during the mining phase. Therefore, the DB-EL was subjected to pre-processing to improve the data quality and decrease the amount of irrelevant information.

An attribute selection step was performed in the first step of pre-processing. This selection was performed by manual and auto-
Table 1
Related works that used pre-processing techniques to improve data, followed by an automated learning process.

Araújo et al. [3]
Approach: The authors used data pre-processing and supervised machine learning techniques to automate the learning of the preauthorisation process. The data used were collected from 2007 to 2012 and contained information on requests for medical procedures/tests for the beneficiaries of a public and non-profit HIP.
Results: Pre-processing was used to resolve the primary problems associated with the data, that is, to eliminate irrelevant attributes and treat unbalanced classes. Following the data improvement, two methods were used to learn the preauthorisation process: rule induction and C4.5 [18]. The holdout [9] method yielded similar results with both classifiers, with hit rates of approximately 90%.

Barros et al. [4]
Approach: The authors applied pre-processing techniques to a data set for the beneficiaries of a supplementary health plan to prepare the data for use in data mining algorithms to identify disease patterns. The data used were collected during 2010 from a HIP DB containing information on procedures in hospitals, laboratories and medical offices for the beneficiaries of a HIP in the state of Santa Catarina (Brazil).
Results: These data were subjected to a data cleaning process to eliminate ineffectual records and inconsistent data. Finally, an attribute pre-selection process was performed to eliminate attributes whose format was not compatible with the data mining techniques. The authors reduced the number of attributes from 120 to 55 and the number of records used by 14%.

Martins et al. [15]
Approach: The authors used the same DB and the same data and attribute selection methodology as [4] to refine this database for use with data mining algorithms. The goal was to extract production rules to determine whether selected patients had certain groups of diseases.
Results: The selected data were used with the C5.0 [18] algorithm and fuzzy genetic programming (FGP) [21], and the extracted rules were validated by the holdout and cross-validation methods. The rules that classified a set of attributes for patients as not belonging to groups with certain diseases had a hit rate above 97%. However, the rules that classified a set of attributes for patients as belonging to groups with certain diseases obtained a low hit rate of slightly above 40%.

Chimieski and Fagundes [7]
Approach: The authors used three public medical DBs (for dermatological, breast cancer and spinal problems) that are widely used in the literature to evaluate and compare different machine learning algorithms and to investigate the feasibility of using data classification algorithms to facilitate the disease diagnosis process.
Results: The BayesNet, C4.5, Logistic Model Tree (LMT), NBTree, RandomForest, RandomTree, RepTree and Simple Cart [19] algorithms were compared using the following metrics: Precision, F-Measure, ROC Curve and Kappa [20]. The holdout method was used to validate the results. BayesNet was the best algorithm for predicting breast cancer and dermatology diagnoses, with over 90% accuracy. LMT was the best algorithm for the diagnosis of spinal problems, with approximately 85% accuracy.
Table 2
List of attributes in DB-EXP.
Created by the expert:
16) Number of tests performed by the beneficiary in the respective month
17) Number of tests performed by the beneficiary in the respective semester
18) Number of tests performed by the beneficiary in the respective year
19) Number of identical tests performed by the beneficiary in the respective month
20) Number of identical tests performed by the beneficiary in the respective semester
21) Number of identical tests performed by the beneficiary in the respective year
22) Number of tests of identical complexity level performed in the respective month
23) Number of tests of identical complexity level performed in the respective semester
24) Number of tests of identical complexity level performed in the respective year
25) Number of requests made by the beneficiary in the respective month
26) Number of requests made by the beneficiary in the respective semester
27) Number of requests made by the beneficiary in the respective year
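The expert-created history attributes listed in Table 2 are simple counts over a beneficiary's prior records. A minimal sketch of how such counts could be derived follows; the record layout (beneficiary, test code, complexity level, date) is an assumed, illustrative schema, not the paper's actual one:

```python
# Sketch of computing Table 2-style history attributes for one request.
# The record format (beneficiary, test, complexity level, date) is an
# assumption for illustration; the paper does not publish its schema.
from collections import Counter
from datetime import date

def history_attributes(records, beneficiary, test, level, when):
    """Count the beneficiary's tests in the request's month/semester/year."""
    month_key = (when.year, when.month)
    semester_key = (when.year, 1 if when.month <= 6 else 2)
    counts = Counter()
    for b, t, lv, d in records:
        if b != beneficiary:
            continue
        same_month = (d.year, d.month) == month_key
        same_sem = (d.year, 1 if d.month <= 6 else 2) == semester_key
        same_year = d.year == when.year
        for period, hit in (("month", same_month), ("semester", same_sem),
                            ("year", same_year)):
            if hit:
                counts["tests_" + period] += 1        # attributes 16-18
                if t == test:
                    counts["identical_" + period] += 1  # attributes 19-21
                if lv == level:
                    counts["same_level_" + period] += 1  # attributes 22-24
    return dict(counts)

records = [
    ("B1", "radiograph", 1, date(2011, 3, 2)),
    ("B1", "radiograph", 1, date(2011, 3, 20)),
    ("B1", "extraction", 2, date(2011, 9, 5)),
]
attrs = history_attributes(records, "B1", "radiograph", 1, date(2011, 3, 25))
print(attrs["tests_month"], attrs["tests_year"])  # 2 3
```

Attributes 25-27 (requests per period) would follow the same counting pattern over the request table.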
mated selection. In the manual selection process, the domain expert and the data analyst eliminated attributes of the following kinds: a) attributes that were unknown and not used by professionals at the time of preauthorisation, b) attributes that were unfilled in every instance of the database and c) attributes that exhibited equal values (default filling) in all of the instances of the database. A total of 89 attributes were eliminated in the manual selection step, such that only 44 attributes remained in the resulting DB-MS database. The automated attribute selection was performed on the DB-MS by calculating the gain ratio [18] for each of the 44 attributes, of which 29 had zero gain and were therefore eliminated from the database. The resulting DB-AS database contained only 15 attributes. After the manual and automated selection steps, the HIP domain expert created new attributes based on his knowledge of the preauthorisation process. Thus, 12 attributes were created from existing discrete data associated with the beneficiary's history. All of the new attributes had gain ratios above zero and were thus
added to the DB-AS to generate the DB-EXP. These attributes are shown in Table 2.

Some of the selected attributes had unfilled values in the DB-EXP; thus, the problem of missing data had to be treated. In this case, imputation using the mean was chosen to fill the empty values [5]. Numerous data were in a format that was incompatible with the classification algorithms used; thus, these data were transformed [24]. Standardisation, conversion of qualitative into quantitative attributes and information extraction from complex data-type attributes (e.g., the extraction of the month from a date-type attribute) were used to generate the DB-Final.

One feature of the data in the DB-Final was that the number of authorised procedures (7688) was much higher than the number of unauthorised procedures (3597). However, most machine learning algorithms have difficulty creating a model that accurately classifies examples in a minority class [6]. Therefore, class balancing was performed using random over-sampling [12], and the minority class data were replicated for later use with the learning algorithms. Thus, the balanced set comprised the same number of authorised and unauthorised procedures (7688). After all of the previous phases were completed, the classification algorithms were used to learn the behaviour of the preauthorisation reviewers based on the DB-Final.

The following supervised learning algorithms were used in this study: RandomTree (RT) [1], the Naive Bayes (NB) algorithm [13], the support vector machine (SVM) [11] and the nearest neighbour (NN) algorithm [10]. These algorithms were selected because they belong to different paradigms of supervised learning: a symbolic paradigm, a statistical paradigm, a connectionist paradigm and an example-based paradigm [17], respectively. The performance of each of the classifiers was evaluated to achieve the best performance in learning the preauthorisation process.

Finally, the classifiers with the best performance were determined and combined using ensembles. An ensemble is a set of classifiers whose individual decisions are combined to classify a new case. Ensembles can be more precise than their constituent individual classifiers and can thus improve the predictive power of the learning algorithms. There are several ways to combine classifiers, of which the primary methods are weighted voting, unweighted voting and using a mean [11]. All of the obtained results are described in the following section.

3. Results

The algorithms were tested using the WEKA tool [23] with 10-fold cross-validation as the evaluation method. In this evaluation method, the original dataset is randomly partitioned into 10 data subsets of the same size. One subset is chosen from the 10 subsets to validate the model, and the nine remaining subsets are used for training. This process is repeated 10 times, and each of the 10 subsets is used only once for validation. At the end of this process, the mean of the 10 generated results is calculated to produce a single estimate. The advantage of this method is that all of the data subsets are used both for training and for evaluation. The statistical measures used for the evaluation of the algorithms were Precision (P), Recall (R), Accuracy (A), F-Measure (FM), Area Under the ROC curve (AUC) and the Kappa index (K) [22]. According to [14], the accuracy level of the Kappa index can be classified into five levels of performance: "Poor" (K ≤ 0.2), "Reasonable" (0.2 < K ≤ 0.4), "Good" (0.4 < K ≤ 0.6), "Very Good" (0.6 < K ≤ 0.8) and "Excellent" (K > 0.8).

… and answered by a specialist who is responsible for authorising HIP procedures.

Three different evaluations were performed for each of the aforementioned four classifiers to demonstrate the importance of pre-processing. The two datasets (replicated and non-replicated) were used in each of the evaluations. All of the attributes resulting from the manual selection (DB-MS) were used in the first evaluation; only the attributes resulting from the automated selection (DB-AS) were used in the second evaluation; and the attributes that were automatically selected, together with the attributes created by the expert, after treatment of the unknown values and transformations (DB-Final), were used in the third evaluation. Fig. 3 is a schematic showing how the evaluations were performed for each of the classifiers.

Table 3 compares the results of the Precision, Recall, Accuracy, F-Measure, AUC and Kappa index for the algorithms that were tested using the attributes obtained after manual selection (DB-MS), automated selection (DB-AS) and the final attributes (DB-Final).

Table 3 shows that very similar results were obtained using the attributes resulting from manual and automated selection. However, with the removal of irrelevant attributes in the automated selection process, this second classification resulted in a lower training time for the classifiers. Using the attributes of the DB-Final produced the best results for all of the classifiers.

Using the replicated set considerably improved the classification performance. The Kappa index values in Table 3 show that the NB failed to learn the preauthorisation process, whereas the other three tested classifiers exhibited "Very Good" performance for the non-replicated set using the attributes of the DB-Final. Moreover, these three classifiers exhibited "Excellent" performance for the replicated set using these same attributes.

The Z-test [8] was performed to statistically compare the results of the classifiers at a significance level of 5% and to assess whether the tested classifiers were significantly different from each other. The test results showed that the RT, the SVM and the NN exhibited statistically equivalent performances that were higher than that obtained using the NB.

After obtaining the best classifiers for learning the preauthorisation process, the classifier ensemble (consisting of the RT, the SVM and the NN) was evaluated. Table 4 shows the results of this ensemble (for the replicated and non-replicated sets) using three combination criteria: the mean of the individual outputs of the classifiers, unweighted voting and weighted voting. Note that the attributes of the DB-Final were used for this test.

Table 4 shows that the best results of this ensemble were obtained using the weighted voting criterion. The Kappa values in Table 4 show that the weighted voting criterion resulted in "Excellent" performance for both the replicated and non-replicated data sets.

The use of the ensemble (weighted voting criterion) improved almost all of the evaluated metrics, with the exception of the AUC metric, over using the best individual classifier (NN) with the replicated data set and the attributes of the DB-Final. Thus, using this classifier ensemble in a real HIP scenario would correctly classify approximately 96% of the requests.
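The pre-processing pipeline described in Section 2 — gain-ratio attribute filtering [18], mean imputation of unfilled values [5] and random over-sampling of the minority class [12] — can be sketched as follows. The toy data are illustrative, not taken from the paper's databases:

```python
# Sketch of the pre-processing steps: attributes with zero gain ratio are
# dropped, missing numeric values are filled with the mean, and the
# minority class is replicated until the classes are balanced.
import random
from collections import Counter
from math import log2

def entropy(xs):
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def gain_ratio(values, labels):
    """Information gain of a discrete attribute divided by its split info."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    gain = entropy(labels) - sum(len(g) / n * entropy(g)
                                 for g in groups.values())
    split_info = entropy(values)
    return 0.0 if split_info == 0 else gain / split_info

def impute_mean(column):
    """Replace None entries with the mean of the known values."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]

def oversample(rows, label_of, rng=random.Random(0)):
    """Replicate random minority-class rows until both classes are equal."""
    by_class = {}
    for r in rows:
        by_class.setdefault(label_of(r), []).append(r)
    minority, majority = sorted(by_class.values(), key=len)
    return rows + [rng.choice(minority)
                   for _ in range(len(majority) - len(minority))]

labels = ["auth", "auth", "auth", "deny"]
print(gain_ratio(["x", "x", "x", "x"], labels))  # 0.0 -> attribute dropped
print(impute_mean([20, None, 40]))               # [20, 30.0, 40]
rows = list(zip([1, 2, 3, 4], labels))
print(len(oversample(rows, label_of=lambda r: r[1])))  # 6: classes now 3/3
```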
4. Discussion
Table 3
Comparison between the results for the classifiers (replicated set).

                P      R      A      FM     AUC    Kappa
RT   DB-MS     0.82   0.93   0.87   0.87   0.91   0.73
     DB-AS     0.81   0.92   0.86   0.87   0.91   0.72
     DB-Final  0.89   0.98b  0.93b  0.93   0.94   0.86b
NB   DB-MS     0.53   0.52   0.52   0.51   0.53   0.02
     DB-AS     0.52   0.51   0.51   0.40   0.53   0.02
     DB-Final  0.65   0.56   0.58   0.47   0.65   0.18
SVM  DB-MS     0.60   0.64   0.62   0.65   0.62   0.25
     DB-AS     0.63   0.66   0.64   0.66   0.64   0.29
     DB-Final  0.90b  0.91   0.92   0.92   0.91   0.82
NN   DB-MS     0.81   0.93   0.86   0.87   0.92   0.74
     DB-AS     0.81   0.92   0.86   0.87   0.92   0.72
     DB-Final  0.90b  0.98b  0.93b  0.94b  0.96b  0.86b

a Best result for each database.
b Best result for each measure.
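The Kappa values reported in Table 3 can be reproduced from a binary confusion matrix, together with the Landis and Koch [14] performance bands defined above. A minimal sketch with illustrative counts (not the paper's actual confusion matrices):

```python
# Cohen's Kappa for a binary confusion matrix, plus the five performance
# bands of Landis and Koch [14]. The counts below are illustrative only.
def kappa(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # chance agreement computed from the marginal totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)

def band(k):
    for limit, name in ((0.2, "Poor"), (0.4, "Reasonable"),
                        (0.6, "Good"), (0.8, "Very Good")):
        if k <= limit:
            return name
    return "Excellent"

k = kappa(tp=95, fp=5, fn=5, tn=95)
print(round(k, 2), band(k))  # 0.9 Excellent
```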
Table 4
Results for the ensemble of the RT, SVM and NN classifiers.
P R A FM AUC Kappa
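Weighted voting, the combination criterion that gave the best ensemble results in Table 4, can be sketched as below. The weights and votes are illustrative assumptions; the paper does not publish the weights it used:

```python
# Sketch of combining classifier outputs by weighted voting: each
# classifier's vote is scaled by a trust weight and the label with the
# highest total wins. Weights here are hypothetical.
def weighted_vote(votes, weights):
    """votes: one class label per classifier; weights: trust per classifier."""
    tally = {}
    for label, w in zip(votes, weights):
        tally[label] = tally.get(label, 0.0) + w
    return max(tally, key=tally.get)

# RT and NN agree; SVM disagrees but carries less total weight.
print(weighted_vote(["auth", "deny", "auth"], [0.86, 0.82, 0.86]))  # auth
print(weighted_vote(["deny", "deny", "auth"], [0.86, 0.82, 0.86]))  # deny
```

Unweighted voting is the special case where all weights are equal, and the mean criterion averages the classifiers' class-probability outputs instead of their hard votes.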
The accuracy value obtained by using the methodology was 96%, which means that, on average, only 4 procedures out of every 100 requested and due to be authorised would be analysed by the HIP, which is a problem, although a minor one. An incorrect negative response may generate considerable damage, especially when the case involves the risk of somebody's death.

Due to the characteristics of the problem, it is noteworthy that the introduction of an automatic tool for evaluating requests can greatly reduce the number of requests forwarded for evaluation by a human. Many of the positive authorisations could be automatically evaluated. In the case of evaluations that involve negative authorisations, an automatic tool could be used as a way of helping in the human review. This could be used to start a debate among evaluators, in order to generate more precise answers to distinct cases. The entire process of using an automated tool is summarised in Fig. 4.
Fig. 4. Flowchart of steps performed after the request of a test/procedure using the tool.
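The triage described in the Discussion — automatic authorisation of clear positive cases, human review for predicted denials and life-risk cases — can be sketched as below. The confidence threshold and the life-risk flag are hypothetical names introduced for illustration only:

```python
# Sketch of the request triage described in the Discussion: the tool
# authorises confident positive predictions automatically and routes
# predicted denials (and any life-risk case) to a human reviewer.
# The threshold and flag are hypothetical, not from the paper.
def triage(predicted_label, confidence, involves_life_risk,
           auto_threshold=0.9):
    if involves_life_risk:
        return "human review"       # never decided automatically
    if predicted_label == "auth" and confidence >= auto_threshold:
        return "auto-authorise"
    return "human review"           # denials always get a reviewer

print(triage("auth", 0.97, involves_life_risk=False))  # auto-authorise
print(triage("deny", 0.97, involves_life_risk=False))  # human review
```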
References

[2] ANS, Caderno de informação da saúde suplementar [Supplementary health information booklet]. <http://www.ans.gov.br>, 2013 (accessed: 13.12.13).
[3] F.H.D. Araújo, L. Moraes, A.M. Santana, P.A.N. Santos, P. Adeodato, E.M. Leao, Evaluation of the use of computational intelligence techniques in medical claim processes of a health insurance company, Int. Symp. Comput.-Based Med. Syst. (CBMS) (2013) 23–28.
[4] F.E. Barros, E. Romão, A.A. Constantino, C.L. Souza, Data mining pre-processing for beneficiaries of health insurance, J. Health Inform. (2011) 19–26.
[5] G.E.A.P.A. Batista, M.C. Monard, An analysis of four missing data treatment methods for supervised learning, Appl. Artif. Intell. 17 (5–6) (2003) 519–533.
[6] G.E.A.P.A. Batista, R.C. Prati, M.C. Monard, A study of the behavior of several methods for balancing machine learning training data, SIGKDD Explorations (2004) 20–29.
[7] B.F. Chimieski, R.D.R. Fagundes, Association and classification data mining algorithms comparison over medical datasets, J. Health Inform. (2013) 44–51.
[8] R.G. Congalton, K. Green, Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, Taylor and Francis Group, 2009.
[9] T.G. Dietterich, Statistical Tests for Comparing Supervised Classification Learning Algorithms, Technical Report, Oregon State University, 1997.
[10] E. Fix, J.L. Hodges, Discriminatory Analysis, Nonparametric Discrimination: Consistency Properties, US Air Force School of Aviation Medicine, 1951.
[11] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 2001.
[12] N. Japkowicz, S. Stephen, The class imbalance problem: a systematic study, Intell. Data Anal. (2002) 429–449.
[13] G. John, P. Langley, Estimating continuous distributions in Bayesian classifiers, Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (1995) 338–345.
[14] J. Landis, G. Koch, The measurement of observer agreement for categorical data, Biometrics (1977) 159–174.
[15] O.L.F. Martins, F.E. Barros, W. Romão, A.A. Constantino, C.L. Souza, Application of machine learning algorithms to data mining about beneficiaries of health insurance, J. Health Inform. (2012) 43–49.
[16] T. Mettler, Anticipating mismatches of HIT investments: developing a viability-fit model for e-health services, Int. J. Med. Inform. (2015), available online: http://dx.doi.org/10.1016/j.ijmedinf.2015.10.002.
[17] T.M. Mitchell, Machine Learning, McGraw-Hill, 1997.
[18] J.R. Quinlan, Induction of decision trees, Mach. Learn. (1986) 81–106.
[19] L. Rokach, O. Maimon, Data Mining with Decision Trees: Theory and Applications, World Scientific Publishing, Singapore, 2008.
[20] G.H. Rosenfield, K.A. Fitzpatrick-Lins, A coefficient of agreement as a measure of thematic classification accuracy, Photogramm. Eng. Remote Sens. (1986) 223–227.
[21] W.K. Sung, Algorithms in Bioinformatics: A Practical Introduction, Chapman & Hall, Boca Raton, 2009.
[22] J.M. Tenorio, A.D. Hummel, F.M. Cohrs, V.L. Sdepanian, I.T. Pisa, H.F. Marin, Artificial intelligence techniques applied to the development of a decision-support system for diagnosing celiac disease, Int. J. Med. Inform. 10 (2011) 793–802.
[23] I.H. Witten, E. Frank, M.A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2011.
[24] C. Zhang, S. Zhang, Q. Yang, Data preparation for data mining, Appl. Artif. Intell. 17 (2003) 375–381.