Healthcare Analytics
journal homepage: www.elsevier.com/locate/health
Features and explainable methods for cytokines analysis of Dry Eye Disease
in HIV infected patients
Francesco Curia
Department of Statistical Science, Sapienza University of Rome, piazzale Aldo Moro 5, 00185, Rome, Italy
https://doi.org/10.1016/j.health.2021.100001
Received 30 April 2021; Received in revised form 2 July 2021; Accepted 2 July 2021
Available online xxxx
2772-4425/© 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).
F. Curia Healthcare Analytics 1 (2021) 100001
can lead to interpretable results, depending on which models are used. A CDSS based on black-box methods (e.g. neural networks) carries a great responsibility: the output of a model may concern, for example, a drug therapy, the administration of a drug, the experimentation of a vaccine or compatibility in organ transplants. The interpretability and transparency of the models used must guarantee full explainability of the results. Some questions are important, such as: why was one model preferred over another, and how was this model used? From the 1970s onwards several CDSSs based on artificial intelligence have been developed, for example the work by de Dombal et al. ([1], 1972), which implements automatic reasoning under uncertainty. The system was developed at the University of Leeds to support the diagnosis of acute abdominal pain and, on the basis of this analysis, the decision on the need for surgery; its decision making was based on a Bayesian approach. In Shortliffe's work ([2], 1976), MYCIN, a rule-based expert system designed to diagnose and recommend treatment for certain blood infections (antimicrobial selection for patients with bacteremia or meningitis), was proposed. It was later extended to the management of other infectious diseases. Clinical knowledge in such a CDSS is represented as a set of IF-THEN rules. Some CDSS-related issues are presented below for certain classes of problems, such as cancer, diabetes, heart problems and other applications; the use of advanced analysis techniques for clinical decisions is an important topic with a very broad literature, so only the most interesting contributions are highlighted. Miller et al. ([3], 1982) described INTERNIST-1, one of the first clinical decision support systems designed to support diagnosis. It was a rule-based expert system developed at the University of Pittsburgh in 1974 for the diagnosis of complex problems in general internal medicine, which uses patient observations to deduce a list of compatible disease states (based on a tree-structured database that links diseases with symptoms). In the work of Müller et al. ([4], 2020), these systems are by definition based on patient-specific evidence and on representations of clinical knowledge, modeled by experts through algorithms and mathematical models, and they provide recommendations aimed at the right diagnosis or the optimal therapy. The authors propose an approach based on data visualization and show how the displays can convey the certainty of the computed result, such as the recommendation and a series of clinical scores. Regarding the model used, the authors present a CDSS based on a Bayesian causal network that represents the therapy of laryngeal carcinoma; the results were evaluated and validated by two expert otolaryngologists.

Several other studies have addressed the explainability of CDSS, as in ([5], 2017) and ([6,7], 2019), while ([8], 2020) introduces the failure to calibrate user trust as a new type of error in the use of these tools. Another example is the work of Bussone et al. ([9], 2015), who studied the effect of explanations on trust and reliance. The authors state: ‘‘neglecting human factors and user experience in the design of the explanation of the CDSS could lead to excessive dependence of medical professionals on these systems, even when they are wrong’’, which they call ‘‘over-reliance’’. There is also the opposite problem: an explanation that does not provide sufficient information could lead users to reject the suggestions, i.e. self-sufficiency, as described in ([10], 2020). Other very recent works on CDSS make use of advanced analytics techniques, such as ([11], 2020), a longitudinal retrospective observational study that examines 34,113 electronic medical records. The authors use multivariate logistic regression and time series analysis to explore the effects of a CDSS integrated with the British Medical Journal (BMJ) Best Practice assisted diagnosis in real-world research. In the data before implementation of the CDSS, the overall accuracy of the diagnoses recommended by the CDSS was 75.46% for the first diagnosis, 83.94% for the top-2 diagnoses and 87.53% for the top-3 diagnoses. The proportion of hospitalization times of 7 days or less increased significantly, by 7.83% (95% CI 1.79%–13.87%, P = 0.01). The authors therefore conclude that the CDSS integrated with BMJ Best Practice improved the accuracy of doctors' diagnoses.

In another fairly recent study, the authors ([12], 2020) examine 60 clinical decision support systems that use machine learning in different clinical areas, such as bacterial infections, viral infections, tuberculosis and generic infections. 33% of these studies dealt with diagnosis, while 30% dealt with the prediction of disease, of the response to treatment and of antibiotic resistance, rather than with the choice of antibiotic therapy itself. Regarding the implications, the authors suggest that a database as exhaustive as possible, taking into account factors such as primary care and socio-economic data, can help to build much more effective tools. Other authors ([13], 2020) have addressed the prediction of heart disease with important results. They use density-based spatial clustering of applications with noise (DBSCAN) to identify and remove anomalies, then a technique known as SMOTE-ENN to balance the distribution of the training data, and finally train an XGBoost model to predict heart disease. The authors compare their results with others already known in the literature, obtaining accuracies of 95.90% and 98.40%, and thus provide a tool that can be fully used by clinical operators. Recent work ([14], 2021) describes the prevalence and nature of the involvement of clinical experts in the development, evaluation and implementation of CDSS that use machine learning to analyze electronic health record data. The authors conduct a systematic search on platforms such as PubMed, CINAHL and IEEE Xplore, together with a manual search of conference proceedings, in order to identify suitable articles. Their results are quite interesting: the involvement of clinical experts was prevalent in the early and late stages of system design. The authors stress that clinical operators must be involved in the entire decision-making process in order to obtain a robust tool, in which clinical domain competence supports the analytical design carried out by an expert from outside the medical domain.

1.2. Dry Eye Disease problem

DED (Dry Eye Disease) is a condition of the human eye that occurs when the tears needed to lubricate the eyes adequately are produced in scarce quantities or are almost absent, creating a disabling tear instability. This problem, which affects the external part of the visual apparatus, leads to inflammation and possible damage to the surface of the eye. According to the American Optometric Association (www.aoa.org), DED can develop for many reasons, including:
• Age. Dry eyes are part of the natural aging process; people over the age of 65 often experience some symptoms of dry eye.
• Gender. Women are more likely to develop dry eyes due to hormonal changes caused by pregnancy, oral contraceptives and menopause.
• Medications. Some medications, including antihistamines, decongestants, blood pressure medications and antidepressants, can reduce tear production.
• Medical conditions. People with rheumatoid arthritis, diabetes and thyroid problems are more likely to have dry eye symptoms.
• Environmental conditions. Exposure to smoke, wind and dry climates can increase tear evaporation, leading to dry eye symptoms. Failing to blink regularly, for example when staring at a computer screen for long periods, can also dry the eyes.

As part of CDSS research, some studies have been conducted on this pathological condition: the authors of ([15], 2019), starting from the factors that characterize the disease, such as those listed above, and following the guidelines of the American Academy of Ophthalmology, acquire various data on the disease in order to build a robust model to support clinical decision making and try to predict the
condition in advance by analyzing symptoms. The authors consider models such as neural networks, decision trees, random forests and naive Bayes. The results obtained were quite accurate, starting with classification by decision trees, given a sufficient amount of suitably structured data: the prediction rate of the random forest and decision tree algorithms exceeds 90%, higher than that of more complex methods such as neural networks and naive Bayes. A more recent study ([16], 2020) developed a model based on machine learning methods such as decision trees and LASSO, and then derived a (probability) score for classification using multiple logistic regression. The authors consider as many factors as possible from the data of the Korea National Health and Nutrition Examination Survey (KNHANES) for 2012 (4391 sample cases). The point-based model obtained an AUC (area under the curve) of 0.70 (95% CI 0.61–0.78). Important factors included gender (+9 points for women), corneal refractive surgery (+9 points), current depression (+7 points), cataract surgery (+7 points), stress (+6 points), age (54–66 years; +4 points), rhinitis (+4 points), lipid-lowering drugs (+4 points) and omega-3 intake (0.43%–0.65% kcal/day; −4 points). The proposed method is effective for finding important risk factors and identifying a patient's specific risk, and it could be applied to other multifactorial diseases.

1.3. Objectives

As seen in Sections 1.1–1.2, CDSS can provide advanced tools in the fight against various diseases, making use of advanced machine learning and deep learning techniques. The purpose and objectives of this work are the following:
• Address a complex problem such as DED in relation to HIV status in HIV infected patients, comparing them with a healthy population in order to infer characteristics that may be of interest in studying the development of the disease.
• Show that advanced machine learning methods, both supervised and unsupervised, can steer research in this area towards more recent technologies; studies so far in the medical field make use of classical statistical tools which, however important, rely on intrinsic distributional assumptions that cannot always be satisfied.
• Analyze the factors that influence the development of the disease through methods that are interpretable and constitute an advanced diagnostic means, analyzing each single factor locally as well as all factors as a whole, in order to obtain a broader picture of the disease.

1.4. Implications

This work has several implications that can bring added value to the study of CDSS, both in the specific context of DED and for other problems that can be addressed with this approach. First, unsupervised and supervised methods are combined. Clustering techniques provide evidence on the data structure, patterns and elements that can be extracted for informative and diagnostic purposes. Often, however, these techniques are an end in themselves, in the sense that once the groups have been created, one looks for evidence and correlations among the factors and elements that make up the clusters. In this paper, by contrast, clustering is used in two stages: the output of the first training cycle is used as input to the second, which gives an overview of which method performs better. It is then possible to predict or classify an instance with a meta-regressor or meta-classifier, and thus to study the probability of assignment to a particular cluster and evaluate the influence of each individual feature. The approach is new: among the works cited previously, both on classic CDSS and on DED, none of the authors has provided explanations and interpretability of both the model used and the results obtained, which leaves the clinical decision maker with an instrument that can be described as ‘‘blind’’. Black-box methods are certainly reliable and accurate, but they must also provide answers to the decision maker. The results obtained are of great interest because, in addition to establishing whether a particular patient belongs to one group rather than another, the method provides results in terms of the probability of disease development and makes it possible to open the model and individually evaluate each method used (for clustering) and the relationships among the features (for supervised classification).

1.5. Outline

The work is structured as follows. An introductory part presents the panorama of CDSS methods and a general framework on Dry Eye Disease, with the state of the art, main case studies, implications, technologies and limitations; the introduction also presents the objectives and implications of this work. This is followed by a part related to the method presented in this work (Related work): desirable properties of interpretable CDSS, the reasons for using this approach for DED, a description of the data, and the mathematical methodology, with the description of a clustering algorithm based on the stacking method. A part on explainability (Explainable ML) then presents the main methods of feature explanation and feature importance, for both supervised and unsupervised methods. The fourth part (Experimental results) concerns the results obtained in terms of the performance of the algorithms used and their explanation (opening the black box). The fifth part (Clinical Explainability) discusses the supervised and unsupervised results from a clinical point of view, focusing on the relationships and implications of the different factors that make up the analysis of DED. The last part presents the conclusions, with a brief summary of what has been done and discussed, and then addresses the limitations of the work and future objectives.

2. Related work

A recent work [17] compared two populations (HIV positive, n = 17, and healthy controls, n = 18) in order to assess whether there was an association between the pathology and meibomian gland dropout; the authors found statistically significant associations in the group of HIV-infected individuals. This condition has been found in 50%–80% of HIV and AIDS patients. The CD4 cell count has been significantly associated with this condition, which correlates with a serious ocular situation, as indicated by a recent study [18]. Starting from the case-control study by Agrawal et al. [19], [20], patients with HIV infection (type 1) are considered. The authors review and compare data from 34 HIV-infected patients and 32 control patients in order to ‘‘study the profile of tear cytokines in HIV infected patients with Dry Eye Disease (DED) and study the association between the severity of ocular inflammatory complications and tear cytokine levels’’. The methodology proposed by the authors, however, does not rely on machine learning but on a parametric study based on a classical statistical, epidemiological approach. In the present application the goal is instead to find meaningful groups through a clustering procedure applied to the whole dataset. The method is therefore unsupervised, despite the presence (if desired) of a binary variable indicating whether the patient is HIV infected. Factors that make dry eyes more likely include the decrease of tear production with age. This condition generally occurs in women over the age of 50, in many cases due to hormonal changes caused by pregnancy, the use of the birth control pill or menopause. Diets low in vitamin A or omega-3 fatty acids can contribute to this condition, and wearing contact lenses may also be among the causes of this pathology. It is not the aim of this application to predict
$$C^{*} = \mathrm{mode}\big(C_1(\mathbf{x}), \ldots, C_k(\mathbf{x})\big) \qquad (1)$$

Once the final label of each instance has been obtained from the consensus function (1), a meta-learner is trained that takes the set of meta-features as input and the consensus label as target. This yields the probability of belonging to a specific cluster and a feature importance for each clustering model used: since in the meta-feature space each input corresponds to one clustering algorithm, an importance ranking of the algorithms can be obtained. The steps described are shown in the pseudo code presented below.

The strength of this method is intuitive: a set of algorithms contributes to forming a final clustering model, and converting it into a classification model, by introducing a meta-learner on the meta-feature space, yields the importance of each of the clusterers used, so that the ensemble can be decomposed and made interpretable.

3. Explainable machine learning

The methods of Explainable AI are widely discussed today and are beginning to play a very important role in decision science, since an intelligent system, often based on black-box methods, must be able to let the decision maker know

(a) how the decision came about
(b) how this decision is to be interpreted

This must be taken into account in the context of clinical decision support systems. In order to build a transparent and interpretable clinical decision support system, some of the main explainable ML methods are introduced below.

Fig. 3. Conceptual diagram exemplifying the different levels of transparency characterizing a ML model $M$, with $\phi$ denoting the parameter set of the model at hand: (a) simulatability; (b) decomposability; (c) algorithmic transparency. Source: [26].

• LIME
Ribeiro et al. [27] introduce the trade-off between interpretability and fidelity in LIME (Local Interpretable Model-Agnostic Explanations), formalized through the following optimization problem:
$$\min_{g \in G} \; L(f, g, \pi_x) + \Omega(g) \qquad (2)$$
where $\Omega(g)$ is a measure of the complexity (as opposed to the interpretability) of the model $g$, for example the number of parameters, the depth of the tree when $g$ is a classification tree, or the number of non-zero weights when $g$ is a linear model, as in the Lasso–Ridge approach. The explanation is therefore the model $g$, from the class $G$ of interpretable models, that minimizes $L$ together with the complexity term, where $L$ is a loss function measuring the infidelity of $g$ with respect to $f$ in the neighbourhood defined by the proximity measure $\pi_x$. The authors describe fidelity in terms of ‘‘the predictive behavior of the model near the instance to be predicted’’, so infidelity is the discrepancy between what is expected and what is predicted.

• Partial Dependence Plot
In Friedman's work [30] some methods for the interpretation of models are presented. The PDP focuses on visualization, one of the most powerful interpretation tools, although graphical display is limited to low-dimensional arguments. Functions of a single real-valued variable
can be plotted as a graph of the values of $\hat F(x)$ against each corresponding value of $x$. Functions of a single categorical variable can be represented by a bar chart, in which each bar represents one of its values and the bar height the value of the function. Viewing functions of higher-dimensional arguments is more difficult, so it is useful to visualize the partial dependence of the approximation $\hat F(x)$ on small selected subsets of the input variables. Let $z_l$ be a chosen target subset of size $l$ of the input variables and $z_{\backslash l}$ its complement, so that $z_l \cup z_{\backslash l} = x$. The functional form of $\hat F$ as a function of $z_l$ depends on the values chosen for $z_{\backslash l}$; if this dependence is not too strong, the expected value $\bar F_l(z_l) = E_{z_{\backslash l}}[\hat F(x)]$ is a good summary of the partial dependence on the chosen subset $z_l$. Dependencies can take different forms, for example additive or multiplicative. In classification problems, the author suggests that partial dependence plots of each $\hat F_k(x)$ on the subsets of variables $z_l$ most relevant for a given class show how the input variables affect the probabilities of the individual classes.

• Individual Conditional Expectation
ICE [31] is a tool to visualize the model estimated by any supervised learning algorithm. While the PDP shows the average partial relationship between the estimated response and one or more features, in the presence of substantial interaction effects the partial response relationship can be heterogeneous, so an average such as the PDP can hide the complexity of the modeled relationship. ICE refines the partial dependence plot by drawing the functional relationship between the predicted response and the feature for each individual observation. In particular, ICE plots show how the fitted values vary over the range of a variable, suggesting where and to what extent heterogeneity may exist.

• Feature Interaction
Starting from his work on the PDP method, Friedman et al. present another method, called Feature Interaction [32], which states that a function $F(x)$ has an interaction between two of its variables $x_j$ and $x_k$ if the difference in the value of $F(x)$ resulting from a change in the value of $x_j$ depends on the value of $x_k$. This can be formalized as
$$E_x\left[\left(\frac{\partial^2 F(x)}{\partial x_j\,\partial x_k}\right)^2\right] > 0$$
or by an analogous expression based on finite differences for categorical variables. If there is no interaction between these variables, $F(x)$ can be expressed as the sum of two functions, $F(x) = f_{\backslash j}(x_{\backslash j}) + f_{\backslash k}(x_{\backslash k})$, the first of which does not depend on $x_j$ and the second of which does not depend on $x_k$.

• Shapley Value
Among the important works, the Shapley values [33] are an innovative additive method that assesses the importance of variables through the conditional expected value of the original model. Another example is the work of Koh and Liang [34], who measure importance through the influence function, i.e. how the model changes when a training point $z_i$ is upweighted, starting from the minimization of an empirical risk of the form $R(\theta) = \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta)$. For a more detailed discussion, from which several parts of this section have been drawn, please refer to the excellent work [26].

3.1. Features importance

In this part of the work, some of the main feature importance methods are presented, in order to give the reader a general overview and understanding of the context in which this work is placed. Feature importance is an analytical technique that aims to understand how much each feature in the data space contributes to the final prediction (or classification); the methods presented here are widely known and applied in different contexts, including the clinical one, because they are simple to interpret and explain. The first two methods derive from the Random Forest algorithm, an ensemble of decision-tree predictors or classifiers, each grown on a random subset of the data and features, whose outputs are combined to improve the final result. These methods [35] are called variable importance (VI) and Gini importance (GI): GI measures how much a feature decreases node impurity, as measured by the Gini index, averaged over all trees, while VI measures the average decrease in model performance when the values of the feature are randomly permuted. Another interesting method is the one proposed by Gedeon [36], based on the matrix of the input weights of a neural network, which eliminates less important features through a brute-force procedure. A recent and very interesting method [37] is the Feature Importance Ranking Measure (FIRM), which performs an a posteriori analysis of machine learning algorithms and accounts for both predictive performance and explainability. This method is also interesting because it considers the underlying correlation structure of the features when identifying the most important ones. Another interesting method is PIMP [38], a heuristic correction of the VI and GI methods in which the target variable is randomly permuted to estimate the distribution of the importance of each feature under the hypothesis of no association; assuming that this null distribution follows a given probability distribution (Gaussian, lognormal or gamma), the p-value of the observed importance is used as a corrected measure of feature relevance.

3.2. Evaluation

The methodologies mentioned in Section 3 require some measures to evaluate algorithmic performance. Since the problem treated in this work involves a hybrid approach, in which unsupervised methods (clustering) are applied first and a supervised meta-learner afterwards, in order to make the model interpretable and obtain a probability of belonging to a given cluster, we introduce both measures that evaluate the homogeneity of the groupings obtained with stacking clustering and measures that evaluate the correct classification of the instances. Last but not least, the hybrid approach, first unsupervised and then supervised, makes it possible to apply the explainability methods discussed in Section 3.

Clustering. Rosenberg and Hirschberg [39] introduce a measure known as V-measure (Validity measure), an external, entropy-based measure that avoids some problems of other evaluation metrics used in clustering, such as dependence on the type of data processed and on the algorithm used, and that simultaneously measures two desirable properties: homogeneity and completeness. One of the traditionally used evaluation methods is Dunn's index [40], which rewards clusters that are internally compact and well separated from each other. Another well-known and widely used method is the Silhouette Coefficient [41], which measures coherence within the clusters by assessing how well each object has been classified; it also provides an intuitive visual representation of the grouping.

Classification. Starting from the confusion matrix, which can be applied to both binary and multiclass classification problems, the algorithms used were evaluated with the following metrics: recall, precision, accuracy and Area Under the Curve (AUC). The metrics are defined as
$$precision = \frac{TP}{TP + FP} \qquad (3)$$
and
$$recall = \frac{TP}{TP + FN} \qquad (4)$$
Furthermore, the F-measure score is defined as
$$F\text{-}measure = \frac{2}{1/precision + 1/recall} \qquad (6)$$

cluster 0 0.90 0.90 0.90
cluster 1 0.82 0.90 0.86
cluster 2 0.88 0.78 0.82
For classification problems in which the probability that an instance belongs to a certain class is evaluated, the receiver operating characteristic (ROC) curve can be used, which also has an intuitive visual interpretation. From the ROC curve the AUC is derived, which has a very interesting statistical property, as shown by [42]: the AUC of a classifier is equivalent to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance.
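The partial dependence and ICE constructions described in Section 3 can be illustrated with a minimal NumPy sketch. It assumes a generic `predict` callable (any fitted model's prediction function) and a single feature swept over a user-chosen grid; the helper names `ice_curves` and `partial_dependence` are hypothetical, not taken from the paper or from a specific library. The PDP is computed as the average of the ICE curves, i.e. a Monte Carlo estimate of the expectation of $\hat F(x)$ over the observed values of the complementary variables.

```python
import numpy as np


def ice_curves(predict, X, feature, grid):
    """ICE: one curve per observation, obtained by sweeping `feature` over `grid`
    while the other features keep their observed values."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.astype(float)          # astype returns a copy of the data
        X_mod[:, feature] = value
        curves[:, j] = predict(X_mod)
    return curves


def partial_dependence(predict, X, feature, grid):
    """PDP: average of the ICE curves, a Monte Carlo estimate of the expectation
    of the model over the complementary (non-target) features."""
    return ice_curves(predict, X, feature, grid).mean(axis=0)


# Usage with a pure interaction F(x) = x0 * x1: the PDP of x0 is almost flat,
# while each ICE curve has slope x1, the heterogeneity that ICE exposes.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
F = lambda A: A[:, 0] * A[:, 1]
grid = np.linspace(-1, 1, 5)
pdp = partial_dependence(F, X, 0, grid)
ice = ice_curves(F, X, 0, grid)
slopes = (ice[:, -1] - ice[:, 0]) / (grid[-1] - grid[0])   # per-observation slope
print("PDP:", np.round(pdp, 3))
print("ICE slopes from", slopes.min().round(2), "to", slopes.max().round(2))
```

Because the PDP is just the column mean of the ICE matrix, the two plots can be drawn from the same computation; only the aggregation differs.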
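The interaction criterion of Friedman et al. [32] can be checked numerically. The sketch below is an illustration, not the implementation used in [32]: it estimates $E_x[(\partial^2 F/\partial x_j \partial x_k)^2]$ with a central finite-difference approximation of the mixed partial derivative, averaged over a sample of points. The function name `mixed_partial_sq` and the step `h` are illustrative assumptions, and the approach only suits numerical inputs.

```python
import numpy as np


def mixed_partial_sq(F, X, j, k, h=1e-3):
    """Estimate E_x[(d^2 F / dx_j dx_k)^2] over the rows of X
    using a central finite-difference approximation of the mixed partial."""
    def shifted(dj, dk):
        Xs = X.astype(float)        # copy of the sample
        Xs[:, j] += dj
        Xs[:, k] += dk
        return F(Xs)

    d2 = (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return float(np.mean(d2 ** 2))


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
interacting = lambda A: A[:, 0] * A[:, 1] + A[:, 2]
additive = lambda A: A[:, 0] ** 2 + np.sin(A[:, 1]) + A[:, 2]
print(mixed_partial_sq(interacting, X, 0, 1))  # close to 1: x0 and x1 interact
print(mixed_partial_sq(additive, X, 0, 1))     # close to 0: no x0-x1 interaction
```

A value near zero is consistent with the additive decomposition given above, while a clearly positive value signals that the effect of $x_j$ depends on $x_k$.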
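To make the additive attribution idea behind Shapley values [33] concrete, the sketch below computes exact Shapley values by enumerating all feature coalitions. As a simplifying assumption, features outside a coalition are set to a fixed baseline vector instead of being integrated out with the conditional expectation; the function name `shapley_values` is hypothetical, and the exhaustive enumeration costs on the order of $d\,2^{d}$ model calls, so it only suits a handful of features.

```python
import itertools
import math

import numpy as np


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features absent from a coalition
    take their baseline value (a simplifying assumption)."""
    d = len(x)
    phi = np.zeros(d)

    def value(coalition):
        z = baseline.astype(float)       # copy of the baseline
        idx = list(coalition)
        z[idx] = x[idx]                  # features in the coalition take x's values
        return f(z)

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d!
            weight = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


f = lambda z: z[0] * z[1] + z[2]
x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = shapley_values(f, x, baseline)
print(phi)                               # the x0*x1 term is split equally
print(phi.sum(), f(x) - f(baseline))     # efficiency: the two agree
```

The efficiency property checked in the last line is what makes the method additive: the attributions sum to the difference between the prediction and the baseline prediction.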
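The three clustering measures of Section 3.2 can be computed as follows. V-measure and the Silhouette coefficient use the scikit-learn functions `v_measure_score` and `silhouette_score`; scikit-learn has no Dunn index, so `dunn_index` is a small hypothetical helper using the usual definition (smallest distance between points of different clusters over largest cluster diameter). The synthetic blobs are a stand-in for the paper's data, which are not reproduced here.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, v_measure_score


def dunn_index(X, labels):
    """Minimum distance between points of different clusters divided by
    the maximum diameter (largest within-cluster distance)."""
    groups = [X[labels == c] for c in np.unique(labels)]
    diameter = max(pdist(g).max() for g in groups if len(g) > 1)
    separation = min(cdist(a, b).min()
                     for i, a in enumerate(groups) for b in groups[i + 1:])
    return separation / diameter


X, y_true = make_blobs(n_samples=150, centers=[[0, 0], [10, 0], [0, 10]],
                       cluster_std=0.5, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("V-measure :", v_measure_score(y_true, labels))  # external: needs reference labels
print("Silhouette:", silhouette_score(X, labels))      # internal
print("Dunn index:", dunn_index(X, labels))            # internal
```

Note the design difference: V-measure compares the clustering with reference labels, whereas the Silhouette coefficient and Dunn's index use only the data and the cluster assignment.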
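As a check on Eqs. (3), (4) and (6) and on the probabilistic reading of the AUC, the following sketch computes precision, recall and F-measure from hypothetical confusion-matrix counts and estimates the AUC as the fraction of positive/negative pairs ranked correctly (ties count one half), comparing it with scikit-learn's `roc_auc_score`. The function names are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def precision_recall_f(tp, fp, fn):
    precision = tp / (tp + fp)                    # Eq. (3)
    recall = tp / (tp + fn)                       # Eq. (4)
    f_measure = 2 / (1 / precision + 1 / recall)  # Eq. (6): harmonic mean
    return precision, recall, f_measure


def auc_pairwise(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]            # all positive/negative pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


# Hypothetical counts for one class: 9 true positives, 2 false positives, 1 false negative.
print([round(v, 2) for v in precision_recall_f(tp=9, fp=2, fn=1)])  # [0.82, 0.9, 0.86]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
scores = y + rng.normal(size=200)                 # noisy scores, higher for positives
print(auc_pairwise(y, scores), roc_auc_score(y, scores))
```

The pairwise estimate is quadratic in the number of instances, which is fine for small clinical samples; `roc_auc_score` computes the same quantity from the ROC curve.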
4. Experimental results

This section presents the main results obtained, both for the ensemble clustering method and for the supervised part: the performance of the models, the interpretation and importance of the features, and a part in which the results are explained from the clinical point of view.

Fig. 4. Confusion matrix comparison: train vs test.
Fig. 7. Feature importance: random forest.
Table 3
Clustering explanation: random forest.
Method Importance
agglomerative 0.337020
𝑘-means 0.298459
spectral 0.211150
birch 0.153371
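The consensus function of Eq. (1) followed by a meta-learner can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the author's implementation: because cluster indices are arbitrary, each clusterer's labels are first aligned to those of the first clusterer by Hungarian matching (an alignment step that Eq. (1) implicitly needs but the text does not specify); the meta-learner is a logistic regression on one-hot encoded meta-features; and one importance per clusterer, in the spirit of Table 3, is taken from a random forest fitted on the label columns. The helper names `align_labels` and `stacking_clustering` are hypothetical, the four clusterers mirror those listed in Table 3, and the data are synthetic blobs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def align_labels(reference, labels, k):
    """Permute `labels` so that they agree as much as possible with `reference`."""
    contingency = np.zeros((k, k), dtype=int)
    np.add.at(contingency, (reference, labels), 1)
    rows, cols = linear_sum_assignment(-contingency)   # maximise matched counts
    mapping = np.empty(k, dtype=int)
    mapping[cols] = rows
    return mapping[labels]


def stacking_clustering(X, k=3, seed=0):
    clusterers = {
        "k-means": KMeans(n_clusters=k, n_init=10, random_state=seed),
        "agglomerative": AgglomerativeClustering(n_clusters=k),
        "spectral": SpectralClustering(n_clusters=k, random_state=seed),
        "birch": Birch(n_clusters=k),
    }
    raw = [model.fit_predict(X) for model in clusterers.values()]
    # Meta-features: one column per clusterer, labels aligned to the first one.
    meta = np.column_stack([align_labels(raw[0], lab, k) for lab in raw])
    # Consensus function, Eq. (1): per-instance mode of the aligned labels.
    consensus = np.array([np.bincount(row, minlength=k).argmax() for row in meta])

    # Meta-learner on one-hot meta-features -> cluster membership probabilities.
    onehot = np.eye(k)[meta].reshape(len(X), -1)
    meta_learner = LogisticRegression(max_iter=1000).fit(onehot, consensus)
    proba = meta_learner.predict_proba(onehot)
    accuracy = meta_learner.score(onehot, consensus)

    # Random forest on the label columns -> one importance per clusterer.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(meta, consensus)
    importance = dict(zip(clusterers, forest.feature_importances_))
    return consensus, proba, accuracy, importance


X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=42)
X = StandardScaler().fit_transform(X)
consensus, proba, accuracy, importance = stacking_clustering(X)
print("meta-learner accuracy:", round(accuracy, 3))
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:>13s} {value:.3f}")
```

On well-separated synthetic blobs the four clusterers largely agree, so the importances tend to be spread across them; on real data it is the disagreement between clusterers that makes such a ranking informative.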
Table 4
Clustering Explanation: LIME Global.
Method Importance Cluster
𝑘-means 0.4772 0
𝑘-means −1.7429 1
spectral −1.4031 0
spectral 0.6024 1
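The LIME optimization problem of Eq. (2), which underlies explanations such as Table 4 and Tables 5–7, can be sketched from scratch without the `lime` package. The sketch rests on assumptions: perturbations are Gaussian around the instance, the proximity kernel is $\pi_x(z) = \exp(-d(x,z)^2/\sigma^2)$, the loss $L$ is the $\pi_x$-weighted squared error, and the complexity term $\Omega(g)$ is replaced by a small ridge penalty rather than a limit on the number of features. The helper name `lime_explain` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge


def lime_explain(f, x, scale, n_samples=5000, kernel_width=None, alpha=1e-3, seed=0):
    """Fit a local linear surrogate g of the black box f around instance x.

    L(f, g, pi_x): squared error between f and g on perturbed samples, weighted by
    pi_x(z) = exp(-d(x, z)^2 / width^2); Omega(g): approximated by a ridge penalty.
    Returns the surrogate coefficients in the original feature units.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    scale = np.asarray(scale, dtype=float)
    d = x.shape[0]
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)          # heuristic width (assumption)
    Z_std = rng.normal(size=(n_samples, d))       # perturbations in standardized units
    Z = x + Z_std * scale                         # perturbed instances fed to f
    distances = np.linalg.norm(Z_std, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    surrogate = Ridge(alpha=alpha).fit(Z_std, f(Z), sample_weight=weights)
    return surrogate.coef_ / scale


# Usage: a smooth non-linear black box explained at x = (1, 2); the result
# approximates the local gradient, roughly (2.42, 1.0).
black_box = lambda Z: np.tanh(Z[:, 0]) + Z[:, 0] * Z[:, 1]
print(lime_explain(black_box, x=[1.0, 2.0], scale=[0.1, 0.1]))
```

Dividing by `scale` reports each coefficient per unit of the original feature, so for a linear black box the surrogate recovers its weights; with a classifier one would pass, for example, the predicted probability of one cluster as `f`.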
Table 5
LIME Explanation: patient TH003.
y = 0 (p: 0.953) Contribution Feature
+7.198 which eye
+0.421 G-CSF
+0.267 IL-7
+0.236 EGF
+0.217 GRO
+0.157 IL-10
+0.101 IL-15
+0.100 IL-13
+0.098 sCD40L
−1.072 IP-10
Table 6
LIME Explanation: patient CTH025.
y = 1 (p: 0.027) Contribution Feature
+0.924 GRO
+0.713 G-CSF
+0.608 IL-7
+0.428 IL-10
+0.302 sCD40L
+0.260 EGF
+0.241 IL-15
+0.236 IL-8
+0.179 IL-13
−0.336 IP-10
Fig. 9. Feature importance: permutation feature importance.
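Importances of the kind shown in Fig. 7 (random forest) and Fig. 9 (permutation) can be outlined with scikit-learn, following the VI, GI and PIMP methods of Section 3.1: `feature_importances_` gives the impurity-based (Gini) importance, `permutation_importance` gives the permutation-based importance, and `pimp_pvalues` is a hypothetical helper implementing a Gaussian-null version of PIMP. The data are synthetic (two informative and four noise features) and only stand in for the cytokine data.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def pimp_pvalues(X, y, n_permutations=20, seed=0):
    """PIMP-style p-values: compare observed Gini importances with a Gaussian
    null distribution obtained by refitting on randomly permuted targets."""
    rng = np.random.default_rng(seed)

    def fit_importance(target):
        forest = RandomForestClassifier(n_estimators=100, random_state=seed)
        return forest.fit(X, target).feature_importances_

    observed = fit_importance(y)
    null = np.array([fit_importance(rng.permutation(y)) for _ in range(n_permutations)])
    mu, sd = null.mean(axis=0), null.std(axis=0, ddof=1)
    return norm.sf(observed, loc=mu, scale=sd)   # one-sided: importance above chance


# Synthetic stand-in data: only features 0 and 1 drive the outcome, 2-5 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
y = (X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=600) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
gini = forest.feature_importances_                               # GI: mean decrease in impurity
vi = permutation_importance(forest, X_test, y_test, n_repeats=10,
                            random_state=0).importances_mean     # VI: mean accuracy drop
pvals = pimp_pvalues(X_train, y_train)                           # PIMP-corrected relevance

for j in range(X.shape[1]):
    print(f"feature {j}: GI={gini[j]:.3f}  VI={vi[j]:.3f}  PIMP p={pvals[j]:.3g}")
```

Permutation importance is computed on held-out data here, which avoids rewarding features the forest merely memorized; the PIMP p-values correct for the fact that even pure-noise features receive non-zero Gini importance.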
Table 7
LIME Explanation: patient CTH024.
y = 2 (p: 0.887) Contribution Feature
+5.535 TGF-a
+1.326 PDGF-AA
+1.091 IL-6
+0.906 MIP-1b
+0.869 HIV presence
+0.829 IL-15
+0.819 IL-5
+0.734 IP-10
+0.701 FGF-2
+0.692 MDC
Fig. 10. Feature explainability: LIME.
view of explainability and interpretation. Let us consider our logistic classifier and apply LIME, the method defined in the previous sections. Fig. 10 shows the explanation of the features at a global level, that is, for the whole model; as in the previous analysis, the impact of each single feature can be evaluated cluster by cluster. Here too, the most significant impact is the one associated with the feature which eye, which in cluster 0 increases the probability of belonging to that group sixfold and has a smaller (negative) but still significant impact on the two remaining clusters, confirming and validating the methods used in the previous subsection.

Since LIME provides a local interpretation of how the features affect the final classification of a single instance, Tables 5–7 report the explanations for three different patients. Each table shows the probability that the patient belongs to a given cluster, together with the name of each feature and its contribution to that probability. For patient CTH024 (Table 7), the value of the cytokine TGF-a contributes strongly to assigning the patient to group 2; similarly, for patient CTH025 (Table 6), assigned to group 1, the values of the cytokines EGF and GRO are strongly related to membership of that group.

5. Clinical explainability

This section explains the results obtained above from a clinical point of view, together with their possible implications, in order to provide the complete CDSS tool. It assumes that the clinical operator works closely with the expert in analytics and, more generally, in artificial intelligence, so that the results are methodologically robust and, above all, easy to use and interpret in the clinical domain.

5.1. Clustering explanations

Agrawal et al. [19] showed that some cytokines are closely associated with DED. Starting from their observations and analysis results, in this phase of clustering explainability we examine the results obtained for the cytokines GRO, EGF and IP-10 with respect to some features, such as which eye (1 = right, 0 = left) and the presence of HIV (0 = negative, 1 = positive), and, of course, the cluster-membership label (obtained through the consensus function), in order to evaluate the assignment.
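The patient-level explanations in Tables 6 and 7 rest on the idea of Ribeiro et al. [27]: perturb one patient's feature vector, query the classifier on the perturbed copies, and fit a proximity-weighted linear surrogate whose coefficients act as per-feature contributions. The Python sketch below illustrates that idea under simplifying assumptions; it is not the code used to produce the tables. It assumes a black-box `predict_proba` callable that returns one probability column per class (as a scikit-learn-style classifier would), Gaussian perturbations scaled by a per-feature `feature_scale` (for example, training standard deviations), an exponential kernel whose default width of 0.75·√d mirrors the `lime` package's tabular explainer, and a weighted ridge regression. It omits the feature discretization and feature selection that the `lime` package applies by default. The names `lime_tabular_sketch` and `toy_predict_proba` are hypothetical.

```python
import numpy as np


def lime_tabular_sketch(predict_proba, x, feature_scale, target_class,
                        n_samples=5000, kernel_width=None, ridge_alpha=1.0, seed=0):
    """Simplified LIME-style local explanation for one instance (hypothetical helper).

    Returns (intercept, contributions), where contributions[j] is the slope of a
    proximity-weighted linear surrogate with respect to feature j, measured in
    units of feature_scale[j]. No discretization or feature selection is done.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    scale = np.asarray(feature_scale, dtype=float)
    n_features = x.shape[0]
    if kernel_width is None:
        # Mirrors the default kernel width of lime's tabular explainer
        kernel_width = 0.75 * np.sqrt(n_features)

    # Gaussian perturbations in standardized units; row 0 is the instance itself
    z_std = rng.normal(size=(n_samples, n_features))
    z_std[0] = 0.0
    z = x + z_std * scale

    # Query the black box for the probability of the class being explained
    y = predict_proba(z)[:, target_class]

    # Exponential kernel on the standardized distance: nearby samples weigh more
    dist = np.linalg.norm(z_std, axis=1)
    w = np.sqrt(np.exp(-(dist ** 2) / kernel_width ** 2))

    # Weighted ridge regression; the intercept (column 0) is not penalized
    design = np.hstack([np.ones((n_samples, 1)), z_std])
    penalty = ridge_alpha * np.eye(n_features + 1)
    penalty[0, 0] = 0.0
    weighted = design * w[:, None]
    coef = np.linalg.solve(design.T @ weighted + penalty, weighted.T @ y)
    return coef[0], coef[1:]


def toy_predict_proba(z):
    """Hypothetical 2-class black box in which only feature 0 matters."""
    p1 = 1.0 / (1.0 + np.exp(-(z[:, 0] - 1.0)))
    return np.column_stack([1.0 - p1, p1])


x0 = np.array([1.0, 5.0, -2.0])
intercept, contributions = lime_tabular_sketch(
    toy_predict_proba, x0, feature_scale=[1.0, 1.0, 1.0], target_class=1)
for name, c in zip(["f0", "f1", "f2"], contributions):
    print(f"{name}: {c:+.3f}")
```

In the toy model only the first feature drives the class-1 probability, so its contribution comes out clearly positive while the other two stay close to zero. With the cytokine data, `x` would be one patient's row (e.g. CTH024) and `target_class` the cluster whose membership probability is being explained.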
accuracy of 91% on the training data and 86% on the test data. Logistic regression was chosen as the meta-learner for the classification because the same authors [19] used logistic regression in their study; nothing, of course, prevents other methods (e.g. decision trees or neural networks) from being used for this case study, bearing in mind that complex black-box methods such as neural networks always require techniques for explaining the results, as was done in this work by means of the LIME and Shapley methods. The results obtained confirm the previous study by the authors of [19,20] regarding the values of the cytokines GRO, EGF and IP-10 and their association with DED and seropositivity. This work adds a small contribution on how to use these data [20], how to interpret the results, and another point of view from which to study the associated phenomenon.

6.1. Limitations and future work

Introducing this methodology into a clinical decision-making process gives decision makers an additional tool for the diagnosis and treatment of the pathology considered here. However, several open questions remain to be investigated. For example, can the proposed framework, based on clustering methods in which the groups are chosen a priori, be extended to other methods, perhaps not based on Euclidean distances? Can textual data, for example collected from medical records, or image data be used? In these scenarios the tools would be different and more complex: in this case study the data were numerical, and extending the approach to categorical, textual, image or, more generally, unstructured data would require different resources. Instead of k-means, for instance, one would need k-modes or other methods such as autoencoders or self-organizing maps. Another consideration concerns the data: how would the method behave on massive amounts of data? It should then be clear which, and how many, computational resources are required, and whether the explainability methods can be extended to unstructured data; fortunately, research in explainable AI and clinical research continues, and some of these questions have already been answered. The proposed method was applied to a small dataset, and neither the computation nor the mathematical methods raised any difficulties; in general, clinical problems involve fairly manageable datasets. Since the aim of the work was both to show the potential of the proposed method and to contribute to DED research, the framework presented remains usable in the light of recent advances in this area, despite its limitations.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] F.T. de Dombal, D.J. Leaper, J.R. Staniland, A.P. McCann, J.C. Horrocks, Computer-aided diagnosis of acute abdominal pain, 2(5804) (1972) 9–13, http://dx.doi.org/10.1136/bmj.2.5804.9.
[2] E. Shortliffe, Computer-based medical consultations: MYCIN, Artificial Intelligence 388 (1976) http://dx.doi.org/10.1097/00004669-197610000-00011.
[3] R.A. Miller, H.E. Pople, J.D. Myers, Internist-I, an experimental computer-based diagnostic consultant for general internal medicine, N. Engl. J. Med. 307 (8) (1982) 468–476, http://dx.doi.org/10.1056/NEJM198208193070803, PMID: 7048091.
[4] J. Müller, M. Stoehr, A. Oeser, J. Gaebel, M. Streit, A. Dietz, S. Oeltze-Jafra, A visual approach to explainable computerized clinical decision support, Comput. Graph. 91 (2020) 1–11, http://dx.doi.org/10.1016/j.cag.2020.06.004.
[5] H. Schafer, S. Hors-Fraile, R. Karumur, A. Calero Valdez, A. Said, H. Torkamaan, T. Ulmer, C. Trattner, Towards health (aware) recommender systems, 2017, http://dx.doi.org/10.1145/3079452.3079499.
[6] S. Tonekaboni, S. Joshi, M.D. McCradden, A. Goldenberg, What clinicians want: Contextualizing explainable machine learning for clinical end use, 2019, URL arXiv:1905.05134.
[7] Y. Xie, G. Gao, A. Chen, Outlining the design space of explainable intelligent systems for medical diagnosis, 2019.
[8] M. Naiseh, Explainability design patterns in clinical decision support systems, 2020.
[9] A. Bussone, S. Stumpf, D. O'Sullivan, The role of explanations on trust and reliance in clinical decision support systems, in: 2015 International Conference on Healthcare Informatics, 2015, pp. 160–169.
[10] M. Naiseh, N. Jiang, J. Ma, R. Ali, Explainable recommendations in intelligent systems: Delivery methods, modalities and risks, 2020.
[11] T. Liyuan, C. Zhang, L. Zeng, S. Zhu, N. Li, W. Li, H. Zhang, Y. Zhao, S. Zhan, H. Ji, Accuracy and effects of clinical decision support systems integrated with BMJ best practice-aided diagnosis: Interrupted time series study, JMIR Med. Inform. 8 (2020) e16912, http://dx.doi.org/10.2196/16912.
[12] N. Peiffer-Smadja, T. Rawson, R. Ahmad, A. Buchard, G. Pantelis, F.-X. Lescure, G. Birgand, A. Holmes, Machine learning for clinical decision support in infectious diseases: A narrative review of current applications, Clin. Microbiol. Infect. 26 (2019) http://dx.doi.org/10.1016/j.cmi.2019.09.009.
[13] N. Fitriyani, M. Syafrudin, G. Alfian, J. Rhee, HDPM: An effective heart disease prediction model for a clinical decision support system, IEEE Access 8 (2020) 133034–133050, http://dx.doi.org/10.1109/ACCESS.2020.3010511.
[14] J. Schwartz, A. Moy, S. Rossetti, N. Elhadad, K. Cato, Clinician involvement in research on machine learning–based predictive clinical decision support for the hospital setting: A scoping review, J. Am. Med. Inform. Assoc. 28 (2020) http://dx.doi.org/10.1093/jamia/ocaa296.
[15] S. Malik, N. Kanwal, M. Asghar, M. Ali, I. Karamat, M. Fleury, Data driven approach for eye disease classification with machine learning, Appl. Sci. 9 (2019) http://dx.doi.org/10.3390/app9142789.
[16] S. Nam, T.A. Peterson, A. Butte, K.Y. Seo, H.W. Han, Explanatory model of dry eye disease using health and nutrition examinations: Machine learning and network-based factor analysis from a national survey, JMIR Med. Inform. 8 (2020).
[17] B.N. Nguyen, A.W. Chung, E. Lopez, J. Silvers, H.E. Kent, S.J. Kent, L.E. Downie, Meibomian gland dropout is associated with immunodeficiency at HIV diagnosis: Implications for dry eye disease, Ocular Surf. 18 (2) (2020) 206–213, http://dx.doi.org/10.1016/j.jtos.2020.02.003.
[18] S.D. Mathebula, P.S. Makunyane, Ocular surface disorder among HIV and AIDS patients using antiretroviral drugs, Afr. Vis. Eye Health 88 (2019) 78(1), http://dx.doi.org/10.4102/aveh.v78i1.457.
[19] R. Agrawal, P.K. Balne, A. Veerappan, V.B. Au, B. Lee, E. Loo, A. Ghosh, L. Tong, S.C. Teoh, J. Connolly, P. Tan, A distinct cytokines profile in tear film of dry eye disease (DED) patients with HIV infection, Cytokine 88 (2016) 77–84, http://dx.doi.org/10.1016/j.cyto.2016.08.026.
[20] Dataset of tear film cytokine levels in dry eye disease (DED) patients with and without HIV infection, Data in Brief 10 (2017) 14–16, http://dx.doi.org/10.1016/j.dib.2016.11.027.
[21] G. Choi, J. Yun, J. Choi, D. Lee, J. Shim, H. Lee, Y.-H. Chung, Y. Lee, B. Park, N. Kim, K.M. Kim, Development of machine learning-based clinical decision support system for hepatocellular carcinoma, Sci. Rep. 10 (2020) 14855, http://dx.doi.org/10.1038/s41598-020-71796-z.
[22] E. Zihni, V.I. Madai, M. Livne, I. Galinovic, A.A. Khalil, J.B. Fiebach, D. Frey, Opening the black box of artificial intelligence for clinical decision support: A study predicting stroke outcome, PLoS One 15 (2020) 1–15, http://dx.doi.org/10.1371/journal.pone.0231166.
[23] E. Georga, V. Protopappas, E. Arvaniti, D. Fotiadis, The Diabino System: Temporal Pattern Mining from Diabetes Healthcare and Daily Self-monitoring Data: ICBHI 2015, Haikou, China, 8-10 October 2015, 2019, pp. 61–65, http://dx.doi.org/10.1007/978-981-10-4505-9_10.
[24] E. Kumar, P. Jayadev, Deep Learning for Clinical Decision Support Systems: A Review from the Panorama of Smart Healthcare, 2020, pp. 79–99, http://dx.doi.org/10.1007/978-3-030-33966-1_5.
[25] F. Curia, Explainable Clinical Decision Support System: Opening Black-Box Meta-Learner Algorithm Expert's Based (Ph.D thesis), Catalogo Iris, Sapienza University of Rome, 2021, URL http://hdl.handle.net/11573/1538472.
[26] A. Barredo Arrieta, N. Díaz Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, F. Herrera, Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI, Inf. Fusion 58 (2020) 82–115, http://dx.doi.org/10.1016/j.inffus.2019.12.012.
[27] M.T. Ribeiro, S. Singh, C. Guestrin, Why should I trust you?: Explaining the predictions of any classifier, 2016, URL arXiv:1602.04938.
[28] Z.C. Lipton, The mythos of model interpretability, 2017, URL arXiv:1606.03490.
[29] P. Smyth, D. Wolpert, Linearly combining density estimators via stacking, Mach. Learn. 36 (1999) 59–83, http://dx.doi.org/10.1023/A:1007511322260.
[30] J.H. Friedman, Greedy function approximation: A gradient boosting machine, Ann. Statist. 29 (5) (2001) 1189–1232, http://dx.doi.org/10.1214/aos/1013203451.
[31] A. Goldstein, A. Kapelner, J. Bleich, E. Pitkin, Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation, 2014, URL arXiv:1309.6392.
[32] J.H. Friedman, B.E. Popescu, Predictive learning via rule ensembles, Ann. Appl. Stat. 2 (3) (2008) 916–954, http://dx.doi.org/10.1214/07-AOAS148.
[33] S.M. Lundberg, S.-I. Lee, A unified approach to interpreting model predictions, in: I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Vol. 30, Curran Associates, Inc., 2017.
[34] P.W. Koh, P. Liang, Understanding black-box predictions via influence functions, 2020, URL arXiv:1703.04730.
[35] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32, http://dx.doi.org/10.1023/A:1010933404324.
[36] T.D. Gedeon, Data mining of inputs: Analysing magnitude and functional measures, Int. J. Neural Syst. 8 (2) (1997) 209–218, http://dx.doi.org/10.1142/S0129065797000227.
[37] A. Zien, N. Krämer, S. Sonnenburg, G. Rätsch, The feature importance ranking measure, in: Machine Learning and Knowledge Discovery in Databases, Springer Berlin Heidelberg, 2009, pp. 694–709.
[38] A. Altmann, L. Tolosi, O. Sander, T. Lengauer, Permutation importance: a corrected feature importance measure, Bioinformatics 26 (10) (2010) 1340–1347.
[39] A. Rosenberg, J. Hirschberg, V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure, 2007, pp. 410–420.
[40] J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybern. 3 (3) (1973) 32–57, http://dx.doi.org/10.1080/01969727308546046.
[41] P.J. Rousseeuw, Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20 (1987) 53–65, http://dx.doi.org/10.1016/0377-0427(87)90125-7.
[42] T. Fawcett, An introduction to ROC analysis, in: ROC Analysis in Pattern Recognition, Pattern Recognit. Lett. 27 (8) (2006) 861–874, http://dx.doi.org/10.1016/j.patrec.2005.10.010.