Yoseph Hailemariam∗, Abbas Yazdinejad†, Reza M. Parizi∗, Gautam Srivastava‡, Ali Dehghantanha†
∗College of Computing and Software Engineering, Kennesaw State University, GA, USA
yhailema@students.kennesaw.edu, rparizi1@kennesaw.edu
†Cyber Science Lab, School of Computer Science, University of Guelph, Ontario, Canada
abbas@cybersciencelab.org, adehghan@uoguelph.ca
‡Department of Mathematics and Computer Science, Brandon University, Manitoba, Canada
srivastavag@brandonu.ca
2020 IEEE Globecom Workshops (GC Wkshps) | 978-1-7281-7307-8/20/$31.00 ©2020 IEEE | DOI: 10.1109/GCWkshps50303.2020.9367541
Abstract—Success in machine learning has led to a wealth of Artificial Intelligence (AI) systems. A great deal of attention is currently being paid to the development of advanced Machine Learning (ML)-based solutions for a variety of automated prediction and classification tasks across a wide array of industries. However, such automated applications may introduce bias into their results, making it risky to use these ML models in security- and privacy-sensitive domains. Predictions should be accurate, and models have to be interpretable/explainable so that we can understand how they work. In this research, we conduct an empirical evaluation of two major explainer/interpretability methods, LIME and SHAP, on two datasets using deep learning models, namely an Artificial Neural Network (ANN) and a Convolutional Neural Network (CNN). The results demonstrate that SHAP performs slightly better than LIME in terms of Identity, Stability, and Separability on the two datasets we used (Breast Cancer Wisconsin (Diagnostic) and NIH Chest X-Ray).

Index Terms—AI, Explainable AI, Interpretable Techniques, ANN, CNN, XAI

I. INTRODUCTION

In recent years, Artificial Intelligence (AI) [1] has been considered one of the world-transforming research fields. AI is being utilized in different application domains such as data security [2], financial trading, advertising, marketing, healthcare, and blockchain [3]–[5]. AI helps solve very complex problems that would not have been tractable using traditional algorithmic methods. Machine learning (ML) [6], and particularly deep learning, as the enabling arm of AI, has gained a great deal of traction in recent years due to its capability for building intelligent systems [7], [8].

Designing and building an AI system requires four steps: identifying the problem, preparing the data, choosing an algorithm, and training the algorithm on the data. About 80 percent of data scientists' time goes into cleaning, moving, checking, and organizing data before a single algorithm is even written, so it is important that the collected data be properly prepared. Once we have identified the problem and prepared the data, we can select the algorithm. Knowing the distinctions among algorithms is important for obtaining good predictions.

Despite the great use of and demand for AI systems, there are concerns about the lack of transparency in their underlying decision-making processes. Some application domains are even reluctant to use AI systems, owing to the degree of criticality involved in security-sensitive domains, where predictions must be accurate; otherwise, the cost could be fatal. Security-sensitive domains include health-related domains, monetary domains, mission-critical infrastructures, and so on. The majority of previous research focuses on the predictive performance of ML models rather than on explaining the rationale behind their predictions. Lately, the community has seen the emergence of a new field, commonly referred to as Explainable AI (XAI) [9], which provides tools and systems to help developers and researchers create interpretable and comprehensible ML models and deploy them with confidence. As a result, there are interpretability techniques that help interpret models so that we can better understand how they work. There is a trade-off between performance and interpretability: the more interpretable a model is (e.g., linear models and decision trees), the lower its performance tends to be [10]. It is nevertheless important that even a complex model be interpretable in a security-sensitive domain. In May 2018, a European Parliament regulation came into force making it mandatory for companies to 'explain' any decision taken using machine learning and deep learning [11]: "a right of explanation for all individuals to obtain meaningful explanations of the logic involved".

As explained by Doshi-Velez et al. [12], in the context of ML systems, interpretability is defined as the ability to explain or to present results in terms understandable to a human. There is still a question as to what constitutes a good explanation, since one type of explanation may be good for one audience but bad for another. In this research, we focus on interpretability as it relates to safety, which encompasses robustness and security. More specifically, we conduct an empirical evaluation of two model-agnostic explainer/interpretability methods, SHAP1 and LIME2 (they are called model-agnostic because they can explain any machine learning model). We evaluate

1 https://github.com/slundberg/shap
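Both explainers are model-agnostic in the sense that they only query the model's predictions, never its internals. As a rough sketch of the local-surrogate idea behind LIME (not the library's actual API; the toy model, perturbation scale, and kernel width below are illustrative assumptions), one can perturb an instance, weight the perturbed samples by proximity, and fit a weighted linear model whose coefficients serve as the local explanation:

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # Stand-in for any trained model: a smooth nonlinear scorer.
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 3.0 * X[:, 1] ** 2)))

def lime_style_explanation(x, n_samples=5000, scale=0.1, kernel_width=0.5):
    """Fit a proximity-weighted linear surrogate to the model around x.

    Returns per-feature coefficients: the local 'explanation'.
    """
    X = x + rng.normal(0.0, scale, size=(n_samples, x.size))  # perturb around x
    y = black_box(X)
    d = np.linalg.norm(X - x, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)                 # proximity kernel
    A = np.hstack([X - x, np.ones((n_samples, 1))])           # centred + bias
    # Weighted least squares: solve (A^T W A) beta = (A^T W) y
    Aw = A * w[:, None]
    beta, *_ = np.linalg.lstsq(Aw.T @ A, Aw.T @ y, rcond=None)
    return beta[:-1]  # drop the bias term

x0 = np.array([0.5, 1.0])
coefs = lime_style_explanation(x0)
# Near x0 the model increases with feature 0 and decreases with feature 1,
# so coefs[0] should be positive and coefs[1] negative.
print(coefs)
```

The surrogate only claims local fidelity: its coefficients describe the model's behaviour in the neighbourhood controlled by `scale` and `kernel_width`, which is exactly the trade-off LIME exposes.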
Authorized licensed use limited to: Carleton University. Downloaded on May 29,2021 at 08:34:41 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Experiment procedure for Diagnostic dataset with ANN
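The procedure in Fig. 1 begins with tabular preparation. A minimal sketch of the two preprocessing steps used for the Diagnostic dataset, one-hot encoding of categorical values followed by feature scaling, is shown below; the column values and the min-max scaler choice are illustrative assumptions, not the paper's exact code:

```python
import numpy as np

def one_hot(column):
    """Map a categorical column to 0/1 indicator columns, one per category."""
    cats = sorted(set(column))
    return np.array([[1.0 if v == c else 0.0 for c in cats] for v in column]), cats

def min_max_scale(X):
    """Scale each feature to [0, 1]; constant columns map to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

diagnosis = ["M", "B", "B", "M"]                     # categorical column
radius = np.array([[17.99], [13.54], [12.45], [20.57]])  # numeric column

encoded, cats = one_hot(diagnosis)   # indicator columns for ['B', 'M']
scaled = min_max_scale(radius)       # each feature squeezed into [0, 1]
X = np.hstack([encoded, scaled])
print(cats)     # ['B', 'M']
print(X.shape)  # (4, 3)
```

Normalizing feature ranges this way keeps no single raw-scale feature from dominating the ANN's weighted sums during training.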
virtually certain that the computation time is one of the most critical issues in the Shapley value method. In particular, the exact computation of Shapley values must consider all feasible subsets of features, even when no particular subset is of interest. The exact calculation becomes intractable when there are many features, since the number of subsets grows exponentially with the number of features. One viable way to tackle this issue is to apply sampling techniques with a fixed number of samples.

D. Measures

The properties measured in this paper were inspired by the Explainability Fact Sheets [18]. Specifically, we focused on the safety (security) property to systematically assess an explainable system. Sokol et al. [18] proposed that when assessing an explainable system for safety, it is recommended to focus on four criteria: Information Leakage, Explanation Misuse, Explanation Invariance, and Explanation Quality. The first two criteria consider how much information an explanation reveals about the underlying model and its training data: if the explanation discloses sensitive information about the model, attackers can use it to exploit the model. Explanation invariance is concerned with measuring explanation similarity on the same dataset across different models and different explainers. Lastly, explanation quality is concerned with evaluating the quality and correctness of an explanation with respect to the underlying model and dataset; the explainer should explain only the underlying model, otherwise the explanation would be ambiguous. Carvalho et al. [26] proposed that explanation quality can be measured by correctness. The goal of correctness is to establish trust and explain what the model is doing, without ambiguity. Explanation invariance is measured using separability, stability, and identity. The goal of identity is that identical instances should have identical explanations: if an interpretability technique is run on the same instance several times, it is expected to produce the same explanation; otherwise, the explainer is unreliable. Stability is similar to identity, but it additionally requires that similar instances (instances with small differences) have similar explanations. Lastly, separability is concerned with ensuring that different instances have different explanations; they cannot have identical explanations.

E. Experimental Design and Execution

We carried out the experiments using online machine-learning services: Kaggle and Google Colab. We used Kaggle for the image dataset and the CNN, and Google Colab for the tabular dataset and the ANN. The code for the whole experimental package for both datasets is hosted on GitHub3,4.

Fig. 1 shows the experimental design for the first dataset and the ANN model. In preparing the Diagnostic dataset, we made sure all unnecessary features and raw data were removed. Then, we applied one-hot encoding to convert our categorical variables to numerical values, and finally, we applied feature scaling to normalize the ranges of the independent variables (features). The information that flows through the network affects the ANN structure, because a neural network changes (learns, in a sense) based on its input and output. We therefore constructed three layers: one input layer, one hidden layer, and one output layer. The first two layers used the ReLU activation function, and the last used the Sigmoid function.

Fig. 2 shows the experimental design for the NIH Chest X-ray dataset and the CNN. For this second experiment, we used the NIH Chest X-ray dataset. Before using the dataset, we performed image preprocessing: resizing the images, converting them to grayscale, and generating new images. The deep model for this dataset is a CNN, since CNNs are known to work well with image datasets.

Fig. 3 shows a sample of the actual explanation output when we executed the experimental design for LIME with the ANN model and the Diagnostic dataset, and Fig. 4 shows the same dataset being explained by SHAP. As can be seen in these figures, the output values for diagnosis detection with the ANN on the Diagnostic dataset are 1 and 0.9902 for the SHAP and LIME explanations, respectively; SHAP thus performs better than LIME here, LIME incurring a 0.0428 loss during diagnosis detection.

Similarly, Fig. 5 shows a sample of the actual explanation output when we executed the experimental design for LIME with the CNN model and the NIH dataset, and Fig. 6 shows the same dataset being explained by SHAP. As can be seen in these figures, SHAP is exact and precise in interpreting and explaining the methods on the X-ray dataset, whereas LIME is vague and its output is hard to match to human intuition on the NIH Chest X-ray dataset.

IV. RESULTS AND DISCUSSION

After executing the experiments specified in the previous section, we collected all the results, which are presented in this section. Table I shows the experimental results on the Diagnostic dataset. The numbers in the tables represent the percentage of instances that satisfy the defined security metrics. Table II shows the experimental results on the NIH Chest X-ray dataset.

TABLE I
EVALUATION OF INTERPRETABILITY METHODS ON DIAGNOSTIC DATASET AND ANN

Metrics        LIME     SHAP
Identity       100%     100%
Stability      94.7%    100%
Separability   100%     100%

As provided in the Measures section, explanation invariance is measured through three metrics: Identity, Separability, and Stability. In Table I, both LIME and SHAP did well on the majority of the metrics on the Diagnostic dataset; the only difference is in Stability, where SHAP scored 100% and LIME scored slightly lower at 94.7% (it got only 3 samples wrong out of 57).

3 https://github.com/yosepppph/BreastCancerANN
4 https://github.com/yosepppph/NIHXrayCNN
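The exponential cost of exact Shapley-value computation, and the fixed-size sampling workaround, can be illustrated with a toy coalition game; the value function and feature weights below are invented for illustration, not taken from the paper's models:

```python
import itertools
import math
import random

WEIGHTS = [1.0, 2.0, 3.0, 4.0]  # illustrative per-feature payoffs
N = len(WEIGHTS)

def value(S):
    """Toy coalition payoff: additive weights plus one interaction term."""
    v = sum(WEIGHTS[i] for i in S)
    if 0 in S and 1 in S:
        v += 5.0  # interaction only paid when features 0 and 1 are both present
    return v

def shapley_exact(i):
    """Exact Shapley value of feature i: enumerates all 2^(N-1) coalitions."""
    others = [j for j in range(N) if j != i]
    phi = 0.0
    for r in range(len(others) + 1):
        for S in itertools.combinations(others, r):
            S = set(S)
            w = math.factorial(len(S)) * math.factorial(N - len(S) - 1) / math.factorial(N)
            phi += w * (value(S | {i}) - value(S))
    return phi

def shapley_sampled(i, n_perms=20000, seed=0):
    """Monte Carlo estimate: average marginal contribution over random permutations."""
    rng = random.Random(seed)
    order = list(range(N))
    total = 0.0
    for _ in range(n_perms):
        rng.shuffle(order)
        S = set(order[:order.index(i)])
        total += value(S | {i}) - value(S)
    return total / n_perms

exact = [shapley_exact(i) for i in range(N)]
print(exact)  # ~[3.5, 4.5, 3.0, 4.0]: the 5.0 interaction splits between 0 and 1
approx = [shapley_sampled(i) for i in range(N)]
```

With N features the exact loop visits 2^(N-1) coalitions per feature, whereas the sampled estimate does a fixed amount of work regardless of N, at the price of Monte Carlo error.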
Fig. 3. LIME Execution Explanation (ANN, Breast Cancer Dataset)
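The invariance metrics used in this evaluation (identity, stability, separability) can be scored mechanically once each explainer's outputs are collected. A minimal sketch, assuming explanations are numeric attribution vectors and using an illustrative distance tolerance:

```python
import numpy as np

def identity_score(expl_a, expl_b):
    """Percent of instances whose repeated explanations are identical."""
    same = [np.allclose(a, b) for a, b in zip(expl_a, expl_b)]
    return 100.0 * sum(same) / len(same)

def stability_score(expl, expl_perturbed, tol=0.1):
    """Percent of near-identical instances with near-identical explanations."""
    close = [np.linalg.norm(a - b) <= tol for a, b in zip(expl, expl_perturbed)]
    return 100.0 * sum(close) / len(close)

def separability_score(expl):
    """Percent of distinct instances whose explanations differ from all others."""
    n, ok = len(expl), 0
    for i in range(n):
        if all(not np.allclose(expl[i], expl[j]) for j in range(n) if j != i):
            ok += 1
    return 100.0 * ok / n

# Made-up attribution vectors for three instances, explained twice.
run1 = [np.array([0.2, 0.8]), np.array([0.5, 0.5]), np.array([0.9, 0.1])]
run2 = [np.array([0.2, 0.8]), np.array([0.5, 0.5]), np.array([0.9, 0.1])]
print(identity_score(run1, run2))  # 100.0
print(separability_score(run1))    # 100.0
```

The numbers reported in the tables are exactly this kind of percentage: the fraction of test instances for which the explainer satisfied each property.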
TABLE II
EVALUATION OF INTERPRETABILITY METHODS ON NIH CHEST XRAY DATASET AND CNN

Metrics        LIME    SHAP
Identity       51%     100%
Stability      0%      0%
Separability   100%    100%

Based on Table I, we can conclude that SHAP performs better than LIME. In Table II, neither LIME nor SHAP performed as expected on the Stability metric. One possible reason is that, since an image's pixels account for the number of features, that many pixels makes it harder to group similar samples. The clearest difference between the two explainers was on Identity, where SHAP scored 100% and LIME 51%. Based on that factor alone, SHAP performs better than LIME.

A. Limitations

For this experiment, we could not find meaningful metrics with which to estimate the last two criteria, Information Leakage and Explanation Misuse, since finding a threshold for what constitutes sensitive information is subjective for both. We believe more research should be performed to define quantifiable, objective metrics for information leakage and explanation misuse. In addition, we were not able to measure explanation correctness for the image dataset, because expert validation is needed to assert whether an explanation is valid. This was one of the challenges we faced during the conduct of this research.

V. CONCLUSIONS AND FUTURE WORK

With the popularity of ML applications, and especially deep learning, many companies and organizations have created intelligent systems for many aspects of modern life. Despite
the advantages such ML-based systems bring, they are subject to bias, and their complex internal workings are black-boxed and not widely understood in most industries. Explainable AI (XAI) is a relatively new research field that works towards making ML models more transparent by creating techniques that enable developers and adopters to deploy them with more confidence. As a result, some tools have emerged to put these techniques into practice. While there is research on this topic, sufficient progress has not been made in understanding such tools' performance at a deeper, technical level. This paper experimented with evaluating two major explainers, LIME and SHAP, for deep learning models on image and non-image datasets. The results from this research suggest that SHAP can perform better than LIME in a security-sensitive domain for both tabular and image datasets. Future research should consider the potential effects of empirical approaches in XAI more thoroughly. We intend to continue our effort to assess the performance of various explainability and interpretability frameworks on other medical image datasets, such as COVID-19 datasets. Furthermore, it would be worthwhile to use other local interpretability tools, such as layer-wise relevance propagation (LRP) or Anchors, and compare the results with our study.

Fig. 6. SHAP Execution Explanation (CNN, NIH Dataset)

REFERENCES

[1] S. Makridakis, "The forthcoming artificial intelligence (AI) revolution: Its impact on society and firms," Futures, vol. 90, pp. 46–60, 2017.
[2] M. Kantarcioglu and F. Shaon, "Securing big data in the age of AI," in 2019 First IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), 2019, pp. 218–220.
[3] A. Ekramifard, H. Amintoosi, A. H. Seno, A. Dehghantanha, and R. M. Parizi, A Systematic Literature Review of Integration of Blockchain and Artificial Intelligence. Springer International Publishing, 2020, pp. 147–160.
[4] A. Yazdinejad, G. Srivastava, R. M. Parizi, A. Dehghantanha, K.-K. R. Choo, and M. Aledhari, "Decentralized authentication of distributed patients in hospital networks using blockchain," IEEE Journal of Biomedical and Health Informatics, 2020.
[5] D. Połap, G. Srivastava, A. Jolfaei, and R. M. Parizi, "Blockchain technology and neural networks for the internet of medical things," in IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2020, pp. 1–6.
[6] P. Louridas and C. Ebert, "Machine learning," IEEE Software, vol. 33, no. 5, pp. 110–115, 2016.
[7] M. Saharkhizan, A. Azmoodeh, A. Dehghantanha, K. R. Choo, and R. M. Parizi, "An ensemble of deep recurrent neural networks for detecting IoT cyber attacks using network traffic," IEEE Internet of Things Journal, pp. 1–1, 2020.
[8] A. Yazdinejad, H. HaddadPajouh, A. Dehghantanha, R. M. Parizi, G. Srivastava, and M.-Y. Chen, "Cryptocurrency malware hunting: A deep recurrent neural network approach," Applied Soft Computing, vol. 96, p. 106630, 2020.
[9] D. Gunning, "Explainable artificial intelligence (XAI)," Defense Advanced Research Projects Agency (DARPA), nd Web, vol. 2, p. 2, 2017.
[10] R. El Shawi, Y. Sherif, M. Al-Mallah, and S. Sakr, "Interpretability in healthcare: a comparative study of local machine learning interpretability techniques," in 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), 2019, pp. 275–280.
[11] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi, "A survey of methods for explaining black box models," ACM Computing Surveys, vol. 51, no. 5, 2018.
[12] F. Doshi-Velez and B. Kim, "Towards a rigorous science of interpretable machine learning," arXiv preprint arXiv:1702.08608, 2017.
[13] A. Yazdinejad, R. M. Parizi, A. Dehghantanha, H. Karimipour, G. Srivastava, and M. Aledhari, "Enabling drones in the internet of things with decentralized blockchain-based security," IEEE Internet of Things Journal, pp. 1–1, 2020.
[14] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, and F. Herrera, "Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI," Information Fusion, vol. 58, pp. 82–115, 2020.
[15] M. T. Ribeiro, S. Singh, and C. Guestrin, ""Why should I trust you?": Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '16, 2016, pp. 1135–1144.
[16] M. Robnik-Šikonja and M. Bohanec, Perturbation-Based Explanations of Prediction Models. Springer International Publishing, 2018, pp. 159–175.
[17] T. Miller, "Explanation in artificial intelligence: Insights from the social sciences," Artificial Intelligence, vol. 267, pp. 1–38, 2019.
[18] K. Sokol and P. Flach, "Explainability fact sheets: a framework for systematic assessment of explainable approaches," in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020, pp. 56–67.
[19] R. El Shawi, Y. Sherif, M. Al-Mallah, and S. Sakr, "Interpretability in healthcare: a comparative study of local machine learning interpretability techniques," in 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), 2019, pp. 275–280.
[20] A. Adadi and M. Berrada, "Explainable AI for healthcare: From black box to interpretable models," in Embedded Systems and Artificial Intelligence, V. Bhateja, S. C. Satapathy, and H. Satori, Eds. Singapore: Springer Singapore, 2020, pp. 327–337.
[21] C. Panigutti, A. Perotti, and D. Pedreschi, "Doctor XAI: an ontology-based approach to black-box sequential data classification explanations," in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020, pp. 629–639.
[22] "Breast cancer wisconsin (diagnostic) data set." [Online]. Available: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
[23] "NIH chest x-rays data set." [Online]. Available: https://www.kaggle.com/nih-chest-xrays/data
[24] A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, "A survey of the recent architectures of deep convolutional neural networks," Artificial Intelligence Review, pp. 1–62, 2019.
[25] D. Butnariu and T. Kroupa, "Shapley mappings and the cumulative value for n-person games with fuzzy coalitions," European Journal of Operational Research, vol. 186, no. 1, pp. 288–299, 2008.
[26] D. V. Carvalho, E. M. Pereira, and J. S. Cardoso, "Machine learning interpretability: A survey on methods and metrics," Electronics, vol. 8, no. 8, p. 832, 2019.