Table 2: Results of ground truth-based and model truth-based metrics for state-of-the-art XAI methods along with SISE (proposed) on two networks (VGG16 and ResNet-50) trained on the MS COCO 2014 dataset. For each metric, the best result is shown in bold, and the second-best is underlined. Higher is better for every metric except Drop%.
Models
VGG16 and ResNet-50
The top-1 accuracies of the VGG16 and ResNet-50 models (loaded from the TorchRay library (Fong, Patrick, and Vedaldi 2019)) on the test set of the PASCAL VOC 2007 dataset were 56.56 percent and 57.08 percent, respectively, out of a maximum top-1 accuracy of 64.88 percent, while the top-5 accuracies were 93.29 percent and 93.09 percent, respectively, out of a maximum top-5 accuracy of 99.99 percent. The top-1 accuracies of VGG16 and ResNet-50 on the validation set of the MS COCO 2014 dataset were 29.62 percent and 30.25 percent, respectively, out of a maximum top-1 accuracy of 34.43 percent, while the top-5 accuracies were 69.01 percent and 70.27 percent, respectively, out of a maximum top-5 accuracy of 93.28 percent.
... recast Severstal dataset was 86.58 percent, while the top-3 accuracy was 99.60 percent. Table 3 shows the normalized confusion matrix of this model.
[Table 3: normalized confusion matrix. Axes: Predicted Class, Actual Class; class indices 0 to 4.]
Evaluation
In addition to the quantitative evaluation results shared in the main paper, the results of both ground truth-based and model truth-based metrics on the MS COCO 2014 dataset are attached in Table 2. Similar to our earlier results, SISE outperforms other conventional XAI methods in most cases. The MS COCO 2014 dataset is more challenging for the explanation algorithms than the PASCAL VOC 2007 dataset because of:
• the higher number of object instances
• the presence of more extra-small objects
• the presence of more objects, either from the same or different classes, in each image (on average)
• the lower classification accuracy of the models
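The text does not spell out how the "maximum" top-1 and top-5 accuracies are obtained. One reading, assumed here and not confirmed by the source, is that both datasets are multi-label, so accuracy counts the fraction of ground-truth labels that appear among the k highest-scoring classes, and the maximum is the value a perfect ranking would reach. The function name `multilabel_topk_accuracy` and the toy arrays below are illustrative only.

```python
import numpy as np

def multilabel_topk_accuracy(scores, labels, k):
    """Return (accuracy, maximum attainable accuracy) under the assumed definition.

    scores: (N, C) array of class scores, one row per image.
    labels: (N, C) binary array marking the ground-truth classes of each image.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Indices of the k highest-scoring classes for every image.
    topk = np.argsort(-scores, axis=1)[:, :k]
    # Ground-truth labels that were recovered inside the top-k predictions.
    hits = np.take_along_axis(labels, topk, axis=1).sum()
    total = labels.sum()
    # A perfect ranking recovers at most min(k, number of labels) per image.
    max_hits = np.minimum(labels.sum(axis=1), k).sum()
    return hits / total, max_hits / total

# Toy example: two images, three classes, three ground-truth labels in total.
scores = [[0.9, 0.8, 0.1], [0.2, 0.7, 0.1]]
labels = [[1, 1, 0], [0, 1, 0]]
print(multilabel_topk_accuracy(scores, labels, k=1))
print(multilabel_topk_accuracy(scores, labels, k=2))
```

Under this reading, an image with more than k labels caps the attainable score, which would be consistent with the lower ceilings reported for MS COCO 2014 than for PASCAL VOC 2007.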
[Figure 3 image omitted. Row labels: Dog, Bird, Train, Car.]
Figure 3: Sanity check of SISE, following Adebayo et al. (2018), performed by randomizing the parameters of a VGG16 model pre-trained on the PASCAL VOC 2007 dataset.
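The idea behind this kind of check is that an explanation should change when the model's parameters are replaced by random ones. A toy sketch of that comparison follows. It is not SISE and not a VGG16: the "model" is an assumed linear score w · x, the explanation is |gradient × input|, and the similarity measure (Spearman rank correlation, computed by hand) is one common choice rather than the one used in the paper.

```python
import numpy as np

def rank_correlation(a, b):
    """Spearman rank correlation of two maps (assumes no tied values)."""
    ra = np.argsort(np.argsort(a.ravel()))
    rb = np.argsort(np.argsort(b.ravel()))
    return float(np.corrcoef(ra, rb)[0, 1])

def saliency(weights, image):
    """Toy explanation map: |gradient x input| for the linear score w . x."""
    return np.abs(weights * image)

rng = np.random.default_rng(0)
image = rng.random((8, 8))
trained = rng.normal(size=(8, 8))      # stand-in for trained parameters
randomized = rng.normal(size=(8, 8))   # parameters re-drawn at random

# Similarity of the trained model's map with itself (reference value).
same = rank_correlation(saliency(trained, image), saliency(trained, image))
# Similarity between the trained model's map and the randomized model's map.
changed = rank_correlation(saliency(trained, image), saliency(randomized, image))
print(f"trained vs trained:    {same:.3f}")
print(f"trained vs randomized: {changed:.3f}")
```

A method that is sensitive to the model should give a clearly lower similarity in the second comparison than in the first.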
[Figure image omitted. Panel title: SISE explanations. Columns: Input Image, Trained model, Untrained model. Row labels: Bus, Cow.]
XAI Method               Runtime on VGG16 (s)    Runtime on ResNet-50 (s)
Grad-CAM                 0.006                   0.019
Grad-CAM++               0.006                   0.020
Extremal Perturbation    87.42                   78.37
RISE                     64.28                   26.08
Score-CAM                5.90                    18.17
Integrated Gradient      0.68                    0.52
FullGrad                 18.69                   34.03
SISE                     5.90                    9.21
Table 5: Results of runtime evaluation of SISE along with other algorithms on a Tesla T4 GPU with 16 GB of memory.
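The runtimes in Table 5 measure how long each method takes to produce an explanation map. A minimal sketch of such a timing loop is given below, assuming a hypothetical `explain` callable that maps one input to one explanation map. The warm-up pass and the averaging over several inputs are assumptions, not details taken from the paper. On a GPU, the device would also have to be synchronized before each clock reading (for PyTorch, `torch.cuda.synchronize()`), which this dependency-free sketch leaves out.

```python
import time
from statistics import mean

def average_runtime(explain, inputs, warmup=1):
    """Average wall-clock seconds that `explain` needs per input."""
    # Warm-up calls are excluded so one-off setup costs do not skew the average.
    for x in inputs[:warmup]:
        explain(x)
    durations = []
    for x in inputs:
        start = time.perf_counter()
        explain(x)
        durations.append(time.perf_counter() - start)
    return mean(durations)

# Usage with a dummy explainer standing in for a real XAI method.
def dummy_explain(x):
    return [v * 2 for v in x]

print(average_runtime(dummy_explain, [[1, 2, 3]] * 5))
```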
Complexity Evaluation
A runtime test was conducted to compare the complexity of the different XAI methods with SISE, timing how long it took for each algorithm to generate an explanation map. It ...
[Figure 5 image omitted. Columns: Input Image, Grad-CAM, Grad-CAM++, Extremal Perturbation, Score-CAM, Integrated Gradient, RISE, SISE. Row labels with the value shown beneath each: Train, 1.000; Person, 0.9959; Dog, 0.9408; Person, 0.9889; Cat, 0.9999; Person, 0.0027; Horse, 0.9962.]
Figure 5: Qualitative comparison of SISE with other state-of-the-art XAI methods with a ResNet-50 model on the PASCAL VOC 2007 dataset.
References
Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; and Kim, B. 2018. Sanity checks for saliency maps. In Advances in Neural Information Processing Systems, 9505–9515.
Everingham, M.; Van Gool, L.; Williams, C. K. I.; Winn, J.; and Zisserman, A. 2007. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. URL http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
Fong, R.; Patrick, M.; and Vedaldi, A. 2019. Understanding deep networks via extremal perturbations and smooth masks. In Proceedings of the IEEE International Conference on Computer Vision, 2950–2958.
Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; and Zitnick, C. L. 2014. Microsoft coco: Common objects in context. In European conference on computer vision, 740–755. Springer.
PAO Severstal. 2019. Severstal: Steel Defect Detection on Kaggle Challenge. URL https://www.kaggle.com/c/severstal-steel-defect-detection.
[Figure 6 image omitted. Columns: Input Image, Grad-CAM, Grad-CAM++, Extremal Perturbation, Score-CAM, Integrated Gradient, RISE, SISE. Row labels with the value shown beneath each: Cat, 1.000; Chair, 9.65e-06; Person, 0.999; Person, 1.24e-04; Car, 0.999.]
Figure 6: Comparison of SISE explanations generated with a VGG16 model on the PASCAL VOC 2007 dataset.
[Figure 7 image omitted. Columns: Input Image, Grad-CAM, Grad-CAM++, Score-CAM, Integrated Gradient, RISE, SISE. Row labels with the value shown beneath each: Class 1, 0.8513; Class 2, 0.92; Class 3, 0.9994; Class 4, 0.9983.]
Figure 7: Qualitative results of SISE and other XAI algorithms from the ResNet-101 model trained on the recast Severstal dataset.
[Figure 8 image omitted. Columns: Input Image, Grad-CAM, Grad-CAM++, Extremal Perturbation, Score-CAM, Integrated Gradient, RISE, SISE. Row labels with the value shown beneath each: Elephant, 0.1291; Toilet, 0.9962; Tennis Racket, 0.0031; Person, 0.9999; Truck, 0.8803.]
Figure 8: Explanations of SISE along with other conventional methods from a VGG16 model on the MS COCO 2014 dataset.
[Figure 9 image omitted. Columns: Input Image, Grad-CAM, Grad-CAM++, Extremal Perturbation, Score-CAM, Integrated Gradient, RISE, SISE. Row labels with the value shown beneath each: Fire Hydrant, 0.9542; Pizza, 0.0597; Handbag, 0.0012; Donut, 0.9786; Cup, 0.0203; Person, 0.9999; Bicycle, 6.13e-07.]
Figure 9: Qualitative results of SISE and other XAI algorithms from the ResNet-50 model trained on the MS COCO 2014 dataset.