
Interpretable Classifications

Confirmation Seminar
Candidate: Abdul Karim

Institute of Integrated and Intelligent Systems, Griffith University

Supervisors: Abdul Sattar, MA Hakim Newton
Agenda
 Introduction:
• What, Why and How of Interpretability
• Model-agnostic methods for extracting interpretations
• Interpretability levels

 Research Challenges:
• A framework that is highly accurate and capable of answering interpretability questions
 Research Goals:
• Stage 1: Searching for the optimum feature subset
• Stage 2: Formulating rules to extract statistical interpretability
• Stage 3: Formulating rules to extract causal interventional and counterfactual interpretability

 Preliminary Work: Under Review, ACS Omega


• “Toxicity prediction using hybridization of shallow neural networks and decision trees”

 Conclusion:
Low and High Risk Classifications

Application | Risk Factor | Criticality | Potential of Knowledge Discovery
Movie Recommendation System | Low Risk | Less Critical | Less
Image Classification (Cat and Dog) | Low Risk | Less Critical | Less
Image Classification (Cancer and Non-Cancer) | High Risk | Highly Critical | Medium
Drug Discovery | High Risk | Highly Critical | High

 Interpretability is crucial

3/45
Introduction 4/45

Classification with a Black-Box Algorithm

[Diagram: a drug discovery company supplies chemical features to a machine learning engineer, who trains a black-box classifier (deep neural network) on those features; the model labels each compound Toxic/Non-Toxic.]

Evaluation Metrics

AUROC: Area Under the Receiver Operating Characteristic curve


True Positive Rate = Sensitivity = TP / (TP + FN): measures the proportion of actual positives that are correctly identified
True Negative Rate = Specificity = TN / (FP + TN): measures the proportion of actual negatives that are correctly identified
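Both metrics and the AUROC can be computed directly; a minimal sketch (assuming scikit-learn; the labels, scores and the 0.5 threshold are illustrative placeholders, not values from this work):

# Minimal sketch: sensitivity, specificity, and AUC-ROC with scikit-learn.
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                   # illustrative labels (1 = toxic)
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]  # illustrative model scores
y_pred  = [int(s >= 0.5) for s in y_score]           # threshold at 0.5 (illustrative)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)            # true positive rate
specificity = tn / (tn + fp)            # true negative rate
auroc = roc_auc_score(y_true, y_score)  # area under the ROC curve (threshold-free)
print(sensitivity, specificity, auroc)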
Introduction

Analysis

 Interpretability is the degree to which a human can understand the cause of a machine learning decision. (Miller, 2017)

The drug discovery company asks:
1. How did the machine decide that this specific compound is toxic?
2. What makes the compound toxic?
3. How can I change the toxic compound to non-toxic?

With only the evaluation metrics in hand, the machine learning engineer's answer is “I have no clue.” The underlying issues:

 Incompleteness in the problem formulation (Doshi-Velez and Kim 2017)
 Model opaqueness
 Local Comprehension
 Global Comprehension

5/45
Introduction
 Incompleteness in the problem formulation (Doshi-Velez and Kim 2017).

DNN

Chemical Features

Input

1. How did the machine decide that this specific compound is toxic?
2. What makes the compound toxic?
3. How can I change the toxic compound to non-toxic?
6/45
Introduction
 Model opaqueness

o Big pile of incomprehensible noise


o A very big parameter matrix

7/45
 Local Comprehension
Introduction
Chemical Features of a single compound → Weight Matrix → Toxic

 Why did the model decide that this specific compound is toxic?
 Local region of the conditional distribution

8/45
 Global Comprehension
Introduction
Chemical Features of ALL compounds → Weight Matrix → Toxic / Non-Toxic / Toxic / …

 How does the trained model make predictions for all compounds?
 The complete conditional distribution

9/45
Big Research Question

 How to incorporate more details in problem formulation?


 How to tap into additional knowledge captured by the model?

 Using observed data and statistical modelling


 Causal graphical modelling
 Expert knowledge

10/45
Accuracy vs Interpretability

[Chart: Learning Techniques (today), plotted by prediction accuracy vs interpretability. Neural nets / deep learning and ensemble methods (random forests) sit toward high accuracy; SVMs, graphical models (Bayesian belief nets, CRFs, HBNs, MLNs, AOGs, SRL), statistical and Markov models, and decision trees span the rest, with the general trend that the more accurate model families are the less interpretable ones.]

11/45
Model-agnostic methods

Proxy Models – Knowledge Distillation (Ivan Sanchez et al., 2015)
Local Interpretable Model-agnostic Explanations (LIME) (MT Ribeiro et al., 2016)

 Possible, but it is itself a research gap to be filled:

1. How did the machine decide that this specific compound is toxic?
2. What makes the compound toxic?
3. How can I change the toxic compound to non-toxic?

12/45
Model-agnostic methods
Proxy Models – Knowledge Distillation
(Ivan Sanchez et al., 2015)

[Diagram: 1) train the black-box model on the features and target; 2) use the black-box's predicted target values, together with the features, to train an interpretable decision tree; the proxy's prediction accuracy is near to that of the black-box model. The idea is sketched below.]
13/45
Model-agnostic methods
Local Interpretable Model-agnostic Explanations (LIME)

14/45
Model-agnostic methods
Local Interpretable Model-agnostic Explanations (LIME)

Working of LIME:

 Permute data.

 Make predictions on the new data using the complex model.

 Calculate distance between permutations and original observations.

 Pick m features best describing the complex model outcome from the permuted data.

 Fit a simple model to the permuted data with m features and similarity scores as weights.

 Feature weights from the simple model form the explanation of the complex model's local behavior (sketched below).
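A simplified sketch of these steps for tabular data (in the spirit of LIME, not the reference implementation; assumes numpy/scikit-learn, black_box is any fitted classifier exposing predict_proba, and the Gaussian perturbation and exponential kernel are illustrative choices):

# Simplified LIME-style local explanation for one instance x0 (tabular data).
import numpy as np
from sklearn.linear_model import Ridge

def explain_instance(black_box, x0, num_samples=5000, m=5, kernel_width=0.75):
    rng = np.random.default_rng(0)
    # 1) permute: perturb the instance with Gaussian noise
    Z = x0 + rng.normal(scale=1.0, size=(num_samples, x0.shape[0]))
    # 2) predict on the perturbations with the complex model
    y = black_box.predict_proba(Z)[:, 1]
    # 3) similarity of each perturbation to the original observation
    d = np.linalg.norm(Z - x0, axis=1)
    w = np.exp(-(d ** 2) / (kernel_width ** 2))
    # 4) pick the m features most associated with the complex-model outcome
    corr = np.abs([np.corrcoef(Z[:, j], y)[0, 1] for j in range(Z.shape[1])])
    top = np.argsort(corr)[-m:]
    # 5) fit a simple (weighted linear) model on those features
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z[:, top], y, sample_weight=w)
    # 6) its coefficients are the local explanation
    return dict(zip(top.tolist(), surrogate.coef_.tolist()))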

17/45
Model-agnostic methods

Drawbacks:

 Mostly applied to text and image datasets (low risk, with less potential for knowledge discovery).

 Not driven by interpretability questions.

 Lacking intelligent feature selection.

 Too much wiggle-room in the optimization.

18/45
Interpretability Levels

Drug Discovery Company

• What if I do x? (Causal Intervention)

 Causal Intervention  p(y | do(x), z) (Judea Pearl, 2018): can be made possible
 Association  p(y | x): impossible (“I have no clue.”)

19/45
Interpretability Levels 20/45

Statistical Modelling
 Association  p(y | x)

Activity: Seeing, Observing

Questions: What is the probability of “y” if we observe “x”?


(How are the variables related, How would seeing “x” change my belief in “y”)

Examples: What does the symptom tell me about a certain disease?

Causal Interventional Modelling (Judea Pearl, 2018)

 Causal Intervention  p(y | do(x), z)

Activity: Doing, Intervening

Questions: What is the probability of “y” if we do “x”? (What if I do?)

Examples: What will happen to my headache if I take aspirin?

Counterfactual Modelling (Judea Pearl, 2018)

 Counterfactuals  p(y_x | x′, y′)

Activity: Imagining, Retrospection

Questions: What if I had done something else?

Examples: What if I had not smoked for the last two years?
Research Challenge

 A framework with the following attributes.

 Classification Performance

 Specifications of interpretability

 Transparency

21/45
Research Proposal

My research focus!

[Framework block diagram: defining interpretability questions; optimum feature subset; statistical modelling → statistical interpretability; causal graphical modelling → causal intervention and counterfactual interpretability; evaluating interpretability ties the framework together.]
22/45
Framework Block diagram
Research Plan

Key points of the proposed method:

 A Shallow Neural Network (SNN) can be used to achieve state-of-the-art performance.

 An optimum feature subset will help us with both interpretability and performance.

 Joint optimization of a neural network fitting function and of the number of features selected.

 Graphical causal modelling for causal interpretations.

23/45
24/45
Preliminary Work
Under Technical Review: ACS Omega (American Chemical Society)
Submission Date: 30th April, 2018
Preliminary Work

Key Attributes:

1) Feature ranking using the decision-tree Gini index.
2) Shallow Neural Network (SNN) (one hidden layer with only 10 neurons) for training.
3) Joint optimization of the SNN and the number of features selected.
4) Answers a few of the statistical interpretability questions.

Key Results:

1) Highest average accuracy on 12 toxicity tasks.
2) Lowest training time with minimum compute resources.
3) Fewer features (~512 out of ~1442) on average across all 12 toxicity tasks.
4) Simple/transparent training model.
5) Class discrimination plots based on the important features.

25/45
Datasets Preliminary Work

Tox21 dataset for the 12 Nuclear Receptor and Stress Response panel assays. It includes train, cross-validation (CV) and test sets.

Task | Train | Toxic/Non-Toxic | CV | Toxic/Non-Toxic | Test | Toxic/Non-Toxic
AHR | 7863 | 937/6926 | 268 | 30/238 | 594 | 73/521
AR | 9036 | 374/7950 | 288 | 3/285 | 573 | 12/559
AR-LBD | 8234 | 284/7950 | 249 | 4/245 | 567 | 8/559
Aromatase | 6959 | 352/6607 | 211 | 18/193 | 515 | 37/478
ER | 7421 | 916/6505 | 261 | 27/234 | 505 | 50/455
ER-LBD | 8431 | 415/8016 | 283 | 10/273 | 585 | 20/565
PPARG | 7883 | 193/7690 | 263 | 15/248 | 590 | 30/560
ARE | 6915 | 1040/5875 | 230 | 47/183 | 540 | 90/450
HSE | 7879 | 386/7493 | 263 | 10/253 | 594 | 19/575
MMP | 7071 | 1117/5954 | 234 | 38/196 | 530 | 58/472
p53 | 8349 | 509/7840 | 265 | 28/237 | 601 | 40/561
ATAD5 | 8775 | 317/8458 | 268 | 25/243 | 606 | 36/570

(Nuclear receptor assays: AHR, AR, AR-LBD, Aromatase, ER, ER-LBD, PPARG. Stress response assays: ARE, HSE, MMP, p53, ATAD5.)

26/45
Framework Preliminary Work

[Framework diagram: Pre-processing (2D feature extraction, ~1442 features; mixing of cross-validation and train sets; minority-class up-sampling; normalization; Train ~8000, Cross-Valid ~296, Final Test ~647), Hybrid Network Optimization (x4: feature selection via decision trees, SNN training, cross-valid ensemble AUC-ROC), and Classification (final test AUC-ROC with the optimized parameters).]
27/45
Hybrid approach: Optimization-I Preliminary Work
Total available features: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10
Feature Selection

1) Decision Tree Classifier


2) Gini-Importance of each feature
3) Mean Gini-Importance

Joint optimization
Selected feature subset: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10

Shallow Neural Network → CV AUC-ROC

28/45
Hybrid approach: Optimization-I Preliminary Work
Total available features: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10
Feature Selection

1) Decision Tree Classifier


2) Gini-Importance of each feature
3) Mean Gini-Importance

Joint optimization
Selected feature subset: x1, x2, x3, x4, x5, x6

Shallow Neural Network → CV AUC-ROC

29/45
Hybrid approach: Optimization-I Preliminary Work
Total available features: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10
Feature Selection

1) Decision Tree Classifier


2) Gini-Importance of each feature
3) Mean Gini-Importance

Joint optimization
Selected feature subset: x1, x2

Shallow Neural Network → CV AUC-ROC

 Optimum feature subset: x1, x2, x6, x9, …

30/45
Hybrid approach: Optimization-II Preliminary Work

 Optimum feature subset (selected by CV AUC-ROC): x1, x2, x6, x9, …

 Optimized parameters for the Neural Network: epochs, initialization function, drop-out, activation, mini-batch size (the overall loop is sketched below).
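A minimal sketch of this two-stage search (assuming scikit-learn; the candidate subset sizes, the single 10-neuron hidden layer, and the data splits are illustrative, not the exact settings of the submitted paper):

# Sketch: Gini-based feature ranking + shallow neural network, selected by CV AUC-ROC.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

def hybrid_select(X_tr, y_tr, X_cv, y_cv, candidate_sizes=(64, 128, 256, 512, 1024)):
    # X_* are numpy arrays; y_* are binary labels.
    # 1) rank all features by Gini importance from a decision tree
    ranking = np.argsort(DecisionTreeClassifier(random_state=0)
                         .fit(X_tr, y_tr).feature_importances_)[::-1]
    best = (None, -1.0, None)
    # 2) jointly search over feature-subset size and a shallow (one hidden layer) network
    for k in candidate_sizes:
        cols = ranking[:k]
        snn = MLPClassifier(hidden_layer_sizes=(10,), max_iter=300, random_state=0)
        snn.fit(X_tr[:, cols], y_tr)
        auc = roc_auc_score(y_cv, snn.predict_proba(X_cv[:, cols])[:, 1])
        if auc > best[1]:
            best = (cols, auc, snn)
    return best  # (selected feature indices, CV AUC-ROC, trained SNN)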
31/45
Final Test Results Preliminary Work

Performance on the Tox21 dataset for the 12 Nuclear Receptor and Stress Response panel assays.

Task | Features Selected | Hybrid Model AUC-ROC (Test Set) | RF AUC-ROC (Test Set) | SVM AUC-ROC (Test Set)
AHR | 270 | 0.921 | 0.907 | 0.889
AR | 284 | 0.743 | 0.638 | 0.730
AR-LBD | 365 | 0.881 | 0.800 | 0.702
Aromatase | 815 | 0.794 | 0.792 | 0.782
ER | 292 | 0.822 | 0.778 | 0.791
ER-LBD | 755 | 0.836 | 0.768 | 0.786
PPARG | 528 | 0.858 | 0.789 | 0.744
ARE | 615 | 0.828 | 0.774 | 0.779
HSE | 1028 | 0.832 | 0.859 | 0.798
MMP | 685 | 0.958 | 0.978 | 0.916
p53 | 223 | 0.875 | 0.847 | 0.810
ATAD5 | 390 | 0.820 | 0.812 | 0.765
Average | 521 | 0.847 | 0.812 | 0.791

32/45
Performance Comparison Preliminary Work
Comparative analysis of different methods used for NR and SR toxicity prediction on the Tox21 data set.

Name | NR Average AUC-ROC | SR Average AUC-ROC | Total Average AUC-ROC
Our Method | 0.836 | 0.862 | 0.847
DeepTox | 0.826 | 0.858 | 0.846
AMAZIZ | 0.816 | 0.854 | 0.838
Capuzziet | 0.831 | 0.848 | 0.840
dmlab | 0.811 | 0.85 | 0.824
T | 0.798 | 0.842 | 0.823
SMILES2VEC | NA | NA | 0.810
microsomes | 0.785 | 0.814 | 0.810
filipsPL | 0.765 | 0.817 | 0.798
Charite | 0.75 | 0.811 | 0.785
RCC | 0.751 | 0.781 | 0.772
frozenarm | 0.759 | 0.768 | 0.771
ToxFit | 0.753 | 0.756 | 0.763
CGL | 0.72 | 0.791 | 0.759
SuperTox | 0.682 | 0.768 | 0.743
kibutz | 0.731 | 0.731 | 0.741
MML | 0.7 | 0.753 | 0.734
NCI | 0.651 | 0.791 | 0.717
VIF | 0.702 | 0.692 | 0.708
Chemception | 0.787 | 0.739 | 0.773
Toxic Avg | 0.659 | 0.607 | 0.644
Swamidass | 0.596 | 0.593 | 0.576

Competitive Landscape: training time and model complexity of the top 5 toxicity prediction models

Models | Method | Features | Training Time | Average AUC-ROC
Our | DT + SNN | 521 | ~1 min CPU | 0.847
DeepTox | DNN | 273577 | ~10 min GPU | 0.846
AMAZIZ | ASNN | NA | NA | 0.838
Capuzziet | DNN | 2489 | NA | 0.840
dmlab | RF + ET | 681 | ~13 sec CPU | 0.824

• RF + ET: random forest and extra trees classifier
• DNN: deep neural network
• DT: decision tree
• SNN: shallow neural network
• ASNN: associative neural network

33/45
Feature Importance Preliminary Work

(a) Cumulative Gini index score of 1422 features across 12 toxicity tasks
(b) Average ranking of 1422 features against cumulative Gini index score in all 12 tasks
(c) Ranking of the top 29 features

Knowledge Extraction

34/45
Class Discrimination Screen Preliminary Work

Cut-off

Knowledge Extraction

35/45
Class Discrimination Screen Preliminary Work

36/45
Research Plan

Feature Selection Module of the Interpretability Framework:

 Defining a better and more efficient way of feature ranking, and optimizing the number of features selected for training.

Statistical Interpretation Module of the Interpretability Framework:

 Creating statistical interpretations from the trained statistical model in the context of the already defined interpretability questions.

Causal Interpretation Module of the Interpretability Framework:

 Creating causal interpretations from the graphical causal model in the context of the already defined interpretability questions.
37/45
Research Plan (Feature Selection)

Search for optimum feature subset: Searching the optimum subset of features from the total available feature space.

Research Challenge: Defining criteria for selecting the feature subset, adding or eliminating features

Gains: 1) Reduced number of features


2) Relatively less complex training model is required
3) More comprehensible

Total available features: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10

Joint optimization
Feature
Selection/Ranking

Selected feature subset: x1, x2, x3, x4

Statistical Model training

Objective Function 38/45


Research Plan (Feature Selection)

Search for optimum feature subset: Searching the optimum subset of features from the total available feature space.

Research Challenge: Defining criteria for selecting the feature subset, adding or eliminating features

Gains: 1) Reduced number of features


2) Relatively less complex training model is required
3) More comprehensible

Total available features: x1, x2, x3, x4, x5, x6, x7, x8, x9, x10

Joint optimization
Feature
Selection/Ranking

Selected feature subset: x1, x2, x3, x5, x7, x9

Statistical Model training

Objective Function 39/45


Research Plan (Statistical Interpretability)

Guided by the predefined statistical interpretability questions:

 Applying “LIME”

 Applying “Proxy Models”

 We want to make use of the “Universal Approximation Theorem”

 Devising our own algorithm (inspired by shortcomings in LIME, proxy models and decision trees)

40/45
Research Plan (Causal Interpretability)
Inspired by Judea Pearl's work: causal interventional and counterfactual questions can be addressed using “Causal Bayesian Networks and Structural Equation Models”. (Pearl, 2000, Chapter 3) and (Judea Pearl, 2018)
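As a concrete illustration of the kind of identity these models licence (the standard back-door adjustment from Pearl's framework, a textbook result rather than a contribution of this work): if a set of covariates Z blocks all back-door paths from X to Y, then

p(y | do(x)) = Σ_z p(y | x, z) · p(z)

so the interventional distribution can be estimated from purely observational quantities.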

[Diagram: observed data, statistical interpretations and expert knowledge feed the causal graphical modelling, which addresses the predefined causal interpretability questions.]
41/45
Research Plan

42/45
Research Papers Progress
Submitted to ACS Omega :
“Toxicity prediction using hybridization of shallow neural networks and decision trees.”

To be completed by 30th July:


“Roadmap towards machine learning complete interpretability”

Yet to begin:
“An interpretable framework to classify chemical toxicity”

43/45
Conclusion

[Conclusion diagram: interpretability questions (statistical and causal) for a specific high-risk domain, high accuracy, answering the questions, and helping the user to understand the problem.]

44/45
45/45
Forward pass of a Reasonable Deep Neural Network
(example input: the Places scene-centric database, http://places.csail.mit.edu/)

Activation: g(wx) = 1 / (1 + e^(-wx))

A[1] = g(W[1] X)    A[2] = g(W[2] A[1])    A[3] = g(W[3] A[2])    A[4] = g(W[4] A[3])

X: 12288 x 10 million    A[1], A[2], A[3]: 10 x 10 million    A[4]: 1 x 10 million
W[1]: 10 x 12288    W[2]: 10 x 10    W[3]: 10 x 10    W[4]: 1 x 10

The error E depends on (W[1], W[2], W[3], W[4]).
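A minimal numpy sketch of this forward pass (random weights and a small batch as placeholders; the layer shapes follow the slide):

# Forward pass sketch for the 4-layer network above (numpy; random weights, small batch).
import numpy as np

def g(z):                      # logistic activation g(wx) = 1 / (1 + e^(-wx))
    return 1.0 / (1.0 + np.exp(-z))

m = 1000                       # batch of examples (the slide uses ~10 million)
X  = np.random.randn(12288, m)           # input: 12288 features per example
W1 = np.random.randn(10, 12288) * 0.01   # W[1]: 10 x 12288
W2 = np.random.randn(10, 10)   * 0.01    # W[2]: 10 x 10
W3 = np.random.randn(10, 10)   * 0.01    # W[3]: 10 x 10
W4 = np.random.randn(1, 10)    * 0.01    # W[4]: 1 x 10

A1 = g(W1 @ X)     # 10 x m
A2 = g(W2 @ A1)    # 10 x m
A3 = g(W3 @ A2)    # 10 x m
A4 = g(W4 @ A3)    # 1 x m, the network's output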
46
Introduction

Today's Task: Training Data → Machine Learning Process → Learned Function → Decision or Recommendation → User
• Why did you do that?
• Why not something else?
• When do you succeed?
• When do you fail?
• When can I trust you?
• How do I correct an error?

XAI Task: New Training Data → Machine Learning Process → Explainable Model + Explanation Interface → User
• I understand why
• I understand why not
• I know when you succeed
• I know when you fail
• I know when to trust you
• I know why you erred

Credit: DARPA XAI

 Interpretability is the degree to which a human can understand the cause of a machine learning decision. (Miller, 2017)
47
Introduction

Why not just trust the model and ignore why it made a certain decision?

We should not, because:

 A “single evaluation metric” is an incomplete description of most real-world tasks.
 Incompleteness in the problem formulation (Doshi-Velez and Kim 2017).
 Goal of science
 Safety measures
 Detect bias
 Manage social interactions

48
Scope of Interpretability

 Algorithm transparency
How does the algorithm create the model?

 Global, Holistic Model Interpretability


How does the trained model make predictions?
o The complete conditional distribution.

 Local Interpretability for a Single Prediction


Why did the model make a specific decision for an instance?
o Local region of the conditional distribution.

49
Evaluating Interpretability
Doshi-Velez and Kim (2017) propose three major levels of evaluating interpretability:

 Application level evaluation (real task)


How would an expert explain the decision?

 Human level evaluation (simple task)


How would a lay human explain the decision?

 Function level evaluation (proxy task)


Does not require any humans.

More dimensions to interpretability evaluations

• Model sparsity: How many features are being used by the explanation?
• Monotonicity: Is there a monotonicity constraint?
• Interactions: Is the explanation able to include interactions of features?
• Cognitive processing time: How long does it take to understand the explanation?
• Feature complexity: What features were used for the explanation?
• Description length of explanation.
50
Human-style Explanations
 Research from the humanities can help us to figure that out (Miller, 2017)

 Short explanations
 Abnormal causes

 What is an explanation?

An explanation is the answer to a why-question ( Miller 2017).

• Why did the treatment not work on the patient?


• Why was my loan rejected?

 Questions starting with “how” can usually be turned into “why” questions: “How was my loan rejected?” can be turned into “Why was my loan rejected?”.

51
Human-style Explanations
 What is a good explanation?

“Many artificial intelligence and machine learning publications and methods claim to be about ‘interpretable’ or
‘explainable’ methods, yet often this claim is only based on the authors intuition instead of hard facts and
research.” - Miller (2017)

Explanations are contrastive (Lipton 2016).
Explanations are selected.
Explanations are social.
Explanations focus on the abnormal (Kahneman 1981).
Good explanations are general and probable.

52
Interpretable Models
 Linear regression model

y = a0 + a1 x1 + a2 x2 + a3 x3 + …

 Assumptions (often not satisfied in real-world problems): linearity, normality, homoscedasticity, independence, fixed features, absence of multicollinearity.

 Logistic regression model

p(y = 1) = 1 / (1 + e^(-(a0 + a1 x1 + a2 x2 + a3 x3)))

 The interpretations always come with the clause that “all other features stay the same”.

Drawbacks (linear and logistic regression):
 Interactions have to be hand-crafted
 Poor performance in real-world tasks
 Fail when the relation between target and features is non-linear
 Fail when features interact with each other

 Decision Tree Model

Drawbacks (decision trees):
 Handling of linear relationships
 Lack of smoothness
 Unstable
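A minimal sketch of how interpretations are read directly off these two model classes (assuming scikit-learn; the data set is a placeholder):

# Sketch: reading interpretations from a logistic regression and a decision tree.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=5, random_state=0)  # placeholder data
names = [f"x{i+1}" for i in range(X.shape[1])]

logit = LogisticRegression(max_iter=1000).fit(X, y)
# each coefficient a_i is the change in log-odds per unit of x_i, all other features held equal
print(dict(zip(names, logit.coef_[0].round(3))))

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=names))   # human-readable if-then splits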
53
Model-agnostic methods

Partial Dependence Plot (PDP)


 Interpreting complex machine learning algorithms
 Graphical visualizations of the marginal effect of a given
variable (or multiple variables) on an outcome.
 Restricted to only one or two variables due to limit of
human perception.
 May mislead due to higher-order interactions.
 Assumption of independence is a big issue.

Individual Conditional Expectations (ICE)


 For a chosen feature, an individual conditional expectation plot draws one line per instance, representing how that instance's prediction changes when the feature changes (see the sketch below).

Credit: Scikit-learn
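A minimal sketch of the computation behind both plots (assuming numpy; model is any fitted estimator exposing predict_proba, and the feature index and value grid are illustrative):

# Minimal sketch of ICE curves and a partial dependence curve for one feature.
import numpy as np

def ice_and_pdp(model, X, feature, grid):
    # ICE: one curve per instance, varying `feature` over `grid` with everything else fixed
    ice = np.empty((X.shape[0], len(grid)))
    for j, v in enumerate(grid):
        Xv = X.copy()
        Xv[:, feature] = v
        ice[:, j] = model.predict_proba(Xv)[:, 1]
    # PDP: the average of the ICE curves (marginal effect of the feature)
    pdp = ice.mean(axis=0)
    return ice, pdp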

54
Research Challenges
Research Goal-I:
 Need for generic interpretability questions

What is Science?
 Sciences are primarily defined by their questions rather than by tools.

o Astrophysics: the discipline that learns the composition of stars, not the discipline that uses spectroscopes.

o Machine Learning Interpretability: should be a discipline that answers generic questions related to interpretability.

 The questions should fulfil the criteria of “Human style good explanations”

55
Research Challenges

Research Goal-III:

 Evaluating the interpretability

• Model sparsity: How many features are being used by the explanation?
• Monotonicity: Is there a monotonicity constraint?
• Interactions: Is the explanation able to include interactions of features?
• Cognitive processing time: How long does it take to understand the explanation?
• Feature complexity: What features were used for the explanation?
• Description length of explanation.

56
Research Plan

Formulation of general interpretability questions

• Which feature(s) are the most important ones in the context of classification or prediction? (Statistical)
• What is the range of values for the selected important features that discriminates a specific class? (Statistical)
• How did the model arrive at a certain decision? (Statistical and Transparency)
• How sensitive is the class output to a specific feature? (Statistical and Causal Intervention)
• How does the final class output change if we force some specific feature to take a value that is not observed in the data? (Causal Intervention)
• How effective is a specific feature in producing a specific class? (Causal Effectiveness)
• Which features (or their specific values), had they not occurred, would have prevented the specific class from occurring? (Counterfactual)

…….. ongoing

57
Tox21 dataset Chemical Feature

NAME | ACTIVITY | piPC8 | piPC9 | piPC10
NCGC00261443 | 0 | 6.752065 | 6.714265 | 6.822368
NCGC00261266 | 0 | 3.944006 | 3.446011 | 0
NCGC00261559 | 0 | 6.243256 | 6.376833 | 6.534198
NCGC00261121 | 0 | 6.210537 | 6.451753 | 6.655762
NCGC00261374 | 0 | 5.182836 | 0 | 0
NCGC00261612 | 0 | 5.153653 | 4.928611 | 4.878531
NCGC00261002 | 1 | 7.0973 | 7.192502 | 7.409491
NCGC00261311 | 0 | 5.281934 | 4.447053 | 3.66276

58
59
60
Post-Hoc interpretations

[Diagram: post-hoc interpretations: capture, learn, extract, inform.]

61
Literature Review

 Classification using k-nearest neighbours (KNN) and Support Vector Machines (SVM)
(Chavan, Friedman, & Nicholls, 2015; Kauffman & Jurs, 2001) (Ajmani, Jadhav, & Kulkarni, 2006) (Keiser et al., 2009)

(Chavan et al., 2015) KNN (Kauffman & Jurs, 2001) KNN (Ajmani et al., 2006) KNN
Train Test Features Train Test Features Train Test Features
94 24 8 314 74 30 3

Downsides of KNN:
 Scalability and computational cost issues (Deng & Zhao, 2017)
 Gives optimum accuracy only for a small data set with few features
 KNN can be used for relatively smaller data sets with few features

 A non-linear SVM can handle high-dimensional data but is not robust enough to handle the diversity of chemical descriptors (Svetnik et al., 2003).
 Mostly not state-of-the-art classification accuracy

A better classification model (Svetnik et al., 2003):
 Classification accuracy is high
 Can handle molecular diversity (large and diverse descriptors)
 Ease in training
 Interpretability

 Wolpert's No Free Lunch theorem (The Mathematics of Generalization; Wolpert, D. H., Ed.; Addison-Wesley: Reading, 1995)
Literature Review

 Classification using Random Forest (Svetnik et al., 2003) (Polishchuk et al., 2009) (Tong, Hong, Fang, Xie, & Perkins, 2003)
 Capable of handling large data sets with diverse features
 Not state-of-the-art accuracy on many cheminformatics data sets
 Less control over granular-level optimization

A better classification model (Svetnik et al., 2003): classification accuracy is high, can handle molecular diversity, ease in training, interpretability.
 Wolpert's No Free Lunch theorem

A Deep Neural Network is an Artificial Neural Network with more than one hidden layer between the input and output. (Bengio, 2009) (Schmidhuber, 2015) (Goodfellow, Bengio, Courville, & Bengio, 2016)

Conditions for a DNN (Nasrabadi, 2007) (Trunk, 1979) (Schmidhuber, 2015) (Goodfellow et al., 2016):
 Huge training data with many, many features (relevant and irrelevant)
 The network should contain more than one hidden layer with many neurons

Curse of dimensionality (Nasrabadi, 2007) (Trunk, 1979)


Increasing the accuracy of a shallow neural network

[Chart (axes: performance vs amount of data): curves for deep, medium and shallow neural networks and traditional learning algorithms. Annotation: “We can reduce this gap if we train the shallow neural network using an optimum number of relevant features.”]

(https://blogs.nvidia.com/wp-content/uploads/2016/07/Deep_Learning_Icons_R5_PNG.jpg.png)
Relevant Features Selection using Decision Tree

• The relative rank (i.e. depth) of a feature used as a decision node in a tree.

• Features used at the top of the tree contribute to the final prediction decision of a larger fraction of the input samples.

• The expected fraction of the samples they contribute to can thus be used as an estimate of the relative importance of the features; on average, the feature with the highest such contribution is the most important one.

• Decision trees use the Gini index to determine which variables go at the top (see the sketch below).
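A minimal sketch of reading these Gini-based importances off a fitted tree and keeping, say, the above-average features (assuming scikit-learn; the data set and the mean-importance cut-off are illustrative):

# Sketch: Gini-based feature importance from a decision tree (scikit-learn).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)  # placeholder data
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

importances = tree.feature_importances_        # Gini importance of each feature
ranked = np.argsort(importances)[::-1]         # most important features first
keep = ranked[importances[ranked] > importances.mean()]  # e.g. keep above-average features
print("top features:", ranked[:10], "kept:", len(keep))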
66
