• Basal-like (Basal)
• Luminal A (LumA)
• Luminal B (LumB)
• HER2-enriched (HER2)
• Normal-like (Normal)
The Goals of This Study
• To increase the overall prediction accuracy (many of the
genes are irrelevant or redundant; using all 13,582 genes,
accuracy = 77.84%)
• The resulting model is expected to identify the cancer
subtype of a new, unseen instance by
– Selecting the minimum subset of genes
– Yielding the highest classification accuracy
Dataset
• The dataset consists of 13,582 genes (features)
• The dataset contains 158 instances
• Labeled into five classes:
Basal, Her2, LumA, LumB, and Normal.
Machine Learning Tool Used
• Weka: the Waikato Environment for Knowledge
Analysis
• Implemented in Java
• Supports clustering, classification, regression, feature
selection, and visualization
The Proposed Model
• {D1, D2, D3, D4, D5} to {D1, D2}.
• {D1, D2, D3, D4, D5} to {D2, D3, D4, D5}
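One way to read this decomposition, together with the node order reported in the evaluation slides (Basal, Her2, Normal, LumA), is as a cascade of binary problems: each node separates one subtype from the classes still remaining, and whatever is left after the last node is LumB. The sketch below illustrates that reading only. The function names `build_node_tasks` and `classify` are hypothetical, and the cascade logic is an assumption rather than the authors' Weka workflow.

```python
# Node order follows the evaluation slides; the cascade itself is an assumption.
NODE_ORDER = ["Basal", "Her2", "Normal", "LumA"]


def build_node_tasks(labels, node_order=NODE_ORDER):
    """Return one binary task per node as (target, sample_indices, binary_labels)."""
    remaining = list(range(len(labels)))
    tasks = []
    for target in node_order:
        # 1 = sample belongs to this node's subtype, 0 = any remaining subtype.
        binary = [1 if labels[i] == target else 0 for i in remaining]
        tasks.append((target, list(remaining), binary))
        # Samples of the target subtype are not passed to the next node.
        remaining = [i for i in remaining if labels[i] != target]
    return tasks


def classify(sample, node_predictors, node_order=NODE_ORDER, fallback="LumB"):
    """Walk the cascade; the first node that answers True decides the subtype."""
    for target in node_order:
        if node_predictors[target](sample):
            return target
    return fallback
```

Here `node_predictors` is a hypothetical mapping from subtype name to any callable that returns True or False for a sample, for example a trained binary classifier wrapped in a function.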
Feature Selection & Classification
The Tree-based Model
Features Plot
The Evaluation Results of the First Node (Basal Subtype)
The Evaluation Results of the Second Node (Her2 Subtype)
The Evaluation Results of the Third Node (Normal Subtype)
The Evaluation Results of the Fourth Node (LumA Subtype)
Results:
• The model identified a total of 23 genes from 13,582
genes
• AGR2, TFF3
• This result could reduce time and cost, as only a few
genes need to be processed and analyzed.
The Comparison with the Study by Rezaeian et al.
Deep Neural Networks
• A deep neural network (DNN) is an artificial neural
network (ANN) with multiple layers between the input
and output layers
• Deep neural networks for classification
https://en.wikipedia.org/wiki/Deep_learning#Deep_neural_networks
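To make "multiple layers between the input and output layers" concrete, the following is a minimal pure-Python sketch of a forward pass through a small feed-forward network for binary classification. The ReLU hidden activations, the sigmoid output, and all function names are illustrative assumptions; the slides do not describe the architecture actually used.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, hidden_layers, output_layer):
    """Pass x through every hidden layer, then a single sigmoid output unit."""
    h = x
    for weights, biases in hidden_layers:
        h = relu(dense(h, weights, biases))
    out_weights, out_biases = output_layer
    # The result can be read as the probability of the positive class.
    return sigmoid(dense(h, out_weights, out_biases)[0])
```

In practice the weights are learned by training; here they would be supplied by hand just to trace how an input flows through the layers.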
Machine Learning Steps Using DNN
• Data collection
• Data cleaning – in pandas
• Data preparation – in NumPy and scikit-learn
• The DNN extracts information from the data – using Keras
• GPUs make the process faster
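The data preparation step can be sketched in plain Python as follows: relabel the five classes into a one-vs-rest binary target, standardize each gene's values, and hold out a test set. The helper names and the 30% hold-out fraction are assumptions; the slides do not state the split, although 30% of 158 instances rounds up to 48, the number of instances reported for each classifier.

```python
import math
import random


def one_vs_rest(labels, positive):
    """Binary target: 1 for the chosen subtype, 0 for every other subtype."""
    return [1 if y == positive else 0 for y in labels]


def standardize(column):
    """Scale one gene's expression values to zero mean and unit variance."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(variance) or 1.0  # constant columns are left at zero
    return [(v - mean) / std for v in column]


def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_fraction * n_samples)  # assumed 30% hold-out
    return indices[n_test:], indices[:n_test]
```

With scikit-learn available, `train_test_split` and `StandardScaler` cover the same ground; the version above only spells out what those steps do.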
Classifiers Building
• TensorFlow is among the most widely used libraries for
developing deep learning models
• We used Keras, a high-level API built on
TensorFlow, to implement our neural network
• The classifiers were built in Google Colab
DNN Binary Classification
Results:
• Basal: confusion matrix [[7, 1], [0, 40]]; number of instances 48; accuracy 97.91%
• Her2: confusion matrix [[8, 0], [4, 36]]; number of instances 48; accuracy 83.33%
• Normal: confusion matrix [[0, 0], [1, 47]]; number of instances 48; accuracy 97.91%
• LumA: confusion matrix [[17, 1], [8, 22]]; number of instances 48; accuracy 81.25%
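Accuracy is the number of correctly classified instances (the diagonal of the confusion matrix) divided by the total number of instances. As a small worked check, the sketch below recomputes it for the Basal and LumA matrices quoted above. The function name `accuracy` is illustrative and not taken from the slides.

```python
def accuracy(confusion_matrix):
    """Correct predictions (diagonal entries) divided by all predictions."""
    correct = sum(confusion_matrix[i][i] for i in range(len(confusion_matrix)))
    total = sum(sum(row) for row in confusion_matrix)
    return correct / total


basal = [[7, 1], [0, 40]]   # 47 of 48 instances on the diagonal
luma = [[17, 1], [8, 22]]   # 39 of 48 instances on the diagonal

basal_accuracy = accuracy(basal)  # 47 / 48
luma_accuracy = accuracy(luma)    # 39 / 48 = 0.8125
```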