Investigating the Efficiency of Different Convolutional Neural Network Architectures and Activations at Morphological Classification of Telescopic Images

Research Question: To what extent are Convolutional Neural Networks based on DenseNet and ResNet architectures and Decision Trees efficient at making multi-class morphological classifications of RGB Telescopic Images?


Topic: Convolutional Neural Networks and Image Classification

Subject: Computer Science

Session: May 2023

Word Count: 3998

Table of Contents

Introduction
Background Research
    Other Approaches
    Neural Networks
    CNNs
    ResNet
    DenseNet
    Overfitting
    Decision-trees
    Galaxy-morphology
Methodology
    Dataset and Data Format
    Variables
    Method
    Hypothesis
Results
    Results Tables
    Analysis
Conclusions
    Evaluation of the Method and Extensions
    Final Conclusions
Works Cited
Appendix
    Appendix A: Data Download and Processing Code
    Appendix B: Models Code
    Appendix C: Result Generation Code
    Appendix D: Training Graphs
    Appendix E: Results on Data Point (GalaxyID: 100801)
    Appendix F: Imports and Requirements

Introduction:

Machine Learning (ML) is a dynamic field focused on learning from large amounts of data without explicit programming (IBM Cloud). ML is a subfield of Artificial Intelligence (AI), which attempts to computationally emulate human behavior to solve a broad range of complex problems (Roy). ML can be applied to automate processes in various fields, often with greater efficiency and accuracy than humans.

Neural Networks (NNs) are algorithms that are made of interconnected units called "Neurons" and simulate the way mammalian brains process data. With recent developments in computing technologies, like Graphics Processing Units, Deep ML (Deep Learning) has grown immensely, with notable advancements in Computer Vision (CV) and Natural Language Processing (NLP), which deal with unstructured data (language/images) rather than tabular forms (Coursera). NNs like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have proven especially successful at these tasks.

Telescopes worldwide collect millions of images, and with more advanced telescopes and better storage systems, this is expected to increase. Such data can be used to detect exoplanets and galaxies many light years away. However, precise analysis by experienced astronomers is required to detect and classify objects of interest in these images. Often, multi-step morphological classifications must be done (classifications relating to aspects of the internal structure of astronomical bodies that can be deduced through detailed analysis of images) (Li et al.).

Fig.1: Galaxy-Zoo Dataset

The Galaxy-Zoo (Fig.1) is a crowdsourced initiative by 90,000+ volunteers, with 1 million+ images of galaxies taken by the Sloan Digital Sky Survey, classified into types based on shape, color, and direction. Because of the large volumes of data collected, and the depth of processing and attention to detail required to perform these classifications, this is a perfect application of Deep Learning technology.


Due to the large data volume and high cost of computing power, there is often a tradeoff between the accuracy achieved and the computing resources (like time/power) used. Therefore, efficiency, which combines accuracy and computing resource usage and thereby captures the real-world applicability of these models, must be evaluated (IBM).

This study aims to assess the applicability of different models, based on two CNN architectures, ResNet and DenseNet, along with Decision-tree approaches, to the processing of large telescopic image datasets for their morphological classification. This is a key emerging problem in the field of astronomy, due to the increased availability of data. By generalizing the specific architectures and model traits that suit the task of morphological classification of telescopic images, such explorations could transform the field of astronomy by improving the way telescopic data is processed, putting it to better use.

Hence, the Research Question (RQ): To what extent are Convolutional Neural Networks based on DenseNet and ResNet architectures and Decision Trees efficient at making multi-class morphological classifications of RGB Telescopic Images?

Background Research

Other Approaches

On Kaggle's Galaxy Zoo challenge, 320+ teams competed; NNs were not as developed then, but the top 3 participants used primitive NNs with heavy image processing and augmentation techniques. The highest-performing model had an RMSE (Root-Mean-Squared-Error) of 0.07466, which has since been improved progressively. No works, however, directly compare different architectures, nor Decision-tree and non-Decision-tree approaches, which is where this paper builds upon current research.

Neural Networks

Neural Networks (NNs) are ML algorithms that simulate mammalian brains and are suited to deep-learning tasks. NNs rely on having data to train on, as they improve their performance on real-world tasks over time, mirroring how brains learn from experience and practice (Hardesty).

NNs work on the principle of creating features from data and then using a weighted combination of those features to generate output. NNs can be used for classification and regression tasks (Babak). Morphological classification requires a modified application of classification-based NNs.

NNs work on the premise of hidden layers, as depicted in Fig.2, consisting of neurons ("nodes"). Nodes are stacked in "layers" that are connected to each other. The number of layers determines the NN's depth and, hence, the complexity of the task it can solve. Nodes in a layer are connected to every node in the following and previous layers, but not to each other (Bishop).

Fig.2: Barebones NN architecture; Source: (Martin)

A single node in a NN applies two functions. Firstly, it applies a set of weights (W) and biases (b) to the output of every node in the previous layer, adding them up. Secondly, it applies a non-linearity called an activation function on the summed outputs. Without the non-linearity, the entire NN could be abstracted to a single linear function.

An NN node's output can be defined as the following (using Table 1) (Russo):

$$x_n^L = g\left(\sum_{i=1}^{I_{L-1}} W_i \, x_i^{L-1} + B\right)$$
Table 1: Definitions of abbreviations used in formulas

| Abbreviation | Definition |
|---|---|
| $a^{[L]}$ | output (activation) of layer L |
| $g(x)$ | activation function (like ReLU) |
| $Z^{[L]}$ | output of the linear component of layer L after the application of its weights and biases/filters |
| $x_n^L$ | output of the nth node of the Lth layer |
| $I_L$ | number of nodes in the Lth layer |
| $B$ | bias |
| $W_i$ | ith weight |
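To make the formula concrete, the following is a minimal NumPy sketch (illustrative, not the candidate's code) of one fully-connected layer applying weights, a bias, and a ReLU activation:

```python
import numpy as np

def relu(z):
    # g(x): the non-linearity applied to the summed outputs
    return np.maximum(0, z)

def dense_layer(x_prev, W, b):
    # x_prev: activations of the previous layer, shape (I_{L-1},)
    # W: weight matrix, shape (I_L, I_{L-1}); b: biases, shape (I_L,)
    # Node n computes g(sum_i W[n, i] * x_prev[i] + b[n])
    return relu(W @ x_prev + b)

x = np.array([0.2, 0.7, 0.1])    # previous layer with 3 nodes
W = np.random.randn(4, 3) * 0.1  # next layer with 4 nodes
b = np.zeros(4)
print(dense_layer(x, W, b))      # 4 activations passed to the next layer
```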

Training a NN involves determining the optimal values of the weights and biases for every node, using gradient descent. Every training task needs a loss function, which measures the model's performance at the task: it involves defining a cost function that measures performance on a single data point, which is then summed across examples to calculate a final loss (Russo).

Fig.3: Gradient descent visualized; Source: (Jain)

Gradient descent is performed in a series of weight updates, as the values of the weights are changed based on the loss function's derivative with respect to the weights, to eventually find a combination of weights that reaches the loss function's minima (Fig.3) (Ruder).
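For intuition, a minimal sketch of this update rule on a one-dimensional quadratic loss (illustrative only; the actual trials use Keras's RMSProp optimizer, per Table 8):

```python
# Minimize L(w) = (w - 3)^2 by repeatedly stepping against the gradient dL/dw = 2(w - 3)
w = 0.0              # initial weight
learning_rate = 0.1  # step size for each weight update

for step in range(50):
    gradient = 2 * (w - 3)         # derivative of the loss w.r.t. the weight
    w -= learning_rate * gradient  # weight update

print(round(w, 4))  # converges toward the minimum at w = 3
```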
CNNs

Regular NNs were designed for tabular data, with initial features defined by the data scientist. However, image data is stored differently, as three 2D arrays corresponding to each pixel's RGB intensities (Awati).

Image processing using typical NNs has not yielded positive results, since each pixel's RGB values must each become an individual input feature into the NN, leading to the loss of location-based information and unrealistically high computational cost (Arnaldo). Therefore, a modified NN architecture called the Convolutional Neural Network (CNN) is used.

Unlike a regular NN's weights and biases, the CNN relies on Convolution and Pooling layers. These help the NN better extract features from the image, on which successful classifier/regression algorithms can eventually be run.



The Convolution Operator

The convolution-operator is an alternate way of applying weights to image data. It replaces weight matrices with filters that are applied to all parts of the image (Hirschman and Widder). Every pixel in the output is the sum of the filter applied to a region of pixels in the input image (Fig.4).

Fig.4: The convolution operator; Source: (Neutelings)

A CNN applies the fundamentals of gradient descent to optimize the filters' values, and multiple filters of the same size are usually applied to an image in one Convolutional Layer. Each convolution generates a smaller 2D array, and multiple arrays are stacked as outputs to a layer.
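A minimal NumPy sketch of the operator just described (a single filter, no padding or striding, written for clarity rather than speed):

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the filter over every valid position; each output pixel is the
    # sum of the elementwise product of the filter and an image region.
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_filter = np.array([[1., 0., -1.]] * 3)   # a simple vertical-edge filter
print(convolve2d(image, edge_filter).shape)   # (3, 3): the output is smaller
```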
The Pooling Operator

The pooling-operator also reduces the size of images, while extracting main features (Yu et al.). A square region of pixels is replaced by a single pixel (Fig.5). There are multiple forms of pooling used; most commonly, max-pooling, where the output pixel is the maximum value from the input pixels.

Fig.5: The max pooling operator; Source: (Chen et al.)
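A corresponding sketch of 2x2 max-pooling, assuming the input's height and width are divisible by the pool size:

```python
import numpy as np

def max_pool2d(image, pool=2):
    # Replace each non-overlapping pool x pool region with its maximum value.
    h, w = image.shape
    return image.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 1.],
              [0., 1., 5., 6.],
              [2., 2., 7., 8.]])
print(max_pool2d(x))  # [[4. 2.], [2. 8.]]
```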

CNN Architecture

After using multiple CNN-specific layers, a complete Convolutional architecture (Fig.6) also uses regular NN layers (Krizhevsky et al.). After every convolution, the CNN applies a non-linearity (activation) to every pixel.

Fig.6: A typical Convolutional architecture; Source: (Garcia-Ordas et al.)

Since max-pooling and convolutional layers decrease the size of the image and increase the number of images, Convolutional bases create bottlenecks to learn the image's features (Song et al.). After these layers, a CNN ends with a fully-connected (regular NN) layer, hence flattening the features into a single vectorized input. The second component of a complete network is a final activation layer (Hao et al.), like Sigmoid or SoftMax, that converts the image features into usable output (here, probabilities).

Different layers, filter sizes, pooling, etc. can be combined to create different convolutional architectures. Various architectures have been developed over years of research, optimized for different tasks. Two prominent convolutional architectures will be compared in this paper, along with two morphology-specific activation layers.
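Putting these pieces together, a minimal Keras sketch of the generic architecture described above (an illustrative stack, not one of this paper's trial models):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Convolutional base: convolutions + pooling shrink each feature map while
# stacking more of them, followed by a fully-connected head.
model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(120, 120, 3)),
    layers.MaxPooling2D(2),                   # 2x2 max-pooling
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),                         # vectorize the features
    layers.Dense(64, activation="relu"),      # fully-connected layer
    layers.Dense(37, activation="sigmoid"),   # final activation -> probabilities
])
model.summary()
```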



ResNet

ResNet (He et al.) was developed in response to the vanishing gradient problem, where gradients became infinitesimally small with network depth, preventing data scientists from training very deep networks for complex relationships. ResNet leverages the concept of Residuals, where an identity connection is used to skip certain layers. Fig.7 shows a residual block, central to ResNet architectures. The inputs undergo a linear layer, then an activation (ReLU), and another linear layer, after which the original inputs are added again before the data undergoes another activation (Abdullah). This network is one of the most-cited of the 21st century, and its effectiveness/credibility is therefore attested to.

Fig.7: A sample 2-layer ResNet block with skip connection; Source: Candidate

The effects of the residual skip connection within a single block are therefore represented by:

$$a^{[L]} = g\left(Z^{[L]} + a^{[L-2]}\right)$$
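A minimal Keras sketch of such a residual block, assuming the input and output widths match so the skip connection can be added directly:

```python
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, units=64):
    # Two weight layers with an identity skip: a[L] = g(Z[L] + a[L-2])
    shortcut = x                                   # the identity connection
    h = layers.Dense(units, activation="relu")(x)  # linear layer + ReLU
    h = layers.Dense(units)(h)                     # second linear layer (Z[L])
    h = layers.Add()([h, shortcut])                # add the original inputs back
    return layers.Activation("relu")(h)            # final activation g(...)

inputs = keras.Input(shape=(64,))  # 64 features so shapes match for the Add
block = keras.Model(inputs, residual_block(inputs))
```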



DenseNet

DenseNet (Zhang et al.) was another architecture developed to address vanishing gradients. It leverages the same skip connections to a greater extent, where each layer within a block is connected to the activations of every previous layer. Similar to ResNet, all inputs go through a linear layer, followed by an activation (ReLU) and another linear layer, after which the activations of all previous layers are added before the data goes through another activation (Fig.8).

Fig.8: A sample 4-layer DenseNet block with skip connections¹; Source: Candidate

¹ This schematic refers to "weight layers" for convenience, but these are convolutional and pooling layers in reality. The weight operation referred to in the formula also refers to the convolution operator, not direct multiplication.

The effects of the dense skip connections within a single dense block are shown (Rodriguez-Urrego):

$$a^{[L]} = g\left(Z^{[L]} + \sum_{k=2}^{K} a^{[L-k]}\right)$$

Usually, there is also a weight matrix, where previous activations are multiplied by trainable weights when being added (Muthukrishnan):

$$a^{[L]} = g\left(Z^{[L]} + \sum_{k=2}^{K} W^{[L-k,L]} * a^{[L-k]}\right)$$
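A minimal Keras sketch of a dense block in this additive form (following the formula above, and assuming the input already has `units` features so all the additions line up; note that the published DenseNet concatenates activations rather than summing them):

```python
from tensorflow import keras
from tensorflow.keras import layers

def dense_block(x, num_layers=4, units=64):
    # Additive dense skips: a[L] = g(Z[L] + sum over k>=2 of a[L-k])
    activations = [x]                             # treat the block input as a[0]
    for _ in range(num_layers):
        z = layers.Dense(units)(activations[-1])  # Z[L], from the previous layer
        skips = activations[:-1]                  # a[L-k] for k >= 2
        summed = layers.Add()([z] + skips) if skips else z
        activations.append(layers.Activation("relu")(summed))
    return activations[-1]

inputs = keras.Input(shape=(64,))  # input width equals `units` so the Adds line up
block = keras.Model(inputs, dense_block(inputs))
```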

Comparison of the Architectures:

Both DenseNet and ResNet make claims to higher efficiency, and subsequent reports provide differing accounts based on the application (Haris). These comparisons often come from blog posts or unpublished ML researchers, so their validity is questionable. However, even papers published in revered CV journals have not reached a consensus (Chen et al.). Therefore, it is pertinent to compare and analyze these networks in a context where deep CNNs are needed.



Overfitting:

An important consideration when training is overfitting, where the model learns the labels of the training data without actually improving on the task. In practice, the point of overfitting ("Stop Here" in Fig.9) is an approximate region, identified by general consensus of ML researchers (Ying et al.), as the point usually at minimum validation loss, beyond which training will worsen the performance on unseen data.

Fig.9: Overfitting over epochs; Source: (Jordan)
ai
Decision-trees

Morphological classification of astronomical images is more complex than simple classification due to multi-step decision processes (Hocking et al.). To do this, models must leverage another primitive form of predictive modeling: Decision-trees, which use simple if-else conditions.

Fig.10: Visual of the Galaxy-morphology decision tree; Source: Willett et al.

Integrating a Decision-tree with a NN involves predicting the answers to the conditionals at each stage with deep learning, and then following the Decision-tree's framework. A Decision-tree provides multiple paths to reach the final classification. Fig.10 visually displays the Galaxy-morphology Decision-tree.



Galaxy-morphology

Specific to Galaxy-Zoo, the morphological classification of galaxies is a system designed to group galaxies by their visual appearance (Kaggle). This dataset involves the classification of intricate morphological features like bulge, spiral, shape, etc. (Fig.12). The leveraged system of morphological classification uses a complex Decision-tree that answers multiple questions, each with multiple options, in sequential order (Willett et al.). Generating probabilities for each possibility will eventually help generate the most probable path to the end, and can eventually classify the galaxy image. The next question that the user faces is determined based on the response to every preceding question, as detailed in Fig.11.

Fig.11: Galaxy-morphology task and decision tree schematic; Source: Willett et al.

Fig.12: Galaxy-morphology sample classifications; Source: (Malek)



Methodology

The RQ measures efficiency of different model architectures; therefore, both performance and resource usage must be considered (IBM).

The RQ requires evaluation of these models on the morphological classification of telescopic images, and the Galaxy-Zoo Dataset will be used as a representative of these tasks.

Primary experimental data from the training and evaluation of actual models is analyzed because it allows more freedom of manipulation of the IVs. However, this constrains the analysis within available computational power and technical limitations.

Dataset and Data Format

Dataset:

The dataset, with 61578 images, is loaded and processed from Kaggle into Python and read using Pandas (Appendix A). It is split into Training (train), Testing (test), and Validation (val) sets, with 70%, 15%, and 15% proportions respectively.
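A minimal sketch of such a split with pandas and scikit-learn (the file name is an assumption; the candidate's actual loading code is in Appendix A):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical label file: one row per galaxy, 37 probability columns.
labels = pd.read_csv("training_solutions_rev1.csv")

# 70% train; the remaining 30% is split in half into 15% val / 15% test.
# random_state=10 mirrors the random seed noted in Table 8.
train, holdout = train_test_split(labels, test_size=0.30, random_state=10)
val, test = train_test_split(holdout, test_size=0.50, random_state=10)
print(len(train), len(val), len(test))
```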


Images:

The images are 424x424 colored JPEG files (Fig.13). Each image is centered around an isolated galaxy, captured by the Apache Point Telescope.

Fig.13: Sample Galaxy Zoo images (GalaxyIDs 883411, 520112, 327708, 282041); Source: Galaxy Zoo Dataset (Kaggle)

These are resized to 120x120 using interpolation to conserve processing power, without significantly changing the extent to which morphological features are visible (May).
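A sketch of this resizing step using Pillow's interpolation (file paths are placeholders):

```python
from PIL import Image

img = Image.open("images_training_rev1/100801.jpg")       # 424x424 original
small = img.resize((120, 120), resample=Image.BILINEAR)   # interpolated downscale
small.save("resized/100801.jpg")
```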

Labels:

Table 2: Total output class distribution per question

| Question | Number of Classes | Question | Number of Classes |
|---|---|---|---|
| 1 | 3 | 7 | 3 |
| 2 | 2 | 8 | 7 |
| 3 | 2 | 9 | 3 |
| 4 | 2 | 10 | 3 |
| 5 | 4 | 11 | 6 |
| 6 | 2 | Total Classes | 37 |

There are 37 columns to each galaxy image label, which correlate to all the possible responses from users (broken down in Table 2), and the dataset provides the probability of each of these classes for every training example (Fig.14). Each galaxy is assigned a unique GalaxyID that corresponds to the image and row in the dataset.

Fig.14: Sample labels for data; Source: Candidate's Code (Appendix A)


Exploring a Single Data Point:

The 120x120 image, along with the probabilities (Table 3), for the image with GalaxyID 100801 is chosen randomly to understand the structure and correlation of each data point.

Table 3: Data point GalaxyID 100801

| Class | Prob. | Class | Prob. | Class | Prob. | Class | Prob. |
|---|---|---|---|---|---|---|---|
| 1.1 | 0.035 | 5.1 | 0.000 | 8.1 | 0.040 | 10.1 | 0.600 |
| 1.2 | 0.965 | 5.2 | 0.551 | 8.2 | 0.000 | 10.2 | 0.277 |
| 1.3 | 0.000 | 5.3 | 0.414 | 8.3 | 0.040 | 10.3 | 0.000 |
| 2.1 | 0.000 | 5.4 | 0.000 | 8.4 | 0.079 | 11.1 | 0.092 |
| 2.2 | 0.965 | 6.1 | 0.198 | 8.5 | 0.000 | 11.2 | 0.230 |
| 3.1 | 0.252 | 6.2 | 0.802 | 8.6 | 0.040 | 11.3 | 0.092 |
| 3.2 | 0.713 | 7.1 | 0.007 | 8.7 | 0.000 | 11.4 | 0.000 |
| 4.1 | 0.875 | 7.2 | 0.028 | 9.1 | 0.000 | 11.5 | 0.138 |
| 4.2 | 0.090 | 7.3 | 0.000 | 9.2 | 0.000 | 11.6 | 0.322 |
|  |  |  |  | 9.3 | 0.000 |  |  |

Image 100801

All probabilities >0.300 were flagged as likely answers (Table 3). Using these, we can interpret these labels to deduce the most probable sequence of answered questions (Table 4), using Fig.11.

This data point appears to have one clear probable sequence of answers; however, this is not necessarily true for others.

Table 4: Probable question sequence for GalaxyID 100801

Q1: Is the galaxy simply smooth and rounded with no sign of a disk?
    Class 1.2: Features or Disk (0.965) -> Q2
Q2: Could this be a disk viewed edge on?
    Class 2.2: No (0.965) -> Q3
Q3: Is there a sign of a bar feature through the center of the galaxy?
    Class 3.2: No (0.713) -> Q4
Q4: Is there any sign of a spiral arm pattern?
    Class 4.1: Yes (0.875) -> Q10
Q10: How tightly wound do the arms appear?
    Class 10.1: tight (0.600) -> Q11
Q11: How many spiral arms are there?
    Class 11.6: Can't say (0.322) -> Q5
Q5: How prominent is the central bulge compared to the rest of the galaxy?
    Class 5.2: just noticeable (0.551) -> Q6
    Class 5.3: obvious (0.414) -> Q6
Q6: Is anything odd?
    Class 6.2: No (0.802) -> end

Variables

Independent Variables:

Network Architecture (Convolutional Base):

The first part of the network is the convolutional base, which extracts "features" from the data. This will be loaded using Keras's prebuilt models (Fig.15). The DenseNet(121) and the ResNet(101) Convolutional architectures will be compared.

Fig.15: Defining the convolutional base; Source: Candidate's Code (Appendix B)
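A minimal sketch of loading such a prebuilt base from keras.applications (an illustrative stand-in for the candidate's Appendix B code):

```python
from tensorflow.keras.applications import DenseNet121, ResNet101

# include_top=False keeps only the convolutional base (no classifier head);
# weights="imagenet" gives the pretrained starting point noted in Table 8.
base = DenseNet121(include_top=False, weights="imagenet",
                   input_shape=(120, 120, 3), pooling="avg")
# Swap in ResNet101(...) with the same arguments for the ResNet trials.
```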



Activation

The final part of the model is the activation function, which converts the features detected by the convolutional base into probabilities. The effect of two different types of activation functions will be measured on model performance. Each activation will be combined with both convolutional bases.

Sigmoid:

This approach uses 37 separate sigmoid activation nodes, which generate a value between 0 and 1 for every single class (Fig.16), independent of the probabilities of any other classes. This was adapted from (Limanas), a submitter to the competition, but has been used by most future published NN approaches to the task.

Fig.16: Sigmoid activation function; Source: (Ali)
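A sketch of how such a 37-node sigmoid head could be attached to a convolutional base (an assumption of the general shape; the candidate's version is in Appendix B):

```python
from tensorflow import keras
from tensorflow.keras import layers

# `base` is the pretrained convolutional base from the previous sketch.
inputs = keras.Input(shape=(120, 120, 3))
features = base(inputs)                                      # pooled feature vector
outputs = layers.Dense(37, activation="sigmoid")(features)   # one independent probability per class
model = keras.Model(inputs, outputs)
model.compile(optimizer="rmsprop", loss="mse")               # RMSProp + MSE, per Table 8
```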


Decision-tree:

Table 5: Probability matrix with Decision-tree based weightages per question

| Question | Multiplier* |
|---|---|
| 1 | 1.0 (100% probability) |
| 2 | P(Q1=1) |
| 3 | P(Q2=1) |
| 4 | P(Q3) = P(Q2=1) |
| 5 | P(Q4=1) \|\| P(Q11) = P(Q4=1) \|\| P(Q10) = P(Q4=1) \|\| P(Q4=0) = P(Q4) = P(Q3) = P(Q2=1) |
| 6 | P(Q7) \|\| P(Q5) \|\| P(Q9) = P(Q1=0) \|\| P(Q2=1) \|\| P(Q2=0) = P(Q1=0) \|\| P(Q2) = P(Q1=0) \|\| P(Q1=1) = not(P(Q1=2)) |
| 7 | P(Q1=0) |
| 8 | P(Q6=0) |
| 9 | P(Q2=0) |
| 10 | P(Q4=0) |
| 11 | P(Q10) = P(Q4=0) |

*Indices of classes start from 0

Based on the Decision-tree in Fig.11, the alternative is that the probability of a volunteer facing a particular question must also be considered in the probability of a certain answer. Therefore, a probability matrix (Table 5) is created to determine the relative probabilities of each question. The weightages are multiplied by the probabilities generated by a SoftMax (a combination of Sigmoids that sum to 1) for each question (Fig.17). This approach is adapted from (Kawaguchi), who submitted to the competition. However, this source is not published and not necessarily credible; therefore the correctness of this design is not guaranteed.

Fig.17: Defining a Decision-tree in code; Source: Candidate's Code (Appendix B)
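A simplified sketch of this weighting idea for the first two questions only (hypothetical tensor slicing; the candidate's full 11-question version is in Appendix B):

```python
import tensorflow as tf

def decision_tree_weighting(logits):
    # logits: (batch, 37) raw outputs, sliced per question.
    # Q1 (3 classes): softmax with multiplier 1.0, as every volunteer answers it.
    q1 = tf.nn.softmax(logits[:, 0:3])
    # Q2 (2 classes): softmax weighted by the probability of reaching Q2,
    # which per Table 5 is P(Q1=1) ("features or disk", index 1 of Q1).
    q2 = tf.nn.softmax(logits[:, 3:5]) * q1[:, 1:2]
    # ...the remaining questions would follow their Table 5 multipliers...
    return tf.concat([q1, q2], axis=1)
```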

Dependent Variables (DVs):

There are three DVs: performance, resource usage, and combined efficiency, which can be quantified in different ways.

Firstly, the "final" model at the end of 40 epochs will be saved, which standardizes the number of weight updates and training the model undergoes.

Secondly, the "optimal" model will be saved at the point of overfitting, which is theoretically the best performance the model will achieve on unseen data, allowing the models' "best-possible" versions to be compared.

The following will be measured and compared across IVs to answer the RQ (Appendix B).

Model Performance:

Quantitative measures: Optimal test loss ($Test\,MSE_{opt}$), Final test loss ($Test\,MSE_{40}$), measured using custom callbacks (Appendix C).

The loss (MSE) on the testing data will be found at both the Final and Optimal stages, using Keras's evaluate function. This measures the performance of the model on unseen data, and simulates the model's performance in real-world application. Low MSE values are good indicators of accuracy in multi-class prediction. RMSE ($=\sqrt{MSE}$) approximates the average difference between real and predicted probability.

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$

where $n$ = number of training examples, $Y_i$ = ith label, and $\hat{Y}_i$ = ith prediction.

These metrics answer the accuracy and performance part of the research question, but further analysis is needed to evaluate the efficiency of the models.
Resource Usage:

Quantitative metrics: Training time to final ($T_{40}$), training time to optimal ($T_{opt}$), epochs needed to reach optimal ($E_{opt}$), number of parameters ($p$), time per epoch ($T/E$), measured using custom callbacks (Appendix C).

The time taken to reach the final model will be measured and divided by 40 to find the average time per epoch, which represents temporal resource use.

The time/number of epochs taken to reach the optimal model will be measured, representing the resources needed to reach optimum. Time is measured using Python's time library.

The number of parameters in the model also indicates how much training/storage it requires for a dataset.
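A minimal sketch of such a custom callback (an assumption of its general shape; the candidate's version is in Appendix C), timing each epoch and saving weights at the minimum validation loss:

```python
import time
from tensorflow import keras

class TimingCallback(keras.callbacks.Callback):
    """Record per-epoch wall-clock time and the epoch of minimum val loss."""
    def on_train_begin(self, logs=None):
        self.epoch_times, self.best_val, self.best_epoch = [], float("inf"), 0

    def on_epoch_begin(self, epoch, logs=None):
        self.start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_times.append(time.time() - self.start)
        if logs and logs.get("val_loss", float("inf")) < self.best_val:
            self.best_val = logs["val_loss"]       # new optimal (pre-overfitting) point
            self.best_epoch = epoch
            self.model.save_weights("optimal.h5")  # snapshot of the "optimal" model
```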

Training Efficiency:

Quantitative metrics: loss gain per epoch ($\frac{\Delta MSE}{E}$), loss gain per training time ($\frac{\Delta MSE}{T}$), overfitting.

By processing the raw data (Table 6), loss gain/epoch and loss gain/time are calculated. Both show how the model trains given the same resources till it reaches optimum, quantifying the accuracy/resource tradeoff, partly answering the RQ. The train & val loss/time graphs can be analyzed to visually evaluate efficiency in training.

Table 6: Processing the data

| Quantity | Formula |
|---|---|
| Gain in loss | $\Delta Train\,MSE = Train\,MSE_{opt} - Train\,MSE_0$ |
| Gain in loss/epoch | $\frac{\Delta MSE}{E} = \frac{\Delta Train\,MSE}{E_{opt}}$ |
| Gain in loss/time | $\frac{\Delta MSE}{T} = \frac{\Delta Train\,MSE}{T_{opt}}$ |
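As a worked check of these formulas: for Trial 2 (see Tables 9-11), $\Delta Train\,MSE = 0.00641 - 0.34153 = -0.33512$, giving a magnitude of loss gain per epoch of $0.33512 / 18 \approx 0.01862$ and per second of $0.33512 / 1410.2 \approx 0.0002376$, matching the values reported in Table 11.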

However, training efficiency is not the only important factor in the model's real-world applicability. A model can train very well, but if it overfits the data easily, it won't perform well on unseen data. The difference between val/test and train loss at different stages, and a visual inspection of the divergence in val/train loss from the graphs, helps quantify this.
Method

Using the following model specifications, 4 trials (Table 7) will be compiled, trained, and evaluated to produce a DV data table using Python, to effectively compare across IVs, as documented in Appendix B. Materials are in Appendix F.

Table 7: Different trials conducted

| Trial | Type | Depth | Decision-tree |
|---|---|---|---|
| T1 | ResNet | 101 | False |
| T2 | DenseNet | 121 | False |
| T3 | ResNet | 101 | True |
| T4 | DenseNet | 121 | True |

Controlled Variables/Model Specifications:

Table 8 shows variables that directly affect the DVs' measurement, and are specifications for how every model should be trained. They must be kept constant so as not to skew the results.

Table 8: Controlled model specifications

| Variable | Value/How it's controlled | Function | Affects which DV |
|---|---|---|---|
| Optimizer | RMSProp (https://keras.io/api/optimizers/rmsprop/) | Controls process of gradient descent & learning rate (Kumar) | performance and resource usage |
| Loss function | Mean Squared Error | Value that is optimized throughout training | performance |
| Pretrained weights | All models use pretrained weights on ImageNet | Changes starting point of model and training needed | resource usage |
| Final epochs | 40 | Determines how many weight updates happen | performance |
| Processor GPU & RAM | NVIDIA P100 GPU hardware accelerator and 27.3 GB of RAM | Determines the time taken to perform functions | resource usage |
| Training, testing and validation data | 70% training, 15% testing, 15% validation, randomly assigned (random seed = 10) | Models with more training data learn better, but slower | performance and resource usage |
| Python and Keras version | Python 3.7.13, and Keras 2.8.2 from Tensorflow | Updated Python/Keras versions train with different efficiencies, and may have slight differences in functionality | performance and resource usage |
| IDE used | Google Colab Pro | Some IDEs are slower to run with, and some are designed for ML tasks | resource usage |
| Depth | ~111 ± 10 layers | Deeper networks have more parameters, and take longer to train, but model complex functions better | performance and resource usage |
| Dataset version | Galaxy Zoo Version 1.0 (https://www.kaggle.com/competitions/galaxy-zoo-the-galaxy-challenge/data) | Different releases have different composition & quality of training data | performance |
| Imports and dependencies | Appendix F details all imports and their versions | Different versions of modules may have slight differences in functionality | performance and resource usage |

Hypothesis

It can be hypothesized that DenseNet outdoes ResNet in terms of resource usage because it requires fewer parameters, and addresses vanishing gradients more drastically, but ResNet will perform better in terms of accuracy because more parameters can fit the data better. The Decision-tree should outperform Sigmoid in terms of training efficiency, since it manually encodes relationships that would otherwise take time to learn, but might not be more accurate, because small errors in initial probabilities have high weightage on the predictions' accuracy.

Results

Results Tables

Model Performance:

Table 9: Final model performance

| Trial | D.Tree | Type | Depth | $Train\,MSE_0$ | $Train\,MSE_{40}$ | $Val\,MSE_{40}$ | $Test\,MSE_{40}$ |
|---|---|---|---|---|---|---|---|
| T1 | False | ResNet | 101 | 0.23168 | 0.00105 | 0.00919 | 0.00924 |
| T2 | False | DenseNet | 121 | 0.34153 | 0.00334 | 0.00989 | 0.00979 |
| T3 | True | ResNet | 101 | 0.08715 | 0.00554 | 0.01283 | 0.01294 |
| T4 | True | DenseNet | 121 | 0.10295 | 0.05962 | 0.05840 | 0.05892 |

Table 10: Optimal model performance

| Trial | D.Tree | Type | Depth | Epochs | $Train\,MSE_{opt}$ | $Val\,MSE_{opt}$ | $Test\,MSE_{opt}$ |
|---|---|---|---|---|---|---|---|
| T1 | False | ResNet | 101 | 39 | 0.00108 | 0.00909 | 0.00919 |
| T2 | False | DenseNet | 121 | 18 | 0.00641 | 0.00880 | 0.00876 |
| T3 | True | ResNet | 101 | 22 | 0.00767 | 0.00891 | 0.00906 |
| T4 | True | DenseNet | 121 | 15 | 0.05908 | 0.05746 | 0.05812 |

Model Efficiency:

Table 11: Model efficiency

| Trial | D.Tree | Type | Depth | Epochs | $T_{opt}$ (s) | $T_{40}$ (s) | $p$ | $T/E$ (s) | $\frac{\Delta MSE}{E}$ | $\frac{\Delta MSE}{T}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| T1 | False | ResNet | 101 | 39 | 3122.7 | 3201.3 | 42,628,645 | 80.03 | 0.00558 | 0.0000738 |
| T2 | False | DenseNet | 121 | 18 | 1410.2 | 3083.5 | 6,991,781 | 77.09 | 0.01862 | 0.0002376 |
| T3 | True | ResNet | 101 | 22 | 1823.0 | 3283.7 | 42,628,645 | 82.09 | 0.00361 | 0.0000440 |
| T4 | True | DenseNet | 121 | 14 | 1120.2 | 3146.2 | 6,991,781 | 78.66 | 0.00292 | 0.0000371 |

Note: All training graphs are in Appendix D.



Analysis

Best Performing Models:

Trial 2, DenseNet without Decision-tree, had the lowest testing loss of 0.00876 at optimal stage. In terms of pure training efficiency (average loss gain/time), the best-performing model was also DenseNet without Decision-tree, decreasing training loss by 0.0002376/second. A close second was ResNet, both with and without the Decision-tree.

Between Model Architectures

Between ResNet(101) and DenseNet(121), ResNet always achieves the lower final loss, regardless of Decision-tree. The gap between the final ResNet and DenseNet loss is greater with Decision-tree. However, for the optimal model, DenseNet(121) performs marginally better than ResNet(101) without the Decision-tree, but ResNet(101) far outperforms the DenseNet(121) with the Decision-tree. DenseNet is slightly deeper (121>101), but it is unclear whether this causes these results.


Moreover, ResNet tends to create better final models, likely because DenseNet reaches optimal validation loss (the point of overfitting) around the 15th-18th epoch, much earlier than ResNet (around the 30th epoch), and therefore the final model is much more overfit than the ResNet. Overall, optimal model loss is a better parameter to measure accuracy and therefore yields comparable model performance between the two architectures, with slight preference to DenseNet.

DenseNet has significantly higher final training loss than ResNet, which means that it isn't even able to fit the training data as well, but also means that there is less chance of overfitting. ResNet takes more time per epoch, likely because it has more trainable parameters.

With Decision-tree, ResNet decreases loss by a higher amount per epoch and per second, but without the Decision-tree, DenseNet decreases loss significantly more per epoch and per second. The DenseNet with the Decision-tree is the worst-performing model and also trains the least efficiently. This reaffirms the idea that ResNet pairs much better with the Decision-tree: adding a Decision-tree improves the efficiency of the ResNet but worsens that of the DenseNet.

DenseNet, however, has 1/6th of the number of parameters that ResNet does, and yet models the same relationship to a comparable accuracy (without Decision-tree).

Inspecting the graphs of model training, it is observed that there is a much higher disparity between training and validation loss for ResNet than for DenseNet throughout training. This is usually because a higher number of parameters allows the network to fit the training data better. Both have similar amounts of fluctuation in training and validation loss, and both show a steady smooth curve of decrease in training loss over epochs.


Overall, DenseNet seems to perform more efficiently without a decision tree, achieving the highest efficiency, while ResNet handles the decision tree comparatively better, which is unexpected, as ResNet was predicted to be better.

Between Output Layers

Between Decision-tree and Sigmoid, for all trials, both final and optimal model losses for Sigmoid models were lower than their Decision-tree counterparts. For the optimal ResNet model, the gap between Decision-tree and Sigmoid is small, but DenseNet seems to have a higher disparity between Decision-tree and Sigmoid loss.

Generally, Decision-tree is worse than individual Sigmoid layers. Theoretically, the Decision-tree manually encodes relationships in the data into the model instead of learning them, and therefore should perform better, but this isn't true. A likely explanation is that the Decision-tree is highly dependent on the initial questions being answered accurately (~ the first 10 classes): all successive probabilities are highly impacted by any variations in initial probability. Therefore, even slight errors in predicting the initial probabilities are reflected heavily in the entire prediction.

For both convolution-bases (DenseNet especially), adding the Decision-tree makes the training process less efficient (smaller change in loss per time and epoch), contradicting the hypothesis. The Decision-tree doesn't affect the number of trainable parameters, and therefore generally provides worse models for the same number of parameters.

Noticeably in the graphs, throughout training there is a much higher disparity between training and validation loss for models with a Decision-tree. Val loss seems to be generally decreasing till the end of training in models with a Decision-tree, while with Sigmoid it flattens out after the point of overfitting, with no change except for light fluctuations. However, Decision-tree models seem to have significantly more loss fluctuation. This can again be attributed to the fact that a Decision-tree manually encodes relationships that the Sigmoid has to learn. Hence, small changes in weights to initial outputs have cascading effects on model performance. In models without the Decision-tree, the model can easily fit itself to the specific variations in the training data without actually learning overarching probabilistic relationships, but the Decision-tree is forced to, which explains the trends in val/train loss disparity over training.

Decision-tree models take slightly longer to train per epoch, but they reach the optimum sooner than Sigmoid-based models. For every trial, adding a Decision-tree made the model reach the optimum (start to overfit) after fewer epochs and less time. This is positive because it takes fewer temporal resources to train an optimal model, but it tends to overfit very easily.

Overall, Sigmoid-based activation is more efficient and outperforms the Decision-tree, especially with DenseNet.

Analysis of the Data Point

Returning to the data point first dissected (GalaxyID 100801), the structure of predictions is analyzed (Appendix E). Again, probabilities >0.300 are flagged as answers. Most models are predicting realistically close values, but almost no predictions have the exact same probabilities, which seems discouraging. However, most models still generate nearly accurate sequences of questions and answers; hence, in application, morphological classification is done with considerable accuracy. For all models, predictions get worse towards the last questions, as these are most dependent on initial probabilities, and small errors in initial predictions have the most weightage. However, making any generalizable conclusions based on one data point is unreliable.


Conclusions

Evaluation of the Method and Extensions

While the method adequately answered the RQ by considering model performance and resource usage in detail, there are many potential sources of error identified.

Firstly, there was no data cleanup/preprocessing, which might mean that faulty data points (e.g. outliers/null measurements) skewed the data. In typical ML tasks, multiple steps of data cleanup are needed before training.

Secondly, an ever-standing limitation in ML is the lack of data. Data augmentation measures could have been used to improve the models' training. Insufficient training data leads to increased overfitting and less generalizability of the model, which hinders the answering of the RQ since model performance is distorted.

Lastly, the RQ could have been answered better (more generalizably) if analysis was conducted over multiple different morphological datasets, involving other astronomical features.

To extend, factors like hardware, optimizers, and kernel size can be evaluated to find the optimal model for morphological classification. Additionally, the experiment could extend to more network architectures like EfficientNet, LeNet, AlexNet, VGG, etc.

Final Conclusions

In conclusion, answering the RQ, the tested DenseNet and ResNet architectures, with and without Decision-trees, are efficient at classifying the morphology of telescopic images, achieving a best MSE of 0.00876 and an RMSE of 0.0936, close to Kaggle's winning RMSE (0.07466) (Kaggle). This means that there was an average 0.0936 (~9%) difference between predicted and real probabilities, which is acceptably low. Between the IVs, while there were many complexities in trends, it was generally identified that DenseNet classifies more efficiently than ResNet, and simple Sigmoid activations tend to perform morphological classification more efficiently. The hypothesis was mostly incorrect and the trends were surprising.

Overall, most network architectures performed acceptably well on the task, enough for real-world application. This paper helps improve our understanding of CNN features that suit morphological classification of images and can better the models implemented in real astronomical labs.

Works Cited

Abdullah, Muhammad. "Introduction to ResNets." Towards Data Science, 24 Aug. 2019, towardsdatascience.com/introduction-to-resnets-c0a830a288a4.

Ali, Amir. "Logistic Regression with Practical Implementation." Medium, The Art of Data Science, 24 Nov. 2019, medium.com/machine-learning-researcher/logistic-regression-in-machine-learning-ad4d5fef88bb.

Arnaldo, Muhammed. "How to Build a Multi-class Image Classification Model without CNNs in Python." Analytics Vidhya, 27 Jul. 2021, medium.com/analytics-vidhya/how-to-build-a-multi-class-image-classification-model-without-cnns-in-python-660f0f411764.

Awati, Rahul. "Convolutional Neural Network (CNN)." TechTarget, 2021, techtarget.com/searchenterpriseai/definition/convolutional-neural-network.

Bishop, Chris M. "Neural Networks and Their Applications." Review of Scientific Instruments, vol. 65, no. 6, 1994, pp. 1803-1832.

Chen, Liang-Chieh, et al. "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs." IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, 2018, pp. 834-848, doi:10.1109/tpami.2017.2699184.

Chen, Zhiqiang, et al. "Integrating Spatial and Temporal Features for Early Recognition of Mild Cognitive Impairment from Multi-Modal Neuroimaging Data." Scientific Reports, vol. 10, no. 1, 2020, doi:10.1038/s41598-020-70479-z.

Coursera. "Structured vs. Unstructured Data: What's the Difference?" Coursera Articles, www.coursera.org/articles/structured-vs-unstructured-data-whats-the-difference.

"Galaxy Zoo - The Galaxy Challenge." Kaggle, www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge.

García-Ordás, María Teresa, et al. "Detecting Respiratory Pathologies Using Convolutional Neural Networks and Variational Autoencoders for Unbalancing Data." Sensors, vol. 20, no. 4, 2020, p. 1214, doi:10.3390/s20041214.

Hao, Wang, et al. "The Role of Activation Function in CNN." 2020 2nd International Conference on Information Technology and Computer Application (ITCA), IEEE, 2020.

Hardesty, Larry. "Explained: Neural Networks." MIT News, 14 Apr. 2017, news.mit.edu/2017/explained-neural-networks-deep-learning-0414.

Haris, Faizan. "ResNets, DenseNets, and UNets." The Startup, Medium, 26 Sept. 2020, medium.com/swlh/resnets-densenets-unets-6bbdbcfdf010.

He, Kaiming, et al. "Deep Residual Learning for Image Recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2016, doi:10.1109/CVPR.2016.90.

Hirschman, Isidore Isaac, and David V. Widder. The Convolution Transform. Courier Corporation, 2012.

IBM Cloud. "Machine Learning: What It Is and Why It Matters." IBM, www.ibm.com/cloud/learn/machine-learning.

IBM. "Resource Utilization and Performance." IBM Informix Documentation, version 14.10, IBM, www.ibm.com/docs/en/informix-servers/14.10?topic=basics-resource-utilization-performance.

Jain, Rashmi. "3 Types of Gradient Descent Algorithms for Small & Large Data Sets." HackerEarth Blog, 2 July 2019, www.hackerearth.com/blog/developers/3-types-gradient-descent-algorithms-small-large-data-sets/.

Jordan, Jeremy. "Deep Neural Networks: Preventing Overfitting." Jeremy Jordan, 30 July 2018, www.jeremyjordan.me/deep-neural-networks-preventing-overfitting/.

Kawaguchi, Hironobu. "Galaxy Zoo Xception." Kaggle, 13 Mar. 2018, www.kaggle.com/code/hironobukawaguchi/galaxy-zoo-xception.

Krizhevsky, Alex, et al. "ImageNet Classification with Deep Convolutional Neural Networks." arXiv, 11 Dec. 2013, arxiv.org/abs/1308.3496.

Kumar, Ashish. "A Look at Gradient Descent and RMSprop Optimizers." Towards Data Science, 2 Sept. 2019, towardsdatascience.com/a-look-at-gradient-descent-and-rmsprop-optimizers-f77d483ef08b.

Li, M. P., et al. "Non-Thermal Radiation from Clusters of Galaxies." Monthly Notices of the Royal Astronomical Society, vol. 397, no. 4, 2009, doi:10.1111/j.1365-2966.2009.15366.x.

Limanas, Henrique. "Galaxy Zoo Classifier Galaxies." Kaggle, 13 Mar. 2018, www.kaggle.com/code/henriquelimanas/galaxy-zoo-classifier-galaxies.

Malek, Abdul. The Einsteinian Universe?: A Dialectical Perspective of Modern Theoretical Physics and Cosmology. A. Mannan, 2004.

Martin, Patrick. "The Universal Approximation Theorem Is Terrifying." Medium, 9 Aug. 2022, medium.com/@patrickmartinaz/the-universal-approximation-theorem-is-terrifying-83a53acc4192.

May, Ann. "Resizing Images Using Various Interpolation Techniques." Medium, 10 May 2021, annmay10.medium.com/resizing-images-using-various-interpolation-techniques-3c302e2e08c5.

Muthukrishnan, Saravanakumar. "Review: DenseNet Image Classification." Towards Data Science, 4 Jan. 2019, towardsdatascience.com/review-densenet-image-classification-b6631a8ef803.

Neutelings, Izaak. "Neural Networks." TikZ.net, 2 May 2022, tikz.net/neural_networks/.

Rodriguez-Urrego, David, and Miguel A. Maheut. "Deep Learning for Sentiment Analysis: A Survey." Information, vol. 10, no. 11, 2019, doi:10.3390/info10110354.

Roy, Priya. "The Difference Between Artificial Intelligence and Machine Learning." Analytics Insight, 13 Dec. 2019, www.analyticsinsight.net/the-difference-between-artificial-intelligence-and-machine-learning/.

Ruder, Sebastian. "An Overview of Gradient Descent Optimization Algorithms." arXiv preprint arXiv:1609.04747, 2016.

Song, Yan, Ian McLoughlin, and Lirong Dai. "Deep Bottleneck Feature for Image Classification." Proceedings of the 5th ACM International Conference on Multimedia Retrieval, 2015.

Willett, Kyle W., et al. "Galaxy Zoo 2: Detailed Morphological Classifications for 304 122 Galaxies from the Sloan Digital Sky Survey." Monthly Notices of the Royal Astronomical Society, vol. 435, no. 4, 2013, pp. 2835-2860, doi:10.1093/mnras/stt1458.

Ying, Xue. "An Overview of Overfitting and Its Solutions." Journal of Physics: Conference Series, vol. 1168, IOP Publishing, 2019.

Yu, Dingjun, et al. "Mixed Pooling for Convolutional Neural Networks." Rough Sets and Knowledge Technology: 9th International Conference, RSKT 2014, Springer, 2014.

Zhang, Xiyu, et al. "ResNet or DenseNet? Introducing Dense Shortcuts to ResNet." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, openaccess.thecvf.com/content/WACV2021/papers/Zhang_ResNet_or_DenseNet_Introducing_Dense_Shortcuts_to_ResNet_WACV_2021_paper.pdf.


Appendix

Appendix A: Data Download and Processing Code

Loading the Data

Creating Train, Test, Validation Datasets

Appendix B: Models Code

Defining Model

Custom Callbacks

Compiling and Training

Appendix C: Result Generation Code

Training Graphs

Loss and Results

Printing Results

Appendix D: Training Graphs

Without Decision Tree (Accuracy, RMSE, and Loss training graphs for each trial):

| Trial | Base Architecture | Depth |
|---|---|---|
| T1 | ResNet | 101 |
| T2 | DenseNet | 121 |

With Decision Tree (Accuracy, RMSE, and Loss training graphs for each trial):

| Trial | Base Architecture | Depth |
|---|---|---|
| T3 | ResNet | 101 |
| T4 | DenseNet | 121 |

Appendix E: Results on Data Point (GalaxyID: 100801)

| Class | Truth Value | Correct Option | T1 | T2 | T3 | T4 |
|---|---|---|---|---|---|---|
| 1.1 | 0.035 |  | 0.168 | 0.171 | 0.222 | 1.000 |
| 1.2 | 0.965 | 1 | 0.822 | 0.818 | 0.769 | 0.000 |
| 1.3 | 0.000 |  | 0.012 | 0.011 | 0.009 | 0.000 |
| 2.1 | 0.000 |  | 0.015 | 0.014 | 0.011 | 0.000 |
| 2.2 | 0.965 | 1 | 0.810 | 0.783 | 0.758 | 0.000 |
| 3.1 | 0.252 |  | 0.137 | 0.097 | 0.108 | 0.000 |
| 3.2 | 0.713 | 1 | 0.662 | 0.692 | 0.650 | 0.000 |
| 4.1 | 0.875 | 1 | 0.589 | 0.494 | 0.609 | 0.000 |
| 4.2 | 0.090 |  | 0.169 | 0.335 | 0.148 | 0.000 |
| 5.1 | 0.000 |  | 0.020 | 0.011 | 0.020 | 0.000 |
| 5.2 | 0.551 | 1 | 0.484 | 0.333 | 0.406 | 0.000 |
| 5.3 | 0.414 |  | 0.325 | 0.420 | 0.301 | 0.000 |
| 5.4 | 0.000 |  | 0.013 | 0.047 | 0.030 | 0.000 |
| 6.1 | 0.198 |  | 0.271 | 0.414 | 0.213 | 0.255 |
| 6.2 | 0.802 | 1 | 0.728 | 0.590 | 0.778 | 0.745 |
| 7.1 | 0.007 |  | 0.052 | 0.081 | 0.139 | 0.381 |
| 7.2 | 0.028 |  | 0.103 | 0.067 | 0.082 | 0.348 |
| 7.3 | 0.000 |  | 0.002 | 0.001 | 0.001 | 0.270 |
| 8.1 | 0.040 |  | 0.148 | 0.151 | 0.079 | 0.000 |
| 8.2 | 0.000 |  | 0.023 | 0.034 | 0.018 | 0.000 |
| 8.3 | 0.040 |  | 0.094 | 0.081 | 0.044 | 0.062 |
| 8.4 | 0.079 |  | 0.054 | 0.087 | 0.026 | 0.125 |
| 8.5 | 0.000 |  | 0.018 | 0.065 | 0.032 | 0.000 |
| 8.6 | 0.040 |  | 0.006 | 0.034 | 0.013 | 0.069 |
| 8.7 | 0.000 |  | 0.001 | 0.002 | 0.000 | 0.000 |
| 9.1 | 0.000 |  | 0.012 | 0.013 | 0.008 | 0.000 |
| 9.2 | 0.000 |  | 0.001 | 0.001 | 0.000 | 0.000 |
| 9.3 | 0.000 |  | 0.002 | 0.002 | 0.003 | 0.000 |
| 10.1 | 0.600 | 1 | 0.346 | 0.309 | 0.362 | 0.000 |
| 10.2 | 0.277 |  | 0.167 | 0.176 | 0.198 | 0.000 |
| 10.3 | 0.000 |  | 0.033 | 0.038 | 0.050 | 0.000 |
| 11.1 | 0.092 |  | 0.095 | 0.078 | 0.065 | 0.000 |
| 11.2 | 0.230 |  | 0.157 | 0.123 | 0.209 | 0.000 |
| 11.3 | 0.092 |  | 0.077 | 0.054 | 0.053 | 0.000 |
| 11.4 | 0.000 |  | 0.008 | 0.015 | 0.018 | 0.000 |
| 11.5 | 0.138 |  | 0.007 | 0.012 | 0.024 | 0.000 |
| 11.6 | 0.322 | 1 | 0.203 | 0.253 | 0.240 | 0.000 |

Appendix F: Imports and Requirements

IDE and Software

Google Colab Pro on Google Chrome

OS and Language

Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-5.10.147+-x86_64-with-glibc2.29

Modules and Imports

PIL 7.1.2, h5py 3.1.0, keras 2.11.0, matplotlib 3.5.3, numpy 1.22.4, pandas 1.3.5, session_info 1.0.0, tensorflow 2.11.0, astor 0.8.1, astunparse 1.6.3, backcall 0.2.0, cachetools 5.3.0, certifi 2022.12.07, cffi 1.15.1, chardet 4.0.0, cloudpickle 2.2.1, cycler 0.10.0, cython_runtime NA, dateutil 2.8.2, debugpy 1.0.0, decorator 4.4.2, dill 0.3.6, etils 1.0.0, flatbuffers 23.1.21, fsspec 2023.1.0, gast NA, google NA, google_auth_httplib2 NA, googleapiclient NA, httplib2 0.17.4, idna 2.10, importlib_resources NA, ipykernel 5.3.4, ipython_genutils 0.2.0, jax 0.3.25, jaxlib 0.3.25, kiwisolver 1.4.4, mpl_toolkits NA, numexpr 2.8.4, oauth2client 4.1.3, opt_einsum v3.3.0, packaging 23.0, pexpect 4.8.0, pickleshare 0.7.5, pkg_resources NA, platformdirs 3.0.0, portpicker NA, prompt_toolkit 2.0.10, psutil 5.4.8, ptyprocess 0.7.0, pyarrow 9.0.0, pyasn1 0.4.8, pyasn1_modules 0.2.8, pydev_ipython NA, pydevconsole NA, pydevd 2.0.0, pydevd_concurrency_analyser NA, pydevd_file_utils NA, pydevd_plugins NA, pydevd_tracing NA, pydot_ng 2.0.0, pygments 2.6.1, pyparsing 3.0.9, pytz 2022.7.1, requests 2.25.1, rsa 4.9, scipy 1.7.3, sitecustomize NA, six 1.15.0, socks 1.7.1, sphinxcontrib NA, storemagic NA, tblib 1.7.0, tensorboard 2.11.2, termcolor NA, tornado 6.2, traitlets 5.7.1, typing_extensions NA, uritemplate 4.1.1, urllib3 1.24.3, wcwidth 0.2.6, wrapt 1.14.1, zipp NA, zmq 23.2.1