Investigating the Efficiency of Different Convolutional Neural Network Architectures and Activations at Morphological Classification of Telescopic Images

Research Question: To what extent are Convolutional Neural Networks based on DenseNet and ResNet architectures and Decision Trees efficient at making multi-class morphological classifications of RGB Telescopic Images?


Topic: Convolutional Neural Networks and Image Classification

Subject: Computer Science

Session: May 2023

Word Count: 3998

Table of Contents

Introduction
Background Research
    Other Approaches
    Neural Networks
    CNNs
    ResNet
    DenseNet
    Overfitting
    Decision-trees
    Galaxy-morphology
Methodology
    Dataset and Data Format
    Variables
    Method
    Hypothesis
Results
    Results Tables
    Analysis
Conclusions
    Evaluation of the Method and Extensions
    Final Conclusions
Works Cited
Appendix
    Appendix A: Data Download and Processing Code
    Appendix B: Models Code
    Appendix C: Result Generation Code
    Appendix D: Training Graphs
    Appendix E: Results on Data Point (GalaxyID: 100801)
    Appendix F: Imports and Requirements

Introduction:

Machine Learning (ML) is a dynamic field focused on learning from large amounts of data without explicit programming (IBM Cloud). ML is a subfield of Artificial Intelligence (AI), which attempts to computationally emulate human behavior to solve a broad range of complex problems (Roy). ML can be applied to automate processes in various fields, often with greater efficiency and accuracy than humans.

Neural Networks (NNs) are algorithms that are made of interconnected units called "Neurons" and simulate the way mammalian brains process data. With recent developments in computing technologies, like Graphics Processing Units, Deep ML (Deep Learning) has grown immensely, with notable advancements in Computer Vision (CV) and Natural Language Processing (NLP), which deal with unstructured data (language/images) rather than tabular forms (Coursera). NNs like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have proven especially successful at these tasks.

Telescopes worldwide collect millions of images, and with more advanced telescopes and better storage systems, this is expected to increase. Such data can be used to detect exoplanets and galaxies many light years away. However, precise analysis by experienced astronomers is required to detect and classify objects of interest in these images. Often, multi-step morphological classifications must be done (classifications relating to aspects of the internal structure of astronomical bodies that can be deduced through detailed analysis of images) (Li et al.).

Fig.1: Galaxy-Zoo Dataset

The Galaxy-Zoo (Fig.1) is a crowdsourced initiative by 90,000+ volunteers, with 1 million+ images of galaxies taken by the Sloan Digital Sky Survey, classified into types based on shape, color, and direction. Because of the large volumes of data collected, and the depth of processing and attention to detail required to perform these classifications, this is a perfect application of Deep Learning technology.


Due to the large data volume and high cost of computing power, there is often a tradeoff between the accuracy achieved and the computing resources (like time/power) used. Therefore, efficiency, which combines accuracy and computing resource usage and thereby captures the real-world applicability of these models, must be evaluated (IBM).

This study aims to assess the applicability of different models, based on two CNN architectures, ResNet and DenseNet, along with Decision-tree approaches, to the processing of large telescopic image datasets for their morphological classification. This is a key emerging problem in the field of astronomy, due to the increased availability of data. By generalizing the specific architectures and model traits that suit the task of morphological classification of telescopic images, such explorations could transform the field of astronomy by improving the way telescopic data is processed, putting it to better use.

Hence, the Research Question (RQ): To what extent are Convolutional Neural Networks based on DenseNet and ResNet architectures and Decision Trees efficient at making multi-class morphological classifications of RGB Telescopic Images?

Background Research

Other Approaches

On Kaggle's Galaxy Zoo challenge, 320+ teams competed; NNs were not as developed then, but the top 3 participants used primitive NNs with heavy image processing and augmentation techniques. The highest-performing model had an RMSE (Root-Mean-Squared-Error) of 0.07466, which has since been improved progressively. No works, however, directly compare different architectures, nor Decision-tree and non-Decision-tree approaches, which is where this paper builds upon current research.

Neural Networks

Neural Networks (NNs) are ML algorithms that simulate mammalian brains and are suited to deep-learning tasks. NNs rely on having data to train on, as they improve their performance on real-world tasks over time, mirroring how brains learn from experience and practice (Hardesty).

NNs work on the principle of creating features from data and then using a weighted combination of those features to generate output. NNs can be used for classification and regression tasks (Babak). Morphological classification requires a modified application of classification-based NNs.

NNs work on the premise of hidden layers, as depicted in Fig.2, consisting of neurons ("nodes"). Nodes are stacked in "layers" that are connected to each other. The number of layers determines the NN's depth and, hence, the complexity of the task it can solve. Nodes in a layer are connected to every node in the following and previous layers, but not to each other (Bishop).

Fig.2: Barebones NN architecture; Source: (Martin)

A single node in a NN applies two functions. Firstly, it applies a set of weights (W) and biases (b) to the output of every node in the previous layer, adding them up. Secondly, it applies a non-linearity called an activation function on the summed outputs. Without the non-linearity, the entire NN could be abstracted to a single linear function.

An NN node's output can be defined as the following (using Table 1) (Russo):

$$x_n^L = g\left(\sum_{i=1}^{I_{L-1}} W_i \, x_i^{L-1} + B\right)$$
Table 1: Definitions of abbreviations used in formulas

| Abbreviation | Definition |
|---|---|
| $a^{[L]}$ | output (activation) of layer L |
| $g(x)$ | activation function (like ReLU) |
| $Z^{[L]}$ | output of the linear component of layer L after the application of its weights and biases/filters |
| $x_n^L$ | output of the nth node of the Lth layer |
| $I_L$ | number of nodes in the Lth layer |
| $B$ | bias |
| $W_i$ | ith weight |
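To make the formula concrete, the following is a minimal NumPy sketch (illustrative, not the candidate's code) of one fully-connected layer applying weights, a bias, and a ReLU activation:

```python
import numpy as np

def relu(z):
    # g(x): the non-linearity applied to the summed outputs
    return np.maximum(0, z)

def dense_layer(x_prev, W, b):
    # x_prev: activations of the previous layer, shape (I_{L-1},)
    # W: weight matrix, shape (I_L, I_{L-1}); b: biases, shape (I_L,)
    # Node n computes g(sum_i W[n, i] * x_prev[i] + b[n])
    return relu(W @ x_prev + b)

x = np.array([0.2, 0.7, 0.1])    # previous layer with 3 nodes
W = np.random.randn(4, 3) * 0.1  # next layer with 4 nodes
b = np.zeros(4)
print(dense_layer(x, W, b))      # 4 activations passed to the next layer
```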

Training a NN involves determining the optimal values of the weights and biases for every node, using gradient descent. Every training task needs a loss function, which measures the model's performance at the task: it involves defining a cost function that measures performance on a single data point, which is then summed across examples to calculate a final loss (Russo).

Fig.3: Gradient descent visualized; Source: (Jain)

Gradient descent is performed in a series of weight updates, as the values of the weights are changed based on the loss function's derivative with respect to the weights, to eventually find a combination of weights that reaches the loss function's minima (Fig.3) (Ruder).
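For intuition, a minimal sketch of this update rule on a one-dimensional quadratic loss (illustrative only; the actual trials use Keras's RMSProp optimizer, per Table 8):

```python
# Minimize L(w) = (w - 3)^2 by repeatedly stepping against the gradient dL/dw = 2(w - 3)
w = 0.0              # initial weight
learning_rate = 0.1  # step size for each weight update

for step in range(50):
    gradient = 2 * (w - 3)         # derivative of the loss w.r.t. the weight
    w -= learning_rate * gradient  # weight update

print(round(w, 4))  # converges toward the minimum at w = 3
```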
CNNs

Regular NNs were designed for tabular data, with initial features defined by the data scientist. However, image data is stored differently, as three 2D arrays corresponding to each pixel's RGB intensities (Awati).

Image processing using typical NNs has not yielded positive results, since each pixel's RGB values must each become an individual input feature into the NN, leading to the loss of location-based information and unrealistically high computational cost (Arnaldo). Therefore, a modified NN architecture called the Convolutional Neural Network (CNN) is used.

Unlike a regular NN's weights and biases, the CNN relies on Convolution and Pooling layers. These help the NN better extract features from the image, on which successful classifier/regression algorithms can eventually be run.



The Convolution Operator

The convolution-operator is an alternate way of applying weights to image data. It replaces weight matrices with filters that are applied to all parts of the image (Hirschman and Widder). Every pixel in the output is the sum of the filter applied to a region of pixels in the input image (Fig.4).

Fig.4: The convolution operator; Source: (Neutelings)

A CNN applies the fundamentals of gradient descent to optimize the filters' values, and multiple filters of the same size are usually applied to an image in one Convolutional Layer. Each convolution generates a smaller 2D array, and multiple arrays are stacked as outputs to a layer.
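A minimal NumPy sketch of the operator just described (a single filter, no padding or striding, written for clarity rather than speed):

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the filter over every valid position; each output pixel is the
    # sum of the elementwise product of the filter and an image region.
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_filter = np.array([[1., 0., -1.]] * 3)   # a simple vertical-edge filter
print(convolve2d(image, edge_filter).shape)   # (3, 3): the output is smaller
```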
The Pooling Operator

The pooling-operator also reduces the size of images, while extracting main features (Yu et al.). A square region of pixels is replaced by a single pixel (Fig.5). There are multiple forms of pooling used; most commonly, max-pooling, where the output pixel is the maximum value from the input pixels.

Fig.5: The max pooling operator; Source: (Chen et al.)
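A corresponding sketch of 2x2 max-pooling, assuming the input's height and width are divisible by the pool size:

```python
import numpy as np

def max_pool2d(image, pool=2):
    # Replace each non-overlapping pool x pool region with its maximum value.
    h, w = image.shape
    return image.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 1.],
              [0., 1., 5., 6.],
              [2., 2., 7., 8.]])
print(max_pool2d(x))  # [[4. 2.], [2. 8.]]
```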

CNN Architecture

After using multiple CNN-specific layers, a complete Convolutional architecture (Fig.6) also uses regular NN layers (Krizhevsky et al.). After every convolution, the CNN applies a non-linearity (activation) to every pixel.

Fig.6: A typical Convolutional architecture; Source: (Garcia-Ordas et al.)

Since max-pooling and convolutional layers decrease the size of the image and increase the number of images, Convolutional bases create bottlenecks to learn the image's features (Song et al.). After these layers, a CNN ends with a fully-connected (regular NN) layer, hence flattening the features into a single vectorized input. The second component of a complete network is a final activation layer (Hao et al.), like Sigmoid or SoftMax, that converts the image features into usable output (here, probabilities).

Different layers, filter sizes, pooling, etc. can be combined to create different convolutional architectures. Various architectures have been developed over years of research, optimized for different tasks. Two prominent convolutional architectures will be compared in this paper, along with two morphology-specific activation layers.
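Putting these pieces together, a minimal Keras sketch of the generic architecture described above (an illustrative stack, not one of this paper's trial models):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Convolutional base: convolutions + pooling shrink each feature map while
# stacking more of them, followed by a fully-connected head.
model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(120, 120, 3)),
    layers.MaxPooling2D(2),                   # 2x2 max-pooling
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),                         # vectorize the features
    layers.Dense(64, activation="relu"),      # fully-connected layer
    layers.Dense(37, activation="sigmoid"),   # final activation -> probabilities
])
model.summary()
```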



ResNet

ResNet (He et al.) was developed in response to the vanishing gradient problem, where gradients became infinitesimally small with network depth, preventing data scientists from training very deep networks for complex relationships. ResNet leverages the concept of Residuals, where an identity connection is used to skip certain layers. Fig.7 shows a residual block, central to ResNet architectures. The inputs undergo a linear layer, then an activation (ReLU), and another linear layer, after which the original inputs are added again before the data undergoes another activation (Abdullah). This network is one of the most-cited of the 21st century, and its effectiveness/credibility is therefore attested to.

Fig.7: A sample 2-layer ResNet block with skip connection; Source: Candidate

The effects of the residual skip connection within a single block are therefore represented by:

$$a^{[L]} = g\left(Z^{[L]} + a^{[L-2]}\right)$$
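A minimal Keras sketch of such a residual block, assuming the input and output widths match so the skip connection can be added directly:

```python
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, units=64):
    # Two weight layers with an identity skip: a[L] = g(Z[L] + a[L-2])
    shortcut = x                                   # the identity connection
    h = layers.Dense(units, activation="relu")(x)  # linear layer + ReLU
    h = layers.Dense(units)(h)                     # second linear layer (Z[L])
    h = layers.Add()([h, shortcut])                # add the original inputs back
    return layers.Activation("relu")(h)            # final activation g(...)

inputs = keras.Input(shape=(64,))  # 64 features so shapes match for the Add
block = keras.Model(inputs, residual_block(inputs))
```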



DenseNet

DenseNet (Zhang et al.) was another architecture developed to address vanishing gradients. It leverages the same skip connections to a greater extent, where each layer within a block is connected to the activations of every previous layer. Similar to ResNet, all inputs go through a linear layer, followed by an activation (ReLU) and another linear layer, after which the activations of all previous layers are added before the data goes through another activation (Fig.8).

Fig.8: A sample 4-layer DenseNet block with skip connections¹; Source: Candidate

¹ This schematic refers to "weight layers" for convenience, but these are convolutional and pooling layers in reality. The weight operation referred to in the formula also refers to the convolution operator, not direct multiplication.

The effects of the dense skip connections within a single dense block are shown (Rodriguez-Urrego):

$$a^{[L]} = g\left(Z^{[L]} + \sum_{k=2}^{K} a^{[L-k]}\right)$$

Usually, there is also a weight matrix, where previous activations are multiplied by trainable weights when being added (Muthukrishnan):

$$a^{[L]} = g\left(Z^{[L]} + \sum_{k=2}^{K} W^{[L-k,L]} * a^{[L-k]}\right)$$
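A minimal Keras sketch of a dense block in this additive form (following the formula above, and assuming the input already has `units` features so all the additions line up; note that the published DenseNet concatenates activations rather than summing them):

```python
from tensorflow import keras
from tensorflow.keras import layers

def dense_block(x, num_layers=4, units=64):
    # Additive dense skips: a[L] = g(Z[L] + sum over k>=2 of a[L-k])
    activations = [x]                             # treat the block input as a[0]
    for _ in range(num_layers):
        z = layers.Dense(units)(activations[-1])  # Z[L], from the previous layer
        skips = activations[:-1]                  # a[L-k] for k >= 2
        summed = layers.Add()([z] + skips) if skips else z
        activations.append(layers.Activation("relu")(summed))
    return activations[-1]

inputs = keras.Input(shape=(64,))  # input width equals `units` so the Adds line up
block = keras.Model(inputs, dense_block(inputs))
```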

Comparison of the Architectures:

Both DenseNet and ResNet make claims to higher efficiency, and subsequent reports provide differing accounts based on the application (Haris). These comparisons often come from blog posts or unpublished ML researchers, so their validity is questionable. However, even papers published in revered CV journals have not reached a consensus (Chen et al.). Therefore, it is pertinent to compare and analyze these networks in a context where deep CNNs are needed.



Overfitting:

An important consideration when training is overfitting, where the model learns the labels of the training data without actually improving on the task. In practice, the point of overfitting ("Stop Here" in Fig.9) is an approximate region, identified by general consensus of ML researchers (Ying et al.), as the point usually at minimum validation loss, beyond which training will worsen the performance on unseen data.

Fig.9: Overfitting over epochs; Source: (Jordan)
ai
Decision-trees

Morphological classification of astronomical images is more complex than simple classification due to multi-step decision processes (Hocking et al.). To do this, models must leverage another primitive form of predictive modeling: Decision-trees, which use simple if-else conditions.

Fig.10: Visual of the Galaxy-morphology decision tree; Source: Willett et al.

Integrating a Decision-tree with a NN involves predicting the answers to the conditionals at each stage with deep learning, and then following the Decision-tree's framework. A Decision-tree provides multiple paths to reach the final classification. Fig.10 visually displays the Galaxy-morphology Decision-tree.



Galaxy-morphology

Specific to Galaxy-Zoo, the morphological classification of galaxies is a system designed to group galaxies by their visual appearance (Kaggle). This dataset involves the classification of intricate morphological features like bulge, spiral, shape, etc. (Fig.12). The leveraged system of morphological classification uses a complex Decision-tree that answers multiple questions, each with multiple options, in sequential order (Willett et al.). Generating probabilities for each possibility will eventually help generate the most probable path to the end, and can eventually classify the galaxy image. The next question that the user faces is determined based on the response to every preceding question, as detailed in Fig.11.

Fig.11: Galaxy-morphology task and decision tree schematic; Source: Willett et al.

Fig.12: Galaxy-morphology sample classifications; Source: (Malek)



Methodology

The RQ measures efficiency of different model architectures; therefore, both performance and resource usage must be considered (IBM).

The RQ requires evaluation of these models on the morphological classification of telescopic images, and the Galaxy-Zoo Dataset will be used as a representative of these tasks.

Primary experimental data from the training and evaluation of actual models is analyzed because it allows more freedom of manipulation of the IVs. However, this constrains the analysis within available computational power and technical limitations.

Dataset and Data Format

Dataset:

The dataset, with 61578 images, is loaded and processed from Kaggle into Python and read using Pandas (Appendix A). It is split into Training (train), Testing (test), and Validation (val) sets, with 70%, 15%, and 15% proportions respectively.
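A minimal sketch of such a split with pandas and scikit-learn (the file name is an assumption; the candidate's actual loading code is in Appendix A):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical label file: one row per galaxy, 37 probability columns.
labels = pd.read_csv("training_solutions_rev1.csv")

# 70% train; the remaining 30% is split in half into 15% val / 15% test.
# random_state=10 mirrors the random seed noted in Table 8.
train, holdout = train_test_split(labels, test_size=0.30, random_state=10)
val, test = train_test_split(holdout, test_size=0.50, random_state=10)
print(len(train), len(val), len(test))
```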


Images:

The images are 424x424 colored JPEG files (Fig.13). Each image is centered around an isolated galaxy, captured by the Apache Point Telescope.

Fig.13: Sample Galaxy Zoo images (GalaxyIDs 883411, 520112, 327708, 282041); Source: Galaxy Zoo Dataset (Kaggle)

These are resized to 120x120 using interpolation to conserve processing power, without significantly changing the extent to which morphological features are visible (May).
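A sketch of this resizing step using Pillow's interpolation (file paths are placeholders):

```python
from PIL import Image

img = Image.open("images_training_rev1/100801.jpg")       # 424x424 original
small = img.resize((120, 120), resample=Image.BILINEAR)   # interpolated downscale
small.save("resized/100801.jpg")
```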

Labels:

Table 2: Total output class distribution per question

| Question | Number of Classes | Question | Number of Classes |
|---|---|---|---|
| 1 | 3 | 7 | 3 |
| 2 | 2 | 8 | 7 |
| 3 | 2 | 9 | 3 |
| 4 | 2 | 10 | 3 |
| 5 | 4 | 11 | 6 |
| 6 | 2 | Total Classes | 37 |

There are 37 columns to each galaxy image label, which correlate to all the possible responses from users (broken down in Table 2), and the dataset provides the probability of each of these classes for every training example (Fig.14). Each galaxy is assigned a unique GalaxyID that corresponds to the image and row in the dataset.

Fig.14: Sample labels for data; Source: Candidate's Code (Appendix A)


Exploring a Single Data Point:

The 120x120 image, along with the probabilities (Table 3), for the image with GalaxyID 100801 is chosen randomly to understand the structure and correlation of each data point.

Table 3: Data point GalaxyID 100801

| Class | Prob. | Class | Prob. | Class | Prob. | Class | Prob. |
|---|---|---|---|---|---|---|---|
| 1.1 | 0.035 | 5.1 | 0.000 | 8.1 | 0.040 | 10.1 | 0.600 |
| 1.2 | 0.965 | 5.2 | 0.551 | 8.2 | 0.000 | 10.2 | 0.277 |
| 1.3 | 0.000 | 5.3 | 0.414 | 8.3 | 0.040 | 10.3 | 0.000 |
| 2.1 | 0.000 | 5.4 | 0.000 | 8.4 | 0.079 | 11.1 | 0.092 |
| 2.2 | 0.965 | 6.1 | 0.198 | 8.5 | 0.000 | 11.2 | 0.230 |
| 3.1 | 0.252 | 6.2 | 0.802 | 8.6 | 0.040 | 11.3 | 0.092 |
| 3.2 | 0.713 | 7.1 | 0.007 | 8.7 | 0.000 | 11.4 | 0.000 |
| 4.1 | 0.875 | 7.2 | 0.028 | 9.1 | 0.000 | 11.5 | 0.138 |
| 4.2 | 0.090 | 7.3 | 0.000 | 9.2 | 0.000 | 11.6 | 0.322 |
|  |  |  |  | 9.3 | 0.000 |  |  |

Image 100801

All probabilities >0.300 were flagged as likely answers (Table 3). Using these, we can interpret these labels to deduce the most probable sequence of answered questions (Table 4), using Fig.11.

This data point appears to have one clear probable sequence of answers; however, this is not necessarily true for others.

Table 4: Probable question sequence for GalaxyID 100801

Q1: Is the galaxy simply smooth and rounded with no sign of a disk?
    Class 1.2: Features or Disk (0.965) -> Q2
Q2: Could this be a disk viewed edge on?
    Class 2.2: No (0.965) -> Q3
Q3: Is there a sign of a bar feature through the center of the galaxy?
    Class 3.2: No (0.713) -> Q4
Q4: Is there any sign of a spiral arm pattern?
    Class 4.1: Yes (0.875) -> Q10
Q10: How tightly wound do the arms appear?
    Class 10.1: tight (0.600) -> Q11
Q11: How many spiral arms are there?
    Class 11.6: Can't say (0.322) -> Q5
Q5: How prominent is the central bulge compared to the rest of the galaxy?
    Class 5.2: just noticeable (0.551) -> Q6
    Class 5.3: obvious (0.414) -> Q6
Q6: Is anything odd?
    Class 6.2: No (0.802) -> end

Variables

Independent Variables:

Network Architecture (Convolutional Base):

The first part of the network is the convolutional base, which extracts "features" from the data. This will be loaded using Keras's prebuilt models (Fig.15). The DenseNet(121) and the ResNet(101) Convolutional architectures will be compared.

Fig.15: Defining the convolutional base; Source: Candidate's Code (Appendix B)
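A minimal sketch of loading such a prebuilt base from keras.applications (an illustrative stand-in for the candidate's Appendix B code):

```python
from tensorflow.keras.applications import DenseNet121, ResNet101

# include_top=False keeps only the convolutional base (no classifier head);
# weights="imagenet" gives the pretrained starting point noted in Table 8.
base = DenseNet121(include_top=False, weights="imagenet",
                   input_shape=(120, 120, 3), pooling="avg")
# Swap in ResNet101(...) with the same arguments for the ResNet trials.
```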



Activation

The final part of the model is the activation function, which converts the features detected by the convolutional base into probabilities. The effect of two different types of activation functions will be measured on model performance. Each activation will be combined with both convolutional bases.

Sigmoid:

This approach uses 37 separate sigmoid activation nodes, which generate a value between 0 and 1 for every single class (Fig.16), independent of the probabilities of any other classes. This was adapted from (Limanas), a submitter to the competition, but has been used by most future published NN approaches to the task.

Fig.16: Sigmoid activation function; Source: (Ali)
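A sketch of how such a 37-node sigmoid head could be attached to a convolutional base (an assumption of the general shape; the candidate's version is in Appendix B):

```python
from tensorflow import keras
from tensorflow.keras import layers

# `base` is the pretrained convolutional base from the previous sketch.
inputs = keras.Input(shape=(120, 120, 3))
features = base(inputs)                                      # pooled feature vector
outputs = layers.Dense(37, activation="sigmoid")(features)   # one independent probability per class
model = keras.Model(inputs, outputs)
model.compile(optimizer="rmsprop", loss="mse")               # RMSProp + MSE, per Table 8
```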


Decision-tree:

Table 5: Probability matrix with Decision-tree based weightages per question

| Question | Multiplier* |
|---|---|
| 1 | 1.0 (100% probability) |
| 2 | P(Q1=1) |
| 3 | P(Q2=1) |
| 4 | P(Q3) = P(Q2=1) |
| 5 | P(Q4=1) \|\| P(Q11) = P(Q4=1) \|\| P(Q10) = P(Q4=1) \|\| P(Q4=0) = P(Q4) = P(Q3) = P(Q2=1) |
| 6 | P(Q7) \|\| P(Q5) \|\| P(Q9) = P(Q1=0) \|\| P(Q2=1) \|\| P(Q2=0) = P(Q1=0) \|\| P(Q2) = P(Q1=0) \|\| P(Q1=1) = not(P(Q1=2)) |
| 7 | P(Q1=0) |
| 8 | P(Q6=0) |
| 9 | P(Q2=0) |
| 10 | P(Q4=0) |
| 11 | P(Q10) = P(Q4=0) |

*Indices of classes start from 0

Based on the Decision-tree in Fig.11, the alternative is that the probability of a volunteer facing a particular question must also be considered in the probability of a certain answer. Therefore, a probability matrix (Table 5) is created to determine the relative probabilities of each question. The weightages are multiplied by the probabilities generated by a SoftMax (a combination of Sigmoids that sum to 1) for each question (Fig.17). This approach is adapted from (Kawaguchi), who submitted to the competition. However, this source is not published and not necessarily credible; therefore the correctness of this design is not guaranteed.

Fig.17: Defining a Decision-tree in code; Source: Candidate's Code (Appendix B)
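A simplified sketch of this weighting idea for the first two questions only (hypothetical tensor slicing; the candidate's full 11-question version is in Appendix B):

```python
import tensorflow as tf

def decision_tree_weighting(logits):
    # logits: (batch, 37) raw outputs, sliced per question.
    # Q1 (3 classes): softmax with multiplier 1.0, as every volunteer answers it.
    q1 = tf.nn.softmax(logits[:, 0:3])
    # Q2 (2 classes): softmax weighted by the probability of reaching Q2,
    # which per Table 5 is P(Q1=1) ("features or disk", index 1 of Q1).
    q2 = tf.nn.softmax(logits[:, 3:5]) * q1[:, 1:2]
    # ...the remaining questions would follow their Table 5 multipliers...
    return tf.concat([q1, q2], axis=1)
```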

Dependent Variables (DVs):

There are three DVs: performance, resource usage, and combined efficiency, which can be quantified in different ways.

Firstly, the "final" model at the end of 40 epochs will be saved, which standardizes the number of weight updates and training the model undergoes.

Secondly, the "optimal" model will be saved at the point of overfitting, which is theoretically the best performance the model will achieve on unseen data, allowing the models' "best-possible" versions to be compared.

The following will be measured and compared across IVs to answer the RQ (Appendix B).

Model Performance:

Quantitative measures: Optimal test loss ($Test\,MSE_{opt}$), Final test loss ($Test\,MSE_{40}$), measured using custom callbacks (Appendix C).

The loss (MSE) on the testing data will be found at both the Final and Optimal stages, using Keras's evaluate function. This measures the performance of the model on unseen data, and simulates the model's performance in real-world application. Low MSE values are good indicators of accuracy in multi-class prediction. RMSE ($=\sqrt{MSE}$) approximates the average difference between real and predicted probability.

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$

where $n$ = number of training examples, $Y_i$ = ith label, and $\hat{Y}_i$ = ith prediction.

These metrics answer the accuracy and performance part of the research question, but further analysis is needed to evaluate the efficiency of the models.
Resource Usage:

Quantitative metrics: Training time to final ($T_{40}$), training time to optimal ($T_{opt}$), epochs needed to reach optimal ($E_{opt}$), number of parameters ($p$), time per epoch ($T/E$), measured using custom callbacks (Appendix C).

The time taken to reach the final model will be measured and divided by 40 to find the average time per epoch, which represents temporal resource use.

The time/number of epochs taken to reach the optimal model will be measured, representing the resources needed to reach optimum. Time is measured using Python's time library.

The number of parameters in the model also indicates how much training/storage it requires for a dataset.
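A minimal sketch of such a custom callback (an assumption of its general shape; the candidate's version is in Appendix C), timing each epoch and saving weights at the minimum validation loss:

```python
import time
from tensorflow import keras

class TimingCallback(keras.callbacks.Callback):
    """Record per-epoch wall-clock time and the epoch of minimum val loss."""
    def on_train_begin(self, logs=None):
        self.epoch_times, self.best_val, self.best_epoch = [], float("inf"), 0

    def on_epoch_begin(self, epoch, logs=None):
        self.start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_times.append(time.time() - self.start)
        if logs and logs.get("val_loss", float("inf")) < self.best_val:
            self.best_val = logs["val_loss"]       # new optimal (pre-overfitting) point
            self.best_epoch = epoch
            self.model.save_weights("optimal.h5")  # snapshot of the "optimal" model
```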

Training Efficiency:

Quantitative metrics: loss gain per epoch ($\frac{\Delta MSE}{E}$), loss gain per training time ($\frac{\Delta MSE}{T}$), overfitting.

By processing the raw data (Table 6), loss gain/epoch and loss gain/time are calculated. Both show how the model trains given the same resources till it reaches optimum, quantifying the accuracy/resource tradeoff, partly answering the RQ. The train & val loss/time graphs can be analyzed to visually evaluate efficiency in training.

Table 6: Processing the data

| Quantity | Formula |
|---|---|
| Gain in loss | $\Delta Train\,MSE = Train\,MSE_{opt} - Train\,MSE_0$ |
| Gain in loss/epoch | $\frac{\Delta MSE}{E} = \frac{\Delta Train\,MSE}{E_{opt}}$ |
| Gain in loss/time | $\frac{\Delta MSE}{T} = \frac{\Delta Train\,MSE}{T_{opt}}$ |
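As a worked check of these formulas: for Trial 2 (see Tables 9-11), $\Delta Train\,MSE = 0.00641 - 0.34153 = -0.33512$, giving a magnitude of loss gain per epoch of $0.33512 / 18 \approx 0.01862$ and per second of $0.33512 / 1410.2 \approx 0.0002376$, matching the values reported in Table 11.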

However, training efficiency is not the only important factor in the model's real-world applicability. A model can train very well, but if it overfits the data easily, it won't perform well on unseen data. The difference between val/test and train loss at different stages, and a visual inspection of the divergence in val/train loss from the graphs, helps quantify this.
Method

Using the following model specifications, 4 trials (Table 7) will be compiled, trained, and evaluated to produce a DV data table using Python, to effectively compare across IVs, as documented in Appendix B. Materials are in Appendix F.

Table 7: Different trials conducted

| Trial | Type | Depth | Decision-tree |
|---|---|---|---|
| T1 | ResNet | 101 | False |
| T2 | DenseNet | 121 | False |
| T3 | ResNet | 101 | True |
| T4 | DenseNet | 121 | True |

Controlled Variables/Model Specifications:

Table 8 shows variables that directly affect the DVs' measurement, and are specifications for how every model should be trained. They must be kept constant so as not to skew the results.

Table 8: Controlled model specifications

| Variable | Value/How it's controlled | Function | Affects which DV |
|---|---|---|---|
| Optimizer | RMSProp (https://keras.io/api/optimizers/rmsprop/) | Controls process of gradient descent & learning rate (Kumar) | performance and resource usage |
| Loss function | Mean Squared Error | Value that is optimized throughout training | performance |
| Pretrained weights | All models use pretrained weights on ImageNet | Changes starting point of model and training needed | resource usage |
| Final epochs | 40 | Determines how many weight updates happen | performance |
| Processor GPU & RAM | NVIDIA P100 GPU hardware accelerator and 27.3 GB of RAM | Determines the time taken to perform functions | resource usage |
| Training, testing and validation data | 70% training, 15% testing, 15% validation, randomly assigned (random seed = 10) | Models with more training data learn better, but slower | performance and resource usage |
| Python and Keras version | Python 3.7.13, and Keras 2.8.2 from Tensorflow | Updated Python/Keras versions train with different efficiencies, and may have slight differences in functionality | performance and resource usage |
| IDE used | Google Colab Pro | Some IDEs are slower to run with, and some are designed for ML tasks | resource usage |
| Depth | ~111 ± 10 layers | Deeper networks have more parameters, and take longer to train, but model complex functions better | performance and resource usage |
| Dataset version | Galaxy Zoo Version 1.0 (https://www.kaggle.com/competitions/galaxy-zoo-the-galaxy-challenge/data) | Different releases have different composition & quality of training data | performance |
| Imports and dependencies | Appendix F details all imports and their versions | Different versions of modules may have slight differences in functionality | performance and resource usage |

Hypothesis

It can be hypothesized that DenseNet outdoes ResNet in terms of resource usage because it requires fewer parameters, and addresses vanishing gradients more drastically, but ResNet will perform better in terms of accuracy because more parameters can fit the data better. The Decision-tree should outperform Sigmoid in terms of training efficiency, since it manually encodes relationships that would otherwise take time to learn, but might not be more accurate, because small errors in initial probabilities have high weightage on the predictions' accuracy.

Results

Results Tables

Model Performance:

Table 9: Final model performance

| Trial | D.Tree | Type | Depth | $Train\,MSE_0$ | $Train\,MSE_{40}$ | $Val\,MSE_{40}$ | $Test\,MSE_{40}$ |
|---|---|---|---|---|---|---|---|
| T1 | False | ResNet | 101 | 0.23168 | 0.00105 | 0.00919 | 0.00924 |
| T2 | False | DenseNet | 121 | 0.34153 | 0.00334 | 0.00989 | 0.00979 |
| T3 | True | ResNet | 101 | 0.08715 | 0.00554 | 0.01283 | 0.01294 |
| T4 | True | DenseNet | 121 | 0.10295 | 0.05962 | 0.05840 | 0.05892 |

Table 10: Optimal model performance

| Trial | D.Tree | Type | Depth | Epochs | $Train\,MSE_{opt}$ | $Val\,MSE_{opt}$ | $Test\,MSE_{opt}$ |
|---|---|---|---|---|---|---|---|
| T1 | False | ResNet | 101 | 39 | 0.00108 | 0.00909 | 0.00919 |
| T2 | False | DenseNet | 121 | 18 | 0.00641 | 0.00880 | 0.00876 |
| T3 | True | ResNet | 101 | 22 | 0.00767 | 0.00891 | 0.00906 |
| T4 | True | DenseNet | 121 | 15 | 0.05908 | 0.05746 | 0.05812 |

Model Efficiency:

Table 11: Model efficiency

| Trial | D.Tree | Type | Depth | Epochs | $T_{opt}$ (s) | $T_{40}$ (s) | $p$ | $T/E$ (s) | $\frac{\Delta MSE}{E}$ | $\frac{\Delta MSE}{T}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| T1 | False | ResNet | 101 | 39 | 3122.7 | 3201.3 | 42,628,645 | 80.03 | 0.00558 | 0.0000738 |
| T2 | False | DenseNet | 121 | 18 | 1410.2 | 3083.5 | 6,991,781 | 77.09 | 0.01862 | 0.0002376 |
| T3 | True | ResNet | 101 | 22 | 1823.0 | 3283.7 | 42,628,645 | 82.09 | 0.00361 | 0.0000440 |
| T4 | True | DenseNet | 121 | 14 | 1120.2 | 3146.2 | 6,991,781 | 78.66 | 0.00292 | 0.0000371 |

Note: All training graphs are in Appendix D.



Analysis

Best Performing Models:

Trial 2, DenseNet without Decision-tree, had the lowest testing loss of 0.00876 at optimal stage. In terms of pure training efficiency (average loss gain/time), the best-performing model was also DenseNet without Decision-tree, decreasing training loss by 0.0002376/second. A close second was ResNet, both with and without the Decision-tree.

Between Model Architectures

Between ResNet(101) and DenseNet(121), ResNet always achieves the lower final loss, regardless of Decision-tree. The gap between the final ResNet and DenseNet loss is greater with Decision-tree. However, for the optimal model, DenseNet(121) performs marginally better than ResNet(101) without the Decision-tree, but ResNet(101) far outperforms the DenseNet(121) with the Decision-tree. DenseNet is slightly deeper (121>101), but it is unclear whether this causes these results.


Moreover, ResNet tends to create better final models, likely because DenseNet reaches optimal validation loss (the point of overfitting) around the 15th-18th epoch, much earlier than ResNet (around the 30th epoch), and therefore the final model is much more overfit than the ResNet. Overall, optimal model loss is a better parameter to measure accuracy and therefore yields comparable model performance between the two architectures, with slight preference to DenseNet.

DenseNet has significantly higher final training loss than ResNet, which means that it isn't even able to fit the training data as well, but also means that there is less chance of overfitting. ResNet takes more time per epoch, likely because it has more trainable parameters.

With Decision-tree, ResNet decreases loss by a higher amount per epoch and per second, but without the Decision-tree, DenseNet decreases loss significantly more per epoch and per second. The DenseNet with the Decision-tree is the worst-performing model and also trains the least efficiently. This reaffirms the idea that ResNet pairs much better with the Decision-tree: adding a Decision-tree improves the efficiency of the ResNet but worsens that of the DenseNet.

DenseNet, however, has 1/6th of the number of parameters that ResNet does, and yet models the same relationship to a comparable accuracy (without Decision-tree).

Inspecting the graphs of model training, it is observed that there is a much higher disparity between training and validation loss for ResNet than for DenseNet throughout training. This is usually because a higher number of parameters allows the network to fit the training data better. Both have similar amounts of fluctuation in training and validation loss, and both show a steady smooth curve of decrease in training loss over epochs.


Overall, DenseNet seems to perform more efficiently without a decision tree, achieving the highest efficiency, while ResNet handles the decision tree comparatively better, which is unexpected, as ResNet was predicted to be better.

Between Output Layers

Between Decision-tree and Sigmoid, for all trials, both final and optimal model losses for Sigmoid models were lower than their Decision-tree counterparts. For the optimal ResNet model, the gap between Decision-tree and Sigmoid is small, but DenseNet seems to have a higher disparity between Decision-tree and Sigmoid loss.

Generally, Decision-tree is worse than individual Sigmoid layers. Theoretically, the Decision-tree manually encodes relationships in the data into the model instead of learning them, and therefore should perform better, but this isn't true. A likely explanation is that the Decision-tree is highly dependent on the initial questions being answered accurately (~ the first 10 classes): all successive probabilities are highly impacted by any variations in initial probability. Therefore, even slight errors in predicting the initial probabilities are reflected heavily in the entire prediction.

For both convolution-bases (DenseNet especially), adding the Decision-tree makes the training process less efficient (smaller change in loss per time and epoch), contradicting the hypothesis. The Decision-tree doesn't affect the number of trainable parameters, and therefore generally provides worse models for the same number of parameters.

Noticeably in the graphs, throughout training there is a much higher disparity between training and validation loss for models with a Decision-tree. Val loss seems to be generally decreasing till the end of training in models with a Decision-tree, while with Sigmoid it flattens out after the point of overfitting, with no change except for light fluctuations. However, Decision-tree models seem to have significantly more loss fluctuation. This can again be attributed to the fact that a Decision-tree manually encodes relationships that the Sigmoid has to learn. Hence, small changes in weights to initial outputs have cascading effects on model performance. In models without the Decision-tree, the model can easily fit itself to the specific variations in the training data without actually learning overarching probabilistic relationships, but the Decision-tree is forced to, which explains the trends in val/train loss disparity over training.

Decision-tree models take slightly longer to train per epoch, but they reach the optimum sooner than Sigmoid-based models. For every trial, adding a Decision-tree made the model reach the optimum (start to overfit) after fewer epochs and less time. This is positive because it takes fewer temporal resources to train an optimal model, but it tends to overfit very easily.

Overall, Sigmoid-based activation is more efficient and outperforms the Decision-tree, especially with DenseNet.

Analysis of the Data Point

Returning to the data point first dissected (GalaxyID 100801), the structure of predictions is analyzed (Appendix E). Again, probabilities >0.300 are flagged as answers. Most models are predicting realistically close values, but almost no predictions have the exact same probabilities, which seems discouraging. However, most models still generate nearly accurate sequences of questions and answers; hence, in application, morphological classification is done with considerable accuracy. For all models, predictions get worse towards the last questions, as these are most dependent on initial probabilities, and small errors in initial predictions have the most weightage. However, making any generalizable conclusions based on one data point is unreliable.


Conclusions

Evaluation of the Method and Extensions

While the method adequately answered the RQ by considering model performance and resource usage in detail, there are many potential sources of error identified.

Firstly, there was no data cleanup/preprocessing, which might mean that faulty data points (e.g. outliers/null measurements) skewed the data. In typical ML tasks, multiple steps of data cleanup are needed before training.

Secondly, an ever-standing limitation in ML is the lack of data. Data augmentation measures could have been used to improve the models' training. Insufficient training data leads to increased overfitting and less generalizability of the model, which hinders the answering of the RQ since model performance is distorted.

Lastly, the RQ could have been answered better (more generalizably) if analysis was conducted over multiple different morphological datasets, involving other astronomical features.

To extend, factors like hardware, optimizers, and kernel size can be evaluated to find the optimal model for morphological classification. Additionally, the experiment could extend to more network architectures like EfficientNet, LeNet, AlexNet, VGG, etc.

Final Conclusions

In conclusion, answering the RQ, the tested DenseNet and ResNet architectures, with and without Decision-trees, are efficient at classifying the morphology of telescopic images, achieving a best MSE of 0.00876 and an RMSE of 0.0936, close to Kaggle's winning RMSE (0.07466) (Kaggle). This means that there was an average 0.0936 (~9%) difference between predicted and real probabilities, which is acceptably low. Between the IVs, while there were many complexities in trends, it was generally identified that DenseNet classifies more efficiently than ResNet, and simple Sigmoid activations tend to perform morphological classification more efficiently. The hypothesis was mostly incorrect and the trends were surprising.

Overall, most network architectures performed acceptably well on the task, enough for real-world application. This paper helps improve our understanding of CNN features that suit morphological classification of images and can better the models implemented in real astronomical labs.

Works Cited

Abdullah, Muhammad. "Introduction to ResNets." Towards Data Science, 24 Aug. 2019, towardsdatascience.com/introduction-to-resnets-c0a830a288a4.

Ali, Amir. "Logistic Regression with Practical Implementation." Medium, The Art of Data Science, 24 Nov. 2019, medium.com/machine-learning-researcher/logistic-regression-in-machine-learning-ad4d5fef88bb.

Arnaldo, Muhammed. "How to Build a Multi-class Image Classification Model without CNNs in Python." Analytics Vidhya, 27 Jul. 2021, medium.com/analytics-vidhya/how-to-build-a-multi-class-image-classification-model-without-cnns-in-python-660f0f411764.

Awati, Rahul. "Convolutional Neural Network (CNN)." TechTarget, 2021, techtarget.com/searchenterpriseai/definition/convolutional-neural-network.

Bishop, Chris M. "Neural Networks and Their Applications." Review of Scientific Instruments, vol. 65, no. 6, 1994, pp. 1803-1832.

Chen, Liang-Chieh, et al. "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs." IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, 2018, pp. 834-848, doi:10.1109/tpami.2017.2699184.

Chen, Zhiqiang, et al. "Integrating Spatial and Temporal Features for Early Recognition of Mild Cognitive Impairment from Multi-Modal Neuroimaging Data." Scientific Reports, vol. 10, no. 1, 2020, doi:10.1038/s41598-020-70479-z.

Coursera. "Structured vs. Unstructured Data: What's the Difference?" Coursera Articles, www.coursera.org/articles/structured-vs-unstructured-data-whats-the-difference.

"Galaxy Zoo - The Galaxy Challenge." Kaggle, www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge.

García-Ordás, María Teresa, et al. "Detecting Respiratory Pathologies Using Convolutional Neural Networks and Variational Autoencoders for Unbalancing Data." Sensors, vol. 20, no. 4, 2020, p. 1214, doi:10.3390/s20041214.

Hao, Wang, et al. "The Role of Activation Function in CNN." 2020 2nd International Conference on Information Technology and Computer Application (ITCA), IEEE, 2020.

Hardesty, Larry. "Explained: Neural Networks." MIT News, 14 Apr. 2017, news.mit.edu/2017/explained-neural-networks-deep-learning-0414.

Haris, Faizan. "ResNets, DenseNets, and UNets." The Startup, Medium, 26 Sept. 2020, medium.com/swlh/resnets-densenets-unets-6bbdbcfdf010.

He, Kaiming, et al. "Deep Residual Learning for Image Recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2016, doi:10.1109/CVPR.2016.90.

Hirschman, Isidore Isaac, and David V. Widder. The Convolution Transform. Courier Corporation, 2012.

IBM Cloud. "Machine Learning: What It Is and Why It Matters." IBM, www.ibm.com/cloud/learn/machine-learning.

IBM. "Resource Utilization and Performance." IBM Informix Documentation, version 14.10, IBM, www.ibm.com/docs/en/informix-servers/14.10?topic=basics-resource-utilization-performance.

Jain, Rashmi. "3 Types of Gradient Descent Algorithms for Small & Large Data Sets." HackerEarth Blog, 2 July 2019, www.hackerearth.com/blog/developers/3-types-gradient-descent-algorithms-small-large-data-sets/.

Jordan, Jeremy. "Deep Neural Networks: Preventing Overfitting." Jeremy Jordan, 30 July 2018, www.jeremyjordan.me/deep-neural-networks-preventing-overfitting/.

Kawaguchi, Hironobu. "Galaxy Zoo Xception." Kaggle, 13 Mar. 2018, www.kaggle.com/code/hironobukawaguchi/galaxy-zoo-xception.

Krizhevsky, Alex, et al. "ImageNet Classification with Deep Convolutional Neural Networks." arXiv, 11 Dec. 2013, arxiv.org/abs/1308.3496.

Kumar, Ashish. "A Look at Gradient Descent and RMSprop Optimizers." Towards Data Science, 2 Sept. 2019, towardsdatascience.com/a-look-at-gradient-descent-and-rmsprop-optimizers-f77d483ef08b.

Li, M. P., et al. "Non-Thermal Radiation from Clusters of Galaxies." Monthly Notices of the Royal Astronomical Society, vol. 397, no. 4, 2009, doi:10.1111/j.1365-2966.2009.15366.x.

Limanas, Henrique. "Galaxy Zoo Classifier Galaxies." Kaggle, 13 Mar. 2018, www.kaggle.com/code/henriquelimanas/galaxy-zoo-classifier-galaxies.

Malek, Abdul. The Einsteinian Universe?: A Dialectical Perspective of Modern Theoretical Physics and Cosmology. A. Mannan, 2004.

Martin, Patrick. "The Universal Approximation Theorem Is Terrifying." Medium, 9 Aug. 2022, medium.com/@patrickmartinaz/the-universal-approximation-theorem-is-terrifying-83a53acc4192.

May, Ann. "Resizing Images Using Various Interpolation Techniques." Medium, 10 May 2021, annmay10.medium.com/resizing-images-using-various-interpolation-techniques-3c302e2e08c5.

Muthukrishnan, Saravanakumar. "Review: DenseNet Image Classification." Towards Data Science, 4 Jan. 2019, towardsdatascience.com/review-densenet-image-classification-b6631a8ef803.

Neutelings, Izaak. "Neural Networks." TikZ.net, 2 May 2022, tikz.net/neural_networks/.

Rodriguez-Urrego, David, and Miguel A. Maheut. "Deep Learning for Sentiment Analysis: A Survey." Information, vol. 10, no. 11, 2019, doi:10.3390/info10110354.

Roy, Priya. "The Difference Between Artificial Intelligence and Machine Learning." Analytics Insight, 13 Dec. 2019, www.analyticsinsight.net/the-difference-between-artificial-intelligence-and-machine-learning/.

Ruder, Sebastian. "An Overview of Gradient Descent Optimization Algorithms." arXiv preprint arXiv:1609.04747, 2016.

Song, Yan, Ian McLoughlin, and Lirong Dai. "Deep Bottleneck Feature for Image Classification." Proceedings of the 5th ACM International Conference on Multimedia Retrieval, 2015.

Willett, Kyle W., et al. "Galaxy Zoo 2: Detailed Morphological Classifications for 304 122 Galaxies from the Sloan Digital Sky Survey." Monthly Notices of the Royal Astronomical Society, vol. 435, no. 4, 2013, pp. 2835-2860, doi:10.1093/mnras/stt1458.

Ying, Xue. "An Overview of Overfitting and Its Solutions." Journal of Physics: Conference Series, vol. 1168, IOP Publishing, 2019.

Yu, Dingjun, et al. "Mixed Pooling for Convolutional Neural Networks." Rough Sets and Knowledge Technology: 9th International Conference, RSKT 2014, Springer, 2014.

Zhang, Xiyu, et al. "ResNet or DenseNet? Introducing Dense Shortcuts to ResNet." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, openaccess.thecvf.com/content/WACV2021/papers/Zhang_ResNet_or_DenseNet_Introducing_Dense_Shortcuts_to_ResNet_WACV_2021_paper.pdf.


Appendix

Appendix A: Data Download and Processing Code

Loading the Data

Creating Train, Test, Validation Datasets

Appendix B: Models Code

Defining Model

Custom Callbacks

Compiling and Training

Appendix C: Result Generation Code

Training Graphs

Loss and Results

Printing Results

Appendix D: Training Graphs

Without Decision Tree (Accuracy, RMSE, and Loss training graphs for each trial):

| Trial | Base Architecture | Depth |
|---|---|---|
| T1 | ResNet | 101 |
| T2 | DenseNet | 121 |

With Decision Tree (Accuracy, RMSE, and Loss training graphs for each trial):

| Trial | Base Architecture | Depth |
|---|---|---|
| T3 | ResNet | 101 |
| T4 | DenseNet | 121 |

Appendix E: Results on Data Point (GalaxyID: 100801)

| Class | Truth Value | Correct Option | T1 | T2 | T3 | T4 |
|---|---|---|---|---|---|---|
| 1.1 | 0.035 |  | 0.168 | 0.171 | 0.222 | 1.000 |
| 1.2 | 0.965 | 1 | 0.822 | 0.818 | 0.769 | 0.000 |
| 1.3 | 0.000 |  | 0.012 | 0.011 | 0.009 | 0.000 |
| 2.1 | 0.000 |  | 0.015 | 0.014 | 0.011 | 0.000 |
| 2.2 | 0.965 | 1 | 0.810 | 0.783 | 0.758 | 0.000 |
| 3.1 | 0.252 |  | 0.137 | 0.097 | 0.108 | 0.000 |
| 3.2 | 0.713 | 1 | 0.662 | 0.692 | 0.650 | 0.000 |
| 4.1 | 0.875 | 1 | 0.589 | 0.494 | 0.609 | 0.000 |
| 4.2 | 0.090 |  | 0.169 | 0.335 | 0.148 | 0.000 |
| 5.1 | 0.000 |  | 0.020 | 0.011 | 0.020 | 0.000 |
| 5.2 | 0.551 | 1 | 0.484 | 0.333 | 0.406 | 0.000 |
| 5.3 | 0.414 |  | 0.325 | 0.420 | 0.301 | 0.000 |
| 5.4 | 0.000 |  | 0.013 | 0.047 | 0.030 | 0.000 |
| 6.1 | 0.198 |  | 0.271 | 0.414 | 0.213 | 0.255 |
| 6.2 | 0.802 | 1 | 0.728 | 0.590 | 0.778 | 0.745 |
| 7.1 | 0.007 |  | 0.052 | 0.081 | 0.139 | 0.381 |
| 7.2 | 0.028 |  | 0.103 | 0.067 | 0.082 | 0.348 |
| 7.3 | 0.000 |  | 0.002 | 0.001 | 0.001 | 0.270 |
| 8.1 | 0.040 |  | 0.148 | 0.151 | 0.079 | 0.000 |
| 8.2 | 0.000 |  | 0.023 | 0.034 | 0.018 | 0.000 |
| 8.3 | 0.040 |  | 0.094 | 0.081 | 0.044 | 0.062 |
| 8.4 | 0.079 |  | 0.054 | 0.087 | 0.026 | 0.125 |
| 8.5 | 0.000 |  | 0.018 | 0.065 | 0.032 | 0.000 |
| 8.6 | 0.040 |  | 0.006 | 0.034 | 0.013 | 0.069 |
| 8.7 | 0.000 |  | 0.001 | 0.002 | 0.000 | 0.000 |
| 9.1 | 0.000 |  | 0.012 | 0.013 | 0.008 | 0.000 |
| 9.2 | 0.000 |  | 0.001 | 0.001 | 0.000 | 0.000 |
| 9.3 | 0.000 |  | 0.002 | 0.002 | 0.003 | 0.000 |
| 10.1 | 0.600 | 1 | 0.346 | 0.309 | 0.362 | 0.000 |
| 10.2 | 0.277 |  | 0.167 | 0.176 | 0.198 | 0.000 |
| 10.3 | 0.000 |  | 0.033 | 0.038 | 0.050 | 0.000 |
| 11.1 | 0.092 |  | 0.095 | 0.078 | 0.065 | 0.000 |
| 11.2 | 0.230 |  | 0.157 | 0.123 | 0.209 | 0.000 |
| 11.3 | 0.092 |  | 0.077 | 0.054 | 0.053 | 0.000 |
| 11.4 | 0.000 |  | 0.008 | 0.015 | 0.018 | 0.000 |
| 11.5 | 0.138 |  | 0.007 | 0.012 | 0.024 | 0.000 |
| 11.6 | 0.322 | 1 | 0.203 | 0.253 | 0.240 | 0.000 |

Appendix F: Imports and Requirements

IDE and Software

Google Colab Pro on Google Chrome

OS and Language

Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-5.10.147+-x86_64-with-glibc2.29

Modules and Imports

PIL 7.1.2, h5py 3.1.0, keras 2.11.0, matplotlib 3.5.3, numpy 1.22.4, pandas 1.3.5, session_info 1.0.0, tensorflow 2.11.0, astor 0.8.1, astunparse 1.6.3, backcall 0.2.0, cachetools 5.3.0, certifi 2022.12.07, cffi 1.15.1, chardet 4.0.0, cloudpickle 2.2.1, cycler 0.10.0, cython_runtime NA, dateutil 2.8.2, debugpy 1.0.0, decorator 4.4.2, dill 0.3.6, etils 1.0.0, flatbuffers 23.1.21, fsspec 2023.1.0, gast NA, google NA, google_auth_httplib2 NA, googleapiclient NA, httplib2 0.17.4, idna 2.10, importlib_resources NA, ipykernel 5.3.4, ipython_genutils 0.2.0, jax 0.3.25, jaxlib 0.3.25, kiwisolver 1.4.4, mpl_toolkits NA, numexpr 2.8.4, oauth2client 4.1.3, opt_einsum v3.3.0, packaging 23.0, pexpect 4.8.0, pickleshare 0.7.5, pkg_resources NA, platformdirs 3.0.0, portpicker NA, prompt_toolkit 2.0.10, psutil 5.4.8, ptyprocess 0.7.0, pyarrow 9.0.0, pyasn1 0.4.8, pyasn1_modules 0.2.8, pydev_ipython NA, pydevconsole NA, pydevd 2.0.0, pydevd_concurrency_analyser NA, pydevd_file_utils NA, pydevd_plugins NA, pydevd_tracing NA, pydot_ng 2.0.0, pygments 2.6.1, pyparsing 3.0.9, pytz 2022.7.1, requests 2.25.1, rsa 4.9, scipy 1.7.3, sitecustomize NA, six 1.15.0, socks 1.7.1, sphinxcontrib NA, storemagic NA, tblib 1.7.0, tensorboard 2.11.2, termcolor NA, tornado 6.2, traitlets 5.7.1, typing_extensions NA, uritemplate 4.1.1, urllib3 1.24.3, wcwidth 0.2.6, wrapt 1.14.1, zipp NA, zmq 23.2.1