To What Extent Are Convolutional Neural Networks Based On DenseNet and ResNet Architectures and Deci
Architectures and Activations on Efficiency at Morphological Classification of Telescopic Images
Research Question: To what extent are Convolutional Neural Networks based on DenseNet and ResNet architectures and Decision Trees efficient at making multi-class morphological
Table of Contents
Introduction
Background Research
    Other Approaches
    Neural Networks
    CNNs
    ResNet
    DenseNet
    Overfitting
    Decision-trees
    Galaxy-morphology
Methodology
    Dataset and Data Format
    Variables
    Method
    Hypothesis
Results
    Results Tables
    Analysis
Conclusions
Appendix
    Appendix A: Data Download and Processing Code
    Appendix B: Models Code
    Appendix C: Result Generation Code
    Appendix D: Training Graphs
    Appendix E: Results on Data Point (GalaxyID: 100801)
    Appendix F: Imports and Requirements
Downloaded from www.clastify.com by Dhruv Manral
Introduction:
Machine Learning (ML) is a dynamic field focused on learning from large amounts of data
complex problems (Roy). ML can be applied to automate processes in various fields, often
Neural Networks (NNs) are algorithms that are made of interconnected units called “Neurons” and simulate the way mammalian brains process data. With recent developments in computing technologies, like Graphics Processing Units, Deep ML (Deep Learning) has grown immensely, with notable advancements in Computer Vision (CV) and Natural Language Processing (NLP), which deal with unstructured data (language/images) rather than tabular forms (Coursera). NNs like Convolutional Neural Networks (CNNs) and Recurrent
Telescopes worldwide collect millions of images, and with more advanced telescopes and better storage systems, this is expected to increase. Such data can be used to detect exoplanets and galaxies many light years away. However, precise analysis by experienced astronomers is required to detect and classify objects of interest in these images. Often, multi-step
structure of astronomical bodies, that can be deduced through detailed analysis of images) (Li et al.)
The Galaxy-Zoo (Fig.1) is a crowdsourced initiative by 90,000+ volunteers, with 1 million+ images of Galaxies taken by the Sloan Digital Sky Survey, classified into types based on shape, color, and direction. Because of the large volumes of data collected, and the depth of processing and attention to detail required to perform these classifications, this is a perfect
Due to the large data volume and high cost of computing power, there is often a tradeoff between the accuracy achieved and the computing resources (like time/power) used. Therefore efficiency, which combines accuracy and computing-resource usage and thus reflects the real-world applicability of these models to solving these problems, must be evaluated (IBM).
This study aims to assess the applicability of different models, based on two CNN Architectures: ResNet and DenseNet, along with Decision-tree approaches, to the processing of large telescopic image datasets for their morphological classification. This is a key emerging problem to solve in the field of astronomy, due to the increased availability of data. By generalizing the specific architectures and model traits that suit the task of morphological-classification of telescopic images, such explorations could transform the field of astronomy
Hence, the Research Question (RQ): To what extent are Convolutional Neural Networks
based on DenseNet and ResNet architectures and Decision Trees efficient at making multi-
Background Research
Other Approaches
On Kaggle, 320+ teams competed; NNs were not as developed then, but the Top 3 participants used primitive NNs with heavy image processing and augmentation techniques.
has since been improved progressively. No works, however, directly compare different architectures, nor Decision-tree and non-decision-tree approaches, which is where this paper
Neural Networks
Neural Networks (NNs) are ML Algorithms that simulate mammalian brains and are suited to deep-learning tasks. NNs rely on having data to train on, as they improve their performance on real-world tasks over time, mirroring how brains learn from experience and practice (Hardesty).
NNs work on the principle of creating features from data and then using a weighted
combination of those features to generate output. NNs can be used for classification and
classification-based NNs.
Fig.2: Barebones NN architecture; every node in the following and previous layers, but not to each other (Bishop). Source: (Martin)
A single node in a NN applies two functions. Firstly, it applies a set of weights (W) and biases (b) to the output of every node in the previous layer, adding them up. Secondly, it applies a non-linearity called an activation function on the summed outputs. Without the non-
[Equation not recoverable from extraction; surviving fragment: $\sum_{i=1}^{I_{L-1}}$]
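The two steps performed by a single node (a weighted sum plus bias, followed by a non-linearity) can be sketched in plain Python. The function names `relu` and `node_output` and the example numbers are illustrative assumptions, not taken from the models trained in this study.

```python
def relu(z):
    """ReLU activation: passes positive values, clips negatives to zero."""
    return max(0.0, z)

def node_output(prev_outputs, weights, bias, activation=relu):
    """Weighted sum of the previous layer's outputs plus a bias, then an activation."""
    z = sum(w * a for w, a in zip(weights, prev_outputs)) + bias
    return activation(z)

# Hypothetical previous-layer outputs, weights and bias.
out = node_output([1.0, 2.0], [0.5, -0.25], bias=0.25)
print(out)  # 0.25
```

Without the activation, stacking such nodes would only ever produce another weighted sum of the inputs, which is why the non-linearity is applied after the summation.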
Gradient descent is performed in a series of weight updates, as the values of the weights are changed based on the loss function’s derivative with respect to the weights, to eventually find a combination of weights that reaches a minimum of the loss function (Fig.3) (Ruder).
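As a minimal sketch of this update rule, the example below runs gradient descent on a one-weight toy loss. The quadratic loss, the learning rate of 0.1, and the function names are assumptions chosen for illustration; a real network applies the same rule to every weight using the derivative of its own loss.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def loss_gradient(w):
    """Derivative of the toy loss with respect to the weight."""
    return 2.0 * (w - 3.0)

def gradient_descent(w_start, learning_rate=0.1, steps=100):
    """Repeatedly move the weight against the gradient of the loss."""
    w = w_start
    for _ in range(steps):
        w -= learning_rate * loss_gradient(w)
    return w

w_final = gradient_descent(w_start=0.0)
print(round(w_final, 6), round(loss(w_final), 6))  # 3.0 0.0
```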
CNNs
Regular NNs were designed for tabular data, with initial features defined by the data scientist. However, image data is stored differently, as three 2D arrays corresponding to each pixel’s
Image processing using typical NNs has not yielded positive results, since each pixel’s RGB values must each become an individual input feature into the NN, leading to the loss of
Unlike a regular NN’s weights and biases, the CNN relies on the Convolution and Pooling
layers. These help the NN better extract features from the image, which helps eventually run
image (Fig.4).
A CNN applies the fundamentals of gradient descent to optimize the filters’ values, and multiple filters of the same size are usually applied to an image in one Convolutional-Layer. Each convolution generates a smaller 2D array, and multiple arrays are stacked as outputs to a layer.
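The way one filter produces a smaller 2D array, and several filters produce a stack of arrays, can be sketched in plain Python. The 4x4 image, the two 2x2 filters, and the name `conv2d_valid` are hypothetical; as in most deep-learning libraries, the filter is slid over the image without being flipped.

```python
def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical single-channel 4x4 image and two 2x2 filters.
image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0],
         [0, 0, 0, 0]]
filters = [[[1, 0], [0, -1]],   # responds to diagonal differences
           [[1, 1], [1, 1]]]    # sums each 2x2 neighbourhood

# One smaller 2D array per filter, stacked as the layer's output.
feature_maps = [conv2d_valid(image, k) for k in filters]
print(len(feature_maps), len(feature_maps[0]), len(feature_maps[0][0]))  # 2 3 3
```

In a trained CNN the filter values are not hand-picked as here; they are the quantities that gradient descent optimizes.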
The Pooling-Operator
CNN Architecture
decrease the size of the image and increase the number of images, Convolutional-bases create bottlenecks to learn the image’s features (Song et al.). After these layers, a CNN ends with a fully-connected (regular NN) layer, hence flattening the features into a single vectorized input. The second component of a complete network is a final activation layer (Hao et al.), like Sigmoid or SoftMax, that converts the image features into usable output (here, probabilities).
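The difference between the two final activations named here can be sketched as follows. The three feature scores are hypothetical; the point is only that Sigmoid treats each output independently, while SoftMax makes the outputs compete so that they sum to 1.

```python
import math

def sigmoid(z):
    """Maps one score to a probability in (0, 1), independent of other scores."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Maps a list of scores to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores coming out of the fully-connected layer.
scores = [2.0, 0.0, -1.0]
independent = [sigmoid(s) for s in scores]  # need not sum to 1
competing = softmax(scores)                 # always sums to 1
print(round(sum(competing), 6))  # 1.0
```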
Different layers, filter sizes, pooling, etc. can be combined to create different convolutional-architectures. Various architectures have been developed over years of research, optimized for different tasks. Two prominent convolutional-architectures will be compared in this paper
ResNet
ResNet (He et al.) was developed in response to the vanishing gradient problem, where gradients (w) became infinitesimally small with network depth, preventing data-scientists from training very deep networks for complex relationships. ResNet leverages the concept of Residuals, where an Identity connection is used to skip certain layers. Fig.7 shows a residual block, central to ResNet architectures. The inputs undergo a linear layer, then an activation (ReLU), and another linear layer, after which the original inputs are added again before the data undergoes another activation (Abdullah). This Network is one of the most-cited of the 21st

Fig. 7 A Sample 2-layer ResNet block with skip connection; Source: Candidate
The effects of the residual skip connection within a single block are therefore represented by:
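A residual block of the kind described can be sketched in plain Python. This is a simplified illustration that uses fully connected (linear) layers on short vectors rather than the convolutional layers of a real ResNet, and `linear`, `relu_vec`, and `residual_block` are hypothetical helper names. In the commonly used notation the block computes relu(F(x) + x), where F is the two-layer transformation and x travels along the skip connection.

```python
def linear(weights, x, bias):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu_vec(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]

def residual_block(x, w1, b1, w2, b2):
    """linear -> ReLU -> linear, then add the original input and apply ReLU."""
    hidden = relu_vec(linear(w1, x, b1))
    transformed = linear(w2, hidden, b2)
    return relu_vec([t + xi for t, xi in zip(transformed, x)])

# With all-zero weights the two layers contribute nothing, so only the
# skip connection carries the input through the block.
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, -2.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0]))  # [1.0, 0.0]
```

The zero-weight case shows why the skip connection helps very deep networks: even a block that has learned nothing still passes its input (and its gradient) along.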
DenseNet
DenseNet (Zhang et al.) was another architecture developed to address vanishing gradients. It
leverages the same skip connections to a greater extent, where each layer within a block is
connected to the activations of every previous layer. Similar to ResNet, all inputs go through
a linear layer, followed by an activation (ReLU) and another linear layer, after which the
activations of all previous layers are added before the data goes through another activation (Fig.8).
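The dense connectivity pattern can be sketched as below. This sketch assumes the concatenation form used in common DenseNet implementations, in which each layer receives the outputs of all earlier layers joined into one vector; that plays the role the text describes as combining the activations of all previous layers. It uses small fully connected layers instead of convolutions, and `linear`, `relu_vec`, and `dense_block` are hypothetical names.

```python
def linear(weights, x, bias):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu_vec(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]

def dense_block(x, layers):
    """Each layer sees the input and the outputs of every previous layer."""
    features = [x]
    for weights, bias in layers:
        joined = [v for f in features for v in f]  # all earlier activations
        features.append(relu_vec(linear(weights, joined, bias)))
    return [v for f in features for v in f]

# Hypothetical 2-layer block: layer 1 sees 2 values, layer 2 sees 2 + 1 = 3.
layers = [([[1.0, 1.0]], [0.0]),
          ([[1.0, 0.0, 1.0]], [0.0])]
print(dense_block([1.0, 2.0], layers))  # [1.0, 2.0, 3.0, 4.0]
```

Because every layer reuses earlier features directly, each layer can be narrow, which is one commonly cited reason dense architectures need fewer parameters.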
Fig.8: A sample 4-layer DenseNet block with skip connections [1]; Source: Candidate
[1] This schematic refers to “weight layers” for convenience, but these are convolutional and pooling layers in reality. The weight operation referred to in the formula also refers to the convolution operator, not direct multiplication.
The effects of the dense skip connections within a single dense block are shown (Rodriguez-Urrego):
[Equation not recoverable from extraction; surviving fragment: $\sum_{k=2}^{K}$]
Usually, there is also a weight matrix, where previous activations are multiplied by trainable
Both DenseNet and ResNet make claims to higher efficiency, and subsequent reports provide differing accounts based on the application (Haris). These comparisons come often from
papers published in revered CV Journals have not reached a consensus (Chen et al.)
Therefore, it is pertinent to compare and analyze these networks in a context where deep
Overfitting:
Source: (Jordan)
performance on unseen data.
Decision-trees
Morphological-classification of
each stage with deep-learning, and then following the Decision-tree’s framework. A Decision-tree provides multiple paths to reach the final classification. Fig.10 visually displays
Galaxy-morphology
Generating probabilities for each possibility will eventually help generate the most probable path to the end, and can eventually classify the galaxy image. The next question that the
Willett et al.
Methodology
images, and the Galaxy-Zoo Dataset will be used as a representative of these tasks.
Primary experimental data from the training and evaluation of actual models is analyzed because it allows more freedom of manipulation of the IVs. However, this constrains the analysis within available computational power and technical limitations.
Dataset and Data Format
Dataset:
The dataset, with 61578 images, is loaded and processed from Kaggle into Python and read using Pandas (Appendix A). It is split into Training (train), Testing (test), and Validation
Images:
The images are 424x424 Colored JPEG files (Fig.13). Each image is centered around an
These are resized to 120x120 using interpolation to conserve processing power, without significantly changing the extent to which morphological features are visible (May).
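What resizing by interpolation does can be sketched in plain Python. The essay does not state which interpolation method was used, so bilinear interpolation on a single colour channel is assumed here, and `resize_bilinear` and the synthetic test image are hypothetical. In practice an image library would perform this step on all three channels.

```python
def resize_bilinear(img, new_h, new_w):
    """Resize a 2D list of pixel values by blending the four nearest source pixels."""
    old_h, old_w = len(img), len(img[0])
    out = []
    for r in range(new_h):
        # Map the output row onto the source grid (corners stay aligned).
        y = r * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        fy = y - y0
        row = []
        for c in range(new_w):
            x = c * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

# Synthetic 424x424 single-channel image standing in for one colour channel.
channel = [[(r + c) % 256 for c in range(424)] for r in range(424)]
small = resize_bilinear(channel, 120, 120)
print(len(small), len(small[0]))  # 120 120
```

Going from 424x424 to 120x120 reduces the number of pixels per channel from 179,776 to 14,400, which is where the saving in processing power comes from.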
Labels:
responses from users (broken-down in Table 2) and the dataset provides the probability of each of these classes for every training example (Fig.14). Each galaxy is assigned a unique galaxy-ID that corresponds to the image and row in the dataset.
The 120x120 Image, along with the probabilities (Table 3), for the image with GalaxyID 100801 is chosen randomly to understand the structure and correlation of each data point.
All probabilities >0.300 were flagged as likely answers (Table 3). Using these, we can interpret the labels to deduce the most probable sequence of answered questions (Table 4), using Fig.11.
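The flagging step can be sketched as a simple threshold filter. The probabilities below are the values quoted for GalaxyID 100801 in this essay (questions 2 to 6 only); the dictionary layout and variable names are assumptions made for illustration.

```python
THRESHOLD = 0.300

# Label probabilities quoted in the essay for GalaxyID 100801 (subset).
labels = {
    "Class2.2": 0.965,
    "Class3.2": 0.713,
    "Class4.1": 0.875,
    "Class5.1": 0.000,
    "Class5.2": 0.551,
    "Class5.3": 0.414,
    "Class5.4": 0.000,
    "Class6.1": 0.198,
    "Class6.2": 0.802,
}

# Keep only the answers whose probability exceeds the threshold.
likely = {name: p for name, p in labels.items() if p > THRESHOLD}
print(sorted(likely))
```

Question 5 keeps two flagged answers (5.2 and 5.3), which shows that the threshold marks plausible answers rather than forcing a single choice per question.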
This data point appears to have one clear probable sequence of answers, however, this is not
Q2: Could this be a disk viewed edge on?
    Class 2.2: No (0.965) -> Q3
Q3: Is there a sign of a bar feature through the center of the galaxy?
    Class 3.2: No (0.713) -> Q4
Q4: Is there any sign of a spiral arm pattern?
    Class 4.1: Yes (0.875) -> Q10
Q5: How prominent is the central bulge compared to the rest of the galaxy?
    Class 5.2: just noticeable (0.551) -> Q6
    Class 5.3: obvious (0.414) -> Q6
Q6: Is Anything Odd?
    Class 6.2: No (0.802) -> end
Variables
Independent Variables:
The first part of the network is the convolutional base, which extracts “features” from the
Activation
The final part of the model is the activation function, which converts the features detected by
the convolutional base into probabilities. The effect of two different types of Activation
Functions will be measured on model performance. Each activation will be combined with
Sigmoid:
This approach uses 37 separate sigmoid activation nodes,
class (Fig.16), independent of the probabilities of any other classes. This was adapted from (Limanas), a submitter to the competition, but has been used by most future published NN
Fig.16 Sigmoid Activation Function; Source: (Ali)
Decision-tree
[Fig.17 residue: P(Q2=1) = not(P(Q1=2))]
Based on the Decision-tree in Fig.11, the alternative is where the probability of a volunteer facing a particular question must also be considered in the probability of a certain answer.
each question. The weightages are multiplied by the probabilities generated by a SoftMax (a normalized exponential whose outputs sum to 1) for each question, Fig.17. This approach is adapted from (Kawaguchi), who submitted to the competition. However, this source is not published and not necessarily credible; therefore the correctness of this design is not guaranteed.
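The weighting idea can be sketched as follows: the answers to one question are given a SoftMax distribution, which is then scaled by the probability that the question was reached at all. The scores, the 0.8 reach probability, and the name `weighted_answers` are hypothetical, and the sketch is not a reproduction of the adapted Kaggle design.

```python
import math

def softmax(scores):
    """Maps a list of scores to probabilities that sum to 1."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_answers(p_reach_question, answer_scores):
    """Scale a question's answer distribution by the chance of reaching it."""
    return [p_reach_question * p for p in softmax(answer_scores)]

# Hypothetical: a question reached with probability 0.8, with two answers.
q_probs = weighted_answers(0.8, [1.0, 3.0])

# The answer that leads onwards becomes the reach probability of the next question.
p_reach_next = q_probs[1]
next_probs = weighted_answers(p_reach_next, [0.5, 0.5, 0.5])
print(round(sum(q_probs), 6))  # 0.8
```

Because each reach probability is a product of earlier predictions, an error in an early question is carried into every question below it in the tree.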
Dependent Variables (DVs):
There are three DVs: performance, resource usage, and combined efficiency, which can be quantified in different ways.
Firstly, the “final” model at the end of 40 epochs will be saved, which standardizes the
Secondly, the “optimal” model will be saved at the point of overfitting, which is theoretically the best performance the model will achieve on unseen data, allowing the models’ “best-
The following will be measured and compared across IVs to answer the RQ (Appendix B).
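One way to locate the “optimal” model is to take the epoch with the lowest validation loss, after which validation loss stops improving. The sketch below assumes that definition; the loss values and the name `optimal_epoch` are hypothetical, and the essay's own callback code (Appendix B) is not reproduced here.

```python
def optimal_epoch(val_losses):
    """Return (1-based epoch number, loss) of the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1, val_losses[best_index]

# Hypothetical validation losses: improving, then rising as the model overfits.
val_history = [0.050, 0.030, 0.020, 0.025, 0.030]
epoch, best_loss = optimal_epoch(val_history)
print(epoch, best_loss)  # 3 0.02
```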
Model Performance:
Quantitative Measures: Optimal Test Loss ($\text{Test MSE}_{opt}$), Final Test Loss ($\text{Test MSE}_{40}$),
The loss (MSE) on the Testing data will be found at both the Final and Optimal stages, using Keras’s evaluate function. This measures the performance of the model on unseen data, and simulates the model’s performance in real-world application. Low MSE values are good
$Y_i$ = ith label
$\hat{Y}_i$ = ith prediction
These metrics answer the accuracy and performance part of the research question, but further analysis is needed to assess the efficiency of the models.
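For reference, the mean squared error in its standard form is $MSE = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$, with $Y_i$ the ith label and $\hat{Y}_i$ the ith prediction. A minimal plain-Python sketch follows; the function name `mse` and the sample values are assumptions, and the study itself obtains this value from Keras.

```python
def mse(labels, predictions):
    """Mean of the squared differences between labels and predictions."""
    n = len(labels)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(labels, predictions)) / n

# Hypothetical label and prediction probabilities for two classes.
print(mse([0.0, 1.0], [0.0, 0.0]))  # 0.5
```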
Resource Usage:
Quantitative Metrics: Training time to final ($T_{40}$), Training time to optimal epochs ($T_{opt}$), Epochs needed to reach optimal ($E_{opt}$), number of parameters ($p$), Time per epoch ($\frac{T}{E}$)
The time taken to reach the final model will be measured and divided by 40 to find the
The time/number of epochs taken to reach the optimal model will be measured, representing the resources needed to reach optimum. Time is measured using Python’s time library.
The number of parameters in the model also indicates how much training/storage it requires for a dataset.
Training Efficiency:
Quantitative Metrics: loss gain per epoch ($\frac{\Delta MSE}{E}$), loss gain per training time ($\frac{\Delta MSE}{T}$), overfitting
By processing the raw data (Table 6), loss gain/epoch, and loss gain/time are calculated. Both show how the model trains given the same resources till it reaches optimum, quantifying the accuracy/resource tradeoff, partly answering the RQ. The train & val loss/time graphs can be
However, the training efficiency is not the only important factor in the model’s real-world applicability. A model can train very well, but if it overfits the data easily, it won’t perform well on unseen data. The difference between val/test and train loss at different stages, and a visual inspection of the divergence in val/train loss from the graphs helps quantify this.
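These efficiency measures can be sketched as simple ratios. The essay does not spell out how the change in loss is taken, so this sketch assumes it is the starting loss minus the loss at the optimal model; all numbers and function names are hypothetical.

```python
def loss_gain_per_epoch(start_loss, optimal_loss, epochs_to_optimal):
    """Average decrease in loss per epoch up to the optimal model."""
    return (start_loss - optimal_loss) / epochs_to_optimal

def loss_gain_per_second(start_loss, optimal_loss, seconds_to_optimal):
    """Average decrease in loss per second of training up to the optimal model."""
    return (start_loss - optimal_loss) / seconds_to_optimal

def overfitting_gap(train_loss, val_loss):
    """Positive when the model fits the training data better than unseen data."""
    return val_loss - train_loss

# Hypothetical run: loss falls from 0.5 to 0.25 in 5 epochs taking 100 seconds.
print(loss_gain_per_epoch(0.5, 0.25, 5))      # 0.05
print(loss_gain_per_second(0.5, 0.25, 100.0))  # 0.0025
```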
Method
Using the following model specifications, 4 trials (Table 8) will be compiled, trained, and evaluated to produce a DV data table using Python, to effectively compare across IVs, as
how every model should be trained. They must be kept constant to not skew the results.
Data
Python and Keras version | Python 3.7.13, and Keras from Tensorflow 2.8.2 | Updated Python/Keras versions train with different efficiencies, and may have slight differences in | performance and resource usage
Dataset | Galaxy Zoo Version 1.0, zoo-the-galaxy-challenge/data | Different releases have different data. | performance
Imports and functionality | Appendix F details all imports and their | Different versions of modules may | performance, usage
Hypothesis
It can be hypothesized that DenseNet outdoes ResNet in terms of resource usage because it
requires fewer parameters, and addresses vanishing gradients more drastically, but ResNet
will perform better in terms of accuracy because more parameters can fit the data better. The
encodes relationships that would otherwise take time to learn, but might not be more
accurate, because small errors in initial probabilities have high weightage on the predictions'
accuracy.
Results
Results Tables
Model Performance:
T4 True DenseNet 121 0.10295 0.05962 0.05840 0.05892
Table 10 Optimal model performance
Trial D.Tree Type Depth Epochs $\text{Train MSE}_{40}$ $\text{Val MSE}_{40}$ $\text{Test MSE}_{40}$
Model Efficiency:
Analysis
Trial 2, DenseNet without Decision-tree, had the lowest testing loss of 0.00876 at optimal
stage. In terms of purely training efficiency (average loss gain/ time), the best-performing
0.0002376/second. A close second was ResNet, both with and without the Decision-tree.
Between ResNet(101) and DenseNet(121), ResNet always achieves the lower final loss, regardless of Decision-tree. The gap between the final ResNet and DenseNet loss is greater with Decision-tree. However, for the optimal model, DenseNet(121) performs marginally better than ResNet(101) without the Decision-tree, but ResNet(101) far outperforms the
Moreover, ResNet tends to create better final models, likely because DenseNet reaches optimal validation loss (point of overfitting) around the 15th-18th epoch, much earlier than ResNet (around the 30th epoch) and therefore the final model is much more overfit than the ResNet. Overall, optimal model loss is a better parameter to measure accuracy and therefore yields comparable model performance between the two architectures with slight preference to DenseNet.
DenseNet has significantly higher final training loss than ResNet, which means that it isn’t
even able to fit the training data as well, but also means that there is less chance of
overfitting. ResNet takes more time per epoch, likely because it has more trainable
parameters.
With Decision-tree, ResNet decreases loss by a higher amount per epoch and per second, but
without the Decision-tree DenseNet significantly decreases loss by more per epoch and per
second. The DenseNet with the Decision-tree is the worst-performing model and also trains
the least efficiently. This reaffirms the idea that ResNet pairs much better with Decision-tree.
Adding a Decision-tree improves the efficiency of the ResNet but worsens that of the
DenseNet.
DenseNet, however, has 1/6th of the number of parameters that ResNet does, and yet, models
Inspecting the graphs of model training, it is observed that there is a much higher disparity between training and validation loss for ResNet than in DenseNet throughout training. This is usually because a higher number of parameters allows the network to fit the training data better. Both have similar amounts of fluctuation in training and validation loss, and both
Overall, DenseNet seems to perform more efficiently without a decision tree and achieves the highest efficiency, and ResNet performs better with a decision tree, which is unexpected, as
Between Decision-tree and Sigmoid, for all trials, both final and optimal model losses for
sigmoid models were lower than their Decision-tree counterparts. For the optimal ResNet
model, the gap between Decision-tree and Sigmoid is small, but DenseNet seems to have
Generally, Decision-tree is worse than individual sigmoid layers. Theoretically, the Decision-
tree manually encodes relationships in the data into the model instead of learning them,
therefore should perform better, but this isn’t true. A likely explanation is that the Decision-
tree is highly dependent on initial questions being answered accurately (~ the first 10 classes)
– all successive probabilities are highly impacted by any variations in initial probability.
Therefore, even slight errors in predicting the initial probabilities are reflected heavily in the
entire prediction.
For both convolution-bases (DenseNet especially) adding the Decision-tree makes the training process less efficient (smaller change in loss per time and epoch), contradicting the hypothesis. The Decision-tree doesn’t affect the number of trainable parameters, and therefore generally provides worse models for the same number of parameters.
Noticeably in graphs, throughout training, there is a much higher disparity between training and validation loss for Models with a Decision-tree. Val loss seems to be generally decreasing till the end of training in models with a Decision-tree, while it flattens out after the point of overfitting, with no change except for light fluctuations with Sigmoid. However, Decision-tree models seem to have significantly more loss fluctuation. This can again be attributed to the fact that a Decision-tree manually encodes relationships that the sigmoid has to learn. Hence, small changes in weights to initial outputs have cascading effects on model performance. In models without the Decision-tree, the model can easily fit itself to the specific variations in the training data without actually learning overarching probabilistic relationships, but the Decision-tree is forced to, which explains the trends in val/train loss
Decision-tree models take slightly longer to train per epoch, but they reach the optimum sooner than sigmoid-based models. For every trial, adding a Decision-tree made the model reach the optimum (start to overfit) after fewer epochs and less time. This is positive because it takes fewer temporal resources to train an optimal model, but it tends to overfit very easily.
Returning to the data point first dissected (GalaxyID 100801), the structure of predictions is analyzed (Appendix E). Again, probabilities >0.300 are flagged as answers. Most models are predicting realistically close values, but almost no predictions have the exact same probabilities, which seems discouraging. However, most models still generate nearly accurate
done with considerable accuracy. For all models, predictions get worse towards the last questions, as they are most dependent on initial probabilities, and small errors in initial predictions have the most weightage. However, making any generalizable conclusions based
Conclusions
While the method adequately answered RQ, by considering model performance and resource
Firstly, there was no data cleanup/preprocessing, which might mean that faulty data points (e.g. outliers/null measurements) skewed the data. In typical ML tasks, multiple steps of data
could have been used to improve the model’s training. Insufficient training data leads to increased overfitting, and less generalizability of the model, which hinders the answering of
Lastly, the RQ could have been answered better (more generalizably), if analysis was
features.
To extend, factors like hardware, optimizers, and kernel size can be evaluated to find the
Final Conclusions
In conclusion, answering the RQ, the tested Dense-Net and Res-Net architectures with and
achieving a best MSE of 0.00876, and an RMSE of 0.0936, close to Kaggle’s winning loss (0.07466) (Kaggle). This means that the root-mean-square difference between predicted and real probabilities was 0.0936 (~9 percentage points), which is acceptably low. Between the IVs, while there were many complexities in trends, it was generally identified that DenseNet classifies more efficiently than ResNet, and simple Sigmoid activations tend to perform morphological-classification more efficiently. The hypothesis was mostly incorrect and the trends were surprising,
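The RMSE quoted here follows directly from the best MSE, since RMSE is the square root of MSE. A short check in Python (variable names are illustrative):

```python
import math

best_mse = 0.00876           # best test MSE reported in this essay
rmse = math.sqrt(best_mse)   # RMSE is the square root of MSE
print(round(rmse, 4))  # 0.0936
```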
Overall, most network architectures performed acceptably well on the task, enough for real-
world application. This paper helps improve our understanding of CNN features that suit
astronomical labs.
Works Cited
towardsdatascience.com/introduction-to-resnets-c0a830a288a4.
Ali, Amir. “Logistic Regression with Practical Implementation.” Medium, The Art of Data
regression-in-machine-learning-ad4d5fef88bb.
CNNs in Python." Analytics Vidhya, 27 Jul. 2021, medium.com/analytics-vidhya/how-to-build-a-multi-class-image-classification-model-without-cnns-in-python-660f0f411764.
techtarget.com/searchenterpriseai/definition/convolutional-neural-network.
Chen, Liang-Chieh, et al. “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected Crfs.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, 2018, pp. 834–848., https://doi.org/10.1109/tpami.2017.2699184.
Chen, Zhiqiang, et al. "Integrating Spatial and Temporal Features for Early Recognition of
Coursera. “Structured vs. Unstructured Data: What’s the Difference?” Coursera Articles, Coursera, https://www.coursera.org/articles/structured-vs-unstructured-data-whats-the-difference.
zoo-the-galaxy-challenge.
Neural Networks and Variational Autoencoders for Unbalancing Data.” Sensors, vol. 20, no. 4, 2020, p. 1214., https://doi.org/10.3390/s20041214.
Hao, Wang, et al. "The role of activation function in CNN." 2020 2nd International
2020.
Haris, Faizan. "ResNets, DenseNets, and UNets." The Startup - Medium, 26 Sept. 2020, https://medium.com/swlh/resnets-densenets-unets-6bbdbcfdf010.
news.mit.edu/2017/explained-neural-networks-deep-learning-0414.
He, Kaiming, et al. "Deep Residual Learning for Image Recognition." Proceedings of the
doi:10.1109/CVPR.2016.90.
Hirschman, Isidore Isaac, and David V. Widder. The convolution transform. Courier Corporation, 2012.
IBM Cloud. “Machine Learning: What it is and why it matters.” IBM, IBM, https://www.ibm.com/cloud/learn/machine-learning.
IBM. "Resource utilization and performance." IBM Informix documentation, version 14.10, IBM, https://www.ibm.com/docs/en/informix-servers/14.10?topic=basics-resource-utilization-performance.
Jain, Rashmi. “3 Types of Gradient Descent Algorithms for Small & Large Data Sets.” HackerEarth Blog, 2 July 2019, https://www.hackerearth.com/blog/developers/3-types-gradient-descent-algorithms-small-large-data-sets/.
Jordan, Jeremy. “Deep Neural Networks: Preventing Overfitting.” Jeremy Jordan, Jeremy
overfitting/.
https://www.kaggle.com/code/hironobukawaguchi/galaxy-zoo-xception.
Kumar, Ashish. "A Look at Gradient Descent and RMSprop Optimizers." Towards Data
rmsprop-optimizers-f77d483ef08b.
Li, M. P., Ostriker, J. P., Sunyaev, R., & Blandford, R. D. "Non-thermal radiation from clusters of galaxies." Monthly Notices of the Royal Astronomical Society, vol. 397,
Limanas, Henrique. “Galaxy Zoo Classifier Galaxies.” Kaggle, Kaggle, 13 Mar. 2018, https://www.kaggle.com/code/henriquelimanas/galaxy-zoo-classifier-galaxies.
Malek, Abdul. The Einsteinian Universe?: A Dialectical Perspective of Modern Theoretical Physics and Cosmology. A. Mannan, 2004.
Martin, Patrick. “The Universal Approximation Theorem Is Terrifying.” Medium, Medium, 9
theorem-is-terrifying-83a53acc4192.
May, Ann. "Resizing Images using Various Interpolation Techniques." Medium, 10 May 2021, annmay10.medium.com/resizing-images-using-various-interpolation-techniques-3c302e2e08c5.
classification-b6631a8ef803.
https://tikz.net/neural_networks/.
Roy, Priya. “The Difference Between Artificial Intelligence and Machine Learning.”
https://www.analyticsinsight.net/the-difference-between-artificial-intelligence-and-machine-learning/#
Rodriguez-Urrego, David, and Miguel A. Maheut. "Deep Learning for Sentiment Analysis: A
Ruder, Sebastian. "An overview of gradient descent optimization algorithms." arXiv preprint arXiv:1609.04747 (2016).
Song, Yan, Ian McLoughLin, and Lirong Dai. "Deep bottleneck feature for image
Willett, Kyle W., et al. “Galaxy Zoo 2: Detailed Morphological Classifications for 304 122 Galaxies from the Sloan Digital Sky Survey.” Monthly Notices of the Royal
https://doi.org/10.1093/mnras/stt1458.
Yu, Dingjun, et al. "Mixed pooling for convolutional neural networks." Rough Sets and
Ying, Xue. "An overview of overfitting and its solutions." Journal of physics: Conference
Vision, 2021, https://openaccess.thecvf.com/content/WACV2021/papers/Zhang_ResNet_or_DenseNet_Introducing_Dense_Shortcuts_to_ResNet_WACV_2021_paper.pdf.
Appendix
Defining Model
Custom Callbacks
Training Graphs
Printing Results
T2 DenseNet 121
T3 ResNet 101
T4 DenseNet 121
Class 5.1 0.000 0.020 0.011 0.020 0.000
Class 5.2 0.551 1 0.484 0.333 0.406 0.000
Class 5.3 0.414 0.325 0.420 0.301 0.000
Class 5.4 0.000 0.013 0.047 0.030 0.000
Class 6.1 0.198 0.271 0.414 0.213 0.255
OS and Language
Python 3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
Linux-5.10.147+-x86_64-with-glibc2.29
pandas 1.3.5
session_info 1.0.0
tensorflow 2.11.0
astor 0.8.1
astunparse 1.6.3
backcall 0.2.0
cachetools 5.3.0
cython_runtime NA
3.0.0
portpicker NA
prompt_toolkit 2.0.10
psutil 5.4.8
ptyprocess 0.7.0
pyarrow 9.0.0
pyasn1 0.4.8
pyasn1_modules 0.2.8
pydevd_plugins NA