Ram Sharma (19CS8126) Final Year Thesis
May 2023
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY,
DURGAPUR, INDIA
DECLARATION
I, the undersigned, declare that the thesis work entitled “Classification of Astronomical Images
followed by Identification and Localization in an Image Frame comprising multiple
images of different types using RCNN”, submitted towards partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology in Computer Science and
Engineering, is my original work, and this declaration does not form the basis for the award of any
degree or any similar title to the best of my knowledge.
-----------------------------------------------
Name: Ram Sharma
Roll No: 19CS8126
Durgapur
May 2023
CERTIFICATE OF RECOMMENDATION
This is to certify that the thesis entitled “Classification of Astronomical Images followed
by Identification and Localization in an Image Frame comprising multiple images of
different types using RCNN”, submitted by Ram Sharma (19CS8126) of the Department of
Computer Science and Engineering, National Institute of Technology, Durgapur, in partial
fulfillment of the requirements for the award of the degree of Bachelor of Technology in
Computer Science and Engineering, is a bonafide record of work carried out by him/her under
my/our guidance during the academic year 2022 – 2023.
______________________
Prof. Tanmay De
Head of Department,
Department of Computer Science and Engineering,
National Institute of Technology, Durgapur

______________________
Dr. Goutam Sarker
Associate Professor,
Department of Computer Science and Engineering,
National Institute of Technology, Durgapur
CERTIFICATE OF APPROVAL
This is to certify that we have examined the thesis entitled “Classification of Astronomical
Images followed by Identification and Localization in an Image Frame comprising
multiple images of different types using RCNN”, submitted by Ram Sharma (19CS8126),
and hereby accord our approval of it as a study carried out and presented in a manner required for
its acceptance in partial fulfillment of the requirements for the award of the degree of Bachelor of
Technology in Computer Science and Engineering, for which it has been submitted. It is to be
understood that by this approval the undersigned do not necessarily endorse or approve any
statement made, opinion expressed, or conclusion drawn therein, but approve the thesis only for
the purpose for which it is submitted.
Examiners:
ACKNOWLEDGEMENT
First and foremost, I am incredibly grateful to my supervisor, Dr. Goutam Sarker
(Associate Professor, Department of Computer Science and Engineering, NIT
Durgapur), for his valuable advice, continuous support, and patience. His immense
knowledge and great experience have encouraged me throughout my academic project
and daily life.
Special thanks to my project group members (Barnali Ghosh, Aaditya Pratap, and Asim
Sen) for their constant support during this project phase, for giving honest opinions
throughout, and for their constructive criticism.
I also want to thank all the faculty members and PhD scholars of our department for their
guidance and support throughout my four years of bachelor's studies. Finally, I would like
to express my gratitude to my parents and my brother; with their tremendous understanding
and encouragement over the past few years, I was able to complete my studies.
ABSTRACT
In this study, we discuss the R-CNN, Fast R-CNN, and Faster R-CNN algorithms for
automatically detecting and identifying astronomical images. Based on actual data, we compare
the outcomes of these strategies and analyze their advantages and disadvantages. Using
a dataset with three classes (Black Hole, Galaxy, and Nebula), we create a data model.
We then train R-CNN, Fast R-CNN, and Faster R-CNN and test these models on a
test dataset made up of difficult objects that were not present during training. Evaluating
on a GPU, we gather the AP (Average Precision) and IoU (Intersection over Union) for
each model and network over a range of proposal numbers and runtime rates. According
to the findings, the best model for image identification using visual deep learning is
Faster R-CNN with 2000 proposals.
Keywords: R-CNN, Fast R-CNN, Faster R-CNN, Astronomical Images, Black Hole,
Nebula, Galaxy, Detection, Recognition, Localization, Deep Learning.
Object category recognition has drawn increasing attention in recent years. Image
classification and object localization are two essential tasks. The process of assigning an
image one or more labels corresponding to the presence of a category in the image is
known as “image classification.” “Object localization” detects instances of a given class
in the image, often up to a bounding box.
Astronomy has recently made significant strides thanks to detectors, instruments,
telescopes, and even probes sent to distant planets to collect data as part of sky surveys
mapping our cosmos. Astronomers are looking for ways to automate the manual scanning
processes, which are prone to human error. They can use cutting-edge data mining
techniques, statistical methodologies, and data science tools to extract astronomical
knowledge and information from these massive raw databases. (This new field is known
as Astroinformatics [1].)
Early endeavors were launched in 2010 by the National Research Council of the United
States [2]. This step served as the foundation for subsequent research contributions [3] that
concentrated on and enhanced the field using massive, globally distributed collections of
digital astronomical databases, including the Sloan Digital Sky Survey (SDSS), Square
Kilometer Array telescope, and Large Synoptic Survey Telescope (LSST).
However, working with these large datasets is very time-consuming and expensive. These
classification and localization problems should be solved to accurately map our universe,
learn more about it, and support both old and new cosmological theories. Deep learning
is the best candidate technique because it has been successfully applied to large image
datasets like those in our domain.
We employ transfer learning for the classification challenge because we have relatively
few images, and fine-tuning our dataset on a model pre-trained on ImageNet produces
better results than other approaches. Additionally, the convolution operation extracts
features by convolving a kernel or filter over the image matrix; it involves an image matrix
with dimensions (h, w, c) (height, width, channels) and a filter.
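As a minimal sketch of this convolution operation in Keras (the 100 × 100 × 3 input shape and the filter count below are illustrative assumptions, not values fixed by the thesis):

```python
import numpy as np
import tensorflow as tf

# One pass of 32 3x3 filters over a batch of one (height, width, channels) image.
image = np.random.rand(1, 100, 100, 3).astype("float32")
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3,
                              padding="same", activation="relu")
feature_maps = conv(image)
print(feature_maps.shape)  # (1, 100, 100, 32): 32 feature maps, same spatial size
```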
Fig. 3.1.2 shows the flow diagram of the proposed methodology, detailing all the steps
involved in the study.
In the first stage, the input images are resized to a single scale of 100 × 100, as the sizes
of the images vary. The images are then normalized to the desired range for better
prediction. The dataset labels are converted to one-hot vectors. The images and labels are
then split into training and test sets, the model is trained, and, following training,
predictions are made from the model.
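A minimal preprocessing sketch along these lines might look as follows; the OpenCV reading step, the 80/20 split, and the [0, 1] normalization range are assumptions for illustration:

```python
import cv2
import numpy as np
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

def preprocess(image_paths, labels, num_classes=3):
    images = []
    for path in image_paths:
        img = cv2.imread(path)                        # read the raw image
        img = cv2.resize(img, (100, 100))             # resize to a single scale
        images.append(img.astype("float32") / 255.0)  # normalize to [0, 1]
    X = np.stack(images)
    y = to_categorical(labels, num_classes)           # one-hot encode the labels
    return train_test_split(X, y, test_size=0.2, random_state=42)
```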
a) Black Hole,
b) Galaxy,
c) Nebula
● A small amount of data compared to the vast amounts needed by deep learning
architectures.
Several loss functions are applied to address various deep learning problems, including
class imbalance, border refinement, and the reduction of False Positives (FP).
Focal loss is applied when classes are extremely imbalanced. For instance, class
imbalance must be addressed in object detection when the foreground-to-background ratio
is 1:500, since it results in:
• Training being skewed, because the many easy negative examples dominate the loss
while the few positive examples contribute little.
• Overfitting, which biases the model.
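A sketch of the (binary) focal loss of Lin et al. [5] is given below; the alpha and gamma values are the commonly used defaults, not values fixed by this thesis:

```python
import tensorflow as tf

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    # p_t is the predicted probability of the true class
    p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
    alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
    # the (1 - p_t)^gamma factor down-weights easy, well-classified examples
    return -tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t))
```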
Most machine learning techniques are less effective on imbalanced class data. However,
we can modify the training algorithm to account for the skewed class distribution by giving
different weights to the majority and minority classes; during the training phase, these
weight differences influence the classification of the classes.
By default, class_weight = None, i.e., all classes are given equal weights. It can either be
set to ‘balanced’ or be passed a dictionary containing manual weights for each class.
When class_weight = ‘balanced’, the model automatically assigns class weights inversely
proportional to their respective frequencies:

w_j = n_samples / (n_classes × n_samples_j)

Here,
w_j is the weight for class j,
n_samples is the total number of samples or rows in the dataset,
n_classes is the total number of unique classes in the target, and
n_samples_j is the total number of rows of the respective class.
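For example, scikit-learn's compute_class_weight implements exactly this formula; the label array below is a made-up example:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 0, 1, 1, 2])        # e.g. 4 Black Hole, 2 Galaxy, 1 Nebula
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
# w_j = n_samples / (n_classes * n_samples_j) -> [0.583, 1.167, 2.333]
class_weight = dict(zip(classes, weights))  # pass this dict to model.fit(...)
```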
Reduced training time, enhanced neural network performance (in most cases), and not
needing a large amount of data are three of transfer learning's most significant benefits.
Transfer learning is useful in situations where it is not always possible to obtain the large
amounts of data required to train a neural network from scratch. Because the base model
has already been trained, transfer learning can produce an effective machine learning
model from relatively little training data. This is particularly helpful in fields like natural
language processing, where building vast labeled datasets requires a high level of
expertise. Training time is also shortened, because building a deep neural network from
scratch for a complex task can take days or even weeks.
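A minimal transfer-learning sketch with a pre-trained VGG16 in Keras might look as follows; the classification head and its layer sizes are illustrative assumptions, with three output units for this thesis's classes (Black Hole, Galaxy, Nebula):

```python
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(100, 100, 3))
base.trainable = False                        # freeze the pre-trained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # one unit per class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```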
Fig. 3.4.3. Performance Training graph with and without transfer learning.
We must perform data augmentation to make the models converge easily and to overcome
the issue of limited data, which ultimately helps reduce over-fitting during the training
stage. Most CNN-based deep learning algorithms require large amounts of data to
generalize well. For the augmentation task, we duplicate the images by the following
operations (a code sketch follows the list):
• Randomly shifting images horizontally.
• Randomly shifting images vertically.
• Flipping images horizontally.
• Flipping images vertically.
• Rotating images by 90 degrees.
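A sketch of these augmentations using Keras's ImageDataGenerator; the shift fractions are assumptions, and rotation_range only approximates the fixed 90-degree rotation by rotating up to 90 degrees:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    width_shift_range=0.1,    # random horizontal shifts
    height_shift_range=0.1,   # random vertical shifts
    horizontal_flip=True,     # random horizontal flips
    vertical_flip=True,       # random vertical flips
    rotation_range=90,        # random rotations up to 90 degrees
)
# augmenter.flow(X_train, y_train, batch_size=32) yields augmented batches
```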
The steps a Faster R-CNN algorithm takes to find objects in an image are outlined briefly
below (a usage sketch follows the list):
1. Pass an input image to the ConvNet, which returns feature maps for the image.
2. Apply the Region Proposal Network (RPN) to the feature maps to obtain object proposals.
3. Use the ROI pooling layer to reduce all proposals to the same size.
4. Pass the proposals to a fully connected layer to classify them and predict the bounding
boxes for the image.
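As a hedged usage sketch of this pipeline, recent versions of torchvision (0.13 or later) ship an off-the-shelf Faster R-CNN; note it uses a ResNet-50 backbone rather than the VGG16 used in this thesis:

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 100, 100)   # a dummy CHW image with values in [0, 1]
with torch.no_grad():
    predictions = model([image])  # ConvNet -> RPN -> ROI pooling -> heads
print(predictions[0]["boxes"], predictions[0]["labels"], predictions[0]["scores"])
```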
Our network's design is based on VGG16 and includes 13 convolutional layers, four
pooling layers, and three fully connected layers. The same kernel size, padding, and stride
are used in all 13 convolutional layers. The size of the output image after a convolution
can be determined from

O = (W − K + 2P) / S + 1,

where W is the input size, K the kernel size, P the padding, and S the stride; for example,
a 100 × 100 input with a 3 × 3 kernel, padding 1, and stride 1 yields a 100 × 100 output.
Each pooling layer halves the spatial dimensions of its input, so an image of size (M, N)
is reduced to a final feature map of size (M/16, N/16) after the four pooling layers. A single
strided convolutional layer could produce the low-resolution feature map directly from the
high-resolution one, but the computational cost would be significant.
Region Proposal Networks (RPNs) generate proposals from an input image. Several
convolution and pooling layers are applied to the input image (100 × 100) to create a 2D
feature map. Class-agnostic region proposals are selected using a 3 × 3 sliding window at
each position of the feature map. Each sliding window contains nine anchors, each
centered at the window and formed from three scales (128, 256, 512) and three aspect
ratios (1:1, 1:2, 2:1). Each sliding window is then mapped to a 512-dimensional feature
vector.
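A small sketch of this anchor generation; the square-root parameterization keeps each anchor's area near scale² while matching the aspect ratio:

```python
import numpy as np

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)                # width and height chosen so that
            h = s / np.sqrt(r)                # w * h ~ s^2 and w / h = r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors)                  # nine (x1, y1, x2, y2) boxes

print(make_anchors(50, 50).shape)             # (9, 4)
```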
"i" is the index of an anchor in a mini-batch, "p" denotes the likelihood that a proposal is
an object, "p*" denotes the true label of a proposal, "N" denotes the number of anchors,
and "t" denotes the coordinates of the predicted and ground-truth bounding box. Lcls stands
for classification loss and is a log loss over two classes (object or not object). Ncls and
Lreg are two normalization parameters. Regression loss is lreg. The classification score
loss over two classes (object or no object) is the first term, and the Bbox regression loss
when an object is present (p&=1) is the second term. RPN must therefore verify in advance
which pixels belong to objects and which pixels they correspond to. Bounding boxes and
pixels that correspond to them [7].
Features inside each region proposal are first pooled to create a fixed-size feature map, a
process known as ROI pooling. The pooled region passes through the CNN and then
through two fully connected branches: a softmax classifier and a bounding-box regressor.
The primary goal of ROI pooling is to produce the feature maps for each proposal: it scales
the section of the input feature map that corresponds to each region proposal to a
predetermined size.
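A simplified, illustrative ROI max-pooling sketch in NumPy; real implementations operate on batched multi-channel tensors, whereas this assumes a single-channel map and an ROI at least output_size pixels per side:

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=7):
    # roi = (x1, y1, x2, y2) in feature-map coordinates
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    out = np.zeros((output_size, output_size))
    for i in range(output_size):
        for j in range(output_size):
            r0, r1 = i * h // output_size, (i + 1) * h // output_size
            c0, c1 = j * w // output_size, (j + 1) * w // output_size
            out[i, j] = region[r0:r1, c0:c1].max()  # max-pool each grid cell
    return out

pooled = roi_pool(np.random.rand(32, 32), (4, 4, 28, 28))
print(pooled.shape)  # (7, 7) regardless of the ROI's original size
```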
For localization, the Mean IoU is used; this tells us how closely two regions match each
other. It is defined as

IoU = Area of Overlap / Area of Union,

i.e., the area of the intersection of the predicted and ground-truth regions divided by the
area of their union.
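A direct implementation of this definition for two axis-aligned boxes given as (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    # intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)  # overlap / union

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))    # 0.1428...
```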
These two measures (AP and IoU) succinctly describe the precision and quality of the
object detections.
To obtain the whole convolutional feature map, we loaded the VGG16 model as the
base model mentioned above. Its output is supplied to the ROI pooling layer, which
performs a pooling operation on the portion of the input map corresponding to each region
proposal in the original picture, producing a fixed-size feature map per proposal. NMS
(non-maximum suppression) is then applied to reduce the number of candidates, and the
sampled proposals are passed on for classification and bounding-box regression.
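A greedy NMS sketch, reusing the iou() helper defined earlier: keep the highest-scoring box, drop candidates that overlap it beyond the threshold, and repeat:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    order = np.argsort(scores)[::-1]      # candidate indices, best score first
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(int(best))
        # drop remaining candidates that overlap the best box too strongly
        order = np.array([j for j in order[1:]
                          if iou(boxes[best], boxes[j]) < iou_threshold])
    return keep
```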
Tab. 4.2.1. AP, Training Time, and Testing Time for 160 images.
We can see that Faster R-CNN has substantially higher detection accuracy than the R-CNN
and Fast R-CNN approaches. Dataset 3's performance is superior to that of the other two
datasets at low proposal numbers. The findings demonstrate that performance can be
enhanced by using large training samples with numerous proposals.
Tab. 4.2.4. Prediction of our model in the form of data frame/CSV file
We studied three R-CNN techniques to find and identify astronomical images. We created
a dataset using the images provided and a data model with three classes (Black Hole,
Galaxy, Nebula). We then trained R-CNN, Fast R-CNN, and Faster R-CNN and tested them
on a dataset of complex objects that had not been seen during training. On the GPU, we
evaluated and collected the AP and IoU for each model and network over varied proposal
numbers and runtime rates. Based on these findings, we conclude that Faster R-CNN is
the most effective model for astronomical object detection using visual deep learning
models. Low-illumination environments and massive distances from the objects result in
less clear images, which is why traditional image segmentation does not work here: the
descriptors are weak, and the objects' shapes differ significantly.
As a result, the training datasets must include objects in a variety of images. Data
augmentation was used because the dataset was small. Our work has the advantage of
overcoming the constraints mentioned earlier by utilizing cutting-edge CNN techniques.
These methods do not depend on interest points, which are hard to detect in low-contrast
images; instead, they learn abstract discriminative features from the object.
Tab. 5.1.1. Differences between RCNN, Fast RCNN, and Faster RCNN.
Like YOLO, SSD predicts bounding boxes from the feature maps of each convolutional
layer (the result of each filter or layer). A 3 × 3 convolutional kernel is applied to the
combined feature maps to predict bounding boxes and classification probabilities. SSD
represents a family of single-stage detectors, with RetinaNet being one of the most often
used.
Computer Vision is a trending topic with many fields still to explore, and we can use
computer vision for our betterment. This method or model can be tested on various real-
world problems, e.g., blood cell detection or brain tumor detection.
1. https://en.wikipedia.org/wiki/Astroinformatics
2. Astronomy and Astrophysics Decadal Survey (https://www.nap.edu/read/12951/chapter/1)
3. Examples of related academic research: http://iopscience.iop.org/article/10.1086/507440,
https://arxiv.org/abs/astro-ph/0106481, https://arxiv.org/abs/1711.05744,
https://arxiv.org/abs/1706.02467, and https://classic.sdss.org/supernova/aboutsupernova.html
5. T.-Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, “Focal loss for dense object
detection,” in Proc. of the IEEE Int. Conf. on Computer Vision, Venice, Italy, pp. 2980–
2988, 2017; R. Girshick, “Fast R-CNN,” in IEEE International Conference on Computer
Vision, Los Alamitos, CA, USA, 2015.
6. N. Mushtaq, A. A. Khan, F. A. Khan, M. J. Ali, M. M. Ali Shahid et al., “Brain tumor
segmentation using multi-view attention-based ensemble network,” Computers,
Materials & Continua, vol. 72, no. 3, pp. 5793–5806, 2022.
7. A. Hernández-Serna and L. F. Jiménez-Segura, “Automatic identification of species with
neural networks,” PeerJ, vol. 2, p. e563, 2014.
8. R-CNN, Fast R-CNN, Faster R-CNN, YOLO — Object Detection Algorithms