
Big Data of the Huge Universe (BIDHU)

When Galaxies Collide: A Machine Learning Approach

Duilia F. de Mello1,2, Felipe Augusto de Castro e Silva1,3, Gabriel Alexandre Santos4,5, Pedro Xavier4,6
1 The Catholic University of America, 2 NASA Goddard Space Flight Center, 3 University of Campinas,
4 Northern Virginia Community College (NOVA), 5 Uniandrade PR, 6 Federal Institute of Goias

Abstract
We present the latest results of an ongoing research project searching for clues regarding
the evolution of galaxies. Our own galaxy, the Milky Way, is actively colliding
with small dwarf galaxies and will collide with the large Andromeda galaxy in approximately
4 billion years. In this project, we are searching for evidence that other galaxies
are going through similar processes. We are using the Sloan Digital Sky Survey, one of the
largest astronomical databases, to search for colliding galaxies. First we selected
a small slice of the universe and identified 100 candidates. After improving our
search code, we selected 40,000 pairs of galaxies. We are now
implementing code that trains the computer (Machine Learning) to
verify whether these pairs of galaxies are in advanced stages of collision. Here we present
the latest results of the Machine Learning approach.

Galaxy Collisions through Convolutional Neural Networks


We adopt a machine learning approach that uses a convolutional neural network to
determine, from galaxy images, whether or not a collision is taking place. The results were
considerably successful, and some new approaches are proposed to improve the method in
future work.
Convolutional Neural Networks are mathematical models inspired by the organization of
the animal visual cortex that are widely used in image processing and recognition. Their
operation can be compared to human vision: if you see a dog, then within a fraction
of a second your eye registers the image, transforms it into electric pulses, and sends them to
neurons that classify the data and make you understand that there is a dog in
front of you. In this work, we show galaxy images to a learning machine and have it
classify them as colliding or non-colliding galaxies.
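
As a rough illustration of the basic operation inside a convolutional layer, the sketch below slides a small kernel over a toy image and applies a ReLU activation. The 4x4 image and the hand-picked edge kernel are hypothetical; a real network learns its kernels during training, and this is not the architecture used in this work.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, which CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Weighted sum of the image patch under the kernel.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical 4x4 "image": dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d(image, kernel))
for row in feature_map:
    print(row)
```

The feature map is largest where the kernel pattern matches the image. Stacking many such layers lets a network respond to progressively larger structures.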






To make this classification possible, we need to teach the mathematical model how to
recognize whether galaxies are colliding or not. This is done through a process
named training, in which our team presented the machine with 200 galaxy images
pre-classified by eye (100 colliding galaxies and 100 non-colliding galaxies) so that it learns
which images contain colliding galaxies and which contain non-colliding galaxies.
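
A minimal sketch of how such a balanced, labeled training set could be assembled before training. The file names, the 1/0 label encoding, and the fixed shuffle seed are assumptions made for illustration, not details of the BIDHU pipeline.

```python
import random


def build_training_set(colliding_paths, non_colliding_paths, seed=0):
    """Pair each image path with a label (1 = colliding, 0 = non-colliding) and shuffle."""
    examples = [(path, 1) for path in colliding_paths]
    examples += [(path, 0) for path in non_colliding_paths]
    # Shuffle so the network does not see all images of one class first.
    rng = random.Random(seed)
    rng.shuffle(examples)
    return examples


# Hypothetical file names standing in for the 200 images classified by eye.
colliding = [f"colliding_{i:03d}.png" for i in range(100)]
non_colliding = [f"non_colliding_{i:03d}.png" for i in range(100)]

training_set = build_training_set(colliding, non_colliding)
n_colliding = sum(label for _, label in training_set)
print(len(training_set), "images,", n_colliding, "labeled colliding")
```

Keeping the two classes the same size, as in the 100/100 split described above, prevents the network from scoring well simply by always predicting the more common class.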
[Figure: example training images, labeled COLLIDING and NOT]












Interested in joining the BIDHU project? Email demello@cua.edu, follow our posts on
LinkedIn and come to 206 Hannan Hall.
We have billions of galaxies to classify and many discoveries to be made!

Examples of colliding galaxies discovered by the BIDHU team






Results
Various architectures of convolutional neural networks
were tested. In the best outcome, on a set of 80 images (40
colliding galaxies and 40 non-colliding galaxies), every
galaxy classified as colliding was really in
collision, but 12 colliding galaxies were misclassified as
non-colliding (false negatives).

We conclude that our trained machine was correct 85% of the
time, BUT the current sample is small and needs to
be increased!
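
The 85% figure follows from the counts reported above: 28 of the 40 colliding galaxies and all 40 non-colliding galaxies were classified correctly, giving 68 correct out of 80. The sketch below reproduces that arithmetic and, as an added illustration that is not part of the original analysis, computes a 95% Wilson score interval to show how much uncertainty a sample of 80 images leaves.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
        "recall": tp / (tp + fn) if (tp + fn) else float("nan"),
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Counts derived from the text: 40 colliding images, 12 of them missed,
# and no non-colliding image classified as colliding.
metrics = confusion_metrics(tp=28, fn=12, fp=0, tn=40)
low, high = wilson_interval(successes=68, n=80)

print(metrics)
print(f"95% interval for accuracy: {low:.2f} to {high:.2f}")
```

The precision is perfect while the recall is lower, which matches the description above: every predicted collision was real, but some real collisions were missed.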

[Figure: grid of galaxy images, each labeled YES or NO]












Conclusions and future work


Our method can already be used to classify real data.
Due to hardware processing limitations and the small number of
available previously classified images, more complex
architectures could not be tested yet.
Many other machine learning methods can be
combined with ours for more precise results.
Classification will soon be expanded to 40,000 galaxies.
We will create a GUI that will allow each user to select a
galaxy on the screen and let the device do the
classification.
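
One simple way to combine several machine learning methods, as suggested in the conclusions above, is a majority vote over their predictions. The sketch below is only an illustration under that assumption. The three prediction lists are made up, and the poster does not state which combination scheme will be used.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label lists from several classifiers, one vote per classifier per image."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three classifiers on the same three images
# (1 = colliding, 0 = non-colliding).
classifier_a = [1, 0, 1]
classifier_b = [1, 1, 0]
classifier_c = [0, 0, 1]

ensemble = majority_vote([classifier_a, classifier_b, classifier_c])
print(ensemble)
```

Using an odd number of classifiers avoids ties when there are only two classes.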

The BIDHU Project: A Success Story


Two years ago, D. de Mello embarked on a BIG DATA
project that she didn't have much experience with, but
she could see the potential it had for introducing
undergraduates to research.

Most of the students she talked to had heard of big data,
but when they agreed to work on this project they had
no idea that their database would be the entire
universe!

Today we are a group of 10 people spread across the US and
Brazil working on a project we call the Big Data of the
Huge Universe (BIDHU). This is what we have
accomplished so far:

First step: sample selection by undergraduate students
at CUA, W. Barbosa, Ana Nascimento (visiting), and Rocio
Rossi: 100 colliding galaxies
Second step: sample expansion with international
collaborators A. Borges, M. Goya, and S. Puga: 40,000
candidates
Third step: machine learning by undergraduate
students at CUA and NOVA: Felipe de Castro e Silva,
Gabriel A. Santos, and Pedro Xavier

1 undergraduate thesis in computer science: W.
Barbosa (UFAL)
1 honorable mention at an undergraduate science fair:
Ana C. Nascimento (UFRJ)
2 conference presentations: Rocio Rossi (CUA) and
Ana C. Nascimento (UFRJ)

Summary
This work uses a machine learning approach based on
Convolutional Neural Networks to classify galaxy
images as colliding or non-colliding.
Convolutional Neural Networks are mathematical
models that are widely used to classify image data.
200 pre-classified images were used to train the neural
network.
Various neural network architectures were tested, and
there are good outcomes for classifying colliding
images (85% success), but a larger sample is
required.
Future work may use other machine learning methods
to improve the results and will expand the classification to 40,000
galaxies.