1.3 Assessment
As a reminder, assessment for the Astro Lab is via a Moodle quiz. The quiz will open and close on
the day of your second Lab session. You will need to have your two notebooks accessible
to answer the quiz. The quiz itself shouldn’t take more than 10 minutes: you will have finished
the lab by the time you sit the quiz, so you will have already done all of the hard work!
1.3.1 Imports
Firstly, we will import the necessary SciServer and support libraries.
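[1]: # Import Python libraries to work with SciServer
import SciServer.CasJobs as CasJobs # query with CasJobs
import SciServer.SciDrive as SciDrive # read/write to/from SciDrive
import SciServer.SkyServer as SkyServer # show individual objects and generate thumbnail images through SkyServer
# Support libraries used in the cells below
import numpy as np # numerical arrays, np.where(), np.percentile()
import matplotlib.pyplot as plt # plotting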
a subsample of the original database, though SQL can operate on the data very effectively too. In
this tutorial we will submit queries to the SDSS database to gather the information that we need,
and we will use Python to operate on, manipulate, and visualise that data.
An extensive tutorial on how to query the SDSS database is provided here:
http://skyserver.sdss.org/dr14/en/help/howto/search/searchhowtohome.aspx . In short, every SQL
command consists of three blocks:
• The SELECT block: it defines the quantities that you want your query to return.
• The FROM block: it defines which tables of the database you want SQL to look in.
• The WHERE block: it defines any constraints on the data that you want to impose.
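Here is a minimal offline sketch of that three-block structure. It is plain Python string building with no SciServer connection, and `build_query` is a hypothetical helper written for illustration. It does no escaping or validation, so it is a teaching aid only.

```python
def build_query(select_cols, from_clause, where_conditions):
    """Assemble a SQL query string from its SELECT, FROM and WHERE blocks.

    Hypothetical helper for illustration only: no escaping or validation.
    """
    select_block = "SELECT " + ", ".join(select_cols)
    from_block = "FROM " + from_clause
    # Each constraint goes on its own line, joined with 'and'
    where_block = "WHERE " + "\n  and ".join(where_conditions)
    return "\n".join([select_block, from_block, where_block])


query = build_query(
    ["p.objId", "p.ra", "p.dec", "s.z"],
    "galaxy AS p JOIN SpecObj AS s ON s.bestobjid = p.objid",
    ["p.ra between 100 and 250", "s.z between 0.02 and 0.1"],
)
print(query)
```

The query used in this Lab follows the same pattern: a longer SELECT list, a FROM block that joins the `galaxy` and `SpecObj` tables, and several WHERE constraints combined with `and`.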
In this Lab you won’t have to write SQL queries from scratch, only execute commands that are
already written for you.
# a region of sky with 100 < RA < 250, a redshift between 0.02 and 0.1, and a g-band magnitude brighter than 17.
#
# First, store the query in an object called "query"
query="""
SELECT p.objId,p.ra,p.dec,p.petror90_r, p.expAB_r,
p.dered_u as u, p.dered_g as g, p.dered_r as r, p.dered_i as i,
s.z, s.plate, s.mjd, s.fiberid
FROM galaxy AS p
JOIN SpecObj AS s ON s.bestobjid = p.objid
WHERE p.petror90_r > 10
and p.ra between 100 and 250
and s.z between 0.02 and 0.1
and p.g < 17
"""
# Then, query the database. The answer is a table that is being returned to a dataframe that we've named all_gals.
all_gals = CasJobs.executeQuery(query, "dr14")
The dataframe that is returned, which we named all_gals, holds the following quantities (in separate
columns) for each galaxy:
• ra = Right Ascension coordinate in degrees
• dec = Declination coordinate in degrees
• petror90_r = Radius enclosing 90% of the Petrosian flux in arcseconds, i.e., the size of the
galaxy on the sky.
• u, g, r, i (dered_u, dered_g, dered_r, dered_i in the database) = Magnitudes in 4 optical filters,
from the blue to the red, after subtracting the attenuation due to the Milky Way.
• z = Redshift of the galaxy
• plate = Plate number (SDSS used aluminium plates with drilled holes for positioning optical
fibers).
• mjd = Modified Julian Date of the observation
• fiberid = Number of the fiber in a given plate. Plates have between 640 and 1000 fibers.
Each row in the dataframe corresponds to one galaxy.
You can inspect the first 10 elements of your dataframe (i.e., the first 10 galaxies) with:
[3]: all_gals.loc[0:9] # .loc includes both end labels, so 0:9 returns 10 rows
And, for example, access certain galaxies individually:
[4]: all_gals.loc[30]
[4]: 0.225689876893376
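The sketch below makes this indexing behaviour concrete on a small mock dataframe. The values are made up and are not SDSS data. Note that `.loc` slices by label and includes both endpoints, so `.loc[0:10]` returns 11 rows. `.head()` counts rows positionally, and `len()` gives the number of rows, which here is the number of galaxies.

```python
import numpy as np
import pandas as pd

# Mock dataframe with made-up values, standing in for all_gals
mock_gals = pd.DataFrame({
    "ra": np.linspace(100, 250, 40),
    "dec": np.linspace(-5, 60, 40),
    "z": np.linspace(0.02, 0.1, 40),
})

first_ten = mock_gals.head(10)          # first 10 rows, counted positionally
loc_slice = mock_gals.loc[0:10]         # labels 0..10 inclusive, i.e. 11 rows
one_galaxy = mock_gals.loc[30]          # one whole row, returned as a Series
one_redshift = mock_gals.loc[30, "z"]   # a single value from that row
n_galaxies = len(mock_gals)             # number of rows = number of galaxies

print(len(first_ten), len(loc_slice), n_galaxies)  # 10 11 40
```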
2.0.1 Exercise 1:
How many galaxies does your dataframe hold?
[5]: 37951
[5]: 37951
[8]: Text(0.5, 1.0, 'Position of galaxies')
2. Some areas of the diagram are more densely populated with galaxies than others.
2.1.2 Exercise 3:
1) Using the np.where() command, select galaxies in two narrow redshift slices:
• slice 1: 0.02 < z < 0.03 (green)
• slice 2: 0.03 < z < 0.04 (orange)
2) Make the same plot as above, but only using the galaxies in each slice using the suggested
colour scheme (make one plot for each slice).
3) Make a third plot with galaxies from both redshift slices.
Remember to add axis labels, a title and a legend to each plot.
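The sketch below shows what `np.where()` returns for this kind of redshift cut, using a short made-up array in place of `all_gals['z']`. The parentheses around each comparison matter, because `&` binds more tightly than `<` in Python.

```python
import numpy as np

# Made-up redshifts standing in for all_gals['z']
z = np.array([0.021, 0.025, 0.029, 0.031, 0.035, 0.039, 0.05, 0.08])

# np.where(condition) returns a tuple of index arrays; [0] takes the first one
slice_1 = np.where((0.02 < z) & (z < 0.03))[0]
slice_2 = np.where((0.03 < z) & (z < 0.04))[0]

print(slice_1)  # [0 1 2]
print(slice_2)  # [3 4 5]
```

The resulting index arrays can then be passed to `all_gals.loc[...]`, as in the cells below.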
[9]: slice_1 = np.where((0.02<all_gals['z'])&(all_gals['z']<0.03))[0]
slice_2 = np.where((0.03<all_gals['z'])&(all_gals['z']<0.04))[0]
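[11]: plt.figure(figsize=(10,8))
plt.scatter(all_gals.loc[slice_1]['ra'], all_gals.loc[slice_1]['dec'], marker='.', s=1, color='green', label='slice_1')
plt.xlabel('RA [deg]')
plt.ylabel('Dec [deg]')
plt.title('Position of galaxies, 0.02 < z < 0.03')
plt.legend()
plt.figure(figsize=(10,8))
plt.scatter(all_gals.loc[slice_2]['ra'], all_gals.loc[slice_2]['dec'], marker='.', s=1, color='orange', label='slice_2')
plt.xlabel('RA [deg]')
plt.ylabel('Dec [deg]')
plt.title('Position of galaxies, 0.03 < z < 0.04')
plt.legend()
plt.figure(figsize=(10,8))
plt.scatter(all_gals.loc[slice_1]['ra'], all_gals.loc[slice_1]['dec'], marker='.', s=1, color='green', label='slice_1')
plt.scatter(all_gals.loc[slice_2]['ra'], all_gals.loc[slice_2]['dec'], marker='.', s=1, color='orange', label='slice_2')
plt.xlabel('RA [deg]')
plt.ylabel('Dec [deg]')
plt.title('Position of galaxies in both redshift slices')
plt.legend()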
2.1.3 Exercise 4:
Do you see more structure in the distribution of galaxies in each slice, when compared to your first
plot that included all galaxies?
What can you tell about the structure you see in the two different redshift slices?
Why was it harder to see in your first plot, where you included all galaxies?
Slice 1 is more densely populated near the middle of the graph, at a lower RA than slice 2’s most
densely populated area. Slice 2’s positions are more commonly further away, which correlates with
the higher redshift values of slice 2.
the observed colour of galaxies (you will learn this in the later lectures, if you haven’t yet).
In this set of exercises we will focus on the first slice in redshift, which is very narrow, meaning
that all galaxies have a similar redshift. I.e., if galaxies in this redshift slice have different colours,
it ought to be because their spectra and stellar composition are different, and not because some are
redshifted due to the expansion of the Universe.
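This argument can be made quantitative with the standard relation between observed and emitted wavelength. The relation is not written out in the original text, and the ratios below are simple arithmetic on the slice limits used in this Lab.

```latex
\lambda_{\mathrm{obs}} = (1+z)\,\lambda_{\mathrm{emit}}
\quad\Rightarrow\quad
\frac{\lambda_{\mathrm{obs}}(z=0.03)}{\lambda_{\mathrm{obs}}(z=0.02)} = \frac{1.03}{1.02} \approx 1.010,
\qquad
\frac{\lambda_{\mathrm{obs}}(z=0.10)}{\lambda_{\mathrm{obs}}(z=0.02)} = \frac{1.10}{1.02} \approx 1.078
```

Within the first slice, the same spectral feature therefore shifts by only about 1% in wavelength. Across the full 0.02 < z < 0.1 sample, the shift is about 8%.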
The following cell plots a histogram of the values of the u-g colour of the galaxies in your dataframe:
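[12]: slice1 = np.where( (all_gals['z'] > 0.02) & (all_gals['z'] < 0.03))[0]
plt.hist(all_gals.loc[slice1, 'u'] - all_gals.loc[slice1, 'g']) # histogram of u-g colour in the slice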
plt.xlabel('u-g')
plt.ylabel('Number of galaxies')
plt.title('Distribution of u-g color in 0.02 < z < 0.03')
[12]: Text(0.5, 1.0, 'Distribution of u-g color in 0.02 < z < 0.03')
np.percentile() (https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.percentile.html)
allows you to quickly return the percentile of a distribution of points. For example, to find the
median (50th percentile) u-g colour of your galaxy population you can write:
print(median_umg)
1.4351650000000005
i.e., 50% of the galaxies in your sample have u-g colours that are lower than 1.435 (i.e., they are
bluer than the median), and 50% have u-g colours that are larger (i.e., they are redder than the
median). If I wanted to choose only the 10% reddest galaxies I could do:
high_umg
[14]: 1.8633340000000014
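Here is a self-contained sketch of the same two percentile calls on a made-up array of u-g colours. The names `median_umg` and `high_umg` mirror the notebook, but the values are illustrative only. It also shows how the 90th percentile becomes a boolean selection of the reddest 10%.

```python
import numpy as np

# Made-up u-g colours (1.0, 2.0, ..., 10.0), standing in for real galaxy colours
umg = np.arange(1, 11, dtype=float)

median_umg = np.percentile(umg, 50)  # 50th percentile = median
high_umg = np.percentile(umg, 90)    # 90% of the values lie below this

# Boolean mask: True for the reddest 10% (largest u-g)
reddest = umg > high_umg

print(median_umg, high_umg, reddest.sum())
```

With NumPy's default linear interpolation, the percentile can fall between two data values. Here the median is 5.5 and only the value 10.0 lies above the 90th percentile.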
2.2.1 Exercise 5:
Following the example above, use np.percentile() to choose the 25% reddest and 25% bluest
galaxies in u-g. Then plot their positions on the sky. Do both types of galaxies trace the large-scale
structure in a similar way? What can you say about which galaxies preferentially sit in
denser parts of the Universe, and which sit in less dense regions (we call this environment)? For
this exercise it is recommended that you make two plots (one for the red galaxies, and one for the
blue), so that it is easier to compare. You may use as many cells as needed.
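The sketch below shows one way to split a sample into its reddest and bluest quarters with `np.percentile()`. It uses random mock colours rather than SDSS data. The variable names are illustrative, though `very_red` plays the role of `very_red_galaxies` in the next cell.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Made-up u-g colours for 1000 mock galaxies (not SDSS data)
umg = rng.normal(loc=1.4, scale=0.3, size=1000)

# 25th and 75th percentiles of the colour distribution
low_cut, high_cut = np.percentile(umg, [25, 75])

very_red = np.where(umg > high_cut)[0]   # reddest 25%: largest u-g
very_blue = np.where(umg < low_cut)[0]   # bluest 25%: smallest u-g

print(len(very_red), len(very_blue))
```

If `umg` is computed from a subset such as slice 1, these indices refer to positions within that subset, not to rows of `all_gals`. Applying the redshift cut and the colour cut in one `np.where` call over the full dataframe, as in Exercise 7, avoids that mismatch.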
[16]: plt.figure(figsize=(10,8))
plt.scatter(all_gals.loc[very_red_galaxies]['ra'], all_gals.loc[very_red_galaxies]['dec'], marker='.', s=1, color='red', label='reddest 25%')
plt.xlabel('RA [deg]')
plt.ylabel('Dec [deg]')
plt.legend()
plt.title('Position of galaxies')
[16]: Text(0.5, 1.0, 'Position of galaxies')
By now you will have started developing an understanding of how galaxies in general are spatially
distributed in the Universe and the shape of the cosmic web, and how galaxies’ position on the
cosmic web and their environment is related to their colour. Next, we will look at the shape of
galaxies.
def show_galaxy_images(my_galaxies):
    width=200 # width
    height=200 # height
    pixelsize=0.396 # image scale
    plt.figure(figsize=(15, 15)) # display in a 4x4 grid
    subPlotNum = 1
    nGalaxies = 16 #Total number of galaxies to plot
    ind = np.random.randint(0,len(my_galaxies), nGalaxies) #randomly selected rows
    count=0
    for i in ind: # iterate through the randomly selected rows in the DataFrame
        count=count+1
        print('Getting image '+str(count)+' of '+str(nGalaxies)+'...')
        if (count == nGalaxies):
            print('Plotting images...')
        # size the cutout using the selected galaxy, i.e. row my_galaxies[i] (not row i)
        scale=2*all_gals.loc[my_galaxies[i]]['petror90_r']/pixelsize/width
        img = SkyServer.getJpegImgCutout(ra=all_gals.loc[my_galaxies[i]]['ra'], dec=all_gals.loc[my_galaxies[i]]['dec'], width=width, height=height, scale=scale,dataRelease='DR14')
        plt.subplot(4,4,subPlotNum)
        subPlotNum += 1
        plt.imshow(img) # show images in grid
        plt.title(all_gals.loc[my_galaxies[i]]['z'])
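The cutout scale in this function is tied to each galaxy's size. The sketch below works through that arithmetic. It assumes that the `scale` argument of `SkyServer.getJpegImgCutout` is in arcsec per pixel and that 0.396 arcsec is the native SDSS pixel size, as the code's comment suggests. Under those assumptions, the cutout spans 2*petror90_r/0.396 ≈ 5 × petror90_r arcsec, so each galaxy fills a similar fraction of its panel. `cutout_scale` is a hypothetical helper, not part of SciServer.

```python
def cutout_scale(petror90_r, width=200, pixelsize=0.396):
    """Return (scale, field_of_view) following the formula in show_galaxy_images.

    Assumes scale is interpreted in arcsec per pixel, so the cutout spans
    width * scale arcsec on a side.
    """
    scale = 2 * petror90_r / pixelsize / width
    field_of_view = width * scale  # arcsec; equals 2 * petror90_r / pixelsize
    return scale, field_of_view


# A galaxy with petror90_r = 10 arcsec, the minimum size allowed by the query
scale, fov = cutout_scale(10.0)
print(round(scale, 4), round(fov, 2), round(fov / 10.0, 2))
```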
You can use the function defined above to plot 16 random galaxies from any dataframe. For
example, to plot 16 galaxies randomly selected in a redshift slice 0.02 < z < 0.03 you might do:
[20]: my_galaxies = np.where( (all_gals['z'] > 0.02) & (all_gals['z'] < 0.03))[0]
print(my_galaxies)
show_galaxy_images(my_galaxies)
Getting image 14 of 16…
Getting image 15 of 16…
Getting image 16 of 16…
Plotting images…
2.3.1 Exercise 6:
Compute the fraction of galaxies you’d classify as having disks, and the fraction of galaxies you’d
classify as being smooth ellipsoids. If you want to improve your statistics, you can rerun the cell
above and you will get 16 different galaxies every time…
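One caveat: `np.random.randint` draws with replacement, so the same galaxy can occasionally appear twice among the 16. The sketch below draws distinct rows instead, using NumPy's `Generator.choice` with `replace=False`. This is a swapped-in alternative, not what the function above does, and `my_galaxies` here is a stand-in index array.

```python
import numpy as np

rng = np.random.default_rng()

# Stand-in for the index array returned by np.where in the cells above
my_galaxies = np.arange(100, 400)

# 16 distinct positions in my_galaxies: no repeats within a single draw
ind = rng.choice(len(my_galaxies), size=16, replace=False)
selected_rows = my_galaxies[ind]

print(selected_rows)
```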
[ ]: I would say galaxies 2,3,4,5,7,8,11,13,15,16 have disks.
Galaxies 1,6,9,10,14 are smooth ellipsoids, and I am undecided about galaxy 12 because of its size.
That gives a disk fraction of 10/16 and a smooth-ellipsoid fraction of 5/16.
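The counts in this answer (10 disks, 5 smooth, 1 unclear out of 16) can be turned into fractions with a rough uncertainty. The sketch below uses the simple binomial standard error sqrt(p(1-p)/n). That is an assumed, common choice rather than one the Lab prescribes, but it shows why rerunning with more galaxies improves the statistics.

```python
import math

n_total = 16
n_disk = 10     # galaxies 2,3,4,5,7,8,11,13,15,16
n_smooth = 5    # galaxies 1,6,9,10,14
n_unclear = 1   # galaxy 12


def fraction_with_error(k, n):
    """Fraction k/n and its binomial standard error sqrt(p(1-p)/n)."""
    p = k / n
    return p, math.sqrt(p * (1 - p) / n)


f_disk, err_disk = fraction_with_error(n_disk, n_total)
f_smooth, err_smooth = fraction_with_error(n_smooth, n_total)
print(f"disks:  {f_disk:.3f} +/- {err_disk:.3f}")
print(f"smooth: {f_smooth:.3f} +/- {err_smooth:.3f}")
```

With only 16 galaxies, the error on each fraction is about 0.12. Because the error scales as 1/sqrt(n), quadrupling the sample would roughly halve it.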
Answer here (double-click to edit):
2.3.2 Exercise 7:
Now starting from the code given in the example above (copy it and paste it into the cell below),
do the same thing but taking 16 random galaxies that are red, according to your earlier definition
of red and blue. Once again, classify the galaxies as disks or ellipticals. Note, after copying and
pasting, you only need to change the first line, that defines my_galaxies.
[21]: my_galaxies = np.where((all_gals['z'] > 0.02) & (all_gals['z'] < 0.03) &␣
,→(all_gals['u']-all_gals['g'] > high_umg))[0]
print(my_galaxies)
show_galaxy_images(my_galaxies)
[ ]: disk : 5/16
elliptical: 11/16
2.3.3 Exercise 8:
Repeat the above exercise, now with blue galaxies, and classify them in the same way.
[23]: my_galaxies = np.where((all_gals['z'] > 0.02) & (all_gals['z'] < 0.03) &␣
,→(all_gals['u']-all_gals['g'] < high_umg))[0]
print(my_galaxies)
show_galaxy_images(my_galaxies)
Getting image 2 of 16…
Getting image 3 of 16…
Getting image 4 of 16…
Getting image 5 of 16…
Getting image 6 of 16…
Getting image 7 of 16…
Getting image 8 of 16…
Getting image 9 of 16…
Getting image 10 of 16…
Getting image 11 of 16…
Getting image 12 of 16…
Getting image 13 of 16…
Getting image 14 of 16…
Getting image 15 of 16…
Getting image 16 of 16…
Plotting images…
[ ]: elliptical: 3/16
spiral: 13/16
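The sketch below compares the red and blue samples numerically, using the counts recorded in Exercises 7 and 8. It treats "spiral" in Exercise 8 and "disk" in Exercise 7 as the same class, and it assumes independent samples with simple binomial errors. Both are simplifying assumptions, and 16 galaxies per sample is a small number.

```python
import math


def fraction_with_error(k, n):
    """Fraction k/n with binomial standard error sqrt(p(1-p)/n)."""
    p = k / n
    return p, math.sqrt(p * (1 - p) / n)


# Counts from the classifications above (16 galaxies per sample)
f_red, err_red = fraction_with_error(5, 16)     # red sample: disks
f_blue, err_blue = fraction_with_error(13, 16)  # blue sample: spirals

difference = f_blue - f_red
# Errors of independent samples add in quadrature
err_difference = math.sqrt(err_red**2 + err_blue**2)

print(f"disk fraction, red:  {f_red:.3f} +/- {err_red:.3f}")
print(f"disk fraction, blue: {f_blue:.3f} +/- {err_blue:.3f}")
print(f"difference: {difference:.2f} +/- {err_difference:.2f}")
```

The difference of 0.5 is about three times its rough uncertainty of 0.15. Under these assumptions, the colour-morphology contrast is unlikely to be only a small-number fluctuation.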
2.3.4 Exercise 9:
From the above exercise, what can you say - if anything - about the relationship between colour
and morphology?
Answer here (double click to edit):
[ ]: Bluer galaxies are more likely to contain disks and be spiral-like.
Redder galaxies are less likely to contain disks and are more often classified as ellipsoids.
Congratulations, that is the end of the Lab! Make sure you’ve run all the code cells,
filled in all the text answers and that your plots are all showing without error. Print
to PDF, and submit to Moodle by the deadline. This account on SciServer is yours to keep,
and you’re welcome to explore further at any time. If you do, and you ever need some guidance, I
would be more than happy to help.