to Seismology Students
by John M. Aiken, Chastity Aiken, and Fabrice Cotton

ABSTRACT

Python is at the forefront of scientific computation for seismologists and therefore should be introduced to students interested in becoming seismologists. On its own, Python is open source and well designed with extensive libraries. However, Python code can also be executed, visualized, and communicated to others with “Jupyter Notebooks”. Thus, Jupyter Notebooks are ideal for teaching students Python and scientific computation. In this article, we designed an openly available Python library and collection of Jupyter Notebooks based on defined scientific computation learning goals for seismology students. The Notebooks cover topics from an introduction to Python to organizing data, earthquake catalog statistics, linear regression, and making maps. Our Python library and collection of Jupyter Notebooks are meant to be used as course materials for an upper-division data analysis course in an Earth Science Department, and the materials were tested in a Probabilistic Seismic Hazard course. However, seismologists or anyone else who is interested in Python for data analysis and map making can use these materials.

INTRODUCTION

Computers have transformed the way seismologists conduct research. Tasks such as numerical modeling, data analysis, and statistics are all performed by seismologists on computers. It is practically impossible today for seismologists to process earthquake data without a computer because most data are digital. Seismologists use computers to visualize data, analyze waveforms, pick phase arrival times, locate earthquakes, and investigate earthquake catalog statistics, to name just a few examples. Many seismologists today accomplish these tasks using the Python programming language. However, in many cases, our courses do not instruct students to use the same modern tools that seismologists use. Understanding computational tools and techniques that seismologists use is not only important for training future seismologists but also teaches students new problem-solving skills and a new representation of the problems themselves. Thus, computation and computational thinking are vital skills for future scientists.

Computation and computational thinking afford students the ability to solve and explore methods in an iterative and exploratory way (Wing, 2006). Computational thinking is the process used to translate a problem into appropriate procedures for a computer to solve it. Often, this involves writing code. Through writing codes to solve physics problems, students are able to generalize models in a way that is inaccessible to analytical methods (Caballero et al., 2014). Complicated problems can be solved simply by students writing codes (e.g., statistical analysis of very large earthquake catalogs, modeling mantle dynamics). Common errors in coding can be generalized to conceptual issues students have with the concepts being taught in the course, thus aiding instruction (Caballero et al., 2012). However, for beginner students it is often important to provide the appropriate scaffolding in order for them to be successful (Vygotsky, 1980).

Scaffolded code consists of code building blocks that are provided to students, who then complete them. This is quite similar to the way scientists actually code. In most cases, scientists do not start writing new code from scratch to perform an analysis. Often, they are using older codes maintained by colleagues or predecessors. Scaffolding code for students is simply translating this practice into the classroom.

Why choose Python? Python is an easy language to read that is popular in modern seismology. Python is also free for anyone to install and use on any computer. Several widely used seismology libraries are written in Python (e.g., ObsPy; Beyreuther et al., 2010), and libraries such as Pandas, Numpy, and SciPy provide simple yet powerful statistical and data manipulation tools (McKinney, 2010; van der Walt et al., 2011). A textbook on computational seismology has also been written that uses Python and is designed for teaching a graduate level seismology course (Igel, 2017). Python and Jupyter Notebooks have been chosen by the Global Earthquake Model foundation for analysis and dissemination of results (Pagani et al., 2014). Thus, Python is becoming a reference language for seismology and seismic hazard.

Additionally, the Anaconda Python distribution allows for easy install on all systems. Anaconda is a Python distribution built for conducting science. It is an inclusive installation of Python that includes common scientific libraries (Numpy, Scipy, Matplotlib, and Pandas) and a package and virtual environment manager (conda). Moreover, Anaconda is recommended
doi: 10.1785/0220170246 Seismological Research Letters Volume 89, Number 3 May/June 2018 1165
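As a concrete illustration of scaffolded code, the sketch below shows the kind of cell an instructor might hand out: the function signature, docstring, and input check are provided, and the lines between the STUDENT markers are what a student would be asked to write. It is not taken from the Notebooks or the utilities package. The function name estimate_b_value, its parameters, and the sample magnitudes are hypothetical, and the body assumes the standard maximum-likelihood estimator with a half-bin correction, b = log10(e) / (mean(M) - (Mc - dM/2)), as one common choice for a Gutenberg–Richter exercise.

```python
import math


def estimate_b_value(magnitudes, completeness_mag, bin_width=0.1):
    """Maximum-likelihood Gutenberg-Richter b-value (hypothetical helper).

    magnitudes       : iterable of event magnitudes
    completeness_mag : magnitude of completeness Mc (assumed known)
    bin_width        : magnitude binning dM of the catalog (assumed 0.1)
    """
    # Provided by the instructor: keep only events at or above Mc.
    selected = [m for m in magnitudes if m >= completeness_mag]
    if len(selected) < 2:
        raise ValueError("need at least two events above the completeness magnitude")

    # --- STUDENT: compute the mean magnitude and the b-value ---
    mean_mag = sum(selected) / len(selected)
    denominator = mean_mag - (completeness_mag - bin_width / 2.0)
    b_value = math.log10(math.e) / denominator
    # --- END STUDENT ---

    return b_value


# Made-up magnitudes for illustration only, not real catalog data.
sample = [2.0, 2.1, 2.3, 2.0, 2.6, 3.1, 2.2, 2.4, 2.0, 3.5]
b = estimate_b_value(sample, completeness_mag=2.0)
print(f"b-value: {b:.2f}")
```

In a classroom version, the lines between the STUDENT markers would be left blank, so that a student who gets the formula wrong sees a wrong or failing result in that cell alone and can correct it without touching the rest of the code.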
available at GitHub website (see Data and Resources). The primary purpose of our Notebooks is to teach Python to students with little to no experience with using programming languages. The Notebooks are useful for teaching students how to program because the programming concepts can be separated into unique, executable cells, that is, scaffolded (Fig. 1). Once installed, students can execute the Notebooks so they can visualize the process or make changes to the code themselves. With the concepts and code divided into cells, students learn how to program in chunks that show the content and structure and why codes sometimes do not work. This makes learning programming more accessible to students who may otherwise struggle.

Most online tutorials demonstrate how to use programming languages the correct way with little help when code fails, but learning often comes from failure. At times there are more complicated errors, and there are resources for answering difficult questions about code failure (e.g., Stack Overflow). However, in some of our Notebooks we model commonly made coding errors to demonstrate how programming languages sometimes fail and how to correct the errors. Figure 1 is a simple example of scaffolded code illustrating code failure, but there are other more complicated code failure points such as in the “Introduction to reading data and plotting using Pandas” Notebook, which introduces students to errors that can occur when importing data (Fig. 2). The “Introduction to plotting data as a heat map” Notebook illustrates the steps that we went through to make the exercise. The process of making the exercise involved reading data, plotting it, not seeing the result, and then repeating the process until the data were visualized correctly. We did it this way instead of having every step working because we hope it shows that the scientific process is one of struggle and continuous improvement. Other Notebooks have scaffolded code that builds upon itself. For example, the “Creating a map in cartopy and plotting data on it” Notebook first illustrates how to make a basic map and then reuses the same code to add earthquake data on top of it. In this way, the map-making Notebook is a demonstration of increasing code complexity, that is, we want students to see how the map changes with each additional step.

There are several Notebooks designed to encourage exploratory data analysis (Table 1). These Notebooks include teachable moments that stimulate students' thinking about the data they are engaging with and what it is telling them. For example, the “Introduction to scatter plots and histograms” Notebook asks students to consider if the number of stations used to locate events correlates with their magnitudes (Fig. 3). It also teaches students basic Gutenberg–Richter analysis and linear regression, applicable for a course teaching statistical seismology. This latter point is not shown in Figure 3. However, it can be found in the notebook itself.

After students become accustomed to data analysis and handling, Notebooks in the basic figure making and map-making learning goals can be used to demonstrate and visualize data (Table 1). Basic figure making Notebooks illustrate how to make simple figures using real earthquake data with the Pandas module without map projection. Once students have mastered basic figure making, they can be introduced to map-making
Notebooks where projection is utilized. Map-making Notebooks illustrate adding features to maps, adding earthquake data to maps (including focal mechanisms), importing satellite imagery and shaded relief, annotating maps, and plotting gridded and nongridded data as heat maps with and without interpolations. There are detailed discussions about the effects of interpolation when gridding data.

For the map-making Notebooks, there are currently two versions: one that uses the Python library Basemap and one that uses the Python library Cartopy. The Basemap and Cartopy map-making Notebooks use similar data sets and provide similar results so there are no differences in content. We made two map-making versions of the Notebooks for two reasons: (1) to provide instructors an option based on their familiarity with map making in Python, and (2) because Basemap will be sunset in 2020 and will no longer exist in a usable form. Future map-making options could include the Generic Mapping Tools (GMT) Python bindings when they are released and stable (see Data and Resources).

Utilities Package

Along with the Notebooks provided as examples, an additional package called “utilities” is provided. This library focuses more closely on analyzing earthquake catalogs, providing many usable functions for tasks such as converting timestamps to Python time objects, calculating different types of distances, calculating Gutenberg–Richter b-values, bootstrapping, making volumetric selections of data, and calculating parameter sweeps of statistics. For example, the function get_node_data returns the data within a given radius of a given center longitude and latitude. It is offered as is and is not intended to replace tools such as ZMAP (Wiemer, 2001) but rather as a jumping-off point for seismologists using Python to explore earthquake catalogs. This package also exists outside of the learning goals structure the example notebooks rely on and can be thought of as advanced topics.

Course Use

The Notebooks were used in a one-semester course at the University of Potsdam, Germany. Nine Master’s level students took an elective course in probabilistic seismic hazard analysis (PSHA), most of whom had little to no prior programming experience. These Notebooks served as the basis for a semester-long project where students would calculate the hazard curve for a megacity region they selected. Students were taught 3 hrs per week for 15 weeks. During the first 8 weeks, the first 90 min of class were spent introducing students to concepts of PSHA and statistics. The second 90 min block was used
for students to apply the concepts. A typical second half of class would consist of a brief lecture on the computational concepts of that day’s Notebooks (e.g., how and why you should grid data). Then, students were given time to explore the Notebooks. Typically, a live coding session would happen as well (Rubin, 2013). In these sessions, the instructor would project their own blank notebook on the screen and give the students a task (e.g., graph a sine wave). The students then told the instructor what to type, line by line, to complete the given task. All of the Notebooks were designed to teach students general skills in Python programming and data analysis. The students spent the last four weeks working full time on their PSHA megacities projects. Class time was dedicated to students either working on projects, asking questions about projects and giving short presentations about their projects, or getting help with code. The ultimate results of the course were conference-style student talks and posters about their assessment of hazard in megacities.

Resources). The cartopy_environment.yml file, which can be found in the repository, can be used to create a virtual environment where Notebooks can be executed using the Cartopy module. To create the virtual environment from the cartopy_environment.yml file, students can follow this guide for creating a virtual environment (see Data and Resources). The Basemap and Cartopy modules do not play well together. If you plan to use both, you will need to create separate virtual environments. For this reason, we also provide a basemap_environment.yml file for creating a Basemap-friendly virtual environment separate from Cartopy. To test if an environment works, run the associated Notebook using the respective mapping environment (i.e., change the kernel) and execute each cell (e.g., Python Course Materials for Seismology Students; see Data and Resources). If it works, you are ready to start!

SUMMARY
Python. Although these Notebooks were written for students, ideally anyone who is interested in learning Python can use these Notebooks. As a test of usability, the Notebooks were introduced in a Master’s level Probabilistic Seismic Hazard Analysis course, where most students had little to no previous programming experience. These notebooks are designed to sit between advanced course studies (such as Igel, 2017) and introductory courses, offering students with no background in programming exposure to map making and statistics using Python.

For all intents and purposes, this collection of codes and Jupyter Notebooks is complete. However, we do want to develop more Notebooks for specific courses or allow more flexibility to teachers given their familiarity with different tools. Future updates will include map making via a GMT-based Python library when the toolkit becomes more stable (see Data and Resources). Also, we would like to develop a module based on seismic-waveform data analysis via ObsPy that would be useful for an upper division undergraduate or graduate-level course in observational seismology. Finally, we would like to offer more examples of using Python to fetch seismic data from sources such as the Incorporated Research Institutions for Seismology and the U.S. Geological Survey. Contributions would certainly be welcome, as would feedback from both students and instructors on integrating the materials into a course.

DATA AND RESOURCES

The Jupyter Notebooks presented in this article use openly available data from a variety of sources. These data sources are provided in the Notebooks, but we state them again here. Earthquake catalogs were obtained from the Advanced National Seismic System (ANSS) via http://www.ncedc.org/anss/catalog-search.html (last accessed October 2017) and the Southern California Earthquake Center (SCEC) via http://service.scedc.caltech.edu/eq-catalogs/date_mag_loc.php (last accessed October 2017). Focal mechanisms were obtained from the Global Centroid Moment Tensor (CMT) catalog via http://www.globalcmt.org/CMTsearch.html (last accessed October 2017). Peak ground acceleration data are available from the National Aeronautics and Space Administration (NASA) with registration via http://sedac.ciesin.columbia.edu/data/set/ndh-earthquake-distribution-peak-ground-acceleration/data-download#close (last accessed October 2017). We also
The authors would like to thank Emily Wolin for her constructive feedback. Part of this work was supported by the Seismology and Earthquake Engineering Research Infrastructure Alliance for Europe (SERA) project funded by the EU Horizon 2020 programme under Grant Agreement Number 730900.

REFERENCES

Beyreuther, M., R. Barsch, L. Krischer, T. Megies, Y. Behr, and J. Wassermann (2010). ObsPy: A Python toolbox for seismology, Seismol. Res. Lett. 81, no. 3, 530–533.
Caballero, M. D., J. B. Burk, J. M. Aiken, B. D. Thoms, S. S. Douglas, E. M. Scanlon, and M. F. Schatz (2014). Integrating numerical computation into the modeling instruction curriculum, Phys. Teach. 52, no. 1, 38–42.
Caballero, M. D., M. A. Kohlmyer, and M. F. Schatz (2012). Fostering computational thinking in introductory mechanics, AIP Conf. Proceedings, Vol. 1413, 15–18.
Igel, H. (2017). Computational Seismology: A Practical Introduction, Oxford University Press, London, United Kingdom.
McKinney, W. (2010). Data structures for statistical computing in Python, Proc. of the 9th Python in Science Conference, 51–56.
Pagani, M., D. Monelli, G. Weatherill, L. Danciu, H. Crowley, V. Silva, P. Henshaw, L. Butler, M. Nastasi, L. Panzeri, and M. Simionato (2014).

Chastity Aiken²
Department of Marine Geosciences - LAD
Ifremer
ZI de la pointe du Diable
CS 10070
29280 Plouzané, France
chastity.aiken@ifremer.fr

Fabrice Cotton³
GFZ German Research Institute for Geosciences
Telegrafenberg, 14473 Potsdam, Germany
fcotton@gfz-potsdam.de

Published Online 14 March 2018

¹ Also at GFZ German Research Centre for Geosciences, Telegrafenberg, 14473 Potsdam, Germany.
² Also at The University of Texas at Austin, Institute for Geophysics, 10601 Exploration Way, Austin, Texas 78758 U.S.A.
³ Institute for Earth and Environmental Sciences, University of Potsdam, Karl-Liebknecht-Straße 24/25, 14476 Potsdam, Germany.