1099095x, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/env.2788 by University Of Victoria Mearns, Wiley Online Library on [22/02/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Received: 10 January 2023 Accepted: 11 January 2023

DOI: 10.1002/env.2788

EDITORIAL

Environmental data science: Part 2

Wesley S. Burr¹, Nathaniel K. Newlands², Andrew Zammit-Mangion³

¹ Department of Mathematics, Trent University, Ontario, Canada
² Agriculture and Agri-Food Canada, Summerland Research and Development Centre, British Columbia, Canada
³ School of Mathematics and Applied Statistics, University of Wollongong, New South Wales, Australia

Correspondence
Wesley S. Burr, Department of Mathematics, Trent University, Ontario, Canada L1J 5Y1.
Email: wesleyburr@trentu.ca

Summary
Environmental data science is a multi-disciplinary and mature field of research at the interface of statistics, machine learning, information technology, climate and environmental science. The two-part special issue ‘Environmental Data Science’ comprises a set of research articles and opinion pieces led by statisticians who are at the forefront of the field. This editorial identifies and discusses common research themes that appear in the contributions to Part 2, which focuses on applications. These include spatio-temporal modeling; the problem of aggregation and sparse sampling; the importance of community-building and training for the next generation of specialists in environmental data science; and the need to look forward at the challenges that lie ahead for the discipline. This editorial complements that of Part 1, which largely focuses on statistical methodology; see Zammit-Mangion, Newlands, and Burr (2023).

KEYWORDS
applications, community, spatial, spatio-temporal, training, uncertainty

INTRODUCTION

Methodological and technological advances in the last two decades have seen analyses and forecasts in the environmen-
tal sciences harness the increased availability of data, and have led to the multi-faceted, interdisciplinary field which we
refer to as environmental data science (EDS). EDS considers every aspect of a workflow and value-chain involving envi-
ronmental data, from the moment data are collected and stored through to the stage at which the data are used to support
decision-making.
An EDS workflow often involves developing and applying statistical models and frameworks to answer scientific
questions using data, and this is an area where statisticians have contributed substantive advances. Part 2 of the special
issue is a recognition of these developments, as well as a further contribution to the field through eight research articles
that showcase applications of EDS. Part 2 also contains four opinion pieces by expert practitioners in the field that offer
perspectives and insights on various aspects of EDS, including the critical question of training and community-building.
This editorial focuses on the core themes discussed in Part 2 of the special issue. Part 1 of the special issue comprises
an additional nine articles and four opinion pieces, which we discuss in a separate editorial; see Zammit-Mangion
et al. (2023).

Reproduced with the permission of the Minister of Agriculture and Agri-Food Canada.

© 2023 His Majesty the King in Right of Canada and The Authors. Environmetrics © 2023 John Wiley & Sons Ltd.

Environmetrics. 2023;e2788. wileyonlinelibrary.com/journal/env 1 of 4


https://doi.org/10.1002/env.2788

APPLICATION AND DEVELOPMENT OF SPATIO-TEMPORAL MODELS

As noted in our editorial for Part 1 of the special issue (Zammit-Mangion et al. 2023), environmental data analyses are often
concerned with processes evolving in space and/or time, and therefore make extensive use of spatial or spatio-temporal
models. Most of the contributions to Part 2 of the special issue develop and apply such models: Yan, Cantoni, Field,
Treble, and Mills Flemming (2022) consider a spatio-temporal application in fisheries science that involves estimat-
ing the maturity of fish stock; Nie, Wang, and Cao (2022) apply functional data analysis to the problem of sub-region
estimation for daily bike-share rentals; Laroche, Olteanu, and Rossi (2022) examine irregularly sampled left-censored pes-
ticide concentration data from France, developing new methodology for modeling spatio-temporal heterogeneity; while
Mukherjee, Bagozzi, and Chatterjee (2022) use spatio-temporal fields to model climate and social instability interac-
tions, as a framework for studying conflict. Several contributions also consider the problem of spatial/spatio-temporal
interpolation or emulation: Granville, Woolford, Dean, Boychuk, and McFayden (2022) tackle the problem of interpo-
lating spatial data for generating a fire index for wildfires in Ontario, Canada, while Cartwright, Zammit-Mangion, and
Deutscher (2022) develop a spatio-temporal emulator based on convolutional variational autoencoders. Several con-
tributed opinion pieces also expand on the challenges in this area: Scott (2022) discusses the ‘digital earth’ concept and
the challenges of spatially or temporally sparse data; Blair and Henrys (2022) consider the idea of ‘digital twins’ for mak-
ing sense of complex, heterogeneous spatio-temporal data; and Sain (2022) discusses data science and risk quantification
in a complex environment.

SAMPLING AND AGGREGATION


Spatio-temporal analyses often involve dealing with data which are recorded on differing time scales, spatial scales, or
both. Jahid et al. (2022) examine this problem in the context of animal tagging and abundance estimation for grizzly
bears in Alberta, Canada; Yan et al. (2022) consider aggregation for fisheries stocks in Atlantic Canada; and Laroche
et al. (2022) deal with aggregation when examining censored pesticide data. The more methodological side of this problem
is considered by Roth et al. (2022) for calibration methods of flood hazard projections, when integrating model outputs
with differing resolutions. The opinion pieces of Scott (2022) and Blair and Henrys (2022) also consider this issue in their
discussions of the ‘digital earth’ and ‘digital twin’ concepts.

COMMUNITY-BUILDING AND TRAINING

One common complaint amongst industry professionals is, and has been for several years, the lack of trained and quali-
fied staff capable of handling the deluge of data generated by modern instrumentation and observational hardware. The
response to this issue is varied, and extends from academic programs, which initially train graduates to work in the field,
to communities of practitioners capable of encouraging ‘life-long learning’ and renewed skill development among profes-
sionals working in the field. de Silva (2022) examines the intersection between these professionals and the R community
in Latin America, emphasizing the need for diverse and local community building. With regard to training, a number
of graduate programs have recently been developed and launched, from the Masters in Data Science
program at University of British Columbia to the Masters in Environmental Data Science program at the University of
California, Santa Barbara. In addition, the geospatial community, often directly entwined with the ‘spatial’ aspect of EDS,
has risen to the challenge; for example, new programs on geospatial data science and on data science for energy and
environmental research have been launched at the University of Michigan and the University of Chicago, respectively.
As discussed by Scott (2022), there is an opportunity for academics in the field of statistics to ensure that these programs
are grounding their graduates in strong, foundational thinking. Graduates who are computationally ready to tackle large
data in the environmental realm also need to be statistically prepared to consider the problems of sampling, design, and
bias; these are topics that are core to the field of statistics. Governments are also helping to build and strengthen data
science communities, both internally within the public service and externally with industry, academia, and citizens.
For example, Statistics Canada has launched a Data Science Network for government, industry, academia, and citizens
to join, learn, and share knowledge and insights.1

1 Data Science Network for the Federal Public Service (of Canada) (FSNDS): https://www.statcan.gc.ca/en/data-science/network/about

LOOKING FORWARD

More application areas in EDS will continue to emerge as complex models and computing become increasingly
accessible, and as it becomes increasingly clear that EDS plays a pivotal role in tackling and mitigating the effects of climate
change. Scott (2022) looks forward and backwards, noting that while the evolution in the data landscape is exciting, chal-
lenges of data assurance continue to plague our field, and that expertise in design and sampling is needed now more
than ever. Reproducibility and responsible workflow are of growing importance (Parashar, Heroux, & Stodden 2022)
due to the rapid and constant evolution of technology and tools, and due to the increased awareness of the importance
of best research practice. There is also a need for vetted and inclusive curricular materials for multidisciplinary com-
munication and comprehension (Danyluk et al. 2021; Horton, Alexander, Parkers, Piekut, & Rundel 2022). Sain (2022)
reminds us that the toolbox for modeling and analysis is growing rapidly, and that there are opportunities and challenges
in thoughtfully incorporating methodologies into EDS, including those concerned with the quantification of risk. The focus
of Blair and Henrys (2022) on the interrelationships between process and data models, and the complexity of the ‘arrows’
that join them, is a timely reminder for practitioners in EDS to consider the connections between the models, the data,
and the world they are all based on.

CONCLUSION

The special issue brings together global leaders in the theory and application of EDS, and provides a glimpse of the
contributions the field of statistics is making to this important area of research. The special issue also features contri-
butions from a number of junior scholars as lead authors: it is heartening to see an up-and-coming new generation
of talented scholars tackling problems in this field. The vast array of topics in the published works is enlightening,
and a reflection of how multi-faceted and interdisciplinary the field of EDS is. Part 2 of the special issue clearly
shows that there are dedicated scientists working in the fields of environmental chemistry, fisheries science, wildfire
science, and climate and environmental science, who can benefit from statisticians and their toolsets, their way of
thinking, and the models that they have spent decades developing. Collaboration is a wonderful tool for building up
science in a thoughtful, supported way, and it is encouraging to see so much happening in the pages of this special
issue.

ACKNOWLEDGEMENTS
The guest editors would like to thank Environmetrics Editor-in-Chief Prof. Alessandro Fassò for the opportunity to organ-
ise this special issue, the Environmetrics Wiley team for handling submissions and publication, and the contributors for
their submissions. We would also like to thank all the reviewers (more than 35 at our last count) without whom this issue
would not have been possible. The response we had to the special issue was overwhelmingly positive, so much so that
the contributions are organized into two parts (with the first discussed by Zammit-Mangion et al. 2023); we thank the
broader environmetrics community for their support in making this special issue a reality. Wesley S. Burr also
acknowledges the support of the (Canadian) Natural Sciences and Engineering Research Council Discovery Grant DG-2017-04741;
Nathaniel K. Newlands acknowledges the support of the Canadian Agricultural Partnership (CAP) Program, Agriculture
and Agri-Food Canada; and Andrew Zammit-Mangion acknowledges the support of the Australian Research Council
Discovery Early Career Researcher Award DE180100203.

REFERENCES
Blair, G., & Henrys, P. A. (2022). The role of data science in environmental digital twins: In praise of the arrows. Environmetrics, e2789.
Cartwright, L., Zammit-Mangion, A., & Deutscher, N. M. (2022). Emulation of greenhouse-gas sensitivities using variational autoencoders.
Environmetrics, e2754.
Danyluk, A., Leidig, P., McGettrick, A., Cassel, L., Doyle, M., Servin, C., … Stefik, A. (2021). Computing competencies for undergraduate data
science programs: an ACM task force final report. In SIGCSE 2021 - Proceedings of the 52nd ACM Technical Symposium on Computer Science
Education (pp. 1119–1120). Association for Computing Machinery, New York, NY. https://doi.org/10.1145/3408877.3432586
de Silva, N. (2022). Intersection between environmental data science and the R community in Latin America. Environmetrics, e2731.
Granville, K., Woolford, D. G., Dean, C. B., Boychuk, D., & McFayden, C. B. (2022). On the selection of an interpolation method with an
application to the Fire Weather Index in Ontario, Canada. Environmetrics, e2758.
Horton, N. J., Alexander, R., Parkers, M., Piekut, A., & Rundel, C. (2022). The growing importance of reproducibility and responsible workflow
in the data science and statistics curriculum. Journal of Statistics and Data Science Education, 30, 207–208.

Jahid, M., Steeves, H. N., Fisher, J. T., Bonner, S. J., Muthukumarana, S., & Cowen, L. L. (2022). Shooting for abundance: Comparing integrated
multi-sampling models for camera trap and hair trap data. Environmetrics, e2761.
Laroche, C., Olteanu, M., & Rossi, F. (2022). Pesticide concentration monitoring: Investigating spatio-temporal patterns in left censored data.
Environmetrics, e2756.
Mukherjee, U. K., Bagozzi, B. E., & Chatterjee, S. (2022). A Bayesian framework for studying climate anomalies and social conflicts.
Environmetrics, e2778.
Nie, Y., Wang, L., & Cao, J. (2022). Estimating functional single index models with compact support. Environmetrics, e2784.
Parashar, M., Heroux, M. A., & Stodden, V. (2022). Research reproducibility. Computer, 55, 16–18.
Roth, S. M., Seiyon Lee, B., Sharma, S., Hosseini-Shakib, I., Keller, K., & Haran, M. (2022). Flood hazard model calibration using multiresolution
model output. Environmetrics, e2769.
Sain, S. R. (2022). Data science and climate risk analytics. Environmetrics, e2749.
Scott, M. (2022). Framing data science, analytics and statistics around the digital earth concept. Environmetrics, e2732.
Yan, Y., Cantoni, E., Field, C., Treble, M., & Mills Flemming, J. (2022). Spatiotemporal modeling of mature-at-length data using a sliding
window approach. Environmetrics, e2759.
Zammit-Mangion, A., Newlands, N. K., & Burr, W. S. (2023). Environmental data science: Part 1. Environmetrics, 34, e2787.

How to cite this article: Burr, W. S., Newlands, N. K., & Zammit-Mangion, A. (2023). Environmental data
science: Part 2. Environmetrics, e2788. https://doi.org/10.1002/env.2788
