
Is the Hydrology Model Reproducible?

By: Wahdan Achmad Syaehuddin (1053099)

1. Introduction

The growth of social communication tools, such as social media platforms, has had a profound impact on the distribution of information within society, leading to a significant increase in the flow of information. Science is inevitably affected by this growth in social interaction. Thus, the way scientists communicate and disseminate the results of experiments and theories needs to adapt to this environment. Social media has become a double-edged sword for science. It makes communicating science to the general public easier and more effective, but it also exposes science to greater public scrutiny. The climate change debate is one example of how social media can polarize discussion of a topic. Williams et al. (2015) discussed how Twitter became an echo chamber for two groups, which they identified as ‘activists’ and ‘skeptics’. Numerous individual experts and panels such as the Intergovernmental Panel on Climate Change (IPCC) have confirmed the rapid changes in climate since the Industrial Revolution. Despite this, a persistent attitude of denial remains prevalent, especially in the ‘skeptics’ group.

The ‘skeptics’ group frequently invokes epistemic uncertainty in climate change projections to challenge IPCC findings (Leuschner, 2015). This uncertainty can be related to reproducibility problems in scientific practice. For example, climate projections differ from model to model. Sometimes we cannot explain why, because we often work with black-box models. I will return to uncertainty and black-box models later in this paper. First, let us discuss the reproducibility problem.

The term "replication crisis" emerged in the 2010s in response to disappointing outcomes of reproducibility attempts in the medical, life, and behavioral sciences, such as psychology (Pashler and Harris, 2012; Stanford Encyclopedia of Philosophy, 2021). This crisis has been attributed to the “generation of new data and scientific publications at an unprecedented rate”. This pace leads to the “desperation to publish or perish” and to a failure to systematically follow good scientific practice (Begley and Ioannidis, 2015). Reproducibility matters in science because it bears directly on the credibility of scientific findings. It has even been argued that reproducibility is what distinguishes science from other forms of knowledge (Stanford Encyclopedia of Philosophy, 2021). Reproduction attempts can be seen as a means of evaluating scientific theories by subjecting them to scrutiny and possible falsification to determine their validity.

2. Reproducibility problem in computational hydrology modeling

Earth and environmental sciences are inherently bound by epistemic uncertainty because our planet's systems are so complex. Much about the Earth, environmental, and climate systems remains unknown. This has consequences for scientific practice in these fields, and one of them is the difficulty of achieving reproducibility. Take hydrological modeling as an example. A study by Stagge et al. (2019) of several water resources and hydrology journals estimated, with 95% confidence, that only 0.6% to 6.8% of 1,989 articles could be reproduced. Such a low reproducibility rate indicates the need for further improvements in scientific method and greater transparency when working with hydrological models. In the following paragraphs, I focus on hydrological modeling, since I worked with such models during my thesis and continue to do so in my current project.

Modeling is one way to represent complex dynamic systems. A model is a simplification of reality, in this case of the water cycle. Hence, model results can almost never be matched exactly by what is observed in the real world or in the field. At best, a model produces results of the same order of magnitude. It is therefore common to accept that model results need not replicate reality exactly. For example, a hydrological model simulating discharge in a basin will almost never give exactly the discharge measured in the field. As long as the model result falls within an acceptable margin of error in reproducing key scientific findings, it can be accepted (Popper, 1959). Thus, a model's ability to replicate reality is not the replication problem I want to discuss here. Nor is it the replication problem as defined in the Stanford Encyclopedia of Philosophy (2021).
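One way to make "an acceptable margin of error" concrete is a goodness-of-fit score between simulated and observed discharge. The sketch below assumes the Nash-Sutcliffe efficiency (NSE), a metric widely used in hydrology but not one this essay itself relies on. The discharge values and the 0.5 acceptance threshold are hypothetical and only illustrate the idea.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 means the model is
    no better than always predicting the mean of the observations."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - residual / variance


# Hypothetical daily discharge (m^3/s) at a gauging station and a model run.
observed = [12.0, 15.5, 30.2, 22.1, 18.4, 14.0]
simulated = [11.2, 16.8, 27.5, 23.9, 17.1, 14.6]

# Hypothetical acceptance threshold; real studies choose their own criterion.
ACCEPTABLE_NSE = 0.5

score = nash_sutcliffe(observed, simulated)
print(f"NSE = {score:.3f}, acceptable: {score >= ACCEPTABLE_NSE}")
```

An NSE of exactly 1 is not expected, as argued above. The point is that acceptance is judged against an explicit criterion rather than exact agreement with field measurements.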

The first problem in discussing reproducibility in hydrological modeling is the lack of a standard definition of reproducibility in computational modeling (Essawy et al., 2020). Essawy et al. (2020) argued that the definition provided by the National Academies of Sciences, Engineering, and Medicine (2019) is insufficient to capture the reproducibility problem in computational modeling. Another problem is that data, code, and workflows are rarely shared in scientific journals (Essawy et al., 2020; Hutton et al., 2016; Stagge et al., 2019).

In computational modeling, including hydrological modeling, a ‘condition of analysis’ can influence whether a model is reproducible. Essawy et al. (2020) stated that other researchers may be unable to reproduce a model directly because of this ‘condition of analysis’. This can happen even when all data, code, and workflows are available in the publication. The condition concerns the computational environment, which can be tricky to recreate. I have encountered this problem myself, although in my case it did not affect the final result. To avoid these issues, researchers need to state which operating system they used, which packages or libraries they relied on, and which versions of those packages. When all these requirements are fulfilled, other researchers should, in theory, be able to reproduce the model.
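The Python snippet below sketches how the computational environment could be recorded. It uses only the standard library to collect the operating system, the Python version, and the installed package versions. The function name `describe_environment` and the package list are hypothetical examples, not a prescribed format.

```python
import importlib.metadata
import json
import platform


def describe_environment(packages):
    """Collect the operating system, Python version and package versions
    that another researcher would need to recreate a model run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; replace it with the libraries the model imports.
    report = describe_environment(["numpy", "pandas", "xarray"])
    print(json.dumps(report, indent=2))
```

The printed JSON could be published with the other artifacts. Tools such as `pip freeze` or `conda env export` produce a fuller list of every installed package.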

The next problem is the availability of a publication's artifacts: the input data, the code or model, and the model workflow. Artifact availability is crucial in hydrological modeling. Most publications have not made these artifacts available online, so only a small fraction of publications can be reproduced. Stagge et al. (2019) drew a random sample of articles published in six popular hydrology journals in 2017. Only 48.6% of these had some of their materials (part of the artifacts) online with public access, and only 5.6% made all of their artifacts available. Of the sampled articles, only 1.1% could be fully reproduced. The rest were either partially reproducible or not reproducible. This problem has also been noted by other studies, such as Essawy et al. (2020), Hutton et al. (2016), Melsen et al. (2017), and Rosenberg et al. (2020). Another often overlooked issue is the publication environment. It tends to promote novel findings and statistically significant results rather than replication studies and null results (Hutton et al., 2016). This environment can make researchers hesitant to undertake reproducibility work, since publishers and academic institutions offer little reward for it.

In response to the lack of a clear definition of reproducibility in hydrological modeling, Essawy et al. (2020) proposed a new taxonomy of reproducibility. This taxonomy explicitly includes the ‘condition of analysis’ needed to reproduce modeling work. They added two more terms, ‘repeatability’ and ‘runnability’, which are two steps that must be reached before reproducibility and replicability can be achieved. Before publishing a model, researchers need to run it again on the same machine they used originally (repeatability) and on a different machine (runnability). Only when these runs give the same results as the original can they conclude that the model is reproducible and ready to be published to the scientific community. Here, reproducibility means reworking another researcher's work using the same data and the same model. Replicability means using new data with the same model.
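The repeatability and runnability steps can be sketched as a comparison between a stored original run and a rerun. In this minimal sketch, `linear_reservoir` is a hypothetical toy model standing in for a real hydrological model, and the rainfall series is made up. A SHA-256 digest of the output values tests bit-for-bit agreement. The tolerance check accepts the tiny floating-point differences that can appear when the same code runs on a different machine. The tolerance values are assumptions.

```python
import hashlib
import math
import struct


def linear_reservoir(rainfall, k=0.3, storage=0.0):
    """Hypothetical toy model: a single linear reservoir that releases a
    fraction k of its storage as discharge at each time step."""
    discharge = []
    for p in rainfall:
        storage += p
        q = k * storage
        storage -= q
        discharge.append(q)
    return discharge


def digest(series):
    """SHA-256 of the exact binary float values, for a bit-for-bit check."""
    packed = struct.pack(f"<{len(series)}d", *series)
    return hashlib.sha256(packed).hexdigest()


def same_within_tolerance(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Numerical check that tolerates tiny floating-point differences."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


rainfall = [0.0, 5.2, 12.1, 0.0, 3.4, 0.0, 0.0]  # hypothetical forcing (mm/day)

original = linear_reservoir(rainfall)  # the run behind the publication
rerun = linear_reservoir(rainfall)     # repeated on the same machine

print("bit-identical:", digest(original) == digest(rerun))
print("within tolerance:", same_within_tolerance(original, rerun))
```

For runnability, the digest or output file from the original machine would be stored with the publication. It would then be compared against a rerun on the second machine, rather than running both in one script as done here.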

To overcome the problem of artifact availability, the scientific community urges researchers to make more data available online and easily accessible to the public. The open science movement could have a positive influence on data availability. Stagge et al. (2019) made an open tool available to check whether a publication is reproducible. The tool can help authors, journals, funders, and institutions self-assess manuscripts and give feedback aimed at improving reproducibility. It also makes it possible to identify and acknowledge reproducible papers, which can then serve as exemplars for others. Hutton et al. (2016) called for more detailed documentation of workflows and methodology. As mentioned above, this documentation would be more useful if it stated which software or packages (for example, Python packages) were used, and which versions. Publishers should make data and code availability mandatory, as Nature-family journals have done (Melsen et al., 2017). Finally, the reward system will need to recognize published reproducibility work, so that researchers are more willing to undertake it.
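A simple checklist can illustrate the kind of self-assessment the Stagge et al. (2019) tool supports. The sketch below is not their tool. The three artifact fields and the "none", "partial", and "complete" labels are assumptions that loosely mirror the categories discussed above (data, code/model, and workflow), and the sample is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Artifacts:
    """Hypothetical record of which artifacts a publication shares publicly."""
    data: bool
    code: bool
    workflow: bool


def availability(a: Artifacts) -> str:
    """Classify a publication as sharing none, part, or all of its artifacts."""
    shared = sum([a.data, a.code, a.workflow])
    if shared == 0:
        return "none"
    if shared == 3:
        return "complete"
    return "partial"


# Hypothetical sample of assessed publications.
sample = [
    Artifacts(data=True, code=False, workflow=False),
    Artifacts(data=True, code=True, workflow=True),
    Artifacts(data=False, code=False, workflow=False),
    Artifacts(data=True, code=True, workflow=False),
]

labels = [availability(a) for a in sample]
for label in ("none", "partial", "complete"):
    share = 100 * labels.count(label) / len(labels)
    print(f"{label:>8}: {share:.1f}% of sample")
```

Complete availability is necessary but not sufficient. In the figures from Stagge et al. (2019) quoted above, fewer articles could be fully reproduced (1.1%) than shared all their artifacts (5.6%).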

3. Reproducible hydrology model: why is it important?

Establishing a clear definition of reproducibility for hydrological modeling is crucial to distinguish it from reproducibility in other scientific practices, such as experimentation. In hydrological modeling, satisfying all necessary requirements (data, workflow documentation, and code availability) is expected to reproduce the original researcher's findings. According to Melsen et al. (2017), the final conclusion is then not affected by epistemic uncertainty or subjectivity. Experimental hydrology is different, because epistemic uncertainty is present in the experiments themselves. It arises from factors such as inadequate calibration of equipment and the inherent subjectivity of the testing process. A clear definition of reproducibility allows other researchers to verify a model's validity more easily.

Publicly accessible artifacts, including data and workflow documentation, have implications for hydrological modeling. These can be viewed from both scientific and philosophical standpoints. From a scientific standpoint, open access to data and procedures can accelerate research and innovation. Reusing existing information lets researchers build on previous findings, minimizing redundant work and spreading ideas and approaches effectively. This supports the progress of hydrology as a discipline, improving our understanding of complex water systems and the effects of climate change. In addition, open science promotes transparency. Transparency is a fundamental principle of the scientific community because it allows research results to be examined and verified.

From a philosophical perspective, public data accessibility accords with principles from epistemology and the philosophy of science. Knowledge that is accessible and verifiable aligns with the empiricism and objectivity that underpin the scientific method. Open access to data and workflows also democratizes hydrological modeling, enabling a broader range of people to engage in the field. This inclusive, participatory approach incorporates many viewpoints and voices, enhancing our collective understanding of the Earth's water systems. Making hydrological modeling data accessible to the general public embodies the core idea of open science. It advances the principles of openness, collaboration, and the democratization of knowledge.

Easily accessible artifacts can smooth the process of verifying and validating a model. However, this type of reproducibility alone does not contribute significantly to the advancement of scientific knowledge. The process needs to continue to the next step: using different data with the same model to test whether it still produces comparable results. This step opens the model to falsification in Popper's (1968) sense, so it can contribute more to the advancement and accumulation of scientific knowledge. Open science is also in line with Fleck's argument on collectivism. On this view, communities and institutions shape the advancement of knowledge, not only individual researchers (Oreskes, 2019).

Achieving reproducibility in hydrological modeling is especially valuable in the modern era of social media. Information is easily accessible on social media platforms. As a result, researchers' work today faces scrutiny from fellow scientists and also the opinions of the general public. In the introduction, I discussed the polarization of public opinion surrounding climate studies, which can be attributed in part to uncertainty in projections. It is therefore essential to reduce uncertainty in the outcomes of hydrological models, even though these models attract less public interest than climate models. In certain applications, such as flood projection, a model is likely to attract more attention, and it then needs to be of high quality. Open science practices and data openness foster collaboration among scientists and researchers. This collaboration is expected to help maintain and improve the quality of hydrological models.

4. Epilogue

On several occasions, I encountered the issues outlined in this paper while conducting research for my thesis and during my internship. Two stood out: the absence of detailed methodological information and the limited amount of data used to construct the models. Hence, I advocate promoting openness and transparency among scientists who conduct hydrological modeling. Increased collaboration would also benefit the future development of hydrological models. It could help identify at least one ideal model that represents real-world conditions accurately with minimal error. This collaboration could result in community workflows for hydrological modeling, as suggested by Knoben et al. (2022). Such collaboration may well yield distinct workflows adapted to different categories of watershed. Ensuring model/code and data transparency is not enough. We cannot stop at what Melsen et al. (2017) call ‘bit-reproducibility’. We also need to test the model with different datasets, or test the same dataset with different models, and then compare the results with the original. In this way, we can assess how well the model simulates reality. Minimizing researchers' personal bias is also crucial when selecting a hydrological model. Addor and Melsen (2019) found that scientists and modelers select hydrological models mainly on the basis of legacy, meaning institutional history or familiarity with certain models. The adequacy of the model for the specific case under investigation plays a smaller role. Such a selection process can lead to inaccurate results at a given study site if the selected model fits it poorly.

Advancing hydrological models relies on a collaborative commitment to scientific rigor and unbiased model selection, along with the use of shared data and workflows. As we progress, we should keep in mind that our willingness to adopt these ideas determines how robust our research is and how much it influences the global community. By adopting them, we can enhance the reliability of hydrological models and contribute to hydrology and water resource management, promoting informed decision-making and sustainable practices.

5. References

1) Addor, N., & Melsen, L. (2019). Legacy, rather than adequacy, drives the selection of hydrological
models. Water Resources Research, 55(1), 378–390. https://doi.org/10.1029/2018wr022958
2) Begley, C. G., & Ioannidis, J. P. A. (2015). Reproducibility in Science. Circulation Research, 116(1),
116–126. https://doi.org/10.1161/circresaha.114.303819
3) Essawy, B. T., Goodall, J. L., Voce, D., Morsy, M. M., Sadler, J. M., Choi, Y. D., Tarboton, D. G., &
Malik, T. (2020). A taxonomy for reproducible and replicable research in environmental modelling.
Environmental Modelling and Software, 134, 104753.
https://doi.org/10.1016/j.envsoft.2020.104753
4) Hutton, C., Wagener, T., Freer, J. E., Han, D., Duffy, C., & Arheimer, B. (2016). Most computational
hydrology is not reproducible, so is it really science? Water Resources Research, 52(10), 7548–
7555. https://doi.org/10.1002/2016wr019285
5) Knoben, W., Clark, M. P., Bales, J. D., Bennett, A., Gharari, S., Marsh, C. B., Nijssen, B., Pietroniro,
A., Spiteri, R. J., Tang, G., Tarboton, D. G., & Wood, A. W. (2022). Community workflows to advance
reproducibility in hydrologic modeling: separating Model-Agnostic and Model-Specific
configuration steps in applications of Large-Domain hydrologic models. Water Resources
Research, 58(11). https://doi.org/10.1029/2021wr031753
6) Leuschner, A. (2015). Uncertainties, Plurality, and Robustness in Climate Research and Modeling:
On the Reliability of Climate Prognoses. Journal for General Philosophy of Science / Zeitschrift Für
Allgemeine Wissenschaftstheorie, 46(2), 367–381. http://www.jstor.org/stable/44113606
7) Melsen, L., Torfs, P., Uijlenhoet, R., & Teuling, A. J. (2017). Comment on “Most computational
hydrology is not reproducible, so is it really science?” by Christopher Hutton et al. Water Resources
Research, 53(3), 2568–2569. https://doi.org/10.1002/2016wr020208
8) Oreskes, N. (2019). Why Trust Science? Princeton University Press. https://doi.org/10.2307/j.ctvfjczxx
9) Popper, K. R. (1959). The Logic of Scientific Discovery (Routledge Classics). Routledge, New York.
10) Popper, K. R. (1968). The Logic of Scientific Discovery. Hutchinson, London.
11) Stanford Encyclopedia of Philosophy (2021). Reproducibility of Scientific Results (Summer 2021 Edition; first published 2018, December 3). https://plato.stanford.edu/archives/sum2021/entries/scientific-reproducibility/
12) Rosenberg, D. E., Filion, Y., Teasley, R., Sandoval-Solís, S., Hecht, J. S., Van Zyl, J. E., McMahon, G.
F., Horsburgh, J. S., Kasprzyk, J. R., & Tarboton, D. G. (2020). The next frontier: making research
more reproducible. Journal of Water Resources Planning and Management, 146(6).
https://doi.org/10.1061/(asce)wr.1943-5452.0001215
13) Stagge, J. H., Rosenberg, D. E., Abdallah, A. M., Akbar, H., Attallah, N. A., & James, R. (2019).
Assessing data availability and research reproducibility in hydrology and water resources.
Scientific Data, 6(1). https://doi.org/10.1038/sdata.2019.30
14) Williams, H. T. P., McMurray, J. R., Kurz, T., & Lambert, F. H. (2015). Network analysis reveals open
forums and echo chambers in social media discussions of climate change. Global Environmental
Change, 32, 126–138. https://doi.org/10.1016/j.gloenvcha.2015.03.006
