
Towards Flexible Automated Support to Improve the Quality of Computational Science and Engineering Software

Davide Falessi, Forrest Shull
Fraunhofer USA
Center for Experimental Software Engineering
College Park, MD, USA
{dfalessi, fshull}@fc-md.umd.edu

978-1-4673-6261-0/13/$31.00 © 2013 IEEE. SE-CSE 2013, San Francisco, CA, USA

Abstract—Continual evolution of the available hardware (e.g., in terms of increasing size, architecture, and computing power) and software (e.g., reusable libraries) is the norm rather than the exception. Our goal is to enable computational science and engineering (CSE) developers to spend more of their time finding scientific results by capitalizing on these evolutions instead of being stuck fixing software engineering (SE) problems such as porting the application to new hardware, debugging, reusing (unreliable) code, and integrating open source libraries. In this paper we sketch a flexible automated solution supporting scientists and engineers in developing accurate and reliable CSE applications. This solution, by collecting and analyzing product and process metrics, enables the application of well-established software engineering best practices (e.g., separation of concerns, regression testing and inspections) and is based upon the principles of automation, flexibility and iteration.

Index Terms—Computational science and engineering software, empirical software engineering.

I. INTRODUCTION

Events like storms, earthquakes, power plant explosions, and cancer threaten the lives of billions of people every year. Computational science and engineering (CSE) software is clearly of pivotal importance to prevent and forecast these tragic events and hence minimize human casualties. Continual evolution of the available hardware (e.g., in terms of increasing size, architecture, and computing power) and software (e.g., reusable libraries) is the norm rather than the exception. Therefore, our goal is to enable CSE developers to spend more of their time finding scientific results by capitalizing on these evolutions, and less time on fixing software engineering (SE) problems such as porting the application to new hardware, debugging, reusing (potentially unreliable) code, and integrating open source libraries.

Prior work in the CSE domain has shown the need for greater use of SE best practices (BPs), but too often heavyweight practices have been a poor fit for the domain [1]. Our recent work with software development teams in the IT field has demonstrated advances in metrics, automation and iterative approaches that can effectively semi-automate the detection of instances where best practices have broken down and corrective action should be taken [2][3]. The approach, by collecting and analyzing product and process metrics, facilitates the application of well-established software engineering best practices and is based upon the principles of automation, flexibility and iteration.

The remainder of this paper describes how this semi-automated approach can be tailored to the CSE domain: Section II reports some of the main challenges characterizing the development of CSE applications. Section III sketches a flexible automated support to improve the quality of CSE applications. Section IV presents the areas of technical focus, and Section V concludes the paper.

II. CHALLENGES

Current difficulties in developing accurate and reliable CSE software include:
• Best Practices (BPs) that are not adequately tailored: Past research in SE has provided many best practices for developing robust applications. However, important differences between CSE applications and more “traditional” software applications make the direct application of existing BPs to CSE questionable. Specifically, CSE applications feature complex numerical computations, and their aim is to achieve research goals (e.g., scientific discovery) rather than business goals (e.g., increasing revenue).
• Difficult V&V. Enacting V&V activities on CSE software is very difficult because the phenomena under investigation might be unknown and hence no oracle may be available. Even when oracles are available, it can be hard to understand whether an error is due to a defect in the algorithm or in its implementation.
• Education. Many developers of CSE applications have earned their (PhD) degree in specialized fields, which are important for applying domain expertise but do not cover formal software engineering principles and practices. It is important to build an SE body of knowledge for CSE applications and reduce practical barriers to its use.
• Passion for scientific results clashes with SE rigor. CSE researchers clearly strive to reach scientific results in the shortest time. There is obviously a tradeoff between the
time spent in developing CSE applications of high quality and the time required to achieve results.

III. TOWARDS A FLEXIBLE SOLUTION

Given the abovementioned challenges, we envision a solution based upon the following principles:
• Automation: Automation in metrics collection, storage, and data mining allows us to easily formalize and transfer SE knowledge to developers [4].
• Flexibility: In general there is no silver bullet in SE [5]. Thus, we envision a flexible approach that suggests to developers possible improvements in the application under development. The “suggestion” mechanism aims to avoid the strict enforcement of any rules, which would make the developers reject the tool in practice [6].
• Iteration: Our approach requires no big upfront investment from the developer side and hence facilitates transitioning towards the application of well-established BPs and the improvement of our rules. At the end of each iteration, past suggestions will be used in meetings as the baseline for collaboration between the metrics team and developers in adjusting the encoded BPs to provide more accurate and customized suggestions [7].

We propose that similar approaches can work in the CSE domain as well. We are advocating not to transfer specific BPs but rather to transfer an approach for adopting, testing and tweaking BPs.

Figure 1 below reports an overview of a flexible automated support to improve the quality of CSE applications. Starting from the top left of Figure 1, we plan to observe the development activity and collect measures. These measures will be analyzed by using dynamic thresholds related to specific rules. Software engineering best practices (e.g., separation of concerns and regression testing) play a vital role in the approach and are reported in the bottom center of Figure 1. Specific rules are derived from software engineering best practices. Further examples of rules and best practices will be provided in Section IV. Only when all the rules are satisfied can we deem the related best practice sufficiently applied. Otherwise (i.e., if a rule is broken), the tool will warn the user (see the traffic light at the top right of Figure 1).

Figure 1 Overview of a flexible automated support to improve the quality of computational science and engineering software.

One of the main novelties of our proposed approach is to use dynamic thresholds instead of pre-defined thresholds. Specifically, the threshold for each rule (reported in the top left of Figure 1) is computed via data mining technique(s) (reported in the center of Figure 1). These techniques, providing as output the value of a threshold for a specific rule, take as input:
• Project and product characteristics. CSE applications are very heterogeneous and thus it is important to tailor the BPs according to the specific project and product characteristics. According to [1], CSE applications can vary in terms of:
    a. Team size. CSE applications can be developed by a single developer (i.e., “lone researcher”) for a small research project or by several developers (i.e., “community codes”), possibly distributed across different groups and/or different geographic locations. In general, the higher the number of developers, the more stringent the constraints for the rules [8].
    b. Code life. Small applications (e.g., a code from the intelligence community) are usually discarded after a short period of time. In these cases, the application does not have to be very portable and efficient and hence the development activity should not be under stringent constraints. Vice versa, long-living applications will probably be maintained and ported to new hardware, and hence stringent constraints must be imposed during the development activity.
    c. Application size. In general, the larger the application under development, the higher the difficulty in its development and maintenance. Moreover, larger applications, because they provide more features, are more likely to be reused (at least partially) in the future. Intuitively, the larger the size, the more stringent the constraints imposed during the development activity.
• Knowledge base. The extent to which thresholds should be more or less stringent has to be defined not only according to the project and product characteristics but also according to historical data.

We note that even the best static analysis tools suffer from a lack of context, regardless of the size and quality of the knowledge base. Thus, thresholds, even if dynamic, can be wrong. Therefore, we will warn the user of broken rules (see the traffic light at the top right of Figure 1) and we will deliberately let him/her decide whether to follow the tool's suggestion (of changing something in the application under development) to meet the rules that are currently broken or to leave the application as it is. We note that the use of data mining will capitalize on this human decision by using the new data to better adjust future thresholds.

This approach aims to address the abovementioned challenges in developing accurate and reliable CSE software. Specifically, we address the difficulty of not having BPs adequately tailored for the CSE domain by using dynamic thresholds and a large set of inputs such as the size of the project. This, combined with the use of iterations, allows the BPs to be tailored, step by step, according to the specific application context. We address the difficulty of applying V&V and the lack of SE education by automatically checking the quality of the code. By distilling SE best practices into rules we transfer the SE knowledge in a viable and inexpensive way to CSE developers. Finally, the combined use of automation and flexible output should facilitate the application of SE BPs without impacting the passion for scientific results. As a matter of fact, the flexibility allows the CSE developers to decide when to apply a rule and when the time to achieve scientific results is more important. Moreover, the automation makes the BPs fast and easy to apply and hence enhances the quality of the CSE application.

IV. AREAS OF TECHNICAL FOCUS

There are three types of quality that we target:
• Correctness. Less time spent on debugging and reworking code leaves more time for scientific advancement [1]. Moreover, when scientists have confidence in their code, they need less time for analyzing whether anomalous answers are caused by problems in the science or in the software. So, correct code can also result in less time spent “debugging” correct science. Measurable indicators of correct code include: fewer defects being found, and less team effort spent on rework versus adding new functionality.
• Portability. CSE software often migrates to different platforms to exploit increased computational power or to different compilers. The primary measurable indicator of more portable code is less effort (or calendar time) required between the time code is ported to a new platform and the point at which the code correctly executes.
• Maintainability. Fragile code requires additional effort to keep up to date, which detracts from the amount of effort that can be spent on new science. Indicators of improved maintainability include fewer changes or change requests; less effort spent per change to the code; fewer classes affected in each change.

To address these issues, we will develop a solution by focusing on two software engineering phases that are pertinent to all CSE software.

A. Architecture/Design

One of the key features of most large, complex software packages is the presence of an appropriate software architecture and design. A key software architecture/design practice that is applicable in this space is separation of concerns. This practice states that a software system should be modularized in such a way that portions of the system that are logically independent should be physically independent within the code base. Applying this principle to CSE software will help developers identify code that has not been properly partitioned or modularized. For example, in a typical CSE system, it would make sense to separate the scientific portions of the code from the portions handling parallel communication. This practice helps with portability because parallel features could be updated for a new environment without disturbing the science code. It helps with maintainability and correctness because the types of V&V required for special-purpose science code and parallel communication code will be different.

Separation of concerns is a genuine issue. A case study by Hochstein and Shull [9] demonstrated the difficulties a team had in maintaining strict separation between the code handling parallel processing (which required substantial computer science domain expertise to code correctly) and the rest of the code base, even when the system was architected to make this separation explicit. Figure 2 shows a treemap [10] view of a scientific team’s code base to illustrate the presence of this problem. Each rectangle represents a source file and the nesting shows hierarchical organization. The large rectangles outlined in orange represent packages that were intended to contain the parallel processing, which is achieved via MPI calls. The brightness of the boxes indicates the density of MPI calls in each class. Note that MPI calls are scattered throughout the code that is not inside the orange boxes. This visualization helped demonstrate to the team how their architecture rules had degraded over time in their shared code base.

Rules for the separation of scientific and parallel concerns will define how these concerns are represented in the architecture of the software application. For example, if a designated set of software modules is responsible for distributing data and computation in a parallel environment, then only these modules are allowed to use the parallel statements of the language in use.

B. Verification and Validation (V&V)

As with any software for which the correctness of the results is important for supporting critical decisions, V&V is an
important, and oftentimes quite difficult, task in most CSE projects. This difficulty is exacerbated by many factors including the complexity of the software and the lack of a test oracle against which to test results.

Figure 2 Distribution of MPI calls in the FLASH code visualized as a treemap.

As one example, regression testing [11] is a method of developing test cases that can be executed to ensure that code modifications do not affect the correct execution of that code. Such a regression test suite may contain a number of unit tests that can be run relatively quickly, as well as a set of important integration tests that require more time to run and may be executed nightly. Regression testing helps with correctness by demonstrating that code components are functioning correctly even if other aspects of the code are undergoing substantial or frequent changes. It helps with maintainability by making changes that break the existing correct behavior easy to spot. It helps with portability by easily identifying where the code breaks when porting to a new environment. This practice supports the human developers by offloading the responsibility for validating the correctness of all prior work as new science is being done.

Regression testing practices need to be tailored for the CSE domain, since CSE codes often face the additional complexity of needing to run tests across multiple architectures and compilers. This creates additional problems that need to be addressed in creating, managing, and maintaining regression test suites. For example, a test suite may be both effective and useful on an x86-based Linux machine using gcc, but refuse to compile (or return slightly different results) when used on an IBM PowerPC-based system. Detecting such discrepancies and maintaining test suites that work correctly across platforms present additional costs that need to be managed.

Rules for regression testing define which parts of the code are expected to be tested in the parallel environment, and how complete this testing effort should be. Existing metrics, such as line, path, and branch coverage, will be used to describe the quality of the testing effort. Example rules could specify desired levels of code coverage to be exercised by the test suite, or specify that updates to the test suite should occur periodically based on the degree of updates to the code itself.

V. CONCLUSIONS

This paper presented the ideas of a flexible automated support to improve the quality of computational science and engineering software. Correctness, portability and maintainability are our priorities, and several related software engineering best practices can be applied to the development of computational science and engineering software. However, dealing with the human aspects characterizing the development of CSE applications still remains one of the main challenges. Automation, flexibility and iteration seem to be good ingredients to enable CSE developers to provide accurate and reliable CSE software.

REFERENCES

[1] V. R. Basili, J. C. Carver, D. Cruzes, L. M. Hochstein, J. K. Hollingsworth, F. Shull, and M. V. Zelkowitz, “Understanding the High-Performance-Computing Community: A Software Engineer’s Perspective,” IEEE Software, vol. 25, no. 4, pp. 29–36, Jul. 2008.
[2] N. Zazworka, M. A. Shaw, F. Shull, and C. Seaman, “Investigating the impact of design debt on software quality,” in Proceedings of the 2nd Workshop on Managing Technical Debt - MTD ’11, 2011, p. 17.
[3] J. Schumacher, N. Zazworka, F. Shull, C. Seaman, and M. Shaw, “Building empirical support for automated code smell detection,” in Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement - ESEM ’10, 2010, p. 1.
[4] S. Olbrich, D. S. Cruzes, V. Basili, and N. Zazworka, “The evolution and impact of code smells: A case study of two open source systems,” in 2009 3rd International Symposium on Empirical Software Engineering and Measurement, 2009, pp. 390–400.
[5] F. P. Brooks, “No Silver Bullet Essence and Accidents of Software Engineering,” Computer, vol. 20, no. 4, pp. 10–19, Apr. 1987.
[6] N. Zazworka, V. R. Basili, and F. Shull, “Tool supported detection and judgment of nonconformance in process execution,” in 2009 3rd International Symposium on Empirical Software Engineering and Measurement, 2009, pp. 312–323.
[7] P. Kruchten, R. L. Nord, and I. Ozkaya, “Technical Debt: From Metaphor to Theory and Practice,” IEEE Software, vol. 29, no. 6, pp. 18–21, Nov. 2012.
[8] D. Falessi, G. Cantone, S. A. Sarcia’, G. Calavaro, P. Subiaco, and C. D’Amore, “Peaceful Coexistence: Agile Developer Perspectives on Software Architecture,” IEEE Software, vol. 27, no. 2, pp. 23–25, Mar. 2010.
[9] L. Hochstein, F. Shull, and L. B. Reid, “The role of MPI in development time: A case study,” in 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, 2008, pp. 1–10.
[10] L. Hochstein, V. R. Basili, U. Vishkin, and J. Gilbert, “A pilot study to compare programming effort for two parallel programming models,” Journal of Systems and Software, vol. 81, no. 11, pp. 1920–1930, Nov. 2008.
[11] H. Do, S. Mirarab, L. Tahvildari, and G. Rothermel, “An empirical study of the effect of time constraints on the cost-benefits of regression testing,” in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering - SIGSOFT ’08/FSE-16, 2008, p. 71.
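
As an illustration of the dynamic thresholds described in Section III, the following Python sketch shows one way a single rule could be evaluated and reported through the traffic light. It is a minimal sketch under stated assumptions, not the tool itself: the paper does not prescribe a data mining technique, so a simple percentile over historical values stands in for it, and the names ProjectProfile, dynamic_threshold, traffic_light and record_override, the 10% tightening per characteristic, and the 50 KLOC cut-off are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import quantiles


@dataclass
class ProjectProfile:
    """Project and product characteristics (hypothetical encoding)."""
    team_size: int      # number of developers
    long_lived: bool    # expected to be maintained and ported
    size_kloc: float    # application size in thousands of lines


def dynamic_threshold(history: list[float], profile: ProjectProfile) -> float:
    """Upper bound for a 'lower is better' metric.

    Assumption: the 90th percentile of the knowledge base stands in for
    the data mining step, which the paper leaves open.
    """
    baseline = quantiles(history, n=10)[-1]
    # Assumption: each characteristic that calls for more stringent
    # constraints tightens the bound by 10%.
    strictness = sum([
        profile.team_size > 1,
        profile.long_lived,
        profile.size_kloc > 50,
    ])
    return baseline * (1 - 0.1 * strictness)


def traffic_light(value: float, threshold: float) -> str:
    """Warn (red) when the rule is broken; never block the developer."""
    return "green" if value <= threshold else "red"


def record_override(history: list[float], value: float) -> None:
    """The developer left the code as is: feed the decision back so that
    future thresholds are computed from the enlarged knowledge base."""
    history.append(value)


if __name__ == "__main__":
    past_values = [float(v) for v in range(1, 101)]
    profile = ProjectProfile(team_size=20, long_lived=True, size_kloc=200.0)
    limit = dynamic_threshold(past_values, profile)
    print(traffic_light(70.0, limit), limit)
```

The same measured value can be acceptable for a short-lived, single-developer code and flagged for a large community code, which is the tailoring effect the approach aims for.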

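The separation-of-concerns rule of Section IV.A (only designated modules may use parallel statements) could be checked mechanically. The Python sketch below counts MPI calls in files that lie outside the designated packages. It is a hedged illustration only: the function name mpi_violations, the directory layout, and the regular expression are assumptions, and the pattern does not skip comments or string literals.

```python
from __future__ import annotations

import re

# Assumption: an MPI call looks like "MPI_<Name>(" in C or Fortran source.
# Constants such as MPI_COMM_WORLD are not followed by "(" and are ignored.
MPI_CALL = re.compile(r"\bMPI_\w+\s*\(", re.IGNORECASE)


def mpi_violations(
    sources: dict[str, str],
    parallel_packages: tuple[str, ...],
) -> dict[str, int]:
    """Map each file outside the designated packages to its MPI call count.

    sources: file path -> file contents.
    parallel_packages: path prefixes allowed to contain MPI calls.
    """
    violations: dict[str, int] = {}
    for path, text in sources.items():
        if path.startswith(parallel_packages):
            continue  # designated parallel module: MPI calls are allowed
        count = len(MPI_CALL.findall(text))
        if count:
            violations[path] = count
    return violations


if __name__ == "__main__":
    # Hypothetical code base with one misplaced MPI call.
    code_base = {
        "comm/exchange.c": "MPI_Send(buf); MPI_Recv(buf);",
        "physics/hydro.c": "double f = flux(u);\nMPI_Allreduce(&f);",
        "physics/eos.c": "double p = pressure(rho);",
    }
    print(mpi_violations(code_base, ("comm/",)))
```

The per-file counts are the same quantity that the treemap in Figure 2 encodes as brightness, so a rule could compare them against a threshold instead of requiring zero.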