
Information and Software Technology 144 (2022) 106787


Impact of software development processes on the outcomes of student computing projects: A tale of two universities
Rafal Włodarski a, Aneta Poniszewska-Marańda a,∗, Jean-Remy Falleri b
a Institute of Information Technology, Lodz University of Technology, Łódź, Poland
b LaBRI, UMR CNRS, University of Bordeaux – Bordeaux INP ENSEIRB-MATMECA, Talence; Institut Universitaire de France, France

ARTICLE INFO

Keywords: Software engineering; Comparative study; Capstone project; Student projects; Education; Computer science education

ABSTRACT

Context: Project-based courses are more and more commonly used as an opportunity to teach students structured methods of developing software. Two well-known approaches in this area – traditional and Agile – have been successfully applied to drive academic projects. However, too often the default is still to have no organizational process at all. While a large variety of software development life-cycle models exists, little guidance is available on which one to choose to fit the context of working with students.
Objective: This paper assesses the impact of iterative, sequential and "hands-off" development approaches on the success of student computing projects. A structured, metric-based assessment scheme was applied to investigate team productivity, teamwork and the quality of the final product.
Method: Empirical evidence was collected during a controlled experiment carried out at two engineering schools in Europe. More than 100 students at Bachelor's and Master's levels participated in the research, with varied software development and teamwork skill sets.
Results: Similar patterns were observed among both sets of subjects, with iterative teams demonstrating the highest productivity and superior team cohesion but a decline in the quality of the final product. Sequential development led to a considerable improvement in the external quality characteristics of the software produced, owing to the method's stress on design activities.
Conclusion: The findings of this study will be of use to educators interested in applying software development processes to student groupwork. A set of guidelines is provided for applying a structured way of working in a project-based course.

1. Introduction

While many student projects continue to follow a delegative or "hands-off" approach to software development – with students grouped in teams, set the task, and told to "get on with it" – there is growing interest in using such projects as an opportunity to enable students to develop more structured ways of working, which mirror industrial practice [1]. The increasing popularity of Agile methods in industry has been accompanied by a large number of studies regarding their application in student projects. In comparison, traditional methods have received relatively little attention [2]. Are classical approaches such as those based on the Waterfall model simply outdated, or do they also have something to offer? Although there have been informal debates for almost two decades about which methodologies are best in terms of their impacts on the project, the people, and the final product, there are few rigorous studies providing direct comparisons and objective metrics of the various software development processes.

There exist a variety of life-cycle models for software development [3,4], which may be adapted to suit the university context. However, there is very little guidance available with regard to the choice of particular methods, given the constraints of academic projects and the particularities of working with students. While there have been several empirical studies on the use of different development approaches in student projects [5–9], the results vary due to contextual factors, such as the selection of participants and the way the project is designed and executed. A comprehensive experimental study is needed to understand how different working procedures influence the outcomes of student projects.

This work set out to investigate, through a controlled experiment, the impact of various software development approaches on the outcomes of student computing projects. Two institutions participated in the study.

∗ Corresponding author.
E-mail address: aneta.poniszewska-maranda@p.lodz.pl (A. Poniszewska-Marańda).

https://doi.org/10.1016/j.infsof.2021.106787
Received 12 April 2021; Received in revised form 4 August 2021; Accepted 12 November 2021
Available online 3 December 2021
0950-5849/© 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).

The results were evaluated along three axes: product quality, team productivity, and teamwork. A total of eighteen teams of student developers applied either sequential (Waterfall-like), iterative (Scrum-like), or "hands-off" (control group) methods to complete similar projects within the same domain.

The two groups of students comprised:

• early Bachelor's level students, most with no experience of working in teams (of more than two) to design and develop software,
• Master's level students, most of whom had participated in capstone-like courses in the past and possessed some relevant work experience.

The study was designed to answer the following research questions:

• RQ1: Does introducing a software development process add value in at least one of the success dimensions of the student project?
• RQ2: Do the results of following a given process depend on the level of studies (undergraduate/graduate)?

This paper is an extended version of a preliminary report [10], which in addition provides a comprehensive overview of the study, a discussion of related work, visualizations of the results, and extended analysis and conclusions.

The major contributions of this study are three-fold:

1. demonstrating the impact of two software development processes on three different series of data (concerning the project, product and people) relative to a control group,
2. providing empirical evidence collected during a controlled, multi-regional experiment with two sets of subjects: Bachelor's and Master's students,
3. presenting a set of recommendations for organizing project-based courses and students' work in order to facilitate successful outcomes.

Certain patterns were observed in both study groups. The iterative teams showed the highest productivity, with superior team cohesion but a decline in the quality of the final product. The upfront design principle inherent in sequential development led to a considerable improvement in the external quality characteristics of the software produced. The data on different dimensions of success can be used to inform the choice of a process model for projects in the university setting. Our conclusions may also provide a starting point for the development of tailored, well-conceived software development processes for academic projects. Course instructors looking to structure students' work over the semester could apply our insights to increase the chances of a positive outcome and create a learning environment that is favorable to the development of skills sought after in the software industry.

The rest of the paper is organized as follows. Related work is discussed first, then the process of planning the experiment and operational aspects are described. The results of the study are presented and discussed, before final conclusions are drawn and guidelines provided.

2. Related work

Software development methods have been taught at both graduate and undergraduate levels for at least two decades. However, only a handful of studies have investigated the use of different approaches to the management of software creation in educational settings. One of the first investigations in this area was by Umphress et al. [6], who traced the evolution of applying different processes – from a delegative or "hands-off" approach to heavy-weight methods (MIL-STD-498, IEEE 1074, Team Software Process) and Agile (Extreme Programming) – to student projects. The insights gathered over a total of 49 capstone projects revealed that introducing software processes into the classroom is challenging, requiring proper tools as well as tailoring to facilitate the learning objectives. Although the authors present lessons learned from the application of various processes, no quantitative data is provided which would allow for direct comparison.

At the other end of the spectrum lies a study by Benediktsson et al. [7], who applied extensive measures of team effort, productivity, and software quality to compare the V-model (VM), the incremental model (IM), the evolutionary model (EM) and Extreme Programming (XP). Their experiment involved 55 participants in 15 teams of 3 or 4 students, developing comparable products according to one of the methods (assigned randomly). The study presents several findings:

1. projects following VM took longer than those using the other three models, between which there was a time difference of no more than 10%,
2. XP was associated with the largest number of hours spent on testing and code correction, whereas in the VM project team most time was spent on establishing the requirements, designing and programming,
3. XP teams produced on average 3.5 times more lines of code (LOC) than the VM teams, 2.7 times the LOC produced by the IM teams and 2.2 times more code than the EM teams,
4. the XP teams were 4.8 times more productive than the VM teams, 2.8 times more productive than the IM teams and 2.3 times more productive than the EM teams.

However, the underlying assignments were distinct, implying differing workloads and preventing straightforward comparison. Moreover, the purpose of the study was to provide data to guide IT professionals using student teams as a proxy, rather than for academics wishing to apply a given approach to capstone-like projects.

Germain et al. [5] also focused on efficiency in their comparative study of the Unified Process for Education (UPEDU, derived from the Rational Unified Process) and an Extreme Programming-like approach. The principal objective of their experiment was to investigate the distribution of effort across different cognitive activities: active (writing code, testing, integration efforts), reflexive (browsing technical documentation, browsing the web to find solutions) and interactive (discussing progress/issues, reviewing peer work). Three teams of four students followed each of the methods in parallel, developing the same project specifications over a period of 60 working days. The authors report that heavy-weight processes such as UPEDU put more emphasis on pre-coding activities, whereas XP required more ad-hoc communication. Although these conclusions are in line with the characteristics of the evaluated processes, the authors note that their impact on overall effort was not significant, contrary to the findings of Benediktsson et al. [7].

A study by Rundle and Dewar [9] partially anticipates the set-up of our experiment, as they applied plan-driven and Agile approaches to the development of undergraduate group projects. The Return on Investment (ROI) principle was used to compare the outcomes of the frameworks, and did not reveal any startling differences between the two methods. It is important to note that the "plan-driven" method was similar to the control group in our study, as it involved minimal intervention from the instructor. The students were by default inclined to work in a plan-driven manner, but were not guided to do so.

A recent study comparing Agile and plan-driven development methods was carried out by Missiroli et al. [11]. A total of 160 students from seven schools took part, divided into 34 teams working either according to Scrum principles or a Waterfall process. All of the deliverables were graded based on the number of functions completed, adherence to the assigned process and the overall learning experience. The Scrum teams scored highest for the first two evaluation criteria, whereas the projects developed in a sequential manner exhibited better non-functional characteristics, such as performance and usability. It is important to note that the grading scheme suffered from a high degree of subjectivity, with the exception of the number of functions delivered.

Finally, the study most pertinent to the research presented in this paper was conducted by Wellington et al. [8].


Table 1
Comparative studies of software development methods used by student teams (information listed in rows 1–5 as reported by the study authors).

Benediktsson et al. [7]
Development methods compared: V-model, incremental model, evolutionary model, Extreme Programming
Subject demographics: final-year students of a Master's in Computer Science
Sample size per development method: three to four teams
Response variables: team effort, productivity and software quality
Metrics for the response variables: hours spent on the project, lines of code, lines of code per month; metrics for quality not provided
Findings reported: (1) V-model teams took longer to complete the projects and were least productive; (2) XP teams were most productive
Limitations: lack of a control group; results for quality aspects not significant

Germain et al. [5]
Development methods compared: Unified Process for Education, Extreme Programming
Subject demographics: senior-year students in Computer Engineering
Sample size per development method: three teams
Response variables: team effort
Metrics for the response variables: cognitive activity classification (active/reflexive/interactive)
Findings reported: no significant difference in the overall effort between the methods
Limitations: lack of a control group

Rundle and Dewar [9]
Development methods compared: plan-driven, Agile
Subject demographics: third-year students of Computer Science
Sample size per development method: two teams (Agile), five teams (plan-driven)
Response variables: product size in terms of value added, team effort
Metrics for the response variables: Return on Investment (ROI)
Findings reported: no significant differences
Limitations: no guidance provided to the plan-driven teams; lack of a control group

Missiroli et al. [11]
Development methods compared: Waterfall, Scrum
Subject demographics: high school students with two years of programming experience
Sample size per development method: seventeen teams (of four to six students) per development method
Response variables: productivity, adherence to the process
Metrics for the response variables: number of functions delivered
Findings reported: (1) Scrum teams scored higher in terms of functionalities delivered and adherence to the process; (2) Waterfall teams produced software of higher usability and performance
Limitations: subjectivity of the metrics; lack of a control group

Wellington et al. [8]
Development methods compared: Traditional Life Cycle, Extreme Programming
Subject demographics: upper-division Computer Science students
Sample size per development method: one team
Response variables: team productivity, source code and product quality, team cohesion
Metrics for the response variables: lines of code, method length, cyclomatic complexity, nested block depth, number of parameters per method, number of attributes per class, weighted methods per class, lines of code per engineer, attachment to the team and project
Findings reported: (1) similar amounts of functionality were delivered by both teams; (2) the source code produced by the XP team was of higher quality; (3) the Traditional Life Cycle team produced more usable software; (4) the XP team exhibited higher levels of team cohesion
Limitations: very low sample size; lack of a control group

These authors examined team cohesion, source code metrics and usability aspects of the developed solutions to compare student performance using the Traditional Life Cycle and Extreme Programming. The experimental setup involved two groups of 15 and 16 students, each following one of the methods over one semester. Both groups delivered a similar amount of functionality. However, the code developed by the XP team was of significantly higher quality, thanks to frequent refactoring. Students following this approach demonstrated consistently higher levels of team cohesion. The solution developed under the plan-driven method was perceived as being of higher usability by the students; nevertheless, this was a subjective evaluation, rather than an expert opinion or quantitative assessment.

Given the inconsistent results and methodologies reported in the existing body of knowledge (Table 1), further experimental evidence is necessary before the effectiveness of different development methods on the outcomes of student projects can be reliably evaluated. Our study complements the previous research in this area by taking a more systematic approach to the assessment of sequential and iterative software development. The experimental groups, including the control groups, were composed of undergraduate (Bachelor's) or graduate (Master's) students. The teams were composed of between four and six students. The setup of the investigation was designed to be typical of student projects in higher education, and therefore of interest and use to a large audience of academics. Each group developed the same software specification. The work processes were adapted to be relatively generic and represented a family of approaches, rather than a specific framework for development.

3. Experimental design

Both parts of the experiment were completed during the spring semester of the 2018/2019 academic year. One of the authors performed the experiments, with the other two supervising at their respective universities. A teaching assistant provided support in each class.

3.1. The study environment: setting

Bachelor's students. The experiment was conducted within the context of a compulsory Web-programming course, introducing tools and technologies (PHP, HTML, CSS) used to build Web applications. It consisted of lectures (5 h), tutorials (20 h) and supervised assignment work (10 h) – the equivalent of 2.5 ECTS points.


Table 2
Duration of different phases of the Waterfall model in each experimental group.
                                 Analysis & design   Implementation   Testing
Course 1 – Bachelor's students   1 week              3 weeks          1 week
Course 2 – Master's students     4 weeks             6 weeks          2 weeks

Following the European Credit Transfer and Accumulation System guidelines [12], students were expected to dedicate between 30 and 40 h to the course in addition to contact hours. Their task was to implement, in groups of three to four students, an application to keep track of and share expenses.

Master's students. The experiment was performed as part of a 15-week course titled "Analysis and design of information systems", a compulsory class worth 5 ECTS points with an associated workload of 65 to 90 h outside classes. Each week, 2 h of lectures and 2 h of supervised assignment work were scheduled. The main learning objectives were as follows:

• application of analysis, design and implementation tools for software engineering,
• development of an information system leveraging effective teamwork and project management.

The assignment was to deliver a Web application for use by HR departments, to enable the creation of recruitment tests in multiple languages, including translation tools (such as Wikipedia and synonym APIs) to facilitate the task.

3.2. The study environment: processes

To compare sequential and iterative software development, a representative process of each approach – a Waterfall-like and a Scrum-based one, respectively – was selected and introduced to separate groups. A "hands-off" control group was also created, which was not given a process to follow, although the instructor occasionally inquired about the project status and provided assistance when necessary. The three groups were of comparable size. Henceforth, the groups will be referred to as follows:

• Gwaterfall: teams following the Waterfall-like process,
• Gagile: teams working according to the Scrum-like process,
• Gcontrol: teams not following any development process.

The course instructors completed requirements engineering before the start of the semester. As is typical for an academic setting, the requirements were for the most part stable. The decision was made not to introduce changing needs, which would skew the experiment towards the iterative approach. However, there was still a level of ambiguity, given that the tasks were conceived as research projects with some variability of outcome. The same deliverables were requested of all the teams in each experimental set of subjects.

3.2.1. Sequential approach

The Waterfall model was not simple to implement, as the duration of the course did not allow for all of the phases originally identified by Royce [13]. Therefore, the schedule imposed a progression through four phases: analysis, design, implementation and testing. The durations of each phase are presented in Table 2. The remaining three weeks of the study on the Master's course were dedicated to organizational matters, including an introduction to the course and teambuilding (1 week), a "Jigsaw exercise" during which students executed predefined test scenarios on applications created by other teams (1 week) and final presentations (1 week).

The structure, milestones and artifacts were characteristic of plan-driven approaches, as described by Hirsh [14]:

• the desired functions/properties of the software were specified beforehand,
• a detailed plan of all project phases was constructed,
• the architecture and design specification were completed before implementation,
• programming work was executed only in the programming phase,
• testing and bug fixing were completed at the end of the project.

Project Resource Management, an additional technique from the Project Management Body of Knowledge (PMBOK) which supports traditional software development [15], was also applied by defining specific project roles. One of the group members was designated as the Project Manager, and was responsible for coordinating teamwork according to the plan and task estimates. These were tracked using a supporting tool (Jira, GitHub issues).

3.2.2. Iterative approach

The iterative development method used in the study was a Scrum-like approach, adapted to the university setting and based on the Scrum Guide [16,17]. Reproducing a Scrum-based approach proved challenging for the Bachelor's students, as the laboratory classes were organized around five follow-up sessions, which left limited time to introduce the framework. The course was divided into two two-week "sprints" and one one-week sprint. While it is recommended to keep the sprint duration constant, this was not possible due to the course schedule. The purpose of the final sprint was therefore not to develop new functionalities, but rather to incorporate feedback from the sprint review and to complete work on any remaining functionalities. Each sprint began with a planning meeting and ended with a review, during which the instructor provided feedback in terms of functionality and usability. A stand-up meeting was held in the intervening class. Given the students' low level of programming experience and the limited number of follow-up sessions, the instructors decided to forgo retrospective sessions and use the time instead to provide technical assistance (many problems had been encountered in past instances of the course).

On the Master's course, 12 weeks of classes were assigned for the development of the projects according to Scrum principles. This suggested an obvious division into four sprints of equal length. A retrospective meeting was carried out after each sprint. Since the weekly workload was higher than that assigned to the Bachelor's students, an additional stand-up meeting could be organized outside of class time (the instructor was provided with a voice recording of the meeting).

Each team was asked to formulate its own set of rules, based on a small number of examples, which had to be satisfied in order to consider the work fully completed (the "Definition of Done"). Basic rules, such as testing the functionality of the product and committing it to the repository, were imposed to ensure a minimal quality threshold for the Definition of Done. The roles were defined following common practice [18,19], with the instructor playing the role of the Product Owner and a volunteer student acting as the Scrum Master. Finally, technical practices associated with Agile software development, such as automated tests and continuous integration, were not applied, as they were not specified as part of the learning outcomes and would have required additional work. However, all the groups, regardless of the development approach being used, were expected to produce "clean code", which was graded according to the procedure described in Section 3.6.

3.3. Variables

Independent variables. The independent variable in this study was the development approach applied by the student teams. The effects of sequential and iterative processes were evaluated and compared against a control group, which did not follow a formalized work process.

Dependent variables. One of the main motivations of this work was to assess both artifact and process aspects of project success, in order to operationalize the response variables. A multi-dimensional approach based on [20] was applied to the following dependent variables: product quality, team productivity and teamwork quality (Table 3).


3.3.1. Product quality

Software project quality refers to the internal and external characteristics of the produced artifact. The internal characteristics were evaluated based on the correctness of the program's source code, whereas external quality was quantified via dynamic assessment of the user experience [21].

Internal quality. One of the main learning objectives of the course for Bachelor's students was to teach them HTML programming. The correctness of their client-side code was therefore incorporated in the grading scheme, including the number of errors (A) and warnings (B) generated during W3C validation for a pre-selected portion of pages. These metrics were not applied to assess the Master's course, as those students were free to use any Web development framework of their choice, some of which generate HTML code automatically.

A technology-agnostic solution was applied to evaluate the server-side code produced by both experimental groups. Static-analysis software, widely used in industry to assess the maintainability of programs [22], was used to minimize the workload associated with the evaluation of the students' source code, as is common practice in Computer Science education [23]. The software chosen to assess the maintainability ranking was developed by SIG, a software management consulting company, in collaboration with TÜV Informationstechnik. The underlying technical quality model involves analysis of the following metrics [21]: Lines of Code (LOC), duplicated LOC, Cyclomatic Complexity, parameter counts and dependency counts.
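Both internal-quality measurements lend themselves to automation. As a first illustration, the error and warning counts (A and B) can be collected from the W3C Nu HTML Checker, which exposes a JSON interface; the sketch below is a minimal example assuming the public endpoint at validator.w3.org and its documented message format, not the exact script used in the study.

    # Count W3C validation errors (A) and warnings (B) for an HTML page.
    # Minimal sketch, assuming the Nu HTML Checker's JSON interface.
    import requests

    NU_CHECKER_URL = "https://validator.w3.org/nu/?out=json"

    def count_validation_issues(html_source: str) -> tuple[int, int]:
        """Return (errors, warnings) reported by the Nu HTML Checker."""
        response = requests.post(
            NU_CHECKER_URL,
            data=html_source.encode("utf-8"),
            headers={"Content-Type": "text/html; charset=utf-8"},
            timeout=30,
        )
        response.raise_for_status()
        messages = response.json().get("messages", [])
        errors = sum(1 for m in messages if m.get("type") == "error")
        # Warnings arrive as "info" messages carrying subType "warning".
        warnings = sum(1 for m in messages if m.get("subType") == "warning")
        return errors, warnings

Likewise, rough stand-ins for two of the maintainability ingredients just listed – cyclomatic complexity and parameter counts – can be computed with Python's standard ast module. This is an illustrative proxy only; the SIG/BetterCodeHub model applied in the study also covers duplication and dependency analysis.

    # Rough per-function maintainability indicators for Python source code.
    # Illustrative proxy only, not the SIG/BetterCodeHub quality model.
    import ast

    # Node types that open an extra decision path (simplified McCabe counting).
    DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                      ast.BoolOp, ast.IfExp, ast.comprehension)

    def function_metrics(source: str):
        """Yield (name, cyclomatic complexity, parameter count, loc) per function."""
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                complexity = 1 + sum(isinstance(child, DECISION_NODES)
                                     for child in ast.walk(node))
                params = len(node.args.args) + len(node.args.kwonlyargs)
                loc = node.end_lineno - node.lineno + 1
                yield node.name, complexity, params, loc

    if __name__ == "__main__":
        sample = (
            "def grade(score, bonus=0):\n"
            "    total = score + bonus\n"
            "    if total >= 90:\n"
            "        return 'A'\n"
            "    return 'B' if total >= 80 else 'C'\n"
        )
        for name, cc, params, loc in function_metrics(sample):
            print(f"{name}: complexity={cc}, parameters={params}, loc={loc}")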
External quality. Although many facets of external quality can be evaluated, their relevance depends on the type of application. Given that the assignments for both courses involved Web programming, usability was the main external quality attribute. Usability is defined by the ISO 25010 standard [24] as the "degree to which a product or system can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use". In their model, Orehovacki et al. [25] extended this definition by identifying two sub-characteristics – efficiency and effort in use – with a set of 11 objective and effort-based measures. Since some of these measures are related to user behavior rather than to design and structure (e.g. mouse movement/clicking speed), they were not considered suitable for comparing the usability of the applications produced in this study. In our evaluation of effort in use and efficiency in use, we therefore restricted the metrics to the following: distance, mouse clicks, mouse double clicks and mouse wheel scrolls.

3.3.2. Team productivity

In the software industry, team productivity largely relates to resource utilization and efficiency [26]. Many metrics have been developed to track team effort. These mostly rely on data collected from collaboration tools, such as Jira or PivotalTracker, as well as on the team's ability to correctly assess their own productivity. However, our previous experience suggested that such methods are unreliable when evaluating student projects. Other studies [27,28] have also reported that the number of hours logged is not a representative measure of productivity in academic projects. Therefore, the team productivity response variables were operationalized as the output of student work, defined by the amount and quality of the delivered functionality. Two of the functional suitability characteristics identified in the ISO 25010 standard [24] were employed: Functional completeness (in terms of the number of requirements expressed in use cases and delivered at the end of the semester) and Functional correctness (the ratio of use cases containing bugs to the total number of use cases tested). A similar approach to evaluating project performance, based on the efficiency of the team developing the project and the number of errors in the produced software, has been reported in [29].

3.3.3. Teamwork quality

The quality of teamwork can have a direct influence on the outcomes of a group project [30]. Effective teamwork is therefore considered an important success factor by the project management community [31]. Its use as an evaluative tool is especially justified in the context of student projects, which are intended to equip future graduates with soft competences, such as effective collaboration and communication skills. As highlighted in [32], team cohesion is highly correlated with project success and is critical for team effectiveness [33]. It was therefore used as a measure of the quality of teamwork and assessed using an adapted form of the Group Environment Questionnaire [32].

3.4. Hypothesis formulation

Based on the findings summarized in the related work section and applicable in the context of this experiment, with the results reported for the XP process in [7,8] serving as a proxy for other Agile methods and iterative development, three hypotheses were formulated:

1. The iterative approach would yield higher team productivity than the sequential approach (similar to [7]).
2. The sequential approach would produce software with higher usability compared to the iterative approach (similar to [8]).
3. Groups using the iterative approach would exhibit higher levels of team cohesion compared to the sequential approach (similar to [8]).

3.5. Participants and sampling

One of the most important sources of variation in empirical software engineering studies is the skill level of the subjects [34]. Different sampling methods were therefore applied, depending on the seniority of the participants in the experiment. All students were asked to complete a demographic questionnaire before the first class, which surveyed their background in technologies and skills relevant to the course, including knowledge of the associated tools and pertinent work experience. All questions were mandatory.

Bachelor's students. Data were collected at a French engineering school, among 58 first-year students on a Telecommunications program. The program comprises six semesters (from BAC+3 to BAC+5, equivalent to BSc and MSc), during which the students should master four main areas: signal processing, digital communications, networks, and computer science. Most of the students had graduated from scientific classes préparatoires (concentrating on mathematics, physics and chemistry, and including introductory classes in computer science). They therefore had a similar, beginner-level experience of coding.

Given the homogeneous background of the students (only 26% had worked previously with Web technologies), they were allocated to the three laboratory groups in alphabetical order and then randomly assigned to teams. This ensured a uniform distribution of the sample between the different development approaches. The technological knowledge of each team member was evaluated to ensure that there were no major differences in their combined skillsets.

Master's students. Data were collected at a leading engineering school in Poland, from among 46 students enrolled in a Master's in Computer Science. The majority of the participants had completed an undergraduate degree in the same subject. However, their experience and skills varied to some extent: 17% had graduated from non-related programs, while 43% had previous relevant work experience (more than 6 months in commercial software development activities). Groups of four to six students were created by the course instructors, who sought to ensure that the competencies were spread as equally as possible between the teams.


Table 3
Operationalization of constructs and instrumentation of dependent variables examined in the experiment (*these metrics were tracked for Bachelor's students only).

Product quality — internal
• Client-side code quality*: HTML errors and warnings detected while performing W3C validation
• Server-side code quality: Maintainability rating

Product quality — external
• Software usability: effort in use, measured as the number of clicks needed to perform user scenarios, as per the requirements specification; effort in use, measured as the distance traveled by the mouse to perform user scenarios, as per the requirements specification; efficiency in use, measured as the time needed to perform user scenarios, as per the requirements specification

Team productivity
• Functional completeness: functional completeness metric, measured as the degree of realization of the functional requirements of the project
• Functional correctness: functional correctness metric, measured as the precision of functionalities specified in the project requirements

Teamwork quality
• Team cohesion: team cohesion level, measured via the Group Environment Questionnaire

Comparisons were based on:

• the student profiles (previous degree, grade point average, work experience, familiarity with the technologies and tools),
• the outcomes of Belbin's team role inventory test,
• the personal preferences of each student (two priority roles were disclosed, applicable for the sequential laboratory group).

Nine teams were formed with similar combined technical skills.

3.6. Instrumentation and data collection

Demographic information on the subjects was collected using surveys disseminated via Google Forms. Templates of the project documentation were provided to students via Google Drive, to guide the design and testing activities. Detailed instructions and samples were included, to ensure proper understanding of what was expected and to facilitate the students' work. The project templates were independent of the treatment and shared by all teams in the experimental group.

Once the students had submitted their projects, they were assessed according to quality and team productivity. The internal quality aspects of the produced software were assessed using the following tools:

• the online W3C validator for the HTML,
• BetterCodeHub to assess the Maintainability rating of the server-side code.

Similarly to the method used in a study by Orehovacki et al. [25], external quality data were collected using Mousotron software [35] for a preselected number of use cases common across all groups. The same researcher performed the measurements on a single machine, and duplicated the tests to ensure the repeatability of the results.

Team productivity metrics were evaluated in the usual way the instructors evaluated student work. At the end of the semester, the instructors verified that the functionality fulfilled the requirements and assessed its quality. Determining the functional completeness of a project is straightforward: evaluators perform selected use cases, clicking through the application and recording the results. This binary result is complemented by a functional correctness metric, based on the instructor's judgment, which identifies any functional issues and places them on a scale from 0 to 1 (if analogous types of problem occur in different groups, the same score is assigned). Needless to say, this method of ranking bug severity is imperfect, but it was considered sufficiently reliable for the purposes of our study.

To measure team cohesion, Carron et al. [36] developed the Group Environment Questionnaire, which has proved to be a successful assessment tool for sports teams. Although it requires some adjustments for computer science-related projects, it is still applicable, as confirmed by a study at Shippensburg University [8]. The following subset of adapted questions was used to evaluate team cohesion:

• Our team is united in trying to reach the goals of the course.
• Our team members have a common vision for the project's future.
• Our team would like to spend time together once the course is over.
• Members of our team socialize outside course-related activities.
• I am happy with my team's level of desire to succeed.
• My team gives me enough opportunities to demonstrate my abilities and skills.
• I enjoy being a part of the social activities of this team.
• For me, this team is one of the most important social groups to which I belong.

The students were asked to indicate their level of agreement with the above statements on a 4-point Likert scale, to probe their perceptions of the teamwork and their confidence in the project. The survey was repeated systematically during the semester, in order to obtain a broader overview of the metric and potentially map the results to conflicts or periods when the teams were struggling to meet course objectives. Tracking team cohesion every two to three weeks ensured timely and useful feedback, while not burdening the participants with excessive form-filling. Each time they were asked to complete the questionnaire, the students were reminded that team cohesion was not included in the grading scheme, so they were not encouraged to self-monitor their responses to create a favorable impression.
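As an illustration of how responses of this kind can be aggregated, the sketch below collapses the 4-point Likert answers into positive/negative perception percentages of the form reported later in Table 8. The encoding (1–2 negative, 3–4 positive) is our assumption for illustration; the paper does not spell out the exact aggregation rule.

    # Aggregate 4-point Likert cohesion responses into positive/negative
    # perception percentages (cf. Table 8). The 1-2 = negative, 3-4 = positive
    # encoding is an illustrative assumption.
    from typing import Iterable

    def cohesion_perception(responses: Iterable[int]) -> tuple[float, float]:
        """Return (positive %, negative %) over all item responses."""
        answers = list(responses)
        if any(a not in (1, 2, 3, 4) for a in answers):
            raise ValueError("Responses must be on the 4-point scale 1..4")
        positive = sum(1 for a in answers if a >= 3)
        pct_positive = 100.0 * positive / len(answers)
        return round(pct_positive, 2), round(100.0 - pct_positive, 2)

    # Example: answers from every team member to the eight cohesion items,
    # pooled across one survey round (fictional values).
    round_1 = [4, 3, 3, 2, 4, 4, 3, 1, 4, 3, 3, 3, 2, 4, 4, 3]
    print(cohesion_perception(round_1))  # (81.25, 18.75)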
We also conducted a voluntary post-experiment survey, in which we measured the subjects' perceptions of the prescribed development method. The end-of-term questionnaire (the same for both sets of subjects) was composed of two parts:

1. Mandatory closed questions regarding the students' satisfaction with the process, answered on a 7-point Likert scale.
2. Optional open-ended questions regarding what was good, what could be improved and pain points.

The closed questions were a subset of the questionnaire used in [15], as some of the original queries were deemed redundant. Each question probed one of the following theoretically motivated constructs [37–40]:


1. Satisfaction:

• The development approach applied was a useful tool to manage the project.
• Overall, I am satisfied with the development approach used in the course.
• I believe future offerings of this course should continue to use this development approach.

2. Relative Advantage:

• Using this development approach had a positive impact on my effectiveness.
• Using this development approach improved the quality of my work.
• Overall, I find using this development approach to be advantageous.

3. Compatibility:

• I think that using this development approach fits well with the way I like to work.
• Using this development approach is compatible with the way I like to complete projects.

4. Ease-of-use:

• Learning this development approach was easy for me.
• Overall, I believe that this development approach is easy to use.
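A facet score of the kind reported later in Table 9 is simply the mean of the 7-point Likert answers to the questions probing that construct, pooled over respondents. The sketch below illustrates one way to compute it; the question-to-construct mapping mirrors the list above, while the item names and data layout are our own assumptions.

    # Average 7-point Likert answers per construct (cf. Table 9).
    # Illustrative sketch; item names and data layout are assumed.
    from statistics import mean

    CONSTRUCTS = {
        "satisfaction": ["useful_tool", "satisfied_overall", "keep_approach"],
        "relative_advantage": ["effectiveness", "work_quality", "advantageous"],
        "compatibility": ["fits_work_style", "fits_project_style"],
        "ease_of_use": ["easy_to_learn", "easy_to_use"],
    }

    def construct_scores(respondents: list[dict[str, int]]) -> dict[str, float]:
        """Mean score (1..7) per construct, pooled over respondents and items."""
        scores = {}
        for construct, items in CONSTRUCTS.items():
            answers = [r[item] for r in respondents for item in items]
            scores[construct] = round(mean(answers), 2)
        return scores

    # Example with two fictional respondents.
    survey = [
        {"useful_tool": 6, "satisfied_overall": 5, "keep_approach": 5,
         "effectiveness": 5, "work_quality": 4, "advantageous": 5,
         "fits_work_style": 5, "fits_project_style": 6,
         "easy_to_learn": 4, "easy_to_use": 5},
        {"useful_tool": 5, "satisfied_overall": 6, "keep_approach": 6,
         "effectiveness": 4, "work_quality": 5, "advantageous": 4,
         "fits_work_style": 6, "fits_project_style": 5,
         "easy_to_learn": 5, "easy_to_use": 4},
    ]
    print(construct_scores(survey))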
work on them at any time during the semester. Although 100% of
3.7. Execution

At the start of the course, the students were informed of the experiment and it was explained that the final grades would not be linked to the applied development approach. The class schedule and activities were presented, and each group was given a brief description of the method of work assigned to them. All teams in the study were evaluated according to the same scheme, which was fully transparent and communicated upfront. Grades were given for the developed product (assessed in terms of its internal and external quality, as well as team productivity, see Section 3.6), the associated documentation (calibrated to be the same for all groups), and the timely delivery of artifacts.

The students were given guidance by the course supervisors on the project management methods throughout the semester. A framework for the planning meetings in the sequential approach was provided (based on the three-point estimation model). Stand-up meetings, sprint planning meetings and retrospectives were supervised, and the students received coaching from the supervisor. The concept of Maintainability and the supporting tool were presented during the second follow-up class (Table 4).

4. Results

Evaluation along multiple axes exposed a relationship between the software development approach and the project outcomes. In this section, we present the descriptive statistics and plots for the response variables. The results are regrouped into three dimensions of success: project quality, team productivity and teamwork quality.

The control group performed best in terms of the quality of the source code, in both experimental groups (Master's and Bachelor's) and for both metrics used (Table 5). The results for client-side code (Fig. 1) show that the teams that did not follow any development process produced HTML code of generally higher quality. Based on observations during the classes, this result can be explained by the team organization. In Gcontrol, most of the work was divided between the team members based on the technology used, with sub-teams working on either the HTML or the PHP code; in Gagile and Gwaterfall, by contrast, the teams were multidisciplinary and in most cases team members contributed to both code bases. Moreover, the control group scored the highest in the Maintainability ranking, in both parts of the study. This may be explained by the fact that the absence of ceremonies and scheduled communication left more time for actual development and quality assurance. Cockburn observes [41] that a relatively small increase in methodology size or density adds a relatively large amount to the project cost, and that a balance must be found between the size of a problem, the number of people solving it and the demands of the methodology. That said, the freedom left to the control group contributed to only a marginal increase in this code-related metric, especially when compared to the sequential approach. Therefore, we cannot conclude with certainty that not following any development approach contributes positively to internal quality.

Finding 1: Among the Bachelor's students, the control group scored considerably higher on the HTML quality index. Nevertheless, according to the instructors' observations, this result was due to the team structure (dedicated sub-teams worked on the HTML or PHP code bases) rather than to the "hands-off" approach. In both parts of the experiment, the control group also did best in terms of the Maintainability ranking. However, the differences between the approaches were minor.

External quality analysis showed that the sequential approach (Gwaterfall) resulted in applications that were noticeably more user-friendly (Table 6). Although the three groups had to submit the same deliverables, including wireframes of the application screens, only the Waterfall-like teams had to conceive them upfront. The other groups (Gagile and Gcontrol) had to submit the same artifacts, but could work on them at any time during the semester. Although 100% of the students claimed to have prepared wireframes before or during implementation, this did not result in a coherent application design that would positively impact the usability of the application (Fig. 2). Interestingly, the feedback provided on a regular basis to the Gagile group did not seem to improve the overall usability of the applications. Although most of the suggestions made during sprint reviews were incorporated, it was only during end-to-end testing that shortcomings were revealed. This can be explained by the fact that the groups worked on their user interfaces incrementally, without having defined an overall vision of the application beforehand, as Gwaterfall did.

Finding 2: Both data sets revealed that the sequential approach, with its upfront design practice, had a positive impact on the external quality of the software produced.

Assessment of the project output confirmed our hypotheses that the control group would deliver the least functionality (lowest Functional completeness score in both the Bachelor's and Master's teams) and that the iterative teams would be the most efficient, owing to regular status checkpoints [42] and imposed intermediary application demos (Table 7). However, it appears that the increased delivery frequency impacted the quality of the artifacts. In both experimental groups, the Gagile teams received the lowest scores for Functional correctness, owing to the number of anomalies. Moreover, the groups following the iterative approach received the lowest scores for the internal quality aspect of the projects (Fig. 3).

The larger gaps in the Functional completeness metric among the Bachelor's teams indicate that the development process had a more significant impact on the shorter projects. This may be explained by the fact that projects undertaken over the course of a semester are long enough to implement the required functionality (100% of the requirements were covered by Gagile and Gwaterfall), unless the work is started too late due to a lack of intermediary deadlines (as was the case with the control group, which completed 85.71% of the requirements).

Finding 3: In both experimental groups, the teams following an iterative approach scored higher in terms of the number of functionalities delivered. However, their solutions contained more bugs than those of the sequential and control groups.


Table 4
Subset of product-related metrics used in the study.

Functional correctness: the number of functions suitable for performing the specified tasks, compared to the number of evaluated functions.
Formula: X = 1 − A/B, where A is the number of functions in which problems were detected in the evaluation and B is the number of evaluated functions.
Interpretation: 0 <= X <= 100%; the closer to 100%, the more adequate the solution.

Functional completeness: functional (black-box) tests of the system according to the requirements specification. The number of missing functions is compared with the number of functions described in the requirements specification.
Formula: X = 1 − A/B, where A is the number of missing functions detected and B is the number of functions described in the specification.
Interpretation: 0 <= X <= 100%; the closer to 100%, the better.

Effort in use — mouse clicks: evaluates the static usability of an application in terms of the number of mouse clicks and mouse wheel scrolls. The mouse clicks denote the sum of left, right and middle mouse clicks; mouse wheel scrolls refer to the number of scrolls made by the user while reaching the assignment solution.
Formula: X = A + B, where A is the number of left, right and middle mouse clicks and B is the number of mouse wheel scrolls.
Interpretation: 0 <= X < ∞; the closer to 0, the better.

Effort in use — distance: evaluates the static usability of an application in terms of the movement span while executing a given use case. The distance refers to the number of millimeters traveled while moving the mouse between the starting and end points.
Formula: X = A, where A is the distance traveled in millimeters.
Interpretation: 0 <= X < ∞; the closer to 0, the better.

Efficiency in use: evaluates how long it takes to complete a given task.
Formula: X = A, where A is the time elapsed in milliseconds.
Interpretation: 0 <= X < ∞; the closer to 0, the better.
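Read together, the formulas in Table 4 reduce to simple ratios and counts. The following sketch computes them from raw counts; the function names and example values are ours, for illustration only, and are not data from the study.

    # Compute the Table 4 metrics from raw counts. Function names and the
    # example values are illustrative, not data from the study.

    def functional_correctness(problem_functions: int, evaluated: int) -> float:
        """X = 1 - A/B, as a percentage (100 = no problems detected)."""
        return 100.0 * (1 - problem_functions / evaluated)

    def functional_completeness(missing: int, specified: int) -> float:
        """X = 1 - A/B, as a percentage (100 = all specified functions present)."""
        return 100.0 * (1 - missing / specified)

    def effort_in_use_clicks(clicks: int, wheel_scrolls: int) -> int:
        """X = A + B: total mouse clicks plus wheel scrolls for a use case."""
        return clicks + wheel_scrolls

    # Example: a team delivered 12 of 14 specified use cases, and problems
    # were detected in 2 of the 12 evaluated ones.
    print(functional_completeness(missing=2, specified=14))           # 85.71...
    print(functional_correctness(problem_functions=2, evaluated=12))  # 83.33...
    print(effort_in_use_clicks(clicks=5, wheel_scrolls=2))            # 7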

Table 5
Results for internal project quality associated with different development methods; the best score for a given metric is marked (best). Bounds are given for the 95% confidence interval of the mean.

Maintainability Rating — Bachelor's students: Iterative 44.00 (SE 5.10, CI [29.84, 58.16]); Sequential 45.00 (SE 8.66, CI [17.44, 72.56]); Control 50.00 (SE 4.08, CI [37.01, 62.99]) (best)
Maintainability Rating — Master's students: Iterative 7.67 (SE 0.67, CI [4.80, 10.54]); Sequential 8.75 (SE 0.25, CI [7.95, 9.55]); Control 9.00 (SE 0.00, CI [9.00, 9.00]) (best)
HTML errors — Bachelor's students: Iterative 40.40 (SE 14.79, CI [8.68, 72.12]); Sequential 45.00 (SE 8.66, CI [17.44, 72.56]); Control 6.00 (SE 2.82, CI [−0.50, 12.50]) (best)
HTML warnings — Bachelor's students: Iterative 5.40 (SE 2.00, CI [1.11, 9.69]); Sequential 9.92 (SE 4.29, CI [0.48, 19.36]); Control 1.78 (SE 0.80, CI [−0.06, 3.61]) (best)
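The confidence bounds in Tables 5–7 are consistent with the usual t-based interval for a small sample, CI = mean ± t(0.975, n−1) × SE; this construction is our reading of the tables, not a procedure stated by the authors. A quick numerical check:

    # Recompute a 95% t-based confidence interval from a mean and standard
    # error, assuming CI = mean +/- t(0.975, n-1) * SE (our assumption).
    from scipy.stats import t

    def ci_95(mean: float, se: float, n_teams: int) -> tuple[float, float]:
        half_width = t.ppf(0.975, df=n_teams - 1) * se
        return round(mean - half_width, 2), round(mean + half_width, 2)

    # Iterative Bachelor's maintainability row: reproduces (29.84, 58.16)
    # when n = 5 teams is assumed.
    print(ci_95(mean=44.00, se=5.10, n_teams=5))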

Fig. 1. Mean values of HTML errors (left) and HTML warnings (right) – measures of internal product quality among different laboratory groups of Bachelor's students.

Finally, the iterative teams showed the highest levels of team cohesion throughout the semester (positive perceptions of collaboration were around 10% higher in both study groups). This finding is in line with previous reports regarding the benefits of Agile methods (Table 8) [3,43].

Finding 4: The iterative teams in both experimental groups demonstrated higher quality of teamwork (∼10%) compared to the other groups.

4.1. Students' perspective

The perceptions of the students concerning the prescribed methods are aggregated in Table 9. The response rates were 96.6% and 74.0% for the Bachelor's and Master's students, respectively.


Fig. 2. Mean values of usability metrics used to assess the external quality of the product.

Table 6
External project quality results associated with different development methods; the best score for a given metric is marked (best). Bounds are given for the 95% confidence interval of the mean.

Efficiency in use — Bachelor's students: Iterative 17.84 (SE 2.38, CI [11.22, 24.46]); Sequential 12.35 (SE 4.24, CI [−1.15, 25.85]) (best); Control 17.73 (SE 1.87, CI [9.70, 25.76])
Efficiency in use — Master's students: Iterative 36.76 (SE 2.65, CI [25.34, 48.18]); Sequential 25.78 (SE 1.04, CI [22.48, 29.07]) (best); Control 33.50 (SE 4.63, CI [−25.27, 92.27])
Effort in use (mouse clicks) — Bachelor's students: Iterative 6.60 (SE 0.67, CI [4.73, 8.47]); Sequential 3.70 (SE 1.04, CI [0.39, 7.01]) (best); Control 5.67 (SE 0.93, CI [1.65, 9.68])
Effort in use (mouse clicks) — Master's students: Iterative 26.17 (SE 6.41, CI [−1.39, 53.73]); Sequential 12.47 (SE 0.20, CI [11.84, 13.11]) (best); Control 29.06 (SE 7.44, CI [−65.44, 123.56])
Effort in use (distance) — Bachelor's students: Iterative 43.64 (SE 9.01, CI [18.63, 68.65]); Sequential 29.25 (SE 7.27, CI [6.12, 52.38]) (best); Control 40.53 (SE 6.63, CI [11.99, 69.08])
Effort in use (distance) — Master's students: Iterative 264.31 (SE 51.01, CI [44.85, 483.77]); Sequential 196.06 (SE 8.82, CI [167.99, 224.12]); Control 182.31 (SE 45.69, CI [−398.20, 762.83]) (best)

The average scores for the questions (listed in Section 3.6) probing the same facet are provided and should be interpreted as follows:

• A mean of 5 or above suggests that, on average, students at least "Somewhat Agree" with the statement.
• A mean of 4 is a neutral response ("Neither Agree nor Disagree").
• A mean of 3 or below suggests that, on average, students' perceptions range from "Strongly Disagree" to "Somewhat Disagree".

Among the Bachelor's students, the iterative approach was given the highest score for perceived satisfaction and relative advantage, whereas the sequential approach received the lowest score for all four evaluated facets. The iterative approach, with its regular follow-ups and feedback loop, was also praised in the open-ended questions. The only possible area of improvement that was suggested concerned the allocation of the Scrum Master role: instead of this role being assigned on a voluntary basis, it was proposed that a student with appropriate soft skills should be appointed.


Fig. 3. Mean values of metrics used as a proxy for team productivity.

Table 7
Results for team productivity associated with different development methods; the best score for a given metric is marked (best). Bounds are given for the 95% confidence interval of the mean.

Functional completeness — Bachelor's students: Iterative 60.00 (SE 5.75, CI [44.02, 75.98]) (best); Sequential 46.43 (SE 9.79, CI [15.26, 77.59]); Control 42.86 (SE 14.81, CI [−4.26, 89.97])
Functional completeness — Master's students: Iterative 100 (SE 0, CI [100, 100]) (best); Sequential 100 (SE 0, CI [100, 100]) (best); Control 85.71 (SE 0, CI [85.71, 85.71])
Functional correctness — Bachelor's students: Iterative 80.53 (SE 5.52, CI [65.21, 95.86]); Sequential 89.76 (SE 3.26, CI [79.40, 100.12]); Control 92.86 (SE 2.94, CI [83.49, 102.22]) (best)
Functional correctness — Master's students: Iterative 83.61 (SE 3.41, CI [68.92, 98.30]); Sequential 96.25 (SE 2.17, CI [89.36, 103.14]) (best); Control 76.67 (SE 1.67, CI [55.49, 97.84])

Table 8
Results for team cohesion associated with different development methods; the best score in a given data set is marked (best).

Positive perception — Bachelor's students: Iterative 93.35 (best); Sequential 80.27; Control 77.15
Negative perception — Bachelor's students: Iterative 6.65; Sequential 19.73; Control 22.85
Positive perception — Master's students: Iterative 78.84 (best); Sequential 67.64; Control 69.68
Negative perception — Master's students: Iterative 21.16; Sequential 32.36; Control 30.32

Table 9
Average scores of students' perceptions of the approach applied; the best score in a given data set is marked (best) and the lowest is marked (lowest).

Bachelor's students — Iterative: Satisfaction 5.23 (best), Relative advantage 4.88 (best), Compatibility 5.11 (best), Ease-of-use 4.55
Bachelor's students — Sequential: Satisfaction 3.38 (lowest), Relative advantage 3.46 (lowest), Compatibility 3.58 (lowest), Ease-of-use 3.30 (lowest)
Bachelor's students — Control: Satisfaction 4.56, Relative advantage 4.48, Compatibility 5.05, Ease-of-use 5.26 (best)
Master's students — Iterative: Satisfaction 4.45, Relative advantage 4.46, Compatibility 4.58, Ease-of-use 5.63
Master's students — Sequential: Satisfaction 5.21 (best), Relative advantage 5.32 (best), Compatibility 5.22 (best), Ease-of-use 5.62 (lowest)
Master's students — Control: Satisfaction 3.85 (lowest), Relative advantage 4.14 (lowest), Compatibility 3.99 (lowest), Ease-of-use 5.71 (best)

Some students following the sequential approach underlined that they had been unable to conceive a technical design for the solution upfront, due to insufficient familiarity with the technologies involved. On the other hand, the team that succeeded in the exercise explicitly expressed their satisfaction with the upfront design stage, which allowed them to foster a common vision of the project.

The control group gave the method of work relatively high scores for perceived satisfaction and relative advantage (average of 4.52). Even better scores were given for the compatibility and ease-of-use facets (above 5.0), which can be understood as students leaning naturally towards a "hands-off" approach that does not require a learning curve. In the open-ended questions, the students in the control group also expressed their appreciation of being allowed to define their own way of working. Nonetheless, when asked about possible improvements, a few indicated the need for more guidance and control. One of the students shared that he wished a structured development approach had been imposed.

Among the Master's students, all the approaches received consistent scores in terms of perceived satisfaction, relative advantage and compatibility. The sequential group scored highest (average of 5.25), followed by the iterative group (4.48), with the control group placed last (3.97). Despite the overall satisfaction with the approach, many of the students working in a sequential manner pointed out that the workload was not equal among the team members. Despite the presence of a Project Manager responsible for overseeing progress and the distribution of tasks, students remarked that "more control over workload distribution is needed" and that "peers end up doing work of others who did not complete tasks on time". The Agile way of working was perceived positively, and its application was seen as advantageous given that it is widely adopted in the commercial setting. Many students working in the iterative manner appreciated the demos and feedback provided as part of the Sprint review. Comments in this area include: "Regular checkpoints with supervisors contributed to an even distribution of workload during the semester instead of doing things last minute" and "Verifying every couple of weeks that requirements were implemented as planned motivated the team to deliver on time". Nonetheless, the ceremonies associated with the approach (Sprint planning, stand-up meetings) were perceived as a burden by some of the students and distracted them from their coding activities.


were discovered in our software during the Jigsaw exercise’’; ‘‘It was clear during the Jigsaw exercise that the projects had not been tested properly—almost every application contained issues’’.

The control group students appreciated the freedom given to them, which also allowed them to balance the workload with other courses. One of the students suggested that we should provide students with a further degree of liberty by making attendance to follow-up classes non-compulsory.

Regardless of the approach they had followed, students expressed a wish that they had been allowed to form teams themselves, rather than having this aspect decided for them. The opinion was also expressed that the Team Impact Questionnaire took too much time to identify under-performers.

Finally, based on the collective open feedback, it appears that introducing a software development process increases the perceived inventiveness of the course. Some of the control group students suggested that the course did not offer anything new, whereas several students who had followed the other two approaches mentioned that the coursework had been organized in an original and well thought-out way.

When looking at the questionnaire results across both data sets, a much wider spread can be observed in the Bachelor's group than among the Master's students when it comes to the perceived ease-of-use of the methods. The range of minimum and maximum values for the junior students was 2.29, while for senior students it was only 0.31. It seems likely that students who have no prior experience working as part of a team have more trouble learning how to apply a structured way of working than those who have attended multiple project-based courses and have some working experience in a professional setting.

It is difficult, if not impossible, to establish a link between the results of the experiment, measured in terms of the dependent variables, and students' perceptions. While intuitively team productivity or teamwork quality metrics could be representative of the students' satisfaction, incoherent results in the two data sets prevent us from drawing any firm conclusions.

5. Threats to validity

As with all empirical research, this study is subject to different types of threats, which may be described according to the classification suggested by Wohlin et al. [44]. The comprehensive, metrics-based approach used in the study in principle limits bias and uncertainty, and therefore ensures relatively high internal validity. Furthermore, the random assignment of students to the groups is an adequate way of distributing the study sample in a controlled experiment. Nevertheless, there is a risk that success in the project may have been influenced by uncontrolled factors, other than the development approach used, such as the workload from other elective courses or varying student affinity. These aspects, inherent to a university setting, were not accounted for in our study. However, we have no evidence that there were significant differences between the groups in these regards, and therefore we do not consider them to constitute an internal threat.

Turning to external validity threats, it is possible that comparable results would be obtained by running the same courses in consecutive years at both universities. The generalizability of the experiment to other courses in Web programming is limited. The impact of a given development approach on students working in larger groups or over the course of two semesters would vary significantly. It is also difficult to generalize the results to courses in other domains of Information Technology.

Construct validity includes one major threat concerning usability metrics. Although the metrics were chosen based on their objectivity and ease of application, they might not fully cover the notion of an astute UX design. For instance, the ‘‘effort in use’’ metric assesses the application in terms, among others, of the distance traveled by the mouse. This reflects the general preference to achieve results with minimal effort (mouse moves, scrolls). However, it does not account for a situation in which a well-designed interface separates elements widely to minimize confusion or manual error.
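To make this limitation concrete, the following minimal sketch (our illustration, not the instrument used in the experiment) shows how a distance-based ‘‘effort in use’’ score can be derived from cursor positions sampled by a tracking tool such as Mousotron [35]; the function name and coordinates are hypothetical.

    import math

    def mouse_travel(samples):
        """Cumulative Euclidean distance, in pixels, of a sampled cursor path."""
        return sum(math.dist(a, b) for a, b in zip(samples, samples[1:]))

    # Hypothetical path recorded while a user completes one use-case scenario.
    path = [(0, 0), (30, 40), (30, 140), (200, 140)]
    print(mouse_travel(path))  # 50.0 + 100.0 + 170.0 = 320.0

A lower total suggests less physical effort, but, as noted above, such a score may unfairly penalize a layout that deliberately separates elements to prevent manual error.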
Regarding the conclusion validity, some of the tools used in the experiment suffer from a degree of subjectivity. The functional correctness metric mirrors the evaluator's assessment of the severity of the detected anomalies, whereas team cohesion questionnaires rely entirely on students' perceptions of collaboration on a given day, which may be impacted by team conflicts and other factors which are not related to the process.

It is important to note that the findings might have differed if the assignment had not been characterized by stable requirements—a typical assumption for projects in academia. Boehm [45] stresses that Waterfall-like approaches work best when requirement specifications are frozen early in the project and may have problems keeping up with rapidly changing stakeholders' needs. Unhelkar [46] suggests that the projects that benefit the most from the Agile way of working are ‘‘greenfield’’ development projects that are relatively small in scope, comprising five team members and lasting for about two months. Therefore, the experimental set-up was partly characteristic of both agile and plan-driven approaches, making it difficult to predict the impact of changing requirements. The iterative approach can be expected to tackle them relatively well, as frequent deadlines reduce the variance of a software process and so possibly increase its predictability and efficiency [47]. Nevertheless, sequential methods could also succeed in delivering high-quality software on time if the architecture anticipates and accommodates changes to the requirements [45]. While producing a scalable architecture is a non-trivial task, more senior students could probably accomplish it given appropriate guidance.

Reliability of treatment was addressed when designing the experiment, to minimize the risk of unintentional bias. For instance, with regard to software testing, all groups were aware of the set of use cases that would be evaluated. Furthermore, the verification phase of the Waterfall-like approach was not formalized by the instructors. The students were free to test their projects as they chose. Those working in an iterative manner were invited to incorporate test criteria into their Definition of Done and the control group was reminded on multiple occasions that the number of bugs detected in the delivered solution would impact their final grade. Although conforming to the development process was not an explicit part of the grading scheme, the reliability of treatment implementation was addressed using a penalties system. If a deadline for an artifact was not respected, it impacted the maximum grade (−0.5 of a grade) of the team. This had a limited effect, as only the artifacts that were common to all groups fell into this category and observance of process-specific requirements (such as voice recordings from stand-up meetings) did not influence the final grade.

Respecting the conventions of a given approach (meetings, artifacts, etc.) was to some extent left to the students' discretion, which often was a cause of concern for the instructors. Especially in the Bachelor's groups, the participants were not used to rigorous teamwork and none of the sequential groups had their solution functioning by the validation phase, making testing difficult or impossible.

Finally, the effectiveness of some of the process-related activities was limited because the students lacked experience of identifying, breaking down and estimating the work necessary to complete the assignment.

6. Discussion

Although previous studies have considered the application of different software development approaches to student coursework, they either do not provide quantitative data for comparison [6], or lack a firm research setup permitting conclusions to be drawn for the academic setting [7]. In our study, we investigated how the choice of a sequential or iterative development method impacts the success of student team computing projects, with a ‘‘hands-off’’ no-process
approach serving as the control. Three axes of evaluation provided quantitative data to answer the research questions.

RQ1: The first research question asked whether following a structured development process would provide added value compared to the self-organizing teams. Both the sequential and iterative approaches were indeed found to have had a positive impact on at least one of the evaluated success dimensions.

Following the sequential method led to a considerable improvement in the external quality of the solutions, across all three of the assessed metrics. Asking the students to conceive the entire user interface upfront facilitated the development of a coherent design, which was more user-friendly than those developed by the other groups. This conclusion supports the observation by Wellington et al. [8] that plan-driven student development produces software with higher usability. Missiroli et al. [11] reported a similar pattern for other non-functional characteristics. In contrast, the teams working in an iterative manner received the lowest scores for almost all of the collected data points used to define the external quality facet in our study. This may have been because the incremental approach to design and development does not encourage the formulation of an initial coherent vision of the final product. Nonetheless, the iterative approach increased productivity and resulted in higher team cohesion compared to the other groups. This is not surprising, as the pressure to demonstrate a working piece of software on a regular basis obliges students to complete certain functionalities earlier in the semester, which in turn increases the chances of developing a fully operational project. Umphress et al. [6] also found that XP teams were significantly more productive in terms of LOC per Project Month than teams following any of the other processes considered in their study (Section 2). Agile methods are known from commercial software development to encourage frequent communication, self-organization and joint accountability, all of which help to foster interpersonal connections and strengthen team cohesion [3]. It is plausible that this is also the case for student teams. In the study by Wellington et al. [8], the teams following XP demonstrated higher overall team cohesion than those working according to a plan-driven method.

Despite the benefits of the iterative approach, the final products of these teams showed lower functional correctness (i.e. there were more bugs in the produced software). Their scores for internal quality metrics were also inferior to those awarded to the other teams, although not significantly. It seems that the iterative teams focused on the goal of producing the required software on time, but somehow neglected the issue of quality. These results stand opposed to studies reporting the performance of agile teams in commercial development [42] and could be explained by the differences between the commercial and university settings. Firstly, receiving verbal approval of functionality from an instructor during a basic use-case scenario may give the message that whatever was implemented meets the required standards. In the professional setting, demos are executed in a similar fashion. However, there is usually a formalized quality assurance process, the purpose of which is to minimize the number of functional deficiencies in the software before delivery. Although the students were asked to define and respect a certain Definition of Done, the extent to which it was applied was not monitored.

Finally, the teams that did not follow any prescribed process received the highest scores for the internal quality of their software — albeit only slightly higher than those received by other teams. In the case of the Bachelor's students, this success was attributed to the team structure that naturally emerged: each student focused on a single programming language rather than contributing to different layers of the application. It has been reported in the literature that letting students choose their own way of working, team structure and tools can lead to higher quality software—particularly in the context of studio-based assignments [48,49]. Based on informal talks with groups and instructors, the students (Bachelor's and Master's) from the control groups did not proactively follow a structured development process nor employ the practices that characterized the other two groups (e.g. sprint planning, stand-up meetings, a dedicated testing phase). Therefore, we believe that freeing students from project-management tasks allowed them to allocate more time to programming and contributed to the high quality of the source code.

RQ2: The secondary aim of the study was to investigate whether following a given implementation approach would yield different results from Bachelor's and Master's students. Similar results (described above) were observed for the undergraduates and post-graduates. The best and worst performers in all dimensions of success in the two data sets corresponded in the overwhelming majority of cases (visually represented by bold numbers in Tables 5–8). The impact of the development methods was more visible in the case of the post-graduate students, with a larger spread for several metrics.

Implications. The results of this study provide a basis for researchers and the teaching community to design new, or adapt existing, software development processes for their students' project work. To address the shortcomings detected in this study, the following guidelines should be incorporated by educators introducing a structured process in the classroom:

1. Inclusion of a dedicated design phase is advised, evaluated in terms of the quality of the target system. In our study, the usability aspect of Web solutions was explored by requiring a set of wireframes prior to implementation and by the evaluation of multiple GUI-related metrics. However, other courses could focus on many of the quality attributes specified by the ISO 25010 standard [24]; an illustrative mapping is sketched after this list. For example, performance efficiency is relevant for CPU programming or Cloud-based applications, where proper architecture and use of design patterns can positively influence both time behavior and resource utilization, reducing costs. Network programming assignments could focus on the reliability feature, with its availability and recoverability sub-characteristics.

2. Whenever iterative development is used in a course, a formal process of quality assurance should be implemented to ensure functional correctness. This can be achieved either by introducing a dedicated testing phase before submission or by administering tests as part of the Definition of Done, which could be enforced by requiring the production of an artifact that proves a given functionality has been tested. An adapted form of a Requirements Traceability Matrix could be used, linking a User Story to a test case and evaluating the outcome of its execution in terms of the number of anomalies detected (a minimal example of such a matrix follows this list). Upon the completion of an iteration, students could update the matrix and provide it to the course instructor.

3. For all structured team project work, frequent communication among team members should be established. This can be enforced by introducing team meetings in between classes (in person or remotely), structuring their execution and requiring a written summary as the outcome. A template, to be completed with information relevant to each team member, could mesh the standard form used for stand-up meetings (what was done, work planned, and issues encountered) with elements from traditional project management, such as risk identification, and give a global overview of the assignment; one possible template is shown after this list. While answering the questions raised by the Agile practice is intuitive and straightforward, identifying risks is less so for inexperienced students, and should be supported with examples. Ultimately, the team members need to identify any potential obstacles, such as high workload from different courses and deadline conflicts, which could be resolved at the project level (by shifting part of the workload to others or removing task dependencies), or unfamiliarity with certain technologies or tools (requiring upskilling activities). The added value of this guided communication channel is twofold: it provides opportunities for knowledge and status sharing among team members, and it gives the instructor insight into their work. This can facilitate progress tracking and serve as an input during project evaluation, to differentiate grades depending on effort.
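As a companion to the first guideline, the sketch below maps example course domains to ISO 25010 [24] characteristics and candidate metrics against which a design phase could be evaluated; the pairings follow the examples above, while the metric names are our own suggestions rather than requirements of the standard.

    # Illustrative mapping (assumed names) from course domain to an ISO 25010
    # quality characteristic and example metrics for grading the design phase.
    DESIGN_FOCUS = {
        "Web programming": ("usability", ["mouse travel distance", "task completion time"]),
        "CPU/Cloud programming": ("performance efficiency", ["response time", "resource utilization"]),
        "Network programming": ("reliability", ["availability", "time to recover from failure"]),
    }

    for domain, (characteristic, metrics) in DESIGN_FOCUS.items():
        print(f"{domain}: assess {characteristic} via {', '.join(metrics)}")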

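For the second guideline, a minimal sketch of the adapted Requirements Traceability Matrix is given below; the field names and entries are hypothetical, and a shared spreadsheet would serve the same purpose.

    # Each row links a User Story to a test case and records the anomalies
    # found at its last execution; None marks a test that has not been run.
    rtm = [
        {"story": "US-01 Register account", "test": "TC-01", "anomalies": 0},
        {"story": "US-02 Search catalog", "test": "TC-02", "anomalies": 2},
        {"story": "US-03 Export report", "test": "TC-03", "anomalies": None},
    ]

    # End-of-iteration summary a team could hand to the course instructor.
    open_anomalies = sum(row["anomalies"] or 0 for row in rtm)
    untested = [row["story"] for row in rtm if row["anomalies"] is None]
    print(f"Open anomalies: {open_anomalies}; not yet tested: {untested}")

Requiring the matrix as the artifact that ‘‘proves a given functionality has been tested’’ keeps the check lightweight while still making gaps in coverage visible.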
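For the third guideline, one possible shape of the written meeting summary is sketched below; the layout and prompts are ours, meshing the three stand-up questions with a simple risk section, and can be adapted freely.

    Team / project: ...          Week no.: ...          Completed by: ...
    1. Work completed since the last meeting (per member): ...
    2. Work planned until the next meeting (per member): ...
    3. Issues encountered and help needed: ...
    4. Risks to the plan (e.g. deadline conflicts with other courses,
       unfamiliar technology requiring upskilling): ...
    5. Mitigation agreed at the project level (e.g. shifting workload,
       removing task dependencies): ...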

The application of software development processes in student projects can have a positive impact and help build skills sought after by the industry. As reported in the past [3], aside from improving team cohesion, regular and formalized communication (guideline 3) can help resolve challenges in this area, stemming from busy schedules, planning issues and a lack of experience and training. A recent study by Garousi et al. [1] on the discrepancies between the skillsets of graduates and employment needs explicitly lists software design (guideline 1) and testing (guideline 2) as being among the most important for Computer Science graduates. Providing participants with well-defined hands-on experience, such as building an entire operational software project involving design, implementation, testing and management activities, could be a way of bridging that gap. Efforts to guide such undertakings in a structured way, adapted to the university setting, could prove particularly beneficial as part of a capstone course or assignment in any domain-specific course.

7. Conclusions

In this study, we investigated how the choice of a sequential or iterative development method impacts the success of student computing team projects. A ‘‘hands-off’’ no-process approach was used as the control. The experiments were conducted on groups of students at two levels of studies, Master's and Bachelor's. Three axes of evaluation revealed several patterns. Overall, the results were similar for the two levels of studies. The most productive teams were those working in an iterative manner. These teams also exhibited higher team cohesion. However, the iterative approach was associated with lower quality functionality. Upfront design conceptualization contributed to a considerable improvement in the external quality characteristics of the software.

The purpose of this study was not to propose an optimal approach for student undertakings. Instead, we have presented a range of methodologies and examined their possible influence on the product, project and people involved. The results could serve as a starting point for other educators who wish to tailor or design a software development process for academic needs. However, due to the relatively small effect sizes, the results should be interpreted with care. While they indicate a trend, the nature of the experiment does not permit generalization and the significance of the findings reported can vary depending on the application domain. More quantitative research is necessary on the impact of different types of methodologies, particularly for courses in IT domains other than Web development.

Future work could incorporate evolving requirements in the course of a project. Although not typical of student undertakings, changing requirements are a common occurrence in commercial software creation. Exposing students to such changes would be very beneficial from the educational perspective and introduce an interesting dimension to the comparative study of development approaches applied to student computing projects.

CRediT authorship contribution statement

Rafal Włodarski: Conceptualization, Methodology, Software, Resources, Writing – original draft. Aneta Poniszewska-Marańda: Conceptualization, Writing – review & editing, Visualization, Supervision. Jean-Remy Falleri: Conceptualization, Writing – review & editing, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] V. Garousi, G. Giray, E. Tuzun, C. Catal, M. Felderer, Closing the gap between software engineering education and industrial needs, IEEE Softw. 37 (2) (2020) 68–77.
[2] M. Kuhrmann, E. Hanser, C. Prause, P. Diebold, J. Münch, P. Tell, V. Garousi, M. Felderer, K. Trektere, F. Caffery, O. Linssen, Hybrid software and system development in practice: waterfall, scrum, and beyond, in: Proceedings of 2017 International Conference on Software and System Process, ICSSP, 2017.
[3] Z. Masood, R. Hoda, K. Blincoe, Adapting agile practices in university contexts, J. Syst. Softw. 144 (2018) 501–510.
[4] V. Mahnic, A capstone course on agile software development using scrum, IEEE Trans. Educ. 55 (2012) 99–106.
[5] É. Germain, P. Robillard, Engineering-based processes and agile methodologies for software development: A comparative case study, J. Syst. Softw. 75 (2005) 17–27.
[6] D.A. Umphress, D. Hendrix, J.H. Cross, Software process in the classroom: The capstone project experience, IEEE Softw. 19 (5) (2002) 78–85.
[7] O. Benediktsson, D. Dalcher, H. Thorbergsson, Comparison of software development life cycles: a multiproject experiment, IEE Proc. Softw. 153 (3) (2006) 87–101.
[8] C.A. Wellington, T. Briggs, C.D. Girard, Comparison of student experiences with plan-driven and agile methodologies, in: Proceedings of 35th Annual Conference Frontiers in Education, 2005.
[9] P. Rundle, R. Dewar, Using return on investment to compare agile and plan-driven practices in undergraduate group projects, in: Proceedings of 28th International Conference on Software Engineering, 2006, pp. 649–654, http://dx.doi.org/10.1145/1134285.1134383.
[10] R. Wlodarski, A. Poniszewska-Maranda, J.R. Falleri, Comparative case study of plan-driven and agile approaches in student computing projects, in: Proceedings of 28th International Conference on Software, Telecommunications and Computer Networks, SoftCOM, 2020.
[11] M. Missiroli, D. Russo, P. Ciancarini, Agile for millennials: a comparative study, in: Proceedings of IEEE/ACM 1st International Workshop on Software Engineering Curricula for Millennials, Buenos Aires, SECM, 2017, pp. 47–53, http://dx.doi.org/10.1109/SECM.2017.7.
[12] European Commission, ECTS users' guide 2015, 2015, [Online]. Available from: https://ec.europa.eu/education/ects/users-guide/docs/ects-users-guide_en.pdf. (Accessed 5 January 2021).
[13] W.W. Royce, Managing the development of large software systems: Concepts and techniques, in: Technical Papers of Western Electronic Show and Convention (WesCon), Los Angeles, USA, 1970.
[14] M. Hirsch, Moving from a plan driven culture to agile development, in: Proceedings of 27th International Conference on Software Engineering, ICSE, 2005.
[15] A. Baird, F.J. Riggins, Planning and sprinting: Use of a hybrid project management methodology within a CIS capstone course, J. Inform. Syst. Educ. (ISSN: 1055-3096) 23 (3) (2012).
[16] K. Schwaber, J. Sutherland, The scrum guide, 2017, [Online]. Available from: https://www.scrumguides.org/docs/scrumguide/v2017/2017-Scrum-Guide-US.pdf. (Accessed 5 January 2021).
[17] E. Whitworth, R. Biddle, Motivation and cohesion in agile teams, in: Proceedings of 8th International Conference on Agile Processes in Software Engineering and Extreme Programming, 2007, pp. 62–69.
[18] A. Scharf, A. Koch, Scrum in a software engineering course: An in-depth praxis report, in: Proceedings of 26th International Conference on Software Engineering Education and Training, CSEE&T, 2013.
[19] G. Rodríguez, A. Soria, M. Campo, Measuring the impact of agile coaching on students' performance, IEEE Trans. Educ. 59 (3) (2016).
[20] R. Wlodarski, A. Poniszewska-Maranda, Measuring dimensions of Software Engineering projects' success in Academic context, in: Proceedings of 2017 Federated Conference on Computer Science and Information Systems, 2017.
[21] D. Bijlsma, M. Ferreira, B. Luijten, J. Visser, Faster issue resolution with higher technical quality of software, Softw. Qual. J. 20 (2) (2012) 265–285.
[22] A.S. Nuñez Varela, H.G. Pérez-Gonzalez, F.E. Martínez-Perez, C. Soubervielle-Montalvo, Source code metrics: A systematic mapping study, J. Syst. Softw. 128 (2017) 164–197.
[23] A. Ju, A. Fox, TEAMSCOPE: Measuring software engineering process with teamwork telemetry, in: Proceedings of 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education, 2018, pp. 123–128.
[24] ISO/IEC 25010:2011, Systems and software engineering, 2011.
[25] T. Orehovacki, A. Granic, D. Kermek, Evaluating the perceived and estimated quality in use of web 2.0 applications, J. Syst. Softw. 86 (2013) 3039–3059.
[26] J. Mathieu, M.T. Maynard, T. Rapp, L. Gilson, Team effectiveness 1997–2007: A review of recent advancements and a glimpse into the future, J. Manag. 34 (3) (2008) 410–476.
[27] D. Kember, Interpreting student workload and the factors which shape students' perceptions of their workload, Stud. High. Educ. 29 (2) (2004) 165–184.
[28] P. Ramsden, Learning to Teach in Higher Education, Routledge, 2003.
[29] N. Ramasubbu, A. Bharadwaj, G. Tayi, Software process diversity: Conceptualization, measurement, and analysis of impact on project performance, MIS Q. Manage. Inf. Syst. 39 (4) (2015) 787–808.
[30] M. Hoegl, H.G. Gemuenden, Teamwork quality and the success of innovative projects: A theoretical concept and empirical evidence, Organ. Sci. 12 (2001).
[31] A.J. Shenhar, D. Dvir, Project management research – the challenge and opportunity, Proj. Manag. J. 38 (2) (2007) 93–99.
[32] A. Carron, L. Brawley, Cohesion: Conceptual and measurement issues, Small Group Res. 31 (2000).
[33] E. Salas, R. Grossman, Measuring team cohesion: Observations from the science, Hum. Factors 57 (2015).
[34] A. Tosun, O. Dieste, D. Fucci, S. Vegas, B. Turhan, H. Erdogmus, A. Santos, An industry experiment on the effects of test-driven development on external quality and productivity, Empir. Softw. Eng. 22 (6) (2016) 2763–2805.
[35] Blacksun Software, Mousotron 7.0, March 18, 2012, [Online]. Available from: http://www.blacksunsoftware.com/mousotron.html. (Accessed 5 January 2021).
[36] A. Carron, L. Brawley, The Group Environment Questionnaire test manual, Fitness Information Technology Inc., 200.
[37] N.P. Melone, A theoretical assessment of the user-satisfaction construct in information systems research, Manage. Sci. 36 (1) (1990) 76–91.
[38] B.E. Hayes, Measuring Customer Satisfaction: Survey Design, Use, and Statistical Analysis Methods, ASQ Quality Press, Milwaukee, 1998.
[39] G.C. Moore, I. Benbasat, Development of an instrument to measure the perceptions of adopting an information technology innovation, Inf. Syst. Res. 2 (3) (1991) 173–191.
[40] E.M. Rogers, Diffusion of Innovations, fifth ed., Free Press, New York, N.Y., 2003.
[41] A. Cockburn, Selecting a project's methodology, IEEE Softw. 17 (2000) 64–71, http://dx.doi.org/10.1109/52.854070.
[42] A. Campanelli, F. Parreiras, Agile methods tailoring – a systematic literature review, J. Syst. Softw. 110 (2015) 85–100.
[43] S. Licorish, A. Philpott, S.G. MacDonell, Supporting agile team composition: A prototype tool for identifying personality (in)compatibilities, in: Proceedings of 2009 ICSE Workshop on Cooperative and Human Aspects on Software Engineering, 2009.
[44] C. Wohlin, P. Runeson, M. Höst, M.C. Ohlsson, B. Regnell, A. Wesslén, Experimentation in Software Engineering, Springer-Verlag Berlin Heidelberg, 2012, http://dx.doi.org/10.1007/978-3-642-29044-2.
[45] B. Boehm, Get ready for agile methods, with care, Computer 35 (2002) 64–69, http://dx.doi.org/10.1109/2.976920.
[46] B. Unhelkar, The Art of Agile Practice: A Composite Approach for Projects and Organizations, CRC Press, 2016.
[47] L. Williams, A survey of agile development methodologies, 2007.
[48] A.S. Carter, C.D. Hundhausen, A review of studio-based learning in computer science, J. Comput. Sci. Coll. 27 (1) (2011) 105–111.
[49] C.N. Bull, J. Whittle, Observations of a software engineering studio: Reflecting with the studio framework, in: Proceedings of 2014 IEEE 27th Conference on Software Engineering Education and Training, CSEE&T, 2014, pp. 74–83, http://dx.doi.org/10.1109/CSEET.2014.6816784.